COMP7305A - Cluster and cloud computing

Semester 1, 2021-22

Professor
C.L. Wang
Teaching assistants
Guoxuan Chen
Xueyu Wu
Syllabus This course offers an overview of current cloud technologies for supporting large-scale data processing in data centers such as those run by Google, Amazon, Facebook, and Microsoft.  Topics include: Introduction to Cloud Computing; Cloud Architectures and Service Models (SaaS, PaaS, IaaS, CaaS, FaaS); Virtualization Techniques (virtual machines and containers); Apache Hadoop (HDFS, MapReduce, YARN); Apache Spark for in-memory computing; Spark Structured Streaming with Kafka for stream processing.  The course provides hands-on experience in using public Cloud services (e.g., AWS EC2).  Students are required to form groups and build their own private Cloud on a container-enabled PC cluster as their term project.
Introduction by Professor

The course offers details on how to deliver services on the cloud and on the core technologies behind cloud computing.  The lecture part will provide an overview of three basic cloud service models (IaaS, PaaS, SaaS) and two emerging models, namely Container-as-a-Service (CaaS) and Function-as-a-Service (FaaS), with motivating examples from Google, Amazon, and Microsoft.  We will discuss the core concepts and internal design of two big data processing engines, namely Hadoop and Spark.  Topics to be discussed include: Hadoop Distributed File System (HDFS), YARN scheduler, MapReduce programming, and Spark Resilient Distributed Dataset (RDD) for in-memory computing.  We will also cover Spark Structured Streaming and Apache Kafka, which are used to support near real-time stream processing through a simple SQL-like API.

The course puts strong emphasis on hands-on experience and practical training in deploying cloud services.  Students need to complete a simple assignment to learn how to configure and deploy Amazon EC2 and ECS (Elastic Container Service) on AWS Cloud.  Students will form groups of four to build a private Cloud on a container-based Linux cluster.  The software to be installed includes Hadoop, Spark, Spark Structured Streaming, and Kafka.  Students will participate in assembling and configuring the cluster, benchmark some sample Hadoop/Spark programs, and port a real-life Spark Structured Streaming application to their cluster.

Special Note: The maximum class size is limited to 90 students due to the limited number of PCs in our lab.

Learning Outcomes
Course Learning Outcomes Relevant Programme Learning Outcome
CLO1. Able to master the key technologies of Cluster and Cloud Computing, and able to contrast similar technologies PLO.4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16
CLO2. Able to self-learn the latest Cloud Computing technologies and build their own Cloud system on a PC cluster PLO.3, 6, 7, 8, 9, 10, 11, 12
Pre-requisites Students are expected to install various open-source cloud software on a Linux cluster and to carry out system configuration and administration.  A basic understanding of the Linux operating system and some experience in using Linux commands are required.  Students should have at least some programming experience in Python.  Proficiency with Scala or Java is ideal.
Compatibility Students who have taken "ICOM6041 An introduction to cloud computing" are not allowed to take COMP7305.
Topics covered
Course Content No. of Hours Course Learning Outcomes
1. Introduction of Cloud Computing 3 CLO1
2. Cloud Service Models 3 CLO1
3. Amazon EC2 and ECS (workshop) 3 CLO1 & CLO2
4. MapReduce and Hadoop File System 3 CLO1
5. Hadoop and Spark Installation (workshop) 3 CLO1 & CLO2
6. Virtualization Techniques 3 CLO1
7. Apache Spark 6 CLO1
8. Apache Kafka 3 CLO1
9. Spark Structured Streaming 3 CLO1
 
Assessment
Description Type Weighting * Examination Period ^ Course Learning Outcomes
EC2 Assignment: Port a simple Java/Python application to a public Cloud (Amazon EC2) Continuous Assessment 15% - CLO2
Term Project: Build a real Cloud system by installing Hadoop and Spark on a Kubernetes-enabled PC cluster, and port one real-life application based on Spark Structured Streaming and Kafka Continuous Assessment 35% - CLO2
Written exam covering all taught content in the course Written Examination 50% 8 - 23 December 2021 CLO1
* The weighting of coursework and examination marks is subject to approval
^ The exact examination date will be released by the Examinations Office when all enrolments are confirmed after the add/drop period.  Students must comply with the examination schedule. Students should NOT enrol in the course if they are not certain that they will be in Hong Kong during the examination period.  Absence from the examination may result in failure in the course. There is no supplementary examination for any MSc curriculum in the Faculty of Engineering.
Course materials Recommended readings:
  • Available from the course webpage
Session dates
Date Time Venue Remark
Session 1 9 Sep 2021 (Thu) 7:00pm - 10:00pm MW-T3 Face-to-face + Online
Session 2 16 Sep 2021 (Thu) 7:00pm - 10:00pm MW-T3 Face-to-face + Online
Session 3 23 Sep 2021 (Thu) 7:00pm - 10:00pm MW-T3 Face-to-face + Online
Session 4 30 Sep 2021 (Thu) 7:00pm - 10:00pm MW-T3 Face-to-face + Online
Session 5 7 Oct 2021 (Thu) 7:00pm - 10:00pm MW-T3 Face-to-face + Online
Session 6 21 Oct 2021 (Thu) 7:00pm - 10:00pm MW-T3 Face-to-face + Online
Session 7 28 Oct 2021 (Thu) 7:00pm - 10:00pm MW-T3 Face-to-face + Online
Session 8 4 Nov 2021 (Thu) 7:00pm - 10:00pm MW-T3 Face-to-face + Online
Session 9 11 Nov 2021 (Thu) 7:00pm - 10:00pm MW-T3 Face-to-face + Online
Session 10 18 Nov 2021 (Thu) 7:00pm - 10:00pm MW-T3 Face-to-face + Online
MW - Meng Wah Complex
Add/drop 1 September, 2021 - 16 September, 2021
Maximum class size 90