The course covers how to deliver services on the cloud and the core
technologies behind cloud computing. The lecture part will provide an
overview of the three basic cloud service models (IaaS, PaaS, SaaS)
and two emerging models, namely Container-as-a-Service (CaaS) and
Function-as-a-Service (FaaS), with motivating examples from Google, Amazon,
and Microsoft. We will discuss the core concepts and internal design
of two big data processing engines, namely Hadoop and Spark. Topics to
be discussed include the Hadoop Distributed File System (HDFS), the YARN scheduler, MapReduce
programming, and Spark Resilient Distributed Dataset (RDD) for in-memory
computing. We will also cover Spark Structured Streaming and Apache
Kafka, which together support near-real-time stream processing through
a simple SQL-like API.
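To give a flavour of the programming style covered, here is a minimal sketch
(not taken from the course material) of a MapReduce-style word count written
with Spark's RDD API in Python; the input path and application name are
purely illustrative.

    from pyspark import SparkContext

    sc = SparkContext(appName="WordCountSketch")    # illustrative application name

    counts = (
        sc.textFile("hdfs:///data/sample.txt")      # hypothetical HDFS input path
          .flatMap(lambda line: line.split())       # map: split each line into words
          .map(lambda word: (word, 1))               # map: emit (word, 1) pairs
          .reduceByKey(lambda a, b: a + b)           # reduce: sum the counts per word
    )

    for word, count in counts.take(10):              # inspect a few results
        print(word, count)

    sc.stop()

The same computation can be expressed as a classic Hadoop MapReduce job; the
RDD version illustrates the in-memory style of computation mentioned above.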
The course places strong emphasis on hands-on experience and practical
training in deploying cloud services. Students need to complete a
simple assignment to learn how to configure and deploy Amazon EC2 and ECS
(Elastic Container Service) on the AWS Cloud. Students will then form groups
of four to build a private cloud on a container-based Linux cluster. The
software to be installed includes Hadoop, Spark, Spark Structured Streaming,
and Kafka. Students will take part in assembling, configuring, and
benchmarking sample Hadoop/Spark programs, and will port a real-life
Spark Structured Streaming application to their cluster.
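As a rough illustration of that kind of streaming application, the sketch
below reads events from a Kafka topic with Spark Structured Streaming and
aggregates them using the SQL-like DataFrame API. The broker address, topic
name, and event schema are assumptions made for this example, not details
taken from the course.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json, window
    from pyspark.sql.types import StructType, StringType, TimestampType

    # Requires the spark-sql-kafka connector package on the Spark classpath.
    spark = SparkSession.builder.appName("StreamingSketch").getOrCreate()

    # Assumed JSON event schema: a user id and an event timestamp.
    schema = (StructType()
              .add("user", StringType())
              .add("event_time", TimestampType()))

    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "localhost:9092")  # assumed broker
              .option("subscribe", "events")                        # assumed topic
              .load()
              .select(from_json(col("value").cast("string"), schema).alias("e"))
              .select("e.*"))

    # Count events per user over one-minute windows and print to the console.
    counts = events.groupBy(window(col("event_time"), "1 minute"), col("user")).count()

    query = (counts.writeStream
             .outputMode("complete")
             .format("console")
             .start())

    query.awaitTermination()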
Special Note: Class size is limited to a maximum of 90 students due to the
limited number of PCs in our lab.