COMP7104A - Advanced database systems

Semester 1, 2022-23

Bogdan Cautis
Teaching assistants
Yaozhu Sun
Tianle Wang
Mingruo Yuan
Syllabus The course will study some advanced topics and techniques in database systems, with a focus on database systems design and algorithms and on big data processing for structured data. Traditional topics include query optimization, physical database design, transaction management, crash recovery, and parallel databases. The course will also survey some of the recent developments in selected areas such as NoSQL databases and SQL-based big data management systems for relational (structured) data.
Introduction by Professor We are living in the Big Data era, with virtually all domains of innovation (medicine, finance, sports, physics, astronomy, etc.) becoming data rich. Traditional database management systems (originally designed for efficiently processing business data) have been rapidly evolving, and new ones have emerged as well, in order to support a rich variety of data-intensive applications, each with its specific characteristics and requirements. These applications are often characterized by the so-called four "Vs" of Big Data, namely Volume (data size), Variety (of data formats), Velocity (e.g., streaming sources), and Veracity (uncertain data, data reliability). Their requirements may correspond to different perspectives on aspects such as data storage architecture (e.g., cloud-based), fault tolerance, or flexibility of the data or computation model. This course reviews data processing architectures and algorithms for managing large quantities of data, covering both transactional processing and data analytics. While the focus is on core aspects of database architectures and algorithms, the discussions are relevant to adjacent domains, such as knowledge management, recommender systems, or machine learning at scale.

Special Note: This course is co-coded with DASC7104 which is for MDASC students.

Learning Outcomes
Course Learning Outcomes | Relevant Programme Learning Outcome
CLO1. Understand trade-offs in database systems techniques; be able to apply the acquired knowledge of the state of the art in modern data management to design holistic solutions based on database systems / Big Data systems and techniques; justify design decisions in the context of a data management solution. | PLO 1, 2, 4, 5, 7, 8, 9, 10, 11, 14, 15, 16
CLO2. Be able to implement and evaluate complex, scalable database systems solutions; develop new methods in databases based on the acquired knowledge of existing techniques. | PLO 1, 2, 4, 5, 7, 8, 9, 10, 11, 14, 15, 16
Pre-requisites The course assumes basic prior knowledge of database systems (e.g., relational databases and SQL). Students should be familiar with at least one programming language, such as Java, C++, or Python, and be able to pick up other similar languages.
Compatibility Nil
Topics covered
Course Content | No. of Hours | Course Learning Outcomes
1. Query processing and optimization for relational database systems | 6 | CLO1, CLO2
2. Transaction management and crash recovery | 6 | CLO1, CLO2
3. Indexing methods | 3 | CLO1, CLO2
4. Parallel database systems | 4.5 | CLO1, CLO2
5. Big Data systems (with Hadoop, Spark) | 6 | CLO1, CLO2
6. NoSQL systems | 4.5 | CLO1, CLO2
Description | Type | Weighting * | Examination Period ^ | Course Learning Outcomes
2 written individual assignments | Continuous Assessment | 25% | - | CLO1, CLO2
4 practical lab evaluations | Continuous Assessment | 25% | - | CLO1, CLO2
2hr written exam, covering all taught content in the course | Written Examination | 50% | 8 - 23 December 2022 | CLO1, CLO2
* The weighting of coursework and examination marks is subject to approval
^ The exact examination date is usually released by the Examinations Office once all enrolments are confirmed after the add/drop period. Students are obliged to follow the examination schedule. Students should NOT enrol in the course if they are not certain that they will be in Hong Kong during the examination period. Absence from the examination may result in failure in the course. There is no supplementary examination for any MSc curriculum in the Faculty of Engineering.
Course materials No textbook is required. Recommended readings are drawn from:
  • Database Management Systems (3rd edition), by R. Ramakrishnan, J. Gehrke
  • Database Systems: The Complete Book, by H. Garcia-Molina et al.
  • Hadoop: The Definitive Guide, by T. White
  • Cassandra: The Definitive Guide, by E. Hewitt
  • Graph Databases, by I. Robinson et al.
  • Learning Spark, by H. Karau et al.
Session dates
Session | Date | Time | Venue | Remark
Session 1 | 7 Sep 2022 (Wed) | 2:30pm - 5:30pm | Online | Zoom
Session 2 | 14 Sep 2022 (Wed) | 2:30pm - 5:30pm | Online | Zoom
Session 3 | 21 Sep 2022 (Wed) | 2:30pm - 5:30pm | Online | Zoom
Session 4 | 28 Sep 2022 (Wed) | 2:30pm - 5:30pm | Online | Zoom
Session 5 | 5 Oct 2022 (Wed) | 2:30pm - 5:30pm | Online | Zoom
Session 6 | 12 Oct 2022 (Wed) | 2:30pm - 5:30pm | MW-T7 | Face-to-face
Session 7 | 16 Oct 2022 (Sun) | 2:30pm - 5:30pm | MW-T7 | Face-to-face
Session 8 | 19 Oct 2022 (Wed) | 2:30pm - 5:30pm | CB-C | Face-to-face
Session 9 | 23 Oct 2022 (Sun) | 2:30pm - 5:30pm | MW-T7 | Face-to-face
Session 10 | 2 Nov 2022 (Wed) | 2:30pm - 5:30pm | Online | Zoom
CB - Chow Yei Ching Building; MW - Meng Wah Complex
Add/drop 1 September, 2022 - 15 September, 2022
Maximum class size 60   [For MSc(CompSc) students]
Moodle course website
  • HKU Moodle: (Login using your HKU Portal UID and PIN)

    - Please note that the professor maintains the Moodle teaching website and controls when it is released to students.
    - Enrolled students should visit the Moodle teaching website regularly for the latest announcements, course materials, assignment submission, discussion forum, etc.