COMP7104A - Advanced database systems

Semester 2, 2018-19

Instructor
Professor Bogdan Cautis
Teaching assistant
Mr. Kevin Y.K. Lam
Syllabus
The course will study some advanced topics and techniques in database systems, with a focus on big data analytics, algorithms, and system design and organisation. It will also survey recent developments and progress in selected areas. Topics include: query optimization, spatial and spatiotemporal data management, multimedia and time-series data management, information retrieval and XML, and data mining.
Introduction by Instructor
We are living in the Big Data era, with virtually all domains of innovation (medicine, finance, sports, physics, astronomy, etc.) becoming data-rich. Traditional database management systems, originally designed to process business data efficiently, have been evolving rapidly, and new ones have emerged, in order to support a rich variety of data-intensive applications, each with its own characteristics and requirements. These applications are often characterized by the so-called four “Vs” of Big Data, namely Volume (data size), Variety (of data formats), Velocity (e.g., streaming sources), and Veracity (uncertain data, data reliability). Their requirements may reflect different perspectives on aspects such as data storage architecture (e.g., cloud-based), fault tolerance, or flexibility of the data or computation model.
This course reviews data processing architectures and algorithms for managing large quantities of data, covering both transactional processing and data analytics. While the focus is on core aspects of database architectures and algorithms, the discussions are also relevant to adjacent domains, such as knowledge management, recommender systems, or machine learning at scale.

Special Note: This course is co-coded with DASC7104 and is mainly for MDASC students. The quota for MSc(CompSc) students is limited to 20.

Learning Outcomes
Course Learning Outcomes (with relevant Programme Learning Outcomes)
CLO1. Understand trade-offs in database systems techniques; be able to apply the acquired knowledge of the state of the art in modern data management to design holistic solutions based on database systems / Big Data systems and techniques, and to justify design decisions in the context of a data management solution. Relevant PLOs: 1, 2, 4, 5, 7, 8, 9, 10, 11, 14, 15, 16
CLO2. Be able to implement and evaluate complex, scalable database system solutions, and to develop new methods in databases based on the acquired knowledge of existing techniques. Relevant PLOs: 1, 2, 4, 5, 7, 8, 9, 10, 11, 14, 15, 16
Pre-requisites
The course assumes basic prior knowledge of database systems (e.g., relational databases and SQL). Students should be familiar with at least one programming language, such as Java, C++ or Python, and be able to pick up similar languages.
Compatibility
Nil
Topics covered
Course content (no. of hours; course learning outcomes)
1. Query processing and optimization for relational database systems (4.5 hours; CLO1, CLO2)
2. Concurrency control techniques / transaction management (4.5 hours; CLO1, CLO2)
3. Advanced indexing methods (3 hours; CLO1, CLO2)
4. Introduction to parallel and distributed database systems (4.5 hours; CLO1, CLO2)
5. Massively parallel processing databases, SQL in Big Data systems (Hadoop, Spark) (6 hours; CLO1, CLO2)
6. Column stores and NoSQL systems (4.5 hours; CLO1, CLO2)
7. Data management techniques in machine learning (3 hours; CLO1, CLO2)
 
Assessment
  • 2 written individual assignments. Type: Continuous Assessment; weighting*: 25%; CLO1, CLO2
  • 4 on-site examinations in practical lab sessions. Type: Continuous Assessment; weighting*: 25%; CLO1, CLO2
  • 2-hour written examination covering all content taught in the course. Type: Written Examination; weighting*: 50%; examination period^: May 6 to 25, 2019; CLO1, CLO2
* The weighting of coursework and examination marks is subject to approval.
^ The exact examination date is usually released by the Examinations Office once all enrolments have been confirmed after the add/drop period. Students must comply with the examination schedule. Students should NOT enrol in the course if they are not certain that they will be in Hong Kong during the examination period. Absence from the examination may result in failure in the course. There are no supplementary examinations for any MSc curriculum in the Faculty of Engineering.
Course materials
No textbook is required. Recommended readings:
  • Database Management Systems, by R. Ramakrishnan and J. Gehrke
  • Database Systems: The Complete Book, by H. Garcia-Molina et al.
  • Hadoop: The Definitive Guide, by T. White
  • Cassandra: The Definitive Guide, by E. Hewitt
  • Graph Databases, by I. Robinson et al.
  • Learning Spark, by H. Karau et al.
Session dates
Session: date, time, venue
Session 1: 7 Apr 2019 (Sun), 10:00am - 1:00pm, CB-C
Session 2: 9 Apr 2019 (Tue), 7:00pm - 10:00pm, CB-C
Session 3: 11 Apr 2019 (Thu), 7:00pm - 10:00pm, CB-A
Session 4a: 13 Apr 2019 (Sat), 7:00pm - 8:30pm, CB-C
Session 4b: 13 Apr 2019 (Sat), 8:30pm - 10:00pm, HW-311 & HW-312
Session 5: 15 Apr 2019 (Mon), 7:00pm - 10:00pm, CB-C
Session 6a: 18 Apr 2019 (Thu), 7:00pm - 8:30pm, CB-A
Session 6b: 18 Apr 2019 (Thu), 8:30pm - 10:00pm, HW-311 & HW-312
Session 7: 23 Apr 2019 (Tue), 7:00pm - 10:00pm, CB-C
Session 8a: 25 Apr 2019 (Thu), 7:00pm - 8:30pm, CB-A
Session 8b: 25 Apr 2019 (Thu), 8:30pm - 10:00pm, HW-311 & HW-312
Session 9: 27 Apr 2019 (Sat), 10:00am - 1:00pm, CB-A
Session 10: 27 Apr 2019 (Sat), 2:00pm - 5:00pm, CB-C
Venues: CB = Chow Yei Ching Building; HW = Haking Wong Building
Add/drop period
14 January 2019 to 9 April 2019
Quota
37 (for MSc(CompSc) and other Engineering TPG students)