The course will study advanced topics and techniques in database
systems, with a focus on database system design and algorithms and on
big data processing for structured data. Traditional topics include
query optimization, physical database design, transaction management, crash
recovery, and parallel databases. The course will also survey some of the recent
developments in selected areas such as NoSQL databases and SQL-based big
data management systems for relational (structured) data.
|Introduction by Professor
We are living in the Big Data era, with virtually all domains of innovation (medicine, finance, sports, physics, astronomy, etc.) becoming data rich. Traditional database management systems (originally designed for efficiently processing business data) have been evolving rapidly, and new ones have emerged as well, in order to support a rich variety of data-intensive applications, each with its specific characteristics and requirements. These applications are often characterized by the so-called four “Vs” of Big Data, namely Volume (data size), Variety (of data formats), Velocity (e.g., streaming sources), and Veracity (uncertain data, data reliability). Their requirements may correspond to different perspectives on aspects such as data storage architecture (e.g., cloud-based), fault tolerance, or flexibility of the data or computation model. This course reviews data processing architectures and algorithms for managing large quantities of data, covering both transactional processing and data analytics. While the focus is on core aspects of database architectures and algorithms, the discussions are relevant to adjacent domains, such as knowledge management, recommender systems, or machine learning at scale.
Special Note: This course is co-coded with DASC7104 which is for MDASC students.
||The course assumes basic prior knowledge of
database systems (e.g., relational databases and SQL). Students should be
familiar with at least one programming language, such as Java, C++, or Python,
and should be able to pick up other similar languages.
||No textbook is required. Recommended references:
- Database Management Systems (3rd edition), by R. Ramakrishnan and J. Gehrke
- Database Systems: The Complete Book, by H. Garcia-Molina et al.
- Hadoop: The Definitive Guide, by T. White
- Cassandra: The Definitive Guide, by E. Hewitt
- Graph Databases, by I. Robinson et al.
- Learning Spark, by H. Karau et al.
||1 September, 2022 - 15 September, 2022
|Maximum class size
||60 [For MSc(CompSc) students]
|Moodle course website
(Log in using your HKU Portal UID and PIN)
- Please note that the professor maintains the Moodle teaching website and controls when it is released to students.
- Enrolled students should visit the Moodle teaching website regularly for the latest announcements, course materials, assignment submission, the discussion forum, etc.