Advanced Infrastructures for Data Science
2
2020-2021
02038700
Optional
Portuguese
English
Face-to-face
Semester
6.0
Elective
2nd Cycle Studies - Master's
Recommended Prerequisites
Good programming skills and knowledge of distributed systems and cloud computing are recommended. Fluency in English at level B2 (ideally C1), according to the Common European Framework of Reference for Languages.
Teaching Methods
Lecture classes (T): presentation and discussion of the course topics.
Lab classes (PL): application of theoretical concepts in projects.
Learning Outcomes
The course takes a theoretical and practical approach to managing high-performance IT services and infrastructures built from the ground up to support big data processing, including the planning and administration of such infrastructures and the management of existing resources. The curriculum guides students toward skills ranging from the management of virtualisation clusters and data centers to the orchestration of microservices, with a focus on supporting big data processing frameworks such as Apache Hadoop and Spark.
In this course, students should acquire skills in understanding, analyzing and summarizing the subjects addressed, critical thinking, organization and planning, problem solving, group work, autonomous learning, and practical application of knowledge.
Work Placement(s)
No
Syllabus
1. Support infrastructures for Data Science: an introduction
2. Managing data center infrastructures: computing, storage and communications
3. Container orchestration systems (e.g. Kubernetes, Docker, Vagrant, Mesos)
4. Real-time big data architectures: Kappa and Lambda
5. Scalable and distributed transport (e.g. Apache Kafka)
6. Big data processing frameworks (e.g. Apache Hadoop and Spark)
7. The placement problem: optimizing data ingestion and processing on massively distributed architectures
8. Advanced cloud computing topics
9. Resource orchestration and management: planning for scalability
Assessment Methods
Assessment
Research work: 20.0%
Exam: 40.0%
Laboratory work or Field work: 40.0%
Bibliography
- Articles, resources available on the Internet, and selected book chapters for each specialized topic.
- Neha Narkhede, Gwen Shapira, and Todd Palino, Kafka: The Definitive Guide (2017)
- Matei Zaharia, Patrick Wendell, Andy Konwinski, and Holden Karau, Learning Spark: Lightning-Fast Big Data Analysis (2015)
- Jan Kunigk, Ian Buss, Paul Wilkinson, and Lars George, Architecting Modern Data Platforms: A Guide to Enterprise Hadoop at Scale (2019)