Data Science Topics

Year: 1
Academic year: 2020-2021
Code: 02038778
Subject Area: Optional
Language of Instruction: Portuguese
Other Languages of Instruction: English
Mode of Delivery: Face-to-face
Duration: Semester
ECTS Credits: 6.0
Type: Elective
Level: 2nd Cycle Studies - Master's (Mestrado)

Recommended Prerequisites

Calculus, Linear Algebra, Programming

Teaching Methods

Lecture (T) classes: presentation and discussion of concepts, techniques and algorithms. In the practical laboratory (PL) classes, students work at the computer, applying algorithms to data science problems of moderate complexity and running simulations with tools and frameworks. This work is done in groups during the PL class, under the teacher's supervision, and counts for 20% of the final grade. A project carried out outside class, with a report and public defense, counts for 40% of the final grade. A written exam counts for the remaining 40%.

Learning Outcomes

The course unit (UC) introduces the field of data science, giving the student an overview of the area, its methodological principles, its challenges and its main applications. It also introduces the basic algorithms of a data analysis pipeline, with particular emphasis on data preparation, attribute extraction, dimensionality reduction, machine learning and validation. By the end, the student should be able to design pipelines and to identify and validate, experimentally and formally, the best algorithmic solution for a particular task. The course unit also aims to foster autonomous learning, group work, interpersonal relationships, and oral and written communication.

Work Placement(s)

No

Syllabus

Chapter 1: Introduction
- Big Data and Data Science
- Current situation and prospects
- Required skills
Chapter 2: Problems and Applications
- Life cycle and the pipeline
- Typical problems and applications
Chapter 3: Data processing
- Evaluation of signal-to-noise ratio
- Time series filtering
- Detection and treatment of outliers
- Detection and treatment of missing values
- Time-frequency transformations: extraction of non-stationary attributes
Chapter 4: Attribute Handling
- Discretization of continuous variables, conversion of categorical variables
- Normalization
- Treatment of unbalanced data
Chapter 5: Selection and reduction of attributes
- Classifier / regressor independent methods: Filters
- Methods based on classification / regression performance: "Wrappers"
- Embedded Methods
- Unsupervised reduction
- Supervised reduction
Chapter 6: Computational Learning
- Supervised and unsupervised learning
Chapter 7: Validation
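
As an informal illustration of the data processing and attribute handling topics listed above (missing values, outliers, normalization), the following is a minimal Python sketch and not course material. The function names, the sample data, the choice of median imputation, Tukey's IQR rule and min-max scaling are all assumptions made for this example; the course may use other techniques and tools.

```python
import statistics


def impute_missing(values):
    """Replace None entries with the median of the observed values (assumed strategy)."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def flag_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule, assumed here)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical measurements with two missing entries and one extreme value.
raw = [2.0, 2.5, None, 3.0, 2.8, 50.0, 2.2, None, 2.9]

filled = impute_missing(raw)
outlier_mask = flag_outliers(filled)
clean = [v for v, is_out in zip(filled, outlier_mask) if not is_out]
normalized = min_max_normalize(clean)
```

Here the outliers are simply dropped before normalization; replacing them (for example with the median) would be an equally valid treatment, depending on the task.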

Head Lecturer(s)

Paulo Fernando Pereira de Carvalho

Assessment Methods

Assessment
Problem solving: 20.0%
Exam: 40.0%
Project: 40.0%

Bibliography

- Flach, P. (2012). "Machine Learning: The Art and Science of Algorithms that Make Sense of Data". Cambridge University Press.
- Müller, A. C. & Guido, S. (2017). "Introduction to Machine Learning with Python". O'Reilly.
- García, S., Luengo, J. & Herrera, F. (2015). "Data Preprocessing in Data Mining". Springer.
- Nixon, M. & Aguado, A. (2008). "Feature Extraction & Image Processing". Academic Press.