Biological Big Data Analytics

Year: 1
Academic year: 2020-2021
Code: 02038823
Subject Area: Computational Biology
Language of Instruction: Portuguese
Other Languages of Instruction: English
Mode of Delivery: Face-to-face
Duration: Semester
ECTS Credits: 6.0
Type: Compulsory
Level: 2nd Cycle Studies - Master's

Recommended Prerequisites

Not applicable.

Teaching Methods

Teaching methodologies:

1. Theoretical-practical classes with worked application examples in R.

2. Practical group project: oral presentation (15%) and written report (35%).

3. Final exam (50%; closed book).

The classes present and explain the selected topics, introducing key concepts, results and the main algorithms, and always relate them to a problem of practical interest in computational biology. The material is presented interactively and paced to the students' rate of assimilation.

Learning Outcomes

Data Analytics is the science of analyzing data to convert scattered information into useful knowledge, and it fits well within the new scientific paradigm of data-intensive scientific discovery. The primary goal of this course is to introduce the fundamental concepts of data manipulation and exploration, statistical and quantitative analysis techniques, and exploratory models. The course prepares students for the big data era, enabling them to conduct research in data science and to handle the deluge of biological data coming from genomic and proteomic experiments.

Work Placement(s)

No

Syllabus

1. Data Science: introduction to statistical learning and R-programming.

2. Collection of genomic and proteomic data, exploration and visualization.

3. Descriptive Statistics: data cleaning, measures of central tendency and dispersion; joint probability distribution, conditional probability distribution, Bayes theorem.

4. Data Pre-Processing: data transformations (polynomial, Box-Cox, log and logit); centering and normalization.

5. Data Reduction: Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE).

6. Exploratory Factor Analysis and its relationship to PCA.

7. Regression Analysis: multiple linear regression; forward, backward and stepwise regression; logistic regression.

8. Inferential Statistics: hypothesis testing and errors (chi-square test and t-test), analysis of variance (ANOVA), analysis of covariance (ANCOVA), multivariate ANOVA (MANOVA) and multivariate ANCOVA (MANCOVA).

Head Lecturer(s)

Irina de Sousa Moreira

Assessment Methods

Assessment
Exam: 50.0%
Research work: 50.0%

Bibliography

1. Probability & Statistics for Engineers & Scientists (9th Edn.), Ronald E. Walpole, Raymond H. Myers, Sharon L. Myers and Keying Ye, Prentice Hall Inc.

2. The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd Edn.), Trevor Hastie, Robert Tibshirani and Jerome Friedman, Springer, 2014

3. An Introduction to Statistical Learning: with Applications in R, G. James, D. Witten, T. Hastie and R. Tibshirani, Springer, 2013

4. Software for Data Analysis: Programming with R (Statistics and Computing), John M. Chambers, Springer

5. Beginning R: The Statistical Programming Language, Mark Gardener, Wiley, 2013