Biological Big Data Analytics
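Year: 1
Academic year: 2020-2021
Code: 02038823
Subject area: Computational Biology
Language of instruction: Portuguese
Other language of instruction: English
Mode of delivery: Face-to-face
Duration: Semester
ECTS credits: 6.0
Type: Compulsory
Level: 2nd Cycle Studies - Master's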
Recommended Prerequisites
Not applicable.
Teaching Methods
Teaching methodologies:
1. Theoretical-practical classes with several worked application examples in R.
2. Practical group project, assessed through an oral presentation (15%) and a written report (35%).
3. Final exam (50%; closed book).
The classes present and explain the selected topics, introducing key concepts, results and main algorithms, and always relating them to a problem of practical interest in computational biology. The material is presented interactively, at a pace adjusted to the students' progress.
Learning Outcomes
Data analytics is the science of analyzing data to convert scattered information into useful knowledge, and it fits well within the new scientific paradigm of data-intensive scientific discovery. The primary goal of this course is to introduce the fundamental concepts of data manipulation and exploration, statistical and quantitative analysis techniques, and exploratory models. The course prepares students for the big data era, enabling them to conduct research in data science and to handle the deluge of biological data coming from genomic and proteomic experiments.
Work Placement(s)
No
Syllabus
1. Data Science: introduction to statistical learning and R-programming.
2. Collection of genomic and proteomic data, exploration and visualization.
3. Descriptive Statistics: data cleaning, measures of central tendency and dispersion; joint probability distribution, conditional probability distribution, Bayes' theorem.
4. Data Pre-Processing: data transformations (polynomial, Box-Cox, log and logit), centering and normalization.
5. Data reduction: Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE).
6. Exploratory Factor Analysis and its relationship to PCA.
7. Regression analysis: multiple linear regression, forward, backward & stepwise regression, logistic regression.
8. Inferential Statistics: hypothesis testing and errors (chi-square test and t-test), analysis of variance (ANOVA), analysis of covariance (ANCOVA), multivariate ANOVA (MANOVA) and multivariate ANCOVA (MANCOVA).
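As a rough illustration of the pre-processing topic (item 4), the sketch below applies a log transform followed by centering and normalization to a small list of values. It is not course material: the course works in R, Python is used here only for illustration, and the function names and the sample values are hypothetical.

```python
import math
import statistics


def log_transform(values):
    """Natural-log transform; only defined for strictly positive values."""
    return [math.log(v) for v in values]


def standardize(values):
    """Center to mean 0 and scale to unit sample standard deviation (z-scores)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical intensity-like measurements spanning several orders of magnitude.
raw = [12.0, 150.0, 9.5, 2300.0, 47.0]

# The log transform compresses the range; standardizing then puts the
# values on a common, unit-free scale.
z = standardize(log_transform(raw))
```

After this step the transformed values have mean 0 and sample standard deviation 1, which is the usual starting point for scale-sensitive methods such as PCA.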
Head Lecturer(s)
Irina de Sousa Moreira
Assessment Methods
Assessment
Exam: 50.0%
Research work: 50.0%
Bibliography
1. Probability & Statistics for Engineers & Scientists (9th Edn.), Ronald E. Walpole, Raymond H. Myers, Sharon L. Myers and Keying Ye, Prentice Hall Inc.
2. The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd Edn.), Trevor Hastie, Robert Tibshirani and Jerome Friedman, Springer, 2014
3. An Introduction to Statistical Learning: with Applications in R, G. James, D. Witten, T. Hastie and R. Tibshirani, Springer, 2013
4. Software for Data Analysis: Programming with R (Statistics and Computing), John M. Chambers, Springer
5. Beginning R: The Statistical Programming Language, Mark Gardener, Wiley, 2013