Statistical analysis of high dimensional biomedical data / Jose Carlos Martinez Avila
Author: Martinez Avila, Jose Carlos
Reviewer: Filzmoser, Peter
Extent: 3, 69 leaves : ill., graphs
Thesis: Vienna, Technical University, diploma thesis, 2012
Keywords (DE): -Omics / Genomics / High Dimensional Data
URN: urn:nbn:at:at-ubtuw:1-51995 (persistent identifier)
The work is freely available
Statistical analysis of high dimensional biomedical data [2.69 mb]
Abstract (English)

Over the last decades, the growing number and accuracy of analytical methods in different fields of science have made it possible to obtain large numbers of measurements from a single individual or sample. Biochemical analysis methods provide a wide range of measures that may help explain a given outcome or trait. Biomolecular methods applied to genetics reveal how a genome is built and expressed.

The association of a phenotype with a genotype is a main task of the fields of genomics and bioinformatics.

In the -omics era we are swimming in a sea of data, but diving into it requires more training and caution.

The aim of this master's thesis is to present different methods for analyzing high-dimensional data, with a case study focused on biomedical data.

The underlying theory of each method is explained briefly but without loss of accuracy. Alongside the theory, a direct application in R code is presented, using a grapevine data set as the example.

The master's thesis is organized in seven chapters. The first chapter is introductory and covers basic concepts and definitions concerning high-dimensional data, genetics and the R environment. It also describes the grapevine example data set.

Chapters two to five present the theory of the different methods: Multivariate Outlier Identification, Linear Discriminant Analysis, Principal Components Regression, Partial Least Squares Regression, Penalized Regression, Cluster Analysis, Permutation Tests, Significance Analysis of Microarrays and the Moderated $t$-statistic. At the end of each chapter, the methods explained are applied to the grapevine data set.

The sixth chapter is dedicated to the case study, which uses 8650 transcripts from 364 patients affected or not affected by Alzheimer's disease.

The final chapter draws conclusions on some key issues in the statistical analysis of high-dimensional biomedical data.