REU Project 17—Development of statistical methods for analysis of high-dimensional data—Dr. Michael L. Collyer, Biology Department
Many disciplines within biotechnology use high-dimensional data (e.g., DNA microarray data, mass spectrometry protein data, morphometric data). Statistical methods for analyzing such data are not well established, because the development of methods lags behind the development of data-acquisition technology. This creates a paradox: high-dimensional data from organisms can help researchers better discern characteristic differences among experimental or observational groups, yet the statistical tests researchers typically use lose inferential power as data dimensionality increases. As a result, researchers often resort to ad hoc methods that reduce the number of variables they analyze in order to maintain statistical power. Research in my lab is directed at developing Monte Carlo sampling methods for high-dimensional data that preserve or even increase statistical power in hypothesis tests, without eliminating data. Students will be involved in research projects that collect 2-D and 3-D morphological data, which provide empirical data sets to help develop this methodology. Collaborative research with other data types is also possible.
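
To give prospective students a feel for what a Monte Carlo hypothesis test on high-dimensional data can look like, here is a minimal sketch of a generic two-group permutation test. It is an illustration of the general idea only, not the specific methodology under development in the lab. The function name `permutation_test`, the choice of test statistic (Euclidean distance between group mean vectors), the number of permutations, and the simulated data are all assumptions made for this example.

```python
import numpy as np


def permutation_test(X, labels, n_perm=999, seed=0):
    """Monte Carlo permutation test for a difference between two group means.

    X      : (n_specimens, n_variables) data matrix; all variables are kept.
    labels : array of 0/1 group codes, one per specimen.
    Returns the observed statistic and a permutation p-value.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)

    def stat(lab):
        # Euclidean distance between the two group mean vectors
        diff = X[lab == 0].mean(axis=0) - X[lab == 1].mean(axis=0)
        return np.linalg.norm(diff)

    observed = stat(labels)
    count = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        # Shuffle group labels to simulate the null hypothesis of no difference
        if stat(rng.permutation(labels)) >= observed:
            count += 1
    return observed, count / (n_perm + 1)


# Simulated example (hypothetical data): 10 specimens per group and
# 200 variables, i.e., far more variables than specimens.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 200))
labels = np.repeat([0, 1], 10)
X[labels == 1] += 0.5  # add a small shift to every variable in group 1

observed, p_value = permutation_test(X, labels)
print(f"distance between group means = {observed:.3f}, p = {p_value:.3f}")
```

Because the test statistic is a single distance computed across all variables, the test does not require more specimens than variables and no variables need to be dropped beforehand. The smallest p-value it can report is 1 / (n_perm + 1), so the number of permutations sets the resolution of the test.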