This site is archived from the 2015 program. For information about the 2016 program, please visit



The field of big data science at the intersection of public health and biomedicine is changing rapidly, with datasets of enormous complexity and size being gathered in diverse areas including genomics, imaging, electronic health records, social media, and environmental monitoring. The training of the next generation of quantitative scientists needs to change to meet the demands of these data. We define "Big Data" as datasets of enormous size and complexity, either in the number of observations and/or in the number and nature of predictors and outcomes. Classical theory, computation, and intuition often fail for such irregular, sparse datasets of vast size. To equip students to tackle these big data challenges, more training is needed in data management, data storage, visualization, high-dimensional statistics, optimization, causal methods, modeling of sparse data, and machine learning. The knowledge obtained from these massive, heterogeneous data sources is expected to inform the prevention, screening, prognosis, and treatment of human diseases and to play a major role in biology, medicine, and public health in the coming decade.

This full-time, 4-week summer institute, held on the Ann Arbor campus of the University of Michigan, is targeted toward undergraduates who have an interest (or are susceptible to being interested) in the intersection of Big Data, Statistics, and Human Health. The institute is led by a distinguished group of faculty from the Department of Biostatistics at the University of Michigan School of Public Health (UMSPH), with additional outstanding faculty from the Department of Statistics and the Department of Electrical Engineering and Computer Science (EECS).


Each participant will be paid a stipend of up to $2,500 to cover the costs of travel, housing, and meals. There are no tuition costs associated with the program.


Program Outline:

We are launching this undergraduate summer research program, titled "Transforming Analytical Learning in the Era of Big Data: A Summer Institute in Biostatistics," using a potentially transformative, non-traditional, action-based learning paradigm for biostatistics in big data. Each morning from 8:30 to 11:30 AM, faculty members in Biostatistics, Statistics, and EECS will give didactic lectures for all students. Lectures will be videotaped and made available. The following is a tentative outline of the lecture topics.



  • Data Acquisition, Database Management
  • Common computing platform, Linux environment
  • Data Structures
  • Data Visualization
  • Probability and Statistical Inference
  • Cloud, Parallel and Distributed Computing
  • Optimization
  • Sampling Methods: Markov Chain Monte Carlo
  • Medical Informatics/Computing
  • Matrix Computation
  • Bias and Confounding, Missing Data, Causal Inference
  • Machine Learning, Graphical Models, Sparse Learning with Matrices, Social Network Analysis, Imaging


Program faculty



Research Component:

Each afternoon from 2:00-5:00 PM, students will work on big data projects. For project work, students will be divided into groups and will be assigned to one faculty member leading a particular project area of their interest.

We have identified three project areas:

Genomics (Project leader: Gonçalo Abecasis)

Brain Imaging (Project leader: Timothy Johnson)

Electronic Health Records (Project leader: Yi Li)

Each student will work in a small research team on one project (in a given area) for 4 weeks. The program will conclude with a Student Research Symposium on June 25-26, 2015, showcasing the undergraduates' research projects through short talks and a poster session and featuring a talk by a nationally recognized leader in big data statistics. We will also organize social events and outings for students on weekends. The summer institute will provide undergraduate students with a unique opportunity to interact with faculty and graduate students across three disciplines: Biostatistics, Statistics, and Computer Science. The summer institute will be directed and coordinated by Bhramar Mukherjee.



The summer institute is currently funded through the generous support of the Department of Biostatistics at the University of Michigan School of Public Health, the Department of Statistics, and the University of Michigan Rackham Graduate School.


Advisory Committee:

The program will receive input and guidance from an internal advisory committee consisting of five faculty members:



Application opens February 1, 2015 for the summer 2015 program.

The application deadline is March 15, 2015.

Applications are accepted and evaluated on a rolling basis.

We will give priority to the following students:

  • Undergraduate students in their sophomore and junior years
  • Students who have demonstrated outstanding academic achievement
  • US citizens and permanent residents
  • Underrepresented and disadvantaged students and students with disabilities
  • Students with convincing personal statements of interest


Biostatistics Contact:

Bhramar Mukherjee
Professor of Biostatistics
Professor of Epidemiology
Associate Chair for Academic Affairs, Biostatistics
School of Public Health
University of Michigan
1415 Washington Heights, SPH II
Ann Arbor, MI 48109
Phone: (734) 764-6544
FAX: (734) 763-2215


Administrative Contact:

Sabrina Clayton
School of Public Health, University of Michigan
1415 Washington Heights, SPH II
Ann Arbor, MI 48109
Phone: (734) 764-5452