Main Page
Welcome to the U-M Big Data Summer Institute 2017 Wiki!
Consult the User's Guide for information on using the wiki software.
Contents
- 1 Reading Material
- 2 2017 Presentations
- 3 Symposium
- 4 Additional Resources
Reading Material
Data Mining / Machine Learning Group
EHR Group
Papers
- Bush et al. (2016) Unravelling the Human Genome
- Andreu-Perez et al. (2015) Big Data for Health
- Madigan et al. (2014) A Systematic Approach to Evaluating Evidence from Observational Studies
- Collins et al. (2015) A New Initiative on Precision Medicine
Genomics Group
Papers
Methods for genome-wide association studies (GWAS)
- Skol AD et al. (2006) "Joint analysis is more efficient than replication-based analysis for two-stage genome-wide association studies" Nat Genet
  - Useful to understand basic methods for GWAS and study design
- Willer CJ et al. (2010) "METAL: fast and efficient meta-analysis of genomewide association scans" Bioinformatics
  - Software tool for meta-analysis
DNA sequencing and De-novo assembly
- The 1000 Genomes Project Consortium (2010) A map of human genome variation from population-scale sequencing. Nature
  - First 1000 Genomes paper
- The 1000 Genomes Project Consortium (2015) A global reference for human genetic variation. Nature
  - Final release of the 1000 Genomes Project
- Iqbal Z et al (2012) De novo assembly and genotyping of variants using colored de Bruijn graphs. Nat Genet
  - Variant caller using de novo assembly graphs
- Li et al (2009) Fast and accurate short read alignment with Burrows-Wheeler transform.
  - Sequence alignment algorithm using BWT
Single Cell RNA Sequencing
- Macosko E et al (2015) Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets. Cell
  - Landmark paper for DropSeq method
- Zheng G et al (2017) Massively parallel digital transcriptional profiling of single cells. Nat Comm
  - Paper from 10x Genomics
- van der Maaten LJP and Hinton GE (2008) Visualizing Data using t-SNE. J Machine Learning Research
  - First paper of t-SNE method
Prediction of Gene Expression and/or Complex Phenotypes
- Gamazon et al (2015) A gene-based association method for mapping traits using reference transcriptome data. Nat Genet
  - PrediXcan paper for elastic net-based prediction of expression
- Lappalainen T et al (2013) Transcriptome and genome sequencing uncovers functional variation in humans. Nat Genet
  - Paper describing GEUVADIS data
- Yang J et al (2011) GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet
  - GCTA paper that includes the BLUP method
- Zhou X et al (2013) Polygenic modeling with Bayesian sparse linear mixed models
  - BSLMM method as a more accurate alternative to BLUP
Online videos to better understand genetics and genomics
Genetics
- Introduction to Genetics by 23andMe (5 videos)
- TED-Ed : How Mendel's pea plants helped us understand genetics - Hortensia Jiménez Díaz
- Genetic Recombination and Gene Mapping by Bozeman Science
- Useful Genetics : A college-level comprehensive genetics course with 292 lectures offered by Rosie Redfield at UBC
Useful 3D Animations
- From DNA to protein - 3D Animation
- DNA Transcription - 3D Animation
- DNA splicing - 3D Animation
- mRNA Translation - 3D Animation
- How DNA is packaged - 3D Animation
- The Central Dogma - 3D Animation
Gene Regulation and Epigenetics
- Epigenetics Lecture by SciShow
- Hi-C Technique : A 3D map of the Human Genome
- The ENCODE Project
- RNAi by Nature Video
Sequencing Technologies
- TED-Ed : The race to sequence the human genome - Tien Nguyen
- DropSeq - Droplet-based Single Cell Sequencing by McCarroll Lab
Imaging Group
2017 Presentations
Week 1
Day 1: June 6
- Orientation 2017 (Slides) - Bhramar Mukherjee, PhD
- Coordinator Presentation (Slides) - Mitch Sevigny
- Life in Ann Arbor (Slides) - Mitch Sevigny
- On Being a Scientist (Slides & Audio) - Bhramar Mukherjee, PhD
- Ethics Review (Slides) - Bhramar Mukherjee, PhD
- Basic Probability (Audio) - Robert Klemmer
Day 2: June 7
- Data Processing (Slides & Audio) - Jed Carlson
- Study Design and Inference (Slides & Audio) - Rod Little, PhD
- Basic Unix (Slides) - Hyun Min Kang, PhD
Day 3: June 8
- R 101 (Slides & Audio) - Matthew Flickinger, PhD
- Observational Data and Bias (Slides & Audio) - Rod Little, PhD
- Linear Algebra (Audio) - Robert Klemmer
Day 4: June 9
- R 102 (Slides & Audio) - Matthew Flickinger, PhD
- Matrix Computation (Slides & Audio) - Shawn Lee, PhD
- Sebastian Zoellner Journey (Slides) - Sebastian Zoellner, PhD
- R 103 (Slides & Audio) - Matthew Flickinger, PhD
Week 2
Day 5: June 12
- Python 101 (Slides & Audio) - Jonathon Stroud
- Parameter Estimation and Likelihood (Slides & Audio) - Rod Little, PhD
- EHR Project Description (Slides) - Phil Boonstra, PhD; Matt Zawistowski, PhD; Zhenke Wu, PhD
Day 6: June 13
- Python 102 (Audio) - Jonathon Stroud
- Linear Regression (Slides & Audio) - Matt Zawistowski, PhD
- Genomics Project Description (Slides) - Hyun Min Kang, PhD
Day 7: June 14
- Machine Learning 1 (Slides & Audio) - Hui Jiang, PhD
- Logistic Regression (Slides & Audio) - Matt Zawistowski, PhD
- Alfred Hero's Journey Lecture (Slides & Audio) - Alfred Hero, PhD
- Imaging Project Description (Slides) - Tim Johnson, PhD
- Neuroimaging Data Analysis (Slides) - Eunjee Lee, PhD
Day 8: June 15
- Machine Learning 2 (Slides & Audio) - Hui Jiang, PhD
- Reproducible Research (Slides & Audio) - Jed Carlson
- Data Mining / Machine Learning (Slides) - Johann Gagnon-Bartsch, PhD
Day 9: June 16
- Reading like a Scientific Writer (Slides & Audio) - Brett Griffiths, PhD
- Bhramar Mukherjee Journey Lecture (Slides & Audio) - Bhramar Mukherjee, PhD
- Jeremy Taylor Journey Lecture (Slides & Audio) - Jeremy Taylor, PhD
Week 3
Day 10: June 19
- Unsupervised Machine Learning 1 (Audio) - Jenna Wiens, PhD
- Causal Inference 1 (Audio) - Lu Wang, PhD
Day 11: June 20
- Unsupervised Machine Learning 2 (Audio) - Jenna Wiens, PhD
- Causal Inference 2 (Slides & Audio) - Lu Wang, PhD
Day 12: June 21
- Academic Presentations (Slides & Audio) - Sebastian Zoellner, PhD
- Programming Workshop (Slides & Audio) - Hyun Min Kang, PhD
- Goncalo Abecasis Journey (Slides & Audio) - Goncalo Abecasis, PhD
Day 13: June 22
- Network Models (Slides & Audio) - Zhenke Wu, PhD
- Distributed Computing (Slides & Audio) - Harsha Madhyastha, PhD
Day 14: June 23
- Preparing for Graduate School (Slides & Audio) - Kelley Kidwell, PhD
- Michael Boehnke Journey (Slides & Audio) - Michael Boehnke, PhD
Week 4
Day 15: June 26
- Visualization 1 (Audio) - Matthew Kay, PhD
- Visualization 2 (Audio) - Matthew Kay, PhD
Day 16: June 27
- Data Visualization in R (Slides & Audio) - Matthew Flickinger, PhD
- Introduction to Bayes (Slides & Audio) - Bhramar Mukherjee, PhD
Day 17: June 28
- Leveraging Skills and Deficits in Application Essays (Slides & Audio) - Brett Griffiths, PhD
- Programming Workshop (Audio) - Hyun Min Kang, PhD
Day 18: June 29
- Optimization (Slides & Audio) - Ambuj Tewari, PhD
- Bayes Computation (Slides & Audio) - Veronica Berrocal, PhD
Day 19: June 30
- Writing from Point A to Point D: Simple Strategies for Conveying Complex Ideas (Slides & Audio) - Brett Griffiths, PhD
Week 5
Day 20: July 3
- Bayes Computation 2 (Slides & Audio) - Jian Kang, PhD
- Data Mining 1 (Slides & Audio) - Kayvan Najarian, PhD
Day 22: July 5
- Large Scale Optimization (Slides & Audio) - Ambuj Tewari, PhD
- Data Mining 2 (Slides & Audio) - Kayvan Najarian, PhD
Day 23: July 6
- Python Workshop (Audio) - Arya Farahi
- Learning Health Systems (Audio) - Karandeep Singh, MD
Day 24: July 7
- CV / Resume Workshop (Slides & Audio) - Tara Allendorfer
- Brisa Sanchez Journey (Slides & Audio) - Brisa Sanchez, PhD
Week 6
Day 25: July 10
- Case Study: Estimating AutoAntibody Signatures to Detect Autoimmune Disease Patient Subsets (Slides & Audio) - Zhenke Wu, PhD
Day 26: July 11
- Case Study: Mixture Models for Sequence Contamination and Single Cell Transcriptions (Slides & Audio) - Hyun Min Kang, PhD
- Case Studies: "Posterior Mean Screening for Scalar-on-Image Regression"; "Bayesian Computation for Log Gaussian Cox Processes with Application to Neuroimaging" (Slides & Audio) - Jian Kang, PhD; Tim Johnson, PhD
Symposium
Student Group Presentations
Student Poster Presentations
- A Time-to-Event Analysis of Heart Failure via Electronic Health Records
- Melanoma Detection by Classifying Skin Lesion Images
- Classifying Skin Lesion Images Using Adaptive Boosting
- Machine Learning Classification of Skin Lesion Images
- Genomics: Genome Storage and Assembly
- Predicting the Transcriptome from the Genome
- Classification of Cell Types from Peripheral Mononuclear Blood Cells
- EHR-Based Study of Long-Term Infectious Diseases
- Visualizing Lab and Phenotype Associations Using PheWAS and Electronic Health Records
- Data Mining: Microenvironment Microarray Spot Based Approach for Cell Prediction
- Estimating Cell Growth with Machine Learning and Data Mining