eagle-i Harvard University

Harvard Chan Microbiome Analysis Core

Directors: Huttenhower, Curtis, Ph.D.; Wilkinson, Jeremy E., Ph.D.

Location: Biostatistics Department, Bldg SPH1, 4th Floor, Room 412A, Harvard T.H. Chan School of Public Health, 655 Huntington Ave, Boston, MA 02115


The Microbiome Analysis Core at the Harvard T.H. Chan School of Public Health (HMAC) provides end-to-end support for microbial community and human microbiome research, from experimental design through data generation, bioinformatics, and statistics. This includes general consulting, power calculations, selection of data generation options, and analysis of data from amplicon (16S/18S/ITS) sequencing, shotgun metagenomic sequencing, metatranscriptomics, metabolomics, and other molecular assays. The HMAC has extensive experience with microbiome profiles in diverse populations, including taxonomic and functional profiles from large cohorts, quantitative ecology, multi'omics and meta-analysis, and microbial systems and human epidemiological analysis. By integrating microbial community profiles with host clinical and environmental information, we enable researchers to interpret molecular activities of the microbiota and assess its impact on human health.






  • Data analysis and interpretation ( Data analysis service )

    Amplicon (16S rRNA / ITS) or shotgun (SG) metagenomic and metatranscriptomic sequencing data are passed through a quality control pipeline using the bioBakery workflows (http://huttenhower.sph.harvard.edu/biobakery_workflows).

    16S / ITS: The amplicon sequence data pipelines offer two approaches, USEARCH / VSEARCH and DADA2 (https://bitbucket.org/biobakery/biobakery_workflows/wiki/Home#!16s-rrna-16s), which identify operational taxonomic units (OTUs) and amplicon sequence variants (ASVs), respectively. These taxonomic profiles are then passed to PICRUSt (http://picrust.github.io/picrust/index.html), which infers the gene content of the taxa present and combines it with their abundances to predict the metagenome composition of the 16S-resolved community. PICRUSt-predicted metagenomes are amenable to much of the same downstream analysis as metagenomes obtained from shotgun sequencing data, but with taxonomic resolution limited by 16S. In tiered-design studies, MicroPita (http://huttenhower.sph.harvard.edu/micropita) takes results from 16S surveys as input to inform the selection of a sample subset for SG follow-up work, governed by user-specified features of interest (clinical/environmental metadata, diversity measures, etc.).

    SG: Microbiome composition (bacteria, archaea, viruses, and eukaryotic microbes) is gleaned from SG sequencing data using MetaPhlAn2 (http://huttenhower.sph.harvard.edu/metaphlan2), which resolves taxonomic diversity and abundance at the subspecies level.

    Metagenomes, both PICRUSt-predicted and SG-sequenced, can further be passed through the HUMAnN2 (http://huttenhower.sph.harvard.edu/humann2) pipeline. HUMAnN2 determines conservation and abundance of gene modules (sets of genes related by sequence and function) and biochemical pathways to reveal the metabolic potential of the microbial community.

    Data features derived with these algorithms, including gene/pathway presence and abundance, gene expression, microbiome composition, OTUs, ASVs, peptide identifications from metaproteomics, and compound tables from meta-metabolomics, can be integrated with clinical and environmental metadata using LEfSe (https://bitbucket.org/biobakery/biobakery/wiki/lefse) and MaAsLin2 (http://huttenhower.sph.harvard.edu/maaslin2), along with other packages for the R statistical software. LEfSe identifies the data features that differ between two classes of a metadata variable (e.g. two sampling sites, two clinical outcomes, two biochemical markers, two modalities, etc.). MaAsLin2 extends the functionality of LEfSe to identify associations between data features and multiple metadata factors, which can be discrete and/or continuous and can include time series data.
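
    To make the idea of associating a data feature with a continuous metadata factor concrete, the following is a minimal Python sketch. It is not MaAsLin2 (an R package) and does not use its API; the function names, the pseudo-count, and the toy data are all assumptions chosen for illustration. It scales each sample's counts to relative abundances, log-transforms them, and fits a one-variable least-squares line per feature. Real analyses would also handle multiple covariates, repeated measures, and multiple-testing correction.

```python
import math


def total_sum_scale(counts):
    """Convert one sample's raw counts to relative abundances."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts]


def ols_slope(x, y):
    """Return (slope, intercept) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("metadata variable has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def associate(count_table, metadata, pseudo=1e-6):
    """Per-feature slope of log relative abundance on one metadata variable.

    count_table: list of samples, each a list of per-feature counts.
    metadata:    one continuous value per sample (hypothetical covariate).
    pseudo:      small constant added before the log (an assumed choice).
    """
    rel = [total_sum_scale(sample) for sample in count_table]
    n_features = len(rel[0])
    slopes = []
    for j in range(n_features):
        y = [math.log(r[j] + pseudo) for r in rel]
        slope, _ = ols_slope(metadata, y)
        slopes.append(slope)
    return slopes


# Toy data: three samples, two features, one continuous covariate.
toy_counts = [[90, 10], [50, 50], [10, 90]]
toy_metadata = [1.0, 2.0, 3.0]
toy_slopes = associate(toy_counts, toy_metadata)
```

    In this toy table the first feature declines and the second rises as the covariate increases, so the first slope is negative and the second positive. The sign and size of each slope are only a starting point; a full analysis would attach a significance test to each association.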

  • Data handling ( Support service )

    For computing infrastructure, the Core uses the FAS Research Computing (FASRC) cluster.

  • Free consultation on study design and data analysis ( Support service )

  • Grant proposal review, drafting, and power analysis ( Support service )

  • Manuscript review and drafting of responses to reviewers ( Support service )

    Service models: fee-for-service ($150/hour)
    This rate supports advanced consultation, analysis, administrative tasks, FASRC compute cluster cycles, and data storage.



Last updated: 2020-03-12T15:14:41.714-04:00

Copyright © 2016 by the President and Fellows of Harvard College
The eagle-i Consortium is supported by NIH Grant #5U24RR029825-02 / Copyright 2016