eagle-i Harvard University

Forsyth Bioinformatics Core

Location: The Forsyth Institute, 140 The Fenway, Boston MA, 02115


The Forsyth Bioinformatics Core specializes in oral microbial genomics and microarray gene expression analyses, integrating computer science with molecular biology and genetics. In addition to supporting funded bioinformatics projects, the Bioinformatics Core will provide computational support to Forsyth and other CATALYST researchers for processing, analyzing, and interpreting biological data.



      Member: Chen, Tsute, Ph.D.
      Role: Instructor in Oral Medicine, Infection, and Immunity
      Phone: (617) 892-8359



    • Genomic Signature Database ( Database )

      "Genomic Signature Database (GSD) contains a series of databases that analyze and compare genomic sequences based on the 'signature' of the sequences instead of on sequence homology. Codon usage and oligonucleotide frequencies can both be considered variants of sequence signatures. Signature-based analysis and comparison are therefore not limited by sequence length or by whether the sequences come from homologous sources.

      Below is a list of the signature-based sequence databases that are available at GSD:

      * Microbial Genome Codon Usage Database (MGCUD)
      MGCUD provides interactive tools for comparing the 'genome-wide' codon usage patterns of all the microbial genomes available at NCBI. It also allows comparing the codon usage patterns of user-uploaded sequences with those of selected genomes.

      * Non-redundant Codon Usage Database (NRCUD)
      NRCUD provides interactive tools for studying the 'organism-wide' and 'non-redundant' codon usage patterns calculated from all the protein-coding sequences of an organism (taxon) that are available in the NCBI RefSeq database. It also allows comparing the codon usage patterns of user-uploaded sequences with those of selected organisms.

      * Non-redundant Nucleotide Usage Database (NUGET)
      NUGET is a comparative genomics tool that provides clustering software for comparing the usage frequencies of oligonucleotide sequences among selected genomes. Frequencies of words up to four letters long (up to tetranucleotides) are calculated from the non-redundant genomic DNA sequences available in the NCBI RefSeq database.

      * Genomic Tetranucleotide Composition Analysis (GTCA)
      GTCA studies the variation of tetranucleotide composition in microbial genomes.

      * A Sequence Signature Search Tool (ASSIST)
      ASSIST is a sequence signature search tool that finds the closest matches to a query DNA sequence among the database sequences by calculating their tetranucleotide profile similarities on the fly. ASSIST can analyze an unknown sequence provided by users and predict its most likely taxonomic origins. ASSIST is a sequence data-mining tool that complements homology-based search methods such as BLAST, especially when no matches can be found by sequence homology."
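
      The signature idea behind these databases can be illustrated with a small sketch. It assumes plain A/C/G/T strings, computes codon usage for an in-frame coding sequence and overlapping tetranucleotide frequencies for any sequence, and ranks database sequences by the Pearson correlation of their 256-element tetranucleotide profiles, in the spirit of ASSIST. The function names and the use of Pearson correlation as the similarity measure are assumptions for illustration; the source does not state which measure GSD or ASSIST actually uses.

```python
from collections import Counter
from itertools import product
import math

BASES = "ACGT"
# All 256 tetranucleotides in a fixed order, so every profile is a comparable vector.
TETRAMERS = ["".join(p) for p in product(BASES, repeat=4)]


def codon_usage(cds: str) -> dict:
    """Relative frequency of each codon in an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    codons = [c for c in codons if set(c) <= set(BASES)]  # skip ambiguous codons
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


def tetranucleotide_profile(seq: str) -> list:
    """Frequencies of the 256 overlapping 4-mers; windows with other symbols are ignored."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in TETRAMERS)
    return [counts[k] / total if total else 0.0 for k in TETRAMERS]


def pearson(x: list, y: list) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def closest_matches(query: str, database: dict, top: int = 3) -> list:
    """Rank database sequences by tetranucleotide-profile similarity to the query."""
    q = tetranucleotide_profile(query)
    scored = [(name, pearson(q, tetranucleotide_profile(seq)))
              for name, seq in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top]
```

      Because the profile is a fixed-length vector regardless of sequence length, a short contig can be compared with whole genomes, which is the property the GSD description emphasizes.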

    • Human Oral Microbiome Database ( Database )

      The goal of creating the Human Oral Microbiome Database (HOMD) is to provide the scientific community with comprehensive information on the approximately 600 prokaryote species that are present in the human oral cavity. The majority of these species are uncultivated and unnamed, recognized primarily by their 16S rRNA sequences. The HOMD presents a provisional naming scheme for the currently unnamed species so that strain, clone, and probe data from any laboratory can be directly linked to a stably named reference entity. The HOMD links sequence data with phenotypic, phylogenetic, clinical, and bibliographic information.

      This project is supported by contract DE016937 from the National Institute of Dental and Craniofacial Research.
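
      The benefit of linking data to a stable reference entity can be shown with a minimal data model. Everything in it (the `ReferenceTaxon` and `TaxonRegistry` classes, their fields, and the identifiers used) is hypothetical and is not the HOMD schema; it only shows why strain, clone, and probe records attached to a fixed identifier remain valid when a provisionally named species later receives a formal name.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ReferenceTaxon:
    """A stably identified reference entity (hypothetical model, not the HOMD schema)."""
    taxon_id: str                  # permanent identifier; never reused or changed
    provisional_name: str          # placeholder label for an unnamed species
    reference_16s: str             # representative 16S rRNA sequence
    formal_name: Optional[str] = None
    # (laboratory, record type, record id) tuples for strains, clones, and probes
    links: List[Tuple[str, str, str]] = field(default_factory=list)

    @property
    def display_name(self) -> str:
        return self.formal_name or self.provisional_name


class TaxonRegistry:
    """Maps stable identifiers to reference taxa so external data can link to them."""

    def __init__(self) -> None:
        self._taxa = {}

    def add(self, taxon: ReferenceTaxon) -> None:
        if taxon.taxon_id in self._taxa:
            raise ValueError(f"duplicate taxon id {taxon.taxon_id}")
        self._taxa[taxon.taxon_id] = taxon

    def link(self, taxon_id: str, lab: str, kind: str, record_id: str) -> None:
        if kind not in ("strain", "clone", "probe"):
            raise ValueError(f"unsupported record type {kind}")
        self._taxa[taxon_id].links.append((lab, kind, record_id))

    def assign_formal_name(self, taxon_id: str, name: str) -> None:
        # Only the label changes; the identifier, and every link to it, stays valid.
        self._taxa[taxon_id].formal_name = name

    def get(self, taxon_id: str) -> ReferenceTaxon:
        return self._taxa[taxon_id]
```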

    • Microbial Transcriptome Database ( Database )

      The goal of the Microbial Transcriptome Database (MTD) is to provide tools and information for studying microbial transcriptome profiles, in particular transcriptome data derived from hybridization experiments using high-density genomic tiling microarrays. Currently MTD provides a comprehensive and dynamic probe design pipeline for designing genomic tiling array probe sets for all microbial genomic sequences. Probe sets for many microbial genomes can be downloaded, and more genomes are being added to the probe design pipeline. MTD also accepts requests for custom probe design with user-specified genomes and array platforms, and provides online tools and interfaces specifically designed for analyzing transcriptome data.
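
      Genomic tiling probe design can be sketched as sliding a fixed-length window along a genome and taking each window on both strands. The probe length, step size, and GC-content range below are assumed parameters, not MTD's actual settings, and the hypothetical `tile_probes` function omits checks a production pipeline would likely need, such as probe uniqueness across the genome, melting temperature, and self-complementarity.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T/N sequence."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def gc_fraction(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)


def tile_probes(genome: str, probe_len: int = 50, step: int = 25,
                gc_range: tuple = (0.30, 0.70)) -> list:
    """Return (strand, start, sequence) for overlapping probes on both strands.

    start is the 0-based position of the probe's window on the forward strand;
    the "-" probe is the reverse complement of the same window.
    """
    genome = genome.upper()
    probes = []
    for start in range(0, len(genome) - probe_len + 1, step):
        window = genome[start:start + probe_len]
        if "N" in window:
            continue  # skip windows with ambiguous bases
        if not gc_range[0] <= gc_fraction(window) <= gc_range[1]:
            continue  # GC content (identical on both strands) outside the allowed range
        probes.append(("+", start, window))
        probes.append(("-", start, reverse_complement(window)))
    return probes
```

      A step smaller than the probe length makes neighbouring probes overlap, which is what gives a tiling array continuous coverage of both strands.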


    • Batch BLAST service ( Material analysis service )

      Processing, analysis, and interpretation of biological data using batch BLAST searches.
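
      As one example of post-processing batch BLAST results, the sketch below assumes the searches were run with NCBI BLAST+ using tabular output (for instance `blastn -query queries.fasta -db mydb -outfmt 6 -out hits.tsv`, where `queries.fasta`, `mydb`, and `hits.tsv` are placeholder names) and keeps the highest-scoring hit per query under an assumed E-value cutoff. The column order is the default `-outfmt 6` layout; the `best_hits` helper is hypothetical and is not part of the core's service.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_FIELDS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_FIELDS = {"pident", "evalue", "bitscore"}


def parse_row(row: list) -> dict:
    """Turn one tab-separated row into a dict with numeric fields converted."""
    hit = dict(zip(FIELDS, row))
    for name in INT_FIELDS:
        hit[name] = int(hit[name])
    for name in FLOAT_FIELDS:
        hit[name] = float(hit[name])
    return hit


def best_hits(tabular_text: str, max_evalue: float = 1e-5) -> dict:
    """Keep the highest-bitscore hit per query among hits with evalue <= max_evalue."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        hit = parse_row(row)
        if hit["evalue"] > max_evalue:
            continue
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best
```

      Usage might look like `with open("hits.tsv") as fh: hits = best_hits(fh.read())`; queries with no hit passing the cutoff are simply absent from the result.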

    • Bioinformatics data analysis service ( Data analysis service )

      Consultation on analysis of bioinformatics data.

    • Custom microbial genome annotation service ( Material production service )

      We offer custom microbial genome annotation services for full, partial or survey genomic sequences.

    • GCG / EMBOSS molecular biology software package service ( Access service )

      Software package access service to the following packages:

      * Accelrys GCG via W2H web interface

      What is Accelrys GCG Package?
      The Accelrys GCG Package was originally called the "GCG Package" or the "Wisconsin Package". It is an integrated package featuring a comprehensive collection of DNA, RNA, and protein sequence analysis tools.

      * EMBOSS via W2H web interface
      * EMBOSS via wEMBOSS web interface

      What is EMBOSS?
      EMBOSS (The European Molecular Biology Open Software Suite) is a package of high-quality, free, open-source software for molecular sequence analysis.

    • Microarray data analysis service ( Data analysis service )

      Processing, analysis, and interpretation of biological data obtained from microarray experiments.

    • Microbial genomic tiling array design service ( Material production service )

      High-density microbial genomic tiling array design for transcriptome studies.

    • Proteomic data analysis service ( Data analysis service )

      Processing, analysis, and interpretation of biological data obtained from proteomic experiments.


    • Bioinformatics Resource for Oral Pathogens ( Software )

      The complete genomic sequences of several oral pathogens have been determined, and multiple sources of independently annotated data are available for the same genome. The different gene identification and functional annotation methods used by these databases make it challenging to use the data efficiently.

      The Bioinformatics Resource for Oral Pathogens (BROP) aims to integrate bioinformatics data from multiple sources for easy comparison, analysis and data-mining through specially designed software interfaces.

      Important features of BROP include:

      * a graphical genome viewer (GenomeView) that allows side-by-side comparison of differently annotated datasets of the same genome
      * a pipeline of automatic data-mining algorithms that keeps the genome annotation up to date
      * comparative genomic tools such as Genomewide ORF Alignment (GOAL)
      * the Oral Pathogen Microarray Database (OPMD)

      The data models and tools provided by BROP are suitable not only for oral pathogens but for genomic data of all kinds. Thus, in addition to the genomes of oral pathogens, BROP is also a good tool for studying other microbial genomes, such as the NCBI microbial genomes.
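
      Side-by-side comparison of two annotations of the same genome can be sketched by pairing genes that end at the same stop position on the same strand and then flagging disagreements at the start-codon end. This stop-position matching rule and the `Gene`/`compare_annotations` names are assumptions for illustration; the source does not describe how GenomeView or GOAL pair genes. Coordinates are assumed to be 1-based and inclusive, with `start <= end` on both strands.

```python
from typing import Dict, List, NamedTuple


class Gene(NamedTuple):
    gene_id: str
    start: int   # 1-based, inclusive; start <= end on both strands
    end: int
    strand: str  # "+" or "-"

    @property
    def stop_pos(self) -> int:
        """Coordinate at the stop-codon end of the gene."""
        return self.end if self.strand == "+" else self.start

    @property
    def start_pos(self) -> int:
        """Coordinate at the start-codon end of the gene."""
        return self.start if self.strand == "+" else self.end


def compare_annotations(a: List[Gene], b: List[Gene]) -> Dict[str, list]:
    """Pair genes from two annotations that share a strand and stop position."""
    b_by_stop = {(g.strand, g.stop_pos): g for g in b}
    result = {"same": [], "different_start": [], "only_a": [], "only_b": []}
    matched_b = set()
    for g in a:
        h = b_by_stop.get((g.strand, g.stop_pos))
        if h is None:
            result["only_a"].append(g.gene_id)
            continue
        matched_b.add(h.gene_id)
        key = "same" if g.start_pos == h.start_pos else "different_start"
        result[key].append((g.gene_id, h.gene_id))
    result["only_b"] = [g.gene_id for g in b if g.gene_id not in matched_b]
    return result
```

      Matching on the stop position tolerates the common case where two annotation pipelines call the same gene but choose different start codons.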

    • HM-SVM Algorithm ( Algorithmic software component )

      "RNA expression signals detected by high-density genomic tiling microarrays contain comprehensive transcriptome information about the target organism. Current methods for determining RNA transcription units are still computationally intensive and lack discriminative power. HM-SVM is an efficient and accurate method for analyzing transcriptome profiles."
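
      HM-SVM itself is not reproduced here. As a simpler stand-in, the sketch below segments probe-level tiling signals (ordered by genome position) with a plain two-state hidden Markov model: Gaussian emissions for background (state 0) and transcribed (state 1) probes, and Viterbi decoding of the most likely state path. Contiguous runs of state 1 are then reported as candidate transcription units. The emission means, standard deviation, and stay probability are assumed values, not parameters from HM-SVM.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_logpdf(x: float, mean: float, sd: float) -> float:
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi_two_state(signal: Sequence[float], means=(0.0, 2.0), sd=0.5,
                      p_stay=0.95) -> List[int]:
    """Most likely state path: 0 = background, 1 = transcribed (assumed model)."""
    if not signal:
        return []
    log_trans = [[math.log(p_stay if i == j else 1 - p_stay) for j in range(2)]
                 for i in range(2)]
    # Equal prior probability of starting in either state.
    score = [math.log(0.5) + gaussian_logpdf(signal[0], means[s], sd) for s in range(2)]
    backpointers = []
    for x in signal[1:]:
        pointers, new_score = [], []
        for s in range(2):
            prev = max(range(2), key=lambda r: score[r] + log_trans[r][s])
            pointers.append(prev)
            new_score.append(score[prev] + log_trans[prev][s]
                             + gaussian_logpdf(x, means[s], sd))
        backpointers.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(range(2), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def transcribed_segments(path: List[int]) -> List[Tuple[int, int]]:
    """Inclusive (first, last) probe indices of each run of state 1."""
    segments, begin = [], None
    for i, state in enumerate(path):
        if state == 1 and begin is None:
            begin = i
        elif state == 0 and begin is not None:
            segments.append((begin, i - 1))
            begin = None
    if begin is not None:
        segments.append((begin, len(path) - 1))
    return segments
```

      The stay probability acts as a smoothing penalty: a single noisy probe is not enough to open or close a transcription unit unless its signal outweighs the cost of two state switches.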

    • Human Oral Microbe Identification Microarray Data Analysis ( Software )

      The HOMIM Online Tools allow customers to analyze the microarray data derived from the Human Oral Microbe Identification Microarrays (HOMIM). The tools are specially designed for the following tasks:
      1. Array data upload
      2. Single array data processing and visualization
      3. Multi-array (multi-sample) comparison, including normalization, cluster analysis, and profile visualization
      4. Comparison to the HOMIM profile database (including various disease, health, and specific profiles)
      5. Data management (organizing, sharing and storage of user data)
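
      Multi-array comparison (task 3) can be sketched as normalizing each array to relative signal and then clustering the samples. The sketch assumes non-negative probe intensities in the same probe order on every array, uses Bray-Curtis dissimilarity with average-linkage agglomerative clustering, and stops at a chosen number of clusters; these choices and the function names are assumptions, not the documented methods of the HOMIM Online Tools.

```python
from typing import Dict, List


def normalize(signals: List[float]) -> List[float]:
    """Scale one array's probe intensities to relative signal (sums to 1)."""
    total = sum(signals)
    if total <= 0:
        raise ValueError("array has no positive signal")
    return [s / total for s in signals]


def bray_curtis(x: List[float], y: List[float]) -> float:
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 for no overlap."""
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))


def cluster_arrays(samples: Dict[str, List[float]], n_clusters: int) -> List[List[str]]:
    """Average-linkage agglomerative clustering of normalized array profiles."""
    profiles = {name: normalize(values) for name, values in samples.items()}
    clusters = [[name] for name in profiles]

    def linkage(c1: List[str], c2: List[str]) -> float:
        # Mean pairwise dissimilarity between members of the two clusters.
        pairs = [bray_curtis(profiles[a], profiles[b]) for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters.
        _, i, j = min((linkage(clusters[p], clusters[q]), p, q)
                      for p in range(len(clusters))
                      for q in range(p + 1, len(clusters)))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters
```

      Normalizing first means that arrays with different overall brightness but the same relative composition end up close together.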

    • Significance Analysis for Oral Pathogen Microarray Data ( Software )

      "SAOPMD is an easy-to-use online microarray data analysis tool based on the robust LIMMA (Linear Models for Microarray Data) package for statistical inference. SAOPMD combines all the repeats within and between arrays and automatically generates diagnostic plots and statistics."
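
      The variance-shrinkage idea behind LIMMA can be sketched for a single gene. Replicate spots within each array are first averaged (a simplification; LIMMA can instead model within-array duplicates directly), and the per-array log-ratios are then tested against zero with a moderated t-statistic, in which the gene's sample variance s^2 (with d = n - 1 degrees of freedom) is shrunk toward a prior variance s0^2 carrying d0 prior degrees of freedom. LIMMA estimates d0 and s0^2 from all genes by empirical Bayes; here they are fixed, assumed values, so this illustrates the formula and does not replace the package.

```python
import math
from statistics import mean, variance
from typing import List, Tuple


def combine_spots(arrays: List[List[float]]) -> List[float]:
    """Average replicate spots of one gene within each array (one value per array)."""
    return [mean(spots) for spots in arrays]


def moderated_t(values: List[float], d0: float = 4.0,
                s0_sq: float = 0.05) -> Tuple[float, float]:
    """One-sample moderated t-statistic for H0: mean log-ratio = 0.

    s_tilde^2 = (d0 * s0_sq + d * s^2) / (d0 + d), with d = n - 1.
    d0 and s0_sq are fixed assumptions here; LIMMA estimates them from all genes.
    Returns (t, degrees of freedom).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two arrays")
    d = n - 1
    s_tilde_sq = (d0 * s0_sq + d * variance(values)) / (d0 + d)
    return mean(values) / math.sqrt(s_tilde_sq / n), d0 + d
```

      With d0 = 0 the statistic reduces to the ordinary one-sample t-test; larger d0 pulls unstable per-gene variances toward the prior, which matters most when only a few arrays are available.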


    Last updated: 2020-01-06T13:46:03.490-05:00

    Copyright © 2016 by the President and Fellows of Harvard College
    The eagle-i Consortium is supported by NIH Grant #5U24RR029825-02 / Copyright 2016