eagle-i Harvard University

Bulyk Laboratory

Location: Harvard Medical School, New Research Building, Room 466, 77 Avenue Louis Pasteur, Boston, MA 02115


The Bulyk Lab investigates transcriptional regulation. We are particularly interested in transcriptional enhancers and the interactions between sequence-specific transcription factors and their DNA binding sites. For these studies, we develop genomic, proteomic, and computational technologies and approaches and apply them to a wide variety of organisms, including the yeast S. cerevisiae, the fruit fly D. melanogaster, mouse, and human.





    • UniProbe Database ( Database )

      "The UniPROBE (Universal PBM Resource for Oligonucleotide Binding Evaluation) database hosts data generated by universal protein binding microarray (PBM) technology on the in vitro DNA binding specificities of proteins. This initial release of the UniPROBE database provides a centralized resource for accessing comprehensive data on the preferences of proteins for all possible sequence variants ('words') of length k ('k-mers'), as well as position weight matrix (PWM) and graphical sequence logo representations of the k-mer data. In total, the database currently hosts DNA binding data for 406 nonredundant proteins from a diverse collection of organisms, including the prokaryote Vibrio harveyi, the eukaryotic malarial parasite Plasmodium falciparum, the parasitic Apicomplexan Cryptosporidium parvum, the yeast Saccharomyces cerevisiae, the worm Caenorhabditis elegans, mouse, and human. The database's web tools (on the right) include a text-based search, a function for assessing motif similarity between user-entered data and database PWMs, and a function for locating putative binding sites along user-entered nucleotide sequences."

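      The two ideas in the description above, enumerating all k-mers and locating putative binding sites with a PWM, can be illustrated with a minimal sketch. This is not UniPROBE's implementation: the function names, the uniform 0.25 background, the log-odds scoring, and the toy 3-position matrix are all assumptions made for illustration.

```python
from itertools import product
import math


def all_kmers(k, alphabet="ACGT"):
    """Enumerate every possible DNA word of length k (4**k words in total)."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def scan_sequence(seq, pwm, background=0.25):
    """Score each window of seq against a PWM using a summed log-odds score.

    pwm is a list with one dict per motif position, mapping base -> probability.
    Returns a list of (start_index, score) tuples, one per window.
    """
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(
            math.log2(pwm[i][base] / background)
            for i, base in enumerate(window)
        )
        hits.append((start, score))
    return hits


# Hypothetical PWM, invented for this example (it prefers the word "AGC").
toy_pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]

# The highest-scoring window is the most likely putative binding site.
best = max(scan_sequence("TTAGCTT", toy_pwm), key=lambda hit: hit[1])
```

      In practice a score threshold, rather than the single best window, would usually decide which positions are reported as putative sites.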

    • Bayesian Hierarchical Model of Protein Binding Microarray (PBM) Data ( Algorithmic software suite )

      "The Bayesian Protein Binding Microarray (PBM) Analysis Suite provides the in-house tools and the procedural methods used in the background noise estimation and correction, transcription factor (TF) subclassification, and TF-common and TF-preferred k-mer identification based on universal protein binding microarrays (PBMs; see Berger et al., 2006) k-mer data."

    • Lever ( Algorithmic software component )

      "We developed an algorithm, called Lever, that systematically maps DNA regulatory motifs or motif combinations to the sets of genes that they likely regulate. Lever accomplishes this by assessing whether the motifs are enriched within cis regulatory modules (CRMs), predicted by our PhylCRM algorithm, in the noncoding sequences surrounding genes in a collection of gene sets."

    • Lever 2.0 ( Algorithmic software suite )

      The suite of computational tools described in this manual provides an in silico framework for mapping transcription factor binding site (TFBS) motifs to their likely target genes.

      There are three programs in this suite:
      1) PhylCRM_preprocess: Both PhylCRM and Lever require a collection of input files that parameterize various statistical properties of the motifs in use. This program generates these parameterization files. It can also generate length-matched background sequence sets for each of the foreground gene sets under consideration; these matched background sets can be used in Lever screens (see below).

      2) Lever: This program takes as input a collection of TFBS motifs, a collection of sequences and a set system (i.e., a collection of subsets) for these sequences. For each subset of sequences and each motif or combination of motifs, the program evaluates whether or not the sequences in the subset (i.e., a “foreground” set of sequences) show more enrichment for the motifs under consideration than a matched “background” set of sequences.
      Here, enrichment is determined by comparing the PhylCRM scores of the foreground regions against the PhylCRM scores of the background regions; thus, Lever extends the capabilities of PhylCRM by screening pairings of sequence sets and motifs.
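
      The screening idea described above, comparing foreground scores against matched background scores for every pairing of sequence set and motif, can be sketched as follows. The text does not state which statistic Lever uses, so this sketch substitutes a simple rank-based measure (the probability that a foreground score exceeds a background score). The function names and toy scores are hypothetical, and the per-sequence scores stand in for PhylCRM output.

```python
def enrichment_auc(foreground, background):
    """Probability that a random foreground score exceeds a random
    background score, counting ties as one half (an AUC-style statistic)."""
    wins = 0.0
    for f in foreground:
        for b in background:
            if f > b:
                wins += 1.0
            elif f == b:
                wins += 0.5
    return wins / (len(foreground) * len(background))


def screen(scores, gene_sets, backgrounds):
    """Evaluate every (gene set, motif) pairing.

    scores:      motif -> {sequence id -> score} (assumed precomputed)
    gene_sets:   set name -> list of foreground sequence ids
    backgrounds: set name -> list of matched background sequence ids
    """
    results = {}
    for set_name, members in gene_sets.items():
        for motif, per_sequence in scores.items():
            fg = [per_sequence[s] for s in members]
            bg = [per_sequence[s] for s in backgrounds[set_name]]
            results[(set_name, motif)] = enrichment_auc(fg, bg)
    return results


# Toy scores, invented for illustration only.
scores = {
    "motifA": {"g1": 5.0, "g2": 4.0, "g3": 1.0, "g4": 0.5},
    "motifB": {"g1": 1.0, "g2": 2.0, "g3": 2.0, "g4": 1.0},
}
gene_sets = {"muscle": ["g1", "g2"]}
backgrounds = {"muscle": ["g3", "g4"]}

results = screen(scores, gene_sets, backgrounds)
# motifA separates foreground from background (1.0); motifB does not (0.5).
```

      A value near 0.5 indicates no separation between foreground and background, while values near 1.0 indicate that the motif scores higher in the foreground set.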

    • MultiFinder ( Algorithmic software component )

      "In order to automate the motif searches, we developed a software package, termed MultiFinder, that performs automated motif searching using four different profile-based motif finders, including AlignACE, MDscan, BioProspector and MEME. We anticipated that using all four of these motif finders might allow the user to combine the strengths of their different algorithms.

      The integration of the results from multiple motif finding tools identifies, and ranks highly, more known and novel motifs than does the use of just one of these tools. In addition, we believe that our simultaneous enrichment strategies helped to identify likely human cis regulatory elements. A number of the discovered motifs may correspond to novel binding site motifs for as yet uncharacterized tissue-specific TFs. We expect this strategy to be useful for identifying motifs in other metazoan genomes."

    • PhylCRM ( Algorithmic software component )

      "We developed a new cis-regulatory module (CRM) prediction algorithm, called PhylCRM. PhylCRM combines data for individual motif occurrences scored on an alignment using the previously described MONKEY scoring scheme (Moses et al., Genome Biology 5, R98, 2004) into a single CRM prediction. PhylCRM can scan very long genomic sequences for candidate CRMs by quantifying both motif clustering and conservation across arbitrarily many genomes using an evolutionary model consistent with the phylogeny of the genomes."

    • Universal Protein Binding Microarray (PBM) Analysis Suite ( Algorithmic software suite )

      "The Universal Protein Binding Microarray (PBM) Analysis Suite provides the in-house tools and the procedural methods used in the analysis of universal protein binding microarrays (PBMs) synthesized by Agilent Technologies."


    Last updated: 2019-08-29T10:36:06.209-04:00

    Copyright © 2016 by the President and Fellows of Harvard College
    The eagle-i Consortium is supported by NIH Grant #5U24RR029825-02 / Copyright 2016