eagle-i Harvard University

Research Computing Core (FAS)

Director: Yockel, Scott

Location: 38 Oxford St, Cambridge, MA 02138

Summary:

Research Computing (RC) facilitates the advancement of complex research by providing leading edge computing services across the Faculty of Arts & Sciences (FAS). RC staff maintain expertise in constantly changing computing technologies, while 'speaking the language' of the FAS researchers, to help them use computing more effectively.

Affiliations:

People:

Resources:

Instruments

  • Odyssey ( Computer cluster )

    Linux cluster with over 82,000 cores and many biological and other software packages available (BayesPhylogenies, BLAST and wu-blast, Cladescan, MrBayes, Matlab, Mathematica, RAxML).

Services

  • Consulting: Bioinformatics, Scientific Computing, Hardware and Software Purchasing ( Support service )

    Consulting
    Research Computing’s staff has in-depth knowledge of a variety of scientific and technical disciplines, including:
    * Bioinformatics – Research Computing’s bioinformatics core helps researchers conduct large-scale sequence analysis on the Odyssey cluster. The core provides high-level biostatistics support, helping researchers determine the significance of results from high-throughput microarray experiments, and assists in interpreting the results of high-throughput sequencing and in annotating genomes.
    * Scientific Computing – As data sets from instruments and computational analysis keep growing, Research Computing helps researchers mine and transform that data to draw scientific conclusions. In a few hours, or sometimes just a few minutes, a programmer can write a script that saves a researcher months of work, leaving more time for other scientific pursuits.
    * Hardware and Software Purchasing – Research Computing can help faculty and other research staff purchase hardware and software that fit special computing needs. From storage equipment purchases to licenses for software packages, Research Computing can help facilitate the transaction with the vendor and assist with installation and setup.

  • High Performance Computing ( Access service )

    Research Computing provides faculty and researchers with the tools they need to take on large-scale computing challenges. Odyssey, Harvard’s largest supercomputer, offers users over 45 Petabytes of raw storage, more than 70,000 processing cores, and numerous software modules and applications. Research Computing can also host and create scientific applications not already on the Odyssey system.

  • Multi-terabyte data storage ( Data storage service )

    Multi-terabyte storage for labs at competitive rates, with different levels of access speed, stability, and backup capability.

  • Programming and sequencing classes and training ( Training service )

    Occasional classes on Linux, programming, Matlab, and next-generation sequencing.

Software

  • abaqus ( Software )

    Abaqus is finite element analysis and engineering software.

  • ABySS ( Software )

    "We have Assembly By Short Sequences - a de novo, parallel, paired-end sequence assembler installed on Odyssey. The version on Odyssey is 1.1.2."

  • ADOL-C (Automatic Differentiation by OverLoading in C++) ( Software )

    The package ADOL-C (Automatic Differentiation by OverLoading in C++) facilitates the evaluation of first and higher derivatives of vector functions that are defined by computer programs written in C or C++.
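
    The idea of differentiating a program by operator overloading can be sketched in a few lines. The example below is in Python, not C++, and does not use ADOL-C or any of its interfaces; the names `Dual`, `sin`, and `derivative` are hypothetical and chosen for illustration. It only covers first derivatives of scalar functions built from addition, multiplication, and sine.

```python
import math


class Dual:
    """A value paired with its derivative; overloaded operators update both."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(x):
        # Treat plain numbers as constants (derivative 0).
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def sin(x):
    """Sine that propagates the derivative via the chain rule."""
    x = Dual.lift(x)
    return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)


def derivative(f, x):
    """Evaluate f'(x) by seeding the input with derivative 1."""
    return f(Dual(x, 1.0)).deriv


def f(x):
    # Ordinary-looking code; the overloads do the differentiation.
    return x * x * x + 2 * x + sin(x)
```

    Calling `derivative(f, 2.0)` evaluates 3x² + 2 + cos(x) at x = 2 without any symbolic manipulation of `f`.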

  • afni (analysis of functional neuroimages) ( Algorithmic software suite )

    AFNI is a suite of programs for looking at and analyzing 3D brain images. The emphasis is on FMRI, but AFNI can be used for other purposes as well.

  • allpaths-lg ( Software )

    ALLPATHS-LG is a short read assembler and it works on both small and large (mammalian size) genomes.

  • Anaconda and Anaconda 3 ( Algorithmic software suite )

    A completely free enterprise-ready Python 2.x and Python 3.x distribution for large-scale data processing, predictive analytics, and scientific computing, from Continuum Analytics.

  • ANGSD ( Software )

    ANGSD is software for analyzing next generation sequencing data.

  • ANTLR ( Software )

    ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It's widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build and walk parse trees.

  • ANTs ( Software )

    Advanced Normalization Tools (ANTs)

  • Armadillo ( Software )

    Armadillo is a high quality C++ linear algebra library, aiming towards a good balance between speed and ease of use; the syntax (API) is deliberately similar to Matlab.

  • ARPACK ( Software )

    ARPACK is a collection of Fortran77 subroutines designed to solve large scale eigenvalue problems.

  • ATAC-seq ( Software )

    Some helper scripts for ATAC-seq analysis with NGmerge

  • ATLAS (Automatically Tuned Linear Algebra Software) ( Software )

    The ATLAS (Automatically Tuned Linear Algebra Software) project is an ongoing research effort focusing on applying empirical techniques in order to provide portable performance. At present, it provides C and Fortran77 interfaces to a portably efficient BLAS implementation, as well as a few routines from LAPACK.

  • BaitFisher-package ( Software )

    The BaitFisher-package is a software package for designing hybrid enrichment probes.

  • BamTools ( Software )

    BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files

  • BamUtil ( Software )

    BamUtil provides some programs for working on SAM/BAM files.

  • Basic Local Alignment Search Tool (BLAST) ( Software )

    Basic Local Alignment Search Tool (BLAST) (1, 2) is the tool most frequently used for calculating sequence similarity. BLAST comes in variations for use with different query sequences against different databases. This module is an alias for ncbi-blast

  • BayesPhylogenies ( Software )

    "BayesPhylogenies is a general package for inferring phylogenetic trees using Bayesian Markov Chain Monte Carlo (MCMC) or Metropolis-coupled Markov chain Monte Carlo (MCMCMC) methods. "

  • bazel ( Software )

    Correct, reproducible, fast builds for everyone

  • bcftools ( Software )

    Utilities for variant calling and manipulating VCFs and BCFs.

  • bcl2fastq2 ( Software )

    bcl2fastq2 combines BCL files from an Illumina NextSeq run and converts them into FASTQ files. At the same time as converting, bcl2fastq2 separates reads from multiplexed samples (demultiplexing). The multiplexed reads are assigned to samples based on a user-generated sample sheet, and are written to corresponding FASTQ files.
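
    The demultiplexing step can be pictured with a small sketch. It assumes reads have already been decoded into (barcode, sequence) pairs and that the sample sheet is a simple barcode-to-sample mapping; the function name `demultiplex` and this data layout are hypothetical and do not reflect bcl2fastq2's actual file formats or options. Barcodes are matched exactly here.

```python
def demultiplex(reads, sample_sheet):
    """Group reads by sample according to their barcode.

    reads: iterable of (barcode, sequence) pairs.
    sample_sheet: dict mapping barcode -> sample name.
    Returns (dict of sample name -> list of sequences, list of unassigned sequences).
    """
    by_sample = {name: [] for name in sample_sheet.values()}
    undetermined = []
    for barcode, sequence in reads:
        name = sample_sheet.get(barcode)
        if name is None:
            # Barcode not listed in the sample sheet.
            undetermined.append(sequence)
        else:
            by_sample[name].append(sequence)
    return by_sample, undetermined
```

    Each list in the returned dictionary corresponds to what would be written to one per-sample output file.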

  • BEAGLE ( Software )

    BEAGLE is a high-performance library that can perform the core calculations at the heart of most Bayesian and Maximum Likelihood phylogenetics packages. It can make use of highly-parallel processors such as those in graphics cards (GPUs) found in many PCs. The aim is to provide high performance evaluation 'services' to a wide range of phylogenetic software, both Bayesian samplers and Maximum Likelihood optimizers. This allows these packages to make use of implementations that make use of optimized hardware such as graphics processing units.

  • BEAST ( Algorithmic software suite )

    BEAST 2 is a cross-platform program for Bayesian MCMC analysis of molecular sequences. It is entirely orientated towards rooted, time-measured phylogenies inferred using strict or relaxed molecular clock models. It can be used as a method of reconstructing phylogenies but is also a framework for testing evolutionary hypotheses without conditioning on a single tree topology. BEAST 2 uses MCMC to average over tree space, so that each tree is weighted proportional to its posterior probability. BEAST 2 includes a graphical user-interface for setting up standard analyses and a suite of programs for analysing the results.
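
    The weighting of trees by posterior probability can be illustrated with a minimal Metropolis sampler. This is a sketch only: the three tree labels and their unnormalized weights are invented for illustration, the function name `metropolis` is hypothetical, and nothing here uses BEAST 2 or its models. A real analysis proposes changes to tree topology, branch lengths, and model parameters, not picks from a fixed list.

```python
import random

# Hypothetical unnormalized posterior weights for three candidate topologies.
EXAMPLE_WEIGHTS = {"((A,B),C)": 6.0, "((A,C),B)": 3.0, "((B,C),A)": 1.0}


def metropolis(weights, steps, seed=0):
    """Estimate normalized posterior probabilities from unnormalized weights.

    weights: dict mapping state -> positive unnormalized weight.
    Returns a dict mapping state -> fraction of steps spent in that state.
    """
    rng = random.Random(seed)
    states = list(weights)
    current = states[0]
    counts = dict.fromkeys(states, 0)
    for _ in range(steps):
        # Symmetric proposal: pick any state uniformly at random.
        proposal = rng.choice(states)
        # Accept with probability min(1, w(proposal) / w(current)).
        if rng.random() < min(1.0, weights[proposal] / weights[current]):
            current = proposal
        counts[current] += 1
    return {state: counts[state] / steps for state in states}
```

    With enough steps, `metropolis(EXAMPLE_WEIGHTS, 200000)` should spend time in each state roughly in proportion to its weight, even though the sampler never computes the normalizing constant.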

  • BEDOPS ( Software )

    BEDOPS is an open-source command-line toolkit that performs highly efficient and scalable Boolean and other set operations, statistical calculations, archiving, conversion and other management of genomic data of arbitrary scale.

  • bedtools2 ( Algorithmic software suite )

    Collectively, the bedtools utilities are a swiss-army knife of tools for a wide-range of genomics analysis tasks. The most widely-used tools enable genome arithmetic: that is, set theory on the genome. For example, bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely-used genomic file formats such as BAM, BED, GFF/GTF, VCF.
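
    To make "set theory on the genome" concrete, the sketch below intersects and merges intervals on a single chromosome. It is plain Python and does not call bedtools; the function names `intersect` and `merge` are hypothetical, intervals are assumed to be half-open (start, end) tuples, and `intersect` assumes each input list is sorted and free of internal overlaps.

```python
def intersect(a, b):
    """Return the overlapping regions of two sorted, non-overlapping interval lists."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            result.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


def merge(intervals):
    """Combine overlapping or touching intervals into single intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

    For example, `intersect([(0, 10), (20, 30)], [(5, 25)])` yields the two regions shared by both inputs, and `merge` can be used first to put an unsorted, overlapping list into the form `intersect` expects.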

  • bioawk ( Software )

    awk modified for biological data

  • bismark ( Software )

    Bismark is a program to map bisulfite treated sequencing reads to a genome of interest and perform methylation calls in a single step.

  • blat ( Software )

    UCSC tools

  • bmtools ( Algorithmic software suite )

    Set of tools for removing human contaminants from other DNA samples

  • Boost ( Algorithmic software suite )

    Boost is a set of libraries for the C++ programming language that provide support for tasks and structures such as linear algebra, pseudorandom number generation, multithreading, image processing, regular expressions, and unit testing. It contains over eighty individual libraries.

  • Bowtie and Bowtie2 ( Software )

    Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end).
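
    The Burrows-Wheeler transform underlying such an index can be shown with a naive sketch. This is not Bowtie's implementation: it sorts all rotations explicitly, which is far too slow and memory-hungry for a genome, and the function names `bwt` and `inverse_bwt` are hypothetical. It assumes the input text does not contain the `$` end marker.

```python
def bwt(text):
    """Return the Burrows-Wheeler transform of text, using '$' as the end marker."""
    text = text + "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column):
    """Recover the original text from its Burrows-Wheeler transform."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        # Prepend the last column and re-sort; after n rounds the rows are the rotations.
        table = sorted(last_column[i] + table[i] for i in range(len(last_column)))
    original = next(row for row in table if row.endswith("$"))
    return original[:-1]
```

    The transform is reversible and tends to group identical characters together, which is what makes a compact, searchable index possible.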

  • bpp ( Software )

    BPP is a Bayesian Markov chain Monte Carlo (MCMC) program for analyzing DNA sequence alignments from multiple loci and multiple closely-related species under the multispecies coalescent (MSC) model.

  • bpp-core, bpp-phyl, bpp-popgen, bpp-seq ( Software )

    Bio++ is a set of C++ libraries for Bioinformatics, including sequence analysis, phylogenetics, molecular evolution and population genetics.

  • BWA ( Algorithmic software suite )

    BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. The first algorithm is designed for Illumina sequence reads up to 100bp, while the other two are for longer sequences ranging from 70bp to 1Mbp. BWA-MEM and BWA-SW share similar features such as long-read support and split alignment, but BWA-MEM, which is the latest, is generally recommended for high-quality queries as it is faster and more accurate. BWA-MEM also has better performance than BWA-backtrack for 70-100bp Illumina reads.

  • bzip2 ( Software )

    bzip2 is a freely available, patent-free, high-quality data compressor. It typically compresses files to within 10% to 15% of the best available techniques (the PPM family of statistical compressors), whilst being around twice as fast at compression and six times faster at decompression.

  • cactus ( Software )

    Cactus is a reference-free whole-genome multiple alignment program

  • cadence ( Software )

    Software for Electronic Design and Simulation.

  • calibre ( Software )

    Software for IC Design and Verification

  • canu ( Software )

    A single molecule sequence assembler for genomes large and small.

  • Cas-OFFinder ( Software )

    An ultrafast and versatile algorithm that searches for potential off-target sites of CRISPR/Cas-derived RNA-guided endonucleases.

  • CD-HIT ( Software )

    CD-HIT is very fast and can handle extremely large databases. CD-HIT helps to significantly reduce the computational and manual efforts in many sequence analysis tasks and aids in understanding the data structure and correcting the bias within a dataset.

  • CDO ( Algorithmic software suite )

    CDO is a large tool set for working on climate and NWP model data. NetCDF 3/4, GRIB 1/2 including SZIP and JPEG compression, EXTRA, SERVICE and IEG are supported as IO-formats. Apart from that CDO can be used to analyse any kind of gridded data not related to climate science. CDO has very small memory requirements and can process files larger than the physical memory.

  • Cell Ranger and Cell Ranger-atac ( Algorithmic software suite )

    Cell Ranger is a set of analysis pipelines that processes Chromium single cell 3' RNA-seq output to align reads, generate gene-cell matrices and perform clustering and gene expression analysis.

    Cell Ranger ATAC is a set of analysis pipelines that process Chromium Single Cell ATAC data

  • CentOS6 ( Software )

    Allows loading of the CentOS6 modules

  • Centrifuge ( Software )

    Classifier for metagenomic sequences

  • cfitsio ( Software )

    C interface to FITS files.

  • CGAL ( Software )

    CGAL version 4.4

  • Cladescan ( Software )

    "Cladescan was written to automate the process of comparing trees. Looking for a particular node in a tree seems like a trivial task; and it is, until you try to look for a complicated node among hundreds of trees. With large numbers of taxa in an analysis, this can become a frequent and tedious occurrence. Worse, it's a process prone to human error. These problems become especially significant when performing sensitivity analyses (e.g., examining best trees from a number of condition sets for presence of a clade of interest).

    This program seeks to make your life easier and more accurate by doing these comparisons for you, summarizing the results both in textual and graphical formats. "Navajo Rug" sensitivity plots sensu Giribet (Systematic Biology 52, 2003) for each target clade may be output in Scalable Vector Graphics format, suitable for import into vector graphics packages such as Adobe Illustrator. "

  • CLAPACK ( Software )

    f2c'd version of LAPACK

  • ClonalFrameML ( Algorithmic software suite )

    ClonalFrameML is a software package that performs efficient inference of recombination in bacterial genomes.

  • Clustal Omega ( Software )

    Clustal Omega is the latest addition to the Clustal family. It offers a significant increase in scalability over previous versions, allowing hundreds of thousands of sequences to be aligned in only a few hours. It will also make use of multiple processors, where present. In addition, the quality of alignments is superior to previous versions, as measured by a range of popular benchmarks.

  • cmake ( Algorithmic software suite )

    Cross platform build system as complicated as Autotools, but different.

  • codonPHYML_dev ( Software )

    codonPhyML uses Markovian codon models of evolution in phylogeny reconstruction.

  • COMSOL ( Software )

    COMSOL Multiphysics is an engineering, design, and finite element analysis software environment for the modeling and simulation of any physics-based system.

  • Concaterpillar ( Software )

    "A hierarchical likelihood ratio test for phylogenetic congruence."

    "Concaterpillar 1.4 is installed on Odyssey in bio/concaterpillar-1.4. "

  • Connectome Workbench ( Algorithmic software suite )

    Connectome Workbench is an open source, freely available visualization and discovery tool used to map neuroimaging data, especially data generated by the Human Connectome Project.

  • CUDA ( Software )

    Module that activates the CUDA libraries

  • cuDNN ( Software )

    The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks.

  • Cufflinks ( Software )

    Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one, taking into account biases in library preparation protocols.

  • curl ( Software )

    curl is an open source command line tool and library for transferring data with URL syntax

  • cutadapt ( Software )

    cutadapt removes adapter sequences from high-throughput sequencing data. This is usually necessary when the read length of the sequencing machine is longer than the molecule that is sequenced, for example when sequencing microRNAs.
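
    A simplified picture of 3' adapter removal is sketched below. It uses exact string matching only, whereas a real trimmer also has to tolerate sequencing errors; the function name `trim_adapter` and the `min_overlap` parameter are hypothetical and are not cutadapt's interface.

```python
def trim_adapter(read, adapter, min_overlap=3):
    """Remove a 3' adapter (and everything after it) from a read.

    Handles a full adapter anywhere in the read, or a partial adapter
    (a prefix of at least min_overlap bases) at the very end of the read.
    """
    # Case 1: the complete adapter occurs inside the read.
    position = read.find(adapter)
    if position != -1:
        return read[:position]
    # Case 2: the read ends partway through the adapter.
    longest = min(len(adapter), len(read)) - 1
    for length in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:length]):
            return read[:-length]
    # No adapter found: return the read unchanged.
    return read
```

    The partial-match case matters when the sequenced molecule is only slightly shorter than the read length, so that just the first few adapter bases appear at the end of the read.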

  • cyana ( Software )

    Combined assignment and dynamics algorithm for NMR applications

  • cytoscape ( Software )

    An open source bioinformatics software platform for visualizing molecular interaction networks and biological pathways and integrating these networks with annotations, gene expression profiles and other state data.

  • datamash ( Software )

    GNU datamash is a command-line program which performs basic numeric, textual and statistical operations on input textual data files.

  • DETONATE (DE novo TranscriptOme rNa-seq Assembly with or without the Truth Evaluation) ( Software )

    DETONATE (DE novo TranscriptOme rNa-seq Assembly with or without the Truth Evaluation) consists of two component packages, RSEM-EVAL and REF-EVAL. Both packages are mainly intended to be used to evaluate de novo transcriptome assemblies, although REF-EVAL can be used to compare sets of any kinds of genomic sequences.

  • DIAMOND ( Software )

    DIAMOND is a new alignment tool for aligning short DNA sequencing reads to a protein reference database such as NCBI-NR. On Illumina reads of length 100-150bp, in fast mode, DIAMOND is about 20,000 times faster than BLASTX, while reporting about 80-90% of all matches that BLASTX finds, with an e-value of at most 1e-5. In sensitive mode, DIAMOND is about 2,500 times faster than BLASTX, finding more than 94% of all matches.

  • drmaa-python (Distributed Resource Management Application API) ( Software )

    Distributed Resource Management Application API (DRMAA) bindings for Python.

  • DS9 ( Software )

    SAOImage DS9 is an astronomical imaging and data visualization application.

  • EEMS (Estimating Effective Migration Surfaces) ( Software )

    EEMS - Estimating Effective Migration Surfaces

  • EIG/eigan ( Software )

    Eigen tools by Nick Patterson and Alkes Price lab. Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms.

  • Emacs ( Software )

    GNU Emacs is an extensible, customizable text editor—and more.

  • EMBOSS ( Algorithmic software suite )

    EMBOSS is "The European Molecular Biology Open Software Suite". EMBOSS is a free Open Source software analysis package specially developed for the needs of the molecular biology (e.g. EMBnet) user community. The software automatically copes with data in a variety of formats and even allows transparent retrieval of sequence data from the web. Also, as extensive libraries are provided with the package, it is a platform to allow other scientists to develop and release software in true open source spirit. EMBOSS also integrates a range of currently available packages and tools for sequence analysis into a seamless whole. EMBOSS breaks the historical trend towards commercial software packages.

  • enca (Extremely Naive Charset Analyser) ( Software )

    Enca (Extremely Naive Charset Analyser) detects the character set and encoding of text files and can also convert them to other encodings.

  • ESPResSoMD ( Algorithmic software suite )

    Extensible Simulation Package for Research on Soft matter

  • ExaBayes ( Algorithmic software suite )

    ExaBayes is a software package for Bayesian tree inference. It is particularly suitable for large-scale analyses on computer clusters.

  • ExaML ( Software )

    This code implements the popular RAxML search algorithm for maximum likelihood based inference of phylogenetic trees. It uses a radically new MPI parallelization approach that yields improved parallel efficiency, in particular on partitioned multi-gene or whole-genome datasets.

  • exonerate ( Software )

    A generic tool for sequence alignment by Guy St. C. Slater, et al.

  • Expat ( Software )

    This is James Clark's Expat XML parser library in C. It is a stream oriented parser that requires setting handlers to deal with the structure that the parser discovers in the document.

  • Eye of Gnome ( Software )

    Eye of GNOME is the GNOME image viewer.

  • GNU Scientific Library ( Software )

    "GNU Scientific Library (GSL) is a C library that provides an efficient implementation of a large number of useful mathematical and scientific functions.

    Odyssey has several versions of this library installed.

    module load hpc/gsl-gnu: This module makes the GSL library that is compiled with the GNU toolchain available.
    module load hpc/gsl-intel: This module provides the GSL library compiled with the Intel 10.1.015 compilers
    module load hpc/gsl-intel_10.1.018: This provides the GSL library compiled with the Intel 10.1.018 compilers
    module load hpc/gsl-intel_11.0.083: This provides the same with the Intel 11.0.083 compilers"

  • Gnuplot ( Software )

    "Gnuplot is a portable command-line driven interactive data and function plotting utility for UNIX, IBM OS/2, MS Windows, DOS, Macintosh, VMS, Atari and many other platforms. The software is copyrighted but freely distributed (i.e., you don't have to pay for it). It was originally intended to allow scientists and students to visualize mathematical functions and data. It does this job pretty well, but has grown to support many non-interactive uses, including web scripting and integration as a plotting engine for third-party applications like Octave. Gnuplot has been supported and under development since 1986."

  • GraphViz ( Software )

    "Graph visualization is a way of representing structural information as diagrams of abstract graphs and networks. Automatic graph drawing has many important applications in software engineering, database and web design, networking, and in visual interfaces for many other domains. "

  • MATLAB ( Software )

    "MATLAB® is a high-level language and interactive environment that enables you to perform computationally intensive tasks faster than with traditional programming languages such as C, C++, and Fortran." "The latest version of Matlab on Odyssey is Release R2009a."

  • Migrate ( Software )

    "Migrate: MIGRATION RATE AND POPULATION SIZE ESTIMATION
    using Markov Chain Monte Carlo simulation."

  • MPI Libraries on Odyssey ( Software )

    "There is no default MPI library in your environment when you log into Odyssey. You need to choose an implementation and then load the appropriate module.

    The MPI implementations on our cluster are OpenMPI and Mvapich2. The modules are built to use either the Intel compiler suite or the GNU compiler suite.

    The current versions of these are:

    hpc/openmpi-1.3.2_intel-11.0.083
    hpc/openmpi-1.3.2_gnu-4.1.2
    hpc/mvapich2-1.4_intel-11.1.046
    hpc/mvapich2-1.4rc2_gnu-4.3.3

    But these get updated often so check if there are ones more recent."

  • MrBayes ( Software )

    "MrBayes is a program for the Bayesian estimation of phylogeny. Bayesian inference of phylogeny is based upon a quantity called the posterior probability distribution of trees, which is the probability of a tree conditioned on the observations. The conditioning is accomplished using Bayes's theorem. The posterior probability distribution of trees is impossible to calculate analytically; instead, MrBayes uses a simulation technique called Markov chain Monte Carlo (or MCMC) to approximate the posterior probabilities of trees."

  • Omnetpp ( Software )

    "OMNeT++ is an extensible, modular, component-based C++ simulation library and framework, with an Eclipse-based IDE and a graphical runtime environment."

  • OpenGL & libGD on Odyssey ( Software )

    OpenGL & libGD

  • OpenMP software on Odyssey ( Software )

    "It is possible to run OpenMP codes on Odyssey."

  • qchem ( Software )

    "Serial qchem can be run in the usual way BUT by default it is multi-threaded."

  • RAxML ( Software )

    "RAxML is available in both serial and parallel versions. To run the serial version load the hpc/RAxML-7.0.4 module. The executable is called raxmlHPC."

  • Visualization Toolkit ( Software )

    "The Visualization Toolkit (VTK) is an open-source, freely available software system for 3D computer graphics, image processing and visualization. VTK consists of a C++ class library and several interpreted interface layers including Tcl/Tk, Java, and Python. Professional support and products for VTK are provided by Kitware."


Web Links:

Last updated: 2019-05-28T14:00:21.188-04:00

Copyright © 2016 by the President and Fellows of Harvard College
The eagle-i Consortium is supported by NIH Grant #5U24RR029825-02 / Copyright 2016