Research Computing (RC) facilitates the advancement of complex research by providing leading-edge computing services across the Faculty of Arts & Sciences (FAS). RC staff maintain expertise in constantly changing computing technologies while 'speaking the language' of FAS researchers, to help them use computing more effectively.
Linux cluster with over 10,000 cores and many biological and other software packages available (BayesPhylogenies, BLAST and wu-blast, Cladescan, MrBayes, Matlab, Mathematica, RaxML).
Research Computing’s staff has in-depth knowledge of a variety of scientific and technical disciplines, including:
* Bioinformatics – Research Computing’s bioinformatics core helps researchers conduct large-scale sequence analysis on the Odyssey cluster. The core provides high-level biostatistics support: it helps researchers determine the significance of results from high-throughput microarray experiments, and it assists in interpreting the results of high-throughput sequencing and in annotating genomes.
* As data sets from instruments and computational analysis keep growing, Research Computing helps researchers mine and transform that data to draw scientific conclusions. In a few hours, or sometimes just a few minutes, a programmer can write a script that saves a researcher months of work, leaving more time for other scientific pursuits.
* Research Computing can provide faculty and other research staff with assistance in purchasing hardware and software that fit special computing needs. From storage equipment purchases to licenses for software packages, Research Computing can help facilitate the transaction with the vendor, and assist with installation and set-up.
Research Computing provides faculty and researchers with the tools they need to take on large-scale computing challenges. Odyssey, Harvard’s largest supercomputer, offers users over 35 Petabytes of raw storage, more than 70,000 processing cores, and numerous software modules and applications. Research Computing can also host and create scientific applications not already on the Odyssey system.
Multi-terabyte storage for labs at competitive rates, with different levels of access speed, stability, and backup capability.
Occasional classes in using Linux, programming, Matlab, next-generation sequencing.
"We have Assembly By Short Sequences - a de novo, parallel, paired-end sequence assembler installed on Odyssey. The version on Odyssey is 1.1.2."
"BayesPhylogenies is a general package for inferring phylogenetic trees using Bayesian Markov Chain Monte Carlo (MCMC) or Metropolis-coupled Markov chain Monte Carlo (MCMCMC) methods. "
"Cladescan was written to automate the process of comparing trees. Looking for a particular node in a tree seems like a trivial task; and it is, until you try to look for a complicated node among hundreds of trees. With large numbers of taxa in an analysis, this can become a frequent and tedious occurrence. Worse, it's a process prone to human error. These problems become especially significant when performing sensitivity analyses (e.g., examining best trees from a number of condition sets for presence of a clade of interest).
This program seeks to make your life easier and your results more accurate by doing these comparisons for you, summarizing the results in both textual and graphical formats. "Navajo Rug" sensitivity plots sensu Giribet (Systematic Biology 52, 2003) for each target clade may be output in Scalable Vector Graphics format, suitable for import into vector graphics packages such as Adobe Illustrator."
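The kind of comparison described above can be illustrated with a minimal Python sketch. This is not Cladescan's own code or interface: the function names `clades` and `scan`, the example trees, and the choice to treat a clade as the set of leaves under one node of a rooted Newick tree are all assumptions made for illustration.

```python
import re

def clades(newick):
    """Return one frozenset of leaf names per internal node of a rooted Newick tree."""
    tokens = re.findall(r"[(),;]|[^(),;]+", newick.strip())
    found = set()
    stack = []   # one set of leaf names per currently open parenthesis
    prev = None  # previous non-blank token
    for tok in tokens:
        tok = tok.strip()
        if not tok:
            continue
        if tok == "(":
            stack.append(set())
        elif tok == ")":
            leaves = frozenset(stack.pop())
            found.add(leaves)
            if stack:
                stack[-1] |= leaves  # leaves also belong to the enclosing node
        elif tok not in ",;" and prev != ")":
            # A leaf label; drop any ":branch_length" suffix.
            name = tok.split(":")[0].strip()
            if name and stack:
                stack[-1].add(name)
        # Text directly after ")" is an internal label or branch length; it is skipped.
        prev = tok
    return found

def scan(trees, target):
    """Report, for each tree, whether the target set of taxa forms a clade."""
    target = frozenset(target)
    return [target in clades(tree) for tree in trees]

# Hypothetical example trees, for illustration only.
trees = [
    "((A,B),(C,D));",
    "((A,C),(B,D));",
    "(((A:0.1,B:0.2):0.05,C),D);",
]
print(scan(trees, {"A", "B"}))
```

Checking by hand whether A and B group together is easy for three trees; the same call works unchanged on hundreds of trees, which is where the manual process becomes tedious and error-prone.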
"A hierarchical likelihood ratio test for phylogenetic congruence."
"Concaterpillar 1.4 is installed on Odyssey in bio/concaterpillar-1.4. "
"GNU Science Library (GSL) is a C library that provides an efficient implementation of a large number of useful mathematical and scientific functions.
Odyssey has several version of this library installed.
module load hpc/gsl-gnu: This module makes the GSL library that is compiled with the GNU toolchain available.
module load hpc/gsl-intel: This module provides the GSL library compiled with the Intel 10.1.015 compilers
module load hpc/gsl-intel_10.1.018: This provides the GSL library compiled with the Intel 10.1.018 compilers
module load hpc/gsl-intel_11.0.083: This provides the same with the Intel 11.0.083 compilers"
"Gnuplot is a portable command-line driven interactive data and function plotting utility for UNIX, IBM OS/2, MS Windows, DOS, Macintosh, VMS, Atari and many other platforms. The software is copyrighted but freely distributed (i.e., you don't have to pay for it). It was originally intended as to allow scientists and students to visualize mathematical functions and data. It does this job pretty well, but has grown to support many non-interactive uses, including web scripting and integration as a plotting engine for third-party applications like Octave. Gnuplot has been supported and under development since 1986."
"Graph visualization is a way of representing structural information as diagrams of abstract graphs and networks. Automatic graph drawing has many important applications in software engineering, database and web design, networking, and in visual interfaces for many other domains. "
"MATLAB® is a high-level language and interactive environment that enables you to perform computationally intensive tasks faster than with traditional programming languages such as C, C++, and Fortran." "The latest version of Matlab on Odyssey is Release R2009a."
"Migrate: MIGRATION RATE AND POPULATION SIZE ESTIMATION
using Markov Chain Monte Carlo simulation."
"The is no default MPI library in your environment when you log into Odyssey. You need to choose an implementation and then load the appropriate module.
The MPI implementations on our cluster are OpenMPI and Mvapich2. The modules are built to use either the Intel compiler suite and the GNU compiler suite.
The current versions of these are:
But these are updated often, so check whether more recent ones are available."
"MrBayes is a program for the Bayesian estimation of phylogeny. Bayesian inference of phylogeny is based upon a quantity called the posterior probability distribution of trees, which is the probability of a tree conditioned on the observations. The conditioning is accomplished using Bayes's theorem. The posterior probability distribution of trees is impossible to calculate analytically; instead, MrBayes uses a simulation technique called Markov chain Monte Carlo (or MCMC) to approximate the posterior probabilities of trees."
"OMNeT++ is an extensible, modular, component-based C++ simulation library and framework, with an Eclipse-based IDE and a graphical runtime environment."
OpenGL & libGD
"It is possible to run OpenMP codes on Odyssey."
"Serial qchem can be run in the usual way BUT by default it is multi-threaded."
"RAxML is available in both serial and parallel versions. To run the serial version load the hpc/RAxML-7.0.4 module. The executable is called raxmlHPC."
"The Visualization Toolkit (VTK) is an open-source, freely available software system for 3D computer graphics, image processing and visualization. VTK consists of a C++ class library and several interpreted interface layers including Tcl/Tk, Java, and Python. Professional support and products for VTK are provided by Kitware."