The Biostatistics Center provides support to MGH investigators, as well as serving as a Coordinating Center for several NIH-supported projects. The Center's staff includes biostatisticians, physicians, research nurses, data managers, project managers, research assistants, and computing staff.
Member: Lee, Hang, Ph.D.
Role: Associate Professor of Medicine
Phone: (617) 726-4293
Provides statistical support to MGH investigators, as well as serving as a Coordinating Center for several NIH-supported projects.
depcen.exe is a program for estimating survival probabilities and probabilities of attending visits, as described in the paper "Analysis of Failure Time Data with Dependent Interval Censoring" (Finkelstein DM, Goggins WB, and Schoenfeld DA, Biometrics 2002; 58:298-304).
The program was implemented in Matlab and runs as a batch job from a DOS command prompt. The time-to-blood-shedding data from the paper are also included: "interval_censr_data.zip" contains the data in .dat format and the .sas setup file needed to read them. When using these data, please cite the article above.
Gen.m is an m-file (Matlab/Octave) for computing sequential boundaries, as described in the paper "A Simple Algorithm for Designing Group Sequential Clinical Trials" (Schoenfeld, Biometrics 57, 972-974; September 2001). If you have Matlab, download gen.m. The same file can also be run under Octave, a public-domain m-file interpreter that can be downloaded from the URL below. In addition, sequential.zip contains a compiled version of gen.m that runs from the command line.
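gen.m implements Schoenfeld's own algorithm, which is not reproduced here. To illustrate what a sequential boundary is, the Python sketch below estimates a constant (Pocock-type) two-sided critical value for equally spaced interim analyses by Monte Carlo simulation. It assumes normally distributed test statistics with independent increments. The function name and its defaults are hypothetical.

```python
import random


def pocock_constant(n_looks, alpha=0.05, n_sims=200_000, seed=1):
    """Monte Carlo estimate of a constant two-sided (Pocock-type) boundary
    for n_looks equally spaced analyses. Illustrative sketch only."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_sims):
        partial_sum = 0.0
        largest = 0.0
        for k in range(1, n_looks + 1):
            # Each look adds one independent N(0, 1) increment of information.
            partial_sum += rng.gauss(0.0, 1.0)
            # Standardized test statistic at look k under the null hypothesis.
            z_k = partial_sum / k ** 0.5
            largest = max(largest, abs(z_k))
        maxima.append(largest)
    maxima.sort()
    # Boundary c such that P(max_k |Z_k| > c) is approximately alpha under H0.
    index = min(round((1.0 - alpha) * n_sims), n_sims - 1)
    return maxima[index]


# Example: two equally spaced looks at overall two-sided alpha = 0.05.
print(round(pocock_constant(2), 2))
```

With two looks, the estimate should land close to the tabulated Pocock value of about 2.18. Increasing n_sims tightens the estimate. Other boundary shapes, such as O'Brien-Fleming, require a different search.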
This program calculates the power or sample size for a random slopes model. The model assumes that each patient has a linear trajectory whose slope and intercept follow a bivariate normal distribution. The fixed effects are time and a time-by-treatment interaction. The time-by-treatment interaction is the treatment effect, since it measures how much the mean slope differs between the two treatment groups. The program requires the variance-covariance matrix of the random effects and the error variance of the observations around each patient's trajectory.
The program is an m-file (Matlab/Octave) and will run under Octave, a public-domain m-file interpreter.
The program has also been ported to R.
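The program's exact calculation is not shown on this page. A simplified closed form conveys the idea. Assume that every patient is measured at the same times with no missing data, and that the groups are compared on their mean slopes (group-specific intercepts). Under these assumptions, each patient's estimated slope has variance equal to the random-slope variance plus the error variance divided by the sum of squared deviations of the measurement times. The intercept variance and covariance then drop out. The program's model, with a shared intercept and the full covariance matrix, may therefore give somewhat different (typically smaller) sample sizes. All names below are hypothetical.

```python
import math
from statistics import NormalDist


def slope_variance(times, slope_var, error_var):
    """Variance of one patient's least-squares slope when all patients share
    the same measurement times (balanced, complete data assumed)."""
    t_bar = sum(times) / len(times)
    sxx = sum((t - t_bar) ** 2 for t in times)
    return slope_var + error_var / sxx


def power_random_slopes(n_per_group, delta, times, slope_var, error_var, alpha=0.05):
    """Approximate two-sided power to detect a difference delta in mean slopes."""
    v = slope_variance(times, slope_var, error_var)
    se = math.sqrt(2.0 * v / n_per_group)  # SE of the difference in mean slopes
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return NormalDist().cdf(abs(delta) / se - z_alpha)


def n_per_group_random_slopes(delta, times, slope_var, error_var, alpha=0.05, power=0.80):
    """Patients per group needed for the requested power (normal approximation)."""
    v = slope_variance(times, slope_var, error_var)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0) + NormalDist().inv_cdf(power)
    return math.ceil(2.0 * v * z ** 2 / delta ** 2)


# Example: visits at 0..4, random-slope variance 1, error variance 10.
n = n_per_group_random_slopes(0.5, [0, 1, 2, 3, 4], 1.0, 10.0)
print(n, round(power_random_slopes(n, 0.5, [0, 1, 2, 3, 4], 1.0, 10.0), 3))
```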
The NIH project ‘Inflammatory and Host Response to Injury’ (Glue) is being conducted to study changes in the body over time in response to trauma and burns. Patients are monitored for changes in their clinical status, such as the onset of and recovery from organ failure. Blood samples are drawn over the first days and weeks after injury to obtain gene expression levels over time. Our goal was to develop a method for selecting genes that are differentially expressed in patients who either improved or experienced organ failure. For this, we needed a test for the association between longitudinal gene expression and the time to occurrence of ordered categorical outcomes indicating recovery, stable disease, or organ failure. We propose a test in which the relationship between gene expression and the events is modeled using the cumulative proportional odds model, a generalization of the pooling of repeated observations method. Given the high dimensionality of the microarray data, it was necessary to control for the multiplicity of testing. To control the false discovery rate (FDR), we applied both a permutation approach and Efron’s empirical estimation method. We explore our method through simulations and present an analysis of the multi-center, longitudinal study of immune response to inflammation and trauma (http://www.gluegrant.org).
This paper was published in Statistics in Medicine 2009; 28:2817–2832 (published online 17 July 2009 in Wiley InterScience, www.interscience.wiley.com; DOI: 10.1002/sim.3665).
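The paragraph above mentions a permutation approach to FDR control without giving details. The Python sketch below shows one generic permutation-based FDR estimate, not necessarily the paper's exact procedure, and it does not implement Efron's method. At a given threshold, the estimate divides the average number of genes exceeding the threshold after permuting the outcome labels by the number of genes exceeding it in the observed data. The per-gene statistics, for example from the proportional odds model, are assumed to be computed elsewhere. The function name is hypothetical.

```python
def permutation_fdr(observed, permuted, threshold):
    """Estimate the FDR for calling genes with |statistic| >= threshold.

    observed: one test statistic per gene from the real data.
    permuted: list of permutation rounds; each round is a list of per-gene
              statistics recomputed after shuffling the outcome labels.
    """
    n_called = sum(abs(s) >= threshold for s in observed)
    if n_called == 0:
        return 0.0
    # Average number of genes passing the threshold when labels carry no signal.
    expected_false = sum(
        sum(abs(s) >= threshold for s in round_stats) for round_stats in permuted
    ) / len(permuted)
    return min(1.0, expected_false / n_called)


# Toy example: 5 genes, 2 permutation rounds.
observed = [5.0, 4.0, 3.0, 0.5, 0.2]
permuted = [[0.1, 3.5, 0.3, 0.2, 0.4], [0.2, 0.1, 0.3, 3.2, 0.5]]
print(permutation_fdr(observed, permuted, threshold=3.0))
```

In practice the threshold is scanned over a grid, and the largest set of genes whose estimated FDR stays below a target level, such as 0.1, is reported.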
Biopara is a parallel framework that lets R users distribute coarse-grained parallel problems over multiple machines for concurrent execution. Biopara is called from within an R session, and results are returned to the same session as a list of native R objects that can be manipulated directly without reinterpretation.
Compare a proportion, such as proportion of female subjects, between two groups.
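The page does not say which test this tool uses. A common choice is the pooled two-sample z-test for proportions, which is equivalent to the Pearson chi-square test without continuity correction. The Python sketch below, with a hypothetical function name, shows the calculation.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sample z-test of H0: p1 == p2. Returns (z, two-sided p)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Example: 30 of 50 female subjects in group 1 versus 20 of 50 in group 2.
print(two_proportion_z_test(30, 50, 20, 50))
```

Here z is 2.0 and the two-sided p-value is about 0.046. With small counts, an exact test such as Fisher's is usually preferred.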
Compare the mean of a continuous variable such as age, BMI, or diastolic blood pressure between two groups.
Compute approximate required sample size using an Excel spreadsheet.
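The spreadsheet's formulas are not shown here. The standard normal-approximation formulas for two equal-sized groups are sketched below in Python. The function names are hypothetical, and the spreadsheet may use different corrections.

```python
import math
from statistics import NormalDist


def _z(alpha, power):
    # Sum of the two-sided critical value and the power quantile.
    return NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)


def n_per_group_two_means(delta, sd, alpha=0.05, power=0.80):
    """Subjects per group to detect a mean difference delta with common SD sd."""
    return math.ceil(2 * (_z(alpha, power) * sd / delta) ** 2)


def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Subjects per group to detect proportions p1 versus p2."""
    p_bar = (p1 + p2) / 2
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    numerator = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


print(n_per_group_two_means(delta=0.5, sd=1.0))   # half-SD difference
print(n_per_group_two_proportions(0.6, 0.4))      # 60% versus 40%
```

At 80% power and two-sided alpha of 0.05, a half-standard-deviation difference needs 63 subjects per group. Detecting 60% versus 40% needs 97 per group.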
Compute summary statistics for a continuous variable such as age, BMI, or diastolic blood pressure.
Compute the correlation between two continuous variables such as age and BMI or BMI and diastolic blood pressure.
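The tool's method is not described. A minimal Python sketch computes the Pearson correlation together with a confidence interval based on Fisher's z-transformation. The function name is hypothetical. The interval requires more than three pairs and |r| < 1.

```python
import math
from statistics import NormalDist, mean


def pearson_r_with_ci(x, y, level=0.95):
    """Pearson r and an approximate CI via Fisher's z (needs n > 3, |r| < 1)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    # atanh(r) is approximately normal with standard error 1 / sqrt(n - 3).
    z = math.atanh(r)
    half_width = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(len(x) - 3)
    return r, math.tanh(z - half_width), math.tanh(z + half_width)


# Example with five hypothetical (age, BMI)-style pairs.
print(pearson_r_with_ci([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```

With only five pairs, r = 0.8 still has a 95% interval that extends below zero.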
sterconv.html is a JavaScript application that converts milligrams of various corticosteroids to the equivalent dose of methylprednisolone. The conversion data are adapted from Goodman and Gilman's The Pharmacological Basis of Therapeutics, 9th ed. (1996); Hardman JG and Limbird LE, editors; New York: McGraw-Hill Health Professions Division; page 1466.
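The conversion is a ratio of equivalent doses. The Python sketch below shows the arithmetic. The equivalent-dose table is an assumption based on commonly cited approximate values, not copied from the page or from the book it cites, so check it against that table before relying on it. The names are hypothetical.

```python
# ASSUMED approximate equivalent doses in mg (commonly cited values; verify
# against the source table the page cites).
EQUIVALENT_DOSE_MG = {
    "hydrocortisone": 20.0,
    "prednisone": 5.0,
    "prednisolone": 5.0,
    "methylprednisolone": 4.0,
    "dexamethasone": 0.75,
}


def to_methylprednisolone_mg(drug, dose_mg):
    """Convert a dose of `drug` (mg) to the equivalent methylprednisolone dose (mg)."""
    # Express the dose as a number of "equivalent units", then rescale.
    equivalent_units = dose_mg / EQUIVALENT_DOSE_MG[drug.lower()]
    return equivalent_units * EQUIVALENT_DOSE_MG["methylprednisolone"]


print(to_methylprednisolone_mg("prednisone", 10))
```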
This graphical tool shows the effect of non-proportional hazards or a missing covariate on the power of the proportional hazards regression model. Using sliders, one can manipulate the hazard functions to create different hazard ratios over time while holding either the final survival rates, the mean, or the median constant. One can also alter the hazard ratio by omitting an important covariate from the model. The power is shown for each hazard ratio function, and the graphical user interface displays each hazard function, each survival curve, and the hazard ratio.
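The tool's calculations for non-proportional hazards are not given on this page. For reference, when hazards are proportional, power depends mainly on the number of events. Schoenfeld's approximation for a two-group comparison is sketched below in Python. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def logrank_power(events, hazard_ratio, alpha=0.05, allocation=0.5):
    """Approximate two-sided power for a constant hazard ratio (proportional hazards)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Expected value of the standardized test statistic under the alternative.
    drift = math.sqrt(events * allocation * (1 - allocation)) * abs(math.log(hazard_ratio))
    return NormalDist().cdf(drift - z_alpha)


def events_required(hazard_ratio, alpha=0.05, power=0.80, allocation=0.5):
    """Number of events needed for the requested power."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil(z ** 2 / (allocation * (1 - allocation) * math.log(hazard_ratio) ** 2))


print(events_required(0.5), round(logrank_power(66, 0.5), 3))
```

A hazard ratio of 0.5 requires about 66 events for 80% power under 1:1 allocation. When the hazard ratio changes over time, a single log hazard ratio no longer summarizes the effect, which is what the graphical tool lets one explore.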
*** NOTE ***
To run this program you must have Java and either Matlab or Matlab Compiler Runtime (MCR) v7.13 installed. If you do not have Matlab or MCR, use the links below to download the appropriate installer for your system.
This program is a Monte Carlo EM (MCEM) algorithm for fitting the proportional hazards model to interval-censored failure time data. The algorithm generates orderings of the failures from their probability distribution under the model. We maximize the average of the log-likelihoods from these completed data sets to obtain updated parameter estimates. As with the standard Cox (1972) model, this algorithm does not require estimation of the baseline hazard function. The method is described in Goggins W, Finkelstein DM, and Zaslavsky A. Applying the Cox Proportional Hazards Model when the Change Time of a Binary Time-Varying Covariate is Interval-Censored. Biometrics 1999; 55:445-451.
*** NOTE ***
This software requires access to a SPARC-based Sun workstation with S installed on it.
1s_logrank.xls computes the one-sample log-rank test and confidence intervals for the standardized mortality ratio (SMR), estimates survivorship in the matched standard population, and visually compares the survivorship of the sample with that of the standard population, as described in the paper and instructions (both included in the zip file). The paper was published as a commentary in the Journal of the National Cancer Institute, Vol. 95, No. 19, Oct 1 2003, pp. 1434–1439.
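The core calculation compares observed deaths O with the expected number E. E is the sum, over subjects, of the standard population's cumulative hazard across each subject's follow-up. The Python sketch below computes the z statistic (O - E)/sqrt(E), the SMR = O/E, and a log-scale approximate confidence interval that treats O as Poisson. The spreadsheet may use a different interval method, such as an exact Poisson interval, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def one_sample_logrank(observed, expected_cum_hazards, level=0.95):
    """observed: number of deaths in the sample (must be > 0 for the CI).
    expected_cum_hazards: per subject, the matched standard population's
    cumulative hazard over that subject's follow-up time."""
    expected = sum(expected_cum_hazards)
    z = (observed - expected) / math.sqrt(expected)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    smr = observed / expected
    # log(SMR) has approximate standard error 1 / sqrt(observed).
    half = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(observed)
    return {"E": expected, "z": z, "p": p_value, "SMR": smr,
            "ci": (smr * math.exp(-half), smr * math.exp(half))}


# Example: 30 deaths among 40 subjects, each with expected cumulative hazard 0.5.
print(one_sample_logrank(30, [0.5] * 40))
```

Here E = 20, so the SMR is 1.5 with an approximate 95% interval of about 1.05 to 2.15.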
Permutation test to compare variability within and distance between two groups of microarrays using the distance matrix. This software uses R, and is available through CRAN.
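The exact statistics used by the CRAN package are not given here. The Python sketch below illustrates only the general idea of a permutation test on a distance matrix. The statistic is the mean between-group distance minus the mean within-group distance, and group labels are shuffled to build its null distribution. The function names are hypothetical.

```python
import random
from itertools import combinations


def between_minus_within(dist, labels):
    """Mean between-group distance minus mean within-group distance."""
    between, within = [], []
    for i, j in combinations(range(len(labels)), 2):
        (within if labels[i] == labels[j] else between).append(dist[i][j])
    return sum(between) / len(between) - sum(within) / len(within)


def distance_permutation_test(dist, labels, n_perm=999, seed=0):
    """One-sided permutation p-value for 'groups are farther apart than chance'."""
    rng = random.Random(seed)
    observed = between_minus_within(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if between_minus_within(dist, shuffled) >= observed:
            hits += 1
    # Count the observed labeling itself so the p-value is never zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy example: two well-separated groups of five one-dimensional "arrays".
points = [0.0, 0.1, 0.2, 0.3, 0.4, 5.0, 5.1, 5.2, 5.3, 5.4]
dist = [[abs(a - b) for b in points] for a in points]
print(distance_permutation_test(dist, [0] * 5 + [1] * 5))
```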
Parallel-enabled utilities for use with MathWorks' Distributed Computing Toolbox (DCT): a parallel for loop, a parallel simulation, and a parallel bootstrap that use the DCT to speed up execution.
Plot a histogram for one continuous variable such as age, BMI, or diastolic blood pressure.
Plot a scatter plot for two continuous variables such as age and BMI or BMI and diastolic blood pressure.
This application calculates the power and sample size for the Cox and logistic models. It is written in Matlab/Octave. If you have Matlab (or its free counterpart, Octave) on your computer, you can download the source files in the zip file "cox_and_logistic_matlab_programs.zip". The programs to run are powl8 for the logistic model and powc9 for the Cox model; the other programs are helper functions that must be on your path. A link to download Octave is included. If you don't wish to use the Octave/Matlab version, download and run the MCRInstaller first, and then use the zip file "logistic_cox_power.zip".
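The methods behind powl8 and powc9 are not described here. For a rough cross-check in the simplest setting, with a single normally distributed covariate and no other adjustment, widely used closed-form approximations (due to Hsieh and colleagues) are sketched below in Python. The odds or hazard ratio refers to a one-standard-deviation change in the covariate. These formulas may differ from the programs' results, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def _z_sum(alpha, power):
    # Two-sided critical value plus the power quantile.
    return NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)


def logistic_sample_size(event_rate, odds_ratio_per_sd, alpha=0.05, power=0.80):
    """Total subjects for a logistic model with one standardized covariate."""
    beta = math.log(odds_ratio_per_sd)
    return math.ceil(_z_sum(alpha, power) ** 2 / (event_rate * (1 - event_rate) * beta ** 2))


def cox_events_required(hazard_ratio_per_sd, alpha=0.05, power=0.80):
    """Events needed for a Cox model with one standardized covariate."""
    beta = math.log(hazard_ratio_per_sd)
    return math.ceil(_z_sum(alpha, power) ** 2 / beta ** 2)


print(logistic_sample_size(0.3, 1.5), cox_events_required(1.5))
```

For an event rate of 30% and an odds ratio of 1.5 per standard deviation, the logistic formula gives 228 subjects. A hazard ratio of 1.5 per standard deviation needs 48 events in the Cox model.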
An Excel spreadsheet that calculates the power of a sequential parallel design, allowing for dropouts, based on the paper by Tamura and Huang, Clinical Trials 2007; 4:309-317.
A web-based program for computing various sample sizes.
costart.html is a JavaScript application for searching the Coding Symbols for a Thesaurus of Adverse Reaction Terms (COSTART, 5th Edition). The page can be saved to a local computer so that it can be accessed when that computer is not connected to the internet. To do this, open the COSTART page, select Save from the File menu, and make sure to save the file as the type "Web page, complete .htm, .html" (Internet Explorer).
R code for implementing survival analysis of longitudinally collected gene expression data. Methods are described in the linked paper by Rajicic N, Finkelstein DM, and Schoenfeld DA, submitted to Bioinformatics, May 2006.