for my PhD

DER Finder

DER Finder, or "Differentially Expressed Region Finder," is a method for identifying genomic regions that are significantly differentially expressed between populations, based on the results of an RNA-seq experiment.

paper, in Biostatistics (open access) | GitHub repo


Software for interactive differential expression analysis and visualization of transcriptomes that were assembled from RNA-seq data.

paper, in Nature Biotechnology | open access preprint | GitHub repo


Flexible, lightweight RNA-seq read simulator. Particularly useful for generating datasets with differential expression at the transcript level.

paper, in Bioinformatics | open access preprint | GitHub repo


ReCount is a repository we maintain of information about publicly available, research-quality RNA-seq datasets. For each experiment in the repository, ReCount includes gene-by-sample count tables, R ExpressionSet objects, and links to the raw data.

paper, in BMC Bioinformatics (open access) | website

side projects


Created at Hacker School with my friend David, CourseCat is a Flask app to help people answer the question "What's a good online resource for learning about [TOPIC]?" Users can submit online courses (or other resources) and file them under topics. Users can then upvote or downvote a course under each of its topics, depending on whether it is a good resource for that topic. On each topic page, courses are smartly ranked so that great courses for that topic appear at the top. The app is live and fully functional, but it's still a work in progress, so the current courses and topics are placeholders.
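
To illustrate what vote-based ranking can look like, here is a minimal sketch that orders courses by the lower bound of the Wilson score interval on their upvote fraction. This is an assumption chosen for illustration, not necessarily the ranking CourseCat uses; the function names and the vote-count dictionary are hypothetical.

```python
import math


def wilson_lower_bound(upvotes, downvotes, z=1.96):
    """Lower bound of the Wilson score interval for the upvote fraction.

    z=1.96 corresponds to roughly 95% confidence. Courses with no votes
    score 0.0 so they sort below anything with evidence in its favor.
    """
    n = upvotes + downvotes
    if n == 0:
        return 0.0
    p = upvotes / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * n)) / n)
    return (centre - margin) / denom


def rank_courses(votes):
    """Return course names sorted best-first.

    `votes` maps a course name to an (upvotes, downvotes) tuple for one topic.
    """
    return sorted(votes, key=lambda c: wilson_lower_bound(*votes[c]), reverse=True)


if __name__ == "__main__":
    # Hypothetical vote counts for a single topic page.
    topic_votes = {"Course A": (1, 0), "Course B": (90, 10), "Course C": (5, 5)}
    for name in rank_courses(topic_votes):
        print(name, round(wilson_lower_bound(*topic_votes[name]), 3))
```

The point of a lower bound rather than a raw upvote fraction is that a course with a single upvote does not outrank a course with many votes and a strong, well-supported rating.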

app | GitHub repo

Data analysis: gender and GitHub repo ownership

I collected data on 170,000 public GitHub repositories and predicted repo owners' genders from the first name they display on their GitHub profile. I wanted to see whether the gender breakdown of repo owners differed by programming language (it didn't). The project relied on Python data analysis libraries and D3.js. The blog post made an appearance in FiveThirtyEight's weekly data journalism roundup.

full blog post | GitHub repo

Parsing Batch Job Runtimes from Emails

Python script that searches your Gmail account for a set of emails sent by the Sun Grid Engine scheduling system and analyzes the runtimes and memory footprints of the jobs those emails describe.
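
As a rough sketch of the parsing step only (fetching mail from Gmail is left out), the snippet below pulls a runtime and a peak memory figure out of one email body. It assumes the body contains lines such as "Wallclock Time = 01:02:03" and "Max vmem = 2.000G"; that layout, and every name in the code, is an assumption for illustration rather than a description of the script in the repo.

```python
import re

# Assumed layout: one "Field = value" pair per line in the email body.
FIELD = re.compile(r"^\s*(Wallclock Time|Max vmem)\s*=\s*(\S+)\s*$", re.MULTILINE)
VMEM = re.compile(r"^([\d.]+)([KMG])$")
TO_MB = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}


def duration_to_seconds(text):
    """Convert "HH:MM:SS" or "DD:HH:MM:SS" to a number of seconds."""
    parts = [int(p) for p in text.split(":")]
    # Left-pad with zeros so we always have days, hours, minutes, seconds.
    days, hours, minutes, seconds = [0] * (4 - len(parts)) + parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def vmem_to_mb(text):
    """Convert a value like "2.000G" to megabytes; None if unparseable."""
    match = VMEM.match(text or "")
    if match is None:
        return None
    return float(match.group(1)) * TO_MB[match.group(2)]


def parse_job_email(body):
    """Extract runtime (seconds) and peak memory (MB) from one email body."""
    fields = dict(FIELD.findall(body))
    runtime = None
    if "Wallclock Time" in fields:
        runtime = duration_to_seconds(fields["Wallclock Time"])
    return {"runtime_s": runtime, "max_vmem_mb": vmem_to_mb(fields.get("Max vmem"))}


if __name__ == "__main__":
    # Hypothetical email body in the assumed layout.
    sample = "Job 12345 (myjob) Complete\n Wallclock Time   = 01:02:03\n Max vmem         = 2.000G\n"
    print(parse_job_email(sample))
```

Missing or unparseable fields come back as None instead of raising, so a batch of emails can be summarized even when a few of them are incomplete.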

figshare | GitHub repo

Random Calendar Generator

Python script that takes office hour slots, plus teaching assistants and their availabilities, as input and randomly fills each office hour slot with a TA available at that time. It automatically creates a Google Calendar for each TA and adds it to the master calendar using the Google Calendar API. My friend Hilary initially implemented this project in R; I converted it to Python in order to use the Google Calendar API and added the availability checking. I wrote this while I was at Hacker School.
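
The random-fill step can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the code in the repo: the slot labels, the availability dictionary, and the function name are hypothetical, and the Google Calendar API calls are omitted entirely.

```python
import random


def fill_slots(slots, availability, seed=None):
    """Randomly assign one available TA to each office hour slot.

    `slots` is a list of slot labels; `availability` maps each TA's name
    to the set of slots that TA can cover. Slots nobody can cover are
    assigned None so they are easy to spot afterwards.
    """
    rng = random.Random(seed)  # pass a seed for a reproducible schedule
    schedule = {}
    for slot in slots:
        # Sort so that a given seed always yields the same schedule.
        candidates = sorted(ta for ta, free in availability.items() if slot in free)
        schedule[slot] = rng.choice(candidates) if candidates else None
    return schedule


if __name__ == "__main__":
    # Hypothetical input data.
    slots = ["Mon 10am", "Tue 2pm", "Fri 4pm"]
    availability = {
        "TA 1": {"Mon 10am", "Tue 2pm"},
        "TA 2": {"Mon 10am"},
        "TA 3": {"Tue 2pm"},
    }
    for slot, ta in fill_slots(slots, availability, seed=0).items():
        print(slot, "->", ta)
```

This version draws each slot independently, so it makes no attempt to balance the number of slots per TA; that would need an extra constraint on top of the availability check.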

GitHub repo

Committee Checker

Shiny app that checks the validity of a JHSPH exam committee, either for the preliminary oral exam or the final thesis defense.

app | GitHub repo