I am a biologist wielding the tools of analysis and coding to better understand the human condition. While I have deep expertise in computational biology, I have mostly worked closely with experimental biologists to generate and interpret data, then develop new hypotheses and test them experimentally.
Most recently, I worked at Celsius Therapeutics as the lead computational biologist in inflammatory bowel disease (IBD). I co-led an interdisciplinary team responsible for identifying new targets using our large-scale in-house patient single-cell sequencing datasets. Our team discovered multiple new disease drivers and targets which became the basis of the IBD drug pipeline.
I was also the first hands-on computational biologist at Celsius and built much of the foundational infrastructure for the single-cell platform.
My post-doctoral research focused on understanding how changes to DNA during evolution result in phenotypic changes. While studying fundamental biological concepts, I also worked to improve analysis methods used for genome-wide measurements to better understand global patterns of mutation and related molecular phenotypes.
During my PhD work in biology at MIT, I studied the interplay between different post-transcriptional gene regulatory mechanisms.
Celsius's single-cell platform was built around the idea that we can't develop transformational therapies without better understanding the patient disease context. We partnered with Chris Buckley's group at Oxford University to deeply study the cellular drivers of, and responders to, inflammation across Crohn's Disease, Ulcerative Colitis and Rheumatoid Arthritis.
We collected multiple biopsies from each patient prior to initiating anti-TNF treatment and compared them to biopsies collected after six months of treatment, allowing us to better understand the dynamics at play.
When using high-throughput sequencing to detect mutations in a genome, scientists typically focus on point mutations, those affecting a single nucleotide or a small number of nucleotides at once. While point mutations are the most frequent type of mutation, larger mutations called "structural variants" can have a much larger effect on a cell, and in fact make up a majority of the bases that differ between any two individuals.
Compared with point mutations, detection of structural variants is much more difficult for this simple reason: a structural variant can connect any location in the genome to any other location in the genome, so the potential search space is enormous.
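To give a rough sense of scale, the arithmetic can be sketched as follows. This is a back-of-the-envelope illustration only: the genome length is an approximate figure, and the counting model (every pair of positions, joined in one of four orientations) is a simplifying assumption rather than how any particular caller enumerates candidates.

```python
# Approximate haploid human genome length in base pairs (assumed round figure).
GENOME_SIZE = 3_100_000_000

# Single-nucleotide substitutions: each position can change to 3 other bases.
point_candidates = 3 * GENOME_SIZE

# Structural variant junctions: any unordered pair of positions,
# joined in one of 4 strand orientations (simplified model).
pair_count = GENOME_SIZE * (GENOME_SIZE - 1) // 2
sv_candidates = 4 * pair_count

# How many times larger the junction search space is.
fold = sv_candidates / point_candidates

print(f"point candidates:    {point_candidates:.1e}")
print(f"junction candidates: {sv_candidates:.1e}")
print(f"fold difference:     {fold:.1e}")
```

Under these assumptions the candidate space grows with the square of the genome length rather than linearly, which is why additional information such as long-fragment linkage is so valuable for narrowing it down.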
I have developed new methods to detect and characterize structural variants using long-fragment "read cloud" sequencing data. My new methods are implemented as a package called GROC-SVs (github) for the Genome-Wide Reconstruction of Complex Structural Variants. GROC-SVs demonstrates that long-fragment information, when analyzed correctly, can dramatically improve both our sensitivity and specificity for detecting structural variants.
To demonstrate the usefulness of these methods, I collected, sequenced and analyzed the genome of a liposarcoma (a type of soft-tissue tumor). I found 500 structural variants resulting in a massive rearrangement of the tumor genome (see the image). A large proportion of these events occurred as part of a chromothripsis event, involving the shattering and random reassembly of chromosome 12. Because the long fragments from this tumor often span multiple breakpoints, I was able to use GROC-SVs to reconstruct the order of much of this complex structural variation directly and automatically from the sequencing data.
In addition, I sampled 7 different sites from within this liposarcoma, allowing inference into the evolutionary history of the structural variation within the genome (see below as well). From the resulting evolutionary tree, it was possible to infer that the majority of structural variation occurred in an early burst of genome instability, resulting in 400 structural variants that pre-date the major growth of this tumor. Then, the genome found a new stable point which it maintained for significant evolutionary time with practically no additional subclonal structural variation. This is one of our first direct observations of the evolution of substantial structural variation within a tumor, and demonstrates that a tumor genome can be remarkably stable even though it shows little resemblance to the normal genome.
Cancer is a disease defined by the evolutionary progression of normal into malignant cells. Much of our intuition about how evolution works is based on concepts worked out in animals, but these are not necessarily applicable in the context of the asexual (non-recombining) tumor. We recently published a review (Trends in Genetics) discussing our current understanding of tumor evolution, and contrasting it with evolution of multicellular, sexual organisms.
We are also working to further our understanding of tumor evolution, and have focused on studying a few tumors in great detail. For each tumor we study, we pull out multiple regions, each with its own evolutionary history. From whole-genome resequencing, we can build evolutionary trees relating the samples, and thus identify regions that are closely related, or independent cancerous (or pre-cancerous) lesions within the same patient.
One of our recent findings (Genome Medicine) was that the oncogene PIK3CA can be mutated multiple times within the same patient, leading to a pre-cancerous phenotype in multiple regions. For example, in the case of a bilateral mastectomy (where both breasts are removed even though only one is typically affected by a tumor), both the breast with the "diagnostic" tumor as well as the other breast may harbor small pre-cancerous growths that developed the PIK3CA mutation.
Substantial effort has been invested into understanding how DNA mutations between individuals and species result in functional changes. The end objective of this type of work is not only to shed light on the evolutionary processes at work but also to better predict the molecular basis for genetic diseases. Gene expression is an easy phenotype to measure globally, and therefore numerous studies have focused on so-called expression quantitative trait loci, or eQTLs: genetic differences between individuals that result in changes in gene expression.
Surprisingly, a large-scale study of eQTLs has not been performed in mammalian embryos. We harnessed high-throughput sequencing technologies to globally compare genotypes and gene expression patterns in hybrids of two distantly related mouse strains (Black6, the standard lab mouse strain, and Castaneus). We found that regulatory sequences in the close vicinity of a gene, so-called cis-regulatory elements (for example the promoter or 3′ untranslated region), were quite likely to harbor mutations that affected gene expression patterns. In stark contrast, mutations in upstream regulatory genes, so-called trans-regulatory factors (such as transcription factors and signaling proteins), resulted in virtually no detectable expression changes. This result, in conjunction with additional analyses, led us to conclude that, in embryos, mutations affecting gene expression are much more strongly restricted to the target gene and do not cascade into downstream expression changes.
Our other major result was the discovery of substantial influence of the mother's genotype on embryonic gene expression. While it has long been appreciated that maternal diet and activities can affect the development of a growing embryo, our results demonstrate that the mother's genetic makeup can also have similar widespread effects on embryonic development.
Since every cell in the body shares pretty much the exact same DNA, one of the fundamental questions in biology is how our body produces so many different types of cell from the same genetic blueprint. The short answer is that different genes get turned on and off in different parts of our body, but the long answer is much more complicated.
Previous work in my graduate labs showed that the 3′ end of messenger RNA varies between cell types, such that more quickly growing cells such as stem cells and cancer cells have shorter 3′ untranslated regions. These 3′ UTRs do not encode proteins and instead are used to regulate the mRNA, and so it seemed likely that alternative 3′ UTR usage would result in widespread gene regulatory changes. The hypothesis proposed by these previous papers was that longer UTRs would result in lower half-lives and translation rates because of negative regulatory motifs (e.g., microRNA target sites) that were unique to the long isoforms.
To address this hypothesis, I performed a transcription shut-off experiment and then applied a new 3′ UTR analysis protocol I developed. By measuring 3′ UTR usage over time, I could then calculate isoform-specific half-lives. Surprisingly, I found that mRNA half-lives for short and long isoforms correlated very highly. I was able to quantify the contribution of known sequence motifs to the differential stability of mRNA isoforms, confirming the sensitivity of my experiments and allowing me to identify new regulatory motifs.
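As an illustration of how a half-life can be derived from a shut-off time course, here is a minimal sketch. It assumes simple first-order decay and fits a straight line to log-transformed abundance. The function name, time points and abundance values are hypothetical, and this is not necessarily the analysis used in the work described above.

```python
import math

def half_life(times, abundances):
    """Estimate a half-life from a decay time course.

    Assumes first-order decay after transcription shut-off,
    abundance(t) = A0 * exp(-k * t), and fits ln(abundance) against time
    by ordinary least squares. The half-life is ln(2) / k.
    """
    logs = [math.log(a) for a in abundances]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx  # slope of the fitted line equals -k
    return math.log(2) / -slope

# Hypothetical, noise-free time course (hours after shut-off) for the
# short and long 3' UTR isoforms of a single gene.
times = [0, 1, 2, 4, 8]
short_isoform = [100 * 0.5 ** (t / 4.0) for t in times]  # simulated with a 4 h half-life
long_isoform = [80 * 0.5 ** (t / 3.0) for t in times]    # simulated with a 3 h half-life

short_hl = half_life(times, short_isoform)
long_hl = half_life(times, long_isoform)
print(round(short_hl, 2), round(long_hl, 2))
```

Repeating this per gene gives one half-life for each isoform, and the two sets of values can then be compared across genes.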
If mRNA half-lives were very similar, then perhaps the translation rates would differ, explaining the large-scale 3′ UTR shifts. I performed sucrose gradient centrifugation to separate mRNAs based on the number of ribosomes bound and again measured 3′ UTR isoforms, allowing me to calculate relative translation rates for short and long isoforms. Again surprisingly, I found that the translation rates of the short and long isoforms correlated very highly, and so we concluded that alternative 3′ UTR usage likely results from, rather than contributes to, changes in cellular proliferation.
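One common way to summarize gradient data of this kind is a signal-weighted mean ribosome load per isoform. The sketch below illustrates that idea under stated assumptions: the ribosome counts per fraction, the isoform signals and the function name are all hypothetical, and the original analysis may have used a different measure of relative translation.

```python
def mean_ribosome_load(fraction_ribosomes, fraction_signal):
    """Signal-weighted mean number of ribosomes bound per mRNA.

    fraction_ribosomes: approximate ribosomes per mRNA in each gradient fraction
    fraction_signal:    isoform abundance measured in each fraction
    """
    total = sum(fraction_signal)
    weighted = sum(r * s for r, s in zip(fraction_ribosomes, fraction_signal))
    return weighted / total

# Hypothetical gradient with five fractions, from free mRNA to heavy polysomes.
ribosomes = [0, 1, 2, 4, 8]

# Hypothetical isoform signal in each fraction for one gene.
short_signal = [10, 20, 30, 30, 10]
long_signal = [12, 18, 32, 28, 10]

short_load = mean_ribosome_load(ribosomes, short_signal)
long_load = mean_ribosome_load(ribosomes, long_signal)

# Relative translation of the long isoform versus the short isoform.
ratio = long_load / short_load
print(short_load, long_load, round(ratio, 3))
```

A ratio close to 1 for most genes would correspond to the observation that short and long isoforms are translated at similar rates.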