I am a postdoctoral researcher in Arend Sidow's group at the Stanford University School of Medicine. I am supported by the National Institute of Standards and Technology through a new partnership with Stanford called the Joint Initiative for Metrology in Biology (or JIMB).
My current research focuses on understanding how changes to DNA during evolution result in phenotypic changes. While studying fundamental biological concepts, I am also working to improve analysis methods used for genome-wide measurements to better understand global patterns of mutation and related molecular phenotypes.
During my PhD work in biology at MIT, I studied the interplay between different post-transcriptional gene regulatory mechanisms.
I am a hybrid wet/dry researcher: I both run lab experiments and analyze the results of large-scale genomics experiments.
When using high-throughput sequencing to detect mutations in a genome, scientists typically focus on point mutations, those affecting one or a small number of nucleotides at once. While point mutations are the most frequent type of mutation, larger mutations called "structural variants" can have a much larger effect on a cell, and in fact make up a majority of the bases that differ between any two individuals.
Compared with point mutations, detection of structural variants is much more difficult for this simple reason: a structural variant can connect any location in the genome to any other location in the genome, so the potential search space is enormous.
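To make the size of that search space concrete, here is a rough back-of-the-envelope sketch. It assumes a round figure of 3 billion bases for the human genome and counts only unordered pairs of positions, ignoring breakpoint orientation; the function name is made up for illustration.

```python
def candidate_breakpoint_pairs(genome_size_bp: int) -> int:
    """Count unordered pairs of distinct positions in a genome of this size."""
    return genome_size_bp * (genome_size_bp - 1) // 2


# Assumption: a round 3 Gb figure for the human genome.
HUMAN_GENOME_BP = 3_000_000_000

# A point mutation is located by one position; a structural variant
# joins two positions, so candidates grow with the square of genome size.
point_sites = HUMAN_GENOME_BP
sv_pairs = candidate_breakpoint_pairs(HUMAN_GENOME_BP)

print(f"candidate point-mutation sites: {point_sites:.2e}")
print(f"candidate breakpoint pairs:     {sv_pairs:.2e}")
```

Under these assumptions there are roughly a billion times more candidate breakpoint pairs than candidate point-mutation sites, which is why extra information such as long fragments helps narrow the search.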
I have developed new methods to detect and characterize structural variants using long-fragment "read cloud" sequencing data. My new methods are implemented as a package called GROC-SVs (github) for the Genome-Wide Reconstruction of Complex Structural Variants. GROC-SVs demonstrates that long-fragment information, when analyzed correctly, can dramatically improve both our sensitivity and specificity for detecting structural variants.
To demonstrate the usefulness of these methods, I collected, sequenced and analyzed the genome of a liposarcoma (a type of soft-tissue tumor). I found 500 structural variants resulting in a massive rearrangement of the tumor genome (see the image). A large proportion of these events occurred as part of a chromothripsis event, involving the shattering and random reassembly of chromosome 12. Because the long fragments from this tumor often span multiple breakpoints, I was able to use GROC-SVs to reconstruct the order of much of this complex structural variation directly and automatically from the sequencing data.
In addition, I sampled 7 different sites from within this liposarcoma, allowing inference of the evolutionary history of the structural variation within the genome (see below as well). From the resulting evolutionary tree, it was possible to infer that the majority of structural variation occurred in an early burst of genome instability, resulting in 400 structural variants that pre-date the major growth of this tumor. Then, the genome found a new stable point, which it maintained for significant evolutionary time with practically no additional subclonal structural variation. This is one of our first direct observations of the evolution of substantial structural variation within a tumor, and it demonstrates that a tumor genome can be remarkably stable even though it shows little resemblance to the normal human genome.
Cancer is a disease defined by the evolutionary progression of normal into malignant cells. Much of our intuition about how evolution works is based on concepts worked out in animals, but these are not necessarily applicable in the context of the asexual (non-recombining) tumor. We recently published a review (Trends in Genetics) discussing our current understanding of tumor evolution, and contrasting it with evolution of multicellular, sexual organisms.
We are also working to further our understanding of tumor evolution, and have focused on studying a few tumors in great detail. For each tumor we study, we pull out multiple regions, each with its own evolutionary history. From whole-genome resequencing, we can build evolutionary trees relating the samples, and thus identify regions that are closely related, or independent cancerous (or pre-cancerous) lesions within the same patient.
One of our recent findings (Genome Medicine) was that the oncogene PIK3CA can be mutated multiple times within the same patient, leading to a pre-cancerous phenotype in multiple regions. For example, in the case of a bilateral mastectomy (where both breasts are removed even though only one is typically affected by a tumor), both the breast with the "diagnostic" tumor and the other breast may harbor small pre-cancerous growths that developed the PIK3CA mutation.
Substantial effort has been invested into understanding how DNA mutations between individuals and species result in functional changes. The end objective of this type of work is not only to shed light on the evolutionary processes at work but also to better predict the molecular basis for genetic diseases. Gene expression is an easy phenotype to measure globally, and therefore numerous studies have focused on so-called expression quantitative trait loci, or eQTLs: genetic differences between individuals that result in changes in gene expression.
Surprisingly, a large-scale study of eQTLs has not been performed in mammalian embryos. We harnessed high-throughput sequencing technologies to globally compare genotypes and gene expression patterns in hybrids of two distantly related mouse strains (Black6, the standard lab mouse strain, and Castaneus). We found that regulatory sequences in the close vicinity of a gene, so-called cis-regulatory elements (for example the promoter or 3′ untranslated region), were quite likely to harbor mutations that affected gene expression patterns. In stark contrast, mutations in upstream regulatory genes, so-called trans-regulatory factors (such as transcription factors and signaling proteins), resulted in virtually no detectable expression changes. This result, in conjunction with additional analyses, led us to conclude that, in embryos, mutations affecting gene expression are much more strongly restricted to the target gene and do not cascade into downstream expression changes.
Our other major result was the discovery of substantial influence of the mother's genotype on embryonic gene expression. While it has long been appreciated that maternal diet and activities can affect the development of a growing embryo, our results demonstrate that the mother's genetic makeup can also have similar widespread effects on embryonic development.
Since every cell in the body shares pretty much the exact same DNA, one of the fundamental questions in biology is how our body produces so many different types of cell from the same genetic blueprint. The short answer is that different genes get turned on and off in different parts of our body, but the long answer is much more complicated.
Previous work in my graduate labs showed that the 3′ end of messenger RNA varies between cell types, such that more quickly growing cells, such as stem cells and cancer cells, have shorter 3′ untranslated regions (UTRs). These 3′ UTRs do not encode proteins and instead are used to regulate the mRNA, and so it seemed likely that alternative 3′ UTR usage would result in widespread gene regulatory changes. The hypothesis proposed by these previous papers was that longer UTRs would result in lower half-lives and translation rates because of negative regulatory motifs (e.g., microRNA target sites) that were unique to the long isoforms.
To address this hypothesis, I performed a transcription shut-off experiment and then applied a new 3′ UTR analysis protocol I developed. By measuring 3′ UTR usage over time, I could then calculate isoform-specific half-lives. Surprisingly, I found that mRNA half-lives for short and long isoforms correlated very highly. I was able to quantify the contribution of known sequence motifs to the differential stability of mRNA isoforms, confirming the sensitivity of my experiments and allowing me to identify new regulatory motifs.
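As a rough illustration of how a half-life can be derived from a shut-off time course, the sketch below assumes simple first-order decay and fits a straight line to log abundance over time. This is a generic textbook calculation, not necessarily the fitting procedure used in the study; the function name, time points, and abundances are all made up for illustration.

```python
import math


def half_life(times, abundances):
    """Estimate a half-life by a least-squares fit of ln(abundance) vs. time.

    Assumes first-order decay: abundance(t) = A0 * exp(-k * t).
    """
    logs = [math.log(a) for a in abundances]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx  # the fitted slope equals -k
    if slope >= 0:
        raise ValueError("abundance does not decay over the time course")
    return math.log(2) / -slope


# Hypothetical time course (hours after transcription shut-off).
times = [0, 2, 4, 8]
# Synthetic abundances for two isoforms of one gene, generated with
# known half-lives of 4 h (short isoform) and 3 h (long isoform).
short_isoform = [100 * 0.5 ** (t / 4) for t in times]
long_isoform = [80 * 0.5 ** (t / 3) for t in times]

print(round(half_life(times, short_isoform), 3))
print(round(half_life(times, long_isoform), 3))
```

Computing one half-life per isoform for every gene gives paired values that can then be correlated across genes, which is the kind of comparison described above.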
If mRNA half-lives were very similar, then perhaps the translation rates would differ, explaining the large-scale 3′ UTR shifts. I performed sucrose gradient centrifugation to separate mRNAs based on the number of ribosomes bound and again measured 3′ UTR isoforms, allowing me to calculate relative translation rates for short and long isoforms. Again surprisingly, I found that the translation rates of short and long isoforms correlated very highly, and so we concluded that alternative 3′ UTR usage likely results from, rather than contributes to, changes in cellular proliferation.
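One simple way to turn gradient fractions into a relative translation measure is an abundance-weighted mean ribosome load per isoform. The sketch below is only an illustration of that idea and may differ from the calculation actually used; the function name, fraction assignments, and abundances are hypothetical.

```python
import math


def mean_ribosome_load(ribosomes_per_fraction, abundance_per_fraction):
    """Abundance-weighted mean number of ribosomes bound per mRNA."""
    total = sum(abundance_per_fraction)
    if total <= 0:
        raise ValueError("isoform has no signal in any fraction")
    weighted = sum(
        r * a for r, a in zip(ribosomes_per_fraction, abundance_per_fraction)
    )
    return weighted / total


# Hypothetical gradient: approximate ribosomes bound in each fraction.
ribosomes = [0, 1, 2, 4, 8]
# Hypothetical isoform abundances measured in each fraction.
short_isoform = [10, 20, 30, 30, 10]
long_isoform = [20, 20, 20, 20, 20]

short_load = mean_ribosome_load(ribosomes, short_isoform)
long_load = mean_ribosome_load(ribosomes, long_isoform)

# log2 ratio near 0 means the two isoforms are translated similarly.
log2_ratio = math.log2(long_load / short_load)
print(short_load, long_load, round(log2_ratio, 3))
```

Because the short and long isoforms of a gene share the same coding sequence, their ribosome loads can be compared directly without correcting for open reading frame length.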