Research

Overview:

Intrinsically disordered proteins (IDPs) and regions (IDRs) make up around one-third of all eukaryotic proteomes and are found in a wide range of proteins critical for cellular function. Despite their abundance, we lack a general understanding of how function is encoded into IDRs. This is in contrast to folded domains, where decades of work have led to powerful bioinformatics tools that can identify specific functionally annotated domains directly from sequence.

Our goal is to understand how function is encoded into disordered sequences using a combination of computational and experimental approaches. Rather than treating IDRs as protein dark matter, we wish to understand - at a mechanistic level - how IDRs mediate function. By developing general approaches, we will be able to uncover new mechanistic insight, explore how IDRs vary across evolution, and predict new protein functions from sequence alone. In particular, we are trying to decode the sequence-ensemble-function relationship: how the amino acid sequence of an IDR dictates its conformational ensemble, and how sequence and ensemble interact to ultimately dictate function.

From a clinical perspective, mutations in IDRs are substantially over-represented in many diseases, including neurodegenerative conditions, rare genetic disorders, and many types of cancer. By approaching our problems through a mechanistic lens, we hope to provide insight into the molecular etiology of these diseases, ultimately opening new avenues for treatment.

We combine computational and experimental approaches to explore the relationship between sequence and function in IDRs. Specifically, we use all-atom and coarse-grained simulations coupled with a range of bioinformatics approaches and quantitative cell biology. A major focus of the lab is on the development of robust computational tools using industry-standard practices. With a range of methods at our disposal, we can integrate many different types of data to better understand complex biological systems in a quantitative, mechanistic, and predictive way.



Projects:

What determines molecular specificity in IDR-mediated interactions?

IDRs lack a fixed 3D structure, which might seem to imply that they cannot engage in specific molecular recognition. We are interested in understanding whether and how IDRs contribute to molecular specificity, with a focus on the emerging idea of chemical specificity rather than structural specificity.

We are developing conceptual and computational tools to understand how chemical specificity influences IDR-mediated interactions.

How do IDRs interact transiently with folded domains?

Although historically IDRs have been studied as isolated proteins, the vast majority of IDRs are found in tandem with folded domains. We and others have identified a number of examples in which IDRs interact directly with folded domains in a regulatory capacity.

We are developing methodologies to investigate the interplay between folded domains and IDRs in a high-throughput manner. This involves new computational approaches for high-throughput all-atom simulations in conjunction with machine learning to extract key modes of interactions through ‘big-data biophysics.’ In addition, we are pursuing bioinformatics-based approaches to understand the coupling between IDRs and folded domains. Finally, we are developing new computational methods to interpret biophysical methods that study proteins with both folded and disordered domains.

What functionally conserved signatures exist within intrinsically disordered regions?

IDPs are often considered to be poorly conserved across evolution. We take a dim view of this perspective. While specific amino acid sequences may be less well conserved than in folded domains, we have numerous examples in which key functionally relevant sequence features are strongly conserved across evolution. Using integrative computational tools across multiple length scales, we are beginning to understand how IDRs are conserved, allowing us to identify and annotate functionally important features within IDRs. In particular, we are interested in whether and how deep-learning-based approaches can inform on understanding sequence

How is intracellular phase separation used to mediate biological function?

Over the last ten years, intracellular phase transitions have emerged as a physical framework through which many biologically important processes can be understood. While much work has been done on the physical chemistry underlying phase separation, we are interested in asking how biology can use phase transitions to mediate specific types of cellular function.

In particular, we are interested in theoretical models to understand how phase transitions may facilitate biological information processing, cellular sensing, and new environments for specific types of chemistry.


Approach:

We develop computational methods that allow us to ask new types of questions

The types of methods we develop are broad and include:

  • New ways to analyze and run physics-based simulations

  • New ways to analyze protein sequence

  • New ways to analyze and integrate experimental data

The lab follows industry-standard software practices (version control, continuous integration testing, documentation) with development occurring primarily in the Python programming language. The development of maintainable, stable, and well-documented code is critical to our scientific mission. Furthermore, these skills are of immense cross-discipline value inside and outside of academia.
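As a purely illustrative sketch of what a small, documented, and testable sequence-analysis function might look like in Python, the example below computes two widely used charge descriptors for a disordered sequence: the fraction of charged residues (FCR) and the net charge per residue (NCPR). The function name `charge_summary` and the `ChargeSummary` container are hypothetical and are not taken from any of the lab's tools. The sketch also makes a simplifying assumption, flagged in the code, that only K and R are counted as positive and only D and E as negative.

```python
from dataclasses import dataclass

# Assumption for this sketch: K/R are treated as positively charged and
# D/E as negatively charged; histidine is treated as neutral.
POSITIVE = frozenset("KR")
NEGATIVE = frozenset("DE")
VALID_RESIDUES = frozenset("ACDEFGHIKLMNPQRSTVWY")


@dataclass(frozen=True)
class ChargeSummary:
    """Simple charge descriptors for one protein sequence (hypothetical type)."""
    length: int
    fcr: float   # fraction of charged residues: (positive + negative) / length
    ncpr: float  # net charge per residue: (positive - negative) / length


def charge_summary(sequence: str) -> ChargeSummary:
    """Return length, FCR, and NCPR for a one-letter amino acid sequence."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("sequence is empty")

    # Reject anything outside the 20 standard amino acids rather than
    # silently ignoring it.
    unknown = set(seq) - VALID_RESIDUES
    if unknown:
        raise ValueError(f"non-standard residues found: {sorted(unknown)}")

    n_pos = sum(residue in POSITIVE for residue in seq)
    n_neg = sum(residue in NEGATIVE for residue in seq)
    n = len(seq)
    return ChargeSummary(length=n, fcr=(n_pos + n_neg) / n, ncpr=(n_pos - n_neg) / n)


if __name__ == "__main__":
    # Made-up sequence, used only to show how the function is called.
    print(charge_summary("GSKKEEDRGSGS"))
```

A function of this size is easy to cover with unit tests that run automatically under continuous integration, which is the kind of practice the paragraph above describes.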

We integrate experimental and computational methods

In addition to a broad range of computational methods, we will also be pursuing experimental approaches to test computational predictions.

We collaborate with the best in the world

Collaboration is at the core of what drives effective science, and we pride ourselves on pursuing integrative, collaborative projects with like-minded colleagues around the world. We have a number of well-defined collaborations in place and are excited to sit at the cutting edge of quantitative biology, biophysics, and computer science.