Tools

The lab uses a range of computational approaches, many of which we are actively developing. A subset of these tools is listed below. The development of novel tools and approaches is central to our ability to ask new types of questions.

 

SIMULATION TOOLS

We use and develop several tools for performing molecular simulations. Of note, PIMMS is our in-house lattice-based simulation engine, and SOURSOP is our suite of tools for analyzing simulations of disordered proteins.

 

PIMMS

PIMMS (Polymer Interactions in Multicomponent MixtureS) is a high-performance coarse-grained lattice-based simulation engine developed explicitly for studying phase transitions of complex heteropolymers, such as intrinsically disordered proteins. PIMMS was developed by Alex while in the Pappu lab and is a MOLSSI-sponsored project.


SOURSOP

SOURSOP (Simulation analysis Of Unstructured and disordered RegionS Orchestrated in Python) is an integrative suite for analyzing all-atom simulation trajectories of intrinsically disordered proteins. While SOURSOP was developed with CAMPARI Monte Carlo simulations in mind, it can be used with almost any simulation engine and trajectory type.

The initial prototype for SOURSOP was developed by Alex while in the Pappu lab (as a MOLSSI-sponsored project) and completed by Pappu lab member Jared Lalmansingh.

Paper: Lalmansingh et al. JCTC (2023)

Documentation: https://soursop.readthedocs.io/

Code: https://github.com/holehouse-lab/soursop

PyPI entry: https://pypi.org/project/soursop/



SolutionSpaceScanner

SolutionSpaceScanner is a Python toolkit that includes a command-line tool for re-wiring the solvation behaviour of polypeptides by creating customizable parameter files for the ABSINTH implicit solvent model. SolutionSpaceScanner was developed with the Sukenik lab at UC Merced and is a MOLSSI-sponsored project.

Paper: Holehouse & Sukenik JCTC (2020)

Documentation: https://solutionspacescanner.readthedocs.io/

Code: https://github.com/holehouse-lab/solutionspacescanner

PyPI entry: https://pypi.org/project/solutionspacescanner/


SEQUENCE DESIGN TOOLS

The lab also develops tools for the rational design of disordered protein regions. Our first such tool (GOOSE) is online as of October 2023!


GOOSE

GOOSE is a package for rationally designing disordered regions with specific sequence properties. The associated preprint, code, and documentation are all available. GOOSE was used in the design of sequence libraries for the ALBATROSS preprint, as well as several unpublished projects!

Preprint: Emenecker & Guadalupe et al. bioRxiv (2023)

Code: GitHub repository

Documentation: ReadTheDocs

Google Colab Notebook: Colab notebooks (in alpha - please report any issues!)


SEQUENCE ANALYSIS TOOLS

In addition to tools associated with molecular simulations and sequence design, a significant focus of the Holehouse lab is the development of tools to analyze protein sequence information. Below are several of our lab-developed methods. All are developed in Python.


ALBATROSS

ALBATROSS is a collection of deep learning models that enable the direct prediction of disordered protein dimensions from sequence. ALBATROSS is implemented inside sparrow, our general sequence analysis framework, and predictions are also available via several Google Colab notebooks and the metapredict.net webserver.

Paper: Lotthammer, Ginell, and Griffith et al. Nature Methods (2024)

Code: GitHub repo for sparrow

Google Colab notebooks: Colab notebooks

Webserver: https://metapredict.net/


metapredict

metapredict is our high-performance, deep-learning-based disorder predictor. It provides both a Python API and a command-line tool for working with FASTA files or directly downloading sequences from the UniProt database. In addition, we provide a web server for individual sequences at http://metapredict.net/ and a Google Colab notebook for multiple sequences.

metapredict is a top-10 CAID predictor (see DISORDER-PDB) and is many orders of magnitude faster than other predictors with equivalent accuracy.

NOTE: As of May 2023 the default metapredict implementation has been updated to metapredict V2-FF. This is identical to metapredict V2 in terms of disorder predictions but offers dramatic improvements in performance on CPUs and GPUs.

Metapredict was developed by graduate students Ryan Emenecker and Dan Griffith.

Paper: Emenecker et al. Biophysical Journal (2021) and follow-up permanent preprint (Metapredict V2 - Emenecker et al. bioRxiv (2022)).

Supporting data: GitHub supporting data repository

Documentation: metapredict documentation

Code: GitHub repository and PyPI project

Web server (single sequences): https://metapredict.net/

Google Colab notebook (multiple sequences): Colab notebook
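
As a rough illustration of working with per-residue disorder scores from Python, the sketch below groups a list of scores into contiguous disordered regions. The call to `metapredict.predict_disorder` reflects the package's documented Python API as we understand it. The helper names `scores_to_regions` and `predict_regions`, the 0.5 threshold, and the `min_length` parameter are assumptions made for this example. metapredict provides its own disorder-domain prediction, which should be preferred in practice; this is only a minimal sketch of the idea.

```python
from typing import List, Sequence, Tuple


def scores_to_regions(
    scores: Sequence[float], threshold: float = 0.5, min_length: int = 1
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index pairs for runs of residues
    whose score is at or above `threshold`.

    The threshold and min_length defaults are illustrative assumptions.
    """
    regions = []
    start = None  # index where the current run began, if any
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        elif start is not None:
            # The run ended at the previous residue
            if i - start >= min_length:
                regions.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence
    if start is not None and len(scores) - start >= min_length:
        regions.append((start, len(scores)))
    return regions


def predict_regions(
    sequence: str, threshold: float = 0.5, min_length: int = 1
) -> List[Tuple[int, int]]:
    """Hypothetical convenience wrapper around metapredict (assumption)."""
    # Imported here so the helper above works without metapredict installed.
    # Requires: pip install metapredict
    import metapredict as meta

    scores = meta.predict_disorder(sequence)
    return scores_to_regions(scores, threshold, min_length)
```

With metapredict installed, `predict_regions(sequence)` would return index pairs that can be used to slice disordered stretches out of `sequence`.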


SHEPHARD

SHEPHARD is our general framework for organizing and annotating large-scale protein-based datasets. SHEPHARD was developed by Garrett Ginell.

Paper: Ginell et al. Bioinformatics (2023)

Supporting data: GitHub supporting data repository

Documentation: SHEPHARD documentation

Code: GitHub repository and PyPI project

Colab notebooks: Annotated human proteome, general examples



PARROT

PARROT is a general-purpose deep learning platform for mapping between amino acid sequence and some arbitrary sequence annotation. PARROT was developed by graduate student Dan Griffith, and the logo was created by undergrad Shub Minhas.

Paper: Griffith & Holehouse eLife (2021)

Supporting data: GitHub supporting data repository

Documentation: PARROT documentation (includes an introduction to deep learning for sequence prediction)

Code: GitHub repository and PyPI project



protfasta

protfasta is a Python API and command-line tool for reading, parsing, and sanitizing protein FASTA files. protfasta was developed by Alex and has been used without issue on datasets of millions of protein sequences.

Documentation: protfasta documentation

Code: GitHub repository and PyPI project

Zenodo: Zenodo record

DOI: 10.5281/zenodo.4482762

How to cite: Please cite the DOI above as well as the version used.
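
To make "reading, parsing, and sanitizing" concrete, here is a standard-library-only sketch of what those steps can involve. It is not protfasta's implementation or API (protfasta itself exposes a `read_fasta` function for this job). The function names `parse_fasta` and `sanitize` and the conversion table are assumptions chosen for illustration.

```python
from typing import Dict

# The 20 standard amino acids
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")

# Illustrative conversion table (an assumption for this sketch):
# map ambiguous or non-standard symbols to a standard residue,
# and drop stop or gap characters by mapping them to an empty string.
CONVERSIONS = {"B": "N", "Z": "Q", "U": "C", "X": "G", "*": "", "-": ""}


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records: Dict[str, str] = {}
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            if header in records:
                raise ValueError(f"duplicate header: {header}")
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first header")
        else:
            chunks.append(line.upper())  # sequences may span several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


def sanitize(sequence: str) -> str:
    """Convert known non-standard symbols and reject anything else."""
    cleaned = []
    for residue in sequence:
        residue = CONVERSIONS.get(residue, residue)
        if residue and residue not in STANDARD:
            raise ValueError(f"invalid residue: {residue}")
        cleaned.append(residue)
    return "".join(cleaned)
```

A typical use would be `{h: sanitize(s) for h, s in parse_fasta(text).items()}`. Whether invalid residues should raise an error, be converted, or be dropped is a design choice, and this sketch takes the strict route for unknown symbols.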