computational biology and bioinformatics tools

getting started

The Missing Semester of your CS Education  Getting comfortable with the command line, version control, text editors, etc. can save so much time for actual computational work and data analysis!
Happy Belly Bioinformatics  Approachable guides to Unix and R. More on it in a publication.
The Carpentries  Science-focused organization for teaching coding and data analysis. Though built around formal workshops, the curricula can be pretty helpful too.

sequence-focused databases

NCBI  Central NCBI databases for nucleotide and protein sequences, along with raw sequencing data (among other things).
ENA A European nucleotide database similar to NCBI's. Again, accompanied by many additional databases (including some for MS and metabolomics data, which don't yet have universal repositories).
JGI/IMG A database for well-annotated microbial (meta)genomes and transcriptomes from the DOE. Downloadable sequences and metadata.
UniProt A central repository for well-annotated protein sequences (and metadata, and AlphaFold predictions). Likely to contain anything from NCBI's NP and NR databases, and the ENA, but not necessarily NCBI WGS or JGI/IMG.
InterPro  An EBI-hosted central repository for protein family information (includes Pfam and TIGRFAMs data); also includes classification tools, profile hidden Markov models, etc. The EBI also hosts the AlphaFold DB.

sequence-related tools

Basic nucleotide-based sequence analysis/manipulation. Everyone uses it for generating the reverse complement of sequences because it is the first Google result.
Jalview  Free Java-based tool for visualizing and manipulating sequences. OS-agnostic, and lighter-weight than things like UGENE for basic alignment and visualization.
EFI-EST A web tool to construct sequence similarity networks for your protein family. Designed for UniProt integration, but can handle custom .fasta input.
BLAST Alignment-based search for similar protein or nucleic acid sequences. Servers targeting specific databases available via NCBI, UniProt, IMG (login may be required for full functionality), etc.
Diamond A protein alignment program (download only; it also handles translated DNA searches) that is faster than BLAST against larger databases but is still less widespread.
MAFFT A tool for accurate protein and nucleic acid sequence alignment. Server available (and it can also be integrated into Jalview and Geneious).
MUSCLE Another tool for accurate alignment of protein or nucleic acid sequences. Download for local install or run via Jalview or Geneious. Either MAFFT or MUSCLE is generally a better choice than ClustalΩ in terms of accuracy.
A server with a bunch of alignment and phylogeny tools.
HMMER A toolset for profile hidden Markov model use and construction. EBI provides a more limited online version of these tools.
InterProScan EBI search tool to identify protein or domain families from the InterPro database in a protein sequence.
HHblits and HHpred Part of an extensive toolset from the Max Planck Institute, with a web interface. HHblits searches HMM databases for remote homologs (while HHpred also takes structure into account), and many additional tools (MAFFT and MUSCLE installs, secondary structure predictors, CLANS, DIAMOND-DeepClust and MMseqs2 implementations, etc.) are also available.
Expasy From the SIB, also many online tools for predicting protein sequence properties. Frequently used: STRING, ProtParam, PeptideMass.
SignalP Predict signal peptides in proteins.
DeepTMHMM Predict transmembrane domains in proteins.
TopCons More transmembrane prediction.
CAVER Predicting and exploring tunnels and channels in protein structures.
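The reverse-complement operation that the first tool in this list mostly gets used for is simple enough to run offline in a few lines of Python. This is a minimal sketch, assuming DNA input that may contain IUPAC ambiguity codes (RNA "U" is deliberately not handled); it is not the implementation behind any of the web tools listed here.

```python
# Complement table for DNA, including IUPAC ambiguity codes:
# A<->T, C<->G, R(A/G)<->Y(C/T), K(G/T)<->M(A/C), B<->V, D<->H; S, W, N map to themselves.
# Assumption: DNA only, so "U" is rejected rather than complemented.
_BASES = "ACGTRYKMBVDHSWNacgtrykmbvdhswn"
_COMPLEMENT = str.maketrans(_BASES, "TGCAYRMKVBHDSWNtgcayrmkvbhdswn")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, preserving case."""
    # Drop whitespace/newlines that sneak in when pasting from FASTA files.
    cleaned = "".join(seq.split())
    unknown = set(cleaned) - set(_BASES)
    if unknown:
        raise ValueError(f"Unexpected characters: {sorted(unknown)}")
    # Complement every base, then reverse the string.
    return cleaned.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGCGTAAC"))
```

Raising an error on unexpected characters (rather than silently passing them through) is a design choice: it catches protein sequences or stray FASTA headers pasted in by mistake.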

protein structure databases

RCSB PDB Repository for protein structures (mostly X-ray crystallography and CryoEM but some NMR); now also includes computed structure models (CSMs), which combine AlphaFoldDB and a similar RoseTTAFold dataset. (Note that these predicted structures don't show up in the results by default!)
PDBe European branch of the global repository for protein structures; see also the CryoEM-focused EMDB.
wwPDB Worldwide Protein Data Bank - merges PDBe, PDBj, RCSB PDB, EMDB, and BMRB databases.

protein structure tools

AlphaFold Predict protein structures with this Colab version (use a cluster install for the full version)!
ColabFold Predict protein structures with AlphaFold2, more configurably (multiple Colab notebooks available, including various experimental ones and RoseTTAFold implementations too)! They also have a helpful Discord. See also localColabFold.
RoseTTAFold Predict protein structures - also via the Baker lab's server (login required).
DALI A server to look for structural homology in the PDB with a source structure. Recently added AlphaFold DB searching.
Foldseek An additional structural homology search server - looks at non-PDB sources (like UniProt AlphaFold predictions) but by default won't delve as deeply into the PDB. Can also be installed locally. In my opinion, DALI still seems slightly better for comparison against the PDB, but it is much slower and the server install doesn't dig through the AlphaFoldDB etc.
PyMOL Commonly used for viewing macromolecule structures; probably still has the most scripts and plugins, though since it's no longer free it's losing some ground to Chimera. Absolutely bookmark the PyMOL wiki when working with PyMOL.
UCSF ChimeraX A free alternative to PyMOL. Under active development, and pretty powerful, though I found it a little less intuitive. Also has tutorials.
Not discussed  X-ray crystallography, CryoEM, and protein NMR tools for structure elucidation. These are their own world and I'm not up to date enough on some of them to provide a solid list.

phylogeny-related tools

MEGA A downloadable program with highly configurable phylogeny tools for smaller sequence sets.
FastTree Does what it says on the tin: makes trees - even really, really large ones - fast. Not an online implementation - you've got to run it locally.
IQ-TREE A newer maximum-likelihood tree tool (I'd recommend it over RAxML or PhyML these days except for large datasets, where FastTree still wins.) The ModelFinder module is excellent for exploring substitution models. There's a webserver, but it's worth installing locally.
A server with a bunch of alignment and phylogeny tools.
ggtree A very flexible tree visualization package in R with a predictably real learning curve.
figTree A clunky but widespread OS-agnostic program for tree visualization
Dendroscope A differently clunky alternative tree viewing program.
iTOL  An online tool for tree viewing and annotation, but subscriptions now limit some of the options.
A quick tree viewer.

natural products

natural product prediction

antiSMASH An excellent server (offline implementations are also available) for predicting biosynthetic gene clusters in contigs or genomes. Best for well-studied families.
PRISM An alternate BGC discovery tool with particular strengths for NRPS and PKS systems. One (also NRPS- and PKS-focused) competitor is Nerpa.
DeepBGC Deep learning / natural language processing approach to finding novel BGCs. A direction I expect more tools to go, and this is probably one of the higher-profile tools that's tried to move beyond the comparison to previously studied BGCs. Haven't seen it opening that many new directions yet in the natural product literature I pay most attention to, but it's both relatively new and pre-AlphaFold... (Runs locally, alas.)
BiG-SCAPE/CORASON Tool for exploring BGC similarity; relies on antiSMASH, which can be a challenge for new BGC types.
EFI-GNT A webtool for a sequence-dependent way of exploring and visualizing genomic neighborhoods (requires an SSN for a gene of interest; relies on UniProt as a database source).
prettyClusters (My) tiny sequence-independent tool for exploring gene clusters without relying on previous BGC annotations. Accepts GenBank and IMG input. Sadly still just an R package (maybe you can help...?)
CAGECAT Online implementation of clinker (for visualizing genomic neighborhoods) and cblaster (a multiblast tool for gene clusters).
NPLinker Natural product focused tool for linking genomic and metabolomic data.

natural product databases

MIBiG A database of natural product biosynthetic gene clusters with experimental evidence (hooked into antiSMASH, with annotation for each BGC). Great for the entries it has, but addition of new ones is relatively slow/conservative.
NPAtlas Database of info for known natural products.
StreptomeDB Streptomyces natural product database.
Norine NRPS-focused natural product database.
NP-MRD Natural product NMR database!
Paired Omics Data Platform A natural product focused database for linking genomic and metabolomic data - standardizing formats and adding metadata!

molecular biology


OpenWetWare A general molecular/synthetic/micro biology wiki. Lots of protocols.
IDT As a primer manufacturer, IDT has a useful set of nucleotide tools - OligoAnalyzer (for primer QC) and PrimerQuest (for qRT-PCR primer design) are particularly helpful.
NEB Similarly, NEB has a very useful set of molecular biology tools. Especially helpful - NEBcloner (for traditional cloning, including digests, ligation, mutagenesis, etc.), NEBuilder (for Gibson/HiFi assembly).
Primer3Plus Online primer design tool, very customizable.
Non-free tools  Geneious, DNAStar/Lasergene, SnapGene

genomes and transcriptomes

bowtie A free and well-established short read aligner. Works with tophat if you are operating in a system where splicing is relevant.
velvet Freeware for de novo assembly of genomes.
prokka Program for preliminary annotations for prokaryotic genomes. I'd currently recommend it over RAST, which can make non-standard GenBank files with some settings (including as implemented in the kBase pipeline).
PGAP The NCBI-supported prokaryotic genome annotation pipeline.
Galaxy  A solid web platform for 'omics work.
kBase A web platform that implements a lot of genome- and systems biology-related programs. Powerful but workflow setup can be really finicky.
Anvi'o  A microbiology-targeted 'omics platform. They have some helpful tutorials that include a bunch of extras, from unix tutorials to discussions of scientifically stringent approaches to 'omics workflows. To some extent, the same caveat as for kBase: very thorough, but definitely designed to send you through workflows in a specific way.
roary A pipeline for pangenome identification.
Cluster 3  A simple program with a basic GUI for doing a bunch of hierarchical clustering type analyses.
JavaTreeView  Another simple program with a GUI for viewing clustered heatmaps (including Cluster 3 output with trees).
Non-free tools  Geneious, DNAStar/Lasergene, CLC Workbench and related tools.



ActinoBase An exceedingly helpful wiki for all things actinomycete.
Practical Streptomyces Genetics (pdf)  A.K.A. "the Streptomyces Bible."
EcoliWiki A wiki devoted to all things E. coli.
BioCyc From genomes to metabolic models. Also has tools for mapping omics data onto models! (See also kBase, sorta.)
NPDC Natural products-focused portal for the Scripps strain collection. BLASTable. (Listed here and not under natural products because it's particularly helpful for flagging strains of interest.)
ATCC  The American strain collection
DSMZ  The German strain collection - often cheaper than the ATCC, including gDNA preps. See the associated BacDive database of bacterial phenotypic information (along with genomic info), and MediaDive, which has all the media recipes you could want.
NRRL  The ARS Culture Collection. Particularly rich in soil bacteria; strains are quite affordable (within the US, at least).
List of culture collection acronyms  Trying to find a CMGCC or UC or LMG strain? Now you know which culture collection to peruse. Courtesy of the JCM (the Japan Collection of Microorganisms, not actually included on the list.)



NotVoodoo A classic synthetic organic chemistry lab bible.
Schlenk Line Survival Guide Exactly what it sounds like, with many helpful figures.
SDBS A database for MS, H/C NMR, IR, RR, and EPR for small organic compounds.
SpectraBase Another broad database for NMR, MS, UV-vis (!), and other info for small organic compounds. Availability is sometimes spotty, but does have links to papers.
PubChem Reference info for chemicals (exact mass, solubility, SMILES code, suppliers, etc.)
ChEMBL "The ENA to PubChem's NCBI." (Well, it's actually specifically focused on bioactive compounds.) See also ChEBI, which is focused on "small" compounds, and UniChem, which is a blanket compound database that also pulls in data from other sources.
Crystallography Open Database Small molecule crystallography database. See also the not-free Cambridge Structural Database.
SciFinder Search for papers, suppliers, syntheses, etc. for a given compound (or even for structurally similar things.) Requires login.
Reaxys Along the same lines as SciFinder, may turn up different things; also requires login.
Non-free tools ChemDraw, ChemAxon's Chemicalize, etc.
Not discussed Computational chemistry software. While I can point to some well-established tools (Gaussian, MOE, AutoDock and AutoDock Vina, etc.), I'm not that up to date and there are a lot of tools out there.


Cytiva's FPLC calculators Some occasionally useful tools that can help with method conversion etc. for FPLCs.
HPLC Columns A database of HPLC column selectivity for reversed-phase columns. Very specific, but occasionally rather handy.
Thermo's HPLC calculators Primarily relevant for the HPLC and preparative HPLC method transfer tools and some troubleshooting guides. See also Sigma's Supelco guide.
Phenomenex's tools Mostly partfinders, but there are also a few sometimes-helpful tools for identifying potentially useful resins, along with method calculators.
HPLC Troubleshooting Guide (pdf) Waters' classic guide.
HPLC - A Troubleshooting Guide (pdf) Thermo's competing guide.
OpenChrom Chromatography freeware, OS-agnostic. Originally developed more for GC/MS, since extended to many other things; in practice, I've found it less helpful for my most common LC runs (LC-MS).
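Most of the method-transfer calculators above rest on the same geometric scaling: at constant linear velocity, flow rate scales with the square of the column inner diameter, injection volume scales with column volume (ID² × length), and gradient time is adjusted so the gradient spans the same number of column volumes. Below is a minimal Python sketch of just those rules, assuming the same stationary phase and particle size on both columns; real method transfer often also has to account for particle size changes and system dwell volume, which this ignores, so treat it as a sanity check rather than a replacement for the vendor tools.

```python
def scale_flow_rate(flow_ml_min: float, id_old_mm: float, id_new_mm: float) -> float:
    """Keep linear velocity constant: flow scales with cross-sectional area (ID^2)."""
    return flow_ml_min * (id_new_mm / id_old_mm) ** 2


def column_volume_ratio(id_old_mm: float, len_old_mm: float,
                        id_new_mm: float, len_new_mm: float) -> float:
    """Ratio of new to old column volume (proportional to ID^2 x length)."""
    return (id_new_mm / id_old_mm) ** 2 * (len_new_mm / len_old_mm)


def scale_injection_volume(vol_ul: float, id_old_mm: float, len_old_mm: float,
                           id_new_mm: float, len_new_mm: float) -> float:
    """Scale injection volume in proportion to column volume."""
    return vol_ul * column_volume_ratio(id_old_mm, len_old_mm, id_new_mm, len_new_mm)


def scale_gradient_time(t_old_min: float, flow_old_ml_min: float, flow_new_ml_min: float,
                        id_old_mm: float, len_old_mm: float,
                        id_new_mm: float, len_new_mm: float) -> float:
    """Keep the gradient length constant in column volumes (t * flow / V_col unchanged)."""
    ratio = column_volume_ratio(id_old_mm, len_old_mm, id_new_mm, len_new_mm)
    return t_old_min * ratio * (flow_old_ml_min / flow_new_ml_min)


if __name__ == "__main__":
    # Hypothetical example: moving an analytical method on a 4.6 x 150 mm column
    # (1.0 mL/min, 10 uL injection, 20 min gradient) to a 21.2 x 150 mm prep column.
    new_flow = scale_flow_rate(1.0, 4.6, 21.2)
    print(f"flow: {new_flow:.2f} mL/min")
    print(f"injection: {scale_injection_volume(10.0, 4.6, 150, 21.2, 150):.0f} uL")
    print(f"gradient: {scale_gradient_time(20.0, 1.0, new_flow, 4.6, 150, 21.2, 150):.1f} min")
```

With the same column length and flow scaled by ID², the gradient time comes out unchanged; it only shifts when the column length or the chosen flow rate departs from the straight ID² scaling.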

mass spectrometry

XCMS Online metabolomics analyses. (An R package is also available.)
GNPS Online molecular networking for mass spectrometry datasets! Developed by natural products people.
NPDtools Can be run as part of a GNPS workflow or individually - focused on metabologenomics for peptidic natural products. I haven't used it much yet.
MZmine Downloadable open-source program (OS-agnostic) with a GUI for mass spec data processing.
Metlin Database with searchable MS2 datasets. (The Gen2 version is not free though, sadly.)
CFM-ID MS2 prediction and assignments.
MASST Search a single MS2 spectrum against public GNPS libraries.
PeptideMass Expasy's digest prediction tool (good for exact mass prediction for MS on digested peptides).
ChemCalc Online tools for exact masses, including molecular formula prediction from monoisotopic masses, peptide MS2 fragmentation, and isotopic distribution.
enviPat Simulate isotopic distributions at various charges and resolutions from chemical formulas.
UWPR proteomics Additional implementations of several useful proteomics-related calculators. Note: the amino acid reference masses are residue masses for amino acids in peptides (i.e. in amide bonds, without terminal amine or carboxyl groups).
ProteoWizard Good for converting horrible data formats (Windows is needed for some of them).
ProSight (PTM) Top-down analysis of intact proteins and peptides, including PTMs. The current version is sold by Thermo.
Scaffold More traditional proteomics, including quantitative stuff (iTRAQ, TMT, SILAC). Viewers can be downloaded and used for free; actual data analysis is not.
Non-free tools  MassHunter (Agilent), MassLynx (Waters), Xcalibur and Compound Discoverer etc. (Thermo), Compass (Bruker), Mnova, ACD, etc. Many of these programs are quite powerful (or have add-ons that are), but they are also quite expensive.
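As a concrete illustration of the UWPR note on residue masses: a linear peptide's neutral monoisotopic mass is the sum of its residue masses plus one water (the H and OH on the free termini), and the m/z of an [M+zH]z+ ion is (M + z × 1.007276)/z. The Python sketch below assumes unmodified standard residues and monoisotopic values only; PeptideMass or ChemCalc are the better choice for modifications, digests, and isotope patterns.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids, i.e. the
# free amino acid minus H2O, as it sits inside a peptide chain.
# Assumption: no modifications (e.g. cysteine is not carbamidomethylated).
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565   # H2O added back for the N-terminal H and C-terminal OH
PROTON = 1.007276   # proton mass, for [M+zH]z+ ions


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of a linear, unmodified peptide."""
    seq = sequence.strip().upper()
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"Unknown residue codes: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of the protonated [M+zH]z+ ion."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON) / charge


if __name__ == "__main__":
    m = peptide_mass("PEPTIDE")
    print(f"M = {m:.4f} Da")
    for z in (1, 2, 3):
        print(f"[M+{z}H]{z}+ m/z = {mz(m, z):.4f}")
```

Leucine and isoleucine share a residue mass, which is one reason intact-mass matching alone can't distinguish them.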


BMRB A database of NMR spectra for biomolecules and metabolites.
NMRShiftDB A database of NMR spectra for small organic compounds.
NP-MRD Natural product NMR database! Listed here because it really is NMR focused.
EasySpin A Matlab-based toolbox for simulating and fitting EPR data - everyone uses this one.
Non-free tools  Mnova and ACD again.

UV-vis and enzymology

KinTek Explorer Well-regarded software for complex kinetics
DynaFit A somewhat simpler tool for enzyme kinetics; not as actively maintained as far as I can tell.
Spectragryph Handles UV-vis and fluorescence data processing (along with NIR, FT-IR, Raman) from approximately a zillion data formats, including kinetics-relevant ones.
BRENDA Enzyme database. A laudable effort to get a real database for enzyme behavior (not just activity, but behavior of variants, structure, etc., with everything tied back to the literature.) It is, of course, a Sisyphean task.
Non-free tools  There are subsets of people who are advocates of OriginPro, Igor Pro, GraphPad Prism, and KaleidaGraph for working with spectroscopy data (and in some cases doing simpler kinetics); several of these tools have been around long enough to have accumulated user scripts and extensions and so on, which can be occasionally helpful. If you don't want to go to the extreme of using R or python or Matlab to make most/all of your plots, these are also decent alternatives with GUIs that are, importantly, not Excel.