the (g)kenney lab | resources

There are a vast number of useful science tools online! This is a non-exhaustive list of some sites or tools that may be helpful for anyone whose interests overlap with those of this lab. To make the list a little less unwieldy, it's subdivided into sections.

computational biology and bioinformatics tools

getting started

The Missing Semester of your CS Education • Getting comfortable with the command line, version control, text editors, etc. can save so much time for actual computational work and data analysis!
Happy Belly Bioinformatics • Approachable guides to Unix and R. More on it in a publication.
The Carpentries • Science-focused organization for teaching coding and data analysis. Though built around formal workshops, the curricula can be pretty helpful too.

sequence-focused databases

NCBI • Central NCBI databases for nucleotide and protein sequences, along with raw sequencing data (among other things).
ENA • A European nucleotide database similar to NCBI's, likewise accompanied by many additional databases (including some for MS and metabolomics data, which aren't universal yet).
JGI/IMG • A database for well-annotated microbial (meta)genomes and transcriptomes from the DOE. Downloadable sequences and metadata.
UniProt • A central repository for well-annotated protein sequences (and metadata, and sometimes AlphaFold predictions). Likely to contain anything from NCBI's NP and NR databases, and the ENA, but not necessarily NCBI WGS or JGI/IMG.
InterPro • An EBI-hosted central repository for protein family information (includes Pfam and TIGRFAMs data); also includes classification tools, profile hidden Markov models, etc. The EBI also hosts the AlphaFold DB.

sequence-related tools

bioinformatics.org • Basic nucleotide-based sequence analysis/manipulation. Everyone uses it for generating the reverse complement of sequences because it is the first Google result.
Jalview • Free Java-based tool for visualizing and manipulating sequences. OS-agnostic, and lighter-weight than things like UGENE for basic alignment and visualization.
EFI-EST • The preferred web tool to construct sequence similarity networks for your protein family. Designed for UniProt integration, but can handle custom .fasta input. Advanced functionality (genome neighborhood visualization via EFI-GNT) available only for UniProt sequences.
BLAST • Alignment-based search for similar protein or nucleic acid sequences. Servers targeting specific databases available via NCBI, UniProt, IMG (login may be required for full functionality), etc.
MMseqs2 • Sequence search and clustering for large numbers of search and target sequences. Faster than all-by-all BLAST, can do profile searches, etc. They maintain downloadable Uniclust databases of UniProt sequences clustered via MMseqs2 at various identity percentages, with enriched HHblits-based annotation.
DIAMOND • A protein and translated-DNA alignment program (download only) that is faster than BLAST on larger databases but is still a little less widespread.
MAFFT • A tool for accurate protein and nucleic acid sequence alignment. Server available (and it can also be integrated into Jalview and Geneious).
MUSCLE • Another tool for accurate alignment of protein or nucleic acid sequences. Download for local install or run via Jalview or Geneious. Either MAFFT or MUSCLE is generally a better choice than ClustalΩ in terms of accuracy.
NGPhylogeny.fr • A server with a bunch of alignment and phylogeny tools.
HMMER • A toolset for profile hidden Markov model use and construction. EBI provides a more limited online version of these tools.
InterProScan • EBI search tool to identify protein or domain families from the InterPro database in a protein sequence.
HHblits and HHpred • Part of an extensive toolset from the Max Planck Institute with a web interface. HHblits searches HMM databases, looking for remote homology (while HHpred takes into account structure), but many additional tools (MAFFT and MUSCLE installs, secondary structure predictors, CLANS, DIAMOND-DeepClust and MMseqs2 implementation, etc.) are also available.
Expasy • From the SIB, also hosts many online tools for predicting protein sequence properties. Frequently used: STRING, ProtParam, PeptideMass.
SignalP • Predict signal peptides in proteins.
DeepTMHMM • Predict transmembrane domains in proteins. Note that its predecessor - TMHMM 2.0 - has a nice single-line tab-delimited output.
TopCons • More transmembrane prediction.
CAVER • Predicting and exploring tunnels and channels in protein structures.
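
For anyone who would rather script it than paste into a web form, the reverse-complement operation mentioned under bioinformatics.org above can be sketched in a few lines of Python. This is a minimal illustration, not that site's implementation: the function name `reverse_complement` is made up here, and the sketch assumes plain DNA input (IUPAC ambiguity codes are handled; any other characters pass through unchanged).

```python
# Complement table for DNA, including IUPAC ambiguity codes (upper and lower case).
# S and W are their own complements, so they are simply left untranslated.
_COMPLEMENT = str.maketrans(
    "ACGTRYKMBVDHNacgtrykmbvdhn",
    "TGCAYRMKVBHDNtgcayrmkvbhdn",
)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (hypothetical helper)."""
    # Complement every base, then reverse the resulting string.
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGGCC"))
```

For real work on FASTA files, an established library is probably a better choice than a hand-rolled table; this only shows what the operation does.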

protein structure databases

RCSB PDB • Repository for protein structures (mostly X-ray crystallography and CryoEM but some NMR); now also includes the CSM (which combines AlphaFoldDB and a similar RoseTTAFold dataset). (Note that these predicted structures don't show up in the results by default!)
PDBe • European branch of the global repository for protein structures; see also the CryoEM-focused EMDB.
wwPDB • Worldwide Protein Data Bank - merges PDBe, PDBj, RCSB PDB, EMDB, and BMRB databases.

protein structure tools

AlphaFold 2 • Predict protein-only structures with this Colab version of AlphaFold 2 (use a cluster install for the full version)!
ColabFold • Predict protein structures with AlphaFold2, more configurably (multiple Colab notebooks available, including various experimental ones and RoseTTAFold implementations too)! They also have a helpful Discord. See also localColabFold.
AlphaFold 3 • Predict protein structures with AlphaFold3, now with a (limited) set of ligands, PTMs, and non-protein partners available via the server. There are efforts to reverse-engineer it; TBD on the best options.
RoseTTAFold • Predict protein structures - also via the Baker lab's server (login required) and a ColabFold notebook.
RoseTTAFold All-Atom • Predict protein structures with ligands/PTMs/non-protein partners, Baker lab style. No servers/notebooks yet...
DALI • A server to look for structural homology in the PDB with a source structure. Recently added AlphaFold DB searching.
Foldseek • An additional structural homology search server. It searches non-PDB sources (like UniProt AlphaFold predictions) but by default won't delve as deeply into the PDB. Can also be installed locally. In my opinion, DALI still seems slightly better for comparison against the PDB, but it is much slower and its server version doesn't dig through the full AlphaFold DB, while Foldseek does.
AlphaFold Clusters • The AFDB, but clustered by sequence and structural similarity. Using the UniProt ID for a representative, find out about similar proteins and similar clusters. "Dark" clusters of proposed novel proteins are flagged.
PyMOL • Commonly used for viewing macromolecule structures; probably still has the most scripts and plugins, though since it's no longer free it's losing some ground to Chimera. Absolutely bookmark the PyMOL wiki when working with PyMOL.
UCSF ChimeraX • A free alternative to PyMOL. Under active development, and pretty powerful, though I found it a little less intuitive. Also has tutorials.
Not discussed • X-ray crystallography, CryoEM, and protein NMR tools for structure elucidation, or any protein design tools. These are their own world and I'm not up to date enough on some of them to provide a solid list (particularly in the fast-evolving protein design area).

phylogeny-related tools

MEGA • A downloadable program with highly configurable phylogeny tools for smaller sequence sets.
FastTree • Does what it says on the tin: makes trees - even really, really large ones - fast. Not an online implementation - you've got to run it locally.
IQ-TREE • A newer maximum-likelihood tree tool (I'd recommend it over RAxML or PhyML these days except for large datasets, where FastTree still wins). The ModelFinder module is excellent for exploring substitution models. There's a webserver, but it's worth installing locally.
NGPhylogeny.fr • A server with a bunch of alignment and phylogeny tools.
ggtree • A very flexible tree visualization package in R with a predictably real learning curve.
FigTree • A clunky but widespread OS-agnostic program for tree visualization.
Dendroscope • A differently clunky alternative tree viewing program.
iTOL • An online tool for tree viewing and annotation, but subscriptions now limit some of the options.
Phylo.io • A quick tree viewer.
FoldTree • What if phylogeny, but protein structures? There's a Colab version.

natural products

natural product prediction

antiSMASH • An excellent server (offline implementations are also available) for predicting biosynthetic gene clusters in contigs or genomes. Best for well-studied families.
Prism • An alternate BGC discovery tool that has some strengths for NRPS and PKS systems particularly. One (also NRPS and PKS-focused) competitor is Nerpa.
DeepBGC • Deep learning / natural language processing approach to finding novel BGCs. A direction I expect more tools to go in, and this is probably one of the higher-profile tools that's tried to move beyond the comparison to previously studied BGCs. Haven't seen it opening that many new directions yet in the natural product literature I pay most attention to, but it's both relatively new and pre-AlphaFold... (Runs locally, alas.) Other efforts in a similar direction include GECCO and SanntiS (heir to emeraldBGC). Both of these can generate antiSMASH-compatible output (but in my very non-quantitative exploration, I'd say they are both more conservative than DeepBGC).
BiG-SCAPE/CORASON • Tool for exploring BGC similarity; relies on antiSMASH, which can be a challenge for new BGC types.
EFI-GNT • A webtool for a sequence-dependent way of exploring and visualizing genomic neighborhoods (requires an EFI-EST generated SSN for a gene of interest; relies on UniProt as a database source).
prettyClusters • (My) tiny sequence-independent tool for exploring gene clusters without relying on previous BGC annotations. Accepts GenBank and IMG input (with a workflow for nucleotide input), relies on blast (optionally hmmer, mafft, prokka, and interproscan for various steps), yields tab-delimited metadata along with vector figures and cytoscape files. Sadly still just an R package (maybe you can help...?)
SocialGene • A somewhat similar approach implemented rather differently. Starts with GenBank or sequence input, uses a Docker-dependent Nextflow workflow (optional steps rely on MMseqs2, DIAMOND, HMMER, antiSMASH, etc.), and yields a Neo4j database (can be explored via a Docker Neo4j implementation and Cytoscape).
CAGECAT • Online implementation of clinker (for visualizing genomic neighborhoods) and cblaster (a multiblast tool for gene clusters).
NPLinker • Natural product focused tool for linking genomic and metabolomic data.

natural product databases

MIBiG • A database of natural product biosynthetic gene clusters with experimental evidence (hooked into antiSMASH, with annotation for each BGC). Great for the entries it has, but addition of new ones is relatively slow/conservative.
NPAtlas • Database of info for known natural products.
StreptomeDB • Streptomyces natural product database.
Norine • NRPS-focused natural product database.
NP-MRD • Natural product NMR database!
Paired Omics Data Platform • A natural product focused database for linking genomic and metabolomic data - standardizing formats and adding metadata!

molecular biology

general

OpenWetWare • A general molecular/synthetic/micro biology wiki. Lots of protocols.
IDT • As a primer manufacturer, IDT has a useful set of nucleotide tools - OligoAnalyzer (for primer QC) and PrimerQuest (for qRT-PCR primer design) are particularly helpful.
NEB • Similarly, NEB has a very useful set of molecular biology tools. Especially helpful - NEBcloner (for traditional cloning, including digests, ligation, mutagenesis, etc.), NEBuilder (for Gibson/HiFi assembly).
Primer3Plus • Online primer design tool, very customizable.
Non-free tools • Geneious, DNAStar/Lasergene, SnapGene

genomes and transcriptomes

bowtie • A free and well-established short read aligner. Works with tophat if you are operating in a system where splicing is relevant.
velvet • Freeware for de novo assembly of genomes.
prokka • Program for preliminary annotations for prokaryotic genomes. I'd currently recommend it over RAST, which can make non-standard GenBank files with some settings (including as implemented in the kBase pipeline). (If you care a lot about actual protein family IDs, though, I'd suggest running CDS output through a commandline version of InterProScan.)
PGAP • The NCBI-supported prokaryotic genome annotation pipeline.
Galaxy • A solid web platform for 'omics work.
kBase • A web platform that implements a lot of genome- and systems biology-related programs. Powerful but workflow setup can be really finicky.
Anvi'o • A microbiology-targeted 'omics platform. They have some helpful tutorials that include a bunch of extras, from unix tutorials to discussions of scientifically stringent approaches to 'omics workflows. To some extent, the same caveat as for kBase: very thorough, but definitely designed to send you through workflows in a specific way.
roary • A pipeline for pangenome identification.
Cluster 3 • A simple program with a basic GUI for doing a bunch of hierarchical clustering type analyses.
JavaTreeView • Another simple program with a GUI for viewing clustered heatmaps (including Cluster 3 output with trees).
Non-free tools • Geneious, DNAStar/Lasergene, CLC Workbench and related tools generally encompass many of the free tools for working with protein and nucleic acid sequences under a single GUI.

microbiology

general

ActinoBase • An exceedingly helpful wiki for all things actinomycete.
Practical Streptomyces Genetics (pdf) • A.K.A. "the Streptomyces Bible."
EcoliWiki • A wiki devoted to all things E. coli.
Methanotroph Commons • A starting point for working with methane-oxidizing bacteria. (Anaerobic archaeal methane-oxidizers excluded.)
BioCyc • From genomes to metabolic models. Also has tools for mapping omics data onto models! (See also kBase, sorta.)
NPDC • Natural products-focused portal for the Scripps strain collection. BLASTable. (Listed here and not under natural products because it's particularly helpful for flagging strains of interest.)
ATCC • The American strain collection.
DSMZ • The German strain collection - often cheaper than the ATCC, including gDNA preps. See the associated BacDive database of bacterial phenotypic information (along with genomic info), and MediaDive, which has all the media recipes you could want.
NRRL • The ARS Culture Collection. Particularly rich in soil bacteria, strains (within the US, at least) are quite affordable.
List of culture collection acronyms • Trying to find a CMGCC or UC or LMG strain? Now you know which culture collection to peruse. Courtesy of the JCM (the Japan Collection of Microorganisms, not actually included on the list.)
Other common strain collections • NCIMB (UK), NBRC (Japan), JCM (Japan) are the ones that I run into most beyond the ATCC, DSMZ, and NRRL.

chemistry

general

NotVoodoo • A classic synthetic organic chemistry lab bible.
Schlenk Line Survival Guide • Exactly what it sounds like, with many helpful figures.
SDBS • A database of MS, 1H/13C NMR, IR, Raman, and EPR spectra for small organic compounds.
SpectraBase • Another broad database for NMR, MS, UV-vis (!), and other info for small organic compounds. Availability is sometimes spotty, but does have links to papers.
PubChem • Reference info for chemicals (exact mass, solubility, SMILES code, suppliers, etc.)
ChEMBL • "The ENA to PubChem's NCBI." (Well, it's actually specifically focused on bioactive compounds.) See also ChEBI, which is focused on "small" compounds, and UniChem, which is a blanket compound database that also pulls in data from other sources.
Crystallography Open Database • Small molecule crystallography database. See also the not-free Cambridge Structural Database.
SciFinder • Search for papers, suppliers, syntheses, etc. for a given compound (or even for structurally similar things.) Requires login.
Reaxys • Along the same lines as SciFinder, may turn up different things; also requires login.
Non-free tools • ChemDraw, ChemAxon's Chemicalize, etc.
Not discussed • Computational chemistry software. While I can point to some well-established tools (Gaussian, MOE, AutoDock and AutoDock Vina, etc.), I'm not that up to date and there are a lot of tools out there.

chromatography

Cytiva's FPLC calculators • Some occasionally useful tools that can help with method conversion etc. for FPLCs.
HPLC Columns • A database of HPLC column selectivity for reversed-phase columns with tools to compare columns. Very specific, but occasionally rather handy. See also similar tools from the USP, from ACD, and a table from Waters.
Thermo's HPLC calculators • Primarily relevant for the HPLC and preparative HPLC method transfer tools and some troubleshooting guides. See also Sigma's Supelco guide.
Phenomenex's tools • Mostly partfinders, but there are also a few sometimes-helpful tools for identifying potentially useful resins, along with method calculators.
HPLC Troubleshooting Guide (pdf) • Waters' classic guide.
HPLC - A Troubleshooting Guide (pdf) • Thermo's competing guide.
OpenChrom • Chromatography freeware, OS-agnostic. Originally developed more for GC/MS, since extended to many other things; in practice, I've found it less helpful for my most common LC runs (LC-MS).

mass spectrometry

XCMS • Online metabolomics analyses. (An R package is also available.)
GNPS • Online molecular networking for mass spectrometry datasets! Developed by natural products people.
NPDtools • Can be run as part of a GNPS workflow or individually; focused on metabologenomics for peptidic natural products. I haven't used it much yet.
MZmine • Downloadable open-source program (OS-agnostic) with a GUI for mass spec data processing.
Metlin • Database with searchable MS2 datasets. (The Gen2 version is not free though, sadly.)
CFM-ID • MS2 prediction and assignments.
MASST • Search a single MS2 spectrum against public GNPS libraries.
PeptideMass • Expasy's digest prediction tool (good for exact mass prediction for MS on digested peptides).
ChemCalc • Online tools for exact masses, including molecular formula prediction from monoisotopic masses, peptide MS2 fragmentation, and isotopic distribution.
enviPat • Simulate isotopic distributions at various charges and resolutions from chemical formulas.
UWPR proteomics • Additional implementations of several useful proteomics-related calculators. Note: the amino acid reference masses are for amino acids in peptides (i.e. in amide bonds, without terminal amine or carboxyl groups.)
ProteoWizard • Good for converting horrible data formats (Windows is needed for some of them).
ProSight (PTM) • Top-down analysis of intact proteins and peptides, including PTMs. The current version is sold by Thermo.
Scaffold • More traditional proteomics, including quantitative stuff (iTRAQ, TMT, SILAC). Viewers can be downloaded and used for free; actual data analysis is not.
Non-free tools • MassHunter (Agilent), MassLynx (Waters), Xcalibur and Compound Discoverer (Thermo), Compass (Bruker), Mnova, ACD, etc. Many of these programs are quite powerful (or have addons that are), but they are also quite expensive.
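
The UWPR note above about residue masses matters when computing a peptide mass by hand: sum the residue masses, then add one water for the free N- and C-termini. A minimal Python sketch of that arithmetic follows. The function names are made up for illustration, the table holds standard monoisotopic residue masses rounded to five decimals for the 20 common amino acids only (no PTMs), and none of this is taken from the UWPR calculators themselves.

```python
# Monoisotopic masses (Da) of amino acid residues, i.e. as they occur inside
# a peptide chain, without the terminal H and OH.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O, accounts for the free termini
PROTON = 1.007276   # mass of a proton, used for [M+zH]z+ ions


def peptide_monoisotopic_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified linear peptide (hypothetical helper)."""
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


def peptide_mz(sequence: str, charge: int) -> float:
    """m/z of the [M + zH]z+ ion for a positive integer charge (hypothetical helper)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (peptide_monoisotopic_mass(sequence) + charge * PROTON) / charge


if __name__ == "__main__":
    for z in (1, 2):
        print(z, round(peptide_mz("PEPTIDE", z), 4))
```

Forgetting the water (or adding it once per residue) is the usual mistake here, which is presumably why the UWPR page spells out what its reference masses mean.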

NMR and EPR

BMRB • A database of NMR spectra for biomolecules and metabolites.
NMRShiftDB • A database of NMR spectra for small organic compounds.
NP-MRD • Natural product NMR database! Listed here because it really is NMR focused.
EasySpin • A Matlab-based toolbox for simulating and fitting EPR data - everyone uses this one.
Non-free tools • Mnova and ACD again.

UV-vis and enzymology

KinTek Explorer • Well-regarded software for complex kinetics.
DynaFit • A somewhat simpler tool for enzyme kinetics; not as actively maintained as far as I can tell.
Spectragryph • Handles UV-vis and fluorescence data processing (along with NIR, FT-IR, Raman) from approximately a zillion data formats, including kinetics-relevant ones.
BRENDA • Enzyme database. A laudable effort to get a real database for enzyme behavior (not just activity, but behavior of variants, structure, etc., with everything tied back to the literature.) It is, of course, a Sisyphean task.
Non-free tools • There are subsets of people who are advocates of OriginPro, Igor Pro, GraphPad Prism, and KaleidaGraph for working with spectroscopy data (and in some cases doing simpler kinetics); several of these tools have been around long enough to have accumulated user scripts and extensions and so on, which can be occasionally helpful. If you don't want to go to the extreme of using R or python or Matlab to make most/all of your plots, these are also decent alternatives with GUIs that are, importantly, not Excel.