Plasmid Backbone Protein Naming Tool

Recommended: open this tool by itself in a new tab


We first downloaded a large database of plasmid genes from NCBI and wrote code to extract their amino acid sequences. After saving the sequences as a FASTA file, we clustered them with USEARCH, which grouped the protein sequences by sequence identity.
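
To make the extraction and clustering steps concrete, here is a minimal Python sketch. It is not the tool's actual code. It assumes the extracted proteins are available as (label, sequence) pairs and that the clustering results were written in USEARCH's tab-separated .uc format, which this page does not state. The function names `to_fasta` and `clusters_from_uc` are hypothetical.

```python
from collections import defaultdict


def to_fasta(records, width=60):
    """Format (label, sequence) pairs as FASTA text, wrapping each sequence."""
    lines = []
    for label, seq in records:
        lines.append(">" + label)
        # Split the sequence into fixed-width chunks, one per line.
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


def clusters_from_uc(uc_text):
    """Group sequence labels by cluster number from .uc-style text.

    Assumed layout: tab-separated fields where field 1 is the record type
    (S = seed, H = hit, C = cluster summary), field 2 is the cluster number,
    and field 9 is the query label. C records repeat the seed, so they are
    skipped here.
    """
    clusters = defaultdict(list)
    for line in uc_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 9 or fields[0] not in ("S", "H"):
            continue
        clusters[int(fields[1])].append(fields[8])
    return dict(clusters)
```

Each resulting cluster corresponds to one protein family in the sense used on this page: a group of sequences that USEARCH placed together by identity.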

We used R and ggplot2 to visualize this data. Each bar in the graph represents a different protein. Bars with the same color belong to the same protein family (a group of proteins that are similar by sequence identity). The bars are sorted by sequence length, with the longest sequences at the top.
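
The ordering and coloring rules described above can be sketched as a small data-preparation step. The tool itself does this in R with ggplot2. The Python version below is only an illustration, and the function name `plot_rows`, the dictionary keys, and the color palette are assumptions made for the example.

```python
def plot_rows(proteins, palette=("#1b9e77", "#d95f02", "#7570b3", "#e7298a")):
    """Return (name, length, color) rows, longest protein first.

    proteins: list of dicts with 'name', 'family', and 'length' keys
    (a hypothetical layout chosen for this sketch).
    """
    # Assign one color per family, in order of first appearance.
    # Colors repeat if there are more families than palette entries.
    color_of = {}
    for protein in proteins:
        family = protein["family"]
        if family not in color_of:
            color_of[family] = palette[len(color_of) % len(palette)]

    # Longest sequences come first, so they end up at the top of the plot.
    ordered = sorted(proteins, key=lambda p: p["length"], reverse=True)
    return [(p["name"], p["length"], color_of[p["family"]]) for p in ordered]
```

Proteins from the same family share a color regardless of where they fall in the length ordering, which is what lets a family be picked out visually in the plot.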

Reference Plot

The user can choose a four-letter name to be displayed from a list of about 100 common proteins. We also created a very large PDF plot showing all of these proteins side by side. Click here to download it (recommended: view it in Acrobat).

Source Code and Feedback

The source code is available on GitHub.

Please send feedback and suggestions to

Download FASTA Feature

As of July 2016, the Download feature of this tool does not work in the online version. To download amino acid sequences, run the local version in RStudio using the files from GitHub.