Plasmid Backbone Protein Naming Tool
We first downloaded a large database of plasmid genes from NCBI. We then wrote code to extract the amino acid sequences and saved them as a FASTA file. Finally, we clustered the sequences with USEARCH, which grouped the proteins by sequence identity.
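As an illustration of the FASTA step, here is a minimal Python sketch. It is not the code used for this tool: the function name `to_fasta`, the input format (pairs of identifier and amino acid sequence), and the 60-character line width are assumptions chosen for the example.

```python
def to_fasta(records, width=60):
    """Format (identifier, sequence) pairs as FASTA text.

    `records` is a hypothetical input format; the real pipeline may
    store its extracted sequences differently.
    """
    lines = []
    for identifier, sequence in records:
        # Each record starts with a header line beginning with '>'.
        lines.append(">" + identifier)
        # Split the sequence into fixed-width lines.
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = [("protein_1", "MKTAYIAKQR"), ("protein_2", "MSLNE")]
    print(to_fasta(example, width=5), end="")
```

The resulting text can be written to a file and passed to a clustering program such as USEARCH.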
We used R and ggplot2 to visualize this data. Each bar in the graph represents a different protein. Bars with the same color belong to the same protein family (a group of proteins with similar sequences, as measured by identity). The bars are sorted by length, with the longest sequences at the top.
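The ordering and coloring described above can be sketched as follows. This is a simplified Python illustration, not the R/ggplot2 code behind the plot: the function name `plot_rows`, the tuple layout, and the use of integer color indices are assumptions made for the example.

```python
def plot_rows(proteins):
    """Prepare rows for a bar chart of protein lengths.

    `proteins` is a list of (name, family, length) tuples (a hypothetical
    format). Returns (name, length, color_index) tuples, longest first,
    where proteins in the same family share a color index.
    """
    # Give each family a color index in order of first appearance.
    color_of = {}
    for _, family, _ in proteins:
        if family not in color_of:
            color_of[family] = len(color_of)

    # Sort by sequence length, longest first (top of the plot).
    ordered = sorted(proteins, key=lambda p: p[2], reverse=True)
    return [(name, length, color_of[family]) for name, family, length in ordered]


if __name__ == "__main__":
    demo = [("a", "F1", 120), ("b", "F2", 300), ("c", "F1", 210)]
    for row in plot_rows(demo):
        print(row)
```

Here "a" and "c" share a color index because they belong to the same family, while "b" comes first because it is the longest.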
The user can choose a four-letter protein name to display from a list of about 100 common proteins. We also created a very large plot showing all of these proteins side by side, in PDF format. Click here to download it (we recommend viewing it in Acrobat).
Source Code and Feedback
The source code is available on GitHub.
Please send feedback and suggestions to firstname.lastname@example.org.
Download FASTA Feature
As of July 2016, the Download feature of this tool does not work in the online version. If you would like to download amino acid sequences, run the local version in RStudio using the files from GitHub.