Edwards Lab



DCMF proteins with TaxaMap taxonomy assignments

DISCLAIMER: The data on these webpages has not yet been published and is the intellectual property of UNSW. It is provided in good faith and should not be used prior to publication without consent from Mike Manefield, Richard Edwards or Matt Lee at UNSW.

Protein sequences were predicted using prokka and the JGI Genome Portal annotation pipeline. Proteins were further annotated via high-throughput homology searching, multiple sequence alignment and molecular phylogenetics using HAQESAC and MultiHAQ. Putative taxonomic assignments for each protein were then made using TaxaMap, which identifies the taxonomic grouping of the clade to which that protein was found to belong. The full list of proteins can be found in the table below. The genome has subsequently been annotated at NCBI, and a full table of NCBI proteins is also available for browsing.
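The idea of assigning a protein the taxonomic grouping of its clade can be illustrated with a minimal sketch. This is not TaxaMap's actual algorithm, which is not described on this page; it simply assumes that a rank is assigned only when every member of the clade agrees on it. The function name, the rank list and the placeholder taxon names are all hypothetical.

```python
# Illustrative only: NOT the TaxaMap implementation.
# Assumption: a rank is assigned when all clade members share the same taxon.
RANKS = ["phylum", "class", "order", "family", "genus"]


def clade_taxonomy(members):
    """Return, for each rank, the taxon shared by every clade member, else None."""
    assignment = {}
    for rank in RANKS:
        taxa = {member.get(rank) for member in members}
        # Exactly one distinct, non-missing taxon means the clade agrees at this rank.
        if len(taxa) == 1 and None not in taxa:
            assignment[rank] = taxa.pop()
        else:
            assignment[rank] = None
    return assignment


# Placeholder taxa (made up for illustration, not real results).
example_clade = [
    {"phylum": "PhylumA", "class": "ClassA", "order": "OrderA",
     "family": "FamilyA", "genus": "GenusA"},
    {"phylum": "PhylumA", "class": "ClassA", "order": "OrderA",
     "family": "FamilyA", "genus": "GenusB"},
]

print(clade_taxonomy(example_clade))
```

In this made-up clade the two members agree down to family but differ at genus, so the sketch would leave the genus unassigned.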

Each protein was subjected to a BLAST+ (blastp) search against all NCBI and JGI proteins annotated for DCMF, all bacterial proteins in the UniProt Knowledgebase (downloaded 2017-02-06), and the published proteomes for a set of closely related bacteria as identified from a 16S phylogeny (see paper for details). HAQESAC was used to iteratively generate and clean up Clustal Omega multiple sequence alignments to produce a high-quality alignment against a set of close homologues. The neighbor-joining tree implementation of Clustal W2 was used to make a phylogenetic tree (below). (NOTE: These alignments and trees are designed to give an automated first look at a protein. Where individual protein alignment and/or phylogenetic inference details are important, more careful analysis is recommended.)

Individual proteins can be viewed in further detail by clicking the protein ID. Paralogues and in-paralogues (products of gene duplication) can be viewed by editing the following URL with the appropriate GaXXXXX_XXXX ID: http://www.slimsuite.unsw.edu.au/research/dcmf/dcmf.php?protein=GaXXXXX_XXXX. HAQESAC returns only the closest homologues, so these paralogue lists may be incomplete.
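For looking up several proteins, the URL editing described above could be scripted. The following is a minimal sketch: the base URL and the `protein` query parameter are taken from this page, while the helper name `protein_url` and the ID check (the prefix "Ga", then letters or digits, an underscore, then more letters or digits) are assumptions based only on the GaXXXXX_XXXX template.

```python
import re
from urllib.parse import urlencode

# Base URL as given on this page.
BASE_URL = "http://www.slimsuite.unsw.edu.au/research/dcmf/dcmf.php"

# Assumed ID shape, inferred from the GaXXXXX_XXXX template only.
ID_PATTERN = re.compile(r"Ga[0-9A-Za-z]+_[0-9A-Za-z]+")


def protein_url(protein_id):
    """Build the per-protein page URL for a JGI locus tag (hypothetical helper)."""
    if not ID_PATTERN.fullmatch(protein_id):
        raise ValueError(f"Unexpected protein ID format: {protein_id!r}")
    return f"{BASE_URL}?{urlencode({'protein': protein_id})}"


# The placeholder from the page stands in for a real locus tag here.
print(protein_url("GaXXXXX_XXXX"))
```

Replace the placeholder with a locus tag from the protein column of the table.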

Table columns: protein, JGI locus tag; ncbi, NCBI protein ID (click ^ to open entry); prokka, prokka protein ID; jgi, JGI ID; description, JGI description; inpara, DCMF-specific "in-paralogues" identified by HAQESAC; paralogues, paralogues identified by HAQESAC; genus/family/order/class/phylum, TaxaMap taxonomy predictions based on well-supported HAQESAC clades; boot, bootstrap support (0-1) for TaxaMap clade; spcode, full list of UniProt taxonomy species codes for HAQESAC clade.
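If the table were saved locally as tab-delimited text, the boot column could be used to keep only the better-supported assignments. This is a sketch under stated assumptions: the page does not describe a download format, so the tab-delimited layout, the function name `confident_assignments`, the 0.7 cut-off and the example rows are all hypothetical; only the column names come from the legend above.

```python
import csv
import io


def confident_assignments(handle, rank="genus", min_boot=0.7):
    """Return (protein, taxon, boot) for rows with an assignment at `rank`
    and bootstrap support of at least `min_boot` (arbitrary example cut-off)."""
    kept = []
    for row in csv.DictReader(handle, delimiter="\t"):
        # Skip rows with no prediction at this rank or weak clade support.
        if row[rank] and float(row["boot"]) >= min_boot:
            kept.append((row["protein"], row[rank], float(row["boot"])))
    return kept


# Made-up rows for illustration only; not real data from the table.
EXAMPLE_TABLE = (
    "protein\tgenus\tboot\n"
    "GaXXXXX_0001\tGenusA\t0.95\n"
    "GaXXXXX_0002\tGenusB\t0.40\n"
    "GaXXXXX_0003\t\t0.99\n"
)

print(confident_assignments(io.StringIO(EXAMPLE_TABLE)))
```

Passing `rank="family"` (or any other taxonomy column present in the file) would apply the same filter at a different level.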


© 2019 RJ Edwards. Contact: richard.edwards@unsw.edu.au.