SLiMSuite: Short Linear Motif discovery and molecular evolution toolkit

EdwardsLab Software

Software produced over the years is now available as a single package, SLiMSuite. The main homepage for these tools is the SLiMSuite Blog. These tools are written in Python 2.x and freely available for local installation under a GNU General Public License.

In recent years, the focus of tool development has switched from Short Linear Motif (SLiM) discovery/analysis tools to genome assembly and more general sequence analysis tools. These are all released in SLiMSuite, but the latter have been separated out as SeqSuite for improved clarity.

Documentation is also available through the new REST servers, which are currently under development along with some more user-friendly webservers.

SLiMSuite Webservers GitHub Download Installation

Formatted docstring documentation is available for the programs listed below. More details are available at individual lab GitHub pages, or in the legacy manuals (see below).

aphid badasp budapest buscomp comparimotif_V3 depthcharge depthkopy depthsizer diploidocus fiesta gablam gapspanner gasp gfessa gopher happi haqesac multihaq peptcluster pagsat picsi pingu_V3 pingu_V4 presto_V5 qslimfinder saaga samphaser seqmapper seqsuite slimbench slimfarmer slimfinder slimmaker slimmutant slimparser slimprob slimsearch slimsuite snapper synbad taxolotl unifake


This page is under development. More information and links for specific tools will be added soon. In the meantime, please see the SLiMSuite Blog and lab GitHub pages. Icons in the tables below should redirect to the appropriate GitHub page for download.

SLiMSuite tools

Tools in this section are primarily concerned with SLiM prediction or analysis. They can be run through either slimsuite.py or their own tools/ program.

SLiMSuite: Short Linear Motif discovery and analysis tools

For any tools that are missing, please visit the main SLiMSuite GitHub repo. Please note that this repo is sometimes behind the versions released through individual repos.

Citation: Edwards RJ, Paulsen K, Aguilar Gomez CM & Pérez-Bercoff Å (2020): Computational Prediction of Disordered Protein Motifs using SLiMSuite. Methods Mol Biol. 2141:37-72.


CompariMotif: Motif-Motif Comparison Tool.

CompariMotif is a unique tool for making motif-motif comparisons, identifying and describing similarities between regular expression motifs. CompariMotif can identify a number of different relationships between motifs, including exact matches, variants of degenerate motifs and complex overlapping motifs. Motif relationships are scored using shared information content, allowing the best matches to be easily identified in large comparisons. Many input and search options are available, enabling a list of motifs to be compared to itself (to identify recurring motifs) or to datasets of known motifs.

Citation: Edwards RJ, Davey NE & Shields DC (2008): CompariMotif: Quick and easy comparisons of sequence motifs. Bioinformatics 24(10):1307-9.


PeptCluster: Peptide Clustering Module

PeptCluster is for simple sequence-based clustering of short (aligned) peptide sequences. First, a pairwise distance matrix is generated from the peptides. This distance matrix is then used to generate a tree using a distance method such as Neighbour-Joining or UPGMA. Default distances are amino acid property differences loaded from an amino acid property matrix file.
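The distance-matrix-then-tree idea can be illustrated with a minimal Python 3 sketch. It is not PeptCluster's own code: the function names are hypothetical, and a simple per-position mismatch count stands in for the amino acid property distances that PeptCluster loads by default.

```python
from itertools import combinations


def pairwise_distances(peptides):
    """Return {(i, j): distance} for i < j, counting mismatched positions.

    Assumption: mismatch counts replace the amino acid property matrix.
    """
    dist = {}
    for i, j in combinations(range(len(peptides)), 2):
        dist[(i, j)] = sum(x != y for x, y in zip(peptides[i], peptides[j]))
    return dist


def upgma(peptides):
    """Cluster aligned peptides with UPGMA and return a nested-tuple tree."""
    # Active clusters: id -> (subtree, number of leaves)
    clusters = {i: (pep, 1) for i, pep in enumerate(peptides)}
    dist = {frozenset(k): float(v) for k, v in pairwise_distances(peptides).items()}
    next_id = len(peptides)
    while len(clusters) > 1:
        # Pick the closest pair; ties are broken by cluster ids for determinism.
        pair = min(dist, key=lambda k: (dist[k], sorted(k)))
        a, b = sorted(pair)
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        del dist[pair]
        # Distance to the merged cluster is the size-weighted average.
        for c in clusters:
            d_ac = dist.pop(frozenset((a, c)))
            d_bc = dist.pop(frozenset((b, c)))
            dist[frozenset((next_id, c))] = (n_a * d_ac + n_b * d_bc) / (n_a + n_b)
        clusters[next_id] = ((tree_a, tree_b), n_a + n_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


tree = upgma(["PPLPP", "PPLPA", "KRKRK", "KRKRR"])
```

Here the two proline-rich peptides and the two basic peptides are each joined first, then the two pairs are merged at the root.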

Citation: Edwards RJ, Paulsen K, Aguilar Gomez CM & Pérez-Bercoff Å (2020): Computational Prediction of Disordered Protein Motifs using SLiMSuite. Methods Mol Biol. 2141:37-72.


QSLiMFinder: Query SLiMFinder.

QSLiMFinder is an extension of SLiMFinder that uses a specific query sequence to constrain the motif space being searched. This can greatly increase the sensitivity of the motif prediction.

Citation: Palopoli N, Lythgow KT & Edwards RJ (2015): QSLiMFinder: improved short linear motif prediction using specific query protein data. Bioinformatics 31(14): 2284-2293.


SLiMBench: Short Linear Motif prediction Benchmarking.

SLiMBench is a benchmarking tool for SLiM prediction software, developed for comparing the performance of QSLiMFinder with SLiMFinder. It can be used to generate simulated or random benchmarking datasets using ELM motifs (or other data in a similar format), or to assess SLiM predictions against those datasets.

Citation: Palopoli N, Lythgow KT & Edwards RJ (2015): QSLiMFinder: improved short linear motif prediction using specific query protein data. Bioinformatics 31(14): 2284-2293.


SLiMDisc: Short Linear Motif Discovery.

SLiMDisc builds on motif predictions made with the TEIRESIAS algorithm and adjusts for evolutionary relationships. It has been replaced by SLiMFinder.

Citation: Davey NE, Shields DC & Edwards RJ (2006): SLiMDisc: short, linear motif discovery, correcting for common evolutionary descent. Nucleic Acids Res. 34(12):3546-54; Davey NE*, Edwards RJ* & Shields DC (2007): The SLiMDisc server: short, linear motif discovery in proteins. Nucleic Acids Res. 35(Web Server issue):W455-9. [*Joint first authors]


SLiMEnrich: Identification and enrichment analysis of domain-motif interaction.

SLiMEnrich is an R Shiny app that will interrogate a pairwise protein-protein interaction dataset for possible domain-motif interactions and then calculate whether it has more than would be expected by chance.

Citation: Idrees S, Pérez-Bercoff Å & Edwards RJ (2018): SLiMEnrich: computational assessment of protein–protein interaction data as a source of domain-motif interactions. PeerJ 6:e5858


SLiMFarmer: SLiMSuite HPC job farming control program

SLiMFarmer is a wrapper for SLiMSuite tools to enable parallel processing even where multiple threads are not supported by an individual tool. It can also be used to generate job and submit scripts to a qsub HPC queue.

Citation: Edwards RJ, Paulsen K, Aguilar Gomez CM & Pérez-Bercoff Å (2020): Computational Prediction of Disordered Protein Motifs using SLiMSuite. Methods Mol Biol. 2141:37-72.


SLiMFinder: Short Linear Motif Finder.

SLiMFinder is an integrated SLiM discovery program building on the principles of the SLiMDisc software for accounting for evolutionary relationships. SLiMFinder comprises two algorithms:
1. SLiMBuild identifies convergently evolved, short motifs in a dataset. Motifs with fixed amino acid positions are identified and then combined to incorporate amino acid ambiguity and variable-length wildcard spacers. If desired, SLiMBuild can be used as a replacement for TEIRESIAS in other software (teiresias=T slimchance=F).
2. SLiMChance estimates the probability of SLiMBuild motifs arising by chance, correcting for the size and composition of the dataset, and assigns a significance value to each motif.

Citation: Edwards RJ, Davey NE & Shields DC (2007): SLiMFinder: A probabilistic method for identifying over-represented, convergently evolved, short linear motifs in proteins. PLoS ONE 2(10): e967.

Conservation Masking: Davey NE, Shields DC & Edwards RJ (2009): Masking residues using context-specific evolutionary conservation significantly improves short linear motif discovery. Bioinformatics 25(4): 443-50.

SigV/SigPrime Citation: Davey NE, Edwards RJ & Shields DC (2010): Estimation and efficient computation of the true probability of recurrence of short linear protein sequence motifs in unrelated proteins. BMC Bioinformatics 11: 14.

Webserver: Davey NE, Haslam NJ, Shields DC & Edwards RJ (2010): SLiMFinder: a web server to find novel, significantly over-represented, short protein motifs. Nucleic Acids Research 38: W534-W539.


SLiMMaker: Short Linear Motif Maker.

SLiMMaker is a simple tool for generating a SLiM regular expression from a set of aligned peptides. It is now integrated into SLiMFinder and QSLiMFinder to provide a more nuanced summary pattern for a cloud of predicted motifs.
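As a rough illustration of turning aligned peptides into a regular expression, the Python 3 sketch below builds one pattern element per alignment column. The function name and the `max_ambiguity` cut-off are hypothetical, and the rules are a simplification rather than SLiMMaker's actual algorithm or settings.

```python
def peptides_to_regex(peptides, max_ambiguity=3):
    """Build a simple regular expression from equal-length aligned peptides.

    Assumption: a column becomes a fixed residue if invariant, a bracketed
    set if it holds up to max_ambiguity residues, and a wildcard otherwise.
    """
    if len(set(len(p) for p in peptides)) != 1:
        raise ValueError("peptides must be aligned to the same length")
    pattern = []
    for column in zip(*peptides):
        residues = sorted(set(column))
        if len(residues) == 1:
            pattern.append(residues[0])
        elif len(residues) <= max_ambiguity:
            pattern.append("[" + "".join(residues) + "]")
        else:
            pattern.append(".")
    return "".join(pattern)


motif = peptides_to_regex(["RKLPA", "RRLPV", "RKLPI", "RHLPL"])
# motif == "R[HKR]LP."
```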

Citation: Palopoli N, Lythgow KT & Edwards RJ (2015): QSLiMFinder: improved short linear motif prediction using specific query protein data. Bioinformatics 31(14): 2284-2293.


SLiMParser: SLiMSuite REST output parsing tool

SLiMParser is a tool for creating, monitoring and parsing SLiMSuite jobs via the online REST servers. Please see the 2020 Methods in Molecular Biology protocols paper for details of use.

Citation: Edwards RJ, Paulsen K, Aguilar Gomez CM & Pérez-Bercoff Å (2020): Computational Prediction of Disordered Protein Motifs using SLiMSuite. Methods Mol Biol. 2141:37-72.


SLiMPrints: Short Linear Motif fingerprints.

SLiMPrints was a webserver that mined patterns of evolutionary conservation in intrinsically disordered protein regions to predict functional motifs in the absence of interaction data.

Citation: Davey NE, Cowan JL, Shields DC, Gibson TJ, Coldwell MJ & Edwards RJ (2012): SLiMPrints: conservation-based discovery of functional motif fingerprints in intrinsically disordered protein regions. Nucleic Acids Research 40(21):10628-41.


SLiMProb: Short Linear Motif Probability tool.

SLiMProb is a tool for finding pre-defined SLiMs (Short Linear Motifs) in a protein sequence database. SLiMProb can make use of corrections for evolutionary relationships and a variation of the SLiMChance algorithm from SLiMFinder to assess motifs for statistical over- and under-representation. SLiMProb is a replacement for the original SLiMSearch, which itself was a replacement for PRESTO. The basic architecture is the same but it was felt that having two different "SLiMSearch" servers was confusing.
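The basic search step, finding a pre-defined motif in a set of protein sequences, can be sketched in Python 3 with the standard `re` module. This is only an illustration under simple assumptions: the function name and the toy sequences are hypothetical, and none of SLiMProb's evolutionary corrections or SLiMChance-style statistics are attempted.

```python
import re


def find_motif_occurrences(motif, sequences):
    """Yield (name, start, end, match) for each motif hit, overlaps included.

    Positions are 1-based and inclusive. `sequences` maps names to protein
    sequences; `motif` is a regular expression such as "P.LP".
    """
    # A lookahead wrapped around a capture group reports overlapping hits.
    pattern = re.compile("(?=(" + motif + "))")
    for name, seq in sequences.items():
        for m in pattern.finditer(seq):
            yield name, m.start(1) + 1, m.end(1), m.group(1)


proteins = {"protA": "MAPPLPPRT", "protB": "MKKKK"}
hits = list(find_motif_occurrences("P.LP", proteins))
# hits == [("protA", 3, 6, "PPLP")]
```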

Citation: Davey NE, Haslam NJ, Shields DC & Edwards RJ (2010): SLiMSearch: a webserver for finding novel occurrences of short linear motifs in proteins, incorporating sequence context. In: Pattern Recognition in Bioinformatics Edited by Dijkstra TMH, Tsivtsivadze E, Marchiori E & Heskes T. Springer-Verlag, Berlin. Lecture Notes in Bioinformatics 6282: 50-61.


SLiMSearch: Short Linear Motif Search tool.

SLiMSearch is a tool for finding pre-defined SLiMs (Short Linear Motifs) in a protein sequence database. SLiMSearch can make use of corrections for evolutionary relationships and a variation of the SLiMChance algorithm from SLiMFinder to assess motifs for statistical over- and under-representation. SLiMSearch was a replacement for PRESTO and uses many of the same underlying modules. SLiMSearch has itself been replaced by SLiMProb.

Citation: Davey NE, Haslam NJ, Shields DC & Edwards RJ (2010): SLiMSearch: a webserver for finding novel occurrences of short linear motifs in proteins, incorporating sequence context. In: Pattern Recognition in Bioinformatics Edited by Dijkstra TMH, Tsivtsivadze E, Marchiori E & Heskes T. Springer-Verlag, Berlin. Lecture Notes in Bioinformatics 6282: 50-61.


SLiMSearch 2.0: biological context for short linear motifs in proteins.

Short, linear motifs (SLiMs) play a critical role in many biological processes. The SLiMSearch 2.0 (Short, Linear Motif Search) web server allows researchers to identify occurrences of a user-defined SLiM in a proteome, using conservation and protein disorder context statistics to rank occurrences. User-friendly output and visualizations of motif context allow the user to quickly gain insight into the validity of a putatively functional motif occurrence. For each motif occurrence, overlapping UniProt features and annotated SLiMs are displayed. Visualization also includes annotated multiple sequence alignments surrounding each occurrence, showing conservation and protein disorder statistics in addition to known and predicted SLiMs, protein domains and known post-translational modifications. In addition, enrichment of Gene Ontology terms and protein interaction partners are provided as indicators of possible motif function. All web server results are available for download. Users can search motifs against the human proteome or a subset thereof defined by Uniprot accession numbers or GO term. The SLiMSearch server is available at: http://bioware.ucd.ie/slimsearch2.html.

Citation: Davey NE, Haslam NJ, Shields DC & Edwards RJ (2011): SLiMSearch 2.0: biological context for short linear motifs in proteins. Nucleic Acids Research 39: W56-W60.


SeqSuite tools

Tools in this section are not specifically concerned with SLiM prediction or analysis. They can be run through either slimsuite.py or seqsuite.py, or their own program in tools/ or dev/. Smaller and newer developmental utilities will not be included in this list.

SeqSuite: molecular evolution and genome analysis tools

For any tools that are missing, please visit the main SLiMSuite GitHub repo. Please note that this repo is sometimes behind the versions released through individual repos.

Citation: Edwards RJ. (2020). slimsuite/SLiMSuite: SLiMSuite v1.9.1 (2020-12-27) (v1.9.1). Zenodo. https://doi.org/10.5281/zenodo.4394731


APHID: Automated Processing of High-resolution Intensity Data

APHID takes as input the partially processed results of MS analysis, with intensity data, filters based on score thresholds, removes redundancy (using PINGU) and calculates relative intensity scores. PINGU is then used to generate outputs for use with Cytoscape and other visualisation tools.

Citation: Raab M, Daxecker H, Edwards RJ, Treumann A, Murphy D & Moran N (2010): Protein interactions with the platelet integrin alpha(IIb) regulatory motif. Proteomics 10: 2790-2800.


BADASP: Burst After Duplication with Ancestral Sequence Prediction.

Burst After Duplication with Ancestral Sequence Predictions (BADASP) is a software package for identifying sites that may confer subfamily-specific biological functions in protein families following functional divergence of duplicated proteins. A given protein phylogeny is grouped into subfamilies based on orthology/paralogy relationships and/or user definitions. Ancestral sequences are then predicted from the sequence alignment and the functional specificity is calculated using variants of the Burst After Duplication method, which tests for radical amino acid substitutions following gene duplications that are subsequently conserved. Statistics are output along with subfamily groupings and ancestral sequences for an easy analysis with other packages.

Citation: Edwards RJ & Shields DC (2005): BADASP: predicting functional specificity in protein families using ancestral sequences. Bioinformatics 21(22):4190-1.


BUDAPEST: Bioinformatics Utility for Data Analysis of Proteomics on ESTs.

BUDAPEST (Bioinformatics Utility for Data Analysis of Proteomics on ESTs) removes redundancy and assigns putative homology-based identifications to translated reading frames (RFs) that have been "hit" during a MASCOT search of MS data against an EST database. Peptides assigned to "incorrect" RFs are eliminated and EST translations combined in consensus sequences using FIESTA (Fasta Input EST Analysis). These consensus hits are optionally filtered on the number of MASCOT peptides they contain before being re-annotated using BLAST searches against a reference database. Finally, HAQESAC can be used for automated or semi-automated phylogenetic analysis for improved sequence annotation.

Citation: Jones BM*, Edwards RJ*, Skipp PJ, O’Connor CD & Iglesias-Rodriguez MD (2011): Shotgun Proteomic Analysis of Emiliania huxleyi, a Marine Phytoplankton Species of Major Biogeochemical Importance. Marine Biotechnology 13(3): 496-504. *Joint first authors


BUSCOMP: BUSCO Compilation and Comparison tool

BUSCOMP is a genome assembly quality and completeness assessment tool. BUSCOMP compiles BUSCO results over multiple runs, compares results and compiles the best set of predictions into a unified dataset. It will then search this unified set back against all assemblies to fill in missing BUSCO sequences where possible, and provide a more consistent comparative assessment of completeness that is less sensitive to base errors and gene prediction algorithms. Results are output along with a number of other assembly statistics as an R markdown (and HTML) report.

Citation: Stuart KC, Edwards RJ, et al. (preprint): Transcript- and annotation-guided genome assembly of the European starling. bioRxiv 2021.04.07.438753. See also: Edwards RJ (2019): BUSCOMP: BUSCO compilation and comparison – Assessing completeness in multiple genome assemblies [version 1; not peer reviewed]. F1000Research 8:995 (slides) (doi: 10.7490/f1000research.1116972.1)


DepthCharge: Genome assembly quality control and misassembly repair.

DepthCharge is an assembly quality control and misassembly repair program. It uses mapped long read depth of coverage to charge through a genome assembly and identify coverage "cliffs" that may indicate a misassembly. If appropriate, it will then blast the assembly into fragments at those misassemblies.


DepthKopy: Read-depth based copy number estimation.

DepthKopy applies the same single-copy read depth estimate as DepthSizer to estimate the copy number of different gene regions in a slightly modified version of the approach used in the basenji genome paper.

Citation: Edwards RJ, Field MA, Ferguson JM, Dudchenko O, Keilwagen J, Rosen BD, Johnson GS, Rice ES, Hillier L, Hammond JM, Towarnicki SG, Omer A, Khan R, Skvortsova K, Bogdanovic O, Zammit RA, Aiden EL, Warren WC & Ballard JWO (2021): Chromosome-length genome assembly and structural variations of the primal Basenji dog (Canis lupus familiaris) genome. BMC Genomics 22:188


DepthSizer: Read-depth based genome size prediction

DepthSizer uses long-read depth profiles and BUSCO single-copy orthologues to predict genome size. DepthSizer works on the principle that Complete BUSCO genes should represent predominantly single copy (diploid read depth) regions along with some poor quality and/or repeat regions. Assembly artefacts and collapsed repeats etc. are predicted to deviate from diploid read depth in an inconsistent manner. Therefore, even if less than half the region is actually diploid coverage, the modal read depth is expected to represent the actual single copy read depth.
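The modal-depth principle can be sketched in a few lines of Python 3. The function names are hypothetical, and the final step (dividing the total sequenced bases by the single-copy depth) is an assumption made for illustration; the text above only states that the modal depth of BUSCO regions is taken as the single-copy read depth.

```python
from collections import Counter


def modal_depth(depths):
    """Return the most common per-base depth (smallest value on ties)."""
    counts = Counter(depths)
    best = max(counts.values())
    return min(d for d, n in counts.items() if n == best)


def estimate_genome_size(total_read_bases, busco_depths):
    """Estimate genome size as total sequenced bases / single-copy depth.

    Assumption: this ratio is an illustrative estimator, not necessarily
    the exact calculation DepthSizer performs.
    """
    single_copy = modal_depth(busco_depths)
    if single_copy <= 0:
        raise ValueError("single-copy depth must be positive")
    return total_read_bases / single_copy


# Toy depth profile: mostly 30X, with a collapsed repeat (~60X) and a
# low-quality region (15X) that do not shift the mode.
depths = [30, 31, 30, 29, 30, 60, 61, 15, 30]
size = estimate_genome_size(3_000_000, depths)
```

Because the mode is used rather than the mean, the outlying 60X and 15X positions in the toy profile have no effect on the estimate.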

Citation: Chen SH et al. & Edwards RJ (preprint): Chromosome-level de novo genome assembly of Telopea speciosissima (New South Wales waratah) using long-reads, linked-reads and Hi-C. bioRxiv 2021.06.02.444084.


Diploidocus: Diploid genome assembly analysis toolkit

Diploidocus is a sequence analysis toolkit for a number of different analyses related to diploid genome assembly. The main suite of analyses combines long read depth profiles, short read kmer analysis, assembly kmer analysis, BUSCO gene prediction and contaminant screening for a number of assembly tasks including contamination identification, haplotig identification/removal and low quality contig/scaffold trimming/filtering.

Basic Tidy Citation: Edwards RJ, Field MA, Ferguson JM, Dudchenko O, Keilwagen J, Rosen BD, Johnson GS, Rice ES, Hillier L, Hammond JM, Towarnicki SG, Omer A, Khan R, Skvortsova K, Bogdanovic O, Zammit RA, Aiden EL, Warren WC & Ballard JWO (2021): Chromosome-length genome assembly and structural variations of the primal Basenji dog (Canis lupus familiaris) genome. BMC Genomics 22:188.

10x Pseudodip citation: Stuart KC*, Edwards RJ*, Cheng Y, Warren WC, Burt DW, Sherwin WB, Hofmeister NR, Werner SJ, Ball GF, Bateson M, Brandley MC, Buchanan KL, Cassey P, Clayton DF, De Meyer T, Meddle SL & Rollins LA (preprint): Transcript- and annotation-guided genome assembly of the European starling. bioRxiv 2021.04.07.438753; doi: 10.1101/2021.04.07.438753.

Tidy Citation: Chen SH et al. & Edwards RJ (preprint): Chromosome-level de novo genome assembly of Telopea speciosissima (New South Wales waratah) using long-reads, linked-reads and Hi-C. bioRxiv 2021.06.02.444084.


FIESTA: Fasta Input EST Analysis

FIESTA has three primary functions: 1. Discovery, assembly and evolutionary analysis of candidate genes in an EST library; 2. Assembly of an EST library for proteomics analysis; 3. Translation/Annotation of an EST library for proteomics analysis.

Citation: Jones BM*, Edwards RJ*, Skipp PJ, O’Connor CD & Iglesias-Rodriguez MD (2011): Shotgun Proteomic Analysis of Emiliania huxleyi, a Marine Phytoplankton Species of Major Biogeochemical Importance. Marine Biotechnology 13(3): 496-504. *Joint first authors


GABLAM: Global Analysis of BLAST Local AlignMents.

GABLAM is a versatile tool for tabulating the results of all-by-all BLAST+ or Minimap2 searches, with a focus on generating global coverage and identity statistics. GABLAM also features a number of additional outputs, such as extracting hit regions as fasta files, or SAM/GFF output of local hits. GABLAM is a core component of the correction for evolutionary relationships within SLiMDisc and SLiMFinder.
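To show what converting local hits into a global coverage statistic can look like, the Python 3 sketch below merges overlapping local alignment intervals on a query and reports the percentage of the query covered. The function names and the 1-based inclusive coordinate convention are assumptions for illustration; GABLAM's own outputs also include identity statistics, which are not modelled here.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent (start, end) intervals (1-based, inclusive)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def query_coverage(local_hits, query_length):
    """Return the percentage of query positions covered by at least one hit."""
    covered = sum(end - start + 1 for start, end in merge_intervals(local_hits))
    return 100.0 * covered / query_length


# Three local hits on a 100-residue query; the first two overlap.
coverage = query_coverage([(1, 50), (40, 80), (91, 100)], 100)
```

Merging first ensures that positions hit by several local alignments are counted only once.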

Citation: Davey NE, Shields DC & Edwards RJ (2006): SLiMDisc: short, linear motif discovery, correcting for common evolutionary descent. Nucleic Acids Res. 34(12):3546-54.


GapSpanner: Genome assembly gap long read support and reassembly tool.

GapSpanner uses (or generates) a BAM file of long reads mapped to a genome assembly to assess assembly "gaps" for spanning read support. Optionally, reads spanning each gap can be extracted and re-assembled with Flye. If the new assembly spans the gap, crude gap-filling can be performed. This will be reversed if edits are not subsequently supported by spanning reads mapped onto the updated assembly.


GASP: Gapped Ancestral Sequence Prediction.

GASP predicts ancestral protein sequences from a multiple sequence alignment and phylogenetic tree. Unlike many other similar tools, it will predict states for columns with gaps. GASP is also implemented as part of BADASP and HAQESAC.

Citation: Edwards RJ & Shields DC (2004): GASP: Gapped Ancestral Sequence Prediction for proteins. BMC Bioinformatics 5(1):123.


GFESSA: Genome-Free EST SuperSAGE Analysis.

This program is for the automated processing, mapping and identification-by-homology for SuperSAGE tag data for organisms without genome sequences, relying predominantly on EST libraries etc. Although designed for genome-free analysis, there is no reason why transcriptome data from genome projects cannot be used in the pipeline.

Citation: Johansson SA, Stephenson P, Edwards RJ, Yoshida K, Moore M, Terauchi R, Zubkov MV, Terry MJ & Bibby TS (2020): Isolation and molecular characterisation of Dunaliella tertiolecta with truncated light-harvesting antenna for enhanced photosynthetic efficiency. Algal Research 48:101917.


GOPHER: Generation of Orthologous Proteins from High-throughput Estimation of Relationships

GOPHER is a query-focused orthologue prediction tool, designed to identify and align the closest orthologue from each species to a set of query proteins. GOPHER uses sequence similarity to estimate orthology/paralogy relationships. Unlike orthologue clustering tools, GOPHER is explicitly focused on the orthology/paralogy to the query protein, e.g. there is no requirement for other sequences in the alignment to be orthologous to each other. GOPHER is a rapid tool for the generation of multiple sequence alignments suitable for sequence conservation analyses, such as those used by conservation masking in SLiMSuite.

Citation: Davey NE*, Edwards RJ* & Shields DC (2007): The SLiMDisc server: short, linear motif discovery in proteins. Nucleic Acids Res. 35(Web Server issue):W455-9.


HAQESAC: Homologue Alignment Quality, Establishment of Subfamilies and Ancestor Construction.

HAQESAC was published as part of our 2007 Nature Chemical Biology paper, using functional specificity analysis to predict novel bioactive peptides. HAQESAC was primarily designed to generate curated high quality multiple sequence alignments of 2+ paralogous genes across multiple species, as input for BADASP. It is also useful for generating high-quality automated alignments and trees of proteins for initial annotation etc. HAQESAC can be scaled up using MultiHAQ.

Citation: Edwards RJ*, Moran N*, Devocelle M, Kiernan A, Meade G, Signac W, Foy M, Park SDE, Dunne E, Kenny D & Shields DC (2007): Bioinformatic discovery of novel bioactive peptides. Nature Chem. Biol. 3(2):108-112.


MultiHAQ: Multi-Query HAQESAC controller

MultiHAQ is a wrapper for multiple HAQESAC runs where different query proteins are to be BLASTed against the same search database(s) and run through HAQESAC with the same settings.

Citation: Jones BM*, Edwards RJ*, Skipp PJ, O’Connor CD & Iglesias-Rodriguez MD (2011): Shotgun Proteomic Analysis of Emiliania huxleyi, a Marine Phytoplankton Species of Major Biogeochemical Importance. Marine Biotechnology 13(3): 496-504. *Joint first authors


NUMTFinder: Nuclear mitochondrial fragment (NUMT) search tool.

NUMTFinder uses a mitochondrial genome to search against a genome assembly and identify putative NUMTs. NUMT fragments are then combined into NUMT blocks based on proximity.
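Combining fragments into blocks by proximity can be sketched as a simple interval-merging pass in Python 3. The function name, the tuple layout and the `max_gap` value are all hypothetical choices for illustration and do not reflect NUMTFinder's actual parameters or defaults.

```python
def merge_fragments(fragments, max_gap=1000):
    """Group (scaffold, start, end) hits into blocks.

    Fragments on the same scaffold are joined when the gap between them is
    at most max_gap bases. Returns (scaffold, start, end, fragment_count).
    """
    blocks = []
    for scaffold, start, end in sorted(fragments):
        last = blocks[-1] if blocks else None
        if last and last[0] == scaffold and start - last[2] <= max_gap:
            last[2] = max(last[2], end)  # extend the current block
            last[3] += 1
        else:
            blocks.append([scaffold, start, end, 1])
    return [tuple(b) for b in blocks]


numt_hits = [
    ("chr1", 100, 200),
    ("chr1", 250, 400),
    ("chr1", 20000, 20100),
    ("chr2", 150, 300),
]
numt_blocks = merge_fragments(numt_hits, max_gap=100)
```

With a 100 bp gap limit, the first two chr1 fragments form one block, while the distant chr1 fragment and the chr2 fragment remain separate.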

Citation: Edwards RJ, Field MA, Ferguson JM, Dudchenko O, Keilwagen J, Rosen BD, Johnson GS, Rice ES, Hillier L, Hammond JM, Towarnicki SG, Omer A, Khan R, Skvortsova K, Bogdanovic O, Zammit RA, Aiden EL, Warren WC & Ballard JWO (2021): Chromosome-length genome assembly and structural variations of the primal Basenji dog (Canis lupus familiaris) genome. BMC Genomics 22:188


PAFScaff: Pairwise mApping Format reference-based scaffold anchoring and super-scaffolding.

PAFScaff is designed for mapping genome assembly scaffolds to a closely-related chromosome-level reference genome assembly. It uses (or runs) Minimap2 to perform an efficient (if rough) all-against-all mapping, then parses the output to assign assembly scaffolds to reference chromosomes.
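A minimal Python 3 sketch of the parse-and-assign step is given below. It reads standard PAF columns (query name in column 1, target name in column 6, number of matching bases in column 10) and assigns each scaffold to the reference sequence with the most matching bases. The function name and the "most matched bases wins" rule are assumptions for illustration, not PAFScaff's actual assignment logic.

```python
from collections import defaultdict


def assign_scaffolds(paf_lines):
    """Map each query scaffold to the reference with the most matched bases."""
    totals = defaultdict(lambda: defaultdict(int))
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        query, target, matches = fields[0], fields[5], int(fields[9])
        totals[query][target] += matches
    # Sorting the targets first makes tie-breaking alphabetical.
    return {q: max(sorted(hits), key=hits.get) for q, hits in totals.items()}


paf = [
    "scaf1\t1000\t0\t500\t+\tchr1\t5000\t100\t600\t480\t500\t60",
    "scaf1\t1000\t500\t1000\t+\tchr2\t4000\t0\t500\t300\t500\t20",
    "scaf1\t1000\t600\t900\t-\tchr1\t5000\t900\t1200\t290\t300\t60",
    "scaf2\t800\t0\t800\t+\tchr2\t4000\t1000\t1800\t790\t800\t60",
]
assignments = assign_scaffolds(paf)
# assignments == {"scaf1": "chr1", "scaf2": "chr2"}
```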

Citation: Field MA, Rosen BD, Dudchenko O, Chan EKF, Minoche AM, Edwards RJ, Barton K, Lyons RJ, Enosi Tuipulotu D, Hayes VM, Omer AD, Colaric Z, Keilwagen J, Skvortsova K, Bogdanovic O, Smith MA, Lieberman Aiden E, Smith TPL, Zammit RA & Ballard JWO (2020): Canfam_GSD: De novo chromosome-length genome assembly of the German Shepherd Dog (Canis lupus familiaris) using a combination of long reads, optical mapping, and Hi-C. GigaScience 9(4):giaa027.


PINGU: Protein Interaction Network & GO Utility (v3)

This utility was originally created for handling proteomics data with EnsEMBL peptide IDs. The data needed to be mapped onto genes, overlaps and redundancies identified, gene lists output for GO analysis with FatiGO, and PPI data from HPRD and BioGRID used to identify potential complexes.

Citation: Raab M, Daxecker H, Edwards RJ, Treumann A, Murphy D & Moran N (2010): Protein interactions with the platelet integrin alpha(IIb) regulatory motif. Proteomics 10: 2790-2800.


PAGSAT: Pairwise Assembled Genome Sequence Analysis Tool.

PAGSAT performs comparative assessment of an assembled genome against a suitable reference. For optimal results, the reference genome will be close to identical to that which should be assembled. However, comparative analyses should still be useful when different assemblies are run against a related genome. PAGSAT also contains tools to help interactively tidy up small genomes. It is designed and optimised for yeast assemblies.


SAAGA: Summarise, Annotate & Assess Genome Annotations.

SAAGA is a tool for annotation versus reference proteome comparisons. SAAGA can compare different annotations of the same assembly, or be combined with a lightweight annotation tool like GeMoMa to compare different assemblies of the same organism. A number of statistics are generated to summarise comparative completeness, redundancy and accuracy of the annotation.

Citation (Summarise): Edwards RJ, Field MA, Ferguson JM, Dudchenko O, Keilwagen J, Rosen BD, Johnson GS, Rice ES, Hillier L, Hammond JM, Towarnicki SG, Omer A, Khan R, Skvortsova K, Bogdanovic O, Zammit RA, Aiden EL, Warren WC & Ballard JWO (2021): Chromosome-length genome assembly and structural variations of the primal Basenji dog (Canis lupus familiaris) genome. BMC Genomics 22:188.

Citation (Assess): Stuart KC*, Edwards RJ*, Cheng Y, Warren WC, Burt DW, Sherwin WB, Hofmeister NR, Werner SJ, Ball GF, Bateson M, Brandley MC, Buchanan KL, Cassey P, Clayton DF, De Meyer T, Meddle SL & Rollins LA (preprint): Transcript- and annotation-guided genome assembly of the European starling. bioRxiv 2021.04.07.438753; doi: 10.1101/2021.04.07.438753.


SAMPhaser: Diploid Genome Haplotype Phasing.

SAMPhaser is a probabilistic phasing tool for diploid long-read data.

Citation: Song W, Thomas T & Edwards RJ (2019): Complete genome sequences of pooled genomic DNA from 10 marine bacteria using PacBio long-read sequencing. Marine Genomics 48:100687.


Snapper: Genome-wide SNP Mapper

Snapper is designed to generate a table of SNPs from a BLAST comparison of two genomes, map those SNPs onto genome features, predict effects and generate a series of output tables to aid exploration of genomic differences. It will also output regions of each assembly that are not found in the other.


SynBad: Synteny-based scaffolding adjustment.

SynBad is a tool for comparing two related genome assemblies and identifying putative translocations and inversions between the two that correspond to gap positions. These positions could indicate misplaced scaffolding. Where possible, gap-spanning mapped reads and HiC read pairs will be used to support gap placement. KAT assembly kmer analysis and read depth will also be used if possible to identify possible false duplications.


Taxolotl: Genome assembly taxonomy summary and assessment tool.

Taxolotl combines the MMseqs2 easy-taxonomy with GFF parsing to perform taxonomic analysis of a genome assembly (and any subsets given by taxsubsets=LIST) using an annotated proteome. Taxonomic assignments are mapped onto genes as well as assembly scaffolds and (if assembly=FILE is given) contigs.


PDF Manuals Archive

Please note that not all programs have manuals and most manuals are out of date. Documentation is in the process of being moved to GitHub: if in doubt, check the online documentation and docstrings for the latest options and default settings. Please report any anomalous behaviour. Suggestions for improvements to programs and documentation are also appreciated.


© 2021 RJ Edwards. Contact: richard.edwards@unsw.edu.au.