Community Service

As a service to the Bioinformatics, Systematics, Genomics and Evolution communities, BRI provides the programs below at no cost. All are command-line programs that include detailed documentation to permit use by inexperienced users.

kSNP4

kSNP4 identifies the SNPs (Single Nucleotide Polymorphisms) in a set of genome sequences and estimates phylogenetic trees based upon those SNPs. SNP discovery is based on k-mer analysis and requires neither a multiple sequence alignment nor a reference genome. kSNP4 can analyze hundreds of microbial genomes, both complete (finished) genomes and unfinished genomes in assembled contigs or raw, unassembled reads. Finished and unfinished genomes can be analyzed together, and kSNP4 can automatically use the information in a specified set of finished genomes to annotate the SNPs in all the genomes in the data set.
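To make the idea of alignment-free, k-mer based SNP discovery concrete, here is a minimal Python sketch. It is not kSNP4's code: the function names are hypothetical, the toy sequences are invented, and it ignores reverse complements, repeats and efficiency. It assumes an odd k and treats a locus as a candidate SNP when two genomes share the same flanking bases but differ at the central base of the k-mer.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every k-mer of a sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def find_kmer_snps(genomes, k=5):
    """Return {(left_flank, right_flank): {genome: central_base}} for loci
    whose flanks are shared but whose central base differs between genomes.
    Illustrative only; names and logic are assumptions, not kSNP4's method."""
    if k % 2 == 0:
        raise ValueError("k must be odd so that each k-mer has a central base")
    half = k // 2
    loci = defaultdict(dict)
    for name, seq in genomes.items():
        for kmer in kmers(seq.upper(), k):
            flanks = (kmer[:half], kmer[half + 1:])
            loci[flanks].setdefault(name, set()).add(kmer[half])

    snps = {}
    for flanks, per_genome in loci.items():
        # Skip loci that are ambiguous within a single genome.
        if any(len(alleles) != 1 for alleles in per_genome.values()):
            continue
        calls = {name: next(iter(a)) for name, a in per_genome.items()}
        # A candidate SNP needs at least two different central bases.
        if len(set(calls.values())) > 1:
            snps[flanks] = calls
    return snps


# Two invented toy sequences that differ at a single position.
genomes = {
    "g1": "ACGTACCTGA",
    "g2": "ACGTATCTGA",
}
snps = find_kmer_snps(genomes, k=5)
print(snps)
```

Because loci are keyed by their flanking sequence rather than by a coordinate, no alignment or reference genome is needed in this sketch.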

No programming skills are required to use kSNP4.

kSNP4 executable files are provided for Linux and Mac platforms.

kSNP4 was released on October 26, 2022. See this news article for the story of kSNP4.

We have not yet published a paper on kSNP4, but publications about earlier versions are:

Gardner, S.N. and B.G. Hall. 2013. PLoS ONE 8(12): e81760. doi:10.1371/journal.pone.0081760

Gardner, S.N., T. Slezak, and B.G. Hall. 2015. Bioinformatics 31: 2877-2878. doi:10.1093/bioinformatics/btv271

Download kSNP4 at https://sourceforge.net/projects/ksnp/files/

kSNPdist

kSNPdist calculates the number of SNP differences between all pairs of genomes in a data set by comparing the SNP sequences from the SNPs_all_matrix.fasta file written by kSNP4.
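The pairwise comparison described above can be sketched in a few lines of Python. This is an illustration, not kSNPdist itself: the function names are hypothetical, the example records are invented, and the treatment of missing alleles is an assumption (positions where either genome has `-` are skipped here). It further assumes that every record in the FASTA matrix has the same length, with one column per SNP locus.

```python
from itertools import combinations


def read_fasta(lines):
    """Parse FASTA-formatted lines into {name: sequence}."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def pairwise_snp_differences(records, missing="-"):
    """Count differing SNP positions for every pair of genomes.
    Positions with a missing allele in either genome are ignored
    (an assumption made for this sketch)."""
    diffs = {}
    for a, b in combinations(sorted(records), 2):
        sa, sb = records[a], records[b]
        if len(sa) != len(sb):
            raise ValueError(f"{a} and {b} have different numbers of SNP loci")
        diffs[(a, b)] = sum(
            1 for x, y in zip(sa, sb)
            if x != y and x != missing and y != missing
        )
    return diffs


# Invented example data standing in for a SNP matrix FASTA file.
example = """>genomeA
ACGT-A
>genomeB
ACCTTA
>genomeC
TCCTTG
""".splitlines()

records = read_fasta(example)
distances = pairwise_snp_differences(records)
for pair, count in distances.items():
    print(pair, count)
```

To try it on a real file, pass an open file handle to `read_fasta` instead of the example lines.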

The kSNPdist package includes executables for Mac and Linux and complete documentation.

Ankrum, A. and B.G. Hall. 2017. Population dynamics of Staphylococcus aureus in Cystic Fibrosis patients to determine transmission events utilizing WGS. J. Clin. Microbiol. 55: 2143-2152. doi:10.1128/JCM.00164-17

Download kSNPdist at https://sourceforge.net/projects/ksnp/files/

EvolveAGene

EvolveAGene4 is a coding sequence evolution simulation program. It simulates evolution by base substitutions, insertions, deletions, and recombination events.

It does not use a mathematical model of evolutionary processes; instead, it uses an even older model: mutation and selection.
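As a rough illustration of what "mutation and selection" can mean in a simulation, the Python sketch below proposes random base substitutions in a coding sequence and discards any that create an in-frame stop codon. Everything in it is an assumption made for illustration: the function names, the toy ancestor sequence, the uniform choice of substitutions and the stop-codon rule are hypothetical and are not taken from EvolveAGene4, which also handles insertions, deletions and recombination.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_internal_stop(seq):
    """True if any in-frame codon other than the last is a stop codon."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 3, 3)]
    return any(codon in STOP_CODONS for codon in codons)


def evolve(seq, n_accepted, rng):
    """Apply random substitutions until n_accepted of them survive selection.
    Selection here is a toy rule: reject any mutation that introduces a
    premature stop codon. The final (stop) codon is never mutated."""
    current = list(seq)
    accepted = 0
    while accepted < n_accepted:
        pos = rng.randrange(len(current) - 3)  # leave the terminal codon alone
        new_base = rng.choice([b for b in "ACGT" if b != current[pos]])
        candidate = current.copy()
        candidate[pos] = new_base
        if has_internal_stop("".join(candidate)):
            continue  # selection: the mutant is discarded
        current = candidate
        accepted += 1
    return "".join(current)


# Invented toy coding sequence: ATG GCT AAA CGT TGG TAA
ancestor = "ATGGCTAAACGTTGGTAA"
descendant = evolve(ancestor, n_accepted=10, rng=random.Random(1))
print(ancestor)
print(descendant)
```

Because every accepted step is recorded in the order it occurs, a simulation of this kind knows the true history of the sequence, which is what makes simulated data useful for testing phylogenetic and alignment software.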

EvolveAGene4, released in 2016, is an upgrade of EvolveAGene3 that incorporates recombination into the process.

EvolveAGene3 has been used to study the accuracies of phylogenetic software and sequence alignment software by providing a known “true” phylogeny and an alignment in which every step is known.

Hall, B.G. 2008. Simulating DNA coding sequence evolution with EvolveAGene 3. Mol. Biol. Evol. 25: 688-695.

The EvolveAGene4 package includes executables for Mac, Linux, and Windows. Download the EvolveAGene4 Package from https://sourceforge.net/projects/evolveagene/files/

MSTgold

Minimum spanning tree (MST) algorithms can generate multiple, equally minimal MSTs, but MST programs typically report only one, chosen arbitrarily. Similarly, most MST programs do not provide statistical metrics to support the credibility of the MSTs that they estimate.

MSTgold estimates the number of alternative MSTs, reports up to a user-determined number of those trees, reports a consensus tree, and implements a bootstrapping metric to assess the reliability of alternate MST solutions.
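The existence of several equally minimal MSTs is easy to demonstrate on a small example. The Python sketch below enumerates every spanning tree of a tiny graph by brute force and keeps those with the lowest total weight. It is only an illustration under stated assumptions: the function names and the four-node distance data are invented, and exhaustive enumeration is practical only for very small data sets; it is not how MSTgold works.

```python
from itertools import combinations


def spanning_trees(n, edges):
    """Yield every set of n-1 edges that forms a spanning tree on n nodes."""
    for subset in combinations(edges, n - 1):
        parent = list(range(n))

        def find(x):
            # Union-find lookup with path halving.
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        acyclic = True
        for u, v, _ in subset:
            ru, rv = find(u), find(v)
            if ru == rv:  # this edge would close a cycle
                acyclic = False
                break
            parent[ru] = rv
        if acyclic:  # n-1 edges without a cycle connect all n nodes
            yield subset


def all_minimum_spanning_trees(n, edges):
    """Return (minimum total weight, list of all trees with that weight)."""
    best, trees = None, []
    for tree in spanning_trees(n, edges):
        weight = sum(w for _, _, w in tree)
        if best is None or weight < best:
            best, trees = weight, [tree]
        elif weight == best:
            trees.append(tree)
    return best, trees


# Invented example: four isolates on a "square" with equal side distances.
edges = [
    (0, 1, 1), (1, 2, 1), (2, 3, 1), (0, 3, 1),  # sides
    (0, 2, 2), (1, 3, 2),                        # diagonals
]
best_weight, msts = all_minimum_spanning_trees(4, edges)
print(best_weight, len(msts))
for tree in msts:
    print(tree)
```

In this example four different trees share the minimum total weight of 3, so a program that reports a single MST would be presenting one of four equally good answers.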

MSTgold accepts single-character data that are nucleotides, amino acids, binary characters, or SNPs; integers that represent, for instance, the lengths of VNTRs or microsatellites; and distance matrices.

The MSTgold package includes Mac OS X, Linux, and Windows executables of the MSTgold program, a detailed Manual, example data and results, and executables of the program Fasta2MSTG which converts Fasta sequence files to the MSTgold input format.

Salipante, S.J. and B.G. Hall. 2011. The Inadequacies of Minimum Spanning Trees in Molecular Epidemiology. J. Clin. Microbiol. 49: 3568-3575.

Download the MSTgold package from https://sourceforge.net/projects/mstgold/files/

PPFS2

PPFS2 (Predicting Phenotypes From SNPs) is used for downstream analysis of the results of kSNP4 analyses. It identifies a set of SNPs that are highly correlated with the phenotypes of those genomes whose phenotypes are known. It then uses that information to predict the phenotypes of genomes whose phenotypes are unknown.
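The general idea of associating SNPs with known phenotypes and then predicting unknown ones can be sketched as follows. This is a deliberately simplified, hypothetical example and not PPFS2's algorithm: the function names, the toy SNP strings and the phenotype labels are invented, and the scoring rule (majority phenotype per allele, using the single best SNP) is an assumption chosen for brevity.

```python
from collections import Counter, defaultdict


def allele_rule(position, snps, phenotypes):
    """Map each allele at a SNP position to the most common phenotype
    among the genomes whose phenotype is known."""
    by_allele = defaultdict(Counter)
    for genome, phenotype in phenotypes.items():
        by_allele[snps[genome][position]][phenotype] += 1
    return {allele: counts.most_common(1)[0][0]
            for allele, counts in by_allele.items()}


def rule_accuracy(position, snps, phenotypes):
    """Fraction of known genomes whose phenotype the allele rule reproduces."""
    rule = allele_rule(position, snps, phenotypes)
    hits = sum(rule[snps[g][position]] == p for g, p in phenotypes.items())
    return hits / len(phenotypes)


def predict(genome, position, snps, phenotypes):
    """Predict a phenotype from one SNP; None if the allele was never seen."""
    rule = allele_rule(position, snps, phenotypes)
    return rule.get(snps[genome][position])


# Invented data: one string of SNP alleles per genome (one column per SNP).
snps = {"g1": "ACT", "g2": "ACT", "g3": "GCT", "g4": "GTA", "g5": "GTA"}
# Known phenotypes; g5 is the genome to predict.
phenotypes = {"g1": "resistant", "g2": "resistant",
              "g3": "sensitive", "g4": "sensitive"}

n_snps = len(next(iter(snps.values())))
best_snp = max(range(n_snps), key=lambda i: rule_accuracy(i, snps, phenotypes))
prediction = predict("g5", best_snp, snps, phenotypes)
print(best_snp, rule_accuracy(best_snp, snps, phenotypes), prediction)
```

A realistic analysis would combine many SNPs and guard against associations that arise by chance; the sketch uses one SNP only to keep the logic visible.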

Hall, B.G. 2014. SNP-associations and phenotype predictions from hundreds of microbial genomes without genome alignments. PLoS ONE 9(2): e90490. doi:10.1371/journal.pone.0090490

Download PPFS2 for Mac and Linux OS from https://sourceforge.net/projects/ppfs/files/

FindPlasmids

FindPlasmids is a package of executables and detailed instructions for finding plasmid sequences in sequence assembly files of bacterial genomes.

FindPlasmids packages are provided for Mac, Linux, and Windows operating systems.

Download FindPlasmids from https://sourceforge.net/projects/findplasmids/files/
