software

 

 

SIFTER Search: A web server for accurate phylogeny-based protein function prediction

-- by Sayed Mohammad Ebrahim Sahraeian, Kevin R. Luo, and Steven E. Brenner

 

The accurate annotation of protein function is key to understanding life at the molecular level. Because it is inherently difficult and expensive, biochemical characterization of protein function cannot scale to accommodate the vast amount of sequence data already available, much less its continued growth. Thus, there is a need for reliable computational methods to predict protein function. SIFTER (Statistical Inference of Function Through Evolutionary Relationships) is a statistical approach to predicting protein function that uses a protein family's phylogenetic tree as the natural structure for representing protein relationships. SIFTER has previously been shown to perform better than other methods in widespread use.

The SIFTER web server provides access to SIFTER results on 16,853,547 proteins from 232,403 species, which have been precomputed using specially optimized parameters. Users can access the protein function predictions by searching for one or multiple proteins, searching for all proteins in a given species, searching for proteins that are predicted to have given functions, or searching for the predictions of homologs of a given sequence.

SIFTER Search is available at http://sifter.berkeley.edu/

SMETANA: Semi-Markov random walk scores Enhanced by consistency Transformation for Accurate Network Alignment

-- by Sayed Mohammad Ebrahim Sahraeian and Byung-Jun Yoon

 

SMETANA is a probabilistic scheme for finding the maximum expected accuracy (MEA) alignment of large-scale biological networks. It employs a semi-Markov random walk model to estimate the correspondence scores between proteins in different networks. SMETANA incorporates the local and global similarities across multiple networks by applying two types of probabilistic consistency transformations that enhance the initial node correspondence scores. The transformed scores are subsequently used to construct the MEA alignment in a greedy manner. SMETANA can serve as an effective tool for accurately aligning multiple networks, and it is particularly well suited to aligning a large number of networks. It is highly efficient and scalable: it can align tens of networks with thousands of nodes within a few minutes on a personal computer.
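The final greedy step described above can be illustrated with a minimal sketch. It is not SMETANA's implementation: it assumes only two networks, a strict one-to-one mapping, and correspondence scores that have already been computed and transformed. The function name `greedy_alignment`, the `min_score` cutoff, and the toy scores are all hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (node in network A, node in network B)


def greedy_alignment(scores: Dict[Pair, float], min_score: float = 0.0) -> List[Pair]:
    """Greedily build a one-to-one alignment from pairwise scores.

    Simplifying assumption: each node may be aligned at most once.
    """
    used_a, used_b = set(), set()
    alignment: List[Pair] = []
    # Visit pairs from highest to lowest score; break ties by node names.
    for (a, b), score in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
        if score <= min_score:
            break  # all remaining pairs score at or below the cutoff
        if a in used_a or b in used_b:
            continue  # one of the two nodes is already aligned
        alignment.append((a, b))
        used_a.add(a)
        used_b.add(b)
    return alignment


# Hypothetical, already-transformed correspondence scores.
toy_scores = {
    ("a1", "b1"): 0.9,
    ("a1", "b2"): 0.6,
    ("a2", "b1"): 0.7,
    ("a2", "b2"): 0.5,
    ("a3", "b3"): 0.1,
}
result = greedy_alignment(toy_scores, min_score=0.2)
print(result)
```

In this toy case the pair ("a2", "b1") is skipped because "b1" is already aligned to "a1", so "a2" falls back to "b2", and the low-scoring pair ("a3", "b3") is left unaligned.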

SMETANA is described in the following paper:

  • SMETANA: accurate and scalable algorithm for probabilistic alignment of large-scale biological networks.
    S.M.E. Sahraeian and B.J. Yoon, PLoS ONE, 8(7): e67995, Jul. 2013.
    [PLoS ONE] [Supplementary data]

Here is the official website of SMETANA, where you can download the source code.

NAPAbench: Network Alignment Performance Assessment benchmark

-- by Sayed Mohammad Ebrahim Sahraeian and Byung-Jun Yoon

 

NAPAbench is a synthetic benchmark dataset that addresses the need for accurate and systematic validation of network alignment algorithms. It is constructed using a comprehensive scheme for generating evolutionarily related families of synthetic protein-protein interaction (PPI) networks. Current PPI networks are incomplete and inaccurate, and the functional coherence of aligned proteins is difficult to assess accurately. A network synthesis model that generates families of networks with biologically realistic properties therefore provides a practical and effective alternative for validating network alignment algorithms.

NAPAbench is described in the following paper:

  • A Network Synthesis Model for Generating Protein Interaction Network Families.
    S.M.E. Sahraeian and B.J. Yoon, PLoS ONE, 7(8): e41474, 2012.
    [PLoS ONE] [Supplementary data]

Here is the official website of NAPAbench, where you can download the NAPAbench dataset and the source code for generating network families.

RESQUE: REduction-based scheme using Semi-Markov scores for network QUErying

-- by Sayed Mohammad Ebrahim Sahraeian and Byung-Jun Yoon

 

RESQUE is a probabilistic technique for querying large-scale biological networks. RESQUE searches for regions in the target network with high biological similarity to the query network. It employs a semi-Markov random walk model to estimate the correspondence scores between proteins in the query and target networks. The computed scores are used with an iterative network reduction approach to shrink the search space in the target network. This iterative scheme enhances the accuracy of the estimated correspondence scores, thereby leading to more accurate querying results.
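The score-then-reduce loop can be sketched as follows. This is a toy illustration under stated assumptions, not RESQUE's algorithm: the scorer `toy_score` is a hypothetical stand-in for the semi-Markov random walk scores (it adds a node's own similarity to the mean similarity of its remaining neighbors), and `similarity`, `target_size`, and `drop_fraction` are invented parameters.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # target network as adjacency sets


def toy_score(graph: Graph, kept: Set[str], similarity: Dict[str, float]) -> Dict[str, float]:
    """Placeholder scorer (NOT the semi-Markov random walk model).

    A node's score is its own similarity to the query plus the mean
    similarity of its neighbors that are still in the reduced network.
    """
    scores = {}
    for node in kept:
        nbrs = graph[node] & kept
        nbr_mean = sum(similarity[n] for n in nbrs) / len(nbrs) if nbrs else 0.0
        scores[node] = similarity[node] + nbr_mean
    return scores


def iterative_reduction(graph: Graph, similarity: Dict[str, float],
                        target_size: int, drop_fraction: float = 0.25) -> Set[str]:
    """Repeatedly rescore the reduced network and drop its weakest nodes."""
    kept = set(graph)
    while len(kept) > target_size:
        scores = toy_score(graph, kept, similarity)  # rescore on the reduced network
        n_drop = max(1, int(len(kept) * drop_fraction))
        n_drop = min(n_drop, len(kept) - target_size)  # never shrink below target_size
        ranked = sorted(kept, key=lambda n: (scores[n], n))  # lowest scores first
        kept -= set(ranked[:n_drop])
    return kept


# Hypothetical target network: a path t1 - t2 - t3 - t4 - t5 - t6.
target = {
    "t1": {"t2"}, "t2": {"t1", "t3"}, "t3": {"t2", "t4"},
    "t4": {"t3", "t5"}, "t5": {"t4", "t6"}, "t6": {"t5"},
}
# Hypothetical best similarity of each target node to any query node.
sim = {"t1": 0.9, "t2": 0.8, "t3": 0.7, "t4": 0.1, "t5": 0.1, "t6": 0.05}
region = iterative_reduction(target, sim, target_size=3)
print(sorted(region))
```

Because scores are recomputed after every reduction, a node's score reflects only the neighbors that remain in the reduced network, which is the sense in which the iteration refines the estimates.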

RESQUE is described in the following paper:

Here is the official website of RESQUE, where you can download the source code.

PicXAA: Probabilistic maximum Accuracy Alignment 

-- by Sayed Mohammad Ebrahim Sahraeian and Byung-Jun Yoon

 

PicXAA is a probabilistic non-progressive alignment algorithm that finds protein multiple sequence alignments with maximum expected accuracy. PicXAA greedily builds up the multiple alignment from sequence regions with high local similarities, thereby yielding an accurate global alignment that effectively grasps the local similarities among sequences.

 

PicXAA-R: Extension of PicXAA for RNA structural alignment

-- by Sayed Mohammad Ebrahim Sahraeian and Byung-Jun Yoon

 

PicXAA-R is an extension of PicXAA for greedy structural alignment of non-coding RNAs. It efficiently captures both the folding information within each sequence and the local similarities between sequences in a greedy manner. PicXAA-R is one of the fastest algorithms for structural alignment of multiple RNAs and consistently yields accurate alignment results, especially for alignment of locally similar sequences.

 

PicXAA and PicXAA-R are described in the following papers:

  • PicXAA: Greedy probabilistic construction of maximum expected accuracy alignment of multiple sequences.
    S.M.E. Sahraeian and B.J. Yoon, Nucleic Acids Research, 38(15): 4917-4928, 2010.
    [Nucleic Acids Research]

  • PicXAA-R: Efficient structural alignment of multiple RNA sequences using a greedy approach.
    S.M.E. Sahraeian and B.J. Yoon, BMC Bioinformatics, 12(Suppl 1): S38, 2011.
    [BMC Bioinformatics]

  • PicXAA-Web: a web-based platform for non-progressive maximum expected accuracy alignment of multiple biological sequences.
    S.M.E. Sahraeian and B.J. Yoon, Nucleic Acids Research, Web Server Issue, doi:10.1093/nar/gkr244, April 2011.
    [PicXAA Web Server] [Nucleic Acids Research]

 

Here is the official website of PicXAA, where you can download the source code of PicXAA and PicXAA-R.

 

PicXAA-Web: PicXAA & PicXAA-R web server

    You can use PicXAA and PicXAA-R to align multiple protein/DNA/RNA sequences through your web browser. Visit the following URL to access PicXAA-Web: http://gsp.tamu.edu/picxaa/