Bioinformatics Tools
Bioinformatics tools are software programs for storing, retrieving and analyzing biological data, and for extracting information from it.
Factors that must be taken into consideration when designing these tools are:
- The end user (the biologist) may not be a frequent user of computer technology, so the tools should be very user friendly.
- The tools must be made available over the internet, given the global distribution of the scientific research community.
Bioinformatics tools may be grouped into the following categories:
- Homology and Similarity Tools
- Protein Function Analysis
- Structural Analysis
- Sequence Analysis
Homology and Similarity Tools
The term homology implies a common evolutionary relationship between two traits, whether they are DNA sequences or bristle patterns on a fly's nose. Homologous sequences are sequences that are related by divergence from a common ancestor. The degree of similarity between two sequences can therefore be measured, whereas their homology is either true or false. This set of tools can be used to identify similarities between novel query sequences of unknown structure and function and database sequences whose structure and function have been elucidated.
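The distinction between a measurable similarity and a yes-or-no homology can be illustrated with a small sketch. It assumes the two sequences have already been aligned (gaps written as "-") and uses one common convention for percent identity, counting only columns where neither sequence has a gap. The function name is hypothetical, and other tools may use a different denominator.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumes both inputs come from the same pairwise alignment, so they
    must have equal length; '-' marks a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0  # columns with a residue in both sequences
    matches = 0   # columns where the residues are identical
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # 8 ungapped columns, 7 identical
    print(percent_identity("ACGT-ACGT", "ACGTTACCT"))
```

A high score of this kind is evidence for homology but does not establish it. The number only describes how alike the two sequences are.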
Protein Function Analysis
Function analysis is the identification and mapping of all functional elements (both coding and non-coding) in a genome. This group of programs allows you to compare your protein sequence to the secondary (or derived) protein databases that contain information on motifs, signatures and protein domains. Highly significant hits against these pattern databases allow you to approximate the biochemical function of your query protein.
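As a rough illustration of what a motif search does, the sketch below scans a protein sequence for the N-glycosylation site pattern N-{P}-[ST]-{P} (asparagine, any residue except proline, serine or threonine, any residue except proline) by translating it into a regular expression. The function name and the example sequence are made up for illustration. Real pattern databases also use profiles and other, more sensitive descriptors, so this is only a minimal sketch of the simplest case.

```python
import re

# N-{P}-[ST]-{P} written as a regular expression. The lookahead (?=...)
# lets overlapping occurrences be reported as separate hits.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(protein: str, pattern: re.Pattern = N_GLYCOSYLATION):
    """Return (1-based start position, matched residues) for every hit."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(protein.upper())]


if __name__ == "__main__":
    # Hypothetical peptide; the N at position 8 is followed by P, so it is skipped.
    print(find_motif("MKNGSAANPTQNVTL"))
```

A short pattern like this matches by chance fairly often, which is why the significance of a hit matters when inferring function.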
Structural Analysis
This set of tools allows you to compare structures against databases of known structures. The function of a protein is more directly a consequence of its structure than of its sequence, and structural homologs tend to share functions. Determining a protein's 2D/3D structure is therefore crucial in the study of its function.
Sequence Analysis
This set of tools allows you to carry out further, more detailed analysis on
your query sequence including evolutionary analysis, identification of
mutations, hydropathy regions, CpG islands and compositional biases.
Identifying these and other biological properties provides clues that aid
the search to elucidate the specific function of your sequence.
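Two of the properties mentioned above, compositional bias and CpG content, can be estimated with very little code. The sketch below computes the GC fraction of a DNA sequence and the commonly used observed/expected CpG ratio, (number of CG dinucleotides × sequence length) / (number of C × number of G). The function names are hypothetical. A real CpG island finder would apply these measures in a sliding window with its own thresholds, which are not shown here.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0  # the ratio is undefined; report 0.0 by convention here
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return cpg_count * len(seq) / (c_count * g_count)


if __name__ == "__main__":
    example = "CGCGATAT"  # made-up sequence for illustration
    print(gc_content(example), cpg_obs_exp(example))
```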
Bioinformatics Tools
BLAST:
The Basic Local Alignment Search Tool (BLAST) compares gene and protein
sequences against others in public databases. It now comes in several
types, including PSI-BLAST, PHI-BLAST, and BLAST 2 Sequences.
Specialized BLASTs are also available for human, microbial, malaria, and other
genomes, as well as for vector contamination, immunoglobulins, and tentative
human consensus sequences.
FASTA
FASTA is a database search tool used to compare a nucleotide or peptide sequence to a
sequence database. The program is based on the rapid sequence comparison algorithm
described by Lipman and Pearson. It was the first widely used algorithm for
database similarity searching. The program looks for optimal local alignments by
scanning the sequence for small matches called "words". Initially, the
scores of segments in which there are multiple word hits are calculated
("init1"). Later the scores of several segments may be summed to
generate an "initn" score. An optimized alignment that includes gaps
is shown in the output as "opt". The sensitivity and speed of the
search are inversely related and are controlled by the "k-tup" variable,
which specifies the size of a "word": a smaller k-tup gives a more sensitive but slower search.
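The word-matching step can be sketched as follows. This is not the FASTA program itself, only an illustration of the idea under simplifying assumptions: every word of length ktup in the database sequence is indexed, each word of the query is looked up, and hits are grouped by diagonal (query position minus database position). Diagonals with many hits are the candidate segments that would then be scored. The function and parameter names are hypothetical, and the init1, initn and opt scoring stages are not implemented.

```python
from collections import Counter, defaultdict


def word_hits_by_diagonal(query: str, subject: str, ktup: int = 2) -> Counter:
    """Count exact word matches of length ktup on each alignment diagonal.

    The diagonal of a hit is (query position - subject position); hits that
    share a diagonal can belong to the same ungapped segment.
    """
    # Index every word of length ktup in the subject (database) sequence.
    index = defaultdict(list)
    for j in range(len(subject) - ktup + 1):
        index[subject[j:j + ktup]].append(j)

    # Look up every query word and tally the diagonal of each hit.
    diagonals = Counter()
    for i in range(len(query) - ktup + 1):
        for j in index.get(query[i:i + ktup], ()):
            diagonals[i - j] += 1
    return diagonals


if __name__ == "__main__":
    hits = word_hits_by_diagonal("ACGTAC", "TTACGTAC", ktup=2)
    print(hits.most_common(1))  # the diagonal with the most word hits
```

Raising ktup shrinks the number of chance hits and speeds up the lookup, at the cost of missing matches shorter than the word size.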
EMBOSS
EMBOSS (the European Molecular Biology Open Software Suite) is a free, open
source software analysis package specially developed for the needs of the
molecular biology user community. Within EMBOSS you will find around 100
programs (applications) for sequence alignment, database searching with sequence
patterns, protein motif identification and domain analysis, nucleotide sequence
pattern analysis, codon usage analysis for small genomes, and much more.
A list of the applications included in the EMBOSS package can be found at http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Apps/
ClustalW
ClustalW is a general purpose multiple sequence alignment program for DNA or proteins. It produces biologically meaningful multiple sequence alignments of divergent sequences, calculates the best match for the selected sequences, and lines them up so that the identities, similarities and differences can be seen.
RasMol
RasMol is a powerful research tool for displaying the structure of DNA,
proteins, and smaller molecules. Protein Explorer, a derivative of RasMol,
is an easier-to-use program.
Application Programs
Java in Bioinformatics:
Because of its platform independence, Java is emerging as a key
player in bioinformatics. Physiome Sciences' computer-based biological
simulation technologies and Bioinformatics Solutions' PatternHunter are two
examples of the growing adoption of Java in bioinformatics.
Perl in Bioinformatics:
Perl is also used in the processing of biological data. One example of a
Perl project is the BioPerl project.
Bioinformatics Projects:
BioJava:
The BioJava project provides Java tools for processing biological data.
BioPerl:
The BioPerl project provides many modules for biological data processing.
BioXML:
A part of the BioPerl project, this is a resource to gather XML documentation,
DTDs and XML aware tools for biology in one location.