blastp is a command-line tool, commonly run on Linux, that is part of NCBI's BLAST+ suite (BLAST stands for Basic Local Alignment Search Tool) for searching and comparing sequence databases. It performs protein-protein searches: it compares a query protein sequence against a database of protein sequences to find similar or related sequences.
blastp is commonly used in bioinformatics and molecular biology to identify homologous or related protein sequences in proteomic data, or to search for specific protein sequences in large databases.
To use blastp, you need the BLAST+ suite installed on your system, because blastp is not distributed as a separate package. You can install BLAST+ with your distribution's package manager (on Debian and Ubuntu, for example, the package is named ncbi-blast+), or download the latest BLAST+ release from the NCBI (National Center for Biotechnology Information) website.
Once BLAST+ is installed, you run a protein-protein search by calling blastp with the appropriate options and arguments. The -db argument must name a BLAST database built with makeblastdb (for example, makeblastdb -in protein_database.fa -dbtype prot -out protein_database), not a raw FASTA file. The -matrix option takes the name of a supported scoring matrix such as BLOSUM45 or PAM30, not a path to a matrix file. For example, to search a protein database using BLOSUM45 instead of the default BLOSUM62:
# blastp -query my_sequence.fa -db protein_database -matrix BLOSUM45
blastp supports many command-line options for customizing a search, such as the database (-db), the output format (-outfmt), the e-value cutoff (-evalue), and low-complexity filtering (-seg). You can combine these options to fine-tune the search for your data.
For more information, consult the BLAST+ documentation on the NCBI website, or run blastp -help to list every available option with a description (blastp -h prints a shorter usage summary).
blastp Command Examples
1. Align two or more sequences using blastp, with an e-value threshold of 1e-9 and the default pairwise output format, printing results to the screen:
# blastp -query query.fa -subject subject.fa -evalue 1e-9
2. Align two or more sequences using the faster blastp-fast task:
# blastp -task blastp-fast -query query.fa -subject subject.fa
3. Align two or more sequences with a custom tabular output format, writing the results to a file:
# blastp -query query.fa -subject subject.fa -outfmt '6 qseqid qlen qstart qend sseqid slen sstart send bitscore evalue pident' -out output.tsv
4. Search a protein database with a protein query, using 16 threads and keeping at most 10 aligned sequences:
# blastp -query query.fa -db blast_database_name -num_threads 16 -max_target_seqs 10
5. Search NCBI's non-redundant protein database (nr) remotely with a protein query, without needing a local copy of the database:
# blastp -query query.fa -db nr -remote
6. Display help (use `-help` for detailed help):
# blastp -h
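The -evalue threshold in example 1 filters on the expected number of chance alignments scoring at least as well as a given hit in a search of that size, so a cutoff of 1e-9 keeps only hits that are very unlikely to be random matches. In the standard Karlin-Altschul form that BLAST statistics are based on, with S the raw alignment score, λ and K statistical parameters of the scoring system, m and n the (effective) query and database lengths, and S' the bit score, the relation can be written as:

```latex
E = K\,m\,n\,e^{-\lambda S}, \qquad
S' = \frac{\lambda S - \ln K}{\ln 2}, \qquad
E = m\,n\,2^{-S'}
```

Substituting the definition of S' into the last expression gives back the first one, which is why a higher bit score always means a lower e-value for a search of fixed size.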
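Example 3 writes one hit per line to output.tsv, with columns in the order listed after -outfmt 6. As a minimal post-processing sketch, assuming exactly that 11-column order (qlen, qstart and qend in columns 2 to 4, sseqid in column 5, pident in column 11), the hypothetical shell function filter_hits below uses awk to keep hits that meet a minimum percent identity and a minimum query coverage. Coverage here is computed as (qend - qstart + 1) × 100 / qlen, which is one common definition; BLAST's own qcovhsp and qcovs output fields are an alternative.

```shell
# filter_hits MIN_PIDENT MIN_QCOV [FILE...]
# Hypothetical helper (not part of BLAST+). Reads tabular BLAST output in the
# column order from example 3:
#   qseqid qlen qstart qend sseqid slen sstart send bitscore evalue pident
# and prints qseqid, sseqid, pident and query coverage (%) for every hit that
# meets both thresholds. Reads standard input when no FILE is given.
filter_hits() {
    min_pident=$1
    min_qcov=$2
    shift 2
    awk -v min_pident="$min_pident" -v min_qcov="$min_qcov" '
        BEGIN { FS = OFS = "\t" }
        {
            # Query coverage: aligned query span (qend - qstart + 1) as a
            # percentage of the query length (qlen).
            qcov = ($4 - $3 + 1) * 100 / $2
            if ($11 + 0 >= min_pident + 0 && qcov >= min_qcov + 0)
                printf "%s\t%s\t%s\t%.1f\n", $1, $5, $11, qcov
        }
    ' "$@"
}
```

For example, filter_hits 90 80 output.tsv prints hits with at least 90% identity covering at least 80% of the query.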
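In example 4, -max_target_seqs limits how many subject sequences are kept during the search itself, so it is not a reliable way to ask for only the single best hit per query. A common alternative is to keep more targets and pick the best hit afterward. The hypothetical shell function best_hits below sketches that step, assuming the results were written in the tabular column order from example 3 (qseqid in column 1, bitscore in column 9); it is an illustration, not a BLAST option.

```shell
# best_hits [FILE...]
# Hypothetical helper (not part of BLAST+). From tabular BLAST output whose
# 1st column is qseqid and 9th column is bitscore, print the single
# highest-bitscore line for each query. On ties, the first line seen wins.
# awk's for-in order is unspecified, so the result is sorted by qseqid.
best_hits() {
    awk '
        BEGIN { FS = OFS = "\t" }
        # Remember the best-scoring line seen so far for each query.
        !($1 in best) || $9 + 0 > score[$1] {
            best[$1] = $0
            score[$1] = $9 + 0
        }
        END { for (q in best) print best[q] }
    ' "$@" | LC_ALL=C sort -t "$(printf '\t')" -k1,1
}
```

For example, best_hits output.tsv prints one line per query.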