MPsrch parameters


database
Currently available databases are below.

Each database may be searched in full (all) or in any combination of its component parts (divisions).

See the MPsrch e-mail help document for more details.

query sequence
MPsrch accepts a nucleic acid or amino acid sequence as a query. For nucleic acid sequences, the program recognizes the IUPAC ambiguity codes as well as A, C, G, T, and U. For amino acid sequences, the program recognizes all the one-letter codes. See also the IUPAC code table.
The minimum length of a sequence that can be loaded into MPsrch is 4 residues or bases.
The maximum length is 10,000 residues or bases. In the MPsrch_tpn and _tpna programs, the query length limit is 3,333 residues, which back-translates into a 9,999-base sequence used to search the databases.
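
The length limits above can be checked before a query is submitted. The following is a minimal sketch in Python, not part of MPsrch itself; the function names and the short program labels ("tpn", "tpna", and so on) are assumptions made for illustration.

```python
# Length limits as stated in the text above.
MIN_QUERY_LENGTH = 4
MAX_QUERY_LENGTH = 10_000
MAX_TRANSLATED_QUERY_LENGTH = 3_333  # applies to MPsrch_tpn and _tpna

# Hypothetical short labels for the programs that back-translate the query.
BACK_TRANSLATING_PROGRAMS = {"tpn", "tpna"}


def back_translated_length(residues: int) -> int:
    """Each amino acid residue back-translates into three bases."""
    return residues * 3


def check_query_length(sequence: str, program: str) -> None:
    """Raise ValueError if the query is too short or too long for the program."""
    if program in BACK_TRANSLATING_PROGRAMS:
        upper = MAX_TRANSLATED_QUERY_LENGTH
    else:
        upper = MAX_QUERY_LENGTH
    length = len(sequence)
    if length < MIN_QUERY_LENGTH:
        raise ValueError(f"query has {length} symbols; minimum is {MIN_QUERY_LENGTH}")
    if length > upper:
        raise ValueError(f"query has {length} symbols; maximum is {upper}")
```

For example, `back_translated_length(3333)` gives 9999, which is why the protein query limit for the _tpn and _tpna programs stays just under the 10,000-base maximum.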

PAMs
(MPsrch_pp, _ppa only)
Specifies the number of Dayhoff PAMs for the comparison score table used in protein searches.
The default PAM value (100) is a reasonable choice for an initial database search. If no significant homologies are found, try a higher PAM value.
In MPsrch, PAM values are flexible over a large range; you can use any integer between 1 and 500. In general, PAM values over 400 do not result in more sensitive searches. See also the section on the scoring table parameter.

scoring table
Specifies a predefined standard scoring table (matrix).

Nmatch
(all programs except MPsrch_pp, _ppa)
This parameter controls the scoring of unknown bases in nucleic acid searches. Nmatch specifies how the character N in the query and in the database is matched. N can either match everything or match only itself, and this can be set separately for the query and for the database.
STD (default selection)
Use scoring values supplied in the table used.
XX
Use most negative value from table, for N in both the query and the database.
QX
Use most negative value from table for N in the query, and 0.25 of the highest match score (minimum+1) for N in the database.
DX
Use most negative value from table for N in the database, and 0.25 of the highest match score (minimum+1) for N in the query.
NN
Use 0.25 of the highest match score (minimum+1), for N in both the query and the database.
The Nmatch setting to use depends on what the N character means in your data. In DDBJ, GenBank, and EMBL, there are long runs of N's that often represent unknown bases. If you allow these N's to match, using the NN or QX settings, any query will match these sequences and they will all be reported with high scores. On the other hand, N's in the query may mean that any base is acceptable.
If you never want N to match anything, then use the XX setting.
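
The five Nmatch settings can be summarized as a small lookup. This is only a sketch of the rules listed above, not the MPsrch implementation. It reads "(minimum+1)" as "never lower than +1", which is an assumption, and it does not round the 0.25 fraction, which the real program may do. The function name and arguments are hypothetical.

```python
from typing import Optional

# For each setting: does an N on that side get the most negative table value?
# True  -> most negative value from the table
# False -> 0.25 of the highest match score
NMATCH_RULES = {
    "XX": {"query": True, "database": True},
    "QX": {"query": True, "database": False},
    "DX": {"query": False, "database": True},
    "NN": {"query": False, "database": False},
}


def n_score(setting: str, side: str, most_negative: float,
            highest_match: float) -> Optional[float]:
    """Score used for an N on `side` ('query' or 'database').

    Returns None for STD, meaning "use whatever the scoring table says".
    """
    if setting == "STD":
        return None
    if NMATCH_RULES[setting][side]:
        return most_negative
    # Assumption: "(minimum+1)" is read as a floor of +1 on the partial score.
    return max(1, 0.25 * highest_match)
```

With a table whose highest match score is 5 and most negative value is -4, `n_score("QX", "query", -4, 5)` yields -4 while `n_score("QX", "database", -4, 5)` yields 1.25 under these assumptions.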

gap penalty
(MPsrch_pp, _nn, _ntp, _tpn only)
Specifies the value of the gap penalty in non-affine searches. In the Smith-Waterman algorithm, the gap penalty is applied to every unmatched base or residue in the alignment.
MPsrch_pp
range = 2 -> 60, default = 14
MPsrch_nn
range = 2 -> 60, default = 6
MPsrch_ntp, _tpn
range = 15 -> 60, default = 30
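
To illustrate how a non-affine gap penalty affects the result, here is a minimal Smith-Waterman scoring sketch in Python. It is not the MPsrch code: the simple match/mismatch scores (+5/-4) stand in for a real scoring table and are assumptions, as is the function name. Each unmatched symbol costs the same fixed penalty.

```python
def smith_waterman_score(query: str, db: str, match: int = 5,
                         mismatch: int = -4, gap: int = 6) -> int:
    """Best local alignment score with a fixed penalty per unmatched symbol."""
    prev = [0] * (len(db) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for q in query:
        cur = [0]  # first column is always 0 for local alignment
        for j, d in enumerate(db, start=1):
            diagonal = prev[j - 1] + (match if q == d else mismatch)
            gap_in_db = prev[j] - gap        # query symbol left unmatched
            gap_in_query = cur[j - 1] - gap  # database symbol left unmatched
            cell = max(0, diagonal, gap_in_db, gap_in_query)
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best


# A low penalty lets the alignment open a gap around the extra G;
# a high penalty makes an ungapped alignment with one mismatch score better.
low = smith_waterman_score("AAACCC", "AAAGCCC", gap=6)    # 6 matches - 1 gap = 24
high = smith_waterman_score("AAACCC", "AAAGCCC", gap=14)  # 5 matches + 1 mismatch = 21
```

Raising the gap penalty therefore tends to favor shorter or ungapped alignments, while lowering it makes gapped alignments more likely to be reported.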

gap open penalty
(MPsrch_ppa, _nna, _ntpa, _tpna only)
Specifies the gap open penalty in affine gap searches. See also the section on the gap extend penalty.
MPsrch_ppa
range = 2 -> 60, default = 20
MPsrch_nna
range = 3 -> 60, default = 12
MPsrch_ntpa, _tpna
range = 15 -> 100, default = 30

gap extend penalty
(MPsrch_ppa, _nna, _ntpa, _tpna only)
For affine gap searches, this option specifies the gap extension penalty for an increase in gap size. The gap extension penalty changes the weight of unmatched residues once a gap has already started, usually to make it easier to extend the gap; this clusters indels into runs of indels.

The gap extension penalty must always be less than or equal to the gap open penalty. If the gap extension penalty exceeds the gap open penalty, it will be reset to equal the gap open penalty when the job is run.

See also the section on the gap open penalty.

MPsrch_ppa
range = 2 -> 60, default = 6
MPsrch_nna
range = 0 -> 60, default = 1
MPsrch_ntpa, _tpna
range = 0 -> 99, default = 10
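
The ranges, defaults, and reset rule for the affine penalties can be sketched as follows. This is an illustrative Python helper, not part of MPsrch; the function name and the short program labels are assumptions. The numbers are taken from the gap open penalty and gap extend penalty sections.

```python
# (minimum, maximum, default) per program, as listed in the two sections.
AFFINE_LIMITS = {
    "ppa":  {"open": (2, 60, 20),   "extend": (2, 60, 6)},
    "nna":  {"open": (3, 60, 12),   "extend": (0, 60, 1)},
    "ntpa": {"open": (15, 100, 30), "extend": (0, 99, 10)},
    "tpna": {"open": (15, 100, 30), "extend": (0, 99, 10)},
}


def resolve_affine_penalties(program, gap_open=None, gap_extend=None):
    """Return (gap_open, gap_extend) after applying defaults, ranges, and reset."""
    limits = AFFINE_LIMITS[program]
    resolved = {}
    for name, value in (("open", gap_open), ("extend", gap_extend)):
        low, high, default = limits[name]
        if value is None:
            value = default
        if not low <= value <= high:
            raise ValueError(f"gap {name} penalty {value} outside {low}..{high}")
        resolved[name] = value
    # The extension penalty may not exceed the open penalty; if it does,
    # it is reset to equal the open penalty.
    resolved["extend"] = min(resolved["extend"], resolved["open"])
    return resolved["open"], resolved["extend"]
```

For instance, `resolve_affine_penalties("nna", gap_open=5, gap_extend=9)` returns `(5, 5)`, because the extension penalty is reset to the open penalty.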

ranking method
(MPsrch_pp, _ppa only)
SC - Score ranking (default selection)
This method ranks hits by the original Smith-Waterman score, placing higher-scoring alignments above lower-scoring ones. It takes no account of relative improbabilities. If you select score ranking, the output summaries list it as 'Predicted Number'.
RF - Edinburgh Ranking Function
This ranking method aims to improve the recovery of improbable alignments that arise when comparing short sequences or sequences of unbalanced size.

number of one-line summaries
Specifies the number of one-line summaries of names and scores in the output report. The summaries show rank, score, percent query match, sequence length, database identifier, sequence name, a short description, and a predicted expectation value.

number of alignments
Specifies the number of alignments to be shown in the output report. Alignments show the high-scoring region of best local similarity, numbered and marked to show the regions of identities or changes.

mode - the style of alignments
Specifies the style of the alignments.
E - Edinburgh style alignment (default selection)
identity = '*', conservative change = '.', mismatch or gap = (space)
            *.****. ..**  *
  Db     47 knlktldemknsedl 61
  Qy      5 khlktlqalrnsgsl 19
I - IntelliGenetics style alignment
identity = '|', conservative change = ':', mismatch or gap = (space)
Alignments in this mode resemble BLAST output.
  Db     47 knlktldemknsedl 61
            |:||||: ::||  |
  Qy      5 khlktlqalrnsgsl 19
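
The two styles differ only in the marker characters. The Python sketch below builds the marker line for either mode; it is an illustration, not MPsrch code. In the real program, whether a change counts as conservative is decided by the scoring table. Here the conservative pairs are simply read off the example above and passed in as a set, which is an assumption, as are the function and variable names.

```python
# Marker characters per mode: (identity, conservative change).
MARKERS = {"E": ("*", "."), "I": ("|", ":")}

# Conservative pairs inferred from the example alignment only (assumption).
EXAMPLE_CONSERVATIVE = {
    frozenset("nh"), frozenset("dq"), frozenset("ml"), frozenset("kr"),
}


def marker_line(db: str, qy: str, conservative, mode: str = "E") -> str:
    """Build the marker line for two aligned sequences of equal length."""
    identity, similar = MARKERS[mode]
    out = []
    for d, q in zip(db, qy):
        if d == q:
            out.append(identity)
        elif frozenset((d, q)) in conservative:
            out.append(similar)
        else:
            out.append(" ")  # mismatch or gap
    return "".join(out)


db_seq = "knlktldemknsedl"
qy_seq = "khlktlqalrnsgsl"
edinburgh = marker_line(db_seq, qy_seq, EXAMPLE_CONSERVATIVE, "E")
intelligenetics = marker_line(db_seq, qy_seq, EXAMPLE_CONSERVATIVE, "I")
```

With these inputs, `edinburgh` and `intelligenetics` reproduce the marker lines shown in the two example alignments.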
