next up previous contents
: 6 Parameter specification : SAM (Sequence Alignment and : 参考文献   目次


5 File types

SAM produces a variety of files. This section summarizes both the extensions SAM uses and the extensions we conventionally use when naming output that goes to standard output.

The following file extensions are automatically used:

.a2m
A SAM alignment file, as created by align2model or hmmscore. A FASTA-compatible format. See Section 10.1.
.apr
A SAM alignment probabilities file, as created by pathprobs. An RDB-compatible format. See Section 10.8.
.ckp
A PSI-BLAST checkpoint profile file created by sam2psi, which can be used with PSI-BLAST in conjunction with the matching .cks checkpoint sequence file. See Section 10.10.6.
.cks
A PSI-BLAST checkpoint sequence file created by sam2psi, which can be used with PSI-BLAST in conjunction with the matching .ckp checkpoint profile file. See Section 10.10.6.
.cst
A SAM constraints file containing the definition of sequence position constraints. See Section 9.7.
.data
A gnuplot data file generated by makehist. See Section 10.11.
.dist
A scoring file listing sequence identifiers, lengths, and scores, produced by hmmscore. See Section 10.2.
.dist-rdb
A scoring file in RDB format including sequence identifiers, lengths, scores, and information about scoring method produced by hmmscore. See Section 10.2.
.frag
An a2m alignment file generated by fragfinder. See Section 10.4.
.freq
A frequency model generated by buildmodel when many_files is set or by grabdp. This is a model that reports the frequency of each letter in each state when the training set is evaluated according to the model. See Section 9.3 and Section 10.5.
.fstat
A scoring file produced by the fragfinder program. See Section 10.4.
.hmmer
An HMMER-format model produced by sam2hmmer. See Section 10.10.7.
.kids
A Kestrel sequence id file. The file is an ordered list of identifiers for sequences in a .kseq or .krseq file. See Section 10.12.10.
.kseqs
A Kestrel sequence database, containing an encoding of the sequences optimized for Kestrel hmmscore. See Section 10.12.10.
.kseqs
A Kestrel reverse null model scoring sequence database. This contains each sequence in the original database, plus a reverse of the sequence. See Section 10.12.10.
.match-rdb
An RDB file created by grabdp, listing the posterior match probabilities of each amino acid type at each node, plus the prior probabilities of each amino acid type. Used for diagnostic purposes. See Section 10.5.
.mod
A model file created by a buildmodel, addfims, or modelfromalign. When many_files is not set (the default), buildmodel places the run statistics, parameter settings, and the frequency model into this file as well. See Section 9.3.
.f.mod
A subfamily model for family number `f'. See Section 9.4.
.a.mrrr.mod
If print_all_models is set, model number $m$ during re-estimate cycle 'rrr' will be placed in the file.
.s.rrr.mod
If print_surg_models is set, the winning model is printed after each surgery iteration, where `rrr' is the re-estimation index.
.mult
One of two multiple domain alignment output files created by hmmscore. This will contain all alignments to a motif that were found. See Section 10.2.5.
.mlib
A model library file created by hmmscore model calibration. Each entry includes a single- or multi-track model specification and various scoring parameters. See Section 10.2.10.
.mstat
The other multiple domain output file. This file contains the sequence identifier and scores for the data in the corresponding .mult file. See Section 10.2.5.
.pa2m
A SAM alignment file that alternates sequences and posterior probability annotations in sequence format, as created by pathprobs. See Section 10.8.
.pdoc
Created by grabdp, this file contains the posterior decoding of the dynamic programming for match states and transitions into match states. The files tend to be rather large. See Section 10.5.
.plt
A gnuplot command file created by makehist. See Section 10.11.
.ptrack
Created by predict_track, this file contains secondary structure (or other second alphabet) predictions for a given sequence based on a primary sequence model, and databases of primary and secondary structure for a group of sequences. See Section 10.9.
.samrc
A file of default parameters either in a home directory or the current directory. See Section 6.
.sel
Sequences that passed the selection criteria used in hmmscore. See Section 10.2.
.seq
A file of sequences, as for example created by sampleseqs.
.stat
The statistics of a buildmodel run when many_files is set, including re-estimation scores and initial parameters. See Section 9.3.
.ta2m
An alignment trimmed by the pathprobs program. See Section 10.8.
.weightoutput
A list of sequence weights from the internal weighting option of buildmodel. Present when print_all_weights and internal weighting are both enabled. See Section 9.4.4.

The following file extensions are conventionally used in this manual.

.colors
A color file for the makelogo program, usually located in the same directory as the Dirichlet priors. See Section 10.10.4.
.log
Standard error output from running target2k can be redirected to this file. See Section 4.
.ncomp
A Dirichlet mixture prior library file, where n is the number of components to the mixture, as in null.1comp or mall-opt.9comp. Information on specific mixtures can be found at http://www.cse.ucsc.edu/research/compbio/dirichlets/index.html. See Section 8.1.
.plib
A Dirichlet mixture prior library file (old naming convention). The same extension is used both for match state regularizers and HSSP-based transition regularizers. See Section 8.1.
.pretty
The output (stdout) from prettyalign. See Section 10.1.
.regularizer
A transition regularizer, generally used in conjunction with a Dirichlet mixture regularizer for the match states. See Section 8.1.
.weights
A file of sequence weights. See Section 9.4.

In most cases, SAM can read compressed (.gz or .Z) files, specified either as their complete name or as their root name without the compression suffix. If the root name is given and both a compressed and an uncompressed file exist, the uncompressed file is read. If both a .gz and a .Z file is present, the .gz file is used. Reading compressed files requires that the GNU decompression program gunzip and the Unix decompression program uncompress be in the user's path.


next up previous contents
: 6 Parameter specification : SAM (Sequence Alignment and : 参考文献   目次
SAM
sam-info@cse.ucsc.edu
UCSC Computational Biology Group