skbio.alignment.TabularMSA
- class skbio.alignment.TabularMSA(sequences, metadata=None, positional_metadata=None, minter=None, index=None)
Store a multiple sequence alignment in tabular (row/column) form.
- Parameters:
- sequences : iterable of GrammaredSequence, TabularMSA
Aligned sequences in the MSA. Sequences must all be the same type and length. For example, sequences could be an iterable of DNA, RNA, or Protein sequences. If sequences is a TabularMSA, its metadata, positional_metadata, and index will be used unless overridden by the parameters metadata, positional_metadata, and minter/index, respectively.
- metadata : dict, optional
Arbitrary metadata which applies to the entire MSA. A shallow copy of the dict will be made.
- positional_metadata : pd.DataFrame consumable, optional
Arbitrary metadata which applies to each position in the MSA. Must be able to be passed directly to the pd.DataFrame constructor. Each column of metadata must be the same length as the number of positions in the MSA. A shallow copy of the positional metadata will be made.
- minter : callable or metadata key, optional
If provided, defines an index label for each sequence in sequences. Can either be a callable accepting a single argument (each sequence) or a key into each sequence’s metadata attribute. Note that minter cannot be combined with index.
- index : pd.Index consumable, optional
Index containing labels for sequences. Must be the same length as sequences. Must be able to be passed directly to the pd.Index constructor. Note that index cannot be combined with minter and the contents of index must be hashable.
- Raises:
- ValueError
If minter and index are both provided.
- ValueError
If index is not the same length as sequences.
- TypeError
If sequences contains an object that isn’t a GrammaredSequence.
- TypeError
If sequences does not contain exactly the same type of GrammaredSequence objects.
- ValueError
If sequences does not contain GrammaredSequence objects of the same length.
Notes
If neither minter nor index is provided, default index labels will be used: pd.RangeIndex(start=0, stop=len(sequences), step=1).
Examples
Create a TabularMSA object with three DNA sequences and four positions:
>>> from skbio import DNA, TabularMSA
>>> seqs = [
...     DNA('ACGT'),
...     DNA('AG-T'),
...     DNA('-C-T')
... ]
>>> msa = TabularMSA(seqs)
>>> msa
TabularMSA[DNA]
---------------------
Stats:
    sequence count: 3
    position count: 4
---------------------
ACGT
AG-T
-C-T
Since neither minter nor index was provided, the MSA has default index labels:
>>> msa.index
RangeIndex(start=0, stop=3, step=1)
Create an MSA with metadata, positional metadata, and non-default index labels:
>>> msa = TabularMSA(seqs, index=['seq1', 'seq2', 'seq3'],
...                  metadata={'id': 'msa-id'},
...                  positional_metadata={'prob': [3, 4, 2, 2]})
>>> msa
TabularMSA[DNA]
--------------------------
Metadata:
    'id': 'msa-id'
Positional metadata:
    'prob': <dtype: int64>
Stats:
    sequence count: 3
    position count: 4
--------------------------
ACGT
AG-T
-C-T
>>> msa.index
Index(['seq1', 'seq2', 'seq3'], dtype='object')
Attributes
default_write_format
dtype: Data type of the stored sequences.
iloc: Slice the MSA on either axis by index position.
index: Index containing labels along the sequence axis.
loc: Slice the MSA on first axis by index label, second axis by position.
metadata: dict containing metadata which applies to the entire object.
positional_metadata: pd.DataFrame containing metadata along an axis.
shape: Number of sequences (rows) and positions (columns).
Built-ins
__bool__(): Boolean indicating whether the MSA is empty or not.
__contains__(label): Determine if an index label is in this MSA.
__copy__(): Return a shallow copy of this MSA.
__deepcopy__(memo): Return a deep copy of this MSA.
__eq__(other): Determine if this MSA is equal to another.
__ge__(value, /): Return self>=value.
__getitem__(indexable): Slice the MSA on either axis.
__getstate__(/): Helper for pickle.
__gt__(value, /): Return self>value.
__iter__(): Iterate over sequences in the MSA.
__le__(value, /): Return self<=value.
__len__(): Return number of sequences in the MSA.
__lt__(value, /): Return self<value.
__ne__(other): Determine if this MSA is not equal to another.
__reversed__(): Iterate in reverse order over sequences in the MSA.
__str__(): Return string summary of this MSA.
Methods
append(sequence[, minter, index, reset_index]): Append a sequence to the MSA without recomputing alignment.
consensus(): Compute the majority consensus sequence for this MSA.
conservation([metric, degenerate_mode, gap_mode]): Apply metric to compute conservation for all alignment positions.
extend(sequences[, minter, index, reset_index]): Extend this MSA with sequences without recomputing alignment.
from_dict(dictionary): Create a TabularMSA from a dict.
from_path_seqs(path, seqs): Create a tabular MSA from an alignment path and sequences.
gap_frequencies([axis, relative]): Compute frequency of gap characters across an axis.
has_metadata(): Determine if the object has metadata.
has_positional_metadata(): Determine if the object has positional metadata.
iter_positions([reverse, ignore_metadata]): Iterate over positions (columns) in the MSA.
join(other[, how]): Join this MSA with another by sequence (horizontally).
read(file[, format]): Create a new TabularMSA instance from a file.
reassign_index([mapping, minter]): Reassign index labels to sequences in this MSA.
sort([level, ascending]): Sort sequences by index label in-place.
to_dict(): Create a dict from this TabularMSA.
write(file[, format]): Write an instance of TabularMSA to a file.