class skbio.sequence.GrammaredSequence(sequence, metadata=None, positional_metadata=None, interval_metadata=None, lowercase=False, validate=True)
Store sequence data conforming to a character set.
This is an abstract base class (ABC) and cannot be instantiated directly.
Subclass it to create grammared sequences with custom alphabets.
Raises: ValueError – If sequence characters are not in the character set [1].
References
[1] Cornish-Bowden A. Nomenclature for incompletely specified bases in nucleic acid sequences: recommendations 1984. Nucleic Acids Res. May 10, 1985; 13(9): 3021-3030.
Examples
Note that the properties overridden in the example below must either be static or use skbio’s classproperty decorator.
>>> from skbio.sequence import GrammaredSequence
>>> from skbio.util import classproperty
>>> class CustomSequence(GrammaredSequence):
...     @classproperty
...     def degenerate_map(cls):
...         return {"X": set("AB")}
...
...     @classproperty
...     def definite_chars(cls):
...         return set("ABC")
...
...     @classproperty
...     def default_gap_char(cls):
...         return '-'
...
...     @classproperty
...     def gap_chars(cls):
...         return set('-.')
>>> seq = CustomSequence('ABABACAC')
>>> seq
CustomSequence
--------------------------
Stats:
length: 8
has gaps: False
has degenerates: False
has definites: True
--------------------------
0 ABABACAC
>>> seq = CustomSequence('XXXXXX')
>>> seq
CustomSequence
-------------------------
Stats:
length: 6
has gaps: False
has degenerates: True
has definites: False
-------------------------
0 XXXXXX
Attributes
| Attribute | Description |
|---|---|
| alphabet | Return valid characters. |
| default_gap_char | Gap character to use when constructing a new gapped sequence. |
| default_write_format | |
| definite_chars | Return definite characters. |
| degenerate_chars | Return degenerate characters. |
| degenerate_map | Return mapping of degenerate to definite characters. |
| gap_chars | Return characters defined as gaps. |
| interval_metadata | IntervalMetadata object containing info about interval features. |
| metadata | dict containing metadata which applies to the entire object. |
| nondegenerate_chars | Return non-degenerate characters. |
| observed_chars | Set of observed characters in the sequence. |
| positional_metadata | pd.DataFrame containing metadata along an axis. |
| values | Array containing underlying sequence characters. |
Built-ins
| Expression | Description |
|---|---|
| bool(gs) | Returns truth value (truthiness) of sequence. |
| x in gs | Determine if a subsequence is contained in this sequence. |
| copy.copy(gs) | Return a shallow copy of this sequence. |
| copy.deepcopy(gs) | Return a deep copy of this sequence. |
| gs1 == gs2 | Determine if this sequence is equal to another. |
| gs[x] | Slice this sequence. |
| iter(gs) | Iterate over positions in this sequence. |
| len(gs) | Return the number of characters in this sequence. |
| gs1 != gs2 | Determine if this sequence is not equal to another. |
| reversed(gs) | Iterate over positions in this sequence in reverse order. |
| str(gs) | Return sequence characters as a string. |
Methods
| Method | Description |
|---|---|
| concat(sequences[, how]) | Concatenate an iterable of Sequence objects. |
| count(subsequence[, start, end]) | Count occurrences of a subsequence in this sequence. |
| definites() | Find positions containing definite characters in the sequence. |
| degap() | Return a new sequence with gap characters removed. |
| degenerates() | Find positions containing degenerate characters in the sequence. |
| distance(other[, metric]) | Compute the distance to another sequence. |
| expand_degenerates() | Yield all possible definite versions of the sequence. |
| find_motifs(motif_type[, min_length, ignore]) | Search the biological sequence for motifs. |
| find_with_regex(regex[, ignore]) | Generate slices for patterns matched by a regular expression. |
| frequencies([chars, relative]) | Compute frequencies of characters in the sequence. |
| gaps() | Find positions containing gaps in the biological sequence. |
| has_definites() | Determine if sequence contains one or more definite characters. |
| has_degenerates() | Determine if sequence contains one or more degenerate characters. |
| has_gaps() | Determine if the sequence contains one or more gap characters. |
| has_interval_metadata() | Determine if the object has interval metadata. |
| has_metadata() | Determine if the object has metadata. |
| has_nondegenerates() | Determine if sequence contains one or more non-degenerate characters. |
| has_positional_metadata() | Determine if the object has positional metadata. |
| index(subsequence[, start, end]) | Find position where subsequence first occurs in the sequence. |
| iter_contiguous(included[, min_length, invert]) | Yield contiguous subsequences based on included. |
| iter_kmers(k[, overlap]) | Generate kmers of length k from this sequence. |
| kmer_frequencies(k[, overlap, relative]) | Return counts of words of length k from this sequence. |
| lowercase(lowercase) | Return a case-sensitive string representation of the sequence. |
| match_frequency(other[, relative]) | Return count of positions that are the same between two sequences. |
| matches(other) | Find positions that match with another sequence. |
| mismatch_frequency(other[, relative]) | Return count of positions that differ between two sequences. |
| mismatches(other) | Find positions that do not match with another sequence. |
| nondegenerates() | Find positions containing non-degenerate characters in the sequence. |
| read(file[, format]) | Create a new Sequence instance from a file. |
| replace(where, character) | Replace values in this sequence with a different character. |
| to_regex([within_capture]) | Return regular expression object that accounts for degenerate chars. |
| write(file[, format]) | Write an instance of Sequence to a file. |