%load_ext autoreload 
%autoreload 2
from seqlike.SequenceLike import SequenceLike
from itertools import product 
import matplotlib.pyplot as plt

How to use the SequenceLike class

In this notebook, we will show you how to use the SequenceLike class.

tl;dr example

We can create SequenceLike objects from sequences whose elements are more complex than single characters. One example is a codon sequence, in which each element is a triplet of nucleotide letters.

sequence = ["ACC", "CAT", "GCA", "AAA", "ATA", "AAA", "ACC", "CAT"]

By passing it into the SequenceLike constructor, we can gain access to many of the convenient methods available for SeqLike objects.

s = SequenceLike(sequence)

For example, it's possible to count the number of times a sequence element is found in the sequence:

s.count(["ACC"])
2
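One reason to keep codons as separate elements, rather than as one long string, is that counting stays in frame. The sketch below uses only plain Python lists and strings (not the seqlike API) to contrast element-wise counting with substring counting on the same data.

```python
# Plain-Python illustration; no seqlike import is needed here.
sequence = ["ACC", "CAT", "GCA", "AAA", "ATA", "AAA", "ACC", "CAT"]
flat = "".join(sequence)

# Element-wise counting only matches whole codons.
acc_count = sequence.count("ACC")
ccc_as_element = sequence.count("CCC")

# Substring counting also matches triplets that straddle codon boundaries,
# such as the "CCC" formed by the end of "ACC" and the start of "CAT".
ccc_as_substring = flat.count("CCC")

print(acc_count, ccc_as_element, ccc_as_substring)  # prints: 2 0 2
```

"CCC" never occurs as a codon in this sequence, yet it appears twice in the flattened string, which is the kind of out-of-frame match that element-wise counting avoids.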

The alphabet is inferred directly from the sequence unless one is passed explicitly through the alphabet argument. Both cases follow:

s.alphabet
['AAA', 'ACC', 'ATA', 'CAT', 'GCA']
codon_alphabet = [f"{l1}{l2}{l3}" for l1, l2, l3 in product("ATGC", "ATGC", "ATGC")]
s2 = SequenceLike(sequence, alphabet=codon_alphabet)
s2.alphabet[0:10]
['AAA', 'AAC', 'AAG', 'AAT', 'ACA', 'ACC', 'ACG', 'ACT', 'AGA', 'AGC']
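The two outputs above are consistent with an inferred alphabet that is the sorted set of unique elements, and with an explicit alphabet that ends up in sorted order. The following plain-Python sketch reproduces both results. It is an assumption about the observable behaviour, not a copy of the library's implementation.

```python
from itertools import product

sequence = ["ACC", "CAT", "GCA", "AAA", "ATA", "AAA", "ACC", "CAT"]

# Assumed behaviour: the inferred alphabet is the sorted set of unique elements.
inferred = sorted(set(sequence))

# All 64 codons over the four nucleotides, in product() order (A, T, G, C).
codon_alphabet = ["".join(triplet) for triplet in product("ATGC", repeat=3)]

# Sorting reproduces the order seen in s2.alphabet[0:10].
explicit_sorted = sorted(codon_alphabet)

print(inferred)
print(len(codon_alphabet))
print(explicit_sorted[:10])
```

Note that product("ATGC", repeat=3) alone would yield 'AAA', 'AAT', 'AAG', ..., so the order shown in the notebook output suggests the alphabet is sorted after it is supplied.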

We can obtain numeric array representations of the sequence as well, starting with the index of each element in the alphabet.

s.to_index()
array([1, 3, 4, 0, 2, 0, 1, 3])

By comparison, if the alphabet is specified explicitly, each index refers to a position in that larger alphabet, so the values differ:

s2.to_index()
array([ 5, 19, 36,  0, 12,  0,  5, 19])
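To make the difference between the two index arrays concrete, here is a minimal sketch of index encoding in plain Python. The helper name encode_index is hypothetical and is not part of seqlike; it assumes that each element maps to its position in a sorted alphabet, which matches the outputs shown above.

```python
from itertools import product


def encode_index(sequence, alphabet):
    """Map each element to its position in the given alphabet (hypothetical helper)."""
    lookup = {symbol: position for position, symbol in enumerate(alphabet)}
    return [lookup[element] for element in sequence]


sequence = ["ACC", "CAT", "GCA", "AAA", "ATA", "AAA", "ACC", "CAT"]

# Inferred alphabet: only the five codons that occur in the sequence.
small_alphabet = sorted(set(sequence))
# Explicit alphabet: all 64 codons, sorted.
full_alphabet = sorted("".join(t) for t in product("ATGC", repeat=3))

small_index = encode_index(sequence, small_alphabet)
full_index = encode_index(sequence, full_alphabet)

print(small_index)  # prints: [1, 3, 4, 0, 2, 0, 1, 3]
print(full_index)   # prints: [5, 19, 36, 0, 12, 0, 5, 19]
```

The same codon gets a different index under each alphabet, so index arrays are only comparable between sequences that share an alphabet.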

The same goes for the one-hot representation:

plt.imshow(s.to_onehot())
<matplotlib.image.AxesImage at 0x7f26c9ce57f0>
plt.imshow(s2.to_onehot())
<matplotlib.image.AxesImage at 0x7f26c9c24be0>
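Since the plots themselves are not reproduced here, the sketch below shows what a one-hot encoding looks like as data. It is written in plain Python, and the helper one_hot is hypothetical. The orientation (one row per sequence position, one column per alphabet symbol) is an assumption and may differ from what to_onehot returns.

```python
from itertools import product


def one_hot(sequence, alphabet):
    """Return a list of rows, each with a single 1 in the element's alphabet column."""
    lookup = {symbol: position for position, symbol in enumerate(alphabet)}
    matrix = [[0] * len(alphabet) for _ in sequence]
    for row, element in enumerate(sequence):
        matrix[row][lookup[element]] = 1
    return matrix


sequence = ["ACC", "CAT", "GCA", "AAA", "ATA", "AAA", "ACC", "CAT"]

small = one_hot(sequence, sorted(set(sequence)))
full = one_hot(sequence, sorted("".join(t) for t in product("ATGC", repeat=3)))

print(len(small), len(small[0]))  # prints: 8 5
print(len(full), len(full[0]))    # prints: 8 64
print(small[0])                   # prints: [0, 1, 0, 0, 0]
```

With the inferred alphabet the matrix is compact, while the full codon alphabet gives a much wider and sparser matrix for the same sequence.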

The string representation also has a sane default, concatenating the elements in order:

str(s)
'ACCCATGCAAAAATAAAAACCCAT'
s.to_str()
'ACCCATGCAAAAATAAAAACCCAT'
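Concatenation discards the element boundaries, so going back from the string to the codon list only works when the element width is known. The plain-Python sketch below shows the round trip. split_fixed_width is a hypothetical helper, not a seqlike function.

```python
def split_fixed_width(text, width=3):
    """Split a string into consecutive chunks of equal width (hypothetical helper)."""
    if len(text) % width != 0:
        raise ValueError("text length must be a multiple of width")
    return [text[i:i + width] for i in range(0, len(text), width)]


sequence = ["ACC", "CAT", "GCA", "AAA", "ATA", "AAA", "ACC", "CAT"]

# Joining the elements reproduces the string shown above.
flat = "".join(sequence)

# Splitting in steps of three recovers the original codon list.
recovered = split_fixed_width(flat, width=3)

print(flat)
print(recovered == sequence)  # prints: True
```

For sequences whose elements vary in length, this round trip would not be possible without an explicit separator.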