SeqLike API Reference

Core SeqLike implementation.

SeqLike (SequenceLike)

An omnibus object for various representations of biological sequences.

This class provides a simple way to interconvert between array, string, Seq, and SeqRecord representations of biological sequences. In general, SeqLike objects act something like "dual SeqRecords" that hold both nucleotide and amino acid representations. Operations generally act on the currently selected form (NT or AA), and we strive to keep the two representations in sync whenever possible.

Here's a quick usage example:

from seqlike import SeqLike
example = 'ATCGATC'

seq_record = SeqLike(example, seq_type="nt").to_seqrecord()
seq = SeqLike(example, seq_type="nt").to_seq()
seq_str = SeqLike(example, seq_type="nt").to_str()
seq_index = SeqLike(example, seq_type="nt").to_index()
seq_onehot = SeqLike(example, seq_type="nt").to_onehot()

Any of the aforementioned representations can be generated or passed in as an input. If using onehot or index encodings, their shapes are (N x NUM_BASES) and (N), respectively, where N is the sequence length. We use symbols from our extended nucleotide and protein alphabets, which allow for gaps (-), stops (*), and additional amino acid letters. For more info, please see the _onehot_encoder and _index_encoder objects, and specifically their categories attributes.
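To make the two encoding shapes concrete, here is a minimal pure-Python sketch. The alphabet string TOY_NT_ALPHABET and the helpers to_index, to_onehot and from_index are hypothetical stand-ins, not SeqLike's encoders or alphabets; they only illustrate the (N) index layout and the (N x NUM_BASES) one-hot layout over an alphabet that includes gap (-) and stop (*) symbols.

```python
from typing import List

# Hypothetical extended NT alphabet: gap (-), stop (*), the four bases and N.
# SeqLike's real alphabets and encoders differ; this only illustrates the shapes.
TOY_NT_ALPHABET = "-*ACGNT"


def to_index(seq: str, alphabet: str = TOY_NT_ALPHABET) -> List[int]:
    """Encode each letter as its position in the alphabet: shape (N)."""
    lookup = {letter: i for i, letter in enumerate(alphabet)}
    return [lookup[letter] for letter in seq]


def to_onehot(seq: str, alphabet: str = TOY_NT_ALPHABET) -> List[List[int]]:
    """Encode each letter as a one-hot row: shape (N x NUM_BASES)."""
    rows = []
    for i in to_index(seq, alphabet):
        row = [0] * len(alphabet)
        row[i] = 1
        rows.append(row)
    return rows


def from_index(indices: List[int], alphabet: str = TOY_NT_ALPHABET) -> str:
    """Invert to_index by mapping positions back to letters."""
    return "".join(alphabet[i] for i in indices)


example = "ATCG-ATC"
idx = to_index(example)
onehot = to_onehot(example)
print(idx)  # [2, 6, 3, 4, 0, 2, 6, 3]
print(len(onehot), len(onehot[0]))  # 8 7
```

Each one-hot row contains exactly one 1, in the column given by the corresponding index, so the two encodings carry the same information.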

We also allow switching between the NT and AA views of the object.

# conversion to AA
aa_example = SeqLike(example, seq_type="nt").aa()

# conversion to NT
example_orig = aa_example.nt()

The constructor also takes optional keyword arguments that are passed on to SeqRecord. For the id attribute, if unspecified we generate a random hexadecimal UUID.

codon_map defines back-translation from an AA sequence to an NT sequence. It must be a callable (for example, the output of codon_table_to_codon_map); see validate_codon_map for details.
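As an illustration of what a codon map callable does, the sketch below back-translates a plain AA string by always choosing the first codon listed for each residue. TOY_CODON_TABLE and make_codon_map are hypothetical; the real callable operates on a SeqLike (back_translate applies it via self.apply) rather than on a string.

```python
from typing import Callable, Dict, List

# Hypothetical mini codon table: amino acid -> candidate codons, preferred first.
# A real table would cover every amino acid and the stop symbol.
TOY_CODON_TABLE: Dict[str, List[str]] = {
    "M": ["ATG"],
    "A": ["GCC", "GCT", "GCA", "GCG"],
    "K": ["AAA", "AAG"],
    "*": ["TAA", "TGA", "TAG"],
}


def make_codon_map(codon_table: Dict[str, List[str]]) -> Callable[[str], str]:
    """Build a back-translation callable that always picks the first listed codon."""

    def codon_map(aa_seq: str) -> str:
        return "".join(codon_table[aa][0] for aa in aa_seq)

    return codon_map


back_translate = make_codon_map(TOY_CODON_TABLE)
print(back_translate("MAK*"))  # ATGGCCAAATAA
```

Returning a closure keeps the table bound to the function, which is the shape validate_codon_map expects: a callable rather than a dictionary.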

If the sequence is NT, its length must be a multiple of 3 for it to be translated; if it is not, all AA attributes are None. Likewise, if the initializing sequence is AA but no codon_map was passed in, we can't generate an NT sequence. Where possible, the missing representation is computed on the fly when it is first requested.
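The in-frame rule can be sketched on plain strings with the standard genetic code (NCBI table 1). translate_or_none is a hypothetical helper for illustration only; SeqLike itself delegates translation to Biopython's SeqRecord.translate().

```python
from itertools import product
from typing import Optional

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_or_none(nt_seq: str) -> Optional[str]:
    """Translate an NT string, or return None if its length is not a multiple of 3."""
    if len(nt_seq) % 3 != 0:
        return None  # the reading frame is ambiguous, so there is no AA view
    codons = (nt_seq[i : i + 3] for i in range(0, len(nt_seq), 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


print(translate_or_none("ATGGCCTAA"))  # MA*
print(translate_or_none("ATCGATC"))  # None (7 nt)
```

The 7-nt example sequence used earlier falls into the None case.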

    def sequence(self):
        return self._seqrecord.seq

    def nt(self, auto_backtranslate=True, **kwargs) -> "SeqLike":
        """This method returns the NT view of the SeqLike object.

        The method will automagically back-translate the `._aa_record`
        of its SeqLike object if the `._nt_record` does not exist.

        :param auto_backtranslate: Whether or not
            to automagically back-translate an AA sequence into an NT sequence.
            Defaults to True.
        :param kwargs: Passed through to `.back_translate()`.
            Can include `codon_map` to specify which codon map to use.
        :returns: The NT view of the object.
        """
        # Start with auto-back-translation
        if self._aa_record and self._nt_record is None and auto_backtranslate:
            return self.back_translate(**kwargs)

        # Return based on _type.
        if self._type == "NT":
            return deepcopy(self)
        return swap_representation(self)

    def aa(self, auto_translate=True, **kwargs) -> "SeqLike":
        """Return the amino acid view of the SeqLike object.

        The method will automagically translate the `._nt_record`
        of its SeqLike object if the `._aa_record` does not exist.

        We preserve all NT record attributes for convenience in sequence manipulation
        tasks where the starter sequence is an NT and we merely want the AA form.
        The default behaviour, thus, has `id=True`, `name=True`, `description=True`,
        `annotations=True`, and `dbxrefs=True`.

        To change the behaviour, set any of those to False when calling `.aa()`,
        for example `.aa(id=False)`.

        :param auto_translate: Whether to automagically translate
            an NT sequence into an AA sequence.
            Defaults to True.
        :param kwargs: These kwargs are passed into Biopython SeqRecord's
            `.translate()` method.  The default behaviour is set to `id=True`,
            `name=True`, `description=True`, `annotations=True`, and `dbxrefs=True`.
        :returns: copy of self with sequence object as amino acid sequence.
        """
        # Start with auto-translation
        if self._nt_record and self._aa_record is None and auto_translate:
            translate_kwargs = dict(id=True, name=True, description=True, annotations=True, dbxrefs=True)
            # Caller-supplied kwargs override the defaults above.
            translate_kwargs.update(kwargs)
            return self.translate(**translate_kwargs)

        # Return based on _type.
        if self._type == "AA":
            return deepcopy(self)
        return swap_representation(self)

    def to_seq(self) -> Seq:
        """Convert the SeqLike object (string, Seq, SeqRecord) to a Biopython Seq object.

        :returns: A Seq object.
        """
        return Seq(str(self._seqrecord.seq))

    # Function to convert input sequence to SeqRecord object
    def to_seqrecord(self, **kwargs) -> SeqRecord:
        """Convert the SeqLike object (string, Seq, SeqRecord) to a Biopython SeqRecord.

        :param **kwargs: Used to set the attributes of the SeqRecord.
            Overrides the original SeqRecord's attributes.
        :returns: A SeqRecord object.
        """
        rec = self._seqrecord
        newrec = SeqRecord(
        # overwrite newrec values with keyword args
        for key, val in kwargs.items():
            setattr(newrec, key, val)
        return newrec
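The copy-then-override behaviour of to_seqrecord can be sketched with a dataclass standing in for SeqRecord. ToyRecord and to_record are hypothetical; the real method constructs a fresh Biopython SeqRecord from self._seqrecord instead of deep-copying a dataclass.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ToyRecord:
    """Hypothetical stand-in for a SeqRecord, holding a few of its attributes."""

    seq: str
    id: str = ""
    description: str = ""
    annotations: Dict[str, Any] = field(default_factory=dict)


def to_record(rec: ToyRecord, **kwargs: Any) -> ToyRecord:
    """Return a copy of rec, letting keyword arguments override its attributes."""
    newrec = copy.deepcopy(rec)  # the original record is never mutated
    for key, val in kwargs.items():
        setattr(newrec, key, val)
    return newrec


original = ToyRecord("ATGGCC", id="seq1", annotations={"source": "demo"})
renamed = to_record(original, id="seq1_copy")
print(original.id, renamed.id)  # seq1 seq1_copy
```

Because setattr accepts any name, a misspelled keyword in this sketch silently creates a new attribute instead of raising an error.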

    def back_translate(self, codon_map: Callable = None, **kwargs) -> "SeqLike":
        """This method back-translates the current AA sequence and returns an
        NT sequence.

        We expect that self.codon_map will be defined; if not, a codon map
        callable must be passed into this method, either as the first
        positional argument or as the `codon_map` keyword argument.  Any
        other arguments for the back-translator should be passed as keyword
        arguments.

        If `codon_map` is passed in, it takes precedence over self.codon_map,
        and all kwargs are passed through to it, along with self.
        Otherwise we use self.codon_map.  Either way, self is not modified.

        Finally, we use self.apply to apply the codon_map callable to
        the current sequence.

        :param codon_map: A codon map callable.
        :param **kwargs: Passed through to the codon_map function.
        :returns: A new NT SeqLike.
        :raises AttributeError: if no codon map is passed in
            when the SeqLike's `codon_map` is also not set.
        """
        # An explicitly passed codon_map takes precedence over self.codon_map.
        # self is never modified in place, because self.apply works on a deep copy.
        codon_map = codon_map or self.codon_map

        if codon_map:
            sc = self.apply(codon_map, **kwargs)  # returns a deepcopied SeqLike

            # TODO: Change this to an if/raise block.
            assert (
                isinstance(sc, SeqLike) and sc._type == "NT"
            ), f"Backtranslating function must return an NT SeqLike, type was {type(sc)}"
        else:
            raise AttributeError(
                "No callable passed and self.codon_map not set!  "
                "Please set the codon_map attribute or pass in a "
                "callable using the `codon_map` keyword."
            )

        return sc.nt()
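A minimal sketch of how back_translate chooses its codon map: an explicitly passed callable wins, otherwise self.codon_map is used, and if neither is set an AttributeError is raised. ToyAASeq, all_gcc and all_gct are hypothetical, and this toy version returns a string instead of a new NT SeqLike.

```python
from typing import Callable, Optional

CodonMap = Callable[[str], str]


class ToyAASeq:
    """Hypothetical AA holder illustrating how back_translate resolves its codon map."""

    def __init__(self, aa: str, codon_map: Optional[CodonMap] = None):
        self.aa = aa
        self.codon_map = codon_map

    def back_translate(self, codon_map: Optional[CodonMap] = None) -> str:
        # An explicitly passed codon_map wins; otherwise fall back to self.codon_map.
        codon_map = codon_map or self.codon_map
        if not codon_map:
            raise AttributeError("No callable passed and self.codon_map not set!")
        return codon_map(self.aa)


def all_gcc(aa: str) -> str:
    """Toy codon map: emit GCC for every residue, ignoring the input letter."""
    return "GCC" * len(aa)


def all_gct(aa: str) -> str:
    """Toy codon map: emit GCT for every residue, ignoring the input letter."""
    return "GCT" * len(aa)


seq = ToyAASeq("AA", codon_map=all_gcc)
print(seq.back_translate())  # GCCGCC (falls back to self.codon_map)
print(seq.back_translate(codon_map=all_gct))  # GCTGCT (explicit override)
```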

    def reverse_complement(self, **kwargs) -> "SeqLike":
        """Return the reverse complement of an NT sequence.

        Record this operation in annotations['reversed'].
        If the sequence is currently AA, raise an exception, like Bio.Seq.Seq.complement().

        :param **kwargs: Not currently used.
        :returns: Reverse-complemented SeqLike object.
        :raises ValueError: when trying to reverse-complement an AA sequence.
            This is a semantically invalid operation.
        """
        if self._type == "AA":
            raise ValueError("Proteins do not have complements!")

        # Work on a copy of the annotations so the original SeqLike is untouched.
        if hasattr(self, "annotations"):
            annotations = self.annotations.copy()
        else:
            annotations = dict()

        # Toggle the flag, so reverse-complementing twice marks the record as not reversed.
        if "reversed" in annotations:
            annotations["reversed"] = not annotations["reversed"]
        else:
            annotations["reversed"] = True

        _nt_record = self._nt_record.reverse_complement(
            id=True, name=True, description=True, annotations=annotations, dbxrefs=True
        )
        s = SeqLike(
        return s
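The reverse-complement bookkeeping can be sketched on plain DNA strings: the sequence is complemented and reversed, and annotations['reversed'] is toggled on a copy, so applying the operation twice restores the sequence and flips the flag back to False. reverse_complement here is a hypothetical standalone function, not the SeqLike method.

```python
from typing import Any, Dict, Tuple

# Complement table for upper- and lower-case DNA letters.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(nt_seq: str, annotations: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Reverse-complement a DNA string and toggle annotations['reversed'] on a copy."""
    new_annotations = dict(annotations)  # leave the caller's dict untouched
    new_annotations["reversed"] = not new_annotations.get("reversed", False)
    return nt_seq.translate(COMPLEMENT)[::-1], new_annotations


once, ann1 = reverse_complement("ATGGCC", {})
twice, ann2 = reverse_complement(once, ann1)
print(once, ann1)  # GGCCAT {'reversed': True}
print(twice, ann2)  # ATGGCC {'reversed': False}
```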

    def pad_to(self, new_length: int, pad_char: str = gap_letter, mode: str = "right") -> "SeqLike":
        """This method returns a new SeqLike, with gap characters added to the left
        or right to make a sequence of the specified length. Note that this method
        retains only the current view!

        :param new_length: An integer >= the current length
        :param pad_char: The padding character
        :param mode: one of 'left' or 'right'
        :returns: new SeqLike
        """
        # np.integer covers every NumPy integer type; np.int0 is a deprecated alias
        # that newer NumPy releases no longer provide.
        assert isinstance(new_length, (int, np.integer))
        assert mode in ["left", "right"]

        diff = new_length - len(self)
        assert diff >= 0, (
            f"Current length is {len(self)} and "
            f"requested padded length is {new_length}."
            " Can't do negative padding"
        )

        if mode == "left":
            return SeqLike(
                pad_char * diff + self._seqrecord,

        if mode == "right":
            return SeqLike(
                self._seqrecord + pad_char * diff,
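A string-only sketch of the padding logic. It assumes "-" as the gap letter (SeqLike uses its own gap_letter constant) and raises ValueError where the method uses assertions; pad_to here is a hypothetical standalone function.

```python
def pad_to(seq: str, new_length: int, pad_char: str = "-", mode: str = "right") -> str:
    """Pad a string with pad_char on the left or right up to new_length."""
    if mode not in ("left", "right"):
        raise ValueError("mode must be 'left' or 'right'")
    diff = new_length - len(seq)
    if diff < 0:
        raise ValueError(f"Current length is {len(seq)} and requested padded length is {new_length}.")
    padding = pad_char * diff
    return padding + seq if mode == "left" else seq + padding


print(pad_to("MAK", 6))  # MAK---
print(pad_to("MAK", 6, mode="left"))  # ---MAK
```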

    def seq_num_to_idx(self, list_of_seqnums):
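        """Convert a list of seqnums into positional indices.

        Each requested seqnum is looked up in `letter_annotations["seqnums"]`
        to find its position in the sequence.  Runs of consecutive positions
        are collapsed into slices and isolated positions are kept as integers,
        so the result can be passed directly to `__getitem__`.

        :param list_of_seqnums: The seqnums to convert.
        :returns: A list of slices and/or integers indexing the requested positions.
        """

        def ranger(i):
            """Group consecutive integers into slices.

            :param i: An iterable of integer positions.
            :yields: A slice for each run of two or more consecutive positions,
                or an integer for an isolated position.
            """
            for _, b in itertools.groupby(enumerate(i), lambda x: x[1] - x[0]):
                b = list(b)
                if len(b) > 1:
                    yield slice(b[0][1], b[-1][1] + 1)
                else:
                    yield b[0][1]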

        seqnums_map = dict((seqnum, i) for i, seqnum in enumerate(self.letter_annotations["seqnums"]))
        seqnums = [seqnums_map[seqnum] for seqnum in list_of_seqnums]
        return list(ranger(seqnums))
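The grouping trick inside ranger is easiest to see on a small example: for consecutive positions, the position minus its enumeration index stays constant, so itertools.groupby collects each run. seqnums_to_indices and the seqnum labels below (including the insertion code "3A") are hypothetical; the method reads its labels from letter_annotations["seqnums"].

```python
import itertools
from typing import List, Sequence, Union


def seqnums_to_indices(seqnums: Sequence[str], wanted: Sequence[str]) -> List[Union[int, slice]]:
    """Map seqnum labels to positions, collapsing consecutive runs into slices."""
    position = {seqnum: i for i, seqnum in enumerate(seqnums)}
    positions = [position[s] for s in wanted]
    result: List[Union[int, slice]] = []
    # Consecutive positions share the same (position - enumeration index) key.
    for _, group in itertools.groupby(enumerate(positions), lambda x: x[1] - x[0]):
        run = [p for _, p in group]
        result.append(slice(run[0], run[-1] + 1) if len(run) > 1 else run[0])
    return result


# Hypothetical seqnums with an insertion code ("3A") between 3 and 4.
seqnums = ["1", "2", "3", "3A", "4", "5", "6"]
picked = seqnums_to_indices(seqnums, ["1", "2", "3", "3A", "6"])
print(picked)  # [slice(0, 4, None), 6]
# Indexing with each element and concatenating mirrors what __getitem__ does.
print("".join("MAKLVEG"[i] for i in picked))  # MAKLG
```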

    def slice(self, list_of_seqnums):
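        """Use seqnums to sub-index.

        The seqnums are converted to positional indices with `seq_num_to_idx`,
        and the selected positions are concatenated into a new SeqLike.

        :param list_of_seqnums: The seqnums (from `letter_annotations["seqnums"]`) to select.
        :returns: A new SeqLike containing only the selected positions.
        """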
        return self.__getitem__(self.seq_num_to_idx(list_of_seqnums))

    def __setattr__(self, name, value):
        """Set attribute value.

        We prioritize setting the attribute on _seqrecord before the SeqLike object:
        names in seqrecord_attrs are set on _seqrecord, all others on the SeqLike itself.

        <!-- #noqa: DAR101 -->
        <!-- #noqa: DAR201 -->
        """
        if name in seqrecord_attrs:
            object.__setattr__(self._seqrecord, name, value)
        else:
            object.__setattr__(self, name, value)

    def __getattr__(self, name):
        """Called if the attribute does not already exist in SeqLike.

        Please see [the official Python reference][pyref] for more information.

        :param name: Attribute to return.
        :raises AttributeError: if the attribute is not a SeqLike
            or SeqRecord attribute.

        <!-- #noqa: DAR201 -->
        """
        if name == "__setstate__":  # pertains to pickling
            raise AttributeError

        if name in seqrecord_attrs:
            return getattr(self._seqrecord, name)
        else:
            raise AttributeError("%s not an attribute of SeqLike or SeqLike._seqrecord" % name)
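The attribute forwarding in __setattr__ and __getattr__ follows a common delegation pattern, sketched here with a SimpleNamespace standing in for the SeqRecord. Wrapper and RECORD_ATTRS are hypothetical names; RECORD_ATTRS plays the role of seqrecord_attrs.

```python
from types import SimpleNamespace

# Hypothetical subset of attribute names to forward to the wrapped record.
RECORD_ATTRS = {"id", "name", "description", "annotations"}


class Wrapper:
    """Forward selected attribute reads and writes to an inner record object."""

    def __init__(self, record):
        object.__setattr__(self, "_record", record)  # bypass our own __setattr__

    def __setattr__(self, name, value):
        if name in RECORD_ATTRS:
            setattr(self._record, name, value)
        else:
            object.__setattr__(self, name, value)

    def __getattr__(self, name):
        # Only called when normal attribute lookup on the wrapper fails.
        if name in RECORD_ATTRS:
            return getattr(self._record, name)
        raise AttributeError(f"{name} not an attribute of Wrapper or Wrapper._record")


rec = SimpleNamespace(id="seq1", name="demo", description="", annotations={})
w = Wrapper(rec)
w.id = "seq2"  # forwarded to the record
w.note = "kept on the wrapper"  # not a record attribute
print(rec.id, w.id, hasattr(rec, "note"))  # seq2 seq2 False
```

Because __getattr__ only runs after normal lookup fails, attributes stored on the wrapper itself never reach the forwarding branch.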

    def __dir__(self):
        """Override for dir() of a SeqLike's attributes.

        :returns: List of SeqLike attributes and SeqRecord attributes.
        """
        return super().__dir__() + list(seqrecord_attrs)

    def __getitem__(self, index) -> "SeqLike":
        """__getitem__ implementation.

        Slicing on SeqLike will slice the associated sequence
        and return a sub-sequence SeqLike object.
        We try to cast the SeqLike to NT if possible,
        and then create a new sequence out of that.
        If the sub-sequence is not a multiple of three in length,
        then there will be no AA sequence associated with the new SeqLike,
        due to ambiguity of coding frame.
        This is simply the default behavior of the SeqLike constructor.
        Note that this *can* result in a new NT with AA that is out of frame;
        the constructor doesn't know or care.

        By convention, the indices here are in the "units" of the
        current type of SeqLike, i.e. in NTs if DNA or RNA, or AAs if AA.

        :param index: integer, slice, or list of integers and slices
            used to access parts of the sequence.
        :returns: A new SeqLike of the same type sliced to the index.
        """
        index = [index] if isinstance(index, (int, slice)) else index

        # slicing is type-preserving
        seqlike_kwargs = dict(

        all_seqlikes = []
        for idx in index:
            if isinstance(idx, int):
                idx = slice(idx, idx + 1, 1)

            # Handle case when both NT and AA are present:
            # - We need to slice BOTH the _nt_record and _aa_records.
            if self._nt_record is not None and self._aa_record is not None:
                # Firstly, determine the primary and secondary records to slice based on _type
                _aa_record = deepcopy(self._aa_record)
                _nt_record = deepcopy(self._nt_record)

                if self._type == "AA":
                    # Slice the AA, then slice the NT at the matching 3x (codon) boundaries.
                    _aa_record = _aa_record[idx]

                    # idx.indices() resolves None and negative bounds against the AA length.
                    start, stop, step = idx.indices(len(self))

                    new_nt_record = None
                    for i in range(start, stop, step):
                        if new_nt_record is None:
                            new_nt_record = _nt_record[i * 3 : (i + 1) * 3]
                        else:
                            new_nt_record = new_nt_record + _nt_record[i * 3 : (i + 1) * 3]

                    sliced = SeqLike(_aa_record, **seqlike_kwargs)
                    sliced._nt_record = new_nt_record

                elif self._type == "NT":
                    _nt_record = deepcopy(self._nt_record)[idx]
                    try:
                        _aa_record = _nt_record.translate(gap=gap_letter)
                    except Exception:
                        # Translation failed; leave the new SeqLike without an AA view.
                        _aa_record = None
                    sliced = SeqLike(_nt_record, **seqlike_kwargs)
                    sliced._aa_record = _aa_record

            # Only _aa_record is present
            elif self._type == "AA" and self._nt_record is None:
                sliced = SeqLike(deepcopy(self._aa_record)[idx], **seqlike_kwargs)

            # Only _nt_record is present
            elif self._type == "NT" and self._aa_record is None:
                sliced = SeqLike(deepcopy(self._nt_record)[idx], **seqlike_kwargs)

            all_seqlikes.append(sliced)

        # all_seqlikes is now a list of SeqLikes that we need to concatenate
        return reduce((lambda x, y: x + y), all_seqlikes)
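The AA-to-NT slicing rule can be sketched on plain strings: residue i of the AA view corresponds to nucleotides 3i through 3i+2 of the NT view. slice_dual is hypothetical and assumes the NT string is exactly three times the AA length; it uses slice.indices so that omitted and negative bounds resolve the same way Python resolves them for the AA string.

```python
from typing import Tuple


def slice_dual(aa: str, nt: str, idx: slice) -> Tuple[str, str]:
    """Slice an AA string and pull the matching codons (3 nt per residue) from nt."""
    assert len(nt) == 3 * len(aa), "NT must be exactly 3x the AA length"
    positions = range(*idx.indices(len(aa)))  # resolves None and negative bounds
    sub_nt = "".join(nt[i * 3 : (i + 1) * 3] for i in positions)
    return aa[idx], sub_nt


aa, nt = "MAK", "ATGGCCAAA"
print(slice_dual(aa, nt, slice(1, 3)))  # ('AK', 'GCCAAA')
print(slice_dual(aa, nt, slice(None, None, -1)))  # ('KAM', 'AAAGCCATG')
```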

    def __add__(self, other):
        """Add sequence to another sequence.

        Mimics behavior of SeqRecord.__add__.

        For magical behaviour, we assume that other is of the same _type as self._type.
        Doing so allows us to do:

        # `a` is SeqLike, `b` is of variable type
        a + b

        without specifying any further information.

        :param other: A SeqLike type object.
        :returns: The added SeqLike object.
        """

        return _add(self, other)

    def __radd__(self, other: "SeqLike"):
        """Add another sequence or string to this sequence from the left.

        Implementation for:

        self = SeqLike(...)
        other = SeqLike(...)
        result = other + self

        Mimics behavior of SeqRecord.__radd__.

        <!-- #noqa: DAR101 -->
        <!-- #noqa: DAR201 -->
        """
        # for summation of SeqLike using sum()
        if other == 0:
            return self
        if isinstance(other, SeqLike):
            raise RuntimeError("This should have happened via the __add__ of the other SeqLike being added!")

        # Assume it is a string, Seq, or SeqRecord.
        # Note can't transfer any per-letter-annotations
        return SeqLike(
            other + self.to_seqrecord(),
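Why __radd__ special-cases 0: the built-in sum() starts from the integer 0, so its first step evaluates 0 + item, which falls through to item.__radd__(0). The hypothetical Chain class below reproduces just that behaviour on strings, including the string + object case that __radd__ also covers.

```python
class Chain:
    """Hypothetical sequence wrapper showing why __radd__ special-cases 0."""

    def __init__(self, text: str):
        self.text = text

    def __add__(self, other: "Chain") -> "Chain":
        return Chain(self.text + other.text)

    def __radd__(self, other):
        # sum() starts from the integer 0, so its first step is 0 + Chain(...).
        if other == 0:
            return self
        return Chain(str(other) + self.text)


total = sum([Chain("ATG"), Chain("GCC"), Chain("TAA")])
print(total.text)  # ATGGCCTAA
print(("NNN" + Chain("ATG")).text)  # NNNATG
```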

    def __sub__(self, other):
        return _sub(self, other)

    def __deepcopy__(self, memo):
        """Deepcopy implementation.

        <!-- #noqa: DAR101 -->
        <!-- #noqa: DAR201 -->
        """
        seq_copy = SeqLike(
        seq_copy._nt_record = deepcopy(self._nt_record)
        seq_copy._aa_record = deepcopy(self._aa_record)
        return seq_copy

    def scan(self, mutant_letter: str):
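        """Scan a substitution mutation over the sequence.

        :param mutant_letter: The letter to substitute at each position.
        :returns: A pandas Series of mutant SeqLikes, one per position,
            with positions numbered from 1.
        """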
        mutants = []
        for i in range(1, len(self) + 1):
            mutants.append(self + Substitution(f"{i}{mutant_letter}"))
        return pd.Series(mutants)
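A string-based sketch of the scan: position i (counted from 1, as in the Substitution strings) is replaced by mutant_letter, giving one mutant per position. The standalone scan function is hypothetical and returns a list of strings rather than a pandas Series of SeqLikes.

```python
from typing import List


def scan(seq: str, mutant_letter: str) -> List[str]:
    """Return one mutant per position, substituting mutant_letter (positions from 1)."""
    mutants = []
    for i in range(1, len(seq) + 1):
        mutants.append(seq[: i - 1] + mutant_letter + seq[i:])
    return mutants


print(scan("MAK", "A"))  # ['AAK', 'MAK', 'MAA']
```

Positions that already carry mutant_letter reproduce the original sequence (here position 2), so a scan always yields exactly len(seq) entries.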

aaSeqLike(sequence, alphabet=None, codon_map=None, **kwargs)

Entrypoint function for generating a SeqLike of seq_type=='aa', with the same call signature as the SeqLike class. Will coerce the sequence to AA. Please see SeqLike for more info.

Source code in seqlike/
def aaSeqLike(
    sequence: SeqLikeType, alphabet: Optional[str] = None, codon_map: Optional[Callable] = None, **kwargs
) -> SeqLike:
    """Entrypoint function for generating a SeqLike of seq_type=='aa', with the same call signature as the SeqLike
    class.  Will coerce the sequence to AA.  Please see SeqLike for more info.
    """
    try:
        if kwargs["seq_type"].upper() not in ["AA"]:
            warnings.warn(
                f"Trying to initialize an AA SeqLike, but seq_type is set to {kwargs['seq_type']}.  Coercing seq_type to AA"
            )
    except KeyError:
        # seq_type was not passed in, so there is nothing to warn about.
        pass
    kwargs["seq_type"] = "AA"
    return SeqLike(sequence, alphabet=alphabet, codon_map=codon_map, **kwargs)

ntSeqLike(sequence, alphabet=None, codon_map=None, **kwargs)

Entrypoint function for generating a SeqLike of seq_type=='nt', with the same call signature as the SeqLike class. Will coerce the sequence to NT. Please see SeqLike for more info.

Source code in seqlike/
def ntSeqLike(
    sequence: SeqLikeType, alphabet: Optional[str] = None, codon_map: Optional[Callable] = None, **kwargs
) -> SeqLike:
    """Entrypoint function for generating a SeqLike of seq_type=='nt', with the same call signature as the SeqLike
    class.  Will coerce the sequence to NT.  Please see SeqLike for more info.
    """
    try:
        if kwargs["seq_type"].upper() not in ["NT", "RNA", "DNA"]:
            warnings.warn(
                f"Trying to initialize an NT SeqLike, but seq_type is set to {kwargs['seq_type']}.  Coercing seq_type to NT"
            )
    except KeyError:
        # seq_type was not passed in, so there is nothing to warn about.
        pass
    kwargs["seq_type"] = "NT"
    return SeqLike(sequence, alphabet=alphabet, codon_map=codon_map, **kwargs)
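The seq_type coercion in aaSeqLike and ntSeqLike can be sketched with the standard warnings module. coerce_nt_kwargs is a hypothetical helper, the use of warnings.warn as the reporting mechanism is an assumption, and the sketch uses kwargs.get in place of the try/except KeyError.

```python
import warnings
from typing import Any, Dict

NT_TYPES = {"NT", "RNA", "DNA"}


def coerce_nt_kwargs(**kwargs: Any) -> Dict[str, Any]:
    """Force seq_type to 'NT', warning if the caller asked for something else."""
    seq_type = kwargs.get("seq_type")
    if seq_type is not None and seq_type.upper() not in NT_TYPES:
        warnings.warn(f"seq_type is set to {seq_type}.  Coercing seq_type to NT")
    kwargs["seq_type"] = "NT"
    return kwargs


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(coerce_nt_kwargs(seq_type="aa", id="x"))  # {'seq_type': 'NT', 'id': 'x'}
print(len(caught))  # 1
```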


swap_representation(s)

Swap the representation of a SeqLike object between NT and AA.

:param s: The SeqLike for which to swap representation.
:returns: A SeqLike with swapped representation.
:raises ValueError: When either the _aa_record or _nt_record is missing.

Source code in seqlike/
def swap_representation(s: SeqLike) -> SeqLike:
    """Swap the representation of a SeqLike object between NT and AA.

    :param s: The SeqLike for which to swap representation.
    :returns: A SeqLike with swapped representation.
    :raises ValueError: When either the _aa_record or _nt_record is missing.
    """
    if s._aa_record is None or s._nt_record is None:
        # Raise an informative error message.
        raise ValueError(
            "Oops! It looks like the SeqLike object is missing "
            "one of `._aa_record` or `._nt_record`. "
            "Here are the values for you to inspect: \n"
            f"- SeqLike._aa_record: {s._aa_record}\n"
            f"- SeqLike._nt_record: {s._nt_record}\n"
            "Without both representations present, we can't swap views. "
            'If your SeqLike object is of sequence type "NT", '
            "please `.translate()` it first; "
            'alternatively if your SeqLike object is of sequence type "AA", '
            "please `.back_translate()` it first."
        )
    sc = deepcopy(s)  # sc == "seq copy"

    # swap the _type
    _type = "NT" if s._type == "AA" else "AA"

    # copy over the _aa_record and _nt_record objects.
    _aa_record = deepcopy(s._aa_record)
    _nt_record = deepcopy(s._nt_record)

    # swap out the alphabets
    # When swapping representations, standard aa -> standard nt, aa -> nt, etc.
    alphabet_mapping = {
        NT: AA,
        AA: NT,
    }
    try:
        # Index directly so that an unknown alphabet raises KeyError;
        # dict.get() would silently return None instead.
        alphabet = alphabet_mapping[s.alphabet]
    except KeyError:
        raise ValueError(
            "Switching between AA and NT views supported only when using "
            "the standard alphabets provided in the SeqLike library. "
            "This ensures that we can swap the alphabets correctly, "
            "and thus also swap the encoders correctly. "
            "If you are using a custom alphabet, "
            "please set the encoders manually."
        )
    # Obtain new encoders
    _index_encoder = index_encoder_from_alphabet(alphabet)
    _onehot_encoder = onehot_encoder_from_alphabet(alphabet)

    # Now set the attributes correctly.
    sc._type = _type
    sc.alphabet = alphabet
    sc._index_encoder = _index_encoder
    sc._onehot_encoder = _onehot_encoder
    sc._aa_record = _aa_record
    sc._nt_record = _nt_record
    sc._seqrecord = sc._aa_record if _type == "AA" else sc._nt_record
    # Codon map doesn't need to be changed so we do not explicitly set it here.
    return sc
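A detail worth checking in swap_representation: dict.get returns None for a missing key instead of raising KeyError, so only direct indexing lets an except KeyError branch reject custom alphabets. The sketch below shows the difference using hypothetical stand-in alphabet strings (TOY_NT, TOY_AA) rather than SeqLike's real NT and AA constants; swapped_alphabet is likewise hypothetical.

```python
from typing import Dict

# Hypothetical alphabet strings standing in for SeqLike's standard NT and AA alphabets.
TOY_NT = "-*ACGNT"
TOY_AA = "-*ACDEFGHIKLMNPQRSTVWXY"
ALPHABET_MAPPING: Dict[str, str] = {TOY_NT: TOY_AA, TOY_AA: TOY_NT}


def swapped_alphabet(alphabet: str) -> str:
    """Return the partner alphabet, rejecting any alphabet not in the mapping."""
    try:
        # Indexing raises KeyError for unknown keys; .get() would return None instead.
        return ALPHABET_MAPPING[alphabet]
    except KeyError:
        raise ValueError("Swapping views is only supported for the standard alphabets.") from None


print(swapped_alphabet(TOY_NT) == TOY_AA)  # True
print(ALPHABET_MAPPING.get("ACGU"))  # None, so a custom alphabet would slip through
```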


validate_codon_map(codon_map)

Validate that codon_map is a callable.

:param codon_map: A codon map callable.
:raises TypeError: when the codon map is not a Callable.

Source code in seqlike/
def validate_codon_map(codon_map) -> None:
    """Validate that codon_map is a callable.

    :param codon_map: A codon map callable.
    :raises TypeError: when the codon map is not a Callable.
    """
    err_msg = (
        "An explicitly passed-in codon_map must be a callable, e.g., "
        "the output of codon_table_to_codon_map(codon_table). "
        "Did you pass in a codon table dictionary instead? "
        "If so, please pass the dictionary through `codon_table_to_codon_map` first "
        "and pass in the resulting function to the `codon_map` argument."
    )
    if codon_map is not None:
        if not callable(codon_map):
            raise TypeError(err_msg)