API reference
- class genealloy.SeqStep(host_tuplelist, parasite_tuplelist, frameshift=0, start_host_codon=0)
Class for keeping track of sequence comparison.
It provides a method that aligns a parasite triplet with two consecutive host triplets (a duodon), a cursor that marks the current position of the comparison, and methods for generating duodons and comparing them with triplets. The advance_step() method attempts to advance the comparison by one codon step, with one of three outcomes: (i) the cursor advances, (ii) the sequences are found not to match, or (iii) the sequences are found to match.
- Parameters:
host_tuplelist (list of tuples) – A list of tuples. Each tuple stores the allowed triplets for a codon position of the host sequence.
parasite_tuplelist (list of tuples) – A list of tuples. Each tuple stores the allowed triplets for a codon position of the parasite sequence.
frameshift (int) – An integer (0, 1 or 2) denoting the frameshift between host and parasite.
start_host_codon (int) – The host codon position from which the comparison should start.
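The duodon idea can be illustrated without the library. The sketch below is an interpretation, not genealloy's implementation: it assumes that a frameshift of k means the parasite triplet starts k letters into the first host triplet. The helper name `window_in_duodon` is hypothetical.

```python
def window_in_duodon(first_triplet, second_triplet, frameshift):
    """Return the three duodon letters that a parasite triplet would face.

    Assumption: frameshift k (0, 1 or 2) offsets the parasite triplet by
    k letters from the start of the first host triplet.
    """
    if frameshift not in (0, 1, 2):
        raise ValueError("frameshift must be 0, 1 or 2")
    duodon = first_triplet + second_triplet  # six letters in total
    return duodon[frameshift:frameshift + 3]


host_pair = ("ATG", "GCC")
windows = [window_in_duodon(*host_pair, frameshift=k) for k in (0, 1, 2)]
print(windows)  # ['ATG', 'TGG', 'GGC']
```

With a frameshift of 0 the window lies entirely in the first host triplet; with 1 or 2 it spans both, which is presumably why two consecutive host triplets are needed.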
- class genealloy.Duodon(first_triplet, second_triplet)
Class for storing two triplets.
- genealloy.codon_to_aa = { "TTT": "F", ...
Codon to amino acid dictionary.
- genealloy.aa_to_codon_extended = {"A": ["GCX"], ...
Amino acid to codon dictionary, using extended nucleotide letters.
- genealloy.codon_extended_to_aa = {"GCX": "A", ...
Codon to amino acid dictionary, using extended nucleotide letters.
- genealloy.ambiguity_code_to_nt_set = {"A": {"A"}, ...
Dictionary mapping each extended nucleotide letter to the set of nucleotide letters it represents.
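To show what an extended letter stands for, the following sketch expands an extended triplet into the concrete triplets it covers. It is a standalone illustration, not part of genealloy: `expand_extended_codon` and `ambiguity_excerpt` are hypothetical names, and the excerpt copies only a few entries from the default `table` shown for compare_letters.

```python
from itertools import product

# Excerpt of the ambiguity table; only the letters used here are included.
ambiguity_excerpt = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "Y": {"C", "T"},
    "X": {"A", "C", "G", "T"},
}


def expand_extended_codon(codon, table=ambiguity_excerpt):
    """List every concrete triplet that an extended triplet stands for."""
    letter_sets = [sorted(table[letter]) for letter in codon]
    return ["".join(letters) for letters in product(*letter_sets)]


print(expand_extended_codon("GCX"))  # ['GCA', 'GCC', 'GCG', 'GCT']
```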
- genealloy.complement_table = {"A": "T", ...
Extended nucleotide letter to complement letter dictionary.
- genealloy.allowed_aa_transitions = {"A": ["G", "A", "V", "L", "I"], ...
- genealloy.make_transition_dictionary(aa_to_codon_extended, allowed_aa_transitions)
- genealloy.generate_swaptable(codon_to_aa, aa_to_codon_extended)
Generate a codon to extended codon dictionary.
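One plausible way to build such a dictionary is to look up each codon's amino acid and then take the extended codons listed for that amino acid. The sketch below assumes this construction. `sketch_swaptable` and the excerpt dictionaries are hypothetical, and the "F" entry is an assumption based on the standard genetic code rather than on the library's data.

```python
def sketch_swaptable(codon_to_aa, aa_to_codon_extended):
    """Map each codon to the extended codons of the amino acid it encodes."""
    return {
        codon: tuple(aa_to_codon_extended[aa])
        for codon, aa in codon_to_aa.items()
    }


# Tiny excerpts for illustration; the real dictionaries cover all codons.
codon_to_aa_excerpt = {"GCA": "A", "GCC": "A", "TTT": "F", "TTC": "F"}
aa_to_codon_extended_excerpt = {"A": ["GCX"], "F": ["TTY"]}

swaptable = sketch_swaptable(codon_to_aa_excerpt, aa_to_codon_extended_excerpt)
print(swaptable["TTT"])  # ('TTY',)
```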
- genealloy.compare_letters(letter1, letter2, table={'A': {'A'}, 'B': {'C', 'G', 'T'}, 'C': {'C'}, 'D': {'A', 'G', 'T'}, 'G': {'G'}, 'H': {'A', 'C', 'T'}, 'K': {'G', 'T'}, 'M': {'A', 'C'}, 'N': {'A', 'C', 'G', 'T'}, 'R': {'A', 'G'}, 'S': {'C', 'G'}, 'T': {'T'}, 'V': {'A', 'C', 'G'}, 'W': {'A', 'T'}, 'X': {'A', 'C', 'G', 'T'}, 'Y': {'C', 'T'}})
Compare two extended nucleotide letters and return True if they match.
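The reference does not say what counts as a match. A natural reading is that two extended letters match when they share at least one concrete nucleotide. The sketch below implements that reading under the hypothetical name `letters_overlap`, using the default table from the signature above; the library's actual rule may differ.

```python
# The table is copied from the default argument in the signature above.
TABLE = {
    "A": {"A"}, "B": {"C", "G", "T"}, "C": {"C"}, "D": {"A", "G", "T"},
    "G": {"G"}, "H": {"A", "C", "T"}, "K": {"G", "T"}, "M": {"A", "C"},
    "N": {"A", "C", "G", "T"}, "R": {"A", "G"}, "S": {"C", "G"},
    "T": {"T"}, "V": {"A", "C", "G"}, "W": {"A", "T"},
    "X": {"A", "C", "G", "T"}, "Y": {"C", "T"},
}


def letters_overlap(letter1, letter2, table=TABLE):
    """Return True if the two letters share at least one nucleotide."""
    return bool(table[letter1] & table[letter2])


print(letters_overlap("R", "A"))  # True: R covers A and G
print(letters_overlap("R", "Y"))  # False: purines versus pyrimidines
```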
- genealloy.convert_seq_to_codons(seq)
Convert a string (sequence) into a list of 3-letter strings (triplets).
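A minimal sketch of this kind of splitting is shown below. The name `split_into_triplets` is hypothetical. The reference does not state how the library treats a sequence whose length is not a multiple of three; this sketch simply keeps any trailing letters as a shorter final piece.

```python
def split_into_triplets(seq):
    """Split a sequence string into consecutive 3-letter pieces."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


print(split_into_triplets("ATGGCCTTT"))  # ['ATG', 'GCC', 'TTT']
```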
- genealloy.convert_codonlist_to_tuplelist(seq_codons, codon_to_codon_extended)
Convert a list of triplets into a list of tuples, using a swaptable.
The swaptable is a dict of triplet: triplets, and determines the allowed swaps.
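The conversion can be pictured as a per-codon lookup in the swaptable. The sketch below assumes exactly that and nothing more. `codons_to_tuplelist` and `swaptable_excerpt` are hypothetical, and the entries in the excerpt are illustrative rather than taken from the library.

```python
def codons_to_tuplelist(codons, swaptable):
    """Replace each triplet by the tuple of triplets it may be swapped for."""
    return [tuple(swaptable[codon]) for codon in codons]


# Hypothetical swaptable excerpt: triplet -> allowed (extended) triplets.
swaptable_excerpt = {"ATG": ("ATG",), "GCC": ("GCX",), "TTT": ("TTY",)}

tuplelist = codons_to_tuplelist(["ATG", "GCC", "TTT"], swaptable_excerpt)
print(tuplelist)  # [('ATG',), ('GCX',), ('TTY',)]
```

The result has one tuple per codon position, which is the shape described for host_tuplelist and parasite_tuplelist.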
- genealloy.compare_then_get_letter_recursively(host_tuplelist, parasite_tuplelist, host_letter, parasite_letter)
Compare two letters then get next pair of letters recursively.
Returns a string indicating whether or not the sequences match.
- genealloy.walk_seqstep(seqstep)
Compare two sequences by calling advance_step until it returns the result.
- genealloy.compare_sequence_tuplelists(parasite_tuplelist, host_tuplelist, frameshift)
Compare two sequences' tuplelists for a given frame and return a list of matches.
- genealloy.compare_sequence_tuplelists_in_all_frames(parasite_tuplelist, host_tuplelist, prefix='')
Compare two sequences' tuplelists in all frames and return a dict of matches.
- genealloy.find_partial_overlaps(host, parasite, swaptable, verbose=True)
- genealloy.make_genealloy(host, parasite, swaptable, verbose=True)
Compare two sequence strings and return dictionary of matches.
- genealloy.get_complement_tuplelist(codon_tuplelist)
Get complement triplets of a sequence tuplelist.
- genealloy.get_reverse_tuplelist(codon_tuplelist)
Get reverse of a tuplelist with reversed triplets.
- genealloy.get_reverse_complement_tuplelist(codon_tuplelist)
Get reverse complement of a sequence’s tuplelist.
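The three tuplelist helpers above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the function names are hypothetical, the complement table is restricted to the four plain nucleotides, and the reverse complement is assumed to be the reverse of the complement.

```python
# Plain-nucleotide excerpt of a complement table (extended letters omitted).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_tuplelist(tuplelist):
    """Complement every letter of every triplet, keeping the order."""
    return [
        tuple("".join(COMPLEMENT[nt] for nt in triplet) for triplet in codon)
        for codon in tuplelist
    ]


def reverse_tuplelist(tuplelist):
    """Reverse the codon order and the letters inside each triplet."""
    return [
        tuple(triplet[::-1] for triplet in codon)
        for codon in reversed(tuplelist)
    ]


def reverse_complement_tuplelist(tuplelist):
    """Reverse complement, assumed here to be reverse(complement(x))."""
    return reverse_tuplelist(complement_tuplelist(tuplelist))


example = [("ATG",), ("GCA", "GCC")]
print(complement_tuplelist(example))          # [('TAC',), ('CGT', 'CGG')]
print(reverse_tuplelist(example))             # [('ACG', 'CCG'), ('GTA',)]
print(reverse_complement_tuplelist(example))  # [('TGC', 'GGC'), ('CAT',)]
```

As a check, the sequence ATGGCA has the reverse complement TGCCAT, which corresponds to the first triplet of each tuple in the last line.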