Golden Hinges Reference Manual

Golden Hinges has one main class called OverhangsSelector. An instance of this class determines which overhangs are accepted and what “intercompatibility” of overhangs means. The methods of the class then solve different scenarios.

OverhangsSelector

class goldenhinges.OverhangsSelector(gc_min=0, gc_max=1, differences=1, overhangs_size=4, forbidden_overhangs=(), forbidden_pairs=(), possible_overhangs=None, time_limit=None, external_overhangs=(), progress_logger='bar')

A selector of compatible overhangs for Golden Gate assembly and other methods.

Parameters
gc_min

Minimal amount of GC allowed in the overhangs, e.g. 0.25 for 25% of GC or more in the overhang.

gc_max

Maximal amount of GC allowed in the overhangs, e.g. 0.75 for 75% of GC or less in the overhang.

differences

Minimal number of nucleotides by which any two selected overhangs, and their reverse complements, should differ.

overhangs_size

Number of nucleotides in each overhang, e.g. 4 for Golden Gate assembly.

forbidden_overhangs

List of all forbidden overhangs.

possible_overhangs

List of a few overhangs the collection should be chosen from.

time_limit

Time in seconds after which the solvers should stop if no solution was found yet.

external_overhangs

List of overhangs that all selected overhangs should be compatible with.
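As a rough illustration of how these parameters interact, the sketch below filters overhangs by GC content and tests pairwise compatibility. It is not the library's code: the helper names (is_accepted, are_compatible) are hypothetical, and it assumes that two overhangs are compatible when each differs by at least the required number of nucleotides from the other and from the other's reverse complement.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def gc_content(overhang):
    """Proportion of G and C in an ATGC string (between 0 and 1)."""
    return sum(base in "GC" for base in overhang) / len(overhang)


def reverse_complement(overhang):
    """Reverse complement of an ATGC string."""
    return "".join(COMPLEMENT[base] for base in reversed(overhang))


def count_differences(seq1, seq2):
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(seq1, seq2))


def is_accepted(overhang, gc_min=0, gc_max=1, forbidden_overhangs=()):
    """Hypothetical acceptance test: GC bounds plus a forbidden list."""
    if overhang in forbidden_overhangs:
        return False
    return gc_min <= gc_content(overhang) <= gc_max


def are_compatible(o1, o2, differences=1):
    """Hypothetical compatibility test (assumed semantics, see lead-in)."""
    return (
        count_differences(o1, o2) >= differences
        and count_differences(o1, reverse_complement(o2)) >= differences
    )


print(is_accepted("GGCC", gc_min=0.25, gc_max=0.75))  # False (100% GC)
print(are_compatible("ATTC", "GAAT"))  # False (GAAT is the reverse complement of ATTC)
```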

Methods

cut_sequence(sequence, intervals=None, solutions=1, allow_edits=False, include_extremities=True, optimize_score=True, edit_penalty=10, equal_segments=None, max_radius=10, target_indices=None)

Select compatible-overhangs cut locations, one in each interval.

Parameters
sequence

An ATGC string or a Biopython record

intervals

List of the form [(start1, end1), …] indicating intervals in which to cut the sequence. Note that equal_segments or target_indices can be provided instead.

solutions

If equal to 1, one solution is returned (i.e. a list of cuts). If larger than 1, a list of solutions is returned. If equal to “iter”, an iterator over solutions is returned.

allow_edits

Leave as False to forbid any change to the sequence.

include_extremities

Whether the sequence’s extremities should be considered as overhangs and be compatible with overhangs generated by the cuts.

optimize_score

If False, the algorithm will return any solution that satisfies all constraints. If True, the algorithm will go through all possible solutions and find the best one, i.e. the one whose total overhang score is maximal, which generally means overhangs that are as close as possible to the center of the cut interval and ‘native’ to the sequence, i.e. that did not need a sequence edit.

equal_segments

Number indicating that the sequence should be cut in N segments with lengths as similar as possible.

target_indices

If provided, the sequence will be cut in regions around these target indices.

max_radius

Maximal radius around the target indices for the search of a solution

Returns
solution

A list of dictionaries, each representing one overhang with properties o['location'] (coordinate of the overhang in the sequence) and o['sequence'] (sequence of the overhang).
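To make the search space concrete, here is a minimal sketch that lists the candidate overhangs available in each interval, using the same 'location' and 'sequence' keys as the returned solution. The function name candidate_overhangs is hypothetical, and the sketch assumes that each interval (start, end) is half-open and bounds the position of the overhang's first nucleotide; the library may use a different convention.

```python
def candidate_overhangs(sequence, intervals, overhangs_size=4):
    """List, for each interval, every overhang that starts inside it.

    Assumption: (start, end) is half-open and refers to the index of the
    first nucleotide of the overhang.
    """
    candidates = []
    for start, end in intervals:
        candidates.append([
            {"location": i, "sequence": sequence[i:i + overhangs_size]}
            for i in range(start, end)
            if i + overhangs_size <= len(sequence)  # skip truncated overhangs
        ])
    return candidates


for interval_candidates in candidate_overhangs("ATGCATGCAT", [(0, 2), (5, 8)]):
    print(interval_candidates)
```

A solver then has to pick one candidate per interval so that all picked overhangs are mutually compatible.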

generate_overhangs_set(n_overhangs=None, mandatory_overhangs=(), start_at=2, step=2, n_cliques=None)

Generate a set of compatible overhangs, e.g. {"ATTC", "ATCG", ...}.

Parameters
n_overhangs

Size of the desired overhang set. If left to None, the algorithm will return the largest set it can find.

mandatory_overhangs

Overhangs which must be in the final set.

step

Increment to use for the set size when looking for the largest possible set (case n_overhangs=None). Note that this should not change the final result, but a well-chosen step can speed up the computation several-fold.

start_at

Number of overhangs to start from (before increasing) when auto-selecting the number of overhangs.

n_cliques

If provided, the algorithm will look for maximal sets of compatible overhangs using a graph-clique-based method.
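The roles of start_at and step in the n_overhangs=None case can be pictured with the sketch below. It is an assumed search strategy, not the library's implementation: find_largest_set and the find_set_of_size callback (which returns a set of exactly the requested size, or None on failure) are hypothetical names. The sketch grows the requested size by step until the solver fails, then tries the skipped sizes, which is why the step affects speed but not the final result.

```python
def find_largest_set(find_set_of_size, start_at=2, step=2):
    """Find the largest feasible set using a fixed-size solver callback.

    Assumption: if a set of size n exists, sets of every smaller size
    exist too (true for compatible sets, since any subset of a compatible
    set is itself compatible).
    """
    best, last_ok, size = None, 0, start_at
    # Coarse phase: increase the requested size by `step` until failure.
    while True:
        solution = find_set_of_size(size)
        if solution is None:
            break
        best, last_ok = solution, size
        size += step
    # Fine phase: the optimum lies strictly between last_ok and size, so
    # try the skipped sizes from the largest down.
    for skipped_size in range(size - 1, last_ok, -1):
        solution = find_set_of_size(skipped_size)
        if solution is not None:
            return solution
    return best


def toy_solver(n):
    """Toy solver for illustration: sets of up to 7 elements are feasible."""
    return list(range(n)) if n <= 7 else None


print(len(find_largest_set(toy_solver, start_at=2, step=2)))  # 7
```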

select_from_sets(sets_list, solutions=1, optimize_score=True)

Find compatible overhangs, picking one from each provided set.

This is the central solver for methods cut_sequence, cut_sequence_into_similar_lengths,

Parameters
sets_list

A list of either sets or lists of overhangs.

solutions

Either 1 for a unique solution, a number k for a list of solutions, or “iter” which returns an iterator over all solutions.

optimize_score

If True, the total score of all overhang choices will be maximized.
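The underlying problem, choosing one overhang per set so that all choices are mutually compatible, can be sketched with a simple backtracking search. This only illustrates the problem and is not the solver the library uses; pick_one_per_set and the compatibility rule (at least one difference with the other overhang and with its reverse complement) are assumptions, and scores are ignored.

```python
def reverse_complement(sequence):
    """Reverse complement of an ATGC string."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(sequence))


def are_compatible(o1, o2, differences=1):
    """Assumed rule: o1 differs enough from o2 and from its reverse complement."""
    def count_differences(s1, s2):
        return sum(a != b for a, b in zip(s1, s2))
    return (
        count_differences(o1, o2) >= differences
        and count_differences(o1, reverse_complement(o2)) >= differences
    )


def pick_one_per_set(sets_list, chosen=()):
    """Return one compatible overhang per set (as a list), or None."""
    if not sets_list:
        return list(chosen)
    first, rest = sets_list[0], sets_list[1:]
    for candidate in first:
        if all(are_compatible(candidate, other) for other in chosen):
            solution = pick_one_per_set(rest, chosen + (candidate,))
            if solution is not None:
                return solution
    return None  # dead end: backtrack


print(pick_one_per_set([["ATTC"], ["GAAT", "ACGA"]]))  # ['ATTC', 'ACGA']
```

Here "GAAT" is rejected because it is the reverse complement of "ATTC", so the search falls back to "ACGA".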

Clique methods

Methods based on graph cliques make it possible to quickly find large, albeit not necessarily optimal, sets of compatible overhangs.

goldenhinges.clique_methods.find_compatible_overhangs(overhangs_size=4, mandatory_overhangs=(), forbidden_overhangs=(), min_gc_content=0, max_gc_content=1, min_overhangs_differences=2, min_reverse_overhangs_differences=2, elements_filters=(), compatibility_conditions=(), solution_validity_conditions=(), n_solutions_considered=5000, score='subset_size', progress_bar=False, randomize=False)

Return a list of compatible overhangs for (Golden Gate) assembly, satisfying all the specified conditions.

Parameters
overhangs_size

The size of the overhangs. Four is the default and the most common case.

mandatory_overhangs

A list [“ATGC”, “TTGC”…] of the overhangs that must be part of the final solution. If these do not respect the other conditions or if they are not compatible between themselves, an error will be raised.

forbidden_overhangs

A list [“ATGC”, “TTGC”…] of overhangs that should NOT be part of the final solution.

min_gc_content

Float between 0.0 and 1.0 indicating the minimum proportion of G and C that valid overhangs should contain.

max_gc_content

Float between 0.0 and 1.0 indicating the maximum proportion of G and C that valid overhangs should contain.

min_overhangs_differences

Minimal number of different basepairs between two overhangs for them to be compatible (1 is an acceptable value but 2 is advised to really ensure the specificity of the assembly).

min_reverse_overhangs_differences

Minimal number of different basepairs between an overhang and the reverse-complement of a second overhang for these two overhangs to be compatible (1 is an acceptable value but 2 is advised to really ensure the specificity of the assembly).

elements_filters

Additional filters to narrow down the possible overhangs. Must be a list or tuple of functions fun(element)->True/False. Only overhangs such that fun(element) is True for all filters are kept.

compatibility_conditions

Additional conditions to determine whether two overhangs are compatible, i.e. whether a solution can feature these two elements at the same time. Must be a list or tuple of functions fun(e1, e2)->True/False. Two overhangs are compatible when fun(e1, e2) is True for all functions in compatibility_conditions.

solution_validity_conditions

Additional validity conditions used to filter out some solutions. Must be a list or tuple of functions fun(solution)->True/False, where the solution is a list of elements of all_elements. A solution is considered valid when fun(solution) is True for all functions in solution_validity_conditions.

score

Of all the solutions explored, the one which scores highest is returned. Can be either a function fun(solution)->float where solution is a list of overhangs, or it can be the default “subset_size” which means the score will be the length of the solution found (the solution returned will have as many different overhangs as could be found).

n_solutions_considered

Number of solutions considered (set to None if you want to consider all possible solutions, which may take a very long time).

progress_bar

If True, progress bars are displayed as the edges are computed and the graph cliques are explored (see find_large_compatible_subset).
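The three kinds of user-supplied functions follow the signatures described above. The examples below are illustrative only: the function names and the rules they encode are hypothetical, chosen to show the expected shapes fun(element), fun(e1, e2) and fun(solution).

```python
def no_triple_repeat(overhang):
    """Element filter: reject overhangs containing AAA, TTT, GGG or CCC."""
    return not any(base * 3 in overhang for base in "ATGC")


def starts_with_a_or_t(overhang):
    """Element filter: keep only overhangs starting with A or T."""
    return overhang[0] in "AT"


def different_first_base(o1, o2):
    """Compatibility condition: the two overhangs start with different bases."""
    return o1[0] != o2[0]


def contains_atgc(solution):
    """Solution validity condition: the solution includes the overhang ATGC."""
    return "ATGC" in solution


# An overhang is kept only if every filter returns True for it.
elements_filters = (no_triple_repeat, starts_with_a_or_t)
candidates = ["AAAT", "ATGC", "GCTA"]
kept = [o for o in candidates if all(fl(o) for fl in elements_filters)]
print(kept)  # ['ATGC']
```

Such functions would be passed as tuples, e.g. elements_filters=(no_triple_repeat,), compatibility_conditions=(different_first_base,) and solution_validity_conditions=(contains_atgc,).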

goldenhinges.clique_methods.find_large_compatible_subset(all_elements, mandatory_elements=(), compatibility_conditions=(), elements_filters=(), solution_validity_conditions=(), score='subset_size', n_solutions_considered=5000, progress_bar=False, randomize=False)

Return a maximal subset of all_elements where all elements are valid and inter-compatible.

The algorithm takes in a set of elements all_elements and filters out some elements using the filters in elements_filters. Then it creates a graph whose nodes are the remaining elements. Edges are added between all pairs of elements which are “compatible” as defined by the compatibility_conditions. Finally we look for the “cliques”, i.e. subsets of the graph made of elements that are all inter-compatible. We consider a number n_solutions_considered of these, and return the one which scores highest as defined by the score function.

Parameters
all_elements

A tuple or list of all possible elements.

mandatory_elements

A tuple or list of elements contained in all_elements that must be included in the final solution.

elements_filters

Functions used to pre-filter the list of all_elements. Must be a list or tuple of functions fun(element)->True/False. Only elements of all_elements such that fun(element) is True for all filters are considered.

compatibility_conditions

Functions used to determine whether two elements are compatible, i.e. whether a solution can feature these two elements at the same time. Must be a list or tuple of functions fun(e1, e2)->True/False. Two elements of all_elements are compatible when fun(e1, e2) is True for all functions in compatibility_conditions.

solution_validity_conditions

Additional validity conditions used to filter out some solutions. Must be a list or tuple of functions fun(solution)->True/False, where the solution is a list of elements of all_elements. A solution is considered valid when fun(solution) is True for all functions in solution_validity_conditions.

score

Of all the solutions explored, the one which scores highest is returned. Can be either a function fun(solution)->float where solution is a list of elements, or it can be the default “subset_size” which means the score will be the length of the solution found (the solution returned will have as many elements as could be found).

n_solutions_considered

Number of cliques of the graph that are iterated through (set to None if you want to consider all cliques in the graph, which may take a very long time).

progress_bar

If True, progress bars are displayed as the edges are computed and the graph cliques are explored.
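The filter, graph, clique and score steps described for this function can be sketched in pure Python as follows. This is a simplified stand-in and not the library's code: find_large_subset and maximal_cliques are hypothetical names, cliques are enumerated with a basic Bron-Kerbosch recursion (the library may rely on a graph package instead), and mandatory elements, validity conditions and randomization are left out.

```python
from itertools import combinations, islice


def maximal_cliques(adjacency):
    """Yield the maximal cliques of a graph (Bron-Kerbosch, no pivoting)."""
    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            yield sorted(clique)
            return
        for node in sorted(candidates):
            neighbors = adjacency[node]
            yield from expand(
                clique | {node}, candidates & neighbors, excluded & neighbors
            )
            candidates = candidates - {node}
            excluded = excluded | {node}

    yield from expand(set(), set(adjacency), set())


def find_large_subset(all_elements, compatibility_conditions=(),
                      elements_filters=(), n_solutions_considered=5000,
                      score=len):
    """Return the best-scoring clique among the first ones enumerated."""
    # 1. Keep only the elements accepted by every filter.
    elements = [
        e for e in all_elements if all(fl(e) for fl in elements_filters)
    ]
    # 2. Build the compatibility graph as an adjacency dictionary.
    adjacency = {e: set() for e in elements}
    for e1, e2 in combinations(elements, 2):
        if all(cond(e1, e2) for cond in compatibility_conditions):
            adjacency[e1].add(e2)
            adjacency[e2].add(e1)
    # 3. Score a limited number of cliques (None means all of them).
    cliques = islice(maximal_cliques(adjacency), n_solutions_considered)
    return max(cliques, key=score, default=[])


# Toy example with integers: keep even numbers, compatible if 4 or more apart.
result = find_large_subset(
    range(1, 11),
    elements_filters=(lambda e: e % 2 == 0,),
    compatibility_conditions=(lambda a, b: abs(a - b) >= 4,),
)
print(result)  # [2, 6, 10]
```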

Biotools

goldenhinges.biotools.annotate_record(seqrecord, location='full', feature_type='misc_feature', margin=0, **qualifiers)

Add a feature to a Biopython SeqRecord.

Parameters
seqrecord

The Biopython SeqRecord to be annotated.

location

Either (start, end) or (start, end, strand). (strand defaults to +1)

feature_type

The type associated with the feature.

margin

Number of extra bases added on each side of the given location.

qualifiers

Dictionary that will be the Biopython feature’s qualifiers attribute.

goldenhinges.biotools.crop_record(record, crop_start, crop_end, features_suffix=' (part)')

Return the cropped record with possibly cropped features.

Note that this differs from record[start:end] in that in the latter expression, cropped features are discarded.

Parameters
record

A Biopython record.

crop_start, crop_end

Start and end of the segment to be cropped.

features_suffix

All cropped features will have their label appended with this suffix.

goldenhinges.biotools.gc_content(sequence)

Return the proportion of G and C in the sequence (between 0 and 1).

The sequence must be an ATGC string.

goldenhinges.biotools.list_overhangs(overhang_size=4, filters=())

Return the list of all possible ATGC overhangs of the given size, such that fl(overhang) is true for every function fl in filters.
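A minimal sketch of this enumeration, written with the standard library only, is shown below. The function name all_overhangs is hypothetical and the ordering of the output is an assumption; the library's own function may order results differently.

```python
from itertools import product


def all_overhangs(overhang_size=4, filters=()):
    """List every ATGC string of the given size that passes all filters."""
    overhangs = (
        "".join(letters) for letters in product("ATGC", repeat=overhang_size)
    )
    return [o for o in overhangs if all(fl(o) for fl in filters)]


def has_two_gc(overhang):
    """Example filter: exactly two of the four bases are G or C."""
    return sum(base in "GC" for base in overhang) == 2


print(len(all_overhangs()))  # 256 (4 ** 4)
print(len(all_overhangs(filters=(has_two_gc,))))  # 96
```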

goldenhinges.biotools.load_record(filename, linear=True, name='unnamed', fmt='auto')

Load a FASTA/Genbank/… record

goldenhinges.biotools.reverse_complement(sequence)

Return the reverse-complement of the DNA sequence. For instance reverse_complement("ATGC") returns "GCAT".

The sequence must be an ATGC string.

goldenhinges.biotools.sequences_differences(seq1, seq2)

Return the number of nucleotides that differ in the two sequences.

seq1, seq2 should be strings of DNA sequences, e.g. “ATGCTGTGC”.

goldenhinges.biotools.sequences_differences_array(seq1, seq2)

Return an array [0, 0, 1, 0, …] with 1s for sequence differences.

seq1, seq2 should both be ATGC strings.

goldenhinges.biotools.sequences_differences_segments(seq1, seq2)

Return the list of segments on which sequence seq1 differs from seq2.

The list is of the form [(start1, end1), (start2, end2), etc.]

Parameters
seq1, seq2

ATGC sequences to be compared.
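The relationship between the three sequence-difference functions can be sketched as follows. The helper names are hypothetical, a plain list stands in for the array, and the segments are assumed to be half-open (start, end) pairs; the library may use a different end convention.

```python
def differences_array(seq1, seq2):
    """List with 1 where the sequences differ and 0 where they agree."""
    return [int(a != b) for a, b in zip(seq1, seq2)]


def differences_count(seq1, seq2):
    """Number of positions at which the two sequences differ."""
    return sum(differences_array(seq1, seq2))


def differences_segments(seq1, seq2):
    """Half-open (start, end) segments over which the sequences differ."""
    array = differences_array(seq1, seq2)
    segments, start = [], None
    for i, differs in enumerate(array):
        if differs and start is None:
            start = i  # a run of differences begins
        elif not differs and start is not None:
            segments.append((start, i))  # the run ended just before i
            start = None
    if start is not None:  # a run reaches the end of the sequences
        segments.append((start, len(array)))
    return segments


print(differences_array("ATGCATGC", "ATTTATGA"))     # [0, 0, 1, 1, 0, 0, 0, 1]
print(differences_count("ATGCATGC", "ATTTATGA"))     # 3
print(differences_segments("ATGCATGC", "ATTTATGA"))  # [(2, 4), (7, 8)]
```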