PrimerSelector

graph TD
    ap[available_primers]
    cafl[compute_all_forbidden_locations]
    cavp[compute_all_valid_primers]
    ccp[compute_coverage_points]
    cfpl[compute_forbidden_patterns_locations]
    cnul[compute_non_unique_locations]
    csp[compute_sequence_primers]
    seq[sequence/record]
    sp[select_primers]
    trlr[tm_range, length_range]
    seq --> cnul
    cnul --> cafl
    seq --> cfpl
    cfpl --> cafl
    seq --> csp
    trlr --> csp
    csp --> cavp
    cafl --> cavp
    ap --> cavp
    cavp --> sp
    ccp --> sp
    seq --> ccp
    style ap fill:#fff;
    style trlr fill:#fff;
    style seq fill:#fff;
class primavera.PrimerSelector(read_range=(150, 800), size_range=(16, 25), tm_range=(55, 70), primer_conditions=(), primer_reuse_bonus=2, logger='bars', coverage_resolution=5)

A selector to compute the best primers to sequence a set of constructs.

Parameters:

read_range

The experimentally measured range (start, end) such that, when a primer anneals to the sequence at index i, the range [i + start, i + end] is correctly sequenced.

size_range

Size range (min, max) for the size of the primers, in nucleotides.

tm_range

Acceptable melting temperature range (min, max) for the primers, in Celsius, as computed with the heuristic Tm = 2 °C per A or T + 4 °C per G or C.
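The heuristic above is known as the Wallace rule. A minimal sketch of it, together with a tm_range check, assuming the range bounds are inclusive (the function names are hypothetical and not part of Primavera's API):

```python
def wallace_tm(primer: str) -> int:
    """Melting temperature in Celsius: 2 C per A/T plus 4 C per G/C."""
    seq = primer.upper()
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    return 2 * at_count + 4 * gc_count


def tm_in_range(primer: str, tm_range=(55, 70)) -> bool:
    """True if the primer's Wallace Tm lies within tm_range (inclusive, assumed)."""
    tm_min, tm_max = tm_range
    return tm_min <= wallace_tm(primer) <= tm_max


# A 20-mer with 10 A/T and 10 G/C: 2*10 + 4*10 = 60 C.
print(wallace_tm("ATGCATGCATGCATGCATGC"))  # 60
print(tm_in_range("ATGCATGCATGCATGCATGC"))  # True
```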

primer_conditions

A list of functions that take a primer sequence and return True or False. Primers for which at least one condition returns False are not considered.
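For illustration, two condition factories one might pass as primer_conditions: a GC-content window and a homopolymer-length limit. These helpers and their thresholds are hypothetical examples, not conditions shipped with Primavera; filter_primers mirrors the rule stated above (a primer is rejected as soon as one condition returns False).

```python
import re
from typing import Callable, Iterable, List, Sequence

Condition = Callable[[str], bool]


def gc_content_between(low: float, high: float) -> Condition:
    """Condition accepting primers whose GC fraction lies in [low, high]."""
    def condition(primer: str) -> bool:
        seq = primer.upper()
        gc_fraction = (seq.count("G") + seq.count("C")) / len(seq)
        return low <= gc_fraction <= high
    return condition


def no_homopolymer_longer_than(n: int) -> Condition:
    """Condition rejecting primers containing a run of more than n identical bases."""
    run = re.compile(r"(.)\1{%d,}" % n)

    def condition(primer: str) -> bool:
        return run.search(primer.upper()) is None
    return condition


def filter_primers(candidates: Iterable[str], conditions: Sequence[Condition]) -> List[str]:
    """Keep only the primers for which every condition returns True."""
    return [p for p in candidates if all(cond(p) for cond in conditions)]


conditions = [gc_content_between(0.4, 0.6), no_homopolymer_longer_than(4)]
print(filter_primers(["ATGCATGCAT", "AAAAAGGGCC", "ATATATATAT"], conditions))
# ['ATGCATGCAT']
```

Such a list would then be given to the constructor, e.g. PrimerSelector(primer_conditions=conditions).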

primer_reuse_bonus

Weight that the availability of a primer should have in the decision to select this primer. A higher value of this parameter leads to solutions where fewer new primers have to be ordered, but more sequencing reactions have to be done. Set it to e.g. 200 to test whether there exist solutions involving only already-available primers.

logger

Leave as ‘bars’ for the default progress-bar logger, set to None for no logger, or provide any Proglog ProgressBarLogger object.

coverage_resolution

When the user provides a record with “cover” features to indicate where to cover, the coverage points used by the algorithm are one nucleotide out of every N along the feature region, where N is this parameter.

Examples

>>> selector = PrimerSelector()
>>> selected_primers = selector.select_primers(records, available_primers)
>>> selector.plot_coverage(records, selected_primers, 'my_report.pdf')

compute_all_forbidden_locations(record)

Return an array indicating which positions should be avoided.

We take into account forbidden patterns, user-forbidden locations, and non-unique locations. arr[i] == 1 indicates that position i should be avoided in the record.
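Assuming the three per-position masks (forbidden patterns, user-forbidden locations, non-unique locations) are 0/1 arrays of the record's length, combining them amounts to an element-wise OR. A pure-Python sketch with a hypothetical helper name; the actual method may well use NumPy arrays instead:

```python
from typing import List, Sequence


def combine_forbidden(*arrays: Sequence[int]) -> List[int]:
    """Element-wise OR of equal-length 0/1 arrays: 1 if any array forbids i."""
    if not arrays:
        raise ValueError("at least one array is required")
    length = len(arrays[0])
    if any(len(a) != length for a in arrays):
        raise ValueError("all arrays must have the same length")
    return [1 if any(a[i] for a in arrays) else 0 for i in range(length)]


patterns = [0, 1, 0, 0]
user = [0, 0, 0, 1]
non_unique = [1, 0, 0, 0]
print(combine_forbidden(patterns, user, non_unique))  # [1, 1, 0, 1]
```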

compute_all_primers_coverage_on_record(record, indices_to_cover, available_primers)

Return, for each primer, the list of indices covered in this record.

Parameters:

record

The record in which to search

indices_to_cover

List of indices to cover (defined by the user) in this record.

available_primers

List of candidate primers.

Returns:

primers_coverage

{primer_name: list_of_indices_covered_in_this_record}.
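A sketch of how the read_range rule translates into per-primer coverage. It assumes each primer location is a (start, end, strand) tuple, that a forward read is measured from the primer start, and that a reverse-strand read is the mirror image measured leftwards from the primer end. The strand handling and the helper names are assumptions, not Primavera's documented behavior.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Location = Tuple[int, int, int]  # (start, end, strand), with strand 1 or -1


def covered_indices(location: Location, indices_to_cover: Sequence[int],
                    read_range: Tuple[int, int] = (150, 800)) -> List[int]:
    """Indices to cover that fall inside the read produced by one primer."""
    start, end, strand = location
    read_start, read_end = read_range
    if strand == 1:
        # Forward primer: the read lies to the right of the annealing site.
        low, high = start + read_start, start + read_end
    else:
        # Reverse primer (assumption): mirror image, measured leftwards from the end.
        low, high = end - read_end, end - read_start
    return [i for i in indices_to_cover if low <= i <= high]


def primers_coverage(locations: Dict[str, Optional[Location]],
                     indices_to_cover: Sequence[int],
                     read_range: Tuple[int, int] = (150, 800)) -> Dict[str, List[int]]:
    """{primer_name: covered indices}; primers absent from the record cover nothing."""
    return {
        name: [] if loc is None else covered_indices(loc, indices_to_cover, read_range)
        for name, loc in locations.items()
    }


print(primers_coverage({"P1": (0, 20, 1), "P2": (880, 900, -1), "P3": None},
                       [0, 100, 200, 500, 900]))
# {'P1': [200, 500], 'P2': [100, 200, 500], 'P3': []}
```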

compute_forbidden_patterns_locations(record)

Return an array where arr[i] == 1 means that i is surrounded by a user-forbidden pattern.
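A minimal pure-Python sketch of such a mask. The list of patterns is a hypothetical input here (this page does not say where the patterns come from), and also searching the reverse complement of each pattern is an assumption. Overlapping occurrences are found with a regex lookahead.

```python
import re
from typing import List, Sequence

_COMPLEMENT = str.maketrans("ATGC", "TACG")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an ATGC string."""
    return seq.upper()[::-1].translate(_COMPLEMENT)


def forbidden_patterns_locations(sequence: str, patterns: Sequence[str]) -> List[int]:
    """Return arr with arr[i] == 1 when i lies inside an occurrence of a pattern."""
    seq = sequence.upper()
    arr = [0] * len(seq)
    for pattern in patterns:
        # Search both the pattern and its reverse complement (assumption).
        for motif in {pattern.upper(), reverse_complement(pattern)}:
            # The lookahead makes finditer report overlapping occurrences too.
            for match in re.finditer("(?=%s)" % re.escape(motif), seq):
                for i in range(match.start(), match.start() + len(motif)):
                    arr[i] = 1
    return arr


# BsaI site GTCTC, found here through its reverse complement GAGAC.
print(forbidden_patterns_locations("TTTGAGACTTT", ["GTCTC"]))
# [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0]
```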

compute_indices_to_cover(record)

List all indices in the record which should be covered.

These indices are equidistant points inside the user-defined zones to cover in the record.

The user determines the zones to cover via features of type misc_feature and label ‘cover’.
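A sketch of how equidistant coverage points could be derived from the ‘cover’ zones with coverage_resolution as the step. It assumes the zones have already been extracted as end-exclusive (start, end) pairs, for instance from int(feature.location.start) and int(feature.location.end) of each Biopython feature; whether Primavera starts the grid at the zone start exactly is not stated on this page, and the function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def indices_to_cover(cover_zones: Iterable[Tuple[int, int]],
                     coverage_resolution: int = 5) -> List[int]:
    """One index out of every coverage_resolution inside each (start, end) zone."""
    indices = set()
    for start, end in cover_zones:
        indices.update(range(start, end, coverage_resolution))
    # Overlapping zones would produce duplicates, hence the set.
    return sorted(indices)


print(indices_to_cover([(0, 20), (10, 22)], coverage_resolution=5))
# [0, 5, 10, 15, 20]
```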

compute_nonunique_segments_locations(record)

Return an array where arr[i] == 1 means that i is surrounded by a non-unique location.
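One way to build such a mask, sketched under assumptions: a position is flagged when it lies in a k-mer that occurs more than once in the sequence, counting occurrences of its reverse complement as well. The segment length k is a hypothetical parameter; this page does not document the length Primavera uses.

```python
from collections import Counter
from typing import List

_COMPLEMENT = str.maketrans("ATGC", "TACG")


def reverse_complement(seq: str) -> str:
    return seq[::-1].translate(_COMPLEMENT)


def nonunique_segments_locations(sequence: str, k: int = 15) -> List[int]:
    """arr[i] == 1 when i lies in a k-mer also found elsewhere (either strand)."""
    seq = sequence.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    arr = [0] * len(seq)
    for i, kmer in enumerate(kmers):
        rc = reverse_complement(kmer)
        # A palindromic k-mer is its own reverse complement: count it once.
        occurrences = counts[kmer] + (counts[rc] if rc != kmer else 0)
        if occurrences > 1:
            for j in range(i, i + k):
                arr[j] = 1
    return arr


# "GATTACA" appears twice, so both copies are flagged.
print(nonunique_segments_locations("GATTACACCCCGATTACA", k=7))
```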

compute_sequence_primers(record)

Return primers for the sequence, one around each index.

The primers are chosen to fit the length and melting temperature specified by the class parameters.

Parameters:

record

The record in which to list the primers

Returns:

primers

List of primer sequences.
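A sketch of the idea, assuming each primer starts at its index and is grown from the minimum to the maximum size until its Wallace Tm falls within tm_range (bounds inclusive). The anchoring, the growth order and the returned (index, primer) pairs are assumptions made for illustration, and forbidden locations are ignored here.

```python
from typing import List, Optional, Tuple


def wallace_tm(primer: str) -> int:
    seq = primer.upper()
    return 2 * (seq.count("A") + seq.count("T")) + 4 * (seq.count("G") + seq.count("C"))


def primer_at(sequence: str, index: int, size_range=(16, 25),
              tm_range=(55, 70)) -> Optional[str]:
    """Shortest primer starting at index whose Tm is in tm_range, or None."""
    min_size, max_size = size_range
    for size in range(min_size, max_size + 1):
        primer = sequence[index:index + size]
        if len(primer) < size:
            return None  # not enough sequence left after index
        if tm_range[0] <= wallace_tm(primer) <= tm_range[1]:
            return primer
    return None


def sequence_primers(sequence: str, size_range=(16, 25),
                     tm_range=(55, 70)) -> List[Tuple[int, str]]:
    """List (index, primer) pairs, at most one primer per index."""
    pairs = []
    for index in range(len(sequence)):
        primer = primer_at(sequence, index, size_range, tm_range)
        if primer is not None:
            pairs.append((index, primer))
    return pairs


# 16, 17 and 18 nt give Tm 48, 50 and 52; the 19-mer reaches 56.
print(primer_at("ATGC" * 10, 0))  # ATGCATGCATGCATGCATG
```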

compute_user_forbidden_locations(record)

Return an array where arr[i] == 1 means that i is surrounded by a user-forbidden location.

find_part_name_in_record(record, index)

Find the part of the provided record in which the given index falls.

This is used to provide primer information (“where does this come from?”).

Parameters:

record

A single Biopython record in which to search. The parts should be marked by features with a qualifier {'part': part_name}

index

A position in the record.

Returns:

part_name, index

The name of the part, and the position in the part, corresponding to the given index. If the index falls in several parts, only the first match is returned.
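A sketch of the lookup, with the record's {'part': part_name} features stood in for by a hypothetical dict mapping part names to end-exclusive (start, end) locations, in feature order. Returning None when no part contains the index is an assumption.

```python
from typing import Dict, Optional, Tuple


def find_part_name(parts: Dict[str, Tuple[int, int]],
                   index: int) -> Optional[Tuple[str, int]]:
    """Return (part_name, position_in_part) for the first part containing index."""
    for part_name, (start, end) in parts.items():
        if start <= index < end:
            return part_name, index - start
    return None  # index is not inside any part (assumed behavior)


parts = {"promoter": (0, 50), "cds": (50, 500), "tag": (40, 60)}
print(find_part_name(parts, 45))  # ('promoter', 45): first match wins over 'tag'
print(find_part_name(parts, 55))  # ('cds', 5)
```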

find_subsequence_in_records(sequence, records)

Find a part where the sequence appears in the provided records.

This is used to provide primer information (“where does this come from?”). This will look for either the sequence or its reverse complement.

Parameters:

sequence

An ATGC string representing a sequence (of a primer)

records

A list of Biopython records in which to search for the sequence. The parts should be marked by features with a qualifier {'part': part_name}

Returns:

part_name, index

The name of the part, and the position in the part, where the sequence was found. If the sequence is found in several parts, only the first match is returned.

generate_primer_name(prefix='P', available_primers_names=(), n_digits=6)

Return a suitable primer name, considering existing primer names.

The result will be of the form P000425, where ‘P’ is the prefix and ‘000425’ means that ‘P000424’ was the highest-numbered primer name starting with P in the list of available primer names.

Parameters:

prefix

The prefix for the primer name

available_primers_names

List of already-allocated primer names

n_digits

Number of digits in the numerical part of the name (6 in the example above).
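A sketch of the naming rule described above. Names that do not consist of the prefix followed only by digits are ignored, and starting at 1 when no name matches is an assumption; this is an illustration, not Primavera's implementation.

```python
import re
from typing import Iterable


def generate_primer_name(prefix: str = "P",
                         available_primers_names: Iterable[str] = (),
                         n_digits: int = 6) -> str:
    """Return prefix + (highest existing number for this prefix + 1), zero-padded."""
    pattern = re.compile(r"^%s(\d+)$" % re.escape(prefix))
    highest = 0  # assumption: numbering starts at 1 when no name matches
    for name in available_primers_names:
        match = pattern.match(name)
        if match:
            highest = max(highest, int(match.group(1)))
    return prefix + str(highest + 1).zfill(n_digits)


print(generate_primer_name("P", ["P000424", "P000001", "Q000999"]))  # P000425
```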

static locate_primer_sequence(primer, sequence)

Find the location (start, end, strand) of a primer in the sequence.

Return None if neither the primer sequence nor its reverse complement is found in the sequence.
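A standalone sketch of this lookup, assuming end-exclusive coordinates, strand values of 1 and -1, and that the forward orientation is searched first. These conventions are assumptions; the library's exact return values are not specified on this page.

```python
from typing import Optional, Tuple

_COMPLEMENT = str.maketrans("ATGC", "TACG")


def reverse_complement(seq: str) -> str:
    return seq.upper()[::-1].translate(_COMPLEMENT)


def locate_primer_sequence(primer: str, sequence: str) -> Optional[Tuple[int, int, int]]:
    """Return (start, end, strand) of the first match, or None if absent."""
    primer, sequence = primer.upper(), sequence.upper()
    start = sequence.find(primer)
    if start >= 0:
        return start, start + len(primer), 1
    # Not found as-is: the primer may anneal on the other strand.
    start = sequence.find(reverse_complement(primer))
    if start >= 0:
        return start, start + len(primer), -1
    return None


print(locate_primer_sequence("GTCTC", "TTTTGAGACTTTT"))  # (4, 9, -1)
```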

name_subsequence_in_records(sequence, records, prefix='P')

plot_coverage(records, selected_primers, pdf_path)

Plot the predicted sequencing coverage for each construct.

select_primers(records, available_primers=(), new_primers_prefix='P', new_primers_digits=6)

Select primers to sequence the given records.

Parameters:

records

A list of biopython records to sequence. The zones to cover in the record should be indicated by a feature of type misc_feature and label cover. The zones where no primers are desired should be indicated by a feature of type misc_feature and label no_primer.

available_primers

List of Primer objects representing the available primers.

new_primers_prefix

Prefix to use for names of the new primers

new_primers_digits

The new primers will have names of the form P000435, with a number of digits provided by this parameter.

Returns:

selected_primers

A list of lists of primers, one list of primers for each construct.
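This page does not describe the selection procedure itself. To illustrate how primer_reuse_bonus can trade new primers against reuse, here is a greedy set-cover heuristic for a single record, swapped in for illustration; Primavera's actual algorithm may differ. The rule that the bonus multiplies the number of newly covered points of an already-available primer is an assumption, and all names are hypothetical.

```python
from typing import Dict, List, Set


def _score(name: str, coverage: Dict[str, Set[int]], uncovered: Set[int],
           available: Set[str], primer_reuse_bonus: float) -> float:
    gain = len(coverage[name] & uncovered)
    # Assumption: the bonus multiplies the gain of already-available primers.
    return gain * (primer_reuse_bonus if name in available else 1)


def greedy_select(coverage: Dict[str, Set[int]], available: Set[str],
                  indices_to_cover: Set[int],
                  primer_reuse_bonus: float = 2) -> List[str]:
    """Repeatedly pick the best-scoring primer until all points are covered."""
    uncovered = set(indices_to_cover)
    selected: List[str] = []
    while uncovered:
        candidates = [name for name in coverage if coverage[name] & uncovered]
        if not candidates:
            break  # the remaining points cannot be covered by any primer
        best = max(candidates, key=lambda name: _score(
            name, coverage, uncovered, available, primer_reuse_bonus))
        selected.append(best)
        uncovered -= coverage[best]
    return selected


coverage = {"new1": {1, 2, 3}, "old1": {1, 2}, "new2": {3, 4}}
print(greedy_select(coverage, {"old1"}, {1, 2, 3, 4}, primer_reuse_bonus=2))
# ['old1', 'new2']
print(greedy_select(coverage, {"old1"}, {1, 2, 3, 4}, primer_reuse_bonus=1))
# ['new1', 'new2']
```

With the bonus, the available primer old1 wins the first round even though new1 covers more points.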

write_primers_table(selected_primers, csv_path=None)

Write a table of primers with columns ‘name’, ‘sequence’, etc.

The columns after ‘sequence’ are one column per primer metadata, such as ‘available’, ‘infos’, etc.

Parameters:

selected_primers

A list of primers, or a list of lists, as returned by select_primers

csv_path

The path to a csv file to write to. If None, no file is written, only the pandas dataframe of the table is returned.

Returns:

dataframe

The pandas dataframe of the table.

write_records_primers_table(selected_primers, records, sep='|', csv_path=None)

Write a table with columns ‘construct’ and ‘primers’, listing the primers selected for each construct.

Parameters:

selected_primers

The list of lists of primers, as returned by the select_primers method.

records

The list of records, as provided to the select_primers method.

sep

The separator between the different primer names in the primers column. Avoid ‘;’ or ‘,’ as these might be used by the CSV formatter.

csv_path

The path to a csv file to write to. If None, no file is written, only the pandas dataframe of the table is returned.

Returns:

dataframe

The pandas dataframe of the table.
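The method itself returns a pandas dataframe. The sketch below only illustrates the resulting layout, using the standard csv module, with plain construct names and primer names standing in for records and Primer objects (a hypothetical simplification). It also shows why sep should differ from the CSV delimiter: ‘|’ needs no quoting, whereas a ‘,’ would force the field to be quoted.

```python
import csv
import io
from typing import Sequence


def records_primers_csv(record_ids: Sequence[str],
                        selected_primer_names: Sequence[Sequence[str]],
                        sep: str = "|") -> str:
    """One row per construct, primer names joined with sep in one column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["construct", "primers"])
    for record_id, names in zip(record_ids, selected_primer_names):
        writer.writerow([record_id, sep.join(names)])
    return buffer.getvalue()


print(records_primers_csv(["c1", "c2"], [["P000001", "P000002"], ["P000002"]]))
# construct,primers
# c1,P000001|P000002
# c2,P000002
```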