PrimerSelector
-
class primavera.PrimerSelector(read_range=(150, 800), size_range=(16, 25), tm_range=(55, 70), primer_conditions=(), primer_reuse_bonus=2, logger='bars', coverage_resolution=5)
A selector to compute the best primers to sequence a set of constructs.
Parameters: read_range
The experimentally measured range (start, end) such that, when a primer anneals in the sequence at index i, the range [i + start, i + end] will be correctly sequenced.
size_range
Size range (min, max) of the primers, in nucleotides.
tm_range
Acceptable melting temperature range for the primers (in Celsius), as computed using the heuristic A/T=2C, G/C=4C.
primer_conditions
A list of functions of the form primer_sequence => True/False. Primers for which at least one condition returns False will not be considered.
primer_reuse_bonus
Weight that the availability of the primer should have in the decision to select this primer. A higher value of this parameter leads to solutions where fewer new primers have to be ordered, but more sequencing reactions have to be done. Set to e.g. 200 to test whether solutions exist that involve solely already-available primers.
logger
Leave as ‘bars’ for the default progress-bar logger, set to None for no logger, or pass any Proglog ProgressBarLogger object.
coverage_resolution
When the user provides a record with “cover” features to indicate where to cover, the coverage points used by the algorithm are the 1-in-N nucleotides along the feature region, where N is this parameter.
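The tm_range heuristic and the primer_conditions format can be illustrated with a minimal sketch. The helper names wallace_tm and no_homopolymer_run are hypothetical and not part of primavera; only the 2C/4C rule and the "sequence in, True/False out" shape come from the parameter descriptions.

```python
def wallace_tm(sequence):
    """Estimate melting temperature as 2 C per A/T plus 4 C per G/C.

    Hypothetical helper written for illustration only.
    """
    sequence = sequence.upper()
    at_count = sequence.count("A") + sequence.count("T")
    gc_count = sequence.count("G") + sequence.count("C")
    return 2 * at_count + 4 * gc_count


def no_homopolymer_run(sequence, max_run=4):
    """Example primer condition: False if any base repeats more than max_run times in a row."""
    run = 1
    for previous, current in zip(sequence, sequence[1:]):
        run = run + 1 if current == previous else 1
        if run > max_run:
            return False
    return True


primer = "ATGCATGCATGCATGCATGC"  # 20 nt: 10 A/T and 10 G/C
tm = wallace_tm(primer)          # 2 * 10 + 4 * 10 = 60
accepted = no_homopolymer_run(primer)
```

A condition of this shape would presumably be supplied as primer_conditions=[no_homopolymer_run], so that primers with long single-base runs are never considered.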
Examples
>>> selector = PrimerSelector()
>>> selected_primers = selector.select_primers(records, available_primers)
>>> selector.plot_coverage(records, selected_primers, 'my_report.pdf')
-
compute_all_forbidden_locations(record)
Return an array indicating which positions should be avoided.
We take into account forbidden patterns, user-forbidden locations, and non-unique locations. arr[i] == 1 indicates that position i should be avoided in the record.
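As a rough illustration of how three such 0/1 arrays could be merged, the sketch below takes an elementwise OR over plain Python lists. The function name combine_forbidden_masks and the sample masks are assumptions made for this example; the library's own implementation may differ.

```python
def combine_forbidden_masks(*masks):
    """Elementwise OR of equally long 0/1 masks (hypothetical helper)."""
    return [1 if any(values) else 0 for values in zip(*masks)]


# Made-up masks for a 6-nucleotide record.
pattern_mask = [0, 1, 1, 0, 0, 0]    # forbidden patterns
user_mask = [0, 0, 0, 0, 1, 0]       # user-forbidden locations
nonunique_mask = [1, 0, 0, 0, 0, 0]  # non-unique locations

combined = combine_forbidden_masks(pattern_mask, user_mask, nonunique_mask)
```

A position is avoided as soon as any one of the three sources marks it.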
-
compute_all_primers_coverage_on_record(record, indices_to_cover, available_primers)
Return, for each primer, the list of indices covered in this record.
Parameters: record
The record in which to search.
indices_to_cover
List of indices to cover (defined by the user) in this record.
available_primers
List of candidate primers.
Returns: primers_coverage
{primer_name: list_of_indices_covered_in_this_record}.
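The shape of the returned dictionary can be sketched from the read_range definition: a primer annealing at index i covers the indices that fall in [i + start, i + end]. The sketch below assumes forward-strand primers only and uses made-up primer names and annealing positions; primer_coverage is a hypothetical helper, not a primavera function.

```python
def primer_coverage(anneal_index, indices_to_cover, read_range=(150, 800)):
    """Return the indices to cover that fall inside the primer's read window.

    Assumes a forward-strand primer (an assumption of this sketch).
    """
    start, end = read_range
    low, high = anneal_index + start, anneal_index + end
    return [index for index in indices_to_cover if low <= index <= high]


# Hypothetical annealing positions for two primers on one record.
anneal_positions = {"P000001": 100, "P000002": 400}
indices_to_cover = [200, 250, 500, 900, 1000]

primers_coverage = {
    name: primer_coverage(position, indices_to_cover)
    for name, position in anneal_positions.items()
}
```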
-
compute_forbidden_patterns_locations(record)
Return an array where arr[i] == 1 means that i is surrounded by a user-forbidden pattern.
-
compute_indices_to_cover(record)
List all indices in the record which should be covered.
These indices are equidistant points inside the user-defined zones to cover in the record.
The user determines the zones to cover via features of type misc_feature and label ‘cover’.
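A minimal sketch of the "equidistant points" idea, assuming the zones are already available as (start, end) pairs with an exclusive end. The helper name equidistant_indices and the zone representation are assumptions for illustration; in the library the zones come from the record's features.

```python
def equidistant_indices(zones, coverage_resolution=5):
    """Pick every N-th index inside each (start, end) zone, N = coverage_resolution."""
    indices = set()
    for start, end in zones:
        # The end of the zone is treated as exclusive here (an assumption).
        indices.update(range(start, end, coverage_resolution))
    return sorted(indices)


zones = [(10, 30), (100, 112)]
points = equidistant_indices(zones)
```

A larger coverage_resolution yields fewer points to check, at the cost of a coarser description of each zone.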
-
compute_nonunique_segments_locations(record)
Return an array where arr[i] == 1 means that i is surrounded by a non-unique location.
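One plausible reading of "non-unique" is that a short subsequence occurs more than once in the record, so a primer placed there could anneal at several sites. The sketch below marks every position belonging to a repeated k-mer. The helper name, the choice of k, and the fact that reverse complements are ignored are all assumptions of this example and not a description of the library's algorithm.

```python
from collections import Counter


def nonunique_kmer_mask(sequence, k=6):
    """Return a 0/1 list where 1 marks positions inside a k-mer that occurs more than once."""
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    counts = Counter(kmers)
    mask = [0] * len(sequence)
    for start, kmer in enumerate(kmers):
        if counts[kmer] > 1:
            for position in range(start, start + k):
                mask[position] = 1
    return mask


# "ACGT" occurs at positions 0 and 8 of this made-up sequence.
mask = nonunique_kmer_mask("ACGTTTGGACGTAA", k=4)
```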
-
compute_sequence_primers(record)
Return primers for the sequence, one around each index.
The primers are chosen to fit the length and melting temperature specified by the class parameters.
Parameters: record
The record in which to list the primers.
Returns: primers
List of primer sequences.
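To make the length and melting-temperature constraints concrete, the sketch below looks for the shortest subsequence starting at a given index that satisfies both size_range and tm_range, using the A/T=2C, G/C=4C heuristic. The helper names and the "shortest fitting candidate, starting at the index" policy are assumptions; the library may place and choose primers differently.

```python
def heuristic_tm(sequence):
    """2 C per A/T and 4 C per G/C (hypothetical helper)."""
    return sum(2 if base in "AT" else 4 for base in sequence.upper())


def primer_at(sequence, index, size_range=(16, 25), tm_range=(55, 70)):
    """Return the shortest fitting primer starting at index, or None if there is none."""
    min_size, max_size = size_range
    min_tm, max_tm = tm_range
    for size in range(min_size, max_size + 1):
        candidate = sequence[index:index + size]
        if len(candidate) < size:
            return None  # ran past the end of the sequence
        if min_tm <= heuristic_tm(candidate) <= max_tm:
            return candidate
    return None


sequence = "ATGC" * 10
primer = primer_at(sequence, 0)  # 19 nt are needed to reach 56 C
```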
-
compute_user_forbidden_locations(record)
Return an array where arr[i] == 1 means that i is surrounded by a user-forbidden location.
-
find_part_name_in_record(record, index)
Find a part where the sequence appears in the provided record.
This is used to provide primer information (“where does this come from?”).
Parameters: sequence
An ATGC string representing a sequence (of a primer).
record
A single Biopython record in which to find the sequence. The parts should be marked by features with a qualifier {'part': part_name}.
Returns: part_name, index
The name of the part, and the position in the part, where the sequence was found. If the sequence is found in different parts, only the first match is returned.
-
find_subsequence_in_records(sequence, records)
Find a part where the sequence appears in the provided records.
This is used to provide primer information (“where does this come from?”). This will look for either the sequence or its reverse complement.
Parameters: sequence
An ATGC string representing a sequence (of a primer).
records
A list of Biopython records in which to find the sequence. The parts should be marked by features with a qualifier {'part': part_name}.
Returns: part_name, index
The name of the part, and the position in the part, where the sequence was found. If the sequence is found in different parts, only the first match is returned.
-
generate_primer_name(prefix='P', available_primers_names=(), n_digits=6)
Return a suitable primer name, considering existing primer names.
The result will be of the form P000425, where ‘P’ is the prefix and ‘000425’ means that ‘P000424’ was the highest-numbered primer name starting with P in the list of available primer names.
Parameters: prefix
The prefix for the primer names.
available_primers_names
List of already-allocated primer names.
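The naming rule described above can be sketched as follows. The function next_primer_name is a hypothetical stand-in, not the library's code, and starting at 1 when no matching name exists is an assumption of this sketch.

```python
import re


def next_primer_name(prefix="P", available_primers_names=(), n_digits=6):
    """Return prefix + (highest existing number + 1), zero-padded to n_digits."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)")
    numbers = [
        int(match.group(1))
        for match in map(pattern.fullmatch, available_primers_names)
        if match is not None
    ]
    # Names that do not look like prefix + digits are ignored.
    return prefix + str(max(numbers, default=0) + 1).zfill(n_digits)


name = next_primer_name("P", ["P000424", "P000007", "seq_fwd"])
```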
-
static locate_primer_sequence(primer, sequence)
Find the location (start, end, strand) of a primer in the sequence.
Return None if neither the primer sequence nor its reverse complement is found in the sequence.
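The lookup described here can be sketched with plain string search. The function names are hypothetical, and encoding the strand as 1 or -1 as well as using an exclusive end index are assumptions of this example; the library's return convention is not spelled out in the text.

```python
def reverse_complement(sequence):
    """Reverse complement of an ATGC string."""
    return sequence.translate(str.maketrans("ATGC", "TACG"))[::-1]


def locate(primer, sequence):
    """Return (start, end, strand) for the first match, or None if not found."""
    start = sequence.find(primer)
    if start >= 0:
        return start, start + len(primer), 1
    start = sequence.find(reverse_complement(primer))
    if start >= 0:
        return start, start + len(primer), -1
    return None


sequence = "TTTTATGCCGAAAA"
forward_hit = locate("ATGCCG", sequence)
reverse_hit = locate("CGGCAT", sequence)  # reverse complement of ATGCCG
missing = locate("GGGGGG", sequence)
```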
-
name_subsequence_in_records(sequence, records, prefix='P')
Write a table of primers with columns ‘name’, ‘sequence’, etc.
The columns after ‘sequence’ are one column per primer metadata, such as ‘available’, ‘infos’, etc.
Parameters: primers
A list of primers, or a list of lists of primers, as returned by select_primers.
csv_path
The path to a CSV file to write to. If None, no file is written, only the pandas dataframe of the table is returned.
Returns: dataframe
The pandas dataframe of the table.
-
plot_coverage(records, selected_primers, pdf_path)
Plot the predicted sequencing coverage for each construct.
-
select_primers(records, available_primers=(), new_primers_prefix='P', new_primers_digits=6)
Select primers to sequence the given records.
Parameters: records
A list of Biopython records to sequence. The zones to cover in the record should be indicated by a feature of type misc_feature and label cover. The zones where no primers are desired should be indicated by a feature of type misc_feature and label no_primer.
available_primers
List of Primer objects representing the available primers.
new_primers_prefix
Prefix to use for the names of the new primers.
new_primers_digits
The new primers will have names of the form P000435, with a number of digits provided by this parameter.
Returns: selected_primers
A list of lists of primers, one list of primers for each construct.
-
write_primers_table(selected_primers, csv_path=None)
Write a table of primers with columns ‘name’, ‘sequence’, etc.
The columns after ‘sequence’ are one column per primer metadata, such as ‘available’, ‘infos’, etc.
Parameters: selected_primers
A list of primers, or a list of lists of primers, as returned by select_primers.
csv_path
The path to a CSV file to write to. If None, no file is written, only the pandas dataframe of the table is returned.
Returns: dataframe
The pandas dataframe of the table.
-
write_records_primers_table(selected_primers, records, sep='|', csv_path=None)
Write a table with columns ‘construct’ and ‘primers’, listing the primers for each construct.
Parameters: selected_primers
The list of lists of primers, as returned by the select_primers method.
records
The list of records, as provided to the select_primers method.
sep
The separator between the different primer names in the primers column. Avoid ‘;’ or ‘,’ as these might be used by the CSV formatter.
csv_path
The path to a CSV file to write to. If None, no file is written, only the pandas dataframe of the table is returned.
Returns: dataframe
The pandas dataframe of the table.
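The role of sep can be illustrated with a small standard-library sketch. It uses plain lists of primer names and the csv module in place of Primer objects and pandas, so the function name, the inputs, and the output handling are all assumptions made for this example.

```python
import csv
import io


def records_primers_rows(primer_names_per_construct, construct_names, sep="|"):
    """Build one row per construct, joining its primer names with sep."""
    return [
        {"construct": construct, "primers": sep.join(names)}
        for construct, names in zip(construct_names, primer_names_per_construct)
    ]


rows = records_primers_rows(
    [["P000001", "P000002"], ["P000002"]],
    ["construct_A", "construct_B"],
)

# Write the rows to an in-memory CSV to show the resulting layout.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["construct", "primers"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
```

Because ‘|’ is not a CSV delimiter, the joined primer names stay in a single unquoted cell, which is the reason the separator description advises against ‘;’ and ‘,’.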
-