PrimerSelector

        graph TD

ap[available_primers]
cafl[compute_all_forbidden_locations]
cavp[compute_all_valid_primers]
ccp[compute_coverage_points]
cfpl[compute_forbidden_patterns_locations]
cnul[compute_non_unique_locations]
csp[compute_sequence_primers]
seq[sequence/record]
sp[select_primers]
trlr[tm_range, length_range]


seq -->cnul
cnul --> cafl
seq -->cfpl
cfpl --> cafl
seq --> csp
trlr --> csp
csp -->cavp
cafl--> cavp
ap --> cavp
cavp  -->sp
ccp --> sp
seq --> ccp

style ap fill:#fff;
style trlr fill:#fff;
style seq fill:#fff;
    
class primavera.PrimerSelector(read_range=(150, 800), size_range=(16, 25), tm_range=(55, 70), primer_conditions=(), primer_reuse_bonus=2, logger='bars', homology_percentage=80, nucleotide_resolution=1, coverage_resolution=5)[source]

A selector to compute the best primers to sequence a set of constructs.

Examples

>>> selector = PrimerSelector()
>>> selected_primers = selector.select_primers(records, available_primers)
>>> selector.plot_coverage(records, selected_primers, 'my_report.pdf')
Parameters:
  • read_range – The experimentally measured range (start, end) so that, when a primer anneals in the sequence at index i, the range [i + start, i + end] will be correctly sequenced.

  • size_range – Size range (min, max) for the size of the primers, in nucleotides.

  • tm_range – Acceptable melting temperature range (min, max) for the primers, in degrees Celsius, as computed using the rule A/T=2C, G/C=4C (each A or T contributes 2 degrees, each G or C contributes 4 degrees).

  • primer_conditions – A list of functions of the form primer_sequence => True/False. Primers for which at least one condition returns False will not be considered.

  • primer_reuse_bonus – Weight that the availability of a primer should have in the decision to select it. A higher value leads to solutions where fewer new primers have to be ordered, but more sequencing reactions have to be done. Set to e.g. 200 to test whether solutions exist that use only already-available primers.

  • logger – Leave to ‘bars’ for default progress-bar logger, to None for no logger, or any Proglog ProgressBarLogger object.

  • coverage_resolution – When the user provides a record with “cover” features to indicate where to cover, the coverage points used by the algorithm are the 1-in-N nucleotides along the feature region, where N is this parameter.

  • nucleotide_resolution – If above 1, only every N primers will be considered when listing all the potential new primers (one around each nucleotide), where N is this number.
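As an illustration of the A/T=2C, G/C=4C rule quoted for tm_range and of the primer_sequence => True/False form expected by primer_conditions, here is a minimal standalone sketch. The function names rule_of_thumb_tm and no_long_homopolymer are hypothetical and are not part of primavera; the library's own computation may differ in detail.

```python
def rule_of_thumb_tm(primer_sequence):
    """Estimate the melting temperature: 2 C per A/T, 4 C per G/C."""
    seq = primer_sequence.upper()
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    return 2 * at_count + 4 * gc_count


def no_long_homopolymer(primer_sequence, max_run=4):
    """Hypothetical condition: False if one base repeats more than max_run times in a row."""
    run = 1
    for previous, current in zip(primer_sequence, primer_sequence[1:]):
        run = run + 1 if current == previous else 1
        if run > max_run:
            return False
    return True


tm_range = (55, 70)
primer = "ATGCATGCATGCATGCATGC"  # 10 A/T and 10 G/C
tm = rule_of_thumb_tm(primer)
print(tm, tm_range[0] <= tm <= tm_range[1])  # 60 True
print(no_long_homopolymer("AAAAATGC"))  # False (run of five A)
```

A condition written this way could presumably be supplied as primer_conditions=[no_long_homopolymer] when constructing the selector.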

compute_all_forbidden_locations(record)[source]

Return an array indicating which positions should be avoided.

We take into account forbidden patterns, user-forbidden locations, and non-unique locations. arr[i] == 1 indicates that position i should be avoided in the record.
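The combination described above can be pictured as an element-wise OR over three 0/1 arrays. The sketch below uses plain Python lists and made-up input values; combine_forbidden is a hypothetical helper, not the library's implementation.

```python
def combine_forbidden(*arrays):
    """Element-wise OR of equal-length 0/1 arrays: 1 if any input marks the position."""
    return [1 if any(values) else 0 for values in zip(*arrays)]


# Made-up example arrays for a 6-nucleotide record.
forbidden_patterns = [0, 1, 1, 0, 0, 0]
user_forbidden = [0, 0, 0, 0, 1, 0]
non_unique = [1, 0, 0, 0, 0, 0]

arr = combine_forbidden(forbidden_patterns, user_forbidden, non_unique)
print(arr)  # [1, 1, 1, 0, 1, 0]
```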

compute_all_primers_coverage_on_record(record, indices_to_cover, available_primers, strand='any')[source]

Return, for each primer, the list of indices covered in this record.

Parameters:
  • record – The record in which to search.

  • indices_to_cover – List of indices to cover (defined by the user) in this record.

  • available_primers – List of candidate primers.

Returns:

{primer_name: list_of_indices_covered_in_this_record}.

Return type:

primers_coverage
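To make the shape of the returned dictionary concrete, the following sketch applies the read_range definition (a primer annealing at index i sequences [i + start, i + end]) to a few made-up primer positions. It only handles forward-strand primers, and covered_indices and primer_positions are hypothetical names used for illustration.

```python
def covered_indices(primer_index, indices_to_cover, read_range=(150, 800)):
    """Indices falling in [primer_index + start, primer_index + end] (forward strand only)."""
    start, end = read_range
    low, high = primer_index + start, primer_index + end
    return [i for i in indices_to_cover if low <= i <= high]


indices_to_cover = list(range(0, 2000, 250))  # 0, 250, ..., 1750
primer_positions = {"P000001": 100, "P000002": 900}  # made-up annealing indices

primers_coverage = {
    name: covered_indices(position, indices_to_cover)
    for name, position in primer_positions.items()
}
print(primers_coverage)
# {'P000001': [250, 500, 750], 'P000002': [1250, 1500]}
```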

compute_forbidden_patterns_locations(record)[source]

Return an array where arr[i] == 1 means that i is surrounded by a user-forbidden pattern.

compute_indices_to_cover(record)[source]

List all indices in the record which should be covered.

These indices are equidistant points inside the user-defined zones to cover in the record.

The user determines the zones to cover via features of type misc_feature and label ‘cover’.
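A rough sketch of how equidistant coverage points could be derived from such zones, taking one index every coverage_resolution nucleotides. The zones are given here as plain (start, end) tuples with an exclusive end, which is an assumption made for the example; the method itself reads them from record features.

```python
def indices_in_zones(zones, coverage_resolution=5):
    """List one index every coverage_resolution nucleotides inside each (start, end) zone."""
    indices = set()
    for start, end in zones:
        indices.update(range(start, end, coverage_resolution))
    return sorted(indices)


print(indices_in_zones([(100, 120), (300, 311)]))
# [100, 105, 110, 115, 300, 305, 310]
```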

compute_nonunique_segments_locations(record)[source]

Return an array where arr[i] == 1 means that i is surrounded by a non-unique location.

compute_sequence_primers(record, strand='any')[source]

Return primers for the sequence, one around each index.

The primers are chosen to fit the length and melting temperature specified by the class parameters.

Parameters:

record – The record in which to list the primers.

Returns:

List of primer sequences.

Return type:

primers

compute_user_forbidden_locations(record)[source]

Return an array where arr[i] == 1 means that i is surrounded by a user-forbidden location.

find_part_name_in_record(record, index)[source]

Find a part where the sequence appears in the provided record.

This is used to provide information about a primer (“where does this come from?”).

Parameters:
  • sequence – An ATGC string representing a sequence (of a primer).

  • record – A single Biopython record where to find the sequence. The parts should be marked by features with a qualifier {'part': part_name}.

Returns:

The name of the part, and position in the part, where it was found. If the sequence is found in different parts, only the first find is returned.

Return type:

part_name, index

find_subsequence_in_records(sequence, records)[source]

Find a part where the sequence appears in the provided records.

This is used to provide information about a primer (“where does this come from?”). This will look for either the sequence or its reverse-complement.

Parameters:
  • sequence – An ATGC string representing a sequence (of a primer).

  • records – A list of Biopython records where to find the sequence. The parts should be marked by features with a qualifier {'part': part_name}.

Returns:

The name of the part, and position in the part, where it was found. If the sequence is found in different parts, only the first find is returned.

Return type:

part_name, index

generate_primer_name(prefix='P', available_primers_names=(), n_digits=6)[source]

Return a suitable primer name, considering existing primer names.

The result will be of the form P000425 where ‘P’ is the prefix and ‘000425’ means that ‘P000424’ was the highest-numbered primer name starting with P in the list of available primer names.

Parameters:
  • prefix – The prefix for the primer name.

  • available_primers_names – List of already-allocated primer names.
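The naming scheme described above can be sketched as follows. next_primer_name is a hypothetical standalone function; in particular, starting at 1 when no existing name carries the prefix is an assumption, not documented behavior.

```python
import re


def next_primer_name(prefix="P", available_primers_names=(), n_digits=6):
    """Return prefix + (highest existing number with that prefix + 1), zero-padded."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)$")
    numbers = []
    for name in available_primers_names:
        match = pattern.match(name)
        if match:
            numbers.append(int(match.group(1)))
    next_number = max(numbers, default=0) + 1  # assumption: start at 1 if none found
    return prefix + str(next_number).zfill(n_digits)


print(next_primer_name("P", ["P000424", "P000017", "EXT_12"]))  # P000425
```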

static locate_primer_sequence(primer, sequence)[source]

Find the location (start, end, strand) of a primer in the sequence.

Return None if the primer sequence and its reverse complement are not found in the sequence.
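A minimal sketch of this kind of lookup, using exact string matching on both strands. The 0-based (start, end) coordinates with an exclusive end and the +1/-1 strand encoding are assumptions for the example and may not match the values the method actually returns.

```python
COMPLEMENT = str.maketrans("ATGC", "TACG")


def reverse_complement(sequence):
    """Reverse complement of an uppercase ATGC string."""
    return sequence.translate(COMPLEMENT)[::-1]


def locate(primer, sequence):
    """Return (start, end, strand), or None if neither strand matches exactly."""
    start = sequence.find(primer)
    if start >= 0:
        return start, start + len(primer), 1
    start = sequence.find(reverse_complement(primer))
    if start >= 0:
        return start, start + len(primer), -1
    return None


sequence = "AAATTTGGGCCCATAT"
print(locate("TTTGGG", sequence))   # (3, 9, 1)
print(locate("TATGGG", sequence))   # (9, 15, -1)
print(locate("GATTACA", sequence))  # None
```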

name_subsequence_in_records(sequence, records, prefix='P')[source]

Write a table of primers with columns ‘name’, ‘sequence’, etc.

The columns after ‘sequence’ are one column per primer metadata, such as ‘available’, ‘infos’, etc.

Parameters:
  • primers – A list of primers, or list of list, as returned by select_primers.

  • csv_path – The path to a csv file to write to. If None, no file is written, only the pandas dataframe of the table is returned.

Returns:

The pandas dataframe of the table.

Return type:

dataframe

plot_coverage(records, selected_primers, pdf_path, close_figures=True, sort_matches_by=())[source]

Plot the predicted sequencing coverage for each construct.

select_primers(records, available_primers=(), strand='any', new_primers_prefix='P', new_primers_digits=6)[source]

Select primers to sequence the given records.

Parameters:
  • records – A list of Biopython records to sequence. The zones to cover in the record should be indicated by a feature of type misc_feature and label cover. The zones where no primers are desired should be indicated by a feature of type misc_feature and label no_primer.

  • available_primers – List of Primer objects representing the available primers.

  • new_primers_prefix – Prefix to use for names of the new primers.

  • new_primers_digits – The new primers will have names of the form P000435, with a number of digits provided by this parameter.

Returns:

A list of lists of primers, one list of primers for each construct.

Return type:

selected_primers

write_multifile_report(records, selected_primers, sort_matches_by=('center', 'strand'), target='@memory')[source]

Plot a full report in a folder or zip or in memory.

The report contains:
  • A PDF ‘coverages_plots.pdf’, where each page shows how one construct should be covered by the primers’ respective sequencing.

  • A spreadsheet ‘primers_list.csv’ indicating for each primer its sequence and whether it is already available.

  • A spreadsheet indicating, for each construct,

Parameters:
  • records – A list of construct records.

  • selected_primers – A list of list of primers (one list for each record).

  • target – Either the path to a directory or zip, or “@memory” to return a bytestring of binary data representing the zip file.

write_primers_table(selected_primers, csv_path=None)[source]

Write a table of primers with columns ‘name’, ‘sequence’, etc.

The columns after ‘sequence’ are one column per primer metadata, such as ‘available’, ‘infos’, etc.

Parameters:
  • selected_primers – A list of primers, or list of lists, as returned by select_primers.

  • csv_path – The path to a csv file to write to. If None, no file is written, only the pandas dataframe of the table is returned.

Returns:

The pandas dataframe of the table.

Return type:

dataframe

write_records_primers_table(selected_primers, records, sep='|', csv_path=None)[source]

Write a table with columns ‘construct’ and ‘primers’ (the primers selected for this construct).

Parameters:
  • selected_primers – The list of list of primers, as returned by the select_primers method.

  • records – The list of records, as provided to the select_primers method.

  • sep – The separator between the different primer names in the primers column. Avoid ‘;’ or ‘,’ as this might be used by the CSV formatter.

  • csv_path – The path to a csv file to write to. If None, no file is written, only the pandas dataframe of the table is returned.

Returns:

The pandas dataframe of the table.

Return type:

dataframe
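To illustrate the table layout and the role of sep, here is a small sketch built on the standard csv module rather than pandas. The construct and primer names are made up, and records_primers_rows is a hypothetical helper that works on plain name lists instead of record and Primer objects.

```python
import csv
import io


def records_primers_rows(record_names, primer_names_per_record, sep="|"):
    """Build one row per construct, joining its primer names with sep."""
    return [
        {"construct": name, "primers": sep.join(primer_names)}
        for name, primer_names in zip(record_names, primer_names_per_record)
    ]


rows = records_primers_rows(
    ["construct_A", "construct_B"],
    [["P000001", "P000002"], ["P000002"]],
)

# Write to an in-memory buffer; a file opened with newline="" would work the same way.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["construct", "primers"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Because ‘|’ is not a CSV delimiter, the joined primer names stay in a single cell without needing quotes.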