Useful classes for scripts

SequenceRepository

class dnacauldron.SequenceRepository(collections=None, name='repo')[source]

Sequence repositories store and provide sequence records.

The records are organized into collections, for instance “parts” to host parts, “constructs” for records created during assembly plan simulation, or any other collection name like “emma_connectors” to store EMMA connectors.

The suggested initialization of a sequence repository is:

>>> from dnacauldron import SequenceRepository
>>> repository = SequenceRepository()
>>> repository.import_records(files=['part.fa', 'records.zip'])
Parameters
collections

A dict {‘collection_name’: {‘record_id’: record, …}, …} giving for each collection a dict of Biopython records.

name

The name of the repository as it may appear in error messages and other reports.

add_record(self, record, collection='parts')[source]

Add one record to a collection, using its record.id as key.

The collection is created if it doesn’t exist.

The record can also be a pair (id, “ATGTGCC…”).
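The three behaviours described for add_record (keying by record.id, creating the collection on demand, accepting an (id, sequence) pair) can be sketched with plain dictionaries. This is an illustrative stand-in, not dnacauldron's implementation: SimpleNamespace replaces a Biopython record, and the name add_record_sketch is hypothetical.

```python
from types import SimpleNamespace


def add_record_sketch(collections, record, collection="parts"):
    """Store `record` under its id in collections[collection]."""
    if isinstance(record, tuple):
        # An (id, sequence) pair becomes a minimal record-like object.
        record_id, sequence = record
        record = SimpleNamespace(id=record_id, seq=sequence)
    # setdefault creates the collection the first time it is used.
    collections.setdefault(collection, {})[record.id] = record
    return collections


collections = {}
add_record_sketch(collections, SimpleNamespace(id="part_A", seq="ATGC"))
add_record_sketch(collections, ("connector_1", "ATGTGCC"), collection="emma_connectors")
print(sorted(collections))
```

The resulting structure has the same {collection_name: {record_id: record}} shape as the collections parameter of the constructor.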

add_records(self, records, collection='parts')[source]

Add several records to a collection.

contains_record(self, record_id)[source]

Return whether the repo has a record corresponding to the given ID.

get_all_part_names(self)[source]

Return the list of all part names.

get_part_names_by_collection(self, format='dict')[source]

Return a dictionary or a string representing the repo’s content.

The format parameter is either “dict” or “string”.

get_record(self, record_id)[source]

Return the record with the given ID from the repository.

get_records(self, record_ids)[source]

Get a list of records from a list of record IDs.

import_records(self, files=None, folder=None, collection='parts', use_file_names_as_ids=True, topology='default_to_linear')[source]

Import records into the repository, from files and zips and folders.

Parameters
files

A list of file paths, either Genbank, Fasta, Snapgene (.dna), or zips containing any of these formats.

folder

Path to a folder which can be provided instead of files.

collection

Name of the collection under which to import the new records.

use_file_names_as_ids

If True, the file name will be used as ID for any record obtained from a single-record file (fasta files with many records will still use the internal ID).

topology

Can be “circular”, “linear”, “default_to_circular” (will default to circular if annotations['topology'] is not already set) or “default_to_linear”.
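The four topology options can be read as a small decision rule. The sketch below is one interpretation of the description above, not the library's code: it assumes that "circular" and "linear" override any existing annotation, and the helper name resolve_topology is hypothetical.

```python
def resolve_topology(annotations, topology="default_to_linear"):
    """Return the topology a record would end up with (illustrative only)."""
    if topology in ("circular", "linear"):
        # Assumption: an explicit value is applied whatever the record says.
        return topology
    if topology in ("default_to_circular", "default_to_linear"):
        fallback = topology[len("default_to_"):]
        # The fallback only applies when no topology is already annotated.
        return annotations.get("topology", fallback)
    raise ValueError(f"Unknown topology option: {topology!r}")


already_circular = resolve_topology({"topology": "circular"}, "default_to_linear")
unannotated = resolve_topology({}, "default_to_linear")
forced_linear = resolve_topology({"topology": "circular"}, "linear")
```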

suggest_part_names(self, query, cutoff=90, limit=3)[source]

Suggest part names in the repo close to the given query.
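The idea of suggesting close names can be illustrated with the standard library. This sketch swaps in difflib.get_close_matches as the matcher, which may differ from whatever scoring dnacauldron uses; the 0-100 cutoff is rescaled to difflib's 0-1 range, and suggest_names is a hypothetical helper.

```python
from difflib import get_close_matches


def suggest_names(query, names, cutoff=90, limit=3):
    """Return up to `limit` names whose similarity to `query` reaches `cutoff`."""
    # difflib expects a cutoff between 0 and 1, so the 0-100 score is rescaled.
    return get_close_matches(query, names, n=limit, cutoff=cutoff / 100)


names = ["promoter_1a", "terminator_1", "backbone"]
suggestions = suggest_names("promoter_1", names)
print(suggestions)
```

Such suggestions are typically useful in error messages when a requested part ID has a typo.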

Assembly Plan

class dnacauldron.AssemblyPlan(assemblies, name='plan', logger='bar')[source]
static from_spreadsheet(path=None, dataframe=None, assembly_class='from_spreadsheet', sheet_name='all', header=None, name='auto_from_filename', logger='bar', assembly_class_dict='default', is_csv='auto_from_filename', **assembly_params)[source]

Import an assembly plan from a spreadsheet.

You can either read these docs or browse the examples in the repo. Note that this function autoselects the enzyme, based on the sites in each part. To explicitly set enzymes, set assembly.enzyme for each assembly in AssemblyPlan.assemblies.
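To give an intuition of site-based enzyme autoselection, here is a minimal sketch. The heuristic (pick the enzyme whose recognition site occurs in the most parts) is an assumption made for illustration and may not match the library's rule; guess_enzyme and the short sequences are hypothetical. The recognition sites themselves are the standard ones for these enzymes.

```python
from collections import Counter

# Recognition sites of common type-2s enzymes (5'->3').
SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC", "BbsI": "GAAGAC", "SapI": "GCTCTTC"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    return sequence.translate(_COMPLEMENT)[::-1]


def count_sites(sequence, site):
    """Count occurrences of a site on both strands of a sequence."""
    sequence = sequence.upper()
    return sequence.count(site) + sequence.count(reverse_complement(site))


def guess_enzyme(part_sequences):
    """Return the enzyme whose site is found in the most parts, or None."""
    votes = Counter()
    for sequence in part_sequences:
        for enzyme, site in SITES.items():
            if count_sites(sequence, site) > 0:
                votes[enzyme] += 1
    return votes.most_common(1)[0][0] if votes else None


parts = ["AAGGTCTCATTTTGAGACCAA", "CCGGTCTCAGGGGCGTCTCTT"]
enzyme = guess_enzyme(parts)
```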

Parameters
path

Path to a spreadsheet file (a dataframe can be used instead).

dataframe

A pandas dataframe, possibly obtained from a spreadsheet.

sheet_name

Name of the spreadsheet’s sheet on which the assembly plan is defined. Use “all” to load assemblies from all the sheets.

header

True or False, indicates whether there is a header in the spreadsheet.

name

Name of the assembly plan (leave to “auto_from_filename” to use the file name as assembly plan name).

logger

Logger of the created assembly plan. Either “bar” for a progress bar or None for none, or any Proglog logger.

assembly_params

Extra keyword parameters which will be fed to each assembly.

simulate(self, sequence_repository)[source]

Simulate the whole assembly plan, return an AssemblyPlanSimulation.

to_dataframe(self)[source]

Return a dataframe describing the assembly plan.

Assembly Plan Simulation

class dnacauldron.AssemblyPlan.AssemblyPlanSimulation(assembly_plan, assembly_simulations, sequence_repository=None, cancelled=())[source]
compute_all_construct_data_dicts(self)[source]

Return the list of data dict for each assembly simulation.

compute_stats(self)[source]

Return a dictionary of stats.

For instance {“cancelled_assemblies”: 2, “errored_assemblies”: 1, “valid_assemblies”: 5}.
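A dictionary of this shape can be derived from per-assembly outcomes with a simple tally. The sketch below is purely illustrative: the status labels and the summarize helper are assumptions, not part of the dnacauldron API.

```python
from collections import Counter


def summarize(statuses):
    """Tally assembly outcomes into a stats dictionary."""
    counts = Counter(statuses)
    return {
        f"{status}_assemblies": counts.get(status, 0)
        for status in ("cancelled", "errored", "valid")
    }


stats = summarize(["valid"] * 5 + ["errored"] + ["cancelled"] * 2)
print(stats)
```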

write_report(self, target, folder_name='auto', assembly_report_writer='default', logger='bar', include_original_parts_records=True)[source]

Write a comprehensive report to a folder or zip file.

Parameters
target

Either a path to a folder, to a zip file, or "@memory" to write into a virtual zip file whose raw data is then returned.

folder_name

Name of the folder created inside the target to host the report (yes, it is a folder inside a folder, which can be very practical).

assembly_report_writer

Either the “default” or any AssemblyReportWriter instance.

logger

Either “bar” for a progress bar, or None, or any Proglog logger.

include_original_parts_records

If true, the original provided part records will be included in the report (creates larger file sizes, but better for traceability).

Assembly Classes

class dnacauldron.Type2sRestrictionAssembly(parts, name='type2s_assembly', enzyme='auto', connectors_collection=None, expected_constructs=1, expect_no_unused_parts=True, max_constructs=40, randomize_constructs=False, dependencies=None)[source]

Representation and simulation of type-2s (Golden-Gate) assembly.

Parameters
parts

A list of part names corresponding to records in a repository. These parts will be restricted and ligated together. They can be linear, circular, and in any order.

enzyme

Any type-2s enzyme (“BsmBI”, “BsaI”, “SapI”, etc.), or leave to “auto” to autodetect the enzyme.

name

Name of the assembly as it will appear in reports.

max_constructs

None, or the maximum number of assemblies to compute (avoids a complete freeze for combinatorial assemblies with extremely many possibilities).

expected_constructs

Either a number or a string 'any_number'. If the number of constructs doesn’t match this value, the assembly will be considered invalid in reports and summaries.

connectors_collection

Name of a collection in the repository from which to get candidates for connector autocompletion.

expect_no_unused_parts

If True and some parts are unused, this will be considered an invalid assembly in summaries and reports.

dependencies

(do not use). Metadata indicating which assemblies depend on this assembly, or are depended on by it.
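The way expected_constructs and expect_no_unused_parts combine into a validity verdict can be sketched as follows. This mirrors the parameter descriptions above only; assembly_is_valid is a hypothetical helper, and the library reports such problems through its own error and warning objects.

```python
def assembly_is_valid(n_constructs, unused_parts,
                      expected_constructs=1, expect_no_unused_parts=True):
    """Return True if a simulation outcome matches the stated expectations."""
    # 'any_number' disables the check on the number of constructs.
    count_ok = (expected_constructs == "any_number"
                or n_constructs == expected_constructs)
    # Unused parts only matter when expect_no_unused_parts is True.
    parts_ok = not (expect_no_unused_parts and unused_parts)
    return count_ok and parts_ok


single_clean = assembly_is_valid(1, unused_parts=[])
two_constructs = assembly_is_valid(2, unused_parts=[])
any_number = assembly_is_valid(7, unused_parts=[], expected_constructs="any_number")
leftover_part = assembly_is_valid(1, unused_parts=["part_C"])
```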

simulate(self, sequence_repository, annotate_parts_homologies=True)[source]

Simulate the assembly, return an AssemblySimulation.

class dnacauldron.GibsonAssembly(parts, homology_checker='default', name='homologous_assembly', connectors_collection=None, expected_constructs=1, expect_no_unused_parts=True, max_constructs=40, dependencies=None)[source]

Representation and simulation of Gibson Assembly.

Parameters
parts

A list of part names corresponding to records in a repository. The parts will be considered as assembling together if they have end homologies, as checked by the homology_checker.

homology_checker

A HomologyChecker instance defining which homology sizes and melting temperatures are valid.

name

Name of the assembly as it will appear in reports.

max_constructs

None, or the maximum number of assemblies to compute (avoids a complete freeze for combinatorial assemblies with extremely many possibilities).

expected_constructs

Either a number or a string 'any_number'. If the number of constructs doesn’t match this value, the assembly will be considered invalid in reports and summaries.

connectors_collection

Name of a collection in the repository from which to get candidates for connector autocompletion.

expect_no_unused_parts

If True and some parts are unused, this will be considered an invalid assembly in summaries and reports.

dependencies

(do not use). Metadata indicating which assemblies depend on this assembly, or are depended on by it.
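The reason max_constructs prevents freezes is that combinatorial assemblies grow multiplicatively with the number of interchangeable parts. A minimal sketch of capping a lazy enumeration, assuming a hypothetical slot_options structure (how the library enumerates constructs internally is not documented here):

```python
from itertools import islice, product


def enumerate_constructs(slot_options, max_constructs=40):
    """Return at most `max_constructs` part combinations (all if None)."""
    combinations = product(*slot_options)  # lazy: nothing is computed yet
    # islice stops early; with max_constructs=None it yields everything.
    return list(islice(combinations, max_constructs))


slot_options = [["p1", "p2", "p3"], ["g1", "g2"], ["t1", "t2", "t3", "t4"]]
capped = enumerate_constructs(slot_options, max_constructs=5)
everything = enumerate_constructs(slot_options, max_constructs=None)
print(len(capped), len(everything))
```

Here 3 x 2 x 4 = 24 combinations exist, but only the first 5 are materialized when the cap is set.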

simulate(self, sequence_repository, annotate_parts_homologies=True)[source]

Simulate the Gibson Assembly, return an AssemblySimulation.

class dnacauldron.BASICAssembly(parts, name='unnamed_assembly', max_constructs=40, dependencies=None, expected_constructs=1, connectors_collection=None)[source]

Representation and simulation of BASIC Assembly.

In this class, the order of the parts matters! It should be organized as triplets of the form (adapter, part, adapter), as follows:

>>> a1   PART_1   a2   a3   PART_2   a4   a5   PART_3   ...

Where a1 and a2 are the BASIC adapters for PART_1, etc. The parts (PART_i) should be standard biopython records, while the adapters should be sticky-ended fragments, obtained for instance from an OligoPairAnnealing assembly (see the provided example).
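The triplet layout can be made explicit by grouping a flat list of names three by three. This is only an illustration of the expected ordering, with a hypothetical helper name; it does not reproduce the checks performed by BASICAssembly.

```python
def group_basic_triplets(part_names):
    """Group a flat list into (adapter, part, adapter) triplets."""
    if len(part_names) % 3 != 0:
        raise ValueError(
            "Expected a multiple of 3 names, as (adapter, part, adapter) triplets"
        )
    return [tuple(part_names[i:i + 3]) for i in range(0, len(part_names), 3)]


names = ["a1", "PART_1", "a2", "a3", "PART_2", "a4"]
triplets = group_basic_triplets(names)
print(triplets)
```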

Parameters
parts

List of part names corresponding to part records in a repository. See explanations above.

name

Name of the assembly as it will appear in reports.

max_constructs

None, or the maximum number of assemblies to compute (avoids a complete freeze for combinatorial assemblies with extremely many possibilities).

expected_constructs

Either a number or a string 'any_number'. If the number of constructs doesn’t match this value, the assembly will be considered invalid in reports and summaries.

connectors_collection

Name of a collection in the repository from which to get candidates for connector autocompletion.

dependencies

(do not use). Metadata indicating which assemblies depend on this assembly, or are depended on by it.

simulate(self, sequence_repository, annotate_parts_homologies=True)[source]

Simulate the BASIC assembly, return an AssemblySimulation.

class dnacauldron.BioBrickStandardAssembly(parts, name='unnamed_assembly', connectors_collection=None, expected_constructs=1, max_constructs=40, dependencies=None)[source]

Representation and simulation of the BioBrick 2-part assembly standard.

Parameters
parts

A list of part names corresponding to records in a repository. There must be exactly 2 parts, each represented on a backbone (i.e. as circular constructs). The first part will be inserted in the backbone of the second part, upstream of the second part.

name

Name of the assembly as it will appear in reports.

max_constructs

None, or the maximum number of assemblies to compute (avoids a complete freeze for combinatorial assemblies with extremely many possibilities).

expected_constructs

Either a number or a string 'any_number'. If the number of constructs doesn’t match this value, the assembly will be considered invalid in reports and summaries.

connectors_collection

Name of a collection in the repository from which to get candidates for connector autocompletion.

dependencies

(do not use). Metadata indicating which assemblies depend on this assembly, or are depended on by it.

simulate(self, sequence_repository, annotate_parts_homologies=True)[source]

Simulate the assembly, return an AssemblySimulation.

class dnacauldron.LigaseCyclingReactionAssembly(parts, bridging_oligos=(), oligo_indicator=None, homology_checker='default', name='homologous_assembly', connectors_collection=None, expected_constructs=1, expect_no_unused_parts=True, max_constructs=40, dependencies=None)[source]

Representation and simulation of Ligase Cycling Reaction (LCR) assembly.

Parameters
parts

A list of part names corresponding to records in a repository. Bridging oligo names can also be provided in this list; in that case, bridging_oligos should be an empty list and an oligo_indicator string should be provided.

homology_checker

A HomologyChecker instance defining which homology sizes and melting temperatures are valid between one bridging oligo and one part.

bridging_oligos

A list of the names of the bridging oligos, if they are not included in the part names.

oligo_indicator

String to use to identify bridging oligos when these are provided mixed with the other parts. The string should be common to all oligo names but should not appear in any part name. For instance “BO_”.

name

Name of the assembly as it will appear in reports.

max_constructs

None, or the maximum number of assemblies to compute (avoids a complete freeze for combinatorial assemblies with extremely many possibilities).

expected_constructs

Either a number or a string 'any_number'. If the number of constructs doesn’t match this value, the assembly will be considered invalid in reports and summaries.

expect_no_unused_parts

If True and some parts are unused, this will be considered an invalid assembly in summaries and reports.

dependencies

(do not use). Metadata indicating which assemblies depend on this assembly, or are depended on by it.
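When bridging oligos are mixed with the other parts, the oligo_indicator string is what tells them apart. A minimal sketch of that separation, assuming a simple substring test and a hypothetical helper name (the library may apply the indicator differently):

```python
def split_parts_and_oligos(names, oligo_indicator):
    """Separate part names from bridging-oligo names using the indicator."""
    oligos = [name for name in names if oligo_indicator in name]
    parts = [name for name in names if oligo_indicator not in name]
    return parts, oligos


mixed = ["part_A", "BO_1", "part_B", "BO_2"]
parts, oligos = split_parts_and_oligos(mixed, "BO_")
print(parts, oligos)
```

This also shows why the indicator must not appear in any part name: a part called "BO_part" would be misclassified as an oligo.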

simulate(self, sequence_repository, annotate_parts_homologies=True)[source]

Simulate the LCR assembly, return an AssemblySimulation.

Assembly Simulation

class dnacauldron.Assembly.AssemblySimulation(assembly, sequence_repository, construct_records=(), mixes=(), warnings=(), errors=())[source]

Class to represent and report on the simulation of a single assembly.

Instances are the result of assembly.simulate().

Parameters
assembly

The Assembly instance from which this is the simulation.

sequence_repository

The SequenceRepository used to get records for the simulation.

construct_records

List of Biopython records (or, sometimes, StickyEndFragment records) of the final constructs predicted by the simulation.

mixes

A list of AssemblyMix instances generated during the simulation (they can be plotted at report writing time).

warnings

List of AssemblyFlaw instances that will be flagged as warnings in reports and summaries.

errors

List of AssemblyFlaw instances that will be flagged as errors in reports and summaries.

compute_all_construct_data_dicts(self)[source]

Return a list of dictionaries with information on each construct.

Fields: construct_id, parts, number_of_parts, construct_size, assembly_name, depends_on, used_in, assembly_level.

compute_construct_data_dict(self, construct_record)[source]

Return a dictionary with information on a single construct.

Fields: construct_id, parts, number_of_parts, construct_size, assembly_name, depends_on, used_in, assembly_level.

compute_summary_dataframe(self)[source]

Return a Pandas dataframe with information on each construct.

static fragment_part(fragment, mark_reverse=False)[source]

Return the name of the fragment, or optionally NAME_r if the fragment is the reverse of another fragment.

list_all_parts_used(self)[source]

List all parts involved in at least one of the predicted constructs.

write_report(self, target, report_writer='default')[source]

Write a comprehensive simulation report in a folder or a zip file.

Parameters
target

Either a path to a folder, to a zip file, or "@memory" to write into a virtual zip file whose raw data is then returned.

report_writer

Either the “default” or any AssemblyReportWriter instance.

Returns
zip_data

Binary zip data if target="@memory", otherwise None.
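The "@memory" mode returns the raw bytes of a zip archive rather than writing to disk. The sketch below uses only the standard library to show what producing and then reading such a virtual zip looks like; write_virtual_zip and the file name inside the archive are hypothetical stand-ins, not the actual report layout.

```python
import io
import zipfile


def write_virtual_zip(files):
    """Write {path: text} entries into an in-memory zip, return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path, content in files.items():
            archive.writestr(path, content)
    return buffer.getvalue()


# Stand-in for the bytes returned when target="@memory".
zip_data = write_virtual_zip({"report/summary.csv": "construct_id,parts\n"})

# The raw bytes can be inspected, saved to disk, or sent over a network.
with zipfile.ZipFile(io.BytesIO(zip_data)) as archive:
    names = archive.namelist()
print(names)
```

This pattern is convenient in web applications, where the report is served to the user without touching the file system.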

Report Writer

class dnacauldron.AssemblyReportWriter(include_fragment_plots='on_error', include_part_plots='on_error', include_mix_graphs='on_error', include_part_records=True, include_assembly_plots=False, show_overhangs_in_graph=True, annotate_parts_homologies=True, include_errors_spreadsheet=True, include_warnings_spreadsheet=True, include_pdf_report=False)[source]

Class to configure assembly simulation report writing.

Responsible for writing the final sequence(s) of the assembly in Genbank format as well as a .csv report on all assemblies produced and PDF figures to allow a quick overview or diagnostic.

Folder assemblies contains the final assemblies, assembly_graph contains a schematic view of how the parts assemble together, folder fragments contains the details of all fragments produced by the enzyme digestion, and folder provided_parts contains the original input (genbanks of all parts provided for the assembly mix).

Parameters
include_fragment_plots

Either True/False/”on_error” to plot schemas of the fragments used in the different AssemblyMix throughout the simulation.

include_part_plots

Either True/False/”on_error” to plot schemas of the parts used, possibly with restriction sites relevant to the AssemblyMix.

include_mix_graphs

Either True/False/”on_error” to plot representations of fragment connectivity in the AssemblyMix created during the simulation.

include_part_records

True/False to include the parts records in the simulation results (makes for larger folders and zips, but is better for traceability).

include_assembly_plots

True/False to include assembly schemas in the reports (makes the report generation slower, but makes it easier to check assemblies at a glance).

show_overhangs_in_graph

If true, the AssemblyMix graph representations will display the sequence of all fragment overhangs.

include_errors_spreadsheet

If true and there are errors, an errors spreadsheet will be added to the report.

include_warnings_spreadsheet

If true and there are warnings, a warnings spreadsheet will be added to the report.

include_pdf_report

If true, a PDF report file is also generated.
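Several of the options above accept True, False, or "on_error". The intended reading of that three-way setting can be sketched as a small predicate; should_include is a hypothetical helper written for illustration, not a function of the library.

```python
def should_include(option, has_errors):
    """Decide whether an optional report section is generated."""
    if option == "on_error":
        # Only produce the (potentially slow) plots when something went wrong.
        return has_errors
    return bool(option)


plots_for_clean_run = should_include("on_error", has_errors=False)
plots_for_failed_run = should_include("on_error", has_errors=True)
always = should_include(True, has_errors=False)
```

Keeping the defaults at "on_error" makes reports fast for valid assemblies while still providing diagnostics for failed ones.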

write_report(self, assembly_simulation, target)[source]

Write a comprehensive report for an AssemblySimulation instance.

target can be either a path to a folder, to a zip file, or "@memory" to write into a virtual zip file whose raw data is then returned.

Homologies

class dnacauldron.HomologyChecker(min_size=15, max_size=80, min_tm=0, max_tm=None, max_distance=0)[source]
check_homology(self, sequence, other_sequence=None)[source]

Return whether there is an acceptable full-sequence homology between two sequences.

compute_tm(self, sequence)[source]

Compute the melting temperature of a sequence.

find_end_homologies(self, seq1, seq2)[source]

Find a homology between seq1’s end and seq2’s start.

Return the size of the homology, or 0 if no homologies found.
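The notion of an end homology can be illustrated with exact string matching. This simplified sketch ignores the melting-temperature bounds and the max_distance tolerance of HomologyChecker, and it returns the longest match, which is an assumption; find_end_homology is a hypothetical stand-in, not the library method.

```python
def find_end_homology(seq1, seq2, min_size=15, max_size=80):
    """Return the size of the longest exact overlap between the end of
    seq1 and the start of seq2, or 0 if none is within the size bounds."""
    longest = min(max_size, len(seq1), len(seq2))
    for size in range(longest, min_size - 1, -1):
        if seq1[-size:] == seq2[:size]:
            return size
    return 0


overlap = "ACGTTGCAAGGCTTAACCGG"  # 20 nucleotides shared by both fragments
fragment_1 = "TTTTTTTTTT" + overlap
fragment_2 = overlap + "CCCCCCCCCC"
size = find_end_homology(fragment_1, fragment_2)
print(size)
```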

parameters_as_string(self)[source]

Return a string of the parameters for errors and reports.