Useful classes for scripts¶

SequenceRepository¶

class dnacauldron.SequenceRepository(collections=None, name='repo')[source]¶

Sequence repositories store and provide sequence records.

The records are organized into collections, for instance “parts” to host parts, “constructs” for records created during assembly plan simulation, or any other collection name like “emma_connectors” to store EMMA connectors.

The suggested initialization of a sequence repository is:

>>> from dnacauldron import SequenceRepository
>>> repository = SequenceRepository()
>>> repository.import_records(files=['part.fa', 'records.zip'])  # and so on

Parameters

collections: A dict {‘collection_name’: {‘record_id’: record, …}, …} giving for each collection a dict of Biopython records.
name: The name of the repository as it may appear in error messages and other reports.
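To make the layout of the collections parameter concrete, here is a minimal sketch that builds such a nested dict by hand and looks up a record ID across collections. Plain strings stand in for Biopython records, and find_collection is a hypothetical helper written for this illustration; it is not part of dnacauldron.

```python
# Plain strings stand in for Biopython records in this sketch.
collections = {
    "parts": {"part_A": "ATGTGCC", "part_B": "GGTCTCA"},
    "emma_connectors": {"conn_1": "CGTCTCA"},
}


def find_collection(collections, record_id):
    """Return the name of the first collection holding record_id, or None."""
    for collection_name, records in collections.items():
        if record_id in records:
            return collection_name
    return None


print(find_collection(collections, "conn_1"))
```

A dict of this shape is what the collections argument is described as accepting, with real records in place of the strings.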

add_record(self, record, collection='parts')[source]¶

Add one record to a collection, using its record.id as key.

The collection is created if it doesn’t exist.

The record can also be a pair (id, “ATGTGCC…”).

add_records(self, records, collection='parts')[source]¶: Add several records to a collection.

contains_record(self, record_id)[source]¶: Return whether the repo has a record corresponding to the given ID.

get_all_part_names(self)[source]¶: Return the list of all part names.

get_part_names_by_collection(self, format='dict')[source]¶

Return a dictionary or a string representing the repo’s content.

format: Either “dict” or “string”.

get_record(self, record_id)[source]¶: Return the record with the given ID from the repository.

get_records(self, record_ids)[source]¶: Get a list of records from a list of record IDs.

import_records(self, files=None, folder=None, collection='parts', use_file_names_as_ids=True, topology='default_to_linear')[source]¶

Import records into the repository, from files and zips and folders.

Parameters

files: A list of file paths, either Genbank, Fasta, Snapgene (.dna), or zips containing any of these formats.
folder: Path to a folder which can be provided instead of files.
collection: Name of the collection under which to import the new records.
use_file_names_as_ids: If True, the file name will be used as ID for any record obtained from a single-record file (fasta files with many records will still use the internal ID).
topology: Can be “circular”, “linear”, “default_to_circular” (will default to circular if annotations['topology'] is not already set) or “default_to_linear”.
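The four topology options can be read as follows. This is a sketch of one plausible interpretation of the description above, not the library's code: resolve_topology is a hypothetical helper, and the assumption that “circular” and “linear” override any existing annotation is not stated in the source.

```python
def resolve_topology(annotations, topology="default_to_linear"):
    """Return the topology a record would end up with (hypothetical helper)."""
    if topology in ("circular", "linear"):
        # Assumption: explicit values override whatever the record says.
        return topology
    if topology in ("default_to_circular", "default_to_linear"):
        # Keep an existing annotation, otherwise fall back to the default.
        fallback = topology[len("default_to_"):]
        return annotations.get("topology", fallback)
    raise ValueError(f"Unknown topology option: {topology!r}")


print(resolve_topology({}, "default_to_circular"))
print(resolve_topology({"topology": "linear"}, "default_to_circular"))
```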

suggest_part_names(self, query, cutoff=90, limit=3)[source]¶: Suggest part names in the repo close to the given query.
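The idea behind name suggestion can be sketched with the standard library. This example uses difflib as a stand-in and assumes that cutoff is a similarity score between 0 and 100; the matching method dnacauldron actually uses is not documented here, and suggest_names is a hypothetical function.

```python
from difflib import get_close_matches


def suggest_names(query, names, cutoff=90, limit=3):
    """Return up to `limit` names whose similarity to query is >= cutoff (0-100)."""
    # difflib expects a ratio between 0 and 1, so the score is rescaled.
    return get_close_matches(query, names, n=limit, cutoff=cutoff / 100.0)


part_names = ["promoter_A", "promoter_B", "terminator_1"]
print(suggest_names("promotr_A", part_names))
```

Lowering the cutoff returns more, and looser, candidates. This is useful when reporting a missing part whose name was probably mistyped.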

Assembly Plan¶

class dnacauldron.AssemblyPlan(assemblies, name='plan', logger='bar')[source]¶

static from_spreadsheet(path=None, dataframe=None, assembly_class='from_spreadsheet', sheet_name='all', header=None, name='auto_from_filename', logger='bar', assembly_class_dict='default', is_csv='auto_from_filename', **assembly_params)[source]¶

Import an assembly plan from a spreadsheet.

You can either read these docs or browse the examples in the repo. Note that this function autoselects the enzyme, based on the sites in each part. To explicitly set enzymes, set assembly.enzyme for each assembly in AssemblyPlan.assemblies.

Parameters

path: Path to a spreadsheet file (a dataframe can be used instead).
dataframe: A pandas dataframe, possibly obtained from a spreadsheet.
sheet_name: Name of the spreadsheet’s sheet on which the assembly plan is defined. Use “all” to load assemblies from all the sheets.
header: True or False, indicates whether there is a header in the spreadsheet.
name: Name of the assembly plan (leave to “auto_from_filename” to use the file name as assembly plan name).
logger: Logger of the created assembly plan. Either “bar” for a progress bar or None for none, or any Proglog logger.
assembly_params: Extra keyword parameters which will be fed to each assembly.

simulate(self, sequence_repository)[source]¶: Simulate the whole assembly plan, return an AssemblyPlanSimulation.

to_dataframe(self)[source]¶: Return a dataframe describing the assembly plan.

Assembly Plan Simulation¶

class dnacauldron.AssemblyPlan.AssemblyPlanSimulation(assembly_plan, assembly_simulations, sequence_repository=None, cancelled=())[source]¶

compute_all_construct_data_dicts(self)[source]¶: Return the list of data dict for each assembly simulation.

compute_stats(self)[source]¶

Return a dictionary of stats.

For instance {“cancelled_assemblies”: 2, “errored_assemblies”: 1, “valid_assemblies”: 5}.
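A dict of this shape can be produced by tallying one status per assembly. The sketch below assumes each assembly ends up with exactly one of three hypothetical status labels (“valid”, “errored”, “cancelled”); how the library tracks status internally is not described here.

```python
from collections import Counter


def tally_stats(statuses):
    """Count assemblies per status label and return a stats dict."""
    counts = Counter(statuses)
    return {
        f"{status}_assemblies": counts[status]
        for status in ("cancelled", "errored", "valid")
    }


statuses = ["valid"] * 5 + ["errored"] + ["cancelled"] * 2
print(tally_stats(statuses))
```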

write_report(self, target, folder_name='auto', assembly_report_writer='default', logger='bar', include_original_parts_records=True)[source]¶

Write a comprehensive report to a folder or zip file.

Parameters

target: Either a path to a folder, to a zip file, or "@memory" to write into a virtual zip file whose raw data is then returned.
folder_name: Name of the folder created inside the target to host the report (yes, it is a folder inside a folder, which can be very practical).
assembly_report_writer: Either the “default” or any AssemblyReportWriter instance.
logger: Either “bar” for a progress bar, or None, or any Proglog logger.
include_original_parts_records: If true, the original provided part records will be included in the report (creates larger file sizes, but better for traceability).

Assembly Classes¶

class dnacauldron.Type2sRestrictionAssembly(parts, name='type2s_assembly', enzyme='auto', connectors_collection=None, expected_constructs=1, expect_no_unused_parts=True, max_constructs=40, randomize_constructs=False, dependencies=None)[source]¶

Representation and simulation of type-2s (Golden-Gate) assembly.

Parameters

parts: A list of parts names corresponding to records in a repository. These parts will be restricted and ligated together. They can be linear, circular, and in any order.
enzyme: Any type-2s enzyme (“BsmBI”, “BsaI”, “SapI”, etc.), or leave to “auto” to autodetect the enzyme.
name: Name of the assembly as it will appear in reports.
max_constructs: None or the maximum number of constructs to compute (avoids a complete freeze for combinatorial assemblies with a very large number of possibilities).
expected_constructs: Either a number or a string 'any_number'. If the number of constructs doesn’t match this value, the assembly will be considered invalid in reports and summaries.
connectors_collection: Name of a collection in the repository from which to get candidates for connector autocompletion.
expect_no_unused_parts: If True and some parts are unused, this will be considered an invalid assembly in summaries and reports.
dependencies: (do not use). Metadata indicating which assemblies depend on this assembly, or are depended on by it.
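The way expected_constructs and expect_no_unused_parts decide whether an assembly is valid can be sketched as a small check. find_flaws and its messages are hypothetical; the library reports flaws through its own classes, so this only mirrors the rules stated in the parameter descriptions.

```python
def find_flaws(n_constructs, unused_parts, expected_constructs=1,
               expect_no_unused_parts=True):
    """Return a list of human-readable problems (empty if the result is valid)."""
    flaws = []
    # 'any_number' disables the construct count check.
    if expected_constructs != "any_number" and n_constructs != expected_constructs:
        flaws.append(
            f"Found {n_constructs} construct(s), expected {expected_constructs}"
        )
    if expect_no_unused_parts and unused_parts:
        flaws.append("Unused parts: " + ", ".join(sorted(unused_parts)))
    return flaws


print(find_flaws(n_constructs=2, unused_parts=["part_C"]))
```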

simulate(self, sequence_repository, annotate_parts_homologies=True)[source]¶: Simulate the assembly, return an AssemblySimulation.

class dnacauldron.GibsonAssembly(parts, homology_checker='default', name='homologous_assembly', connectors_collection=None, expected_constructs=1, expect_no_unused_parts=True, max_constructs=40, dependencies=None)[source]¶

Representation and simulation of Gibson Assembly.

Parameters

parts: A list of parts names corresponding to records in a repository. The parts will be considered as assembling together if they have end homologies, as checked by the homology_checker.
homology_checker: A HomologyChecker instance defining which homology sizes and melting temperatures are valid.
name: Name of the assembly as it will appear in reports.
max_constructs: None or the maximum number of constructs to compute (avoids a complete freeze for combinatorial assemblies with a very large number of possibilities).
expected_constructs: Either a number or a string 'any_number'. If the number of constructs doesn’t match this value, the assembly will be considered invalid in reports and summaries.
connectors_collection: Name of a collection in the repository from which to get candidates for connector autocompletion.
expect_no_unused_parts: If True and some parts are unused, this will be considered an invalid assembly in summaries and reports.
dependencies: (do not use). Metadata indicating which assemblies depend on this assembly, or are depended on by it.

simulate(self, sequence_repository, annotate_parts_homologies=True)[source]¶: Simulate the Gibson Assembly, return an AssemblySimulation.

class dnacauldron.BASICAssembly(parts, name='unnamed_assembly', max_constructs=40, dependencies=None, expected_constructs=1, connectors_collection=None)[source]¶

Representation and simulation of BASIC Assembly.

In this class, the order of the parts matters! It should be organized as triplets of the form (adapter, part, adapter), as follows:

>>> a1   PART_1   a2   a3   PART_2   a4   a5   PART_3   ...

Where a1 and a2 are the BASIC adapters for PART_1, etc. The parts (PART_i) should be standard biopython records, while the adapters should be sticky-ended fragments, obtained for instance from an OligoPairAnnealing assembly (see the provided example).
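The triplet ordering can be checked before building an assembly. The helper below is a hypothetical sketch that only groups names; it does not verify that the adapters are sticky-ended fragments.

```python
def group_basic_triplets(part_names):
    """Split a flat, ordered list into (adapter, part, adapter) triplets."""
    if len(part_names) % 3 != 0:
        raise ValueError("Expected a multiple of 3 names (adapter, part, adapter)")
    return [tuple(part_names[i:i + 3]) for i in range(0, len(part_names), 3)]


names = ["a1", "PART_1", "a2", "a3", "PART_2", "a4"]
print(group_basic_triplets(names))
```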

Parameters

parts: List of part names corresponding to part records in a repository. See explanations above.
name: Name of the assembly as it will appear in reports.
max_constructs: None or the maximum number of constructs to compute (avoids a complete freeze for combinatorial assemblies with a very large number of possibilities).
expected_constructs: Either a number or a string 'any_number'. If the number of constructs doesn’t match this value, the assembly will be considered invalid in reports and summaries.
connectors_collection: Name of a collection in the repository from which to get candidates for connector autocompletion.
dependencies: (do not use). Metadata indicating which assemblies depend on this assembly, or are depended on by it.

simulate(self, sequence_repository, annotate_parts_homologies=True)[source]¶: Simulate the BASIC assembly, return an AssemblySimulation.

class dnacauldron.BioBrickStandardAssembly(parts, name='unnamed_assembly', connectors_collection=None, expected_constructs=1, max_constructs=40, dependencies=None)[source]¶

Representation and simulation of the BioBrick 2-part assembly standard.

Parameters

parts: A list of parts names corresponding to records in a repository. There must be exactly 2 parts and they must be represented on a backbone (i.e. circular constructs), and the first part will be inserted in the backbone of the second part, upstream of the second part.
name: Name of the assembly as it will appear in reports.
max_constructs: None or the maximum number of constructs to compute (avoids a complete freeze for combinatorial assemblies with a very large number of possibilities).
expected_constructs: Either a number or a string 'any_number'. If the number of constructs doesn’t match this value, the assembly will be considered invalid in reports and summaries.
connectors_collection: Name of a collection in the repository from which to get candidates for connector autocompletion.
dependencies: (do not use). Metadata indicating which assemblies depend on this assembly, or are depended on by it.

simulate(self, sequence_repository, annotate_parts_homologies=True)[source]¶: Simulate the assembly, return an AssemblySimulation.

class dnacauldron.LigaseCyclingReactionAssembly(parts, bridging_oligos=(), oligo_indicator=None, homology_checker='default', name='homologous_assembly', connectors_collection=None, expected_constructs=1, expect_no_unused_parts=True, max_constructs=40, dependencies=None)[source]¶

Representation and simulation of Ligase Cycling Reaction (LCR) assembly.

Parameters

parts: A list of parts names corresponding to records in a repository. Bridging oligo names can also be provided in this part list, however in that case bridging_oligos should be an empty list and an oligo_indicator string should be provided.
homology_checker: A HomologyChecker instance defining which homology sizes and melting temperatures are valid between one bridging oligo and one part.
bridging_oligos: A list of the names of the bridging oligos, if they are not included in the part names.
oligo_indicator: String to use to identify bridging oligos when these are provided mixed with the other parts. The string should be common to all oligo names but should not appear in any part name. For instance “BO_”.
name: Name of the assembly as it will appear in reports.
max_constructs: None or the maximum number of constructs to compute (avoids a complete freeze for combinatorial assemblies with a very large number of possibilities).
expected_constructs: Either a number or a string 'any_number'. If the number of constructs doesn’t match this value, the assembly will be considered invalid in reports and summaries.
expect_no_unused_parts: If True and some parts are unused, this will be considered an invalid assembly in summaries and reports.
dependencies: (do not use). Metadata indicating which assemblies depend on this assembly, or are depended on by it.
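When bridging oligos are mixed in with the parts, the indicator string is what separates the two groups. A minimal sketch of that split, using a hypothetical helper and made-up names:

```python
def split_parts_and_oligos(names, oligo_indicator="BO_"):
    """Separate bridging oligo names from part names using a marker substring."""
    oligos = [name for name in names if oligo_indicator in name]
    parts = [name for name in names if oligo_indicator not in name]
    return parts, oligos


mixed = ["part_A", "BO_A_B", "part_B", "BO_B_C", "part_C"]
print(split_parts_and_oligos(mixed))
```

This is why the indicator should not appear in any part name: a part called “BO_backbone” would be treated as an oligo by a substring test like this one.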

simulate(self, sequence_repository, annotate_parts_homologies=True)[source]¶: Simulate the LCR assembly, return an AssemblySimulation.

Assembly Simulation¶

class dnacauldron.Assembly.AssemblySimulation(assembly, sequence_repository, construct_records=(), mixes=(), warnings=(), errors=())[source]¶

Class to represent and report on the simulation of a single assembly.

Instances are the result of assembly.simulate().

Parameters

assembly: The Assembly instance from which this is the simulation.
sequence_repository: The SequenceRepository used to get records for the simulation.
construct_records: List of Biopython records (or, sometimes, StickyEndFragment records) of the final constructs predicted by the simulation.
mixes: A list of AssemblyMix instances generated during the simulation (they can be plotted at report writing time).
warnings: List of AssemblyFlaw instances that will be flagged as warnings in reports and summaries.
errors: List of AssemblyFlaw instances that will be flagged as errors in reports and summaries.

compute_all_construct_data_dicts(self)[source]¶

Return a list of dictionaries with information on each construct.

Fields: construct_id, parts, number_of_parts, construct_size, assembly_name, depends_on, used_in, assembly_level.

compute_construct_data_dict(self, construct_record)[source]¶

Return a dictionary with information on a single construct.

Fields: construct_id, parts, number_of_parts, construct_size, assembly_name, depends_on, used_in, assembly_level.

compute_summary_dataframe(self)[source]¶: Return a Pandas dataframe with information on each construct.

static fragment_part(fragment, mark_reverse=False)[source]¶: Return the name of the fragment, or optionally NAME_r if the fragment is the reverse of another fragment.

list_all_parts_used(self)[source]¶: List all parts involved in at least one of the predicted constructs.

write_report(self, target, report_writer='default')[source]¶

Write a comprehensive simulation report in a folder or a zip file.

Parameters

target: Either a path to a folder, to a zip file, or "@memory" to write into a virtual zip file whose raw data is then returned.
report_writer: Either the “default” or any AssemblyReportWriter instance.

Returns

zip_data: Binary zip data if target="@memory", otherwise None.
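For readers unfamiliar with the "@memory" convention, the sketch below shows with the standard library what an in-memory zip whose raw data is returned amounts to. It illustrates the concept only; the function name and the file written are hypothetical, and dnacauldron's own report layout is not reproduced.

```python
import io
import zipfile


def write_files_to_memory_zip(files):
    """Write {filename: text} into an in-memory zip and return its raw bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w") as archive:
        for filename, text in files.items():
            archive.writestr(filename, text)
    return buffer.getvalue()


zip_data = write_files_to_memory_zip({"report/summary.csv": "construct_id,parts\n"})

# The returned bytes can be sent over HTTP, stored, or re-opened as a zip.
with zipfile.ZipFile(io.BytesIO(zip_data)) as archive:
    print(archive.namelist())
```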

Report Writer¶

class dnacauldron.AssemblyReportWriter(include_fragment_plots='on_error', include_part_plots='on_error', include_mix_graphs='on_error', include_part_records=True, include_assembly_plots=False, show_overhangs_in_graph=True, annotate_parts_homologies=True, include_errors_spreadsheet=True, include_warnings_spreadsheet=True, include_pdf_report=False)[source]¶

Class to configure assembly simulation report writing.

Responsible for writing the final sequence(s) of the assembly in Genbank format as well as a .csv report on all assemblies produced and PDF figures to allow a quick overview or diagnostic.

Folder assemblies contains the final assemblies, assembly_graph contains a schematic view of how the parts assemble together, folder fragments contains the details of all fragments produced by the enzyme digestion, and folder provided_parts contains the original input (genbanks of all parts provided for the assembly mix).

Parameters

include_fragment_plots: Either True/False/“on_error” to plot schemas of the fragments used in the different AssemblyMix throughout the simulation.
include_part_plots: Either True/False/“on_error” to plot schemas of the parts used, possibly with restriction sites relevant to the AssemblyMix.
include_mix_graphs: Either True/False/“on_error” to plot representations of fragment connectivity in the AssemblyMix created during the simulation.
include_part_records: True/False to include the parts records in the simulation results (makes for larger folders and zips, but is better for traceability).
include_assembly_plots: True/False to include assembly schemas in the reports (makes the report generation slower, but makes it easier to check assemblies at a glance).
show_overhangs_in_graph: If true, the AssemblyMix graph representations will display the sequence of all fragment overhangs.
include_errors_spreadsheet: If true and there are errors, an errors spreadsheet will be added to the report.
include_warnings_spreadsheet: If true and there are warnings, a warnings spreadsheet will be added to the report.
include_pdf_report: If true, a PDF report file is also generated.

write_report(self, assembly_simulation, target)[source]¶

Write a comprehensive report for an AssemblySimulation instance.

target can be either a path to a folder, to a zip file, or "@memory" to write into a virtual zip file whose raw data is then returned.

Homologies¶

class dnacauldron.HomologyChecker(min_size=15, max_size=80, min_tm=0, max_tm=None, max_distance=0)[source]¶

check_homology(self, sequence, other_sequence=None)[source]¶: Return whether there is an acceptable full-sequence homology between two sequences.

compute_tm(self, sequence)[source]¶: Compute the melting temperature of a sequence.

find_end_homologies(self, seq1, seq2)[source]¶

Find a homology between seq1’s end and seq2’s start.

Return the size of the homology, or 0 if no homology is found.
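The end-homology search can be pictured as looking for the longest stretch that is both a suffix of seq1 and a prefix of seq2. The following sketch does this for exact matches within a size window; it deliberately ignores melting temperature and max_distance, so it is a simplified stand-in rather than the HomologyChecker algorithm.

```python
def find_end_homology_size(seq1, seq2, min_size=15, max_size=80):
    """Return the size of the longest exact overlap of seq1's end with seq2's start."""
    longest = min(max_size, len(seq1), len(seq2))
    # Try the largest candidate first so the first hit is the longest overlap.
    for size in range(longest, min_size - 1, -1):
        if seq1[-size:] == seq2[:size]:
            return size
    return 0


overlap = "ACGTACGTTGCAAGCT"  # 16 nucleotides shared by both fragments
left = "TTTTT" + overlap
right = overlap + "GGGGG"
print(find_end_homology_size(left, right))
```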

parameters_as_string(self)[source]¶: Return a string of the parameters for errors and reports.