Useful classes for scripts¶
SequenceRepository¶
- class dnacauldron.SequenceRepository(collections=None, name='repo')[source]¶
Sequence repositories store and provide sequence records.
The records are organized into collections, for instance “parts” to host parts, “constructs” for records created during assembly plan simulation, or any other collection name like “emma_connectors” to store EMMA connectors.
The suggested initialization of a sequence repository is:
>>> from dnacauldron import SequenceRepository
>>> repository = SequenceRepository()
>>> repository.import_records(files=['part.fa', 'records.zip'])  # etc.
- Parameters:
collections – A dict {‘collection_name’: {‘record_id’: record, …}, …} giving for each collection a dict of Biopython records.
name – The name of the repository as it may appear in error messages and other reports.
- add_record(record, collection='parts')[source]¶
Add one record to a collection, using its record.id as key.
The collection is created if it doesn’t exist.
The record can also be a pair (id, “ATGTGCC…”).
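The rules stated for add_record (record.id used as key, collection created on demand, (id, sequence) pairs accepted) can be pictured with a small plain-Python stand-in. This is only a sketch of the documented behaviour, not dnacauldron's implementation; `add_record_sketch` and `SimpleRecord` are hypothetical names introduced for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a Biopython record: only the fields used here.
SimpleRecord = namedtuple("SimpleRecord", ["id", "seq"])


def add_record_sketch(collections, record, collection="parts"):
    """Mimic the documented add_record rules on a plain dict of dicts."""
    if isinstance(record, tuple) and not hasattr(record, "id"):
        # A plain pair (id, "ATGTGCC...") is turned into a record-like object.
        record_id, sequence = record
        record = SimpleRecord(id=record_id, seq=sequence)
    # The collection is created if it does not exist yet.
    collections.setdefault(collection, {})[record.id] = record
    return collections


collections = {}
add_record_sketch(collections, ("part_A", "ATGTGCC"))
add_record_sketch(collections, SimpleRecord("conn_1", "GGTCTC"), "emma_connectors")
print(sorted(collections))  # ['emma_connectors', 'parts']
```

The resulting dict has the same {collection_name: {record_id: record}} shape as the `collections` parameter described above.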
- contains_record(record_id)[source]¶
Return whether the repo has a record corresponding to the given id.
- get_part_names_by_collection(format='dict')[source]¶
Return a dictionary or a string representing the repo’s content.
Format: either “dict” or “string”.
- import_records(files=None, folder=None, collection='parts', use_file_names_as_ids=True, topology='default_to_linear')[source]¶
Import records into the repository, from files and zips and folders.
- Parameters:
files – A list of file paths, either Genbank, Fasta, Snapgene (.dna), or zips containing any of these formats.
folder – Path to a folder which can be provided instead of files.
collection – Name of the collection under which to import the new records.
use_file_names_as_ids – If True, the file name will be used as ID for any record obtained from a single-record file (fasta files with many records will still use the internal ID).
topology – Can be “circular”, “linear”, “default_to_circular” (will default to circular if annotations['topology'] is not already set) or “default_to_linear”.
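The four topology options can be summarized by a small decision function. The sketch below is an interpretation of the parameter description, not the library's code: it assumes that “circular” and “linear” override any existing annotation, and `resolve_topology` is a hypothetical helper name.

```python
def resolve_topology(annotations, topology="default_to_linear"):
    """Return the topology a record would end up with under each option."""
    if topology in ("circular", "linear"):
        # Assumption: explicit values override whatever the record says.
        return topology
    if topology in ("default_to_circular", "default_to_linear"):
        fallback = topology[len("default_to_"):]
        # Keep an existing annotation; use the fallback only when unset.
        return annotations.get("topology", fallback)
    raise ValueError(f"Unknown topology option: {topology!r}")


print(resolve_topology({}, "default_to_circular"))                      # circular
print(resolve_topology({"topology": "linear"}, "default_to_circular"))  # linear
```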
Assembly Plan¶
- class dnacauldron.AssemblyPlan(assemblies, name='plan', logger='bar')[source]¶
- static from_spreadsheet(path=None, dataframe=None, assembly_class='from_spreadsheet', sheet_name='all', header=None, name='auto_from_filename', logger='bar', assembly_class_dict='default', is_csv='auto_from_filename', **assembly_params)[source]¶
Import an assembly plan from a spreadsheet.
You can either read these docs or browse the examples in the repo. Note that this function autoselects the enzyme, based on the sites in each part. To explicitly set enzymes, set assembly.enzyme for each assembly in AssemblyPlan.assemblies.
- Parameters:
path – Path to a spreadsheet file (a dataframe can be used instead).
dataframe – A pandas dataframe, possibly obtained from a spreadsheet.
sheet_name – Name of the spreadsheet’s sheet on which the assembly plan is defined. Use “all” to load assemblies from all the sheets.
header – True or False, indicates whether there is a header in the spreadsheet.
name – Name of the assembly plan (leave to “auto_from_filename” to use the file name as assembly plan name).
logger – Logger of the created assembly plan. Either “bar” for a progress bar or None for none, or any Proglog logger.
assembly_params – Extra keyword parameters which will be fed to each assembly.
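The note on setting assembly.enzyme explicitly amounts to a short loop over the plan's assemblies. The sketch below assumes that `assemblies` is an iterable of assembly objects with a writable `enzyme` attribute; `set_enzyme_for_all` is a hypothetical helper, and stand-in objects are used so the example runs without dnacauldron installed.

```python
from types import SimpleNamespace


def set_enzyme_for_all(assembly_plan, enzyme):
    """Set assembly.enzyme on every assembly listed in the plan."""
    for assembly in assembly_plan.assemblies:
        assembly.enzyme = enzyme
    return assembly_plan


# Stand-in objects for illustration only. With the library, the plan would
# instead come from AssemblyPlan.from_spreadsheet(path=...).
plan = SimpleNamespace(
    assemblies=[
        SimpleNamespace(name="asm_1", enzyme="auto"),
        SimpleNamespace(name="asm_2", enzyme="auto"),
    ]
)
set_enzyme_for_all(plan, "BsmBI")
print([assembly.enzyme for assembly in plan.assemblies])  # ['BsmBI', 'BsmBI']
```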
Assembly Plan Simulation¶
- class dnacauldron.AssemblyPlan.AssemblyPlanSimulation(assembly_plan, assembly_simulations, sequence_repository=None, cancelled=())[source]¶
- compute_all_construct_data_dicts()[source]¶
Return the list of data dict for each assembly simulation.
- compute_stats()[source]¶
Return a dictionary of stats.
For instance {“cancelled_assemblies”: 2, “errored_assemblies”: 1, “valid_assemblies”: 5}.
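A stats dictionary of this shape is easy to condense into a one-line summary. The sketch below reuses the example values from the docstring and assumes the dictionary contains only these three counts, which the source does not guarantee.

```python
# Example values copied from the docstring above.
stats = {"cancelled_assemblies": 2, "errored_assemblies": 1, "valid_assemblies": 5}

# Assumption: every value in the dict is a count of assemblies.
total = sum(stats.values())
valid_fraction = stats["valid_assemblies"] / total
print(f"{stats['valid_assemblies']} of {total} assemblies are valid")
```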
- write_report(target, folder_name='auto', assembly_report_writer='default', logger='bar', include_original_parts_records=True)[source]¶
Write a comprehensive report to a folder or zip file.
- Parameters:
target – Either a path to a folder, to a zip file, or "@memory" to write into a virtual zip file whose raw data is then returned.
folder_name – Name of the folder created inside the target to host the report (yes, it is a folder inside a folder, which can be very practical).
assembly_report_writer – Either the “default” or any AssemblyReportWriter instance.
logger – Either “bar” for a progress bar, or None, or any Proglog logger.
include_original_parts_records – If true, the original provided part records will be included in the report (creates larger file sizes, but better for traceability).
Assembly Classes¶
- class dnacauldron.Type2sRestrictionAssembly(parts, name='type2s_assembly', enzyme='auto', connectors_collection=None, expected_constructs=1, expect_no_unused_parts=True, max_constructs=40, randomize_constructs=False, dependencies=None)[source]¶
Representation and simulation of type-2s (Golden-Gate) assembly.
- Parameters:
parts – A list of parts names corresponding to records in a repository. These parts will be restricted and ligated together. They can be linear, circular, and in any order.
enzyme – Any type-2s enzyme (“BsmBI”, “BsaI”, “SapI”, etc.), or leave to “auto” to autodetect the enzyme.
name – Name of the assembly as it will appear in reports.
max_constructs – None or a number of maximum assemblies to compute (avoids complete freeze for combinatorial assemblies with extremely many possibilities).
expected_constructs – Either a number or a string 'any_number'. If the number of constructs doesn’t match this value, the assembly will be considered invalid in reports and summaries.
connectors_collection – Name of a collection in the repository from which to get candidates for connector autocompletion.
expect_no_unused_parts – If True and some parts are unused, this will be considered an invalid assembly in summaries and reports.
dependencies – (do not use). Metadata indicating which assemblies depend on this assembly, or are depended on by it.
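The interplay between expected_constructs and expect_no_unused_parts can be written out as a small predicate. This is a sketch of the rules as worded in the parameter descriptions, not the library's validation code: both function names are hypothetical, and treating 'any_number' as accepting every count (including zero) is an assumption.

```python
def constructs_are_as_expected(n_constructs, expected_constructs=1):
    """Compare a construct count with a number or the string 'any_number'."""
    if expected_constructs == "any_number":
        # Assumption: any count is accepted, including zero.
        return True
    return n_constructs == expected_constructs


def assembly_looks_valid(n_constructs, unused_parts, expected_constructs=1,
                         expect_no_unused_parts=True):
    """Combine the two documented validity conditions."""
    if not constructs_are_as_expected(n_constructs, expected_constructs):
        return False
    if expect_no_unused_parts and unused_parts:
        return False
    return True


print(assembly_looks_valid(1, unused_parts=[]))                  # True
print(assembly_looks_valid(3, unused_parts=[]))                  # False
print(assembly_looks_valid(3, [], expected_constructs="any_number"))  # True
```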
- class dnacauldron.GibsonAssembly(parts, homology_checker='default', name='homologous_assembly', connectors_collection=None, expected_constructs=1, expect_no_unused_parts=True, max_constructs=40, dependencies=None)[source]¶
Representation and simulation of Gibson Assembly.
- Parameters:
parts – A list of parts names corresponding to records in a repository. The parts will be considered as assembling together if they have end homologies, as checked by the homology_checker.
homology_checker – A HomologyChecker instance defining which homology sizes and melting temperatures are valid.
name – Name of the assembly as it will appear in reports.
max_constructs – None or a number of maximum assemblies to compute (avoids complete freeze for combinatorial assemblies with extremely many possibilities).
expected_constructs – Either a number or a string 'any_number'. If the number of constructs doesn’t match this value, the assembly will be considered invalid in reports and summaries.
connectors_collection – Name of a collection in the repository from which to get candidates for connector autocompletion.
expect_no_unused_parts – If True and some parts are unused, this will be considered an invalid assembly in summaries and reports.
dependencies – (do not use). Metadata indicating which assemblies depend on this assembly, or are depended on by it.
- class dnacauldron.BASICAssembly(parts, name='unnamed_assembly', max_constructs=40, dependencies=None, expected_constructs=1, connectors_collection=None)[source]¶
Representation and simulation of BASIC Assembly.
In this class, the order of the parts matters! It should be organized as triplets of the form (adapter, part, adapter), as follows:
>>> a1 PART_1 a2 a3 PART_2 a4 a5 PART_3 ...
Where a1 and a2 are the BASIC adapters for PART_1, etc. The parts (PART_i) should be standard biopython records, while the adapters should be sticky-ended fragments, obtained for instance from an OligoPairAnnealing assembly (see the provided example).
- Parameters:
parts – List of part names corresponding to part records in a repository. See explanations above.
name – Name of the assembly as it will appear in reports.
max_constructs – None or a number of maximum assemblies to compute (avoids complete freeze for combinatorial assemblies with extremely many possibilities).
expected_constructs – Either a number or a string 'any_number'. If the number of constructs doesn’t match this value, the assembly will be considered invalid in reports and summaries.
connectors_collection – Name of a collection in the repository from which to get candidates for connector autocompletion.
dependencies – (do not use). Metadata indicating which assemblies depend on this assembly, or are depended on by it.
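Because the part order matters for BASIC assemblies, it can help to check a flat list of names against the (adapter, part, adapter) layout before building the assembly. The helper below is a hypothetical pre-check written in plain Python, not part of dnacauldron.

```python
def group_basic_triplets(part_names):
    """Split a flat list of names into (adapter, part, adapter) triplets."""
    if len(part_names) % 3 != 0:
        raise ValueError("Expected a multiple of 3 names (adapter, part, adapter).")
    return [tuple(part_names[i:i + 3]) for i in range(0, len(part_names), 3)]


names = ["a1", "PART_1", "a2", "a3", "PART_2", "a4"]
for left_adapter, part, right_adapter in group_basic_triplets(names):
    print(f"{part}: flanked by {left_adapter} and {right_adapter}")
```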
- class dnacauldron.BioBrickStandardAssembly(parts, name='unnamed_assembly', connectors_collection=None, expected_constructs=1, max_constructs=40, dependencies=None)[source]¶
Representation and simulation of the Biobrick 2-part assembly standard.
- Parameters:
parts – A list of parts names corresponding to records in a repository. There must be exactly 2 parts and they must be represented on a backbone (i.e. circular constructs), and the first part will be inserted in the backbone of the second part, upstream of the second part.
name – Name of the assembly as it will appear in reports.
max_constructs – None or a number of maximum assemblies to compute (avoids complete freeze for combinatorial assemblies with extremely many possibilities).
expected_constructs – Either a number or a string 'any_number'. If the number of constructs doesn’t match this value, the assembly will be considered invalid in reports and summaries.
connectors_collection – Name of a collection in the repository from which to get candidates for connector autocompletion.
dependencies – (do not use). Metadata indicating which assemblies depend on this assembly, or are depended on by it.
- class dnacauldron.LigaseCyclingReactionAssembly(parts, bridging_oligos=(), oligo_indicator=None, homology_checker='default', name='homologous_assembly', connectors_collection=None, expected_constructs=1, expect_no_unused_parts=True, max_constructs=40, dependencies=None)[source]¶
Representation and simulation of Ligase Cycling Reaction (LCR) assembly.
- Parameters:
parts – A list of parts names corresponding to records in a repository. Bridging oligo names can also be provided in this part list; in that case, bridging_oligos should be an empty list and an oligo_indicator string should be provided.
homology_checker – A HomologyChecker instance defining which homology sizes and melting temperatures are valid between one bridging oligo and one part.
bridging_oligos – A list of the names of the bridging oligos, if they are not included in the part names.
oligo_indicator – String to use to identify bridging oligos when these are provided mixed with the other parts. The string should be common to all oligo names but should not appear in any part name. For instance "BO_".
name – Name of the assembly as it will appear in reports.
max_constructs – None or a number of maximum assemblies to compute (avoids complete freeze for combinatorial assemblies with extremely many possibilities).
expected_constructs – Either a number or a string 'any_number'. If the number of constructs doesn’t match this value, the assembly will be considered invalid in reports and summaries.
expect_no_unused_parts – If True and some parts are unused, this will be considered an invalid assembly in summaries and reports.
dependencies – (do not use). Metadata indicating which assemblies depend on this assembly, or are depended on by it.
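When bridging oligos are mixed in with the parts, the indicator string is what tells the two groups apart. The sketch below shows one way such a split could work, using the "BO_" indicator from the description; `split_parts_and_oligos` is a hypothetical helper and the substring test is an assumption about how the indicator is matched.

```python
def split_parts_and_oligos(names, oligo_indicator="BO_"):
    """Separate bridging oligo names from part names using the indicator."""
    # Assumption: a name is an oligo when it contains the indicator string.
    oligos = [name for name in names if oligo_indicator in name]
    parts = [name for name in names if oligo_indicator not in name]
    return parts, oligos


mixed = ["part_A", "BO_A_B", "part_B", "BO_B_C", "part_C"]
parts, oligos = split_parts_and_oligos(mixed)
print(parts)   # ['part_A', 'part_B', 'part_C']
print(oligos)  # ['BO_A_B', 'BO_B_C']
```

This is also why the indicator must not appear in any part name: a part containing it would be classified as an oligo.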
Assembly Simulation¶
- class dnacauldron.Assembly.AssemblySimulation(assembly, sequence_repository, construct_records=(), mixes=(), warnings=(), errors=())[source]¶
Class to represent and report on the simulation of a single assembly.
Instances are the result of assembly.simulate().
- Parameters:
assembly – The Assembly instance from which this is the simulation.
sequence_repository – The SequenceRepository used to get records for the simulation.
construct_records – List of Biopython records (or, sometimes, StickyEndFragment records) of the final constructs predicted by the simulation.
mixes – A list of AssemblyMix instances generated during the simulation (they can be plotted at report writing time).
warnings – List of AssemblyFlaw instances that will be flagged as warnings in reports and summaries.
errors – List of AssemblyFlaw instances that will be flagged as errors in reports and summaries.
- compute_all_construct_data_dicts()[source]¶
Return a list of dictionaries with information on each construct.
Fields: construct_id, parts, number_of_parts, construct_size, assembly_name, depends_on, used_in, assembly_level.
- compute_construct_data_dict(construct_record)[source]¶
Return a dictionary with information on a single construct.
Fields: construct_id, parts, number_of_parts, construct_size, assembly_name, depends_on, used_in, assembly_level.
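Since every construct data dict shares the same fields, a list of them maps naturally onto a table. The sketch below writes such a list to CSV with the standard library; the single row holds invented placeholder values, and the value types (for example `parts` as a string) are assumptions rather than the library's actual output.

```python
import csv
import io

# Field names as listed in the docstring above.
FIELDS = ["construct_id", "parts", "number_of_parts", "construct_size",
          "assembly_name", "depends_on", "used_in", "assembly_level"]

# Placeholder row with made-up values, standing in for one data dict.
rows = [{
    "construct_id": "construct_1",
    "parts": "part_A, part_B",
    "number_of_parts": 2,
    "construct_size": 5000,
    "assembly_name": "asm_1",
    "depends_on": "",
    "used_in": "",
    "assembly_level": 1,
}]

output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)
print(output.getvalue())
```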
- static fragment_part(fragment, mark_reverse=False)[source]¶
Return the name of the fragment, or optionally NAME_r if the fragment is the reverse of another fragment.
- write_report(target, report_writer='default')[source]¶
Write a comprehensive simulation report in a folder or a zip file.
- Parameters:
target – Either a path to a folder, to a zip file, or "@memory" to write into a virtual zip file whose raw data is then returned.
report_writer – Either the “default” or any AssemblyReportWriter instance.
- Returns:
zip_data – Binary zip data if target=“@memory”, else None.
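Raw zip data returned for the "@memory" target can be inspected without touching the disk. The sketch below uses only the standard library; a tiny in-memory archive with a hypothetical file name stands in for the bytes that write_report would return.

```python
import io
import zipfile


def list_zip_contents(zip_data):
    """Return the file names stored in raw (bytes) zip data."""
    with zipfile.ZipFile(io.BytesIO(zip_data)) as archive:
        return archive.namelist()


# Stand-in for the bytes returned by write_report("@memory"):
# the file name below is a placeholder, not an actual report file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("report/summary.txt", "placeholder")
zip_data = buffer.getvalue()

print(list_zip_contents(zip_data))  # ['report/summary.txt']
```

The same bytes can be written to a file opened in binary mode to obtain a regular zip archive.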
Report Writer¶
- class dnacauldron.AssemblyReportWriter(include_fragment_plots='on_error', include_part_plots='on_error', include_mix_graphs='on_error', include_part_records=True, include_assembly_plots=False, show_overhangs_in_graph=True, annotate_parts_homologies=True, include_errors_spreadsheet=True, include_warnings_spreadsheet=True, include_pdf_report=False)[source]¶
Class to configure assembly simulation report writing.
Responsible for writing the final sequence(s) of the assembly in Genbank format as well as a .csv report on all assemblies produced and PDF figures to allow a quick overview or diagnostic.
Folder assemblies contains the final assemblies, assembly_graph contains a schematic view of how the parts assemble together, folder fragments contains the details of all fragments produced by the enzyme digestion, and folder provided_parts contains the original input (genbanks of all parts provided for the assembly mix).
- Parameters:
include_fragment_plots – Either True/False/”on_error” to plot schemas of the fragments used in the different AssemblyMix throughout the simulation.
include_part_plots – Either True/False/”on_error” to plot schemas of the parts used, possibly with restriction sites relevant to the AssemblyMix.
include_mix_graphs – Either True/False/”on_error” to plot representations of fragment connectivity in the AssemblyMix created during the simulation.
include_part_records – True/False to include the parts records in the simulation results (makes for larger folders and zips, but is better for traceability).
include_assembly_plots – True/False to include assembly schemas in the reports (makes the report generation slower, but makes it easier to check assemblies at a glance).
show_overhangs_in_graph – If true, the AssemblyMix graph representations will display the sequence of all fragment overhangs.
include_errors_spreadsheet – If true and there are errors, an errors spreadsheet will be added to the report.
include_warnings_spreadsheet – If true and there are warnings, a warnings spreadsheet will be added to the report.
include_pdf_report – If true, a PDF report file is also generated.
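Several of the options above accept True, False, or "on_error". One way to read that three-valued setting is sketched below; `should_include` is a hypothetical helper illustrating the documented meaning, not a function of the report writer.

```python
def should_include(option, has_errors):
    """Decide whether to produce an output for a True/False/'on_error' option."""
    if option == "on_error":
        # Only produce the output when the simulation reported errors.
        return has_errors
    return bool(option)


print(should_include("on_error", has_errors=True))   # True
print(should_include("on_error", has_errors=False))  # False
print(should_include(True, has_errors=False))        # True
```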
Homologies¶
- class dnacauldron.HomologyChecker(min_size=15, max_size=80, min_tm=0, max_tm=None, max_distance=0)[source]¶
- check_homology(sequence, other_sequence=None)[source]¶
Return whether there is an acceptable full-sequence homology between two sequences.