Useful classes for scripts
SequenceRepository
- class dnacauldron.SequenceRepository(collections=None, name='repo')
Sequence repositories store and provide sequence records.
The records are organized into collections, for instance “parts” to host parts, “constructs” for records created during assembly plan simulation, or any other collection name like “emma_connectors” to store EMMA connectors.
The suggested initialization of a sequence repository is:
>>> repository = SequenceRepository()
>>> repository.import_records(files=['part.fa', 'records.zip'])  # any number of files
- Parameters
- collections
A dict {‘collection_name’: {‘record_id’: record, …}, …} giving for each collection a dict of Biopython records.
- name
The name of the repository as it may appear in error messages and other reports.
- add_record(self, record, collection='parts')
Add one record to a collection, using its record.id as key.
The collection is created if it doesn’t exist.
The record can also be a pair (id, “ATGTGCC…”).
- contains_record(self, record_id)
Return whether the repo has a record corresponding to the given id.
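The collection layout and the behaviour of add_record and contains_record described above can be pictured with plain dictionaries. This is only a minimal sketch, not dnacauldron's implementation: `Record` is a hypothetical stand-in for a Biopython record, and the two functions are free functions written purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stand-in for a Biopython record (id and sequence only)."""
    id: str
    seq: str


def add_record(collections, record, collection="parts"):
    """Store a record under its id, creating the collection if needed."""
    if isinstance(record, tuple):
        # A pair (id, "ATGC...") is turned into a record first.
        record_id, sequence = record
        record = Record(id=record_id, seq=sequence)
    collections.setdefault(collection, {})[record.id] = record


def contains_record(collections, record_id):
    """Return True if any collection holds a record with this id."""
    return any(record_id in records for records in collections.values())


# Layout: {"collection_name": {"record_id": record, ...}, ...}
collections = {}
add_record(collections, Record(id="part_A", seq="ATGTGCC"))
add_record(collections, ("conn_1", "GGTCTCA"), collection="emma_connectors")
```

With the real library, the same steps would go through a SequenceRepository instance and its add_record and contains_record methods.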
- get_part_names_by_collection(self, format='dict')
Return a dictionary or a string representing the repo’s content.
The format can be “dict” or “string”.
- import_records(self, files=None, folder=None, collection='parts', use_file_names_as_ids=True, topology='default_to_linear')
Import records into the repository, from files, zips, and folders.
- Parameters
- files
A list of file paths, either Genbank, Fasta, Snapgene (.dna), or zips containing any of these formats.
- folder
Path to a folder which can be provided instead of files.
- collection
Name of the collection under which to import the new records.
- use_file_names_as_ids
If True, the file name will be used as ID for any record obtained from a single-record file (fasta files with many records will still use the internal ID).
- topology
Can be “circular”, “linear”, “default_to_circular” (will default to circular if annotations['topology'] is not already set) or “default_to_linear”.
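The four topology options can be read as a small decision rule. The sketch below is an assumption-laden illustration, not the library's code: `resolve_topology` is a hypothetical helper, `annotations` plays the role of a record's annotations dict, and it assumes that the explicit values “circular” and “linear” override any existing annotation.

```python
def resolve_topology(annotations, topology="default_to_linear"):
    """Return the topology a record would end up with (illustrative only)."""
    if topology in ("circular", "linear"):
        # Assumption: explicit values override whatever the record says.
        return topology
    if topology in ("default_to_circular", "default_to_linear"):
        fallback = topology[len("default_to_"):]
        # Keep an existing annotation, otherwise use the fallback.
        return annotations.get("topology", fallback)
    raise ValueError(f"Unknown topology option: {topology!r}")


unannotated = resolve_topology({}, "default_to_linear")
annotated = resolve_topology({"topology": "circular"}, "default_to_linear")
```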
Assembly Plan
- class dnacauldron.AssemblyPlan(assemblies, name='plan', logger='bar')
- static from_spreadsheet(path=None, dataframe=None, assembly_class='from_spreadsheet', sheet_name='all', header=None, name='auto_from_filename', logger='bar', assembly_class_dict='default', is_csv='auto_from_filename', **assembly_params)
Import an assembly plan from a spreadsheet.
You can either read these docs or browse the examples in the repo. Note that this function autoselects the enzyme based on the sites in each part. To set enzymes explicitly, set assembly.enzyme for each assembly in AssemblyPlan.assemblies.
- Parameters
- path
Path to a spreadsheet file (a dataframe can be used instead).
- dataframe
A pandas dataframe, possibly obtained from a spreadsheet.
- sheet_name
Name of the spreadsheet’s sheet on which the assembly plan is defined. Use “all” to load assemblies from all the sheets.
- header
True or False, indicates whether there is a header in the spreadsheet.
- name
Name of the assembly plan (leave to “auto_from_filename” to use the file name as assembly plan name).
- logger
Logger of the created assembly plan. Either “bar” for a progress bar or None for none, or any Proglog logger.
- assembly_params
Extra keyword parameters which will be fed to each assembly.
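Overriding the autoselected enzyme, as mentioned in the description of from_spreadsheet, amounts to a loop over the plan's assemblies. The sketch below uses `SimpleNamespace` objects as hypothetical stand-ins for a plan and its assemblies so that it runs without the library; with dnacauldron installed, `plan` would instead be the object returned by AssemblyPlan.from_spreadsheet.

```python
from types import SimpleNamespace

# Stand-in for a plan returned by AssemblyPlan.from_spreadsheet(...):
# only the `assemblies` list and each assembly's `enzyme` attribute matter here.
plan = SimpleNamespace(
    assemblies=[
        SimpleNamespace(name="construct_1", enzyme="auto"),
        SimpleNamespace(name="construct_2", enzyme="auto"),
    ]
)

# Replace enzyme autoselection with an explicit choice for every assembly.
for assembly in plan.assemblies:
    assembly.enzyme = "BsmBI"

enzymes = [assembly.enzyme for assembly in plan.assemblies]
```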
Assembly Plan Simulation
- class dnacauldron.AssemblyPlan.AssemblyPlanSimulation(assembly_plan, assembly_simulations, sequence_repository=None, cancelled=())
- compute_all_construct_data_dicts(self)
Return the list of data dicts for each assembly simulation.
- compute_stats(self)
Return a dictionary of stats.
For instance {“cancelled_assemblies”: 2, “errored_assemblies”: 1, “valid_assemblies”: 5}.
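The shape of the stats dictionary can be reproduced with a simple tally. This is a sketch only: the function below is a hypothetical free function, and the status labels “valid”, “cancelled” and “errored” are assumptions made for illustration, not values taken from the library.

```python
from collections import Counter


def compute_stats(statuses):
    """Tally per-assembly outcomes into a stats dictionary (illustrative)."""
    counts = Counter(statuses)
    return {
        "cancelled_assemblies": counts["cancelled"],
        "errored_assemblies": counts["errored"],
        "valid_assemblies": counts["valid"],
    }


# One (assumed) status label per simulated assembly.
statuses = ["valid"] * 5 + ["cancelled"] * 2 + ["errored"]
stats = compute_stats(statuses)
```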
- write_report(self, target, folder_name='auto', assembly_report_writer='default', logger='bar', include_original_parts_records=True)
Write a comprehensive report to a folder or zip file.
- Parameters
- target
Either a path to a folder, to a zip file, or "@memory" to write into a virtual zip file whose raw data is then returned.
- folder_name
Name of the folder created inside the target to host the report (yes, it is a folder inside a folder, which can be very practical).
- assembly_report_writer
Either the “default” or any AssemblyReportWriter instance.
- logger
Either “bar” for a progress bar, or None, or any Proglog logger.
- include_original_parts_records
If true, the original provided part records will be included in the report (creates larger file sizes, but better for traceability).
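To make the "@memory" target concrete, the sketch below shows the general idea of writing a zip archive into an in-memory buffer and returning its raw bytes, using only the standard library. `write_demo_report` is a hypothetical function, it does not handle folder targets, and it makes no claim about how dnacauldron builds its reports; the `report/` prefix illustrates the folder-inside-the-target layout controlled by folder_name.

```python
import io
import zipfile


def write_demo_report(target):
    """Write a tiny report to a zip path, or to memory if target is "@memory"."""
    buffer = io.BytesIO() if target == "@memory" else target
    with zipfile.ZipFile(buffer, "w") as archive:
        # The report lives in a folder inside the archive.
        archive.writestr("report/summary.txt", "valid_assemblies: 5\n")
    # Raw zip bytes are only returned for the in-memory case.
    return buffer.getvalue() if target == "@memory" else None


zip_data = write_demo_report("@memory")

# The returned bytes can be reopened as a regular zip archive.
with zipfile.ZipFile(io.BytesIO(zip_data)) as archive:
    names = archive.namelist()
```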
-
Assembly Classes
- class dnacauldron.Type2sRestrictionAssembly(parts, name='type2s_assembly', enzyme='auto', connectors_collection=None, expected_constructs=1, expect_no_unused_parts=True, max_constructs=40, randomize_constructs=False, dependencies=None)
Representation and simulation of type-2s (Golden-Gate) assembly.
- Parameters
- parts
A list of parts names corresponding to records in a repository. These parts will be restricted and ligated together. They can be linear, circular, and in any order.
- enzyme
Any type-2s enzyme (“BsmBI”, “BsaI”, “SapI”, etc.), or leave to “auto” to autodetect the enzyme.
- name
Name of the assembly as it will appear in reports.
- max_constructs
None or a number of maximum assemblies to compute (avoids complete freeze for combinatorial assemblies with extremely many possibilities).
- expected_constructs
Either a number or the string 'any_number'. If the number of constructs doesn’t match this value, the assembly will be considered invalid in reports and summaries.
- connectors_collection
Name of a collection in the repository from which to get candidates for connector autocompletion.
- expect_no_unused_parts
If True and some parts are unused, this will be considered an invalid assembly in summaries and reports.
- dependencies
(do not use). Metadata indicating which assemblies depend on this assembly, or are depended on by it.
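The validity rules attached to expected_constructs and expect_no_unused_parts can be summarised as two checks. The following is a sketch under assumptions: `find_flaws` is a hypothetical helper returning plain strings, whereas the library reports such problems through its own warning and error objects.

```python
def find_flaws(n_constructs, unused_parts,
               expected_constructs=1, expect_no_unused_parts=True):
    """Return a list of human-readable problems (illustrative only)."""
    flaws = []
    # 'any_number' disables the construct-count check.
    if expected_constructs != "any_number" and n_constructs != expected_constructs:
        flaws.append(
            f"expected {expected_constructs} construct(s), found {n_constructs}"
        )
    # Unused parts only matter when the assembly expects none.
    if expect_no_unused_parts and unused_parts:
        flaws.append("unused parts: " + ", ".join(sorted(unused_parts)))
    return flaws


clean = find_flaws(1, [])
problems = find_flaws(2, ["part_B"])
```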
- class dnacauldron.GibsonAssembly(parts, homology_checker='default', name='homologous_assembly', connectors_collection=None, expected_constructs=1, expect_no_unused_parts=True, max_constructs=40, dependencies=None)
Representation and simulation of Gibson Assembly.
- Parameters
- parts
A list of parts names corresponding to records in a repository. The parts will be considered as assembling together if they have end homologies, as checked by the homology_checker.
- homology_checker
A HomologyChecker instance defining which homology sizes and melting temperatures are valid.
- name
Name of the assembly as it will appear in reports.
- max_constructs
None or a number of maximum assemblies to compute (avoids complete freeze for combinatorial assemblies with extremely many possibilities).
- expected_constructs
Either a number or the string 'any_number'. If the number of constructs doesn’t match this value, the assembly will be considered invalid in reports and summaries.
- connectors_collection
Name of a collection in the repository from which to get candidates for connector autocompletion.
- expect_no_unused_parts
If True and some parts are unused, this will be considered an invalid assembly in summaries and reports.
- dependencies
(do not use). Metadata indicating which assemblies depend on this assembly, or are depended on by it.
- class dnacauldron.BASICAssembly(parts, name='unnamed_assembly', max_constructs=40, dependencies=None, expected_constructs=1, connectors_collection=None)
Representation and simulation of BASIC Assembly.
In this class, the order of the parts matters! It should be organized as triplets of the form (adapter, part, adapter), as follows:
>>> a1 PART_1 a2 a3 PART_2 a4 a5 PART_3 ...
Where a1 and a2 are the BASIC adapters for PART_1, etc. The parts (PART_i) should be standard biopython records, while the adapters should be sticky-ended fragments, obtained for instance from an OligoPairAnnealing assembly (see the provided example).
- Parameters
- parts
List of part names corresponding to part records in a repository. See explanations above.
- name
Name of the assembly as it will appear in reports.
- max_constructs
None or a number of maximum assemblies to compute (avoids complete freeze for combinatorial assemblies with extremely many possibilities).
- expected_constructs
Either a number or the string 'any_number'. If the number of constructs doesn’t match this value, the assembly will be considered invalid in reports and summaries.
- connectors_collection
Name of a collection in the repository from which to get candidates for connector autocompletion.
- dependencies
(do not use). Metadata indicating which assemblies depend on this assembly, or are depended on by it.
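The (adapter, part, adapter) ordering required by BASICAssembly can be checked before building an assembly by chunking the flat part list into triplets. `group_basic_triplets` below is a hypothetical helper written for illustration; it only inspects names and knows nothing about the underlying records.

```python
def group_basic_triplets(parts):
    """Split a flat BASIC part list into (adapter, part, adapter) triplets."""
    if len(parts) % 3 != 0:
        raise ValueError(
            "BASIC part lists must contain whole (adapter, part, adapter) triplets"
        )
    return [tuple(parts[i:i + 3]) for i in range(0, len(parts), 3)]


parts = ["a1", "PART_1", "a2", "a3", "PART_2", "a4"]
triplets = group_basic_triplets(parts)
```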
- class dnacauldron.BioBrickStandardAssembly(parts, name='unnamed_assembly', connectors_collection=None, expected_constructs=1, max_constructs=40, dependencies=None)
Representation and simulation of the BioBrick 2-part assembly standard.
- Parameters
- parts
A list of parts names corresponding to records in a repository. There must be exactly 2 parts and they must be represented on a backbone (i.e. circular constructs), and the first part will be inserted in the backbone of the second part, upstream of the second part.
- name
Name of the assembly as it will appear in reports.
- max_constructs
None or a number of maximum assemblies to compute (avoids complete freeze for combinatorial assemblies with extremely many possibilities).
- expected_constructs
Either a number or the string 'any_number'. If the number of constructs doesn’t match this value, the assembly will be considered invalid in reports and summaries.
- connectors_collection
Name of a collection in the repository from which to get candidates for connector autocompletion.
- dependencies
(do not use). Metadata indicating which assemblies depend on this assembly, or are depended on by it.
- class dnacauldron.LigaseCyclingReactionAssembly(parts, bridging_oligos=(), oligo_indicator=None, homology_checker='default', name='homologous_assembly', connectors_collection=None, expected_constructs=1, expect_no_unused_parts=True, max_constructs=40, dependencies=None)
Representation and simulation of Ligase Cycling Reaction (LCR) assembly.
- Parameters
- parts
A list of parts names corresponding to records in a repository. Bridging oligo names can also be provided in this part list, however in that case bridging_oligos should be an empty list and an oligo_indicator string should be provided.
- homology_checker
A HomologyChecker instance defining which homology sizes and melting temperatures are valid between one bridging oligo and one part.
- bridging_oligos
A list of the names of the bridging oligos, if they are not included in the part names.
- oligo_indicator
String to use to identify bridging oligos when these are provided mixed with the other parts. The string should be common to all oligo names but should not appear in any part name. For instance “BO_”.
- name
Name of the assembly as it will appear in reports.
- max_constructs
None or a number of maximum assemblies to compute (avoids complete freeze for combinatorial assemblies with extremely many possibilities).
- expected_constructs
Either a number or the string 'any_number'. If the number of constructs doesn’t match this value, the assembly will be considered invalid in reports and summaries.
- expect_no_unused_parts
If True and some parts are unused, this will be considered an invalid assembly in summaries and reports.
- dependencies
(do not use). Metadata indicating which assemblies depend on this assembly, or are depended on by it.
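When bridging oligos are mixed into the part list, the oligo_indicator string is what tells the two apart. The sketch below assumes a simple substring match, which is consistent with the description above but is not confirmed to be the library's exact rule; `split_parts_and_oligos` is a hypothetical helper.

```python
def split_parts_and_oligos(names, oligo_indicator):
    """Separate bridging oligo names from part names (substring match)."""
    oligos = [name for name in names if oligo_indicator in name]
    parts = [name for name in names if oligo_indicator not in name]
    return parts, oligos


mixed = ["part_A", "BO_A_B", "part_B", "BO_B_A"]
parts, bridging_oligos = split_parts_and_oligos(mixed, "BO_")
```

This is why the indicator should be common to all oligo names and absent from every part name: any part containing it would be misclassified as an oligo.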
Assembly Simulation
- class dnacauldron.Assembly.AssemblySimulation(assembly, sequence_repository, construct_records=(), mixes=(), warnings=(), errors=())
Class to represent and report on the simulation of a single assembly.
Instances are the result of assembly.simulate().
- Parameters
- assembly
The Assembly instance from which this is the simulation.
- sequence_repository
The SequenceRepository used to get records for the simulation.
- construct_records
List of Biopython records (or, sometimes, StickyEndFragment records) of the final constructs predicted by the simulation.
- mixes
A list of AssemblyMix instances generated during the simulation (they can be plotted at report writing time).
- warnings
List of AssemblyFlaw instances that will be flagged as warnings in reports and summaries.
- errors
List of AssemblyFlaw instances that will be flagged as errors in reports and summaries.
- compute_all_construct_data_dicts(self)
Return a list of dictionaries with information on each construct.
Fields: construct_id, parts, number_of_parts, construct_size, assembly_name, depends_on, used_in, assembly_level.
- compute_construct_data_dict(self, construct_record)
Return a dictionary with information on a single construct.
Fields: construct_id, parts, number_of_parts, construct_size, assembly_name, depends_on, used_in, assembly_level.
- static fragment_part(fragment, mark_reverse=False)
Return the name of the fragment, or optionally NAME_r if the fragment is the reverse of another fragment.
- list_all_parts_used(self)
List all parts involved in at least one of the predicted constructs.
- write_report(self, target, report_writer='default')
Write a comprehensive simulation report in a folder or a zip file.
- Parameters
- target
Either a path to a folder, to a zip file, or "@memory" to write into a virtual zip file whose raw data is then returned.
- report_writer
Either the “default” or any AssemblyReportWriter instance.
- Returns
- zip_data
Binary zip data if target is “@memory”, else None.
Report Writer
- class dnacauldron.AssemblyReportWriter(include_fragment_plots='on_error', include_part_plots='on_error', include_mix_graphs='on_error', include_part_records=True, include_assembly_plots=False, show_overhangs_in_graph=True, annotate_parts_homologies=True, include_errors_spreadsheet=True, include_warnings_spreadsheet=True, include_pdf_report=False)
Class to configure assembly simulation report writing.
Responsible for writing the final sequence(s) of the assembly in Genbank format, as well as a .csv report on all assemblies produced and PDF figures to allow a quick overview or diagnostic.
Folder assemblies contains the final assemblies, assembly_graph contains a schematic view of how the parts assemble together, folder fragments contains the details of all fragments produced by the enzyme digestion, and folder provided_parts contains the original input (Genbank records of all parts provided for the assembly mix).
- Parameters
- include_fragment_plots
Either True/False/”on_error” to plot schemas of the fragments used in the different AssemblyMix throughout the simulation.
- include_part_plots
Either True/False/”on_error” to plot schemas of the parts used, possibly with restriction sites relevant to the AssemblyMix.
- include_mix_graphs
Either True/False/”on_error” to plot representations of fragment connectivity in the AssemblyMix created during the simulation.
- include_part_records
True/False to include the parts records in the simulation results (makes for larger folders and zips, but is better for traceability).
- include_assembly_plots
True/False to include assembly schemas in the reports (makes the report generation slower, but makes it easier to check assemblies at a glance).
- show_overhangs_in_graph
If true, the AssemblyMix graph representations will display the sequence of all fragment overhangs.
- include_errors_spreadsheet
If true and there are errors, an errors spreadsheet will be added to the report.
- include_warnings_spreadsheet
If true and there are warnings, a warnings spreadsheet will be added to the report.
- include_pdf_report
If true, a PDF report file is also generated.
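Several of the options above accept True, False, or “on_error”. One plausible reading of that tri-state, sketched with a hypothetical helper rather than the library's own code, is that “on_error” includes the item only when the simulation produced errors.

```python
def should_include(option, has_errors):
    """Resolve a True / False / "on_error" option (illustrative only)."""
    if option == "on_error":
        # Only produce the plot or graph when something went wrong.
        return has_errors
    return bool(option)


# Fragment plots requested "on_error" for a failing and a clean simulation.
plots_for_failed = should_include("on_error", has_errors=True)
plots_for_clean = should_include("on_error", has_errors=False)
```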
Homologies
- class dnacauldron.HomologyChecker(min_size=15, max_size=80, min_tm=0, max_tm=None, max_distance=0)
- check_homology(self, sequence, other_sequence=None)
Return whether there is an acceptable full-sequence homology between two sequences.
-