API reference
Domesticators
- class genedom.PartDomesticator(name='unnamed domesticator', left_flank='', right_flank='', constraints=(), objectives=(), cds_by_default=False, description=None, simultaneous_mutations=1, minimize_edits=True, logger=None)[source]
Generic domesticator.
- Parameters:
name – Domesticator name as it will appear in reports etc.
description – Short domesticator description as it will appear in reports etc.
left_flank – String. Left addition to the sequence (homology arms, enzyme sites etc).
right_flank – String. Right addition to the sequence (homology arms, enzyme sites etc).
constraints – Either DnaChisel constraints or functions (sequence => constraint) to be applied to the sequence for optimization.
objectives – Either DnaChisel objectives or functions (sequence => objective) to be applied to the sequence for optimization.
simultaneous_mutations – Number of sequence mutations to be applied simultaneously during optimization. A larger number creates more noise but may help solve tougher problems.
minimize_edits – If true, the optimizer will attempt to minimize changes while making sure the constraints hold (each edit incurs a penalty of 1 in the total optimization score).
logger – A proglog logger or ‘bar’ or None for no logger at all.
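The minimize_edits penalty described above can be pictured as a count of changed positions. The following is only a minimal sketch under that assumption: count_edits is a hypothetical helper, not part of genedom or DnaChisel, and the optimizer's actual scoring may be computed differently.

```python
def count_edits(original, edited):
    """Count positions where two equal-length sequences differ."""
    if len(original) != len(edited):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(original, edited))


# Hypothetical example: a single substitution removes the BsmBI site CGTCTC.
original = "ATGCGTCTCAAA"
edited = "ATGCGTCTGAAA"

# Assumption: each changed nucleotide costs 1 in the total score.
edit_penalty = count_edits(original, edited)
print(edit_penalty)
```

Under this reading, two candidate solutions that both satisfy the constraints would be ranked by how few nucleotides they change.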
- domesticate(dna_sequence=None, protein_sequence=None, is_cds='default', codon_optimization=None, extra_constraints=(), extra_objectives=(), final_record_target=None, edit=False, barcode='', barcode_spacer='AA', report_target=None)[source]
Domesticate a sequence.
- Parameters:
dna_sequence – The DNA sequence string to domesticate.
protein_sequence – Amino-acid sequence of the protein, which will be converted into a DNA sequence string.
is_cds – If True, sequence edits are restricted to synonymous mutations.
codon_optimization – Either None for no codon optimization or the name of an organism supported by DnaChisel.
extra_constraints – List of extra constraints to apply to the domesticated sequences. Each constraint is either a DnaChisel constraint or a function (dna_sequence => DnaChisel constraint).
extra_objectives – List of extra optimization objectives to apply to the domesticated sequences. Each objective is either a DnaChisel objective or a function (dna_sequence => DnaChisel objective).
final_record_target – Path to the file where to write the final genbank.
edit – Set to True to allow sequence edits. If it is False and not all constraints are originally satisfied, a failed domestication result (i.e. with attribute success set to False) will be returned.
report_target – Target for the sequence optimization report (a folder path, or a zip path).
barcode – A sequence of DNA that will be added to the left of the sequence once the domestication is done.
barcode_spacer – Nucleotides to be added between the barcode and the enzyme (optional, the idea here is that they will make sure to avoid the creation of unwanted cutting sites).
- Return type:
final_record, edits_record, report_data, success, msg
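The role of barcode and barcode_spacer can be illustrated with plain string operations. This is a hedged sketch, not genedom's implementation: add_barcode and count_sites are hypothetical helpers, and the sequences are made up. It shows the barcode placed to the left of the domesticated sequence, with the spacer in between, and a check that the junction creates no extra BsmBI site (CGTCTC).

```python
def reverse_complement(seq):
    """Reverse-complement of an uppercase ACGT sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_sites(seq, site):
    """Count occurrences of a restriction site on both strands."""
    patterns = {site, reverse_complement(site)}
    return sum(seq.startswith(p, i) for p in patterns for i in range(len(seq)))


def add_barcode(domesticated, barcode, spacer="AA"):
    """Place the barcode, then the spacer, to the left of the sequence."""
    return barcode + spacer + domesticated


# Hypothetical domesticated part: BsmBI site, overhangs and a short insert.
domesticated = "CGTCTCAATTCATGAAACCCGGGTTTATCGTGAGACG"
barcode = "TACGGATCCTAAGCTTGACA"

final = add_barcode(domesticated, barcode, spacer="AA")
sites_before = count_sites(domesticated, "CGTCTC")
sites_after = count_sites(final, "CGTCTC")
print(sites_before, sites_after)
```

If the two counts differed, the barcode or spacer would have introduced an unwanted cutting site, which is the situation the spacer is meant to help avoid.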
- class genedom.GoldenGateDomesticator(left_overhang, right_overhang, left_addition='', right_addition='', enzyme='BsmBI', extra_avoided_sites=(), description='Golden Gate domesticator', name='unnamed_domesticator', cds_by_default=False, constraints=(), objectives=())[source]
Special domesticator class for Golden-Gate standards.
- Parameters:
left_overhang – 4bp overhang to be added on the left.
right_overhang – 4bp overhang to be added on the right.
left_addition – Extra sequence of DNA to be systematically added on the left of each part between the enzyme site and the rest of the sequence.
right_addition – Extra sequence to be systematically added on the right of each part between the enzyme site and the rest of the sequence.
enzyme – Enzyme used for the Golden Gate assembly. This enzyme will be added on the flanks of the sequence, and the internal sequence will be protected against sites from this enzyme during optimization.
extra_avoided_sites – Other enzymes from which the sequence should be protected during optimization, in addition to the assembly enzyme.
description – Description of the domesticator as it will appear in reports.
name – Name of the domesticator as it will appear in reports.
constraints – Either DnaChisel constraints or functions (sequence => constraint) to be applied to the sequence for optimization.
objectives – Either DnaChisel objectives or functions (sequence => objective) to be applied to the sequence for optimization.
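To make the relationship between enzyme, overhangs and additions concrete, here is a rough sketch of how Golden Gate flanks could be composed. It is an illustration under stated assumptions, not the class's actual code: golden_gate_flanks is a hypothetical function, the single-nucleotide spacer "A" and the exact ordering of the pieces are assumptions, and only the BsmBI and BsaI recognition sites (both cutting one nucleotide downstream and leaving a 4bp overhang) are taken as given.

```python
ENZYME_SITES = {"BsmBI": "CGTCTC", "BsaI": "GGTCTC"}


def reverse_complement(seq):
    """Reverse-complement of an uppercase ACGT sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def golden_gate_flanks(left_overhang, right_overhang, left_addition="",
                       right_addition="", enzyme="BsmBI", spacer="A"):
    """Return (left_flank, right_flank) for a Golden Gate part.

    Assumed layout: site + spacer + overhang + addition on the left, and the
    mirror image on the right with the site reverse-complemented.
    """
    site = ENZYME_SITES[enzyme]
    left_flank = site + spacer + left_overhang + left_addition
    right_flank = (right_addition + right_overhang + spacer
                   + reverse_complement(site))
    return left_flank, right_flank


left_flank, right_flank = golden_gate_flanks("ATTC", "ATCG")
part = left_flank + "ATGAAACCCGGGTTT" + right_flank
print(left_flank, right_flank)
```

In this layout the additions sit between the overhang and the part's own sequence, so they remain in the final construct after digestion, while the enzyme sites are cut away.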
- static standard_from_spreadsheet(path=None, dataframe=None, name_prefix='')[source]
Parse a spreadsheet into a standard with Golden Gate domesticators.
The input should be a table with the following column names: slot_name, left_overhang, right_overhang, left_addition, right_addition, enzyme, extra_avoided_sites, description.
- Parameters:
path – Path to a CSV or XLS(X) file. A dataframe can be provided instead.
dataframe – A pandas Dataframe which can be provided instead of a path.
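A table with the column names listed above could be prepared as follows. This is a sketch using only the standard library: the slot names, overhangs and descriptions are hypothetical, the extra_avoided_sites cells are left empty because their expected format is not specified here, and the call to standard_from_spreadsheet itself is not shown.

```python
import csv
import io

COLUMNS = ["slot_name", "left_overhang", "right_overhang", "left_addition",
           "right_addition", "enzyme", "extra_avoided_sites", "description"]

# Hypothetical slots: the right overhang of one slot is the left overhang
# of the next, so that adjacent parts can be assembled together.
rows = [
    {"slot_name": "promoter", "left_overhang": "GGAG",
     "right_overhang": "TACT", "left_addition": "", "right_addition": "",
     "enzyme": "BsaI", "extra_avoided_sites": "",
     "description": "Hypothetical promoter slot"},
    {"slot_name": "utr", "left_overhang": "TACT",
     "right_overhang": "AATG", "left_addition": "", "right_addition": "",
     "enzyme": "BsaI", "extra_avoided_sites": "",
     "description": "Hypothetical 5' UTR slot"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Saving csv_text to a .csv file would give a spreadsheet with one row per slot and the expected header.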
- static create_standard_from_parts_and_overhang_list(parts, path=None, dataframe=None, enzyme='', extra_avoided_sites='')[source]
Create a standard from a list of parts and an overhang list spreadsheet.
This is a specialised method that’s useful for creating a table for a standard. It assumes that part prefixes are made up of two characters: the first one denotes the left overhang, the second one denotes the right overhang. The standard table is created by looping through the prefixes in the part names, and looking up the corresponding overhangs in the input spreadsheet. The input spreadsheet should have the following column names: overhang_name, overhang. Note that this function does not create entries for the non-specified columns, but these can be set subsequently.
- Parameters:
parts – A list of Biopython SeqRecord instances.
path – Path to a CSV or XLS(X) file. A dataframe can be provided instead.
dataframe – A pandas Dataframe which can be provided instead of a path.
enzyme – Populate the enzyme column with the specified string.
extra_avoided_sites – Populate the extra_avoided_sites column with the specified string.
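The prefix lookup described above can be sketched with plain dictionaries. This is not the method's implementation: standard_rows is a hypothetical function, part names stand in for SeqRecord instances, the prefix is assumed to be the first two characters of each name, and the overhang table is invented for the example.

```python
def standard_rows(part_names, overhangs, enzyme="", extra_avoided_sites=""):
    """Build one table row per distinct two-character part prefix."""
    rows = {}
    for name in part_names:
        prefix = name[:2]  # assumption: the prefix is the first two characters
        if prefix in rows:
            continue
        left_name, right_name = prefix
        rows[prefix] = {
            "slot_name": prefix,
            "left_overhang": overhangs[left_name],
            "right_overhang": overhangs[right_name],
            "enzyme": enzyme,
            "extra_avoided_sites": extra_avoided_sites,
        }
    return list(rows.values())


# Hypothetical overhang list (columns overhang_name, overhang) as a dict.
overhangs = {"A": "GGAG", "B": "TACT", "C": "AATG"}
part_names = ["AB_promoter1", "AB_promoter2", "BC_utr"]

table = standard_rows(part_names, overhangs, enzyme="BsaI")
for row in table:
    print(row)
```

Columns such as left_addition, right_addition and description are deliberately absent from the rows, mirroring the note that non-specified columns are not created.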
Standards
- genedom.StandardDomesticatorsSet
alias of module genedom.StandardDomesticatorsSet
Batch domestication
- genedom.batch_domestication(records, target, domesticator=None, standard=None, allow_edits=False, domesticated_suffix='', include_optimization_reports=True, include_original_records=True, barcodes=(), barcode_order='same_as_records', barcode_spacer='AA', logger='bar', max_length=20)[source]
Domesticate a batch of parts according to some domesticator/standard.
Examples
>>> from genedom import BUILTIN_STANDARDS, batch_domestication
>>> batch_domestication(some_records, "@memory", standard=BUILTIN_STANDARDS.EMMA)
- Returns:
Where n_fails indicates how many optimizations have failed, and zip_data is binary zip data (bytes), in the case where target=”@memory”.
- Return type:
(n_fails, zip_data)
- Parameters:
records – List of Biopython records to be domesticated.
target – Path to a folder, to a zip file, or “@memory” for in-memory report generation (the raw binary data of a zip archive is then returned).
domesticator – Either a single domesticator, to be used for all parts in the batch, or a function f(record) => appropriate_domesticator. Note that a “standard” can be provided instead.
standard – A StandardDomesticatorsSet object which will be used to attribute a specific domesticator to each part. See BUILTIN_STANDARDS for examples.
allow_edits – If False, sequences cannot be edited by the domesticator, only extended with flanks. If a sequence has, for instance, forbidden restriction sites, the domestication will fail for this sequence (and this will be noted in the report).
domesticated_suffix – Suffix to give to the domesticated parts names to differentiate them from the original parts (this is optional).
include_optimization_reports – If True, GenBank and PDF files will be produced to show how each part was domesticated. This is particularly informative when a domestication fails and you want to understand why.
include_original_records – Will include the input records into the final report folder/archive, for traceability.
barcodes – Either a list [(barcode_name, barcode),…] or a dictionary {name: bc} or a BarcodesCollection instance. If any of these is provided, the final parts will have a barcode added on the left (this barcode will be “outside” the part and won’t appear in final constructs, but can be used to check that the part is the one you expect if your samples get mixed up). Note that if there are fewer barcodes than parts, the barcodes will cycle and several parts may get the same barcode (which is generally fine).
barcode_order – Either “same_as_records”, or “by_size” if you want your barcodes to be attributed from the smallest to the longest part in the batch.
barcode_spacer – Sequence to appear between the barcode and the left flank of the domesticated part.
logger – Either “bar” or None for no logger or any Proglog ProgressBarLogger.
max_length – The maximum length of the name of the sequences. Some DNA synthesis companies require names to be below a certain length.
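The barcode cycling and the two barcode_order modes described above can be sketched as follows. This is an illustration only: assign_barcodes is a hypothetical helper, parts are represented as (name, sequence) tuples rather than Biopython records, and the barcode sequences are made up.

```python
from itertools import cycle


def assign_barcodes(parts, barcodes, barcode_order="same_as_records"):
    """Map each part name to a (barcode_name, barcode) pair.

    parts: list of (name, sequence); barcodes: list of (barcode_name, barcode).
    Barcodes are reused in a cycle when there are fewer barcodes than parts.
    """
    if barcode_order == "by_size":
        ordered = sorted(parts, key=lambda part: len(part[1]))
    elif barcode_order == "same_as_records":
        ordered = list(parts)
    else:
        raise ValueError("unknown barcode_order: %r" % barcode_order)
    return {name: bc for (name, _), bc in zip(ordered, cycle(barcodes))}


# Hypothetical parts of different lengths and only two barcodes.
parts = [("p1", "A" * 300), ("p2", "A" * 100), ("p3", "A" * 200)]
barcodes = [("B_001", "TACGGATCCTAAGCTTGACA"), ("B_002", "GATTCCAGTCGAACTGTCAA")]

same_order = assign_barcodes(parts, barcodes)
by_size = assign_barcodes(parts, barcodes, barcode_order="by_size")
print({name: bc[0] for name, bc in same_order.items()})
print({name: bc[0] for name, bc in by_size.items()})
```

With three parts and two barcodes, the first barcode is reused for the third part in whichever order is chosen.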
BarcodesCollection
- class genedom.BarcodesCollection(barcodes)[source]
Class representing a set of named barcode sequences.
These barcodes are meant to be annealed with same-sequence primers for PCR or sequencing.
The constructor takes a list [(name, barcode), …] as an input.
Use BarcodesCollection.from_specs(n_barcodes=25) to generate an instance with 25 compatible barcodes.
- static from_specs(n_barcodes=96, barcode_length=20, spacer='AA', forbidden_enzymes=('BsaI', 'BsmBI', 'BbsI'), barcode_tmin=55, barcode_tmax=70, other_primer_sequences=(), heterodim_tmax=5, max_homology_length=10, include_spacers=True, names_template='B_%03d')[source]
Return a BarcodesCollection object with compatible barcodes.
- Parameters:
n_barcodes – Number of barcodes to design.
barcode_length – Length of each barcode.
spacer – Spacer to place between each barcode during the optimization, ideally the same spacer that will be used when adding the barcode to a part.
include_spacers – Whether the spacers should be part of the final sequence of the barcodes (they still won’t be considered part of the annealing primer and won’t be used for melting temperature computations).
forbidden_enzymes – Name of enzymes whose sites should not be in the barcodes.
barcode_tmin – Lower bound of the interval of acceptable values for the melting temperature.
barcode_tmax – Upper bound of the interval of acceptable values for the melting temperature.
other_primer_sequences – External sequences with which the primers should not anneal.
heterodim_tmax – Max acceptable melting temperature for the annealing of a barcode and one of the other_primer_sequences.
max_homology_length – Maximal homology between any two barcodes in the sequence.
names_template – The template used to name barcode number “i”.
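Some of these criteria can be checked on candidate barcodes with a few lines of standard-library code. The sketch below is not how from_specs designs barcodes: the helper names are hypothetical, melting temperatures are not computed, and numbering the barcodes from 1 is an assumption. It covers the forbidden_enzymes check (on both strands), a longest shared substring as a stand-in for max_homology_length, and the names_template formatting.

```python
FORBIDDEN_SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC", "BbsI": "GAAGAC"}


def reverse_complement(seq):
    """Reverse-complement of an uppercase ACGT sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def has_forbidden_site(barcode, enzymes=("BsaI", "BsmBI", "BbsI")):
    """True if the barcode contains a site of any enzyme, on either strand."""
    for enzyme in enzymes:
        site = FORBIDDEN_SITES[enzyme]
        if site in barcode or reverse_complement(site) in barcode:
            return True
    return False


def longest_common_substring(a, b):
    """Length of the longest stretch shared by two sequences."""
    best = 0
    previous = [0] * (len(b) + 1)
    for char_a in a:
        current = [0]
        for j, char_b in enumerate(b, start=1):
            length = previous[j - 1] + 1 if char_a == char_b else 0
            current.append(length)
            best = max(best, length)
        previous = current
    return best


# Assumption: barcode number i is named by applying the template to i.
names = ["B_%03d" % i for i in (1, 2, 3)]
shared = longest_common_substring("AAACCCGGG", "TTTCCCGGT")
print(names, shared, has_forbidden_site("AAGGTCTCAA"))
```

A pair of barcodes would be rejected under this sketch when their shared stretch exceeds max_homology_length.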
Report
Tools
- genedom.write_record(record, target, fmt='genbank')[source]
Write a record as genbank, fasta, etc. via Biopython, with fixes.
- genedom.random_dna_sequence(length, probas=None, seed=None)[source]
Return a random DNA sequence (“ATGGCGT…”) with the specified length.
- Parameters:
length – Length of the DNA sequence.
probas – Frequencies for the different nucleotides, for instance probas={"A":0.2, "T":0.3, "G":0.3, "C":0.2}. If not specified, all nucleotides are equiprobable (p=0.25).
seed – The seed to feed to the random number generator. When a seed is provided the random results depend deterministically on the seed, thus enabling reproducibility.
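The behaviour of the probas and seed parameters can be mimicked with the standard library. This is a sketch of the idea, not genedom's implementation: random_dna is a hypothetical stand-in, and because the library may use a different random number generator, the same seed will not necessarily give the same sequence as genedom.random_dna_sequence.

```python
import random


def random_dna(length, probas=None, seed=None):
    """Return a random DNA string, optionally weighted and seeded."""
    rng = random.Random(seed)  # a private generator, so global state is untouched
    nucleotides = "ATGC"
    weights = None if probas is None else [probas[n] for n in nucleotides]
    return "".join(rng.choices(nucleotides, weights=weights, k=length))


first = random_dna(50, seed=123)
second = random_dna(50, seed=123)
biased = random_dna(20, probas={"A": 1.0, "T": 0.0, "G": 0.0, "C": 0.0}, seed=1)
print(first == second, biased)
```

Calling the function twice with the same seed gives the same sequence, which is the reproducibility property the seed parameter is meant to provide.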