easy_dna¶
Easy_dna is a Python library implementing useful routines for manipulating DNA sequences, either as “ATGC” strings or Biopython records. It aims to provide a simpler interface than Biopython for common operations related to DNA sequence design and Genbank file generation.
Easy_dna was originally created to gather useful methods repeatedly used in the different software projects of the Edinburgh Genome Foundry for DNA design and manufacturing.
See the API reference here.
Installation¶
You can install easy_dna with pip:
pip install easy_dna
Alternatively, you can unzip the sources in a folder and type:
python setup.py install
License = MIT¶
Easy_dna is open-source software originally written at the Edinburgh Genome Foundry by Zulko and released on GitHub under the MIT license (Copyright 2019 Edinburgh Genome Foundry). Everyone is welcome to contribute!
More biology software¶
Easy_dna is part of the EGF Codons synthetic biology software suite for DNA design, manufacturing and validation.
Reference¶
-
easy_dna.
all_iupac_variants
(iupac_sequence)[source]¶ Return all unambiguous possible versions of the given sequence.
Examples
>>> all_iupac_variants('ATN')
['ATA', 'ATC', 'ATG', 'ATT']
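The expansion can be pictured as a Cartesian product over the bases that each IUPAC code stands for. The following is a minimal standard-library sketch of that idea, not easy_dna's actual implementation; the names IUPAC_CODES and expand_iupac are made up for this illustration.

```python
from itertools import product

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_iupac(iupac_sequence):
    """Hypothetical helper: list every unambiguous variant of the sequence."""
    choices = [IUPAC_CODES[base] for base in iupac_sequence]
    # product() walks through every combination of one base per position.
    return ["".join(variant) for variant in product(*choices)]


print(expand_iupac("ATN"))  # ['ATA', 'ATC', 'ATG', 'ATT']
```

The number of variants grows multiplicatively with each ambiguous position (a sequence with two N positions already has 16 variants), so long, highly ambiguous sequences can produce very large lists.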
-
easy_dna.
annotate_record
(seqrecord, location='full', feature_type='misc_feature', margin=0, **qualifiers)[source]¶ Add a feature to a Biopython SeqRecord.
- Parameters
- seqrecord
The Biopython SeqRecord to be annotated.
- location
Either (start, end) or (start, end, strand). (strand defaults to +1).
- feature_type
The type associated with the feature.
- margin
Number of extra bases added on each side of the given location.
- qualifiers
Dictionary that will be the Biopython feature’s qualifiers attribute.
-
easy_dna.
anonymized_record
(record, record_id='anonymized', label_generator='feature_%d')[source]¶ Return a record with removed annotations/keywords/features/etc.
Warning: this does not change the record sequence!
- Parameters
- record
The record to be anonymized.
- record_id
ID of the new record.
- label_generator
Recipe to change feature labels. Either "feature_%d", or None (no label), or a function (i, feature)=>label.
-
easy_dna.
censor_genbank
(filename, target, **censor_params)[source]¶ Load Genbank file and write censored version.
- Parameters
- filename
Path to the file containing the record.
- target
Path to output genbank file.
- censor_params
Optional parameters. See
censor_record()
for details.
-
easy_dna.
censor_record
(record, record_id='censored', label_generator='feature_%d', keep_topology=False, anonymise_features=True, preserve_sites=None)[source]¶ Return a record with random sequence and censored annotations/features.
Useful for creating example files or anonymising sequences for bug reports.
- Parameters
- record
The record to be anonymized.
- record_id
ID of the new record.
- label_generator
Recipe to change feature labels. Either "feature_%d", or None (no label), or a function (i, feature)=>label.
- keep_topology
Whether to keep the record topology or not.
- anonymise_features
Whether to replace feature labels and ID/name, or not.
- preserve_sites
List of enzyme sites to keep. Example: ["BsmBI", "BsaI"]. Preserves the sequence around cut sites of the specified enzymes.
-
easy_dna.
complement
(dna_sequence)[source]¶ Return the complement of the DNA sequence.
For instance complement("ATGCCG") returns "TACGGC".
Uses Biopython for speed.
-
easy_dna.
copy_and_paste_segment
(seq, start, end, new_start)[source]¶ Return the sequence with segment [start, end] also copied elsewhere, starting at new_start.
-
easy_dna.
cut_and_paste_segment
(seq, start, end, new_start)[source]¶ Move a subsequence by “diff” nucleotides to the left or the right.
-
easy_dna.
delete_nucleotides
(seq, start, n)[source]¶ Return the sequence with n deletions from position start.
-
easy_dna.
delete_segment
(seq, start, end)[source]¶ Return the sequence with deleted segment from start to end.
-
easy_dna.
dna_pattern_to_regexpr
(dna_pattern)[source]¶ Return a regular expression pattern for the provided DNA pattern.
For instance dna_pattern_to_regexpr('ATTNN') returns "ATT[A|T|G|C][A|T|G|C]".
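To illustrate how such a conversion can work, here is a small standard-library sketch that maps each IUPAC code to a regular-expression character class and then searches a sequence with it. It is an assumption-laden illustration rather than the library's code: pattern_to_regex and IUPAC_CODES are hypothetical names, and the classes are written as [ACGT] instead of the [A|T|G|C] form shown above (inside a character class, | is a literal character, so the compact form matches only the four bases).

```python
import re

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pattern_to_regex(dna_pattern):
    """Hypothetical helper: turn an IUPAC DNA pattern into a regex string."""
    parts = []
    for base in dna_pattern:
        bases = IUPAC_CODES[base]
        # Unambiguous bases stay literal; ambiguous ones become a class.
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


regex = pattern_to_regex("ATTNN")
print(regex)  # ATT[ACGT][ACGT]

# Start positions of the non-overlapping matches in a test sequence.
print([m.start() for m in re.finditer(regex, "GGATTCAATTGG")])  # [2, 7]
```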
-
easy_dna.
extract_from_input
(filename=None, directory=None, construct_list=None, direct_sense=True, output_path=None, min_sequence_length=20)[source]¶ Extract features from input and return in a dictionary.
Optionally save the features in separate files.
- Parameters
- filename
Input sequence file (Genbank).
- directory
Directory name containing input sequence files.
- construct_list
A list of SeqRecords.
- direct_sense
If True: make antisense features into direct-sense in the exported files.
- output_path
Path for the exported feature and report files.
- min_sequence_length
Discard sequences with length less than this integer.
-
easy_dna.
insert_segment
(seq, pos, inserted)[source]¶ Return the sequence with inserted inserted, starting at index pos.
-
easy_dna.
list_common_enzymes
(site_length=6, opt_temp=37, min_suppliers=1, site_unlike=())[source]¶ Return a list of enzyme names with the given constraints.
- Parameters
- site_length
List of accepted site lengths (6, 4, …).
- opt_temp
List of accepted optimal temperatures for the enzyme.
- min_suppliers
Minimum number of registered suppliers in the Biopython data. A minimum of 3 known suppliers returns the most common enzymes.
- site_unlike
List of (ambiguous or unambiguous) DNA sequences that should NOT be recognized by the selected enzymes.
-
easy_dna.
load_record
(filename, record_id='auto', upperize=False, id_cutoff=20)[source]¶ Load a Fasta/Genbank/Snapgene file as a Biopython record.
- Parameters
- filename
Path to the file containing the record.
- record_id
Id of the record (leave as “auto” to keep the record’s original Id, which will default to the file name if the record has no Id).
- upperize
If true, the record’s sequence will be converted to upper case.
- id_cutoff
If the Id is read from a filename, it will get truncated at this cutoff to avoid errors at report write time.
-
easy_dna.
random_dna_sequence
(length, gc_share=None, probas=None, seed=None)[source]¶ Return a random DNA sequence (“ATGGCGT…”) with the specified length.
- Parameters
- length
Length of the DNA sequence.
- gc_share
The GC content of the random sequence, as a fraction (for example, 0.3 for 30%). Overrides probas.
- probas
Frequencies for the different nucleotides, for instance probas={"A":0.2, "T":0.3, "G":0.3, "C":0.2}. If not specified, all nucleotides are equiprobable (p=0.25).
- seed
The seed to feed to the random number generator. When a seed is provided the random results depend deterministically on the seed, thus enabling reproducibility.
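The interplay between gc_share, probas and seed can be sketched with the standard random module. This is only an illustration of the documented behaviour under stated assumptions, not the library's implementation: random_dna is a hypothetical stand-in, and splitting gc_share evenly between G and C (and the remainder evenly between A and T) is an assumption.

```python
import random


def random_dna(length, gc_share=None, probas=None, seed=None):
    """Hypothetical stand-in showing how the parameters could interact."""
    rng = random.Random(seed)  # a private generator keeps the seed local
    if gc_share is not None:
        # gc_share takes precedence over probas (assumed even G/C, A/T split).
        at = (1 - gc_share) / 2
        gc = gc_share / 2
        probas = {"A": at, "T": at, "G": gc, "C": gc}
    if probas is None:
        # Default: all four nucleotides are equiprobable.
        probas = {base: 0.25 for base in "ATGC"}
    bases = list(probas)
    weights = [probas[base] for base in bases]
    return "".join(rng.choices(bases, weights=weights, k=length))


seq = random_dna(20, gc_share=0.3, seed=123)
print(len(seq))                                       # 20
print(seq == random_dna(20, gc_share=0.3, seed=123))  # True
```

Because the generator is seeded, calling the function twice with the same arguments gives the same sequence, which is what makes seeded runs reproducible.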
-
easy_dna.
random_protein_sequence
(length, seed=None)[source]¶ Return a random protein sequence “MNQTW…YL*” of the specified length.
- Parameters
- length
Length of the protein sequence (in number of amino-acids). Note that the sequence will always start with "M" and end with a stop codon "*", with (length-2) random amino-acids in the middle.
- seed
The seed to feed to the random number generator. When a seed is provided the random results depend deterministically on the seed, thus enabling reproducibility.
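The "M ... *" layout described for the length parameter can be sketched in a few lines of standard-library Python. The function random_protein and the constant AMINO_ACIDS are hypothetical names for this illustration, and drawing the middle residues uniformly from the 20 standard amino acids is an assumption.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def random_protein(length, seed=None):
    """Hypothetical stand-in: 'M' + (length - 2) random residues + '*'."""
    rng = random.Random(seed)
    middle = "".join(rng.choice(AMINO_ACIDS) for _ in range(length - 2))
    return "M" + middle + "*"


protein = random_protein(10, seed=42)
print(len(protein))                          # 10
print(protein[0], protein[-1])               # M *
print(protein == random_protein(10, seed=42))  # True
```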
-
easy_dna.
record_with_different_sequence
(record, new_seq)[source]¶ Return a version of the record with the sequence set to new_seq.
-
easy_dna.
records_from_data_files
(filepaths=None, folder=None)[source]¶ Automatically convert files or a folder’s content to Biopython records.
-
easy_dna.
replace_segment
(seq, start, end, replacement)[source]¶ Return the sequence with seq[start:end] replaced by replacement.
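On plain strings, segment replacement, and the related deletion and insertion routines listed earlier on this page, all reduce to Python slicing. The sketch below assumes half-open, 0-based coordinates, as the seq[start:end] notation suggests; the sketch_* functions are illustrative re-implementations, not the library's code.

```python
def sketch_delete(seq, start, end):
    """Drop seq[start:end] and join what is left."""
    return seq[:start] + seq[end:]


def sketch_insert(seq, pos, inserted):
    """Put `inserted` in front of the base currently at index `pos`."""
    return seq[:pos] + inserted + seq[pos:]


def sketch_replace(seq, start, end, replacement):
    """Swap seq[start:end] for `replacement` (lengths may differ)."""
    return seq[:start] + replacement + seq[end:]


seq = "AAACCCGGG"
print(sketch_delete(seq, 3, 6))         # AAAGGG
print(sketch_insert(seq, 3, "TT"))      # AAATTCCCGGG
print(sketch_replace(seq, 3, 6, "TT"))  # AAATTGGG
```

Seen this way, replacing a segment is the same as deleting it and inserting the replacement at the position where it started.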
-
easy_dna.
reverse_complement
(dna_sequence)[source]¶ Return the reverse-complement of the DNA sequence.
For instance reverse_complement("ATGCCG") returns "CGGCAT".
Uses Biopython for speed.
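For readers who want to see the operation itself, complementing and reverse-complementing can be sketched without Biopython using a translation table. This is an illustration only (the library delegates to Biopython); the *_sketch names are hypothetical, and the table covers only upper-case, unambiguous bases.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT_TABLE = str.maketrans("ATGC", "TACG")


def complement_sketch(dna_sequence):
    """Complement each base, keeping the original order."""
    return dna_sequence.translate(COMPLEMENT_TABLE)


def reverse_complement_sketch(dna_sequence):
    """Complement each base, then read the result backwards."""
    return complement_sketch(dna_sequence)[::-1]


print(complement_sketch("ATGCCG"))          # TACGGC
print(reverse_complement_sketch("ATGCCG"))  # CGGCAT
```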
-
easy_dna.
reverse_segment
(seq, start, end)[source]¶ Return the sequence with segment
seq[start:end]
reverse-complemented.
-
easy_dna.
reverse_translate
(protein_sequence, randomize_codons=False)[source]¶ Return a DNA sequence which translates to the provided protein sequence.
Note: at the moment, the first valid codon found is used for each amino-acid (so it is deterministic but no codon-optimization is done).
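The "first valid codon" strategy mentioned in the note can be sketched with a lookup table that lists the codons for each amino acid and always picks the first entry. The table below is a deliberately tiny excerpt of the standard genetic code, and both CODONS and reverse_translate_sketch are hypothetical names; which codon counts as "first" in easy_dna depends on the ordering of its own table, which this sketch does not reproduce.

```python
# Tiny excerpt of the standard genetic code (amino acid -> codons).
CODONS = {
    "M": ["ATG"],
    "K": ["AAA", "AAG"],
    "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "*": ["TAA", "TAG", "TGA"],
}


def reverse_translate_sketch(protein_sequence):
    """Pick the first listed codon for every amino acid (deterministic)."""
    return "".join(CODONS[amino_acid][0] for amino_acid in protein_sequence)


print(reverse_translate_sketch("MKL*"))  # ATGAAATTATAA
```

Since the same codon is chosen every time, the output is reproducible, but it ignores codon usage in the target organism.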
-
easy_dna.
sequence_to_biopython_record
(sequence, id='<unknown id>', name='<unknown name>', features=())[source]¶ Return a SeqRecord of the sequence, ready to be Genbanked.
-
easy_dna.
swap_segments
(seq, pos1, pos2)[source]¶ Return a new sequence with segments at position pos1 and pos2 swapped.
pos1 and pos2 are of the form (start1, end1) and (start2, end2), respectively.
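Swapping two segments amounts to cutting the sequence into five pieces and reassembling them in a different order. The sketch below assumes half-open, 0-based (start, end) pairs and non-overlapping segments; swap_segments_sketch is an illustrative name, not the library function.

```python
def swap_segments_sketch(seq, pos1, pos2):
    """Exchange two non-overlapping segments given as (start, end) pairs."""
    # Sort so that (start1, end1) is always the leftmost segment.
    (start1, end1), (start2, end2) = sorted([pos1, pos2])
    return (
        seq[:start1]        # everything before the first segment
        + seq[start2:end2]  # second segment moved forward
        + seq[end1:start2]  # untouched region between the segments
        + seq[start1:end1]  # first segment moved back
        + seq[end2:]        # everything after the second segment
    )


print(swap_segments_sketch("AAACCCGGGTTT", (0, 3), (6, 9)))  # GGGCCCAAATTT
```

The two segments may have different lengths; only the region between them shifts.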
-
easy_dna.
translate
(dna_sequence, translation_table='Bacterial')[source]¶ Translate the DNA sequence into an amino-acid sequence “MLKYQT…”.
If translation_table is the name or number of an NCBI genetic table, Biopython will be used. See here for options: http://biopython.org/DIST/docs/tutorial/Tutorial.html#sec26
translation_table can also be a dictionary of the form {"ATT": "M", "CTC": "X", etc.} for more exotic translation tables.
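The dictionary form of translation_table can be understood as a plain codon-by-codon lookup. The following standard-library sketch shows that lookup under stated assumptions: translate_with_table is a hypothetical helper, the three-entry table is purely illustrative, and silently ignoring trailing bases that do not form a full codon is a choice made for this sketch.

```python
def translate_with_table(dna_sequence, codon_table):
    """Hypothetical helper: translate using a {codon: amino_acid} dict."""
    # Step through the sequence three bases at a time; incomplete
    # trailing codons are skipped.
    codons = (
        dna_sequence[i:i + 3]
        for i in range(0, len(dna_sequence) - 2, 3)
    )
    return "".join(codon_table[codon] for codon in codons)


table = {"ATG": "M", "AAA": "K", "TAA": "*"}
print(translate_with_table("ATGAAATAA", table))  # MK*
```

A codon missing from the dictionary raises a KeyError in this sketch, so a custom table needs to cover every codon that occurs in the sequence.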