Biotools module

Sequence and Record I/O

dnacauldron.biotools.sequence_io.load_record(filepath, topology='default_to_linear', id='auto', upperize=True, max_name_length=20)

Return a Biopython record read from a Fasta/Genbank/Snapgene file.

Parameters
filepath

Path to a Genbank, Fasta, or Snapgene (.dna) file.

topology

Can be “circular”, “linear”, “default_to_circular” (will default to circular if annotations['topology'] is not already set) or “default_to_linear”.

id

Sets the record.id. If “auto”, the original record.id is used, and if none is set the name of the file (without extension) is used instead.

upperize

If True, the sequence is converted to uppercase (recommended in this library, as mixed upper and lower case can cause problems in Biopython’s enzyme site search).

max_name_length

Maximum length of the record name. Longer names are truncated to avoid exceptions being raised by Biopython.
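The `id`, `upperize` and `max_name_length` rules above can be sketched in plain Python. The helpers `resolve_record_id` and `normalize_sequence_and_name` are hypothetical and not part of DnaCauldron. They assume that a record without an id is represented by `None` or an empty string, which is a simplification of how Biopython marks missing ids.

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_record_id(original_id: Optional[str], filepath: str,
                      id_setting: str = "auto") -> str:
    """Hypothetical helper mirroring the documented ``id`` rules."""
    if id_setting != "auto":
        return id_setting  # an explicit id always wins
    if original_id:
        return original_id  # keep the id stored in the file
    return Path(filepath).stem  # fall back to the file name without extension


def normalize_sequence_and_name(sequence: str, name: str, upperize: bool = True,
                                max_name_length: int = 20) -> Tuple[str, str]:
    """Hypothetical helper: optionally upperize, then truncate the name."""
    if upperize:
        sequence = sequence.upper()
    return sequence, name[:max_name_length]


print(resolve_record_id(None, "parts/partA.gb"))  # partA
print(normalize_sequence_and_name("atgCCg", "a_very_long_record_name_indeed"))
# ('ATGCCG', 'a_very_long_record_n')
```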

dnacauldron.biotools.sequence_io.load_records_from_file(filepath)

Autodetect the file format and load Biopython records from the file.
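One simple way to autodetect a format is to look at the file extension. The mapping and the helper `guess_format_from_extension` below are assumptions for illustration; DnaCauldron’s own detection rules may differ (for instance, it may also inspect the file contents).

```python
from pathlib import Path

# Hypothetical extension-to-format mapping (case-insensitive lookup below).
EXTENSION_TO_FORMAT = {
    ".fa": "fasta",
    ".fasta": "fasta",
    ".gb": "genbank",
    ".gbk": "genbank",
    ".dna": "snapgene",
}


def guess_format_from_extension(filepath: str) -> str:
    """Guess a sequence file format from its extension."""
    suffix = Path(filepath).suffix.lower()
    try:
        return EXTENSION_TO_FORMAT[suffix]
    except KeyError:
        raise ValueError(f"Cannot guess the format of {filepath!r}") from None


print(guess_format_from_extension("parts/Part.GB"))  # genbank
```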

dnacauldron.biotools.sequence_io.load_records_from_files(files=None, folder=None, use_file_names_as_ids=False)

Automatically convert files, or the contents of a folder, to Biopython records.

Parameters
files

A list of paths to files. Alternatively, a folder can be provided with the folder parameter.

folder

A path to a folder containing sequence files.

use_file_names_as_ids

If True, for every file containing a single record, the file name (without extension) will be set as the record’s ID.
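The `use_file_names_as_ids` rule can be sketched as follows. Here records are plain dicts with an `"id"` key standing in for SeqRecords, and `apply_file_names_as_ids` is a hypothetical helper, not the library’s code. Files holding several records keep their original ids.

```python
from pathlib import Path
from typing import Dict, List


def apply_file_names_as_ids(records_by_file: Dict[str, List[dict]]) -> List[dict]:
    """Flatten per-file records; single-record files get the file stem as ID."""
    all_records = []
    for filepath, records in records_by_file.items():
        if len(records) == 1:
            # Copy the record so the caller's data is not modified.
            records = [dict(records[0], id=Path(filepath).stem)]
        all_records.extend(records)
    return all_records


loaded = apply_file_names_as_ids({
    "parts/p1.gb": [{"id": "x"}],
    "parts/multi.fa": [{"id": "m1"}, {"id": "m2"}],
})
print([r["id"] for r in loaded])  # ['p1', 'm1', 'm2']
```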

dnacauldron.biotools.sequence_io.string_to_records(string)

Convert a string in FASTA, GenBank, or a similar format (or a plain ATGC sequence) into Biopython records.

Can also be used to detect the format of a string.
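Format detection from a string’s contents can be sketched with a few heuristics: a bare ATGC sequence, a FASTA header starting with `>`, or a GenBank `LOCUS` line. The helper `sniff_sequence_format` and its labels (such as `"raw_dna"`) are hypothetical; DnaCauldron may instead try parsers one after another.

```python
import re


def sniff_sequence_format(text: str) -> str:
    """Guess the format of a sequence string (hypothetical heuristic)."""
    stripped = text.strip()
    if re.fullmatch(r"[ATGC]+", stripped):
        return "raw_dna"  # a bare uppercase DNA sequence
    if stripped.startswith(">"):
        return "fasta"
    if stripped.startswith("LOCUS"):
        return "genbank"
    raise ValueError("Invalid sequence format")


print(sniff_sequence_format(">seq1\nATGCCG\n"))  # fasta
```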

dnacauldron.biotools.sequence_io.write_record(record, target, fmt='genbank')

Write a record in GenBank, FASTA, or another format via Biopython, with fixes.

Biopython record operations

dnacauldron.biotools.record_operations.annotate_record(seqrecord, location='full', feature_type='misc_feature', margin=0, **qualifiers)

Add a feature to a Biopython SeqRecord.

Parameters
seqrecord

The Biopython seqrecord to be annotated.

location

Either “full” (the whole record), (start, end), or (start, end, strand). The strand defaults to +1.

feature_type

The type associated with the feature.

margin

Number of extra bases added on each side of the given location.

qualifiers

Keyword arguments, collected into a dictionary that becomes the Biopython feature’s qualifiers attribute.
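The `location` and `margin` parameters can be sketched as a normalization step producing `(start, end, strand)`, following the parameter descriptions above. The helper `normalize_location` is hypothetical, and clamping the margin to the record boundaries is an assumption not stated in the documentation.

```python
from typing import Tuple, Union

Location = Union[str, Tuple[int, int], Tuple[int, int, int]]


def normalize_location(location: Location, record_length: int,
                       margin: int = 0) -> Tuple[int, int, int]:
    """Turn an annotate_record-style location into (start, end, strand)."""
    if location == "full":
        start, end, strand = 0, record_length, 1  # the whole record
    elif len(location) == 2:
        start, end = location
        strand = 1  # strand defaults to +1
    else:
        start, end, strand = location
    # Add `margin` bases on each side, clamped to the record (assumption).
    start = max(0, start - margin)
    end = min(record_length, end + margin)
    return start, end, strand


print(normalize_location((10, 20, -1), 100, margin=5))  # (5, 25, -1)
```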

dnacauldron.biotools.record_operations.complement(dna_sequence)

Return the complement of the DNA sequence.

For instance complement("ATGCCG") returns "TACGGC".

Uses Biopython for speed.
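For readers without Biopython at hand, the same base-by-base complement can be sketched in pure Python with a translation table. This stand-in only handles A, T, G and C (in either case); other characters, such as IUPAC ambiguity codes, are left unchanged.

```python
# Map each base to its Watson-Crick partner, preserving case.
_COMPLEMENT_TABLE = str.maketrans("ATGCatgc", "TACGtacg")


def complement(dna_sequence: str) -> str:
    """Return the base-by-base complement (no reversal)."""
    return dna_sequence.translate(_COMPLEMENT_TABLE)


print(complement("ATGCCG"))  # TACGGC
```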

dnacauldron.biotools.record_operations.crop_record_with_saddling_features(record, start, end, filters=())

Crop the Biopython record, but keep features that are only partially inside the cropped segment.

Parameters
record

The Biopython record to crop.

start, end

Coordinates of the segment to crop.

filters

A list of functions (feature => True/False). Any feature that fails at least one filter is filtered out.
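The feature-selection logic can be sketched on plain `(start, end, label)` tuples with end-exclusive coordinates. The helper `crop_features` is hypothetical. Two behaviours are assumptions: straddling features are clipped to the cropped segment, and filters apply to every feature rather than only to straddling ones.

```python
from typing import Callable, List, Sequence, Tuple

Feature = Tuple[int, int, str]  # (start, end, label), end-exclusive


def crop_features(features: List[Feature], start: int, end: int,
                  filters: Sequence[Callable[[Feature], bool]] = ()) -> List[Feature]:
    """Keep features overlapping [start, end), clipped and shifted to the crop."""
    cropped = []
    for feature in features:
        f_start, f_end, label = feature
        if not all(check(feature) for check in filters):
            continue  # fails at least one filter
        if f_end <= start or f_start >= end:
            continue  # no overlap with the cropped segment
        # Clip to the window, then shift so the cropped segment starts at 0.
        new_start = max(f_start, start) - start
        new_end = min(f_end, end) - start
        cropped.append((new_start, new_end, label))
    return cropped


feats = [(0, 10, "a"), (15, 25, "b"), (30, 40, "c"), (5, 35, "d")]
print(crop_features(feats, 12, 32))  # [(3, 13, 'b'), (18, 20, 'c'), (0, 20, 'd')]
```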

dnacauldron.biotools.record_operations.reverse_complement(sequence)

Return the reverse-complement of the DNA sequence.

For instance reverse_complement("ATGCCG") returns "CGGCAT".

Uses Biopython for speed.
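A pure-Python sketch of the reverse-complement: complement each base, then reverse the string. Like any stand-in for the Biopython-based function, it only handles A, T, G and C. Applying it twice returns the original sequence, which makes a handy sanity check.

```python
# Map each base to its Watson-Crick partner, preserving case.
_COMPLEMENT_TABLE = str.maketrans("ATGCatgc", "TACGtacg")


def reverse_complement(sequence: str) -> str:
    """Complement each base, then reverse the result."""
    return sequence.translate(_COMPLEMENT_TABLE)[::-1]


print(reverse_complement("ATGCCG"))  # CGGCAT
```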

dnacauldron.biotools.record_operations.sequence_to_biopython_record(sequence, id='<unknown id>', name='same_as_id', features=())

Return a SeqRecord of the sequence, ready to be written to a GenBank file.

dnacauldron.biotools.record_operations.set_record_topology(record, topology)

Set the Biopython record’s topology, leaving an existing topology unchanged when a “default_to_” option is used.

This sets record.annotations['topology']. The topology parameter can be “circular”, “linear”, “default_to_circular” (defaults to circular if annotations['topology'] is not already set) or “default_to_linear”.
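The four topology options can be sketched on a plain dict standing in for `record.annotations`. The helper `set_topology` is hypothetical, and raising `ValueError` for unknown options is an assumption.

```python
VALID_TOPOLOGIES = {"circular", "linear", "default_to_circular", "default_to_linear"}


def set_topology(annotations: dict, topology: str) -> None:
    """Update annotations["topology"] in place, following the rules above."""
    if topology not in VALID_TOPOLOGIES:
        raise ValueError(f"Unknown topology option: {topology!r}")
    if topology.startswith("default_to_"):
        # Only fill in the default when no topology has been set yet.
        annotations.setdefault("topology", topology[len("default_to_"):])
    else:
        annotations["topology"] = topology  # explicit values always overwrite


annotations = {"topology": "linear"}
set_topology(annotations, "default_to_circular")
print(annotations)  # {'topology': 'linear'}
```

Since a Biopython SeqRecord’s `annotations` attribute is a plain dict, it could be passed to such a helper directly.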

Enzyme autoselection

dnacauldron.biotools.autoselect_enzyme.autoselect_enzyme(parts, enzymes=('BsmBI', 'BsaI', 'BbsI', 'AarI', 'SapI'))

Find the enzyme that the parts were probably meant to be assembled with.

Parameters
parts

A list of Biopython SeqRecords. They should have a “linear” attribute set to True or False, otherwise

Returns
The enzyme whose number of sites in the different constructs is as close as possible to exactly 2.
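This sketch counts each candidate enzyme’s recognition sites on both strands of every part, wrapping around the origin for circular parts, and picks the enzyme whose counts deviate least from 2 in total. The recognition sites are the standard ones for these Type IIS enzymes. The scoring rule (sum of |n − 2|), the `(sequence, linear)` part representation and the helper names are assumptions. DnaCauldron works on SeqRecords and may score candidates differently.

```python
import re
from typing import Dict, Sequence, Tuple

# Recognition sites (top strand, 5'->3') of the default enzymes.
RECOGNITION_SITES: Dict[str, str] = {
    "BsmBI": "CGTCTC",
    "BsaI": "GGTCTC",
    "BbsI": "GAAGAC",
    "AarI": "CACCTGC",
    "SapI": "GCTCTTC",
}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ATGC", "TACG"))[::-1]


def count_sites(sequence: str, site: str, circular: bool = False) -> int:
    """Count occurrences of a non-palindromic site on both strands."""
    if circular:
        # Wrap the first bases around so sites spanning the origin are found.
        sequence = sequence + sequence[: len(site) - 1]
    total = 0
    for pattern in (site, reverse_complement(site)):
        # The lookahead also counts overlapping occurrences.
        total += len(re.findall(f"(?={pattern})", sequence))
    return total


def pick_enzyme(parts: Sequence[Tuple[str, bool]],
                enzymes: Sequence[str] = tuple(RECOGNITION_SITES)) -> str:
    """Return the enzyme whose site counts are closest to 2 across all parts.

    Each part is a (sequence, linear) pair; ties go to the earliest enzyme.
    """
    def score(enzyme: str) -> int:
        site = RECOGNITION_SITES[enzyme]
        return sum(abs(count_sites(seq.upper(), site, circular=not linear) - 2)
                   for seq, linear in parts)

    return min(enzymes, key=score)


part = "AA" + "GGTCTC" + "A" * 8 + "GAGACC" + "AA"
print(pick_enzyme([(part, True)]))  # BsaI
```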