Biotools module

Sequence and Record I/O

dnacauldron.biotools.sequence_io.load_record(filepath, topology='default_to_linear', id='auto', upperize=True, max_name_length=20)

Return a Biopython record read from a Fasta/Genbank/Snapgene file.

Parameters:

filepath – Path to a Genbank, Fasta, or Snapgene (.dna) file.
topology – Can be “circular”, “linear”, “default_to_circular” (will default to circular if annotations['topology'] is not already set) or “default_to_linear”.
id – Sets the record.id. If “auto”, the original record.id is used, and if none is set the name of the file (without extension) is used instead.
upperize – If True, the sequence is converted to upper case (recommended in this library, as mixed upper and lower case can cause problems in Biopython’s enzyme site search).
max_name_length – The name of the record will be truncated if too long to avoid Biopython exceptions being raised.
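
The id and max_name_length rules above can be illustrated with a minimal pure-Python sketch. The helpers resolve_record_id and truncate_name are hypothetical names used only for illustration, not part of dnacauldron, and treating None, an empty string, and Biopython's "<unknown id>" placeholder as "no id set" is an assumption.

```python
from pathlib import Path

# Values treated as "no id set" (assumption made for this sketch).
UNSET_IDS = (None, "", "<unknown id>")


def resolve_record_id(original_id, filepath, id="auto"):
    """Hypothetical helper mirroring the documented ``id`` parameter."""
    if id != "auto":
        return id  # an explicit id always wins
    if original_id in UNSET_IDS:
        # Fall back to the file name without its extension.
        return Path(filepath).stem
    return original_id


def truncate_name(name, max_name_length=20):
    """Hypothetical helper: cut the name down to ``max_name_length``."""
    return name[:max_name_length]


print(resolve_record_id("pUC19", "parts/pUC19_v2.gb"))  # pUC19
print(resolve_record_id(None, "parts/my_part.gb"))      # my_part
print(len(truncate_name("a" * 30)))                     # 20
```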

dnacauldron.biotools.sequence_io.load_records_from_file(filepath)

Autodetect file format and load Biopython records from it.

dnacauldron.biotools.sequence_io.load_records_from_files(files=None, folder=None, use_file_names_as_ids=False)

Automatically convert files or a folder’s content to Biopython records.

Parameters:

files – A list of paths to files. A folder can be provided instead.
folder – A path to a folder containing sequence files.
use_file_names_as_ids – If True, for every file containing a single record, the file name (without extension) will be set as the record’s ID.

dnacauldron.biotools.sequence_io.string_to_records(string)

Convert a string holding Fasta or Genbank content, or a simple ATGC string, into Biopython records.

Can also be used to detect a format.
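
As an illustration of what detecting a format from a string can look like, here is a minimal sketch. It is not the library's detection logic: guess_sequence_format is a hypothetical helper, and relying on the leading ">" of Fasta and the leading "LOCUS" keyword of Genbank is an assumption made for this example.

```python
import re


def guess_sequence_format(string):
    """Hypothetical helper: guess the format of a sequence string."""
    text = string.strip()
    if text.startswith(">"):
        return "fasta"  # Fasta records start with a ">" header line
    if text.startswith("LOCUS"):
        return "genbank"  # Genbank records start with a LOCUS line
    if re.fullmatch(r"[ATGC]+", text):
        return "ATGC"  # plain upper-case nucleotide string
    raise ValueError("Unrecognized sequence format")


print(guess_sequence_format(">part_1\nATGCCG"))  # fasta
print(guess_sequence_format("ATGCCG"))           # ATGC
```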

dnacauldron.biotools.sequence_io.write_record(record, target, fmt='genbank')

Write a record as genbank, fasta, etc. via Biopython, with fixes.

Biopython record operations

dnacauldron.biotools.record_operations.annotate_record(seqrecord, location='full', feature_type='misc_feature', margin=0, **qualifiers)

Add a feature to a Biopython SeqRecord.

Parameters:

seqrecord – The Biopython seqrecord to be annotated.
location – Either (start, end) or (start, end, strand). (strand defaults to +1)
feature_type – The type associated with the feature.
margin – Number of extra bases added on each side of the given location.
qualifiers – Dictionary that will be the Biopython feature’s qualifiers attribute.

dnacauldron.biotools.record_operations.complement(dna_sequence)

Return the complement of the DNA sequence.

For instance complement("ATGCCG") returns "TACGGC".

Uses Biopython for speed.

dnacauldron.biotools.record_operations.crop_record_with_saddling_features(record, start, end, filters=())

Crop the Biopython record, but keep features that are only partially in.

Parameters:

record – The Biopython record to crop.
start – Coordinates of the segment to crop.
end – Coordinates of the segment to crop.
filters – List of functions (feature=>True/False). Any feature that fails at least one filter will be filtered out.
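
The cropping idea can be sketched without Biopython by representing features as plain (start, end, label) tuples. This is an illustration under stated assumptions, not the library's implementation: crop_with_saddling_features is a hypothetical helper, features that overlap the window only partially are clipped to it and shifted to the new coordinates, and a feature is kept only if every filter returns True.

```python
def crop_with_saddling_features(features, start, end, filters=()):
    """Hypothetical helper working on (feat_start, feat_end, label) tuples."""
    cropped = []
    for feature in features:
        feat_start, feat_end, label = feature
        # Assumption: a feature must pass every filter to be kept.
        if not all(keep(feature) for keep in filters):
            continue
        # Skip features lying entirely outside the [start, end) window.
        if feat_end <= start or feat_start >= end:
            continue
        # Clip to the window, then shift to the cropped coordinates.
        new_start = max(feat_start, start) - start
        new_end = min(feat_end, end) - start
        cropped.append((new_start, new_end, label))
    return cropped


features = [(0, 10, "promoter"), (8, 30, "cds"), (40, 50, "terminator")]
print(crop_with_saddling_features(features, 5, 20))
# [(0, 5, 'promoter'), (3, 15, 'cds')]
```

Passing filters=[lambda f: f[2] != "promoter"] to the same call would leave only the clipped "cds" feature.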

dnacauldron.biotools.record_operations.reverse_complement(sequence)

Return the reverse-complement of the DNA sequence.

For instance reverse_complement("ATGCCG") returns "CGGCAT".

Uses Biopython for speed.
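
For readers who want to check the two operations by hand, here is a minimal pure-Python sketch. The library itself delegates to Biopython; simple_complement and simple_reverse_complement are hypothetical names used only here, and the sketch handles A, T, G and C only.

```python
# Translation table mapping each base to its complement (both cases).
_COMPLEMENT_TABLE = str.maketrans("ATGCatgc", "TACGtacg")


def simple_complement(dna_sequence):
    """Hypothetical helper: complement each base, keeping the order."""
    return dna_sequence.translate(_COMPLEMENT_TABLE)


def simple_reverse_complement(sequence):
    """Hypothetical helper: complement each base, then reverse."""
    return simple_complement(sequence)[::-1]


print(simple_complement("ATGCCG"))          # TACGGC
print(simple_reverse_complement("ATGCCG"))  # CGGCAT
```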

dnacauldron.biotools.record_operations.sequence_to_biopython_record(sequence, id='<unknown id>', name='same_as_id', features=())

Return a SeqRecord of the sequence, ready to be Genbanked.

dnacauldron.biotools.record_operations.set_record_topology(record, topology)

Set the Biopython record’s topology, optionally leaving it unchanged if it is already set.

This actually sets the record.annotations['topology']. The topology parameter can be “circular”, “linear”, “default_to_circular” (will default to circular if annotations['topology'] is not already set) or “default_to_linear”.
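
The four accepted values can be illustrated with a small sketch in which a plain dict stands in for record.annotations. The function set_topology is a hypothetical helper written for this example, and treating any existing 'topology' key as "already set" is an assumption.

```python
VALID_TOPOLOGIES = (
    "circular",
    "linear",
    "default_to_circular",
    "default_to_linear",
)


def set_topology(annotations, topology):
    """Hypothetical helper: apply the documented rule to an annotations dict."""
    if topology not in VALID_TOPOLOGIES:
        raise ValueError(f"Unknown topology: {topology!r}")
    prefix = "default_to_"
    if topology.startswith(prefix):
        # Only fill in the value when no topology is set yet.
        annotations.setdefault("topology", topology[len(prefix):])
    else:
        # "circular" and "linear" overwrite any existing value.
        annotations["topology"] = topology
    return annotations


print(set_topology({}, "default_to_linear"))
# {'topology': 'linear'}
print(set_topology({"topology": "circular"}, "default_to_linear"))
# {'topology': 'circular'}
print(set_topology({"topology": "circular"}, "linear"))
# {'topology': 'linear'}
```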

Enzyme autoselection

dnacauldron.biotools.autoselect_enzyme.autoselect_enzyme(parts, enzymes=('BsmBI', 'BsaI', 'BbsI', 'AarI', 'SapI'))

Finds the enzyme that the parts were probably meant to be assembled with.

Parameters:

parts – A list of SeqRecord files. They should have a “linear” attribute set to True or False, otherwise

Returns:

The enzyme whose number of sites is as close as possible to exactly 2 in each of the different constructs.
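
The selection idea can be sketched in plain Python on raw sequence strings. This is an illustration under stated assumptions rather than the library's scoring: pick_enzyme and count_sites are hypothetical helpers, each enzyme is scored by the number of parts carrying exactly 2 of its sites, parts are treated as linear (no site spanning the origin), and ties go to the first enzyme listed. The recognition sequences are the standard ones for these five Type IIS enzymes.

```python
RECOGNITION_SITES = {
    "BsmBI": "CGTCTC",
    "BsaI": "GGTCTC",
    "BbsI": "GAAGAC",
    "AarI": "CACCTGC",
    "SapI": "GCTCTTC",
}
_COMPLEMENT_TABLE = str.maketrans("ATGC", "TACG")


def count_sites(sequence, site):
    """Count occurrences of a site on both strands of a linear sequence."""
    sequence = sequence.upper()
    reverse_site = site.translate(_COMPLEMENT_TABLE)[::-1]

    def count(pattern):
        return sum(sequence.startswith(pattern, i) for i in range(len(sequence)))

    # These five sites are not palindromic, so each strand is counted once.
    return count(site) + count(reverse_site)


def pick_enzyme(sequences, enzymes=tuple(RECOGNITION_SITES)):
    """Hypothetical helper: enzyme with exactly 2 sites in the most parts."""

    def score(enzyme):
        site = RECOGNITION_SITES[enzyme]
        return sum(count_sites(seq, site) == 2 for seq in sequences)

    return max(enzymes, key=score)


parts = ["AAGGTCTCATTTTGAGACCAA", "CCGGTCTCAGGGGTGAGACCTT"]
print(pick_enzyme(parts))  # BsaI
```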