Biotools module
Sequence and Record I/O
- dnacauldron.biotools.sequence_io.load_record(filepath, topology='default_to_linear', id='auto', upperize=True, max_name_length=20)
Return a Biopython record read from a Fasta/Genbank/Snapgene file.
- Parameters:
filepath – Path to a Genbank, Fasta, or Snapgene (.dna) file.
topology – Can be “circular”, “linear”, “default_to_circular” (will default to circular if annotations['topology'] is not already set) or “default_to_linear”.
id – Sets the record.id. If “auto”, the original record.id is used, and if none is set the name of the file (without extension) is used instead.
upperize – If true, the sequence will get upperized (recommended in this library, as the mix of upper and lower case can cause problems in Biopython’s enzyme site search).
max_name_length – The name of the record will be truncated if too long to avoid Biopython exceptions being raised.
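The id and max_name_length rules described above can be sketched in plain Python. This is only an illustration of the documented behaviour, not the library's code: the helper names `resolve_record_id` and `truncate_name` are hypothetical, and treating `None`, an empty string, and Biopython's `"<unknown id>"` placeholder as "no id set" is an assumption.

```python
from pathlib import Path

# Assumption: these values are what "no id set" means in practice.
UNSET_IDS = (None, "", "<unknown id>")


def resolve_record_id(original_id, filepath, requested_id="auto"):
    """Hypothetical helper mirroring the documented `id` parameter."""
    if requested_id != "auto":
        # An explicit id always wins.
        return requested_id
    if original_id not in UNSET_IDS:
        # "auto": keep the id already stored in the record.
        return original_id
    # "auto" with no usable id: fall back to the file name without extension.
    return Path(filepath).stem


def truncate_name(name, max_name_length=20):
    """Hypothetical helper: cut the name down to the allowed length."""
    return name[:max_name_length]


print(resolve_record_id("<unknown id>", "parts/part_A.gb"))
print(truncate_name("a_very_long_plasmid_name_v2"))
```

The first call falls back to the file stem because the record carries no usable id; the second shows a long name being cut to 20 characters.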
- dnacauldron.biotools.sequence_io.load_records_from_file(filepath)
Autodetect file format and load biopython records from it.
- dnacauldron.biotools.sequence_io.load_records_from_files(files=None, folder=None, use_file_names_as_ids=False)
Automatically convert files or a folder’s content to biopython records.
- Parameters:
files – A list of paths to files. A folder can be provided instead.
folder – A path to a folder containing sequence files.
use_file_names_as_ids – If True, for every file containing a single record, the file name (without extension) will be set as the record’s ID.
BioPython record operations
- dnacauldron.biotools.record_operations.annotate_record(seqrecord, location='full', feature_type='misc_feature', margin=0, **qualifiers)
Add a feature to a Biopython SeqRecord.
- Parameters:
seqrecord – The Biopython seqrecord to be annotated.
location – Either (start, end) or (start, end, strand). (strand defaults to +1)
feature_type – The type associated with the feature.
margin – Number of extra bases added on each side of the given location.
qualifiers – Dictionary that will be the Biopython feature’s qualifiers attribute.
- dnacauldron.biotools.record_operations.complement(dna_sequence)
Return the complement of the DNA sequence.
For instance, complement("ATGCCG") returns "TACGGC". Uses BioPython for speed.
- dnacauldron.biotools.record_operations.crop_record_with_saddling_features(record, start, end, filters=())
Crop the Biopython record, but keep features that are only partially in.
- Parameters:
record – The Biopython record to crop.
start – Coordinates of the segment to crop.
end – Coordinates of the segment to crop.
filters – list of functions (feature=>True/False). Any feature that doesn’t pass at least one filter will be filtered out.
- dnacauldron.biotools.record_operations.reverse_complement(sequence)
Return the reverse-complement of the DNA sequence.
For instance, reverse_complement("ATGCCG") returns "CGGCAT". Uses BioPython for speed.
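As a quick check of what the two operations compute, here is a minimal plain-Python sketch. It does not use Biopython and is not the library's implementation; the names `complement_sketch` and `reverse_complement_sketch` are hypothetical, and only the unambiguous bases A, T, G, C are handled.

```python
# Translation table swapping each base for its Watson-Crick partner.
_COMPLEMENT_TABLE = str.maketrans("ATGCatgc", "TACGtacg")


def complement_sketch(dna_sequence):
    """Complement each base, keeping the original order."""
    return dna_sequence.translate(_COMPLEMENT_TABLE)


def reverse_complement_sketch(dna_sequence):
    """Complement each base, then read the result backwards."""
    return complement_sketch(dna_sequence)[::-1]


print(complement_sketch("ATGCCG"))          # TACGGC
print(reverse_complement_sketch("ATGCCG"))  # CGGCAT
```

The reverse complement is the sequence of the opposite strand read 5' to 3', which is why it differs from both the plain complement and the plain reversal.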
- dnacauldron.biotools.record_operations.sequence_to_biopython_record(sequence, id='<unknown id>', name='same_as_id', features=())
Return a SeqRecord of the sequence, ready to be Genbanked.
- dnacauldron.biotools.record_operations.set_record_topology(record, topology)
Set the Biopython record’s topology, optionally leaving it unchanged if already set.
This actually sets record.annotations['topology']. The topology parameter can be “circular”, “linear”, “default_to_circular” (will default to circular if annotations['topology'] is not already set) or “default_to_linear”.
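The four topology values can be illustrated with a small sketch that works on a plain dictionary standing in for record.annotations. The function name `set_topology_sketch` is hypothetical, and raising a ValueError on an unrecognised value is an assumption rather than documented behaviour.

```python
VALID_TOPOLOGIES = ("circular", "linear", "default_to_circular", "default_to_linear")
DEFAULT_PREFIX = "default_to_"


def set_topology_sketch(annotations, topology):
    """Apply the documented topology rules to an annotations dict."""
    if topology not in VALID_TOPOLOGIES:
        # Assumption: invalid values are rejected rather than ignored.
        raise ValueError(f"topology should be one of {VALID_TOPOLOGIES}")
    if topology.startswith(DEFAULT_PREFIX):
        # "default_to_X": only fill in X when nothing is set yet.
        if "topology" not in annotations:
            annotations["topology"] = topology[len(DEFAULT_PREFIX):]
    else:
        # "circular" / "linear": always overwrite.
        annotations["topology"] = topology
    return annotations


print(set_topology_sketch({}, "default_to_circular"))
print(set_topology_sketch({"topology": "linear"}, "default_to_circular"))
print(set_topology_sketch({"topology": "linear"}, "circular"))
```

The second call leaves the existing "linear" value untouched, while the third overwrites it, which is the difference between the "default_to_" variants and the plain values.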
Enzyme autoselection
- dnacauldron.biotools.autoselect_enzyme.autoselect_enzyme(parts, enzymes=('BsmBI', 'BsaI', 'BbsI', 'AarI', 'SapI'))
Finds the enzyme that the parts were probably meant to be assembled with.
- Parameters:
parts – A list of SeqRecord files. They should have a “linear” attribute set to True or False, otherwise
- Returns:
The enzyme whose number of sites in the different constructs is as close as possible to exactly 2.
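One possible reading of this selection rule is sketched below on plain strings: for each enzyme, count how many parts carry exactly two recognition sites (both strands), then keep the enzyme with the highest count. This is an assumption about the scoring, not the library's implementation; the names `count_sites` and `autoselect_enzyme_sketch` are hypothetical, parts are treated as linear sequences, and sites spanning the origin of a circular part are ignored.

```python
# Recognition sequences of the default Type IIS enzymes (5' to 3').
SITES = {
    "BsmBI": "CGTCTC",
    "BsaI": "GGTCTC",
    "BbsI": "GAAGAC",
    "AarI": "CACCTGC",
    "SapI": "GCTCTTC",
}
_COMPLEMENT_TABLE = str.maketrans("ATGC", "TACG")


def _count_occurrences(sequence, pattern):
    """Count occurrences of pattern, including overlapping ones."""
    count, start = 0, sequence.find(pattern)
    while start != -1:
        count += 1
        start = sequence.find(pattern, start + 1)
    return count


def count_sites(sequence, site):
    """Count a non-palindromic site on both strands of a linear sequence."""
    sequence = sequence.upper()
    reverse_site = site.translate(_COMPLEMENT_TABLE)[::-1]
    return _count_occurrences(sequence, site) + _count_occurrences(sequence, reverse_site)


def autoselect_enzyme_sketch(parts, enzymes=("BsmBI", "BsaI", "BbsI", "AarI", "SapI")):
    """Pick the enzyme for which the most parts have exactly 2 sites."""
    def score(enzyme):
        return sum(1 for part in parts if count_sites(part, SITES[enzyme]) == 2)
    # On ties, max() keeps the first enzyme in the given order.
    return max(enzymes, key=score)


parts = [
    "AAGGTCTCATTTTTTGAGACCAA",    # one BsaI site on each strand
    "CCGGTCTCGACGTACGTGAGACCGG",  # one BsaI site on each strand
]
print(autoselect_enzyme_sketch(parts))  # BsaI
```

Two flanking sites per part is the typical layout for Golden Gate style parts, which is why the count of exactly 2 is used as the signal here.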