Biotools module
Sequence and Record I/O
-
dnacauldron.biotools.sequence_io.load_record(filepath, topology='default_to_linear', id='auto', upperize=True, max_name_length=20)
Return a Biopython record read from a Fasta/Genbank/Snapgene file.
- Parameters
- filepath
Path to a Genbank, Fasta, or Snapgene (.dna) file.
- topology
Can be “circular”, “linear”, “default_to_circular” (will default to circular if annotations['topology'] is not already set) or “default_to_linear”.
- id
Sets the record.id. If “auto”, the original record.id is used, and if none is set the name of the file (without extension) is used instead.
- upperize
If true, the sequence is converted to upper case (recommended in this library, as mixed upper and lower case can cause problems in Biopython’s enzyme site search).
- max_name_length
Maximum length of the record’s name. Longer names are truncated to avoid Biopython exceptions being raised.
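The interplay between id and max_name_length can be illustrated with a small standard-library sketch. The helper name resolve_id_and_name and the set of values treated as "no id set" are assumptions made for illustration only; this is not dnacauldron's actual code.

```python
import os

# Assumption: these placeholder values are treated as "no id set".
UNSET_IDS = (None, "", "<unknown id>")


def resolve_id_and_name(original_id, filepath, id="auto", max_name_length=20):
    """Hypothetical helper mimicking the documented id/name behaviour."""
    if id == "auto":
        id = original_id
        if id in UNSET_IDS:
            # Fall back to the file name without its extension.
            id = os.path.splitext(os.path.basename(filepath))[0]
    # Truncate the name so that it never exceeds max_name_length.
    name = id[:max_name_length]
    return id, name


print(resolve_id_and_name(None, "parts/partA.gb"))
```

With no original id, the file name "partA" is used for both the id and the name; an explicit id always takes precedence over the file name.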
-
dnacauldron.biotools.sequence_io.load_records_from_file(filepath)
Autodetect file format and load Biopython records from it.
-
dnacauldron.biotools.sequence_io.load_records_from_files(files=None, folder=None, use_file_names_as_ids=False)
Automatically convert files or a folder’s content to Biopython records.
- Parameters
- files
A list of paths to files. A folder can be provided instead.
- folder
A path to a folder containing sequence files.
- use_file_names_as_ids
If True, for every file containing a single record, the file name (without extension) will be set as the record’s ID.
Biopython record operations
-
dnacauldron.biotools.record_operations.annotate_record(seqrecord, location='full', feature_type='misc_feature', margin=0, **qualifiers)
Add a feature to a Biopython SeqRecord.
- Parameters
- seqrecord
The Biopython seqrecord to be annotated.
- location
Either (start, end) or (start, end, strand). (strand defaults to +1)
- feature_type
The type associated with the feature.
- margin
Number of extra bases added on each side of the given location.
- qualifiers
Dictionary that will be the Biopython feature’s qualifiers attribute.
-
dnacauldron.biotools.record_operations.complement(dna_sequence)
Return the complement of the DNA sequence.
For instance complement("ATGCCG") returns "TACGGC". Uses Biopython for speed.
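For readers who want to check the example by hand, the complement operation can be sketched in pure Python with a translation table. The name complement_sketch is hypothetical, and this does not go through Biopython as the library function does.

```python
# Translation table mapping each base to its complement (both cases).
_COMPLEMENT_TABLE = str.maketrans("ACGTacgt", "TGCAtgca")


def complement_sketch(dna_sequence):
    """Return the base-by-base complement, without reversing the sequence."""
    return dna_sequence.translate(_COMPLEMENT_TABLE)


print(complement_sketch("ATGCCG"))  # TACGGC
```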
-
dnacauldron.biotools.record_operations.crop_record_with_saddling_features(record, start, end, filters=())
Crop the Biopython record, but keep features that are only partially in.
- Parameters
- record
The Biopython record to crop.
- start, end
Coordinates of the segment to crop.
- filters
List of functions (feature=>True/False). Any feature that fails at least one of the filters is discarded.
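The idea of keeping "saddling" features can be sketched without Biopython by representing each feature as a (start, end, label) tuple. Everything here is an illustrative assumption: the function name, the tuple representation, the choice to clip saddling features to the cropped window, and the choice to apply the filters only to saddling features and to require that all of them pass.

```python
def crop_features_sketch(features, start, end, filters=()):
    """Crop (f_start, f_end, label) features to the window [start, end).

    Features fully inside the window are shifted to the new coordinates.
    Features that only partially overlap the window ("saddling" features)
    are clipped to it, provided they pass every filter (assumption).
    """
    cropped = []
    for feature in features:
        f_start, f_end, label = feature
        if f_end <= start or f_start >= end:
            continue  # entirely outside the window
        fully_inside = start <= f_start and f_end <= end
        if not fully_inside and not all(fl(feature) for fl in filters):
            continue  # saddling feature rejected by a filter
        new_start = max(f_start, start) - start
        new_end = min(f_end, end) - start
        cropped.append((new_start, new_end, label))
    return cropped


features = [(0, 5, "out"), (8, 15, "left"), (12, 18, "in"), (17, 30, "right")]
print(crop_features_sketch(features, 10, 20))
# [(0, 5, 'left'), (2, 8, 'in'), (7, 10, 'right')]
```

A plain slice of the feature list would drop "left" and "right" because they extend beyond the window; here they survive in clipped form.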
-
dnacauldron.biotools.record_operations.reverse_complement(sequence)
Return the reverse-complement of the DNA sequence.
For instance reverse_complement("ATGCCG") returns "CGGCAT". Uses Biopython for speed.
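The reverse-complement is the complement read backwards, so "ATGCCG" becomes "TACGGC" when complemented and "CGGCAT" once reversed. A pure-Python sketch (the name reverse_complement_sketch is hypothetical, and Biopython is not used here):

```python
# Translation table mapping each base to its complement (both cases).
_COMPLEMENT_TABLE = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement_sketch(sequence):
    """Complement every base, then reverse the resulting string."""
    return sequence.translate(_COMPLEMENT_TABLE)[::-1]


print(reverse_complement_sketch("ATGCCG"))  # CGGCAT
```

Applying the function twice returns the original sequence, which is a convenient sanity check.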
-
dnacauldron.biotools.record_operations.sequence_to_biopython_record(sequence, id='<unknown id>', name='same_as_id', features=())
Return a SeqRecord of the sequence, ready to be written to a Genbank file.
-
dnacauldron.biotools.record_operations.set_record_topology(record, topology)
Set the Biopython record’s topology, optionally leaving it unchanged if already set.
This actually sets record.annotations['topology']. The topology parameter can be “circular”, “linear”, “default_to_circular” (will default to circular if annotations['topology'] is not already set) or “default_to_linear”.
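The four accepted values can be sketched on a plain dictionary standing in for record.annotations. The function name set_topology_sketch and the error raised on unknown values are assumptions for illustration, not the library's code.

```python
def set_topology_sketch(annotations, topology):
    """Apply the documented topology rules to an annotations dict."""
    valid = ("circular", "linear", "default_to_circular", "default_to_linear")
    if topology not in valid:
        # Assumption: unknown values are rejected.
        raise ValueError("topology should be one of %s" % ", ".join(valid))
    prefix = "default_to_"
    if topology.startswith(prefix):
        # Only set the topology if none is present yet.
        annotations.setdefault("topology", topology[len(prefix):])
    else:
        # "circular" and "linear" always overwrite.
        annotations["topology"] = topology
    return annotations


print(set_topology_sketch({"topology": "linear"}, "default_to_circular"))
# {'topology': 'linear'}
```

The "default_to_" variants are useful when loading files of mixed origin: records that already declare a topology keep it, and the others get a sensible fallback.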
Enzyme autoselection
-
dnacauldron.biotools.autoselect_enzyme.autoselect_enzyme(parts, enzymes=('BsmBI', 'BsaI', 'BbsI', 'AarI', 'SapI'))
Find the enzyme that the parts were probably meant to be assembled with.
- Parameters
- parts
A list of SeqRecord files. They should have a “linear” attribute set to True or False, otherwise
- Returns
- The enzyme whose number of sites in each of the constructs is as close as possible to exactly 2.
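One plausible reading of "as close as possible to exactly 2 sites" is to sum, over all parts, the distance between the site count and 2, and to pick the enzyme with the smallest total. The sketch below follows that reading on plain strings. It is an assumption-laden illustration: the scoring rule, the function names, and the treatment of every sequence as linear (no sites spanning the origin of a circular part) are not taken from the library, which works on SeqRecords.

```python
# Recognition sequences (5'->3') of the default candidate enzymes.
RECOGNITION_SITES = {
    "BsmBI": "CGTCTC",
    "BsaI": "GGTCTC",
    "BbsI": "GAAGAC",
    "AarI": "CACCTGC",
    "SapI": "GCTCTTC",
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def count_sites(sequence, site):
    """Count occurrences of a site on both strands of a linear sequence."""
    sequence = sequence.upper()
    rc_site = site.translate(_COMPLEMENT)[::-1]

    def occurrences(pattern):
        last_start = len(sequence) - len(pattern) + 1
        return sum(sequence.startswith(pattern, i) for i in range(last_start))

    return occurrences(site) + occurrences(rc_site)


def autoselect_enzyme_sketch(
    sequences, enzymes=("BsmBI", "BsaI", "BbsI", "AarI", "SapI")
):
    """Pick the enzyme whose site count is closest to 2 across all parts."""

    def misfit(enzyme):
        site = RECOGNITION_SITES[enzyme]
        return sum(abs(count_sites(seq, site) - 2) for seq in sequences)

    # Ties are resolved in favour of the enzyme listed first.
    return min(enzymes, key=misfit)


parts = ["AAAGGTCTCATTTTTTGAGACCAAA", "TTGGTCTCTACGTACGGAGACCTT"]
print(autoselect_enzyme_sketch(parts))  # BsaI
```

Both example parts carry one forward and one reverse BsaI site and no site for the other enzymes, so BsaI has a total misfit of 0.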