Biotools module

Sequence and Record I/O

dnacauldron.biotools.sequence_io.load_record(filepath, topology='default_to_linear', id='auto', upperize=True, max_name_length=20)

Return a Biopython record read from a Fasta/Genbank/Snapgene file.

Parameters:
  • filepath – Path to a Genbank, Fasta, or Snapgene (.dna) file.

  • topology – Can be “circular”, “linear”, “default_to_circular” (will default to circular if annotations['topology'] is not already set) or “default_to_linear”.

  • id – Sets the record.id. If “auto”, the original record.id is used, and if none is set the name of the file (without extension) is used instead.

  • upperize – If True, the sequence is converted to uppercase (recommended in this library, as mixing upper and lower case can cause problems in Biopython’s enzyme site search).

  • max_name_length – Maximum length of the record name. Longer names are truncated to avoid exceptions being raised by Biopython.
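The id and max_name_length rules can be illustrated with a small stand-alone sketch. It does not call dnacauldron; resolve_id_and_name and UNSET_IDS are hypothetical names, and which IDs count as "not set" is an assumption (Biopython uses "<unknown id>" as the placeholder ID of a SeqRecord created without one).

```python
from pathlib import Path

# IDs treated as "not set": an assumption for this sketch.
UNSET_IDS = {None, "", "<unknown id>"}


def resolve_id_and_name(record_id, record_name, filepath,
                        id_setting="auto", max_name_length=20):
    """Hypothetical helper mirroring the documented `id` and
    `max_name_length` rules of load_record; returns (id, name)."""
    if id_setting == "auto":
        # Keep the record's own ID, else fall back to the file name
        # without its extension.
        new_id = record_id if record_id not in UNSET_IDS else Path(filepath).stem
    else:
        new_id = id_setting
    # Truncate the name so it never exceeds max_name_length characters.
    return new_id, record_name[:max_name_length]
```

For example, a record with no ID loaded from parts/my_part.gb would get the ID "my_part".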

dnacauldron.biotools.sequence_io.load_records_from_file(filepath)

Autodetect the file format and load Biopython records from the file.

dnacauldron.biotools.sequence_io.load_records_from_files(files=None, folder=None, use_file_names_as_ids=False)

Automatically convert files, or the content of a folder, to Biopython records.

Parameters:
  • files – A list of paths to files. Alternatively, a folder can be provided instead (see folder).

  • folder – A path to a folder containing sequence files.

  • use_file_names_as_ids – If True, for every file containing a single record, the file name (without extension) will be set as the record’s ID.
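The choice between files and folder, and the file-name-as-ID rule, can be sketched as follows. This is not dnacauldron's code: collect_sequence_files, file_name_as_id and SEQUENCE_EXTENSIONS are hypothetical, and restricting a folder to those extensions is an assumption.

```python
from pathlib import Path

# Extensions treated as sequence files: an assumption for this sketch.
SEQUENCE_EXTENSIONS = {".gb", ".gbk", ".genbank", ".fa", ".fasta", ".dna"}


def collect_sequence_files(files=None, folder=None):
    """Hypothetical helper: return the paths to read, either the explicit
    `files` list or the matching files found directly in `folder`."""
    if files is not None:
        return [Path(f) for f in files]
    if folder is not None:
        return sorted(
            p for p in Path(folder).iterdir()
            if p.is_file() and p.suffix.lower() in SEQUENCE_EXTENSIONS
        )
    raise ValueError("Provide either `files` or `folder`.")


def file_name_as_id(path):
    """ID given to a single-record file when use_file_names_as_ids=True."""
    return Path(path).stem
```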

dnacauldron.biotools.sequence_io.string_to_records(string)

Convert a string of FASTA, GenBank, etc. content (or a plain ATGC sequence) into Biopython records.

Can also be used to detect a format.
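Format detection from a string can be approximated by looking at how the text starts. The sketch below is a simplified heuristic, not necessarily what string_to_records does (it may instead try each parser in turn). detect_sequence_format and the "ATGC" label for plain sequences are assumptions, and binary SnapGene content is not handled.

```python
import re


def detect_sequence_format(string):
    """Hypothetical sniffer: guess the format of sequence text."""
    text = string.strip()
    if re.fullmatch(r"[ATGC]+", text):
        return "ATGC"      # a bare uppercase DNA sequence
    if text.startswith(">"):
        return "fasta"     # FASTA records start with a '>' header line
    if text.startswith("LOCUS"):
        return "genbank"   # GenBank records start with a LOCUS line
    raise ValueError("Unrecognized sequence format")
```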

dnacauldron.biotools.sequence_io.write_record(record, target, fmt='genbank')

Write a record as GenBank, FASTA, etc. via Biopython, with fixes.

Biopython record operations

dnacauldron.biotools.record_operations.annotate_record(seqrecord, location='full', feature_type='misc_feature', margin=0, **qualifiers)

Add a feature to a Biopython SeqRecord.

Parameters:
  • seqrecord – The Biopython seqrecord to be annotated.

  • location – Either (start, end) or (start, end, strand); the strand defaults to +1. The default, “full”, annotates the whole record.

  • feature_type – The type associated with the feature.

  • margin – Number of extra bases added on each side of the given location.

  • qualifiers – Keyword arguments gathered into a dictionary that becomes the Biopython feature’s qualifiers attribute.
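The accepted location forms and the margin can be summarized by a helper that turns any of them into a (start, end, strand) triple. This is a sketch without Biopython; normalize_location is hypothetical, and clamping the widened location to the record bounds is an assumption.

```python
def normalize_location(location, record_length, margin=0):
    """Hypothetical helper: turn the documented `location` forms into
    a (start, end, strand) triple."""
    if location == "full":
        start, end, strand = 0, record_length, 1
    elif len(location) == 2:
        start, end = location
        strand = 1  # strand defaults to +1
    elif len(location) == 3:
        start, end, strand = location
    else:
        raise ValueError("location must be 'full', (start, end) or (start, end, strand)")
    # Add `margin` bases on each side, clamped to the record (an assumption).
    start = max(0, start - margin)
    end = min(record_length, end + margin)
    return start, end, strand
```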

dnacauldron.biotools.record_operations.complement(dna_sequence)

Return the complement of the DNA sequence.

For instance complement("ATGCCG") returns "TACGGC".

Uses Biopython for speed.
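For readers who want the semantics without Biopython, the complement can be expressed with a translation table. This is a pure-Python equivalent for illustration, not the library's Biopython-based implementation; only A, T, G and C (either case) are mapped, and other characters pass through unchanged.

```python
# Map each base to its Watson-Crick partner, preserving case.
_COMPLEMENT_TABLE = str.maketrans("ATGCatgc", "TACGtacg")


def complement(dna_sequence: str) -> str:
    """Return the base-by-base complement of the sequence (not reversed)."""
    return dna_sequence.translate(_COMPLEMENT_TABLE)
```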

dnacauldron.biotools.record_operations.crop_record_with_saddling_features(record, start, end, filters=())

Crop the Biopython record, but keep features that lie only partially inside the cropped segment (a plain slice of a SeqRecord drops them).

Parameters:
  • record – The Biopython record to crop.

  • start – Start coordinate of the segment to crop.

  • end – End coordinate of the segment to crop.

  • filters – List of functions (feature => True/False). Any feature that fails at least one filter is discarded.
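The feature-selection logic can be sketched on plain dictionaries standing in for Biopython features. crop_features and the {"start", "end"} dict layout are hypothetical, strands are ignored, and requiring a feature to pass every filter follows the description above.

```python
def crop_features(features, start, end, filters=()):
    """Hypothetical sketch: keep every feature overlapping [start, end),
    including partially overlapping ("saddling") ones, and clip its
    coordinates to the cropped segment."""
    cropped = []
    for feature in features:
        # A feature must pass every filter to be kept.
        if not all(keep(feature) for keep in filters):
            continue
        f_start, f_end = feature["start"], feature["end"]
        if f_end <= start or f_start >= end:
            continue  # entirely outside the segment
        cropped.append({
            **feature,
            # Clip to the segment and shift to segment-relative coordinates.
            "start": max(f_start, start) - start,
            "end": min(f_end, end) - start,
        })
    return cropped
```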

dnacauldron.biotools.record_operations.reverse_complement(sequence)

Return the reverse-complement of the DNA sequence.

For instance reverse_complement("ATGCCG") returns "CGGCAT".

Uses Biopython for speed.
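A pure-Python equivalent (for illustration only, not the library's Biopython-based code) makes the two steps explicit: complement every base, then reverse the string.

```python
# Map each base to its Watson-Crick partner, preserving case.
_COMPLEMENT_TABLE = str.maketrans("ATGCatgc", "TACGtacg")


def reverse_complement(sequence: str) -> str:
    """Complement each base, then reverse the result."""
    return sequence.translate(_COMPLEMENT_TABLE)[::-1]
```

Reversing without complementing would give "GCCGTA" for "ATGCCG", which is not the reverse complement.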

dnacauldron.biotools.record_operations.sequence_to_biopython_record(sequence, id='<unknown id>', name='same_as_id', features=())

Return a SeqRecord of the sequence, ready to be written in GenBank format.

dnacauldron.biotools.record_operations.set_record_topology(record, topology)

Set the Biopython record’s topology, optionally leaving it unchanged if it is already set.

This actually sets record.annotations['topology']. The topology parameter can be “circular”, “linear”, “default_to_circular” (will default to circular if annotations['topology'] is not already set) or “default_to_linear”.
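The four topology options can be sketched on a plain dict standing in for record.annotations. set_topology is a hypothetical name, and treating "already set" as "the key is present" is an assumption.

```python
VALID_TOPOLOGIES = ("circular", "linear", "default_to_circular", "default_to_linear")


def set_topology(annotations, topology):
    """Hypothetical sketch of the documented topology rules, applied to a
    dict standing in for record.annotations."""
    if topology not in VALID_TOPOLOGIES:
        raise ValueError(f"Unknown topology: {topology!r}")
    if topology.startswith("default_to_"):
        if "topology" in annotations:
            return annotations  # already set: leave it unchanged
        topology = topology[len("default_to_"):]
    annotations["topology"] = topology
    return annotations
```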

Enzyme autoselection

dnacauldron.biotools.autoselect_enzyme.autoselect_enzyme(parts, enzymes=('BsmBI', 'BsaI', 'BbsI', 'AarI', 'SapI'))

Find the enzyme that the parts were most likely meant to be assembled with.

Parameters:

  • parts – A list of SeqRecord objects. They should have a “linear” attribute set to True or False, otherwise

Returns:

  • The enzyme whose number of sites in each construct is as close as possible to exactly 2.
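The selection criterion can be sketched with plain strings instead of SeqRecords. The scoring rule (sum over parts of |number of sites − 2|, lowest wins) is an assumption based on the description above; autoselect_enzyme_sketch, count_sites and RECOGNITION_SITES are hypothetical names. Parts are treated as linear, so sites spanning the origin of a circular part are not found.

```python
# Recognition sites of the default enzymes (top strand, 5'->3').
RECOGNITION_SITES = {
    "BsmBI": "CGTCTC", "BsaI": "GGTCTC", "BbsI": "GAAGAC",
    "AarI": "CACCTGC", "SapI": "GCTCTTC",
}
_COMPLEMENT_TABLE = str.maketrans("ATGC", "TACG")


def _reverse_complement(seq):
    return seq.translate(_COMPLEMENT_TABLE)[::-1]


def count_sites(sequence, site):
    """Count occurrences of `site` on both strands of a linear sequence."""
    sequence = sequence.upper()
    total = 0
    # A set avoids double counting palindromic sites.
    for pattern in {site, _reverse_complement(site)}:
        total += sum(
            1 for i in range(len(sequence) - len(pattern) + 1)
            if sequence.startswith(pattern, i)
        )
    return total


def autoselect_enzyme_sketch(parts, enzymes=tuple(RECOGNITION_SITES)):
    """Return the enzyme whose site count per part deviates least from 2."""
    def score(enzyme):
        site = RECOGNITION_SITES[enzyme]
        return sum(abs(2 - count_sites(part, site)) for part in parts)
    return min(enzymes, key=score)
```

Here each part carries one GGTCTC site and one on the reverse strand, so BsaI scores 0 while the other enzymes score 4:

```python
parts = ["AAGGTCTCAAAAAAGAGACCAA", "TTGGTCTCTTTTTGAGACCTT"]
autoselect_enzyme_sketch(parts)  # "BsaI"
```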