Saboteurs
Saboteurs is a Python library to detect failure-causing elements from success/failure data.
We use it at the Edinburgh Genome Foundry to identify defective genetic parts early:
When assembling large fragments of DNA, each with typically 5 to 25 parts, we observe that some assemblies have far fewer successes (“good clones”) than others. We use Saboteurs to identify the parts that may be causing these failures, which generally means that the samples corresponding to these parts have been compromised.
Before launching a large batch of assemblies which reuse the same few parts, we use Saboteurs to design a smaller “test batch” of carefully selected assemblies to detect and identify possible bad parts.
See this page for the HTML docs.
You can also use Saboteurs online, using this web app for saboteur detection, or this other app for designing test batches.
Usage
Logical methods
Identifying saboteur elements from experimental results
Assume that a secret organization has a few dozen agents ([A]nna, [B]ob, [C]harlie, [D]olly, etc.). Regularly, the organization puts together a team (e.g. A, C, D) and sends it on a mission, which should succeed unless one of the members is a double agent who secretly sabotages the mission. Looking at the table below, can you identify the saboteur(s)?
Mission | Members | Outcome |
---|---|---|
1 | A C D | Success |
2 | B C E | Failure |
3 | A B D | Success |
4 | D F G | Failure |
Mission 2 raises suspicion on B, C, and E, but mission 1 clears C, and mission 3 clears B. Therefore E is a saboteur. Meanwhile, mission 4 raises suspicion on F and G (D is cleared by missions 1 and 3), and since neither of them is cleared by another mission, it is impossible to say whether only F, only G, or both are saboteurs.
The Saboteurs library has a method `find_logical_saboteurs` which applies this reasoning to many groups with many elements. Here is how you would solve the problem above:
from saboteurs import find_logical_saboteurs
groups = {
1: ['A', 'C', 'D'],
2: ['B', 'C', 'E'],
3: ['A', 'B', 'D'],
4: ['D', 'F', 'G']
}
find_logical_saboteurs(groups, failed_groups=[2, 4])
# result: {'saboteurs': ['E'], 'suspicious': ['G', 'F']}
In the result, `suspicious` is the list of all elements which only appear in failing groups, and `saboteurs` is the list of suspicious elements which are also the only suspicious element in at least one group (and therefore confirmed unambiguously as saboteurs).
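The reasoning above can be sketched in a few lines of plain Python. This is a minimal illustration of the logic as described here, not the library's actual implementation: the function name `sketch_logical_saboteurs` is hypothetical, and confirmed saboteurs are left out of the `suspicious` list on the assumption that this matches the result shown above.

```python
def sketch_logical_saboteurs(groups, failed_groups):
    """Hypothetical re-implementation of the clearing logic, for illustration."""
    failed = set(failed_groups)

    # Any element that took part in a successful group is cleared.
    cleared = set()
    for name, members in groups.items():
        if name not in failed:
            cleared.update(members)

    # Elements that only ever appear in failing groups remain suspicious.
    all_elements = {element for members in groups.values() for element in members}
    suspicious = all_elements - cleared

    # A suspicious element alone in a failed group must be a saboteur.
    saboteurs = set()
    for name in failed:
        suspects = suspicious.intersection(groups[name])
        if len(suspects) == 1:
            saboteurs.update(suspects)

    return {
        "saboteurs": sorted(saboteurs),
        # Assumption: confirmed saboteurs are not repeated among the suspects.
        "suspicious": sorted(suspicious - saboteurs),
    }


groups = {
    1: ["A", "C", "D"],
    2: ["B", "C", "E"],
    3: ["A", "B", "D"],
    4: ["D", "F", "G"],
}
result = sketch_logical_saboteurs(groups, failed_groups=[2, 4])
print(result)
```

On the four missions of the table, this sketch confirms E and leaves F and G as unresolved suspects.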
Designing experiment batches to find saboteur elements.
Assume that we have a list of agents, among whom we suspect one or two saboteurs may be hiding. We want to select a batch of “test groups” (from all possible teams) so that once we have the outcome of each of these teams (success or failure), we can identify the one or two saboteurs. This is solved as follows:
from saboteurs import design_test_batch
all_possible_groups = {
    'group_1': ['A', 'B', 'C'],
    'group_2': ['A', 'B', 'D', 'E'],
    # ... and many more
}
selected_groups, error = design_test_batch(all_possible_groups,
max_saboteurs=2)
# result:
# OrderedDict([('group_3', ('A', 'B', 'L')),
# ('group_9', ('A', 'E', 'I', 'L')),
# ... and more])
You can get a quick report (CSV file and plot) of the selected groups with
generate_batch_report(selected_groups, plot_format='png',
target='design_test_batch_report')
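To make the design goal concrete, here is a hedged sketch of one way to check whether a given batch can identify the saboteurs: every possible set of up to `max_saboteurs` saboteurs must produce a different pattern of failed groups. The helper names `failure_signature` and `batch_identifies_saboteurs` are hypothetical, and this brute-force check is only an illustration of the criterion, not the selection algorithm used by `design_test_batch`.

```python
from itertools import combinations


def failure_signature(groups, saboteurs):
    """Names of the groups that would fail if `saboteurs` were the bad elements."""
    return frozenset(
        name for name, members in groups.items()
        if saboteurs.intersection(members)
    )


def batch_identifies_saboteurs(groups, max_saboteurs=1):
    """True if each scenario of 1..max_saboteurs saboteurs fails a unique set of groups."""
    elements = sorted({element for members in groups.values() for element in members})
    seen_signatures = set()
    for size in range(1, max_saboteurs + 1):
        for scenario in combinations(elements, size):
            signature = failure_signature(groups, set(scenario))
            if signature in seen_signatures:
                # Two different scenarios give the same outcomes: ambiguous batch.
                return False
            seen_signatures.add(signature)
    return True


small_batch = {"g1": ["A", "B"], "g2": ["B", "C"]}
print(batch_identifies_saboteurs(small_batch, max_saboteurs=1))
print(batch_identifies_saboteurs(small_batch, max_saboteurs=2))
```

With a single saboteur, the two groups of `small_batch` are enough to tell A, B and C apart. With two saboteurs they are not, because the pair A and B fails the same groups as B alone.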
In practice, a group can have different “positions” and a given element can only fill one of these positions. Consider for instance that there are 4 possible positions, with respective possible elements lists as follows:
elements_per_position = {
"Position_1": ['A', 'B', 'C'],
"Position_2": ['D', 'E', 'F', 'G'],
"Position_3": ['H', 'I', 'J', 'K'],
"Position_4": ['L', 'M', 'N'],
}
In that case there are 3 × 4 × 4 × 3 = 144 possible combinations, which can be generated using the Saboteurs utility method `generate_combinatorial_groups`:
from saboteurs import (generate_combinatorial_groups, design_test_batch)
possible_groups = generate_combinatorial_groups(elements_per_position)
selected_groups = design_test_batch(possible_groups, max_saboteurs=2)
# result:
# OrderedDict([('group_009', ('A', 'D', 'J', 'N')),
# ('group_016', ('A', 'E', 'I', 'L')),
# ... and 13 other groups])
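The count of 144 combinations can be checked with the standard library alone. The sketch below uses `itertools.product` to enumerate one element per position; the `group_001`-style names only mimic the output shown above, and the numbering is an assumption that may not match the order used by `generate_combinatorial_groups`.

```python
from itertools import product

elements_per_position = {
    "Position_1": ["A", "B", "C"],
    "Position_2": ["D", "E", "F", "G"],
    "Position_3": ["H", "I", "J", "K"],
    "Position_4": ["L", "M", "N"],
}

# One element is picked for each position, in the order the positions are listed.
all_combinations = list(product(*elements_per_position.values()))

# Hypothetical naming scheme, chosen to resemble the names in the result above.
sketch_groups = {
    "group_%03d" % (index + 1): combination
    for index, combination in enumerate(all_combinations)
}

print(len(sketch_groups))
```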
Statistical methods
Example 1: assume that a secret organization has a few dozen agents (Anna, Bob, Charlie, etc.). Regularly, the organization puts together a group (e.g. Anna, David and Peggy) and sends that group on missions, some of which succeed and some of which fail. After a large number of missions, looking at the results of each group, you may ask: are there agents who tend to lower the chances of success of the groups they are part of?
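As a rough intuition for this question, one can compare the success rate of missions that include a given agent with the success rate of missions that do not. The sketch below does only that, on made-up numbers; it is a naive illustration, not the statistical model used by `find_statistical_saboteurs`, and the function name and data layout are hypothetical.

```python
def success_rates_with_and_without(missions):
    """For each agent, return (rate with the agent, rate without the agent).

    `missions` is a list of (members, n_successes, n_failures) tuples.
    A rate is None when there is no mission to compute it from.
    """
    agents = sorted({agent for members, _, _ in missions for agent in members})
    rates = {}
    for agent in agents:
        successes = {True: 0, False: 0}
        attempts = {True: 0, False: 0}
        for members, n_successes, n_failures in missions:
            involved = agent in members
            successes[involved] += n_successes
            attempts[involved] += n_successes + n_failures
        rates[agent] = tuple(
            successes[key] / attempts[key] if attempts[key] else None
            for key in (True, False)
        )
    return rates


# Made-up example data: (group members, successes, failures)
missions = [
    (("Anna", "Bob"), 9, 1),
    (("Anna", "Charlie"), 2, 8),
    (("Bob", "David"), 8, 2),
    (("Charlie", "David"), 3, 7),
]
rates = success_rates_with_and_without(missions)
for agent, (rate_with, rate_without) in rates.items():
    print(agent, rate_with, rate_without)
```

In this toy data set, groups containing Charlie succeed far less often than groups without Charlie. A real analysis also needs to assess whether such a gap is statistically significant, which is what the library's report is for.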
With the Saboteurs library, you would first put your data in a spreadsheet `data.csv` like this one, then run the following script:
from saboteurs import (csv_to_groups_data,
find_statistical_saboteurs,
statistics_report)
groups_data = csv_to_groups_data("data.csv")
analysis_results = find_statistical_saboteurs(groups_data)
statistics_report(analysis_results, "report.pdf")
You obtain the following PDF report highlighting which members have a significant negative impact on their groups, and where they appear:
Installation
You can install Saboteurs with pip:
sudo pip install saboteurs
Alternatively, you can unzip the sources in a folder and type
sudo python setup.py install
License = MIT
Saboteurs is open-source software originally written at the Edinburgh Genome Foundry by Zulko and released on GitHub under the MIT licence (Copyright 2017 Edinburgh Genome Foundry). Everyone is welcome to contribute!
More biology software
Saboteurs is part of the EGF Codons synthetic biology software suite for DNA design, manufacturing and validation.