Datasets¶
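PIDSMaker supports several public datasets commonly used in advanced persistent threat (APT) detection research. This page describes each dataset and its attack scenarios. In the commands shown on this page, SYSTEM stands for the name of the detection system to run.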
Overview¶
| Dataset | OS | Attacks | Size (GB) |
|---|---|---|---|
| CADETS_E3 | FreeBSD | 3 | 10 |
| THEIA_E3 | Linux | 2 | 12 |
| CLEARSCOPE_E3 | Android | 1 | 4.8 |
| FIVEDIRECTIONS_E3 | Windows | 2 | 22 |
| TRACE_E3 | Linux | 3 | 100 |
| CADETS_E5 | FreeBSD | 2 | 276 |
| THEIA_E5 | Linux | 1 | 36 |
| CLEARSCOPE_E5 | Android | 2 | 49 |
| FIVEDIRECTIONS_E5 | Windows | 4 | 280 |
| TRACE_E5 | Linux | 1 | 710 |
| optc_h201 | Windows | 1 | 9 |
| optc_h501 | Windows | 1 | 6.7 |
| optc_h051 | Windows | 1 | 7.7 |
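The table can also be handled programmatically, for example to choose datasets that fit a disk budget. The sketch below copies a subset of rows from the table into a plain Python structure. `DatasetInfo`, `DATASETS`, and `datasets_within` are illustrative names and are not part of PIDSMaker.

```python
from typing import NamedTuple


class DatasetInfo(NamedTuple):
    os: str
    attacks: int
    size_gb: float


# Rows copied from the overview table (illustrative subset, not a PIDSMaker API).
DATASETS = {
    "CADETS_E3": DatasetInfo("FreeBSD", 3, 10),
    "THEIA_E3": DatasetInfo("Linux", 2, 12),
    "CLEARSCOPE_E3": DatasetInfo("Android", 1, 4.8),
    "THEIA_E5": DatasetInfo("Linux", 1, 36),
    "CLEARSCOPE_E5": DatasetInfo("Android", 2, 49),
    "optc_h201": DatasetInfo("Windows", 1, 9),
    "optc_h501": DatasetInfo("Windows", 1, 6.7),
    "optc_h051": DatasetInfo("Windows", 1, 7.7),
}


def datasets_within(max_size_gb: float) -> list[str]:
    """Return names of datasets no larger than max_size_gb, smallest first."""
    fitting = [
        (info.size_gb, name)
        for name, info in DATASETS.items()
        if info.size_gb <= max_size_gb
    ]
    return [name for _, name in sorted(fitting)]


print(datasets_within(10))
```

With a 10 GB budget this selects CLEARSCOPE_E3, the three OpTC hosts, and CADETS_E3, which makes them reasonable starting points for a first run.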
DARPA TC¶
The DARPA Transparent Computing program produced benchmark datasets for evaluating provenance-based security systems.
Engagement 3 (E3) - April 2018¶
CADETS_E3¶
FreeBSD host with Nginx server exploitation.
| Attack id | Duration | Description |
|---|---|---|
| 0 | 49 min | Nginx exploited to deploy Drakon loader with root escalation. Netrecon executed after C2 connection, followed by failed libdrakon injection into sshd. Host crashed with kernel panic. |
| 1 | 40 min | Nginx re-exploited to deploy Drakon and MicroAPT implants under random names (tmux, minions, sendmail). Privilege escalation failed; MicroAPT ran unprivileged for port scanning. |
| 2 | 13 min | Nginx re-exploited to deploy new Drakon implant with root privileges. Multiple failed sshd injection attempts using renamed libdrakon copies. |
python pidsmaker/main.py SYSTEM CADETS_E3
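Every dataset on this page is launched with the same command pattern, so runs over several datasets can be scripted. The following is a minimal sketch, assuming that `SYSTEM` is a placeholder for the detection system name and that the script is started from the repository root. `build_command` and `run_all` are hypothetical helpers, not PIDSMaker functions.

```python
import subprocess


def build_command(system: str, dataset: str) -> list[str]:
    """Build the argument list for one run, mirroring the command shown above."""
    return ["python", "pidsmaker/main.py", system, dataset]


def run_all(system: str, datasets: list[str], dry_run: bool = True) -> list[list[str]]:
    """Build one command per dataset and, unless dry_run is set, execute each in order."""
    commands = [build_command(system, dataset) for dataset in datasets]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the batch at the first failing run.
            subprocess.run(cmd, check=True)
    return commands


# Dry run: only prints the commands, nothing is executed.
run_all("SYSTEM", ["CADETS_E3", "THEIA_E3", "CLEARSCOPE_E3"])
```

Keeping `dry_run=True` as the default makes it easy to review the commands before committing to long-running jobs.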
THEIA_E3¶
Ubuntu host with Firefox exploitation.
| Attack id | Duration | Description |
|---|---|---|
| 0 | 50 min | Malicious Firefox extension dropped Drakon implant. MicroAPT staged under /var/log/mail, connected to C2 for control and network scanning. |
| 1 | 30 min | Firefox exploited to drop Drakon implant as /home/admin/clean with root privileges, then copied as profile. Both connected to C2 server. |
python pidsmaker/main.py SYSTEM THEIA_E3
CLEARSCOPE_E3¶
Android device with Firefox exploitation.
| Attack id | Duration | Description |
|---|---|---|
| 0 | 54 min | Firefox exploited via malicious website. Drakon implant installed and elevated, but module loading failed. Persistent C2 connection maintained. |
python pidsmaker/main.py SYSTEM CLEARSCOPE_E3
Engagement 5 (E5) - May 2019¶
THEIA_E5¶
Ubuntu host with Firefox exploitation.
| Attack id | Duration | Description |
|---|---|---|
| 0 | 19 min | Firefox exploited via malicious website. Root gained with BinFmt-Elevate, Drakon shellcode injected into sshd, persistence file created, C2 access maintained. |
python pidsmaker/main.py SYSTEM THEIA_E5
CLEARSCOPE_E5¶
Android device with APK-based attacks.
| Attack id | Duration | Description |
|---|---|---|
| 0 | 41 min | Malicious appstarter APK loaded MicroAPT. Elevate driver installed for privilege escalation. Sensitive databases exfiltrated (calllog, calendar, SMS) and screenshot captured. |
| 1 | 8 min | MicroAPT deployed directly via adb shell after APK dropper failed. Privilege escalation via BinFmt Elevate driver, then file exfiltration. |
python pidsmaker/main.py SYSTEM CLEARSCOPE_E5
DARPA OpTC¶
The DARPA Operationally Transparent Cyber (OpTC) dataset captures a Windows enterprise environment with realistic APT scenarios.
optc_h201¶
| Attack id | Duration | Description |
|---|---|---|
| 0 | 1 h 58 min | PowerShell Empire stager executed with elevated access. Mimikatz used for credential theft, registry persistence set, recon performed, then pivoted to other hosts via WMI. |
python pidsmaker/main.py SYSTEM optc_h201
optc_h501¶
| Attack id | Duration | Description |
|---|---|---|
| 0 | 5 h 01 min | Phishing email launched PowerShell Empire stager. Escalated via DeathStar, WMI persistence established, RDP tunneling and file exfiltration performed, then pivoted to other hosts. |
python pidsmaker/main.py SYSTEM optc_h501
optc_h051¶
| Attack id | Duration | Description |
|---|---|---|
| 0 | 3 h 56 min | Malicious Notepad++ update installed Meterpreter. Escalated to SYSTEM, migrated into LSASS for Mimikatz credential theft, established persistence, timestomped files, added admin account for RDP. |
python pidsmaker/main.py SYSTEM optc_h051
Note
Descriptions for the CADETS_E5, FIVEDIRECTIONS, and TRACE datasets have not been added yet.
Data structure¶
Graph partitioning¶
Each dataset is partitioned into daily graphs, which are split into three sets:
- Train graphs: Normal activity for model training
- Validation graphs: Normal activity for threshold calibration
- Test graphs: Both normal activity and attacks, used for evaluation
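As a rough illustration of such a split, the sketch below assigns an ordered list of daily graph names to the three sets, using the `train_files`, `val_files`, and `test_files` keys that appear in the dataset configuration. The function name, the chronological ordering, and the split points are assumptions made for illustration. In PIDSMaker the assignment is listed explicitly per dataset in the configuration.

```python
def split_daily_graphs(days: list[str], n_train: int, n_val: int) -> dict[str, list[str]]:
    """Split chronologically ordered daily graphs into train, validation, and test lists."""
    if n_train < 1 or n_val < 1 or n_train + n_val >= len(days):
        raise ValueError("need at least one graph in each of train, val, and test")
    return {
        "train_files": days[:n_train],
        "val_files": days[n_train:n_train + n_val],
        # Everything after the validation days is kept for testing.
        "test_files": days[n_train + n_val:],
    }


days = [f"graph_{i}" for i in range(1, 7)]
split = split_daily_graphs(days, n_train=3, n_val=1)
print(split)
```

Placing the test graphs last reflects the assumption that the attack days should not leak into training or threshold calibration.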
Adding custom datasets¶
To add a new dataset, define its configuration in pidsmaker/config/config.py:
DATASET_DEFAULT_CONFIG = {
    "MY_DATASET": {
        "database": "my_database_name",
        "num_node_types": 3,
        "num_edge_types": 10,
        "train_files": ["graph_1", "graph_2", "graph_3"],
        "val_files": ["graph_4"],
        "test_files": ["graph_5", "graph_6"],
        "ground_truth_relative_path": ["MY_DATASET/labels.csv"],
        "attack_to_time_window": [
            ["MY_DATASET/labels.csv", "2024-01-05 10:00:00", "2024-01-05 12:00:00"],
        ],
    },
}
Then follow the database creation guide to load your data.
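Before loading data, it can help to sanity-check a new configuration entry. The sketch below is one possible check and not something PIDSMaker provides: `check_dataset_config` is a hypothetical helper, and the rules it applies (no graph shared between splits, every attack window referencing a listed label file, start time before end time) are assumptions about how the fields relate.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
SPLIT_KEYS = ("train_files", "val_files", "test_files")


def check_dataset_config(name: str, cfg: dict) -> list[str]:
    """Return a list of human-readable problems found in one dataset entry."""
    problems = []

    # Assumed rule: every split is non-empty and no graph appears in two splits.
    splits = {key: set(cfg.get(key, [])) for key in SPLIT_KEYS}
    for key, files in splits.items():
        if not files:
            problems.append(f"{name}: {key} is empty")
    for i, a in enumerate(SPLIT_KEYS):
        for b in SPLIT_KEYS[i + 1:]:
            shared = splits[a] & splits[b]
            if shared:
                problems.append(f"{name}: {sorted(shared)} appear in both {a} and {b}")

    # Assumed rule: each attack window names a listed label file and starts before it ends.
    label_files = set(cfg.get("ground_truth_relative_path", []))
    for path, start, end in cfg.get("attack_to_time_window", []):
        if path not in label_files:
            problems.append(f"{name}: {path} missing from ground_truth_relative_path")
        if datetime.strptime(start, TIME_FORMAT) >= datetime.strptime(end, TIME_FORMAT):
            problems.append(f"{name}: attack window for {path} does not start before it ends")

    return problems


example = {
    "database": "my_database_name",
    "num_node_types": 3,
    "num_edge_types": 10,
    "train_files": ["graph_1", "graph_2", "graph_3"],
    "val_files": ["graph_4"],
    "test_files": ["graph_5", "graph_6"],
    "ground_truth_relative_path": ["MY_DATASET/labels.csv"],
    "attack_to_time_window": [
        ["MY_DATASET/labels.csv", "2024-01-05 10:00:00", "2024-01-05 12:00:00"],
    ],
}

print(check_dataset_config("MY_DATASET", example))  # prints []
```

An empty list means none of the assumed rules were violated. It does not guarantee that the database itself has been loaded correctly.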