
DACA



Overview

DACA is a configurable, automated testbed focused on spinning up servers/services, attacking them, recording log/network data and extracting any desired artifacts. The collected artifacts can then be used to tune detection rules or for educational purposes.

This project was created as part of Master's thesis work at TalTech.

Requirements

This project requires pipenv to install its main dependencies:

# Install pipenv
pip3 install pipenv

# Install project's dependencies
git clone git@github.com:Korving-F/DACA.git
cd DACA
pipenv install

The vagrant-vbguest and vagrant-scp plugins are used to manage VirtualBox Guest Additions and to collect data from VMs; both need to be installed as well:

vagrant plugin install vagrant-vbguest
vagrant plugin install vagrant-scp

In addition, one needs a local installation of Vagrant as well as one of its providers to actually run the VMs: VirtualBox or VMware Fusion / VMware Workstation.

If you want to make use of the Apache Kafka or Elasticsearch outputs, you need to install these solutions yourself and make them reachable over the network. By default no authentication is configured and all communication happens over plain-text protocols; to change this, update the filebeat Ansible playbook.
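For reference, a minimal filebeat output section might look as follows. The hostnames and topic name below are placeholders, not values shipped with this project; the real values live in the filebeat Ansible playbook and must match your own deployment. Note that filebeat only allows one output to be enabled at a time.

# filebeat.yml (illustrative sketch)
output.kafka:
  hosts: ["kafka.example.local:9092"]  # placeholder broker address
  topic: "daca"                        # placeholder topic name

# Alternatively, ship logs directly to Elasticsearch:
# output.elasticsearch:
#   hosts: ["http://elasticsearch.example.local:9200"]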

Usage

pipenv shell                                       # Enter virtualenv
python3 daca.py --help                             # Show supported commands
python3 daca.py info --list                        # List available scenarios
python3 daca.py run -d data/ --id 2                # Run scenario 2 and store collected data in ./data/ directory
python3 daca.py run -d data/ --id 2 --interactive  # Run scenario 2 interactively instead of in automated mode
python3 daca.py --debug run -d data/ --id 2        # Run scenario 2 in debug mode

Writing Scenarios

Out-of-the-box scenarios are listed within the ./scenarios directory and can be used as a reference. A scenario file is discovered when its name matches that of the directory containing it.

Scenarios consist of components; the simplest type of Scenario has only a single component. Each component has two main sections:

  1. setup: this contains installation / provisioning steps.
  2. run: this contains runtime commands.

The Setup phase builds/snapshots the VMs or Docker containers, initializing the Scenario. The run section is evaluated on Scenario execution.
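Distilled from the full example below, a single-component scenario therefore has roughly the following shape (all values here are purely illustrative):

# skeleton.yaml (illustrative)
name: "Skeleton Scenario"
provisioner: vagrant
components:
  - name: example_component
    image: ubuntu/focal64
    setup:
      type: shell
      val: echo "provisioning commands, run once while building the VM"
    run:
      type: shell
      val: echo "runtime commands, evaluated on every scenario execution"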

In the background network traffic can be captured and logs can be streamed to a Kafka broker or Elasticsearch cluster. Raw logs or other artifacts can be gathered after-the-fact as well.

Scenario files are interpreted as Jinja2 templates, which allows values defined in a variables block to be substituted into the setup and run sections; in the example below, each nmap command listed under variables is substituted for the {{ variables.nmap }} placeholder.

Simple, single-file scenario

This scenario sets up a vulnerable webapp and runs some nmap scans against it. It collects a tcpdump, a raw log file and an asciinema terminal session recording as artifacts. The tool also writes all files needed to reproduce the scenario to a dedicated directory. See also here for the output of this particular example.

# simple_scenario.yaml
name: "Simple example Scenario"
description: |
  "This Scenario sets up a vulnerable web application and runs multiple NMAP scans against it."
provisioner: vagrant
use_default_templates: True

components:
  - name: main_server
    description: Main Ubuntu machine used in this example scenario
    image: ubuntu/focal64
    setup:
      type: shell
      val: >
        echo "[+] Installing dependencies";
        sudo apt-get update;
        sudo apt install -y python2.7 unzip nmap asciinema;

        echo "[+] Installing Vulnerable Web App Gruyère";
        wget http://google-gruyere.appspot.com/gruyere-code.zip -O /tmp/gruyere-code.zip;
        unzip /tmp/gruyere-code.zip -d /opt/gruyere-code;

    # Notice the Jinja2 template variable
    run:
      type: shell
      val: >
        echo "[+] Run webserver";
        set -x;
        sudo python2.7 /opt/gruyere-code/gruyere.py > /tmp/gruyere.log 2>&1 & sleep 1;
        "{{ variables.nmap }}";

    artifacts_to_collect:
      - type: pcap
        val:  ["tcpdump -i any -n -t -w /tmp/web.pcap port 8008"]
      - type: files
        val: ["/tmp/gruyere.log", "/tmp/*.cast", "/tmp/*.pcap"]
      - type: cli_recording
        val: ["/tmp/nmap.cast"]

# These entries are substituted for the Jinja2 template variable in the run section.
variables:
  - nmap:
    - nmap -sV -p 8008 --script=http-enum 127.0.0.1
    - nmap -p8008 --script http-waf-detect 127.0.0.1
    - nmap -p8008 --script http-wordpress-users 127.0.0.1

Multi-component scenario

scenarios/
├── web_attack_scenario           # Multi-component scenario.
│   ├── web_attack_scenario.yaml  # Main scenario file (same name as parent directory)
│   ├── component_webserver       # First webserver component with two instances (httpd and nginx).
│   │   ├── httpd.yaml            # httpd component file
│   │   ├── httpd_playbook        # Ansible playbook to provision httpd
│   │   ├── nginx.yaml            # nginx component file
│   │   └── nginx.bash            # Script to provision nginx
│   └── component_scanner         # Second component with two scanner subcomponents.
│       ├── nmap.yaml             # nmap scanner component file
│       └── wpscan.yaml           # wpscan scanner component file
└── simple_scenario               # Single-component scenario contained in a single file.
    └── simple_scenario.yaml      # See the working example above.
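The per-component files follow the same schema as the components entries of the single-file example above. As a purely hypothetical sketch (the actual contents of nginx.yaml are not shown here, and the exact per-file layout may differ), such a component file might look like:

# nginx.yaml (hypothetical sketch, modeled on the single-file example above)
name: nginx_server
description: Webserver subcomponent provisioned through a shell script
image: ubuntu/focal64
setup:
  type: shell
  val: >
    sudo apt-get update;
    sudo apt-get install -y nginx;
run:
  type: shell
  val: >
    sudo systemctl restart nginx;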

Architecture

Data Sets

  1. DNS Tunnelling Dataset - Investigates multiple popular DNS servers and publicly available DNS Tunnel utilities.
  2. DNS Tunnelling over DoH Dataset - Expands on the first data set by investigating utilities that can tunnel using DNS-over-HTTPS.

Future Development

  1. Many scenarios lend themselves to being run on Docker (faster than the current VM-based approach), while new scenarios could also be written for the cloud through Terraform (AWS, Google, Azure), which would allow the generation/collection of cloud-native datasets.
    Docker (#24) Terraform (#11)

  2. Currently the local Ansible provisioner is used to initialize VMs, which installs and runs Ansible from within each VM; ideally an Ansible installation on the host would be used instead. Ansible (#31)

  3. Currently all components are assumed to be running on Linux; this should be expanded with Windows support (#32)

Run Tests

# Install pytest and other packages helpful for debugging
pipenv install --dev

# Run any tests
python -m pytest

License

DACA is licensed under the MIT license.
Copyright © 2022, Frank Korving