DOCUMENTATION:
This code takes network traces as input and outputs a list of potential
identifier strings for each user in the network.
AUTHORS:
Sakshi Jain: [email protected]
Mobin Javed: [email protected]
Vern Paxson: [email protected]
---------------
Pre-requisites
---------------
- Bro, used to extract the contents of TCP connections from the raw network
  traces. You can download and install it from the Bro website [1].
- Python version > 2.7 (Some modules used in the code are only present in
versions > 2.7)
- Maximum number of open files allowed by the system > 300.
  (On Mac OSX, you can set this using the command $ ulimit -n 300)
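The open-file requirement can also be checked from Python before running the pipeline. The following is a minimal sketch using the standard `resource` module (Unix-like systems only). The helper name `has_enough_file_descriptors` and the constant `MIN_OPEN_FILES` are illustrative and are not part of this code base.

```python
import resource

# Threshold taken from the prerequisite above; the constant name is hypothetical.
MIN_OPEN_FILES = 300


def has_enough_file_descriptors(minimum=MIN_OPEN_FILES):
    """Return True if the soft limit on open files exceeds `minimum`."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means the limit is unbounded.
    if soft == resource.RLIM_INFINITY:
        return True
    return soft > minimum


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("soft limit: %s, hard limit: %s" % (soft, hard))
    print("sufficient: %s" % has_enough_file_descriptors())
```

If the check fails, raise the limit in the shell that launches the scripts, since the limit is inherited by child processes.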
--------------
Pre-processing
--------------
Configure the following variables in src/config.py:
    INPUT_NETWORK_TRACES   Path to network traces [Should contain one trace
                           file per day]
    TRACE_SUFFIX           Suffix for trace files [For example, ".pcap",
                           ".trace"]
    TRACE_TO_DAY_MAPPING   Path to a tab-separated file containing mappings
                           of trace names to day numbers [Number days as
                           day_1, day_2,...]
    LOCAL_SUBNET           Subnet of the local network on which network
                           traffic was collected
    BRO_PATH               Path to Bro installation
    ALL_CONTENT            Output path to store TCP payloads extracted from
                           traces
    MAIN_OUTPUT_PATH       Output path to store all intermediary crunches as
                           well as the final set of identifiers with context
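For illustration, a filled-in src/config.py might look like the sketch below. Every value is a placeholder assumption: the paths, the subnet, its CIDR notation, and the suffix are invented for the example, and the real file may define additional variables.

```python
# Hypothetical example values for src/config.py; adjust to your environment.

# Directory holding one trace file per day.
INPUT_NETWORK_TRACES = "/data/traces"
TRACE_SUFFIX = ".pcap"

# Tab-separated file mapping trace names to day labels (day_1, day_2, ...).
TRACE_TO_DAY_MAPPING = "/data/traces/trace_to_day.tsv"

# CIDR notation is an assumption; use whatever form the code expects.
LOCAL_SUBNET = "10.0.0.0/24"

BRO_PATH = "/usr/local/bro"

# Output locations.
ALL_CONTENT = "/data/output/all_content"
MAIN_OUTPUT_PATH = "/data/output/main"
```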
Run prepdata.py as:
$ cd preprocessing
$ python prepdata.py
prepdata.py extracts TCP payloads from network traces using Bro, and
reorganizes the data into per-user directories, each having per-day
subdirectories.
- Each IP address in the LOCAL_SUBNET is considered a separate user. The
  directories for the users are named numerically and the mapping file is
  stored in PARENT_CONTENT_FOLDER. Note that if your network contains
  NATs, you should use NAT/DHCP logs to further split the data of individual
  users behind a NAT into their own directories.
- The contents for each day are stored in a sub-directory named after the
mapping provided in the TRACE_TO_DAY_MAPPING file.
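The two conventions above (one user per local IP address, one subdirectory per mapped day) can be sketched as follows. This is not the code in prepdata.py: the function names, the assumed column order of the mapping file (trace name, then day label), the CIDR form of LOCAL_SUBNET, and the numbering scheme are all illustrative assumptions. The sketch targets Python 3, where `ipaddress` is in the standard library.

```python
import csv
import io
import ipaddress
import os


def load_trace_to_day(fileobj):
    """Parse tab-separated lines of '<trace name> TAB <day label>' into a dict."""
    mapping = {}
    for row in csv.reader(fileobj, delimiter="\t"):
        if len(row) >= 2:
            mapping[row[0].strip()] = row[1].strip()
    return mapping


def is_local(ip, subnet):
    """Return True if `ip` belongs to `subnet` (CIDR notation assumed)."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(subnet)


def user_day_dir(root, user_ids, ip, day):
    """Give each new IP the next numeric user ID and build its per-day path."""
    user_id = user_ids.setdefault(ip, len(user_ids) + 1)
    return os.path.join(root, str(user_id), day)


if __name__ == "__main__":
    sample = io.StringIO("monday.pcap\tday_1\ntuesday.pcap\tday_2\n")
    days = load_trace_to_day(sample)
    user_ids = {}
    for ip in ("10.0.0.5", "192.168.1.9", "10.0.0.7"):
        # Only addresses inside the local subnet are treated as users.
        if is_local(ip, "10.0.0.0/24"):
            print(user_day_dir("content", user_ids, ip, days["monday.pcap"]))
```

The `user_ids` dictionary plays the role of the IP-to-directory mapping file mentioned above; persisting it to disk is left out of the sketch.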
---------
Filtering
---------
Run filtering code as:
$ cd ../filtering
$ python filtering.py --run all --compress --printable_only
$ python context_filtering.py
filtering.py takes all the content files as input and applies connection-based
and string-based filtering. The output is a set of identifiers with built
context.
context_filtering.py applies context-based filtering: it removes identifiers
in Cookie and Path, along with identifiers occurring on fewer than 3 days.
The final output is saved in '$MAIN_OUTPUT_PATH/identifiers_with_context'.
An analyst must then manually review the contents of this folder to find
interesting identifiers detected by our methodology.
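The context-based rule described above can be sketched as follows. The record layout (tuples of identifier, context label, and day) and all names used here are assumptions made for illustration; they do not reflect the actual data structures in context_filtering.py. The text does not say whether days are counted before or after Cookie and Path occurrences are dropped; this sketch drops them first.

```python
from collections import defaultdict

# Context labels dropped by the rule above; the exact label strings are assumed.
EXCLUDED_CONTEXTS = {"Cookie", "Path"}
MIN_DAYS = 3


def filter_identifiers(occurrences, min_days=MIN_DAYS):
    """Keep identifiers seen on at least `min_days` distinct days.

    `occurrences` is an iterable of (identifier, context, day) tuples.
    Occurrences in an excluded context are ignored entirely.
    """
    days_seen = defaultdict(set)
    for identifier, context, day in occurrences:
        if context in EXCLUDED_CONTEXTS:
            continue
        days_seen[identifier].add(day)
    return {ident for ident, days in days_seen.items() if len(days) >= min_days}


if __name__ == "__main__":
    sample = [
        ("alice01", "Body", "day_1"),
        ("alice01", "Body", "day_2"),
        ("alice01", "Header", "day_3"),
        ("sess9f", "Cookie", "day_1"),
        ("sess9f", "Cookie", "day_2"),
        ("sess9f", "Cookie", "day_3"),
        ("bob", "Body", "day_1"),
    ]
    print(sorted(filter_identifiers(sample)))
```

In the sample, only the identifier seen on three distinct days outside Cookie and Path contexts survives.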
--------------------
Analyzing the output
--------------------
The folder '$MAIN_OUTPUT_PATH/identifiers_with_context' contains one context
file per user. Each entry in a context file corresponds to a line (i.e.,
delineated by newlines) from the original content file, with the reconstructed
identifier string(s) highlighted. For reference, the path to the original
content file in which the identifier string was found is also printed in the
context file. To view the file with highlighting, use the command:
$ less -r <filename>
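`less -r` passes raw control characters through to the terminal, which suggests that the highlighting relies on terminal escape sequences. A minimal sketch of that idea, assuming ANSI color codes (the actual codes and function names used by this code base are not stated here):

```python
# ANSI escape sequences; the specific color is an arbitrary choice.
HIGHLIGHT_START = "\033[1;31m"  # bold red
HIGHLIGHT_END = "\033[0m"       # reset attributes


def highlight(line, identifier):
    """Wrap every occurrence of `identifier` in `line` with ANSI color codes."""
    if not identifier:
        return line
    return line.replace(identifier, HIGHLIGHT_START + identifier + HIGHLIGHT_END)


if __name__ == "__main__":
    print(highlight("GET /profile?user=alice01 HTTP/1.1", "alice01"))
```

Viewed without `-r`, `less` shows such sequences as literal `ESC[...` text instead of colors.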
[1] https://www.bro.org/downloads/release/bro-2.4.1.tar.gz