This repository contains the source code and tests developed under the DARPA Radio Frequency Machine Learning Systems (RFMLS) program contract N00164-18-R-WQ80. All the code released here is unclassified and the Government has unlimited rights to the code.
Please cite the following paper if you intend to use this code for your research.
J. Tong, R. Bruno, O. Emmanuel, S. Nasim, W. Zifeng, S. Kunal, G. Andrey, D. Jennifer, C. Kaushik, I. Stratis, "Deep Learning for RF Fingerprinting: A Massive Experimental Study", Internet of Things (IoT) Magazine, 2020.
Please install the Python dependencies listed in requirements.txt with:
pip install -r requirements.txt
GNURadio is used in preprocessing the dataset, and thus it must be installed. For installation instructions please follow the official GNURadio guide.
Preprocessing consists of filtering (re-centering the signals to baseband based on their metadata) and, optionally, equalization. Equalization applies only to WiFi signals. Preprocessing also generates the reference pickle files needed for training and testing a model; hence all data must be preprocessed through this script, whether it is WiFi, ADS-B, or a new type of dataset.
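To illustrate the re-centering step, the sketch below shifts a complex signal by a frequency offset so that its content lands at baseband. It is not the repository's GNURadio-based implementation: the helper name `shift_to_baseband` and its parameters are hypothetical, and in the real pipeline the offset would be derived from the signal metadata.

```python
import cmath
import math


def shift_to_baseband(samples, freq_offset_hz, sample_rate_hz):
    """Mix a complex signal down by freq_offset_hz (hypothetical helper).

    Multiplying sample n by exp(-j*2*pi*f*n/fs) moves a component
    located at +f Hz down to 0 Hz.
    """
    step = -2.0 * math.pi * freq_offset_hz / sample_rate_hz
    return [s * cmath.exp(1j * step * n) for n, s in enumerate(samples)]


# A pure tone at 1 kHz, sampled at 8 kHz, becomes a constant (0 Hz) signal.
fs = 8000.0
tone = [cmath.exp(2j * math.pi * 1000.0 * n / fs) for n in range(16)]
baseband = shift_to_baseband(tone, 1000.0, fs)
```

After the shift, every sample of `baseband` is (numerically) equal to 1, i.e., the tone now sits at 0 Hz.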
Note that if a task contains both WiFi and ADS-B data, both will be preprocessed simultaneously (i.e., in one execution of the code invoked with --preprocess True).
The following arguments can be specified during preprocessing:
--train_tsv path to the train tsv file
--test_tsv path to the test tsv file
--task set task name
--root_wifi path to the raw wifi signals
--root_adsb path to the raw adsb signals
--out_root_data output path for permanently storing preprocessed data
--out_root_list output path for permanently storing pickled data lists
--wifi_eq Specify whether wifi signals need to be equalized or not {True | False}. default: False
--newtype_process [New-Type Signal] process a dataset with New-Type Signal or not {True | False}.
default: False
--root_newtype [New-Type Signal] path to the raw New-Type signals.
default: empty
--newtype_filter [New-Type Signal] filter New-Type Signal or not {True | False}.
default: False
--signal_BW_useful [New-Type Signal] This argument specifies the bandwidth (measured in Hz) used to transmit the actual
information. Most wireless standards reserve portions of the bandwidth to the so-called
guard bands which are used to reduce interference across multiple bands. Guard bands are
located on the sides of the overall bandwidth, and the variable signal_BW_useful is used to
properly extract useful information from the recorded signal, thus removing any interference
generated by transmissions occurring at other bands. As an example, for WiFi signals at
2.4GHz, signal_BW_useful=16.5MHz
default: {16.5e6 | 17e6} depending on signal frequency
--num_guard_samp [New-Type Signal] This argument represents the number of guard samples before and after each packet
transmission. Each packet transmission is preceded and followed by num_guard_samp samples that
do not contain any data transmission.
default: 2e-6
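The role of signal_BW_useful can be pictured in the frequency domain: for a baseband recording, only the FFT bins within half the useful bandwidth on either side of 0 Hz carry the transmission, and the bins near the band edges fall in the guard bands. The sketch below computes which bins would be kept. It is only an illustration, not the repository's filter; the helper `useful_bins` is hypothetical, and the 20 MHz sample rate in the example is an assumption.

```python
def useful_bins(n_fft, sample_rate_hz, signal_bw_useful_hz):
    """Return indices of FFT bins whose frequency lies in the useful band.

    Bin k of an n_fft-point FFT corresponds to k*fs/n_fft for the first
    half and (k - n_fft)*fs/n_fft for the second (negative) half.
    """
    half_bw = signal_bw_useful_hz / 2.0
    kept = []
    for k in range(n_fft):
        signed_k = k if k < n_fft // 2 else k - n_fft
        freq_hz = signed_k * sample_rate_hz / n_fft
        if abs(freq_hz) <= half_bw:
            kept.append(k)
    return kept


# Assumed 20 MHz sample rate with the 16.5 MHz useful bandwidth quoted above:
# only the bin at -10 MHz (index 4) falls in the guard band.
bins = useful_bins(8, 20e6, 16.5e6)
```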
Preprocessed data are stored under /$out_root_data/$task/{wifi, ADS-B, newtype}/. There is no need to specify the datatype, which is detected automatically from the .tsv files.
Other generated files are stored under /$out_root_list/$task/{wifi, ADS-B, newtype, mixed}/. This folder will contain five files needed by both training and testing:
- a file_list.pkl file containing the paths of all preprocessed examples needed by training and testing,
- a label.pkl file containing a dictionary {path of preprocessed example: device name (e.g., 'wifi_100_crane-gfi_1_dataset-8965')},
- a device_ids.pkl file containing a dictionary {device name (e.g., 'wifi_100_crane-gfi_1_dataset-8965'): device id (an integer)},
- a partition.pkl file containing the training and testing separation,
- a stats.pkl file containing computed data statistics needed by training.
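For inspecting these files outside of training, a minimal sketch such as the following could load all five into a dictionary. The helper `load_list_files` is hypothetical, and the internal layout of each object is assumed only from the descriptions above.

```python
import os
import pickle


def load_list_files(list_dir):
    """Load the five pickle files from one dataset folder (hypothetical helper)."""
    names = ["file_list", "label", "device_ids", "partition", "stats"]
    loaded = {}
    for name in names:
        with open(os.path.join(list_dir, name + ".pkl"), "rb") as handle:
            loaded[name] = pickle.load(handle)
    return loaded


# Example (path follows the /$out_root_list/$task/$datatype/ convention):
# files = load_list_files("./data/test_list/MyTask/mixed/")
# print(len(files["file_list"]), "examples")
```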
Specifically, the folder /$out_root_list/$task/mixed/ contains generated files for all datatypes provided in the .tsv files. For example, if there are two datatypes in the .tsv files, e.g., wifi and ADS-B, it will generate files for both filtered wifi and raw ADS-B signals; if there are three datatypes in the .tsv files, wifi, ADS-B and newtype, it will generate files for filtered wifi, raw ADS-B, and {raw | filtered, depending on $newtype_filter} newtype signals together. Because equalized and raw signals are in different domains and cannot be trained/tested together, the folder ~/mixed/ will contain generated files only for filtered wifi, regardless of whether --wifi_eq is set to True.
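The selection rule for the mixed folder can be summarized in a few lines. The function below is only a restatement of the rule described above, not code from the repository; the name `mixed_variants` is hypothetical.

```python
def mixed_variants(datatypes, newtype_filter=False):
    """Return which signal variant of each datatype ends up in mixed/."""
    variants = {}
    for datatype in datatypes:
        if datatype == "wifi":
            # Always filtered, never equalized, whatever --wifi_eq says.
            variants[datatype] = "filtered"
        elif datatype == "ADS-B":
            variants[datatype] = "raw"
        elif datatype == "newtype":
            variants[datatype] = "filtered" if newtype_filter else "raw"
        else:
            raise ValueError("unknown datatype: %s" % datatype)
    return variants


two_types = mixed_variants(["wifi", "ADS-B"])
three_types = mixed_variants(["wifi", "ADS-B", "newtype"], newtype_filter=True)
```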
An example of running the preprocessing would be:
$ python ./preprocessing/main.py \
--task MyTask \
--train_tsv /scratch/MyTask.train.tsv \
--test_tsv /scratch/MyTask.test.tsv \
--root_wifi /mnt/disk1/wifi/ \
--root_adsb /mnt/disk2/adsb/ \
--out_root_data ./data/test \
--out_root_list ./data/test_list
This prints the following output:
Extracting data...
*************** Extraction .meta/.data according to .tsv ***************
Initialize 40 workers.
Processing tsv: /scratch/MyTask.train.tsv
Processing tsv: /scratch/MyTask.test.tsv
*************** Filtering WiFi signals ***************
There are 50 devices to filter
Processing folder: [...]
*************** Create partitions, labels and device ids for training. Compute stats also.***************
generating files for WiFi, ADS-B and mixed dataset
***************creating labels for dataset:wifi raw_samples***************
Created auxiliary files:
('Number of devices', 50)
('Number of examples', 13650)
Save files to: ./data/test_list/MyTask/wifi/raw_samples
creating partition files for dataset:wifi raw_samples
13650/13650 [00:00<00:00, 149579.36it/s]
compute stats for dataset:wifi raw_samples
10900/10900 [00:25<00:00, 422.02it/s]
***************creating labels for dataset:ADS-B***************
Created auxiliary files:
('Number of devices', 50)
('Number of examples', 13650)
Save files to:./data/test_list/MyTask/ADS-B/
creating partition files for dataset:ADS-B
13650/13650 [00:00<00:00, 155440.33it/s]
('Num Ex in Train', 10900)
('Num Ex in Test', 2750)
compute stats for dataset:ADS-B
10900/10900 [00:20<00:00, 527.07it/s]
***************creating labels for dataset:mixed***************
Created auxiliary files:
('Number of devices', 100)
('Number of examples', 27300)
Save files to:./data/test_list/MyTask/mixed/
creating partition files for dataset:mixed
27300/27300 [00:00<00:00, 158303.85it/s]
('Num Ex in Train', 21800)
('Num Ex in Test', 5500)
compute stats for dataset:mixed
21800/21800 [00:46<00:00, 470.44it/s]
Skipping model framework.
Note that the above command uses the folders /scratch, /mnt, and ./data, as they hold the input arguments and persist the output data.
To preprocess data using equalization, an additional --wifi_eq True argument needs to be passed.
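When preprocessing is scripted, the command line can be assembled programmatically. The sketch below builds the argument list for the example above and appends --wifi_eq True on request. The helper `build_preprocess_cmd` is hypothetical; it uses only the flags documented in this section.

```python
def build_preprocess_cmd(task, train_tsv, test_tsv, out_root_data,
                         out_root_list, root_wifi=None, root_adsb=None,
                         wifi_eq=False):
    """Assemble the preprocessing command as an argument list."""
    cmd = ["python", "./preprocessing/main.py",
           "--task", task,
           "--train_tsv", train_tsv,
           "--test_tsv", test_tsv,
           "--out_root_data", out_root_data,
           "--out_root_list", out_root_list]
    if root_wifi is not None:
        cmd += ["--root_wifi", root_wifi]
    if root_adsb is not None:
        cmd += ["--root_adsb", root_adsb]
    if wifi_eq:
        cmd += ["--wifi_eq", "True"]
    return cmd


cmd = build_preprocess_cmd(
    "MyTask", "/scratch/MyTask.train.tsv", "/scratch/MyTask.test.tsv",
    "./data/test", "./data/test_list",
    root_wifi="/mnt/disk1/wifi/", wifi_eq=True)
print(" ".join(cmd))
# To actually launch it: subprocess.check_call(cmd)
```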
An example of running the preprocessing for a new dataset with new signal type would be:
$ python ./preprocessing/main.py \
--task MyTask \
--train_tsv /scratch/MyTask.train.tsv \
--test_tsv /scratch/MyTask.test.tsv \
--root_wifi /mnt/disk1/wifi/ \
--root_adsb /mnt/disk2/adsb/ \
--root_newtype /mnt/disk2/adsb/NewType/ \
--newtype_process True \
--newtype_filter True \
--signal_BW_useful 16.5e6 \
--num_guard_samp 2e-6 \
--out_root_data ./data/test \
--out_root_list ./data/test_list
If the new dataset contains only newtype signals, there is no need to specify --root_wifi and --root_adsb. Otherwise, please specify these two arguments accordingly.
The code presumes that the content of the meta/data files follows the syntax of WiFi signals.
Note that the arguments --signal_BW_useful and --num_guard_samp need to be set for filtering a new signal type. There is no need to specify these two arguments for WiFi or ADS-B signals.
Running a task consists of a train_phase and a test_phase.
Both training and testing require that data is first pre-processed (so that, beyond filtering and/or equalizing, it is in a format understandable by our neural network), as described in the previous section.
Command line arguments for training and testing include:
--data_path path containing training, validation, testing, stats dictionaries.
directory used for --out_root_list.
--data_type the data type, either 'wifi', 'ADS-B', 'newtype', or 'mixed'. 'mixed' trains the model jointly on all datatypes present. default: wifi
--model_flag set the model architecture {baseline | resnet1d}. default: baseline
--exp_name create an exp_name folder to save logs, models etc
The --data_path folder corresponds to the --out_root_list folder where the output from the preprocessing stage was stored. Running tests is a silent execution (it only prints errors to stderr). Trained model weights as well as classification outcomes are stored under /results/$task/$data_type/$exp_name. This folder will contain four files:
- a json file containing the model structure,
- an hdf5 file containing the model weights,
- a config file containing the parameter configuration used during training and testing, and
- a log.out file that contains a detailed output log, including the final test accuracy. The last line contains the per-slice and per-example/signal accuracy on the test set.
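Per-example accuracy requires combining the predictions of all slices that belong to one example. The --per_example_strategy option names three ways of doing so: majority, prob_sum, and log_prob_sum. The sketch below shows one plausible reading of those names; it is an assumption about their meaning, not the repository's implementation, and `per_example_prediction` is a hypothetical helper.

```python
import math
from collections import Counter


def per_example_prediction(slice_probs, strategy="prob_sum"):
    """Combine per-slice class probabilities into one predicted class index."""
    num_classes = len(slice_probs[0])
    if strategy == "majority":
        # Each slice votes for its most likely class.
        votes = Counter(max(range(num_classes), key=p.__getitem__)
                        for p in slice_probs)
        return votes.most_common(1)[0][0]
    if strategy == "prob_sum":
        scores = [sum(p[c] for p in slice_probs) for c in range(num_classes)]
    elif strategy == "log_prob_sum":
        eps = 1e-12  # avoids log(0)
        scores = [sum(math.log(p[c] + eps) for p in slice_probs)
                  for c in range(num_classes)]
    else:
        raise ValueError("unknown strategy: %s" % strategy)
    return max(range(num_classes), key=scores.__getitem__)


# Two slices weakly favor class 0, one slice strongly favors class 1.
slices = [[0.6, 0.4], [0.6, 0.4], [0.01, 0.99]]
by_majority = per_example_prediction(slices, "majority")
by_prob_sum = per_example_prediction(slices, "prob_sum")
by_log_prob_sum = per_example_prediction(slices, "log_prob_sum")
```

In this example the majority vote picks class 0, while both summing strategies pick class 1 because the single confident slice outweighs the two uncertain ones.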
For a full list of arguments for preprocessing or building a model, you may use --help.
Commands for preprocessing:
Processing usage, optional arguments:
-h, --help show this help message and exit
--task TASK Specify the task name (default: 1Cv2)
--multiburst Specify the task is a multiburst testing task or not.
If it is, we assume the corresponding general task has
alredy processed. For example, if the task is
specified as 1MC, we will look for corresponding
processed data, labels of 1Cv2. (default: False)
--new_device Specify the task is a novel device testing task.
(default: False)
--train_tsv TRAIN_TSV
Specify the path of .tsv for training (default: /scrat
ch/RFMLS/RFML_Test_Specs_Delivered_v3/test1/1Cv2.train
.tsv)
--test_tsv TEST_TSV Specify the path of .tsv for testing (default: /scratc
h/RFMLS/RFML_Test_Specs_Delivered_v3/test1/1Cv2.test.t
sv)
--root_wifi ROOT_WIFI
Specify the root path of WiFi signals (default:
/mnt/rfmls_data/disk1/wifi_sigmf_dataset_gfi_1/)
--root_adsb ROOT_ADSB
Specify the root path of ADS-B signals (default:
/mnt/rfmls_data/disk2/adsb_gfi_3_dataset/)
--out_root_data OUT_ROOT_DATA
Specify the root path of preprocessed data (default:
./data/v3)
--out_root_list OUT_ROOT_LIST
Specify the root path of data lists for training
(default: ./data/v3_list)
--wifi_eq Specify wifi signals need to be equalized or not.
(default: False)
--newtype_process [New Type Signal]Specify process new type signals or
not (default: False)
--root_newtype ROOT_NEWTYPE
[New Type Signal]Specify the root path of new type
signals (default: )
--newtype_filter [New Type Signal]Specify if new type signals need to
be filtered. (default: False)
--signal_BW_useful SIGNAL_BW_USEFUL
[New Type Signal]Specify Band width for new type
signal. (default: None)
--num_guard_samp NUM_GUARD_SAMP
[New Type Signal]Specify number of guard samples.
(default: 2e-06)
--time_analysis Enable to report time preprocessing takes (default:
False)
Commands for building, training, and testing a model:
Model usage, optional arguments:
-h, --help show this help message and exit
--exp_name Experiment name. (default: experiment_1)
--pickle_files Path containing pickle files with data information.
(default: None)
--save_path Path to save experiment weights and logs. (default:
None)
--save_predictions Disable saving model predictions. (default: True)
--task Set experiment task. (default: 1Cv2)
--equalize Enable to use equalized WiFi data. (default: False)
--data_type Set the data type {wifi | adsb}. (default: wifi)
--file_type Set data file format {mat | pickle}. (default: mat)
--decimated Enable if the data in the files is decimated.
(default: False)
--val_from_train If validation not present in partition file, generate
one from the training set. (If called, use test set as
validation). (default: True)
-m , --model_flag Define model architecture {baseline | vgg16 | resnet50
| resnet1d}. (default: baseline)
-ss , --slice_size Set slice size. (default: 198)
-d , --devices Set number of total devices. (default: 50)
--cnn_stack [Baseline Model] Set number of cnn layers. (default:
5)
--fc_stack [Baseline Model] Set number of fc layers. (default: 2)
--channels [Baseline Model] Set number of channels of cnn.
(default: 128)
--fc1 [Baseline Model] Set number of neurons in the first fc
layer. (default: 256)
--fc2 [Baseline Model] Set number of neurons in the
penultimate fc layer. (default: 128)
--dropout_flag Enable to use dropout layers. (default: False)
--batchnorm Enable to use batch normalization. (default: False)
--restore_model_from
Path from where to load model structure. (default:
None)
--restore_weight_from
Path from where to load model weights. (default: None)
--restore_params_from
Path from where to load model parameters. (default:
None)
--load_by_name Enable to only load weights by name. (default: False)
--padding Disable adding zero-padding if examples are smaller
than slice size. (default: True)
--try_concat Enable if examples are smaller than slice size and
using demodulated data, try and concat them. (default:
False)
--preprocessor Set preprocessor type to use. (default: no)
--K Set batch down sampling factor K. (default: 16)
--files_per_IO Set files loaded to memory per IO. (default: 500000)
--normalize Specify if you do not want to normalize the data using
mean and std in stats files (if stats does not have
this info, it is ignored). (default: True)
--crop Set to keep first "crop" samples. (default: 0)
--training_strategy Set training strategy to use {big | fft | tensor}.
(default: big)
--sampling Set sampling strategy to use. (default: model)
--epochs Set epochs to train. (default: 25)
-bs , --batch_size Set batch size. (default: 512)
--lr Set optimizer learning rate. (default: 0.0001)
--decay Set optimizer weight decay. (default: 0.0)
-mg, --multigpu Enable multiple distributed GPUs. (default: False)
-ng , --num_gpu Set number of distributed GPUs if --multigpu enabled.
(default: 8)
--id_gpu Set GPU ID to use. (default: 0)
--shrink Set down sampling factor. (default: 1)
--early_stopping Disable early stopping. (default: True)
--patience Set number of epochs for early stopping patience.
(default: 1)
--train Enable to train model. (default: False)
-t, --test Enable to test model. (default: False)
--test_stride Set stride to use for testing. (default: 16)
--per_example_strategy
Set the strategy used to compute the per example
accuracy {majority | prob_sum | log_prob_sum, all}.
(default: prob_sum)
--get_device_acc Report and save number of top class candidates for
each example. (default: 5)
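The --slice_size, --test_stride, and --padding options suggest that each example is cut into fixed-length slices before being fed to the model. The following sketch shows one way such slicing could work, using the default values from the help text; it is an assumption about the behavior, not the repository's code, and `slice_example` is a hypothetical helper.

```python
def slice_example(samples, slice_size=198, stride=16, pad=True):
    """Cut one example into overlapping slices of slice_size samples."""
    samples = list(samples)
    if len(samples) < slice_size:
        if not pad:
            return []  # too short and padding disabled
        # Zero-pad short examples up to one full slice.
        samples = samples + [0] * (slice_size - len(samples))
    last_start = len(samples) - slice_size
    return [samples[start:start + slice_size]
            for start in range(0, last_start + 1, stride)]


# A 230-sample example yields slices starting at offsets 0, 16, and 32.
slices_long = slice_example(range(230))
# A 3-sample example is zero-padded into a single slice.
slices_short = slice_example([1, 2, 3])
```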