Place the dataset in `datasets/cstnet-tls1.3/` and specify the data path in `data_process/`. The default path of the fine-tuned model is `models/finetuned_model.bin`. Then you can do inference with the fine-tuned model:
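A minimal sketch of the inference invocation, assuming a UER-py-style `inference/run_classifier_infer.py` script (the script path, tsv paths, and `--labels_num` value are assumptions to adapt to your setup):

```
# labels_num must match the number of classes in your dataset
python3 inference/run_classifier_infer.py --load_model_path models/finetuned_model.bin \
                                          --vocab_path models/encryptd_vocab.txt \
                                          --test_path datasets/cstnet-tls1.3/nolabel_test_dataset.tsv \
                                          --prediction_path datasets/cstnet-tls1.3/prediction.tsv \
                                          --labels_num 120 --seq_length 128 \
                                          --embedding word_pos_seg --encoder transformer --mask fully_visible
```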
Run `vocab_process/main.py` to generate the encrypted traffic corpus, or directly use the generated corpus in `corpora/`. Note that you'll need to change the file paths and some configuration options at the top of the file.
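For example, a minimal invocation (assuming the script reads its input and output paths from the configuration at the top of the file, per the note above):

```
# Edit the file paths and options at the top of vocab_process/main.py first, then:
python3 vocab_process/main.py
```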
Run `main/preprocess.py` to pre-process the encrypted traffic burst corpus:

```
python3 preprocess.py --corpus_path corpora/encrypted_traffic_burst.txt \
                      --vocab_path models/encryptd_vocab.txt \
                      --dataset_path dataset.pt --processes_num 8 --target bert
```
Run `data_process/main.py` to generate the data for downstream tasks if you have a dataset in pcap format that needs to be processed. This process includes two steps: first, split the pcap files by setting `splitcap=True` in `datasets/main.py:54` and save them as `npy` datasets; second, generate the fine-tuning data. If you use the shared datasets, you need to create a folder named `dataset` under the `dataset_save_path` and copy the datasets there.
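A sketch of the two-pass flow described above; re-running the script with `splitcap` toggled back off for the second pass is an assumption:

```
# Pass 1: split the pcap files and save them as npy datasets
# (set splitcap=True in datasets/main.py:54 first)
python3 data_process/main.py

# Pass 2: generate the fine-tuning data
# (setting splitcap back to False here is an assumption)
python3 data_process/main.py
```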
Run `pretrain.py` to pre-train.
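A hedged example invocation, assuming UER-py-style flags (the output path, GPU layout, and step counts below are illustrative placeholders, not prescribed settings):

```
python3 pretrain.py --dataset_path dataset.pt --vocab_path models/encryptd_vocab.txt \
                    --output_model_path models/pre-trained_model.bin \
                    --world_size 1 --gpu_ranks 0 \
                    --total_steps 500000 --save_checkpoint_steps 10000 --batch_size 32 \
                    --embedding word_pos_seg --encoder transformer --target bert
```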
To fine-tune the pre-trained model on downstream tasks, run the `run_classifier.py` script in the `fine-tuning` folder.
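For reference, a sketch of a fine-tuning run (again assuming UER-py-style flags; the tsv paths and hyperparameters are illustrative assumptions):

```
python3 fine-tuning/run_classifier.py --pretrained_model_path models/pre-trained_model.bin \
                                      --vocab_path models/encryptd_vocab.txt \
                                      --train_path datasets/cstnet-tls1.3/train_dataset.tsv \
                                      --dev_path datasets/cstnet-tls1.3/valid_dataset.tsv \
                                      --test_path datasets/cstnet-tls1.3/test_dataset.tsv \
                                      --epochs_num 10 --batch_size 32 --seq_length 128 \
                                      --learning_rate 2e-5 \
                                      --embedding word_pos_seg --encoder transformer --mask fully_visible
```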