



…datasets/cstnet-tls1.3/ and specify the data path in data_process/.

…models/finetuned_model.bin. Then you can do inference with the fine-tuned model:

Run vocab_process/main.py to generate the encrypted traffic corpus, or directly use the generated corpus in corpora/. Note that you'll need to change the file paths and some configuration settings at the top of the file.

Run main/preprocess.py to pre-process the encrypted traffic burst corpus.

python3 preprocess.py --corpus_path corpora/encrypted_traffic_burst.txt \
    --vocab_path models/encryptd_vocab.txt \
    --dataset_path dataset.pt --processes_num 8 --target bert
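
The pre-processing command above can also be assembled and sanity-checked from Python before it is launched. The following is a minimal sketch, not part of the repository: the helper names build_preprocess_command and missing_inputs are hypothetical, and only the flags and paths shown in the command above are used.

```python
import shlex
from pathlib import Path


def build_preprocess_command(corpus_path, vocab_path, dataset_path="dataset.pt",
                             processes_num=8, target="bert"):
    """Assemble the argument list for preprocess.py (hypothetical helper).

    Only the flags that appear in the command above are emitted.
    """
    return [
        "python3", "preprocess.py",
        "--corpus_path", str(corpus_path),
        "--vocab_path", str(vocab_path),
        "--dataset_path", str(dataset_path),
        "--processes_num", str(processes_num),
        "--target", target,
    ]


def missing_inputs(corpus_path, vocab_path):
    """Return the input files that do not exist, so path errors surface early."""
    return [str(p) for p in (Path(corpus_path), Path(vocab_path)) if not p.is_file()]


if __name__ == "__main__":
    corpus = "corpora/encrypted_traffic_burst.txt"
    vocab = "models/encryptd_vocab.txt"
    absent = missing_inputs(corpus, vocab)
    if absent:
        print("Missing input files:", ", ".join(absent))
    else:
        # Print the command instead of running it; pass the list to
        # subprocess.run(cmd, check=True) to actually launch it.
        print("Would run:", shlex.join(build_preprocess_command(corpus, vocab)))
```

Building the command as a list rather than a single string avoids shell-quoting problems if a corpus or vocabulary path contains spaces.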
Run data_process/main.py to generate the data for downstream tasks if there is a dataset in pcap format that needs to be processed. This process has two steps: first, split the pcap files by setting splitcap=True in datasets/main.py:54 and save them as npy datasets; second, generate the fine-tuning data. If you use the shared datasets, create a folder named dataset under dataset_save_path and copy the datasets there.

…pretrain.py to pre-train.

…run_classifier.py script in the fine-tuning folder.

Posted Feb 26, 2024
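
The shared-dataset step described above (create a folder named dataset under dataset_save_path and copy the datasets into it) could be scripted roughly as follows. This is a sketch under assumptions: the function name stage_shared_datasets and the shared_dir argument are hypothetical, and the layout of the shared datasets is not specified here, so the helper simply copies every file and subfolder it finds.

```python
import shutil
from pathlib import Path


def stage_shared_datasets(shared_dir, dataset_save_path):
    """Copy everything in shared_dir into <dataset_save_path>/dataset.

    Hypothetical helper: shared_dir is wherever the shared datasets
    were downloaded to; its internal layout is copied unchanged.
    """
    target = Path(dataset_save_path) / "dataset"
    target.mkdir(parents=True, exist_ok=True)
    for item in Path(shared_dir).iterdir():
        destination = target / item.name
        if item.is_dir():
            # dirs_exist_ok (Python 3.8+) lets the copy be re-run safely.
            shutil.copytree(item, destination, dirs_exist_ok=True)
        else:
            shutil.copy2(item, destination)
    return target
```

The function returns the created folder so the caller can point dataset_save_path-dependent settings at it.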
The repository of ET-BERT, a network traffic classification model on encrypted traffic. The work has been accepted as The Web Conference (WWW) 2022 accepted pa…