Stream deinterleaving is an important problem with various applications in the cybersecurity domain. In this paper, we consider the specific problem of deinterleaving DNS data streams using machine-learning techniques, with the objective of automating the extraction of malware domain sequences. We first develop a generative model for user request generation and DNS stream interleaving. Based on this model, we evaluate several inference strategies for deinterleaving, including augmented HMMs and LSTMs, on synthetic datasets. Our results demonstrate that state-of-the-art LSTMs outperform more traditional augmented HMMs in this application domain.
|Original language||English (US)|
|Title of host publication||Proceedings - 2018 IEEE Symposium on Security and Privacy Workshops, SPW 2018|
|Publisher||Institute of Electrical and Electronics Engineers Inc.|
|Number of pages||6|
|State||Published - Aug 2 2018|
|Event||2018 IEEE Symposium on Security and Privacy Workshops, SPW 2018 - San Francisco, United States|
Duration: May 24 2018 → …
Bibliographical note
Funding Information:
The work was supported in part by NSF grants CNS-1314560, IIS-1447566, IIS-1447574, IIS-1422557, CCF-1451986, and IIS-1563950. SG and VY acknowledge partial support from NSF grants CNS-1314956 and CNS-1514503.
- Malicious domain detection