You Are How You Query: Deriving Behavioral Fingerprints from DNS Traffic

Dae Wook Kim and Junjie Zhang

Wright State University, Dayton, USA

{kim.107,junjie.zhang}@wright.edu

Abstract. As the Domain Name System (DNS) plays an indispensable

role in a large number of network applications, including those used for

malicious purposes, collecting and sharing DNS traffic from real networks

is highly desirable for purposes such as measurement and system

evaluation. However, information leakage through collected network

traffic raises significant privacy concerns, and DNS traffic is no

exception. In this paper, we study a new privacy risk introduced by

passively collected DNS traffic. We aim to derive behavioral fingerprints

from DNS traces, where each behavioral fingerprint is intended to uniquely

identify its corresponding user and to remain stable over

time. We propose a set of new patterns that collectively form

behavioral fingerprints by characterizing a user’s DNS activities from

three perspectives: the domain name, the inter-domain

relationship, and domains’ temporal behavior. We have also built a

distributed system, DNSMiner, to automatically derive DNS-based

behavioral fingerprints from a massive amount of DNS traces. We

performed an extensive evaluation on a large volume of DNS queries

collected from a large campus network over two weeks. The evaluation

results demonstrate that a significant percentage of network

users with persistent DNS activities are likely to have DNS behavioral

fingerprints.

Keywords: Domain Name System · Behavioral fingerprints · Privacy

1 Introduction

The Domain Name System (DNS) plays an indispensable role in the Internet

by providing fundamental two-way mapping between domains and Internet Pro-

tocol (IP) addresses. Its practical usage has gone far beyond the domain-IP

mapping service: it supports many critical network services such as traffic

balancing [1] and content delivery [2]; it is also leveraged by attackers to build

agile and robust malicious cyber infrastructures, where salient examples include

fast-flux [3], random domain generators [4], and covert channels [5]. The

importance and prevalence of DNS create demand for traces collected from

real networks, which are essential for many DNS-relevant designs by serving as

benchmark data or ground truth. For instance, DNS traces have been collected

Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2015

B. Thuraisingham et al. (Eds.): SecureComm 2015, LNICST 164, pp. 348–366, 2015.

DOI: 10.1007/978-3-319-28865-9_19

to evaluate DNS cache algorithms [6] and to train statistical models for malicious

domain detection [7,8]. Although the speciﬁc type and granularity of informa-

tion extracted from DNS traces may vary for diﬀerent applications, the demand

for DNS traces is generally increasing.

Despite their practical value, DNS traces may introduce significant privacy

concerns. For example, DNS queries that are triggered by the prefetching

mechanisms of popular browsers can leak users’ search engine queries [9]; DNS queries

can also reveal the types of operating systems [10]. In this project, we study a

new privacy risk introduced by passively collected DNS traffic: to what extent

can network users be uniquely identified merely based on the way they issue DNS

queries? In other words, we aim to derive behavioral fingerprints from DNS

traces, where each behavioral fingerprint is intended to uniquely identify its

corresponding user and to remain stable over time. Such DNS-based

behavioral fingerprints, once successfully derived, have strong privacy implications. For

example, they can be used to de-anonymize the DNS traces with anonymized

sources. More specifically, when DNS traces are shared, the source (e.g., the

IP address) that issues each DNS query is usually anonymized (e.g., by obscuring

the IP address using hash functions). However, one can learn behavioral

fingerprints from un-anonymized DNS traces and use the acquired fingerprints

to reveal the presence of specific users in (other) anonymized traces. In addition,

if one can get access to DNS traces collected from multiple access networks

(e.g., through open DNS services or by collecting traces from multiple networks),

one can track users’ locations across different networks by using behavioral

fingerprints to identify those users in the DNS traces.
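
The de-anonymization scenario described above can be illustrated with a minimal sketch. It is not the method evaluated in this paper: it uses only the set of domains each source queries, compares sets with Jaccard similarity as a stand-in measure, and assumes a salted SHA-256 hash as the anonymization step. The function names, the salt, and the toy traces are all hypothetical.

```python
import hashlib
from collections import defaultdict


def anonymize_source(ip, salt="hypothetical-salt"):
    """Obscure a source IP with a salted hash (assumed anonymization scheme)."""
    return hashlib.sha256((salt + ip).encode("utf-8")).hexdigest()[:12]


def domain_sets(trace):
    """Group queried domains by source; `trace` yields (source, domain) pairs."""
    grouped = defaultdict(set)
    for source, domain in trace:
        grouped[source].add(domain)
    return dict(grouped)


def jaccard(a, b):
    """Jaccard similarity of two sets (a stand-in similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(fingerprint, candidates):
    """Return (source, score) for the candidate set most similar to `fingerprint`."""
    return max(
        ((source, jaccard(fingerprint, domains)) for source, domains in candidates.items()),
        key=lambda pair: pair[1],
    )


# Toy un-anonymized trace used to learn fingerprints (hypothetical data).
known_trace = [
    ("10.0.0.5", "mail.example.org"),
    ("10.0.0.5", "lab.example.edu"),
    ("10.0.0.5", "news.example.com"),
    ("10.0.0.9", "news.example.com"),
    ("10.0.0.9", "video.example.net"),
]

# Toy trace shared later with hashed sources (hypothetical data).
shared_trace = [
    (anonymize_source("10.0.0.5"), "mail.example.org"),
    (anonymize_source("10.0.0.5"), "lab.example.edu"),
    (anonymize_source("10.0.0.5"), "shop.example.com"),
    (anonymize_source("10.0.0.9"), "news.example.com"),
    (anonymize_source("10.0.0.9"), "video.example.net"),
]

fingerprints = domain_sets(known_trace)
anonymized = domain_sets(shared_trace)
matched_source, score = best_match(fingerprints["10.0.0.5"], anonymized)
print(matched_source == anonymize_source("10.0.0.5"), score)
```

In this toy case the hashed identifier is matched even though one of the three domains differs between the two traces. Hashing hides the address but leaves the query behavior intact, which is the property the scenario relies on.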

This paper investigates the extent to which behavioral fingerprints

can be derived and measures their accuracy in identifying the presence of the

corresponding network users. To this end, we propose a set

of new patterns that collectively form behavioral fingerprints. We also built a

distributed, scalable system, DNSMiner, to automatically derive DNS-based

behavioral fingerprints from a massive amount of DNS traces. Specifically,

we make the following contributions in this paper.

– We have designed five new patterns (domain set, domain sequence,

window-aware domain sequence, period behavior, and hourly behavior), which

collectively form behavioral fingerprints. These patterns systematically

characterize DNS behaviors from three aspects: the domain name, the

inter-domain relationship, and the temporal behavior. Although more

patterns might be discovered to enhance behavioral fingerprints, our proposed

patterns establish a lower bound on how well DNS behaviors can be used to

fingerprint network users.

– We have built a system, DNSMiner, to automatically mine behavioral

fingerprints from a massive amount of DNS traces. The design of the

system leverages the MapReduce distributed infrastructure to scale up the

system performance. Deployed on a 15-node Hadoop platform,

DNSMiner can process more than 467 million DNS queries in

approximately 4 hours.
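
As a rough illustration of why this workload suits MapReduce, the sketch below derives one per-user domain set with explicit map, shuffle, and reduce steps in plain Python. It is not DNSMiner's implementation, which is not shown here; the record layout (timestamp, user, domain) and all function names are assumptions. The last helper simply converts the reported figures into an average aggregate rate.

```python
from collections import defaultdict


def map_query(record):
    """Map step: turn one (timestamp, user, domain) record into a (user, domain) pair."""
    _timestamp, user, domain = record
    return user, domain


def shuffle(pairs):
    """Shuffle step: collect all values that share a key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_domains(user, domains):
    """Reduce step: collapse one user's queried domains into a set."""
    return user, frozenset(domains)


def derive_domain_sets(records):
    """Run map, shuffle, and reduce in-process over an iterable of records."""
    grouped = shuffle(map_query(record) for record in records)
    return dict(reduce_domains(user, domains) for user, domains in grouped.items())


def aggregate_rate(total_queries, hours):
    """Average queries processed per second across the whole cluster."""
    return total_queries / (hours * 3600)


# Hypothetical records: (unix timestamp, user identifier, queried domain).
records = [
    (1430000000, "user-a", "mail.example.org"),
    (1430000060, "user-b", "news.example.com"),
    (1430000120, "user-a", "mail.example.org"),
    (1430000180, "user-a", "lab.example.edu"),
]

domain_sets_by_user = derive_domain_sets(records)
rate = aggregate_rate(467_000_000, 4)
print(domain_sets_by_user)
print(round(rate))
```

Because each user's records can be reduced independently of every other user's, the work partitions naturally by key. With the figures reported above, the average aggregate rate is roughly 32,000 queries per second, or about 2,200 per second per node if the load were spread evenly over 15 nodes (an idealization).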