Passive OS Fingerprinting by DNS Traffic Analysis
Takashi MATSUNAKA, Akira YAMADA and Ayumu KUBOTA
KDDI R&D Laboratories Inc.
Saitama, Japan
{ta-matsunaka, kubota}@kddilabs.jp
KDDI CORPORATION
Tokyo, Japan
ai-yamada@kddi.com
Abstract—Network administrators want to determine which services and applications are most frequently used, which and how many devices and operating systems (OSs) are in use, and when and where network traffic peaks, in order to cope with massive traffic demand. However, it is hard to grasp this situation in large and complicated networks, because doing so requires many additional monitoring nodes or systems and the analysis of large volumes of traffic data. Moreover, when NAT or tethering is used, the number of IP addresses in use does not match the number of devices, because an IP address is shared by the devices behind a NAT box or tethering device.
In this paper, we propose a new passive OS fingerprinting method that requires analyzing only DNS traffic. The method exploits two characteristics of DNS queries: each OS sends queries for specific domains, and each OS sends these queries with a specific pattern of time intervals between them. Using these time interval patterns, the method estimates the number of devices running each OS from the number of queries. The method also accounts for the likelihood of irregular events, in which some queries are sent at intervals shorter than the regular interval and others at intervals longer than it. In an examination on our intra-network, some results of our estimation method are close to the results of DHCP fingerprinting.
Keywords-Passive OS fingerprinting; Traffic analysis
I. INTRODUCTION
In recent years, data traffic has increased explosively due to the growing number and use of smartphones. To cope with the massive traffic demand, network administrators must understand the status of their networks to ensure stable network service. Network administrators want to determine which services and applications are most frequently used, which devices and operating systems (OSs) are used, and when and where network traffic peaks. In terms of network management, the most important and useful factor is the trend in the distribution of operating systems in use. According to Ericsson's report [1], different OSs show different trends in traffic volume, and different OSs have different applications installed. However, it is hard to grasp the network status, because networks have become more complicated due to diverse access networks (wired, e.g., FTTH (Fiber To The Home) and xDSL (Digital Subscriber Line), or wireless, e.g., cellular and WLAN (Wireless Local Area Network)), traffic off-loading from mobile networks to fixed ones (e.g., via WLAN), and tethering by smartphones or mobile routers.
Previous works studied ways to infer network status. In [11,12], the authors profiled user activities or classified the traffic on the network. In [2,3], the authors studied ways to detect an OS (OS fingerprinting) by monitoring traffic on the network; for example, OS fingerprinting can be realized by using characteristics of the TCP/IP header [2], fields in DHCP packets [3], or the HTTP header. Other works took an active approach, detecting an OS by sending configured packets to the target hosts or injecting them into TCP/IP sessions [4,6]. Another work used a hybrid approach [10] that combined passive and active approaches. However, these works are unrealistic for large, complicated networks in terms of storage and computational cost. Except for [3], they force network administrators to deploy many additional monitors or systems on their networks and to analyze large volumes of traffic data to profile all activities. The work using DHCP packets [3] cannot extract additional information related to user activities. Some works reduce the number of monitoring nodes needed for network management and monitoring by selecting appropriate nodes in dynamic networks, such as virtual networks, the Internet, or sensor networks [7,8,9]. However, these works also have a deployment issue: they need to improve or replace existing network devices or nodes to add new functions. They also require large volumes of data, making it difficult to extract information on network status in terms of not only traffic volume but also service and application trends. Furthermore, all previous works have difficulty estimating the number of devices with each OS when devices are located behind NAT (Network Address Translation) boxes or tethering devices, or when devices move across access networks due to traffic off-loading from the cellular network to a fixed one via WLAN. In these cases, a device is assigned different IP addresses by the access networks it uses, or a device shares an IP address with other devices.
To overcome this difficulty, we focus on DNS traffic as a means of becoming aware of the situation on a network. Analyzing DNS traffic yields a substantial amount of useful information about the status of the network, such as services and applications popular among users and daily traffic trends. Furthermore, it allows us to anticipate upcoming traffic, since a user's device first sends a query to a DNS server to resolve the IP address of a service provider. Moreover, we argue that we can effectively become aware of the network status simply by monitoring DNS-related traffic, without additional systems or monitoring points. This makes DNS traffic analysis a suitable and realistic solution for large and complicated networks.
In this paper, we propose a new passive OS fingerprinting method based on DNS traffic analysis. The method exploits two characteristics of DNS queries: each OS queries specific domains to which other OSs send no queries, and each OS has a characteristic time interval distribution when sending these OS-specific queries. The method can estimate the number of devices running each OS (hereafter, the number of OSs) from the number of OS-specific DNS queries. To realize the method, we derive characteristics of DNS traffic by analyzing DNS queries from each OS; our analysis shows that each OS has the two characteristics described above. We also devise a method for estimating the number of OSs from the number of queries by using these characteristics. For the estimation, we derive an equation that uses the characteristics of specific DNS queries and also accounts for irregular time intervals, in which some queries are sent at intervals shorter than the regular interval and others at intervals longer than it. In this paper, we provide the results of our analysis of DNS queries from the Android OS and the characteristics of these queries. Furthermore, we show the results of an examination on our intra-network in which we estimate the number of OSs with our method. Some of these results are close to the results of DHCP fingerprinting.
A. Contribution and Outline of this Paper
In this paper, we propose a new passive OS fingerprinting method using DNS traffic. Taking the Android OS as an example, we demonstrate characteristics for OS fingerprinting derived from an analysis of DNS-related traffic. We derive a method for estimating the number of OSs that uses these characteristics and accounts for the likelihood of irregular events: queries sent at intervals much shorter than the regular interval, and queries sent at intervals much longer than it. We present the results of an examination of the estimation on our intra-network.
The outline of this paper is as follows. We describe works related to our study in Section II. We summarize our proposed estimation method in Section III. We present the results of our DNS traffic analysis of the Android OS and the equation for estimating the number of OS devices in Section IV. We present our examination of the estimation using DNS traffic on our intra-network in Section V, and conclude the paper in Section VI.
II. RELATED WORKS
There are several works on OS fingerprinting. In [2], Zalewski uses a passive approach that monitors differences in TCP/IP header values, such as TTL (Time To Live) and MSS (Maximum Segment Size), to distinguish OSs. In HTTP headers, the User-Agent field contains information about the web browser as well as the OS of the user. In [5], Shah tries to identify HTTP server software and the OS by using information included in HTTP responses. However, these works are not feasible on large, complicated networks, since they require traffic monitoring equipment at all network borders and the filtering of usable information from high volumes of captured traffic data. Moreover, the approach in [2] in particular does not work with tethering, because some fields in the TCP/IP headers are usually rewritten in that case. In [3], Kollmann uses DHCP-related packets for passive OS fingerprinting, relying on the time difference between retransmitted frames or on DHCP fields such as Secs. However, DHCP frames contain no information about the services or applications that users enjoy, so an additional system is needed to gather such information from other traffic analysis.
There are also works on active OS fingerprinting. In [6], Lyon uses the network scanning tool Nmap, which has a remote OS fingerprinting function: Nmap sends probe packets to the target devices, monitors the responses, and determines the OS of the target from the response packets. In [10], Gagnon takes a hybrid approach that combines passive and active approaches to increase the accuracy of OS fingerprinting. However, these methods do not work when the target devices are located behind network devices such as a firewall or NAT box, because the tool is then unable to send probe packets to the targets. Some works address such NAT-like situations. In [13], Beverly used a passive approach to distinguish traffic from NAT hosts from that of other hosts by applying a naïve Bayesian classifier to characteristic values in the TCP/IP header fields. In [4], Schulz enabled active OS fingerprinting in a tethering environment by injecting ICMP (Internet Control Message Protocol) error packets into the target client's TCP session. However, these approaches require an additional system to monitor the networking of all clients and, especially in [4], to inject ICMP packets at the right time. Therefore, they are infeasible for large, complicated networks.
Other works profile user activities by analyzing traffic. In [11], Xu classified Internet backbone traffic into clusters (servers/services, heavy-hitter hosts, scans/exploits) based on source/destination IP addresses. This approach is unrealistic for large networks, because profiling all activities requires analyzing a large volume of traffic data, which is costly in terms of storage and computation. Furthermore, deploying monitors to obtain all traffic data on a large network is problematic. In [12], Zhang tried to infer online user activities (browsing, online games, video, etc.) by analyzing MAC-level traffic on a wireless LAN and extracting features of data/control/management frames (data rate, frame interval time, etc.). This approach is specialized for wireless LAN traffic and also has a monitor deployment issue.
III. DESCRIPTION OF PROPOSED METHOD
Figure 1 shows our assumed network environment for passive OS fingerprinting. The whole network contains several access networks (cellular, FTTH, etc.), and each device can connect to any of them. There is a DNS server (or a set of DNS servers) in the core network, and whichever access network a device connects to, the device sends its queries to the same DNS server. We also assume that some devices connect to an access network through another device, such as a tethering-enabled device or a NAT box. This implies that an IP address is not necessarily used by a single device; it may be shared by several devices.
Figure 1. Our assumption of the network environment
The outline of our proposal for estimating the number of OSs from DNS traffic is as follows:
1. (In the experimental environment) Gather DNS traffic from each mobile OS device.
2. Extract characteristics from the DNS traffic: queries for a specific domain, and specific patterns of queries (the time interval between queries, and the query flow that completes a name resolution task).
3. Make a signature from the extracted characteristics.
4. (In the service network) Gather DNS traffic and estimate the number of OSs from the traffic data by using the signature.
The following sections show the results of our analysis and of our examination of the proposed estimation. Section IV presents the characteristics extracted from DNS traffic, using the Android OS as an example. Through the analysis, we found the following characteristics for the estimation: (A) the Android OS queries specific domain names to which no other OS sends queries regularly (Section IV-A); (B) the Android OS has a specific query flow, in which it first sends an AAAA record query, then an A record query, and, after receiving the response, a PTR query for one of the IP addresses in the response (Section IV-A); and (C) the Android OS has specific patterns of time intervals between the queries involved (Section IV-B). We represent a general model of the signature as an equation whose parameters represent a characteristic (the time interval pattern) used for estimating the number of OSs (Section IV-C). In Section V, we show the results of our examination of estimating the number of Android OS devices on our intra-network.
IV. RESULT OF OUR DNS TRAFFIC ANALYSIS
To extract signatures for passive OS fingerprinting, we capture and analyze DNS-related traffic from mobile devices, each left running its OS without any user operation. We use four smartphone devices, two of which run Android 2.3 while the other two run different OSs. All devices access the Internet over a wireless LAN. All devices are configured to allow automatic updates of applications and to enable GPS functions, with all other settings left at their defaults, and all devices have some applications installed by default. DNS-related traffic from the devices is captured on the DNS server of our intra-network.
A. OS-specific DNS query
We extract domain names to which no other OS sends queries. Table I shows examples of domain names that only the Android OS queries. According to Table I, the Android OS has specific domain names to which no other OS sends queries. Moreover, it is notable that when the Android OS sends a query to clients.android.google.com (or some other google.com subdomains), it first sends a query for AAAA records and then a query for A records. Less than 200 milliseconds later, after the response has arrived, the Android OS sends a query for the PTR record of an IP address contained in the response to the A record query.
TABLE I. EXAMPLES OF OS-SPECIFIC DNS QUERY (ANDROID OS)
domain name: clients.google.com, *.pool.ntp.org, mtalk.google.com
The Android OS also sends queries to *.pool.ntp.org regularly. Other OSs may also query this domain when it is configured as their NTP (Network Time Protocol) server domain. However, according to our additional examination of other OSs (Linux OS, Windows™), these OSs send a query only once when an NTP server is configured and do not send a query each time they send NTP-related traffic. As far as we observed, their queries are separated by intervals of more than one day because of the TTL (Time To Live) value of the DNS server of the ntp.org domain. This differs from the behavior of the Android OS, which sends queries at intervals of about 14,400 seconds.
B. Interval time pattern of DNS queries
We then analyze the time intervals between DNS queries for OS-specific domains. Figure 2 shows the times of queries for the A record of clients.android.google.com relative to the base time at which the first query (query number 0) occurs. Device 1 and Device 2 have the same Android OS version (2.3) but are produced by different vendors. According to Figure 2, queries often occur at the same time of day as query number 0. Sometimes, however, there is no query within a day, or there are queries at other times of the day. Device 2 sends more queries than Device 1 over 30 days. After sending a query, Device 2 sends another one less than 3 seconds later. Moreover, Device 2 sometimes sends a query again more than an hour (3,600 seconds) later.
Figure 3 shows the frequency distribution of query time intervals for clients.android.google.com. On Device 1, 21.4% of queries take intervals near 86,400 seconds (from 84,600 to 88,200 seconds), and 42.9% take intervals of more than one day (over 88,200 seconds). On Device 2, 8.7% of queries take intervals near 86,400 seconds (from 84,600 to 88,200 seconds), and a total of 13.0% take intervals of more than one day. Moreover, Figure 2(b) shows characteristics specific to Device 2: 33.0% of its queries take
intervals of less than 1,800 seconds (most of them less than 3 seconds), and 16.5% of its queries take intervals of less than 5,400 seconds (in fact, these intervals range from 3,600 to 4,000 seconds). For 6.1% of its queries, Device 2 also takes intervals near 82,800 seconds (from 81,000 to 82,800 seconds). It appears that when a query follows the previous one after about 3,600 seconds, the next interval is near 82,800 seconds, so that the query time returns to the same time of day (3,600 + 82,800 = 86,400 seconds).
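The percentages above are shares of inter-query gaps that fall into a few ranges. The following Python sketch shows one way such shares could be computed from the raw query times of a single device for a single domain; it is an illustration under stated assumptions, not the authors' tooling. The function names (`intervals`, `classify`, `interval_profile`) and the bucket labels are hypothetical, timestamps are assumed to be in seconds, and the ranges follow the text: "near one day" means 84,600 to 88,200 seconds, "over one day" means more than 88,200 seconds, and "short" means less than 1,800 seconds.

```python
from collections import Counter

DAY = 86_400     # regular cycle for clients.android.google.com (seconds)
NEAR = 1_800     # half-width of the "near one day" window (84,600-88,200 s)
SHORT = 1_800    # upper bound of the "short" bucket


def intervals(timestamps):
    """Return the gaps (seconds) between consecutive queries."""
    ts = sorted(timestamps)
    return [b - a for a, b in zip(ts, ts[1:])]


def classify(gap, day=DAY, near=NEAR, short=SHORT):
    """Assign one inter-query gap to a bucket (hypothetical labels)."""
    if gap < short:
        return "short"            # e.g. Device 2's repeats within 3 s
    if abs(gap - day) <= near:
        return "near one day"     # 84,600 to 88,200 s
    if gap > day + near:
        return "over one day"     # more than 88,200 s
    return "other"                # e.g. the ~3,600 s and ~82,800 s gaps


def interval_profile(timestamps):
    """Relative frequency of each bucket over all gaps of one device."""
    gaps = intervals(timestamps)
    counts = Counter(classify(g) for g in gaps)
    return {bucket: n / len(gaps) for bucket, n in counts.items()}


ts = [0, 2, 86_400, 173_300, 345_600]
print(interval_profile(ts))
# {'short': 0.25, 'near one day': 0.5, 'over one day': 0.25}
```

Keeping "other" as an explicit bucket makes the shares sum to one, so device-specific patterns such as the 3,600 + 82,800 second pairs remain visible instead of being silently dropped.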
Figure 2. DNS query evolved time (domain name: clients.android.google.com, days: 30); panels: (a) Device 1, (b) Device 2; axes: relative query time from query #0 [day] vs. query number
Figure 3. Query time interval (domain name: clients.android.google.com, days: 60); panels: (a) Device 1, (b) Device 2; axes: query time interval [second] vs. number of queries and cumulative relative frequency
Through the analysis described above, we summarize the characteristics of DNS queries for clients.android.google.com as follows:
• After querying the A record of clients.android.google.com, the Android OS sends a PTR query for an IP address contained in the A record response, less than 200 milliseconds later.
• The Android OS often sends queries for clients.android.google.com at the same time every day. However, it sometimes sends no query in a day (42.9% of Device 1 queries, 13.0% of Device 2 queries).
• The Android OS sometimes sends queries for clients.android.google.com at times other than the regular time of day. Some devices have a specific pattern for these irregular times (e.g., Device 2 sends queries at intervals near 3,600 seconds (16.5%) or less than 3 seconds (33.0%)).
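The query flow in the first item can be checked against a parsed DNS log. A minimal sketch, assuming each log entry has already been decoded into the hypothetical `DnsQuery` record below (with the A record answers attached to the A query) and that names carry no trailing dot; for simplicity the 200 ms gap is measured from the A query rather than from the arrival of its response. It relies only on the standard-library `ipaddress` module, whose `reverse_pointer` attribute builds the in-addr.arpa name used in PTR queries.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class DnsQuery:
    ts: float                 # capture time in seconds
    client: str               # source address of the querying device
    qtype: str                # "A", "AAAA", "PTR", ...
    qname: str                # queried name, without a trailing dot
    answers: list = field(default_factory=list)  # addresses in the A response


def has_a_then_ptr(log, domain="clients.android.google.com", max_gap=0.2):
    """Return True if some client queries `domain` for A records and then,
    within `max_gap` seconds, asks for the PTR record of a returned address."""
    for q in log:
        if q.qtype != "A" or q.qname != domain:
            continue
        # e.g. "203.0.113.7" -> "7.113.0.203.in-addr.arpa"
        wanted = {ipaddress.ip_address(ip).reverse_pointer for ip in q.answers}
        for r in log:
            if (r.client == q.client and r.qtype == "PTR"
                    and r.qname in wanted and 0 < r.ts - q.ts < max_gap):
                return True
    return False


log = [
    DnsQuery(0.000, "10.0.0.5", "AAAA", "clients.android.google.com"),
    DnsQuery(0.010, "10.0.0.5", "A", "clients.android.google.com", ["203.0.113.7"]),
    DnsQuery(0.150, "10.0.0.5", "PTR", "7.113.0.203.in-addr.arpa"),
]
print(has_a_then_ptr(log))  # True
```

Matching on the client address is itself an assumption: behind a NAT box several devices share that address, so in practice this check identifies the flow rather than a single device.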
We also analyze another queried domain, *.pool.ntp.org. Figure 4 shows the times of queries for the A record of *.pool.ntp.org relative to the base time at which the first query occurs; in Figure 4, vertical lines within each day are drawn every 14,400 seconds. Figures 4 and 5 show that queries often occur at intervals that are multiples of 14,400 seconds (4 hours). Figure 5 shows the frequency distribution of query time intervals for *.pool.ntp.org. Most queries are sent at intervals near multiples of 14,400 seconds: 78.0% of Device 1 queries and 78.5% of Device 2 queries. Some queries take intervals of less than 7,200 seconds (8.7% of Device 1 and 9.3% of Device 2 queries); these queries appear to realign the timing of sending queries. Some queries take intervals of more than one day (2.9% of Device 1 queries and 2.3% of Device 2 queries).
Figure 4. DNS query evolved time (domain name: *.pool.ntp.org, days: 7); panels: (a) Device 1, (b) Device 2; axes: relative query time from query #0 [day] vs. query number
Figure 5. Query time interval (domain name: *.pool.ntp.org, days: 60); panels: (a) Device 1, (b) Device 2; axes: query time interval [second] vs. number of queries and cumulative relative frequency
Through the analysis described above, we summarize the characteristics of DNS queries for *.pool.ntp.org as follows:
• The Android OS often sends queries for *.pool.ntp.org at intervals that are multiples of 14,400 seconds. However, it sometimes sends queries at intervals of more than a day (2.9% of Device 1 queries, 2.3% of Device 2 queries).
• The Android OS sometimes sends queries for *.pool.ntp.org at intervals of less than 7,200 seconds (8.7% of Device 1 queries, 9.3% of Device 2 queries), perhaps for timing alignment.
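A comparable check for *.pool.ntp.org asks whether each gap lies near a multiple of the 14,400-second cycle. In this sketch the 600-second tolerance is an illustrative assumption (the text does not state the tolerance used), and the names `near_multiple` and `ntp_profile` are hypothetical.

```python
CYCLE = 14_400   # regular *.pool.ntp.org cycle observed for Android (4 hours)
DAY = 86_400


def near_multiple(gap, cycle=CYCLE, tol=600):
    """True if `gap` lies within `tol` seconds of k * cycle for some k >= 1.
    The 600 s tolerance is an assumption for illustration only."""
    k = round(gap / cycle)
    return k >= 1 and abs(gap - k * cycle) <= tol


def ntp_profile(gaps, cycle=CYCLE, tol=600):
    """Shares of gaps in the three categories summarized above."""
    n = len(gaps)
    return {
        "near multiple": sum(near_multiple(g, cycle, tol) for g in gaps) / n,
        "short": sum(g < 7_200 for g in gaps) / n,
        "over one day": sum(g > DAY for g in gaps) / n,
    }


print(ntp_profile([14_400, 28_900, 3_000, 90_000]))
# {'near multiple': 0.5, 'short': 0.25, 'over one day': 0.25}
```

The three shares need not sum to one: a gap such as 100,800 seconds (7 × 14,400) is both near a multiple of the cycle and longer than a day.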
C. Estimating the number of OSs
To estimate the number of OSs, we consider the characteristics described above: (A) regularly, query time intervals are cyclic; (B) irregularly, some queries are sent at intervals shorter than the regular interval; and (C) some other queries are sent at intervals longer than the regular interval. Furthermore, we must consider how to estimate the number of OSs from data captured during a period shorter than the regular cyclic interval. In this case, some OS devices send no query during the capture period, and we must estimate the number of such devices from captured data that contain no query from them. In this section, we first introduce how to estimate the number of OSs using data captured during a period shorter than the regular cyclic interval. Then, we introduce how to account for the irregular characteristics described above. Figure 6 shows an example of the situation used in the following explanation, and Table II summarizes the notation.
TABLE II. NOTATIONS
$T_d$: Cyclic interval time for a domain $d$
$t$: Interval time for capturing traffic data
$Q_{d,t}$: The number of queries for domain $d$ that are sent in the interval $t$
$n^{1}_{d,t}$: The number of OS devices that send queries for domain $d$ during the interval $t$, each device counted once
$n^{+}_{d,t}$: The number of OS devices that send more than one query for domain $d$ during the interval $t$
$\bar{m}_t$: Mean number of queries sent in the interval $t$ by OS devices that send more than one query in the interval $t$
$p_{d,t}$: Probability that an OS device sends queries for domain $d$ in the interval $t$
$p^{<}_{d,t}$: Probability that an OS device sends queries for domain $d$ at intervals shorter than $t$
$p^{>}_{d}$: Probability that an OS device sends queries for domain $d$ at intervals longer than the cyclic interval $T_d$
$N'$: The number of OS devices that send at least one query in the cyclic interval $T_d$
$N$: The number of all OS devices
Figure 6. An example of the estimation situation (device 1 and device 2 over a capture interval; marked events: a query at less than the capture interval, a query over the cyclic interval)
First, we introduce the estimation equation that uses the regular cyclic nature of queries (A). Let the cyclic interval time for a domain $d$ be $T_d$, and the interval time for capturing traffic data be $t$ ($t \le T_d$). The probability $p_{d,t}$ that an OS device sends a query for domain $d$ in the capture interval satisfies $p_{d,t} = t/T_d$. Let the number of queries for domain $d$ in the interval be $Q_{d,t}$. If all queries are sent at the cyclic interval $T_d$, the number of OSs $N$ satisfies $N \cdot p_{d,t} = Q_{d,t}$, so $N$ can be estimated by the following equation:
$$N = \frac{Q_{d,t}}{p_{d,t}} = \frac{T_d}{t}\,Q_{d,t} \qquad (1)$$
For example, in Figure 6, the number of queries for domain $d$, $Q_{d,t}$, is 3. If $p_{d,t} = 1/2$, the number of OS devices is estimated as $N = 3/(1/2) = 6$.
Then, we consider the irregular characteristic (B). Some of the queries captured during $t$ are sent by the same OS device, so such duplicated queries should be removed from $Q_{d,t}$ before $N$ is estimated by equation (1). Let $n^{1}_{d,t}$ be the number of OS devices that send queries in the capture interval $t$, counting each device once, and let $n^{+}_{d,t}$ be the number of those devices that send more than one query in $t$. Let $\bar{m}_t$ be the mean number of queries sent in $t$ by the devices that send more than one query in $t$. Each of these $n^{+}_{d,t}$ devices contributes $\bar{m}_t - 1$ queries in addition to the one already counted in $n^{1}_{d,t}$, so the number of queries in the capture interval is
$$Q_{d,t} = n^{1}_{d,t} + (\bar{m}_t - 1)\, n^{+}_{d,t}.$$
$n^{+}_{d,t}$ satisfies $n^{+}_{d,t} = p^{<}_{d,t}\, n^{1}_{d,t}$, where $p^{<}_{d,t}$ is the probability that an OS device sends queries for domain $d$ at intervals shorter than the capture interval $t$. Hence the number of distinct devices is
$$n^{1}_{d,t} = \frac{Q_{d,t}}{1 + p^{<}_{d,t}\,(\bar{m}_t - 1)},$$
and equation (1) is revised as
$$N' = \frac{T_d}{t}\cdot\frac{Q_{d,t}}{1 + p^{<}_{d,t}\,(\bar{m}_t - 1)} \qquad (2)$$
In Figure 6, the probability that an OS device sends a query at an interval shorter than the capture interval, $p^{<}_{d,t}$, is $2/6 = 1/3$, derived from the Device 1 pattern (two of its queries are sent at intervals shorter than the capture interval). The mean number of queries sent in the capture interval, $\bar{m}_t$, is 3, also derived from the Device 1 pattern. So the number of OS devices is estimated as $N' = 2 \cdot \frac{3}{1 + \frac{1}{3}(3 - 1)} = \frac{18}{5}$.
Then, we apply the irregular characteristic (C) to equation (2). $N'$ in equation (2) denotes the number of OS devices that send at least one query in the cyclic time interval $T_d$. However, according to characteristic (C), some OS devices send no query within the cyclic time interval. Let $p^{>}_{d}$ be the probability that an OS device sends a query at an interval longer than the cyclic time interval $T_d$; the estimated number of all OS devices is then $N = N'/(1 - p^{>}_{d})$. Therefore, equation (2) is revised as
$$N = \frac{1}{1 - p^{>}_{d}}\cdot\frac{T_d}{t}\cdot\frac{Q_{d,t}}{1 + p^{<}_{d,t}\,(\bar{m}_t - 1)} \qquad (3)$$
In Figure 6, the probability that an OS device sends a query at an interval longer than the cyclic time interval, $p^{>}_{d}$, is $1/6$, derived from the Device 1 pattern (one of its queries is sent after an interval longer than the cyclic interval). So the number of OS devices is estimated as $N = 2 \cdot \frac{3}{1 + \frac{1}{3}(3 - 1)} \cdot \frac{1}{1 - \frac{1}{6}} = \frac{108}{25}$.
V. EXAMINATION IN OUR INTRA-NETWORK
We examine our estimation equation (3) by estimating the number of Android OS devices on our intra-network. We capture DNS traffic data and DHCP-related traffic in our intra-network; the DHCP traffic is used for DHCP fingerprinting [3], against which we compare the estimation result obtained from DNS traffic. In this examination, we use DNS queries for two domains: android.clients.google.com and *.pool.ntp.org. We derive each parameter of equation (3) from our DNS traffic analysis described in Section IV-B. Table III summarizes the parameters of equation (3) for queries for android.clients.google.com and *.pool.ntp.org, respectively.
Figure 7 shows the estimation results obtained with queries for clients.android.google.com for each capture interval $t$, using the parameters from the Device 1 analysis and from the Device 2 analysis. We derive the number of queries $Q_{d,t}$ in equation (3) from DNS traffic data captured during one day. The value of $Q_{d,t}$ for each capture interval is shown in Table IV; each value is the average number of queries over intervals of length $t$ whose start time is shifted hour by hour. The dashed line in Figure 7 indicates the result of DHCP fingerprinting, which estimates that 8 OS devices exist in the network. According to Figure 7, the results with the Device 2 parameters are closer to the DHCP fingerprinting result; therefore, the Device 2 parameters better fit the characteristics of Android OS queries. The Device 1 parameters yield worse estimates because the probability that an OS device sends a query at an interval longer than the cyclic time interval, $p^{>}_{d}$, is much higher than for Device 2, owing to device-specific characteristics or irregular factors. Figure 7 also indicates that the longer the capture interval $t$, the closer the estimation results are to the DHCP fingerprinting result.
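Two steps of this comparison can be sketched in code: computing $Q_{d,t}$ as an average over windows whose start is shifted hour by hour, and plugging parameters into equation (3). The sketch is a hedged illustration: `mean_window_count` and `estimate_eq3` are hypothetical names, the capture is assumed to start at time 0 and to be at least one window long, and the numeric inputs are the $t$ = 86,400 s rows of Table III(A) and Table IV.

```python
def mean_window_count(timestamps, width, span, step=3_600):
    """Average number of queries in windows [s, s + width), with the start s
    shifted by `step` seconds across a capture of length `span` (span >= width)."""
    starts = range(0, span - width + 1, step)
    counts = [sum(s <= ts < s + width for ts in timestamps) for s in starts]
    return sum(counts) / len(counts)


def estimate_eq3(q, cycle, capture, p_short, p_long, mean_multi):
    """Equation (3) in floating point."""
    return (cycle / capture) * q / (1 + p_short * (mean_multi - 1)) / (1 - p_long)


# t = T_d = 86,400 s: Q_{d,t} = 16.0 (Table IV), parameters from Table III(A)
device1 = estimate_eq3(16.0, 86_400, 86_400, 0.357, 0.429, 2.00)
device2 = estimate_eq3(16.0, 86_400, 86_400, 0.783, 0.130, 2.86)
print(f"Device 1 parameters: {device1:.1f}, Device 2 parameters: {device2:.1f}")
```

With these inputs, the Device 1 parameters give roughly 20.6 devices and the Device 2 parameters roughly 7.5, against the DHCP fingerprinting count of 8. This is our own arithmetic on the tabulated values, shown only to illustrate how the larger $p^{>}_{d}$ of Device 1 inflates the estimate.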
Figure 8 shows the estimation results obtained with queries for *.pool.ntp.org; the values of $Q_{d,t}$ are shown in Table V. According to Figure 8, both estimates of the number of OS devices are lower than the DHCP fingerprinting result. This is because some Android OS devices send no query for *.pool.ntp.org by default; we presume that some Android OS devices in the network are set to use another domain or method for time synchronization. Our additional observation shows that 2 of 5 devices send no query for that domain. If we take the rate of such devices into account in the estimation, the estimates become closer to the DHCP result.
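The effect of devices that never query *.pool.ntp.org can be illustrated the same way. The correction below, which divides the estimate by the share of devices that do query (3 of 5 in the additional observation), is our assumption about how the rate could be taken into account; the text does not give a formula. The function names are hypothetical, and the inputs are the $t$ = 14,400 s rows of Table III(B) and Table V.

```python
def estimate_eq3(q, cycle, capture, p_short, p_long, mean_multi):
    """Equation (3) in floating point."""
    return (cycle / capture) * q / (1 + p_short * (mean_multi - 1)) / (1 - p_long)


def adjust_for_silent(n, silent_share):
    """Hypothetical correction: scale up by the share of devices that query."""
    return n / (1 - silent_share)


# t = T_d = 14,400 s: Q_{d,t} = 1.50 (Table V), parameters from Table III(B)
raw = {
    "Device 1": estimate_eq3(1.50, 14_400, 14_400, 0.104, 0.549, 2.00),
    "Device 2": estimate_eq3(1.50, 14_400, 14_400, 0.116, 0.581, 2.00),
}
# 2 of 5 devices were observed to send no query for *.pool.ntp.org
adjusted = {k: adjust_for_silent(v, 2 / 5) for k, v in raw.items()}
for k in raw:
    print(f"{k}: raw {raw[k]:.1f}, adjusted {adjusted[k]:.1f}")
```

The raw estimates come to about 3.0 and 3.2 devices and the adjusted ones to about 5.0 and 5.3, which moves them toward, but still below, the DHCP count of 8.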
TABLE III. PARAMETERS FOR THE EQUATION (3)
(A) DOMAIN: ANDROID.CLIENTS.GOOGLE.COM ($T_d$ = 86400 s)
$t$ [s] | Device 1: $p^{<}_{d,t}$, $p^{>}_{d}$, $\bar{m}_t$ | Device 2: $p^{<}_{d,t}$, $p^{>}_{d}$, $\bar{m}_t$
86400 | 0.357, 0.429, 2.00 | 0.783, 0.130, 2.86
43200 | 0.262, 0.429, 2.00 | 0.626, 0.130, 2.34
21600 | 0.143, 0.429, 2.00 | 0.539, 0.130, 2.05
10800 | 0.095, 0.429, 2.00 | 0.513, 0.130, 2.05
5400 | 0.071, 0.429, 2.00 | 0.348, 0.130, 2.00
(B) DOMAIN: *.POOL.NTP.ORG ($T_d$ = 14400 s)
$t$ [s] | Device 1: $p^{<}_{d,t}$, $p^{>}_{d}$, $\bar{m}_t$ | Device 2: $p^{<}_{d,t}$, $p^{>}_{d}$, $\bar{m}_t$
14400 | 0.104, 0.549, 2.00 | 0.116, 0.581, 2.00
7200 | 0.087, 0.549, 2.00 | 0.076, 0.581, 2.00
3600 | 0.046, 0.549, 2.00 | 0.052, 0.581, 2.00
TABLE IV. VALUES OF THE NUMBER OF QUERIES IN EACH CAPTURED
TIME INTERVAL
(DOMAIN: ANDROID.CLIENTS.GOOGLE.COM)
T_q (s)              86400   43200   21600   10800   5400
Number of queries    16.0    8.58    4.61    2.29    1.14
TABLE V. VALUES OF THE NUMBER OF QUERIES IN EACH CAPTURED
TIME INTERVAL
(DOMAIN: *.POOL.NTP.ORG)
T_q (s)              14400   7200    3600
Number of queries    1.50    0.73    0.35
VI. CONCLUSION
In this paper, we study passive OS fingerprinting based on
the analysis of DNS traffic, and we derive a method to
estimate the number of OS devices in the network.
We first reveal characteristics that identify OSs from
DNS traffic by analyzing the DNS queries sent by each OS.
Each OS, especially the Android OS, has characteristics that
are useful for the estimation: each OS queries specific
domains that other OSs never query, and each OS has
characteristic distributions of the time intervals between its
OS-specific domain queries. Our analysis also shows that the
OS-specific domain queries are sometimes sent at irregular
time intervals; some queries are sent at intervals shorter than
the regular interval, and others at intervals longer than it.
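The interval analysis described above can be sketched as follows: filter DNS queries by an OS-specific domain pattern, group them per client, and label each gap between consecutive queries as shorter than, close to, or longer than a regular cycle. This is a minimal sketch under stated assumptions: the `(timestamp_s, client_ip, qname)` record format, the function names, the 3600 s cycle and the 300 s tolerance are all hypothetical. Grouping by client IP also assumes devices are distinguishable by address at the capture point (for example, a capture inside the NAT); behind NAT or tethering their queries merge, which is the situation the paper's counting method is meant to handle.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

def query_intervals(records, pattern):
    """Per-client gaps (s) between consecutive queries whose name matches pattern.

    records: iterable of (timestamp_s, client_ip, qname) tuples, e.g. parsed
    from a resolver log or a packet capture (this record format is assumed).
    """
    times = defaultdict(list)
    for ts, client, qname in records:
        name = qname.lower().rstrip(".")  # normalize case and trailing root dot
        if fnmatchcase(name, pattern):
            times[client].append(ts)
    gaps = {}
    for client, stamps in times.items():
        stamps.sort()
        gaps[client] = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
    return gaps

def classify_gap(gap, period, tolerance):
    """Label a gap relative to an assumed regular cycle of `period` seconds."""
    if gap < period - tolerance:
        return "shorter"
    if gap > period + tolerance:
        return "longer"
    return "regular"

if __name__ == "__main__":
    log = [  # hypothetical sample records
        (0, "10.0.0.2", "0.pool.ntp.org"),
        (3600, "10.0.0.2", "1.pool.ntp.org."),
        (9000, "10.0.0.2", "2.pool.ntp.org"),
        (100, "10.0.0.3", "www.example.com"),
    ]
    for client, gaps in query_intervals(log, "*.pool.ntp.org").items():
        print(client, gaps, [classify_gap(g, 3600, 300) for g in gaps])
```

`fnmatchcase` is used instead of `fnmatch` so that matching does not depend on the host OS's case rules; names are lower-cased first, so patterns should be written in lower case.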
We then propose a method for estimating the number of OS
devices from the number of specific DNS queries in DNS
traffic data captured during a time interval. We derive an
equation for this estimation that considers not only the
cyclic nature of queries for specific domains but also the
irregular time intervals described above.
Finally, we present the results of our examination on our
intra-network. Some results show that our estimation method
yields numbers of OS devices close to those obtained by
DHCP fingerprinting. The results indicate that the accuracy
of our estimation method depends on the parameters of the
equation. For DNS queries to android.clients.google.com,
the parameters derived from our DNS traffic analysis of
Device 2 yield closer estimates than those derived for
Device 1. Furthermore, the results show that the shorter the
captured time interval, the lower the precision of the
estimation. Further study is needed to improve the precision
of estimation from data captured over shorter time intervals
and to derive adequate parameters that correctly describe OS
characteristics.
ACKNOWLEDGMENT
We would like to thank Mr. Yamashita from KDDI R&D
Laboratories Inc. for sharing his work.
REFERENCES
[1] Ericsson, “Traffic and Market Report”, Available at
http://www.ericsson.com/res/docs/2012/traffic_and_market_report_june_2012.pdf, Jun. 2012.
[2] M. Zalewski, “p0f v3”, Available at http://lcamtuf.coredump.cx/p0f3/.
[3] E. Kollmann, “Chatter on the Wire: A look at DHCP traffic”,
Available at http://myweb.cableone.net/xnih/download/Chatter-DHCP.pdf, 2007.
[4] S. Schulz, A. Sadeghi, M. Zhdanova, H. A. Mustafa, W. Xu and V.
Varadharajan, “Tetherway: A Framework for Tethering Camouflage”,
Proc. ACM Wireless Network Security (WISEC 2012), pp. 149-160,
2012.
[5] S. Shah, “HTTP Fingerprinting and Advanced Assessment
Techniques”, Blackhat 2003 USA, Available at
http://www.blackhat.com/presentations/bh-usa-03/bh-us-03-shah/bh-us-03-shah.ppt, 2003.
[6] G. F. Lyon, “Remote OS Detection via TCP/IP stack fingerprinting”,
Available at http://nmap.org/book/osdetect.html, 2011.
[7] C. Popi and O. Festor, “A Scheme for Dynamic Monitoring and Logging
of Topology Information in Wireless Mesh Networks”, Proc. IEEE
Network Operations and Management Symposium (NOMS 2008), pp.
759-762, 2008.
[8] D. Tuncer, M. Charalambides, G. Pavlou and N. Wang, “DACoRM:
A Coordinated, Decentralized and Adaptive Network Resource
Management Scheme”, Proc. IEEE Network Operations and
Management Symposium (NOMS 2011), pp. 417-425, 2011.
[9] R. G. Clegg, S. Clayman, G. Pavlou, L. Mamatas and A. Galis, “On
the selection of management/monitoring nodes in highly dynamic
networks”, IEEE Trans. on Computers, vol. 99, pp. 1-15, Mar. 2012.
[10] F. Gagnon and B. Esfandiari, “A Hybrid Approach to Operating
System Discovery Based on Diagnosis Theory”, Proc. IEEE Network
Operations and Management Symposium (NOMS 2012), pp. 860-865,
2012.
[11] K. Xu, Z. Zhang and S. Bhattacharyya, “Profiling Internet Backbone
Traffic: Behavior Models and Applications”, Proc. ACM SIGCOMM
2005, pp. 169-180, 2005.
[12] F. Zhang, W. He, X. Liu and P. G. Bridges, “Inferring Users’ Online
Activities Through Traffic Analysis”, Proc. ACM conference on
Wireless Network Security (WISEC 2011), pp. 59-70, 2011.
[13] R. Beverly, “A Robust Classifier for Passive TCP/IP Fingerprinting”,
Proc. Workshop Passive and Active Network Measurement (PAM
2004), pp. 158-167, 2004.
Figure 7. Estimation results with queries for
android.clients.google.com
Figure 8. Estimation results with queries for *.pool.ntp.org
(In both figures, the x-axis is the captured time interval T_q, the y-axis is the estimated number of OS devices, and the plotted series are Device 1's parameters, Device 2's parameters, and DHCP fingerprinting.)