Using TLS Fingerprints for OS Identification
in Encrypted Traffic
Martin Laˇ
stoviˇ
cka∗† , Stanislav ˇ
Spaˇ
cek∗† , Petr Velanand Pavel ˇ
Celeda
Masaryk University, Institute of Computer Science, Brno, Czech Republic
Masaryk University, Faculty of Informatics, Brno, Czech Republic
Email: {lastovicka|spaceks|velan|celeda}@ics.muni.cz
Abstract—Asset identification plays a vital role in situational
awareness building. However, the current trends in communica-
tion encryption and the emerging new protocols turn the well-
known methods into a decline as they lose the necessary data to
work correctly. In this paper, we examine the traffic patterns of
the TLS protocol and its changes introduced in version 1.3. We
train a machine learning model on TLS handshake parameters
to identify the operating system of the client device and compare
its results to well-known identification methods. We test the
proposed method in a large wireless network. Our results show
that precise operating system identification can be achieved in
encrypted traffic of mobile devices and notebooks connected to
the wireless network.
I. INTRODUCTION
The first step in understanding the situation in a network
is obtaining the knowledge of what devices connect to it.
However, this is not easily fulfilled, especially in unmanaged
networks where devices can connect without any network ac-
cess control mechanism. These devices are typically identified
through dynamically assigned IP addresses. Administrators
and researchers address this issue with device identification
(fingerprinting) from network traffic. Most methods leverage
the HTTP User-Agent strings, which directly describe the
device and are inherently present in device communication.
However, the current trends in communication evolution go in
the direction of user privacy. They tend to move as much of
the traffic as possible into encrypted payloads to counter the
identification and tracking.
A typical solution to the encryption challenge is to scan the
devices in the network actively or to inspect the encryption
handshake parameters from packets using deep packet inspec-
tion. The contribution of our work lies in the analysis of TLS
(Transport Layer Security) handshake parameters suitable for
OS (Operating System) identification in IP flows. This shift
towards passive flow identification allows security admini-
strators to asses the OS of each device even in large networks.
Moreover, enabling identification from encrypted traffic will
ensure the identification usability in the ever-evolving network
communication.
First, we describe in detail the parameters of TLS handshake
and the nuances of distinct versions of the protocol and how
TLS 1.3 [1] redefined the semantics of some of the data
fields. The information from the handshake is incorporated
into network flows, exported using IPFIX [2], and further
processed. We employ machine learning classifier to build a
model mapping the TLS parameters feature vector to the OS of
the client device. We further explore the OS identification from
encrypted traffic by extending our previous work on TCP/IP
parameters fingerprinting [3] where we replaced manual sta-
tistical analysis with a decision tree algorithm.
We evaluate the methods on a large dataset collected from
campus wireless networks. Students and employees can bring
and connect any device which guarantees high diversity of the
devices. We pair records from infrastructure servers logs to the
collected traffic to establish ground truth and calculate the ac-
curacy metrics of our methods. Furthermore, we measured OS
identification results for two established methods (i.e., based
on HTTP User-Agent and connections to specific domains) to
provide a comparison with methods based on plaintext traffic.
II. RE LATE D WORK
We surveyed the state of the art in the field of device
fingerprinting in our previous work on this topic [3]. Hence,
we focus mainly on the new contributions in the area. Shen
et al. [4] published an article concerning the fingerprinting of
devices in the industrial control system environment. Sanchez
et al. [5] proposed hardware features, specifically the internal
clock signals, to discern different devices. However, none of
these publications deals with the challenge to identify the
operating system of a device from network traffic, posed by
the currently common encryption.
The recent approach to network communication is favoring
user privacy and promotes encrypting as much of the trans-
ferred data as possible. The purpose is to hide the content and
other relevant information about the transfer that might be
captured and analyzed by an external observer. However, even
encrypted connections might disclose some information about
the participants and the purpose of the transfer [6]. During the
handshake, before the encrypted connection is established, all
the exchanged data can be seen unencrypted.
The TLS handshakes have already been analyzed in the
past. Anderson et al. [7] provide a comprehensive study of
malware’s use of TLS by observing the unencrypted TLS
handshake messages. They also identify handshake features,
that can be used to cast some light on the data transferred by
TLS, e.g., Server Name Indication (SNI), and server certificate.
The usage of SNI for flow analysis was further explored
by Shbair et al. [8]. However, these articles focus more on978-1-7281-4973-820$31.00 c
2020 IEEE
the server side of the handshake, while the identification of
connected devices is based on the client-side parameters.
Identification of client applications using TLS handshake
monitoring was proposed by Korczy´
nski et al. [9]. The authors
showed the identification from encrypted data to be possible.
A client device identification algorithm based on the specific
parameters of the TLS handshake was proposed by Hus´
ak
et al. [10]. They utilized the simultaneous monitoring of the
HTTP and HTTPS connections to create a dictionary, pairing
the User-Agent from HTTP with the TLS handshake from
HTTPS connections generated by one device. The dictionary
was then used to assign User-Agents to HTTPS connections
captured in the network.
III. NET WORK DATA ACQUISITION
To be able to identify the operating system of a particular
device from its network traffic, we need to measure and
analyze relevant features from the traffic of the device. For
this purpose, we utilize IPFIX data provided by a flow mon-
itoring system [2]. The flow monitoring system uses multiple
Flowmon probes [11] and IPFIXcol flow collector [12].
The accuracy of any identification method depends on the
number and quality of features used as an input. We use
several different features in this work: TCP/IP parameters of
observed connections, values from plaintext HTTP headers,
and parameters of TLS connections. The rest of this section
describes the features we use for OS identification.
A. TCP/IP Features
Each flow record contains the basic fields, i.e., flow start
time, flow end time, source IP address, source port, destination
IP address, and destination port. Existing analyses show that
the size of TCP SYN packet, initial TCP window size, and
TTL (Time to Live) value of packets differ between operating
systems. Therefore, these fields are recorded as well for the
flows describing TCP connections.
B. HTTP Headers
Although most of the HTTP traffic is secured by TLS
protocol nowadays, there are still services using plaintext
HTTP. When a device communicates using this protocol, the
flow probe analyses the HTTP request header and extracts
the visited domain name from the URL and the User-Agent
field. Since there are domains tied to operating system specific
update services, they can be used as an indication of the used
OS. The User-Agent field usually contains not only the name
of the application but also the platform (i.e., operating system)
on which it is running. Moreover, some applications, such as
antivirus software, are often platform-specific, and its presence
can reveal the used OS as well. Therefore, the HTTP User-
Agent is often used for OS identification.
C. TLS Handshake
To identify the operating system of devices that commu-
nicate through encrypted protocols, we have to rely on the
information remaining in cleartext. Since the unencrypted ini-
tialization phase precedes every encrypted communication, as
shown in Figure 1, we can inspect the negotiation (handshake)
between the client and the server.
Connection
Request
TCP3-WayHandshake
Connection
Acknowledged
ClientHello TLS1.3Handshake
ServerHello
Finished
Application
Data
EncryptedData
Application
Data
Time Time
Client Server
Fig. 1. TLS 1.3 handshake – negotiation of the encrypted connection.
The first parameter we explore is the Client Version of the
TLS protocol sent by the client. The version influences further
parameters and extensions the client uses, which plays an es-
sential role in the OS identification. However, stating the client
version from Client Hello message is not as straightforward as
one would expect. Every version of TLS is identified differ-
ently due to backward compatibility, and each version sends
0x0301 bytes (TLS 1.0) in the record header field Version of
the TLS handshake which specifies the version of TLS used.
Then the client constructs the Client Hello header, in which
it fills in another Version field identifying the TLS version
of the client application. This identification was valid until
TLS 1.3 was introduced. TLS 1.3 clients send their version
as 0x0303 (TLS 1.2) [1], and their correct version is located
in extensions of the Client Hello message. Specifically, in the
extension Supported Versions it sends a list of all versions it
can use, and one of the versions is the TLS 1.3. This leads to
the current situation when TLS 1.3 clients use identifiers of
three different versions of the protocol. Flowmon monitoring
probes rely on the Version field from Client Hello header
and export this value into the flow. Hence, probes correctly
identify TLS versions 1.0 to 1.2, and TLS 1.3 clients are
exported as TLS 1.2 flows. To distinguish those two versions,
we look at extension types list and look for extension number
43 Supported Versions, which is exclusively and mandatorily
used by TLS 1.3 clients. However, this version mapping was
not implemented in the version (v10.02.05) of the Flowmon
probe used for our experiments.
The Cipher Suites is a field from Client Hello header which
specifies a list of supported encryption algorithms together
with the key length and hash algorithm to be used. This list is
ordered descendingly according to the client preference, and
we assume this preference can help identify the underlying
operating system. The TLS 1.3 specification defines only five
cipher suites to be used with TLS 1.3, but the clients usually
append more suites at the end of the list to ensure compatibility
with older servers. The list has variable length depending on
the client and the flow exporter stores only the first 16 bytes
of the field to the flow. As a result, we have IDs of the first
eight cipher suites most preferred by the client.
Similarly to cipher suites, the client can offer named groups
for the key exchange and cryptography based on elliptic
curves. This upgrade was introduced in TLS 1.2 as ellip-
tic curves extension and defined 25 named curves to chose
from [13]. TLS 1.3 specification then reduced this number to
only five supported curves, but added the option to use finite
field groups and defined five groups with audited parameters
resistant to known attacks [14]. This change also led to the
renaming of the extension to supported groups and introduced
ordering with the most preferred group first. Flowmon exporter
store the IDs of the first eight groups in the flow records.
Other parameters parsed from the TLS handshake are the
extension types and lengths. The extensions can specify ad-
ditional options for the handshake and greatly varies between
TLS versions and implementations. IANA maintains a list of
known extensions [15]; however, in the real traffic, we can
see unassigned extension types in use. Most of those are so-
called GREASE (Generate Random Extensions And Sustain
Extensibility) values which are nowadays only an Internet-
Draft [16] but already deployed in many TLS implementations.
Flowmon exporter parses the extensions and stores IDs of the
first 23 extensions used together with a list of their lengths.
The final parameter extracted from the TLS handshake is the
value of Server Name Indication extension [17]. It is present in
almost every HTTPS communication to differentiate between
multiple virtual servers so that the server know which TLS
certificate to send to the client. We treat the extracted server
name the same way as the domain name (host) extracted
from HTTP headers. Table I shows the features we measure
from the traffic using flow monitoring and use for passive OS
identification.
TABLE I
FEATU RES E XT RAC TED F ROM N ETW OR K DATA.
TCP, HTTP Flow Features TLS Flow Features
TCP SYN packet size TLS server name indication
TCP window size TLS client version
TTL of TCP SYN packet TLS cipher suites
HTTP User-Agent TLS extension types
HTTP hostname TLS extension length
TLS supported groups
TLS elliptic curves point formats
IV. DATAS ET
To test the OS identification methods on real-world data,
We measured the flow data from the university uplink to
the Internet. The dataset consists of data from three different
sources; flow records collected from the university backbone
network, log entries from the two university DHCP (Dynamic
Host Configuration Protocol) servers and a single RADIUS
(Remote Authentication Dial In User Service) accounting
server. The data was collected from 2019-07-12 00:00 to
2019-07-16 23:59 with a few hours overhead on both sides
of the interval for the log entries to cover long connection
sessions overlapping to and from the time frame. We made
the anonymized dataset publicly available on the Zenodo
platform [18]. In the dataset, we kept only flows with source
IP addresses from university wireless networks (Eduroam).
This step significantly reduced the amount of data and left
only the relevant flows to identify the OS of devices in our
network, which we can enrich with information from DHCP
and RADIUS servers.
The DHCP log data was chosen as the ground truth for our
experiment. The log typically archives the information that
connects a unique MAC (Media Access Control) address of
a specific device with the IP address it got assigned within
a specific time frame. However, it is not possible to estimate
the length of the session initiated by a specific device just
from the DHCP log. A device might end the connection long
before its IP lease time expires and the DHCP server does not
log this action. The RADIUS accounting logs supplement the
DHCP logs in this regard, as they contain the session start
and session end parameter for a specific device identified by
its IP and MAC address.
The DHCP log data was collected from the two central
PPPoE concentrators of the university network. The original
DHCP logs contained a large amount of data not relevant or
redundant to OS identification, so we applied several filters
to keep them as concise as possible. The resulting DHCP
log includes following parameters: timestamp,IP address,IP
range,MAC address, and hostname of the client device. The
timestamp denotes the precise time of the request and with
the requested IP address and the device’s MAC address allows
pairing an address to a specific device within the given time
frame. The information that identifies the device’s operating
system is provided by the hostname parameter. We discovered
that some of the DHCP REQUESTS in the dataset demanded
IP addresses that do not belong to any of the known Eduroam
address pools. We removed these addresses and to confirm that
the device requests a valid Eduroam IP address, we specify for
each one the corresponding Eduroam address pool.
After that, all data sources needed to be combined to
connect the captured sessions with the ground truth. At first,
we have correlated both logs sources. We created a log pair if
both logs entries contained the identical IP and MAC addresses
and the DHCP REQUEST timestamp lied within the interval
of RADIUS start and end timestamps with a tolerance of
one minute to deal with possible time dyssynchronization of
logging servers. This correlation resulted in tuples containing
ID,IP,MAC,device name,start time,end time, and ground
truth OS derived from the hostname. Finally, we have enriched
every flow record with the session ID and ground truth. The
key was the flow source IP address and flow start timestamp,
which had to fall exactly into the session time interval. We
experimented with different time tolerances for this mapping,
but even tolerances of tens of minutes added only a negligible
number of unpaired flows (i.e., ten minutes tolerance added
0.38 % of flows). Altogether, our dataset consists of or is a
result of the activity of:
18 708 983 enriched flows,
10 734 unique users,
45 602 unique Wi-Fi sessions,
11 962 unique MAC addresses,
8 071 unique IPv4 addresses assigned.
V. O S ID EN TI FIC ATIO N METHODOLOGY
We utilize four different methods to identify the OS of each
flow record. In this section, we describe the processing of
collected data, computing ground truth, and the settings of the
identification methods with a focus on the machine learning
algorithms. A brief overview of the identification level of detail
for each approach is presented in Table II.
TABLE II
OS IDENTIFICATION METHODS LEVEL OF DETAIL
Method Vendor Name Major
Version
Minor
Version
TCP/IP parameters ! ! (!) (!)
TLS handshake ! ! (!) (!)
User-Agent ! ! ! !
Specific domains ! ! 7 7
Ground truth X(!)7 7
A. Preprocessing
Machine learning algorithms were used for two methods,
TCP/IP, and TLS, and they require data transformations before
the learning or classification phase. We had already experi-
mented with machine learning for TCP/IP parameters in our
previous work [19] and the methodology stayed the same. The
three selected features (i.e., IP TTL, TCP Window Size, and
the size of initial TCP SYN packet) are all numerical values,
and during preprocessing we only round up the TTL value
to the nearest higher power of two according to Lippmann et
al. [20] to remove the influence of monitoring probe location.
In the case of TLS, we treat each feature as categorical.
The TLS version is an identifier which numerical value has no
real meaning, and the semantic of order relation on numbers
does not hold. The cipher suites and supported groups are
both ordered lists of IDs where the ordering is relevant for
the identification as it represents the preference of the client.
Similarly, we take the lists of extensions IDs and their lengths
as ordered to keep their semantics and position in the original
packet. During the preprocessing, we encode each feature into
a binary vector using one-hot encoding. The encoding ensures
all features retain their information value and that the encoding
does not introduce any new relations as if the values would
be treated as numbers. The encoder is persistently stored and
used on both learning and testing datasets.
For the User-Agent method, we did not use any preprocess-
ing. For the specific domains method, the information from
SNI is exported as a binary vector and HTTP host (domain
name) as a string. Therefore, the SNI values were converted
to string as well during the preprocessing.
B. OS Identification
The TCP/IP and TLS methods use a Decision tree to classify
the flows with labels corresponding to a specific minor version
of the OS. To train the classifier, we use the methodology
proposed by Hus´
ak et al. [10] and further extended by Ma-
touˇ
sek et al. [21] for flow monitoring. We pair the TCP/IP
parameters and User-Agents directly as they are present in the
same flow. For TLS, we pair HTTP and HTTPS requests from
the same device. Finally, we split the dataset into a training
one consisting of the annotated flows from the first day of
collected traffic. The rest of the flows were used as testing
dataset. Specific domains and User-Agent method stayed the
same as presented in previous paper [3] and could serve as a
comparison of traffic evolution in time.
C. Ground Truth
The ground truth of our experiment is based on the data
obtained from combining the DHCP and RADIUS auditing
logs, specifically, from the hostname parameter of DHCP RE-
QUEST events. Using this parameter to establish the devices’
OS works well for Apple and Google operating systems. Those
use identifiable device name set by default, and the user is
usually unable to change it freely. On the other hand, the
prevailing desktop operating systems, Windows and Linux,
are not as easily discerned. The Windows hostname is readily
editable, and by default, it is derived from the user name during
OS installation. The Linux hostname represents a similar
case; it is easily editable, and its default value is set during
installation, but also may vary for different Linux distributions.
Despite the aforementioned shortcomings, the ground truth
gained from DHCP log’s hostname should prove sufficient for
our experiment. The university network that we collect the
dataset from, Eduroam, is a wireless network environment,
so mobile devices, whose operating systems’ are rather easily
determined, prevail over the desktop ones by a large margin.
VI. RE SU LTS
In this section, we present how the implemented OS iden-
tification methods performed on our dataset. We also include
basic statistics of the traffic based on the usage of different
operating systems and versions of TLS and SSL protocols.
A. OS Identification Coverage
The first important measure of identification method capa-
bilities is its ability to identify the operating system from avail-
able data regardless of the result accuracy. We have evaluated
the coverage from two points of view. The first one represents
the ratio of flow records containing all features needed for the
identification method. The second is an aggregation of flows
into connection sessions where a session represents all flows
since the device connected to the network until it disconnected.
If the device sends at least one flow with all required features
during the session, the device OS for the whole session can
be identified.
The coverage results are summarized in Figure 2. A gener-
ally working method proved to be the TCP/IP parameters as it
depends on network and transport layer information and over
71 % of the captured traffic used TCP, and only a marginally
low number of sessions did not establish any TCP connection.
Similarly, almost every device (97.1 %) sent at least one TLS
0 %
20 %
40 %
60 %
80 %
100 %
TCP/IP
Parameters
TLS
Handshake
Specific
Domains
User-Agent
Flow Coverage Session Coverage
Fig. 2. Coverage of operating system identification methods.
handshake message, and the TLS traffic was responsible for
more than half (50.86 %) of the flows. The amount of traffic
for the other methods is significantly lower. As of Specific
domains and User-Agent parsing, the number of flows is 5 %,
resp. 3 %. Even so, they were able to identify the OS for
87 %, resp. 74 %, of the connections of the devices. This ratio
indicates their usability in large networks as they can filter out
a large amount of traffic before the identification and still keep
high coverage.
B. OS Identification Accuracy
We measure the identification accuracy using standard per-
formance metrics of accuracy, precision, recall, and F-score.
To deal with the multi-class classification of our methods,
we calculated the confusion matrix with true positive (TP),
false positive (FP), true negative (TN), and false negative (FN)
values for each of the lclasses (OS names), treating each
class as separate binary classifier. We assigned classification
prediction as TP if the ground OS name of the flow matched
the OS name of prediction regardless of the version as ground
truth does not cover this level of detail. From those values, we
calculated average accuracy and micro averaging of precision,
recall and F-score (β= 1) according to Sokolova et al. [22].
Detailed results of our experiments are listed in Table III,
and for visual comparison also on Figure 3. At first, we
discuss the results of repeated measurements of unchanged
methods (i.e., User-Agent and Specific domains) followed by
a description of the new or upgraded ones. The User-Agent
method produced the best and most consistent results. It is
based directly on the OS filled-in by the device itself and the
50 %
60 %
70 %
80 %
90 %
100 %
TCP/IP
Parameters
TLS
Handshake
Specific
Domains
User-Agent
Accuracy Precision Recall F-score
Fig. 3. Micro averaging of accuracy metrics.
TABLE III
MICRO AVERAGING OF ACCURACY METRICS
Method Accuracy Precision Recall F-score
TCP/IP parameters 0.9711 0.9137 0.9130 0.9133
TLS handshake 0.9312 0.8048 0.7749 0.7896
Specific domains 0.8659 0.5978 0.5974 0.5976
User-Agent 0.9764 0.9797 0.8763 0.9251
number of devices (intentionally) sending wrong information
in this field is very low, which is consistent with our previous
results. Also, Specific domains method provides consistent
results but with much lower F-score.
Our method based on TCP/IP parameters change was
twofold. The error in flow exporter which was present in
our previous work was fixed and it significantly increased the
coverage and provided correct data for prediction. The second
change was the shift from statistical analysis to machine
learning which improved the accuracy metrics notably. The
precision and recall over 90 % and almost complete coverage
makes TCP/IP method widely usable. Its main drawback is in
distinguishing versions of the same OS core used. Examining
the predictions, we found out that the primary source of false
positives and false negatives are the OS names iOS and MAC
OS which the classifier has troubles to distinguish as Apple
uses too similar parameters.
The new method based on TLS handshake parameters
generally proved very good results with the accuracy metrics
around 80 %. Surprisingly, this method was not able to identify
a single flow from Windows Phone and classified every one
of them as different versions of desktop Windows.
C. Traffic Statistics
Operating systems identified in the dataset are depicted on
Figure 4. Google Android takes the first place with 39.93 %
followed by Apple iOS and MAC OS with 30.03% and
Microsoft Windows with 28.49%. The remaining operating
systems (1.55 %) represent several Linux distributions and
minor mobile devices vendors (e.g., BlackBerry).
Google
Apple
Microsoft
Linux
Other
0 % 10 % 20 % 30 % 40 % 50 %
39.93 %
30.03 %
28.49 %
1.48 %
0.07 %
Fig. 4. Operating system usage share grouped by vendor.
Figure 5 shows the use of TLS versions among the clients.
Recent TLS versions 1.2 and 1.3 dominate the network traffic.
SSL was used only in 1887 flows with 292 of them in version
2.0; the rest was SSL 3.0.
TLS 1.3
TLS 1.2
TLS 1.1
TLS 1.0
SSL
0 % 10 % 20 % 30 % 40 % 50 % 60 %
48.72 %
50.13 %
0.06 %
1.07 %
0.02 %
Fig. 5. TLS and SSL protocol versions used by clients.
VII. CONCLUSION
In this paper, we proposed a method of passive identification
of the operating system based on flow monitoring data that
leverages information from TLS handshake. To support the
idea of OS identification in encrypted traffic, we enhanced OS
identification from TCP/IP parameters by exploiting machine
learning algorithms. Finally, we repeated our experiment with
OS identification using Specific domains and HTTP User-
Agent for comparison of the new methods to established ones.
Our results prove that the OS identification from encrypted
traffic is possible, and the used methods exhibited high ac-
curacy metrics. The method based on TCP/IP parameters is
comparable to unencrypted User-Agent identification with F-
score 91.33 % (compared to 92.51 % of User-Agent method).
The method based on TLS handshake parameters performed
a bit worse with accuracy metrics around 80 %, however,
excelled in the coverage. It was able to identify more than
97 % of the devices connected to the network, which is
significantly better portion than the Specific domains or User-
Agent methods could achieve.
Concerning lessons learned from the experiments, we would
argue that methods for OS identification are mature enough
and work in dynamic networks with the majority of traffic en-
crypted. However, data acquisition is becoming more complex.
The source flow data need to be enhanced with information
from applications protocols which are continuously evolving
and changing the specifications of the data fields from previous
versions. This evolution requires the flow exported to be
continuously updated as new protocols and protocol versions
are created. Also, the correlation of data from multiple data
sources required a lot of manual work and the use of heuristics
to correctly match log records to corresponding flows. In our
future work, we plan to focus on the automation of data re-
trieval from the infrastructure elements and their normalization
for (near) real-time flow annotation. We will also keep our
close cooperation with Flowmon Networks to apply results of
this research in their monitoring solution.
ACK NOW LE DG EM EN T
This research was partly supported by the CONCORDIA
project that has received funding from the European
Union’s Horizon 2020 research and innovation programme
under grant agreement No 830927 and partly by the
ERDF project “CyberSecurity, CyberCrime and Critical
Information Infrastructures Center of Excellence” (No.
CZ.02.1.01/0.0/0.0/16 019/0000822). Martin Laˇ
stoviˇ
cka is
Brno Ph.D. Talent Scholarship Holder – Funded by the Brno
City Municipality.
REFERENCES
[1] E. Rescorla, “The Transport Layer Security (TLS) Protocol Version 1.3,”
RFC 8446.
[2] R. Hofstede, P. ˇ
Celeda, B. Trammell, I. Drago, R. Sadre, A. Sperotto,
and A. Pras, “Flow Monitoring Explained: From Packet Capture to Data
Analysis With NetFlow and IPFIX,” IEEE Communications Surveys
Tutorials, 2014.
[3] M. Lastovicka, T. Jirsik, P. Celeda, S. Spacek, and D. Filakovsky, “Pas-
sive OS Fingerprinting Methods in the Jungle of Wireless Networks,”
in NOMS 2018-2018 IEEE/IFIP Network Operations and Management
Symposium. IEEE, 2018, pp. 1–9.
[4] C. Shen, C. Liu, H. Tan, Z. Wang, D. Xu, and X. Su, “Hybrid-augmented
device fingerprinting for intrusion detection in industrial control system
networks,” IEEE Wireless Communications, vol. 25, no. 6, pp. 26–31,
2018.
[5] I. Sanchez-Rola, I. Santos, and D. Balzarotti, “Clock Around the Clock:
Time-Based Device Fingerprinting,” in Proceedings of the 2018 ACM
SIGSAC Conference on Computer and Communications Security, ser.
CCS ’18. ACM, 2018, pp. 1502–1514.
[6] P. Velan, M. ˇ
Cerm´
ak, P. ˇ
Celeda, and M. Draˇ
sar, “A Survey of Methods
for Encrypted Traffic Classification and Analysis,” Netw., vol. 25, no. 5.
[7] B. Anderson, S. Paul, and D. McGrew, “Deciphering malwares use of
TLS (without decryption),” Journal of Computer Virology and Hacking
Techniques, vol. 14, no. 3, pp. 195–211, 2018.
[8] W. M. Shbair, T. Cholez, A. Goichot, and I. Chrisment, “Efficiently
bypassing SNI-based HTTPS filtering,” in 2015 IFIP/IEEE International
Symposium on Integrated Network Management (IM). IEEE, 2015, pp.
990–995.
[9] M. Korczy´
nski and A. Duda, “Markov chain fingerprinting to classify
encrypted traffic,” in IEEE INFOCOM 2014-IEEE Conference on Com-
puter Communications. IEEE, 2014, pp. 781–789.
[10] M. Hus´
ak, M. ˇ
Cerm´
ak, T. Jirs´
ık, and P. ˇ
Celeda, “HTTPS traffic anal-
ysis and client identification using passive SSL/TLS fingerprinting,”
EURASIP Journal on Information Security, 2016.
[11] Flowmon Networks. Flowmon Probe. [Online]. Available: https:
//www.flowmon.com/en/products/flowmon/probe
[12] P. Velan and R. Krejˇ
c´
ı, “Flow Information Storage Assessment Using
IPFIXcol,” in Dependable Networks and Services, ser. Lecture Notes in
Computer Science, vol. 7279. Springer, 2012, pp. 155–158.
[13] S. Blake-Wilson, N. Bolyard, V. Gupta, C. Hawk, and B. Moeller,
“Elliptic Curve Cryptography (ECC) Cipher Suites for Transport Layer
Security (TLS),” RFC 4492.
[14] D. K. Gillmor, “Negotiated Finite Field Diffie-Hellman Ephemeral
Parameters for Transport Layer Security (TLS),” RFC 7919.
[15] Internet Assigned Numbers Authority. Transport Layer Security (TLS)
Extensions. [Online]. Available: https://www.iana.org/assignments/
tls-extensiontype-values/tls-extensiontype-values.xhtml
[16] D. Benjamin, “Applying GREASE to TLS Extensibility,” Internet Engi-
neering Task Force, Tech. Rep., 2019.
[17] D. Eastlake, “Transport Layer Security (TLS) Extensions: Extension
Definitions,” RFC 6066.
[18] M. Laˇ
stoviˇ
cka, S. ˇ
Spaˇ
cek, P. Velan, and P. ˇ
Celeda, “Dataset Using TLS
Fingerprints for OS Identification in Encrypted Traffic,” 2019.
[19] M. Laˇ
stoviˇ
cka, A. Dufka, and J. Kom´
arkov´
a, “Machine learning fin-
gerprinting methods in cyber security domain: Which one to use?” in
2018 14th International Wireless Communications & Mobile Computing
Conference (IWCMC). IEEE, 2018, pp. 542–547.
[20] R. Lippmann, D. Fried, K. Piwowarski, and W. Streilein, “Passive op-
erating system identification from TCP/IP packet headers,” in Workshop
on Data Mining for Computer Security, 2003, p. 40.
[21] P. Matouˇ
sek, O. Ryˇ
sav´
y, M. Gr´
egr, and M. Vyml´
atil, “Towards Identifi-
cation of Operating Systems from the Internet Traffic: IPFIX Monitoring
with Fingerprinting and Clustering,” in 2014 5th International Confer-
ence on Data Communication Networking (DCNET), 2014.
[22] M. Sokolova and G. Lapalme, “A systematic analysis of performance
measures for classification tasks,” Information Processing & Manage-
ment, vol. 45, no. 4, pp. 427–437, 2009.