http://excel.fit.vutbr.cz
JA3cury - A new approach to TLS fingerprinting by merging fingerprinting methods
Lukáš Hejcman1, Ing. Karel Hynek2, Ing. Tomáš Čejka Ph.D.3
Abstract
TLS is the most popular encryption protocol used on the internet today. It aims to provide high
levels of security and privacy for inter-device communication. However, it presents a challenge
from a network monitoring and administration standpoint, as it is not possible to analyse the
communication encrypted with TLS at a large scale with existing methods based on deep packet
inspection. Analysing encrypted communication can help administrators to detect malicious activity
on their networks, and can help them identify potential security threats. In this paper, we present
a method that allows us to leverage the advantages of two TLS fingerprinting methods, JA3 and
Cisco Mercury, to determine the operating system and processes of clients on multiple networks.
Our method is able to achieve comparable or better results than the existing Mercury approach
for our datasets whilst providing more analysis opportunities than JA3. Furthermore, by using JA3
fingerprints, we open the door to the utilisation of this approach in the wider industry, where JA3
fingerprinting is predominant.
Keywords: TLS – Fingerprint – Cisco – Mercury – JA3 – Identification – JA3cury
Supplementary Material: N/A
1 xhejcm01@stud.fit.vutbr.cz, Faculty of Information Technology, Brno University of Technology
2 hynekkar@fit.cvut.cz, Faculty of Information Technology, Czech University of Technology
3 cejkat@cesnet.cz, CESNET, z.s.p.o.
1. Introduction
Network traffic analysis is the process of capturing and analysing network traffic to increase the network’s performance and security. Whilst many methods exist to accomplish this, it is becoming more important to utilize automatic techniques to filter the content on the network into specific categories due to the constantly increasing amount of traffic passing through networks.
However, network traffic analysis is becoming more challenging due to the rise in the use of encrypted traffic. According to a report by Google LLC[1], the amount of encrypted network traffic using the TLS protocol has been steadily increasing since at least 2014 to the current 95% of all traffic on the internet. Encryption is generally perceived as beneficial towards the privacy and security of the communication between endpoints on the internet; however, it renders the traditional network analysis approach based on packet content inspection ineffective. Also, a recent report published by ENISA¹ recognizes encrypted traffic as a possible serious security threat due to hidden malicious activities[2] that current monitoring tools cannot easily detect. Therefore, it is essential to focus current research activities on encrypted traffic analysis and the retrieval of information about the connections and communicating systems.
The traditional approach to encrypted traffic analysis is its decryption (e.g., using a network proxy). However, this is very computationally expensive and is therefore not feasible for high-throughput networks at a large scale. Furthermore, decrypting user communication can be seen as a security transgression and a privacy concern. Traffic decryption is usually deployed in highly restricted environments, such as company networks.
¹ European Union Agency for Cybersecurity
Figure 1. TLS Handshake overview
One of the methods that sufficiently preserves the privacy and security benefits of encryption is TLS fingerprinting. It works by gathering information about the client from the unencrypted portion of the TLS communication: the Client Hello packet. This packet outlines the parameters of the TLS communication supported by the client. Because different combinations of applications and operating systems support particular subsets of TLS communication parameters, we are able to create a database of applications and their TLS parameters. The database then allows us to infer the application name or the operating system type behind an encrypted connection without decrypting it.
This information about the user application and operating systems is beneficial for network administrators and security experts. It allows the detection of network policy violations, malware presence, or just insecure and outdated versions of operating systems or applications.
One of the most common fingerprinting approaches is the JA3 fingerprint introduced by Salesforce[3]. JA3 is the de-facto industry standard, supported in high-performance monitoring tools (such as Flowmon ADS[4]) and IDSs (such as Suricata[5]). Even though JA3 is very popular, it lacks a well-maintained and curated public fingerprint database. The existing open-source databases usually lack information, are unmaintained, and sometimes do not even follow the standard JA3 fingerprint format.
Anderson et al.[6] from Cisco research presented a more accurate fingerprinting approach called Mercury[7]. The Mercury fingerprint approach allows a much more accurate client taxonomy due to the vastly larger amount of information stored in the database. Even though Mercury includes a well-maintained public database and is in active development, its adoption across the industry is poor. One of the reasons might be the long plain-text fingerprint, which puts even more strain on network analysis tools.
Therefore, we propose a new approach to TLS fingerprinting called JA3cury, which combines the JA3 fingerprint with other context information from the Mercury database. Surprisingly, even though it uses a smaller fingerprint, the JA3cury matching algorithm achieved higher accuracy than the original database based on Mercury fingerprints in our testing environment. Furthermore, we are able to bring this higher accuracy to systems that currently use the JA3 approach without redesigning or reengineering these systems.
Our paper is organized as follows: section 2 briefly summarizes the TLS protocol and the concept of TLS fingerprinting, section 3 describes the JA3 and Mercury fingerprinting methods, section 4 introduces the JA3cury approach, section 5 provides comparisons of results obtained with JA3cury and Mercury, and finally section 6 concludes the paper.
2. TLS Fingerprinting
Transport Layer Security (TLS) is a cryptographic protocol designed to facilitate secure and encrypted communication between two parties. It is based on the now deprecated Secure Sockets Layer (SSL) protocol. TLS is the de-facto standard for secure communication on the internet, as it is the protocol used by HTTP Secure (HTTPS).
Before two parties can communicate through a TLS-secured connection, they must first agree on the parameters of the connection, such as the ciphers supported by both sides. This negotiation happens during the “handshake” phase of the communication. An overview of the handshake can be seen in figure 1.
The Client Hello and Server Hello messages are always sent without encryption, because the parameters of the communication haven’t yet been agreed upon. This gives us the opportunity to intercept these messages and analyse their contents.
Both fingerprinting methods discussed in this paper (JA3 and Mercury) work by selecting a subset of data from the Client Hello packet of the TLS communication, compiling it into some format, and comparing the result with a database of collected and annotated fingerprints.
3. Existing solutions
JA3
The JA3 fingerprinting method works by extracting information from only the following five fields of the Client Hello packet:
- TLS version
- Supported cipher suites
- TLS extension headers
- Elliptic curves
- Elliptic curve point formats
The fingerprint is then generated by concatenating these fields in their decimal representation into a string separated by commas, with multiple values within one field separated by dashes, to generate the following:
Version,Ciphers,Extensions,EC,ECPF
The TLS standard does not require all these fields to be present in the Client Hello packet[8]. If any field is missing, it is replaced with an empty string in the fingerprint representation. The fingerprint is then hashed with MD5.
For example, the following string is a valid JA3 fingerprint in decimal format:
769,4-5-47-51-50-10-22-19-9-21-18-3-8-20-17-255,,,
This string would then be hashed with MD5 to generate the string
b677934e592ece9e09805bf36cd68d8a
which would then be used as the primary key in the fingerprint database.
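The construction described above can be sketched in a few lines of Python. This is a minimal illustration rather than the reference JA3 implementation: the function names `ja3_string` and `ja3_hash` are hypothetical, the field values are assumed to be already parsed from the Client Hello, and the sketch does not filter GREASE values, which the JA3 specification ignores.

```python
import hashlib

def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Join the five Client Hello fields into a JA3 string.

    Fields are separated by commas, values inside a field by dashes.
    A missing field (empty list) becomes an empty string.
    """
    fields = [ciphers, extensions, curves, point_formats]
    parts = [str(version)] + ["-".join(str(v) for v in f) for f in fields]
    return ",".join(parts)

def ja3_hash(ja3):
    """MD5 digest of the JA3 string, as 32 lowercase hex characters."""
    return hashlib.md5(ja3.encode("ascii")).hexdigest()

# The example fingerprint from the text: only version and ciphers are present.
example = ja3_string(
    769,
    [4, 5, 47, 51, 50, 10, 22, 19, 9, 21, 18, 3, 8, 20, 17, 255],
    [], [], [],
)
print(example)
print(ja3_hash(example))
```

The three trailing commas in the printed string come from the three empty fields (extensions, elliptic curves, and point formats).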
JA3 databases usually contain three fields: the JA3 string, the JA3 hash, and the description. However, the description of the fingerprint, which is then used for identification, is not standardized; this results in many fingerprints containing information about the application in a format which is completely unsuitable for further classification and analysis. For example, the string
BurpSuite Free (Tested: 1.7.03 on Windows 10),eclipse,JavaApplicationStub,idea
contains information about the application, the operating system, and some further processes without conforming to a specified format, which makes such descriptions impractical to parse in large quantities. This results in databases where the fingerprint classification must be taken at face value and no further analysis is possible.
Fingerprint
  Total count
  Timestamps
  Processes
    Process 1
      Name
      Total count
      Application Category
      SHA256
      SHA256s
        ...
      Autonomous Systems
        AS 1
        ...
      Hostname Domains
        Domain 1
        ...
      Application Port
        https
        ...
      OS
        (F)(N)(V)
          Count
        ...
    ...
...
Figure 2. The structure of an entry in the Mercury database.
Mercury
The Mercury fingerprint format was developed at Cisco by David McGrew and Blake Anderson. The fingerprint itself contains much more information than a JA3 fingerprint, as it is essentially a string representation of important fields in the Client Hello packet in hexadecimal format.
The overall format of the fingerprint is the following:
(version)(cipher suites)((extensions)...)
Here, (version) is the hex representation of the advertised TLS version, (cipher suites) is a list of hex values of the cipher suites offered by the client, and ((extensions)...) contains the hex representation of the extensions and their values (where applicable).
Furthermore, compared to JA3, the Cisco Mercury database contains much more information, and is formatted so that it encourages further analysis of the results. This format is visualized in figure 2. The database was created by a novel approach of fusing network endpoint data and captured fingerprints[9], so it contains detailed process and contextual information.
The Mercury approach prefers generating a knowledge base about a network based on a few clients, which is then used for detection[10]. However, the Mercury GitHub repository also includes a well-maintained open-source database which can be used for identification without the need to generate custom knowledge bases.
The entries in the Mercury database are much more complex than those in JA3 databases. They don’t contain a simple 1:1 mapping of fingerprint to process, but instead contain many possible processes for each fingerprint, including the number of times each was encountered when building the knowledge base. This number can then be used in further classification. In the latest version of the Mercury database, which we used for our experiments, the largest number of distinct processes mapped to a single fingerprint was 9.
4. JA3cury
To leverage both the widespread usage of JA3 throughout the industry and the better classification and analysis opportunities presented by the Mercury fingerprinting approach and database, we have devised an approach we call JA3cury. With this approach, we are able to search for JA3 fingerprints in the Mercury database by converting existing Mercury fingerprints to their corresponding JA3 representation.
This conversion works by extracting the relevant information from the Mercury fingerprint, formatting it as a JA3 fingerprint, and hashing it using the MD5 hash function. The result is a fingerprint database based on the Cisco Mercury database that can be indexed using JA3 fingerprints.
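A minimal sketch of such a conversion is given below. It is not the authors' implementation, and it rests on several assumptions that are not stated in the text: that each extension group starts with its 2-byte type, that the supported_groups (000a) and ec_point_formats (000b) groups carry their data in TLS wire layout (type, length, list length, values), and that GREASE values should be dropped to match JA3 tools. The function names and the sample fingerprints are hypothetical.

```python
import hashlib

def split_groups(text):
    """Split '(a)(b(c))' into its top-level parenthesised groups: ['a', 'b(c)']."""
    groups, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                groups.append(text[start:i])
    return groups

def hex_words(data, width):
    """Decode a hex string into integers, `width` hex digits per value."""
    return [int(data[i:i + width], 16) for i in range(0, len(data), width)]

def is_grease(value):
    """GREASE values have the form 0x?a?a with both bytes equal (assumed filter)."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)

def mercury_to_ja3(fingerprint):
    """Return (JA3 string, JA3 MD5 hash) for a Mercury-style fingerprint."""
    version, ciphers, extensions = split_groups(fingerprint)
    ext_types, curves, point_formats = [], [], []
    for ext in split_groups(extensions):
        ext_type = int(ext[:4], 16)
        ext_types.append(ext_type)
        if ext_type == 0x000A:    # skip type, length, 2-byte list length
            curves = hex_words(ext[12:], 4)
        elif ext_type == 0x000B:  # skip type, length, 1-byte list length
            point_formats = hex_words(ext[10:], 2)
    fields = [
        [c for c in hex_words(ciphers, 4) if not is_grease(c)],
        [e for e in ext_types if not is_grease(e)],
        [c for c in curves if not is_grease(c)],
        point_formats,
    ]
    ja3 = ",".join([str(int(version, 16))] + ["-".join(map(str, f)) for f in fields])
    return ja3, hashlib.md5(ja3.encode("ascii")).hexdigest()

# Two hypothetical fingerprints that differ only in the data of extension 0010.
common = "(0303)(0a0ac02bc02f)((0000)(000a00080006001d00170018)(000b00020100)"
sample_a = common + "(0010000e000c02683208687474702f312e31))"
sample_b = common + "(0010000b000908687474702f312e31))"
print(mercury_to_ja3(sample_a)[0])
print(mercury_to_ja3(sample_a) == mercury_to_ja3(sample_b))
```

Because only the extension type is kept for every extension other than 000a and 000b, the two samples produce the same JA3 string. This is the kind of information loss that makes the conversion destructive.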
An example of this conversion can be seen in figure 3. As shown in the figure, the conversion from Mercury to JA3 is destructive; some information is lost during the conversion. This means that distinct Mercury fingerprints can map to the same JA3 fingerprint, so the Mercury database, which previously contained only unique fingerprint entries, now contains duplicate entries for some fingerprints. Out of the 9,060 entries in the database we used, 1,914 unique fingerprints were lost; this is a 21.1% decrease in fingerprint count.
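The text does not specify how entries that collapse onto the same JA3 hash are combined. One plausible option, shown here purely as an assumption, is to merge them by summing the per-process counts. The hash labels and process names below are hypothetical; only the figures 1,914 and 9,060 come from the text.

```python
from collections import Counter, defaultdict

def merge_by_ja3(entries):
    """Merge (ja3_hash, {process: count}) pairs that share the same hash.

    Assumption: counts of the same process are simply added together.
    """
    merged = defaultdict(Counter)
    for ja3_hash, process_counts in entries:
        merged[ja3_hash].update(process_counts)
    return dict(merged)

entries = [
    ("hash_a", {"firefox": 40, "curl": 2}),
    ("hash_a", {"firefox": 10, "thunderbird": 5}),  # collides with the entry above
    ("hash_b", {"chrome": 30}),
]
merged = merge_by_ja3(entries)
lost = len(entries) - len(merged)
print(merged["hash_a"].most_common())
print(f"{lost} of {len(entries)} entries merged away")

# The reduction reported in the text: 1,914 of 9,060 entries.
print(f"{1914 / 9060:.1%}")
```

The last line reproduces the 21.1% figure from the entry counts given above.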
Initially, we were worried this would lead to a decrease in the accuracy of client and process identification compared to the original database; however, as we will show in section 5, the accuracy in our experiments remained the same or even increased in some scenarios.
This decrease in the number of unique fingerprints also influenced how our classification algorithms approach fingerprint collisions. Kotzias et al. found that around 7.3% of JA3 fingerprints cause collisions with each other[11]. However, since our approach of converting fingerprints itself introduces collisions into the database, our classification algorithms were designed to deal with them, and they didn’t cause a perceptible decrease in detection accuracy.
Furthermore, it is important to note that the JA3cury approach aggregates the Client Hello classifications over some time period, rather than identifying and classifying a single Client Hello. This has led to better classification results overall, as gathering the information over many handshakes increases the operating system detection accuracy, as well as the overall process detection accuracy, due to the system’s ability to overcome statistical anomalies.
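The idea of aggregating over many handshakes can be illustrated with the following sketch. The scoring rule is an assumption made for illustration (each Client Hello distributes one unit of score across its candidate processes in proportion to their database counts); it is not one of the classification algorithms evaluated in this paper. The database contents and the function name `classify_client` are hypothetical.

```python
from collections import Counter

def classify_client(observed_hashes, database, top_n=5):
    """Accumulate per-process scores over all Client Hellos seen for one client."""
    scores = Counter()
    for ja3_hash in observed_hashes:
        process_counts = database.get(ja3_hash)
        if not process_counts:
            continue  # unknown fingerprint: contributes nothing
        total = sum(process_counts.values())
        for process, count in process_counts.items():
            scores[process] += count / total
    return scores.most_common(top_n)

database = {
    "h1": {"firefox": 90, "curl": 10},
    "h2": {"firefox": 50, "thunderbird": 50},
}
# Four handshakes observed for one client within the time window.
ranking = classify_client(["h1", "h1", "h2", "unseen"], database)
print([process for process, _ in ranking])
```

A single ambiguous handshake such as "h2" cannot separate its two candidates, but the accumulated scores over the whole window rank "firefox" clearly first.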
5. Detection
Thanks to the complexity of the Mercury/JA3cury database, we were able to try many different fingerprinting approaches with varying degrees of accuracy. Overall, we created 7 algorithms for traffic classification. Each algorithm took into account different combinations of information from the database, as well as some contextual information about the Client Hello packet, such as the destination port and the server domain name.
During detection, we compared three sets of results:
- As a baseline measurement, we used an unmodified Mercury classification created with the official pmercury utility.
- Our classification algorithms run on the unmodified version of the database using Mercury fingerprints.
- Our classification algorithms run on the JA3cury database.
Figure 3. Converting a Mercury fingerprint to JA3
Figure 4. Process classification results.
Our datasets contain over 48,000 Client Hello packets collected across 6 home networks made up of personal computers, laptops, home servers, and phones. All the major operating systems (Windows, Mac, Linux, Android, iOS) were represented in our experiments.
Some of our classification algorithms even exposed problems with the current version of the Mercury database; it was created on corporate Cisco networks, and thus skews heavily towards enterprise applications, such as Cisco Webex, and towards very specific operating systems, mainly Mac OS. However, we discovered that JA3cury is largely able to overcome this because it merges different fingerprints from the original Mercury database, which tends to average out the discrepancies.
Process and Category Detection
Each process in the database is classified into categories such as productivity, security, or gaming. This means that process and category detection are closely connected. However, we discovered it is possible to obtain a high accuracy of category detection even with relatively low process detection accuracy. This is because there are far fewer categories than processes, so erroneous process classifications tend to average out within a category, which leads to higher detection scores.
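The effect can be seen in a small worked example. The process names, their scores, and the process-to-category mapping below are all hypothetical, and summing process scores per category is an assumed aggregation rule rather than the exact one used in our algorithms.

```python
from collections import Counter

def category_scores(process_scores, process_category):
    """Sum process scores per category; unmapped processes count as 'unknown'."""
    totals = Counter()
    for process, score in process_scores.items():
        totals[process_category.get(process, "unknown")] += score
    return totals

# No single process is a confident match...
process_scores = {"word": 0.30, "excel": 0.25, "outlook": 0.20, "steam": 0.25}
process_category = {
    "word": "productivity",
    "excel": "productivity",
    "outlook": "productivity",
    "steam": "gaming",
}
totals = category_scores(process_scores, process_category)
# ...but the category is, because three candidates share it.
print(totals.most_common(1)[0][0])
```

Even if the true process is "excel" and the top-ranked process is wrong, the category "productivity" is still identified correctly.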
The average results for process classification of the top 5 processes for each client can be seen in figure 4.
Our JA3cury method was able to outperform the baseline results generated by the official pmercury utility for all our classification algorithms. Furthermore, our modified JA3cury database outperformed the original Mercury database in all but one experiment.
Furthermore, the category detection using our algorithms was also successful (see figure 5). Again, our classification using JA3cury was more successful overall than either the original pmercury detection or the detection using our algorithms and the original Mercury database.
Figure 5. Category classification results.
The difference in scoring processes and categories between the default Mercury and our JA3cury approaches is well illustrated in figure 6. This graph shows the scores of different processes on the Y axis, with each process being represented by a bar on the X axis. Each process is scored with both Mercury and JA3cury. JA3cury identified more processes and categories, and it attributed a higher score to correct processes compared to Mercury.
Operating System Detection
The information about operating system classification in the Mercury database is dependent on the process classification, as the operating system information is nested inside each process (see figure 2). Furthermore, the database unfortunately does not contain information about mobile operating systems; instead, they tend to be classified as the desktop operating system with the most similar kernel architecture: MacOS for iOS devices, and Linux for Android devices.
The database contains operating system information split into three parts: the family (Linux, MacOS, Windows), the name (Windows 10 Professional, Linux 4.19, ...), and the build version (10.5.6.7, ...). For our experiment, we decided to classify the operating system using a tree structure with a depth of 4 (client, family, name, build version), where the operating system frequency trickles down into the leaf nodes. An example of this tree can be seen in figure 8.
330
Furthermore, the tree is sorted such that each par-
331
ent has its children ordered from the most frequent to
332
the least frequent. This allows us to find the most prob-
333
able operating system by taking the leftmost nodes. In
334
this case, the classification would result in
WinNT - 335
Windows 10 Enterprise - 10.0.18363.336
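The tree-based selection can be sketched as follows, using the counts from the example client in figure 8. The data layout (nested dictionaries) and the function names are assumptions made for this illustration. Instead of sorting the children and taking the leftmost one, the sketch picks the child with the highest count at each level, which yields the same path.

```python
def build_tree(observations):
    """Build a frequency tree from ((family, name, version), count) pairs."""
    tree = {"count": 0, "children": {}}
    for path, count in observations:
        node = tree
        node["count"] += count
        for label in path:
            node = node["children"].setdefault(label, {"count": 0, "children": {}})
            node["count"] += count  # counts accumulate on every node along the path
    return tree

def most_probable(tree):
    """Follow the most frequent child from the root down to a leaf."""
    path, node = [], tree
    while node["children"]:
        label, node = max(node["children"].items(), key=lambda kv: kv[1]["count"])
        path.append(label)
    return path

# Leaf counts of the example client in figure 8.
observations = [
    (("WinNT", "Windows 10 Enterprise", "10.0.18363"), 400),
    (("WinNT", "Windows 10 Enterprise", "10.0.18362"), 300),
    (("WinNT", "Windows 10 Enterprise", "10.0.17763"), 90),
    (("WinNT", "Windows 10 Professional", "10.0.17134"), 10),
    (("Mac OS X", "Catalina", "10.15.3"), 70),
    (("Mac OS X", "Catalina", "10.15.2"), 30),
    (("Mac OS X", "Unknown", "Unknown"), 100),
]
tree = build_tree(observations)
print(" - ".join(most_probable(tree)))
```

The root count is 1000 and the WinNT subtree holds 800 of those observations, so the selected path is the one given in the text.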
The operating system classification was performed on all clients in each network. The comparison of results for detection of the operating system family using our classifiers can be seen in figure 7. The figure contains only results created with our classification algorithms, because pmercury doesn’t return information about the operating system.
Figure 6. Detailed look at JA3cury and Mercury classification scores for one client.
Figure 7. Operating system classification results.
JA3cury was able to detect the operating system of a client more accurately overall. However, the operating system detection was less reliable on our datasets compared to process detection due to the prevalence of Linux machines in our datasets, whereas the database contains many more entries for Windows and Mac OS than it does for Linux. Furthermore, the fact that the database doesn’t contain mobile operating systems leads to a lower accuracy as well.
6. Conclusion
In conclusion, we have developed an alternative TLS fingerprinting approach based on the strengths of JA3 and Mercury fingerprinting. This approach allows us to utilize the more content-rich Mercury database in a setting where we would otherwise have to rely on the results of JA3 without any further analysis. Furthermore, our approach is compatible with most existing fingerprinting modules thanks to the wide adoption of JA3, and can be used with existing modules and infrastructure without the need to re-engineer or redesign these systems. Because our approach takes into account Client Hello packets over a time period, and doesn’t classify each packet separately, it is able to overcome the disadvantage of missing information in the JA3 fingerprint compared to the full Mercury fingerprint.
In our experiments, our method proved to be at least as accurate as default Mercury fingerprinting. When used outside of corporate networks, it tends to be more accurate. Furthermore, the process category was detected correctly for all major categories.
The major areas of further development could include increasing the accuracy of operating system detection and the addition of mobile operating systems into the Mercury database.
Acknowledgments
We would like to thank Blake Anderson and David McGrew for their comments and feedback.
Example Client (1000)
  WinNT (800)
    Windows 10 Enterprise (790)
      10.0.18363 (400)
      10.0.18362 (300)
      10.0.17763 (90)
    Windows 10 Professional (10)
      10.0.17134 (10)
  Mac OS X (200)
    Catalina (100)
      10.15.3 (70)
      10.15.2 (30)
    Unknown (100)
      Unknown (100)
Figure 8. OS Classification Tree
References
[1] Google Transparency Report. HTTPS encryption on the web – Google Transparency Report, 2021.
[2] ENISA. Encrypted Traffic Analysis, April 2020.
[3] John Althouse, Jeff Atkinson, and Josh Atkins. salesforce/ja3, June 2017. original-date: 2017-06-13T22:54:10Z.
[4] Flowmon. Encrypted Traffic Analysis.
[5] Suricata. 6.17. JA3 Keywords - Suricata 6.0.1 documentation, 2019.
[6] Blake Anderson and David McGrew. Accurate TLS Fingerprinting using Destination Context and Knowledge Bases. arXiv:2009.01939 [cs], September 2020. arXiv: 2009.01939.
[7] Blake Anderson, David McGrew, Brandon Enright, Lucas Messenger, Adam Weller, Andrew Chi, and Shekhar Acharya. cisco/mercury, August 2021. original-date: 2019-08-30T21:58:25Z.
[8] Eric Rescorla. RFC 8446 - The Transport Layer Security (TLS) Protocol Version 1.3. Technical report, August 2018.
[9] Blake Anderson, David McGrew, and Keith Schomburg. The generation and use of TLS fingerprints, Jan 2019.
[10] Blake Anderson and David McGrew. Video correspondence in regards to Cisco Cognitive Intelligence and CESNET collaboration, Mar 2021.
[11] Platon Kotzias, Abbas Razaghpanah, Johanna Amann, Kenneth G. Paterson, Narseo Vallina-Rodriguez, and Juan Caballero. Coming of age: A longitudinal study of TLS deployment. In Proceedings of the Internet Measurement Conference 2018, IMC ’18, pages 415–428. Association for Computing Machinery, Oct 2018.