http://excel.fit.vutbr.cz

JA3cury - A new approach to TLS fingerprinting by merging fingerprinting methods

Lukáš Hejcman¹, Ing. Karel Hynek², Ing. Tomáš Čejka Ph.D.³

Abstract

TLS is the most popular encryption protocol used on the internet today. It aims to provide high levels of security and privacy for inter-device communication. However, it presents a challenge from a network monitoring and administration standpoint, as existing methods based on deep packet inspection cannot analyse TLS-encrypted communication at a large scale. Analysing encrypted communication can help administrators detect malicious activity on their networks and identify potential security threats. In this paper, we present a method that leverages the advantages of two TLS fingerprinting methods, JA3 and Cisco Mercury, to determine the operating system and processes of clients on multiple networks. On our datasets, our method achieves comparable or better results than the existing Mercury approach, whilst providing more analysis opportunities than JA3. Furthermore, by using JA3 fingerprints, we open the door to the utilisation of this approach in the wider industry, where JA3 fingerprinting is predominant.

Keywords: TLS – Fingerprint – Cisco – Mercury – JA3 – Identification – JA3cury

Supplementary Material: N/A

¹ xhejcm01@stud.fit.vutbr.cz, Faculty of Information Technology, Brno University of Technology

² hynekkar@fit.cvut.cz, Faculty of Information Technology, Czech Technical University in Prague

³ cejkat@cesnet.cz, CESNET, z.s.p.o.

1. Introduction

Network traffic analysis is the process of capturing and analysing network traffic to increase the network's performance and security. Whilst many methods exist to accomplish this, the constantly increasing amount of traffic passing through networks makes it more and more important to use automatic techniques that filter the content on the network into specific categories.

However, network traffic analysis is becoming more challenging due to the rise in the use of encrypted traffic. According to a report by Google LLC [1], the amount of network traffic encrypted with the TLS protocol has been steadily increasing since at least 2014, reaching the current 95% of all traffic on the internet. Encryption is generally perceived as beneficial for the privacy and security of the communication between endpoints on the internet; however, it renders the traditional network analysis approach based on packet content inspection useless. Also, the recent report published by ENISA¹ recognizes encrypted traffic as a possible serious security threat due to hidden malicious activities [2] that current monitoring tools cannot easily detect. Therefore, it is essential to focus current research activities on encrypted traffic analysis and on the retrieval of information about the connections and the communicating systems.

The traditional approach to encrypted traffic analysis is decryption (e.g., using a network proxy). However, decryption is computationally very expensive and is therefore not feasible for high-throughput networks at a large scale. Furthermore, decrypting user communication can be seen as a security transgression and a privacy concern. Traffic decryption is usually deployed only in highly restricted environments, such as company networks.

¹ European Union Agency for Cybersecurity

Figure 1. TLS Handshake overview

One of the methods that sufficiently preserves the privacy and security benefits of encryption is TLS fingerprinting. It works by gathering information about the client from the unencrypted portion of the TLS communication: the Client Hello packet. This packet outlines the parameters of the TLS communication supported by the client. Because different combinations of applications and operating systems support particular subsets of TLS communication parameters, we are able to create a database of applications and their TLS parameters. The database then allows us to classify encrypted connections and infer the application name or the operating system type.

This information about user applications and operating systems is beneficial for network administrators and security experts. It allows the detection of network policy violations, the presence of malware, or simply insecure and outdated versions of operating systems or applications.

One of the most common fingerprinting approaches is the JA3 fingerprint introduced by Salesforce [3]. JA3 is the de facto industry standard, supported in high-performance monitoring tools (such as Flowmon ADS [4]) and IDSs (such as Suricata [5]). Even though JA3 is very popular, it lacks a well-maintained and curated public fingerprint database. The existing open-source databases usually lack information, are unmaintained, and sometimes do not even follow the standard JA3 fingerprint format.

Anderson et al. [6] from Cisco research presented a more accurate fingerprinting approach called Mercury [7]. The Mercury approach allows a much more accurate client taxonomy due to the vastly larger amount of information stored in its database. Even though Mercury provides a well-maintained public database and is in active development, its adoption across the industry is poor. One of the reasons might be its long plain-text fingerprint, which puts additional strain on network analysis tools.

Therefore, we propose a new approach to TLS fingerprinting called JA3cury, which combines the JA3 fingerprint with additional context information from the Mercury database. Surprisingly, even though it uses a smaller fingerprint, the JA3cury matching algorithm achieved higher accuracy than the original database based on Mercury fingerprints in our testing environment. Furthermore, this improved accuracy can be brought to systems that currently use the JA3 approach without redesigning or re-engineering them.

Our paper is organized as follows: Section 2 briefly summarizes the TLS protocol and the concept of TLS fingerprinting. Section 3 describes the JA3 and Mercury fingerprinting methods, Section 4 introduces the JA3cury approach, Section 5 compares the results obtained with JA3cury and Mercury, and finally, Section 6 concludes the paper.

2. TLS Fingerprinting

Transport Layer Security (TLS) is a cryptographic protocol designed to facilitate secure and encrypted communication between two parties. It is based on the now deprecated Secure Sockets Layer (SSL) protocol. TLS is the de facto standard for secure communication on the internet, as it is the protocol used by HTTP Secure (HTTPS).

Before two parties can communicate through a TLS-secured connection, they must first agree on the parameters of the connection, such as the cipher suites supported by both sides. This negotiation happens during the “handshake” phase of the communication. An overview of the handshake can be seen in Figure 1.

The Client Hello and Server Hello messages are always sent without encryption, because the parameters of the communication have not yet been agreed upon. This gives us the opportunity to intercept these messages and analyse their contents.

Both fingerprinting methods discussed in this paper (JA3 and Mercury) work by selecting a subset of data from the Client Hello packet of the TLS communication, compiling it into a fingerprint string, and comparing it with a database of collected and annotated fingerprints.

3. Existing solutions

JA3

The JA3 fingerprinting method extracts information from only the following five fields of the Client Hello packet:

• TLS version
• Supported cipher suites
• TLS extension headers
• Elliptic curves
• Elliptic curve point formats

The fingerprint is then generated by concatenating the decimal values of these fields into a string, with the fields separated by commas and the values within each field separated by dashes:

Version,Ciphers,Extensions,EC,ECPF

The TLS standard does not require all of these fields to be present in the Client Hello packet [8]. If a field is missing, it is replaced with an empty string in the fingerprint representation. The resulting string is then hashed with MD5.

For example, the following string is a valid JA3 fingerprint in decimal format (769 is 0x0301, i.e., TLS 1.0, and the last three fields are empty because this Client Hello carries no extensions):

769,4-5-47-51-50-10-22-19-9-21-18-3-8-20-17-255,,,

This string would then be hashed with MD5 to generate the string

b677934e592ece9e09805bf36cd68d8a

which would then be used as the primary key in the fingerprint database.
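As an illustration, the JA3 construction described above can be sketched in Python. This is a minimal sketch rather than the Salesforce reference implementation: the function names ja3_string and ja3_hash are hypothetical, and the GREASE filtering mirrors the reference JA3 tool, which ignores GREASE values, rather than anything described in this paper.

```python
import hashlib

# GREASE values (RFC 8701) are 0x0a0a, 0x1a1a, ..., 0xfafa; the reference JA3
# tool ignores them so that randomised GREASE values do not change the hash.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Join the five Client Hello fields: fields by commas, values by dashes."""
    def field(values):
        # A missing field (empty list) becomes an empty string.
        return "-".join(str(v) for v in values if v not in GREASE)

    return ",".join([str(version), field(ciphers), field(extensions),
                     field(curves), field(point_formats)])


def ja3_hash(ja3):
    """The JA3 fingerprint proper: the MD5 hex digest of the JA3 string."""
    return hashlib.md5(ja3.encode("ascii")).hexdigest()


# The example Client Hello from the text: TLS 1.0 (769), 16 cipher suites,
# no extensions, so the last three fields are empty.
example = ja3_string(769, [4, 5, 47, 51, 50, 10, 22, 19, 9, 21, 18, 3, 8, 20, 17, 255],
                     [], [], [])
print(example)            # 769,4-5-47-51-50-10-22-19-9-21-18-3-8-20-17-255,,,
print(ja3_hash(example))  # 32-character MD5 hex digest used as the database key
```

Only the hash is stored as the database key, which is what keeps JA3 cheap for high-throughput monitoring tools.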

JA3 databases usually contain three fields: the JA3 string, the JA3 hash, and a description. However, the format of the description, which is then used for identification, is not standardized; as a result, many fingerprints contain information about the application in a format that is completely unsuitable for further classification and analysis. For example, the string

BurpSuite Free (Tested: 1.7.03 on Windows 10), eclipse,JavaApplicationStub,idea

contains information about the application, the operating system, and some further processes without conforming to a specified format, which makes it impossible to parse automatically in large quantities. The result is a database in which the fingerprint classification must be taken at face value and no further analysis is possible.

Figure 2. The structure of an entry in the Mercury database. (Labels in the figure: Fingerprint; Total count; Timestamps; Processes, where each process (Process 1, ...) has a Name, Total count, Application Category, SHA256, SHA256s, Autonomous Systems (AS 1, ...), Hostname Domains (Domain 1, ...), Application Port (https, ...), and (F)(N)(V) entries with a Count.)

Mercury

The Mercury fingerprint format was developed at Cisco by David McGrew and Blake Anderson. The fingerprint itself contains much more information than JA3, as it is essentially a string representation of the important fields of the Client Hello packet in hexadecimal format.

The overall format of the fingerprint is the following:

(version)(cipher suites)((extensions)...)

where (version) is the hex representation of the advertised TLS version, (cipher suites) is a list of hex values of the cipher suites offered by the client, and ((extensions)...) contains the hex representation of the extensions and their values (where applicable).

Furthermore, compared to JA3, the Cisco Mercury database contains much more information and is formatted so that it encourages further analysis of the results. This format is visualized in Figure 2. The database was created by a novel approach of fusing network endpoint data with captured fingerprints [9], so it contains detailed process and contextual information.
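To make the structure in Figure 2 concrete, a Mercury database entry can be modelled roughly as follows. The class and field names are hypothetical: they follow the labels in Figure 2 rather than the actual keys of the Mercury JSON resources, and reading (F)(N)(V) as operating system family, name, and version is our interpretation, consistent with the operating system description in Section 5.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessInfo:
    """One candidate process observed with a fingerprint (hypothetical layout)."""
    name: str
    count: int      # how often this process was seen with the fingerprint
    category: str   # application category, e.g. "productivity"
    sha256s: dict = field(default_factory=dict)             # binary hash -> count
    autonomous_systems: dict = field(default_factory=dict)  # AS number -> count
    hostname_domains: dict = field(default_factory=dict)    # domain -> count
    ports: dict = field(default_factory=dict)               # application port, e.g. "https" -> count
    os_counts: dict = field(default_factory=dict)           # (family, name, version) -> count


@dataclass
class FingerprintEntry:
    """One fingerprint together with every process it was observed with."""
    fingerprint: str
    total_count: int
    processes: list = field(default_factory=list)

    def process_frequencies(self):
        """Relative frequency of each candidate process for this fingerprint."""
        seen = sum(p.count for p in self.processes)
        return {p.name: p.count / seen for p in self.processes} if seen else {}


entry = FingerprintEntry(
    fingerprint="(0303)(c02bc02f)((0000)(000a00080006001d00170018))",  # illustrative only
    total_count=10,
    processes=[ProcessInfo("firefox.exe", 8, "browser"),
               ProcessInfo("thunderbird.exe", 2, "email")],
)
print(entry.process_frequencies())  # {'firefox.exe': 0.8, 'thunderbird.exe': 0.2}
```

Unlike a JA3 record, one entry here yields a distribution over processes rather than a single free-text label, which is what the classification algorithms in Section 5 build on.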

Mercury's approach favours generating a knowledge base about a network from a few clients, which is then used for detection [10]. However, the Mercury GitHub repository also includes a well-maintained open-source database which can be used for identification without the need to generate custom knowledge bases.

The fields of the Mercury database are much more complex than those of JA3 databases. Instead of a simple 1:1 mapping of fingerprint to process, each fingerprint lists many possible processes, including the number of times each process was encountered when building the knowledge base. This number can then be used in further classification. In the latest version of the Mercury database, which we used for our experiments, the largest number of distinct processes mapped to a single fingerprint was 9.

4. JA3cury

To leverage both the widespread usage of JA3 throughout the industry and the better classification and analysis opportunities presented by the Mercury fingerprinting approach and database, we have devised an approach we call JA3cury. With this approach, we are able to search for JA3 fingerprints in the Mercury database by converting existing Mercury fingerprints to their corresponding JA3 representation.

This conversion works by extracting the relevant information from the Mercury fingerprint, formatting it as a JA3 fingerprint, and hashing it using the MD5 hash function. The result is a fingerprint database based on the Cisco Mercury database that can be indexed using JA3 fingerprints.
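The conversion can be sketched as follows. This is an illustration, not the JA3cury code, and it rests on assumptions about the Mercury string: it starts directly with the version group (no protocol prefix), extensions appear in wire order as a hex type optionally followed by length and data, and supported_groups (0x000a) and ec_point_formats (0x000b) keep their data. The helper names and the sample fingerprint are made up for the example.

```python
import hashlib

GREASE = {0x0A0A + 0x1010 * i for i in range(16)}  # dropped, as in the JA3 reference tool


def groups(s):
    """Split '(a)(b)((c)(d))' into its top-level group contents: ['a', 'b', '(c)(d)']."""
    out, depth, start = [], 0, 0
    for i, ch in enumerate(s):
        if ch == "(":
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                out.append(s[start:i])
    return out


def u16s(hexstr):
    """Hex string -> list of 16-bit integers (4 hex digits each)."""
    return [int(hexstr[i:i + 4], 16) for i in range(0, len(hexstr), 4)]


def mercury_to_ja3(fp):
    version_hex, ciphers_hex, ext_block = groups(fp)[:3]
    ext_types, curves, point_formats = [], [], []
    for ext in groups(ext_block):
        etype = int(ext[:4], 16)
        ext_types.append(etype)
        if etype == 0x000A and len(ext) > 12:
            # type(2) | extension length(2) | list length(2) | named groups, 2 bytes each
            curves = u16s(ext[12:])
        elif etype == 0x000B and len(ext) > 10:
            # type(2) | extension length(2) | list length(1) | formats, 1 byte each
            point_formats = [int(ext[i:i + 2], 16) for i in range(10, len(ext), 2)]

    def field(values):
        return "-".join(str(v) for v in values if v not in GREASE)

    # Everything else in the Mercury string (other extension values) is discarded,
    # which is why the conversion is lossy.
    ja3 = ",".join([str(int(version_hex, 16)), field(u16s(ciphers_hex)),
                    field(ext_types), field(curves), field(point_formats)])
    return ja3, hashlib.md5(ja3.encode("ascii")).hexdigest()


fp = "(0303)(c02bc02f)((0000)(000a00080006001d00170018)(000b00020100))"
print(mercury_to_ja3(fp)[0])  # 771,49195-49199,0-10-11,29-23-24,0
```

If the Mercury format in use reorders or normalises extensions, the resulting JA3 string would differ from one computed on the original Client Hello, so the assumptions above would need to be checked against the actual database.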

An example of this conversion can be seen in Figure 3. As the figure shows, the conversion from Mercury to JA3 is destructive: some information is lost during the conversion. Consequently, several distinct Mercury fingerprints can map to the same JA3 fingerprint, so the database, which previously contained only unique fingerprint entries, now contains duplicate entries for some fingerprints. Out of the 9,060 entries in the database we used, 1,914 unique fingerprints were lost (leaving 7,146 distinct JA3 fingerprints); this is a 21.1% decrease in fingerprint count.
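One way to picture the resulting duplicates is to group the converted entries by JA3 hash and merge their process counts. The sketch assumes, hypothetically, that colliding entries are merged by summing their per-process counts; the paper does not specify the exact merging rule, and the hash keys are placeholders. The last line reproduces the 21.1% figure from the counts quoted above.

```python
from collections import Counter, defaultdict


def merge_by_ja3(entries):
    """entries: (ja3_hash, {process: count}) pairs, one per original Mercury fingerprint.

    Entries whose Mercury fingerprints collapse to the same JA3 hash are merged
    by summing their process counts (an assumption made for this sketch).
    """
    merged = defaultdict(Counter)
    for ja3_hash, process_counts in entries:
        merged[ja3_hash].update(process_counts)
    return dict(merged)


entries = [
    ("hash-a", {"firefox": 5}),
    ("hash-a", {"firefox": 2, "curl": 1}),  # different Mercury fingerprint, same JA3
    ("hash-b", {"chrome": 4}),
]
merged = merge_by_ja3(entries)
print(merged["hash-a"])            # Counter({'firefox': 7, 'curl': 1})
print(len(entries) - len(merged))  # 1 fingerprint lost to the conversion

# The reduction reported for the database used in this paper:
print(f"{1914 / 9060:.1%}")        # 21.1%
```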

Initially, we were worried that this would lead to a decrease in the accuracy of client and process identification compared to the original database; however, as we show in Section 5, the accuracy in our experiments remained the same or even increased in certain scenarios.

This decrease in the number of unique fingerprints also shaped how our classification algorithms handle fingerprint collisions. Kotzias et al. found that around 7.3% of JA3 fingerprints collide with each other [11]. However, since our conversion itself introduces collisions into the database, our classification algorithms were designed to deal with them, and the collisions did not cause a perceptible decrease in detection accuracy.

Furthermore, it is important to note that the JA3cury approach aggregates Client Hello classifications over a time period, rather than identifying and classifying a single Client Hello. This has led to better classification results overall: gathering information over many handshakes increases the operating system detection accuracy, as well as the overall process detection accuracy, because the system is able to overcome statistical anomalies.
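The aggregation over a time period can be illustrated with a minimal sketch. For illustration only, it assumes that each Client Hello yields a score per candidate process and that a client's result is the normalised sum of these scores over the window; this is not the scoring used by the seven algorithms in Section 5, and aggregate_client is a hypothetical name.

```python
from collections import Counter


def aggregate_client(handshake_scores):
    """Combine per-Client-Hello process scores observed for one client in a time window.

    handshake_scores: list of {process_name: score} dicts, one per Client Hello.
    Returns {process_name: share of the total score}, highest first.
    """
    total = Counter()
    for scores in handshake_scores:
        total.update(scores)  # Counter.update adds values (floats included)
    norm = sum(total.values())
    if norm == 0:
        return {}
    return {proc: score / norm for proc, score in total.most_common()}


window = [
    {"firefox": 0.6, "thunderbird": 0.4},
    {"firefox": 0.9, "curl": 0.1},  # a single handshake with a different candidate
    {"firefox": 0.5, "thunderbird": 0.5},
]
result = aggregate_client(window)
print(next(iter(result)))  # firefox
```

Summing over the window means that an isolated ambiguous or misclassified handshake shifts the client's result only slightly, which is the statistical smoothing described above.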

5. Detection

Thanks to the richness of the Mercury/JA3cury database, we were able to try many different classification approaches with varying degrees of accuracy. Overall, we created seven algorithms for traffic classification. Each algorithm took into account different combinations of information from the database, as well as some contextual information about the Client Hello packet, such as the destination port, server domain name, etc.

During detection, we compared three sets of results:

• As a baseline measurement, we used an unmodified Mercury classification created with the official pmercury utility.
• Our classification algorithms performed using the unmodified version of the database with Mercury fingerprints.
• Our classification algorithms performed using the JA3cury database.

Figure 3. Converting a Mercury fingerprint to JA3

Figure 4. Process classification results.

Our datasets contain over 48,000 Client Hello packets collected from six home networks made up of personal computers, laptops, home servers, and phones. All the major operating systems (Windows, Mac, Linux, Android, iOS) were represented in our experiments.

Some of our classification algorithms even exposed problems with the current version of the Mercury database: it was created on corporate Cisco networks, and thus skews heavily towards enterprise applications, such as Cisco Webex, and towards very specific operating systems, mainly Mac OS. However, we discovered that JA3cury is largely able to overcome this bias, because merging different fingerprints from the original Mercury database into a single JA3 entry tends to average out the discrepancies.

Process and Category Detection

Each process in the database is assigned an application category, such as productivity, security, or gaming. This means that process and category detection are closely connected. However, we discovered that it is possible to obtain high category detection accuracy even with relatively low process detection accuracy. Because there are vastly fewer categories than processes, erroneous process classifications tend to average out: a misclassified process often belongs to the same category as the correct one, which leads to higher category detection scores.
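The averaging effect described above can be sketched by summing process scores per application category. The process-to-category mapping, the category labels, and the scores below are hypothetical; they only show how a client whose top process is ambiguous can still receive a clear category.

```python
from collections import defaultdict


def category_scores(process_scores, process_category):
    """Sum process scores into their application categories, highest first."""
    totals = defaultdict(float)
    for process, score in process_scores.items():
        totals[process_category.get(process, "unknown")] += score
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical scores: no single process clearly stands out...
scores = {"chrome": 0.3, "firefox": 0.25, "edge": 0.25, "zoom": 0.2}
categories = {"chrome": "browser", "firefox": "browser", "edge": "browser",
              "zoom": "communication"}
# ...but most of the score falls into one category.
result = category_scores(scores, categories)
print(next(iter(result)))  # browser
```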

The average results for process classification of the top 5 processes for each client can be seen in Figure 4.

Our JA3cury method was able to outperform the baseline results generated by the official pmercury utility for all our classification algorithms. Furthermore, our modified JA3cury database outperformed the original Mercury database in all but one experiment.

Category detection using our algorithms was also successful (see Figure 5). Again, our classification using JA3cury was more successful overall than either the original pmercury detection or even the detection using our algorithms with the original Mercury database.

Figure 5. Category classification results.

The difference in scoring processes and categories between the default Mercury and our JA3cury approaches is well illustrated in Figure 6. This graph shows the scores of different processes on the Y axis, with each process represented by a bar on the X axis. Each process is scored with both Mercury and JA3cury. JA3cury identified more processes and categories, and it attributed higher scores to the correct processes than Mercury did.

Operating System Detection

Operating system classification in the Mercury database depends on process classification, as the operating system information is nested inside each process (see Figure 2). Furthermore, the database unfortunately does not contain information about mobile operating systems; instead, mobile devices tend to be classified as the desktop operating system with the most similar kernel architecture: Mac OS for iOS devices and Linux for Android devices.

The database contains operating system information split into three parts: the family (Linux, MacOS, Windows), the name (Windows 10 Professional, Linux 4.19, ...), and the build version (10.5.6.7, ...). For our experiment, we decided to classify the operating system using a tree of depth 4 (the client as the root, followed by the family, name, and version levels), where the operating system frequency trickles down into the leaf nodes, so that each node's count equals the sum of its children's counts. An example of this tree can be seen in Figure 8.

330

Furthermore, the tree is sorted such that each par-

331

ent has its children ordered from the most frequent to

332

the least frequent. This allows us to find the most prob-

333

able operating system by taking the leftmost nodes. In

334

this case, the classification would result in

WinNT - 335

Windows 10 Enterprise - 10.0.18363.336
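The tree classification can be sketched as below, using the counts of the example client in Figure 8. The sketch assumes the per-client operating system counts have already been aggregated into (family, name, version) paths, with shorter paths allowed when the version is unknown; we read the Unknown (100) node as a child of Mac OS X, which makes the Mac OS X count add up to 200. Choosing the most frequent child at each level is equivalent to taking the leftmost child of a frequency-sorted tree. The function names are hypothetical.

```python
def build_os_tree(os_counts):
    """os_counts: {(family, name, version, ...): count}.

    Returns nested {label: (count, children)} where each node's count is the
    sum of the counts of all paths passing through it.
    """
    tree = {}
    for path, count in os_counts.items():
        level = tree
        for label in path:
            node_count, children = level.get(label, (0, {}))
            level[label] = (node_count + count, children)
            level = children
    return tree


def most_probable_os(tree):
    """Follow the most frequent child at every level down to a leaf."""
    path, level = [], tree
    while level:
        label, (_, children) = max(level.items(), key=lambda item: item[1][0])
        path.append(label)
        level = children
    return " - ".join(path)


# Counts from the example client in Figure 8 (1000 observations in total).
example_client = {
    ("WinNT", "Windows 10 Enterprise", "10.0.18363"): 400,
    ("WinNT", "Windows 10 Enterprise", "10.0.18362"): 300,
    ("WinNT", "Windows 10 Enterprise", "10.0.17763"): 90,
    ("WinNT", "Windows 10 Professional", "10.0.17134"): 10,
    ("Mac OS X", "Catalina", "10.15.3"): 70,
    ("Mac OS X", "Catalina", "10.15.2"): 30,
    ("Mac OS X", "Unknown"): 100,  # no version level below this node
}
tree = build_os_tree(example_client)
print(tree["WinNT"][0], tree["Mac OS X"][0])  # 800 200
print(most_probable_os(tree))                  # WinNT - Windows 10 Enterprise - 10.0.18363
```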

The operating system classification was performed on all clients in each network. The comparison of results for detection of the operating system family using our classifiers can be seen in Figure 7. The figure contains only results obtained with our classification algorithms, because pmercury does not return information about the operating system.

Figure 6. Detailed look at JA3cury and Mercury classification scores for one client.

Figure 7. Operating system classification results.

JA3cury was able to detect the operating system of a client more accurately overall. However, on our datasets, operating system detection was less reliable than process detection due to the prevalence of Linux machines in our datasets, whereas the database contains many more entries for Windows and Mac OS than for Linux. Furthermore, the absence of mobile operating systems from the database lowers the accuracy as well.

6. Conclusion

In conclusion, we have developed an alternative TLS fingerprinting approach based on the strengths of JA3 and Mercury fingerprinting. This approach allows us to utilize the more content-rich Mercury database in settings where we would otherwise have to rely on JA3 results without any further analysis. Furthermore, thanks to the wide adoption of JA3, our approach is compatible with most existing fingerprinting modules and can be used with existing modules and infrastructure without the need to re-engineer or redesign these systems. Because our approach takes into account Client Hello packets over a time period instead of classifying each packet separately, it is able to overcome the disadvantage of the information missing from the JA3 fingerprint compared to the full Mercury fingerprint.

In our experiments, our method was at least as accurate as default Mercury fingerprinting, and when used outside of corporate networks, it tended to be more accurate. In addition, the process category was detected correctly for all major categories.

Future development could focus on increasing the accuracy of operating system detection and on adding mobile operating systems to the Mercury database.

Acknowledgments

We would like to thank Blake Anderson and David McGrew for their comments and feedback.

References

[1] Google Transparency Report. HTTPS encryption on the web – Google Transparency Report, 2021.

[2] ENISA. Encrypted Traffic Analysis, April 2020.

[3] John Althouse, Jeff Atkinson, and Josh Atkins. salesforce/ja3, June 2017. original-date: 2017-06-13T22:54:10Z.

Example Client (1000)
  WinNT (800)
    Windows 10 Enterprise (790)
      10.0.18363 (400)
      10.0.18362 (300)
      10.0.17763 (90)
    Windows 10 Professional (10)
      10.0.17134 (10)
  Mac OS X (200)
    Catalina (100)
      10.15.3 (70)
      10.15.2 (30)
    Unknown (100)

Figure 8. OS Classification Tree

[4] Flowmon. Encrypted Traffic Analysis.

[5] Suricata. 6.17. JA3 Keywords — Suricata 6.0.1 documentation, 2019.

[6] Blake Anderson and David McGrew. Accurate TLS Fingerprinting using Destination Context and Knowledge Bases. arXiv:2009.01939 [cs], September 2020.

[7] Blake Anderson, David McGrew, Brandon Enright, Lucas Messenger, Adam Weller, Andrew Chi, and Shekhar Acharya. cisco/mercury, August 2021. original-date: 2019-08-30T21:58:25Z.

[8] Eric Rescorla. RFC 8446 - The Transport Layer Security (TLS) Protocol Version 1.3. Technical report, August 2018.

[9] Blake Anderson, David McGrew, and Keith Schomburg. The generation and use of TLS fingerprints, January 2019.

[10] Blake Anderson and David McGrew. Video correspondence regarding Cisco Cognitive Intelligence and CESNET collaboration, March 2021.

[11] Platon Kotzias, Abbas Razaghpanah, Johanna Amann, Kenneth G. Paterson, Narseo Vallina-Rodriguez, and Juan Caballero. Coming of age: A longitudinal study of TLS deployment. In Proceedings of the Internet Measurement Conference 2018, IMC '18, pages 415–428. Association for Computing Machinery, October 2018.