The Generation and Use of TLS Fingerprints
Blake Anderson, PhD; David McGrew, PhD; Keith Schomburg
Cisco
Reducing the Visibility Gap
TLS Fingerprinting Overview
•TLS parameters offered in the ClientHello can provide library/process attribution [1-6]
•Applications
  •Network forensics
  •Malware detection [2]
  •Identifying obsolete/vulnerable software
  •OS fingerprinting [3]
•Advantages
  •No endpoint agent required
  •Completely passive
Fingerprinting Goals
•Efficacy: Maximize discerning power by including all informative data features
•Flexibility: Enable approximate matching where needed
•Compatibility: Accommodate missing data and new protocol features
•Reversibility: Fingerprint format is interpretable and forensically sound
•Performance: Fast and compact extraction and matching
Network and Endpoint Data Fusion
•Problem: Current fingerprint databases are slow to update and lack real-world, contextual data.
•Solution: Continuously and automatically fuse network and endpoint data.
[Diagram labels: Network Data, Endpoint Data, Long-Term Storage]
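One way to picture the fusion step is as a join between network observations and endpoint observations of the same flow. The sketch below is an illustration only: the record layout, the `flow` key (a 5-tuple), and the `fuse` function are assumptions made for this example, not the pipeline described in the talk.

```python
from collections import Counter, defaultdict


def fuse(network_records, endpoint_records):
    """Join network and endpoint records on a shared flow key (hypothetical layout).

    network_records:  dicts with "flow" and "fingerprint"
    endpoint_records: dicts with "flow" and "process"
    Returns a mapping: fingerprint -> Counter of process names seen with it.
    """
    process_by_flow = {rec["flow"]: rec["process"] for rec in endpoint_records}
    fused = defaultdict(Counter)
    for rec in network_records:
        process = process_by_flow.get(rec["flow"])
        if process is not None:  # skip flows with no endpoint ground truth
            fused[rec["fingerprint"]][process] += 1
    return fused
```

Counting processes per fingerprint, rather than storing a single label, keeps the contextual information when several applications share one TLS library.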
TLS Feature Extraction and Pre-Processing
•Cipher Suites
  •Generalize GREASE cipher suites: 0x0a0a,...,0xfafa -> GREASE
•Extensions
  •Generalize GREASE extension types/data
    •0x0a0a,...,0xfafa -> GREASE
  •Remove session-specific extension data
    •server_name, padding, session_ticket
Pipeline: Identify Protocol -> Parse Packet -> Extract Data -> Normalize Data
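The normalization step (GREASE generalization and removal of session-specific extension data) can be sketched as follows. This is a minimal illustration, not the Joy implementation: the function names, the `(type, data)` tuple layout, and the hex-string output are assumptions. The extension type codes 0, 21, and 35 are the IANA values for server_name, padding, and session_ticket. Generalizing GREASE values that appear inside extension data (for example in supported_groups) is not shown.

```python
GREASE = "GREASE"

# IANA extension type codes for the session-specific extensions.
SESSION_SPECIFIC_EXTENSIONS = {
    0: "server_name",
    21: "padding",
    35: "session_ticket",
}


def is_grease(value: int) -> bool:
    """True for the 16 GREASE code points 0x0a0a, 0x1a1a, ..., 0xfafa."""
    high, low = value >> 8, value & 0xFF
    return 0 <= value <= 0xFFFF and high == low and (low & 0x0F) == 0x0A


def normalize_cipher_suites(suites):
    """Map each cipher suite code to 'GREASE' or a 4-digit hex string."""
    return [GREASE if is_grease(s) else f"{s:04x}" for s in suites]


def normalize_extensions(extensions):
    """extensions: list of (type_code, data_bytes) in ClientHello order."""
    normalized = []
    for ext_type, data in extensions:
        if is_grease(ext_type):
            normalized.append((GREASE, ""))            # generalize type, drop data
        elif ext_type in SESSION_SPECIFIC_EXTENSIONS:
            normalized.append((f"{ext_type:04x}", ""))  # keep type, drop data
        else:
            normalized.append((f"{ext_type:04x}", data.hex()))
    return normalized
```

Order is preserved in both functions, since the ordering of cipher suites and extensions is itself an informative feature.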
Comparison with Previous Work
Work | Database Size | Automatically Updated | GREASE Support | Static Extension Data
Our Work | ~1,500 | Yes | Yes | supported_groups, ec_point_formats, status_request, signature_algorithms, application_layer_protocol_negotiation, supported_versions, psk_key_exchange_modes
Kotzias et al. [4] | ~1,684 | No | Discards Locality | supported_groups, ec_point_formats
JA3 [5] | 158 | No | Discards All Data | supported_groups, ec_point_formats
FingerprinTLS [6] | 409 | No | No | supported_groups, ec_point_formats, signature_algorithms
TLS Fingerprint Database Schema
Metadata TLS Information Attribution
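The three schema sections can be pictured as one nested record per fingerprint. The sketch below is hypothetical: only the section names (Metadata, TLS Information, Attribution) come from the slide, and every field name inside them is an assumption chosen for illustration, not the layout of the published database.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Metadata:
    session_count: int = 0          # hypothetical: sessions observed with this fingerprint


@dataclass
class TLSInformation:
    cipher_suites: List[str]        # normalized cipher suites, e.g. "GREASE", "1301"
    extensions: List[str]           # normalized extension types and static data


@dataclass
class Attribution:
    process_name: str               # hypothetical: name of the observed process
    process_hash: str               # hypothetical: hash of the process binary


@dataclass
class FingerprintRecord:
    metadata: Metadata
    tls_information: TLSInformation
    attribution: List[Attribution] = field(default_factory=list)
```

Keeping attribution as a list reflects that one fingerprint may map to several processes.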
General Stats
•Generated from 30M+ real-world TLS sessions
•1,567 fingerprints
•454 unique cipher suite vectors
•1,092 unique cipher suite + extension type vectors
•12,644 unique process hashes
•2,411 unique process names
Operating System Representation
Application Representation
Similarity Matrix
[Figure labels: Firefox, Chrome, OpenSSL, Schannel, Secure Transport, Cisco Collab, Python, Java]
Approximate TLS Fingerprinting
•String alignment over TLS features
[Figure columns: True Label, Inferred Label, Alignment]
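String alignment over TLS features can be illustrated with a unit-cost edit distance over feature tokens, with a traceback to recover the aligned sequences. The talk does not state which alignment algorithm or scoring was used, so the Levenshtein-style scoring here is an assumption, and `align` is a name chosen for this sketch.

```python
def align(a, b, gap="-"):
    """Align two token sequences; return (distance, aligned_a, aligned_b).

    Uses unit costs for substitution, insertion, and deletion (an assumption).
    """
    n, m = len(a), len(b)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitute = dist[i - 1][j - 1] + (a[i - 1] != b[j - 1])
            dist[i][j] = min(substitute, dist[i - 1][j] + 1, dist[i][j - 1] + 1)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            out_a.append(a[i - 1])
            out_b.append(gap)
            i -= 1
        else:
            out_a.append(gap)
            out_b.append(b[j - 1])
            j -= 1
    return dist[n][m], out_a[::-1], out_b[::-1]
```

Aligning `["GREASE", "1301", "1302", "c02b"]` against `["1301", "1302", "1303", "c02b"]` gives a distance of 2: one token present only in the first list and one present only in the second.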
Fingerprint Matching Overview
[Flow diagram]
Data Plane: Identify TLS -> Extract FP Data -> Find Match (FP Database) -> True: Report Match; False: Find Approximate Match -> Report Match
Control Plane: Update Database with Approximate Match
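The matching flow (exact lookup first, approximate match as a fallback, then a database update) can be sketched as below. This is an illustration under assumptions: the database is modeled as a dict from fingerprint tuples to labels, similarity uses Python's `difflib.SequenceMatcher` as a stand-in for the alignment used in the talk, and the 0.8 threshold is arbitrary.

```python
import difflib


def similarity(fp_a, fp_b):
    """Similarity in [0, 1] between two fingerprint token sequences."""
    return difflib.SequenceMatcher(None, fp_a, fp_b, autojunk=False).ratio()


def match(fp, database, threshold=0.8):
    """Return (label, kind), kind being 'exact', 'approximate', or 'unknown'."""
    if fp in database:                       # Find Match: True
        return database[fp], "exact"
    best_label, best_score = None, 0.0       # Find Match: False -> approximate
    for known_fp, label in database.items():
        score = similarity(fp, known_fp)
        if score > best_score:
            best_label, best_score = label, score
    if best_label is not None and best_score >= threshold:
        return best_label, "approximate"
    return None, "unknown"


def process(fp, database, threshold=0.8):
    """Match a fingerprint and record approximate matches in the database."""
    label, kind = match(fp, database, threshold)
    if kind == "approximate":
        database[fp] = label                 # Update Database with Approximate Match
    return label, kind
```

After an approximate match is written back, the next session with the same fingerprint takes the exact-match path, so the slower search runs once per new fingerprint.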
Performance (Unoptimized Python)
Fingerprint Prevalence
TLS Fingerprint Visibility
TLS Session Visibility
Implementation
•The fingerprint database and relevant code have been open-sourced:
  •https://github.com/cisco/joy
•Joy
  •Packet parsing and fingerprint extraction
•Python Scripts
  •Exact and approximate matching
  •Generation of custom fingerprint database from Joy output
Next Steps
•More data!
  •iOS, Android, and Linux
•Incorporate other fingerprint databases
•Time window analysis
References
[1] https://github.com/cisco/joy
[2] Blake Anderson, Subharthi Paul, David McGrew; Deciphering Malware's Use of TLS (without Decryption); arXiv, 2016; Journal of Computer Virology and Hacking Techniques, 2017.
[3] Blake Anderson, David McGrew; OS Fingerprinting: New Techniques and a Study of Information Gain and Obfuscation; IEEE CNS, 2017; https://arxiv.org/abs/1706.08003
[4] Platon Kotzias, Abbas Razaghpanah, Johanna Amann, Kenneth G. Paterson, Narseo Vallina-Rodriguez, Juan Caballero; Coming of Age: A Longitudinal Study of TLS Deployment; IMC, 2018.
[5] John B. Althouse, Jeff Atkinson, Josh Atkins; JA3 - A Method for Profiling SSL/TLS Clients
[6] Lee Brotherston; FingerprinTLS
Thank You