DRAWNAPART: A Device Identification Technique
based on Remote GPU Fingerprinting
Tomer Laor*
Ben-Gurion Univ. of the Negev
tomerlao@post.bgu.ac.il
Naif Mehanna*
Univ. Lille, CNRS, Inria
naif.mehanna@univ-lille.fr
Antonin Durey
Univ. Lille, CNRS, Inria
antonin.durey@univ-lille.fr
Vitaly Dyadyuk
Ben-Gurion Univ. of the Negev
vitalyd@post.bgu.ac.il
Pierre Laperdrix
Univ. Lille, CNRS, Inria
pierre.laperdrix@univ-lille.fr
Clémentine Maurice
Univ. Lille, CNRS, Inria
clementine.maurice@inria.fr
Yossi Oren
Ben-Gurion Univ. of the Negev
yos@bgu.ac.il
Romain Rouvoy
Univ. Lille, CNRS, Inria / IUF
romain.rouvoy@univ-lille.fr
Walter Rudametkin
Univ. Lille, CNRS, Inria
walter.rudametkin@univ-lille.fr
Yuval Yarom
Univ. of Adelaide
yval@cs.adelaide.edu.au
Abstract—Browser fingerprinting aims to identify users or
their devices, through scripts that execute in the users’ browser
and collect information on software or hardware characteristics.
It is used to track users or as an additional means of iden-
tification to improve security. Fingerprinting techniques have
one significant limitation: they are unable to track individual
users for an extended duration. This happens because browser
fingerprints evolve over time, and these evolutions ultimately
cause a fingerprint to be confused with those from other devices
sharing similar hardware and software.
In this paper, we report on a new technique that can signif-
icantly extend the tracking time of fingerprint-based tracking
methods. Our technique, which we call DRAWNAPART, is a
new GPU fingerprinting technique that identifies a device from
the unique properties of its GPU stack. Specifically, we show
that variations in speed among the multiple execution units
that comprise a GPU can serve as a reliable and robust device
signature, which can be collected using unprivileged JavaScript.
We investigate the accuracy of DRAWNAPART under two sce-
narios. In the first scenario, our controlled experiments confirm
that the technique is effective in distinguishing devices with
similar hardware and software configurations, even when they
are considered identical by current state-of-the-art fingerprinting
algorithms. In the second scenario, we integrate a one-shot
learning version of our technique into a state-of-the-art browser
fingerprint tracking algorithm. We verify our technique through
a large-scale experiment involving data collected from over 2,500
crowd-sourced devices over a period of several months and show
it provides a boost of up to 67% to the median tracking duration,
compared to the state-of-the-art method.
DRAWNAPART makes two contributions to the state of the art in browser fingerprinting. On the conceptual front, it is the first work that explores the manufacturing differences between identical GPUs and the first to exploit these differences in a privacy context. On the practical front, it demonstrates a robust technique for distinguishing between machines with identical hardware and software configurations, a technique that delivers practical accuracy gains in a realistic setting.

*Both authors are considered co-first authors.
I. INTRODUCTION
Privacy is dignity. It is a human right. In the domain of
web browsing, the right to privacy should prevent websites
from tracking user browsing activity without consent. This is
the case in particular for cross-site tracking, in which website
owners collude to build browsing profiles spanning multiple
websites over extended periods of time. Unfortunately for
users, the right to privacy conflicts with business interests.
Website owners are highly interested in tracking users for the
purpose of showing them ads they are more likely to click on,
or to recommend products they are more likely to purchase.
We focus on the common scenario where identifying a
browser is equivalent to tracking a user. The traditional way
to track users is with cookies, small files that are stored by
the browser at the request of the website, and forwarded to
the website on demand [50]. Recent regulations restrict and
supervise the acquisition of private data by websites [4,31],
and in particular require that users consent to the use of
cookies. Furthermore, in an effort to protect users’ privacy and
curb tracking, modern browsers restrict cookie-based tracking,
especially third-party trackers that attempt to track users across
multiple unrelated websites.
To overcome the limitations of cookies, less scrupulous
websites often resort to an approach called browser fingerprint-
ing. To fingerprint a browser, the website provides a script that
queries the browser’s software and hardware configuration to
collect attributes, such as the browser’s version, OS, timezone,
screen, language, list of fonts, or even the way the browser
renders text and graphics. The diversity of configurations
allows websites to discriminate devices and, hence, to track
users, without the use of cookies [52], even in a collection
spanning millions of fingerprints [43]. Surveying the Internet
demonstrates that browser fingerprinting techniques are preva-
lent and used by many websites, no matter their category or
ranking [38,40,59].
A significant difficulty of fingerprint-based tracking is that
browser fingerprints evolve. As shown by Vastel et al. [73],
fingerprints change frequently, sometimes multiple times per
day, due to software updates and configuration changes. To
track a user, an adversary must link fingerprint evolutions
into a single coherent chain. This process is made difficult by
the existence of devices with identical hardware and software
configurations. It is difficult for an adversary to correctly link
a fingerprint if there is a set of identical devices to which it
might belong. This limits the adversary’s tracking duration.
In Vastel et al.’s evaluation over a dataset of nearly 100,000
fingerprints collected from 1,905 distinct browser instances,
with a wide variety of fingerprinting attributes, their state-
of-the-art machine learning technique was able to deliver a
median tracking time of less than two months.
In this work, we bring a new insight to the challenge
of browser fingerprinting identical computers, by observing
that even nominally identical hardware devices have slight
differences induced by their manufacturing process. These
manufacturing variations have been shown to enable the extraction of
unique and robust fingerprints from a variety of devices, both
large and small, in other settings [44,71]. If an adversary
were able to extract such a hardware fingerprint from the
user’s device, it would significantly extend the adversary’s
ability to track them. Extracting a hardware fingerprint from
a browser, however, is far from trivial—since the attacker
has little control. In particular, the attacker can only interact
with the system through unprivileged JavaScript code and
WebGL graphics primitives—the attacker has no control over
the runtime environment of the system, including background
processes and simultaneous user activity—and the attacker has
very limited exposure to the system, making classical machine
learning pipelines that rely on long training phases all but
useless. Thus, in this paper we raise the following question:
Can browser fingerprinting work on devices with identical
hardware and software configurations?
Our Contribution. We claim this is possible, and we assess
this claim with DRAWNAPART, a technique that measures small differences among the Execution Units (EUs) that make up a modern Graphics Processing Unit (GPU). By fingerprinting the GPU stack, DRAWNAPART can tell apart devices with nominally identical configurations, both in the lab and in the wild. In a nutshell, to create a fingerprint, DRAWNAPART generates a sequence of rendering tasks, each targeting different EUs. It times each rendering task, creating a fingerprint trace. This trace is transformed by a deep learning network into an embedding vector that describes it succinctly and points the adversary towards the specific device that generated it.
We evaluate DRAWNAPART in two main scenarios. First, to validate the method's ability to distinguish nominally identical configurations, we perform a series of controlled experiments under lab conditions. We experiment with multiple sets of identical devices from vendors including Intel, Apple, Nvidia and Samsung, and demonstrate that DRAWNAPART consistently improves identification of these nominally identical devices, achieving high identification accuracy in multiple hardware configurations, even though state-of-the-art browser-based fingerprinting methods cannot tell them apart. Second, to show that DRAWNAPART affects user privacy, we integrate the technique into Vastel et al.'s state-of-the-art fingerprinting algorithm from IEEE S&P 2018 [73], which uses machine learning to link browser fingerprint evolutions. We show that the median tracking duration is improved by up to 66.66% once we add the DRAWNAPART fingerprint.
In summary, this paper makes the following contributions:
• We design and implement DRAWNAPART, a GPU fingerprinting technique based on the relative speed of EUs, that observes minute differences between GPUs (Section III). The artifact accompanying this paper can be found at: https://github.com/drawnapart/drawnapart.
• We investigate the performance of our fingerprinting technique with multiple sets of identical devices, demonstrating that it can tell apart devices with identical hardware and software configurations (Section V).
• We integrate DRAWNAPART into Vastel et al.'s fingerprinting algorithm and show, through a large-scale crowd-sourced experiment with over 2,500 unique devices and almost 371,000 fingerprints, that DRAWNAPART delivers considerable gains to the tracking accuracy of this state-of-the-art approach (Section VI).
• We suggest possible countermeasures against our fingerprinting technique, and discuss their advantages and drawbacks (Section VII-B).
II. BACKGROUND
A. Browser Fingerprinting
Mowery et al. [56] discuss fingerprinting on the Web.
As they state, fingerprinting can be applied constructively or
destructively. An example of constructive use of fingerprints
would be to identify fraudulent users trying to log in while
masquerading as legitimate users. Browser fingerprinting can
be used to detect bots [27,48,74], or support authentication,
where the fingerprint is used in addition to a traditional authen-
tication mechanism [20,51]. A destructive use might involve
tracking users without consent [17,40]. In this scenario,
fingerprinting is used to augment or replace cookies—e.g., to
track across multiple domains, or when users disable or delete
cookies. Our technique can be applied to either scenario.
Many fingerprinting techniques exist in the wild [24,39,
57,58]. They rely heavily on differences in devices’ hardware
and software characteristics found in HTTP header fields and
JavaScript attributes. The key challenge is to identify features
and attributes that further discriminate devices and allow for
their unique identification, and to overcome the tendency of
these features to evolve over time because of changes to the
user’s software, configuration, or environment.
B. GPU Programming
The Graphics Processing Unit (GPU) is specialized hard-
ware for rendering graphics. GPUs have highly parallel ar-
chitectures that are composed of multiple Execution Units
(EUs), or shader cores, which can independently perform
arithmetic and logic operations. Most consumer desktop and
mobile processors from the past decade have on-chip GPUs
with multiple EUs. For example, the UHD Graphics 630
GPU—integrated into Intel Core i5-8500 CPUs—includes 24
EUs, while the Mali-G72 GPU—integrated into the Samsung
Exynos 9810 chipset used in Galaxy S9, S9+, Note9, and
Note10 Lite devices—includes 18 EUs.
Web Graphics Library (WebGL) is a cross-platform API
for rendering 3D graphics in the browser [12]. WebGL is
implemented in major browsers including Safari, Chrome,
Edge, and Firefox. Derived from native OpenGL ES 2.0, a
library designed for developing graphics applications in C++,
WebGL implements a JavaScript API for rendering graphics
in an HTML5 canvas element. WebGL takes a representation
of 3D objects as a list of vertices in space and information on
how to render them, and translates them into a two-dimensional
raster image that can be displayed on screen. WebGL abstracts
this process as a pipeline. Two pipeline steps which are of
interest to this work are the vertex shader, which places
the vertices in the two-dimensional canvas, and the fragment
shader, which determines the color and other properties of
each fragment. The vertex and fragment shaders can run
user-supplied programs, written in a C-derived programming
language named GL Shading Language (GLSL).
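To make the pipeline concrete, the following minimal example (our illustration, not taken from the paper's artifact) compiles a GLSL vertex and fragment shader pair, links them into a program, and draws a single point:

// Minimal WebGL pipeline example (illustrative). It compiles a vertex and a
// fragment shader written in GLSL, links them into a program, and draws one
// point into an HTML5 canvas.
const canvas = document.createElement('canvas');
const gl = canvas.getContext('webgl');

const vertexSrc = `
  void main(void) {
    gl_Position = vec4(0.0, 0.0, 0.0, 1.0); // place the vertex at the center
    gl_PointSize = 1.0;
  }`;
const fragmentSrc = `
  precision mediump float;
  void main(void) {
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0); // color the fragment red
  }`;

function compile(type, source) {
  const shader = gl.createShader(type);
  gl.shaderSource(shader, source);
  gl.compileShader(shader);
  return shader;
}

const program = gl.createProgram();
gl.attachShader(program, compile(gl.VERTEX_SHADER, vertexSrc));
gl.attachShader(program, compile(gl.FRAGMENT_SHADER, fragmentSrc));
gl.linkProgram(program);
gl.useProgram(program);
gl.drawArrays(gl.POINTS, 0, 1); // one vertex, rasterized as one point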
III. GPU FINGERPRINTING
A. Motivation
Similar to past work [39,52], we aim to uniquely identify
devices. However, unlike previous work, which relies on the diversity of hardware and software configurations, we focus on
distinguishing identical devices. As we show experimentally,
this additional distinguishing power can considerably enhance
the tracking capabilities of existing fingerprinting methods.
To do so, we incorporate techniques similar to the arbiter-
based Physically Unclonable Function (PUF) concept of Lee
et al. [53]. In an arbiter PUF, the statistical delay variations
of wires and transistors across multiple instances of the same
integrated circuit design are used to uniquely identify individ-
ual instances of the integrated circuit. In our case, we harness
the statistical speed variations of individual EUs in the GPU
to uniquely identify a complete system.
B. Design
With unfettered access to the GPU, an adversary could
measure the speed of each EU and use those measurements
as a fingerprint. However, websites only have limited access
to the GPU through the JavaScript and WebGL APIs. WebGL
provides a high-level abstraction that makes it a challenge to
target specific EUs and to time computations accurately.
We overcome this challenge by using short GLSL programs
executed by the GPU as part of the vertex shader (cf. Sec-
tion II-B). We rely on the mostly predictable job allocation
in the WebGL software stack to target specific EUs. We
observe that, when allocating a parallel set of vertex shader
tasks, the WebGL stack tends to assign the tasks to different
EUs in a non-randomized fashion. This allows us to issue
multiple commands that target the same EUs. Finally, instead
of measuring specific tasks, we ensure that the execution time
of the targeted EU dominates the execution time of the whole pipeline. We do so by assigning the non-targeted EUs a vertex shading program that is quick to complete, while assigning the targeted EUs tasks whose execution time is highly sensitive to the differences among individual EUs. As shown in Figure 1, our fingerprint is created by executing a sequence of drawing operations. We measure the time to draw a sequence of points with carefully chosen shader programs.

Fig. 1. Overview of our GPU fingerprinting technique: (1) points are rendered in parallel using several EUs; (2) the EU drawing point i executes a stall function (dark), while other EUs return a hard-coded value (light); (3) the execution time of each iteration is bounded by the slowest EU.

The technique consists of three main steps:
Render. We instruct the WebGL API to draw a number
of points in parallel. Points are the simplest object that
WebGL can draw, and each consists of only a single vertex.
Using points minimizes the noise from the pipeline and its
interference with our technique. The position of each point is
determined by an attacker-controlled vertex shader.
Stall. For most points, the attacker-controlled vertex shader
returns a hard-coded value. For a specific subset of the points
the shader applies a function, which we call a stall function, to
compute the point’s position. The manner in which the entire
graphics stack distributes the points to be drawn to the EUs
allows us to influence which EU is chosen to run the stall
function. It takes much longer to compute the position with
the stall function than the hard-coded value. As a result, the
time needed to render the entire set of points corresponds to
the time taken by the EUs running the stall function.
Trace Generation. We execute the drawing command several
times, each time selecting a different vertex to stall. For each
execution, we store the time taken. The fingerprint output by
our technique is therefore a vector, named a trace, which
contains the sequence of timing measurements.
We note that prior browser fingerprinting techniques extract
deterministic fingerprints, which remain identical as long as
the device’s software and configuration have not changed. Our
technique, in contrast, is based on timing measurements and, as
such, is non-deterministic—multiple measurements made on
the same device will return different values due to the effects
of measurement noise, quantization, and the impact of other
tasks running at the same time.
C. Implementation
We now describe the implementation of each design step.
Render. The WebGL API exposes the drawArrays()
function, which allows dispatching multiple drawing opera-
tions in parallel to the GPU. We invoke drawArrays()
several times, each time rendering multiple points in parallel.
Listing 1 describes our main render loop. We execute the
rendering process by calling drawArrays (line 5). For each
iteration, we save the time to execute drawArrays into the
trace array. We evaluated several ways of measuring the
rendering time, as explained further in Section V-A. Briefly
put, the onscreen measurement method executes a relatively
small number of computationally intensive operations, while
the offscreen and GPU measurement methods execute a larger
number of less computationally intensive operations. The full
source code for these settings can be found in our artifact
repository, as listed in Section IX. After point_count
iterations, the code sends the trace array to our back-end
server (line 15), and terminates the loop.
1  function render_loop() {
2    if (point_index < point_count) {
3      // Stall the current point
4      gl.uniform1i(shader_stalled_point_id, point_index);
5      gl.drawArrays(gl.POINTS, 0, point_count);
6      // Save the rendering time
7      var dt = performance.now() - prev_time;
8      prev_time = performance.now();
9      trace.push(dt);
10     // Prepare to stall the next point
11     point_index++;
12     requestAnimationFrame(render_loop);
13   } else {
14     // Finish and send the trace to the server
15     send_trace();
16   }
17 }

Listing 1. Main render loop, onscreen setting (JavaScript).
Stall. In the current implementation of WebGL, a single call
to drawArrays() generates multiple drawing operations in
the underlying graphics API, which appear to assign vertices
to EUs in a deterministic order during vertex processing.
The operations are differentiated by a global variable, named
gl_VertexID. This special variable is an integer index for
the current vertex, intrinsically generated by the hardware in all
of the graphics APIs used to implement WebGL as it executes
gl.drawArrays. We created a vertex shader in GLSL that
examines the gl_VertexID identifier, and executes a com-
putationally intensive stall function only if it matches an input
variable named shader_stalled_point_id provided by
the JavaScript code running on the CPU. Listing 2 describes
the vertex shader code.
In the onscreen setting, the vertex shader checks whether shader_stalled_point_id equals gl_VertexID. In the offscreen and GPU settings, the vertex shader treats shader_stalled_point_id as a bit mask and checks whether bit 1 << gl_VertexID is set. In both cases, if the point is selected, the vertex shader program executes the stall function (line 5). Otherwise, the shader exits quickly.

1  uniform int shader_stalled_point_id;
2  void main(void) {
3    // Stall on this vertex?
4    if (shader_stalled_point_id == gl_VertexID) {
5      gl_Position = vec4(stall_func(), 0, 1, 1);
6    } else {
7      gl_Position = vec4(0, 0, 1, 1);
8    }
9    gl_PointSize = 1.0;
10 }

Listing 2. Vertex shader with stall function, onscreen setting (GLSL).

Fig. 2. Raw traces from two different GEN 3 devices (GEN3-1 and GEN3-6).
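For completeness, the bit-mask check could look as follows (our reconstruction, not the paper's exact code; it assumes WebGL 2 with GLSL ES 3.00, where integer bitwise operators are available, and stall_func() is the stall function of Listing 2):

// Sketch of the bit-mask variant used in the offscreen and GPU settings
// (our reconstruction). The GLSL source is embedded in a JavaScript string,
// as WebGL shader sources usually are.
const stallVertexSrc = `#version 300 es
uniform int shader_stalled_point_id; // interpreted as a bit mask here
void main(void) {
  if ((shader_stalled_point_id & (1 << gl_VertexID)) != 0) {
    gl_Position = vec4(stall_func(), 0.0, 1.0, 1.0); // slow path: stall
  } else {
    gl_Position = vec4(0.0, 0.0, 1.0, 1.0); // fast path: hard-coded value
  }
  gl_PointSize = 1.0;
}`;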
Trace Generation. By executing this parallel drawing
operation multiple times, each with a different value for
shader_stalled_point_id, we iterate over the different
EUs and measure the relative performance of each. The output
is a trace of multiple timing measurements, corresponding to
the time taken by the targeted EU to draw the scene.
D. Raw Traces
Before evaluating DRAWNAPART, we tested whether we can visually distinguish devices. Figure 2 shows traces collected from two GEN 3 devices. We collect 50 traces from
each device, each trace consisting of 176 measurements of 16
points. The measurements are divided into 16 groups of 11,
where in each group we stall a different point. The color of
a point indicates the rendering time, ranging from virtually
0 (white) to 90 ms (blue). Red vertical bars indicate group
boundaries. As we can see, the rendering time in the first half
of the traces is significantly faster than in the second half.
Moreover, while there are some timing variations in the traces
of the same device, the traces display patterns that are distinct
between devices, allowing us to distinguish them.
IV. EVALUATION OVERVIEW
A. Motivation
We claim that our new method provides a tangible advan-
tage over deterministic GPU-based fingerprinting. To establish
this claim, we evaluate our system in a lab setting and in the
wild.
In the lab setting, we assume the attacker can collect
training traces from a set of identical machines (same hardware
and software), running under identical environmental condi-
tions. Next, the attacker is given a single trace and is tasked
with identifying the machine that generated the trace. Our
primary metric of evaluation in this setting is the accuracy
gain, which measures the multiplicative gain in accuracy of a
classifier that incorporates our non-deterministic method, when
compared to a classifier which only uses deterministic inputs.
An accuracy gain of 1 means that the classifier provides no
advantage over traditional methods, while higher values show
that it gives the attacker an advantage. The lab setting provides
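Concretely, and consistent with the entries of Table I, the accuracy gain is the ratio of the classifier's accuracy to the base rate:

gain = accuracy / base rate

For example, the GEN 3 onscreen entry of Table I achieves 93.0% accuracy against a 10.0% base rate, yielding the reported gain of 9.3.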
the most advantageous conditions for our classifier, for several
reasons. First, existing deterministic schemes cannot tell apart
identical devices, as we demonstrate experimentally, resulting
in a very low base rate. Second, the attacker can tailor the
attack to the particular class of devices to be discriminated,
and thus choose optimal parameters for the target hardware.
Third, the workload on the target machines is controlled,
minimizing measurement noise. Finally, the attacker is not
concerned with detectability or compatibility, and can run an
experiment that takes a long time, that uses partially supported
hardware features, or that is noticeable to the user.
We also evaluate our system in the wild. More specifically,
we evaluate how our method can be applied to track devices
from a set of over 2,500 machines with 1,605 distinct GPU
configurations, recruited through a crowd-sourcing experiment.
We first perform a standalone evaluation of our method, in the
absence of additional identifying features. We then provide
additional deterministic features to the classifier, including the
browser version, screen dimensions, HTTP headers, and other
similar attributes. State-of-the-art fingerprinting techniques can
produce unique browser fingerprints through the consideration
of these signals, but these fingerprints are not ideal for tracking
users since they evolve over time [73]. We therefore measure
the added distinguishing power our method provides to existing
browser fingerprinting schemes, with the primary metric of
evaluation being the additional tracking time made possible
through the combination of our novel technique with existing
schemes.
The in-the-wild setting is more challenging. First, the
technique must perform well across a large number of devices,
precluding tailored attacks, and the attacker is prohibited from
using any trace collection method that is overly intrusive or
time-consuming. Second, the attacker’s choice of machine
learning pipelines is constrained. In particular, the attacker
cannot use a long training phase since this does not make sense
in the context of browser fingerprinting—the fingerprint should
be useful at once, and not depend on the victim spending
hours on the attacker’s website. The attacker must also be
able to accommodate new devices joining the dataset in real-
time, and should not be required to spend multiple CPU hours
retraining the classifier every time a new device is detected.
Finally, the attacker cannot control the runtime characteristics
of the machine being fingerprinted. Our method will have to
be tolerant to workload variations, GPU payloads from other
tabs, browser and system restarts, and so on.
In the following section, we study the lab setting to demon-
strate an upper bound on our classifier’s potential accuracy
gain, and to investigate parameter choices and their trade-offs
on accuracy, compatibility and performance. In Section VI,
we select a single set of parameters and launch a large-scale
crowd-sourced experiment in the wild, showing the advantage
of our method in a realistic setting.
B. Machine Learning Pipelines
We use two machine learning approaches to evaluate our
fingerprinting technique. In the lab setting, we cast our finger-
printing problem as a conventional multinomial classification
task, where the input is the trace of N rendering times, and
the output is the label of the device assumed to have generated
this trace. We evaluated several classical machine learning
models suitable for this task, including tree-based classifiers,
k-Nearest Neighbors classifiers, Linear Discriminant Analysis,
and Support Vector Machines. We ultimately chose to use the
Random Forest ensemble classification algorithm [26,54], as it
empirically delivered the best classification results in terms of
accuracy. We did not apply any feature engineering, submitting
the raw traces into the classification algorithm. To make sure
we did not overfit our model, we applied a 5-fold train-test
split to the data, and collected the mean accuracy reported by
the folds, as well as the standard deviation among folds.
To evaluate our system in the wild, we needed a more
elaborate pipeline for the reasons listed in Section IV-A. Our
method relies on neural networks and consists of several
steps: 1) We preprocessed our traces by normalizing and
reshaping them into matrix form. 2) We trained a convolutional
neural network (CNN) to solve the multinomial classification
task. 3) We transformed the classification network into an
embedding network using the semi-hard triplet loss algorithm
of Schroff et al. [67]. The resulting network is capable of trans-
forming our trace into a representation called an embedding.
Because of the way the network is designed, the Euclidean
distance between two traces from the same device will be
small, while the Euclidean distance between traces from differ-
ent devices will be large. This allows the inference part of the
classification to use the k-Nearest Neighbors classifier—given
an unknown trace, measure the distance between its embedding
and the embeddings of all known traces, and output the label
of the embedding at the shortest distance. The simplicity of
this classifier means the adversary can add new devices to
the dataset simply by recording a few new traces and without
retraining the entire network, a desirable property known as
few-shot learning.
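As an illustration (our sketch, not the paper's artifact code), the inference step reduces to a nearest-neighbor search over embedding vectors:

// 1-Nearest Neighbor inference over embeddings (illustrative sketch).
function euclidean(a, b) {
  // Euclidean distance between two embedding vectors of equal length
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}

function nearestLabel(unknown, known /* array of {embedding, label} */) {
  let bestLabel = null, bestDist = Infinity;
  for (const { embedding, label } of known) {
    const d = euclidean(unknown, embedding);
    if (d < bestDist) { bestDist = d; bestLabel = label; }
  }
  return bestLabel;
}

Adding a new device then amounts to appending its embeddings to the known set; no retraining is needed.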
To ensure we did not overfit our in-the-wild model, we
split our training dataset into two mutually exclusive parts,
each with different labels, performed the evaluation on each
part in isolation, and observed that the accuracies for each split
were roughly the same. More details about the training process
and dataset splits can be found in Section VI.
V. LAB SETTING

The objective of the lab setting is to discover DRAWNAPART's highest accuracy; it assumes that the attacker customizes the attack to the class of device and ignores aspects of detectability, compatibility, or performance.
Evaluated Devices. Table I lists the devices used in the
lab setting. We used 88 devices from nine distinct hardware
classes, including desktops and mobile devices. The desktops
include multiple generations of Intel processors, all running
Windows 10, as well as a set of Apple Mac mini devices
with an Apple M1 chip, running MacOS X Version 11.1.
Other than the GEN 10 devices, which had discrete Nvidia GTX1650 GPUs, all desktops used integrated graphics. For
each class, the devices were purchased through the same order, configured with our University's official operating system image, and located in the same temperature-controlled lab. The mobile devices include multiple generations of Samsung Galaxy devices, all sourced through the Samsung Remote Test Lab [10]. All the mobile devices were Android-based and featured Samsung Exynos CPUs and Mali GPUs.

TABLE I. ACCURACY GAINS ACHIEVED UNDER LAB CONDITIONS

Device Type                        | GPU                    | Device Count | Timer     | Base Rate (%) | Accuracy (%) | Gain
Intel i5-3470 (GEN 3 Ivy Bridge)   | Intel HD Graphics 2500 | 10           | Onscreen  | 10.0          | 93.0 ± 0.3   | 9.3
                                   |                        |              | Offscreen | 10.0          | 36.3 ± 1.6   | 3.6
Intel i5-4590 (GEN 4 Haswell)      | Intel HD Graphics 4600 | 23           | Onscreen  | 4.3           | 32.7 ± 0.3   | 7.6
                                   |                        |              | Offscreen | 4.3           | 63.7 ± 0.6   | 14.7
                                   |                        |              | GPU       | 4.3           | 15.2 ± 0.5   | 3.5
Intel i5-8500 (GEN 8 Coffee Lake)  | Intel UHD Graphics 630 | 15           | Onscreen  | 6.7           | 42.2 ± 0.7   | 6.3
                                   |                        |              | Offscreen | 6.7           | 55.5 ± 0.8   | 8.3
                                   |                        |              | GPU       | 6.7           | 53.5 ± 0.8   | 8.0
Intel i5-10500 (GEN 10 Comet Lake) | Nvidia GTX1650         | 10           | Offscreen | 10.0          | 70.0 ± 0.5   | 7.0
                                   |                        |              | GPU       | 10.0          | 95.8 ± 0.9   | 9.6
Apple Mac mini M1                  | Apple M1               | 4            | Offscreen | 25.0          | 46.9 ± 0.4   | 1.9
                                   |                        |              | GPU       | 25.0          | 73.1 ± 0.7   | 2.9
Samsung Galaxy S8/S8+              | Mali-G71 MP20          | 6            | Onscreen  | 16.7          | 36.7 ± 2.7   | 2.2
Samsung Galaxy S9/S9+              | Mali-G72 MP18          | 6            | Onscreen  | 16.7          | 54.3 ± 5.5   | 3.3
Samsung Galaxy S10e/S10/S10+       | Mali-G76 MP12          | 8            | Onscreen  | 12.5          | 54.1 ± 1.5   | 4.3
Samsung Galaxy S20/S20 Ultra       | Mali-G77 MP11          | 6            | Onscreen  | 16.7          | 92.7 ± 1.8   | 5.6
Comparison With Prior Fingerprinting Techniques. Before
evaluating our technique, we reproduced and tested several
state-of-the-art web-based fingerprinting techniques.
UniqueMachine, presented by Cao et al. at NDSS
2017 [28], collects a “browser fingerprint”, with mutable
properties such as window size and IP address, and a more
permanent “computer fingerprint”. The UniqueMachine web-
site offers a demo that outputs both fingerprints as 32-character
hashes. We collected the fingerprints of all of the computers
in our GEN 3, GEN 4, GEN 8, and GEN 10 corpora using
UniqueMachine, and confirmed that all computers in the same
corpus were assigned the same computer fingerprint. Interest-
ingly, the GEN 4 and GEN 10 PCs shared the same computer
fingerprint despite having different hardware configurations.
Fingerprint JS (FPJS) is a commercial API offering
“browser fingerprinting as a service”. The paid-for version,
called FPJS Pro, claims to provide “unparalleled accuracy, ease
of use, and security” [6]. FPJS Pro outputs a 20-character hash.
The website provides a demo of FPJS Pro. We collected the
fingerprints of all computers in our GEN 3, GEN 4, GEN 8, and
GEN 10 corpora using the demo website. In the GEN 3 dataset,
all but one computer had the same fingerprint. Similarly to
UniqueMachine, all of the computers in the GEN 4 and GEN 10
corpora had identical FPJS fingerprints. Finally, FPJS divided
the GEN 8 corpus into three clusters: two clusters with seven
computers each, and the final cluster with one computer.
Clock around the Clock, proposed by Sánchez-Rola
et al. at CCS 2018 [66], is an alternative to GPU-based
fingerprinting. This method is designed to exploit “small,
but measurable, differences in the clock frequency” by mea-
suring the precise execution times of a series of CPU-
intensive operations. To calculate the fingerprint, the com-
puter invokes the cryptographic random number generator
crypto.getRandomValues 1,000 times for 50 different
input sizes, then generates a vector of the most common timing
value, or mode, for each of the input sizes. We reproduced the
web-based variant of the method, and tested it on our GEN 4
corpus. We found that the modes did not contain any data
useful for fingerprinting. This is likely because, since July 2018, Chrome has contained countermeasures designed to prevent fine-grained timing measurements, as part of the wider fallout of the
Spectre attacks [29,60,62,77]. All our measurements returned
either zero or five microseconds (with some added random-
ness). We conclude that, currently, the method presented by
Sánchez-Rola et al. is not practical.
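For reference, a simplified sketch of the measurement we reproduced (input sizes are illustrative; the original parameters may differ):

// Sketch of the Clock around the Clock measurement: time 1,000 calls to
// crypto.getRandomValues for each input size and keep the most common
// (modal) timing value per size.
function modalTiming(size, reps = 1000) {
  const buf = new Uint8Array(size);
  const counts = new Map();
  for (let i = 0; i < reps; i++) {
    const t0 = performance.now();
    crypto.getRandomValues(buf);
    const dt = performance.now() - t0;
    counts.set(dt, (counts.get(dt) || 0) + 1);
  }
  let mode = null, best = -1;
  for (const [dt, c] of counts) if (c > best) { best = c; mode = dt; }
  return mode;
}
const sizes = Array.from({ length: 50 }, (_, i) => (i + 1) * 1024); // illustrative
const fingerprint = sizes.map(s => modalTiming(s));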
A. Tuning the Trace Parameters
We search for the parameter settings that provide the opti-
mal accuracy gain for the different hardware configurations.
Stall Function Operator Selection. Each model and gener-
ation of GPU has a different microarchitecture. For example,
the third-generation Intel integrated GPU has a single arbiter,
which dispatches tasks to all EUs, while fourth-generation
GPUs adopt a hierarchical micro-architecture with multiple
arbiters. Intel GPUs also have Advanced Math (AM) Units,
which are tasked with executing less common operations such
as trigonometric operations. The amount and location of these
AM units differs among GPU generations, and even within
different GPU types from the same generation. The design
of GPUs by Nvidia, ARM and Apple is obviously different
as well. We hypothesize that, due to these differences, the
accuracy gain provided by our method will vary, depending
both on the choice of stall functions and target hardware.
To test this, we evaluated a representative set of operators,
including trigonometric operations, logical bit-wise operations,
and general floating-point operations. The set of operators
selected can be found in Appendix A.
Timing Measurement Method. Scene rendering is per-
formed in the GPU context, which is asynchronous to
the CPU context. Simply measuring the time it takes the
CPU to execute the draw operation, for example by calling
performance.now() immediately before and after the
call, does not provide any usable insight about the GPU.
We therefore considered three measurement methods that are
capable of measuring the actual drawing time of the GPU.
In the onscreen method, we render the scene
to a standard HTML canvas element and then call
Window.requestAnimationFrame. This function
is passed a callback function that is called after the rendering
is complete. Timing information is then extracted from
within the callback. The onscreen method is the most
compatible of those we evaluated, but browsers do not call
requestAnimationFrame at a rate higher than the
browser’s maximum frame rate, which is typically 60 Hz.
Thus, using this method requires that each iteration of our
rendering operation take at least 16 ms to provide us with
useful information. Even though the canvas element is on
screen, it can be made zero-size or invisible via styling,
making the fingerprinting operation invisible to the user.
Collecting the fingerprint does cause a noticeable slowdown
for the user since it runs in the browser’s main context.
In the offscreen method we use a worker thread and render
the scene to an OffscreenCanvas object. This does not
affect the user’s main context and does not slow down the
user. After rendering the scene, we call the convertToBlob
method of the OffscreenCanvas, causing it to execute all
instructions currently in the WebGL pipeline, and ultimately
return a binary object representing the image contained in the
canvas. We measure the time it takes to execute this command.
Since there is no frame rate limit in this setting, each iteration
of the rendering operation can take less time, allowing us to use
more iterations. At the time of writing, OffscreenCanvas
is supported on Chrome browsers, hidden behind a flag on
Firefox, and partially supported in the Technical Preview build
of Safari.
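A sketch of this measurement (ours; it assumes OffscreenCanvas support and an async context, e.g., inside a worker):

// Sketch of the offscreen measurement method.
const offscreen = new OffscreenCanvas(256, 256);
const gl = offscreen.getContext('webgl2');
// ... issue gl.drawArrays(...) calls with the stalling vertex shader ...
const t0 = performance.now();
await offscreen.convertToBlob(); // forces all pending WebGL work to complete
const renderTime = performance.now() - t0; // one entry of the trace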
The GPU method is the third method we evaluate. It is
a modification of the offscreen method that does not measure
timing on the CPU side. Instead, the WebGL disjoint timer
query method is used to directly measure the duration of a
set of graphics commands on the GPU side. To perform this
measurement, we call beginQuery, issue the drawing oper-
ations, and call endQuery. Using getQueryParameter,
we retrieve the elapsed time on the GPU side. This disjoint
timer query command was previously used for side-channel
attacks by Frigo et al. in their work in IEEE S&P 2018 [42]. As
a result, support for this timer was disabled in Chrome version
65. However, with the introduction of Site Isolation [16], it
was deemed safe to be re-enabled in Chrome version 70 [55].
In contrast to CPU-side timers, whose resolutions have been
severely reduced to a few microseconds with jitter to mitigate
against transient execution attacks [63], the GPU-side timer
offers microsecond resolution with no jitter even on the most
modern versions of Chrome [15]. This GPU-based timer thus
has the potential to be the most accurate and the least sensitive
to activity on the CPU side. On the other hand, its accuracy
varies dramatically between different GPU architectures, and
it is not supported by the commonly used Google SwiftShader
renderer.
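A sketch of a disjoint timer query measurement (ours; it assumes a WebGL 2 context exposing the EXT_disjoint_timer_query_webgl2 extension):

// Sketch of GPU-side timing via disjoint timer queries.
const ext = gl.getExtension('EXT_disjoint_timer_query_webgl2');
const query = gl.createQuery();
gl.beginQuery(ext.TIME_ELAPSED_EXT, query);
// ... issue the drawing operations to be timed ...
gl.endQuery(ext.TIME_ELAPSED_EXT);
// Poll later (e.g., on a subsequent frame) until the result is available and
// no disjoint event (such as a GPU context switch) has occurred.
if (gl.getQueryParameter(query, gl.QUERY_RESULT_AVAILABLE) &&
    !gl.getParameter(ext.GPU_DISJOINT_EXT)) {
  const elapsedNs = gl.getQueryParameter(query, gl.QUERY_RESULT); // nanoseconds
}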
Number Of Points To Render. Our fingerprinting scheme
relies on multiple iterations of a drawing command, where each
iteration exercises a certain subset of the EUs while leaving
the other EUs idle. The number of iterations and the time each
iteration takes to run will determine the total execution time.
However, it is reasonable to assume that capturing more data
will provide better accuracy, and that relatively long workloads
will mitigate the impact of the low-resolution timers available
through JavaScript. We ran two experiments to capture this
trade-off. The first was run in the onscreen setting, using the
GEN 3 corpus. The frame rate requirement of the onscreen
setting limits each iteration to at least 16 ms, as explained
above. The second experiment was run in the offscreen setting
using the GE N 4 corpus. This setting allowed us to use much
shorter workloads and to increase the number of iterations
that can be run in a reasonable time period. Thus, instead
of assigning the stall function for each point only once per
iteration, we tried all 2^n possible subsets of the set of points,
allowing us to measure the contention between EUs, as well
as their individual speeds.
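A sketch of this enumeration (ours), encoding each of the 2^n subsets as a bit mask passed to the bit-mask variant of the vertex shader:

// Enumerate all 2^n subsets of n points; each subset is encoded as a bit
// mask, so every combination of stalled EUs is exercised exactly once.
const n = 10; // illustrative; the lab experiments vary this parameter
for (let mask = 0; mask < (1 << n); mask++) {
  gl.uniform1i(shader_stalled_point_id, mask); // select the subset to stall
  // ... time one gl.drawArrays(gl.POINTS, 0, n) call, as in Listing 1 ...
}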
B. Results
Table I summarizes the accuracy gains obtained in the lab
setting using different timing methods. The mobile devices
were evaluated using the onscreen method only due to limited
access to those devices. GEN 3 and GEN 4 are not evaluated using the GPU timer method since their hardware does not support it. All devices within each hardware class were sampled the same number of times. We observed that our Random
Forest-based classifier approaches peak accuracy as the size of
the training data set approaches 500 traces per label. As the
table shows, our scheme delivered significant accuracy gains,
well above the base rate, in all scenarios, both for desktop and
mobile devices. The parameter choices, however, did affect the
performance of our scheme.
Effect Of Stall Function. As expected, each of the operators
we evaluated performed differently on the different hardware
targets. Specifically, in the onscreen setting, the mul operator
delivered the best accuracy gains for the GEN 3 and GEN 4 corpora, while exp2 was the best performer for the GEN 8 corpus, as described in more detail in Appendix A. The
different mobile device corpora, which were also evaluated in
the onscreen setting, also had different optimal operators: pow
for Galaxy S8/S8+ and Galaxy S9/S9+, atanh for Galaxy
S10e/S10/S10+ and mul for Galaxy S20/S20+/S20 Ultra.
In the offscreen setting, the sinh operator was consistently
the best performer for the GEN 4 and GEN 8 corpora, while
mul was better than sinh for the GEN 10 corpus. We
hypothesize that since the offscreen setting allowed us to
trigger multiple execution units at the same time, and the
number of advanced math units that handle trigonometric operations is lower than the number of EUs, the conflicts and
race conditions that arise inside the GPU gave this operator
additional discriminating power.
Effect Of Timing Measurement Method. As stated above,
the offscreen method allowed us to execute more iterations
than the onscreen method, allowing us to capture data about
EU contention, as well as on the timing of individual EUs.
We were also interested in comparing the relative performance
of the offscreen method, which measured time on the CPU
side, and the GPU method, which used disjoint timer queries
to measure performance on the GPU side. We hypothesized
that the GPU method would be superior to the offscreen
method, since the GPU-side timer has higher accuracy than
the CPU-side timer, and is not affected by the timing jitter
introduced by inter-process communications (IPC) between
the GPU and the CPU. In practice, we discovered that this
is not always the case. As shown in Table I, the GPU timer is
better than the CPU timer for the Intel GEN 10 and Apple M1 corpora, has equivalent accuracy to the CPU timer on the GEN 8 corpus, and is actually less accurate than the CPU timer on the Intel GEN 4 corpus. To make matters worse, the
disjoint timer query WebGL extension is not supported on
several popular WebGL stacks, most significantly the software-
based Google SwiftShader. Thus, the GPU-based timer is not
appropriate for use in a large-scale experiment where the
hardware configuration is not known beforehand.
Accuracy vs. Capture Time. Figure 3 shows the accuracy
gain as a function of trace capture time, both for the GEN 3 corpus using the onscreen collection method and for the GEN 4 corpus using the offscreen collection method. As the figure shows, the accuracy gain of both methods approaches its
optimal point when samples are collected for around 2 seconds.
This is reached after about 80 iterations in the onscreen method
and 1024 iterations in the offscreen method.
Fig. 3. Accuracy gain as a function of trace capture time (GEN 3 onscreen and GEN 4 offscreen).
Swapping Hardware. To reinforce our claim that the clas-
sification results are due to differences in the behavior of
the GPUs, and not due to some residual differences among
the computers, we selected two GEN 3 computers, physically
swapped their hard drives, and re-ran the fingerprinting classi-
fier. As expected, the fingerprinting classifier was not misled by
the hard disk transplant, and was still able to label each of the
two computers according to their CPU. Next, we returned the
hard drives to their original locations, and physically swapped
the CPUs with integrated graphics of the two systems. As
expected, the classifier followed the transplanted CPU, even
though all other hardware was unmodified.
C. Evaluation on Additional Browsers.
We collected and evaluated traces from 16 devices from
the GEN 4 corpus using multiple additional browsers: Brave
browser [2] (version 81.0.4044.113), Edge [5] (version
96.0.1054.43), Opera [8] (version 82.0.4227.23) and Yandex
browser [13] (version 21.11.3.927), all using the offscreen
method. The accuracy showed a significant improvement over
the base rate, which lies at 6.25%, with Edge, Brave, Opera, and Yandex delivering accuracies of 34.6±0.6%, 31.0±0.3%,
31.6±0.7%, and 31.1±0.3%, respectively.
We evaluated the stability of DRAWNAPART over 21 devices of the GEN 4 corpus for an extended period of time. We
collect data for both Chrome and Firefox. For Chrome, we use
the onscreen and offscreen methods. For Firefox, which does
not currently support the offscreen method, we are limited to
the onscreen method. We also chose to stall the EU for twice
as many operations under Firefox, compared to Chrome, to
account for the lower timer resolution found in Firefox.
For 24 days, we repeatedly launched the browser, collected
traces for 20 minutes using the offscreen method and for 40
minutes using the onscreen method, then quit the browser and
idled for 4 hours. The first 4 cycles were used to train the
Random Forest classifier, while the remaining cycles over the
experiment’s 24 days were used to evaluate its performance.
The results are summarized in Figure 4 and show the accuracy
to be above the base rate for each point in time. We observed
that the offscreen method yields slightly higher accuracy than
onscreen, and that the accuracy of both methods on Chrome
slightly decay over time, while the accuracy of the onscreen
method on Firefox remains stable. Finally, the accuracy in this
experiment is lower compared to the results reported in Table I.
It is possibly due to repeatedly restarting the browser over the
course of the experiment, as we discuss in Section VII-C.
Fig. 4. Additional Browsers – Lab Evaluation: accuracy gain over days from experiment start, for Chrome (onscreen), Chrome (offscreen), and Firefox (onscreen).
D. Summary
Our results show that DRAWNAPART can tell apart identical computers in a controlled lab setting. Our next objective was to evaluate it in a realistic setting, in which the attacker has less control over the devices to be fingerprinted. We did so by
first evaluating DRAWNAPART in a standalone setting, and
then integrating it with a state-of-the-art browser fingerprinting
algorithm.
VI. IN-THE-WILD SETTING
Performing browser fingerprinting in the wild presents
different challenges compared to what we experienced with
the lab setting: 1) The lab evaluation assumed a closed list of
devices. In the real world, new devices can be added at any
time during the collection period, but we cannot re-train the
model whenever it happens. 2) The lab evaluation assumed
we had a long time to collect data and train over the devices.
In the real world, we do not have unlimited access to a
device so the collection of data must be fast. 3) Finally,
the lab evaluation assumed the devices were idle and in a
controlled environment. In the real world, we have to contend
with variable computing loads, restarts, and updates to both
the browser and the operating system. In order to understand
the potential impact of DRAWNAPART in the real world, we
collected 370,392 traces from 2,550 devices over 7 months
and performed the two following evaluations:
• Standalone evaluation: Considering only DRAWNAPART traces without any other information, we aim to see how our method performs at re-identifying a device among others. In Section VI-B, we propose a one-shot learning pipeline whose aim is to match a new trace with another known trace present in our dataset.
• Tracking over time: Browser fingerprints evolve [39]. Vastel et al. developed two algorithms to track evolutions and link fingerprints that belong to the same device [72]. In Section VI-D, we show how DRAWNAPART can improve the FP-STALKER algorithms, which are the current state-of-the-art tracking algorithms, by increasing the duration users can be tracked. Our main metric to evaluate the gain of our technique will be the median tracking time. Contrary to the standalone evaluation, we use all the attributes listed in Appendix B as well as the DRAWNAPART traces.
A. Dataset constitution
Large-scale Experiment. To show DRAWNAPART's practical advantages over traditional deterministic fingerprinting methods as used in FP-STALKER, we launched a large-scale experiment with diverse hardware and software. We integrated our DRAWNAPART technique into the Chrome browser extension from the AMIUNIQUE crowd-sourcing experiment [52].
The extension periodically collects the browser fingerprints of
thousands of volunteers, allowing us to track their evolution.
DRAWNAPART Collection Parameters. The crowd-sourced experiment constrained our choices. Most importantly, we wanted to be as non-intrusive as possible, so as not to cause
any user-perceivable slowdowns. In addition, we wanted to
be compatible with various rendering stacks we encounter
in the wild. Finally, we were interested in selecting a stall
function that discriminates a wide variety of hardware. With
these constraints, we selected the offscreen timing method,
which is supported by all desktop versions of Chrome. The
onscreen method was not selected as it causes slowdowns, and
the GPU method was not selected since it is not supported by
the Google SwiftShader renderer. We chose the sinh stall
function operator, which provided good performance during
our tests. We render all possible subsets of 10 points in each trace, for a total of 2^10 = 1,024 iterations per trace. This
fingerprint takes a median time of 1.6 seconds to run. It is
collected by the extension using a worker thread, without
affecting the user’s interactions with the browser. To increase
our trace count, we repeated each collection seven times, for
a median total run time of approximately 12 seconds. We
collected the traces every four hours.
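These figures are consistent: seven repetitions of a 1.6-second collection amount to 7 × 1.6 ≈ 11.2 seconds, in line with the reported median total run time of approximately 12 seconds.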
Dataset Preparation. Our dataset contains 370,392 finger-
prints from 2,550 unique devices. In each fingerprint, we
collect the attributes listed in Appendix B, together with 7
DRAWNAPART traces. We identify devices with the same GPU
by looking at the WebGL renderer string property. Over 90% of
the devices shared a renderer string with at least one additional
device. The largest observed group with same renderer string
consisted of 534 unique devices.
We split our dataset into three subsets, divided by measure-
ment time: 1MP contains 109,375 samples collected between
3-Jan-2021 and 7-Feb-2021, 2MP contains 46,293 samples
collected between 7-Feb-2021 and 31-Mar-2021, and 3MP
contains 214,724 samples collected from 3-May-2021 to 8-
Jul-2021. We randomly choose 65% of the devices in 1MP that
have more than 28 samples, and refer to this subset as 1MP65.
The rest of 1MP will be referred to as 1MPrest. The limit of 28 samples, or 196 DRAWNAPART traces, was chosen to make
sure the neural network will generalize well, by preventing
it from overfitting on a small number of traces of a specific device. We normalized each trace and reshaped each vector of length 1,024 into a 32×32 matrix.
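A sketch of this preprocessing step (ours; the min-max scaling is an assumption, as the text states only that traces are normalized):

// Normalize a 1,024-element trace and reshape it into a 32x32 matrix.
function preprocess(trace /* array of 1024 timings */) {
  const min = Math.min(...trace), max = Math.max(...trace);
  const norm = trace.map(v => (max > min ? (v - min) / (max - min) : 0));
  const matrix = [];
  for (let row = 0; row < 32; row++) {
    matrix.push(norm.slice(row * 32, (row + 1) * 32));
  }
  return matrix;
}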
B. Standalone evaluation
Before integrating our model with FP-STALKER, we first
evaluate it in isolation using only DRAW NAPART traces and
ignoring the other attributes. In contrast to the classical ML
model used in the lab setting, we used a neural network
pipeline for the in-the-wild setting. The ultimate goal of the
pipeline is to generate quality embeddings in Euclidean space,
which express the distance between traces. We begin the
process by creating a Convolutional Neural Network (CNN)-
based multinomial classifier. The structure we selected for the
classifier is inspired by Picek [61], and includes N convolution
blocks followed by a flatten layer, a dense layer, another
L2-normalized dense layer without activation, and concluding
with a fully connected layer with softmax activation. Each
convolution block contains a convolution layer, a dropout
layer, and an average pooling layer. We used scikit-optimize’s
Bayesian optimization [11] to search for the best parameters,
as described in Appendix C, using 80% of the traces in 1MP65
for training, and the remainder of 1MP65 for validation. The
parameter search took 48 hours on a server with four NVIDIA
GEFORCE RTX 2080 Ti GPUs, two Intel Xeon Silver 4110
CPUs, and 128 GiB of RAM. The run yielded 79 valid neural
networks. The best network achieved a training accuracy of
35.57% and a validation accuracy of 33.82%.
Semi-Hard Triplet Loss Model. The next step in our ML
pipeline is the transformation of the multinomial classifier
into an embedding, using the triplet loss method. Triplet loss
minimizes the distance between an anchor and a positive, both
of which have the same label, and maximizes the distance
between the anchor and a negative of a different label. Semi-
hard triplet loss means that we only use triplets that have
a negative that is farther from the anchor than the positive,
but still produces a positive loss [67]. We took our trained
classification model, removed its last layer, and trained it again
for 30 epochs on the same dataset as before, this time with a
bigger batch size of 1024 preprocessed traces and with semi-
hard triplet loss. Batch size is important to the triplet mining
process since we need sufficient examples in the batch to find
enough semi-hard triplets. We took the weights of the epoch
that yielded a model with the best accuracy using a 1-Nearest
Neighbor classifier. The end-product of this process is a model
that accepts preprocessed DRAWN APART traces as input and
produces embeddings in a Euclidean space. Labels are not
involved in this process—we can take any DRAWN APART
trace, even from a device that the model was not trained on,
feed it into the triplet loss model, and get Euclidean space
embeddings. We note that we optimized for the accuracy of
the classification model, instead of the 1-Nearest Neighbor, to
reduce the running time of our parameter search.
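For reference, the semi-hard triplet loss of Schroff et al. [67] minimizes, for an (anchor, positive, negative) triplet (a, p, n), embedding network f, and margin α:

L = max(0, ‖f(a) − f(p)‖² − ‖f(a) − f(n)‖² + α)

where semi-hard triplets are those satisfying ‖f(a) − f(p)‖ < ‖f(a) − f(n)‖ < ‖f(a) − f(p)‖ + α.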
Evaluating The Classifier. The use of embeddings man-
dates using a k-Nearest Neighbors classifier for analyzing the
outputs of the network. Our metric for evaluation is the top-k accuracy, which stands for the probability that the correct answer is one of the k nearest neighbors of the selected trace, for k = 1, 5, and 10, according to the distance metric output
by the model.
Base Rate Calculation. The accuracy of a classifier should be
compared to the base rate obtained by a naive classifier with
no access to the features. In the case of a classical learning
problem, the naive classifier can observe the training data and
learn the a priori probabilities of each label. Then, to get the best accuracy, this naive classifier will output the label of the most commonly observed device, or the n most commonly observed devices for a top-n setting. The base rate in that case is therefore the cumulative proportion of these devices in the dataset. In the case of a k-shot learning problem, the classifier does not know the a priori probabilities of each label, since it gets an equal amount of training data for each label. The naive classifier in this case will just output a random label, or n random labels for a top-n setting. The base rate in that case is only n/#devices.
Train-Test Split Evaluation. We evaluated our model in two
ways: random train-test split, and k-shot learning. In the train-
test split evaluation, we randomly split each of the 1MP65,
1MPrest and 2MP datasets into two parts, using 80% for
memorizing and 20% for testing. We first used 1MP65 for
evaluation. On this subset, the base rate is 1.00% for top-
1 accuracy, 3.51% for top-5 accuracy and 6.15% for top-10
accuracy. To show that our network can generalize and work
on traces it has never seen before, we next considered the
performance of the network on 1MPrest. On this subset, the
base rate is 1.22% for top-1 accuracy, 4.42% for top-5 accuracy
and 7.20% for top-10 accuracy. To show that our network
generalizes to more devices and new traces, we evaluate it
on 2MP. 2MP contains devices from 1MP, meaning that the
neural network was trained on some of the devices in 2MP,
but not all of them; however, it was never trained on any traces
from 2MP. On this subset, the base rate is 0.64% for top-
1 accuracy, 2.78% for top-5 accuracy and 4.38% for top-10
accuracy. The results in Table II demonstrate that our model's
accuracies are significantly better than the base rate for all
three datasets. The accuracies on the 1MP65 and 1MPrest
datasets are roughly the same, showing the model responds
well to new devices. The accuracy on 2MP drops only slightly,
despite a base rate approximately half that of the other datasets,
the addition of more devices and new traces, and a later
collection date; this shows the model has generalized well.
k-shot Learning Evaluation. The k-shot learning evaluation
was performed on the 2MP dataset. We chose 2MP to evaluate
k-shot learning because we used the traces from 1MP65 to
train our triplet loss model, which would bias the results.
While some of the devices in this subset also appear in 1MP,
none of the traces in 2MP were used to train or validate
the neural network. In the memorizing phase, we memorize
the first k collections (k × 7 DRAWNAPART traces) of each
device in 2MP. The rest of the traces of 2MP are used in the
testing phase, again using a k-Nearest Neighbors classifier.
This is an evaluation that is close to real-world use. An
attacker would like to identify users with as few collections
as possible. This evaluation is harder than the previous one
TABLE II. STANDALONE PERFORMANCE OF DRAWNAPART IN THE WILD USING THE RANDOM SPLIT (RS) AND k-SHOT METHODS
Evaluation Method (Dataset)   Top-1 Accuracy (Base rate)   Top-5 Accuracy (Base rate)   Top-10 Accuracy (Base rate)
RS (1MP65) 28.88% (1.00%) 56.36% (3.51%) 68.70% (6.15%)
RS (1MPrest) 28.28% (1.22%) 55.09% (4.42%) 67.15% (7.20%)
RS (2MP) 23.33% (0.64%) 47.23% (2.78%) 58.83% (4.38%)
1-Shot (2MP) 5.44% (0.05%) 14.10% (0.26%) 19.95% (0.51%)
5-Shot (2MP) 7.11% (0.05%) 19.34% (0.26%) 26.75% (0.51%)
10-Shot (2MP) 9.22% (0.05%) 22.77% (0.26%) 31.09% (0.51%)
due to the small amount of data available for the memorizing
phase. In addition, the time difference between 1MP and 2MP
requires the network to deal with concept drift. As mentioned
above, the base rate in this setting is very small, because the
attacker cannot learn anything about the distribution of the
devices in the test set. The results can be found in Table II.
As expected, they show a decrease in accuracy compared to
the evaluation using random split, but our model still delivers
significant accuracy beyond the base rate. We thus conclude
that DRAWNAPART can be used for few-shot learning.
We leave the 3MP dataset for the evaluation of FP-STALKER,
to test the model on a truly unseen dataset that reproduces
in-the-wild conditions.
Visualizing Euclidean Distances. To visualize the perfor-
mance of our few-shot learning pipeline, we computed the
Euclidean distances between pairs randomly sampled from
2MP from the three following populations: Embeddings from
the same device, embeddings from different devices that share
the same renderer string, and finally embeddings from dif-
ferent devices with different renderer strings. To eliminate
correlations between traces in the same collection, we used
only the first trace of each collection that we sampled from;
that is, we measured distances between traces from different
collections only. Figure 5 presents the probability density
of the different distributions. As the figure shows, embeddings
from the same device have a lower Euclidean distance compared
to embeddings from different devices, even if the devices have
the same GPU. Interestingly, embeddings from different devices
that share the same renderer string have a lower Euclidean
distance compared to different devices that do not share the
same renderer string. This confirms that DRAWNAPART indeed
fingerprints the GPU stack or an element correlated with the
GPU stack. We can also observe that if two traces have a
Euclidean distance of less than 0.65, we can be almost certain
that both traces came from the same device. This is a strong
property, and we use it in the next section to improve
FP-STALKER.
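The resulting decision rule is straightforward to apply; a minimal sketch, with the threshold read off Figure 5:

import numpy as np

SAME_DEVICE_THRESHOLD = 0.65  # from Figure 5

def likely_same_device(emb_a, emb_b):
    # Pairs of embedded traces closer than 0.65 almost certainly
    # originate from the same physical device.
    return float(np.linalg.norm(emb_a - emb_b)) < SAME_DEVICE_THRESHOLD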
C. Evaluation on additional browsers in the wild.
While approximately 93.8% of the traces found in our in-
the-wild dataset 2MP come from users running the Google
Chrome browser, some users submitted traces using other
Chromium-based browsers. We isolated non-Chrome users by
filtering the traces according to their user-agent, and analyzed
the effectiveness of our standalone machine learning pipeline
on these browsers as well. The non-Chrome traces came from
users running Edge, Opera and Yandex, which represented
5%, 0.7% and 0.5% of the traces, respectively. We ran the
evaluation pipeline described in Section VI-B for each browser
independently.
Fig. 5. Performance of the DRAWNAPART embedding function: probability density of the Euclidean distance between pairs of embeddings from the same device, from different devices sharing the same renderer string, and from different devices with different renderer strings. A Euclidean distance below 0.65 indicates that the traces are likely to be from the same device.
Our results show that the standalone pipeline's
accuracy for Edge, Opera and Yandex is 52.6%, 79.3%, and
89.7%, respectively. The smaller number of traces in this
subset of the data results in a higher base rate compared
to the entire 2MP dataset: 3% for Edge, 17.9% for Opera,
and 27.6% for Yandex. These results, together with the
lab-setting results, indicate that our fingerprinting technique
identifies browsers from multiple vendors. More details can
be found in Appendix D.
Summary. The results of the standalone evaluation, as
summarized in Table II, show a significant improvement over
the base rate, demonstrating that DRAWNAPART is effective
on its own. However, it can be observed that the classifier’s
effectiveness is significantly reduced in the k-shot model,
where the attacker has a limited trace budget to be used
for training. Putting these numbers into context is important.
In the world of browser fingerprinting, no single attribute
differentiates all devices. While some attributes are more
discriminating than others, it is their combination that is key
to differentiating one device from another. The standalone
evaluation of DRAWNAPART shows that our method has the
potential to significantly contribute to fingerprinting accuracy.
In the following subsection, we empirically measure this con-
tribution by using our method in conjunction with additional
fingerprinting attributes.
D. Integrating DRAWNAPART with FP-STALKER
FP-STALKER is the state-of-the-art fingerprint linking
algorithm [73]. In this section, we show that DRAWNAPART
can be used to improve the state of the art.
Hybrid Algorithm. FP-STALKER has two distinct algorithms:
one is entirely rule-based, while the other combines rule-based
constraints with machine learning. Vastel et al. demonstrated
that their hybrid variant yielded better results on their dataset,
but was slower than its rule-based counterpart. As we are
trying to prove the effectiveness of DRAWNAPART in a
real-world scenario, we chose to implement and optimize the
hybrid FP-STALKER algorithm, regardless of its speed.
FP-STALKER consists of: 1) a preprocessing step that
discards fingerprints that contain inconsistencies or have been
spoofed and cannot normally be found in the wild; 2) a training
phase, in which the Random Forest algorithm is trained on a
balanced dataset; and 3) an inference phase, in which the trained
model, combined with rules, compares incoming fingerprints
to a pool of previously classified fingerprints and attempts to
link them. Appendix E lists the linking algorithm.
Improving The Algorithm. As mentioned in Section VI-B,
the output of the embedding network consists of 256 L2-
normalized points that allow us to use a Euclidean distance to
compute the similarity between embeddings. Figure 5 shows
that the Euclidean distance is effective, to an extent, in differen-
tiating devices. Based on the results obtained in Section VI-B,
which show that DRAWNAPART can correctly classify devices
with an acceptable accuracy, we decided to introduce the
generated embeddings as a complement to the machine-learning
side of FP-STALKER. We note that the results of our nude
FP-STALKER cannot be fully compared to the results obtained
by Vastel et al., for two main reasons: 1) their dataset spans
a longer period than the dataset we use in our experiments;
2) Flash-related attributes no longer exist [1], impacting
FP-STALKER's effectiveness.
Integrating DRAWNAPART as a complement to FP-
STALKER's machine-learning model is motivated by the fact
that FP-STALKER applies a series of conditions to the output
of the Random Forest that makes its decisions too restrictive.
FP-STALKER's original code includes a function to optimize
the threshold used by the Random Forest, which we adapted
and ran on our dataset. The resulting threshold yielded similar
results, supporting our observation that the rules associated
with the output of the Random Forest are too restrictive and
discard too many fingerprints coming from the same browser
instance. On the other hand, Figure 5 shows that even though
Euclidean distances can be used to differentiate devices with a
relatively low threshold, their usage alone may yield an
unacceptable rate of false linkages, because a small percentage
of different devices have low Euclidean distances. To use
DRAWNAPART embeddings in FP-STALKER, we average the
seven embeddings that are collected with each fingerprint into
a single average embedding. We use the averaged embeddings
to compute the cosine similarity of the two compared
fingerprints. The resulting similarity is compared to a threshold
we chose based on an analysis of the training dataset; this
process is explained in the next paragraphs. If the similarity of
the two embeddings is above the chosen threshold, we classify
the fingerprint as similar to the one being compared, without
further steps. The algorithm with the DRAWNAPART additions
is available in Appendix E.
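A minimal sketch of this shortcut, assuming each fingerprint carries its seven per-trace embeddings as a (7, 256) array (names are placeholders; the epsilon value is the threshold chosen below):

import numpy as np

def average_embedding(trace_embeddings):
    # Collapse the seven embeddings collected with one fingerprint
    # into a single vector: (7, 256) -> (256,).
    return np.mean(trace_embeddings, axis=0)

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def shortcut_match(avg_a, avg_b, epsilon=0.15):
    # If the averaged embeddings are similar enough, link the two
    # fingerprints without consulting the Random Forest.
    return cosine_similarity(avg_a, avg_b) > epsilon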
Choosing The Epsilon Threshold. We chose the threshold
by analyzing similar and different devices in the training
dataset. We generate a balanced dataset from the training set
comprising the cosine similarities of same-device pairs and
different-device pairs, and compare different percentiles of
each group. As opposed to the Euclidean distance used in
Figure 5, we chose the cosine similarity for FP-STALKER
because it is bounded by the more natural interval [−1; 1].
Our experiments showed that our threshold on the cosine
similarity yielded better results than our Euclidean distance
threshold. Following our analysis, we noticed that the 5th
percentile of same-device similarities lies below 0.10.
Consequently, we chose a threshold of 0.15 in our experiments
to allow for a safety margin.
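One plausible reading of this selection procedure, as a sketch (the 0.05 margin reproduces the gap between the observed 0.10 percentile and the chosen 0.15 threshold, and is an assumption):

import numpy as np

def choose_epsilon(same_device_similarities, margin=0.05):
    # Take the 5th percentile of the cosine similarities between
    # fingerprints of the same device, then add a safety margin.
    return float(np.percentile(same_device_similarities, 5)) + margin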
Results. We executed our revisited FP-STALKER with its
DRAWNAPART addition on the dataset described in Sec-
tion VI-A. We first trained the Random Forest model on
fingerprints in the 1MP subset. We then executed the λ
optimization in order to run FP-STALKER with its optimal
parameters, as required by the original paper.
TABLE III. AVERAGE TRACKING TIME BY COLLECTION PERIOD
Collection Period   Nude FPS (days)   FPS+DA (days)   Improvement
2 days 17 26 +52.94%
3 days 17.25 25.5 +47.82%
4 days 17 28 +64.70%
5 days 17.5 27.5 +57.14%
6 days 18 30 +66.66%
7 days 17.5 28 +60.00%
Fig. 6. Differences in Average Tracking Time between FP-STALKER (Nude FPS) and FP-STALKER with DRAWNAPART (FPS+DA): percentage of browser instances (y-axis) versus average tracking duration in days (x-axis).
Finally, we executed the inference phase on 3MP, which is
unseen by the training phases of both FP-STALKER and the
embedding network. We executed both FP-STALKER without
our contribution and our revisited version with DRAWNAPART on
the same dataset for collection periods ranging from two to
seven days. Table III presents the average tracking duration
obtained for each collection period, with a top improvement of
66.66% over the original FP-STALKER for a collection
period of six days. Figure 6 presents the average tracking
duration with a collection period of seven days, as presented
in the original paper, which represents tracking a user who
visits a website once a week. As the figure shows, adding
DRAWNAPART to FP-STALKER increases the tracking time,
raising the median average tracking time by 10.5 days, from
17.5 days to 28 days. This is a substantial improvement
to stateless tracking, obtained through the use of our new
fingerprinting method, without making any changes to the
permission model or runtime assumptions of the browser
fingerprinting adversary. We believe it raises practical concerns
about the privacy of users being subjected to fingerprinting.
VII. DISCUSSION
A. Ethical Concerns
We integrated our fingerprinting algorithm into the Chrome
browser extension from the AMIUNIQUE crowd-sourced ex-
periment in January 2021. On the installation page, users are
informed of its purpose and of the data that is collected. To
safeguard users’ privacy, collected traces are only associated
with a random identifier created when the extension is in-
stalled, and participants can delete all their data by submitting
their extension ID. Out of an abundance of caution, we decided
not to publish the weights of the triplet loss model trained on
these users, since they could enable an attacker to track them.
The extension and the handling of collected data conform to
the IRB recommendations we received.
B. Fingerprinting countermeasures
Countermeasures can be divided into three groups.
Blocking Scripts. Filter lists block resources known to be
a threat to user privacy. This is the case for Brave's Shields
mechanism [3] and for extensions such as Ghostery [7] or
Privacy Badger [9]. However, filter lists against trackers and
fingerprinting have been shown to lack exhaustiveness [41,47].
API Blocking. Tor Browser, by default, and Firefox, with
specific configuration, prevent web pages from reading out
the contents of the canvas for privacy reasons. Our technique
does not examine the canvas content, but rather measures the
time required to draw different graphics primitives. Snyder
et al. [70] consider the WebGL specification a “low-benefit,
high-cost standard”, which is required by less than 1% of
the Alexa Top 10k websites. This may lead some people to
consider the extreme option of completely blocking WebGL
as a possible way of preventing GPU fingerprinting. Disabling
WebGL, however, would have a non-negligible usability cost,
especially considering that many major websites rely on it,
including Google Maps, Microsoft Office Online, Amazon and
IKEA. As a form of compromise, we note that Tor Browser
currently runs WebGL in a “minimum capability mode”, which
allows some WebGL functionality while preventing access to
the ANGLE_instanced_arrays API used by our attack.
Changing Attribute Values. Defenses can change an attribute
value either to make it similar to common values shared by a
large proportion of users, or to add noise to it. For example, Tor
Browser unifies the values of many attributes for all users so
that their fingerprint is identical, and some browser extensions
add noise to rendered canvas images [72]. Wu et al. [76]
introduced a countermeasure that removes the differences
in floating-point operations during the rendering process, in
order to eliminate differences in the rendered WebGL output.
Blurring defenses on canvas and WebGL focus on changing
values. Our technique does not directly rely on differences
in the images produced by the rendering process, and is
therefore not affected by the countermeasure of Wu et al. [76].
Three elements are crucial to our fingerprinting technique:
the ability to issue drawing operations in parallel; the graphics
stack's tendency to deterministically choose which EU will
render each vertex; and the ability to measure the time it takes
to render. Disrupting any of these elements could affect the
accuracy of our technique.
Preventing Parallel Execution. To block our method, the
graphics stack could limit each web page to a single EU, or
disable hardware-accelerated rendering altogether and use a
deterministic software-only pipeline [76]. However, this would
severely affect usability and responsiveness, because WebGL
is built around massive parallelism. Moreover, existing graphics
APIs do not currently support partitioning execution onto a
subset of EUs.
Preventing Deterministic Dispatching. Adding a random-
ization step to the GPU’s dispatcher would make it impossible
for the web page to choose which EU receives which vertex.
Assuming the dispatcher still attempts to fill up all available
EUs, the effect on performance can be minimized. We note
that this countermeasure is not perfect, since a permuted trace
still contains data about the system being fingerprinted.
Preventing Time Measurements. Countermeasures that re-
duce, or even disable, the availability of timer APIs can affect
our technique, but completely blocking timing measurements
from the web is known to be a futile task [68,69].
C. Limitations and Insights
Experimental Limitations. The in-the-wild, crowd-sourced
experiments demonstrate that DRAWNAPART can work suc-
cessfully in a variety of conditions that are not under the
attacker's control. However, our lab experiments only cover
a limited set of conditions. Specifically, we only evaluated the
impact of temperatures between 26.4 °C and 37 °C, demon-
strating no impact on the results. Hence, we cannot preclude
the possibility that temperatures outside this range affect
the results. Similarly, our lab experiments do not control for
GPU voltage variations, which could affect our fingerprinting
capability. These limitations notwithstanding, the results of
the crowd-sourced experiments do provide confidence that
DRAWNAPART is effective in normal operating conditions.
Approach Limitations. We evaluate the effect of device
restarts on fingerprinting accuracy by training a model on the
GEN 3 devices, and testing the model against traces collected
after rebooting the devices. We obtain an overall accuracy of
50.3%. We observe that the accuracy drop is not uniform. That
is, some devices maintain stable fingerprints across restarts,
whereas the fingerprints of others change significantly with
each restart. We note that we do not track reboots in our
in-the-wild experiments; hence, the in-the-wild results already
account for the potential accuracy drop associated with restarts.
We evaluate our technique across ten Chrome versions,
from 80.0.3987.116 to 81.0.4044.138. These ten versions
consist of two groups: the v80 group, which includes six minor
versions, and the v81 group, which includes four minor versions.
We train our classifier on the latest v80 version (80.0.3987.163)
and test all ten versions. We obtain an accuracy of around
90% on all v80 versions, but a significantly lower accuracy,
of around 60%, when we test the trained model on v81. We
hypothesize that changes in Chrome between v80 and v81
affected the entire WebGL stack. Observing the changelog
for the Chromium code repository reveals more than 10,000
commits between the two versions, with several hundred
affecting the GPU and the WebGL API [14]. An additional
experiment we conducted shows that an attacker with a limited
trace capture budget can maintain an up-to-date classification
model by training a combined model with traces from multiple
versions, obtaining a consistently high accuracy of approximately
90% across all ten versions.
D. Future Work
In-depth Root Cause Analysis. We shared our work with
a committee of WebGL experts in an effort to investigate
the root cause of DRAWNAPART. They acknowledged that
the results reported in the paper offer insight into the track-
ing implications that WebGL can introduce, and that our
method can highlight differences introduced by the hardware
manufacturing process. They proposed additional hypotheses
for the mechanism through which manufacturing variations
enable DRAWNAPART. Specifically, the two proposals are:
1) DRAWNAPART might be uncovering differences in power
consumption. A study by von Kistowski et al. [75] noticed
differences in power consumption from identical CPUs under
the same load, but it remains to be seen if and how this
could translate to GPUs and WebGL. 2) The effect might be
induced by a difference in the response to temperature curves.
Validating either hypothesis requires detailed knowledge of the
design and the manufacturing process, which is only available
to the manufacturers, and is likely beyond the scope of
typical academic research.
Next-Generation GPU APIs. DRAWNAPART currently only
uses the WebGL API, limiting its speed and accuracy. Upcoming
web-based compute-specific GPU interfaces may allow
for far more efficient fingerprinting. There are two compute-
specific GPU APIs for web browsers: WebGL 2.0 Compute and
WebGPU. WebGL 2.0 Compute was integrated into Chrome
but disabled in 2020 [65], and its development has been
subsumed by WebGPU [49]. WebGPU is currently under active
development, and is not supported in the stable edition of any
browser, but preliminary implementations can be found in the
canary versions of Firefox, Chrome, and Edge.
These APIs introduce compute shaders, a form of com-
putational pipeline that coexists with the existing graphics
pipeline. One significant feature offered to compute shaders
is the ability to synchronize among different work units, by
using atomic functions, message queueing or shared memory.
We used this synchronization capability to prototype a faster
fingerprinting technique for WebGL 2.0 Compute. In our pro-
totype, all workers race to acquire a mutex, and we record the
order in which the different work units were granted the mutex.
We tested this fingerprinting technique on our GEN 3 corpus,
after enabling WebGL 2.0 Compute support in Chrome through
a command-line parameter. This compute-based fingerprint
delivered a near-perfect classification accuracy of 98%, while
taking only 150 milliseconds to run, much faster than the
onscreen fingerprint, which took a median time of 8 seconds
to collect. We believe that a similar method can also be found
for the WebGPU API once it becomes generally available. The
effects of accelerated compute APIs on user privacy should be
considered before they are enabled globally.
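To illustrate the idea behind the prototype, the following CPU-thread analogue records the order in which racing workers acquire a lock; the actual prototype runs inside a WebGL 2.0 Compute shader using atomic operations, which we do not reproduce here.

import threading

def acquisition_order(num_workers=64):
    # All workers are released at once by a barrier and race for a
    # lock; the resulting acquisition order forms the ordering trace.
    # On a GPU, this permutation reflects per-EU timing differences.
    lock = threading.Lock()
    order = []
    barrier = threading.Barrier(num_workers)

    def worker(worker_id):
        barrier.wait()
        with lock:
            order.append(worker_id)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order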
VIII. RELATED WORK
Web-based Fingerprinting. Eckersley [39] was the first to
show that it is possible to fingerprint browsers based on their
features and configurations. Mowery et al. [56] classified web
fingerprinting use as constructive or destructive. Constructive
fingerprinting can detect bots [27,48,74], or help to protect
sign-in processes [20,51]. Conversely, destructive use can
track users and their browsing habits. Many browser at-
tributes are considered parts of a browser fingerprint, including
navigator and screen properties [39,52], font enumera-
tion [59], audio rendering [40], and the WebGL canvas [57].
These techniques are all unable to tell apart identical devices.
Mobile Fingerprinting. Mobile devices have less hardware
and software diversity compared to desktops [43]. However,
they possess additional fingerprinting sources such as sen-
sors [25,33,36,37,78], microphones [23,30,34,35,79]
and cameras [22,32]. Manufacturing variations can also man-
ifest as differences in the radio frequency (RF) behavior of
networked devices [18,19]. These techniques are tailored to
mobile and RF environments, while our technique works in all
browsers that support WebGL, without requiring permissions,
additional sensors or RF hardware.
Physically Unclonable Functions. The silicon-based physi-
cally unclonable function (PUF) concept is based on the idea
that, even if a set of several integrated circuits is created
through an identical manufacturing process, each circuit is ac-
tually slightly different due to normal manufacturing variabil-
ity. This variability can be used as a unique device fingerprint
based on hardware. Examples of silicon PUF sources include
logic race conditions [44,71], Rowhammer behavior [21], and
SRAM initialization data [45,46]. Rührmair et al. [64] defined
a fingerprint as “a small, fixed set of unique analog properties”,
and explain that the fingerprint should be measured quickly and
preferably by an inexpensive device. In this work, the GPU is
used as a PUF, and our challenge is to successfully capture
the PUF behavior using the limited APIs available to a
web browser.
IX. CONCLUSION
We introduced an effective technique to create a browser
fingerprint that relies on minor manufacturing variations in
GPUs. To the best of our knowledge, this is the first time
hardware features have been used to challenge privacy in this
context. Our fingerprinting technique can tell apart devices
that are completely indistinguishable by current state-of-the-
art methods, while remaining robust to changing environmental
conditions. Our technique works well both on PCs and mobile
devices, has a practical offline and online runtime, and does
not require access to any extra sensors such as the microphone,
camera, or gyroscope.
Processor designs are increasingly relying on massively
parallel architectures to improve performance without breaking
the physically-imposed constraints of power consumption and
processor speed. As the capabilities of GPU hardware become
increasingly exposed to untrusted web applications through
APIs such as WebGPU, hardware and software designers
must be aware of the risks to privacy raised by hardware
fingerprinting, and take care to design software, drivers and
hardware stacks in ways that protect user privacy.
Responsible Disclosure. We shared a preliminary draft of
our paper with Intel, ARM, Google, Mozilla and Brave during
June-July 2020 and continued sharing our progress with them
throughout 2020 and 2021. In response to the disclosure, the
Khronos group responsible for the WebGL specification has
established a technical study group to discuss the disclosure
with browser vendors and other stakeholders.
Artifact Availability. The JavaScript and GLSL collection
code for the online, offline and GPU-based methods, the ma-
chine learning pipeline, as well as the GEN 3, GEN 4, GEN 8
and GEN 10 datasets, are all available in the following reposi-
tory: https://github.com/drawnapart/drawnapart. The repository
includes an interactive Python notebook, viewable over the
web, that demonstrates classification over real data.
ACKNOWLEDGMENTS
This research has been supported by ANR-19-CE39-0007
MIAOUS; ANR-19-CE39-00201 FP-Locker projects; an ARC
Discovery Early Career Researcher Award DE200101577; an
ARC Discovery Project number DP210102670; the Blavatnik
ICRC at Tel-Aviv University; Intel Corporation; and Israel
Science Foundation.
We thank Gil Fidel, Anatoly Shusterman and Antoine
Vastel for their advice and help. We are grateful to the
BGU SISE technical support engineers Vitaly Shapira and
Sergey Korotchenko for their help in setting up the evaluation
test-beds. Experiments presented in this paper were carried
out using the Grid’5000 testbed, supported by a scientific
interest group hosted by Inria and including CNRS, RENATER
and several Universities as well as other organizations (see
https://www.grid5000.fr). Parts of this work were carried out
while Yuval Yarom was affiliated with CSIRO’s Data61.
BIBLIOGRAPHY
[1] “Adobe’s end of life,” https://www.adobe.com/products/
flashplayer/end-of-life.html.
[2] “Brave,” https://brave.com/.
[3] “Brave’s shields,” https://support.brave.com/hc/en-
us/articles/360022973471-What-is-Shields-.
[4] “California Consumer Privacy Act (CCPA),” https://oag.
ca.gov/privacy/ccpa.
[5] “Edge,” https://www.microsoft.com/en-us/edge.
[6] “FingerprintJS,” https://valve.github.io/fingerprintjs2/.
[7] “Ghostery,” https://www.ghostery.com/.
[8] “Opera,” https://www.opera.com/.
[9] “Privacy Badger,” https://privacybadger.org/.
[10] “Samsung Remote Test Lab,” https://developer.samsung.
com/remotetestlab.
[11] “scikit-optimize: Sequential model-based optimization in
Python,” https://scikit-optimize.github.io/stable/.
[12] “WebGL,” https://www.khronos.org/webgl/.
[13] “Yandex browser,” https://browser.yandex.ru/beta/.
[14] “Changelog from v80.0.3987.163 to v81.0.4044.92
– chromium git repository,” https://chromium.googlesource.com/
chromium/src/+log/80.0.3987.163..81.0.4044.92?pretty=fuller&n=10000,
2020.
[15] “gpu_timing.cc – chromium code search,”
https://source.chromium.org/chromium/chromium/
src/+/master:ui/gl/gpu_timing.cc;l=309;drc=
e5a38eddbdf45d7563a00d019debd11b803af1bb, 2021.
[16] “Site isolation – the chromium projects,” https://www.
chromium.org/Home/chromium-security/site-isolation,
2021.
[17] G. Acar, C. Eubank, S. Englehardt, M. Juárez,
A. Narayanan, and C. Díaz, “The web never forgets:
Persistent tracking mechanisms in the wild,” in CCS,
2014, pp. 674–689.
[18] I. Agadakos, N. Agadakos, J. Polakis, and M. R. Amer,
“Chameleons’ oblivion: Complex-valued deep neural net-
works for protocol-agnostic RF device fingerprinting,” in
EuroS&P, 2020, pp. 322–338.
[19] A. Al-Shawabka, F. Restuccia, S. D’Oro, T. Jian, B. C.
Rendon, N. Soltani, J. G. Dy, S. Ioannidis, K. R.
Chowdhury, and T. Melodia, “Exposing the fingerprint:
Dissecting the impact of the wireless channel on radio
fingerprinting,” in INFOCOM, 2020, pp. 646–655.
[20] F. Alaca and P. C. van Oorschot, “Device fingerprinting
for augmenting web authentication: classification and
analysis of methods,” in ACSAC, 2016, pp. 289–301.
[21] N. A. Anagnostopoulos, T. Arul, Y. Fan, C. Hatzfeld,
A. Schaller, W. Xiong, M. Jain, M. U. Saleem,
J. Lotichius, S. Gabmeyer, J. Szefer, and S. Katzen-
beisser, “Intrinsic run-time row hammer PUFs: Leverag-
ing the row hammer effect for run-time cryptography and
improved security,” Cryptography, vol. 2, no. 3, p. 13,
2018.
[22] Z. Ba, S. Piao, X. Fu, D. Koutsonikolas, A. Mohaisen,
and K. Ren, “ABC: enabling smartphone authentication
with built-in camera,” in NDSS, 2018.
[23] G. Baldini and I. Amerini, “Smartphones identification
through the built-in microphones with convolutional neu-
ral network,” IEEE Access, vol. 7, pp. 158 685–158 696,
2019.
[24] K. Boda, Á. M. Földes, G. G. Gulyás, and S. Imre, “User
tracking on the web via cross-browser fingerprinting,” in
NordSec, 2011, pp. 31–46.
[25] H. Bojinov, Y. Michalevsky, G. Nakibly, and D. Boneh,
“Mobile device identification via sensor fingerprinting,”
CoRR, vol. abs/1408.1416, 2014. [Online]. Available:
http://arxiv.org/abs/1408.1416
[26] L. Breiman, “Random forests,” Mach. Learn., vol. 45,
no. 1, pp. 5–32, 2001.
[27] E. Bursztein, A. Malyshev, T. Pietraszek, and K. Thomas,
“Picasso: Lightweight device class fingerprinting for web
clients,” in SPSM@CCS, 2016, pp. 93–102. [Online].
Available: http://dl.acm.org/citation.cfm?id=2994467
[28] Y. Cao, S. Li, and E. Wijmans, “(cross-)browser finger-
printing via OS and hardware level features,” in NDSS,
2017.
[29] A. Christensen, “Reduce resolution of performance.now,”
https://developer.mozilla.org/en-US/docs/Web/API/
Performance/now, 2015.
[30] W. B. Clarkson, “Breaking assumptions: Distinguishing
between seemingly identical items using cheap sensors,”
Ph.D. dissertation, Princeton, 2012.
[31] European Commission, “General Data Protection Regula-
tion (GDPR),” https://ec.europa.eu/info/law/law-topic/
data-protection/eu-data-protection-rules_en.
[32] D. Cozzolino and L. Verdoliva, “Noiseprint: A CNN-
based camera model fingerprint,” IEEE TIFS, vol. 15,
pp. 144–159, 2020.
[33] A. Das, G. Acar, N. Borisov, and A. Pradeep, “The web’s
sixth sense: A study of scripts accessing smartphone
sensors,” in CCS, 2018, pp. 1515–1532.
[34] A. Das and N. Borisov, “Poster: Fingerprinting smart-
phones through speaker,” in Poster at the IEEE Security
and Privacy Symposium, 2014.
[35] A. Das, N. Borisov, and M. Caesar, “Do you hear what
I hear?: Fingerprinting smart devices through embedded
acoustic components,” in CCS, 2014, pp. 441–452.
[36] A. Das, N. Borisov, and E. Chou, “Every move you make:
Exploring practical issues in smartphone motion sensor
fingerprinting and countermeasures,” PoPETs, vol. 2018,
no. 1, pp. 88–108, 2018.
[37] S. Dey, N. Roy, W. Xu, R. R. Choudhury, and S. Nelaku-
diti, “Accelprint: Imperfections of accelerometers make
smartphones trackable,” in NDSS, 2014.
[38] A. Durey, P. Laperdrix, W. Rudametkin, and R. Rouvoy,
“FP-Redemption: Studying browser fingerprinting adop-
tion for the sake of web security,” in DIMVA, 2021, pp.
237–257.
[39] P. Eckersley, “How unique is your web browser?” in
PETS, 2010, pp. 1–18.
[40] S. Englehardt and A. Narayanan, “Online tracking: A 1-
million-site measurement and analysis,” in CCS, 2016,
pp. 1388–1401.
[41] I. Fouad, N. Bielova, A. Legout, and N. Sarafijanovic-
Djukic, “Missed by filter lists: Detecting unknown third-
party trackers with invisible pixels,” PoPETs, vol. 2020,
no. 2, pp. 499–518, 2020.
[42] P. Frigo, C. Giuffrida, H. Bos, and K. Razavi, “Grand
pwning unit: Accelerating microarchitectural attacks with
the GPU,” in IEEE SP, 2018, pp. 195–210.
[43] A. Gómez-Boix, P. Laperdrix, and B. Baudry, “Hiding
in the crowd: an analysis of the effectiveness of browser
fingerprinting at large scale,” in WWW, 2018, pp. 309–
318.
[44] C. Herder, M. M. Yu, F. Koushanfar, and S. Devadas,
“Physical unclonable functions and applications: A tuto-
rial,” Proceedings of the IEEE, vol. 102, no. 8, pp. 1126–
1141, 2014.
[45] D. E. Holcomb, W. P. Burleson, and K. Fu, “Initial
SRAM state as a fingerprint and source of true random
numbers for RFID tags,” in Proceedings of the Confer-
ence on RFID Security, vol. 7, no. 2, 2007, p. 01.
[46] ——, “Power-up SRAM state as an identifying finger-
print and source of true random numbers,” IEEE Trans.
Computers, vol. 58, no. 9, pp. 1198–1210, 2009.
[47] U. Iqbal, S. Englehardt, and Z. Shafiq, “Fingerprinting the
fingerprinters: Learning to detect browser fingerprinting
behaviors,” in IEEE SP, 2021, pp. 283–301.
[48] H. Jonker, B. Krumnow, and G. Vlot, “Fingerprint
surface-based detection of web bot detectors,” in ES-
ORICS, 2019, pp. 586–605.
[49] G. Kenneth Russell, personal communication.
[50] D. Kristol and L. Montulli, “HTTP state management
mechanism,” Internet Requests for Comments, RFC Ed-
itor, RFC 2109, Feb. 1997.
[51] P. Laperdrix, G. Avoine, B. Baudry, and N. Nikiforakis,
“Morellian analysis for browsers: Making web authen-
tication stronger with canvas fingerprinting,” in DIMVA,
2019, pp. 43–66.
[52] P. Laperdrix, W. Rudametkin, and B. Baudry, “Beauty
and the beast: Diverting modern web browsers to build
unique browser fingerprints,” in IEEE SP, 2016, pp. 878–
894.
[53] J. W. Lee, D. Lim, B. Gassend, G. E. Suh, M. van Dijk,
and S. Devadas, “A technique to build a secret key in
integrated circuits with identification and authentication
applications,” in Proceedings of the IEEE VLSI Cir-
cuits Symposium, 2004, pp. 176–179.
[54] A. Liaw and M. Wiener, “Classification and regression by
randomForest,” R news, vol. 2, no. 3, pp. 18–22, 2002.
[55] M. Moenig, “Issue 820891: Webgl2:
EXT_disjoint_timer_query_webgl2 failing in beta
of 65,” https://bugs.chromium.org/p/chromium/issues/
detail?id=820891, 2018.
[56] K. Mowery, D. Bogenreif, S. Yilek, and H. Shacham,
“Fingerprinting information in JavaScript implementa-
tions,” in Proceedings of W2SP, vol. 2, no. 11, 2011.
[57] K. Mowery and H. Shacham, “Pixel perfect: Fingerprint-
ing canvas in HTML5,” W2SP, pp. 1–12, 2012.
[58] G. Nakibly, G. Shelef, and S. Yudilevich,
“Hardware fingerprinting using HTML5,” CoRR,
vol. abs/1503.01408, 2015. [Online]. Available:
http://arxiv.org/abs/1503.01408
[59] N. Nikiforakis, A. Kapravelos, W. Joosen, C. Kruegel,
F. Piessens, and G. Vigna, “Cookieless monster: Explor-
ing the ecosystem of web-based device fingerprinting,”
in IEEE SP, 2013, pp. 541–555.
[60] M. Perry, “Bug 1517: Reduce precision of time for
JavaScript,” https://gitweb.torproject.org/user/mikeperry/
tor-browser.git/commit/?h=bug1517, 2015.
[61] S. Picek, “Challenges in deep learning-based profiled
side-channel analysis,” in SPACE, 2019, pp. 9–12.
[62] C. Project, “window.performance.now does not
support sub-millisecond precision on Windows,”
https://bugs.chromium.org/p/chromium/issues/detail?id=
158234#c110, 2016.
[63] T. Rokicki, C. Maurice, and P. Laperdrix, “SoK: In
Search of Lost Time: A Review of JavaScript Timers in
Browsers,” in 6th IEEE European Symposium on Security
and Privacy (EuroS&P’21), Vienna, Austria, Sep. 2021.
[Online]. Available: https://hal.inria.fr/hal-03215569
[64] U. Rührmair, S. Devadas, and F. Koushanfar, “Security
based on physical unclonability and disorder,” in Intro-
duction to Hardware Security and Trust. Springer, 2012,
pp. 65–102.
[65] K. Russell, “Issue 859249: Extend WebGL
2.0 compute flag expiry to M85,” https:
//chromium.googlesource.com/chromium/src.git/+/
96186af9c385db253bf85f06f1324a729684cb2f, 2020.
[66] I. Sánchez-Rola, I. Santos, and D. Balzarotti, “Clock
around the clock: Time-based device fingerprinting,” in
CCS, 2018, pp. 1502–1514.
[67] F. Schroff, D. Kalenichenko, and J. Philbin, “FaceNet: A
unified embedding for face recognition and clustering,”
in CVPR, 2015, pp. 815–823.
[68] M. Schwarz, C. Maurice, D. Gruss, and S. Mangard,
“Fantastic timers and where to find them: High-resolution
microarchitectural attacks in JavaScript,” in Financial
Cryptography and Data Security, 2017, pp. 247–267.
[69] A. Shusterman, A. Agarwal, S. O’Connell, D. Genkin,
Y. Oren, and Y. Yarom, “Prime+Probe 1, JavaScript
0: Overcoming browser-based side-channel defenses,” in
USENIX Security, 2021.
[70] P. Snyder, C. B. Taylor, and C. Kanich, “Most websites
don’t need to vibrate: A cost-benefit approach to improv-
ing browser security,” in CCS, 2017.
[71] G. E. Suh and S. Devadas, “Physical unclonable functions
for device authentication and secret key generation,” in
DAC, 2007, pp. 9–14.
[72] A. Vastel, P. Laperdrix, W. Rudametkin, and R. Rouvoy,
“Fp-Scanner: The privacy implications of browser
fingerprint inconsistencies,” in USENIX Security, 2018,
pp. 135–150. [Online]. Available: https://www.usenix.
org/conference/usenixsecurity18/presentation/vastel
[73] ——, “FP-Stalker: tracking browser fingerprint evolu-
tions,” in IEEE SP, 2018, pp. 728–741.
[74] A. Vastel, W. Rudametkin, R. Rouvoy, and X. Blanc,
“FP-Crawlers: Studying the resilience of browser finger-
printing to block crawlers,” in MADWeb, 2020.
[75] J. von Kistowski, H. Block, J. Beckett, C. Spradling,
K.-D. Lange, and S. Kounev, “Variations in CPU power
consumption,” in Proceedings of the 7th ACM/SPEC
International Conference on Performance Engineering
(ICPE ’16). New York, NY, USA: Association for
Computing Machinery, 2016, pp. 147–158. [Online].
Available: https://doi.org/10.1145/2851553.2851567
[76] S. Wu, S. Li, Y. Cao, and N. Wang, “Rendered private:
Making GLSL execution uniform to prevent WebGL-
based browser fingerprinting,” in USENIX Security, 2019,
pp. 1645–1660.
[77] B. Zbarsky, “Clamp the resolution of performance.now()
calls to 5us,” https://hg.mozilla.org/integration/mozilla-
inbound/rev/48ae8b5e62ab, 2015.
[78] J. Zhang, A. R. Beresford, and I. Sheret, “SensorID:
Sensor calibration fingerprinting for smartphones,” in
IEEE SP, 2019, pp. 638–655.
[79] Z. Zhou, W. Diao, X. Liu, and K. Zhang, “Acoustic fin-
gerprinting revisited: Generate stable device ID stealthily
with inaudible sound,” in CCS, 2014, pp. 429–440.
APPENDIX A
EVALUATION OF GEN 3, GEN 4, AND GEN 8 DEVICES
We report the complete evaluation for 600 traces, 20 points,
and 5 iterations per point in the online setting, for the GEN 3
(Table IV), GEN 4 (Table V), and GEN 8 (Table VI) datasets.
TABLE IV. EVALUATION FOR THE GEN 3 DATASET, DEPENDING ON THE OPERATORS. THE BASELINE IS 10%
Operator   Accuracy   Median time (ms)
mul 93.5% ±0.7% 3,097
sinh 89.5% ±1.4% 8,757
abs 88.4% ±1.0% 4,532
pow 87.5% ±0.3% 4,663
log 87.1% ±1.2% 9,839
exp2 87.0% ±0.6% 7,532
shl 86.3% ±1.7% 6,799
atanh 81.8% ±1.4% 11,184
inversesqrt 80.1% ±0.7% 6,799
trunc 67.1% ±1.4% 1,667
TABLE V. EVALUATION FOR THE GEN 4 DATASET, DEPENDING ON THE OPERATORS. THE BASELINE IS 4%
Operator   Accuracy   Median time (ms)
mul 32.7% ±0.3% 6,361
abs 29.3% ±0.5% 3,295
shl 28.7% ±0.4% 6,483
inversesqrt 28.2% ±0.7% 6,485
exp2 27.8% ±1.0% 6,528
trunc 26.6% ±1.1% 3,161
log 25.3% ±1.0% 7,673
pow 23.3% ±0.6% 9,370
sinh 19.5% ±0.5% 8,953
atanh 19.1% ±0.6% 10,099
TABLE VI. EVALUATION FOR THE GEN 8 DATASET, DEPENDING ON THE OPERATORS. THE BASELINE IS 6%
Operator   Accuracy   Median time (ms)
exp2 43.6% ±1.2% 3,172
inversesqrt 39.4% ±0.9% 3,181
pow 36.6% ±1.4% 4,698
log 33.6% ±0.5% 3,299
sinh 32.4% ±0.9% 4,569
abs 30.9% ±0.6% 3,174
mul 30.9% ±1.0% 3,173
atanh 30.6% ±0.5% 5,935
trunc 28.7% ±0.6% 3,174
shl 26.9% ±1.1% 3,172
APPENDIX B
DETERMINISTIC ATTRIBUTES COLLECTED FOR THE IN-THE-WILD DATASET
- cookies and session support,
- HTTP headers: [Accept, Accept-Encoding, Language, User-Agent],
- navigator: [DNT, platform, plugins],
- screen: [width, height],
- timezone,
- WebGL: [vendor, renderer]
APPENDIX C
SELECTED HYPERPARAMETERS
Table VII summarizes the hyperparameters for the classi-
fiers used in this work.
TABLE VII. HYPERPARAMETERS FOR THE CNN CLASSIFIER
Hyperparameter Value Space
Embedding size 256 32–256
Number of convolution blocks 3 1–10
Batch size 32 32–1024
Convolution filter size 128 8–128
Convolution kernel size 4 2–5
Dropout rate 0.119510 0–0.5
Activation relu relu, sigmoid
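For illustration, a Keras sketch consistent with the Table VII values; the input shape, pooling layers and exact layer ordering are assumptions, not taken from the paper.

import tensorflow as tf

def build_classifier(input_length, num_classes, blocks=3, filters=128,
                     kernel=4, dropout=0.119510, embedding_size=256):
    # CNN classifier matching the hyperparameters of Table VII.
    inp = tf.keras.Input(shape=(input_length, 1))
    x = inp
    for _ in range(blocks):
        x = tf.keras.layers.Conv1D(filters, kernel, padding="same",
                                   activation="relu")(x)
        x = tf.keras.layers.MaxPooling1D()(x)
        x = tf.keras.layers.Dropout(dropout)(x)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(embedding_size, activation="relu")(x)
    out = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inp, out)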
APPENDIX D
EVALUATION OF THE STANDALONE PIPELINE ON ADDITIONAL BROWSERS IN THE WILD
TABLE VIII. STANDALONE PERFORMANCE OF DRAWNAPART OVER MULTIPLE BROWSERS
Browser   Top-1 Accuracy (Base rate)   Top-5 Accuracy (Base rate)   Top-10 Accuracy (Base rate)
Chrome 24.31% (0.7%) 49.12% (2.9%) 60.80% (4.7%)
Edge 52.60% (2.9%) 85.48% (15.6%) 93.86% (29.7%)
Opera 79.28% (17.9%) 99.41% (50.7%) 100.0% (77.5%)
Yandex 89.69% (27.6%) 98.36% (85.9%) 99.76% (94.1%)
APPENDIX E
FP-STALKER HYBRID ALGORITHM WITH DRAWNAPART ADDITION

Algorithm 1: Hybrid matching algorithm, with the DRAWNAPART addition (highlighted in red in the original) marked by comments

Function FingerprintMatching(F, fu, λ, ε):
    for fk ∈ F do
        if FingerPrintHasDifferences(fk, fu, rules) then
            Fksub ← Fksub ∪ ⟨fk⟩
        else
            exact ← exact ∪ ⟨fk⟩
        end
    end
    if |exact| > 0 then
        if SameIds(exact) then return exact[0].id
        else return GenerateNewId()
    end
    for fk ∈ Fksub do
        // begin DRAWNAPART addition
        cosine_sim ← GetSimilarity(fu.avg_embedding, fk.avg_embedding)
        if cosine_sim > ε then
            return fk.id
        end
        // end DRAWNAPART addition
        ⟨x1, x2, ..., xm⟩ ← FeatureVector(fu, fk)
        p ← P(fu.id = fk.id | ⟨x1, x2, ..., xm⟩)
        if p ≥ λ then
            candidates ← candidates ∪ ⟨fk, p⟩
        end
    end
    if |candidates| > 0 then
        if |GetRankAndFilter(candidates)| > 0 then
            return candidates[0].id
    end
    return GenerateNewId()