Fingerprinting the Fingerprinters:
Learning to Detect Browser Fingerprinting Behaviors
Umar Iqbal
The University of Iowa
Steven Englehardt
Mozilla Corporation
Zubair Shafiq
University of California, Davis
Abstract—Browser fingerprinting is an invasive and opaque
stateless tracking technique. Browser vendors, academics, and
standards bodies have long struggled to provide meaningful
protections against browser fingerprinting that are both ac-
curate and do not degrade user experience. We propose FP-
INS PECTO R, a machine learning based syntactic-semantic ap-
proach to accurately detect browser fingerprinting. We show
that FP-INSP EC TOR performs well, allowing us to detect 26%
more fingerprinting scripts than the state-of-the-art. We show
that an API-level fingerprinting countermeasure, built upon
FP-INS PECTO R, helps reduce website breakage by a factor of
2. We use FP-IN SPECT OR to perform a measurement study
of browser fingerprinting on top-100K websites. We find that
browser fingerprinting is now present on more than 10% of the
top-100K websites and over a quarter of the top-10K websites. We
also discover previously unreported uses of JavaScript APIs by
fingerprinting scripts suggesting that they are looking to exploit
APIs in new and unexpected ways.
I. INTRODUCTION
Mainstream browsers have started to provide built-in protec-
tion against cross-site tracking. For example, Safari [19] now
blocks all third-party cookies and Firefox [95] blocks third-
party cookies from known trackers by default. As mainstream
browsers implement countermeasures against stateful tracking,
there are concerns that it will encourage trackers to migrate
to more opaque, stateless tracking techniques such as browser
fingerprinting [83]. Thus, mainstream browsers have started to
explore mitigations for browser fingerprinting.
Some browsers and privacy tools have tried to mitigate
browser fingerprinting by changing the JavaScript API surface
exposed by browsers to the web. For example, privacy-oriented
browsers such as the Tor Browser [32], [64] have restricted
access to APIs such as Canvas and WebRTC, that are known
to be abused for browser fingerprinting. However, such blanket
API restriction has the side effect of breaking websites that use
these APIs to implement benign functionality.
Mainstream browsers have so far avoided deployment of
comprehensive API restrictions due to website breakage con-
cerns. As an alternative, some browsers—Firefox in partic-
ular [52]—have tried to mitigate browser fingerprinting by
blocking network requests to browser fingerprinting services
[12]. However, this approach relies heavily on manual analysis
and struggles to restrict fingerprinting scripts that are served
from first-party domains or dual-purpose third parties, such
as CDNs. Englehardt and Narayanan [54] manually designed
heuristics to detect fingerprinting scripts based on their exe-
cution behavior. However, this approach relies on hard-coded
heuristics that are narrowly defined to avoid false positives and
must be continually updated to capture evolving fingerprinting
and non-fingerprinting behaviors.
We propose FP-IN SP EC TOR, a machine learning based
approach to detect browser fingerprinting. FP-INS PE CTOR
trains classifiers to learn fingerprinting behaviors by extracting
syntactic and semantic features through a combination of static
and dynamic analysis that complement each others’ limita-
tions. More specifically, static analysis helps FP-IN SP EC TO R
overcome the coverage issues of dynamic analysis, while
dynamic analysis overcomes the inability of static analysis to
handle obfuscation.
Our evaluation shows that FP-IN SP EC TO R detects fin-
gerprinting scripts with 99.9% accuracy. We find that FP-
INS PE CTOR detects 26% more fingerprinting scripts than
manually designed heuristics [54]. Our evaluation shows that
FP-INS PE CTOR helps significantly reduce website breakage.
We find that targeted countermeasures that leverage FP-
INS PE CTOR’s detection reduce breakage by a factor 2 on
websites that are particularly prone to breakage.
We deploy FP-IN SP EC TOR to analyze the state of browser
fingerprinting on the web. We find that fingerprinting preva-
lence has increased over the years [37], [54], and is now
present on 10.18% of the Alexa top-100K websites. We detect
fingerprinting scripts served from more than two thousand
domains, which include both anti-ad fraud vendors as well
as cross-site trackers. FP -I NS PE CT OR also helps us uncover
several new APIs that were previously not known to be
used for fingerprinting. We discover that fingerprinting scripts
disproportionately use APIs such as the Permissions and
Performance APIs.
We summarize our key contributions as follows:
1) An ML-based syntactic-semantic approach to detect
browser fingerprinting behaviors by incorporating both
static and dynamic analysis.
2) An evaluation of website breakage caused by differ-
ent mitigation strategies that block network requests or
restrict APIs.
3) A measurement study of browser fingerprinting scripts
on the Alexa top-100K websites.
4) A clustering analysis of JavaScript APIs to uncover
new browser fingerprinting vectors.
Paper Organization: The rest of the paper proceeds as follows.
Section II presents an overview of browser fingerprinting and
limitations of existing countermeasures. Section III describes
the design and implementation of FP -INSPE CT OR. Section
IV presents the evaluation of FP-INSPECTO R’s accuracy and
arXiv:2008.04480v1 [cs.CR] 11 Aug 2020
website breakage. Section V describes FP-INSPECTOR’s de-
ployment on Alexa top-100K websites. Section VI presents
the analysis of JavaScript APIs used by fingerprinting scripts.
Section VII describes FP-INSPECTOR’s limitations. Section
VIII concludes the paper.
II. BACKGROU ND & REL ATED WO RK
Browser fingerprinting for online tracking. Browser fin-
gerprinting is a stateless tracking technique that uses device
configuration information exposed by the browser through
JavaScript APIs (e.g., Canvas) and HTTP headers (e.g.,
User-Agent). In contrast to traditional stateful tracking,
browser fingerprinting is stateless—the tracker does not need
to store any client-side information (e.g., unique identifiers
in cookies or local storage). Browser fingerprinting is widely
recognized by browser vendors [2], [7], [20] and standards
bodies [33], [76] as an abusive practice. Browser fingerprinting
is more intrusive than cookie-based tracking for two reasons:
(1) while cookies are observable in the browser, browser
fingerprints are opaque to users; (2) while users can control
cookies (e.g., disable third-party cookies or delete cookies alto-
gether), they have no such control over browser fingerprinting.
Browser fingerprinting is widely known to be used for
bot detection purposes [23], [49], [70], [73], including by
Google’s reCAPTCHA [44], [86] and during general web
authentication [40], [65]. However, there are concerns that
browser fingerprinting may be used for cross-site tracking
especially as mainstream browsers such as Safari [94] and
Firefox [95] adopt aggressive policies against third-party cook-
ies [83]. For example, browser fingerprints (by themselves or
when combined with IP address) [66] can be used to regenerate
or de-duplicate cookies [30], [85]. In fact, as we show later,
browser fingerprinting is used for both anti-fraud and potential
cross-site tracking.
Origins of browser fingerprinting. Mayer [71] first
showed that “quirkiness” can be exploited using JavaScript
APIs (e.g., navigator, screen, Plugin, and MimeType ob-
jects) to identify users. Later, Eckersley [51] conducted the
Panopticlick experiment to analyze browser fingerprints using
information from various HTTP headers and JavaScript APIs.
As modern web browsers have continued to add functionality
through new JavaScript APIs [88], the browser’s fingerprinting
surface has continued to expand. For example, researchers
have shown that Canvas [72], WebGL [45], [72], fonts
[56], extensions [90], the Audio API [54], the Battery
Status API [77], and even mobile sensors [47] can expose
identifying device information that can be used to build a
browser fingerprint. In fact, many of these APIs have already
been found to be abused in the wild [37], [38], [47], [54],
[75], [78]. Due to these concerns, standards bodies such as
the W3C [34] have provided guidance to take into account the
fingerprinting potential of newly proposed JavaScript APIs.
One such example is the Battery Status API, which was
deprecated by Firefox due to privacy concerns [78].
Does browser fingerprinting provide unique and per-
sistent identifiers? A browser fingerprint is a “statistical”
identifier, meaning that it does not deterministically identify
a device. Instead, the identifiability of a device depends on
the number of devices that share the same configuration. Past
research has reported widely varying statistics on the unique-
ness of browser fingerprints. Early research by Laperdrix et
al. [67] and Eckersley [51] found that 83% to 90% of devices
have a unique fingerprint. In particular, Laperdrix et al. found
that desktop browser fingerprints are more unique (90% of
devices) than mobile (81% of devices) due to the presence
of plugins and extensions. However, both Eckersley’s and
Laperdrix’s studies are based on data collected from self-
selected audiences—visitors to Panopticlick and AmIUnique,
respectively—which may bias their findings. In a more recent
study, Boix et al. [59] deployed browser fingerprinting code
on a major French publisher’s website. They found that only
33.6% of the devices in that sample have unique fingerprints.
However, they argued that adding other properties, such as
the IP address, Content language or Timezone, may make the
fingerprint unique.
To be used as a tracking identifier, a browser fingerprint
must either remain stable over time or be linkable with
relatively high confidence. Eckersley measured repeat visits to
the Panopticlick test page and found that 37% of repeat visitors
had more than one fingerprint [51]. However, about 65% of
devices could be re-identified by linking fingerprints using a
simple heuristic. Similarly, Vastel et al. [93] found that half
of the repeat visits to the AmIUnique test page change their
fingerprints in less than 5 days. They improve on Eckersley’s
linking heuristic and show that their linking technique can
track repeat AmIUnique visitors for an average of 74 days.
Prevalence of browser fingerprinting. A 2013 study of
browser fingerprinting in the wild [75] examined three fin-
gerprinting companies and found only 40 of the Alexa top-
10K websites deploying fingerprinting techniques. That same
year, a large-scale study by Acar et al. [38] found just 404
of the Alexa top 1-million websites deploying fingerprinting
techniques. Following that, a number of studies have measured
the deployment of fingerprinting across the web [37], [47],
[54], [78]. Although these studies use different methods to fin-
gerprinting, their results suggest an overall trend of increased
fingerprinting deployment. Most recently, an October 2019
study by The Washington Post [58] found fingerprinting on
about 37% of the Alexa top-500 US websites. This roughly
aligns with our findings in Section V, where we discover
fingerprinting scripts on 30.60% of the Alexa top-1K websites.
Despite increased scrutiny by browser vendors and the public
in general, fingerprinting continues to be prevalent.
Browser fingerprinting countermeasures. Existing tools
for fingerprinting protection broadly use three different ap-
proaches.1One approach randomizes return values of the
JavaScript APIs that can be fingerprinted, the second nor-
1Google has recently proposed a new approach to fingerprinting protection
that doesn’t fall into the categories discussed above. They propose assigning a
“privacy cost” based on the entropy exposed by each API access and enforcing
a “privacy budget” across all API accesses from a given origin [7]. Since this
proposal is only at the ideation stage and does not have any implementations,
we do not discuss it further.
malizes the return values of the JavaScript APIs that can be
fingerprinted, and the third uses heuristics to detect and block
fingerprinting scripts. All of these approaches have different
strengths and weaknesses. Some approaches protect against
active fingerprinting, i.e. scripts that probe for device proper-
ties such as the installed fonts, and others protect against pas-
sive fingerprinting, i.e. servers that collect information that’s
readily included in web requests, such as the User-Agent
request header. Randomization and normalization approaches
can defend against all forms of active fingerprinting and some
forms of passive (e.g., by randomizing User-Agent request
header). Heuristic-based approaches can defend against both
active and passive fingerprinting, e.g., by completely blocking
the network request to resource that fingerprints. We further
discuss these approaches and list their limitations.
1) The randomization approaches, such as Canvas Defender
[5], randomize the return values of the APIs such as
Canvas by adding noise to them. These approaches
not only impact the functional use case of APIs but are
also ineffective at restricting fingerprinting as they are re-
versible [92]. Additionally, the noised outputs themselves
can sometimes serve as a fingerprint, allowing websites to
identify the set of users that have the protection enabled
[48], [92].
2) The JavaScript API normalization approaches, such as
those used by the Tor Browser [15] and the Brave browser
[3], attempt to make all users return the same fingerprint.
This is achieved by limiting or spoofing the return values
of some APIs (e.g., Canvas), and entirely removing
access to other APIs (e.g., Battery Status). These
approaches limit website functionality and can cause
websites to break, even when those websites are using
the APIs for benign purposes.
3) The heuristic approaches, such as Privacy Badger [27]
and Disconnect [12], detect fingerprinting scripts with
pre-defined heuristics. Such heuristics, which must nar-
rowly target fingerprinters to avoid over-blocking, have
two limitations. First, they may miss fingerprinting scripts
that do not match their narrowly defined detection criteria.
Second, the detection criteria must be constantly main-
tained to detect new or evolving fingerprinting scripts.
Learning based solutions to detect fingerprinting. The
ineffectiveness of randomization, normalization, and heuristic-
based approaches motivate the need of a learning-based so-
lution. Browser fingerprinting falls into the broader class of
stateless tracking, i.e., tracking without storing on data on
the user’s machine. Stateless tracking is in contrast to stateful
tracking, which uses APIs provided by the browser to store an
identifier on the user’s device. Prior research has extensively
explored learning-based solutions for detecting stateful track-
ers. Such approaches try to learn tracking behavior of scripts
based on their structure and execution. One such method
by Ikram et al. [60] used features extracted through static
code analysis. They extracted n-grams of code statements as
features and trained a one-class machine learning classifier to
detect tracking scripts. In another work, Wu et al. [96] used
features extracted through dynamic analysis. They extracted
one-grams of web API method calls from execution traces of
scripts as features and trained a machine learning classifier to
detect tracking scripts.
Unfortunately, prior learning-based solutions generally lump
together stateless and stateful tracking. However, both of
these tracking techniques fundamentally differ from each other
and a solution that tries to detect both stateful and stateless
techniques will have mixed success. For example, a recent
graph-based machine learning approach to detect ads and
trackers proposed by Iqbal et al. [62] at times successfully
identified fingerprinting and at times failed.
Fingerprinting detection has not received as much attention
as stateful tracking detection. Al-Fannah et. al. [39] proposed
to detect fingerprinting vendors by matching 17 manually iden-
tified attributes (e.g., User-Agent), that have fingerprinting
potential, with the request URL. The request is labeled as
fingerprinting if at least one of the attributes is present in
the URL. However, this simple approach would incorrectly
detect the functional use of such attributes as fingerprinting.
Moreover, this approach fails when the attribute values in
the URL are hashed or encrypted. Rizzo [91], in their thesis,
explored the detection of fingerprinting scripts using machine
learning. Specifically, they trained a machine learning classifier
with features extracted through static code analysis. However,
only relying on static code analysis might not be sufficient
for an effective solution. Static code analysis has inherent
limitations to interpret obfuscated code and provide clarity
in enumerations. These limitations may hinder the ability
of a classifier, trained on features extracted through static
analysis, to correctly detect fingerprinting scripts as both
obfuscation [87] and enumerations (canvas font fingerprinting)
are common in fingerprinting scripts. Dynamic analysis of
fingerprinting scripts could solve that problem but it requires
scripts to execute and scripts may require user input or browser
events to trigger.
A complementary approach that uses both static and dy-
namic analysis could work—indeed this is the approach we
take next in Section III. Dynamic analysis can provide in-
terpretability for obfuscated scripts and scripts that involve
enumerations and static analysis could provide interpretability
for scripts that require user input or browser triggers.
III. FP-INS PE CTOR
In this section we present the design and implementation of
FP-INS PE CTOR, a machine learning approach that combines
static and dynamic JavaScript analysis to counter browser
fingerprinting. FP-INS PE CTOR has two major components: the
detection component, which extracts syntactic and semantic
features from scripts and trains a machine learning classifier
to detect fingerprinting scripts; and the mitigation component,
which applies a layered set of restrictions to the detected
fingerprinting scripts to counter passive and/or active finger-
printing in the browser. Figure 1 summarizes the architecture
of FP-INS PE CTOR.
(1) Opening a website with OpenWPM
instrumented Firefox
(2) Scripts unpacking and AST creation
from source files + Extraction of
Execution traces
Execution
Trace
AST
(3) Feature extraction from ASTs and
execution traces + Classification of scripts
Source Files
Logging
JavaScript
Instrumentation
www.example.com
JavaScript Layer
Network Layer
(4) Policy enforcement by blocking
network requests and restricting access
to JavaScript APIs
Fig. 1: FP -I NS PECTO R: (1) We crawl the web with an extended version of OpenWPM that extracts JavaScript source files and their execution
traces. (2) We extract Abstract Syntax Trees (ASTs) and execution traces for all scripts. (3) We use those representations to extract features
and train a machine learning model to detect fingerprinting scripts. (4) We use a layered approach to counter fingerprinting scripts.
A. Detecting fingerprinting scripts
A fingerprinting script has a limited number of APIs it can
use to extract a specific piece of information from a device.
For example, a script that tries to inspect the graphics stack
must use the Canvas and WebGL APIs; if a script wants
to collect 2D renderings (i.e., for canvas fingerprinting), it
must call toDataURL() or getImageData() functions
of the Canvas API to access the rendered canvas images. Past
research has used these patterns to manually curate heuristics
for detecting fingerprinting scripts with fairly high precision
[47], [54]. Our work builds on them and significantly extends
prior work in two main ways.
First, FP-INS PE CTOR automatically learns emergent prop-
erties of fingerprinting scripts instead of relying on hand-coded
heuristics. Specifically, we extract a large number of low-level
heuristics for capturing syntactic and semantic properties of
fingerprinting scripts to train a machine learning classifier.
FP-INS PE CTOR’s classifier trained on limited ground truth of
fingerprinting scripts from prior research is able to generalize
to detect new fingerprinting scripts as well as previously
unknown fingerprinting methods.
Second, unlike prior work, we leverage both static fea-
tures (i.e., script syntax) and dynamic features (i.e., script
execution). The static representation allows us to capture
fingerprinting scripts or routines that may not execute during
our page visit (e.g., because they require user interaction that
is hard to simulate during automated crawls). The dynamic
representation allows us to capture fingerprinting scripts that
are obfuscated or minified. FP-INSPECTOR trains separate
supervised machine learning models for static and dynamic
representations and combines their output to accurately clas-
sify a script as fingerprinting or non-fingerprinting.
Script monitoring. We gather script contents and their
execution traces by automatically loading webpages in an
extended version of OpenWPM [54]. By collecting both the
raw content and dynamic execution traces of scripts, we are
able to use both static and dynamic analysis to extract features
related to fingerprinting.
Collecting script contents: We collect script contents by
extending OpenWPM’s network monitoring instrumentation.
By default, this instrumentation saves the contents of all
HTTP responses that are loaded into script tags. We extend
OpenWPM to also capture the response content for all HTML
documents loaded by the browser. This allows us to capture
both external and inline JavaScript. We further parse the
HTML documents to extract inline scripts. This detail is
crucial because a vast majority of webpages use inline scripts
[68], [74].
Collecting script execution traces: We collect script ex-
ecution traces by extending OpenWPM’s script execution
instrumentation. OpenWPM records the name of the Javascript
API being accessed by a script, the method name or property
name of the access, any arguments passed to the method or
values set or returned by the property, and the stack trace at
the time of the call. By default, OpenWPM only instruments
a limited number of the JavaScript APIs that are known to
be used by fingerprinting scripts. We extend OpenWPM script
execution instrumentation to cover additional APIs and script
interactions that we expect to provide useful information for
differentiating fingerprinting activity from non-fingerprinting
activity. There is no canonical list of fingerprintable APIs,
and it is not performant to instrument the browser’s entire
API surface within OpenWPM. In light of these constraints,
we extended the set of APIs instrumented by OpenWPM to
cover several additional APIs used by popular fingerprinting
libraries (i.e., fingerprintjs2 [16]) and scripts (i.e., Media-
Math’s fingerprinting script [25]).2These include the Web
Graphics Library (WebGL) and performance.now, both
of which were previously not monitored by OpenWPM. We
also instrument a number of APIs used for Document Object
Model (DOM) interactions, including the createElement
method and the document and node objects. Monitoring
access to these APIs allows us to differentiate between scripts
that interact with the DOM and those that do not.
Static analysis. Static analysis allows us to capture infor-
mation from the contents and structure of JavaScript files—
including those which did not execute during our measure-
ments or those which were not covered by our extended
instrumentation.
AST representation: First, we represent scripts as Abstract
Syntax Trees (ASTs). This allows us to ignore coding style
differences between scripts and ever changing JavaScript syn-
2The full set of APIs monitored by our extended version of OpenWPM in
Appendix IX-A.
tax. ASTs encode scripts as a tree of syntax primitives (e.g.,
VariableDeclaration and ForStatement), where
edges represent syntactic relationship between code state-
ments. If we were to build features directly from the raw
contents of scripts, we would encode extraneous information
that may make it more difficult to determine whether a script is
fingerprinting. As an example, one script author may choose to
loop through an array of device properties by index, while an-
other may choose to use that same array’s forEach method.
Both scripts are accessing the same device information in a
loop, and both scripts will have a similar representation when
encoded as ASTs.
Figure 2b provides an example AST built from a simple
script. Nodes in an AST represent keywords, identifiers, and
literals in the script, while edges represent the relation between
them. Keywords are reserved words that have a special mean-
ing for the interpreter (e.g. for,eval), identifiers are func-
tion names or variable names (e.g. CanvasElem,FPDict),
and literals are constant values, such as a string assigned to an
identifier (e.g. “example”). Note that whitespace, comments,
and coding style are abstracted away by the AST.
Script unpacking: The process of representing scripts as
ASTs is complicated by the fact that JavaScript is an inter-
preted language and compiled at run time. This allows portions
of the script to arrive as plain text which is later compiled and
executed with eval or Function. Prior work has shown
that the fingerprinting scripts often include code that has been
“packed” with eval or Function [87]. To unpack scripts
containing eval or Function, we embed them in empty
HTML webpages and open them in an instrumented browser
[62] which allows us to extract scripts as they are parsed by
the JavaScript engine. We capture the parsed scripts and use
them in place of the packed versions when building ASTs.
We also follow this same procedure to extract in-line scripts,
which are scripts included directly in the HTML document.
Script 1 shows an example canvas font fingerprinting script
that has been packed with eval. This script loops through
a list of known fonts and measures the rendered width to
determine whether the font is installed (see [54] for a thorough
description of canvas font fingerprinting). Script 2 shows the
unpacked version of the script. As can be seen from the
two snippets, the script is significantly more interpretable
after unpacking. Figure 2 shows the importance of unpacking
to AST generation. The packed version of the script (i.e.,
Script 1) creates a generic stub AST (i.e., Figure 2a) which
would match the AST of any script that uses eval. Figure 2b
shows the full AST that has been generated from the unpacked
version of the script (i.e., Script 2). This AST captures the
actual structure and content of the fingerprinting code that
was passed to eval, and will allow us to extract meaningful
features from the script’s contents.
Static feature extraction: Next, we generate static features
from ASTs. ASTs have been extensively used in prior re-
search to detect malicious JavaScript [46], [55], [61]. To build
our features, we first hierarchically traverse the ASTs and
divide them into pairs of parent and child nodes. Parents
1eval("Fonts =[\"monospace\",..,\"sans-serif\"];
2CanvasElem = document.createElement(\"canvas\")
3;CanvasElem.width = \"100\";CanvasElem.height =
4\"100\";context = CanvasElem.getContext('2d');
5FPDict= {};for(i=0;i<Fonts.length;i++){
6CanvasElem.font = Fonts[i];FPDict[Fonts[i]] =
7CanvasElem.measureText(\"example\").width;}")
Script 1: A canvas font fingerprinting script packed with eval.
1// Canvas font fingerprinting script.
2Fonts =["monospace" , ... , "sans-serif"];
3
4CanvasElem =document.createElement("canvas");
5CanvasElem.width ="100";
6CanvasElem.height ="100";
7context =CanvasElem.getContext('2d');
8FPDict= {};
9for (i =0; i < Fonts.length; i++)
10 {
11 CanvasElem.font =Fonts[i];
12 FPDict[Fonts[i]] =context.measureText("example
").width;
13 }
Script 2: An unpacked version of the script in Script 1.
represents the context (e.g., for loops, try statements,
or if conditions), and children represent the function in-
side that context (e.g., createElement,toDataURL, and
measureText). Naively parsing parent:child pairs for
the entire AST of every script would result in a prohibitively
large number of features across all scripts (i.e., millions).
To avoid this we only consider parent:child pairs that
contain at least one keyword that matches a name, method, or
property from one of the JavaScript APIs [24]. We assemble
these parent:child combinations as feature vectors for
all scripts. Each parent:child combination is treated as
a binary feature, where 1 indicates the presence of a feature
and 0 indicates its absence. Since we do not execute scripts in
static analysis, fingerprinting-specific JavaScript API methods
usually have only one occurrence in the script. Thus, we
found the binary representation to sufficiently capture this
information from the script.
As an example, feature extracted from AST
in Figure 2b have ForStatement:var and
MemberExpression:measureText as features which
indicate the presence of a loop and access to measureText
method. These methods are frequently used in canvas font
fingerprinting scripts. Intuitively, fingerprinting script vectors
have combinations of parent:child pairs that are specific
to an API access pattern indicative of fingerprinting (e.g.,
setting a new font and measuring its width within a loop)
that are unlikely to occur in non-fingerprinting scripts. A
more comprehensive list of features extracted from the AST
in Figure 2b are listed in Appendix IX-B (Table VII).
To avoid over-fitting, we apply unsupervised and supervised
feature selection methods to reduce the number of features.
Specifically, we first prune features that do not vary much (i.e.,
variance <0.01) and also use information gain [63] to short
list top-1K features. This allows us to keep the features that
represent the most commonly used APIs for fingerprinting.
For example, two of the features with the highest information
gain represent the usage of getSupportedExtensions
Program CallExpression
eval
Fonts = [“mono...
(a) AST for packed Script 1
Program
VariableDeclaration VariableDeclaration ForStatement
Identifier
ArrayExpression
monospace Sans-serif
BlockStatement
ExprerssionStatement ExpressionStatement
canvas measureText
MemberExpression
(b) AST for unpacked script 2
Fig. 2: A truncated AST representation of Scripts 1 and 2. The
edges represent the syntactic relationship between nodes. Dotted lines
indicate an indirect connection through truncated nodes.
and toDataURL APIs. getSupportedExtensions is
used to get the list of supported WebGL extensions, which
vary depending on browser’s implementation. toDataURL
is used to get the base64 representation of the drawn canvas
image, which depending on underlying hardware and OS
configurations differs for the same canvas image. We then use
these top-1K features as input to train a supervised machine
learning model.
Dynamic analysis. Dynamic analysis complements some
weaknesses of static analysis. While static analysis allows
us to capture the syntactic structure of scripts, it fails when
the scripts are obfuscated or minified. This is crucial because
prior research has shown that fingerprinting scripts often use
obfuscation to hide their functionality [87]. For example,
Figure 3 shows an AST constructed from an obfuscated
version of Script 2. The static features extracted from this
AST would miss important parent:child pairs that are
essential to capturing the script’s functionality. Furthermore,
some of the important parent:child pairs may be filtered
during feature selection. Thus, in addition to extracting static
features from script contents, we extract dynamic features by
monitoring the execution of scripts. Execution traces capture
the semantic relationship within scripts and thus provide
additional context regarding a script’s functionality, even when
that script is obfuscated.
Dynamic feature extraction: We use two approaches to
extract features from execution traces. First, we keep presence
and count of the number of times a script accesses each
1var _0x2c4a=['\x63\x58\x49\x69','\x42\x6a\x58\
2x44\x6f\x41\x3d\x3d','\x55\x54\x72\x43\x69\x73
3\x4f\x77\x4f\x38\x4f\x6c\x50\x45\x6e\x43\x6d\x
477\x30\x3d','\x49\x38\x4f\x38\x49\x4d\x4f\x42\
5x77\x70\x72\x44\x6e\x41\x3d\x3d','\x77\x35\x54
6\x43\x73\x42\x56\x51','\x77\x37\x62\x43\x69\x4
7d\x4f\x38\x77\x...............................
8....................................x3284af={};
9for(i=0x0;i<_0x1b2b65[_0x5d52('0x7','\x28\x6d\
10 x68\x26')];i++){_0x1d1d56[_0x5d52('0x8','\x67\
11 x33\x48\x21')]=_0x1b2b65[i];_0x3284af[_0x1b2b6
12 5[i]]=_0x4d24cc[_0x5d52('0x9','\x35\x70\x64\x4
13 c')](_0x5d52('0xa','\x28\x6d\x68\x26'))['\x77\
14 x69\x64\x74\x68'];}
(a) Obfuscated canvas font fingerprinting script from Script 2.
Program
VariableDeclaration ExpressionStatement ForStatement
Identifier
ArrayExpression
BjXDoA== W5TCsBVQ
BlockStatement
ExprerssionStatement ExpressionStatement
_0x3284af MemberExpression
MemberExpression
cXli
(b) AST of the obfuscated script shown in (a).
Fig. 3: A truncated example showing the AST representation of an
obfuscated version of the canvas font fingerprinting script in Script 2.
The edges represent the syntactic relationship between nodes. Dotted
lines indicate an indirect connection through truncated nodes.
individual API method or property and use that as a feature.
Next, we build features from APIs that are passed arguments
or return values. Rather than using the arguments or return
values directly, we use derived values to capture a higher-level
semantic that is likely to better generalize during classification.
For example, we will compute the length of a string rather
than including the exact text, or will compute the area of
a element rather than including the height and width. This
allows us to avoid training our classifier with overly specific
features—i.e., we do not care whether the text “CanvasFinger-
print” or “C4NV45F1NG3RPR1NT” is used during a canvas
fingerprinting attempt, and instead only care about the text
length and complexity. For concrete example, we calculate
the area of canvas element, its text size, and whether its is
present on screen when processing execution logs related to
CanvasRenderingContext2D.fillText().
As an example, the features extracted from
the execution trace of Script 3a includes
(HTMLCanvasElement.getContext, True) and
(CanvasRenderingContext2D.measureText,
7) as features, where True indicates the us-
age of HTMLCanvasElement.getContext
and 7indicates the size of text in
CanvasRenderingContext2D.measureText. A
more comprehensive list of features extracted from the
execution trace of Script 3a can be found in Appendix IX-B
(Table VIII).
To avoid over-fitting, we again apply unsupervised and
supervised feature selection methods to limit the number
of features. Similar to feature reduction for static analysis,
this allows us to keep the features that represent the most
commonly used APIs for fingerprinting. For example, two of
the features with the highest information gain represent the
usage of CanvasRenderingContext2D.fillStyle
and navigator.platform APIs.
CanvasRenderingContext2D.fillStyle is used
to specify the color, gradient, or pattern inside a canvas
shape, which can make a shape render differently across
browsers and devices. navigator.platform reveals the
platform (e.g. MacIntel and Win32) on which the browser is
running. We then use these top-1K features as input to train
a supervised machine learning model.
Classifying fingerprinting scripts. FP-INSPECTO R uses a
decision tree [81] classifier for training a machine learning
model. The decision tree is passed feature vectors of scripts
for classification. While constructing the tree, at each node, the
decision tree chooses the feature that most effectively splits the
data. Specifically, the attribute with highest information gain
is chosen to split the data by enriching one class. The decision
tree then follows the same methodology to recursively partition
the subsets unless the subset belongs to one class or it can no
longer be partitioned.
Note that we train two separate models and take the union
of their classification results instead of combining features
from both the static and dynamic representations of scripts
to train a single model. That is, a script is considered to
be a fingerprinting script if it is classified as fingerprinting
by either the model that uses static features as input or the
model that uses dynamic features as input. We use union of
the two models because we only have the decision from one
of the two models for some scripts (e.g., scripts that do not
execute). Furthermore, the two models are already trained on
high-precision ground truth [54] and taking the union would
allow us to push for better recall. Using this approach, we
classify all scripts loaded during a page visit—i.e., we include
both external scripts loaded from separate URLs and inline
scripts contained in any HTML document.
B. Mitigating fingerprinting scripts
Existing browser fingerprinting countermeasures can be
classified into two categories: content blocking and API re-
striction. Content blocking, as the name implies, blocks the
requests to download fingerprinting scripts based on their
network location (e.g., domain or URL). API restriction, on the
other hand, does not block fingerprinting scripts from loading
but rather limits access to certain JavaScript APIs that are
known to be used for browser fingerprinting.
Privacy-focused browsers such as the Tor Browser [15]
prefer blanket API restriction over content blocking mainly
because it side steps the challenging problem of detecting
fingerprinting scripts. While API restriction provides reli-
able protection against active fingerprinting, it can break the
functionality of websites that use the restricted APIs for
benign purposes. Browsers that deploy API restriction also
require additional protections against passive fingerprinting
(e.g., routing traffic over the Tor network). Content blocking
protects against both active and passive fingerprinting, but it
is also prone to breakage when the detected script is dual-
purpose (i.e., implements both fingerprinting and legitimate
functionality) or a false positive.
Website breakage is an important consideration for finger-
printing countermeasures. For instance, a recent user trial by
Mozilla showed that privacy countermeasures in Firefox can
negatively impact user engagement due to website breakage
[21]. In fact, website breakage can be the deciding factor
in real-world deployment of any privacy-enhancing counter-
measure [8], [26]. We are interested in studying the impact
of different fingerprinting countermeasures based on FP-
INS PE CTOR on website breakage. We implement the following
countermeasures:
1) Blanket API Restriction. We restrict access for all scripts
to the JavaScript APIs known to be used by fingerprinting
scripts, hereafter referred to as “fingerprinting APIs”. Finger-
printing APIs include functions and properties that are used
in fingerprintjs2 and those discovered by FP-INS PE CTOR in
Section VI. Note that this countermeasure does not at all rely
on FP-INS PE CTOR’s detection of fingerprinting scripts.
2) Targeted API Restriction. We restrict access to finger-
printing APIs only for the scripts served from domains that are
detected by FP -INSPE CT OR to deploy fingerprinting scripts.
3) Request Blocking. We block the requests to download
the scripts served from domains that are detected by FP-
INS PE CTOR to deploy fingerprinting scripts.
4) Hybrid. We block the requests to download the scripts
served from domains that are detected by FP-INS PE CT OR to
deploy fingerprinting scripts, except for first-party and inline
scripts. Additionally, we restrict access to fingerprinting APIs
for first-party and inline scripts on detected domains. This
protects against active fingerprinting by first parties and both
active and passive fingerprinting by third parties.
IV. EVALUATION
We evaluate FP-INS PE CTOR’s performance in terms of its
accuracy in detecting fingerprinting scripts and its impact on
website breakage when mitigating fingerprinting.
A. Accuracy
We require samples of fingerprinting and non-fingerprinting
scripts to train our supervised machine learning models. Up-
to-date ground truth for fingerprinting is not readily avail-
able. Academic researchers have released lists of scripts [47],
[54], however these only show a snapshot at the time of
the paper’s publication and are not kept up-to-date. While
many anti-tracking lists (e.g., EasyPrivacy) do include some
fingerprinting domains, Disconnect’s tracking protection list
[12] is the only publicly available list that does not lump
together different types of tracking and separately identifies
fingerprinting domains. However, Disconnect’s list is insuffi-
cient for our purposes. First, Disconnect’s list only includes
the domain names of companies that deploy fingerprinting
scripts, rather than the actual URLs of the fingerprinting
scripts. This prevents us from using the list to differentiate
between fingerprinting and non-fingerprinting resources served
from those domains. Second, the list appears to be focused
on fingerprinting deployed by popular third-party vendors.
Since first-party fingerprinting is also prevalent [47], we would
like to train our classifier to detect both first- and third-party
fingerprinting scripts. Given the limitations of these options,
we choose to detect fingerprinting scripts using a slightly
modified version of the heuristics implemented in [54].
1) Fingerprinting Definition: The research community is
not aligned on a single definition to label fingerprinting
scripts. It is often difficult to determine the intent behind any
individual API access, and classifying all instances of device
information collection as fingerprinting will result in a large
number of false positives. For example, an advertisement script
may collect a device’s screen size to determine whether an
ad was viewable and may never use that information as part
of a fingerprint to identify the device. With that in mind,
we take a conservative approach: we consider a script as
fingerprinting if it uses Canvas,WebRTC,Canvas Font,
or AudioContext as defined in [54]. Specifically, if the
heuristics trigger for any of the above mentioned behaviors, we
label the script as fingerprinting and otherwise label it as non-
fingerprinting. We do not consider the collection of attributes
from navigator or screen APIs from a script as fingerprinting,
as these APIs are frequently used in non-distinct ways by
scripts that do not fingerprint users. We decide to initially use
this definition of fingerprinting because it is precise, i.e., it has
a low false positive rate. A low false positive rate is crucial
for a reliable ground truth as the classifiers effectiveness will
depend on the soundness of ground truth. The exact details of
heuristics are listed in Appendix IX-C.
2) Data Collection: We use our extended version of Open-
WPM to crawl the homepages of twenty thousand websites
sampled from the Alexa top-100K websites. To build this
sample, we take the top-10K sites from the list and augment
it with a random sample of 10K sites with Alexa ranks from
10K to 100K. This allows us to cover both the most popular
websites as well as websites further down the long tail. During
the crawl we allow each site 120 seconds to fully load before
timing out the page visit. We store the HTTP response body
content from all documents and scripts loaded on the page as
well as the execution traces of all scripts.
Our crawled dataset consists of 17,629 websites with
153,354 distinct executing scripts. Since we generate our
ground truth by analyzing script execution traces, we are
only able to collect ground truth from scripts that actually
execute during our crawl. Although we are not able train our
classifier on scripts that do not execute during our crawl, we
are still able to classify them. Their classification result will
depend entirely on the static features extracted from the script
contents. For static features, we successfully create ASTs for
143,526 scripts—9,828 scripts (6.4%) fail because of invalid
syntax. Out of valid scripts, we extract a total of 47,717
parent:child combinations and do feature selection as de-
scribed in Section III. Specifically, we first filter by a variance
threshold of 0.01 to reduce the set to 8,597 parent:child
combinations. We then select top 1K features when sorted by
information gain. For dynamic features, we extract a total of
2,628 features from 153,354 scripts. Similar to static analysis,
we do feature selection as described in Section III and reduce
the feature set to top 1K when sorted by information gain.
3) Enhancing Ground Truth: As discussed in Section II,
heuristics suffer from two inherent problems. First, heuristics
are narrowly defined which can cause them to miss some
fingerprinting scripts. Second, heuristics are predefined and are
thus unable to keep up with evolving fingerprinting scripts.
Due to these problems, we know that our heuristics-based
ground truth is imperfect and a machine learning model trained
on such a ground truth may perform poorly. We address these
problems by enhancing the ground truth through iterative re-
training. We first train a base model with incomplete ground
truth, and then manually analyze the disagreements between
the classifier’s output and the ground truth. We update the
ground truth whenever we find that our classifier makes a
correct decision that was not reflected in the ground truth (i.e.,
discovers a fingerprinting script that was missed by the ground
truth heuristics). We perform three iterations of this process.
Manual labeling. The manual process of analyzing scripts
during iterative re-training works as follows. We automatically
create a report for every script that requires manual analysis.
Each report contains: (1) all of the API method calls and
property accesses monitored by our instrumentation, including
the arguments and return values, (2) snippets from the script
that capture the surrounding context of calls to the APIs
used for canvas, WebRTC, canvas font, and AudioContext
fingerprinting, (3) a fingerprintjs2 similarity score,3and (4) the
formatted contents of the complete script. We then manually
review the reports based on our domain expertise to determine
whether the analyzed script is fingerprinting. Specifically, we
look for heuristic-like behaviors in the scripts. The heuristic-
like behavior means that the fingerprinting code in the script:
1) Is similar to known fingerprinting code in terms of its
functionality and structure,
2) It is accompanied with other fingerprinting code (i.e.
most fingerprinting scripts use multiple fingerprinting
techniques), and
3) It does not interact with the functional code in the script.
For example, common patterns include sequentially reading
values from multiple APIs, storing them in arrays or dictio-
3We compute Jaccard similarity between the script, by first beautifying it
and then tokenizing it based on white spaces, and all releases of fingerprintjs2.
The release with the highest similarity is reported along with the similarity
score.
Itr. Initial New Detections Correct Detections Enhanced
FP NON-FP FP NON-FP FP NON-FP FP NON-FP
S1 884 142,642 150 232 103 10 977 142,549
S2 977 142,549 109 182 84 5 1,056 142,470
S3 1,056 142,470 76 158 53 1 1,108 142,418
D1 928 152,426 11 52 4 9 923 152,431
D2 923 152,431 8 35 4 1 926 152,428
D3 926 152,428 13 36 5 2 929 152,425
TABLE I: Enhancing ground truth with multiple iterations of retain-
ing. Itr. represents the iteration number of training with static (S) and
dynamic (D) models. New Detections (FP) represent the additional
fingerprinting scripts detected by the classifier and New Detections
(NON-FP) represent the new non-fingerprinting scripts detected by
the classifier as compared to heuristics. Whereas Correct Detections
(FP) represent the manually verified correct determination of the
classifier for fingerprinting scripts and Correct Detections (NON-FP)
represent the manually verified correct determination of the classifier
for non-fingerprinting scripts.
naries, hashing them, and sending them in a network request
without interacting with other parts of the script or page.
Findings. We found the majority of reviews to be
straightforward—the scripts in question were often similar to
known fingerprinting libraries and they frequently use APIs
that are used by other fingerprinting scripts. If we find any
fingerprinting functionality within the script we label the
whole script as fingerprinting, otherwise we label it is non-
fingerprinting. To be on the safe side, scripts for which we
were unable to make a manual determination (e.g., due to
obfuscation) were considered non-fingerprinting.
Overall, perhaps expected, we find that our ground truth
based on heuristics is high precision but low recall within the
disagreements we analyzed. Most of the scripts that heuristics
detect as fingerprinting do include fingerprinting code, but we
also find that the heuristics miss some fingerprinting scripts.
There are two major reasons scripts are missed. First, the
fingerprinting portion of the script resides in a dormant part of
the script, waiting to be called by other events or functions in
a webpage. For example, the snippet in Script 3 (Appendix
IX-D) defines fingerprinting-specific prototypes and assign
them to a window object which can be called at a later
point in time. Second, the fingerprinting functionality of the
script deviates from the predefined heuristics. For example,
the snippet in Script 4 (Appendix IX-D) calls save and
restore methods on CanvasRenderingContext2D el-
ement, which are two method calls used by the heuristics to
filter out non-fingerprinting scripts [54].
However, for a small number of scripts, the heuristics
outperform the classifier. Scripts which make heavy use of
an API used that is used for fingerprinting, and which have
limited interaction with the webpage, are sometimes classified
incorrectly. For example, we find cases where the classifier
mislabels non-fingerprinting scripts that use the Canvas API
to create animations and charts, and which only interact with
a few HTML elements in the process. Since heuristics cannot
generalize over fingerprinting behaviors, they do not classify
partial API usage and limited interaction as fingerprinting.
In other cases, the classifier labels fingerprinting scripts as
non-fingerprinting because they include a single fingerprinting
technique along with functional code. For example, we find
cases where classifier mislabels fingerprinting scripts embed-
ded on login pages that only include canvas font fingerprinting
alongside functional code. Since heuristics are precise, they
do not consider functional aspects of the scripts and do not
classify limited usage of fingerprinting as non-fingerprinting.
Improvements. Table I presents the results of our manual
evaluation for ground truth improvement for both static and
dynamic analysis. It can be seen from the table that our
classifier is usually correct when it classifies a script as
fingerprinting in disagreement with the ground truth. We
discover new fingerprinting scripts in each iteration. In ad-
dition, it is also evident from the table that our models are
able to correct its mistakes with each iteration (i.e., correct
previously incorrect non-fingerprinting classifications). This
demonstrates the ability of classifier in iteratively detecting
new fingerprinting scripts and correct mistakes as ground truth
is improved. We further argue that this iterative improvement
with re-training is essential for an operational deployment of
a machine learning classifier and we empirically demonstrate
that for FP -I NS PE CT OR. Overall, we enhance our ground
truth by labeling an additional 240 scripts as fingerprinting
and 16 scripts as non-fingerprinting for static analysis, as
well as 13 scripts as fingerprinting and 12 scripts as non-
fingerprinting for dynamic analysis. In total, we detect 1,108
fingerprinting scripts and 142,418 non-fingerprinting scripts
with static analysis and 929 fingerprinting scripts and 152,425
non-fingerprinting scripts using dynamic analysis.
4) Classification Accuracy: We use the decision tree mod-
els described in Section III to classify the crawled scripts.
To establish confidence in our models against unseen scripts,
we perform standard 10-fold cross validation. We determine
the accuracy of our models by comparing the predicted label
of scripts with the enhanced ground truth described in Sec-
tion IV-A3. For the model trained on static features, we achieve
an accuracy of 99.8%, with 85.5% recall, and 92.7% precision.
For the model trained on dynamic features, we achieve an
accuracy of 99.9%, with 96.7% recall, and 99.1% precision.
Combining static and dynamic models. In FP-
INS PE CTOR, we train two separate machine learning models—
one using features extracted from the static representation
of the scripts, and one using features extracted from the
dynamic representation of the scripts. Both of the models
provide complementary information for detecting fingerprint-
ing scripts. Specifically, the model trained on static features
identifies dormant scripts that are not captured by the dy-
namic representation, whereas the model trained on dynamic
features identifies obfuscated scripts that are missed by the
static representation. We achieve the best of both worlds
by combining the classification results of these models. We
combine the models by doing an OR operation on the results
of each model. Specifically, if either of the model detects a
script as fingerprinting, we consider it a fingerprinting script. If
neither of the model detects a script as fingerprinting, then we
consider it a non-fingerprinting script. We manually analyze
Classifier Heuristics (Scripts/Websites) Classifiers (Scripts/Websites) FPR FNR Recall Precision Accuracy
Static 884 / 2,225 1,022 / 3,289 0.05% 15.7% 85.5% 92.7% 99.8%
Dynamic 928 / 2,272 907 / 3,278 0.005% 5.3% 96.7% 99.1% 99.9%
Combined 935 / 2,272 1,178 / 3,653 0.05% 6.1% 93.8% 93.1% 99.9%
TABLE II: FP-INS PE CT OR’s classification results in terms of recall, precision, and accuracy in detecting fingerprinting scripts. “Heuristics
(Scripts/Websites)” represents the number of scripts and websites detected by heuristics and “Classifiers (Scripts/Websites)” represents the
number of scripts and websites detected by the classifiers. FPR represents false positive rate and FNR represent false negative rate.
the differences in detection of static and dynamic models and
find that the 94.46% of scripts identified only by the static
model are partially or completely dormant and 92.30% of the
scripts identified only by the dynamic model are obfuscated
or excessively minified.
Table II presents the combined and individual results of
static and dynamic models. It can be seen from the table that
FP-INS PE CTOR’s classifier detects 26% more scripts than the
heuristics with a negligible false positive rate (FPR) of 0.05%
and a false negative rate (FNR) of 6.1%. Overall, we find
that by combining the models, FP -INSPE CT OR increases its
detection rate by almost 10% and achieves an overall accuracy
of 99.9% with 93.8% recall and 93.1% precision.4
B. Breakage
We implement the countermeasures listed in Section III-B in
a browser extension to evaluate their breakage. The browser
extension contains the countermeasures as options that can
be selected one at a time. For API restriction, we override
functions and properties of fingerprinting APIs and return
an error message when they are accessed on any webpage.
For targeted API restriction, we extract a script’s domain by
traversing the stack each time the script makes a call to one
of the fingerprinting APIs. We use FP-IN SP EC TOR’s classi-
fier determinations to create a domain-level (eTLD+1, which
matches Disconnect’s fingerprinting list used by Firefox) filter
list. For request blocking, we use the webRequest API [35]
to intercept and block outgoing web requests that match our
filter list [6].
Next, we analyze the breakage caused by these enforce-
ments on a random sample of 50 websites that load finger-
printing scripts along with 11 websites that are reported as
broken in Firefox due to fingerprinting countermeasures [17].
Prior research [62], [89] has mostly relied on manual analysis
to analyze website breakage due the challenges in automating
breakage detection. We follow the same principles and man-
ually analyze website breakage under the four fingerprinting
countermeasures. To systemize manual breakage analysis, we
create a taxonomy of common fingerprinting breakage patterns
by going through the breakage-related bug reports on Mozilla’s
bug tracker [17]. We open each test website on vanilla Firefox
(i.e., without our extension installed) as control and also with
4Is the complexity of a machine learning model really necessary? Would a
simpler approach work as well? While our machine learning model performs
well, we seek to answer this question in Appendix IX-E by comparing
our performance to a more straightforward similarity approach to detect
fingerprinting. We compute the similarity between scripts and the popular
fingerprinting library fingerprintjs2. Overall, we find that script similarity not
only detects a partial number of fingerprinting scripts detected by our machine
learning model but also incurs an unacceptably high number of false positives.
our extension installed as treatment. It is noteworthy that we
disable Firefox’s default privacy protections in both the control
and treatment branches of our study to isolate the impact of
our protections. We test each of the countermeasures one by
one by trying to interact with the website for few minutes
by scrolling through the page and using the obvious website
functionality. If we discover missing content or broken website
features only in the treatment group, we assign a breakage
label using the following taxonomy:
1) Major: The core functionality of the website is broken.
Examples include: login or registration flow, search bar,
menu, and page navigation.
2) Minor: The secondary functionality of the website is
broken. Examples include: comment sections, reviews,
social media widgets, and icons.
3) None: The core and secondary functionalities of the web-
site are the same in treatment and control. We consider
missing ads as no breakage.
Policy Major (%) Minor (%) Total (%)
Blanket API restriction 48.36% 19.67% 68.03%
Targeted API restriction 24.59% 5.73% 30.32%
Request blocking 44.26% 5.73% 50%
Hybrid 38.52% 8.19% 46.72%
TABLE III: Breakdown of breakage caused by different countermea-
sures. The results present the average assessment of two reviewers.
To reduce coder bias and subjectivity, we asked two re-
viewers to code the breakage on the full set of 61 test
websites using the aforementioned guidelines. The inter-coder
reliability between our two reviewers is 87.70% for a total of
244 instances (4 countermeasures ×61 websites). Table III
summarizes the averaged breakage results. Overall, we note
that targeted countermeasures that use FP-INS PE CT OR’s detec-
tion reduce breakage by a factor of 2 on the tested websites that
are particularly prone to breakage.5More specifically, blanket
API restriction suffers the most (breaking more than two-thirds
of the tested websites) while the targeted API restriction causes
the least breakage (with no major breakage on about 75% of
the tested websites).
Surprisingly, we find that the blanket API restriction causes
more breakage than request blocking. We posit this is caused
by the fact that blanket API restriction is applied to all scripts
on the page, regardless of whether they are fingerprinting,
since even benign functionality may be impacted. By compar-
5These websites employ fingerprinting scripts and/or are reported to be
broken due to fingerprinting-specific countermeasures. Thus, they represent a
particularly challenging set of websites to evaluate breakage by fingerprinting
countermeasures.
ison, request blocking only impacts scripts known to finger-
print. Next, we observe that targeted API restrictions has the
least breakage. This is expected, as we do not block requests
and only limit scripts that are suspected of fingerprinting; the
functionality of benign scripts is not impacted.
We find that the hybrid countermeasure causes less breakage
than request blocking but more breakage than the targeted
API restrictions. The hybrid countermeasure performs better
than request blocking because it does not block network
requests to load first-party fingerprinting resources and instead
applies targeted API restrictions to protect against first-party
fingerprinting. Whereas it performs worse than targeted API
restrictions because it still blocks network requests to load
third-party fingerprinting resources that are not blocked by
the targeted API restrictions. Though hybrid blocking causes
more breakage than targeted API restriction, it offers the
best protection. Hybrid blocking mitigates both active and
passive fingerprinting from third-party resources, and active
fingerprinting from first-party resources and inline scripts.
The only thing missed by hybrid blocking—passive first-
party fingerprinting—is nearly impossible to block without
breaking websites because any first-party resource loaded by
the browser can passively collect device information.
We find that the most common reason for website breakage
is the dependence of essential functionality on fingerprinting
code. In severe cases, registration/login or other core func-
tionality on a website depends on computing the fingerprint.
For example, the registration page on freelancer.com is blank
because we restrict the fingerprinting script from f-cdn.com.
In less severe cases, websites embed widgets or ads that rely
on fingerprinting code. For example, the social media widgets
on ucoz.ru/all/ disappears because we apply restrictions to the
fingerprinting script from usocial.pro.
V. MEASURING FINGERPRINTING INTHE WILD
Next, we use the detection component of FP-INS PE CT OR
to analyze the state of fingerprinting on top-100K websites.
To collect data from the Alexa top-100K websites, we first
start with the 20K website crawl described in Section IV-A2,
and follow the same collection procedure for the remaining
80K websites not included in that measurement. Out of this
additional 80K, we successfully visit 71,112 websites. The
results provide an updated view of fingerprinting deploy-
ment following the large-scale 2016 study by Englehardt and
Narayanan [54]. On a high-level we find: (1) the deployment
of fingerprinting is still growing—reaching over a quarter of
the Alexa top-10K sites, (2) fingerprinting is almost twice as
prevalent on news sites than in any other category of site,
(3) fingerprinting is used for both anti-ad fraud and potential
cross-site tracking.
A. Over a quarter of the top sites now fingerprint users
We first examine the deployment of fingerprinting across
the top sites; our results are summarized in Table IV. In
alignment with prior work [54], we find that fingerprinting
is more prevalent on highly ranked sites. We also detect more
fingerprinting than prior work [54], with over a quarter of
the top sites now deploying fingerprinting. This increase in
use holds true across all site ranks—we observe a notable
increase even within less popular sites (i.e., 10K - 100K).
Overall, we find that more than 10.18% of top-100K websites
deploy fingerprinting.
We also find significantly more domains serving fingerprint-
ing than past work—2,349 domains on the top 100K sites
(Table V) compared to 519 domains6on the top 1 million sites
[54]. This suggests two things: our method is detecting a more
comprehensive set of techniques than measured by Englehardt
and Narayanan [54], and/or that the use of fingerprinting—
both in prevalence and in the number of parties involved—has
significantly increased between 2016 and 2019.
Rank Interval Websites (count) Websites (%)
1 to 1K 266 30.60%
1K to 10K 2,010 24.45%
10K to 20K 981 11.10%
20K to 50K 2,378 8.92%
50K to 100K 3,405 7.70%
1 to 100K 9,040 10.18%
TABLE IV: Distribution of Alexa top-100K websites that deploy
fingerprinting. Results are sliced by site rank.
B. Fingerprinting is most common on news sites
Fingerprinting is deployed unevenly across different cate-
gories of sites.7The difference is staggering—ranging from
nearly 14% of news websites to just 1% of credit/debit related
websites. Figure 4 summarizes our findings.
The distribution of fingerprinting scripts in Figure 4 roughly
matches the distribution of trackers (i.e., not only finger-
printing, but any type of tracking) measured in past work
[54]. One possible explanation of these results is that—like
traditional tracking methods—fingerprinting is more common
on websites that rely on advertising for monetization. Our
results in Section V-C reinforce this interpretation, as the
most prevalent vendors classified as fingerprinting provide
anti-ad fraud and tracking services. The particularly high
use of fingerprinting on news websites could also point to
fingerprinting being used as part of paywall enforcement, since
cookie-based paywalls are relatively easy to circumvent [80].
C. Fingerprinting is used to fight ad fraud but also for
potential cross-site tracking
Fingerprinting scripts detected by FP -I NS PE CT OR are of-
ten served by third-party vendors. Three of the top five
vendors in Table V (doubleverify.com, adsafeprotected.com,
and adsco.re) specialize in verifying the authenticity of ad
impressions. Their privacy policies mention that they use
“device identification technology” that leverages “browser
6Englehardt and Narayanan [54] do not give an exact count of the number
of domains serving fingerprinting across all measured techniques, and instead
give a count for each individual fingerprinting technique. To get an upper
bound on the total count, we assume there is no overlap between the reported
results of each technique and take the sum.
7We use Webshrinker [36] for website categorization API.
News
Shopping
Adult Content
Technology
Games
Streaming Media
Illegal Content
Television & Video
Air Travel
Movies
Education
File Sharing
Email / Messaging
Sports
Credit / Debit
0.0
2.5
5.0
7.5
10.0
12.5
15.0
Number of websites (%)
Fig. 4: The deployment of fingerprinting scripts across different
categories of websites.
type, version, and capabilities” [1], [13], [22]. Our results also
corroborate that bot detection services rely on fingerprinting
[41], and indicate that prevalent fingerprinting vendors provide
anti-ad fraud services. The two remaining vendors of the top
five, i.e., alicdn.com and yimg.com, appear to be CDNs for
Alibaba and Oath/Yahoo!, respectively.
Vendor Domain Tracker Websites (count)
doubleverify.com Y 2,130
adsafeprotected.com Y 1,363
alicdn.com N 523
adsco.re N 395
yimg.com Y 246
2,344 others Y(86) 5,702
Total 10,359 (9,040 distinct)
TABLE V: The presence of the top vendors classified as fingerprint-
ing on Alexa top-100K websites. Tracker column shows whether
the vendor is a cross-site tracker according to Disconnect’s tracking
protection list. Y represents yes and N represents no.
Several fingerprinting vendors disclose using cookies “to
collect information about advertising impression opportuni-
ties” [22] that is shared with “customers and partners to
perform and deliver the advertising and traffic measurement
services” [13]. To better understand whether these vendors
participate in cross-site tracking, we first analyze the over-
lap of the fingerprinting vendors with Disconnect’s tracking
protection list [12].8Disconnect employs a careful manual
review process [11] to classify a service as tracking. For
example, Disconnect classifies c3tag as tracking [4], [10] and
adsco.re as not tracking [1], [9] because, based on their privacy
policies, the former shares Personally Identifiable Information
(PII) with its customers while the latter does not. We find that
3.78% of the fingerprinting vendors are classified as tracking
by Disconnect.
We also analyze whether fingerprinting vendors engage
in cookie syncing [79], which is a common practice by
online advertisers and trackers to improve their coverage. For
example, a tracker may associate browsing data from a single
device to multiple distinct identifier cookies when cookies are
cleared or partitioned. However, a fingerprinting vendor can
use a device fingerprint to link those cookie identifiers together
8We exclude the cryptomining and fingerprinting categories of the Discon-
nect list. The list was retrieved in June 2019.
[53]. If the fingerprinting vendor had previously cookie synced
with other trackers, it can use its fingerprint to link cookies
for other trackers. We use the list by Fouad et al. [57] to
identify fingerprinting domains that also participate in cookie
syncing. We find that 17.28% of the fingerprinting vendors
participate in cookie syncing. More importantly, we find that
fingerprinting vendors often sync cookies with well-known ad-
tech vendors. For example, adsafeprotected.com engages in
cookie syncing with rubiconproject.com and adnxs.com. We
also find that many fingerprinting vendors engage in cookie
syncing with numerous third-parties. For example, openx.net
engages in cookie syncing with 332 other domains, out of
which 14 are classified as tracking by Disconnect. We leave
an in-depth large-scale investigation of the interplay between
fingerprinting and cookie syncing as future work.
VI. ANA LYZING APIS USED BY FINGERPRINTERS
In this section, we are interested in systematically inves-
tigating whether any newly proposed or existing JavaScript
APIs are being exploited for browser fingerprinting. There are
serious concerns that newly proposed or existing JavaScript
APIs can be exploited in unexpected ways for browser finger-
printing [33].
We start off by analyzing the distribution of Javascript APIs
in fingerprinting scripts. Specifically, we extract Javascript API
keywords (i.e., API names, properties, and methods) from the
source code of scripts and sort them based on the ratio of their
fraction of occurrence in fingerprinting scripts to the fraction
of occurrence in non-fingerprinting scripts. This ratio captures
the relative prevalence of API keywords in fingerprinting
scripts as compared to non-fingerprinting scripts. A higher
value of the ratio for a keyword means that it is more prevalent
in fingerprinting scripts than non-fingerprinting scripts. Note
that means that the keyword is only present in fingerprinting
scripts. Table VI lists some of the interesting API keywords
that are disproportionately prevalent in fingerprinting scripts.
We note that some APIs are primarily used by fingerprinting
scripts, including APIs which have been reported by prior
fingerprinting studies (e.g., accelerometer) and those
which have not (e.g., getDevices). We present a more
comprehensive list of the API keywords disproportionately
prevalent in fingerprinting scripts in Appendix IX-F.
Keywords Ratio Scripts (count) Websites (count)
MediaDeviceInfo 1 1363
magnetometer 215 241
PresentationRequest 16 16
onuserproximity 543.77 18 18
accelerometer 326.71 219 247
chargingchange 302.10 20 20
getDevices 187.62 59 80
maxChannelCount 184.44 29 40
baseLatency 181.26 3 8
vibrate 57.68 232 1793
TABLE VI: A sample of frequently used JavaScript API keywords
in fingerprinting scripts and their presence on 20K websites crawl.
Scripts (count) represents the number of distinct fingerprinting scripts
in which the keyword is used and Websites (count) represents the
number of websites on which those scripts are embedded.
Since the number of API keywords is quite large, it is
practically infeasible to manually analyze all of them. Thus,
we first group the extracted API keywords into a few clusters
and then manually analyze the cluster which has the largest
concentration of API keywords that are disproportionately
used in the fingerprinting scripts detected by FP -INSPE CT OR.
Our key insight is that browser fingerprinting scripts typically
do not use a technique (e.g., canvas fingerprinting) in isolation
but rather combine several techniques together. Thus, we
expect fingerprinting-related API keywords to separate out as
a distinct cluster.
To group API keywords into clusters, we first construct the
co-occurrence graph of API keywords. Specifically, we model
API keywords as nodes and include an edge between them
that is weighted based on the frequency of co-occurrence in
a script. Thus, co-occurring API keywords appear together in
our graph representation. We then partition the API keyword
co-occurrence graph into clusters by identifying strongly con-
nected communities of co-occurring API keywords. Specifi-
cally, we extract communities of co-occurring keywords by
computing the partition of the nodes that maximize the mod-
ularity using the Louvain method [42]. In total, we extract
25 clusters with noticeable dense cliques of co-occurring API
keywords. To identify the clusters of interest, we assign the
API keyword’s fraction of occurrence in fingerprinting scripts
to the fraction of occurrence in non-fingerprinting scripts as
weights to the nodes. We further classify nodes based on
whether they appear in fingerprintjs2 [16], which is a popular
open-source browser fingerprinting library.
We investigate the cluster with the highest concentration of
nodes that tend to appear in the detected fingerprinting scripts
and those that appear in fingerprintjs2. While we discover a
number of previously unknown uses of JavaScript APIs by fin-
gerprinting scripts, for the sake of concise discussion, instead
of individually listing all of the previously unknown JavaScript
API keywords, we thematically group them. We discuss how
each new API we discover to be used by fingerprinting scripts
may be abused to extract identifying information about the
user or their device. While our method highlights potential
abuses, a deep manual analysis of each script is required to
confirm abuse.
Functionality fingerprinting. This category covers browser
fingerprinting techniques that probe for different function-
alities supported by the browser. Modern websites rely on
many APIs to support their rich functionality. However, not
all browsers support every API or may have the requisite user
permission. Thus, websites may need to probe for APIs and
permissions to adapt their functionality. However, such feature
probing can potentially leak entropy.
1) Permission fingerprinting: Permissions API provides a
way to determine whether a permission is granted or denied to
access a feature or an API. We discover several cases in which
the Permissions API was used in fingerprinting scripts.
Specifically, we found cases where the status and permissions
for APIs such as Notification,Geolocation, and
Camera were probed. The differences in permissions across
browsers and user settings can be used as part of a fingerprint.
2) Peripheral fingerprinting: Modern browsers provide inter-
faces to communicate with external peripherals connected with
the device. We find several cases in which peripherals such
as gamepads and virtual reality devices were probed. In one
of the examples of peripherals probing, we find a case in
which keyboard layout was probed using getLayoutMap
function. The layout of the keyboard (e.g., size, presence of
specific keys, string associated with specific keys) varies across
different vendors and models. The presence and the various
functionalities supported by these peripherals can potentially
leak entropy.
3) API fingerprinting: All browsers expose differing sets of
features and APIs to the web. Furthermore, some browser
extensions override native JavaScript methods. Such im-
plementation inconsistencies in browsers and modifications
by user-installed extensions can potentially leak entropy
[84]. We find several cases in which certain functions such
as AudioWorklet were probed by fingerprinting scripts.
AudioWorklet is only implemented in Chromium-based
browsers (e.g., Chrome or Opera) starting version 66 and
its presence can be probed to check the browser and its
version. We also find several cases where fingerprinting scripts
check whether certain functions such as setTimeout and
mozRTCSessionDescription were overridden. Function
overriding can also leak presence of certain browser exten-
sions. For example, Privacy Badger [27] overrides several
prototypes of functions that are known to be used for fin-
gerprinting.
Algorithmic fingerprinting. This category covers browser
fingerprinting techniques that do not just simply probe for
different functionalities. These browser fingerprinting tech-
niques algorithmically process certain inputs using different
JavaScript APIs and exploit the fact that different implemen-
tations process these inputs differently to leak entropy. We
discuss both newly discovered uses of JavaScript APIs that
were previously not observed in fingerprinting scripts and
known fingerprinting techniques that seem to have evolved
since their initial discovery.
1) Timing fingerprinting: The Performance API provides
high-resolution timestamps of various points during the life
cycle of loaded resources and it can be used in various
ways to conduct timing related fingerprinting attacks [29],
[82]. We find several instances of fingerprinting scripts using
the Performance API to record timing of all its events
such as domainLookupStart,domainLookupEnd,
domInteractive, and msFirstPaint. Such measure-
ments can be used to compute the DNS lookup time of a
domain, the time to interactive DOM, and the time of first
paint. A small DNS lookup time may reveal that the URL
has previously been visited and thus can leak the navigation
history [29], whereas time to interactive DOM and time to
first paint for a website may vary across different browsers and
different underlying hardware configurations. Such differences
in timing information can potentially leak entropy.
2) Animation fingerprinting: Similar to timing
fingerprinting, we found fingerprinting scripts using
requestAnimationFrame to compute the frame
rate of content rendering in a browser. The browser
guarantees that it will execute the callback function passed
to requestAnimationFrame before it repaints the view.
The browser callback rate generally matches the display
refresh rate [28] and the number of callbacks within an
interval can capture the frame rate. The differences in frame
rates can potentially leak entropy.
3) Audio fingerprinting: Englehardt and Narayanan [54] first
reported the audio fingerprinting technique that uses the
AudioContext API. Specifically, the audio signal gen-
erated with AudioContext varies across devices and
browsers. Audio fingerprinting seems to have evolved. We
identify several cases in which fingerprinting scripts used
the AudioContext API to capture additional proper-
ties such as numberOfInputs,numberOfOutputs, and
destination among many others properties. In addition
to reading AudioContext properties, we also find cases
in which canPlayType is used to extract the audio codecs
supported by the device. This additional information exposed
by the AudioContext API can potentially leak entropy.
4) Sensors fingerprinting: Prior work has shown that the
device sensors can be abused for browser fingerprinting
[43], [47], [50]. We find several instances of previously
known and unknown sensors being used by fingerprint-
ing scripts. Specifically, we find previously known sensors
[47] such as devicemotion and deviceorientation
and, more importantly, previously unknown sensors such as
userproximity being used by fingerprinting scripts.
VII. LIMITATIONS
In this section, we discuss some of the limitations of FP-
INS PE CTOR’s detection and mitigation components. Since FP -
INS PE CTOR detects fingerprinting at the granularity of a script,
an adversarial website can disperse fingerprinting scripts into
several chunks to avoid detection or amalgamate all scripts—
functional and fingerprinting—into one to avoid enforcement
of mitigation countermeasures.
Evading detection through script dispersion. For detec-
tion, FP-INS PE CTOR only considers syntactic and semantic
relationship within scripts and does not considers relationship
across scripts. Because of its current design, FP -I NS PE CT OR
may be challenged in detecting fingerprinting when the re-
sponsible code is divided across several scripts. However, FP-
INS PE CTOR can be extended to capture interaction among
scripts by more deeply instrumenting the browser. For ex-
ample, prior approaches such as AdGraph [62] and JSGraph
[69] instrument browsers to capture cross-script interaction.
Future versions of FP-INS PE CT OR can also implement such
instrumentation; in particular, FP-INSPECTO R can be extended
to capture the parent-child relationships of script inclusion.
To avoid trivial detection through parent-child relationships,
the script dispersion technique would need to be embed each
chunk into a website from an independent ancestor node,
and return the results to seemingly independent servers. Thus,
script dispersion also has a maintenance cost: each update to
the fingerprinting script will require the distribution of script
into several chunks along with extensive testing to ensure
correct implementation.
Evading countermeasures through script amalgamation.
To restrict fingerprinting, FP-IN SP EC TOR’s most effective
countermeasure (i.e. targeted API restriction) is applied at the
granularity of a script. FP-INSPECT OR may break websites
where all of the scripts are amalgamated in a single script.
However, more granular enforcement can be used to effec-
tively prevent fingerprinting in such cases. For example, the
instrumentation used by future versions of FP-INS PE CT OR can
be extended to track the execution of callbacks and target those
related to fingerprinting. It is noteworthy that—similar to script
dispersion—script amalgamation has a maintenance cost: each
update to any of the script will require the amalgamation of all
scripts into one. Script amalgamation could also be used as a
countermeasure against ad and tracker blockers, which would
introduce the same type of breakage. However, anecdotal
evidence suggests that the barriers to use are sufficiently
high to prevent widespread deployment of amalgamation as
a countermeasure against privacy tools.
VIII. CONCLUSION
We presented FP-IN SP EC TOR, a machine learning based
syntactic-semantic approach to accurately detect browser fin-
gerprinting behaviors. FP-INS PE CTOR outperforms heuristics
from prior work by detecting 26% more fingerprinting scripts
and helps reduce website breakage by 2X. FP -I NS PE CT OR’s
deployment showed that browser fingerprinting is more preva-
lent on the web now than ever before. Our measurement study
on the Alexa top-100K websites showed that fingerprinting
scripts are deployed on 10.18% of the websites by 2,349
different domains.
We plan to report the domains serving fingerprinting scripts
to tracking protection lists such as Disconnect [12] and
EasyPrivacy [14]. FP-IN SP EC TO R also helped uncover ex-
ploitation of several new APIs that were previously not known
to be used for browser fingerprinting. We plan to report
the names and statistics of these APIs to privacy-oriented
browser vendors and standards bodies. To foster follow-up
research, we will release our patch to OpenWPM, finger-
printing countermeasures prototype extension, list of newly
discovered fingerprinting vendors, and bug reports submitted
to tracking protection lists, browser vendors, and standards
bodies at https://uiowa-irl.github.io/FP-Inspector.
ACK NOW LE DG EM EN TS
The authors would like to thank Charlie Wolfe (NSF REU
Scholar) for his help with the breakage analysis. A part of this
work was carried out during the internship of the lead author
at Mozilla. This work is supported in part by the National
Science Foundation under grant numbers 1715152, 1750175,
1815131, and 1954224.
REFERENCES
[1] Adscore privacy policy. https://www.adscore.com/privacy-policy.
[2] Apple Declares War on Browser Fingerprinting, the Sneaky Tactic That
Tracks You in Incognito Mode. https://gizmodo.com/apple- declares-
war-on-browser-fingerprinting-the-sneak-1826549108.
[3] Brave Browser Fingerprinting Protection Mode. https://github.com/
brave/browser-laptop/wiki/Fingerprinting-Protection-Mode.
[4] C3 Metrics privacy policy. https://c3metrics.com/privacy/.
[5] Canvas Defender. https://multilogin.com/canvas-defender/.
[6] Cliqz Content Blocking Library. https://github.com/cliqz- oss/adblocker.
[7] Combating Fingerprinting with a Privacy Budget Explainer. https:
//github.com/bslassey/privacy-budget.
[8] Default on Cookie Restrictions Excerpt. https://mozilla.report/post/
projects/cookie restrictions.kp/.
[9] Disconnect policy review for adscore. https://github.com/
disconnectme/disconnect-tracking- protection/commit/
9666265d0a26fbcc65a20c1021517a44a5ade580.
[10] Disconnect policy review for c3metrics. https://
github.com/disconnectme/disconnect-tracking- protection/
blob/940d5e6da8fbc738a747a30328c397c4f453683a/
descriptions.md#policy-review-3.
[11] Disconnect tracking definition. https://disconnect.me/
trackerprotection#definition-of- tracking.
[12] Disconnect tracking protection lists. https://disconnect.me/
trackerprotection.
[13] DoubleVerify, Product Privacy Notice. https://web.archive.org/web/
20191130014642/https://www.doubleverify.com/privacy/.
[14] EasyPrivacy. https://easylist.to/easylist/easylist.txt.
[15] Fingerprinting Defenses in The Tor Browser. https://www.torproject.org/
projects/torbrowser/design/#fingerprinting-defenses.
[16] Fingerprintjs2 fingerprinting script. https://fingerprintjs.com/.
[17] Firefox Fingerprinting Blocking Breakage Bugs. https:
//bugzilla.mozilla.org/show bug.cgi?id=1527013.
[18] Firm uses typing cadence to finger unauthorized users.
https://arstechnica.com/tech-policy/2010/02/firm-uses-typing-cadence-
to-finger-unauthorized-users/.
[19] Full Third-Party Cookie Blocking and More. https://webkit.org/blog/
10218/full-third- party-cookie- blocking-and- more/.
[20] How to block fingerprinting with Firefox. https://blog.mozilla.org/
firefox/how-to-block-fingerprinting-with-firefox/.
[21] Improving Privacy Without Breaking The Web. https://blog.mozilla.org/
data/2018/01/26/improving-privacy-without-breaking-the-web/.
[22] Integral Ad Science, Privacy Policy. https://web.archive.org/web/
20191130014644/https://integralads.com/privacy-policy/.
[23] Iovation Fraud Protection. https://web.archive.org/web/
20191130164107/https://www.iovation.com/fraudforce-fraud-detection-
prevention.
[24] MDN Web APIs. https://developer.mozilla.org/en-US/docs/Web/API.
[25] MediaMath (MathTag) fingerprinting script. https://
www.mediamath.com/.
[26] Mozilla postpones default blocking of third-party cookies in
Firefox. https://www.computerworld.com/article/2497782/mozilla-
postpones-default- blocking-of- third-party- cookies-in- firefox.html.
[27] Privacy Badger. https://www.eff.org/privacybadger.
[28] requestAnimationFrame API. https://developer.mozilla.org/en-
US/docs/Web/API/window/requestAnimationFrame.
[29] Same-origin security model - Resource Timing APIs. https://
w3c.github.io/perf-security- privacy/#same-origin-security- model.
[30] The Tapad Graph. https://www.tapad.com/the-tapad- graph.
[31] Tor browser bug - reduced time precison to mitimate fingerprinting.
https://trac.torproject.org/projects/tor/ticket/1517.
[32] Tor Browser Fingerprinting Bugs. https://trac.torproject.org/projects/tor/
query?keywords=tbb-fingerprinting.
[33] W3C Fingerprinting Guidance. https://w3c.github.io/fingerprinting-
guidance.
[34] W3C. Privacy Interest Group Charter. https://www.w3.org/2011/07/
privacy-ig-charter.
[35] webRequest API. https://developer.mozilla.org/en-US/docs/Mozilla/
Add-ons/WebExtensions/API/webRequest.
[36] Webshrinker Website Categorization. https://www.webshrinker.com/.
[37] ACA R, G. , EU BAN K, C. , ENGLEHARDT, S., JUAR EZ , M.,
NAR AYAN AN, A ., A ND DIAZ, C. The Web Never Forgets: Persistent
Tracking Mechanisms in the Wild. In CCS (2014).
[38] ACAR, G. , JUA REZ , M., NIKIFORAKIS, N. , DIAZ, C., G ¨
URSES, S.,
PIESSENS, F., AN D PREN EE L, B. FPDetective: dusting the web for
fingerprinters. In Proceedings of CCS (2013), ACM.
[39] AL-FANNA H, N. M., L I, W., AND MITCHELL, C. J . Beyond Cookie
Monster Amnesia: Real World Persistent Online Tracking. In Informa-
tion Security Conference (2018).
[40] AL ACA, F., AND VAN OORSCHOT, P. Device Fingerprinting for Aug-
menting Web Authentication: Classification and Analysis of Methods.
In Proceedings of the 32nd Annual Conference on Computer Security
Applications (ACSAC) (2016).
[41] AZ AD, B . A., STAROV, O., LAPERDRIX, P., AN D NIKIFORAKIS, N.
Web runner 2049: Evaluating third-party anti-bot services. In 17th
Conference on Detection of Intrusions and Malware & Vulnerability
Assessment (DIMVA) (2020).
[42] BL OND EL , V. D. , GUILLAUME, J.-L., LAMBIOTTE, R., A ND LE FEB -
VRE, E. Fast unfolding of communities in large networks. In Journal
of Statistical Mechanics: Theory and Experiment (2008).
[43] BOJINOV, H. , MICHALEVSKY, Y., NA KIB LY, G., AN D BONE H, D .
Mobile Device Identification via Sensor Fingerprinting. In arXiv (2014).
[44] BUR SZ TEI N, E ., MA LYSH EV, A., P IETRASZEK, T., AN D THOMAS, K.
Picasso: Lightweight Device Class Fingerprintingfor Web Clients. In
ACM CCS Workshop on Security and Privacy in Smartphones and
Mobile Devices (SPSM) (2016).
[45] CAO , S. Y., AND WIJ MAN S, E. (cross-)browser fingerprinting via os
and hardware level features. In Proceedings of the 2017 Network &
Distributed System Security Symposium, NDSS (2017), vol. 17.
[46] CURTSINGER, C., LIVSHITS, B. , ZO RN, B ., A ND SEI FE RT, C. ZOZ-
ZLE: Fast and Precise In-Browser JavaScript Malware Detection. In
USENIX Security Symposium (2011).
[47] DAS , A., ACAR, G., B OR ISOV, N. , AN D PRAD EEP, A. The Web’s
Sixth Sense:A Study of Scripts Accessing Smartphone Sensors. In CCS
(2018).
[48] DATTA, A., LU, J ., AND TSCHANTZ, M. C. The effectiveness of
privacy enhancing technologies against fingerprinting. arXiv preprint
arXiv:1812.03920 (2018).
[49] DAVIS , W. BlueCava Touts Device Fingerprinting. https:
//web.archive.org/web/20150928090154/https://www.mediapost.com/
publications/article/166916/bluecava-touts-device- fingerprinting.html,
2012.
[50] DE Y, S., ROY, N., X U, W., CH OUD HU RY, R. R., AND SR I-
HA RINELAKUDITI. AccelPrint: Imperfections of accelerometers make
smartphones trackable. In Proceeding of the 21st Annual Network and
Distributed System Security Symposium (NDSS) (2014).
[51] EC KER SL EY, P. How unique is your web browser? In Privacy
Enhancing Technologies (2010), Springer.
[52] ED ELS TE IN, A. Protections Against Fingerprinting and Cryptocurrency
Mining Available in Firefox Nightly and Beta. https://blog.mozilla.org/
futurereleases/2019/04/09/protections-against- fingerprinting-and-
cryptocurrency-mining-available-in-firefox- nightly-and- beta/, 2019.
[53] ENGLEHARDT, S. The Hidden Perils of Cookie Syncing.
https://freedom-to- tinker.com/2014/08/07/the-hidden- perils-of- cookie-
syncing/, 2014.
[54] ENGLEHARDT, S., AND NA RAYANA N, A. Online Tracking: A 1-
million-site Measurement and Analysis. In ACM Conference on Com-
puter and Communications Security (CCS) (2016).
[55] FASS , A., BACK ES, M ., A ND STO CK , B. Jstap: A static pre-filter
for malicious javascript detection. In Proceedings of the 32nd Annual
Conference on Computer Security Applications (ACSAC) (2019).
[56] FI FIEL D, D., AN D EGELMAN, S. Fingerprinting web users through font
metrics. In Financial Cryptography and Data Security. Springer, 2015,
pp. 107–124.
[57] FO UAD, I ., B IEL OVA, N. , LEGOUT, A., AND SA RA FIJAN OVI C-D JUKIC,
N. Missed by Filter Lists: Detecting Unknown Third-Party Trackers with
Invisible Pixels. In Proceedings on Privacy Enhancing Technologies
(PETS) (2020).
[58] FOW LE R, G. A. Think you’re anonymous online? A third of popular
websites are ’fingerprinting’ you. https://www.washingtonpost.com/
technology/2019/10/31/think-youre- anonymous-online-third-popular-
websites-are- fingerprinting-you/, 2019.
[59] GO MEZ -BOIX, A., L APERDRIX, P., AND BAU DRY, B. Hiding in the
Crowd: an Analysis of the Effectiveness of Browser Fingerprinting at
Large Scale. In The Web Conference (2018).
[60] IK RAM , M., ASGHAR, H. J ., KAAFA R, M. A ., M AHA NT I, A. , AN D
KRISHNAMURTHY, B. Towards Seamless Tracking-Free Web: Improved
Detection of Trackers via One-class Learning . In Privacy Enhancing
Technologies Symposium (PETS) (2017).
[61] IQ BAL , U., S HAFIQ, Z., A ND QIAN, Z. The Ad Wars: Retrospective
Measurement and Analysis of Anti-Adblock Filter Lists. In IMC (2017).
[62] IQ BAL , U., S NYDER, P., ZHU , S., LIVSHITS, B. , QIAN, Z., AN D
SHA FIQ, Z. AdGraph: A Graph-Based Approach to Ad and Tracker
Blocking. In To appear in the Proceedings of the IEEE Symposium on
Security & Privacy (2020).
[63] JOHN RO SS QUINLAN.Induction of decision trees. Kluwer Academic
Publisher, 1986.
[64] LAPERDRIX, P. Browser Fingerprinting: An Introduction and the
Challenges Ahead. https://blog.torproject.org/browser-fingerprinting-
introduction-and- challenges-ahead, 2019.
[65] LAPERDRIX, P., AVOIN E, G., BAU DRY, B., A ND NIKIFORAKIS, N.
Morellian Analysis for Browsers: Making Web Authentication Stronger
with Canvas Fingerprinting. In International Conference on Detection of
Intrusions and Malware, and Vulnerability Assessment (DIMVA) (2019).
[66] LAPERDRIX, P., BI EL OVA, N., BAUDRY, B. , AND AVOI NE , G. Browser
fingerprinting: A survey. arXiv preprint arXiv:1905.01051 (2019).
[67] LAPERDRIX, P., RU DAM ETK IN, W., AND BAU DRY, B. Beauty and
the Beast: Diverting modern web browsers to build unique browser
fingerprints. In IEEE Symposium on Security and Privacy (2016).
[68] LAU ING ER , T., CHAABANE, A. , ARSHAD, S., ROB ERT SON , W., WI L-
SO N, C. , AN D KIRDA , E. Thou Shalt Not Depend on Me: Analysing
the Use of Outdated JavaScript Libraries on the Web. In Network and
Distributed System Security Symposium (NDSS) (2017).
[69] LI, B., VADR EVU , P., LEE , K. H ., AN D PERDISCI, R. JSgraph:
Enabling Reconstruction of Web Attacks via Efficient Tracking of
Live In-Browser JavaScript Executions. In 25th Annual Network and
Distributed System Security Symposium (2018).
[70] LUNDEN, I. Relx acquires ThreatMetrix for 817M to ramp up
in risk-based authentication. https://techcrunch.com/2018/01/29/relx-
threatmetrix-risk- authentication-lexisnexis/, 2018.
[71] MAYE R, J. R. “any person... a pamphleteer”: Internet anonymity in the
age of web 2.0.
[72] MOW ERY, K., AND SHACHAM, H. Pixel perfect: Fingerprinting canvas
in html5. Proceedings of W2SP (2012).
[73] NE TIQ. Device Fingerprinting for Low Friction Authentication.
https://www.microfocus.com/media/white-paper/device fingerprinting
for low friction authentication wp.pdf.
[74] NIKIFORAKIS, N., INVERNIZZI, L ., KAPR AVELO S, A ., AC KER , S. V.,
JOOSEN, W., KRU EG EL, C ., PIESSENS, F., AND VIGN A, G. You
Are What You Include: Large-scale Evaluation of Remote JavaScript
Inclusions. In ACM Conference on Computer and Communications
Security (CCS) (2012).
[75] NIKIFORAKIS, N., KA PR AVELO S, A. , JO OSE N, W., K RUE GEL , C.,
PIESSENS, F., AN D VIGN A, G. Cookieless monster: Exploring the
ecosystem of web-based device fingerprinting. In Security and Privacy
(S&P) (2013), IEEE.
[76] NOTTINGHAM, M. Unsanctioned Web Tracking. https://www.w3.org/
2001/tag/doc/unsanctioned-tracking/, 2015.
[77] OLEJNIK, L., ACA R, G., CASTELLUCCIA, C ., A ND DIAZ, C. The
leaking battery: A privacy analysis of the HTML5 Battery Status API.
In Cryptology ePrint Archive: Report 2015/616 (2015).
[78] OLEJNIK, L., ENGLEHARDT, S., AND NA RAYANA N, A. Battery Status
Not Included:Assessing Privacy in Web Standards. In International
Workshop on Privacy Engineering (2017).
[79] PAPADOPOULOS, P., KOU RTE LLI S, N ., AN D MAR KATOS , E. P. Cookie
Synchronization: Everything You Always Wanted to Know But Were
Afraid to Ask. In The Web Conference (2019).
[80] PAPADOPOULOS, P., SNYDER, P., ATHANASAKIS, D., A ND LIVSHITS,
B. Keeping out the Masses: Understanding the Popularity and Implica-
tions of Internet Paywalls. In The Web Conference (2020).
[81] QUINLAN, R. C4.5: Programs for Machine Learning. Morgan
Kaufmann Publishers, San Mateo, CA, 1993.
[82] SANCHEZ-ROLA , I., SAN TO S, I. , AND BAL ZARO TTI , D. Clock Around
the Clock: Time-Based Device Fingerprinting. In ACM Conference on
Computer and Communications Security (CCS) (2018).
[83] SC HUH , J. Building a more private web. https://www.blog.google/
products/chrome/building-a-more-private-web, 2019.
[84] SC HWARZ , M., LAC KNE R, F., A ND GRU SS, D . JavaScript Template
Attacks: Automatically Inferring Host Information for Targeted Exploits.
In NDSS (2019).
[85] SC ULLY, R. Identity Resolution vs Device Graphs: Clarifying the Differ-
ences. https://amperity.com/blog/identity-resolution- vs-device-graphs-
clarifying-differences/.
[86] SI VAKORN , S. , POLAKIS, J. , AND KERO MY TIS , A. D . I’m not a human:
Breaking the Google reCAPTCHA. In Black Hat Asia (2016).
[87] SKO LK A, P., STAI CU , C.-A., AND PRADEL, M. Anything to Hide?
Studying Minified and Obfuscated Code in the Web. In World Wide
Web (WWW) Conference (2019).
[88] SNYDER, P., ANSARI, L ., TAYL OR, C ., A ND KANICH, C. Browser
feature usage on the modern web. In Proceedings of the 2016 Internet
Measurement Conference (2016), ACM, pp. 97–110.
[89] SNYDER, P., TAYLO R, C. , AN D KANICH, C. Most websites don’t need
to vibrate: A cost-benefit approach to improving browser security. In
Proceedings of the 2017 ACM SIGSAC Conference on Computer and
Communications Security (2017), ACM, pp. 179–194.
[90] STARO V, O., AND NIKIFORAKIS, N. Xhound: Quantifying the finger-
printability of browser extensions. In 2017 IEEE Symposium on Security
and Privacy (SP) (2017), IEEE, pp. 941–956.
[91] VALENTINO RIZZO . Machine Learning Approaches for Automatic
Detection of Web Fingerprinting. Master’s thesis, Politecnico di Torino,
Corso di laurea magistrale in Ingegneria Informatica (Computer Engi-
neering), 2018.
[92] VASTE L, A., LAPERDRIX, P., RU DAM ETK IN , W., AN D ROUVO Y, R.
Fp-Scanner: The Privacy Implications of Browser Fingerprint Inconsis-
tencies. In USENIX Security (2018).
[93] VASTE L, A., LAPERDRIX, P., RU DAM ETK IN , W., AN D ROUVO Y, R.
Fp-stalker: Tracking browser fingerprint evolutions. In 2018 IEEE
Symposium on Security and Privacy (SP) (2018), IEEE, pp. 728–741.
[94] WILANDER, J. Intelligent Tracking Prevention 2.3. https://webkit.org/
blog/9521/intelligent-tracking- prevention-2-3/, 2019.
[95] WOO D, M. Todays Firefox Blocks Third-Party Tracking Cookies and
Cryptomining by Default. https://blog.mozilla.org/blog/2019/09/03/
todays-firefox- blocks-third- party-tracking- cookies-and- cryptomining-
by-default/, 2019.
[96] WU, Q., LI U, Q., ZH AN G, Y., LI U, P., AN D WEN, G . A Machine
Learning Approach for Detecting Third-Party Trackers on the Web. In
ESORICS (2016).
IX. APPENDIX
A. Extensions to OpenWPM JavaScript instrumentation
OpenWPM’s instrumentation does not cover a number of
APIs used for fingerprinting by prominent libraries—
including the Web Graphics Library (WebGL) and
performance.now. These APIs have been discovered
to be fingerprintable [64]. The standard use case of
WebGL is to render 2D and 3D graphics in HTML canvas
element, however, it has potential to be abused for browser
fingerprinting. The WebGL renderer and vendor varies by
the OS and it creates near distinct WebGL images with same
configurations on different machines. The WebgGL properties
and the rendered image are used by current state-of-the-art
browser fingerprinting [16], [25] scripts. Since WebGL is
used by popular fingerprinting scripts, we instrument WebGL
JavaScript API. performance.now is another JavaScript
API method whose standard use case is to return time in
floating point milliseconds since the start of a page load but
it also have fingerprinting potential. Specifically, the timing
information extracted from performance.now can be used
for timing specific fingerprint attacks such as typing cadence
[18], [31]. We extend OpenWPM to also capture execution
of performance.now.
For completeness, we instrument additional un-instrumented
methods of already instrumented JavaScript APIs in Open-
WPM. Specifically, we enhance our execution trace by instru-
menting methods such as drawImage and sendBeacon for
canvas and navigation JavaScript APIs, respectively.
Since most fingerprinting scripts use JavaScript APIs that
are also used by gaming and interactive websites (e.g.
canvas), we instrument additional JavaScript APIs to cap-
ture script’s interaction with DOM. Specifically, to capture
DOM interaction specific JavaScript APIs, we instrument
document,node, and animation APIs. JavaScript is
an event driven language and it has capability to execute
code when events trigger. To extend our execution trace,
we instrument JavaScript events such as onmousemove and
touchstart to capture user specific interactions.
In addition, we notice that some scripts make multiple calls
to JavaScript API methods such as createElement and
setAttribute during their execution. We limit our record-
ing to only first 50 calls of each method per script, except
for CanvasRenderingContext2D.measureText and
CanvasRenderingContext2D.font, which are called
multiple times for canvas font fingerprinting. Furthermore,
the event driven nature of JavaScript makes it challenging to
capture the complete execution trace of scripts. To this end, to
get a comprehensive execution of a script, we synthetically
simulate user activity on a webpage. First, we scroll the
wbepage from top to bottom and do random mouse movements
to trigger events. Second, we record all of the events (e.g.
onscroll) as they are registered on different elements on
a webpage and execute them after 10 seconds of a page
load. Doing so, we synthetically simulate events and capture
JavaScript API methods that were waiting for those events to
trigger.
B. Sample Features Extracted From ASTs & Execution Traces
Table VII shows a sample of the features extracted from
the AST in Figure 2b and Table VIII shows a sample of the
dynamic features extracted from execution trace of Script 3a.
Static Features
ArrayExpression:monospace
MemberExpression:font
ForStatement:var
MemberExpression:measureText
MemberExpression:width
MemberExpression:length
MemberExpression:getContext
CallExpression:canvas
TABLE VII: A sample of features extracted from AST in Figure 2b.
C. Fingerprinting Heuristics
Below we list down the slightly modified versions of
heuristics proposed by Englehardt and Narayanan [54] to
detect fingerprinting scripts. Since non-fingerprinting adoption
of fingerprinting APIs have increased since the study, we make
modifications to the heuristics to reduce the false positives.
These heuristics are used to build our initial ground truth of
fingerprinting and non-fingerprinting scripts.
Feature Name Feature Value
Document.createElement True
HTMLCanvasElement.width True
HTMLCanvasElement.height True
HTMLCanvasElement.getContext True
CanvasRenderingContext2D.measureText True
Element Tag Name Canvas
HTMLCanvasElement.width 100
HTMLCanvasElement.height 100
CanvasRenderingContext2D.measureText 7 (no. of chars.)
CanvasRenderingContext2D.measureText N (no. of calls)
TABLE VIII: A sample of the dynamic features extracted from the
execution trace of Script 3a.
Canvas Fingerprinting. A script is identified as canvas fin-
gerprinting script according to the following rules:
1) The canvas element text is written with fillText or
strokeText and style is applied with fillStyle or
strokeStyle methods of the rendering context.
2) The script calls toDataURL method to extract the
canvas image.
3) The script does not calls save,restore, and
addEventListener methods on the canvas element.
WebRTC Fingerprinting. A script is identified as WebRTC
fingerprinting script according to the following rules:
1) The script calls createDataChannel or
createOffer methods of the WebRTC peer connection.
2) The script calls onicecandidate or local
Description methods of the WebRTC peer connection.
Canvas Font Fingerprinting. A script is identified as canvas
font fingerprinting script according to the following rules:
1) The script sets the font property on a canvas element
to more than 20 different fonts.
2) The script calls the measureText method of the ren-
dering context more than 20 times.
AudioContext Fingerprinting. A script is identified as Audio-
Context fingerprinting script according to the following rules:
1) The script calls any of the createOscillator,
createDynamicsCompressor,destination,start
Rendering, and oncomplete method of the audio context.
D. Examples of Dormant and Deviating Scripts
Script 3 shows an example dormant script and Script 4
shows an example deviating script.
E. Why Machine Learning?
To conduct fingerprinting, websites often embed off-the-
shelf third-party fingerprinting libraries. Thus, one possible ap-
proach to detect fingerprinting scripts is to simply compute the
textual similarity between the known fingerprinting libraries
and the scripts embedded on a website. Scripts that have
higher similarity with known fingerprinting libraries are more
likely to be fingerprinting scripts. To test this hypothesis, we
compare the similarity of fingerprinting and non-fingerprinting
scripts detected by FP -INSPE CT OR against fingerprintjs2, a
1(function(g) {
2......
3n.prototype ={
4getCanvasPrint: function() {
5var b=document.createElement("canvas"),d;
6try {
7d=b.getContext("2d")
8}catch (e) {
9return ""
10 }
11 d.textBaseline ="top";
12 d.font ="14px 'Arial'";
13 ...
14 d.fillText("http://valve.github.io", 4, 17);
15 return b.toDataURL()
16 }
17 };
18 "object" === typeof module &&
19 "undefined" !== typeof exports && (
module.exports =n);
20 g.ClientJS =n
21 })(window);
Script 3: A truncated example of a dormant script from
sdk1.resu.io/scripts/resclient.min.js in which function prototypes
are assigned to the window object and can be called at a later
point in time.
1...
2canvas: function(t) {
3var e=document.createElement("canvas");
4if ("undefined" == typeof e.getContext)
5t.push("UNSUPPORTED_CANVAS");
6else {
7e.width =780, e.height =150;
8var n="UNICODE STRING",
9i=e.getContext("2d");
10 i.save(), i.rect(0,0,10,10), i.rect(2,2,6,6),
11 t.push(!1 === i.isPointInPath(5, 5, "evenodd")
12 ? "yes" : "no"), i.restore(), i.save();
13 var r=i.createLinearGradient(0, 0, 200, 0);
14 .....
15 i.shadowColor="rgb(85,85,85)",i.shadowBlur=3,
16 i.arc(500,15,10,0,2*Math.PI,!0),i.stroke(),
17 i.closePath(),i.restore(),t.push(e.toDataURL())
18 }
19 return t
20 }
21 ...
Script 4: A truncated example of a deviating script from
webresource.c-ctrip.com/code/ubt/ bfa.min.js?v=20195 22.js.
The heuristic is designed to ignore scripts that call save or
restore on CanvasRenderingContext2D as a way to
reduce false positives.
popular open-source fingerprinting library. Specifically, we
tokenize scripts into keywords by first beautifying them and
then splitting them on white spaces. We then compute a tok-
enized script’s Jaccard similarity, pairwise, with all versions of
fingerprintjs2. The highest similarity score among all versions
is attributed to a script.
Our test set consists of the fingerprinting scripts detected
by FP-INS PE CTOR and an equal number of randomly sampled
non-fingerprinting scripts. Figure 5, plots the similarity of FP -
INS PE CTOR’s detected fingerprinting and non-fingerprinting
scripts with fingerprintjs2. We find that the majority of the
detected fingerprinting scripts (54.06%) have less than 6%
similarity to fingerprintjs2 and only 13.49% of the scripts have
more than 30% similarity. Whereas most of the detected non-
fingerprinting scripts (90.94%) have less than 5% similarity to
fingerprintjs2 and only 9.05% of the scripts have more than 5%
similarity. We find that the true positive rate is at the highest
(69.20%) and false positive rate is at the lowest (5.97%) with
an accuracy of 81.69%, when we set the similarity threshold
to 5.28%. The shaded portion of the figure represents the
scripts classified as non-fingerprinting and the clear portion
of the figure represents the scripts classified as fingerprinting
using this threshold. There is a significant overlap between the
similarity of both fingerprinting and non-fingerprinting scripts
and there is no optimal way to use similarity as a classification
threshold.
0.0% 5.0% 10.0% 15.0% 20.0% 25.0% 30.0%
Similarity with fingerprintjs2
0.0
0.2
0.4
0.6
0.8
1.0
Fraction of scripts
Non-Fingerprinting
Fingerprinting
Fig. 5: Jaccard similarity of fingerprinting and non-fingerprinting
scripts with fingerprintjs2. The shaded portion of the figure represents
the scripts classified as non-fingerprinting and the clear portion of the
figure represents the scripts classified as fingerprinting based on the
similarity threshold.
Overall, our analysis shows that most websites do not inte-
grate fingerprinting libraries as-is but instead make alterations.
Alterations often include embedding minified or obfuscated
versions of the fingerprinting libraries, embedding only a
subset of the fingerprinting functionality, or fingerprinting
libraries inspired re-implementation. Such alterations cause a
lower similarity between fingerprinting scripts and popular
fingerprinting libraries. We also find that several APIs are
frequently used in both fingerprinting and non-fingerprinting
scripts. Common examples include the use of utility APIs
such as Math and window, and non-fingerprinting scripts using
fingerprinting APIs for functional purposes e.g. canvas API
being used for animations. The presence of such APIs results
in increase of similarity between non-fingerprinting scripts and
fingerprinting libraries. A simple similarity metric cannot gen-
eralize on alterations to fingerprinting libraries and functional
uses of APIs, and thus fails to detect fingerprinting scripts.
Whereas, our syntactic-semantic machine learning approach
is able to generalize. Our analysis justifies the efficacy of a
learning based approach over simple similarity metric.
F. JavaScript APIs Frequently Used in Fingerprinting Scripts
Below we provide a list of JavaScript API keywords fre-
quently used by fingerprinting scripts. To this end, we measure
the relative prevalence of API keywords in fingerprinting
scripts by computing the ratio of their fraction of occurrence
in fingerprinting scripts to their fraction of occurrence in
non-fingerprinting scripts. A higher value of the ratio for a
keyword means that it is more prevalent in fingerprinting
scripts than non-fingerprinting scripts. Note that means that
the keyword is only present in fingerprinting scripts. Table IX
includes keywords that have pervasiveness values greater than
or equal to 16 and are present on 3 or more websites.
Keywords Ratio Scripts Websites
(count) (count)
onpointerleave 4 1366
StereoPannerNode 1 1363
FontFaceSetLoadEvent 1 1363
PresentationConnection
AvailableEvent 1 1363
msGetRegionContent 1 1363
peerIdentity 1 1363
MSManipulationEvent 1 1363
VideoStreamTrack 1 1363
mozSetImageElement 1 1363
requestWakeLock 1 174
audioWorklet 3 8
onwebkitanimationiteration 3 3
onpointerenter 3 3
onwebkitanimationstart 3 3
onlostpointercapture 3 3
ongotpointercapture 362.52 3 3
onpointerout 362.52 3 3
onafterscriptexecute 217.51 18 1380
channelCountMode 199.03 28 39
onpointerover 181.26 3 3
onbeforescriptexecute 181.26 18 1380
onicegatheringstatechange 179.78 61 61
MediaDevices 161.12 4 1366
numberOfInputs 157.09 26 36
channelInterpretation 147.69 11 22
speedOfSound 140.98 7 11
dopplerFactor 140.98 7 11
midi 138.72 225 251
ondeviceproximity 131.35 25 282
HTMLMenuItemElement 121.40 218 244
updateCommands 120.84 1 1363
exportKey 105.97 57 57
onauxclick 90.63 3 3
microphone 90.43 223 250
iceGatheringState 90.30 68 1481
ondevicelight 88.31 19 36
renderedBuffer 87.17 189 439
WebGLContextEvent 82.52 28 44
ondeviceorientationabsolute 80.56 4 1366
startRendering 79.33 193 458
createOscillator 78.77 191 445
knee 76.65 170 419
OfflineAudioContext 74.68 199 721
timeLog 72.50 12 12
getFloatFrequencyData 72.50 6 10
WEBGL compressed texture atc 72.50 3 4
illuminance 72.50 3 3
reduction 69.64 170 419
modulusLength 69.39 58 58
WebGL2RenderingContext 68.71 29 30
enumerateDevices 64.12 208 666
AmbientLightSensor 63.60 10 267
attack 61.31 173 434
AudioWorklet 60.42 22 32
Worklet 60.42 22 32
AudioWorkletNode 60.42 22 32
lastStyleSheetSet 60.42 1 1363
DeviceProximityEvent 60.42 1 1363
DeviceLightEvent 60.42 1 1363
enableStyleSheetsForSet 60.42 1 1363
UserProximityEvent 60.42 1 1363
mediaDevices 60.03 230 850
vendorSub 56.17 251 1728
setValueAtTime 55.29 167 417
getChannelData 55.18 195 460
MAX DRAW BUFFERS WEBGL 54.93 10 12
reliable 52.36 39 103
WEBGL draw buffers 52.09 25 27
EXT sRGB 51.79 3 4
setSinkId 50.35 5 1367
namedCurve 50.29 67 74
WEBGL debug shaders 45.31 3 4
productSub 42.79 734 2819
hardwareConcurrency 41.92 716 3661
publicExponent 41.52 67 74
requestMIDIAccess 40.28 1 1363
mozIsLocallyAvailable 40.28 1 174
ondevicemotion 40.28 4 4
XPathResult 39.73 218 417
mozBattery 39.04 42 322
IndexedDB 38.73 25 25
generateKey 37.46 62 62
buildID 36.52 272 414
getSupportedExtensions 36.46 534 1007
MAX TEXTURE MAX
ANISOTROPY EXT 35.85 521 980
oscpu 35.33 681 1196
oninvalid 34.75 65 1428
vpn 34.53 24 24
createDynamicsCompressor 33.54 189 442
privateKey 33.46 67 74
EXT texture filter anisotropic 32.91 479 949
isPointInPath 32.17 481 949
getContextAttributes 31.76 460 920
BatteryManager 31.23 23 50
getShaderPrecisionFormat 30.81 450 915
depthFunc 30.81 452 921
uniform2f 30.71 460 930
rangeMax 30.36 449 902
rangeMin 30.24 446 897
EXT disjoint timer query 30.21 3 4
scrollByPages 30.21 1 1363
CanvasCaptureMediaStreamTrack 30.21 1 18
onlanguagechange 30.21 4 4
clearColor 29.16 457 916
createWriter 28.93 17 17
getUniformLocation 28.61 466 948
getAttribLocation 28.58 464 945
drawArrays 28.53 466 948
useProgram 28.37 467 949
enableVertexAttribArray 28.37 466 948
createShader 28.31 467 949
compileShader 28.30 467 936
shaderSource 28.27 466 936
attachShader 28.25 464 934
bufferData 28.24 466 938
linkProgram 28.23 464 933
vertexAttribPointer 28.22 464 933
bindBuffer 28.14 463 932
createProgram 27.95 464 934
OES standard derivatives 27.46 20 1384
appCodeName 27.03 325 1890
getAttributeNodeNS 26.49 16 21
ARRAY BUFFER 25.36 471 941
suffixes 25.14 775 1441
TouchEvent 25.01 481 1130
MIDIPort 24.17 2 19
onaudioprocess 23.64 9 17
showModalDialog 23.56 39 1419
globalStorage 23.48 245 1681
camera 22.76 229 255
onanimationiteration 22.66 3 3
textBaseline 21.76 888 3234
MediaStreamTrackEvent 21.32 3 1365
deviceproximity 21.13 25 26
taintEnabled 20.89 14 24
alphabetic 20.65 671 2986
userproximity 20.28 24 25
globalCompositeOperation 20.15 507 975
outputBuffer 20.14 12 34
WebGLUniformLocation 20.14 1 1363
WebGLShaderPrecisionFormat 20.14 1 1363
createScriptProcessor 20.14 11 20
createBuffer 19.98 472 954
UIEvent 19.93 47 63
toSource 19.54 416 2224
createAnalyser 19.33 12 17
fillRect 19.22 898 3432
evenodd 18.49 504 960
fillText 18.09 957 3502
candidate 18.03 178 1847
WEBGL debug renderer info 17.83 406 2214
toDataURL 17.64 951 3507
dischargingTime 17.53 38 54
bluetooth 17.28 225 424
FLOAT 16.89 467 939
battery 16.82 152 1853
devicelight 16.51 25 26
onanimationstart 16.48 3 3
getExtension 16.43 575 1115
onemptied 16.11 4 4
TABLE IX: JavaScript API keywords frequently used in fingerprint-
ing scripts, and their presence on 20K websites crawl. Scripts (count)
represents the number of distinct fingerprinting scripts in which the
keyword is used and Websites (count) represents the number of
websites on which those scripts are embedded.