Proceedings on Privacy Enhancing Technologies ; 2021 (3):453–473
Nathan Reitinger* and Michelle L. Mazurek
ML-CB: Machine Learning Canvas Block
Abstract: With the aim of increasing online privacy,
we present a novel, machine-learning based approach
to blocking one of the three main ways website visi-
tors are tracked online—canvas fingerprinting. Because
the act of canvas fingerprinting uses, at its core, a
JavaScript program, and because many of these pro-
grams are reused across the web, we are able to fit sev-
eral machine learning models around a semantic repre-
sentation of a potentially offending program, achieving
accurate and robust classifiers. Our supervised learn-
ing approach is trained on a dataset we created by
scraping roughly half a million websites using a cus-
tom Google Chrome extension storing information re-
lated to the canvas. Classification leverages our key in-
sight that the images drawn by canvas fingerprinting
programs have a facially distinct appearance, allowing
us to manually classify files based on the images drawn;
we take this approach one step further and train our
classifiers not on the malleable images themselves, but
on the more-difficult-to-change, underlying source code
generating the images. As a result, ML-CB allows for
more accurate tracker blocking.
Keywords: Privacy, Device Fingerprinting, Web Measurement,
Machine Learning
DOI 10.2478/popets-2021-0056
Received 2020-11-30; revised 2021-03-15; accepted 2021-03-16.
1 Introduction
Preventing online tracking is particularly problematic
when considering dual use technologies [1–4]. If a tool
maintains a functional purpose, even if it is not a main-
stream staple of the web, the case for blocking the tool
outright suffers from diminishing returns. The cost of
privacy is functionality, a trade-off few users are
willing to accept [5, 6].
HTML5’s canvas suffers from this trade-off. Orig-
inally developed to allow low-level control over a bit-
*Corresponding Author: Nathan Reitinger: University
of Maryland, E-mail: nlr@umd.edu
Michelle L. Mazurek: University of Maryland, E-mail:
mmazurek@umd.edu
mapped area of a browser window [7], the canvas ele-
ment was soon identified as a boon for fingerprinting, given
its ability to tease out uniqueness between devices while
appearing no different than any other image rendered
on a webpage [8]. This occurs because the same image,
when drawn on the canvas rather than served as a sin-
gle .jpg or .png image file, will render uniquely given a
machine’s idiosyncratic software and hardware charac-
teristics (e.g., variations in font rasterization techniques
like anti-aliasing or hinting, which try to improve ren-
dering quality) [3, 9].
While methods exist to block or spoof use of the
canvas [10], the current state of affairs leaves much to
be desired—over-blocking (e.g., prohibiting JavaScript
entirely [11, 12]) or under-blocking (e.g., blocking web
requests based on a priori knowledge of a tracking pur-
pose [13]) is the norm. This leaves us with the follow-
ing grim outlook: “[I]n the case of canvas fingerprinting,
there is little that can be done to block the practice,
since the element is part of the HTML5 specification
and a feature of many interactive, responsive websites
[14].”
This paper introduces ML-CB, a means of increas-
ing online privacy by blocking only adverse canvas-
based actions. ML-CB leverages the key insight that
the images drawn to the canvas, when used for finger-
printing, have a distinct and repetitive appearance, in
part due to the highly cloned nature of fingerprinting
programs on the web [15]. What is more, by labeling
programs based on the images drawn—but training our
machine learning models on the text generating those
images—we are able to take advantage of a simple label-
ing process combined with a nuanced understanding of
program text. This novel approach provides models that
are less overfit to particular images and more robust—
though not perfect—against obfuscation and minifica-
tion.
In summary, we:
1. Present the canvas fingerprinting dataset, con-
taining over 3,000 distinct canvas images related to
over 150,000 websites which used HTML5’s canvas
element at the time of our roughly half-a-million-
website scrape.1
1Accompanying material available at: https://osf.io/shbe7/.
2. Measure canvas fingerprinting across the web.
3. Introduce ML-CB—a means of using distinguish-
able pictorial information combined with underlying
website source code to produce accurate and robust
classifiers able to discern fingerprinting from non-
fingerprinting canvas-based actions.
2 Background and Related Work
Prominent, in-practice tools driving online tracking can
be categorized at a high level into several buckets, in-
cluding: (1) stateful tracking such as cookies [4]; (2)
browser-based configurations [16, 17]; and (3) the can-
vas [3, 10, 18]. Cookies have been studied for a long
time, are a form of stateful tracking (i.e., in the typical
case, information must be stored client-side), and cre-
ate problems related to consent (i.e., notice and choice
has a long history of well-documented failures) [19–23].
Browser-based configurations, though a form of stateless
tracking (i.e., no information needs to be stored client-
side, allowing the tools to operate covertly), have also
received a lot of attention from the privacy community
[24]; more importantly, browser vendors are in a good
position to address this area through default configura-
tions, like Tor, Brave, Firefox, and Safari have done over
the years [25–28]. Canvas fingerprinting, on the other
hand, is a stateless form of tracking and is a dual-use
tracking vector, making it difficult for a browser vendor
to block without sacrificing functionality. Further, rela-
tively few efforts to describe fingerprinting focus specif-
ically on the canvas. For these reasons, we exclusively
focus on the canvas.
2.1 The Canvas
Canvas fingerprinting originates from a 2012 paper ex-
plaining how images drawn to HTML5’s canvas produce
a high amount of Shannon Entropy [29] given the unique
software and hardware characteristics of user machines
(e.g., browser version, operating system, graphics hard-
ware, and anti-aliasing techniques) [8, 30]. Researchers
had Amazon Mechanical Turk (M-Turk) workers visit a
website surreptitiously hosting a canvas fingerprinting
script, in order to assess participant identifiability. The
following piece of JavaScript code is an updated ver-
sion of what the researchers used, illustrating the core
of canvas fingerprinting.
1  var canvas = document.createElement('canvas');
2  var ctx = canvas.getContext('2d');
3  var txt = 'Cwm fjordbank glyphs vext quiz';
4  ctx.textBaseline = "top";
5  ctx.font = "16px 'Arial'";
6  ctx.textBaseline = "alphabetic";
7  ctx.fillStyle = "#f60";
8  ctx.fillRect(125, 1, 62, 20);
9  ctx.fillStyle = "#069";
10 ctx.fillText(txt, 2, 15);
11 ctx.fillStyle = "rgba(102, 200, 0, 0.7)";
12 ctx.fillText(txt, 4, 17);
13 var strng = canvas.toDataURL();
Lines 1-2 create the canvas element; lines 3-12 add
color and text (fillStyle and fillText, respectively);
and line 13 converts the image into a character string us-
ing toDataURL. In fact, toDataURL is a lynchpin method
for the fingerprinter [31]. This function allows the im-
age drawn to be turned into a base64 encoded string,
which may be hashed and compared with other strings.
If a fingerprinting-compatible image, such as a mix of
colored shapes and a pangram [8], is drawn, the result-
ing hash will have high entropy, leading to identification
of a user (or more accurately, a device, assumed to be
associated with a user [32]) among a set of users [33].
Notably, the amount of entropy varies per drawing, but
colored images and text have been shown to be most
effective [8, 33].
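To make the identification step concrete: the string returned by toDataURL is typically condensed to a short hash that serves as the device identifier. The sketch below is ours, not taken from any fingerprinting library; FNV-1a stands in here for whatever hash function a real script uses (e.g., MurmurHash), and the data URL is a made-up stand-in.

```javascript
// Illustrative only: condense a canvas data URL into a compact
// fingerprint. Two machines whose rasterization differs produce
// different base64 strings, and therefore different hashes.
function fnv1a(str) {
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // FNV prime, keep 32 bits
  }
  return hash.toString(16);
}

// In a real script, dataUrl would come from canvas.toDataURL().
const dataUrl = 'data:image/png;base64,iVBORw0KGgoAAAANS';
const fingerprint = fnv1a(dataUrl);
```

The hash is stable across visits from the same machine but varies across machines, which is what makes it usable as an identifier.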
Following this work, a series of repositories for plug-
and-play fingerprinting popped up [33–35], but it re-
mained unclear whether these mechanisms were being
adopted in the wild. Then, in 2016, Englehardt and
Narayanan [36] conducted a large-scale measurement
study with OpenWPM [37], finding that these types of
fingerprints were indeed widely used. The catch was
that the canvas did not have a solely malicious purpose:
it was used both to track the user and to benefit the
user's web experience. This posed the seminal question:
How can canvas actions used only for device fingerprint-
ing be distinguished and blocked [11, 38, 39]?
2.2 Approaches to Blocking Canvas
Fingerprinting
Generally, three solutions to blocking “bad” canvas ac-
tions exist: (1) block or spoof all canvas actions; (2)
prompt the user for a block–no-block decision; and (3)
use blocklists to block or permit particular attributes,
most commonly specific URLs [40].
Rote Blocking. The first option is most common
among anti-tracker canvas-blocking techniques. Tools
like canvasfingerprintblock [41] simply return an
empty image for all canvas drawings. A similar ap-
proach may be seen in FP-Block [42] where a spoof,
including the FP-Block logo, was added to all canvas
images. Though the researchers did distinguish between
tracking on a main page and cross-domain tracking (pre-
venting third parties from linking users between sites),
the spoofing was nonetheless rote, failing to distinguish
between “fingerprinting” or “non-fingerprinting” canvas
actions. Likewise, PriVaricator [43] adds random noise
to all canvas image output returned from the toDataURL
call (see also [44]). Although the noise is only added
to toDataURL output, this approach was criticized as
identifiable by fingerprinters [45]. Baumann et
al. argue that their DCB tool “transparently” modi-
fies all canvas images and therefore achieves a similar
spoof without sacrificing image quality or identifiabil-
ity [45]. However, DCB also assumes a canvas image of
1,806 pixels; our scrape, discussed below in Section 3,
identified images used for both fingerprinting and non-
fingerprinting that were represented at 16x22 pixels,
making these transparent changes likely noticeable to
the user or adversary. FPRandom [46] approaches the
problem like DCB and PriVaricator, but modifies the
canvas by manipulating underlying browser source code
to return slightly different values on subsequent func-
tion calls. A similar approach is taken by Blink [47],
which uses a variety of configuration options to create
the same type of image inconsistency. UniGL [9] follows
suit by modifying all renderings made with WebGL (in-
teractive 2D and 3D images) to make images uniform
[16]. These last three efforts garner the same criticism
for identifiability in image manipulation [48, 49], and,
by modifying all images, they risk a cost-benefit trade-
off that disfavors adoption through degraded function-
ality [5, 6].
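The noise-based variants above share a common shape. The sketch below is our illustration of the general idea, not code from PriVaricator, DCB, or FPRandom; the perturbation is shown as a pure function over RGBA bytes, with the browser-only toDataURL hook indicated in comments.

```javascript
// Sketch of noise-based canvas spoofing: flip the low bit of the red
// channel of some pixels, so every read of the canvas serializes to a
// slightly different string and hashes become unstable. Visually the
// change is imperceptible, but it applies to benign canvases too.
function addCanvasNoise(rgba) {
  const out = Uint8ClampedArray.from(rgba);
  for (let i = 0; i < out.length; i += 4) {
    if (Math.random() < 0.5) out[i] ^= 1; // red channel, low bit only
  }
  return out;
}

// Hypothetical browser-side hook (API names real, wiring ours):
// const real = HTMLCanvasElement.prototype.toDataURL;
// HTMLCanvasElement.prototype.toDataURL = function (...args) {
//   const ctx = this.getContext('2d');
//   const img = ctx.getImageData(0, 0, this.width, this.height);
//   img.data.set(addCanvasNoise(img.data));
//   ctx.putImageData(img, 0, 0);
//   return real.apply(this, args);
// };
```

As the critiques cited above note, noise of this kind is itself detectable: a fingerprinter can draw the same image twice and compare the two reads.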
User Prompt. The second option for blocking can-
vas actions centers on user-focused control. The canon-
ical example here is Tor’s prompt on all canvas actions,
disabling these images by default and asking the user for
permission to render [50]. A problem with this approach
is the knowledge required to correctly make a
block–no-block decision. Many users, even experienced
ones, would have no basis from which to decide whether
to block or permit a particular canvas action.
Finally, there is a third option: choosing specific in-
stances to block/permit, generally using either heuris-
tics or machine-learning classifiers.
Blocklist—Heuristic. For heuristics, in the sim-
plest case (e.g., Disconnect [51]), a predefined list of
domain names flagged as block-worthy is used. These
lists are difficult to maintain, and do not always accom-
modate changes. For example, the company ForeSee,
at one point in time, used fingerprinting techniques on
.gov TLDs like ftc.gov and state.gov [52, 53]. After re-
ceiving scrutiny for the practice [54], the fingerprint-
ing scripts were removed, but the company remains on
Disconnect’s blocklist [55]. A more advanced tool, FP-
Guard [56], considers a canvas action “suspicious” if it
reads and writes to the canvas, and more suspicious if
the canvas drawing is dynamically created. Yet, in our
crawl, we found examples of these actions used for both
beneficial canvas images as well as tracking-based can-
vas images.
Perhaps the best example of where heuristic-based
blocklists fall short is the popular false positive triggered
on Wordpress’s script meant to test emoji settings [57].
Although the script has a benevolent, user-focused pur-
pose, it is often flagged by heuristics because it acts
like a fingerprinting script, creating the canvas element,
filling it with color and text, and reading the element
back with a call to toDataURL [14, 49, 58, 59]. As we
discuss in our results (Section 4.2), blocklist-based
heuristics for canvas fingerprinting, like the current
state of the art from Englehardt and Narayanan (see
Appendix A) [36], are often highly accurate overall, in
part because they lean toward labeling instances as
non-fingerprinting, which is the majority class, but may
perform less well in correctly identify-
ing instances of fingerprinting [36, 60]. Further, an ad-
versary may adapt to these heuristics and purposefully
avoid or include certain functions to escape classifica-
tion.
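To illustrate why such heuristics both work well on average and admit false positives, the following is our simplified paraphrase of the kind of rule set involved (the authoritative criteria from Englehardt and Narayanan appear in Appendix A; the event-record field names here are hypothetical).

```javascript
// Simplified heuristic sketch: flag a script only if it draws a
// sufficiently large, text-rich image AND reads the canvas back.
function heuristicIsFingerprinting(ev) {
  const bigEnough = ev.width >= 16 && ev.height >= 16;
  const complexText = new Set(ev.textWritten || '').size >= 10; // distinct chars
  const readsBack = ev.calledToDataURL || ev.calledGetImageData;
  return bigEnough && complexText && readsBack;
}

// A pangram-drawing script that calls toDataURL satisfies every
// condition, much as Wordpress's benign emoji test trips similar rules.
const ev = {
  width: 240, height: 60,
  textWritten: 'Cwm fjordbank glyphs vext quiz',
  calledToDataURL: true, calledGetImageData: false,
};
// heuristicIsFingerprinting(ev) → true
```

An adversary who knows the rule set can dodge it (e.g., by splitting the draw and the read across scripts), which is the adaptivity problem noted above.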
Blocklist—Machine Learning. Seeking robust-
ness, another class of research [15, 60–62] opts instead
to use machine learning to make the block–no-block de-
cision. Researchers here start with a set of ground truth
(i.e., labeling programs through manual inspection or
applying heuristic-based rules), but then build on the
ground truth by training machine learning models to
detect fingerprinting programs.
Researchers in [15], on which our work is based, at-
tempted to solve the false positive problem by leveraging
the fact that most tracking scripts are functionally and
structurally similar. Using this key insight, researchers
expert-labeled a set of Selenium-scraped programs, used
a semantic representation of these programs via the
“canonical form” (i.e., a string representation of the pro-
gram which accounts for tf-idf ranked n-grams based
on the program’s data and control flows [63, 64]), and
trained one-class and two-class support vector machines
(SVMs) [65] on the programs. The result, on originally-
scraped data, in the best case, had an accuracy of
99%. Though impressive, researchers took their model
and applied it to an updated set of Selenium-scraped
programs. Here, the model’s accuracy dropped signifi-
cantly, down to 75% when labeling tracking programs
and 81% when labeling functional programs. Moreover,
the researchers acknowledge that this method would
only work with a continually trained and updated SVM,
because unseen programs (either due to obfuscation or
novel functionality) would likely be inaccurately classi-
fied.
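As a rough sketch of the featurization step (our simplification: [15] builds tf-idf-ranked n-grams over a program's data and control flows, whereas this toy operates on plain token streams), tf-idf-weighted n-grams can be computed as follows.

```javascript
// Toy tf-idf over token n-grams: programs sharing cloned fingerprinting
// code share high-weight n-grams, which is what the SVM separates on.
function ngrams(tokens, n) {
  const grams = [];
  for (let i = 0; i + n <= tokens.length; i++) {
    grams.push(tokens.slice(i, i + n).join(' '));
  }
  return grams;
}

function tfidf(docs, n) {
  const docGrams = docs.map((tokens) => ngrams(tokens, n));
  const df = new Map(); // document frequency per n-gram
  for (const grams of docGrams) {
    for (const g of new Set(grams)) df.set(g, (df.get(g) || 0) + 1);
  }
  const N = docGrams.length;
  return docGrams.map((grams) => {
    const tf = new Map();
    for (const g of grams) tf.set(g, (tf.get(g) || 0) + 1);
    const vec = new Map();
    for (const [g, count] of tf) {
      vec.set(g, (count / grams.length) * Math.log(N / df.get(g)));
    }
    return vec;
  });
}
```

N-grams present in every document get weight zero, while n-grams peculiar to cloned tracking code are up-weighted.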
Overcoming some of these issues, FPInspector [60]
(see also [61] which takes a similar approach, but uses
an SVM and random forest as models) uses machine
learning built on a heuristic-based understanding [36]
of fingerprinting programs. Researchers applied static
analysis (i.e., text-based abstract syntax trees [66, 67]
mined for features using keywords commonly associ-
ated with fingerprinting APIs) and dynamic analysis
(i.e., statistics-based features like function call counts
and attributes of particular APIs like height and width
of a canvas image) to generate source-code based fea-
tures, and then used a decision tree [68] to split fea-
tures on highest information gain [69] for classifica-
tion. FPInspector proved accurate (i.e., 99% accuracy,
with 93% precision and recall) on a manually cre-
ated dataset2 with manual-inspection retraining for im-
proved ground truth. Interestingly, researchers in FPIn-
spector noted how toDataURL was a watershed func-
tion for overall fingerprinting classification, citing it as
one of two features with the most information gain
(getSupportedExtensions was the second). FPInspec-
tor uses Englehardt and Narayanan’s heuristic list (Ap-
pendix A) as ground truth to classify canvas fingerprint-
ing [36], which as we note above works well for non-
fingerprinting examples, but not as well on fingerprint-
ing examples (see Section 4.2).
Our work uses a similar approach to [15, 60], but
leverages manual classification of images instead of a
heuristic-based ground truth. We also take advantage of
program representation specifically aimed at JavaScript
through our use of jsNice, helping alleviate problems
related to classifying minified or obfuscated text. The
following section outlines the architecture behind ML-
CB.
2 As is the case with our research, manual labeling is necessary
given the lack of an existing “fingerprinting” dataset (e.g., Dis-
connect’s list is by domain and not by program, and although
other datasets have been released with fingerprinting examples
[36], these are not updated frequently).
3 Architecture
Our goal is to build a classifier that can distinguish be-
tween “non-fingerprinting” and “fingerprinting” canvas
actions. To that end, we first generate a labeled dataset
to be used for training and testing. The following sec-
tions (see Figure 1 for an overview) describe our nearly
half-million-website scrape (3.1), resulting dataset (3.2),
and labeling process (3.3). We then discuss canvas’s use
in the wild (3.4) before describing our fetching of web-
site source code (3.5) and the machine learning models
we used to train our classifiers (3.6).
3.1 Scrape
In order to classify the underlying programs driving can-
vas fingerprinting with supervised machine learning, a
dataset of programs and labels is needed. The programs
should be those used for both fingerprinting and non-
fingerprinting purposes. To gather this data, we scraped
the web in August, 2018. The scrape took nearly one
week to complete.
We used Selenium, a popular Python web-scraping
tool, and crawled 484,463 websites in total (limited due
to budget constraints) [70]. We visited websites listed on
Alexa Top Sites, using the Alexa Top Sites API, ordered
by Alexa Traffic Rank [71]. To maintain efficiency, we
parallelized this process with 24 non-headless (for cap-
turing screenshots) Chrome browsers.
Each Chrome browser was driven to a targeted
website’s landing page. No additional gestures, such as
scrolling down or moving the mouse, were used. Ad-
ditionally, no sub-pages were visited,3 and we set no
“pause” between websites. As soon as the browser was
finished loading the landing page, it could move on to
the next website in the targeted URLs list.4
The browsers included a custom extension which
pre-loaded the targeted website’s landing page and
3 This design choice was made to identify a lower bound on
canvas fingerprinting, when even minimal interaction with the
website would garner a fingerprint for the user.
4 Although other researchers have noticed an increase in the
number of non-functional files (i.e., JavaScript programs) initi-
ated after waiting for a few seconds on each page [15], we did not
notice a change in the files themselves when comparing calls to
toDataURL. Waiting a few seconds may have increased the num-
ber of times a file was called, but typically no ‘new’ files were
called within the waiting period. Therefore, we did not add a
pause on each landing page.
Fig. 1. ML-CB. We scrape (3.1) the web with a custom Chrome extension, adding a hook on the toDataURL function. The hook up-
dates the database (3.2), storing canvas-related information. Canvas images, along with supporting material, are visually inspected
for labeling (3.3). Website source code (3.5) associated with the images is then fetched and stored in either plaintext or jsNiceified
form. Machine learning models (3.6) are trained on both images and text.
its associated HTTP links by adding the lis-
tener onBeforeRequest.addListener [72]. All requests
(i.e., HTTP or HTTPS links) were parsed using
XMLHttpRequest [73]. We term the linked content of a
request to be a ‘file’ (i.e., either an HTML document
with in-DOM JavaScript or a standalone JavaScript
program with a .js file extension).
We scanned each file for the toDataURL function5
based on a plaintext string match with the term to-
DataURL.6 Upon a successful match, we added the file's
information to our database of toDataURL occurrences.
We also altered the file itself by appending a JavaScript
prototype that modified the default toDataURL function,
by requiring it to capture additional data about how it
was used: the initiating domain (i.e., top-level initia-
tor associated with the function’s use), the requested
URL (i.e., origin_url), and a screenshot of the page.
When triggered by the execution of toDataURL, the pro-
totype updated the entry in our database to show that
toDataURL had actually been used, and to store the ad-
ditional data. All database updates were formatted as
SQL query strings, automatically downloaded by the
browser as a .sql file. In this way, our scrape analyzed
the use of toDataURL dynamically, unlike other stud-
ies which analyzed webpage text post-scrape, in a static
5 Although several functions may be used to obtain a base-
64 encoded string of the canvas image (e.g., toDataURL or
getImageData), we focused exclusively on toDataURL. toDataURL
is the most common function used in canvas fingerprinting [34]
and this limitation is a common practice in the literature [60].
A similar procedure could be followed with other functions.
6 If the function's name were obfuscated, our hook would be un-
successful, giving us a lower bound on use of toDataURL. Anec-
dotally, we often found that even when a file was noticeably ob-
fuscated, toDataURL could be found as a plaintext string.
fashion [36]. As the crawl evolved, our Postgres database
[74] filled with canvas information.
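The prototype modification described above can be sketched as follows. This is our reconstruction, not the extension's actual source; the report callback stands in for the SQL-string download, and the fields shown are a subset of what the extension stored.

```javascript
// Sketch of the scraper's hook: wrap toDataURL so each execution logs
// which page triggered it before returning the real result.
function instrumentToDataURL(proto, report) {
  const original = proto.toDataURL;
  proto.toDataURL = function (...args) {
    report({
      // In a browser this would be the top-level initiator; outside a
      // browser we fall back to a placeholder.
      initiator: typeof document !== 'undefined' ? document.location.host : 'unknown',
      calledAt: Date.now(),
    });
    return original.apply(this, args);
  };
}

// Stand-in object in place of HTMLCanvasElement.prototype:
const fakeProto = { toDataURL() { return 'data:image/png;base64,AAA'; } };
const log = [];
instrumentToDataURL(fakeProto, (entry) => log.push(entry));
```

Because the wrapper fires only when toDataURL actually executes, the database distinguishes static occurrences of the string from dynamic use.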
3.2 Dataset
We structured the resulting dataset around
<initiator, origin_url> primary key pairs. This is
because a single domain (i.e., initiator) may call several
files (i.e., origin_urls) which serve various purposes,
and, ultimately, we want to block or permit a specific
file from engaging in canvas fingerprinting. Notably,
a single file may also draw several canvas images. We
handle this case by storing each image (in base64 en-
coded form) in an array associated with the initiator
and origin_url pair. If any of the images in the array are
flagged as canvas fingerprinting, the origin_url will be
labeled as fingerprinting. Finally, it is notable that the
dataset may include more than JavaScript programs.
This is because an initiator may interleave script tags
with HTML and CSS.
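A minimal sketch of this keying scheme (ours; the actual store is a Postgres database) is:

```javascript
// One record per <initiator, origin_url> pair; each record accumulates
// every base64-encoded canvas image that file drew on that site.
const db = new Map();
function recordImage(initiator, originUrl, imageB64) {
  const key = `${initiator}\u0000${originUrl}`; // composite primary key
  if (!db.has(key)) db.set(key, { initiator, originUrl, images: [] });
  db.get(key).images.push(imageB64);
}
```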
Out of the 484,463 websites targeted, 197,918
unique initiators (i.e., domains) were inserted into the
dataset, representing those websites with the string
toDataURL located either in the DOM or in a standalone
JavaScript file. This means that almost half of the URLs
targeted called a file that used the function toDataURL,
although this number is imprecise because some of the
targeted websites resulted in time-outs or, for example,
experienced implementation bugs such that toDataURL
did not execute (meaning that our extension did not log
the image to be drawn by the canvas, but did log the
static use of toDataURL).
In total, 402,540 canvas images were captured by
the scrape; these included 3,517 distinct images with
a unique base64 encoding. These images occurred in
the DOM of 103,040 files and in 177,108 standalone
JavaScript files. This demonstrates that many canvas
images are repeated across several domains. A further
analysis of the canvas’s use in the wild follows our dis-
cussion of labeling these images.
3.3 Labeling
We labeled each image according to its use for canvas
fingerprinting, true or false. Our key insight is that la-
beling may be done with high accuracy by looking at
the images themselves, rather than the corresponding
program text. For example, here is a drawing from the
dataset appearing to have no fingerprinting use:

[canvas image]

And here is a drawing typically occurring in fingerprint-
ing programs:

[canvas image]
To confirm this insight, we ran a mini-scrape and
reviewed the images produced by toDataURL. We found
that most variations of the images used for fingerprint-
ing conform to a similar pattern (e.g., the use of text,
one or more colored shapes, and the use of emojis). In
fact, many of these patterns come directly from popular
repositories providing examples of canvas fingerprinting
[34, 75] and are embarrassingly identifiable.
From the mini-scrape, we also found close cases,
where we were unsure of the appropriate label based
only on the image drawn. For example, Facebook’s use
of layered emojis (see Figure C (A.6) in the Appendix)
did not, at the time, appear visually similar to other
images used for fingerprinting. Although we might hy-
pothesize a purpose based on the domain alone, we had
some manner of doubt regarding the image’s purpose—
we would later learn that Facebook was an early adopter
of Picasso-styled canvas fingerprints [33], which were fre-
quently seen in our 2020 follow-up scrape (discussed in
Section 4.2).
To resolve close cases like this at scale, we fol-
low three steps. First, we inspect the image for recog-
nizable similarity to known canvas fingerprinting im-
ages. Second, if we have any doubt regarding the pur-
pose of the image, we next look at the screenshot as-
sociated with the initiator. From this, we can gener-
ally tell if the image drawn is present on the web-
site’s landing page; a common use-case here is us-
ing the canvas to draw a website’s logo. Third, if we
are unable to see the image on the website’s land-
ing page, we next look at the file (i.e., origin_url)
that drew the image. This approach is telling because
website operators, we found, did not commonly take
steps to hide the purpose of certain functions, instead
using names like function generateFingerprint()
or headers like Fingerprintjs2 1.5.0 - Modern &
flexible browser fingerprint library v2. Finally,
if we were still unsure as to the purpose of the image,
we classify the image as non-fingerprinting. We erred on
the side of non-fingerprinting to ensure that our classi-
fiers are using the most-likely, best-possible evidence—
images that are most commonly associated with finger-
printing.
With these three steps in mind, we set out to label
the entire dataset. A single researcher reviewed 3,517
distinct canvas images for classification. After the la-
bels had been made, the same researcher re-reviewed
all images labeled false (16 images reclassified) and all
images labeled true (four images were reclassified from
true to false). In the end, 285 distinct, positive finger-
printing images were stored in the dataset, along with
3,252 distinct, negative, non-fingerprinting images.
We note that we needed to make a design decision
for mixed-purpose files. These occur in the dataset be-
cause we generated labels based on images, but pri-
marily train at the granularity of a file—and a single
file (i.e., origin_url) could produce multiple images. We
consider a file “true” if it drew any image used for fin-
gerprinting and “false” if it did not draw any images
used for fingerprinting. In this way, the total, distinct
images labeled as true or false in our dataset is 3,537,
which is 20 more than the number reviewed by the ex-
pert. These 20 images relate to the case where the image
has a ground-truth label of false, but because it is some-
times shared with a script that draws a “true” image, it
is also labeled true in our dataset.
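The rule above can be stated compactly in code (our sketch, with a hypothetical record shape):

```javascript
// A file is labeled true if any image it drew is fingerprinting; an
// image then carries, in the dataset, the label of each file that drew
// it, which is how 20 ground-truth-false images also appear as true.
function labelFiles(files) {
  // files: [{ url, images: [{ b64, label }] }]
  const fileLabels = new Map();
  const imageDatasetLabels = new Map(); // b64 -> Set of file labels seen
  for (const f of files) {
    const isFp = f.images.some((img) => img.label);
    fileLabels.set(f.url, isFp);
    for (const img of f.images) {
      if (!imageDatasetLabels.has(img.b64)) imageDatasetLabels.set(img.b64, new Set());
      imageDatasetLabels.get(img.b64).add(isFp);
    }
  }
  return { fileLabels, imageDatasetLabels };
}
```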
3.4 Canvas in the Wild
Combining labels with the data from our main scrape
allows us to assess the state of canvas fingerprinting on
the web (circa 2018). Using this perspective, we learn
that canvas images using toDataURL are: (1) not pre-
dominantly used for fingerprinting, though their use for
fingerprinting is rising; (2) repetitive and shared across
domains; and (3) used consistently across website pop-
(A) All Images: 95,509 (23.7%); 75,044 (18.6%);
71,688 (17.8%); 62,867 (15.6%); 19,983 (5.0%)
(B) Fingerprinting: 8,946 (2.2%); 5,288 (1.3%);
4,434 (1.1%); 2,214 (0.6%); 1,226 (0.3%)
[canvas image thumbnails omitted]
Table 1. Most common canvas images (by highest count) out
of the sum of all canvas images drawn by initiators. The left (A)
considers the full dataset, while the right (B) considers only those
images labeled as fingerprinting.
Origin URL (/FP)
cdn.doubleverify.com/dvbs_src_internal62.js 2,301 (10.7%)
rtbcdn.doubleverify.com/bsredirect5_internal40.js 1,746 (8.1%)
cdn.justuno.com/mwgt_4.1.js?v=1.56 1,165 (5.4%)
d9.flashtalking.com/d9core 741 (3.4%)
cdn.siftscience.com/s.js 724 (3.4%)
Table 2. Top five fingerprinting files representing the most com-
monly occurring origin_urls in the dataset. Counts show how
often the file was executed by an initiator (% of all 21,395 files
used for fingerprinting). Since the scrape, some of the URLs have
been modified; prior versions can often be found at the Internet
Archive Wayback Machine (see, e.g., [76]).
ularity, but pooled, in terms of fingerprinting, around
content related to news, businesses, and cooking. We
group these impressions around how the canvas is used
and who is using it.
Similar results may be shown for the files respon-
sible for drawing these images. Websites relied on a
total of 280,148 files to draw canvas images (i.e., sum
of the distinct origin_urls used per website). Of these,
21,395 leveraged canvas fingerprinting. Table 2 shows
the most common files used for canvas fingerprinting,
and the count of how many times each file was used
by a website. As shown in Englehardt and Narayanan’s
work [36], the number one provider of files relying on
canvas fingerprinting is doubleverify, though the rest of
the providers represent a mixed bag, with newer players
like justuno and siftscience being introduced.
From this view, we can verify the highly-cloned na-
ture of canvas images. The most common image in our
dataset is used on more than 95,000 websites. And the
average number of times a single image is re-used across
websites was 114; that number raises slightly when con-
Rank Interval toDataURL Fingerprinting
[1,100) 56% 27%
[100,1K) 40% 18%
[1K,10K) 36% 11%
[10K,100K) 36% 5%
[100K,400K] 39% 3%
Table 3. Percentage of websites in different rank intervals, or-
dered by Alexa Top Rank, using toDataURL in any way or for
canvas fingerprinting. Both uses become less common as website
rank decreases. toDataURL is commonly used for purposes other
than fingerprinting.
sidering only fingerprinting images, to 125. What this
means is that use of the canvas is becoming more com-
mon on the web, though the most likely use-case is to
draw emojis rather than a bespoke logo or other image.
Who Uses the Canvas. When considering web-
site popularity, based on Alexa Top Rank at the time of
our scrape, we see that use of the canvas is not isolated
to top-ranked websites. Figure 2 provides a close-up
view of the top 50 websites and their use of the canvas.
More than half of these websites (30) used toDataURL,
and over a quarter (16) used toDataURL for canvas fin-
gerprinting.7 This view may be expanded to the entire
dataset (Table 3), showing relative consistency despite
rank. These results show that use of toDataURL does not
always equate with a fingerprinting purpose. Less than
half of the websites that used toDataURL used it for fin-
gerprinting. Similar to previous work, we also find that
use of the canvas for fingerprinting is increasing, from
4.93% in 2014, to 7% in 2016 (with sub-pages included),
to 11% at the time of our scrape [3, 36].
Finally, we wanted to assess the category of websites
fingerprinting programs may be associated with. To do
this, we fetched URLs found in our dataset and used the
website’s landing page text as input to Google’s Natu-
ral Language Processing API, which assigns text into
categories roughly mirroring Google’s AdWords buck-
ets (620 categories in total) [78]. The fetch occurred
in early September, 2020. We note that the process of
assigning website text into categories likely creates mis-
7 We consider an initiator (e.g., google.com) to be “in” the
dataset based on a loose string match (i.e., domain or sub-
domain, using the Python package tld [77]) between the Alexa
Top Ranked URL and the URL found in our dataset. For ex-
ample, using the URL from Alexa, herokuapp.com, we would
consider the following URL from our dataset to be a match,
word-to-markdown.herokuapp.com.
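The footnote's loose match can be sketched as follows. The paper uses the Python `tld` package; this stdlib-only approximation (which does not handle multi-part public suffixes like co.uk) is an assumption for illustration.

```python
# Loose string match between an Alexa-ranked domain and a URL found in
# the dataset: true when the URL's host is the domain itself or one of
# its subdomains. Simplified stand-in for the tld-package matching.
from urllib.parse import urlparse

def loose_match(alexa_domain: str, dataset_url: str) -> bool:
    """True if dataset_url's host is alexa_domain or a subdomain of it."""
    if "//" not in dataset_url:
        dataset_url = "//" + dataset_url  # let urlparse find the host
    host = urlparse(dataset_url).netloc
    return host == alexa_domain or host.endswith("." + alexa_domain)
```

With the footnote's example, `loose_match("herokuapp.com", "word-to-markdown.herokuapp.com")` matches.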
Fig. 2. How the top 50 websites (according to Alexa Top Sites at the time of the scrape) use toDataURL with the canvas: Counts
(y axis) describe the number of distinct images drawn by associated domains (i.e., including domain and subdomain matching) in
files captured by our script, both total (gray) and labeled as fingerprinting (black). The domain is noted along the x axis. Websites are
ranked from left (highest) to right (lowest).
Fig. 3. Distinct initiators from the canvas fingerprinting dataset
categorized using Google Natural Language categories. Counts
show the number of websites using toDataURL for fingerprinting
(black) and not for fingerprinting (gray). Raw counts are provided
because a single website can pertain to multiple categories (on
average 1.5 categories per website). Counts are influenced by
overall distribution of Alexa Top Site rankings.
classifications, and it is likely that website landing page
text changed from the time of our original scrape to
the time of this categorization. On the whole, however,
the top ten categories we identified (Figure 3) reinforce
findings from prior work [36, 60].
We found that a large number of fingerprinting pro-
grams were associated with News and Shopping web-
sites (shopping shows up three separate times, includ-
ing subcategories like vehicles and apparel), food recipe
websites like therealfoodrds.com or ambitiouskitchen.
com, and business-related sites, like retreaver.com or
ironmartonline.com. This reinforces prior work showing
that news websites were more likely to use fingerprint-
ing, with additional fine-grained categories provided by
Google Natural Language.
3.5 Website Source Code
At this point in ML-CB’s architecture, we have collected
a dataset of hyperlinks (i.e., origin_urls), their asso-
ciated canvas drawings, and their fingerprinting–non-
fingerprinting labels. If we intended to train classifiers
with these images alone, then we could move on to train-
ing. Instead, we obtain JavaScript source code for each
canvas drawing, and use that as the basis for our classi-
fier. In the next sections, we explain why images alone
are not sufficient, describe how we pre-process source
code used in our classifiers, and provide an overview of
the models we used.
3.5.1 Why Not Use Images
Models trained only on canvas drawings will be sus-
ceptible to unseen or easily-modified images, similar
to the limitations expressed in [15]. For instance, sup-
pose fingerprinting scripts started to use the cat image
shown in Section 3.3 instead of the common overlapping
circles plus the pangram Cwm fjordbank glyphs vext
quiz (i.e., the fjord pangram). It is plausible that the
difference in entropy would be small enough that the
image may still be useful for fingerprinting, but this
would significantly impede an image-based classifier. In-
deed, essentially all of the fingerprinting examples in our
dataset include text, so images without text would be
very likely to fool a classifier trained with this sample [8].
Using the fingerprinting program’s text circumvents
this problem because, although it may be somewhat
easy to change a fingerprinting image drawn to the
canvas, it will be harder to change the textual proper-
ties generating that image [79]. Programs fingerprinting
with the canvas element, in mostly the same fashion,
create the canvas element, fill it with colors and text,
and rely on toDataURL as a final step—an identifiable
pattern used over and over. This can be seen in Section
2.2’s example fingerprinting technique.
In short, using images alone for classification leaves
too much room for an adversary to easily swap images,
while using program text alone is too cumbersome when
generating labels at scale. We take a best-of-both-worlds
approach, leveraging a fast labeling process—labelling
thousands of canvas actions resulting in the same im-
age, though appearing textually heterogeneous, in one
step—while still training on less malleable source code.
The following subsection discusses how we obtained
program text from our original dataset, and how we then
processed this text with jsNice [80]. For an example of
how the resulting text appears in our text corpora, see
Figure 7 in Appendix D. Following this subsection, we
discuss the models we used and their respective archi-
tectures, before turning to our results.
3.5.2 Text Corpora
We fetched program text associated with each of the
origin_urls in our dataset using the Python requests
package [81]. To mitigate the effects of potential
JavaScript obfuscation, we apply jsNice to achieve a se-
mantic representation of program text (using the com-
mand line tool interface to jsNice [82]). We offer a brief
background on jsNice and then discuss our resulting
text-based corpora.
jsNice. jsNice deobfuscates JavaScript text by us-
ing machine learning to predict names of identifiers and
annotations of variables (see Appendix, Figure 7 (B) for
an example). The tool works by representing programs
in a dependency network which allows for the use of
probabilistic graphical models (e.g., a conditional ran-
dom field) to make structured predictions on program
properties [80, 82, 83]. This works well for our purpose
because jsNice adds natural language annotations, re-
gardless of obfuscation and minification, which may be
leveraged by our classifiers.
Corpora. We created two program-text corpora,
one plaintext and one jsNiceified.8 These corpora in-
herited labels from our image labeling phase, creating
<program, label> rows.
We first pre-processed the plaintext corpus, drop-
ping 20 rows for empty values, 1,502 rows as dupli-
cates (e.g., dropping programs based on an exact string
match, with 1,316 negative duplicates and 186 positive
duplicates), and 124 programs for having a length of less
than 100 characters (i.e., 113 negatively labeled pro-
grams and 11 positively labeled programs). The final
plaintext corpus included 84,855 total programs, with
2,601 positively labeled programs and 82,254 negatively
labeled programs. We picked the character limit of 100
based on manual inspection of these programs, most of
which were HTML error pages (500 or 404). The av-
erage character length per program was 106,036, while
the maximum character length was 11,838,887.
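The filtering steps just described can be sketched as follows; the row format of `(program, label)` pairs is an assumption for illustration.

```python
# Minimal sketch of the corpus pre-processing described above: drop
# empty rows, exact-string duplicates, and very short programs
# (< 100 characters, which were mostly HTML error pages).
def preprocess(rows, min_len=100):
    seen, kept = set(), []
    for program, label in rows:
        if not program:             # drop empty values
            continue
        if program in seen:         # drop exact-string duplicates
            continue
        seen.add(program)
        if len(program) < min_len:  # drop short programs (error pages)
            continue
        kept.append((program, label))
    return kept
```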
In pre-processing the jsNiceified corpus, we re-
moved 72 empty rows, 1,806 duplicates (1,467 posi-
tively labeled programs and 339 negatively labeled pro-
grams), and 115 programs with less than 100 char-
acters (103 negatively labeled programs and 12 posi-
tively labeled programs), resulting in a final corpus of
84,735 programs. This included 82,359 negative exam-
ples and 2,376 positive examples. The average character
length per program was 111,041, with a maximum of
11,819,497.
To quickly assess our resulting dataset, we took
all positive examples from our plaintext corpus, tok-
enized each program, and compared the Jaccard similarity, |program1 ∩ program2| / |program1 ∪ program2|, using pairwise matching
[84, 85]. The mean similarity score was .40, with 25, 50,
and 75% of the data represented as similarity scores of
.22, .38, and .58, respectively (Figure 4). This validates
prior work suggesting that a large amount of canvas fin-
gerprinting programs are cloned across the web [15].
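The pairwise comparison above can be sketched as follows; the whitespace tokenizer is a simplifying assumption standing in for the actual tokenization.

```python
# Jaccard similarity over tokenized programs, as used above:
# |tokens(p1) ∩ tokens(p2)| / |tokens(p1) ∪ tokens(p2)|.
def jaccard(p1: str, p2: str) -> float:
    a, b = set(p1.split()), set(p2.split())
    return len(a & b) / len(a | b) if (a | b) else 1.0
```

Identical programs score 1 (the diagonal in Figure 4); disjoint token sets score 0.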
Lastly, it is worth noting a possible discrepancy in
our text-based data. Our original scrape did not store
program text, only labels and origin_url links to pro-
gram text, and we first labeled data prior to fetching
8 As stated in Section 3.3, websites may either call toDataURL
in a standalone JavaScript file or use interleaved script tags in
an HTML file. This allows for the possibility of classification at
the per-script level, breaking each script tag into its own mini-
example for training and testing. We tested this approach, but
found it had a high tendency to overfit our data. Therefore, we
did not move forward with this approach.
Fig. 4. Jaccard similarity between all tokenized programs (i.e.,
files) manually labeled as fingerprinting. Programs along the diag-
onal are compared against themselves (i.e., a Jaccard score of 1).
Lighter colors suggest more similarity.
website source code. As a result, almost two months
had passed from the time of our original scrape to the
time our fetching was complete. This means that a web-
site could have changed a particular JavaScript program
from the time when the scrape occurred to the time the
program text was downloaded. Although possible, this
is unlikely, as previous research has shown that website
source code has a half-life of two or more years, and
JavaScript tends to change even more slowly [86–90].
3.6 Machine Learning Models
We trained four types of machine learning models: a
convolutional neural network (CNN), a support vector
machine (SVM), a Bag-of-Words (BoW) model, and an
embedding model using a pre-trained GloVe embedding
[91].
We use the CNN for an initial reasonableness check
on our classifications; this model was trained only on
the images drawn to the canvas. While the CNN did
produce accurate classifications, its robustness is ques-
tionable for the reasons stated in Section 3.5.1, and con-
firmed in our tests conducted with follow-up data (Sec-
tion 4.2). The SVM was picked for its efficiency, and
because it was shown to be effective in [15]. We hy-
pothesized, based on the limitations of [15], that this
model would not adapt to the slowly changing meth-
ods used for canvas fingerprinting and would need to
be updated frequently. Finally, we used a BoW and
embedding model to assess the differences between our
plaintext corpus (HTML, CSS, and JavaScript) and js-
Niceified corpus (more likely to include natural language
generated by jsNice). We trained the SVM, BoW, and
embedding models on both the plaintext and jsNiceified
corpora. Each model's architecture is discussed in
turn.
3.6.1 Images
A standard CNN was used to classify images from our
dataset. The CNN relied on ResNet50, a 50-layer CNN
pretrained on the ImageNet dataset [92, 93]. Consider-
ing raw images, we used a 20% test-train split for hold-
out cross validation (i.e., training with 2,823 examples,
2,691 false and 132 true, then testing on 707 examples,
674 false and 33 true). All images were reduced to
a size of (3, 150, 150). Six non-fingerprinting (false)
images were unreadable by Pillow [94] and one false im-
age erred during conversion, resulting in a slight differ-
ence in total images between our dataset and this model.
Because of the highly one-sided distribution of positive
examples to negative examples, we also used data
augmentation in the form of image manipulation (horizon-
tal, vertical, and 90-degree rotations, along with image
“squishing” [95]). We trained the model for three epochs,
using a one-cycle policy [12] and discriminative layer
training, with the initial layers trained at a learning rate
of 0.0001 and the ending layers trained at 0.001 [96].
3.6.2 Text
For our three text-based models (i.e., SVM, BoW, and
embedding), we followed the preprocessing steps out-
lined in Section 3.5.2. Given the skewed distribution
of negative to positive examples, we also downsampled
the majority class to meet the minority class’s exam-
ple count. To do this, we used all positive fingerprint-
ing examples and randomly selected negative examples
with sklearn’s resample function, which generates n
random samples from a provided set. We used Strati-
fied K-Fold (SKF) cross-validation (ten folds) on each
model to ensure an accurate balance between testing
and training [60, 97]. For a performance metric, we rely
on the F1 score in each of the models, though we report
the accuracy, precision, and recall as well. The F1 score
(harmonic mean of precision and recall) is a more realis-
tic measure of performance than accuracy, and is often
used in natural language processing [98]. Finally, to ensure consistent performance despite downsampling, we
repeated the above process five times, storing the aver-
age of the ten folds’ scores per loop and averaging the
five loops to produce a final score.
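The downsampling and cross-validation protocol above can be sketched with scikit-learn; the placeholder classifier (a DummyClassifier) and toy array shapes are assumptions, standing in for the actual text models.

```python
# Sketch of the evaluation protocol: downsample the majority (negative)
# class with sklearn's resample to match the positive-class count, run
# Stratified K-Fold cross-validation, average F1 over folds, and repeat
# several times, averaging again for a final score.
import numpy as np
from sklearn.utils import resample
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import f1_score
from sklearn.dummy import DummyClassifier  # placeholder model

def evaluate(X_pos, X_neg, n_folds=10, n_repeats=5, seed=0):
    rng = np.random.RandomState(seed)
    loop_scores = []
    for _ in range(n_repeats):
        # n random negative samples to match the minority-class count
        X_neg_down = resample(X_neg, n_samples=len(X_pos),
                              replace=False, random_state=rng)
        X = np.vstack([X_pos, X_neg_down])
        y = np.array([1] * len(X_pos) + [0] * len(X_neg_down))
        fold_scores = []
        skf = StratifiedKFold(n_splits=n_folds, shuffle=True,
                              random_state=rng)
        for train, test in skf.split(X, y):
            clf = DummyClassifier(strategy="stratified", random_state=0)
            clf.fit(X[train], y[train])
            fold_scores.append(f1_score(y[test], clf.predict(X[test])))
        loop_scores.append(np.mean(fold_scores))  # average of the folds
    return float(np.mean(loop_scores))            # average over the loops
```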
Regarding the architecture of specific models, the
SVM’s feature extraction used a pipeline of count vec-
torization, to tokenize and count occurrence values;
term frequency times inverse document frequency, to
turn the occurrence matrix into a normalized frequency
distribution; and stochastic gradient descent with a
hinge loss parameter, for a linear classification. We used
the l2 penalty for regularization, with an alpha value of
0.001 setting the penalty's strength. This helps reduce overfit-
ting and increase generalization [99].
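The SVM pipeline just described maps directly onto scikit-learn components; a minimal sketch, with hyper-parameters as stated above:

```python
# Feature-extraction pipeline for the SVM: count vectorization to
# tokenize and count occurrences, tf-idf to normalize the occurrence
# matrix, and SGD with hinge loss (a linear SVM), l2 penalty, alpha=0.001.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier

svm = Pipeline([
    ("vect", CountVectorizer()),           # tokenize, count occurrences
    ("tfidf", TfidfTransformer()),         # normalized frequency distribution
    ("clf", SGDClassifier(loss="hinge",    # hinge loss -> linear SVM
                          penalty="l2",
                          alpha=0.001)),
])
```

In the paper's setting this would be fit on the downsampled program-text corpus: `svm.fit(programs, labels)`, then `svm.predict(new_programs)`.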
For the BoW model [100], we tokenized text using
the keras tokenizer, which vectorizes a text corpus; in
this case, the set of programs from our training class.
This produces a set of integers indexed to a dictionary.
We used a limit of the 100,000 most common words and
then transformed the tokenized programs into a ma-
trix. Our keras sequential model contained a dense layer
with a rectified linear unit as an activation, a dropout
layer of 10%, and a final dense layer using the Soft-
max function as an activation. The model was compiled
with the Adam optimizer (for simple and efficient opti-
mization [101]), used categorical cross-entropy for loss
(given the principled approach in its use of maximum
likelihood estimates [102]), was fit on the training data
for ten epochs, and evaluated with our F1 performance
metric on testing data.
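The BoW architecture just described can be sketched with tf.keras. This is an illustrative version, not the paper's implementation: the tiny vocabulary, the toy count-matrix builder (standing in for the keras tokenizer, which the paper caps at the 100,000 most common words), and the layer width are assumptions.

```python
# Bag-of-Words sketch: programs -> token-count matrix -> dense (ReLU),
# 10% dropout, and a final softmax layer, compiled with Adam and
# categorical cross-entropy as described above.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, Dropout
from tensorflow.keras.utils import to_categorical

def count_matrix(texts, vocab):
    # Stand-in for the keras tokenizer: map programs to a count matrix.
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(texts), len(vocab)))
    for row, text in enumerate(texts):
        for word in text.lower().split():
            if word in index:
                X[row, index[word]] += 1
    return X

def build_bow(num_features):
    model = Sequential([
        Input(shape=(num_features,)),
        Dense(64, activation="relu"),    # dense layer, ReLU activation
        Dropout(0.1),                    # dropout layer of 10%
        Dense(2, activation="softmax"),  # final dense layer, softmax
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy")
    return model
```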
The embedding model used the same tokenization
technique as the BoW model, but created a sequence
of integers instead of a matrix for text program rep-
resentation. The sequences were prepended with 0’s to
ensure each integer-as-program sequence was the same
length as the longest sequence (maximum length set
to 1,000). A keras sequential model was used, start-
ing with an embedding layer. We used the popular, pre-
defined GloVe embedding matrix (glove.42B.300d) be-
cause we wanted to assess the contrast between plain-
text and jsNiceified programs [91]. We allowed the em-
bedding’s weights to be updated during training. The
model then used a one dimensional convolution, fol-
lowed by a maxpooling layer, to reduce overfitting and
generalize the model [103]. The next layer in the model
was a Long Short Term Memory (LSTM) recurrent neu-
ral network [104]. An LSTM is particularly suited
for this purpose given the long memory of the network,
as opposed to a typical recurrent neural network. Our
keras model ended with a dense layer using the Soft-
max activation. We compiled with the Adam optimizer,
used binary cross-entropy for loss, and trained for ten
epochs.
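The embedding model's layer stack can be sketched with tf.keras as follows. This is a minimal sketch, not the paper's implementation: the embedding here is randomly initialized (the paper loads the pre-trained glove.42B.300d weights and lets them update during training), and the filter, pooling, and unit sizes are assumptions.

```python
# Embedding-model sketch: padded integer sequences feed an embedding
# layer, then a 1-D convolution, max pooling (to reduce overfitting),
# an LSTM, and a softmax output; compiled with Adam and binary
# cross-entropy as described above.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Input, Embedding, Conv1D,
                                     MaxPooling1D, LSTM, Dense)

MAX_LEN = 1000  # sequences are left-padded with 0's to this length

def build_embedding_model(vocab_size, embed_dim=300):
    model = Sequential([
        Input(shape=(MAX_LEN,)),
        Embedding(vocab_size, embed_dim),  # GloVe weights in the paper
        Conv1D(64, 5, activation="relu"),  # one-dimensional convolution
        MaxPooling1D(4),                   # maxpooling layer
        LSTM(64),                          # long short-term memory layer
        Dense(2, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model
```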
4 Results
To assess our machine learning models, we evaluate
them using: (1) data produced from the original scrape;
(2) a follow-up, up-to-date test suite; and (3) an adver-
sarial perspective, where heavy minification and obfus-
cation are applied to the test suite. Table 4 provides an
overview. It is useful to keep in mind that error here
means broken website functionality (i.e., false positive)
or permitted tracking (i.e., false negative).
Overall, we can see that all of the text-based mod-
els maintain high accuracy and relatively high F1 scores
on both the original data (test-train split) and the test
suite. This is true despite our lack of heavy optimization
(i.e., GPU support was not used) and hyper-parameter
tuning. Also, as we hypothesized, the CNN did fare well
with original data, but had problems when considering
new images found in the test suite. Overall accuracy
for all models also decreased in the adversarial setting,
though both the BoW and embedding models nonethe-
less report moderately high accuracy scores. A full anal-
ysis of each of the three categories follows, along with
summaries regarding three hypotheses:
H1 ML-CB’s use of text is more effective than images
or heuristics
H2 jsNice transformations are useful
H3 ML-CB is robust against an adversary who obfus-
cates and minifies website source code
4.1 Original Scrape
The data here comes from our original scrape capturing
canvas images, using one researcher to label those im-
ages, and then fetching program text associated with
each image. For the plaintext corpus, the model ob-
served a total of 5,202 examples; 4,682 of those examples
were used for training and 520 for testing (10% per fold).
For the jsNiceified data, the model observed a total of
4,752 examples, training on 4,277 examples and testing
on 475 examples (10% per fold).
As may be expected given the nature of a train-test
split, all models perform well (F1 scores ≥ 95%); in fact,
there is almost no difference between the models’ per-
formance. This suggests that more complex transforma-
tions like jsNice or more complex models like the BoW
or embedding are not needed.

            Original Scrape            Test Suite                 Adversarial Perspective
            SVM     BoW     Embedding  SVM      BoW     Embedding  SVM     BoW     Embedding
            P  jsN  P  jsN  P  jsN     P   jsN  P  jsN  P  jsN     P  jsN  P  jsN  P  jsN
F1          97 97   98 98   95 95      83  84   86 87   84 86      53 50   55 80   60 74
Accuracy    97 97   98 98   95 95      91  91   93 93   92 93      59 59   85 90   69 84
Precision   98 98   99 99   98 99      72  73   77 78   75 81      37 34   79 73   43 60
Recall      97 97   96 96   92 91      100 100  96 98   96 92      99 89   56 91   99 98
Table 4. Showing the average of the ten folds' F1 performance (per the SKF cross-validation method used), averaged again over five
separate executions of the models, to help generalize performance (P = plaintext corpus, jsN = jsNiceified corpus). The highest F1
scores out of all models are noted with a gray box.

It is also worth pointing
out that this result is on par with previous work using
machine learning to distinguish text-based fingerprint-
ing programs (e.g., [15, 60] had 99% accuracy scores on
an original set of scraped programs).
CNN. On original image data, averaging ten sep-
arate test-train runs, the CNN was also fairly accu-
rate (98.9%) and performed slightly worse when clas-
sifying positive examples (F1 score of 89.2%, with re-
call at 86.1% and precision at 92.7%). We likely could
have improved these numbers with additional tuning
of the model (e.g., using downsampling or something
like SMOTE [105]) or solidified this finding with more
robust cross validation, but we can already see the
model’s shortcomings when comparing images alone
(see also Section 3.5.1). Figure 5 (A) illustrates an ex-
ample weakness—images labeled as fingerprinting can
appear very similar to those used for non-fingerprinting,
leading to errors.
Summary. When considering the original data,
ML-CB is not appreciably better than classification
based on images (H1); all models perform well. Like-
wise, adding jsNice (H2) does not improve performance
with this original dataset. While a fraction of files (i.e.,
scripts) in this dataset are obfuscated (see [60, 106]),
this evaluation does not provide sufficient evidence re-
garding the models’ robustness to adversarial obfusca-
tion (H3).
4.2 Test Suite
In order to more accurately evaluate these models, we
followed the approach used in [15] and created a sec-
ond set of scraped programs, using the same methods
discussed in Section 3.1. Initially, we targeted 2,200
websites, using our original dataset, original Alexa Top
Rank list, and the Tranco list of the one million top
sites (pulled on September 10, 2020) as sources [107].
Fig. 5. Most likely incorrect predictions made by the CNN. Im-
ages from the original scrape are shown on the left (A), while
images from the test suite are shown on the right (B). For further
examples of the variety in fingerprinting images, see Figure 6 in
Appendix C.
To obtain a wide variety of test cases, we aimed for a
set of websites categorized in the following buckets:
– last 100 URLs in Tranco
– 200 random URLs labeled false in original dataset
– 300 random URLs labeled true in original dataset
– 100 random URLs from last 500,000 Tranco URLs
– 300 random URLs outside top 100 Tranco URLs
– first 100 URLs in original Alexa Top Rank list
– first 100 URLs in Tranco
– first 1,000 URLs in Tranco but not original dataset
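The bucket construction above can be sketched with stdlib sampling; the source-list names are hypothetical placeholders, and the buckets sum to the 2,200 targeted URLs.

```python
# Assemble the test-suite target list from the buckets described above.
# `tranco`, `alexa`, `orig_true`, and `orig_false` are assumed lists of
# URLs (Tranco ranking, Alexa ranking, and labeled original-dataset URLs).
import random

def build_targets(tranco, alexa, orig_true, orig_false, seed=0):
    rng = random.Random(seed)
    targets = []
    targets += tranco[-100:]                      # last 100 in Tranco
    targets += rng.sample(orig_false, 200)        # random false URLs
    targets += rng.sample(orig_true, 300)         # random true URLs
    targets += rng.sample(tranco[-500000:], 100)  # from last 500,000
    targets += rng.sample(tranco[100:], 300)      # outside top 100
    targets += alexa[:100]                        # first 100 in Alexa
    targets += tranco[:100]                       # first 100 in Tranco
    seen = set(orig_true) | set(orig_false)       # original dataset
    targets += [u for u in tranco if u not in seen][:1000]
    return targets
```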
We ran our scraper on these 2,200 URLs, hooking the
toDataURL function call and creating a second dataset
(hereafter, the test suite). Our scrape occurred in early
October, 2020. The scrape took less than one day to
complete.
The resulting test suite contained 906 distinct do-
mains identified as using toDataURL and 181 distinct
canvas images. The same researcher labeled the images,
returning a set of 90 true images and 95 false images. As
described in Section 3.3, a small number of images were
labeled both true and false depending on the origin_url
the image was associated with. This aligns with our de-
sign preference to label as “true” when handling mul-
tiple image labels found within a single program (i.e.,
a program drawing several benevolent images and one
fingerprinting image is labeled as fingerprinting).
We then fetched origin_urls to generate the plain-
text corpus and applied the jsNice transformation to re-
ceive the jsNiceified corpus. Because we wanted a one-
to-one comparison on the test suite, we dropped two
negative examples from the plaintext corpus which did
not transform due to a jsNice error, resulting in 318
negative examples and 90 positive examples.
We train one model on the original plaintext cor-
pus and one on the original jsNiceified corpus, then test
these models on the test suite’s plaintext and jsNiceified
data, respectively. We assess performance on the test
suite for each fold in our SKF cross validation, taking
the average for all folds, and repeating this process five
times, reporting the average of the five runs in Table 4.
Overall, most models perform well on the new data.
The average F1 score among all models is in the mid
80s, with accuracy scores in the low 90s. Notably, pre-
cision scores are low for most models, with high recall
scores. We can also start to see the case for why jsNice is
needed. Most models perform slightly better across all
metrics when using jsNice; the gains are largest for the
embedding model, which better balances precision and
recall.
Heuristic Comparison. We next want to com-
pare our models against state-of-the-art heuristic ap-
proaches. One possible approximation is to compare
results: do we identify the same fingerprinting files as
prior work? This comparison, however, is not straight-
forward; different datasets use different strategies to de-
cide which and how many websites to test, and were
collected at different times, meaning tracking compa-
nies may have moved their fingerprinting scripts to new
files, or the content of the same URL may have changed.
Datasets like Englehardt and Narayanan’s original work
(2016) [36] and the more recent FPInspector (2021) [60]
also focus on general fingerprinting rather than canvas
fingerprinting specifically and contain URLs rather than
the contents of files that may be fingerprinting. Acar et
al. (2014) [3] released domain names of fingerprinting
scripts rather than full paths. All three publicly released
datasets only include examples of fingerprinting, not ex-
amples of non-fingerprinting.
As such, we calculate overlap between our dataset
and these datasets as follows: considering only ori-
gin_urls we identify as fingerprinting, how much over-
lap is there between fingerprinting URLs in our data
(our original and test suite datasets combined) and
prior datasets. We find that the overlap with Engle-
hardt and Narayanan [36] is 1% of their dataset (0.7%
of ours). Our overlap with FPInspector [60] is 14% of
their dataset (9.8% of ours). And the overlap with Acar
et al. [3] (truncating our dataset to only domains to
match theirs) is 20% of their data (0.8% of ours).
For a more meaningful heuristic comparison, we
used openWPM [36, 37] on the URLs in the test suite,
filtering for “canvas fingerprinting” according to the
heuristic used in [36] (see Appendix A). To make the
comparison one to one, we considered only those ori-
gin_urls (i.e., script_urls) captured by both our scraper
and openWPM, 306 URLs in total. On this subset, the
heuristic maintains a high accuracy rate, at 93.8%, but
a low F1 score, at 77.6% (precision at 80.5% and recall
at 75%). Considering these origin_urls only, ML-CB's
BoW model with the jsNice corpus has an accuracy of
97.6% and an F1 score of 91.4% (with precision at 94.7%
and recall at 88.9%). We hypothesize that this occurs
because openWPM performs best on typical canvas fin-
gerprints (i.e., the fjord pangram), as does our classi-
fier. Finally, we modify the heuristic to be more accurate
on the test suite by dropping the requirement that at
least two colors are used in the fingerprint. The modified
heuristic performs better, but still worse than the
ML-CB BoW model, achieving an accuracy of 94.8%
but an F1 score of 81.8% (81.8% precision and recall).
CNN. We also used the test suite to assess our
image-based CNN model trained on original scrape data
and tested on renderable, distinct images associated
with the test suite (181 examples in total, 108 nega-
tive and 73 positive). We followed the same procedures
stated in Section 3.6.1 (e.g, image manipulations, train-
ing at three epochs with a one-cycle policy and discrim-
inative layer training). The model achieved an average
accuracy of 82%, with an F1 score of 72.1% (100% pre-
cision and 57.4% recall) on ten separate test-train runs.
Notably, images most likely to be incorrectly labeled by
the CNN were Picasso-styled images, most of which the
model predicted as negative. Figure 5 (B) demonstrates
another weakness of the CNN: new or different images
can easily trick the classifier, as the Picasso-styled fin-
gerprinters in the test suite did.
                     Test Suite (Subset)
                     F1   Accuracy  Precision  Recall
ML-CB (BoW jsNice)   91   98        95         89
Heuristic (improved) 82   95        81         75
Heuristic (original) 78   94        82         82
CNN*                 72   82        100        57
Table 5. Comparing ML-CB against the heuristic and CNN, percentages rounded up. All models (except the CNN*, which reports
results on the full test suite) use a subset of programs found in the test suite which were also identified by openWPM, for a one-to-one
comparison. Although the heuristic shows high accuracy, ML-CB offers a better balance of precision and recall.

Summary. As shown in Table 5, ML-CB handles
the up-to-date test suite better than the CNN (F1 scores
of 72% versus 91% when using the BoW model) and better than heuristics (F1 score, in the improved heuristic,
of 82% versus 91%); therefore, H1 holds at this point.
However, although there is a performance boost by us-
ing jsNice in the full test suite, the difference is not
substantial, meaning that H2, whether jsNice is useful,
only partially holds. Finally, although we found exam-
ples of obfuscation in the test suite, we still do not have
enough information to assess whether ML-CB resists
adversarial techniques (H3).
4.3 Adversarial Considerations
A canvas fingerprinting detection system must also con-
sider the ad- and tracker-blocking ecosystem in which
it might be deployed [108]. If straightforward at-
tempts to fool the classifier via obfuscation are success-
ful, then ML-CB will be less useful.
To approximate this type of adversary, we applied
heavy, but standard, obfuscation and minification to
our test suite.9 We accomplished this by iterating over
the plaintext corpus from our test suite and obfuscating
all JavaScript programs with the JavaScript Obfuscator
Tool [109] (see Figure 7 (C) in the Appendix for an ex-
ample). We also minified all HTML and CSS statements
using the npm package html-minifier [110] or the Python
tool htmlmin [111] (if html-minifier failed).
For minification, we used typical techniques, such
as the removal of whitespace, the removal of optional
tags, and the reduction of boolean attributes. For the
JavaScript Obfuscator Tool, a full list of flags may be
9 We did not apply other, less out-of-the-box kinds of adversar-
ial perturbations, such as substantive code changes. We also did
not consider use of alternative functions, such as getImageData
to avoid a classifier’s detection of toDataURL, but we could easily
adjust our pipeline to recognize such functions.
found in Appendix B, but the notable flags include:
control-flow-flattening, to alter code structure [112];
dead-code-injection, to thwart language-based classifi-
cations; unicode-escape-sequence, to make natural lan-
guage classification harder; numbers-to-expressions, to
further change control flow; string-array, to replace nat-
ural language strings with hexadecimal arrays; and
transform-object-keys, to turn objects into functions.
The obfuscated and minified results became our
“plaintext” adversarial corpus. We then applied jsNice
to the obfuscated and minified programs to create a “js-
Niceified” adversarial corpus. This process is illustrated
in Appendix D. Due to errors in obfuscation, the fi-
nal dataset held 289 negative examples and 86 positive
examples. As described in Section 3.5.2, we train on our origi-
nal corpora (using ten-fold SKF) and test on the new
“plaintext” adversarial corpus and “jsNiceified” adver-
sarial corpus, separately.
The classifiers perform poorly against the “plain-
text” adversarial corpus, but adequately against the “js-
Niceified” adversarial corpus, with accuracy scores as
high as 90%. It is unsurprising that a model trained on
natural-language plaintext is ill-equipped to assess non-
natural-language code (e.g., hexadecimal strings) in the
obfuscated corpus. On the other hand, applying jsNice
restores some of the natural-language features, includ-
ing predicting names and types. A classifier trained us-
ing jsNice-ification is able to take advantage of these fea-
tures. Appendix E further illustrates this point, showing
the contrast between obfuscated code (unreadable) and
obfuscated-but-jsNiceified code (more readable).
The improvement on the adversarial corpus when
using jsNice validates our hypothesis that jsNice adds
natural-language meaning to otherwise difficult-to-parse
source code, motivating its use in this setting. More-
over, we can see that the SVM—which we expected to
need more continual updating—does not fare well, even
with the use of jsNice. The same is true for the embed-
ding model, possibly due to its use of predefined weights
taken from website text (i.e., the stock GloVe weights)
rather than website source code. We would likely see a
performance boost if the embedding model used its own
set of weights tuned to this particular environment.
Lastly, we note that although the models in the ad-
versarial case do not perform nearly as well as in the
original scrape or test suite, to some extent, this out-
come correctly aligns incentives. Because obfuscation
creates many false positives (low precision scores), our
classifier may incentivize website owners who are not
conducting fingerprinting to avoid obfuscation. On the
other hand, trackers who try to use obfuscation to en-
able fingerprinting are reasonably likely to be identified.
Summary. One of the advantages of images and
heuristics is that these methods are not as heavily af-
fected by obfuscation—the same image is eventually
drawn, and openWPM uses dynamic analysis to mea-
sure features in part for this reason (H3) [36]. The
trade-off here is that these methods ossify easily, with
Picasso-styled images being missed by both (i.e., short,
same-color strings are missed by the heuristic’s sec-
ond requirement, see Appendix A). At the same time,
the heavy obfuscation used in our adversarial corpora,
which weakens ML-CB, is hypothetical, and not consis-
tently or heavily applied by website owners, even though
these methods have existed for years. This suggests that
it may be best to prioritize generalizable models in can-
vas fingerprint blocking tools, something achieved by
ML-CB’s use of text, rather than resilience from obfus-
cation (H1). Finally, from this vantage point, we can see
that jsNice is only partially needed now (H2) but may
provide some protection against future, heavier obfus-
cation.
5 Discussion
HTML5’s canvas is a dual-use technology, currently
used by 37% of the top 10K websites (Table 3). At the
same time, using the canvas for device fingerprinting
is rising, with websites ranked in the 1K to 10K inter-
val engaging in canvas fingerprinting at a rate of 4.93%
in 2014, 7% in 2016 (sub-pages included), and 11% in
2018 [3, 36]. Moreover, the current state of the art for
blocking canvas fingerprinting is either over-aggressive,
in a block-all manner, or under-aggressive, with rigid
heuristics.
ML-CB presents an alternative approach. It uses humans
to make judgments on easily distinguishable canvas
fingerprinting images, quickly creating a set of ground
truth that allows classifiers to be trained on website
source code. This approach makes it feasible to use
supervised machine learning on continuously trained or
easily updatable classifiers. ML-CB may also
be combined with other classifiers in a wisdom of the
crowd approach—e.g., using a bitwise OR operation on
the outputs from the CNN and text-based models from
above—or used as a plug-in to identify canvas finger-
printing in systems that might otherwise use a heuristic-
based list to define canvas fingerprinting (Section 2) [60].
What is more, this approach may be extended to other
forms of fingerprinting like AudioContext, mobile de-
vice orientation, or touch-focused fingerprinting, which
are likely heavily cloned across the web and may rely
on identifiable hallmarks when used for fingerprinting
[34, 113, 114]. The difficulty here would be finding a way
to relax the dual-use problem with an easy-to-identify
proxy, which we leave to future work.
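For illustration, the wisdom-of-the-crowd combination described above can be sketched in a few lines of JavaScript. The function name and binary labels are our own illustration, not ML-CB's implementation: a script is flagged if either the image-based CNN or the text-based model predicts fingerprinting.

```javascript
// Bitwise OR of two binary classifier outputs: a script is blocked
// if EITHER the image-based CNN OR the text-based model flags it.
// Labels are 0 (benign) or 1 (fingerprinting); names are illustrative.
function combinePredictions(cnnLabel, textLabel) {
  return cnnLabel | textLabel;
}

// Example: the CNN misses an obfuscated script, but the
// text-based model catches it, so the ensemble still blocks.
const verdict = combinePredictions(0, 1); // 1 (block)
```

Such an OR-combination trades precision for recall: a script escapes only if every member of the ensemble misses it.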
As a final recommendation, we urge policymakers to
take a more active role in the ad- and tracker-blocking
ecosystem. The use of surreptitious tracking mecha-
nisms, like canvas fingerprinting, is rising, and, cur-
rently, nothing stops a website from using these state-
less tools—a barely legible statement in a privacy pol-
icy may be required in some jurisdictions, but is un-
likely to bring awareness to the practice [115]. To change
the Panopticon-styled Internet we have today, tools like
ML-CB are necessary, but not sufficient, and must be
accompanied by non-technical solutions like legislation
[116].
5.1 Limitations
A primary limitation to ML-CB is the self-labeled na-
ture of supervised learning, which may have biased the
classifier. However, although it may have been possible
to outsource the image-based labeling to crowdwork-
ers, training a crowdworker to distinguish fingerprint-
ing from non-fingerprinting would require supplying
straightforward guidelines (i.e., distinguishing meaning-
ful content like the cat from meaningless content like
the fjord pangram), and this requires the crowdworker
to also have context about the website on which the im-
age was found. Further, most of the canvas images in
the dataset were both illogical and non-fingerprinting.
This occurs because many of these images are used as
small background icons or pieced together given some
user interaction. As such, non-fingerprinting examples
would have likely been labeled fingerprinting by work-
ers. Overcoming false positives by providing more train-
ing about true positives would inherit the same bias as
relying on an expert for labeling. For this reason, we
opted to label the images ourselves.
Second, ML-CB produces more false positives than
false negatives. This likely occurs because of our design
decision to aggressively downsample, based on the re-
spective difficulty of classifying positive versus negative
examples. Given that the vast majority of examples in
our dataset are negative (non-fingerprinting), the classi-
fier’s strength would be oversold if we did not downsam-
ple and let the classifier prioritize a prediction of non-
fingerprinting—as a ground truth label is, by default,
more likely negative. This approach enables a better
performing classifier, but one that potentially “blocks”
benevolent files, worsening the user’s experience. De-
pending on context, it may therefore be preferable to
alter this judgment and use less downsampling; how-
ever, in order to better assess classification decisions,
we did not take that approach in this paper.
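A minimal sketch of the downsampling step discussed above, assuming a simple labeled-example representation: because negative (non-fingerprinting) examples vastly outnumber positives, the negative class is randomly reduced to a target ratio before training. The ratio and field names here are illustrative, not ML-CB's actual parameters.

```javascript
// Downsample the majority (negative) class to negPerPos negatives
// per positive example before training. All positives are kept.
// The example shape ({label: 0|1}) is our own illustration.
function downsample(examples, negPerPos) {
  const positives = examples.filter(e => e.label === 1);
  const negatives = examples.filter(e => e.label === 0);
  // Shuffle a copy of the negatives, then keep only the target count.
  const shuffled = negatives.slice().sort(() => Math.random() - 0.5);
  const kept = shuffled.slice(0, positives.length * negPerPos);
  return positives.concat(kept);
}
```

A smaller `negPerPos` yields a more aggressive downsample, and hence a classifier less inclined to default to the majority "non-fingerprinting" prediction.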
Finally, although ML-CB inferred tracking based on
“typical” canvas fingerprinting images, there are poten-
tially beneficial uses for these images and underlying
methods, such as authentication schemes or fraud detec-
tion. For example, the last URL found in Table 2 shows
siftscience, an anti-fraud company, drawing the popular
fjord pangram. Because the same image may be used
for either purpose, and because ML-CB’s pipeline takes
ground truth from images, altering the label on those
images would inappropriately change the label for all
images. Instead, a deployed fingerprinting blocker could
opt to allowlist specified <script, domain> pairs, al-
lowing specific, approved websites to engage in canvas
fingerprinting. We note that what constitutes appropri-
ate or inappropriate fingerprinting requires human judg-
ment.
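The allowlisting idea above can be sketched as follows; the entry format and the anti-fraud URL are hypothetical, used only to show that the check is on the <script, domain> pair rather than on the script alone.

```javascript
// Allowlist of approved <script, domain> pairs: a flagged script may
// still run, but only on the specific site that approved it.
// The URL and domain below are hypothetical examples.
const allowlist = new Set([
  "https://cdn.example-antifraud.com/fp.js|shop.example.com",
]);

function shouldBlock(classifierFlagged, scriptUrl, pageDomain) {
  if (!classifierFlagged) return false; // benign scripts always run
  // Block unless this exact script/domain pair was approved.
  return !allowlist.has(scriptUrl + "|" + pageDomain);
}
```

Because the pair is keyed on both script and domain, the same anti-fraud script deployed on an unapproved site would still be blocked.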
6 Conclusion
In this paper, we presented ML-CB, a process and im-
plementation for generating trained machine-learning
models which may be integrated into browsers to pro-
vide “smart” tracking blocking on HTML5 canvas ac-
tions. To achieve this, we crawled roughly half a million
websites and created a dataset of canvas-based actions.
We analyzed this dataset and discussed the web's overall
use of the canvas, finding that rote blocking of the
canvas would disproportionately block a substantial
number of non-harmful canvas actions. We then applied a
key insight—the images drawn to the canvas may be used
as a proxy for “good” or “bad” canvas actions—allowing
us to label hundreds of files (i.e., programs), related
to thousands of webpages, with a single image. We used
these labels to train supervised machine-learning
classifiers, which perform adequately even in the adversarial
case where website source code is heavily obfuscated
and minified. In this way, ML-CB may be used to in-
crease privacy online by thwarting one way devices are
surreptitiously tracked across the web.
Acknowledgements.
This research received no specific grant from any fund-
ing agency in the public, commercial, or not-for-profit
sectors. We thank Steven M. Bellovin for comments on
an earlier version of this paper.
References
[1] E. Zuckerman, “The internet’s original sin,” The
Atlantic, vol. 14, August 2014. [Online]. Available:
https://www.theatlantic.com/technology/archive/2014/08/advertising-is-the-internets-original-sin/376041/
[2] N. Bielova, “Web tracking technologies and protection
mechanisms,” in Proceedings of the 2017 ACM SIGSAC
Conference on Computer and Communications Security,
2017.
[3] G. Acar, C. Eubank, S. Englehardt, M. Juarez, A. Narayanan,
and C. Diaz, “The web never forgets: Persistent tracking
mechanisms in the wild,” in Proceedings of the 2014 ACM
SIGSAC Conference on Computer and Communications
Security, 2014.
[4] J. R. Mayer and J. C. Mitchell, “Third-party web tracking:
Policy and technology,” in 2012 IEEE Symposium on
Security and Privacy, 2012.
[5] N. F. Awad and M. S. Krishnan, “The personalization
privacy paradox: An empirical evaluation of information
transparency and the willingness to be profiled online for
personalization,” MIS Quarterly, vol. 30, no. 1, pp. 13–28,
March 2006.
[6] M. Taddicken, “The ’privacy paradox’ in the social web:
The impact of privacy concerns, individual characteristics,
and the perceived social relevance on different forms of self-
disclosure,” Journal of Computer-Mediated Communication,
vol. 19, no. 2, p. 248–273, January 2014.
[7] S. Fulton and J. Fulton, HTML5 Canvas. O’Reilly Media,
Inc., 2011.
[8] K. Mowery and H. Shacham, “Pixel perfect: Fingerprinting
canvas in HTML5,” in Proceedings of W2SP, 2012.
[9] S. Wu, S. Li, Y. Cao, and N. Wang, “Rendered private:
Making GLSL execution uniform to prevent WebGL-
based browser fingerprinting,” in 28th USENIX Security
Symposium, 2019.
[10] P. Laperdrix, N. Bielova, B. Baudry, and G. Avoine, “Browser
fingerprinting: A survey,” ACM Transactions on the Web
(TWEB), vol. 14, no. 2, pp. 1–33, 2020.
[11] S. Luangmaneerote, E. Zaluska, and L. Carr, “Survey of
existing fingerprint countermeasures,” in 2016 International
Conference on Information Society (i-Society), 2016.
[12] L. N. Smith, “A disciplined approach to neural network
hyper-parameters: Part 1 – learning rate, batch size, mo-
mentum, and weight decay,” US Naval Research Laboratory,
Technical Report 5510-026, 2018.
[13] S. Haiduc, J. Aponte, L. Moreno, and A. Marcus, “On
the use of automated text summarization techniques
for summarizing source code,” in 2010 17th Working
Conference on Reverse Engineering, 2010.
[14] S. Clark, M. Blaze, and J. M. Smith, “Smearing fingerprints:
Changing the game of web tracking with composite
privacy,” in Cambridge International Workshop on Security
Protocols, 2015.
[15] M. Ikram, H. J. Asghar, M. A. Kaafar, A. Mahanti, and
B. Krishnamurthy, “Towards seamless tracking-free web:
Improved detection of trackers via one-class learning,”
Proceedings on Privacy Enhancing Technologies, vol. 2017,
no. 1, pp. 79–99, 2017.
[16] T. Bujlow, V. Carela-Español, J. Solé-Pareta, and P. Barlet-
Ros, “A survey on web tracking: Mechanisms, implications,
and defenses,” Proceedings of the IEEE, vol. 105, no. 8, pp.
1476–1510, August 2017.
[17] Electronic Frontier Foundation, “Panopticlick,” https://panopticlick.eff.org/.
[18] R. Upathilake, Y. Li, and A. Matrawy, “A classification
of web browser fingerprinting techniques,” in 2015 7th
International Conference on New Technologies, Mobility
and Security (NTMS), 2015.
[19] L. I. Millett, B. Friedman, and E. Felten, “Cookies and web
browser design: Toward realizing informed consent online,”
in Proceedings of the SIGCHI conference on Human factors
in computing systems, 2001.
[20] O. Kulyk, A. Hilt, N. Gerber, and M. Volkamer, “‘this
website uses cookies’: Users’ perceptions and reactions to
the cookie disclaimer,” in European Workshop on Usable
Security (EuroUSEC) 2018, April 2018.
[21] F. Marotta-Wurgler, “Does “notice and choice” disclosure
regulation work? an empirical study of privacy policies,”
in Michigan Law: Law and Economics Workshop, 2015.
[Online]. Available: https://perma.cc/GYN4-3YFA
[22] J. R. Reidenberg, N. C. Russell, A. Callen, S. Qasir, and
T. Norton, “Privacy harms and the effectiveness of the
notice and choice framework,” I/S: A Journal of Law and
Policy for the Information Society, pp. 485–524, 2014.
[23] N. Richards and W. Hartzog, “The pathologies of digital
consent,” Washington University Law Review, vol. 96, pp.
1461–1503, 2019.
[24] S. Englehardt, D. Reisman, C. Eubank, P. Zimmerman,
J. Mayer, A. Narayanan, and E. W. Felten, “Cookies
that give you away: The surveillance implications of
web tracking,” in Proceedings of the 24th International
Conference on World Wide Web, 2015.
[25] M. Perry, E. Clark, S. Murdoch, and G. Koppen, “The
design and implementation of the Tor Browser [draft],”
June 2018, https://2019.www.torproject.org/projects/
torbrowser/design/.
[26] Inform Action, “noscript,” https://noscript.net.
[27] A. Macrina and E. Phetteplace, “The Tor Browser and
intellectual freedom in the digital age,” Reference and User
Services Quarterly, vol. 54, no. 4, pp. 17–20, 2015.
[28] M. Piekarska, Y. Zhou, D. Strohmeier, and A. Raake,
“Because we care: Privacy dashboard on FirefoxOS,” arXiv,
2015. [Online]. Available: https://arxiv.org/abs/1506.04105
[29] C. E. Shannon, “A mathematical theory of communication,” Bell Syst. Tech. J., vol. 27, pp. 379–423, 1948.
[30] Y. Cao, S. Li, E. Wijmans et al., “(Cross-) browser finger-
printing via OS and hardware level features.” in Proceedings
of the 2017 Network and Distributed System Security Sym-
posium, 2017.
[31] Mozilla, “HTMLCanvasElement.toDataURL(),”
https://developer.mozilla.org/en-US/docs/Web/API/
HTMLCanvasElement/toDataURL.
[32] N. Reitinger, “Faces and fingers: Authentication,” Journal
of High Technology Law, vol. 20, no. 1, pp. 61–81, 2020.
[33] E. Bursztein, A. Malyshev, T. Pietraszek, and K. Thomas,
“Picasso: Lightweight device class fingerprinting for web
clients,” in Proceedings of the 6th Workshop on Security
and Privacy in Smartphones and Mobile Devices, 2016.
[34] fingerprintJS, “FPJS - Valve,” https://github.com/Valve/
fingerprintjs2.
[35] antoinevastel, “Picasso based canvas fingerprinting,”
https://github.com/antoinevastel/picasso-like-canvas-
fingerprinting.
[36] S. Englehardt and A. Narayanan, “Online tracking: A
1-million-site measurement and analysis,” in Proceedings
of the 2016 ACM SIGSAC Conference on Computer and
Communications Security, 2016.
[37] Mozilla, “OpenWPM,” https://github.com/mozilla/
OpenWPM.
[38] P. Laperdrix, “Browser fingerprinting: Exploring device
diversity to augment authentication and build
client-side countermeasures,” Cryptography and Security
[cs.CR]. INSA de Rennes, 2017. [Online]. Available:
https://tel.archives-ouvertes.fr/tel-01729126/document
[39] N. Bielova, F. Besson, and T. Jensen, “Using JavaScript
monitoring to prevent device fingerprinting,” ERCIM News,
vol. 106, July 2016. [Online]. Available: https://ercim-
news.ercim.eu/images/stories/EN106/EN106-web.pdf
[40] G. Merzdovnik, M. Huber, D. Buhov, N. Nikiforakis,
S. Neuner, M. Schmiedecker, and E. Weippl, “Block me if
you can: A large-scale study of tracker-blocking tools,” in
2017 IEEE Symposium on Security and Privacy, 2017.
[41] Appodrome, “CanvasFingerprintBlock,” https://chrome.
google.com/webstore/detail/canvasfingerprintblock/
ipmjngkmngdcdpmgmiebdmfbkcecdndc.
[42] C. F. Torres, H. Jonker, and S. Mauw, “FP-Block: Usable
web privacy by controlling browser fingerprinting,” in
European Symposium on Research in Computer Security,
2015.
[43] N. Nikiforakis, W. Joosen, and B. Livshits, “Privaricator,”
in Proceedings of the 24th International Conference on the
World Wide Web, 2015.
[44] A. ElBanna and N. Abdelbaki, “NONYM!ZER: Mitigation
framework for browser fingerprinting,” in 2019 IEEE 19th
International Conference on Software Quality, Reliability
and Security Companion, 2019.
[45] P. Baumann, S. Katzenbeisser, M. Stopczynski, and
E. Tews, “Disguised Chromium browser: Robust browser,
flash and canvas fingerprinting protection,” in Proceedings of
the 2016 ACM on Workshop on Privacy in the Electronic
Society, 2016.
[46] P. Laperdrix, B. Baudry, and V. Mishra, “FPRandom:
Randomizing core browser objects to break advanced device
fingerprinting techniques,” in 9th International Symposium
on Engineering Secure Software and Systems, 2017.
[47] P. Laperdrix, W. Rudametkin, and B. Baudry, “Mitigating
browser fingerprint tracking: Multi-level reconfiguration
and diversification,” in Proceedings of the IEEE/ACM
10th International Symposium on Software Engineering for
Adaptive and Self-Managing Systems, 2015.
[48] A. Datta, J. Lu, and M. C. Tschantz, “Evaluating anti-
fingerprinting privacy enhancing technologies,” in The Web
Conference, 2019.
[49] A. Vastel, P. Laperdrix, W. Rudametkin, and R. Rouvoy,
“Fp-Scanner: The privacy implications of browser fingerprint
inconsistencies,” in 27th USENIX Security Symposium,
2018.
[50] The Tor Project, “Bug 6253: Add canvas image ex-
traction prompt,” https://gitweb.torproject.org/tor-
browser.git/commit/?h=tor-browser-52.5.2esr-7.0-
2&id=196354d7951a48b4e6f5309d2a8e46962fff9d5f.
[51] Disconnect, “Disconnect tracker protection,” https://
github.com/disconnectme/disconnect-tracking-protection.
[52] Wayback Machine, “ftc.gov, february 1, 2019,” https://web.
archive.org/web/20190201065632/https://www.ftc.gov/
and https://web.archive.org/web/20190201050345js_
/https://gateway.foresee.com/code/19.6.6/fs.utils.js.
[53] N. Reitinger, “Strange bedfellows: Fingerprinting
phenomena...or state.gov versus facebook.com,”
https://medium.freecodecamp.org/strange-bedfellows-
fingerprinting-phenomena-or-state-gov-versus-facebook-
com-8d123866e7df.
[54] A. Narayanan, January 2019, https://twitter.com/random_
walker/status/1089897867458867200.
[55] Disconnect, “services.json,” https://github.com/
disconnectme/disconnect-tracking-protection/blob/master/
services.json and https://perma.cc/C4GQ-XSUY.
[56] A. FaizKhademi, M. Zulkernine, and K. Weldemariam,
“FPGuard: Detection and prevention of browser fingerprint-
ing,” in IFIP Annual Conference on Data and Applications
Security and Privacy, 2015.
[57] WordPress.org, “#43264: WordPress emojis show up as
browser fingerprinting and will be blocked in new versions of
FireFox,” https://core.trac.wordpress.org/ticket/43264.
[58] K. Boda, Á. M. Földes, G. G. Gulyás, and S. Imre, “User
tracking on the web via cross-browser fingerprinting,” in
Nordic Conference on Secure IT Systems, 2011.
[59] A. Gómez-Boix, D. Frey, Y.-D. Bromberg, and B. Baudry,
“A collaborative strategy for mitigating tracking through
browser fingerprinting,” in Proceedings of the 6th ACM
Workshop on Moving Target Defense, 2019.
[60] U. Iqbal, S. Englehardt, and Z. Shafiq, “Fingerprinting the
fingerprinters: Learning to detect browser fingerprinting be-
haviors,” in 2021 IEEE Symposium on Security & Privacy,
2021.
[61] V. Rizzo, S. Traverso, and M. Mellia, “Unveiling web
fingerprinting in the wild via code mining and machine
learning,” Proceedings on Privacy Enhancing Technologies,
vol. 1, pp. 43–63, 2021.
[62] S. Bird, V. Mishra, S. Englehardt, R. Willoughby, D. Zeber,
W. Rudametkin, and M. Lopatka, “Actions speak louder than
words: Semi-supervised learning for browser fingerprinting
detection,” 2020, https://arxiv.org/pdf/2003.04463.pdf.
[63] C.-H. Hsiao, M. Cafarella, and S. Narayanasamy, “Using web
corpus statistics for program analysis,” in Proceedings of the
2014 ACM International Conference on Object Oriented
Programming Systems Languages & Applications, 2014.
[64] S. Robertson, “Understanding inverse document frequency:
On theoretical arguments for IDF,” Journal of Documenta-
tion, vol. 60, no. 5, p. 503–520, 2004.
[65] W. S. Noble, “What is a support vector machine?” Nature
Biotechnology, vol. 24, no. 12, p. 1565–1567, 2006.
[66] D. E. Knuth, “Semantics of context-free languages. math-
ematical systems theory,” Mathematical systems theory,
vol. 2, no. 2, pp. 127–145, 1968.
[67] J. McCarthy, “A formal description of a subset of ALGOL,”
Stanford University Department of Computer Science, Tech.
Rep., 1964.
[68] J. R. Quinlan, C4.5: Programs for Machine Learning.
Morgan Kaufmann, 1994.
[69] ——, “Induction of decision trees,” Machine learning, vol. 1,
no. 1, pp. 81–106, 1986.
[70] Selenium, “Selenium WebDriver,” https://www.selenium.
dev.
[71] Amazon Web Services, “Top sites in united states,”
https://www.alexa.com/topsites/countries/US.
[72] Mozilla, “webRequest,” https://developer.mozilla.org/
en-US/docs/Mozilla/Add-ons/WebExtensions/API/
webRequest.
[73] MDN Web Docs, “webRequest.onBeforeRequest,” https:
//developer.mozilla.org/en-US/docs/Mozilla/Add-ons/
WebExtensions/API/webRequest/onBeforeRequest.
[74] PostgreSQL, “Postgresql: The world’s most advanced open
source relational database,” https://www.postgresql.org.
[75] Browserleaks.com, “HTML5 Canvas Fingerprinting,” https:
//browserleaks.com/canvas.
[76] Wayback Machine, “justuno.com, october 31, 2018,” 2018,
https://web.archive.org/web/20181031094820/cdn.justuno.
com/mwgt_4.1.js?v=1.56.
[77] tld, “tld 0.12.2,” https://pypi.org/project/tld/.
[78] Google Cloud, “Natural language,” https://cloud.google.
com/natural-language.
[79] L. Luo, J. Ming, D. Wu, P. Liu, and S. Zhu, “Semantics-
based obfuscation-resilient binary code similarity comparison
with applications to software plagiarism detection,” in
Proceedings of the 22nd ACM SIGSOFT International
Symposium on the Foundations of Software Engineering,
2014.
[80] V. Raychev, M. Vechev, and A. Krause, “Predicting program
properties from ‘Big Code’,” in Proceedings of the 42nd
Annual ACM SIGPLAN-SIGACT Symposium on Principles
of Programming Languages, 2015.
[81] Requests, “Requests: HTTP for humans,” https://requests.
readthedocs.io/en/master/.
[82] brettlangdon, “Command line interface to http://jsnice.org,”
https://github.com/brettlangdon/jsnice.
[83] D. Koller and N. Friedman, Probabilistic Graphical Models:
Principles and Techniques. MIT Press, 2009.
[84] M. Chahal, “Information retrieval using Jaccard similarity
coefficient,” International Journal of Computer Trends and
Technology (IJCTT), vol. 36, no. 3, 2016.
[85] textdistance, “textdistance 4.2.0,” https://pypi.org/project/
textdistance/.
[86] W. Koehler, “Web page change and persistence—A four-year longitudinal study,” Journal of the American Society for Information Science and Technology, vol. 53, no. 2, pp. 162–171, 2002.
[87] P. Warren, C. Boldyreff, and M. Munro, “The evolution
of websites,” in Proceedings of the Seventh International
Workshop on Program Comprehension, 1999.
[88] W. Koehler, “An analysis of web page and web site
constancy and permanence,” Journal of the American
Society for Information Science, vol. 50, no. 2, pp. 162–180,
1999.
[89] D. Herrmann, R. Wendolsky, and H. Federrath, “Website
fingerprinting: Attacking popular privacy enhancing tech-
nologies with the multinomial naïve-Bayes classifier,” in
Proceedings of the ACM Workshop on Cloud Computing
Security, 2009.
[90] N. Nikiforakis, L. Invernizzi, A. Kapravelos, S. V. Acker,
W. Joosen, C. Kruegel, F. Piessens, and G. Vigna, “You
are what you include,” in Proceedings of the 2012 ACM
SIGSAC Conference on Computer and Communications
Security, 2012.
[91] J. Pennington, R. Socher, and C. D. Manning, “GloVe:
Global vectors for word representation,” in Empirical
Methods in Natural Language Processing, 2014.
[92] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual
learning for image recognition,” in Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition,
2016.
[93] Stanford Vision Lab, Stanford University, and Princeton
University, “ImageNet,” http://www.image-net.org.
[94] Alex Clark and Contributors, “Pillow,” https://pillow.
readthedocs.io/en/stable/.
[95] fastai, “Data augmentation in computer vision,” 2021,
https://docs.fast.ai/vision.augment.html.
[96] X. Jin, Y. Chen, J. Dong, J. Feng, and S. Yan, “Collaborative
layer-wise discriminative learning in deep neural networks,”
in European Conference on Computer Vision, 2016.
[97] E. G. Adagbasa, S. A. Adelabu, and T. W. Okello, “Appli-
cation of deep learning with stratified k-fold for vegetation
species discrimation in a protected mountainous region using
sentinel-2 image,” Geocarto International, pp. 1–21, 2019.
[98] N. Ghamrawi and A. McCallum, “Collective multi-label clas-
sification,” in Proceedings of the 14th ACM International
Conference on Information and Knowledge Management,
2005.
[99] M. Jaggi, “An equivalence between the lasso and
support vector machines,” in Regularization, Optimization,
Kernels, and Support Vector Machines, J. A. Suykens,
M. Signoretto, and A. Argyriou, Eds. CRC Press, 2014.
[Online]. Available: https://arxiv.org/pdf/1303.1152.pdf
[100] L. Wu, S. C. Hoi, and N. Yu, “Semantics-preserving bag-
of-words models and applications,” IEEE Transactions on
Image Processing, vol. 19, no. 7, pp. 1908–1920, 2010.
[101] D. P. Kingma and J. Ba, “Adam: A method for stochastic
optimization,” in Proceedings of the 3rd International
Conference on Learning Representations, 2014.
[102] P.-T. de Boer, D. P. Kroese, S. Mannor, and R. Y.
Rubinstein, “A tutorial on the cross-entropy method,”
Annals of Operations Research, vol. 134, pp. 19–67, 2005.
[103] I. Goodfellow, Y. Bengio, and A. Courville, Deep
Learning. MIT Press, 2016. [Online]. Available:
http://www.deeplearningbook.org
[104] X. Shi, Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong, and
W.-C. Woo, “Convolutional LSTM network: A machine
learning approach for precipitation nowcasting,” in Advances
in Neural Information Processing Systems, 2015.
[105] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P.
Kegelmeyer, “Smote: Synthetic minority over-sampling tech-
nique,” Journal of Artificial Intelligence Research, vol. 16,
pp. 321–357, 2002.
[106] P. Skolka, C.-A. Staicu, and M. Pradel, “Anything to hide?
Studying minified and obfuscated code in the web,” in The
Web Conference, 2019.
[107] V. Le Pochat, T. Van Goethem, S. Tajalizadehkhoob,
M. Korczynski, and W. Joosen, “Tranco: A research-oriented
top sites ranking hardened against manipulation,” in
Proceedings of the 2019 Network and Distributed System
Security Symposium, 2019. [Online]. Available: https:
//tranco-list.eu
[108] C. Baraniuk, “Where will the ad versus ad blocker arms race
end?” Scientific American, May 2018. [Online]. Available:
https://www.scientificamerican.com/article/where-will-the-ad-versus-ad-blocker-arms-race-end
[109] JavaScript Obfuscator, https://obfuscator.io.
[110] kangax, “HTML minifier,” https://kangax.github.io/html-
minifier/.
[111] mankyd, “htmlmin,” https://htmlmin.readthedocs.io/en/
latest/.
[112] T. László and Á. Kiss, “Obfuscating C++ programs via control flow flattening,” Annales Universitatis Scientarum Budapestinensis de Rolando Eötvös Nominatae, Sectio Computatorica, vol. 30, no. 1, pp. 3–19, 2009.
[113] A. Das, N. Borisov, G. Acar, and A. Pradeep, “The web’s
sixth sense: A study of scripts accessing smartphone sensors,”
Proceedings of the ACM SIGSAC Conference on Computer
and Communications Security, 2018.
[114] R. Masood, B. Z. H. Zhao, H. J. Asghar, and M. A. Kaafar,
“Touch and you’re trapp(ck)ed: Quantifying the uniqueness
of touch gestures for tracking,” Proceedings on Privacy
Enhancing Technologies, vol. 2018, no. 2, pp. 122–142,
2018.
[115] R. Amos, G. Acar, E. Lucherini, M. Kshirsagar, A. Narayanan,
and J. Mayer, “Privacy policies over time: Curation and
analysis of a million-document dataset,” 2020, https:
//arxiv.org/pdf/2008.09159.pdf.
[116] S. M. Bellovin, P. K. Dutta, and N. Reitinger, “Privacy
and synthetic datasets,” Stanford Technology Law Review,
vol. 22, no. 1, pp. 1–52, 2019.
A Heuristics
1. The canvas element’s height and width properties
must not be set below 16 px.
2. Text must be written to [the] canvas with at least
two colors or at least 10 distinct characters.
3. The script should not call the save, restore, or
addEventListener methods of the rendering con-
text.
4. The script extracts an image with toDataURL or
with a single call to getImageData that specifies an
area with a minimum size of 16 px × 16 px.
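For concreteness, the four requirements above can be sketched as a JavaScript predicate over a summary record of a script's observed canvas activity. The record's field names are our own illustration; OpenWPM's actual instrumentation records raw API calls rather than such a summary.

```javascript
// Sketch of the four-part heuristic, applied to a per-script summary
// of canvas activity. Field names are illustrative assumptions.
function heuristicFlags(rec) {
  const bigEnough = rec.canvasWidth >= 16 && rec.canvasHeight >= 16; // (1)
  const variedText = rec.textColors >= 2 || rec.distinctChars >= 10; // (2)
  const noStateCalls = !rec.calledSaveRestoreOrAddEventListener;     // (3)
  const extracted = rec.extractedAreaPx >= 16 * 16;                  // (4)
  return bigEnough && variedText && noStateCalls && extracted;
}
```

Note how requirement (2) explains the ossification discussed in Section 4: a Picasso-styled image drawn with one short, same-color string fails `variedText` and is never flagged, however identifying the extracted pixels may be.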
B JavaScript Obfuscator Tool
Flags
control-flow-flattening (i.e., a series of nested
functions alter code structure [112])
dead-code-injection (i.e., dead code is randomly
placed throughout the JavaScript program)
compact (i.e., one-line output code)
unicode-escape-sequence (i.e., string converted
to unicode escape sequence)
identifier-names-generator in hexadecimal
(i.e., renaming identifiers to hexadecimal, e.g.,
_0xabc123)
numbers-to-expressions (i.e., converts numbers
to expressions, e.g., 1234 == -0xd93 + -0x10b4 +
0x41 * 0x67 + 0x84e * 0x3 + -0xff8;)
rename-globals (i.e., rename global variable
names)
rename-properties (i.e., rename property names)
rotate-string-array (i.e., putting strings in an
array and rotating the array)
self-defending (i.e., defends against JavaScript
beautification tools)
shuffle-string-array (i.e., putting strings in an
array and shuffling the array)
split-strings (i.e., splitting strings into separate
chunks)
string-array (i.e., removes string literals and puts
them in an array, e.g., var m = “Hello World” be-
comes var m = _0x12c456[0x1])
transform-object-keys (i.e., takes object keys
and transforms them into functions)
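The flags above roughly correspond to the following options object in the tool's camelCase API. This is our reading of the flag list (option names as used in the tool's releases contemporaneous with this paper), not the verbatim configuration used in the experiments.

```javascript
// Sketch of a JavaScript Obfuscator Tool (obfuscator.io) options
// object enabling the flags listed above. Option names are the
// tool's camelCase equivalents; values here are assumptions.
const obfuscatorOptions = {
  compact: true,
  controlFlowFlattening: true,
  deadCodeInjection: true,
  identifierNamesGenerator: "hexadecimal",
  numbersToExpressions: true,
  renameGlobals: true,
  renameProperties: true,
  rotateStringArray: true,
  selfDefending: true,
  shuffleStringArray: true,
  splitStrings: true,
  stringArray: true,
  transformObjectKeys: true,
  unicodeEscapeSequence: true,
};
```

In a script, such an object would be passed to the tool's `obfuscate(code, options)` entry point or supplied as command-line flags.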
C Representative Fingerprinting
Images
Fig. 6. Common canvas fingerprinting images found in the
dataset. Images on the left (A) come from our original scrape,
while images on the right (B) are from our test suite. Notably,
the third image on the left (original scrape) was found on 68
websites (i.e., initiators), including: wsj.com, lenscrafters.com,
and fnlondon.com. On the right (test suite), the sixth im-
age was found on 48 websites in the test suite, including
humanesocietyofyorkcounty.org, bloomberg.com/businessweek,
and heart.org.
D Example Program Text
var d =document.createElement("canvas");
d.setAttribute("width",220);
d.setAttribute("height",30);
d.setAttribute("style","display:none");
window.document.body.appendChild(d);
var a =d.getContext("2d");
a.textBaseline ="top";
a.font ="14px 'Arial'";
a.textBaseline ="alphabetic";
a.fillStyle ="#f60";
a.fillRect(125,1,62,20);
a.fillStyle ="#069";
a.fillText("BrowserLeaks,com <canvas> 1.0",2,15);
a.fillStyle ="rgba(102, 204, 0, 0.7)";
a.fillText("BrowserLeaks,com <canvas> 1.0",4,17);
var p,g =d.toDataURL("image/
png").replace("data:image/png;base64,",""), b,k,
f,q,t,u =a =0;
/** @type {!Element} */
var canvas =document.createElement("canvas");
canvas.setAttribute("width",220);
canvas.setAttribute("height",30);
canvas.setAttribute("style","display:none");
window.document.body.appendChild(canvas);
var c =canvas.getContext("2d");
/** @type {string} */
c.textBaseline ="top";
/** @type {string} */
c.font ="14px 'Arial'";
/** @type {string} */
c.textBaseline ="alphabetic";
/** @type {string} */
c.fillStyle ="#f60";
c.fillRect(125,1,62,20);
/** @type {string} */
c.fillStyle ="#069";
c.fillText("BrowserLeaks,com <canvas> 1.0",2,15);
/** @type {string} */
c.fillStyle ="rgba(102, 204, 0, 0.7)";
c.fillText("BrowserLeaks,com <canvas> 1.0",4,17);
var m;
var s =canvas.toDataURL("image/
png").replace("data:image/png;base64,","");
var i;
var j;
var o3;
var resizewidth;
var val;
/** @type {number} */
var t =c =0;
var _0x1a985e =document[_0x296b('0x11')+
_0x296b('0x12')](_0x296b('0x13'));
_0x1a985e['\x73\x65\x74\x41\x74\x74\x72\x69\x62\x75'
+'\x74\x65'](_0x296b('0x14'), 0xdc);
_0x1a985e[_0x296b('0x15')+'\x74\x65']
(_0x296b('0x16'), 0x1e);
_0x1a985e[_0x296b('0x15')+'\x74\x65']
(_0x296b('0x17'),
'\x64\x69\x73\x70\x6c\x61\x79\x3a\x6e\x6f'+
'\x6e\x65');
window[_0x296b('0x18')][_0x296b('0x19')]
[_0x296b('0x1a')+'\x64'](_0x1a985e);
var _0x5b741b =_0x1a985e[_0x296b('0x1b')]
('\x32\x64');
_0x5b741b['\x74\x65\x78\x74\x42\x61\x73\x65\x6c\x69'
+'\x6e\x65']='\x74\x6f\x70';
_0x5b741b[_0x296b('0x1c')] =
'\x31\x34\x70\x78\x20\x27\x41\x72\x69\x61'+
'\x6c\x27';
_0x5b741b[_0x296b('0x1d')+'\x6e\x65']=
_0x296b('0x1e');
_0x5b741b[_0x296b('0x1f')] =_0x296b('0x20');
_0x5b741b[_0x296b('0x21')](0x7d,0x1,0x3e,0x14);
_0x5b741b[_0x296b('0x1f')] =_0x296b('0x22');
_0x5b741b[_0x296b('0x23')](_0x296b('0x24')+
'\x6b\x73\x2c\x63\x6f\x6d\x20\x3c\x63\x61'+
_0x296b('0x25'), 0x2,0xf);
_0x5b741b[_0x296b('0x1f')] =_0x296b('0x26')+
_0x296b('0x27')+'\x37\x29';
_0x5b741b['\x66\x69\x6c\x6c\x54\x65\x78\x74']
('\x42\x72\x6f\x77\x73\x65\x72\x4c\x65\x61'+
_0x296b('0x28')+_0x296b('0x25'), 0x4,0x11);
var _0x2902d0,_0x388aec =
_0x1a985e[_0x296b('0x29')](_0x296b('0x2a'))
[_0x296b('0x2b')]
('\x64\x61\x74\x61\x3a\x69\x6d\x61\x67\x65'+
_0x296b('0x2c')+'\x34\x2c',''),
_0x2ee290,_0x2de607,_0x1f7ca1,_0x45102f,
_0x50deb,_0x3d37a0 =_0x5b741b =0x0;
var _0x1a985e =document[_0x296b("0x11")+
_0x296b("0x12")](_0x296b("0x13"));
_0x1a985e["setAttribu"+"te"](_0x296b("0x14"),
220);
_0x1a985e[_0x296b("0x15")+"te"](_0x296b("0x16"),
30);
_0x1a985e[_0x296b("0x15")+"te"](_0x296b("0x17"),
"display:no"+"ne");
window[_0x296b("0x18")][_0x296b("0x19")]
[_0x296b("0x1a")+"d"](_0x1a985e);
var _0x5b741b =_0x1a985e[_0x296b("0x1b")]("2d");
/** @type {string} */
_0x5b741b["textBaseli"+"ne"]="top";
/** @type {string} */
_0x5b741b[_0x296b("0x1c")] ="14px 'Aria"+"l'";
_0x5b741b[_0x296b("0x1d")+"ne"]=_0x296b("0x1e");
_0x5b741b[_0x296b("0x1f")] =_0x296b("0x20");
_0x5b741b[_0x296b("0x21")](125,1,62,20);
_0x5b741b[_0x296b("0x1f")] =_0x296b("0x22");
_0x5b741b[_0x296b("0x23")](_0x296b("0x24")+"ks,com <ca"+_0x296b("0x25"), 2,15);
/** @type {string} */
_0x5b741b[_0x296b("0x1f")] =_0x296b("0x26")+
_0x296b("0x27")+"7)";
_0x5b741b["fillText"]("BrowserLea"+_0x296b("0x28")
+_0x296b("0x25"), 4,17);
var _0x2902d0;
var _0x388aec =_0x1a985e[_0x296b("0x29")]
(_0x296b("0x2a"))[_0x296b("0x2b")]("data:image"+
_0x296b("0x2c")+"4,","");
var _0x2ee290;
var _0x2de607;
var _0x1f7ca1;
var _0x45102f;
var _0x50deb;
(A) (B) (C) (D)
Fig. 7. Representative program text taken from the test suite (https://colorscheme.ru/js/canvas.min.js?e828175732). (A) shows the
original plaintext version of the program in truncated form, focusing on the part of the program drawing to the canvas; (B) shows the
plaintext snippet processed with jsNice, using the method discussed in Section 3.5.2; (C) shows a truncated version of the original
plaintext, obfuscated with the JavaScript Obfuscator Tool using the settings mentioned in Section 4.3. As best as we could tell (with
the help of jsNice), this sub-snippet represents the actions related to creating and filling the canvas with text; the full program was
transformed into a 12,457-character, single-line string. (D) takes the full obfuscated program from (C), processes it with jsNice,
and shows only the truncated part provided in (C); the full jsNice output is 483 lines long.
E Obfuscated Versus Obfuscated-then-jsNiceified Example
(A) (B)
Fig. 8. Illustration of obfuscated plaintext (A) versus obfuscated-then-jsNiceified text (B). The example in (A) continues in
hexadecimal form, while (B), with jsNice, re-introduces words like Cwm fjord and toDataURL in an instantiation section and, in a
function named lagOffset, shows a structure similar to that of fingerprinting programs (Figure 7 (A)), including properties like
fillStyle and fillRect.
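To illustrate why the hexadecimal form in (A) is recoverable at all, note that hex-escape string literals are decoded by the JavaScript parser itself; the escapes in Figure 7 (C) therefore evaluate directly to ordinary canvas API identifiers. The following sketch copies three literals verbatim from the listing and prints their decoded values (the variable name `decoded` is ours, not from the listing):

```javascript
// Hex-escape literals copied from the obfuscated program in Fig. 7 (C).
// The JavaScript engine decodes '\xNN' escapes when parsing the source,
// so no deobfuscation tool is needed to recover these particular strings.
const decoded = {
  // property accessed as _0x5b741b['\x66\x69\x6c\x6c\x54\x65\x78\x74']
  method: '\x66\x69\x6c\x6c\x54\x65\x78\x74',
  // value assigned to the textBaseline property
  baseline: '\x74\x6f\x70',
  // argument passed to getContext
  context: '\x32\x64',
};

console.log(decoded.method);   // "fillText"
console.log(decoded.baseline); // "top"
console.log(decoded.context);  // "2d"
```

This is also why jsNice can surface names like fillStyle and toDataURL in (B): once the escapes are parsed, the familiar canvas vocabulary reappears even though the surrounding identifiers remain mangled.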