HAL Id: hal-01652021
https://hal.inria.fr/hal-01652021
Submitted on 2 Dec 2017
FP-STALKER: Tracking Browser Fingerprint Evolutions
Antoine Vastel, Pierre Laperdrix, Walter Rudametkin, Romain Rouvoy
To cite this version:
Antoine Vastel, Pierre Laperdrix, Walter Rudametkin, Romain Rouvoy. FP-STALKER: Tracking
Browser Fingerprint Evolutions. IEEE S&P 2018 - 39th IEEE Symposium on Security and Privacy,
May 2018, San Francisco, United States. pp. 728-741, 10.1109/SP.2018.00008. hal-01652021
FP-STALKER:
Tracking Browser Fingerprint Evolutions
Antoine Vastel
Univ. Lille / Inria
antoine.vastel@univ-lille.fr
Pierre Laperdrix
INSA / Inria
pierre.laperdrix@inria.fr
Walter Rudametkin
Univ. Lille / Inria
walter.rudametkin@univ-lille.fr
Romain Rouvoy
Univ. Lille / Inria / IUF
romain.rouvoy@univ-lille.fr
Abstract—Browser fingerprinting has emerged as a technique
to track users without their consent. Unlike cookies, fingerprint-
ing is a stateless technique that does not store any information
on devices, but instead exploits unique combinations of attributes
handed over freely by browsers. The uniqueness of fingerprints
allows them to be used for identification. However, browser
fingerprints change over time and the effectiveness of tracking
users over longer durations has not been properly addressed.
In this paper, we show that browser fingerprints tend to
change frequently—from every few hours to days—due to, for
example, software updates or configuration changes. Yet, despite
these frequent changes, we show that browser fingerprints can
still be linked, thus enabling long-term tracking.
FP-STALKER is an approach to link browser fingerprint
evolutions. It compares fingerprints to determine if they originate
from the same browser. We created two variants of FP-STALKER,
a rule-based variant that is faster, and a hybrid variant that
exploits machine learning to boost accuracy. To evaluate FP-
STALKER, we conduct an empirical study using 98,598 finger-
prints we collected from 1,905 distinct browser instances. We
compare our algorithm with the state of the art and show that,
on average, we can track browsers for 54.48 days, and 26 % of
browsers can be tracked for more than 100 days.
I. INTRODUCTION
Websites track their users for different reasons, including
targeted advertising, content personalization, and security [2].
Traditionally, tracking consists in assigning unique identifiers
to cookies. However, recent discussions and legislation have
brought to light the privacy concerns these cookies imply;
more people are sensitive to these issues. A study conducted
by Microsoft in 2012 observed that they were unable to keep
track of 32 % of their users using only cookies, as they were
regularly deleted [26]. Cookie erasure is now common as many
browser extensions and private modes automatically delete
cookies at the end of browsing sessions.
In 2010, Eckersley introduced a tracking technique called
browser fingerprinting that leverages the user’s browser and
system characteristics to generate a fingerprint associated to
the browser [8]. He showed that 83.6 % of visitors to the
PANOPTICLICK website1 could be uniquely identified from
a fingerprint composed of only 8 attributes. Further studies
have focused on studying new attributes that increase browser
fingerprint uniqueness [7], [10], [14], [18], [19], [20], while
others have shown that websites use browser fingerprinting as
a way to regenerate deleted cookies [1].
1 https://panopticlick.eff.org
However, fingerprint uniqueness, by itself, is insufficient
for tracking because fingerprints change. One needs to keep
track of these evolutions to link them to previous fingerprints.
Recent approaches exploit fingerprint uniqueness as a defense
mechanism by adding randomness to break uniqueness [12],
[13], [21], but they did not address linkability.
The goal of this paper is to link browser fingerprint evo-
lutions and discover how long browsers can be tracked. More
precisely, FP-STALKER detects if two fingerprints originate
from the same browser instance, which refers to an installation
of a browser on a device. Browser instances change over
time, e.g. they are updated or configured differently, causing
their fingerprints to evolve. We introduce two variants of FP-
STALKER: a rule-based variant and a hybrid variant, which leverage
rules and a random forest, respectively.
We evaluate our approach using 98,598 browser finger-
prints originating from 1,905 browser instances, which we
collected over two years. The fingerprints were collected using
two browser extensions advertised on the AmIUnique website2,
one for Firefox3 and the other for Chrome4. We compare
both variants of FP-STALKER and an implementation of the
algorithm proposed by Eckersley [8]. In our experiments, we
evaluate FP-STALKER’s ability to correctly link browser fin-
gerprints originating from the same browser instance, as well
as its ability to detect fingerprints that originate from unknown
browser instances. Finally, we show that FP-STALKER can
link, on average, fingerprints from a given browser instance
for more than 51 days, which represents an improvement of
36 days compared to the closest algorithm from the literature.
In summary, this paper reports on four contributions:
1) We highlight the limits of browser fingerprint uniqueness
for tracking purposes by showing that fingerprints change
frequently (50 % of browser instances changed their
fingerprints in less than 5 days, 80 % in less than 10 days);
2) We propose two variant algorithms to link fingerprints
from the same browser instance, and to detect when a
fingerprint comes from an unknown browser instance;
3) We compare the accuracy of our algorithms with the
state of the art, and we study how browser fingerprinting
frequency impacts tracking duration;
4) Finally, we evaluate the execution times of our algorithms,
and we discuss the impact of our findings.
2 https://amiunique.org
3 https://addons.mozilla.org/firefox/addon/amiunique/
4 https://chrome.google.com/webstore/detail/amiunique/pigjfndpomdldkmoaiiigpbncemhjeca
TABLE I: An example of a browser fingerprint
Attribute       Source        Value Examples
Accept          HTTP header   text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Connection      HTTP header   close
Encoding        HTTP header   gzip, deflate, sdch, br
Headers         HTTP header   Connection Accept X-Real-IP DNT Cookie Accept-Language Accept-Encoding User-Agent Host
Languages       HTTP header   en-US,en;q=0.8,es;q=0.6
User-agent      HTTP header   Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36
Canvas          JavaScript    (rendered image)
Cookies         JavaScript    yes
Do not track    JavaScript    yes
Local storage   JavaScript    no
Platform        JavaScript    MacIntel
Plugins         JavaScript    Plugin 0: Chrome PDF Viewer; ; mhiehjai. Plugin 1: Chrome PDF Viewer; Portable Document Format; internal-pdf-viewer. Plugin 2: Native Client; ; internal-nacl-plugin.
Resolution      JavaScript    2560x1440x24
Timezone        JavaScript    -180
WebGL           JavaScript    NVIDIA GeForce GTX 750 Series; Microsoft .
Fonts           Flash         List of fonts installed on the device
The remainder of this paper is organized as follows.
Section II gives an overview of the state of the art. Section III
analyzes how browser fingerprints evolve over time. Section IV
introduces Eckersley’s algorithm as well as both variants of
FP-STALKER. Section V reports on an empirical evaluation,
a comparison to the state of the art, and a benchmark of our
approach. Finally, we conclude in Section VI.
II. BACKGROUND & MOTIVATIONS
a) Browser fingerprinting: aims to identify web
browsers without using stateful identifiers, like cookies [8]. A
browser fingerprint is composed of a set of browser and system
attributes. By executing a script in the browser, sensitive meta-
data can be revealed, including the browser’s parameters, but
also operating system and hardware details. As this technique
is completely stateless, it remains hard to detect and block,
as no information is stored on the client side. Individually,
these attributes may not give away much information but,
when combined, they often form a unique signature, hence
the analogy with a fingerprint. Most of the attributes in a
fingerprint are collected through JavaScript APIs and HTTP
headers, but extra information can also be retrieved through
plugins like Flash. Table I illustrates a browser fingerprint
collected from a Chrome browser running on Windows 10.
b) Browser fingerprinting studies: have focused on
uniquely identifying browsers. Mayer [17] was the first to point
out, in 2009, that a browser’s “quirkiness” that stems from its
configuration and underlying operating system could be used
for “individual identification”. In 2010, the PANOPTICLICK
study was the first large-scale demonstration of browser fin-
gerprinting as an identification technique [8]. From about
half a million fingerprints, Eckersley succeeded in uniquely
identifying 83.6 % of browsers. Since then, many studies have
been conducted on many different aspects of this tracking
technique. As new features are included within web browsers
to draw images, render 3D scenes or process sounds, new
attributes have been discovered to strengthen the fingerprinting
process [5], [7], [9], [10], [18], [19], [20]. Additionally, re-
searchers have performed large crawls of the web that confirm
a steady growth of browser fingerprinting [1], [2], [9], [22].
While most of these studies focused on desktops, others
demonstrated they could successfully fingerprint mobile device
browsers [11], [14]. Finally, a study we conducted in 2016
confirmed Eckersley’s findings, but observed a notable shift in
some attributes [14]. While the lists of plugins and fonts were
the most revealing features in 2010, this has rapidly changed
as the Netscape Plugin Application Programming Interface
(NPAPI) has been deprecated in Chrome (September 2015) and
Firefox (March 2017). Browser fingerprinting is continuously
adapting to evolutions in browser technologies since highly
discriminating attributes can change quickly.
c) Browser fingerprinting defenses: have been designed
to counter fingerprint tracking. The largest part of a browser
fingerprint is obtained from the JavaScript engine. However,
the values of these attributes can be altered to mislead fin-
gerprinting algorithms. Browser extensions, called spoofers,
change browser-populated values, like the User-agent or
the Platform, with pre-defined ones. The goal here is to
expose values that are different from the real ones. However,
Nikiforakis et al. showed that they may be harmful as they
found that these extensions “did not account for all possible
ways of discovering the true identity of the browsers on which
they are installed” and they actually make a user “more
visible and more distinguishable from the rest of the users,
who are using their browsers without modifications” [22].
Torres et al. went a step further by providing the concept of
separation of web identities with FP-BLOCK, where a browser
fingerprint is generated for each encountered domain [24].
Every time a browser connects to the same domain, it will
return the same fingerprint. However, it keeps presenting the
same limitation as naive spoofers since the modified values are
incomplete and can be incoherent. Laperdrix et al. explored the
randomization of media elements, such as canvas and audio
used in fingerprinting, to break fingerprint linkability [12].
They add a slight random noise to canvas and audio, that is
not perceived by users, to defeat fingerprinting algorithms.
Finally, the TOR browser is arguably the best overall de-
fense against fingerprinting. Their strategy is to have all users
converge towards a normalized fingerprint. The TOR browser
is a modified Firefox that integrates custom defenses [23].
In particular, they removed plugins, canvas image extraction
is blocked by default, and well-known attributes have been
modified to return the same information on all operating
systems. They also defend against JavaScript font enumeration
by bundling a set of default fonts with the browser. However,
using TOR can degrade the user’s experience (e.g., due to
latency) and can break some websites (e.g., due to disabled
features, websites that block the TOR network). Furthermore,
the protection offered by the normalized browser fingerprint remains
limited, as changes to the browser's configuration, or even resizing
the window, can make the browser fingerprint unique again.
d) Browser fingerprint linkability: is only partially ad-
dressed by existing studies. Eckersley tried to identify re-
turning users on the PANOPTICLICK website with a very
simple heuristic based on string comparisons that made correct
guesses 65 % of the time [8]. Although not related to browsers,
the overall approach taken by Wu et al. to fingerprint Android
smartphones from permissionless applications [25] is similar
in nature to our work. They collected a 38-attribute fingerprint,
including the list of system packages, the storage capacity
of the device and the current ringtone. Using a naive Bayes
classifier, they were able to successfully link fingerprints from
the same mobile device over time. However, the nature of
the data in [25] strongly differs from the focus of this work.
In particular, the attributes in a browser fingerprint are not
composed of strong identifiers, like the current wallpaper, and
the browser does not share personal information from other
parts of the system as do applications on Android. For these
reasons, the results are not comparable.
To the best of our knowledge, beyond the initial contribu-
tion by Eckersley, no other studies have looked into the use of
advanced techniques to link browser fingerprints over time.
III. BROWSER FINGERPRINT EVOLUTIONS
This paper focuses on the linkability of browser fingerprint
evolutions over time. Using fingerprinting as a long-term
tracking technique requires not only obtaining unique browser
fingerprints, but also linking fingerprints that originate from
the same browser instance. Most of the literature has focused
on studying or increasing fingerprint uniqueness [7], [8], [14].
While uniqueness is a critical property of fingerprints, it is
also critical to understand fingerprint evolution to build an
effective tracking technique. Our study provides more insights
into browser fingerprint evolution in order to demonstrate the
effectiveness of such a tracking technique.
a) Input dataset: The raw input dataset we collected
contains 172,285 fingerprints obtained from 7,965 different
browser instances. All browser fingerprints were obtained from
AmIUnique extensions for Chrome and Firefox installed from
July 2015 to early August 2017 by participants in this study.
The extensions load a page in the background that fingerprints
the browser. Compared to a fingerprinting website, the only
additional information we collect is a unique identifier we
generate per browser instance when the extension is installed.
This serves to establish the ground truth. Moreover, we pre-
process the raw dataset by applying the following rules:
1) We remove browser instances with fewer than 7 browser
fingerprints. This is because, to study the ability to track
browsers, we need browser instances that have been
fingerprinted multiple times.
2) We discard browser instances with inconsistent finger-
prints due to the use of countermeasures that artificially
alter the fingerprints. To know if a user installed such a
countermeasure, we check if the browser or OS changes
and we check that the attributes are consistent among
themselves. Although countermeasures exist in the wild,
they are used by a minority of users and, we argue,
should be treated by a separate specialized anti-spoofing
algorithm. We leave this task for future work.
After applying these rules, we obtain a final dataset of
98,598 fingerprints from 1,905 browser instances. All fol-
lowing graphs and statistics are based on this final dataset.
Figure 1 presents the number of fingerprints and distinct
browser instances per month over the two year period.
Most users heard of our extensions through posts published
on popular tech websites, such as Reddit, Hackernews or
Slashdot. Users install the extension to visualize the evolution
[Plot: monthly counts from 07/2015 to 07/2017; left axis: number of distinct browser instances (0–800); right axis: number of fingerprints (0–8,000).]
Fig. 1: Number of fingerprints and distinct browser instances
per month
[Plot: anonymity set size (2–16) against number of fingerprints (log scale, 1 to 10,000).]
Fig. 2: Browser fingerprint anonymity set sizes
of their browser fingerprints over a long period of time, and
also to help researchers understand browser fingerprinting in
order to design better countermeasures. We explicitly state
the purpose of the extension and the fact it collects their
browser fingerprints. Moreover, we received an approval from
the Institutional Review Board (IRB) of our research center
for the collection as well as the storage of these browser fin-
gerprints. As a ground truth, the extension generates a unique
identifier per browser instance. The identifier is attached to all
fingerprints, which are automatically sent every 4 hours. In
this study, the browser fingerprints we consider are composed
of the standard attributes described in Table I.
Figure 2 illustrates the anonymity set sizes against the
number of participants involved in this study. The long tail
reflects that 99 % of the browser fingerprints are unique among
all the participants and belong to a single browser instance,
while only 10 browser fingerprints are shared by more than 5
browser instances.
b) Evolution triggers: Browser fingerprints naturally
evolve for several reasons. We identified the following cat-
egories of changes:
Automatic evolutions happen automatically and without di-
rect user intervention. This is mostly due to automatic
software upgrades, such as the upgrade of a browser or a
plugin that may impact the user agent or the list of
plugins;
Context-dependent evolutions are caused by changes in
the user's context. Some attributes, such as resolution
or timezone, are indirectly impacted by a contextual
change, such as connecting a computer to an external
screen or traveling to a different timezone; and
TABLE II: Durations the attributes remained constant for the
median, the 90th and the 95th percentiles.
Percentile (days)
Attribute Trigger 50th 90th 95th
Resolution Context Never 3.1 1.8
User agent Automatic 39.7 13.0 8.4
Plugins Automatic/User 44.1 12.2 8.7
Fonts Automatic Never 11.8 5.4
Headers Automatic 308.0 34.1 14.9
Canvas Automatic 290.0 35.3 17.2
Major
browser version Automatic 52.2 33.3 23.5
Timezone Context 206.3 53.8 26.8
Renderer Automatic Never 81.2 30.3
Vendor Automatic Never 107.9 48.6
Language User Never 215.1 56.7
Dnt User Never 171.4 57.0
Encoding Automatic Never 106.1 60.5
Accept Automatic Never 163.8 109.5
Local storage User Never Never 320.2
Platform Automatic Never Never Never
Cookies User Never Never Never
User-triggered evolutions require an action from the
user. They concern configuration-specific attributes, such
as cookies, do not track or local storage.
To know how long attributes remain constant and if their
stability depends on the browser instance, we compute the
average time, per browser instance, that each attribute does
not change. Table II presents the median, the 90th and 95th
percentiles of the duration each attribute remains constant, on
average, in browser instances. In particular, we observe that the
User agent is rather unstable in most browser instances as
its value is systematically impacted by software updates. In
comparison, attributes such as cookies, local storage
and do not track rarely change, if ever. Moreover, we
observe that attributes evolve at different rates depending on
the browser instance. For example, canvas remains stable for
290 days in 50 % of the browser instances, whereas it changes
every 17.2 days for 10 % of them. The same phenomenon can
be observed for the screen resolution, where more than
50 % of the browser instances never see a change, while 10 %
change every 3.1 days on average. More generally, this points
to some browser instances being quite stable, and thus more
trackable, while others are not.
c) Evolution frequency: Another key indicator to ob-
serve is the elapsed time (Et) before a change occurs in
a browser fingerprint. Figure 3 depicts the cumulative dis-
tribution function of Et for all fingerprints (blue), and aver-
aged per browser instance (orange). After one day, at least
one transition occurs in 45.2 % of the observed fingerprints.
The 90th percentile is observed after 13 days and the 95th
percentile after 17.3 days. This means the probability that
at least one transition occurs within 13 days is 0.9 (blue). It
is important to point out that changes occur more or less
frequently depending on the browser instance (orange). While
some browser instances change often (20 % change in less
than two days), others, on the contrary, are much more stable
(23% have no changes after 10 days). In this context, keeping
pace with the frequency of change is likely a challenge for
browser fingerprint linking algorithms and, to the best of our
knowledge, has not been explored in the state of the art.
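Under the same assumed DataFrame layout as the previous sketch, here is a short illustration of how the elapsed times plotted in Figure 3 can be computed: the time before any attribute of a fingerprint changes, over all fingerprints and averaged per browser instance.
import pandas as pd

def elapsed_times_before_change(df: pd.DataFrame, attributes: list) -> pd.DataFrame:
    rows = []
    for browser_id, group in df.sort_values("collect_time").groupby("browser_id"):
        group = group.reset_index(drop=True)
        last_change_time = group.loc[0, "collect_time"]
        for i in range(1, len(group)):
            if any(group.loc[i, a] != group.loc[i - 1, a] for a in attributes):
                elapsed = (group.loc[i, "collect_time"] - last_change_time).days
                rows.append({"browser_id": browser_id, "elapsed_days": elapsed})
                last_change_time = group.loc[i, "collect_time"]
    return pd.DataFrame(rows)

# changes = elapsed_times_before_change(df, list_of_attribute_columns)
# Blue curve: CDF of changes["elapsed_days"]; orange curve: CDF of
# changes.groupby("browser_id")["elapsed_days"].mean()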
d) Evolution rules: While it is difficult to anticipate
browser fingerprint evolutions, we can observe how individual
[Plot: CDF, P(X) from 0 to 1, of the number of days (0–15) before a change occurs in a fingerprint, over all fingerprints and averaged per browser instance.]
Fig. 3: CDF of the elapsed time before a fingerprint evolution
for all the fingerprints, and averaged per browser instance.
attributes evolve. In particular, evolutions of the User agent
attribute are often tied to browser upgrades, while evolutions
of the Plugins attribute refer to the addition, deletion or
upgrade of a plugin (upgrades change its version). Neverthe-
less, not all attribute changes can be explained in this manner;
some values are difficult to anticipate. For example, the value
of the canvas attribute is the result of an image rendered by
the browser instance and depends on many different software
and hardware layers. The same applies, although to a lesser
extent, to screen resolution, which can take unexpected
values depending on the connected screen. Based on these
observations, the accuracy of linking browser fingerprint evo-
lutions depends on the inference of such evolution rules. The
following section introduces the evolution rules we first iden-
tified empirically, and then learned automatically, to achieve
an efficient algorithm to track browser fingerprints over time.
IV. LINKING BROWSER FINGERPRINTS
FP-STALKER’s goal is to determine if a browser fin-
gerprint comes from a known browser instance—i.e., it is
an evolution—or if it should be considered as from a new
browser instance. Because fingerprints change frequently, and
for different reasons (see section III), a simple direct equality
comparison is not enough to track browsers over long periods
of time.
In FP-STALKER, we have implemented two variant al-
gorithms with the purpose of linking browser fingerprints,
as depicted in Figure 4. The first variant is a rule-based
algorithm that uses a static ruleset, and the second variant is
a hybrid algorithm that combines both rules and machine
learning. We explain the details and the tradeoffs of both
algorithms in this section. Our results show that the rule-based
algorithm is faster but the hybrid algorithm is more precise
while still maintaining acceptable execution times. We have
also implemented a fully random forest-based algorithm, but
the small increase in precision did not outweigh the large
execution penalty, so we do not present it further in this paper.
A. Browser fingerprint linking
When collecting browser fingerprints, it is possible that
a fingerprint comes from a previous visitor—i.e., a known
browser instance—or from a new visitor—i.e., an unknown
browser instance. The objective of fingerprint linking is to
match fingerprints to their browser instance and follow the
browser instance as long as possible by linking all of its
fingerprint evolutions. In the case of a match, linked browser
fingerprints are given the same identifier, which means the
linking algorithm considers they originate from the same
browser instance. If the browser fingerprint cannot be linked,
the algorithm assigns a new identifier to the fingerprint.
More formally, given a set of known browser fingerprints
F, each f ∈ F has an identifier f.id that links it to the browser
instance it belongs to. Given an unknown fingerprint fu ∉ F,
whose real id is unknown, a linking algorithm returns
the browser instance identifier fk.id of the fingerprint fk that
maximizes the probability that fk and fu belong to the same
browser instance. This computation can be done either by
applying rules, or by training an algorithm to predict this
probability. If no known fingerprint can be found, it assigns
a new id to fu. For optimization purposes, we only hold and
compare the last ν fingerprints of each browser instance bi in
F. The reason is that if we linked, for example, 3 browser
fingerprints fA, fB and fC to a browser instance bi, then,
when trying to link an unknown fingerprint fu, it is rarely
useful to compare fu to the oldest browser fingerprints of bi.
That is, newer fingerprints are more likely to produce a match,
hence we avoid comparing old fingerprints in order to improve
execution times. In our case, we set the value of ν to 2.
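The following is a minimal sketch of this generic linking loop (a simplification, not the authors' implementation): match_probability stands for either the rule-based test or the random forest model introduced below, threshold for the acceptance level, and fingerprints are plain dictionaries; only the last ν = 2 fingerprints per browser instance are kept, as stated above.
import uuid
from collections import deque

NU = 2  # only the last ν fingerprints of each browser instance are compared

def link(known, unknown_fp, match_probability, threshold):
    # known: dict mapping a browser-instance id to a deque of its most recent fingerprints
    best_id, best_p = None, 0.0
    for browser_id, fps in known.items():
        for fk in fps:
            p = match_probability(fk, unknown_fp)
            if p > best_p:
                best_id, best_p = browser_id, p
    if best_id is None or best_p < threshold:
        best_id = str(uuid.uuid4())         # unknown browser instance: assign a new id
        known[best_id] = deque(maxlen=NU)
    known[best_id].append(unknown_fp)        # deque keeps only the ν most recent fingerprints
    return best_id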
B. Rule-based Linking Algorithm
The first variant of FP-STALKER is a rule-based algorithm
that uses static rules obtained from statistical analyses per-
formed in section III. The algorithm relies on rules designed
from attribute stability presented in Table II to determine if an
unknown fingerprint fu belongs to the same browser instance
as a known fingerprint fk. We also define rules based on
constraints that we would not expect to be violated, such as:
a browser's family should be constant (e.g., the same browser
instance cannot be Firefox one moment and Chrome at a
later time), the operating system is constant, and the browser
version is either constant or increases over time. The full list
of rules is as follows:
1) The OS, platform and browser family must be
identical for any given browser instance. Even if this
may not always be true (e.g., when a user updates from
Windows 8 to 10), we consider it reasonable for our
algorithm to lose track of a browser when such a large
change occurs, since it is not frequent.
2) The browser version remains constant or increases
over time. This would not be true in the case of a
downgrade, but this is also not a common event.
3) Due to the results from our statistical analyses, we have
defined a set of attributes that must not differ between two
fingerprints from the same browser instance. We consider
that local storage, Dnt, cookies and canvas
should be constant for any given browser instance. As
observed in Table II, these attributes do not change often,
if at all, for a given browser instance. In the case of
canvas, even if it seldom changes for most users (see
Table II), the changes are unpredictable, making them
hard to model. Since canvas values are quite unique among
browser instances [14], and do not change too frequently, it
is still interesting to require that it remain identical
between two fingerprints of the same browser instance.
(a) Rule-based variant of FP-STALKER. Uses a set of
static rules to determine if fingerprints should be linked
to the same browser instance or not.
(b) Hybrid variant of FP-STALKER. The training phase is used to learn the
probability that two fingerprints belong to the same browser instance, and the
testing phase uses the random forest-based algorithm to link fingerprints.
Fig. 4: FP-STALKER: Overview of both algorithm variants.
The rule-based algorithm is simpler and faster but the hybrid
algorithm leads to better fingerprint linking.
4) We impose a constraint on fonts: if both fingerprints have
Flash activated—i.e., we have a list of fonts available—
then the fonts of fu must either be a subset or a superset
of the fonts of fk, but not a disjoint set. That means that,
between two fingerprints of a browser instance, we allow
deletions or additions of fonts, but not both.
5) We define a set of attributes that are allowed to
change, but only within a certain similarity. That
means that their values must have a similarity
ratio > 0.75, as computed by the Python library
function difflib.SequenceMatcher().ratio(). These
attributes are user agent, vendor, renderer,
plugins, language, accept and headers. We allow
at most two changes of this kind (see the sketch after this list).
6) We also define a set of attributes that are allowed to
change, no matter their value. This set is composed of
resolution, timezone and encoding. However,
we only allow one change at a time among these
three attributes.
7) Finally, the total number of changes from rules 5 and 6
must be less than 2.
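As an illustration of rule 5, here is a minimal sketch of the similarity test, using the standard-library difflib ratio mentioned above; the dictionary representation and attribute names are illustrative, not the authors' implementation.
from difflib import SequenceMatcher

SIMILARITY_ATTRIBUTES = ["user_agent", "vendor", "renderer",
                         "plugins", "language", "accept", "headers"]

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()

def rule5_ok(fk: dict, fu: dict) -> bool:
    # Changed attributes must stay above 0.75 similarity, with at most two such changes.
    changes = 0
    for attr in SIMILARITY_ATTRIBUTES:
        if fk[attr] != fu[attr]:
            if similarity(fk[attr], fu[attr]) <= 0.75:
                return False
            changes += 1
    return changes <= 2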
Algorithm 1 Rule-based matching algorithm
function FINGERPRINTMATCHING(F, fu)
    rules ← {rule1, ..., rule6}
    candidates ← ∅
    exact ← ∅
    for fk ∈ F do
        if VERIFYRULES(fk, fu, rules) then
            if nbDiff = 0 then
                exact ← exact ∪ ⟨fk⟩
            else
                candidates ← candidates ∪ ⟨fk⟩
            end if
        end if
    end for
    if |exact| > 0 and SAMEIDS(exact) then
        return exact[0].id
    else if |candidates| > 0 and SAMEIDS(candidates) then
        return candidates[0].id
    else
        return GENERATENEWID()
    end if
end function
SAMEIDS is a function that, given a list of candidates, returns true
if all of them share the same id, else false.
The order in which rules are applied is important for
performance purposes: we ordered them from the most to the
least discriminating. The first rules discard many candidates,
reducing the total number of comparisons. In order to link
fu to a fingerprint fk, we apply the rules to each known
fingerprint taken from F. As soon as a rule is not matched,
the known fingerprint is discarded and we move on to the next.
If a fingerprint matches all the rules, then it is added to a
list of potential candidates, candidates. Moreover, in case
fingerprints fk and fu are identical, we add fk to the list of
exact matching candidates, exact. Once the rule verification
process is completed, we look at the two lists of candidates. If
exact is not empty, we check if there is only one candidate or if
all the candidates come from the same browser instance. If that is
the case, then we link fu with this browser instance, otherwise
we assign a new id to fu. If no exact candidate is found,
we look at candidates and apply the same technique as for
exact. We summarize the rule-based approach in Algorithm 1.
On a side-note, we established the rules using a simple
univariate statistical analysis to study attribute stability (see
Table II), as well as some objective (e.g., rule 1) and other sub-
jective (e.g., rule 4) decisions. Due to the difficulty in making
complex yet effective rules, the next subsection presents the
use of machine learning to craft a more effective algorithm.
C. Hybrid Linking Algorithm
The second variant of FP-STALKER mixes the rule-based
algorithm with machine learning to produce a hybrid algo-
rithm. It reuses the first three rules of the previous algorithm,
since we consider them constraints that should not be
violated between two fingerprints of the same browser instance.
However, for the last four rules, the situation is fuzzier.
Indeed, it is not as clear when attributes should be allowed to
differ, how many of them can differ, and with what
dissimilarity. Instead of manually crafting rules for each of
these attributes, we propose to use machine learning to dis-
cover them. The benefit of combining rules and machine
learning is that rules are faster, but machine
learning tends to be more precise. Thus, applying the rules
first keeps only a subset of fingerprints
on which to apply the machine learning algorithm.
1) Approach Description: The first step of this algorithm
is to apply rules 1, 2 and 3 on fu and all fk ∈ F. We keep
the subset of browser fingerprints fksub that verify these rules.
If, during this process, we find any browser fingerprints that
exactly match fu, then we add them to exact. If exact is
not empty and all of its candidates are from the same browser
instance, we stop here and link fu with the browser instance
in exact. Otherwise, if there are multiple exact candidates from
different browser instances, we assign a new browser
id to fu. If the set of exact candidates is
empty, we continue with a second step that leverages machine
learning. In this step, for each fingerprint fk ∈ fksub, we
compute the probability that fk and fu come from the same
browser instance using a random forest model. We keep the set
of fingerprint candidates whose probability is greater than a
threshold parameter λ. If this set of candidates is empty, we
assign a new id to fu. Otherwise, we keep the sets of candidates
with the highest and second highest probabilities, ch1 and ch2.
Then, we check if ch1 contains only one candidate or if all
of its candidates come from the same browser instance. If
it is not the case, we check that either the probability ph1
associated with the candidates of ch1 is greater than ph2 + diff,
where ph2 is the probability associated with the candidates of
ch2, or that ch1 and ch2 contain only candidates from the same
browser instance. Algorithm 2 summarizes the hybrid approach.
2) Machine Learning: Computing the probability that two
fingerprints fu and fk originate from the same browser in-
stance can be modeled as a binary classification problem where
the two classes to predict are same browser instance
and different browser instance. We use the ran-
dom forest algorithm [6] to solve this binary classification
problem. A random forest is an ensemble learning method
for classification that operates by constructing a multitude
of decision trees at training time and outputting the class
chosen by a majority of the individual trees. In the case of FP-STALKER, each
decision tree makes a prediction and votes if the two browser
fingerprints come from the same browser instance. The result
of the majority vote is chosen. Our main motivation to adopt a
random forest instead of other classifiers is because it provides
a good tradeoff between precision and the interpretation of
the model. In particular, the notion of feature importance in
random forests allows FP-STALKER to interpret the importance
of each attribute in the decision process.
In summary, given two fingerprints, fu ∉ F and fk ∈ F,
their comparison is reduced to a single feature vector of
M features X = ⟨x1, x2, ..., xM⟩, where feature xn is
the comparison of attribute n in both fingerprints (the
process of transforming two fingerprints into a feature vector
is presented below). Our random forest model computes the
probability P(fu.id = fk.id | x1, x2, ..., xM) that fu and fk
belong to the same browser instance.
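A sketch of this classification step is shown below, assuming scikit-learn (the paper does not name a library); feature_vector is a hypothetical helper sketched in the next subsection, X_train and y_train come from the training procedure described later, and the 10-tree / 3-feature setting anticipates the hyperparameter search reported below.
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(n_estimators=10, max_features=3, random_state=0)
# clf.fit(X_train, y_train)   # y = 1 when both fingerprints come from the same browser instance

def same_instance_probability(fu: dict, fk: dict) -> float:
    x = feature_vector(fu, fk)            # pairwise comparison features (Table III)
    return clf.predict_proba([x])[0][1]   # estimated P(same browser instance | x)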
a) Input Feature Vector: To solve the binary classifica-
tion problem, we provide an input vector X = ⟨x1, x2, ..., xM⟩
of M features to the random forest classifier. The features
are mostly pairwise comparisons between the values of the
attributes of both fingerprints (e.g., Canvas, User agent).
Algorithm 2 Hybrid matching algorithm
function FINGERPRINTMATCHING(F, fu, λ)
    rules ← {rule1, rule2, rule3}
    exact ← ∅
    Fksub ← ∅
    for fk ∈ F do
        if VERIFYRULES(fk, fu, rules) then
            if nbDiff = 0 then
                exact ← exact ∪ ⟨fk⟩
            else
                Fksub ← Fksub ∪ ⟨fk⟩
            end if
        end if
    end for
    if |exact| > 0 then
        if SAMEIDS(exact) then
            return exact[0].id
        else
            return GENERATENEWID()
        end if
    end if
    candidates ← ∅
    for fk ∈ Fksub do
        ⟨x1, x2, ..., xM⟩ ← FEATUREVECTOR(fu, fk)
        p ← P(fu.id = fk.id | ⟨x1, x2, ..., xM⟩)
        if p ≥ λ then
            candidates ← candidates ∪ ⟨fk, p⟩
        end if
    end for
    if |candidates| > 0 then
        ch1, ph1 ← GETCANDIDATESRANK(candidates, 1)
        ch2, ph2 ← GETCANDIDATESRANK(candidates, 2)
        if SAMEIDS(ch1) and ph1 > ph2 + diff then
            return candidates[0].id
        end if
        if SAMEIDS(ch1 ∪ ch2) then
            return candidates[0].id
        end if
    end if
    return GENERATENEWID()
end function
GETCANDIDATESRANK is a function that, given a list of candidates
and a rank i, returns the list of candidates with the ith greatest
probability, and this probability.
Most of these features are binary values (0 or 1) corresponding
to the equality or inequality of an attribute, or similarity
ratios between these attributes. We also include a number
of changes feature that corresponds to the total number of
different attributes between fu and fk, as well as the time
difference between the two fingerprints.
In order to choose which attributes constitute the feature
vector, we performed feature selection. Indeed, having too many
features does not necessarily ensure better results. It may lead
to overfitting, i.e., our algorithm correctly fits our training
data, but does not correctly predict on the test set. Moreover,
having too many features also has a negative impact on
performance. For the feature selection, we started with a model
using all of the attributes in a fingerprint. Then, we looked at
feature importance, as defined by [15], to determine the most
discriminating features. In our case, feature importance is a
combination of uniqueness, stability, and predictability (the
possibility to anticipate how an attribute might evolve over
time). We removed all the components of our feature vector
TABLE III: Feature importances of the random forest model
calculated from the fingerprint train set.
Rank  Feature             Importance
1     Number of changes   0.350
2     Languages HTTP      0.270
3     User agent HTTP     0.180
4     Canvas              0.090
5     Time difference     0.083
6     Plugins             0.010
7     Fonts               0.008
8     Renderer            0.004
9     Resolution          0.003
that had a negligible impact (feature importance < 0.002).
Finally, we obtained a feature vector composed of the attributes
presented in Table III. We see that the most important feature
is the number of differences between two fingerprints, and the
second most discriminating attribute is the list of languages.
Although this may seem surprising since the list of languages
does not have high entropy, it does remain stable over time,
as shown in Table II, which means that if two fingerprints
have different languages, this often means that they do not
belong to the same browser instance. In comparison, screen
resolution also has low entropy but it changes more often than
the list of languages, leading to low feature importance. This is
mostly caused by the fact that since screen resolution changes
frequently, having two fingerprints with a different resolution
doesn’t add a lot of information to determine whether or not
they are from the same browser instance. Finally, we see a
sharp drop in feature importance after rank 5 (from 0.083 to
0.010), which means that most of the information required for
the classification is contained in the first five features.
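The sketch below shows one plausible feature_vector helper matching the features of Table III; which comparisons are similarity ratios and which are binary equalities is our own guess, and the attribute and field names are illustrative.
from difflib import SequenceMatcher

def feature_vector(fu: dict, fk: dict) -> list:
    sim = lambda a, b: SequenceMatcher(None, str(a), str(b)).ratio()
    compared = ["languages", "user_agent", "canvas", "plugins",
                "fonts", "renderer", "resolution"]
    number_of_changes = sum(fu[a] != fk[a] for a in compared)
    time_difference = abs((fu["collect_time"] - fk["collect_time"]).days)
    return [
        number_of_changes,                            # rank 1 feature
        sim(fu["languages"], fk["languages"]),        # rank 2
        sim(fu["user_agent"], fk["user_agent"]),      # rank 3
        float(fu["canvas"] == fk["canvas"]),          # rank 4 (binary equality)
        time_difference,                              # rank 5
        sim(fu["plugins"], fk["plugins"]),            # rank 6
        float(fu["fonts"] == fk["fonts"]),            # rank 7
        float(fu["renderer"] == fk["renderer"]),      # rank 8
        float(fu["resolution"] == fk["resolution"]),  # rank 9
    ]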
b) Training Random Forests: This phase trains the
random forest classifier to estimate the probability that two
fingerprints belong to the same browser instance. To do so, we
split the input dataset introduced in Section III chronologically
into two sets: a training set and a test set. The training set is
composed of the first 40 % of fingerprints in our input dataset,
and the test set of the last 60 %. The random forest detects
fingerprint evolutions by computing the evolutions between
fingerprints as feature vectors. During the training phase, it
needs to learn about correct evolutions by computing relevant
feature vectors from the training set. Algorithm 3 describes
this training phase, which is split into two steps.
Algorithm 3 Compute input feature vectors for training
function BUILDTRAININGVECTORS(ID, F, δ, ν)
    T ← ∅
    for id ∈ ID do                              ▷ Step 1
        Fid ← BROWSERFINGERPRINTS(id, F)
        for ft ∈ Fid do
            T ← T ∪ FEATUREVECTOR(ft, ft−1)
        end for
    end for
    for f ∈ F do                                ▷ Step 2
        fr ← RANDOM(F)
        if f.id ≠ fr.id then
            T ← T ∪ FEATUREVECTOR(f, fr)
        end if
    end for
    return T
end function
In Step 1, for every browser instance (id) of the
training set, we compare each of its fingerprints (ft ∈
BROWSERFINGERPRINTS(id, F)) present in the training set
(F) with the previous one (ft−1). By doing so, FP-STALKER
captures the atomic evolutions that occur between two consec-
utive fingerprints from the same browser instance. We apply
BUILDTRAININGVECTORS() for different collect frequencies
(time difference between t and t−1) to teach our model to link
fingerprints even when they are not equally spaced in time.
While Step 1 teaches the random forest to identify
fingerprints that belong to the same browser instance, it is also
necessary to identify when they do not. Step 2 compares
fingerprints from different browser instances. Since the num-
ber of fingerprints from different browser instances is much
larger than the number of fingerprints from the same browser
instance, we limit the number of comparisons to one for
each fingerprint. This technique is called undersampling [16]
and it reduces overfitting by adjusting the ratio of input data
labeled as true (the two fingerprints belong to the same browser
instance) against the number of data labeled as false (the two
fingerprints are from different browser instances). Otherwise,
the algorithm would tend to simply predict false.
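A compact sketch of Algorithm 3 in Python follows; fingerprints_by_id is assumed to map a ground-truth browser id to its chronologically ordered fingerprints, and feature_vector is the helper sketched earlier.
import random

def build_training_vectors(fingerprints_by_id: dict):
    X, y = [], []
    all_fps = [fp for fps in fingerprints_by_id.values() for fp in fps]
    # Step 1: consecutive fingerprints of the same browser instance (positive examples)
    for fps in fingerprints_by_id.values():
        for prev, curr in zip(fps, fps[1:]):
            X.append(feature_vector(curr, prev))
            y.append(1)
    # Step 2: undersampled pairs from different browser instances (negative examples)
    for fp in all_fps:
        other = random.choice(all_fps)
        if other["browser_id"] != fp["browser_id"]:
            X.append(feature_vector(fp, other))
            y.append(0)
    return X, y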
c) Random forest hyperparameters: Concerning the
number of trees in the random forest, there is a tradeoff
between precision and execution time. Adding trees yields
better results but with diminishing returns, and it
increases training and prediction times. Our goal is to
balance precision and execution time. The number of features
plays a role during the tree induction process. At each split, Nf
features are randomly selected, among which the best split is
chosen [4]. Usually, its default value is set to the square root of
the length of the feature vector. The diff parameter enables the
classifier to avoid selecting browser instances with very similar
probabilities as the origin of the fingerprint; we would rather
create a new browser instance than choose the wrong one. It
is not directly related to the random forest hyperparameters but
rather to the specificities of our approach. In order to optimize
the hyperparameters number of trees and number of features,
as well as the diff parameter, we define several possible values
for each and run a grid search to optimize the accuracy. This
results in setting the hyperparameters to 10 trees and 3 features,
and the diff value to 0.10.
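A sketch of this search, assuming scikit-learn, is given below; the candidate values are illustrative, and the diff parameter, being specific to the linking step, would be tuned in an outer loop against linking accuracy rather than inside GridSearchCV.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {"n_estimators": [5, 10, 20, 50], "max_features": [2, 3, 4, 6]}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, scoring="accuracy", cv=3)
# search.fit(X_train, y_train)
# search.best_params_            # e.g. {'max_features': 3, 'n_estimators': 10}
diff_candidates = [0.05, 0.10, 0.15, 0.20]   # evaluated separately on linking accuracy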
After training our random forest classifier, we obtain a
forest of decision trees that predict the probability that two
fingerprints belong to the same browser instance. Figure 5 il-
lustrates the first three levels of one of the decision trees. These
levels rely on the languages, the number of changes
and the user agent to make a decision. If an attribute has a
value below its threshold, the decision path goes to the left
child node, otherwise it goes to the right child node. The
process is repeated until we reach a leaf of the tree. The
prediction corresponds to the class (same/different browser
instance) that has the most instances over all the leaf nodes.
[Fig. 5: First 3 levels of a single tree classifier from our forest.
The root splits on languages HTTP (threshold 0.978); the second
level tests the number of changes (thresholds 4.5 and 5.5); the
third level tests languages HTTP (0.759), number of changes (5.5),
and user agent HTTP (0.869 and 0.974).]
d) Lambda threshold parameter: For each browser
fingerprint in the test set, we compare it with its previ-
ous browser fingerprint and with another random fingerprint
from a different browser, and compute the probability that
it belongs to the same browser instance using our random
forest classifier with the parameters determined previously.
Using these probabilities and the true labels, we choose the λ
value that minimizes the false positive rate, while maximizing
the true positive rate. However, this configuration parameter
depends on the targeted application of browser fingerprinting.
For instance, if browser fingerprinting is used as a second-
tier security mechanism (e.g., to verify the user is connecting
from a known browser instance), we set λ to a high value.
This makes the algorithm more conservative, reducing the risk
of linking a fingerprint to an incorrect browser instance, but
it also increases false negatives and results in a reduction of
the duration the algorithm can effectively track a browser. On
the opposite end, a low λ value will increase the false positive
rate, in this case meaning it tends to link browser fingerprints
together even though they present differences. Such a use
case might be acceptable for constructing ad profiles, because
larger profiles are arguably more useful even if sometimes
contaminated with someone else's information. By applying
this approach, we obtained a λ threshold equal to 0.994.
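One way to pick such a threshold, sketched below under the assumption that scikit-learn is available, is to read it off an ROC curve; y_true holds the ground-truth same/different labels of the test pairs, y_prob the probabilities returned by the random forest, and the tolerated false positive rate (0.005 here) is an arbitrary illustration.
import numpy as np
from sklearn.metrics import roc_curve

def choose_lambda(y_true, y_prob, max_fpr=0.005):
    fpr, tpr, thresholds = roc_curve(y_true, y_prob)
    ok = fpr <= max_fpr        # thresholds whose false positive rate stays acceptable
    best = np.argmax(tpr[ok])  # among those, maximize the true positive rate
    return thresholds[ok][best]

# lam = choose_lambda(y_true, y_prob)   # the paper reports λ ≈ 0.994 on its data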
V. EMPIRICAL EVALUATION OF FP-STALKER
This section assesses FP-STALKER's capacity to i) cor-
rectly link fingerprints from the same browser instance, and
to ii) correctly predict when a fingerprint belongs to a browser
instance that has never been seen before. We show that both
variants of FP-STALKER are effective in linking fingerprints
and in distinguishing fingerprints from new browser instances.
However, the rule-based variant is faster while the hybrid
variant is more precise. Finally, we discuss the impact of
the collect frequency on fingerprinting effectiveness, and we
evaluate the execution times of both variants of FP-STALKER.
Figure 6 illustrates the linking and evaluation process. Our
database contains perfect tracking chains because of the unique
identifiers our extensions use to identify browser instances.
From there, we sample the database using different collection
frequencies and generate a test set that removes the identifiers,
resulting in a mix of fingerprints from different browsers.
The resulting test set is then run through FP-STALKER to
reconstruct the best browser instance chains possible.
A. Key Performance Metrics
To evaluate the performance of our algorithms and measure
how vulnerable users are to browser fingerprint tracking, we
consider several metrics that represent the capacity to keep
track of browser instances over time and to detect new browser
instances. This section presents these evaluation metrics, as
well as the related vocabulary. Figure 6 illustrates the different
metrics with a scenario.
A tracking chain is a list of fingerprints that have been
linked—i.e., fingerprints for which the linking algorithm as-
signed the same identifier. A chain may be composed of one or
more fingerprints. In case of a perfect linking algorithm, each
browser instance would have a unique tracking chain—i.e.,
all of its fingerprints are grouped together and are not mixed
with fingerprints from any other browser instances. However,
in reality, fingerprinting is a statistical attack and mistakes may
occur during the linking process, which means that:
1) Fingerprints from different browser instances may be
included in the same tracking chain.
2) Fingerprints from a given browser instance may be split
into different tracking chains.
The lower part of Figure 6 shows examples of these mistakes.
Chain 1 contains an incorrect fingerprint fpB1 from browser
B, and chain 3 and chain 4 contain fingerprints from
browser C that have not been correctly linked (i.e., fpC3
and fpC4 were not linked, leading to a split).
We present the tracking duration metric to evaluate the
capacity of an algorithm to track browser instances over time.
We define tracking duration as the period of time a link-
ing algorithm matches the fingerprints of a browser instance
within a single tracking chain. More specifically, the tracking
duration for a browser bi in a chain chaink is defined as
CollectFrequency × (#(bi ∈ chaink) − 1), where #(bi ∈ chaink) is the
number of fingerprints of bi in chaink. We subtract one
because we consider a browser instance to have been tracked,
by definition, from the second linked fingerprint onwards.
The average tracking duration for a browser instance bi
is the arithmetic mean of its tracking duration across all the
tracking chains the instance is present in. For example, in
Figure 6, the tracking duration of browser B in chain 1
is 0 × CollectFrequency, and the tracking duration in chain
2 is 1 × CollectFrequency, thus the average tracking duration
is 0.5 × CollectFrequency. In the same manner, the average
tracking duration of browser C is 1.5 × CollectFrequency.
The maximum tracking duration for a browser instance bi
is defined as the maximum tracking duration across all of the
tracking chains the browser instance is present in. In the case
of browser C, the maximum tracking duration occurred in
chain 3 and is equal to 2 × CollectFrequency.
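A minimal sketch of these two metrics for one browser instance, given the tracking chains produced by a linking algorithm; the chain and field representations are illustrative.
def tracking_durations(chains, browser_id, collect_frequency_days):
    # Tracking duration (in days) of browser_id in every chain that contains it.
    durations = []
    for chain in chains:                 # a chain is a list of fingerprints sharing an assigned id
        count = sum(fp["browser_id"] == browser_id for fp in chain)
        if count > 0:
            durations.append(collect_frequency_days * (count - 1))
    return durations

# average tracking duration: sum(durations) / len(durations)
# maximum tracking duration: max(durations)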
The Number of assigned ids represents the number of
different identifiers that have been assigned to a browser
instance by the linking algorithm. It can be seen as the number
of tracking chains in which a browser instance is present. For
each browser instance, a perfect linking algorithm would group
all of the browser’s fingerprints into a single chain. Hence,
each browser instance would have a number of assigned ids
of 1. Figure 6 shows an imperfect case where browser C
has been assigned 2 different ids (chain 3 and chain 4).
The ownership ratio reflects the capacity of an algorithm
to not link fingerprints from different browser instances. The
owner of a tracking chain chainkis defined as the browser
instance bithat has the most fingerprints in the chain. Thus,
we define ownership ratio as the number of fingerprints that
belong to the owner of the chain divided by the length of
the chain. For example, in chain 1, browser A owns the
chain with an ownership ratio of 4/5 because it has 4 out
of 5 of the fingerprints. In practice, an ownership ratio close to
1 means that a tracking profile is not polluted with information
from different browser instances.
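The ownership ratio can be computed as sketched below (illustrative field names): the owner is the browser instance contributing the most fingerprints to the chain.
from collections import Counter

def ownership_ratio(chain) -> float:
    counts = Counter(fp["browser_id"] for fp in chain)
    _owner, owner_count = counts.most_common(1)[0]   # browser instance with the most fingerprints
    return owner_count / len(chain)

# Example from Figure 6: a 5-fingerprint chain with 4 fingerprints
# from browser A has an ownership ratio of 4/5 = 0.8.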
Fig. 6: Overview of our evaluation process that allows testing
the algorithms using different simulated collection frequencies.
Algorithm 4 Eckersley fingerprint matching algorithm [8]
ALLOWED = {cookies, resolution, timezone, local}
function FINGERPRINTMATCHING(F, fu)
    candidates ← ∅
    for fk ∈ F do
        changes ← DIFF(fu, fk)
        if |changes| = 1 then
            candidates ← candidates ∪ ⟨fk, changes⟩
        end if
    end for
    if |candidates| = 1 then
        ⟨fk, a⟩ ← candidates[0]
        if a ∈ ALLOWED then
            return fk
        else if MATCHRATIO(fu(a), fk(a)) > 0.85 then
            return fk
        else
            return NULL
        end if
    end if
end function
MATCHRATIO refers to the Python standard library function
difflib.SequenceMatcher().ratio() for estimating the
similarity of strings.
B. Comparison with Panopticlick's linking algorithm
We compare FP-STALKER to the algorithm proposed
by Eckersley [8] in the context of the PANOPTICLICK
project. To the best of our knowledge, there are no other
algorithms to compare to. Although Eckersley's algorithm
has been characterized as "naive" by its author, we use
it as a baseline to compare our approach against. The PANOP-
TICLICK algorithm is summarized in Algorithm 4. It uses
the following 8 attributes: User agent, accept, cookies
enabled, screen resolution, timezone, plugins,
fonts and local storage. Given an unknown fingerprint
fu, PANOPTICLICK tries to match it to a previous fingerprint of
the same browser instance if a sufficiently similar one exists—
Fig. 7: Example of the process to generate a simulated test
set. The dataset contains fingerprints collected from browsers
A and B, which we sample at a collect frequency of 2
days to obtain a dataset that allows us to test the impact of
collect frequency on fingerprint tracking.
i.e., no more than one attribute changed. Otherwise, if it found
no similar fingerprints, or too many similar fingerprints that
belong to different browser instances, it assigns a new id.
Moreover, although at most one change is allowed, this change
can only occur among the following attributes: cookies,
resolution, timezone and local storage.
C. Dataset generation using fingerprint collect frequency
To evaluate the effectiveness of FP-STALKER, we start
from our test set of 59,159 fingerprints collected from
1,395 browser instances (60% of our input dataset, see Sec-
tion IV-C2b). However, we do not directly use this set. Instead,
by sampling the test set, we generate new datasets using a
configurable collect frequency. Because our input dataset is
fine-grained, it allows us to simulate the impact fingerprinting
frequency has on tracking. The intuition is that if a browser
is fingerprinted less often, it becomes harder to track.
To generate a dataset for a given collect frequency, we start
from the test set of 59,159 fingerprints and, for each browser
instance, we look at the collection date of its first fingerprint.
Then, we iterate in time with a step of collect frequency
days and recover the browser instance's fingerprint at time
t + collect frequency. It may be the same fingerprint as the
previous collect or a new one. We do this until we reach the
last fingerprint collected for that browser id. This allows us
to record the sequence of fingerprints a fingerprinter would
obtain if the browser instance was fingerprinted every
collect frequency days.
The advantage of sampling is that it is more realistic than using
all of the fingerprints from our database, since they are very
fine-grained. Indeed, the extension is capable of catching even
short-lived changes in the fingerprint (e.g., connecting an exter-
nal monitor), which is not always possible in the wild. Finally,
it allows us to investigate how fingerprint collection frequency
impacts browser tracking. Figure 7 provides an example of the
process to generate a dataset with a collect frequency of two
days. Table IV presents, for each simulated collect frequency,
the number of fingerprints in the generated test sets.
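The sampling procedure can be sketched as follows (field names are illustrative): for each browser instance we step through time by collect_frequency days and keep the fingerprint that was current at each simulated collect date.
from datetime import timedelta

def sample_at_frequency(fingerprints, collect_frequency_days):
    # fingerprints: chronologically ordered list of one browser instance's fingerprints
    sampled, t = [], fingerprints[0]["collect_time"]
    last = fingerprints[-1]["collect_time"]
    while t <= last:
        # most recent fingerprint collected at or before the simulated date t
        current = max((fp for fp in fingerprints if fp["collect_time"] <= t),
                      key=lambda fp: fp["collect_time"])
        sampled.append(current)
        t += timedelta(days=collect_frequency_days)
    return sampled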
The browser fingerprints in a generated test set are ordered
chronologically. At the beginning of our experiment, the set
of known fingerprints (F) is empty. At each iteration, FP-
STALKER tries to link an unknown fingerprint fu with one
of the fingerprints in F. If it can be linked to a fingerprint
TABLE IV: Number of fingerprints per generated test set after
simulating different collect frequencies
Collect frequency (days)   Number of fingerprints
1    171,735
2    86,225
3    57,695
4    43,441
5    34,916
6    29,195
7    25,155
8    22,065
10   17,814
15   12,100
20   9,259
[Plot: average tracking duration in days (20–50) against collect frequency in days (2.5–20) for Panopticlick, Rule-based, and Hybrid.]
Fig. 8: Average tracking duration against simulated collect
frequency for the three algorithms
fk, then FP-STALKER assigns the id fk.id to fu, otherwise
it assigns a new id. In both cases, fu is added to F. The
chronological order of the fingerprints implies that, at time
t, a browser fingerprint can only be linked with a former
fingerprint collected at a time t' < t. This approach ensures
a more realistic scenario, similar to online fingerprint tracking
approaches, than if we allowed fingerprints from the past to
be linked with fingerprints collected in the future.
D. Tracking duration
Figure 8 plots the average tracking duration against the
collect frequency for the three algorithms. On average, browser
instances from the test set were present for 109 days, which
corresponds to the maximum value our linking algorithm could
potentially achieve. We see that the hybrid variant of FP-
STALKER is able to keep track of browser instances for a
longer period of time than the two other algorithms. In the
case where a browser gets fingerprinted every three days,
FP-STALKER can track it for 51.8 days, on average. More
generally, the hybrid variant of FP-STALKER has an average
tracking duration of about 9 days more than the rule-based
variant and 15 days more than the Panopticlick algorithm.
Figure 9 presents the average maximum tracking duration
against the collect frequency for the three algorithms. We
see that the hybrid algorithm still outperforms the two other
algorithms because it constructs longer tracking chains
with fewer mistakes. On average, the maximum tracking
duration for FP-STALKER's hybrid version is in the order of
74 days, meaning that users were generally tracked for at most
this duration.
Fig. 9: Average maximum tracking duration (days) against simulated collect frequency (days) for the three algorithms (Panopticlick, Rule-based, Hybrid). This shows averages of the longest tracking durations that were constructed.
Fig. 10: Average number of assigned ids per browser instance against simulated collect frequency (days) for the three algorithms (Panopticlick, Rule-based, Hybrid); lower is better.
Figure 10 shows the number of ids each algorithm assigned, on average, per browser instance. We see that Panopticlick's algorithm often assigns new browser ids, which is caused by its conservative nature. Indeed, as soon as there is more than one change, or multiple candidates for linking, Panopticlick's algorithm assigns a new id to the unknown browser instance. In contrast, we observe that FP-STALKER's hybrid and rule-based variants perform similarly.
Finally, Figure 11 presents the average ownership of tracking chains against the collect frequency for the three algorithms. We see that, despite its conservative nature, Panopticlick's ownership is 0.94, which means that, on average, 6% of a tracking chain is made up of fingerprints that do not belong to the browser instance that owns the chain (i.e., it is contaminated with fingerprints from other browser instances). The hybrid variant of FP-STALKER has an average ownership of 0.985, against 0.977 for the rule-based variant.
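For clarity, the ownership of a single chain can be computed as in the following sketch, assuming the owner of a chain is the browser instance contributing the most fingerprints to it.

from collections import Counter

def ownership(chain_true_ids):
    # chain_true_ids: the real browser id of each fingerprint in one tracking chain
    counts = Counter(chain_true_ids)
    owner_id, owner_count = counts.most_common(1)[0]
    return owner_count / len(chain_true_ids)

# e.g., ownership(["A", "A", "A", "B"]) == 0.75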
Fig. 11: Average ownership (higher is better) of tracking chains against simulated collect frequency (days) for the three algorithms (Panopticlick, Rule-based, Hybrid). A value of 1 means the tracking chain is constructed perfectly.

Fig. 12: CDF of average and maximum tracking duration for a collect frequency of 7 days (FP-STALKER hybrid variant only).

When it comes to linking browser fingerprints, FP-STALKER's hybrid variant is better than, or as good as, the rule-based variant. The next paragraphs focus on a few more results we obtain with the hybrid algorithm. Figure 12 presents the cumulative distribution of the average and maximum tracking durations when collect_frequency equals 7 days for the hybrid variant. We observe that, on average, 15.5% of the browser instances are tracked for more than 100 days. When it comes to the longest tracking chains, we observe that more than 26% of the browser instances have been tracked at least once for more than 100 days during the experiment. These numbers show
how tracking may depend on the browser and its configuration.
Indeed, while some browsers are never tracked for a long
period of time, others may be tracked for multiple months. This is also explained by how long browser instances were present in our experiments: few browser instances were present for the whole experiment, most for a few weeks, and at best we can track a browser instance only as long as it was present. The graph also shows the results of the perfect linking algorithm, which can be interpreted as the distribution of the duration of presence of browser instances in our test set.
The boxplot in Figure 13 depicts the number of ids generated by the hybrid algorithm for a collect frequency of 7 days. It shows that half of the browser instances have been assigned 2 identifiers, which corresponds to a single linking mistake, and more than 90% have fewer than 9 identifiers.
Finally, we also look at the distribution of the chains to see how often fingerprints from different browser instances are mixed together. For the FP-STALKER hybrid variant, more than 95% of the chains have an ownership above 0.8, and more than 90% have a perfect ownership of 1. This shows that only a small percentage of browser instances become highly mixed in the chains, while the majority of browser instances are properly linked into clean and relatively long tracking chains.
Fig. 13: Distribution of the number of ids per browser instance for a collect frequency of 7 days (FP-STALKER hybrid variant only).
E. Benchmark/Overhead
This section presents a benchmark that evaluates the performance of FP-STALKER's hybrid and rule-based variants. We start by providing more details about our implementation, then we explain the protocol used for this benchmark, demonstrate that our approach can scale, and show how our two variants behave when the number of browser instances increases.
a) The implementations: of FP-STALKER used for this benchmark are developed in Python, and the implementation of the random forest comes from the Scikit-Learn library. In order to study the scalability of our approach, we parallelized the linking algorithm to run on multiple nodes. A master node is responsible for receiving linkability requests; it sends the unknown fingerprint to match, fu, to slave nodes that compare fu with all of the fk present in their process. Each slave node then sends back its set of candidates, associated either with a probability in the case of the hybrid algorithm, or with the number of changes in the case of the rule-based version. Finally, the master node takes the final decision according to the policy defined by the rule-based or hybrid algorithm. After the decision is made, it sends a message to each node to announce whether or not they should keep fu in their local memory. In this benchmark, we do not implement an optimization for exact matching; normally, the master node should hold a list of the exact matches associated with their ids.
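The following sketch illustrates this master/slave organization with a local process pool standing in for distributed nodes; compare and decide are placeholders for the variant-specific scoring and decision policy, not our exact interfaces.

from concurrent.futures import ProcessPoolExecutor

def compare(fu, fk):
    # placeholder score: number of attributes that differ between two fingerprints,
    # assuming fingerprints are dicts of attribute -> value
    return sum(1 for attr in fu if fu.get(attr) != fk.get(attr))

def candidates_on_shard(fu, shard):
    # a slave compares fu with the known fingerprints it holds locally
    return [(fk, compare(fu, fk)) for fk in shard]

def link_parallel(fu, shards, decide):
    # the master fans the unknown fingerprint out to every shard, gathers the
    # local candidates, then applies the variant's decision policy
    with ProcessPoolExecutor() as pool:
        parts = pool.map(candidates_on_shard, [fu] * len(shards), shards)
    candidates = [c for part in parts for c in part]
    return decide(fu, candidates)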
b) The experimental protocol: aims to study scalability. We evaluate our approach on a standard Azure cloud instance. We generate fake browser fingerprints to increase the test set size. Thus, this part does not evaluate the previous metrics, such as tracking duration, but only the execution time required to link synthetic browser fingerprints, as well as how well the approach scales across multiple processes.

Fig. 14: Speedup of average execution time against number of processes for FP-STALKER's hybrid variant.

The first step of the benchmark is to generate fake fingerprints from real ones. The generation process consists of taking a real fingerprint from our database and applying random changes to the canvas and timezone attributes. We apply only two random changes so that the generated fingerprints are unique, yet do not differ so much that they would be discarded early, which would reduce the number of comparisons. This point is important because our algorithms include heuristics related to the number of differences; by applying a small number of random changes, we do not discard the fk fingerprints, making this the worst-case scenario for testing execution times. Regarding the browser ids, we assign two generated fingerprints to each browser instance. It would not have been useful to generate more fingerprints per browser instance since we compare an unknown fingerprint only with the last 2 fingerprints of each browser instance. Then, the master node creates n slave processes and sends the generated fingerprints to them. The fingerprints are spread evenly over the processes.
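A minimal sketch of this generation step is shown below; fingerprints are assumed to be dictionaries of attribute names to values, and the id scheme follows the two-fingerprints-per-instance rule described above.

import copy
import random
import uuid

def generate_fake_fingerprints(real_fingerprints, n):
    # Randomize only the canvas and timezone of real fingerprints so that the
    # fakes are unique yet close enough to real ones to stay in the candidate
    # set (the worst case for execution time).
    fakes = []
    for i in range(n):
        fp = copy.deepcopy(random.choice(real_fingerprints))
        fp["canvas"] = uuid.uuid4().hex                               # random canvas hash
        fp["timezone"] = str(random.choice([-720, -360, -120, 0, 60, 120, 540]))
        fp["browser_id"] = "synthetic-%d" % (i // 2)                  # two fingerprints per instance
        fakes.append(fp)
    return fakes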
Once the fingerprints are stored in the slave processes' memory, we start our benchmark. We take 100 real fingerprints and try to link them with our generated fingerprints. For each fingerprint, we measure the execution time of the linking process, while varying:
1) the number of fingerprints and browser instances; and
2) the number of processes spawned.
We execute our benchmark on a Standard D16 v3 Azure instance with 16 virtual processors and 64 GB of RAM, which has an associated cost of $576 USD per month.
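The measurement itself boils down to timing each linkability request, as in the following sketch; link_parallel and decide refer to the hypothetical helpers from the earlier sketch.

import statistics
import time

def benchmark(real_fingerprints, shards, decide):
    # time the linking of 100 real fingerprints against the generated ones
    timings = []
    for fu in real_fingerprints[:100]:
        start = time.perf_counter()
        link_parallel(fu, shards, decide)
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)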
Figure 14 shows the execution time speedup, in percentage, against the number of processes for the hybrid approach. We see that as the number of processes increases, we obtain a speedup in execution time: going from 1 to 8 processes provides a speedup of more than 80%. Figure 15 shows the execution time to link a fingerprint against the number of browser fingerprints for FP-STALKER's hybrid and rule-based variants, using 16 processes. The better tracking duration of the hybrid variant (see Section V-D) is obtained at the cost of execution speed. Indeed, for any given number of processes and browser instances, the rule-based variant links fingerprints about 5 times faster. That said, the results show that the hybrid variant links fingerprints relatively quickly.
However, the raw execution times should not be used
directly. The algorithm was implemented in Python, whose pri-
mary focus is not performance. Moreover, although we scaled
by adding processes, it is possible to scale further by splitting
the linking process (e.g., depending on the combination of OS
and browser, send the fingerprint to more specialized nodes).
In our current implementation, if an unknown fingerprint from a Chrome browser on Linux is being matched, it is compared against fingerprints from Firefox on Windows, causing us to wait even though they have no chance of being linked.
By adopting a hierarchical structure where nodes or processes
are split depending on their OS and browser, it is possible to
increase the throughput of our approach.
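Such a hierarchical split could look like the following sketch, where known fingerprints are sharded by (OS, browser) so that an unknown fingerprint is only compared against plausible candidates; the attribute names are assumptions.

from collections import defaultdict

def routing_key(fp):
    # assumed attribute names for the OS and browser family
    return (fp["os"], fp["browser"])

def build_routing_table(known_fingerprints):
    table = defaultdict(list)
    for fk in known_fingerprints:
        table[routing_key(fk)].append(fk)
    return table

def candidates_for(fu, table):
    # a Chrome-on-Linux fingerprint is never compared to Firefox-on-Windows ones
    return table[routing_key(fu)]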
Fig. 15: Execution times for the FP-STALKER hybrid and rule-based variants to link a fingerprint using 16 processes, for test sets of 500k, 1M, and 2M fingerprints. Time depends on the size of the test set. The increased effectiveness of the hybrid variant comes at the cost of slower execution times.

Furthermore, the importance of raw execution speed depends highly on the use case. In the case where fingerprinting is used as a way to regenerate cookies (e.g., for advertising), a fingerprint only needs to be linked when the cookie is missing or has been erased, a much less frequent event. Another use case is using browser fingerprinting as a way to enhance authentication [3]. In this case, one only needs to match the fingerprint of the browser attempting to sign in with the previous fingerprints from the same user, drastically reducing the number of comparisons.
F. Threats to Validity
First, the results we report in this work depend on the representativeness of our browser fingerprint dataset. We developed
extensions for Chrome and Firefox, the two most popular web
browsers, and distributed them through standard channels. This
does provide long-term data, and mitigates the possible bias that would arise had we chosen a user population ourselves, but it is possible that the people interested in our extension are not representative of the average Web user.
Second, there is a reliability threat due to the difficulty in
replicating the experiments. Unfortunately, this is inherent to
scientific endeavors in the area of privacy: these works must
analyze personal data (browser fingerprints in our case) and the
data cannot be publicly shared. Yet, the code to split the data,
generate input data, train the algorithm, as well as evaluate it,
is publicly available online on GitHub5.
Finally, a possible internal threat lies in our experimental
framework. We did extensive testing of our machine learning
algorithms, and checked classification results as thoroughly as
possible. We took care to split the data and generate a scenario close to what would happen in a web application. However, as for any large-scale experimental infrastructure, there are surely bugs in this software. We hope that they only affect marginal quantitative results, and not the qualitative essence of our findings.
G. Discussion
This paper studies browser fingerprint linking in isolation,
which is its worst-case scenario. In practice, browser finger-
printing is often combined with stateful tracking techniques
5 https://github.com/Spirals-Team/FPStalker
(e.g., cookies, ETags) to respawn stateful identifiers [1]. In such
cases, fingerprint linking is performed much less frequently
since most of the time a cookie is sufficient and inexpensive
to track users. Our work shows that browser fingerprinting can
provide an efficient solution to extend the lifespan of cookies,
which are increasingly being deleted by privacy-aware users.
Browser vendors and users would do well to minimize the
differences that are so easily exploited by fingerprinters. Our
results show that some browser instances have highly trackable
fingerprints, to the point that very infrequent fingerprinting is
quite effective. In contrast, other browser instances appear to
be untrackable using the attributes we collect. Vendors should
work to minimize the attack surfaces exploited by fingerprint-
ers, and users should avoid customizing their browsers in ways
that make them expose unique and linkable fingerprints.
Depending on the objectives, browser fingerprint linking
can be tuned to be more conservative to avoid false positives (e.g., for second-tier security purposes), or more permissive (e.g., for ad tracking). Tuning could also be influenced by how effective other tracking techniques are. For example, it could be tuned very conservatively and simply serve to extend cookie tracking in cases where privacy-aware users, who are in our opinion more likely to have customized (i.e., unique and linkable) browser configurations, delete their cookies.
VI. CONCLUSION
In this paper, we investigated browser fingerprint evolution
and proposed FP-STALKER as an approach to link fingerprint
changes over time. We address the problem with two variants
of FP-STALKER. The first one builds on a ruleset identified
from an analysis of grounded programmer knowledge. The
second variant combines the most discriminating rules by
leveraging machine learning to sort out the more subtle ones.
We trained the FP-STALKER hybrid variant with a training set of fingerprints that we collected for 2 years through browser extensions installed by 1,905 volunteers. By analyzing the feature importances of our random forest, we identified the number of changes, the languages, and the user agent as the three most important features.
We ran FP-STALKER on our test set to assess its capacity to link fingerprints, as well as to detect new browser instances. Our experiments demonstrate that the hybrid variant can correctly link fingerprint evolutions from a given browser instance for 54.48 consecutive days on average, against 42.3 days for the rule-based variant. When it comes to the maximum tracking duration, with the hybrid variant, more than 26% of the browsers can be tracked for more than 100 days.
Regarding the usability of FP-STALKER, we measure the average execution time to link an unknown fingerprint as the number of known fingerprints grows. We show that both our rule-based and hybrid variants scale horizontally.
ACKNOWLEDGMENT
We would like to thank the users of the AmIUnique
extensions, whose contributions were essential to this study.
We also want to thank our shepherd, Davide Balzarotti, and
the anonymous reviewers for their valuable comments and
feedback. Finally, this work would not have been possible
without our long-term collaboration with Benoit Baudry.
REFERENCES
[1] G. Acar, C. Eubank, S. Englehardt, M. Juarez, A. Narayanan, and
C. Diaz, “The Web Never Forgets: Persistent Tracking Mechanisms
in the Wild,” Proceedings of the 2014 ACM SIGSAC Conference on
Computer and Communications Security - CCS ’14, pp. 674–689, 2014.
[2] G. Acar, M. Juarez, N. Nikiforakis, C. Diaz, S. Gürses, F. Piessens,
and B. Preneel, “FPDetective,” Proceedings of the 2013 ACM SIGSAC
conference on Computer & communications security - CCS ’13, pp.
1129–1140, 2013. [Online]. Available: http://dl.acm.org/citation.cfm?
doid=2508859.2516674
[3] F. Alaca and P. C. van Oorschot, “Device Fingerprinting for Augmenting Web Authentication: Classification and Analysis of Methods,” 32nd Annual Computer Security Applications Conference (ACSAC 2016), 2016. [Online]. Available: http://people.scs.carleton.ca/paulv/papers/acsac2016-device-fingerprinting.pdf
[4] S. Bernard, L. Heutte, and S. Adam, “Influence of hyperparameters on
random forest accuracy,” Proc. Int. Workshop Multiple Classifier Syst.,
vol. 5519 LNCS, pp. 171–180, 2009.
[5] K. Boda, Á. M. Földes, G. G. Gulyás, and S. Imre, “User tracking
on the web via cross-browser fingerprinting,” in Nordic Conference on
Secure IT Systems. Springer, 2011, pp. 31–46.
[6] L. Breiman, “Random Forests,” Machine Learning, vol. 45, no. 1,
pp. 5–32, 2001. [Online]. Available: http://dx.doi.org/10.1023/A:
1010933404324
[7] Y. Cao, “(Cross-) Browser Fingerprinting via OS and Hardware Level
Features,” 24th Annual Network and Distributed System Security Symposium, NDSS ’17, 2017.
[8] P. Eckersley, “How unique is your web browser?” Proceedings of the
10th Privacy Enhancing Technologies Symposium (PETS), 2010.
[9] S. Englehardt and A. Narayanan, “Online Tracking: A 1-million-site
Measurement and Analysis,” Proceedings of the 2016 ACM SIGSAC
Conference on Computer and Communications Security - CCS’16, no. 1,
pp. 1388–1401, 2016.
[10] D. Fifield and S. Egelman, “Fingerprinting web users through font
metrics,” Financial Cryptography and Data Security, vol. 8975, pp.
107–124, 2015.
[11] T. Hupperich, D. Maiorca, M. Kührer, T. Holz, and G. Giacinto,
“On the Robustness of Mobile Device Fingerprinting,” Proceedings
of the 31st Annual Computer Security Applications Conference,
pp. 191–200, 2015. [Online]. Available: http://dl.acm.org/citation.cfm?
doid=2818000.2818032
[12] P. Laperdrix, B. Baudry, and V. Mishra, “FPRandom: Randomizing
core browser objects to break advanced device fingerprinting
techniques,” Proceedings - 9th Int. Symposium on Engineering Secure
Software and Systems, ESSoS 2017, Jul. 2017. [Online]. Available:
https://hal.inria.fr/hal-01527580
[13] P. Laperdrix, W. Rudametkin, and B. Baudry, “Mitigating Browser
Fingerprint Tracking: Multi-level Reconfiguration and Diversification,”
Proceedings - 10th Int. Symposium on Software Engineering for
Adaptive and Self-Managing Systems, SEAMS 2015, pp. 98–108, 2015.
[Online]. Available: https://hal.inria.fr/hal-01121108
[14] ——, “Beauty and the Beast: Diverting Modern Web Browsers
to Build Unique Browser Fingerprints,” Proceedings - 2016 IEEE
Symposium on Security and Privacy, SP 2016, pp. 878–894, 2016.
[Online]. Available: https://hal.inria.fr/hal-01285470
[15] G. Louppe, L. Wehenkel, A. Sutera, and P. Geurts,
“Understanding variable importances in forests of randomized
trees,” Advances in Neural Information Processing Systems 26,
pp. 431–439, 2013. [Online]. Available: http://papers.nips.cc/paper/4928-understanding-variable-importances-in-forests-of-randomized-trees.pdf
[16] O. Loyola-González, M. García-Borroto, M. A. Medina-Pérez, J. F. Martínez-Trinidad, J. A. Carrasco-Ochoa, and G. De Ita, An
Empirical Study of Oversampling and Undersampling Methods for
LCMine an Emerging Pattern Based Classifier. Berlin, Heidelberg:
Springer Berlin Heidelberg, 2013, pp. 264–273. [Online]. Available:
http://dx.doi.org/10.1007/978-3-642-38989-4_27
[17] J. R. Mayer, “Internet Anonymity in the Age of Web 2.0,” A Senior
Thesis presented to the Faculty of the Woodrow Wilson School of
Public and International Affairs in partial fulfillment of the requirements for the degree of Bachelor of Arts, p. 103, 2009. [Online]. Available: https://jonathanmayer.org/papers_data/thesis09.pdf
[18] K. Mowery, D. Bogenreif, S. Yilek, and H. Shacham, “Fingerprinting
Information in JavaScript Implementations,” Web 2.0 Security &
Privacy, pp. 1–11, 2011. [Online]. Available: http://cseweb.ucsd.edu/~hovav/papers/mbys11.html
[19] K. Mowery and H. Shacham, “Pixel Perfect: Fingerprinting Canvas in HTML5,” Web 2.0 Security & Privacy (W2SP), pp. 1–12, 2012.
[20] M. Mulazzani, P. Reschl, and M. Huber, “Fast and Reliable Browser
Identification with JavaScript Engine Fingerprinting,” Proceedings
of W2SP, 2013. [Online]. Available: http://www.sba-research.org/
wp-content/uploads/publications/jsfingerprinting.pdf
[21] N. Nikiforakis, W. Joosen, and B. Livshits, “PriVaricator,” Proceedings
of the 24th International Conference on World Wide Web - WWW ’15,
pp. 820–830, 2015. [Online]. Available: http://dl.acm.org/citation.cfm?
doid=2736277.2741090
[22] N. Nikiforakis, A. Kapravelos, W. Joosen, C. Kruegel, F. Piessens, and
G. Vigna, “Cookieless monster: Exploring the ecosystem of web-based
device fingerprinting,” Proceedings - IEEE Symposium on Security and
Privacy, pp. 541–555, 2013.
[23] M. Perry, E. Clark, and S. Murdoch, “The Design and Implementation
of the Tor Browser,” Tech. Rep., May 2015, https://www.torproject.org/
projects/torbrowser/design.
[24] C. F. Torres, H. Jonker, and S. Mauw, “FP-block: Usable web privacy
by controlling browser fingerprinting,” ESORICS, 2015, vol. 9327, no.
October, pp. 3–19, 2015.
[25] W. Wu, J. Wu, Y. Wang, Z. Ling, and M. Yang, “Effi-
cient Fingerprinting-Based Android Device Identification With Zero-
Permission Identifiers,” vol. 4, pp. 8073–8083, 2016.
[26] T.-F. Yen, Y. Xie, F. Yu, R. P. Yu, and M. Abadi, “Host Fingerprinting
and Tracking on the Web: Privacy and Security Implications,” Network
and Distributed System Security Symposium, pp. 1–16, 2012.