Talon: An Automated Framework for Cross-Device Tracking Detection

Konstantinos Solomos

FORTH, Greece

solomos@ics.forth.gr

Panagiotis Ilia

Univ. of Illinois at Chicago, USA

pilia@uic.edu

Sotiris Ioannidis

FORTH, Greece

sotiris@ics.forth.gr

Nicolas Kourtellis

Telefonica Research, Spain

nicolas.kourtellis@telefonica.com

Abstract

Although digital advertising fuels much of today’s free Web,

it typically does so at the cost of online users’ privacy, due to

the continuous tracking and leakage of users’ personal data.

In search of new ways to optimize the effectiveness of ads,

advertisers have introduced new advanced paradigms such

as cross-device tracking (CDT), to monitor users’ browsing

on multiple devices and screens, and deliver (re)targeted ads

on the most appropriate screen. Unfortunately, this practice

leads to greater privacy concerns for the end-user.

Going beyond the state-of-the-art, we propose a novel

methodology for detecting CDT and measuring the factors

affecting its performance, in a repeatable and systematic

way. This new methodology is based on emulating realistic

browsing activity of end-users, from different devices, and

thus triggering and detecting cross-device targeted ads. We

design and build Talon¹, a CDT measurement framework

that implements our methodology and allows experimenta-

tion with multiple parallel devices, experimental setups and

settings. By employing Talon, we perform several critical

experiments, and we are able to not only detect and measure

CDT with average AUC score of 0.78-0.96, but also to pro-

vide significant insights about the behavior of CDT entities

and the impact on users’ privacy. In the hands of privacy

researchers, policy makers and end-users, Talon can be an

invaluable tool for raising awareness and increasing trans-

parency on tracking practices used by the ad-ecosystem.

1 Introduction

Online advertising has become a driving force of the econ-

omy, with digital ad spending already surpassing the spend-

ing for TV-based advertising in 2017 [46], and expected to

reach $327 billion in 2019 [61]. This is because online

advertising can be easily tailored to, and targeted at, specific audiences.

In order to personalize ads, advertisers employ various track-

ing practices to collect user behavioral and browsing data.

¹ https://en.wikipedia.org/wiki/Talos

Figure 1: High level representation of cross-device tracking. (Diagram labels: Ad-Ecosystem, User Info, Ads.)

Until recently, the tracking of a user was confined to the

physical boundary of each one of her devices. However,

as users typically own multiple devices [4, 80], advertisers

have started employing advanced targeting practices specif-

ically designed to track and target users across all their de-

vices. These efforts indicate a radical shift of the ad-targeting

paradigm, from device-centric to user-centric. In this new

paradigm, an advertiser tries to identify which devices (e.g.,

smartphone, tablet, laptop) belong to the same user, and then

target her across all devices with ads related to her overall

online behavior. Figure 1 illustrates a typical cross-device

tracking (CDT) scenario, where a user is targeted with rele-

vant ads in her second device (desktop), due to the behavior

exhibited to the ad-ecosystem from her first device (mobile).

A recent FTC Staff Report [74] states that CDT can be de-

terministic or probabilistic, and companies engaging in such

practices typically use a mixture of both techniques. Deter-

ministic tracking utilizes 1st-party login services that require

user authentication (e.g., Facebook, Twitter, Gmail). These

1st-party services often share information (e.g., a unique

identifier) with 3rd-parties, enabling them to perform a more

effective CDT. In the case of probabilistic CDT, there are

no shared identifiers between the users’ devices, and 3rd-

parties attempt to identify which devices belong to the same

user by considering network access data, common behav-

ioral patterns in browsing history, etc. In fact, to understand

the degree to which CDT trackers appear on the Web, we

arXiv:1812.11393v5 [cs.CR] 31 Jul 2019

measured their frequency of appearance on Alexa Top-10k

websites: companies performing probabilistic CDT can be

found in ∼27% of the websites, and when also considering

deterministic CDT, this coverage reaches ∼80%. Also, sev-

eral advertising companies such as Criteo [26], Tapad [79],

Drawbridge [32] etc., claim that they can track users across

devices with very high accuracy (e.g., Drawbridge’s Cross-

Device connected consumer graph is 97.3% accurate [31]).

In spite of its big impact on user privacy, apart from some

empirical evidence about CDT, there is only limited work

investigating it. In the work closest to ours, Zimmeck

et al. [88] designed an algorithm that correlates mobile and

desktop devices into pairs by considering devices’ browsing

history and IP addresses. While this approach shows that

correlation of devices is possible when such data are avail-

able, it does not provide an approach for detecting and mea-

suring CDT. In fact, to the best of our knowledge, there is no

existing approach to audit the probabilistic CDT ecosystem

and the factors that impact its performance on the Web. Our

work is the first to propose a novel methodology that enables

auditing the CDT ecosystem in an automated and systematic

way. In effect, our work takes the first and crucial step in un-

derstanding the inner workings of the CDT mechanics and

measuring different parameters that affect how it performs.

The methodology proposed in this work is based on the

following idea: we want to detect when CDT trackers suc-

cessfully correlate a user’s devices, by identifying cross-

device targeted behavioral ads they send, i.e., ads that are de-

livered on one device, but have been triggered because of the

user’s browsing on a different device. In order to design this

methodology, we first study browsing data of real users with

multiple devices from [88] and extract topics of interest and

other user behavioral patterns. Then, to make trackers cor-

relate the different devices of the end-user and serve cross-

device targeted ads, we employ artificially created personas

with specific interests, to emulate realistic browsing activity

across the user devices as extracted from the real data.

We build Talon, a novel framework that materializes our

methodology in order to collect, categorize and analyze all

the ads delivered to the different user devices, and evaluate

with simple and advanced statistical methods the potential

existence of CDT. Through a variety of experiments we are

able to measure CDT with an average AUC of 0.78-0.96.

Specifically, in the simplest experiment, where the user ex-

hibits significant browsing activity mainly from the mobile

device, the average value of AUC is 0.78 for the 10 different

behavioral profiles used. When the user exhibits significant

browsing activity from both devices (mobile and desktop),

with a matching behavioral profile, we observe CDT with

an average AUC of 0.83. In the case of visiting specifically

chosen websites that employ multiple known CDT trackers,

we achieve an AUC score of 0.96. We also find that browsing

in incognito mode can reduce the effect of CDT, but does not

eliminate it, as trackers can perform device matching based

only on the current browsing session of the user, and not all

her browsing history. Finally, we compare the data collected

with our real user-driven artificial personas (such as CDT

trackers found, types of ads detected, etc.) with correspond-

ing distributions observed in the real user data from [88],

offering a strong validation to the realistic design of Talon.

Overall, our main contributions in this work are:

• Design a novel, real data-driven methodology for detecting

CDT by triggering behavioral cross-device targeted

ads on one user device, according to specifically-crafted

emulated personas, and then detecting those ads when

delivered on a different device of the same user.

• Implement Talon, a practical framework for CDT measurements.

Talon has been designed to provide scalability

for fast deployment of multiple parallel device instances,

to support various experimental setups, and to

be easily extensible.

• Conduct a set of experiments for measuring the potential

existence of CDT in different types of emulated users,

with an average AUC score of 0.78-0.96, and investigate

the various factors that affect its performance under dif-

ferent classes of experimental setups and configurations.

2 Background & Related Work

In this section, we provide the necessary terminology to un-

derstand the technical contributions of our work, and in par-

allel we present various mechanisms and technologies pro-

posed in related works.

2.1 Personalized Targeted Advertising

As the purpose of advertising is to increase market share, the

advertising industry continuously develops new mechanisms

to deliver more effective ads. These mechanisms involve the

delivery of contextual ads, targeted behavioral ads, and also

retargeted ads.

Contextual advertising refers to the delivery of ads rel-

evant to the content of the publishing website. With re-

gards to the effectiveness of contextual advertisement, Chun

et al. [24] found that it enhances brand recognition and that

users tend to have favourable attitudes towards it. In one

of the first works in this area, Broder et al. [16] proposed

an approach for classifying ads and web pages into a broad

taxonomy of topics, and then matching web pages with se-

mantically relevant ads. A large body of work also investi-

gates targeted behavioral advertising with regards to differ-

ent levels of personalization, based on the type of informa-

tion that is used to target the user [14, 9, 84, 69, 66], and

its effectiveness [86, 38, 54, 39]. Interestingly, Aguirre et

al. [9] found that, while highly personalized ads are more

relevant to users, they increase users’ sense of vulnerabil-

ity. In another study, Dolin et al. [30] measured users’

comfort regarding personalized advertisement. In a differ-

ent direction of investigation, Carrascosa et al. [21] devel-

oped a methodology that employs artificially-created behav-

ioral profiles (i.e., personas) for detecting behavioral targeted

advertising at scale. Their methodology could distinguish

interest-based targeting from other forms of advertising such

as retargeting. An extensive review of the literature about

behavioral advertising can be found in [15].

2.2 Leakage of Personal Information

In order to serve highly targeted ads, advertisers employ vari-

ous, often questionable and privacy intrusive, techniques for

collecting and inferring users’ personal information. They

typically employ techniques for tracking users’ visits across

different websites, which allow them to reconstruct parts of

the users’ browsing history. Numerous works investigate the

various approaches employed by trackers, and focus on pro-

tecting users’ privacy.

In a recent work, Papadopoulos et al. [70] developed a

methodology that enables users to estimate the actual price

advertisers pay for serving them ads. The range of these

prices can indicate which personal information of the user is

exposed to the advertiser and the sensitivity of this informa-

tion. Liu et al. [55] proposed AdReveal, a tool for character-

izing ads, and found that advertisers frequently target users

based on their interests and browsing behavior. Lecuyer et

al. [51] proposed XRay, a data tracking system that allows

users to identify which data is being used for targeting, by

comparing outputs from different accounts. In another work,

they propose Sunlight [52], a system that employs method-

ologies from statistics and machine learning to detect target-

ing at large scale.

Bashir et al. [12] developed a methodology that detects in-

formation flows between ad-exchanges. This approach lever-

ages retargeted ads, in order to detect when ad-exchanges

share the user’s information between them, for tracking and

retargeting the user. Datta et al. [27] developed AdFisher, a

tool that explores causal connections between users’ brows-

ing activities, their ad settings and the ads they receive, and

found cases of discriminatory ads. This tool uses machine

learning to determine, based on the ads received, if the user

belongs to a group of users that exhibit a specific browsing

behavior i.e., visited specific websites that affected their be-

havioral profile. Castelluccia et al. [83] showed that targeted

ads contain information that enable reconstruction of users’

behavioral profiles, and that user’s personal information can

be revealed to any party that has access to the ads received by

the user.

In order to enable ad targeting without compromising user

privacy, Toubiana et al. [82] and Guha et al. [43] proposed

Adnostic and Privad, respectively. These two approaches

try to protect users’ privacy by keeping user profiles on the

client-side and thus, hiding user activities and interests from

the ad-network. Furthermore, in an attempt to provide a better

alternative, Parra-Arnau et al. [71] proposed a tool that

allows users to control which information can be used for

the purpose of advertising.

Furthermore, many works investigate privacy leakage,

specifically, in mobile devices and the different factors in-

fluencing mobile advertising [68, 81, 75, 42, 62]. A recent

study by Papadopoulos et al. [68] compared privacy leakage

when visiting mobile websites and using mobile apps. Meng

et al. [62] studied the accuracy of personalized ads served by

mobile applications based on the information collected by

the ad-networks. Also, Razaghpanah et al. [75] developed

a technique that detects third-party advertising and tracking

services in the mobile ecosystem and uncovers unknown re-

lationships between these services.

2.3 Web Tracking

As mentioned previously, various techniques are employed

for tracking and correlating users’ activities across different

websites. Many works investigated stateful tracking tech-

niques [77, 65, 35, 87, 53], and also stateless techniques such

as browser fingerprinting [34, 6, 5, 64, 63, 67]. One of the

first studies about tracking [59], investigated which informa-

tion is collected by third parties and how users can be identi-

fied. Roesner et al. [77] measured the prevalence of trackers

and different tracking behaviors in the web.

Olejnik et al. [65] investigated “cookie syncing”, a technique

that enables third parties to have a more complete

view on the users’ browsing history by synchronizing their

cookies. Acar et al. [5] investigated the prevalence of “ev-

ercookies” and the effects of cookie respawning in combi-

nation with cookie syncing. Englehardt and Narayanan [35]

conducted a large scale measurement study to quantify state-

ful and stateless tracking in the web, and cookie syncing,

while Lerner et al. [53] conducted a longitudinal measure-

ment study of third party tracking behaviors and found that

tracking has increased in prevalence and complexity over

time.

With regards to stateless tracking, Nikiforakis et al. [64]

investigated various fingerprinting techniques employed by

popular trackers and measured the adoption of fingerprint-

ing in the web. Acar et al. [6] proposed FPDetective, a

framework to detect fingerprinting by identifying and ana-

lyzing specific events such as the loading of fonts, or access-

ing specific browser properties. In another work, Nikiforakis

et al. [63] proposed PriVaricator, a tool that employs ran-

domization to make fingerprints non-deterministic, in order

to make it harder for trackers to link user fingerprints across

websites. Also, in a recent work, Cao et al. [20] proposed a

fingerprinting technique that utilizes OS and hardware level

features, for enabling user tracking not only within a single

browser, but also across different browsers on the same ma-

chine.

2.4 Cross-Device Tracking

A few recent works investigate cross-device tracking that

is implemented based on technologies such as ultrasound

and Bluetooth, and measure the prevalence of these ap-

proaches [58, 11, 49]. As in this work we focus on web

based cross-device tracking, our work is complementary to

works that investigate such technologies.

A work by Brookman et al. [17], one of the few that inves-

tigate CDT on the web, provides some initial insights about

the prevalence of trackers. This work examines 100 popular

websites in order to determine which of them disclose data

to trackers, identifies which websites contain trackers known

to employ CDT techniques, and also investigates if users are

aware of these techniques.

During the Drawbridge Cross-Device Connection com-

petition of the ICDM 2015 conference [2], the participants

were provided with a dataset [1] that contained informa-

tion about some users’ devices, cookies, IP addresses and

also browsing activity, and were challenged to match cook-

ies with devices and users. This resulted in a number of short

papers [10, 19, 47, 48, 50, 28, 78, 85] that describe different

machine learning approaches followed during the competi-

tion for matching devices and cookies. Some of the proposed

methods achieved accuracy greater than 90% and, viewed from

a different angle than our work, showed that users’

devices can be potentially correlated if enough network and

device information is available.

Zimmeck et al. [88] conducted an initial small-scale ex-

ploratory study on CDT based on the observation of cross-

device targeted ads in two “paired” devices (mobile and

desktop) over the course of two months. Following this exploration,

they collected the browsing history of 126 users,

of whom 107 provided data from both their desktop

and mobile device, and designed an algorithm that estimates

similarities and correlates the devices into pairs. This ap-

proach, which is based on IP addresses and browsing history,

and achieves high matching rates, shows that users’ network

information and browsing history can be used for correlating

user devices, and thus potentially for CDT.

In general, research around CDT is still very limited; in

fact, only [88, 17] initially studied some of its aspects, but

without proving its actual existence or providing a methodol-

ogy for detecting and measuring it. Overall, our work builds

on these early studies on CDT, as well as past studies on de-

tection of web tracking during targeted ads. We propose the

first of its kind methodology for systematic investigation of

probabilistic CDT, by leveraging artificially-created profiles

with specific web behaviors, and measuring the existence of,

and factors affecting CDT in various experimental setups.

3 A methodology to measure CDT

The proposed methodology emulates realistic browsing ac-

tivity of end-users across different devices, and collects and

categorizes all ads delivered to these devices based on the

intensity of the targeting. Finally, it compares these ads with

baseline browsing activity to establish if CDT is present or

not, at what level, and for which types of user interests.

3.1 Design Principle

In general, the CDT performed by the ad-ecosystem is a very

complex process, with multiple parties involved, and a non-

trivial task to dissect and understand. To infer its internal

mechanics, we rely on probing the ecosystem with consis-

tent and repeatable inputs (I), under specific experimental

settings (V), allowing the ecosystem to process and use this

input via transformations and modeling (F), and produce out-

puts we can measure on the receiving end (Y):

\[ (I, V) \xrightarrow{F} Y \]

In this expression, the unknown F is the probabilistic modeling

performed by CDT entities, allowing them to track users

across their devices. Following this design principle, our

methodology allows us to push realistic input signals to the

ad-ecosystem via website visits, and measure the ecosystem’s

output through the delivered ads, to demonstrate whether F enabled

the ecosystem to perform probabilistic CDT. An overview of

our methodology is illustrated in Figure 2.

3.2 Design Overview

3.2.1 Input Signal (I)

To trigger CDT, we first need to inject into the ad-ecosystem

some activity from a user’s browsing behavior (I). This input

can be visits (i) to pages of interest (e.g., travel, shopping),

or (ii) to control pages of null interest (e.g., weather pages).

Intuitively, the former can be used first to demonstrate par-

ticular behavior of a user from a given device (mobile), and

the latter afterwards for collecting ads delivered as the output

of the ecosystem (Y) due to I, to that device, or another device

of the same user (desktop).

Persona Pages. We extract real users’ interests from the

dataset provided by Zimmeck et al. [88] and leverage an ap-

proach similar to Carrascosa et al. [21] to emulate brows-

ing behavior according to specific web categories, and cre-

ate multiple, carefully-crafted personas of different granular-

ities. This design makes the methodology systematic and re-

peatable and produces realistic browsing traffic from scripted

browsers. For each persona, our approach identifies a set of

websites (dubbed as persona pages) that have, at the given

time, active ad-campaigns. This “training activity” aims to

drive CDT trackers into possible device-pairing between the

user’s two devices with high degree of confidence.

Figure 2: High level representation of methodology design principles and units for CDT measurements. (Diagram components: Input Signal (I) with Control Pages and Persona Pages; Experimental Setup (V) with the Experimental Setup Selector and the instantiation of probing devices, i.e., Mobile and Paired PC on the same public IP address, plus Baseline PC; Ad-ecosystem CDT Functions & Modeling (F); Output Signal (Y) with Page Parser & Ad Extractor and Ad Categorizer; CDT Detection with Feature Extractor and CDT Machine Learning Modeler, yielding CDT / No CDT.)

Control Pages. Following past works [21, 12], all devices

in the system collect ads by visiting neutral websites that

typically serve ads not related to their content, thus, reducing

bias from possible behavioral ads delivered to specific types

of websites. We refer to these websites as control pages. We

detail the design of personas and control pages in §4.1.

3.2.2 Experimental Setup (V)

No 1st-party logins. Since we focus on probabilistic CDT,

we assume that the emulated user does not visit or log into

any 1st-party service that employs deterministic CDT and

thus, there is no common identifier (e.g., email address, so-

cial network ID) shared between the user’s devices.

Devices, IP addresses & Activity. The approach we fol-

low is based on triggering and identifying behavioral cross-

device targeted ads, and specifically ads that appear on one of

the user’s devices, but have been triggered by the user’s ac-

tivity on a different device. For this trigger to be facilitated,

the ad-ecosystem must be provided with hints that these two

devices belong to the same user. Zimmeck et al. [88] suggest

that in many cases, the devices’ IP address is adequate for

matching devices that belong to the same user. Also, accord-

ing to relevant industrial teams [57, 8] more signals can be

used, such as location, browsing, etc., for device matching.

Following these observations, our methodology requires

a minimum of three different devices: one mobile device

and two desktop computers, with two different public IP ad-

dresses. We assume that two devices (i.e., the mobile and one

desktop) belong to the same user, and are connected to the

same network. That is, these devices have the same public

IP address, are active in the same geolocation as in a typical

home network, and will be considered by the ad-ecosystem

as producing traffic from the same user. The second desk-

top (i.e., baseline PC), which has a different IP address, is

used for receiving a different flow of ads while replicating

the browsing of the user’s desktop (i.e., paired PC). This

control instance is used for establishing a baseline set of ads

to compare with the ads received by the user’s paired PC.
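The three-device arrangement described above can be written down as a small consistency check. The following Python sketch is purely illustrative: the `Device` class, its field names, and the example IP addresses (taken from documentation-reserved ranges) are assumptions made here and are not part of Talon.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str        # hypothetical label, e.g. "mobile"
    kind: str        # "mobile" or "desktop"
    public_ip: str   # public IP address as seen by the ad-ecosystem


def validate_setup(mobile: Device, paired_pc: Device, baseline_pc: Device) -> bool:
    """Return True if the setup matches the three-device design in the text."""
    kinds_ok = (mobile.kind == "mobile"
                and paired_pc.kind == "desktop"
                and baseline_pc.kind == "desktop")
    # The user's two devices share one public IP (same home network).
    same_user_ip = mobile.public_ip == paired_pc.public_ip
    # The baseline PC must appear behind a different public IP.
    baseline_differs = baseline_pc.public_ip != paired_pc.public_ip
    return kinds_ok and same_user_ip and baseline_differs


mobile = Device("mobile", "mobile", "203.0.113.10")
paired_pc = Device("paired PC", "desktop", "203.0.113.10")
baseline_pc = Device("baseline PC", "desktop", "198.51.100.7")
```

With these example values, `validate_setup(mobile, paired_pc, baseline_pc)` accepts the setup, while a baseline PC that shares the paired PC's address would be rejected.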

CDT Direction. In principle, the design allows the inves-

tigation of both directions of CDT. That is, users may first

browse on the mobile device, and then move to their desk-

top, and vice versa. However, since ad-targeting companies

such as AdBrain and Criteo maintain that the direction from

mobile to desktop is more suitable for cross-device retargeting

[72, 7, 25], in this work we focus on the mobile to desktop

direction (Mob → PC). In essence, the mobile device

performs a specifically instructed web browsing session to

establish the persona, by visiting the set of persona pages,

i.e., training phase; then, the two desktop computers perform

web browsing, i.e., testing phase, where they visit the set of

control pages and collect the delivered ads. The browsing

performed by the desktops is synchronized by means of vis-

iting the same pages and performing the exact same clicks.

3.2.3 Output Signal (Y)

In order to handle the Output Signal and transform it ap-

propriately, we design and implement two different compo-

nents: (i) Page Parser & Ad Extractor and (ii) Ad Catego-

rizer. The first is responsible for the identification and ex-

traction of ad elements inside the webpages. The module

uses string matching techniques and a public list of common

ad-domains (Easylist [33]) to identify the delivered ads. The

second module assigns a keyword to each ad identified in

the previous step, based on its type and content (e.g.,

“Online Shopping”, “Fashion”, “Recreation”, etc.). Using both

modules, we store the ads delivered in all devices of our ex-

perimental setup along with their categories, as well as data

related to the activity of the devices that attracted these ads.
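To make the two components more concrete, a minimal sketch of domain-based ad extraction followed by keyword-based categorization is given below. It assumes a plain set of ad domains, whereas the real Easylist uses a richer filter syntax; the names `AD_DOMAINS`, `AdExtractor`, `CATEGORY_KEYWORDS` and the example domains are hypothetical and do not reproduce Talon's implementation.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical ad-domain list; the real Easylist uses a richer filter syntax.
AD_DOMAINS = {"ads.example.com", "adserver.example.net"}


def is_ad_url(url: str, ad_domains=AD_DOMAINS) -> bool:
    """True if the URL's host equals, or is a subdomain of, a listed ad domain."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in ad_domains)


class AdExtractor(HTMLParser):
    """Collects (tag, url) pairs whose src/href points to a listed ad domain."""

    def __init__(self):
        super().__init__()
        self.ads = []

    def handle_starttag(self, tag, attrs):
        if tag not in {"a", "img", "iframe", "script"}:
            return
        for name, value in attrs:
            if name in {"src", "href"} and value and is_ad_url(value):
                self.ads.append((tag, value))


def extract_ads(html: str):
    parser = AdExtractor()
    parser.feed(html)
    return parser.ads


# Toy keyword-to-category map standing in for the Ad Categorizer (assumption).
CATEGORY_KEYWORDS = {"shoes": "Fashion", "flight": "Air Travel"}


def categorize(ad_url: str) -> str:
    """Assign the first category whose keyword occurs in the ad URL."""
    lowered = ad_url.lower()
    for keyword, category in CATEGORY_KEYWORDS.items():
        if keyword in lowered:
            return category
    return "Unknown"
```

Matching on the host suffix rather than on a raw substring avoids flagging unrelated hosts that merely contain an ad domain's name.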

Figure 3: Persona design and automatic generation. (Diagram components: Real User’s Interests [88], Synthetic Personas [21], Pre-processing, Persona Selection, Google Product Taxonomy, Extract Keywords, Group Keywords per Persona, Google Search - Campaign Extraction, Persona Pages; Alexa Top Sites, Extract Weather Websites, Control Pages.)

3.2.4 CDT Detection

Comparing Signals. Various statistical methods can be used

to associate the input signal I of persona browsing in the

mobile device, with the output signal Y of ads delivered to

the potentially paired-PC. For example, simple methods that

perform similarity computation between the two signals in a

given dimensionality (e.g., Jaccard, Cosine) can be applied.

These methods, as well as typical statistical techniques (e.g.,

permutation tests) capture only one dimension of each in-

put/output signal and thus, might not be suitable for measur-

ing with confidence the high complexity of the CDT signal.

In this case, more advanced methods can be employed, such

as Machine Learning techniques (ML) for classification of

the signals as similar enough to match, or not. In our analy-

sis, we mainly focus on ML to compute the likelihood of the

two signals being the product of CDT, as it takes into con-

sideration this multidimensionality in the feature space. We

describe the modeling and methods used for ML in §4.4.
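As an illustration of the simple similarity measures mentioned above (not of the ML models, which are the main focus of the analysis), the sketch below compares a persona's interest categories with the categories of ads collected on two desktops. The category names and counts are invented for the example, and the function names are assumptions.

```python
import math
from collections import Counter


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a ∩ b| / |a ∪ b| between two category sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cosine(x: Counter, y: Counter) -> float:
    """Cosine similarity between two category-count vectors."""
    dot = sum(x[k] * y[k] for k in x.keys() & y.keys())
    norm_x = math.sqrt(sum(v * v for v in x.values()))
    norm_y = math.sqrt(sum(v * v for v in y.values()))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)


# Invented example data: persona interests (input signal I) and the
# categories of ads seen on each desktop (output signal Y).
persona_categories = {"Fashion", "Online Shopping"}
paired_ads = Counter({"Fashion": 3, "Online Shopping": 1, "Finance": 1})
baseline_ads = Counter({"Finance": 4, "Fashion": 1})

persona_vector = Counter({c: 1 for c in persona_categories})
paired_score = cosine(persona_vector, paired_ads)
baseline_score = cosine(persona_vector, baseline_ads)
```

In this toy example the paired PC's ads are closer to the persona than the baseline PC's ads. Each measure, however, collapses the signal into a single number, which is the limitation that motivates the ML approach.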

4 Framework Implementation

A high level overview of our methodology, and its material-

ization by our framework Talon, is presented in Figure 2 and

described in §3. In the following, we provide more details

about its building blocks, and argue for various design de-

cisions taken while implementing this methodology into the

fully-fledged automated system.

4.1 Input Signal: Control Pages & Personas

Persona Pages. A critical part of our methodology is the de-

sign and automatic building of realistic user personas. Each

persona has a unique collection of visiting links, that form

the set of persona pages. Since we do not know in advance

which e-commerce sites are conducting cross-device ad-campaigns,

we design a process to dynamically detect active persona pages

of given interest categories. Our approach for persona generation

is shown in Figure 3.

Table 1: Behavioral personas used in our experiments.

Persona  Category - Description
1        Online Shopping - Accessories, Jewelry.
2        Online Shopping - Fashion, Beauty.
3        Online Shopping - Sports and Accessories.
4        Online Shopping - Health and Fitness.
5        Online Shopping - Pet Supplies.
6        Air Travel.
7        Online Courses and Language Resources.
8        Online Business, Marketing, Merchandising.
9        Browser Games - Online Games.
10       Hotels and Vacations.

We first use the list of topics of Zimmeck et al. [88], which

describe real users’ online interests. We perform a clustering

based on the content of each interest and label the clusters

appropriately (e.g., we group together: “Shopping” and

“Beauty and Fashion” under the label: “Shopping and Fash-

ion”). Then, we use the persona categorization of Carrascosa

et al. [21] for their top 50 personas, and select only those

personas that describe similar interests with the previously

formed list. For the resulting intersection of personas from

the two lists, we iterate through the Google Product Taxon-

omy list [40] to obtain the related keywords for each one.
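The keyword lookup in the taxonomy can be sketched as follows. The sketch assumes the plain-text form of the Google Product Taxonomy, in which each line is a path whose levels are separated by " > " and comment lines start with "#". The function name and the sample excerpt are illustrative assumptions, not Talon's code.

```python
def keywords_for_category(taxonomy_lines, category):
    """Return the last label of every taxonomy path that passes through `category`."""
    keywords = []
    for line in taxonomy_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        path = [part.strip() for part in line.split(">")]
        # Keep the node only if `category` is one of its ancestors.
        if category in path[:-1]:
            last = path[-1]
            if last not in keywords:
                keywords.append(last)
    return keywords


# Illustrative excerpt in the assumed "A > B > C" format.
SAMPLE_TAXONOMY = [
    "# sample excerpt (illustrative)",
    "Animals & Pet Supplies",
    "Animals & Pet Supplies > Pet Supplies",
    "Animals & Pet Supplies > Pet Supplies > Dog Supplies",
    "Animals & Pet Supplies > Pet Supplies > Cat Supplies",
    "Apparel & Accessories > Jewelry > Watches",
]

pet_keywords = keywords_for_category(SAMPLE_TAXONOMY, "Pet Supplies")
```

For a persona such as "Online Shopping - Pet Supplies", the labels collected this way would serve as the keyword set for that persona.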

To increase the probability of capturing active ad-campaigns

that can potentially deliver ads to the devices,

we use Google Search as it reveals campaigns associated

with products currently being advertised. That is, if a

user searches for specific keywords (e.g., “men watches”),

Google will display a set of results, including sponsored

links for sites conducting campaigns for the terms searched.

In this way, we use the keywords set for each persona, as

extracted above, and transform them into search queries by

appending common string patterns such as “buy”, “sell”, and

“offers”. This process is repeated until between five and ten

unique domains per persona are collected. If the procedure

fails, no persona is formed.
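The query construction and the stopping rule described above can be sketched as follows. The search step is abstracted behind a caller-supplied function `sponsored_links_for`, which is hypothetical: how sponsored links are actually retrieved from Google Search is not specified here. The constants mirror the "between five and ten unique domains" rule from the text.

```python
from urllib.parse import urlparse

QUERY_PATTERNS = ("buy", "sell", "offers")  # string patterns named in the text
MIN_DOMAINS, MAX_DOMAINS = 5, 10            # bounds on unique domains per persona


def build_queries(keywords):
    """Append each pattern to each keyword, e.g. 'men watches buy'."""
    return [f"{kw} {pattern}" for kw in keywords for pattern in QUERY_PATTERNS]


def collect_persona_pages(keywords, sponsored_links_for):
    """Collect unique sponsored domains; return None if too few are found.

    `sponsored_links_for(query)` is a hypothetical callable that returns
    the sponsored URLs shown for a search query.
    """
    domains = []
    for query in build_queries(keywords):
        for url in sponsored_links_for(query):
            host = urlparse(url).hostname
            if host and host not in domains:
                domains.append(host)
            if len(domains) == MAX_DOMAINS:
                return domains
    # Fewer than the minimum number of domains: no persona is formed.
    return domains if len(domains) >= MIN_DOMAINS else None
```

Returning `None` when too few domains are found corresponds to the case in which the procedure fails and no persona is formed.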

As the effectiveness of a persona depends on the active ad-

campaigns at the given time, in our experiments, we deploy

personas in 10 categories related to shopping, traveling, etc.

(full list shown in Table 1). With this procedure, we manage

to design personas similar enough to real users, as well as

to emulated users designed in previous works [21, 12, 13,

88].

Control Pages. For retrieving the delivered ads (after any

type of browsing), we employ a set of webpages that contain:

(i) easily identifiable ad-elements and (ii) a sufficient num-

ber of ads that remains consistent through time. These pages

have neutral context and do not affect the behavioral profile

of the device visiting them. For most of the experiments in

§5, we use a set of five popular weather websites² as control

pages, similarly to [21]. We manually confirmed the neutral-

ity of these pages, by observing no contextual ads delivered

to them. When visiting the set of control pages, our meth-

ods extract and categorize all the ads received, in order to

identify those that have potentially resulted from CDT.

4.2 Experimental System Setup

The experimental setup contains different types of units, con-

nected together for replicating browsing activity on multiple

devices. Typically, CDT is applied on two or more devices

that belong to the same user, such as a desktop and a mobile

device. Thus, the system contains emulated instances of both

types, controlled by a number of experimental parameters.

Devices & Automation. The desktop devices are built on

top of the web measurement framework OpenWPM [35].

This platform enables launching instances of the Firefox

browser, performs realistic browsing with scrolling, sleeps

and clicks, and collects a wide range of measurements in

every browsing session. It can also store the

browser’s data (cookies, local cache, temporary files) and export

a browser profile at the end of a browsing session,

which can be loaded in a future session. With these options,

we can perform stateful experiments, emulating a typical user’s web

browser that stores all data over time, or stateless experiments

that emulate browsing in incognito mode.

For the mobile device, we use the official Android Emula-

tor [41], as well as the Appium UI Automator [73] for the au-

tomation of browsing. We build the mobile browsing module

on top of these components to automate visits to pages via

the Browser Application. This browsing module provides

functionalities for realistic interaction with a website, e.g.,

scrolling, click and sleep rate. Similarly to the desktop, it

can run either in a stateful or stateless mode.

Experimental Setup Selector. As briefly described in §3,

we need two phases of browsing to different types of web-

pages (training and testing), in order to successfully measure

CDT. For that reason, we set the two browsing phases in the

following way: During the training phase, the selected de-

vice visits the set of Persona Pages for a specific duration,

referred to as training time (ttrain). The test phase is the set

of visits to control pages for the purpose of collecting ads.

During this phase, we control the duration of browsing (i.e.,

ttest). The experimental setup selector controls various parameters,

such as: which type of device will be trained and

tested, the times ttrain and ttest, the sequence of time slots

for training and testing from the selected device, the number of

repetitions of this procedure, etc.

Timeline of phases. Each class of experiments is executed

multiple times (or runs), through parallel instantiations of the

user devices within the framework (as shown in Figure 2).

² accuweather.com, wunderground.com, weather.com,

weather-forecast.com, metcheck.com

[Figure 4: timeline diagram over time (t) with sessions 1 to N and a "CDT detection" label. Session i consists of the stages Bi, Mi, W and Ai; consecutive sessions are separated by R.]

Figure 4: Timeline of phases for CDT measurement.

Mi: mobile training time ttrain + testing time ttest;

Bi (Ai): desktop testing time ttest before (after) mobile phase;

W: wait time (twait); R: rest time (trest); tSi: time of session i.

Each experimental run is executed following a timeline of

phases as illustrated in Figure 4. This timeline contains N

sessions with three primary stages in each: Before, Mobile,

and After. The Before (Bi) stage is when the two desktop devices

perform a parallel test browsing, with a duration of ttest

time, to establish the state of ads before the mobile device

injects signal into the ad-ecosystem. The Mobile (Mi) stage

is when the mobile device performs a training browsing for

ttrain time, and a test browsing for ttest time. This phase injects

the signal from the mobile during training with a persona,

but also performs a subsequent test with control pages

to establish the state of ads after the training. Finally, the After

(Ai) stage is when the two desktops perform the final test

browsing, with the same duration ttest as in the Before (Bi) stage,

to establish the state of ads after the mobile training.

After extensive experimentation, we found that a minimum

training time ttrain=15 minutes and testing time ttest=20

minutes are sufficient for injecting a clear signal over noise,

from the trained device to the ad-ecosystem. There is also

a waiting time (twait=10 minutes) and resting time (trest=5

minutes) between the stages of each session, to allow alignment

of instantiations of devices running in parallel during

each session. In total, each session lasts 1.5 hours and is repeated

N=15 times during a run. Through the experimental

setup selector, we define the values of such variables (ttrain,

ttest, twait, trest, N, type of device), offering the researcher the

flexibility to experiment in different cases of CDT.
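The session length quoted above can be checked with a short calculation. The sketch below assumes the stage order B, M, W, A, R suggested by Figure 4 and counts one rest period per session; the class and field names are hypothetical and only the durations come from the text.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SessionConfig:
    """Durations in minutes, using the values reported in the text."""
    t_train: int = 15    # mobile training time
    t_test: int = 20     # testing time (desktop and mobile)
    t_wait: int = 10     # wait time between stages
    t_rest: int = 5      # rest time before the next session
    n_sessions: int = 15

    def stages(self) -> List[Tuple[str, int]]:
        """Ordered (stage, minutes) pairs for one session (assumed order)."""
        return [
            ("Before", self.t_test),
            ("Mobile", self.t_train + self.t_test),
            ("Wait", self.t_wait),
            ("After", self.t_test),
            ("Rest", self.t_rest),
        ]

    def session_minutes(self) -> int:
        return sum(minutes for _, minutes in self.stages())

    def run_hours(self) -> float:
        return self.n_sessions * self.session_minutes() / 60


cfg = SessionConfig()
print(cfg.session_minutes())  # 20 + 35 + 10 + 20 + 5 = 90 minutes
print(cfg.run_hours())        # 15 sessions of 90 minutes = 22.5 hours
```

Under these assumptions a session takes 90 minutes, matching the 1.5 hours stated above, and a run of N=15 sessions takes roughly one day.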

4.3 Output Signal

Page Parser. This component is activated when the visited

page is fully loaded and no further changes occur on the con-

tent. To collect the display ads, we first need to identify spe-

cific DOM elements inside the visited webpages. This task is

challenging due to the dynamic JavaScript execution and the

complex DOM structures generated in most webpages. For

the reliable extraction of ad-elements and identification of

the landing pages,³ we follow a methodology similar to the

one proposed in [55]. The functionality of this component

is to parse the rendered webpage and extract the attributes of

display ads, which also contain the landing pages.

³ Destination websites the user is redirected to when clicking on the ads.

Ad Extractor. In most modern websites, the displayed ads

are embedded in iFrame tags that create deep nesting layers,

containing numerous and different types of elements. How-

ever, the ads served by the control pages are found directly

inside the iFrames so the module does not have to handle

such complex behavior. Therefore, the module first identifies

all the active iFrame elements and filters out the invalid

ones that have either empty content or zero dimensions.

Then, it retrieves the href attributes of image and Flash ads

and parses the URLs, while searching for specific string pat-

terns such as adurl=, redirect=, etc. These patterns are typ-

ically used by the ad-networks for encoding URLs in web-

pages. Next, the module forms the list of candidate landing

pages, which are then processed and analyzed to create the

set of true landing pages. The Ad Extractor is fully com-

patible with the crawlers, and does not need to perform any

clicks on the ad-elements, since it extracts only the landing

pages’ URLs directly from the rendered webpage. After col-

lecting the candidate landing pages, the module filters them

with the EasyList [33], similarly to previous works [12, 35],

and stores only the true active ad-domains. Finally, the Page

Parser & Ad Extractor module also stores metadata from the

crawls such as: time and date of execution, number of identi-

fied ads, number of categories, type and phase of crawl, etc.
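The URL-parsing step of the Ad Extractor can be illustrated with Python's standard `urllib.parse` module. This is a sketch under stated assumptions rather than Talon's implementation: it treats `adurl` and `redirect` (the patterns named above) as query-parameter names, and the function names and example URLs are hypothetical. Real ad-networks may nest or encode destinations differently.

```python
from typing import List, Optional
from urllib.parse import parse_qs, urlparse

# Patterns named in the text, treated here as query-parameter names (assumption).
REDIRECT_PARAMS = ("adurl", "redirect")


def candidate_landing_url(href: str) -> Optional[str]:
    """Return the destination URL embedded in an ad link, if one is found."""
    query = parse_qs(urlparse(href).query)  # parse_qs also percent-decodes values
    for name in REDIRECT_PARAMS:
        for value in query.get(name, []):
            if urlparse(value).scheme in ("http", "https"):
                return value
    return None


def landing_domains(hrefs: List[str]) -> List[str]:
    """Map ad hrefs to the distinct host names of their candidate landing pages."""
    domains: List[str] = []
    for href in hrefs:
        target = candidate_landing_url(href)
        if target is None:
            continue
        host = urlparse(target).hostname
        if host and host not in domains:
            domains.append(host)
    return domains
```

Because the destination is read from the link itself, no click on the ad-element is needed, which is the property the text relies on. Filtering the resulting domains against EasyList would be a separate step.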

Ad Categorizer. To associate landing pages or browsing

URLs with web categories, we employ the McAfee Trusted-

Sources database [60], which provides URLs organized into

categories. This system was able to categorize 96% of the

landing pages of our collection into a total of 76 unique cat-

egories, by providing up to four semantic categories for each

page, while the remaining 4% of domains were manually classified

into the categories above. The final output contains the

landing pages of collected ads, along with their categories.

4.4 CDT Detection

Detecting probabilistic CDT is a task generally well suited to

investigation through ML. Previous work [88] and industry

directions [57, 8] claim that probabilistic device-pairing is

based on specific, well-defined signals such as: IP address,

geolocation, type and frequency of browsing activity. Since

we control these parameters in our methodology, by defi-

nition we construct the ground truth with our experimental

setups. That is, we control (i) the devices used, which are

potentially paired under a given IP address, geolocation and

browsing patterns, (ii) the control instance of baseline desk-

top device, and (iii) the browsing with the personas.

Before applying any statistical method, every instance of

the input data has to be transformed into a vector of values;

each position in the vector corresponds to a feature. Features

are different properties of the collected data: browsing ac-

tivity of a user during training time, experimental setup used

(persona, etc.), time-related details of the experiment, as well

as information about the collected ads, which is the output

signal received from the given browsing activity. These features

can be studied systematically to identify statistical association

between the input and output signals, given an experimental

setup. In effect, our feature space comprises

a union of these vectors, since all features are either controlled

or measurable by us (a detailed description of the features

is given in Table 2). The only unknown is whether the

ad-ecosystem has successfully associated the devices, and if

it has exhibited this in the output signal via ads.

Table 2: Description of features used by datasets. The desktop

crawl type takes values in {0,1}, where 0 represents

the before/test sessions and 1 the after/train sessions. The

time of crawl is divided into 30-minute timeslots and is encoded

in range {0,48}. The day of crawl is encoded in range

{1,7}. V represents the (enumerated) vectors of values in the

sets of: landing pages, training pages, ads and ad categories.

Feature Label                       Description
Crawl Type                          The type of desktop crawl.
Run ID                              The indexed number of run {1,4}.
Session ID                          The index of session {1,15}.
Persona Keywords                    V: keyword categories of training pages.
Mobile Timeslot                     Time of crawl (Mobile).
Desktop Timeslot                    Time of crawl (Desktop).
Desktop Day                         The day of crawl (Desktop).
Mobile Number of Ads                # ad domains (Mobile).
Desktop Number of Ads               # ad domains collected (Desktop).
Mobile Unique Number of Ads         # distinct ad domains (Mobile).
Desktop Unique Number of Ads        # distinct ad domains (Desktop).
Mobile Number of Keywords           # ad categories (Mobile).
Desktop Number of Keywords          # ad categories (Desktop).
Mobile Unique Number of Keywords    # distinct ad categories (Mobile).
Desktop Unique Number of Keywords   # distinct ad categories (Desktop).
Mobile Keywords                     V: keyword categories of landing pages (Mobile).
Desktop Keywords                    V: keyword categories of landing pages (Desktop).
Mobile Landing Pages                V: landing pages of delivered ads (Mobile).
Desktop Landing Pages               V: landing pages of delivered ads (Desktop).
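To make the vector representation concrete, the following sketch flattens one desktop crawl into a feature vector in the spirit of Table 2. It is an illustration under several assumptions: the enumerated vectors V are encoded as multi-hot indicators over a fixed vocabulary, the timeslot is the zero-based index of the 30-minute slot of the day, and the day of crawl uses ISO weekdays (Monday=1). All function and parameter names are hypothetical.

```python
from datetime import datetime
from typing import List, Sequence


def timeslot(ts: datetime) -> int:
    """Zero-based index of the 30-minute slot of the day that contains ts."""
    return (ts.hour * 60 + ts.minute) // 30


def multi_hot(values: Sequence[str], vocabulary: Sequence[str]) -> List[int]:
    """1 where a vocabulary item occurs in values, 0 elsewhere."""
    present = set(values)
    return [1 if item in present else 0 for item in vocabulary]


def desktop_features(
    crawl_type: int,
    run_id: int,
    session_id: int,
    crawled_at: datetime,
    ad_domains: Sequence[str],
    ad_keywords: Sequence[str],
    domain_vocab: Sequence[str],
    keyword_vocab: Sequence[str],
) -> List[int]:
    """Flatten one desktop crawl into a numeric feature vector."""
    scalars = [
        crawl_type,               # Crawl Type: 0 = before/test, 1 = after/train
        run_id,                   # Run ID
        session_id,               # Session ID
        timeslot(crawled_at),     # Desktop Timeslot
        crawled_at.isoweekday(),  # Desktop Day, 1..7
        len(ad_domains),          # Desktop Number of Ads
        len(set(ad_domains)),     # Desktop Unique Number of Ads
        len(ad_keywords),         # Desktop Number of Keywords
        len(set(ad_keywords)),    # Desktop Unique Number of Keywords
    ]
    return (
        scalars
        + multi_hot(ad_keywords, keyword_vocab)  # Desktop Keywords (V)
        + multi_hot(ad_domains, domain_vocab)    # Desktop Landing Pages (V)
    )
```

With such an encoding the length of the vector grows with the number of distinct landing pages and keyword categories observed, which is one plausible reason why the feature counts differ between datasets.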

One-Dimensional Statistical Analysis. At the first level of

analysis, to measure the similarity of the distributions of ads delivered

to the different devices, we compare the signals using

a two-tailed permutation test and reject the null hypothesis

that the frequency of ads delivered (for a given category)

comes from the same distribution, if the t-test statistic

leads to a p-value smaller than the significance level α=0.05.
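A two-tailed permutation test of this kind can be sketched in a few lines of standard-library Python. The version below uses Welch's t statistic and random label shuffling; the choice of Welch's variant, the number of permutations and the add-one correction are assumptions of this sketch, since the text does not specify them.

```python
import math
import random
from statistics import mean, variance
from typing import Sequence


def t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic for two samples (each needs at least two values)."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def permutation_test(
    a: Sequence[float],
    b: Sequence[float],
    n_permutations: int = 5000,
    seed: int = 0,
) -> float:
    """Two-tailed p-value: share of label shufflings with |t| >= observed |t|."""
    rng = random.Random(seed)
    observed = abs(t_statistic(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled_t = t_statistic(pooled[: len(a)], pooled[len(a):])
        if abs(shuffled_t) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical per-session ad counts for one category on two devices.
mobile = [12, 14, 11, 13, 15, 12, 14, 13]
desktop = [3, 4, 2, 5, 3, 4, 2, 3]
alpha = 0.05
print("reject H0" if permutation_test(mobile, desktop) < alpha else "keep H0")
```

The test makes no normality assumption: the null distribution of the statistic is built from the data by reassigning observations to devices at random.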

Multidimensional Statistical Analysis. Given that a uni-

dimensional test such as the previous one does not take into

account the various other features available in each experiment,

we further consider ML methods, which take into account

multidimensional data, to decide if the ads delivered in each device

are from the same distribution or not. We transform

the problem of identifying if the previously exported vec-

tors are similar enough, into a typical binary classification

problem, where the predicted class describes the existence

of pairing or not, that may have occurred between the mo-

bile device and one of the two desktop devices. As a paired

combination we consider the desktop device that exists un-

der the same IP address with the mobile device. The “not

paired” combination is the mobile device and the baseline

desktop. The analysis is based on three classification algorithms

with different dependences on the data distributions.

An easily applied classifier that is typically used for performance

comparison with other models is the Gaussian Naive

Bayes classifier. Logistic Regression is a well-behaved classification

algorithm that performs well when the classes

are (approximately) linearly separable. It is also robust to noise and can avoid

overfitting by tuning its regularization/penalty parameters.

Random Forest and Extra-Trees classifiers construct a multitude

of decision trees and output the class that is the mode

of the classes of the individual trees. They also use the Gini

index metric to compute the importance of features.

Table 3: Characteristics of the datasets used in each setup

(S) of experiments. S={1,2,3} are the setups of experiments

in §5.2, §5.3 and §5.4, respectively; ttotal: the total duration

of experiment; ttrain: the training duration; ttest: the

testing duration; I: independent personas; C: data combined

from personas; SF: stateful browser; SL: stateless browser;

B: boosted CDT browsing.

S    Personas        Runs   ttrain   ttest   ttotal    Samples   Features
1a   10 (I, SF)      4      15min    20min   37 days   240       1100
1b   10 (C, SF)      -      -        -       -         2400      2201
2a   2 (I, SF)       4      480min   30min   6 days    192       600
2b   2 (C, SF)       -      -        -       -         384       750
2c   2 (I, SF, B)    4      480min   30min   6 days    192       500
2d   2 (C, SF, B)    -      -        -       -         384       576
3a   5 (I, SL)       2      15min    20min   9 days    120       450
3b   5 (C, SL)       -      -        -       -         600       880
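For readers unfamiliar with the Gini index used by the tree-based classifiers, the sketch below computes the Gini impurity of a set of labels and the impurity decrease achieved by a single threshold split. Tree ensembles commonly aggregate such decreases over all splits that use a feature to obtain its importance; whether the models here use exactly this aggregation is an assumption, and the function names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def gini_impurity(labels: Sequence[int]) -> float:
    """1 - sum_k p_k^2 over the class proportions p_k of the labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(
    labels: Sequence[int],
    feature_values: Sequence[float],
    threshold: float,
) -> float:
    """Weighted Gini reduction when splitting on feature <= threshold."""
    n = len(labels)
    if n == 0:
        return 0.0
    left = [y for y, x in zip(labels, feature_values) if x <= threshold]
    right = [y for y, x in zip(labels, feature_values) if x > threshold]
    weighted = (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted
```

A feature that separates paired from not-paired samples cleanly yields a large decrease, whereas a feature unrelated to the class yields a decrease close to zero.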

A fundamental point when considering the performance

evaluation of ML algorithms is the selection of the appropriate

metrics. Pure Accuracy can be used, but it is not representative

for our analysis, since we want to report the most

accurate estimation for the number of predicted paired devices,

while at the same time measuring the absolute number of

misclassified samples overall. For this reason, metrics like

Precision, Recall and F1-score, and the Area Under the Receiver

Operating Characteristic Curve (AUC) are typically used,

since they can quantify this type of information.
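The metrics named above can be computed directly from the predictions. The following standard-library sketch reports Precision, Recall and F1-score for a chosen class, and computes AUC through its rank interpretation (the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one). The function names and the toy labels are hypothetical.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    positive: int,
) -> Tuple[float, float, float]:
    """Precision, Recall and F1-score, treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC for binary labels {0,1}; tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = paired desktop, 0 = not paired desktop.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
scores = [0.1, 0.6, 0.8, 0.9, 0.4, 0.2]
for cls in (0, 1):
    print(cls, precision_recall_f1(y_true, y_pred, positive=cls))
print(roc_auc(y_true, scores))
```

Reporting the per-class scores for both C0 and C1, as the result tables do, shows whether a model trades false positives for false negatives, which a single Accuracy value hides.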

5 Experimental Evaluation

We use the Talon framework to perform various experiments

and construct different datasets for each. Since every experimental

setup has different experimental parameters (i.e.,

training and testing time, number of personas, browsing

functionalities), the datasets vary in terms of sample size

and feature space. The datasets collected during our experiments

and used in our analysis are presented in Table 3.

5.1 Does IP-sharing allow CDT?

A first set of preliminary experiments was performed to

demonstrate that our platform can (i) successfully identify

and collect the ads delivered to our multiple devices (mobile

and desktops), (ii) inject browsing signal from a device, thus

biasing it to have a realistic persona and (iii) lead to match-

ing/pairing of devices, which could be due to same behav-

ioral ads, retargeting ads or CDT.

First, we use a simple experimental setup: we connect

three instances of desktop devices and one mobile device un-

der the same IP address. We create one persona (as in §4.1),

with an interest in “Online Shopping-Fashion, Beauty”, and

following the described timeline of phases, we run this ex-

periment for two days. Then, we perform one-dimensional

statistical analysis, as introduced in §4.4, and find that there

is no similarity between the mobile and any of the desktop devices

(null hypothesis rejected with highest p-value=0.030),

while all desktop distributions are similar to each other (null

hypothesis accepted with lowest p-value=0.33). These statis-

tical results indicate that there is no clear device-pairing (at

the level of ad distribution for the given persona), and that

we should consider controlling more factors to instigate it.

Consequently, we expand this experiment by also training

one of the desktop devices using the same persona as the

mobile. By repeating the same statistical tests, we find that

the mobile and desktop with the same browsing behavior re-

ceive ads coming from the same distribution (null hypothesis

accepted with lowest p-value=0.84), while the other desk-

top devices show no similarity with each other or the mobile

(null hypothesis rejected with highest p-value=0.008). This

result indicates that browsing behavior under a shared IP ad-

dress can boost the signal towards advertisers, which they

can use to apply advanced targeting, either as CDT, or retar-

geting on each device or a mixture of both techniques.

Finally, these preliminary experiments and statistical tests

provide us with evidence regarding the effectiveness of our

framework to inject enough browsing signal from different

devices under selected personas. Our framework is also able

to collect the ads delivered across devices, which can later be

analyzed and linked back to the personas. These are fundamental

capabilities of our system and, importantly, they

can potentially trigger CDT between the devices involved.

Next, we present more elaborate experiments with our

framework, in order to study CDT in action.

5.2 Does short-time browsing allow CDT?

Independent Personas: Setup 1a. This experimental setup

emulates the behavior of a user that browses frequently about

some topics, but in short-lived sessions on her devices. Given

that most users do not frequently delete their local brows-

ing state, this setup assumes that the user’s browser stores

all state, i.e., cookies, cache, browsing history. This enables

trackers to identify users more easily across their devices, as

they have historical information about them. In this setup,

every experimental run starts with a clean browser profile;

cookies and temporary browser files are stored for the whole

duration of the experimental run (stateful). We use all personas

of Table 1, and the data collection for each lasts 4 days.

Table 4: Performance evaluation for Random Forest in Setups

1a and 1b. Left value in each column is the score for

Class 0 (C0=not paired desktop); right value for Class 1

(C1=paired desktop).

Persona          Precision      Recall         F1-Score       AUC
(Setup)          C0     C1      C0     C1      C0     C1
1 (1a)           0.89   0.60    0.57   0.90    0.70   0.72    0.73
2 (1a)           0.84   0.78    0.81   0.82    0.82   0.80    0.82
3 (1a)           0.81   0.73    0.78   0.76    0.79   0.74    0.76
4 (1a)           0.87   0.78    0.87   0.78    0.87   0.78    0.82
5 (1a)           0.94   0.65    0.68   0.93    0.79   0.76    0.80
6 (1a)           0.57   0.67    0.81   0.38    0.67   0.48    0.59
7 (1a)           0.81   0.87    0.89   0.76    0.85   0.81    0.81
8 (1a)           0.86   0.85    0.89   0.81    0.87   0.83    0.84
9 (1a)           0.74   0.90    0.91   0.73    0.82   0.81    0.81
10 (1a)          0.77   0.85    0.81   0.81    0.79   0.83    0.81
combined (1b)    0.77   0.84    0.81   0.84    0.82   0.84    0.89

We perform the same statistical analysis as in §5.1, and

find that in 4/10 personas, the mobile and paired desktop

ads are similar (null hypothesis accepted with lowest p-

value=0.13), while the mobile and baseline desktop ad dis-

tributions are different (null hypothesis is rejected with high-

est p-value=0.009). This inconsistency is reasonable since

the statistical analysis is based only on one dimension (the

frequency count of types of ads appearing in the devices),

which may not be enough for fully capturing the existence of

device-pairing. For this reason, we choose to use more ad-

vanced, multidimensional ML methods which take into ac-

count the various variables available, to effectively compare

the potential CDT signals received by the two devices.

The classification results of the Random Forest (best per-

forming) algorithm are reported in Table 4. We use AUC

score as the main metric in our analysis, since the ad-industry

seems to prefer higher Precision scores over Recall, as the

False Positives have greater impact on the effectiveness of

ad-campaigns.⁴ As shown in Table 4, the model achieves

high AUC scores for most of the personas, with a maximum

value of 0.84. Specifically, the personas 2, 4 and 8 scored

highest in AUC, and also in Precision and Recall, whereas

persona 6 has poor performance compared to the rest. These

results indicate that for high scoring personas, we success-

fully captured the active CDT campaigns, but for the per-

sonas with lower scores, there may not be active campaigns

for the period of the experiments.

In order to retrieve the variables that affect the discovery

and measurement of CDT, we applied the feature importance

method on the dataset of each persona, and selected the top-10

highest scoring features. For the majority of the personas

(7 out of 10) the most important features were the number

of ads (distinct or not) and the number of keywords on the desktop.

In some cases, there were also landing pages that

scored highly (i.e., specific ad-campaigns), but this was not

consistent across all personas.

⁴ Tapad [3] mentions: “Maintaining a low false positive rate while also

having a low false negative rate and scale is optimal. This combination is

a strong indicator that the Device Graph in question was neither artificially

augmented nor scrubbed.”

[Figure 5: bar chart of Gini Score (axis ticks 0.01 to 0.03) per feature, with features grouped as Crawl Attributes, Ad Domain and Keyword. Features in plotted order: Desktop Day, Real Estate, Marketing, Online Shopping, Crawl Type, Desktop Timeslot, Stock Trading, Beauty, Desktop #Ads, Domain 1, Domain 2, Recreation, Domain 3, Education, Mobile Day, Business, Run Id, Hardware, Domain 4, Mobile #Ads (unq), Games, Fashion, Merchandising, Sports, Mobile #Keywords (unq), Domain 5, Software, Travel, Mobile Timeslot, Session Id.]

Figure 5: Top-30 features ranked by importance using Gini

index, in the machine learning model.

Combined Personas: Setup 1b. Here, we use all the

datasets collected individually, for each persona in the previ-

ous experiment (Setup 1a), and combine them into one uni-

fied dataset. This setup emulates the real scenario of a user

exhibiting multiple and diverse web interests, that give ex-

tra information to the ad-ecosystem about their browsing be-

havior. Of course, there is an increase in the possible feature

space to accommodate all the domains and keywords from

all personas. In fact, the dataset contains 2201 features as it

stores the vectors of landing pages and keywords, for all the

different types of personas. In total, there were 890 distinct

ad-domains described by keywords in 76 distinct categories.

In this dataset, we apply feature selection with the Extra-

Trees classifier to select the most relevant features and cre-

ate a more accurate predictive model. This method reduced

the feature space to 984 useful features out of 2201. Next,

we use the three classification algorithms and a range of

hyper-parameters for each one. Also, we apply a 10-fold

nested cross-validation method for selecting the best model

(in terms of scoring performance) that can give us an accurate,

not overly optimistic estimation [22]. Again, the

best selected model was Random Forest, with 200 estimators

(trees) and a maximum depth of 200 for each tree, with AUC=0.89 (bottom

row in Table 4). The model’s performance is high in all the

mentioned scores, which indicates that the more diverse data

the advertisers collect, the easier it is to identify the different

user’s devices. This result is in line with Zimmeck et al. [88],

who attempted a threshold-based approach for probabilistic

CDT detection on real users’ data, lending credence to our

proposed platform’s performance.

We also measure the feature importance for the top-30 fea-

tures (shown in Figure 5). One third of the top features are

related to crawl specific metadata, whereas about half of the

top features are keyword-related. Interestingly, features such

as the day and time of the experiment, as well as the number

of received ads, are important for the algorithm to make the

classification of the devices. Indeed, time-related features

provide hints on when the ad-ecosystem receives the brows-

ing signal and attempts the CDT, and thus, which days and

hours of day the CDT is stronger. These results give support

to our initial decision to experiment in a continuous fashion

with regular sessions injecting browsing signal, while at the

same time measuring the output signal via delivered ads.

5.3 Does long-time browsing improve CDT?

Independent Personas: Setup 2a. In this set of experi-

ments, we allow the devices to train for a longer period of

time, to emulate the scenario where a user is focused on a

particular interest, and produces heavy browsing behavior

around a specific category. This long-lived browsing injects

a significantly higher input signal to the ad-ecosystem than

the previous setup, which should make it easier to perform

CDT. In order to increase the setup’s complexity, and make

it more difficult to track the user, we allow all devices (i.e.,

1 mobile, 2 desktops) to train in the same way under the

same persona. In effect, this setup also tests a basic coun-

termeasure from the user’s point of view, who tries to blur

her browsing by injecting traffic of the same persona from

all devices to the ad-ecosystem.

In this setup, while all devices are trained with the same

behavioral profile, we examine if the statistical tests and ML

modeler can still detect and distinguish the CDT. This experiment

contains three different phases during each run. In the

mobile phase, the mobile performs training crawls for

ttrain=480 mins, and a testing crawl for ttest=30 mins. In parallel

with the mobile training, the two desktops perform test

crawls for ttest=30 mins. After mobile training and testing,

both desktops start continuous training and testing crawls, alternating

for 8 hours (ttrain=ttest=30 min).

Due to the long time needed for executing this experiment,

we focus on two personas constructed in the following way.

We use the methodology for persona creation as described

in §4.1, and focus on active ad-campaigns, resulting in two

personas in the interest of “Online Shopping-Accessories”,

and “Online Shopping-Health and Fitness” (loosely match-

ing the personas 1 and 4 from Table 1). Then, we per-

formed 4 runs of 16 hours duration each, for each persona.

In this setup, since all devices are uniformly trained, we do

not include the keyword vector of the persona pages into the

datasets, to not introduce any bias from repetitive features.

The statistical analysis for this experiment reveals poten-

tial CDT, since we accept the null hypothesis for the dis-

tribution of ads delivered in the paired desktop and mobile

(lowest p-value=0.052), and reject it in the baseline desktop

and mobile (highest p-value=0.006). This consistency is in-

teresting, since for this setup all three devices are uniformly

trained with the same persona, and thus all of them collect

similar ads due to retargeting. However, there is no similarity

between the distributions of ads in the devices that do not

share the same IP address.

Table 5: Performance evaluation for Logistic Regression in

all components of Setup 2. Left value in each column is

the score for Class 0 (C0=not paired desktop); right value

for Class 1 (C1=paired desktop).

Persona          Precision      Recall         F1-Score       AUC
(Setup)          C0     C1      C0     C1      C0     C1
1 (2a)           0.90   0.79    0.82   0.88    0.86   0.83    0.85
4 (2a)           0.83   0.79    0.81   0.81    0.82   0.80    0.81
combined (2b)    0.87   0.92    0.92   0.87    0.89   0.90    0.89
1 (2c)           0.87   1.0     1.0    0.88    0.93   0.93    0.93
4 (2c)           1.0    0.98    0.98   1.0     0.99   0.99    0.99
combined (2d)    1.0    0.86    0.88   1.0     0.93   0.93    0.93

To clarify this finding, we applied the ML algorithms as in

the previous experiment. The algorithms again detect CDT

between the mobile and the paired desktop, even though all

devices were exposed to similar training with the same per-

sona. In fact, Logistic Regression performed the best across

both personas, with AUC ≥ 0.81, and F1-score ≥ 0.80 for

both classes. Detailed evaluation results for §5.3 are presented

in Table 5. When computing the importance of features, the

desktop number of ads and keywords and the desktop time

slot are in the top-10 features. Based on these observa-

tions, we believe that the longer training time allowed the

ad-ecosystem to establish an accurate user profile, and retar-

get ads on the paired desktop, based on the mobile’s activity.

Combined Personas: Setup 2b. Similarly to §5.2 we

combine all data collected from the Setup 2a into a unified

dataset. Under this scenario, in which we mix data from both

personas, the classifier again performs well, with AUC=0.89.

Important features in this case are the number of ads and key-

words delivered to the desktops, the time of the experiment,

and number of keywords for the desktop.

Boosted Browsing with CDT trackers and Independent

Personas: Setup 2c. In the next set of experiments, we

investigate the role of CDT trackers in the discovery and

measurement of CDT. In particular, we attempt to boost the

CDT signal, by visiting webpages with a higher proportion of

CDT trackers. Therefore, the experimental setup and the

preprocessing method remain the same as in the previous

Setup 2a, but we select webpages to be visited that have active

ad-campaigns and whose landing pages embed the best-known

CDT trackers (as we also show in the next section):

Criteo, Tapad, Demdex, Drawbridge. We also change the

set of our control pages, so that each one contains at least

one CDT tracker. News sites have many 3rd-parties compared

to other types of sites [35]. Thus, for this boosted browsing

experiment, we choose the set of control pages to contain 3

weather pages and 2 news websites,⁵ while verifying they do

not serve contextual ads.

⁵ accuweather.com, wunderground.com, weather.com,

usatoday.com, huffingtonpost.com

Performing the same analysis as earlier, we find that mo-

bile and paired desktop have ads coming from the same dis-

tribution (lowest p-value=0.10), and that there is no simi-

larity between the ads delivered in the mobile and baseline

desktop (highest p-value=0.007). For a clearer investigation

of the importance of the CDT trackers, we also evaluate the

findings with the ML models. For persona 1, Logistic Re-

gression and Random Forest models perform near optimally,

with high precision of Class 1, high recall for class 0, aver-

age F1-Score=0.93 for both classes, and AUC=0.93. For per-

sona 4, the scores are even higher, outperforming the other

setups, as all metrics for Logistic Regression scored higher

than 0.98. Overall, these results indicate that we success-

fully biased the trackers to identify the emulated user in both

devices, and to provide enough output signal (ads delivered)

for the statistical algorithms to detect the CDT performed.

Boosted Browsing with CDT trackers and Combined

Personas: Setup 2d. We follow a similar approach as before,

and combine all data collected from Setup 2c into

a unified dataset for Setup 2d. Under this scenario, the clas-

sifier (Logistic Regression) again performs very well, with

AUC=0.93. Important features in this case are the number

of ads delivered to the desktops, the time of the experiment

in each desktop and the number of keywords. Interestingly,

and perhaps unexpectedly, the existence of Criteo tracker in

a landing page, is a feature appearing in the top-10 features.

5.4 Does incognito browsing help evade CDT?

Independent Personas: Setup 3a. In this final experimen-

tal setup, we investigate if it is possible for the user to apply

some basic countermeasures to avoid, or at least reduce the

possibility of CDT, by removing her browsing state in every

new session. For this, we perform experiments where the tra-

ditional tracking mechanisms (e.g., cookies, cache, browsing

history, etc.) are disabled or removed, emulating incognito

browsing. We select the first five personas from Table 1,

which had the most active ad-campaigns and appeared to be

promising due to the “online shopping” interest. Every desk-

top executed browsing in a stateless mode, while the mobile

in a stateful mode. For each persona, we collected data for

two runs, following the timeline of phases as in Setup 1a.

The distributions between mobile vs. paired desktop, as
well as mobile vs. baseline desktop, were found to be different
(highest p-value=0.034). Also, none of the ML classifiers
scored substantially higher than 0.7 (in any metric), and thus
we could not draw a clear conclusion. Specifically,
the highest AUC score for personas 1 and 2 was 0.70 with the
Random Forest classifier, and for personas 3 and 4
it was 0.73 with the Logistic Regression classifier. The worst
score, independent of algorithm, was recorded for persona
5, with AUC=0.57 and Precision/Recall scores under 0.50.
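To make the distribution comparison above concrete, the following is a minimal sketch of a two-sample significance test. It is not the statistical test used in our analysis: it swaps in a simple permutation test on the difference of means, purely for illustration. The function name `permutation_test` and the per-session ad counts are hypothetical and are not taken from our dataset.

```python
import random
from statistics import mean


def permutation_test(sample_a, sample_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the absolute difference of means.

    Illustrative stand-in only; not the test used in the paper.
    """
    rng = random.Random(seed)
    observed = abs(mean(sample_a) - mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign sessions to the two groups at random.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical per-session ad counts, invented for illustration only.
mobile_ads = [1, 2, 0, 3, 1, 2, 1, 0, 2, 1]
paired_desktop_ads = [4, 6, 5, 3, 7, 5, 4, 6, 5, 4]

p_value = permutation_test(mobile_ads, paired_desktop_ads)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value indicates that the two samples are unlikely to come from the same distribution; it does not by itself quantify how well the devices can be paired, which is why the classifier scores are reported separately.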

Figure 6: CDF of collected ads (left) and corresponding
keywords of the ads (right) per crawling session for all devices.
[Plot; y-axis: CDF of Sessions; x-axes: Number of Ads (left),
Number of Keywords (right); series: Paired PC, Baseline PC, Mobile.]

Combined Personas: Setup 3b. When the data from all five
personas are combined, the classifier performing best was

Logistic Regression, with AUC=0.79. Overall, these results
indicate that incognito browsing is only partially effective
at limiting CDT. That is, by removing the browsing state of a user
on a given device, the signal provided to the CDT entities is
reduced, but not fully removed. In fact, when the data from
various personas are combined, CDT is still somewhat
effective, since the paired devices have the same IP address.
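For readers less familiar with the AUC scores reported throughout this section, the sketch below computes AUC directly from its rank interpretation: the probability that a randomly chosen positive example receives a higher classifier score than a randomly chosen negative one. An AUC of 0.5 corresponds to chance and 1.0 to perfect separation. The function name `roc_auc`, the labels and the scores are hypothetical; this is not the evaluation code of Talon.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and classifier scores, invented for illustration.
example_labels = [1, 1, 0, 0]
example_scores = [0.9, 0.3, 0.4, 0.1]
print(roc_auc(example_labels, example_scores))
```

Under this reading, the AUC=0.57 recorded for persona 5 means the classifier ranks a positive example above a negative one only slightly more often than a coin flip would.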

6 Platform Validation

In this section we validate the representativeness of the data

collected from the previous experiments, by examining: (i)

the type and frequency of ads delivered in each device, and

(ii) the type and number of trackers that our personas were

exposed to. We compare the distributions of these quantities
with past works and data on real users, to assess whether our
synthetic personas successfully emulate real users’ traffic, and
whether our measurements of the CDT ad-ecosystem are realistic.

We first measure the frequency of ads delivered to our devices
in the experiment of §5.2, since it follows a well-crafted
timeline that is suitable for this kind of measurement. The
ads delivered to the three devices during these sessions are
shown in Figure 6 (left). For most sessions (∼90%), the
mobile device was exposed to fewer than five ads, since the
mobile version of websites typically delivers a smaller number
of ads, designed for smaller screens and devices. In
contrast, the desktop devices had a higher exposure to ads
than the mobile device. Also, the two desktops received
a similar number of ads (on average 2 to 4 ads on every
visit to the control pages). Similar observations can be made
for the keyword categories of the ads (Figure 6 (right)). The
ad-industry has reported that, on average, ∼300 ads are
displayed to desktop users every day [44, 18, 45, 23],
while it also recommends delivering 5 ads per mobile
domain [76]; these figures are proportionally consistent with
the number of ads we collected in our mobile and desktop sessions.
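The per-session statistics behind Figure 6 can be sketched as an empirical CDF over the number of ads collected in each crawling session. The helper names `empirical_cdf` and `fraction_below`, as well as the ad counts, are hypothetical and chosen only to illustrate how a statement such as "∼90% of sessions saw fewer than five ads" is read off the curve.

```python
def empirical_cdf(counts):
    """Return (value, fraction of sessions with count <= value) pairs."""
    n = len(counts)
    return [(v, sum(1 for c in counts if c <= v) / n)
            for v in sorted(set(counts))]


def fraction_below(counts, threshold):
    """Fraction of sessions with strictly fewer than `threshold` ads."""
    return sum(1 for c in counts if c < threshold) / len(counts)


# Hypothetical ads-per-session counts for a mobile device (illustration only).
mobile_ads_per_session = [0, 1, 1, 2, 2, 3, 3, 4, 4, 7]

for value, fraction in empirical_cdf(mobile_ads_per_session):
    print(f"<= {value} ads: {fraction:.0%} of sessions")
print(f"fewer than 5 ads: {fraction_below(mobile_ads_per_session, 5):.0%}")
```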

We also validate the representativeness of the data col-

lected from the experiments §5.2 and §5.3, by examining the

trackers appearing in the webpages visited by the personas.

We use the Disconnect List [29] to detect them and measure their
frequency of appearance (Figure 7). Of the trackers
detected in the set of persona pages, and using the list
provided by [88], 37% were found to be CDT related, including
both deterministic and probabilistic ones. In fact, the top CDT
trackers found in our data, which may perform both types
of CDT, include Google-owned domains, Facebook, Criteo,
Zopim, Bing and Advertising.com (AOL), and are in line with
the top CDT trackers found in [88, 17] (66% overlap of top-20
with [88] and 55% overlap with [17]). In addition, 17%
of these trackers are mainly focused on probabilistic CDT,
including Criteo, BlueKai, AdRoll, Cardlytics, Drawbridge
and Tapad, and each individual tracker is found in at least 2% of
the persona pages, again in line with the results in [88].

Figure 7: Top-20 trackers (grouped based on organization)
and their coverage in persona pages. For example, all the
Google-owned domains, such as Doubleclick, Googleapis,
Google-Analytics, are grouped under the “Google” label.
[Bar chart; axis: Occurrences (%); tracker labels: Google,
Facebook, Bing, Zopim, AppNexus, Advertising.com, Yahoo,
Casalemedia, Yandex, OpenX, Twitter, Pubmatic, Hotjar, Tapad,
Amazon, Outbrain, Drawbridge, Taboola, Linkedin (Microsoft).]
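The organization-level grouping used for Figure 7 can be sketched as follows: each tracker domain found in a page is mapped to its owning organization, and coverage is the percentage of persona pages in which at least one domain of that organization appears. The mapping `ORG_OF_DOMAIN`, the function `organization_coverage` and the example pages are hypothetical illustrations; the actual detection relies on the Disconnect List [29].

```python
from collections import defaultdict

# Illustrative domain-to-organization mapping (an assumption, not the real list).
ORG_OF_DOMAIN = {
    "doubleclick.net": "Google",
    "googleapis.com": "Google",
    "google-analytics.com": "Google",
    "criteo.com": "Criteo",
}


def organization_coverage(page_trackers):
    """Map each organization to the percentage of pages embedding it.

    `page_trackers` maps a page identifier to the set of tracker
    domains detected in that page.
    """
    pages_per_org = defaultdict(set)
    for page, domains in page_trackers.items():
        for domain in domains:
            # Unknown domains are kept as their own "organization".
            org = ORG_OF_DOMAIN.get(domain, domain)
            pages_per_org[org].add(page)
    total_pages = len(page_trackers)
    return {org: 100.0 * len(pages) / total_pages
            for org, pages in pages_per_org.items()}


# Hypothetical crawl result, invented for illustration only.
example_pages = {
    "a.example": {"doubleclick.net", "google-analytics.com"},
    "b.example": {"googleapis.com", "criteo.com"},
    "c.example": set(),
    "d.example": {"criteo.com"},
}
coverage = organization_coverage(example_pages)
print(coverage)
```

Counting pages rather than individual requests means that a page embedding several domains of the same organization contributes only once to that organization's coverage.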

7 Discussion & Conclusion

Through extensive experiments with the proposed framework
Talon, we were able to trigger CDT trackers into pairing
the emulated users’ devices. This allowed us to statistically
verify that CDT is indeed happening, and to measure its
effectiveness on different user interests and browsing behaviors,
independently and in combination. In fact, CDT was
prominent when user devices were trained to browse pages
of similar interests, reinforcing the behavioral signal sent to
CDT entities, and specifically when browsing activity was
related to online shopping, since those types of users seem to
be more targeted by advertisers. The CDT effect was further
amplified when the visited persona and control pages had
embedded CDT trackers, pushing the accuracy of detection
up to 99%. We also found that browsing in a stateless mode
showed a reduced, but not completely removed, CDT effect,
as incognito browsing somewhat obfuscates the signal sent
to the ad-ecosystem, but not the network access information.

Admittedly, our data collection was performed across relatively
short time periods, in comparison to the wealth of browsing
data that advertising networks have at their disposal. In fact,
we anticipate that CDT companies collect data about users
and devices for months or years, and even buy data from data
brokers, to be able to target users at even
higher rates. To that end, we believe that the high accuracies
self-reported by CDT companies (e.g., Lotame: >90% [56],
Drawbridge: 97.3% [31]) are possible.

Impact on user privacy: Undoubtedly, CDT infringes on
users’ online privacy and reduces their anonymity. But the
actual extent of this tracking paradigm and its consequences
to users, the community, and even to the ad-ecosystem itself,
are still unknown. In fact, since CDT is heavily dependent on
users’ browsing activity, and the ad-ecosystem employs such
collected data for targeting purposes, one major line of future
work is the study of targeting sensitive user categories (e.g.,
gender, sexual orientation, race, etc.) via CDT. This is
especially relevant nowadays with the enforcement of recent EU
privacy regulations such as GDPR [37] and ePrivacy [36].
This is where Talon comes into play, as it provides a concrete,
scalable and extensible methodology for experimenting with
different CDT scenarios, auditing its mechanics and measuring
its impact. In fact, the modular design of our methodology
allows studying CDT in depth, and supports new extensions
for studying the CDT ecosystem: new plugins, personas
and ML techniques. To that end, our design makes Talon
an enhanced transparency tool that can reveal potentially
illegal biases or discrimination from the ad-ecosystem.

Acknowledgments

The research leading to these results has received funding
from the European Union’s Horizon 2020 Research and
Innovation Programme under grant agreement No 786669

(project CONCORDIA), the Marie Sklodowska-Curie grant

agreement No 690972 (project PROTASIS), and the Defense

Advanced Research Projects Agency (DARPA) ASED Pro-

gram and AFRL under contract FA8650-18-C-7880. The

paper reflects only the authors’ views and the Agency and

the Commission are not responsible for any use that may be

made of the information it contains.

References

[1] ICDM 2015: Drawbridge Cross-Device Connections

- Data. https://www.kaggle.com/c/icdm-2015-

drawbridge-cross-device-connections/data,

2015.

[2] WebWire - Drawbridge Challenges Scientific Com-

munity to Better the Accuracy of Its Cross-Device

Consumer Graph. https://www.webwire.com/

ViewPressRel.asp?aId=198392, 2017.

[3] Measuring Cross-Device: The Methodology.

https://www.tapad.com/resources/cross-

device/measuring-cross-device-the-

methodology, 2018.

[4] Pew Research Center - Mobile Fact Sheet. http:

//www.pewinternet.org/fact-sheet/mobile/,

2018.

[5] ACAR, G., EUBANK, C., ENGLEHARDT, S., JUAREZ,
M., NARAYANAN, A., AND DIAZ, C. The web never

forgets: Persistent tracking mechanisms in the wild. In

Proceedings of the 2014 ACM SIGSAC Conference on

Computer and Communications Security, CCS ’14.

[6] ACAR, G., JUAREZ, M., NIKIFORAKIS, N., DIAZ, C.,
GÜRSES, S., PIESSENS, F., AND PRENEEL, B.

Fpdetective: Dusting the web for fingerprinters. In

Proceedings of the 2013 ACM SIGSAC Conference on

Computer & Communications Security, CCS ’13.

[7] ADBRAIN. Demystifying cross-device. Essential
reading for product management, business

development and business technology lead-

ers. https://www.iabuk.com/sites/

default/files/white-paper-docs/Adbrain-

Demystifying-Cross-Device.pdf, 2016.

[8] ADELPHIC. How cross-device identity matching

works. https://adelphic.com/how-cross-

device-identity-matching-works-part-1/,

2016.

[9] AGUIRRE, E., MAHR, D., GREWAL, D., DE RUYTER,
K., AND WETZELS, M. Unraveling the personalization
paradox: The effect of information collection and

trust-building strategies on online advertisement effec-

tiveness. Journal of Retailing 91, 1 (2015), 34–49.

[10] ANAND, T. R., AND RENOV, O. Machine learning approach
to identify users across their digital devices. In

IEEE International Conference on Data Mining Work-

shop (ICDMW) (2015), pp. 1676–1680.

[11] ARP, D., QUIRING, E., WRESSNEGGER, C., AND
RIECK, K. Privacy threats through ultrasonic side

channels on mobile devices. In IEEE European Sym-

posium on Security and Privacy (EuroS&P) (2017),

pp. 35–47.

[12] BASHIR, M. A., ARSHAD, S., ROBERTSON, W., AND
WILSON, C. Tracing information flows between ad exchanges
using retargeted ads. In 25th USENIX Security

Symposium (2016), pp. 481–496.

[13] BASHIR, M. A., FAROOQ, U., SHAHID, M., ZAFFAR,
M. F., AND WILSON, C. Quantity vs. quality: Evaluating
user interest profiles using ad preference managers.

In Proceedings of the Annual Network and Distributed

System Security Symposium (NDSS), San Diego, CA

(2019).

[14] BLEIER, A., AND EISENBEISS, M. Personalized online
advertising effectiveness: The interplay of what,

when, and where. Marketing Science 34, 5 (2015),

669–688.

[15] BOERMAN, S. C., KRUIKEMEIER, S., AND
ZUIDERVEEN BORGESIUS, F. J. Online behavioral
advertising: A literature review and research

agenda. Journal of Advertising.

[16] BRODER, A., FONTOURA, M., JOSIFOVSKI, V., AND
RIEDEL, L. A semantic approach to contextual advertising.
In Proceedings of the 30th Annual International

ACM SIGIR Conference on Research and Development

in Information Retrieval (2007).

[17] BROOKMAN, J., ROUGE, P., ALVA, A., AND YEUNG,
C. Cross-device tracking: Measurement and disclosures.
Proceedings on Privacy Enhancing Technologies,
2 (2017), 133–148.

[18] BRYCE SANDERS. Do we really see 4,000 ads a day?

https://www.bizjournals.com/bizjournals/

how-to/marketing/2017/09/do-we-really-

see-4-000-ads-a-day.html, 2017.

[19] CAO, X., HUANG, W., AND YU, Y. Recovering cross-device
connections via mining ip footprints with ensemble
learning. In IEEE International Conference on

Data Mining Workshop (ICDMW) (2015), pp. 1681–

1686.

[20] CAO, Y., LI, S., AND WIJMANS, E. (cross-)browser

fingerprinting via os and hardware level features. In

Proceedings of Network & Distributed System Security

Symposium (NDSS) (2017), Internet Society.

[21] CARRASCOSA, J. M., MIKIANS, J., CUEVAS, R.,
ERRAMILLI, V., AND LAOUTARIS, N. I always feel

like somebody’s watching me: measuring online be-

havioural advertising. In Proceedings of the 11th ACM

Conference on Emerging Networking Experiments and

Technologies (CONEXT) (2015).

[22] CAWLEY, G. C., AND TALBOT, N. L. On over-fitting

in model selection and subsequent selection bias in per-

formance evaluation. Journal of Machine Learning Re-

search 11 (2010).

[23] CHRISTOPHER ELLIOTT. Yes, there are too many
ads online. Yes, you can stop them. Here’s how.

https://www.huffingtonpost.com/entry/yes-

there-are-too-many-ads-online-yes-you-

can-stop_us_589b888de4b02bbb1816c297, 2017.

[24] CHUN, K. Y., SONG, J. H., HOLLENBECK, C. R.,
AND LEE, J.-H. Are contextual advertisements effective?
International Journal of Advertising 33, 2 (2014),

351–371.

[25] CRITEO. The State of Cross-Device Com-

merce. https://www.criteo.com/wp-content/

uploads/2017/07/Report-criteo-state-of-

cross-device-commerce-2016-h2-SEA.pdf,

2016.

[26] CRITEO. The 5 top attribution methodologies for

cross-channel roi. https://www.criteo.com/

insights/top-attribution-methodologies-

for-cross-channel-roi/, 2018.

[27] DATTA, A., TSCHANTZ, M. C., AND DATTA, A. Automated
experiments on ad privacy settings. Proceedings
on Privacy Enhancing Technologies 2015, 1 (2015),

92–112.

[28] DIAZ-MORALES, R. Cross-device tracking: Matching

devices and cookies. In 2015 IEEE International Con-

ference on Data Mining Workshop (ICDMW) (2015).

[29] DISCONNECT. Disconnect lets you visualize and block

the invisible websites that track your browsing history.

https://disconnect.me/, 2019.

[30] DOLIN, C., WEINSHEL, B., SHAN, S., HAHN, C. M.,
CHOI, E., MAZUREK, M. L., AND UR, B. Unpacking

perceptions of data-driven inferences underlying online

targeting and personalization. In Proceedings of the

2018 CHI Conference on Human Factors in Computing

Systems (2018), ACM, p. 493.

[31] DRAWBRIDGE. Cross-Device Consumer Graph.

https://go.drawbridge.com/rs/454-ORY-

155/images/Drawbridge-Cross-Device-

Consumer-Graph.pdf, 2015.

[32] DRAWBRIDGE. Drawbridge Cross-Device Con-

nected Consumer Graph Is 97.3% Accurate.

https://go.drawbridge.com/rs/454-ORY-

155/images/Drawbridge-Cross-Device-

Consumer-Graph.pdf, 2015.

[33] EASYLIST. Easylist is the primary filter list that re-

moves most adverts from international webpages, in-

cluding unwanted frames, images and objects. https:

//easylist.to/, 2018.

[34] ECKERSLEY, P. How unique is your web browser? In

Proceedings of the 10th International Conference on

Privacy Enhancing Technologies, PETS’10.

[35] ENGLEHARDT, S., AND NARAYANAN, A. Online

tracking: A 1-million-site measurement and analysis.

In Proceedings of the ACM SIGSAC conference on

computer and communications security (CCS) (2016),

pp. 1388–1401.

[36] EUROPEAN PARLIAMENT, COUNCIL OF THE EUROPEAN
UNION. Directive 2002/58/EC of the European
Parliament and of the Council of 12 July 2002 concerning
the processing of personal data and the protection of
privacy in the electronic communications sector (Directive
on privacy and electronic communications), 2002.

[37] Regulation (EU) 2016/679 of the European Parliament

and of the Council of 27 April 2016 on the protection

of natural persons with regard to the processing of per-

sonal data and on the free movement of such data, and

repealing Directive 95/46/EC (General Data Protection

Regulation). Official Journal of the European Union

L119 (2016), 1–88.

[38] FARAHAT, A., AND BAILEY, M. C. How effective is

targeted advertising? In Proceedings of the 21st ACM

International Conference on World Wide Web (2012),

pp. 111–120.

[39] GIRONDA, J. T., AND KORGAONKAR, P. K. iSpy?
Tailored versus invasive ads and consumers’ perceptions

of personalized advertising. Electronic Commerce Re-

search and Applications 29 (2018), 64–77.

[40] GOOGLE. Google Product Taxonomy. https:

//www.google.com/basepages/producttype/

taxonomy.en-US.txt, 2015.

[41] GOOGLE. Run apps on the Android Emula-

tor. https://developer.android.com/studio/

run/emulator/, 2018.

[42] GRACE, M. C., ZHOU, W., JIANG, X., AND
SADEGHI, A.-R. Unsafe exposure analysis of mobile
in-app advertisements. In Proceedings of the Fifth

ACM Conference on Security and Privacy in Wireless

and Mobile Networks, WISEC ’12.

[43] GUHA, S., CHENG, B., AND FRANCIS, P. Privad:

Practical privacy in online advertising. In Proceedings

of the 8th USENIX Conference on Networked Systems

Design and Implementation (2011).

[44] JON SIMPSON. Finding brand success in the

digital world. https://www.forbes.com/

sites/forbesagencycouncil/2017/08/25/

finding-brand-success-in-the-digital-

world/#734eaba626e2, 2018.

[45] JUSTIN MALLINSON. How many ads do we really see

each day? http://www.tcsmedia.co.uk/many-

ads-really-see-day/, 2018.

[46] KAFKA, P., AND MOLLA, R. Recode - 2017 was

the year digital ad spending finally beat TV. https:

//www.recode.net/2017/12/4/16733460/2017-

digital-ad-spend-advertising-beat-tv, 2017.

[47] KEJELA, G., AND RONG, C. Cross-device consumer

identification. In IEEE International Conference on

Data Mining Workshop (ICDMW) (2015), pp. 1687–

1689.

[48] KIM, M. S., LIU, J., WANG, X., AND YANG, W.

Connecting devices to cookies via filtering, feature en-

gineering, and boosting. In IEEE International Con-

ference on Data Mining Workshop (ICDMW) (2015),

pp. 1690–1694.

[49] KOROLOVA, A., AND SHARMA, V. Cross-app tracking
via nearby bluetooth low energy devices. In Proceedings
of the 8th ACM Conference on Data and

Application Security and Privacy (CODASPY) (2018),

pp. 43–52.

[50] LANDRY, M., CHONG, R., ET AL. Multi-layer classification:
ICDM 2015 Drawbridge cross-device connections
competition. In IEEE International Conference

on Data Mining Workshop (ICDMW) (2015), pp. 1695–

1698.

[51] LÉCUYER, M., DUCOFFE, G., LAN, F., PAPANCEA,
A., PETSIOS, T., SPAHN, R., CHAINTREAU, A., AND
GEAMBASU, R. Xray: Enhancing the web’s transparency
with differential correlation. In 23rd USENIX

Security Symposium (2014), pp. 49–64.

[52] LECUYER, M., SPAHN, R., SPILIOPOLOUS, Y.,
CHAINTREAU, A., GEAMBASU, R., AND HSU, D.
Sunlight: Fine-grained targeting detection at scale with
statistical confidence. In Proceedings of the 22nd ACM
SIGSAC Conference on Computer and Communications
Security (CCS) (2015).

[53] LERNER, A., SIMPSON, A. K., KOHNO, T., AND
ROESNER, F. Internet jones and the raiders of the lost

trackers: An archaeological study of web tracking from

1996 to 2016. In 25th USENIX Security Symposium

(2016).

[54] LEWIS, R. A., RAO, J. M., AND REILEY, D. H. Here,

there, and everywhere: correlated online behaviors can

lead to overestimates of the effects of advertising. In

Proceedings of the 20th ACM International Conference

on World Wide Web (2011), pp. 157–166.

[55] LIU, B., SHETH, A., WEINSBERG, U., CHANDRASHEKAR,
J., AND GOVINDAN, R. Adreveal: improving
transparency into online targeted advertising.

In Proceedings of the 12th ACM Workshop on Hot Top-

ics in Networks (2013), p. 12.

[56] LOTAME. Cross-Device ID Graph Accuracy: Method-

ology. https://www.lotame.com/cross-device-

id-graph-accuracy-methodology/, 2016.

[57] LOTAME. Cross-device. Bridging the gap between

screens. https://www.lotame.com/products/

cross-device/, 2018.

[58] MAVROUDIS, V., HAO, S., FRATANTONIO, Y.,
MAGGI, F., KRUEGEL, C., AND VIGNA, G. On the

privacy and security of the ultrasound ecosystem. Pro-

ceedings on Privacy Enhancing Technologies (2017).

[59] MAYER, J. R., AND MITCHELL, J. C. Third-party

web tracking: Policy and technology. In Proceedings

of the 2012 IEEE Symposium on Security and Privacy,

SP ’12.

[60] MCAFEE. Customer URL Ticketing System. https:

//www.trustedsource.org/, 2018.

[61] MCNAIR, C. Global Ad Spending Update.

https://www.emarketer.com/content/global-

ad-spending-update, 2018.

[62] MENG, W., DING, R., CHUNG, S. P., HAN, S., AND
LEE, W. The price of free: Privacy leakage in personalized
mobile in-apps ads. In NDSS (2016).

[63] NIKIFORAKIS, N., JOOSEN, W., AND LIVSHITS, B.

Privaricator: Deceiving fingerprinters with little white

lies. In Proceedings of the 24th International Confer-

ence on World Wide Web, WWW ’15.

[64] NIKIFORAKIS, N., KAPRAVELOS, A., JOOSEN, W.,
KRUEGEL, C., PIESSENS, F., AND VIGNA, G. Cookieless
monster: Exploring the ecosystem of web-based

device fingerprinting. In Proceedings of the 2013 IEEE

Symposium on Security and Privacy, SP ’13.

[65] OLEJNIK, L., MINH-DUNG, T., AND CASTELLUCCIA,
C. Selling off privacy at auction. In Network

and Distributed System Security Symposium (NDSS)

(2014).

[66] PACHILAKIS, M., PAPADOPOULOS, P., MARKATOS,
E. P., AND KOURTELLIS, N. No more chasing waterfalls:
A measurement study of the header bidding ad-ecosystem.
In 19th ACM Internet Measurement Conference (2019).

[67] PANCHENKO, A., LANZE, F., PENNEKAMP, J., ENGEL,
T., ZINNEN, A., HENZE, M., AND WEHRLE,

K. Website fingerprinting at internet scale. In NDSS

(2016).

[68] PAPADOPOULOS, E. P., DIAMANTARIS, M., PAPADOPOULOS,
P., PETSAS, T., IOANNIDIS, S., AND
MARKATOS, E. P. The long-standing privacy debate:

Mobile websites vs mobile apps. In Proceedings of

the 26th ACM International Conference on World Wide

Web (2017), pp. 153–162.

[69] PAPADOPOULOS, P., KOURTELLIS, N., AND
MARKATOS, E. Cookie synchronization: Everything

you always wanted to know but were afraid to ask.

In The World Wide Web Conference (2019), ACM,

pp. 1432–1442.

[70] PAPADOPOULOS, P., KOURTELLIS, N., RODRIGUEZ,
P. R., AND LAOUTARIS, N. If you are not paying for

it, you are the product: How much do advertisers pay to

reach you? In Proceedings of the ACM Internet Mea-

surement Conference (2017), pp. 142–156.

[71] PARRA-ARNAU, J., ACHARA, J. P., AND CASTELLUCCIA,
C. Myadchoices: Bringing transparency and

control to online advertising. ACM Transactions on the

Web (TWEB) 11, 1 (2017), 7.

[72] PATRICK HOLMES. Mobile and Desktop Ad-

vertising Strategies Based on User Intent.

https://instapage.com/blog/adwords-

search-device-user-intent, 2018.

[73] PROJECT, J. F. Automation for Apps. http://

appium.io/, 2018.

[74] RAMIREZ, E., OHLHAUSEN, M., AND MCSWEENY,

T. Cross-device tracking: An FTC staff report. Tech.

rep., 2017.

[75] RAZAGHPANAH, A., NITHYANAND, R., VALLINA-RODRIGUEZ,
N., SUNDARESAN, S., ALLMAN, M.,
KREIBICH, C., AND GILL, P. Apps, trackers, privacy

and regulators: A global study of the mobile tracking

ecosystem. Proceedings of the Annual Network and

Distributed System Security Symposium (NDSS), San

Diego, CA.

[76] RENE HERMENAU. Adsense max allowed number of

ads - 2018 Rules. https://wpquads.com/google-

adsense-allowed-number-ads/, 2018.

[77] ROESNER, F., KOHNO, T., AND WETHERALL, D.

Detecting and defending against third-party tracking on

the web. In Proceedings of the 9th USENIX Confer-

ence on Networked Systems Design and Implementa-

tion, NSDI’12.

[78] SELSAAS, L. R., AGRAWAL, B., RONG, C., AND
WIKTORSKI, T. Affm: Auto feature engineering in

field-aware factorization machines for predictive ana-

lytics. In IEEE International Conference on Data Min-

ing Workshop (ICDMW) (2015), pp. 1705–1709.

[79] TAPAD. Tapad device graph - creating a unified view of

the consumer. https://www.tapad.com/device-

graph/, 2018.

[80] TAPAD. The expert’s guide to cross-device conversion

& attribution. https://www.tapad.com/uses/the-

experts-guide-to-cross-device-conversion-

attribution, 2018.

[81] TERKKI, E., RAO, A., AND TARKOMA, S. Spying on

android users through targeted ads. In 2017 9th Inter-

national Conference on Communication Systems and

Networks (COMSNETS).

[82] TOUBIANA, V., NARAYANAN, A., BONEH, D., NISSENBAUM,
H., AND BAROCAS, S. Adnostic: Privacy

preserving targeted advertising. In Proceedings Net-

work and Distributed System Symposium (2010).

[83] TRAN, M. M.-D., AND CASTELLUCCIA, C. Betrayed

by your ads! reconstructing user profiles from targeted

ads. In The 12th (PETS 2012) Privacy Enhancing Tech-

nologies Symposium, Vigo, Spain (2012).

[84] TUCKER, C. E. Social networks, personalized adver-

tising, and privacy controls. Journal of Marketing Re-

search 51, 5 (2014), 546–562.

[85] WALTHERS, J. Learning to rank for cross-device

identification. In 2015 IEEE International Conference

on Data Mining Workshop (ICDMW) (2015), IEEE,

pp. 1710–1712.

[86] YAN, J., LIU, N., WANG, G., ZHANG, W., JIANG,
Y., AND CHEN, Z. How much can behavioral targeting

help online advertising? In Proceedings of the 18th

International Conference on World Wide Web (2009).

[87] YU, Z., MACBETH, S., MODI, K., AND PUJOL, J. M.

Tracking the trackers. In Proceedings of the 25th In-

ternational Conference on World Wide Web (2016),

WWW ’16, pp. 121–132.

[88] ZIMMECK, S., LI, J. S., KIM, H., BELLOVIN, S. M.,
AND JEBARA, T. A privacy analysis of cross-device

tracking. In 26th USENIX Security Symposium (2017),

pp. 1391–1408.