Tunneling for Transparency: A Large-Scale
Analysis of End-to-End Violations in the Internet
Taejoong Chung
Northeastern University
David Choffnes
Northeastern University
Alan Mislove
Northeastern University
ABSTRACT
Detecting violations of application-level end-to-end con-
nectivity on the Internet is of significant interest to re-
searchers and end users; recent studies have revealed
cases of HTTP ad injection and HTTPS man-in-the-
middle attacks. Unfortunately, detecting such end-to-
end violations at scale remains difficult, as it generally
requires having the cooperation of many nodes spread
across the globe. Most successful approaches have relied
either on dedicated hardware, user-installed software, or
privileged access to a popular web site.
In this paper, we present an alternate approach for
detecting end-to-end violations based on Luminati, an
HTTP/S proxy service that routes traffic through mil-
lions of end hosts. We develop measurement techniques
that allow Luminati to be used to detect end-to-end
violations of DNS, HTTP, and HTTPS, and, in many
cases, enable us to identify the culprit. We present re-
sults from over 1.2m nodes across 14k ASes in 172 coun-
tries, finding that up to 4.8% of nodes are subject to
some type of end-to-end connectivity violation. Finally,
we are able to use Luminati to identify and measure
the incidence of content monitoring, where end-host
software or ISP middleboxes record users’ HTTP re-
quests and later re-download the content to third-party
servers.
1. INTRODUCTION
End-user applications typically make the implicit as-
sumption of end-to-end connectivity when using the net-
work: that application-level data is delivered to the
destination application in unmodified form. However,
this assumption is often violated, either due to other
software running on the end-host (e.g., malware, ad
injectors) or specialized appliances deployed by ISPs
(e.g., middleboxes). Malware often modifies content
to steal user credentials or inject advertisements; mid-
dleboxes are deployed by ISPs for a variety of rea-
sons, including security (e.g., firewalls, anti-virus sys-
tems), content policies (e.g., content blockers), and per-
formance optimization (e.g., proxies, DNS interception
boxes, transcoders).
IMC 2016, November 14 - 16, 2016, Santa Monica, CA, USA
© 2016 Copyright held by the owner/author(s). Publication rights licensed to
ACM. ISBN 978-1-4503-4526-2/16/11...$15.00
DOI: http://dx.doi.org/10.1145/2987443.2987455
While the implementation and impact on users vary
widely, all of these cases represent violations of end-to-
end principles in Internet connectivity. These pieces
of software and middleboxes are especially concern-
ing when they monitor and manipulate users’ traffic,
such as when they intercept and replace DNS NXDO-
MAIN responses with advertisements or inject tracking
JavaScript code into web pages. Making this situation
more dire is the fact that these end-to-end violations are
often opaque to users; ISPs typically do not announce
the presence or function of middleboxes, nor do they
usually declare how traffic is monitored/manipulated.¹
Unfortunately, it is challenging to understand end-
to-end connectivity violations without access to devices
or users in affected networks. To address this chal-
lenge, there have been a number of successful prior ap-
proaches that entail deploying software [16,30] or hard-
ware [22,25] to enable network experiments, or leverag-
ing the vantage point of a popular Internet destination
to deploy custom web-based measurement code [13,31].
While these approaches have identified a number of dif-
ferent violations, they are typically difficult for others to
replicate: dedicated hardware and software approaches
are often difficult to scale as users must install their
hardware or software, and web-based approaches re-
quire privileged access to a popular web site, which few
researchers have. Thus, a scalable approach to quickly
enable researchers to measure end-to-end connectivity
violations remains an elusive goal.
¹ We consider all end-to-end violations, even those that users
opt into, due to their potential impact on protocols making end-
to-end assumptions.
Project            Nodes      ASes    Countries  Measurement Period  ICMP  DNS  HTTP  HTTPS
Our approach       1,276,873  14,772  172²       5 days              -     X    X     X
Netalyzr [16, 19]  1,217,181  14,375  196        6 years             X     X    X     X
BISmark [4, 22]    406        118     34         2 years             X     X    X     X
Dasu [30]          100,104    1,802   147        6 years             X     X    X     X
RIPE Atlas [25]    9,300      3,333   181        6 years             X     X    X     X
Table 1: Comparison of the approach presented in this paper to other, complementary approaches, based on number of
nodes, ASes, countries and different supported protocols. Not shown are web-based approaches [13,31] that require privileged
access to a popular web site. While we face a restricted set of protocols, our approach allows for a similar level of scalability
but much faster data collection.
In this paper, we explore an alternative approach to
detecting end-to-end connectivity violations in edge net-
works, which allows us to achieve measurements from
over 1m end hosts simultaneously without requiring
users to install our software or hardware. We lever-
age the commercial Peer-to-Peer (P2P)-based HTTP/S
proxy service, Luminati, which is based on the Hola
Unblocker browser plugin. Hola allows users to route
traffic via other peers in order to evade geo-blocking,
providing strong incentives for users to install it; the de-
velopers claim that over 91m users have installed Hola.
By using Luminati, we can route HTTP/S traffic via
many of the Hola nodes, and gain visibility into their
networks. As shown in Table 1, when compared to
alternate approaches, our approach allows for similar
scalability (over 1m nodes) but with much faster data
collection (in 5 days versus 6 years).
However, our approach faces several technical chal-
lenges and limitations. As the measurement is based on
an HTTP/S proxy service, we can only send DNS requests,
HTTP traffic (on port 80), and arbitrary traffic (on port
443). This makes it impossible to directly measure vi-
olations affecting other protocols such as SMTP; fortu-
nately, HTTP, HTTPS, and DNS represent three of the
most commonly used protocols and targets for end-to-
end connectivity violations. Additionally, we have no
control over the host’s network configuration, and only
limited visibility into network traffic (i.e., we can in-
duce the node to make a DNS request, but are unable
to observe the exact response the node receives). For ex-
ample, if a host uses OpenDNS, a public DNS resolver,
we must infer this by causing the host to make a DNS
request to a domain we control and then examining the
requests that arrive at our DNS server. Similarly, if a
host has malware that modifies outgoing requests, we
must infer this as well.
Overall, this paper makes four contributions: First,
we demonstrate how a large-scale HTTP/S proxy ser-
vice can be used to measure end-to-end connectivity vi-
olations in DNS, HTTP, and HTTPS. We develop tech-
niques that allow us, in most cases, to identify the party
responsible for the violations (e.g., the user’s DNS re-
solver, an ISP middlebox, software on the user’s ma-
chine). This allows researchers to conduct measure-
ments at the scale of approaches deployed by popular
web sites, and avoids the overhead of having to convince
users to install custom software or hardware.
² As described in Section 3.1, we use CAIDA’s AS-to-
organization mapping to determine the country of each AS.
Second, we deploy such measurements to over 1.2m
nodes across 14k ASes in 172 countries³ and investi-
gate numerous instances of end-to-end connectivity vi-
olations that result in content modification. With DNS,
we observe that 4.8% of nodes have their NXDOMAIN re-
sponses tampered with, often by the users’ ISPs who
direct them to pages containing ads. While this general
behavior was reported in previous studies from 2008
and 2011 [8, 36], we find different patterns of DNS ma-
nipulation. With HTTP, we find that 0.95% of nodes
suffer from HTML modification (e.g., ad injection or
content filtering) and 1.4% experience image transcod-
ing. Within days, we found new cases of content modifi-
cation that extend previous results that only applied to
the U.S. [37] or that required privileged access to a pop-
ular web site [23]. With HTTPS, we observe that 0.5%
of nodes suffer from certificate replacement (i.e., man-
in-the-middle attacks), typically from anti-virus soft-
ware or malware; these results are in-line with a recent
Facebook study [13].
Third, during the course of our measurements, we dis-
covered another unexpected end-to-end violation: con-
tent monitoring. Specifically, we observed unexpected
requests arriving at our measurement server, indicating
that the user’s application-level traffic was being mon-
itored, and that the content was refetched by a third
party. We found that this affected 1.5% of all nodes, and
that it is most commonly conducted by anti-virus soft-
ware and ISP-level middleboxes. These findings raise
significant security, privacy, and Internet freedom issues
for users subject to this monitoring.
Fourth, we make all of our analysis code and data
public to the research community at
https://tft.ccs.neu.edu
allowing other researchers to use a similar approach
to detect end-to-end connectivity violations in DNS,
HTTP, and HTTPS.
2. BACKGROUND
In this section, we provide background on Luminati,
the Hola Unblocker, and approaches to detecting end-
to-end violations.
³ Throughout the paper, we infer country-level information
based on where the networks are registered (see Section 3.1); thus,
our country-level statistics are measuring ASes, not users.
2.1 Large-scale network measurements
Conducting large-scale measurements to detect end-
to-end connectivity violations has been of significant in-
terest to researchers for many years. In general, mak-
ing such measurements requires the cooperation of ma-
chines in a variety of networks across the globe. As a
result, most prior approaches fall into one of two classes:
1. Dedicated hardware/software The first class of
approaches is based on having users deploy dedicated
hardware or explicitly run measurement software. For
example, the Netalyzr [16] project is a Java applet that
diagnoses network problems, and the Dasu [30] project
is a BitTorrent extension that tells users how their ISP
performs; these both also collect measurements for re-
searchers. Note that Netalyzr [16] has comparable cov-
erage [19] with our approach, but their coverage was
amassed over 6 years (instead of days in our case). Sim-
ilarly, the RIPE Atlas [25] and BISmark [22] projects
deploy dedicated hardware to a variety of networks, en-
abling researchers to send and receive traffic from differ-
ent vantage points. These projects and others discussed
in Section 8 have the benefit that they can generally
send arbitrary traffic. However, they can be difficult for
other researchers to emulate, as researchers must con-
vince users to install the software, or build and deploy
dedicated hardware.
2. Web-based measurements The second class of
approaches is based on injecting JavaScript or Flash
into Web pages that runs measurement code. This ap-
proach has been successfully employed by Google to mea-
sure the incidence of ad injection [31], and by Facebook
to measure the incidence of HTTPS certificate replace-
ment [13] (i.e., man-in-the-middle attacks). Due to the
popularity of these sites, these approaches can quickly
gather data from a large number of diverse users. Unfor-
tunately, these approaches are typically limited in the
protocols and destinations they can measure (due to
web browser sandboxing and the web security model),
and they require privileged access to a popular web site.
Our goal is to develop an approach that achieves the
best of both of these: allow researchers to conduct mea-
surements without having privileged access, and with-
out having to spend significant effort to develop soft-
ware or hardware for users to install.
2.2 Hola Unblocker
The Hola Unblocker (http://hola.org/) is a system
deployed by Hola Networks that allows users to route
traffic via a large number of proxies across the globe.
The software is provided in a number of different forms,
including a Windows application, a Firefox add-on, a
Chrome extension, and an Android application. Hola
claims [14] that more than 91 million people across the
globe installed the system.
Figure 1: Timeline of a request in Luminati: the client
connects to the super proxy and makes the request (1); the
super proxy makes a DNS request (2) and forwards the re-
quest to the exit node (3); the exit node makes a DNS request
if desired (4) and then requests the actual content (5). The
response is then returned to the super proxy (6) and finally
to the client (7).
When users install Hola, they have two options for
how to “pay” for access to the service:
1. Users can choose to pay $5 per month (or $45 per
year) for a “premium subscription.”
2. Users can choose to allow Hola to route traffic via
their machine, and then can use Hola for free.
If users choose the second (free) option, clients of Lumi-
nati (described below) are also allowed to route traffic
via the user’s machine. Through experimentation, we
found that not all Hola clients actually are available in
Luminati. In fact, if the user uses any version of the
Hola software other than the version for Windows or
Mac OS, the user is allowed to use Hola, but no traffic
is routed via the user’s machine. In the case of Win-
dows or Mac OS, a separate service is installed on the
user’s machine, and this service maintains a persistent
connection with the Hola servers. For more details on
this service and the security implications, we refer the
reader to the study by Vectra [34].
2.3 Luminati
Luminati is the paid HTTP/S proxy service that
routes traffic via Hola nodes.⁴ Clients of Luminati can
use an API to automate requests, as well as express
preferences over which Hola client will be selected to
route their traffic. Luminati clients are charged on a
per-GB basis, and all Luminati traffic is first routed via
a Hola server before being forwarded to a Hola user’s
client. Below, we provide more details about the Lumi-
nati service and the protocols and control that Luminati
clients are afforded.
⁴ The exception to this policy is traffic sent to a few domains
(e.g., Google); for these, traffic is forwarded directly from Hola
servers.
Architecture Once a client signs up with Lumi-
nati, they are given a username and password to
access the service. To route traffic via Luminati,
clients make a proxy connection to a Hola server
zproxy.luminati.org (called the super proxy); the su-
per proxy then forwards the client’s request to a Hola
client (called the exit node). The exit node then con-
nects to the server the client wishes to connect to, makes
the request, and returns the response back via the super
proxy. Thus, the Luminati client interacts only with the
super proxy. An overview is shown in Figure 1.
Exit node selection Luminati allows clients a measure
of control over which exit node is picked to forward the
traffic. First, the client is allowed to select the country
that the exit node is located in by adding a -country-
XX parameter to their username (where XX is the ISO
country code). Second, the client is allowed to con-
trol whether the same exit node is used for subsequent
requests by appending a -session-XXX parameter to
their username. For example, if the client wished to
make multiple requests via a single exit node, they select
a random number (say, 429) and append -session-429
to their username. Then, if the client makes another
request to Luminati within 60 seconds using that same
session number, it will be routed via the same exit node.
If the client instead picks a different session number,
Luminati will instead route via a new exit node.
DNS request location Luminati also allows clients
to control where the DNS resolution is performed (re-
call that HTTP proxy requests take the form of GET
http://foo.com). First, clients can request that the
DNS resolution be done by the super proxy (using
Google’s DNS service). As this is normally faster, this
is the default behavior. Second, clients can request that
DNS resolution be done by the exit node (using the exit
node’s DNS server). This is done by appending a -dns-
remote parameter to their username. Allowing the exit
node to make the DNS request enables the client to
observe any DNS localization that occurs based on re-
questing IP address (and, as we will show in Section 4,
to measure any DNS content manipulation that the exit
node experiences).
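The username options described above (country, session, and DNS location) can be sketched as a small helper. This is a minimal illustration, not Luminati's client library: the function name, the base username `user`, and the order in which the suffixes are concatenated are assumptions; only the `-country-XX`, `-session-XXX`, and `-dns-remote` suffixes come from the text.

```python
def build_proxy_username(base, country=None, session=None, dns_remote=False):
    """Append the option suffixes described in the text to a base username.

    The suffix order used here is an assumption made for illustration.
    """
    parts = [base]
    if country is not None:
        # ISO country code of the desired exit node.
        parts.append(f"-country-{country}")
    if session is not None:
        # Reusing the same session number within 60 seconds keeps the exit node.
        parts.append(f"-session-{session}")
    if dns_remote:
        # Ask the exit node, not the super proxy, to resolve the name.
        parts.append("-dns-remote")
    return "".join(parts)


# Hypothetical account name "user"; session number 429 as in the text.
print(build_proxy_username("user", country="us", session=429, dns_remote=True))
```

Keeping the session number fixed across two calls is what lets a pair of requests be attributed to the same exit node.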
Logging and debugging In the HTTP response
header (X-Hola-Timeline-Debug and X-Hola-
Unblocker-Debug) Luminati provides debugging
information about the request that is useful for un-
derstanding different events. Luminati includes a zID
parameter that represents a persistent unique identifier
for the exit node.⁵ Thus, by recording these zIDs, we
can measure if we are accessing the same Hola exit
node over long timescales, even if the exit node has
changed IP address.
⁵ We verified this zID parameter is static by installing Hola
on a machine we control and locating it with Luminati; the zID
parameter included in the Luminati response is the same one that
was located in the hola_svc.exe.cid file on our machine’s drive.
Luminati will also automatically “retry” requests with
additional exit nodes if the first request fails, up to five
times. If the request ultimately succeeds, the Lumi-
nati debugging response header will include the zIDs of
all exit nodes tried and why each request failed. This
behavior is useful, as if the user requested a specific
exit node be re-used for a subsequent request but that
exit node went offline during the request, the debugging
header will indicate that the first request failed but Lu-
minati automatically retried with a different exit node.
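As an illustration of how zIDs can be used as stable identifiers, the sketch below keeps a log of observed (zID, IP address) pairs and reports exit nodes seen under more than one address. The class and method names are hypothetical, and the step of extracting the zID from the debug headers is deliberately omitted because the header format is not described here.

```python
from collections import defaultdict


class ExitNodeLog:
    """Track which IP addresses each zID has been observed with."""

    def __init__(self):
        self.ips_by_zid = defaultdict(set)

    def record(self, zid, ip):
        """Store one observation; return True if this zID is new."""
        is_new = zid not in self.ips_by_zid
        self.ips_by_zid[zid].add(ip)
        return is_new

    def changed_ip(self):
        """Return zIDs that were observed with more than one IP address."""
        return sorted(z for z, ips in self.ips_by_zid.items() if len(ips) > 1)


log = ExitNodeLog()
log.record("z1", "192.0.2.1")      # documentation addresses, not real nodes
log.record("z1", "198.51.100.7")   # same node, new address
log.record("z2", "192.0.2.9")
print(log.changed_ip())
```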
HTTPS So far, we have described how the Luminati
HTTP proxy works. Luminati also allows requests to be
made over port 443. To do so, the client connects to the
super proxy and issues a CONNECT IP:443 request
to the super proxy. At this point, Luminati establishes a
TCP-level tunnel via the exit node to the destination IP
address. Luminati does not enforce that the client then
initiates a TLS handshake—at that point the client can
send any data it wishes—but Luminati only allows the
CONNECT command to connect to port 443. The upshot
of this behavior is that Luminati clients can collect the
SSL certificates observed by exit nodes by starting the
TLS handshake and requesting certificates.
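The certificate collection described above can be sketched with the Python standard library. This is a sketch under stated assumptions rather than our measurement code: the proxy port, the use of HTTP Basic proxy authentication, and both function names are assumptions, and only the leaf certificate is returned.

```python
import base64
import socket
import ssl


def build_connect_request(dest_ip, username, password):
    """Build an HTTP CONNECT request for port 443 (Basic auth is an assumption)."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return (
        f"CONNECT {dest_ip}:443 HTTP/1.1\r\n"
        f"Host: {dest_ip}:443\r\n"
        f"Proxy-Authorization: Basic {token}\r\n"
        "\r\n"
    ).encode()


def fetch_leaf_certificate(proxy_host, proxy_port, dest_ip, username, password,
                           server_name=None):
    """Open a tunnel via the proxy and return the DER-encoded leaf certificate."""
    with socket.create_connection((proxy_host, proxy_port), timeout=30) as sock:
        sock.sendall(build_connect_request(dest_ip, username, password))
        reply = b""
        while b"\r\n\r\n" not in reply:  # read the proxy's response headers
            chunk = sock.recv(4096)
            if not chunk:
                raise ConnectionError("proxy closed the connection")
            reply += chunk
        status_line = reply.split(b"\r\n", 1)[0]
        if b" 200" not in status_line:
            raise ConnectionError(status_line.decode(errors="replace"))
        ctx = ssl.create_default_context()
        ctx.check_hostname = False       # the goal is to record the certificate,
        ctx.verify_mode = ssl.CERT_NONE  # not to validate it
        with ctx.wrap_socket(sock, server_hostname=server_name) as tls:
            return tls.getpeercert(binary_form=True)
```

Disabling validation is intentional in this sketch: a replaced certificate is exactly the observation of interest, so the handshake must complete even when the chain would not verify.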
3. METHODOLOGY AND DATASET
In this section, we describe how we use Luminati to
collect data, detail the datasets we collected, and dis-
cuss the ethics of our measurement methodology.
3.1 Preliminaries
Throughout the paper, we look at exit nodes at the
Autonomous System (AS) level, the Organization (ISP)
level, and the country level. We map IP addresses to
ASes using data from RouteViews [26] taken at the same
time as our data collection. We map ASes to ISPs (as
one ISP may operate many ASes) using CAIDA’s AS-
organizations dataset [6]. We determine the country of
the ISP using the same dataset.
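The mapping chain above (IP address to AS, AS to ISP, ISP to country) can be illustrated as a longest-prefix match followed by two table lookups. The tables below are toy values built from documentation prefixes and private-use AS numbers; real inputs would come from the RouteViews and CAIDA datasets, whose file formats are not modeled here.

```python
import ipaddress

# Toy tables with hypothetical values, for illustration only.
PREFIX_TO_AS = {
    "192.0.2.0/24": 64500,
    "192.0.2.128/25": 64501,
    "198.51.100.0/24": 64502,
}
AS_TO_ORG = {64500: "ExampleNet", 64501: "ExampleNet", 64502: "OtherISP"}
ORG_TO_COUNTRY = {"ExampleNet": "US", "OtherISP": "DE"}


def ip_to_as(ip):
    """Return the AS announcing the longest prefix that covers ip, or None."""
    addr = ipaddress.ip_address(ip)
    best = None
    for prefix, asn in PREFIX_TO_AS.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, asn)
    return None if best is None else best[1]


def locate(ip):
    """Return (AS, organization, country) for ip, or None if unmapped."""
    asn = ip_to_as(ip)
    if asn is None:
        return None
    org = AS_TO_ORG.get(asn)
    return (asn, org, ORG_TO_COUNTRY.get(org))


print(locate("192.0.2.200"))
```

A linear scan is enough for a sketch; a full routing table would normally be loaded into a prefix trie.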
3.2 Selecting exit nodes
Hola nodes are unlikely to be representative of all
Internet nodes, so we try to crawl as many as is fea-
sible in order to obtain as representative a sample as
possible. While Luminati gives us some control over
which exit nodes are used to send traffic, it does not
allow us to enumerate all exit nodes. As a result, we
must iteratively request new exit nodes until we begin
seeing many of the exit nodes we have already seen be-
fore (as identified by the zID value). Thus, our data
collection methodology proceeds by picking a country
(in proportion to the number of exit nodes Luminati
reports in that country) and picking a random session
number. We repeat this procedure until the rate of new
exit nodes we discover drops significantly.⁶
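The crawling procedure above can be sketched as a loop that stops once the fraction of previously unseen zIDs in a window of requests falls below a threshold. The window size, the threshold, and the `request_exit_node` callback are assumptions made for illustration; the text only states that crawling stops when the discovery rate drops significantly.

```python
import random


def crawl(request_exit_node, country_weights, window=1000, min_new_rate=0.05,
          rng=None, max_requests=1_000_000):
    """Request exit nodes until few new zIDs appear; return the zIDs seen.

    request_exit_node(country, session) is a caller-supplied function that
    issues one request and returns the zID of the exit node that served it.
    """
    rng = rng or random.Random()
    countries = list(country_weights)
    weights = [country_weights[c] for c in countries]
    seen = set()
    new_in_window = 0
    for n in range(1, max_requests + 1):
        # Pick a country in proportion to its reported number of exit nodes.
        country = rng.choices(countries, weights=weights, k=1)[0]
        session = rng.randrange(10**9)  # fresh session, so a fresh exit node
        zid = request_exit_node(country, session)
        if zid not in seen:
            seen.add(zid)
            new_in_window += 1
        if n % window == 0:
            if new_in_window / window < min_new_rate:
                break  # discovery rate has dropped; stop crawling
            new_in_window = 0
    return seen
```

Because the network is dynamic, a stopping rule of this kind bounds the measurement cost without claiming that every exit node has been reached.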
3.3 Collected data sets
As each experiment requires a custom measurement
methodology, we defer describing the exact requests we
made to each exit node to the later sections. How-
ever, in Table 2, we present an overview of the number
of exit nodes and their distribution. We collected our
datasets between April 13 and April 18, 2016 for the
DNS and content monitoring studies, between April 14
and April 18, 2016 for the HTTPS study, and between
May 4 and May 8, 2016 for the HTTP study. In most
experiments, we are able to use over 650k exit nodes
in over 165 countries (the HTTP and HTTPS experi-
ments pose limitations that require us to study fewer
exit nodes and countries, respectively; these limitations
are described in those sections).
⁶ Note that the Luminati network is very dynamic, so there is
no point at which we have crawled “all” exit nodes.
            DNS      HTTP    HTTPS    Monitoring
            (§4)     (§5)    (§6)     (§7)
Exit Nodes  753,111  49,545  807,910  747,449
ASes        10,197   12,658  10,007   11,638
Countries   167      171     115      167
Table 2: Number of exit nodes, unique IP addresses, and
corresponding ASes and countries for each of the four ex-
periments in this paper.
3.4 Discussion
Ethics Our methodology brings up a few ethical mea-
surement issues, and we wish to discuss them explicitly
before presenting our results. We first note that we
paid the operators of Luminati for access to their proxy
service, and were careful to not violate their Terms of
Service. Additionally, using Luminati does not expose
any PII of the exit nodes’ users. We note the oper-
ators of the exit nodes agreed to allow Hola to route
Luminati traffic via their nodes in exchange for free ser-
vice⁷; these users have the ability to opt-out of such for-
warding either by subscribing to Hola (for $5/month)
or uninstalling the software. Regardless, we took great
care to ensure that our measurements would not harm
the users, either by us sending too much traffic or by
visiting any potentially sensitive domains. For each exit
node (identified by the zID parameter), we never down-
loaded more than 1 MB across all of our experiments.
Additionally, we only requested content or DNS lookups
from domains we created for the experiment, or from a
small number of select sites (the Alexa top 20 domains
in the user’s country and the top 10 U.S. university
domains). We believe that our methodology carefully
balances the potential harm to the operators of the exit
node with the scientific benefit of our results.
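The per-node traffic cap mentioned above can be enforced with a simple budget keyed by zID. This is a minimal sketch: the class name is hypothetical, and reading "1 MB" as 10^6 bytes is an assumption.

```python
from collections import defaultdict

MAX_BYTES_PER_NODE = 1_000_000  # assumption: "1 MB" interpreted as 10**6 bytes


class TrafficBudget:
    """Refuse downloads that would push an exit node past its byte limit."""

    def __init__(self, limit=MAX_BYTES_PER_NODE):
        self.limit = limit
        self.used = defaultdict(int)

    def try_spend(self, zid, nbytes):
        """Reserve nbytes for zid; return False if the limit would be exceeded."""
        if self.used[zid] + nbytes > self.limit:
            return False
        self.used[zid] += nbytes
        return True
```

Checking the budget before each request, across all experiments, is what makes the cap a per-node guarantee and not a per-experiment one.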
⁷ https://hola.org/legal/sla
Generality Before presenting our results, we briefly
discuss whether our network measurement techniques
can be applied to other VPN services. There are sev-
eral different types of VPN-like services offered today,
each with its own characteristics. Luminati is notable in
that it provides a very large number of exit IP addresses,
but only HTTP and HTTPS protocols are allowed.
Thus, our techniques are directly applicable to other
HTTP/HTTPS-based VPN services; however, services
that are implemented using a static set of centralized
servers are likely to be less interesting from a network
measurement perspective as they will likely cover only a
small set of networks. Additionally, we could extend our
methodologies for VPNs that allow arbitrary traffic to
be sent, enabling us to capture end-to-end connectivity
violations in protocols like SMTP; we leave exploring
this further to future work.
4. DNS NXDOMAIN HIJACKING
We begin our look into end-to-end connectivity viola-
tions by investigating the prevalence of modifying DNS
NXDOMAIN responses, which indicate that a given domain
name does not exist. This practice is often referred to
as “hijacking.” Previous work [8, 36] showed that ISPs
may hijack such responses to “assist” users by sending
them to a “search help” page (or one simply filled with
advertisements) instead of allowing the browser to show
a connection error to the user. As this practice can cre-
ate security vulnerabilities, break non-HTTP protocols,
and confuse users, it has generated significant contro-
versy [10,28]. We first introduce our methodology, then
describe our data set, and close by analyzing the causes
and prevalence of NXDOMAIN hijacking.
4.1 Methodology
At first glance, measuring NXDOMAIN hijacking seems
trivial: we could return an NXDOMAIN response to a Lu-
minati exit node and see if the super proxy reports the
error. However, in practice it is more difficult, because (a)
Luminati first checks that the requested domain name
exists at the super proxy before forwarding the request
to the exit node, and (b) returning an NXDOMAIN re-
sponse does not allow us to identify the IP address of
the exit node that receives it; rather, we only see the IP
address of the exit node’s DNS server. To address these
issues, we need to ensure that Luminati’s super proxy
check passes, and that we can reliably retrieve the exit
node’s IP address.
Thus, for each exit node we wish to measure, we first
select two unique domain names d1 and d2 for a domain
whose authoritative server we control. We then proceed
as follows, illustrated in Figure 2:
1. We configure our DNS server to always return a
valid A record pointing to our web server for d1.
We also configure our DNS server to return a valid
A record for d2, but only if the request comes from
Luminati’s super proxy’s DNS server (empirically
determined to be one of Google’s anycasted 8.8.8.8
DNS servers, located in 74.125.0.0/16). For all
other source IP addresses, our DNS server returns
NXDOMAIN. This is necessary to convince the super
proxy to forward the request to the exit node.⁸
2. We then request that the exit node fetch
http://d1. We record the IP address of the exit
node’s DNS server (from the incoming DNS re-
quest), the exit node’s IP address (from the in-
coming HTTP request), and the exit node’s zID
(from the headers in the Luminati response). This
allows us to establish the exit node’s IP address
for the subsequent NXDOMAIN test.
3. Using the same exit node, we request http://d2.
If we receive an NXDOMAIN error in the Luminati
log, we know the exit node received the correct
response. Otherwise, we record the content that
was served to the exit node for later analysis.
⁸ The careful reader will note that this prevents us from mea-
suring exit nodes that use the same anycasted Google DNS server;
we filter these out after measuring the exit node’s DNS server’s
IP address in step 2.
Figure 2: Timeline of measurement of NXDOMAIN hijacking:
the client connects to the super proxy and makes the request
(1); the super proxy makes a DNS request, our authoritative
DNS server returns an A record, and the request is forwarded
to the exit node (5); the exit node makes a DNS request
via its DNS server and our authoritative server returns an
NXDOMAIN response (6), (8). The error response (i.e., NXDOMAIN
is not hijacked) or the resulting content (i.e., NXDOMAIN is
hijacked) is returned to our client (9).
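The decision logic of the three steps above can be sketched as two pure functions: one for the authoritative DNS server's answer and one for interpreting the outcome of the d2 request. The function names, the example domain names, and the handling of names other than d1 and d2 are assumptions; the 74.125.0.0/16 range is the one reported in step 1.

```python
import ipaddress

# Source range of the super proxy's resolver, as reported in step 1.
SUPER_PROXY_RESOLVERS = ipaddress.ip_network("74.125.0.0/16")


def answer_for(qname, source_ip, d1, d2, web_server_ip):
    """Return ("A", ip) or ("NXDOMAIN", None) for one incoming DNS query."""
    if qname == d1:
        return ("A", web_server_ip)  # d1 always resolves
    if qname == d2 and ipaddress.ip_address(source_ip) in SUPER_PROXY_RESOLVERS:
        return ("A", web_server_ip)  # lets the super proxy's check pass
    return ("NXDOMAIN", None)        # everyone else sees a missing domain


def classify_d2_result(proxy_reported_nxdomain, resolver_ip):
    """Interpret step 3 for an exit node whose resolver was seen in step 2."""
    if ipaddress.ip_address(resolver_ip) in SUPER_PROXY_RESOLVERS:
        return "excluded"  # same resolver as the super proxy (footnote 8)
    return "not hijacked" if proxy_reported_nxdomain else "hijacked"


# Hypothetical names and addresses for illustration.
print(answer_for("d2.example.org", "192.0.2.53",
                 "d1.example.org", "d2.example.org", "203.0.113.80"))
```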
4.2 Results
We use this methodology to measure a total of
753,111 unique exit nodes from 167 countries and 10,197
ASes. We find that these exit nodes are configured to
use a total of 33,446 unique DNS servers. We observe
that 717,311 of the exit nodes (95.2%) do not expe-
rience NXDOMAIN hijacking, but the other 35,800 exit
nodes (4.8%) have their response intercepted.
We first obtain a macroscopic view of NXDOMAIN hi-
jacking phenomena by grouping exit nodes according
to country and AS, and focus on the groups where we
have at least 100 exit nodes. This sample size allows
us to draw strong inferences about the parties respon-
sible for the hijacking. We observe a number of inter-
esting trends: the exit nodes that experience hijacking
are widely spread across the globe, with only 262 (40%)
ASes and 15 (10%) countries having no exit nodes that
experience hijacking. However, we do observe instances where hi-
jacking is common among our exit nodes: we observe
that in 20 ASes, more than one-third of exit nodes expe-
rience it. Table 3 shows the top 10 countries sorted by
the fraction of exit nodes with hijacked DNS responses.
For example, we found that in Malaysia, more than 52%
of exit nodes we measured experienced hijacking.
For the rest of the section, we focus on individual
DNS servers; we therefore consider all DNS servers
where we observed at least 10 exit nodes, which leaves
us 9,839 (29.4%) DNS servers.
                       Exit nodes
Rank  Country    Hijacked  Total   Ratio
1     Malaysia   3,652     6,983   52.3%
2     Indonesia  3,178     8,568   37.1%
3     China      237       671     35.3%
4     U.K.       9,553     37,156  25.7%
5     Germany    4,703     19,076  24.7%
6     U.S.       6,108     33,398  18.3%
7     India      1,127     6,868   16.4%
8     Brazil     3,190     24,298  16.4%
9     Benin      90        716     12.6%
10    Jordan     76        1,117   7.7%
Table 3: Table showing the top 10 countries sorted by the
ratio of hijacked exit nodes.
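The grouping used for this macroscopic view can be sketched as follows: exit nodes are bucketed by country (or AS), groups with fewer than 100 exit nodes are dropped, and the remainder are ranked by the fraction hijacked. The function name and input format are assumptions, and the data in the usage example is synthetic.

```python
from collections import Counter


def hijack_ratio_by_group(observations, min_nodes=100):
    """observations: iterable of (group, hijacked) pairs, one per exit node.

    Returns (group, hijacked_count, total, ratio) rows for groups with at
    least min_nodes exit nodes, sorted by ratio in descending order.
    """
    total = Counter()
    hijacked = Counter()
    for group, was_hijacked in observations:
        total[group] += 1
        if was_hijacked:
            hijacked[group] += 1
    rows = [(g, hijacked[g], n, hijacked[g] / n)
            for g, n in total.items() if n >= min_nodes]
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


# Synthetic input: group C is dropped because it has fewer than 100 nodes.
obs = ([("A", i < 60) for i in range(100)]
       + [("B", i < 10) for i in range(200)]
       + [("C", True)] * 5)
for row in hijack_ratio_by_group(obs):
    print(row)
```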
4.3 Causes of hijacking
Below, we investigate the sources of DNS hijacking.
There are four locations where the request could be hi-
jacked that we can identify: the ISP’s DNS server, a
public DNS server, a middlebox, or end-host software.
4.3.1 ISP’s DNS server
If the exit node is using the ISP’s DNS server, then
this server may be configured to hijack responses and di-
rect users to an ISP-provided web server. This behavior
has been observed at a variety of ISPs like AT&T [11]
and Verizon [35]. If this were the case, we would expect
to see that most of the exit nodes that use an ISP’s
DNS server would experience hijacking.
Our first task is to identify ISP-provided DNS servers.
To do so, we group exit nodes by the DNS server that
we observed them to use. We identify ISP-provided
DNS servers as ones where all exit nodes and the DNS
server belong to the same ISP; this results in 9,584
ISP-provided DNS servers. For statistical significance,
we focus on those where we observe at least 10 exit
nodes using the DNS server; this represents 534 of these
servers. We then focus on those that are likely to be
hijacking responses by selecting those among the 534
where at least 90% of the exit nodes using the server
experienced hijacking; this represents 366 unique ISP-
provided DNS servers (3.8% of all ISP servers) covering
a total of 17,358 exit nodes.
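The three filters above (same ISP for the server and all of its exit nodes, at least 10 exit nodes, at least 90% hijacked) can be sketched as one pass over per-node observations. The function name and input format are assumptions; the thresholds are the ones stated in the text.

```python
from collections import defaultdict


def hijacking_isp_servers(observations, server_isp, min_nodes=10, min_ratio=0.9):
    """observations: iterable of (dns_server_ip, exit_node_isp, hijacked).

    server_isp maps each DNS server IP to the ISP that operates it.
    Returns the ISP-provided servers that appear to hijack NXDOMAIN responses.
    """
    by_server = defaultdict(list)
    for server, node_isp, hijacked in observations:
        by_server[server].append((node_isp, hijacked))
    flagged = []
    for server, nodes in by_server.items():
        isp = server_isp.get(server)
        # ISP-provided: the server and every exit node using it share one ISP.
        if isp is None or any(node_isp != isp for node_isp, _ in nodes):
            continue
        if len(nodes) < min_nodes:
            continue  # too few exit nodes to draw a conclusion
        ratio = sum(1 for _, hijacked in nodes if hijacked) / len(nodes)
        if ratio >= min_ratio:
            flagged.append(server)
    return sorted(flagged)
```

A server used by even one exit node from another ISP is treated as not ISP-provided in this sketch, which mirrors the strict "all exit nodes" wording above.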
In Table 4, we aggregate these 366 DNS servers
into 19 ISPs from 9 countries. We observe that the
majority of these ISPs and DNS servers are in the
U.S., and their behavior varies by ISP. For exam-
ple, we find that TMnet intercepts NXDOMAIN responses
and serves content that redirects the mistyped URL
to http://midascdn.nervesis.com. This product’s
tagline is “We turn users’ typing errors into your adver-
tising advantage”, clearly indicating that TMnet hijacks
NXDOMAIN responses for advertising revenue.
As another example, we found that five ISPs used
nearly identical JavaScript code in their hijacked re-
sponse HTML: Cox Communications, Oi Fixo, TalkTalk,
BT Internet, and Verizon. This code redirects users to
a web page managed by each ISP, which typically in-
cludes search results for the NXDOMAIN domain. The
common JavaScript code suggests that these ISPs are
using a common hardware device or software package
to implement the hijacking.
Country    ISP                      DNS Servers  Exit Nodes
Argentina  Telefonica de Argentina  14           276
Australia  Dodo Australia           21           1,404
Brazil     Oi Fixo                  21           2,558
           CTBC                     4            290
Germany    Deutsche Telekom AG      8            1,385
India      Airtel Broadband         9            735
           BSNL                     2            71
           Ntl. Int. Backbone       8            245
Malaysia   TMnet                    8            1,676
Spain      ONO                      2            71
U.K.       BT Internet              6            479
           Talk Talk                46           3,738
U.S.       AT&T                     37           561
           Cable One                4            108
           Cox Communications       63           1,789
           Mediacom Cable           6            219
           Suddenlink               9            98
           Verizon                  98           2,102
           WideOpenWest             1            39
Table 4: Table showing ISP DNS servers that hijack re-
sponses for more than 90% of exit nodes. Also shown is the
number of DNS servers and exit nodes per ISP.
In summary, NXDOMAIN hijacking by ISP DNS servers,
while rare overall, affects a wide variety of networks
globally and often causes users’ browsers to visit sites
with advertising. This raises serious concerns about pri-
vacy and deceptive business practices, as it is unclear
what additional information is exposed to third-party
advertisers, nor is it clear whether users have knowingly
opted-in to participate in such advertising.
4.3.2 Public/External DNS server
If the exit node is configured to use a public
DNS server external to the ISP—such as Google or
OpenDNS—then this server may be hijacking responses
(e.g., OpenDNS has been observed to do so [1]). Similar
to the ISP DNS server hijacking, if this were the case,
we would expect to see most of the exit nodes using a
public DNS server receiving hijacked responses.
To identify public DNS servers, we use the same
grouping by DNS server as in Section 4.3.1, and
disregard the DNS servers with fewer than 10 exit nodes
for statistical significance. We then identify public DNS
servers as ones where we observe exit nodes coming from
more than two countries.9 We find 1,110 such public
DNS servers, and we observe 21 (1.89%)10 of them to
be hijacking more than 90% of the exit node responses;
these 21 servers are used by a total of 1,512 exit nodes.
9 Interestingly, we are also able to measure when ISPs are
likely pointing their subscribers to public DNS servers. We
identify 91 ASes where at least 80% of their exit nodes are using
Google DNS services. For example, for AS 28683 (OPT Benin)
in Benin, we observe 225 exit nodes out of 227 (99.1%) using
Google DNS. These results align with a recent study [32] that
reported that 16.2% of African ASes’ DNS resolvers are located
outside of their network.
10 In fact, this fraction is in line with a previous study [8] from
2008, which queried most of the IPv4 space and reported that 2%
of public DNS servers hijack NXDOMAIN responses. However, they
Next, we take a closer look at the operators of these
21 servers to determine whether they are, in fact, public
DNS servers. Specifically, we (a) identify the owner of
the DNS server’s IP address based on the owner of its
BGP prefix and (b) issue DNS queries directly to each
DNS server to determine whether it responds. We find
four public DNS services we can identify: (1) Comodo
DNS,11 encompassing 9 DNS servers, (2) UltraDNS,
encompassing 4 DNS servers, (3) LookSafe, a piece of
malware that changes users’ DNS settings,12 encompassing
2 DNS servers, and (4) Level 3, encompassing 3 DNS
servers. From the remaining 3 public DNS servers, we
are unable to identify the operator.13
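
The public-server heuristic described above (at least 10 exit nodes, drawn from more than two countries) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the study's implementation; the tuple layout of the input is hypothetical.

```python
from collections import defaultdict


def public_dns_servers(observations, min_nodes=10, min_countries=3):
    """Return the set of resolvers that look public.

    observations: iterable of (dns_server, exit_node, country) tuples
    (a hypothetical layout). "More than two countries" is expressed
    as min_countries=3.
    """
    nodes = defaultdict(set)
    countries = defaultdict(set)
    for server, node, country in observations:
        nodes[server].add(node)
        countries[server].add(country)
    return {
        server
        for server in nodes
        if len(nodes[server]) >= min_nodes
        and len(countries[server]) >= min_countries
    }
```

The resulting set can then be passed through the same 90% hijacking threshold used for ISP-provided servers.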
4.3.3 ISP middleboxes and malware
If the exit node received a hijacked response but we
cannot attribute it to a DNS server, then there are two
other potential vectors: somewhere along the network
path (e.g., a transparent DNS proxy) or software on
the exit node itself (e.g., malware). In general, disam-
biguating these cases is difficult, as we have little visi-
bility into the network path or the software on the exit
node. However, we can get clues as to the source of the
hijacking by looking at the content of the HTML page
returned.
We first focus on exit nodes where we know the server
is not performing hijacking. Specifically, we focus on exit
nodes using Google’s 8.8.8.8 DNS service,14 which is
well known not to hijack responses. We observe 927
(0.12%) exit nodes that use Google’s DNS service and
yet still receive a hijacked response.
Next, we look into the content returned, extracting
the URL links that appear in the response. If we ob-
serve links to ISP-operated web sites, it is likely that
the hijacking is occurring either somewhere along the
path or due to ISP-provided software. If we instead ob-
serve links to known malware or advertising domains
(e.g., affiliate programs), it is likely that the hijacking
is occurring due to malware on the exit node.
Across these 927 exit nodes, we extract 119 URLs;
Table 5 presents all domains from the URLs that we
observed on at least 5 exit nodes. We observe a
number of interesting phenomena: First, we find that
12 URLs are from exit nodes in a small number of
ASes, and the URLs link to sites operated by the same
ISP. For example, all 80 exit nodes that received
content including http://navigationshilfe.t-online.de
are in AS 3320 (Deutsche Telekom).
Similarly, all 46 exit nodes whose content contains
http://error.talktalk.co.uk are in ASes 43234,
13285, and 9105, all of which belong to Talk Talk. Thus,
we can reasonably conclude that these ISPs are responsible
for intercepting the NXDOMAIN response from our DNS server.

URL                                  Exit Nodes  ASes
navigationshilfe.t-online.de                 80     1
www.webaddresshelp.bt.com                    73     1
v3.mercusuar.uzone.id                        53     1
error.talktalk.co.uk                         46     3
dnserros.oi.com.br                           40     2
dnserrorassist.att.net                       32     1
searchassist.verizon.com                     30     1
finder.cox.net                               17     1
ayudaenlabusqueda.telefonica.com.ar          16     1
google.dodo.com.au                           13     1
airtelforum.com                              14     1
nodomain.ctbc.com.br                          7     1
search.mediacomcable.com                      7     1
midascdn.nervesis.com                        68     1
-----------------------------------------------------
nortonsafe.search.ask.com                    25    18
securedns.comodo.com                          9     9
Table 5: Domains present in URLs from hijacked
NXDOMAIN responses served by Google’s DNS server.
The top 12 rows represent cases of likely ISP hijacking; the
bottom two rows (shaded) represent cases of likely anti-virus
software or malware.

focused only on open resolvers, whereas we are measuring DNS
resolvers that nodes are configured to use.
11 http://www.comodo.com/secure-dns/
12 http://www.spyware-techie.com/looksafe-removal-guide
13 In fact, we are not able to issue DNS resolution queries to
two of them, despite the fact that the exit nodes using them come
from more than 6 countries.
14 We look for DNS requests coming from Google’s published
netblocks.
Second, we also identify cases where we sus-
pect software on the exit node (anti-virus soft-
ware or malware) is performing the hijacking. For
example, 25 exit nodes received content including
http://nortonsafe.search.ask.com, which appears
alongside 5 other URLs containing “symantec” or
“norton” in 18 different ASes and in 18 different
countries. A similar pattern exists for content with the URL
http://securedns.comodo.com. Given the large num-
ber of ISPs affected, we can infer that this is likely due
to software at end hosts and not due to an ISP.
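
The attribution heuristic used in this subsection (extract the linked domains from each hijacked page, then see how many exit nodes and ASes each domain spans) can be sketched as follows. This is a simplified sketch, assuming links can be found with a regular expression; the input tuple layout and function names are hypothetical.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Simplified URL matcher; a full HTML parser would be more robust.
URL_RE = re.compile(r"""https?://[^\s"'<>]+""")


def link_domains(html):
    """Return the set of host names that appear in URLs inside html."""
    domains = set()
    for url in URL_RE.findall(html):
        try:
            host = urlparse(url).hostname
        except ValueError:   # malformed URL, e.g. unbalanced brackets
            continue
        if host:
            domains.add(host)
    return domains


def domain_spread(responses):
    """Map each domain to (number of exit nodes, number of ASes).

    responses: iterable of (exit_node, asn, html) tuples.
    """
    nodes, ases = defaultdict(set), defaultdict(set)
    for node, asn, html in responses:
        for domain in link_domains(html):
            nodes[domain].add(node)
            ases[domain].add(asn)
    return {d: (len(nodes[d]), len(ases[d])) for d in nodes}
```

A domain seen in only one or a few ASes points toward the ISP, whereas a domain spread over many ASes points toward software on the end host.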
4.4 Summary
In this section, we developed techniques to use Lu-
minati to explore the prevalence of and mechanisms
behind NXDOMAIN hijacking across the world. Overall,
we found that 4.8% of all exit nodes experienced such
hijacking. If we look at the sources, we can attribute
89.6% of the hijacking to ISP DNS servers, 7.7% of the
hijacking to public DNS servers, and 2.7% of the hijack-
ing to either ISP-provided software or malware.
This fraction of vantage points affected by this be-
havior is higher than was reported in 2008 [8], but is
also substantially lower than reported in a 2011 study;
in the latter, 24% of Netalyzr sessions experienced NX-
DOMAIN wildcarding [36]. We believe one key difference
is that we use six times more vantage points, and our
results may be somewhat less biased by users who run
Netalyzr because they suspect problems with their net-
work configuration.
5. HTTP CONTENT MODIFICATION
Another important type of end-to-end violation con-
sists of a third party modifying HTTP contents between
servers and clients. In this section, we use Luminati to
investigate how HTTP content is modified by focusing
on four types of content: (1) HTML, (2) images, (3)
JavaScript, and (4) CSS.
5.1 Methodology
Our methodology is relatively straightforward: we
simply fetch content from our Web server via an exit
node, and check whether the content we receive is the
same as what we sent. For this experiment, we fetch
four different pieces of content through each exit node:
a 9 KB HTML page, a 39 KB JPEG image, a 258 KB
un-minified JavaScript library, and a 3 KB un-minified CSS
file. We initially tried using very small files to minimize
the bandwidth consumption and load on exit nodes, but
found that when we fetched objects smaller than 1 KB, we
observed much lower levels of content modification.
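
The comparison step can be sketched as a digest check: the server records a hash of each object it serves, and the client compares the hash of what arrived through the exit node. This is a minimal sketch; the use of SHA-256 and the function names are assumptions, since the text only states that received content is compared with what was sent.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def modified_objects(expected: dict, received: dict) -> list:
    """Return the names of objects whose received bytes differ.

    expected: {object name: SHA-256 hex digest of the bytes we served}
    received: {object name: bytes fetched through the exit node}
    """
    return sorted(
        name
        for name, body in received.items()
        if sha256_hex(body) != expected[name]
    )
```

An exact digest comparison flags any change, including replacement by an error page, so flagged objects still need inspection to separate injection from blocking.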
Because the content modification tests require significantly
more bandwidth than the other tests, we use a
methodology that minimizes the network traffic. We
first measure three exit nodes in the same AS. If we
detect that at least one exit node in an AS experiences
content modification, we then return to that AS to mea-
sure more exit nodes to confirm whether the modifica-
tion is more likely due to the network service provider
or software running on an end host. This approach
may underestimate content modification that ASes ap-
ply non-uniformly, but should detect AS-level modifi-
cations that apply to most nodes, as well as provide
an estimate of the prevalence of modifications due to
end-host software.
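
The per-AS sampling strategy can be sketched as follows. This is an illustration only: the text says that "more" exit nodes are measured after a first detection without giving a number, so the `followup` parameter (and its default) is a hypothetical choice, as is the `is_modified` callback that stands in for one measurement.

```python
import random


def measure_as(nodes, is_modified, initial=3, followup=20, rng=random):
    """Measure a few exit nodes in one AS, and more if any is modified.

    nodes: exit-node identifiers in the AS
    is_modified: callable(node) -> bool, one content-modification test
    followup: extra nodes to measure after a detection (hypothetical value)
    """
    nodes = list(nodes)
    rng.shuffle(nodes)
    first, rest = nodes[:initial], nodes[initial:]
    results = {node: is_modified(node) for node in first}
    if any(results.values()):
        # Revisit the AS to tell ISP-level from end-host modification.
        for node in rest[:followup]:
            results[node] = is_modified(node)
    return results
```

The fraction of `True` values in the returned mapping then indicates whether the modification applies to most nodes in the AS or only to a few hosts.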
5.2 Measurement Results
Using this methodology, we measured 49,545 exit
nodes in 12,658 ASes across 171 countries. We detected
HTML content modification for 472 exit nodes
(0.95%), image modification for 694 (1.4%), JavaScript
modification for 45 (0.09%), and CSS modification for
11 (0.02%). We discuss each of these in detail below.
HTML We find that 472 exit nodes (0.95%) received
modified HTML pages, which is in line with previous
work [23]. We filter 32 cases that return pages such
as “bandwidth exceeded” or “blocked” messages, leav-
ing 440 exit nodes in 268 ASes. In all of these cases,
JavaScript code is injected into the HTML page.
To understand the source of injection, we group exit
nodes at the AS level and find the ratio of exit nodes
affected. We do this only for ASes where we measured
at least 10 exit nodes, leaving us with 272 exit nodes
spread across 65 ASes.
We find that only in one AS (discussed below) do
all nodes receive injected content and only in one other
do more than 10% receive injected content. Thus, for
exit nodes in the remaining ASes, the culprit is likely
software running on the exit node.

URL or Keyword                     Exit Nodes  Countries (ASes)
NetSparkQuiltingResult                     21   1 (1)
-------------------------------------------------------------
d36mw5gp02ykm5.cloudfront.net             201  44 (99)
msmdzbsyrw.org                             97   4 (76)
pgjs.me (fn. 14)                           16   1 (12)
jswrite.com/script1.js (fn. 15)            15   9 (10)
var oiasudoj;                              11   1 (11)
AdTaily_Widget_Container                   11   8 (9)
Table 6: The 7 most commonly appearing URLs (or
keywords) of injected JavaScript. The first row
(shaded) represents the keyword used for web filtering in AS
42925 (Internet Rimon ISP).

We observe that all exit nodes in AS 42925 (Internet
Rimon ISP) received modified HTML content.
Through manual inspection, we find that their page
sources share the same meta tag, NetsparkQuiltingResult.
We find that this tag is generated by NetSpark’s17 Web
filtering software; NetSpark shares the same parent
company as Internet Rimon.
For the remaining cases, we investigated the
JavaScript code injected into HTML content by man-
ually extracting URLs or keywords that characterize
the code. This process identified 21 URLs or key-
words from 416 exit nodes (94.5% of all injected con-
tent). Table 6 shows the most common of these.
Most injected content comes from malware. For ex-
ample, when the modified HTML contains class id
AdTaily_Widget_Container, an additional 335 KB of
advertisements are included. Similarly, if the injected
JavaScript includes a variable named oiasudoj, the
modified response size increases by 23 KB and the page
loads more than 170 ads.
Images We find that 694 exit nodes in 22 ASes re-
ceive modified images. As with the HTML analysis, we
first group exit nodes by AS, then calculate the fraction
of exit nodes per AS that are affected. We filter out
ASes with fewer than 10 exit nodes, yielding 604 exit
nodes in 12 ASes. Interestingly, via manual verification,
we observe that all 12 ASes correspond to mobile
ISPs. Since Luminati exit nodes are not attached to
mobile networks, we believe that most are temporarily
connected via tethering, enabling us to measure mobile
ISPs as well.18 We also found that in all cases, the im-
ages were compressed to lower quality levels, which is
consistent with recent prior work showing this behavior
in the US [37].
In addition to the US, our measurements reveal significant
image compression in other countries. To validate
that image compression is due to the ISP, we use two
approaches. First, we compare how many exit nodes
observe image compression in each AS. Table 7 shows
detailed results. Interestingly, we first notice that Vodafone
is present in four different countries, indicating
that they use compression in their networks globally.
We also observe that not every exit node experiences
compression. For example, only 6% of exit nodes in
AS 12844 received compressed content. We do not
currently have an explanation for this phenomenon, but it
may be due to different subscriber plans.

                                 Exit Nodes
AS      ISP (Country)       Mod.   Total  Ratio  Cmp.
15617   Wind Hellas (GR)      10      10   100%   53%
29180   Telefonica (GB)       17      17   100%   47%
29975   Vodacom (ZA)          83      88    94%     M
25135   Vodafone (GB)         15      18    83%   54%
36935   Vodafone (EG)         62      81    77%     M
36925   Meditelecom (MA)      87     128    68%   34%
16135   Turkcell (TR)         44      65    68%   54%
15897   Vodafone (TR)         14      25    56%   53%
12361   Vodafone (GR)         11      23    48%   52%
37492   Orange (TN)           97     331    29%   34%
132199  Globe (PH)           197   1,374    14%   51%
12844   Bouygues (FR)         34     615     6%   53%
Table 7: Exit nodes that received compressed images,
divided by ISP. Also shown in the final column is the
compression ratio observed (Cmp.); “M” indicates multiple
compression ratios were observed.

14 reported at http://www.freefixer.com/b/remove-pgjs-me-from-firefox-chrome-and-internet-explorer/
15 reported at https://www.herbiez.com/?p=218
17 http://www.netspark.com
18 Luminati also uses mobile devices running the Hola Mobile
App, but they become exit nodes only when they are on WiFi
and charging [15].
We also analyze the size of compressed images that
exit nodes received. The rationale behind this is that if
an image is compressed by the ISP, all exit nodes should see
similar sizes. Table 7 also shows the image compression
ratios for each AS. We find that different ISPs use different
compression ratios (suggesting different implementations
or settings), but that the ratio is consistent across all exit
nodes within an AS, except in two ASes. In these, we observe
two different compression ratios spread across many exit nodes.
These results strongly suggest that the ISP is respon-
sible for transparently compressing images, and that the
exit nodes generally see consistent compression. The
impact of image compression is potentially significantly
reduced image quality, which not only violates net neu-
trality principles but also reduces quality of experience
for affected users.
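
The per-AS consistency check on image sizes can be sketched as follows. This is a sketch under explicit assumptions: the text does not define how the compression ratio is computed, so here it is taken to be the received size as a percentage of the original size, rounded to a whole percent; the input layout is also hypothetical.

```python
from collections import defaultdict


def compression_by_as(samples, original_size):
    """Summarize image compression per AS.

    samples: iterable of (asn, received_size_in_bytes) tuples
    Returns {asn: ratio in percent}, or "M" when an AS shows
    multiple distinct ratios. The ratio definition (received size
    over original size) is an assumption.
    """
    ratios = defaultdict(set)
    for asn, size in samples:
        if size < original_size:   # only compressed images are counted
            ratios[asn].add(round(100 * size / original_size))
    return {
        asn: (next(iter(seen)) if len(seen) == 1 else "M")
        for asn, seen in ratios.items()
    }
```

A single ratio per AS is what one would expect if a network-level device performs the compression with one setting.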
JavaScript and CSS We observe that 45 exit nodes
received JavaScript content and 11 exit nodes received
CSS content that had been replaced by different content.
Manually inspecting these revealed they all consisted of
error pages or empty responses. We did not observe any
modification to the original content such as minification
or injection.
5.3 Summary
In this section, we examined how HTTP objects are
modified in-flight, finding significant modifications for
both HTML and image objects. Our findings complement
results from previous studies that looked into the
source of ad-injection by investigating Chrome
extensions/Windows binaries [31] or local ISP servers [38].
We also observed images to be modified only by mobile
ISPs, in line with other recent research [37]; however,
we present results from many additional ISPs and
countries.

Figure 3: Timeline of measurement of the middleboxes
man-in-the-middling: the client iteratively establishes a
TCP-level tunnel via the exit node to the three target servers
on port 443 (1) and fetches their certificates (2). If our
measurement client finds at least one of them to be modified, it
does the same procedure of (1) and (2) but fetches the
certificates from all target servers (3) and (4).

6. SSL CERTIFICATE REPLACEMENT
Next, we investigate the end-to-end violations in
HTTPS by looking for cases where a third party
intercepts an SSL/TLS connection and presents a certificate
that is different from the one provided by the server.
This attack, known as a man-in-the-middle (MITM)
attack, is typically conducted to allow inspection of
otherwise-encrypted content.
6.1 Methodology
To detect certificate replacement using Luminati, we
use the HTTP CONNECT method with the super proxy,
which tunnels TCP connections to port 443 via the exit
node, allowing our measurement client to conduct a TLS
handshake with arbitrary servers. For each site we measure,
we complete a TLS handshake and record the SSL cer-
tificates presented; we then terminate the connection
without actually requesting any content.
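
The tunneling step can be sketched with the standard library: open a TCP connection to the proxy, issue an HTTP CONNECT for port 443, run a TLS handshake over the tunnel, and record the leaf certificate. This is a minimal sketch, not the measurement client itself; it omits the proxy authentication and exit-node selection that a real Luminati session would need, it records only the leaf certificate rather than the full chain, and the function names are hypothetical.

```python
import socket
import ssl


def connect_request(host: str, port: int = 443) -> bytes:
    """Build an HTTP CONNECT request for host:port."""
    target = f"{host}:{port}"
    return f"CONNECT {target} HTTP/1.1\r\nHost: {target}\r\n\r\n".encode("ascii")


def tunnel_established(response_head: bytes) -> bool:
    """True if the proxy's status line carries a 2xx status code."""
    parts = response_head.split(b"\r\n", 1)[0].split()
    return len(parts) >= 2 and parts[0].startswith(b"HTTP/") and parts[1][:1] == b"2"


def fetch_leaf_certificate(proxy_host, proxy_port, host, port=443, timeout=10.0):
    """Return the DER-encoded leaf certificate seen through the proxy."""
    with socket.create_connection((proxy_host, proxy_port), timeout=timeout) as sock:
        sock.sendall(connect_request(host, port))
        head = b""
        while b"\r\n\r\n" not in head:
            chunk = sock.recv(4096)
            if not chunk:
                raise ConnectionError("proxy closed the connection")
            head += chunk
        if not tunnel_established(head):
            raise ConnectionError("proxy refused the tunnel")
        # Do not validate during the handshake: the goal is to record
        # whatever certificate is presented and check it offline.
        ctx = ssl.create_default_context()
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert(binary_form=True)
```

Disabling verification in the handshake is deliberate here, because a replaced certificate would otherwise abort the connection before it could be recorded.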
As certificate replacement may target individual web
sites, we choose three different classes of sites to test:
1. Popular sites We choose the 20 most popular
sites that support HTTPS from each country’s
Alexa Ranking [3].
2. International sites We choose the web sites of
10 U.S. universities where IMC’16 PC members
are affiliated.
3. Invalid sites We create three sites under our con-
trol with intentionally invalid certificates: a self-
signed certificate, an expired certificate, and a cer-
tificate having incorrect Common Name.
For each exit node we wish to measure, we conduct
a two-phase scan. First, as an initial phase, we select
one site randomly from each of these three groups. We
connect to these three sites via the exit node and then
download and verify the certificates. For the first two
classes of sites, we check for certificate replacement by
validating the certificate chain.19, 20 For the third class of
site, we check whether the invalid certificate matches
exactly (because we know exactly which certificate was
sent). Second, if any of these checks fail, we then
download certificates for all 33 sites via the exit node.
Figure 3 presents a diagram of our methodology.

Issuer Name       Exit Nodes  Type
Avast                  3,283  Anti-Virus/Security
AVG Technology           247  Anti-Virus/Security
BitDefender              241  Anti-Virus/Security
Eset SSL Filter          217  Anti-Virus/Security
Kaspersky                 68  Anti-Virus/Security
OpenDNS                   64  Content filter
Cyberoam SSL              35  Anti-Virus/Security
Sample CA 2               29  N/A
Fortigate                 17  Anti-Virus/Security
Empty                     14  N/A
Cloudguard.me             14  Malware
Dr. Web                   13  Anti-Virus/Security
McAfee                     6  Anti-Virus/Security
Table 8: The 13 most commonly appearing issuers of
replaced certificates. The types refer to anti-virus/security
products, content filters, and malware.

6.2 Measurement Results
Using this methodology, we measured a total of 807,910
exit nodes in 10,007 ASes and 115 countries.21 Among
these exit nodes, we find that 4,540 of them (0.56%)
received at least one modified certificate. Interestingly,
we find that not every certificate is modified, indicating
that certificates can be selectively replaced.
Looking at the AS distribution of these exit nodes,
we find the majority of nodes in all ASes do not expe-
rience certificate replacement (e.g., only 1.2% of ASes
have more than 10% of exit nodes experience replace-
ment). Since the observed certificate replacement does
not strongly depend on AS, we infer that the cause of
replacement is likely software on the exit nodes [13,20].
To investigate the source of the certificate replace-
ment, we focus on the Issuer Common Name of the cer-
tificates.22 We find 320 unique Issuer Common Names,
and Table 8 details the 13 groups that have at least five
exit nodes (these cover 93.6% of all exit nodes experienc-
ing certificate replacement). We manually investigated
these 13 issuers and found three primary causes: anti-
virus software, content filtering services, and malware.
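
The grouping step can be sketched as a count of distinct exit nodes per Issuer Common Name, keeping only issuers seen on at least five exit nodes. This is a simplified sketch: it groups by exact Common Name, whereas the 13 groups in Table 8 merge related names (for example, the several Avast issuers), and the input layout is hypothetical.

```python
from collections import defaultdict


def issuers_by_exit_nodes(replacements, min_nodes=5):
    """Rank issuers of replaced certificates by affected exit nodes.

    replacements: iterable of (exit_node, issuer_common_name) tuples
    Returns [(issuer, exit-node count), ...], most common first.
    """
    nodes = defaultdict(set)
    for node, issuer in replacements:
        nodes[issuer].add(node)   # a set, so each node counts once
    counts = {i: len(n) for i, n in nodes.items() if len(n) >= min_nodes}
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```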
19 We check the validity using openssl verify, configured to
trust the OS X 10.11 root store [21]; this includes 187 unique root
certificates.
20 We cannot do an exact match check on the certificate, as
many sites use content delivery networks and end up using
different certificates on different servers.
21 We measured fewer countries in this experiment than the
others as we were unable to get Alexa rankings for many of the
countries.
22 Leaf certificates are signed by a Certificate Authority (CA),
and the identity of the CA that signed the certificate is present in
the Issuer field. Thus, the Issuer field is likely to provide clues
about who is conducting the certificate replacement.
Anti-virus The most commonly occurring cause of
certificate replacement is anti-virus software (or other
types of security software) that appears to be running
on the exit node; these account for 9 of the top 13
issuers, which is in line with a previous study [13]. We
find that all of these appear to generate spoofed leaf
certificates on demand, suggesting that as part of the
install process, they installed a custom root certificate
into the browser’s or OS’s trust store to avoid browser
warnings.23 Interestingly, we find that, with the excep-
tion of Avast, each system uses the same public keys on
all certificates on a given exit node (i.e., every spoofed
certificate uses the same public key on the same exit
node running the software), which was not previously
reported [12].
We also observe that Cyberoam, ESET SSL Filter,
Kaspersky, McAfee, and Fortigate replace originally
invalid certificates with seemingly valid spoofed
certificates. In fact, the spoofed certificates for valid and
invalid sites share the same public key and other attributes
in the Issuer field such as name, organization, and country,
which strongly indicates that all of them are signed by the
same root certificate. Thus, unless these AV systems have some
other mechanism to alert users, their browsers would
not raise an alert for sites with invalid certificates,
potentially exposing users to security vulnerabilities like
phishing attacks. Avast,24 BitDefender, and Dr. Web
also generate new certificates for sites with invalid
certificates, but they do so using a different Issuer.
Content filter We observe one content filter,
OpenDNS, that replaces TLS certificates. OpenDNS
provides a service called Block Page and Block Page
Bypass25 that presents a block page (depending on the
network administrator’s list of blocked sites) over both
HTTP and HTTPS connections. To prevent browser
warnings from occurring on blocked pages, users
must install the “OpenDNS Root Certificate Authority
(CA)” certificate into their root store. In our
collected data, we find that OpenDNS uses this certificate
to MITM secure connections, but only if the server’s
certificate is valid (i.e., they do not replace certificates
that were originally invalid).
Malware Finally, we observe one piece of malware that
is present on 14 exit nodes: Cloudguard.me. We observe
that the replaced certificates copy most of the fields
from the original, valid certificate, presumably to make
them appear more legitimate to users. We also find
these exit nodes experience HTTP content injection,26
further evidence that Cloudguard.me is malware.
Interestingly, all of these exit nodes are in Russian ISPs.
23 Carné de Carnavalet and Mannan [12] investigated six of the
anti-virus systems (Avast, AVG, Eset, Kaspersky, Dr. Web) and
found that this is indeed the case.
24 Avast has multiple issuers, including: Avast! web/mail shield
root, Avast! web/mail shield self-signed root, Avast! web/mail
shield untrusted root, Avast trusted CA, Avast untrusted CA
25 https://support.opendns.com/entries/98279288-Block-Page-Errors-Installing-the-OpenDNS-Root-CA
26 http://www.spyware-techie.com/cloudguard-removal-guide

Figure 4: Timeline of measurement of the middleboxes
monitoring content: the client connects to the super proxy
and makes a request (1) for a unique domain; the proxy
forwards the request to the exit node (2), which makes the actual
request (3). If someone is monitoring the request (4), they later
make the same request to our server (5).

6.3 Summary
In this section, we examined how Luminati’s exit
nodes’ HTTPS connections are intercepted. While we
found that only 0.56% of all exit nodes experience
certificate replacement, it is an extremely serious
attack: when SSL is intercepted, users’ passwords, banking
details, and other sensitive information are exposed
to whomever is replacing certificates. Further, we
observe that even in the case of the “legitimate” uses of
certificate replacement (i.e., anti-virus and other secu-
rity products), many follow poor security practices by
sharing private keys and replacing invalid certificates
with ones the browser trusts.
7. CONTENT MONITORING
All of our analysis thus far has focused on end-to-end
violations where content was modified. Another con-
cerning form of end-to-end violation is content monitor-
ing, or cases where middleboxes are silently observing
content that users are downloading for the purpose of
scanning content or otherwise controlling access. While
content modification is easy to detect (e.g., via block
pages), content monitoring is significantly more diffi-
cult to detect, as there is (by definition) no change to
the content itself. However, we discovered we can de-
tect certain types of content monitoring based on unex-
pected requests arriving at our measurement server.
7.1 Methodology
Monitoring entity       Monitored users
Name           IPs      Exit nodes  ASes  Countries
Trend Micro     55           6,571   734         13
TalkTalk         6           2,233     5          1
Commtouch       20           1,154   371         79
AnchorFree     223             461   225         98
Bluecoat        12             453   162         64
Tiscali U.K.     2             363     6          1
Table 9: The top six ASes where unexpected
requests originated, indicating content monitoring.

In the previous experiments, recall that we generated
a unique domain name for each HTTP request. When
analyzing the results, we sometimes observed multiple
requests for a given domain, even though our client only
ever requested each domain once. We explored this
behavior in more detail by generating a unique per-exit-node
domain name d whose IP address pointed to our
web server. Then, we requested that the exit node fetch
http://d, which we expect to generate a single request
at our web server. We monitored the Web server for up
to 24 hours after generating the initial request to see if
additional requests for d arrived from different IP
addresses; if so, it would indicate that either a middlebox
along the path or software on the exit node monitored
the request and requested the content itself. Figure 4
presents a diagram of our methodology.
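
The detection logic can be sketched as two small helpers: one that mints a unique domain per exit node, and one that scans the web server log for requests for that domain that come from an address other than the exit node's within the 24-hour window. This is a minimal sketch; the zone name `measurement.example`, the log tuple layout, and the use of random UUIDs for uniqueness are all assumptions.

```python
import uuid
from datetime import datetime, timedelta


def unique_domain(zone="measurement.example"):
    """Mint a fresh per-exit-node domain under a zone we control.

    The zone name is a placeholder, not the one used in the study.
    """
    return f"{uuid.uuid4().hex}.{zone}"


def unexpected_requests(log, domain, exit_ip, start, window=timedelta(hours=24)):
    """Return (timestamp, source_ip) for requests not made by the exit node.

    log: iterable of (timestamp, host_header, source_ip) tuples
    start: datetime at which our client issued the initial request
    """
    return [
        (ts, ip)
        for ts, host, ip in log
        if host == domain and ip != exit_ip and start <= ts <= start + window
    ]
```

Because each domain is requested by our client exactly once, any entry returned here points to a party that observed the request and repeated it.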
7.2 Measurement Results
Using this methodology, we measured a total of
747,449 exit nodes, and observed that 11,234 (1.5%)
of them resulted in multiple, unexpected requests. In
general, it is challenging to determine the cause of the
unexpected requests as we have little visibility into the
network path or the software on the exit node. How-
ever, we are able to get clues by manually looking at
features of the unexpected request, including the User-
Agent field in HTTP request headers and the AS from
which the request came.
Overall, we found that the unexpected requests came
from a total of 424 unique IP addresses that were dif-
ferent from the exit nodes’; we grouped these by AS,
resulting in 54 groups. Table 9 provides more details
on the most frequent groups out of these 54; all to-
gether, these six sources generated 11,235 (94.0%) of
the unexpected requests. We also calculated the time
between the exit node’s request and the unexpected re-
quest; Figure 5 presents the cumulative distribution of
the delay for each of these six. We manually investi-
gated these requests and found two primary culprits:
anti-virus software and ISP-level services.
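
The delay analysis behind Figure 5 can be sketched as an empirical CDF over the time differences between the exit node's request and each unexpected request. This is an illustrative sketch with hypothetical function names; timestamps are taken to be plain seconds.

```python
def delays_seconds(exit_ts, unexpected_ts):
    """Sorted delays between the exit node's request and later requests."""
    return sorted(t - exit_ts for t in unexpected_ts)


def empirical_cdf(values):
    """Return [(x, fraction of values <= x), ...] in ascending x order."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]
```

Plotting the resulting points with a logarithmic x-axis reproduces the kind of view used to compare the six sources; a negative delay corresponds to an unexpected request that arrives before the exit node's own request.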
7.2.1 Anti-virus and VPN software
TrendMicro The most commonly observed source of
the unexpected requests came from IP addresses owned
by TrendMicro, an anti-virus software company; we
observed unexpected requests from 6,571 (47.8%) exit
nodes across 734 ASes and 13 countries. Interestingly,
we found that TrendMicro almost always makes two un-
expected requests: the first request typically arrives be-
tween 12 and 120 seconds after the exit node’s request,
while the second request typically arrives between 200
and 12,500 seconds after the exit node’s request. This
is immediately visible in Figure 5, with the two parts
of the distribution separated by a step at y=0.5. Due
to the wide distribution of ASes where exit nodes are
affected, we believe this is likely due to TrendMicro
monitoring software (called Web Reputation
Services [33]) running on the exit nodes.
Commtouch We observe a similar pattern with
Commtouch, the former name of CYREN Ltd., a software
company making anti-virus software. We observe
that 20 IP addresses linked to Commtouch make
unexpected requests for 1,154 exit nodes across 371 ASes
and 79 countries. We also observe that unlike TrendMicro,
Commtouch makes the unexpected requests between
1 and 10 minutes after the exit node’s request.

[Figure 5 plot: CDF (y-axis, 0 to 1) versus Request Interval
in seconds (x-axis, 0.001 to 10000); series: TrendMicro,
TalkTalk, Commtouch, AnchorFree, BlueCoat, Tiscali.]
Figure 5: Cumulative distribution of delay between the
exit node request and the additional, unexpected requests
for the top six sources. Note that the x-axis is in log scale.

Anchorfree The fourth-most-common source of ad-
ditional requests comes from IP addresses owned
by Anchorfree—a “freemium” VPN service for web
browsing—and covers 461 exit nodes. We can observe
that these exit nodes are using Anchorfree’s VPN ser-
vice, as the IP address of the exit node’s request does
not match the IP address of the exit node as reported
by Luminati (instead, the IP address is from Anchor-
free’s AS). However, with these nodes, we also observed
an additional request from a separate IP address in An-
chorfree’s network, suggesting that Anchorfree is mon-
itoring the content browsed. In fact, we observe that
the first request comes from one of 10 different loca-
tions around the globe, but the second request always
comes from Menlo Park, California. Additionally, the
two requests are extremely close in time; 99% of them
are separated by under 1 second. This is likely due to
their “malware protection” feature provided as part of
the “Hotspot Shield” service.
Bluecoat Finally, we observe requests from 12 IP ad-
dresses owned by Bluecoat Systems (a computer se-
curity company) covering 453 exit nodes in 162 ASes
across 64 countries. Bluecoat appears largely similar
to TrendMicro and Commtouch, with two unexpected
requests per exit node. However, we observe that the
first of these unexpected requests comes in before the
exit node’s request 83% of the time (this is why Blue-
coat’s CDF starts at 41%). Thus, it appears that Blue-
coat first downloads the content before allowing the exit
node’s request to proceed.
The behavior of these applications has significant se-
curity, privacy, and performance implications for end
users. Users’ web browsing history is being uploaded in
real-time to the anti-virus company’s servers, which are
then re-downloading the content fetched by the user.
Similarly, Anchorfree is duplicating users’ requests, and
Bluecoat is holding Web requests until it first fetches the
content on the user’s behalf for analysis.
7.2.2 ISP-level monitoring
TalkTalk The second-most-commonly observed source
of unexpected requests comes from six IP addresses
owned by TalkTalk (a U.K. ISP); these requests were
generated by 2,233 exit nodes. We also observe a sim-
ilar pattern to TrendMicro, in that TalkTalk typically
generates two unexpected requests, with the first re-
quest arriving almost exactly 30 seconds after the exit
node’s request and the second request arriving at our
measurement server over the next hour (this pattern
can be observed in Figure 5). Interestingly, we find
that all of the exit nodes that generated these
unexpected requests are in TalkTalk; we therefore believe
that these requests are due to ISP-level content moni-
toring. Adding further weight to this theory is the fact
that the 2,233 exit nodes in TalkTalk that generated un-
expected requests represent 45.2% of all the exit nodes
that we measured in TalkTalk.
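The two signals used above, a fixed delay before the first unexpected request and the share of an ISP's exit nodes that trigger such requests, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the one-second tolerance, and the sample delays are hypothetical. The total of roughly 4,940 measured TalkTalk nodes is derived here from the two reported figures (2,233 nodes and 45.2%) and is not stated in the text.

```python
from typing import Iterable


def monitored_share(measured: int, monitored: int) -> float:
    """Fraction of measured exit nodes in an ISP that triggered
    unexpected requests."""
    if measured <= 0:
        raise ValueError("measured must be positive")
    return monitored / measured


def near_delay(delays: Iterable[float], target: float = 30.0,
               tol: float = 1.0) -> float:
    """Fraction of delays (seconds between the exit node's request and
    the first unexpected request) within `tol` seconds of `target`."""
    values = list(delays)
    if not values:
        return 0.0
    hits = sum(1 for d in values if abs(d - target) <= tol)
    return hits / len(values)


# Total implied by the reported figures (an inference, not a reported value).
implied_total = round(2233 / 0.452)
print(implied_total)                                   # 4940
print(round(monitored_share(implied_total, 2233), 3))  # 0.452

# Made-up delays: three cluster at 30 seconds, one does not.
print(near_delay([29.9, 30.1, 30.0, 45.0]))            # 0.75
```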
Tiscali U.K. We also observe that another ISP, Tis-
cali U.K., has a similar pattern (in fact, Tiscali U.K.
was acquired by TalkTalk in 2009, but continues to be
run as a separate entity). We observe two IP addresses
in Tiscali U.K. that generated unexpected requests for
363 exit nodes (representing 11.4% of all Tiscali U.K.
exit nodes). Unlike TalkTalk, we observe only one un-
expected request; this request almost always comes in
exactly 30 seconds after the exit node’s request.
We cannot say for sure why some, but not all, exit
nodes in both ISPs experience content monitoring. Po-
tential explanations are that content monitoring could
be done non-deterministically (e.g., only 10% of re-
quests are monitored), or it may be due to ISP-provided
additional services like parental content controls.27 Re-
gardless, monitoring and its potential for controlling
open access to content has significant implications for
Internet users, and should be made transparent.
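To make the first explanation concrete, suppose (purely as an illustrative assumption) that each HTTP request is monitored independently with probability p, and that n requests are sent through a given exit node. The probability of observing monitoring at that node at least once is then

```latex
\[
  \Pr[\text{monitoring observed}] = 1 - (1 - p)^{n}
\]
```

With p = 0.1, a single request would reveal monitoring 10% of the time, while five requests would do so about 41% of the time. Under this hypothesis, the observed share of monitored exit nodes would depend on how many requests each node carried.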
7.3 Summary
In this section, we developed and deployed techniques
that can detect certain instances of content monitoring.
We found that over 1.5% of all exit nodes suffered from
their HTTP requests being collected and re-requested
by a third party, and that the most common causes
were anti-virus software products, VPN services, and
users’ ISPs. All of these instances have significant
privacy implications, as these users are likely unaware
that their HTTP browsing history is being duplicated
in near-real-time.
27 For example, TalkTalk’s opt-in SuperSafe feature
(https://help2.talktalk.co.uk/supersafe-boost-overview).
8. RELATED WORK
Before concluding, we provide an overview of related
work on end-to-end connectivity violations (we dis-
cussed other measurement approaches in Section 2.1).
DNS Manipulation Because DNS provides no built-
in security, DNS traffic has been the vector for a large
number of different attacks [24, 27, 29]. In parallel, other
work has explored how different DNS resolvers or ISPs
manipulate DNS traffic. Kührer et al. [17] classified
open DNS resolvers using fingerprints of DNS software
and found that millions of the resolvers deliberately ma-
nipulated DNS resolutions and returned unexpected IP
address information. Dagon et al. [8] found in 2008 that
2.4% of DNS queries to open DNS servers are returned
with incorrect answers; Weaver et al. [36] used Netalyzr
data in 2011 to perform a similar study, finding up to
24% of responses manipulated.
Our NXDOMAIN hijacking study shares many goals with,
and is complementary to, these previous studies, but with the
following key differences. First, using Luminati allows
us to measure in-use DNS servers, rather than having to
scan for open resolvers as some approaches have done.
Second, our approach provides measurements at greater
scale and in less time than previous work; the Netalyzr
data set, while incredibly useful, took months to years
to be created; we are able to measure a similar number
of nodes in a matter of days. Finally, our results present
a new look at NXDOMAIN hijacking, as the two previous
studies are now both over five years old.
HTTPS content manipulation SSL and TLS secure
a large portion of the Internet’s traffic today; together
with a public key infrastructure, they provide authenti-
cation and encryption. However, the increasing fraction
of HTTPS traffic has led to renewed interest in trying to
gain visibility into such encrypted traffic, typically via
man-in-the-middle (MITM) attacks [7]. A series of
studies [2, 5, 12, 18] focuses on how MITM
attacks are conducted in different scenarios, including
mobile networks [18], using invalid certificates [5], when
authentication protocols are tunneled [2], or via anti-virus
software [12]. Recently, Carné de Carnavalet and
Mannan [12] analyzed eight commercial anti-virus and
parental-control applications that interpose a TLS proxy
in end hosts’ communications.
Our results complement theirs; while they were con-
cerned with understanding how anti-virus applications
work, we demonstrated how wide-spread they are. Ad-
ditionally, Huang et al. [13] studied SSL MITM attacks
by injecting a Flash object into Facebook’s Web pages.
Similar to our results, they find that 0.2% of hosts re-
ceive forged certificates. In comparison, we were able to
measure a similar number of users, but did so without
requiring access to a popular Web site.
HTTP content manipulation Because HTTP by it-
self has no integrity checks, violations of the end-to-end
connectivity of HTTP traffic have been occurring for
many years. Much of this has occurred due to the use
of proxies in both broadband [9, 16] and mobile
networks [37]. For example, the Netalyzr project [16]
revealed HTTP proxies by monitoring request and re-
sponse headers, and also used this to identify proxy
caching policies and content transcoding.
Other forms of HTTP tampering are more pernicious.
Recently, Thomas et al. [31] found that more than 5%
of unique daily IP addresses accessing Google experi-
ence ad injection due to malicious Chrome extensions
or Windows binaries. Also, Zhang et al. [38] used software
installed on users’ Microsoft computers to identify
nine ISPs in the U.S. that redirect users to rogue servers
that serve modified content. In mobile networks, Xu
et al. [37] recently investigated transparent proxies and
identified content manipulation and caching behavior
by studying four major mobile carriers in the US.
Our approach to measuring HTTP end-to-end con-
nectivity violations largely complements these. Our re-
sults show similar patterns of malicious software and
web proxies, but our approach allows us to examine this
behavior for millions of hosts in networks worldwide.
9. CONCLUSION
In this paper, we proposed a new approach to measur-
ing end-to-end connectivity violations in DNS, HTTP,
and HTTPS, based on the Luminati proxy service. We
developed techniques to detect content manipulation
in all three protocols, and used these techniques
to measure over 1.2m hosts across 14k ASes in
172 countries. Our results, at various points, confirm
prior findings, update prior studies with new measure-
ments, and reveal new ways in which content manipu-
lation is leading to security vulnerabilities. As part of
our study, we also identified a new content monitoring
attack, where the URLs of users’ HTTP requests are uploaded
to third-party servers, which later unexpectedly
fetch the same content.
We demonstrated that our methodology allows researchers
to quickly conduct measurement studies that previously
required months to years of user-recruitment effort or
privileged access to a popular Web site. This opens
the door to continuous measurements worldwide, with
the ability to see how various types of violations evolve
over time. We believe this will be useful not only for
improving transparency, privacy, and security, but also
for informing regulators and policymakers.
Acknowledgments
We thank the anonymous reviewers and our shepherd,
Amogh Dhamdhere, for their helpful comments. This
research was supported in part by NSF grants CNS-
1421444 and CNS-1563320.
10. REFERENCES
[1] A new reason to love OpenDNS: No more ads.
https://www.opendns.com/no-more-ads/.
[2] N. Asokan, V. Niemi, and K. Nyberg.
Man-in-the-middle in Tunnelled Authentication
Protocols. Security Protocols, Springer, 2005.
[3] Alexa Top 500 Global Sites.
http://www.alexa.com/topsites.
[4] BISMark Network Dashboard.
http://networkdashboard.org/.
[5] F. Callegati, W. Cerroni, and M. Ramilli.
Man-in-the-Middle Attack to the HTTPS
Protocol. IEEE Security and Privacy, 7(1), 2009.
[6] CAIDA AS Organizations Dataset.
http://www.caida.org/data/as-organizations/.
[7] D. Dolev and A. C. Yao. On the Security of
Public Key Protocols. IEEE Transactions on
Information Theory, 29(2), 1983.
[8] D. Dagon, C. Lee, W. Lee, and N. Provos.
Corrupted DNS Resolution Paths: The Rise of a
Malicious Resolution Authority. NDSS, 2008.
[9] G. Detal, B. Hesmans, O. Bonaventure, Y.
Vanaubel, and B. Donnet. Revealing Middlebox
Interference with Tracebox. IMC, 2015.
[10] T. Dziuba. When ISPs hijack your rights to
NXDOMAIN. The Register, 2009.
http://www.theregister.co.uk/2009/08/17/dzuiba_virgin_media_opendns/.
[11] DNS Error Assist.
http://dnserrorassist.att.net.
[12] X. de Carné de Carnavalet and M. Mannan.
Killed by Proxy: Analyzing Client-end TLS
Interception Software. NDSS, 2016.
[13] L.-S. Huang, A. Rice, E. Ellingsen, and C.
Jackson. Analyzing Forged SSL Certificates in the
Wild. IEEE S&P, 2014.
[14] Hola. https://hola.org.
[15] Hola. Personal Communication.
[16] C. Kreibich, N. Weaver, B. Nechaev, and V.
Paxson. Netalyzr: Illuminating the Edge Network.
IMC, 2010.
[17] M. Kührer, T. Hupperich, J. Bushart, C. Rossow,
and T. Holz. Going Wild: Large-Scale
Classification of Open DNS Resolvers. IMC, 2015.
[18] U. Meyer and S. Wetzel. A Man-in-the-middle
Attack on UMTS. WISE, 2004.
[19] Narseo Vallina-Rodriguez. Personal
Communication.
[20] M. O’Neill, S. Ruoti, K. Seamons, and D.
Zappala. POSTER: TLS Proxies: Friend or Foe?
CCS, 2014.
[21] OS X El Capitan: List of available trusted root
certificates.
https://support.apple.com/en-us/HT205204.
[22] Project BISmark. http://projectbismark.net.
[23] C. Reis, S. D. Gribble, T. Kohno, and N. C.
Weaver. Detecting in-flight Page Changes with
Web Tripwires. NSDI, 2008.
[24] V. Ramasubramanian and E. G. Sirer. Perils of
Transitive Trust in the Domain Name System.
IMC, 2005.
[25] RIPE NCC Annual Report 2015.
https://www.ripe.net/publications/docs/ripe-665.
[26] University of Oregon RouteViews project.
http://www.routeviews.org/.
[27] J. Schlamp, J. Gustafsson, M. Wählisch, T. C.
Schmidt, and G. Carle. The Abandoned Side of
the Internet: Hijacking Internet Resources When
Domain Names Expire. Int. Workshop on Traffic
Mon. and An., 2015.
[28] R. Singel. ISPs’ Error Page Ads Let Hackers
Hijack Entire Web, Researcher Discloses. WIRED,
2008.
https://www.wired.com/2008/04/isps-error-page.
[29] S. Son and V. Shmatikov. The hitchhiker’s guide
to DNS cache poisoning. Security and Privacy in
Communication Networks, Springer, 2010.
[30] M. A. Sánchez, J. S. Otto, Z. S. Bischof, D. R.
Choffnes, F. E. Bustamante, B. Krishnamurthy,
and W. Willinger. Dasu: Pushing Experiments to
the Internet’s Edge. NSDI, 2013.
[31] K. Thomas, E. Bursztein, C. Grier, G. Ho, N.
Jagpal, A. Kapravelos, D. McCoy, A. Nappa, V.
Paxson, P. Pearce, N. Provos, and M. A. Rajab.
Injection at Scale: Assessing Deceptive
Advertisement Modifications. IEEE S&P, 2015.
[32] R. Fanou, G. Tyson, P. Francois, and A.
Sathiaseelan. Pushing the Frontier: Exploring the
African Web Ecosystem. WWW, 2016.
[33] TrendMicro Web Reputation Services.
http://esupport.trendmicro.com/solution/en-US/1058991.aspx.
[34] Vectra Threat Labs. Technical analysis of Hola.
http://blog.vectranetworks.com/blog/technical-analysis-of-hola.
[35] Verizon Search Assist.
http://searchassist.verizon.com.
[36] N. Weaver, C. Kreibich, B. Nechaev, and V.
Paxson. Implications of Netalyzr’s DNS
Measurements. SATIN, 2011.
[37] X. Xu, Y. Jiang, T. Flach, E. Katz-Bassett, D.
Choffnes, and R. Govindan. Investigating
Transparent Web Proxies in Cellular Networks.
PAM, 2015.
[38] C. Zhang, C. Huang, K. W. Ross, D. A. Maltz,
and J. Li. Inflight Modifications of Content: Who
Are the Culprits? LEET, 2011.