
real-time to the anti-virus company’s servers, which
then re-download the content fetched by the user.
Similarly, Anchorfree is duplicating users’ requests, and
Bluecoat is holding Web requests until it first fetches
the content on the user’s behalf for analysis.
7.2.2 ISP-level monitoring
TalkTalk The second most commonly observed source
of unexpected requests is a set of six IP addresses
owned by TalkTalk (a U.K. ISP); these requests were
triggered by traffic from 2,233 exit nodes. We also
observe a pattern similar to that of TrendMicro, in that
TalkTalk typically generates two unexpected requests,
with the first request arriving almost exactly 30 seconds
after the exit node’s request and the second request
arriving at our measurement server at some point over
the next hour (this pattern can be observed in Figure 5).
Interestingly, we find that all of the exit nodes that
triggered these unexpected requests are in TalkTalk; we
therefore believe that these requests are due to ISP-level
content monitoring. Adding further weight to this theory
is the fact that the 2,233 exit nodes in TalkTalk that
triggered unexpected requests represent 45.2% of all the
exit nodes that we measured in TalkTalk.
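The timing pattern described above can be illustrated with a minimal sketch. It assumes a hypothetical measurement-server log in which every probe URL is unique to one exit node and the exit node's IP address is known; the names LoggedRequest and unexpected_request_delays, the log layout, and the example addresses (taken from documentation-only ranges) are illustrative assumptions, not the tooling used in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoggedRequest:
    timestamp: float  # seconds, as recorded by the measurement server
    source_ip: str    # address the request arrived from
    url: str          # probe URL, assumed unique to one exit node


def unexpected_request_delays(log, exit_ip_by_url):
    """Map each probe URL to [(source_ip, delay_seconds), ...] for every
    request that did not come from the exit node the URL was issued to."""
    # Baseline: time of the exit node's own (expected) request per URL.
    first_exit_request = {}
    for req in log:
        if req.source_ip == exit_ip_by_url.get(req.url):
            seen = first_exit_request.get(req.url, req.timestamp)
            first_exit_request[req.url] = min(seen, req.timestamp)

    delays = {}
    for req in sorted(log, key=lambda r: r.timestamp):
        if req.url not in first_exit_request:
            continue  # the exit node never fetched this URL, so no baseline
        if req.source_ip == exit_ip_by_url[req.url]:
            continue  # expected request from the exit node itself
        delay = req.timestamp - first_exit_request[req.url]
        delays.setdefault(req.url, []).append((req.source_ip, delay))
    return delays


# Hypothetical log: one expected request followed by two unexpected ones.
log = [
    LoggedRequest(100.0, "192.0.2.10", "/probe/abc"),
    LoggedRequest(130.0, "198.51.100.7", "/probe/abc"),
    LoggedRequest(2000.0, "198.51.100.7", "/probe/abc"),
]
result = unexpected_request_delays(log, {"/probe/abc": "192.0.2.10"})
print(result)
```

Grouping the resulting delays by the owner of the source address is one way a cluster near 30 seconds, followed by a second request within the hour, would become visible.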
Tiscali U.K. We also observe that another ISP, Tiscali
U.K., shows a similar pattern (in fact, Tiscali U.K.
was acquired by TalkTalk in 2009, but continues to be
run as a separate entity). We observe two IP addresses
in Tiscali U.K. that generated unexpected requests for
363 exit nodes (representing 11.4% of all Tiscali U.K.
exit nodes). Unlike with TalkTalk, we observe only one
unexpected request for each exit node’s request; it
almost always arrives exactly 30 seconds after the exit
node’s request.
We cannot say for sure why some, but not all, exit
nodes in both ISPs experience content monitoring.
Potential explanations are that content monitoring is
performed non-deterministically (e.g., only 10% of
requests are monitored), or that it is tied to additional
ISP-provided services such as parental content controls.27
Regardless, monitoring, and its potential for controlling
open access to content, has significant implications for
Internet users, and should be made transparent.
7.3 Summary
In this section, we developed and deployed techniques
that can detect certain instances of content monitoring.
We found that over 1.5% of all exit nodes had their
HTTP requests collected and re-requested by a third
party, and that the most common sources were
anti-virus software products, VPN services, and
users’ ISPs. All of these instances have significant
privacy implications, as these users are likely unaware
that their HTTP browsing history is being duplicated
in near-real-time.
27 For example, TalkTalk’s opt-in SuperSafe feature
(https://help2.talktalk.co.uk/supersafe-boost-overview).
8. RELATED WORK
Before concluding, we provide an overview of related
work on end-to-end connectivity violations (we dis-
cussed other measurement approaches in Section 2.1).
DNS Manipulation Because DNS provides no built-in
security, DNS traffic has been the vector for a large
number of different attacks [24, 27, 29]. In parallel, other
work has explored how different DNS resolvers or ISPs
manipulate DNS traffic. Kührer et al. [17] classified
open DNS resolvers using fingerprints of DNS software
and found that millions of the resolvers deliberately ma-
nipulated DNS resolutions and returned unexpected IP
address information. Dagon et al. [8] found in 2008 that
2.4% of DNS queries to open DNS servers are returned
with incorrect answers; Weaver et al. [36] used Netalyzr
data in 2011 to perform a similar study, finding up to
24% of responses manipulated.
Our NXDOMAIN hijacking study shares many goals with,
and is complementary to, these previous studies, but
differs in the following key ways. First, using Luminati
allows us to measure in-use DNS servers, rather than
having to scan for open resolvers as some approaches
have done. Second, our approach provides measurements
at greater scale and in less time than previous work: the
Netalyzr data set, while incredibly useful, took months
to years to create, whereas we are able to measure a
similar number of nodes in a matter of days. Finally, our
results present a new look at NXDOMAIN hijacking, as the
two previous studies are now both over five years old.
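As a rough illustration of what an NXDOMAIN hijacking check tests for, the sketch below resolves a freshly generated name that should not exist and reports whether the resolver answered with addresses anyway. It runs against the local machine's resolver and is not the Luminati-based method used in this paper; the function names, the default zone, and the use of a random label are assumptions made for illustration. A real measurement would use a zone the experimenter controls.

```python
import socket
import uuid


def random_nonexistent_name(zone="example.com"):
    """Build a name with a random 32-character label that is very unlikely
    to exist under `zone` (assumed here; use a zone you control)."""
    return f"{uuid.uuid4().hex}.{zone}"


def looks_hijacked(name, resolve=socket.getaddrinfo):
    """Return (hijacked, addresses) for a name expected to yield NXDOMAIN.

    `resolve` defaults to the system resolver and can be swapped out for
    testing. Any resolution failure is treated as "not hijacked", so a
    machine without connectivity also reports False.
    """
    try:
        answers = resolve(name, 80)
    except socket.gaierror:
        return False, []
    # Each getaddrinfo entry ends with a sockaddr whose first field is the IP.
    addresses = sorted({entry[4][0] for entry in answers})
    return True, addresses
```

Calling looks_hijacked(random_nonexistent_name()) from a host behind a hijacking resolver would be expected to return True together with the substituted addresses; comparing those addresses across many hosts is one way to attribute the hijacking to a particular resolver or ISP.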
HTTPS content manipulation SSL and TLS secure
a large portion of the Internet’s traffic today; together
with a public key infrastructure, they provide authenti-
cation and encryption. However, the increasing fraction
of HTTPS traffic has led to renewed interest in trying to
gain visibility into such encrypted traffic, typically via
man-in-the-middle (MITM) attacks [7]. A series of
studies [2, 5, 12, 18] focuses on how MITM attacks are
conducted in different scenarios, including in mobile
networks [18], using invalid certificates [5], when
authentication protocols are tunneled [2], or via
anti-virus software [12]. Recently, Carné de Carnavalet
and Mannan [12] analyzed eight commercial anti-virus
and parental control applications, which interpose a
TLS proxy in end hosts’ communications.
Our results complement theirs; while they were
concerned with understanding how anti-virus applications
work, we demonstrated how widespread they are.
Additionally, Huang et al. [13] studied SSL MITM attacks
by injecting a Flash object into Facebook’s Web pages.
Similar to our results, they find that 0.2% of hosts re-
ceive forged certificates. In comparison, we were able to
measure a similar number of users, but did so without
requiring access to a popular Web site.
HTTP content manipulation Because HTTP by it-
self has no integrity checks, violations of the end-to-end