
WWW ’20, April 20–24, 2020, Taipei, Taiwan Vikas Mishra, Pierre Laperdrix, Antoine Vastel, Walter Rudametkin, Romain Rouvoy, and Martin Lopatka
Nevertheless, besides Opera, which natively integrates a VPN to
hide the IP address of its users [22], none of the aforementioned
browsers protect against IP-address-based tracking techniques. Thus,
we argue that many of these efforts offer little benefit if users
can be tracked solely by their IP address. Indeed, while
mobile network IP addresses are commonly shared by multiple users
because of carrier-grade NAT [18], this is less often the case for residential IP
addresses. Even though most Internet Service Providers (ISPs) provide
dynamic IP addresses, an address can remain the same for a long time, as
long as the user does not turn off their WiFi router for long enough.
Nevertheless, no large-scale longitudinal study has been conducted
on the duration for which such IP addresses are retained by users
and its implications in terms of privacy.
Our study leverages a dataset of public IP addresses collected
over a period of 111 days from 5,443 users. The IP addresses were
collected using two browser extensions advertised on the AmIUnique
website.¹ Using this dataset, we study the stability of the
public IP addresses a user’s device uses to communicate with our
server. The public IP addresses we obtain could be those that are
directly assigned to the users’ devices or, more commonly, the users’
devices are behind a gateway, such as a residential router, in which
case, our server obtains the IP addresses of the routers. Over time, the
same device communicates with our server using a set of distinct IP
addresses, but we find that devices reuse some of their previous IP
addresses for long periods. We call this IP address retention, and we
call the duration for which an IP address is retained by a device
the IP address retention period, or simply the retention period. In
many cases, a device’s list of retained IP addresses shows repetitive
patterns. In its simplest form, such a pattern may be a home-work-home
routine. We define these patterns as IP address cycles.
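To make these definitions concrete, the following minimal sketch derives retention periods and the resulting sequence of addresses from timestamped observations of a single device. It is an illustration only, not the paper's implementation: the function names, the input format, and the reading of a retention period as the span between the first and last observation of an uninterrupted run of the same address are all assumptions. The addresses come from the reserved documentation ranges.

```python
from datetime import datetime, timedelta
from itertools import groupby


def retention_periods(observations):
    """Return (ip, first_seen, last_seen, duration) for each contiguous run.

    `observations` is an iterable of (timestamp, ip) pairs for ONE device.
    Assumption (not from the paper): a retention period is the span between
    the first and last observation of an uninterrupted run of the same IP.
    """
    ordered = sorted(observations, key=lambda obs: obs[0])
    runs = []
    # groupby merges only consecutive observations that share the same IP.
    for ip, group in groupby(ordered, key=lambda obs: obs[1]):
        stamps = [ts for ts, _ in group]
        runs.append((ip, stamps[0], stamps[-1], stamps[-1] - stamps[0]))
    return runs


def ip_sequence(runs):
    """Collapse runs into the ordered list of addresses (e.g. home-work-home)."""
    return [ip for ip, _, _, _ in runs]


# Hypothetical day: home in the morning, work during the day, home again.
t0 = datetime(2020, 1, 1, 8, 0)
observations = [
    (t0, "192.0.2.10"),
    (t0 + timedelta(hours=1), "192.0.2.10"),
    (t0 + timedelta(hours=2), "198.51.100.7"),
    (t0 + timedelta(hours=9), "198.51.100.7"),
    (t0 + timedelta(hours=12), "192.0.2.10"),
]
runs = retention_periods(observations)
print(ip_sequence(runs))
```

A repeated subsequence in the output of `ip_sequence`, such as home-work-home, is the kind of pattern the text refers to as an IP address cycle.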
We first study the retention period of each IP address, how it
varies from country to country, and the presence of short-lived and
long-lived IP addresses. When a device uses a new, previously
unknown IP address, we test its similarity to previous IP
addresses by removing the last (least-significant) octet. Then, we
derive cycles of long-lived IP addresses for each user, which show
potential to discriminate and track users. Our intuition is that, given
the portable nature of personal devices, IP cycles reflect human
behavior and can be used as a proxy to infer other information, like
user routines. Our evaluation also shows that even simple metrics,
such as the Jaccard similarity between sets of IP addresses, provide
unique and stable information that could be used by trackers, and
could also be used for respawning cookies.
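The two comparisons mentioned above can be sketched as follows. This is an illustrative reading under stated assumptions rather than the paper's code: it assumes IPv4 addresses, treats "removing the last octet" as comparing /24 networks, and uses hypothetical function names. The Jaccard similarity of two sets A and B is |A ∩ B| / |A ∪ B|.

```python
import ipaddress


def prefix_24(ip):
    """Drop the last octet of an IPv4 address by mapping it to its /24 network."""
    return ipaddress.ip_network(f"{ip}/24", strict=False)


def shares_prefix(new_ip, previous_ips):
    """True if new_ip matches any previously seen address on the first three octets."""
    target = prefix_24(new_ip)
    return any(prefix_24(old) == target for old in previous_ips)


def jaccard(a, b):
    """Jaccard similarity between two collections of IP addresses."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen for this sketch: two empty sets are identical
    return len(a & b) / len(a | b)


# Hypothetical addresses from the documentation ranges.
seen = ["192.0.2.10", "198.51.100.7"]
print(shares_prefix("192.0.2.77", seen))   # same /24 as 192.0.2.10
print(shares_prefix("203.0.113.5", seen))  # unrelated /24
print(jaccard(seen, ["192.0.2.10", "203.0.113.5"]))
```

A tracker could, for example, compare the set of long-lived addresses of an unknown visitor against stored sets and link the visitor to the profile with the highest similarity; the threshold for such a match is a design choice that this sketch leaves open.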
In summary, this paper reports on the following results:
(1) 87 % of users have at least one IP address that was retained
for more than 30 days,
(2) Among the 10 countries that contributed the most IP addresses
to our dataset, the Netherlands shows the highest
average IP address retention period, at 36.96 days,
(3) 93 % of users have a distinct set of long-lived IP addresses,
that is, the set of IP addresses with a retention period of at least
30 days proves to be highly discriminating,
(4) 20 % of users have at least one cycle of long-lived IP addresses
that lasts for more than 30 days.
¹ https://amiunique.org/
The remainder of this paper is organized as follows. Section 2 describes
the background and related work. Section 3 details our input
dataset and how we cleaned it. Section 4 goes over our methodology
and the analysis of our dataset. Section 5 discusses our results and
some of the privacy implications. Section 6 concludes the paper.
2 BACKGROUND & RELATED WORK
An Internet Protocol (IP) address is a numerical label assigned to
each device connected to a computer network that uses IP for
communication. IP serves two main functions: host or network
interface identification and location addressing. The IP address
space is managed by the Internet Assigned Numbers Authority (IANA)
and by five regional registries for different parts of the world. These
registries assign blocks of addresses to Internet Service Providers
(ISPs), which in turn assign an IP address to each device connected to
their networks. These IP addresses can be static or dynamic, depending
on the usage.
In general, static IP addresses are used to host services and are
more expensive. Most residential IP addresses are dynamic, allowing
the ISP to optimize how the address space is used. They are assigned
by the ISP using the Dynamic Host Configuration Protocol (DHCP),
and each address is given a lease with an expiry period. If the lease
is not renewed before the expiry period, the address is released
back to the DHCP server and can then be assigned to a different
device. However, if the lease is renewed, the device retains the same
IP address. Depending on policies and configuration, a lease
can be renewed an indefinite number of times. In practice, if a
WiFi set-top box remains connected and the ISP’s policies allow
it, the same nominally dynamic IP address may be retained
for a long duration.
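The lease mechanism described above can be illustrated with a deliberately simplified model. The sketch below is not a DHCP implementation, and the function name and inputs are hypothetical. It assumes that an online client renews at half the lease duration (the default T1 timer in RFC 2131), so that when a router goes offline it has, pessimistically, half a lease left. It also assumes the server hands out a different address once a lease has expired, although real servers often reassign the same one.

```python
def count_address_changes(lease_seconds, offline_gaps):
    """Count how many offline gaps are long enough to lose the address.

    Assumptions of this sketch (not taken from the paper):
      * while online, the client renews at T1 = lease / 2, so at the moment
        it goes offline at least lease / 2 seconds of lease remain;
      * we take that pessimistic lower bound for every gap;
      * an expired lease always results in a new address.
    """
    remaining = lease_seconds / 2
    return sum(1 for gap in offline_gaps if gap >= remaining)


ONE_DAY = 24 * 60 * 60
# A router with a 24-hour lease that is switched off for 1 h, 2 h, and ~14 h.
print(count_address_changes(ONE_DAY, [3600, 7200, 50000]))
```

Under these assumptions, short reboots never cost the device its address, which is consistent with the observation that a dynamic address can behave like a static one for a router that is rarely switched off.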
Dynamic IP addresses constitute a significant portion of assigned
addresses. In 2007, Xie et al. [28] observed that more than 40 % of
IP addresses collected from Hotmail user logins in a month were
dynamic. They also studied the volatility of dynamic IP addresses
and they showed that over 30 % of dynamic IP addresses were
shared by more than one user within 1 to 3 days. Although their
dataset is very large, an IP address is collected only when a user logs
in, so important information, such as the ID of the device or any
IP changes between two logins, is lost. We believe that better
understanding the stability of IP addresses requires a finer-grained
dataset, such as the one we use in this paper, that can associate
IP addresses with specific devices and detect IP address changes
rapidly.
Maier et al. [19] studied the dominant characteristics of residential
broadband traffic in 2009. On a DSL network, they found that
50 % of IP addresses were assigned to residential routers at least
twice in 24 hours, whereas 1–5 % of IP addresses were reassigned almost
10 times a day. In our study, we look at the public IP addresses
from which devices connect to the Internet, not at any specific ISP.
We observe that some IP addresses are used by devices for only a
few hours before they change. Our study confirms that devices
access the Internet through a large number of short-lived IP addresses,
but we also find that devices have some very long-lived IP addresses
that they reuse over time, which induces important privacy
risks, such as online tracking.