Even Turing Should Sometimes Not Be Able To Tell: Mimicking Humanoid Usage Behavior for Exploratory Studies of Online Services

Stephan Wiefling1, Nils Gruschka2, and Luigi Lo Iacono1

1TH Köln - University of Applied Sciences, Cologne, Germany

{stephan.wiefling, luigi.lo_iacono}@th-koeln.de

2University of Oslo, Oslo, Norway

nilsgrus@ifi.uio.no

Abstract. Online services such as social networks, online shops, and

search engines deliver different content to users depending on their loca-

tion, browsing history, or client device. Since these services have a major

influence on opinion forming, understanding their behavior from a social

science perspective is of great importance. In addition, technical aspects
of services such as security or privacy are becoming more and more

relevant for users, providers, and researchers. Due to the lack of essen-

tial data sets, automatic black box testing of online services is currently

the only way for researchers to investigate these services in a methodi-

cal and reproducible manner. However, automatic black box testing of

online services is difficult since many of them try to detect and block

automated requests to prevent bots from accessing them.

In this paper, we introduce a testing tool that allows researchers to cre-

ate and automatically run experiments for exploratory studies of online

services. The testing tool performs programmed user interactions in such

a manner that it can hardly be distinguished from a human user. To eval-

uate our tool, we conducted—among other things—a large-scale research

study on Risk-based Authentication (RBA), which required human-like

behavior from the client. We were able to circumvent the bot detec-

tion of the investigated online services with the experiments. As this

demonstrates the potential of the presented testing tool, it remains

the responsibility of its users to balance the conflicting interests between

researchers and service providers as well as to check whether their re-

search programs remain undetected.

Keywords: Black box testing, Evaluation, Testing framework

1 Introduction

The advancing digital transformation impacts all areas of human life. As a conse-

quence, research aiming at understanding the inner workings of digital technolo-

gies, platforms, applications, services, and products, as well as their influence on

human society and culture, becomes increasingly important.

Postprint version of a paper accepted for NordSec 2019. The final authenticated version is available

online at https://doi.org/10.1007/978-3-030-35055-0_12.


Numerous examples of such intersections of online services with society can

be found today. All sorts of social networks use non-transparent algorithms to

perform content filtering and provisioning tasks, depending on one’s individ-

ual characteristics and interests. Other examples can be found in the various

deployed recommendation systems of e-commerce and content distribution plat-

forms. To what extent these types of online services influence society is an im-

portant research question. Is your taste in music governed by music streaming

companies and their algorithms to promote or recommend music? Are these

systems exploitable for purposes other than the intended ones? First research

attempts indicate that such influences on society are taking place [32,6,43].

Besides the impact on culture and society, technical aspects are more and

more hidden behind the user interfaces of online services. Deployed security, as

well as privacy preserving and undermining technologies, remain opaque to the

user. For instance, contemporary security approaches to strengthen password-

based authentication with Risk-based Authentication (RBA) [18] are deployed

by only a few large online services [28,23,2], even though this technology is of

broad relevance as the recent recommendation by NIST emphasizes [22]. Study-

ing RBA-instrumented services would help to demystify RBA setups so that

they can be discussed and further developed by a wider audience. This may
help accelerate the adoption and deployment of RBA in the wild. Good
examples of exploratory research that benefits society are the various
studies on misusing cookies for tracking purposes [12,17,7,26,9].

An essential prerequisite to perform effective and reliable research, in this

context, is the availability of data. Although many openly accessible data sets

of various online services and platforms exist [4], they only provide a very lim-

ited and fragmented view. One major reason for this lack of data is that the

companies and organizations possessing it—most commonly—do not share it

(publicly). Thus, the digital utilities surrounding our daily lives are black boxes

that do not reveal their internal workings. As this lack of transparency hinders

scientific research, methods are required to methodically reverse-engineer these

black boxes. This is important to understand the algorithms influencing our

current and future zeitgeist as well as their corresponding security and privacy

features.

Unfortunately, investigating the inner workings of online services is
difficult for several reasons. There is no single established way to conduct
such an analysis, and no commonly agreed Application Programming Interface
(API) or even approach. Moreover, online services are distributed systems,
making the service side inaccessible to entities other than the respective
service provider itself. The investigation is further complicated by the
service providers themselves.

For large-scale methodological studies, automated browsing through online ser-

vices is required. However, online services integrate technical countermeasures

against such automated browsing. These range from presenting CAPTCHA chal-

lenges [11] or delivering different website contents for human users and bots [1],

to completely blocking the service access [30]. Hence, in order to be able to


conduct exploratory studies of online services, technologies are required to cam-

ouflage automated black box testing as far as possible.

This arms race between service providers and researchers stems from their
conflicting requirements. Service providers want to keep their internals
secret, as they might also contain intellectual property. Researchers, in
contrast, are keen to analyze and understand systems thoroughly, with the aim
of gaining knowledge and improving system properties towards an optimum.
Thus, in the absence of other

means, researchers will use black box tests to determine their research results,

while service providers will detect and block automated black box tests to keep

their internals opaque to outsiders.

Another reason for service providers to block automated black box testing

is that it is considered a double-edged sword since it might not only be used

by researchers. In the hands of attackers, such testing tools can be used to

threaten systems and networks, even if they are aimed at improving security.

Still, as security is about balancing several trade-offs, these trade-offs need to be

understood thoroughly in order to make the right compromise.

Contributions. We introduce an inspection tool to perform automated black

box testing of online services and, at the same time, mimic human-like user

behavior³. The aim is to provide a research vehicle to investigate the inner

workings of online services lacking publicly accessible resources. This can foster

discussion and collaboration among security researchers and service providers.

Outline. The rest of this paper is structured as follows. We review related work

in Section 2. We describe the introduced inspection tool in Section 3. We give

more detailed descriptions on its implementation as well as customization to

study online services in Section 4. To further illustrate the use of the introduced

inspection tool, Section 5 discusses exemplary studies. We discuss the benefits

as well as limitations of the introduced inspection tool in Section 6. As the usage

of our tool can easily be extended to exploit online services, Section 7 discusses

ethical considerations before the paper concludes in Section 8.

2 Related Work

A number of researchers performed black box testing of online services with web

browser automation. Choudhary et al. [10] developed a tool for automated web

application testing to detect cross-browser inconsistencies on websites. Starov

and Nikiforakis [34] analyzed the effect of browser extensions on the rendered

Document Object Model (DOM) document for the 50 most popular websites.

They showed that differences inside the DOM tree can be (mis-)used for fin-

gerprinting and tracking the client. Englehardt and Narayanan [15] measured

and analyzed one million websites and their corresponding usage of online
tracking as well as the effect of browser privacy tools. Golla and Dürmuth [19]
used browser automation to test password strength meters of online services.
Degeling et al. [12] automatically extracted cookie consent notices and privacy
policies of 6,579 websites inside the European Union (EU) to analyze their
appearance before and after the EU General Data Protection Regulation
(GDPR) [16] went into effect.

³ Provided as open source software at https://github.com/das-th-koeln/HOSIT

However, in all publications mentioned above, the corresponding browser

automation frameworks did not aim to imitate human-like behavior as we did

in our framework. As a consequence, these studies cannot tell whether their

observations reflect the services’ inner workings or a customized behavior due to

being detected as a bot.

Other browser automation frameworks tried to imitate human-like user-

actions to a small extent. Petsas et al. [29] used browser automation to evalu-

ate the quantity of Google users with enabled Two-factor Authentication (2FA).

Their framework introduced a random waiting time between clicks. Snickars and

Mähler [32] analyzed the behavior of the online music streaming service Spotify

with browser automation. Their automation framework conducted several user

actions, e.g., logging in, selecting a track, and skipping a track. However, they

noted that this was only possible before Spotify introduced reCAPTCHAs as a

bot protection mechanism in 2016.

In contrast to all these frameworks, we invested considerably more effort
in our framework to closely mimic human-like behavior and to bypass
CAPTCHAs so as not to be detected as a bot (see Section 6).

The DASH tool [13] by the DETER project aimed to model human behavior

in various situations, e.g., responding to phishing emails. In contrast to our tool

(see Section 3), the application did not really conduct human-like actions on

online services and only simulated possible behavior in theory.

Most browser automation tools described in this section were based on the

Selenium framework [15,17,12,32,19,10,34]. One tool was based on CasperJS [29].

We decided to use the high-level application programming library Puppeteer [21]

as a base for our tool. We chose Puppeteer over Selenium since it offers a
higher-level API and targets the popular Chrome browser [44] instead of
multiple browsers. Note that Puppeteer was not yet available when most of the
above-mentioned studies were conducted⁴.

3 Humanoid Online Services Inspection Tool

The Humanoid Online Services Inspection Tool (HOSIT) was designed to sim-

ulate human-like browsing behavior on online services. While some frameworks

for automated browsing are freely available on the Internet, their standard func-

tionality makes them difficult to use for inspecting online services for several

reasons:

⁴ The first version of the source code was published on the Puppeteer GitHub repository on May 11th, 2017:
https://github.com/GoogleChrome/puppeteer/commit/2cda8c18d10865d79d3e63b23e36aa7562098bf7


– There is no function to create virtual identities which are perceived as real

humans by online services.

– Some of the integrated functions do not model real-world human behavior and

thus can be detected by online services, e.g., typing with 0 ms delay or clicking

in the exact center of an element.

– The API allows activities which are not possible for real web browser users,

e.g., conducting browsing activities inside two browser tabs at the same time.

– Browser automation using these frameworks can be detected due to differences

between the normal and the automated browser mode [40].

– These frameworks do not log conducted actions such as name and screenshot

of clicked elements automatically. This makes potential implementation errors

(e.g., element with certain ID not found) hard to detect. Consequently, scaling

the automation to multiple machines is difficult.

We addressed these issues with HOSIT and enhanced the integrated standard

functionalities of Puppeteer with human-like browsing behavior and camouflage

measures to be as indistinguishable from human users as possible:

(i) A scrolling function to imitate reading of website contents (usage can be

seen in the script in Figure 2). The function scrolls down around half the

display height, pauses for some time, scrolls further, pauses again and re-

peats this procedure until reaching the bottom of the page. We developed

this function since scrolling is considered a typical behavior for human users

on websites [42,36].

(ii) A function which allows switching between browser tabs. This is also a

typical behavior for a human using a web browser.

(iii) A search query generator based on current events in media. The generated

queries can be used to create arbitrary browsing behavior, e.g., entering

a query in a search engine and opening one of the results. The generator
accesses a publicly available Really Simple Syndication (RSS) feed and
passes the feed’s content to an evaluation function which generates a list of

search queries. From this list, the generator selects a random entry every

time the generator is called. The search query generators can be customized

and added in the HOSIT configuration, e.g., for generating search queries

focused on other topics. We chose this functionality since visiting search

engines is a common online activity [24,27].

(iv) Hidden element checks: some online services integrate hidden elements

which can be used to detect bots, e.g., typing text inside a hidden field

or clicking a hidden link. For this reason, we provide a function to check

whether a certain element on a website is visible or not.

(v) Integration of external services providing CAPTCHA solving capabilities.

(vi) Automated logging of all activities conducted on the online service with

screenshots into a MongoDB database for replicable studies. The database

type can be adjusted for individual use case scenarios.
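The hidden element check from item (iv) could look roughly like the following sketch. The function name `isLikelyVisible`, the input shape, and the chosen criteria are illustrative assumptions, not HOSIT's actual API; in a real browser, the input values would typically be read via `window.getComputedStyle` and `getBoundingClientRect`.

```javascript
// Hypothetical helper (not part of HOSIT): decides from style and size values
// whether an element would be visible to a human. The criteria are assumptions.
function isLikelyVisible(element) {
  const { display, visibility, opacity, width, height } = element;
  if (display === "none") return false;
  if (visibility === "hidden" || visibility === "collapse") return false;
  if (Number(opacity) === 0) return false;
  // Zero-sized elements cannot be seen or clicked by a human.
  return width > 0 && height > 0;
}

// Example: a honeypot link hidden via CSS versus a regular input field.
const honeypotLink = { display: "none", visibility: "visible", opacity: "1", width: 120, height: 20 };
const searchField = { display: "block", visibility: "visible", opacity: "1", width: 300, height: 32 };

console.log(isLikelyVisible(honeypotLink), isLikelyVisible(searchField));
```

A script would call such a check before typing or clicking and skip any element that fails it.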

In general, we focused on human-like behavior that could be analyzed by

reading out information via the web browser, i.e., keyboard and mouse events.


Fig. 1. Architecture of HOSIT (diagram labels: Study Conductor, API, HOSIT Framework, Virtual Identities, Training Procedures, Inspection Procedures, Human User Imitation, Log, Inspected Service)

We based the human-like behavior on empirical studies modeling
human-computer interaction [8,14,46,37,33,25] as well as on similar ideas on
human behavior simulation [5].

The basic architecture of HOSIT is as follows (see Figure 1). In order to

test services, the study conductor creates one or multiple virtual identities with

different browsing behavior (e.g., typing speed and clicking behavior). The con-

ductor also defines a sequence of activities to be executed on the tested services

for the respective study. Examples of activities are: “click on shopping cart

link”, “search for a friend”, or “logout from service”. In many cases, these activ-

ities can be divided into training procedures (let the service learn “normal” user

behavior) and inspection procedures (analyze the service’s reaction to unusual

behavior). The HOSIT API offers functions to create virtual identities inside the

HOSIT framework as well as to execute the activities. In contrast to other solu-

tions, HOSIT enables human-like behavior in two ways. First, human-imitating

behavior is automatically added to activity calls for many functions (e.g., the

“click on button” function clicks on an arbitrary position inside the button).

And second, HOSIT offers additional function calls allowing explicit human be-

havior as a part of the activity sequence (e.g., “scroll to the end of the web

page”). Using a script containing a sequence of activities and the browsing be-

havior from the virtual identities, the HOSIT framework calls the service using

a Chromium browser instance. Finally, all responses from the service are logged

for later analysis.
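The explicit "scroll to the end of the web page" activity follows the reading pattern described for the scrolling function: scroll about half the display height, pause, and repeat until the bottom is reached. A minimal sketch of how such a scroll plan could be computed is shown below. The function names and the pause range are assumptions made for illustration and are not taken from the HOSIT source code.

```javascript
// Returns the vertical scroll offsets visited while "reading" a page.
// Assumption: each step covers half the viewport height.
function planScrollSteps(pageHeight, viewportHeight) {
  const maxOffset = Math.max(0, pageHeight - viewportHeight);
  const step = Math.max(1, Math.floor(viewportHeight / 2));
  const offsets = [];
  for (let y = step; y < maxOffset; y += step) offsets.push(y);
  if (maxOffset > 0) offsets.push(maxOffset); // the final step reaches the bottom
  return offsets;
}

// Pause lengths between steps; the range is a made-up placeholder,
// not an empirically measured reading time.
function planPauses(stepCount, rng = Math.random, minMs = 500, maxMs = 3000) {
  return Array.from({ length: stepCount }, () => Math.round(minMs + rng() * (maxMs - minMs)));
}

const scrollOffsets = planScrollSteps(3000, 800);
const scrollPauses = planPauses(scrollOffsets.length);
console.log(scrollOffsets, scrollPauses);
```

Separating the plan from its execution keeps the randomness testable: a fixed `rng` yields a reproducible scroll sequence.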

Figure 2 shows a simple example of a HOSIT script calling an online service in

a human-imitating manner. The example code invokes HOSIT to open a search


    // Open new page tab
    await controller.newPage("https://www.startpage.com/");
    // Wait a random time period
    await controller.randomWait();
    // Click on the "Images"-Link
    await controller.click("a[href='https://www.startpage.com/en/pics.html']");
    // Wait until the text field is loaded
    await controller.waitForSelector("input[type='text']");
    // Generate and enter search query based on
    // current events in media
    await controller.typeSearchQuery("input[type='text']");
    // Scroll to the bottom of the page
    await controller.scrollToBottom();

Fig. 2. Example HOSIT script

engine, click on the link to open the image search (after waiting a random period),

and enter a search query chosen randomly by HOSIT based on current events

in the media. Finally, HOSIT scrolls to the bottom of the page. As a result, the
online service is used in a way that a human user would also use it.
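The search query generation used in this script (fetch an RSS feed, derive a list of queries, pick one at random) might be sketched as follows. All function names, the sample feed, and the rule of using the first three words of a headline are assumptions for illustration; the actual evaluation function in HOSIT may work differently, and a real implementation would more likely use a proper XML parser than a regular expression.

```javascript
// Extracts the <title> text of every <item> in an RSS document.
function extractHeadlines(rssXml) {
  const headlines = [];
  const itemPattern = /<item>[\s\S]*?<title>([\s\S]*?)<\/title>[\s\S]*?<\/item>/g;
  let match;
  while ((match = itemPattern.exec(rssXml)) !== null) {
    headlines.push(match[1].replace(/<!\[CDATA\[|\]\]>/g, "").trim());
  }
  return headlines;
}

// Hypothetical evaluation function: the first three words of a headline,
// lowercased and stripped of punctuation, become the search query.
function toSearchQuery(headline) {
  return headline
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, "")
    .split(/\s+/)
    .filter(Boolean)
    .slice(0, 3)
    .join(" ");
}

// Picks a random entry each time the generator is called.
function pickQuery(queries, rng = Math.random) {
  return queries[Math.floor(rng() * queries.length)];
}

// Made-up feed content for demonstration purposes.
const sampleFeed = `<rss><channel><title>Example News</title>
<item><title>Parliament passes new climate bill</title></item>
<item><title><![CDATA[Storm warning issued for the coast]]></title></item>
</channel></rss>`;

const generatedQueries = extractHeadlines(sampleFeed).map(toSearchQuery);
console.log(pickQuery(generatedQueries));
```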

All activities performed on the online service as well as errors are logged into

a database. As a result, study conductors get an overview of all interactions that

the identities performed on the online service. This also eases the debugging of

errors caused by activities of a certain identity.
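The logging of activities and errors could be organized along the lines of the following sketch. It keeps the entries in an in-memory array instead of a database, runs the activity synchronously for brevity (real browser actions would be awaited), and uses hypothetical names throughout; none of it is taken from the HOSIT implementation.

```javascript
// Runs one activity and appends a log entry, whether it succeeds or fails.
function runLogged(log, identityName, activityName, activity) {
  const entry = {
    identity: identityName,
    activity: activityName,
    time: new Date().toISOString(),
    ok: true,
    error: null,
  };
  try {
    activity();
  } catch (err) {
    entry.ok = false;
    entry.error = err instanceof Error ? err.message : String(err);
  }
  log.push(entry);
  return entry;
}

const activityLog = []; // stand-in for the database collection
runLogged(activityLog, "identity-01", "click #cart", () => {});
runLogged(activityLog, "identity-01", "click #missing", () => {
  throw new Error("element with ID 'missing' not found");
});

// Debugging view: all failed activities of one identity.
const failedEntries = activityLog.filter((e) => !e.ok && e.identity === "identity-01");
console.log(failedEntries);
```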

4 Implementation

We implemented HOSIT using the Node.js library Puppeteer [21] in version

0.13.0 for browser automation. Consequently, HOSIT can be used on all operat-

ing systems which are capable of running the Node.js runtime environment and

the browser Chromium. A Chromium version is bundled with the HOSIT instal-

lation. Nevertheless, HOSIT can be configured to use a customized Chromium

or Chrome browser version instead of the bundled version. This might be nec-

essary for example, when testing websites requiring Digital Rights Management

(DRM) functionalities, which are included in Chrome but not in Chromium.

Chromium is executed in a custom headful mode, in which the browser is
launched in the standard mode with a visible GUI⁵. HOSIT uses this headful
mode to minimize the detection of automated browsing. Chromium’s headless
mode, which is designed specifically for browser automation, can be detected
by a number of differences in the browser’s properties and behavior [40].
During testing, we observed that online services treat headless browsers
differently. Amazon, for example, required a CAPTCHA in headless but not in
headful mode. We also patched HOSIT against known headless browser detection
mechanisms [40]. HOSIT executes these patches when launching Chromium,
e.g., removing the navigator.webdriver property [35].

⁵ To be compatible with Linux servers or Docker containers without a visible desktop
environment, the headful mode can also be run inside a virtual window session.
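One way to hide the `navigator.webdriver` property is to shadow it with a getter that returns `undefined`. The sketch below demonstrates the idea on a stand-in object so that it runs outside a browser; the function name is hypothetical. With Puppeteer, such a patch would presumably be injected before any page script runs, for instance through `page.evaluateOnNewDocument`, but whether HOSIT does exactly this is an assumption.

```javascript
// Overrides the `webdriver` property on a navigator-like object so that
// page scripts reading it see `undefined`.
function hideWebdriverFlag(navigatorLike) {
  Object.defineProperty(navigatorLike, "webdriver", {
    get: () => undefined,
    configurable: true,
  });
  return navigatorLike;
}

// Stand-in for an automated browser's navigator object (illustration only):
// the flag lives on the prototype, as a read-only getter.
const fakePrototype = {};
Object.defineProperty(fakePrototype, "webdriver", { get: () => true, configurable: true });
const fakeNavigator = Object.create(fakePrototype);

const flagBefore = fakeNavigator.webdriver;
hideWebdriverFlag(fakeNavigator);
const flagAfter = fakeNavigator.webdriver;
console.log(flagBefore, flagAfter);
```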

During testing, we found some indications that browser automation can be

detected with the standard functionality of Puppeteer. For instance, Amazon

rated correctly entered CAPTCHA solutions as not correct if “typed” in by the

standard Puppeteer function. Therefore, we enhanced some of Puppeteer’s inte-

grated functions with human-like user behavior. We compared manual browsing

behavior with the automated behavior of Puppeteer to determine differences and

optimized the affected functions. First, we replaced the constant standard
delays between pressing and releasing keys with randomized delays. These
delays vary around an average typing speed which is defined by the identity
(average time and maximum deviation). We recommend measuring these delays on
real humans before setting them on the identities. By default, we set empirically
measured typing speeds [8,14] on the identities. This procedure helped mimic
human behavior more precisely. Further, we modified the mouse input behavior.
Instead of clicking in the exact center of an element, the mouse selects a
random click point in the center quarter of the element. We also replaced the

default delay between pressing and releasing the left mouse button of 0 ms with

an empirically measured clicking time with randomized variations [25].
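The randomized delays and click positions could be computed as in the sketch below. It assumes a uniform distribution within the maximum deviation and reads "center quarter" as the centered rectangle with half the element's width and half its height (a quarter of its area); both are interpretations, and the function names are hypothetical. The resulting values could then be handed to Puppeteer calls that accept a `delay` option, such as `page.keyboard.type` or `page.mouse.click`.

```javascript
// Delay around an average with a bounded deviation
// (the uniform distribution is an assumption).
function randomDelay(averageMs, maxDeviationMs, rng = Math.random) {
  const offset = (rng() * 2 - 1) * maxDeviationMs;
  return Math.max(0, Math.round(averageMs + offset));
}

// Random point inside the central region of an element's bounding box.
// The region spans from 25% to 75% of the width and of the height.
function randomClickPoint(box, rng = Math.random) {
  const x = box.x + box.width * (0.25 + 0.5 * rng());
  const y = box.y + box.height * (0.25 + 0.5 * rng());
  return { x, y };
}

// Placeholder values, not the empirically measured ones from [8,14,25].
const keyDelay = randomDelay(150, 40);
const clickPoint = randomClickPoint({ x: 10, y: 20, width: 100, height: 40 });
console.log(keyDelay, clickPoint);
```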

Moreover, we added further functionality to HOSIT which did not exist in
Puppeteer (see Table 1). Finally, we simplified the API of Puppeteer and
integrated recurring tasks for the use case scenario into the functions, e.g.,
automatically adjusting the browser resolution when creating a new tab. As a
result, fewer function

calls are required to achieve the same result as with Puppeteer while being more

human-like in many respects.

As stated in Section 3, each HOSIT instance is linked to a virtual identity

which controls a browser instance. All further browsing behaviors are derived

from this identity on this instance (e.g., typing behavior, selecting different cat-

egories based on the virtual identity’s persona). The identity manages all browser

tabs such as opening, switching, and closing browser tabs, and performs the ac-

tions on the website. These actions range from typing or clicking to scrolling and

can only be performed in the currently open browser tab. We decided to select

this identity-based structure to both optimize the API for the use case and to

avoid unrealistic browsing behavior that was possible in Puppeteer, e.g., clicking

buttons in two browser tabs at the same time.
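The identity-based structure can be illustrated with a small model in which an identity owns all tabs and every action is bound to the tab that currently has focus. The class and method names below are hypothetical and do not reproduce HOSIT's Controller class; the typing values are placeholders.

```javascript
// Hypothetical model of an identity that owns several tabs,
// only one of which is active at any time.
class VirtualIdentitySketch {
  constructor(name, typing) {
    this.name = name;
    this.typing = typing; // e.g., average key delay and maximum deviation
    this.tabs = [];
    this.activeTab = -1;
    this.actions = [];
  }

  openTab(url) {
    this.tabs.push(url);
    this.activeTab = this.tabs.length - 1; // a new tab receives focus
    return this.activeTab;
  }

  switchTab(index) {
    if (!Number.isInteger(index) || index < 0 || index >= this.tabs.length) {
      throw new RangeError(`no tab with index ${index}`);
    }
    this.activeTab = index;
  }

  closeActiveTab() {
    if (this.activeTab < 0) throw new Error("no open tab");
    this.tabs.splice(this.activeTab, 1);
    this.activeTab = Math.min(this.activeTab, this.tabs.length - 1);
  }

  // Actions always target the active tab, so two tabs can never be
  // operated at the same time.
  click(selector) {
    if (this.activeTab < 0) throw new Error("no open tab");
    this.actions.push({ type: "click", url: this.tabs[this.activeTab], selector });
  }
}

const alice = new VirtualIdentitySketch("alice", { averageMs: 150, maxDeviationMs: 40 });
alice.openTab("https://example.com/news");
alice.openTab("https://example.com/shop");
alice.click("#cart"); // lands in the shop tab, which has focus
alice.switchTab(0);
alice.click("#headline"); // lands in the news tab
```

Because the API exposes no way to address a background tab, unrealistic parallel interactions are ruled out by construction rather than by convention.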

When developing their own studies of online services, study conductors have to

design individual testing procedures with HOSIT. This is necessary since nav-

igation structures and functionalities differ between online services and might

change over time. For fine-grained variations of the browsing behavior, each


Table 1. Feature differences between Puppeteer and HOSIT

                             Puppeteer 0.13.0      HOSIT
Properties
  Typing speed               Constant              Randomized variations
  Click position             Exact element center  Randomized variations
  Click time                 0 ms                  Realistic [25]
  Logging                    Limited               Extended*
  Browsing behavior changes  -                     Yes, based on persona
  Bot detection protection   -                     Patched
Functions
  Common workflows           Need to be repeated   Integrated in Controller class
  Search query generator     -                     Included
  CAPTCHA solving            -                     Included
  Scrolling                  -                     Included
  Select tabs                -                     Included

-: Not included
*: Logs all conducted actions with screenshots into a database

HOSIT instance provides functions which can be used to increase randomized

browsing behavior. These functions range from providing a random boolean value

with a given probability for if-else conditions to providing the persona of the
identity. By using these functions, we ensured that each browser session
performed by HOSIT appeared different on the tested online services.
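A random boolean with a given probability, as used for such if-else conditions, might be implemented and applied as follows. The function names, the probabilities, and the activity labels are illustrative assumptions rather than HOSIT's actual API.

```javascript
// Returns true with the given probability (between 0 and 1).
function randomBoolean(probability, rng = Math.random) {
  if (probability < 0 || probability > 1) {
    throw new RangeError("probability must be in [0, 1]");
  }
  return rng() < probability;
}

// Builds a slightly different activity sequence on every call.
function planSession(rng = Math.random) {
  const steps = ["open search engine", "enter search query"];
  if (randomBoolean(0.3, rng)) steps.push("switch to image search");
  steps.push(randomBoolean(0.5, rng) ? "open first result" : "open second result");
  return steps;
}

console.log(planSession());
```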

5 Exemplary Use

To evaluate HOSIT, we conducted two studies that we discuss in the following.

Neither experiment would have been possible without HOSIT, or only with
significantly higher effort. The discussions will also provide a better understanding

of HOSIT deployments based on the two given exemplary use case scenarios.

5.1 Use Case 1: RBA

Risk-based Authentication (RBA) [18] is an adaptive security measure to im-

prove password authentication. During login, RBA monitors and stores addi-

tional features available in the context (e.g., IP address or user agent string)

and requests additional information for authentication if a certain risk level is

exceeded. RBA offers protection against security risks such as credential stuff-

ing, password database leaks and intelligent password guessing methods. Beyond


that, RBA has the potential to compensate for low adoption rates of Two-factor

Authentication (2FA). For instance, less than 10% of all active Google users

activated 2FA in January 2018 [28].

RBA is recommended in the NIST digital identity guidelines [22] and is used

by several large-scale online services. However, these online services keep their
implementations secret and withhold their approaches from public scientific
discussion. This lack of public knowledge makes it difficult for small and medium

websites to use RBA.

For this reason, we black box tested eight popular online services⁶ with

HOSIT to find out more about the corresponding RBA implementations, i.e.,

features and offered additional authentication factors [45]. We created 28 virtual

online identities, registered 224 user accounts with the eight targeted services,

and observed the services’ behavior when accessing them under different cir-

cumstances. Each virtual identity had its own unique IP address from the same

Internet service provider and a personal computer.

However, analyzing the inner workings of RBA is complicated, since one

of the main tasks of RBA is to protect against bots. During pilot testing, we

found indicators that some online services treated an automated browser using

Puppeteer differently. For this reason, we designed our study using HOSIT to

imitate human user behavior as exactly as possible. Imitating human behavior was

essential to make sure that the observed services’ behavior is identical to normal

usage.

RBA estimates the login risk based on the login history of the user. Therefore,

our virtual identities conducted 20 browsing sessions including user sessions on

the online services. The user sessions included login, activities on the online

service, and logout. After these 20 browsing sessions, we varied browser features

including the login time, IP address and device, logged in again on all online

services, and observed the reactions. Based on the reactions, we drew conclusions

about the inner RBA workings of the tested online services. The activities on the

online services were randomized and individualized with HOSIT and differed on

each of the online services. We selected typical activities for each of the online

services, e.g., scrolling in the newsfeed, checking mail inbox or browsing for

articles or jobs. In addition, these activities included a lot of randomness to
reduce the risk of being detected as a bot. As an example, on social media
websites, the identities randomly alternated between scrolling in the newsfeed,
checking the message

inbox and searching for content.

Since online services are likely tracking their users [7,9], all virtual identities

simulated randomized browsing behavior in each browsing session with HOSIT.

They visited search engines and entered search queries based on current topics

discussed in media. Then, they opened some of the websites and “read” the text

by scrolling and waiting. Also, the testing sequence of services was shuffled to a

random order. This was done to prevent our virtual online identities from logging

into the online services at similar times.
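Shuffling the testing sequence can be done with a Fisher-Yates shuffle, as in the following sketch. The function name is an assumption, and the paper does not state which shuffling algorithm was used; the service names are those of the eight tested services.

```javascript
// Returns a shuffled copy of the input (Fisher-Yates); the input stays untouched.
function shuffled(items, rng = Math.random) {
  const result = items.slice();
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}

const testedServices = ["Amazon", "Facebook", "GOG.com", "Google", "iCloud", "LinkedIn", "Steam", "Twitch"];

// Each browsing session of an identity gets its own service order.
const sessionOrder = shuffled(testedServices);
console.log(sessionOrder);
```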

⁶ Amazon, Facebook, GOG.com, Google, iCloud, LinkedIn, Steam and Twitch


With the study based on HOSIT, we were able to derive features as well as

an approximation to the respective weightings used for the RBA risk estimation

of popular online services. One major finding was that five of the eight tested

popular online services used RBA. Also, each of the services had a different RBA

implementation, varying from protecting all users to only a selection of users.

Besides using the IP address as a highly weighted RBA feature, some services also
used additional lower-weighted features (e.g., user agent string).

More details on the RBA study can be found in the original publication [45].

5.2 Use Case 2: Amazon Product Recommendation System

When shopping on Amazon, a large number of customer actions are tracked by

the online shop. Besides the purchased items, these actions also include every

item just visited by the user. Details can be seen in logs which European users

can request from Amazon [3]. This right to request all personal data stored on

a service provider is granted by the GDPR [16].

Fig. 3. Shopping history and recommendations in the Amazon online shop

Amazon offers different types of product recommendations that are consid-

ered interesting for the customer [31]. When visiting a product page for exam-

ple, similar or related items are presented. These items are based on sponsoring

(“Sponsored products related to this item”) or shopping behavior of other users

(“Customers who bought this item also bought”). Another recommendation type

(“Inspired by your browsing history”) is based on the user’s own browsing his-

tory mentioned in the previous paragraph, i.e., not only the purchase history,

but also items just visited.

The recommendations given by Amazon are interesting for customers as well

as other online shops. Hence, these recommendations can be considered a valu-

able asset for Amazon. It is therefore a reasonable assumption that Amazon, by

detecting bots, is protecting these assets from automatic scraping. As a counter-

measure, a bot could be presented different website content compared to human


users, e.g., a CAPTCHA, different recommendations, or even recommendations

with different prices. Thus, research on recommendations shown to human users

requires a human-imitating client as provided by HOSIT.

In order to analyze the recommendation system and to verify this assumption,

we conducted a study on the Amazon online shop. In this study, our (automated)

user requested a fixed sequence of products and recorded the recommended prod-

ucts on the history page.

We conducted the same study with three different types of clients: automat-

ically using Puppeteer, automatically using HOSIT, and manually by a human

user. In addition, the products were requested in two different manners: either by

simply opening the sequence of product page URLs or with “human-like” online

shopping, i.e., typing a search term into the search bar, selecting a search result,

looking at the product page, searching for a next item and so forth. Finally, we

performed this study with both registered and unregistered Amazon users.

The evaluation of the study revealed an unexpected result: the recommended

items were exactly the same in all cases, including the order of items and the

product prices. Thus, in contrast to the RBA of Amazon services, we assume

that Amazon does not perform any bot detection for its recommendation

system or tolerates bots to a certain degree, e.g., letting harmless bots pass and

blocking only bots that generate excessive network traffic [1].
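
The comparison underlying this result can be sketched as follows. This is a minimal illustration and not the evaluation code used in the study; the function name compare_recommendations, the run labels, and the sample products and prices are hypothetical. Each run is recorded as an ordered list of (product, price) pairs, so that a difference in items, order, or price is detected.

```python
from itertools import combinations


def compare_recommendations(runs):
    """Return all pairs of run labels whose recorded recommendations differ.

    `runs` maps a run label (e.g., the client type) to an ordered list of
    (product, price) tuples as recorded from the history page.
    """
    differing = []
    for (label_a, recs_a), (label_b, recs_b) in combinations(runs.items(), 2):
        # List equality compares items, prices, and their order.
        if recs_a != recs_b:
            differing.append((label_a, label_b))
    return differing


# Hypothetical recorded data for the three client types.
runs = {
    "puppeteer": [("item-a", "9.99"), ("item-b", "19.99")],
    "hosit": [("item-a", "9.99"), ("item-b", "19.99")],
    "human": [("item-a", "9.99"), ("item-b", "19.99")],
}

# An empty list means that every run saw identical recommendations.
print(compare_recommendations(runs))
```

An empty result corresponds to the outcome reported above, where all client types received the same items in the same order and at the same prices.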

6 Benefits and Limitations

We put a lot of effort into ensuring that our tool was not recognized as a bot

by online services. Nevertheless, the possibility that online services recognize

HOSIT-based experiments as automated browsing remains. Even human-like

browsing, if performed constantly for a very long time, will surely be detected.

Also, creating too many new user accounts from the same IP address in a short

time is likely to be noticed and even stopped by many online services. This,

however, also holds when the accounts are created by a human. Thus, despite all

protection mechanisms, automated browsing activities should not be excessive and

should be kept at a realistic level, e.g., by introducing a long pause after some hours.
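
One way to plan such pacing is sketched below. This is an assumption-laden illustration and not a feature of HOSIT: the function name plan_session and all parameter values (two hours of activity, an eight-hour pause, gaps of 5 to 60 seconds between actions) are hypothetical and would need to be adapted to the study at hand.

```python
import random


def plan_session(num_actions, active_hours=2.0, long_pause_hours=8.0,
                 min_gap_s=5.0, max_gap_s=60.0, seed=None):
    """Return the wait time in seconds to sleep before each action.

    Gaps between actions are drawn at random. Once the accumulated active
    time would exceed `active_hours`, a long pause is added and the
    active-time counter starts again from zero.
    """
    rng = random.Random(seed)
    waits = []
    active_s = 0.0
    for _ in range(num_actions):
        gap = rng.uniform(min_gap_s, max_gap_s)
        if active_s + gap > active_hours * 3600:
            gap += long_pause_hours * 3600  # rest before continuing
            active_s = 0.0
        else:
            active_s += gap
        waits.append(gap)
    return waits


# Hypothetical usage: plan 200 actions and count the long pauses.
waits = plan_session(num_actions=200, seed=42)
print(sum(w > 3600 for w in waits), "long pauses planned")
```

An experiment script would then sleep for the planned time before each action, e.g., with time.sleep.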

Still, based on our observations, we are convinced that our tool remained

under the respective bot detection thresholds. For instance, Amazon did not block

automated logins with HOSIT while it did with Puppeteer. In March 2019, we

also tested HOSIT using an instance of reCAPTCHA v3 [20], which is specifically

designed to recognize bots. It analyzed the browsing behavior and returned a risk

score. The score was a numerical value between 1.0 (very likely a human) and 0.0

(very likely a bot). We opened a testing website, which used reCAPTCHA v3,

with both Puppeteer and HOSIT, and observed the risk score returned by the

reCAPTCHA API. When using HOSIT, the reCAPTCHA v3 risk scores were

identical to those of a human-controlled Chrome browser with empty browsing

history and cookies (score: 0.7 = likely a human), while this was not the case

with Puppeteer (score: 0.1 = likely a bot). After the release of HOSIT in April

2019, the reCAPTCHA risk score when using HOSIT was lowered to 0.3. This

again underlines the arms race between bot detectors and bot detection avoiders.

We will observe novel bot detection mechanisms and integrate countermeasures

against them in future versions of HOSIT.
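
How such a risk score might be read on the server side of a testing website can be sketched as follows. The field names success, score, and error-codes follow the documented JSON response of the reCAPTCHA v3 verification endpoint [20]; the function name interpret_score and the threshold of 0.5 are assumptions chosen for illustration and are not part of HOSIT.

```python
import json


def interpret_score(response_body, human_threshold=0.5):
    """Parse a verification response and classify the returned risk score.

    `response_body` is the JSON text returned by the verification endpoint.
    `human_threshold` is an assumed cut-off; each site chooses its own.
    """
    data = json.loads(response_body)
    if not data.get("success"):
        raise ValueError("verification failed: %s" % data.get("error-codes"))
    score = float(data["score"])  # 1.0 = very likely a human, 0.0 = very likely a bot
    verdict = "likely a human" if score >= human_threshold else "likely a bot"
    return score, verdict


# Scores as reported in the text: 0.7, 0.1, and 0.3.
for body in ('{"success": true, "score": 0.7}',
             '{"success": true, "score": 0.1}',
             '{"success": true, "score": 0.3}'):
    print(interpret_score(body))
```

Under this assumed threshold, the score of 0.3 observed after the release of HOSIT would fall on the bot side, which is why the threshold chosen by a service matters for the interpretation.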

Before conducting research studies with HOSIT, study conductors are ad-

vised to test for anomalies on online services. In addition, study conductors

should monitor which JavaScript attributes were read by online services while

accessing this service [41]. These tests are helpful to determine possible bot de-

tection and to implement countermeasures as a result.

Overall, we still find HOSIT highly suitable for studies for the following

reasons: (i) The reCAPTCHA v3 risk score is still higher than with Puppeteer.

(ii) Not all online services on the Internet use the current reCAPTCHA. (iii) The

API of HOSIT is simpler than the API of comparable tools such as

Selenium and Puppeteer, which makes it much easier to use.

7 Ethical Considerations

As with most tools for black box analysis, HOSIT is considered “dual use”,

i.e., it can be used for illegitimate purposes as well. On the one hand, it can be

beneficial to gather information on service behavior determining our everyday

life. On the other hand, it could also be used for click fraud on online advertising,

theft of intellectual property, or possibly even denial of service. Further, when

using HOSIT, researchers should carefully check not to violate the respective

e.g., by keeping the impact on the inspected online services minimal [29,45].

We believe, however, that the results gathered by public research with HOSIT

can be beneficial for a large user base and thus should take precedence over corporate

goals. We further argue that our work is justified, as the expected gain from

scientific studies outweighs the potential security implications. Ultimately, we

hope that public research based on our inspection tool will be beneficial for

smaller online services. Consequently, security-related research using this tool

will protect a larger user base.

8 Conclusion

In this paper, we presented HOSIT, a framework for automatically invoking

online services in a human-like manner. As many online services try to detect if the

client is a person or a bot, human-imitating behavior is required for automated

service interactions in order to receive the same results as a human user. HOSIT

implements a number of human-like behavior techniques and can be extended

with further methods, as required by the targeted experiment and online service.

HOSIT can be used to circumvent services’ bot-detection and to perform

large-scale research on how online services behave towards human users. This

is particularly interesting if the offered service depends, or is suspected to

depend, on the user’s behavior, location, history, device, and so on. Examples

of such services are results from search engines, information in social networks,

or recommendations in online shops. In particular, our research on RBA [45],

which led to valuable and beneficial results, would not have been possible

without HOSIT. Among other things, we discovered a privacy leak in one of the

RBA dialogs of Facebook and resolved this issue within a responsible disclosure

process.

In future work, we will continuously extend and refine the human-imitating

techniques of HOSIT. To evaluate their effectiveness, we will regularly perform

in-depth analyses of the influence of our methods on bot-detection systems such as

reCAPTCHA. Moreover, we will apply HOSIT to study further

scenarios including, e.g., search engine results and local browser storage usage

patterns. We hope to see more such research conducted on the basis of HOSIT.

For future research on service behavior, we will also pursue alternative

approaches. Instead of researchers performing black box tests with camouflaged tools,

services could grant researchers responsible access. Researchers would benefit from

unbiased results and focus on the analysis (and not on the black box testing

tools), and services could advertise their support for research. This responsible

service access could be monitored by an independent organization or public

authority. A similar method, called a regulatory sandbox, is used successfully in the

financial sector [38] and is currently being discussed for research on personally

identifiable information [39].

Acknowledgements. We would like to thank Tanvi Patil for proofreading a

draft of the paper. This research was supported by the research training group

“Human Centered Systems Security” (NERD.NRW) sponsored by the state of

North Rhine-Westphalia.

References

1. Akamai: Bot-Manager (Jan 2018), https://www.akamai.com/us/en/multimedia/documents/product-brief/bot-manager-product-brief.pdf

2. Allen, N.A.: Risk based authentication, patent number US9202038B1 (2015)

3. Amazon: Amazon.co.uk Help: How do I request my data? (2019), https://www.amazon.co.uk/gp/help/customer/display.html?nodeId=G5NBVNN2RHXD5BUW

4. Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.: DBpedia:

A Nucleus for a Web of Open Data. In: The Semantic Web, vol. 4825, pp. 722–735.

Springer Berlin Heidelberg (Nov 2007)

5. Blythe, J., Botello, A., Sutton, J., Mazzaco, D., Lin, J., Spraragen, M., Zyda, M.:

Testing Cyber Security with Simulated Humans. In: IAAI ’11. San Francisco, CA,

USA (Aug 2011)

6. Bond, R.M., Fariss, C.J., Jones, J.J., Kramer, A.D.I., Marlow, C., Settle, J.E.,

Fowler, J.H.: A 61-million-person experiment in social influence and political mo-

bilization. Nature 489(7415), 295–298 (Sep 2012)

7. Bujlow, T., Carela-Espanol, V., Lee, B.R., Barlet-Ros, P.: A survey on web track-

ing: Mechanisms, implications, and defenses. Proceedings of the IEEE 105(8),

1476–1510 (Aug 2017)

8. Card, S.K., Moran, T.P., Newell, A.: The keystroke-level model for user perfor-

mance time with interactive systems. Communications of the ACM 23(7), 396–410

(Jul 1980)

9. Chaabane, A., Kaafar, M.A., Boreli, R.: Big friend is watching you: Analyzing on-

line social networks tracking capabilities. In: WOSN ’12. pp. 7–12. ACM, Helsinki,

Finland (Aug 2012)

10. Choudhary, S.R., Prasad, M.R., Orso, A.: X-PERT: a web application testing

tool for cross-browser inconsistency detection. In: ISSTA ’14. pp. 417–420.

ACM, San Jose, CA, USA (2014)

11. Dalai, A.K., Jena, S.K.: Online identification of illegitimate web server requests.

In: ICIP ’11. pp. 123–131. Springer, Bangalore, India (2011)

12. Degeling, M., Utz, C., Lentzsch, C., Hosseini, H., Schaub, F., Holz, T.: We Value

Your Privacy ... Now Take Some Cookies: Measuring the GDPR’s Impact on Web

Privacy. In: NDSS ’19. San Diego, CA, USA (Feb 2019)

13. DETER Project: DASH user guide (2014), https://deter-project.org/sites/deter-test.isi.edu/files/files/dash_users_guide.pdf

14. Drury, C.G., Hoffmann, E.R.: A model for movement time on data-entry keyboards.

Ergonomics 35(2), 129–147 (Feb 1992)

15. Englehardt, S., Narayanan, A.: Online Tracking: A 1-million-site Measurement and

Analysis. In: CCS’16. pp. 1388–1401. ACM, Vienna, Austria (Oct 2016)

16. European Parliament and Council: Regulation (EU) 2016/679 (GDPR) (Jan 2016),

http://data.europa.eu/eli/reg/2016/679/oj/eng

17. Franken, G., Goethem, T.V., Joosen, W.: Who Left Open the Cookie Jar? A

Comprehensive Evaluation of Third-Party Cookie Policies. In: USENIX Security

’18. Baltimore, MD, USA (Aug 2018)

18. Freeman, D., Jain, S., Duermuth, M., Biggio, B., Giacinto, G.: Who Are You? A

Statistical Approach to Measuring User Authenticity. In: NDSS ’16. San Diego,

CA, USA (Feb 2016)

19. Golla, M., Dürmuth, M.: On the Accuracy of Password Strength Meters. In: CCS

’18. pp. 1567–1582. ACM, Toronto, Canada (Oct 2018)

20. Google: reCAPTCHA v3 (Jul 2019), https://developers.google.com/recaptcha/docs/v3

21. Google Chrome: Puppeteer - Headless Chrome node API (Jul 2019), https://github.com/googlechrome/puppeteer

22. Grassi, P.A., Fenton, J.L., Newton, E.M., Perlner, R.A., Regenscheid, A.R., Burr,

W.E., Richer, J.P., Lefkovitz, N.B., Danker, J.M., Choong, Y.Y., Greene, K.K.,

Theofanos, M.F.: Digital identity guidelines: authentication and lifecycle manage-

ment. Tech. Rep. NIST SP 800-63b, National Institute of Standards and Technol-

ogy, Gaithersburg, MD (Jun 2017)

23. Iaroshevych, O.: Improving Second Factor Authentication Challenges to Help Pro-

tect Facebook account owners. In: SOUPS ’17. USENIX Association, Santa Clara,

CA, USA (Jul 2017)

24. Judd, T., Kennedy, G.: A five-year study of on-campus Internet use by undergrad-

uate biomedical students. Computers & Education 55(4), 1564–1571 (Dec 2010)

25. Komandur, S., Johnson, P.W., Storch, R.: Relation between mouse button click

duration and muscle contraction time. In: EMBC ’08. IEEE (Aug 2008)

26. Li, T.C., Hang, H., Faloutsos, M., Efstathopoulos, P.: TrackAdvisor: Taking Back

Browsing Privacy from Third-Party Trackers. In: Passive and Active Measurement,

vol. 8995, pp. 277–289. Springer International Publishing, Cham (2015)

27. Mark, G., Wang, Y., Niiya, M.: Stress and multitasking in everyday college life: an

empirical study of online activity. In: CHI ’14. ACM, Toronto, Canada (2014)

28. Milka, G.: Anatomy of Account Takeover. In: Enigma 2018. USENIX Association,

Santa Clara, CA (Jan 2018), https://www.usenix.org/node/208154

29. Petsas, T., Tsirantonakis, G., Athanasopoulos, E., Ioannidis, S.: Two-factor au-

thentication: Is the world ready?: Quantifying 2FA adoption. In: EuroSec ’15. pp.

4:1–4:7. ACM, Bordeaux, France (Apr 2015)

30. Rsmwe: Rakuten.com Chrome Headless Detection (Feb 2018), https://github.com/Rsmwe/Headless-detected-demo

31. Smith, B., Linden, G.: Two Decades of Recommender Systems at Amazon.com.

IEEE Internet Computing 21(3), 12–18 (May 2017)

32. Snickars, P., Mähler, R.: SpotiBot — Turing Testing Spotify. Digital Humanities

Quarterly 12, 12 (2018)

33. Soukoreff, R.W., MacKenzie, I.S.: Towards a standard for pointing device evalua-

tion, perspectives on 27 years of Fitts’ law research in HCI. International Journal

of Human-Computer Studies 61(6), 751–789 (Dec 2004)

34. Starov, O., Nikiforakis, N.: XHOUND: Quantifying the Fingerprintability of

Browser Extensions. In: IEEE S&P. IEEE, San Jose, CA, USA (May 2017)

35. Steward, S., Burns, D.: WebDriver - W3C Recommendation (Jun 2018), https://www.w3.org/TR/webdriver1/

36. Sulikowski, P., Zdziebko, T., Turzyński, D., Kańtoch, E.: Human-website interaction

monitoring in recommender systems. Procedia Computer Science 126, 1587–1596 (2018)

37. Trauzettel-Klosinski, S., Dietz, K.: Standardized Assessment of Reading Performance:

The New International Reading Speed Texts IReST. Investigative Ophthalmology

& Visual Science 53(9), 5452 (Aug 2012)

38. UK Financial Conduct Authority: Regulatory Sandbox Lessons Learned Report

(2017), https://www.fca.org.uk/publication/research-and-data/regulatory-sandbox-lessons-learned-report.pdf

39. UK Information Commissioner’s Office: Call for Views on Building a Sandbox:

Summary of Responses and ICO Comment (2018),

https://ico.org.uk/media/about-the-ico/consultations/2260322/201811-sandbox-call-for-views-analysis.pdf

40. Vastel, A.: Detecting Chrome headless, new techniques (Jan 2018),

https://antoinevastel.com/bot%20detection/2018/01/17/detect-chrome-headless-v2.html

41. Vastel, A.: How to monitor the execution of JavaScript code with Puppeteer and

Chrome headless (Jun 2019), https://antoinevastel.com/javascript/2019/06/10/monitor-js-execution.html

42. Velayathan, G., Yamada, S.: Behavior-Based Web Page Evaluation. In: WI-IAT

’06. pp. 409–412 (Dec 2006)

43. Venkatadri, G., Lucherini, E., Sapiezynski, P., Mislove, A.: Investigating sources

of PII used in Facebook’s targeted advertising. PETS 2019, 227–244 (Jan 2019)

44. W3Schools: Browser Statistics: The Most Popular Browsers (2019), https://www.w3schools.com/browsers/default.asp

45. Wiefling, S., Lo Iacono, L., Dürmuth, M.: Is This Really You? An Empirical

Study on Risk-Based Authentication Applied in the Wild. In: IFIP SEC

’19. Springer International Publishing (Jun 2019), https://doi.org/10.1007/978-3-030-22312-0_10

46. Williams, J.L., Skinner, C.H., Floyd, R.G., Hale, A.D., Neddenriep, C., Kirk, E.P.:

Words correct per minute: The variance in standardized reading scores accounted

for by reading speed. Psychology in the Schools 48(2), 87–101 (Feb 2011)