
1 Introduction
Artificial Intelligence (AI) has experienced unprecedented success in recent years thanks to the progress accomplished in Machine Learning (ML), and more specifically Deep Learning (DL). These advances raise several questions about AI safety and ethics [1]. In this work, we do not provide answers to these questions, but we show that AI systems based on ML algorithms, such as reCAPTCHA v3 [2], are still vulnerable to automated attacks. Google's reCAPTCHA system, which distinguishes humans from bots, is the most widely used defense mechanism on websites. Its purpose is to protect websites against automated agents, bots, attacks, and spam. Previous versions of Google's reCAPTCHA (v1 and v2) presented tasks (images, letters, audio) that are easily solved by humans but challenging for computers. reCAPTCHA v1 showed distorted text that the user had to type correctly to pass the test. This version was defeated by Bursztein et al. [3] with 98% accuracy using an ML-based system to segment and recognize the text. As a result, image-based and audio-based reCAPTCHAs were introduced as a second version. Researchers have also succeeded in breaking these versions using ML, and more specifically DL. For example, the authors in [4] designed an AI-based system called UnCAPTCHA to break Google's most challenging audio reCAPTCHAs. On 29 October 2018, the official third version was released [5]; it removed any user interface. Google's reCAPTCHA v3 uses ML to return a risk assessment score between 0.0 and 1.0. This score characterizes the trustworthiness of the user: a score close to 1.0 means that the user is human.
In this work, we introduce an RL formulation to defeat this version of reCAPTCHA. Our approach proceeds in three steps: first, we propose a plausible formalization of the problem as a Markov Decision Process (MDP) solvable by state-of-the-art RL algorithms; then, we introduce a new environment for interacting with the reCAPTCHA system; finally, we analyze how the RL agent learns or fails to defeat Google reCAPTCHA. Experimental results show that the RL agent passes the reCAPTCHA test with 97.4% accuracy. To our knowledge, this is the first attempt to defeat reCAPTCHA v3 using RL.
2 Method
2.1 Preliminaries
An agent interacting with an environment is modeled as a Markov Decision Process (MDP) [6]. An MDP is defined as a tuple $(S, A, P, r)$ where $S$ and $A$ are the sets of possible states and actions respectively, $P(s, a, s')$ is the transition probability between states, and $r$ is the reward function. Our objective is to find an optimal policy $\pi^*$ that maximizes the expected future rewards. Policy-based methods learn $\pi^*$ directly. Let us assume that the policy is parameterized by a set of weights $w$ such that $\pi = \pi(s, w)$. The objective is then defined as $J(w) = \mathbb{E}_\pi\left[\sum_{t=0}^{T} \gamma^t r_t\right]$, where $\gamma$ is the discount factor and $r_t$ is the reward at time $t$.
Thanks to the policy gradient theorem and the gradient trick [7], the Reinforce algorithm [8] estimates gradients using
(1).
$$\nabla \, \mathbb{E}_\pi\!\left[\sum_{t=0}^{T} \gamma^t r_t\right] = \mathbb{E}_\pi\!\left[\sum_{t=0}^{T} \nabla \log \pi(a_t \mid s_t)\, R_t\right] \qquad (1)$$
$R_t$ is the future discounted return at time $t$, defined as $R_t = \sum_{k=t}^{T} \gamma^{k-t}\, r_k$, where $T$ marks the end of an episode.
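For illustration, a minimal Python sketch (not from the paper) of computing these discounted returns from a list of per-step rewards:

```python
import numpy as np

def discounted_returns(rewards, gamma):
    """Compute R_t = sum_{k=t}^{T} gamma^(k-t) * r_k for every time step t."""
    returns = np.zeros(len(rewards))
    running = 0.0
    # Iterate backwards so that each R_t reuses the already computed R_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Example: rewards 0, 0, 1 with gamma = 0.99 give R = [0.9801, 0.99, 1.0].
print(discounted_returns([0.0, 0.0, 1.0], gamma=0.99))
```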
Usually, equation (1) is formulated as the gradient of a loss function $L(w)$ defined as $L(w) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{t=0}^{T} \log \pi(a_t^i \mid s_t^i)\, R_t^i$, where $N$ is the number of collected episodes.
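An illustrative PyTorch-style sketch of this loss over $N$ collected episodes follows; the function and variable names are assumptions for the example, not from the paper's code:

```python
import torch

def reinforce_loss(episodes):
    """L(w) = -(1/N) * sum_i sum_t log pi(a_t^i | s_t^i) * R_t^i.

    `episodes` is a list of (log_probs, returns) pairs, one pair per episode:
    log_probs is a 1-D tensor of log pi(a_t | s_t) kept in the autograd graph,
    returns is a 1-D tensor of the discounted returns R_t (treated as constants).
    """
    total = torch.tensor(0.0)
    for log_probs, returns in episodes:
        total = total + torch.sum(log_probs * returns.detach())
    return -total / len(episodes)

# Minimizing this loss with a standard optimizer (e.g. torch.optim.Adam)
# performs stochastic gradient ascent on the expected return J(w).
```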
2.2 Settings
To pass the reCAPTCHA test, a human user moves the mouse from an initial position, performs a sequence of steps until reaching the reCAPTCHA check-box, and clicks on it. Based on this interaction, the reCAPTCHA system rewards the user with a score. In this work, we model this process as an MDP where the state space $S$ is the set of possible mouse positions on the web page and the action space is $A = \{up, left, right, down\}$. With these settings, the task becomes similar to a grid-world problem.
As shown in Figure 1, the starting point is the initial mouse position and the goal is the position of the reCAPTCHA check-box in the web page. For each episode, the starting point is randomly chosen from a top-right or a top-left region representing 2.5% of the browser window's area (5% on the x-axis and 5% on the y-axis). A grid is then constructed where each pixel between the initial and final points is a possible position for the mouse. We assume that a normal user will not necessarily move the mouse pixel by pixel. Therefore, we define a cell size $c$, which is the number of pixels between two consecutive positions. For example, if the agent is at position $(x_0, y_0)$ and takes the action left, the next position is $(x_0 - c, y_0)$.
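To make the transition dynamics concrete, here is a minimal Python sketch of such a grid-world step; the constants and function are illustrative assumptions, not part of the released environment:

```python
CELL_SIZE = 10  # the cell size c: pixels between two consecutive positions (illustrative value)

# Displacement of each action in (x, y) pixel coordinates,
# with the origin at the top-left corner and y growing downwards.
ACTIONS = {
    "up":    (0, -CELL_SIZE),
    "down":  (0,  CELL_SIZE),
    "left":  (-CELL_SIZE, 0),
    "right": ( CELL_SIZE, 0),
}

def step(position, action, width, height):
    """Apply one action and clip the new mouse position to the browser window."""
    dx, dy = ACTIONS[action]
    x, y = position
    new_x = min(max(x + dx, 0), width - 1)
    new_y = min(max(y + dy, 0), height - 1)
    return (new_x, new_y)

# Example: taking the action "left" from (100, 200) moves the mouse to (90, 200),
# i.e. from (x0, y0) to (x0 - c, y0).
print(step((100, 200), "left", 1920, 1080))
```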