JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, FEBRUARY 2021
TypeNet: Deep Learning Keystroke Biometrics
Alejandro Acien, Aythami Morales, John V. Monaco, Ruben Vera-Rodriguez, Julian Fierrez, Member, IEEE
Abstract—We study the performance of Long Short-Term Memory networks for keystroke biometric authentication at large scale in
free-text scenarios. To this end, we introduce TypeNet, a Recurrent Neural Network (RNN) trained with a moderate number of keystrokes
per identity. We evaluate different learning approaches depending on the loss function (softmax, contrastive, and triplet loss), number
of gallery samples, length of the keystroke sequences, and device type (physical vs touchscreen keyboard). With 5 gallery sequences
and test sequences of length 50, TypeNet achieves state-of-the-art keystroke biometric authentication performance with an Equal Error
Rate of 2.2% and 9.2% for physical and touchscreen keyboards, respectively, significantly outperforming previous approaches. Our
experiments show a moderate increase in error with up to 100,000 subjects, demonstrating the potential of TypeNet to operate
at an Internet scale. We utilize two Aalto University keystroke databases, one captured on physical keyboards and the second on
mobile devices (touchscreen keyboards). To the best of our knowledge, both databases are the largest existing free-text keystroke
databases available for research, with more than 136 million keystrokes from 168,000 subjects on physical keyboards, and 60,000
subjects with more than 63 million keystrokes acquired on mobile touchscreens.
Index Terms—Biometrics, keystroke dynamics, large scale, deep learning, TypeNet, keystroke authentication.
1 INTRODUCTION
Keystroke dynamics is a behavioral biometric trait aimed
at recognizing individuals based on their typing habits. The
velocity of pressing and releasing different keys [1], the
hand postures during typing [2], and the pressure exerted
when pressing a key [3] are some of the features taken
into account by keystroke biometric algorithms aimed at
discriminating among subjects. Although keystroke biometrics
suffers from high intra-class variability for person recognition,
especially in free-text scenarios (i.e. the input text typed
is not fixed between enrollment and testing), the ubiquity
of keyboards as a method of text entry makes keystroke
dynamics a near universal modality to authenticate subjects
on the Internet.
Text entry is prevalent in day-to-day applications: un-
locking a smartphone, accessing a bank account, chatting
with acquaintances, email composition, posting content on
a social network, and e-learning [4]. As a means of subject
authentication, keystroke dynamics is economical because
it can be deployed on commodity hardware and remains
transparent to the user. These properties have prompted
several companies to capture and analyze keystrokes. The
global keystroke biometrics market is projected to grow
from $129.8 million (2017 estimate) to $754.9 million
by 2025, a rate of up to 25% per year1. As an example,
Google has recently committed $7 million to fund
TypingDNA2, a startup company that authenticates people
based on their typing behavior.
At the same time, the security challenges that keystroke
biometrics promises to solve are constantly evolving and
getting more sophisticated every year: identity fraud, account
takeover, sending unauthorized emails, and credit card fraud
are some examples3.
A. Acien, A. Morales, R. Vera-Rodriguez, and J. Fierrez are with the
School of Engineering, Universidad Autonoma de Madrid, 28049 Madrid,
Spain (e-mail: alejandro.acien@uam.es; aythami.morales@uam.es;
ruben.vera@uam.es; julian.fierrez@uam.es).
J. V. Monaco is with the Naval Postgraduate School, Monterey CA, USA
(e-mail: vinnie.monaco@nps.edu).
1. https://www.prnewswire.com/news-releases/keystroke
2. https://siliconcanals.com/news/
These challenges are mag-
nified when dealing with applications that have hundreds
of thousands to millions of users. In this context, keystroke
biometric algorithms capable of authenticating individuals
while interacting with online applications are more neces-
sary than ever. As an example of this, Wikipedia struggles to
solve the problem of 'edit wars' that happen when different
groups of editors represent opposing opinions. According
to [5], up to 12% of the discussions in Wikipedia are
devoted to reverting changes and vandalism, suggesting that
the Wikipedia criteria to identify and resolve controversial
articles are highly contentious. Large-scale keystroke bio-
metrics algorithms could be used to detect these malicious
editors among the thousands of editors who write articles
in Wikipedia every day. Other applications of keystroke
biometric technologies are found in e-learning platforms;
student identity fraud and cheating are some challenges that
virtual education technologies need to address to become
a viable alternative to face-to-face education [4].
The literature on keystroke biometrics is extensive, but
to the best of our knowledge, previous systems have only
been evaluated with up to several hundred subjects and
cannot deal with the recent challenges that massive-scale
applications are facing. The aim of this paper is to explore
the feasibility and limits of deep learning architectures for
scaling up free-text keystroke biometrics to hundreds of
thousands of users. The main contributions of this work are
threefold:
1) We explore novel free-text keystroke biometrics ap-
proaches based on Deep Recurrent Neural Net-
works, suitable for authentication and identification
at large scale. We conduct an exhaustive exper-
imentation and evaluate how performance is af-
fected by the following factors: the length of the
keystroke sequences, the number of gallery samples,
3. https://150sec.com/fraudulent-fingertips
arXiv:2101.05570v2 [cs.CV] 18 Feb 2021
and the device (touchscreen vs physical keyboard).
We present TypeNet, a Recurrent Neural Network
trained with keystroke sequences from more than
100,000 subjects. We analyze the performance of
three different loss functions (softmax, contrastive,
triplet) used to train TypeNet.
2) The results reported by TypeNet represent the state
of the art in keystroke authentication based on free-
text, reducing the error obtained by previous works
by more than 50%. Processed data has been made
available so the results can be reproduced4. We
evaluate TypeNet in terms of Equal Error Rate (EER)
as the number of test subjects is scaled from 100
up to 100,000 (independent from the training data)
for the desktop scenario (physical keyboards) and
up to 30,000 for the mobile scenario (touchscreen
keyboard). TypeNet learns a feature representation
of a keystroke sequence without the need for re-
training if new subjects are added to the database,
as commonly happens in many biometric systems
[6]. Therefore, TypeNet is easily scalable.
3) We carry out a comparison with previous state-of-
the-art approaches for free-text keystroke biometric
authentication. The performance achieved by the
proposed method outperforms previous approaches
in the scenarios evaluated in this work. The results
suggest that authentication error rates achieved by
TypeNet remain low as thousands of new users are
enrolled.
A preliminary version of this article was presented in
[7]. This article significantly improves [7] in the following
aspects:
1) We add a new version of TypeNet trained and
tested with keystroke sequences acquired on mobile
devices, and report results in the mobile scenario.
Additionally, we provide cross-sensor interoperability
results [8], [9] between desktop and mobile datasets.
2) We include two new loss functions (softmax and
triplet loss) that improve performance in all scenarios.
Our experiments demonstrate that triplet loss can
be used to double the accuracy of free-text keystroke
authentication approaches.
3) We evaluate TypeNet in terms of Rank-n identifica-
tion rates using a background set of 1,000 subjects
(independent from the training data).
4) We add experiments on the dependencies between
input text and TypeNet performance, a common
issue in free-text keystroke biometrics.
In summary, we present the first evidence in the lit-
erature of competitive performance of free-text keystroke
biometric authentication at large scale (up to 100,000 test
subjects). The results reported in this work demonstrate
the potential of this behavioral biometric for widespread
deployment.
The paper is organized as follows: Section 2 summarizes
related works in free-text keystroke dynamics. Section 3
describes the datasets used for training and testing TypeNet
models. Section 4 describes the processing steps and learning
methods in TypeNet. Section 5 details the experimental
protocol. Section 6 reports the experiments and discusses
the results obtained. Section 7 summarizes the conclusions
and future work.
4. Data available at: https://github.com/BiDAlab/TypeNet
2 BACKGROUND AND RELATED WORK
The measurement of keystroke dynamics depends on the
acquisition of key press and release events. This can oc-
cur on almost any commodity device that supports text
entry, including desktop and laptop computers, mobile and
touchscreen devices that implement soft (virtual) keyboards,
and PIN entry devices such as those used to process credit
card transactions. Generally, each keystroke (the action of
pressing and releasing a single key) results in a keydown
event followed by a keyup event, and the sequence of these
timings is used to characterize an individual’s keystroke dy-
namics. Within a web browser, the acquisition of keydown
and keyup event timings requires no special permissions,
enabling the deployment of keystroke biometric systems
across the Internet in a transparent manner.
Keystroke biometric systems are commonly placed into
two categories: fixed-text, where the keystroke sequence
typed by the subject is prefixed, such as a username or
password, and free-text, where the keystroke sequence is
arbitrary, such as writing an email or transcribing a sen-
tence with typing errors. Notably, free-text input results in
different keystroke sequences between the gallery and test
samples as opposed to fixed-text input. Biometric authenti-
cation algorithms based on keystroke dynamics for desktop
and laptop keyboards have been predominantly studied
in fixed-text scenarios where accuracies higher than 95%
are common [18]. Approaches based on sample alignment
(e.g. Dynamic Time Warping) [18], Manhattan distances [19],
digraphs [20], and statistical models (e.g. Hidden Markov
Models) [21] have been shown to achieve the best results in
fixed-text.
Nevertheless, the performance of free-text algorithms
is generally far from that reached in the fixed-text sce-
nario, as the complexity and variability of the text entry
contribute to intra-subject variations in behavior, challeng-
ing the ability to recognize subjects [22]. Monrose and Rubin
[10] proposed in 1997 a free-text keystroke algorithm based
on subject profiling by using the mean latency and stan-
dard deviation of digraphs and computing the Euclidean
distance between each test sample and the reference pro-
file. Their correct classification rate worsened from 90%
to 23% when they changed both subject profiles and test
samples from fixed-text to free-text. Gunetti and
Picardi [11] extended the previous algorithm to n-graphs.
They calculated the duration of n-graphs common between
training and testing and defined a distance function based
on the duration and order of such n-graphs. Their results of
7.33% classification error outperformed the previous state of
the art. Nevertheless, their algorithm needs long keystroke
sequences (between 700 and 900 keystrokes) and many
keystroke sequences (up to 14) to build the subject profile,
which limits the usability of that approach. Murphy et al.
[15] more recently collected a very large free-text keystroke
Study | Scenario | #Subjects | #Seq. | Sequence Size | #Keys | Best Performance
Monrose and Rubin (1997) [10] | Desktop | 31 | N/A | N/A | N/A | ACC = 23%
Gunetti and Picardi (2005) [11] | Desktop | 205 | 1-15 | 700-900 keys | 688K | EER = 7.33%
Kim and Kang (2009) [12] | Mobile | 50 | 20 | 200 keys | 200K | EER = 0.05%
Gascon et al. (2014) [13] | Mobile | 315 | 1-10 | 160 keys | 67K | EER = 10.0%
Ceker and Upadhyaya (2016) [14] | Desktop | 34 | 2 | 7K keys | 442K | EER = 2.94%
Murphy et al. (2017) [15] | Desktop | 103 | N/A | 1,000 keys | 12.9M | EER = 10.36%
Monaco and Tappert (2018) [16] | Both | 55 | 6 | 500 keys | 165K | EER = 0.6%
Deb et al. (2019) [17] | Mobile | 37 | 180K | 3 seconds | 6.7M | 81.61% TAR at 0.1% FAR
Ours (2020) | Both | 228K | 15 | 70 keys | 199M | EER = 2.2%
TABLE 1
Comparison among different free-text keystroke datasets employed in relevant related works. N/A = Not Available. ACC = Accuracy, EER = Equal
Error Rate, TAR = True Acceptance Rate, FAR = False Acceptance Rate.
dataset (2.9M keystrokes) and applied the Gunetti and Pi-
cardi algorithm achieving 10.36% classification error using
sequences of 1,000 keystrokes and 10 genuine sequences to
authenticate subjects.
Since the pioneering works of Monrose and Gunetti,
some algorithms based on statistical models have been
shown to work very well with free-text, such as the POHMM
(Partially Observable Hidden Markov Model) [16]. This
algorithm is an extension of the traditional Hidden Markov
Model (HMM), but with the difference that each hidden
state is conditioned on an independent Markov chain. This
algorithm is motivated by the idea that keystroke timings
depend both on past events and the particular key that
was pressed. Performance achieved using this approach in
free-text is close to fixed-text, but it again requires several
hundred keystrokes and has only been evaluated with a
database containing fewer than 100 subjects.
The performance of keystroke biometric systems on
mobile devices can in some cases exceed that of desktop
systems. Unlike physical keyboards, touchscreen keyboards
support a variety of input methods, such as swipe, which
enables text entry by sliding the finger along a path that vis-
its each letter and lifting the finger only between words. The
ability to enter text in ways other than physical key pressing
has led to a greater variety of text entry strategies employed
by typists [23]. In addition, mobile devices are readily
equipped with additional sensors that offer more insight
into a user's keystroke dynamics. These include the touchscreen
itself, which can sense the location and pressure of each touch,
as well as accelerometer, gyroscope, and orientation sensors.
As with desktop keystroke biometrics, many mobile
keystroke biometric studies have focused on fixed-text se-
quences [24]. Some recent works have considered free-text
sequences on mobile devices. Gascon et al. [13] collected
freely typed samples from over 300 participants and de-
veloped a system that achieved a True Acceptance Rate
(TAR) of 92% at 1% False Acceptance Rate (FAR) (an EER of
about 10%). Their system utilized accelerometer, gyroscope,
time, and orientation features. Each user typed an English
pangram (sentence containing every letter of the alphabet)
approximately 160 characters in length, and classification
was performed by Support Vector Machine (SVM). In other
work, Kim and Kang [12] utilized microbehavioral features
to obtain an EER below 0.05% for 50 subjects with a single
reference sample of approximately 200 keystrokes for both
English and Korean input. The microbehavioral features
consist of angular velocities along three axes when each key
is pressed and released, as well as timing features and the
coordinate of the touch event within each key. See [24] for a
survey of keystroke biometrics on mobile devices.
Because mobile devices are not stationary, mobile
keystroke biometrics depend more heavily on environmen-
tal conditions, such as the user’s location or posture, than
physical keyboards which typically remain stationary. This
challenge of mobile keystroke biometrics was examined
by Crawford and Ahmadzadeh in [25]. They found that
authenticating a user in different positions (sitting, standing,
or walking) performed only slightly better than guessing,
but detecting the user’s position before authentication can
significantly improve performance.
Nowadays, with the proliferation of machine learning
algorithms capable of analysing and learning human behav-
iors from large scale datasets, the performance of keystroke
dynamics in the free-text scenario has been boosted. As an
example, [14] proposes a combination of the existing di-
graphs method for feature extraction plus an SVM classifier
to authenticate subjects. This approach achieves almost 0%
error rate using samples containing 500 keystrokes. These
results are very promising, even though it was evaluated
using a small dataset with only 34 subjects. In [17] the
authors employ an RNN within a Siamese architecture to
authenticate subjects based on 8 biometric modalities on
smartphone devices. They achieved results in a free-text
scenario of 81.61% TAR at 0.1% FAR using just 3-second
test windows with a dataset of 37 subjects.
Previous works in free-text keystroke dynamics have
achieved promising results with up to several hundred
subjects (see Table 1), but they have yet to scale beyond this
limit and leverage emerging machine learning techniques
that benefit from vast amounts of data. Here we take a step
forward in this direction of machine learning-based free-text
keystroke biometrics by using the largest datasets published
to date with 199 million keystrokes from 228,000 subjects
(considering both mobile and desktop datasets). We analyze
to what extent deep learning models are able to scale in
keystroke biometrics to recognize subjects at a large scale
while attempting to minimize the amount of data per subject
required for enrollment.
3 KEYSTROKE DATASETS
All experiments are conducted with two Aalto University
Datasets: 1) the Dhakal et al. dataset [26], which comprises
more than 5GB of keystroke data collected on desktop
keyboards from 168,000 participants; and 2) the Palin et
Fig. 1. Example of the 4 temporal features extracted between two con-
secutive keys: Hold Latency (HL), Inter-key Latency (IL), Press Latency
(PL), and Release Latency (RL).
al. dataset [23], which comprises almost 4GB of keystroke
data collected on mobile devices from 260,000 participants.
The same data collection procedure was followed for both
datasets. The acquisition task required subjects to memorize
English sentences and then type them as quickly and ac-
curately as they could. The English sentences were selected
randomly from a set of 1,525 examples taken from the
Enron mobile email and Gigaword Newswire corpora. The
example sentences contained a minimum of 3 words and a
maximum of 70 characters. Note that the sentences typed
by the participants could contain more than 70 characters
because each participant could forget or add new charac-
ters when typing. All participants in the Dhakal database
completed 15 sessions (i.e. one sentence for each session) on
either a desktop or a laptop physical keyboard. However,
in the Palin dataset only 23% of the participants (60,000 out
of the 260,000 who started the typing test) finished at least
15 sessions. In this paper we employ these 60,000 subjects
with their first 15 sessions in order to allow fair comparisons
between both datasets.
For the data acquisition, the authors launched an online
application that records the keystroke data from participants
who visit their webpage and agree to complete the acqui-
sition task (i.e. the data was collected in an uncontrolled
environment). Press (keydown) and release (keyup) event
timings were recorded in the browser with millisecond reso-
lution using the JavaScript function Date.now. The authors
also reported demographic statistics for both datasets: 72%
of the participants from the Dhakal database took a typing
course, 218 countries were involved, and 85% of them are
native English speakers; meanwhile, only 31% of the
participants from the Palin database took a typing course,
163 countries were involved, and 68% of them are
native English speakers.
4 SYSTEM DESCRIPTION
4.1 Pre-processing and Feature Extraction
The raw data captured in each session includes a time
series with three dimensions: the keycodes, press times, and
release times of the keystroke sequence. Timestamps are in
UTC format with millisecond resolution, and the keycodes
are integers between 0 and 255 according to the ASCII code.
We extract 4 temporal features for each sequence (see
Fig. 1 for details): (i) Hold Latency (HL), the elapsed time
Fig. 2. Architecture of TypeNet for free-text keystroke sequences. The
input x is a time series with shape M × 5 (keystrokes × keystroke
features) and the output f(x) is an embedding vector with shape 1 × 128.
between key press and release events; (ii) Inter-key Latency
(IL), the elapsed time between releasing a key and pressing
the next key; (iii) Press Latency (PL), the elapsed time
between two consecutive press events; and (iv) Release
Latency (RL), the elapsed time between two consecutive
release events. These 4 features are commonly used in both
fixed-text and free-text keystroke systems [27]. Finally, we
include the keycodes as an additional feature.
The 5 features are calculated for each keystroke in the
sequence. Let N be the length of the keystroke sequence,
such that each sequence provided as input to the model is a
time series with shape N × 5 (N keystrokes by 5 features).
All feature values are normalized before being provided
as input to the model. Normalization is important so that
the activation values of neurons in the input layer of the
network do not saturate (i.e. all close to 1). The keycodes
are normalized to between 0 and 1 by dividing each keycode
by 255, and the 4 timing features are converted to seconds.
This scales most timing features to between 0 and 1, as the
average typing rate over the entire dataset is 5.1 ± 2.1 keys
per second. Only latency features that occur either during
very slow typing or long pauses exceed a value of 1.
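As an illustration, the pre-processing above can be sketched as follows. This is a minimal reconstruction, not the authors' released code; the function name extract_features and the zero value assigned to the undefined latencies of the first keystroke (IL, PL, RL are only defined between two consecutive keys) are our assumptions.

```python
import numpy as np

def extract_features(keycodes, press_times, release_times):
    """Build the N x 5 TypeNet input from raw keystroke events.

    Features follow the paper: HL, IL, PL, RL plus the keycode.
    Timestamps are assumed in milliseconds; latencies are converted
    to seconds and keycodes scaled to [0, 1] by dividing by 255.
    """
    keycodes = np.asarray(keycodes, dtype=float)
    press = np.asarray(press_times, dtype=float)
    release = np.asarray(release_times, dtype=float)

    hl = (release - press) / 1000.0                       # Hold Latency
    il = np.r_[0.0, (press[1:] - release[:-1]) / 1000.0]  # Inter-key Latency
    pl = np.r_[0.0, np.diff(press) / 1000.0]              # Press Latency
    rl = np.r_[0.0, np.diff(release) / 1000.0]            # Release Latency

    return np.stack([hl, il, pl, rl, keycodes / 255.0], axis=1)
```

For example, two keystrokes pressed at 0 ms and 200 ms and released at 100 ms and 320 ms yield HL = [0.1, 0.12] s and IL = [0, 0.1] s.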
4.2 TypeNet Architecture
In keystroke dynamics, it is thought that idiosyncratic
behaviors that enable authentication are characterized by
the relationship between consecutive key press and release
events (e.g. temporal patterns, typing rhythms, pauses,
typing errors). In a free-text scenario, keystroke sequences
between enrollment and testing may differ in both length
and content. This motivates our choice of a Recurrent
Neural Network as our keystroke authentication algorithm.
RNNs have proven to be one of the best algorithms to
deal with temporal data (e.g. [28], [29]) and are well suited
for free-text keystroke sequences (e.g. [17], [30]).
Our RNN architecture is depicted in Fig. 2. It is com-
posed of two Long Short-Term Memory (LSTM) layers of
128 units (tanh activation function). Between the LSTM
layers, we perform batch normalization and dropout at a
rate of 0.5 to avoid overfitting. Additionally, each LSTM
layer has a recurrent dropout rate of 0.2.
One constraint when training an RNN using standard
backpropagation through time applied to a batch of se-
quences is that the number of elements in the time dimen-
sion (i.e. number of keystrokes) must be the same for all
sequences. We set the size of the time dimension to M. In
order to train the model with sequences of different lengths
N within a single batch, we truncate the end of the input
sequence when N > M and zero pad at the end when
N < M, in both cases to the fixed size M. Error gradients
are not computed for those zeros and do not contribute
to the loss function at the output layer as a result of the
masking layer shown in Fig. 2.
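The truncation/zero-padding step can be sketched as follows (illustrative only; fix_length is our name, and the default M = 50 simply matches the test sequence length used later in the paper):

```python
import numpy as np

def fix_length(seq, M=50):
    """Force a keystroke sequence of shape (N, 5) to shape (M, 5):
    truncate the end when N > M, zero-pad at the end when N < M.
    The zeros are later ignored by the masking layer."""
    seq = np.asarray(seq, dtype=float)
    if len(seq) >= M:
        return seq[:M]
    pad = np.zeros((M - len(seq), seq.shape[1]))
    return np.vstack([seq, pad])
```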
Finally, the output of the model f(x) is an array of size
1 × 128 that we will employ later as an embedding feature
vector to recognize subjects.
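The architecture just described can be sketched in Keras (the framework named in Sec. 4.4). This is a minimal reconstruction from the textual description, not the authors' released code; M = 50 is an assumed fixed time dimension.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential, layers

M = 50  # fixed time dimension (assumed here; any M works)

def build_typenet():
    """Sketch of the TypeNet embedding model of Sec. 4.2: masking of
    zero padding, two 128-unit LSTM layers (tanh activation,
    recurrent dropout 0.2), with batch normalization and dropout 0.5
    between them. The output is a 128-dimensional embedding f(x)."""
    return Sequential([
        layers.Input(shape=(M, 5)),
        layers.Masking(mask_value=0.0),
        layers.LSTM(128, activation="tanh", recurrent_dropout=0.2,
                    return_sequences=True),
        layers.BatchNormalization(),
        layers.Dropout(0.5),
        layers.LSTM(128, activation="tanh", recurrent_dropout=0.2),
    ])
```

The Masking layer is what prevents the zero padding from contributing to the error gradients, as described above.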
4.3 LSTM Training: Loss Functions
Our goal is to build a keystroke biometric system capable of
generalizing to new subjects not seen during model training,
and therefore, having competitive performance when de-
ployed to applications with thousands of users. Our RNN
is trained only once on an independent set of subjects. This
model then acts as a feature extractor that provides input
to a distance-based recognition scheme. After training the
RNN once, we will evaluate in the experimental section the
recognition performance for a varying number of subjects
and enrollment samples per subject.
We train our deep model with three different loss func-
tions: Softmax loss, which is widely used in classification
tasks; Contrastive loss, a loss for distance metric learning
based on two samples [31]; and Triplet loss, a loss for metric
learning based on three samples [32]. These are each defined
as follows.
4.3.1 Softmax loss
Let x_i be a keystroke sequence of individual I_i, and let us
introduce a dense layer after the embeddings described in
the previous section, aimed at classifying the individuals
used for learning (see Fig. 3.a). The Softmax loss is applied
as

\mathcal{L}_S = -\log \frac{e^{f^C_{I_i}(x_i)}}{\sum_{c=1}^{C} e^{f^C_c(x_i)}}   (1)

where C is the number of classes used for learning (i.e. iden-
tities), f^C = [f^C_1, \ldots, f^C_C], and after learning all elements of
f^C will tend to 0 except f^C_{I_i}(x_i), which will tend to 1. Softmax is
widely used in classification tasks because it provides good
performance on closed-set problems. Nonetheless, Softmax
does not optimize the margin between classes. Thus, the
performance of this loss function usually decays for prob-
lems with high intra-class variance. In order to train the
architecture proposed in Fig. 2, we have added an output
classification layer with C units (see Fig. 3.a). During the
training phase, the model will learn discriminative infor-
mation from the keystroke sequences and transform this
information into an embedding space where the embedding
vectors f(x) (the outputs of the model) will be close when
both keystroke inputs belong to the same subject (genuine
pairs), and far apart in the opposite case (impostor pairs).
4.3.2 Contrastive loss
Let x_i and x_j each be a keystroke sequence that together
form a pair which is provided as input to the model. The
Contrastive loss calculates the Euclidean distance between
the model outputs,

d(x_i, x_j) = \| f(x_i) - f(x_j) \|   (2)

where f(x_i) and f(x_j) are the model outputs (embedding
vectors) for the inputs x_i and x_j, respectively. The model will
learn to make this distance small (close to 0) when the input
pair is genuine and large (close to α) for impostor pairs by
computing the loss function L_CL defined as follows:

\mathcal{L}_{CL} = (1 - L_{ij}) \frac{d^2(x_i, x_j)}{2} + L_{ij} \frac{\max^2\{0, \alpha - d(x_i, x_j)\}}{2}   (3)
where L_{ij} is the label associated with each pair, set
to 0 for genuine pairs and 1 for impostor ones, and α ≥ 0
is the margin (the maximum margin between genuine and
impostor distances). The Contrastive loss is trained using
a Siamese architecture (see Fig. 3.b) that minimizes the
distance between embedding vectors from the same class
(d(x_i, x_j) with L_{ij} = 0), and maximizes it for embeddings
from different classes (d(x_i, x_j) with L_{ij} = 1).
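For a single pair of embeddings, Eqs. (2) and (3) can be computed as in the following sketch (our illustration; the margin α = 1.5 is the value reported in Sec. 4.4):

```python
import numpy as np

def contrastive_loss(f_i, f_j, label, alpha=1.5):
    """Contrastive loss of Eq. (3) for one pair of embedding vectors.
    label = 0 for a genuine pair, 1 for an impostor pair;
    alpha is the margin between genuine and impostor distances."""
    d = np.linalg.norm(np.asarray(f_i) - np.asarray(f_j))  # Eq. (2)
    return ((1 - label) * d ** 2 / 2
            + label * max(0.0, alpha - d) ** 2 / 2)
```

A genuine pair with identical embeddings costs nothing, while an impostor pair at distance 0 costs the full margin penalty alpha^2 / 2.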
4.3.3 Triplet loss
The Triplet loss function enables learning from positive and
negative comparisons at the same time (note that the label
L_{ij} eliminates one of the distances for each pair in the
Contrastive loss). A triplet is composed of three different
samples from two different classes: Anchor (A) and Positive
(P) are different keystroke sequences from the same subject,
and Negative (N) is a keystroke sequence from a different
subject. The Triplet loss function is defined as follows:

\mathcal{L}_{TL} = \max\{0, d^2(x^i_A, x^i_P) - d^2(x^i_A, x^j_N) + \alpha\}   (4)
where α is a margin between positive and negative pairs
and d is the Euclidean distance calculated with Eq. 2. In
comparison with Contrastive loss, Triplet loss is capable
of learning intra- and inter-class structures in a single
operation (removing the label L_{ij}). The Triplet loss is trained
using an extension of a Siamese architecture (see Fig. 3.c) for
three samples. This learning process minimizes the distance
between embedding vectors from the same class (d(x_A, x_P)),
and maximizes it for embeddings from different classes
(d(x_A, x_N)).
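Eq. (4) for a single triplet can be sketched as follows (again our illustration, with the same assumed margin α = 1.5):

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=1.5):
    """Triplet loss of Eq. (4) for one (Anchor, Positive, Negative)
    triplet of embedding vectors; alpha is the margin."""
    d_ap = np.sum((np.asarray(f_a) - np.asarray(f_p)) ** 2)  # squared anchor-positive distance
    d_an = np.sum((np.asarray(f_a) - np.asarray(f_n)) ** 2)  # squared anchor-negative distance
    return max(0.0, d_ap - d_an + alpha)
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, so gradient updates concentrate on triplets that still violate that ordering.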
Fig. 3. Learning architecture for the different loss functions a) Softmax loss, b) Contrastive loss, and c) Triplet loss. The goal is to find the most
discriminant embedding space f(x).
4.4 LSTM Training: Implementation Details
We train three RNN versions (i.e. one for each loss func-
tion) for each input device: desktop and mobile, using the
Dhakal and Palin databases, respectively. For the desktop
scenario, we train the models using only the first 68,000
subjects from the Dhakal dataset. For the Softmax function
we train a model with C = 10,000 subjects due to GPU
memory constraints, as the Softmax loss requires a very
wide final layer with many classes. In this case, we used
15 × 10,000 = 150,000 keystroke sequences for training
and the remaining 58,000 subjects were discarded. For the
Contrastive loss we generate genuine and impostor pairs
using all the 15 keystroke sequences available for each
subject. This provides us with 15 × 67,999 × 15 = 15.3
million impostor pair combinations and 15 × 14/2 = 105
genuine pair combinations for each subject. The pairs were
chosen randomly in each training batch ensuring that the
number of genuine and impostor pairs remains balanced
(512 pairs in total in each batch including impostor and
genuine pairs). Similarly, we randomly chose triplets for the
Triplet loss training.
The remaining 100,000 subjects were employed only for
model evaluation, so there is no data overlap between the
two groups of subjects. This reflects an open-set authen-
tication paradigm. The same protocol was employed for
the mobile scenario but adjusting the number of subjects
employed to train and test. In order to have balanced
subsets close to the desktop scenario, we divided the Palin
database in half such that 30,000 subjects were used
to train the models, generating 15 × 29,999 × 15 = 6.75
million impostor pair combinations and 15 × 14/2 = 105
genuine pair combinations for each subject. The other 30,000
subjects were used to test the mobile TypeNet models. Once
again 10,000 subjects were used to train the mobile TypeNet
model with Softmax loss.
Regarding the hyper-parameters employed during train-
ing, the best results for both models were achieved with
a learning rate of 0.05, the Adam optimizer with β1 = 0.9,
β2 = 0.999 and ε = 10^{-8}, and the margin set to α = 1.5.
The models were trained for 200 epochs with 150 batches
per epoch and 512 sequences in each batch. The models
were built in Keras-Tensorflow.
5 EXPERIMENTAL PROTOCOL
5.1 Authentication Protocol
We authenticate subjects by comparing gallery samples x_{i,g}
belonging to subject i in the test set to a query sample
x_{j,q} from either the same subject (genuine match, i = j)
or another subject (impostor match, i ≠ j). The test score
is computed by averaging the Euclidean distances between
each gallery embedding vector f(x_{i,g}) and the query embed-
ding vector f(x_{j,q}) as follows:
s^q_{i,j} = \frac{1}{G} \sum_{g=1}^{G} \| f(x_{i,g}) - f(x_{j,q}) \|   (5)
where G is the number of sequences in the gallery (i.e. the
number of enrollment samples) and q is the query sample of
subject j. Taking into account that each subject has a total of
15 sequences, we retain 5 sequences per subject as the test
set (i.e. each subject has 5 genuine test scores) and let G vary
between 1 ≤ G ≤ 10 in order to evaluate the performance
as a function of the number of enrollment sequences.
To generate impostor scores, for each enrolled subject we choose one test sample from each remaining subject. We define k as the number of enrolled subjects. In our experiments, we vary k in the range 100 ≤ k ≤ K, where K = 100,000 for the desktop TypeNet models and K = 30,000 for the mobile ones. Therefore, each subject has 5 genuine scores and k − 1 impostor scores. Note that we have more impostor scores than genuine ones, a common scenario in keystroke dynamics authentication. The results reported in the next section are computed in terms of Equal Error Rate (EER), which is the value where the False Acceptance Rate (FAR, proportion of impostors classified as genuine) and the False Rejection Rate (FRR, proportion of
genuine subjects classified as impostors) are equal. The error rates are calculated for each subject and then averaged over all k subjects [33].
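As a concrete illustration of this metric, the sketch below computes the EER from two lists of hypothetical distance scores by sweeping a decision threshold; a production evaluation would typically interpolate the FAR/FRR curves rather than take the nearest crossing.

```python
def compute_eer(genuine, impostor):
    """Equal Error Rate for distance scores (lower = more genuine-like).
    Sweeps every observed score as an acceptance threshold and returns
    the error rate where FAR and FRR are closest to equal."""
    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s <= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s > t for s in genuine) / len(genuine)     # genuines rejected
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer

# Perfectly separated score distributions give an EER of 0:
print(compute_eer([0.1, 0.2, 0.3], [0.7, 0.8, 0.9]))  # 0.0
```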
5.2 Identification Protocol
Identification scenarios are common in forensic applications, where the final decision is based on a body of evidence and the biometric recognition technology can be used to provide a list of candidates, referred to as the background set B in this work. The Rank-1 identification rate reveals the ability to unequivocally identify the target subject among all the subjects in the background set. Rank-n represents the accuracy if we consider a ranked list of n profiles, from which the result is then manually or automatically determined based on additional evidence [34].
The 15 sequences from the k test subjects in the database were divided into two groups: Gallery (10 sequences) and Query (5 sequences). We evaluate the identification rate by comparing the Query set of samples x^Q_{j,q}, with q = 1, ..., 5, belonging to the test subject j against the Background Gallery set x^G_{i,g}, with g = 1, ..., 10, belonging to all background subjects. The distance was computed by averaging the Euclidean distances || · || between each gallery embedding vector f(x^G_{i,g}) and each query embedding vector f(x^Q_{j,q}) as follows:

s^Q_{i,j} = (1/(10 × 5)) Σ_{g=1}^{10} Σ_{q=1}^{5} ||f(x^G_{i,g}) − f(x^Q_{j,q})||    (6)
We then identify a query set (i.e. the query subject j = J is the same as the gallery subject i = I) as follows:

I = arg min_i s^Q_{i,J}    (7)
The results reported in the next section are computed in terms of Rank-n accuracy. A Rank-1 hit means that d_{I,J} < d_{i,J} for any i ≠ I, while Rank-n means that instead of selecting a single gallery profile, we select the n profiles closest to the query by increasing distance d_{i,J}. In forensic scenarios, it is traditional to use Rank-20, Rank-50, or Rank-100 in order to generate a short list of potential candidates that are finally identified by considering other evidence.
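The ranked-list decision of Equations (6)-(7) can be sketched as follows. The distance function and the tiny 1-D "embeddings" are hypothetical placeholders; the double averaging over gallery and query samples mirrors Equation (6), and taking the first element of the ranking corresponds to Equation (7).

```python
def identification_score(gallery, queries, dist):
    """Eq. (6): average distance between every gallery embedding of a
    background subject and every query embedding of the test subject."""
    total = sum(dist(g, q) for g in gallery for q in queries)
    return total / (len(gallery) * len(queries))

def rank_list(background, queries, dist):
    """Background subject indices sorted by increasing score. Eq. (7)
    corresponds to taking the first element; Rank-n to the first n."""
    scores = [identification_score(g, queries, dist) for g in background]
    return sorted(range(len(scores)), key=scores.__getitem__)

# Hypothetical 1-D embeddings: subject 0's gallery is closest to the queries.
background = [[[0.0], [0.1]], [[5.0], [5.1]], [[9.0], [9.1]]]
queries = [[0.05], [0.02]]
dist = lambda u, v: abs(u[0] - v[0])
print(rank_list(background, queries, dist))  # [0, 1, 2]
```

A Rank-n hit simply checks whether the true subject's index appears among the first n entries of the returned list.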
6 EXPERIMENTS AND RESULTS
6.1 Authentication: Varying Amount of Enrollment Data
As discussed in the related works section, one key factor when analyzing the performance of a free-text keystroke authentication algorithm is the amount of keystroke data per subject employed for enrollment. In this work, we study this factor with two variables: the keystroke sequence length M and the number of gallery sequences used for enrollment G.
Our first experiment reveals to what extent M and G affect the authentication performance of our TypeNet models. Note that the input to our models has a fixed size of M after the masking process shown in Fig. 2. For this experiment, we set k = 1,000 (where k is the number of enrolled subjects). Tables 2 and 3 summarize the error rates achieved by the TypeNet models in the desktop and mobile scenarios, respectively, for different values of the sequence length M and the number of enrollment sequences per subject G.
In the desktop scenario (Table 2) we observe that for sequences longer than M = 70 there is no significant improvement in performance. Tripling the number of key events (from M = 50 to M = 150) lowers the EER by only 0.7% on average across all values of G. However, adding more sequences to the gallery yields greater improvements, with about 50% relative error reduction when going from 1 to 10 sequences, independent of M. Comparing the different loss functions, the best results are always achieved by the model trained with triplet loss, with an error rate of 1.2% for M = 70 and G = 10, followed by the contrastive loss (3.9%); the worst results are achieved with the softmax loss (6.0%). For one-shot authentication (G = 1), our approach achieves an error rate of 4.5% using sequences of 70 keystrokes.
Similar trends are observed in the mobile scenario (Table 3). First, increasing the sequence length beyond M = 70 keystrokes does not significantly improve performance, but there is a significant improvement when increasing the number of sequences per subject. The best results are achieved for M = 100 and G = 10, with an error rate of 6.3% by the model trained with triplet loss, followed again by the contrastive loss (10.0%) and softmax (12.3%). For one-shot authentication (G = 1), the performance of the triplet model degrades to 10.7% EER using sequences of M = 100 keystrokes.
Comparing the performance achieved by the three TypeNet models between the mobile and desktop scenarios, we observe that in all cases the results achieved in the desktop scenario are significantly better than those achieved in the mobile scenario. These results are consistent with prior work that has obtained lower performance on mobile devices when only timing features are utilized [2], [24], [35].
Next, we compare TypeNet with our implementation of two state-of-the-art algorithms for free-text keystroke authentication: a statistical sequence model, the POHMM (Partially Observable Hidden Markov Model) from [16], and an algorithm based on digraphs and an SVM from [14]. To allow fair comparisons, all approaches are trained and tested with the same data and experimental protocol: G = 5 enrollment sequences per subject, M = 50 keystrokes per sequence, and k = 1,000 test subjects.
In Fig. 4 we plot the error rates of the three approaches (i.e. Digraphs, POHMM, and TypeNet) trained and tested on both the desktop (left) and mobile (right) datasets. The TypeNet models outperform the previous state-of-the-art free-text algorithms in both scenarios under this experimental protocol, in which the amount of enrollment data is reduced (5 × M = 250 enrollment keystrokes, in comparison to more than 10,000 in related works; see Section 2). This can largely be attributed to the rich embedding feature vector produced by TypeNet, which minimizes the amount of data needed for enrollment. The SVM generally requires a large number of training sequences per subject (around 100), whereas in this experiment we have only 5 training sequences per subject. We hypothesize that this lack of training samples contributes to the poor performance (near chance accuracy) of the Digraphs system based on SVMs.
TABLE 2
Equal Error Rates (%) achieved in the desktop scenario using Softmax/Contrastive/Triplet loss for different values of the parameters M (sequence length) and G (number of enrollment sequences per subject).

#keys per       #enrollment sequences per subject G
sequence M      G = 1           G = 2          G = 5          G = 7          G = 10
30              17.2/10.7/8.6   14.1/9.0/6.4   13.3/7.3/4.6   12.7/6.8/4.1   11.5/3.3/3.7
50              16.8/8.2/5.4    13.1/6.7/3.6   10.8/5.4/2.2   9.2/4.8/1.8    8.8/4.3/1.6
70              14.1/7.7/4.5    10.4/6.2/2.8   7.5/4.8/1.7    6.7/4.3/1.4    6.0/3.9/1.2
100             13.8/7.7/4.2    10.1/6.0/2.7   7.4/4.7/1.6    6.4/4.3/1.4    5.7/3.9/1.2
150             13.8/7.7/4.1    10.1/6.0/2.7   7.4/4.7/1.6    6.5/4.3/1.4    5.8/3.8/1.2
TABLE 3
Equal Error Rates (%) achieved in the mobile scenario using Softmax/Contrastive/Triplet loss for different values of the parameters M (sequence length) and G (number of enrollment sequences per subject).

#keys per       #enrollment sequences per subject G
sequence M      G = 1            G = 2           G = 5           G = 7           G = 10
30              17.7/15.7/14.2   16.0/14.1/12.5  15.2/13.0/11.3  14.9/12.6/10.9  14.5/12.1/10.5
50              17.2/14.6/12.6   15.4/13.1/10.7  13.8/12.1/9.2   13.4/11.5/8.5   12.7/11.0/8.0
70              17.8/13.8/11.3   15.5/12.4/9.5   13.5/11.2/7.8   13.0/10.7/7.2   12.1/10.4/6.8
100             18.4/13.6/10.7   15.8/12.3/8.9   13.6/10.9/7.3   13.0/10.4/6.6   12.3/10.0/6.3
150             18.4/13.7/10.7   15.9/12.3/8.8   13.7/10.8/7.3   13.0/10.4/6.6   12.3/10.0/6.3
Fig. 4. ROC comparisons in free-text biometric authentication for desktop (left) and mobile (right) scenarios between the three proposed TypeNet models and two state-of-the-art approaches: POHMM from [16] and digraphs/SVM from [14]. M = 50 keystrokes per sequence, G = 5 enrollment sequences per subject, and k = 1,000 test subjects.
6.2 Authentication: Varying Number of Subjects
In this experiment, we evaluate to what extent our best TypeNet models (those trained with triplet loss) are able to generalize without performance decay. For this, we scale the number of enrolled subjects k from 100 to K (with K = 100,000 for desktop and K = 30,000 for mobile). For each subject we have 5 genuine test scores and k − 1 impostor scores, one against each other test subject. The models used for this experiment are the same as those trained in the previous section (68,000 independent subjects included in the training phase for desktop and 30,000 for mobile).
Fig. 5 shows the authentication results for one-shot enrollment (G = 1 enrollment sequence, M = 50 keystrokes per sequence) and for the case (G = 5, M = 50) for different values of k. For desktop devices, we can observe that in both cases there is a slight performance decay when we scale from 1,000 to 10,000 test subjects, which is more pronounced in the one-shot case. However, for a large number of subjects
Fig. 5. EER (%) of our proposed TypeNet models when scaling up the number of test subjects k in one-shot (G = 1 enrollment sequence per subject) and 5-shot (G = 5) authentication cases. M = 50 keystrokes per sequence.
(k ≥ 10,000), the error rates do not appear to demonstrate continued growth. For the mobile scenario, the results when scaling from 100 to 1,000 test subjects show a similar tendency to the desktop scenario, with a slightly greater performance decay. However, we can observe an error rate reduction when we continue scaling the number of test subjects up to 30,000. In all cases, the variation of the performance across the number of test subjects is less than 2.5% EER. These results demonstrate the potential of the RNN architecture in TypeNet to authenticate subjects at large scale in free-text keystroke dynamics. We note that in the mobile scenario we have utilized only timing features; prior work has found that greater performance may be achieved by incorporating additional sensor features [12].
6.3 Authentication: Cross-device Interoperability
In this experiment we measure the cross-device interoperability of the best TypeNet models, those trained with the triplet loss. We also study the capacity of both the desktop and mobile TypeNet models to generalize to other input devices. For this, we test each model with a different keystroke dataset than the one employed in its training. Additionally, for this experiment we train a third TypeNet model, called Mixture-TypeNet, with triplet loss using keystroke sequences from both datasets (half of each training batch from each dataset) but keeping the same train/test subject division as the other TypeNet models to allow fair comparisons. To be consistent with the other experiments, we keep the same experimental protocol: G = 5 enrollment sequences per subject, M = 50 keystrokes per sequence, and k = 1,000 test subjects.
Table 4 shows the error rates achieved by the three TypeNet models when testing on the desktop (Dhakal) and mobile (Palin) datasets. We can observe that error rates increase significantly in the cross-device scenario for both the desktop and mobile TypeNet models. This performance decay is alleviated by the Mixture-TypeNet model, which still performs much worse than the other two models trained and
TABLE 4
Equal Error Rates (%) achieved in the cross-device scenario for the three TypeNet models (Desktop, Mobile, and Mixture) when testing on either the desktop [26] or mobile [23] dataset.

                      TypeNet model
Test dataset     Desktop   Mobile   Mixture
Desktop          2.2       21.4     17.9
Mobile           13.7      9.2      12.6
tested in the same-sensor scenario. These results suggest that multiple device-specific models may be superior to a single model when dealing with input from different device types. This would require device-type detection in order to route the enrollment and test samples to the correct model [8].
6.4 Identification based on Keystroke Dynamics
Table 5 presents the identification accuracy for a background of B = 1,000 subjects, k = 10,000 test subjects, G = 10 gallery sequences per subject, and M = 50 keystrokes per sequence. The accuracy obtained in the identification scenario is much lower than the accuracy reported for authentication. In general, the results suggest that keystroke identification enables a 90% size reduction of the candidate list while maintaining almost 100% accuracy (i.e., 100% Rank-100 accuracy with 1,000 subjects). Nonetheless, the results show the superior performance of the triplet loss function and significantly better performance compared to traditional keystroke approaches [14], [16]. While traditional approaches are not suitable for large-scale free-text keystroke applications, the results obtained by TypeNet demonstrate its usefulness in many applications.
The number of background profiles can be further reduced if auxiliary data is available to pre-screen the initial list of gallery profiles (e.g., by country or language). The Aalto University datasets contain auxiliary data including age, country, gender, and keyboard type (desktop vs. laptop), among others. Table 6 shows the subject identification accuracy over the 1,000 subjects with pre-screening by country (i.e., content generated in a country different from the country of the target subject is removed from the background set). The results show that pre-screening based on a single attribute is enough to largely improve the identification rate: Rank-1 identification with pre-screening ranges from 5.5% to 84.0%, while Rank-100 ranges from 42.2% to 100%. These results demonstrate the potential of keystroke dynamics for large-scale identification when auxiliary information is available.
6.5 Input Text Dependency in TypeNet Models
For the last experiment, we examine the effect of the text typed (i.e. the keycodes employed as an input feature in the TypeNet models) on the distances between embedding vectors, and how this may affect model performance. The main drawback of using the keycode as an input feature
TABLE 5
Identification accuracy (Rank-n in %) for a background size B = 1,000. Scenario: D = Desktop, M = Mobile.

Method                 Scenario   Rank-1   Rank-50   Rank-100
Digraph [14]           D          0.1      9.5       15.2
Digraph [14]           M          0.0      8.5       14.4
POHMM [16]             D          6.1      48.4      63.4
POHMM [16]             M          6.5      41.8      53.7
TypeNet (softmax)      D          47.5     96.3      98.7
TypeNet (softmax)      M          23.5     82.6      91.4
TypeNet (contrastive)  D          29.4     97.2      99.3
TypeNet (contrastive)  M          19.0     80.4      89.8
TypeNet (triplet)      D          67.4     99.8      99.9
TypeNet (triplet)      M          25.5     87.5      94.2
TABLE 6
Identification accuracy (Rank-n in %) for a background size B = 1,000 and pre-screening based on the location of the typist. Scenario: D = Desktop. There is no metadata related to the mobile scenario.

Method                 Scenario   Rank-1   Rank-50   Rank-100
Digraph [14]           D          5.5      37.6      42.2
POHMM [16]             D          21.8     78.3      89.7
TypeNet (softmax)      D          68.3     99.39     99.9
TypeNet (contrastive)  D          56.3     99.7      99.9
TypeNet (triplet)      D          84.0     99.9      100
to free-text keystroke algorithms is that the model could potentially learn text-based features (e.g. orthography, linguistic expressions) rather than keystroke dynamics features (e.g. typing speed and style). To analyze this phenomenon, we first introduce the Levenshtein distance (commonly referred to as edit distance) proposed in [36]. The Levenshtein distance dL measures the distance between two words as the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into the other. As an example, the Levenshtein distance between "kitten" and "sitting" is dL = 3, because we need to substitute "s" for "k", substitute "i" for "e", and insert "g" at the end (three edits in total). With the Levenshtein distance metric we can measure the similarity of two keystroke sequences in terms of the keys pressed, and analyze whether the TypeNet models could be learning linguistic expressions to recognize subjects. This would be revealed by a high correlation between the Levenshtein distance dL and the Euclidean distance dE of the test scores.
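A compact dynamic-programming implementation of this distance (our own illustrative sketch, not the bit-parallel algorithm of [36]) reproduces the example above:

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]  # deleting i characters of a reaches ""
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```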
In Fig. 6 we plot the test scores (Euclidean distances) employed in the one-shot scenario (G = 1 enrollment sequence per subject, M = 50 keystrokes per sequence, k = 1,000 test subjects) versus the Levenshtein distance between the gallery and query samples that produced each test score (i.e. dE(f(x_g), f(x_q)) vs. dL(x_g, x_q)). To provide a quantitative comparison, we also calculate the Pearson coefficient p and the linear regression response as a measure of the correlation between the two distances (a smaller slope indicates a weaker relationship). In the mobile scenario (Fig. 6, bottom) we can observe a significant correlation (i.e. a higher slope in the linear regression response and a high p value) between the Levenshtein distances and the test scores: genuine scores show lower Levenshtein distances (i.e. more similar typed text) than impostor ones. This metric therefore provides some evidence that the TypeNet models in the mobile scenario could be using the similarity of linguistic expressions or keys pressed between the gallery and query samples to recognize subjects. These results suggest that the TypeNet models trained in the mobile scenario may perform worse than in the desktop scenario, among other factors, because the mobile TypeNet embeddings show a significant dependency on the entry text. On the other hand, in the desktop scenario (Fig. 6, top) this correlation between test scores and Levenshtein distances is absent (i.e. a small slope in the linear regression response and p ≈ 0), suggesting that the embedding vectors produced by TypeNet models trained with the desktop dataset are largely independent of the input text.
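The correlation analysis above reduces to computing the Pearson coefficient and the least-squares slope between the two lists of paired distances. A minimal sketch over hypothetical paired (dL, dE) values:

```python
import math

def pearson_and_slope(x, y):
    """Pearson correlation coefficient and least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    r = cov / math.sqrt(var_x * var_y)
    slope = cov / var_x
    return r, slope

# Hypothetical paired distances; a perfect linear relation gives r = 1:
r, slope = pearson_and_slope([1, 2, 3, 4], [2, 4, 6, 8])
print(r, slope)  # 1.0 2.0
```

Applied to real (dL, dE) pairs, an r near zero and a flat slope indicate text-independent embeddings, as observed in the desktop scenario.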
7 CONCLUSIONS AND FUTURE WORK
We have presented TypeNet, a new free-text keystroke biometrics system based on an RNN architecture trained with three different loss functions: softmax, contrastive, and triplet. Authentication and identification results were obtained with two datasets at very large scale: one composed of 136 million keystrokes from 168,000 subjects captured on desktop keyboards, and a second composed of more than 63 million keystrokes from 60,000 subjects captured on mobile devices. Deep neural networks have proven effective in face recognition tasks when scaling up to hundreds of thousands of identities [37]. The same capacity has been shown by the TypeNet models in free-text keystroke biometrics.
In all authentication scenarios evaluated in this work, the models trained with triplet loss have shown superior performance, especially when there are many subjects but few enrollment samples per subject. The results achieved in this work outperform previous state-of-the-art algorithms. Our results range from 17.2% to 1.2% EER in the desktop scenario and from 17.7% to 6.3% EER in the mobile scenario, depending on the amount of subject data enrolled. A good balance between performance and the amount of enrollment data per subject is achieved with 5 enrollment sequences and 50 keystrokes per sequence, which yields an EER of 2.2%/9.2% (desktop/mobile) for 1,000 test subjects. These results suggest that our approach achieves error rates close to those of state-of-the-art fixed-text algorithms [18], within 5% error even when the enrollment data is scarce.
Scaling up the number of test subjects does not significantly affect performance: the EER in the desktop scenario increases by only 5% in relative terms with respect to the previous 2.2% when scaling from 1,000 to 100,000 test subjects, while in the mobile scenario the EER decreases by up to 15% in relative terms. Evidence of the EER stabilizing around 10,000 subjects demonstrates the potential of this
Fig. 6. Levenshtein distances vs. test scores in the desktop (top) and mobile (bottom) scenarios for the three TypeNet models. For qualitative comparison we plot the linear regression fit (red line) and the Pearson correlation coefficient p.
architecture to perform well at large scale. However, the error rates of both models increase in the cross-device interoperability scenario: evaluating the TypeNet model trained in the desktop scenario with the mobile dataset, the EER increases from 2.2% to 13.7%, and it increases from 9.2% to 21.4% for the TypeNet model trained with the mobile dataset when testing with the desktop dataset. A mixture model trained with samples from both datasets outperforms the single-device TypeNet models in the cross-device scenario, but with significantly worse results compared to single-device development and testing.
In addition to the authentication results, identification experiments have also been conducted. Here again, the TypeNet models trained with triplet loss have shown superior performance at all ranks evaluated. For Rank-1, the TypeNet models trained with triplet loss achieve an accuracy of 67.4%/25.5% (desktop/mobile) with a background size of B = 1,000 identities, whereas previous related works barely achieve 6.5% accuracy. For Rank-50, the TypeNet model trained with triplet loss achieves almost 100% accuracy in the desktop scenario and up to 87.5% in the mobile one. The results improve further when auxiliary data is used to pre-screen the initial list of gallery profiles (e.g., by country or language), showing the potential of TypeNet models to perform well not only in authentication but also in identification tasks. Finally, we have demonstrated that text-entry dependencies in the TypeNet models are negligible in the desktop scenario, although in the mobile scenario there is some correlation between the input text typed and the performance achieved.
For future work, we will improve the way training pairs/triplets are chosen in Siamese/triplet training. Currently, the pairs are chosen randomly; however, recent work has shown that choosing hard pairs during the training phase can improve the quality of the embedding feature vectors [38]. We will also explore improved learning architectures based on a combination of short- and long-term modeling, which has proven very useful for modeling behavioral biometrics [39].
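A minimal sketch of what such hard-pair selection could look like, assuming a batch of embeddings and a distance function; all names here are hypothetical, and [38] describes more refined distance-weighted sampling than this hardest-negative heuristic:

```python
def hardest_negative_triplets(anchors, positives, negatives, dist):
    """For each (anchor, positive) pair, select the negative currently
    closest to the anchor, i.e. the hardest one, instead of sampling
    negatives uniformly at random."""
    triplets = []
    for a, p in zip(anchors, positives):
        hardest = min(negatives, key=lambda n: dist(a, n))
        triplets.append((a, p, hardest))
    return triplets

# Hypothetical 1-D embeddings: 0.3 is the hardest negative for anchor 0.0.
dist = lambda u, v: abs(u - v)
out = hardest_negative_triplets([0.0], [0.1], [5.0, 0.3], dist)
print(out)  # [(0.0, 0.1, 0.3)]
```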
In addition, we plan to test our model with other free-text keystroke databases to analyze its performance in other scenarios [40], and to investigate alternative ways to combine the multiple sources of information [34] originated in the proposed framework, e.g., the multiple distances in Equation (6). Integration of keystroke data with other information captured at the same time in desktop [4] and mobile [41] acquisition will also be explored.
Finally, the proposed TypeNet models will be valuable beyond user authentication and identification, for applications related to human behavior analysis such as profiling [42], bot detection [43], and e-health [44].
ACKNOWLEDGMENTS
This work has been supported by projects: PRIMA
(MSCA-ITN-2019-860315), TRESPASS-ETN (MSCA-ITN-
2019-860813), BIBECA (RTI2018-101248-B-I00 MINECO),
edBB (UAM), and Instituto de Ingenieria del Conocimiento
(IIC). A. Acien is supported by a FPI fellowship from the
Spanish MINECO.
REFERENCES
[1] S. Banerjee and D. Woodard, “Biometric authentication and iden-
tification using keystroke dynamics: A survey,” Journal of Pattern
Recognition Research, vol. 7, pp. 116–139, Jan. 2012. 1
JOURNAL OF L
A
T
E
X CLASS FILES, VOL. 14, NO. 8, FEBRUARY 2021 12
[2] D. Buschek, A. De Luca, and F. Alt, “Improving accuracy, applica-
bility and usability of keystroke biometrics on mobile touchscreen
devices,” in Proc. of the ACM Conference on Human Factors in
Computing Systems, 2015, pp. 1393–1402. 1,7
[3] A. Acien, A. Morales, R. Vera-Rodriguez, and J. Fierrez,
“Keystroke mobile authentication: Performance of long-term ap-
proaches and fusion with behavioral profiling,” in Proc. Iberian
Conf. on Pattern Recognition and Image Analysis (IBPRIA), ser. LNCS,
vol. 11868. Springer, July 2019, pp. 12–24. 1
[4] J. Hernandez-Ortega, R. Daza, A. Morales, J. Fierrez, and J. Ortega-
Garcia, “edBB: Biometrics and Behavior for assessing remote ed-
ucation,” in AAAI Workshop on Artificial Intelligence for Education
(AI4EDU), February 2020. 1,11
[5] T. Yasseri, R. Sumi, A. Rung, A. Kornai, and J. Kertesz, “Dynamics
of conflicts in Wikipedia,” PLOS ONE, vol. 7, no. 6, pp. 1–12, 06
2012. 1
[6] J. Fierrez-Aguilar, D. Garcia-Romero, J. Ortega-Garcia, and
J. Gonzalez-Rodriguez, “Adapted user-dependent multimodal
biometric authentication exploiting general information,” Pattern
Recognition Letters, vol. 26, no. 16, pp. 2628–2639, December 2005.
2
[7] A. Acien, J. V. Monaco, A. Morales, R. Vera-Rodriguez, and
J. Fierrez, “TypeNet: Scaling up keystroke biometrics,” in Proc.
IEEE/IAPR International Joint Conference on Biometrics (IJCB),
September 2020. 2
[8] F. Alonso-Fernandez, J. Fierrez, D. Ramos, and J. Gonzalez-
Rodriguez, “Quality-based conditional processing in multi-
biometrics: application to sensor interoperability,” IEEE Trans. on
Systems, Man and Cybernetics Part A, vol. 40, no. 6, pp. 1168–1179,
2010. 2,9
[9] R. Tolosana, R. Vera-Rodriguez, J. Fierrez, A. Morales, and
J. Ortega-Garcia, “Benchmarking desktop and mobile handwriting
across cots devices: the e-biosign biometric database,” PLOS ONE,
vol. 5, no. 12, 2017. 2
[10] F. Monrose and A. Rubin, “Authentication via keystroke dynam-
ics,” in Proc. of the 4th ACM Conference on Computer and Communi-
cations Security, 1997, pp. 48–56. 2,3
[11] D. Gunetti and C. Picardi, “Keystroke analysis of free text,” ACM
Transactions on Information and System Security, vol. 8, no. 3, pp.
312—-347, Aug. 2005. 2,3
[12] J. Kim and P. Kang, “Freely typed keystroke dynamics-based
user authentication for mobile devices based on heterogeneous
features,” Pattern Recognition, vol. 108, p. 107556, 2020. 3,9
[13] H. Gascon, S. Uellenbeck, C. Wolf, and K. Rieck, “Continuous
authentication on mobile devices by analysis of typing motion be-
havior,” Sicherheit 2014–Sicherheit, Schutz und Zuverl¨assigkeit, 2014.
3
[14] H. C¸ eker and S. Upadhyaya, “User authentication with keystroke
dynamics in long-text data,” in Proc. of IEEE 8th International
Conference on Biometrics Theory, Applications and Systems (BTAS),
2016. 3,7,8,9,10
[15] C. Murphy, J. Huang, D. Hou, and S. Schuckers, “Shared dataset
on natural human-computer interaction to support continuous
authentication research,” in Proc. of IEEE/IAPR International Joint
Conference on Biometrics (IJCB), 2017, pp. 525–530. 2,3
[16] J. V. Monaco and C. C. Tappert, “The partially observable Hidden
Markov Model and its application to keystroke dynamics,” Pattern
Recognition, vol. 76, pp. 449–462, 2018. 3,7,8,9,10
[17] D. Deb, A. Ross, A. K. Jain, K. Prakah-Asante, and K. V. Prasad,
“Actions speak louder than (pass)words: Passive authentication
of smartphone users via deep temporal features,” in Proc. of IAPR
International Conference on Biometrics (ICB), 2019. 3,4
[18] A. Morales, J. Fierrez, R. Tolosana, J. Ortega-Garcia, J. Galbally,
M. Gomez-Barrero, A. Anjos, and S. Marcel, “Keystroke Biometrics
Ongoing Competition,” IEEE Access, vol. 4, pp. 7736–7746, Nov.
2016. 2,10
[19] J. V. Monaco, “Robust keystroke biometric anomaly detection,”
arXiv preprint arXiv:1606.09075, Jun. 2016. 2
[20] F. Bergadano, D. Gunetti, and C. Picardi, “User authentication
through keystroke dynamics,” ACM Transactions on Information and
System Security, vol. 5, no. 4, pp. 367–397, Nov. 2002. 2
[21] M. L. Ali, K. Thakur, C. C. Tappert, and M. Qiu, “Keystroke
biometric user verification using Hidden Markov Model,” in Proc.
of IEEE 3rd International Conference on Cyber Security and Cloud
Computing (CSCloud), 2016, pp. 204–209. 2
[22] T. Sim and R. Janakiraman, “Are digraphs good for free-text
keystroke dynamics?” in Proc. of IEEE Conference on Computer
Vision and Pattern Recognition, 2007. 2
[23] K. Palin, A. Feit, S. Kim, P. O. Kristensson, and A. Oulasvirta,
“How do people type on mobile devices? observations from a
study with 37,000 volunteers.” in Proc. of 21st ACM International
Conference on Human-Computer Interaction with Mobile Devices and
Services (MobileHCI’19), 2019. 3,4,9
[24] P. S. Teh, N. Zhang, A. B. J. Teoh, and K. Chen, “A survey on touch
dynamics authentication in mobile devices,” Computers & Security,
vol. 59, pp. 210–235, 2016. 3,7
[25] H. Crawford and E. Ahmadzadeh, “Authentication on the go:
Assessing the effect of movement on mobile device keystroke
dynamics,” in Thirteenth Symposium on Usable Privacy and Security
(SOUPS 2017), 2017, pp. 163–173. 3
[26] V. Dhakal, A. M. Feit, P. O. Kristensson, and A. Oulasvirta,
“Observations on typing from 136 million keystrokes,” in Proc.
of the ACM CHI Conference on Human Factors in Computing Systems,
2018. 3,9
[27] A. Alsultan and K. Warwick, “Keystroke dynamics authentication:
A survey of free-text,” International Journal of Computer Science
Issues (IJCSI), vol. 10, pp. 1–10, 01 2013. 4
[28] R. Tolosana, R. Vera-Rodriguez, J. Fierrez, and J. Ortega-Garcia,
“BioTouchPass2: Touchscreen password biometrics using Time-
Aligned Recurrent Neural Networks,” IEEE Transactions on Infor-
mation Forensics and Security, 2020. 4
[29] Tolosana, Ruben and Vera-Rodriguez, Ruben and Fierrez, Julian
and Ortega-Garcia, Javier, “Deepsign: Deep on-line signature ver-
ification,” IEEE Transactions on Biometrics, Behavior, and Identity
Science, 2021. 4
[30] X. Lu, Z. Shengfei, and Y. Shengwei, “Continuous authentication
by free-text keystroke based on CNN plus RNN,” Procedia Com-
puter Science, vol. 147, pp. 314–318, 01 2019. 4
[31] R. Hadsell, S. Chopra, and Y. Lecun, “Dimensionality reduction
by learning an invariant mapping,” in Proc. Computer Vision and
Pattern Recognition Conference, 2006. 5
[32] K. Q. Weinberger and L. K. Saul, “Distance metric learning for
large margin nearest neighbor classification,” Journal of Machine
Learning Research, vol. 10, pp. 207–244, 2009. 5
[33] A. Morales, J. Fierrez, and J. Ortega-Garcia, “Towards predicting
good users for biometric recognition based on keystroke dynam-
ics,” in Proc. of European Conference on Computer Vision Workshops,
ser. LNCS, vol. 8926. Springer, September 2014, pp. 711–724. 7
[34] J. Fierrez, A. Morales, R. Vera-Rodriguez, and D. Camacho, “Mul-
tiple classifiers in biometrics. Part 2: Trends and challenges,”
Information Fusion, vol. 44, pp. 103–112, November 2018. 7,11
[35] N. Banovic, V. Rao, A. Saravanan, A. K. Dey, and J. Mankoff,
“Quantifying aversion to costly typing errors in expert mobile text
entry,” in Proc. of the CHI Conference on Human Factors in Computing
Systems, 2017, pp. 4229––4241. 7
[36] H. Hyyro, “Bit-parallel approximate string matching algorithms
with transposition,” Journal of Discrete Algorithms, vol. 3, no. 2, pp.
215–229, 2005. 10
[37] I. Kemelmacher-Shlizerman, S. M. Seitz, D. Miller, and E. Brossard,
“The megaface benchmark: 1 million faces for recognition at
scale,” in Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, 2016, pp. 4873–4882. 10
[38] C.-Y. Wu, R. Manmatha, A. J. Smola, and P. Krahenbuhl, “Sam-
pling matters in deep embedding learning,” in Proc. of the IEEE
International Conference on Computer Vision, 2017, pp. 2840–2848. 11
[39] R. Tolosana, P. Delgado-Santos, A. Perez-Uribe, R. Vera-Rodriguez,
J. Fierrez, and A. Morales, “DeepWriteSYN: On-line handwriting
synthesis via deep short-term representations,” in AAAI Conf. on
Artificial Intelligence (AAAI), February 2021. 11
[40] A. Acien, A. Morales, R. Vera-Rodriguez, J. Fierrez, and O. Delgado, “Smartphone sensors for modeling human-computer interaction: General outlook and research datasets for user authentication,” in IEEE Conf. on Computers, Software, and Applications (COMPSAC), July 2020.
[41] A. Acien, A. Morales, R. Vera-Rodriguez, and J. Fierrez, “MultiLock: Mobile active authentication based on multiple biometric and behavioral patterns,” in Proc. ACM Intl. Conf. on Multimedia, Workshop on Multimodal Understanding and Learning for Embodied Applications (MULEA), October 2019, pp. 53–59.
[42] A. Acien, A. Morales, J. Fierrez, R. Vera-Rodriguez, and J. Hernandez-Ortega, “Active detection of age groups based on touch interaction,” IET Biometrics, vol. 8, no. 1, pp. 101–108, January 2019.
[43] A. Acien, A. Morales, J. Fierrez, R. Vera-Rodriguez, and O. Delgado-Mohatar, “BeCAPTCHA: Behavioral bot detection using touchscreen and mobile sensors benchmarked on HuMIdb,” Engineering Applications of Artificial Intelligence, vol. 98, p. 104058, February 2021.
[44] L. Giancardo, A. Sánchez-Ferro, T. Arroyo-Gallego, I. Butterworth, C. S. Mendoza, P. Montero, M. Matarazzo, J. A. Obeso, M. L. Gray, and R. S. J. Estépar, “Computer keyboard interaction as an indicator of early Parkinson’s disease,” Scientific Reports, vol. 6, October 2018.
Alejandro Acien received the MSc in Electrical Engineering in 2015 from Universidad Autonoma de Madrid. In October 2016, he joined the Biometric Recognition Group - ATVS at the Universidad Autonoma de Madrid, where he is currently collaborating as an assistant researcher pursuing the PhD degree. His current research activities focus on behavioral biometrics, human-machine interaction, cognitive biometric authentication, machine learning, and deep learning.
Aythami Morales received his M.Sc. degree in Telecommunication Engineering in 2006 from ULPGC, and his Ph.D. degree from ULPGC in 2011. He conducts his research in the BiDA Lab at Universidad Autónoma de Madrid, where he is currently an Associate Professor. He has performed research stays at the Biometric Research Laboratory at Michigan State University, the Biometric Research Center at Hong Kong Polytechnic University, the Biometric System Laboratory at University of Bologna, and the Schepens Eye Research Institute. His research interests include pattern recognition, machine learning, trustworthy AI, and biometrics. He is the author of more than 100 scientific articles published in international journals and conferences, and 4 patents. He has received awards from ULPGC, La Caja de Canarias, SPEGC, and COIT. He has participated in several National and European projects in collaboration with other universities and private entities such as ULPGC, UPM, EUPMt, Accenture, Unión Fenosa, Soluziona, and BBVA.
John V. Monaco is an Assistant Professor in the Computer Science Department at the Naval Postgraduate School in Monterey, CA. His research focuses on user and device fingerprinting, security and privacy in human-computer interaction, and the development of neural-inspired computer architectures. Dr. Monaco is the recipient of Best Paper Awards at the 2020 Conference on Human Factors in Computing Systems and the 2017 International Symposium on Circuits and Systems. His work is supported by the National Reconnaissance Office and the Army Network Enterprise Technology Command.
Ruben Vera-Rodriguez received the M.Sc. degree in telecommunications engineering from Universidad de Sevilla, Spain, in 2006, and the Ph.D. degree in electrical and electronic engineering from Swansea University, U.K., in 2010. Since 2010, he has been affiliated with the Biometric Recognition Group, Universidad Autonoma de Madrid, Spain, where he has been an Associate Professor since 2018. His research interests include signal and image processing, pattern recognition, HCI, and biometrics, with emphasis on signature, face, and gait verification and forensic applications of biometrics. He has published over 100 scientific articles in international journals and conferences. He is actively involved in several National and European projects focused on biometrics. He was Program Chair for the IEEE 51st International Carnahan Conference on Security and Technology (ICCST) in 2017, the 23rd Iberoamerican Congress on Pattern Recognition (CIARP) in 2018, and the International Conference on Biometric Engineering and Applications (ICBEA) in 2019.
Julian Fierrez received the MSc and PhD degrees from Universidad Politecnica de Madrid, Spain, in 2001 and 2006, respectively. Since 2004 he has been with Universidad Autonoma de Madrid, where he has been an Associate Professor since 2010. His research is on signal and image processing, AI fundamentals and applications, HCI, forensics, and biometrics for security and human behavior analysis. He is Associate Editor for Information Fusion, IEEE Trans. on Information Forensics and Security, and IEEE Trans. on Image Processing. He has received best paper awards at AVBPA, ICB, IJCB, ICPR, ICPRS, and Pattern Recognition Letters, and several research distinctions, including the EBF European Biometric Industry Award 2006, the EURASIP Best PhD Award 2012, the Miguel Catalan Award to the Best Researcher under 40 in the Community of Madrid in the general area of Science and Technology, and the IAPR Young Biometrics Investigator Award 2017. Since 2020 he has been a member of the ELLIS Society.