International Journal of Information Security
https://doi.org/10.1007/s10207-020-00510-x
SPECIAL ISSUE PAPER
Combining behavioral biometrics andsession context analytics
toenhance risk‑based static authentication inweb applications
JesusSolano1,2· LuisCamacho1,2· AlejandroCorrea2· ClaudioDeiro2· JavierVargas2· MartínOchoa1,2
© Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract
The fragility of password-based authentication has been recognized and studied for several decades. It is an increasingly common industry practice to profile users based on their session context, such as IP ranges and browser type, in order to build a risk profile for an incoming authentication attempt. On the other hand, behavioral dynamics such as mouse and keyboard features have been proposed in the scientific literature in order to improve authentication, but have been shown to be most effective in continuous authentication scenarios. In this paper we propose to combine both fingerprinting and behavioral dynamics (for mouse and keyboard) in order to increase the security of login mechanisms. We do this by using machine learning techniques that aim at high accuracy and only occasionally raise alarms for manual inspection. We evaluate our approach on a dataset containing mouse, keyboard and session context information of 24 users and simulated attacks. We show that while context analysis and behavioral analysis on their own achieve around 0.7 accuracy on this dataset, a combined approach reaches up to 0.9 accuracy using a linear combination of the outcomes of the single models.
Keywords Behavioral dynamics· Static authentication· Machine learning
1 Introduction
Several studies have pointed out the challenges that password-based authentication poses for robust security [1, 2]. With the increasing popularity of web services and cloud-based applications, we have also seen an increase in attacks on those platforms in the past decade. Several of those publicly known attacks have involved the theft of authentication credentials to services (see for instance [3]). In addition, passwords are often the target of malware (for instance banking-related malware such as Zeus [4] and its variants). So even if one were to assume that users are forced to select strong passwords (from the point of view of difficulty to guess), password-based authentication does not provide strong security guarantees.
In order to mitigate the risk posed by attackers impersonating legitimate users by means of compromised or guessed credentials, many applications use mechanisms to detect anomalies by analyzing connection features such as the incoming IP, browser and OS type as read from HTTP headers, among others. Some of these context-based features have also been discussed in the scientific literature [5]. However, those defensive mechanisms have limitations: if the anomaly detection is too strict, false positives could harm the user experience and thus hurt the web services from a business perspective. On the other hand, if a careful attacker manages to bypass such context-related filters, for instance by manipulating HTTP parameters, using VPN services, or ultimately using a victim's machine [6], then such countermeasures fall short of providing better security.
*Martín Ochoa
martin.ochoa@appgate.com
Jesus Solano
jesus.solano@appgate.com
Luis Camacho
luis.camacho@appgate.com
Alejandro Correa
alejandro.correa@cyxtera.com
Claudio Deiro
claudio.deiro@cyxtera.com
Javier Vargas
javier.vargas@cyxtera.com
1 AppGate Inc., Cra 13A # 98-75, Bogotá, Colombia
2 Cyxtera Technologies, BAC Colonnade Office Towers, 2333 Ponce De Leon Blvd, Suite 900, Coral Gables, FL 33134, USA
J.Solano et al.
1 3
Behavioral biometrics [7] have been proposed in the literature as a strategy to enhance the security of both web and desktop applications. They have been shown to work with reasonable accuracy in the context of continuous authentication [8, 9], when both the training and the monitoring time of mouse and/or keyboard activity is long enough. In the context of static authentication, where interaction with users during log-in time is limited, such methods are less accurate and may be impractical [10], unless long static authentication interactions are assumed and many sessions are available for training. However, in today's internet of services, many websites rely on third parties for security-related functionality that is integrated in the form of external JavaScript snippets. In domains handling highly sensitive data, such as banking, those services are often only allowed to interact with a user's session during or before log-in, but not post-login. Users log in only sporadically, and thus not enough training data is available to use some of the models that have been proposed in the literature. Therefore, improving static risk-based authentication is a practical challenge.
Our proposed solution to address the above-mentioned shortcomings of individual context-based risk assessment techniques is to synergistically apply machine-learning-based methods to detect anomalies in both the context (browser type, country of origin of the IP, etc.) and the behavioral features of a given user at login time. By considering a model that takes into account several features of the browser, operating system, internet connection, connection times, keystrokes and mouse dynamics, one gains more confidence in the legitimacy of a given log-in attempt. Our model analyzes several previous log-in attempts in order to evaluate the risk of a new log-in attempt and is based on realistic data from customers of several major banks.
On the other hand, we build a lean machine-learning model that relies on data from the last 10 login attempts to give a score on the biometric behavior, and thus can be used without large amounts of training data per user. These ideas were preliminarily explored in [11], which we extend in the following ways.
– We improved the methodology used to generate log-in attempts from the TWOS [12] dataset and re-designed the evaluation of the proposed behavioral model in order to obtain more realistic data and to better assess how the approach generalizes.
– We discuss in depth the accuracy of the individual models in isolation, and their accuracy in a scenario that has both context and behavioral attacks.
– We explore three different meta-models built on top of the individual models in order to find the best performing combination of parameters. The best combination achieves an accuracy of up to 0.9 on our dataset.
The rest of the paper is organized as follows: in Sect. 2 we recap some notions of context analytics and behavioral dynamics. In Sect. 3 we present our approach, and describe the data collected and the experimental design. In Sect. 4 we describe the experiments carried out in order to assess the effectiveness of the proposed approach. We discuss related work in Sect. 5 and conclude in Sect. 6.
2 Background andattacker model
User authentication has traditionally been based on passwords or passphrases, which are meant to be secret. However, secrets can be stolen or guessed and, without further authentication mechanisms, attackers could impersonate a victim and steal sensitive information. To avoid this, the implementation of risk-based authentication has allowed traditional authentication systems to increase confidence in a given user's identity by analyzing not only a pre-shared secret, but also other features, such as device characteristics or user interaction, which are expected to be unique [13, 14]. In the following we review some fundamental concepts related to device fingerprinting for authentication and behavioral biometrics.
2.1 Device fingerprinting
Device fingerprinting is an identification technique used both for user tracking and for authentication purposes. The main goal of this technique is to gather characteristics that uniquely identify a device. There are different ways to create this profile; the most reliable of them involves creating an identifier based on hardware signatures. However, acquiring these signatures requires high-level privileges on the device, which is often hard to achieve.
Thanks to the popularization of the internet and increased browser capabilities, it is also possible to use statistical identification techniques based on information gathered from the web browser [5, 15], such as browser history, installed plugins, supported MIME types and user agents, as well as network information like headers, timestamps, origin IP and geolocation. Geolocation can either be collected using HTML5 or approximated from an IP address by using appropriate services. Gathering only browser information means these techniques identify web browsers and not necessarily devices or users. Moreover, parameters such as HTTP request parameters are easy to spoof. More recent techniques try to combine hardware and statistical analysis by gathering the information through web browser capabilities: they use HTML5 [16] and JavaScript APIs to measure the execution time of common JavaScript functions and the result of rendering images as hardware signatures. These measurements are
Combining behavioral biometrics andsession context analytics toenhance risk-based static…
1 3
compared to a baseline of execution time and rendering obtained on known hardware used as a control [17, 18].
2.2 User behavior identification
Another popular risk-based authentication technique is behavioral analysis, based on mouse and keyboard dynamics statistics. The underlying idea of measuring user behavior is to turn human-computer interactions into numerical, categorical and temporal information. The standard interactions gathered for a behavioral model are keystrokes, mouse movements and mouse clicks. For instance, common features extracted from the keyboard are key-pressed and key-released events, together with their timestamps. For the mouse, cursor position, click coordinates and timestamps are commonly used [12]. Such features are processed and aggregated to profile user behavior. In this work, we use aggregations such as the ones discussed in [19]. As shown in Fig. 1, we used the space segmentation suggested in [19] to calculate mouse movement features.
These behavioral features capture characteristics that are distinctive of each user, such as how fast the user types, how many special keys the user uses, the proportion of mouse versus keyboard use, and how long the user pauses before finishing an activity. The intuition behind this is that it should be easy to distinguish a user who mainly uses the mouse from a user who mainly uses the keyboard; intuitively, physical conditions such as the hardware and the user's dexterity with the peripheral devices also make these interactions more distinctive.
Behavioral models use machine learning to identify users based on these feature vectors. Notice that by recording one user's interaction in the same situation many times, it is expected that this user will interact with the computer similarly each time, and also that this interaction will differ from the interactions gathered from other users.
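To make the aggregation concrete, the sketch below shows one way raw mouse and keystroke events could be turned into a single per-session feature vector; the event layout, feature names and statistics are illustrative assumptions rather than the exact feature set of [19].

```python
import math
from statistics import mean

def mouse_features(events):
    """Aggregate (timestamp, x, y) mouse samples into simple statistics."""
    speeds, angles = [], []
    for (t0, x0, y0), (t1, x1, y1) in zip(events, events[1:]):
        dt = max(t1 - t0, 1e-3)
        dx, dy = x1 - x0, y1 - y0
        speeds.append(math.hypot(dx, dy) / dt)  # pixels per second
        angles.append(math.atan2(dy, dx))       # movement direction
    return {"mouse_mean_speed": mean(speeds) if speeds else 0.0,
            "mouse_mean_angle": mean(angles) if angles else 0.0,
            "mouse_events": float(len(events))}

def keyboard_features(events):
    """Aggregate (timestamp, key_zone, action) keystrokes into simple statistics."""
    press_times = [t for t, _, action in events if action == "press"]
    gaps = [b - a for a, b in zip(press_times, press_times[1:])]
    return {"key_mean_gap": mean(gaps) if gaps else 0.0,  # typing rhythm
            "key_events": float(len(events))}

def session_vector(mouse_events, key_events):
    """One per-session feature vector combining both modalities."""
    feats = {**mouse_features(mouse_events), **keyboard_features(key_events)}
    total = feats["mouse_events"] + feats["key_events"]
    feats["mouse_ratio"] = feats["mouse_events"] / total if total else 0.0
    return feats

mouse = [(0.0, 10, 10), (0.1, 14, 12), (0.25, 30, 25), (0.4, 31, 40)]
keys = [(0.5, "zone1", "press"), (0.62, "zone1", "release"), (0.9, "zone2", "press")]
print(session_vector(mouse, keys))
```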
2.3 TWOS dataset
For the behavioral dynamics analysis that we will illustrate in the following sections, it is important to have mouse and keyboard dynamics data in order to evaluate our models. For this purpose, we have chosen to use data from a public data set known as The Wolf Of SUTD (TWOS) [12]. The data set contains realistic instances of insider threats based on a gamified competition. We have chosen this dataset since it contains both mouse and keyboard traces, among others. In [12], the authors attempted to simulate user interactions in competing companies, inducing two types of behaviors (normal and malicious). The data set contains both mouse and keyboard data of 24 different users. We chose the TWOS dataset because of the large amount of behavioral patterns it records: in total, the TWOS data set has more than 320 hours of mouse and keyboard dynamics. Data was continuously collected from volunteers during routine internet browsing activities in the context of a gamified experiment. The mouse agent collected the position of the cursor on the screen, the action's timestamp, the screen resolution, the mouse action, and the user ID. The mouse actions involved in our analysis are mouse movement, button press/release and scroll. The keyboard agent logged all characters pressed by the users. The data set includes the timestamp of the event, the movement type (press/release), the key and the user ID. Both alphanumeric and special keys were recorded by the agent. Since the users typed potentially sensitive information, the data is provided in an anonymized fashion: the keyboard was divided into zones to accomplish the anonymization. Figure 2 shows the mapping of the keyboard into three zones used to address these privacy concerns.
2.4 Attacker model
We assume an attacker who has gained access to a victim's credentials to authenticate to a web service (login and password). An attacker may also gain knowledge about, or try to guess, the context in which a victim uses a service: the time of the day at which the user usually connects, the operating system used, the browser used and the IP range from which the victim connects. We assume that an attacker could employ one of the following strategies, or more than one in combination, to attempt to impersonate a victim:
Fig. 1 Mouse directions segmentation
Fig. 2 Keyboard mapping layout to anonymize sensitive information
J.Solano et al.
1 3
Simple attack: The attacker connects to the web service from a machine different from the victim's machine.
Context simulation attack: The attacker connects to the web service from a machine different from the victim's machine, but tries to replicate or guess the victim's access patterns: OS, browser type, IP range and time of the day similar to the victim's access patterns.
Physical access to victim's machine: The attacker connects from the victim's machine, thereby replicating the victim's context very faithfully, and attempts to impersonate the victim.
Note that we explicitly exclude from the attacker’s capabili-
ties that of recording and attempting to replicate a victim’s
behavioral dynamics (keyboard and mouse usage features).
We believe that although this is an interesting attacker
model, it is an extremely powerful one, and we leave its
treatment to future work.
3 Approach
The goal of our approach is to overcome the shortcomings of the single risk assessment strategies (context-based analysis of HTTP connections and behavioral dynamics) by obtaining a single model that takes both strategies into account.
In Table 1 we summarize the effectiveness of the various strategies in detecting the attacks discussed in the previous section, and also highlight the desired outcome of our approach. In essence, we expect a combined model to perform better in case of attacks, given that the combined model can recognize both changes in context and changes in behavior. Note that in this table we assume there is always impersonation (and thus always changes in behavior).
Moreover, we highlight the potential misclassifications of the various approaches in various scenarios in Table 2. Here, we summarize the expectation of the combination of both approaches in terms of reducing false positives. When a user uses a new device, one would expect their behavior to be similar in terms of keystrokes and mouse dynamics (although not identical). When the user travels, it should remain very similar, thus correcting possible false positives from the context analysis.
In the following we summarize the models we used for the single risk-based strategies, and describe how these models are used in combination to produce a combined risk-based assessment strategy. It is important to note that for the context analytics data we assume that some users have a heterogeneous access pattern (i.e. from multiple devices and locations, due to travel), as depicted in Fig. 3 for a user for whom we have 338 access records. On the other hand, the time of activity considered for behavioral interaction reflects the average duration of a password-based log-in (typically between 25 and 30 seconds). Because of these challenges, the single models are not perfect under a global context attack, but they can be used in synergy to produce a better model, as we will show in the evaluation section.
3.1 History‑aware context analytics
In this subsection we describe the high-level construction of a session context model, based solely on session data obtained from HTTP requests. We assume users with complex access behaviors such as the ones depicted in Fig. 3, so we need to build a system that is good at detecting anomalies and potential attacks, but is also somewhat flexible with respect to certain changes in context that could be benign.
We assume a system that records usage statistics: the number of times that a user logs in, the day of the week and the time at which the user logs in, what type of device and browser they are using, and the country and region from which the user is accessing. Currently, platform and browser data is obtained by parsing the user agent, and geographic data is obtained by parsing the IP address, information that can be obtained from network sessions corresponding to successful log-ins for a given user.
One of the challenges of building such a model is the fact that several of the categories considered are non-numerical (for instance a given browser version or operating system). This forces us to use a feature vector with connection statistics for each browser model version, each day of the week, each country and region, etc.
Table 1 Strategies versus attack vectors

Approach              Simple attack        Context simulation                      Physical attack
Context analytics     Effective            Partially effective                     Ineffective
Behavioral dynamics   Partially effective  Partially effective                     Partially effective
Combination           Effective            More effective than single approaches   Partially effective
Table 2 Strategies versus benign context changes

Approach              New machine      User travels
Context analytics     Likely FP        Likely FP
Behavioral dynamics   Likely accurate  Accurate
Combination           Likely accurate  Accurate
Combining behavioral biometrics andsession context analytics toenhance risk-based static…
1 3
On the other hand, we must somehow assess the likelihood of a given connection context in order to decide whether a new connection is anomalous or not. One way to do this is to simply compute the ratio of observations in a given field of a category divided by the sum of all observations in that category. For instance, let $c$ be the number of connections coming from a country $K$, and let $C$ be the total number of observations coming from all countries for a given user. Then the likelihood of an incoming connection from $K$ can be computed as $\frac{c}{C}$. In order to assign a probability of 1 to the most likely event within a category, and a relative weight to other events in decreasing order from most likely to least likely, we normalize all values within a category as follows: order the fields from most likely to least likely, and define the new probability of a given field within a category as the sum of the probabilities of the fields whose probability is less than or equal to that of the given field. For example, consider three countries with the following probabilities based on access frequency: US $= \frac{1}{2}$, UK $= \frac{1}{3}$, FR $= \frac{1}{6}$. The normalized probabilities would be: US $= 1$, UK $= \frac{1}{2}$ and FR $= \frac{1}{6}$.
Moreover, temporal categories (hours, days, etc.) are considered cyclical, because, for instance, events around midnight (shortly before or after 24:00) should be considered relatively close to each other. Also, in order to smooth the notion of 'closeness' in discretized events, such as frequencies of access in different hours of the day, we use a convolution as depicted in Fig. 4. In this example, we have a distribution of discrete frequencies around the clock for a given user. In this scenario, 7 PM is the hour of the day with the most accesses. However, this is close to, say, 8 PM, so it would be appropriate to consider an access at 8 PM relatively normal for this context.
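A minimal sketch of such a cyclic smoothing applied to a 24-bin histogram of login hours; the three-tap kernel is an assumption for illustration, not the convolution actually used in the system.

```python
import numpy as np

def smooth_cyclic(hour_counts, kernel=(0.25, 0.5, 0.25)):
    """Circularly convolve a 24-bin histogram of access hours with a small kernel,
    so accesses shortly before or after a frequent hour also look plausible."""
    hour_counts = np.asarray(hour_counts, dtype=float)
    half = len(kernel) // 2
    smoothed = np.zeros_like(hour_counts)
    for shift, weight in zip(range(-half, half + 1), kernel):
        smoothed += weight * np.roll(hour_counts, shift)
    return smoothed

# 24 hourly login counts with a peak at 19:00 (7 PM); 18:00 and 20:00 get credit too.
counts = np.zeros(24)
counts[19] = 10
print(smooth_cyclic(counts)[18:21])  # [2.5 5.  2.5]
```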
The feature vector for a session login attempt is formed using the normalized probability of each variable gathered from the HTTP request. For example, in the countries case above, a session which comes from the US will have a value of 1 for the variable country in the feature vector. To train the model we calculate the probability profiles for each user using the login history. Afterwards we evaluate a subset of new logins against the user probability profile and compute the feature vector for each visit. The feature vector is fed to a Random Forest model that assesses how anomalous the current event is. The impersonation records were synthesized by comparing login events from one user against the history of another user. With this in mind, the model assesses the likelihood of an impersonation. Finally, the statistics are updated, the idea being that the system will gradually adapt to permanent changes in the user profile.
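Putting these pieces together, the sketch below (with invented field names and synthetic training data) illustrates how per-login vectors of normalized likelihoods could feed a random-forest model of the kind described above; it is not the production feature set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Each login is described by the normalized likelihood of its context fields
# under the user's historical profile (country, browser, OS, hour, ...).
FIELDS = ["country", "browser", "os", "hour"]

def feature_vector(login, profile):
    """Normalized likelihood of each observed value; unseen values get 0."""
    return [profile[f].get(login[f], 0.0) for f in FIELDS]

# Synthetic training data: label 1 = impersonation (anomalous, low likelihoods).
rng = np.random.default_rng(0)
legit = rng.uniform(0.5, 1.0, size=(200, len(FIELDS)))
attack = rng.uniform(0.0, 0.6, size=(200, len(FIELDS)))
X = np.vstack([legit, attack])
y = np.array([0] * 200 + [1] * 200)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

profile = {"country": {"US": 1.0, "UK": 0.5}, "browser": {"Chrome": 1.0},
           "os": {"Windows": 1.0}, "hour": {19: 1.0, 20: 0.6}}
new_login = {"country": "FR", "browser": "Chrome", "os": "Windows", "hour": 3}
print("context risk score:", clf.predict_proba([feature_vector(new_login, profile)])[0, 1])
```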
3.2 Behavioral dynamics combining keystrokes and mouse activity
Both keyboard and mouse events are needed to describe a human-computer interaction and turn it into behavioral features, since a regular user uses both at the same time. However, there is no simple way to merge keyboard and mouse dynamics features. To describe a user's behavior during a session we calculate the keyboard and mouse dynamics using all the events gathered in one single session, where a session is defined as a time frame in which the user is performing some activity on the computer. Once the keyboard and mouse dynamics are calculated, we combine both sets of features, resulting in one single vector of features per session. The combination of both sets of features describes the use of keyboard and mouse dynamics in a single session. This process is repeated each time a new session is gathered. To compare a session's behavior vector against the sessions in the history we defined a maximum number of sessions to compare; in our experiment, for each user we randomly chose between 10 and 30 sessions, which allows us to test the algorithm's performance with different history lengths.

Fig. 3 Context of a user with heterogeneous access patterns: (a) connections per country; (b) connections per operating system; (c) connections per browser
We calculated the history mean using Eq. 1:

$$\mathrm{FeatureHistMean}_j = \frac{\sum_{J} \mathrm{FeatureHist}_j}{J} \quad (1)$$

where $\mathrm{FeatureHistMean}_j$ is the mean of one feature, $\mathrm{FeatureHist}_j$ is an individual observation of the feature and $J$ is the number of observations in the history. To compare the gathered session against the user's session history we used Eq. 2:

$$\mathrm{Dev}_i = \frac{\mathrm{FeatureSession}_i - \mathrm{FeatureHistMean}_i}{1 + \sigma(\mathrm{FeatureHist}_i)} \quad (2)$$

where $\mathrm{FeatureSession}_i$ is the value of feature $i$ in the incoming session, $\mathrm{FeatureHistMean}_i$ is the calculated mean of the feature and $\sigma(\mathrm{FeatureHist}_i)$ is the feature's standard deviation. The resulting vector of deviations gives us the distance of a session from the history.
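A small sketch of Eqs. 1 and 2 applied to per-session feature vectors; the array shapes, names and example values are ours.

```python
import numpy as np

def deviation_vector(session_features, history):
    """Compare one session's feature vector to the user's session history.

    history: (J, F) array of feature vectors from past sessions.
    Returns the per-feature deviation of Eq. 2: (x - mean) / (1 + std).
    """
    history = np.asarray(history, dtype=float)
    hist_mean = history.mean(axis=0)   # Eq. 1: mean over the J past sessions
    hist_std = history.std(axis=0)
    return (np.asarray(session_features, dtype=float) - hist_mean) / (1.0 + hist_std)

history = np.array([[0.20, 5.0], [0.25, 6.0], [0.22, 5.5]])  # J=3 sessions, F=2 features
print(deviation_vector([0.60, 9.0], history))
```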
Using the previously described behavioral analysis process, we created a data set of sessions with labeled data. To create the positive labels, for each user we computed a base history and then calculated the behavioral features and deviation vectors. To create the negative labels, for each user we randomly selected sessions of different users and ran the behavioral analysis against the original user's history. The resulting vectors feed a random forest algorithm that assesses whether a session is legitimate or not.
3.3 Overview ofcombined model
Assume we have a model to assess the risk of a session based on the browser context information, and another model to identify users by using behavioral patterns. As discussed in the introduction, there are however inherent limitations to each of the single models: context-based information of an incoming network (HTTP) connection cannot detect advanced impersonation attacks, whereas behavioral information is not accurate enough in short interactions such as log-ins. As a result we propose to enhance the risk-based authentication system's overall performance by combining the predictions of both models.
In principle, there are several ways to build such a combined meta-model, for instance by building a decision flowchart that takes the scores produced by the single models and decides whether a given session should be considered suspicious or not. In this work, we propose to study three different methods for combining both scores: (1) a parametric linear combination, (2) a random forest classifier and (3) a Support Vector Machine (SVM) classifier that predicts the combined label from both scores. Let us define $y_c, y_b \in [0, 1]$ as the predictions of the context-based and behavioral models, respectively. First of all, we propose to unify the models' predictions using a convex linear combination as described in Eq. 3:

$$y_t = \alpha_c y_c + \alpha_b y_b \quad (3)$$

where $\alpha_c, \alpha_b \in [0, 1]$ are the coefficients of each model. Note that the coefficients must satisfy $\alpha_c + \alpha_b = 1$ so that the prediction $y_t \in [0, 1]$ is meaningful. In the evaluation section we will discuss an example instance of the parameters. Second, we use a Random Forest classifier to predict the combined label; the random forest is fed with the prediction scores of the context-based and behavioral models. Third, we propose an SVM classifier as combination meta-model; the SVM classifier is also trained with the individual scores as input features. In Sect. 4 we will show the results for the three combination methods and discuss in depth the results of the best performing model.
Notice that by building a model that takes into account both the browser context and the behavioral dynamics scores, more confidence in the legitimacy of a given log-in attempt can be gained.
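The three combination strategies can be sketched as follows; the hyperparameters, synthetic scores and toy combined label are assumptions for illustration, not the actual training setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

def linear_meta(y_c, y_b, alpha_b=0.5):
    """Eq. 3: convex combination of context score y_c and behavioral score y_b."""
    return (1.0 - alpha_b) * y_c + alpha_b * y_b

# Synthetic single-model scores and combined labels to train the learned meta-models.
rng = np.random.default_rng(1)
scores = rng.uniform(0, 1, size=(500, 2))         # columns: [y_c, y_b]
labels = (scores.max(axis=1) > 0.6).astype(int)   # toy combined label

rf_meta = RandomForestClassifier(n_estimators=10, random_state=1).fit(scores, labels)
svm_meta = SVC(probability=True, random_state=1).fit(scores, labels)

y_c, y_b = 0.35, 0.80
print("linear :", linear_meta(y_c, y_b, alpha_b=0.7))
print("forest :", rf_meta.predict_proba([[y_c, y_b]])[0, 1])
print("svm    :", svm_meta.predict_proba([[y_c, y_b]])[0, 1])
```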
Scalability of the combined model Note that the models obtained for the two risk assessment strategies involve training with a dataset of multiple users; however, a single model is generated that can be applied to every user (there is no need to build one model per user). Therefore, the approach is designed to scale to millions of users once the two respective models are trained.
4 Evaluation
In order to train and evaluate the performance of our proposed method we collected two sets of data. The behavioral data set, containing both mouse and keyboard data, was retrieved from the public data set known as The Wolf Of SUTD (TWOS) [12], as described in Sect. 2.3. The context analytics data set, in contrast, was collected in house from banking web services. This data set contains information about context-based features for online banking log-in sessions. The context-based data set has ca. 13 million entries summarizing connection features when users perform a password-based authentication process. Within those features, each entry has information on the session timestamp, IP address and user agent. To avoid over-fitting we first split the history logs for each user into halves; the split is performed for both datasets. The reason for this is to simulate a scenario where the user's credentials were compromised or guessed at some point. We use the first half to fit and validate the model. The second half is used to provide an unbiased evaluation of the final trained model.
In order to test the combined model we perform a match between the session attempts in the TWOS data and the context-based data. First we determine the data set with fewer entries, in our case the TWOS data. Afterwards we split the TWOS data set into positive (impersonation attack) and negative samples. Since we balanced the TWOS dataset before training the behavioral model, the behavioral data has as many positive as negative entries. We take the positive entries of TWOS and split them into two sets: one of those subsets is matched with an equal number of random attack sessions from the context-based data set, and the remaining subset is matched with negative samples from the context-based data. The same process is performed for the negative entries in the TWOS data set. As a result, the data set for the combined model is distributed as Table 3 shows.
4.1 Session context‑based model
Historically, the analysis of connection features is the most common technique to mitigate the risk of impersonation attacks. For this reason, we first train a model on the context-based information to get a notion of how the basic model performs. Starting from the session timestamp, IP address and user agent at session start, we calculate the convolutions and probability profiles described in Sect. 3. From the ca. 13 million session log-in attempts, we take 30% of the data to test the model's performance and the remainder to train the algorithm. The model used to predict the risk of a connection based on contextual information was a random forest.
Table 3 Distribution of the combined label used to test the model that aggregates the predictions of the single models

Label behavioral   Label context   Data percentage (%)   Combined label
0                  0               25                    0
0                  1               25                    1
1                  0               25                    1
1                  1               25                    1
J.Solano et al.
1 3
The evaluation of the performance is done using standard classification evaluation measures. Using a confusion matrix, the following measures are calculated:

$$\mathbf{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathbf{Precision} = \frac{TP}{TP + FP}$$
$$\mathbf{Recall} = \frac{TP}{TP + FN}$$
$$\mathbf{F_1\ score} = \frac{2\,P\,R}{P + R}$$

where $P$, $R$, $TP$, $FN$, $TN$ and $FP$ are the precision score, the recall score, and the numbers of true positives, false negatives, true negatives and false positives, respectively. To evaluate the context-based model we define as positive the sessions with context simulation; the sessions with no context simulation attack are the negative ones. As we are facing a classification problem, some performance metrics depend on the decision threshold $\lambda$, which defines the minimum output probability a prediction must reach to be classified as an attack. Table 4 summarizes the performance of the single random forest model trained to alert on attacks based on device context information, assuming the correct label for each sample is only the context simulation label.
The results in Table4show the context-based model per-
forms impressively. Particularly, we observe that the model
has a high accuracy in the process of detect impersonation
attacks with not large decreases in precision or recall. To
have a better understating on how the model is performing
depending on the threshold we compare precision and recall
curves Fig.5b for the context-based model. Furthermore,
we show in Fig.5a the model area under the curve (AUC)
metric.
However, the results showed in Table4hide that in real
scenarios an attack could be an impersonation of context
(browser type, country of origin of IP etc.) or an imper-
sonation of behavioral features for a given user at login
time. Additionally, the imitation or simulation of the ses-
sion context by an attacker is relatively easy. On account
of this premise it is important to evaluate the model using
as correct output for each sample the combined label. In
other words, we define as positive the sessions with context
simulation or impersonation attacks. The sessions without
any attack are the negative ones. In the Table5we summa-
rize the performance of the single context-based model for
different decision thresholds when the combined label is the
output variable.
It becomes noticeable in Table5that context-based model
is not resilient to complex attacks in which the attacker
carefully manipulates HTTP parameters. Specifically, we
observe that the model has decreased accuracy metric in
almost 17% when the model prediction is evaluated over
the combined label. To have a better understating on how
Table 4 Single context-based model performance for different classification thresholds when evaluated against context-session attacks

Decision threshold   F1-score   Precision   Accuracy   Recall
0.2                  0.931      0.872       0.928      0.998
0.4                  0.956      0.924       0.955      0.990
0.5                  0.963      0.949       0.964      0.978
0.6                  0.964      0.964       0.965      0.965
0.8                  0.934      0.982       0.939      0.892
Fig. 5 Threshold-dependent performance curves for the single context-based model (analysis of HTTP connections) when evaluated against context-session attacks: (a) ROC curve and AUC score; (b) precision-recall curve
Combining behavioral biometrics andsession context analytics toenhance risk-based static…
1 3
To gain a better understanding of how the model performs depending on the threshold, we compare the precision and recall curves in Fig. 6b for the context-based model. Furthermore, we show in Fig. 6a the model's area under the curve (AUC) metric.
4.2 Behavioral biometrics model
For the behavioral dynamics analysis, we extract mouse traces and keystrokes from the TWOS dataset for all users. After extraction we correlate the mouse and keyboard data for the different sessions the users performed. For each user, a session is created by collecting all mouse entries within a time window before the final login click is observed. Afterwards, the keyboard data is correlated by searching for all keystrokes in the keyboard data set within the same time window for the same user. Only sessions with both mouse and keyboard information are considered; the others were ignored. With both mouse and keyboard session information, we created the features described in Sect. 3. Once the feature data sets are correlated, we train the behavioral dynamics model: a random forest was trained in order to capture the behavioral patterns of each user. In order to test the performance of the model we split this half of the data set into 70% for training and the remaining entries for validation. The evaluation of the performance is done using the standard classification evaluation measures explained in Sect. 4. We test the model with the second half of the behavioral data. First, we define as positive (attacks) the sessions with impersonation attacks only; the sessions without impersonation attacks are the negative ones. In Table 6 we summarize the performance of the single behavioral dynamics model for different decision thresholds, assuming the correct label for each sample is the behavioral biometrics label.
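A sketch of the session construction just described: mouse events inside a fixed window ending at the final click are collected, and keystrokes in the same window are attached; the 30-second window and the record layout are assumptions for illustration.

```python
def build_session(mouse_events, key_events, window=30.0):
    """mouse_events / key_events: lists of dicts with a 'timestamp' field;
    mouse events also carry an 'action' field. Returns one session or None."""
    clicks = [e for e in mouse_events if e["action"] == "press"]
    if not clicks:
        return None
    end = max(c["timestamp"] for c in clicks)   # final click closes the session
    start = end - window
    mouse = [e for e in mouse_events if start <= e["timestamp"] <= end]
    keys = [e for e in key_events if start <= e["timestamp"] <= end]
    if not mouse or not keys:                   # keep only sessions with both modalities
        return None
    return {"mouse": mouse, "keyboard": keys, "start": start, "end": end}

mouse = [{"timestamp": t, "action": "move"} for t in range(0, 40, 2)]
mouse.append({"timestamp": 40, "action": "press"})
keys = [{"timestamp": t, "action": "press"} for t in (15, 18, 22)]
session = build_session(mouse, keys)
print(len(session["mouse"]), "mouse events,", len(session["keyboard"]), "keystrokes")
```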
From the Table6we observe the model has a high preci-
sion for higher thresholds. However, more the threshold is
increased more the recall decrease drastically. As result, the
behavioral biometrics model exhibits a high rate of false
negatives when high thresholds are required. Notice that the
problem addressed in this paper considers the high cost of
false negative predictions because they generates a cascade
of attacks which the system does not alert. For this reason
we compare precision and recall curves Fig.7b for the
Table 5 Single context-based model performance for different classification thresholds when evaluated against session and biometric attacks

Decision threshold   F1-score   Precision   Accuracy   Recall
0.2                  0.806      0.932       0.751      0.710
0.4                  0.799      0.958       0.748      0.685
0.5                  0.792      0.972       0.743      0.668
0.6                  0.784      0.980       0.738      0.654
0.8                  0.747      0.990       0.703      0.599
Fig. 6 Threshold-dependent performance curves for the single context-based model (analysis of HTTP connections) when evaluated over attacks to context and biometrics: (a) ROC curve and AUC score; (b) precision-recall curve
Table 6 Single behavioral model performance for different classification thresholds ($\lambda$) when evaluated only against behavioral attacks

Decision threshold   F1-score   Precision   Accuracy   Recall
0.2                  0.806      0.932       0.751      0.710
0.4                  0.799      0.958       0.748      0.685
0.5                  0.792      0.972       0.743      0.668
0.6                  0.784      0.980       0.738      0.654
0.8                  0.747      0.990       0.703      0.599
J.Solano et al.
1 3
For this reason we compare the precision and recall curves in Fig. 7b for the behavioral model, to find the threshold which minimizes the critical cases. Additionally, we show in Fig. 7a the model's receiver operating characteristic (ROC) curve.
The evaluation using only the behavioral label indicates that the threshold which minimizes the false positives and the false negatives is close to 0.2. However, these results are biased, because complex attacks could arise from behavioral biometrics impersonation but also from session context simulation. As we did with the session context-based model, we evaluate the model using the combined label as output variable. For that purpose, we define as positive the sessions with context simulation or impersonation attacks; the sessions without any attack are the negative ones.
The results for the combined label in Table 7 show that the model keeps high precision independent of the decision threshold. However, the false negative rate increases considerably. Furthermore, the F1-score is 14% lower, which anticipates a decrease in the AUC metric. To corroborate the decrease in AUC we show in Fig. 8a the model's receiver operating characteristic (ROC) curve. Additionally, the precision and recall curves for the behavioral model are presented in Fig. 8b.
The AUC scores for both models are around 0.80; however, the precision and recall metrics are not accurate enough for the problem we are addressing.
For instance, the recall for the context-based model indicates a high rate of false negatives, which in our context means that a high rate of attacks go unnoticed by the system. Moreover, the F1-scores denote that each model separately has a similar performance when trying to detect a global attack. The issue is therefore that each model is not able to detect the counterpart attack: the context-based model will not detect changes in biometric features, and the behavioral model, on its own, will ignore changes in the connection context.
4.3 Combined model
In light of the above results we develop a model that attempts to overcome the shortcomings of the single risk assessment strategies (context-based analysis of HTTP connections and behavioral dynamics individually) by proposing a single model that takes both strategies into account. In the following we show the results for each combination model proposed in Sect. 3.3: (1) parametric linear combination, (2) Random Forest classifier and (3) Support Vector Machine classifier.
4.3.1 Parametric linear combination
For the convex linear model we combine the predictions of the single models using Eq. 3. The values shown in this section were calculated using $\alpha_c = 0.5$ and $\alpha_b = 0.5$, following the intuition that both attacks are equally probable in our data set construction.
Fig. 7 Threshold-dependent performance curves for the single behavioral dynamics model when evaluated against behavioral attacks: (a) ROC curve and AUC score; (b) precision-recall curve
Table 7 Single behavioral model performance for different classification thresholds ($\lambda$) when evaluated over the combined label

Decision threshold   F1-score   Precision   Accuracy   Recall
0.2                  0.800      0.837       0.720      0.765
0.4                  0.662      0.896       0.608      0.525
0.5                  0.543      0.920       0.526      0.385
0.6                  0.419      0.940       0.454      0.269
0.8                  0.222      0.970       0.359      0.126
Combining behavioral biometrics andsession context analytics toenhance risk-based static…
1 3
However, the parameters of the convex combination are optimized at the end of this subsection. Table 8 shows the results of the combined model for different classification thresholds.
It becomes noticeable that the parametric linear combination model has a considerable effect on improving the accuracy of detecting impersonation attacks. A detailed comparison of the single behavioral model, the single context-based model and the combined model is presented in Table 9.
The results achieved with the parametric linear combination model show an important enhancement in the detection of attacks, as Table 9 reveals. The high precision and recall values bring to light that a combined model performs better in detecting attacks, given that it can recognize both changes in context and changes in behavior. At the same time, the improved F1-score and accuracy show that the overall classification was improved; thus false positives caused by the use of new devices or by travel can also sometimes be mitigated by using the information from the behavioral model (Fig. 9).
Finally, we show in Fig.10 the receiver operating charac-
teristic (ROC) curve for all models we discuss in this work.
As it is also evident from the AUC in this figure, a combined
model using the parametric linear combination has better
performance than the individual models in the data set we
have considered.
Up to this point, all the results we have presented relate to the scenario in which $\alpha_c = 0.5$ and $\alpha_b = 0.5$. Although we chose those values based on the intuition that both attacks are equally probable in our data set construction, there might be better parameters. To find the best set of parameters we performed an exhaustive search over the weights of the convex linear combination. In Table 10 we present the F1-score, precision, accuracy and recall for 20 different configurations of the parameters with a threshold $\lambda = 0.2$. Recall that the linear combination in Eq. 3 is convex and the parameters must satisfy $\alpha_c + \alpha_b = 1$. Some of the configurations are presented graphically in Fig. 11, to give a better understanding of the model's behavior when $\alpha_b$ is modified.
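The exhaustive search over the convex weight can be sketched as below; the score arrays are synthetic placeholders and the metrics are computed with scikit-learn.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

def sweep_alpha_b(y_true, y_context, y_behavior, threshold=0.2, step=0.05):
    """Evaluate y_t = (1 - a_b) * y_c + a_b * y_b for a_b in [0, 1) and report
    threshold-dependent metrics for each configuration."""
    rows = []
    for a_b in np.arange(0.0, 1.0, step):        # 20 configurations, as in Table 10
        y_pred = ((1.0 - a_b) * y_context + a_b * y_behavior >= threshold).astype(int)
        rows.append((round(a_b, 2),
                     f1_score(y_true, y_pred, zero_division=0),
                     precision_score(y_true, y_pred, zero_division=0),
                     accuracy_score(y_true, y_pred),
                     recall_score(y_true, y_pred, zero_division=0)))
    return rows

rng = np.random.default_rng(2)
y_true = rng.integers(0, 2, size=1000)
y_context = np.clip(y_true * 0.6 + rng.normal(0.2, 0.2, 1000), 0, 1)
y_behavior = np.clip(y_true * 0.7 + rng.normal(0.15, 0.2, 1000), 0, 1)
best = max(sweep_alpha_b(y_true, y_context, y_behavior), key=lambda row: row[1])
print("best alpha_b by F1-score:", best)
```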
The information in Table10 show that the behavio-
ral model should have a greater contribution in the linear
Fig. 8 Threshold-dependent performance curves for the single behavioral dynamics model when evaluated over the combined label ($\alpha_b = \alpha_c = 0.5$): (a) ROC curve and AUC score; (b) precision-recall curve
Table 8 Combined model performance for different classification thresholds when evaluated over the combined label ($\alpha_b = \alpha_c = 0.5$)

Decision threshold   F1-score   Precision   Accuracy   Recall
0.2                  0.915      0.894       0.873      0.937
0.4                  0.843      0.972       0.797      0.744
0.5                  0.699      0.986       0.660      0.542
0.6                  0.549      0.992       0.545      0.379
0.8                  0.185      0.998       0.344      0.102
Table 9 Model performance comparison, with a decision threshold of 0.2, of the three models we build to increase the security of login attempts, when evaluated over the combined label

Model                           F1-score   Precision   Accuracy   Recall
Behavioral                      0.800      0.837       0.720      0.765
Context-based                   0.806      0.932       0.751      0.710
Parametric linear combination   0.915      0.894       0.873      0.937
J.Solano et al.
1 3
The information in Table 10 shows that the behavioral model should have a greater contribution in the linear combination, because this maximizes three of the four metrics we evaluated on the test dataset. As can be seen from the analysis, when the parameter $\alpha_b$ is set to 0.70 the F1-score, precision and recall are maximized, while the accuracy decreases by 6% compared to its maximum value. It is important to remark that one of the major objectives of the combined model was to reduce the false negative and false positive rates, and the choice of $\alpha_b = 0.70$ achieves this goal while keeping a competitive accuracy. Moreover, the results of this exhaustive analysis suggest that changes in biometric features are more difficult to detect than changes in session context, since the contribution of the behavioral model needs to be increased to maximize the combined model's performance.
Fig. 9 Precision-recall curves for the combined risk assessment model (i.e. context-based analysis of HTTP connections and behavioral dynamics) using $\alpha_b = \alpha_c = 0.5$
Fig. 10 ROC curves and AUC scores for the single risk assessment strategies (context-based analysis of HTTP connections and behavioral dynamics) and the parametric linear combination model which combines both strategies (using $\alpha_b = \alpha_c = 0.5$)
Fig. 11 ROC curves and AUC scores for the single risk assessment strategies (context-based analysis of HTTP connections and behavioral dynamics) and the model which combines both strategies, for different set-ups of $\alpha_b$: (a) $\alpha_b = 0.3$; (b) $\alpha_b = 0.5$; (c) $\alpha_b = 0.7$
Combining behavioral biometrics andsession context analytics toenhance risk-based static…
1 3
4.3.2 Random forest classifier
For the Random Forest meta-model we feed a random forest classifier with the two scores given by the single models: the context-based score and the behavioral score. Given the low dimensionality of the input vector, we train a random forest with 10 trees. Table 11 shows the results of the random forest meta-model for different classification thresholds.
Table 11 lets us conclude that the best classification threshold for the random forest meta-model is close to 0.2. Moreover, the precision obtained with the random forest is higher than with the parametric linear combination, while the recall is lower. A comparison of the random forest meta-model's performance against the single behavioral model and the single context-based model is presented in Table 12.
The use of a random forest classifier as meta-model improves the detection of combined attacks, as the results in Table 12 reveal. The high F1-score and accuracy values show that the overall classification of attacks was improved in relation to the single models. In detail, the random forest approach performs well at avoiding false positives, due to its high precision value. At the same time, its accuracy and recall are lower compared to the linear parametric meta-model. Finally, we show in Fig. 12b the receiver operating characteristic (ROC) curve for the single models and the random forest meta-model. As is also evident from the AUC in this figure, a combined model using the random forest classifier performs better than the individual models on the data set we have considered.
4.3.3 Support‑vector machine classifier
For the Support Vector Machine meta-model we feed an SVM classifier with the two scores given by the single models: the context-based score and the behavioral score. As we are interested in giving a risk assessment for the incoming login, we have to calibrate the SVM's class scores into probabilities. In order to obtain a probability score we apply logistic regression to the SVM's scores, fit by an additional cross-validation on the training data. Table 13 shows the results of the SVM meta-model for different classification thresholds.
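A sketch of this calibration step using scikit-learn's CalibratedClassifierCV, which fits a logistic (Platt-style) mapping from SVM margins to probabilities by cross-validation; the data here is synthetic.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import LinearSVC

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(600, 2))        # columns: [context score, behavioral score]
y = (X.mean(axis=1) > 0.55).astype(int)     # toy combined label

svm = LinearSVC()                           # produces margin scores, not probabilities
calibrated = CalibratedClassifierCV(svm, method="sigmoid", cv=5).fit(X, y)

print("calibrated risk:", calibrated.predict_proba([[0.3, 0.8]])[0, 1])
```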
The results achieved by using the SVM classifier as the meta-model show an important improvement in the detection of impersonation attacks compared to the single models. Interestingly, the SVM meta-model exhibits an almost constant, and high, value for all evaluated metrics regardless of the selected decision threshold. A detailed comparison of the performance of the single behavioral model, the single context-based model and the combined model using the SVM classifier is presented in Table 14.
The information in Table14 show that SVM meta-model
approach is also capable to detect with high accuracy the
impersonation attacks. The SVM classifier exhibits the best
behavior to minimize false positives and false negatives
Table 10 Parametric linear combination model performance comparison for different set-ups of the parameters $\alpha_b$ and $\alpha_c$

Parameter $\alpha_b$   F1-score   Precision   Accuracy   Recall
0.00                   0.751      0.806       0.932      0.710
0.05                   0.753      0.808       0.932      0.714
0.10                   0.756      0.811       0.932      0.718
0.15                   0.766      0.820       0.932      0.733
0.20                   0.787      0.840       0.931      0.766
0.25                   0.816      0.866       0.927      0.812
0.30                   0.834      0.881       0.921      0.845
0.35                   0.850      0.895       0.914      0.877
0.40                   0.861      0.904       0.907      0.902
0.45                   0.869      0.911       0.900      0.922
0.50                   0.873      0.915       0.894      0.937
0.55                   0.875      0.917       0.888      0.948
0.60                   0.876      0.918       0.884      0.955
0.65                   0.876      0.919       0.881      0.960
0.70                   0.876      0.919       0.878      0.963
0.75                   0.873      0.917       0.876      0.962
0.80                   0.823      0.881       0.866      0.896
0.85                   0.785      0.852       0.857      0.847
0.90                   0.760      0.833       0.850      0.816
0.95                   0.739      0.815       0.843      0.789

Only $\alpha_b$ is presented because the linear combination is convex and thus the parameters must satisfy $\alpha_c + \alpha_b = 1$
Bold values highlight the value of $\alpha_b$ for which the corresponding metric (F1-score, Precision, Accuracy, Recall) is maximized
Table 11 Random forest model performance for different classification thresholds when evaluated over the combined label

Decision threshold   F1-score   Precision   Accuracy   Recall
0.2                  0.905      0.914       0.863      0.897
0.4                  0.897      0.929       0.855      0.868
0.5                  0.892      0.935       0.850      0.854
0.6                  0.885      0.940       0.842      0.837
0.8                  0.860      0.953       0.813      0.783
Table 12 Random forest combination model performance compared with the best classification decision threshold for the single models when evaluated over the combined label

Model                       F1-score   Precision   Accuracy   Recall
Behavioral                  0.800      0.837       0.720      0.765
Context-based               0.806      0.932       0.751      0.710
Random Forest meta-model    0.905      0.914       0.863      0.897
J.Solano et al.
1 3
The SVM classifier exhibits the best behavior in minimizing false positives and false negatives at the same time, as the precision and recall metrics reveal. Moreover, the F1-score and accuracy show high rates of attack detection for the combined label. Finally, we depict in Fig. 13b the receiver operating characteristic (ROC) curve for the single models and the SVM meta-model.
Notice that, at the optimal decision threshold, the results for the SVM meta-model indicate the best trade-off across all four metrics evaluated in our work. The SVM exhibits the best behavior of the three approaches at minimizing the number of false positives and false negatives. Moreover, the SVM meta-model has the highest precision and accuracy values. Finally, the F1-score obtained for the SVM classifier is statistically comparable with the best F1-score obtained for the parametric linear combination.
Scalability of the combined model We have measured the time it takes to evaluate a given session against the separate strategies, in order to assess the scalability of the approach. These times were measured on an i7-7700HQ processor (2.8 GHz), using a single core. For the context-based model, we obtain an average execution time of 105 ms (± 435 µs) per session. With the behavioral dynamics model we can classify a session within 106 ms (± 263 µs) per session. The time taken by the combination step depends on the combination approach: for the (1) parametric linear combination we obtained an average execution time of 27 ns (± 0.8 ns), for the (2) random forest meta-model 111 ms (± 814 µs), and for the (3) SVM classifier 1.1 ms (± 22 µs). As a result, the risk assessment can be completed within half a second per session.
4.4 Use inindustrial scenarios
There is no single approach for including an anomaly
detection system such as the one discussed in this paper in
an operational workflow, nor a one-size-fits-all choice of
parameters. Smaller entities – especially if not experienc-
ing a high level of fraud – may want to handle assessments
manually. In this case human operators in a SOC receive an
alert and react to it. Actions may include blocking the user
account, or contacting the user. Aggregated data can also
be used to drive the decision process towards more sophis-
ticated, and effective, solutions.
In this scenario, where all alerts are handled by a human operator, it is mandatory that the alert rate is reasonably small.
Fig. 12 Threshold-dependent performance curves for the combined model using a Random Forest classifier as meta-model: (a) precision-recall curve; (b) ROC curve and AUC score
Table 13 SVM classifier model performance for different classification thresholds when evaluated over the combined label

Decision threshold   F1-score   Precision   Accuracy   Recall
0.2                  0.914      0.915       0.874      0.912
0.4                  0.908      0.926       0.869      0.892
0.5                  0.906      0.930       0.866      0.883
0.6                  0.904      0.934       0.864      0.875
0.8                  0.895      0.943       0.854      0.852
Table 14 SVM combination meta-model performance compared with the best classification decision threshold for the single models when evaluated over the combined label

Model             F1-score   Precision   Accuracy   Recall
Behavioral        0.800      0.837       0.720      0.765
Context-based     0.806      0.932       0.751      0.710
SVM meta-model    0.914      0.915       0.874      0.912
Combining behavioral biometrics andsession context analytics toenhance risk-based static…
1 3
While it is impossible to define a generic threshold, 1% may be a useful benchmark. As the user base, or the fraud level, grows, the institution may decide to integrate the assessments into the application workflow. If this is done with hard-coded rules, then a low rate of false positives is critical, as each alert will generate an action and therefore an expense.
Bigger, or more security-aware, institutions will probably feed the assessments generated by this module into additional systems. While in practice there is no such clear distinction (one single system can often play both roles), we can typify these systems in two categories:
Dynamic authentication systems. Based on the assessment and other factors, such as the user's risk profile and the money at stake, the system can decide whether additional authentication factors are to be requested from the user, or whether access has to be blocked altogether, once or permanently.
Transaction anomaly detection systems, which can include the information related to the transaction to decide whether it should be approved or denied, or whether further action should be requested, including sending the transaction to a SOC for further human analysis.
In this last scenario a higher rate of false positives is acceptable, as the alerts will be filtered using independent criteria. Furthermore, a numeric assessment is preferable to a binary value, letting the institution fine-tune, possibly in real time, how to react to the assessment.
In the context of web-application static authentication, we believe that optimizing the choice of parameters in the model to minimize false negatives (i.e. undetected attacks) is acceptable if, in those cases, users can be prompted for a second authentication factor such as an OTP sent to their mobile phones. In our model, setting the threshold between 0.2 and 0.25 yields between a 35% and a 24.6% false-positive rate against a 1.9% and 3.5% false-negative rate, respectively. This means that roughly one out of four users is prompted for 2FA, whereas between one out of 30 and one out of 50 attacks goes undetected. Note that these numbers hold for our experiments, where 75% of the data consists of attacks; in practice attacks are much less common.
For very sensitive customers, further manual action can
be taken depending on the transactions performed in the
application. For instance, in the banking domain, further fil-
ters depending on transaction amounts can be applied, given
a suspicion on context and behavior.
4.5 Discussion andlimitations
We have shown that in principle the combination of both
risk-based authentication strategies indeed improves the
performance of the single models in isolation. There are a
number of limitations to our evaluation. First, the data from
the HTTP contextual model and the behavioral model do not
belong to the same users. Although in principle there should
be no strong correlation between context and behavior, a
more accurate model could be built if variations in behavior
from the same user across devices are taken into account.
On the other hand, experiments were built under the
assumption that the combinations of different scenarios
(between contextual and behavioral attacks) were equally
likely. In practice, attacks are rare, and this aspect should be
considered in future work. Last, we have considered behav-
ioral data that has been adapted to simulate static authen-
tication, but that in reality may belong to other activities
(a) Precision-Recall curve. (b) ROCcurve andAUC score.
Fig. 13 Threshold dependent performance curves for combined model using SVM Classifier as meta-model
J.Solano et al.
1 3
in the context of the competition where it was gathered. In
future work, we plan to consider data collected from real
user log-ins. To the best of our knowledge, there is no public
database containing both mouse and keyboard data for static
authentication, although there are some datasets containing
either of them.
5 Related work
Risk-based authentication has seen popularity in web applications due to the limitations of password authentication. Bonneau et al. [2] give a historical overview of the introduction of risk-based authentication in practical systems in order to complement password-based authentication. Alaca and Van Oorschot [5] classify and survey several device fingerprinting mechanisms that can be used as the basis for authentication, and discuss different ways in which authentication can be complemented by them. Misbahuddin et al. [20] study the application of machine learning techniques for risk-based authentication using HTTP and network patterns, in a similar spirit to our technique, but do not take into account behavioral biometric patterns from mouse and keyboard, which, as we have shown, improve the accuracy of risk-based authentication.
On the other hand, there are several works exploring
applications of behavioral biometrics for static and continu-
ous authentication. In the general context of desktop-based
applications, Mondal and Bours [9] have studied the combi-
nation of keyboard and mouse for continuous authentication.
Unlike them, we focus on static authentication for
web applications. Shen et al. [10] study the applicability
of mouse-based analytics for static authentication and con-
clude that interactions longer than a typical log-in would be
necessary in order to obtain high accuracy in such models.
Traore et al. [13] explore the combination of both mouse
and keyboard for risk-based authentication in web applica-
tions; however, they assume that behavior is also monitored in the
application after log-in (continuous authentication),
and obtain an equal error rate of around 8% (even when con-
sidering full web sessions). Recently, Solano et al. [21] studied
the use of mouse and keyboard features for static authenti-
cation in web applications; however, they focus
on the feasibility of learning user behavior from only
a few samples of the legitimate user in the training phase.
Moreover, they do not consider other risk factors (such as
context) in their approach.
To the best of our knowledge, the combination of tradi-
tional risk-based authentication based on HTTP and network
information with behavioral biometrics for static (log-in time)
authentication, as proposed in this work, has not been dis-
cussed in the literature.
6 Conclusions
The results of our proposed method demonstrate that device
identification and behavioral analytics are complementary
methods of risk measurement; by combining both of
them, efficacy and performance are never lower than with either
single-method approach. Moreover, our approach appears to be
more resilient to changes: for instance, when a user changes
his/her device, a device-identification-only approach will
raise an alert even though there is no attack, whereas a behavioral-only
approach will not notice the change at all.
In this work we have also shown that, by combining
both device-identification and behavioral-identification risk
assessment methods at login time, static web authen-
tication performance can be enhanced by detecting single
and mixed attack models with equal or higher accuracy in
each case. This also makes web authentication systems more
robust and may give the user a better security experience.
We have also discussed the practical applicability of our
solution in industrial scenarios. In the future, we plan to
consider a more powerful attacker model that is aware of
the behavioral risk assessment component and attempts to
bypass it, as well as reproducing these experiments on novel
datasets that collect both session information and behavioral
dynamics simultaneously.
Compliance with ethical standards
Conflict of interest All authors were employees of Cyxtera (now App-
Gate Inc.) at the time of writing this manuscript and declare no conflict
of interest. Parts of this study use the TWOS dataset, a public
dataset based on the behaviour of 24 students during a gamified experi-
ment, shared in an anonymized fashion by the Singapore Univer-
sity of Technology and Design (SUTD). The authors of the original study
obtained SUTD's IRB consent to carry out the experiment and share the data used in this paper.
Ethical standard In this work we also used a proprietary dataset of
log-in contextual information (based on HTTP parameters), which was
anonymized and cannot be associated with any particular indi-
vidual. Moreover, we only disclose aggregated results based on this
dataset. In sum, all procedures performed in studies involving human
participants were in accordance with the ethical standards of the insti-
tutional research committee and with the 1964 Helsinki declaration and
its later amendments or comparable ethical standards.
References
1. Perrig, A.: Shortcomings of password-based authentication. In:
9th USENIX Security Symposium, vol. 130. ACM (2000)
2. Bonneau, J., Herley, C., Stajano, F.M., et al.: Passwords and the
evolution of imperfect authentication. Commun. ACM 58, 78–87
(2014)
3. Newman, L.: Hacker Lexicon: What is Credential Stuffing? Wired
Magazine (2019). https://www.wired.com/story/what-is-credential-stuffing/. Accessed 12 Sept 2019
4. Kaspersky: Zeus malware. Online (2019). https://usa.kaspersky.com/resource-center/threats/zeus-virus. Accessed 12 Sept 2019
5. Alaca, F., Van Oorschot, P.C.: Device fingerprinting for augment-
ing web authentication: classification and analysis of methods. In:
Proceedings of the 32nd Annual Conference on Computer Secu-
rity Applications, pp. 289–301. ACM (2016)
6. Salem, M.B., Hershkop, S., Stolfo, S.J.: A survey of insider attack
detection research. In: Stolfo, S.J., Bellovin, S.M., Keromytis,
A.D., Hershkop, S., Smith, S.W., Sinclair, S. (eds.) Insider Attack
and Cyber Security, pp. 69–90. Springer, Boston (2008)
7. Yampolskiy, R.V., Govindaraju, V.: Behavioural biometrics: a
survey and classification. Int. J. Biom. 1(1), 81–113 (2008)
8. Zheng, N., Paloski, A., Wang, H.: An efficient user verification
system via mouse movements. In: Proceedings of the 18th ACM
Conference on Computer and Communications Security, pp.
139–150. ACM (2011)
9. Mondal, S., Bours, P.: Combining keystroke and mouse dynam-
ics for continuous user authentication and identification. In: 2016
IEEE International Conference on Identity, Security and Behavior
Analysis (ISBA), pp. 1–8. IEEE (2016)
10. Shen, C., Cai, Z., Guan, X., Wang, J.: On the effectiveness and
applicability of mouse dynamics biometric for static authentica-
tion: a benchmark study. In: 2012 5th IAPR International Confer-
ence on Biometrics (ICB) (2012)
11. Solano, J., Camacho, L., Correa, A., Deiro, C., Vargas, J., Ochoa,
M.: Risk-based static authentication in web applications with
behavioral biometrics and session context analytics. In: Zhou, J.,
Deng, R., Li, Z., Majumdar, S., Meng, W., Wang, L., Zhang, K.
(eds.) Applied Cryptography and Network Security Workshops,
pp. 3–23. Springer, Berlin (2019)
12. Harilal, A., Toffalini, F., Homoliak, I., Castellanos, J., Guarnizo,
J., Mondal, S., Ochoa, M.: The wolf of SUTD (TWOS): a dataset of
malicious insider threat behavior based on a gamified competition.
J. Wirel. Mob. Netw. (2018). https://doi.org/10.22667/JOWUA.2018.03.31.054
13. Traore, I., Woungang, I., Obaidat, M.S., Nakkabi, Y., Lai, I.: Com-
bining mouse and keystroke dynamics biometrics for risk-based
authentication in web environments. In: 2012 Fourth International
Conference on Digital Home (2012)
14. Swati Gurav, R.G., Mhangore, S.: Combining keystroke and
mouse dynamics for user authentication. Int. J. Emerg. Trends
Technol. Comput. Sci. (IJETTCS) 6, 055–058 (2017)
15. Cao, Y., Li, S., Wijmans, E.: (Cross-)browser fingerprinting via
OS and hardware level features. In: NDSS (2017). https://doi.org/10.14722/ndss.2017.23152
16. Nakibly, G., Shelef, G., Yudilevich, S.: Hardware fingerprinting
using HTML5 (2015). arXiv:1503.01408v3
17. Sanchez-Rola, I., Santos, I., Balzarotti, D.: Clock around the
clock: time-based device fingerprinting. In: Proceedings of the
2018 ACM SIGSAC Conference on Computer and Communica-
tions Security, pp. 1502–1514 (2018)
18. Kohno, T., Broido, A., Claffy, K.C.: Remote physical device
fingerprinting. IEEE Trans. Dependable Secure Comput. 2(2),
93–108 (2005)
19. Bailey, K.O., Okolica, J.S., Peterson, G.L.: User identification and
authentication using multi-modal behavioral biometrics. Comput.
Secur. 43, 77–89 (2014)
20. Misbahuddin, M., Bindhumadhava, B.S., Dheeptha, B.: Design
of a risk based authentication system using machine learning
techniques. In: 2017 IEEE SmartWorld, Ubiquitous Intelligence
Computing, Advanced Trusted Computed, Scalable Computing
Communications, Cloud Big Data Computing, Internet of People
and Smart City Innovation, pp. 1–6 (2017)
21. Solano, J., Tengana, L., Castelblanco, A., Rivera, E., Lopez, C.,
Ochoa, M.: A few-shot practical behavioral biometrics model for
login authentication in web applications. In: NDSS Workshop on
Measurements, Attacks, and Defenses for the Web (MADWeb’20)
(2020)
Publisher’s Note Springer Nature remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.