Markov Chain Fingerprinting to Classify Encrypted Traffic

Maciej Korczyński∗¶ and Andrzej Duda¶

∗EENR & DIMACS, Rutgers University, New Jersey, USA

¶Grenoble Institute of Technology, CNRS Grenoble Informatics Laboratory UMR 5217, France

Email: Maciej.Korczynski@rutgers.edu, Andrzej.Duda@imag.fr

Abstract—In this paper, we propose stochastic fingerprints for application traffic flows conveyed in Secure Sockets Layer/Transport Layer Security (SSL/TLS) sessions. The fingerprints are based on first-order homogeneous Markov chains whose parameters we identify from observed training application traces. As the fingerprint parameters of the chosen applications differ considerably, the method achieves very good accuracy in application discrimination and makes it possible to detect abnormal SSL/TLS sessions. Our analysis of the results reveals that the ability to discriminate applications mainly stems from incorrect implementation practices, misuse of the SSL/TLS protocol, various server configurations, and the nature of the applications.

I. INTRODUCTION

The importance of appropriate traffic classification methods continues to grow. They are essential for effective network planning, policy-based traffic management, application prioritization, and security control. However, traditional port-based [1] and payload-based [2], [3] classification methods are becoming less effective, because new applications can hide their nature by dynamically assigning ports, by using tunneling, or by applying proprietary payload encryption methods. This situation has led to the development of new identification methods based on flow features [4], [5] and host behavior [6], [7]. These methods are useful for classifying application layer protocols (e.g., HTTP, DNS, BitTorrent) or traffic categories (e.g., P2P content sharing, games, multimedia, WWW). However, they are less effective at reliably identifying application flows on top of a given protocol. Moreover, apart from some notable exceptions, these methods are not appropriate for classifying traffic when only one direction is observed due to routing asymmetries [8].

Past research on traffic analysis and classification showed that once we are able to generate a unique signature based on the packet or message payload (e.g., HTTP request headers), we can classify applications with high accuracy [3], [8]. Unfortunately, such approaches fail in the case of encrypted traffic [9]. In this work, we propose a payload-based method to identify application flows encrypted with the Secure Sockets Layer/Transport Layer Security (SSL/TLS) protocol, a fundamental cryptographic protocol suite supporting secure communication over the Internet [10].

Our approach takes advantage of the information embedded in the SSL/TLS header to create statistical fingerprints of sessions for classifying application traffic. We call a fingerprint any distinctive feature that allows identification of a given traffic class. In this work, a fingerprint corresponds to a first-order homogeneous Markov chain reflecting the dynamics of an SSL/TLS session. The Markov chain states model the sequence of SSL/TLS message types appearing in a single-direction flow of a given application from a server to a client.

We have studied the Markov chain fingerprints for twelve representative applications that make use of SSL/TLS: PayPal (an electronic service allowing online payments and money transfers), Twitter (an online social networking and microblogging service), Dropbox (a file hosting service), Gadu-Gadu (a popular Polish instant messenger), Mozilla (the part of the Mozilla add-ons service responsible for verifying the software version), MBank and PKO (two popular European online banking services), Dziekanat (a student online service), Poczta (a student online mail service), Amazon S3 (Simple Storage Service) and EC2 (Elastic Compute Cloud), and Skype (a VoIP service). The resulting models exhibit a specific structure that allows classifying encrypted application flows by comparing their message sequences with the fingerprints. They can also serve to reveal intrusions that try to exploit the SSL/TLS protocol by establishing abnormal communications with a server.

II. SSL/TLS OVERVIEW

Secure Sockets Layer (SSL) [11] and its successor Transport Layer Security (TLS) [10] are cryptographic protocols that provide secure communication between two parties over the Internet by encapsulating and encrypting application layer data. Many WWW portals and servers, especially those providing commercial services, use SSL/TLS to guarantee the security of all operations.

Figure 1 illustrates the structure of SSL/TLS and its components:

• Record Protocol: compresses and encrypts upper-layer data using the security parameters configured by the Handshake Protocol.

• Application Data Protocol: provides application layer data to the Record Protocol.

• Handshake Protocol: negotiates the parameters of an SSL/TLS session. The two communicating parties agree on the protocol version to use, optionally authenticate each other, exchange information on the session ID, select the cryptographic and compression algorithms, and establish the shared secret used to generate keys.

• Change Cipher Spec Protocol: signals modifications to encryption strategies. The protocol consists of a single message sent by either the client or the server to inform the other party that subsequent records will use the newly negotiated cryptographic algorithm and keys.

• Alert Protocol: reports an error condition or a change in the status of the session.

Figure 1. SSL/TLS protocol structure: application layer protocols (HTTP, Telnet, FTP, IRC, and others) run over the SSL/TLS Handshake, Change Cipher Spec, Alert, and Application Data protocols, which rely on the Record Protocol above the TCP transport layer.

978-1-4799-3360-0/14/$31.00 ©2014 IEEE

Figure 2. Message exchange during an SSL/TLS session with a full handshake: Client Hello; Server Hello, Server Certificate*, Server Key Exchange*, Client Certificate Request*, Server Hello Done; Client Certificate*, Client Key Exchange, Certificate Verify*, Change Cipher Spec, Client Finished Message; Change Cipher Spec, Server Finished Message; Application Data; Alert. (* Indicates optional or situation-dependent messages that are not always sent.)

Figure 2 presents an example message exchange between a client and a server during an SSL/TLS session with a full handshake.

The initial exchange of the Client Hello and Server Hello messages establishes the following attributes: Protocol Version, Session ID, Cipher Suite, and Compression Method. The key exchange uses up to four messages: server Certificate, Server Key Exchange, client Certificate, and Client Key Exchange. Then, the client sends Change Cipher Spec, and its subsequent Finished message is encrypted with the new algorithms and keys. In response, the server sends its own Change Cipher Spec message and the Finished message under the new cipher specification, which completes the SSL/TLS handshake; the two parties can then exchange application layer data. In this example, the server terminates the session with an Alert message.

This exchange is only an example: a session can be shortened by resuming a previous session using the Session ID, or it can be significantly modified depending on the server configuration and application requirements. Note that during the SSL/TLS handshake much information is sent as plaintext. However, after the Server Hello Done or Change Cipher Spec protocol message, only the protocol type, the record length, and the SSL/TLS version remain unencrypted.

Below, we use the following compact notation for message types: the decimal protocol type followed, if not encrypted, by the corresponding message type present in the SSL/TLS headers (cf. Figure 3). For instance, we represent the Application Data Protocol as 23: and the Handshake Client Hello message as 22:1.

In our study, we consider only the server-side message types of an SSL/TLS session. Depending on client configurations (e.g., the SSL/TLS protocol settings of Web browsers), we expect slightly different characteristics on the client side, whereas the server-side model should be representative of all networks. Moreover, separating the client-side and server-side models helps tackle the problem of asymmetric routing, when we can only observe traffic in one direction.

Figure 3. SSL/TLS protocol types and their corresponding decimal codes

Decimal Code  Protocol Type
20            Change Cipher Spec
21            Alert
22            Handshake
23            Application Data

Decimal Code  Handshake Message Type
0             Hello Request
1             Client Hello
2             Server Hello
11            Certificate
12            Server Key Exchange
13            Certificate Request
14            Server Hello Done
15            Certificate Verify
16            Client Key Exchange
20            Finished
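The compact notation can be derived from the plaintext record headers alone. The following Python sketch is our own illustration, not the tool used in this study, and it rests on several assumptions: each TCP payload carries whole records (records split across segments are not reassembled), a handshake record is treated as encrypted once the same direction has sent Change Cipher Spec, and several handshake messages packed into one record are written in the 22:(2,11,14) style that appears in the Mozilla fingerprint. The names `segment_state`, `record`, and `hs` are hypothetical.

```python
import struct

CHANGE_CIPHER_SPEC = 20
HANDSHAKE = 22


def segment_state(payload: bytes, encrypted: bool = False) -> tuple[str, bool]:
    """Map one server-to-client TCP payload to a compact state label.

    Returns the label (e.g. "22:11,22:14") and the updated `encrypted` flag.
    Assumes the payload holds only complete SSL/TLS records.
    """
    labels = []
    offset = 0
    while offset + 5 <= len(payload):
        # Record header: content type (1 byte), version (2 bytes), length (2 bytes).
        ctype, _version, length = struct.unpack_from("!BHH", payload, offset)
        body = payload[offset + 5:offset + 5 + length]
        offset += 5 + length
        if ctype == HANDSHAKE and not encrypted:
            # Plaintext handshake: each message starts with type (1 byte) + length (3 bytes).
            types, pos = [], 0
            while pos + 4 <= len(body):
                types.append(body[pos])
                pos += 4 + int.from_bytes(body[pos + 1:pos + 4], "big")
            if len(types) == 1:
                labels.append(f"22:{types[0]}")
            elif types:
                labels.append("22:(" + ",".join(map(str, types)) + ")")
            else:
                labels.append("22:")
        else:
            # Change Cipher Spec, Alert, Application Data, encrypted Handshake, ...
            labels.append(f"{ctype}:")
        if ctype == CHANGE_CIPHER_SPEC:
            encrypted = True  # later handshake records from this side are encrypted
    return ",".join(labels), encrypted


# Usage with synthetic records (version field 0x0303, i.e. TLS 1.2).
def record(ctype, body):
    return struct.pack("!BHH", ctype, 0x0303, len(body)) + body


def hs(msg_type, body=b""):
    return bytes([msg_type]) + len(body).to_bytes(3, "big") + body


segments = [
    record(22, hs(2, b"\x00" * 4)),                        # Server Hello
    record(22, hs(11, b"\x00" * 3)) + record(22, hs(14)),  # Certificate + Server Hello Done
    record(20, b"\x01") + record(22, b"\xaa" * 40),        # Change Cipher Spec + encrypted Finished
    record(23, b"\xbb" * 100),                             # Application Data
]
encrypted, states = False, []
for seg in segments:
    label, encrypted = segment_state(seg, encrypted)
    states.append(label)
print("-".join(states))  # 22:2-22:11,22:14-20:,22:-23:
```

The dash-separated output has the same form as the training sequences used in the PayPal example of Section III.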

III. MARKOV CHAIN FINGERPRINTS

In this section, we propose an approach based on Markov chains to model possible sequences of message types observed in unidirectional SSL/TLS sessions. We chose a first-order homogeneous Markov chain model due to its simplicity.

We consider a discrete-time random variable $X_t$ for any $t = t_0, t_1, \ldots, t_n \in T$. It takes values $i_t \in \{1, \ldots, s\}$, where $i_t$ is either a single SSL/TLS message type (e.g., 22:2) or a sequence of SSL/TLS message types transmitted in a single TCP segment (e.g., 22:11,22:14).

We assume that $X_t$ is a first-order Markov chain [12]:

$$P(X_t = i_t \mid X_{t-1} = i_{t-1}, X_{t-2} = i_{t-2}, \ldots, X_1 = i_1) = P(X_t = i_t \mid X_{t-1} = i_{t-1}). \tag{1}$$

We further assume that the Markov chain is homogeneous, i.e., the probability of a transition from state $i$ at time $t-1$ to state $j$ at time $t$ is time-invariant:

$$P(X_t = i_t \mid X_{t-1} = i_{t-1}) = P(X_t = j \mid X_{t-1} = i) = p_{i-j}, \tag{2}$$

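with the transition matrix [12]:

$$
\begin{bmatrix}
p_{1-1} & p_{1-2} & \cdots & p_{1-s} \\
p_{2-1} & p_{2-2} & \cdots & p_{2-s} \\
\vdots & \vdots & \ddots & \vdots \\
p_{s-1} & p_{s-2} & \cdots & p_{s-s}
\end{bmatrix}, \tag{3}
$$

where $\sum_{j=1}^{s} p_{i-j} = 1$ for every state $i$. We denote by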

$$Q = [q_1, q_2, \ldots, q_s] \tag{4}$$

the ENter Probability Distribution (ENPD), where $q_i = P(X_t = i)$ at time $t_0$, and we define

$$W = [w_1, w_2, \ldots, w_s] \tag{5}$$

as the EXit Probability Distribution (EXPD), where $w_i$ represents the probability that the session finishes when it is in state $i$ at time $t_n$. Note that both probability distributions are independent of the Markov chain: they provide the probabilities to enter and quit the Markov chain. In traditional Markov chain models, there is an initial state and one or several absorbing states. In our case, ENPD defines the probability of entering each state of the Markov chain, and EXPD gives the probability of quitting the Markov chain from any of its states.

Based on these definitions, the probability that a sequence of states $X_1, \ldots, X_T$ representing a single SSL/TLS session occurs is:

$$P(\{X_1, \ldots, X_T\}) = q_{i_1} \times \prod_{t=2}^{T} p_{i_{t-1}-i_t} \times w_{i_T}. \tag{6}$$

The resulting probability indicates how close a given sequence of SSL/TLS message types observed during a session is to the model of an application flow: a larger value means that the SSL/TLS session is closer to the model.
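Eq. (6) is a product of one ENPD entry, one transition probability per pair of consecutive states, and one EXPD entry. The Python sketch below is illustrative only: it assumes a fingerprint is stored as three dictionaries keyed by state labels, and it assigns probability 0 to any state or transition absent from the model (no smoothing). The function name `sequence_probability` and the small model are hypothetical.

```python
def sequence_probability(q: dict[str, float],
                         p: dict[str, dict[str, float]],
                         w: dict[str, float],
                         states: list[str]) -> float:
    """Eq. (6): q[i_1] * prod_{t=2..T} p[i_{t-1}][i_t] * w[i_T].

    q: ENPD, state -> enter probability
    p: transition probabilities, state -> {next state: probability}
    w: EXPD, state -> exit probability
    Unknown states or transitions contribute a factor of 0.
    """
    if not states:
        return 0.0
    prob = q.get(states[0], 0.0)
    for prev, nxt in zip(states, states[1:]):
        prob *= p.get(prev, {}).get(nxt, 0.0)
    return prob * w.get(states[-1], 0.0)


# Hypothetical model, for illustration only.
q = {"22:2": 0.8, "22:2,20:,22:": 0.2}
p = {
    "22:2": {"20:,22:": 1.0},
    "20:,22:": {"23:": 1.0},
    "22:2,20:,22:": {"23:": 1.0},
    "23:": {"23:": 0.6, "21:": 0.4},
}
w = {"23:": 0.7, "21:": 0.3}

print(sequence_probability(q, p, w, ["22:2", "20:,22:", "23:", "21:"]))  # about 0.096
print(sequence_probability(q, p, w, ["22:2", "23:"]))                    # 0.0 (unseen transition)
```

A single unseen transition drives the probability to 0, which is what makes a fingerprint reject sessions whose message sequences it has never observed.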

To illustrate the process of fingerprint creation, consider the following message sequences observed during SSL/TLS sessions in a training dataset composed of only three server-side SSL/TLS flows of PayPal application traffic (a comma separates message types carried in the same TCP segment, and a dash separates consecutive states):

22:2-22:11,22:14-20:,22:-23:
22:2,20:,22:-23:
22:2-22:11,22:14-20:,22:-23:-23:-21:

There are 6 different Markov states in the example. The transition probabilities between states are derived from the frequencies observed in the sequences, e.g., $p_{\text{22:2}-\text{22:11,22:14}} = 1$, while $p_{\text{23:}-\text{23:}} = 0.5$. The ENPD vector is composed of two non-zero elements, namely $q_{\text{22:2}} = 0.67$ and $q_{\text{22:2,20:,22:}} = 0.33$, whereas the EXPD vector also contains two non-zero elements, $w_{\text{23:}} = 0.67$ and $w_{\text{21:}} = 0.33$. These probabilities are the parameters of the Markov chain fingerprint for the PayPal traffic. Based on the model, we can compute the probability that an observed SSL/TLS session conveys PayPal application traffic (cf. Eq. 6). The probability that the following sequence of SSL/TLS message types:

22:2,20:,22:-23:-23:-23:

is a PayPal flow is equal to $P(\{X_1, \ldots, X_4\}) = 0.055$. In comparison, the probability computed from the Twitter model (cf. Figure 5) is equal to 0.003%, whereas the probability computed from the Skype fingerprint (cf. Figure 9) is equal to 0.

Figure 4. Parameters of the fingerprint for PayPal (diagram of enter, transition, and exit probabilities)
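The estimation step can be reproduced on the three PayPal flows above. The Python sketch below assumes that transition probabilities are row-normalized transition counts and that ENPD and EXPD are the fractions of training flows starting and ending in each state; under these assumptions it yields exactly the parameters quoted in the text. `train_fingerprint` and `sequence_probability` are hypothetical helper names.

```python
from collections import Counter, defaultdict


def train_fingerprint(flows):
    """Estimate ENPD (q), transition probabilities (p), and EXPD (w) from flows.

    Each flow is a string of states separated by '-'; commas stay inside a state.
    """
    seqs = [flow.split("-") for flow in flows]
    n = len(seqs)
    starts = Counter(s[0] for s in seqs)
    ends = Counter(s[-1] for s in seqs)
    counts = defaultdict(Counter)
    for s in seqs:
        for prev, nxt in zip(s, s[1:]):
            counts[prev][nxt] += 1
    q = {st: c / n for st, c in starts.items()}
    w = {st: c / n for st, c in ends.items()}
    # Row-normalise the transition counts so that each row sums to 1.
    p = {prev: {nxt: c / sum(row.values()) for nxt, c in row.items()}
         for prev, row in counts.items()}
    states = {st for s in seqs for st in s}
    return q, p, w, states


def sequence_probability(q, p, w, states):
    """Eq. (6); unseen states or transitions yield 0."""
    prob = q.get(states[0], 0.0)
    for prev, nxt in zip(states, states[1:]):
        prob *= p.get(prev, {}).get(nxt, 0.0)
    return prob * w.get(states[-1], 0.0)


paypal_flows = [
    "22:2-22:11,22:14-20:,22:-23:",
    "22:2,20:,22:-23:",
    "22:2-22:11,22:14-20:,22:-23:-23:-21:",
]
q, p, w, states = train_fingerprint(paypal_flows)
print(len(states))               # 6
print(p["22:2"]["22:11,22:14"])  # 1.0
print(p["23:"]["23:"])           # 0.5
prob = sequence_probability(q, p, w, "22:2,20:,22:-23:-23:-23:".split("-"))
print(prob)                      # about 0.0556, i.e. 1/18
```

Here the test sequence scores 1/3 × 1 × 0.5 × 0.5 × 2/3 = 1/18 ≈ 0.056, which appears to be the value 0.055 given above, truncated to three decimals.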

A. Examples of Fingerprints

Figures 4-9 illustrate the fingerprints derived for the chosen application traffic: they represent ENPD, the transition probability matrix, and EXPD. The diagrams are simplified for clarity by including only the states with meaningful probabilities (full models usually contain a large number of states). For this reason, in some cases, ENPD and EXPD do not sum to 100%. Due to space limitations, we only provide six models out of the twelve analyzed applications: PayPal, Twitter, Dropbox, Mozilla, Gadu-Gadu, and Skype. The model parameters are derived from a representative dataset (cf. Section IV-A for the description of the Campus2 dataset). For Skype, the measured data comes from the traffic dataset recorded for Skype service flow classification [13].

1) PayPal: Figure 4 shows that 92.8% of all PayPal sessions start with the Server Hello message, whereas 7.1% start with Alert messages indicating a handshake failure and closing the session even before the authentication process. Moreover, in successfully established sessions, the server always sends Change Cipher Spec, indicating modifications in ciphering strategies. In 66.74% of sessions, the client authenticates the server, whereas in the remaining cases, the session is resumed using the Session ID (the handshake procedure is shortened to the Server Hello message). Finally, from the Exit Probability Distribution, we can conclude that in most cases, successful sessions do not end with an Alert message coming from the server.

2) Twitter: Figure 5 indicates that 55% of SSL/TLS sessions are resumed from previously negotiated ones. In contrast to PayPal, almost 70% of the remaining sessions do not change the ciphering strategy after the server authentication procedure (no Change Cipher Spec message). Moreover, sessions tend to be rather short and are composed of an Application Data message followed by the Alert message (with probability 58.6%) terminating the session.

Figure 5. Parameters of the fingerprint for Twitter (diagram of enter, transition, and exit probabilities)

Figure 6. Parameters of the fingerprint for Dropbox (diagram of enter, transition, and exit probabilities)

3) Dropbox: In contrast to PayPal and Twitter, the great majority of Dropbox sessions (98.3%) do not resume previous SSL/TLS sessions, as shown in Figure 6. Furthermore, we can observe the Server Key Exchange message, which complements the Certificate message with additional cryptographic information allowing the client to communicate the premaster secret. Sessions are composed of multiple Application Data messages, which reflects the specific nature of the application. Again, the majority of sessions (84.7%) are terminated by the Alert message. Although sessions are highly consistent and message sequences often repeat, we can observe quite a few unusual states signaled by the Alert Protocol.

Figure 7. Parameters of the fingerprint for Mozilla (diagram of enter, transition, and exit probabilities)

4) Mozilla: In Figure 7, we can observe a rare initial state for Mozilla, a so-called multiple handshake message, in which we have identified three SSL/TLS handshake messages in a single TCP segment, namely Server Hello, Certificate, and Server Hello Done, depicted as 22:(2,11,14). Also, the number of significant states is limited to five.

Figure 8. Parameters of the fingerprint for Gadu-Gadu (diagram of enter, transition, and exit probabilities)

5) Gadu-Gadu: As we can observe in Figure 8, Gadu-Gadu presents three ways of establishing a session. The primary one (64.5%) consists of a typical Server Hello message followed by the Certificate and Server Hello Done messages. In 95.7% of cases, the Change Cipher Spec message comes next. The second SSL/TLS handshake procedure additionally includes the Server Key Exchange message. Finally, 15.9% of sessions are resumed. The figure suggests that, on average, Gadu-Gadu sessions consist of a significantly larger number of messages than PayPal, Mozilla, or Twitter sessions. Moreover, we observe individual segments composed of multiple Application Data messages, which presumably implies that application layer messages are relatively short; this feature distinguishes Gadu-Gadu from the previous cases.

Figure 9. Parameters of the fingerprint for Skype (diagram of enter, transition, and exit probabilities)

6) Skype: Finally, we present the Markov chain analysis of Skype traffic tunneled through SSL/TLS (cf. Figure 9). Skype traffic represents a special case: the Markov state space contains only six states. In addition to the four standard SSL/TLS protocol type messages, we observe a unique state interpreted by Wireshark as a Heartbeat Protocol message defined in RFC 6520 [14], depicted as 24:. Briefly, the Heartbeat Extension provides a new protocol for TLS that allows keep-alive functionality without performing a session renegotiation. In this case, however, all five protocols act as application data protocols and directly provide application layer data to the Record Protocol. After the transition from the initial state to one of the five remaining states with probability 70.6%, the transition probabilities between each of them are very similar, ranging from 18.9% to 21.2% (not depicted in the figure for the sake of clarity).

Skype is proprietary software that uses its own internal encryption mechanisms and a complex connection protocol designed to bypass firewalls and establish communication regardless of network policies [13]. Skype randomly selects ports and can switch to port 443 if it fails to establish a connection on the chosen ports. This technique is sufficient to bypass network-layer firewalls; however, it results in a particular SSL/TLS session and a salient Markov fingerprint.

B. Cross-validation

To validate the constructed models, we apply 4-fold cross-validation based on the four heterogeneous datasets described in Section IV-A. More precisely, we create Markov chains and compute the ENPD and EXPD vectors for application traffic based on one dataset (the training set) and validate the analysis on the remaining three datasets (the testing sets).

We perform validation as follows. First, we pre-process the testing set to extract application flows, and then the classifier applies a decision process based on the Maximum Likelihood criterion [15]. Validation based on the models obtained in the training process corresponds to a multi-hypothesis decision problem. More specifically, we consider up to twelve hypotheses $H_i$, $i = 1, \ldots, 12$, corresponding to each of the considered applications. We apply a classical approach based on the Maximum Likelihood criterion: we select the hypothesis under which the data sequence $Y$ is most likely:

$$H = \arg\max_{H_i} \log L(\{Y_1, \ldots, Y_T\} \mid H_i), \tag{7}$$

where $L(\{Y_1, \ldots, Y_T\} \mid H_i)$ is the likelihood of the input data sequence under hypothesis $H_i$: $L(\{Y_1, \ldots, Y_T\} \mid H_i) \equiv P(\{X_1, \ldots, X_T\})$, the probability of the message sequence computed with the fingerprint of application $i$ (cf. Eq. 6).
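Eq. (7) amounts to scoring a session against every fingerprint and keeping the best one. The Python sketch below assumes each fingerprint is stored as a triple (ENPD, transition probabilities, EXPD) of dictionaries keyed by state labels. It works with log-probabilities to avoid numerical underflow on long sessions and maps impossible sequences to minus infinity; returning `None` when no fingerprint can explain a session is our own convention, not something specified above. `log_likelihood`, `classify`, and the two application models are hypothetical.

```python
import math


def log_likelihood(fp, states):
    """Logarithm of Eq. (6) for one session; -inf if the model cannot produce it."""
    q, p, w = fp
    factors = [q.get(states[0], 0.0)]
    factors += [p.get(a, {}).get(b, 0.0) for a, b in zip(states, states[1:])]
    factors.append(w.get(states[-1], 0.0))
    if min(factors) == 0.0:
        return -math.inf
    return sum(math.log(f) for f in factors)


def classify(fingerprints, states):
    """Eq. (7): the application whose fingerprint maximises the log-likelihood.

    Returns None when every fingerprint gives probability 0.
    """
    best_app, best_ll = None, -math.inf
    for app, fp in fingerprints.items():
        ll = log_likelihood(fp, states)
        if ll > best_ll:
            best_app, best_ll = app, ll
    return best_app


# Two hypothetical fingerprints, each an (ENPD, transitions, EXPD) triple.
fingerprints = {
    "AppA": ({"22:2": 1.0},
             {"22:2": {"23:": 1.0}, "23:": {"23:": 0.9, "21:": 0.1}},
             {"23:": 0.2, "21:": 0.8}),
    "AppB": ({"22:2": 0.5, "21:": 0.5},
             {"22:2": {"23:": 1.0}, "23:": {"23:": 0.2, "21:": 0.8}},
             {"23:": 0.5, "21:": 0.5}),
}
print(classify(fingerprints, ["22:2", "23:", "23:", "23:"]))  # AppA
print(classify(fingerprints, ["22:2", "23:", "21:"]))         # AppB
print(classify(fingerprints, ["24:"]))                        # None
```

Since the logarithm is monotonic, maximizing the log-likelihood selects the same hypothesis as maximizing the probability of Eq. (6) directly.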

IV. CLASSIFICATION RESULTS

In this section, we present the results of cross-validation of the fingerprints based on four trace datasets.

A. Datasets

To build and validate fingerprints, we have used four recent heterogeneous datasets gathered on edge routers located in a European country. The Campus1 and Campus2 datasets come from two links connecting a large campus network to the Internet. The Campus1 dataset contains a one-day trace starting on March 1, 2012, whereas the 24-hour Campus2 dataset was collected starting on March 26, 2012. The datasets labeled Campus3 and Campus4 consist of data observed in a different part of the campus network than the previous ones and were collected starting on July 17 and July 21, 2013, lasting one day and 42 hours, respectively. All datasets contain only SSL/TLS encrypted traffic generated by standard services such as Web, chat, mail, VoIP, file transfer, or streaming applications. Moreover, we often refer to Skype as an example of traffic tunneled through SSL/TLS. For Skype, the evaluation uses two sets of packet traces generated in experiments on classifying Skype service flows [13]. We merged the two Skype datasets with the Campus1 and Campus2 packet traces. For privacy reasons, we analyze the TCP payload and export only the information from SSL/TLS headers, while the actual payload is discarded.

B. Ground Truth

To establish the ground truth, we have developed a Domain Name System Classifier (DNSC) that extracts SSL/TLS application flows according to their domain names. More specifically, DNSC matches hostnames against an array of signature strings, such as twitter or r-*twttr in the case of Twitter. Table I summarizes the domains used by DNSC for identifying the different SSL/TLS applications. We have confirmed the accuracy of the method by manual payload inspection. Nevertheless, we might not cover all instances of signatures for a particular application. Another constraint of the approach is that we cannot obtain instances of an application if we are not able to resolve its IP addresses into the corresponding domain names. For example, we cannot extract Skype flows using DNSC because the application relies on a Peer-to-Peer (P2P) infrastructure and the traffic is relayed by ordinary hosts.

Table I
DOMAIN NAMES USED BY DNSC. IRRELEVANT STRINGS ARE REPLACED BY AN ASTERISK.

Application   Strings
PayPal        *active*paypal*
Twitter       *twitter.com, r-*twttr
Dropbox       *dropbox*
Gadu-Gadu     ip*gadu-gadu.pl
Mozilla       *versioncheck*mozilla*
MBank         www.mbank*
PKO           ipko*
Dziekanat     dziekanat.agh.edu.pl
Poczta        poczta.agh.edu.pl
Amazon S3     s3*amazon*
Amazon EC2    ec2*amazon*
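A DNSC-style matcher can be sketched with the signature strings of Table I treated as shell-style wildcards. The Python code below relies on assumptions of our own: a signature may match anywhere in the hostname (so it is wrapped in `*` before matching), hostnames are compared in lowercase, and the first matching application wins. How the actual DNSC resolves overlapping signatures is not described above; `dnsc_label` and the example hostnames are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Signature strings from Table I ("*" stands for an irrelevant substring).
SIGNATURES = {
    "PayPal": ["*active*paypal*"],
    "Twitter": ["*twitter.com", "r-*twttr"],
    "Dropbox": ["*dropbox*"],
    "Gadu-Gadu": ["ip*gadu-gadu.pl"],
    "Mozilla": ["*versioncheck*mozilla*"],
    "MBank": ["www.mbank*"],
    "PKO": ["ipko*"],
    "Dziekanat": ["dziekanat.agh.edu.pl"],
    "Poczta": ["poczta.agh.edu.pl"],
    "Amazon S3": ["s3*amazon*"],
    "Amazon EC2": ["ec2*amazon*"],
}


def dnsc_label(hostname: str) -> Optional[str]:
    """Return the first application whose signature matches `hostname`, or None."""
    host = hostname.lower()
    for app, patterns in SIGNATURES.items():
        # Assumption: a signature may occur anywhere in the hostname.
        if any(fnmatchcase(host, f"*{pattern}*") for pattern in patterns):
            return app
    return None


print(dnsc_label("api.twitter.com"))                       # Twitter
print(dnsc_label("ec2-54-1-2-3.compute-1.amazonaws.com"))  # Amazon EC2
print(dnsc_label("example.org"))                           # None
```

Flows whose server address cannot be mapped to a hostname, such as relayed Skype traffic, simply never reach this matching step.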

C. Application Selection

In our experimental evaluation, to overcome the limitations of the DNSC classifier, we have selected the applications for which IP address resolution was possible and the corresponding strings are straightforward and unambiguous.

Table II presents the number of flows derived from the training datasets for the purpose of cross-validation of fingerprints. To estimate the minimal number of flows required to create a reliable application fingerprint, we have applied the following procedure. We start with a model based on a randomly chosen flow and build the state space and the transition matrix corresponding to a first-order homogeneous Markov chain. Then, we enrich the model by randomly including flows one by one and observe the stability of the model. When the number of states, the transitions, and the transition probabilities no longer change significantly as the model is enriched, the fingerprint can be included in the validation process. Depending on the application, the minimal number of required flows may vary. However, even if the number of flows for some application is not sufficient to create a model, e.g., Amazon S3 or EC2 in Campus1 or Campus2, these flows can still serve to validate fingerprints built upon other datasets.

Table II
NUMBER OF FLOWS CORRESPONDING TO EACH APPLICATION IN FOUR DATASETS.

Application   Campus1  Campus2  Campus3  Campus4
PayPal        546      421      –        –
Twitter       1257     1500     8848     10308
Dropbox       1160     3134     4714     5253
Gadu-Gadu     659      807      1318     1779
Mozilla       1017     1076     2431     1567
MBank         644      94       675      459
PKO           354      420      1574     1137
Dziekanat     1162     1706     2655     609
Poczta        680      944      944      4420
Amazon S3     238      321      1587     1310
Amazon EC2    109      314      5798     610
Skype         210      207      –        –

Table III
CLASSIFICATION RESULTS FOR THE FINGERPRINTS. TRAINING DATASET: CAMPUS1, VALIDATION DATASETS: CAMPUS2, CAMPUS3, CAMPUS4

              Campus2         Campus3         Campus4
Application   TPR     FPR     TPR     FPR     TPR     FPR
PayPal        0.76    0.007   –       –       –       –
Twitter       0.932   0.013   0.768   0.029   0.791   0.023
Dropbox       0.922   0.001   0.971   0.007   0.957   0.009
Gadu-Gadu     0.865   0.001   0.535   0.053   0.59    0.063
Mozilla       0.998   0.0     0.0     0.0     0.0     0.0
MBank         0.67    0.03    0.008   0.025   0.0     0.01
PKO           0.957   0.016   0.916   0.164   0.92    0.12
Dziekanat     0.807   0.005   0.83    0.0     0.805   0.001
Poczta        0.976   0.025   0.97    0.008   0.97    0.02
Skype         0.986   0.002   –       –       –       –
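The stopping rule above leaves "significantly" open. The following Python sketch shows one possible reading, with hypothetical thresholds: flows are added in random order, the transition probabilities are re-estimated from counts after each addition, and the model is declared stable once `patience` consecutive additions introduce no new state or transition and move no transition probability by more than `tol`. The names `flows_until_stable`, `tol`, and `patience`, and their default values, are illustrative and not parameters of this study.

```python
import random
from collections import Counter, defaultdict


def transition_probs(seqs):
    """Row-normalised transition frequencies: {(state, next_state): probability}."""
    counts = defaultdict(Counter)
    for seq in seqs:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {(a, b): c / sum(row.values())
            for a, row in counts.items() for b, c in row.items()}


def flows_until_stable(flows, tol=0.01, patience=20, seed=0):
    """Number of randomly ordered flows added before the model stops changing.

    Hypothetical stability rule: for `patience` consecutive additions, no new
    state or transition appears and no transition probability moves by more
    than `tol`. Returns None if the flows run out before that happens.
    """
    order = list(flows)
    random.Random(seed).shuffle(order)
    seqs, prev_probs, prev_states, calm = [], {}, set(), 0
    for n, flow in enumerate(order, start=1):
        seq = flow.split("-")
        seqs.append(seq)
        states = prev_states | set(seq)
        probs = transition_probs(seqs)  # re-estimated from scratch for clarity
        same_structure = states == prev_states and probs.keys() == prev_probs.keys()
        shift = max((abs(probs[k] - prev_probs[k]) for k in probs if k in prev_probs),
                    default=0.0)
        calm = calm + 1 if same_structure and shift <= tol else 0
        if calm >= patience:
            return n
        prev_probs, prev_states = probs, states
    return None


print(flows_until_stable(["22:2-20:,22:-23:"] * 50, patience=5))  # 6
print(flows_until_stable(["22:2-23:", "22:2-21:"]))               # None
```

Re-estimating from scratch keeps the sketch short; an incremental count update would be preferable for large traces.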

Table IV
CLASSIFICATION RESULTS FOR THE FINGERPRINTS. TRAINING DATASET: CAMPUS2, VALIDATION DATASETS: CAMPUS1, CAMPUS3, CAMPUS4

              Campus1         Campus3         Campus4
Application   TPR     FPR     TPR     FPR     TPR     FPR
PayPal        0.749   0.007   –       –       –       –
Twitter       0.847   0.003   0.438   0.012   0.528   0.007
Dropbox       0.984   0.009   0.987   0.056   0.985   0.06
Gadu-Gadu     0.975   0.015   0.521   0.146   0.569   0.153
Mozilla       0.988   0.0     0.0     0.0     0.0     0.0
PKO           0.901   0.027   0.889   0.141   0.898   0.098
Dziekanat     0.997   0.002   1.0     0.005   0.995   0.006
Poczta        0.961   0.003   0.968   0.009   0.969   0.019
Skype         0.986   0.001   –       –       –       –

D. Criteria of Cross-validation

We assume that the classification based on the DNSC reference classifier provides a reliable benchmark, and we validate the SSL/TLS models with respect to its classification decisions. We consider two meaningful metrics to assess the performance of the classification method: the true positive (TP) and false positive (FP) rates (denoted TPR and FPR, respectively). A true positive occurs when the validation result is consistent with the classification decision taken by DNSC and the application session is correctly classified as the given application, e.g., a PayPal session is accurately recognized as PayPal. Conversely, a false positive occurs when the validation result is inconsistent with the decision taken by the reference classifier and a session is incorrectly classified, e.g., a Twitter session is falsely recognized as PayPal.
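The rates are not formalized above. The following Python sketch assumes the usual per-application definitions: for an application X, TPR is the fraction of sessions labeled X by DNSC that the classifier also labels X, and FPR is the fraction of sessions with any other DNSC label that the classifier labels X. The function name `per_class_rates` and the nine labeled sessions are hypothetical.

```python
def per_class_rates(truth, predicted, app):
    """Return (TPR, FPR) for `app`, treating the DNSC labels in `truth` as ground truth."""
    positives = [p for t, p in zip(truth, predicted) if t == app]
    negatives = [p for t, p in zip(truth, predicted) if t != app]
    tpr = sum(p == app for p in positives) / len(positives) if positives else float("nan")
    fpr = sum(p == app for p in negatives) / len(negatives) if negatives else float("nan")
    return tpr, fpr


# Hypothetical labels for nine sessions.
truth = ["PayPal"] * 4 + ["Twitter"] * 5
predicted = ["PayPal", "PayPal", "PayPal", "Twitter",
             "Twitter", "Twitter", "PayPal", "Twitter", "Twitter"]
print(per_class_rates(truth, predicted, "PayPal"))   # (0.75, 0.2)
print(per_class_rates(truth, predicted, "Twitter"))  # (0.8, 0.25)
```

Under these definitions, one PayPal session misread as Twitter lowers the PayPal TPR and, at the same time, raises the Twitter FPR.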

E. Cross-validation Results

Section III-A has shown that we can observe a great variety of SSL/TLS message exchanges. The parameters of the derived fingerprints differ considerably, which is the basis for accurate application discrimination. In this section, we report the 4-fold cross-validation results of the application models, in which the Campus1-4 datasets alternately served for training and validation (cf. Tables III-VI).

Let us take the example of Amazon S3, for which we have observed that the TP rate exceeds 97% regardless of the datasets used for training and validation (cf. Tables V and VI). By manual inspection, we have found that Amazon S3 sessions contain the multiple handshake message state observed previously in the Mozilla models built from the packet traces collected in March 2012 (cf. Figure 7). This is also why we experience a higher FP rate when validating the Amazon S3 models on the Campus1 and Campus2 datasets.

Table V
CLASSIFICATION RESULTS FOR THE FINGERPRINTS. TRAINING DATASET: CAMPUS3, VALIDATION DATASETS: CAMPUS1, CAMPUS2, CAMPUS4

              Campus1         Campus2         Campus4
Application   TPR     FPR     TPR     FPR     TPR     FPR
Twitter       0.932   0.007   0.908   0.04    0.907   0.018
Dropbox       0.692   0.01    0.704   0.005   0.922   0.01
Gadu-Gadu     0.97    0.004   0.916   0.004   0.781   0.007
Mozilla       0.001   0.028   0.0     0.023   0.405   0.041
MBank         0.02    0.006   0.0     0.007   0.817   0.018
PKO           0.597   0.005   0.537   0.004   0.595   0.035
Dziekanat     0.966   0.093   0.933   0.012   0.988   0.0
Poczta        0.942   0.002   0.967   0.004   0.97    0.0
Amazon S3     0.978   0.146   0.991   0.111   0.996   0.0
Amazon EC2    0.02    0.007   0.035   0.084   0.579   0.013

To better understand the results for Dropbox, let us consider its architecture and operation. The control servers and the data storage servers are the two major components of its architecture [16]. While the former are controlled by Dropbox Inc., the latter are hosted on Amazon S3 and EC2 servers. As expected, in some cases Dropbox flows are incorrectly classified as Amazon EC2, resulting in a lower TPR (cf. Tables V and VI). However, the classification based on models built upon the most recent flows from the Campus3 and Campus4 datasets leads to a TPR higher than 90%. The accurate fingerprint comes from the nature of Dropbox sessions: they exchange a lot of data after the handshake process, which results in long SSL/TLS sessions composed of multiple Application Data protocol messages.

Table VI
CLASSIFICATION RESULTS FOR THE FINGERPRINTS. TRAINING DATASET: CAMPUS4, VALIDATION DATASETS: CAMPUS1, CAMPUS2, CAMPUS3

              Campus1         Campus2         Campus3
Application   TPR     FPR     TPR     FPR     TPR     FPR
Twitter       0.936   0.01    0.911   0.041   0.887   0.026
Dropbox       0.672   0.005   0.7     0.005   0.919   0.06
Gadu-Gadu     0.975   0.011   0.929   0.01    0.684   0.013
Mozilla       0.001   0.029   0.0     0.023   0.29    0.035
MBank         0.0     0.013   0.0     0.01    0.903   0.037
PKO           0.521   0.005   0.489   0.003   0.575   0.032
Dziekanat     0.959   0.092   0.929   0.012   0.994   0.0
Poczta        0.924   0.002   0.935   0.003   0.97    0.0
Amazon S3     0.982   0.142   0.99    0.112   0.994   0.0
Amazon EC2    0.146   0.044   0.001   0.079   0.598   0.01

We have found a reliable fingerprint for the Poczta application in all four datasets. The most commonly observed state in its SSL/TLS exchanges is composed of four Application Data protocol messages merged together: 23:,23:,23:,23:.

Skype traffic tunneled over the SSL/TLS protocol results in a unique fingerprint. The number of transitions depends on the service, such as voice calls, chat, SkypeOut, or file sharing, and on the amount of data to be sent. Every few to tens of seconds, Skype encapsulates a huge portion of data (compared to typical values), ranging from 3 KB up to 65 KB, in one of its 6 SSL/TLS protocol types. Such an SSL/TLS record is further divided into multiple TCP segments and sent across the network. This behavior is consistent with the real-time nature of Skype: creating multiple SSL/TLS messages could increase the processing time and degrade the quality of experience.

Figure 10. Parameters of the fingerprint for Twitter (diagram of enter, transition, and exit probabilities)

Although the proposed methodology yields very reliable application fingerprints, it may fail in some cases. For instance, we have identified a specific Markov chain instance in the fingerprint for Amazon EC2: it is composed of 5 states, and more than 53% of all Amazon EC2 SSL/TLS sessions generate this same transition chain. The drawback is that the probabilities of all other Amazon EC2 chain instances are very low, which results in a false negative rate of at least 40% (cf. Tables V and VI).
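The Amazon EC2 case can be illustrated with a toy model. Assuming, as the enter/transition/exit structure of the fingerprints suggests, that a session is scored by the product of its enter probability, the transition probabilities along its state sequence, and its exit probability, the sketch below builds a hypothetical fingerprint (not the actual Amazon EC2 parameters) in which one chain carries 54% of the probability mass. With a hypothetical acceptance threshold of 0.05 on the session probability, the dominant chain is accepted, whereas a longer but legitimate chain of the same application is rejected and thus counted as a false negative.

```python
import math
from typing import Dict, List, Tuple

# (enter, transition, exit) probabilities; every value here is hypothetical.
Fingerprint = Tuple[Dict[str, float], Dict[Tuple[str, str], float], Dict[str, float]]

EC2_LIKE: Fingerprint = (
    {"22:2": 1.0},                       # enter probabilities
    {("22:2", "22:11,22:14"): 1.0,       # transition probabilities
     ("22:11,22:14", "20:,22:"): 1.0,
     ("20:,22:", "23:"): 0.9,
     ("20:,22:", "21:"): 0.1,
     ("23:", "23:"): 0.4},
    {"23:": 0.6, "21:": 1.0},            # exit probabilities
)

def log_likelihood(states: List[str], fp: Fingerprint) -> float:
    """log(enter(s1) * prod P(s_i -> s_i+1) * exit(s_n)); -inf if any factor is 0."""
    enter, trans, exit_ = fp
    factors = [enter.get(states[0], 0.0), exit_.get(states[-1], 0.0)]
    factors += [trans.get(pair, 0.0) for pair in zip(states, states[1:])]
    if min(factors) == 0.0:
        return -math.inf
    return sum(math.log(f) for f in factors)

THRESHOLD = math.log(0.05)  # hypothetical acceptance threshold

def accepted(states: List[str]) -> bool:
    return log_likelihood(states, EC2_LIKE) >= THRESHOLD

HANDSHAKE = ["22:2", "22:11,22:14", "20:,22:"]
DOMINANT = HANDSHAKE + ["23:"]      # probability 0.9 * 0.6 = 0.54
RARE = HANDSHAKE + ["23:"] * 4      # probability 0.9 * 0.4**3 * 0.6 = 0.03456
print(accepted(DOMINANT), accepted(RARE))  # True False
```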

F. Time Evolution of Markov Fingerprints

We have run our tests twice over the last year and a half to study the evolution of the SSL/TLS fingerprints. By analyzing the Markov chains, we can also extract meaningful information and possibly evaluate changes in the cryptographic practices of application servers. For example, we can clearly observe that the fingerprints generated for Twitter and Gadu-Gadu in March 2012 (datasets Campus1 and Campus2) yield a lower TPR when validated on the two recent datasets, Campus3 and Campus4. Moreover, classification based on the Mozilla and MBank fingerprints failed because of implementation changes between the two observation periods.

Let us focus on the Twitter application. Figure 10 presents the fingerprint based on one of the two most recent datasets, namely Campus3. To emphasize the differences between the older model (cf. Figure 5) and the recent one, we have thickened the new states and transitions. Compared with the fingerprint based on traffic collected one year earlier, the only change we notice is in the SSL/TLS segmentation. More specifically, a new state appears in the ENDP vector, created by merging two neighboring states of the older model. In this case, enriching the model with recent application flows can significantly improve classification performance.
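Enriching a model with recent flows amounts to re-estimating its parameters. For a first-order Markov chain, the maximum likelihood estimates are relative frequencies: the enter probability of a state is the share of sessions that start in it, and the transition and exit probabilities of a state are the shares of its visits followed by a given next state or by the end of the session. The sketch below uses a hypothetical `fit_fingerprint` helper and toy sessions (not data from our datasets) to refit a model on older sessions plus one recent session in which two neighboring states are merged.

```python
from collections import Counter
from typing import Dict, List, Tuple

def fit_fingerprint(
    sessions: List[List[str]],
) -> Tuple[Dict[str, float], Dict[Tuple[str, str], float], Dict[str, float]]:
    """Maximum-likelihood enter, transition, and exit probabilities (hypothetical helper)."""
    n = len(sessions)
    starts = Counter(s[0] for s in sessions)
    ends = Counter(s[-1] for s in sessions)
    pairs = Counter(pair for s in sessions for pair in zip(s, s[1:]))
    # Every visit to a state is followed either by a transition or by the session end,
    # so transition and exit probabilities of each state sum to 1.
    visits = Counter(state for s in sessions for state in s)
    enter_p = {st: c / n for st, c in starts.items()}
    trans_p = {(a, b): c / visits[a] for (a, b), c in pairs.items()}
    exit_p = {st: c / visits[st] for st, c in ends.items()}
    return enter_p, trans_p, exit_p

# Toy data: three older sessions and one recent session in which the
# first two states are merged into a single state.
old = [["22:2", "22:11,22:14", "20:,22:", "23:"]] * 3
recent = [["22:2,22:11,22:14", "20:,22:", "23:"]]
enter_p, trans_p, exit_p = fit_fingerprint(old + recent)
print(enter_p)  # {'22:2': 0.75, '22:2,22:11,22:14': 0.25}
```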

[Diagram: enter probabilities, transition probabilities, and exit probabilities between SSL/TLS states]

Figure 11. Parameters of the fingerprint for Gadu-Gadu

Figure 11 presents a recent fingerprint for Gadu-Gadu. Compared with the older model in Figure 8, the first transition, from a state composed of the Server Hello and Certificate messages to a state composed of the Server Key Exchange and Server Hello Done messages, is replaced by a single initial state composed of the Server Hello, Certificate, and Server Hello Done messages. In other words, in the recent Gadu-Gadu fingerprints, we have observed neither the Server Key Exchange message nor the associated ephemeral Diffie-Hellman (DHE) key exchange algorithm. This means either that the application has changed its key exchange methods and no longer supports the DHE algorithm or, less likely, that all observed clients have used the RSA, DH_DSS, or DH_RSA key exchange methods, for which the Server Key Exchange message is not allowed [10].
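Such an observation can be checked automatically. Assuming sessions have already been converted into state labels in the notation of the figures, a Server Key Exchange message appears as `22:12` (Handshake record, handshake type 12 in RFC 5246). The sketch below uses hypothetical function names and toy sessions shaped like the older and the recent Gadu-Gadu models to compute the fraction of sessions that contain this message.

```python
from typing import List

SERVER_KEY_EXCHANGE = "22:12"  # Handshake (22), server_key_exchange (12)

def has_server_key_exchange(states: List[str]) -> bool:
    """True if any state of the session contains a Server Key Exchange message.
    Splitting on commas avoids false matches such as '22:1' inside '22:12'."""
    return any(SERVER_KEY_EXCHANGE in state.split(",") for state in states)

def ske_fraction(sessions: List[List[str]]) -> float:
    """Fraction of sessions containing a Server Key Exchange message."""
    if not sessions:
        return 0.0
    return sum(has_server_key_exchange(s) for s in sessions) / len(sessions)

# Toy sessions shaped like the older and the recent models (hypothetical).
older = [["22:2,22:11", "22:12,22:14", "20:,22:", "23:"]]
recent = [["22:2,22:11,22:14", "20:,22:", "23:"]]
print(ske_fraction(older), ske_fraction(recent))  # 1.0 0.0
```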

[Diagram: enter probabilities, transition probabilities, and exit probabilities between SSL/TLS states]

Figure 12. Parameters of the fingerprint for Mozilla

Finally, as we expected from the cross-validation results, the SSL/TLS security implementation of a simple “service version check” maintained by Mozilla has changed since last year. As a result, the previously obtained, very reliable fingerprint is no longer valid (for a detailed comparison, please refer to Figures 7 and 12). Moreover, while the fingerprints built upon the most recent datasets, namely Campus3 and Campus4, are consistent with one another, the cross-validation results are poor because the resulting fingerprints correspond to typical patterns widely observed on the Internet. To conclude, application fingerprints may evolve over time and need periodic or even continuous updates.

V. DISCUSSION

Below, we discuss the reasons why our method discriminates applications precisely.

Incorrect/diverse implementation practices: First, we have noticed that many protocol implementations do not follow the RFC specifications and behave slightly differently from common SSL/TLS stacks. For example, in the Campus1 and Campus2 datasets, we have observed that PayPal does not support the TLS 1.0 extensions necessary for extensibility and security [17], [18]: the implementation simply rejects Client Hello messages that contain extensions. This is why, unlike other applications, 7.1% of all PayPal sessions start (and terminate) with Alert messages. A recent Internet draft discusses other incorrect practices of HTTP implementations over TLS [19]. A counterexample is a recent verified reference implementation of TLS 1.2 that supports all protocol functions, as prescribed in the RFCs [20]. Moreover, an increasing number of protocol extensions are available to expand the functionality of the TLS protocol (e.g., the Heartbeat extension [14]), which results in highly differentiated implementation practices.

Misuse of the SSL/TLS protocol: SSL/TLS tunneling is increasingly used as a tool for bypassing restrictions set by network configurations and security checks rather than for enforcing security. For example, since Skype uses its own security mechanisms and real-time communication protocol, it adopts SSL/TLS tunneling only to bypass network-layer firewalls. As a result, its SSL/TLS fingerprint is reduced to a few transitions and differs significantly from the other models.

Server configuration: Some SSL/TLS protocol messages are defined as optional or context-dependent. For example, in the first two datasets of our study, we have not observed any Server Key Exchange message in PayPal and Twitter sessions, whereas in Dropbox sessions it always follows the Certificate message. This behavior depends on the key exchange method. For some methods (DHE_DSS, DHE_RSA, DH_anon), the server sends a Server Key Exchange message because the Certificate message does not contain enough data to allow the client to exchange the premaster secret. For other methods (RSA, DH_DSS, DH_RSA), however, sending the Server Key Exchange message is not legal [10].
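The rule from [10] can be encoded as a consistency check between the negotiated cipher suite, which the Server Hello carries in clear text, and the presence of a Server Key Exchange message. The sketch assumes that cipher suite codes have already been mapped to their IANA names (e.g., `TLS_DHE_RSA_WITH_AES_128_CBC_SHA`); the function names are hypothetical, and key exchange methods outside the six listed above (such as the elliptic-curve ones) are deliberately rejected rather than guessed.

```python
# RFC 5246, Section 7.4.3: ServerKeyExchange is sent for these methods...
SKE_SENT = {"DHE_DSS", "DHE_RSA", "DH_anon"}
# ...and it is not legal to send it for these.
SKE_NOT_LEGAL = {"RSA", "DH_DSS", "DH_RSA"}

def key_exchange(cipher_suite: str) -> str:
    """Key exchange part of an IANA name, e.g. 'TLS_DHE_RSA_WITH_...' -> 'DHE_RSA'."""
    name = cipher_suite[len("TLS_"):] if cipher_suite.startswith("TLS_") else cipher_suite
    return name.split("_WITH_")[0]

def expects_server_key_exchange(kx: str) -> bool:
    if kx in SKE_SENT:
        return True
    if kx in SKE_NOT_LEGAL:
        return False
    raise ValueError(f"key exchange method not covered by this sketch: {kx}")

def consistent(cipher_suite: str, ske_observed: bool) -> bool:
    """True if the presence of ServerKeyExchange matches the negotiated method."""
    return expects_server_key_exchange(key_exchange(cipher_suite)) == ske_observed

print(consistent("TLS_DHE_RSA_WITH_AES_128_CBC_SHA", True))  # True
print(consistent("TLS_RSA_WITH_AES_128_CBC_SHA", False))     # True
print(consistent("TLS_RSA_WITH_AES_128_CBC_SHA", True))      # False
```

A session failing this check would point either to a nonconforming server implementation or to a tunnel that only imitates SSL/TLS.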

Application nature: While some SSL/TLS server communication parameters can be configured and possibly changed over time, others reflect the nature of the application and depend on the service implementation. Notably, they may reflect flow features such as the session duration or the content size. For example, we observe only a few session transitions in the case of Twitter, which enables its users to send short text-based messages of up to 140 characters, whereas in the case of the proprietary Gadu-Gadu protocol, we may observe more than one hundred session transitions that reflect its instant messaging character (i.e., high interaction between users). Moreover, Gadu-Gadu SSL/TLS messages are relatively short, so individual session states are composed of multiple Application Data messages. Unlike Gadu-Gadu, Skype SSL/TLS protocol messages are long and need to be divided into multiple TCP segments before being sent across the network; since this feature stems from the nature of the application, it cannot easily be evaded or changed over time.

VI. RELATED WORK

As new Internet applications have started to use obfuscation methods (port masquerading, tunneling, packet encryption), traditional classification methods based on simple pattern matching are no longer reliable. In our work, however, we demonstrate that it is possible to effectively model application flows by inspecting application layer protocols. Risso et al. introduced a taxonomy of payload-based classification methods [9] and argued that they are mainly based on pattern verification. We believe that a key challenge in encrypted traffic classification is to replace traditional pattern verification with more sophisticated statistical fingerprinting.

Some authors focus on the classification of encrypted traffic [4], [21], [13], [22]. Bernaille and Teixeira proposed a method based on the size of the first few packets of an encrypted connection, which enables early application recognition with an accuracy of more than 85% [4]. A more recent hybrid method tries to identify SSL/TLS encrypted application layer protocols with a combination of a signature-based and a flow-based statistical analysis scheme [21]. Both methods are related to our proposal; however, their objectives are limited either to SSL/TLS application recognition or to the classification of encrypted application layer protocols. In this work, we focus on an in-depth analysis of SSL/TLS protocol message sequences to characterize and classify application flows. In our previous work, we considered the problem of detecting Skype traffic and classifying its service flows. We proposed a classification method for Skype traffic tunneled over TLS in addition to its proprietary encryption. The method is based on Statistical Protocol IDentification (SPID), which analyzes the distributions of flow and application layer data [13].

Bissias et al. presented a traffic analysis attack against encrypted HTTP streams that identifies the source of the traffic by analyzing the distributions of packet sizes and inter-arrival times of web requests from sites of interest [23]. Even if their work differs from our paper in terms of objectives and methodology, the conclusion remains the same: encrypting traffic does not prevent some types of traffic analysis.

Lee et al. and Levillain et al. evaluated the practices of

SSL/TLS servers by investigating server replies [24], [18].

They studied the details of the encryption parameters, e.g. ci-

pher suites, key sizes, and protocol features such as supported

versions and their extensions. Our work is a further step in

this direction.

VII. CONCLUSIONS

In this paper, we have proposed stochastic fingerprints for application traffic flows conveyed in SSL/TLS sessions. The fingerprints are based on first-order homogeneous Markov chains for which we identify the parameters from observed training application traces. As the fingerprint parameters of the chosen applications differ considerably, the method discriminates applications with very good accuracy and makes it possible to detect abnormal SSL/TLS sessions. We have also shown that application fingerprints need to be updated periodically, because they change over time.
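As an illustration of how such fingerprints can be used, the sketch below scores a session under each application fingerprint by the product of its enter, transition, and exit probabilities (computed in log space), assigns it to the most likely application, and flags it as abnormal when no fingerprint reaches a threshold. The fingerprints, application names, and threshold are hypothetical, and this decision rule is an assumption of the sketch rather than a description of the exact procedure used in our evaluation.

```python
import math
from typing import Dict, List, Optional, Tuple

Fingerprint = Tuple[Dict[str, float], Dict[Tuple[str, str], float], Dict[str, float]]

def log_likelihood(states: List[str], fp: Fingerprint) -> float:
    """Log-probability of a state sequence: enter, transitions, then exit."""
    enter, trans, exit_ = fp
    factors = [enter.get(states[0], 0.0), exit_.get(states[-1], 0.0)]
    factors += [trans.get(pair, 0.0) for pair in zip(states, states[1:])]
    return sum(math.log(f) for f in factors) if min(factors) > 0.0 else -math.inf

def classify(states: List[str], fingerprints: Dict[str, Fingerprint],
             threshold: float) -> Optional[str]:
    """Most likely application, or None if the session looks abnormal."""
    scores = {app: log_likelihood(states, fp) for app, fp in fingerprints.items()}
    best = max(scores, key=scores.__getitem__)
    return best if scores[best] >= threshold else None

# Two hypothetical fingerprints (not derived from our datasets).
FINGERPRINTS: Dict[str, Fingerprint] = {
    "AppA": ({"22:2": 1.0},
             {("22:2", "20:,22:"): 1.0, ("20:,22:", "23:"): 1.0},
             {"23:": 1.0}),
    "AppB": ({"22:2,22:11,22:14": 1.0},
             {("22:2,22:11,22:14", "20:,22:"): 1.0, ("20:,22:", "23:"): 1.0,
              ("23:", "23:"): 0.5},
             {"23:": 0.5}),
}
THRESHOLD = math.log(0.01)  # hypothetical anomaly threshold

print(classify(["22:2", "20:,22:", "23:"], FINGERPRINTS, THRESHOLD))  # AppA
print(classify(["22:2,22:11,22:14", "20:,22:", "23:", "23:"], FINGERPRINTS, THRESHOLD))  # AppB
print(classify(["21:"], FINGERPRINTS, THRESHOLD))  # None
```

Working in log space avoids numerical underflow for long sessions, such as the Gadu-Gadu sessions with more than one hundred transitions.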

Our analysis of the results reveals that application discrimination mainly stems from incorrect and diverse implementation practices, the misuse of the SSL/TLS protocol, various server configurations, and the nature of the application. Finally, even if we are able to identify some very reliable statistical fingerprints for selected applications, it is also possible to evade classification by avoiding implementation mistakes and building the secure layer on a limited but widely used set of SSL/TLS states.

In future work, we plan to further investigate the proposed method on a wider range of Internet applications and to cross-validate it on other heterogeneous datasets gathered in various subnetworks. We also aim to analyze the SSL/TLS stack to verify its consistency with protocol recommendations and best security practices. Finally, we plan to apply the approach to reveal intrusions that exploit the SSL/TLS protocol by establishing suspicious, unlikely sessions.

ACKNOWLEDGMENTS

We would like to thank DIMACS and CCICADA for

support, and Nina Fefferman for useful comments on the

draft. This work was partially supported by the European

Commission FP7 project INDECT under contract 218086.

REFERENCES

[1] “Internet Assigned Numbers Authority (IANA),”

http://www.iana.org/assignments/port-numbers.

[2] A. W. Moore and K. Papagiannaki, “Toward the Accurate Identification of Network Applications,” in Proc. of the PAM Conference, 2005, pp. 41–54.

[3] S. Sen, O. Spatscheck, and D. Wang, “Accurate, Scalable In-network

Identification of P2P Traffic Using Application Signatures,” in Proc. of

the WWW Conference, 2004, pp. 512–521.

[4] L. Bernaille and R. Teixeira, “Early Recognition of Encrypted Applications,” in Proc. of the PAM Conference, 2007, vol. 4427, pp. 165–175.

[5] A. W. Moore and D. Zuev, “Internet Traffic Classification Using Bayesian Analysis Techniques,” in Proc. of ACM SIGMETRICS, 2005, pp. 50–60.

[6] M. Iliofotou, P. Pappu, M. Faloutsos, M. Mitzenmacher, S. Singh, and

G. Varghese, “Network Monitoring Using Traffic Dispersion Graphs

(TDGs),” in Proc. of ACM IMC, 2007, pp. 315–320.

[7] T. Karagiannis, K. Papagiannaki, and M. Faloutsos, “BLINC: Multilevel Traffic Classification in the Dark,” in Proc. of ACM SIGCOMM, 2005, pp. 229–240.

[8] H. Kim, K. Claffy, M. Fomenkov, D. Barman, M. Faloutsos, and K. Lee,

“Internet Traffic Classification Demystified: Myths, Caveats, and the

Best Practices,” in Proc. of ACM CoNEXT, 2008, pp. 1–12.

[9] F. Risso, M. Baldi, O. Morandi, A. Baldini, and P. Monclus, “Lightweight, Payload-Based Traffic Classification: An Experimental Evaluation,” in Proc. of IEEE ICC, 2008, pp. 5869–5875.

[10] T. Dierks and E. Rescorla, “The Transport Layer Security (TLS)

Protocol, Version 1.2,” RFC 5246 (Proposed Standard), August 2008.

[11] A. Freier, P. Karlton, and P. Kocher, “The Secure Sockets Layer (SSL) Protocol Version 3.0,” RFC 6101 (Historic), August 2011.

[12] I. L. MacDonald and W. Zucchini, Hidden Markov and Other Models

for Discrete-Valued Time Series. Chapman & Hall, 1997.

[13] M. Korczyński and A. Duda, “Classifying Service Flows in the Encrypted Skype Traffic,” in Proc. of IEEE ICC, 2012, pp. 1–5.

[14] R. Seggelmann, M. Tuexen, and M. Williams, “Transport Layer Security (TLS) and Datagram Transport Layer Security (DTLS) Heartbeat Extension,” RFC 6520 (Proposed Standard), February 2012.

[15] J. Aldrich, “R.A. Fisher and the Making of Maximum Likelihood 1912–1922,” Statistical Science, vol. 12, no. 3, pp. 162–176, August 1997.

[16] I. Drago, M. Mellia, M. Munafo, A. Sperotto, R. Sadre, and A. Pras,

“Inside Dropbox: Understanding Personal Cloud Storage Services,” in

Proc. of ACM IMC, 2012, pp. 481–494.

[17] S. Blake-Wilson, M. Nystrom, D. Hopwood, and J. Mikkelsen, “Transport Layer Security (TLS) Extensions,” RFC 4366 (Proposed Standard), April 2006.

[18] O. Levillain, A. Ébalard, B. Morin, and H. Debar, “One Year of SSL

Internet Measurement,” in Proc. of ACM ACSAC, 2012, pp. 11–20.

[19] A. Langley, “Unfortunate Current Practices for HTTP over TLS,”

Internet Draft, January 2011.

[20] K. Bhargavan, C. Fournet, M. Kohlweiss, A. Pironti, and P. Strub,

“Implementing TLS with Verified Cryptographic Security,” in Proc. of

the IEEE Symposium on Security & Privacy, 2013, pp. 445–459.

[21] G.-L. Sun, Y. Xue, Y. Dong, D. Wang, and C. Li, “A Novel Hybrid Method for Effectively Classifying Encrypted Traffic,” in Proc. of IEEE GLOBECOM, 2010, pp. 1–5.

[22] R. Alshammari and A. Zincir-Heywood, “Machine Learning Based

Encrypted Traffic Classification: Identifying SSH and Skype,” in IEEE

Symposium on Computational Intelligence for Security and Defense

Applications, 2009, pp. 1–8.

[23] G. D. Bissias, M. Liberatore, D. Jensen, and B. N. Levine, “Privacy

Vulnerabilities in Encrypted HTTP Streams,” in Proc. of the 5th Int.

Conference on Privacy Enhancing Technologies, 2006, pp. 1–11.

[24] H. K. Lee, T. Malkin, and E. Nahum, “Cryptographic Strength of

SSL/TLS Servers: Current and Recent Practices,” in Proc. of ACM IMC,

2007, pp. 83–92.