CoMiFin Essentials: Collaborative Processing System for Cyber Threat Detection
2. CoMiFin Essentials
[Figure: Organizations 1 … M, each bound by a contract, exchange agreed information and warnings with the Collaborative Processing System over the Internet.]
3. CoMiFin Essentials: Business Vision
• The CoMiFin platform could be useful for addressing the following business use cases:
– Monitoring of and reaction to cyber threats (Man-in-the-Browser, Man-in-the-Middle, botnet detection, stealthy inter-domain port scans)
– Location intelligence for fraud correlation
– ID theft
– Anti-money-laundering monitoring
– Distribution of black/white lists (e.g., for credit reputation, trust level)
– Anti-terrorism lists
• These use cases imply value-added services that service providers (SPs) can offer to financial institutions (FIs) over CoMiFin
• The CoMiFin project was submitted to four FAB evaluation sessions, which highlighted its potential business value in real financial use cases
4. CoMiFin Essentials: The notion of Semantic Room (SR)
■ Contract
■ The set of processing and data sharing services provided by the SR, along with its data protection, privacy, isolation, trust, security, dependability, and performance requirements.
■ The contract also specifies the hardware and software requirements a member has to provision in order to be admitted into the SR.
■ Objective
■ Each SR has one strategic objective to meet (e.g., large-scale stealthy scan detection, detection of Man-in-the-Middle attacks).
■ Deployment
■ Highly flexible: different technologies can be used to implement the processing and sharing within the SR (i.e., the SR logic or functionality).
7. CoMiFin Essentials: Deploying a Semantic Room
■ Private cloud
■ Deployment of the semantic room through the federation of computing and storage capabilities at each member
■ Each member brings a private cloud to federate
■ Public cloud
■ Deployment of the semantic room on a third-party cloud provider
■ The third party owns all computing and storage capabilities
■ Hybrid approach
9. Interconnections of SRs
• Horizontal/vertical composition
– Communicating SRs have different goals
– Complex or combined attacks can be detected by information fusion
– E.g., alerts generated in the Port Scan SR contribute to the blacklist managed in the MitM SR
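The composition idea can be illustrated with a minimal Python sketch, assuming an in-process publish/subscribe link between two SRs. The class and method names (`PortScanSR`, `MitMSR`, `compose_with`, `raise_alert`, `on_alert`) are hypothetical and not part of CoMiFin; real SRs would exchange alerts over the network under the terms of their contracts.

```python
from dataclasses import dataclass, field


@dataclass
class MitMSR:
    """Hypothetical MitM SR holding a blacklist that other SRs can feed."""
    blacklist: set = field(default_factory=set)

    def on_alert(self, ip: str) -> None:
        # Information fusion: an alert from another SR becomes a blacklist entry
        self.blacklist.add(ip)

    def is_blacklisted(self, ip: str) -> bool:
        return ip in self.blacklist


@dataclass
class PortScanSR:
    """Hypothetical Port Scan SR that forwards scanner alerts to composed SRs."""
    subscribers: list = field(default_factory=list)

    def compose_with(self, sr: MitMSR) -> None:
        self.subscribers.append(sr)

    def raise_alert(self, scanner_ip: str) -> None:
        # Every composed SR receives the alert
        for sr in self.subscribers:
            sr.on_alert(scanner_ip)


port_scan_sr = PortScanSR()
mitm_sr = MitMSR()
port_scan_sr.compose_with(mitm_sr)
port_scan_sr.raise_alert("203.0.113.7")
print(mitm_sr.is_blacklisted("203.0.113.7"))  # True
```

The two SRs keep different goals; only the alert crosses the boundary, which is what lets the MitM SR benefit from scan detection it does not perform itself.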
10. Today’s CoMiFin Achievements: user perspectives
• Timely identification of identity theft
• Community-based advanced information sharing
• Identification of the Command and Control of Trojans for MitB
12. Outline
• Three Semantic Rooms
– Esper-based and Agilis-based SRs for inter-domain stealthy port scan detection
– Agilis-based SR for botnet-driven HTTP session hijacking (not covered in this presentation)
• High-level description of the port scan detection algorithms
– R-SYN port scan detection
– Line Fitting
• High-level description of botnet-driven HTTP session hijacking (not covered in this presentation)
• Performance evaluation
– R-SYN vs Line Fitting using the Esper-based SR
– Agilis vs Esper-based SR for inter-domain stealthy port scan detection (R-SYN only)
– Agilis for botnet-driven HTTP session hijacking (not covered in this presentation)
13. Do you remember some important concepts?
• SR creation phase
– An SR schema is created that specifies the three elements characterizing an SR:
• The objective
• The contract, filled in with general SR contractual clauses
• The software deployments that can be used for that SR
• SR instantiation phase
– The same SR schema can be instantiated in different ways depending on, e.g.,
• the geographical position of the members
• the processing and sharing software (e.g., one SR IDPS instance uses Agilis for the processing, another uses Esper)
• the type of SR deployment: third-party-based, SR-owned, or hybrid
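The two phases can be sketched in Python under stated assumptions: the field names, the `instantiate` helper, the contract clause, and the member names are all hypothetical, chosen only to show one schema yielding two IDPS instances that differ in processing engine, deployment type, and members.

```python
from dataclasses import dataclass

DEPLOYMENT_TYPES = {"third-party", "SR-owned", "hybrid"}


@dataclass(frozen=True)
class SRSchema:
    """Creation phase: the three elements that characterize an SR."""
    objective: str
    contract_clauses: tuple[str, ...]
    allowed_engines: frozenset[str]  # software deployments usable for this SR


@dataclass(frozen=True)
class SRInstance:
    """Instantiation phase: one concrete SR built from a schema."""
    schema: SRSchema
    engine: str
    deployment: str
    members: tuple[str, ...]


def instantiate(schema: SRSchema, engine: str, deployment: str, members) -> SRInstance:
    # Reject instances that violate what the schema allows
    if engine not in schema.allowed_engines:
        raise ValueError(f"engine {engine!r} is not allowed by the SR schema")
    if deployment not in DEPLOYMENT_TYPES:
        raise ValueError(f"unknown deployment type {deployment!r}")
    return SRInstance(schema, engine, deployment, tuple(members))


# One hypothetical IDPS schema, instantiated twice with different choices
idps = SRSchema(
    objective="inter-domain stealthy port scan detection",
    contract_clauses=("members contribute pre-processed TCP traffic data",),
    allowed_engines=frozenset({"Agilis", "Esper"}),
)
idps_agilis = instantiate(idps, "Agilis", "hybrid", ["Bank A", "Bank B", "Bank C"])
idps_esper = instantiate(idps, "Esper", "third-party", ["Bank D", "Bank E"])
```

Keeping the schema immutable and shared makes it explicit that instances vary only in the aspects the instantiation phase is allowed to change.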
15. Inter-domain stealthy port scan
• TCP SYN (half-open) port scan
– Each scanner targets multiple sites
– Hosts at each site receive a series of probes to multiple ports
– Each probe is a TCP connection initiation (3-way handshake) that never completes
• A scanner S sends a SYN packet to a target T on a specific port P and waits for a response
– If a SYN-ACK packet is received, S can conclude that P is open, and can optionally reply with a RST packet to reset the connection (incomplete connections)
– If a RST-ACK packet is received, S can consider P closed
– If no packet is received at all and S has some knowledge that T is reachable, S can conclude that P is filtered
– If S has no information about the reachability of T, it cannot infer anything about the state of P
• We have implemented two algorithms for inter-domain stealthy port scan
detection
– Rank-based SYN (R-SYN) port scan detection
– Line Fitting port scan detection
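The scanner-side reasoning about a single SYN probe can be written as a small decision function. This is a sketch under assumptions: replies are represented as the strings `"SYN-ACK"`, `"RST-ACK"`, or `None` (no reply before a timeout), and `classify_probe` and `PortState` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class PortState(Enum):
    OPEN = "open"
    CLOSED = "closed"
    FILTERED = "filtered"
    UNKNOWN = "unknown"


def classify_probe(reply: Optional[str], target_reachable: Optional[bool]) -> PortState:
    """Infer the state of port P from the reply to a SYN probe.

    reply: "SYN-ACK", "RST-ACK", or None if nothing arrived before a timeout.
    target_reachable: True/False if S knows whether T is reachable, None if it has no clue.
    """
    if reply == "SYN-ACK":
        return PortState.OPEN       # S may answer with RST, leaving the connection incomplete
    if reply == "RST-ACK":
        return PortState.CLOSED
    if reply is None and target_reachable:
        return PortState.FILTERED   # T is up, so the probe was silently dropped
    return PortState.UNKNOWN        # no reply and no reachability knowledge


print(classify_probe(None, True).value)  # filtered
```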
16. R-SYN algorithm*
• It recognizes half-open connections (HOC)
– Sequences of SYN, ACK, and RST packets in the TCP 3-way handshake
• Normal: (i) SYN, (ii) SYN-ACK, (iii) ACK
• SYN port scan: (i) SYN, (ii) SYN-ACK, (iii) RST (or nothing)
• It recognizes failed connections (FC)
– Unreachable hosts and closed ports
• Unreachable hosts: after sending a SYN packet, the sender receives neither a SYN-ACK nor a RST-ACK packet before a timeout expires
• Closed ports: it looks for RST-ACK reply packets
• For each source IP address x, it maintains the set V(x) of (IP address, TCP port) pairs probed by x
• Using a suitable ranking mechanism, it assigns a rank r(x) to each source IP address x
– r(x) = f(HOC(x), FC(x), V(x))
– If r(x) ≥ a predefined threshold, x is considered a scanner
* L. Aniello, G. Lodi, R. Baldoni, “Inter-Domain Stealthy Port Scan Detection through Complex Event Processing”, to appear in the Proceedings of the 13th European Workshop on Dependable Computing (EWDC 2011), May 11-12, Pisa, Italy
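The slide leaves the ranking function f and the threshold unspecified, so the following is only a sketch of the bookkeeping R-SYN performs: per-source counters for HOC(x) and FC(x), the set V(x), and a rank compared against a threshold. The weighted sum and the values of `W_HOC`, `W_FC`, `W_V`, and `THRESHOLD` are assumptions, not the published ranking.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical weights and threshold: the slide does not specify f or its parameters.
W_HOC, W_FC, W_V = 1.0, 1.0, 0.5
THRESHOLD = 10.0


@dataclass
class SourceStats:
    hoc: int = 0                               # half-open connections, HOC(x)
    fc: int = 0                                # failed connections, FC(x)
    probed: set = field(default_factory=set)   # V(x): probed (IP address, TCP port) pairs


stats = defaultdict(SourceStats)


def record(src: str, dst_ip: str, dst_port: int, kind: str) -> None:
    """Record one probe from src; kind is "HOC", "FC", or anything else for normal traffic."""
    s = stats[src]
    s.probed.add((dst_ip, dst_port))
    if kind == "HOC":
        s.hoc += 1
    elif kind == "FC":
        s.fc += 1


def rank(src: str) -> float:
    """r(x) = f(HOC(x), FC(x), V(x)), here assumed to be a weighted sum."""
    s = stats.get(src, SourceStats())
    return W_HOC * s.hoc + W_FC * s.fc + W_V * len(s.probed)


def is_scanner(src: str) -> bool:
    return rank(src) >= THRESHOLD


# A source leaving ten ports half-open vs. one that fails twice on the same port
for port in range(10):
    record("10.0.0.9", "192.0.2.1", port, "HOC")
record("10.0.0.5", "192.0.2.1", 80, "FC")
record("10.0.0.5", "192.0.2.1", 80, "FC")
print(is_scanner("10.0.0.9"), is_scanner("10.0.0.5"))  # True False
```

Counting distinct (IP, port) pairs in V(x) is what separates a source sweeping many ports from one that merely retries the same failing service.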
17. Line Fitting algorithm*
• Underlying principle
– A scanner does not repeatedly perform the same operation towards specific hosts or ports
• If an attempt on a given T:P fails, a scanner will likely continue its malicious port scan towards different targets
• Line Fitting considers F{h}, the multiset of failures generated by the source host h
– Normal failures: the multiset contains few elements with high multiplicity
– Port scan: the multiset contains many elements with low multiplicity
* L. Aniello, G. A. Di Luna, G. Lodi, R. Baldoni, “A Collaborative Event Processing System for
Protection of Critical Infrastructures From Cyber Attacks”, submitted for publication to
International Conference, 2011.
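The exact fitting procedure is described in the cited paper and is not reproduced on the slide. The sketch below is a simplified heuristic in the same spirit, not the published algorithm: it builds the multiset F{h} and checks how closely the multiplicities lie on the horizontal line y = 1 that a pure scan (each T:P tried once) would produce. `MIN_DISTINCT` and `MAX_ERROR` are hypothetical parameters.

```python
from collections import Counter

# Hypothetical parameters, not taken from the cited paper
MIN_DISTINCT = 5    # a scan must fail on at least this many distinct T:P pairs
MAX_ERROR = 0.5     # max mean squared distance of multiplicities from the line y = 1


def looks_like_scan(failures) -> bool:
    """failures: (target_ip, port) pairs whose connection attempts from host h failed."""
    f_h = Counter(failures)                  # the multiset F{h}
    multiplicities = list(f_h.values())
    if len(multiplicities) < MIN_DISTINCT:
        return False                         # few elements: looks like a normal failure
    # Many elements with multiplicity close to 1 lie near the horizontal line y = 1;
    # repeated failures on the same T:P pull the points away from it.
    error = sum((m - 1) ** 2 for m in multiplicities) / len(multiplicities)
    return error <= MAX_ERROR


scan = [("192.0.2.1", port) for port in range(20)]                 # each T:P tried once
retries = [("192.0.2.80", 443)] * 15 + [("192.0.2.81", 443)] * 10  # same targets retried
print(looks_like_scan(scan), looks_like_scan(retries))  # True False
```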
18. Implementation in Esper
• Esper uses the Event Processing Language (EPL) to define continuous queries
– EPL is an extension of SQL
• Example: EPL query for detecting incomplete connections
– We exploit the “pattern” construct of EPL
• a is the stream of SYN packets, b is the stream of SYN-ACK packets, c is a filter for RST packets, and d is a filter for the ACK packets that would correctly complete the 3-way handshake
• The pattern matches if the packets involved fall within a time window of 61 seconds
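The following Python sketch is not the EPL query; it mimics the semantics of the pattern over a finite, already-captured packet trace, assuming each packet has been labelled with the connection it belongs to. The `Packet` fields and the batch processing are assumptions; Esper evaluates the pattern continuously over live streams.

```python
from dataclasses import dataclass

WINDOW = 61.0  # seconds, the time window quoted on the slide


@dataclass(frozen=True)
class Packet:
    """client/server name the connection endpoints by role, whatever the packet direction."""
    ts: float     # capture time in seconds
    client: str   # host that sent the initial SYN
    server: str   # probed host
    port: int     # probed TCP port
    flag: str     # "SYN", "SYN-ACK", "ACK" or "RST"


def incomplete_connections(packets):
    """Return (client, server, port) keys matching SYN -> SYN-ACK -> no ACK within WINDOW.

    RST (filter c) and silence both leave a connection incomplete; only an ACK
    (filter d) arriving inside the window completes the handshake.
    """
    pending = {}  # key -> [syn_time, saw_syn_ack, completed]
    for p in sorted(packets, key=lambda pkt: pkt.ts):
        key = (p.client, p.server, p.port)
        if p.flag == "SYN":                         # stream a
            pending[key] = [p.ts, False, False]
        elif key in pending and p.ts - pending[key][0] <= WINDOW:
            if p.flag == "SYN-ACK":                 # stream b
                pending[key][1] = True
            elif p.flag == "ACK" and pending[key][1]:
                pending[key][2] = True
    return {key for key, (_, syn_ack, done) in pending.items() if syn_ack and not done}


packets = [
    Packet(0.0, "10.0.0.9", "192.0.2.1", 22, "SYN"),
    Packet(0.1, "10.0.0.9", "192.0.2.1", 22, "SYN-ACK"),
    Packet(0.2, "10.0.0.9", "192.0.2.1", 22, "RST"),   # scanner resets: incomplete
    Packet(1.0, "10.0.0.5", "192.0.2.1", 80, "SYN"),
    Packet(1.1, "10.0.0.5", "192.0.2.1", 80, "SYN-ACK"),
    Packet(1.2, "10.0.0.5", "192.0.2.1", 80, "ACK"),   # normal handshake
]
print(incomplete_connections(packets))  # {('10.0.0.9', '192.0.2.1', 22)}
```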
21. Concluding Remarks
• Collaboration can be beneficial
– In both algorithms, increasing the number of SR members (i.e., increasing the volume of data to be correlated) increases the detection rate
– Line Fitting converges to the highest detection rate more quickly than R-SYN
• The Esper-based SR uses a centralized approach
– It can be adequate for small SRs whose members are not geographically dispersed
– It can become a bottleneck with a high number of distributed SR members
• A distributed version may be necessary to improve scalability
23. Design Goals
• Enable processing for detecting cyber attacks
• Ensure privacy/confidentiality for locally
generated data
– Sensitive information must not be exposed for
global correlation
• Support for diverse types of input data
– E.g., real-time and long-lived historical data
• Easy-to-use, built using off-the-shelf
components
• Scalable performance
24. Scaling Challenges
• Large numbers of participating sites
• Possibly wide distribution
• Massive volumes of events
• High rates of incoming event traffic
25. Agilis Architecture
[Figure: Agilis architecture. Each Agilis site (Site 1 … Site N) pre-processes raw data and passes it through a data anonymizing gateway into storage containers of the distributed in-memory store (WXS) and the distributed file system (HDFS). A Jaql front-end (Jaql query, query scheduler) runs queries as Map-Reduce (Hadoop) jobs via a job tracker and per-site task trackers; WXS and HDFS adapters re-define Hadoop's InputFormat and OutputFormat.]
26. Locality-Aware Collaboration
• Map tasks are mutually independent and can run in parallel at each site
• Input data is partitioned among the sites
– Each partition is mapped into an input split
– Map tasks are collocated with their input splits
• Simple queries are delegated to the SQL engine embedded in the data containers
– Select, project, and aggregate
• This improves scalability by reducing the amount of data that requires global correlation
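The locality argument can be sketched with plain Python instead of Hadoop: each hypothetical site split is aggregated where it lives, and only the small per-IP summaries reach the global step. The event kinds, the split contents, and the use of a thread pool to stand in for per-site map tasks are assumptions for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

SUSPICIOUS = {"HOC", "FC"}  # hypothetical event kinds: half-open / failed connections


def map_site(split):
    """Map task collocated with one site's input split.

    Aggregates locally (like a select/aggregate pushed down to the data
    container), so only per-IP counts leave the site instead of raw events.
    """
    return Counter(src for src, kind in split if kind in SUSPICIOUS)


def global_reduce(partials):
    """Global correlation runs only over the small per-site summaries."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical input splits, one per site: (source IP, event kind)
site_splits = [
    [("10.0.0.9", "HOC"), ("10.0.0.9", "HOC"), ("10.0.0.5", "OK")],
    [("10.0.0.9", "FC"), ("10.0.0.7", "HOC")],
]

with ThreadPoolExecutor() as pool:  # map tasks are independent, so they can run in parallel
    partials = list(pool.map(map_site, site_splits))

totals = global_reduce(partials)
```

Because the map output is already aggregated, the data shipped for global correlation grows with the number of distinct suspicious IPs per site rather than with the raw event volume.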
27. Processing Flow
[Figure: Processing flow as parallelized Map/Reduce jobs. At each site (1 … N), TCPDump data is pre-processed into normalized data ([LogEvent]*) stored in an XS partition. Summarization produces summarized data ([SourceIP, rNum, pNum]*), Ranking produces [SourceIP, rank]*, and Blacklisting produces the blacklist. The flow also includes Calibration and Historical Safety Ranking stages.]
29. R-SYN Implementation in Agilis
• Gateway identifies incomplete and failed
connections, and maintains tuples
(IP, #incomplete, #failed)
• Global correlation is performed by a Jaql query
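As an illustration of what the global correlation computes, here is a Python sketch over the gateway tuples (IP, #incomplete, #failed). It is not the Jaql query; the weighted-sum ranking, `W_INCOMPLETE`, `W_FAILED`, `THRESHOLD`, and the sample tuples are hypothetical.

```python
from collections import defaultdict

# Hypothetical weights and threshold; the actual ranking in the Jaql query may differ.
W_INCOMPLETE, W_FAILED, THRESHOLD = 1.0, 1.0, 5.0


def global_correlation(gateway_tuples):
    """gateway_tuples: (ip, n_incomplete, n_failed) tuples produced by all site gateways."""
    totals = defaultdict(lambda: [0, 0])
    for ip, n_incomplete, n_failed in gateway_tuples:  # group by IP across sites
        totals[ip][0] += n_incomplete
        totals[ip][1] += n_failed
    ranks = {ip: W_INCOMPLETE * inc + W_FAILED * fail for ip, (inc, fail) in totals.items()}
    blacklist = sorted(ip for ip, r in ranks.items() if r >= THRESHOLD)
    return ranks, blacklist


site1 = [("10.0.0.9", 2, 1), ("10.0.0.5", 0, 1)]
site2 = [("10.0.0.9", 1, 2), ("10.0.0.7", 1, 0)]
ranks, blacklist = global_correlation(site1 + site2)
print(blacklist)  # ['10.0.0.9']
```

With these sample numbers neither site alone pushes 10.0.0.9 over the threshold (rank 3 at each site), but the correlated view does (rank 6), which is the benefit of collaboration the evaluation measures.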
30. Latency Evaluation*
• 287 MB intrusion trace from http://www.itoc.usma.edu/research/dataset/index.html, partitioned across 6 Agilis sites
• Each site simulated by a Linux VM
• WAN link bandwidth varied using the WANem simulator
• Comparison against centralized event processing (Esper)
* L. Aniello, R. Baldoni, G. Chockler, G. Laventman, G. Lodi, Y. Vigfusson, “Agilis: An Internet-Scale Distributed Event Processing System for Collaborative Detection of Cyber Attacks”, submitted for publication to International conference, 2011
31. Conclusions
• Agilis: a distributed platform for collaborative detection of cyber attacks
• Performance studies show scalability and highlight the benefits of collaboration
– Port scan detection and botnet identification
• Ongoing work: improvements to the infrastructure
– Replacing WXS with an open-source RAM-based store; persistent and data-driven queries
• Evaluation on realistic data
33. UML modeling
• UML
– standardized, general-purpose modeling language
• Modeling the overall CoMiFin middleware
– Common understanding
– Coherent models
• Create
– high-level
– technology-independent models
– following the MDA approach
• Main packages of the model
– Actors
– Components
– Common Data Types
– Diagrams
34. Diagrams
• Use Case diagrams
• High-level Component diagram
• Detailed component diagrams
• Class diagrams
• Data models
• Sequence diagrams
• Deployment diagrams
• Covering all parts of CoMiFin prototype
– 123 diagrams, 1,518 modeling elements, 223 classes,
49 components, 89 sequence diagrams,…
• Example diagrams: SR Gateway component
39. Data model - Gateway
• Description of elements, components and
concepts
41. Nov. 2010 attack – selected actions
Event # | Day | Time | Event
1 | 1 | 14:48 | Bank 1 is notified about infections.
3 | 1 | 16:05 | Logon attempt from a UK IP address.
4 | 1 | 16:35 | Bank 2 sends Bank 1 a link to the drop site.
5 | 2 | 09:00 | Bank 1 analyzes the information received from Bank 2.
6 | 2 | 09:10 | Bank 1 comes across login information of customers of Bank 3 and duly warns Bank 3.
16 | 3 | 13:04 | Bank 1 analyzes the config file of the infection it has received from Bank 2.
17 | 3 | 18:45 | Customer records are collected from the drop site.
20 | 3 | 20:56 | Analysis of the config file reveals how customers can recognize whether their PC is infected.
26 | 4 | 09:10 | The certificates of compromised customers are revoked.
29 | 4 | 09:16 | The recent transaction history of compromised customers is analyzed.
37 | 4 | 12:38 | The Financial Supervisory Authority of Norway is notified of the attack.
45 | 4 | 13:04 | All certificates of compromised customers are revoked.
47 | 4 | 13:10 | There is a successful logon from a PC in the UK.
48 | 4 | 13:43 | The infected PCs of compromised customers are collected.
53 | 4 | 14:10 | There are telephone calls with the cyber police.
78 | 7 | 10:55 | Bank 1 receives samples of the Zeus virus from the cyber police.
81 | 7 | 12:02 | Discussions with the cyber police about how the Zeus virus works.
104 | 8 | 09:21 | New “stolen” login credentials are posted to the drop site.
… | … | … | …
42. Lessons learnt
Banks today exchange information about incidents in an informal
and accidental way
The way banks today exchange information about incidents does not
scale
The cooperation between the banks, and between the banks and the cyber police, seems informal and based more or less on goodwill.
One cannot help thinking that more formal cooperation and exchange of information, made possible by CoMiFin, might further benefit the parties involved. Countermeasures might then be taken at an earlier stage, reducing the detrimental consequences of attacks and protecting society as a whole.
43. Which are the events to look for?
During login, the Trojan presents a fake login page. When this happened, certain entries were posted to one bank's web-server log. The bank identified the signature of these log entries and developed an automated process that recognizes the signature, stops the login, and automatically and instantly revokes the customer's login credentials. Sharing this information allowed other banks to implement similar countermeasures.
44. Which are the events to look for?
Analysis revealed that, at the time of writing, the Trojan was very context-sensitive, i.e., a small change to the internet bank's website would fool it. This suggests that using random names for JavaScript functions and CSS class names would prevent the Trojan from recognizing where to inject its malicious code, probably rendering it useless or less effective.
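The randomization idea can be sketched as a per-render rewrite of a page template. This is only an illustration of the suggestion, not what the banks implemented: it assumes pages are served from string templates, and the function name `randomize_identifiers` and the identifiers `loginForm` and `submitLogin` are hypothetical. A real deployment would also have to rewrite the matching stylesheets and scripts consistently.

```python
import re
import secrets


def randomize_identifiers(template: str, names: list[str]) -> tuple[str, dict[str, str]]:
    """Replace each stable JavaScript function / CSS class name with a random one."""
    if not names:
        return template, {}
    # Prefix with a letter: identifiers and class names cannot start with a digit
    mapping = {name: "x" + secrets.token_hex(6) for name in names}
    # Match whole names only, so "loginForm" does not hit "loginFormWrapper"
    pattern = re.compile(r"(?<![\w-])(" + "|".join(map(re.escape, names)) + r")(?![\w-])")
    return pattern.sub(lambda m: mapping[m.group(1)], template), mapping


template = (
    '<form class="loginForm"><button onclick="submitLogin()">Log in</button></form>'
    "<script>function submitLogin() { /* ... */ }</script>"
)
page, mapping = randomize_identifiers(template, ["loginForm", "submitLogin"])
print(page)
```

Since the names change on every render, an injection rule keyed to a fixed class or function name no longer finds its anchor.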
45. Which are the events to look for?
One important quality of a Trojan is its ability to hide itself. Encryption, hashing, and other techniques are used for this. If the obfuscation techniques are elaborate, breaking them takes resources and expertise. Together, the banks were able to “undress” the Trojan effectively. Once it was “undressed”, the banks were able to devise effective countermeasures.
46. Which are the events to look for?
The national CERT (NSM) carried out a comprehensive technical analysis of SpyEye to be shared by all FIs. During this analysis, they discovered the IP address of the Command and Control Center (CC). By contacting the Norwegian ISPs, they were able to monitor the amount of traffic going from Norwegian customers to the CC, which gave them a picture of the severity of the attack. Steps could also be taken to shut down the attack sites.
47. Supporting documentation
CIFAS, the UK's Fraud Prevention Service, has released Fraudscape, a report detailing the frauds recorded by the 265 CIFAS Members during 2009. According to CIFAS,
"the findings presented in Fraudscape, however, clearly demonstrates the benefits that mutual collaboration brings. By sharing knowledge and pooling resources, CIFAS Members have prevented millions of pounds of fraud year after year and also increased the knowledge of the methods used to defraud businesses, consumers and society equally. This approach can only bring further benefits if further cooperation and responsible data-sharing takes place across all sections of society".