A key concern on today's Internet is the threat of cybercrime. Cybercrime on the Web uses different types of malware and fraud for purposes such as financial theft, espionage, copyright infringement, denial of service and cyber-warfare. Attacks spread over different protocols and channels such as HTTP or HTTPS, links in email or IM, IRC, malware attachments, and phishing. This cyber threat landscape, often controlled by organized crime and nation states, has been evolving rapidly, and threats are becoming more evasive and difficult to detect. Attackers often use multiple infection mechanisms to take control of machines and make them part of botnets, which can then be used to perpetrate other kinds of attacks such as data leakage and denial of service. As threats blend across diverse data channels, their detection requires scalable distributed monitoring and cross-correlation with a substantial amount of contextual information. Conventional methods of protecting against cyber attacks, such as signature-based detection and firewalls, have become less effective.
Many corporations, security companies and governments are therefore beginning to employ more sophisticated means of detecting and protecting against cyber attacks. Recently, data-driven approaches have become popular for detecting new kinds of attacks. Instead of relying on static signature-based detection, these techniques seek to detect anomalies and other patterns in various kinds of data such as network traffic statistics and server and application logs. For example, a sudden increase in the number of unresolvable DNS requests from a laptop might indicate that it is infected by a bot. These approaches rely on very large volumes of data and a variety of analytics to analyze the data. In this talk, I will describe some Big Data based analytics and systems that IBM has built for detecting different kinds of cyber-attacks, particularly new kinds or new sources of cyber-attacks that may not have been seen before. These analytics span both real-time processing on the IBM InfoSphere Streams platform and off-line processing using InfoSphere BigInsights and data mining tools like SPSS.
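The unresolvable-DNS example above can be sketched as a per-host baseline check. This is only an illustrative sketch, not the analytics described in the talk: the event format `(window_index, host, rcode)`, the function name `nxdomain_spikes`, and the thresholds are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def nxdomain_spikes(events, min_history=3, z_threshold=3.0):
    """Flag (host, window, count) where NXDOMAIN counts jump above the host's own baseline.

    events: iterable of (window_index, host, rcode) tuples (hypothetical log format).
    """
    events = list(events)
    windows = sorted({window for window, _, _ in events})
    per_host = defaultdict(lambda: defaultdict(int))
    for window, host, rcode in events:
        host_counts = per_host[host]  # register the host even if it has no failures
        if rcode == "NXDOMAIN":
            host_counts[window] += 1

    alerts = []
    for host, by_window in per_host.items():
        series = [by_window.get(w, 0) for w in windows]
        for i in range(min_history, len(series)):
            history = series[:i]
            baseline = mean(history)
            spread = max(pstdev(history), 1.0)  # floor avoids alerts on tiny fluctuations
            if series[i] > baseline + z_threshold * spread:
                alerts.append((host, windows[i], series[i]))
    return alerts


# Illustrative data: one laptop goes from 1-2 failed lookups per window to 40.
events = []
for window, failures in enumerate([1, 2, 1, 40]):
    events += [(window, "laptop-17", "NXDOMAIN")] * failures
    events.append((window, "desktop-2", "NOERROR"))

alerts = nxdomain_spikes(events)
print(alerts)
```

Comparing each host against its own history, rather than a global threshold, is one simple way to keep a normally noisy server from masking a laptop that suddenly changes behavior.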
2. Agenda
Cyber Threats
IBM Big Data Suite
Big Data Analytics for CyberSecurity
– Monitor Network Behaviors to detect known and unknown cyber-threats in Enterprises
– Detect Denial of Service Attacks in large ISPs
– Detect Data-Leakage from organizations
4. 2011: Year of the Targeted Attack
Source: IBM X-Force® Research 2011 Trend and Risk Report
[Figure: 2011 Sampling of Security Incidents by Attack Type, Time and Impact. Timeline from Jan to Dec; each circle is labeled with the sector of the affected organization (for example Online Gaming, Central Government, Defense, IT Security, Banking, Consumer Electronics, National Police, State Police).]
Attack Type: SQL Injection, URL Tampering, Spear Phishing, 3rd Party Software, DDoS, SecureID, Trojan Software, Unknown
Size of circle estimates relative impact of breach in terms of cost to business.
Conjecture of relative breach impact is based on publicly disclosed information regarding leaked records and financial losses.
5. 2012: The explosion of breaches continues!
Source: IBM X-Force® Research 2012 Trend and Risk Report
[Figure: 2012 Sampling of Security Incidents by Attack Type, Time and Impact]
Conjecture of relative breach impact is based on publicly disclosed information regarding leaked records and financial losses.
6. Common Cyber Security Risks and Potential Impacts
Common Security Risks
– Denial of Service: an attack that prevents or impairs the use of networks, systems, or applications by exhausting resources
– Malware infection: a virus, worm, Trojan horse, or other code-based malicious entity that successfully infects a host
– Targeted, advanced attack, also known as an advanced persistent threat (APT), which is designed to be undetectable
– Loss or theft of technology (laptops, memory sticks, PDAs) that contains sensitive data; inadvertent disclosure of data
– Defacement: a person gains logical or physical access without permission and defaces a Web application
Potential Impacts
– Loss of Customers
– Impact to Brand
– Sensitive Data Disclosure
– Stolen Intellectual Property
– Loss of Data & Productivity
– Personal and National Security
7. Botnets
– Botnet = a network of compromised computers controlled by the botmaster, ranging in size from hundreds to millions of hosts
– Purpose: denial of service attacks, spam delivery, stealing credentials and data, compromising control systems, etc.
– Hosts are infected by downloads from malicious websites, emailed executables, web, memory stick, PDF, …
– Bots receive updates and commands from the Command and Control node, and communications are becoming more sophisticated
8. Botnet Communication
There is a need to talk:
– Bots receive updates and commands from the C&C node
– Utilize a command and control structure, through IRC, HTML, SSL, Twitter, IM or custom-built solutions
– Botnet communications are becoming more sophisticated and harder to track
  – peer-to-peer, distributed vs. hierarchical control structure
  – fast fluxing, name generation
[Figure: control structures labeled C&C and P2P]
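To make "name generation" concrete: bots that compute rendezvous domain names algorithmically tend to produce long, random-looking labels. The sketch below scores a name by the Shannon entropy of its leftmost label. It is a crude heuristic chosen for illustration, not a technique taken from the talk; the function names and the `min_length` and `min_entropy` thresholds are assumptions.

```python
import math
from collections import Counter


def label_entropy(domain):
    """Shannon entropy (bits per character) of the leftmost label of a domain name."""
    label = domain.lower().split(".")[0]
    if not label:
        return 0.0
    total = len(label)
    counts = Counter(label)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_generated(domain, min_length=12, min_entropy=3.5):
    """Heuristic: long labels with near-uniform character use are suspicious."""
    label = domain.lower().split(".")[0]
    return len(label) >= min_length and label_entropy(domain) >= min_entropy


for name in ["mail.example", "xkqjz7w2hv9pfm3t.example"]:
    print(name, round(label_entropy(name), 2), looks_generated(name))
```

On its own such a score produces false positives (content delivery networks also use random-looking names), so in practice it would be one feature among several that are correlated with other data sources.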
9. A Typical Threat Example
[Figure: attack flow among Spammer, Mail-Client, Victim, Malicious Web server, Domain Name Server, and Command & Control. Steps numbered 1 to 9 carry the labels: click; Follow link; Malicious Web server sends or reflects exploit code; web-page +; Install Malware; C&C / Updater IP Address Lookup; C&C / Updater DN; Contact Updater By IP Address (C&C); Remotely Control Malware; Execute (Spam..).]
10. A Typical Threat Example
[Figure: the same attack flow as on the previous slide, annotated with monitoring points:]
a) Monitor DNS
b) Monitor NetFlow
c) Monitor Port & Protocol Usage
d) Monitor Web Traffic
11. Typical Solution Architecture
[Figure: solution architecture]
Inputs: DNS, NetFlow, …
Hardware: X86 Box, X86 Blade, Cell Blade, FPGA Blade
Software stack: Operating System, Transport, System S Data Fabric
Analytics: Unsupervised Real-Time Analytics, Supervised Learning
Platforms named: PureData System for Analytics, BigInsights
Output: Dashboarding / Visualization
Numbered flows (1 to 4): Real-time Results (Tickets, Monitoring); Collect Results + Evidence; Trends, History; Adapted Analytics Models
• Cybersecurity Analytics
• Real-Time processing of massive data streams
• Advanced Data Mining, and Trend analytics
• New and Incremental model learning
16. Security Appliances (Firewalls, IDS, IPS, SIEMs) vs Big Data
Attribute | IBM QRadar Security Intelligence Platform | IBM Big Data Platform (InfoSphere BigInsights, Streams and PureData for Analytics)
Security use cases | Turnkey | Custom
User Interface | All-in-one console | Purpose-built applications
Data Sources | 450+ preconfigured (and growing) | Everything else
Data Volume | 100+ Terabyte range | Petabyte range
Real-time Analysis | Seconds | Milliseconds
Analytics | Pre-built, primarily rule-based | Custom, learning
Required Expertise | Average: Security practitioners | Skilled: Data scientists and analysts
17. Organizations have a growing need to identify and protect
against threats by building insights from broader and
larger data sets
18. A Typical Threat Example
[Figure: the attack flow from the earlier "A Typical Threat Example" slides, annotated with monitoring points:]
a) Monitor DNS
b) Monitor NetFlow
c) Monitor Port & Protocol Usage
d) Monitor Web Traffic
20. Streaming Analytics
[Figure: Real-Time Streaming Analytics Setup. A Monitored Network is connected to The Rest Of The World (Internet) through an inline Firewall and IDS/IPS; data sources shown include DNS, DHCP, and IDS/IPS Alerts…]
– Inline Firewall and IDS/IPS: Detect Signatures within Individual Data Streams
– Real-Time Cyber Security Analytics: Detects behaviors by correlating across diverse & massive data streams via Analytics in Motion
– Models learnt offline with Analytics on Data at Rest
21. Streaming Analytics for Fast-flux Botnets
[Figure: streaming pipeline for detecting fast-flux botnets]
– DNS Response Records feed several parallel FastFlux Analytics operators, which emit Candidate Names/IP's with Confidence Values
– An Aggregator combines the candidates into Suspected Fast-flux Domain Names and Suspected Fast-Flux IP-addresses
– Join with DNS Queries (with internal querying host IP Addresses)
– Join with DHCP Traffic (IP MAC System/Owner) to produce Fast-fluxing Bot alerts
– Join with Host Logs, IPS Alerts, …, Netflow
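The pipeline on this slide can be approximated in a few lines of batch code. This is a minimal sketch under stated assumptions, not the InfoSphere Streams application from the talk: it collapses the parallel FastFlux Analytics operators and the Aggregator into one function, and the record formats, the function names `score_fast_flux` and `bot_alerts`, the thresholds, and the confidence formula are all hypothetical.

```python
from collections import defaultdict


def score_fast_flux(responses, min_ips=5, max_ttl=300):
    """Return {domain: confidence} for names that resolve to many IPs with short TTLs.

    responses: iterable of (domain, resolved_ip, ttl_seconds) DNS response records.
    """
    ips = defaultdict(set)
    lowest_ttl = {}
    for domain, ip, ttl in responses:
        ips[domain].add(ip)
        lowest_ttl[domain] = min(ttl, lowest_ttl.get(domain, ttl))

    suspects = {}
    for domain, addresses in ips.items():
        if len(addresses) >= min_ips and lowest_ttl[domain] <= max_ttl:
            # Crude confidence in [0.5, 1): grows with the number of distinct IPs.
            suspects[domain] = 1.0 - min_ips / (len(addresses) + min_ips)
    return suspects


def bot_alerts(suspects, queries, dhcp_leases):
    """Join suspected names with DNS queries and DHCP leases to name the internal hosts.

    queries: iterable of (client_ip, domain); dhcp_leases: {client_ip: (mac, owner)}.
    """
    alerts = []
    for client_ip, domain in queries:
        if domain in suspects:
            mac, owner = dhcp_leases.get(client_ip, (None, None))
            alerts.append({"client_ip": client_ip, "mac": mac, "owner": owner,
                           "domain": domain, "confidence": suspects[domain]})
    return alerts


# Illustrative records using documentation address ranges.
responses = [("flux.example", f"203.0.113.{i}", 60) for i in range(1, 7)]
responses.append(("www.example", "192.0.2.10", 3600))
queries = [("10.0.0.5", "flux.example"), ("10.0.0.6", "www.example")]
dhcp_leases = {"10.0.0.5": ("00:00:5e:00:53:01", "alice-laptop")}

suspects = score_fast_flux(responses)
alerts = bot_alerts(suspects, queries, dhcp_leases)
print(alerts)
```

The DHCP join matters because an IP address alone does not identify a machine over time; mapping it to a MAC address and owner is what turns a suspicious lookup into an actionable alert.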
26. DNS Amplification Attack
Key characteristics: 1) Targeted attack victimizing hosts & servers 2) DNS service provider becomes a
participant and unavailable during attack 3) Attack attribution is hard
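One way to spot a host on the receiving end of a DNS amplification attack is to look for large volumes of DNS responses that the host never asked for. The sketch below assumes a vantage point at the victim's network edge, where the spoofed queries are not visible; the packet record format, the function name `amplification_victims`, and both thresholds are assumptions for illustration, not details from the talk.

```python
from collections import defaultdict


def amplification_victims(packets, min_response_bytes=10_000, min_ratio=10.0):
    """Return {host: response/query byte ratio} for hosts flooded with DNS responses.

    packets: iterable of (kind, src_ip, dst_ip, size_bytes), kind is "query" or "response".
    """
    sent = defaultdict(int)      # query bytes each host actually sent
    received = defaultdict(int)  # response bytes each host received
    for kind, src, dst, size in packets:
        if kind == "query":
            sent[src] += size
        elif kind == "response":
            received[dst] += size

    victims = {}
    for host, response_bytes in received.items():
        ratio = response_bytes / max(sent[host], 1)  # avoid division by zero
        if response_bytes >= min_response_bytes and ratio >= min_ratio:
            victims[host] = ratio
    return victims


# Illustrative traffic: one host gets 50 large unsolicited responses, another behaves normally.
packets = [("response", "192.0.2.53", "198.51.100.7", 3000)] * 50
packets += [("query", "198.51.100.8", "192.0.2.53", 60)] * 20
packets += [("response", "192.0.2.53", "198.51.100.8", 200)] * 20

victims = amplification_victims(packets)
print(victims)
```

This identifies the victim, not the attacker: because the queries carry a forged source address, attribution remains hard, as the slide notes.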
Editor's notes
This slide shows a timeline of events during the first half of 2011: a number of different attacks against major organizations, many of which we feel are probably pretty operationally competent. It is not surprising that some of these organizations were breached. We also relate the attack vector as best we understand it, based on what's been publicly disclosed. And we have a conjecture about the impact of each breach from a financial standpoint; that's a rough estimate based on what's been publicly disclosed. So those numbers are certainly not to be bet on, but it's as good as we can do based on what we know.
The Open Security Foundation reported a 40% increase in breach events for 2012, covering loss, theft, and exposure of personally identifiable information.
Key Points
– Integrate v3: the point is to have one platform to manage all of the data. There's no point in having separate silos of data, each creating separate silos of insight. From the customer POV (a solution POV), big data has to be bigger than just one technology.
– Analyze v3: very important point. We see big data as a viable place to analyze and store data. New technology is not just a pre-processor to get data into a structured DW for analysis. This is a significant area of value add by IBM, and the game has changed: unlike DBs/SQL, the market is asking who gets the better answer, and therefore the sophistication and accuracy of the analytics matter.
– Visualization: need to bring big data to the users; the spreadsheet metaphor is the key to doing so.
– Development: need sophisticated development tools for the engines and across them to enable the market to develop analytic applications.
– Workload optimization: improvements upon open source for efficient processing and storage.
– Security and Governance: many are rushing into big data like the wild west. But there is sensitive data that needs to be protected, and retention policies need to be determined. All of the maturity of governance for the structured world can benefit the big data world.
What we are monitoring: more than 12,000 systems. We have about 12,000 unique MAC addresses in our database, and we can only obtain MAC addresses for part of the systems we monitor (mostly systems using DHCP), since we do not yet connect to the infrastructure that assigns fixed IP addresses. We added ARP monitoring to correlate static IP addresses with their MAC addresses, but we see the ARP traffic only partially, since the taps are located at the network boundaries. To give you an idea, we track about 200,000 to 600,000 unique domain names per day, and 20K to 120K unique domain names per hour.
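Figures like the unique-domain counts above could be produced by bucketing observed queries by hour. The following is a small sketch of that bookkeeping; the `(unix_timestamp, domain)` record format and the function name `unique_domains_per_hour` are assumptions, and a deployment at the volumes quoted here would more likely use a streaming or approximate counter.

```python
from collections import defaultdict
from datetime import datetime, timezone


def unique_domains_per_hour(queries):
    """Count distinct queried names per UTC hour.

    queries: iterable of (unix_timestamp, domain) pairs.
    """
    buckets = defaultdict(set)
    for timestamp, domain in queries:
        hour = datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y-%m-%d %H:00")
        # Normalize case and the optional trailing dot so equivalent names count once.
        buckets[hour].add(domain.lower().rstrip("."))
    return {hour: len(names) for hour, names in sorted(buckets.items())}


queries = [(0, "a.example"), (10, "A.example."), (20, "b.example"), (3600, "a.example")]
per_hour = unique_domains_per_hour(queries)
print(per_hour)
```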