In this presentation, Peter Starceski discussed artificial intelligence and machine learning and how they have been applied to the cybersecurity industry. He highlighted how leveraging artificial intelligence and machine learning gives defenders an advantage they have never had until now. Peter shared examples of how machine learning has proven successful at stopping zero-days and preventing ransomware before any legacy solution could. He examined the shifting nature of the threat landscape and how to move beyond signature-based threat detection toward a mathematical, algorithmic, and scientific approach to disarming a threat.
2. Agenda
• Introductions – Myself / Bio and Audience
• Review Current Security Risks and Threats
• Discuss Security Threat Vectors
• Definition of Machine Learning (ML)
• Some Current Practical ML Use Cases
• How Does Data Science / ML Work
• Future of Security
• Q&A
4. Introductions – Audience
• Your Name, Company and Role
• Years of Experience in Information Technology?
• Years of Experience in Information Security?
6. Malware is used in 90% of cyber incidents
Hackers are modifying their code to avoid detection - 99% of malware hashes are seen for 58 seconds or less
New or changed hashes render traditional AV totally useless
2016 VERIZON DBIR
Source: http://www.verizonenterprise.com/verizon-insights-lab/dbir/2016/
10. THE THREAT LANDSCAPE
There are three core methods that allow attackers to get into systems:
• Execution (99%)
• Identity (~50%)
• Resource Starvation - DDoS (<5%)
Diagram labels: Phish, USB, Malware, Exploits, 0-day, APTs, Adware, Spyware, Ransomware, DOC/XLS, Scripts, Web, Waterholing, SQL, Authentication, CASB, Surveillance, MITM, Encryption, DLP, VPN, Firewall, Authentication, DNS, TCP, UDP, ARP, Unicast, Web (80/443), Broadcast, Multicast, IP
11. Some Common Threat Vectors Used By Malware
• Network Edge
• Web
• Email
• Portable Devices and Drives
• Endpoints / Users
12. ENDPOINT KILL CHAIN
Areas to protect: RUNNING PROCESSES, FILE SYSTEM, PROGRAM EXECUTION
Threats: MEMORY EXPLOITS, MALICIOUS SCRIPTS, MALICIOUS MACROS
Controls: WATCH FOR NEW FILES, BACKGROUND THREAT DETECTION, MONITOR PROCESS EXECUTION, MONITOR LIBRARY LOADS, BLOCK MALWARE PRE-EXECUTION, APPLICATION CONTROL
13. DATA SCIENCE & MACHINE LEARNING
ARE HUMANS EQUIPPED TO DEAL WITH MACHINE LEARNING?
MACHINES ARE MORE EQUIPPED TO DEAL WITH MACHINE LEARNING …
14. DEFINITION: MACHINE LEARNING
Machine learning is the subfield of computer science that gives computers the ability to learn without being explicitly programmed.
Machine learning is closely related to (and often overlaps with) computational statistics, which also focuses on prediction-making through the use of computers.
Within the field of data analytics, machine learning is a method used to devise complex models and algorithms that lend themselves to prediction; in commercial use, this is known as predictive analytics.
Machine learning focuses on prediction based on properties learned from earlier data.
15. SOME PRACTICAL EXAMPLES: MACHINE LEARNING
Amazon, Uber, Facebook, Pandora, etc.
Spam filtering
Optical Character Recognition (OCR)
Speech Recognition (e.g. Apple Siri)
Internet Search Engines: Google, Bing and Yahoo! Search
Computer Vision
Space, Astronomy and Robotics
16. HOW DOES IT WORK? DATA SCIENCE AND MACHINE LEARNING
COLLECT
EXTRACT
TRANSFORM, VECTORIZE AND TRAIN
X = [63796c616e6365]
X = [70726576656e74]
X = [70726f74656374]
CLASSIFY AND CLUSTER
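The X values on this slide are hex-encoded bytes. As a rough illustration of what "vectorize" means here, the sketch below turns each hex string into a list of integer byte values that a statistical model could consume. The function name `to_feature_vector` is made up for this example, and real feature extraction is of course far richer than raw bytes.

```python
# The three X values exactly as they appear on the slide.
slide_vectors = ["63796c616e6365", "70726576656e74", "70726f74656374"]


def to_feature_vector(hex_string: str) -> list[int]:
    """Turn a hex string into a list of integer byte values (0-255)."""
    return list(bytes.fromhex(hex_string))


for hex_string in slide_vectors:
    vector = to_feature_vector(hex_string)
    # The bytes also happen to be printable ASCII, so decode them for display.
    text = bytes(vector).decode("ascii")
    print(hex_string, vector, text)
```

Decoded as ASCII, the three values spell "cylance", "prevent" and "protect".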
17. THE FUTURE OF SECURITY
Timeline: Past, Present, Future
Technologies: AV, HIPS / Anti-Exploitation, Sandboxing, Isolation, EDR, AI
Labels: Humans Needed; Specialized Humans Needed, Post-Execution; No Humans, Pre-Execution
New malware programs continue to increase exponentially. This statistic comes from AV-TEST.org and reflects their daily registrations of new malware programs … the actual numbers for new malware are even larger than this.
Src: http://www.verizonenterprise.com/verizon-insights-lab/dbir/2016/
The 2016 Verizon Data Breach Investigations Report mentions 1,500 known ‘malware-related’ breaches.
The report further reveals …
“Analysis of one of our larger datasets showed that 99% of malware hashes are seen for only 58 seconds or less. In fact, most malware was seen only once. This reflects how quickly hackers are modifying their code to avoid detection.”
Many of these are PoS breaches; many are ransomware.
Lifespan data supports the one-time use of malware. That is, many targeted, threat-related malware artifacts will never be seen in the wild and will not be intercepted by the traditional AV companies for analysis.
When malware is involved in an attack, the attackers are able to act swiftly and dynamically to ensure persistence, evasion, and long-term success.
Adversaries modify code to avoid detection by signatures
Of 3.8M samples, 20K existed in more than one organization
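To make the point about hashes concrete, here is a minimal sketch of why a hash-based signature misses a modified sample. The byte strings are harmless stand-ins, and `known_bad_hashes` and `signature_match` are names invented for this illustration; no real AV product is being modeled.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Harmless stand-in bytes, not real malware.
original = b"MZ" + b"\x90" * 64 + b"payload-bytes"
# The "attacker" appends a single padding byte; behavior would be unchanged.
variant = original + b"\x00"

# A signature database that only knows the original sample's hash.
known_bad_hashes = {sha256_hex(original)}


def signature_match(sample: bytes) -> bool:
    """Hash-lookup detection: flag a sample only if its exact hash is known."""
    return sha256_hex(sample) in known_bad_hashes


print(signature_match(original))  # True
print(signature_match(variant))   # False
```

One changed byte yields a completely different hash, so every trivial modification needs a new signature.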
Let's review an example of such an attack that uses spear phishing on said company, Big Ideas …
The CEO is being spear phished, using public intel, by someone pretending to be Carol ...
The CEO receives an email with an MS Word doc from this threat actor and opens it ... As it turns out, the CEO is infected with ransomware ... Let's look at this process a bit more closely.
Reviewing the process, we can see that the MS Word doc also contained a VBS file that launched a PowerShell script to begin the ransomware infection/encryption process …
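The Word doc to VBS to PowerShell chain described here can be expressed as a simple parent/child process rule. The sketch below is only an illustration: the `Process` class, the two name sets and `is_suspicious` are hypothetical, and a real endpoint product would read process ancestry from the operating system rather than from hand-built objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Process:
    """Hypothetical process record: an image name and an optional parent."""
    name: str
    parent: Optional["Process"] = None


# Illustrative name sets; a real rule set would be much broader.
OFFICE_APPS = {"winword.exe", "excel.exe"}
SCRIPT_HOSTS = {"wscript.exe", "cscript.exe", "powershell.exe"}


def ancestry(proc: Process) -> list[str]:
    """Return lower-cased process names from proc up to its oldest ancestor."""
    names = []
    current: Optional[Process] = proc
    while current is not None:
        names.append(current.name.lower())
        current = current.parent
    return names


def is_suspicious(proc: Process) -> bool:
    """Flag a script host that has an Office application among its ancestors."""
    chain = ancestry(proc)
    if chain[0] not in SCRIPT_HOSTS:
        return False
    return any(name in OFFICE_APPS for name in chain[1:])


# The chain from the example: Word -> VBS host -> PowerShell.
word = Process("WINWORD.EXE")
vbs = Process("wscript.exe", parent=word)
powershell = Process("powershell.exe", parent=vbs)
print(is_suspicious(powershell))  # True
```

A PowerShell session started from the desktop shell would not match this rule, since no Office application appears in its ancestry.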
Is everyone aware of how big an issue ransomware is? It is big business for those who operate in the underground ... Let's look at some more data ...
The revenue from the Angler exploit kit alone, at approximately $300 per ransom, is roughly $34M/year. That’s from just one type of ransomware … what about the others? The underground also continues to operate Ransomware as a Service (RaaS) for all threat actor organizations to use ... Together this represents almost a $1B/year industry.
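As a back-of-the-envelope check on these figures, the sketch below works out how many ransom payments the numbers imply. It assumes every payment is exactly $300 and treats "almost $1B" as a round $1B, so the results are rough orders of magnitude and not reported data.

```python
# Figures quoted in the notes above.
ransom_usd = 300
angler_revenue_usd_per_year = 34_000_000
# "Almost $1B" is rounded up to $1B here, which is an assumption.
industry_usd_per_year = 1_000_000_000

# Number of payments needed to reach the quoted yearly revenue.
implied_payments_per_year = angler_revenue_usd_per_year / ransom_usd
# Share of the overall market that the quoted Angler revenue would represent.
angler_share = angler_revenue_usd_per_year / industry_usd_per_year

print(f"{implied_payments_per_year:,.0f} payments per year")  # 113,333 payments per year
print(f"{angler_share:.1%} of a $1B/year market")             # 3.4% of a $1B/year market
```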
There are three core methods that allow attackers to get into systems:
Execution – 99% of attacks use this method
Identity – about 50% of attacks use this method
Resource starvation, i.e. Distributed Denial of Service (DDoS) – fewer than 5% of attacks use this method
Let’s review some common threat vectors used by malware … list them ... Audience poll: what types of endpoint security or AV solutions are you familiar with … list them on whiteboard/easel.
All of these solutions are challenged to protect against and prevent these sorts of threats ... How can we break this process and deal with this?
For the threat vectors along this attack chain, the security solutions that are part of the kill chain must deal with these sorts of threats by protecting running processes, the file system and program execution.
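One of the file-system controls on the kill chain slide is "watch for new files". A minimal way to picture it is to compare two snapshots of a directory, as sketched below. The helper names `snapshot` and `new_files` are invented for this example, and a production agent would hook operating-system file events instead of polling.

```python
import os
import tempfile


def snapshot(directory: str) -> set[str]:
    """Return the names of the regular files currently in the directory."""
    with os.scandir(directory) as entries:
        return {entry.name for entry in entries if entry.is_file()}


def new_files(before: set[str], after: set[str]) -> set[str]:
    """Return the names present in the later snapshot but not the earlier one."""
    return after - before


# Demonstrate on a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as watched:
    before = snapshot(watched)
    # Simulate a document being dropped into the watched folder.
    with open(os.path.join(watched, "invoice.doc"), "wb") as handle:
        handle.write(b"placeholder")
    after = snapshot(watched)
    created = new_files(before, after)
    print(sorted(created))  # ['invoice.doc']
```

Each newly seen file could then be handed to a pre-execution check before it is allowed to run.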
Now to the meat of my discussion … there is a lot of buzz around machine learning these days, especially in online retail, cloud and security. Let me clarify some of this … but first let's review a definition of machine learning ...
Evolved from the study of pattern recognition and computational learning theory in artificial intelligence,[2] machine learning explores the study and construction of algorithms that can learn from and make predictions on data[3] – such algorithms overcome following strictly static program instructions by making data driven predictions or decisions,[4]:2 through building a model from sample inputs. Machine learning is employed in a range of computing tasks where designing and programming explicit algorithms is infeasible; example applications include spam filtering, detection of network intruders or malicious insiders working towards a data breach,[5] optical character recognition (OCR),[6] search engines and computer vision.
These analytical models allow researchers, data scientists, engineers, and analysts to "produce reliable, repeatable decisions and results" and uncover "hidden insights" through learning from historical relationships and trends in the data.[11]
Much like a DNA analysis or an actuarial review, file analysis starts with the collection of a massive amount of data: in this case, files of specific types (executables, PDFs, Microsoft Word® documents, Java, Flash, etc.). We collect hundreds of millions of files (even billions) from industry ‘feeds,’ proprietary organizational repositories and live inputs from active computers with Cylance agents on them.
Once these files are collected, they’re normalized, reviewed and placed into three buckets: known and verified valid, known and verified malicious, and unknown.
Then they’re converted to numerical values that can be used in statistical models. It’s here where vectorization and machine learning are applied to eliminate the human impurities and speed analytical processing. Leveraging the millions of attributes of files identified in extraction, Cylance mathematicians then develop statistical models that accurately predict whether a file is valid or malicious.
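The collect, extract, vectorize, train and classify steps described above can be sketched end to end on toy data. This is a simplified illustration and not Cylance's model: the two features (byte entropy and share of printable bytes), the nearest-centroid classifier and the synthetic samples are all assumptions chosen to keep the example short.

```python
import math
import random
from collections import Counter


def extract_features(data: bytes) -> list[float]:
    """Vectorize a file: [byte entropy scaled to 0-1, share of printable bytes]."""
    if not data:
        return [0.0, 0.0]
    total = len(data)
    counts = Counter(data)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    printable = sum(1 for b in data if 32 <= b < 127) / total
    return [entropy / 8.0, printable]


def train(labeled: list[tuple[bytes, str]]) -> dict[str, list[float]]:
    """Train a nearest-centroid model: the mean feature vector per label."""
    by_label: dict[str, list[list[float]]] = {}
    for data, label in labeled:
        by_label.setdefault(label, []).append(extract_features(data))
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in by_label.items()
    }


def classify(model: dict[str, list[float]], data: bytes) -> str:
    """Assign the label whose centroid is closest to the sample's features."""
    features = extract_features(data)
    return min(model, key=lambda label: math.dist(features, model[label]))


# Collect: synthetic stand-ins for the "known valid" and "known malicious" buckets.
rng = random.Random(0)
valid = [(f"report {i} quarterly numbers and notes ".encode() * 10, "valid")
         for i in range(5)]
malicious = [(bytes(rng.randrange(256) for _ in range(400)), "malicious")
             for _ in range(5)]

model = train(valid + malicious)

# Classify: samples from the "unknown" bucket.
unknown_text = b"meeting agenda and minutes for the team " * 10
unknown_noise = bytes(rng.randrange(256) for _ in range(400))
print(classify(model, unknown_text))   # valid
print(classify(model, unknown_noise))  # malicious
```

The high-entropy random bytes merely stand in for packed or encrypted content; a production model would rely on far more attributes than these two.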
So to summarize: in the past, AV might have been enough to deal with a limited landscape of threats, and human intervention was required. Today, layers of security protection within a threat vector area still struggle to keep pace with security threats, and they act post-execution. The future of security protection and prevention must include ML and AI technology to increase your ability to stop security threats pre-execution within your environments.
As you work through the layers of protection in your security program, remember that machine learning and AI already have many practical uses in our daily lives. Think differently about including this innovative technology in your security protection and prevention strategy; it will reduce your need for incident and breach response activities.
TY