Real time analysis
and visualization
ANUBISNETWORKS LABS
PTCORESEC
Agenda
 Who are we?
 AnubisNetworks Stream
 Stream Information Processing
 Adding Valuable Information to Stream Events
Who are we?
 Tiago Martins
 AnubisNetworks
 @Gank_101
 João Gouveia
 AnubisNetworks
 @jgouv
 Tiago Henriques
 Centralway
 @Balgan
Anubis StreamForce
 Events (lots and lots of events)
 Events are “volatile” by nature
 They exist only if someone is listening
 Remember?:
“If a tree falls in a forest and no one is
around to hear it, does it make a
sound?”
Anubis StreamForce
 Enter security Big Data
“a brave new world”
(Diagram: the three Vs – Volume, Variety, Velocity – with a "we are here" marker.)
Anubis StreamForce
 Problems (and ambitions) to tackle
 The huge amount and variety of data to process
 Mechanisms to share data across multiple systems,
organizations, teams, companies..
 Common API for dealing with all this (both from a
producer and a consumer perspective)
Anubis StreamForce
 Enter the security events CEP - StreamForce
High-performance, scalable Complex Event
Processor (CEP) – 1 node (commodity hardware) = 50k
evt/second
Uses streaming technology
Follows a publish/subscribe model
Anubis StreamForce
 Data format
Events are published in JSON format
Events are consumed in JSON format
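Since events are published and consumed as JSON over a stream, a consumer has to reassemble complete JSON documents from arbitrary network chunks. A minimal sketch of such a newline-delimited parser, assuming one event per line (the helper name and framing are our illustration, not the actual StreamForce client):

```javascript
// Sketch: reassembling newline-delimited JSON events from stream chunks.
// Chunks may split an event in the middle, so keep a carry buffer.
function makeStreamParser(onEvent) {
  let buffer = "";
  return function feed(chunk) {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop(); // keep trailing partial line for the next chunk
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed.length === 0) continue;
      onEvent(JSON.parse(trimmed)); // one complete JSON event per line
    }
  };
}
```

In NodeJS this would typically be wired to an `http.get` response via `res.on('data', feed)`.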
Anubis StreamForce
 Yes, we love JSON
Anubis StreamForce
Sharing Models
(Diagram: sources – sinkholes, data-theft trojans, IP reputation, passive DNS, traps/honeypots, Twitter, MFE, OpenSource/MailSpike community – feed the Complex Event Processing engine, which drives real-time feeds and dashboards.)
Anubis CyberFeed
 Feed galore!
Sinkhole data, traps, IP reputation, etc.
 Bespoke feeds (create your own view)
 Measure, group, correlate, de-duplicate...
 High volume (usually ~6,000 events per second), with more data added frequently
Anubis CyberFeed
 Apps (demo time)
Stream Information Processing
 Collecting events from the Stream.
 Generating reports.
 Real time visualization.
Challenge
 ~6k events/s and at peak over 10k events/s.
 Let's focus on the trojans feed (banktrojan).
 Peaks at ~4k events/s
{"_origin":"banktrojan","env":{"server_name":"anam0rph.su","remote_addr":"46.247.141.66","path_info":"/in.php","request_method":"POST","http_user_agent":"Mozilla/4.0"},"data":"upqchCg4slzHEexq0JyNLlaDqX40GsCoA3Out1Ah3HaVsQj45YCqGKylXf2Pv81M9JX0","seen":1379956636,"trojanfamily":"Zeus","_provider":"lab","hostn":"lab14","_ts":1379956641}
Challenge
 Let's use the Stream to help
 Group by machine and trojan
 From peak ~4k/s to peak ~1k/s
 Filter fields.
 Geo location
 We end up with
{"env":{"remote_addr":"207.215.48.83"},"trojanfamily":"W32Expiro","_geo_env_remote_addr":{"country_code":"US","country_name":"United States","city":"Los Angeles","latitude":34.0067,"longitude":-118.3455,"asn":7132,"asn_name":"AS for SBIS-AS"}}
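Grouping by machine and trojan over a short window is what drops the rate from ~4k/s to ~1k/s. A minimal sketch of that de-duplication, assuming the event fields shown above (the helper itself is our illustration of the Stream's grouping, not its actual implementation):

```javascript
// Sketch: de-duplicate (remote_addr, trojanfamily) pairs within a time window.
// Events carry a "seen" unix timestamp in seconds, as in the sample above.
function dedupeWindow(events, windowMs) {
  const lastSeen = new Map(); // key -> last emitted timestamp (ms)
  const out = [];
  for (const ev of events) {
    const key = ev.env.remote_addr + "|" + ev.trojanfamily;
    const ts = ev.seen * 1000;
    const prev = lastSeen.get(key);
    if (prev === undefined || ts - prev >= windowMs) {
      lastSeen.set(key, ts); // start a new window for this machine+trojan
      out.push(ev);
    }
  }
  return out;
}
```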
Challenge
 How to process and store these events?
Technologies
 Applications
 NodeJS
 Server-side JavaScript platform.
 V8 JavaScript engine.
 http://nodejs.org/
Why?
 Great for prototyping.
 Fast and scalable.
 Modules for (almost) everything.
Technologies
 Databases
 MongoDB
 NoSQL Database.
 Stores JSON-style documents.
 GridFS
 http://www.mongodb.org/
Why?
 JSON from the
Stream, JSON in the
database.
 Fast and scalable.
 Redis
 Key-value storage.
 In-memory dataset.
 http://redis.io/
Why?
 Faster than MongoDB for certain operations, like keeping track of the number of infected machines.
 Very fast and scalable.
Data Collection
 Applications
 Collector
 Worker
 Processor
 Databases
 MongoDB
 Redis
(Architecture diagram: Stream → Collector → Workers → MongoDB/Redis storage, with the Processor aggregating real-time information.)
Data Collection
 Events come from the Stream.
 The Collector distributes events to Workers.
 Workers persist event information.
 The Processor aggregates information and stores it for statistical and historical analysis.
Data Collection
 MongoDB
 Real-time information on infected machines.
 Historical aggregated information.
 Redis
 Real-time counters of infected machines.
Data Collection - Collector
Collector
 Old data is periodically removed, e.g. machines that don't produce events for more than 24 hours.
 Sends events to Workers.
 Decrements counters of removed information.
 Sends warnings
 Country / ASN is no longer infected.
 Botnet X decreased Y % of its size.
Data Collection - Worker
Worker
 Creates new entries for unseen machines.
 Adds information about new trojans / domains.
 Updates the last time the machine was seen.
 Processes events and updates the Redis counters accordingly.
 Needs to check MongoDB to determine if:
 New entry – all counters incremented
 Existing entry – increment only the counters related to that trojan
 Sends warnings
 Botnet X increased Y % in its size.
 New infections seen in Country / ASN.
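The Worker's branching above can be sketched with plain Maps standing in for MongoDB (machine entries) and Redis (counters). The counter key layout mirrors the "trojan", "country" and "trojan:country" counters shown later; the exact keys and which counters bump on an existing machine are our assumptions:

```javascript
// Sketch of the Worker's bookkeeping: Map "machines" plays MongoDB,
// Map "counters" plays Redis.
function processEvent(machines, counters, ev) {
  const bump = k => counters.set(k, (counters.get(k) || 0) + 1);
  let entry = machines.get(ev.ip);
  if (!entry) {
    // New entry: all counters incremented.
    entry = { trojans: new Set([ev.trojan]), last: ev.ts };
    machines.set(ev.ip, entry);
    bump(ev.trojan);
    bump(ev.country);
    bump(ev.trojan + ":" + ev.country);
  } else if (!entry.trojans.has(ev.trojan)) {
    // Existing machine, new trojan: only the trojan-related counters.
    entry.trojans.add(ev.trojan);
    bump(ev.trojan);
    bump(ev.trojan + ":" + ev.country);
  }
  entry.last = ev.ts; // always refresh last-seen
}
```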
Data Collection - Processor
Processor
 Processor retrieves real time counters from Redis.
 Information is processed by:
 Botnet;
 ASN;
 Country;
 Botnet/Country;
 Botnet/ASN/Country;
 Total.
 Persisting information to MongoDB creates a historic
database of counters that can be queried and
analyzed.
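The roll-up the Processor performs can be sketched from the flat "trojan:country" counters: one pass produces the per-botnet, per-country and total views. The key format matches the Redis dump shown later; the aggregation code itself is our illustration:

```javascript
// Sketch: roll flat "trojan:country" counters up into per-trojan,
// per-country and total views before persisting to MongoDB.
function aggregate(flat) {
  const byTrojan = {}, byCountry = {};
  let total = 0;
  for (const [key, value] of Object.entries(flat)) {
    const [trojan, country] = key.split(":");
    const n = parseInt(value, 10); // Redis returns counters as strings
    byTrojan[trojan] = (byTrojan[trojan] || 0) + n;
    byCountry[country] = (byCountry[country] || 0) + n;
    total += n;
  }
  return { byTrojan, byCountry, total };
}
```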
Data Collection - MongoDB
 Collection for active machines in the last 24h
{
"city" : "Philippine",
"country" : "PH",
"region" : "N/A",
"geo" : {
"lat" : 16.4499,
"lng" : 120.5499
},
"created" : ISODate("2013-09-21T00:19:12.227Z"),
"domains" : [
{ "domain" : "hzmksreiuojy.nl",
"trojan" : "zeus",
"last" : ISODate("2013-09-21T09:42:56.799Z"),
"created" : ISODate("2013-09-21T00:19:12.227Z") }
],
"host" : "112.202.37.72.pldt.net",
"ip" : "112.202.37.72",
"ip_numeric" : 1892296008,
"asn" : "Philippine Long Distance Telephone Company",
"asn_code" : 9299,
"last" : ISODate("2013-09-21T09:42:56.799Z"),
"trojan" : [ "zeus" ]
}
Data Collection - MongoDB
 Collection for aggregated information (the historic counters database)
{
"_id" : ObjectId("519c0abac1172e813c004ac3"),
"0" : 744,
"1" : 745,
"3" : 748,
"4" : 748,
"5" : 746,
"6" : 745,
...
"10" : 745,
"11" : 742,
"12" : 746,
"13" : 750,
"14" : 753,
...
"metadata" : {
"country" : "CH",
"date" : "2013-05-22T00:00:00+0000",
"trojan" : "conficker_b",
"type" : "daily"
}
}
Entries for each hour are preallocated when the document is created.
If we don't preallocate, MongoDB keeps extending the documents, adding thousands of entries every hour, and becomes very slow.
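The preallocation trick amounts to writing all hourly slots up front. A minimal sketch of building such a daily counters document, matching the aggregated-collection shape above (the helper name and zero-fill are our assumptions):

```javascript
// Sketch: pre-allocate one slot per hour so MongoDB never has to grow
// the document in place as hours are filled in.
function newDailyDoc(trojan, country, date) {
  const doc = { metadata: { country, date, trojan, type: "daily" } };
  for (let h = 0; h < 24; h++) doc[String(h)] = 0; // "0".."23", one per hour
  return doc;
}
```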
Data Collection - MongoDB
 Collection for 24 hours
 4 MongoDB shard instances
 >3 million infected machines
 ~2 GB of data
 ~558 bytes per document
 Indexes by
 ip – helps inserts and updates.
 ip_numeric – enables queries by CIDRs.
 last – faster removes for expired machines.
 host – hmm, is there any .gov? 
 country, family, asn – speeds up MongoDB queries and also allows faster custom queries.
 Collection for aggregated information
 Data for 119 days (25 May to 11 July)
 >18 million entries
 ~6.5 GB of data
 ~366 bytes per object
 ~56 MB per day
 Indexes by
 metadata.country
 metadata.trojan
 metadata.date
 metadata.asn
 metadata.type, metadata.country, metadata.date, ... (all)
Data Collection - Redis
 Counters by Trojan / Country
"cutwailbt:RO": "1256",
"rbot:LA": "3",
"tdss:NP": "114",
"unknown4adapt:IR": "100",
"unknownaff:EE": "0",
"cutwail:CM": "20",
"unknownhrat3:NZ": "56",
"cutwailbt:PR": "191",
"shylock:NO": "1",
"unknownpws:BO": "3",
"unknowndgaxx:CY": "77",
"fbhijack:GH": "22",
"pushbot:IE": "2",
"carufax:US": "424"
 Counters by Trojan
"unknownwindcrat": "18",
"tdss": "79530",
"unknownsu2": "2735",
"unknowndga9": "15",
"unknowndga3": "17",
"ircbot": "19874",
"jshijack": "35570",
"adware": "294341",
"zeus": "1032890",
"jadtre": "40557",
"w32almanahe": "13435",
"festi": "1412",
"qakbot": "19907",
"cutwailbt": "38308"
 Counters by Country
"BY": "11158",
"NA": "314",
"BW": "326",
"AS": "35",
"AG": "94",
"GG": "43",
"ID": "142648",
"MQ": "194",
"IQ": "16142",
"TH": "105429",
"MY": "35410",
"MA": "15278",
"BG": "15086",
"PL": "27384"
33
Data Collection - Redis
 Redis performance on our machine
 SET: 473,036.88 requests per second
 GET: 456,412.59 requests per second
 INCR: 461,787.12 requests per second
 Time to get real-time data
 Getting all the data from Families/ASN/Counters into the NodeJS application, ready to be processed, takes around half a second
 >120,000 entries in… (very fast)
 Our current usage is
 ~3% CPU (of a 2.0 GHz core)
 ~480 MB of RAM
Data Collection - API
 But! There is one more application..
 How to easily retrieve stored data
 MongoDB's REST API is a bit limited.
 NodeJS HTTP + MongoDB + Redis
 Redis
 http://<host>/counters_countries
 ...
 MongoDB
 http://<host>/family_country
 ...
 Custom MongoDB queries
 http://<host>/ips?f.ip_numeric=95.68.149.0/22
 http://<host>/ips?f.country=PT
 http://<host>/ips?f.host=bgovb
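The `f.ip_numeric=95.68.149.0/22` query above works because machines are stored with a numeric IP, so a CIDR becomes a simple range filter on the indexed `ip_numeric` field. A sketch of that translation, assuming the field names from the slides (the helpers are illustrative, not the actual API code):

```javascript
// Sketch: turn a dotted IP into the numeric form stored in ip_numeric.
function ipToNumeric(ip) {
  return ip.split(".").reduce((acc, o) => acc * 256 + parseInt(o, 10), 0);
}

// Sketch: turn a CIDR like "95.68.149.0/22" into a MongoDB range filter
// suitable for { ip_numeric: cidrToRange(cidr) }.
function cidrToRange(cidr) {
  const [ip, bitsStr] = cidr.split("/");
  const size = Math.pow(2, 32 - parseInt(bitsStr, 10)); // addresses in block
  const base = Math.floor(ipToNumeric(ip) / size) * size; // align to block start
  return { $gte: base, $lte: base + size - 1 };
}
```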
Data Collection - Limitations
 Grouping information by machine and trojan doesn't allow us to study the real number of events per machine.
 It can still be useful to get an idea of the botnet operations or how many machines are behind a single IP (everyone is behind a router).
 A slow MongoDB impacts everything
 The Worker application needs to tolerate a slow MongoDB and discard some information as a last resort.
 Beware of slow disks! Data persistence occurs every 60 seconds (default) and can take too much time, with a real impact on performance.
 >10s to persist is usually very bad; something is wrong with the hard drives.
Data Collection - Evolution
 Warnings
 Which warnings to send? When? Thresholds?
 Aggregate data by week, month, year.
 Aggregate information in shorter intervals.
 Data Mining algorithms applied to all the collected information.
 Apply the same principles to other feeds of the Stream.
 Spam
 Twitter
 Etc..
Reports
 What's happening in country X?
 What about network 192.168.0.1/24?
 Can you send me the report of Y every day at 7 am?
 Ohh!! Remember the report I asked for last week?
 Can I get a report for ASN AnubisNetwork?
38
Reports
 HTTP API (Server)
 Schedule
 Get
 Edit
 Delete
 List schedules
 List reports
 Generator
 Checks MongoDB for work.
 Generates a CSV report or stores the JSON document for later querying.
 Sends an email with a link to the files when the report is ready.
Reports – MongoDB CSVs
 Scheduled Report
{
"__v" : 0,
"_id" : ObjectId("51d64e6d5e8fd0d145000008"),
"active" : true,
"asn_code" : "",
"country" : "PT",
"desc" : "Portugal Trojans",
"emails" : "",
"range" : "",
"repeat" : true,
"reports" : [
ObjectId("51d64e7037571bd24500000d"),
ObjectId("51d741e8bcb161366600000c"),
ObjectId("51d89367bcb161366600005f"),
ObjectId("51d9e4f9bcb16136660000ca"),
ObjectId("51db3678c3a15fc577000038"),
ObjectId("51dc87e216eea97c20000007"),
ObjectId("51ddd964a89164643b000001")
],
"run_at" : ISODate("2013-07-11T22:00:00Z"),
"scheduled_date" : ISODate("2013-07-05T04:41:17.067Z")
}
 Report
{
"__v" : 0,
"_id" : ObjectId("51d89367bcb161366600005f"),
"date" : ISODate("2013-07-06T22:00:07.015Z"),
"files" : [
ObjectId("51d89368bcb1613666000060")
],
"work" : ObjectId("51d64e6d5e8fd0d145000008")
}
 Files
 Each report has an array of files that
represents the report.
 Each file is stored in GridFS.
Reports – MongoDB JSONs
 Scheduled Report
{
"__v" : 0,
"_id" : ObjectId("51d64e6d5e8fd0d145000008"),
"active" : true,
"asn_code" : "",
"country" : "PT",
"desc" : "Portugal Trojans",
"emails" : "",
"range" : "",
"repeat" : true,
"snapshots" : [
ObjectId("521f761c0a45c3b00b000001"),
ObjectId("521fb0848275044d420d392f"),
ObjectId("52207c2f7c53a8494f010afa"),
ObjectId("5221c9df4910ba3874000001"),
ObjectId("522275724910ba3874001f66"),
ObjectId("5223c6f24910ba3874003b7a"),
ObjectId("522518734910ba3874005763")
],
"run_at" : ISODate("2013-07-11T22:00:00Z"),
"scheduled_date" : ISODate("2013-07-05T04:41:17.067Z")
}
 Snapshot
{
"_id" : ObjectId("51d89367bcb161366600005f"),
"date" : ISODate("2013-07-06T22:00:07.015Z"),
"work" : ObjectId("521f761c0a45c3b00b000001"),
"count" : 123
}
 Results
{
"machine" : {
"trojan" : [ "conficker_b" ],
"ip" : "2.80.2.53",
"host" : "Bl19-1-13.dsl.telepac.pt"
}, …
, "metadata" : {
"work" : ObjectId("521f837647b8d3ba7d000001"),
"snapshot" : ObjectId("521f837aa669d0b87d000001"),
"date" : ISODate("2013-08-29T00:00:00Z")
}
}
Reports – Evolution
 Other report formats.
 Charts?
 Other types of reports (not only botnets).
 Need to evolve the Collector first.
Globe
 How to visualize real time events from the stream?
 Where are the botnets located?
 Who's the most infected?
 How many infections?
Globe – Stream
 origin = banktrojan
 Modules
 Group
 trojanfamily
 _geo_env_remote_addr.country_name
 grouptime=5000
 Geo
 Filter fields
 trojanfamily
 Geolocation
 _geo_env_remote_addr.l*
 KPI
 trojanfamily
 _geo_env_remote_addr.country_name
 kpilimit = 10
 Requests botnets from the Stream (Stream → NodeJS → Browser).
Globe – NodeJS
 NodeJS
 HTTP
 Get JSON from Stream.
 Socket.IO
 Multiple protocol support (to bypass some proxies and handle old browsers).
 Redis
 Get real time number of infected machines.
Globe – Browser
 Browser
 Socket.IO Client
 Real time apps.
 Websockets and other
types of transport.
 WebGL
 ThreeJS
 Tween
 jQuery
 WebWorkers
 Run in the background.
 Where to place the red dots?
 Calculations from geolocation to 3D point go here.
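The geolocation-to-3D-point math the WebWorker performs can be sketched as the standard lat/lng-to-sphere conversion. The radius and axis conventions here are assumptions (a ThreeJS-style Y-up sphere), not the exact production code:

```javascript
// Sketch: map a geolocated event onto a sphere of the given radius,
// so a red dot can be placed on the WebGL globe.
function latLngToVec3(lat, lng, radius) {
  const phi = (90 - lat) * Math.PI / 180;   // polar angle from the north pole
  const theta = (lng + 180) * Math.PI / 180; // azimuth around the Y axis
  return {
    x: -radius * Math.sin(phi) * Math.cos(theta),
    y:  radius * Math.cos(phi),
    z:  radius * Math.sin(phi) * Math.sin(theta),
  };
}
```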
Globe – Evolution
 Some kind of HUD for better interaction and notifications.
 Request actions by clicking on the globe.
 Generate a report of infected machines in that area.
 Request operations in that specific area.
 Real-time warnings
 New infections
 Other types of warnings...
Adding Valuable Information to
Stream Events
 How to distribute workload to other machines?
 Adding value to the information we already have.
Minions
 Typically, the operations that would add value are expensive in terms of resources
 CPU
 Bandwidth
 A master-slave approach distributes work among distributed slaves we call Minions.
Minions
 The Master receives work from Requesters and stores it in MongoDB.
 Minions request work.
 Requesters receive real-time information on the work from the Master, or they can ask for work information at a later time.
Minions
 The Master has an API that allows custom Requesters to submit work and monitor it.
 Minions have a modular architecture
 Easily create a custom module (e.g. DNS, scanning, data mining).
 Information received from the Minions can then be processed by the Requesters and
 Sent to the Stream
 Saved in the database
 Used to update an existing database
Extras...
 So what else could we possibly do using the Stream?
 Distributed Portscanning
 Distributed DNS Resolutions
 Transmit images
 Transmit videos
 Realtime tools
 Data agnostic. Throw stuff at it and it will deal with it.
52
Portscanning
 Portscanning done right…
 It's not only about your portscanner being able to throw 1 billion packets per second.
 Location = reliability of scans.
 A distributed system for portscanning is much better. But it's not just about having it distributed; it's about optimizing what it scans.
Portscanning

Hosts up, per target range and scanning location:

Target range              Australia     China              Russia    USA        Portugal
                          (intervolve)  (ChinaVPShosting)  (NQHost)  (Ramnode)  (Zon PT)
41.63.160.0/19 (Angola)   0             0                  0         0          3 (sometimes)
5.1.96.0/21 (China)       10            70                 40        10         40
41.78.72.0/22 (Somalia)   0             0                  0         0          33
92.102.229.0/24 (Russia)  20            100                2         2          150
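"Optimizing what it scans" can mean routing each target range to the vantage point that actually sees it. A sketch of that selection, with the hosts-up measurements transcribed from the table above (the selection logic is ours, not the production scheduler):

```javascript
// Sketch: pick the scanning location that saw the most hosts up
// for each target range.
function bestVantage(results) {
  const best = {};
  for (const [range, byLoc] of Object.entries(results)) {
    let top = null;
    for (const [loc, up] of Object.entries(byLoc)) {
      if (top === null || up > byLoc[top]) top = loc;
    }
    best[range] = top;
  }
  return best;
}

// Measurements from the table above (hosts up per location).
const scanResults = {
  "41.63.160.0/19":  { Australia: 0,  China: 0,   Russia: 0,  USA: 0,  Portugal: 3 },
  "5.1.96.0/21":     { Australia: 10, China: 70,  Russia: 40, USA: 10, Portugal: 40 },
  "41.78.72.0/22":   { Australia: 0,  China: 0,   Russia: 0,  USA: 0,  Portugal: 33 },
  "92.102.229.0/24": { Australia: 20, China: 100, Russia: 2,  USA: 2,  Portugal: 150 },
};
```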
Portscanning problems...
 Doing portscanning correctly brings along certain problems.
 If you are not HD Moore or Dan Kaminsky, resource-wise you are gonna have a bad time
Portscanning problems...
 You need lots of minions in different parts of the world
 Doesn't actually require an amazing CPU or RAM if you do it correctly.
 Storing all that data...
 Querying that data...
Is it possible to have a cheap, distributed portscanning
system?
Data
Internet status...
If we're doing it... anyone else can.
Evil side?
Anubis StreamForce
 Have cool ideas? Contact us
 Access for Brucon participants:
API Endpoint:
http://brucon.cyberfeed.net:8080/stream?key=brucon2013
 Web UI Dashboard maker:
http://brucon.cyberfeed.net:8080/webgui
Lol
 Last minute testing
Questions?
Strata Presentation: One Billion Objects in 2GB: Big Data Analytics on Small ...randyguck
 
Introduction to WSO2 Data Analytics Platform
Introduction to  WSO2 Data Analytics PlatformIntroduction to  WSO2 Data Analytics Platform
Introduction to WSO2 Data Analytics PlatformSrinath Perera
 
Intelligent Monitoring
Intelligent MonitoringIntelligent Monitoring
Intelligent MonitoringIntelie
 
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...Andrii Gakhov
 
The hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaThe hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaAlluxio, Inc.
 
MongoDB Solution for Internet of Things and Big Data
MongoDB Solution for Internet of Things and Big DataMongoDB Solution for Internet of Things and Big Data
MongoDB Solution for Internet of Things and Big DataStefano Dindo
 
Lab pratico per la progettazione di soluzioni MongoDB in ambito Internet of T...
Lab pratico per la progettazione di soluzioni MongoDB in ambito Internet of T...Lab pratico per la progettazione di soluzioni MongoDB in ambito Internet of T...
Lab pratico per la progettazione di soluzioni MongoDB in ambito Internet of T...festival ICT 2016
 
Aggregated queries with Druid on terrabytes and petabytes of data
Aggregated queries with Druid on terrabytes and petabytes of dataAggregated queries with Druid on terrabytes and petabytes of data
Aggregated queries with Druid on terrabytes and petabytes of dataRostislav Pashuto
 

Similar a Real time analysis and visualization of security events (20)

Real-time big data analytics based on product recommendations case study
Real-time big data analytics based on product recommendations case studyReal-time big data analytics based on product recommendations case study
Real-time big data analytics based on product recommendations case study
 
MongoDB Europe 2016 - Debugging MongoDB Performance
MongoDB Europe 2016 - Debugging MongoDB PerformanceMongoDB Europe 2016 - Debugging MongoDB Performance
MongoDB Europe 2016 - Debugging MongoDB Performance
 
Building and Scaling the Internet of Things with MongoDB at Vivint
Building and Scaling the Internet of Things with MongoDB at Vivint Building and Scaling the Internet of Things with MongoDB at Vivint
Building and Scaling the Internet of Things with MongoDB at Vivint
 
MongoDB Europe 2016 - Enabling the Internet of Things at Proximus - Belgium's...
MongoDB Europe 2016 - Enabling the Internet of Things at Proximus - Belgium's...MongoDB Europe 2016 - Enabling the Internet of Things at Proximus - Belgium's...
MongoDB Europe 2016 - Enabling the Internet of Things at Proximus - Belgium's...
 
Slicing Apples with Ninja Sword: Fighting Malware at the Corporate Level (OWA...
Slicing Apples with Ninja Sword: Fighting Malware at the Corporate Level (OWA...Slicing Apples with Ninja Sword: Fighting Malware at the Corporate Level (OWA...
Slicing Apples with Ninja Sword: Fighting Malware at the Corporate Level (OWA...
 
Internet of things
Internet of thingsInternet of things
Internet of things
 
Spark Summit - Stratio Streaming
Spark Summit - Stratio Streaming Spark Summit - Stratio Streaming
Spark Summit - Stratio Streaming
 
Introduction to Streaming Analytics
Introduction to Streaming AnalyticsIntroduction to Streaming Analytics
Introduction to Streaming Analytics
 
IOOF IT System Modernisation
IOOF IT System ModernisationIOOF IT System Modernisation
IOOF IT System Modernisation
 
Implementing and Visualizing Clickstream data with MongoDB
Implementing and Visualizing Clickstream data with MongoDBImplementing and Visualizing Clickstream data with MongoDB
Implementing and Visualizing Clickstream data with MongoDB
 
Practice Fusion & MongoDB: Transitioning a 4 TB Audit Log from SQL Server to ...
Practice Fusion & MongoDB: Transitioning a 4 TB Audit Log from SQL Server to ...Practice Fusion & MongoDB: Transitioning a 4 TB Audit Log from SQL Server to ...
Practice Fusion & MongoDB: Transitioning a 4 TB Audit Log from SQL Server to ...
 
MongoDB for Time Series Data
MongoDB for Time Series DataMongoDB for Time Series Data
MongoDB for Time Series Data
 
Strata Presentation: One Billion Objects in 2GB: Big Data Analytics on Small ...
Strata Presentation: One Billion Objects in 2GB: Big Data Analytics on Small ...Strata Presentation: One Billion Objects in 2GB: Big Data Analytics on Small ...
Strata Presentation: One Billion Objects in 2GB: Big Data Analytics on Small ...
 
Introduction to WSO2 Data Analytics Platform
Introduction to  WSO2 Data Analytics PlatformIntroduction to  WSO2 Data Analytics Platform
Introduction to WSO2 Data Analytics Platform
 
Intelligent Monitoring
Intelligent MonitoringIntelligent Monitoring
Intelligent Monitoring
 
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...
 
The hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaThe hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at Helixa
 
MongoDB Solution for Internet of Things and Big Data
MongoDB Solution for Internet of Things and Big DataMongoDB Solution for Internet of Things and Big Data
MongoDB Solution for Internet of Things and Big Data
 
Lab pratico per la progettazione di soluzioni MongoDB in ambito Internet of T...
Lab pratico per la progettazione di soluzioni MongoDB in ambito Internet of T...Lab pratico per la progettazione di soluzioni MongoDB in ambito Internet of T...
Lab pratico per la progettazione di soluzioni MongoDB in ambito Internet of T...
 
Aggregated queries with Druid on terrabytes and petabytes of data
Aggregated queries with Druid on terrabytes and petabytes of dataAggregated queries with Druid on terrabytes and petabytes of data
Aggregated queries with Druid on terrabytes and petabytes of data
 

Más de Tiago Henriques

BSides Lisbon 2023 - AI in Cybersecurity.pdf
BSides Lisbon 2023 - AI in Cybersecurity.pdfBSides Lisbon 2023 - AI in Cybersecurity.pdf
BSides Lisbon 2023 - AI in Cybersecurity.pdfTiago Henriques
 
Pixels Camp 2017 - Stories from the trenches of building a data architecture
Pixels Camp 2017 - Stories from the trenches of building a data architecturePixels Camp 2017 - Stories from the trenches of building a data architecture
Pixels Camp 2017 - Stories from the trenches of building a data architectureTiago Henriques
 
Pixels Camp 2017 - Stranger Things the internet version
Pixels Camp 2017 - Stranger Things the internet versionPixels Camp 2017 - Stranger Things the internet version
Pixels Camp 2017 - Stranger Things the internet versionTiago Henriques
 
The state of cybersecurity in Switzerland - FinTechDay 2017
The state of cybersecurity in Switzerland - FinTechDay 2017The state of cybersecurity in Switzerland - FinTechDay 2017
The state of cybersecurity in Switzerland - FinTechDay 2017Tiago Henriques
 
Webzurich - The State of Web Security in Switzerland
Webzurich - The State of Web Security in SwitzerlandWebzurich - The State of Web Security in Switzerland
Webzurich - The State of Web Security in SwitzerlandTiago Henriques
 
BSides Lisbon - Data science, machine learning and cybersecurity
BSides Lisbon - Data science, machine learning and cybersecurity BSides Lisbon - Data science, machine learning and cybersecurity
BSides Lisbon - Data science, machine learning and cybersecurity Tiago Henriques
 
I FOR ONE WELCOME OUR NEW CYBER OVERLORDS! AN INTRODUCTION TO THE USE OF MACH...
I FOR ONE WELCOME OUR NEW CYBER OVERLORDS! AN INTRODUCTION TO THE USE OF MACH...I FOR ONE WELCOME OUR NEW CYBER OVERLORDS! AN INTRODUCTION TO THE USE OF MACH...
I FOR ONE WELCOME OUR NEW CYBER OVERLORDS! AN INTRODUCTION TO THE USE OF MACH...Tiago Henriques
 
BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015
BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015
BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015Tiago Henriques
 
Codebits 2014 - Secure Coding - Gamification and automation for the win
Codebits 2014 - Secure Coding - Gamification and automation for the winCodebits 2014 - Secure Coding - Gamification and automation for the win
Codebits 2014 - Secure Coding - Gamification and automation for the winTiago Henriques
 
Confraria 28-feb-2013 mesa redonda
Confraria 28-feb-2013 mesa redondaConfraria 28-feb-2013 mesa redonda
Confraria 28-feb-2013 mesa redondaTiago Henriques
 
How to dominate a country
How to dominate a countryHow to dominate a country
How to dominate a countryTiago Henriques
 
Country domination - Causing chaos and wrecking havoc
Country domination - Causing chaos and wrecking havocCountry domination - Causing chaos and wrecking havoc
Country domination - Causing chaos and wrecking havocTiago Henriques
 
(Mis)trusting and (ab)using ssh
(Mis)trusting and (ab)using ssh(Mis)trusting and (ab)using ssh
(Mis)trusting and (ab)using sshTiago Henriques
 
Secure coding - Balgan - Tiago Henriques
Secure coding - Balgan - Tiago HenriquesSecure coding - Balgan - Tiago Henriques
Secure coding - Balgan - Tiago HenriquesTiago Henriques
 
Vulnerability, exploit to metasploit
Vulnerability, exploit to metasploitVulnerability, exploit to metasploit
Vulnerability, exploit to metasploitTiago Henriques
 
Practical exploitation and social engineering
Practical exploitation and social engineeringPractical exploitation and social engineering
Practical exploitation and social engineeringTiago Henriques
 

Más de Tiago Henriques (20)

BSides Lisbon 2023 - AI in Cybersecurity.pdf
BSides Lisbon 2023 - AI in Cybersecurity.pdfBSides Lisbon 2023 - AI in Cybersecurity.pdf
BSides Lisbon 2023 - AI in Cybersecurity.pdf
 
Pixels Camp 2017 - Stories from the trenches of building a data architecture
Pixels Camp 2017 - Stories from the trenches of building a data architecturePixels Camp 2017 - Stories from the trenches of building a data architecture
Pixels Camp 2017 - Stories from the trenches of building a data architecture
 
Pixels Camp 2017 - Stranger Things the internet version
Pixels Camp 2017 - Stranger Things the internet versionPixels Camp 2017 - Stranger Things the internet version
Pixels Camp 2017 - Stranger Things the internet version
 
The state of cybersecurity in Switzerland - FinTechDay 2017
The state of cybersecurity in Switzerland - FinTechDay 2017The state of cybersecurity in Switzerland - FinTechDay 2017
The state of cybersecurity in Switzerland - FinTechDay 2017
 
Webzurich - The State of Web Security in Switzerland
Webzurich - The State of Web Security in SwitzerlandWebzurich - The State of Web Security in Switzerland
Webzurich - The State of Web Security in Switzerland
 
BSides Lisbon - Data science, machine learning and cybersecurity
BSides Lisbon - Data science, machine learning and cybersecurity BSides Lisbon - Data science, machine learning and cybersecurity
BSides Lisbon - Data science, machine learning and cybersecurity
 
I FOR ONE WELCOME OUR NEW CYBER OVERLORDS! AN INTRODUCTION TO THE USE OF MACH...
I FOR ONE WELCOME OUR NEW CYBER OVERLORDS! AN INTRODUCTION TO THE USE OF MACH...I FOR ONE WELCOME OUR NEW CYBER OVERLORDS! AN INTRODUCTION TO THE USE OF MACH...
I FOR ONE WELCOME OUR NEW CYBER OVERLORDS! AN INTRODUCTION TO THE USE OF MACH...
 
BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015
BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015
BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015
 
Codebits 2014 - Secure Coding - Gamification and automation for the win
Codebits 2014 - Secure Coding - Gamification and automation for the winCodebits 2014 - Secure Coding - Gamification and automation for the win
Codebits 2014 - Secure Coding - Gamification and automation for the win
 
Hardware hacking 101
Hardware hacking 101Hardware hacking 101
Hardware hacking 101
 
Workshop
WorkshopWorkshop
Workshop
 
Enei
EneiEnei
Enei
 
Confraria 28-feb-2013 mesa redonda
Confraria 28-feb-2013 mesa redondaConfraria 28-feb-2013 mesa redonda
Confraria 28-feb-2013 mesa redonda
 
Preso fcul
Preso fculPreso fcul
Preso fcul
 
How to dominate a country
How to dominate a countryHow to dominate a country
How to dominate a country
 
Country domination - Causing chaos and wrecking havoc
Country domination - Causing chaos and wrecking havocCountry domination - Causing chaos and wrecking havoc
Country domination - Causing chaos and wrecking havoc
 
(Mis)trusting and (ab)using ssh
(Mis)trusting and (ab)using ssh(Mis)trusting and (ab)using ssh
(Mis)trusting and (ab)using ssh
 
Secure coding - Balgan - Tiago Henriques
Secure coding - Balgan - Tiago HenriquesSecure coding - Balgan - Tiago Henriques
Secure coding - Balgan - Tiago Henriques
 
Vulnerability, exploit to metasploit
Vulnerability, exploit to metasploitVulnerability, exploit to metasploit
Vulnerability, exploit to metasploit
 
Practical exploitation and social engineering
Practical exploitation and social engineeringPractical exploitation and social engineering
Practical exploitation and social engineering
 

Último

Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 

Último (20)

Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 

Real time analysis and visualization of security events

  • 13. Anubis CyberFeed
 Feed galore! Sinkhole data, traps, IP reputation, etc.
 Bespoke feeds (create your own view)
 Measure, group, correlate, de-duplicate...
 High volume (usually ~6,000 events per second), with more data added frequently
  • 14. [Architecture diagram: sources (sinkholes, data-theft trojans, traps / honeypots, passive DNS, IP reputation, Twitter, MailSpike / open-source community) feed the Complex Event Processing engine, which drives real-time feeds, dashboards, and event navigation]
  • 15. Anubis CyberFeed
 Apps (demo time)
  • 16. Stream Information Processing
 Collecting events from the Stream.
 Generating reports.
 Real time visualization.
  • 17. Challenge
 ~6k events/s, with peaks over 10k events/s.
 Let's focus on the trojans feed (banktrojan).
 Peaks at ~4k events/s.
{"_origin":"banktrojan","env":{"server_name":"anam0rph.su","remote_addr":"46.247.141.66","path_info":"/in.php","request_method":"POST","http_user_agent":"Mozilla/4.0"},"data":"upqchCg4slzHEexq0JyNLlaDqX40GsCoA3Out1Ah3HaVsQj45YCqGKylXf2Pv81M9JX0","seen":1379956636,"trojanfamily":"Zeus","_provider":"lab","hostn":"lab14","_ts":1379956641}
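A minimal NodeJS sketch of handling one such event. The field names come from the banktrojan sample above; the assumption that the stream delivers one JSON object per line is mine:

```javascript
// Each stream message is one JSON object (as in the banktrojan sample).
// This sketch parses a raw line and extracts the fields a collector
// would care about: feed origin, infected IP, trojan family, timestamp.
const rawLine =
  '{"_origin":"banktrojan",' +
  '"env":{"server_name":"anam0rph.su","remote_addr":"46.247.141.66",' +
  '"path_info":"/in.php","request_method":"POST","http_user_agent":"Mozilla/4.0"},' +
  '"data":"upqchCg4slzHEexq0JyNLlaDqX40GsCoA3Out1Ah3HaVsQj45YCqGKylXf2Pv81M9JX0",' +
  '"seen":1379956636,"trojanfamily":"Zeus","_provider":"lab","hostn":"lab14","_ts":1379956641}';

function parseEvent(line) {
  const evt = JSON.parse(line);
  return {
    feed: evt._origin,
    ip: evt.env && evt.env.remote_addr,
    family: evt.trojanfamily,
    seen: new Date(evt.seen * 1000), // "seen" is a UNIX timestamp in seconds
  };
}

const parsed = parseEvent(rawLine);
console.log(parsed.feed, parsed.family, parsed.ip);
```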
  • 20. Challenge
 Let's use the Stream to help
 Group by machine and trojan
 From peak ~4k/s to peak ~1k/s
 Filter fields
 Geo location
 We end up with:
{"env":{"remote_addr":"207.215.48.83"},"trojanfamily":"W32Expiro","_geo_env_remote_addr":{"country_code":"US","country_name":"United States","city":"Los Angeles","latitude":34.0067,"longitude":-118.3455,"asn":7132,"asn_name":"AS for SBIS-AS"}}
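The grouping step above can be sketched as a small in-memory aggregation. The key choice — machine IP plus trojan family — follows the slide; treating a batch as one time window is an assumption:

```javascript
// Collapse a burst of raw events into one record per (machine, trojan)
// pair — the de-duplication that takes the feed from ~4k/s to ~1k/s.
function groupEvents(events) {
  const groups = new Map();
  for (const evt of events) {
    const key = evt.env.remote_addr + "|" + evt.trojanfamily;
    const entry = groups.get(key) || { ...evt, count: 0 };
    entry.count += 1; // duplicate hits seen in this window
    groups.set(key, entry);
  }
  return [...groups.values()];
}

const burst = [
  { env: { remote_addr: "207.215.48.83" }, trojanfamily: "W32Expiro" },
  { env: { remote_addr: "207.215.48.83" }, trojanfamily: "W32Expiro" },
  { env: { remote_addr: "46.247.141.66" }, trojanfamily: "Zeus" },
];
const grouped = groupEvents(burst);
console.log(grouped.length); // 2 unique (machine, trojan) pairs
```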
  • 21. Challenge
 How to process and store these events?
  • 22. Technologies
 Applications
 NodeJS
 Server-side Javascript platform.
 V8 Javascript Engine.
 http://nodejs.org/
Why?
 Great for prototyping.
 Fast and scalable.
 Modules for (almost) everything.
  • 23. Technologies
 Databases
 MongoDB
 NoSQL database.
 Stores JSON-style documents.
 GridFS
 http://www.mongodb.org/
Why?
 JSON from the Stream, JSON in the database.
 Fast and scalable.
 Redis
 Key-value storage.
 In-memory dataset.
 http://redis.io/
Why?
 Faster than MongoDB for certain operations, like keeping track of the number of infected machines.
 Very fast and scalable.
  • 24. Data Collection
 Applications
 Collector
 Worker
 Processor
 Databases
 MongoDB
 Redis
[Diagram: Stream → Collector → Workers → MongoDB / Redis storage; Processor aggregates information and processes real-time events]
  • 25. Data Collection
 Events come from the Stream.
 The Collector distributes events to Workers.
 Workers persist event information.
 The Processor aggregates information and stores it for statistical and historical analysis.
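The Collector's fan-out to Workers can be sketched as follows. The slides don't say how events are balanced across Workers, so plain round-robin is an assumption:

```javascript
// A Collector that hands each incoming event to the next Worker in
// turn. Real Workers would be separate processes; here they are stub
// functions that just count what they receive.
function makeCollector(workers) {
  let next = 0;
  return function dispatch(evt) {
    const worker = workers[next];
    next = (next + 1) % workers.length; // round-robin (assumed strategy)
    worker(evt);
  };
}

const received = [0, 0, 0];
const workers = received.map((_, i) => () => { received[i] += 1; });
const dispatch = makeCollector(workers);
for (let n = 0; n < 6; n++) dispatch({ n });
console.log(received); // each of the 3 workers got 2 events
```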
  • 26. Data Collection
 MongoDB
 Real time information on infected machines.
 Historical aggregated information.
 Redis
 Real time counters of infected machines.
  • 27. Data Collection - Collector
 Old data is periodically removed, i.e. machines that don't produce events for more than 24 hours.
 Sends events to Workers.
 Decrements counters of removed information.
 Sends warnings:
 Country / ASN is no longer infected.
 Botnet X decreased Y % of its size.
  • 28. Data Collection - Worker
 Creates new entries for unseen machines.
 Adds information about new trojans / domains.
 Updates the last time the machine was seen.
 Processes events and updates the Redis counters accordingly.
 Needs to check MongoDB to determine if:
 New entry – all counters incremented.
 Existing entry – only the counters related to that trojan incremented.
 Sends warnings:
 Botnet X increased Y % in size.
 New infections seen in Country / ASN.
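The Worker's new-vs-existing counter rule can be sketched like this. Redis is simulated with a plain object, and the counter key names are illustrative, not the deck's actual schema:

```javascript
// A brand-new machine bumps every relevant counter; a machine already
// tracked in MongoDB only bumps counters for a newly seen trojan;
// a pure duplicate changes nothing.
const counters = {};
const incr = (key) => { counters[key] = (counters[key] || 0) + 1; };

function handleEvent(evt, status) {
  // status: "new-machine" | "new-trojan" | "duplicate"
  if (status === "duplicate") return;
  incr(evt.family);                 // per-trojan counter
  incr(evt.family + ":" + evt.country); // per-trojan/country counter
  if (status === "new-machine") {
    incr(evt.country);              // whole-machine counters, once only
    incr("total");
  }
}

handleEvent({ family: "zeus", country: "PH" }, "new-machine");
handleEvent({ family: "tdss", country: "PH" }, "new-trojan");
handleEvent({ family: "zeus", country: "PH" }, "duplicate");
console.log(counters); // total/PH stay at 1; zeus and tdss at 1 each
```

Against a real Redis, each `incr` would be an `INCR` (or `HINCRBY`) on the corresponding key.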
  • 29. Data Collection - Processor
 The Processor retrieves real time counters from Redis.
 Information is processed by:
 Botnet;
 ASN;
 Country;
 Botnet/Country;
 Botnet/ASN/Country;
 Total.
 Persisting this information to MongoDB creates a historic database of counters that can be queried and analyzed.
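A sketch of the six roll-up dimensions listed above. The key format (`"zeus:9299:PH"` etc.) is an assumption for illustration:

```javascript
// For each (botnet, asn, country) counter pulled from Redis, the
// Processor rolls the value up along six dimensions before persisting
// the aggregates to MongoDB.
function rollupKeys(botnet, asn, country) {
  return [
    botnet,                               // by botnet
    "asn:" + asn,                         // by ASN
    "cc:" + country,                      // by country
    botnet + ":" + country,               // botnet/country
    botnet + ":" + asn + ":" + country,   // botnet/ASN/country
    "total",                              // grand total
  ];
}

const keys = rollupKeys("zeus", 9299, "PH");
console.log(keys); // 6 aggregation keys for one counter
```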
  • 30. Data Collection - MongoDB
 Collection for active machines in the last 24h:
{
  "city" : "Philippine",
  "country" : "PH",
  "region" : "N/A",
  "geo" : { "lat" : 16.4499, "lng" : 120.5499 },
  "created" : ISODate("2013-09-21T00:19:12.227Z"),
  "domains" : [
    {
      "domain" : "hzmksreiuojy.nl",
      "trojan" : "zeus",
      "last" : ISODate("2013-09-21T09:42:56.799Z"),
      "created" : ISODate("2013-09-21T00:19:12.227Z")
    }
  ],
  "host" : "112.202.37.72.pldt.net",
  "ip" : "112.202.37.72",
  "ip_numeric" : 1892296008,
  "asn" : "Philippine Long Distance Telephone Company",
  "asn_code" : 9299,
  "last" : ISODate("2013-09-21T09:42:56.799Z"),
  "trojan" : [ "zeus" ]
}
  • 31. Data Collection - MongoDB  Collection for aggregated information (the historic counters database) { "_id" : ObjectId("519c0abac1172e813c004ac3"), "0" : 744, "1" : 745, "3" : 748, "4" : 748, "5" : 746, "6" : 745, ... "10" : 745, "11" : 742, "12" : 746, "13" : 750, "14" : 753, ... "metadata" : { "country" : "CH", "date" : "2013-05-22T00:00:00+0000", "trojan" : "conficker_b", "type" : "daily" } } 31 Preallocated entries for each hour when the document is created. If we don’t, MongoDB will keep extending the documents by adding thousands of entries every hour and it becomes very slow.
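The preallocation described above can be sketched like this (field names assumed to mirror the sample document): create each day's counter document with all 24 hourly slots already present, so later updates overwrite a field in place instead of growing the document on disk.

```javascript
// Sketch: build a daily counters document with hours "0".."23"
// preallocated to zero, plus the metadata identifying the series.
function newDailyDoc(metadata) {
  const doc = { metadata: Object.assign({ type: "daily" }, metadata) };
  for (let hour = 0; hour < 24; hour++) doc[String(hour)] = 0;
  return doc;
}

// e.g. newDailyDoc({ country: "CH", trojan: "conficker_b",
//                    date: "2013-05-22T00:00:00+0000" })
```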
  • 32. Data Collection - MongoDB  Collection for 24 hours  4 MongoDB Shard instances  >3 Million infected machines  ~2 Gb of data  ~558 bytes per document.  Indexes by  ip – helps inserts and updates.  ip_numeric – enables queries by CIDRs.  last – Faster removes for expired machines.  host – Hmm, is there any .gov?   country, family, asn – Speeds MongoDB queries and also allows faster custom queries.  Collection for aggregated information  Data for 119 days (25 May to 11 July)  > 18 Million entries  ~6.5 Gb of data  ~366 bytes per object  ~56 Mb per day  Indexes by  metadata.country  metadata.trojan  metadata.date  metadata.asn  metadata.type + metadata.country + metadata.date + … (compound index on all metadata fields) 32
  • 33. Data Collection - Redis  Counters by Trojan / Country "cutwailbt:RO": "1256", "rbot:LA": "3", "tdss:NP": "114", "unknown4adapt:IR": "100", "unknownaff:EE": "0", "cutwail:CM": "20", "unknownhrat3:NZ": "56", "cutwailbt:PR": "191", "shylock:NO": "1", "unknownpws:BO": "3", "unknowndgaxx:CY": "77", "fbhijack:GH": "22", "pushbot:IE": "2", "carufax:US": "424"  Counters by Trojan "unknownwindcrat": "18", "tdss": "79530", "unknownsu2": "2735", "unknowndga9": "15", "unknowndga3": "17", "ircbot": "19874", "jshijack": "35570", "adware": "294341", "zeus": "1032890", "jadtre": "40557", "w32almanahe": "13435", "festi": "1412", "qakbot": "19907", "cutwailbt": "38308"  Counters by Country "BY": "11158", "NA": "314", "BW": "326", "AS": "35", "AG": "94", "GG": "43", "ID": "142648", "MQ": "194", "IQ": "16142", "TH": "105429", "MY": "35410", "MA": "15278", "BG": "15086", "PL": "27384" 33
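The three counter families above share one key scheme: "&lt;trojan&gt;", "&lt;country&gt;", and "&lt;trojan&gt;:&lt;country&gt;". A sketch of how a Worker would bump all three granularities for one new infection, using a Map as a stand-in for Redis INCR (with real Redis this is simply three INCR commands, e.g. INCR zeus, INCR PT, INCR zeus:PT):

```javascript
// Increment the trojan, country, and trojan:country counters
// for a single newly seen infection.
function incrInfection(counters, trojan, country) {
  for (const key of [trojan, country, `${trojan}:${country}`]) {
    counters.set(key, (counters.get(key) || 0) + 1);
  }
}
```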
  • 34. Data Collection - Redis  Redis performance on our machine  SET: 473036.88 requests per second  GET: 456412.59 requests per second  INCR: 461787.12 requests per second  Time to get real time data  Getting all the data from Families/ASN/Counters to the NodeJS application and ready to be processed in around half a second  > 120 000 entries in… (very fast..)  Our current usage is  ~ 3% CPU (of a 2.0 Ghz core)  ~ 480 Mb of RAM 34
  • 35. Data Collection - API  But! There is one more application..  How to easily retrieve stored data  MongoDB Rest API is a bit limited.  NodeJS HTTP + MongoDB + Redis  Redis  http://<host>/counters_countries  ...  MongoDB  http://<host>/family_country  ...  Custom MongoDB Queries  http://<host>/ips?f.ip_numeric=95.68.149.0/22  http://<host>/ips?f.country=PT  http://<host>/ips?f.host=bgovb 35
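A sketch of how the f.ip_numeric=95.68.149.0/22 style filter can be turned into a numeric range suitable for a MongoDB query on the ip_numeric field (names are illustrative, not the actual API code):

```javascript
// Convert "a.b.c.d/bits" into the inclusive numeric range it covers.
function cidrToRange(cidr) {
  const [base, bitsStr] = cidr.split("/");
  const bits = Number(bitsStr);
  const baseNum = base.split(".")
                      .reduce((acc, o) => acc * 256 + Number(o), 0);
  const size = 2 ** (32 - bits);
  const lo = Math.floor(baseNum / size) * size; // align to block start
  return { lo, hi: lo + size - 1 };
}

// Resulting MongoDB filter: { ip_numeric: { $gte: lo, $lte: hi } }
```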
  • 36. Data Collection - Limitations  Grouping information by machine and trojan doesn't allow us to study the real number of events per machine.  That data can be useful to get an idea of the botnet operations or how many machines are behind a single IP (everyone is behind a router).  A slow MongoDB impacts everything  The Worker application needs to tolerate a slow MongoDB and discard some information as a last resort.  Beware of slow disks! Data persistence occurs every 60 seconds (default) and can take too much time, having a real impact on performance..  >10s to persist is usually very bad; something is wrong with the hard drives.. 36
  • 37. Data Collection - Evolution  Warnings  Which warnings to send? When? Thresholds?  Aggregate data by week, month, year.  Aggregate information in shorter intervals.  Data Mining algorithms applied to all the collected information.  Apply same principles to other feeds of the Stream.  Spam  Twitter  Etc.. 37
  • 38. Reports  What's happening in country X?  What about network 192.168.0.1/24?  Can you send me the report of Y every day at 7 am?  Ohh!! Remember the report I asked for last week?  Can I get a report for ASN AnubisNetwork? 38
  • 39. Reports 39  HTTP API  Schedule  Get  Edit  Delete  List schedules  List reports  Check MongoDB for work.  Generate CSV report or store the JSON Document for later querying.  Send email with link to files when report is ready. Server Generator
  • 40. Reports – MongoDB CSVs  Scheduled Report { "__v" : 0, "_id" : ObjectId("51d64e6d5e8fd0d145000008"), "active" : true, "asn_code" : "", "country" : "PT", "desc" : "Portugal Trojans", "emails" : "", "range" : "", "repeat" : true, "reports" : [ ObjectId("51d64e7037571bd24500000d"), ObjectId("51d741e8bcb161366600000c"), ObjectId("51d89367bcb161366600005f"), ObjectId("51d9e4f9bcb16136660000ca"), ObjectId("51db3678c3a15fc577000038"), ObjectId("51dc87e216eea97c20000007"), ObjectId("51ddd964a89164643b000001") ], "run_at" : ISODate("2013-07-11T22:00:00Z"), "scheduled_date" : ISODate("2013-07-05T04:41:17.067Z") }  Report { "__v" : 0, "_id" : ObjectId("51d89367bcb161366600005f"), "date" : ISODate("2013-07-06T22:00:07.015Z"), "files" : [ ObjectId("51d89368bcb1613666000060") ], "work" : ObjectId("51d64e6d5e8fd0d145000008") }  Files  Each report has an array of files that represents the report.  Each file is stored in GridFS. 40
  • 41. Reports – MongoDB JSONs  Scheduled Report { "__v" : 0, "_id" : ObjectId("51d64e6d5e8fd0d145000008"), "active" : true, "asn_code" : "", "country" : "PT", "desc" : "Portugal Trojans", "emails" : "", "range" : "", "repeat" : true, "snapshots" : [ ObjectId("521f761c0a45c3b00b000001"), ObjectId("521fb0848275044d420d392f"), ObjectId("52207c2f7c53a8494f010afa"), ObjectId("5221c9df4910ba3874000001"), ObjectId("522275724910ba3874001f66"), ObjectId("5223c6f24910ba3874003b7a"), ObjectId("522518734910ba3874005763") ], "run_at" : ISODate("2013-07-11T22:00:00Z"), "scheduled_date" : ISODate("2013-07-05T04:41:17.067Z") }  Snapshot { "_id" : ObjectId("51d89367bcb161366600005f"), "date" : ISODate("2013-07-06T22:00:07.015Z"), "work" : ObjectId("521f761c0a45c3b00b000001"), count: 123 }  Results { "machine" : { "trojan" : [ "conficker_b" ], "ip" : "2.80.2.53", "host" : "Bl19-1-13.dsl.telepac.pt" }, … , "metadata" : { "work" : ObjectId("521f837647b8d3ba7d000001"), "snapshot" : ObjectId("521f837aa669d0b87d000001"), "date" : ISODate("2013-08-29T00:00:00Z") } } 41
  • 42. Reports – Evolution  Other report formats.  Charts?  Other types of reports (not only botnets).  Need to evolve the Collector first. 42
  • 43. Globe  How to visualize real time events from the stream?  Where are the botnets located?  Who's the most infected?  How many infections? 43
  • 44. Globe – Stream  origin = banktrojan  Modules  Group  trojanfamily  _geo_env_remote_addr.country_name  grouptime=5000  Geo  Filter fields  trojanfamily  Geolocation  _geo_env_remote_addr.l*  KPI  trojanfamily  _geo_env_remote_addr.country_name  kpilimit = 10 44 Stream NodeJS Browser  Request botnets from stream
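The Group module behaviour above can be sketched as a time-window de-duplicator: the first event for a given (trojanfamily, country) pair passes, and repeats inside the window are dropped (grouptime=5000 meaning a 5-second window). Field names in the sketch are illustrative, not the Stream's actual module API.

```javascript
// Build a de-duplicating filter over a sliding time window.
function makeGrouper(windowMs) {
  const lastSeen = new Map(); // key -> timestamp of last emitted event
  return function pass(event, now) {
    const key = `${event.trojanfamily}:${event.country}`;
    const prev = lastSeen.get(key);
    if (prev !== undefined && now - prev < windowMs) return false; // duplicate
    lastSeen.set(key, now);
    return true; // first event in this window, let it through
  };
}
```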
  • 45. Globe – NodeJS 45 Stream NodeJS Browser  NodeJS  HTTP  Get JSON from Stream.  Socket.IO  Multiple protocol support (to bypass some proxies and handle old browsers).  Redis  Get real time number of infected machines.
  • 46. Globe – Browser 46 Stream NodeJS Browser  Browser  Socket.IO Client  Real time apps.  Websockets and other types of transport.  WebGL  ThreeJS  Tween  jQuery  WebWorkers  Runs in the background.  Where to place the red dots?  Calculations from geolocation to 3D point go here.
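A sketch of the WebWorker calculation mentioned above: mapping a geolocation to a point on a sphere of radius r. This is the standard spherical-to-Cartesian conversion; the exact axis conventions vary between ThreeJS scenes, so treat the signs as an assumption.

```javascript
// Convert latitude/longitude (degrees) to an {x, y, z} point on a
// sphere of radius r, y pointing to the north pole.
function latLngToVec3(lat, lng, r) {
  const phi = (90 - lat) * Math.PI / 180;    // polar angle from the pole
  const theta = (lng + 180) * Math.PI / 180; // azimuth
  return {
    x: -r * Math.sin(phi) * Math.cos(theta),
    y:  r * Math.cos(phi),
    z:  r * Math.sin(phi) * Math.sin(theta),
  };
}
```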
  • 47. Globe – Evolution  Some kind of HUD to get better interaction and notifications.  Request actions by clicking on the globe.  Generate a report of infected machines in that area.  Request operations in that specific area.  Real time warnings  New Infections  Other types of warnings... 47
  • 48. Adding Valuable Information to Stream Events  How to distribute workload to other machines?  Adding value to the information we already have. 48
  • 49. Minions  Typically the operations that add value are expensive in terms of resources  CPU  Bandwidth  Master-slave approach that distributes work among distributed slaves we call Minions. 49 Master Minion Minion Minion Minion
  • 50. Minions 50  Master receives work from Requesters and stores the work in MongoDB.  Minions request work.  Requesters receive real time information on the work from the Master or they can ask for work information at a later time. Process / Storage Minions Master MongoDB DNS Scan Minion Minion Requesters Minion
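The pull model in the diagram can be sketched with a minimal in-memory stand-in for the MongoDB-backed queue (class and method names are illustrative): Requesters push work to the Master, Minions poll for it and report results back.

```javascript
// Minimal sketch of the Master's work queue.
class Master {
  constructor() { this.pending = []; this.results = []; }
  addWork(work)   { this.pending.push(work); }     // called by a Requester
  requestWork()   { return this.pending.shift(); } // a Minion polls for work
  submitResult(r) { this.results.push(r); }        // the Minion reports back
}

const master = new Master();
master.addWork({ module: "dns", target: "hzmksreiuojy.nl" });
const job = master.requestWork();       // a Minion picks the job up
master.submitResult({ job, ips: [] });  // ...and returns its result
```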
  • 51. Minions  The Master has an API that allows custom Requesters to ask for work and monitor the work.  Minions have a modular architecture  Easily create a custom module.  Information received from the Minions can then be processed by the Requesters and  Sent to the Stream  Saved in the database  Used to update an existing database 51 Minion DNS Scanning Data Mining
  • 52. Extras...  So what else could we possibly do using the Stream?  Distributed Portscanning  Distributed DNS Resolutions  Transmit images  Transmit videos  Realtime tools  Data agnostic. Throw stuff at it and it will deal with it. 52
  • 53. Extras...  So what else could we possibly do using the Stream?  Distributed Portscanning  Distributed DNS Resolutions  Transmit images  Transmit videos  Realtime tools  Data agnostic. Throw stuff at it and it will deal with it. 53 FOCUS FOCUS
  • 54. Portscanning  Portscanning done right…  It's not only about your portscanner being able to throw 1 billion packets per second.  Location = reliability of scans.  A distributed system for portscanning is much better. But it's not just about having it distributed. It's about optimizing what it scans. 54
  • 58. Portscanning  Hosts up per target range, by scan origin – Australia (intervolve) / China (ChinaVPShosting) / Russia (NQHost) / USA (Ramnode) / Portugal (Zon PT):
    41.63.160.0/19 (Angola): 0 / 0 / 0 / 0 / 3 (sometimes)
    5.1.96.0/21 (China): 10 / 70 / 40 / 10 / 40
    41.78.72.0/22 (Somalia): 0 / 0 / 0 / 0 / 33
    92.102.229.0/24 (Russia): 20 / 100 / 2 / 2 / 150 58
  • 59. Portscanning problems...  Doing portscanning correctly brings along certain problems.  If you are not HD Moore or Dan Kaminsky, resource-wise you are gonna have a bad time 59
  • 61. Portscanning problems...  Doing portscanning correctly brings along certain problems.  If you are not HD Moore or Dan Kaminsky, resource-wise you are gonna have a bad time  You need lots of minions in different parts of the world  It doesn't actually require an amazing CPU or RAM if you do it correctly.  Storing all that data...  Querying that data... Is it possible to have a cheap, distributed portscanning system? 61
  • 68. If we're doing it... Anyone else can. Evil side? 68
  • 69. Anubis StreamForce  Have cool ideas? Contact us  Access for Brucon participants: API Endpoint: http://brucon.cyberfeed.net:8080/stream?key=brucon2013  Web UI Dashboard maker: http://brucon.cyberfeed.net:8080/webgui 69
  • 70. Lol  Last minute testing 70

Editor's notes

  1. Internet scale. Devices, systems, firewalls, ids..
  9. Hi, I'm going to present the next section of the presentation. So, how can we collect events from the Stream? What information can we gather from those events? How can we access those events in real time?
  10. The challenge here is the large number of events per second; in total we currently have over 6000 events per second, and 4000 of these events are from a single feed called banktrojans, which is basically formed by infected machines. This is what an event from one of those machines looks like.
  11. So, basically this is what we see..
  12. And this is what we want. We want to know where our targets are, where to look.
  13. Infected machines are usually noisy and they tend to produce a big number of events. We can use the stream to help us: the group module groups events that occur within 4 minutes of each other and originate from the same machine and trojan, so we can go from 4000 to 1000 events per second; basically we receive an event for a machine and trojan, and the next events are not received because they are considered duplicates. Then we have the filter module to keep only the fields we need; for example, we only care about the IP address, ASN, Trojan, C&C domain and geolocation of the machine. How do we process and store these 1000 events per second?
  15. First, some technical information about the technologies we use. For application development, we use NodeJS, a server-side JavaScript platform built on top of the V8 engine. It's fast, scalable and has modules for almost everything. For data storage, MongoDB is a NoSQL database that is fast and scalable. It can also store JSON-style documents and files in GridFS. And then we have Redis, a key-value store that is very fast and also scalable.
  17. This is an overview of the Data Collection. We built 3 applications: Collector, Worker and Processor. We have the events coming from the Stream to the Collector. The Collector then distributes the workload to workers that process and store the information in MongoDB and Redis. The Processor will then gather information from Redis and store it in MongoDB for statistical and historical analysis.
  18. Events come from the Stream to the Collector. The Collector then distributes the workload to workers that process and store the information in MongoDB and Redis. The Processor will then gather information from Redis and store it in MongoDB for statistical and historical analysis.
  20. So the Collector talks to these 3 components. It maintains the information in MongoDB, removing information about machines that don't produce events for more than 24 hours. It decrements counters in Redis, and while maintaining this information it is possible to send warnings. Workers receive events from the Collector and can run on any machine with a connection to the collector and the database.
  21. The Worker processes and stores the event in MongoDB, creating new entries or updating information about new trojans in existing entries. It also updates the last time we saw an event for that machine. While updating MongoDB the Worker also needs to maintain the Redis counters, incrementing the values for new entries or updating counters for a new trojan on an already-seen machine. While performing this task it can also determine if there is a warning to be sent.
  22. The last component is Processor. It retrieves real time counters from Redis, processes and stores them in MongoDB aggregated by Botnet, ASN, Country, etc. This information can then be analysed and queried.
  23. Let’s now check the Databases. MongoDB collection that stores information of active machines in the last 24 hours, looks like this. It’s a JSON document with information about geolocation, IP address, Trojans, last time seen, etc. There is also a numerical representation of the IP Address that helps to query for specific network ranges.
  24. The aggregated information collection holds documents with this format. The metadata field holds information about the specific document, its type and the origin of the information, in this case country and trojan. It has an entry per hour with the number of infections; these entries need to be preallocated with zeros, so every day a new document is created for a specific metadata with all the hours at 0. If we don't do this there will be a lot of document extensions in MongoDB and it will become very slow.
  25. Some more information about these collections. The 24 hours collection is sharded between 4 MongoDB instances and in July it had information on over 3 million infected machines, which only takes 2 Gb of disk to store. The aggregated information, collected for 119 days, had over 18 million entries and occupied around 6.5 Gb of data; that's around 56 Mb per day. These were the indexes created. We need to be very careful with these because they speed up reads but slow down writes. We want fast writes for the 24 hours collection and for that reason we need to keep the indexes optimized; only the IP index is built in the foreground, all the others are built in the background. For the aggregated information collection we don't need to be as careful; we can add the indexes that will allow us to perform faster queries.
  26. Let's look at the Redis information. The counters look like this: they are concatenations of strings separated by colons, for example "cutwailbt:RO".
  27. Redis is very fast; we can retrieve all the information from the biggest set in around half a second. The insertion of data is also very fast, while using very few resources of the machine.
  28. There was also the need to access all this information on demand, so an API was created that allows retrieving or querying information in both Redis and MongoDB.
  29. So, there are a couple of limitations with these approaches. By grouping events in order to reduce the amount of events per second we are discarding information that could be studied in order to better understand what is behind those machines, for example, the number of events of a machine with a specific botnet could indicate how many machines are on that network (everyone has a router nowadays).Also MongoDB can impact everything, it is fast but needs to be used carefully. We need 3 MongoDB shards to keep the performance on acceptable levels. If we start getting 2 or 3 times the events we currently have the Workers won’t be able to persist all that information in time and will have to start discarding it at some point. The alternative to discard is to add more shards. You need to constantly monitor your hard drives, if the performance decreases, bad things will happen. Mongo won’t be able to persist the information in time and will start to slow down everything.
  30. How can we evolve this solution? We can send more warnings with the information we have, but when? What thresholds should we use? We only aggregate information by hour and day; what about weeks, months, years? What about shorter intervals? We can also apply data mining algorithms in order to retrieve more information from the data we already collect. And of course, apply these principles to other feeds like Spam or Twitter.
  31. So how do we extract information about a specific network or country? What about what happened last week?
  32. Of course we used NodeJS and built 2 applications: one that is used as an API to access and request reports, and another that checks the database for requests, generates the reports and stores them. The reports are saved in CSV or in JSON format for later querying. They are also sent by email with a URL to download the files.
  33. The collections that hold the CSV reports look like this. They have a scheduled-work collection that keeps a record of the report it is generating and the reports it has already generated. The reports keep an array of files generated and saved in MongoDB's file storage, called GridFS.
  34. Then we have the JSONs reports, that we call snapshots. The main differences are the count field in the snapshot that holds the number of infected machines in that snapshot, and the results for that snapshot which include the information about the machine and the metadata that identifies the origin of that entry.We could store an array of results in the Snapshot collection but it would be hard to use it because it would have too many entries, possibly millions and would just be useless.
  35. How could we evolve the Reports? We can store reports in other formats, generate charts with specific information for a report, and start storing other types of reports, not just for botnets.
  36. So, how can we visualize real-time events? Let's focus on the botnets again: it would be awesome if we could see the distribution of botnets throughout the world, receive warnings and monitor other information in real time. For that purpose, there is a shiny globe (demo). We can see in real time when infected machines produce events, monitor a top of countries most infected with a specific Trojan, the number of events being generated every second and the total number of infections.
  37. This information comes from the Stream; we group it by Trojan and country. We don't really want to send ALL the events to the browser because some browsers would just crash.. For that reason we also filter only the geolocation and trojan family. The information about the top infected comes from a KPI module that dynamically calculates the top in the stream.
  38. Between the Stream and the Browser we have a NodeJS application that controls the flow of events to the browser, discarding events if too many are received, and relaying the information to the Browser using the socket.io module. We also need to get the total number of infected machines from the Redis counters.
  39. At the browser end we use the socket.io client to receive the events, process those events using WebWorkers (calculation of where to place the dots) and render everything using WebGL.
  40. We can evolve the globe to create a more interactive experience where we could perform actions in realtime through the globe.We can also show warnings in the globe, for example, about new infections.
  41. How can we add valuable information to the information we already have?
  42. Typically the operations that add value are expensive: they need CPU and bandwidth. So we needed a master-slave approach that distributes the work among multiple slaves, which we call Minions.
  43. Masters receive work from the Requesters and store that work in MongoDB. Minions will then request work and send the work results to the Master. The Master then sends updates directly to the Requester of the work and also stores the results in MongoDB.
  44. The Master has an API that allows custom Requesters to ask for work and monitor the work results received from the Minions. The Minion application was built with a modular architecture in mind, so it is very easy to create a custom module. Information received by the minions can then be injected into the stream or stored in a database.
  45. Getting the full picture of an infected machine or a network involves lots of steps: Sinkholing that botnet; Portscanning the target, which gives you an idea of whether the machine is connected directly to the internet or behind a gateway, whether there are shares available, and how this machine could possibly have been compromised (MS08-067?); DNS analysis.
  46. We are going to focus on:PortscanningDNS resolutionsRealtime demos
  47. It's really cool to have a super fast scanner in a lab giving 1 quadrillion packets per second. However, this is the wrong way. The correct way: slow scan, geo-distributed. Scanning Angola from Australia = 60% of services time out and look closed. Scanning the USA from Russia or vice versa = unreliable.
  48. Combining a Model B Raspberry Pi with the PwnPi distro and a custom set of scripts makes it a Minion: a cheap device that we can use to do distributed scanning, and we can even ask others to deploy it and contribute to our system. In the near future we intend to make this image available for others that want to contribute to our system.