Securely explore your data

APACHE ACCUMULO
Adam Fuchs and John Vines
sqrrl data, inc.
January 9, 2013
APACHE ACCUMULO

Sorted, Distributed Key/Value Store
Based on Google’s Bigtable Design
Built on Top of Apache Hadoop and Apache ZooKeeper
Augments and Integrates With the Hadoop ecosystem

© 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential

TODAY’S TALK
Overview of the Accumulo Project
Accumulo Design
Table Design Strategy
Live Demonstration

ACCUMULO TIMELINE
- 2003, 2004: Google publishes papers: GFS (2003), MapReduce (2004)
- 2006: Google publishes Bigtable paper
- 2008: NSA begins development of Accumulo
- 2011: NSA open sources Accumulo into incubation at Apache
- 2012: Accumulo becomes a top-level Apache project; sqrrl is founded
- 2013: First sqrrl release planned
ACCUMULO’S STRENGTHS
Apache Accumulo excels at:
- Security
  - Cell-level security reduces the cost of application development in the presence of complex legal or policy restrictions on data use
  - Mandatory access control keeps your data safe
- Scalability
  - Proven reliability and performance at the multi-petabyte scale
  - High-performance parallel I/O library
- Adaptability
  - Flexible schema support to quickly ingest new data sources
  - Sorted key/value paradigm supports a multitude of search and analysis applications
  - Server-side programming framework (“iterator trees”) supports best-in-class aggregation, filtering, and complex query semantics
BASIC SCHEMA
Accumulo stores sorted key/value pairs (entries).

An Accumulo key is a 5-tuple, consisting of:
- Row: Controls Atomicity
- Column Family: Controls Locality
- Column Qualifier: Controls Uniqueness
- Visibility Label: Controls Access
- Timestamp: Controls Versioning

Keys are sorted:
- Hierarchically: Row first, then column family, and so on.
- Lexicographically: Compare first byte, then second, and so on.

Values are byte arrays.
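
The sort order described above can be sketched in a few lines. This is a minimal illustration, not the Accumulo client API: the `Key` tuple and `sort_key` helper are hypothetical names, and the older "Notes" entry is made-up data added to show versioning. It assumes that timestamps sort newest first, which is how Accumulo orders versions of the same cell.

```python
from typing import NamedTuple


class Key(NamedTuple):
    """Hypothetical stand-in for an Accumulo key (not the real client class)."""
    row: bytes
    family: bytes
    qualifier: bytes
    visibility: bytes
    timestamp: int


def sort_key(k: Key):
    # Fields compare hierarchically in tuple order; bytes objects compare
    # lexicographically, byte by byte. The timestamp is negated so that
    # newer versions of the same cell sort first (an assumption of this sketch).
    return (k.row, k.family, k.qualifier, k.visibility, -k.timestamp)


entries = [
    Key(b"John Doe", b"Test Results", b"Cholesterol", b"JD|PCP_JD", 20120912),
    Key(b"John Doe", b"Notes", b"PCP", b"PCP_JD", 20120912),
    Key(b"Jane Doe", b"Friends", b"John Doe", b"JD", 20121130),
    Key(b"John Doe", b"Notes", b"PCP", b"PCP_JD", 20120101),  # made-up older version
]
ordered = sorted(entries, key=sort_key)
```

Sorting by row first keeps all of a person's entries adjacent, which is what makes single-row range scans cheap.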

KEY/VALUE EXAMPLES
Row       Col. Fam.     Col. Qual.     Visibility    Timestamp  Value
Jane Doe  Friends       John Doe       JD            20121130
Jane Doe  PhoneNumber                                20090115   555-1212
John Doe  Friends       Jane Doe       JD            20121201
John Doe  Notes         PCP            PCP_JD        20120912   Patient suffers from an acute …
John Doe  Test Results  Cholesterol    JD|PCP_JD     20120912   183
John Doe  Test Results  Mental Health  JD|PSYCH_JD   20120801   Pass
John Doe  Test Results  Mental Health  PSYCH_JD      20120801   Crazy!
John Doe  Test Results  X-Ray          JD|PHYS_JD    20120513   1010110110100…

VISIBILITY SYNTAX & SEMANTICS
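
Visibility labels such as JD|PCP_JD in the key/value examples combine authorization tokens with operators. The following is a minimal sketch of how such a label could be checked against a user's authorizations, assuming `&` means "and", `|` means "or", and parentheses group sub-expressions. The function name `is_visible` is hypothetical, and the sketch evaluates operators left to right. Accumulo's own parser is stricter: it rejects expressions that mix `&` and `|` without parentheses.

```python
import re

# Terms are runs of label characters; operators and parentheses are single tokens.
TOKEN = re.compile(r"[A-Za-z0-9_\-.:/]+|[&|()]")


def is_visible(expression: str, authorizations: set) -> bool:
    """Evaluate a visibility expression against a set of authorization tokens."""
    if expression == "":
        return True  # an empty label is visible to everyone
    tokens = TOKEN.findall(expression)
    pos = 0

    def parse_term() -> bool:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_expr()
            pos += 1  # skip the closing ")"
            return value
        return tok in authorizations

    def parse_expr() -> bool:
        nonlocal pos
        value = parse_term()
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            op = tokens[pos]
            pos += 1
            rhs = parse_term()
            value = (value and rhs) if op == "&" else (value or rhs)
        return value

    return parse_expr()
```

For example, a primary care physician holding only the `PCP_JD` authorization would pass the check for `JD|PCP_JD` but not for `PSYCH_JD`.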

TABLET ORGANIZATION
- Tablets are collections of entries from tables
- Tables are partitioned into Tablets
- Metadata tablets hold info about other tablets, forming a 3-level hierarchy
- A Tablet is a unit of work for a Tablet Server

Hierarchy shown in the diagram:

Well-Known Location (zookeeper)
  Root Tablet: -∞ to ∞
    Metadata Tablet 1: -∞ to “Encyclopedia:Ocelot”
      Table: Adam’s Table
        Data Tablet: -∞ : thing
        Data Tablet: thing : ∞
      Table: Encyclopedia
        Data Tablet: -∞ : Ocelot
    Metadata Tablet 2: “Encyclopedia:Ocelot” to ∞
      Table: Encyclopedia
        Data Tablet: Ocelot : Yak
        Data Tablet: Yak : ∞
      Table: Foo
        Data Tablet: -∞ to ∞
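
Locating the tablet for a row amounts to a search over sorted split points. A minimal sketch of that lookup, assuming each tablet covers the rows after the previous split point up to and including its own end row; `find_tablet` is a hypothetical helper, not part of Accumulo, and `None` stands in for -∞ or ∞:

```python
from bisect import bisect_left


def find_tablet(splits, row):
    """Return the (start, end) range of the tablet that holds `row`.

    `splits` is the sorted list of split points for one table.
    None stands for -infinity (as start) or +infinity (as end).
    """
    i = bisect_left(splits, row)  # index of the first split point >= row
    start = splits[i - 1] if i > 0 else None
    end = splits[i] if i < len(splits) else None
    return (start, end)


# Split points of the Encyclopedia table from the diagram.
encyclopedia_splits = ["Ocelot", "Yak"]
```

The same search applies at each level of the hierarchy: the root tablet locates a metadata tablet, which in turn locates the data tablet.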

ACCUMULO ARCHITECTURE
Components and interactions shown in the diagram:
- Zookeeper: delegate authority, configs (for the Master and the Tablet Servers)
- Master: assign/balance tablets across Tablet Servers
- Tablet Servers: tablet read/write for Applications; store/replicate tablets in HDFS
- Garbage Collector: scan/delete in HDFS
- Applications: read and write tablets through the Tablet Servers

TABLET DATA FLOW
Flows shown in the diagram:
- Writes: Tablet -> In-Memory Map, and Write Ahead Log (For Recovery)
- Minor Compaction: In-Memory Map -> Iterator Tree -> Sorted, Indexed File
- Merging / Major Compaction: Sorted, Indexed Files -> Iterator Tree -> Sorted, Indexed File
- Scan: In-Memory Map and Sorted, Indexed Files -> Iterator Tree -> Reads
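
The data flow above follows the log-structured pattern from the Bigtable design. A toy model of it, as a sketch under simplifying assumptions: the `Tablet` class here is hypothetical, "files" are in-memory sorted lists, and the newest write for a key simply wins, whereas Accumulo keeps timestamped versions and applies iterators at each stage.

```python
import heapq


class Tablet:
    """Toy model of one tablet's write, compaction, and scan paths."""

    def __init__(self):
        self.wal = []      # write-ahead log, kept only for recovery
        self.memory = {}   # in-memory map: key -> value
        self.files = []    # sorted, immutable "files", oldest first

    def write(self, key, value):
        self.wal.append((key, value))  # log first, then apply in memory
        self.memory[key] = value

    def minor_compact(self):
        """Flush the in-memory map to a new sorted file."""
        if self.memory:
            self.files.append(sorted(self.memory.items()))
            self.memory = {}
            self.wal = []  # the logged writes now live in a file

    def major_compact(self):
        """Merge all files into a single sorted file."""
        if self.files:
            self.files = [list(self._merge(self.files))]

    def scan(self):
        """Merge files and the in-memory map into one sorted view."""
        return list(self._merge(self.files + [sorted(self.memory.items())]))

    @staticmethod
    def _merge(sources):
        # Tag entries with the negated source position so that, for equal
        # keys, the entry from the newest source comes out of the merge first.
        tagged = [
            [(key, -age, value) for key, value in src]
            for age, src in enumerate(sources)
        ]
        last = object()  # sentinel that equals no real key
        for key, _, value in heapq.merge(*tagged):
            if key != last:
                yield key, value
                last = key
```

Reads never modify files: a scan only merges sorted sources, which is why compactions can run in the background.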

ITERATOR FRAMEWORK
Iterator Operations:
- File Reads
- Block Caching
- Merging
- Deletion
- Isolation
- Locality Groups
- Range Selection
- Column Selection
- Cell-level Security
- Versioning
- Filtering
- Aggregation
- Partitioned Joins
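
Several of these operations compose naturally as a stack of iterators over a sorted stream. A minimal sketch using Python generators, covering merging, cell-level security, and versioning. All function names are hypothetical, entries are modeled as `(row, column, visibility, negated timestamp, value)` tuples, labels are assumed to be simple `|`-separated alternatives, and the older cholesterol reading is made-up data.

```python
import heapq
from itertools import groupby


def merging(*sources):
    """Merge several sorted sources into one sorted stream."""
    return heapq.merge(*sources)


def visibility_filter(source, authorizations):
    """Drop entries whose label shares no token with the authorizations."""
    for entry in source:
        label = entry[2]
        if label == "" or set(label.split("|")) & authorizations:
            yield entry


def versioning(source, max_versions=1):
    """Keep at most max_versions entries per (row, column, visibility).

    Relies on the negated timestamp making newer entries sort first.
    """
    for _, group in groupby(source, key=lambda e: e[:3]):
        for i, entry in enumerate(group):
            if i < max_versions:
                yield entry


file_entries = [
    ("John Doe", "Cholesterol", "JD|PCP_JD", -20110101, "190"),  # made-up older reading
    ("John Doe", "Mental Health", "JD|PSYCH_JD", -20120801, "Pass"),
    ("John Doe", "Mental Health", "PSYCH_JD", -20120801, "Crazy!"),
]
memory_entries = [
    ("John Doe", "Cholesterol", "JD|PCP_JD", -20120912, "183"),
]

stack = versioning(visibility_filter(merging(file_entries, memory_entries), {"JD"}))
visible_values = [entry[4] for entry in stack]
```

A reader holding only the `JD` authorization sees the newest cholesterol value and the "Pass" entry; the `PSYCH_JD`-only entry is filtered out on the server side before any data leaves the stack.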

CLIENT API
Instance
- new ZooKeeperInstance(...)
- new MockInstance()
- getConnector(auth info...) -> Connector

Connector
- TableOperations, InstanceOperations, SecurityOperations
- createScanner(...) -> Scanner
- createBatchScanner(...) -> BatchScanner
- createBatchWriter(...) -> BatchWriter

Scanner, BatchScanner
- Inputs: Authorizations, Range, IteratorOption
- iterator() -> Map.Entry (Key, Value)

BatchWriter
- addMutation(...) takes a Mutation

TABLE DESIGN
- No built-in secondary indices
- Sort order serves as the index
- Basic design pattern: forward and inverted index tables
- Additional table design patterns:
  - Graphs
  - Document-distributed indexing
  - Multi-dimensional index
  - Custom index

Table:             Forward Index   Inverted Index
Row:               <UUID>          <Term>
Column Family:     <Type>          <Type> + <Field>
Column Qualifier:  <Field>         <UUID>
Value:             <Term>          <Digest of Event>
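
The forward and inverted index pattern can be sketched as follows. This is an illustration under stated assumptions, not Sqrrl's or Accumulo's implementation: `index_event` and `lookup` are hypothetical helpers, the "+" separator between type and field is an arbitrary choice, the digest is a caller-supplied string, and the sample events are made up.

```python
from bisect import bisect_left


def index_event(uuid, event_type, fields, digest=""):
    """Build entries for both tables from one event.

    Each entry is (row, column_family, column_qualifier, value).
    """
    forward, inverted = [], []
    for field, term in fields.items():
        forward.append((uuid, event_type, field, term))
        inverted.append((term, event_type + "+" + field, uuid, digest))
    return forward, inverted


def lookup(inverted_table, term):
    """Range-scan a sorted inverted table for every UUID stored under `term`."""
    i = bisect_left(inverted_table, (term,))  # first entry whose row >= term
    uuids = []
    while i < len(inverted_table) and inverted_table[i][0] == term:
        uuids.append(inverted_table[i][2])
        i += 1
    return uuids


# Made-up events for illustration.
forward_table, inverted_table = [], []
for uuid, fields in [
    ("e1", {"patient": "John Doe", "doctor": "PCP"}),
    ("e2", {"patient": "John Doe"}),
]:
    fwd, inv = index_event(uuid, "visit", fields, digest="visit summary")
    forward_table.extend(fwd)
    inverted_table.extend(inv)
forward_table.sort()
inverted_table.sort()
```

Because both tables are sorted by row, a term search is a single range scan on the inverted table, followed by row lookups on the forward table for each UUID found.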

DEMO TIME!

OPEN SOURCE PROJECT
Apache Software Foundation project since October 2011
site: http://accumulo.apache.org
jira: https://issues.apache.org/jira/browse/ACCUMULO
lists: http://accumulo.apache.org/mailing_list.html

CURRENT CONTRIBUTORS

CONTACT

Adam Fuchs
CTO
adam@sqrrl.com

John Vines
Director of Ecosystems
john@sqrrl.com

sqrrl data, Inc.
www.sqrrl.com
@sqrrl_inc
info@sqrrl.com

Accumulo meetup 20130109

  • 1. Securely explore your data. APACHE ACCUMULO. Adam Fuchs and John Vines, sqrrl data, inc. January 9, 2013
  • 2. APACHE ACCUMULO Sorted, Distributed Key/Value Store. Based on Google’s Bigtable Design. Built on Top of Apache Hadoop and Apache ZooKeeper. Augments and Integrates With the Hadoop Ecosystem.
  • 3. TODAY’S TALK Overview of the Accumulo Project; Accumulo Design; Table Design Strategy; Live Demonstration
  • 4. ACCUMULO TIMELINE (axis: 2005 to 2013) Events: Google publishes papers: GFS (2003), MapReduce (2004); Google publishes Bigtable paper; NSA begins development of Accumulo; NSA open sources Accumulo into incubation at Apache; Accumulo becomes a top-level Apache project; sqrrl is founded; first sqrrl release planned.
  • 5. ACCUMULO’S STRENGTHS Apache Accumulo excels at: - Security: Cell-level security reduces the cost of application development in the presence of complex legal or policy restrictions on data use. Mandatory access control keeps your data safe. - Scalability: Proven reliability and performance at the multi-petabyte scale. High-performance parallel I/O library. - Adaptability: Flexible schema support to quickly ingest new data sources. Sorted key/value paradigm supports a multitude of search and analysis applications. Server-side programming framework (“iterator trees”) supports best-in-class aggregation, filtering, and complex query semantics.
  • 6. BASIC SCHEMA Accumulo stores sorted key/value pairs (entries). An Accumulo key is a 5-tuple, consisting of: - Row: Controls Atomicity - Column Family: Controls Locality - Column Qualifier: Controls Uniqueness - Visibility Label: Controls Access - Timestamp: Controls Versioning. Keys are sorted: - Hierarchically: Row first, then column family, and so on. - Lexicographically: Compare first byte, then second, and so on. Values are byte arrays.
  • 7. KEY/VALUE EXAMPLES Columns: Row | Col. Fam. | Col. Qual. | Visibility | Timestamp | Value. Entries: Jane Doe | Friends | John Doe | JD | 20121130; Jane Doe | PhoneNumber | 555-1212 (timestamp 20090115); John Doe | Friends | Jane Doe | JD | 20121201; John Doe | Notes | PCP | PCP_JD | 20120912 | Patient suffers from an acute …; John Doe | Test Results | Cholesterol | JD|PCP_JD | 20120912 | 183; John Doe | Test Results | Mental Health | JD|PSYCH_JD | 20120801 | Pass; John Doe | Test Results | Mental Health | PSYCH_JD | 20120801 | Crazy!; John Doe | Test Results | X-Ray | JD|PHYS_JD | 20120513 | 1010110110100 …
  • 8. VISIBILITY SYNTAX & SEMANTICS
  • 9. TABLET ORGANIZATION Collections of entries from tables. Tables are partitioned into Tablets. Metadata tablets hold info about other tablets, forming a 3-level hierarchy. A Tablet is a unit of work for a Tablet Server. [Diagram: Well-Known Location (ZooKeeper) → Root Tablet (-∞ to ∞) → Metadata Tablet 1 (-∞ to “Encyclopedia:Ocelot”) and Metadata Tablet 2 (“Encyclopedia:Ocelot” to ∞) → Table: Adam’s Table (Data Tablets -∞ : thing, thing : ∞); Table: Encyclopedia (Data Tablets -∞ : Ocelot, Ocelot : Yak, Yak : ∞); Table: Foo (Data Tablet -∞ to ∞)]
  • 10. ACCUMULO ARCHITECTURE [Diagram: components: ZooKeeper, Master, Tablet Servers hosting Tablets, Garbage Collector, HDFS, Applications; labeled interactions: Delegate Authority, Configs; Assign/Balance; Read/Write; Store/Replicate; Scan; Delete]
  • 11. TABLET DATA FLOW [Diagram: Writes → In-Memory Map and Write Ahead Log (For Recovery); Minor Compaction (Iterator Tree) → Sorted, Indexed File; Merging / Major Compaction (Iterator Tree) → Sorted, Indexed File; Scan (Iterator Tree) over the In-Memory Map and Sorted, Indexed Files → Reads]
  • 12. ITERATOR FRAMEWORK Iterator Operations: - File Reads - Block Caching - Merging - Deletion - Isolation - Locality Groups - Range Selection - Column Selection - Cell-level Security - Versioning - Filtering - Aggregation - Partitioned Joins
  • 13. CLIENT API [Diagram: new ZooKeeperInstance(...) or new MockInstance() → Instance; getConnector(auth info...) → Connector; Connector → TableOperations, InstanceOperations, SecurityOperations; createScanner(...), createBatchScanner(...), createBatchWriter(...) → Scanner, BatchScanner, BatchWriter; scanner inputs: Range, IteratorOption, Authorizations; iterator() → Map.Entry of Key and Value; addMutation(...) takes a Mutation]
  • 14. TABLE DESIGN No built-in secondary indices: Sort Order ≡ Index. Basic design pattern: forward and inverted index tables. Forward Index: Row: <UUID>; Column Family: <Type>; Column Qualifier: <Field>; Value: <Term>. Inverted Index: Row: <Term>; Column Family: <Type> + <Field>; Column Qualifier: <UUID>; Value: <Digest of Event>. Additional table design patterns: Graphs; Document-distributed indexing; Multi-dimensional index; Custom index.
  • 15. DEMO TIME!
  • 16. OPEN SOURCE PROJECT Apache Software Foundation project since October 2011. site: http://accumulo.apache.org jira: https://issues.apache.org/jira/browse/ACCUMULO lists: http://accumulo.apache.org/mailing_list.html
  • 17. CURRENT CONTRIBUTORS
  • 18. CONTACT Adam Fuchs, CTO, adam@sqrrl.com; John Vines, Director of Ecosystems, john@sqrrl.com; sqrrl data, Inc., www.sqrrl.com, @sqrrl_inc, info@sqrrl.com
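
The sort order described on slide 6 (hierarchical across the key fields, lexicographic within each field) can be illustrated with a small sketch. This is not Accumulo code: the `Key` tuple and `sort_key` helper are hypothetical names for illustration, the rows are borrowed from slide 7, and the last entry is an invented older version of the cholesterol cell. The sketch also assumes that timestamps sort newest-first, which is how Accumulo orders versions of the same cell.

```python
from typing import NamedTuple


class Key(NamedTuple):
    """Illustrative stand-in for the 5-tuple key (not the Accumulo Key class)."""
    row: bytes
    family: bytes
    qualifier: bytes
    visibility: bytes
    timestamp: int


def sort_key(k: Key):
    # Fields are compared hierarchically (row first, then family, and so on).
    # Python compares bytes lexicographically, one byte at a time.
    # The timestamp is negated so that newer versions sort first.
    return (k.row, k.family, k.qualifier, k.visibility, -k.timestamp)


entries = [
    Key(b"John Doe", b"Test Results", b"Cholesterol", b"JD|PCP_JD", 20120912),
    Key(b"Jane Doe", b"Friends", b"John Doe", b"JD", 20121130),
    Key(b"John Doe", b"Notes", b"PCP", b"PCP_JD", 20120912),
    Key(b"John Doe", b"Friends", b"Jane Doe", b"JD", 20121201),
    # Hypothetical older version of the same cell, added for illustration.
    Key(b"John Doe", b"Test Results", b"Cholesterol", b"JD|PCP_JD", 20110301),
]

ordered = sorted(entries, key=sort_key)

for k in ordered:
    print(k.row.decode(), k.family.decode(), k.qualifier.decode(), k.timestamp)
```

Because all entries for one row end up adjacent, a scan over a single row is a contiguous range read, which is what makes the sorted layout usable as an index.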

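The visibility labels on slide 7 (for example `JD|PSYCH_JD`) are boolean expressions over authorization tokens. The following is a simplified sketch of how such a label could be evaluated against a reader's authorizations. It is not Accumulo's own parser: `is_visible` and `tokenize` are hypothetical names, only letters, digits and underscores are accepted in tokens, and the sketch assumes the usual rules that `&` means "and", `|` means "or", an empty label is visible to everyone, and mixing `&` with `|` requires parentheses.

```python
import re

TOKEN = re.compile(r"[A-Za-z0-9_]+|[&|()]")


def tokenize(expr: str) -> list:
    tokens = TOKEN.findall(expr)
    # Reject anything the simplified token pattern did not consume.
    if "".join(tokens) != expr:
        raise ValueError(f"unsupported character in {expr!r}")
    return tokens


def is_visible(expr: str, auths: set) -> bool:
    """Return True if a reader holding `auths` may see a cell labelled `expr`."""
    if expr == "":
        return True  # an empty label is visible to every reader
    tokens = tokenize(expr)
    pos = 0

    def parse_expr() -> bool:
        nonlocal pos
        result = parse_term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is None:
                op = tokens[pos]
            elif tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            rhs = parse_term()
            result = (result and rhs) if op == "&" else (result or rhs)
        return result

    def parse_term() -> bool:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok in auths  # a bare token is satisfied if the reader holds it

    value = parse_expr()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return value


# The two "Mental Health" entries from slide 7 differ only in visibility.
cells = [("JD|PSYCH_JD", "Pass"), ("PSYCH_JD", "Crazy!")]
seen_by_jd = [v for label, v in cells if is_visible(label, {"JD"})]
seen_by_psych = [v for label, v in cells if is_visible(label, {"PSYCH_JD"})]
```

Here a reader holding only `JD` sees one of the two entries, while a reader holding `PSYCH_JD` sees both, which is the "withholding information" case mentioned in the notes.
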
Editor's notes

  1. Sort order across all keys. Columns can differ across rows. Value can have many different types. Entry can convey information with an empty value. Each key has a visibility: a given read will see a subset of the data. Visibility is part of key uniqueness: PSYCH_JD is withholding information from JD.
  2. Tablet Servers have 4 primary functions: hosting RPCs (read, write, etc.); managing resources (RAM, CPU, file I/O, etc.); scheduling background tasks (compactions, caching, etc.); and handling key/value pairs (via Iterators). Because Accumulo doesn’t use hashing to assign key/value pairs to servers, we need to store the mapping of TabletServer-to-Tablet. This mapping is stored in another table in Accumulo called the Metadata table. A client need only scan a portion of the Metadata table to find which TabletServers have the Tablets it wants (a search through the metadata hierarchy). The Metadata table’s TabletServer-to-Tablet assignments must also be stored somewhere. These are written to the first Tablet of the Metadata table, called the Root Tablet. However, you may notice that the Root Tablet itself is stored in Accumulo, which is somewhat of a circular dependency. That’s what we use ZooKeeper for: the location of the Root Tablet is always known to ZooKeeper.
  3. Apache Accumulo runs on top of Hadoop and ZooKeeper. It relies on HDFS for storage of data and on ZooKeeper for: - storage of config data (location of the metadata Root Tablet) - locking of tablets. ZooKeeper uses quorum consistency algorithms for high availability, to prevent a single point of failure. ZooKeeper itself is actually a K/V datastore, but holds very little data. Although we’ll be running ZK on the same machines as Accumulo and Hadoop, it’s recommended that the quorum of servers be separate.
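
The three-level lookup described in note 2 (well-known location, then Root Tablet, then Metadata tablet, then data tablet) can be sketched as follows. This is a toy model, not Accumulo's metadata format: the tuple keys, the `LAST` sentinel, the `first_at_or_after` and `locate` helpers, and the `tserver-N` names are all assumptions made for illustration. The table names and split points come from slide 9, binary search stands in for the client's scan of the metadata, and each tablet is assumed to cover the rows up to and including its end row.

```python
from bisect import bisect_left

# Stand-in for +infinity: sorts after any ordinary table name or row.
LAST = "\U0010ffff"


def first_at_or_after(entries, key):
    """Return the value of the first (end_key, value) entry with end_key >= key."""
    end_keys = [end_key for end_key, _ in entries]
    return entries[bisect_left(end_keys, key)][1]


# Level 3: metadata tablets map (table, tablet end row) to a tablet server.
metadata_tablet_1 = [
    (("Adam's Table", "thing"), "tserver-1"),
    (("Adam's Table", LAST), "tserver-2"),
    (("Encyclopedia", "Ocelot"), "tserver-3"),
]
metadata_tablet_2 = [
    (("Encyclopedia", "Yak"), "tserver-1"),
    (("Encyclopedia", LAST), "tserver-2"),
    (("Foo", LAST), "tserver-3"),
]

# Level 2: the root tablet maps each metadata tablet's end key to that tablet.
root_tablet = [
    (("Encyclopedia", "Ocelot"), metadata_tablet_1),
    ((LAST, LAST), metadata_tablet_2),
]

# Level 1: the well-known location (ZooKeeper in the talk) points at the root.
well_known_location = {"root_tablet": root_tablet}


def locate(table, row):
    """Find the (hypothetical) tablet server hosting `row` of `table`."""
    key = (table, row)
    root = well_known_location["root_tablet"]
    metadata_tablet = first_at_or_after(root, key)
    return first_at_or_after(metadata_tablet, key)


print(locate("Encyclopedia", "Panda"))
```

Because the data is range-partitioned rather than hashed, the same sorted lookup works at every level, and a client only ever needs the one fixed starting point held outside Accumulo.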