Introducing
DataWave
Hannah Pellon
Agenda
• Prerequisites
• High Level Architecture
• Key Concepts
• Ingest
• Query
• Applications
• FAQ
Foundations
Prerequisites
• Hadoop (HDFS, YARN, MapReduce)
• Accumulo (General Architecture, Iterators, Authorizations, Shell)
• Zookeeper
• Wildfly
What is Datawave?
Storage & Retrieval engine built on Apache Accumulo Providing:
• Tunable ingest workflow
• Table schema
• Query API
• Query parsing, planning, and execution
• Support for a variety of data types and formats
Datawave Overview
• Storage & Retrieval engine built on Apache Accumulo
[Diagram: Ingest and Query sides of Datawave. Inputs: text, image, html, csv, ... (various structured and unstructured data formats). Labeled components: Datawave Ingest, Datawave warehouse.]
Datawave Architecture
[Diagram: components shown are Ingress (HAProxy), Dataflow (NiFi), HDFS, YARN, MapReduce, Zookeeper, Accumulo, Wildfly, Datawave Flag Maker, Datawave Ingest Job, Datawave Bulk Loader, Datawave Tables, Datawave Iterators, Datawave Query, Datawave Web and the Datawave REST API. Inputs: text, html, image, csv, ...]
Many Data Sources / Multiple Hadoop Clusters / Multiple HDFS Instances / Many Web Front-ends
Key Concepts
Data Model
• Records
• RawRecordContainer, Event, or Document
• A collection of fields and content data
• Fields
• Can be multi valued
• Can be indexed, reverse indexed, excluded, or simply stored
• Special fields: index-only, tokenized, and virtual fields
[Diagram: Record → Field → Field Type → Normalizer]
Mytype-config.xml
<property>
<name>mytype.data.category.index</name>
<value>NAME,ID,ID,EXTERNALS_THETVDB,EXTERNALS_TVRAGE,EXTERNALS_IMDB,EMBEDDED_CAST_CHARACTER_NAME,EMBEDDED_CAST_PERSON_NAME,EMBEDDED_CAST_PERSON_ID,GENRES</value>
</property>
<property>
<name>mytype.data.category.index.reverse</name>
<value>NAME,NETWORK_NAME,OFFICIALSITE,URL</value>
</property>
…
Normalizers
• Field Types & Normalizers
• A Normalizer converts raw field data into index entries.
• Values are normalized in the index tables, but not in the shard table
[Diagram: Record → Field → Field Type → Normalizer]
Example normalizations:
LcNoDiacritics: BaNaNa → banana
NumberType: 10 → +bE1
NumberType: 27 → +bE2.7
NumberType: 100 → +cE1
…
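The normalizations listed above can be sketched in a few lines of Python. This is an illustration of the idea only, not Datawave's implementation: `lc_no_diacritics` and `encode_positive_number` are hypothetical helpers, and the number layout (sign, one letter for the power of ten, `E`, then the mantissa) is inferred from the three examples on the slide and only covers values of 1 or more.

```python
import unicodedata
from decimal import Decimal


def lc_no_diacritics(raw: str) -> str:
    """Lower-case the value and drop combining accent marks (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", raw)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def encode_positive_number(raw: str) -> str:
    """Encode a number >= 1 so that string order matches numeric order.

    Assumption: the layout is inferred from the slide's examples only.
    """
    value = Decimal(raw)
    if value < 1:
        raise ValueError("this sketch only covers values >= 1")
    exponent = value.adjusted()  # power of ten of the leading digit: 10 -> 1, 100 -> 2
    if exponent > 25:
        raise ValueError("exponent does not fit in a single letter a-z")
    mantissa = value.scaleb(-exponent).normalize()  # 27 -> 2.7, 100 -> 1
    return "+" + chr(ord("a") + exponent) + "E" + format(mantissa, "f")


print(lc_no_diacritics("BaNaNa"))
print([encode_positive_number(n) for n in ("10", "27", "100")])
```

The point of the encoding is that plain string comparison on the encoded form orders values numerically, which the raw strings "10", "100", "27" would not.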
Mytype-config.xml
<property>
<name>mytype.data.default.type.class</name>
<value>datawave.data.type.LcNoDiacriticsType</value>
</property>
<property>
<name>mytype.START.data.field.type.class</name>
<value>datawave.data.type.DateType</value>
</property>
<property>
<name>mytype.WEIGHT.data.field.type.class</name>
<value>datawave.data.type.NumberType</value>
</property>
Tables, Indexes & Shards
• Overall Structure
• Data is partitioned into date-based shards
• Each Shard is contained in a single tablet
• Each Shard has its own Field & Term Index, Data & Record Storage
• Certain Field values tracked in Edges
[Diagram: the Shard Table holds many Shards; each Shard contains a Field Index, a Term Index, Record Storage and Data Storage. Shown alongside it: the Global Index Table, the Edge Table, the Metadata Table, …]
Dates
• LOAD_DATE
• ACTIVITY_DATE
• EVENT_DATE
• …
Mytype-config.xml
<property>
<name>mytype.data.category.date</name>
<value>EVENT_DATE</value>
</property>
YYYYMMDD_id
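A row id of the form YYYYMMDD_id could be derived as sketched below. The slides only state that shards are date-based; the partitioning rule used here (a stable hash of the record UID modulo a fixed shard count) and the names `shard_id` and `NUM_SHARDS` are assumptions made for illustration.

```python
import zlib
from datetime import date

NUM_SHARDS = 3  # assumed number of shards per day


def shard_id(event_date: date, record_uid: str, num_shards: int = NUM_SHARDS) -> str:
    """Build a YYYYMMDD_id row id from the record's date and a stable hash of its UID."""
    partition = zlib.crc32(record_uid.encode("utf-8")) % num_shards
    return f"{event_date:%Y%m%d}_{partition}"


print(shard_id(date(2021, 7, 24), "doc1"))
```

Whichever date field is configured under `mytype.data.category.date` would supply `event_date` in this picture.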
Table Structures
Shard
• Field Index – Index of fields within a shard
• Record – All fields and values for an Event
• Term Index – Index of terms within a record
• Document Data – All content for an object
(\0 marks the null-byte separator between key components)
Section | RowId | ColumnFamily | ColumnQualifier | Value
Record | Shard id | Datatype\0DocUID | FieldName\0Value |
Field Index | Shard id | ‘fi’\0FieldName | Value\0Datatype\0DocUID |
Term Index | Shard id | ‘tf’ | Datatype\0DocUID\0Value\0FieldName | Protocol Buffer (offsets, stats)
Document Data | Shard id | ‘d’ | Datatype\0DocUID\0ContentName | Document contents (GZIP and base64 encoded)
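The four key layouts above can be written out as small builder functions. This is a sketch under the assumption that the "0" in the layouts stands for a null-byte separator; the `Key` tuple and the function names are hypothetical and do not come from the Datawave code base.

```python
from typing import NamedTuple

NULL = "\0"  # assumed separator between key components


class Key(NamedTuple):
    row: str
    column_family: str
    column_qualifier: str


def event_key(shard, datatype, uid, field, value):
    """Record entry: one key per field value of an event."""
    return Key(shard, datatype + NULL + uid, field + NULL + value)


def field_index_key(shard, datatype, uid, field, normalized_value):
    """Field index entry: lets a scan find documents by field value inside one shard."""
    return Key(shard, "fi" + NULL + field, NULL.join((normalized_value, datatype, uid)))


def term_frequency_key(shard, datatype, uid, field, normalized_value):
    """Term index entry: the value cell (offsets, stats) is not modeled here."""
    return Key(shard, "tf", NULL.join((datatype, uid, normalized_value, field)))


def document_key(shard, datatype, uid, content_name):
    """Document data entry: the value cell would carry the encoded content."""
    return Key(shard, "d", NULL.join((datatype, uid, content_name)))


print(event_key("20210724_0", "favorites", "doc1", "FOOD.0", "Banana"))
print(field_index_key("20210724_0", "favorites", "doc1", "FOOD", "banana"))
```

Because every key starts with the shard id as its row, all four kinds of entry for a record sort into the same shard.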
Shard Index and Shard Reverse Index
• shardIndex
• shardReverseIndex
Table | RowId | Column Family | Column Qualifier | Value
shardIndex | Normalized Field Value | Normalized Field Name | ShardID\0DataType | Doc Uid.List or Uid.List.size
shardReverseIndex | eulaV dleiF dezilamroN (the normalized field value, reversed) | Normalized Field Name | ShardID\0DataType | Doc Uid.List or Uid.List.size
FOOD == ‘Banana’ AND CAR_MAKE =~ ‘.*i’
shardIndex
RowId | Column Family | Column Qualifier | Value
…
banana | FOOD | 20210724_0\0favorites | doc1
banana | FOOD | 20210724_1\0favorites | doc2
…
shardReverseIndex
RowId | Column Family | Column Qualifier | Value
…
inihgrobmal | CAR_MAKE | 20210724_0\0favorites | doc1
irarref | CAR_MAKE | 20210724_1\0favorites | doc2
ittagub | CAR_MAKE | 20210724_2\0favorites | doc3,doc4,doc5
…
Shard Table
RowId | Column Family | Column Qualifier
20210724_0 | favorites\0doc1 | CAR_MAKE.0\0Lamborghini
20210724_0 | favorites\0doc1 | COLOR.0\0Blue
20210724_0 | favorites\0doc1 | FOOD.0\0Banana
20210724_0 | favorites\0doc1 | NAME.0\0John Smith
20210724_1 | favorites\0doc2 | CAR_MAKE.0\0Ferrari
20210724_1 | favorites\0doc2 | COLOR.0\0Red
20210724_1 | favorites\0doc2 | FOOD.0\0Banana
20210724_1 | favorites\0doc2 | NAME.0\0Jane Doe
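The example query can be traced with dictionary-backed stand-ins for the two index tables. Everything here is a sketch: `SHARD_INDEX`, `SHARD_REVERSE_INDEX` and the lookup functions are hypothetical, lower-casing stands in for normalization, and only leading-wildcard patterns of the form `.*suffix` are handled. It shows why the reverse index exists: a suffix match becomes a prefix match on reversed values.

```python
# In-memory stand-ins for the index tables: (row, field) -> {(shard, datatype): [uids]}
SHARD_INDEX = {
    ("banana", "FOOD"): {
        ("20210724_0", "favorites"): ["doc1"],
        ("20210724_1", "favorites"): ["doc2"],
    },
}
SHARD_REVERSE_INDEX = {
    ("inihgrobmal", "CAR_MAKE"): {("20210724_0", "favorites"): ["doc1"]},
    ("irarref", "CAR_MAKE"): {("20210724_1", "favorites"): ["doc2"]},
    ("ittagub", "CAR_MAKE"): {("20210724_2", "favorites"): ["doc3", "doc4", "doc5"]},
}


def hits(entries):
    """Flatten index entries into a set of (shard, datatype, uid) triples."""
    return {(shard, datatype, uid)
            for (shard, datatype), uids in entries.items()
            for uid in uids}


def lookup_equals(field, value):
    """FIELD == 'value': a direct row lookup on the normalized value."""
    return hits(SHARD_INDEX.get((value.lower(), field), {}))


def lookup_suffix(field, suffix):
    """FIELD =~ '.*suffix': a prefix match against the reversed values."""
    prefix = suffix.lower()[::-1]
    found = set()
    for (row, indexed_field), entries in SHARD_REVERSE_INDEX.items():
        if indexed_field == field and row.startswith(prefix):
            found |= hits(entries)
    return found


# AND is the intersection of the two candidate sets.
matches = lookup_equals("FOOD", "Banana") & lookup_suffix("CAR_MAKE", "i")
print(sorted(matches))
```

Only doc1 and doc2 survive the intersection; doc3, doc4 and doc5 match the car make pattern but have no banana entry.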
Edge
Use | rowID | Column Family | Column Qualifier | Value
Edge Event Count per Day | EntityA\0EntityB | edgeType/relationshipA-relationshipB | yyyyMMdd/edgeAttributes | Protocol Buffer containing count for day as Long and int32 bitmask for hour of day.
Entity Activity per Hour | EntityA | STATS/ACTIVITY/edgeType/edgeRelationship | yyyyMMdd/edgeAttributes | Protocol Buffer containing count per hour as Long[24]
• Relationship between field values in a record
• Allows all records with the same field values to be grouped together
• Supports Iterative graph building
Edges
…
kevin bacon\0emily tremaine | TV_COSTARS/PERSON-PERSON | 20210724/TVMAZE_METADATA-TVMAZE_METADATA/tremors/29754/A
kevin bacon\0valentine mckee | TV_CHARACTERS/PERSON-CHARACTER | 20210724/TVMAZE_METADATA-TVMAZE_METADATA/tremors/29754/A
…
Edge-definitions.xml
<bean id="myjson" class="datawave.ingest.mapreduce.handler.edge.define.EdgeDefinitionConfigurationHelper" scope="prototype">
…
<property name="edges">
<list>
<!-- Create bidirectional edges for each pair of costars in a tv show -->
<bean class="datawave.ingest.mapreduce.handler.edge.define.EdgeDefinition">
<property name="edgeType" value="TV_COSTARS"/>
<property name="direction" value="bi"/>
<property name="AllPairs">
<list>
<bean class="datawave.ingest.mapreduce.handler.edge.define.EdgeNode">
<property name="selector" value="EMBEDDED_CAST_PERSON_NAME.EMBEDDED_0.CAST_0.PERSON_0.NAME_0"/>
<property name="relationship" value="PERSON"/>
<property name="collection" value="TVMAZE_METADATA"/>
</bean>
<bean class="datawave.ingest.mapreduce.handler.edge.define.EdgeNode">
<property name="selector" value="EMBEDDED_CAST_PERSON_NAME.EMBEDDED_0.CAST_1.PERSON_0.NAME_0"/>
<property name="relationship" value="PERSON"/>
<property name="collection" value="TVMAZE_METADATA"/>
</bean>
…
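What a bidirectional, all-pairs edge definition produces can be sketched as follows. This is not the Datawave edge handler: `edge_keys` is a hypothetical helper, the lower-casing of names is an assumption based on the example rows, and only the date and collection parts of the column qualifier are built because the slides do not explain the remaining attributes.

```python
from itertools import permutations

NULL = "\0"  # assumed separator between the two entities in the row id


def edge_keys(cast, edge_type, relationship, collection, day):
    """Return (row, column family, column qualifier) for every ordered pair.

    Ordered pairs give both A->B and B->A, which is what a bidirectional
    edge between each pair of cast members amounts to.
    """
    keys = []
    for source, sink in permutations(cast, 2):
        row = source.lower() + NULL + sink.lower()
        family = f"{edge_type}/{relationship}-{relationship}"
        qualifier = f"{day}/{collection}-{collection}"
        keys.append((row, family, qualifier))
    return keys


# Placeholder cast names, used only to show the shape of the output.
for key in edge_keys(["Kevin Bacon", "Costar One", "Costar Two"],
                     "TV_COSTARS", "PERSON", "TVMAZE_METADATA", "20210724"):
    print(key)
```

Three cast members yield six edges, so the edge count grows quadratically with cast size.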
Datawave Metadata
Use | rowID | ColumnFamily | Column Qualifier | Value
Event Metadata: datatypes this field has been ingested for | Field name | ‘e’ | DataType | NULL
Index Metadata: datatypes this field is indexed for | Field name | ‘i’ | DataType | NULL
Reverse Index Metadata: datatypes this field is reverse indexed for | Field name | ‘ri’ | DataType | NULL
Type Metadata: class used to normalize field | Field name | ‘t’ | DataType\0DatatypeClassName | NULL
Event Column Frequency: count per field per type per day | Field name | ‘f’ | DataType\0YYYYMMDD (event date) | Count
Description | Field name | ‘desc’ | DataType | Text description of field
• Tracks field characteristics and frequencies per datatype
Datawave Metadata Continued
Use | rowID | ColumnFamily | Column Qualifier | Value
Query Model Mapping | Field name | Model Name | Field Name\0Mapping Direction | NULL
Edge metadata | edgeType/relationshipA-relationshipB | ‘edge’ | infoA-infoB/edgeAttributes | Protocol Buffer containing the list of field names that contributed to this edge type with counts
Query
Datawave Query
• JEXL or Lucene query syntax with functions
• Execution shaped by data characteristics
• Iterators to perform low level functions (e.g: Intersection)
• REST API endpoints
  POST Query/Logic/create - Parses, validates, optimizes query; considers field cardinality, index, ...
  GET Query/id/next - Returns pages of results
  POST Query/id/close - Frees up resources
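The create / next / close lifecycle can be sketched as a small paging loop. The paths and HTTP methods mirror the slide; everything else is an assumption: `send` is a hypothetical caller-supplied transport, the `id` response field and the use of `None` to signal "no more pages" are invented for the sketch, and authentication and the deployment's URL prefix are left out.

```python
from typing import Callable, Iterator, Optional

# Hypothetical transport: (method, path, parameters) -> decoded response,
# or None when no pages remain.
Send = Callable[[str, str, Optional[dict]], Optional[dict]]


def run_query(send: Send, logic: str, params: dict) -> Iterator[dict]:
    """Create a query, yield each page of results, and always close the query."""
    created = send("POST", f"Query/{logic}/create", params)
    query_id = created["id"]  # assumed response field
    try:
        while True:
            page = send("GET", f"Query/{query_id}/next", None)
            if page is None:
                break
            yield page
    finally:
        # Closing frees server-side resources even if the caller stops early.
        send("POST", f"Query/{query_id}/close", None)


def make_fake_send(pages):
    """In-memory stand-in for a server, used only to exercise run_query."""
    calls, remaining = [], list(pages)

    def send(method, path, body):
        calls.append((method, path))
        if path.endswith("/create"):
            return {"id": "q1"}
        if path.endswith("/next"):
            return remaining.pop(0) if remaining else None
        return {}

    return send, calls


send, calls = make_fake_send([{"events": ["doc1"]}, {"events": ["doc2"]}])
pages = list(run_query(send, "EventQuery", {"query": "FOOD == 'banana'"}))
print(len(pages), calls[-1])
```

Putting the close call in a `finally` block reflects the slide's note that closing frees up resources: a client that abandons a query halfway should still release it.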
Grouping Contexts
{FIELD}.{GROUPING_CONTEXT}
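• Example Query: ITEM == ‘banana’ AND COST == 500
  matches Record 1, even though the 500 belongs to a different item than the banana
• ITEM == ‘banana’ AND #MATCHES_IN_GROUP(ITEM, ‘banana’, COST, 500)
  requires both matches to share a grouping context, so Record 1 no longer matches
Record 1
RowId | Column Family | Field.Group | Value
20210724_0 | purchase\0doc1 | ITEM.0 | BANANA
20210724_0 | purchase\0doc1 | ITEM.1 | MATTRESS
20210724_0 | purchase\0doc1 | ITEM.2 | MITERSAW
20210724_0 | purchase\0doc1 | COST.0 | 0.69
20210724_0 | purchase\0doc1 | COST.1 | 500
20210724_0 | purchase\0doc1 | COST.2 | 249.99
20210724_0 | purchase\0doc1 | STORE.0 | NILE
20210724_0 | purchase\0doc1 | TRANSACTION_ID.0 | 1234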
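The difference between the two queries can be reproduced on Record 1 with a few lines of Python. This is a sketch of the grouping idea only; `naive_match` and `matches_in_group` are hypothetical functions, not the Datawave query functions, and values are compared as case-insensitive strings for simplicity.

```python
# Record 1 as (field, grouping context, value) triples.
RECORD = [
    ("ITEM", "0", "BANANA"),
    ("ITEM", "1", "MATTRESS"),
    ("ITEM", "2", "MITERSAW"),
    ("COST", "0", "0.69"),
    ("COST", "1", "500"),
    ("COST", "2", "249.99"),
    ("STORE", "0", "NILE"),
    ("TRANSACTION_ID", "0", "1234"),
]


def groups_matching(record, field, value):
    """Grouping contexts in which `field` holds `value`."""
    wanted = str(value).lower()
    return {group for f, group, v in record if f == field and v.lower() == wanted}


def naive_match(record, conditions):
    """Plain AND: each condition may be satisfied by any group."""
    return all(groups_matching(record, f, v) for f, v in conditions)


def matches_in_group(record, conditions):
    """Grouped AND: all conditions must hold within one shared group."""
    shared = None
    for field, value in conditions:
        groups = groups_matching(record, field, value)
        shared = groups if shared is None else shared & groups
    return bool(shared)


query = [("ITEM", "banana"), ("COST", 500)]
print(naive_match(RECORD, query), matches_in_group(RECORD, query))
```

The plain AND succeeds because ITEM.0 and COST.1 each satisfy one condition; the grouped version fails because no single grouping context satisfies both.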
Query Functions
• MATCHES_IN_GROUP
• INCLUDE(field, regex)
• EXCLUDE(field, regex)
• OCCURRENCE(field, operator, count)
• …
GeoWave
• INTERSECTS_BOUNDING_BOX(field, westLon, eastLon, southLat, northLat)
• INTERSECTS_RADIUS_KM(field, centerLon, centerLat, radiusKm)
• …
Query Logics
• EventQuery - record retrieval
• LookupUUID - find records given an ID
• EdgeQuery - find records given edge members and attributes
• DiscoveryQuery - record counts by attribute
• MetricsQuery - find query metrics
Query/Logic/create
Query/next
Query/close
Query Models
• Search many similar fields with a single term
FOOD → FRUIT OR VEGETABLE OR MEAT OR GRAIN OR DAIRY
IDS → ACCOUNT_ID OR ACCOUNT_NUMBER OR TRANSACTION_ID OR … OR REPORT_ID OR STATEMENT_NUMBER
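A query model can be pictured as a rewrite step that expands one model field into a disjunction over the underlying fields. The sketch below is hypothetical: `QUERY_MODEL` and `expand` are illustrative names, only `FIELD == 'literal'` terms with straight quotes are handled, and the IDS mapping is left out because the slide elides some of its members.

```python
import re

# Model field -> underlying fields (FOOD mapping taken from the slide).
QUERY_MODEL = {
    "FOOD": ["FRUIT", "VEGETABLE", "MEAT", "GRAIN", "DAIRY"],
}

# Matches terms such as FOOD == 'banana'.
TERM = re.compile(r"\b([A-Z_][A-Z0-9_]*)\s*==\s*('[^']*')")


def expand(query: str, model=QUERY_MODEL) -> str:
    """Replace each modeled term with an OR over its underlying fields."""
    def replace(match):
        field, literal = match.groups()
        targets = model.get(field)
        if not targets:
            return match.group(0)  # field is not part of the model
        return "(" + " OR ".join(f"{t} == {literal}" for t in targets) + ")"

    return TERM.sub(replace, query)


print(expand("FOOD == 'banana' AND COLOR == 'yellow'"))
```

The user writes one term against FOOD and the expansion searches all five underlying fields; terms on fields outside the model pass through unchanged.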
Applications
Applications
• Healthcare
• Symptom correlation
• Outbreak tracking
• Purchase Trends
• Network Security
• …
• Any data set/feed you can translate into FIELD/VALUE pairs
myocarditis\0bacterial infection | SYMPTOMS/SYMPTOM-DIAGNOSIS | 20210724/UCI-HEART-DISEASE
myocarditis\0covid-19 | SYMPTOMS/SYMPTOM-DIAGNOSIS | 20210724/COVID-WHO
myocarditis\0fever | SYMPTOMS/SYMPTOM-SYMPTOM | 20210724/COVID-WHO
Thanks!
Backups
Query
[Diagram: the load balancer (Ingress / HAProxy) assigns a query to a webserver. Webserver components shown: QueryExecutorBean, QueryExpirationBean, CachedResultsBean, QueryLogic, QueryPlanner, Pushdown Scheduler, QueryTransformer. Query execution takes place on the Tserver against the Shard Table, Global Index Table and Edge Table, and produces Results.]
Ingest Classes
EventMapper Anatomy: Raw Records to K,V Pairs
[Diagram: classes shown are InputFormat, RecordReader, EventMapper, DataTypeHandler and IngestHelper. A raw file (Record1, Record2, …) is split into {k, RawRecordContainer} pairs; getHelper and getEventFields turn the RawRecordContainer (RRC) into a field/value multimap; process produces Mutations / Bulk Ingest Keys.]

Más contenido relacionado

La actualidad más candente

Manchester Hadoop Meetup: Spark Cassandra Integration
Manchester Hadoop Meetup: Spark Cassandra IntegrationManchester Hadoop Meetup: Spark Cassandra Integration
Manchester Hadoop Meetup: Spark Cassandra Integration
Christopher Batey
 
Dublin Ireland Spark Meetup October 15, 2015
Dublin Ireland Spark Meetup October 15, 2015Dublin Ireland Spark Meetup October 15, 2015
Dublin Ireland Spark Meetup October 15, 2015
eddiebaggott
 
Trivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan Ott
Trivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan OttTrivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan Ott
Trivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan Ott
Trivadis
 
SenchaCon 2016: Add Magic to Your Ext JS Apps with D3 Visualizations - Vitaly...
SenchaCon 2016: Add Magic to Your Ext JS Apps with D3 Visualizations - Vitaly...SenchaCon 2016: Add Magic to Your Ext JS Apps with D3 Visualizations - Vitaly...
SenchaCon 2016: Add Magic to Your Ext JS Apps with D3 Visualizations - Vitaly...
Sencha
 
Reading Cassandra Meetup Feb 2015: Apache Spark
Reading Cassandra Meetup Feb 2015: Apache SparkReading Cassandra Meetup Feb 2015: Apache Spark
Reading Cassandra Meetup Feb 2015: Apache Spark
Christopher Batey
 
SenchaCon 2016: The Once and Future Grid - Nige White
SenchaCon 2016: The Once and Future Grid - Nige WhiteSenchaCon 2016: The Once and Future Grid - Nige White
SenchaCon 2016: The Once and Future Grid - Nige White
Sencha
 
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Cloudera, Inc.
 
SenchaCon 2016: Developing COSMOS Using Sencha Ext JS 5 - Shenglin Xu
SenchaCon 2016: Developing COSMOS Using Sencha Ext JS 5 - Shenglin XuSenchaCon 2016: Developing COSMOS Using Sencha Ext JS 5 - Shenglin Xu
SenchaCon 2016: Developing COSMOS Using Sencha Ext JS 5 - Shenglin Xu
Sencha
 
N1QL+GSI: Language and Performance Improvements in Couchbase 5.0 and 5.5
N1QL+GSI: Language and Performance Improvements in Couchbase 5.0 and 5.5N1QL+GSI: Language and Performance Improvements in Couchbase 5.0 and 5.5
N1QL+GSI: Language and Performance Improvements in Couchbase 5.0 and 5.5
Keshav Murthy
 
Mondrian update (Pentaho community meetup 2012, Amsterdam)
Mondrian update (Pentaho community meetup 2012, Amsterdam)Mondrian update (Pentaho community meetup 2012, Amsterdam)
Mondrian update (Pentaho community meetup 2012, Amsterdam)
Julian Hyde
 
From SQL to NoSQL: Structured Querying for JSON
From SQL to NoSQL: Structured Querying for JSONFrom SQL to NoSQL: Structured Querying for JSON
From SQL to NoSQL: Structured Querying for JSON
Keshav Murthy
 
FleetDB: A Schema-Free Database in Clojure
FleetDB: A Schema-Free Database in ClojureFleetDB: A Schema-Free Database in Clojure
FleetDB: A Schema-Free Database in Clojure
Mark McGranaghan
 
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
GeeksLab Odessa
 
Couchbase N1QL: Index Advisor
Couchbase N1QL: Index AdvisorCouchbase N1QL: Index Advisor
Couchbase N1QL: Index Advisor
Keshav Murthy
 
FleetDB A Schema-Free Database in Clojure
FleetDB A Schema-Free Database in ClojureFleetDB A Schema-Free Database in Clojure
FleetDB A Schema-Free Database in Clojure
elliando dias
 
MongoDB at Giant Eagle by David Williams
MongoDB at Giant Eagle by David WilliamsMongoDB at Giant Eagle by David Williams
MongoDB at Giant Eagle by David Williams
MongoDB
 
ASP.Net 3.5 SP1 Dynamic Data
ASP.Net 3.5 SP1 Dynamic DataASP.Net 3.5 SP1 Dynamic Data
ASP.Net 3.5 SP1 Dynamic Data
micham
 
Aws simple db
Aws simple dbAws simple db
Aws simple db
Genki Fukusaki
 
VISUAL BASIC .net data accesss vii
VISUAL BASIC .net data accesss viiVISUAL BASIC .net data accesss vii
VISUAL BASIC .net data accesss vii
argusacademy
 
SparkSQL and Dataframe
SparkSQL and DataframeSparkSQL and Dataframe
SparkSQL and Dataframe
Namgee Lee
 

La actualidad más candente (20)

Manchester Hadoop Meetup: Spark Cassandra Integration
Manchester Hadoop Meetup: Spark Cassandra IntegrationManchester Hadoop Meetup: Spark Cassandra Integration
Manchester Hadoop Meetup: Spark Cassandra Integration
 
Dublin Ireland Spark Meetup October 15, 2015
Dublin Ireland Spark Meetup October 15, 2015Dublin Ireland Spark Meetup October 15, 2015
Dublin Ireland Spark Meetup October 15, 2015
 
Trivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan Ott
Trivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan OttTrivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan Ott
Trivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan Ott
 
SenchaCon 2016: Add Magic to Your Ext JS Apps with D3 Visualizations - Vitaly...
SenchaCon 2016: Add Magic to Your Ext JS Apps with D3 Visualizations - Vitaly...SenchaCon 2016: Add Magic to Your Ext JS Apps with D3 Visualizations - Vitaly...
SenchaCon 2016: Add Magic to Your Ext JS Apps with D3 Visualizations - Vitaly...
 
Reading Cassandra Meetup Feb 2015: Apache Spark
Reading Cassandra Meetup Feb 2015: Apache SparkReading Cassandra Meetup Feb 2015: Apache Spark
Reading Cassandra Meetup Feb 2015: Apache Spark
 
SenchaCon 2016: The Once and Future Grid - Nige White
SenchaCon 2016: The Once and Future Grid - Nige WhiteSenchaCon 2016: The Once and Future Grid - Nige White
SenchaCon 2016: The Once and Future Grid - Nige White
 
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
 
SenchaCon 2016: Developing COSMOS Using Sencha Ext JS 5 - Shenglin Xu
SenchaCon 2016: Developing COSMOS Using Sencha Ext JS 5 - Shenglin XuSenchaCon 2016: Developing COSMOS Using Sencha Ext JS 5 - Shenglin Xu
SenchaCon 2016: Developing COSMOS Using Sencha Ext JS 5 - Shenglin Xu
 
N1QL+GSI: Language and Performance Improvements in Couchbase 5.0 and 5.5
N1QL+GSI: Language and Performance Improvements in Couchbase 5.0 and 5.5N1QL+GSI: Language and Performance Improvements in Couchbase 5.0 and 5.5
N1QL+GSI: Language and Performance Improvements in Couchbase 5.0 and 5.5
 
Mondrian update (Pentaho community meetup 2012, Amsterdam)
Mondrian update (Pentaho community meetup 2012, Amsterdam)Mondrian update (Pentaho community meetup 2012, Amsterdam)
Mondrian update (Pentaho community meetup 2012, Amsterdam)
 
From SQL to NoSQL: Structured Querying for JSON
From SQL to NoSQL: Structured Querying for JSONFrom SQL to NoSQL: Structured Querying for JSON
From SQL to NoSQL: Structured Querying for JSON
 
FleetDB: A Schema-Free Database in Clojure
FleetDB: A Schema-Free Database in ClojureFleetDB: A Schema-Free Database in Clojure
FleetDB: A Schema-Free Database in Clojure
 
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
 
Couchbase N1QL: Index Advisor
Couchbase N1QL: Index AdvisorCouchbase N1QL: Index Advisor
Couchbase N1QL: Index Advisor
 
FleetDB A Schema-Free Database in Clojure
FleetDB A Schema-Free Database in ClojureFleetDB A Schema-Free Database in Clojure
FleetDB A Schema-Free Database in Clojure
 
MongoDB at Giant Eagle by David Williams
MongoDB at Giant Eagle by David WilliamsMongoDB at Giant Eagle by David Williams
MongoDB at Giant Eagle by David Williams
 
ASP.Net 3.5 SP1 Dynamic Data
ASP.Net 3.5 SP1 Dynamic DataASP.Net 3.5 SP1 Dynamic Data
ASP.Net 3.5 SP1 Dynamic Data
 
Aws simple db
Aws simple dbAws simple db
Aws simple db
 
VISUAL BASIC .net data accesss vii
VISUAL BASIC .net data accesss viiVISUAL BASIC .net data accesss vii
VISUAL BASIC .net data accesss vii
 
SparkSQL and Dataframe
SparkSQL and DataframeSparkSQL and Dataframe
SparkSQL and Dataframe
 

Similar a Introducing DataWave

Couchbas for dummies
Couchbas for dummiesCouchbas for dummies
Couchbas for dummies
Qureshi Tehmina
 
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaSolutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Guido Schmutz
 
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...
confluent
 
Running Databases on AWS
Running Databases on AWSRunning Databases on AWS
Running Databases on AWS
Amazon Web Services
 
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaSolutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Guido Schmutz
 
10.Local Database & LINQ
10.Local Database & LINQ10.Local Database & LINQ
10.Local Database & LINQ
Nguyen Tuan
 
Avro, la puissance du binaire, la souplesse du JSON
Avro, la puissance du binaire, la souplesse du JSONAvro, la puissance du binaire, la souplesse du JSON
Avro, la puissance du binaire, la souplesse du JSON
Alexandre Victoor
 
Apache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataApache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series data
Patrick McFadin
 
Owning time series with team apache Strata San Jose 2015
Owning time series with team apache   Strata San Jose 2015Owning time series with team apache   Strata San Jose 2015
Owning time series with team apache Strata San Jose 2015
Patrick McFadin
 
OSCON 2011 CouchApps
OSCON 2011 CouchAppsOSCON 2011 CouchApps
OSCON 2011 CouchApps
Bradley Holt
 
Local storage in Web apps
Local storage in Web appsLocal storage in Web apps
Local storage in Web apps
Ivano Malavolta
 
MongoDB World 2018: Keynote
MongoDB World 2018: KeynoteMongoDB World 2018: Keynote
MongoDB World 2018: Keynote
MongoDB
 
Cutting Edge Data Processing with PHP & XQuery
Cutting Edge Data Processing with PHP & XQueryCutting Edge Data Processing with PHP & XQuery
Cutting Edge Data Processing with PHP & XQuery
William Candillon
 
Querying Data Pipeline with AWS Athena
Querying Data Pipeline with AWS AthenaQuerying Data Pipeline with AWS Athena
Querying Data Pipeline with AWS Athena
Yaroslav Tkachenko
 
Brightstar DB
Brightstar DBBrightstar DB
Brightstar DB
Connected Data World
 
TechDays 2013 Jari Kallonen: What's New WebForms 4.5
TechDays 2013 Jari Kallonen: What's New WebForms 4.5TechDays 2013 Jari Kallonen: What's New WebForms 4.5
TechDays 2013 Jari Kallonen: What's New WebForms 4.5
Tieturi Oy
 
Local data storage for mobile apps
Local data storage for mobile appsLocal data storage for mobile apps
Local data storage for mobile apps
Ivano Malavolta
 
Be A Hero: Transforming GoPro Analytics Data Pipeline
Be A Hero: Transforming GoPro Analytics Data PipelineBe A Hero: Transforming GoPro Analytics Data Pipeline
Be A Hero: Transforming GoPro Analytics Data Pipeline
Chester Chen
 
What's new in GeoServer 2.2
What's new in GeoServer 2.2What's new in GeoServer 2.2
What's new in GeoServer 2.2
GeoSolutions
 
Hands On Spring Data
Hands On Spring DataHands On Spring Data
Hands On Spring Data
Eric Bottard
 

Similar a Introducing DataWave (20)

Couchbas for dummies
Couchbas for dummiesCouchbas for dummies
Couchbas for dummies
 
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaSolutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
 
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...
 
Running Databases on AWS
Running Databases on AWSRunning Databases on AWS
Running Databases on AWS
 
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaSolutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
 
10.Local Database & LINQ
10.Local Database & LINQ10.Local Database & LINQ
10.Local Database & LINQ
 
Avro, la puissance du binaire, la souplesse du JSON
Avro, la puissance du binaire, la souplesse du JSONAvro, la puissance du binaire, la souplesse du JSON
Avro, la puissance du binaire, la souplesse du JSON
 
Apache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataApache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series data
 
Owning time series with team apache Strata San Jose 2015
Owning time series with team apache   Strata San Jose 2015Owning time series with team apache   Strata San Jose 2015
Owning time series with team apache Strata San Jose 2015
 
OSCON 2011 CouchApps
OSCON 2011 CouchAppsOSCON 2011 CouchApps
OSCON 2011 CouchApps
 
Local storage in Web apps
Local storage in Web appsLocal storage in Web apps
Local storage in Web apps
 
MongoDB World 2018: Keynote
MongoDB World 2018: KeynoteMongoDB World 2018: Keynote
MongoDB World 2018: Keynote
 
Cutting Edge Data Processing with PHP & XQuery
Cutting Edge Data Processing with PHP & XQueryCutting Edge Data Processing with PHP & XQuery
Cutting Edge Data Processing with PHP & XQuery
 
Querying Data Pipeline with AWS Athena
Querying Data Pipeline with AWS AthenaQuerying Data Pipeline with AWS Athena
Querying Data Pipeline with AWS Athena
 
Brightstar DB
Brightstar DBBrightstar DB
Brightstar DB
 
TechDays 2013 Jari Kallonen: What's New WebForms 4.5
TechDays 2013 Jari Kallonen: What's New WebForms 4.5TechDays 2013 Jari Kallonen: What's New WebForms 4.5
TechDays 2013 Jari Kallonen: What's New WebForms 4.5
 
Local data storage for mobile apps
Local data storage for mobile appsLocal data storage for mobile apps
Local data storage for mobile apps
 
Be A Hero: Transforming GoPro Analytics Data Pipeline
Be A Hero: Transforming GoPro Analytics Data PipelineBe A Hero: Transforming GoPro Analytics Data Pipeline
Be A Hero: Transforming GoPro Analytics Data Pipeline
 
What's new in GeoServer 2.2
What's new in GeoServer 2.2What's new in GeoServer 2.2
What's new in GeoServer 2.2
 
Hands On Spring Data
Hands On Spring DataHands On Spring Data
Hands On Spring Data
 

Más de Data Works MD

Data Journalism at The Baltimore Banner
Data Journalism at The Baltimore BannerData Journalism at The Baltimore Banner
Data Journalism at The Baltimore Banner
Data Works MD
 
Jolt’s Picks - Machine Learning and Major League Baseball Hit Streaks
Jolt’s Picks - Machine Learning and Major League Baseball Hit StreaksJolt’s Picks - Machine Learning and Major League Baseball Hit Streaks
Jolt’s Picks - Machine Learning and Major League Baseball Hit Streaks
Data Works MD
 
Malware Detection, Enabled by Machine Learning
Malware Detection, Enabled by Machine LearningMalware Detection, Enabled by Machine Learning
Malware Detection, Enabled by Machine Learning
Data Works MD
 
Using AWS, Terraform, and Ansible to Automate Splunk at Scale
Using AWS, Terraform, and Ansible to Automate Splunk at ScaleUsing AWS, Terraform, and Ansible to Automate Splunk at Scale
Using AWS, Terraform, and Ansible to Automate Splunk at Scale
Data Works MD
 
A Day in the Life of a Data Journalist
A Day in the Life of a Data JournalistA Day in the Life of a Data Journalist
A Day in the Life of a Data Journalist
Data Works MD
 
Robotics and Machine Learning: Working with NVIDIA Jetson Kits
Robotics and Machine Learning: Working with NVIDIA Jetson KitsRobotics and Machine Learning: Working with NVIDIA Jetson Kits
Robotics and Machine Learning: Working with NVIDIA Jetson Kits
Data Works MD
 
Connect Data and Devices with Apache NiFi
Connect Data and Devices with Apache NiFiConnect Data and Devices with Apache NiFi
Connect Data and Devices with Apache NiFi
Data Works MD
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
Data Works MD
 
Data in the City: Analytics and Civic Data in Baltimore
Data in the City: Analytics and Civic Data in BaltimoreData in the City: Analytics and Civic Data in Baltimore
Data in the City: Analytics and Civic Data in Baltimore
Data Works MD
 
Exploring Correlation Between Sentiment of Environmental Tweets and the Stock...
Exploring Correlation Between Sentiment of Environmental Tweets and the Stock...Exploring Correlation Between Sentiment of Environmental Tweets and the Stock...
Exploring Correlation Between Sentiment of Environmental Tweets and the Stock...
Data Works MD
 
Automated Software Requirements Labeling
Automated Software Requirements LabelingAutomated Software Requirements Labeling
Automated Software Requirements Labeling
Data Works MD
 
Introduction to Elasticsearch for Business Intelligence and Application Insights
Introduction to Elasticsearch for Business Intelligence and Application InsightsIntroduction to Elasticsearch for Business Intelligence and Application Insights
Introduction to Elasticsearch for Business Intelligence and Application Insights
Data Works MD
 
An Asynchronous Distributed Deep Learning Based Intrusion Detection System fo...
An Asynchronous Distributed Deep Learning Based Intrusion Detection System fo...An Asynchronous Distributed Deep Learning Based Intrusion Detection System fo...
An Asynchronous Distributed Deep Learning Based Intrusion Detection System fo...
Data Works MD
 
RAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data ScienceRAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data Science
Data Works MD
 
Two Algorithms for Weakly Supervised Denoising of EEG Data
Two Algorithms for Weakly Supervised Denoising of EEG DataTwo Algorithms for Weakly Supervised Denoising of EEG Data
Two Algorithms for Weakly Supervised Denoising of EEG Data
Data Works MD
 
Detecting Lateral Movement with a Compute-Intense Graph Kernel
Detecting Lateral Movement with a Compute-Intense Graph KernelDetecting Lateral Movement with a Compute-Intense Graph Kernel
Detecting Lateral Movement with a Compute-Intense Graph Kernel
Data Works MD
 
Predictive Analytics and Neighborhood Health
Predictive Analytics and Neighborhood HealthPredictive Analytics and Neighborhood Health
Predictive Analytics and Neighborhood Health
Data Works MD
 
Social Network Analysis Workshop
Social Network Analysis WorkshopSocial Network Analysis Workshop
Social Network Analysis Workshop
Data Works MD
 

Más de Data Works MD (18)

Data Journalism at The Baltimore Banner
Introducing DataWave

  • 2. Agenda • Prerequisites • High Level Architecture • Key Concepts • Ingest • Query • Applications • FAQ
  • 4. Prerequisites • Hadoop (HDFS, YARN, MapReduce) • Accumulo (General Architecture, Iterators, Authorizations, Shell) • Zookeeper • Wildfly
  • 5. What is Datawave? Storage & Retrieval engine built on Apache Accumulo Providing: • Tunable ingest workflow • Table schema • Query API • Query parsing, planning, and execution • Support for a variety of data types and formats
  • 6. Datawave Overview • Storage & Retrieval engine built on Apache Accumulo • [Diagram: various structured and unstructured data formats (text, image, html, csv, ...); Ingest; Datawave; Query]
  • 7. Datawave Architecture • Many Data Sources / Multiple Hadoop Clusters / Multiple HDFS Instances / Many Web Front-ends • [Diagram labels: data sources (text, html, image, csv, ...); Dataflow (NiFi); HDFS; Datawave Flag Maker; Datawave Ingest Job; Datawave Bulk Loader; Datawave Ingest; YARN; MapReduce; Zookeeper; Accumulo; Datawave Iterators; Datawave Tables; warehouse; ingest; Wildfly; Datawave Web; Datawave Query; Datawave REST API; Ingress (HAProxy)]
  • 9. Data Model
      • Records
          • RawRecordContainer, Event, or Document
          • A collection of fields and content data
      • Fields
          • Can be multi-valued
          • Can be indexed, reverse indexed, excluded, or simply stored
          • Special fields: index-only, tokenized, and virtual fields
      [Diagram: Record, Field, Field Type, Normalizer]
      Mytype-config.xml
        <property>
          <name>mytype.data.category.index</name>
          <value>NAME,ID,ID,EXTERNALS_THETVDB,EXTERNALS_TVRAGE,EXTERNALS_IMDB,EMBEDDED_CAST_CHARACTER_NAME,EMBEDDED_CAST_PERSON_NAME,EMBEDDED_CAST_PERSON_ID,GENRES</value>
        </property>
        <property>
          <name>mytype.data.category.index.reverse</name>
          <value>NAME,NETWORK_NAME,OFFICIALSITE,URL</value>
        </property>
        …
  • 10. Normalizers
      • Field Types & Normalizers
      • Raw field data is converted into index entries using a Normalizer.
      • Values are normalized in the index tables, but not in the shard
      [Diagram: Record, Field, Field Type, Normalizer (LcNoDiacritics, NumberType, …)]
      Examples: BaNaNa -> banana; 10 -> +bE1; 27 -> +bE2.7; 100 -> +cE1
      Mytype-config.xml
        <property>
          <name>mytype.data.default.type.class</name>
          <value>datawave.data.type.LcNoDiacriticsType</value>
        </property>
        <property>
          <name>mytype.START.data.field.type.class</name>
          <value>datawave.data.type.DateType</value>
        </property>
        <property>
          <name>mytype.WEIGHT.data.field.type.class</name>
          <value>datawave.data.type.NumberType</value>
        </property>
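The normalizer examples on this slide can be reproduced with a short Python sketch. It is not the DataWave implementation: the function names are hypothetical, and the number encoding rule (a leading `+`, a letter for the power of ten starting at `a` for 10^0, then `E` and the mantissa) is an assumption inferred only from the three examples shown, so it covers numbers of 1 or more and nothing else.

```python
import unicodedata
from decimal import Decimal


def lc_no_diacritics(value: str) -> str:
    """Lowercase the value and strip combining accent marks."""
    decomposed = unicodedata.normalize("NFD", value)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def encode_positive_number(value: str) -> str:
    """Encode a number >= 1 as '+<exponent letter>E<mantissa>'.

    Assumption: 'a' stands for 10^0, 'b' for 10^1, and so on. This rule is
    inferred from the slide's examples and is not taken from DataWave code.
    """
    number = Decimal(value)
    if number < 1:
        raise ValueError("this sketch only covers numbers >= 1")
    exponent = number.adjusted()  # position of the leading digit
    if exponent > 25:
        raise ValueError("exponent does not fit in a single letter")
    mantissa = number.scaleb(-exponent).normalize()
    return "+" + chr(ord("a") + exponent) + "E" + format(mantissa, "f")


if __name__ == "__main__":
    print(lc_no_diacritics("BaNaNa"))
    for raw in ("10", "27", "100"):
        print(raw, encode_positive_number(raw))
```

For these three inputs the encoded strings sort in the same order as the numbers themselves, which is what makes a range lookup over a lexicographically sorted index possible.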
  • 11. Tables, Indexes & Shards • Overall Structure • Data is partitioned into date-based shards • Each Shard is contained in a single tablet • Each Shard has its own Field & Term Index, Data & Record Storage • Certain Field values tracked in Edges Shard Table Global Index Table Shard Field Index Record Storage Data Storage Term Index Edge Table Shard Shard Shard Meta Data Table … …
  • 12. Shard Table Dates • LOAD_DATE • ACTIVITY_DATE • EVENT_DATE • … Mytype-config.xml <property> <name>mytype.data.category.date</name> <value>EVENT_DATE</value> </property> YYYYMMDD_id
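As a rough illustration of the `YYYYMMDD_id` row format, the Python sketch below derives a shard id from a record's date and a partition number. The slide does not say how the `id` part is chosen; hashing the record UID modulo a shard count is an assumption made here, and `num_shards`, `shard_id`, and the use of CRC32 are all hypothetical.

```python
import zlib
from datetime import date


def shard_id(record_date: date, uid: str, num_shards: int) -> str:
    """Build a 'YYYYMMDD_id' row id.

    Assumption: the partition is a stable hash of the UID modulo the number
    of shards per day. CRC32 is only a stand-in for whatever hash is used.
    """
    partition = zlib.crc32(uid.encode("utf-8")) % num_shards
    return f"{record_date:%Y%m%d}_{partition}"


if __name__ == "__main__":
    # The date would come from the field named in mytype.data.category.date.
    print(shard_id(date(2021, 7, 24), "doc1", num_shards=3))
```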
  • 14. Shard
      • Field Index: index of fields within a shard
      • Record: all fields and values for an Event
      • Term Index: index of terms within a record
      • Document Data: all content for an object
      Key layout (\0 denotes the null-byte separator):
      Entry         | RowId    | ColumnFamily     | ColumnQualifier                     | Value
      Record        | Shard id | Datatype\0DocUID | FieldName\0Value                    |
      Field Index   | Shard id | ‘fi’\0FieldName  | Value\0datatype\0DocUID             |
      Term Index    | Shard id | ‘tf’             | Datatype\0DocUId\0Value\0FieldName  | Protocol Buffer (Offsets, stats)
      Document Data | Shard id | ‘d’              | Datatype\0DocUId\0ContentName       | Document Contents (GZIP and base64 encoded)
  • 15. Shard Index and Shard Reverse Index
      Table             | RowId                   | Column Family         | Column Qualifier  | Value
      shardIndex        | Normalized Field Value  | Normalized Field Name | ShardID\0DataType | Doc Uid.List or Uid.List.size
      shardReverseIndex | eulaV dleiF dezilamroN  | Normalized Field Name | ShardID\0DataType | Doc Uid.List or Uid.List.size
      (In shardReverseIndex the RowId is the normalized field value written in reverse.)
  • 16. Example query: FOOD == ‘Banana’ AND CAR_MAKE =~ ‘.*i’
      shardIndex
        …
        banana FOOD : 20210724_0\0favorites doc1
        banana FOOD : 20210724_1\0favorites doc2
        …
      shardReverseIndex
        …
        inihgrobmal CAR_MAKE : 20210724_0\0favorites doc1
        irarref CAR_MAKE : 20210724_1\0favorites doc2
        ittagub CAR_MAKE : 20210724_2\0favorites doc3,doc4,doc5
        …
      Shard Table
        20210724_0 favorites\0doc1 : CAR_MAKE.0\0Lamborghini
        20210724_0 favorites\0doc1 : COLOR.0\0Blue
        20210724_0 favorites\0doc1 : FOOD.0\0Banana
        20210724_0 favorites\0doc1 : NAME.0\0John Smith
        20210724_1 favorites\0doc2 : CAR_MAKE.0\0Ferrari
        20210724_1 favorites\0doc2 : COLOR.0\0Red
        20210724_1 favorites\0doc2 : FOOD.0\0Banana
        20210724_1 favorites\0doc2 : NAME.0\0Jane Doe
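The lookup on this slide can be walked through with a toy in-memory model. The sketch below is an illustration only: plain Python dictionaries stand in for the Accumulo tables, the function names are hypothetical, and a linear scan replaces the sorted range scan a real index would use. It shows how the equality term is answered from shardIndex, how a leading-wildcard regex becomes a prefix match on reversed terms in shardReverseIndex, and how the two hit sets are intersected.

```python
# (normalized term, field) -> {(shard id, datatype): [doc uids]}
SHARD_INDEX = {
    ("banana", "FOOD"): {
        ("20210724_0", "favorites"): ["doc1"],
        ("20210724_1", "favorites"): ["doc2"],
    },
}

# Same layout, but every term is stored reversed.
SHARD_REVERSE_INDEX = {
    ("inihgrobmal", "CAR_MAKE"): {("20210724_0", "favorites"): ["doc1"]},
    ("irarref", "CAR_MAKE"): {("20210724_1", "favorites"): ["doc2"]},
    ("ittagub", "CAR_MAKE"): {("20210724_2", "favorites"): ["doc3", "doc4", "doc5"]},
}


def flatten(hits):
    """Turn {(shard, datatype): [uids]} into a set of (shard, datatype, uid)."""
    return {(shard, dtype, uid) for (shard, dtype), uids in hits.items() for uid in uids}


def lookup_equals(field, value):
    """FIELD == 'value': exact match on the normalized (lowercased) term."""
    return flatten(SHARD_INDEX.get((value.lower(), field), {}))


def lookup_leading_wildcard(field, suffix):
    """FIELD =~ '.*suffix': prefix match on the reversed term."""
    prefix = suffix.lower()[::-1]
    results = set()
    for (term, indexed_field), hits in SHARD_REVERSE_INDEX.items():
        if indexed_field == field and term.startswith(prefix):
            results |= flatten(hits)
    return results


if __name__ == "__main__":
    matches = lookup_equals("FOOD", "Banana") & lookup_leading_wildcard("CAR_MAKE", "i")
    for shard, dtype, uid in sorted(matches):
        print(shard, dtype, uid)
```

Only doc1 and doc2 survive the intersection; doc3, doc4, and doc5 match the CAR_MAKE pattern but have no FOOD entry for banana in the index.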
  • 17. Edge
      • Relationship between field values in a record
      • Allows all records with the same field values to be grouped together
      • Supports iterative graph building
      Use                      | rowID            | Column Family                             | Column Qualifier        | Value
      Edge Event Count per Day | EntityA\0EntityB | edgeType/relationshipA-relationshipB      | yyyyMMdd/edgeAttributes | Protocol Buffer containing count for day as Long and int32 bitmask for hour of day
      Entity Activity per Hour | EntityA          | STATS/ACTIVITY/edgeType/edgeRelationship  | yyyyMMdd/edgeAttributes | Protocol Buffer containing count per hour as Long[24]
  • 19. Edges … kevin bacon0emily tremaine TV_COSTARS/PERSON-PERSON : 20210724/TVMAZE_METADATA-TVMAZE_METADATA/tremors/29754/A kevin bacon0valentine mckee TV_CHARACTERS/PERSON-CHARACTER : 20210724/TVMAZE_METADATA-TVMAZE_METADATA/tremors/29754/A … Edge-definitions.xml <bean id="myjson" class="datawave.ingest.mapreduce.handler.edge.define.EdgeDefinitionConfigurationHelper" scope="prototype"> … <property name="edges"> <list> <!-- Create bidirectional edges for each pair of costars in a tv show --> <bean class="datawave.ingest.mapreduce.handler.edge.define.EdgeDefinition"> <property name="edgeType" value="TV_COSTARS"/> <property name="direction" value="bi"/> <property name="AllPairs"> <list> <bean class="datawave.ingest.mapreduce.handler.edge.define.EdgeNode"> <property name="selector" value="EMBEDDED_CAST_PERSON_NAME.EMBEDDED_0.CAST_0.PERSON_0.NAME_0"/> <property name="relationship" value="PERSON"/> <property name="collection" value="TVMAZE_METADATA"/> </bean> <bean class="datawave.ingest.mapreduce.handler.edge.define.EdgeNode"> <property name="selector" value="EMBEDDED_CAST_PERSON_NAME.EMBEDDED_0.CAST_1.PERSON_0.NAME_0"/> <property name="relationship" value="PERSON"/> <property name="collection" value="TVMAZE_METADATA"/> </bean> …
  • 20. Datawave Metadata
      • Tracks field characteristics and frequencies per datatype
      Use                                                                | rowID      | ColumnFamily | Column Qualifier               | Value
      Event Metadata: datatypes this field has been ingested for         | Field name | ‘e’          | DataType                       | NULL
      Index Metadata: datatypes this field is indexed for                | Field name | ‘i’          | DataType                       | NULL
      Reverse Index Metadata: datatypes this field is reverse indexed for | Field name | ‘ri’         | DataType                       | NULL
      Type Metadata: class used to normalize field                       | Field name | ‘t’          | DataType\0DatatypeClassName    | NULL
      Event Column Frequency: count per field per type per day           | Field name | ‘f’          | DataType\0YYYYMMDD (event date) | Count
      Description                                                        | Field name | ‘desc’       | DataType                       | Text description of field
  • 21. Datawave Metadata Continued
      Use                 | rowID                                | ColumnFamily | Column Qualifier               | Value
      Query Model Mapping | Field name                           | Model Name   | Field Name\0Mapping Direction  | NULL
      Edge metadata       | edgeType/relationshipA-relationshipB | ‘edge’       | infoA-infoB/edgeAttributes     | Protocol Buffer containing the list of field names that contributed to this edge type with counts
  • 22. Query
  • 23. Datawave Query
      • JEXL or Lucene query syntax with functions
      • Execution shaped by data characteristics
      • Iterators to perform low-level functions (e.g., intersection)
      • REST API endpoints
          • POST Query/Logic/create: parses, validates, optimizes query; considers field cardinality, index, ...
          • GET Query/id/next: returns pages of results
          • POST Query/id/close: frees up resources
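The create/next/close lifecycle can be sketched as a sequence of requests. The Python below only builds the request descriptions and sends nothing. The base URL, the `query` form field name, the sample query id, and the helper names are hypothetical; only the three path shapes and HTTP methods come from the slide.

```python
from urllib.parse import quote

# Hypothetical host; substitute the address of a real deployment.
BASE_URL = "https://datawave.example.com"


def create_request(logic: str, query: str):
    """POST Query/{logic}/create: submit a query for parsing and planning."""
    url = f"{BASE_URL}/Query/{quote(logic)}/create"
    return ("POST", url, {"query": query})  # form field name is an assumption


def next_request(query_id: str):
    """GET Query/{id}/next: fetch the next page of results."""
    return ("GET", f"{BASE_URL}/Query/{quote(query_id)}/next", None)


def close_request(query_id: str):
    """POST Query/{id}/close: release server-side resources."""
    return ("POST", f"{BASE_URL}/Query/{quote(query_id)}/close", None)


def lifecycle(logic: str, query: str, query_id: str, pages: int):
    """List the requests a client would issue for one query, in order."""
    calls = [create_request(logic, query)]
    calls += [next_request(query_id) for _ in range(pages)]
    calls.append(close_request(query_id))
    return calls


if __name__ == "__main__":
    for method, url, form in lifecycle("EventQuery", "FOOD == 'banana'", "abc-123", pages=2):
        print(method, url, form)
```

In a real client the query id would be read from the create response rather than passed in, and paging would stop when the server reports no more results.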
  • 24. Grouping Contexts: {FIELD}.{GROUPING_CONTEXT}
      • Example Query: ITEM == ‘banana’ AND COST == 500
      Record 1 (RowId | ColumnFamily | ColumnQualifier)
        20210724_0 | purchase\0doc1 | ITEM.0\0BANANA
        20210724_0 | purchase\0doc1 | ITEM.1\0MATTRESS
        20210724_0 | purchase\0doc1 | ITEM.2\0MITERSAW
        20210724_0 | purchase\0doc1 | COST.0\00.69
        20210724_0 | purchase\0doc1 | COST.1\0500
        20210724_0 | purchase\0doc1 | COST.2\0249.99
        20210724_0 | purchase\0doc1 | STORE.0\0NILE
        20210724_0 | purchase\0doc1 | TRANSACTION_ID.0\01234
  • 25. Grouping Contexts: {FIELD}.{GROUPING_CONTEXT}
      • Example Query: ITEM == ‘banana’ AND COST == 500
      • ITEM == ‘banana’ AND #MATCHES_IN_GROUP(ITEM, ‘banana’, COST, 500)
      Record 1: same entries as on the previous slide. BANANA is in group 0 and the cost of 500 is in group 1, so the plain query matches this record while the grouped query does not.
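The difference between the two queries can be checked with a small Python sketch over the record from the slide. This is a simplified model, not DataWave's evaluation logic: each entry is a (field, grouping context, value) tuple, values are compared as lowercased strings, and the function names are hypothetical.

```python
# (field, grouping context, value) entries for Record 1.
RECORD = [
    ("ITEM", 0, "BANANA"),
    ("ITEM", 1, "MATTRESS"),
    ("ITEM", 2, "MITERSAW"),
    ("COST", 0, "0.69"),
    ("COST", 1, "500"),
    ("COST", 2, "249.99"),
    ("STORE", 0, "NILE"),
    ("TRANSACTION_ID", 0, "1234"),
]


def plain_and(record, conditions):
    """Every (field, value) condition matches somewhere in the record."""
    return all(
        any(f == field and v.lower() == wanted for f, _, v in record)
        for field, wanted in conditions
    )


def matches_in_group(record, conditions):
    """Every condition matches within one shared grouping context."""
    groups_per_condition = [
        {group for f, group, v in record if f == field and v.lower() == wanted}
        for field, wanted in conditions
    ]
    return bool(set.intersection(*groups_per_condition))


if __name__ == "__main__":
    query = [("ITEM", "banana"), ("COST", "500")]
    print("plain AND:", plain_and(RECORD, query))
    print("matches in group:", matches_in_group(RECORD, query))
    print("banana at 0.69:", matches_in_group(RECORD, [("ITEM", "banana"), ("COST", "0.69")]))
```

The plain AND is satisfied because a banana and a cost of 500 both appear somewhere in the record; the grouped check fails because they sit in different grouping contexts.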
  • 26. Query Functions
      • MATCHES_IN_GROUP
      • INCLUDE(field, regex)
      • EXCLUDE(field, regex)
      • OCCURRENCE(field, operator, count)
      • …
      GeoWave
      • INTERSECTS_BOUNDING_BOX(field, westLon, eastLon, southLat, northLat)
      • INTERSECTS_RADIUS_KM(field, centerLon, centerLat, radiusKm)
      • …
  • 27. Query Logics • EventQuery - record retrieval • LookupUUID - find records given an ID • EdgeQuery - find records given edge members and attributes • DiscoveryQuery - record counts by attribute • MetricsQuery - find query metrics Query/Logic/create Query/next Query/close
  • 28. Query Models
      • Search many similar fields with a single term
      • FOOD: FRUIT OR VEGETABLE OR MEAT OR GRAIN OR DAIRY
      • IDS: ACCOUNT_ID OR ACCOUNT_NUMBER OR TRANSACTION_ID OR … OR REPORT_ID OR STATEMENT_NUMBER
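A query model can be pictured as a lookup from one model field to the underlying fields it stands for. The Python sketch below expands a single term that way. It uses only the FOOD mapping, since the IDS list is elided on the slide. The dictionary layout and the `expand_term` helper are hypothetical and do not reflect how DataWave stores or applies models internally.

```python
# Model field -> underlying fields, taken from the FOOD example.
QUERY_MODEL = {
    "FOOD": ["FRUIT", "VEGETABLE", "MEAT", "GRAIN", "DAIRY"],
}


def expand_term(field: str, value: str) -> str:
    """Rewrite FIELD == 'value' against every field the model maps it to."""
    targets = QUERY_MODEL.get(field, [field])  # unmapped fields pass through
    clauses = [f"{target} == '{value}'" for target in targets]
    if len(clauses) == 1:
        return clauses[0]
    return "(" + " OR ".join(clauses) + ")"


if __name__ == "__main__":
    print(expand_term("FOOD", "banana"))
    print(expand_term("COLOR", "red"))
```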
  • 30. Applications
      • Healthcare
          • Symptom correlation
          • Outbreak tracking
      • Purchase Trends
      • Network Security
      • …
      • Any data set/feed you can translate into FIELD/VALUE pairs
      Example edges:
        myocarditis\0bacterial infection SYMPTOMS / SYMPTOM-DIAGNOSIS : 20210724/UCI-HEART-DISEASE
        myocarditis\0covid-19 SYMPTOMS / SYMPTOM-DIAGNOSIS : 20210724/COVID-WHO
        myocarditis\0fever SYMPTOMS / SYMPTOM-SYMPTOM : 20210724/COVID-WHO
  • 33. Query Webserver QueryExpiration Bean QueryExecutor Bean CachedResultsBean QueryLogic Pushdown Scheduler QueryPlanner Ingress (HAProxy) Load Balancer assigns a query to a webserver Tserver Query Execution QueryTransformer Results Shard Table Global Index Table Edge Table
  • 34. Ingest Classes InputFormat RecordReader EventMapper DataTypeHandler IngestHelper Raw file Record1 Record2 … split {k, RawRecordContainer} getHelper getEventFields process Mutations/ Bulk Ingest Keys RRC F/V Multimap
  • 35. EventMapper Anatomy: Raw Records to K,V Pairs