Retrospection / prospection and schema
TAGOMORI Satoshi (@tagomoris)
LINE Corp.
2014/01/31 (Fri) at University of Tsukuba
the 1st half

Friday, January 31, 2014
TAGOMORI Satoshi (@tagomoris)
LINE Corp.
Development Support Team
Logs
Service metrics (Users, PageViews, ...)
UX/UI metrics (Access path, Taps/views, ...)
Monitoring metrics (Traffic Gbps, TBytes/day, ...)
System monitoring (Error rates, Response time, ...)

Software for Logging
Collection: Fluentd, Scribe, Flume, Logstash, ...
Storage: RDBMS, Hadoop HDFS, NoSQL stores, Elasticsearch, ...
Processing: SQL, Hadoop MapReduce (Hive), Presto, Impala, ...
Stream processing: Storm, Kafka, Norikra, ...
Visualization: Kibana, Tableau, FnordMetric, GrowthForecast, Focuslight, ...
Appliances: DWH + BI tools
Services: Google BigQuery, Treasure Data, ...

How to inspect logs
Retrospection (reactive search)
Store data first, then search it
Prospection (proactive search)
Define what should be processed first, then store data
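A toy sketch of the two approaches, assuming events are plain dicts; the names `retrospect`, `Prospector` and `is_error` are hypothetical and not taken from any real engine. Retrospection keeps every event and answers the question later; prospection registers the question first and updates its answer as each event passes by.

```python
from typing import Callable, Dict, List

Event = Dict[str, object]


def retrospect(events: List[Event], predicate: Callable[[Event], bool]) -> int:
    # Retrospection: every event is stored first; the question is asked
    # afterwards by scanning the stored data.
    store = list(events)
    return sum(1 for e in store if predicate(e))


class Prospector:
    # Prospection: the question (here, a running count) is registered before
    # any data arrives; each event updates the answer and is then dropped.
    def __init__(self, predicate: Callable[[Event], bool]) -> None:
        self.predicate = predicate
        self.count = 0

    def push(self, event: Event) -> None:
        if self.predicate(event):
            self.count += 1


def is_error(e: Event) -> bool:
    return int(e["status"]) >= 500


events = [{"status": 200}, {"status": 500}, {"status": 503}, {"status": 200}]

print(retrospect(events, is_error))  # 2

p = Prospector(is_error)
for e in events:
    p.push(e)
print(p.count)  # 2
```

Both give the same answer; the difference is what must be kept around. The retrospective version needs the whole event store, while the prospective one needs only a counter, but it can only answer the question it was given in advance.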

What kinds of logs are inspected
Schema-full data:
strict schema: predefined fields with types (anything else is rejected)
schema on read: read only the known fields (others are ignored)
Schema-less data:
any fields (or ignored), any types (implicit/explicit conversion)
fits services under active development (that is, all internet services!)
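A minimal sketch of how the three styles treat the same record, assuming a hypothetical two-field schema (`user_id`, `path`); the function names are illustrative only and do not correspond to any particular database.

```python
from typing import Any, Dict, Optional

# Hypothetical predefined schema: field name -> required type.
SCHEMA: Dict[str, type] = {"user_id": int, "path": str}


def strict_insert(record: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    # Strict schema: exactly the predefined fields, each with its declared
    # type; anything else is rejected (None) at write time.
    if set(record) != set(SCHEMA):
        return None
    if not all(isinstance(record[k], t) for k, t in SCHEMA.items()):
        return None
    return record


def read_with_schema(record: Dict[str, Any]) -> Dict[str, Any]:
    # Schema on read: the record was stored as-is; the reader picks out the
    # known fields, ignores the rest, and gets None for missing ones.
    return {k: record.get(k) for k in SCHEMA}


def schemaless_get(record: Dict[str, Any], field: str, typ: type) -> Any:
    # Schema-less: any field with any type may appear; the consumer converts
    # explicitly to the type it needs, or gets None if the field is absent.
    value = record.get(field)
    return None if value is None else typ(value)


rec = {"user_id": "42", "path": "/top", "ua": "curl"}
print(strict_insert(rec))                   # None: extra field, and user_id is a str
print(read_with_schema(rec))                # {'user_id': '42', 'path': '/top'}
print(schemaless_get(rec, "user_id", int))  # 42
```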
How/what
Retrospect, schema-full: RDBMS, Hive, BigQuery, Cassandra, HBase, ...
Retrospect, schema-less: MongoDB, Hive (SerDe), TD, plain text files, ...
Prospect, schema-full: Esper, many other stream CEP engines, ...
Prospect, schema-less: Norikra, ...
Data size: schema & index
Logs: size always matters (x TB to x PB)
Schema:
size optimization
access optimization on memory/disk
Index:
access optimization on memory/disk
more memory/disk required
hard to distribute
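To make the schema-size point concrete, a sketch comparing one hypothetical log record encoded as JSON (field names repeated in every record) against a fixed-width binary row whose layout comes from a schema. The field names and integer widths are assumptions chosen for illustration.

```python
import json
import struct

# Hypothetical access-log record; field names and types are illustrative only.
record = {"user_id": 123456, "status": 200, "bytes": 5120, "elapsed_ms": 37}

# Schema-less encoding: every record carries its own field names, and numbers
# are written as text.
as_json = json.dumps(record, separators=(",", ":")).encode("utf-8")

# Schema-full encoding: names and types live in the schema, so each record is
# just packed values. "<IHIH" = little-endian uint32, uint16, uint32, uint16,
# with no padding.
ROW = struct.Struct("<IHIH")
as_row = ROW.pack(record["user_id"], record["status"],
                  record["bytes"], record["elapsed_ms"])

print(len(as_json), len(as_row))
print(ROW.unpack(as_row))  # (123456, 200, 5120, 37)
```

The packed row is 12 bytes, several times smaller than the compact JSON, and a reader can jump to the n-th row at offset `n * ROW.size`, which is the "access optimization" side of the same trade-off. Real columnar formats go further with per-column encoding and compression, which this sketch does not attempt.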

Query response improvements of retrospection
Schema-full + indexed (RDBMS)
Query plan optimization
Schema on read
I/O and task size optimization & scale-out
Schema-less + indexed (MongoDB)
mmap-ed index & data (!)

Query response improvements of prospection

Time window + incremental calculation
Stream processing engines
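A sketch of "time window + incremental calculation": a tumbling (fixed, non-overlapping) window that counts events per key, assuming events arrive in timestamp order as hypothetical `(timestamp, key)` pairs. Each event updates a counter once; nothing is rescanned, and only the current window's counters stay in memory.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Optional, Tuple

Event = Tuple[int, str]  # (unix timestamp in seconds, key); hypothetical shape


def tumbling_counts(events: Iterable[Event],
                    window_sec: int) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Yield (window_start, counts_per_key) for each non-empty window.

    Assumes events are ordered by timestamp. Raw events are discarded after
    being counted; only the open window's counters are held in memory.
    """
    current_start: Optional[int] = None
    counts: Counter = Counter()
    for ts, key in events:
        start = ts - ts % window_sec
        if current_start is not None and start != current_start:
            yield current_start, dict(counts)  # window closed: emit result
            counts = Counter()
        current_start = start
        counts[key] += 1  # incremental update, no rescan of past events
    if current_start is not None:
        yield current_start, dict(counts)  # flush the last open window


stream = [(0, "a"), (10, "b"), (59, "a"), (60, "a"), (125, "b")]
for start, counts in tumbling_counts(stream, 60):
    print(start, counts)
# 0 {'a': 2, 'b': 1}
# 60 {'a': 1}
# 120 {'b': 1}
```

Sliding windows and late or out-of-order events need more bookkeeping than shown here.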

Stream processing and data size
No disks: fewer points of failure
Less memory, only enough for:
processing and I/O buffers
aggregation results
Easy to distribute:
stream duplication
stream splitting by aggregation key
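A sketch of stream splitting by aggregation key, assuming a hypothetical list of `(key, value)` events and in-process "workers" standing in for separate nodes. Because a given key always hashes to the same worker, each worker's partial sums are already complete for its keys, so merging is a plain union. Stream duplication (sending a full copy of the stream to each consumer) is not shown.

```python
import zlib
from collections import Counter
from typing import Dict, List, Tuple


def partition(key: str, n_workers: int) -> int:
    # Stable hash (crc32) so the same key always goes to the same worker;
    # Python's built-in hash() of str is randomized per process.
    return zlib.crc32(key.encode("utf-8")) % n_workers


def split_and_aggregate(events: List[Tuple[str, int]],
                        n_workers: int) -> Dict[str, int]:
    workers: List[Counter] = [Counter() for _ in range(n_workers)]
    for key, value in events:
        # Each worker only ever sees (and sums) its own subset of keys.
        workers[partition(key, n_workers)][key] += value
    merged: Dict[str, int] = {}
    for w in workers:
        # Keys never overlap between workers, so no re-aggregation is needed.
        merged.update(w)
    return merged


events = [("jp", 3), ("us", 1), ("jp", 2), ("kr", 5), ("us", 4)]
print(split_and_aggregate(events, 3))  # sums per key: jp=5, us=5, kr=5
```

The result does not depend on the number of workers, which is what makes this kind of aggregation easy to scale out.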

Stream processing and schema
Stream processing: query -> data (queries are defined before data arrives)
Prospective schema by queries:
Queries know the required fields and their types
Unused fields can be ignored
Implicit type conversion is available
Schema-less data + schema-full queries
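A sketch of a schema derived from a query and applied to schema-less events. `QUERY_FIELDS` and `project` are hypothetical stand-ins and do not reflect Norikra's actual API; the point is only that the query alone decides which fields are read and which types they are converted to.

```python
from typing import Dict, Optional

# A "query" declares the fields it needs and their types (hypothetical).
QUERY_FIELDS: Dict[str, type] = {"path": str, "status": int, "elapsed_ms": float}


def project(event: Dict[str, object],
            fields: Dict[str, type]) -> Optional[Dict[str, object]]:
    """Fit a schema-less event to the query's schema.

    Unused fields are dropped, values are converted to the declared type
    (e.g. "500" -> 500), and events missing a required field or holding an
    unconvertible value are skipped (None).
    """
    out: Dict[str, object] = {}
    for name, typ in fields.items():
        if name not in event:
            return None
        try:
            out[name] = typ(event[name])
        except (TypeError, ValueError):
            return None
    return out


events = [
    {"path": "/top", "status": "500", "elapsed_ms": 12, "ua": "curl"},
    {"path": "/api", "status": 200},                     # no elapsed_ms: skipped
    {"path": "/img", "status": "n/a", "elapsed_ms": 3},  # status not an int: skipped
]
matched = [p for p in (project(e, QUERY_FIELDS) for e in events) if p is not None]
print(matched)  # [{'path': '/top', 'status': 500, 'elapsed_ms': 12.0}]
```

The data side never declares a schema; adding a new field to the events changes nothing for existing queries, and a new query simply declares the fields it needs.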

My goal:
Schema-less data stream
+ schema-full queries

It’s Norikra!

