Building a Near Real time
“Logs Search Engine & Analytics”
using Solr
Lucene/Solr Revolution 2013
May 1st, 2013
Rahul Jain
jainr@ivycomptech.com
Who am I?
• Software Engineer
• Member of Core Technology @ IVY Comptech, Hyderabad, India
• 6 years of programming experience
• Areas of expertise/interest
– High-traffic web applications
– Java/J2EE
– Big data, NoSQL
– Information retrieval, machine learning
Agenda
• Overview
• Indexing
• Search
• Analytics
• Architecture
• Lessons learned
• Q&A
Overview
Issues keep coming in “Production”
java.net.ConnectException:
Connection refused
ServerNotRunningException
Too many open files
DBException
NullPointerException
OutOfMemory
Issues
• Hidden bugs
• DB is down
• Server crashed
• OutOfMemory
• Connection reset
• Nodes go out of cluster (due to long GC pauses)
• Attack
– DoS (Denial of Service) by sending a lot of requests in a short time frame
Why Logs Search?
• Enables the production support team to immediately check for issues in “one place”
– Saves the time spent logging on to multiple servers to check the logs
• Debugging production issues
– Is the issue specific to one server, or is it occurring on all servers for that application?
• Allows tracking user activity across multiple servers/applications
• Correlation of multiple issues with each other
– e.g. logins might be failing on node X due to an OutOfMemory on node Y
Key Problems
• Hundreds of servers/services generating logs
• Terabytes of unstructured logs per day to index in near real time
• Millions of log events (priority one)
• Full-text search and storage of log content
• High indexing rate of 1 GB/min
• Search latency in seconds is acceptable
Logs are different
• Varying size
– From a few bytes to several KBs
– Hence a larger number of documents
• On average, 6-8 million log messages per 1 GB of logs
– Each line forms one log message, except for an exception stack trace
• Different types
– Exception stack traces
– Application logs
– HTTP access/error logs
– GC logs
• The logging format is not uniform across all logs
Indexing
Improving Indexing Performance
• Solr in Embedded Mode
• Bypassing XML Marshalling/Unmarshalling
• Moving to an Async Approach
• Route traffic to the Alternate Shard once “Commit” starts on the Main Shard
• Other optimizations
– Add document does update (add + delete)
– Changing buffer size in BufferedIndexInput and BufferedIndexOutput
– Reusing Lucene document object
Old Architecture
[Diagram: Production Server → Logs Transfer → Centralized Log Collection Server → Solr Servers (×2) → Search UI]
Old Architecture (Cont’d)
[Diagram: the same flow (Production Server → Logs Transfer → Centralized Log Collection Server → Solr Servers → Search UI), annotated with Data Copy 1 and Data Copy 2, plus a direct path: Production Server → Direct Logs transfer → Indexing Servers (×3)]
Open question:
Since the indexing system is now exposed to production servers, what happens if a new Indexing Server is added on the fly, or one of them goes down?
Solr in Embedded Mode
[Diagram: on the Indexing Server, the Application and Solr run in a single JVM; the Application talks to Solr through SolrJ (EmbeddedSolrServer)]
• No network latency
Message Flow
[Diagram, single JVM: SolrInputDocument → XML marshalling (UpdateRequest) → XML → XML unmarshalling (XMLLoader) → SolrInputDocument (new object)]
<add>
<doc>
<field> </field>
<field> </field>
</doc>
</add>
Bypassing XML Marshalling/Unmarshalling
[Diagram, single JVM: the XML marshalling (UpdateRequest) and XML unmarshalling (XMLLoader) steps are bypassed; the SolrInputDocument arrives as a referenced object]
Passing the direct reference of the SolrInputDocument object:
• DocContentStream#getSolrInputDocuments()
• RefDocumentLoader#load()
• DocUpdateRequest#add(List<SolrInputDocument>)
• LMEmbeddedSolrServer#add(List<SolrInputDocument>)
Old Architecture (Sync)
[Diagram: Incoming Message (unstructured) → Log Event (structured) → SolrInputDocument → Batch (10K) → add → Solr → UpdateResponse; a thread pool with multiple threads, each waiting for the response]
Once the batch size reaches 10K, one of the threads adds the documents to Solr as a synchronous call and waits for the response.
Time taken:
– Indexing 1 chunk (10K documents) takes anywhere between 400 ms and 3000 ms#
– During a commit it takes 6000 ms to 23000 ms, or even more
– 1 GB contains around 600 chunks
– So most of the time is spent just waiting for the response
# Indexing time varies based on several factors, e.g. hardware configuration, application type, nature of data, number of indexed/stored fields, analyzer type, etc.
Moving to an Asynchronous Architecture
[Diagram: Incoming Message → Log Message Transformation (Analyzer Thread Pool) → Log Event → add a batch of Log Events to the Event Pipeline (BlockingQueue) → remove a batch of Log Events from the pipeline → Log Event to SolrInputDocument (Indexer Thread Pool) → Solr]
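The hand-off between the analyzer and indexer thread pools can be sketched roughly as follows. This is a minimal illustration and not the code from the talk: the class name EventPipeline, the use of String as the event type, and the capacity, batch size and timeout parameters are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical event pipeline sitting between analyzer and indexer threads. */
final class EventPipeline {
    private final BlockingQueue<String> queue;

    EventPipeline(int capacity) {
        // A bounded queue gives back-pressure: producers block when indexing falls behind.
        this.queue = new LinkedBlockingQueue<>(capacity);
    }

    /** Analyzer side: hand over one structured log event. Returns false if interrupted. */
    boolean put(String logEvent) {
        try {
            queue.put(logEvent);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Indexer side: wait up to timeoutMs for a first event, then drain up to maxBatch events. */
    List<String> takeBatch(int maxBatch, long timeoutMs) {
        List<String> batch = new ArrayList<>();
        try {
            String first = queue.poll(timeoutMs, TimeUnit.MILLISECONDS);
            if (first != null) {
                batch.add(first);
                queue.drainTo(batch, maxBatch - 1);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return batch;
    }
}
```

In such a setup, analyzer threads would call put() as soon as a message is transformed, while indexer threads loop on takeBatch() and add each batch to Solr, so neither side sits idle waiting for the other's response.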
Commit Strategy
[Diagram: SolrInputDocument → Partition function → Indexing → Shard (single node) in Solr, with cores 20130501_0, 20130501_1, 20130501_2]
Indexing traffic on the Alternate Shard once commit starts on the “Main Shard”
[Diagram: SolrInputDocument → PairPartition function → Indexing → Main Shard (single node; cores 20130501_0, 20130501_1, 20130501_2) or Alternate Shard (cores 20130501_3, 20130501_4, 20130501_5)]
Commit Strategy
• Merits
– Scales well
– Indexing can run continuously
• Demerits
– Search needs to be done on both cores
– But at the end of the day the two can be merged into one core
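One way to picture the pair-partition routing is the sketch below. It is an assumption-laden illustration: the class PairPartitionRouter, its method names and the fixed offset between a main core and its alternate (0 ↔ 3, 1 ↔ 4, 2 ↔ 5, as the core names on the slide suggest) are hypothetical, not taken from the talk.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical router: sends documents to the alternate core while the main core commits. */
final class PairPartitionRouter {
    // Assumption: main core i is paired with alternate core i + pairOffset.
    private final int pairOffset;
    private final Set<Integer> committing = ConcurrentHashMap.newKeySet();

    PairPartitionRouter(int pairOffset) {
        this.pairOffset = pairOffset;
    }

    /** Mark a main core as busy committing. */
    void commitStarted(int mainCore) {
        committing.add(mainCore);
    }

    /** Mark a main core as available again. */
    void commitFinished(int mainCore) {
        committing.remove(mainCore);
    }

    /** Returns the core index that should receive the next document for this partition. */
    int route(int mainCore) {
        return committing.contains(mainCore) ? mainCore + pairOffset : mainCore;
    }
}
```

With this shape, indexing never blocks on a commit; the price, as noted above, is that searches have to cover both cores of a pair until they are merged.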
Other Optimizations
• In Solr, adding a document performs an update (add + delete)
– For each add-document call, Solr internally creates a delete term on the “id” field
– But log messages are always unique, so the delete step can be skipped
• Changing buffer size in BufferedIndexInput and BufferedIndexOutput
– Increasing the buffer size improves indexing performance, especially if the disk is slow
– More process heap is required accordingly, as a lot of files are created if the data volume is high
• Reusing Lucene Document and Field instances
– See org/apache/lucene/benchmark/byTask/feeds/DocMaker.java
• For more information on improving indexing performance, see
http://rahuldausa.wordpress.com/2013/01/14/scaling-lucene-for-indexing-a-billion-documents/
The Result
Data Volume vs. Indexing Time (minutes)
Data volume   Before   After
1 GB          3        0.5
4 GB          14       2
8 GB          38       4.5
17 GB         56       9
35 GB         112      22
Search
Partition
• Partitioning the data properly improves search performance significantly
• Partition types
– Server-based partition
• The number of documents does not balance out evenly across shards
– Date-and-time-based partition
• Creates a hotspot on a single shard
– Least loaded shard (index)
• By number of documents
• Balances documents evenly across all shards
• Can’t provide optimal search performance, as all shards need to be hit
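The least-loaded option could look something like this minimal sketch, assuming the indexer keeps a map from shard name to current document count. The class name and the map are hypothetical.

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical least-loaded partitioner keyed on per-shard document counts. */
final class LeastLoadedPartitioner {
    /** Returns the name of the shard holding the fewest documents. */
    static String pick(Map<String, Long> docCountByShard) {
        if (docCountByShard.isEmpty()) {
            throw new IllegalArgumentException("no shards available");
        }
        return Collections.min(docCountByShard.entrySet(), Map.Entry.comparingByValue()).getKey();
    }
}
```

Because a document can land on any shard, a query cannot be narrowed to a subset of shards, which is the search-side drawback listed above.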
Hybrid Approach: Multi-tier Partition
[Diagram: Incoming message (Production Server → Indexing Server) → Server-based Partition → Date & time based Partition → Solr Shard (date_hour_shardId), e.g. 20130501_00_0, 20130501_06_0, 20130501_00_1]
Example messages:
jacob: {
message: hello lucene
time: 20130501:11:00:00
}
mia: {
message: hello solr and lucene
time: 20130501:04:00:00
}
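A shard name in the date_hour_shardId form could be derived roughly as below. The 6-hour bucket width is an inference from the shard names on the slide (00 and 06), and the way a server maps to a shard id is left to the caller; both are assumptions, as is the class name.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper that builds shard names of the form date_hour_shardId. */
final class ShardNamer {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    /**
     * @param time        timestamp of the log message
     * @param bucketHours width of a time bucket in hours (assumed to divide 24)
     * @param shardId     id chosen by the server-based partition tier
     */
    static String shardFor(LocalDateTime time, int bucketHours, int shardId) {
        if (bucketHours <= 0 || 24 % bucketHours != 0) {
            throw new IllegalArgumentException("bucketHours must divide 24");
        }
        // Round the hour down to the start of its bucket, e.g. 11 -> 06 for 6-hour buckets.
        int bucketStart = (time.getHour() / bucketHours) * bucketHours;
        return String.format("%s_%02d_%d", time.format(DAY), bucketStart, shardId);
    }
}
```

Under these assumptions, the jacob message stamped 11:00 falls into a 20130501_06_* shard and the mia message stamped 04:00 into a 20130501_00_* shard.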
Distributed Search
• One shard is chosen as the leader shard
– Forwards the request to all other shards and collects the responses
• Requires all documents to have
– A unique key (e.g. “id”) across all shards, which should be stored
– We used an epoch-based approach to generate a unique id across the cluster, inspired by the Instagram engineering blog#
• Unique Id
– Combination of epoch time, unique node id and an incremented number
[ epoch_time | unique_node_id | incremented_number ]
# http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram
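An id of this shape can be packed into a single 64-bit value, along the lines of the sketch below. The bit widths (13 bits for the node id, 10 for the counter), the custom epoch of 2013-01-01 UTC and the class name are assumptions for illustration; the talk does not give the actual layout.

```java
/** Hypothetical generator: [ millis since custom epoch | node id | sequence ] in one long. */
final class LogIdGenerator {
    static final long CUSTOM_EPOCH = 1356998400000L; // 2013-01-01T00:00:00Z, assumed
    private static final int NODE_BITS = 13;         // assumed width
    private static final int SEQ_BITS = 10;          // assumed width
    private static final long SEQ_MASK = (1L << SEQ_BITS) - 1;

    private final long nodeId;
    private long sequence = 0;

    LogIdGenerator(long nodeId) {
        if (nodeId < 0 || nodeId >= (1L << NODE_BITS)) {
            throw new IllegalArgumentException("nodeId out of range");
        }
        this.nodeId = nodeId;
    }

    /** Next id for the current wall-clock time. */
    long nextId() {
        return nextId(System.currentTimeMillis());
    }

    /** Next id for an explicit timestamp in epoch milliseconds. */
    synchronized long nextId(long epochMillis) {
        long seq = sequence;
        sequence = (sequence + 1) & SEQ_MASK; // wraps around after 1024 ids
        return ((epochMillis - CUSTOM_EPOCH) << (NODE_BITS + SEQ_BITS))
                | (nodeId << SEQ_BITS)
                | seq;
    }
}
```

Subtracting a custom epoch keeps the shifted timestamp within 63 bits. Note that this sketch would repeat an id if one node issued more than 1024 ids within the same millisecond; a production version would need to handle that case.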
How Search works
[Diagram: Zookeeper (zk), Search Server (tomcat), Indexing Server (maestro)]
• Indexing Server: pushes shard mapping to zk
• Search Server: creates a watcher on the zk node and updates the in-memory shard mapping on change
• User query → QueryParser → lookup in the Shard Mapping (in-memory structure) → search query with shards parameter → Indexing Server (maestro)
How Search works (Cont’d)
[Diagram: the leader shard (maestro) forwards the query to the Indexing Servers]
• from: now-24hour → lookup on shards for today
• server: jacob, from: now-4hour → shard(s) having data for jacob from the last-6-hour shard
• from: now-11hour → shard(s) having data for the last 12 hours
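The lookup from a time range to a shards parameter might be sketched as follows. It assumes shard names of the form date_hour_shardId with 6-hour buckets and an in-memory map from shard name to shard URL; the class name, host names and map contents are hypothetical. The comma-separated value is the format Solr's shards request parameter expects.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical lookup from a query time range to the shards that may hold matching data. */
final class ShardLookup {
    private static final DateTimeFormatter BUCKET = DateTimeFormatter.ofPattern("yyyyMMdd_HH");

    /** Lists the date_hour prefixes of every bucket overlapping [from, to]. */
    static List<String> bucketsBetween(LocalDateTime from, LocalDateTime to, int bucketHours) {
        List<String> buckets = new ArrayList<>();
        int startHour = (from.getHour() / bucketHours) * bucketHours;
        LocalDateTime cursor = from.truncatedTo(ChronoUnit.DAYS).plusHours(startHour);
        while (!cursor.isAfter(to)) {
            buckets.add(cursor.format(BUCKET));
            cursor = cursor.plusHours(bucketHours);
        }
        return buckets;
    }

    /** Joins the URLs of all mapped shards whose name starts with one of the bucket prefixes. */
    static String shardsParam(Map<String, String> urlByShard, List<String> buckets) {
        StringJoiner joined = new StringJoiner(",");
        for (Map.Entry<String, String> entry : urlByShard.entrySet()) {
            for (String bucket : buckets) {
                if (entry.getKey().startsWith(bucket + "_")) {
                    joined.add(entry.getValue());
                    break;
                }
            }
        }
        return joined.toString();
    }
}
```

A query for the last 4 hours issued at 11:30 would then touch only the 06 bucket, while a query for the last 11 hours would touch the 00 and 06 buckets, mirroring the examples above.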
Analytics
• Young GC timings/chart
• Full GC timings
• DB Access/Update Timings
– Reveals whether there is any pattern across all DB servers
• Real-time exception/issue reporting using facet queries
• Apache access/error KPIs
Analytics
• Custom report based on “Key:Value” pairs, e.g.
time - key:value
18:28:28,541 - activeThreadCount:5
18:28:29,541 - activeThreadCount:8
18:28:30,541 - activeThreadCount:9
18:28:31,541 - activeThreadCount:3
[Chart: activeThreadCount over time, y-axis 0 to 10]
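Extracting such key:value samples from log lines could be done with a small parser like the one sketched here. The line format is taken from the example above; the class name, the regular expression and the choice of a numeric value type are assumptions.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parsed sample from a line such as "18:28:28,541 - activeThreadCount:5". */
final class KeyValueSample {
    // time with millis, a dash, then key:numericValue; whitespace around separators is tolerated
    private static final Pattern LINE = Pattern.compile(
            "^(\\d{2}:\\d{2}:\\d{2}),\\s*(\\d{3})\\s*-\\s*(\\w+):(-?\\d+(?:\\.\\d+)?)\\s*$");

    final String time;
    final String key;
    final double value;

    private KeyValueSample(String time, String key, double value) {
        this.time = time;
        this.key = key;
        this.value = value;
    }

    /** Returns the parsed sample, or empty if the line does not follow the expected format. */
    static Optional<KeyValueSample> parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        String time = m.group(1) + "," + m.group(2);
        return Optional.of(new KeyValueSample(time, m.group(3), Double.parseDouble(m.group(4))));
    }
}
```

Each parsed sample could then be indexed with the key and value as separate fields, so that a report for a given key becomes a simple query sorted by time.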
Architecture
Data Flows
[Diagram: two data flows into the Indexing Server, which serves the Search UI]
• Periodic Push: Weird Log File → Zero Copy server → Indexing Server
• Real-time transfer from in-memory (Log4jAppender): Kafka Broker → Indexing Server
o Zero Copy server
- Deployed on each Indexing server for data locality
- Writes incoming files to disk, as the Indexing server doesn’t index at the same rate
o Kafka Broker
- The Kafka Appender passes the messages from in-memory
Periodic Push
[Diagram: Production Server (Logs transfer Daemon) → Zero copy server → Disk → Indexing Server, on Node 1 … Node n; Zookeeper alongside]
Real time transfer
[Diagram: Production Server (Kafka Appender) → Kafka Broker → Indexing Servers → Search UI; Zookeeper: update consumed message offset]
Conclusion
Lessons Learned
• Always find the sweet spots for
– Number of indexer threads that can run in parallel
– Randomize
• Merge factor
• Commit interval
• ramBufferSize
– Increasing cache size helps bring down search latency
• but with a Full GC penalty
• An index size of more than 5 GB in one core does not go well with search
• Searching across a lot of cores does not provide optimal response time
– Overall query response time is limited by the slowest shard’s performance
• Solr scales both vertically and horizontally
• Batching of log messages based on message size (~10 KB) in a MessageSet
– Kafka adds 10 bytes to each message
– Most of the time, log messages are < 100 bytes
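Size-based batching of this kind can be sketched as below. The helper is a plain illustration and does not use the Kafka client API; the class name, the UTF-8 size measure and the decision to ignore per-message overhead are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that groups log messages into batches of roughly maxBytes each. */
final class SizeBatcher {
    /** Splits messages, in order, into batches whose UTF-8 payload stays within maxBytes. */
    static List<List<String>> batchBySize(List<String> messages, int maxBytes) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int currentBytes = 0;
        for (String message : messages) {
            int size = message.getBytes(StandardCharsets.UTF_8).length;
            // Close the current batch if adding this message would exceed the limit.
            if (!current.isEmpty() && currentBytes + size > maxBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(message); // an oversized message ends up in a batch of its own
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

With messages under 100 bytes, a fixed 10 bytes per message is an overhead of more than 10%, which is one reason grouping by size (for example with maxBytes around 10 * 1024) is attractive compared with sending messages one at a time.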
Thank You
jainr@ivycomptech.com
42

Más contenido relacionado

La actualidad más candente

Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...
Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...
Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...Lucidworks
 
Faceting Optimizations for Solr: Presented by Toke Eskildsen, State & Univers...
Faceting Optimizations for Solr: Presented by Toke Eskildsen, State & Univers...Faceting Optimizations for Solr: Presented by Toke Eskildsen, State & Univers...
Faceting Optimizations for Solr: Presented by Toke Eskildsen, State & Univers...Lucidworks
 
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...thelabdude
 
Cross Datacenter Replication in Apache Solr 6
Cross Datacenter Replication in Apache Solr 6Cross Datacenter Replication in Apache Solr 6
Cross Datacenter Replication in Apache Solr 6Shalin Shekhar Mangar
 
DZone Java 8 Block Buster: Query Databases Using Streams
DZone Java 8 Block Buster: Query Databases Using StreamsDZone Java 8 Block Buster: Query Databases Using Streams
DZone Java 8 Block Buster: Query Databases Using StreamsSpeedment, Inc.
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceLucidworks (Archived)
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Stormlucenerevolution
 
Scaling Through Partitioning and Shard Splitting in Solr 4
Scaling Through Partitioning and Shard Splitting in Solr 4Scaling Through Partitioning and Shard Splitting in Solr 4
Scaling Through Partitioning and Shard Splitting in Solr 4thelabdude
 
User Defined Partitioning on PlazmaDB
User Defined Partitioning on PlazmaDBUser Defined Partitioning on PlazmaDB
User Defined Partitioning on PlazmaDBKai Sasaki
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageSATOSHI TAGOMORI
 
Call me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networksCall me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networksShalin Shekhar Mangar
 
Scaling massive elastic search clusters - Rafał Kuć - Sematext
Scaling massive elastic search clusters - Rafał Kuć - SematextScaling massive elastic search clusters - Rafał Kuć - Sematext
Scaling massive elastic search clusters - Rafał Kuć - SematextRafał Kuć
 
Faster Data Analytics with Apache Spark using Apache Solr - Kiran Chitturi, L...
Faster Data Analytics with Apache Spark using Apache Solr - Kiran Chitturi, L...Faster Data Analytics with Apache Spark using Apache Solr - Kiran Chitturi, L...
Faster Data Analytics with Apache Spark using Apache Solr - Kiran Chitturi, L...Lucidworks
 
Introduction to SolrCloud
Introduction to SolrCloudIntroduction to SolrCloud
Introduction to SolrCloudVarun Thacker
 
Logging for Production Systems in The Container Era
Logging for Production Systems in The Container EraLogging for Production Systems in The Container Era
Logging for Production Systems in The Container EraSadayuki Furuhashi
 
How to make a simple cheap high availability self-healing solr cluster
How to make a simple cheap high availability self-healing solr clusterHow to make a simple cheap high availability self-healing solr cluster
How to make a simple cheap high availability self-healing solr clusterlucenerevolution
 
NYJavaSIG - Big Data Microservices w/ Speedment
NYJavaSIG - Big Data Microservices w/ SpeedmentNYJavaSIG - Big Data Microservices w/ Speedment
NYJavaSIG - Big Data Microservices w/ SpeedmentSpeedment, Inc.
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache SolrChristos Manios
 
NYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / SolrNYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / Solrthelabdude
 

La actualidad más candente (20)

Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...
Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...
Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...
 
Faceting Optimizations for Solr: Presented by Toke Eskildsen, State & Univers...
Faceting Optimizations for Solr: Presented by Toke Eskildsen, State & Univers...Faceting Optimizations for Solr: Presented by Toke Eskildsen, State & Univers...
Faceting Optimizations for Solr: Presented by Toke Eskildsen, State & Univers...
 
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
 
Cross Datacenter Replication in Apache Solr 6
Cross Datacenter Replication in Apache Solr 6Cross Datacenter Replication in Apache Solr 6
Cross Datacenter Replication in Apache Solr 6
 
DZone Java 8 Block Buster: Query Databases Using Streams
DZone Java 8 Block Buster: Query Databases Using StreamsDZone Java 8 Block Buster: Query Databases Using Streams
DZone Java 8 Block Buster: Query Databases Using Streams
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Storm
 
Scaling Through Partitioning and Shard Splitting in Solr 4
Scaling Through Partitioning and Shard Splitting in Solr 4Scaling Through Partitioning and Shard Splitting in Solr 4
Scaling Through Partitioning and Shard Splitting in Solr 4
 
User Defined Partitioning on PlazmaDB
User Defined Partitioning on PlazmaDBUser Defined Partitioning on PlazmaDB
User Defined Partitioning on PlazmaDB
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby Usage
 
Call me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networksCall me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networks
 
Scaling massive elastic search clusters - Rafał Kuć - Sematext
Scaling massive elastic search clusters - Rafał Kuć - SematextScaling massive elastic search clusters - Rafał Kuć - Sematext
Scaling massive elastic search clusters - Rafał Kuć - Sematext
 
Faster Data Analytics with Apache Spark using Apache Solr - Kiran Chitturi, L...
Faster Data Analytics with Apache Spark using Apache Solr - Kiran Chitturi, L...Faster Data Analytics with Apache Spark using Apache Solr - Kiran Chitturi, L...
Faster Data Analytics with Apache Spark using Apache Solr - Kiran Chitturi, L...
 
Introduction to SolrCloud
Introduction to SolrCloudIntroduction to SolrCloud
Introduction to SolrCloud
 
How to Run Solr on Docker and Why
How to Run Solr on Docker and WhyHow to Run Solr on Docker and Why
How to Run Solr on Docker and Why
 
Logging for Production Systems in The Container Era
Logging for Production Systems in The Container EraLogging for Production Systems in The Container Era
Logging for Production Systems in The Container Era
 
How to make a simple cheap high availability self-healing solr cluster
How to make a simple cheap high availability self-healing solr clusterHow to make a simple cheap high availability self-healing solr cluster
How to make a simple cheap high availability self-healing solr cluster
 
NYJavaSIG - Big Data Microservices w/ Speedment
NYJavaSIG - Big Data Microservices w/ SpeedmentNYJavaSIG - Big Data Microservices w/ Speedment
NYJavaSIG - Big Data Microservices w/ Speedment
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
NYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / SolrNYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / Solr
 

Similar a Building a near real time search engine & analytics for logs using solr

SQL Analytics for Search Engineers - Timothy Potter, Lucidworksngineers
SQL Analytics for Search Engineers - Timothy Potter, LucidworksngineersSQL Analytics for Search Engineers - Timothy Potter, Lucidworksngineers
SQL Analytics for Search Engineers - Timothy Potter, LucidworksngineersLucidworks
 
Webinar: Faster Log Indexing with Fusion
Webinar: Faster Log Indexing with FusionWebinar: Faster Log Indexing with Fusion
Webinar: Faster Log Indexing with FusionLucidworks
 
Solr At Scale For Time-Oriented Data: Presented by Brett Hoerner, Rocana
Solr At Scale For Time-Oriented Data: Presented by Brett Hoerner, RocanaSolr At Scale For Time-Oriented Data: Presented by Brett Hoerner, Rocana
Solr At Scale For Time-Oriented Data: Presented by Brett Hoerner, RocanaLucidworks
 
Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014Bryan Bende
 
Scality S3 Server: Node js Meetup Presentation
Scality S3 Server: Node js Meetup PresentationScality S3 Server: Node js Meetup Presentation
Scality S3 Server: Node js Meetup PresentationScality
 
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)Highlights of AWS ReInvent 2023 (Announcements and Best Practices)
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)Emprovise
 
Keynote Yonik Seeley & Steve Rowe lucene solr roadmap
Keynote   Yonik Seeley & Steve Rowe lucene solr roadmapKeynote   Yonik Seeley & Steve Rowe lucene solr roadmap
Keynote Yonik Seeley & Steve Rowe lucene solr roadmaplucenerevolution
 
KEYNOTE: Lucene / Solr road map
KEYNOTE: Lucene / Solr road mapKEYNOTE: Lucene / Solr road map
KEYNOTE: Lucene / Solr road maplucenerevolution
 
Developing on SQL Azure
Developing on SQL AzureDeveloping on SQL Azure
Developing on SQL AzureIke Ellis
 
What's new in Lucene and Solr 4.x
What's new in Lucene and Solr 4.xWhat's new in Lucene and Solr 4.x
What's new in Lucene and Solr 4.xGrant Ingersoll
 
Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark Anubhav Kale
 
(ATS6-PLAT02) Accelrys Catalog and Protocol Validation
(ATS6-PLAT02) Accelrys Catalog and Protocol Validation(ATS6-PLAT02) Accelrys Catalog and Protocol Validation
(ATS6-PLAT02) Accelrys Catalog and Protocol ValidationBIOVIA
 
Building a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrBuilding a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrRahul Jain
 
Real Time Indexing and Search - Ashwani Kapoor & Girish Gudla, Trulia
Real Time Indexing and Search - Ashwani Kapoor & Girish Gudla, TruliaReal Time Indexing and Search - Ashwani Kapoor & Girish Gudla, Trulia
Real Time Indexing and Search - Ashwani Kapoor & Girish Gudla, TruliaLucidworks
 
Large Data Volume Salesforce experiences
Large Data Volume Salesforce experiencesLarge Data Volume Salesforce experiences
Large Data Volume Salesforce experiencesCidar Mendizabal
 
(Re)Indexing Large Repositories in Alfresco
(Re)Indexing Large Repositories in Alfresco(Re)Indexing Large Repositories in Alfresco
(Re)Indexing Large Repositories in AlfrescoAngel Borroy López
 

Similar a Building a near real time search engine & analytics for logs using solr (20)

SQL Analytics for Search Engineers - Timothy Potter, Lucidworksngineers
SQL Analytics for Search Engineers - Timothy Potter, LucidworksngineersSQL Analytics for Search Engineers - Timothy Potter, Lucidworksngineers
SQL Analytics for Search Engineers - Timothy Potter, Lucidworksngineers
 
Internals of Presto Service
Internals of Presto ServiceInternals of Presto Service
Internals of Presto Service
 
Webinar: Faster Log Indexing with Fusion
Webinar: Faster Log Indexing with FusionWebinar: Faster Log Indexing with Fusion
Webinar: Faster Log Indexing with Fusion
 
Solr At Scale For Time-Oriented Data: Presented by Brett Hoerner, Rocana
Solr At Scale For Time-Oriented Data: Presented by Brett Hoerner, RocanaSolr At Scale For Time-Oriented Data: Presented by Brett Hoerner, Rocana
Solr At Scale For Time-Oriented Data: Presented by Brett Hoerner, Rocana
 
Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014
 
CQRS
CQRSCQRS
CQRS
 
Solr 4
Solr 4Solr 4
Solr 4
 
Scality S3 Server: Node js Meetup Presentation
Scality S3 Server: Node js Meetup PresentationScality S3 Server: Node js Meetup Presentation
Scality S3 Server: Node js Meetup Presentation
 
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)Highlights of AWS ReInvent 2023 (Announcements and Best Practices)
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)
 
Keynote Yonik Seeley & Steve Rowe lucene solr roadmap
Keynote   Yonik Seeley & Steve Rowe lucene solr roadmapKeynote   Yonik Seeley & Steve Rowe lucene solr roadmap
Keynote Yonik Seeley & Steve Rowe lucene solr roadmap
 
KEYNOTE: Lucene / Solr road map
KEYNOTE: Lucene / Solr road mapKEYNOTE: Lucene / Solr road map
KEYNOTE: Lucene / Solr road map
 
Developing on SQL Azure
Developing on SQL AzureDeveloping on SQL Azure
Developing on SQL Azure
 
What's new in Lucene and Solr 4.x
What's new in Lucene and Solr 4.xWhat's new in Lucene and Solr 4.x
What's new in Lucene and Solr 4.x
 
Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark
 
(ATS6-PLAT02) Accelrys Catalog and Protocol Validation
(ATS6-PLAT02) Accelrys Catalog and Protocol Validation(ATS6-PLAT02) Accelrys Catalog and Protocol Validation
(ATS6-PLAT02) Accelrys Catalog and Protocol Validation
 
Building a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrBuilding a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache Solr
 
Real Time Indexing and Search - Ashwani Kapoor & Girish Gudla, Trulia
Real Time Indexing and Search - Ashwani Kapoor & Girish Gudla, TruliaReal Time Indexing and Search - Ashwani Kapoor & Girish Gudla, Trulia
Real Time Indexing and Search - Ashwani Kapoor & Girish Gudla, Trulia
 
Large Data Volume Salesforce experiences
Large Data Volume Salesforce experiencesLarge Data Volume Salesforce experiences
Large Data Volume Salesforce experiences
 
(Re)Indexing Large Repositories in Alfresco
(Re)Indexing Large Repositories in Alfresco(Re)Indexing Large Repositories in Alfresco
(Re)Indexing Large Repositories in Alfresco
 
VoltDB.ppt
VoltDB.pptVoltDB.ppt
VoltDB.ppt
 

Más de lucenerevolution

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucenelucenerevolution
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! lucenerevolution
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solrlucenerevolution
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloudlucenerevolution
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusterslucenerevolution
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiledlucenerevolution
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs lucenerevolution
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchlucenerevolution
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?lucenerevolution
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APIlucenerevolution
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucenelucenerevolution
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMlucenerevolution
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucenelucenerevolution
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenallucenerevolution
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside downlucenerevolution
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...lucenerevolution
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - finallucenerevolution
 
The First Class Integration of Solr with Hadoop
The First Class Integration of Solr with HadoopThe First Class Integration of Solr with Hadoop
The First Class Integration of Solr with Hadooplucenerevolution
 
A Novel methodology for handling Document Level Security in Search Based Appl...
A Novel methodology for handling Document Level Security in Search Based Appl...A Novel methodology for handling Document Level Security in Search Based Appl...
A Novel methodology for handling Document Level Security in Search Based Appl...lucenerevolution
 

Más de lucenerevolution (20)

Building a near real time search engine & analytics for logs using solr

  • 1. Building a Near Real Time “Logs Search Engine & Analytics” using Solr. Lucene/Solr Revolution 2013, May 1st, 2013. Rahul Jain, jainr@ivycomptech.com
  • 2. Who am I?
      • Software Engineer
      • Member of Core Technology @ IVY Comptech, Hyderabad, India
      • 6 years of programming experience
      • Areas of expertise/interest
          – High-traffic web applications
          – Java/J2EE
          – Big data, NoSQL
          – Information retrieval, machine learning
  • 3. Agenda
      • Overview
      • Indexing
      • Search
      • Analytics
      • Architecture
      • Lessons learned
      • Q&A
  • 5. Issues keep coming in “Production”
      Sample errors: java.net.ConnectException: Connection refused; ServerNotRunningException; Too many open files; DBException; NullPointerException; OutOfMemory
      Issues
      • Hidden bugs
      • DB is down
      • Server crashed
      • OutOfMemory
      • Connection reset
      • Nodes go out of the cluster (due to a long GC pause)
      • Attack
          – DoS (Denial of Service) by sending a lot of requests in a short time frame
  • 6. Why Logs Search?
      • Enables the production support team to immediately check for issues in “one place”
          – Saves the time spent logging on to multiple servers to check the logs
      • Debugging production issues
          – Is the issue specific to one server, or is it occurring on all servers for that application?
      • Allows tracking user activity across multiple servers/applications
      • Correlation of multiple issues with each other
          – e.g. logins might be failing on node X due to an OutOfMemory error on node Y
  • 7. Key Problems
      • Hundreds of servers/services generating logs
      • Terabytes of unstructured logs per day to index in near real time
      • Millions of log events (priority one)
      • Full-text search & storage of log content
      • High indexing rate of 1 GB/min
      • Search latency in seconds is acceptable
  • 8. Logs are different
      • Varying size
          – From a few bytes to several KB
          – A larger number of documents
      • On average, 6-8 million log messages in 1 GB of logs
          – Each line forms one log message, except for an exception stack trace
      • Different types
          – Exception stack traces
          – Application logs
          – HTTP access/error logs
          – GC logs
      • The logging format is not uniform across all logs
  • 10. Improving Indexing Performance
      • Solr in Embedded Mode
      • Bypassing XML marshalling/unmarshalling
      • Moving to an async approach
      • Routing traffic to an alternate shard once a “commit” starts on the main shard
      • Other optimizations
          – Add document does an update (add + delete)
          – Changing the buffer size in BufferedIndexInput and BufferedIndexOutput
          – Reusing the Lucene document object
  • 11. Old Architecture
      Diagram labels: Production Server, Logs Transfer, Centralized Log Collection Server, Solr Server (two instances), Search UI
  • 12. Old Architecture
      Same diagram as slide 11, with two additional labels: Data Copy 1, Data Copy 2
  • 13. Direct Logs Transfer
      Diagram: the Production Server transfers logs directly to multiple Indexing Servers
      Open question: now that the indexing system is exposed to the production servers, what happens if a new Indexing Server is added on the fly, or one of them goes down?
  • 14. Solr in Embedded Mode
      Diagram: inside the Indexing Server, the application and Solr run in a single JVM and talk through SolrJ (EmbeddedSolrServer)
      • No network latency
  • 16. Message Flow
      Diagram (single JVM): SolrInputDocument → XML marshalling (UpdateRequest) → XML (<add> <doc> <field> </field> <field> </field> </doc> </add>) → XML unmarshalling (XMLLoader) → SolrInputDocument (new object)
  • 17. Bypassing XML Marshalling/Unmarshalling
      Diagram (single JVM): the XML marshalling (UpdateRequest) and XML unmarshalling (XMLLoader) steps are skipped by passing a direct reference to the SolrInputDocument object, so the receiving side sees SolrInputDocument (referenced object)
      Classes and methods shown:
      • LMEmbeddedSolrServer#add(List<SolrInputDocument>)
      • DocUpdateRequest#add(List<SolrInputDocument>)
      • DocContentStream#getSolrInputDocuments()
      • RefDocumentLoader#load()
  • 19. Old Architecture (Sync)
      Diagram: Incoming Message (unstructured) → Log Event (structured) → batch of SolrInputDocuments (10K) → add to Solr → UpdateResponse
      • A thread pool with multiple threads builds the batches. Once the batch size reaches 10K, one of the threads adds the documents to Solr as a synchronous call and waits for the response.
      • Time taken:
          – Indexing one chunk (10K) takes anywhere between 400 ms and 3000 ms#
          – While a commit is running, it takes 6000 ms to 23000 ms, or even more
          – 1 GB contains around 600 chunks
          – So most of the time is spent just waiting for the response
      # Indexing time varies based on several factors, e.g. hardware configuration, application type, nature of the data, number of indexed/stored fields, analyzer type, etc.
  • 20. Moving to an Asynchronous Architecture
      Diagram: Incoming Message → Log Message Transformation (Analyzer Thread Pool) → Log Event → add a batch of Log Events to the Event Pipeline (BlockingQueue) → remove a batch of Log Events from the pipeline → Log Event to SolrInputDocument (Indexer Thread Pool) → Solr
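The producer/consumer hand-off on this slide can be sketched with the JDK's `BlockingQueue`. This is a minimal illustration, not the code from the talk: `LogEvent`, `AsyncIndexingPipeline`, the queue capacity and the single-threaded executors are all assumptions, and the "indexer" only collects messages where a real system would add documents to Solr.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a parsed log line; not a class from the talk. */
final class LogEvent {
    final String message;
    LogEvent(String message) { this.message = message; }
}

final class AsyncIndexingPipeline {
    // Sentinel batch (compared by reference) that tells the indexer to stop.
    private static final List<LogEvent> POISON = new ArrayList<>();

    /** Pushes lines through analyzer -> queue -> indexer and returns what the indexer received. */
    static List<String> run(List<String> rawLines, int batchSize) {
        // Bounded queue: the analyzer blocks instead of exhausting memory if indexing falls behind.
        BlockingQueue<List<LogEvent>> pipeline = new ArrayBlockingQueue<>(16);
        List<String> indexed = new CopyOnWriteArrayList<>();
        // One thread per stage keeps the output order deterministic; the slide uses pools.
        ExecutorService analyzer = Executors.newSingleThreadExecutor();
        ExecutorService indexer = Executors.newSingleThreadExecutor();

        indexer.execute(() -> {
            try {
                while (true) {
                    List<LogEvent> batch = pipeline.take();   // remove a batch from the pipeline
                    if (batch == POISON) break;
                    for (LogEvent e : batch) indexed.add(e.message); // placeholder for the Solr add
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        });

        analyzer.execute(() -> {
            try {
                List<LogEvent> batch = new ArrayList<>();
                for (String line : rawLines) {
                    batch.add(new LogEvent(line.trim()));     // unstructured line -> structured event
                    if (batch.size() == batchSize) {
                        pipeline.put(batch);                  // add a full batch to the pipeline
                        batch = new ArrayList<>();
                    }
                }
                if (!batch.isEmpty()) pipeline.put(batch);    // flush the partial last batch
                pipeline.put(POISON);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        });

        analyzer.shutdown();
        indexer.shutdown();
        try {
            analyzer.awaitTermination(10, TimeUnit.SECONDS);
            indexer.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(indexed);
    }
}
```

The point of the design is that the analyzer never waits for an indexing response; it only blocks when the bounded queue is full, which acts as back-pressure.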
  • 23. Indexing traffic on the alternate shard once a commit starts on the “Main Shard”
      Diagram: SolrInputDocument → partition function → indexing pair in Solr
      • Main shard (single node): 20130501_0, 20130501_1, 20130501_2
      • Alternate shard: 20130501_3, 20130501_4, 20130501_5
  • 24. Commit Strategy
      • Merits
          – Scales well
          – Indexing can run continuously
      • Demerits
          – Search needs to be done on both cores
          – but at the end of the day these two can be merged into one core
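The routing rule behind this commit strategy might look roughly like the following sketch. `ShardPairRouter` and its method names are hypothetical, and the pairing of a main shard with a specific alternate shard is an assumption made for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical router for one main/alternate shard pair; not a class from the talk. */
final class ShardPairRouter {
    private final String mainShard;
    private final String alternateShard;
    // True while a commit is in progress on the main shard.
    private final AtomicBoolean mainCommitting = new AtomicBoolean(false);

    ShardPairRouter(String mainShard, String alternateShard) {
        this.mainShard = mainShard;
        this.alternateShard = alternateShard;
    }

    /** Shard that should receive the next batch of documents. */
    String target() {
        return mainCommitting.get() ? alternateShard : mainShard;
    }

    void commitStarted()  { mainCommitting.set(true); }
    void commitFinished() { mainCommitting.set(false); }

    /** Until the two cores are merged, a search has to cover both of them. */
    String searchShards() {
        return mainShard + "," + alternateShard;
    }
}
```

Indexing threads call `target()` for every batch, so they keep writing while the main shard commits; the cost, as the slide notes, is that queries must cover both cores until they are merged.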
  • 26. Other Optimizations
      • In Solr, add document does an update (add + delete)
          – For each add-document call, Solr internally creates a delete term on the “id” field
          – but log messages are always unique, so the delete is wasted work
      • Changing the buffer size in BufferedIndexInput and BufferedIndexOutput
          – Increasing the buffer size improves indexing performance, especially if the disk is slow
          – More process heap is required accordingly, because many files are created when the data volume is high
      • Reusing Lucene document and Field instances
          – Check org/apache/lucene/benchmark/byTask/feeds/DocMaker.java
      • For more information on improving indexing performance, see http://rahuldausa.wordpress.com/2013/01/14/scaling-lucene-for-indexing-a-billion-documents/
  • 28. Data Volume v/s Indexing Time (GB/Minutes)
      Data volume    Before (min)    After (min)
      1 GB           3               0.5
      4 GB           14              2
      8 GB           38              4.5
      17 GB          56              9
      35 GB          112             22
  • 30. Partition
      • Partitioning the data properly improves search performance significantly
      • Partition types
          – Server-based partition
              • The number of documents does not balance out evenly across shards
          – Date and time based partition
              • Creates a hotspot on a single shard
          – Least-loaded shard (index)
              • Chosen by number of documents
              • Balances documents evenly across all shards
              • Can't provide optimal search performance, as all shards need to be hit
      Diagram labels: Incoming message, Server-based partition, Date & time based partition, Solr shard, Hybrid approach
  • 31. Multi-tier Partition
      Diagram: Production Server → Indexing Server → incoming message → date & time based partition → server based partition → Solr shard (date_hour_shardId)
      • Shards shown: 20130501_00_0, 20130501_00_1, 20130501_06_0
      • Example messages:
          – jacob: { message: hello lucene, time: 20130501:11:00:00 }
          – mia: { message: hello solr and lucene, time: 20130501:04:00:00 }
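A shard name in the `date_hour_shardId` form could be derived as in the sketch below. The 6-hour bucket width is only an inference from the shard names on the slide (`_00_`, `_06_`), and using a hash of the server name to pick the shard id is an assumption; `ShardNamer` is a hypothetical helper, not part of Solr.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Hypothetical helper that builds date_hour_shardId names; not part of Solr. */
final class ShardNamer {
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd", Locale.ROOT);

    /**
     * @param server          name of the server that produced the log event
     * @param eventTime       timestamp of the log event
     * @param hoursPerBucket  width of the time partition in hours (assumed, e.g. 6)
     * @param shardsPerBucket number of shards per time bucket (assumed)
     */
    static String shardFor(String server, LocalDateTime eventTime,
                           int hoursPerBucket, int shardsPerBucket) {
        // Tier 1: date & time partition, rounded down to the start of the bucket.
        int bucketStart = (eventTime.getHour() / hoursPerBucket) * hoursPerBucket;
        // Tier 2: server partition; floorMod keeps the id non-negative for negative hashes.
        int shardId = Math.floorMod(server.hashCode(), shardsPerBucket);
        return String.format(Locale.ROOT, "%s_%02d_%d",
                eventTime.format(DAY), bucketStart, shardId);
    }
}
```

With a single shard per bucket, `shardFor("jacob", LocalDateTime.of(2013, 5, 1, 11, 0), 6, 1)` yields `20130501_06_0`, and the 04:00 message from mia lands in `20130501_00_0`.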
  • 32. Distributed Search
      • One shard is chosen as the leader shard
          – It forwards the request to all other shards and collects the responses
      • Requires every document to have
          – A unique key (e.g. “id”) across all shards, which should be stored
          – We used an epoch-based approach to generate a unique id across the cluster, inspired by the Instagram engineering blog#
      • Unique id
          – A combination of epoch time, unique node id and an incremented number: epoch_time | unique_node_id | incremented_number
      # http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram
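One way to pack the three parts into a single 64-bit id is sketched below. The bit widths (41 bits of milliseconds, 13 bits of node id, 10 bits of sequence) follow the layout described in the Instagram post; the custom epoch and the class name `UniqueIdGenerator` are assumptions, and the talk does not say which layout was actually used.

```java
/** Hypothetical id generator: epoch time | node id | incremented number. */
final class UniqueIdGenerator {
    private static final int NODE_BITS = 13;  // assumed width of the node id
    private static final int SEQ_BITS = 10;   // assumed width of the counter
    // Assumed custom epoch: 2013-01-01T00:00:00Z in milliseconds.
    private static final long CUSTOM_EPOCH_MS = 1356998400000L;

    private final long nodeId;
    private long sequence = 0;

    UniqueIdGenerator(long nodeId) {
        if (nodeId < 0 || nodeId >= (1L << NODE_BITS)) {
            throw new IllegalArgumentException("nodeId out of range: " + nodeId);
        }
        this.nodeId = nodeId;
    }

    /**
     * Builds the next id for the given wall-clock time in milliseconds.
     * Caveat: the counter wraps after 1024 ids, so more than 1024 ids within
     * one millisecond on one node would collide in this simplified version.
     */
    synchronized long nextId(long nowMs) {
        long seq = sequence;
        sequence = (sequence + 1) & ((1L << SEQ_BITS) - 1);
        return ((nowMs - CUSTOM_EPOCH_MS) << (NODE_BITS + SEQ_BITS))
                | (nodeId << SEQ_BITS)
                | seq;
    }

    /** Recovers the millisecond timestamp embedded in an id. */
    static long timestampOf(long id) {
        return (id >>> (NODE_BITS + SEQ_BITS)) + CUSTOM_EPOCH_MS;
    }

    /** Recovers the node id embedded in an id. */
    static long nodeOf(long id) {
        return (id >>> SEQ_BITS) & ((1L << NODE_BITS) - 1);
    }
}
```

Because the timestamp occupies the high bits, ids sort roughly by creation time, and no coordination between nodes is needed as long as node ids are distinct.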
  • 33. How Search works
      Diagram labels:
      • ZooKeeper (zk): the shard mapping is pushed to zk
      • Search Server (Tomcat): creates a watcher on the zk node and updates the in-memory shard mapping on change
      • User query → QueryParser → lookup in the shard mapping (in-memory structure) → search query with the shards parameter → Indexing Server (maestro)
      • Indexing Servers
  • 34. How Search works (Cont'd)
      Diagram: the leader shard (maestro) sends each query only to the Indexing Servers whose shards match it. Example lookups:
      • from: now-24hour → lookup on the shards for today
      • server: jacob, from: now-4hour → shard(s) having data for jacob from the last 6-hour shard
      • from: now-11hour → shard(s) having data for the last 12 hours
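The in-memory shard lookup could be kept in a sorted map keyed by the start of each time bucket, as in this sketch. `ShardMapping` is a hypothetical class, hours are plain numbers for brevity, and shard names stand in for the host/core addresses that a real Solr `shards` parameter would carry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical in-memory shard mapping: bucket start hour -> shards holding that bucket. */
final class ShardMapping {
    private final NavigableMap<Long, List<String>> byBucketStart = new TreeMap<>();

    /** Called when the mapping changes, e.g. from a ZooKeeper watcher (assumed wiring). */
    void register(long bucketStartHour, String shard) {
        byBucketStart.computeIfAbsent(bucketStartHour, k -> new ArrayList<>()).add(shard);
    }

    /** Comma-separated list of shards whose bucket overlaps [fromHour, toHour]. */
    String shardsParam(long fromHour, long toHour) {
        if (fromHour > toHour) {
            throw new IllegalArgumentException("fromHour must not exceed toHour");
        }
        // The bucket containing fromHour starts at the greatest key <= fromHour.
        Long containing = byBucketStart.floorKey(fromHour);
        long start = (containing != null) ? containing : fromHour;
        List<String> hits = new ArrayList<>();
        for (List<String> shards : byBucketStart.subMap(start, true, toHour, true).values()) {
            hits.addAll(shards);
        }
        return String.join(",", hits);
    }
}
```

Only the shards returned by `shardsParam` are queried, which is how a narrow time range avoids hitting every shard in the cluster.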
  • 35. Analytics
      • Young GC timings/chart
      • Full GC timings
      • DB access/update timings
          – Reveals whether there is any pattern across all DB servers
      • Real-time exception/issue reporting using facet queries
      • Apache access/error KPIs
  • 36. Analytics
      • Custom report based on a “key:value” pair, e.g. time - key:value
          18:28:28, 541 - activeThreadCount:5
          18:28:29, 541 - activeThreadCount:8
          18:28:30, 541 - activeThreadCount:9
          18:28:31, 541 - activeThreadCount:3
      Chart: activeThreadCount over time (y-axis 0 to 10)
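Extracting such a series from raw log lines can be done with a regular expression, as sketched here. `KeyValueExtractor` is a hypothetical helper and the pattern assumes the `time, millis - key:value` layout shown on the slide; real log formats would need their own pattern.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for "HH:mm:ss, SSS - key:value" log lines. */
final class KeyValueExtractor {
    // Groups: 1 = time, 2 = millis, 3 = key, 4 = numeric value.
    private static final Pattern LINE = Pattern.compile(
            "(\\d{2}:\\d{2}:\\d{2}),\\s*(\\d{3})\\s*-\\s*(\\w+):(-?\\d+(?:\\.\\d+)?)");

    /** Returns timestamp -> value for every line that carries the requested key. */
    static Map<String, Double> series(Iterable<String> lines, String key) {
        Map<String, Double> points = new LinkedHashMap<>(); // keeps the log order
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.find() && m.group(3).equals(key)) {
                points.put(m.group(1) + "," + m.group(2), Double.parseDouble(m.group(4)));
            }
        }
        return points;
    }
}
```

The resulting map can be fed to any charting component to draw the `activeThreadCount` line from the slide.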
  • 38. Data Flows
      Diagram labels: Weird Log File, Zero Copy server, Kafka Broker, Indexing Server, Search UI, Periodic Push, Real-time transfer from in-memory (Log4jAppender)
      • Zero Copy server
          – Deployed on each Indexing Server for data locality
          – Writes incoming files to disk, because the Indexing Server does not index at the rate at which files arrive
      • Kafka Broker
      • The Kafka appender passes the messages directly from memory
  • 39. Periodic Push
      Diagram labels: ZooKeeper, Production Server (logs transfer daemon), Node 1 … Node n (Indexing Server, Zero Copy server, Disk)
  • 40. Real-time Transfer
      Diagram labels: Production Server (Kafka Appender), Kafka Broker, ZooKeeper, Indexing Servers, Search UI, Update consumed message offset
  • 41. Conclusion: Lessons Learned
      • Always find the sweet spots for
          – The number of indexer threads that can run in parallel
          – Randomize
              • Merge factor
              • Commit interval
              • ramBufferSize
          – Increasing the cache size helps bring down search latency
              • but with a Full GC penalty
      • An index size of more than 5 GB in one core does not work well for search
      • Searching across a lot of cores does not provide optimal response time
          – Overall query response time is limited by the slowest shard's performance
      • Solr scales both vertically and horizontally
      • Batching of log messages based on message size (~10 KB) in a MessageSet
          – Kafka adds 10 bytes to each message
          – Most of the time, log messages are < 100 bytes
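Size-based batching, as in the last lesson, can be sketched without any Kafka dependency. `SizeBasedBatcher` is a hypothetical helper; it only groups log lines by byte size and does not use the Kafka client API, and how each group is then wrapped for sending is left open.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that groups log lines into batches of roughly maxBatchBytes. */
final class SizeBasedBatcher {

    static List<List<String>> batch(List<String> messages, int maxBatchBytes) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int currentBytes = 0;
        for (String msg : messages) {
            int size = msg.getBytes(StandardCharsets.UTF_8).length;
            // Close the current batch if this message would push it over the limit.
            if (!current.isEmpty() && currentBytes + size > maxBatchBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            // A single message larger than the limit still gets a batch of its own.
            current.add(msg);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

With log lines mostly under 100 bytes, a limit of about 10 KB (`batch(lines, 10 * 1024)`) puts on the order of a hundred lines in each batch, so a fixed per-message cost such as the 10 bytes mentioned above is paid far less often.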