Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, WalmartLabs
1. Ubiquitous Solr - A Database’s not-so-evil Twin
Ayon Sinha
Data Foundation @WalmartLabs
October 13-16, 2016 • Austin, TX
2. 2
Figure labels: Text Search, Search Suggestions
Search Engine… Lucene… Solr
• Internet and Intranet Search
• Relevance
• Search Suggestions
• Faceting
• Recommendations
• Time series
• Log search
• Geo-spatial search
• Analytics
• Graph search
• Document Store
Figure labels: Recommendations, Relevance, Facets
3. 3
Overview
• How to scale any data infrastructure with Apache Solr
• Build a high performance and highly available data platform for
internal and external users alike
• Walmart’s commitment to open source
4. 4
About me
• Team lead on the Data Foundation team at the largest retailer and the
largest private employer in the world
• Prior to Walmart, worked at startups building recommendation and
analytics systems
• Before that, spent 6 years building search applications, recommendation
systems and Hadoop-based analytics systems at the largest online
auction company, eBay
• Have been a manuscript reviewer for Manning Publications for 4 years
and have helped shape the contents of “Hadoop in Practice” and “Big
Data”
5. 5
About Walmart
• 11,000+ Stores in 27 countries
• 11 eCommerce sites
• 250M customers weekly in stores and online
• Millions of database transactions per day
• Sales, Holidays and massive volume shifts
7. 7
Turns out to be a great idea!
Users seem to like the new product
8. 8
Users REALLY like this...
Higher volume, increased use cases. Quick-fix scaling
alternatives add some headroom … and complexity
9. 9
We need more Business Intelligence
Business is looking good, but the source-of-truth data store,
not so much …
10. 10
Scale up (in a hurry) with hardware
Least risk. Diminishing returns. What next?
11. 11
Design to scale out
• Offload queries to Search Engines
• Offload recurring reads to Cache
• Offload analytics to OLAP datastores
• Shard the database
… and of course do something to hide the complexity. It is
worth it.
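A minimal sketch of the first two offloading bullets, assuming a hypothetical `ReadRouter` facade (the class and its method names are illustrative, not from the talk). An in-memory dict stands in for the cache tier, and plain callables stand in for the database and search engine clients:

```python
from typing import Callable, Dict, List, Optional


class ReadRouter:
    """Hypothetical facade that keeps reads off the source-of-truth database."""

    def __init__(
        self,
        db_get: Callable[[str], Optional[dict]],
        search: Callable[[str], List[dict]],
    ) -> None:
        self._db_get = db_get              # primary-key lookup against the database
        self._search = search              # free-form query against the search engine
        self._cache: Dict[str, dict] = {}  # stand-in for a real cache tier

    def get(self, key: str) -> Optional[dict]:
        # Recurring reads: serve from the cache, hit the database only on a miss.
        if key in self._cache:
            return self._cache[key]
        row = self._db_get(key)
        if row is not None:
            self._cache[key] = row
        return row

    def find(self, query: str) -> List[dict]:
        # Ad hoc queries go to the search engine and never reach the database.
        return self._search(query)
```

Callers see only `get` and `find`; which backend answers is hidden behind the facade, which is one way to read "do something to hide the complexity".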
13. 13
The “not-so-evil” Twin to protect your Source of Truth DB
• What if a copy of your source-of-truth data is available … just about
anywhere you want it?
• How could you use a search engine to protect and augment your
database?
– Redirect queries
• Helps scale by reducing demand for
– database indexing
– database connections
– scarce database resources like memory, storage
• Not-so-evil Twin
– Adding multiple near real-time search copies adds complexity … and it
comes at a cost; but done right, the benefits far outweigh the costs
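To make "redirect queries" concrete, here is a rough sketch of turning a filtered lookup that would otherwise be a SQL `WHERE` clause into a request against Solr's `/select` handler. The `q`, `fq`, `rows` and `wt` parameters are standard Solr query parameters; the host, the `orders` collection and the field names are assumptions made up for illustration:

```python
from typing import List, Tuple
from urllib.parse import urlencode


def solr_params(filters: dict, rows: int = 10) -> List[Tuple[str, str]]:
    """Translate equality filters into Solr query parameters."""
    params = [("q", "*:*"), ("rows", str(rows)), ("wt", "json")]
    for field, value in sorted(filters.items()):
        # Quote each value as a phrase, escaping embedded backslashes and quotes.
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        params.append(("fq", f'{field}:"{escaped}"'))
    return params


def solr_select_url(base_url: str, collection: str, filters: dict, rows: int = 10) -> str:
    """Build the full /select URL for a collection (names are hypothetical)."""
    query = urlencode(solr_params(filters, rows))
    return f"{base_url.rstrip('/')}/{collection}/select?{query}"


# Example: the equivalent of
#   SELECT ... FROM orders WHERE status = 'shipped' AND store_id = 42
url = solr_select_url(
    "http://localhost:8983/solr", "orders", {"status": "shipped", "store_id": 42}
)
```

Each filter becomes its own `fq` clause, so the lookup is answered by the search index and uses no database connection or database index.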
14. 14
Our Approach
• Abstract the complexity of managing
– source-of-truth database
– cache coherence
– search queries
– message bus
• Abstract connection pool management
• Provide a scalable way to query across shards with full control of Solr
schema
• Analyze big data without affecting real-time systems, while keeping
individual data domains isolated
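The talk does not show how this abstraction is built. As one possible sketch of the write path, assume a hypothetical `PersistenceFacade` in which dicts stand in for the database and cache, and a plain callable stands in for the message-bus producer:

```python
from typing import Callable, Dict, Optional


class PersistenceFacade:
    """Hypothetical single entry point for writes and keyed reads."""

    def __init__(self, publish: Callable[[dict], None]) -> None:
        self._db: Dict[str, dict] = {}     # stand-in for the source-of-truth database
        self._cache: Dict[str, dict] = {}  # stand-in for the cache tier
        self._publish = publish            # stand-in for a message-bus producer

    def save(self, key: str, doc: dict) -> None:
        self._db[key] = dict(doc)          # 1. commit to the source of truth first
        self._cache.pop(key, None)         # 2. drop the stale cache entry
        # 3. announce the change so search indexers can catch up
        self._publish({"op": "upsert", "id": key, "doc": dict(doc)})

    def get(self, key: str) -> Optional[dict]:
        # Fill the cache from the database on a miss.
        if key not in self._cache and key in self._db:
            self._cache[key] = self._db[key]
        return self._cache.get(key)


class SearchIndexer:
    """Consumes change events and maintains the search-side copy."""

    def __init__(self) -> None:
        self.index: Dict[str, dict] = {}

    def handle(self, event: dict) -> None:
        if event["op"] == "upsert":
            self.index[event["id"]] = event["doc"]


indexer = SearchIndexer()
facade = PersistenceFacade(publish=indexer.handle)
facade.save("sku-1", {"name": "kettle"})
```

Here the publish call runs synchronously for simplicity. With a real message bus the indexer would consume events asynchronously, which is why the search copy is near real-time and not strictly consistent with the database.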
20. 20
Lessons learned
A search engine like Apache Solr is…
• not limited to search-based business applications.
• a first-class citizen in your persistence technology stack; it
complements the SoT database.
• easy to adopt and has all of us as a community for support.
21. 21
The Future
• Symbiotic existence of Solr/Lucene with RDBMS, NoSQL and Big
Data systems
• Walmart is committed to being part of the community building it
22. 22
Questions? Reach us at:
• You can reach me, Ayon Sinha, at:
– asinha@walmartlabs.com
– https://www.linkedin.com/in/ayonsinha
• Jason Sardina, our Lead Persistence Architect
– jsardina@walmartlabs.com
• @WalmartLabs is always hiring the best