SlideShare una empresa de Scribd logo
1 de 37
Descargar para leer sin conexión
Scaling at Showyou
      John Muellerleile (@jrecursive, john@remixation.com)
      September 26, 2011




Tuesday, September 27, 2011
John Muellerleile

      • Basho Technologies: Jan. 2008 - Dec. 2010
            • Riak
            • Riak Search
            • Automated research, NLP, spidering
      • Consulting: 2003 - 2008
            • E-commerce, AdSense, AdWords, ...
      • Infrastructure:
                  • Messaging
                  • Riak
                  • Solr



Tuesday, September 27, 2011
Agenda

      • Who am I?


      • What is Showyou?


      • The Nature of “Social Data”


      • Showyou’s Data: Today & Tomorrow


      • Data Management: Technology Stack


      • Riak: The Awesome & Sub-Awesome, Integration Patterns & Observations


      • Not Bob’s Riak: The “Mecha” Backend & Query System


Tuesday, September 27, 2011
What is Showyou?




Tuesday, September 27, 2011
A minute about the nature
                  of “Social Data”



Tuesday, September 27, 2011
Tuesday, September 27, 2011
Tuesday, September 27, 2011
Tuesday, September 27, 2011
Tuesday, September 27, 2011
Tuesday, September 27, 2011
Hi!




Tuesday, September 27, 2011
Showyou’s Data: Today

      • Riak with Bitcask


      • Solr with replication


      • Often ever-growing blobs of JSON


      • No useful way to “find” data based on anything other than compound primary
        keys :(


            • Primary keys crafted for specific access patterns :(


      • This will not last forever




Tuesday, September 27, 2011
Showyou’s Data: Tomorrow

      • Significant data growth per additional user


      • Find & aggregate data about our users & their videos


      • Derive useful “signal” from this data


            • Better search: disambiguation, “more like this”, performance


            • “Collective Intelligence”: trending, “smart collections”


            • Spam & De-duplication


            • & more: #hashtags, auto-complete, statistics, usage, ...


Tuesday, September 27, 2011
Data Management: Technology Stack

      • Erlang/OTP


            • Riak, Bitcask


      • Java


            • JInterface


            • Hazelcast


            • Solr




Tuesday, September 27, 2011
Riak: The Awesome Parts

      • By far the best operational story in its class


      • Shared-nothing


      • No single point of failure


      • Masterless multi-site replication


      • Support via EnterpriseDS Startup Program


      • I helped write it & turned it inside-out to design Riak Search while working at
        Basho -- this helps




Tuesday, September 27, 2011
Riak: The Sub-Awesome Parts

      • Riak Search as it exists today does not perform well for us & lacks features


      • Bitcask keeps all keys in memory


      • Listing keys for a bucket will take your cluster down


      • Map/Reduce is virtually useless (for us) other than as “multi-get”


      • Pre-1.0 cluster membership changes are “at your peril”


      • Usable/useful built-in monitoring is non-existent


      • If you are not intimately familiar with Riak, it’s very hard to debug!


Tuesday, September 27, 2011
Riak: Integration Patterns

      • api_riak_node: “main” Riak cluster node


      • sidecar: post-commit talks to local
        Java-based Erlang node using JInterface


      • fabric: distributed data structures &
        utilities via Hazelcast - very similar in
        spirit & implementation as Riak!


      • indexer: pull from fabric queue &
        index record in Solr


      • Identical deployment on every node



Tuesday, September 27, 2011
Riak: Integration Patterns: The Big Picture

      • backend_node: nginx, redis; logs


      • spider: finds, extracts,
        stores & indexes further
        information on users & videos


      • log_indexer: aggregate & index
        interesting parts of our access logs


      • research_riak_node: a special-purpose
        Riak cluster node to support Showyou
        data “Tomorrow”




Tuesday, September 27, 2011
Riak: Integration Patterns: Observations

      • Riak post-commit hooks


      • Why not pre-commit hooks?


      • Riak as a “virtual memory”


      • Post-commit hooks as “change events”


      • Wishlist: pre- & post-delete hook (I realize this is tricky - do it anyway)


      • Wishlist: pre- & post-create hook (Less tricky - do it anyway)




Tuesday, September 27, 2011
Riak: Integration Patterns: Search

      • Monolithic replicated Solr won’t last forever & sharding is a faceted multi-
        value “shitshow” [1]


      • We’re doing fine, for now, with lots of RAM, SSDs, etc. but...


      • I don’t want to find out “the hard way” where the joyride ends and the hellride
        starts [2]


      • Clearly the answer is to kill every bird in nearby airspace by writing a
        Riak storage backend and integrated query mechanism! [2,3,4]

          [1] Cliff Moon suggested the use of this word (for emphasis)
          [2] It’s okay, I have done this several dozen times
          [3] Yes, really: BDB, BDBJE, Innostore, sqlite, hsql, mysql, postgres, etc.
          [4] This is not a pride thing: it was a hard, lonely, unforgiving road




Tuesday, September 27, 2011
Not Bob’s Riak: A Moment of Weakness




                              “My way of joking is to tell the truth;
                               it’s the funniest joke in the world”
                                      George Bernard Shaw




Tuesday, September 27, 2011
Not Bob’s Riak: Introducing “Mecha”

      • Bob?




Tuesday, September 27, 2011
Mecha: Goals

      • Birds to kill:


            • Tight, purposed Riak integration


            • Efficient & feature-complete indexing


            • Fast sequential & range object access


            • Flexible distributed query mechanism


            • Query parallelism where appropriate




Tuesday, September 27, 2011
Mecha: What Stays The Same

      • Works with “stock” (unmodified) Riak 1.0 (pre & release)


      • Little/no difference from the “Riak side” -- everything works “as it should”


      • Differences:


            • All objects you “put” into Riak must be JSON objects (this will change by
              release to respect content type)


            • Any fields without a specific “indexed field type” (e.g., _t, _s, _dt, etc.) are
              simply stored along with the rest of the fields (i.e. “stored field”)




Tuesday, September 27, 2011
Mecha: At a glance

      • Written in Java, uses many, many third-party libraries:
        LevelDB (JNI), Solr, JInterface, Jetlang, Netty, Protobufs, Commons, ...


      • Riak backend module written in Erlang; beaten into submission for reliable
        interaction with JInterface-driven Java node


      • LevelDB instance per partition, per bucket; “ulimit -n 90000” :)


      • One Solr instance per node (covers all partitions, buckets)


      • Objects stored as Riak Objects in JSON form in LevelDB


      • Standardized schema covering common data types using name suffixes



Tuesday, September 27, 2011
Mecha: Riak Integration




Tuesday, September 27, 2011
Mecha: Index Field Types

      • Supported index field types (by suffix):


      • _t, _tt - full-text (optionally w/ term vectors, ...)


      • _s, _s_mv - exact string, multi-value exact string


      • _i, _l, _d, _f - trie-based integer, long, double, float


      • _dt - trie-based date (YYYY-MM-DDTHH:MM:SS)


      • _b - boolean


      • _xy, _ll, _geo - point, lat/lon & geohash


Tuesday, September 27, 2011
Mecha: Example Object

          { “content_t”: “NoSQL is a ghetto”,
            “lol_count_i”: 47,
            “lol_dt”: “2009-04-13T07:01:43.000Z”
            “rating_f”: 4.111111164093018,
            “tags_s_mv”: [
              “funny”,
              “lol”,
              “nosql”,
              “ghetto”]}


Tuesday, September 27, 2011
Mecha: Fast Sequential & Range Object Access

      • Every bucket gets own instance of LevelDB per partition


      • No “multiplexing” buckets or partitions per LevelDB instance


      • Keys are literal, no encoded Erlang terms; simple ranges, smaller values


      • Values stored as JSON-ized Riak objects (why? you’ll see)


      • LevelDB JNI binaries shipped with Snappy compression built-in




Tuesday, September 27, 2011
Mecha: Flexible Distributed Querying

      • Exact, prefix, suffix, & wildcard filtering on multiple index fields


      • Ultra-fast list_keys, list_bucket, list_buckets replacements


      • Equally fast bucket count operation


      • Ridiculous range query performance (any trie- type; sane datetime
        functions)


      • Faceting, group counts


      • Spatial (bounding box/bowl, geofilt, Haversine, distance faceting)




Tuesday, September 27, 2011
Mecha: Query Parallelism




Tuesday, September 27, 2011
Mecha: Coverage

      • Coverage is the set of nodes and respectively owned partitions you must
        process to cover all objects in a given bucket.

                                         Node 1




                                                              Node 2
              (for n=2)



Tuesday, September 27, 2011
Mecha: Examples: Count & Faceting

      • Count the number of records in the "research" bucket modified within the last
        10 seconds




      • Top search terms by count for the last hour




Tuesday, September 27, 2011
Mecha: Examples: Multi-value String Fields

      • See which field values occur with other values, and how often (using a multi-
        value string field)




Tuesday, September 27, 2011
Mecha: Next Steps

      • Code cleanup, test & polish current functionality


      • Sane build & deployment (yay, Maven)


      • Simplify configuration


      • Embed Solr (currently running standalone for debugging)


      • Extend & improve Solr “standard configuration”


      • “Out of Band” Map/reduce with “direct bucket-level” LevelDB instance


      • After that? Join operators, ... :)


Tuesday, September 27, 2011
Mecha: Availability

      • Right now it is a complex system


      • It will be worth waiting for tighter integration & polish


      • This is me not answering your question :)


      • As soon as responsible, sane, & possible :)




Tuesday, September 27, 2011
Thank you!

      • Basho Technologies, especially Andy Gross (@argv0) & Kelly McLaughlin
        (@_klm) specifically for help with Riak 1.0 changes


      • Of course, without Erlang/OTP, LevelDB, Solr, Java, JInterface, and a host of
        other open source projects, this would have never even gotten started --
        thank you.


      • Last but not least, thank you to my Showyou teammates for encouragement
        & support!


      • Contact:
        John Muellerleile / http://twitter.com/jrecursive
        john@remixation.com, jmuellerleile@gmail.com
        http://github.com/jrecursive



Tuesday, September 27, 2011

Más contenido relacionado

La actualidad más candente

Change data capture with MongoDB and Kafka.
Change data capture with MongoDB and Kafka.Change data capture with MongoDB and Kafka.
Change data capture with MongoDB and Kafka.Dan Harvey
 
Taboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache SparkTaboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache Sparktsliwowicz
 
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan VolzArchiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan VolzDatabricks
 
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...rhatr
 
Scala and Spark are Ideal for Big Data
Scala and Spark are Ideal for Big DataScala and Spark are Ideal for Big Data
Scala and Spark are Ideal for Big DataJohn Nestor
 
Becoming Protocol-Agnostic with Kafka, REST, GraphQL & gRPC | Tyler Mills, Sm...
Becoming Protocol-Agnostic with Kafka, REST, GraphQL & gRPC | Tyler Mills, Sm...Becoming Protocol-Agnostic with Kafka, REST, GraphQL & gRPC | Tyler Mills, Sm...
Becoming Protocol-Agnostic with Kafka, REST, GraphQL & gRPC | Tyler Mills, Sm...HostedbyConfluent
 
Bootstrap SaaS startup using Open Source Tools
Bootstrap SaaS startup using Open Source ToolsBootstrap SaaS startup using Open Source Tools
Bootstrap SaaS startup using Open Source Toolsbotsplash.com
 
Change Data Capture using Kafka
Change Data Capture using KafkaChange Data Capture using Kafka
Change Data Capture using KafkaAkash Vacher
 
How Pulsar Stores Your Data - Pulsar Summit NA 2021
How Pulsar Stores Your Data - Pulsar Summit NA 2021How Pulsar Stores Your Data - Pulsar Summit NA 2021
How Pulsar Stores Your Data - Pulsar Summit NA 2021StreamNative
 
Big Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and ZeppelinBig Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and Zeppelinprajods
 
DEVNET-1106 Upcoming Services in OpenStack
DEVNET-1106	Upcoming Services in OpenStackDEVNET-1106	Upcoming Services in OpenStack
DEVNET-1106 Upcoming Services in OpenStackCisco DevNet
 
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use casesApache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use casesShivji Kumar Jha
 
Making Scala Faster: 3 Expert Tips For Busy Development Teams
Making Scala Faster: 3 Expert Tips For Busy Development TeamsMaking Scala Faster: 3 Expert Tips For Busy Development Teams
Making Scala Faster: 3 Expert Tips For Busy Development TeamsLightbend
 
Efficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out DatabasesEfficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out DatabasesJen Aman
 
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML ConferenceDB Tsai
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?confluent
 
Introduction to CosmosDB - Azure Bootcamp 2018
Introduction to CosmosDB - Azure Bootcamp 2018Introduction to CosmosDB - Azure Bootcamp 2018
Introduction to CosmosDB - Azure Bootcamp 2018Josh Carlisle
 
Tech Spark Presentation
Tech Spark PresentationTech Spark Presentation
Tech Spark PresentationStephen Borg
 
Spark Internals Training | Apache Spark | Spark | Anika Technologies
Spark Internals Training | Apache Spark | Spark | Anika TechnologiesSpark Internals Training | Apache Spark | Spark | Anika Technologies
Spark Internals Training | Apache Spark | Spark | Anika TechnologiesAnand Narayanan
 

La actualidad más candente (20)

Change data capture with MongoDB and Kafka.
Change data capture with MongoDB and Kafka.Change data capture with MongoDB and Kafka.
Change data capture with MongoDB and Kafka.
 
Apache HBase Workshop
Apache HBase WorkshopApache HBase Workshop
Apache HBase Workshop
 
Taboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache SparkTaboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache Spark
 
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan VolzArchiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
 
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
 
Scala and Spark are Ideal for Big Data
Scala and Spark are Ideal for Big DataScala and Spark are Ideal for Big Data
Scala and Spark are Ideal for Big Data
 
Becoming Protocol-Agnostic with Kafka, REST, GraphQL & gRPC | Tyler Mills, Sm...
Becoming Protocol-Agnostic with Kafka, REST, GraphQL & gRPC | Tyler Mills, Sm...Becoming Protocol-Agnostic with Kafka, REST, GraphQL & gRPC | Tyler Mills, Sm...
Becoming Protocol-Agnostic with Kafka, REST, GraphQL & gRPC | Tyler Mills, Sm...
 
Bootstrap SaaS startup using Open Source Tools
Bootstrap SaaS startup using Open Source ToolsBootstrap SaaS startup using Open Source Tools
Bootstrap SaaS startup using Open Source Tools
 
Change Data Capture using Kafka
Change Data Capture using KafkaChange Data Capture using Kafka
Change Data Capture using Kafka
 
How Pulsar Stores Your Data - Pulsar Summit NA 2021
How Pulsar Stores Your Data - Pulsar Summit NA 2021How Pulsar Stores Your Data - Pulsar Summit NA 2021
How Pulsar Stores Your Data - Pulsar Summit NA 2021
 
Big Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and ZeppelinBig Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and Zeppelin
 
DEVNET-1106 Upcoming Services in OpenStack
DEVNET-1106	Upcoming Services in OpenStackDEVNET-1106	Upcoming Services in OpenStack
DEVNET-1106 Upcoming Services in OpenStack
 
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use casesApache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
 
Making Scala Faster: 3 Expert Tips For Busy Development Teams
Making Scala Faster: 3 Expert Tips For Busy Development TeamsMaking Scala Faster: 3 Expert Tips For Busy Development Teams
Making Scala Faster: 3 Expert Tips For Busy Development Teams
 
Efficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out DatabasesEfficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out Databases
 
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
 
Introduction to CosmosDB - Azure Bootcamp 2018
Introduction to CosmosDB - Azure Bootcamp 2018Introduction to CosmosDB - Azure Bootcamp 2018
Introduction to CosmosDB - Azure Bootcamp 2018
 
Tech Spark Presentation
Tech Spark PresentationTech Spark Presentation
Tech Spark Presentation
 
Spark Internals Training | Apache Spark | Spark | Anika Technologies
Spark Internals Training | Apache Spark | Spark | Anika TechnologiesSpark Internals Training | Apache Spark | Spark | Anika Technologies
Spark Internals Training | Apache Spark | Spark | Anika Technologies
 

Destacado

Redis : Play buzz uses Redis
Redis : Play buzz uses RedisRedis : Play buzz uses Redis
Redis : Play buzz uses RedisRedis Labs
 
Modelling Data as Graphs (Neo4j)
Modelling Data as Graphs (Neo4j)Modelling Data as Graphs (Neo4j)
Modelling Data as Graphs (Neo4j)Michal Bachman
 
An intro to Neo4j and some use cases (JFokus 2011)
An intro to Neo4j and some use cases (JFokus 2011)An intro to Neo4j and some use cases (JFokus 2011)
An intro to Neo4j and some use cases (JFokus 2011)Emil Eifrem
 
Riak Use Cases : Dissecting The Solutions To Hard Problems
Riak Use Cases : Dissecting The Solutions To Hard ProblemsRiak Use Cases : Dissecting The Solutions To Hard Problems
Riak Use Cases : Dissecting The Solutions To Hard ProblemsAndy Gross
 
Recommendations with Neo4j (FOSDEM 2015)
Recommendations with Neo4j (FOSDEM 2015)Recommendations with Neo4j (FOSDEM 2015)
Recommendations with Neo4j (FOSDEM 2015)Michal Bachman
 
Advanced Neo4j Use Cases with the GraphAware Framework
Advanced Neo4j Use Cases with the GraphAware FrameworkAdvanced Neo4j Use Cases with the GraphAware Framework
Advanced Neo4j Use Cases with the GraphAware FrameworkMichal Bachman
 
Modelling Data in Neo4j (plus a few tips)
Modelling Data in Neo4j (plus a few tips)Modelling Data in Neo4j (plus a few tips)
Modelling Data in Neo4j (plus a few tips)Michal Bachman
 
Best Buy Web 2.0
Best Buy Web 2.0Best Buy Web 2.0
Best Buy Web 2.0Lee Aase
 
Introduction to Redis Data Structures: Sorted Sets
Introduction to Redis Data Structures: Sorted SetsIntroduction to Redis Data Structures: Sorted Sets
Introduction to Redis Data Structures: Sorted SetsScaleGrid.io
 
Redis and its many use cases
Redis and its many use casesRedis and its many use cases
Redis and its many use casesChristian Joudrey
 
Introduction to some top Redis use cases
Introduction to some top Redis use casesIntroduction to some top Redis use cases
Introduction to some top Redis use casesJosiah Carlson
 
The BestBuy.com Cloud Architecture
The BestBuy.com Cloud ArchitectureThe BestBuy.com Cloud Architecture
The BestBuy.com Cloud Architecturejoelcrabb
 
Using Redis at Facebook
Using Redis at FacebookUsing Redis at Facebook
Using Redis at FacebookRedis Labs
 
Bootstrapping Recommendations with Neo4j
Bootstrapping Recommendations with Neo4jBootstrapping Recommendations with Neo4j
Bootstrapping Recommendations with Neo4jMax De Marzi
 
Neo4j GraphTalks - Einführung in Graphdatenbanken
Neo4j GraphTalks - Einführung in GraphdatenbankenNeo4j GraphTalks - Einführung in Graphdatenbanken
Neo4j GraphTalks - Einführung in GraphdatenbankenNeo4j
 
Redis Use Patterns (DevconTLV June 2014)
Redis Use Patterns (DevconTLV June 2014)Redis Use Patterns (DevconTLV June 2014)
Redis Use Patterns (DevconTLV June 2014)Itamar Haber
 
Neo4j the Anti Crime Database
Neo4j the Anti Crime DatabaseNeo4j the Anti Crime Database
Neo4j the Anti Crime DatabaseNeo4j
 
Neo4j GraphTalks Panama Papers
Neo4j GraphTalks Panama PapersNeo4j GraphTalks Panama Papers
Neo4j GraphTalks Panama PapersNeo4j
 
Intro to Neo4j and Graph Databases
Intro to Neo4j and Graph DatabasesIntro to Neo4j and Graph Databases
Intro to Neo4j and Graph DatabasesNeo4j
 

Destacado (20)

Redis : Play buzz uses Redis
Redis : Play buzz uses RedisRedis : Play buzz uses Redis
Redis : Play buzz uses Redis
 
Modelling Data as Graphs (Neo4j)
Modelling Data as Graphs (Neo4j)Modelling Data as Graphs (Neo4j)
Modelling Data as Graphs (Neo4j)
 
An intro to Neo4j and some use cases (JFokus 2011)
An intro to Neo4j and some use cases (JFokus 2011)An intro to Neo4j and some use cases (JFokus 2011)
An intro to Neo4j and some use cases (JFokus 2011)
 
Riak Use Cases : Dissecting The Solutions To Hard Problems
Riak Use Cases : Dissecting The Solutions To Hard ProblemsRiak Use Cases : Dissecting The Solutions To Hard Problems
Riak Use Cases : Dissecting The Solutions To Hard Problems
 
Recommendations with Neo4j (FOSDEM 2015)
Recommendations with Neo4j (FOSDEM 2015)Recommendations with Neo4j (FOSDEM 2015)
Recommendations with Neo4j (FOSDEM 2015)
 
Advanced Neo4j Use Cases with the GraphAware Framework
Advanced Neo4j Use Cases with the GraphAware FrameworkAdvanced Neo4j Use Cases with the GraphAware Framework
Advanced Neo4j Use Cases with the GraphAware Framework
 
Redis use cases
Redis use casesRedis use cases
Redis use cases
 
Modelling Data in Neo4j (plus a few tips)
Modelling Data in Neo4j (plus a few tips)Modelling Data in Neo4j (plus a few tips)
Modelling Data in Neo4j (plus a few tips)
 
Best Buy Web 2.0
Best Buy Web 2.0Best Buy Web 2.0
Best Buy Web 2.0
 
Introduction to Redis Data Structures: Sorted Sets
Introduction to Redis Data Structures: Sorted SetsIntroduction to Redis Data Structures: Sorted Sets
Introduction to Redis Data Structures: Sorted Sets
 
Redis and its many use cases
Redis and its many use casesRedis and its many use cases
Redis and its many use cases
 
Introduction to some top Redis use cases
Introduction to some top Redis use casesIntroduction to some top Redis use cases
Introduction to some top Redis use cases
 
The BestBuy.com Cloud Architecture
The BestBuy.com Cloud ArchitectureThe BestBuy.com Cloud Architecture
The BestBuy.com Cloud Architecture
 
Using Redis at Facebook
Using Redis at FacebookUsing Redis at Facebook
Using Redis at Facebook
 
Bootstrapping Recommendations with Neo4j
Bootstrapping Recommendations with Neo4jBootstrapping Recommendations with Neo4j
Bootstrapping Recommendations with Neo4j
 
Neo4j GraphTalks - Einführung in Graphdatenbanken
Neo4j GraphTalks - Einführung in GraphdatenbankenNeo4j GraphTalks - Einführung in Graphdatenbanken
Neo4j GraphTalks - Einführung in Graphdatenbanken
 
Redis Use Patterns (DevconTLV June 2014)
Redis Use Patterns (DevconTLV June 2014)Redis Use Patterns (DevconTLV June 2014)
Redis Use Patterns (DevconTLV June 2014)
 
Neo4j the Anti Crime Database
Neo4j the Anti Crime DatabaseNeo4j the Anti Crime Database
Neo4j the Anti Crime Database
 
Neo4j GraphTalks Panama Papers
Neo4j GraphTalks Panama PapersNeo4j GraphTalks Panama Papers
Neo4j GraphTalks Panama Papers
 
Intro to Neo4j and Graph Databases
Intro to Neo4j and Graph DatabasesIntro to Neo4j and Graph Databases
Intro to Neo4j and Graph Databases
 

Similar a Scaling at Showyou: Data Management and Riak Integration

Sebastian Cohnen – Building a Startup with NoSQL - NoSQL matters Barcelona 2014
Sebastian Cohnen – Building a Startup with NoSQL - NoSQL matters Barcelona 2014Sebastian Cohnen – Building a Startup with NoSQL - NoSQL matters Barcelona 2014
Sebastian Cohnen – Building a Startup with NoSQL - NoSQL matters Barcelona 2014NoSQLmatters
 
JCR - Java Content Repositories
JCR - Java Content RepositoriesJCR - Java Content Repositories
JCR - Java Content RepositoriesCarsten Ziegeler
 
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?DATAVERSITY
 
Eurydike: Schemaless Object Relational SQL Mapper
Eurydike: Schemaless Object Relational SQL MapperEurydike: Schemaless Object Relational SQL Mapper
Eurydike: Schemaless Object Relational SQL MapperESUG
 
Data Modeling for NoSQL
Data Modeling for NoSQLData Modeling for NoSQL
Data Modeling for NoSQLTony Tam
 
Introducing Hibernate OGM: porting JPA applications to NoSQL, Sanne Grinovero...
Introducing Hibernate OGM: porting JPA applications to NoSQL, Sanne Grinovero...Introducing Hibernate OGM: porting JPA applications to NoSQL, Sanne Grinovero...
Introducing Hibernate OGM: porting JPA applications to NoSQL, Sanne Grinovero...OpenBlend society
 
NoSQL, Apache SOLR and Apache Hadoop
NoSQL, Apache SOLR and Apache HadoopNoSQL, Apache SOLR and Apache Hadoop
NoSQL, Apache SOLR and Apache HadoopDmitry Kan
 
DSpace Under the Hood
DSpace Under the HoodDSpace Under the Hood
DSpace Under the HoodDuraSpace
 
SeaJUG May 2012 mybatis
SeaJUG May 2012 mybatisSeaJUG May 2012 mybatis
SeaJUG May 2012 mybatisWill Iverson
 
SDEC2011 NoSQL concepts and models
SDEC2011 NoSQL concepts and modelsSDEC2011 NoSQL concepts and models
SDEC2011 NoSQL concepts and modelsKorea Sdec
 
Oracle Analytics Security Everything you always wanted to know
Oracle Analytics Security Everything you always wanted to knowOracle Analytics Security Everything you always wanted to know
Oracle Analytics Security Everything you always wanted to knowChristian Berg
 
Inside Wordnik's Architecture
Inside Wordnik's ArchitectureInside Wordnik's Architecture
Inside Wordnik's ArchitectureTony Tam
 
Combine Spring Data Neo4j and Spring Boot to quickl
Combine Spring Data Neo4j and Spring Boot to quicklCombine Spring Data Neo4j and Spring Boot to quickl
Combine Spring Data Neo4j and Spring Boot to quicklNeo4j
 
Riak seattle-meetup-august
Riak seattle-meetup-augustRiak seattle-meetup-august
Riak seattle-meetup-augustpharkmillups
 
The Why and How of Scala at Twitter
The Why and How of Scala at TwitterThe Why and How of Scala at Twitter
The Why and How of Scala at TwitterAlex Payne
 
OSGi, Scripting and REST, Building Webapps With Apache Sling
OSGi, Scripting and REST, Building Webapps With Apache SlingOSGi, Scripting and REST, Building Webapps With Apache Sling
OSGi, Scripting and REST, Building Webapps With Apache SlingCarsten Ziegeler
 
Apache Cayenne: a Java ORM Alternative
Apache Cayenne: a Java ORM AlternativeApache Cayenne: a Java ORM Alternative
Apache Cayenne: a Java ORM AlternativeAndrus Adamchik
 
Oct meetup open stack 101 clean
Oct meetup open stack 101   cleanOct meetup open stack 101   clean
Oct meetup open stack 101 cleanbenrodrigue
 
How to Build Deep Learning Models
How to Build Deep Learning ModelsHow to Build Deep Learning Models
How to Build Deep Learning ModelsJosh Patterson
 

Similar a Scaling at Showyou: Data Management and Riak Integration (20)

Sebastian Cohnen – Building a Startup with NoSQL - NoSQL matters Barcelona 2014
Sebastian Cohnen – Building a Startup with NoSQL - NoSQL matters Barcelona 2014Sebastian Cohnen – Building a Startup with NoSQL - NoSQL matters Barcelona 2014
Sebastian Cohnen – Building a Startup with NoSQL - NoSQL matters Barcelona 2014
 
JCR - Java Content Repositories
JCR - Java Content RepositoriesJCR - Java Content Repositories
JCR - Java Content Repositories
 
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
 
Eurydike: Schemaless Object Relational SQL Mapper
Eurydike: Schemaless Object Relational SQL MapperEurydike: Schemaless Object Relational SQL Mapper
Eurydike: Schemaless Object Relational SQL Mapper
 
Data Modeling for NoSQL
Data Modeling for NoSQLData Modeling for NoSQL
Data Modeling for NoSQL
 
Introducing Hibernate OGM: porting JPA applications to NoSQL, Sanne Grinovero...
Introducing Hibernate OGM: porting JPA applications to NoSQL, Sanne Grinovero...Introducing Hibernate OGM: porting JPA applications to NoSQL, Sanne Grinovero...
Introducing Hibernate OGM: porting JPA applications to NoSQL, Sanne Grinovero...
 
NoSQL, Apache SOLR and Apache Hadoop
NoSQL, Apache SOLR and Apache HadoopNoSQL, Apache SOLR and Apache Hadoop
NoSQL, Apache SOLR and Apache Hadoop
 
DSpace Under the Hood
DSpace Under the HoodDSpace Under the Hood
DSpace Under the Hood
 
SeaJUG May 2012 mybatis
SeaJUG May 2012 mybatisSeaJUG May 2012 mybatis
SeaJUG May 2012 mybatis
 
SDEC2011 NoSQL concepts and models
SDEC2011 NoSQL concepts and modelsSDEC2011 NoSQL concepts and models
SDEC2011 NoSQL concepts and models
 
Oracle Analytics Security Everything you always wanted to know
Oracle Analytics Security Everything you always wanted to knowOracle Analytics Security Everything you always wanted to know
Oracle Analytics Security Everything you always wanted to know
 
Inside Wordnik's Architecture
Inside Wordnik's ArchitectureInside Wordnik's Architecture
Inside Wordnik's Architecture
 
Combine Spring Data Neo4j and Spring Boot to quickl
Combine Spring Data Neo4j and Spring Boot to quicklCombine Spring Data Neo4j and Spring Boot to quickl
Combine Spring Data Neo4j and Spring Boot to quickl
 
Riak seattle-meetup-august
Riak seattle-meetup-augustRiak seattle-meetup-august
Riak seattle-meetup-august
 
The Why and How of Scala at Twitter
The Why and How of Scala at TwitterThe Why and How of Scala at Twitter
The Why and How of Scala at Twitter
 
OSGi, Scripting and REST, Building Webapps With Apache Sling
OSGi, Scripting and REST, Building Webapps With Apache SlingOSGi, Scripting and REST, Building Webapps With Apache Sling
OSGi, Scripting and REST, Building Webapps With Apache Sling
 
Apache Cayenne: a Java ORM Alternative
Apache Cayenne: a Java ORM AlternativeApache Cayenne: a Java ORM Alternative
Apache Cayenne: a Java ORM Alternative
 
Oct meetup open stack 101 clean
Oct meetup open stack 101   cleanOct meetup open stack 101   clean
Oct meetup open stack 101 clean
 
How to Build Deep Learning Models
How to Build Deep Learning ModelsHow to Build Deep Learning Models
How to Build Deep Learning Models
 
NOsql Presentation.pdf
NOsql Presentation.pdfNOsql Presentation.pdf
NOsql Presentation.pdf
 

Último

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 

Último (20)

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 

Scaling at Showyou: Data Management and Riak Integration

  • 1. Scaling at Showyou John Muellerleile (@jrecursive, john@remixation.com) September 26, 2011 Tuesday, September 27, 2011
  • 2. John Muellerleile • Basho Technologies: Jan. 2008 - Dec. 2010 • Riak • Riak Search • Automated research, NLP, spidering • Consulting: 2003 - 2008 • E-commerce, AdSense, AdWords, ... • Infrastructure: • Messaging • Riak • Solr Tuesday, September 27, 2011
  • 3. Agenda • Who am I? • What is Showyou? • The Nature of “Social Data” • Showyou’s Data: Today & Tomorrow • Data Management: Technology Stack • Riak: The Awesome & Sub-Awesome, Integration Patterns & Observations • Not Bob’s Riak: The “Mecha” Backend & Query System Tuesday, September 27, 2011
  • 4. What is Showyou? Tuesday, September 27, 2011
  • 5. A minute about the nature of “Social Data” Tuesday, September 27, 2011
  • 12. Showyou’s Data: Today • Riak with Bitcask • Solr with replication • Often ever-growing blobs of JSON • No useful way to “find” data based on anything other than compound primary keys :( • Primary keys crafted for specific access patterns :( • This will not last forever Tuesday, September 27, 2011
  • 13. Showyou’s Data: Tomorrow • Significant data growth per additional user • Find & aggregate data about our users & their videos • Derive useful “signal” from this data • Better search: disambiguation, “more like this”, performance • “Collective Intelligence”: trending, “smart collections” • Spam & De-duplication • & more: #hashtags, auto-complete, statistics, usage, ... Tuesday, September 27, 2011
  • 14. Data Management: Technology Stack • Erlang/OTP • Riak, Bitcask • Java • JInterface • Hazelcast • Solr Tuesday, September 27, 2011
  • 15. Riak: The Awesome Parts • By far the best operational story in its class • Shared-nothing • No single point of failure • Masterless multi-site replication • Support via EnterpriseDS Startup Program • I helped write it & turned it inside-out to design Riak Search while working at Basho -- this helps Tuesday, September 27, 2011
  • 16. Riak: The Sub-Awesome Parts • Riak Search as it exists today does not perform well for us & lacks features • Bitcask keeps all keys in memory • Listing keys for a bucket will take your cluster down • Map/Reduce is virtually useless (for us) other than as “multi-get” • Pre-1.0 cluster membership changes are “at your peril” • Usable/useful built-in monitoring is non-existent • If you are not intimately familiar with Riak, it’s very hard to debug! Tuesday, September 27, 2011
  • 17. Riak: Integration Patterns • api_riak_node: “main” Riak cluster node • sidecar: post-commit talks to local Java-based Erlang node using JInterface • fabric: distributed data structures & utilities via Hazelcast - very similar in spirit & implementation as Riak! • indexer: pull from fabric queue & index record in Solr • Identical deployment on every node Tuesday, September 27, 2011
  • 18. Riak: Integration Patterns: The Big Picture • backend_node: nginx, redis; logs • spider: finds, extracts, stores & indexes further information on users & videos • log_indexer: aggregate & index interesting parts of our access logs • research_riak_node: a special-purpose Riak cluster node to support Showyou data “Tomorrow” Tuesday, September 27, 2011
  • 19. Riak: Integration Patterns: Observations • Riak post-commit hooks • Why not pre-commit hooks? • Riak as a “virtual memory” • Post-commit hooks as “change events” • Wishlist: pre- & post-delete hook (I realize this is tricky - do it anyway) • Wishlist: pre- & post-create hook (Less tricky - do it anyway) Tuesday, September 27, 2011
  • 20. Riak: Integration Patterns: Search • Monolithic replicated Solr won’t last forever & sharding is a faceted multi- value “shitshow” [1] • We’re doing fine, for now, with lots of RAM, SSDs, etc. but... • I don’t want to find out “the hard way” where the joyride ends and the hellride starts [2] • Clearly the answer is to kill every bird in nearby airspace by writing a Riak storage backend and integrated query mechanism! [2,3,4] [1] Cliff Moon suggested the use of this word (for emphasis) [2] It’s okay, I have done this several dozen times [3] Yes, really: BDB, BDBJE, Innostore, sqlite, hsql, mysql, postgres, etc. [4] This is not a pride thing: it was a hard, lonely, unforgiving road Tuesday, September 27, 2011
  • 21. Not Bob’s Riak: A Moment of Weakness “My way of joking is to tell the truth; it’s the funniest joke in the world” George Bernard Shaw Tuesday, September 27, 2011
  • 22. Not Bob’s Riak: Introducing “Mecha” • Bob? Tuesday, September 27, 2011
  • 23. Mecha: Goals • Birds to kill: • Tight, purposed Riak integration • Efficient & feature-complete indexing • Fast sequential & range object access • Flexible distributed query mechanism • Query parallelism where appropriate Tuesday, September 27, 2011
  • 24. Mecha: What Stays The Same • Works with “stock” (unmodified) Riak 1.0 (pre & release) • Little/no difference from the “Riak side” -- everything works “as it should” • Differences: • All objects you “put” into Riak must be JSON objects (this will change by release to respect content type) • Any fields without a specific “indexed field type” (e.g., _t, _s, _dt, etc.) are simply stored along with the rest of the fields (i.e. “stored field”) Tuesday, September 27, 2011
  • 25. Mecha: At a glance • Written in Java, uses many, many third-party libraries: LevelDB (JNI), Solr, JInterface, Jetlang, Netty, Protobufs, Commons, ... • Riak backend module written in Erlang; beaten into submission for reliable interaction with JInterface-driven Java node • LevelDB instance per partition, per bucket; “ulimit -n 90000” :) • One Solr instance per node (covers all partitions, buckets) • Objects stored as Riak Objects in JSON form in LevelDB • Standardized schema covering common data types using name suffixes Tuesday, September 27, 2011
  • 26. Mecha: Riak Integration Tuesday, September 27, 2011
  • 27. Mecha: Index Field Types • Supported index field types (by suffix): • _t, _tt - full-text (optionally w/ term vectors, ...) • _s, _s_mv - exact string, multi-value exact string • _i, _l, _d, _f - trie-based integer, long, double, float • _dt - trie-based date (YYYY-MM-DDTHH:MM:SS) • _b - boolean • _xy, _ll, _geo - point, lat/lon & geohash Tuesday, September 27, 2011
  • 28. Mecha: Example Object { “content_t”: “NoSQL is a ghetto”, “lol_count_i”: 47, “lol_dt”: “2009-04-13T07:01:43.000Z” “rating_f”: 4.111111164093018, “tags_s_mv”: [ “funny”, “lol”, “nosql”, “ghetto”]} Tuesday, September 27, 2011
  • 29. Mecha: Fast Sequential & Range Object Access • Every bucket gets own instance of LevelDB per partition • No “multiplexing” buckets or partitions per LevelDB instance • Keys are literal, no encoded Erlang terms; simple ranges, smaller values • Values stored as JSON-ized Riak objects (why? you’ll see) • LevelDB JNI binaries shipped with Snappy compression built-in Tuesday, September 27, 2011
  • 30. Mecha: Flexible Distributed Querying • Exact, prefix, suffix, & wildcard filtering on multiple index fields • Ultra-fast list_keys, list_bucket, list_buckets replacements • Equally fast bucket count operation • Ridiculous range query performance (any trie- type; sane datetime functions) • Faceting, group counts • Spatial (bounding box/bowl, geofilt, Haversine, distance faceting) Tuesday, September 27, 2011
  • 31. Mecha: Query Parallelism Tuesday, September 27, 2011
  • 32. Mecha: Coverage • Coverage is the set of nodes and respectively owned partitions you must process to cover all objects in a given bucket. Node 1 Node 2 (for n=2) Tuesday, September 27, 2011
  • 33. Mecha: Examples: Count & Faceting • Count the number of records in the "research" bucket modified within the last 10 seconds • Top search terms by count for the last hour Tuesday, September 27, 2011
  • 34. Mecha: Examples: Multi-value String Fields • See which field values occur with other values, and how often (using a multi- value string field) Tuesday, September 27, 2011
  • 35. Mecha: Next Steps • Code cleanup, test & polish current functionality • Sane build & deployment (yay, Maven) • Simplify configuration • Embed Solr (currently running standalone for debugging) • Extend & improve Solr “standard configuration” • “Out of Band” Map/reduce with “direct bucket-level” LevelDB instance • After that? Join operators, ... :) Tuesday, September 27, 2011
  • 36. Mecha: Availability • Right now it is a complex system • It will be worth waiting for tighter integration & polish • This is me not answering your question :) • As soon as responsible, sane, & possible :) Tuesday, September 27, 2011
  • 37. Thank you! • Basho Technologies, especially Andy Gross (@argv0) & Kelly McLaughlin (@_klm) specifically for help with Riak 1.0 changes • Of course, without Erlang/OTP, LevelDB, Solr, Java, JInterface, and a host of other open source projects, this would have never even gotten started -- thank you. • Last but not least, thank you to my Showyou teammates for encouragement & support! • Contact: John Muellerleile / http://twitter.com/jrecursive john@remixation.com, jmuellerleile@gmail.com http://github.com/jrecursive Tuesday, September 27, 2011