Building a Cassandra-based application from scratch
Patrick McFadin
Cassandra Summit 2012
#cassandra12
This is me
• Chief Architect at Hobsons
  – Hobsons is an education services company. More here: www.hobsons.com
• Cassandra user since 0.7
• Follow me here: @PatrickMcFadin
Goals
•   Take a new concept
•   What’s the data model?!?!
•   Some sample code
•   You get homework! (If you want)
Here’s the plan
•   Conceptualize a new application
•   Identify the entity tables
•   Identify query tables
•   Code. Rinse. Repeat.
•   Deploy
•   …
•   Profit!
           * I’ll use the term “tables”, which is equivalent to “column families”
www.killrvideos.com

Start with a concept: Video Sharing Website

[Page mockup: video title, user name, description, the video itself (a cat saying “Meow”), rating, tags (Foo, Bar), comments, recommended videos, an “Upload New!” button, and Ads by Google]

*Cat drawing by goodrob13 on Flickr
Break down the features
•   Post a video*
•   View a video
•   Add a comment
•   Rate a video
•   Tag a video


     * Not talking about transcoding! Check out zencoder.com, it’s pretty sweet.
Create Entity Tables

  Basic storage unit
Users
Row key: Username | Columns: FirstName, LastName, password

•   Similar to an RDBMS table, with fairly fixed columns
•   Username is the row key, so it is unique
•   Use secondary indexes on firstname and lastname for lookup
•   Adding columns later with Cassandra is super easy




                              CREATE TABLE users (
                                username varchar PRIMARY KEY,
                                firstname varchar,
                                lastname varchar,
                                password varchar
                              );
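Because the username is the row key, a second write with the same username updates the existing row instead of creating a duplicate (Cassandra writes are upserts). A secondary index, created in CQL 3 with something like `CREATE INDEX ON users (firstname);`, lets you find rows by a non-key column. The sketch below models both behaviors in memory; `UsersTable` and its methods are hypothetical names, and the index is a plain map the class maintains itself, not Cassandra's implementation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of the "users" table plus a firstname index.
final class UsersTable {
    // Row key (username) -> column name -> value
    private final Map<String, Map<String, String>> rows = new HashMap<>();
    // firstname -> usernames: the mapping a secondary index on firstname maintains
    private final Map<String, Set<String>> firstnameIndex = new HashMap<>();

    // Upsert: writing an existing username overwrites that row instead of adding one.
    void upsert(String username, String firstname, String lastname, String password) {
        Map<String, String> old = rows.get(username);
        if (old != null) {
            // Drop the stale index entry before indexing the new firstname
            Set<String> previous = firstnameIndex.get(old.get("firstname"));
            if (previous != null) {
                previous.remove(username);
            }
        }
        Map<String, String> columns = new HashMap<>();
        columns.put("firstname", firstname);
        columns.put("lastname", lastname);
        columns.put("password", password);
        rows.put(username, columns);
        firstnameIndex.computeIfAbsent(firstname, k -> new HashSet<>()).add(username);
    }

    int rowCount() {
        return rows.size();
    }

    // Lookup by a non-key column through the index
    Set<String> findByFirstname(String firstname) {
        return firstnameIndex.getOrDefault(firstname, new HashSet<>());
    }
}
```

Writing `bob` twice with a new first name leaves one row for `bob` and moves him from the old first-name entry to the new one.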
Users: The set code
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.exceptions.HectorException;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

// stringSerializer is assumed to be a class field: StringSerializer.get()
static void setUser(User user, Keyspace keyspace) {

    // Create a mutator that allows you to talk to Cassandra
    Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);

    try {

       // Queue one insertion per column; the username is the row key
       mutator.addInsertion(user.getUsername(), "users",
          HFactory.createStringColumn("firstname", user.getFirstname()));
       mutator.addInsertion(user.getUsername(), "users",
          HFactory.createStringColumn("lastname", user.getLastname()));
       mutator.addInsertion(user.getUsername(), "users",
          HFactory.createStringColumn("password", user.getPassword()));

       // Once the mutator is ready, send all queued insertions to Cassandra in one batch
       mutator.execute();

    } catch (HectorException he) {
       he.printStackTrace();
    }
}




                                                            You can implement the get…
Videos
Row key: VideoId <UUID> | Columns: VideoName, UserName, Description, Tags

•   Use a UUID as the row key for uniqueness
•   Allows two videos to share the same name
•   Store tags in a delimited format (e.g. comma-separated)
•   A secondary index on username may not be the best plan; a dedicated lookup table works better


                        CREATE TABLE videos (
                          videoid uuid PRIMARY KEY,
                          videoname varchar,
                          username varchar,
                          description varchar,
                          tags varchar
                        );
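A sketch of the two choices on this slide, assuming the application generates the UUID row key and packs all tags into the single `tags` column. `VideoRowCodec` is a hypothetical helper; the comma delimiter matches the `split(",")` used in the get code, and the encoder rejects any tag that contains the delimiter so decoding stays unambiguous.

```java
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

// Hypothetical helper for the "videos" row: UUID row key plus a comma-delimited tags column.
final class VideoRowCodec {
    // A random (version 4) UUID keeps row keys unique even when two videos share a name.
    static UUID newVideoId() {
        return UUID.randomUUID();
    }

    // Pack tags into the single "tags" column; a tag containing the delimiter is rejected.
    static String encodeTags(List<String> tags) {
        for (String tag : tags) {
            if (tag.contains(",")) {
                throw new IllegalArgumentException("tag contains delimiter: " + tag);
            }
        }
        return String.join(",", tags);
    }

    // Unpack the column with the same split(",") the get code uses; empty or missing -> no tags.
    static List<String> decodeTags(String column) {
        if (column == null || column.isEmpty()) {
            return List.of();
        }
        return Arrays.asList(column.split(","));
    }
}
```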
Videos: The get code
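import java.util.UUID;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.ColumnSlice;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.SliceQuery;

// uuidSerializer and stringSerializer are assumed to be class fields
// (UUIDSerializer.get() and StringSerializer.get())
static Video getVideoByUUID(UUID videoId, Keyspace keyspace){

    Video video = new Video();

    // Create a slice query. We'll be getting specific column names
    SliceQuery<UUID, String, String> sliceQuery =
       HFactory.createSliceQuery(keyspace, uuidSerializer, stringSerializer, stringSerializer);

    sliceQuery.setColumnFamily("videos");
    sliceQuery.setKey(videoId);

    sliceQuery.setColumnNames("videoname","username","description","tags");

    // Execute the query and get the list of columns
    ColumnSlice<String,String> result = sliceQuery.execute().get();

    // No columns back means there is no such video
    if (result.getColumns().isEmpty()) {
       return null;
    }

    // Get each column by name and add them to our video object
    video.setVideoName(result.getColumnByName("videoname").getValue());
    video.setUsername(result.getColumnByName("username").getValue());
    video.setDescription(result.getColumnByName("description").getValue());

    // Tags are optional, so guard against a missing column
    HColumn<String,String> tags = result.getColumnByName("tags");
    if (tags != null) {
       video.setTags(tags.getValue().split(","));
    }

    return video;
}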


                                                                  You can implement the set…
Comments
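Row key: VideoId <UUID> | Columns: Username:<timestamp> .. Username:<timestamp>

•   Videos have many comments
•   Use composite columns to store user and time
•   The value of each column is the text of the comment
•   Columns are sorted by their composite name (username, then timestamp), not by insertion order; put the timestamp first if you need all comments in time order
•   Use a column slice to pull some or all of the comments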

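                                         CREATE TABLE comments (
                                           videoid uuid,
                                           username varchar,
                                           comment_ts timestamp,
                                           comment varchar,
                                           PRIMARY KEY (videoid, username, comment_ts)
                                         );*

                                         * Compound primary keys are only valid in CQL 3+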
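Cassandra keeps the columns of a row sorted by column name, so with a composite name the order is by the first component (username), then the second (timestamp). The sketch below models one comments row in memory with a `TreeMap` and a composite comparator; `CommentColumn` and `CommentsRow` are hypothetical names, and no Cassandra client is involved.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical composite column name: (username, timestamp).
final class CommentColumn {
    final String username;
    final long timestamp;

    CommentColumn(String username, long timestamp) {
        this.username = username;
        this.timestamp = timestamp;
    }

    // Compare by the first component, then the second, like a composite comparator.
    static final Comparator<CommentColumn> ORDER =
        Comparator.comparing((CommentColumn c) -> c.username).thenComparingLong(c -> c.timestamp);
}

// Hypothetical in-memory model of one "comments" row (one video).
final class CommentsRow {
    // Composite column name -> comment text, kept in column-name order
    private final NavigableMap<CommentColumn, String> columns = new TreeMap<>(CommentColumn.ORDER);

    void addComment(String username, long timestamp, String text) {
        columns.put(new CommentColumn(username, timestamp), text);
    }

    // Slice: one user's comments between two timestamps, inclusive
    NavigableMap<CommentColumn, String> sliceForUser(String username, long from, long to) {
        return columns.subMap(new CommentColumn(username, from), true,
                              new CommentColumn(username, to), true);
    }

    // Full slice: every comment, in column-name order
    NavigableMap<CommentColumn, String> all() {
        return columns;
    }
}
```

Inserting `bob@200`, `alice@300`, then `bob@100` reads back as `alice@300`, `bob@100`, `bob@200`: sorted by user first, regardless of insertion order.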
Rating a video
Row key: VideoId <UUID> | Columns: rating_count <counter>, rating_total <counter>

• Use counters so each rating is a single increment call, with no read-before-write
• rating_count is how many ratings were given
• rating_total is the sum of all ratings
• Ex: rating_count = 5, rating_total = 23, avg rating = 23/5 = 4.6


                                  CREATE TABLE video_rating (
                                    videoid uuid PRIMARY KEY,
                                    rating_count counter,
                                    rating_total counter);*

                                  * Only valid in CQL 3+
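Counters turn each rating into a pair of increments, and the average is derived at read time. The sketch below is an in-memory stand-in (a hypothetical `VideoRating` class) that reproduces the slide's example: five ratings summing to 23 give 23/5 = 4.6. In CQL 3 the matching write would be an increment such as `UPDATE video_rating SET rating_count = rating_count + 1, rating_total = rating_total + 4 WHERE videoid = ?;`.

```java
// Hypothetical in-memory stand-in for the two counter columns of one video_rating row.
final class VideoRating {
    private long ratingCount; // how many ratings were given
    private long ratingTotal; // sum of all ratings

    // One rating = one increment of each counter; nothing is read first.
    void rate(int stars) {
        ratingCount += 1;
        ratingTotal += stars;
    }

    long ratingCount() {
        return ratingCount;
    }

    long ratingTotal() {
        return ratingTotal;
    }

    // Average computed at read time; 0.0 when nobody has rated yet.
    double average() {
        return ratingCount == 0 ? 0.0 : (double) ratingTotal / ratingCount;
    }
}
```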
Video Event
Row key: VideoId:Username | Columns: start_<timestamp>, stop_<timestamp>, start_<timestamp> .. video_<timestamp> (time order)

•   Track viewing events
•   Combine the Video ID and Username into one row key, so each viewer of each video gets a unique row
•   The most recent stop time can be used to pick up where they left off
•   Great for usage analytics later




                                    CREATE TABLE video_event (
                                      videoid_username varchar,
                                      event_ts timestamp,
                                      event varchar,
                                      PRIMARY KEY (videoid_username, event_ts)
                                    );
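Because all events for one viewer of one video share a row and are kept in time order, resuming playback means reading the newest stop event. The sketch below models that in memory, assuming a `videoId + ":" + username` row key and event types stored as the strings "start" and "stop"; `VideoEvents` and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.UUID;

// Hypothetical in-memory model of the video_event table.
final class VideoEvents {
    // Row key combining video and viewer (assumed encoding of VideoId:Username)
    static String rowKey(UUID videoId, String username) {
        return videoId + ":" + username;
    }

    // Row key -> (event timestamp -> event type), each row kept in time order
    private final Map<String, NavigableMap<Long, String>> rows = new HashMap<>();

    void record(UUID videoId, String username, long timestamp, String eventType) {
        rows.computeIfAbsent(rowKey(videoId, username), k -> new TreeMap<>())
            .put(timestamp, eventType);
    }

    // Timestamp of the newest "stop" event for this viewer, or -1 if there is none
    long lastStop(UUID videoId, String username) {
        NavigableMap<Long, String> events = rows.get(rowKey(videoId, username));
        if (events == null) {
            return -1L;
        }
        for (Map.Entry<Long, String> e : events.descendingMap().entrySet()) {
            if (e.getValue().equals("stop")) {
                return e.getKey();
            }
        }
        return -1L;
    }
}
```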
Create Query Tables

Indexes to support fast lookups
Lookup Video By Username
Row key: Username | Columns: <timestamp>:VideoId .. <timestamp>:VideoId

•   Username is unique
•   One column for each new video uploaded
•   The timestamp comes first in the composite so a column slice can select a time span, from x to y
•   Add the VideoId here at the same time the Video record is added




                               CREATE TABLE username_video_index (
                                 username varchar,
                                 video_ts timestamp,
                                 videoid uuid,
                                 PRIMARY KEY (username, video_ts, videoid)
                               );
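A minimal in-memory sketch of this lookup table: one row per username, video IDs ordered by upload time, and a slice from x to y. `UsernameVideoIndex` is a hypothetical name; it groups video IDs by timestamp so two uploads in the same millisecond do not overwrite each other.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.UUID;

// Hypothetical in-memory model of username_video_index.
final class UsernameVideoIndex {
    // Username -> (upload timestamp -> video ids uploaded at that instant), in time order
    private final Map<String, NavigableMap<Long, List<UUID>>> rows = new HashMap<>();

    // Called alongside the write to the videos table
    void add(String username, long uploadedAt, UUID videoId) {
        rows.computeIfAbsent(username, k -> new TreeMap<>())
            .computeIfAbsent(uploadedAt, k -> new ArrayList<>())
            .add(videoId);
    }

    // Column slice from x to y (inclusive), oldest first
    List<UUID> videosBetween(String username, long from, long to) {
        List<UUID> out = new ArrayList<>();
        NavigableMap<Long, List<UUID>> row = rows.get(username);
        if (row == null) {
            return out;
        }
        for (List<UUID> ids : row.subMap(from, true, to, true).values()) {
            out.addAll(ids);
        }
        return out;
    }
}
```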
Videos by Tag
Row key: tag | Columns: VideoId .. VideoId

•   Each tag is a single row, no matter how many videos carry it
•   Great for “List videos with tag X”
•   Tags have to be updated in the Video table and the tag index at the same time
•   Index integrity is maintained in app logic



                                   CREATE TABLE tag_index (
                                     tag varchar,
                                     videoid uuid,
                                     PRIMARY KEY (tag, videoid)
                                   );
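Because index integrity lives in app logic, every tag change has to touch both the video's tags column and the tag index, including removing the video from tags it no longer carries. The sketch below assumes in-memory maps stand in for the `videos.tags` column and `tag_index`; `TagStore` and its methods are hypothetical. Against a real cluster these would be separate writes, so a failure between them could leave the index stale.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical in-memory model of videos.tags plus tag_index, kept in sync by the app.
final class TagStore {
    // videos table: videoid -> comma-delimited tags column
    private final Map<UUID, String> videoTags = new HashMap<>();
    // tag_index: tag -> video ids carrying that tag
    private final Map<String, Set<UUID>> tagIndex = new HashMap<>();

    void setTags(UUID videoId, List<String> tags) {
        // Remove index entries for tags the video used to have
        String old = videoTags.get(videoId);
        if (old != null && !old.isEmpty()) {
            for (String tag : old.split(",")) {
                Set<UUID> ids = tagIndex.get(tag);
                if (ids != null) {
                    ids.remove(videoId);
                    if (ids.isEmpty()) {
                        tagIndex.remove(tag);
                    }
                }
            }
        }
        // Write both "tables" together
        videoTags.put(videoId, String.join(",", tags));
        for (String tag : tags) {
            tagIndex.computeIfAbsent(tag, k -> new HashSet<>()).add(videoId);
        }
    }

    String tagsColumn(UUID videoId) {
        return videoTags.get(videoId);
    }

    // "List videos with tag X"
    Set<UUID> videosWithTag(String tag) {
        return tagIndex.getOrDefault(tag, Collections.emptySet());
    }
}
```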
Deployment strategies
• Measure your risk
  – Replication factor?
  – Multi-datacenter?
  – Cost?
• Performance
  – Today != tomorrow. Scale when needed
  – Have an expansion plan ready
Wrap up
• Similar data model process to RDBMS… to start
• One query -> one index table
• Don’t be afraid to write to multiple tables at once
• Bonus points: Hadoop and Solr!
Go play!
•   Go to: http://github.com/pmcfadin
•   Look for projects with cassandra12
•   Clone or fork my examples
•   Implement stubbed methods
•   Send me your solutions: pmcfadin@gmail.com
•   Follow me for updates: @PatrickMcFadin
Thank You!


Connect with me at @PatrickMcFadin
            Or LinkedIn
   Conference tag #cassandra12

Más contenido relacionado

Destacado

Scalable Web Architectures - Common Patterns & Approaches
Scalable Web Architectures - Common Patterns & ApproachesScalable Web Architectures - Common Patterns & Approaches
Scalable Web Architectures - Common Patterns & ApproachesCal Henderson
 
Owning time series with team apache Strata San Jose 2015
Owning time series with team apache   Strata San Jose 2015Owning time series with team apache   Strata San Jose 2015
Owning time series with team apache Strata San Jose 2015Patrick McFadin
 
Cassandra EU - Data model on fire
Cassandra EU - Data model on fireCassandra EU - Data model on fire
Cassandra EU - Data model on firePatrick McFadin
 
The world's next top data model
The world's next top data modelThe world's next top data model
The world's next top data modelPatrick McFadin
 
Cassandra 3.0 advanced preview
Cassandra 3.0 advanced previewCassandra 3.0 advanced preview
Cassandra 3.0 advanced previewPatrick McFadin
 
An Introduction to time series with Team Apache
An Introduction to time series with Team ApacheAn Introduction to time series with Team Apache
An Introduction to time series with Team ApachePatrick McFadin
 
居住正義論壇II主題三(地政局)
居住正義論壇II主題三(地政局)居住正義論壇II主題三(地政局)
居住正義論壇II主題三(地政局)leembtoleem
 
Servlet 3.1 Async I/O
Servlet 3.1 Async I/OServlet 3.1 Async I/O
Servlet 3.1 Async I/OSimone Bordet
 
Open source or proprietary, choose wisely!
Open source or proprietary,  choose wisely!Open source or proprietary,  choose wisely!
Open source or proprietary, choose wisely!Patrick McFadin
 
Inbound Marketing for Startups in 2011
Inbound Marketing for Startups in 2011Inbound Marketing for Startups in 2011
Inbound Marketing for Startups in 2011Rand Fishkin
 
Cassandra Virtual Node talk
Cassandra Virtual Node talkCassandra Virtual Node talk
Cassandra Virtual Node talkPatrick McFadin
 

Destacado (15)

Cassandra at scale
Cassandra at scaleCassandra at scale
Cassandra at scale
 
Scalable Web Architectures - Common Patterns & Approaches
Scalable Web Architectures - Common Patterns & ApproachesScalable Web Architectures - Common Patterns & Approaches
Scalable Web Architectures - Common Patterns & Approaches
 
Owning time series with team apache Strata San Jose 2015
Owning time series with team apache   Strata San Jose 2015Owning time series with team apache   Strata San Jose 2015
Owning time series with team apache Strata San Jose 2015
 
Cassandra EU - Data model on fire
Cassandra EU - Data model on fireCassandra EU - Data model on fire
Cassandra EU - Data model on fire
 
The world's next top data model
The world's next top data modelThe world's next top data model
The world's next top data model
 
Cassandra 3.0 advanced preview
Cassandra 3.0 advanced previewCassandra 3.0 advanced preview
Cassandra 3.0 advanced preview
 
An Introduction to time series with Team Apache
An Introduction to time series with Team ApacheAn Introduction to time series with Team Apache
An Introduction to time series with Team Apache
 
Big Data at Riot Games
Big Data at Riot GamesBig Data at Riot Games
Big Data at Riot Games
 
居住正義論壇II主題三(地政局)
居住正義論壇II主題三(地政局)居住正義論壇II主題三(地政局)
居住正義論壇II主題三(地政局)
 
Servlet 3.1 Async I/O
Servlet 3.1 Async I/OServlet 3.1 Async I/O
Servlet 3.1 Async I/O
 
Open source or proprietary, choose wisely!
Open source or proprietary,  choose wisely!Open source or proprietary,  choose wisely!
Open source or proprietary, choose wisely!
 
Become a super modeler
Become a super modelerBecome a super modeler
Become a super modeler
 
Karel čapek
Karel čapekKarel čapek
Karel čapek
 
Inbound Marketing for Startups in 2011
Inbound Marketing for Startups in 2011Inbound Marketing for Startups in 2011
Inbound Marketing for Startups in 2011
 
Cassandra Virtual Node talk
Cassandra Virtual Node talkCassandra Virtual Node talk
Cassandra Virtual Node talk
 

Más de Patrick McFadin

Successful Architectures for Fast Data
Successful Architectures for Fast DataSuccessful Architectures for Fast Data
Successful Architectures for Fast DataPatrick McFadin
 
Laying down the smack on your data pipelines
Laying down the smack on your data pipelinesLaying down the smack on your data pipelines
Laying down the smack on your data pipelinesPatrick McFadin
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraPatrick McFadin
 
Storing time series data with Apache Cassandra
Storing time series data with Apache CassandraStoring time series data with Apache Cassandra
Storing time series data with Apache CassandraPatrick McFadin
 
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax EnterpriseA Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax EnterprisePatrick McFadin
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandraPatrick McFadin
 
Introduction to data modeling with apache cassandra
Introduction to data modeling with apache cassandraIntroduction to data modeling with apache cassandra
Introduction to data modeling with apache cassandraPatrick McFadin
 
Apache cassandra and spark. you got the the lighter, let's start the fire
Apache cassandra and spark. you got the the lighter, let's start the fireApache cassandra and spark. you got the the lighter, let's start the fire
Apache cassandra and spark. you got the the lighter, let's start the firePatrick McFadin
 
Nike Tech Talk: Double Down on Apache Cassandra and Spark
Nike Tech Talk:  Double Down on Apache Cassandra and SparkNike Tech Talk:  Double Down on Apache Cassandra and Spark
Nike Tech Talk: Double Down on Apache Cassandra and SparkPatrick McFadin
 
Apache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataApache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataPatrick McFadin
 
Real data models of silicon valley
Real data models of silicon valleyReal data models of silicon valley
Real data models of silicon valleyPatrick McFadin
 
Introduction to cassandra 2014
Introduction to cassandra 2014Introduction to cassandra 2014
Introduction to cassandra 2014Patrick McFadin
 
Making money with open source and not losing your soul: A practical guide
Making money with open source and not losing your soul: A practical guideMaking money with open source and not losing your soul: A practical guide
Making money with open source and not losing your soul: A practical guidePatrick McFadin
 
Time series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionTime series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionPatrick McFadin
 
Time series with apache cassandra strata
Time series with apache cassandra   strataTime series with apache cassandra   strata
Time series with apache cassandra strataPatrick McFadin
 
Cassandra 2.0 and timeseries
Cassandra 2.0 and timeseriesCassandra 2.0 and timeseries
Cassandra 2.0 and timeseriesPatrick McFadin
 
The data model is dead, long live the data model
The data model is dead, long live the data modelThe data model is dead, long live the data model
The data model is dead, long live the data modelPatrick McFadin
 

Más de Patrick McFadin (17)

Successful Architectures for Fast Data
Successful Architectures for Fast DataSuccessful Architectures for Fast Data
Successful Architectures for Fast Data
 
Laying down the smack on your data pipelines
Laying down the smack on your data pipelinesLaying down the smack on your data pipelines
Laying down the smack on your data pipelines
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and Cassandra
 
Storing time series data with Apache Cassandra
Storing time series data with Apache CassandraStoring time series data with Apache Cassandra
Storing time series data with Apache Cassandra
 
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax EnterpriseA Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandra
 
Introduction to data modeling with apache cassandra
Introduction to data modeling with apache cassandraIntroduction to data modeling with apache cassandra
Introduction to data modeling with apache cassandra
 
Apache cassandra and spark. you got the the lighter, let's start the fire
Apache cassandra and spark. you got the the lighter, let's start the fireApache cassandra and spark. you got the the lighter, let's start the fire
Apache cassandra and spark. you got the the lighter, let's start the fire
 
Nike Tech Talk: Double Down on Apache Cassandra and Spark
Nike Tech Talk:  Double Down on Apache Cassandra and SparkNike Tech Talk:  Double Down on Apache Cassandra and Spark
Nike Tech Talk: Double Down on Apache Cassandra and Spark
 
Apache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataApache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series data
 
Real data models of silicon valley
Real data models of silicon valleyReal data models of silicon valley
Real data models of silicon valley
 
Introduction to cassandra 2014
Introduction to cassandra 2014Introduction to cassandra 2014
Introduction to cassandra 2014
 
Making money with open source and not losing your soul: A practical guide
Making money with open source and not losing your soul: A practical guideMaking money with open source and not losing your soul: A practical guide
Making money with open source and not losing your soul: A practical guide
 
Time series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionTime series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long version
 
Time series with apache cassandra strata
Time series with apache cassandra   strataTime series with apache cassandra   strata
Time series with apache cassandra strata
 
Cassandra 2.0 and timeseries
Cassandra 2.0 and timeseriesCassandra 2.0 and timeseries
Cassandra 2.0 and timeseries
 
The data model is dead, long live the data model
The data model is dead, long live the data modelThe data model is dead, long live the data model
The data model is dead, long live the data model
 

Último

Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...itnewsafrica
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...amber724300
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Jeffrey Haguewood
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfAarwolf Industries LLC
 

Último (20)

Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdf
 

Cassandra Summit 2012 - Building a Cassandra Based App From Scratch

  • 1. Building a Cassandra based application from scratch Patrick McFadin Cassandra Summit 2012 #cassandra12
  • 2. This is me • Chief Architect at Hobsons – Hobsons is an education services company. More here: www.hobsons.com • Cassandra user since .7 • Follow me here: @PatrickMcFadin
  • 3. Goals • Take a new concept • What’s the data model?!?! • Some sample code • You get homework! (If you want)
  • 4. Here’s the plan • Conceptualize a new application • Identify the entity tables • Identify query tables • Code. Rinse. Repeat. • Deploy • … • Profit! * I’ll be using the term Tables which is equivalent to Column Families
  • 5. www.killrvideos.com Video Tit le User name Recommended D ipt ion escr Start with a Meow Ads concept by Google Video Sharing Website Rat ing: Tags: Foo Bar Upload New! Comment s *Cat drawing by goodrob13 on Flickr
  • 6. Break down the features • Post a video* • View a video • Add a comment • Rate a video • Tag a video * Not talking about transcoding! Check out zencoder.com, it’s pretty sweet.
  • 7. Create Entity Tables Basic storage unit
  • 8. Users password FirstName LastName Username • Similar to a RDBMS table. Fairly fixed columns • Username is unique • Use secondary indexes on firstname and lastname for lookup • Adding columns with Cassandra is super easy CREATE TABLE users ( username varchar PRIMARY KEY, firstname varchar, lastname varchar, password varchar );
  • 9. Users: The set code static void setUser(User user, Keyspace keyspace) { // Create a mutator that allows you to talk to casssandra Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer); try { // Use the mutator to insert data into our table mutator.addInsertion(user.getUsername(), "users", HFactory.createStringColumn("firstname", user.getFirstname())); mutator.addInsertion(user.getUsername(), "users”, HFactory.createStringColumn("lastname", user.getLastname())); mutator.addInsertion(user.getUsername(), "users", HFactory.createStringColumn("password", user.getPassword())); // Once the mutator is ready, execute on cassandra mutator.execute(); } catch (HectorException he) { he.printStackTrace(); } } You can implement the get…
  • 10. Videos
      Row key: VideoId <UUID>
      Columns: VideoName, UserName, Description, Tags
      • Use a UUID as a row key for uniqueness
      • Allows for same video names
      • Tags should be stored in some sort of delimited format
      • Index on username may not be the best plan
      CREATE TABLE videos (
          videoid uuid PRIMARY KEY,
          videoname varchar,
          username varchar,
          description varchar,
          tags varchar
      );
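Slide 10 calls for a UUID row key and a delimited tags column. The following Java sketch is a minimal illustration under two assumptions: the key is a random (version 4) UUID from java.util.UUID, and the delimiter is a comma, matching the split(",") in the getVideoByUUID code on slide 11. VideoTagsSketch and its methods are hypothetical names.

```java
import java.util.*;

// Hypothetical helpers for the videos table: a random UUID as the row key,
// and tags packed into one comma-delimited varchar column (assumed delimiter).
class VideoTagsSketch {
    // Every upload gets its own key, so two videos may share the same name
    static UUID newVideoId() {
        return UUID.randomUUID();
    }

    // Join tags into the single "tags" column value, e.g. "Foo,Bar"
    static String encodeTags(List<String> tags) {
        for (String tag : tags) {
            if (tag.contains(",")) {
                throw new IllegalArgumentException("tag may not contain the delimiter: " + tag);
            }
        }
        return String.join(",", tags);
    }

    // Reverse of encodeTags; an empty column means "no tags"
    static List<String> decodeTags(String column) {
        if (column == null || column.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(column.split(","));
    }
}
```

The empty-column guard matters because "".split(",") returns a one-element array containing an empty string, not an empty array. Rejecting tags that contain the delimiter keeps the encoding reversible.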
  • 11. Videos: The get code
      static Video getVideoByUUID(UUID videoId, Keyspace keyspace) {
          Video video = new Video();
          // Create a slice query. We'll be getting specific column names
          SliceQuery<UUID, String, String> sliceQuery =
                  HFactory.createSliceQuery(keyspace, uuidSerializer, stringSerializer, stringSerializer);
          sliceQuery.setColumnFamily("videos");
          sliceQuery.setKey(videoId);
          sliceQuery.setColumnNames("videoname", "username", "description", "tags");
          // Execute the query and get the list of columns
          ColumnSlice<String, String> result = sliceQuery.execute().get();
          // Get each column by name and add them to our video object
          video.setVideoName(result.getColumnByName("videoname").getValue());
          video.setUsername(result.getColumnByName("username").getValue());
          video.setDescription(result.getColumnByName("description").getValue());
          video.setTags(result.getColumnByName("tags").getValue().split(","));
          return video;
      }
      You can implement the set…
  • 12. Comments
      Row key: VideoId <UUID>
      Columns: Username:<timestamp> .. Username:<timestamp>
      • Videos have many comments
      • Use composite columns to store user and time
      • The value of each column is the text of the comment
      • Columns are sorted by their composite name: username first, then timestamp
      • Use getSlice() to pull some or all of the comments
      CREATE TABLE comments (
          videoid uuid PRIMARY KEY,
          comment varchar
      );
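Cassandra keeps the columns of a row sorted by their comparator, so with a (username, timestamp) composite the comments are grouped by user and time-ordered within each user. The Java sketch below models one comments row with a TreeMap to show that ordering and what a column slice returns. CommentsRowSketch, addComment, slice and all are hypothetical names; this is an in-memory illustration, not the Hector slice API.

```java
import java.util.*;

// Hypothetical in-memory model of one row of the comments table.
class CommentsRowSketch {
    // Composite column name: (username, timestamp)
    static final class CommentKey {
        final String username;
        final long timestamp;

        CommentKey(String username, long timestamp) {
            this.username = username;
            this.timestamp = timestamp;
        }
    }

    // Compare username first, then timestamp, like a composite comparator
    private static final Comparator<CommentKey> COMPOSITE_ORDER =
        Comparator.comparing((CommentKey k) -> k.username).thenComparingLong(k -> k.timestamp);

    // One row: composite column name -> comment text, kept in sorted order
    private final NavigableMap<CommentKey, String> columns = new TreeMap<>(COMPOSITE_ORDER);

    void addComment(String username, long timestamp, String text) {
        columns.put(new CommentKey(username, timestamp), text);
    }

    // Slice of one user's comments with from <= timestamp <= to, oldest first
    List<String> slice(String username, long from, long to) {
        return new ArrayList<>(columns.subMap(
            new CommentKey(username, from), true,
            new CommentKey(username, to), true).values());
    }

    // The whole row, in composite order
    List<String> all() {
        return new ArrayList<>(columns.values());
    }
}
```

Because username is the first component, a slice bounded by one user's (username, from) and (username, to) is a contiguous range. Listing every user's comments strictly by time would need the timestamp to be the first component instead.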
  • 13. Rating a video
      Row key: VideoId <UUID>
      Columns: rating_count <counter>, rating_total <counter>
      • Use counters so each rating is a single update call
      • rating_count is how many ratings were given
      • rating_total is the sum of all ratings
      • Example: rating_count = 5, rating_total = 23, average rating = 23/5 = 4.6
      CREATE TABLE video_rating (
          videoid uuid PRIMARY KEY,
          rating_count counter,
          rating_total counter
      );*
      * Only valid in CQL 3+
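The arithmetic on slide 13 can be sketched in Java as two counters that are only ever incremented, with the average computed at read time. In Cassandra each increment would be a counter update (in CQL 3, something like UPDATE video_rating SET rating_count = rating_count + 1, rating_total = rating_total + 4 WHERE videoid = ...), not a read-modify-write. VideoRatingSketch is a hypothetical in-memory stand-in.

```java
// Hypothetical in-memory stand-in for one video_rating row.
class VideoRatingSketch {
    private long ratingCount; // how many ratings were given
    private long ratingTotal; // sum of all ratings

    // One rating = one increment of each counter
    void rate(int stars) {
        ratingCount += 1;
        ratingTotal += stars;
    }

    long ratingCount() { return ratingCount; }

    long ratingTotal() { return ratingTotal; }

    // Average is derived when reading; 0.0 if nobody has rated yet
    double average() {
        return ratingCount == 0 ? 0.0 : (double) ratingTotal / ratingCount;
    }
}
```

Rating a video 5, 5, 5, 4 and 4 leaves rating_count = 5 and rating_total = 23, giving 23/5 = 4.6 as in the slide's example. Storing the sum rather than the average is what makes each rating a pure increment.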
  • 14. Video Event
      Row key: VideoId:Username
      Columns (time order): start_<timestamp> stop_<timestamp> start_<timestamp> .. video_<timestamp>
      • Track viewing events
      • Combine Video ID and Username for a unique row
      • Stop time can be used to pick up where they left off
      • Great for usage analytics later
      CREATE TABLE video_event (
          videoid_username varchar PRIMARY KEY,
          event varchar
      );
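One way to read "pick up where they left off" from slide 14 is shown in the Java sketch below. It builds the videoid:username row key, writes start_<timestamp> and stop_<timestamp> columns, and returns the position stored with the latest stop event. The slide does not say what the column values hold, so this sketch assumes each value is the playback position in seconds. VideoEventSketch and its methods are hypothetical.

```java
import java.util.*;

// Hypothetical in-memory model of the video_event table.
// Assumption: each column value is the playback position in seconds.
class VideoEventSketch {
    // Row key "videoid:username" -> (column name -> playback position)
    private final Map<String, SortedMap<String, Integer>> rows = new HashMap<>();

    static String rowKey(UUID videoId, String username) {
        return videoId + ":" + username;
    }

    // type is "start" or "stop"; the column name becomes e.g. "stop_1060"
    void record(UUID videoId, String username, String type, long timestamp, int positionSeconds) {
        rows.computeIfAbsent(rowKey(videoId, username), k -> new TreeMap<>())
            .put(type + "_" + timestamp, positionSeconds);
    }

    // Position saved with the most recent stop_ column, or 0 if none exists
    int resumePosition(UUID videoId, String username) {
        SortedMap<String, Integer> columns = rows.get(rowKey(videoId, username));
        if (columns == null) {
            return 0;
        }
        long latest = -1;
        int position = 0;
        for (Map.Entry<String, Integer> e : columns.entrySet()) {
            if (e.getKey().startsWith("stop_")) {
                // Parse the timestamp rather than trusting string order of names
                long ts = Long.parseLong(e.getKey().substring("stop_".length()));
                if (ts > latest) {
                    latest = ts;
                    position = e.getValue();
                }
            }
        }
        return position;
    }
}
```

The timestamp is parsed out of the column name because, under a plain string comparator, every start_ column sorts before every stop_ column regardless of time.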
  • 15. Create Query Tables Indexes to support fast lookups
  • 16. Lookup Video By Username
      Row key: Username
      Columns: VideoId:<timestamp> .. VideoId:<timestamp>
      • Username is unique
      • One column for each new video uploaded
      • Column slice for a time span: from x to y
      • The VideoId column is added at the same time the Video record is added
      CREATE TABLE username_video_index (
          username varchar PRIMARY KEY,
          videoid_timestamp varchar
      );
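The "column slice for a time span" on slide 16 can be sketched in Java with one sorted map per username. Cassandra sorts columns by name, so a time span is one contiguous slice only when the timestamp sorts first. This sketch therefore assumes a time-first ordering (upload timestamp, then videoid) rather than the VideoId:<timestamp> name shown on the slide. UserVideoIndexSketch and its methods are hypothetical.

```java
import java.util.*;

// Hypothetical in-memory model of username_video_index.
// Assumption: columns sort by upload timestamp first, so a time span is contiguous.
class UserVideoIndexSketch {
    // Row key (username) -> (upload timestamp -> videoids uploaded at that time)
    private final Map<String, NavigableMap<Long, List<UUID>>> rows = new HashMap<>();

    // Called by the application when it writes the videos row
    void addVideo(String username, long uploadedAt, UUID videoId) {
        rows.computeIfAbsent(username, k -> new TreeMap<>())
            .computeIfAbsent(uploadedAt, k -> new ArrayList<>())
            .add(videoId);
    }

    // Column slice "from x to y": videos uploaded in [from, to], oldest first
    List<UUID> videosBetween(String username, long from, long to) {
        List<UUID> result = new ArrayList<>();
        NavigableMap<Long, List<UUID>> row = rows.get(username);
        if (row == null) {
            return result;
        }
        for (List<UUID> ids : row.subMap(from, true, to, true).values()) {
            result.addAll(ids);
        }
        return result;
    }
}
```

With this ordering, "videos uploaded by pmcfadin last week" becomes a single range read over one row instead of a scan of every column.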
  • 17. Videos by Tag
      Row key: tag
      Columns: VideoId .. VideoId
      • Tag is unique regardless of video
      • Great for “List videos with X tag”
      • Tags have to be updated in Video and Tag at the same time
      • Index integrity is maintained in app logic
      CREATE TABLE tag_index (
          tag varchar PRIMARY KEY,
          videoid varchar
      );
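Slide 17 says the tag index is kept consistent by application logic. The Java sketch below shows that logic in memory. Whenever a video's tags change, it updates the video's own tags and adds or removes the matching tag_index entries in the same call. To keep it short, this sketch stores tags as a set rather than the delimited string used on slide 10. TagIndexSketch and its methods are hypothetical names.

```java
import java.util.*;

// Hypothetical in-memory model: the videos row's tags and tag_index
// are written together by the application, which owns their consistency.
class TagIndexSketch {
    // videos table: videoid -> tags (a set here, instead of a delimited string)
    private final Map<UUID, Set<String>> videoTags = new HashMap<>();
    // tag_index table: tag -> videoids carrying that tag
    private final Map<String, Set<UUID>> tagIndex = new HashMap<>();

    void setTags(UUID videoId, Collection<String> tags) {
        Set<String> oldTags = videoTags.getOrDefault(videoId, Collections.emptySet());
        Set<String> newTags = new HashSet<>(tags);
        // Remove index entries for tags the video no longer has
        for (String tag : oldTags) {
            if (!newTags.contains(tag)) {
                Set<UUID> ids = tagIndex.get(tag);
                ids.remove(videoId);
                if (ids.isEmpty()) {
                    tagIndex.remove(tag);
                }
            }
        }
        // Add index entries for every current tag
        for (String tag : newTags) {
            tagIndex.computeIfAbsent(tag, k -> new HashSet<>()).add(videoId);
        }
        videoTags.put(videoId, newTags);
    }

    // "List videos with X tag"
    Set<UUID> videosWithTag(String tag) {
        return new HashSet<>(tagIndex.getOrDefault(tag, Collections.emptySet()));
    }
}
```

Against a real cluster these would be writes to two tables, so a failure between them can leave the index stale. That is the trade-off behind "index integrity is maintained in app logic".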
  • 18. Deployment strategies
      • Measure your risk
          – Replication factor?
          – Multi-datacenter?
          – Cost?
      • Performance
          – Today != tomorrow. Scale when needed
          – Have an expansion plan ready
  • 19. Wrap up
      • Similar data model process to RDBMS… to start
      • Query -> Index table
      • Don’t be afraid to write in multiple tables at once
      • Bonus points: Hadoop and Solr!
  • 20. Go play!
      • Go to: http://github.com/pmcfadin
      • Look for projects with cassandra12
      • Clone or fork my examples
      • Implement stubbed methods
      • Send me your solutions: pmcfadin@gmail.com
      • Follow me for updates: @PatrickMcFadin
  • 21. Thank You!
      Connect with me at @PatrickMcFadin or on LinkedIn
      Conference tag: #cassandra12

Speaker notes

  1. I’ll be using the term Tables instead of Column Family during this presentation.
  2. Composite columns. Two different types
  3. Example shows use of a counter