Building a Cassandra based
 application from scratch
        Patrick McFadin
    Cassandra Summit 2012
         #cassandra12
This is me
• Chief Architect at Hobsons
  – Hobsons is an education services company. More here: www.hobsons.com
• Cassandra user since 0.7
• Follow me here: @PatrickMcFadin
Goals
•   Take a new concept
•   What’s the data model?!?!
•   Some sample code
•   You get homework! (If you want)
Here’s the plan
•   Conceptualize a new application
•   Identify the entity tables
•   Identify query tables
•   Code. Rinse. Repeat.
•   Deploy
•   …
•   Profit!
           * I’ll be using the term Tables which is equivalent to Column Families
www.killrvideos.com

Start with a concept: Video Sharing Website

[Page mockup: video title, user name, description, recommended videos, rating, tags (Foo, Bar), comments, an "Upload New!" button, and ads by Google. The video shown is a cat ("Meow").]

*Cat drawing by goodrob13 on Flickr
Break down the features
•   Post a video*
•   View a video
•   Add a comment
•   Rate a video
•   Tag a video


     * Not talking about transcoding! Check out zencoder.com, it’s pretty sweet.
Create Entity Tables

  Basic storage unit
Users
  Row key: Username
  Columns: password, FirstName, LastName

•   Similar to an RDBMS table. Fairly fixed columns
•   Username is unique
•   Use secondary indexes on firstname and lastname for lookup
•   Adding columns with Cassandra is super easy




                              CREATE TABLE users (
                                username varchar PRIMARY KEY,
                                firstname varchar,
                                lastname varchar,
                                password varchar
                              );
Users: The set code
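import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.exceptions.HectorException;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

static void setUser(User user, Keyspace keyspace) {

    // Create a mutator that allows you to talk to Cassandra.
    // stringSerializer is assumed to be a shared StringSerializer.get() instance.
    Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);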

    try {

       // Use the mutator to insert data into our table
       mutator.addInsertion(user.getUsername(), "users",
          HFactory.createStringColumn("firstname", user.getFirstname()));
       mutator.addInsertion(user.getUsername(), "users",
          HFactory.createStringColumn("lastname", user.getLastname()));
       mutator.addInsertion(user.getUsername(), "users",
          HFactory.createStringColumn("password", user.getPassword()));

       // Once the mutator is ready, execute on cassandra
       mutator.execute();

    } catch (HectorException he) {
       he.printStackTrace();
    }
}




                                                            You can implement the get…
Videos
  Row key: VideoId <UUID>
  Columns: VideoName, UserName, Description, Tags


•   Use a UUID as a row key for uniqueness
•   Allows for same video names
•   Tags should be stored in some sort of delimited format
•   Index on username may not be the best plan


                        CREATE TABLE videos (
                          videoid uuid PRIMARY KEY,
                          videoname varchar,
                          username varchar,
                          description varchar,
                          tags varchar
                        );
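
The "delimited format" bullet could be handled in application code along these lines. This is a minimal sketch, assuming a comma delimiter (the get code in this deck splits on ","). The class TagCodec and its methods are hypothetical names and are not part of Hector or Cassandra.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: packs a video's tags into the single "tags" column.
class TagCodec {

    // Assumed delimiter; tags containing it are rejected rather than escaped.
    private static final String DELIMITER = ",";

    // Join tags into one delimited string, trimming whitespace and skipping blanks.
    static String encode(List<String> tags) {
        List<String> cleaned = new ArrayList<>();
        for (String tag : tags) {
            String trimmed = tag.trim();
            if (trimmed.contains(DELIMITER)) {
                throw new IllegalArgumentException("Tag must not contain the delimiter: " + tag);
            }
            if (!trimmed.isEmpty()) {
                cleaned.add(trimmed);
            }
        }
        return String.join(DELIMITER, cleaned);
    }

    // Split the stored column value back into individual tags.
    static List<String> decode(String stored) {
        if (stored == null || stored.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(stored.split(DELIMITER)));
    }

    public static void main(String[] args) {
        String stored = encode(Arrays.asList("Foo", " Bar "));
        System.out.println(stored);
        System.out.println(decode(stored));
    }
}
```

Rejecting tags that contain the delimiter keeps decode() trivial; escaping would also work but is more code than a slide needs.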
Videos: The get code
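import java.util.UUID;

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.serializers.UUIDSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.ColumnSlice;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.SliceQuery;

static Video getVideoByUUID(UUID videoId, Keyspace keyspace){

    Video video = new Video();

    // Create a slice query. We'll be getting specific column names.
    // uuidSerializer and stringSerializer are assumed to be shared
    // UUIDSerializer.get() and StringSerializer.get() instances.
    SliceQuery<UUID, String, String> sliceQuery =
       HFactory.createSliceQuery(keyspace, uuidSerializer, stringSerializer, stringSerializer);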

    sliceQuery.setColumnFamily("videos");
    sliceQuery.setKey(videoId);

    sliceQuery.setColumnNames("videoname","username","description","tags");

    // Execute the query and get the list of columns
    ColumnSlice<String,String> result = sliceQuery.execute().get();

    // Get each column by name and add them to our video object
    video.setVideoName(result.getColumnByName("videoname").getValue());
    video.setUsername(result.getColumnByName("username").getValue());
    video.setDescription(result.getColumnByName("description").getValue());
    video.setTags(result.getColumnByName("tags").getValue().split(","));

    return video;
}


                                                                  You can implement the set…
Comments
  Row key: VideoId <UUID>
  Columns (time order): Username:<timestamp> .. Username:<timestamp>

•   Videos have many comments
•   Use Composite Columns to store user and time
•   Value of each column is the text of the comment
•   Order is as inserted
•   Use getSlice() to pull some or all of the comments

                                         CREATE TABLE comments (
                                           videoid uuid PRIMARY KEY,
                                           comment varchar
                                         );
Rating a video
  Row key: VideoId <UUID>
  Columns: rating_count <counter>, rating_total <counter>

• Use counter for single call update
• rating_count is how many ratings were given
• rating_total is the sum of all ratings given
• Ex: rating_count = 5, rating_total = 23, avg rating = 23/5 = 4.6


                                  CREATE TABLE video_rating (
                                    videoid uuid PRIMARY KEY,
                                    rating_count counter,
                                    rating_total counter
                                  );

                                  * This CREATE TABLE statement is only valid in CQL 3+
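
Once the two counters have been read back, the average is plain arithmetic on the client. The sketch below assumes the counter values arrive as longs. RatingMath is a hypothetical name, and the guard for a video with no ratings is an addition that the slide does not cover.

```java
// Hypothetical helper: derives the average rating from the two counter values.
class RatingMath {

    // Returns ratingTotal / ratingCount, or 0.0 when nothing has been rated yet.
    static double average(long ratingTotal, long ratingCount) {
        if (ratingCount <= 0) {
            return 0.0;
        }
        // Cast first so the division is floating point, not integer.
        return (double) ratingTotal / ratingCount;
    }

    public static void main(String[] args) {
        // Values from the slide: 5 ratings summing to 23.
        System.out.println(average(23, 5));
    }
}
```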
Video Event
  Row key: VideoId:Username
  Columns (time order): start_<timestamp>, stop_<timestamp>, start_<timestamp>
  Value under stop_<timestamp>: video_<timestamp>
•     Track viewing events
•     Combine Video ID and Username for a unique row
•     Stop time can be used to pick up where they left off
•     Great for usage analytics later




                                    CREATE TABLE video_event (
                                      videoid_username varchar PRIMARY KEY,
                                      event varchar
                                    );
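
Picking up where a viewer left off could look something like this once the row's columns are in memory. It is only a sketch under assumptions: column names are taken to be "start_" or "stop_" followed by epoch milliseconds, and the value of a stop column is taken to be the playback position. VideoEvents and its methods are hypothetical names, not Hector calls.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical helper for a video_event row: builds the row key and finds the resume point.
class VideoEvents {

    private static final String STOP_PREFIX = "stop_";

    // Row key combines video ID and username, as on the slide.
    static String rowKey(UUID videoId, String username) {
        return videoId + ":" + username;
    }

    // Scans the row's columns for the most recent "stop_<epochMillis>" column and
    // returns its value (assumed to be the playback position), or null if there is none.
    static String latestStopValue(Map<String, String> columns) {
        long latest = Long.MIN_VALUE;
        String value = null;
        for (Map.Entry<String, String> column : columns.entrySet()) {
            String name = column.getKey();
            if (!name.startsWith(STOP_PREFIX)) {
                continue; // start events carry no resume point
            }
            long eventTime = Long.parseLong(name.substring(STOP_PREFIX.length()));
            if (eventTime >= latest) {
                latest = eventTime;
                value = column.getValue();
            }
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> columns = new HashMap<>();
        columns.put("start_1000", "0");
        columns.put("stop_2000", "35");
        System.out.println(rowKey(UUID.randomUUID(), "pmcfadin"));
        System.out.println(latestStopValue(columns));
    }
}
```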
Create Query Tables

Indexes to support fast lookups
Lookup Video By Username
  Row key: Username
  Columns: VideoId:<timestamp> .. VideoId:<timestamp>

•   Username is unique
•   One column for each new video uploaded
•   Column slice for time span. From x to y
•   VideoId is added at the same time a Video record is added




                               CREATE TABLE username_video_index (
                                 username varchar PRIMARY KEY,
                                 videoid_timestamp varchar
                               );
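
The "from x to y" column slice can be pictured with an in-memory stand-in for a single username row. This is a sketch, not the Hector API: UserVideoRow is a hypothetical class, and it sorts on the upload timestamp alone (an assumption made here so that a range read maps directly onto a time span). Two uploads in the same millisecond would overwrite each other in this simplified version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.UUID;

// In-memory stand-in for one username_video_index row; not the Hector API.
class UserVideoRow {

    // Sorted by upload time so that a range read corresponds to a time span.
    private final NavigableMap<Long, UUID> videosByUploadTime = new TreeMap<>();

    // Called when a video record is written, so the index stays in step with the videos table.
    void addVideo(long uploadedAtMillis, UUID videoId) {
        videosByUploadTime.put(uploadedAtMillis, videoId);
    }

    // Returns the videos uploaded between fromMillis and toMillis, both inclusive, oldest first.
    List<UUID> slice(long fromMillis, long toMillis) {
        if (fromMillis > toMillis) {
            return new ArrayList<>(); // empty span instead of an exception from subMap
        }
        return new ArrayList<>(
            videosByUploadTime.subMap(fromMillis, true, toMillis, true).values());
    }

    public static void main(String[] args) {
        UserVideoRow row = new UserVideoRow();
        row.addVideo(1000L, UUID.randomUUID());
        row.addVideo(2000L, UUID.randomUUID());
        System.out.println(row.slice(1500L, 2500L).size());
    }
}
```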
Videos by Tag
  Row key: tag
  Columns: VideoId .. VideoId

•   Tag is unique regardless of video
•   Great for “List videos with X tag”
•   Tags have to be updated in Video and Tag at the same time
•   Index integrity is maintained in app logic



                                   CREATE TABLE tag_index (
                                     tag varchar PRIMARY KEY,
                                     videoid varchar
                                   );
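
Keeping the index honest in app logic means every tag change touches both tables. The sketch below uses two in-memory maps as stand-ins for the videos and tag_index tables. TagIndexer is a hypothetical class; a real version would issue the two writes through a Cassandra client rather than update maps.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// In-memory stand-in for the videos and tag_index tables; not the Hector API.
class TagIndexer {

    private final Map<UUID, Set<String>> tagsByVideo = new HashMap<>();
    private final Map<String, Set<UUID>> videosByTag = new HashMap<>();

    // Replaces a video's tags and updates the tag index in the same call.
    void setTags(UUID videoId, Set<String> newTags) {
        Set<String> oldTags = tagsByVideo.getOrDefault(videoId, Collections.emptySet());

        // Remove index entries for tags the video no longer carries.
        for (String tag : oldTags) {
            if (!newTags.contains(tag)) {
                Set<UUID> videos = videosByTag.get(tag);
                if (videos != null) {
                    videos.remove(videoId);
                    if (videos.isEmpty()) {
                        videosByTag.remove(tag);
                    }
                }
            }
        }

        // Add index entries for the current tags.
        for (String tag : newTags) {
            videosByTag.computeIfAbsent(tag, t -> new HashSet<>()).add(videoId);
        }
        tagsByVideo.put(videoId, new HashSet<>(newTags));
    }

    // "List videos with X tag": returns a copy so callers cannot corrupt the index.
    Set<UUID> videosWithTag(String tag) {
        return new HashSet<>(videosByTag.getOrDefault(tag, Collections.emptySet()));
    }

    public static void main(String[] args) {
        TagIndexer indexer = new TagIndexer();
        UUID video = UUID.randomUUID();
        indexer.setTags(video, new HashSet<>(java.util.Arrays.asList("Foo", "Bar")));
        System.out.println(indexer.videosWithTag("Foo").contains(video));
    }
}
```

Removing stale entries before adding new ones is the part that is easy to forget: without it, a video keeps showing up under tags it no longer has.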
Deployment strategies
• Measure your risk
  – Replication factor?
  – Multi-datacenter?
  – Cost?
• Performance
  – Today != tomorrow. Scale when needed
  – Have an expansion plan ready
Wrap up
• Similar data model process to RDBMS… to start
• Query -> Index table
• Don’t be afraid to write to multiple tables at once
• Bonus points: Hadoop and Solr!
Go play!
•   Go to: http://github.com/pmcfadin
•   Look for projects with cassandra12
•   Clone or fork my examples
•   Implement stubbed methods
•   Send me your solutions: pmcfadin@gmail.com
•   Follow me for updates: @PatrickMcFadin
Thank You!


Connect with me at @PatrickMcFadin
            Or LinkedIn
   Conference tag #cassandra12

Editor's notes

  1. I’ll be using the term Tables instead of Column Family during this presentation.
  2. Composite columns. Two different types
  3. Example shows use of a counter