SlideShare una empresa de Scribd logo
1 de 34
Descargar para leer sin conexión
Cassandra 2012:
What's New & Upcoming

    Sam Tunnicliffe
●   sam@datastax.com
●   DSE : integrated Big Data platform
    –   Built on Cassandra
    –   Analytics using Hadoop (Hive/Pig/Mahout)
    –   Enterprise Search with Solr
Cassandra in 2012
●   Cassandra 1.1 – April 2012
●   Cassandra 1.2 – November 2012
●   What's on the roadmap
Release Schedule
Version                Release Date
0.5                    24 Jan 2010
0.6                    13 April 2010
0.7                    9 January 2011
0.8                    2 June 2011
1.0                    18 October 2011
1.1                    24 April 2012
1.2                    End November 2012 ?

●   Currently working to six month release cycle
●   Minor releases as and when necessary
Global Row & Key Caches
●   Prior to 1.1 separate row & key caches per
    column family
●   Since 1.1 single row cache & single key
    cache shared across all column families
●   Simpler configuration
    –   Globally:   {key,row}_cache_size_in_mb

    –   Per CF:     ALL|NONE|KEYS_ONLY|ROWS_ONLY
More Granular Storage Config
●   Pre 1.1 one storage location per keyspace
    –   /var/lib/cassandra/ks/cf-hc-1-Data.db

●   In 1.1 one location per column family
    –   /var/lib/cassandra/ks/cf/ks-cf-hc-1-Data.db
●   Allows you to pin certain data to particular
    storage system
Why?




 http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html
Why?




 http://www.slideshare.net/rbranson/cassandra-and-solid-state-drives
Row Level Isolation
●   Batched writes within row have always
    been atomic
●   Batched writes within row are now also
    isolated
Row Level Isolation
●   So Writing
      UPDATE foo
      SET column_x='a' AND column_y='1'
      WHERE id='bar';


●   Followed by
      UPDATE foo
      SET column_x='b' AND column_y='2'
      WHERE id='bar';


●   No reader ever sees either
      {column_a:'a', column_b:'2'}
      {column_x:'b', column_y:'1'}
Misc Other Stuff
●   Windows off heap cache
●   write survey mode
●   commitlog segment pre-allocation/recycling
●   abortable compactions
●   multi threaded streaming
●   Hadoop improvements
    –   utilise secondary indexes from Hadoop
    –   better wide row support
CQL 3.0
●   Beta in 1.1, finalized in 1.2
●   Motivations
    –   Better support for wide rows
    –   Native syntax for composites
CQL 3.0
●   Wide Rows
●   Composites
●   Transposition
CQL 3.0
CREATE TABLE comments (
   id text,
   posted_at timestamp,
   author text,
   content text,
   karma int,
   PRIMARY KEY( id, posted_at )
);



INSERT INTO comments (id, posted_at, author, content, karma)
VALUES ('article_x', '2012-10-31T08:00:00', 'freddie', 'awesome', 100);

INSERT INTO comments (id, posted_at, author, content, karma)
VALUES ('article_x', '2012-10-31T09:59:59', 'rageguy', 'F7U12', 0);
CQL 3.0 Transposition
{ article_x : { 2012-10-31T08:00:00 + author => freddie }
              { 2012-10-31T08:00:00 + content => awesome }
              { 2012-10-31T08:00:00 + karma   => 100 }

              { 2012-10-31T09:59:59 + author => rageguy }
              { 2012-10-31T09:59:59 + content => F7U12 }
              { 2012-10-31T09:59:59 + karma   => 1 }
}
{ article_y : { 2012-10-28T15:30:12 + author...




cqlsh:hn> select * from comments where id = 'article_x';


 id        | posted_at           | author | content | karma
-----------+---------------------+---------+---------+-------
 article_x | 2012-10-31 08:00:00 | freddie | awesome |   100
 article_x | 2012-10-31 09:59:59 | rageguy |   F7U12 |     1
CQL 3.0 Transposition
{ article_x : { 2012-10-31T08:00:00 + author => freddie }
              { 2012-10-31T08:00:00 + content => awesome }
              { 2012-10-31T08:00:00 + karma   => 100 }

              { 2012-10-31T09:59:59 + author => rageguy }
              { 2012-10-31T09:59:59 + content => F7U12 }
              { 2012-10-31T09:59:59 + karma   => 1 }
}
{ article_y : { 2012-10-28T15:30:12 + author...




cqlsh:hn> select * from comments where id = 'article_x'
          and posted_at <= '2012-10-31T08:00:00' ;

 id        | posted_at           | author | content | karma
-----------+---------------------+---------+---------+-------
 article_x | 2012-10-31 08:00:00 | freddie | awesome |   100
CQL 3.0 Collections
●   Typed collections – Set, Map, List
●   Good fit for denormalizing small collections
●   Less ugly, more efficient than previous
    approach
●   Composites under the hood
●   Each element stored as one column
    –   So each can have an individual TTL
CQL 3.0 Sets
●   Collection of typed elements
●   No duplicate elements
●   Actually a sorted set
    –   Ordering determined by natural order elements
CQL 3.0 Sets
CREATE TABLE users (
           user_id text PRIMARY KEY,
           first_name text,
           last_name text,
           emails set<text>
       );


●   Set entire value with set literal
    INSERT INTO users (user_id, first_name, last_name, emails)
    VALUES('sam', 'Sam', 'Tunnicliffe',
           {'sam@beobal.com', 'sam@datastax.com'});
CQL 3.0 Sets
●   Insert and remove
    UPDATE users
    SET emails = emails + {'sam@cohodo.net'}
    WHERE user_id = 'sam';

    UPDATE users
    SET emails = emails – {'sam@cohodo.net'}
    WHERE user_id = 'sam'


●   No distinction between empty and null set
    UPDATE users
    SET emails = {}
    WHERE user_id = 'sam';

    DELETE emails
    FROM users
    WHERE user_id = 'sam';
CQL 3.0 Lists
●   Ordering determined by user, not by
    natural ordering of elements
●   Allows multiple occurrences of same value
CQL 3.0 Lists
ALTER TABLE users ADD todo list<text>;

●   Set, Prepend & Append
    UPDATE users
    SET todo = [ 'get to Sinsheim', 'give presentation' ]
    WHERE user_id = 'sam';

    UPDATE users
    SET todo = [ 'finish slides' ] + todo
    WHERE user_id = 'sam';

    UPDATE users
    SET todo = todo + [ 'drink beer' ]
    WHERE user_id = 'sam';
CQL 3.0 Lists
●   Access elements by index
    –   Less performant, requires read of entire list
    UPDATE users
    SET todo[2] = 'slow down'
    WHERE user_id = 'sam';

    DELETE todo[3]
    FROM users
    WHERE user_id = 'sam';


●   Remove by value
    UPDATE users
    SET todo = todo – ['finish slides']
    WHERE user_id = 'sam';
CQL 3.0 Maps
●   Collection of typed key/value pairs
●   Like sets, maps are actually ordered
    –   Ordering determined by the type of the keys
●   Operate on entire collection or individual
    element by key
CQL 3.0 Maps
ALTER TABLE users ADD accounts map<text, text>;
●   Set entire value with map literal
    UPDATE users
    SET accounts = { 'twitter' : 'beobal',
                     'irc'     : 'sam___' }
    WHERE user_id = 'sam';


●   Access elements by key
    UPDATE users
    SET accounts['irc'] = 'beobal@freenode.net'
    WHERE user_id = 'sam';

    DELETE accounts['twitter']
    FROM users
    WHERE user_id = 'sam';
CQL 3.0 Collections
●   Can only retrieve a collection in its entirety
    –   Collections are not intended to be very large
    –   Not a replacement for a good data model
●   Typed but cannot currently be nested
    –   Can't define a list<list<int>>
●   No support for secondary indexes
    –   On the roadmap but not yet implemented
CQL 3.0 Binary Protocol
●   Motivations
    –   Ability to optimize for specific use-cases
    –   Reduce client dependencies
    –   Lay groundwork for future enhancements such
        as streaming/paging
CQL 3.0 Binary Protocol
●   Framed protocol, designed natively for
    CQL 3.0
●   Built on Netty
●   Java, Python & C drivers in development
●   Switched off by default in 1.2
●   Thrift API will remain, no breaking API
    changes
Virtual Nodes
●   Evolution of Cassandra's distribution model
●   Nodes own multiple token ranges
●   Big implications for operations
    –   Replacing failed nodes
    –   Growing / shrinking cluster
●   Stay put for Eric
Concurrent Schema Changes
●   Online schema changes introduced in 0.7
●   Initially schema changes needed serializing
    –   Update schema
    –   Wait for propagation before making next
        change
●   Mostly fixed in 1.1
●   Completed in 1.2
    –   Concurrent ColumnFamily creation
Better JBOD Support
●   Growing data volumes and compaction
    requirements means moar disk
●   Pre 1.2 best practice is to run RAID10
●   Improvements to handling disk failures
    –   disk_failure_policy : stop | best_effort | ignore

    –   best_effort implications   for writes / reads
●   Better write balancing for less contention
Misc Other Stuff
●   Query Tracing
●   Speedup slicing wide rows from disk
●   Atomic Batches
Future Plans
●   Improved caching for wide rows
●   Remove 2-phase compaction
●   Remove SuperColumns internally
Thank you

Más contenido relacionado

La actualidad más candente

Cassandra 3.0 advanced preview
Cassandra 3.0 advanced previewCassandra 3.0 advanced preview
Cassandra 3.0 advanced previewPatrick McFadin
 
PostgreSQL Database Slides
PostgreSQL Database SlidesPostgreSQL Database Slides
PostgreSQL Database Slidesmetsarin
 
Python SQite3 database Tutorial | SQlite Database
Python SQite3 database Tutorial | SQlite DatabasePython SQite3 database Tutorial | SQlite Database
Python SQite3 database Tutorial | SQlite DatabaseElangovanTechNotesET
 
Modern query optimisation features in MySQL 8.
Modern query optimisation features in MySQL 8.Modern query optimisation features in MySQL 8.
Modern query optimisation features in MySQL 8.Mydbops
 
Cassandra Day SV 2014: Netflix’s Astyanax Java Client Driver for Apache Cassa...
Cassandra Day SV 2014: Netflix’s Astyanax Java Client Driver for Apache Cassa...Cassandra Day SV 2014: Netflix’s Astyanax Java Client Driver for Apache Cassa...
Cassandra Day SV 2014: Netflix’s Astyanax Java Client Driver for Apache Cassa...DataStax Academy
 
How to Avoid Pitfalls in Schema Upgrade with Galera
How to Avoid Pitfalls in Schema Upgrade with GaleraHow to Avoid Pitfalls in Schema Upgrade with Galera
How to Avoid Pitfalls in Schema Upgrade with GaleraSveta Smirnova
 
Cassandra and Spark
Cassandra and Spark Cassandra and Spark
Cassandra and Spark datastaxjp
 
Introduction to Apache Cassandra
Introduction to Apache CassandraIntroduction to Apache Cassandra
Introduction to Apache CassandraLuke Tillman
 
Cassandra and Rails at LA NoSQL Meetup
Cassandra and Rails at LA NoSQL MeetupCassandra and Rails at LA NoSQL Meetup
Cassandra and Rails at LA NoSQL MeetupMichael Wynholds
 
MySQL database replication
MySQL database replicationMySQL database replication
MySQL database replicationPoguttuezhiniVP
 
A brief introduction to PostgreSQL
A brief introduction to PostgreSQLA brief introduction to PostgreSQL
A brief introduction to PostgreSQLVu Hung Nguyen
 
MariaDB Cassandra Interoperability
MariaDB Cassandra InteroperabilityMariaDB Cassandra Interoperability
MariaDB Cassandra InteroperabilityColin Charles
 
Cassandra 2.1 boot camp, Read/Write path
Cassandra 2.1 boot camp, Read/Write pathCassandra 2.1 boot camp, Read/Write path
Cassandra 2.1 boot camp, Read/Write pathJoshua McKenzie
 
MariaDB for developers
MariaDB for developersMariaDB for developers
MariaDB for developersColin Charles
 
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...DataStax
 
Cassandra summit keynote 2014
Cassandra summit keynote 2014Cassandra summit keynote 2014
Cassandra summit keynote 2014jbellis
 
Cassandra Summit 2015
Cassandra Summit 2015Cassandra Summit 2015
Cassandra Summit 2015jbellis
 

La actualidad más candente (20)

Cassandra 3.0 advanced preview
Cassandra 3.0 advanced previewCassandra 3.0 advanced preview
Cassandra 3.0 advanced preview
 
CQL3 in depth
CQL3 in depthCQL3 in depth
CQL3 in depth
 
PostgreSQL Database Slides
PostgreSQL Database SlidesPostgreSQL Database Slides
PostgreSQL Database Slides
 
Python SQite3 database Tutorial | SQlite Database
Python SQite3 database Tutorial | SQlite DatabasePython SQite3 database Tutorial | SQlite Database
Python SQite3 database Tutorial | SQlite Database
 
Modern query optimisation features in MySQL 8.
Modern query optimisation features in MySQL 8.Modern query optimisation features in MySQL 8.
Modern query optimisation features in MySQL 8.
 
Cassandra Day SV 2014: Netflix’s Astyanax Java Client Driver for Apache Cassa...
Cassandra Day SV 2014: Netflix’s Astyanax Java Client Driver for Apache Cassa...Cassandra Day SV 2014: Netflix’s Astyanax Java Client Driver for Apache Cassa...
Cassandra Day SV 2014: Netflix’s Astyanax Java Client Driver for Apache Cassa...
 
How to Avoid Pitfalls in Schema Upgrade with Galera
How to Avoid Pitfalls in Schema Upgrade with GaleraHow to Avoid Pitfalls in Schema Upgrade with Galera
How to Avoid Pitfalls in Schema Upgrade with Galera
 
Cassandra and Spark
Cassandra and Spark Cassandra and Spark
Cassandra and Spark
 
Introduction to Apache Cassandra
Introduction to Apache CassandraIntroduction to Apache Cassandra
Introduction to Apache Cassandra
 
Cassandra and Rails at LA NoSQL Meetup
Cassandra and Rails at LA NoSQL MeetupCassandra and Rails at LA NoSQL Meetup
Cassandra and Rails at LA NoSQL Meetup
 
MySQL database replication
MySQL database replicationMySQL database replication
MySQL database replication
 
A brief introduction to PostgreSQL
A brief introduction to PostgreSQLA brief introduction to PostgreSQL
A brief introduction to PostgreSQL
 
MariaDB Cassandra Interoperability
MariaDB Cassandra InteroperabilityMariaDB Cassandra Interoperability
MariaDB Cassandra Interoperability
 
Cassandra 2.1 boot camp, Read/Write path
Cassandra 2.1 boot camp, Read/Write pathCassandra 2.1 boot camp, Read/Write path
Cassandra 2.1 boot camp, Read/Write path
 
Mongodb replication
Mongodb replicationMongodb replication
Mongodb replication
 
MariaDB for developers
MariaDB for developersMariaDB for developers
MariaDB for developers
 
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...
 
Cassandra summit keynote 2014
Cassandra summit keynote 2014Cassandra summit keynote 2014
Cassandra summit keynote 2014
 
Cassandra Summit 2015
Cassandra Summit 2015Cassandra Summit 2015
Cassandra Summit 2015
 
How to understand Galera Cluster - 2013
How to understand Galera Cluster - 2013How to understand Galera Cluster - 2013
How to understand Galera Cluster - 2013
 

Destacado

Open Source In Further Education
Open Source In Further EducationOpen Source In Further Education
Open Source In Further EducationRoss Gardler
 
Open Source Basics
Open Source BasicsOpen Source Basics
Open Source BasicsRoss Gardler
 
Tmjh English
Tmjh EnglishTmjh English
Tmjh Englishtmjhpri
 
Browsing History for Mozilla
Browsing History for MozillaBrowsing History for Mozilla
Browsing History for MozillaSamuel Burlamaqui
 
Sustainable Podcasting
Sustainable PodcastingSustainable Podcasting
Sustainable PodcastingRoss Gardler
 
A forge is just a tool, but is it the right tool?
A forge is just a tool, but is it the right tool?A forge is just a tool, but is it the right tool?
A forge is just a tool, but is it the right tool?Ross Gardler
 

Destacado (7)

Open Source In Further Education
Open Source In Further EducationOpen Source In Further Education
Open Source In Further Education
 
Open Source Basics
Open Source BasicsOpen Source Basics
Open Source Basics
 
Tmjh English
Tmjh EnglishTmjh English
Tmjh English
 
Browsing History for Mozilla
Browsing History for MozillaBrowsing History for Mozilla
Browsing History for Mozilla
 
Sustainable Podcasting
Sustainable PodcastingSustainable Podcasting
Sustainable Podcasting
 
94light
94light94light
94light
 
A forge is just a tool, but is it the right tool?
A forge is just a tool, but is it the right tool?A forge is just a tool, but is it the right tool?
A forge is just a tool, but is it the right tool?
 

Similar a Cassandra 2012

PostgreSQL 9.5 - Major Features
PostgreSQL 9.5 - Major FeaturesPostgreSQL 9.5 - Major Features
PostgreSQL 9.5 - Major FeaturesInMobi Technology
 
Introduction to PostgreSQL
Introduction to PostgreSQLIntroduction to PostgreSQL
Introduction to PostgreSQLJim Mlodgenski
 
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1MariaDB plc
 
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1MariaDB plc
 
Recent MariaDB features to learn for a happy life
Recent MariaDB features to learn for a happy lifeRecent MariaDB features to learn for a happy life
Recent MariaDB features to learn for a happy lifeFederico Razzoli
 
MariaDB 10.11 key features overview for DBAs
MariaDB 10.11 key features overview for DBAsMariaDB 10.11 key features overview for DBAs
MariaDB 10.11 key features overview for DBAsFederico Razzoli
 
Avoiding Pitfalls for Cassandra.pdf
Avoiding Pitfalls for Cassandra.pdfAvoiding Pitfalls for Cassandra.pdf
Avoiding Pitfalls for Cassandra.pdfCédrick Lunven
 
OpenTSDB 2.0
OpenTSDB 2.0OpenTSDB 2.0
OpenTSDB 2.0HBaseCon
 
The Apache Cassandra ecosystem
The Apache Cassandra ecosystemThe Apache Cassandra ecosystem
The Apache Cassandra ecosystemAlex Thompson
 
What’s New In PostgreSQL 9.3
What’s New In PostgreSQL 9.3What’s New In PostgreSQL 9.3
What’s New In PostgreSQL 9.3Pavan Deolasee
 
10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQL10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQLSatoshi Nagayasu
 
Cassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series ModelingCassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series ModelingVassilis Bekiaris
 
dbs class 7.ppt
dbs class 7.pptdbs class 7.ppt
dbs class 7.pptMARasheed3
 
Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012Roland Bouman
 
Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012Roland Bouman
 
NoCOUG_201411_Patel_Managing_a_Large_OLTP_Database
NoCOUG_201411_Patel_Managing_a_Large_OLTP_DatabaseNoCOUG_201411_Patel_Managing_a_Large_OLTP_Database
NoCOUG_201411_Patel_Managing_a_Large_OLTP_DatabaseParesh Patel
 
BITS: Introduction to MySQL - Introduction and Installation
BITS: Introduction to MySQL - Introduction and InstallationBITS: Introduction to MySQL - Introduction and Installation
BITS: Introduction to MySQL - Introduction and InstallationBITS
 
Introduction to Postrges-XC
Introduction to Postrges-XCIntroduction to Postrges-XC
Introduction to Postrges-XCAshutosh Bapat
 
Think Exa!
Think Exa!Think Exa!
Think Exa!Enkitec
 

Similar a Cassandra 2012 (20)

PostgreSQL 9.5 - Major Features
PostgreSQL 9.5 - Major FeaturesPostgreSQL 9.5 - Major Features
PostgreSQL 9.5 - Major Features
 
Introduction to PostgreSQL
Introduction to PostgreSQLIntroduction to PostgreSQL
Introduction to PostgreSQL
 
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
 
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
 
Recent MariaDB features to learn for a happy life
Recent MariaDB features to learn for a happy lifeRecent MariaDB features to learn for a happy life
Recent MariaDB features to learn for a happy life
 
MariaDB 10.11 key features overview for DBAs
MariaDB 10.11 key features overview for DBAsMariaDB 10.11 key features overview for DBAs
MariaDB 10.11 key features overview for DBAs
 
Avoiding Pitfalls for Cassandra.pdf
Avoiding Pitfalls for Cassandra.pdfAvoiding Pitfalls for Cassandra.pdf
Avoiding Pitfalls for Cassandra.pdf
 
OpenTSDB 2.0
OpenTSDB 2.0OpenTSDB 2.0
OpenTSDB 2.0
 
The Apache Cassandra ecosystem
The Apache Cassandra ecosystemThe Apache Cassandra ecosystem
The Apache Cassandra ecosystem
 
What’s New In PostgreSQL 9.3
What’s New In PostgreSQL 9.3What’s New In PostgreSQL 9.3
What’s New In PostgreSQL 9.3
 
10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQL10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQL
 
Cassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series ModelingCassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series Modeling
 
dbs class 7.ppt
dbs class 7.pptdbs class 7.ppt
dbs class 7.ppt
 
Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012
 
Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012
 
Apache Cassandra at Macys
Apache Cassandra at MacysApache Cassandra at Macys
Apache Cassandra at Macys
 
NoCOUG_201411_Patel_Managing_a_Large_OLTP_Database
NoCOUG_201411_Patel_Managing_a_Large_OLTP_DatabaseNoCOUG_201411_Patel_Managing_a_Large_OLTP_Database
NoCOUG_201411_Patel_Managing_a_Large_OLTP_Database
 
BITS: Introduction to MySQL - Introduction and Installation
BITS: Introduction to MySQL - Introduction and InstallationBITS: Introduction to MySQL - Introduction and Installation
BITS: Introduction to MySQL - Introduction and Installation
 
Introduction to Postrges-XC
Introduction to Postrges-XCIntroduction to Postrges-XC
Introduction to Postrges-XC
 
Think Exa!
Think Exa!Think Exa!
Think Exa!
 

Último

Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 

Último (20)

Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 

Cassandra 2012

  • 1. Cassandra 2012: What's New & Upcoming Sam Tunnicliffe
  • 2. sam@datastax.com ● DSE : integrated Big Data platform – Built on Cassandra – Analytics using Hadoop (Hive/Pig/Mahout) – Enterprise Search with Solr
  • 3. Cassandra in 2012 ● Cassandra 1.1 – April 2012 ● Cassandra 1.2 – November 2012 ● What's on the roadmap
  • 4. Release Schedule Version Release Date 0.5 24 Jan 2010 0.6 13 April 2010 0.7 9 January 2011 0.8 2 June 2011 1.0 18 October 2011 1.1 24 April 2012 1.2 End November 2012 ? ● Currently working to six month release cycle ● Minor releases as and when necessary
  • 5. Global Row & Key Caches ● Prior to 1.1 separate row & key caches per column family ● Since 1.1 single row cache & single key cache shared across all column families ● Simpler configuration – Globally: {key,row}_cache_size_in_mb – Per CF: ALL|NONE|KEYS_ONLY|ROWS_ONLY
  • 6. More Granular Storage Config ● Pre 1.1 one storage location per keyspace – /var/lib/cassandra/ks/cf-hc-1-Data.db ● In 1.1 one location per column family – /var/lib/cassandra/ks/cf/ks-cf-hc-1-Data.db ● Allows you to pin certain data to particular storage system
  • 9. Row Level Isolation ● Batched writes within row have always been atomic ● Batched writes within row are now also isolated
  • 10. Row Level Isolation ● So Writing UPDATE foo SET column_x='a' AND column_y='1' WHERE id='bar'; ● Followed by UPDATE foo SET column_x='b' AND column_y='2' WHERE id='bar'; ● No reader ever sees either {column_a:'a', column_b:'2'} {column_x:'b', column_y:'1'}
  • 11. Misc Other Stuff ● Windows off heap cache ● write survey mode ● commitlog segment pre-allocation/recycling ● abortable compactions ● multi threaded streaming ● Hadoop improvements – utilise secondary indexes from Hadoop – better wide row support
  • 12. CQL 3.0 ● Beta in 1.1, finalized in 1.2 ● Motivations – Better support for wide rows – Native syntax for composites
  • 13. CQL 3.0 ● Wide Rows ● Composites ● Transposition
  • 14. CQL 3.0 CREATE TABLE comments ( id text, posted_at timestamp, author text, content text, karma int, PRIMARY KEY( id, posted_at ) ); INSERT INTO comments (id, posted_at, author, content, karma) VALUES ('article_x', '2012-10-31T08:00:00', 'freddie', 'awesome', 100); INSERT INTO comments (id, posted_at, author, content, karma) VALUES ('article_x', '2012-10-31T09:59:59', 'rageguy', 'F7U12', 0);
  • 15. CQL 3.0 Transposition { article_x : { 2012-10-31T08:00:00 + author => freddie } { 2012-10-31T08:00:00 + content => awesome } { 2012-10-31T08:00:00 + karma => 100 } { 2012-10-31T09:59:59 + author => rageguy } { 2012-10-31T09:59:59 + content => F7U12 } { 2012-10-31T09:59:59 + karma => 1 } } { article_y : { 2012-10-28T15:30:12 + author... cqlsh:hn> select * from comments where id = 'article_x'; id | posted_at | author | content | karma -----------+---------------------+---------+---------+------- article_x | 2012-10-31 08:00:00 | freddie | awesome | 100 article_x | 2012-10-31 09:59:59 | rageguy | F7U12 | 1
  • 16. CQL 3.0 Transposition { article_x : { 2012-10-31T08:00:00 + author => freddie } { 2012-10-31T08:00:00 + content => awesome } { 2012-10-31T08:00:00 + karma => 100 } { 2012-10-31T09:59:59 + author => rageguy } { 2012-10-31T09:59:59 + content => F7U12 } { 2012-10-31T09:59:59 + karma => 1 } } { article_y : { 2012-10-28T15:30:12 + author... cqlsh:hn> select * from comments where id = 'article_x' and posted_at <= '2012-10-31T08:00:00' ; id | posted_at | author | content | karma -----------+---------------------+---------+---------+------- article_x | 2012-10-31 08:00:00 | freddie | awesome | 100
  • 17. CQL 3.0 Collections ● Typed collections – Set, Map, List ● Good fit for denormalizing small collections ● Less ugly, more efficient than previous approach ● Composites under the hood ● Each element stored as one column – So each can have an individual TTL
  • 18. CQL 3.0 Sets ● Collection of typed elements ● No duplicate elements ● Actually a sorted set – Ordering determined by natural order elements
  • 19. CQL 3.0 Sets CREATE TABLE users ( user_id text PRIMARY KEY, first_name text, last_name text, emails set<text> ); ● Set entire value with set literal INSERT INTO users (user_id, first_name, last_name, emails) VALUES('sam', 'Sam', 'Tunnicliffe', {'sam@beobal.com', 'sam@datastax.com'});
  • 20. CQL 3.0 Sets ● Insert and remove UPDATE users SET emails = emails + {'sam@cohodo.net'} WHERE user_id = 'sam'; UPDATE users SET emails = emails – {'sam@cohodo.net'} WHERE user_id = 'sam' ● No distinction between empty and null set UPDATE users SET emails = {} WHERE user_id = 'sam'; DELETE emails FROM users WHERE user_id = 'sam';
  • 21. CQL 3.0 Lists ● Ordering determined by user, not by natural ordering of elements ● Allows multiple occurrences of same value
  • 22. CQL 3.0 Lists ALTER TABLE users ADD todo list<text>; ● Set, Prepend & Append UPDATE users SET todo = [ 'get to Sinsheim', 'give presentation' ] WHERE user_id = 'sam'; UPDATE users SET todo = [ 'finish slides' ] + todo WHERE user_id = 'sam'; UPDATE users SET todo = todo + [ 'drink beer' ] WHERE user_id = 'sam';
  • 23. CQL 3.0 Lists ● Access elements by index – Less performant, requires read of entire list UPDATE users SET todo[2] = 'slow down' WHERE user_id = 'sam'; DELETE todo[3] FROM users WHERE user_id = 'sam'; ● Remove by value UPDATE users SET todo = todo – ['finish slides'] WHERE user_id = 'sam';
  • 24. CQL 3.0 Maps ● Collection of typed key/value pairs ● Like sets, maps are actually ordered – Ordering determined by the type of the keys ● Operate on entire collection or individual element by key
  • 25. CQL 3.0 Maps ALTER TABLE users ADD accounts map<text, text>; ● Set entire value with map literal UPDATE users SET accounts = { 'twitter' : 'beobal', 'irc' : 'sam___' } WHERE user_id = 'sam'; ● Access elements by key UPDATE users SET accounts['irc'] = 'beobal@freenode.net' WHERE user_id = 'sam'; DELETE accounts['twitter'] FROM users WHERE user_id = 'sam';
  • 26. CQL 3.0 Collections ● Can only retrieve a collection in its entirety – Collections are not intended to be very large – Not a replacement for a good data model ● Typed but cannot currently be nested – Can't define a list<list<int>> ● No support for secondary indexes – On the roadmap but not yet implemented
  • 27. CQL 3.0 Binary Protocol ● Motivations – Ability to optimize for specific use-cases – Reduce client dependencies – Lay groundwork for future enhancements such as streaming/paging
  • 28. CQL 3.0 Binary Protocol ● Framed protocol, designed natively for CQL 3.0 ● Built on Netty ● Java, Python & C drivers in development ● Switched off by default in 1.2 ● Thrift API will remain, no breaking API changes
  • 29. Virtual Nodes ● Evolution of Cassandra's distribution model ● Nodes own multiple token ranges ● Big implications for operations – Replacing failed nodes – Growing / shrinking cluster ● Stay put for Eric
  • 30. Concurrent Schema Changes ● Online schema changes introduced in 0.7 ● Initially schema changes needed serializing – Update schema – Wait for propagation before making next change ● Mostly fixed in 1.1 ● Completed in 1.2 – Concurrent ColumnFamily creation
  • 31. Better JBOD Support ● Growing data volumes and compaction requirements means moar disk ● Pre 1.2 best practice is to run RAID10 ● Improvements to handling disk failures – disk_failure_policy : stop | best_effort | ignore – best_effort implications for writes / reads ● Better write balancing for less contention
  • 32. Misc Other Stuff ● Query Tracing ● Speedup slicing wide rows from disk ● Atomic Batches
  • 33. Future Plans ● Improved caching for wide rows ● Remove 2-phase compaction ● Remove SuperColumns internally