SlideShare una empresa de Scribd logo
1 de 28
Descargar para leer sin conexión
©2013 DataStax Confidential. Do not distribute without consent.
@rustyrazorblade
Jon Haddad

Technical Evangelist, DataStax
Python & Cassandra
1
This should be boring
• Talking to a database should not
be any of the following:
• Exciting
• "AH HA!"
• Confusing
git@github.com:rustyrazorblade/python-presentation.git
Agenda
• Go over driver basic concepts
• Connecting
• Perform queries
• Introduce object mapper
(cqlengine)
• Application integration
DataStax Native Python Driver
• Talks to Cassandra
• Connection pooling
• Aware of cluster topology
• Automatic retries / failure
management
• Load balancing
• Will include object mapper
(cqlengine) in next release
• Fully Open Source (Apache
License)
Connect to Cassandra
• Import and create a Cluster instance
• Cluster takes options such as load balancing policy, reconnect policy, retry
policy
• On connection, driver discovers entire cluster automatically
Executing queries
• CQL: Similar to SQL
• session.execute()
• Create tables, insert, selects
• Can accept simple strings
• Not token aware
Prepared Statements
• Use for all queries (inserts / updates / deletes)
• Decrease server load
• Increase security
• Allows for token aware queries
Async Queries
• Prepared statements required!
• Much faster than sync
• Utilize the entire cluster
• Driver can help us here
• We can use futures
1 statement = """INSERT INTO sensor
2 (sensor_id, name, created_at)
3 VALUES (?, ?, ?)"""
4
5 insert_sensor = session.prepare(statement)
6
7 def create_sensor_entries_callback(response, sensor_id):
8 print "CALLBACK"
9
10 for x in range(10):
11 sensor_data = (uuid.uuid4(), "sensor %d" % x, datetime.now())
12 future = session.execute_async(insert_sensor, sensor_data)
13 future.add_callback(create_sensor_entries_callback, sensor_id)
14
Async Queries w/ Callbacks
callback function
add callback
1 from cassandra.concurrent import execute_concurrent_with_args
2
3 stmt = """SELECT * FROM sensor_data WHERE sensor_id=?
4 ORDER BY created_at DESC LIMIT 1""")
5
6 select_statement = session.prepare(stmt)
7
8 sensor_ids = [["f472d5ff-0c76-404a-8044-038db416685e"],
9 ["940cb741-d5b5-4c5d-82f5-bf1aa61c6d47"],
10 ["497d4b2c-cba2-4d0f-bd80-42de612690fd"],
11 ["1bdeac75-7e12-43ba-80b5-2d38405f9843"]
12
13 result = execute_concurrent_with_args(session, select_statement, sensor_ids)
Async Queries (managed)
prepared statement
automatically manages concurrency
Performance Considerations
• Like SQL, CQL features IN() but in
general, it's terrible for
performance
• Results in more GC & perf
problems
• BATCH has the same issue
• Failure to get a single result
causes entire IN() or batch to retry
Object Mapper
Defining Models
• Each model maps to a single table
• Every model inherits from cassandra.cqlengine.models.Model
• Define fields in your table programatically
• Collections map to native Python types (lists, sets, dict)
• Table management included (no need to write ALTER)
Model with Collections
• Sets & Maps are most useful
• Use to denormalize
• Lists can have performance issues if misused
1 class Message(Model):
2 message_id = TimeUUID(primary_key=True, default=uuid1)
3 subject = Text()
4 body = Text()
5 addressed_to = Set(UUID)
6
7 class Photo(Model):
8 photo_id = UUID(primary_key=True, default=uuid4)
9 title = Text()
10 likes = Map<UUID, Text>
Clustering Keys
• Automatically determined by
ordering in model
• First primary key is partition key
• The rest are clustering keys
1 class UsersInGroup(Model):
2 group_id = UUID(primary_key=True)
3 user_id = UUID(primary_key=True)
4 is_admin = Boolean()
5
6
1 class UsersInGroupByState(Model):
2 group_id = UUID(primary_key=True, partition_key=True)
3 state = Text(primary_key=True, partition_key=True
4 user_id = UUID(primary_key=True)
5 is_admin = Boolean(default=False)
Inserting Data
• Model.create(**kwargs)
• Performs validation
• Supports custom validation
• Supports TTLs
Lightweight Transactions
• Uses paxos for consensus
• IF NOT EXISTS for INSERT
• IF FIELD=VALUE for UPDATE
• Use sparingly - requires
several round trips
Batches
• Use only to maintain multiple views (for consistency purposes)
1 class User(Model):
2 name = Text(primary_key=True)
3 twitter = Text()
4 email = Text()
5
6 class TwitterToUser(Model):
7 twitter = Text(primary_key=True)
8 name = Text()
9
10 (twitter, name) = ("rustyrazorblade", "jon")
11
12 with BatchQuery() as b:
13 User.batch(b).create(name=name, twitter=twitter)
14 EmailToUser.batch(b).create(twitter=twitter, name=name)
Fetching a Row
• Model.get() can be used to
fetch a single row
• Will throw a DoesNotExist
exception if not found
Fetching Many Rows
• Model.objects() accepts any filter acceptable to Cassandra
Table Properties
• Every table option supported
• Compaction
• gc_grace_seconds
• read repair chance
• caching
Table Inheritance
• Multiple tables with similar fields
• Query Pattern: filtering
Table Polymorphism
• Similar to inheritance
• Uses a single table
• Query pattern: select all types
Application Development
Virtual Environments
• virtualenv is your friend!
• mkvirtualenv also your friend!
• pip install mkvirtualenv
Flask==0.10.1
blist==1.3.6
cassandra-driver==2.1.2
Flask==0.9.0
rednose==0.4.1
ipdb==0.7
ipdbplugin==1.2
ipython==2.3.1
mock==1.0.1
nose==1.3.4
All sandboxed environments
Integrations
• Django
• django-cassandra-engine
• Integrates with manage.py
• Flask
• use @app.before_first_request
• General rule: connect post-fork
Go build stuff!
©2013 DataStax Confidential. Do not distribute without consent. 28

Más contenido relacionado

La actualidad más candente

Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperRahul Jain
 
Real time stock processing with apache nifi, apache flink and apache kafka
Real time stock processing with apache nifi, apache flink and apache kafkaReal time stock processing with apache nifi, apache flink and apache kafka
Real time stock processing with apache nifi, apache flink and apache kafkaTimothy Spann
 
Getting Started with Confluent Schema Registry
Getting Started with Confluent Schema RegistryGetting Started with Confluent Schema Registry
Getting Started with Confluent Schema Registryconfluent
 
Apache Hudi: The Path Forward
Apache Hudi: The Path ForwardApache Hudi: The Path Forward
Apache Hudi: The Path ForwardAlluxio, Inc.
 
Deep Dive into Building Streaming Applications with Apache Pulsar
Deep Dive into Building Streaming Applications with Apache Pulsar Deep Dive into Building Streaming Applications with Apache Pulsar
Deep Dive into Building Streaming Applications with Apache Pulsar Timothy Spann
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumTathastu.ai
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark Summit
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
Lessons Learned: Implementing Azure Synapse Analytics in a Rapidly-Changing S...
Lessons Learned: Implementing Azure Synapse Analytics in a Rapidly-Changing S...Lessons Learned: Implementing Azure Synapse Analytics in a Rapidly-Changing S...
Lessons Learned: Implementing Azure Synapse Analytics in a Rapidly-Changing S...Cathrine Wilhelmsen
 
Security and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache AtlasSecurity and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache AtlasDataWorks Summit/Hadoop Summit
 
Kafka Connect - debezium
Kafka Connect - debeziumKafka Connect - debezium
Kafka Connect - debeziumKasun Don
 
Parallelization of Structured Streaming Jobs Using Delta Lake
Parallelization of Structured Streaming Jobs Using Delta LakeParallelization of Structured Streaming Jobs Using Delta Lake
Parallelization of Structured Streaming Jobs Using Delta LakeDatabricks
 
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data PlatformAmazon Web Services
 
Streaming 101 Revisited: A Fresh Hot Take With Tyler Akidau and Dan Sotolongo...
Streaming 101 Revisited: A Fresh Hot Take With Tyler Akidau and Dan Sotolongo...Streaming 101 Revisited: A Fresh Hot Take With Tyler Akidau and Dan Sotolongo...
Streaming 101 Revisited: A Fresh Hot Take With Tyler Akidau and Dan Sotolongo...HostedbyConfluent
 
ORC Column Encryption
ORC Column EncryptionORC Column Encryption
ORC Column EncryptionOwen O'Malley
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeDatabricks
 
Best practices for MySQL High Availability Tutorial
Best practices for MySQL High Availability TutorialBest practices for MySQL High Availability Tutorial
Best practices for MySQL High Availability TutorialColin Charles
 
Write Faster SQL with Trino.pdf
Write Faster SQL with Trino.pdfWrite Faster SQL with Trino.pdf
Write Faster SQL with Trino.pdfEric Xiao
 
Hadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby Node
Hadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby NodeHadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby Node
Hadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby NodeErik Krogen
 
PySpark Best Practices
PySpark Best PracticesPySpark Best Practices
PySpark Best PracticesCloudera, Inc.
 

La actualidad más candente (20)

Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and Zookeeper
 
Real time stock processing with apache nifi, apache flink and apache kafka
Real time stock processing with apache nifi, apache flink and apache kafkaReal time stock processing with apache nifi, apache flink and apache kafka
Real time stock processing with apache nifi, apache flink and apache kafka
 
Getting Started with Confluent Schema Registry
Getting Started with Confluent Schema RegistryGetting Started with Confluent Schema Registry
Getting Started with Confluent Schema Registry
 
Apache Hudi: The Path Forward
Apache Hudi: The Path ForwardApache Hudi: The Path Forward
Apache Hudi: The Path Forward
 
Deep Dive into Building Streaming Applications with Apache Pulsar
Deep Dive into Building Streaming Applications with Apache Pulsar Deep Dive into Building Streaming Applications with Apache Pulsar
Deep Dive into Building Streaming Applications with Apache Pulsar
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and Debezium
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Lessons Learned: Implementing Azure Synapse Analytics in a Rapidly-Changing S...
Lessons Learned: Implementing Azure Synapse Analytics in a Rapidly-Changing S...Lessons Learned: Implementing Azure Synapse Analytics in a Rapidly-Changing S...
Lessons Learned: Implementing Azure Synapse Analytics in a Rapidly-Changing S...
 
Security and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache AtlasSecurity and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache Atlas
 
Kafka Connect - debezium
Kafka Connect - debeziumKafka Connect - debezium
Kafka Connect - debezium
 
Parallelization of Structured Streaming Jobs Using Delta Lake
Parallelization of Structured Streaming Jobs Using Delta LakeParallelization of Structured Streaming Jobs Using Delta Lake
Parallelization of Structured Streaming Jobs Using Delta Lake
 
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
 
Streaming 101 Revisited: A Fresh Hot Take With Tyler Akidau and Dan Sotolongo...
Streaming 101 Revisited: A Fresh Hot Take With Tyler Akidau and Dan Sotolongo...Streaming 101 Revisited: A Fresh Hot Take With Tyler Akidau and Dan Sotolongo...
Streaming 101 Revisited: A Fresh Hot Take With Tyler Akidau and Dan Sotolongo...
 
ORC Column Encryption
ORC Column EncryptionORC Column Encryption
ORC Column Encryption
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
 
Best practices for MySQL High Availability Tutorial
Best practices for MySQL High Availability TutorialBest practices for MySQL High Availability Tutorial
Best practices for MySQL High Availability Tutorial
 
Write Faster SQL with Trino.pdf
Write Faster SQL with Trino.pdfWrite Faster SQL with Trino.pdf
Write Faster SQL with Trino.pdf
 
Hadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby Node
Hadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby NodeHadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby Node
Hadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby Node
 
PySpark Best Practices
PySpark Best PracticesPySpark Best Practices
PySpark Best Practices
 

Destacado

Client Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right WayClient Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right WayDataStax Academy
 
Cassandra Core Concepts - Cassandra Day Toronto
Cassandra Core Concepts - Cassandra Day TorontoCassandra Core Concepts - Cassandra Day Toronto
Cassandra Core Concepts - Cassandra Day TorontoJon Haddad
 
Crash course intro to cassandra
Crash course   intro to cassandraCrash course   intro to cassandra
Crash course intro to cassandraJon Haddad
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core ConceptsJon Haddad
 
Cassandra 3.0 Awesomeness
Cassandra 3.0 AwesomenessCassandra 3.0 Awesomeness
Cassandra 3.0 AwesomenessJon Haddad
 
Diagnosing Problems in Production (Nov 2015)
Diagnosing Problems in Production (Nov 2015)Diagnosing Problems in Production (Nov 2015)
Diagnosing Problems in Production (Nov 2015)Jon Haddad
 
Diagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - CassandraDiagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - CassandraJon Haddad
 
Enter the Snake Pit for Fast and Easy Spark
Enter the Snake Pit for Fast and Easy SparkEnter the Snake Pit for Fast and Easy Spark
Enter the Snake Pit for Fast and Easy SparkJon Haddad
 
Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Jon Haddad
 
Cassandra meetup slides - Oct 15 Santa Monica Coloft
Cassandra meetup slides - Oct 15 Santa Monica ColoftCassandra meetup slides - Oct 15 Santa Monica Coloft
Cassandra meetup slides - Oct 15 Santa Monica ColoftJon Haddad
 
Introduction to Cassandra - Denver
Introduction to Cassandra - DenverIntroduction to Cassandra - Denver
Introduction to Cassandra - DenverJon Haddad
 
Diagnosing Problems in Production: Cassandra Summit 2014
Diagnosing Problems in Production: Cassandra Summit 2014Diagnosing Problems in Production: Cassandra Summit 2014
Diagnosing Problems in Production: Cassandra Summit 2014Jon Haddad
 
Python & Cassandra - Best Friends
Python & Cassandra - Best FriendsPython & Cassandra - Best Friends
Python & Cassandra - Best FriendsJon Haddad
 
Intro to Cassandra
Intro to CassandraIntro to Cassandra
Intro to CassandraJon Haddad
 
Python performance profiling
Python performance profilingPython performance profiling
Python performance profilingJon Haddad
 
Python internals and how they affect your code - kasra ahmadvand
Python internals and how they affect your code - kasra ahmadvandPython internals and how they affect your code - kasra ahmadvand
Python internals and how they affect your code - kasra ahmadvandirpycon
 
SanDisk: Persistent Memory and Cassandra
SanDisk: Persistent Memory and CassandraSanDisk: Persistent Memory and Cassandra
SanDisk: Persistent Memory and CassandraDataStax Academy
 
Introduction to Cassandra Architecture
Introduction to Cassandra ArchitectureIntroduction to Cassandra Architecture
Introduction to Cassandra Architecturenickmbailey
 
CQL In Cassandra 1.0 (and beyond)
CQL In Cassandra 1.0 (and beyond)CQL In Cassandra 1.0 (and beyond)
CQL In Cassandra 1.0 (and beyond)Eric Evans
 
Cassandra Materialized Views
Cassandra Materialized ViewsCassandra Materialized Views
Cassandra Materialized ViewsCarl Yeksigian
 

Destacado (20)

Client Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right WayClient Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right Way
 
Cassandra Core Concepts - Cassandra Day Toronto
Cassandra Core Concepts - Cassandra Day TorontoCassandra Core Concepts - Cassandra Day Toronto
Cassandra Core Concepts - Cassandra Day Toronto
 
Crash course intro to cassandra
Crash course   intro to cassandraCrash course   intro to cassandra
Crash course intro to cassandra
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Cassandra 3.0 Awesomeness
Cassandra 3.0 AwesomenessCassandra 3.0 Awesomeness
Cassandra 3.0 Awesomeness
 
Diagnosing Problems in Production (Nov 2015)
Diagnosing Problems in Production (Nov 2015)Diagnosing Problems in Production (Nov 2015)
Diagnosing Problems in Production (Nov 2015)
 
Diagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - CassandraDiagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - Cassandra
 
Enter the Snake Pit for Fast and Easy Spark
Enter the Snake Pit for Fast and Easy SparkEnter the Snake Pit for Fast and Easy Spark
Enter the Snake Pit for Fast and Easy Spark
 
Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)
 
Cassandra meetup slides - Oct 15 Santa Monica Coloft
Cassandra meetup slides - Oct 15 Santa Monica ColoftCassandra meetup slides - Oct 15 Santa Monica Coloft
Cassandra meetup slides - Oct 15 Santa Monica Coloft
 
Introduction to Cassandra - Denver
Introduction to Cassandra - DenverIntroduction to Cassandra - Denver
Introduction to Cassandra - Denver
 
Diagnosing Problems in Production: Cassandra Summit 2014
Diagnosing Problems in Production: Cassandra Summit 2014Diagnosing Problems in Production: Cassandra Summit 2014
Diagnosing Problems in Production: Cassandra Summit 2014
 
Python & Cassandra - Best Friends
Python & Cassandra - Best FriendsPython & Cassandra - Best Friends
Python & Cassandra - Best Friends
 
Intro to Cassandra
Intro to CassandraIntro to Cassandra
Intro to Cassandra
 
Python performance profiling
Python performance profilingPython performance profiling
Python performance profiling
 
Python internals and how they affect your code - kasra ahmadvand
Python internals and how they affect your code - kasra ahmadvandPython internals and how they affect your code - kasra ahmadvand
Python internals and how they affect your code - kasra ahmadvand
 
SanDisk: Persistent Memory and Cassandra
SanDisk: Persistent Memory and CassandraSanDisk: Persistent Memory and Cassandra
SanDisk: Persistent Memory and Cassandra
 
Introduction to Cassandra Architecture
Introduction to Cassandra ArchitectureIntroduction to Cassandra Architecture
Introduction to Cassandra Architecture
 
CQL In Cassandra 1.0 (and beyond)
CQL In Cassandra 1.0 (and beyond)CQL In Cassandra 1.0 (and beyond)
CQL In Cassandra 1.0 (and beyond)
 
Cassandra Materialized Views
Cassandra Materialized ViewsCassandra Materialized Views
Cassandra Materialized Views
 

Similar a Python and cassandra

Preparse Query Rewrite Plugins
Preparse Query Rewrite PluginsPreparse Query Rewrite Plugins
Preparse Query Rewrite PluginsSveta Smirnova
 
DBA Commands and Concepts That Every Developer Should Know - Part 2
DBA Commands and Concepts That Every Developer Should Know - Part 2DBA Commands and Concepts That Every Developer Should Know - Part 2
DBA Commands and Concepts That Every Developer Should Know - Part 2Alex Zaballa
 
DBA Commands and Concepts That Every Developer Should Know - Part 2
DBA Commands and Concepts That Every Developer Should Know - Part 2DBA Commands and Concepts That Every Developer Should Know - Part 2
DBA Commands and Concepts That Every Developer Should Know - Part 2Alex Zaballa
 
Getting Started with Datatsax .Net Driver
Getting Started with Datatsax .Net DriverGetting Started with Datatsax .Net Driver
Getting Started with Datatsax .Net DriverDataStax Academy
 
PostgreSQL Performance Problems: Monitoring and Alerting
PostgreSQL Performance Problems: Monitoring and AlertingPostgreSQL Performance Problems: Monitoring and Alerting
PostgreSQL Performance Problems: Monitoring and AlertingGrant Fritchey
 
DjangoCon 2010 Scaling Disqus
DjangoCon 2010 Scaling DisqusDjangoCon 2010 Scaling Disqus
DjangoCon 2010 Scaling Disquszeeg
 
Cassandra Day NY 2014: Getting Started with the DataStax C# Driver
Cassandra Day NY 2014: Getting Started with the DataStax C# DriverCassandra Day NY 2014: Getting Started with the DataStax C# Driver
Cassandra Day NY 2014: Getting Started with the DataStax C# DriverDataStax Academy
 
Building and Scaling Node.js Applications
Building and Scaling Node.js ApplicationsBuilding and Scaling Node.js Applications
Building and Scaling Node.js ApplicationsOhad Kravchick
 
Staying Ahead of the Curve with Spring and Cassandra 4
Staying Ahead of the Curve with Spring and Cassandra 4Staying Ahead of the Curve with Spring and Cassandra 4
Staying Ahead of the Curve with Spring and Cassandra 4VMware Tanzu
 
Staying Ahead of the Curve with Spring and Cassandra 4 (SpringOne 2020)
Staying Ahead of the Curve with Spring and Cassandra 4 (SpringOne 2020)Staying Ahead of the Curve with Spring and Cassandra 4 (SpringOne 2020)
Staying Ahead of the Curve with Spring and Cassandra 4 (SpringOne 2020)Alexandre Dutra
 
Planet-HTML5-Game-Engine Javascript Performance Enhancement
Planet-HTML5-Game-Engine Javascript Performance EnhancementPlanet-HTML5-Game-Engine Javascript Performance Enhancement
Planet-HTML5-Game-Engine Javascript Performance Enhancementup2soul
 
Scaling MySQL Strategies for Developers
Scaling MySQL Strategies for DevelopersScaling MySQL Strategies for Developers
Scaling MySQL Strategies for DevelopersJonathan Levin
 
(DEV204) Building High-Performance Native Cloud Apps In C++
(DEV204) Building High-Performance Native Cloud Apps In C++(DEV204) Building High-Performance Native Cloud Apps In C++
(DEV204) Building High-Performance Native Cloud Apps In C++Amazon Web Services
 
Performance Test Driven Development with Oracle Coherence
Performance Test Driven Development with Oracle CoherencePerformance Test Driven Development with Oracle Coherence
Performance Test Driven Development with Oracle Coherencearagozin
 
Built-in query caching for all PHP MySQL extensions/APIs
Built-in query caching for all PHP MySQL extensions/APIsBuilt-in query caching for all PHP MySQL extensions/APIs
Built-in query caching for all PHP MySQL extensions/APIsUlf Wendel
 
New features in Performance Schema 5.7 in action
New features in Performance Schema 5.7 in actionNew features in Performance Schema 5.7 in action
New features in Performance Schema 5.7 in actionSveta Smirnova
 
Introduction to Apache Mesos
Introduction to Apache MesosIntroduction to Apache Mesos
Introduction to Apache MesosJoe Stein
 
Where the wild things are - Benchmarking and Micro-Optimisations
Where the wild things are - Benchmarking and Micro-OptimisationsWhere the wild things are - Benchmarking and Micro-Optimisations
Where the wild things are - Benchmarking and Micro-OptimisationsMatt Warren
 
Query Optimization with MySQL 5.6: Old and New Tricks
Query Optimization with MySQL 5.6: Old and New TricksQuery Optimization with MySQL 5.6: Old and New Tricks
Query Optimization with MySQL 5.6: Old and New TricksMYXPLAIN
 

Similar a Python and cassandra (20)

Preparse Query Rewrite Plugins
Preparse Query Rewrite PluginsPreparse Query Rewrite Plugins
Preparse Query Rewrite Plugins
 
DBA Commands and Concepts That Every Developer Should Know - Part 2
DBA Commands and Concepts That Every Developer Should Know - Part 2DBA Commands and Concepts That Every Developer Should Know - Part 2
DBA Commands and Concepts That Every Developer Should Know - Part 2
 
DBA Commands and Concepts That Every Developer Should Know - Part 2
DBA Commands and Concepts That Every Developer Should Know - Part 2DBA Commands and Concepts That Every Developer Should Know - Part 2
DBA Commands and Concepts That Every Developer Should Know - Part 2
 
Getting Started with Datatsax .Net Driver
Getting Started with Datatsax .Net DriverGetting Started with Datatsax .Net Driver
Getting Started with Datatsax .Net Driver
 
PostgreSQL Performance Problems: Monitoring and Alerting
PostgreSQL Performance Problems: Monitoring and AlertingPostgreSQL Performance Problems: Monitoring and Alerting
PostgreSQL Performance Problems: Monitoring and Alerting
 
DjangoCon 2010 Scaling Disqus
DjangoCon 2010 Scaling DisqusDjangoCon 2010 Scaling Disqus
DjangoCon 2010 Scaling Disqus
 
Cassandra Day NY 2014: Getting Started with the DataStax C# Driver
Cassandra Day NY 2014: Getting Started with the DataStax C# DriverCassandra Day NY 2014: Getting Started with the DataStax C# Driver
Cassandra Day NY 2014: Getting Started with the DataStax C# Driver
 
Building and Scaling Node.js Applications
Building and Scaling Node.js ApplicationsBuilding and Scaling Node.js Applications
Building and Scaling Node.js Applications
 
Staying Ahead of the Curve with Spring and Cassandra 4
Staying Ahead of the Curve with Spring and Cassandra 4Staying Ahead of the Curve with Spring and Cassandra 4
Staying Ahead of the Curve with Spring and Cassandra 4
 
Staying Ahead of the Curve with Spring and Cassandra 4 (SpringOne 2020)
Staying Ahead of the Curve with Spring and Cassandra 4 (SpringOne 2020)Staying Ahead of the Curve with Spring and Cassandra 4 (SpringOne 2020)
Staying Ahead of the Curve with Spring and Cassandra 4 (SpringOne 2020)
 
Planet-HTML5-Game-Engine Javascript Performance Enhancement
Planet-HTML5-Game-Engine Javascript Performance EnhancementPlanet-HTML5-Game-Engine Javascript Performance Enhancement
Planet-HTML5-Game-Engine Javascript Performance Enhancement
 
Scaling MySQL Strategies for Developers
Scaling MySQL Strategies for DevelopersScaling MySQL Strategies for Developers
Scaling MySQL Strategies for Developers
 
(DEV204) Building High-Performance Native Cloud Apps In C++
(DEV204) Building High-Performance Native Cloud Apps In C++(DEV204) Building High-Performance Native Cloud Apps In C++
(DEV204) Building High-Performance Native Cloud Apps In C++
 
Performance Test Driven Development with Oracle Coherence
Performance Test Driven Development with Oracle CoherencePerformance Test Driven Development with Oracle Coherence
Performance Test Driven Development with Oracle Coherence
 
Built-in query caching for all PHP MySQL extensions/APIs
Built-in query caching for all PHP MySQL extensions/APIsBuilt-in query caching for all PHP MySQL extensions/APIs
Built-in query caching for all PHP MySQL extensions/APIs
 
New features in Performance Schema 5.7 in action
New features in Performance Schema 5.7 in actionNew features in Performance Schema 5.7 in action
New features in Performance Schema 5.7 in action
 
Introduction to Apache Mesos
Introduction to Apache MesosIntroduction to Apache Mesos
Introduction to Apache Mesos
 
Where the wild things are - Benchmarking and Micro-Optimisations
Where the wild things are - Benchmarking and Micro-OptimisationsWhere the wild things are - Benchmarking and Micro-Optimisations
Where the wild things are - Benchmarking and Micro-Optimisations
 
Rails Security
Rails SecurityRails Security
Rails Security
 
Query Optimization with MySQL 5.6: Old and New Tricks
Query Optimization with MySQL 5.6: Old and New TricksQuery Optimization with MySQL 5.6: Old and New Tricks
Query Optimization with MySQL 5.6: Old and New Tricks
 

Último

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 

Último (20)

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

Python and cassandra

  • 1. ©2013 DataStax Confidential. Do not distribute without consent. @rustyrazorblade Jon Haddad
 Technical Evangelist, DataStax Python & Cassandra 1
  • 2. This should be boring • Talking to a database should not be any of the following: • Exciting • "AH HA!" • Confusing git@github.com:rustyrazorblade/python-presentation.git
  • 3. Agenda • Go over driver basic concepts • Connecting • Perform queries • Introduce object mapper (cqlengine) • Application integration
  • 4. DataStax Native Python Driver • Talks to Cassandra • Connection pooling • Aware of cluster topology • Automatic retries / failure management • Load balancing • Will include object mapper (cqlengine) in next release • Fully Open Source (Apache License)
  • 5. Connect to Cassandra • Import and create a Cluster instance • Cluster takes options such as load balancing policy, reconnect policy, retry policy • On connection, driver discovers entire cluster automatically
  • 6. Executing queries • CQL: Similar to SQL • session.execute() • Create tables, insert, selects • Can accept simple strings • Not token aware
  • 7. Prepared Statements • Use for all queries (inserts / updates / deletes) • Decrease server load • Increase security • Allows for token aware queries
  • 8. Async Queries • Prepared statements required! • Much faster than sync • Utilize the entire cluster • Driver can help us here • We can use futures
  • 9. 1 statement = """INSERT INTO sensor 2 (sensor_id, name, created_at) 3 VALUES (?, ?, ?)""" 4 5 insert_sensor = session.prepare(statement) 6 7 def create_sensor_entries_callback(response, sensor_id): 8 print "CALLBACK" 9 10 for x in range(10): 11 sensor_data = (uuid.uuid4(), "sensor %d" % x, datetime.now()) 12 future = session.execute_async(insert_sensor, sensor_data) 13 future.add_callback(create_sensor_entries_callback, sensor_id) 14 Async Queries w/ Callbacks callback function add callback
  • 10. 1 from cassandra.concurrent import execute_concurrent_with_args 2 3 stmt = """SELECT * FROM sensor_data WHERE sensor_id=? 4 ORDER BY created_at DESC LIMIT 1""") 5 6 select_statement = session.prepare(stmt) 7 8 sensor_ids = [["f472d5ff-0c76-404a-8044-038db416685e"], 9 ["940cb741-d5b5-4c5d-82f5-bf1aa61c6d47"], 10 ["497d4b2c-cba2-4d0f-bd80-42de612690fd"], 11 ["1bdeac75-7e12-43ba-80b5-2d38405f9843"] 12 13 result = execute_concurrent_with_args(session, select_statement, sensor_ids) Async Queries (managed) prepared statement automatically manages concurrency
  • 11. Performance Considerations • Like SQL, CQL features IN() but in general, it's terrible for performance • Results in more GC & perf problems • BATCH has the same issue • Failure to get a single result causes entire IN() or batch to retry
  • 13. Defining Models • Each model maps to a single table • Every model inherits from cassandra.cqlengine.models.Model • Define fields in your table programatically • Collections map to native Python types (lists, sets, dict) • Table management included (no need to write ALTER)
  • 14. Model with Collections • Sets & Maps are most useful • Use to denormalize • Lists can have performance issues if misused 1 class Message(Model): 2 message_id = TimeUUID(primary_key=True, default=uuid1) 3 subject = Text() 4 body = Text() 5 addressed_to = Set(UUID) 6 7 class Photo(Model): 8 photo_id = UUID(primary_key=True, default=uuid4) 9 title = Text() 10 likes = Map<UUID, Text>
  • 15. Clustering Keys • Automatically determined by ordering in model • First primary key is partition key • The rest are clustering keys 1 class UsersInGroup(Model): 2 group_id = UUID(primary_key=True) 3 user_id = UUID(primary_key=True) 4 is_admin = Boolean() 5 6 1 class UsersInGroupByState(Model): 2 group_id = UUID(primary_key=True, partition_key=True) 3 state = Text(primary_key=True, partition_key=True 4 user_id = UUID(primary_key=True) 5 is_admin = Boolean(default=False)
  • 16. Inserting Data • Model.create(**kwargs) • Performs validation • Supports custom validation • Supports TTLs
  • 17. Lightweight Transactions • Uses paxos for consensus • IF NOT EXISTS for INSERT • IF FIELD=VALUE for UPDATE • Use sparingly - requires several round trips
  • 18. Batches • Use only to maintain multiple views (for consistency purposes) 1 class User(Model): 2 name = Text(primary_key=True) 3 twitter = Text() 4 email = Text() 5 6 class TwitterToUser(Model): 7 twitter = Text(primary_key=True) 8 name = Text() 9 10 (twitter, name) = ("rustyrazorblade", "jon") 11 12 with BatchQuery() as b: 13 User.batch(b).create(name=name, twitter=twitter) 14 EmailToUser.batch(b).create(twitter=twitter, name=name)
  • 19. Fetching a Row • Model.get() can be used to fetch a single row • Will throw a DoesNotExist exception if not found
  • 20. Fetching Many Rows • Model.objects() accepts any filter acceptable to Cassandra
  • 21. Table Properties • Every table option supported • Compaction • gc_grace_seconds • read repair chance • caching
  • 22. Table Inheritance • Multiple tables with similar fields • Query Pattern: filtering
  • 23. Table Polymorphism • Similar to inheritance • Uses a single table • Query pattern: select all types
  • 25. Virtual Environments • virtualenv is your friend! • mkvirtualenv also your friend! • pip install mkvirtualenv Flask==0.10.1 blist==1.3.6 cassandra-driver==2.1.2 Flask==0.9.0 rednose==0.4.1 ipdb==0.7 ipdbplugin==1.2 ipython==2.3.1 mock==1.0.1 nose==1.3.4 All sandboxed environments
  • 26. Integrations • Django • django-cassandra-engine • Integrates with manage.py • Flask • use @app.before_first_request • General rule: connect post-fork
  • 28. ©2013 DataStax Confidential. Do not distribute without consent. 28