Cassandra for Python Developers

     Tyler Hobbs
History

    Open-sourced by Facebook   2008

    Apache Incubator           2009

    Top-level Apache project   2010

    DataStax founded           2010
Strengths

    Scalable
    – 2x Nodes == 2x Performance

    Reliable (Available)
    – Replication that works
    – Multi-DC support
    – No single point of failure
Strengths

    Fast
    – 10-30k writes/sec, 1-10k reads/sec

    Analytics
    – Integrated Hadoop support
Weaknesses

    No ACID transactions
    – Don't need these as often as you'd think

    Limited support for ad-hoc queries
    – You'll give these up anyway when sharding an RDBMS


    Generally complements another system
    – Not intended to be one-size-fits-all
Clustering

    Every node plays the same role
    – No masters, slaves, or special nodes
Clustering

    [Ring diagram: six nodes at token positions 0, 10, 20, 30, 40, and 50]

    Key: "www.google.com"
    md5("www.google.com") places the key at position 14 on the ring
    Replication Factor = 3
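
The ring lookup can be sketched in plain Python. This is a toy model, not Cassandra's implementation: the 0-59 token space, the helper names `toy_token` and `replicas_for_token`, and the placement rule (a key belongs to the first node whose token is at or after the key's token, with further replicas on the next nodes clockwise) are assumptions made for illustration.

```python
import bisect
import hashlib

NODE_TOKENS = [0, 10, 20, 30, 40, 50]  # ring positions taken from the slide
RING_SIZE = 60                         # assumed size of the toy token space


def toy_token(key):
    """Hash a key with md5 and fold the digest into the toy token space."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % RING_SIZE


def replicas_for_token(token, replication_factor=3):
    """Return the node tokens that would hold a key with the given token."""
    # First node at or after the token; wrap around past the last node.
    start = bisect.bisect_left(NODE_TOKENS, token) % len(NODE_TOKENS)
    return [NODE_TOKENS[(start + i) % len(NODE_TOKENS)]
            for i in range(replication_factor)]


# The slide's example token, 14, under the assumed placement rule.
print(replicas_for_token(14))
```

Under these assumptions the key at position 14 is stored on the nodes at 20, 30, and 40. A real md5 digest folded into this toy space will generally not be 14; the slide uses small numbers for readability.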
Clustering
   Client can talk to any node
Data Model

    Keyspace
    – A collection of Column Families
    – Controls replication settings

    Column Family
    – Kinda resembles a table
Column Families

    Static
    – Object data

    Dynamic
    – Pre-calculated query results
Static Column Families
                   Users
   zznate    password: *    name: Nate


   driftx    password: *   name: Brandon


   thobbs    password: *    name: Tyler


   jbellis   password: *   name: Jonathan   site: riptano.com
Dynamic Column Families
                     Following
zznate    driftx:   thobbs:


driftx


thobbs    zznate:


jbellis   driftx:   mdennis:   pcmanus:   thobbs:   xedin:   zznate:
Dynamic Column Families

    Timeline of tweets by a user

    Timeline of tweets by all of the people a
    user is following

    List of comments sorted by score

    List of friends grouped by state
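
One way to picture the first use case, a timeline of tweets by a user, is one row per user with one column per tweet, where the column name is a timestamp so the row stays in chronological order. The sketch below models that with plain Python data structures. The layout and the names `timelines`, `add_tweet`, and `latest_tweets` are hypothetical and do not come from the slides or from Pycassa.

```python
import bisect

# One "row" per user. Each row is a list of (timestamp, tweet_id) columns
# kept sorted by timestamp, the way a row keeps columns sorted by name.
timelines = {}


def add_tweet(user, timestamp, tweet_id):
    """Insert a column into the user's row at its sorted position."""
    row = timelines.setdefault(user, [])
    bisect.insort(row, (timestamp, tweet_id))


def latest_tweets(user, count=2):
    """Read the row in reverse order and keep the first `count` columns."""
    row = timelines.get(user, [])
    return [tweet_id for _, tweet_id in row[::-1][:count]]


add_tweet("thobbs", 30, "t3")
add_tweet("thobbs", 10, "t1")
add_tweet("thobbs", 20, "t2")
print(latest_tweets("thobbs"))
```

The query result is precomputed by the write path: reading the newest tweets is a reversed slice of a single row rather than a sort at read time.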
Pycassa


    Python client library for Cassandra

    Open Source (MIT License)
    – www.github.com/pycassa/pycassa

    Users
    – Reddit
    – ~10k GitHub downloads of every version
Installing Pycassa

    easy_install pycassa
    – or pip
Basic Layout

    pycassa.pool
    – Connection pooling

    pycassa.columnfamily
    – Primary module for the data API

    pycassa.system_manager
    – Schema management
The Data API

    RPC-based API

    Rows are like a sorted list of (name,value)
    tuples
    – Like a dict, but sorted by the names
    – OrderedDicts are used to preserve sorting
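
The "dict sorted by names" idea can be modelled without a cluster. The helper below, `merge_columns`, is a hypothetical stand-in for what a write does to a row; it is not a Pycassa function, and it ignores timestamps and column types.

```python
from collections import OrderedDict


def merge_columns(row, new_columns):
    """Return a new row with new_columns written over row, ordered by name."""
    merged = dict(row)
    merged.update(new_columns)
    return OrderedDict(sorted(merged.items()))


row = merge_columns({}, {"aaa": 1, "ccc": 3})
row = merge_columns(row, {"aaa": 42})           # an update is just another insert
row = merge_columns(row, {"bbb": 2, "ddd": 4})  # new columns land in sorted position
print(list(row.items()))
```

Whatever order columns are written in, reading the row back yields them sorted by name, which is what makes the column slices shown later possible.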
Inserting Data
>>> from pycassa.pool import ConnectionPool
>>> from pycassa.columnfamily import ColumnFamily
>>>
>>> pool = ConnectionPool("MyKeyspace")
>>> cf = ColumnFamily(pool, "MyCF")
>>>
>>> cf.insert("key", {"col_name": "col_value"})
>>> cf.get("key")
{"col_name": "col_value"}
Inserting Data
>>> columns = {"aaa": 1, "ccc": 3}
>>> cf.insert("key", columns)
>>> cf.get("key")
{"aaa": 1, "ccc": 3}
>>>
>>> # Updates are the same as inserts
>>> cf.insert("key", {"aaa": 42})
>>> cf.get("key")
{"aaa": 42, "ccc": 3}
>>>
>>> # We can insert anywhere in the row
>>> cf.insert("key", {"bbb": 2, "ddd": 4})
>>> cf.get("key")
{"aaa": 42, "bbb": 2, "ccc": 3, "ddd": 4}
Fetching Data
>>> cf.get("key")
{"aaa": 42, "bbb": 2, "ccc": 3, "ddd": 4}
>>>
>>> # Get a set of columns by name
>>> cf.get("key", columns=["bbb", "ddd"])
{"bbb": 2, "ddd": 4}
Fetching Data
>>> # Get a slice of columns
>>> cf.get("key", column_start="bbb",
...               column_finish="ccc")
{"bbb": 2, "ccc": 3}
>>>
>>> # Slice from "ccc" to the end
>>> cf.get("key", column_start="ccc")
{"ccc": 3, "ddd": 4}
>>>
>>> # Slice from "bbb" to the beginning
>>> cf.get("key", column_start="bbb",
...               column_reversed=True)
{"bbb": 2, "aaa": 42}
Fetching Data
>>> # Get the first two columns in the row
>>> cf.get("key", column_count=2)
{"aaa": 42, "bbb": 2}
>>>
>>> # Get the last two columns in the row
>>> cf.get("key", column_reversed=True,
...               column_count=2)
{"ddd": 4, "ccc": 3}
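
The slicing rules in the Fetching Data examples can be reproduced with a small in-memory model. `slice_row` below is a hypothetical helper written for this sketch, not part of Pycassa. Its keyword names mirror the ones used in the slides, and the default values (empty strings for no bound, a count of 100) are assumptions.

```python
from collections import OrderedDict


def slice_row(row, column_start="", column_finish="",
              column_reversed=False, column_count=100):
    """Return the columns of `row` selected by a start/finish/count slice."""
    selected = OrderedDict()
    for name in sorted(row, reverse=column_reversed):
        if column_reversed:
            # Walking from the end of the row back toward the beginning.
            if column_start and name > column_start:
                continue
            if column_finish and name < column_finish:
                break
        else:
            if column_start and name < column_start:
                continue
            if column_finish and name > column_finish:
                break
        if len(selected) >= column_count:
            break
        selected[name] = row[name]
    return selected


row = {"aaa": 42, "bbb": 2, "ccc": 3, "ddd": 4}
print(list(slice_row(row, column_start="bbb", column_reversed=True).items()))
```

Both bounds are inclusive in this model, matching the slide where a slice from "bbb" to "ccc" returns both columns.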
Fetching Multiple Rows
>>> columns = {"col": "val"}
>>> cf.batch_insert({"k1": columns,
...                  "k2": columns,
...                  "k3": columns})
>>>
>>> # Get multiple rows by name
>>> cf.multiget(["k1", "k2"])
{"k1": {"col": "val"},
 "k2": {"col": "val"}}

>>> # You can get slices of each row, too
>>> cf.multiget(["k1", "k2"], column_start="bbb") …
Fetching a Range of Rows
>>> # Get a generator over all of the rows
>>> for key, columns in cf.get_range():
...     print key, columns
"k1" {"col": "val"}
"k2" {"col": "val"}
"k3" {"col": "val"}

>>> # You can get slices of each row
>>> cf.get_range(column_start="bbb") …
Fetching Rows by Secondary Index
>>> from pycassa.index import *
>>>
>>> # Build up our index clause to match
>>> exp = create_index_expression("name", "Joe")
>>> clause = create_index_clause([exp])
>>> matches = users.get_indexed_slices(clause)
>>>
>>> # matches is a generator over matching rows
>>> for key, columns in matches:
...     print key, columns
"13" {"name": "Joe", "nick": "thatguy2"}
"257" {"name": "Joe", "nick": "flowers"}
"98" {"name": "Joe", "nick": "fr0d0"}
Deleting Data
>>> # Delete a whole row
>>> cf.remove("key1")
>>>
>>> # Or selectively delete columns
>>> cf.remove("key2", columns=["name", "date"])
Connection Management

    pycassa.pool.ConnectionPool
    – Takes a list of servers
        • Can be any set of nodes in your cluster
    – pool_size, max_retries, timeout
    – Automatically retries operations against other nodes
        • Writes are idempotent!
    – Individual node failures are transparent
    – Thread safe
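
The retry behaviour can be sketched as a loop over the server list. This is an illustration of the idea only, not Pycassa's pooling code: `NodeDown`, `run_with_failover`, `flaky_insert`, and the server addresses are hypothetical. Retrying is safe here because writes are idempotent, so applying the same insert twice leaves the row in the same state.

```python
import itertools


class NodeDown(Exception):
    """Raised by the stand-in operation when a node cannot be reached."""


def run_with_failover(servers, operation, max_retries=5):
    """Try `operation` on each server in turn until one succeeds."""
    if not servers:
        raise ValueError("at least one server is required")
    last_error = None
    # One initial attempt plus max_retries retries, cycling through the nodes.
    for server in itertools.islice(itertools.cycle(servers), max_retries + 1):
        try:
            return operation(server)
        except NodeDown as exc:
            last_error = exc
    raise last_error


def flaky_insert(server):
    """Stand-in for a write; pretends the first node is unreachable."""
    if server == "10.0.0.1:9160":
        raise NodeDown(server)
    return "written via " + server


result = run_with_failover(["10.0.0.1:9160", "10.0.0.2:9160"], flaky_insert)
print(result)
```

From the caller's point of view the failed node is invisible: the call returns normally once any node accepts the write, and an error surfaces only after every attempt has failed.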
Async Options

    eventlet
    – Just need to monkeypatch socket and threading

    Twisted
    – Use Telephus instead of Pycassa
    – www.github.com/driftx/telephus
    – Less friendly, documented, etc
Tyler Hobbs
        @tylhobbs
tyler@datastax.com

Cassandra for Python Developers

  • 1. Cassandra for Python Developers Tyler Hobbs
  • 2. History
      • Open-sourced by Facebook: 2008
      • Apache Incubator: 2009
      • Top-level Apache project: 2010
      • DataStax founded: 2010
  • 3. Strengths
      • Scalable
          – 2x Nodes == 2x Performance
      • Reliable (Available)
          – Replication that works
          – Multi-DC support
          – No single point of failure
  • 4. Strengths
      • Fast
          – 10-30k writes/sec, 1-10k reads/sec
      • Analytics
          – Integrated Hadoop support
  • 5. Weaknesses
      • No ACID transactions
          – Don't need these as often as you'd think
      • Limited support for ad-hoc queries
          – You'll give these up anyway when sharding an RDBMS
      • Generally complements another system
          – Not intended to be one-size-fits-all
  • 6. Clustering
      • Every node plays the same role
          – No masters, slaves, or special nodes
  • 7. Clustering (ring diagram with positions 0, 10, 20, 30, 40, 50)
  • 8. Clustering (same ring) Key: "www.google.com"
  • 9-11. Clustering (same ring) Key: "www.google.com"; md5("www.google.com") gives 14
  • 12. Clustering (same ring) Key: "www.google.com"; md5("www.google.com") gives 14; Replication Factor = 3
  • 13. Clustering
      • Client can talk to any node
  • 14. Data Model
      • Keyspace
          – A collection of Column Families
          – Controls replication settings
      • Column Family
          – Kinda resembles a table
  • 15. Column Families
      • Static
          – Object data
      • Dynamic
          – Pre-calculated query results
  • 16. Static Column Families
      Users
        zznate    password: *    name: Nate
        driftx    password: *    name: Brandon
        thobbs    password: *    name: Tyler
        jbellis   password: *    name: Jonathan    site: riptano.com
  • 17. Dynamic Column Families Following zznate driftx: thobbs: driftx thobbs zznate: jbellis driftx: mdennis: pcmanus thobbs: xedin: zznate
  • 18. Dynamic Column Families
      • Timeline of tweets by a user
      • Timeline of tweets by all of the people a user is following
      • List of comments sorted by score
      • List of friends grouped by state
  • 19. Pycassa
      • Python client library for Cassandra
      • Open Source (MIT License)
          – www.github.com/pycassa/pycassa
      • Users
          – Reddit
          – ~10k github downloads of every version
  • 20. Installing Pycassa
      • easy_install pycassa
          – or pip
  • 21. Basic Layout
      • pycassa.pool
          – Connection pooling
      • pycassa.columnfamily
          – Primary module for the data API
      • pycassa.system_manager
          – Schema management
  • 22. The Data API
      • RPC-based API
      • Rows are like a sorted list of (name,value) tuples
          – Like a dict, but sorted by the names
          – OrderedDicts are used to preserve sorting
  • 23. Inserting Data
      >>> from pycassa.pool import ConnectionPool
      >>> from pycassa.columnfamily import ColumnFamily
      >>>
      >>> pool = ConnectionPool("MyKeyspace")
      >>> cf = ColumnFamily(pool, "MyCF")
      >>>
      >>> cf.insert("key", {"col_name": "col_value"})
      >>> cf.get("key")
      {"col_name": "col_value"}
  • 24. Inserting Data
      >>> columns = {"aaa": 1, "ccc": 3}
      >>> cf.insert("key", columns)
      >>> cf.get("key")
      {"aaa": 1, "ccc": 3}
      >>>
      >>> # Updates are the same as inserts
      >>> cf.insert("key", {"aaa": 42})
      >>> cf.get("key")
      {"aaa": 42, "ccc": 3}
      >>>
      >>> # We can insert anywhere in the row
      >>> cf.insert("key", {"bbb": 2, "ddd": 4})
      >>> cf.get("key")
      {"aaa": 42, "bbb": 2, "ccc": 3, "ddd": 4}
  • 25. Fetching Data
      >>> cf.get("key")
      {"aaa": 42, "bbb": 2, "ccc": 3, "ddd": 4}
      >>>
      >>> # Get a set of columns by name
      >>> cf.get("key", columns=["bbb", "ddd"])
      {"bbb": 2, "ddd": 4}
  • 26. Fetching Data
      >>> # Get a slice of columns
      >>> cf.get("key", column_start="bbb",
      ...        column_finish="ccc")
      {"bbb": 2, "ccc": 3}
      >>>
      >>> # Slice from "ccc" to the end
      >>> cf.get("key", column_start="ccc")
      {"ccc": 3, "ddd": 4}
      >>>
      >>> # Slice from "bbb" to the beginning
      >>> cf.get("key", column_start="bbb",
      ...        column_reversed=True)
      {"bbb": 2, "aaa": 42}
  • 27. Fetching Data
      >>> # Get the first two columns in the row
      >>> cf.get("key", column_count=2)
      {"aaa": 42, "bbb": 2}
      >>>
      >>> # Get the last two columns in the row
      >>> cf.get("key", column_reversed=True,
      ...        column_count=2)
      {"ddd": 4, "ccc": 3}
  • 28. Fetching Multiple Rows
      >>> columns = {"col": "val"}
      >>> cf.batch_insert({"k1": columns,
      ...                  "k2": columns,
      ...                  "k3": columns})
      >>>
      >>> # Get multiple rows by name
      >>> cf.multiget(["k1", "k2"])
      {"k1": {"col": "val"}, "k2": {"col": "val"}}
      >>> # You can get slices of each row, too
      >>> cf.multiget(["k1", "k2"], column_start="bbb")
      …
  • 29. Fetching a Range of Rows
      >>> # Get a generator over all of the rows
      >>> for key, columns in cf.get_range():
      ...     print key, columns
      "k1" {"col": "val"}
      "k2" {"col": "val"}
      "k3" {"col": "val"}
      >>> # You can get slices of each row
      >>> cf.get_range(column_start="bbb")
      …
  • 30. Fetching Rows by Secondary Index
      >>> from pycassa.index import *
      >>>
      >>> # Build up our index clause to match
      >>> exp = create_index_expression("name", "Joe")
      >>> clause = create_index_clause([exp])
      >>> matches = users.get_indexed_slices(clause)
      >>>
      >>> # matches is a generator over matching rows
      >>> for key, columns in matches:
      ...     print key, columns
      "13" {"name": "Joe", "nick": "thatguy2"}
      "257" {"name": "Joe", "nick": "flowers"}
      "98" {"name": "Joe", "nick": "fr0d0"}
  • 31. Deleting Data
      >>> # Delete a whole row
      >>> cf.remove("key1")
      >>>
      >>> # Or selectively delete columns
      >>> cf.remove("key2", columns=["name", "date"])
  • 32. Connection Management
      • pycassa.pool.ConnectionPool
          – Takes a list of servers
              • Can be any set of nodes in your cluster
          – pool_size, max_retries, timeout
          – Automatically retries operations against other nodes
              • Writes are idempotent!
          – Individual node failures are transparent
          – Thread safe
  • 33. Async Options
      • eventlet
          – Just need to monkeypatch socket and threading
      • Twisted
          – Use Telephus instead of Pycassa
          – www.github.com/driftx/telephus
          – Less friendly, documented, etc
  • 34. Tyler Hobbs @tylhobbs tyler@datastax.com
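
The ring placement shown in the Clustering slides (7 to 12) can be sketched in a few lines of Python. This is a toy model, not Cassandra's implementation. It assumes the ring positions 0 to 50 are node tokens, that a key's MD5 digest is reduced modulo 60 to fit that toy ring, and that replicas go to the first node at or after the key's token and then to the next nodes clockwise. The names `RING_SIZE`, `NODE_TOKENS`, `token_for`, and `replicas_for` are made up for this illustration. The slide's value of 14 is illustrative, so the real MD5 of "www.google.com" need not reduce to 14 here.

```python
import bisect
import hashlib

RING_SIZE = 60                          # toy token space (assumption)
NODE_TOKENS = [0, 10, 20, 30, 40, 50]   # one token per node, kept sorted


def token_for(key):
    """Hash a key with MD5 and squeeze it onto the toy ring."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % RING_SIZE


def replicas_for(token, replication_factor=3):
    """Return the node tokens that would hold copies of this token.

    The first replica is the first node whose token is >= the key's
    token, wrapping past the end of the ring; the remaining replicas
    are the following nodes clockwise.
    """
    start = bisect.bisect_left(NODE_TOKENS, token) % len(NODE_TOKENS)
    return [NODE_TOKENS[(start + i) % len(NODE_TOKENS)]
            for i in range(replication_factor)]
```

With the slide's token, `replicas_for(14)` returns `[20, 30, 40]`, and a token near the end of the ring wraps around: `replicas_for(55)` returns `[0, 10, 20]`. Because any node can compute this mapping, a client can send a request to any node, as slide 13 notes.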