Would you like to learn how to use Cassandra but don’t know where to begin? Want to get your feet wet but you’re lost in the desert? Longing for a cluster when you don’t even know how to set up a node? Then look no further! Rebecca Mills, Junior Evangelist at DataStax, will guide you in the webinar “Getting Started with Apache Cassandra...”
You'll get an overview of Planet Cassandra’s resources to get you started quickly and easily. Rebecca will take you down the path that's right for you, whether you are a developer or an administrator. Join us if you are interested in getting Cassandra up and running in the way that suits you best.
2. Don’t want to spend an exorbitant amount of time and energy learning a new database?
• Then you’ve come to the right place!
• Learn some important basics of Cassandra without ever having to leave your couch
3. What do I do?
• Try to create awareness for open source Cassandra
• Develop content to get people interested in trying it
• Identify problems newcomers might be encountering
• Develop strategies and material to help with them
4. Where can you download Cassandra?
• The easiest way is to head straight to Planet Cassandra
• http://planetcassandra.or
• Go to the “Downloads” section, choose your operating system and the version of DSC (DataStax Community) that you’d like
• Get crackin’!
6. 2 things you should do to get going
1. Check your version of Java
2. Edit your cassandra.yaml file so that your Cassandra instance saves its data in your home directory
7. 1. Check your version of Java
• To check what version of Java you are using, type this at the prompt:
$ java -version
• Be sure to use the latest version (JDK 7) on all nodes
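If you’d like to script this check across several nodes, here’s a minimal Python sketch. It assumes the version appears as `version "..."` in what `java -version` prints to standard error; the helper names `java_major_version` and `installed_java_major_version` are hypothetical, not part of any Cassandra tooling. Older JDKs report themselves as 1.x (JDK 7 shows up as "1.7.0_..."), so the sketch takes the second number in that case.

```python
import re
import subprocess


def java_major_version(version_output: str) -> int:
    """Extract the major Java version from `java -version` output.

    Older JDKs report e.g. "1.7.0_60" (major version 7);
    newer ones report e.g. "11.0.2" (major version 11).
    """
    match = re.search(r'version "([^"]+)"', version_output)
    if match is None:
        raise ValueError("no version string found")
    parts = match.group(1).split(".")
    major = parts[1] if parts[0] == "1" else parts[0]
    # Drop suffixes such as "-ea" on early-access builds.
    return int(re.match(r"\d+", major).group())


def installed_java_major_version() -> int:
    """Run `java -version` (which writes to stderr) and parse its output."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return java_major_version(result.stderr)
```

On a node, `installed_java_major_version() >= 7` tells you whether the installed JDK meets the requirement above.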
8. 2. Change the default location where data is saved
• Don’t run Cassandra as root
• But with the default settings, a non-root user will not be able to start Cassandra or access the directories where our data is being saved
• Change these locations in the cassandra.yaml file, found in the Cassandra conf directory
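9. The 3 lines you should change in the cassandra.yaml file:
Edit cassandra.yaml, replacing the defaults:
data_file_directories:
    - /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
saved_caches_directory: /var/lib/cassandra/saved_caches
with paths in your home directory:
data_file_directories:
    - $HOME/cassandra/data
commitlog_directory: $HOME/cassandra/commitlog
saved_caches_directory: $HOME/cassandra/saved_caches
(Write out the full path of your home directory in place of $HOME; don’t count on it being expanded in this file.)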
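If you’d rather script the change, this Python sketch rewrites the three settings in the text of cassandra.yaml. It assumes the settings are uncommented and list a single data directory, as in the defaults above; `relocate_dirs` is a hypothetical helper, and it uses plain regular expressions rather than a YAML library so that the comments in the file survive. It writes the full home-directory path instead of the literal `$HOME`.

```python
import re


def relocate_dirs(yaml_text: str, home: str) -> str:
    """Point data, commit log and saved-cache directories under <home>/cassandra."""
    base = home.rstrip("/") + "/cassandra"
    # data_file_directories is a YAML list: replace the path in its first entry.
    yaml_text = re.sub(
        r"^(data_file_directories:\s*-\s*)\S+",
        lambda m: m.group(1) + base + "/data",
        yaml_text,
        flags=re.MULTILINE,
    )
    # The other two settings are plain "key: value" lines.
    for key, subdir in (("commitlog_directory", "commitlog"),
                        ("saved_caches_directory", "saved_caches")):
        yaml_text = re.sub(
            rf"^({key}:[ \t]*)\S+",
            lambda m, subdir=subdir: m.group(1) + base + "/" + subdir,
            yaml_text,
            flags=re.MULTILINE,
        )
    return yaml_text
```

For example, read conf/cassandra.yaml, pass its text and `os.path.expanduser("~")` to `relocate_dirs`, and write the result back, keeping a copy of the original first.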
10. 5 things you can do quickly
1. Start up an instance
2. Create a schema with CQL
3. Inject some data into our instance
4. Run a query against our database
5. Make a change to our data
11. 1. Start up an instance
• It’s very simple! Just go to your install location and start it from the bin directory:
$ cd install_location
$ bin/cassandra
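`bin/cassandra` returns before the node is ready to accept connections, so a script that starts Cassandra and immediately runs cqlsh or a driver can fail. Here’s a minimal Python sketch that polls a TCP port until something is listening. It assumes the default ports of this Cassandra version: 9160 for Thrift (which the cqlsh session shown in this deck connects to) and 9042 for the native protocol used by the drivers; check `rpc_port` and `native_transport_port` in your cassandra.yaml. `wait_for_port` is a hypothetical helper.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0,
                  interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: the node is not up yet.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For instance, after `bin/cassandra`, `wait_for_port("127.0.0.1", 9042)` returns True once the node accepts native-protocol connections, or False after a minute.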
12. 2. Create a schema with CQL
• From within your installation directory, start up the CQL shell from the bin directory:
$ cd install_location
$ bin/cqlsh
• You should see the cqlsh command prompt:
Connected to Test Cluster at localhost:9160.
[cqlsh 4.1.1 | Cassandra 2.0.8 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh>
13. 2. Create a schema with CQL
• A keyspace is a container for our data. Here we are creating a demo keyspace and a users table within it. A table consists of rows and columns.
CREATE KEYSPACE demo WITH REPLICATION =
  {'class': 'SimpleStrategy', 'replication_factor': 1};
USE demo;
CREATE TABLE users (
  firstname text,
  lastname text,
  age int,
  email text,
  city text,
  PRIMARY KEY (lastname)
);
14. 3. Inject some data into your instance
• Nothing sadder than an empty database. Here we are populating our “users” table with rows of data using the INSERT command.
INSERT INTO users (firstname, lastname, age, email, city) VALUES
  ('John', 'Smith', 46, 'johnsmith@email.com', 'Sacramento');
INSERT INTO users (firstname, lastname, age, email, city) VALUES
  ('Jane', 'Doe', 36, 'janedoe@email.com', 'Beverly Hills');
INSERT INTO users (firstname, lastname, age, email, city) VALUES
  ('Rob', 'Byrne', 24, 'robbyrne@email.com', 'San Diego');
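Note that in Cassandra an INSERT is an upsert: because lastname is the whole primary key here, a second user named Smith would silently replace John Smith’s row rather than add a new one. The toy Python model below, a plain dictionary keyed by the partition key and not how Cassandra actually stores data, illustrates that behaviour; the `insert` helper and the extra Smith are hypothetical.

```python
# Toy model of the users table: partition key (lastname) -> row.
users = {}


def insert(firstname, lastname, age, email, city):
    """Mimic INSERT: write the row for this primary key, replacing any existing one."""
    users[lastname] = {"firstname": firstname, "lastname": lastname,
                       "age": age, "email": email, "city": city}


insert("John", "Smith", 46, "johnsmith@email.com", "Sacramento")
insert("Jane", "Doe", 36, "janedoe@email.com", "Beverly Hills")
insert("Rob", "Byrne", 24, "robbyrne@email.com", "San Diego")

# A hypothetical second Smith reuses the same primary key...
insert("Mary", "Smith", 30, "marysmith@email.com", "Fresno")
# ...so there are still only 3 rows, and John's row has been overwritten.
print(len(users), users["Smith"]["firstname"])  # prints: 3 Mary
```

If last names can repeat in your data, the primary key needs to include something that is unique to each user.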
15. 4. Make a query against your database
SELECT * FROM users;
 lastname | age | city          | email               | firstname
----------+-----+---------------+---------------------+-----------
 Doe      | 36  | Beverly Hills | janedoe@email.com   | Jane
 Byrne    | 24  | San Diego     | robbyrne@email.com  | Rob
 Smith    | 46  | Sacramento    | johnsmith@email.com | John
SELECT * FROM users WHERE lastname='Doe';
 lastname | age | city          | email             | firstname
----------+-----+---------------+-------------------+-----------
 Doe      | 36  | Beverly Hills | janedoe@email.com | Jane
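The WHERE clause works here because lastname is the partition key, so Cassandra can go straight to the one partition that holds the row. Filtering on a column that is not part of the key (city, say) would mean examining every partition, which is why CQL rejects such a query by default; depending on the version, you would need a secondary index and/or ALLOW FILTERING. A toy Python sketch of the difference, with hypothetical helper names:

```python
# Toy model: rows stored by partition key (lastname), as in the users table.
partitions = {
    "Doe": {"firstname": "Jane", "age": 36, "city": "Beverly Hills"},
    "Byrne": {"firstname": "Rob", "age": 24, "city": "San Diego"},
    "Smith": {"firstname": "John", "age": 46, "city": "Sacramento"},
}


def select_by_key(lastname):
    """WHERE lastname = ...: a single direct lookup of one partition."""
    row = partitions.get(lastname)
    return [row] if row is not None else []


def select_by_city(city):
    """WHERE city = ...: no key to jump to, so every partition is examined."""
    examined = 0
    matches = []
    for row in partitions.values():
        examined += 1
        if row["city"] == city:
            matches.append(row)
    return matches, examined
```

With three rows the scan is cheap, but on a real cluster it would have to read data from potentially every node.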
16. 5. Make a change to your data
UPDATE users SET city='San Jose' WHERE lastname='Doe';
SELECT * FROM users WHERE lastname='Doe';
 lastname | age | city     | email             | firstname
----------+-----+----------+-------------------+-----------
 Doe      | 36  | San Jose | janedoe@email.com | Jane
DELETE FROM users WHERE lastname='Doe';
SELECT * FROM users;
 lastname | age | city       | email               | firstname
----------+-----+------------+---------------------+-----------
 Byrne    | 24  | San Diego  | robbyrne@email.com  | Rob
 Smith    | 46  | Sacramento | johnsmith@email.com | John
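UPDATE is also an upsert in Cassandra: if no row exists for the key in the WHERE clause, UPDATE creates one instead of failing, and deleting a row that doesn’t exist is not an error either. A toy Python sketch of both statements against a dictionary keyed by lastname; `update`, `delete` and the extra "Lee" row are hypothetical, not driver calls.

```python
# Toy model of the users table after the inserts: lastname -> other columns.
users = {
    "Doe": {"firstname": "Jane", "age": 36, "city": "Beverly Hills"},
    "Byrne": {"firstname": "Rob", "age": 24, "city": "San Diego"},
    "Smith": {"firstname": "John", "age": 46, "city": "Sacramento"},
}


def update(lastname, **columns):
    """Mimic UPDATE ... SET ... WHERE lastname = ...: creates the row if missing."""
    users.setdefault(lastname, {}).update(columns)


def delete(lastname):
    """Mimic DELETE ... WHERE lastname = ...: a missing row is simply ignored."""
    # Real Cassandra records a tombstone rather than erasing data right away.
    users.pop(lastname, None)


update("Doe", city="San Jose")   # changes one column of an existing row
delete("Doe")                    # removes Jane Doe's row
update("Lee", city="Oakland")    # hypothetical key: creates a new row
```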
18. DevCenter
• Try out your CQL in an easy-to-use tool
• Has most of the same functionality as cqlsh, with a few exceptions
• Quickly connect to your cluster and keyspace. GO!
29. You can run an AWS AMI from OpsCenter!
• Run a Cassandra instance/cluster in the cloud!
• Using the Amazon Web Services EC2 Management Console
• Quickly deploy a Cassandra cluster within a single availability zone through OpsCenter
• Check out
http://www.datastax.com/documentation/cassa
32. What about the drivers?
• DataStax provides drivers for Java, Python, C#, and C++
• There are also many open source community drivers, including Clojure, Go, Node.js and many more.
33. Connect to your instance with Java
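• Create a new Java class, com.example.cassandra.SimpleClient for example, that imports the driver classes it uses
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
• Add instance fields to hold the cluster and session references
private Cluster cluster;
private Session session;
• Add an instance method, connect, to your new class. Here you add your contact point, the IP address of your node, and open a session.
public void connect(String node) {
    cluster = Cluster.builder()
        .addContactPoint(node)
        .build();
    session = cluster.connect();
}
• Add an instance method, close, to shut down the cluster once you are finished
public void close() {
    cluster.close();
}
34. Connect to your instance with Java
• In your main method, create a SimpleClient object, call connect, and close it
public static void main(String[] args) {
    SimpleClient client = new SimpleClient();
    client.connect("127.0.0.1"); // replace with the IP address of your node
    client.close();
}
• Select some data (from within SimpleClient, once connected)
ResultSet results = session.execute("SELECT * FROM demo.users");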
35. Connect to your instance in Python
• Import the Cluster class and create a cluster object
from cassandra.cluster import Cluster
cluster = Cluster()
• This will attempt to connect to a cluster on your local machine. You could also give it a list of IP addresses and it will connect to those.
cluster = Cluster(['127.0.0.1'])  # replace with your node's IP address(es)
• To connect to a node and actually begin running queries against our instance, we need a session, which is created by calling Cluster.connect()
cluster = Cluster()
session = cluster.connect()
• You can even connect to a particular keyspace
cluster = Cluster()
session = cluster.connect('demo')
36. Connect to your instance in Python
• Select some data
results = session.execute("""
    SELECT * FROM demo.users
""")
for row in results:
    print(row.lastname, row.city)
cluster.shutdown()  # close connections when you are done
Part of my job is to help make Cassandra more approachable for everyone
A lot of people claim that other databases are faster and easier to get up and running with than Cassandra
I consider it my mission to guide people through the challenges of getting started
Maybe you don’t have loads of free time to spend trying to learn how to use a new database
Sometimes it can be hard navigating your way through tangly docs
when you really just want a quick taste of what it’s like to use the database
Today I’m going to give you a brief overview of what it takes, let’s say the bare minimum steps, to get up and running with Cassandra
I’m not saying you’ll have your own 100-node cluster going by the end of all this, but at least you’ll have a concept of what it’s like
So sit back, relax, and let’s go
As a Junior Evangelist, I try to create awareness for open source Cassandra
I develop Cassandra-themed content like blog posts, video tutorials, and webinars, and I also have my Twitter account
Part of my job is also to step into the shoes of a ‘newbie’ to determine what kinds of problems people who are just being introduced to Cassandra might encounter, which may not be obvious to an expert.
So if you haven’t already, head to Planet Cassandra and go to the Downloads section
There you can choose your operating system and the type of DSC download that you want
On the downloads page there are also guides on how to install DSC once you have it
Alright, well let’s get going with our instance
But before you can fire up your instance, there are a few things that we need to tinker with
Otherwise Cassandra may not work properly, or may not even start up at all!
If we were starting up a cluster, this list would get a little longer as we would have to tell the nodes how to share information
But for now we are just worried about our single instance
Two things we are concerned about are checking our version of Java and making sure we have access to our data files when they get saved
Firstly, you need to make sure you have the latest version of Java, JDK 7, installed on all your nodes
You’re going to want to change the location of data, commit logs and saved caches
If you leave them as default, you’re going to have to run Cassandra as root in order for it to start, which isn’t ideal of course
Probably the easiest way to deal with this problem is to set the save location to your home directory
The location for the saves is configured in the cassandra.yaml file in the conf directory
Instead of using the default directory paths, we’ll change them all to use our home directory.
This ensures that we have the correct permissions.
We’re going to run through this list of 5 things you should be able to do quickly once you start up a Cassandra instance
So assuming you downloaded the tarball, just go to your install location and run cassandra from the bin directory
Once we get our instance started, we can run the CQL shell
CQL is Cassandra Query Language
Syntactically it’s pretty similar to SQL, so it shouldn’t be too hard if you have a relational database background
When you run the CQL shell, you’ll get a prompt and then you can start communicating with your database
So keyspaces hold our data in Cassandra
They have tables, which are made up of rows and columns
A row represents a single data entry
Here I’m showing the creation of a keyspace in CQL; never mind the class and replication factor components for now, they’re outside of the scope of this webinar
And then I created a “users” table within that keyspace, where I assign each column a name and data type
Next, we can populate the rows in our table using the INSERT command
If I ran these 3 INSERT commands, it would inject 3 rows of user information into the “users” table I made
If we wanted to query our database, a “SELECT * FROM users” would return all the rows from the table
Using a WHERE clause and a specific last name (which we set to be our primary key), it would return the user with that last name.
The PRIMARY KEY (which is also the partition key in this case) determines the partition where the data is located
These are examples of what an update and a delete look like in CQL
As you can see, it’s pretty familiar-looking syntax; it’s just that simple!
Two really great tools you can use with Cassandra are OpsCenter and DevCenter
DevCenter is a free tool you can download from the DataStax website
It’s a cool alternative to the CQL shell, if you’d prefer a GUI
You can connect to a local server or remote clusters
This is what DevCenter looks like
You can type most of the same commands here as you would in the CQL shell
It has almost the same functionality, and has a nice visual interface
In the connection center, you can save a new connection if you intend to use it frequently, instead of reconnecting over and over each time you use it
Here I’m connecting to an instance on my local machine
I’m running the same commands here as I did in the CQL shell earlier, creating that same demo keyspace
Creating that same users table.
Notice the nice syntax highlighting.
Also notice the schema window in the upper right corner showing all our keyspaces
Insert new records into the database
Then select those records and get a nice table view of the data
OpsCenter is there to help you manage a Cassandra cluster
Because managing a lot of machines can be a challenge sometimes
It’s easy to make cluster-wide configuration changes with OpsCenter, instead of digging through configuration files on the command line
You can also diagnose problems with your cluster using OpsCenter
You can set up graphs to track write latency, read latency, hinted handoff, etc.
And these may give you a good indication of the source of a problem
So what about multi data center?
Of course OpsCenter does multi data center! Because it’s Cassandra!
You can create a Cassandra instance or cluster in the cloud using the AWS AMI
You spin these up through OpsCenter
In the new cluster section, select the cloud option, which only appears if you’re running OpsCenter on an EC2 instance
Adding a cluster can be done from a single image and configuration file
You provide the DataStax credentials that were sent to you by email
As well as the credentials of each node
You use your own AWS credentials to create a cluster and configure things like security groups on the fly
So DataStax has drivers for Java, Python, C# and C++
There are a lot of other open source drivers, though
Check out the Client Drivers section of Planet Cassandra and you’ll probably find one in the language you’re looking for
Connecting to your cluster using Java is really easy
First create a cluster object
Use the builder to add your node as a contact point and build the cluster, then connect to it
That’s it! It’s just that easy.
Here is a simple program that will connect to your database
Just a few lines of code and you are ready to insert and select data from Cassandra
Here is the same situation in Python. I wish I had more to say about it, but it’s essentially the same, very simple
Create a cluster object and use the connect method. That’s it.
Once you have a session, you can use the execute method to run CQL commands
So if you’re looking for great resources on Apache Cassandra, you should definitely check out Planet Cassandra
You’ll find everything you need there: webinars, blog posts, use cases, tutorials
While you’re there, check out the Try Cassandra section, which I created all the content for
Try Cassandra has quick 10-minute tutorials for developers and administrators
And some walkthrough videos that I made to help you guys out