SlideShare una empresa de Scribd logo
1 de 31
Cloning Twitter With
HBase
Dr. Fabio Fumarola
A Twitter Clone
• One of the most successful new Internet services of
recent times is Twitter.
• Since its launch it has exploded from niche usage to
usage by the general populace, with celebrities such
as Oprah Winfrey, Britney Spears, and Shaquille
O'Neal, and politicians such as Barack Obama and Al
Gore jumping into it.
2
Why Twitter?
• Simple: it does not care what you share, as a long it is less
than 140 characters
• A means to have public conversation: Twitter allows a user
to tweet and have users respond using '@' reply, comment,
or re-tweet
• Fan versus friend
• Understanding user behavior
• Easy to share through text messaging
• Easy to access through multiple devices and applications
3
Twitter Stats
• According to Compete (www.compete.com)
4
Main Features
• Allow users to post status updates (known as
'tweets' in Twitter) to the public.
• Allow users to follow and unfollow other users. Users
can follow any other user but it is not reciprocal.
• Allow users to send public messages directed to
particular users using the @ replies convention (in
Twitter this is known as mentions)
5
Main Features
• Allow users to send direct messages to other users,
messages are private to the sender and the recipient
user only (direct messages are only to a single
recipient).
• Allow users to re-tweet or forward another user's
status in their own status update.
• Provide a public timeline where all statuses are
publicly available for viewing.
• Provide APIs to allow external applications access.
6
HBAse
7
Hbase: Features
• Strictly consistent reads and writes.
• Automatic and configurable sharding of tables
• Automatic failover support between RegionServers.
• Base classes for MapReduce jobs
• Easy java API
• Block cache and Bloom Filters for real-time queries.
8
Hbase: Features
• Query predicate push down via server side Filters
• Thrift gateway and a REST-ful Web service that
supports XML, Protobuf, and binary data encoding
options
• Extensible jruby-based (JIRB) shell
• Support for exporting metrics via the Hadoop metrics
subsystem to files or Ganglia; or via JMX
9
Hbase: Installation
• It can be run in 3 settings:
– Single-node standalone
– Pseudo-distributed single-machine
– Fully-distributed cluster
• We will see how to install HBase using Docker
10
Single Node
11
Single-node standalone
• Source code at
https://github.com/fabiofumarola/NoSQLDatabasesCourses
• It uses the local file system not HDFS (not for production).
• Download the tar distribution
• Edit hbase-site.xml
• Start HBase via start-hbase.sh
• We can use jps to test if HBase is running
12
Hbase-site.xml
The folders are created automatically by HBase
<configuration>
<property>
<name>hbase.rootdir</name>
<value>file:///hbase-data/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/hbase-data/zookeeper</value>
</property>
</configuration>
13
Single-node standalone
• Build the image
– docker build –tag=wheretolive/hbase:single ./
• Run the image
– docker run –d –p 2181:2181 -p 60010:60010 -p
60000:60000 -p 60020:60020 -p 60030:60030 –h hbase
--name=hbase wheretolive/hbase:single
14
Pseudo Distributed
15
Pseudo-distributed
• Run HBase in this mode means that each daemon
(HMaster, HRegionServer and Zookpeeper) run as
separate process.
• Here we can store the data into HDFS if it is available
• The main change is the hbase-site.xml
16
<configuration>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
</configuration>
Pseudo-distributed
• Build the image
– docker build –tag=wheretolive/hbase:pseudo ./
• Run the image
– docker run –d –p 2181:2181 -p 60010:60010 -p
60000:60000 -p 60020:60020 -p 60030:60030 –h hbase
--name=hbase wheretolive/hbase:pseudo
17
Interacting with the Hbase Shell
18
HBase Shell
• Start the shell
• Create a table
• List the tables
19
$ ./bin/hbase shell
hbase(main):001:0>
hbase(main):001:0> create 'test', 'cf'
0 row(s) in 0.4170 seconds
=> Hbase::Table - test
hbase(main):002:0> list 'test'
TABLE
test
1 row(s) in 0.0180 seconds
=> ["test"]
HBase shell
20
hbase(main):034:0> describe 'test'
Table test is ENABLED
test
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1',
IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE',
DATA_BLOCK_ENCODING => 'NONE',
TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS =>
'0', BLOCKCACHE => 'true', BLOCKSIZE => '65536',
REPLICATION_SCOPE => '0'}
1 row(s) in 0.0480 seconds
HBase shell: put data
21
hbase(main):003:0> put 'test', 'row1', 'cf:a',
'value1'
0 row(s) in 0.0850 seconds
hbase(main):004:0> put 'test', 'row2', 'cf:b',
'value2'
0 row(s) in 0.0110 seconds
hbase(main):005:0> put 'test', 'row3', 'cf:c',
'value3'
0 row(s) in 0.0100 seconds
HBase shell get
22
hbase(main):007:0> get 'test', 'row1'
COLUMN CELL
cf:a timestamp=1421762485768, value=value1
1 row(s) in 0.0350 seconds
HBase shell: incr
23
hbase(main):027:0> incr 'test', 'row3', 'cf:count', 1
COUNTER VALUE = 1
0 row(s) in 0.0070 seconds
hbase(main):028:0> incr 'test', 'row3', 'cf:count', 1
COUNTER VALUE = 2
0 row(s) in 0.0210 seconds
#Get Counter
hbase(main):031:0> get_counter 'test', 'row3', 'cf:count'
COUNTER VALUE = 4
HBase shell: scan
24
hbase(main):006:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=1430940122422,
value=value1
row2 column=cf:b, timestamp=1430940126703,
value=value2
row3 column=cf:c, timestamp=1430940130700,
value=value3
3 row(s) in 0.0470 seconds
HBase shell: disable and drop
25
hbase(main):008:0> disable 'test'
0 row(s) in 1.1820 seconds
hbase(main):009:0> enable 'test'
0 row(s) in 0.1770 seconds
hbase(main):011:0> drop 'test'
0 row(s) in 0.1370 seconds
https://learnhbase.wordpress.com/2013/03/02/hbase-shell-
commands/
Data Layout
26
Users: Identifier
• We need to represent users, of course, with their
– username, userid, password, the set of users following a
given user, the set of users a given user follows, and so on.
• The first question is, how should we identify a user?
• A solution is to associate a unique ID with every user.
• Every other reference to this user will be done by id.
– Create a table that stores all the ids
27
Users
28
package HBaseIA.TwitBase.model;
public abstract class User {
public String user;
public String name;
public String email;
public String password;
@Override
public String toString() {
return String.format("<User: %s, %s, %s>", user, name, email);
}
Twits
29
public abstract class Twit {
public String user;
public DateTime dt;
public String text;
@Override
public String toString() {
return String.format(
"<Twit: %s %s %s>",
user, dt, text);
}
}
Followers, following and updates
• A user might have users who
follow them, which we'll call
their followers.
• A user might follow other
users, which we'll call a
following
30
public abstract class Relation {
public String relation;
public String from;
public String to;
@Override
public String toString() {
return String.format(
"<Relation: %s %s %s>",
from,
relation,
to);
}
}
Let us analyze the code in depth
• http://www.manning.com/dimidukkhurana/
• https://github.com/hbaseinaction/twitbase
• https://github.com/hbaseinaction
31

Más contenido relacionado

La actualidad más candente

MySQL shell and It's utilities - Praveen GR (Mydbops Team)
MySQL shell and It's utilities - Praveen GR (Mydbops Team)MySQL shell and It's utilities - Praveen GR (Mydbops Team)
MySQL shell and It's utilities - Praveen GR (Mydbops Team)Mydbops
 
The Google Chubby lock service for loosely-coupled distributed systems
The Google Chubby lock service for loosely-coupled distributed systemsThe Google Chubby lock service for loosely-coupled distributed systems
The Google Chubby lock service for loosely-coupled distributed systemsRomain Jacotin
 
Making Apache Kafka Elastic with Apache Mesos
Making Apache Kafka Elastic with Apache MesosMaking Apache Kafka Elastic with Apache Mesos
Making Apache Kafka Elastic with Apache MesosJoe Stein
 
Intro to HBase - Lars George
Intro to HBase - Lars GeorgeIntro to HBase - Lars George
Intro to HBase - Lars GeorgeJAX London
 
Mysql database basic user guide
Mysql database basic user guideMysql database basic user guide
Mysql database basic user guidePoguttuezhiniVP
 
A brief introduction to PostgreSQL
A brief introduction to PostgreSQLA brief introduction to PostgreSQL
A brief introduction to PostgreSQLVu Hung Nguyen
 
Containerized Data Persistence on Mesos
Containerized Data Persistence on MesosContainerized Data Persistence on Mesos
Containerized Data Persistence on MesosJoe Stein
 
Introduction of mesos persistent storage
Introduction of mesos persistent storageIntroduction of mesos persistent storage
Introduction of mesos persistent storageZhou Weitao
 
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...Joe Stein
 
Amazon Aurora로 안전하게 migration 하기
Amazon Aurora로 안전하게 migration 하기Amazon Aurora로 안전하게 migration 하기
Amazon Aurora로 안전하게 migration 하기Jesang Yoon
 
MySQL PHP native driver : Advanced Functions / PHP forum Paris 2013
 MySQL PHP native driver  : Advanced Functions / PHP forum Paris 2013   MySQL PHP native driver  : Advanced Functions / PHP forum Paris 2013
MySQL PHP native driver : Advanced Functions / PHP forum Paris 2013 Serge Frezefond
 
What's New in PostgreSQL 9.6
What's New in PostgreSQL 9.6What's New in PostgreSQL 9.6
What's New in PostgreSQL 9.6EDB
 
Web scraping with nutch solr part 2
Web scraping with nutch solr part 2Web scraping with nutch solr part 2
Web scraping with nutch solr part 2Mike Frampton
 
Apache Kafka, HDFS, Accumulo and more on Mesos
Apache Kafka, HDFS, Accumulo and more on MesosApache Kafka, HDFS, Accumulo and more on Mesos
Apache Kafka, HDFS, Accumulo and more on MesosJoe Stein
 
What is new in MariaDB 10.6?
What is new in MariaDB 10.6?What is new in MariaDB 10.6?
What is new in MariaDB 10.6?Mydbops
 
MySQL Live Migration - Common Scenarios
MySQL Live Migration - Common ScenariosMySQL Live Migration - Common Scenarios
MySQL Live Migration - Common ScenariosMydbops
 
Hortonworks HBase Meetup Presentation
Hortonworks HBase Meetup PresentationHortonworks HBase Meetup Presentation
Hortonworks HBase Meetup PresentationHortonworks
 
Introduction To Apache Mesos
Introduction To Apache MesosIntroduction To Apache Mesos
Introduction To Apache MesosJoe Stein
 

La actualidad más candente (20)

MySQL shell and It's utilities - Praveen GR (Mydbops Team)
MySQL shell and It's utilities - Praveen GR (Mydbops Team)MySQL shell and It's utilities - Praveen GR (Mydbops Team)
MySQL shell and It's utilities - Praveen GR (Mydbops Team)
 
The Google Chubby lock service for loosely-coupled distributed systems
The Google Chubby lock service for loosely-coupled distributed systemsThe Google Chubby lock service for loosely-coupled distributed systems
The Google Chubby lock service for loosely-coupled distributed systems
 
Making Apache Kafka Elastic with Apache Mesos
Making Apache Kafka Elastic with Apache MesosMaking Apache Kafka Elastic with Apache Mesos
Making Apache Kafka Elastic with Apache Mesos
 
Intro to HBase - Lars George
Intro to HBase - Lars GeorgeIntro to HBase - Lars George
Intro to HBase - Lars George
 
Mysql database basic user guide
Mysql database basic user guideMysql database basic user guide
Mysql database basic user guide
 
A brief introduction to PostgreSQL
A brief introduction to PostgreSQLA brief introduction to PostgreSQL
A brief introduction to PostgreSQL
 
Containerized Data Persistence on Mesos
Containerized Data Persistence on MesosContainerized Data Persistence on Mesos
Containerized Data Persistence on Mesos
 
SphinxSE with MySQL
SphinxSE with MySQLSphinxSE with MySQL
SphinxSE with MySQL
 
Introduction of mesos persistent storage
Introduction of mesos persistent storageIntroduction of mesos persistent storage
Introduction of mesos persistent storage
 
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
 
Amazon Aurora로 안전하게 migration 하기
Amazon Aurora로 안전하게 migration 하기Amazon Aurora로 안전하게 migration 하기
Amazon Aurora로 안전하게 migration 하기
 
MySQL PHP native driver : Advanced Functions / PHP forum Paris 2013
 MySQL PHP native driver  : Advanced Functions / PHP forum Paris 2013   MySQL PHP native driver  : Advanced Functions / PHP forum Paris 2013
MySQL PHP native driver : Advanced Functions / PHP forum Paris 2013
 
Introduction to HDFS
Introduction to HDFSIntroduction to HDFS
Introduction to HDFS
 
What's New in PostgreSQL 9.6
What's New in PostgreSQL 9.6What's New in PostgreSQL 9.6
What's New in PostgreSQL 9.6
 
Web scraping with nutch solr part 2
Web scraping with nutch solr part 2Web scraping with nutch solr part 2
Web scraping with nutch solr part 2
 
Apache Kafka, HDFS, Accumulo and more on Mesos
Apache Kafka, HDFS, Accumulo and more on MesosApache Kafka, HDFS, Accumulo and more on Mesos
Apache Kafka, HDFS, Accumulo and more on Mesos
 
What is new in MariaDB 10.6?
What is new in MariaDB 10.6?What is new in MariaDB 10.6?
What is new in MariaDB 10.6?
 
MySQL Live Migration - Common Scenarios
MySQL Live Migration - Common ScenariosMySQL Live Migration - Common Scenarios
MySQL Live Migration - Common Scenarios
 
Hortonworks HBase Meetup Presentation
Hortonworks HBase Meetup PresentationHortonworks HBase Meetup Presentation
Hortonworks HBase Meetup Presentation
 
Introduction To Apache Mesos
Introduction To Apache MesosIntroduction To Apache Mesos
Introduction To Apache Mesos
 

Similar a 8b. Column Oriented Databases Lab

03 h base-2-installation_andshell
03 h base-2-installation_andshell03 h base-2-installation_andshell
03 h base-2-installation_andshelldntth0601
 
Real-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using ImpalaReal-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using ImpalaJason Shih
 
New Security Features in Apache HBase 0.98: An Operator's Guide
New Security Features in Apache HBase 0.98: An Operator's GuideNew Security Features in Apache HBase 0.98: An Operator's Guide
New Security Features in Apache HBase 0.98: An Operator's GuideHBaseCon
 
Web Services PHP Tutorial
Web Services PHP TutorialWeb Services PHP Tutorial
Web Services PHP TutorialLorna Mitchell
 
Web hacking series part 3
Web hacking series part 3Web hacking series part 3
Web hacking series part 3Aditya Kamat
 
Why Managed Service Providers Should Embrace Container Technology
Why Managed Service Providers Should Embrace Container TechnologyWhy Managed Service Providers Should Embrace Container Technology
Why Managed Service Providers Should Embrace Container TechnologySagi Brody
 
Getting to know Laravel 5
Getting to know Laravel 5Getting to know Laravel 5
Getting to know Laravel 5Bukhori Aqid
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop SecurityChris Nauroth
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestHBaseCon
 
Working with Hive Analytics
Working with Hive AnalyticsWorking with Hive Analytics
Working with Hive AnalyticsManish Chopra
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop SecurityDataWorks Summit
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizonThejas Nair
 
IBM Connect 2016 - AD1548 - Building Responsive XPages Applications
IBM Connect 2016 - AD1548 - Building Responsive XPages ApplicationsIBM Connect 2016 - AD1548 - Building Responsive XPages Applications
IBM Connect 2016 - AD1548 - Building Responsive XPages Applicationsbeglee
 
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...Cloudera, Inc.
 
Apache Hive micro guide - ConfusedCoders
Apache Hive micro guide - ConfusedCodersApache Hive micro guide - ConfusedCoders
Apache Hive micro guide - ConfusedCodersYash Sharma
 

Similar a 8b. Column Oriented Databases Lab (20)

Web Services Tutorial
Web Services TutorialWeb Services Tutorial
Web Services Tutorial
 
Web services tutorial
Web services tutorialWeb services tutorial
Web services tutorial
 
03 h base-2-installation_andshell
03 h base-2-installation_andshell03 h base-2-installation_andshell
03 h base-2-installation_andshell
 
Real-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using ImpalaReal-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using Impala
 
New Security Features in Apache HBase 0.98: An Operator's Guide
New Security Features in Apache HBase 0.98: An Operator's GuideNew Security Features in Apache HBase 0.98: An Operator's Guide
New Security Features in Apache HBase 0.98: An Operator's Guide
 
Web Services PHP Tutorial
Web Services PHP TutorialWeb Services PHP Tutorial
Web Services PHP Tutorial
 
Web hacking series part 3
Web hacking series part 3Web hacking series part 3
Web hacking series part 3
 
Configuration management with Chef
Configuration management with ChefConfiguration management with Chef
Configuration management with Chef
 
Why Managed Service Providers Should Embrace Container Technology
Why Managed Service Providers Should Embrace Container TechnologyWhy Managed Service Providers Should Embrace Container Technology
Why Managed Service Providers Should Embrace Container Technology
 
Getting to know Laravel 5
Getting to know Laravel 5Getting to know Laravel 5
Getting to know Laravel 5
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DMUpgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ Pinterest
 
PHP FUNCTIONS
PHP FUNCTIONSPHP FUNCTIONS
PHP FUNCTIONS
 
Working with Hive Analytics
Working with Hive AnalyticsWorking with Hive Analytics
Working with Hive Analytics
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizon
 
IBM Connect 2016 - AD1548 - Building Responsive XPages Applications
IBM Connect 2016 - AD1548 - Building Responsive XPages ApplicationsIBM Connect 2016 - AD1548 - Building Responsive XPages Applications
IBM Connect 2016 - AD1548 - Building Responsive XPages Applications
 
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
 
Apache Hive micro guide - ConfusedCoders
Apache Hive micro guide - ConfusedCodersApache Hive micro guide - ConfusedCoders
Apache Hive micro guide - ConfusedCoders
 

Más de Fabio Fumarola

11. From Hadoop to Spark 2/2
11. From Hadoop to Spark 2/211. From Hadoop to Spark 2/2
11. From Hadoop to Spark 2/2Fabio Fumarola
 
11. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:211. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:2Fabio Fumarola
 
10b. Graph Databases Lab
10b. Graph Databases Lab10b. Graph Databases Lab
10b. Graph Databases LabFabio Fumarola
 
9b. Document-Oriented Databases lab
9b. Document-Oriented Databases lab9b. Document-Oriented Databases lab
9b. Document-Oriented Databases labFabio Fumarola
 
9. Document Oriented Databases
9. Document Oriented Databases9. Document Oriented Databases
9. Document Oriented DatabasesFabio Fumarola
 
8. column oriented databases
8. column oriented databases8. column oriented databases
8. column oriented databasesFabio Fumarola
 
7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth7. Key-Value Databases: In Depth
7. Key-Value Databases: In DepthFabio Fumarola
 
6 Data Modeling for NoSQL 2/2
6 Data Modeling for NoSQL 2/26 Data Modeling for NoSQL 2/2
6 Data Modeling for NoSQL 2/2Fabio Fumarola
 
5 Data Modeling for NoSQL 1/2
5 Data Modeling for NoSQL 1/25 Data Modeling for NoSQL 1/2
5 Data Modeling for NoSQL 1/2Fabio Fumarola
 
2 Linux Container and Docker
2 Linux Container and Docker2 Linux Container and Docker
2 Linux Container and DockerFabio Fumarola
 
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...1. Introduction to the Course "Designing Data Bases with Advanced Data Models...
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...Fabio Fumarola
 
An introduction to maven gradle and sbt
An introduction to maven gradle and sbtAn introduction to maven gradle and sbt
An introduction to maven gradle and sbtFabio Fumarola
 
Develop with linux containers and docker
Develop with linux containers and dockerDevelop with linux containers and docker
Develop with linux containers and dockerFabio Fumarola
 
Linux containers and docker
Linux containers and dockerLinux containers and docker
Linux containers and dockerFabio Fumarola
 
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce Fabio Fumarola
 
NoSQL databases pros and cons
NoSQL databases pros and consNoSQL databases pros and cons
NoSQL databases pros and consFabio Fumarola
 

Más de Fabio Fumarola (20)

11. From Hadoop to Spark 2/2
11. From Hadoop to Spark 2/211. From Hadoop to Spark 2/2
11. From Hadoop to Spark 2/2
 
11. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:211. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:2
 
10b. Graph Databases Lab
10b. Graph Databases Lab10b. Graph Databases Lab
10b. Graph Databases Lab
 
10. Graph Databases
10. Graph Databases10. Graph Databases
10. Graph Databases
 
9b. Document-Oriented Databases lab
9b. Document-Oriented Databases lab9b. Document-Oriented Databases lab
9b. Document-Oriented Databases lab
 
9. Document Oriented Databases
9. Document Oriented Databases9. Document Oriented Databases
9. Document Oriented Databases
 
8. column oriented databases
8. column oriented databases8. column oriented databases
8. column oriented databases
 
7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth
 
6 Data Modeling for NoSQL 2/2
6 Data Modeling for NoSQL 2/26 Data Modeling for NoSQL 2/2
6 Data Modeling for NoSQL 2/2
 
5 Data Modeling for NoSQL 1/2
5 Data Modeling for NoSQL 1/25 Data Modeling for NoSQL 1/2
5 Data Modeling for NoSQL 1/2
 
3 Git
3 Git3 Git
3 Git
 
2 Linux Container and Docker
2 Linux Container and Docker2 Linux Container and Docker
2 Linux Container and Docker
 
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...1. Introduction to the Course "Designing Data Bases with Advanced Data Models...
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...
 
Scala and spark
Scala and sparkScala and spark
Scala and spark
 
An introduction to maven gradle and sbt
An introduction to maven gradle and sbtAn introduction to maven gradle and sbt
An introduction to maven gradle and sbt
 
Develop with linux containers and docker
Develop with linux containers and dockerDevelop with linux containers and docker
Develop with linux containers and docker
 
Linux containers and docker
Linux containers and dockerLinux containers and docker
Linux containers and docker
 
08 datasets
08 datasets08 datasets
08 datasets
 
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce
 
NoSQL databases pros and cons
NoSQL databases pros and consNoSQL databases pros and cons
NoSQL databases pros and cons
 

Último

Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 

Último (20)

Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 

8b. Column Oriented Databases Lab

  • 2. A Twitter Clone • One of the most successful new Internet services of recent times is Twitter. • Since its launch it has exploded from niche usage to usage by the general populace, with celebrities such as Oprah Winfrey, Britney Spears, and Shaquille O'Neal, and politicians such as Barack Obama and Al Gore jumping into it. 2
  • 3. Why Twitter? • Simple: it does not care what you share, as a long it is less than 140 characters • A means to have public conversation: Twitter allows a user to tweet and have users respond using '@' reply, comment, or re-tweet • Fan versus friend • Understanding user behavior • Easy to share through text messaging • Easy to access through multiple devices and applications 3
  • 4. Twitter Stats • According to Compete (www.compete.com) 4
  • 5. Main Features • Allow users to post status updates (known as 'tweets' in Twitter) to the public. • Allow users to follow and unfollow other users. Users can follow any other user but it is not reciprocal. • Allow users to send public messages directed to particular users using the @ replies convention (in Twitter this is known as mentions) 5
  • 6. Main Features • Allow users to send direct messages to other users, messages are private to the sender and the recipient user only (direct messages are only to a single recipient). • Allow users to re-tweet or forward another user's status in their own status update. • Provide a public timeline where all statuses are publicly available for viewing. • Provide APIs to allow external applications access. 6
  • 8. Hbase: Features • Strictly consistent reads and writes. • Automatic and configurable sharding of tables • Automatic failover support between RegionServers. • Base classes for MapReduce jobs • Easy java API • Block cache and Bloom Filters for real-time queries. 8
  • 9. Hbase: Features • Query predicate push down via server side Filters • Thrift gateway and a REST-ful Web service that supports XML, Protobuf, and binary data encoding options • Extensible jruby-based (JIRB) shell • Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia; or via JMX 9
  • 10. Hbase: Installation • It can be run in 3 settings: – Single-node standalone – Pseudo-distributed single-machine – Fully-distributed cluster • We will see how to install HBase using Docker 10
  • 12. Single-node standalone • Source code at https://github.com/fabiofumarola/NoSQLDatabasesCourses • It uses the local file system not HDFS (not for production). • Download the tar distribution • Edit hbase-site.xml • Start HBase via start-hbase.sh • We can use jps to test if HBase is running 12
  • 13. Hbase-site.xml The folders are created automatically by HBase <configuration> <property> <name>hbase.rootdir</name> <value>file:///hbase-data/hbase</value> </property> <property> <name>hbase.zookeeper.property.dataDir</name> <value>/hbase-data/zookeeper</value> </property> </configuration> 13
  • 14. Single-node standalone • Build the image – docker build –tag=wheretolive/hbase:single ./ • Run the image – docker run –d –p 2181:2181 -p 60010:60010 -p 60000:60000 -p 60020:60020 -p 60030:60030 –h hbase --name=hbase wheretolive/hbase:single 14
  • 16. Pseudo-distributed • Run HBase in this mode means that each daemon (HMaster, HRegionServer and Zookpeeper) run as separate process. • Here we can store the data into HDFS if it is available • The main change is the hbase-site.xml 16 <configuration> <property> <name>hbase.cluster.distributed</name> <value>true</value> </property> </configuration>
  • 17. Pseudo-distributed • Build the image – docker build –tag=wheretolive/hbase:pseudo ./ • Run the image – docker run –d –p 2181:2181 -p 60010:60010 -p 60000:60000 -p 60020:60020 -p 60030:60030 –h hbase --name=hbase wheretolive/hbase:pseudo 17
  • 18. Interacting with the Hbase Shell 18
  • 19. HBase Shell • Start the shell • Create a table • List the tables 19 $ ./bin/hbase shell hbase(main):001:0> hbase(main):001:0> create 'test', 'cf' 0 row(s) in 0.4170 seconds => Hbase::Table - test hbase(main):002:0> list 'test' TABLE test 1 row(s) in 0.0180 seconds => ["test"]
  • 20. HBase shell 20 hbase(main):034:0> describe 'test' Table test is ENABLED test COLUMN FAMILIES DESCRIPTION {NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} 1 row(s) in 0.0480 seconds
  • 21. HBase shell: put data 21 hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1' 0 row(s) in 0.0850 seconds hbase(main):004:0> put 'test', 'row2', 'cf:b', 'value2' 0 row(s) in 0.0110 seconds hbase(main):005:0> put 'test', 'row3', 'cf:c', 'value3' 0 row(s) in 0.0100 seconds
  • 22. HBase shell get 22 hbase(main):007:0> get 'test', 'row1' COLUMN CELL cf:a timestamp=1421762485768, value=value1 1 row(s) in 0.0350 seconds
  • 23. HBase shell: incr 23 hbase(main):027:0> incr 'test', 'row3', 'cf:count', 1 COUNTER VALUE = 1 0 row(s) in 0.0070 seconds hbase(main):028:0> incr 'test', 'row3', 'cf:count', 1 COUNTER VALUE = 2 0 row(s) in 0.0210 seconds #Get Counter hbase(main):031:0> get_counter 'test', 'row3', 'cf:count' COUNTER VALUE = 4
  • 24. HBase shell: scan 24 hbase(main):006:0> scan 'test' ROW COLUMN+CELL row1 column=cf:a, timestamp=1430940122422, value=value1 row2 column=cf:b, timestamp=1430940126703, value=value2 row3 column=cf:c, timestamp=1430940130700, value=value3 3 row(s) in 0.0470 seconds
  • 25. HBase shell: disable and drop 25 hbase(main):008:0> disable 'test' 0 row(s) in 1.1820 seconds hbase(main):009:0> enable 'test' 0 row(s) in 0.1770 seconds hbase(main):011:0> drop 'test' 0 row(s) in 0.1370 seconds https://learnhbase.wordpress.com/2013/03/02/hbase-shell- commands/
  • 27. Users: Identifier • We need to represent users, of course, with their – username, userid, password, the set of users following a given user, the set of users a given user follows, and so on. • The first question is, how should we identify a user? • A solution is to associate a unique ID with every user. • Every other reference to this user will be done by id. – Create a table that stores all the ids 27
  • 28. Users 28 package HBaseIA.TwitBase.model; public abstract class User { public String user; public String name; public String email; public String password; @Override public String toString() { return String.format("<User: %s, %s, %s>", user, name, email); }
  • 29. Twits 29 public abstract class Twit { public String user; public DateTime dt; public String text; @Override public String toString() { return String.format( "<Twit: %s %s %s>", user, dt, text); } }
  • 30. Followers, following and updates • A user might have users who follow them, which we'll call their followers. • A user might follow other users, which we'll call a following 30 public abstract class Relation { public String relation; public String from; public String to; @Override public String toString() { return String.format( "<Relation: %s %s %s>", from, relation, to); } }
  • 31. Let us analyze the code in depth • http://www.manning.com/dimidukkhurana/ • https://github.com/hbaseinaction/twitbase • https://github.com/hbaseinaction 31

Notas del editor

  1. . You need to run HBase on HDFS to ensure all writes are preserved. Running against the local filesystem is intended as a shortcut to get you familiar with how the general system works, as the very first phase of evaluation.
  2. . You need to run HBase on HDFS to ensure all writes are preserved. Running against the local filesystem is intended as a shortcut to get you familiar with how the general system works, as the very first phase of evaluation.
  3. . You need to run HBase on HDFS to ensure all writes are preserved. Running against the local filesystem is intended as a shortcut to get you familiar with how the general system works, as the very first phase of evaluation.
  4. . You need to run HBase on HDFS to ensure all writes are preserved. Running against the local filesystem is intended as a shortcut to get you familiar with how the general system works, as the very first phase of evaluation.
  5. . You need to run HBase on HDFS to ensure all writes are preserved. Running against the local filesystem is intended as a shortcut to get you familiar with how the general system works, as the very first phase of evaluation.
  6. We use the next_user_id key in order to always get an unique ID for every new user. Then we use this unique ID to name the key holding an Hash with user&amp;apos;s data. This is a common design pattern with key-values stores! Keep it in mind.