NoSQL
  Why NoSQL?
  NoSQL Categories
  Relational Vs NoSQL Databases
  Why Key/value store?
  Memcached (Key/value store on memory)
  Memcachedb (Key/Value store on disk)
  BerkeleyDB
Document Stores
Other Info

NoSQL
NoSQL stands for "Not only SQL". It is not about saying that SQL should never be
used, or that SQL is dead; it is about recognizing that for some problems other
storage solutions are better suited.


Why NoSQL?
Trends that gave rise to the NoSQL paradigm:
  • Exploding Data Size: Each year more and more digital data is created.
    Over two years we create more digital data than all the data created in
    history before that.
  • Increasing Connectedness: Over time data has evolved to be more and
    more interlinked and connected. Hypertext has links, blogs have pingbacks,
    and tagging groups all related data.
  • Semi-structure: Content is increasingly individualized, more is stored about
    each entity, and decentralized content generation (Web 2.0) is accelerating.
  • Architecture: Systems are moving towards decoupled services, each with its
    own backend.

Sources: http://www.slideshare.net/novelys/nosql-3272395
http://www.slideshare.net/marin_dimitrov/nosql-databases-3584443
http://www.slideshare.net/thobe/nosql-for-dummies


NoSQL Categories

 NoSQL Products




Relational Vs NoSQL Databases
Key/value store
Why Key/value store?
Even though RDBMS have provided database users with the best mix of simplicity,
robustness, flexibility, performance, scalability, and compatibility, their
performance in each of these areas is not necessarily better than that of an alternate
solution pursuing one of these benefits in isolation. This has not been much of a
problem so far because the universal dominance of RDBMS has outweighed the
need to push any of these boundaries. Nonetheless, if you really had a need that
couldn't be answered by a generic relational database, alternatives have always been
around to fill those niches.
Today, we are in a slightly different situation. For an increasing number of
applications, one of these benefits is becoming more and more critical; and while
still considered a niche, it is rapidly becoming mainstream, so much so that for an
increasing number of database users this requirement is beginning to eclipse others
in importance. That benefit is scalability. As more and more applications are
launched in environments that have massive workloads, such as web services, their
scalability requirements can, first of all, change very quickly and, secondly, grow
very large. The first scenario can be difficult to manage if you have a relational
database sitting on a single in-house server. For example, if your load triples
overnight, how quickly can you upgrade your hardware? The second scenario can
be too difficult to manage with a relational database in general.

Relational databases scale well, but usually only when that scaling happens on a
single server node. When the capacity of that single node is reached, you need to
scale out and distribute that load across multiple server nodes. This is when the
complexity of relational databases starts to rub against their potential to scale. Try
scaling to hundreds or thousands of nodes, rather than a few, and the complexities
become overwhelming, and the characteristics that make RDBMS so appealing
drastically reduce their viability as platforms for large distributed systems.

For cloud services to be viable, vendors have had to address this limitation, because
a cloud platform without a scalable data store is not much of a platform at all. So, to
provide customers with a scalable place to store application data, vendors had only
one real option. They had to implement a new type of database system that focuses
on scalability, at the expense of the other benefits that come with relational
databases.

These efforts, combined with those of existing niche vendors, have led to the rise of
a new breed of database management system.

Source: http://www.slideshare.net/marc.seeger/keyvalue-stores-a-practical-overview


Memcached (Key/value store on memory)
Definition
Memcached is a free and open source, high-performance, distributed memory object
caching system. It is generic in nature, but intended for use in speeding up
dynamic web applications by alleviating database load.

Memcached is an in-memory key-value store for small chunks of arbitrary data
(strings, objects) from results of database calls, API calls, or page rendering.
Memcached is simple yet powerful. Its simple design promotes quick deployment and
ease of development, and solves many problems facing large data caches. Its API is
available for most popular languages.


What is it made up of?
  • Client software, which is given a list of available memcached servers.
  • A client-based hashing algorithm, which chooses a server based on the
    "key" input.
  • Server software, which stores your values with their keys into an internal
    hash table.
  • Server algorithms, which determine when to throw out old data (if out of
    memory), or reuse memory.

What are the Design Philosophies?
Simple Key/Value Store
The server does not care what your data looks like. Items are made up of a key, an
expiration time, optional flags, and raw data. It does not understand data structures;
you must upload data that is pre-serialized. Some commands (incr/decr) may
operate on the underlying data, but the implementation is simplistic.
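
Because the server only stores raw bytes, serialization has to happen on the client side. The following is a minimal sketch of that step, assuming JSON as the serialization format (the choice of JSON and the sample `profile` record are illustrative assumptions, not something memcached prescribes):

```python
import json

# Hypothetical application object that we want to cache.
profile = {"name": "Alice", "score": 12}

# Serialize to bytes before sending: the server never looks inside this payload.
payload = json.dumps(profile).encode("utf-8")

# After fetching the bytes back, the client is responsible for decoding them.
restored = json.loads(payload.decode("utf-8"))
print(restored == profile)
```

Many client libraries perform an equivalent step internally, so application code often never sees the raw bytes.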

Smarts Half in Client, Half in Server
A "memcached implementation" is implemented partially in a client, and partially
in a server. Clients understand how to send items to particular servers, what to do
when they cannot contact a server, and how to fetch keys from the servers. The
servers understand how to receive items, and how to expire them.

Servers are Disconnected From Each Other
Memcached servers are generally unaware of each other. There is no crosstalk, no
synchronization, no broadcasting. The lack of interconnections means adding more
servers will usually add more capacity as you expect. There might be exceptions to
this rule, but they are exceptions and carefully regarded.

O(1) Everything
Wherever it can, memcached makes its commands O(1). Each command takes
roughly the same amount of time to process every time, and should not get
noticeably slower as the cache grows. This goes back to the "Simple K/V Store"
principle: you don't want to be processing data in a cache service that tens,
hundreds, or thousands of web servers may need to access at the same time.

Forgetting Data is a Feature
Memcached is, by default, a Least Recently Used cache. It is also designed to have
items expire after a specified amount of time. Both of these are elegant solutions
to many problems: expire items after a minute to limit stale data being returned, or
flush unused data in an effort to retain frequently requested information.
This further allows great simplification in how memcached works. There are no
"pauses" waiting for a garbage collector, which keeps latency low, and free space
is lazily reclaimed.
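
The combination of least-recently-used eviction and per-item expiry can be illustrated with a toy cache. This is only a sketch of the idea under simplifying assumptions (a single process, a fixed item count instead of a memory limit, and a hypothetical `TinyCache` class); it is not how the memcached server is actually implemented:

```python
import time
from collections import OrderedDict

class TinyCache:
    """Toy LRU cache with per-item expiry (illustration only)."""

    def __init__(self, max_items):
        self.max_items = max_items
        self.items = OrderedDict()  # key -> (value, expires_at), oldest use first

    def set(self, key, value, ttl, now=None):
        now = time.time() if now is None else now
        self.items[key] = (value, now + ttl)
        self.items.move_to_end(key)          # mark as most recently used
        while len(self.items) > self.max_items:
            self.items.popitem(last=False)   # evict the least recently used item

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self.items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:
            del self.items[key]              # expired: reclaimed lazily, on access
            return None
        self.items.move_to_end(key)          # a read also counts as a use
        return value

# Usage: explicit "now" values make the behaviour easy to follow.
cache = TinyCache(max_items=2)
cache.set("a", 1, ttl=60, now=0)
cache.set("b", 2, ttl=60, now=0)
cache.get("a", now=1)               # touching "a" makes "b" the eviction candidate
cache.set("c", 3, ttl=60, now=2)    # cache is full, so "b" is evicted
```

Note that expired items are removed only when they are next requested, which mirrors the "lazily reclaimed" behaviour described above.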

Cache Invalidation is a Hard Problem
Given memcached's centralized-as-a-cluster nature, the job of invalidating a cache
entry is trivial. Instead of broadcasting data to all available hosts, clients zero
in on the exact location of the data to be invalidated. You may further complicate
matters to suit your needs, and there are caveats, but you sit on a strong baseline.
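
A minimal sketch of invalidate-on-write follows. Plain dictionaries stand in for the memcached client and the database, and the function names are hypothetical; the point is only that an update needs a single delete against the single place where the key lives:

```python
# Stand-ins for a real memcached client and a real database (assumptions).
cache = {}
database = {"user:42": "Alice"}

def read_user(key):
    """Read through the cache, filling it on a miss."""
    if key in cache:
        return cache[key]
    value = database[key]
    cache[key] = value
    return value

def update_user(key, value):
    """Write to the database, then invalidate the cached copy."""
    database[key] = value
    cache.pop(key, None)  # one delete; no broadcast to other hosts is needed

first = read_user("user:42")      # miss: loads "Alice" and caches it
update_user("user:42", "Bob")     # cached copy is dropped
second = read_user("user:42")     # miss again: loads the fresh value
print(first, second)
```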


Architecture
The system uses client–server architecture. The servers maintain a key–value
associative array; the clients populate this array and query it. Keys are up to 250
bytes long and values can be at most 1 megabyte in size by default.

Clients use client side libraries to contact the servers which, by default, expose their
service at port 11211. Each client knows all servers; the servers do not
communicate with each other. If a client wishes to set or read the value
corresponding to a certain key, the client's library first computes a hash of the key
to determine the server that will be used. Then it contacts that server. The server
will compute a second hash of the key to determine where to store or read the
corresponding value.
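
The first of the two hashing steps, where the client picks a server, can be sketched as follows. This assumes a simple modulo scheme over a CRC32 checksum; the server addresses and the `pick_server` name are hypothetical, and real client libraries may use a different hash function or consistent hashing:

```python
import zlib

# Hypothetical server list; 11211 is the default memcached port.
SERVERS = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]

def pick_server(key, servers=SERVERS):
    """Map a key to one server by hashing the key (simple modulo scheme)."""
    checksum = zlib.crc32(key.encode("utf-8"))
    return servers[checksum % len(servers)]

# Every client that shares the same server list and hash function
# picks the same server for the same key.
chosen = pick_server("user:42")
print(chosen)
```

One known drawback of the modulo scheme is that adding or removing a server remaps most keys, which is why consistent hashing is a common alternative.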

The servers keep the values in RAM; if a server runs out of RAM, it discards the
least recently used values. Therefore, clients must treat Memcached as a transitory
cache; they cannot assume that data stored in Memcached is still there when they
need it. A Memcached-protocol-compatible product known as MemcacheDB provides
persistent storage. There is also a solution called Membase from NorthScale that
provides persistence, replication and clustering.

If all client libraries use the same hashing algorithm to determine servers, then
clients can read each other's cached data; this is obviously desirable.

A typical deployment will have several servers and many clients. However, it is
possible to use Memcached on a single computer, acting simultaneously as client
and server.

http://memcached.org/

How does this work? a.k.a. “The Memcache Pattern”
(http://code.google.com/appengine/docs/python/memcache/usingmemcache.html#Pattern)
Memcache is typically used with the following pattern:
  • The application receives a query from the user or the application.
  • The application checks whether the data needed to satisfy that query is in
    memcache.
        o If the data is in memcache, the application uses that data.
        o If the data is not in memcache, the application queries the datastore
          and stores the results in memcache for future requests.
The pseudocode below represents a typical memcache request:
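     from google.appengine.api import memcache

     def get_data():
          # Try the cache first.
          data = memcache.get("key")
          if data is not None:
                return data
          # Cache miss: query the datastore, then cache the result for 60 seconds.
          # query_for_data() is a placeholder for the application's own query.
          data = query_for_data()
          memcache.add("key", data, 60)
          return data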

Memcached allows you to take memory from parts of your system where you have
more than you need and make it accessible to areas where you have less than you
need.

Memcached also allows you to make better use of your memory. If you consider the
diagram to the right, you can see two deployment scenarios:

   1. Each node is completely independent (top).
   2. Each node can make use of memory from other nodes (bottom).

The first scenario illustrates the classic deployment strategy. However, it is
wasteful in two ways: the total cache size is only a fraction of the actual
capacity of your web farm, and considerable effort is required to keep the cache
consistent across all of those nodes.

With memcached, you can see that all of the servers are looking into the same
virtual pool of memory. This means that a given item is always stored in, and
always retrieved from, the same location in your entire web cluster.

Also, as the demand for your application grows to the point where you need to have
more servers, it generally also grows in terms of the data that must be regularly
accessed. A deployment strategy where these two aspects of your system scale
together just makes sense.
The illustration to the right only shows two web servers for simplicity, but the
property remains the same as the number increases. If you had fifty web servers,
you'd still have a usable cache size of 64MB in the first example, but in the second,
you'd have 3.2GB of usable cache.
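
The arithmetic behind those two figures is simple enough to write down. The sketch below assumes, as the example implies, that every web server contributes 64MB of cache; the function name is hypothetical:

```python
def usable_cache_mb(servers, cache_per_server_mb, pooled):
    """Usable cache size in MB for the two deployment scenarios."""
    if pooled:
        # Memcached pool: every server's memory adds to one shared cache.
        return servers * cache_per_server_mb
    # Independent nodes: each node only ever sees its own cache.
    return cache_per_server_mb

independent = usable_cache_mb(50, 64, pooled=False)
pooled = usable_cache_mb(50, 64, pooled=True)
print(independent, "MB vs", pooled, "MB")
```

Here 50 x 64MB = 3200MB, which is the 3.2GB quoted above.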

Of course, you aren't required to use your web server's memory for cache. Many
memcached users have dedicated machines that are built to only be memcached
servers.

Users of Memcached
LiveJournal, Wikipedia, Flickr, Bebo, Twitter, Typepad, Yellowbot, Youtube,
Digg, Wordpress, Craigslist, Mixi


Memcachedb (Key/Value store on disk)
Definition (Wiki: http://en.wikipedia.org/wiki/Memcachedb)
MemcacheDB is a persistence-enabled variant of memcached, a general-purpose
distributed memory caching system often used to speed up dynamic database-driven
websites by caching data and objects in memory. The main difference between
MemcacheDB and memcached is that MemcacheDB has its own key-value
database system based on Berkeley DB, so it is meant for persistent storage
rather than as a cache solution. MemcacheDB is accessed through the same protocol
as memcached, so applications may use any memcached API as a means of
accessing a MemcacheDB database.

MemcacheQ is a MemcacheDB variant that provides a simple message queue
service.

MemcacheDB is a distributed key-value storage system designed for persistence. It
is NOT a cache solution, but a persistent storage engine for fast and reliable
key-value based object storage and retrieval. It conforms to the memcache protocol,
so any memcached client can connect to it. MemcacheDB uses Berkeley DB as its
storage backend, so many features, including transactions and replication,
are supported.
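
Protocol compatibility means that the same request bytes work against either server. As a rough illustration, the sketch below encodes `set` and `get` requests in the memcached text protocol; the helper names are hypothetical, and a real application would normally rely on an existing client library instead of building requests by hand:

```python
def build_set(key, value, flags=0, exptime=0):
    """Encode a 'set' request: a header line followed by the raw data block."""
    header = "set {} {} {} {}\r\n".format(key, flags, exptime, len(value))
    return header.encode("ascii") + value + b"\r\n"

def build_get(key):
    """Encode a 'get' request for a single key."""
    return "get {}\r\n".format(key).encode("ascii")

set_request = build_set("greeting", b"hello")
get_request = build_get("greeting")
print(set_request)
print(get_request)
```

The header carries the key, the client flags, the expiration time and the length of the data in bytes, which matches the item layout (key, expiration time, optional flags, raw data) described for memcached.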

Memcached was first developed by Brad Fitzpatrick for his website LiveJournal, on
May 22, 2003.

Features
  • High-performance read/write for key-value based objects. Rapid set/get
    for key-value based objects, not relational ones. The benchmark below
    gives the actual figures.
  • Highly reliable persistent storage with transactions. Transactions are used
    to make your data more reliable.
  • High-availability data storage with replication. Replication rocks!
    Achieve your HA, spread your reads, make your transactions durable!
  • Memcache protocol compatibility. Lots of memcached client APIs can be
    used with Memcachedb, in almost any language: Perl, C, Python, Java, ...

Why memcachedb?
We have MySQL, we have PostgreSQL, we have a lot of RDBMSs, so why do we
need Memcachedb?
  • RDBMSs are slow. They all have a complicated SQL engine on top of storage.
    Our data needs to be stored and retrieved extremely fast.
  • They do not handle concurrency well when thousands of clients issue
    millions of requests.
  • The data we want to store is very small, so the cost is high if we use an
    RDBMS.
  • Many critical infrastructure services need fast, reliable data storage and
    retrieval, but do not need the flexibility of dynamic SQL queries:
          o Index, Counter, Flags
          o Identity Management (Account, Profile, User config info, Score)
          o Messaging
          o Personal domain name
          o Metadata of distributed systems
          o Other non-relational data

Performance Benchmark:
MemcacheDB is very fast.

Environment
    • Box: Dell 2950III
    • OS: Linux CentOS 5
    • Version: memcachedb-1.0.0-beta
    • Client API: libmemcached

a. Non-thread Edition
Started: memcachedb -d -r -u root -H /data1/mdbtest/ -N -v

Write (key: 16 value: 100B, 8 concurrents, every process does 2,000,000 set)
No.        1    2    3    4    5    6    7    8   avg.
Cost(s)  807  835  840  853  859  857  865  868   848
2000000 * 8 / 848 = 18868 w/s

Read (key: 16 value: 100B, 8 concurrents, every process does 2,000,000 get)
No.        1    2    3    4    5    6    7    8   avg.
Cost(s)  354  354  359  358  357  364  363  365   360
2000000 * 8 / 360 = 44444 r/s

b. Thread Edition (4 Threads)
Started: memcachedb -d -r -u root -H /data1/mdbtest/ -N -t 4 -v

Write (key: 16 value: 100B, 8 concurrents, every process does 2,000,000 set)
No.        1    2    3    4    5    6    7    8   avg.
Cost(s)  663  669  680  680  684  683  687  686   679
2000000 * 8 / 679 = 23564 w/s

Read (key: 16 value: 100B, 8 concurrents, every process does 2,000,000 get)
No.        1    2    3    4    5    6    7    8   avg.
Cost(s)  245  249  250  248  248  249  251  250   249
2000000 * 8 / 249 = 64257 r/s
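
The throughput figures follow directly from the reported averages: total requests (2,000,000 per process times 8 processes) divided by the average cost in seconds. A small sketch that reproduces them, using the averages exactly as reported (the variable and function names are illustrative):

```python
REQUESTS_PER_PROCESS = 2_000_000
PROCESSES = 8

def throughput(avg_seconds):
    """Requests per second across all processes, rounded to the nearest integer."""
    return round(REQUESTS_PER_PROCESS * PROCESSES / avg_seconds)

# Average cost in seconds, as reported in the benchmark.
averages = {
    "non-thread write": 848,
    "non-thread read": 360,
    "4-thread write": 679,
    "4-thread read": 249,
}
rates = {name: throughput(seconds) for name, seconds in averages.items()}
for name, rate in rates.items():
    print(name, rate)
```

On these figures the 4-thread edition is roughly 25% faster for writes and roughly 45% faster for reads than the non-thread edition.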


How does this work?
Source: http://memcachedb.org/ and http://memcachedb.org/memcachedb-guide-1.0.pdf

Non Thread version




Thread version
BerkeleyDB
(Persistent storage used by memcachedb)
Source: http://www.oracle.com/technology/products/berkeley-db/db/index.html

Oracle Berkeley DB is a high-performance embeddable database providing SQL,
Java Object and Key/Value storage. Berkeley DB offers advanced features
including transactional data storage, highly concurrent access, replication for high
availability, and fault tolerance in a self-contained, small footprint software library.

Berkeley DB enables the development of custom data management solutions,
without the overhead traditionally associated with such custom projects. Berkeley
DB provides a collection of well-proven building-block technologies that can be
configured to address any application need from the handheld device to the
datacenter, from a local storage solution to a world-wide distributed one, from
kilobytes to petabytes.

You can download Berkeley DB, review the source code, choose your build options,
and then compile the library in the configuration most suitable for your needs.
The Berkeley DB library is a building block that provides the
complex data management features found in enterprise class databases. These
facilities include high throughput, low-latency reads, non-blocking writes, high
concurrency, data scalability, in-memory caching, ACID transactions, automatic
and catastrophic recovery when the application, system or hardware fails, high
availability and replication in an application configurable package. Simply
configure the library and use the particular features available to satisfy your
particular application needs.
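
To give a feel for what an embedded, in-process key/value library looks like from application code, here is a small sketch. It deliberately uses Python's standard-library `dbm.dumb` module as a stand-in: this is NOT Berkeley DB and offers none of its transactions, replication or access methods; it only illustrates the "open a file, store bytes under keys, reopen later" usage style. The file name and key are hypothetical:

```python
import dbm.dumb
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example")

    # Open (and create) the on-disk store, then write one key/value pair.
    with dbm.dumb.open(path, "c") as db:
        db[b"user:42"] = b"Alice"

    # Reopen it read-only: the value has survived closing the store.
    with dbm.dumb.open(path, "r") as db:
        stored = db[b"user:42"]

print(stored)
```

There is no server process and no network hop here; the storage engine runs inside the application, which is the same deployment model the library description above refers to.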

Oracle Berkeley DB fits where you need it regardless of programming language,
hardware platform, or storage media. Berkeley DB APIs are available in almost all
programming languages including ANSI-C, C++, Java, C#, Perl, Python, Ruby and
Erlang to name a few. There is a pure-Java version of the Berkeley DB library
designed for products that must run entirely within a Java Virtual Machine (JVM).
We support the Microsoft .NET environment and the Common Language Runtime
(CLR) with a C# API. Oracle Berkeley DB is tested and certified to compile and
run on all modern operating systems including Solaris, Windows, Linux, Android,
Mac OS/X, BSD, iPhone OS, VxWorks, and QNX to name a few.

Storage engine design




BerkeleyDB                                BerkeleyDB Java Ed.                       BerkeleyDB XML
Written in C                              Written in Java                           Written in C++
Software Library                          Java Software Archive (JAR)               Software Library
Key/value API                             Key/value API                             Layered on Berkeley DB
SQL API by incorporating SQLite           Java Direct Persistence Layer (DPL) API   XQuery API by incorporating XQilla
BTREE, HASH, QUEUE, RECNO storage         Java Collections API                      Indexed, optimized XML storage
C++, Java/JNI, C#, Python, Perl, ...      Replication for High Availability         C++, Java/JNI, C#, Python, Perl, ...
Java Direct Persistence Layer (DPL) API                                             Replication for High Availability
Java Collections API
Replication for High Availability




Use cases of BerkeleyDB
  • Amazon’s Dynamo -
    http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
  • BerkeleyDB Java Ed. on Android -
    http://www.oracle.com/technetwork/database/berkeleydb/bdb-je-android-160932.pdf
  • Infoflex Connect AB Embeds Critical Edge into High-Speed, High-Performance
    SMS Messaging Gateway -
    http://www.oracle.com/customers/snapshots/infoflex-connect-database-snapshot.pdf
Document Stores
Other Info
http://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores

http://ayende.com/Blog/category/565.aspx

http://www.readwriteweb.com/enterprise/2009/02/is-the-relational-database-doomed.php

Más contenido relacionado

La actualidad más candente

Evaluating Apache Cassandra as a Cloud Database
Evaluating Apache Cassandra as a Cloud DatabaseEvaluating Apache Cassandra as a Cloud Database
Evaluating Apache Cassandra as a Cloud DatabaseDataStax
 
High availability solutions bakostech
High availability solutions bakostechHigh availability solutions bakostech
High availability solutions bakostechViktoria Bakos
 
How to boost performance of your rails app using dynamo db and memcached
How to boost performance of your rails app using dynamo db and memcachedHow to boost performance of your rails app using dynamo db and memcached
How to boost performance of your rails app using dynamo db and memcachedAndolasoft Inc
 
Distributed applications using Hazelcast
Distributed applications using HazelcastDistributed applications using Hazelcast
Distributed applications using HazelcastTaras Matyashovsky
 
NoSQL Databases: An Introduction and Comparison between Dynamo, MongoDB and C...
NoSQL Databases: An Introduction and Comparison between Dynamo, MongoDB and C...NoSQL Databases: An Introduction and Comparison between Dynamo, MongoDB and C...
NoSQL Databases: An Introduction and Comparison between Dynamo, MongoDB and C...Vivek Adithya Mohankumar
 
Scaling Out Tier Based Applications
Scaling Out Tier Based ApplicationsScaling Out Tier Based Applications
Scaling Out Tier Based ApplicationsYury Kaliaha
 
Chapter1: NoSQL: It’s about making intelligent choices
Chapter1: NoSQL: It’s about making intelligent choicesChapter1: NoSQL: It’s about making intelligent choices
Chapter1: NoSQL: It’s about making intelligent choicesMaynooth University
 
HTTP Session Replication with Oracle Coherence, GlassFish, WebLogic
HTTP Session Replication with Oracle Coherence, GlassFish, WebLogicHTTP Session Replication with Oracle Coherence, GlassFish, WebLogic
HTTP Session Replication with Oracle Coherence, GlassFish, WebLogicOracle
 
Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinChristian Johannsen
 
An efficient concurrent access on cloud database using secureDBAAS
An efficient concurrent access on cloud database using secureDBAASAn efficient concurrent access on cloud database using secureDBAAS
An efficient concurrent access on cloud database using secureDBAASIJTET Journal
 
Scalability Considerations
Scalability ConsiderationsScalability Considerations
Scalability ConsiderationsNavid Malek
 
Cache and consistency in nosql
Cache and consistency in nosqlCache and consistency in nosql
Cache and consistency in nosqlJoão Gabriel Lima
 

La actualidad más candente (20)

Evaluating Apache Cassandra as a Cloud Database
Evaluating Apache Cassandra as a Cloud DatabaseEvaluating Apache Cassandra as a Cloud Database
Evaluating Apache Cassandra as a Cloud Database
 
High availability solutions bakostech
High availability solutions bakostechHigh availability solutions bakostech
High availability solutions bakostech
 
My sql
My sqlMy sql
My sql
 
Branch office access with branch cache
Branch office access with branch cacheBranch office access with branch cache
Branch office access with branch cache
 
DAG
DAGDAG
DAG
 
How to boost performance of your rails app using dynamo db and memcached
How to boost performance of your rails app using dynamo db and memcachedHow to boost performance of your rails app using dynamo db and memcached
How to boost performance of your rails app using dynamo db and memcached
 
Polyglot Persistence
Polyglot Persistence Polyglot Persistence
Polyglot Persistence
 
Distributed applications using Hazelcast
Distributed applications using HazelcastDistributed applications using Hazelcast
Distributed applications using Hazelcast
 
NoSQL Databases: An Introduction and Comparison between Dynamo, MongoDB and C...
NoSQL Databases: An Introduction and Comparison between Dynamo, MongoDB and C...NoSQL Databases: An Introduction and Comparison between Dynamo, MongoDB and C...
NoSQL Databases: An Introduction and Comparison between Dynamo, MongoDB and C...
 
Data models in NoSQL
Data models in NoSQLData models in NoSQL
Data models in NoSQL
 
Scaling Out Tier Based Applications
Scaling Out Tier Based ApplicationsScaling Out Tier Based Applications
Scaling Out Tier Based Applications
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
Chapter1: NoSQL: It’s about making intelligent choices
Chapter1: NoSQL: It’s about making intelligent choicesChapter1: NoSQL: It’s about making intelligent choices
Chapter1: NoSQL: It’s about making intelligent choices
 
HTTP Session Replication with Oracle Coherence, GlassFish, WebLogic
HTTP Session Replication with Oracle Coherence, GlassFish, WebLogicHTTP Session Replication with Oracle Coherence, GlassFish, WebLogic
HTTP Session Replication with Oracle Coherence, GlassFish, WebLogic
 
Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek Berlin
 
An efficient concurrent access on cloud database using secureDBAAS
An efficient concurrent access on cloud database using secureDBAASAn efficient concurrent access on cloud database using secureDBAAS
An efficient concurrent access on cloud database using secureDBAAS
 
Scalability Considerations
Scalability ConsiderationsScalability Considerations
Scalability Considerations
 
Nosql intro
Nosql introNosql intro
Nosql intro
 
Cache and consistency in nosql
Cache and consistency in nosqlCache and consistency in nosql
Cache and consistency in nosql
 
Cassandra Architecture FTW
Cassandra Architecture FTWCassandra Architecture FTW
Cassandra Architecture FTW
 

Destacado

PHP Underground Session 1: The Basics
PHP Underground Session 1: The BasicsPHP Underground Session 1: The Basics
PHP Underground Session 1: The BasicsRobin Hawkes
 
Winer istm dublin_20160909_d_final
Winer istm dublin_20160909_d_finalWiner istm dublin_20160909_d_final
Winer istm dublin_20160909_d_finalDov Winer
 
If you can see it, you can change it
If you can see it, you can change itIf you can see it, you can change it
If you can see it, you can change itJames Smith
 
YP-T8 Samsung Handbuch
YP-T8 Samsung HandbuchYP-T8 Samsung Handbuch
YP-T8 Samsung Handbuchjulia135
 
Dr Anil Khandelwal ss infotour
Dr Anil Khandelwal ss infotourDr Anil Khandelwal ss infotour
Dr Anil Khandelwal ss infotourguestfc8a87
 
041018 It Committee Bog Onlyejewish
041018 It Committee Bog Onlyejewish041018 It Committee Bog Onlyejewish
041018 It Committee Bog OnlyejewishDov Winer
 
MelbJS - Inside Rawkets
MelbJS - Inside RawketsMelbJS - Inside Rawkets
MelbJS - Inside RawketsRobin Hawkes
 
IIG_IFCA_Presentation-0909
IIG_IFCA_Presentation-0909IIG_IFCA_Presentation-0909
IIG_IFCA_Presentation-0909iigsolutions
 
2015teleyedaB
2015teleyedaB2015teleyedaB
2015teleyedaBDov Winer
 
Judaica europeana dovwinerjudaicalibrarians
Judaica europeana dovwinerjudaicalibrariansJudaica europeana dovwinerjudaicalibrarians
Judaica europeana dovwinerjudaicalibrariansDov Winer
 
XPers/失われたリール《腰索》
XPers/失われたリール《腰索》XPers/失われたリール《腰索》
XPers/失われたリール《腰索》ledsun
 
2012 11 07 pre incubator workshop
2012 11 07 pre incubator workshop2012 11 07 pre incubator workshop
2012 11 07 pre incubator workshopjvielman
 
Eun lre brussels_winer20100616
Eun lre brussels_winer20100616Eun lre brussels_winer20100616
Eun lre brussels_winer20100616Dov Winer
 
MozTW 離線報
MozTW 離線報MozTW 離線報
MozTW 離線報Toomore
 
MOSAICA: Semantically Enhanced Multifaceted Collaborative Access to Cultural ...
MOSAICA: Semantically Enhanced Multifaceted Collaborative Access to Cultural ...MOSAICA: Semantically Enhanced Multifaceted Collaborative Access to Cultural ...
MOSAICA: Semantically Enhanced Multifaceted Collaborative Access to Cultural ...Dov Winer
 
Oscar Gamero Garate - Canciones italianas
Oscar Gamero Garate - Canciones italianasOscar Gamero Garate - Canciones italianas
Oscar Gamero Garate - Canciones italianasmeroga
 
Toomore 20130627 Taipei.py
Toomore 20130627 Taipei.pyToomore 20130627 Taipei.py
Toomore 20130627 Taipei.pyToomore
 
UC Onliner #5 - Mei-Juni 2014
UC Onliner #5 - Mei-Juni 2014UC Onliner #5 - Mei-Juni 2014
UC Onliner #5 - Mei-Juni 2014Nur Agustinus
 

Destacado (20)

PHP Underground Session 1: The Basics
PHP Underground Session 1: The BasicsPHP Underground Session 1: The Basics
PHP Underground Session 1: The Basics
 
Winer istm dublin_20160909_d_final
Winer istm dublin_20160909_d_finalWiner istm dublin_20160909_d_final
Winer istm dublin_20160909_d_final
 
If you can see it, you can change it
If you can see it, you can change itIf you can see it, you can change it
If you can see it, you can change it
 
YP-T8 Samsung Handbuch
YP-T8 Samsung HandbuchYP-T8 Samsung Handbuch
YP-T8 Samsung Handbuch
 
Dr Anil Khandelwal ss infotour
Dr Anil Khandelwal ss infotourDr Anil Khandelwal ss infotour
Dr Anil Khandelwal ss infotour
 
041018 It Committee Bog Onlyejewish
041018 It Committee Bog Onlyejewish041018 It Committee Bog Onlyejewish
041018 It Committee Bog Onlyejewish
 
2009 2010 Presentation
2009 2010 Presentation2009 2010 Presentation
2009 2010 Presentation
 
MelbJS - Inside Rawkets
MelbJS - Inside RawketsMelbJS - Inside Rawkets
MelbJS - Inside Rawkets
 
IIG_IFCA_Presentation-0909
IIG_IFCA_Presentation-0909IIG_IFCA_Presentation-0909
IIG_IFCA_Presentation-0909
 
2015teleyedaB
2015teleyedaB2015teleyedaB
2015teleyedaB
 
Judaica europeana dovwinerjudaicalibrarians
Judaica europeana dovwinerjudaicalibrariansJudaica europeana dovwinerjudaicalibrarians
Judaica europeana dovwinerjudaicalibrarians
 
XPers/失われたリール《腰索》
XPers/失われたリール《腰索》XPers/失われたリール《腰索》
XPers/失われたリール《腰索》
 
2012 11 07 pre incubator workshop
2012 11 07 pre incubator workshop2012 11 07 pre incubator workshop
2012 11 07 pre incubator workshop
 
Eun lre brussels_winer20100616
Eun lre brussels_winer20100616Eun lre brussels_winer20100616
Eun lre brussels_winer20100616
 
MozTW 離線報
MozTW 離線報MozTW 離線報
MozTW 離線報
 
MOSAICA: Semantically Enhanced Multifaceted Collaborative Access to Cultural ...
MOSAICA: Semantically Enhanced Multifaceted Collaborative Access to Cultural ...MOSAICA: Semantically Enhanced Multifaceted Collaborative Access to Cultural ...
MOSAICA: Semantically Enhanced Multifaceted Collaborative Access to Cultural ...
 
Oscar Gamero Garate - Canciones italianas
Oscar Gamero Garate - Canciones italianasOscar Gamero Garate - Canciones italianas
Oscar Gamero Garate - Canciones italianas
 
Toomore 20130627 Taipei.py
Toomore 20130627 Taipei.pyToomore 20130627 Taipei.py
Toomore 20130627 Taipei.py
 
HISTORIA DE LA FISICA-1
HISTORIA DE LA FISICA-1HISTORIA DE LA FISICA-1
HISTORIA DE LA FISICA-1
 
UC Onliner #5 - Mei-Juni 2014
UC Onliner #5 - Mei-Juni 2014UC Onliner #5 - Mei-Juni 2014
UC Onliner #5 - Mei-Juni 2014
 

No sql exploration keyvaluestore

  • 1. NoSQL                                          2
         Why NoSQL?                                   2
         NoSQL Categories                             2
         Relational Vs NoSQL Databases                2
         Why Key/value store?                         3
         Memcached (Key/value store on memory)        4
         Memcachedb (Key/Value store on disk)         8
         BerkeleyDB                                  11
       Document Stores                               14
       Other Info                                    15
  • 2. NoSQL

       NOT only SQL. It’s not about saying that SQL should never be used, or that SQL is dead… it’s about recognizing that for some problems other storage solutions are better suited.

       Why NoSQL?

       Trends that gave rise to the NoSQL paradigm:
         • Exploding Data Size: Each year more and more digital data is created. Over two years we create more digital data than all the data created in history before that.
         • Increasing Connectedness: Over time data has evolved to be more and more interlinked and connected. Hypertext has links, blogs have pingbacks, and tagging groups all related data.
         • Semi-structure: Individualization of content, storing more about each entity, and acceleration of decentralized content generation (web 2.0).
         • Architecture: Moving towards decoupled services, each with its own backend.

       Sources:
         http://www.slideshare.net/novelys/nosql-3272395
         http://www.slideshare.net/marin_dimitrov/nosql-databases-3584443
         http://www.slideshare.net/thobe/nosql-for-dummies

       NoSQL Categories

       NoSQL Products

       Relational Vs NoSQL Databases
  • 3. Key/value store

       Why Key/value store?

       Even though RDBMS have provided database users with the best mix of simplicity, robustness, flexibility, performance, scalability, and compatibility, their performance in each of these areas is not necessarily better than that of an alternate solution pursuing one of these benefits in isolation. This has not been much of a problem so far because the universal dominance of RDBMS has outweighed the need to push any of these boundaries. Nonetheless, if you really had a need that couldn't be answered by a generic relational database, alternatives have always been around to fill those niches.
  • 4. Today, we are in a slightly different situation. For an increasing number of applications, one of these benefits is becoming more and more critical; and while still considered a niche, it is rapidly becoming mainstream, so much so that for an increasing number of database users this requirement is beginning to eclipse others in importance. That benefit is scalability.

       As more and more applications are launched in environments that have massive workloads, such as web services, their scalability requirements can, first of all, change very quickly and, secondly, grow very large. The first scenario can be difficult to manage if you have a relational database sitting on a single in-house server. For example, if your load triples overnight, how quickly can you upgrade your hardware? The second scenario can be too difficult to manage with a relational database in general.

       Relational databases scale well, but usually only when that scaling happens on a single server node. When the capacity of that single node is reached, you need to scale out and distribute that load across multiple server nodes. This is when the complexity of relational databases starts to rub against their potential to scale. Try scaling to hundreds or thousands of nodes, rather than a few, and the complexities become overwhelming, and the characteristics that make RDBMS so appealing drastically reduce their viability as platforms for large distributed systems.

       For cloud services to be viable, vendors have had to address this limitation, because a cloud platform without a scalable data store is not much of a platform at all. So, to provide customers with a scalable place to store application data, vendors had only one real option. They had to implement a new type of database system that focuses on scalability, at the expense of the other benefits that come with relational databases.
       These efforts, combined with those of existing niche vendors, have led to the rise of a new breed of database management system.

       Source: http://www.slideshare.net/marc.seeger/keyvalue-stores-a-practical-overview

       Memcached (Key/value store on memory)

       Definition

       Free & open source, high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load. Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.
  • 5. Memcached is simple yet powerful. Its simple design promotes quick deployment, ease of development, and solves many problems facing large data caches. Its API is available for most popular languages.

       What is it made up of?
         • Client software, which is given a list of available memcached servers.
         • A client-based hashing algorithm, which chooses a server based on the "key" input.
         • Server software, which stores your values with their keys into an internal hash table.
         • Server algorithms, which determine when to throw out old data (if out of memory), or reuse memory.

       What are the Design Philosophies?

       Simple Key/Value Store
       The server does not care what your data looks like. Items are made up of a key, an expiration time, optional flags, and raw data. It does not understand data structures; you must upload data that is pre-serialized. Some commands (incr/decr) may operate on the underlying data, but the implementation is simplistic.

       Smarts Half in Client, Half in Server
       A "memcached implementation" is implemented partially in a client, and partially in a server. Clients understand how to send items to particular servers, what to do when they cannot contact a server, and how to fetch keys from the servers. The servers understand how to receive items, and how to expire them.

       Servers are Disconnected From Each Other
       Memcached servers are generally unaware of each other. There is no crosstalk, no synchronization, no broadcasting. The lack of interconnections means adding more servers will usually add more capacity as you expect. There might be exceptions to this rule, but they are exceptions and carefully regarded.

       O(1) Everything
       For everything it can, memcached commands are O(1). Each command takes roughly the same amount of time to process every time, and should not get noticeably slower anywhere.
       This goes back to the "Simple K/V Store" principle, as you don't want to be processing data in the cache service that your tens, hundreds, or thousands of webservers may need to access at the same time.

       Forgetting Data is a Feature
       Memcached is, by default, a Least Recently Used cache. It is designed to have items expire after a specified amount of time. Both of these are elegant solutions to many problems: expire items after a minute to limit stale data being returned, or flush unused data in an effort to retain frequently requested information.
  • 6. This further allows great simplification in how memcached works. No "pauses" waiting for a garbage collector ensures low latency, and free space is lazily reclaimed.

       Cache Invalidation is a Hard Problem
       Given memcached's centralized-as-a-cluster nature, the job of invalidating a cache entry is trivial. Instead of broadcasting data to all available hosts, clients go directly to the exact location of the data to be invalidated. You may further complicate matters to your needs, and there are caveats, but you sit on a strong baseline.

       Architecture

       The system uses a client-server architecture. The servers maintain a key-value associative array; the clients populate this array and query it. Keys are up to 250 bytes long and values can be at most 1 megabyte in size.

       Clients use client-side libraries to contact the servers which, by default, expose their service at port 11211. Each client knows all servers; the servers do not communicate with each other. If a client wishes to set or read the value corresponding to a certain key, the client's library first computes a hash of the key to determine the server that will be used. Then it contacts that server. The server will compute a second hash of the key to determine where to store or read the corresponding value.

       The servers keep the values in RAM; if a server runs out of RAM, it discards the oldest values. Therefore, clients must treat Memcached as a transitory cache; they cannot assume that data stored in Memcached is still there when they need it. A Memcached-protocol compatible product known as MemcacheDB provides persistent storage. There is also a solution called Membase from NorthScale that provides persistence, replication and clustering.

       If all client libraries use the same hashing algorithm to determine servers, then clients can read each other's cached data; this is obviously desirable. A typical deployment will have several servers and many clients.
       However, it is possible to use Memcached on a single computer, acting simultaneously as client and server.

       http://memcached.org/

       How does this stuff work? a.k.a. "The Memcache Pattern"
       (http://code.google.com/appengine/docs/python/memcache/usingmemcache.html#Pattern)

       Memcache is typically used with the following pattern:
         • The application receives a query from the user or the application.
         • The application checks whether the data needed to satisfy that query is in memcache.
  • 7.      o If the data is in memcache, the application uses that data.
            o If the data is not in memcache, the application queries the datastore and stores the results in memcache for future requests.

       The pseudocode below represents a typical memcache request:

           from google.appengine.api import memcache

           def get_data(self):
               # Try the cache first.
               data = memcache.get("key")
               if data is not None:
                   return data
               # Cache miss: query the datastore, then cache the result for 60 seconds.
               data = self.query_for_data()
               memcache.add("key", data, 60)
               return data

       Memcached allows you to take memory from parts of your system where you have more than you need and make it accessible to areas where you have less than you need.

       Memcached also allows you to make better use of your memory. If you consider the diagram to the right, you can see two deployment scenarios:
         1. Each node is completely independent (top).
         2. Each node can make use of memory from other nodes (bottom).

       The first scenario illustrates the classic deployment strategy. However, you'll find that it is wasteful in two ways: the total cache size is only a fraction of the actual capacity of your web farm, and a lot of effort is required to keep the cache consistent across all of those nodes.

       With memcached, you can see that all of the servers are looking into the same virtual pool of memory. This means that a given item is always stored and always retrieved from the same location in your entire web cluster.

       Also, as the demand for your application grows to the point where you need to have more servers, it generally also grows in terms of the data that must be regularly accessed. A deployment strategy where these two aspects of your system scale together just makes sense.
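
       As a rough illustration of how a client library could map a key to one server, so that a given item is always stored at and read from the same location, here is a minimal Python sketch. It assumes plain modulo hashing over an MD5 digest; the function name pick_server and the server addresses are hypothetical, and real client libraries often use consistent hashing instead, so that adding a server remaps fewer keys.

```python
import hashlib
from typing import Sequence


def pick_server(key: str, servers: Sequence[str]) -> str:
    """Choose one server for a key using a hash of the key (naive modulo scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the first four bytes of the digest as an unsigned integer.
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


# Hypothetical server list; 11211 is the default memcached port.
servers = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]
print(pick_server("user:42", servers))
```

       Every client that is given the same server list in the same order and uses the same hash will pick the same server for "user:42", which is what lets clients read each other's cached data.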
  • 8. The illustration to the right only shows two web servers for simplicity, but the property remains the same as the number increases. If you had fifty web servers, you'd still have a usable cache size of 64MB in the first example, but in the second, you'd have 3.2GB of usable cache.

       Of course, you aren't required to use your web server's memory for cache. Many memcached users have dedicated machines that are built to only be memcached servers.

       Users of Memcached
       LiveJournal, Wikipedia, Flickr, Bebo, Twitter, Typepad, Yellowbot, Youtube, Digg, Wordpress, Craigslist, Mixi

       Memcachedb (Key/Value store on disk)

       Definition

       MemcacheDB (Wiki: http://en.wikipedia.org/wiki/Memcachedb) is a persistence-enabled variant of memcached, a general-purpose distributed memory caching system often used to speed up dynamic database-driven websites by caching data and objects in memory.

       The main difference between MemcacheDB and memcached is that MemcacheDB has its own key-value database system based on Berkeley DB, so it is meant for persistent storage rather than as a cache solution. MemcacheDB is accessed through the same protocol as memcached, so applications may use any memcached API as a means of accessing a MemcacheDB database.

       MemcacheQ is a MemcacheDB variant that provides a simple message queue service.

       MemcacheDB is a distributed key-value storage system designed for persistence. It is NOT a cache solution, but a persistent storage engine for fast and reliable key-value based object storage and retrieval. It conforms to the memcache protocol, so any memcached client can connect to it. MemcacheDB uses Berkeley DB as its storage backend, so many features, including transactions and replication, are supported.

       Memcached was first developed by Brad Fitzpatrick for his website LiveJournal, on May 22, 2003.

       Features
         • High performance read/write for a key-value based object. Rapid set/get for a key-value based object, not relational. The benchmark will tell you the truth later.
  • 9.    • Highly reliable persistent storage with transactions. Transactions are used to make your data more reliable.
         • High availability data storage with replication. Replication rocks! Achieve your HA, spread your reads, make your transactions durable!
         • Memcache protocol compatibility. Lots of Memcached client APIs can be used with Memcachedb, in almost any language: Perl, C, Python, Java, ...

       Why memcachedb?

       We have MySQL, we have PostgreSQL, we have a lot of RDBMSs, so why do we need Memcachedb?
         • RDBMS is slow. They all have a complicated SQL engine on top of storage. Our data needs to be stored and retrieved very fast.
         • RDBMS does not handle concurrency well when there are thousands of clients and millions of requests...
         • The data we want to store is very small! The cost is high if we use an RDBMS.
         • Many critical infrastructure services need fast, reliable data storage and retrieval, but do not need the flexibility of dynamic SQL queries:
            o Index, Counter, Flags
            o Identity Management (Account, Profile, User config info, Score)
            o Messaging
            o Personal domain name
            o Metadata of distributed systems
            o Other non-relational data

       Performance

       Benchmark: MemcacheDB is very fast.

       Environment
         • Box: Dell 2950III
         • OS: Linux CentOS 5
         • Version: memcachedb-1.0.0-beta
         • Client API: libmemcached

       a. Non-thread Edition

       Started: memcachedb -d -r -u root -H /data1/mdbtest/ -N -v

       Write (key: 16, value: 100B, 8 concurrent processes, every process does 2,000,000 sets)

           No.      1    2    3    4    5    6    7    8    avg.
           Cost(s)  807  835  840  853  859  857  865  868  848

           2000000 * 8 / 848 = 18868 w/s

       Read (key: 16, value: 100B, 8 concurrent processes, every process does 2,000,000 gets)

           No.      1    2    3    4    5    6    7    8    avg.
           Cost(s)  354  354  359  358  357  364  363  365  360

           2000000 * 8 / 360 = 44444 r/s

       b. Thread Edition (4 Threads)
  • 10. Started: memcachedb -d -r -u root -H /data1/mdbtest/ -N -t 4 -v

       Write (key: 16, value: 100B, 8 concurrent processes, every process does 2,000,000 sets)

           No.      1    2    3    4    5    6    7    8    avg.
           Cost(s)  663  669  680  680  684  683  687  686  679

           2000000 * 8 / 679 = 23564 w/s

       Read (key: 16, value: 100B, 8 concurrent processes, every process does 2,000,000 gets)

           No.      1    2    3    4    5    6    7    8    avg.
           Cost(s)  245  249  250  248  248  249  251  250  249

           2000000 * 8 / 249 = 64257 r/s

       How does this stuff work?

       Source: http://memcachedb.org/ and http://memcachedb.org/memcachedb-guide-1.0.pdf

       [Figures: Non Thread version; Thread version]
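
       The throughput figures in the benchmark follow directly from the reported averages: total operations (operations per process times number of processes) divided by the average elapsed seconds. The short Python sketch below redoes that arithmetic; the function name throughput and the dictionary labels are illustrative, and the averages are taken as reported in the tables rather than recomputed.

```python
def throughput(ops_per_process: int, processes: int, avg_seconds: float) -> int:
    """Total operations per second, rounded to the nearest integer."""
    return round(ops_per_process * processes / avg_seconds)


# Average cost in seconds, as reported in the benchmark tables.
reported_averages = {
    "non-thread write": 848,
    "non-thread read": 360,
    "4-thread write": 679,
    "4-thread read": 249,
}

for name, avg in reported_averages.items():
    print(name, throughput(2_000_000, 8, avg), "ops/s")
```

       With these inputs the four results match the w/s and r/s values quoted above.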
  • 11. BerkeleyDB (Persistent storage used by memcachedb)

       Source: http://www.oracle.com/technology/products/berkeley-db/db/index.html

       Oracle Berkeley DB is a high-performance embeddable database providing SQL, Java Object and Key/Value storage. Berkeley DB offers advanced features including transactional data storage, highly concurrent access, replication for high availability, and fault tolerance in a self-contained, small footprint software library.

       Berkeley DB enables the development of custom data management solutions, without the overhead traditionally associated with such custom projects. Berkeley DB provides a collection of well-proven building-block technologies that can be configured to address any application need from the handheld device to the datacenter, from a local storage solution to a world-wide distributed one, from kilobytes to petabytes.

       You can download Berkeley DB, review the source code, choose your build options, and then compile the library in the configuration most suitable for your needs. The Berkeley DB library is a building block that provides the complex data management features found in enterprise class databases. These
  • 12. facilities include high throughput, low-latency reads, non-blocking writes, high concurrency, data scalability, in-memory caching, ACID transactions, automatic and catastrophic recovery when the application, system or hardware fails, high availability and replication in an application configurable package. Simply configure the library and use the particular features available to satisfy your particular application needs.

       Oracle Berkeley DB fits where you need it regardless of programming language, hardware platform, or storage media. Berkeley DB APIs are available in almost all programming languages including ANSI-C, C++, Java, C#, Perl, Python, Ruby and Erlang to name a few. There is a pure-Java version of the Berkeley DB library designed for products that must run entirely within a Java Virtual Machine (JVM). We support the Microsoft .NET environment and the Common Language Runtime (CLR) with a C# API. Oracle Berkeley DB is tested and certified to compile and run on all modern operating systems including Solaris, Windows, Linux, Android, Mac OS/X, BSD, iPhone OS, VxWorks, and QNX to name a few.

       Storage engine design

           BerkeleyDB                               BerkeleyDB Java Ed.                      BerkeleyDB XML
           ---------------------------------------  ---------------------------------------  ------------------------------------
           Written in C                             Written in Java                          Written in C++
           Software Library                         Java Software Archive (JAR)              Software Library
           Key/value API                            Key/value API                            Layered on Berkeley DB
           SQL API by incorporating SQLite          Java Direct Persistence Layer (DPL) API  XQuery API by incorporating XQilla
           BTREE, HASH, QUEUE, RECNO storage        Java Collections API                     Indexed, optimized XML storage
           C++, Java/JNI, C#, Python, Perl, ...     Replication for High Availability        C++, Java/JNI, C#, Python, Perl, ...
           Java Direct Persistence Layer (DPL) API                                           Replication for High Availability
           Java Collections API
           Replication for High Availability

  • 13. Use cases of BerkeleyDB
         • Amazon’s Dynamo - http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
         • BerkeleyDB Java Ed. on Android - http://www.oracle.com/technetwork/database/berkeleydb/bdb-je-android-160932.pdf
         • Infoflex Connect AB Embeds Critical Edge into High-Speed, High-Performance SMS Messaging Gateway - http://www.oracle.com/customers/snapshots/infoflex-connect-database-snapshot.pdf
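
       To give a feel for what an embedded, persistent key/value API looks like in practice, here is a minimal Python sketch. It is not Berkeley DB: it uses Python's standard-library dbm.dumb module purely as a stand-in, and the helper name put_and_reopen, the file name, and the key are hypothetical. It only illustrates the general idea of a library that runs inside the application process and keeps key/value pairs on disk across reopen.

```python
import dbm.dumb
import os
import tempfile


def put_and_reopen(path: str, key: str, value: str) -> bytes:
    """Write one key/value pair, close the store, then read it back after reopening."""
    # "c" opens the store for reading and writing, creating it if needed.
    with dbm.dumb.open(path, "c") as db:
        db[key] = value
    # Reopen read-only to show the pair was persisted to disk.
    with dbm.dumb.open(path, "r") as db:
        return db[key]


with tempfile.TemporaryDirectory() as tmp:
    stored = put_and_reopen(os.path.join(tmp, "profiles"), "user:42", "alice")
    print(stored)  # values come back as bytes
```

       As with memcached, the store does not interpret the value; the application is responsible for serializing structured data before writing it.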