KVS performance
   RDBMS-style indexes
  plus MapReduce on top
    All-in-one NoSQL

Rakuten,inc DU Architect Group Hiroaki Kubota |2011/1/18   1
Introduction




               2
Who am I ?




             3
Introduction
Profile


Name:       窪田 博昭 Hiroaki Kubota
Company:    Rakuten Inc.
Unit:       ACT = Development Unit Architect Group
Mail:       hiroaki.kubota@mail.rakuten.com

Hobby:      Futsal , Golf
Recent:     My physical strength has been gradually declining...

twitter :   crumbjp
github:     crumbjp
                                                        4
Introduction
Agenda
• Introduction
• Mongo’s characteristics
• How to take advantage of Mongo for our service
     –   Our new system “cockatoo”
     –   MapReduce
• Structure & Performance
• Performance example ( on EC2 large )
• Major problems...
      – Indexing
      – STALE
      – Diskspace
      – PHP client
• Closing
                                                       5
Mongo’s characteristics




                          6
Mongo’s characteristic
Mongo’s ... / Mongo has ... / Mongo is ...
READ performance is extremely good!

WRITE performance is so-so, and it does not scale.

Reading data immediately after it is written does not work well.

Very high availability!

Still under development:
 Maintenance tools are poor.
 Some operations are useless.
                                                     7
How to take advantage of Mongo
         for the infoseek news




                                      8
Our new system “Cockatoo”
(formerly called “Albatross”)




                                9
An example of our pages




                           10
Page structure




                 11
Layout / Components

Layout            Components




                               12
Generic WEB structure

Internet
 Internet

                 Request



       WEB
       WEB

                Call APIs



       API
       API

              Retrieve data

  DB
                               13
Cockatoo structure

           Internet
            Internet

                                  Request                 SessionDB
LayoutDB     Get page layout

                                                          MongoDB
                   WEB
                   WEB                                    ReplSet
MongoDB
ReplSet       Get components
                                 Call APIs                Memcache

                    API
                    API

                               Retrieve data

             ContentsDB                        MongoDB
                                                ReplSet               14
Cockatoo structure

           Internet
            Internet

                                  Request                 SessionDB
LayoutDB     Get page layout

                                                          MongoDB
                   WEB
                   WEB                                    ReplSet
MongoDB
ReplSet       Get components
                                 Call APIs                Memcache

                    API
                    API

                               Retrieve data

             ContentsDB                        MongoDB
                                                ReplSet

Mongo’s READ performance is enough to cope with the WEB page views (PV),
but its WRITE performance is not enough.                                  15
Cockatoo structure

           Internet
            Internet

                                  Request                 SessionDB
LayoutDB     Get page layout

                                                          MongoDB
                   WEB
                   WEB                                    ReplSet
MongoDB
ReplSet       Get components
                                 Call APIs                Memcache

                    API
                    API

                               Retrieve data

             ContentsDB                        MongoDB
                                                ReplSet               16
Cockatoo structure

            Internet
             Internet

                                   Request                 SessionDB
LayoutDB      Get page layout

                                                           MongoDB
                    WEB
                    WEB                                    ReplSet
MongoDB
ReplSet        Get components
                                  Call APIs                Memcache

                     API
                     API
Zookeeper
                                Retrieve data

              ContentsDB                        MongoDB
                                                 ReplSet               17
Cockatoo structure

            Internet
             Internet

                                   Request                 SessionDB
LayoutDB      Get page layout

                                                           MongoDB
                    WEB
                    WEB                                    ReplSet
MongoDB
ReplSet        Get components
                                  Call APIs                Memcache

                     API
                     API
Zookeeper                                                  Solr
                                Retrieve data

              ContentsDB                        MongoDB
                                                 ReplSet               18
Cockatoo structure
                   Developer

                                HTML markup
LayoutDB   Set page layout          &        Deploy API
                                API settings

                 CMS                                Batch servers
MongoDB
ReplSet     Set components
                                                             Insert Data


                                     API servers
                                     API servers
              Set static contents




             ContentsDB                        MongoDB
                                                ReplSet                    19
CMS
Layout editor




                      20
CMS




      21
CMS




      22
MapReduce




            23
MapReduce
Our usage
We have never used MapReduce in regular operation.
However, we have used it for some irregular cases.

• To search for invalid articles that should be removed
  because of someone’s mistake...

• To count the number of new articles posted per day.

• To count the number of updates per article.

• We are starting to consider using it regularly for
  social data analysis before long ...                  24
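
As a rough illustration of the "new articles per day" case, the following plain-JavaScript sketch mimics the map / emit / reduce flow outside of MongoDB. The `articles` array, the `postedAt` field, and the `runMapReduce` helper are assumptions made up for this example; they are not our schema and not a MongoDB API.

```javascript
// Minimal in-memory imitation of the map -> emit -> reduce flow.
// `runMapReduce`, `articles` and `postedAt` are hypothetical names.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // Call map with `this` bound to each document, as the mongo shell does.
  docs.forEach((doc) => map.call(doc, emit));
  const result = {};
  for (const [key, values] of groups) {
    result[key] = reduce(key, values);
  }
  return result;
}

const articles = [
  { title: 'a', postedAt: '2011-01-17T09:00:00Z' },
  { title: 'b', postedAt: '2011-01-17T21:30:00Z' },
  { title: 'c', postedAt: '2011-01-18T08:15:00Z' },
];

// map: emit one count per article, keyed by the posting day (YYYY-MM-DD).
function mapByDay(emit) {
  emit(this.postedAt.slice(0, 10), 1);
}

// reduce: sum the counts emitted for one day.
function sumCounts(key, values) {
  return values.reduce((total, v) => total + v, 0);
}

const perDay = runMapReduce(articles, mapByDay, sumCounts);
console.log(perDay);
```

In the real shell, `emit` is a global rather than an argument; it is passed in here only to keep the sketch self-contained.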
Structure & Performance




                          25
Structure
We are using very poor machines (virtual machines) !!

   • Intel(R) Xeon(R) CPU X5650 2.67GHz
       1core!!
   •   4GB memory
   •   50 GB disk space ( iSCSI )
   •   CentOS5.5 64bit
   •   mongodb 1.8.0
        – ReplicaSet 5 nodes ( + 1 Arbiter)
        – Oplog size 1.2GB
        – Average object size 1KB
                                                      26
Structure
Environments investigated

We also investigated the following environments...
• Virtual machine 1 core
  – 1kb data , 6,000,000 documents
  – 8kb data , 200,000 documents
• Virtual machine 3 core
  – 1kb data , 6,000,000 documents
  – 8kb data , 200,000 documents
• EC2 large instance
  – 2kb data , 60,000,000 documents. ( 100GB )
                                                  27
Performance
I found a formula for roughly estimating QPS

Conditions: 1~8 KB documents + 1 unique index
C    = Number of CPU cores (Xeon 2.67 GHz)
DD   = Score of the ‘dd’ command (bytes/sec)
S    = Document size (bytes)

• GET qps        = 4500 × C
• SET(fsync) qps = 0.05 × DD ÷ S
• SET(nsync) qps = 4500, BUT...
                       there is a chance of STALE
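
The formula can be tried out with a small helper such as the sketch below. Dividing a byte rate (DD) by a document size (S) gives documents per second, so the fsync figure is treated as a write rate. The function name `estimateQps` and the input values (a 'dd' score of 100 MiB/s, 1 KB documents, 1 core) are assumptions for illustration, not measurements.

```javascript
// Rough QPS estimate following the formula on this slide.
// `estimateQps` and the sample inputs are hypothetical.
function estimateQps({ cores, ddBytesPerSec, docSizeBytes }) {
  return {
    get: 4500 * cores,                               // GET qps = 4500 x C
    setFsync: (0.05 * ddBytesPerSec) / docSizeBytes, // SET(fsync) = 0.05 x DD / S
    setNoSync: 4500,                                 // SET(nsync); may lead to STALE
  };
}

const estimate = estimateQps({
  cores: 1,
  ddBytesPerSec: 100 * 1024 * 1024, // assumed 'dd' score of 100 MiB/s
  docSizeBytes: 1024,               // 1 KB documents
});
console.log(estimate);
```

With these assumed inputs the fsync estimate comes out at roughly 5,120 writes per second, close to the unsynced figure, so the disk only becomes the limit for slower disks or larger documents.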
                                                           28
Performance example (on EC2 large)




                                     29
Performance example (on EC2 large)
Environment and amount of data

EC2 large instance
  – 2kb data , 60,000,000 documents. ( 100GB )
  – 1 unique index

Data-type
  {
      shop: 'someone',
      item: 'something',
      description: 'item explanation sentences...'
  }                                                  30
Performance example (on EC2 large)
Batch insert (1000 documents) fsync=true
17906 sec (=298 min) (=3358 docs/sec)

Ensure index (background=false)

4049 sec (=67min)
         1.primary            2101 sec (=35min)
         2.secondary     1948 sec (=32min)




                                                  31
Performance example (on EC2 large)
Add one node
5833sec (=97min)
       1.Get files 2GB×48   2120 sec (=35min)
       2._id indexing       1406 sec (=23min)
       3.uniq indexing      2251 sec (=38min)
       4.other processes    56 sec (=1 min)




                                                32
Performance example (on EC2 large)
Group by
• Reduce by unique index & map & reduce
       –368 msec
 db.data.group({ key: { shop: 1},
                 cond: { shop: 'someone' },
                 reduce: function ( o , p ) { p.sum++; },
                 initial: { sum: 0 } });




                                                            33
Performance example (on EC2 large)
MapReduce
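• Scan all data: 3116 sec (=52 min)
         – number of keys = 39092
 db.data.mapReduce(
  // map: emit one count per document, keyed by shop
  function () { emit(this.shop, 1); },
  // reduce: sum the counts emitted for one shop
  function (key, values) {
   var ret = 0;
   values.forEach(function (value) { ret += value; });
   return ret;
  },
  // write the result to the 'Tmp' collection
  { query: {}, out: 'Tmp' } );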
                                                  34
Major problems...




                    35
Indexing




           36
Index problem
Online indexing is completely useless, even in the latest version (2.0.2).
By default, indexing is a locking operation.
Indexing can run in the background
   on the primary. But...
It CANNOT run in the background on the secondaries.
Moreover, all the secondaries run their indexing
   at the same time !!
As a result...

           All the slaves freeze !                       orz...
                                                                      37
Present indexing ( default )




                               38
Index problem
Present indexing ( default )
                               Primary
                                              save
                                                        Batch


      Secondary                Secondary          Secondary




   Client       Client         Client    Client      Client   39
Index problem
Present indexing ( default )
                                Primary
 ensureIndex
                                 Lock    Cannot write    Batch

                               Indexing

      Secondary                Secondary           Secondary




   Client       Client         Client     Client     Client    40
Index problem
Present indexing ( default )
                               Primary
           finished
                                                       Batch
                               Complete
                  SYNC                     SYNC
                                  SYNC
      Secondary                Secondary          Secondary
         Lock                     Lock               Lock
       Indexing                 Indexing           Indexing

                      Cannot read !!
   Client       Client         Client    Client     Client    41
Index problem
Present indexing ( default )
                               Primary
                                                       Batch
                               Complete

      Secondary                Secondary          Secondary

       Complete                Complete           Complete



   Client       Client         Client    Client     Client    42
Present indexing ( background )




                                  43
Index problem
Present indexing ( background )
                            Primary
                                           save
                                                     Batch


     Secondary             Secondary           Secondary




   Client      Client      Client     Client      Client   44
Index problem
 Present indexing ( background )

ensureIndex(background)
                             Primary      Slow down...
                           Slowdown                Batch
                            Indexing

       Secondary            Secondary         Secondary




    Client      Client      Client   Client     Client    45
Index problem
Present indexing ( background )
                            Primary
          finished
                                                       Batch
                          Complete
                 SYNC                      SYNC
                                  SYNC
     Secondary             Secondary              Secondary
        Lock                  Lock                   Lock
      Indexing              Indexing               Indexing

                     Cannot read !!
   Client      Client      Client        Client     Client    46
Index problem
Present indexing ( background )
                            Primary
          finished
                                                       Batch
                          Complete
                 SYNC                      SYNC
                                  SYNC
     Secondary             Secondary              Secondary
        Lock                  Lock                   Lock
      Indexing              Indexing               Indexing

Background indexing doesn’t work on the secondaries.
                     Cannot read !!
   Client      Client      Client        Client     Client    47
Index problem
Present indexing ( background )
                            Primary
          finished
                                                       Batch
                          Complete
                 SYNC                      SYNC
                                  SYNC
     Secondary             Secondary              Secondary
        Lock                  Lock                   Lock
      Indexing              Indexing               Indexing

                     Cannot read !!
   Client      Client      Client        Client     Client    48
Index problem
Present indexing ( background )
                            Primary
                                                    Batch
                          Complete

     Secondary             Secondary           Secondary

      Complete             Complete            Complete



   Client      Client      Client     Client     Client    49
Probable 2.1.X indexing




                          50
Index problem
According to mongodb.org, this problem will be fixed in 2.1.0.

But that version has not been formally released.
So I checked out the latest source code.
 It certainly will be fixed !
Moreover, it looks like indexing will run in the foreground
 when the slave status isn’t SECONDARY
   (Does that mean RECOVERING ?)




                                                         51
Index problem
Probable 2.1.X indexing
                             Primary
                                            save
                                                      Batch


      Secondary             Secondary           Secondary




   Client      Client       Client     Client      Client   52
Index problem
 Probable 2.1.X indexing

ensureIndex(background)
                              Primary      Slow down...
                            Slowdown                Batch
                             Indexing

       Secondary             Secondary         Secondary




    Client      Client       Client   Client     Client    53
Index problem
Probable 2.1.X indexing
                             Primary
          finished
                                                     Batch
                            Complete
                  SYNC                    SYNC
                                SYNC
      Secondary             Secondary           Secondary
      Slowdown              Slowdown            Slowdown
       Indexing              Indexing            Indexing

                     Slow down...
   Client      Client       Client     Client     Client    54
Index problem
Probable 2.1.X indexing
                             Primary
                                                     Batch
                            Complete

      Secondary             Secondary           Secondary

      Complete               Complete           Complete



   Client      Client       Client     Client     Client    55
Index problem
Background indexing 2.1.X

But I think it’s not enough.
I think it can bring down our system when
all the secondaries slow down at the same time !!


               So...




                                                   56
Ideal indexing




                 57
Index problem
Ideal indexing
                             Primary
                                            save
                                                      Batch


      Secondary             Secondary           Secondary




   Client        Client     Client     Client      Client   58
Index problem
 Ideal indexing

ensureIndex(background)
                              Primary      Slow down...
                            Slowdown                Batch
                             Indexing

       Secondary             Secondary         Secondary




    Client        Client     Client   Client     Client    59
Index problem
Ideal indexing
                             Primary
           finished
                                                     Batch
                            Complete
            ensureIndex

     Recovering             Secondary           Secondary

        Indexing



   Client        Client     Client     Client     Client    60
Index problem
Ideal indexing
                             Primary
                                                     Batch
                            Complete
                                  ensureIndex
      Secondary             Recovering          Secondary

       Complete              Indexing



   Client        Client     Client     Client     Client    61
Index problem
Ideal indexing
                             Primary
                                                    Batch
                            Complete
                                          ensureIndex

      Secondary             Secondary       Recovering

       Complete              Complete           Indexing



   Client        Client     Client     Client    Client    62
Index problem
Ideal indexing
                             Primary
                                                     Batch
                            Complete

      Secondary             Secondary           Secondary

       Complete              Complete           Complete



   Client        Client     Client     Client     Client    63
Index problem
But ... I can easily guess that this is difficult to implement with the current Oplog



It would be great if I could run indexing
   manually
  on each secondary
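
To make the difference between "all secondaries index at once" and "one secondary at a time" concrete, here is a small model that only tracks which secondaries stay readable while an index is being built. It makes no MongoDB calls; `simulateIndexBuild`, `minReadable`, the strategy names, and the count of three secondaries are assumptions for illustration.

```javascript
// Model of an index build spreading through a replica set.
// Each timeline step lists the state of every secondary.
// All names here are hypothetical; nothing talks to MongoDB.
function simulateIndexBuild(secondaryCount, strategy) {
  const timeline = [];
  if (strategy === 'simultaneous') {
    // Present behaviour: every secondary locks and indexes at once.
    timeline.push(new Array(secondaryCount).fill('Indexing'));
  } else if (strategy === 'rolling') {
    // Ideal / manual behaviour: one secondary at a time is busy.
    for (let busy = 0; busy < secondaryCount; busy++) {
      timeline.push(
        Array.from({ length: secondaryCount }, (_, i) =>
          i === busy ? 'Indexing' : 'Readable'
        )
      );
    }
  } else {
    throw new Error('unknown strategy: ' + strategy);
  }
  return timeline;
}

// Smallest number of readable secondaries seen at any step.
function minReadable(timeline) {
  return Math.min(
    ...timeline.map((step) => step.filter((s) => s === 'Readable').length)
  );
}

const atOnce = simulateIndexBuild(3, 'simultaneous');
const oneByOne = simulateIndexBuild(3, 'rolling');
console.log(minReadable(atOnce), minReadable(oneByOne));
```

The rolling variant takes more steps in total, but clients always have secondaries left to read from.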




                                                                    64
I suggest Manual indexing




                            65
Index problem
Manual indexing
                              Primary
                                             save
                                                       Batch


     Secondary               Secondary           Secondary




   Client         Client     Client     Client      Client   66
Index problem
Manual indexing

ensureIndex(manual,background)
                              Primary      Slow down...
                            Slowdown                Batch
                             Indexing

     Secondary               Secondary         Secondary




   Client         Client     Client   Client     Client    67
Index problem
Manual indexing
                              Primary
          finished
                                                      Batch
                             Complete

     Secondary               Secondary           Secondary




   Client         Client     Client     Client     Client    68
Index problem
Manual indexing
                              Primary
          finished
                                                      Batch
                             Complete

     Secondary               Secondary           Secondary

      The secondaries don’t sync
             automatically
   Client         Client     Client     Client     Client    69
Index problem
Manual indexing
                              Primary
          finished
                                                      Batch
                             Complete

     Secondary               Secondary           Secondary




   Client         Client     Client     Client     Client    70
Index problem
Manual indexing
                              Primary
                                                      Batch
                             Complete
            ensureIndex(manual)

     Recovering              Secondary           Secondary

       Indexing



   Client         Client     Client     Client     Client    71
Index problem
Manual indexing
                              Primary
                                                         Batch
                             Complete
                                   ensureIndex(manual)

     Secondary               Recovering          Secondary

      Complete                Indexing



   Client         Client     Client     Client     Client    72
Index problem
Manual indexing
                              Primary
                                                      Batch
                             Complete
ensureIndex(manual,background)

     Secondary               Secondary           Secondary
                                                 Slowdown
      Complete                Complete            Indexing



   Client         Client     Client     Client     Client    73
Index problem
Manual indexing
                              Primary
                                                     Batch
                             Complete
ensureIndex(manual,background)

     Secondary               Secondary           Secondary
                                                 Slowdown
      Complete                Complete           Indexing

It needs to support background operation,
just in case the ReplSet has only one Secondary.
   Client         Client     Client     Client    Client    74
Index problem
Manual indexing
                              Primary
                                                      Batch
                             Complete
ensureIndex(manual,background)

     Secondary               Secondary           Secondary
                                                 Slowdown
      Complete                Complete            Indexing



   Client         Client     Client     Client     Client    75
Index problem
Manual indexing
                              Primary
                                                      Batch
                             Complete

     Secondary               Secondary           Secondary

      Complete                Complete           Complete



   Client         Client     Client     Client     Client    76
That’s all about the indexing problem




                                    77
Struggle to control the sync




                               78
STALE




        79
Unknown log message & ReplSet out of control
We often suffered from the Secondaries going out of control...

• Secondaries repeatedly flip between Secondary and
  Recovering within moments
  (1.8.0)
• Then we found a strange line in the log...



 [rsSync] replSet error RS102 too stale to catch up
                                                                 80
What’s Stale ?
stale [stéil] (level: essential for working adults) powered by goo.ne.jp

•   (of food, drink, etc.) not fresh (⇔ fresh);
•   flat; (of coffee) having lost its aroma,
•   (of bread) dried out, hardened,
•   (of air, smells, etc.) stuffy,
•   foul-smelling




                                                  81
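What’s Stale ?
stale [stéil] (level: essential for working adults) powered by goo.ne.jp

•   (of food, drink, etc.) not fresh (⇔ fresh);
•   flat; (of coffee) having lost its aroma,
•   (of bread) dried out, hardened,
•   (of air, smells, etc.) stuffy,
•   foul-smelling


Apparently it is something very bad...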

                                                  82
Mechanism of becoming stale




                           83
ReplicaSet

  Client
  Client

mongod              mongod




Database   Oplog   Database   Oplog
    Primary          Secondary        84
Replication (simple case)




                            85
ReplicaSet

  Client
  Client

mongod              mongod




Database   Oplog   Database   Oplog
    Primary          Secondary        86
Insert & Replication 1

             A
  Client
  Client      Insert


mongod                     mongod




            Insert A
   A

Database    Oplog         Database   Oplog
       Primary               Secondary          87
Insert & Replication 1

  Client
  Client

                                  Sync




            Insert A                     Insert A
   A                          A

Database    Oplog         Database       Oplog
       Primary               Secondary              88
Replication (busy case)




                          89
Stale

  Client
  Client

mongod                  mongod




            Insert A                  Insert A
   A                      A

Database    Oplog      Database       Oplog
       Primary           Secondary               90
Insert & Replication 2

             B
  Client
  Client      Insert




            Insert B
   B        Insert A                 Insert A
   A                          A

Database    Oplog         Database   Oplog
       Primary               Secondary          91
Insert & Replication 2

             C
  Client
  Client      Insert




            Insert C
   C        Insert B
   B        Insert A                 Insert A
   A                          A

Database    Oplog         Database   Oplog
       Primary               Secondary          92
Insert & Replication 2

             A
  Client
  Client     Update




           Update A
            Insert C
   C        Insert B
   B        Insert A                 Insert A
   A                          A

Database    Oplog         Database   Oplog
       Primary               Secondary          93
Insert & Replication 2

  Client
  Client

                            Check Oplog




           Update A
            Insert C
   C        Insert B
   B        Insert A                      Insert A
   A                          A

Database    Oplog         Database        Oplog
       Primary               Secondary               94
Insert & Replication 2

  Client
  Client

                              Sync




           Update A                  Update A
            Insert C                 Insert C
   C        Insert B          C      Insert B
   B        Insert A          B      Insert A
   A                          A

Database    Oplog         Database   Oplog
       Primary               Secondary          95
Replication (more busy)




                          96
Stale

  Client
  Client

mongod                  mongod




            Insert A                  Insert A
   A                      A

Database    Oplog      Database       Oplog
       Primary           Secondary               97
Stale

             B
  Client
  Client      Insert




            Insert B
   B        Insert A                  Insert A
   A                      A

Database    Oplog      Database       Oplog
       Primary           Secondary               98
Stale

             C
  Client
  Client      Insert




            Insert C
   C        Insert B
   B        Insert A                  Insert A
   A                      A

Database    Oplog      Database       Oplog
       Primary           Secondary               99
Stale

             A
  Client
  Client     Update




           Update A
            Insert C
   C        Insert B
   B        Insert A                  Insert A
   A                      A

Database    Oplog      Database       Oplog
       Primary           Secondary               100
Stale

             C
  Client
  Client     Update




           Update C
           Update A
   C        Insert C
   B        Insert B                  Insert A
   A        Insert A      A

Database    Oplog      Database       Oplog
       Primary           Secondary               101
Stale

             D
  Client
  Client      Insert




            Insert D
   D       Update C
   C       Update A
   B        Insert C                  Insert A
   A        Insert B      A

Database    Insert A   Database       Oplog
       Primary           Secondary               102
Stale

  Client
  Client                                [Insert A]
                                       not found !!
                              Check Oplog




            Insert D
   D       Update C
   C       Update A
   B        Insert C                  Insert A
   A        Insert B      A

Database    Insert A   Database       Oplog
       Primary           Secondary                    103
Stale

  Client
  Client                                [Insert A]
                                       not found !!
                              Check Oplog

                                                 It cannot get
                                                 information about
                                                 [Insert B].
            Insert D
   D       Update C
   C       Update A                              So it cannot sync !!
   B        Insert C                  Insert A
   A        Insert B      A
                                                 It’s called STALE
Database    Insert A   Database       Oplog
       Primary           Recovering                                  104
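
The sequence in these slides can be reproduced with a toy model of a capped oplog. This is only a sketch: the `CappedOplog` class, its sequence numbers, and the capacity of five entries are assumptions chosen to match the diagrams, not MongoDB's actual implementation.

```javascript
// A capped oplog keeps only the newest `capacity` operations.
// Class and method names are hypothetical, for illustration only.
class CappedOplog {
  constructor(capacity) {
    this.capacity = capacity;
    this.entries = []; // each entry: { seq, op }
    this.nextSeq = 1;
  }

  append(op) {
    this.entries.push({ seq: this.nextSeq++, op });
    // When the oplog is full, the oldest entry falls out.
    if (this.entries.length > this.capacity) this.entries.shift();
  }

  // Returns the operations after `lastAppliedSeq`, or null when the
  // secondary's last applied entry is no longer in the oplog (STALE).
  entriesAfter(lastAppliedSeq) {
    const found = this.entries.some((e) => e.seq === lastAppliedSeq);
    if (!found) return null;
    return this.entries.filter((e) => e.seq > lastAppliedSeq);
  }
}

const primary = new CappedOplog(5);
primary.append('Insert A');
const secondaryLastApplied = 1; // the secondary has synced "Insert A"

// Busy case: the secondary can still find "Insert A" and catch up.
['Insert B', 'Insert C', 'Update A'].forEach((op) => primary.append(op));
const busyCase = primary.entriesAfter(secondaryLastApplied);

// More busy case: "Insert A" has been pushed out of the oplog.
['Update C', 'Insert D'].forEach((op) => primary.append(op));
const moreBusyCase = primary.entriesAfter(secondaryLastApplied);

console.log(busyCase.map((e) => e.op), moreBusyCase === null ? 'STALE' : 'ok');
```

In this model the only things that decide whether a lagging secondary survives are the oplog capacity and how many writes arrive before its next sync.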
Stale
We have to understand the importance of tuning the oplog size

We can specify the oplog size with a command-line option,
 but only the first time mongod starts on a given dbpath
 (which is also specified on the command line).
After that, we cannot change the oplog size
 without clearing the dbpath.


              Be careful !
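
A back-of-the-envelope way to pick an oplog size is to estimate how long the oplog can absorb writes before its oldest entry is overwritten. The sketch below assumes that each oplog entry is about as large as one document and that writes arrive at a steady rate; real entries carry extra overhead, so treat the result as an upper bound. The function name `oplogWindowSeconds` and the write rate of 1,000 writes/sec are assumptions; the 1.2 GB oplog and 1 KB object size are the figures from our Structure slide.

```javascript
// Rough estimate of how long a secondary may lag before going STALE.
// Assumes one oplog entry is about one document in size and a steady
// write rate; `oplogWindowSeconds` is a hypothetical helper.
function oplogWindowSeconds(oplogBytes, avgEntryBytes, writesPerSec) {
  const entriesHeld = Math.floor(oplogBytes / avgEntryBytes);
  return entriesHeld / writesPerSec;
}

const windowSec = oplogWindowSeconds(
  1.2 * 1024 ** 3, // 1.2 GB oplog
  1024,            // 1 KB average object size
  1000             // assumed write rate (writes per second)
);
console.log(Math.round(windowSec / 60) + ' minutes');
```

Under these assumptions the window is only about 21 minutes, shorter than the index builds and the node addition measured on EC2 earlier in this deck.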


                                                           105
Replication (Join as a new node)




                                   106
InitialSync

  Client
  Client

mongod




            Insert D
   D       Update C
   C       Update A
   B        Insert C
   A

Database    Oplog
       Primary                       107
InitialSync

  Client
  Client

mongod                  mongod




            Insert D
   D       Update C
   C       Update A
   B        Insert C
   A

Database    Oplog      Database   Oplog
       Primary             Startup        108
InitialSync

  Client
  Client

                            Get last Oplog




            Insert D
   D       Update C
   C       Update A
   B        Insert C                 Insert D
   A

Database    Oplog      Database       Oplog
       Primary           Recovering             109
InitialSync

                          D
  Client
  Client                  C
                          B
                          A   Cloning DB




            Insert D
   D       Update C
   C       Update A
   B        Insert C               Insert D
   A

Database    Oplog      Database     Oplog
       Primary           Recovering           110
InitialSync

                          D
  Client
  Client                  C
                          B
                          A   Cloning DB




            Insert D
   D       Update C
   C       Update A
   B        Insert C               Insert D
   A                      A

Database    Oplog      Database     Oplog
       Primary           Recovering           111
InitialSync
 Step: Cloning DB (A, B, C, D to be copied)
 Client: Insert E
 Primary                 Database: A, B, C, D, E
                         Oplog (newest first): Insert E, Insert D, Update C, Update A, Insert C
 New node (Recovering)   Database: A, B
                         Oplog: Insert D
                                                     112
InitialSync
 Step: Cloning DB complete
 Client: Update B
 Primary                 Database: A, B, C, D, E
                         Oplog (newest first): Update B, Insert E, Insert D, Update C, Update A
 New node (Recovering)   Database: A, B, C, D
                         Oplog: Insert D
                                                        113
InitialSync
 Step: Check Oplog
 Primary                 Database: A, B, C, D, E
                         Oplog (newest first): Update B, Insert E, Insert D, Update C
 New node (Recovering)   Database: A, B, C, D
                         Oplog: Insert D
                                                        114
InitialSync
 Step: Sync
 Primary                 Database: A, B, C, D, E
                         Oplog (newest first): Update B, Insert E, Insert D, Update C
 New node (Secondary)    Database: A, B, C, D, E
                         Oplog (newest first): Update B, Insert E, Insert D
                                                       115
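The InitialSync steps above can be sketched as a small simulation. This is a simplified model, not MongoDB's implementation: the `Primary` class, the capped `deque` standing in for the oplog, and the function name `initial_sync` are all assumptions made for illustration. It follows the same order as the slides: remember the last oplog entry, clone the data, check that the remembered entry is still in the Primary's oplog, then replay from it.

```python
from collections import deque


class Primary:
    """Toy primary: a key/value database plus a capped oplog."""

    def __init__(self, oplog_capacity):
        self.db = {}
        # deque(maxlen=...) silently drops the oldest entry when full,
        # which mimics a fixed-size oplog.
        self.oplog = deque(maxlen=oplog_capacity)
        self.next_ts = 1

    def write(self, key, value):
        self.db[key] = value
        self.oplog.append((self.next_ts, key, value))
        self.next_ts += 1


def initial_sync(primary, writes_during_clone=()):
    """Return the synced database, or None if the new node went stale."""
    # 1. Get last Oplog: remember the newest entry before cloning.
    start_ts = primary.oplog[-1][0]
    # 2. Cloning DB: copy the data (simplified to a single snapshot).
    snapshot = dict(primary.db)
    # Clients keep writing while the clone is running.
    for key, value in writes_during_clone:
        primary.write(key, value)
    # 3. Check Oplog: the remembered entry must still be available.
    if primary.oplog[0][0] > start_ts:
        return None  # entries were overwritten: stale
    # 4. Sync: replay everything from the remembered entry onward.
    for ts, key, value in primary.oplog:
        if ts >= start_ts:
            snapshot[key] = value
    return snapshot


if __name__ == "__main__":
    p = Primary(oplog_capacity=5)
    for k in "ABCD":
        p.write(k, 1)
    print(initial_sync(p, [("E", 1), ("B", 2)]) == p.db)
```

With a too-small `oplog_capacity`, the writes made during the clone push the remembered entry out and the function returns `None`, which corresponds to the stale case.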
Additional information
From the source code. ( I've never tested these myself... )

A Secondary will try to sync from other Secondaries
 when it cannot reach the Primary or
 might be stale against the Primary.

 There is a small chance that the sync problem does not occur if a
  Secondary still has older Oplog entries or a larger Oplog than the Primary.




                                                            116
Sync from another secondary
 Clients -> Primary
 Primary      Database: A, B, C, D
              Oplog (newest first): Insert D, Update C, Update A, Insert C, Insert B, Insert A
 Secondary    Database: A
              Oplog: Insert A
 Secondary    Database: A, B, C, D
              Oplog (newest first): Insert D, Update C, Update A, Insert C, Insert B, Insert A
                                                                     117
Sync from another secondary
 Step: Check Oplog
       [Insert A] not found !!
 Primary      Database: A, B, C, D
              Oplog (newest first): Insert D, Update C, Update A, Insert C, Insert B, Insert A
 Secondary    Database: A
              Oplog: Insert A
 Secondary    Database: A, B, C, D
              Oplog (newest first): Insert D, Update C, Update A, Insert C, Insert B, Insert A
                                                                         118
Sync from another secondary
 Step: Check Oplog
       But found at the other secondary
       So it's able to sync
 Primary      Database: A, B, C, D
              Oplog (newest first): Insert D, Update C, Update A, Insert C, Insert B, Insert A
 Secondary    Database: A
              Oplog: Insert A
 Secondary    Database: A, B, C, D
              Oplog (newest first): Insert D, Update C, Update A, Insert C, Insert B, Insert A
                                                                           119
Sync from the other secondary
 Step: Sync
       But found at the other secondary
       So it's able to sync
 Primary      Database: A, B, C, D
              Oplog (newest first): Insert D, Update C, Update A, Insert C, Insert B, Insert A
 Secondary    Database: A, B, C, D
              Oplog (newest first): Insert D, Update C, Update A, Insert C, Insert B, Insert A
 Secondary    Database: A, B, C, D
              Oplog (newest first): Insert D, Update C, Update A, Insert C, Insert B, Insert A
                                                                          120
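The idea of falling back to another Secondary can be sketched as follows. This is a simplified model of the behaviour described on these slides, not the logic in MongoDB's source: the function name `pick_sync_source`, the member names, and the use of plain integers as oplog timestamps are all hypothetical.

```python
def pick_sync_source(last_applied_ts, candidates):
    """Return the first member whose oplog still contains the entry this
    node applied last, or None if every candidate has already dropped it.

    candidates: list of (member_name, oplog_timestamps) in preference order.
    """
    for name, oplog_timestamps in candidates:
        if last_applied_ts in oplog_timestamps:
            return name
    return None  # stale against every reachable member


if __name__ == "__main__":
    # Assumed state: the lagging node has applied only entry 1 ("Insert A").
    members = [
        ("primary", [2, 3, 4, 5, 6]),         # entry 1 already overwritten
        ("secondary-2", [1, 2, 3, 4, 5, 6]),  # larger oplog, entry 1 still there
    ]
    print(pick_sync_source(1, members))
```

The Primary is listed first because it is the preferred source; the other Secondary is used only when the Primary no longer holds the needed entry.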
That’s all about sync




                        121
Others...




            122
Disk space




             123
Disk space
Data becomes sparsely fragmented across the DB files...
 We ran into this unfavorable situation in our DBs.

 It appeared in some of our collections
  around 3 months after we launched the services.

 db.ourcol.storageSize()      = 16200727264 (15GB)
 db.ourcol.totalSize()        = 16200809184
 db.ourcol.totalIndexSize()   =       81920
 db.ourcol.dataSize()         =     2032300 (2MB)


      What happened to them !!                          124
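The numbers on this slide can be checked with a few lines of arithmetic. The sketch below only restates the four values reported above; the function name `bloat_report` and the dictionary keys are hypothetical, chosen for illustration.

```python
def bloat_report(storage_size, index_size, data_size):
    """Summarise how much of the allocated storage actually holds data.
    All sizes are in bytes."""
    return {
        # storageSize + totalIndexSize should match totalSize
        "total_size": storage_size + index_size,
        # how many times larger the allocation is than the live data
        "storage_to_data_ratio": storage_size / data_size,
        # share of the allocated storage that holds no live data
        "unused_fraction": 1 - data_size / storage_size,
    }


if __name__ == "__main__":
    # Values taken from the slide.
    report = bloat_report(storage_size=16200727264,
                          index_size=81920,
                          data_size=2032300)
    for key, value in report.items():
        print(key, value)
```

The allocated storage is several thousand times larger than the live data, so almost all of the 15GB is empty space left behind by fragmentation.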
Disk space
Data becomes sparsely fragmented across the DB files...
 It seems to be caused by a specific workload
  that inserts, updates and deletes over and over.

 Anyway, we have to shrink the used disk space regularly,
  just like PostgreSQL's VACUUM.


             But how to do it ?



                                                           125
Disk space
Shrinking the used disk space
  MongoDB offers some functions for this case.
   But we couldn't use them in our case !

  repairDatabase:
   Only runnable on the Primary.
   It takes a long time and BLOCKS all operations !!

  compact:
   Only runnable on the Secondary.
   Zero-fills the blank space instead of shrinking the disk space.
   So it cannot shrink...
                                                              126
Disk space
Our measures
For temporary collections:
 Issue the drop command regularly.
For other collections:
       1. Remove one Secondary from the ReplSet.
       2. Shut it down.
       3. Remove all of its DB files.
       4. Rejoin it to the ReplSet.
       5. Repeat these steps for each Secondary, one after another.
       6. Step down the Primary. (Change the Primary node)
       7. Finally, do steps 1 – 4 on the former Primary.
                                                    127
Disk space
Shrink operation




        Primary    Secondary    Secondary

        Bloated      Bloated     Bloated
                                            128
Disk space
Shrink operation




                         shutdown mongod
                         (kill -15)




        Primary       Dead           Secondary

        Bloated      Bloated           Bloated
                                                 129
Disk space
Shrink operation




                          delete DBPATH




        Primary       Dead           Secondary

        Bloated      Nothing              Bloated
                                                    130
Disk space
Shrink operation




                           start mongod




        Primary      Startup          Secondary

        Bloated      Nothing              Bloated
                                                    131
Disk space
Shrink operation




        Primary    Secondary    Secondary

        Bloated      Shrunk      Bloated
                                            132
Disk space
Shrink operation




                                shutdown mongod
                                delete DBPATH
                                startup mongod




        Primary    Secondary        Secondary

        Bloated      Shrunk            Shrunk
                                                  133
Disk space
Shrink operation




                   step down




     Secondary                 Primary    Secondary

        Bloated                 Shrunk     Shrunk
                                                      134
Disk space
Shrink operation




         shutdown mongod
         delete DBPATH
         startup mongod




     Secondary               Primary    Secondary

        Shrunk               Shrunk      Shrunk
                                                    135
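The shrink operation walked through above can be written down as an ordered plan. The sketch below only generates the list of steps; it does not talk to MongoDB, and the action names ("shutdown", "delete_dbpath", "start", "step_down") and the function name are hypothetical labels for the manual operations on the slides.

```python
def resync_steps(node):
    """Steps that empty one node so it rebuilds its files via initial sync."""
    return [
        ("shutdown", node),       # stop mongod (the slides use kill -15)
        ("delete_dbpath", node),  # remove all DB files
        ("start", node),          # restart; the node re-syncs from scratch
    ]


def rolling_resync_plan(primary, secondaries):
    """Shrink every Secondary first, then step down the Primary and
    shrink it last, so only one node is out of the set at a time."""
    plan = []
    for node in secondaries:
        plan.extend(resync_steps(node))
    plan.append(("step_down", primary))
    plan.extend(resync_steps(primary))
    return plan


if __name__ == "__main__":
    for action, node in rolling_resync_plan("node1", ["node2", "node3"]):
        print(action, node)
```

In practice each node should be back in the Secondary state before the next one is shut down; the plan above does not model that waiting step.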
PHP client




             136
PHP client
We tried 1.1.4 and 1.2.2
1.1.4:
 There are some critical bugs around the connection pool.
 We struggled to invalidate broken connections.
 I think you should use 1.2.X instead of 1.1.X.
1.2.2:
 The connection pool problems seem to be fixed.
 But there are 2 critical bugs !
        – Socket handle leak
        – Useless sleep
 However, this version is relatively stable
    as long as these bugs are fixed.               137
PHP client
Patches


https://github.com/crumbjp/Personal

 - mongo1.2.2.non-wait.patch
 - mongo1.2.2.sock-leak.patch




                                      138
PHP client




             139
Closing




          140
Closing
 What's MongoDB ?
It has very good READ performance.
    We can use mongo instead of memcached,
    if we can accept the limited write performance.
Die hard !
    MongoDB stays highly available even under severe stress.
Easy to use without deep consideration
    We can manage to do anything once we get started.
    Let's forget the awkward trivial things that have bothered us.
         How to treat huge data ?
         How to put in a cache system ?
         How to keep availability ?
         And so on ....                                        141
Closing
Keep in mind
Sharding is challenging...
   It's the last resort !
   It's hard to operate. In particular, maintaining the config servers.
   [Mongos] is also difficult to keep alive.
   I want a way to fail over Mongos.
Mongo is able to run in a poor environment but...
   The ONLY thing you should set aside is a large disk space.
Huge writes are sensitive
   Adjust the oplog size carefully
The indexing function is unfinished
   Cannot build an index online
                                                                  142
All right, Have fun !!
         ...with us at Rakuten
Please join Rakuten for cool work!
                                 145

Más contenido relacionado

La actualidad más candente

Scalable XQuery Processing with Zorba on top of MongoDB
Scalable XQuery Processing with Zorba on top of MongoDBScalable XQuery Processing with Zorba on top of MongoDB
Scalable XQuery Processing with Zorba on top of MongoDBWilliam Candillon
 
微博cache设计谈
微博cache设计谈微博cache设计谈
微博cache设计谈Tim Y
 
PG Day'14 Russia, Работа со слабо-структурированными данными в PostgreSQL, Ол...
PG Day'14 Russia, Работа со слабо-структурированными данными в PostgreSQL, Ол...PG Day'14 Russia, Работа со слабо-структурированными данными в PostgreSQL, Ол...
PG Day'14 Russia, Работа со слабо-структурированными данными в PostgreSQL, Ол...pgdayrussia
 
[db tech showcase Tokyo 2017] C23: Lessons from SQLite4 by SQLite.org - Richa...
[db tech showcase Tokyo 2017] C23: Lessons from SQLite4 by SQLite.org - Richa...[db tech showcase Tokyo 2017] C23: Lessons from SQLite4 by SQLite.org - Richa...
[db tech showcase Tokyo 2017] C23: Lessons from SQLite4 by SQLite.org - Richa...Insight Technology, Inc.
 
Using MongoDB and Python
Using MongoDB and PythonUsing MongoDB and Python
Using MongoDB and PythonMike Bright
 
10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQL10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQLSatoshi Nagayasu
 
Scaling up and accelerating Drupal 8 with NoSQL
Scaling up and accelerating Drupal 8 with NoSQLScaling up and accelerating Drupal 8 with NoSQL
Scaling up and accelerating Drupal 8 with NoSQLOSInet
 
视觉中国的MongoDB应用实践(QConBeijing2011)
视觉中国的MongoDB应用实践(QConBeijing2011)视觉中国的MongoDB应用实践(QConBeijing2011)
视觉中国的MongoDB应用实践(QConBeijing2011)Night Sailer
 
深入了解Redis
深入了解Redis深入了解Redis
深入了解Redisiammutex
 
[db tech showcase Tokyo 2017] A11: SQLite - The most used yet least appreciat...
[db tech showcase Tokyo 2017] A11: SQLite - The most used yet least appreciat...[db tech showcase Tokyo 2017] A11: SQLite - The most used yet least appreciat...
[db tech showcase Tokyo 2017] A11: SQLite - The most used yet least appreciat...Insight Technology, Inc.
 
State of the art: Server-side JavaScript - MoscowJS
State of the art: Server-side JavaScript - MoscowJSState of the art: Server-side JavaScript - MoscowJS
State of the art: Server-side JavaScript - MoscowJSAlexandre Morgaut
 
State of the art server side java script
State of the art server side java scriptState of the art server side java script
State of the art server side java scriptThibaud Arguillere
 
MongoDB开发应用实践
MongoDB开发应用实践MongoDB开发应用实践
MongoDB开发应用实践iammutex
 
Mongo db pefrormance optimization strategies
Mongo db pefrormance optimization strategiesMongo db pefrormance optimization strategies
Mongo db pefrormance optimization strategiesronwarshawsky
 
MongoFr : MongoDB as a log Collector
MongoFr : MongoDB as a log CollectorMongoFr : MongoDB as a log Collector
MongoFr : MongoDB as a log CollectorPierre Baillet
 
NoSQL and JavaScript: a Love Story
NoSQL and JavaScript: a Love StoryNoSQL and JavaScript: a Love Story
NoSQL and JavaScript: a Love StoryAlexandre Morgaut
 
Amazon Aurora로 안전하게 migration 하기
Amazon Aurora로 안전하게 migration 하기Amazon Aurora로 안전하게 migration 하기
Amazon Aurora로 안전하게 migration 하기Jesang Yoon
 
Crawlware
CrawlwareCrawlware
Crawlwarekidrane
 
MongoDB Internals
MongoDB InternalsMongoDB Internals
MongoDB InternalsSiraj Memon
 
MongoDB Best Practices in AWS
MongoDB Best Practices in AWS MongoDB Best Practices in AWS
MongoDB Best Practices in AWS Chris Harris
 

La actualidad más candente (20)

Scalable XQuery Processing with Zorba on top of MongoDB
Scalable XQuery Processing with Zorba on top of MongoDBScalable XQuery Processing with Zorba on top of MongoDB
Scalable XQuery Processing with Zorba on top of MongoDB
 
微博cache设计谈
微博cache设计谈微博cache设计谈
微博cache设计谈
 
PG Day'14 Russia, Работа со слабо-структурированными данными в PostgreSQL, Ол...
PG Day'14 Russia, Работа со слабо-структурированными данными в PostgreSQL, Ол...PG Day'14 Russia, Работа со слабо-структурированными данными в PostgreSQL, Ол...
PG Day'14 Russia, Работа со слабо-структурированными данными в PostgreSQL, Ол...
 
[db tech showcase Tokyo 2017] C23: Lessons from SQLite4 by SQLite.org - Richa...
[db tech showcase Tokyo 2017] C23: Lessons from SQLite4 by SQLite.org - Richa...[db tech showcase Tokyo 2017] C23: Lessons from SQLite4 by SQLite.org - Richa...
[db tech showcase Tokyo 2017] C23: Lessons from SQLite4 by SQLite.org - Richa...
 
Using MongoDB and Python
Using MongoDB and PythonUsing MongoDB and Python
Using MongoDB and Python
 
10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQL10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQL
 
Scaling up and accelerating Drupal 8 with NoSQL
Scaling up and accelerating Drupal 8 with NoSQLScaling up and accelerating Drupal 8 with NoSQL
Scaling up and accelerating Drupal 8 with NoSQL
 
视觉中国的MongoDB应用实践(QConBeijing2011)
视觉中国的MongoDB应用实践(QConBeijing2011)视觉中国的MongoDB应用实践(QConBeijing2011)
视觉中国的MongoDB应用实践(QConBeijing2011)
 
深入了解Redis
深入了解Redis深入了解Redis
深入了解Redis
 
[db tech showcase Tokyo 2017] A11: SQLite - The most used yet least appreciat...
[db tech showcase Tokyo 2017] A11: SQLite - The most used yet least appreciat...[db tech showcase Tokyo 2017] A11: SQLite - The most used yet least appreciat...
[db tech showcase Tokyo 2017] A11: SQLite - The most used yet least appreciat...
 
State of the art: Server-side JavaScript - MoscowJS
State of the art: Server-side JavaScript - MoscowJSState of the art: Server-side JavaScript - MoscowJS
State of the art: Server-side JavaScript - MoscowJS
 
State of the art server side java script
State of the art server side java scriptState of the art server side java script
State of the art server side java script
 
MongoDB开发应用实践
MongoDB开发应用实践MongoDB开发应用实践
MongoDB开发应用实践
 
Mongo db pefrormance optimization strategies
Mongo db pefrormance optimization strategiesMongo db pefrormance optimization strategies
Mongo db pefrormance optimization strategies
 
MongoFr : MongoDB as a log Collector
MongoFr : MongoDB as a log CollectorMongoFr : MongoDB as a log Collector
MongoFr : MongoDB as a log Collector
 
NoSQL and JavaScript: a Love Story
NoSQL and JavaScript: a Love StoryNoSQL and JavaScript: a Love Story
NoSQL and JavaScript: a Love Story
 
Amazon Aurora로 안전하게 migration 하기
Amazon Aurora로 안전하게 migration 하기Amazon Aurora로 안전하게 migration 하기
Amazon Aurora로 안전하게 migration 하기
 
Crawlware
CrawlwareCrawlware
Crawlware
 
MongoDB Internals
MongoDB InternalsMongoDB Internals
MongoDB Internals
 
MongoDB Best Practices in AWS
MongoDB Best Practices in AWS MongoDB Best Practices in AWS
MongoDB Best Practices in AWS
 

Destacado

Hist 141 california and the civil war
Hist 141   california and the civil warHist 141   california and the civil war
Hist 141 california and the civil warflip7rider
 
Hist 141 panama & los angeles
Hist 141   panama & los angelesHist 141   panama & los angeles
Hist 141 panama & los angelesflip7rider
 
Jaarverslag 2011/2012
Jaarverslag 2011/2012Jaarverslag 2011/2012
Jaarverslag 2011/2012dewittenberg
 
Hist 141 panama & los angeles
Hist 141   panama & los angelesHist 141   panama & los angeles
Hist 141 panama & los angelesflip7rider
 
презентация элективного курса по биологии
презентация элективного курса по биологиипрезентация элективного курса по биологии
презентация элективного курса по биологииloksal
 
Formatofduediligence 020608
Formatofduediligence 020608Formatofduediligence 020608
Formatofduediligence 020608Anji Uppari
 
6 11-13 collecting russia ucb
6 11-13 collecting russia ucb6 11-13 collecting russia ucb
6 11-13 collecting russia ucblpendse
 
サーバーの初歩的な話セミナー@大阪20120901
サーバーの初歩的な話セミナー@大阪20120901サーバーの初歩的な話セミナー@大阪20120901
サーバーの初歩的な話セミナー@大阪20120901Masayuki Abe
 
B orenic1
B orenic1B orenic1
B orenic1bso901
 
Onco Care Pharmaceuticals
Onco Care PharmaceuticalsOnco Care Pharmaceuticals
Onco Care PharmaceuticalsHamza Khan
 
Copyright crash course part 5
Copyright crash course part 5Copyright crash course part 5
Copyright crash course part 5gsalas10
 
Mongo ghostsync and slaveDelay (Japanease)
Mongo ghostsync and slaveDelay (Japanease)Mongo ghostsync and slaveDelay (Japanease)
Mongo ghostsync and slaveDelay (Japanease)Hiroaki Kubota
 
Html5fun@東京 Bootstrapにアニメーションを付けよう
Html5fun@東京 Bootstrapにアニメーションを付けようHtml5fun@東京 Bootstrapにアニメーションを付けよう
Html5fun@東京 Bootstrapにアニメーションを付けようMasayuki Abe
 
My Favorite Movie
My Favorite MovieMy Favorite Movie
My Favorite Moviececil52
 
Team 4 Chp 4 & 5
Team 4 Chp 4 & 5Team 4 Chp 4 & 5
Team 4 Chp 4 & 5tkern317
 
関デジセミナー20130710
関デジセミナー20130710関デジセミナー20130710
関デジセミナー20130710Masayuki Abe
 
Cordovaコトハジメ( Html5fun×senchUG )
Cordovaコトハジメ( Html5fun×senchUG )Cordovaコトハジメ( Html5fun×senchUG )
Cordovaコトハジメ( Html5fun×senchUG )Masayuki Abe
 

Destacado (20)

Hist 141 california and the civil war
Hist 141   california and the civil warHist 141   california and the civil war
Hist 141 california and the civil war
 
Hist 141 panama & los angeles
Hist 141   panama & los angelesHist 141   panama & los angeles
Hist 141 panama & los angeles
 
Jaarverslag 2011/2012
Jaarverslag 2011/2012Jaarverslag 2011/2012
Jaarverslag 2011/2012
 
Hist 141 panama & los angeles
Hist 141   panama & los angelesHist 141   panama & los angeles
Hist 141 panama & los angeles
 
Kitakyushu Smart City Master Plan
Kitakyushu Smart City Master PlanKitakyushu Smart City Master Plan
Kitakyushu Smart City Master Plan
 
презентация элективного курса по биологии
презентация элективного курса по биологиипрезентация элективного курса по биологии
презентация элективного курса по биологии
 
Formatofduediligence 020608
Formatofduediligence 020608Formatofduediligence 020608
Formatofduediligence 020608
 
6 11-13 collecting russia ucb
6 11-13 collecting russia ucb6 11-13 collecting russia ucb
6 11-13 collecting russia ucb
 
サーバーの初歩的な話セミナー@大阪20120901
サーバーの初歩的な話セミナー@大阪20120901サーバーの初歩的な話セミナー@大阪20120901
サーバーの初歩的な話セミナー@大阪20120901
 
B orenic1
B orenic1B orenic1
B orenic1
 
Onco Care Pharmaceuticals
Onco Care PharmaceuticalsOnco Care Pharmaceuticals
Onco Care Pharmaceuticals
 
Copyright crash course part 5
Copyright crash course part 5Copyright crash course part 5
Copyright crash course part 5
 
Mongo ghostsync and slaveDelay (Japanease)
Mongo ghostsync and slaveDelay (Japanease)Mongo ghostsync and slaveDelay (Japanease)
Mongo ghostsync and slaveDelay (Japanease)
 
Html5fun@東京 Bootstrapにアニメーションを付けよう
Html5fun@東京 Bootstrapにアニメーションを付けようHtml5fun@東京 Bootstrapにアニメーションを付けよう
Html5fun@東京 Bootstrapにアニメーションを付けよう
 
Merchant kit
Merchant kitMerchant kit
Merchant kit
 
My Favorite Movie
My Favorite MovieMy Favorite Movie
My Favorite Movie
 
Trivia game
Trivia gameTrivia game
Trivia game
 
Team 4 Chp 4 & 5
Team 4 Chp 4 & 5Team 4 Chp 4 & 5
Team 4 Chp 4 & 5
 
関デジセミナー20130710
関デジセミナー20130710関デジセミナー20130710
関デジセミナー20130710
 
Cordovaコトハジメ( Html5fun×senchUG )
Cordovaコトハジメ( Html5fun×senchUG )Cordovaコトハジメ( Html5fun×senchUG )
Cordovaコトハジメ( Html5fun×senchUG )
 

Similar a KVSの性能を活かすAll-in-one NoSQLデータベースMongoDBの性能と問題点

KVSの性能、RDBMSのインデックス、更にMapReduceを併せ持つAll-in-One NoSQL: MongoDB
KVSの性能、RDBMSのインデックス、更にMapReduceを併せ持つAll-in-One NoSQL: MongoDB KVSの性能、RDBMSのインデックス、更にMapReduceを併せ持つAll-in-One NoSQL: MongoDB
KVSの性能、RDBMSのインデックス、更にMapReduceを併せ持つAll-in-One NoSQL: MongoDB Rakuten Group, Inc.
 
BedCon 2013 - Java Persistenz-Frameworks für MongoDB
BedCon 2013 - Java Persistenz-Frameworks für MongoDBBedCon 2013 - Java Persistenz-Frameworks für MongoDB
BedCon 2013 - Java Persistenz-Frameworks für MongoDBTobias Trelle
 
using Spring and MongoDB on Cloud Foundry
using Spring and MongoDB on Cloud Foundryusing Spring and MongoDB on Cloud Foundry
using Spring and MongoDB on Cloud FoundryJoshua Long
 
Why we chose mongodb for guardian.co.uk
Why we chose mongodb for guardian.co.ukWhy we chose mongodb for guardian.co.uk
Why we chose mongodb for guardian.co.ukGraham Tackley
 
Java Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDBJava Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDBMongoDB
 
Moving from C#/.NET to Hadoop/MongoDB
Moving from C#/.NET to Hadoop/MongoDBMoving from C#/.NET to Hadoop/MongoDB
Moving from C#/.NET to Hadoop/MongoDBMongoDB
 
WebBee rapid web app development teck stack
WebBee rapid web app development teck stackWebBee rapid web app development teck stack
WebBee rapid web app development teck stackALDAN3
 
The DevOps PaaS Infusion - May meetup
The DevOps PaaS Infusion - May meetupThe DevOps PaaS Infusion - May meetup
The DevOps PaaS Infusion - May meetupNorm Leitman
 
Hybrid MongoDB and RDBMS Applications
Hybrid MongoDB and RDBMS ApplicationsHybrid MongoDB and RDBMS Applications
Hybrid MongoDB and RDBMS ApplicationsSteven Francia
 
Spring MVC introduction HVA
Spring MVC introduction HVASpring MVC introduction HVA
Spring MVC introduction HVAPeter Maas
 
Q con london2011-matthewwall-whyichosemongodbforguardiancouk
Q con london2011-matthewwall-whyichosemongodbforguardiancoukQ con london2011-matthewwall-whyichosemongodbforguardiancouk
Q con london2011-matthewwall-whyichosemongodbforguardiancoukRoger Xia
 
Java Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDBJava Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDBTobias Trelle
 
The Radeox Wiki Render Engine
The Radeox Wiki Render EngineThe Radeox Wiki Render Engine
The Radeox Wiki Render EngineMatthias Jugel
 
MongoDB for Java Devs with Spring Data - MongoPhilly 2011
MongoDB for Java Devs with Spring Data - MongoPhilly 2011MongoDB for Java Devs with Spring Data - MongoPhilly 2011
MongoDB for Java Devs with Spring Data - MongoPhilly 2011MongoDB
 
Services inception in Ruby
Services inception in RubyServices inception in Ruby
Services inception in RubyDave McCrory
 
A flexible plugin like data layer - decouple your -_application logic from yo...
A flexible plugin like data layer - decouple your -_application logic from yo...A flexible plugin like data layer - decouple your -_application logic from yo...
A flexible plugin like data layer - decouple your -_application logic from yo...MongoDB
 
How Thermo Fisher is Reducing Data Analysis Times from Days to Minutes with M...
How Thermo Fisher is Reducing Data Analysis Times from Days to Minutes with M...How Thermo Fisher is Reducing Data Analysis Times from Days to Minutes with M...
How Thermo Fisher is Reducing Data Analysis Times from Days to Minutes with M...MongoDB
 
No SQL at The Guardian
No SQL at The GuardianNo SQL at The Guardian
No SQL at The GuardianMat Wall
 
VIE - Using RDFa to make content editable
VIE - Using RDFa to make content editableVIE - Using RDFa to make content editable
VIE - Using RDFa to make content editableHenri Bergius
 

Similar a KVSの性能を活かすAll-in-one NoSQLデータベースMongoDBの性能と問題点 (20)

KVSの性能、RDBMSのインデックス、更にMapReduceを併せ持つAll-in-One NoSQL: MongoDB
KVSの性能、RDBMSのインデックス、更にMapReduceを併せ持つAll-in-One NoSQL: MongoDB KVSの性能、RDBMSのインデックス、更にMapReduceを併せ持つAll-in-One NoSQL: MongoDB
KVSの性能、RDBMSのインデックス、更にMapReduceを併せ持つAll-in-One NoSQL: MongoDB
 
BedCon 2013 - Java Persistenz-Frameworks für MongoDB
BedCon 2013 - Java Persistenz-Frameworks für MongoDBBedCon 2013 - Java Persistenz-Frameworks für MongoDB
BedCon 2013 - Java Persistenz-Frameworks für MongoDB
 
using Spring and MongoDB on Cloud Foundry
using Spring and MongoDB on Cloud Foundryusing Spring and MongoDB on Cloud Foundry
using Spring and MongoDB on Cloud Foundry
 
Why we chose mongodb for guardian.co.uk
Why we chose mongodb for guardian.co.ukWhy we chose mongodb for guardian.co.uk
Why we chose mongodb for guardian.co.uk
 
Java Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDBJava Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDB
 
Moving from C#/.NET to Hadoop/MongoDB
Moving from C#/.NET to Hadoop/MongoDBMoving from C#/.NET to Hadoop/MongoDB
Moving from C#/.NET to Hadoop/MongoDB
 
WebBee rapid web app development teck stack
WebBee rapid web app development teck stackWebBee rapid web app development teck stack
WebBee rapid web app development teck stack
 
The DevOps PaaS Infusion - May meetup
The DevOps PaaS Infusion - May meetupThe DevOps PaaS Infusion - May meetup
The DevOps PaaS Infusion - May meetup
 
Hybrid MongoDB and RDBMS Applications
Hybrid MongoDB and RDBMS ApplicationsHybrid MongoDB and RDBMS Applications
Hybrid MongoDB and RDBMS Applications
 
Spring MVC introduction HVA
Spring MVC introduction HVASpring MVC introduction HVA
Spring MVC introduction HVA
 
Q con london2011-matthewwall-whyichosemongodbforguardiancouk
Q con london2011-matthewwall-whyichosemongodbforguardiancoukQ con london2011-matthewwall-whyichosemongodbforguardiancouk
Q con london2011-matthewwall-whyichosemongodbforguardiancouk
 
NoSQL Technology
NoSQL TechnologyNoSQL Technology
NoSQL Technology
 
Java Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDBJava Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDB
 
The Radeox Wiki Render Engine
The Radeox Wiki Render EngineThe Radeox Wiki Render Engine
The Radeox Wiki Render Engine
 
MongoDB for Java Devs with Spring Data - MongoPhilly 2011
MongoDB for Java Devs with Spring Data - MongoPhilly 2011MongoDB for Java Devs with Spring Data - MongoPhilly 2011
MongoDB for Java Devs with Spring Data - MongoPhilly 2011
 
Services inception in Ruby
Services inception in RubyServices inception in Ruby
Services inception in Ruby
 
A flexible plugin like data layer - decouple your -_application logic from yo...
A flexible plugin like data layer - decouple your -_application logic from yo...A flexible plugin like data layer - decouple your -_application logic from yo...
A flexible plugin like data layer - decouple your -_application logic from yo...
 
How Thermo Fisher is Reducing Data Analysis Times from Days to Minutes with M...
How Thermo Fisher is Reducing Data Analysis Times from Days to Minutes with M...How Thermo Fisher is Reducing Data Analysis Times from Days to Minutes with M...
How Thermo Fisher is Reducing Data Analysis Times from Days to Minutes with M...
 
No SQL at The Guardian
No SQL at The GuardianNo SQL at The Guardian
No SQL at The Guardian
 
VIE - Using RDFa to make content editable
VIE - Using RDFa to make content editableVIE - Using RDFa to make content editable
VIE - Using RDFa to make content editable
 

Más de Hiroaki Kubota

Db tech showcase2015 how to replicate between clusters
Db tech showcase2015 how to replicate between clustersDb tech showcase2015 how to replicate between clusters
Db tech showcase2015 how to replicate between clustersHiroaki Kubota
 
DB tech showcase: 噂のMongoDBその用途は?
DB tech showcase: 噂のMongoDBその用途は?DB tech showcase: 噂のMongoDBその用途は?
DB tech showcase: 噂のMongoDBその用途は?Hiroaki Kubota
 
MongoDBで自然言語処理
MongoDBで自然言語処理MongoDBで自然言語処理
MongoDBで自然言語処理Hiroaki Kubota
 
MongoDBJP 納涼もんご祭り
MongoDBJP 納涼もんご祭りMongoDBJP 納涼もんご祭り
MongoDBJP 納涼もんご祭りHiroaki Kubota
 
Why mincore() returns different value of stat ?
Why mincore() returns different value of stat ?Why mincore() returns different value of stat ?
Why mincore() returns different value of stat ?Hiroaki Kubota
 
Mongo ghostsync and slaveDelay
Mongo ghostsync and slaveDelayMongo ghostsync and slaveDelay
Mongo ghostsync and slaveDelayHiroaki Kubota
 
C10K on Mongo's sharding
C10K on Mongo's shardingC10K on Mongo's sharding
C10K on Mongo's shardingHiroaki Kubota
 

Más de Hiroaki Kubota (9)

Db tech showcase2015 how to replicate between clusters
Db tech showcase2015 how to replicate between clustersDb tech showcase2015 how to replicate between clusters
Db tech showcase2015 how to replicate between clusters
 
DB tech showcase: 噂のMongoDBその用途は?
DB tech showcase: 噂のMongoDBその用途は?DB tech showcase: 噂のMongoDBその用途は?
DB tech showcase: 噂のMongoDBその用途は?
 
MongoDBで自然言語処理
MongoDBで自然言語処理MongoDBで自然言語処理
MongoDBで自然言語処理
 
MongoDBJP 納涼もんご祭り
MongoDBJP 納涼もんご祭りMongoDBJP 納涼もんご祭り
MongoDBJP 納涼もんご祭り
 
Why mincore() returns different value of stat ?
Why mincore() returns different value of stat ?Why mincore() returns different value of stat ?
Why mincore() returns different value of stat ?
 
Mongo ghostsync and slaveDelay
Mongo ghostsync and slaveDelayMongo ghostsync and slaveDelay
Mongo ghostsync and slaveDelay
 
C10K on Mongo's sharding
C10K on Mongo's shardingC10K on Mongo's sharding
C10K on Mongo's sharding
 
Cockatoo
CockatooCockatoo
Cockatoo
 
Albatross
AlbatrossAlbatross
Albatross
 


MongoDB, an all-in-one NoSQL database with KVS-class performance: its performance and its problems

  • 1. KVS performance, RDBMS-style indexes, and MapReduce as well: an all-in-one NoSQL. Rakuten, Inc. DU Architect Group, Hiroaki Kubota | 2011/1/18
  • 3. Who am I?
  • 4. Introduction: Profile
      – Name: 窪田 博昭 (Hiroaki Kubota)
      – Company: Rakuten Inc.
      – Unit: ACT = Development Unit Architect Group
      – Mail: hiroaki.kubota@mail.rakuten.com
      – Hobby: futsal, golf
      – Recent: my physical strength has gradually declined...
      – twitter: crumbjp
      – github: crumbjp
  • 5. Introduction: Agenda
      – Introduction
      – Mongo's characteristics
      – How to take advantage of Mongo for our service (our new system "Cockatoo"; MapReduce)
      – Structure & Performance
      – Performance example (on EC2 large)
      – Major problems... (Indexing; STALE; Disk space; PHP client)
      – Closing
  • 7. Mongo's characteristics
      – READ performance is extremely good!
      – WRITE performance is so-so, and it does not scale.
      – Reading data immediately after it is written works badly.
      – Very high availability!
      – Still under development: maintenance tools are poor, and some operations are useless.
  • 8. How to take advantage of Mongo for Infoseek News
  • 9. Our new system "Cockatoo" (formerly called "Albatross")
  • 10. An example of our pages
  • 12. Layout / Components
  • 13. Generic web structure (diagram): Internet -> request -> WEB -> call APIs -> API -> retrieve data -> DB
  • 14. Cockatoo structure (diagram): Internet -> request -> WEB -> call APIs -> API -> retrieve data -> ContentsDB (MongoDB ReplSet). WEB gets the page layout and the components from LayoutDB (MongoDB ReplSet). The diagram also shows SessionDB (MongoDB ReplSet) and Memcache.
  • 15. Cockatoo structure (same diagram, annotated): Mongo's READ performance is enough to cope with the WEB page views. But its WRITE performance is not enough.
  • 16. Cockatoo structure (same diagram as slide 14)
  • 17. Cockatoo structure (same diagram, with Zookeeper added next to the API servers)
  • 18. Cockatoo structure (same diagram, with Zookeeper and Solr added next to the API servers)
  • 19. Cockatoo structure, content side (diagram): Developer (HTML markup, API settings, deploy); CMS (set page layout and components in LayoutDB, a MongoDB ReplSet; set static contents); Batch servers (insert data); API servers; ContentsDB (MongoDB ReplSet)
  • 21. CMS
  • 22. CMS
  • 23. MapReduce
  • 24. MapReduce: Our usage. We have never used MapReduce in regular operation. However, we have used it for some irregular cases:
      – To find invalid articles that had to be removed because of someone's mistake...
      – To count the new articles posted per day.
      – To count how many times an article was updated.
      – We are starting to consider using it regularly for social data analysis before long...
  • 26. Structure: We are using very poor machines (virtual machines)!!
      – Intel(R) Xeon(R) CPU X5650 2.67GHz, 1 core!!
      – 4 GB memory
      – 50 GB disk space (iSCSI)
      – CentOS 5.5 64-bit
      – mongodb 1.8.0: ReplicaSet of 5 nodes (+ 1 arbiter), oplog size 1.2 GB, average object size 1 KB
  • 27. Structure: Researched environments. We have also researched the following environments...
      – Virtual machine, 1 core: 1 KB data, 6,000,000 documents; 8 KB data, 200,000 documents
      – Virtual machine, 3 cores: 1 KB data, 6,000,000 documents; 8 KB data, 200,000 documents
      – EC2 large instance: 2 KB data, 60,000,000 documents (100 GB)
  • 28. Performance: I found a formula for making a rough estimate of QPS, for 1 to 8 KB documents with 1 unique index.
      – C = number of CPU cores (Xeon 2.67 GHz)
      – DD = score of the 'dd' command (bytes/sec)
      – S = document size (bytes)
      – GET qps = 4500 × C
      – SET(fsync) qps = 0.05 × DD ÷ S
      – SET(nsync) qps = 4500, BUT... there is a chance of going STALE
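The rule of thumb on slide 28 can be written out as a small function. This is only a sketch: the constants 4500 and 0.05 are taken from the slide, while the function name, the parameter names and the example disk speed of 100 MB/s are assumptions made for illustration. Dividing DD (bytes/sec) by S (bytes) gives operations per second, so the fsync figure is treated as a QPS here.

```javascript
// Rough throughput estimate following the slide's rule of thumb.
// Constants come from the slide; all names are illustrative.
function estimateThroughput({ cores, ddBytesPerSec, docSizeBytes }) {
  return {
    getQps: 4500 * cores,                                // GET qps = 4500 x C
    setFsyncQps: (0.05 * ddBytesPerSec) / docSizeBytes,  // SET(fsync) = 0.05 x DD / S
    setNsyncQps: 4500,                                   // SET(nsync), may lead to STALE
  };
}

// Hypothetical example: 1 core, a disk that `dd` measures at 100 MB/s,
// and 1000-byte documents.
const estimate = estimateThroughput({
  cores: 1,
  ddBytesPerSec: 100 * 1000 * 1000,
  docSizeBytes: 1000,
});
console.log(estimate);
```

With these assumed inputs the fsync write rate comes out close to the read rate, which is one way to see why the deck treats disk speed as the deciding factor for durable writes.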
  • 29. Performance example (on EC2 large)
  • 30. Performance example (on EC2 large): Environment and amount of data. EC2 large instance; 2 KB data, 60,000,000 documents (100 GB); 1 unique index. Data type: { shop: 'someone', item: 'something', description: 'item explanation sentences...' }
  • 31. Performance example (on EC2 large)
      – Batch insert (1000 documents per batch), fsync=true: 17906 sec (=298 min) (=3358 docs/sec)
      – Ensure index (background=false): 4049 sec (=67 min)
          1. primary: 2101 sec (=35 min)
          2. secondary: 1948 sec (=32 min)
  • 32. Performance example (on EC2 large): Add one node: 5833 sec (=97 min)
      1. Get files (2 GB × 48): 2120 sec (=35 min)
      2. _id indexing: 1406 sec (=23 min)
      3. unique indexing: 2251 sec (=38 min)
      4. other processes: 56 sec (=1 min)
  • 33. Performance example (on EC2 large): Group by. Narrow down by the unique index, then map and reduce: 368 msec
      db.data.group({
        key: { shop: 1 },
        cond: { shop: 'someone' },
        reduce: function (o, p) { p.sum++; },
        initial: { sum: 0 }
      });
  • 34. Performance example (on EC2 large): MapReduce. Scan all data: 3116 sec (=52 min); number of keys = 39092
      db.data.mapReduce(
        function () { emit(this.shop, 1); },
        function (k, v) {
          var ret = 0;
          v.forEach(function (value) { ret += value; });
          return ret;
        },
        { query: {}, inline: 1, out: 'Tmp' }
      );
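To make clear what the map and reduce functions on slide 34 compute, here is a plain-JavaScript emulation that runs without a MongoDB server. It is a sketch only: `countByShop`, the local `emit` helper and the sample documents are made up for illustration, and the real server may also call reduce repeatedly on partial results, which a simple sum tolerates.

```javascript
// Emulates the shop-count job: map emits (shop, 1) for every document,
// reduce sums the emitted values for each key.
function countByShop(docs) {
  const emitted = new Map();              // key -> array of emitted values
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  const map = function () { emit(this.shop, 1); };
  const reduce = (key, values) => values.reduce((sum, v) => sum + v, 0);

  docs.forEach((doc) => map.call(doc));   // `this` is the document, as in the shell
  const result = {};
  for (const [key, values] of emitted) result[key] = reduce(key, values);
  return result;
}

// Hypothetical sample data in the shape shown on slide 30.
const sampleDocs = [
  { shop: 'someone', item: 'a' },
  { shop: 'someone', item: 'b' },
  { shop: 'another', item: 'c' },
];
console.log(countByShop(sampleDocs));
```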
  • 36. Indexing
  • 37. Index problem: Online indexing is completely useless, even in the latest version (2.0.2). Indexing is a locking operation by default. It can run in the background on the primary. But... it CANNOT run in the background on the secondaries. Moreover, all the secondaries run their indexing at the same time!! As a result... all the secondaries freeze! orz...
  • 38. Present indexing (default)
  • 39. Index problem: Present indexing (default) (diagram): the batch saves to the primary; five clients read from three secondaries.
  • 40. Index problem: Present indexing (default) (diagram): the batch calls ensureIndex on the primary. The primary is locked while indexing, so the batch cannot write.
  • 41. Index problem: Present indexing (default) (diagram): the primary has finished. The index build is synced to all three secondaries, which lock and index at the same time. Clients cannot read!!
  • 42. Index problem: Present indexing (default) (diagram): indexing is complete on the primary and on all three secondaries.
  • 43. Present indexing (background)
  • 44. Index problem: Present indexing (background) (diagram): the batch saves to the primary; five clients read from three secondaries.
  • 45. Index problem: Present indexing (background) (diagram): ensureIndex(background) runs on the primary. The primary slows down while indexing, and so does the batch.
  • 46. Index problem: Present indexing (background) (diagram): the primary has finished. The index build is synced to all three secondaries, which lock and index at the same time. Clients cannot read!!
  • 47. Index problem: Present indexing (background) (same diagram, annotated): background indexing doesn't work on the secondaries.
  • 48. Index problem: Present indexing (background) (same diagram as slide 46)
  • 49. Index problem: Present indexing (background) (diagram): indexing is complete on the primary and on all three secondaries.
  • 51. Index problem: According to mongodb.org, this problem will be fixed in 2.1.0, but that version has not been formally released. So I checked out the latest source code. It will certainly be fixed! Moreover, it looks like the build will run in the foreground when the member's status isn't SECONDARY (does that mean RECOVERING?)
  • 52. Index problem: Probable 2.1.X indexing (diagram): the batch saves to the primary; five clients read from three secondaries.
  • 53. Index problem: Probable 2.1.X indexing (diagram): ensureIndex(background) runs on the primary. The primary slows down while indexing, and so does the batch.
  • 54. Index problem: Probable 2.1.X indexing (diagram): the primary has finished. The index build is synced to all three secondaries, which index in the background at the same time. All of them slow down, but clients can still read.
  • 55. Index problem: Probable 2.1.X indexing (diagram): indexing is complete on the primary and on all three secondaries.
  • 56. Index problem: Background indexing in 2.1.X. But I think it's not enough. I think it can bring down our system when all the secondaries slow down at the same time!! So...
  • 58. Index problem: Ideal indexing (diagram): the batch saves to the primary; five clients read from three secondaries.
  • 59. Index problem: Ideal indexing (diagram): ensureIndex(background) runs on the primary. The primary slows down while indexing, and so does the batch.
  • 60. Index problem: Ideal indexing (diagram): the primary has finished. ensureIndex runs on the first secondary only, which goes to Recovering while indexing. The other two secondaries keep serving clients.
  • 61. Index problem: Ideal indexing (diagram): the first secondary is complete. ensureIndex runs on the second secondary, which goes to Recovering while indexing.
  • 62. Index problem: Ideal indexing (diagram): the first and second secondaries are complete. ensureIndex runs on the third secondary, which goes to Recovering while indexing.
  • 63. Index problem: Ideal indexing (diagram): indexing is complete on the primary and on all three secondaries.
  • 64. Index problem: But... I can easily guess that this is difficult to implement with the current oplog. It would be great if I could run indexing manually on each secondary.
  • 65. I suggest: manual indexing
  • 66. Index problem: Manual indexing (diagram): the batch saves to the primary; five clients read from three secondaries.
  • 67. Index problem: Manual indexing (diagram): ensureIndex(manual,background) runs on the primary. The primary slows down while indexing, and so does the batch.
  • 68. Index problem: Manual indexing (diagram): the primary has finished; the three secondaries are untouched.
  • 69. Index problem: Manual indexing (same diagram, annotated): the secondaries don't sync the index build automatically.
  • 70. Index problem: Manual indexing (same diagram as slide 68)
  • 71. Index problem: Manual indexing (diagram): ensureIndex(manual) runs on the first secondary, which goes to Recovering while indexing.
  • 72. Index problem: Manual indexing (diagram): the first secondary is complete. ensureIndex(manual) runs on the second secondary, which goes to Recovering while indexing.
  • 73. Index problem: Manual indexing (diagram): the first and second secondaries are complete. ensureIndex(manual,background) runs on the third secondary, which slows down while indexing but stays a secondary.
  • 74. Index problem: Manual indexing (same diagram, annotated): it needs to support background operation, just in case the ReplSet has only one secondary.
  • 75. Index problem: Manual indexing (same diagram as slide 73)
  • 76. Index problem: Manual indexing (diagram): indexing is complete on the primary and on all three secondaries.
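The difference between the present behaviour and the proposed one-secondary-at-a-time build can be shown with a toy model that counts how many secondaries can still serve reads at each step. This is a sketch of the idea on the slides, not of MongoDB's implementation; the function name and the strategy labels are invented for illustration.

```javascript
// Returns the number of readable secondaries at each step of an index build.
// "all-at-once": every secondary locks and indexes together (present behaviour).
// "rolling": one secondary at a time is indexing (the proposed behaviour).
function readableSecondaries(secondaryCount, strategy) {
  const timeline = [];
  if (strategy === 'all-at-once') {
    timeline.push(0);                        // all secondaries are locked
  } else if (strategy === 'rolling') {
    for (let i = 0; i < secondaryCount; i++) {
      timeline.push(secondaryCount - 1);     // only secondary i is unavailable
    }
  } else {
    throw new Error('unknown strategy: ' + strategy);
  }
  timeline.push(secondaryCount);             // build finished on every node
  return timeline;
}

console.log(readableSecondaries(3, 'all-at-once'));
console.log(readableSecondaries(3, 'rolling'));
```

The rolling variant takes more steps, but read capacity never drops to zero, which is the property the deck asks for. With a single secondary even the rolling variant reaches zero, which matches the note on slide 74 that a background option is still needed.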
  • 77. That's all about the indexing problem
  • 78. Struggling to control the sync
  • 79. STALE
  • 80. Unknown log messages and a ReplSet out of control: We often suffered from the secondaries going out of control...
      – The secondaries repeatedly flip between Secondary and Recovering within moments (1.8.0)
      – Then we found a strange line in the log... [rsSync] replSet error RS102 too stale to catch up
  • 81. What's stale? stale [stéil] (level: essential for working adults), powered by goo.ne.jp
      – (of food or drink) not fresh (⇔ fresh);
      – flat; (of coffee) having lost its aroma;
      – (of bread) dried out, gone hard;
      – (of air or smells) stuffy;
      – foul-smelling
  • 82. What's stale? (same dictionary entry) Apparently it is something very bad...
  • 83. Mechanism of becoming stale
  • 84. ReplicaSet (diagram): two clients; a primary mongod and a secondary mongod, each with its own database and oplog.
  • 86. ReplicaSet (same diagram as slide 84)
  • 87. Insert & Replication 1 (diagram): a client inserts A into the primary. A is stored in the primary's database and "Insert A" is appended to its oplog.
  • 88. Insert & Replication 1 (diagram): Sync. The secondary copies "Insert A" from the primary's oplog and applies it, so both nodes hold A and "Insert A".
  • 90. Stale (diagram): starting state. Both the primary and the secondary hold A and "Insert A".
  • 91. Insert & Replication 2 (diagram): a client inserts B into the primary. Primary oplog: Insert A, Insert B. The secondary still has only Insert A.
  • 92. Insert & Replication 2 (diagram): a client inserts C into the primary. Primary oplog: Insert A, Insert B, Insert C.
  • 93. Insert & Replication 2 (diagram): a client updates A on the primary. Primary oplog: Insert A, Insert B, Insert C, Update A.
  • 94. Insert & Replication 2 (diagram): Check oplog. The secondary looks up its last entry (Insert A) in the primary's oplog.
  • 95. Insert & Replication 2 (diagram): Sync. The secondary applies Insert B, Insert C and Update A. Both nodes now hold A, B, C and the same oplog.
  • 97. Stale (diagram): starting state. Both the primary and the secondary hold A and "Insert A".
  • 98. Stale (diagram): a client inserts B into the primary. Primary oplog: Insert A, Insert B. The secondary still has only Insert A.
  • 99. Stale (diagram): a client inserts C into the primary. Primary oplog: Insert A, Insert B, Insert C.
  • 100. Stale (diagram): a client updates A on the primary. Primary oplog: Insert A, Insert B, Insert C, Update A.
  • 101. Stale (diagram): a client updates C on the primary. "Update C" is appended and the primary's oplog is now full. The secondary has still not synced.
  • 102. Stale (diagram): a client inserts D into the primary. The oldest entries (Insert A, Insert B) are pushed out of the primary's oplog, which now holds Insert C, Update A, Update C, Insert D.
  • 103. Stale (diagram): Check oplog. The secondary looks for its last entry in the primary's oplog: [Insert A] not found!!
  • 104. Stale (diagram): [Insert A] not found!! The secondary cannot get the information about [Insert B], so it cannot sync!! This is called STALE. The secondary goes to Recovering.
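The sequence on slides 97 to 104 can be replayed with a toy model of a capped oplog. This is a simplified sketch, not MongoDB's replication code: the `Oplog` class, the integer timestamps and `canCatchUp` are invented for illustration, and the capacity of 4 entries is chosen to match the diagram.

```javascript
// Toy capped oplog: a fixed-capacity list in which the oldest entry is
// dropped whenever a new one does not fit.
class Oplog {
  constructor(capacity) {
    this.capacity = capacity;
    this.entries = [];                 // each entry: { ts, op }
    this.nextTs = 1;
  }
  append(op) {
    this.entries.push({ ts: this.nextTs++, op });
    if (this.entries.length > this.capacity) this.entries.shift();
  }
  oldestTs() {
    return this.entries.length ? this.entries[0].ts : null;
  }
}

// A secondary whose last applied entry is `lastAppliedTs` can catch up only
// if that entry is still present in the source oplog.
function canCatchUp(sourceOplog, lastAppliedTs) {
  const oldest = sourceOplog.oldestTs();
  return oldest !== null && lastAppliedTs >= oldest;
}

const primary = new Oplog(4);
primary.append('Insert A');            // ts 1: the secondary syncs up to here
const secondaryLastApplied = 1;

['Insert B', 'Insert C', 'Update A'].forEach((op) => primary.append(op));
console.log(canCatchUp(primary, secondaryLastApplied));  // Insert A still present

['Update C', 'Insert D'].forEach((op) => primary.append(op));
console.log(canCatchUp(primary, secondaryLastApplied));  // Insert A has rolled off
```

In this model the only ways to avoid the second outcome are to sync more often or to make the oplog larger, which is the point the deck makes next.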
  • 105. Stale: We have to understand the importance of adjusting the oplog size. We can specify the oplog size with a command-line option, but only the first time mongod is started on a given dbpath (which is also specified on the command line). Also, we cannot change the oplog size afterwards without clearing the dbpath. Be careful!
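A back-of-the-envelope way to judge an oplog size is to estimate how long a secondary may lag before its position rolls off the oplog. The sketch below assumes that every write adds roughly one average-sized object to the oplog, which is a simplification; the function name is made up, and the inputs are the figures quoted on slides 26 and 28 (1.2 GB oplog, 1 KB objects, 4500 writes per second).

```javascript
// Estimated time, in seconds, that the oplog can absorb writes before
// the oldest entry is overwritten.
function oplogWindowSeconds(oplogBytes, writesPerSec, avgEntryBytes) {
  return oplogBytes / (writesPerSec * avgEntryBytes);
}

const GiB = 1024 ** 3;
const windowSec = oplogWindowSeconds(1.2 * GiB, 4500, 1024);
console.log(windowSec.toFixed(0) + ' seconds');
```

Under these assumptions the window is under five minutes at full write speed, so any secondary that is blocked for longer than that (for example by an index build) would be at risk of going stale.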
  • 106. Replication (joining as a new node)
  • 107. InitialSync (diagram): the primary holds A, B, C, D; its oplog holds Insert C, Update A, Update C, Insert D.
  • 108. InitialSync (diagram): a new mongod starts (Startup) with an empty database and an empty oplog.
  • 109. InitialSync (diagram): Get last oplog. The new node (Recovering) records the primary's latest oplog entry, Insert D.
  • 110. InitialSync (diagram): Cloning DB. The new node starts copying A, B, C, D from the primary.
  • 111. InitialSync (diagram): Cloning DB continues; A has arrived on the new node.
  • 112. InitialSync (diagram): while cloning is in progress, a client inserts E into the primary; "Insert E" is appended to the primary's oplog.
  • 113. InitialSync (diagram): a client updates B on the primary; "Update B" is appended to the primary's oplog. Cloning DB complete.
  • 114. InitialSync (diagram): Check oplog. The new node looks up its recorded entry (Insert D) in the primary's oplog.
  • 115. InitialSync (diagram): Sync. The new node applies the entries after Insert D (Insert E, Update B) and becomes a Secondary.
  • 116. Additional information, from the source code (I have never tested these...): A secondary will try to sync from other secondaries when it cannot reach the primary or may be stale against the primary. There is a small chance that the sync problem does not occur, if another secondary has older oplog entries or a larger oplog than the primary.
  • 117. Sync from another secondary (diagram): the primary's oplog holds Insert C, Update A, Update C, Insert D (Insert A and Insert B have been pushed out). One secondary is lagging and has only Insert A. Another secondary is up to date and its oplog still holds everything from Insert A to Insert D.
  • 118. Sync from another secondary (diagram): Check oplog. The lagging secondary looks for [Insert A] in the primary's oplog: not found!!
  • 119. Sync from another secondary (diagram): Check oplog. But [Insert A] is found on the other secondary, so it is able to sync.
  • 120. Sync from another secondary (diagram): Sync. The lagging secondary syncs from the other secondary; all three nodes now hold A, B, C, D.
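The fallback on slides 117 to 120 amounts to choosing any member whose oplog still reaches back far enough. The sketch below expresses that choice; it is an illustration of the idea only, and the member objects, field names and `pickSyncSource` are assumptions, not MongoDB's actual sync-source selection.

```javascript
// Each member is described by the oldest timestamp still held in its oplog.
// Tries the primary first, then the other members, and returns the name of
// the first one whose oplog still covers `lastAppliedTs`, or null if every
// member has rolled past that point (the STALE case).
function pickSyncSource(members, lastAppliedTs) {
  const ordered = [...members].sort(
    (a, b) => Number(b.isPrimary) - Number(a.isPrimary)
  );
  const source = ordered.find((m) => m.oldestOplogTs <= lastAppliedTs);
  return source ? source.name : null;
}

// Hypothetical state matching the diagram: the primary has dropped
// Insert A (ts 1) and Insert B (ts 2); the other secondary still has them.
const members = [
  { name: 'secondary2', isPrimary: false, oldestOplogTs: 1 },
  { name: 'primary', isPrimary: true, oldestOplogTs: 3 },
];
console.log(pickSyncSource(members, 1));
```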
  • 121. That's all about sync
  • 122. Others...
  • 123. Disk space
  • 124. Disk space: Data gets fragmented sparsely across the DB files... We ran into an unfavorable situation in our DBs. It appeared in some of our collections around 3 months after we launched the services.
      db.ourcol.storageSize() = 16200727264 (15GB)
      db.ourcol.totalSize() = 16200809184
      db.ourcol.totalIndexSize() = 81920
      db.ourcol.dataSize() = 2032300 (2MB)
    What happened to them?!
  • 125. Disk space: Data gets fragmented sparsely across the DB files... It seems to be caused by a specific pattern of operations that insert, update and delete over and over. Anyway, we have to shrink the used disk space regularly, just like PostgreSQL's vacuum. But how do we do it?
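The figures on slide 124 can be checked with a few lines of arithmetic. The numbers below are copied from the slide; the `stats` object and the `bloatRatio` helper are illustrative names, not part of the mongo shell.

```javascript
// Sizes reported on the slide for one bloated collection, in bytes.
const stats = {
  storageSize: 16200727264,
  totalIndexSize: 81920,
  dataSize: 2032300,
};

// Ratio of allocated storage to live data; 1.0 would mean no wasted space.
function bloatRatio({ storageSize, dataSize }) {
  return storageSize / dataSize;
}

// storageSize + totalIndexSize reproduces the totalSize() value on the slide.
console.log(stats.storageSize + stats.totalIndexSize);
console.log(Math.round(bloatRatio(stats)));
```

The collection occupies roughly eight thousand times more disk than its live data needs. A ratio like this is a simple signal that could be monitored to decide when a node needs the shrink procedure.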
  • 126. Disk space: Shrinking the used disk space. MongoDB offers some functions for this case, but we couldn't use them in our case!
      – repairDatabase: only runnable on the primary. It takes a long time and BLOCKS all operations!!
      – compact: only runnable on a secondary. It zero-fills the blank space instead of shrinking the disk usage. So it cannot shrink...
  • 127. Disk space: Our countermeasures.
    For temporary collections: issue the drop command regularly.
    For other collections:
      1. Remove one secondary from the ReplSet.
      2. Shut it down.
      3. Remove all its DB files.
      4. Rejoin it to the ReplSet.
      5. Repeat these operations on the other secondaries, one after another.
      6. Step down the primary (change the primary node).
      7. Finally, do steps 1 to 4 on the former primary.
  • 128. Disk space: Shrink operation (diagram): the primary and both secondaries are bloated.
  • 129. Disk space: Shrink operation (diagram): shut down mongod (kill -15) on one secondary; it is now dead.
  • 130. Disk space: Shrink operation (diagram): delete the DBPATH on that node; it now holds nothing.
  • 131. Disk space: Shrink operation (diagram): start mongod; the node is in Startup and still holds nothing.
  • 132. Disk space: Shrink operation (diagram): the node is a secondary again and is now shrunk; the primary and the other secondary are still bloated.
  • 133. Disk space: Shrink operation (diagram): shut down mongod, delete the DBPATH and start mongod on the other secondary; both secondaries are now shrunk.
  • 134. Disk space: Shrink operation (diagram): step down the primary. The old primary becomes a secondary (still bloated) and a shrunk node becomes the primary.
  • 135. Disk space: Shrink operation (diagram): shut down mongod, delete the DBPATH and start mongod on the former primary; all three nodes are now shrunk.
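The procedure on slide 127 is easy to get out of order by hand, so it can help to generate the step list from the member names. The following is a sketch of such a planner; it only produces text and does not talk to MongoDB, and the function name, node names and step wording are invented for illustration.

```javascript
// Builds the ordered steps of the rolling "wipe and resync" procedure:
// every secondary first, then step down the primary and treat it last.
function rollingResyncPlan(primary, secondaries) {
  const resync = (node) => [
    `remove ${node} from the ReplSet`,
    `shut down mongod on ${node}`,
    `remove all DB files on ${node}`,
    `rejoin ${node} to the ReplSet and wait for the initial sync`,
  ];
  const plan = [];
  for (const node of secondaries) plan.push(...resync(node));
  plan.push(`step down ${primary}`);
  plan.push(...resync(primary));     // the former primary is now a secondary
  return plan;
}

const plan = rollingResyncPlan('node1', ['node2', 'node3']);
plan.forEach((step, i) => console.log(`${i + 1}. ${step}`));
```

Handling one node at a time keeps the other members serving traffic. Each rejoin triggers a full initial sync, so the oplog has to be large enough to cover the time that sync takes.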
  • 136. PHP client
  • 137. PHP client: We tried 1.1.4 and 1.2.2.
      – 1.1.4: there are some critical bugs around the connection pool. We struggled to invalidate broken connections. I think you should use 1.2.X instead of 1.1.X.
      – 1.2.2: the connection pool problems seem to be fixed. But there are 2 critical bugs: a socket handle leak and a useless sleep. However, this version is relatively stable as long as these bugs are fixed.
  • 138. PHP client: Patches. https://github.com/crumbjp/Personal
      – mongo1.2.2.non-wait.patch
      – mongo1.2.2.sock-leak.patch
  • 139. PHP client
  • 140. Closing
  • 141. Closing: What's MongoDB?
      – It has very good READ performance. We can use Mongo instead of memcached, if we can accept the limited write performance.
      – Die hard! MongoDB has high availability even under severe stress.
      – It can be used easily without deep consideration. We can manage to do anything after we start using it.
      – Let's forget the awkward trivial things that have bothered us: how to handle huge data? How to fit in a cache system? How to keep availability? And so on...
  • 142. Closing: Keep in mind
      – Sharding is challenging... It's a last resort! It's hard to operate, in particular maintaining the config servers. mongos is also difficult to keep alive; I want a way to fail over mongos.
      – Mongo is able to run in a poor environment, but... the one thing you should set aside is a large disk space.
      – Huge writes are sensitive: adjust the oplog size carefully.
      – The indexing function is unfinished: an index cannot be applied online.
  • 143. All right, have fun!!
  • 144. All right, have fun!! ...with us at Rakuten
  • 145. All right, have fun!! ...with us at Rakuten. Why not join Rakuten for some cool work?
  • 146. Thank you for listening