Issues and Tips for Big Data
       on Cassandra



                     Shotaro Kamio
Architecture and Core Technology dept., DU, Rakuten, Inc.   1
Contents


1   Big Data Problem in Rakuten


2   Contributions to Cassandra Project


3   System Architecture


4   Details and Tips


5   Conclusion




                                         2

                                      
                                                                                                         
Big Data Problem in Rakuten

   • User data increases exponentially.
     – Doubles in size every two years.
   • More than 1 billion records.
   • We need a scalable solution to handle this big data.

   [Figure: total size by month-year, Jun-97 to Dec-10; annotated "x2" per 2 years]
4
Importance of Data Store in Rakuten


• Rakuten has a lot of data
   – User data, item data, reviews, etc.
• Connectivity to Hadoop is expected
• High-performance, fault-tolerant, scalable
  storage is necessary → Cassandra


             Service A           Service B   Service C   …



             Data A                Data B


                                                             5
Performance of New System (Cassandra)


   • Store all data in 1 day
     – Achieved 15,000 updates/sec with quorum.
     – 50 times faster than DB.
   • Good read throughput
     – Handle more than 100 read threads at a time.

   [Figure: bar chart of update throughput, DB vs. New; New reaches 15,000 updates/sec, x 50]
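
A quick sanity check of the figures on this slide, assuming (the slides do not say) that the 15,000 updates/sec rate is sustained over a full 24 hours:

```latex
15{,}000\ \tfrac{\text{updates}}{\text{s}} \times 86{,}400\ \tfrac{\text{s}}{\text{day}} = 1.296 \times 10^{9}\ \tfrac{\text{updates}}{\text{day}}
```

Under that assumption, one day of sustained writes is enough to cover the "more than 1 billion records" mentioned earlier.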


                                                              6
Contributions to Cassandra Project


• Tested 0.7.x - 0.8.x

• Bug reports / Feedback to JIRA
   – CASSANDRA-2212, 2297, 2406, 2557, 2626 and more
   – Bugs related to specific conditions, secondary indexes, and large
     datasets.
• Contributed patches
   – Discussed in later slides.




                                                                     8
JIRA: Overflow in bytesPastMark(..)


•   https://issues.apache.org/jira/browse/CASSANDRA-2297


• Hit the error on a row larger than 60 GB
     – The row has column families of super column type


• The bytesPastMark method was fixed to return a long value.




                                                           9
JIRA: Stack overflow while compacting


•   https://issues.apache.org/jira/browse/CASSANDRA-2626


• A long series of compactions causes a stack overflow.
← This occurs with large datasets.

• Helped with debugging.




                                                           10
Challenges in OSS


• Not well tested with real big data.
→ Rakuten can give a lot of feedback to the community.
   – Bug reports, patches, and communication.
• OSS becomes much more stable.



                    Feedback




                                               11
Contribution of Patches


• Column name aliasing
  – Encodes column names in a compact way.
  – Useful to reduce data size for structured (relational)
    data.
  – Reduces SSTable size by 15%.
• Variable-length quantity (VLQ) compression
  – Reduces encoding overhead in columns.
  – Reduces SSTable size by 17%.




                                                             12
VLQ Compression Patch


• The serializer is changed to use VLQ encoding.
• A typical column has fixed-length fields:
   –   2 bytes for column name length
   –   1 byte for flag
   –   8 bytes for TTL, deletion time
   –   8 bytes for timestamp
   –   4 bytes for length of value.
• These encoding overheads are reduced.
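
The fixed-length fields listed above add up to 23 bytes per column (2 + 1 + 8 + 8 + 4). The sketch below illustrates the general idea of a variable-length integer encoding in Java. It is not the code of the actual patch: the class name `Vlq` is hypothetical, and the byte order (7-bit groups, least significant group first, as in LEB128) is an assumption, since the slides do not give the patch's byte layout.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical helper for illustration; not the code of the actual patch. */
final class Vlq {
    /** Encodes a non-negative value in 7-bit groups, least significant group first. */
    static byte[] encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not handled in this sketch");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80)); // high bit set: more bytes follow
            value >>>= 7;
        }
        out.write((int) value); // final byte has the high bit clear
        return out.toByteArray();
    }

    /** Decodes a value produced by encode(). */
    static long decode(byte[] bytes) {
        long value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // high bit clear: this was the last byte
            }
            shift += 7;
        }
        return value;
    }
}
```

With this scheme a value length of 5, which occupies 4 bytes in the fixed layout, is stored in a single byte; small values benefit most, while large values such as timestamps shrink little.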



                                               13
System Architecture

   [Diagram: DB, Batch jobs, Data feeder, two clusters of DB nodes (Cassandra 1 and Cassandra 2), Services, and Backup]

                                                   15
Planning: Schema Design


• Data modeling is key to scalability.
• Design schema
   – Query patterns for super column and normal column.
• Think about queries based on use cases.
   – Batch operations reduce the number of requests, because Thrift has
     communication overhead.
• Secondary Index
   – We used it to find updated data.
• Choose the partitioner appropriately.
   – One partitioner per cluster.




                                                                       17
Secondary Index


• Pros
   – Useful to query based on a column value.
   – It can reduce consistency problems.
   – For example, to query updated data based on update-time.
• Cons
   – Performance of complex queries depends on the data.
      E.g., Year == 2011 and Price < 100




                                                                18
A Bit Detail of Secondary Index


   • Works like a hash + filters.
    1. Pick up a row which has a key for the index (hash).
    2. Apply filters.
        – Collect the result if all filters are matched.
    3. Repeat until the requested number of rows are obtained.

   E.g., Year == 2011 and Price < 100

   Key1     Year = 2011
   Key2     Year = 2011       Price = 1,000
   Key3     Year = 2011       Price = 10
   Key4     Year = 2011       Price = 10,000
   Key5     Year = 2011       Price = 200

   Many keys of year = 2011, but few results.
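
The hash + filters behavior described on this slide can be sketched as follows. This is a simplified in-memory model, not Cassandra's implementation: the class `IndexScanSketch` and its methods are hypothetical, and a linear scan over a map stands in for the index lookup on Year. The rows are the five shown on the slide.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of an index lookup followed by filters. */
final class IndexScanSketch {
    /** Number of index-matching rows examined by the last call to query(). */
    static int examined;

    /** Returns up to limit keys whose rows have Year == year and Price < maxPrice. */
    static List<String> query(Map<String, Map<String, Integer>> rows,
                              int year, int maxPrice, int limit) {
        examined = 0;
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Map<String, Integer>> row : rows.entrySet()) {
            Integer y = row.getValue().get("Year");
            if (y == null || y != year) {
                continue; // step 1: only rows matching the indexed value are picked up
            }
            examined++;
            Integer price = row.getValue().get("Price");
            if (price != null && price < maxPrice) { // step 2: apply the filter
                result.add(row.getKey());
                if (result.size() >= limit) {
                    break; // step 3: stop once enough rows are collected
                }
            }
        }
        return result;
    }

    /** The five example rows from the slide. */
    static Map<String, Map<String, Integer>> slideRows() {
        Map<String, Map<String, Integer>> rows = new LinkedHashMap<>();
        rows.put("Key1", row(2011, null));
        rows.put("Key2", row(2011, 1000));
        rows.put("Key3", row(2011, 10));
        rows.put("Key4", row(2011, 10000));
        rows.put("Key5", row(2011, 200));
        return rows;
    }

    private static Map<String, Integer> row(int year, Integer price) {
        Map<String, Integer> cols = new LinkedHashMap<>();
        cols.put("Year", year);
        if (price != null) {
            cols.put("Price", price);
        }
        return cols;
    }
}
```

Calling `IndexScanSketch.query(IndexScanSketch.slideRows(), 2011, 100, 10)` examines all five rows with Year = 2011 but collects only Key3, which is why a selective filter on top of a non-selective index value can be slow.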

                                                                                 19
A Bit Detail of Secondary Index (2)


   • Consider the frequency of results for the query
     – Very few results in a large data set → the query might time out.
   • Careful data/query design is necessary at this moment.
   • An improvement is under discussion: CASSANDRA-2915




                                                             20
Planning: Data Size Estimation


• Estimate future data volume
• Serialization overhead: x 3 - 4
   – Big overhead for small data.
   – We improved this with custom patches and compression code
      • Cassandra 1.0 can use Snappy/Deflate compression.
• Replication: x 3 (depends on your decision)
• Compaction: x 2 or above
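
The factors above can be combined into a rough disk estimate. The sketch below assumes the factors simply multiply, which is an interpretation of the slide rather than a stated formula; the class and method names are hypothetical.

```java
/** Hypothetical back-of-the-envelope estimate; assumes the factors multiply. */
final class DiskEstimate {
    /**
     * @param rawTb               raw data volume in TB
     * @param serializationFactor serialization overhead (the slide suggests 3 to 4)
     * @param replicationFactor   number of replicas (3 on the slide)
     * @param compactionFactor    headroom for compaction (2 or above on the slide)
     * @return cluster-wide disk space in TB
     */
    static double requiredTb(double rawTb, double serializationFactor,
                             int replicationFactor, double compactionFactor) {
        return rawTb * serializationFactor * replicationFactor * compactionFactor;
    }
}
```

For example, 1 TB of raw data with factors 3, 3, and 2 gives `DiskEstimate.requiredTb(1.0, 3.0, 3, 2.0)`, which is 18 TB across the cluster; with a serialization factor of 4 it is 24 TB. Compression or patches that lower the serialization overhead reduce the result proportionally.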




                                                            21
Other Factors for Data Size


• Obsolete SSTables
   – Disk usage may stay high after compaction.
   – Cassandra 0.8.x relies on GC to remove obsolete SSTables.
   – Improved in 1.0.

• How to balance data distribution
   – Disk usage can be unbalanced (ByteOrderedPartitioner).
   – Partitioning, key design, initial token assignment.
   – It is very helpful to know the data in advance.



• Backup scheme affects disk space
   – Need backup space.
   – Discussed later.
                                                                 22
Configuration


• We adopted Cassandra 0.8.x + custom patches.
• Without mmap
   – No noticeable difference on performance
   – Easier to monitor and debug memory usage and GC related
     issues
• ulimit
   – Avoid file descriptor shortage. Need more than the number of db
     files. Bug??
   – “memlock unlimited” for JNA
   – Create /etc/security/limits.d/cassandra.conf (Red Hat)




                                                                   23
JVM / GC


• Full GC must be avoided at all times.
• The JVM cannot make good use of a heap larger than 15 GB.
   – Slow GC. Can be unstable.
   – Don’t put too much data/cache into the heap.
   – Off-heap cache is available in 0.8.1
• Cassandra may use more memory than heap size.
   – ulimit -d 25000000 (max data segment size)
   – ulimit -v 75000000 (max virtual memory size)
• Benchmarking is needed to find appropriate parameters.




                                                    24
Parameter Tuning for Failure Detector


•   Cassandra uses the Phi Accrual Failure Detector
     – The Φ Accrual Failure Detector [SRDS'04]
•   Failure detection errors occur when a node is under heavy
    access and/or running GC
•   Depends on number of nodes:
     – Larger cluster, larger number.

    // phi grows with the time elapsed since the last heartbeat
    double phi(long tnow)
    {
        int size = arrivalIntervals_.size();
        double log = 0d;
        if ( size > 0 )
        {
            double t = tnow - tLast_;
            double probability = p(t);
            log = (-1) * Math.log10( probability );
        }
        return log;
    }

    // probability that a heartbeat arrives later than t, given the mean interval
    double p(double t)
    {
        double mean = mean();
        double exponent = (-1)*(t)/mean;
        return Math.pow(Math.E, exponent);
    }
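
The excerpt above depends on fields that are not shown. A self-contained sketch of the same calculation follows; the class `PhiSketch`, its window handling, and the threshold value used in the usage note are assumptions for illustration, not Cassandra's actual code. Since p(t) = exp(-t / mean), phi simplifies to t / (mean × ln 10).

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-alone version of the phi calculation shown on the slide. */
final class PhiSketch {
    private final Deque<Long> intervals = new ArrayDeque<>();
    private final int windowSize;
    private long lastHeartbeat = -1;

    PhiSketch(int windowSize) {
        this.windowSize = windowSize;
    }

    /** Records a heartbeat and the interval since the previous one. */
    void heartbeat(long now) {
        if (lastHeartbeat >= 0) {
            if (intervals.size() == windowSize) {
                intervals.removeFirst(); // keep only the most recent intervals
            }
            intervals.addLast(now - lastHeartbeat);
        }
        lastHeartbeat = now;
    }

    /** Equals -log10(exp(-t / mean)), written in its simplified form. */
    double phi(long now) {
        if (intervals.isEmpty()) {
            return 0d;
        }
        double sum = 0;
        for (long interval : intervals) {
            sum += interval;
        }
        double mean = sum / intervals.size();
        double t = now - lastHeartbeat;
        return t / (mean * Math.log(10));
    }

    /** A node is suspected once phi exceeds the chosen threshold. */
    boolean suspect(long now, double threshold) {
        return phi(now) > threshold;
    }
}
```

With heartbeats every 1,000 ms, phi is about 0.43 one interval after the last heartbeat and passes an assumed threshold of 8 only after roughly 18 missed intervals. A long GC pause or a heavily loaded node delays heartbeats in the same way, which is why the threshold may need raising on larger or busier clusters.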

                                                                                    25
Hardware


• Benchmarking is important for choosing hardware.
   – Requirements for performance, data size, etc.
   – Cassandra is good at utilizing CPU cores.
• Network ports will become a bottleneck when scaling out…
   – Large number of low-spec servers or
   – Small number of high-spec servers.



     Our case:
     • High-spec CPU and SSD drives
     • 2 clusters (active and test cluster)



                                                     26
Customize Hector Library


• Queries can time out on Cassandra:
   – When Cassandra is temporarily under high load.
   – Requests for large result sets
   – Timeouts of secondary index queries
• Hector retries forever when a query times out.
• The client cannot detect the infinite loop.
• Customization:
   – After 3 timeouts, return an exception to the client.
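
The customization above amounts to bounding the number of retries. The sketch below shows that idea with plain Java types; it does not use or reproduce Hector's API, and the class `BoundedRetry` and its signature are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

/** Hypothetical wrapper that gives up after a fixed number of timeouts. */
final class BoundedRetry {
    /**
     * Runs the query, retrying only on TimeoutException.
     * After maxTimeouts timed-out attempts, a TimeoutException reaches the caller.
     * Any other exception is propagated immediately.
     */
    static <T> T call(Callable<T> query, int maxTimeouts) throws Exception {
        for (int attempt = 1; attempt <= maxTimeouts; attempt++) {
            try {
                return query.call();
            } catch (TimeoutException e) {
                // Timed out: fall through and try again while attempts remain.
            }
        }
        throw new TimeoutException("query timed out " + maxTimeouts + " times, giving up");
    }
}
```

A caller would wrap each request, for example `BoundedRetry.call(() -> runQuery(), 3)` where `runQuery` is a placeholder for the real client call, so that a persistent timeout surfaces as an exception instead of an endless loop.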




                                                     28
Testing: Data Consistency Check Tool


   • We wanted to make sure data is not corrupted within Cassandra.
   • Made a tool to check the data consistency.

   [Diagram: input data (inserts, updates, deletes) comes in periodically; Process A inserts, updates, and deletes data in Cassandra; Process B compares data in another database with that in Cassandra]
                                                                    30
Testing: Data Consistency Check Tool (2)


   • Compares only the keys of the data, not the contents.
   • Useful for diagnosing which part is wrong in the test phase.
   • We found another team’s bug as well.




                                                            31
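
The key-only comparison performed by the consistency check tool can be sketched as a set difference. This is not the tool itself: the class `KeyDiff` is hypothetical, and loading the key sets from the other database and from Cassandra is left out.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical key-only comparison between two data stores. */
final class KeyDiff {
    /** Returns the keys present in expected but absent from actual, in sorted order. */
    static Set<String> missingFrom(Set<String> expected, Set<String> actual) {
        Set<String> diff = new TreeSet<>(expected);
        diff.removeAll(actual);
        return diff;
    }
}
```

Calling `KeyDiff.missingFrom(dbKeys, cassandraKeys)` lists keys that were lost or never inserted, and swapping the arguments lists keys that should have been deleted; `dbKeys` and `cassandraKeys` are placeholder names for the two key sets.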
Repair


• Some types of queries don’t trigger read repair.
• Nodetool repair is tricky on big data.
   – Disk usage
   – Time consuming
→ Read all data afterward: Read repair

• Discussion of improvements is ongoing:
   – CASSANDRA-2699




                                                     32
Backup Scheme

  • Backup might be required to shorten recovery time.
1. Snapshot to local disk
    – Plan disk size at server estimation phase.
2. Full backup of input data
    – We had to feed the full data several times for various reasons,
       e.g., logic changes, schema changes, data corruption.

   [Diagram: incoming data goes to Backup and to Cassandra (a cluster of DB nodes); Cassandra keeps multiple Snapshots]

                                                                  34
Conclusion


• Rakuten uses Cassandra with Big data.
• We’ll continue contributing to OSS.




                                          36
Finally...




Please allow me a little advertising...




                   37
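We are hiring! We are actively recruiting mid-career hires!

Rakuten's Mission

Empower people and society (through the Internet),
and transform and enrich society through our own success
Rakuten's Goal
              To become No.1
         Internet Service Company
                in the World
If you share Rakuten's Mission and Goal, please get in touch!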

       tech-career@mail.rakuten.com
                                         38

Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

Cassandra conference

  • 1. Issues and Tips for Big Data on Cassandra. Shotaro Kamio, Architecture and Core Technology dept., DU, Rakuten, Inc.
  • 2. Contents: 1. Big Data Problem in Rakuten; 2. Contributions to Cassandra Project; 3. System Architecture; 4. Details and Tips; 5. Conclusion
  • 4. Big Data Problem in Rakuten • User data increases exponentially. – More than 1 billion records. – Doubles in size every second year. • We need a scalable solution to handle this big data. [Chart: total size by month-year, Jun-97 to Dec-10; x2 every 2 years]
  • 5. Importance of Data Store in Rakuten • Rakuten has a lot of data – User data, item data, reviews, etc. • Connectivity to Hadoop is expected • High-performance, fault-tolerant, scalable storage is necessary → Cassandra [Diagram: Service A, Service B, Service C, …; Data A, Data B]
  • 6. Performance of New System (Cassandra) • Store all data in 1 day – Achieved 15,000 updates/sec with quorum. – 50 times faster than DB. • Good read throughput – Handles more than 100 read threads at a time. [Chart: DB vs. new system, x50, 15,000 updates/sec]
  • 8. Contributions to Cassandra Project • Tested 0.7.x - 0.8.x • Bug reports / feedback to JIRA – CASSANDRA-2212, 2297, 2406, 2557, 2626 and more – Bugs related to specific conditions, secondary indexes, and large datasets. • Contributed patches – Covered in later slides.
  • 9. JIRA: Overflow in bytesPastMark(..) • https://issues.apache.org/jira/browse/CASSANDRA-2297 • Hit the error on a row larger than 60 GB – The row has column families of super column type • The bytesPastMark method was fixed to return a long value.
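
The slide only says that the fix was to return a long. As an illustration of why that matters for a 60 GB row, here is a minimal sketch assuming the earlier version narrowed the byte distance to an int. The class and method names are hypothetical, not the actual Cassandra code.

```java
// Illustrative sketch only: hypothetical names, not the Cassandra source.
final class MarkOverflowDemo {
    /** Assumed buggy variant: the distance is narrowed to int and wraps past 2 GiB. */
    static int bytesPastMarkAsInt(long mark, long current) {
        return (int) (current - mark);
    }

    /** Fixed variant: the distance stays a long, so large rows are safe. */
    static long bytesPastMarkAsLong(long mark, long current) {
        return current - mark;
    }

    public static void main(String[] args) {
        long sixtyGigabytes = 60_000_000_000L;
        // The int version no longer equals the true distance for a row this large.
        System.out.println("int  : " + bytesPastMarkAsInt(0L, sixtyGigabytes));
        System.out.println("long : " + bytesPastMarkAsLong(0L, sixtyGigabytes));
    }
}
```

Any distance above Integer.MAX_VALUE (about 2.1 GB) cannot be represented in an int, so code that later uses the value as a length or offset misbehaves.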
  • 10. JIRA: Stack overflow while compacting • https://issues.apache.org/jira/browse/CASSANDRA-2626 • A long series of compactions causes a stack overflow. ← This occurs with large datasets. • We helped with debugging.
  • 11. Challenges in OSS • Not well tested with real big data. → Rakuten can give a lot of feedback to the community. – Bug reports, patches, and communication. • OSS becomes much more stable.
  • 12. Contribution of Patches • Column name aliasing – Encodes column names in a compact way. – Useful to reduce data size for structured (relational) data. – Reduces SSTable size by 15%. • Variable-length quantity (VLQ) compression – Reduces encoding overhead in columns. – Reduces SSTable size by 17%.
  • 13. VLQ Compression Patch • The serializer is changed to use VLQ encoding. • A typical column has fixed-length fields: – 2 bytes for column name length – 1 byte for flag – 8 bytes for TTL, deletion time – 8 bytes for timestamp – 4 bytes for length of value. • Those encoding overheads are reduced.
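
The slides do not give the patch's byte layout, so the following is only a sketch of the general idea. It uses a common base-128 variable-length scheme (7 payload bits per byte, high bit as a continuation flag, least significant group first). The class name and the choice of byte order are assumptions.

```java
import java.io.ByteArrayOutputStream;

// Sketch of a base-128 variable-length integer encoding (assumed layout).
final class VlqSketch {
    /** Encodes a non-negative value using 7 bits per byte, low-order group first. */
    static byte[] encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("non-negative values only");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (value >= 0x80) {
            out.write((int) (value & 0x7F) | 0x80); // continuation bit set
            value >>>= 7;
        }
        out.write((int) value); // last byte, continuation bit clear
        return out.toByteArray();
    }

    /** Decodes a value produced by encode(). */
    static long decode(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            shift += 7;
            if ((b & 0x80) == 0) {
                break;
            }
        }
        return result;
    }
}
```

With this scheme a value length such as 10 takes 1 byte instead of a fixed 4, which is where the saving on small columns comes from.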
  • 15. System Architecture [Diagram: source DBs, batch data feeder, Cassandra 1, Cassandra 2, services, backup]
  • 17. Planning: Schema Design • Data modeling is key to scalability. • Design schema – Query patterns for super column and normal column. • Think about queries based on use cases. – Batch operations to reduce the number of requests, because Thrift has communication overhead. • Secondary Index – We used it to find updated data. • Choose the partitioner appropriately. – One partitioner per cluster.
  • 18. Secondary Index • Pros – Useful to query based on a column value. – It can reduce consistency problems. – For example, to query updated data based on update-time. • Cons – Performance of a complex query depends on the data. E.g., Year == 2011 and Price < 100
  • 19. A Bit of Detail on Secondary Index • Works like a hash + filters. 1. Pick up a row which has a key for the index (hash). 2. Apply filters. – Collect the result if all filters are matched. 3. Repeat until the requested number of rows is obtained. E.g., Year == 2011 and Price < 100: Key1 (Year = 2011), Key2 (Year = 2011, Price = 1,000), Key3 (Year = 2011, Price = 10), Key4 (Year = 2011, Price = 10,000), Key5 (Year = 2011, Price = 200). Many keys of year = 2011, but few results.
  • 20. A Bit of Detail on Secondary Index (2) • Consider the frequency of results for the query – Very few results in a large data set → the query might time out. • Careful data/query design is necessary at this moment. • An improvement is under discussion: CASSANDRA-2915
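
The "hash + filters" behavior can be sketched roughly as follows. This is a simplified in-memory model, not Cassandra's implementation: the Row and ScanResult types are hypothetical, and the candidate list stands in for the rows found through the index on Year. Key1 is left out because the slide shows no price for it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of an index lookup followed by filtering (hypothetical types).
final class IndexScanSketch {
    /** A row reachable through the index on Year; price is the non-indexed column. */
    static final class Row {
        final String key;
        final int year;
        final int price;

        Row(String key, int year, int price) {
            this.key = key;
            this.year = year;
            this.price = price;
        }
    }

    /** Matching keys plus the number of candidate rows examined to find them. */
    static final class ScanResult {
        final List<String> keys = new ArrayList<>();
        int rowsExamined = 0;
    }

    /** Walks the index candidates, applies the price filter, stops at the limit. */
    static ScanResult query(List<Row> candidatesForYear, int maxPrice, int limit) {
        ScanResult result = new ScanResult();
        for (Row row : candidatesForYear) {
            if (result.keys.size() >= limit) {
                break;
            }
            result.rowsExamined++;
            if (row.price < maxPrice) {
                result.keys.add(row.key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> year2011 = Arrays.asList(
                new Row("Key2", 2011, 1000), new Row("Key3", 2011, 10),
                new Row("Key4", 2011, 10000), new Row("Key5", 2011, 200));
        ScanResult r = query(year2011, 100, 2);
        System.out.println(r.keys + " after examining " + r.rowsExamined + " rows");
    }
}
```

When matches are rare, the loop has to walk most of the index bucket before it can satisfy the requested row count, which is the situation where the slide warns about timeouts.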
  • 21. Planning: Data Size Estimation • Estimate future data volume • Serialization overhead: x 3 - 4 – Big overhead for small data. – We improved this with custom patches and compression code. – Cassandra 1.0 can use Snappy/Deflate compression. • Replication: x 3 (depends on your decision) • Compaction: x 2 or above
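
One way to read these factors is as multipliers applied to the raw data volume. The sketch below assumes they simply multiply, which is an interpretation of the slide rather than a stated formula. The class and parameter names are hypothetical.

```java
// Back-of-the-envelope disk estimate, assuming the slide's factors multiply.
final class DiskEstimate {
    /**
     * @param rawBytes            size of the raw input data
     * @param serializationFactor on-disk overhead (the slide quotes x3 to x4)
     * @param replicationFactor   number of replicas (x3 in the slide)
     * @param compactionHeadroom  free-space multiplier for compaction (x2 or above)
     */
    static double requiredBytes(double rawBytes, double serializationFactor,
                                int replicationFactor, double compactionHeadroom) {
        return rawBytes * serializationFactor * replicationFactor * compactionHeadroom;
    }

    public static void main(String[] args) {
        double oneTerabyte = 1e12;
        double needed = requiredBytes(oneTerabyte, 3.0, 3, 2.0);
        System.out.println(needed / 1e12 + " TB across the cluster");
    }
}
```

Under these assumptions, 1 TB of raw data needs about 18 TB of cluster disk before snapshots and backups are counted. Compression would lower the serialization factor.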
  • 22. Other Factors for Data Size • Obsolete SSTables – Disk usage may stay high after compaction. – Cassandra 0.8.x relies on GC to remove obsolete SSTables. – Improved in 1.0. • How to balance data distribution – Disk usage can be unbalanced (ByteOrderedPartitioner). – Partitioning, key design, initial token assignment. – Very helpful if you know the data in advance. • Backup scheme affects disk space – Backup space is needed. – Discussed later.
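
For initial token assignment, the slide gives no formula. The sketch below shows the commonly used even spacing for RandomPartitioner, whose token range is 0 to 2^127. This is an assumption about the setup: with ByteOrderedPartitioner the tokens are key bytes, so balanced tokens have to be chosen from knowledge of the key distribution instead. The class name is hypothetical.

```java
import java.math.BigInteger;

// Evenly spaced initial tokens, assuming RandomPartitioner (range 0 .. 2^127).
final class InitialTokens {
    /** Returns token i = i * 2^127 / nodeCount for each node. */
    static BigInteger[] evenlySpaced(int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        BigInteger range = BigInteger.ONE.shiftLeft(127);
        BigInteger[] tokens = new BigInteger[nodeCount];
        for (int i = 0; i < nodeCount; i++) {
            tokens[i] = range.multiply(BigInteger.valueOf(i))
                             .divide(BigInteger.valueOf(nodeCount));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (BigInteger token : evenlySpaced(4)) {
            System.out.println(token);
        }
    }
}
```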
  • 23. Configuration • We adopted Cassandra 0.8.x + custom patches. • Without mmap – No noticeable difference in performance – Easier to monitor and debug memory usage and GC-related issues • ulimit – Avoid file descriptor shortage. Need more than the number of db files. Bug?? – “memlock unlimited” for JNA – Create /etc/security/limits.d/cassandra.conf (Red Hat)
  • 24. JVM / GC • Full GC must be avoided at all times. • The JVM cannot utilize a large heap over 15G. – Slow GC. Can be unstable. – Don’t put too much data/cache into the heap. – Off-heap cache is available in 0.8.1 • Cassandra may use more memory than the heap size. – ulimit -d 25000000 (max data segment size) – ulimit -v 75000000 (max virtual memory size) • Benchmarking is needed to find appropriate parameters.
  • 25. Parameter Tuning for Failure Detector • Cassandra uses the Phi Accrual Failure Detector – The Φ Accrual Failure Detector [SRDS'04] • Failure detection errors occur when a node is handling too much access and/or GC is running • The setting depends on the number of nodes: – Larger cluster, larger value. Code shown on the slide:
      double phi(long tnow)
      {
          int size = arrivalIntervals_.size();
          double log = 0d;
          if ( size > 0 )
          {
              double t = tnow - tLast_;
              double probability = p(t);
              log = (-1) * Math.log10( probability );
          }
          return log;
      }

      double p(double t)
      {
          double mean = mean();
          double exponent = (-1)*(t)/mean;
          return Math.pow(Math.E, exponent);
      }
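
To make the tuning trade-off concrete, here is a self-contained sketch of the same calculation. Since p(t) = e^(-t/mean), phi reduces to t / (mean · ln 10). The class name, the sliding-window size, and the isSuspect helper are assumptions for illustration. In Cassandra the threshold is the phi_convict_threshold setting.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal phi accrual sketch with an exponential inter-arrival model.
final class PhiSketch {
    private final Deque<Long> intervals = new ArrayDeque<>();
    private final int windowSize;
    private long lastArrival = -1;

    PhiSketch(int windowSize) {
        this.windowSize = windowSize;
    }

    /** Records a heartbeat and the interval since the previous one. */
    void heartbeat(long now) {
        if (lastArrival >= 0) {
            if (intervals.size() == windowSize) {
                intervals.removeFirst();
            }
            intervals.addLast(now - lastArrival);
        }
        lastArrival = now;
    }

    /** phi = -log10(e^(-t/mean)) = t / (mean * ln 10). */
    double phi(long now) {
        double mean = intervals.stream().mapToLong(Long::longValue).average().orElse(0.0);
        if (mean <= 0.0) {
            return 0.0; // not enough history to judge
        }
        double t = now - lastArrival;
        return t / (mean * Math.log(10.0));
    }

    /** A node is suspected once phi exceeds the configured threshold. */
    boolean isSuspect(long now, double threshold) {
        return phi(now) > threshold;
    }
}
```

With heartbeats every 1,000 ms, a threshold of 8 tolerates a pause of roughly 18 seconds before the node is suspected. A long GC pause or a load spike that delays heartbeats beyond that causes a false detection, and raising the threshold makes detection more tolerant but slower.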
  • 26. Hardware • Benchmarking is important for deciding on hardware. – Requirements for performance, data size, etc. – Cassandra is good at utilizing CPU cores. • Network ports will be the bottleneck when scaling out… – Large number of low-spec servers, or – Small number of high-spec servers. • Our case: – High-spec CPU and SSD drives – 2 clusters (active and test cluster)
  • 28. Customize Hector Library • A query can time out on Cassandra: – When Cassandra is temporarily under high load. – When a large result set is requested. – When a secondary index query times out. • Hector retries forever when a query times out. • The client cannot detect the infinite loop. • Customization: – Return an exception to the client after 3 timeouts.
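
The customization can be sketched as a bounded retry loop. This is not Hector's API: the query is modeled as a plain Callable, and the class and method names are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

// Generic bounded retry: give up and surface the error after N timeouts.
final class BoundedRetry {
    /** Runs the query, retrying on timeout until maxTimeouts have occurred. */
    static <T> T callWithTimeoutLimit(Callable<T> query, int maxTimeouts) throws Exception {
        int timeouts = 0;
        while (true) {
            try {
                return query.call();
            } catch (TimeoutException e) {
                timeouts++;
                if (timeouts >= maxTimeouts) {
                    throw e; // let the client see the failure instead of looping
                }
            }
        }
    }
}
```

Calling callWithTimeoutLimit(query, 3) matches the slide's limit of 3 timeouts. The caller can then decide whether to back off, shrink the request, or report the error.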
  • 30. Testing: Data Consistency Check Tool • We wanted to make sure data is not corrupted within Cassandra. • Made a tool to check data consistency. [Diagram: input data (inserts, updates, deletes) comes in periodically; Process A inserts, updates, and deletes data in the Cassandra database; another Process B compares the data with that in Cassandra]
  • 31. Testing: Data Consistency Check Tool (2) • Compares only the keys of the data, not the contents. • Useful to diagnose which part is wrong in the test phase. • We found another team’s bug as well.
  • 32. Repair • Some types of queries don’t trigger read repair. • Nodetool repair is tricky on big data. – Disk usage – Time consuming → Read all data afterward: Read repair • Discussion of improvements is ongoing: – CASSANDRA-2699
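
A key-only comparison of this kind can be sketched with two set differences. The slides do not describe the tool's internals, so the class and method names are hypothetical, and the key sets are assumed to fit in memory.

```java
import java.util.Set;
import java.util.TreeSet;

// Key-level consistency check between expected input and what the store holds.
final class KeyConsistencyCheck {
    /** Keys that should exist (inserted and not deleted) but are absent from the store. */
    static Set<String> missingFromStore(Set<String> expectedKeys, Set<String> storedKeys) {
        Set<String> missing = new TreeSet<>(expectedKeys);
        missing.removeAll(storedKeys);
        return missing;
    }

    /** Keys present in the store that the input says should not exist (e.g. deleted). */
    static Set<String> unexpectedInStore(Set<String> expectedKeys, Set<String> storedKeys) {
        Set<String> unexpected = new TreeSet<>(storedKeys);
        unexpected.removeAll(expectedKeys);
        return unexpected;
    }
}
```

Comparing keys alone misses corrupted values, but it is cheap and already shows whether inserts or deletes were lost.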
  • 34. Backup Scheme • Backup might be required to shorten recovery time. 1. Snapshot to local disk – Plan disk size at the server estimation phase. 2. Full backup of input data – We had a full data feed several times for various reasons: e.g., logic change, schema change, data corruption, etc. [Diagram: incoming data from DBs, Cassandra with local snapshots, backup]
  • 36. Conclusion • Rakuten uses Cassandra with big data. • We’ll continue contributing to OSS.
  • 38. We are hiring! We are actively recruiting mid-career hires! Rakuten's Mission: Empower people and society (through the internet), and transform and enrich society through our own success. Rakuten's Goal: To become No.1 Internet Service Company in the World. If you share Rakuten's Mission & Goal, please get in touch! tech-career@mail.rakuten.com