Data Modeling on NoSQL

More and more applications are leveraging the power of NoSQL as a primary means of data storage. This session, as presented at Teradata Partners Conference 2015, by Bryce Cottam, Principal Architect at Think Big, a Teradata company, covered how to successfully model application data on NoSQL storage engines for everyday application use. The presentation explores common design patterns, techniques and tips that will help developers leverage the horizontal scalability of NoSQL stores while embracing their inherent limitations. Topics include: Denormalization, Intelligent Keys (including avoiding hot-spotting), Counters, and Data Sharding.

  1. Data Modeling on NoSQL
     Bryce Cottam, Principal Architect, Think Big, a Teradata Company
  2. Agenda
     • Where we came from (RDBMS Modeling)
     • Migrate Existing Data Model to NoSQL
     • Questions
  3. Anti-Agenda
     What we are NOT going to cover:
     • Migrate a SQL based solution to NoSQL
     • NoSQL Smack-Down (Battle of the NoSQL Bands)
  4. Where We Came From (RDBMS Modeling)
  5. SQL Backdrop
     Raw Data:
         123  Tony         Soprano     true   1963-04-15
         124  Carmella     Soprano     false  1968-12-02
         125  Johnny       Sacrimoni   true   1959-01-11
         158  Paulie       Gualtieri   false  1960-08-04
         159  Silvio       Dante       false  1965-10-11
         162  Ralph        Cifaretto   false  1969-03-28
         164  Christopher  Moltisanti  false  1974-01-11
         165  Adriana      La Cerva    false  1976-11-02
     Metadata:
     • Column Order
     • Column Names
     • Column Width
     • Data Types
     Keeping metadata separate from the raw data gives:
     • Save space
     • Consistent format
     • Familiar syntax (ANSI SQL Standard)
  6. Issues at Scale
  7. UI Presentation
  8. UI Presentation
  9. UI Presentation
  10. Where We Came From
      Auction: id, title, image_url, current_price, high_bidder, end_time
      User: id, email, name, profile_image_url, access_level, created_date
      Bid: id, user_id, auction_id, amount, timestamp
      Payment: id, auction_id, timestamp, card_type, confirmation_number
  11. Data Models
          public class User {
              private long id;
              private String email;
              private String name;
              private String profileImageUrl;
              // AccessLevel is an enum
              private AccessLevel accessLevel;
              private Date createdDate;
              private List<Auction> auctions;
              private List<Bid> bids;
              ...
          }

          public class Auction {
              private long id;
              private String title;
              private String imageUrl;
              private BigDecimal currentPrice;
              private User highBidder;
              private Date endTime;
              private List<Bid> bids;
              private Payment payment;
              ...
          }

          public class Bid {
              private long id;
              private User user;
              private Auction auction;
              private BigDecimal amount;
              private Date timestamp;
              ...
          }

          public class Payment {
              private long id;
              private Auction auction;
              private Date timestamp;
              // Visa, MasterCard, AmEx etc.
              private String cardType;
              private String confirmationNumber;
              ...
          }
  12. Support Queries
      Get all Bids for a given Auction:
          select a.*, b.*
          from auction a
          join bid b on a.id = b.auction_id
          where a.id = 12345
          order by b.timestamp desc
      • Either manual SQL or ORM generated SQL will wind up joining a few tables to get the desired results
      • Joins are not supported by most NoSQL solutions
  13. Support Queries
      Count all Bids for a User:
          select count(*) from bid where user_id = 554422
      Get average final price of all Auctions:
          select avg(current_price) from auction
      Get the User with the most Bids:
          select u.name, s.bid_count as bids
          from (select user_id, count(*) as bid_count
                from bid
                group by user_id) as s
          join user u on u.id = s.user_id
          order by s.bid_count desc
          limit 1
      • Aggregates in NoSQL are usually not supported
      • If they are supported, they often have performance or memory issues
  14. Adapt to your Data Store Model
      • Most web app developers think in terms of tables, columns, queries
      • Many times the schema is simply mirrored in the application layer model objects
      • (Not a bad thing, but hard to change)
      • The most successful/scalable applications embrace the features and limitations of their chosen datastore
      [Diagram: Schema, DAO, and Application layers, with the labels Storage Details, Access Pattern, and Model. Patterns defined here affect application behavior for data interaction.]
  15. Encouraging Scalable Access Patterns
      Common:
          public class BidDao {
              // Common API structure, loads all in memory
              // Also requires that the full User object is available
              public List<Bid> getBids(User user) {…}
              ...
          }
      Alternative:
          public class BidDao {
              // Paging is a good option to avoid memory issues
              public List<Bid> getBids(String userId, int offset, int limit) {…}
              // Streaming APIs encourage streaming processing
              public Iterator<Bid> getBids(String userId) {…}
              ...
          }
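The two alternative shapes on this slide can be sketched as follows. This is only an illustration: an in-memory list stands in for the datastore, bids are reduced to their ids, and the class name `BidPagingSketch` and its methods are hypothetical rather than part of the original deck.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical in-memory stand-in for a bid table; a real DAO would query the datastore.
class BidPagingSketch {
    private final List<String> bidIds = new ArrayList<>();

    void add(String bidId) {
        bidIds.add(bidId);
    }

    // Paging: at most `limit` records are materialized per call.
    List<String> getBids(int offset, int limit) {
        int from = Math.min(Math.max(offset, 0), bidIds.size());
        // long arithmetic avoids int overflow when limit is very large
        int to = (int) Math.min((long) from + Math.max(limit, 0), bidIds.size());
        return new ArrayList<>(bidIds.subList(from, to));
    }

    // Streaming: the caller pulls one record at a time and never holds the full result.
    Iterator<String> streamBids() {
        return bidIds.iterator();
    }
}
```

A caller that only needs to total or filter bids can consume `streamBids()` in a loop, while a UI list would typically call `getBids(offset, limit)` once per visible page.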
  16. Encouraging Scalable Access Patterns
      [Diagram of DAO memory use. Common: Memory Required. Paging: Memory Required … Garbage Collected … Memory Required. Streaming: Small buffer.]
  17. Adapt to your Data Store
      [Diagram: Application, SQL-NoSQL Adapter, DAO (×3), NoSQL Store]
      Danger!! If you mask your true datastore semantics, you risk your scalability
      • DataNucleus is a good option if used with discipline
      • Provides JDO/JPA support
  18. Top level concepts to embrace
      • Denormalization
      • Intelligent Key Design
      • Counters
      • Sharding
  19. Denormalization
  20. Identify Conceptually Immutable Fields
          public class User {
              private long id;
              private String email;
              private String name;
              private String profileImageUrl;
              // AccessLevel is an enum
              private AccessLevel accessLevel;
              private Date createdDate;
              private List<Auction> auctions;
              private List<Bid> bids;
              ...
          }

          public class Auction {
              private long id;
              private String title;
              private String imageUrl;
              private BigDecimal currentPrice;
              private User highBidder;
              private Date endTime;
              private List<Bid> bids;
              private Payment payment;
              ...
          }

          public class UserReference {
              private long id;
              private String name;
              private String profileImageUrl;
              ...
          }

          public class AuctionReference {
              private long id;
              private String title;
              private String imageUrl;
              ...
          }
  21. Modified Data Structures
          public class User {
              // Changed ids to Strings
              // (more on that soon)
              private String id;
              private String email;
              private String name;
              private String profileImageUrl;
              private AccessLevel accessLevel;
              private Date createdDate;
              private List<Auction> auctions;
              private List<Bid> bids;
              ...
          }

          public class Auction {
              private String id;
              private String title;
              private String imageUrl;
              private BigDecimal currentPrice;
              private UserReference highBidder;
              private Date endTime;
              private List<Bid> bids;
              private Payment payment;
              ...
          }

          public class Bid {
              private String id;
              private UserReference user;
              private AuctionReference auction;
              private BigDecimal amount;
              private Date timestamp;
              ...
          }

          public class Payment {
              private String id;
              private AuctionReference auction;
              private Date timestamp;
              // Visa, MasterCard, AmEx etc.
              private String cardType;
              private String confirmationNumber;
              ...
          }
  22. Modified Data Models
          public class Bid {
              // the @Embedded annotation (both JDO and JPA)
              // indicates that this is not an FK relationship:
              @Embedded
              private UserReference user;
              @Embedded
              private AuctionReference auction;
              ...
          }
      Under the hood in the data store:
          Bid: id, user_id, user_name, user_profile_image, amount, timestamp, auction_title, …
          …/d288-4af3-8821-27a37269ec0c {amount:"14.00", user_id:"abc123", user_name:"Ralph Cifaretto", user_profile_image:"http://…", …}
          …/d288-4af3-8821-27a37283af10 {amount:"240.00", user_id:"abc123", user_name:"Ralph Cifaretto", user_profile_image:"http://…", …}
      • JDO/JPA configuration is certainly not required
      • We’re making a copy of the conceptually immutable properties of the user
      • When we read a Bid record now, we don’t need to go fetch the User record
      • Nor do we need a join
  23. Manual Marshaling
          public class BidDao {
              public Bid read(String id) {
                  // This is an HBase-like API, but the idea is the same for most
                  // NoSQL datastore native APIs:
                  Result result = openConnection().get("bid", id);
                  Bid bid = new Bid();
                  bid.setId(result.getValue("id"));
                  ...
                  // Rebuild the embedded reference from the denormalized columns
                  String userId = result.getValue("user_id");
                  String userName = result.getValue("user_name");
                  String profileUrl = result.getValue("user_profile_image");
                  UserReference user = new UserReference(userId, userName, profileUrl);
                  bid.setUser(user);
                  ...
                  return bid;
              }
              ...
          }

          // To access user information:
          UserReference user = bid.getUser();
          String userName = user.getName();
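The read path above has a matching write path: when a Bid is stored, the conceptually immutable user fields are copied into the Bid row. The sketch below assumes a row can be represented as a flat map of column names to strings. The class `DenormalizedBidSketch` and its method are hypothetical; only the column names come from the slides.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical write-side counterpart to BidDao.read: flattens a bid and the
// copied user fields into one row, so no join or second lookup is needed on read.
class DenormalizedBidSketch {
    static Map<String, String> toColumns(String amount,
                                         String userId,
                                         String userName,
                                         String userProfileImage) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("amount", amount);
        // Copies of the user's conceptually immutable properties:
        columns.put("user_id", userId);
        columns.put("user_name", userName);
        columns.put("user_profile_image", userProfileImage);
        return columns;
    }
}
```

The resulting map would then be handed to whatever put or write call the chosen datastore provides.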
  24. We support access pattern without joins
      Bid: id, user_id, user_name, user_profile_image, amount, timestamp, auction_id, auction_title, auction_image_url
      [UI mockup: each bid row shows auction_title and auction_image]
      Click on Auction image or name and go to details for Auction
  25. Data is duplicated many (many) times
      Bid:
          id   amount  user_id  user_name            user_profile_image    auction_id  auction_title       . . .
          124  14.00   5432     Gustavo ‘Gus’ Fring  http://nj.boss.com…   555111222   Barrel Methylamine  . . .
          125  13.00   1234     Walter White         http://dead.users…    555111222   Barrel Methylamine  . . .
          126  12.00   2223     Hank Schrader        http://dea.bro.com…   555111222   Barrel Methylamine  . . .
          127  11.00   1234     Walter White         http://dead.users…    555111222   Barrel Methylamine  . . .
          128  10.00   1112     Jesse Pinkman        http://facebook.com…  555111222   Barrel Methylamine  . . .
          129  9.00    2223     Hank Schrader        http://dea.bro.com…   555111222   Barrel Methylamine  . . .
          130  8.00    1234     Walter White         http://dead.users…    555111222   Barrel Methylamine  . . .
          131  7.00    1112     Jesse Pinkman        http://facebook.com…  555111222   Barrel Methylamine  . . .
          132  6.00    1234     Walter White         http://dead.users…    555111222   Barrel Methylamine  . . .
      User:
          id    name                 profile_image         email                   created_date  . . .
          5432  Gustavo ‘Gus’ Fring  http://nj.boss.com…   tony@breakingbad.com    2008-01-01    . . .
          1234  Walter White         http://chem.users…    walter@breakingbad.com  2008-02-02    . . .
          2223  Hank Schrader        http://dea.bro.com…   hank@breakingbad.com    2009-01-12    . . .
          1112  Jesse Pinkman        http://facebook.com…  jessie@breakingbad.com  2008-11-16    . . .
  26. What about updates?
      [Timeline diagram]
      • Name Change Request arrives at an Edge Node, which talks to NoSQL
      • Response sent to user
      • Async Request to change all Bid records related to this user goes to Backend Node(s)
      • Use workers to modify affected records (possibly minutes)
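One way to picture this flow is sketched below, assuming in-memory maps stand in for the User and Bid tables and a small thread pool stands in for the backend workers. All names (`NameChangeSketch`, `changeName`) are hypothetical. The user record is updated right away, while the denormalized copies in Bid records are rewritten asynchronously.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class NameChangeSketch {
    // userId -> name; stand-in for the authoritative User table
    final Map<String, String> userNames = new ConcurrentHashMap<>();
    // bidId -> {userId, userName}; stand-in for the denormalized Bid table
    final Map<String, String[]> bids = new ConcurrentHashMap<>();
    // Stand-in for the backend worker nodes
    private final ExecutorService workers = Executors.newFixedThreadPool(2);

    // Updates the user record immediately and returns a future that completes
    // once every affected Bid copy has been rewritten.
    CompletableFuture<Void> changeName(String userId, String newName) {
        userNames.put(userId, newName);
        return CompletableFuture.runAsync(
            () -> bids.replaceAll((bidId, rec) ->
                rec[0].equals(userId) ? new String[] {userId, newName} : rec),
            workers);
    }

    void shutdown() {
        workers.shutdown();
    }
}
```

The caller can respond to the user as soon as `changeName` returns. Until the future completes, readers may still see the old name on some bids, which is the change latency the slides refer to.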
  27. Denormalization Observations
      • We don’t always need ACID compliance
      • Strict FK enforcement not always required
      • MySQL’s MyISAM storage works fine for many situations
      • Users are getting used to change latency
      • There is a trade-off between horizontal scalability in your app and patterns we’ve been trained to rely on
  28. Intelligent Key Design
  29. Sample NoSQL Storage Layout
      Server 1: key001 through key010
      Server 2: key011 through key020
      Server 3: key021 through key030
      …
      Server n: key091 through key100
      (each key maps to its record: keyNNN ...data...)
      • This scan is “get everything from key016 through key022”
      • A key-range scan returns N rows in linear time O(N) regardless of the number of rows in the table
      • This is not true for relational databases
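The range scan on this slide can be imitated on a single machine with a sorted map. The sketch below assumes `java.util.TreeMap` as a stand-in for the sorted key space. The class `RangeScanSketch` is hypothetical and ignores the distribution of key ranges across servers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical single-node stand-in for a store that keeps rows sorted by key.
class RangeScanSketch {
    private final TreeMap<String, String> rows = new TreeMap<>();

    void put(String key, String value) {
        rows.put(key, value);
    }

    // Returns every key in [startKey, stopKey], in lexical order.
    List<String> scan(String startKey, String stopKey) {
        return new ArrayList<>(rows.subMap(startKey, true, stopKey, true).keySet());
    }
}
```

In this stand-in, locating the start key costs O(log n) in the size of the map and reading the N matching rows costs O(N). The slide's point is that the read itself touches only the rows in the range.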
  30. Intelligent Key Design
          abc123 {…}
          abc124 {name:"Tony Soprano", createdDate:"2011-01-12", email:"tony@sopranos.com", role:"BOSS"}
          abc125 {name:"Salvator Bonpensiero", createdDate:"2014-10-02", email:"bonpensiero@sopranos.com", role:"CAPO"}
          abc126 {name:"Christopher Moltisanti", createdDate:"2012-10-02", email:"christopher@sopranos.com", role:"SOLDIER"}
          abc2 {name:"Carmella Soprano", createdDate:"2011-10-02", email:"carmella@sopranos.com", favoriteCar:"BMW"}
          abc20 {name:"Meadow Soprano", createdDate:"2012-01-02", email:"meadow@sopranos.com", favoriteCar:12.25}
          abc21 {someField:"some value", averageScore:5.75, someOtherDate:"2011-10-02"}
          abc22 {…}
          bcd1 {…}
          bcd12 {…}
      • Key ordering is lexical
      • Records can be different schemas
  31. Ascending Timestamp
          Bid/2014-10-26T09:00:00.000 {…}    (older)
          Bid/2014-10-26T09:00:12.975 {…}
          Bid/2014-10-26T09:00:14.221 {…}
          Bid/2014-10-26T09:00:18.005 {…}
          Bid/2014-10-26T09:00:35.572 {…}
          Bid/2014-10-26T09:00:40.003 {…}
          Bid/2014-10-26T09:00:41.123 {…}
          Bid/2014-10-26T09:00:41.124 {…}
          Bid/2014-10-26T09:00:41.150 {…}
          Bid/2014-10-26T09:00:41.218 {…}    (newer)
      yyyy-MM-ddTHH:mm:ss.SSS is a pretty standard timestamp and lexically orders chronologically
      • Great for time-series data
      • Timeline tracking (viewing data in the order it was processed etc.)
  32. UI Presentation: Descending Order
  33. UI Presentation: Descending Order
  34. Descending Timestamp
          Bid/9223370622642200431 {…}    (newer)
          Bid/9223370622642200478 {…}
          Bid/9223370622642200512 {…}
          Bid/9223370622642203021 {…}
          Bid/9223370622642203897 {…}
          Bid/9223370622642204112 {…}
          Bid/9223370622642204559 {…}
          Bid/9223370622642207054 {…}
          Bid/9223370622642215431 {…}
          Bid/9223370622642235500 {…}    (older)

          public class User {
              // This will yield some ridiculous value like: 9223370622642200431
              // Number of milliseconds in a (365-day) year: 31536000000
              // This computation will not reach 0 for roughly 292 million years
              long descendingTimestamp = Long.MAX_VALUE - System.currentTimeMillis();
          }
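A minimal sketch of building such a key is shown below. The helper `DescendingKeySketch.bidKey` is hypothetical. The zero-padding to 19 digits is an added assumption that keeps lexical order equal to numeric order even if the subtracted value ever had fewer digits.

```java
import java.util.Locale;

class DescendingKeySketch {
    // Builds a key that sorts newest-first under lexical ordering.
    static String bidKey(long epochMillis) {
        long descending = Long.MAX_VALUE - epochMillis;
        // Long.MAX_VALUE has 19 digits, so pad to a fixed width of 19.
        return String.format(Locale.ROOT, "Bid/%019d", descending);
    }
}
```

For example, `bidKey(1414212575376L)` yields `Bid/9223370622642200431`, the first key listed on the slide, and any later timestamp produces a key that sorts before it.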
  35. Descending Timestamp
          Bid/9223370622642200431 {… auction_id:"12345" …}    (1)
          Bid/9223370622642200478 {… auction_id:"54321" …}    (2)
          Bid/9223370622642200512 {… auction_id:"12345" …}    (3)
          Bid/9223370622642203021 {… auction_id:"22222" …}    (4)
          Bid/9223370622642203897 {… auction_id:"22233" …}    (5)
          Bid/9223370622642204112 {… auction_id:"12345" …}
          Bid/9223370622642204559 {… auction_id:"22233" …}
          Bid/9223370622642207054 {… auction_id:"54321" …}
          Bid/9223370622642215431 {… auction_id:"54321" …}
          Bid/9223370622642235500 {… auction_id:"12345" …}
      Start with "Bid/", stop after 5 rows: the 5 most recent bids
      • Known as a “range scan”
      • Very easy to start with some prefix and read for N records
      • Complexity stays constant for top 5 bids no matter how many bids are in the system
  36. Descending Timestamp
      Key = "Auction/12345" + "/" + "Bid/9223370622642200431"
          Auction/11222/Bid/9223370622642203021 {… auction_id:"11222" …}
          Auction/12233/Bid/9223370622642203897 {… auction_id:"12233" …}
          Auction/12233/Bid/9223370622642204559 {… auction_id:"12233" …}
          Auction/12345/Bid/9223370622642200431 {… auction_id:"12345" …}    (1)
          Auction/12345/Bid/9223370622642200512 {… auction_id:"12345" …}    (2)
          Auction/12345/Bid/9223370622642204112 {… auction_id:"12345" …}    (3)
          Auction/12345/Bid/9223370622642235500 {… auction_id:"12345" …}    (4)
          Auction/54321/Bid/9223370622642200478 {… auction_id:"54321" …}
          Auction/54321/Bid/9223370622642207054 {… auction_id:"54321" …}
          Auction/54321/Bid/9223370622642215431 {… auction_id:"54321" …}
      Start with "Auction/12345", stop after 4 rows (or until row "Auction/12346"): the 4 most recent bids
      • Now, all Bids for each Auction are located right next to each other
      • This matches our most used access pattern
      • We now have information about related data just from the key
      • Key-only queries can be used to help speed up apps
      • Why 4 Bids instead of 5? My example only had 4 records
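The prefix scan described here can be sketched with a sorted map, again assuming `java.util.TreeMap` as a stand-in for the datastore. The class `PrefixScanSketch` and its `scanPrefix` method are hypothetical names, not a real datastore API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class PrefixScanSketch {
    private final TreeMap<String, String> rows = new TreeMap<>();

    void put(String key, String value) {
        rows.put(key, value);
    }

    // Starts at the first key >= prefix and reads forward until the limit is
    // reached or a key no longer carries the prefix.
    List<String> scanPrefix(String prefix, int limit) {
        List<String> keys = new ArrayList<>();
        for (String key : rows.tailMap(prefix, true).keySet()) {
            if (keys.size() >= limit || !key.startsWith(prefix)) {
                break;
            }
            keys.add(key);
        }
        return keys;
    }
}
```

Passing the prefix with a trailing slash, as in `scanPrefix("Auction/12345/", 4)`, keeps the scan from also matching a longer id such as `Auction/123456/`.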
  37. Linking Related Data With Intelligent Keys
      http://myapp.com/api/auctions/12345
      Auction table:
          11222 {…}
          12233 {…}
          12345 {…}
          54321 {…}
      Bid table:
          Auction/11222/... {…}
          Auction/12233/... {…}
          Auction/12233/... {…}
          Auction/12345/... {…}
          Auction/12345/... {…}
          Auction/12345/... {…}
          Auction/12345/... {…}
          Auction/54321/... {…}
          Auction/54321/... {…}
          Auction/54321/... {…}
      Reads:
          datastore.get("12345");
          datastore.rangeScan("Auction/12345/", 5);
      Both reads can be done in parallel
  38. Linking Related Data With Intelligent Keys
      http://myapp.com/api/auctions/12345
      AuctionData table:
          Auction/11222/Bid/987321... {…}
          Auction/12233/Bid/987534... {…}
          Auction/12233/Bid/987635... {…}
          Auction/12345 {…, ..., ...}
          Auction/12345/Bid/977534... {…}
          Auction/12345/Bid/987501... {…}
          Auction/12345/Bid/987687... {…}
          Auction/12345/Bid/988012... {…}
          Auction/54321 {…, ..., ...}
          Auction/54321/... {…}
          Auction/54321/... {…}
      Read:
          datastore.rangeScan("Auction/12345", 6);
      Data of completely different schemas / types can be written to the same table, co-located on disk
  39. Counters
  40. Counters
          public void placeBid(String userId, String auctionId) {
              // Many NoSQL stores support a native counter via some increment-and-get
              // After the counter has been incremented, we don't need to worry about contention
              long bidCount = datastore.incrementAndGet(auctionId + "_counter");
              // BID_INCREMENT is assumed to be a BigDecimal constant
              BigDecimal amount = BID_INCREMENT.multiply(BigDecimal.valueOf(bidCount));
              long descendingTimestamp = Long.MAX_VALUE - System.currentTimeMillis();
              String bidId = "Auction/" + auctionId + "/Bid/" + descendingTimestamp + "/" + amount;
              // Increment some helper counters...
              datastore.incrementAndGet("global_bidCounter");
              datastore.incrementAndGet(auctionId + "_bidCounter");
              datastore.incrementAndGet(userId + "_bidCounter");
              // ... other logic like creating the Bid object ...
              bidDao.write(bidId, bid);
          }

          // Some datastores may have a first-class Counter object:
          Counter bidCounter = datastore.getCounter(auctionId + "_counter");
          long bidCount = bidCounter.incrementAndGet();
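The idea of deriving the bid amount from an atomic counter can be tried locally with the sketch below. It assumes `AtomicLong` values in a `ConcurrentHashMap` as a stand-in for the datastore's native counters. The class name and the `BID_INCREMENT` value of 0.25 are illustrative assumptions.

```java
import java.math.BigDecimal;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class CounterBidSketch {
    // Hypothetical fixed increment per bid
    static final BigDecimal BID_INCREMENT = new BigDecimal("0.25");

    private final ConcurrentHashMap<String, AtomicLong> counters = new ConcurrentHashMap<>();

    // Atomic increment-and-get on a named counter, created on first use.
    long incrementAndGet(String name) {
        return counters.computeIfAbsent(name, k -> new AtomicLong()).incrementAndGet();
    }

    // Each caller receives a distinct count, so two concurrent bids on the
    // same auction can never compute the same amount.
    BigDecimal nextBidAmount(String auctionId) {
        long bidCount = incrementAndGet(auctionId + "_counter");
        return BID_INCREMENT.multiply(BigDecimal.valueOf(bidCount));
    }
}
```

Because the counter hands out each value exactly once, the rest of the bid write needs no further coordination between bidders.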
  41. UI Presentation
          datastore.incrementAndGet(userId + "_bidCounter");
  42. UI Presentation
          datastore.incrementAndGet("global_bidCounter");
      • Global counters are a major bottleneck
  43. Sharding
  44. Data Model Sharding
          public class Auction {
              private String id;
              private String title;
              private String imageUrl;
              private String description;
              private BigDecimal currentPrice;
              private User highBidder;
              private Date endTime;
              ...
          }

          public class AuctionState {
              private String id;
              private BigDecimal currentPrice;
              private User highBidder;
              private Date endTime;
              ...
          }
      • Separate frequently changing data from static data
      • Allows caching of static data
      • Makes reads/writes of changing data faster
      • Separate values that are expensive to serialize but infrequently read
  45. More Parallel Reads
      http://myapp.com/api/auctions/12345
      Auction table: 11222 {…}, 12233 {…}, 12345 {…}, 54321 {…}
      AuctionState table: 11222 {…}, 12233 {…}, 12345 {…}, 54321 {…}
      Reads:
          datastore.get("12345");
          datastore.get("12345");
      [Diagram labels: Memcache, Check Cache]
      • Both records can share the same key
      • Both reads can be done in parallel
  46. More Parallel Reads
      http://myapp.com/api/auctions/12345
      AuctionData table:
          Auction/11222/Bid/987321... {…}
          Auction/12233/Bid/987534... {…}
          Auction/12233/Bid/987635... {…}
          Auction/12345 {…, ..., ...}
          Auction/12345/AuctionState {…}
          Auction/12345/Bid/977534... {…}
          Auction/12345/Bid/987501... {…}
          Auction/54321 {…, ..., ...}
          Auction/54321/... {…}
      Reads:
          datastore.get("Auction/12345/AuctionState");
          datastore.get("Auction/12345");
      [Diagram labels: Memcache, Check Cache]
      Again, records can be in the same table
  47. Sharding a 64 bit Integer
          long count = datastore.incrementAndGet("global_bidCounter");
      global_bidCounter = 176, decomposed into three shards:
          52 + 84 + 40 = 176
      Incrementing any one shard gives the same new total:
          52 + 84 + 41 = 177
          53 + 84 + 40 = 177
          52 + 85 + 40 = 177
      • Decompose the counter
      • Pick any part of the count and increment it
  48. Implementing a Sharded Counter
          public class ShardedCounter {
              private String name;
              private int shards;

              public void increment() {
                  // Pick a random shard so concurrent writers rarely hit the same record
                  // (ThreadLocalRandom is in java.util.concurrent)
                  int index = ThreadLocalRandom.current().nextInt(shards);
                  datastore.incrementAndGet(name + "-" + index);
              }

              public long get() {
                  long count = 0;
                  // All the shards of the counter are located next to each other:
                  Result scan = datastore.rangeScan(name + "-", shards);
                  while (scan.hasNext()) {
                      Counter next = scan.next();
                      count += next.get();
                  }
                  return count;
              }
          }
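A self-contained version of the same idea, runnable without a datastore, is sketched below. It assumes an `AtomicLongArray` as a stand-in for the adjacent counter records. The class name `ShardedCounterSketch` is hypothetical.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLongArray;

class ShardedCounterSketch {
    // One slot per shard; stands in for the name-0, name-1, ... counter records
    private final AtomicLongArray shards;

    ShardedCounterSketch(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shards = new AtomicLongArray(shardCount);
    }

    // Increments one randomly chosen shard, spreading write contention.
    void increment() {
        int index = ThreadLocalRandom.current().nextInt(shards.length());
        shards.incrementAndGet(index);
    }

    // Sums all shards. The result may lag slightly behind concurrent increments.
    long get() {
        long total = 0;
        for (int i = 0; i < shards.length(); i++) {
            total += shards.get(i);
        }
        return total;
    }
}
```

The trade-off is the one the slides describe: writes get cheaper because they rarely collide, while reads get more expensive because every shard must be visited.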
  49. We Love Feedback
      Questions/Comments
      Email: bryce.cottam@thinkbiganalytics.com
      Rate This Session with the PARTNERS Mobile App
      Remember To Share Your Virtual Passes
      Follow Teradata 2015 PARTNERS: www.teradata-partners.com/social