4. [Confidential] Common themes in next-gen DB architectures
- NoSQL, Data Grids, Data Fabrics, NewSQL
- “Shared nothing” commodity clusters
- Focus shifts to memory, distributing data and clustering
- Scale by partitioning the data and moving behavior to the data nodes
- HA within the cluster and across data centers
- Add capacity to scale dynamically
12. Flexible Deployment Topologies
A Java application cluster can host an embedded clustered database just by changing the JDBC URL:
    jdbc:sqlfire:;mcast-port=33666;host-data=true
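As a rough illustration of the URL-driven topology switch, the sketch below assembles the embedded (peer) URL from the slide and opens a connection through plain JDBC. The class and method names are hypothetical, and the reading that host-data=false would give a peer that joins the cluster without storing data is an assumption, not something the slide states. The connect method needs the SQLFire driver on the classpath and is not exercised here.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper, not part of any SQLFire API.
final class EmbeddedUrlSketch {
    /** Builds a URL of the form shown on the slide. */
    static String peerUrl(int mcastPort, boolean hostData) {
        return "jdbc:sqlfire:;mcast-port=" + mcastPort + ";host-data=" + hostData;
    }

    /** Opens a connection with standard JDBC; requires the driver at run time. */
    static Connection connect(String url) throws SQLException {
        return DriverManager.getConnection(url);
    }
}
```

Calling EmbeddedUrlSketch.peerUrl(33666, true) yields the exact URL from the slide, so the application code stays the same and only the URL decides the topology.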
24. Hash partitioning for linear scaling
Key hashing provides single-hop access to a key's partition.
But what if the access is not based on the key, say, when joins are involved?
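The single-hop property can be sketched as follows: any member can compute the owning server from the key alone, with no lookup service in between. This is a minimal sketch assuming a fixed bucket count and a simple modulo mapping of buckets to servers; the names are hypothetical and the actual SQLFire placement scheme may differ.

```java
import java.util.List;

// Hypothetical sketch of key -> bucket -> server routing.
final class HashPartitionSketch {
    /** Maps a key to a bucket; floorMod keeps the result non-negative. */
    static int bucketFor(Object key, int bucketCount) {
        return Math.floorMod(key.hashCode(), bucketCount);
    }

    /** Maps the key's bucket to one server, so a lookup needs a single hop. */
    static String serverFor(Object key, List<String> servers, int bucketCount) {
        int bucket = bucketFor(key, bucketCount);
        return servers.get(bucket % servers.size());
    }
}
```

A query that does not supply the partitioning key gives this function nothing to hash, which is why joins and non-key access need the partition-aware design discussed next.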
28. Partition-aware DB design
- Entity groups
- Table FlightAvailability is partitioned by FlightID and colocated with Flights
- FlightID is the entity group key
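To make the entity group idea concrete, here is a small in-memory simulation in which both tables are routed by the same FlightID hash, so a join between them can run on one node at a time without reading another node's data. All names and the modulo routing are hypothetical stand-ins, not SQLFire internals.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation: two "tables" partitioned by the same entity group key.
final class EntityGroupSketch {
    private final int nodeCount;
    private final List<Map<String, String>> flightsByNode = new ArrayList<>();
    private final List<Map<String, List<Integer>>> availabilityByNode = new ArrayList<>();

    EntityGroupSketch(int nodeCount) {
        this.nodeCount = nodeCount;
        for (int i = 0; i < nodeCount; i++) {
            flightsByNode.add(new HashMap<>());
            availabilityByNode.add(new HashMap<>());
        }
    }

    /** Both tables use this one function, which is what colocates them. */
    int nodeFor(String flightId) {
        return Math.floorMod(flightId.hashCode(), nodeCount);
    }

    void insertFlight(String flightId, String destination) {
        flightsByNode.get(nodeFor(flightId)).put(flightId, destination);
    }

    void insertAvailability(String flightId, int seatsAvailable) {
        availabilityByNode.get(nodeFor(flightId))
                .computeIfAbsent(flightId, k -> new ArrayList<>())
                .add(seatsAvailable);
    }

    /** Joins Flights with FlightAvailability using only the given node's data. */
    List<String> joinOnNode(int node) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, String> flight : flightsByNode.get(node).entrySet()) {
            List<Integer> seats = availabilityByNode.get(node)
                    .getOrDefault(flight.getKey(), Collections.emptyList());
            for (int s : seats) {
                rows.add(flight.getKey() + "," + flight.getValue() + "," + s);
            }
        }
        return rows;
    }
}
```

Because every FlightAvailability row lands on the node that holds its parent Flights row, the union of the per-node joins equals the full join.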
31. Procedures
Java stored procedures may be created according to the SQL standard.
SQLFabric also supports the JDBC type Types.JAVA_OBJECT. A parameter of type JAVA_OBJECT supports an arbitrary Serializable Java object.
In this case, the procedure is executed on the server to which the client is connected (or locally for peer clients):
    CREATE PROCEDURE getOverBookedFlights (IN argument OBJECT, OUT result OBJECT)
      LANGUAGE JAVA PARAMETER STYLE JAVA READS SQL DATA
      DYNAMIC RESULT SETS 1
      EXTERNAL NAME com.acme.OverBookedFLights;
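The Java side of such a procedure might look like the sketch below. It assumes the PARAMETER STYLE JAVA mapping as found in Apache Derby, where an OUT parameter becomes a single-element array and each dynamic result set becomes a trailing java.sql.ResultSet[] parameter; whether OBJECT maps to java.lang.Object exactly this way is an assumption. The Criteria type and the method body are hypothetical, and a real body would run a query and assign resultSets[0].

```java
import java.io.Serializable;
import java.sql.ResultSet;

// Hypothetical procedure class; it would need to be public in a real deployment.
final class OverBookedFlightsSketch {
    /** Hypothetical Serializable argument passed as the IN OBJECT parameter. */
    static final class Criteria implements Serializable {
        private static final long serialVersionUID = 1L;
        final int minOverbookedSeats;

        Criteria(int minOverbookedSeats) {
            this.minOverbookedSeats = minOverbookedSeats;
        }
    }

    /**
     * IN argument OBJECT    -> Object argument
     * OUT result OBJECT     -> Object[] result (slot 0 carries the value)
     * DYNAMIC RESULT SETS 1 -> ResultSet[] resultSets (left unset in this sketch)
     */
    public static void getOverBookedFlights(Object argument, Object[] result,
                                            ResultSet[] resultSets) {
        Criteria criteria = (Criteria) argument;
        // Placeholder output only; no query is executed here.
        result[0] = "threshold=" + criteria.minOverbookedSeats;
    }
}
```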
32. Data Aware Procedures
Parallelize the procedure and prune it to the nodes with the required data.
Extend the procedure call with the following syntax:
    CALL [PROCEDURE] procedure_name ( [ expression [, expression ]* ] )
      [ WITH RESULT PROCESSOR processor_name ]
      [ { ON TABLE table_name [ WHERE whereClause ] }
        | { ON { ALL | SERVER GROUPS (server_group_name [, server_group_name ]*) } } ]
Hint the data the procedure depends on:
    CALL getOverBookedFlights(<bind arguments>) ON TABLE FLIGHTAVAILABILITY WHERE FLIGHTID = <SomeFLIGHTID>;
If the table is partitioned by the columns in the WHERE clause, the procedure execution is pruned to the nodes with the data (the node holding <SomeFLIGHTID> in this case).
[Diagram: Client, Fabric Server 1, Fabric Server 2]
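The pruning decision can be sketched as a routing function: when the call supplies a value for the partitioning column, only the owning server runs the procedure; otherwise every server does. This is a simplified model with hypothetical names, and the modulo routing is a stand-in rather than the product's actual placement logic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of "ON TABLE ... WHERE FLIGHTID = ?" pruning.
final class DataAwareRoutingSketch {
    /**
     * Returns the servers a call should run on. A null flightId models a call
     * whose WHERE clause does not pin down the partitioning column.
     */
    static List<String> targetServers(List<String> servers, String flightId) {
        if (flightId == null) {
            return new ArrayList<>(servers); // no pruning: run on every server
        }
        int owner = Math.floorMod(flightId.hashCode(), servers.size());
        return Collections.singletonList(servers.get(owner)); // pruned to one server
    }
}
```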
33. Parallelize procedure then aggregate (reduce)
Register a Java Result Processor (optional in some cases):
    CALL SQLF.CreateResultProcessor(processor_name, processor_class_name);
Call syntax:
    CALL [PROCEDURE] procedure_name ( [ expression [, expression ]* ] )
      [ WITH RESULT PROCESSOR processor_name ]
      [ { ON TABLE table_name [ WHERE whereClause ] }
        | { ON { ALL | SERVER GROUPS (server_group_name [, server_group_name ]*) } } ]
[Diagram: Client, Fabric Server 1, Fabric Server 2, Fabric Server 3]
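The scatter-then-reduce pattern on this slide can be approximated in plain Java: run the same procedure body against each server's partition in parallel, then let a reducing step merge the partial results. This is only an in-process analogy with hypothetical names and data shapes; it does not use the SQLFire result processor interface.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical scatter/gather analogy; each map is one server's partition
// of flightId -> seats available (negative means overbooked).
final class ScatterGatherSketch {
    /** The "procedure": runs against one partition only. */
    static List<String> localOverbooked(Map<String, Integer> partition) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Integer> e : partition.entrySet()) {
            if (e.getValue() < 0) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    /** Scatter to all partitions in parallel, then reduce by merging and sorting. */
    static List<String> overbooked(List<Map<String, Integer>> partitions) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (Map<String, Integer> partition : partitions) {
                tasks.add(() -> localOverbooked(partition));
            }
            List<String> merged = new ArrayList<>();
            for (Future<List<String>> partial : pool.invokeAll(tasks)) {
                merged.addAll(partial.get());
            }
            Collections.sort(merged); // the "result processor" step
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The reduce step here is a simple merge and sort; a custom processor would replace that step with whatever aggregation the application needs.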
Original design rooted in good principles of data independence, durability and consistency of data. Designers naturally focused on disk IO performance and on maintaining strict data consistency through many locking/latching techniques. Buffer management is all about using memory for caching contiguous disk blocks.
Driven by the desire to scale, be HA and offer lower latencies:
- clusters of commodity servers
- focus shifts to distributing data and clustering
- shared nothing, including the disk
- avoid disk seeks as much as possible
- memory is cheap and reliable
- pool memory across the cluster and use highly optimized concurrent data structures
- partitioning with consistent hashing
- dynamically increase cluster capacity
- move and parallelize behavior to data (MR)
- high availability within the cluster and across data centers
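The link between consistent hashing and dynamic capacity can be shown with a small hash ring: when a server joins, the only keys that change owner are the ones the new server takes over. This is a generic textbook sketch with hypothetical names, virtual nodes and a simple bit-mixing hash; it is not claimed to match any particular product's implementation.

```java
import java.util.Map;
import java.util.TreeMap;

// Generic consistent hash ring with virtual nodes (illustrative only).
final class ConsistentHashRingSketch {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRingSketch(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    /** Spreads String.hashCode bits so ring positions are less clustered. */
    private static int hash(String s) {
        int h = s.hashCode();
        h ^= (h >>> 16);
        h *= 0x85ebca6b;
        h ^= (h >>> 13);
        h *= 0xc2b2ae35;
        h ^= (h >>> 16);
        return h;
    }

    /** Places several virtual positions for the server on the ring. */
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    /** The owner is the first ring position at or after the key, wrapping around. */
    String ownerOf(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Integer, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }
}
```

With plain modulo hashing, adding a server would remap most keys; on the ring, existing servers keep every key that the newcomer does not claim.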
Entity groups are defined in SQLFire using the “colocation” clause. An entity group is guaranteed to stay colocated in the presence of failures or rebalancing. Now complex queries can be executed without requiring excessive distributed data access.
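For reference, the DDL behind such an entity group might be issued from Java as sketched below. The PARTITION BY COLUMN and COLOCATE WITH clauses follow the SQLFire CREATE TABLE syntax as best understood here; the column lists are hypothetical, and the statements are not executed in this sketch since that needs a live connection.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical DDL holder; column definitions are illustrative assumptions.
final class ColocationDdlSketch {
    static final String CREATE_FLIGHTS =
            "CREATE TABLE FLIGHTS ("
            + "FLIGHTID CHAR(6) NOT NULL PRIMARY KEY, "
            + "DEST_AIRPORT CHAR(3)"
            + ") PARTITION BY COLUMN (FLIGHTID)";

    static final String CREATE_FLIGHTAVAILABILITY =
            "CREATE TABLE FLIGHTAVAILABILITY ("
            + "FLIGHTID CHAR(6) NOT NULL, "
            + "FLIGHT_DATE DATE NOT NULL, "
            + "SEATS_AVAILABLE INTEGER, "
            + "PRIMARY KEY (FLIGHTID, FLIGHT_DATE)"
            + ") PARTITION BY COLUMN (FLIGHTID) COLOCATE WITH (FLIGHTS)";

    /** Creates the parent table first, then the table colocated with it. */
    static void createTables(Connection connection) throws SQLException {
        try (Statement statement = connection.createStatement()) {
            statement.execute(CREATE_FLIGHTS);
            statement.execute(CREATE_FLIGHTAVAILABILITY);
        }
    }
}
```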