NoSQL and MongoDB
Rajesh Menon
Agenda – Day 1
• Day 1 – Theory / Demo
o Introduction to NoSQL – 3 hours
➢ What Is Meant by NoSQL?
➢ Distributed and Decentralized
➢ Elastic Scalability
➢ High Availability and Fault Tolerance
➢ Brewer's CAP Theorem
➢ Row-Oriented
➢ Schema-Free
➢ High Performance
➢ Types of NoSQL Databases
➢ Introduction to ReDis (Key-Value Pair)
➢ Introduction to HBase (Column Oriented Hadoop)
➢ Introduction to Cassandra (Column- Oriented)
➢ Introduction to MongoDB (Document Oriented)
➢ Introduction to Neo4j (Graph DB)
o Remaining 5 hours
➢ Aggregation – 30 mins
➢ Map Reduce – 30 mins
➢ Compatibility with SPARK – 30 mins
➢ Query Optimisation based on 3.6 (Advanced functions) – 30 mins
➢ Deep Diving on the functions of Mongo DB 3.4 – 2.5 hours
➢ Indexing techniques – 30 mins
Agenda – Day 2
➢ 1) Aggregation Framework – 1 hour
➢ 2) MongoDB Sharding – 1 hour
➢ 3) MongoDB Ad hoc queries – 1 hour
➢ 4) MongoDB is Schema-Less – 1 hour
➢ 5) Capped Collections – 1 hour
➢ 6) MongoDB Indexing – 1 hour
✓ 7) Project Work – 2 hours (Related to Security)
What Is Meant
by NoSQL?
• A NoSQL database provides a mechanism for storage
and retrieval of data that is modeled in means other
than the tabular relations used in relational databases.
Why NOSQL?
Driving Trends - Data Size
• Data size is increasing exponentially year after year
Driving Trend: Semi-Structured Information
• Content is becoming more unique
• We can blame Generation Y for this!
• Before: Job Title - Software Engineer
• After: Job Title - ZOMG Awesome Core Repository Developer
RDBMS Performance Curve
Social Network Performance
Why is RDBMS performance horrible?
• To find all friends at depth 5, MySQL will build a Cartesian product on the t_user_friend table 5 times
• Resulting in 50,000^5 records returned
• All except ~1,000 are discarded
• Neo4j will simply traverse the nodes in the database until there are no more nodes to traverse
Social Network Performance
The power of traversals
• Graph data structures are localized
• Count all of the people around you
• Adding more people to the room only slightly impacts the time it takes to count your neighbors
Distributed and
Decentralized
• Horizontal (MongoDB) and Vertical (Oracle) scaling
• Distributed database
• Decentralized database
• E.g.: MongoDB
• Application: BlockChain
Elastic Scalability
• Scales to multiple clusters
• Private Cloud
• Public Cloud (MongoDB Atlas)
• Effectively unlimited scale
• Keep adding resources
• Tune the database
• NoSQL scales better than RDBMS
• Provision resources in seconds
High Availability
and Fault
Tolerance
• Always available, provided configuration is OK
• Look at the load on the system
• Number of users, sessions, memory, CPU, network
• Fault tolerance built in
• If one node fails, others take over without affecting the
entire system
• Classic example : Hadoop
• Cloud is a catalyst. And the future.
Brewer's CAP
Theorem
http://vitalflux.com/wp-content/uploads/2015/04/cap-theorum.png
Brewer’s Cap
Theorem
explained
• Consistency: Any change to a particular record stored in the database (insert, update, or delete) is seen exactly as written by other users accessing that record at that time. If readers may temporarily see stale data, the system is only eventually consistent.
• Availability: The system continues to work and serve data in spite of node failures.
• Partition Tolerance: The system continues to operate even when the network splits the nodes of a distributed architecture, such as Hadoop (HDFS), into groups that cannot reach each other.
Brewer’s Cap
Theorem
examples
• RDBMS systems such as Oracle and MySQL support Consistency and Availability.
• NoSQL datastores such as HBase support Consistency and Partition Tolerance.
• NoSQL datastores such as Cassandra and CouchDB support Availability and Partition Tolerance.
Brewer’s CAP
Theorem
notes –
Which DB to
choose ?
CP-based database system: When it is critical that all clients see a consistent view of the database, the users of one node must wait for the other nodes to come into agreement before they can read or write. Availability takes a back seat to consistency, and one may want to choose a database such as HBase that supports CP (Consistency and Partition Tolerance).

AP-based database system: When the database must remain available at all times, one could choose a DB system that allows clients to write data to one node without waiting for the other nodes to come into agreement; the DB system then takes care of data reconciliation a little later. This is the state of eventual consistency. In applications that can sacrifice data consistency in return for higher performance, one could select databases such as CouchDB or Cassandra.
Row-Oriented
• A column-oriented DBMS (or columnar database management system) is a database management system (DBMS) that stores data tables by column rather than by row. Practical use of a column store versus a row store differs little in the relational DBMS world.
• RDBMS
• Document-based store: stores documents made up of tagged elements. Example: CouchDB
• Column-based store: each storage block contains data from only one column. Examples: HBase, Cassandra
• Graph-based: a network database that uses edges and nodes to represent and store data.
Row-Oriented
• MongoDB is schema-free, allowing you to create documents without first creating a structure for those documents. At the same time, it still has many of the features of a relational database, including strong consistency and an expressive query language.
• CouchDB: Views in CouchDB are similar to indexes in SQL.
Comparing CouchDB with MongoDB
MySQL and MongoDB
Map Reduce in MySQL and MongoDB
Schema-Free
• In schema-free databases such as MongoDB, you can simply add records without any previous structure. Moreover, you can group records that do not share the same structure: a collection (something like a table in a relational database, where you group records) can hold records of various structures; in other words, they do not need to have the same columns (properties). A minimal sketch follows.
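For example, a minimal pymongo sketch of that idea (the demo database, people collection, and field names are invented for illustration):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumes a local mongod
people = client["demo"]["people"]                  # hypothetical db/collection names

# Two records with completely different structures in the same collection:
people.insert_one({"name": "Asha", "age": 29})
people.insert_one({"name": "Ravi", "skills": ["python", "mongodb"], "city": "Kochi"})

print(people.find_one({"name": "Ravi"}))           # no CREATE TABLE was ever needed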
Schema Free – Mongo DB Notes
MongoDB is a JSON-style data store. The documents stored in the database can have
varying sets of fields, with different types for each field.
And that’s true. But it doesn’t mean that there is no schema. There are in fact various
schemas:
•The one in your head when you designed the data structures
•The one that your database really implemented to store your data structures
•The one you should have implemented to fulfill your requirements
Every time you realise that you made a mistake (see point three above), or when your
requirements change, you will need to migrate your data.
Let’s review again MongoDB’s point of view here:
With a schemaless database, 90% of the time adjustments to the database become
transparent and automatic.
For example, if we wish to add GPA to the student objects, we add the attribute, resave,
and all is well — if we look up an existing student and reference GPA,
we just get back null. Further, if we roll back our code, the new GPA fields in the existing
objects are unlikely to cause problems if our code was well written.
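As a hedged illustration of that GPA example in pymongo (collection and field names assumed, not taken from any real project):

from pymongo import MongoClient

students = MongoClient()["demo"]["students"]   # hypothetical db/collection
students.insert_one({"name": "Meera"})         # an "old" document with no GPA

# The application simply starts writing the new attribute; no migration step:
students.insert_one({"name": "Arun", "gpa": 3.7})

for s in students.find():
    print(s["name"], s.get("gpa"))             # prints None for documents without GPA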
Schema Free
- Hadoop
What Hadoop, NoSQL databases and other modern
big data tools allow is for each application or user to
come to the raw data with a different schema. Take
call center logs as an example. Someone performing
a columnar analysis on time and call length has a
different interpretation of the schema than someone
doing a row search for a specific call. But they aren't
imposing a schema-on-read; rather, they're flexibly
addressing different components of the schema to
maximize their individual query performance.
So, forget schema-less, schema-on-read and other
nonsense that is of use only to theorists and niche
players. Focus instead on providing ways for flexible
database schemas to be integrated into the full
business information pipeline.
High Performance
High Performance
• mongostat is the most powerful utility. It reports real-time statistics about connections, inserts, queries, updates, deletes, queued reads and writes, flushes, memory usage, page faults, and much more. It is useful for quickly spot-checking database activity, seeing whether any values are abnormally high, and making sure you have enough capacity.
• mongotop returns the amount of time a
MongoDB instance spends performing read and
write operations. It is broken down by collection
(namespace). This allows you to make sure
there is no unexpected activity and see where
resources are consumed. All active namespaces
are reported. (frequency – every second)
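As a quick sketch of how these utilities are typically invoked (the host and polling intervals below are arbitrary choices, not required defaults):

mongostat --host localhost --port 27017 5    # one statistics line every 5 seconds
mongotop --host localhost --port 27017 1     # per-collection read/write time every second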
NOSQL Database Types
• Key-Value
• Document
• Column Family
• Graph
Types of
NoSQL
Databases
Key-Value Stores
• Most based on Dynamo white
paper
• Dynamo: Amazon’s Highly
Available Key Value Store (2007)
• Data Model
• Global key-value mapping
• Massively scalable HashMap
• Highly Fault Tolerant
• Examples
• Riak, Redis, Voldemort
Key Value Stores:
Strengths and
Weaknesses
• Strengths
• Simple data model
• Horizontally scalable
• Weaknesses
• Simple data model
• Poor at handling complex data
Column Family
• Based on Google’s BigTable
white paper
• BigTable: Distributed Storage
System for Structured Data
(2006)
• Tuple of K-V where the key
maps to a set of columns
• Map Reduce for querying and
processing
• Examples
• Cassandra, HBase, Hypertable
Column
Family:
Strengths
and
Weaknesses
• Strengths
• Data model supports semi-structured
data
• Naturally indexed (columns)
• Horizontally scalable
• Weaknesses
• Does not handle interconnected data
well
Document-oriented
Databases
• Data Model
• Collection of documents
• Document is a key-value
collection
• Index-centric
• Examples
• MongoDB, CouchDB, Lotus
Notes?
Document-
oriented
Databases:
Strengths
and
Weaknesses
• Strengths
• Simple, but powerful data model
• Good scalability
• Weaknesses
• Does not handle interconnected data
well
• Querying is limited to keys and indexes
• MapReduce for large queries
Graph Databases
Data Model
• Nodes with properties
• Named relationships with properties
Examples
• Neo4j, Sones GraphDB, OrientDB, InfiniteGraph, AllegroGraph
Graph
Databases:
Strengths
and
Weaknesses
• Strengths
• Extremely powerful data model
• Performant when querying
interconnected data
• Easy to query
• Weaknesses
• Sharding
• Rewiring your brain
Complexity
vs Size
Typical Use Cases for Graph Databases
• Recommendations
• Business Intelligence
• Social Computing
• Master Data Management
• Geospatial
• Genealogy (Past and Present)
• Time Series Data
• Web Analytics
• Bioinformatics
• Indexing RDBMS
Maturity of Data Models
• Most NOSQL: ~6 years
• Relational: 42 years
• Graph Theory: 276 years
Leonhard Euler
• Inventor of Graph Theory
(1736)
• Swiss mathematician
• The original hipster
Graph Data Model
• Nodes: Person, City, Event
• Relationships: Is Attending, Hosted In, Is Located In, Rated
• Example properties: Person {firstName: kyle, lastName: adams}; Event {name: DevCon}; City {name: San Jose, country: USA}; Rated {score: 11 out of 10, comment: Amazing!!!}
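A small sketch of loading this model with the official Neo4j Python driver and Cypher; the bolt URI, credentials, and relationship directions are assumptions made for illustration:

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    session.run(
        "CREATE (p:Person {firstName: 'kyle', lastName: 'adams'}) "
        "CREATE (e:Event {name: 'DevCon'}) "
        "CREATE (c:City {name: 'San Jose', country: 'USA'}) "
        "CREATE (p)-[:IS_ATTENDING]->(e) "    # Person attends Event
        "CREATE (e)-[:HOSTED_IN]->(c) "       # Event hosted in City
        "CREATE (p)-[:IS_LOCATED_IN]->(c) "   # Person located in City
        "CREATE (p)-[:RATED {score: '11 out of 10', comment: 'Amazing!!!'}]->(e)"
    )
driver.close()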
What is Neo4j?
•Leading Open Source graph database
•Embeddable and Server
•ACID compliant
•White board friendly
•Stable
•Has been in 24/7 operation since 2003
More Reasons Why Neo4j is Great
•High performance graph operations
•Traverse 1,000,000+ relationships/sec on commodity
hardware
•32 billion nodes & relationships per Neo4j instance
•64 billion properties per Neo4j instance
•Small footprint
•Standalone server is ~65 MB
If NOSQL stands for Not Only SQL, then how do we execute queries?
Traversals!
Social Network Performance
MySQL vs Neo4j
Social Network Performance
• First rule of fight club:
• Run a friends of friends query
• Second rule of fight club:
• 1,000 Users
• Third rule of fight club:
• Average of 50 friends per user
• Fourth rule of fight club:
• Limit the depth to 5
• Fifth rule of fight club:
• Intel i7 commodity laptop w/8GB RAM
The Experiment: Round 1
Social Network Performance
RDBMS Schema
Social Network Performance
select distinct uf3.*
from t_user_friend uf1
inner join t_user_friend uf2 on uf1.user_1 = uf2.user_2
inner join t_user_friend uf3 on uf2.user_1 = uf3.user_2
where uf1.user_1 = ?

SQL: Friends of friends at depth 3
Social Network Performance
MySQL Results: Round 1- 1,000 Users
Depth | Execution Time (sec) | Records Returned
2 | 0.028 | ~900
3 | 0.213 | ~999
4 | 10.273 | ~999
5 | 92,613.150 | ~999
Social Network Performance
Social Graph
Social Network Performance
Neo4j Traversal API
// Neo4j 1.x-era Traversal API; imports come from org.neo4j.graphdb,
// org.neo4j.graphdb.traversal and org.neo4j.kernel
TraversalDescription traversalDescription = Traversal.description()
    // follow only outgoing IS_FRIEND_OF relationships
    .relationships(DynamicRelationshipType.withName("IS_FRIEND_OF"), Direction.OUTGOING)
    // stop expanding at depth 2 (friends of friends)
    .evaluator(Evaluators.atDepth(2))
    // visit each node at most once, globally
    .uniqueness(Uniqueness.NODE_GLOBAL);
Iterable<Node> nodes = traversalDescription.traverse(nodeById).nodes();
Social Network Performance
Neo4j Results: Round 1- 1,000 Users
Depth | Execution Time (sec) | Records Returned
2 | 0.04 | ~900
3 | 0.06 | ~999
4 | 0.07 | ~999
5 | 0.07 | ~999
Social Network Performance
• First rule of fight club:
• Run a friends of friends query
• Second rule of fight club:
• 1,000,000 Users
• Third rule of fight club:
• Average of 50 friends per user
• Fourth rule of fight club:
• Limit the depth to 5
• Fifth rule of fight club:
• Intel i7 commodity laptop w/8GB RAM
The Experiment: Round 2
Social Network Performance
MySQL Results: Round 2 - 1,000,000 Users
Depth | Execution Time (sec) | Records Returned
2 | 0.016 | ~2,500
3 | 30.267 | ~125,000
4 | 1,543.505 | ~600,000
5 | Did not finish after an hour | N/A
Social Network Performance
Neo4j Results: Round 2 - 1,000,000 Users
Depth | Execution Time (sec) | Records Returned
2 | 0.010 | ~2,500
3 | 0.168 | ~110,000
4 | 1.359 | ~600,000
5 | 2.132 | ~800,000
Introduction to ReDis (Key-Value Pair)
Redis architecture contains two main processes: the Redis client and the Redis server. The Redis client and server can be on the same computer or on two different computers. The Redis server is responsible for storing data in memory; it handles all kinds of management and forms the major part of the architecture.
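A minimal redis-py sketch of the client/server split described above (assumes a Redis server on the default localhost:6379; the key name is invented):

import redis

r = redis.Redis(host="localhost", port=6379)   # client process talking to the server

r.set("user:1001:name", "Rajesh")              # the server keeps the value in memory
print(r.get("user:1001:name"))                 # b'Rajesh'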
Redis with
Clojure –
Pub / Sub
http://matthiasnehlsen.com/images/redesign2.png
Performance of Redis
Redis does a lot with very little CPU
utilization. In a non-scientific test, I fired up
50 JVMs (on four machines) subscribing to
the topic on which the TwitterClient publishes
tweets with matched percolation queries.
Then I changed the tracked term from
the Twitter Streaming API to “love”, which
reliably maxes out the rate of tweets
permitted. Typically, with this term I see
around 60 to 70 tweets per second. With 50
connected processes, 3000 to 3500 tweets
were delivered per second overall, yet the
CPU utilization of Redis idled somewhere
between 1.7% and 2.3%.
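The fan-out described above is plain Redis Pub/Sub; a hedged redis-py sketch (the channel name is invented) looks like this:

import redis

r = redis.Redis()                # assumes a local Redis server

sub = r.pubsub()
sub.subscribe("tweets")          # each of the 50 processes would subscribe like this

r.publish("tweets", "hello")     # one publish is delivered to every subscriber

for message in sub.listen():     # blocking iterator over incoming events
    if message["type"] == "message":
        print(message["data"])   # b'hello'
        break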
Introduction
to HBase
(Column
Oriented
Hadoop)
HBase is a distributed, NoSQL, open-source database,
initially conceived as an open-source alternative to
Google’s proprietary BigTable. Originally, HBase was
part of the Hadoop project, but was eventually spun
off as a subproject. Given this legacy, it is not surprising that HBase is most often deployed on top of a Hadoop cluster (it uses HDFS as its underlying storage); however, a case study suggests that it can run on top of Amazon Elastic Block Store (EBS) as well.
These days HBase is used by companies such as
Adobe, Facebook, Twitter and Yahoo – and many
others to process large amounts of data in real time,
since it is ideally placed to store the input and/or the
output of MapReduce jobs.
Introduction to
HBase (Column
Oriented Hadoop)
• HDFS is a distributed filesystem; one can do most regular FS operations on it such as listing files in a directory, writing a regular file, reading a part of the file, etc. It's not simply "a collection of structured or unstructured data" any more than your EXT4 or NTFS filesystems are.
• HBase is an in-memory Key-Value store which may persist to HDFS (that isn't a hard requirement; you can run HBase on any distributed filesystem). For any read-key request asked of HBase, it will first check its runtime memory caches to see if it has the value cached, and otherwise visit its stored files on HDFS to seek and read out the specific value. HBase offers various configurations to control the way the cache is utilised, but HBase's speed comes from a combination of caching and indexed persistence (faster, seek-ed file reads).
• HBase's file-based persistence on HDFS does the key indexing automatically when it writes, so there is no manual indexing needed by its users. These files are regular HDFS files, but specialised in format for HBase's usage, known as HFiles.
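A small sketch of that read/write path from Python using the third-party happybase client (assumes HBase's Thrift server is running locally and that a calls table with a cf column family already exists; all names are invented):

import happybase

connection = happybase.Connection("localhost")
table = connection.table("calls")

table.put(b"row-1", {b"cf:duration": b"120"})   # lands in the in-memory store first
print(table.row(b"row-1"))                      # served from cache, else from HFiles on HDFS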
Introduction to HBase
(Column Oriented
Hadoop)
HBase is a NoSQL, column-oriented database built on top of Hadoop to overcome the drawbacks of HDFS, as it allows fast random writes and reads in an optimized way.
https://s3.amazonaws.com/files.dezyre.com/images/blog/Overview+of+HBase+Architecture+and+its+Components/HBase+Architecture.jpg
Introduction to
Cassandra (Column-
Oriented)
Apache Cassandra is a free and open-source distributed NoSQL
database management system designed to handle large amounts of
data across many commodity servers, providing high availability with
no single point of failure.
• It is scalable, fault-tolerant, and consistent.
• It is a column-oriented database.
• Its distribution design is based on Amazon’s Dynamo and its data
model on Google’s Bigtable.
• Created at Facebook, it differs sharply from relational database
management systems.
• Cassandra implements a Dynamo-style replication model with no
single point of failure, but adds a more powerful “column family”
data model.
• Cassandra is being used by some of the biggest companies, such as Facebook, Twitter, Cisco, Rackspace, eBay, Netflix, and more.
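For orientation, a minimal sketch with the DataStax Python driver (the contact point, keyspace, and table are illustrative assumptions):

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])   # any node can coordinate; there is no single master
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("CREATE TABLE IF NOT EXISTS demo.users (id int PRIMARY KEY, name text)")
session.execute("INSERT INTO demo.users (id, name) VALUES (%s, %s)", (1, "Rajesh"))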
Introduction
to Cassandra
(Column-
Oriented)
Introduction to
MongoDB
(Document
Oriented)
• Document Oriented and NoSQL database.
• Supports Aggregation
• Uses BSON format
• Sharding (Helps in Horizontal Scalability)
• Supports Ad Hoc Queries
• Schema Less
• Capped Collection
• Indexing (Any field in MongoDB can be indexed)
• MongoDB Replica Set (Provides high availability)
• Supports Multiple Storage Engines
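A few of the bullets above in one hedged pymongo sketch (database, collection, and field names are invented):

from pymongo import MongoClient, ASCENDING

db = MongoClient()["demo"]   # assumes a local mongod

# Capped collection: fixed size, insertion order preserved, oldest documents age out
db.create_collection("events", capped=True, size=1024 * 1024)

db.events.insert_one({"user": "asha", "action": "login"})   # schema-less insert
db.events.create_index([("user", ASCENDING)])               # any field can be indexed

# Ad hoc query on an arbitrary field, with no schema declared up front
print(list(db.events.find({"action": "login"})))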
Introduction
to MongoDB
(Document
Oriented)
https://www.researchgate.net/profile/Beschi_Raja2/publication/322918614/figure/fig3/AS:59
0003382018051@1517679176661/MongoDB-Architecture.jpg
1. Growth of MongoDB
2. Flexible Data Model
3. MongoDB features
4. Rich set of drivers and connectivity
5. Availability & Uptime
6. Elastic Scalability
7. Security
http://www.habilelabs.io/choose-mongodb-databases/
Introduction to
Neo4j (Graph
DB)
• SQL-like easy query language: Neo4j CQL
• It follows the Property Graph Data Model
• It supports indexes by using Apache Lucene
• It supports UNIQUE constraints
• It contains a UI to execute CQL commands: Neo4j Data Browser
• It supports full ACID (Atomicity, Consistency, Isolation and Durability) rules
• It uses native graph storage with a native GPE (Graph Processing Engine)
• It supports exporting of query data to JSON and XLS formats
• It provides a REST API that can be accessed from any programming language, such as Java, Spring, Scala etc.
• It provides JavaScript access for UI MVC frameworks such as Node.js
• It supports two kinds of Java API, Cypher API and Native Java API, to develop Java applications
Neo4j v/s
other NoSQL
databases
Introduction
to Neo4j
(Graph DB)
https://image.slidesharecdn.com/neo4j-131011114420-phpapp02/95/neo4j-graph-storage-7-638.jpg?cb=1381492129
More extensive details
Aggregation
MongoDB stores data in BSON (Binary JSON)
format, supports a dynamic schema and allows for
dynamic queries. The Mongo Query Language is
expressed as JSON and is different from the SQL
queries used in an RDBMS. MongoDB provides
anAggregation Framework that includes utility
functions such as count, distinct and group
Aggregation operations process data records and
return computed results. Aggregation operations
group values from multiple documents together,
and can perform a variety of operations on the
grouped data to return a single result. MongoDB
provides three ways to perform aggregation:
the aggregation pipeline, the map-reduce function,
and single purpose aggregation methods.
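A minimal aggregation-pipeline sketch in pymongo (sample data and field names invented for illustration):

from pymongo import MongoClient

orders = MongoClient()["demo"]["orders"]
orders.insert_many([
    {"cust": "A", "amount": 10},
    {"cust": "A", "amount": 25},
    {"cust": "B", "amount": 5},
])

pipeline = [
    {"$match": {"amount": {"$gte": 5}}},                         # filter documents
    {"$group": {"_id": "$cust", "total": {"$sum": "$amount"}}},  # group and sum
    {"$sort": {"total": -1}},                                    # order the result
]
for row in orders.aggregate(pipeline):
    print(row)   # {'_id': 'A', 'total': 35} then {'_id': 'B', 'total': 5}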
Aggregation Framework
Aggregation and SQL
https://matthewmoisen.com/static/images/matthew_moisen_sql_to_mongo_aggregation.jpg
Aggregation - Python
Aggregation Operators
Map Reduce
Map Reduce is a 2-step process to break a problem statement down into a solution.
Map: process each input record and emit intermediate key-value pairs; the framework then collates and sorts these pairs by key.
Reduce: aggregate the values of each individual key into the result.
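A toy word count in plain Python makes the two steps (plus the grouping in between) concrete; this is a sketch of the idea, not any framework's API:

from collections import defaultdict

documents = ["the quick fox", "the lazy dog", "the fox"]

# Map: emit an intermediate (key, value) pair for every word
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle/sort: group the intermediate pairs by key (the framework does this)
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce: aggregate the values of each individual key
counts = {key: sum(values) for key, values in groups.items()}
print(counts)   # {'the': 3, 'quick': 1, 'fox': 2, 'lazy': 1, 'dog': 1}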
Big Data 101
What is Big
Data?
It is a new set of approaches for analysing data sets
that were not previously accessible because they
posed challenges across one or more of the “3 V’s” of
Big Data
Volume - too Big – Terabytes and more of Credit Card
Transactions, Web Usage data, System logs
Variety - too Complex – truly unstructured data such
as Social Media, Customer Reviews, Call Center
Records
Velocity - too Fast - Sensor data, live web traffic,
Mobile Phone usage, GPS Data
Big Data 101
Hadoop is just a File System - HDFS
Read Optimised & Failure Tolerant
[Diagram: a head node distributing a file across five data nodes]
Big Data 101
Map + Reduce = Extract, Load + Transform
[Diagram: raw data blocks feed parallel mappers; the mapped data flows into a reducer that emits the output]
HDInsight hands on
Hadoop Streaming with C#
• Building the job
• Executing
• No need to unzip data
HDInsight hands on
Hadoop Streaming with C#
C:\apps\dist\hadoop-1.1.0-SNAPSHOT\bin\hadoop.cmd
jar C:\apps\dist\hadoop-1.1.0-SNAPSHOT\lib\hadoop-streaming.jar
"-D mapred.output.compress=true"
"-D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec"
-files "asv://container@storage/user/hadoop/code/Sentiment_v2.exe"
-numReduceTasks 0
-mapper "Sentiment_v2.exe"
-input "asv://container@storage.blob.core.windows.net/user/hadoop/data/"
-output "asv://container@storage.blob.core.windows.net/user/hadoop/output/Sentiment"
HDInsight hands on
Hadoop Streaming with C#
276.0|5|bob|government
276.0|5|bob|telling
276.0|5|bob|opposed
276.0|5|bob|liberty
276.0|5|bob|obviously
276.0|5|bob|fail
276.0|5|bob|comprehend
276.0|5|bob|qualifier
276.0|5|bob|legalized
276.0|5|bob|curtis
HDInsight hands on
Using Pig to Enrich the data
• Pig is a query language which shares
some concepts with SQL
• Invoked from the Hadoop command shell
• No GUI
• Does not do any work until it has to output a resultset
• Under the hood executes Map/reduce jobs
HDInsight hands on
Using Pig to Enrich the data with Sentiment
scores
• Load sentiment word lists and assign scores
• Loading the data
• Preprocess to get some key fields
• Count words in various contexts and add sentiment value
• Dump results to Azure Blob Storage
Using Pig to Enrich the data
Code sample: LOAD Operation
data_raw =
  LOAD '<filename>'
  USING PigStorage('|')
  AS (filename:chararray, message_id:chararray, author_id:chararray, word:chararray);
Using Pig to Enrich the data
Code sample: JOIN Statement
words_count_sentiment =
JOIN words_count_flat
BY words LEFT,
sentiment BY sentiment_word;
Using Pig to Enrich the data
Code sample: SUM Operation
message_sum_sentiment =
FOREACH messages_grouped
GENERATE group
AS message_details,
SUM(messages_joined.sentiment_value) AS
sentiment;
HDInsight hands on
Outputting results to Hive
• Hive is a near SQL compliant
language with a lot of similarities
• Again, under the hood issues MapReduce queries
• Exposed to ODBC
HDInsight hands on
Outputting results to Hive
• Create some Hive tables to reference the Pig Output
• Use the Interactive console
Outputting data to Hive
Code review: CREATE EXTERNAL TABLE
CREATE EXTERNAL TABLE words
( word STRING,
counts INT,
sentiment INT )
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '124'
STORED AS TEXTFILE
LOCATION
'asv://westburycorpus@westburycorpusnoreur.blob.co
re.windows.net/user/hadoop/pig_out/words';
https://image.slidesharecdn.com/mongodbandhadoop-120229224640-phpapp02/95/mongodb-and-hadoop-27-728.jpg?cb=1330556134
MapReduce
in MongoDB
Map Reduce
Twitter
Mongo DB
https://techstakes.files.wordpress.com/2011/04/slide1.jpg
Map Reduce
across
Shards
https://image.slidesharecdn.com/talkmongodbmunich20121016-121015163941-
phpapp01/95/mapconfused-a-practical-approach-to-mapreduce-with-mongodb-5-638.jpg?cb=1350635383
Big Data Technologies
BI Strengths and Weaknesses
The Current Solutions
[Chart: gigabytes of data created (in billions), 2005-2015; structured data is roughly 10% of the total, while unstructured data grows far faster]
Current database solutions are designed for structured data:
• Optimized to answer known questions quickly
• Schemas dictate form/context
• Difficult to adapt to new data types and new questions
• Expensive at petabyte scale
Main Big Data Technologies
Hadoop
• Low cost, reliable scale-out architecture
• Distributed computing
• Proven success in Fortune 500 companies
• Exploding interest
NoSQL Databases
• Huge horizontal scaling and high availability
• Highly optimized for retrieval and appending
• Types: Document stores, Key-Value stores, Graph databases
Analytic RDBMS
• Optimized for bulk-load and fast aggregate query workloads
• Types: Column-oriented, MPP, In-memory
Hadoop Core Components
• Hadoop Distributed File System (HDFS)
• Massive redundant storage across a commodity
cluster
• MapReduce
• Map: distribute a computational problem across a
cluster
• Reduce: Master node collects the answers to all
the sub-problems and combines them
• Many distros available
Major Hadoop Utilities
• Apache Hive: SQL-like language and metadata repository
• Apache Pig: High-level language for expressing data analysis programs
• Apache HBase: The Hadoop database; random, real-time read/write access
• Sqoop: Integrating Hadoop with RDBMS
• Oozie: Server-based workflow engine for Hadoop activities
• Hue: Browser-based desktop interface for interacting with Hadoop
• Flume: Distributed service for collecting and aggregating log and event data
• Apache Whirr: Library for running Hadoop in the cloud
• Apache Zookeeper: Highly reliable distributed coordination service
Hadoop &
Databases
“The working conditions can be shocking”
ETL Developer
Big Data Platform Challenges
Challenges
1. Somewhat immature
2. Lack of tooling
3. Steep technical learning curve
4. Hiring qualified people
5. Availability of enterprise-ready products and tools
6. High latency (Hadoop)
7. Running inside the cluster
Challenges
Would you rather do this?
Scheduling
Modeling
Ingestion / Manipulation /
Integration
… OR THIS?
Compatibility with SPARK
• The MongoDB Connector for Spark provides integration between MongoDB and Apache Spark.
• With the connector, you have access to all Spark libraries for use with MongoDB datasets:
Datasets for analysis with SQL (benefiting from automatic schema inference), streaming, machine
learning, and graph APIs. You can also use the connector with the Spark Shell.
• The MongoDB Connector for Spark is compatible with the following versions of Apache Spark and
MongoDB:
MongoDB Connector for Spark | Spark Version | MongoDB Version
2.2.0 | 2.2.x | 2.6 or later
2.1.0 | 2.1.x | 2.6 or later
2.0.0 | 2.0.x | 2.6 or later
1.1.0 | 1.6.x | 2.6 or later
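A hedged PySpark sketch of reading a collection through the connector (assumes the connector package is on the classpath, e.g. via spark-submit --packages, and that the URIs and collection names below are placeholders):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("mongo-demo")
         .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/demo.orders")
         .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/demo.results")
         .getOrCreate())

# Schema is inferred automatically from the MongoDB documents
df = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()
df.createOrReplaceTempView("orders")
spark.sql("SELECT cust, SUM(amount) AS total FROM orders GROUP BY cust").show()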
Hadoop v/s Spark
Hadoop v/s Spark
http://www.big-data.tips/wp-
content/uploads/2017/03/apache-spark-vs-hadoop-figure.jpg
Big Data Deployment
Spark – Bare metal
https://weidongzhou.files.wordpress.com/2015/09/spark_engi
ne.jpg
Hadoop v/s Spark
https://www.springpeople.com/blog/wp-
content/uploads/2017/05/Comparing-Hadoop-MapReduce-
and-Spark.png
Spark Streaming
Spark with MongoDB
Spark Streaming
Query Optimisation based on 3.6 (Advanced functions)
• Retryable Writes
• Causal Consistency
• Change Streams
• New Aggregation Pipeline Stages and Operators
• Performance Advisor
• Default Bind to localhost
• Array Updates
Retryable
Writes
There’s always room for error when writing to a database
even when you think you’ve got all your bases covered.
With MongoDB 3.6, you no longer run the risk of executing
an update twice because of network glitches and the like,
thanks to the new Retryable Writes feature.
Instead of the developer or the application, it’s now the
driver itself that handles these system flukes. The
MongoDB driver that comes with 3.6 can automatically
retry certain failed write operations once, to recover from
transient network errors or replica set failovers.
The benefit here is all in the feature name: your writes will
automatically be retried by MongoDB itself, so you don’t
have to worry about any write inconsistencies.
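In pymongo this is just a connection-string flag; a sketch (assuming a replica set, since retryable writes are not supported on a standalone mongod):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?retryWrites=true")

# If a transient network error or a failover interrupts this insert,
# the 3.6 driver retries it once on its own:
client["demo"]["orders"].insert_one({"sku": "abc", "qty": 1})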
Causal
Consistency
Prior to MongoDB 3.6, reading from primaries was
the only reliable way to go. Causal relationships
between read and write operations as they occurred
on primaries (and got replicated to secondaries)
weren’t guaranteed. These could result in lags (e.g.
writes to the primary not replicated to the
secondaries, multiple secondaries writing updates
at different times, etc.) which could make reading
from secondaries inconsistent.
This all changes with MongoDB 3.6, which in tl;dr format is: you can now also reliably read from secondaries. The longer technical explanation is in the MongoDB documentation.
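A minimal pymongo sketch of a causally consistent session (assumes a replica set; the db and collection names are invented):

from pymongo import MongoClient

client = MongoClient()

# Reads inside the session are guaranteed to observe the session's own writes,
# even if they are routed to a secondary:
with client.start_session(causal_consistency=True) as session:
    coll = client["demo"]["profiles"]
    coll.insert_one({"user": "asha"}, session=session)
    print(coll.find_one({"user": "asha"}, session=session))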
Change
Streams
Just like you get notified about real-time changes for almost anything these days, MongoDB is now also able to do the same through a feature called Change Streams.
The benefit of Change Streams is immediately visible. You
can now subscribe to changes in a collection and get
notified. A new method, called watch, listens for these
changes, notifies you, and can even trigger an automatic
series of events, as defined in your change stream.
Change streams can “listen” to five events for now (Insert,
Delete, Replace, Update and Invalidate) and can only be set
up based on user roles, which means that only those who
have read access to collections can create change streams
in those collections.
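The watch method mentioned above looks like this in pymongo (a sketch; change streams require a replica set, and the collection name is invented):

from pymongo import MongoClient

coll = MongoClient()["demo"]["orders"]

# Iterating the stream blocks until a change arrives in the collection
with coll.watch() as stream:
    for change in stream:
        print(change["operationType"], change.get("fullDocument"))
        break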
New
Aggregation
Pipeline Stages
and Operators
MongoDB users can feel a bit more
empowered by an aggregation pipeline
that boasts new operators, stages, and
an improved $lookup operator with
even more powerful join capabilities.
Studio 3T’s Aggregation Editor will of
course support these new additions,
the full list of which you can find in the
MongoDB 3.6 Release Notes.
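For a flavour of the join capabilities, here is a plain $lookup stage in pymongo (collections and fields invented; this is the basic form, not the new 3.6 variant with sub-pipelines):

from pymongo import MongoClient

db = MongoClient()["demo"]
db.orders.insert_one({"_id": 1, "cust_id": 100, "amount": 25})
db.customers.insert_one({"_id": 100, "name": "Asha"})

pipeline = [{
    "$lookup": {                 # left outer join against another collection
        "from": "customers",
        "localField": "cust_id",
        "foreignField": "_id",
        "as": "customer",
    }
}]
print(list(db.orders.aggregate(pipeline)))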
Performance
Advisor
MongoDB’s Ops Manager comes bundled
with Performance Advisor, a feature that
alerts you about slow queries – meaning
queries that take longer than the default
slowOpThresholdMs of 100 milliseconds –
and suggests new indexes to improve query
performance.
Indexes help speed up queries significantly,
so having automated suggestions on how to
optimize them is quite a leg-up. But there is a
tradeoff to consider: the more indexes you
have, the worse your write performance. And
it’s still up to you – and not Performance
Advisor – to strike the right balance.
Default bind
to localhost
In an effort to enforce security,
MongoDB 3.6 now by default binds to
localhost if no authentication is
enabled, so that only connections
from clients running on the same
machine are accepted in such a case.
Only users from whitelisted IP
addresses can externally connect to
your unsecured databases, everything
else will be denied.
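In the config file this is the net.bindIp setting; a sketch of /etc/mongod.conf (the second address is a hypothetical whitelisted interface, not a default):

net:
  port: 27017
  bindIp: 127.0.0.1,192.0.2.10   # localhost plus one explicitly allowed interface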
Array
Updates
Nested arrays are easier to manipulate than ever in MongoDB 3.6. Now the query $type : "array" detects that fields are arrays, unlike before, when it would only return documents whose array fields contained an element of BSON type array.
MongoDB also introduced new operators which make updating all elements in an array much easier, with less code.
We already made showing nested fields and exploring arrays easier with Studio 3T.
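One of those new operators is the filtered positional operator $[<identifier>]; a pymongo sketch (collection and field names invented):

from pymongo import MongoClient

students = MongoClient()["demo"]["students"]
students.insert_one({"name": "Meera", "grades": [95, 52, 90]})

# $[g] plus array_filters updates only the matching array elements
students.update_many(
    {},
    {"$set": {"grades.$[g]": 60}},          # raise every failing grade to 60
    array_filters=[{"g": {"$lt": 60}}],
)
print(students.find_one({"name": "Meera"})["grades"])   # [95, 60, 90]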
Cloud
Companies
Amazon
Google
IBM
Microsoft
Amazon Cloud
Infrastructure
Amazon Cloud Features
• Elastic Web-Scale Computing
• Completely Controlled
• Flexible Cloud Hosting Services
• Designed for use with other Amazon Web Services
• Reliable
• Secure
• Inexpensive
• Easy to Start
Google
AppEngine
Features
Popular languages and frameworks
Focus on your code
Multiple storage options
Powerful built-in services
Familiar development tools
Deploy at Google scale
IBM
SmartCloud
Features
Expert Cloud Consulting
Private and Hybrid Clouds
IaaS, PaaS and SaaS
Speed
Empowerment
Economics
Microsoft Cloud
Features
Infrastructure Services
Develop Modern Applications
Insights from Data
Identity and Access Management
Cloud
Segments
IaaS
PaaS
SaaS
Cloud
Deployment
Models
Private
Public
Hybrid
Community
Deep Diving
on the
functions of
Mongo DB
3.4
Database
commands
mongo shell
methods
Indexing
Techniques
Create Indexes to Support Your Queries
An index supports a query when the index contains all the
fields scanned by the query. Creating indexes that
supports queries results in greatly increased query
performance.
Use Indexes to Sort Query Results
To support efficient queries, use the strategies here when you
specify the sequential order and sort order of index fields.
Ensure Indexes Fit in RAM
When your index fits in RAM, the system can avoid reading
the index from disk and you get the fastest processing.
Create Queries that Ensure Selectivity
Selectivity is the ability of a query to narrow results using the
index. Selectivity allows MongoDB to use the index for a
larger portion of the work associated with fulfilling the
query.
Name Description
db.collection.createIndex() Builds an index on a collection.
db.collection.dropIndex() Removes a specified index on a collection.
db.collection.dropIndexes() Removes all indexes on a collection.
db.collection.getIndexes() Returns an array of documents that describe the existing indexes on a collection.
db.collection.reIndex() Rebuilds all existing indexes on a collection.
db.collection.totalIndexSize() Reports the total size used by the indexes on a collection. Provides a wrapper around the totalIndexSize field of the collStats output.
cursor.explain() Reports on the query execution plan for a cursor.
cursor.hint() Forces MongoDB to use a specific index for a query.
cursor.max() Specifies an exclusive upper index bound for a cursor. For use with cursor.hint()
cursor.min() Specifies an inclusive lower index bound for a cursor. For use with cursor.hint()
Indexing Methods in the mongo Shell
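A short pymongo sketch exercising a few of the methods above (the collection name is invented):

from pymongo import MongoClient, ASCENDING

coll = MongoClient()["demo"]["people"]

coll.create_index([("lastName", ASCENDING)])   # db.collection.createIndex()
print(coll.index_information())                # db.collection.getIndexes()

# explain() reports the winning plan; IXSCAN means the index was used
print(coll.find({"lastName": "adams"}).explain()["queryPlanner"]["winningPlan"])

coll.drop_index("lastName_1")                  # db.collection.dropIndex()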
Name Description
createIndexes Builds one or more indexes for a collection.
dropIndexes Removes indexes from a collection.
compact Defragments a collection and rebuilds the indexes.
reIndex Rebuilds all indexes on a collection.
validate Internal command that scans for a collection’s data and indexes for correctness.
geoNear Performs a geospatial query that returns the documents closest to a given point.
geoSearch Performs a geospatial query that uses MongoDB’s haystack index functionality.
checkShardingIndex Internal command that validates index on shard key.
Indexing Database Commands
Name Description
$geoWithin Selects geometries within a bounding GeoJSON geometry. The 2dsphere and 2d indexes support $geoWithin.
$geoIntersects Selects geometries that intersect with a GeoJSON geometry. The 2dsphere index supports $geoIntersects.
$near Returns geospatial objects in proximity to a point. Requires a geospatial index. The 2dsphere and 2d indexes support $near.
$nearSphere Returns geospatial objects in proximity to a point on a sphere. Requires a geospatial index. The 2dsphere and 2d indexes support $nearSphere.
Geospatial Query Selectors
Name Description
$explain Forces MongoDB to report on query execution plans. See explain().
$hint Forces MongoDB to use a specific index. See hint()
$max Specifies an exclusive upper limit for the index to use in a query. See max().
$min Specifies an inclusive lower limit for the index to use in a query. See min().
$returnKey Forces the cursor to only return fields included in the index.
Indexing Query Modifiers
Thanks !!!
Keep in touch
Rajesh30menon
@YAHOO, GMAIL, HOTMAIL, SKYPE, TWITTER, INSTAGRAM, PINTEREST
My blog : http://www.technospirituality.com
MY BOOKS : Link : https://goo.gl/bQ8cnM (Amazon.com)
Link : https://goo.gl/owgMxT (Amazon.in)
http://www.technospirituality.com

Más contenido relacionado

La actualidad más candente

Relational and non relational database 7
Relational and non relational database 7Relational and non relational database 7
Relational and non relational database 7abdulrahmanhelan
 
MongoDB vs Mysql. A devops point of view
MongoDB vs Mysql. A devops point of viewMongoDB vs Mysql. A devops point of view
MongoDB vs Mysql. A devops point of viewPierre Baillet
 
Scaling MongoDB
Scaling MongoDBScaling MongoDB
Scaling MongoDBMongoDB
 
Hadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema DesignHadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema DesignCloudera, Inc.
 
Navigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesshnkr_rmchndrn
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLRamakant Soni
 
Performance Analysis of HBASE and MONGODB
Performance Analysis of HBASE and MONGODBPerformance Analysis of HBASE and MONGODB
Performance Analysis of HBASE and MONGODBKaushik Rajan
 
NoSQL and MongoDB Introdction
NoSQL and MongoDB IntrodctionNoSQL and MongoDB Introdction
NoSQL and MongoDB IntrodctionBrian Enochson
 
HBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBaseHBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBaseCloudera, Inc.
 
Facebook - Jonthan Gray - Hadoop World 2010
Facebook - Jonthan Gray - Hadoop World 2010Facebook - Jonthan Gray - Hadoop World 2010
Facebook - Jonthan Gray - Hadoop World 2010Cloudera, Inc.
 
When to Use MongoDB
When to Use MongoDBWhen to Use MongoDB
When to Use MongoDBMongoDB
 
Intro to NoSQL and MongoDB
Intro to NoSQL and MongoDBIntro to NoSQL and MongoDB
Intro to NoSQL and MongoDBDATAVERSITY
 
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaHadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaCloudera, Inc.
 

La actualidad más candente (20)

Relational and non relational database 7
Relational and non relational database 7Relational and non relational database 7
Relational and non relational database 7
 
MongoDB vs Mysql. A devops point of view
MongoDB vs Mysql. A devops point of viewMongoDB vs Mysql. A devops point of view
MongoDB vs Mysql. A devops point of view
 
Scaling MongoDB
Scaling MongoDBScaling MongoDB
Scaling MongoDB
 
Hadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema DesignHadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema Design
 
Navigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skies
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQL
 
Performance Analysis of HBASE and MONGODB
Performance Analysis of HBASE and MONGODBPerformance Analysis of HBASE and MONGODB
Performance Analysis of HBASE and MONGODB
 
NoSQL and MongoDB Introdction
NoSQL and MongoDB IntrodctionNoSQL and MongoDB Introdction
NoSQL and MongoDB Introdction
 
HBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBaseHBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBase
 
Facebook - Jonthan Gray - Hadoop World 2010
Facebook - Jonthan Gray - Hadoop World 2010Facebook - Jonthan Gray - Hadoop World 2010
Facebook - Jonthan Gray - Hadoop World 2010
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
When to Use MongoDB
When to Use MongoDBWhen to Use MongoDB
When to Use MongoDB
 
No SQL and MongoDB - Hyderabad Scalability Meetup
No SQL and MongoDB - Hyderabad Scalability MeetupNo SQL and MongoDB - Hyderabad Scalability Meetup
No SQL and MongoDB - Hyderabad Scalability Meetup
 
Intro to NoSQL and MongoDB
Intro to NoSQL and MongoDBIntro to NoSQL and MongoDB
Intro to NoSQL and MongoDB
 
Relational vs. Non-Relational
Relational vs. Non-RelationalRelational vs. Non-Relational
Relational vs. Non-Relational
 
Hbase: an introduction
Hbase: an introductionHbase: an introduction
Hbase: an introduction
 
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaHadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
 
NOSQL Overview
NOSQL OverviewNOSQL Overview
NOSQL Overview
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
 
Mongodb @ vrt
Mongodb @ vrtMongodb @ vrt
Mongodb @ vrt
 

Similar a NoSQL and MongoDB

Mongo Bb - NoSQL tutorial
Mongo Bb - NoSQL tutorialMongo Bb - NoSQL tutorial
Mongo Bb - NoSQL tutorialMohan Rathour
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQLbalwinders
 
Big Data technology Landscape
Big Data technology LandscapeBig Data technology Landscape
Big Data technology LandscapeShivanandaVSeeri
 
NoSql Data Management
NoSql Data ManagementNoSql Data Management
NoSql Data Managementsameerfaizan
 
Presentation On NoSQL Databases
Presentation On NoSQL DatabasesPresentation On NoSQL Databases
Presentation On NoSQL DatabasesAbiral Gautam
 
Why no sql ? Why Couchbase ?
Why no sql ? Why Couchbase ?Why no sql ? Why Couchbase ?
Why no sql ? Why Couchbase ?Ahmed Rashwan
 
MongoDB 2.4 and spring data
MongoDB 2.4 and spring dataMongoDB 2.4 and spring data
MongoDB 2.4 and spring dataJimmy Ray
 
NoSQLDatabases
NoSQLDatabasesNoSQLDatabases
NoSQLDatabasesAdi Challa
 
No SQL- The Future Of Data Storage
No SQL- The Future Of Data StorageNo SQL- The Future Of Data Storage
No SQL- The Future Of Data StorageBethmi Gunasekara
 
MongoDB Lab Manual (1).pdf used in data science
MongoDB Lab Manual (1).pdf used in data scienceMongoDB Lab Manual (1).pdf used in data science
MongoDB Lab Manual (1).pdf used in data sciencebitragowthamkumar1
 
NOSQL Databases types and Uses
NOSQL Databases types and UsesNOSQL Databases types and Uses
NOSQL Databases types and UsesSuvradeep Rudra
 
Nosql databases for the .net developer
Nosql databases for the .net developerNosql databases for the .net developer
Nosql databases for the .net developerJesus Rodriguez
 

Similar a NoSQL and MongoDB (20)

Mongo Bb - NoSQL tutorial
Mongo Bb - NoSQL tutorialMongo Bb - NoSQL tutorial
Mongo Bb - NoSQL tutorial
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
No sq lv1_0
No sq lv1_0No sq lv1_0
No sq lv1_0
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
the rising no sql technology
the rising no sql technologythe rising no sql technology
the rising no sql technology
 
NoSQL.pptx
NoSQL.pptxNoSQL.pptx
NoSQL.pptx
 
Big Data technology Landscape
Big Data technology LandscapeBig Data technology Landscape
Big Data technology Landscape
 
NOsql Presentation.pdf
NOsql Presentation.pdfNOsql Presentation.pdf
NOsql Presentation.pdf
 
NoSql Data Management
NoSql Data ManagementNoSql Data Management
NoSql Data Management
 
Presentation On NoSQL Databases
Presentation On NoSQL DatabasesPresentation On NoSQL Databases
Presentation On NoSQL Databases
 
Why no sql ? Why Couchbase ?
Why no sql ? Why Couchbase ?Why no sql ? Why Couchbase ?
Why no sql ? Why Couchbase ?
 
Database Technologies
Database TechnologiesDatabase Technologies
Database Technologies
 
MongoDB 2.4 and spring data
MongoDB 2.4 and spring dataMongoDB 2.4 and spring data
MongoDB 2.4 and spring data
 
NoSQLDatabases
NoSQLDatabasesNoSQLDatabases
NoSQLDatabases
 
No SQL- The Future Of Data Storage
No SQL- The Future Of Data StorageNo SQL- The Future Of Data Storage
No SQL- The Future Of Data Storage
 
NoSQL Databases
NoSQL DatabasesNoSQL Databases
NoSQL Databases
 
MongoDB Lab Manual (1).pdf used in data science
MongoDB Lab Manual (1).pdf used in data scienceMongoDB Lab Manual (1).pdf used in data science
MongoDB Lab Manual (1).pdf used in data science
 
NOSQL Databases types and Uses
NOSQL Databases types and UsesNOSQL Databases types and Uses
NOSQL Databases types and Uses
 
NoSql
NoSqlNoSql
NoSql
 
Nosql databases for the .net developer
Nosql databases for the .net developerNosql databases for the .net developer
Nosql databases for the .net developer
 

Más de Rajesh Menon

Big Data in Verticals - Complete.pptx
Big Data in Verticals - Complete.pptxBig Data in Verticals - Complete.pptx
Big Data in Verticals - Complete.pptxRajesh Menon
 
AI in Verticals - Complete.pptx
AI in Verticals - Complete.pptxAI in Verticals - Complete.pptx
AI in Verticals - Complete.pptxRajesh Menon
 
SMAC - Social, Mobile, Analytics and Cloud - An overview
SMAC - Social, Mobile, Analytics and Cloud - An overview SMAC - Social, Mobile, Analytics and Cloud - An overview
SMAC - Social, Mobile, Analytics and Cloud - An overview Rajesh Menon
 
Social media presentation in under 2 hours
Social media presentation  in under 2 hoursSocial media presentation  in under 2 hours
Social media presentation in under 2 hoursRajesh Menon
 
Oracle sql unleashed
Oracle sql unleashedOracle sql unleashed
Oracle sql unleashedRajesh Menon
 
Q U E S T The Book Final
Q U E S T The Book   FinalQ U E S T The Book   Final
Q U E S T The Book FinalRajesh Menon
 

Más de Rajesh Menon (7)

Big Data in Verticals - Complete.pptx
Big Data in Verticals - Complete.pptxBig Data in Verticals - Complete.pptx
Big Data in Verticals - Complete.pptx
 
AI in Verticals - Complete.pptx
AI in Verticals - Complete.pptxAI in Verticals - Complete.pptx
AI in Verticals - Complete.pptx
 
SMAC - Social, Mobile, Analytics and Cloud - An overview
SMAC - Social, Mobile, Analytics and Cloud - An overview SMAC - Social, Mobile, Analytics and Cloud - An overview
SMAC - Social, Mobile, Analytics and Cloud - An overview
 
Social media presentation in under 2 hours
Social media presentation  in under 2 hoursSocial media presentation  in under 2 hours
Social media presentation in under 2 hours
 
Oracle sql unleashed
Oracle sql unleashedOracle sql unleashed
Oracle sql unleashed
 
Technology Summit
Technology SummitTechnology Summit
Technology Summit
 
Q U E S T The Book Final
Q U E S T The Book   FinalQ U E S T The Book   Final
Q U E S T The Book Final
 

Último

IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxdolaknnilon
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 

Último (20)

IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 

  • 16. Brewer's CAP Theorem examples • RDBMS systems such as Oracle and MySQL support Consistency and Availability. • NoSQL datastores such as HBase support Consistency and Partition Tolerance. • NoSQL datastores such as Cassandra and CouchDB support Availability and Partition Tolerance.
  • 17. Brewer's CAP Theorem notes – which DB to choose? CP-based database system: when it is critical that all clients see a consistent view of the database, the users of one node have to wait for the other nodes to come into agreement before they can read or write, so availability takes a back seat to consistency; one may then choose a database such as HBase that supports CP (Consistency and Partition Tolerance). AP-based database system: when the database must remain available at all times, one can choose a DB system that lets clients write data to one node without waiting for the other nodes to come into agreement; the DB system then takes care of data reconciliation a little later. This is the state of eventual consistency. In applications that can sacrifice data consistency in return for huge performance, one could select databases such as CouchDB or Cassandra.
  • 18. Row-Oriented • A column-oriented DBMS (or columnar database management system) is a database management system that stores data tables by column rather than by row. Practical use of a column store versus a row store differs little in the relational DBMS world. • RDBMS • Document-based store – stores documents made up of tagged elements. {Example: CouchDB} • Column-based store – each storage block contains data from only one column. {Example: HBase, Cassandra} • Graph-based – a network database that uses edges and nodes to represent and store data.
  • 19. Row-Oriented • MongoDB is schema-free, allowing you to create documents without having to first define a structure for them. At the same time, it still has many of the features of a relational database, including strong consistency and an expressive query language. • CouchDB: views in CouchDB are similar to indexes in SQL.
  • 23. Schema-Free • In schema-free databases such as MongoDB, you can simply add records without any previous structure. Moreover, you can group records that do not share the same structure: a collection (something like a table in a relational database, where you group records) can hold records of various shapes; in other words, they do not need to have the same columns (properties).
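A minimal mongo shell sketch of this (the people collection and all field names are illustrative): two documents with entirely different shapes live side by side in one collection, with no prior table definition or ALTER step.
    db.people.insertOne({ name: "Asha", title: "Software Engineer" })
    db.people.insertOne({ name: "Kyle", skills: ["mongodb", "neo4j"], remote: true })
    db.people.find()  // returns both documents despite their different fields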
  • 24. Schema Free – Mongo DB Notes MongoDB is a JSON-style data store. The documents stored in the database can have varying sets of fields, with different types for each field. And that’s true. But it doesn’t mean that there is no schema. There are in fact various schemas: •The one in your head when you designed the data structures •The one that your database really implemented to store your data structures •The one you should have implemented to fulfill your requirements Every time you realise that you made a mistake (see point three above), or when your requirements change, you will need to migrate your data. Let’s review again MongoDB’s point of view here: With a schemaless database, 90% of the time adjustments to the database become transparent and automatic. For example, if we wish to add GPA to the student objects, we add the attribute, resave, and all is well — if we look up an existing student and reference GPA, we just get back null. Further, if we roll back our code, the new GPA fields in the existing objects are unlikely to cause problems if our code was well written.
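The GPA example above translates directly to the shell; a hedged sketch (the students collection and fields are assumptions):
    db.students.updateOne({ name: "Kyle" }, { $set: { gpa: 3.7 } })  // add the new attribute to one document
    db.students.findOne({ name: "Asha" }).gpa  // older documents simply lack the field, so this reads back as null/undefined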
  • 25. Schema Free - Hadoop What Hadoop, NoSQL databases and other modern big data tools allow is for each application or user to come to the raw data with a different schema. Take call center logs as an example. Someone performing a columnar analysis on time and call length has a different interpretation of the schema than someone doing a row search for a specific call. But they aren't imposing a schema-on-read; rather, they're flexibly addressing different components of the schema to maximize their individual query performance. So, forget schema-less, schema-on-read and other nonsense that is of use only to theorists and niche players. Focus instead on providing ways for flexible database schemas to be integrated into the full business information pipeline.
  • 27. High Performance • mongostat is the most powerful utility. It reports real-time statistics about connections, inserts, queries, updates, deletes, queued reads and writes, flushes, memory usage, page faults, and much more. It is useful for quickly spot-checking database activity, seeing whether values are abnormally high, and making sure you have enough capacity. • mongotop returns the amount of time a MongoDB instance spends performing read and write operations, broken down by collection (namespace). This lets you make sure there is no unexpected activity and see where resources are consumed. All active namespaces are reported (frequency: every second by default).
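Both utilities ship with MongoDB and accept an optional polling interval in seconds, so a quick spot-check of a local instance might look like this (a sketch; adjust host and interval as needed):
    mongostat 5     # one row of server-wide counters every 5 seconds
    mongotop 30     # per-collection read/write time every 30 seconds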
  • 28. NoSQL Database Types: Key-Value, Document, Column Family, Graph
  • 30. Key-Value Stores • Most based on Dynamo white paper • Dynamo: Amazon’s Highly Available Key Value Store (2007) • Data Model • Global key-value mapping • Massively scalable HashMap • Highly Fault Tolerant • Examples • Riak, Redis, Voldemort
  • 31. Key Value Stores: Strengths and Weaknesses • Strengths • Simple data model • Horizontally scalable • Weaknesses • Simple data model • Poor at handling complex data
  • 32. Column Family • Based on Google’s BigTable white paper • BigTable: Distributed Storage System for Structured Data (2006) • Tuple of K-V where the key maps to a set of columns • Map Reduce for querying and processing • Examples • Cassandra, HBase, Hypertable
  • 33. Column Family: Strengths and Weaknesses • Strengths • Data model supports semi-structured data • Naturally indexed (columns) • Horizontally scalable • Weaknesses • Does not handle interconnected data well
  • 34. Document-oriented Databases • Data Model • Collection of documents • Document is a key-value collection • Index-centric • Examples • MongoDB, CouchDB, Lotus Notes?
  • 35. Document- oriented Databases: Strengths and Weaknesses • Strengths • Simple, but powerful data model • Good scalability • Weaknesses • Does not handle interconnected data well • Querying is limited to keys and indexes • MapReduce for large queries
  • 36. Graph Databases Data Model • Nodes with properties • Named relationships with properties Examples • Neo4j, Sones GraphDB, OrientDB, InfiniteGraph, AllegroGraph
  • 37. Graph Databases: Strengths and Weaknesses • Strengths • Extremely powerful data model • Performant when querying interconnected data • Easy to query • Weaknesses • Sharding • Rewiring your brain
  • 39. Typical Use Cases for Graph Databases • Recommendations • Business Intelligence • Social Computing • Master Data Management • Geospatial • Genealogy (Past and Present) • Time Series Data • Web Analytics • Bioinformatics • Indexing RDBMS
  • 40. Maturity of Data Models • Most NOSQL: ~6 years • Relational: 42 years • Graph theory: 276 years
  • 41. Leonhard Euler • Inventor of Graph Theory (1736) • Swiss mathematician • The original hipster
  • 43. Graph Data Model (diagram: Person, City, and Event nodes connected by Is Attending, Hosted In, Is Located In, and Rated relationships)
  • 44. Graph Data Model (the same diagram with properties: Person {firstName: kyle, lastName: adams}, Event {name: DevCon}, City {name: San Jose, country: USA}, and a Rated relationship {score: 11 out of 10, comment: Amazing!!!})
  • 45. What is Neo4j? •Leading Open Source graph database •Embeddable and Server •ACID compliant •White board friendly •Stable •Has been in 24/7 operation since 2003
  • 46. More Reasons Why Neo4j is Great •High performance graph operations •Traverse 1,000,000+ relationships/sec on commodity hardware •32 billion nodes & relationships per Neo4j instance •64 billion properties per Neo4j instance •Small footprint •Standalone server is ~65mb
  • 47. If NOSQL stands for Not Only SQL, ....then how do we execute queries?!
  • 50. Social Network Performance • First rule of fight club: • Run a friends of friends query • Second rule of fight club: • 1,000 Users • Third rule of fight club: • Average of 50 friends per user • Fourth rule of fight club: • Limit the depth of 5 • Fifth rule of fight club: • Intel i7 commodity laptop w/8GB RAM The Experiment: Round 1
  • 52. Social Network Performance SQL: friends of friends at depth 3
    select distinct uf3.*
    from t_user_friend uf1
    inner join t_user_friend uf2 on uf1.user_1 = uf2.user_2
    inner join t_user_friend uf3 on uf2.user_1 = uf3.user_2
    where uf1.user_1 = ?
  • 53. Social Network Performance MySQL Results: Round 1 – 1,000 Users
    Depth 2: 0.028 sec, ~900 records returned
    Depth 3: 0.213 sec, ~999 records returned
    Depth 4: 10.273 sec, ~999 records returned
    Depth 5: 92,613.150 sec, ~999 records returned
  • 55. Social Network Performance Neo4j Traversal API
    TraversalDescription traversalDescription = Traversal.description()
        .relationships("IS_FRIEND_OF", Direction.OUTGOING)
        .evaluator(Evaluators.atDepth(2))
        .uniqueness(Uniqueness.NODE_GLOBAL);
    Iterable<Node> nodes = traversalDescription.traverse(nodeById).nodes();
  • 56. Social Network Performance Neo4j Results: Round 1 – 1,000 Users
    Depth 2: 0.04 sec, ~900 records returned
    Depth 3: 0.06 sec, ~999 records returned
    Depth 4: 0.07 sec, ~999 records returned
    Depth 5: 0.07 sec, ~999 records returned
  • 57. Social Network Performance • First rule of fight club: • Run a friends of friends query • Second rule of fight club: • 1,000,000 Users • Third rule of fight club: • Average of 50 friends per user • Fourth rule of fight club: • Limit the depth of 5 • Fifth rule of fight club: • Intel i7 commodity laptop w/8GB RAM The Experiment: Round 2
  • 58. Social Network Performance MySQL Results: Round 2 – 1,000,000 Users
    Depth 2: 0.016 sec, ~2,500 records returned
    Depth 3: 30.267 sec, ~125,000 records returned
    Depth 4: 1,543.505 sec, ~600,000 records returned
    Depth 5: did not finish after an hour
  • 59. Social Network Performance Neo4j Results: Round 2 – 1,000,000 Users
    Depth 2: 0.010 sec, ~2,500 records returned
    Depth 3: 0.168 sec, ~110,000 records returned
    Depth 4: 1.359 sec, ~600,000 records returned
    Depth 5: 2.132 sec, ~800,000 records returned
  • 60. Introduction to ReDis (Key-Value Pair) Redis architecture contains two main processes: the Redis client and the Redis server. Client and server can run on the same computer or on two different computers. The Redis server is responsible for storing data in memory; it handles all kinds of management and forms the major part of the architecture.
  • 61. Redis with Clojure – Pub / Sub http://matthiasnehlsen.com/images/redesign2.png
  • 62. Performance of Redis Redis does a lot with very little CPU utilization. In a non-scientific test, I fired up 50 JVMs (on four machines) subscribing to the topic on which the TwitterClient publishes tweets with matched percolation queries. Then I changed the tracked term from the Twitter Streaming API to “love”, which reliably maxes out the rate of tweets permitted. Typically, with this term I see around 60 to 70 tweets per second. With 50 connected processes, 3000 to 3500 tweets were delivered per second overall, yet the CPU utilization of Redis idled somewhere between 1.7% and 2.3%.
  • 63. Introduction to HBase (Column Oriented Hadoop) HBase is a distributed, NoSQL, open-source database, initially conceived as an open-source alternative to Google's proprietary BigTable. Originally, HBase was part of the Hadoop project, but it was eventually spun off as a subproject. Given this legacy, it is not surprising that HBase is most often deployed on top of a Hadoop cluster (it uses HDFS as its underlying storage); however, a case study suggests that it can run on top of Amazon Elastic Block Store (EBS) as well. These days HBase is used by companies such as Adobe, Facebook, Twitter and Yahoo – and many others – to process large amounts of data in real time, since it is ideally placed to store the input and/or the output of MapReduce jobs.
  • 64. Introduction to HBase (Column Oriented Hadoop) • HDFS is a distributed filesystem; one can do most regular FS operations on it such as listing files in a directory, writing a regular file, or reading a part of a file. It's not simply "a collection of structured or unstructured data" any more than your EXT4 or NTFS filesystems are. • HBase is an in-memory key-value store which may persist to HDFS (this isn't a hard requirement; you can run HBase on any distributed filesystem). For any read-key request, HBase first checks its runtime memory caches to see if it has the value cached, and otherwise visits its stored files on HDFS to seek out and read the specific value. HBase offers various configurations to control how the cache is used, but its speed comes from a combination of caching and indexed persistence (faster, seek-based file reads). • HBase's file-based persistence on HDFS indexes keys automatically at write time, so no manual indexing is needed by its users. These files are regular HDFS files, but specialised in format for HBase's usage, known as HFiles.
  • 65. Introduction to HBase (Column Oriented Hadoop) HBase is a NoSQL, column oriented database built on top of hadoop to overcome the drawbacks of HDFS as it allows fast random writes and reads in an optimized way. https://s3.amazonaws.com/files.dezyre.com/images/blog/Overview+of+HBase+Architecture+and+its+Components/HBase+Architecture.jpg
  • 66. Introduction to Cassandra (Column- Oriented) Apache Cassandra is a free and open-source distributed NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. • It is scalable, fault-tolerant, and consistent. • It is a column-oriented database. • Its distribution design is based on Amazon's Dynamo and its data model on Google's Bigtable. • Created at Facebook, it differs sharply from relational database management systems. • Cassandra implements a Dynamo-style replication model with no single point of failure, but adds a more powerful "column family" data model. • Cassandra is used by some of the biggest companies, such as Facebook, Twitter, Cisco, Rackspace, eBay, Netflix, and more.
  • 69. Introduction to MongoDB (Document Oriented) • Document Oriented and NoSQL database. • Supports Aggregation • Uses BSON format • Sharding (Helps in Horizontal Scalability) • Supports Ad Hoc Queries • Schema Less • Capped Collection • Indexing (Any field in MongoDB can be indexed) • MongoDB Replica Set (Provides high availability) • Supports Multiple Storage Engines
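A minimal mongo shell sketch of the document model behind these features (the products collection and its fields are illustrative):
    db.products.insertOne({ sku: "A-1", tags: ["new", "sale"], stock: { qty: 12, loc: "BLR" } })
    db.products.find({ "stock.qty": { $gt: 10 } })  // ad hoc query on a nested field; no schema was declared
Any of those fields, including the nested stock.qty, could also be indexed with db.products.createIndex().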
  • 71. Why choose MongoDB? 1. Growth of MongoDB 2. Flexible data model 3. MongoDB features 4. Rich set of drivers and connectivity 5. Availability & uptime 6. Elastic scalability 7. Security http://www.habilelabs.io/choose-mongodb-databases/
  • 72. Introduction to Neo4j (Graph DB) • SQL-like, easy query language: Neo4j CQL • It follows the Property Graph data model • It supports indexes by using Apache Lucene • It supports UNIQUE constraints • It contains a UI to execute CQL commands: Neo4j Data Browser • It supports the full ACID (Atomicity, Consistency, Isolation and Durability) rules • It uses native graph storage with a native GPE (Graph Processing Engine) • It supports exporting query data to JSON and XLS formats • It provides a REST API that can be accessed from any programming language, such as Java, Spring, Scala etc. • It can be accessed from JavaScript UI/MVC frameworks such as Node.js • It supports two kinds of Java API: the Cypher API and the Native Java API for developing Java applications.
  • 76. Aggregation MongoDB stores data in BSON (Binary JSON) format, supports a dynamic schema and allows for dynamic queries. The Mongo Query Language is expressed as JSON and is different from the SQL queries used in an RDBMS. MongoDB provides an Aggregation Framework that includes utility functions such as count, distinct and group. Aggregation operations process data records and return computed results. They group values from multiple documents together and can perform a variety of operations on the grouped data to return a single result. MongoDB provides three ways to perform aggregation: the aggregation pipeline, the map-reduce function, and single purpose aggregation methods.
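As an illustration, a minimal aggregation pipeline in the mongo shell (the orders collection and its fields are assumptions): documents flow through the stages in order, each stage transforming the stream.
    db.orders.aggregate([
      { $match: { status: "A" } },                                // stage 1: keep only matching documents
      { $group: { _id: "$cust_id", total: { $sum: "$amount" } } } // stage 2: sum amount per customer
    ])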
  • 84. Map Reduce Map Reduce is a two-step process for breaking a problem statement down into a solution. Map: emit a key-value pair from every input record; the framework then groups the emitted values by key. Reduce: aggregate the grouped values for each key into a single result.
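In MongoDB the two steps are plain JavaScript functions passed to db.collection.mapReduce(); a small sketch (collection, fields and output name are assumptions):
    db.orders.mapReduce(
      function () { emit(this.cust_id, this.amount); },      // map: one (customer, amount) pair per order
      function (key, values) { return Array.sum(values); },  // reduce: total amount per customer
      { out: "order_totals" }                                // results land in the order_totals collection
    )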
  • 85. Big Data 101 What is Big Data? It is a new set of approaches for analysing data sets that were not previously accessible because they posed challenges across one or more of the "3 V's" of Big Data: • Volume – too big: terabytes and more of credit card transactions, web usage data, system logs • Variety – too complex: truly unstructured data such as social media, customer reviews, call center records • Velocity – too fast: sensor data, live web traffic, mobile phone usage, GPS data
  • 86. Big Data 101 Hadoop is just a file system – HDFS: read-optimised and failure-tolerant. (Diagram: a file distributed from a head node across several data nodes.)
  • 87. Big Data 101 Map + Reduce = Extract, Load + Transform. (Diagram: raw data fans out across mappers; the mappers' output flows into a reducer, which produces the final output.)
  • 88. HDInsight hands on Hadoop Streaming with C# – building the job and executing it; there is no need to unzip the data.
  • 89. HDInsight hands on Hadoop Streaming with C#
    C:\apps\dist\hadoop-1.1.0-SNAPSHOT\bin\hadoop.cmd jar C:\apps\dist\hadoop-1.1.0-SNAPSHOT\lib\hadoop-streaming.jar
      "-D mapred.output.compress=true"
      "-D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec"
      -files "asv://container@storage/user/hadoop/code/Sentiment_v2.exe"
      -numReduceTasks 0
      -mapper "Sentiment_v2.exe"
      -input "asv://container@storage.blob.core.windows.net/user/hadoop/data/"
      -output "asv://container@storage.blob.core.windows.net/user/hadoop/output/Sentiment"
  • 90. HDInsight hands on Hadoop Streaming with C# 276.0|5|bob|government 276.0|5|bob|telling 276.0|5|bob|opposed 276.0|5|bob|liberty 276.0|5|bob|obviously 276.0|5|bob|fail 276.0|5|bob|comprehend 276.0|5|bob|qualifier 276.0|5|bob|legalized 276.0|5|bob|curtis
  • 91. HDInsight hands on Using Pig to Enrich the data • Pig is a query language which shares some concepts with SQL • Invoked from the Hadoop command shell • No GUI • Does not do any work until it has to output a result set • Under the hood it executes MapReduce jobs
  • 92. HDInsight hands on Using Pig to Enrich the data with Sentiment scores • Load sentiment word lists and assign scores • Loading the data • Preprocess to get some key fields • Count words in various contexts and add sentiment value • Dump results to Azure Blob Storage
  • 93. Using Pig to Enrich the data Code sample: LOAD Operation
    data_raw = LOAD '<filename>' USING PigStorage('|')
        AS (filename:chararray, message_id:chararray, author_id:chararray, word:chararray);
  • 94. Using Pig to Enrich the data Code sample: JOIN Statement words_count_sentiment = JOIN words_count_flat BY words LEFT, sentiment BY sentiment_word;
  • 95. Using Pig to Enrich the data Code sample: SUM Operation message_sum_sentiment = FOREACH messages_grouped GENERATE group AS message_details, SUM(messages_joined.sentiment_value) AS sentiment;
  • 96. HDInsight hands on Outputting results to Hive • Hive is a near SQL compliant language with a lot of similarities • Again, under the hood issues MapReduce queries • Exposed to ODBC
  • 97. HDInsight hands on Outputting results to Hive • Create some Hive tables to reference the Pig Output • Use the Interactive console
  • 98. Outputting data to Hive Code review: CREATE EXTERNAL TABLE
    CREATE EXTERNAL TABLE words (
      word STRING,
      counts INT,
      sentiment INT
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '124'
    STORED AS TEXTFILE
    LOCATION 'asv://westburycorpus@westburycorpusnoreur.blob.core.windows.net/user/hadoop/pig_out/words';
  • 103. Big Data Technologies: BI Strengths and Weaknesses
  • 104. The Current Solutions Current database solutions are designed for structured data: • Optimized to answer known questions quickly • Schemas dictate form/context • Difficult to adapt to new data types and new questions • Expensive at petabyte scale (Chart: gigabytes of data created, in billions, 2005–2015; unstructured data dwarfs structured data, which accounts for roughly 10% of the total.)
  • 105. Main Big Data Technologies • Hadoop: low cost, reliable scale-out architecture; distributed computing; proven success in Fortune 500 companies; exploding interest • NoSQL Databases: huge horizontal scaling and high availability; highly optimized for retrieval and appending; types include document stores, key-value stores and graph databases • Analytic RDBMS: optimized for bulk-load and fast aggregate query workloads; types include column-oriented, MPP and in-memory
  • 106. Hadoop Core Components • Hadoop Distributed File System (HDFS): massive redundant storage across a commodity cluster • MapReduce: Map distributes a computational problem across a cluster; Reduce collects the answers to all the sub-problems on the master node and combines them • Many distros available
  • 107. Major Hadoop Utilities • Apache Hive: SQL-like language and metadata repository • Apache Pig: high-level language for expressing data analysis programs • Apache HBase: the Hadoop database; random, real-time read/write access • Sqoop: integrates Hadoop with RDBMS • Oozie: server-based workflow engine for Hadoop activities • Hue: browser-based desktop interface for interacting with Hadoop • Flume: distributed service for collecting and aggregating log and event data • Apache Whirr: library for running Hadoop in the cloud • Apache Zookeeper: highly reliable distributed coordination service
  • 109. "The working conditions can be shocking" – ETL Developer. Big Data Platform Challenges
  • 110. Challenges 1. Somewhat immature 2. Lack of tooling 3. Steep technical learning curve 4. Hiring qualified people 5. Availability of enterprise-ready products and tools 6. High latency (Hadoop) 7. Running inside the cluster
  • 111. Challenges Would you rather do this? Scheduling Modeling Ingestion / Manipulation / Integration … OR THIS?
  • 112. Compatibility with SPARK • The MongoDB Connector for Spark provides integration between MongoDB and Apache Spark. • With the connector, you have access to all Spark libraries for use with MongoDB datasets: Datasets for analysis with SQL (benefiting from automatic schema inference), streaming, machine learning, and graph APIs. You can also use the connector with the Spark Shell. • The MongoDB Connector for Spark is compatible with the following versions of Apache Spark and MongoDB:
    Connector 2.2.0 – Spark 2.2.x – MongoDB 2.6 or later
    Connector 2.1.0 – Spark 2.1.x – MongoDB 2.6 or later
    Connector 2.0.0 – Spark 2.0.x – MongoDB 2.6 or later
    Connector 1.1.0 – Spark 1.6.x – MongoDB 2.6 or later
  • 116. Spark – Bare metal https://weidongzhou.files.wordpress.com/2015/09/spark_engine.jpg
  • 121. Query Optimisation based on 3.6(Advanced functions) • Retryable Writes • Causal Consistency • Change Streams • New Aggregation Pipeline Stages and Operators • Performance Advisor • Default Bind to localhost • Array Updates
  • 122. Retryable Writes There’s always room for error when writing to a database even when you think you’ve got all your bases covered. With MongoDB 3.6, you no longer run the risk of executing an update twice because of network glitches and the like, thanks to the new Retryable Writes feature. Instead of the developer or the application, it’s now the driver itself that handles these system flukes. The MongoDB driver that comes with 3.6 can automatically retry certain failed write operations once, to recover from transient network errors or replica set failovers. The benefit here is all in the feature name: your writes will automatically be retried by MongoDB itself, so you don’t have to worry about any write inconsistencies.
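With a 3.6-compatible driver, retryable writes are requested through the connection string; a sketch (host names and database are placeholders):
    mongodb://host1:27017,host2:27017/test?replicaSet=rs0&retryWrites=true
Eligible single-document writes that fail with a transient error are then retried exactly once by the driver.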
  • 123. Causal Consistency Prior to MongoDB 3.6, reading from primaries was the only reliable way to go. Causal relationships between read and write operations as they occurred on primaries (and got replicated to secondaries) weren’t guaranteed. These could result in lags (e.g. writes to the primary not replicated to the secondaries, multiple secondaries writing updates at different times, etc.) which could make reading from secondaries inconsistent. This all changes with MongoDB 3.6, which in tl;dr format is: you can now also reliably read from secondaries. You can find the longer technical explanation here.
  • 124. Change Streams Just like you get notified about real-time changes for about almost anything these days, MongoDB is now also able to do the same through a feature called Change Streams. The benefit of Change Streams is immediately visible. You can now subscribe to changes in a collection and get notified. A new method, called watch, listens for these changes, notifies you, and can even trigger an automatic series of events, as defined in your change stream. Change streams can “listen” to five events for now (Insert, Delete, Replace, Update and Invalidate) and can only be set up based on user roles, which means that only those who have read access to collections can create change streams in those collections.
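A sketch of the watch method in the mongo shell (the orders collection is illustrative; change streams require a replica set):
    var watchCursor = db.orders.watch();
    while (!watchCursor.isExhausted()) {
      if (watchCursor.hasNext()) {
        printjson(watchCursor.next());  // one document per insert/update/replace/delete/invalidate event
      }
    }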
  • 125. New Aggregation Pipeline Stages and Operators MongoDB users can feel a bit more empowered by an aggregation pipeline that boasts new operators, stages, and an improved $lookup operator with even more powerful join capabilities. Studio 3T’s Aggregation Editor will of course support these new additions, the full list of which you can find in the MongoDB 3.6 Release Notes.
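One of those 3.6 additions is the more expressive form of $lookup, which can join on an arbitrary sub-pipeline instead of a single equality match; a hedged sketch (collections and fields are assumptions):
    db.orders.aggregate([
      { $lookup: {
          from: "warehouses",
          let: { order_item: "$item", order_qty: "$ordered" },  // variables visible inside the sub-pipeline
          pipeline: [
            { $match: { $expr: { $and: [
                { $eq: ["$stock_item", "$$order_item"] },
                { $gte: ["$instock", "$$order_qty"] } ] } } }
          ],
          as: "stockdata"                                       // joined documents land in this array field
      } }
    ])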
  • 126. Performance Advisor MongoDB’s Ops Manager comes bundled with Performance Advisor, a feature that alerts you about slow queries – meaning queries that take longer than the default slowOpThresholdMs of 100 milliseconds – and suggests new indexes to improve query performance. Indexes help speed up queries significantly, so having automated suggestions on how to optimize them is quite a leg-up. But there is a tradeoff to consider: the more indexes you have, the worse your write performance. And it’s still up to you – and not Performance Advisor – to strike the right balance.
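Relatedly, the same slow-operation threshold can be adjusted on a mongod itself through the database profiler; a small shell sketch:
    db.setProfilingLevel(1, 100)                         // profile operations slower than 100 ms
    db.system.profile.find().sort({ ts: -1 }).limit(5)   // inspect the most recent slow operations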
  • 127. Default bind to localhost In an effort to enforce security, MongoDB 3.6 now binds to localhost by default, so that only connections from clients running on the same machine are accepted unless you configure otherwise. Only clients from whitelisted IP addresses can connect to your unsecured databases externally; everything else is denied.
  • 128. Array Updates Nested arrays are easier to manipulate than ever in MongoDB 3.6. Now the query $type : "array" detects that a field is an array, unlike before, when it would only return documents whose array fields contained an element of BSON type array. MongoDB also introduced new operators which make updating all elements in an array much easier, with less code. We already made showing nested fields and exploring arrays easier with Studio 3T.
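A brief mongo shell sketch of both improvements (the students collection and fields are assumptions):
    db.students.find({ grades: { $type: "array" } })  // 3.6: matches documents where grades itself is an array
    db.students.updateMany(
      {},
      { $inc: { "grades.$[g]": 5 } },                 // bump every matching array element...
      { arrayFilters: [ { g: { $lt: 50 } } ] }        // ...selected by an array filter
    )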
  • 131. Amazon Cloud Features • Elastic web-scale computing • Completely controlled • Flexible cloud hosting services • Designed for use with other Amazon Web Services • Reliable • Secure • Inexpensive • Easy to start
  • 133. Google AppEngine Features • Popular languages and frameworks • Focus on your code • Multiple storage options • Powerful built-in services • Familiar development tools • Deploy at Google scale
  • 136. IBM SmartCloud Features • Expert cloud consulting • Private and hybrid clouds • IaaS, PaaS and SaaS • Speed • Empowerment • Economics
  • 140. Microsoft Cloud Features • Infrastructure services • Develop modern applications • Insights from data • Identity and access management
  • 149. Deep Diving into the functions of MongoDB 3.4: database commands and mongo shell methods
  • 150. Indexing Techniques Create Indexes to Support Your Queries An index supports a query when the index contains all the fields scanned by the query. Creating indexes that support your queries results in greatly increased query performance. Use Indexes to Sort Query Results To support efficient queries, use the strategies here when you specify the sequential order and sort order of index fields. Ensure Indexes Fit in RAM When your index fits in RAM, the system can avoid reading the index from disk and you get the fastest processing. Create Queries that Ensure Selectivity Selectivity is the ability of a query to narrow results using the index. Selectivity allows MongoDB to use the index for a larger portion of the work associated with fulfilling the query.
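A short mongo shell sketch tying these strategies together (the orders collection and its fields are assumptions):
    db.orders.createIndex({ cust_id: 1, amount: -1 })  // compound index supporting both the filter and the sort
    db.orders.find({ cust_id: 42 }).sort({ amount: -1 }).explain("executionStats")  // confirm an IXSCAN, not a COLLSCAN
    db.orders.totalIndexSize()  // sanity-check that the indexes still fit in RAM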
  • 151. Indexing Methods in the mongo Shell
    db.collection.createIndex() – builds an index on a collection.
    db.collection.dropIndex() – removes a specified index on a collection.
    db.collection.dropIndexes() – removes all indexes on a collection.
    db.collection.getIndexes() – returns an array of documents that describe the existing indexes on a collection.
    db.collection.reIndex() – rebuilds all existing indexes on a collection.
    db.collection.totalIndexSize() – reports the total size used by the indexes on a collection; a wrapper around the totalIndexSize field of the collStats output.
    cursor.explain() – reports on the query execution plan for a cursor.
    cursor.hint() – forces MongoDB to use a specific index for a query.
    cursor.max() – specifies an exclusive upper index bound for a cursor; for use with cursor.hint().
    cursor.min() – specifies an inclusive lower index bound for a cursor; for use with cursor.hint().
  • 152. Indexing Database Commands
    createIndexes – builds one or more indexes for a collection.
    dropIndexes – removes indexes from a collection.
    compact – defragments a collection and rebuilds the indexes.
    reIndex – rebuilds all indexes on a collection.
    validate – internal command that scans a collection's data and indexes for correctness.
    geoNear – performs a geospatial query that returns the documents closest to a given point.
    geoSearch – performs a geospatial query that uses MongoDB's haystack index functionality.
    checkShardingIndex – internal command that validates an index on the shard key.
  • 153. Geospatial Query Selectors
    $geoWithin – selects geometries within a bounding GeoJSON geometry. The 2dsphere and 2d indexes support $geoWithin.
    $geoIntersects – selects geometries that intersect with a GeoJSON geometry. The 2dsphere index supports $geoIntersects.
    $near – returns geospatial objects in proximity to a point. Requires a geospatial index; the 2dsphere and 2d indexes support $near.
    $nearSphere – returns geospatial objects in proximity to a point on a sphere. Requires a geospatial index; the 2dsphere and 2d indexes support $nearSphere.
  • 154. Indexing Query Modifiers
    $explain – forces MongoDB to report on query execution plans. See explain().
    $hint – forces MongoDB to use a specific index. See hint().
    $max – specifies an exclusive upper limit for the index to use in a query. See max().
    $min – specifies an inclusive lower limit for the index to use in a query. See min().
    $returnKey – forces the cursor to only return fields included in the index.
  • 155. Thanks !!! Keep in touch Rajesh30menon @YAHOO, GMAIL, HOTMAIL, SKYPE, TWITTER, INSTAGRAM, PINTEREST My blog : http://www.technospirituailty.com MY BOOKS : Link : https://goo.gl/bQ8cnM (Amazon.com) Link : https://goo.gl/owgMxT (Amazon.in) http://www.technospirituality.com 155

Editor's notes

  1. The following trends make it increasingly difficult to perform analytics with relational databases. More importantly, they make it nearly impossible to perform these analytics within the click stream (i.e. on-the-fly analysis and results).
  2. We are creating more data year after year. Storing and processing this data is becoming increasingly difficult for relational databases.
  3. The total amount of data grows and becomes more connected. However, it's losing some of its predictable structure. Blame generation Y! Yes, me. I don't want my information to fit into a 1970s-style database anymore; I want it to be all about me. This causes data to become more morphable.
  4. Before we start talking about NOSQL, let's give relational databases a little credit. Relational databases are still great for tabular data. Performance degrades as data becomes more deeply connected and voluminous. I'm not telling you to shy away from relational databases, but in this polyglot persistence world different use cases require different ways of storing and processing today's data.
  5. To find all friends at depth 5, MySQL will create a Cartesian product on the t_user_friend table 5 times, resulting in 50,000^5 records, out of which all but 1,000 are discarded. Neo4j, on the other hand, will simply visit nodes in the database, and when there are no more nodes to visit, it will stop the traversal.
  6. It's not magic; it's all about the data structures and how they're localized. Let's say we have about 50 people in the room and I ask you to count the people around you. It may take a few seconds to complete the task. But if we add 100 more people to the room, your ability to count the people around you is only slightly affected by the increase in the total number of people.
  7. 4 types of databases in the NOSQL universe: K-V Stores Column Family Store Document Databases Graph Databases Who here has worked with NOSQL stores before? For the people that raised their hand how many used... KV Stores? Column Family? Document DBs? Graph DBs? If you raised your hand for Graph DBs, then pat yourself on the back b/c that’s where I spend most of my time.
  8. Let's look at each of the types. It's a massively scalable HashMap.
  9. Strengths: again, it's a HashMap! If you understand how HashMaps work, then KV stores are relatively easy to adopt. Weaknesses: at the same time, the simple data structure is a weakness; it's difficult to represent complex and interconnected data.
  10. Essentially K-VVVVVVVVVV stores
  11. Strengths: Supports semi-structured data Weaknesses: Does not handle interconnected data well. You may pull your hair out trying to write code against these stores. However, the Spring Data project aims to reduce some of that complexity
  12. These are becoming more popular today. Contains documents, and a document is simply a key-value collection. Usually have great index support!!! Is there anyone out there that's still using Notes? Please say no. Notes was actually one of the early document databases. I suppose you can say that's one thing that isn't completely terrible from the Lotus products.
  13. Again we see this trend where all of these NOSQL stores do not handle interconnected data well. I wonder where this is going
  14. Finally we have graph databases. My little section of the NOSQL universe
  15. It has the richest data model of all the NOSQL types. Graphs are naturally mutable, which makes them extremely hard to shard. You can shard based on domains, but you would need to reduce the chances of creating relationships between the two graphs.
  16. In the following graph we see that KV stores are the best at scaling due to their simplistic data model and Graph databases are the worst at scaling because of the complexity and interconnectedness of the data. Even though Graphs DBs are the worst at scaling out of all of the NOSQL types, we’re still able to cover 90% of today’s use cases.
  17. Indexing relational DBs: Some people classify SOLR as a NOSQL store
  18. The relational model is quite mature, but graph theory is much older. So when your boss says that you can't use a graph database because they're not mature enough, just tell him that he needs to check his facts.
  19. This is my homeboy Leonhard Euler. Inventor of Graph Theory, swiss math ninja, Volvo lover, and apparently from his choice in clothing, he’s also the original hipster. But I’ll let that one slide.
  20. What you draw on the white board is what you implement in your code. And truthfully, this was the main reason why I was attracted to graph databases in the first place. I constantly found myself in the position where I would map out my domain on a white board, spend a ton of time normalizing my tables thinking I was this total SQL badass ninja, then I would deploy to production and performance would be horrible. Then I would have to denormalize the crap out of my database, and before I knew it a week had already passed.
  21. And more specifically how to query a Graph database? Some of you already know this comic, but I have to give credit to the Basho Riak team for having a nerdy sense of humor.
  22. The real answer for the Graph db world is traversals
  23. This brings us to an experiment in which Neo Technology has benchmarked performance of MySQL and Neo4j in a social graph
  24. We want to run a query that finds all of the friends of Kyle, then the friends of his friends, and so on.
  25. We have a table that stores all users and another table that stores primary and foreign keys that map the friendships
  26. This is an example of the SQL query used at depth 3: find friends of friends of friends of a particular user.
  27. find friends of friends of friends of the user We see a dramatic decrease in performance the more inner joins we add to the query.
  28. For Neo4j the social network is a typical graph
  29. Neo4j's traversal API is used to return a result set. IS_FRIEND_OF = traverse relationships that are typed "IS_FRIEND_OF". Evaluators.atDepth(2) = how you limit the depth. Uniqueness.NODE_GLOBAL = means a node cannot be traversed more than once. traverse(nodeById) = the id of the node where we want to start our traversal.
  30. So let's look at Neo4j's performance. We see that performance is relatively unaffected as we increase the depth of traversal.
  31. We perform the same queries but we increase the total amount of users to 1 million. In MySQL we will have 1,000,000 records in t_user table, and approximately 1,000,000 X 50 = 50,000,000 records in t_user_friend table.
  32. 1,543.505 sec is roughly 25 minutes. Depth five didn't finish after running for an hour.
  33. For Neo4j we have a linear increase in execution time.
  34. TAKE-AWAYS Pentaho provides complete integrated DI+BI for every leading big data platform.
  35. Big Data solutions are not databases. They don’t provide the capabilities that BI toolsets expect of a database. Hadoop also has a high latency. This means the smallest query possible has an execution time that is much slower than that of a database Hadoop is optimized for executing very intensive data processing tasks on very large amounts of data. It is not optimized for quick queries. Some Hadoop experts recommend configuring the workloads so that Hadoop jobs take an hour or more. This conflicts with OLAP performance criteria of 5-10 seconds per query. There are database implementations within the Hadoop world, Hive, HBase etc.
  36. Unfortunately for developers who are used to working with data transformation tools, the productivity within the Hadoop environment is not what they are used to.
  37. TAKE-AWAYS The better choice is obviously visual development
  38. Focus on your code
  39. Focus on your code
  40. Focus on your code