Safely exploiting the power of data and information is vital to your business. In this seminar, Iain, Tim and David will look at the advantages and risks presented to your business by 'Big Data' and its potential for data analysis.
They will investigate a number of uses of NoSQL databases to support this analysis, and will discuss other business considerations, including how to distribute data across multiple locations and how to significantly reduce the potential cost of storage.
The seminar concludes with an overview of modern ways of securing your data assets, including different authentication methods and encryption.
4. What is Data?
• Data is everywhere
• You have more than you think
• It’s your biggest asset
5. So What Is “Big” Data?
• Many Definitions
• Study by Ward & Barker of St Andrews
• “Big data is a term describing the storage
and analysis of large and or complex data
sets using a series of techniques
including, but not limited to: NoSQL,
MapReduce and machine learning.”
6. So What Is “Big” Data?
• We have a huge amount of data:
– 90% of data was created in the last two years
– 2.5 exabytes (2.5×10^18 bytes) of data created every day
• Data Analysis on a huge scale
10. How and Why Big Data is Used
• Healthcare
• Scientific Research
(Folding@Home, SETI)
• Market Research
• Business Operation Optimisation
11. Why to use Big Data
• Investigative and Predictive
• Increasing access to public data
• Enables high-level understanding of previously unfathomable datasets
12. Why Not to Use Big Data
• Expensive
• Limited Pool of talent
• Not always applicable
• Must be used correctly: Correlation does not mean
causation
...yaaaarrrr?!
13. Conclusion
• Big Data technologies may or may not be
right for you
• But the principles are universal:
– Gather your data
– Use novel sources such as Social Media and public data initiatives
– Analyse it intelligently
25. Column stores
Item Name        Number Of Sales  Total Cost (£)  Total Revenue (£)  Origin
Orange Juice     152,000          76,000          152,000            Spain
Apple Juice      137,000          54,800          123,300            UK
Pineapple Juice  63,000           37,800          78,750             Brazil
Grape Juice      84,000           46,200          92,400             Spain
26. Column stores
Item Name        Number Of Sales  Total Cost (£)  Total Revenue (£)  Origin
Orange Juice     152,000          76,000          152,000            Spain
Apple Juice      137,000          54,800          123,300            UK
Pineapple Juice  63,000           37,800          78,750             Brazil
Grape Juice      84,000           46,200          92,400             Spain
Total Number Of Sales = 436,000
28. Column stores
Item Name        Number Of Sales  Total Cost (£)  Total Revenue (£)  Origin
Orange Juice     152,000          76,000          152,000            Spain
Apple Juice      137,000          54,800          123,300            UK
Pineapple Juice  63,000           37,800          78,750             Brazil
Grape Juice      84,000           46,200          92,400             Spain
Profit = Total Revenue - Total Cost = £446,450 - £214,800 = £231,650
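The two aggregates on the column store slides can be reproduced with a small sketch. It is not tied to any particular column-store product: the `columns` dictionary and the helper names `column_total` and `total_profit` are illustrative assumptions. It only shows why keeping each column together makes totals cheap to compute.

```python
# Each column is stored as its own list, so an aggregate reads one list
# instead of scanning every field of every row.
# The figures are the ones shown on the slides; the layout is an assumption.
columns = {
    "item_name": ["Orange Juice", "Apple Juice", "Pineapple Juice", "Grape Juice"],
    "number_of_sales": [152_000, 137_000, 63_000, 84_000],
    "total_cost": [76_000, 54_800, 37_800, 46_200],
    "total_revenue": [152_000, 123_300, 78_750, 92_400],
    "origin": ["Spain", "UK", "Brazil", "Spain"],
}


def column_total(store, name):
    """Sum a single column without touching the others."""
    return sum(store[name])


def total_profit(store):
    """Profit is total revenue minus total cost, using just two columns."""
    return column_total(store, "total_revenue") - column_total(store, "total_cost")


if __name__ == "__main__":
    print(column_total(columns, "number_of_sales"))  # 436000
    print(total_profit(columns))                     # 231650
```

A row-oriented layout would have to read every record, including item names and origins, to answer the same two questions.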
35. Case Study: Middle Earth University
• Introduction to Alchemy: Wed 11AM
• Advanced Alchemy: Wed 1PM
• World Domination: Wed 9AM
• Introduction to Magic: Wed 11AM
• Advanced Magical Techniques: Wed 9AM
36. Case Study: Middle Earth University
All courses running at 11AM on Wednesday:
• Introduction to Alchemy: Wed 11AM
• Introduction to Magic: Wed 11AM
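The "all courses running at 11AM on Wednesday" lookup can be sketched as a tiny in-memory graph. This is an illustration only, not the API of any graph database: the `TinyGraph` class, the `RUNS_AT` link label and `build_timetable` are hypothetical names, and the course and time nodes are the ones shown on the case study slides.

```python
from collections import defaultdict


class TinyGraph:
    """A minimal undirected graph with labelled links (illustrative only)."""

    def __init__(self):
        # Maps (node, link label) to the set of nodes reached by that link.
        self.edges = defaultdict(set)

    def link(self, a, label, b):
        """Connect two nodes in both directions under the given label."""
        self.edges[(a, label)].add(b)
        self.edges[(b, label)].add(a)

    def neighbours(self, node, label):
        """Return every node linked to `node` by `label`, in sorted order."""
        return sorted(self.edges.get((node, label), ()))


def build_timetable():
    """Load the course and time-slot nodes from the case study slides."""
    graph = TinyGraph()
    for course, slot in [
        ("Introduction to Alchemy", "Wed 11AM"),
        ("Advanced Alchemy", "Wed 1PM"),
        ("World Domination", "Wed 9AM"),
        ("Introduction to Magic", "Wed 11AM"),
        ("Advanced Magical Techniques", "Wed 9AM"),
    ]:
        graph.link(course, "RUNS_AT", slot)
    return graph


if __name__ == "__main__":
    timetable = build_timetable()
    # Start at the time-slot node and follow its links back to the courses.
    print(timetable.neighbours("Wed 11AM", "RUNS_AT"))
```

The query starts from the time node and follows its links, rather than scanning every course and comparing times.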
38. Case Study: Middle Earth University
Courses:
• Introduction to Alchemy: Wed 11AM
• Advanced Alchemy: Wed 1PM
• World Domination: Wed 9AM
• Introduction to Magic: Wed 11AM
• Advanced Magical Techniques: Wed 9AM
Degrees: BMag, MMag, DMag
39. Case Study: Middle Earth University
DMag:
• Advanced Alchemy: Wed 1PM
• Advanced Magical Techniques: Wed 9AM
41. Case Study: Middle Earth University
Courses:
• Introduction to Alchemy: Wed 11AM
• Advanced Alchemy: Wed 1PM
• World Domination: Wed 9AM
• Introduction to Magic: Wed 11AM
• Advanced Magical Techniques: Wed 9AM
Rooms: Shire Lecture Hall, Mordor Seminar Room
Degrees: BMag, MMag, DMag
42. Case Study: Middle Earth University
Mordor Seminar Room:
• Introduction to Alchemy: Wed 11AM
• Advanced Alchemy: Wed 1PM
43. Is NoSQL for everyone?
• Most businesses function effectively using only relational databases.
• NoSQL is not the grand solution to all data storage problems.
• You need to train or employ staff with NoSQL knowledge.
45. NoSQL is showing significant promise for
certain aspects of almost any business.
46. Reasons to use NoSQL in your
business
• Potentially significant financial savings.
• Easy to adapt stored data as your business grows and your priorities change.
• Can exceed the performance of popular commercial relational databases.
47. Reasons to use NoSQL in your
business
Effective tool for a holistic approach to
analysing the growth/status of your business.
48. Reasons to use NoSQL in your
business
Relational databases are
not the only solution to
your data storage
problems.
50. Why Is Data Security Important
• The cost of a data breach is continuing to rise
• Fewer customers remain loyal after a data
breach
• Reputation losses and diminished goodwill – lost business cost has steadily increased over the last 6 years (£500 thousand in 2007)
• Malicious or criminal attacks are the most costly
51. The Cost Of a Data Breach
2013 Cost Of Data Breach Study: United Kingdom [Ponemon Institute, May 2013]
54. The Causes Of a Data Breach
2013 Cost Of Data Breach Study: United Kingdom [Ponemon Institute, May 2013]
55. Current Methods Of Authentication
1. Basic User Name and Passwords
2. Biometrics
• Fingerprint Scanners
• Voice recognition
• Face scanning and recognition
• Retina and iris scans
3. Multi-Factor Authentication
• Something possessed, as in a physical token or telephone
• Something known, such as a password or mother’s maiden name
• Something inherent, like a biometric trait
56. Pros and Cons Of These Methods
1. Standard username and password authentication is extremely vulnerable to rainbow table attacks
2. Relies on the ability of the system users to pick secure passwords
Adobe Crossword
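Rainbow table attacks rely on precomputed tables of hashes for common passwords, so they work best when every user's password is hashed in the same way. One common mitigation, which the slides do not cover, is to store a random per-user salt alongside a deliberately slow hash. The sketch below uses only the Python standard library; the function names and the iteration count are assumptions chosen for illustration.

```python
import hashlib
import hmac
import os

# Assumed work factor for illustration; tune it for your own hardware.
ITERATIONS = 100_000


def hash_password(password, salt=None):
    """Return (salt, digest) using PBKDF2-HMAC-SHA256 with a per-user salt."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt for every stored password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Re-derive the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


if __name__ == "__main__":
    salt_a, digest_a = hash_password("correct horse")
    salt_b, digest_b = hash_password("correct horse")
    # The same password gives different stored values, so one precomputed
    # table cannot be reused across users.
    print(digest_a != digest_b)
    print(verify_password("correct horse", salt_a, digest_a))
```

Salting does not rescue a weak password from targeted guessing, so the second point on the slide still applies.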
57. Pros and Cons Of These Methods
• In theory, biometrics is a great way to authenticate a user. It's impossible to lose your fingerprints, unless you have both your hands chopped off.
58. The Best Solution
• Multi-factor Authentication. A security measure that requires two or more
kinds of evidence that you are who you say you are.
• Authentication requires a combination of these bits of evidence rather
than simply using one or the other.
• Something you know – Username, Password
• Something you have – An RSA Key, Credit Card
• Something inherent – A fingerprint, retina scan
• Multi-factor Authentication is very secure, but it is hard to implement
everywhere.
• Requires users to remember to carry their RSA keys with them.
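As a rough illustration of combining "something you know" with "something you have", the sketch below checks a password result together with a time-based one-time code. It uses the openly specified HOTP/TOTP algorithms (RFC 4226 and RFC 6238) as a stand-in for a hardware token; it is not a description of how any particular RSA or vendor token works internally, and the `login` helper is a hypothetical name.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret, counter, digits=6):
    """HMAC-based one-time password (RFC 4226) for a given counter value."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low nibble picks the offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret, now=None, step=30, digits=6):
    """Time-based one-time password (RFC 6238): HOTP over 30-second steps."""
    if now is None:
        now = time.time()
    return hotp(secret, int(now) // step, digits)


def login(password_ok, submitted_code, secret, now=None):
    """Succeed only when BOTH factors check out (hypothetical helper)."""
    code_ok = hmac.compare_digest(submitted_code, totp(secret, now))
    return password_ok and code_ok


if __name__ == "__main__":
    shared_secret = b"12345678901234567890"  # RFC test secret, not for real use
    print(totp(shared_secret, now=59, digits=8))  # RFC 6238 test vector: 94287082
```

The token and the server share the secret in advance, so a stolen password alone is not enough to log in.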
59. Emerging Methods Of
Authentication
• YubiKey – Authentication method based on a unique physical token which
cannot be duplicated or recorded, providing a credential based on
something only an authorised user possesses.
• Can also be used with password managers such as LastPass
61. Quantum Cryptography
What is Quantum Cryptography?
The use of quantum mechanical effects to perform cryptographic tasks
or to break cryptographic systems
What does that mean exactly?
• Using physics rather than mathematics to perform cryptographic
tasks, such as generating cryptographic keys
• Moreover, quantum cryptography addresses the problem of key distribution
62. Quantum Cryptography
How does it work?
• It works by using a technique called Quantum Key Distribution (QKD).
QKD enables two parties to produce a shared random secret key which is
only known to them. They can then use this key to encrypt and decrypt
messages passed between those parties.
• Keys are generated using photons, which are produced using LEDs. These photons are then polarised using polarising filters and transmitted
• The two parties decide which filters are going to be used, and assign a value, usually a binary value, to each photon with a given polarisation
• When the whole transmission has happened a unique key has been
produced
63. Quantum Cryptography
What are the benefits of using Quantum Cryptography?
• An important property of quantum cryptography is the ability to detect the
presence of a third party attempting to eavesdrop on the transmission of
the secret key
• This is achieved because of a fundamental principle of quantum
mechanics – the process of measuring a quantum system in general
disturbs the system.
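The key exchange and eavesdropper detection described on these slides can be imitated with a classical simulation. The sketch below is loosely modelled on the BB84 protocol, which the slides do not name, so treat that as an assumption. It uses random numbers in place of real photons, and the names `run_qkd`, `measure` and `error_rate` are illustrative.

```python
import random

BASES = "+x"  # two filter orientations: rectilinear (+) and diagonal (x)


def measure(bit, prepared_basis, measuring_basis, rng):
    """Matching filters read the bit reliably; mismatched ones give a random bit."""
    return bit if prepared_basis == measuring_basis else rng.randint(0, 1)


def run_qkd(n_photons, eavesdrop, seed=0):
    """Simulate the exchange and return (sender_key, receiver_key) after sifting."""
    rng = random.Random(seed)
    sender_key, receiver_key = [], []
    for _ in range(n_photons):
        bit, send_basis = rng.randint(0, 1), rng.choice(BASES)
        in_flight_bit, in_flight_basis = bit, send_basis
        if eavesdrop:
            # Measuring the photon disturbs it: it continues in the
            # eavesdropper's basis, carrying whatever value was read.
            spy_basis = rng.choice(BASES)
            in_flight_bit = measure(bit, send_basis, spy_basis, rng)
            in_flight_basis = spy_basis
        receive_basis = rng.choice(BASES)
        result = measure(in_flight_bit, in_flight_basis, receive_basis, rng)
        if receive_basis == send_basis:  # keep only photons where filters matched
            sender_key.append(bit)
            receiver_key.append(result)
    return sender_key, receiver_key


def error_rate(key_a, key_b):
    """Fraction of positions where the two sifted keys disagree."""
    return sum(a != b for a, b in zip(key_a, key_b)) / len(key_a)


if __name__ == "__main__":
    print(error_rate(*run_qkd(2000, eavesdrop=False)))  # 0.0 when undisturbed
    print(error_rate(*run_qkd(2000, eavesdrop=True)))   # roughly a quarter
```

In this model the two parties compare a sample of their sifted keys; a noticeably non-zero error rate signals that someone measured the photons in transit.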
66. Upcoming Seminars
• Capturing the Real Value of IT Service Management: Friday 14th February
• Preparing for BYOD & Mobile Device Management: Friday 28th February
Editor's notes
- There’s been a data revolution: it’s everywhere
- Huge variety of sources
  - News aggregators, social media, search engines, the internet
  - New government initiatives (data.gov.uk)
  - Public data
  - Geographical, weather, transport, literature, historical records
- You have more than you think:
  - Customer information
  - Sales and financial data
  - Employee data
  - Stock
  - Intellectual property
  - Logistics
  - Sensors (Internet of Things)
- Biggest asset
  - Has monetary value
  - Can be used in a huge variety of ways to improve business
- Good question
- As many definitions as people you ask
- Jonathan Ward and Adam Barker of St Andrews University did a study
- Asked big vendors
- Oracle: “relational + unstructured data combined for Business Intelligence”
- Microsoft: “applying Artificial Intelligence and distributed computing to large datasets”
- Most definitions technology focused
- Vague
- Study concluded with *click for quote
- Huge amounts of data nowadays
- Need new techniques to analyse and store it
Did you know:
- 90% of data was generated in the last 2 years
- 2.5 exabytes (more than 2.7 trillion megabytes, or 17,179,869 iPod classics of 160 GB)
- Data analysis on a huge scale
- Healthcare: Predict trends in diseases and effectiveness of treatment, e.g. UK Biobank, which collected medical, lifestyle and geographical data of 500,000 people to find what causes the development of major diseases, and the effectiveness of different treatments on them
- Scientific Research: Folding@Home and SETI@Home
- Market Research:
  - Billion Prices Project (MIT)
  - Twitter sentiment
  - Google Analytics
- Business Operation Analysis and Optimisation:
  - Tesco: predicting stock
  - FedEx: package tracking and logistics optimisation
  - Amazon: stock layout optimisation
- Advertising: Google AdWords and Facebook Ad Audiences
- Hadoop/MapReduce
- NoSQL
- Distributed Computing and Virtualisation
- Hardware/expertise expensive
- Few Big Data specialists (but growing)
- Not always the right tool (do you have “BIG DATA”?)
- Causation: Remember the pirates?...
Key Points:
- You have more data than you think
- You can do a lot more with it than you think
So:
- Gather data on you and your customers
- Use the analytical approach of Big Data to make informed business decisions
Bill Clinton was in office, Ayrton Senna died in an accident during the San Marino Grand Prix, China got its first connection to the internet and The Lion King was released in cinemas.
To guarantee reliability.
ACID:
- Atomicity requires that each transaction is "all or nothing".
- Consistency ensures that any transaction will bring the database from one valid state to another.
- Isolation ensures that the concurrent execution of transactions results in a system state that would be obtained if transactions were executed serially, i.e. one after the other.
- Durability means that once a transaction has been committed, it will remain so.
BASE (Basically Available, Soft state, Eventual consistency)
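The "all or nothing" property can be seen in a few lines with Python's built-in `sqlite3` module. This is a minimal sketch: the `accounts` table, the names in it and the `transfer` helper are hypothetical, and the `CHECK` constraint exists only to force a failure halfway through a transaction.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    with conn:
        conn.executemany(
            "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
        )
    return conn


def transfer(conn, src, dst, amount):
    """Move money as one transaction; return False if it had to be rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the credit to dst was undone together with the failed debit


def balances(conn):
    """Return the current balances as a dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


if __name__ == "__main__":
    db = make_demo_db()
    print(transfer(db, "alice", "bob", 30), balances(db))
    print(transfer(db, "alice", "bob", 500), balances(db))
```

In the second transfer the credit is applied first and the debit then violates the constraint, so the whole transaction is rolled back and neither balance changes.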
- Twitter: Apache Cassandra. Our geo team uses it to store and query their database of places of interest. The research team uses it to store the results of data mining done over our entire user base. Those results then feed into things like @toptweets and local trends. Our analytics, operations and infrastructure teams are working on a system that uses Cassandra for large-scale real time analytics for use both internally and externally.
- Facebook: HBase. Messaging platform introduced at the end of 2010. Can deal with very high throughputs.
- BBC: CouchDB. The BBC is building a new environment that allows cost-effective building of dynamic content platforms.
- The Guardian: MongoDB. Storage of articles.
- NASA: AllegroGraph. Storing assets and being able to provide a meaningful search through links between a variety of different kinds of assets from software to drawings to documents to employee skills.
This is a very simple method of storing data: a unique key and the data associated with that key. There is an inherent expectation of being distributed over many machines, which gives highly available data stores with minimal downtime.
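A toy version of a distributed key-value store is sketched below to show how spreading keys over several machines can keep data available when one machine fails. It is illustrative only: `TinyKeyValueCluster`, the replica count and the way nodes are marked as down are assumptions, and real products use more sophisticated placement and recovery schemes.

```python
import hashlib


class TinyKeyValueCluster:
    """Keys are hashed onto in-memory 'nodes'; each key is kept on two of them."""

    def __init__(self, node_count=3, replicas=2):
        self.nodes = [dict() for _ in range(node_count)]
        self.replicas = replicas
        self.down = set()  # indexes of nodes currently marked as unavailable

    def _owners(self, key):
        """Pick the nodes for a key from a stable hash of the key itself."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        first = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return [(first + i) % len(self.nodes) for i in range(self.replicas)]

    def put(self, key, value):
        """Write the value to every available node that owns the key."""
        for index in self._owners(key):
            if index not in self.down:
                self.nodes[index][key] = value

    def get(self, key):
        """Read from the first available owner that holds the key."""
        for index in self._owners(key):
            if index not in self.down and key in self.nodes[index]:
                return self.nodes[index][key]
        raise KeyError(key)


if __name__ == "__main__":
    cluster = TinyKeyValueCluster()
    cluster.put("customer:42", {"name": "Frodo"})
    # Simulate the primary owner failing; the replica still answers.
    cluster.down.add(cluster._owners("customer:42")[0])
    print(cluster.get("customer:42"))
```

Because the owning nodes are derived from the key alone, any client can locate the data without a central index.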
Data is stored by column, rather than by row. Ideal for sparsely populated databases. Large reductions in the storage requirements. Very good for finding aggregate values. PIVOT TABLE!!! A use for your business: a single database containing anything purchased by your company over the past year. You can quickly do analysis, such as average spend per employee or total amount spent by a particular department. Rather than pull in selected data from all over your business into a data warehouse in order to do analysis, you could store all the varied information in a single data store and run all kinds of analysis.
A more holistic approach to storing data. Documents with varying data and structures can be kept together. You are not penalised as your business grows and your data model changes. Great for storing your business assets. FILING CABINET. Get a picture of a medical store.
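The "filing cabinet" idea can be sketched with plain Python dictionaries standing in for documents. The records and the `find` helper below are hypothetical examples rather than the query API of any document database. The point is only that documents with different fields can sit side by side and still be searched.

```python
# Hypothetical business documents with different shapes kept in one collection.
documents = [
    {"type": "invoice", "number": 1001, "customer": "Bag End Ltd", "total": 250.0},
    {"type": "employee", "name": "Sam", "department": "Gardening", "skills": ["pruning"]},
    # A later invoice gains a new field without any change to the older ones.
    {"type": "invoice", "number": 1002, "customer": "Bag End Ltd", "total": 99.5,
     "currency": "GBP"},
]


def find(docs, **criteria):
    """Return every document whose fields match all the given criteria."""
    return [
        doc for doc in docs
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


if __name__ == "__main__":
    print(len(find(documents, type="invoice")))
    print(find(documents, department="Gardening")[0]["name"])
```

Documents that lack a field are skipped by a query on that field, which is how the data model can change without rewriting older records.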
Where the links between data (edges) become as important as the data itself (nodes). Specialised data stores, particularly suited to social networks. If it is important to know exactly the relationship between one piece of data and another, this may be the solution to your problem. There is inherent value in the links, and their state can change. Unknown link.
Distribute your database over a number of (lower-cost) machines -> ‘always on’ solution. Reduces downtime and, hence, risk.
ID Quantique: a Swiss company that developed a machine which was used in the Swiss parliamentary election to securely pass on the results of the election. This was done by using quantum cryptography. It works by using a technique called Quantum Key Distribution (QKD). QKD enables two parties to produce a shared random secret key which is only known to them. They can then use this key to encrypt and decrypt messages passed between those parties. Keys are generated by using photons, which are produced using LEDs. These photons are then polarised using polarising filters. An important property of quantum cryptography is the ability to detect the presence of a third party who is attempting to eavesdrop on the transmission of the secret key, and who would otherwise be able to encrypt and decrypt messages themselves. Detection is possible because of a fundamental principle of quantum mechanics: the process of measuring a quantum system in general disturbs the system.