Big Data on The Cloud
1. Big data on the Cloud
Dr. Putchong Uthayopas
Department of Computer Engineering, Faculty of
Engineering, Kasetsart University.
Email: pu@ku.ac.th
2. We are living in the world of Data
• Video surveillance
• Social media
• Mobile sensors
• Smart grids
• Geophysical exploration
• Medical imaging
• Gene sequencing
3. Why now?
• The Internet makes it possible to gather data
together at a scale never seen before.
– Data from humans
– Data from sensors
• Crowdsourcing is now common practice
– User-generated data is flooding the world
• New devices and tools make it easy to generate
data
4. Big Data
“Big data is data that exceeds the processing
capacity of conventional database systems. The
data is too big, moves too fast, or doesn’t fit the
strictures of your database architectures. To gain
value from this data, you must choose an
alternative way to process it.”
Reference: “What is big data? An introduction to the big data
landscape.”, Edd Dumbill,
http://radar.oreilly.com/2012/01/what-is-big-data.html
5. Amazon View of Big Data
'Big data' refers to a collection of tools,
techniques and technologies which make it easy
to work with data at any scale. These distributed,
scalable tools provide flexible programming
models to navigate and explore data of any
shape and size, from a variety of sources.
9. Information as an Asset
• The cloud will enable ever larger data sets to be
easily collected and used
• People will deposit information into the cloud
– Like a bank or a personal warehouse
• New technology will emerge
– Larger, more scalable storage technology
– Innovative and complex data analysis/visualization
for multimedia data
– Security technology to ensure privacy
• The cloud will become mankind's intelligence and memory!
11. Google Cloud Platform
• App Engine
– Mobile and web apps
• Cloud SQL
– MySQL on the cloud
• Cloud Storage
– Data storage
• BigQuery
– Data analysis
• Google Compute Engine
– Processing of large data
12. Amazon
• Amazon EC2
– Computation service using virtual machines (VMs)
• Amazon DynamoDB
– Large, scalable NoSQL database
– Fully distributed, shared-nothing architecture
• Amazon Elastic MapReduce (Amazon EMR)
– Hadoop-based analysis engine
– Can be used to analyse data from DynamoDB
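The Hadoop-based engine above builds on the MapReduce programming model. The following is a minimal single-process sketch of that model using a word count; it does not use any Amazon EMR or Hadoop API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical names chosen for illustration. In a real cluster the map and reduce steps would run in parallel on many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit one (word, 1) pair for every word in a single input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence over an iterable of records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    sample = ["big data on the cloud", "data is on the cloud anyway"]
    print(run_job(sample))
```

Because each call to `map_phase` depends only on its own record, and each call to `reduce_phase` only on its own key, both steps can be spread across machines without coordination, which is what makes the model suitable for large data.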
13. Trends
• A move toward large and scalable virtual
infrastructure
– Providing computing services
– Providing basic storage services
– Providing large, scalable databases
• NoSQL
– Providing analysis services
• All these services have to come together
– Big data cannot be moved!
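One common way a scalable, shared-nothing NoSQL store spreads data is to hash each record key to a node, so that every node owns its own slice of the data. The sketch below illustrates that idea only; it is not the scheme of any particular product, and `node_for_key` and `partition` are hypothetical helpers. Simple modulo placement is assumed here for brevity, whereas production stores generally use more elaborate schemes such as consistent hashing to limit data movement when nodes are added.

```python
import hashlib


def node_for_key(key: str, num_nodes: int) -> int:
    """Map a key to a node index using a stable hash (assumed scheme: hash mod N)."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition(records: dict, num_nodes: int) -> list:
    """Distribute key/value records over num_nodes independent in-memory 'nodes'."""
    nodes = [{} for _ in range(num_nodes)]
    for key, value in records.items():
        nodes[node_for_key(key, num_nodes)][key] = value
    return nodes


if __name__ == "__main__":
    data = {f"user:{i}": {"visits": i} for i in range(10)}
    for index, node in enumerate(partition(data, 3)):
        print(index, sorted(node))
```

A lookup only needs to recompute `node_for_key` to find the node holding a key, so no node has to share state with another.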
14. Issues
• Security
– Will you let important data accumulate outside your
organization?
• If the data is not important, why analyze it?
– Who owns the data? If you discontinue the service, is the data
destroyed properly?
– Protection in a multi-tenant environment
• Big data cannot be moved easily
– Processing has to be near the data. You cannot just ship data around
• So you end up having to select the same cloud for your processing. Is it
available, easy, fast?
• New learning and development costs
– Need for new programming or porting?
– Are the tools mature enough?
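The point that big data cannot be moved easily can be made concrete with simple arithmetic: transfer time is data size divided by usable bandwidth. The sketch below applies this to the 5 TB seismic scan mentioned in the speaker notes. The link speeds (100 Mbit/s and 1 Gbit/s) and the `efficiency` parameter are assumptions for illustration, not measurements, and `transfer_hours` is a hypothetical helper.

```python
def transfer_hours(size_bytes: float, link_bits_per_second: float,
                   efficiency: float = 1.0) -> float:
    """Estimate hours needed to move size_bytes over a link.

    efficiency is the assumed fraction of nominal bandwidth actually achieved.
    """
    if link_bits_per_second <= 0:
        raise ValueError("link_bits_per_second must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = size_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    five_tb = 5e12  # 5 TB, using decimal terabytes
    for label, speed in [("100 Mbit/s", 1e8), ("1 Gbit/s", 1e9)]:
        print(f"{label}: {transfer_hours(five_tb, speed):.1f} hours")
```

Under these assumptions a single 5 TB file takes roughly 111 hours at 100 Mbit/s and roughly 11 hours at 1 Gbit/s, even at full link utilization, which is why processing tends to stay in the cloud where the data already lives.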
15. When to use Big data on the Cloud
• When data is already on the cloud
– Virtual organization
– Cloud-based SaaS
• For startups
– Shift from capital expenditure (CAPEX) to operating expenditure (OPEX)
– No need to maintain large infrastructure
– Focus on scalability and pay as you go
– Data is on the cloud anyway
• For experimental projects
– Pilot for new services
16. Summary
• Big data is coming.
– Big data is being accumulated anyway
– Knowledge is power.
• Understand your customers better so you can offer
better service
• Tools and technology are available
– Still being developed fast
• The cloud is coming, so why not do big data
on the cloud?
– Probably not today, but soon
The sources of information are expanding. Many new sources are machine generated. There are both big files (seismic scans can be 5 TB per file) and massive numbers of small files (email, social media). For decades, leading companies have sought to leverage new sources of data, and the insights that can be gleaned from them, as new sources of competitive advantage:
• More detailed structured data
• New unstructured data
• Device-generated data
But big data isn't only about data. A comprehensive big data strategy also needs to consider the role and prominence of new enabling technologies such as:
• Scale-out storage
• MPP database architectures
• Hadoop and the Hadoop ecosystem
• In-database analytics
• In-memory computing
• Data virtualization
• Data visualization