The power of Hadoop lies in its ability to help users cost-effectively analyze all kinds of data. We are now seeing the emergence of a new class of analytic applications that can only be enabled by a comprehensive big data platform. Such a platform extends the Hadoop framework with built-in analytics, robust developer tools, and the integration, reliability, and security capabilities that enterprises demand for complex, large-scale analytics. In this session, we will share innovative analytics use cases from actual customer implementations using an enterprise-class big data analytics platform.
Common theme: moving time-, space-, or processor-intensive processing to Hadoop.
Flume provides ingestion of streaming data (e.g. logs) into Hadoop.
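A Flume pipeline is declared in a properties file that wires a source, a channel, and a sink together. A minimal sketch, assuming an agent named a1 that tails an application log into HDFS (the agent name, log path, and HDFS path are placeholders):

```properties
# Hypothetical agent "a1": tail a log file and deliver events to HDFS.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: run tail -F and treat each line as an event.
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/app.log
a1.sources.r1.channels = c1

# Channel: buffer events in memory between source and sink.
a1.channels.c1.type = memory

# Sink: write events into date-partitioned HDFS directories.
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/logs/%Y-%m-%d
a1.sinks.k1.channel = c1
```

In production a durable file channel is usually preferred over the memory channel shown here, at the cost of some throughput.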
Client executes a Sqoop job. Sqoop interrogates the DB for column names, types, etc. Based on the extracted metadata, Sqoop generates source code for a table class and then kicks off a MapReduce job; this table class can be used for processing the extracted records. By default, Sqoop guesses at a column for splitting the data for distribution across the cluster; the split column can also be specified by the client.
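The split step above can be sketched as follows. This is an illustrative reconstruction, not Sqoop's actual code: the assumed behavior is that Sqoop takes the min and max of the split column and divides that range into one contiguous slice per map task.

```python
# Sketch of default split computation over a numeric split column.
# All names are hypothetical; real Sqoop also handles non-numeric
# columns and uneven data distributions.

def compute_splits(min_val: int, max_val: int, num_mappers: int):
    """Divide the inclusive range [min_val, max_val] into
    num_mappers contiguous sub-ranges, one per map task."""
    span = max_val - min_val + 1
    size = span // num_mappers
    remainder = span % num_mappers
    splits, lo = [], min_val
    for i in range(num_mappers):
        # Spread any remainder across the first few splits.
        hi = lo + size - 1 + (1 if i < remainder else 0)
        splits.append((lo, hi))
        lo = hi + 1
    return splits

# e.g. an id column ranging 1..100, imported with 4 map tasks:
print(compute_splits(1, 100, 4))  # [(1, 25), (26, 50), (51, 75), (76, 100)]
```

Each resulting (lo, hi) pair becomes a WHERE-clause predicate for one mapper, which is why a skewed split column leads to unbalanced map tasks.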
Pentaho also has integration with NoSQL DBs (Mongo, Cassandra, etc.)
Most of these tools integrate to existing data stores using the ODBC standard.
MicroStrategy and Tableau are tested and certified now with the Cloudera driver, but other standard ODBC-based tools should also work, and more integrations will be supported soon.
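From client code, the same ODBC path these BI tools use can be exercised with a generic ODBC library. A minimal sketch, assuming the Cloudera Hive ODBC driver is installed and registered under the name shown (the helper function and host are hypothetical):

```python
def hive_odbc_conn_str(host: str, port: int = 10000,
                       driver: str = "Cloudera ODBC Driver for Apache Hive") -> str:
    """Build a DSN-less ODBC connection string for the given driver.

    The driver name must match what is registered with the ODBC
    driver manager on the client machine.
    """
    return f"Driver={{{driver}}};Host={host};Port={port}"

# With the driver installed and the pyodbc package available,
# a query could then be issued like this (not run here):
#   import pyodbc
#   conn = pyodbc.connect(hive_odbc_conn_str("hive-gateway.example.com"))
#   rows = conn.cursor().execute("SELECT COUNT(*) FROM logs").fetchall()

print(hive_odbc_conn_str("hive-gateway.example.com"))
```

Because the connection string is standard ODBC, any tool that speaks ODBC can point at the cluster the same way the certified tools do.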
Also, Cloudera has implemented a multi-user solution, which will also soon support authentication.