Learn why 451 Research believes Infochimps is well-positioned with an easy-to-consume managed service for those without Hadoop expertise, as well as a stack of technologically interesting projects for the 'devops' crowd.
Opening with a market positioning statement and ending with a competitive and SWOT analysis, Matt Aslett provides a comprehensive impact report.
Infochimps targets enterprises with stream-processing additions to 'big data' PaaS
Analyst: Matt Aslett
14 Nov, 2012
'Big data' PaaS provider Infochimps has updated its Infochimps Platform with the addition of
stream-processing capabilities to the Infochimps Data Delivery Service based on technologies
open sourced by Twitter and LinkedIn. With its first paying customers on board, the company is
now seeking partnerships to help support its enterprise-focused PaaS offering.
The 451 Take
There's a big difference between offering Hadoop as a service to be configured, deployed and
managed, and offering a managed service that masks the complexity of configuring and
deploying Hadoop. We believe the latter will gain traction as more late adopters look to gain
the benefits of Hadoop without investing upfront in the expertise and
infrastructure required to support it. While Infochimps will need to establish the trust of its
target customers, it is well-positioned with an easy-to-consume managed service for those
without Hadoop expertise, as well as a stack of technologically interesting projects for the
'devops' crowd.
Context
We first covered Infochimps earlier this year when the company pivoted from being a data
marketplace provider to releasing the technology that supported its data marketplace, both as
open source projects and as PaaS. The initial focus was on making it easier to deploy the Hadoop
Copyright 2012 - The 451 Group 1
data-processing framework via a Chef-based systems provisioning, deployment and updating tool
called IronFan. Infochimps has expanded since then with the addition in April of an operations
dashboard called Dashpot, and in August with the addition of the Apache Flume-based Data
Delivery Service (DDS) for integrating with existing data sources, as well as early data-streaming
functionality in DDS via extensions to Wukong, the company's Ruby framework for Hadoop. The latest addition
to the platform expands its support for stream processing through the integration of open source
stream-processing projects Storm and Kafka.
Initially developed by BackType and released as an open source project by Twitter in August 2011
following its acquisition of the social analytics provider, Storm is a stream-processing engine. Kafka,
meanwhile, is a distributed message queue originally developed by LinkedIn and used by the
company in a number of projects, including feeding all activity events to its data warehouse and
Hadoop, as well as keeping its search engine up to date with network activity in real time. Storm
and Kafka are used by Infochimps as the foundation of DDS, which is used to connect the
company's Hadoop-based PaaS with multiple existing data sources, enabling real-time integration
of relevant data for processing and analysis.
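As a rough illustration of the pattern described above, in which a message queue buffers events from data sources and a stream-processing step consumes them as they arrive, the following is a minimal sketch in plain Python. It does not use the Kafka or Storm APIs and says nothing about how DDS is actually built; the names MessageQueue and count_events_by_type and the event types are hypothetical, chosen only for illustration.

```python
from collections import Counter, deque


class MessageQueue:
    """Toy in-memory stand-in for a message queue (hypothetical; not the Kafka API)."""

    def __init__(self):
        self._messages = deque()

    def publish(self, message):
        # Producers append events to the tail of the queue.
        self._messages.append(message)

    def consume(self):
        # Yield messages in arrival order until the queue is drained.
        while self._messages:
            yield self._messages.popleft()


def count_events_by_type(queue):
    """Toy stream-processing step: keep a running count per event type."""
    counts = Counter()
    for event in queue.consume():
        counts[event["type"]] += 1
    return counts


# Hypothetical activity events published by a data source.
queue = MessageQueue()
for event_type in ["page_view", "search", "page_view"]:
    queue.publish({"type": event_type})

counts = count_events_by_type(queue)
```

In a real deployment the queue would be distributed and durable and the processing step would run continuously across many machines; the sketch only shows the division of labor between the two components.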
DDS is a key component of the Infochimps Platform that elevates it beyond a platform for Hadoop
deployment to being a potential big data management and analytics platform of choice. It is DDS
that will enable businesses to adopt the Infochimps Platform alongside existing data management
technologies and quickly gain insight from new and existing sources of data.
Infochimps' main selling point is lowering the barriers to adopting Hadoop. While there is a lot of
complex technology involved – such as IronFan, elastic Hadoop, DDS, elasticsearch, NoSQL and
NewSQL databases, Wukong and Dashpot – the platform is delivered as a service designed to mask
that complexity. The company maintains that it can take customers from nowhere to generating
business insight from the Infochimps Platform in 30 days, without the need to hire specialist
support and analytics staff, or invest in specialist infrastructure.
Infochimps has attracted nine paying customers since its platform went live in the second quarter,
with an average selling price of $200,000. The company charges customers per node per month for
what is currently a public cloud offering hosted on Amazon Web Services or Rackspace Cloud.
Infochimps has established relationships (soon to be announced) to deliver both private cloud and
virtual private cloud offerings supported in its customers' own datacenters or via their trusted
datacenter provider. The company is launching its cloud services across a network of tier four
datacenters in North America and will begin offering its big data cloud services in the first quarter
of 2013. The potential to support private cloud deployments will be aided by the fact that IronFan is
a key component in VMware's Serengeti project to make it easy to configure and deploy Hadoop on
virtual machines, while the Infochimps Platform also supports the OpenStack API.
The shift toward more enterprise-focused services and partnerships is being led by former Teradata
and StackIQ executive (and Xerox PARC EIR) Jim Kaskade, who joined the company as CEO in
August, replacing cofounder Joe Kelly, who became COO. Kaskade has also been busy lining up a
new major financing round. Infochimps had previously raised a total of $3m from investors
including DFJ Mercury, although that was during its previous incarnation as a data marketplace
provider. The company currently has 23 employees, up from 14 in March.
Competition
There is an increasing number of vendors offering Hadoop as a service, with Amazon and Google
being the biggest players at this point. While they therefore pose a competitive threat to
Infochimps, the value proposition is quite different, since it still requires a degree of expertise to
configure, deploy and manage a cloud-based Hadoop service in comparison to Infochimps'
managed services approach. We've seen limited uptake of cloud-based Hadoop services to date,
with the main use case being development and testing. Indeed, we've noted before that if a
company begins to move toward a larger-scale deployment, the costs can be prohibitive enough to
require on-premises deployment. While Infochimps' service is initially based on the public cloud, it
has designs on supporting deployment choice. The company also believes that with the added
value of IronFan, DDS, Wukong, Dashpot and the rest, along with its managed services approach, it
has enough to justify the additional cost above that of running Hadoop on a public cloud service
with the required expertise.
Other Hadoop service providers include SunGard, Treasure Data, Qubole, Mortar Data and Guavus,
while Infochimps believes its closest competition will come from MetaScale, the Hadoop managed
services subsidiary of Sears Holdings, and tresata, the stealthy data platform provider founded by
former Bank of America managing director for big data and analytics Abhi Mehta. Other vendors are
trying to mask the complexity of configuring and deploying Hadoop by building it into larger
on-premises application stacks, so we might also expect would-be customers to consider the likes
of Drawn to Scale, Splice Machine or Digital Reasoning, depending on the specific application. The
company must also be considered, to some extent, a rival to Hadoop distributors such as Cloudera,
Hortonworks, MapR, IBM and EMC, although there is also the potential for partnerships here, as
indicated by the fact that Cloudera CEO Mike Olson is an adviser to Infochimps.
SWOT Analysis
Strengths
We were already fans of the Chef-based cluster platform tuned for the needs of enterprises using
Hadoop. DDS adds all-important integration with existing tools that will help drive wider adoption.
Weaknesses
Managed services relationships are built on trust. While Infochimps has technological expertise, it
will need to establish itself before some would-be customers will consider it.
Opportunities
We are seeing an increasing need for technologies and services that mask the complexity of
configuring, deploying and managing Hadoop for late adopters. Infochimps has both.
Threats
The big services and software providers are unlikely to sit back and let demand for Hadoop
managed services go elsewhere. Expect the competition to increase with demand.