Conspectus data warehousing appliances – fad or future
Data Warehousing Appliances – Fad or Future?
David M Walker
Data Management & Warehousing
December 2006
Despite all the hype from vendors, the basics of data warehousing have remained
fundamentally unchanged: extract data from multiple source systems, reformat the
information into an easy-to-query structure, load it into a dedicated database and add
an effective user interface to allow users to query the information. The cost of this
environment is substantial and relates directly to the complexity of the Extract,
Transform, Load (ETL) process and the volume of data held in the system.
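
The extract, reformat, load and query steps described above can be sketched in a few lines. This is only an illustration under stated assumptions: the two source systems, their field layouts, the sales table and all function names are invented here, and an in-memory SQLite database stands in for the dedicated warehouse database.

```python
import sqlite3

# Two hypothetical source systems with different layouts (invented for illustration).
BILLING_ROWS = [
    {"cust": "C001", "amount_pence": 12500, "date": "2006-11-03"},
    {"cust": "C002", "amount_pence": 4000, "date": "2006-11-04"},
]
WEB_ORDER_ROWS = [
    ("C001", "19.99", "04/11/2006"),  # customer, amount in pounds, DD/MM/YYYY
]


def extract():
    """Extract: read the raw rows from each source system."""
    return BILLING_ROWS, WEB_ORDER_ROWS


def transform(billing, web_orders):
    """Transform: reshape both feeds into one (customer, sale_date, amount) layout."""
    rows = []
    for record in billing:
        rows.append((record["cust"], record["date"], record["amount_pence"] / 100))
    for cust, amount, uk_date in web_orders:
        day, month, year = uk_date.split("/")
        rows.append((cust, f"{year}-{month}-{day}", float(amount)))
    return rows


def load(conn, rows):
    """Load: write the reshaped rows into the (stand-in) warehouse table."""
    conn.execute("CREATE TABLE sales (customer TEXT, sale_date TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def total_by_customer(conn):
    """Query: the kind of simple question the easy-to-query structure supports."""
    cursor = conn.execute(
        "SELECT customer, ROUND(SUM(amount), 2) FROM sales "
        "GROUP BY customer ORDER BY customer"
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
load(conn, transform(*extract()))
print(total_by_customer(conn))
```

Even in this toy form, most of the code sits in the transform step, and it is the part that must change whenever a source system alters its layout.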
The complexity of the ETL process has two cost impacts. The first is the cost of the
initial design and development, which is reasonably well understood. The second is the
cost of changes over the lifetime of the system. For example, if an organisation has
four source systems and each system undergoes a change once a quarter, that is sixteen
changes a year, so the data warehouse support team have to modify and test an interface
roughly every three weeks, and all this without any change in the users' requirements.
The volume of data also hits the bottom line, not only in the cost of storage but also
in the size and (more expensive) skills of the team required to support it, especially
as the data explosion forces the business to enter the very large database arena, where
load time and user query performance are critical.
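
The change-frequency example above is simple arithmetic, and it can be useful to rerun it with an organisation's own figures. The sketch below assumes a 52-week year and changes spread evenly through it; the function name and parameters are invented for illustration.

```python
WEEKS_PER_YEAR = 52  # assumption: changes are spread evenly over a 52-week year


def weeks_between_interface_changes(source_systems, changes_per_system_per_year):
    """Average number of weeks between interface changes the support team must absorb."""
    total_changes = source_systems * changes_per_system_per_year
    if total_changes <= 0:
        raise ValueError("at least one change per year is required")
    return WEEKS_PER_YEAR / total_changes


# The article's example: four source systems, each changing once a quarter.
interval = weeks_between_interface_changes(source_systems=4, changes_per_system_per_year=4)
print(f"{interval:.2f} weeks between interface changes")  # 52 / 16 = 3.25
```

Doubling the number of source systems halves the interval, which is why the cost of change grows so quickly with ETL complexity.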
Against this background it is unsurprising that vendors are looking to compete by
reducing storage, improving query times and simplifying administration. Oracle have
taken steps to enhance their core database engine with features that improve each of
these areas and continue to develop their strategy. However, more and more is being built
into the core of their flagship general purpose engine, resulting in software that has many
features not needed by a specific application. Sybase have taken the more radical step
of creating an entirely new database engine called Sybase IQ that does away with
some of the limitations required of a general purpose engine to produce a solution that
is both much faster in load and user query performance and far more efficient in its
disk usage than general purpose databases.
Into this market come the data warehousing appliance vendors, offering a breed of
dedicated, integrated hardware and software solution designed to solve a business's data
warehousing woes. Such systems use low-cost commodity components in large
volumes with dedicated business intelligence engines to deliver radically faster load
times whilst at the same time reducing query times and simplifying the systems
administration process.
The first hurdle for many organisations is that data warehousing appliances are
proprietary, which goes against a corporate policy of open systems intended to allow
technology re-use. However, a solution built on one of the current market-leading
platforms, Teradata, is no less so. In fact Teradata can be considered one of the original
data warehouse appliances; what differentiates the newcomers is their use of low-cost
commodity components and their ability to achieve massive parallelism.
The second hurdle is credibility. The promises of such large benefits (typically query
performance ten to fifty times faster whilst using a third to a sixth of the storage, on a
platform that requires only a small amount of systems administration support) will be
doubted, often by the systems and database administrators who have had to work so hard
to maintain the performance of the existing solution. Vendors such as Netezza have
overcome this challenge with some key accounts by providing a system on the basis
that it will be purchased only if it meets agreed performance criteria, thus significantly
reducing the risk to the purchasing company.
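
To see what the quoted ranges would mean in practice, they can be applied to a baseline. The sketch below uses an invented baseline of a 60-minute query and 12 TB of storage; only the improvement factors (ten to fifty times for queries, three to six times for storage) come from the text, and the function name is hypothetical.

```python
from typing import Tuple


def projected_range(baseline: float, min_factor: float, max_factor: float) -> Tuple[float, float]:
    """Return (best_case, worst_case) after dividing the baseline by the factor range."""
    if min_factor <= 0 or max_factor < min_factor:
        raise ValueError("factors must be positive and ordered")
    return baseline / max_factor, baseline / min_factor


# Hypothetical baseline: a 60-minute query and 12 TB of disk on the existing platform.
query_best, query_worst = projected_range(60.0, min_factor=10, max_factor=50)
disk_best, disk_worst = projected_range(12.0, min_factor=3, max_factor=6)

print(f"query time: {query_best:.1f} to {query_worst:.1f} minutes")
print(f"storage:    {disk_best:.1f} to {disk_worst:.1f} TB")
```

Turning the promises into target figures like these gives a natural starting point for the agreed performance criteria in a trial of the kind described above.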
The final obstacle is migration: an existing solution that is built, for example, on an
Oracle database using Oracle Warehouse Builder and Oracle Discoverer is
effectively proprietary and therefore more difficult, but not impossible, to migrate.
This is also a reason to review the existing data warehousing architecture now, to
ensure that as these and other new technologies come along the business will be able
to take advantage of them.
Those organisations that have overcome the hurdles report that they are achieving
immediate, large performance gains for their queries without the need to tune the
database, whilst lowering the disk footprint and reducing support costs. The
systems also continue to deliver benefit as the fast query times allow more complex
data models to be queried, which in turn reduces the need for complex ETL to
restructure the data. These changes to the data model, and the resulting reduction in
ETL complexity, can be made either as part of the migration project (which delivers the
largest benefit quickly but at the greatest risk) or as part of the change management
process for the source systems (which delivers benefit over a longer time frame but
significantly reduces the risk).
With a number of entrants into the market, including pure appliance players Netezza
and DATAllegro and those developing variations such as Kognitio (offering a virtual
appliance) and Sybase (offering an appliance bundle called Data Integration Suite), it
is clear that appliances are going to form a key part of data warehouse architectures
going forward. The risks of using a smaller vendor and a proprietary solution are
outweighed by the business benefit of much more timely information at a significantly
reduced cost.
David Walker is a principal consultant with Data Management & Warehousing
(http://www.datamgmt.com), a company that has been providing strategic business
intelligence consultancy as well as designing large scale data warehousing solutions
to clients around the world since 1995. David can be contacted at
davidw@datamgmt.com or on 07050 028 911.