Ugif 04 2011 france ug04042011-jroy_ts
2. Agenda
■ What is TimeSeries
■ Why TimeSeries
■ Components
■ Usage
2 © 2010 IBM Corporation
3. “Give me the Jan 1st element from time series ‘X’”
Most useful when a range of data is normally read:
“Give me the Jan 1st through Jan 10th elements from time series ‘X’”
Access to one time series is usually completed before moving to the next time series.
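The two access patterns above can be sketched in Python as a conceptual model, not Informix code. A hypothetical `OrderedSeries` class keeps one series' elements together and sorted by time, so a point read is a single binary search and a range read is two binary searches followed by one contiguous slice.

```python
from bisect import bisect_left, bisect_right
from datetime import date, timedelta


class OrderedSeries:
    """Hypothetical model: one series' elements stored together, sorted by time."""

    def __init__(self, elements):
        ordered = sorted(elements)
        self.times = [t for t, _ in ordered]
        self.values = [v for _, v in ordered]

    def get(self, when):
        """Point read: the value stamped exactly `when`, or None."""
        i = bisect_left(self.times, when)
        if i < len(self.times) and self.times[i] == when:
            return self.values[i]
        return None

    def between(self, start, end):
        """Range read: (time, value) pairs with start <= time <= end, in order."""
        lo = bisect_left(self.times, start)
        hi = bisect_right(self.times, end)
        return list(zip(self.times[lo:hi], self.values[lo:hi]))


# Hypothetical series "X": one reading per day in January 2010.
x = OrderedSeries((date(2010, 1, 1) + timedelta(days=d), float(d)) for d in range(31))

print(x.get(date(2010, 1, 1)))                              # 0.0
print(len(x.between(date(2010, 1, 1), date(2010, 1, 10))))  # 10
```

Because the elements of one series sit next to each other, reading a range touches one contiguous run of data instead of rows scattered across a large table.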
4. Challenges Managing Time Series Data
■ Slow Performance
– Slow data access, especially for ordered sets of rows, because of the data layout and the resulting disk I/O
– Operations hard or impossible to do in standard SQL
■ High Storage Requirements
– Time series are usually stored as "tall, thin" tables with a very large number of rows
– May need one index to enforce uniqueness and another for index-only reads, so the indexes can use more space than the data
– Huge space requirements in a standard relational layout, due to the volume of data
■ Complex Querying
– Can be difficult to write SQL to work with the data
5. Informix Solution
● TimeSeries data type: native time series support
■ Store time series elements as an ordered set of elements
– Uses less space because the "key" is factored out and the time field takes either 0 bytes (regular) or 11 bytes (irregular)
– Access is faster than an index-only read
– SQL can be made much simpler
■ Freedom to manage time series data:
– Freedom to choose what and how it is stored
– Freedom to choose the time series interval
– Freedom to choose where the time series is stored
MeterId   Reading
1001      2010-01-01,daily,{(12.34,12567),(12.56,9000),(12.34,55567),...}
2001      2010-05-05,daily,{(199.08,6780),(198.55,3400),(198.12,250),...}
2011      2010-09-01,daily,{(9.34,8067),(9.56,9000),(9.40,10780),...}
No other RDBMS has native time series support
6. Key Strengths of Informix TimeSeries
Performance
– Fast data access: data is clustered on disk to reduce I/O
– Provides a very high degree of parallelism on reads and writes
– Provides continuous loading of data with minimal impact on concurrent queries
Space Savings
– Provides a high level of compression
– Can save over 50% of the space used by a standard relational layout
Usability
– The TimeSeries toolkit allows custom analytics to be written
– Handles operations hard or impossible to do in standard SQL
– Conceptually closer to how users think of time series
– No other RDBMS has native time series support
7. Smart Meters Data: Schema Example
Relational schema (primary key: mtr_id, date)
mtr_id  date  Col1     Col2     ...  ColN
1       Mon   Value 1  Value 2  ...  Value N
1       Tue   Value 1  Value 2  ...  Value N
1       Wed   Value 1  Value 2  ...  Value N
...     ...   ...      ...      ...  ...
13      Mon   Value 1  Value 2  ...  Value N
13      Tue   Value 1  Value 2  ...  Value N
13      Wed   Value 1  Value 2  ...  Value N
...     ...   ...      ...      ...  ...
The same schema using Informix TimeSeries
mtr_id (int)  Series timeseries(mtr_data)
1             [(Mon, v1, ...)(Tue, v1, ...)]
2             [(Mon, v1, ...)(Tue, v1, ...)]
3             [(Mon, v1, ...)(Tue, v1, ...)]
4             [(Mon, v1, ...)(Tue, v1, ...)]
...           ...
Informix TimeSeries saves space and improves performance through faster data access
8. TimeSeries Space Savings Example
● TimeSeries data type takes much less space than traditional relational storage
– Proof-of-concept example:
• Regular TimeSeries, 15-minute interval
• Relational database used ~1 TB (1000 GB)
• Informix used ~340 GB
The reason for this:
– The TimeSeries does not repeat data
• MeterID: 4 bytes per reading
• Timestamp: could be 12 bytes per reading
• Assuming an 8-byte reading, that is a ~66% savings
• 3x less storage!
[Chart: Data storage comparison for 1 million meters]
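The ~66% figure follows from the per-reading byte counts above. A minimal Python check of that arithmetic, assuming (as the slide does) an 8-byte reading and counting only the meter ID, timestamp, and reading, with no row, index, or page overhead:

```python
# Per-reading sizes from the slide; the 8-byte reading is the slide's assumption.
# Row headers, indexes and page overhead are ignored.
METER_ID_BYTES = 4
TIMESTAMP_BYTES = 12
READING_BYTES = 8

# Relational layout: every row repeats the meter ID and the timestamp.
relational = METER_ID_BYTES + TIMESTAMP_BYTES + READING_BYTES
# Regular TimeSeries: the key is factored out and the time point is computed,
# so only the reading itself is stored per element.
timeseries = READING_BYTES

savings = 1 - timeseries / relational
ratio = relational / timeseries
poc_ratio = 1000 / 340  # proof of concept: ~1000 GB relational vs ~340 GB Informix

print(f"{relational} vs {timeseries} bytes per reading")     # 24 vs 8 bytes per reading
print(f"savings {savings:.1%}, {ratio:.0f}x less storage")  # savings 66.7%, 3x less storage
print(f"proof of concept: {poc_ratio:.2f}x")                 # proof of concept: 2.94x
```

The measured proof-of-concept ratio (about 2.94x) lands close to the 3x predicted by this per-reading estimate.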
9. TimeSeries Performance
Performance
– Faster access to sets of data
• Ordered data
– Much faster combining of time series
– For loading data into TimeSeries, Informix outperforms the nearest competition by more than 30 times
– For report generation from TimeSeries, Informix outperforms the nearest competition by more than 90 times
[Chart: Performance comparison for data loads and reports for 1 million meters]
10. Who’s Interested in TimeSeries
Energy: smart meters
Capital Markets
– Arbitrage opportunities, breakout signals, risk/return optimization,
portfolio management, VaR calculations, simulations, backtesting...
Telecommunications:
– Network monitoring, load prediction, blocked calls (lost revenue)
from load, phone usage, fraud detection and analysis...
Manufacturing:
– Machinery going out of spec; process sampling and analysis
Logistics:
– Location of a fleet (e.g. GPS); route analysis
Scientific research:
– Temperature over time...
10 Informix Dynamic Server, TimeSeries DataBlade Module class © 2007 IBM Corporation
11. TimeSeries: Key Concepts
■ Containers
– Specialized storage for TimeSeries
-- Arguments: container name, dbspace, element row type, first and next extent sizes
EXECUTE PROCEDURE
TSContainerCreate('raw_container', 'rootdbs',
'meter_data', 100, 50);
■ TimeSeries data element: row type
– Flexibility to define as many fields as needed
CREATE ROW TYPE meter_data (
tstamp datetime year to fraction(5), -- the first field must be the timestamp
value decimal(14,3)
);
■ TimeSeries types: regular, irregular
– Covers regular intervals and sparse data distribution
■ Calendar
– Defines business patterns
12. Features Unique to Regular TimeSeries
Only one element per “on” interval
Value “persists” to the end of the interval
If the element for an “on” interval is missing, the entire element is NULL
The calendar determines the offset of a given time point in the TimeSeries
Elements can be accessed by offset or by time point
The time point is not stored; it is calculated from the header (origin and calendar) using date/time arithmetic
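As a conceptual sketch of how an offset can be derived from the calendar with date/time arithmetic alone, assuming a daily calendar with the pattern {1 on, 2 off, 4 on} used later in this deck and an origin on or after the pattern start. The names `on_days_before`, `offset_of`, and `time_point_of` are hypothetical; this is not the DataBlade's implementation.

```python
from datetime import date, timedelta

# Hypothetical daily calendar: 1 day on, 2 off, 4 on, repeating every 7 days.
PATTERN = [(1, True), (2, False), (4, True)]
PERIOD = sum(n for n, _ in PATTERN)                # 7 days
ON_PER_PERIOD = sum(n for n, on in PATTERN if on)  # 5 "on" days per period


def is_on(day, pattstart):
    """True if `day` falls in an "on" interval of the calendar."""
    pos = (day - pattstart).days % PERIOD
    for length, on in PATTERN:
        if pos < length:
            return on
        pos -= length
    return False  # not reached: pos < PERIOD always


def on_days_before(day, pattstart):
    """Number of "on" days in [pattstart, day), using whole-period arithmetic."""
    full, rem = divmod((day - pattstart).days, PERIOD)
    count = full * ON_PER_PERIOD
    for length, on in PATTERN:
        take = min(length, rem)
        if on:
            count += take
        rem -= take
    return count


def offset_of(day, origin, pattstart):
    """Offset of `day` in a series starting at `origin`; None if not an "on" day."""
    if day < origin or not is_on(day, pattstart):
        return None
    return on_days_before(day, pattstart) - on_days_before(origin, pattstart)


def time_point_of(offset, origin, pattstart):
    """Inverse of offset_of: the day stored at `offset`."""
    full, k = divmod(on_days_before(origin, pattstart) + offset, ON_PER_PERIOD)
    pos = 0
    for length, on in PATTERN:
        if on and k < length:
            return pattstart + timedelta(days=full * PERIOD + pos + k)
        if on:
            k -= length
        pos += length


start = date(2005, 7, 8)  # a Friday
print(offset_of(date(2005, 7, 11), start, start))  # 1 (Monday; Sat and Sun are off)
print(time_point_of(6, start, start))              # 2005-07-18
```

Since only the origin and calendar are needed to map between offsets and time points, a regular series has no reason to store a timestamp per element.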
13. Features Unique to Irregular TimeSeries
Data can be entered at any time point within a valid "on"
interval
An element persists until the next element
No NULL elements
Elements can only be accessed by time
No duplicate time points allowed
If an element already exists at the given time point, either an error is raised or a unique time point is found:
– round the time point up to the nearest second
– search back from there for the first existing element
– add 10 microseconds to that element's time point; this is the new time point
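One reading of the duplicate-resolution steps above, sketched in Python with hypothetical names (`resolve_time_point`, `on_duplicate`). It assumes the backward search starts at the rounded-up time and includes an element exactly at that time, which the slide does not specify. The example timestamps are taken from the AA.N input-function example in this deck.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta

STEP = timedelta(microseconds=10)  # FRACTION(5) resolution


def round_up_to_second(t):
    """Round a datetime up to the next whole second (unchanged if already whole)."""
    if t.microsecond == 0:
        return t
    return t.replace(microsecond=0) + timedelta(seconds=1)


def resolve_time_point(existing, t, on_duplicate="adjust"):
    """Return a time point for a new element of an irregular series.

    existing: sorted list of datetimes already in the series (hypothetical model).
    """
    i = bisect_left(existing, t)
    if i == len(existing) or existing[i] != t:
        return t  # no clash: use the requested time point
    if on_duplicate == "error":
        raise ValueError(f"element already exists at {t}")
    ceiling = round_up_to_second(t)
    j = bisect_right(existing, ceiling) - 1  # last element at or before ceiling
    candidate = existing[j] + STEP
    # Defensive extra step, not on the slide: keep stepping if still taken.
    while candidate in existing:
        candidate += STEP
    return candidate


existing = [datetime(2007, 4, 3, 6, 30, 3, 30), datetime(2007, 4, 3, 6, 30, 3, 1190)]
print(resolve_time_point(existing, datetime(2007, 4, 3, 6, 30, 3, 30)))
# 2007-04-03 06:30:03.001200
```

Under this reading, a clashing element lands just after the latest element in the same second instead of overwriting anything.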
14. Accessing TimeSeries
Access through standard tabular view
– Makes TimeSeries look like a standard relational table
SQL Functions
– 103 functions
Customized functions
– Written in Stored Procedure Language (SPL), “C”, Java
– 65 “C” functions
15. TimeSeries Header
A TimeSeries needs information that sets its context:
– Calendar: Time period where data is found
– Origin: Time origin of the TimeSeries
– Threshold: in-row storage threshold
– Container: where to store the out-of-row data
– Metadata: optional data added by the TimeSeries creator
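A hypothetical Python model of these header fields. The threshold semantics here (a series with at most `threshold` elements stays in the row, a larger one goes to its container) are an assumption for illustration, as is reading the third TSCreate argument in this deck's example (20) as the threshold.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class TimeSeriesHeader:
    """Hypothetical model of the context a TimeSeries carries."""
    calendar: str        # name of the calendar (valid time points)
    origin: datetime     # time origin of the series
    threshold: int       # in-row storage threshold, in elements (assumed unit)
    container: str       # container for out-of-row data
    metadata: dict = field(default_factory=dict)  # optional creator data

    def placement(self, n_elements):
        """Assumed rule: small series stay in the row, larger ones move out."""
        return "in-row" if n_elements <= self.threshold else self.container


header = TimeSeriesHeader("calday", datetime(2005, 7, 8), 20, "taqtrade_day")
print(header.placement(20))  # in-row
print(header.placement(21))  # taqtrade_day
```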
16. Calendar and Calendar Patterns
A calendar pattern is needed before we can create a calendar:
INSERT INTO CalendarPatterns
VALUES('day', '{1 on, 2 off, 4 on}, day' );
A Calendar defines a set of valid times at which the TimeSeries can record
data. (July 8, 2005 is a Friday)
INSERT INTO CalendarTable(c_name, c_calendar)
VALUES('calday',
'startdate(2005-07-08 00:00:00.00000),
pattstart(2005-07-08 00:00:00.00000),
pattname(day)' );
Instead of naming a stored pattern with pattname(), you can provide the pattern explicitly with pattern():
INSERT INTO CalendarTable(c_name, c_calendar)
VALUES('weekcal',
'startdate(2005-07-08 00:00:00.00000),
pattstart(2005-07-08 00:00:00.00000),
pattern({1 on, 2 off, 4 on}, day)' );
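To make the pattern notation concrete, here is a small Python sketch (hypothetical helpers `parse_pattern` and `valid_days`, handling only the `day` unit and treating startdate and pattstart as equal, as in the examples above) that expands '{1 on, 2 off, 4 on}, day' from Friday, July 8, 2005. Starting the 7-day cycle on a Friday selects Monday through Friday only, i.e. business days.

```python
import re
from datetime import date, timedelta


def parse_pattern(text):
    """Parse e.g. '{1 on, 2 off, 4 on}, day' into ([(1, True), ...], 'day')."""
    m = re.fullmatch(r"\s*\{(.*)\}\s*,\s*(\w+)\s*", text)
    if m is None:
        raise ValueError(f"unrecognized pattern: {text!r}")
    body, unit = m.groups()
    runs = []
    for part in body.split(","):
        count, state = part.split()
        if state not in ("on", "off"):
            raise ValueError(f"unrecognized state: {state!r}")
        runs.append((int(count), state == "on"))
    if not any(on for _, on in runs):
        raise ValueError("pattern has no 'on' intervals")
    return runs, unit


def valid_days(pattern_text, pattstart, n):
    """First n valid ("on") days from pattstart; only the 'day' unit is handled."""
    runs, unit = parse_pattern(pattern_text)
    if unit != "day":
        raise NotImplementedError(f"unit {unit!r} not handled in this sketch")
    days, day = [], pattstart
    while len(days) < n:
        for count, on in runs:  # walk one full cycle of the pattern
            for _ in range(count):
                if on and len(days) < n:
                    days.append(day)
                day += timedelta(days=1)
    return days


days = valid_days("{1 on, 2 off, 4 on}, day", date(2005, 7, 8), 6)
print([d.isoformat() for d in days])
# ['2005-07-08', '2005-07-11', '2005-07-12', '2005-07-13', '2005-07-14', '2005-07-15']
```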
17. TimeSeries: Table
A TimeSeries resides in a table:
CREATE TABLE ts_data (
loc_esi_id char(20) NOT NULL,
measure_unit varchar(10) NOT NULL,
direction char(1) NOT NULL,
multiplier TimeSeries(meter_data),
raw_reads timeseries(meter_data),
PRIMARY KEY(loc_esi_id, measure_unit, direction)
) LOCK MODE ROW;
18. Populating a TimeSeries
A TimeSeries must first be created, for example with TSCreate:
INSERT INTO taqtrade_day
VALUES("IBM.N",
TSCreate('calday', '2005-07-08 00:00:00.00000',
20, 0, 0, 'taqtrade_day')
);
It can also be created through the TimeSeries input function, from a string literal:
INSERT INTO taqtrade
VALUES("AA.N",
'irregular, container(taqtrade),
origin(2007-04-03 06:30:00.00000),
calendar(calsec),
[(4.48, . . .)@2007-04-03 06:30:03.00003,
(4.50,. . .)@2007-04-03 06:30:03.00119,
. . .]'
);
19. The Virtual Table Interface
Makes a TimeSeries look like a table:
EXECUTE PROCEDURE
TSCreateVirtualTab('ts_data_v', 'ts_data',
'origin(2010-11-10 00:00:00.00000),
calendar(cal15min),container(raw_container),
threshold(0), regular',
0, 'raw_reads');
The resulting virtual table has this structure:
CREATE TABLE ts_data_v (
loc_esi_id char(20),
measure_unit varchar(10,0),
direction char(1),
tstamp datetime year to fraction(5),
value decimal(14,3)
);
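Conceptually, the virtual table repeats a row's key columns once for every element of its TimeSeries column. A minimal Python model of that flattening, with made-up key values and a plain list of (tstamp, value) pairs standing in for the raw_reads column; it models only the mapping, not the Virtual Table Interface itself:

```python
from datetime import datetime
from decimal import Decimal

# Made-up base-table rows: key columns plus a list of (tstamp, value) elements
# standing in for the raw_reads TimeSeries column.
base_rows = [
    {"loc_esi_id": "ESI-0001", "measure_unit": "kWh", "direction": "D",
     "raw_reads": [(datetime(2010, 11, 10, 0, 0), Decimal("1.250")),
                   (datetime(2010, 11, 10, 0, 15), Decimal("1.300"))]},
    {"loc_esi_id": "ESI-0002", "measure_unit": "kWh", "direction": "D",
     "raw_reads": [(datetime(2010, 11, 10, 0, 0), Decimal("0.875"))]},
]


def virtual_rows(rows, ts_column):
    """Yield one flat row per element, repeating the key columns of its series."""
    for row in rows:
        keys = {k: v for k, v in row.items() if k != ts_column}
        for tstamp, value in row[ts_column]:
            yield {**keys, "tstamp": tstamp, "value": value}


for r in virtual_rows(base_rows, "raw_reads"):
    print(r["loc_esi_id"], r["tstamp"], r["value"])
```

Each output row has the same columns as ts_data_v above, which is why standard SQL can be run against the virtual table.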
20. Quick Review
A TimeSeries resides in a container
– The container resides in a dbspace
– The container is for a specific element type (row type)
– A container is for either a regular or irregular TimeSeries (not both)
– A container can contain multiple TimeSeries
A TimeSeries requires a calendar
– Defines when the data starts and a pattern of valid time points
TimeSeries data is defined as a row type
– Defines the values tracked
You can operate on TimeSeries through special SQL functions or
use the virtual table interface and standard SQL
21. DEMO