SciQL
Bridging The Gap Between Science
      And Relational DBMS


    Martin Kersten, Ying Zhang, Milena Ivanova, Niels Nes
                      CWI Amsterdam

        IDEAS 2011, Sep. 21-23, 2011, Lisbon, Portugal
Who needs arrays anyway?

             Seismology           – 1-D waveforms, 3-D spatial data

             Astronomy            – temporally ordered rasters

             Climate simulation   – temporally ordered grid

             Remote sensing       – images of 2-D or higher

             Genomics             – ordered DNA strings


             Scientists love arrays:
                   HDF5, NETCDF, FITS, MSEED, …
             but also use:
                   lists, tables, XML, ...


2011-09-22                          IDEAS 2011                        2
Arrays In DBMS
      Research issues already in the 80’s

      Algebraic frameworks
             (S)RAM, AQL, AML, ...

      SQL language extension
             RasQL, AQuery, SRQL, …
             a notion of order

      SQL:1999, SQL:2003
             collection type, C-style arrays
             aggregation functions over arrays

      OODB, multi-dimensional DBMS, Sequence DBMS, ...

      The Longhorn Array Database

      RasDaMan
             Store large arrays in chunks as BLOBs
             Array query (RasQL) optimisation on top of DBMS
             Known to work up to 12 TBs!

      PostgreSQL 8.1

      SciDB
             Array DBMS from scratch
             Overlapping chunks for parallel execution



What is the problem with RDBMS?

             Appropriate array denotations?

             Functionally complete operation set?



             Size limitations (due to BLOB representations)?

             Existing foreign files?

             Scale?

             ...




SciQL

             An array query language based on SQL:2003

                Pronounced as ‘cycle’

             Distinguishing features:

                Arrays and tables as first class citizens of DBMSs

                Seamless integration of relational and array paradigms

                Named dimensions with constraints

                Flexible structure-based grouping




             Seismology use case


Array Definitions

             Dimensions and cell values

             Dimension range: [(start|∗) : (step|∗) : (stop|∗)]

                A shortcut for integer-typed dimensions: [size]

             Dimension data type: scalar data types

             Cells:

                ≥ 0 value(s) per cell

                all data types of normal table columns
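
The dimension range notation above can be modelled with a short Python sketch. It is only an illustration of the semantics, not SciQL itself. The stop bound is treated as exclusive, which agrees with the slides' example where DIMENSION[0:1:4] yields the indices 0 to 3. Reading [size] as [0:1:size] is an assumption, and both function names are hypothetical.

```python
def dimension_values(start, step, stop):
    """Enumerate the index values of a dimension declared as [start:step:stop].

    The stop bound is exclusive, as in DIMENSION[0:1:4] -> 0, 1, 2, 3.
    """
    if step == 0:
        raise ValueError("step must be non-zero")
    values = []
    current = start
    # Walk from start towards stop, in either direction.
    while (step > 0 and current < stop) or (step < 0 and current > stop):
        values.append(current)
        current += step
    return values


def dimension_from_size(size):
    """Shortcut [size] for integer dimensions (assumed to mean [0:1:size])."""
    return dimension_values(0, 1, size)


print(dimension_values(0, 1, 4))
print(dimension_from_size(4))
```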




Array Definitions
                                  Fixed array

                             CREATE ARRAY A1 (
                              x INT DIMENSION[0:1:4],
                              y INT DIMENSION[0:1:4],
                              v FLOAT DEFAULT 0.0);


                        y                 null

                    3       0.0     0.0          0.0     0.0
                    2       0.0     0.0          0.0     0.0
             null                                                  null
                    1       0.0     0.0          0.0     0.0
                    0       0.0     0.0          0.0     0.0
                                                               x
                            0        1            2      3
                                          null
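
As a rough mental model of the fixed array A1, the sketch below uses a Python dict keyed by (x, y). Every cell inside the declared ranges exists from the start and holds the default, while coordinates outside the ranges are simply absent (shown as null in the figure). The helper name create_fixed_array is hypothetical and the dict representation is an assumption for illustration only.

```python
from itertools import product


def create_fixed_array(x_range, y_range, default=0.0):
    """Model a fixed array as a dict mapping (x, y) to the cell value."""
    return {(x, y): default for x, y in product(x_range, y_range)}


# A1: x and y are both declared as DIMENSION[0:1:4], v FLOAT DEFAULT 0.0.
a1 = create_fixed_array(range(0, 4, 1), range(0, 4, 1), default=0.0)

print(len(a1))        # 16
print(a1[(2, 3)])     # 0.0
print((4, 0) in a1)   # False
```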


Array Definitions
                                  Unbounded array

             CREATE ARRAY A2 (                    INSERT INTO A2 VALUES
              x INT DIMENSION,                      (1,0,5.5), (1,1,0.4), (2,2,4.5);
              y INT DIMENSION,
              v FLOAT DEFAULT 0.0);
                                                         current range
                       y
                                           null
                   3
                   2                 0.0           4.5
                           null                               null
                   1                 0.4           0.0
                   0                 5.5           0.0
                                                                        x
                            0         1             2           3
                                           null
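
The behaviour of the unbounded array can be sketched in Python as follows. The current range is taken to be the minimal bounding box of the inserted cells, and cells inside it that were never inserted read as the column default, as in the figure. The class UnboundedArray and its methods are hypothetical names; this is an illustrative model, not how the DBMS stores the data.

```python
class UnboundedArray:
    """Toy model of an array whose dimensions have no declared range."""

    def __init__(self, default=0.0):
        self.default = default
        self.cells = {}  # (x, y) -> explicitly inserted value

    def insert(self, x, y, v):
        self.cells[(x, y)] = v

    def current_range(self):
        """Return ((x_min, x_max), (y_min, y_max)), or None when empty."""
        if not self.cells:
            return None
        xs = [x for x, _ in self.cells]
        ys = [y for _, y in self.cells]
        return (min(xs), max(xs)), (min(ys), max(ys))

    def get(self, x, y):
        """Value inside the current range (default if unset), else None."""
        bounds = self.current_range()
        if bounds is None:
            return None
        (x_min, x_max), (y_min, y_max) = bounds
        if x_min <= x <= x_max and y_min <= y <= y_max:
            return self.cells.get((x, y), self.default)
        return None


a2 = UnboundedArray(default=0.0)
for row in [(1, 0, 5.5), (1, 1, 0.4), (2, 2, 4.5)]:
    a2.insert(*row)

print(a2.current_range())
print(a2.get(2, 0), a2.get(0, 0))
```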


Array & Table Coercions

        CREATE ARRAY A1 (
          x INT DIMENSION[0:1:4],
          y INT DIMENSION[0:1:4],
          v FLOAT DEFAULT 0.0);

            y             null
        3       0.0   0.0        0.0   0.0
        2       0.0   0.0        0.0   0.0
 null                                        null
        1       0.0   0.0        0.0   0.0
        0       0.0   0.0        0.0   0.0
                                             x
                0     1           2     3
                          null

        SELECT x, y, v FROM A1;

        full materialisation!

           x       y        v
           0       0        0.0
           0       1        0.0
           0       2        0.0
           0       3        0.0
           1       0        0.0
           1       1        0.0
           1       2        0.0
           1       3        0.0
           2       0        0.0
           2       1        0.0
           2       2        0.0
           2       3        0.0
           3       0        0.0
           3       1        0.0
           3       2        0.0
           3       3        0.0
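
The array-to-table coercion on this slide can be pictured with a small Python sketch. An array modelled as a dict keyed by (x, y) is flattened into one (x, y, v) row per cell, which is what full materialisation amounts to. The function name array_to_table and the dict model are assumptions made for illustration.

```python
def array_to_table(array):
    """Return one (x, y, v) row per cell, ordered by x and then y."""
    return [(x, y, v) for (x, y), v in sorted(array.items())]


# A1 as declared on the slide: a 4 x 4 array filled with the default 0.0.
a1 = {(x, y): 0.0 for x in range(4) for y in range(4)}
rows = array_to_table(a1)

print(len(rows))   # 16
print(rows[:2])    # [(0, 0, 0.0), (0, 1, 0.0)]
```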

Array & Table Coercions

 CREATE TABLE T2 (
   x INT, y INT,
   v FLOAT DEFAULT 0.0);

 INSERT INTO T2 VALUES
   (1,0,5.5), (1,1,0.4),
   (2,2,4.5), (1,1,1.3);

    x        y      v
    1        0      5.5
    1        1      0.4
    2        2      4.5
    1        1      1.3

 SELECT [x], [y], v FROM T2;

 dimension qualifiers: ‘[’, ‘]’

    y
                         null
    3
    2               0.0          4.5
         null                            null
    1               0.4          0.0
    0               5.5          0.0
                                               x
          0          1           2        3
                         null

 An unbounded array
 dimension ranges derived from the minimal bounding box
 cell values from the table or the column default
 duplicates are overwritten arbitrarily
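
A minimal Python sketch of the table-to-array coercion, under the rules listed above: the dimension ranges come from the minimal bounding box, missing cells take the column default, and duplicate coordinates overwrite each other. In this sketch the later row wins, which is only one of the outcomes the slide allows (the figure happens to show 0.4 for cell (1,1)). The function name table_to_array is hypothetical.

```python
def table_to_array(rows, default=0.0):
    """Coerce (x, y, v) rows into a dense dict over their bounding box."""
    cells = {}
    for x, y, v in rows:
        # Duplicate coordinates: the later row wins in this sketch.
        cells[(x, y)] = v
    if not cells:
        return {}
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    return {
        (x, y): cells.get((x, y), default)
        for x in range(min(xs), max(xs) + 1)
        for y in range(min(ys), max(ys) + 1)
    }


t2 = [(1, 0, 5.5), (1, 1, 0.4), (2, 2, 4.5), (1, 1, 1.3)]
array = table_to_array(t2)

print(sorted(array))       # coordinates covered by the bounding box
print(array[(2, 0)])       # 0.0, filled from the column default
```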


Array Modifications

                            DELETE FROM A1 WHERE x = 1;



                        y                   null

                    3        0.0     null          0.0     0.0
                    2        0.0     null          0.0     0.0
             null                                                      null
                    1        0.0     null          0.0     0.0
                    0        0.0     null          0.0     0.0
                                                                   x
                              0       1            2        3
                                            null
                                              creates holes in the array




Array Modifications

                        UPDATE A1 SET v = 0.5 WHERE y = 1;
                        INSERT INTO A1 VALUES
                          (0,1,0.5), (1,1,0.5), (2,1,0.5), (3,1,0.5);
                        y                     null

                    3        0.0        0.0           0.0      0.0
                    2        0.0        0.0           0.0      0.0
             null                                                           null
                    1        0.5        0.5           0.5      0.5
                    0        0.0        0.0           0.0      0.0
                                                                        x
                              0          1            2          3
                                               null

                                             set/change cell values
                                             overwrite existing values
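
The effect of the three modification statements on a fixed array can be sketched in Python. DELETE leaves the shape intact and turns the matching cells into holes (None here), while UPDATE and INSERT both overwrite cell values, so the two statements on this slide yield the same array. All helper names are hypothetical, and rejecting an INSERT outside the declared range is an assumption of this sketch, not something the slides state.

```python
def make_a1():
    """A1 from the slides: 4 x 4 cells, all holding the default 0.0."""
    return {(x, y): 0.0 for x in range(4) for y in range(4)}


def delete_where(array, predicate):
    """DELETE: matching cells become holes; the array keeps its shape."""
    for (x, y) in array:
        if predicate(x, y):
            array[(x, y)] = None


def update_where(array, predicate, value):
    """UPDATE: set the value of every matching cell."""
    for (x, y) in array:
        if predicate(x, y):
            array[(x, y)] = value


def insert_cells(array, rows):
    """INSERT: overwrite the addressed cells of a fixed array."""
    for x, y, v in rows:
        if (x, y) not in array:
            # Assumption: coordinates outside the fixed range are rejected.
            raise KeyError((x, y))
        array[(x, y)] = v


holes = make_a1()
delete_where(holes, lambda x, y: x == 1)

updated = make_a1()
update_where(updated, lambda x, y: y == 1, 0.5)

inserted = make_a1()
insert_cells(inserted, [(0, 1, 0.5), (1, 1, 0.5), (2, 1, 0.5), (3, 1, 0.5)])

print(holes[(1, 2)])        # None
print(updated == inserted)  # True
```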



Array Views

                                 CREATE ARRAY VIEW A2 (
                                  x INT DIMENSION [-1:1:5],
                                  y INT DIMENSION [-1:1:5],
                                  w FLOAT DEFAULT 0.0) AS
                                 SELECT x-1, y, v FROM A1 WHERE x > 1 UNION
                                 SELECT x, y, 1.0 FROM A1 WHERE x = 3;


       A1:
            y                    null
        3       -1.0  -1.0  -1.0  -1.0
        2       -1.0  -1.0  -1.0  -1.0
 null                                       null
        1       -1.0   0.5   0.5   0.5
        0       -1.0  -1.0  -1.0  -1.0
                  0     1     2     3
                                            x
                         null

       A2:
            y                          null
        4        0.0   0.0   0.0   0.0   0.0   0.0
        3        0.0  -1.0  -1.0  -1.0   1.0   0.0
        2        0.0  -1.0  -1.0  -1.0   1.0   0.0
 null                                                   null
        1        0.0   0.5   0.5   0.5   1.0   0.0
        0        0.0  -1.0  -1.0  -1.0   1.0   0.0
       -1        0.0   0.0   0.0   0.0   0.0   0.0
                 -1     0     1     2     3     4
                                                        x
                                       null

Array Tiling
                      SELECT [x], [y], AVG(v) FROM A1
                      GROUP BY A1[x:x+2][y:y+2];


                              y              null

                          3       0.0   0.0         0.0   0.0


   Anchor point:          2       0.0   0.0         0.0   0.0
     A1[x][y]      null                                         null
                          1       0.0   0.5         0.5   0.5

                          0       0.0   0.0         0.0   0.0
                                   0     1           2     3
                                                                 x
                                             null




Array Tiling
                SELECT [x], [y], AVG(v) FROM A1
                GROUP BY A1[x:x+2][y:y+2];


                        y               null

                    3       0.0   0.0         0.0   0.0

                    2       0.0   0.0         0.0   0.0
             null                                         null
                    1       0.125 0.25        0.25  0.25

                    0       0.125 0.25        0.25  0.25
                             0     1           2     3
                                                            x
                                        null
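
The tiling result above can be reproduced with a small Python sketch. Two readings are inferred from the figure rather than stated on the slide: A1[x:x+2][y:y+2] covers the cells x, x+1 and y, y+1 around each anchor point, and tile cells that fall outside the array are left out of the average. The names make_a1 and tile_avg are hypothetical.

```python
def make_a1():
    """Input array from the slide: 0.0 everywhere, 0.5 at y = 1 for x = 1..3."""
    a1 = {(x, y): 0.0 for x in range(4) for y in range(4)}
    for x in (1, 2, 3):
        a1[(x, 1)] = 0.5
    return a1


def tile_avg(array, width, height):
    """AVG(v) over a width x height tile anchored at every cell of the array."""
    result = {}
    for (ax, ay) in array:
        values = []
        for dx in range(width):
            for dy in range(height):
                v = array.get((ax + dx, ay + dy))
                if v is not None:  # skip cells outside the array and holes
                    values.append(v)
        result[(ax, ay)] = sum(values) / len(values) if values else None
    return result


tiles = tile_avg(make_a1(), 2, 2)
for y in (3, 2, 1, 0):
    print(y, [tiles[(x, y)] for x in range(4)])
```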




Array Tiling
                SELECT [x], [y], AVG(v) FROM A1
                GROUP BY A1[x-1][y], A1[x][y-1],
                  A1[x][y], A1[x+1][y], A1[x][y+1];

                        y              null

                    3       0.0   0.0         0.0   0.0

                    2       0.0   0.0         0.0   0.0
             null                                         null
                    1       0.0   0.5         0.5   0.5

                    0       0.0   0.0         0.0   0.0
                             0     1           2     3
                                                           x
                                       null




Array Tiling
                SELECT [x], [y], AVG(v) FROM A1
                GROUP BY A1[x-1][y], A1[x][y-1],
                  A1[x][y], A1[x+1][y], A1[x][y+1];

                        y               null

                    3        0.0   0.0         0.0   0.0

                    2        0.0   0.1         0.1   0.125
             null                                           null
                    1       0.125 0.2          0.3   0.25

                    0        0.0 0.125 0.125 0.167
                              0     1           2     3
                                                             x
                                        null
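
The five-cell tiling can be recomputed outside the database as a quick check. The sketch below is only an illustration in Python, not SciQL itself: the names `STENCIL`, `tile_average` and `tile_all` are hypothetical, and it assumes that cells falling outside the array are simply skipped by AVG. Under that assumption it yields, for example, 0.2 at (1,1) and 0.167 at (3,0), as on the slide.

```python
# A1 as drawn on the slide, indexed as grid[y][x].
A1 = [
    [0.0, 0.0, 0.0, 0.0],  # y = 0
    [0.0, 0.5, 0.5, 0.5],  # y = 1
    [0.0, 0.0, 0.0, 0.0],  # y = 2
    [0.0, 0.0, 0.0, 0.0],  # y = 3
]

# Relative offsets (dx, dy) of the five cells named in the GROUP BY clause:
# A1[x-1][y], A1[x][y-1], A1[x][y], A1[x+1][y], A1[x][y+1].
STENCIL = [(-1, 0), (0, -1), (0, 0), (1, 0), (0, 1)]


def tile_average(grid, x, y):
    """Average the stencil cells around anchor (x, y).

    Assumption: cells outside the array bounds are skipped.
    """
    height, width = len(grid), len(grid[0])
    values = [
        grid[y + dy][x + dx]
        for dx, dy in STENCIL
        if 0 <= x + dx < width and 0 <= y + dy < height
    ]
    return sum(values) / len(values)


def tile_all(grid):
    """Apply tile_average to every anchor point; result has the same shape."""
    return [
        [tile_average(grid, x, y) for x in range(len(grid[0]))]
        for y in range(len(grid))
    ]


if __name__ == "__main__":
    result = tile_all(A1)
    # Print y = 3 first so the output reads like the slide's grid.
    for y in reversed(range(len(result))):
        print(y, " ".join(f"{v:.3f}" for v in result[y]))
```

Keeping the stencil as a list of offsets mirrors the way the GROUP BY clause enumerates cells relative to the anchor point, so a different tile shape only changes `STENCIL`.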




Seismology Use Case
      Recent aftershock in Chile

             2 TB waveform data at 100 Hz

             detecting seismic events using
             STA/LTA (e.g., 2 sec / 15 sec)

             remove false positives

             window-based 3 min. cuts

             further analysis: digital signal
             processing operations

      Current problems

             accessing waveform files is too slow

             unpacking and positioning MSEED
             data every time takes too long



Seismology Use Case
      2 TB waveform data at 100 Hz, stored as an array:

             CREATE ARRAY MSeed (
                station VARCHAR(5) DIMENSION ['0':*:'ZZZZZ'],
                time    TIMESTAMP  DIMENSION,
                data    DECIMAL(8,6)
             );

      [Figure: the MSeed array, with stations (abc, bcd, bce, efg) on the
      vertical axis and time on the horizontal axis]

Seismology Use Case
      Detecting seismic events using STA/LTA (e.g., 2 sec / 15 sec)

             -- avg of 2 sec. windows:

             SELECT M.station, M.time, AVG(M.data)
             FROM MSeed AS M
             GROUP BY
                M[station][time - INTERVAL '2' SECOND : time];

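
As a rough illustration of what the structural GROUP BY computes, the Python sketch below averages a trailing two-second window for every sample of a single station. It is not SciQL: the function name `trailing_window_average` is hypothetical, times are plain seconds standing in for TIMESTAMP, and the window is assumed to be half-open, [t - 2 s, t), by analogy with the exclusive upper bound of dimension ranges such as [0:1:4].

```python
def trailing_window_average(samples, window):
    """For each sample time t, average the values with t - window <= time < t.

    `samples` is a list of (time, value) pairs for ONE station, sorted by
    strictly increasing time. Anchors whose window is empty are skipped.
    """
    averages = []
    start = 0          # index of the first sample still inside the window
    running_sum = 0.0  # sum of the values of samples[start:i]
    for i, (t, _) in enumerate(samples):
        # Drop samples that are now older than t - window.
        while start < i and samples[start][0] < t - window:
            running_sum -= samples[start][1]
            start += 1
        if i > start:
            averages.append((t, running_sum / (i - start)))
        # Sample i becomes part of the window for later anchors.
        running_sum += samples[i][1]
    return averages


if __name__ == "__main__":
    # One sample per second keeps the example readable; real data is 100 Hz.
    samples = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]
    for t, avg in trailing_window_average(samples, window=2.0):
        print(t, avg)
```

The running sum keeps the cost linear in the number of samples, which matters at 100 Hz; a production version would also have to deal with gaps and floating-point drift.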
Seismology Use Case
      Detecting seismic events using STA/LTA (e.g., 2 sec / 15 sec)

             CREATE TABLE Event(
                station VARCHAR(5),
                time    TIMESTAMP,
                ratio   FLOAT,
                PRIMARY KEY (station, time));

             INSERT INTO Event
             SELECT M1.station, M1.time,
                AVG(M1.data)/AVG(M2.data) AS ratio
             FROM MSeed AS M1, MSeed AS M2
             WHERE M1.station = M2.station
                AND M1.time = M2.time
             GROUP BY
                M1[station][time - INTERVAL '2' SECOND : time],
                M2[station][time - INTERVAL '15' SECOND : time]
             HAVING AVG(M1.data)/AVG(M2.data) > ?delta;

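
The detection step can be sketched in Python as follows. This is an illustrative reading of the query, not the SciQL implementation: `window_mean` and `detect_events` are hypothetical names, times are seconds as floats, windows are assumed half-open ([t - length, t)), and anchors with an empty window or a zero long-term average are skipped. Like the slide, it averages the raw data values.

```python
def window_mean(samples, end_time, length):
    """Mean of the values with end_time - length <= time < end_time.

    Returns None when no sample falls inside the window.
    """
    values = [v for t, v in samples if end_time - length <= t < end_time]
    return sum(values) / len(values) if values else None


def detect_events(samples, short, long, delta):
    """Return (time, ratio) pairs where STA/LTA exceeds delta.

    `samples` holds (time, value) pairs for ONE station; `short` and `long`
    are the window lengths in seconds (2 and 15 on the slide).
    """
    events = []
    for t, _ in samples:
        sta = window_mean(samples, t, short)
        lta = window_mean(samples, t, long)
        if sta is None or lta is None or lta == 0:
            continue  # ratio undefined; skipped in this sketch
        ratio = sta / lta
        if ratio > delta:
            events.append((t, ratio))
    return events


if __name__ == "__main__":
    # Quiet signal of 1.0 with a two-sample burst of 10.0 at t = 20 and 21.
    samples = [(float(t), 10.0 if t in (20, 21) else 1.0) for t in range(23)]
    for t, ratio in detect_events(samples, short=2.0, long=15.0, delta=3.0):
        print(t, round(ratio, 4))
```

This version rescans the samples for every anchor, which is fine for a demonstration but quadratic; the running-sum idea used for a single window would apply to both windows here as well.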
Seismology Use Case
      Remove false positives

             -- detect isolated errors by direct environment
             -- using wave propagation statics
             CREATE TABLE Neighbors(
                station1 VARCHAR(5),
                station2 VARCHAR(5),
                mindelay INTERVAL SECOND,
                maxdelay INTERVAL SECOND,
                weight   FLOAT
             );

             -- remove the false positives from Event
             -- (assumes Event also carries an id column, which the
             --  CREATE TABLE Event statement in this deck does not declare)
             DELETE FROM Event WHERE id NOT IN (
                SELECT E1.id
                FROM Event AS E1, Event AS E2, Neighbors AS N
                WHERE E1.station = N.station1
                  AND E2.station = N.station2
                  AND E2.time BETWEEN E1.time + N.mindelay
                                  AND E1.time + N.maxdelay
                  AND E1.ratio > E2.ratio * N.weight);

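
The filtering logic of the DELETE statement can be mirrored in plain Python. The sketch below is an assumption-laden illustration: the `Event` and `Neighbor` tuples and the function `confirmed_events` are hypothetical, times and delays are seconds as floats rather than TIMESTAMP and INTERVAL values, and events are identified by their (station, time) pair instead of an id.

```python
from typing import NamedTuple


class Event(NamedTuple):
    station: str
    time: float   # seconds; stands in for TIMESTAMP
    ratio: float


class Neighbor(NamedTuple):
    station1: str
    station2: str
    mindelay: float  # seconds; stands in for INTERVAL SECOND
    maxdelay: float
    weight: float


def confirmed_events(events, neighbors):
    """Keep the events that at least one neighbouring station confirms.

    An event e1 is confirmed when some neighbour row (e1.station -> station2)
    has an event e2 at station2 whose time lies within
    [e1.time + mindelay, e1.time + maxdelay] and e1.ratio > e2.ratio * weight.
    """
    kept = []
    for e1 in events:
        for n in neighbors:
            if n.station1 != e1.station:
                continue
            if any(
                e2.station == n.station2
                and e1.time + n.mindelay <= e2.time <= e1.time + n.maxdelay
                and e1.ratio > e2.ratio * n.weight
                for e2 in events
            ):
                kept.append(e1)
                break  # one confirming neighbour is enough
    return kept


if __name__ == "__main__":
    events = [
        Event("abc", 10.0, 5.0),
        Event("bcd", 12.0, 4.0),
        Event("efg", 50.0, 6.0),
    ]
    neighbors = [
        Neighbor("abc", "bcd", 1.0, 3.0, 1.0),
        Neighbor("efg", "abc", 1.0, 3.0, 1.0),
    ]
    print(confirmed_events(events, neighbors))
```

Everything not returned by `confirmed_events` corresponds to the rows the DELETE would remove.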
Seismology Use Case
      Window-based 3 min. cuts

             -- pass time series to a UDF, written in, e.g., C:

             SELECT myfunction(M[station].*)
             FROM MSeed AS M, Event AS E
             WHERE M.station = E.station
               AND M.time = E.time
             GROUP BY DISTINCT
               M[station][time - INTERVAL '1' MINUTE :
                          time + INTERVAL '2' MINUTE];

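
A minimal Python sketch of the three-minute cut is given below. It only illustrates the windowing: `cut_windows` is a hypothetical helper, `peak_amplitude` is a made-up stand-in for the slide's `myfunction`, times are seconds as floats, the window is assumed half-open ([t - 60 s, t + 120 s)), and the DISTINCT qualifier of the GROUP BY is not modelled.

```python
def cut_windows(waveform, events, before=60.0, after=120.0):
    """Yield (station, event_time, samples) for each detected event.

    `waveform` is an iterable of (station, time, value) triples and `events`
    an iterable of (station, time) pairs. Each cut holds the (time, value)
    samples of that station with event_time - before <= time < event_time + after.
    """
    waveform = list(waveform)  # allow several passes over the data
    for station, event_time in events:
        samples = [
            (t, v)
            for (s, t, v) in waveform
            if s == station and event_time - before <= t < event_time + after
        ]
        yield station, event_time, samples


def peak_amplitude(samples):
    """Hypothetical stand-in for myfunction: largest absolute value in the cut."""
    return max((abs(v) for _, v in samples), default=None)


if __name__ == "__main__":
    waveform = [
        ("abc", 0.0, 1.0),
        ("abc", 50.0, -4.0),
        ("abc", 100.0, 2.0),
        ("abc", 219.9, 3.0),
        ("abc", 220.0, 9.0),   # just outside the window of the event at t = 100
        ("bcd", 100.0, 7.0),   # other station, ignored for this event
    ]
    events = [("abc", 100.0)]
    for station, event_time, samples in cut_windows(waveform, events):
        print(station, event_time, peak_amplitude(samples))
```

Passing each cut as a list of samples plays the role of handing the array slice `M[station].*` to the user-defined function.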
Conclusion
             SciQL: a first step towards a tailored scientific DBMS
                A symbiosis of relational and array paradigms

             Under active implementation



             Open issues:
                Appropriate array denotations

                Functionally complete operation set

                Size limitations (due to BLOB representations)

                Existing foreign files
                Scale





SciQL, Bridging the Gap between Science and Relational DBMS

  • 1. SciQL Bridging The Gap Between Science And Relational DBMS Martin Kersten, Ying Zhang, Milena Ivanova, Niels Nes CWI Amsterdam IDEAS 2011, Sep. 21-23, 2011,!"#$%&'()*+,#-&$.#/(012#&+$#%3$%#,( Lisbon, Portugal 2.#(4&#$5()*+,#-&$".1(6&$& !"#$%&'()&"#*+,-( ./0/123 4")*'()5"%%,%*'(*#-(( 6!7(8 9:7;;9
  • 2. Who needs arrays anyway? Seismology – 1-D waveforms, 3-D spatial data Astronomy – temporal ordered rasters Climate simulation – temporal ordered grid Remote sensing – images of 2-D or higher Genomics – ordered DNA strings Scientists love arrays: HDF5, NETCDF, FITS, MSEED, … but also use: lists, tables, XML, ... 2011-09-22 IDEAS 2011 2
  • 3. Arrays In DBMS Research issues already in the 80’s OODB, multi-dimensional DBMS, Sequence DBMS, ... Algebraic frameworks The Longhorn Array Database (S)RAM, AQL, AML, ... RasDaMan SQL language extension Store large arrays in chunks as BLOBs RasQL, AQuery, SRQL, … Array query (RasQL) optimisation on top of DBMS a notion of order Known to work up to 12 TBs! SQL:1999, SQL:2003 PostgreSQL 8.1 collection type, C-style arrays SciDB aggregation functions over arrays Array DBMS from scratch Overlapping chunks for parallel execution 2011-09-22 IDEAS 2011 3
  • 4. What is the problem with RDBMS? Appropriate array denotations? Functional complete operation set? Size limitations (due to BLOB representations)? Existing foreign files? Scale? ... 2011-09-22 IDEAS 2011 4
  • 5. SciQL An array query language based on SQL:2003 Pronounced as ‘cycle’ Distinguish features: Arrays and tables as first class citizens of DBMSs Seamless integration of relational and array paradigms Named dimensions with constraints Flexible structure-based grouping Seismology use case 2011-09-22 IDEAS 2011 5
  • 6. Array Definitions Dimensions and cell values Dimension range: [(start|∗) : (step|∗) : (stop|∗)] A short cut for integer-typed dimensions: [size] Dimension data type: scalar data types Cells: ≽0 value(s) / cell all data types of normal table columns 2011-09-22 IDEAS 2011 6
  • 7. Array Definitions Fixed array CREATE ARRAY A1 ( x INT DIMENSION[0:1:4], y INT DIMENSION[0:1:4], v FLOAT DEFAULT 0.0); y null 3 0.0 0.0 0.0 0.0 2 0.0 0.0 0.0 0.0 null null 1 0.0 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 x 0 1 2 3 null 2011-09-22 IDEAS 2011 7
  • 8. Array Definitions Unbounded array CREATE ARRAY A2 ( x INT DIMENSION, y INT DIMENSION, v FLOAT DEFAULT 0.0); y 3 2 null 1 0 x 0 1 2 3 2011-09-22 IDEAS 2011 8
  • 9. Array Definitions Unbounded array CREATE ARRAY A2 ( INSERT INTO A2 VALUES x INT DIMENSION, (1,0,5.5), (1,1,0.4), (2,2,4.5); y INT DIMENSION, v FLOAT DEFAULT 0.0); y 3 2 null 1 0 x 0 1 2 3 2011-09-22 IDEAS 2011 8
  • 10. Array Definitions Unbounded array CREATE ARRAY A2 ( INSERT INTO A2 VALUES x INT DIMENSION, (1,0,5.5), (1,1,0.4), (2,2,4.5); y INT DIMENSION, v FLOAT DEFAULT 0.0); y null 3 2 0.0 4.5 null null 1 0.4 0.0 0 5.5 0.0 x 0 1 2 3 null 2011-09-22 IDEAS 2011 8
  • 11. Array Definitions Unbounded array CREATE ARRAY A2 ( INSERT INTO A2 VALUES x INT DIMENSION, (1,0,5.5), (1,1,0.4), (2,2,4.5); y INT DIMENSION, v FLOAT DEFAULT 0.0); current range y null 3 2 0.0 4.5 null null 1 0.4 0.0 0 5.5 0.0 x 0 1 2 3 null 2011-09-22 IDEAS 2011 8
  • 12. Array & Table Coercions SELECT x, y, v FROM A1; CREATE ARRAY A1 ( x y v x INT DIMENSION[0:1:4], y INT DIMENSION[0:1:4], 0 0 0.0 v FLOAT DEFAULT 0.0); 0 1 0.0 full materialisation! y null 0 2 0.0 3 0.0 0.0 0.0 0.0 0 3 0.0 2 0.0 0.0 0.0 0.0 null null 1 0 0.0 1 0.0 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 1 1 0.0 x 0 1 2 3 1 2 0.0 null 1 3 0.0 2 0 0.0 2 1 0.0 2 2 0.0 2 3 0.0 3 0 0.0 3 1 0.0 3 2 0.0 3 3 0.0 2011-09-22 IDEAS 2011 9
  • 13. Array & Table Coercions SELECT [x], [y], v FROM T2; dimension qualifiers: ‘[’, ‘]’ CREATE TABLE T2 ( x INT, y INT, y v FLOAT DEFAULT 0.0); null INSERT INTO T2 VALUES 3 (1,0,5.5), (1,1,0.4), (2,2,4.5), (1,1,1.3); 2 0.0 4.5 x y v null null 1 0 5.5 1 0.4 0.0 1 1 0.4 0 5.5 0.0 2 2 4.5 x 1 1 1.3 0 1 2 3 null An unbounded array dimension ranges derived from the minimal bounding box cells values from the table or the column default duplicates are overwritten arbitrarily 2011-09-22 IDEAS 2011 10
  • 14. Array Modifications DELETE FROM A1 WHERE x = 1; y null 3 0.0 null 0.0 0.0 2 0.0 null 0.0 0.0 null null 1 0.0 null 0.0 0.0 0 0.0 null 0.0 0.0 x 0 1 2 3 null creates holes in the array 2011-09-22 IDEAS 2011 11
  • 15. Array Modifications UPDATE A1 SET v = 0.5 WHERE y = 1; INSERT INTO A1 VALUES (0,1,0.5), (1,1,0.5), (2,1,0.5), (3,1,0.5); y null 3 0.0 0.0 0.0 0.0 2 0.0 0.0 0.0 0.0 null null 1 0.5 0.5 0.5 0.5 0 0.0 0.0 0.0 0.0 x 0 1 2 3 null set/change cell values overwrite existing values 2011-09-22 IDEAS 2011 12
  • 16. Array Views CREATE ARRAY VIEW A2 ( x INT DIMENSION [-1:1:5], y INT DIMENSION [-1:1:5], w FLOAT DEFAULT 0.0) AS SELECT x-1, y, v FROM A1 WHERE x > 1 UNION SELECT x, y, 1.0 FROM A1 WHERE x = 3; y null y null 4 0.0 0.0 0.0 0.0 0.0 0.0 3 -1.0 -1.0 -1.0 -1.0 3 0.0 0.0 0.0 0.0 0.0 0.0 2 -1.0 -1.0 -1.0 -1.0 2 0.0 0.0 0.0 0.0 0.0 0.0 null null null null 1 -1.0 0.5 0.5 0.5 1 0.0 0.0 0.0 0.0 0.0 0.0 0 -1.0 -1.0 -1.0 -1.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0 1 2 3 x -1 0.0 0.0 0.0 0.0 0.0 0.0 null -1 0 1 2 3 4 x null 2011-09-22 IDEAS 2011 13
  • 17. Array Views CREATE ARRAY VIEW A2 ( x INT DIMENSION [-1:1:5], y INT DIMENSION [-1:1:5], w FLOAT DEFAULT 0.0) AS SELECT x-1, y, v FROM A1 WHERE x > 1 UNION SELECT x, y, 1.0 FROM A1 WHERE x = 3; y null y null 4 0.0 0.0 0.0 0.0 0.0 0.0 3 -1.0 -1.0 -1.0 -1.0 3 0.0 0.0 0.0 0.0 0.0 0.0 2 -1.0 -1.0 -1.0 -1.0 2 0.0 0.0 0.0 0.0 0.0 0.0 null null null null 1 -1.0 0.5 0.5 0.5 1 0.0 0.0 0.0 0.0 0.0 0.0 0 -1.0 -1.0 -1.0 -1.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0 1 2 3 x -1 0.0 0.0 0.0 0.0 0.0 0.0 null -1 0 1 2 3 4 x null 2011-09-22 IDEAS 2011 13
  • 18. Array Views CREATE ARRAY VIEW A2 ( x INT DIMENSION [-1:1:5], y INT DIMENSION [-1:1:5], w FLOAT DEFAULT 0.0) AS SELECT x-1, y, v FROM A1 WHERE x > 1 UNION SELECT x, y, 1.0 FROM A1 WHERE x = 3; y null y null 4 0.0 0.0 0.0 0.0 0.0 0.0 3 -1.0 -1.0 -1.0 -1.0 3 0.0 -1.0 -1.0 -1.0 0.0 0.0 0.0 0.0 0.0 2 -1.0 -1.0 -1.0 -1.0 2 0.0 -1.0 -1.0 -1.0 0.0 0.0 0.0 0.0 0.0 null null null null 1 -1.0 0.5 0.5 0.5 1 0.0 0.5 0.0 0.5 0.0 0.5 0.0 0.0 0.0 0 -1.0 -1.0 -1.0 -1.0 0 0.0 -1.0 -1.0 -1.0 0.0 0.0 0.0 0.0 0.0 0 1 2 3 x -1 0.0 0.0 0.0 0.0 0.0 0.0 null -1 0 1 2 3 4 x null 2011-09-22 IDEAS 2011 13
  • 19. Array Views CREATE ARRAY VIEW A2 ( x INT DIMENSION [-1:1:5], y INT DIMENSION [-1:1:5], w FLOAT DEFAULT 0.0) AS SELECT x-1, y, v FROM A1 WHERE x > 1 UNION SELECT x, y, 1.0 FROM A1 WHERE x = 3; y null y null 4 0.0 0.0 0.0 0.0 0.0 0.0 3 -1.0 -1.0 -1.0 -1.0 3 0.0 -1.0 -1.0 -1.0 0.0 0.0 0.0 0.0 0.0 2 -1.0 -1.0 -1.0 -1.0 2 0.0 -1.0 -1.0 -1.0 0.0 0.0 0.0 0.0 0.0 null null null null 1 -1.0 0.5 0.5 0.5 1 0.0 0.5 0.0 0.5 0.0 0.5 0.0 0.0 0.0 0 -1.0 -1.0 -1.0 -1.0 0 0.0 -1.0 -1.0 -1.0 0.0 0.0 0.0 0.0 0.0 0 1 2 3 x -1 0.0 0.0 0.0 0.0 0.0 0.0 null -1 0 1 2 3 4 x null 2011-09-22 IDEAS 2011 13
  • 20. Array Views CREATE ARRAY VIEW A2 ( x INT DIMENSION [-1:1:5], y INT DIMENSION [-1:1:5], w FLOAT DEFAULT 0.0) AS SELECT x-1, y, v FROM A1 WHERE x > 1 UNION SELECT x, y, 1.0 FROM A1 WHERE x = 3; y null y null 4 0.0 0.0 0.0 0.0 0.0 0.0 3 -1.0 -1.0 -1.0 -1.0 3 0.0 -1.0 -1.0 -1.0 0.0 0.0 0.0 0.0 0.0 2 -1.0 -1.0 -1.0 -1.0 2 0.0 -1.0 -1.0 -1.0 0.0 0.0 0.0 0.0 0.0 null null null null 1 -1.0 0.5 0.5 0.5 1 0.0 0.5 0.0 0.5 0.0 0.5 0.0 0.0 0.0 0 -1.0 -1.0 -1.0 -1.0 0 0.0 -1.0 -1.0 -1.0 0.0 0.0 0.0 0.0 0.0 0 1 2 3 x -1 0.0 0.0 0.0 0.0 0.0 0.0 null -1 0 1 2 3 4 x null 2011-09-22 IDEAS 2011 13
  • 21. Array Views CREATE ARRAY VIEW A2 ( x INT DIMENSION [-1:1:5], y INT DIMENSION [-1:1:5], w FLOAT DEFAULT 0.0) AS SELECT x-1, y, v FROM A1 WHERE x > 1 UNION SELECT x, y, 1.0 FROM A1 WHERE x = 3; y null y null 4 0.0 0.0 0.0 0.0 0.0 0.0 3 -1.0 -1.0 -1.0 -1.0 3 0.0 -1.0 -1.0 -1.0 0.0 0.0 0.0 0.0 1.0 0.0 2 -1.0 -1.0 -1.0 -1.0 2 0.0 -1.0 -1.0 -1.0 0.0 0.0 0.0 0.0 1.0 0.0 null null null null 1 -1.0 0.5 0.5 0.5 1 0.0 0.5 0.0 0.5 0.0 0.5 0.0 1.0 0.0 0.0 0 -1.0 -1.0 -1.0 -1.0 0 0.0 -1.0 -1.0 -1.0 0.0 0.0 0.0 0.0 1.0 0.0 0 1 2 3 x -1 0.0 0.0 0.0 0.0 0.0 0.0 null -1 0 1 2 3 4 x null 2011-09-22 IDEAS 2011 13
  • 22. Array Tiling
      SELECT [x], [y], AVG(v) FROM A1 GROUP BY A1[x:x+2][y:y+2];
      Anchor point: A1[x][y]
      Input array A1 (v):
        y=3 |  0.0   0.0   0.0   0.0
        y=2 |  0.0   0.0   0.0   0.0
        y=1 |  0.0   0.5   0.5   0.5
        y=0 |  0.0   0.0   0.0   0.0
              x=0   x=1   x=2   x=3
  • 28. Array Tiling
      SELECT [x], [y], AVG(v) FROM A1 GROUP BY A1[x:x+2][y:y+2];
      Result:
        y=3 |  0.0    0.0    0.0    0.0
        y=2 |  0.0    0.0    0.0    0.0
        y=1 |  0.125  0.25   0.25   0.25
        y=0 |  0.125  0.25   0.25   0.25
               x=0    x=1    x=2    x=3
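The 2x2 tiling result can be reproduced with a short Python sketch. It assumes that the slice `A1[x:x+2][y:y+2]` covers the cells x, x+1 and y, y+1, and that cells falling outside the array are simply left out of the average; both assumptions are inferred from the numbers on the slide, not from a SciQL specification. The names `tile_avg` and `SQUARE` are illustrative.

```python
# A1[y][x], values taken from the slide.
A1 = [
    [0.0, 0.0, 0.0, 0.0],  # y = 0
    [0.0, 0.5, 0.5, 0.5],  # y = 1
    [0.0, 0.0, 0.0, 0.0],  # y = 2
    [0.0, 0.0, 0.0, 0.0],  # y = 3
]


def tile_avg(a, offsets):
    """Average, per anchor cell, over the tile given by (dx, dy) offsets.

    Cells outside the array bounds are ignored (assumption)."""
    height, width = len(a), len(a[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            cells = [a[y + dy][x + dx] for dx, dy in offsets
                     if 0 <= x + dx < width and 0 <= y + dy < height]
            out[y][x] = sum(cells) / len(cells)
    return out


# Tile A1[x:x+2][y:y+2]: the anchor plus its right, upper and upper-right cells.
SQUARE = [(dx, dy) for dx in (0, 1) for dy in (0, 1)]
result = tile_avg(A1, SQUARE)
```

For example, the anchor at x = 3, y = 0 only has two cells inside the array (0.0 and 0.5), which gives the 0.25 shown on the slide.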
  • 29. Array Tiling
      SELECT [x], [y], AVG(v) FROM A1
      GROUP BY A1[x-1][y], A1[x][y-1], A1[x][y], A1[x+1][y], A1[x][y+1];
      Input array A1 (v):
        y=3 |  0.0   0.0   0.0   0.0
        y=2 |  0.0   0.0   0.0   0.0
        y=1 |  0.0   0.5   0.5   0.5
        y=0 |  0.0   0.0   0.0   0.0
              x=0   x=1   x=2   x=3
  • 33. Array Tiling
      SELECT [x], [y], AVG(v) FROM A1
      GROUP BY A1[x-1][y], A1[x][y-1], A1[x][y], A1[x+1][y], A1[x][y+1];
      Result:
        y=3 |  0.0    0.0    0.0    0.0
        y=2 |  0.0    0.1    0.1    0.0
        y=1 |  0.125  0.2    0.3    0.25
        y=0 |  0.0    0.125  0.125  0.167
               x=0    x=1    x=2    x=3
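The cross-shaped tile can be sketched the same way in Python. The sketch assumes that neighbours outside the array are left out of the average; `cross_avg` and `CROSS` are illustrative names. Under that assumption most cells agree with the slide (for example 0.2 at x = 1, y = 1 and 0.167 at x = 3, y = 0), but the cell at x = 3, y = 2 comes out as 0.125 where the slide shows 0.0, so the boundary handling here should be treated as a guess.

```python
# A1[y][x], values taken from the slide.
A1 = [
    [0.0, 0.0, 0.0, 0.0],  # y = 0
    [0.0, 0.5, 0.5, 0.5],  # y = 1
    [0.0, 0.0, 0.0, 0.0],  # y = 2
    [0.0, 0.0, 0.0, 0.0],  # y = 3
]

# (dx, dy) offsets for A1[x-1][y], A1[x][y-1], A1[x][y], A1[x+1][y], A1[x][y+1].
CROSS = [(-1, 0), (0, -1), (0, 0), (1, 0), (0, 1)]


def cross_avg(a, offsets=CROSS):
    """Average over the anchor cell and its four direct neighbours.

    Neighbours outside the array bounds are ignored (assumption)."""
    height, width = len(a), len(a[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            cells = [a[y + dy][x + dx] for dx, dy in offsets
                     if 0 <= x + dx < width and 0 <= y + dy < height]
            out[y][x] = sum(cells) / len(cells)
    return out


result = cross_avg(A1)
```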
  • 34. Seismology Use Case
      Recent aftershocks in Chile
        2 TB of waveform data at 100 Hz
        detecting seismic events using STA/LTA (e.g., 2 sec / 15 sec)
        remove false positives
        window-based 3 min. cuts
        further analysis: digital signal processing operations
      Current problems
        accessing waveform files is too slow
        unpacking and positioning MSEED data every time takes too long
  • 35. Seismology Use Case
      CREATE ARRAY MSeed (
        station VARCHAR(5) DIMENSION ['0':*:'ZZZZZ'],
        time    TIMESTAMP DIMENSION,
        data    DECIMAL(8,6) );
      [Figure: MSeed drawn as a two-dimensional array, with stations (abc, bcd, bce, efg) on one axis and time on the other]
  • 36. Seismology Use Case
      -- avg of 2 sec. windows:
      SELECT M.station, M.time, AVG(M.data)
      FROM MSeed AS M
      GROUP BY M[station][time - INTERVAL '2' SECOND : time];
  • 37. Seismology Use Case
      CREATE TABLE Event(
        station VARCHAR(5),
        time    TIMESTAMP,
        ratio   FLOAT,
        PRIMARY KEY (station, time));

      INSERT INTO Event
      SELECT M1.station, M1.time, AVG(M1.data)/AVG(M2.data) AS ratio
      FROM MSeed AS M1, MSeed AS M2
      WHERE M1.station = M2.station AND M1.time = M2.time
      GROUP BY M1[station][time - INTERVAL '2' SECOND : time],
               M2[station][time - INTERVAL '15' SECOND : time]
      HAVING AVG(M1.data)/AVG(M2.data) > ?delta;
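STA/LTA stands for short-term average over long-term average. The detection step for a single station can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: it assumes the trailing windows include the current sample, clips windows at the start of the series, averages raw values as the query does (seismological practice often averages absolute or squared amplitudes instead), and uses a hypothetical threshold `delta=3.0` in place of the `?delta` parameter. The function names are illustrative.

```python
def trailing_mean(samples, end, length):
    """Mean of the `length` samples ending at index `end` (inclusive),
    clipped at the start of the series."""
    start = max(0, end - length + 1)
    window = samples[start:end + 1]
    return sum(window) / len(window)


def detect_events(samples, rate_hz=100, sta_sec=2, lta_sec=15, delta=3.0):
    """Return (index, ratio) for every sample whose STA/LTA exceeds delta."""
    sta_len = sta_sec * rate_hz
    lta_len = lta_sec * rate_hz
    events = []
    for i in range(len(samples)):
        lta = trailing_mean(samples, i, lta_len)
        if lta == 0:
            continue  # ratio undefined; skip rather than divide by zero
        ratio = trailing_mean(samples, i, sta_len) / lta
        if ratio > delta:
            events.append((i, ratio))
    return events


# Synthetic trace: 30 s of background level 1.0, then a 2 s burst at 10.0.
trace = [1.0] * 3000 + [10.0] * 200
events = detect_events(trace)
```

Recomputing each window from scratch keeps the sketch short; a running sum would be the obvious choice for data at the scale mentioned on the slide.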
  • 38. Seismology Use Case
      -- detect isolated errors by direct environment
      -- using wave propagation statics
      CREATE TABLE Neighbors(
        station1 VARCHAR(5),
        station2 VARCHAR(5),
        mindelay INTERVAL SECOND,
        maxdelay INTERVAL SECOND,
        weight   FLOAT );

      -- remove the false positives from Event
      DELETE FROM Event WHERE (station, time) NOT IN (
        SELECT E1.station, E1.time
        FROM Event AS E1, Event AS E2, Neighbors AS N
        WHERE E1.station = N.station1 AND E2.station = N.station2
          AND E2.time BETWEEN E1.time + N.mindelay AND E1.time + N.maxdelay
          AND E1.ratio > E2.ratio * N.weight);
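The false-positive filter can be read as: keep an event only if some neighbouring station reports an event inside the expected delay window and the ratio condition holds. A minimal Python sketch of that reading, with plain tuples standing in for the Event and Neighbors tables and `confirmed_events` as an illustrative name:

```python
from datetime import datetime, timedelta


def confirmed_events(events, neighbors):
    """Keep events that are corroborated by a neighbouring station.

    events:    list of (station, time, ratio)
    neighbors: list of (station1, station2, mindelay, maxdelay, weight)
    """
    keep = []
    for s1, t1, r1 in events:
        corroborated = any(
            n1 == s1 and s2 == n2
            and t1 + dmin <= t2 <= t1 + dmax   # E2.time BETWEEN ...
            and r1 > r2 * w                    # E1.ratio > E2.ratio * N.weight
            for n1, n2, dmin, dmax, w in neighbors
            for s2, t2, r2 in events
        )
        if corroborated:
            keep.append((s1, t1, r1))
    return keep


# Hypothetical data: 'abc' is followed by 'bcd' 3 s later; 'efg' is isolated.
t0 = datetime(2011, 9, 22, 12, 0, 0)
events = [
    ("abc", t0, 5.0),
    ("bcd", t0 + timedelta(seconds=3), 4.0),
    ("efg", t0 + timedelta(seconds=100), 6.0),
]
neighbors = [("abc", "bcd", timedelta(seconds=1), timedelta(seconds=5), 1.0)]
kept = confirmed_events(events, neighbors)
```

With these rows only the `abc` event survives. The `bcd` event is dropped as well, because the neighbour relation is directional: it would need its own row with `bcd` as `station1`.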
  • 39. Seismology Use Case
      -- pass time series to a UDF, written in, e.g., C:
      SELECT myfunction(M[station].*)
      FROM MSeed AS M, Event AS E
      WHERE M.station = E.station AND M.time = E.time
      GROUP BY DISTINCT M[station][time - INTERVAL '1' MINUTE : time + INTERVAL '2' MINUTE];
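The 3-minute cut around each event (1 minute before, 2 minutes after) can be sketched in Python for one station's samples. The sketch assumes both window ends are inclusive, clips windows at the edges of the series, and does not model the DISTINCT keyword. `cut_windows` and `peak_amplitude` are illustrative names, the latter standing in for `myfunction`.

```python
def cut_windows(samples, event_indices, rate_hz=100,
                before_sec=60, after_sec=120):
    """Return one slice of `samples` per event index, covering
    before_sec seconds before and after_sec seconds after the event."""
    windows = []
    for i in event_indices:
        start = max(0, i - before_sec * rate_hz)
        stop = min(len(samples), i + after_sec * rate_hz + 1)
        windows.append(samples[start:stop])
    return windows


def peak_amplitude(window):
    """Stand-in for the user-defined function applied to each window."""
    return max(abs(v) for v in window)


# Hypothetical 5-minute trace at 100 Hz with two detected events.
trace = [float(i) for i in range(30000)]
windows = cut_windows(trace, [10000, 100])
peaks = [peak_amplitude(w) for w in windows]
```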
  • 40. Conclusion
      SciQL: a first step towards a tailored scientific DBMS
      A symbiosis of relational and array paradigms
      Under active implementation
      Open issues:
        Appropriate array denotations
        Functionally complete operation set
        Size limitations (due to BLOB representations)
        Existing foreign files
        Scale