Dr. Anand Deshpande, Chairman, Managing Director & CEO, Persistent Systems Ltd, talks about the life and work of Jim Gray (1998 Turing Award recipient) during the 6th Turing Session
3. JAMES ("JIM") NICHOLAS GRAY
United States – 1998
CITATION
For fundamental contributions to database and
transaction processing research and technical
leadership in system implementation from research
prototypes to commercial products. The transaction
is the fundamental abstraction underlying database
system concurrency and failure recovery. Gray’s
work [defined] the key transaction properties:
atomicity, consistency, isolation and durability, and
his locking and recovery work demonstrated how to
build … systems that exhibit these properties.
4. E. F. Codd invented the relational database in 1970 and created what is today a $100+ billion/year industry.
5. Codd’s Relational Model
● Simple model
● Data stored in relational tables
● Data Independence – separation of
data storage and data access
● Declarative Queries
● Algebra to mathematically reason
about data objects – made query
optimization possible
● Ad-hoc queries through SQL.
● Embedded in operational systems.
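A minimal sketch of such an ad-hoc declarative query, assuming a hypothetical Employees table: the query states what is wanted, and the algebra lets the optimizer choose how to retrieve it.

```sql
-- Declarative: no loops or access paths, only the desired result.
SELECT name, salary
FROM Employees
WHERE dept = 'Research' AND salary > 50000;
```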
6. ACID properties are fundamental to relational systems and necessary for on-line transaction processing (OLTP) systems.
● Jim Gray defined the ACID properties (Atomicity, Consistency, Isolation, Durability) to guarantee that database transactions are processed reliably.
7. From Transactions to Transaction Processing Systems - II
[Figure: reality vs. abstraction: a change in the real world maps to a transaction against the DB; a query against the DB returns an answer.]
The real state is represented by an abstraction, called the database, and the transformation of the real state is mirrored by the execution of a program, called a transaction, that transforms the database.
8. Gray defined Data Manipulation Actions as:
● Unprotected: transient and internal state only.
● Protected: grouped into transactions and reflected in the state of the transaction outcome.
● Real: involve sensors, actuators, etc. They cannot be undone, but they can be compensated.
9. Definitions
● A transaction is a sequence of operations that form a single
unit of work
● A transaction is often initiated by an application program
– begin a transaction
START TRANSACTION
– end a transaction
COMMIT (if successful) or ROLLBACK (if errors)
● Either the whole transaction must succeed or the effect of all
operations has to be undone (rollback)
● To achieve durable transaction atomicity, the transition to the "committed" state must be accomplished by a single write to non-volatile storage.
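A minimal sketch of this pattern in standard SQL, assuming a hypothetical accounts table; either both updates take effect or neither does:

```sql
START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;  -- on error, issue ROLLBACK instead to undo both updates
```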
10. Structure of a Transaction Program
BEGIN WORK()
  ... the transaction's protected actions ...
COMMIT WORK() on success; at any point before commit, ROLLBACK WORK() aborts and undoes the work.
11. While at IBM San Jose Research
Laboratory
October 1972 to December 1980
● Jim Gray developed three key ideas related to transaction concurrency
control:
– The notion of a transaction
– Serializability; degrees of consistency;
– Multi-granularity locking.
● There are two main transaction issues
– concurrent execution of multiple transactions
– recovery after hardware failures and system crashes
12. Write Ahead Log (WAL) protocol
● The WAL protocol records the old and new states induced by protected actions separately from the actual state changes.
● The logged changes are written to stable storage before the actual changes are written back to stable storage (that's the "Write Ahead" part).
● Transactions are committed by simply appending and writing a 'commit' record to the recovery log. Logged changes are used to undo protected actions of aborted transactions and of transactions in progress at the time of a system failure.
13. Write Ahead Log (WAL) protocol
● Log records are also used to redo committed actions
whose actual changes have not been written back to
stable storage at the time of a system failure.
● The WAL protocol allows changed data to be written to
their stable storage home at any time after the log
records describing the changes have been written into the
stable log.
● This gives the Database Manager great flexibility in
managing the contents of its volatile data buffer pools.
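A compact restatement of these ordering rules, in the later ARIES-style log-sequence-number (LSN) notation rather than Gray's original vocabulary; flushedLSN denotes the LSN of the last log record on stable storage:

```latex
% Write-ahead rule: page P may be written to its stable home only when
% every log record describing P's changes is already stable.
\mathit{flushedLSN} \ge \mathit{pageLSN}(P)
% Commit rule: transaction T is committed only once its commit record
% has reached the stable log.
\mathit{flushedLSN} \ge \mathit{commitLSN}(T)
```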
14. ACID Properties: First Definition
● Atomicity: A transaction's changes to the state are atomic:
either all happen or none happen. These changes include
database changes, messages, and actions on transducers.
● Consistency: A transaction is a correct transformation of the
state. The actions taken as a group do not violate any of the
integrity constraints associated with the state. This requires that
the transaction be a correct program.
● Isolation: Even though transactions execute concurrently, it
appears to each transaction T, that others executed either
before T or after T, but not both.
● Durability: Once a transaction completes successfully
(commits), its changes to the state survive failures.
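A small sketch of how a program requests the isolation guarantee defined above in standard SQL (SQL-92 onward); the accounts table is hypothetical:

```sql
-- SERIALIZABLE asks the engine to make concurrent transactions appear
-- to run one after another, exactly Gray's definition of isolation.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
SELECT SUM(balance) FROM accounts;  -- must not see partial interleavings
COMMIT;
```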
15. [Gray 1993] Jim Gray and Andreas Reuter,
Transaction Processing: Concepts and
Techniques, Morgan Kaufmann, San
Mateo, CA (1993).
16. In 1985, Jim and a number of other
senior leaders in the field of transaction
processing started the HPTS (High
Performance Transaction Systems)
Workshop [HPTS]. This is a biennial
gathering of folks interested in
transaction systems (and things related
to scalable systems). It includes people
from competing companies in industry
and also from academia. Over the last
22 years, it has evolved to include many
different topics as high-end computing
morphed from the mainframe to the
Internet.
17. The early years …
● Born January 12, 1944
● 1961 graduated from Westmoor
High School in San Francisco.
● 1966 graduated from the University of California at Berkeley with a bachelor's degree in mathematics and engineering.
18. James Nicholas Gray was born in San
Francisco, California on 12 January
1944.
● In 1961 Gray graduated from Westmoor High School in San
Francisco.
● He graduated from the University of California at Berkeley with a bachelor's degree in mathematics and engineering in 1966.
● After spending a year in New Jersey working at Bell Laboratories in Murray Hill and attending classes at the Courant Institute in New York City, he returned to Berkeley and enrolled in the newly formed computer science department, earning a Ph.D. in 1969 for work on context-free grammars and formal language theory.
19. 5-minute rule for Memory vs. Disk Access (1987)
When does it make economic sense to hold pages in memory versus doing IO every time data from the page is accessed?
THE FIVE MINUTE RULE: pages referenced every five minutes should be memory resident.
20. From Tandem Report 1987:
Jim Gray and Gianfranco Putzolu
● The argument goes as follows: A Tandem disc and half a controller comfortably deliver 15 accesses per second and are priced at 15K$ for a small disc and 20K$ for a large disc (180MB and 540MB respectively).
● So the price per access per second is about 1K$. The extra CPU and channel cost for supporting a disc are 1K$/a/s. So one disc access per second costs about 2K$ on a Tandem system.
● A megabyte of Tandem main memory costs 5K$, so a kilobyte costs 5$.
21. ● If making a 1KB record resident saves 1 a/s, then it saves about 2K$ worth of disc accesses at a cost of 5$, a good deal. If it saves 0.1 a/s, then it saves about 200$, still a good deal. Continuing this, the break-even point is an access every 2000/5 = 400 seconds.
● So, any 1KB record accessed more frequently than every 400 seconds should live in main memory. 400 seconds is "about" 5 minutes, hence the name: the Five Minute Rule.
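The same arithmetic as a worked equation, where a is the access rate saved by keeping the record resident:

```latex
a \times \$2000/(\text{a/s}) \;\ge\; \$5
\;\Longrightarrow\;
a \;\ge\; \tfrac{5}{2000} = \tfrac{1}{400}\ \text{a/s},
\qquad \text{i.e. one access every } 400\ \text{s} \approx 5\ \text{minutes.}
```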
22. 5-minute rule
● The five-minute rule is based on the tradeoff between the
cost of RAM and the cost of disk accesses.
25. New Storage Metrics:
Kaps, Maps, SCAN
● Kaps: How many kilobyte objects served per second
– The file server, transaction processing metric
– This is the OLD metric.
● Maps: How many megabyte objects served per sec
– The Multi-Media metric
● SCAN: How long to scan all the data
– the data mining and utility metric
● And
– Kaps/$, Maps/$, TBscan/$
26. Disk Changes
● Disks got cheaper: 20K$ -> 1K$ (or even 200$)
– $/Kaps etc. improved 100x (Moore's law!) (or even 500x)
– One-time event (went from mainframe prices to PC prices)
● Disk data got cooler (10x per decade):
– 1990 disk: ~1GB, 50 Kaps, 5 minute scan
– 2000 disk: ~70GB, 120 Kaps, 45 minute scan
● So
– 1990: 1 Kaps per 20 MB
– 2000: 1 Kaps per 500 MB
– disk scans take longer (10x per decade)
● Backup/restore takes a long time (too long)
27. Storage Ratios Changed
● 10x better access time
● 10x more bandwidth
● 100x more capacity
● Data 25x cooler (1 Kaps/20MB vs 1 Kaps/500MB)
● 4,000x lower media price
● 20x to 100x lower disk price
● Scan takes 10x longer (3 min vs 45 min)
● DRAM/disk media price ratio changed:
– 1970-1990: 100:1
– 1990-1995: 10:1
– 1995-1997: 50:1
– today: ~0.03 $/MB disk vs 3 $/MB DRAM, i.e. 100:1
28. The Five Minute Rule
● Trade DRAM for disk accesses
● Cost of an access: DriveCost / Accesses_per_second
● Cost of a DRAM page: ($/MB) / pages_per_MB
● Break-even has two terms: a technology term and an economic term
● Page size grew to compensate for changing ratios
● Still at 5 minutes for random access, 1 minute for sequential
From his presentations in 2000
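Written out, this is the break-even formula from the 1997 ten-year retrospective (variable names follow that paper); the first factor is the technology term, the second the economic term:

```latex
\text{BreakEvenInterval (seconds)} =
\frac{\mathit{PagesPerMBofRAM}}{\mathit{AccessesPerSecondPerDisk}}
\times
\frac{\mathit{PricePerDiskDrive}}{\mathit{PricePerMBofRAM}}
```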
29. Data on Disk Can Move to RAM in 10 years
[Figure: storage price vs. time, plotted as megabytes per kilo-dollar on a log scale (0.1 to 10,000) from 1980 to 2000; the DRAM and disk curves sit roughly 100:1 apart, a gap the trend lines close in about 10 years.]
30. Storage Hierarchy: Speed & Capacity vs Cost Tradeoffs
[Figure: two log-log charts against access time (10^-9 to 10^3 seconds). Size vs. speed: typical system capacity grows from ~10^3 bytes of cache through main memory, secondary (disc), online tape, and nearline tape to ~10^15 bytes of offline tape. Price vs. speed: $/MB falls from ~10^2 for cache down to ~10^-6 for offline tape.]
31. 5-minute rule holds in 1997
● In summary, the five-minute rule still seems to apply to randomly accessed pages, primarily because page sizes have grown from 1KB to 8KB to compensate for changing technology ratios.
32. Storage Latency: How Far Away is the Data?
(clock ticks)
10^9: tape / optical robot (Andromeda; 2,000 years)
10^6: disk (Pluto; 2 years)
100: memory (Olympia; 1.5 hr)
10: on-board cache (this hotel; 10 min)
2: on-chip cache (this room)
1: registers (my head; 1 min)
From Jim Gray's Rules of Thumb in Data Engineering presentation
33. What's a TeraByte?
● 1 Terabyte:
– 1,000,000,000 business letters = 150 miles of bookshelf
– 100,000,000 book pages = 15 miles of bookshelf
– 50,000,000 FAX images = 7 miles of bookshelf
– 10,000,000 TV pictures (MPEG) = 10 days of video
– 4,000 LandSat images = 16 earth images (100m)
– 100,000,000 web pages = 10 copies of the web (HTML)
● Library of Congress (in ASCII) is 25 TB
– 1980: $200 million of disc (10,000 discs); $5 million of tape silo (10,000 tapes)
– 1997: 200 k$ of magnetic disc (48 discs); 30 k$ of nearline tape (20 tapes)
Terror Byte!
Jim Gray's presentations, 1995
34. How Much Information Is There?
● Soon everything can be recorded and indexed.
● Most data will never be seen by humans.
● Precious resource: human attention. Auto-summarization and auto-search are key technologies.
[Figure: a scale of data volumes from Kilo (a book) through Mega (a photo), Giga (a movie), Tera (all LoC books, as words), Peta, Exa (all books, multimedia), Zetta (everything recorded), up to Yotta (everything!).]
http://www.lesk.com/mlesk/ksg97/ksg.html
(Prefixes: 10^-24 yocto, 10^-21 zepto, 10^-18 atto, 10^-15 femto, 10^-12 pico, 10^-9 nano, 10^-6 micro, 10^-3 milli.)
36. The 5-minute rule holds in 2007
● The old five-minute rule for RAM and disk now applies to 64KB page sizes (334 seconds).
– Five minutes had been the approximate break-even interval for 1KB pages in 1987 and for 8KB pages in 1997.
● The five-minute break-even interval also applies to RAM and the expensive flash memory of 2007 for page sizes of 64KB and above (365 seconds and 339 seconds).
– As the price premium for flash memory decreases, so does the break-even interval (146 seconds and 136 seconds).
37. Flash memory falls between traditional RAM and persistent mass storage based on rotating disks in terms of acquisition cost, access latency, transfer bandwidth, spatial density, power consumption, and cooling costs.
38. 20 years out:
Summary and Conclusion
● The 20-year-old five-minute rule for RAM and disks still
holds, but for ever-larger disk pages.
● It should be augmented by two new five-minute rules:
– for small pages moving between RAM and flash memory and
– for large pages moving between flash memory and traditional
disks.
● For small pages moving between RAM and disk, Gray and
Putzolu were amazingly accurate in predicting a five-hour
break-even point 20 years into the future.
42. Aggregates in SQL
● The SQL standard [Melton, Simon] provides five aggregate functions: COUNT, SUM, MIN, MAX, AVG.
SELECT [DISTINCT] AVG(Temp)
FROM Weather;
● Aggregate functions return a single value. In addition, SQL allows aggregation over distinct values.
● Using GROUP BY, SQL can create a table of aggregate values indexed by a set of attributes.
SELECT Time, Altitude, AVG(Temp)
FROM Weather
GROUP BY Time, Altitude;
[Figure: SUM() collapses a table's attribute column to a single value; GROUP BY yields one aggregate row per group of A, B, C, D values.]
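For instance, a small sketch of aggregation over distinct values, reusing the Weather table from the queries above:

```sql
-- Each distinct altitude counts once, however many readings it has.
SELECT COUNT(DISTINCT Altitude) FROM Weather;
```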
43. Problems With This Design
● Users want histograms
● Users want sub-totals and totals
– drill-down & roll-up reports
● Users want CrossTabs
● Conventional wisdom:
– These are not relational operators
– They are in many report writers and query engines
[Figure: a cross-tab of AIR, HOTEL, FOOD, MISC spending across the days of the week (M T W T F S S), with row sums F(), G(), H() and column sums.]
44. Other Variants – Illustra
● init(&handle):
– Allocates the handle and initializes the aggregate computation.
● iter(&handle, value):
– Aggregates the next value into the current aggregate.
● value = final(&handle):
– Computes and returns the resulting aggregate by using data
saved in the handle. This invocation deallocates the handle.
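The same init/iter/final pattern survives in PostgreSQL (which shares the Postgres/Illustra lineage) as CREATE AGGREGATE; a minimal sketch with hypothetical names, defining a sum-of-squares aggregate over the Weather table used earlier:

```sql
-- The "iter" step: fold the next value into the running state.
CREATE FUNCTION sumsq_iter(state double precision, val double precision)
RETURNS double precision
LANGUAGE SQL AS $$ SELECT state + val * val $$;

-- INITCOND plays the "init" role; with no FINALFUNC, the final state
-- itself is returned, which is the "final" step here.
CREATE AGGREGATE sumsq(double precision) (
    SFUNC    = sumsq_iter,
    STYPE    = double precision,
    INITCOND = '0'
);

SELECT sumsq(Temp) FROM Weather;  -- usage
```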
45. DATA CUBE
SELECT Model, Year, Color,
       SUM(Sales) AS total,
       SUM(Sales) / total(ALL, ALL, ALL)
FROM Sales
WHERE Model IN {'Ford', 'Chevy'}
  AND Year BETWEEN 1990 AND 1992
GROUP BY CUBE(Model, Year, Color);
[Figure: the data cube and its sub-space aggregates: a Chevy/Ford x 1990-1992 x Red/White/Blue cube with aggregate planes By Make, By Year, By Color, By Make & Year, By Make & Color, By Color & Year; a ROLLUP (with total); and a cross-tab of Sum by Color for Chevy vs. Ford.]
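In standard SQL:1999 terms, CUBE over n columns is shorthand for all 2^n grouping sets; a sketch of the equivalence for two columns:

```sql
SELECT Model, Color, SUM(Sales) AS total
FROM Sales
GROUP BY GROUPING SETS (
    (Model, Color),  -- base aggregates
    (Model),         -- sub-totals by make
    (Color),         -- sub-totals by color
    ()               -- grand total: the (ALL, ALL) row
);
```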
48. A Dozen Information Technology Research Goals
1. Scalability: Devise a software and hardware architecture that scales up by a factor of 10^6. That is, an application's storage and processing capacity can automatically grow by a factor of a million, doing jobs faster (10^6x speedup) or doing larger jobs in the same time (10^6x scale-up), just by adding more resources.
2. The Turing Test: Build a computer system that wins the imitation game at least 30% of the time.
3. Speech to text: Hear as well as a native speaker.
4. Text to speech: Speak as well as a native speaker.
5. See as well as a person: Recognize objects and motion.
49. A Dozen Information Technology Research Goals
6. Personal Memex: Record everything a person sees and hears, and quickly retrieve any item on request.
7. World Memex: Build a system that, given a text corpus, can answer questions about and summarize the text as precisely and quickly as a human expert in that field. Do the same for music, images, art, and cinema.
8. Telepresence: Simulate being some other place retrospectively as an observer (TeleObserver): hear and see as well as actually being there, and as well as a participant. Simulate being some other place as a participant (TelePresent): interacting with others and with the environment as though you are actually there.
50. A Dozen Information Technology Research Goals
9. Trouble-Free Systems: Build a system used by millions of people each day and yet administered and managed by a single part-time person.
10. Secure System: Assure that the system of problem 9 services only authorized users, service cannot be denied by unauthorized users, and information cannot be stolen (and prove it).
11. Always Up: Assure that the system is unavailable for less than one second per hundred years – eight 9s of availability (and prove it).
51. A Dozen Information Technology Research Goals
12. Automatic Programmer: Devise a specification language or user interface that
– Makes it easy for people to express designs (1,000x easier),
– Computers can compile, and
– Can describe all applications (is complete).
The system should reason about the application, asking questions about exception cases and incomplete specification. But it should not be onerous to use.
52. Computer Industry Laws
(Rules of thumb)
● Metcalfe's law
● Moore's first law
● Bell's computer classes (7 price tiers)
● Bell's platform evolution
● Bell's platform economics
● Bill's law
● Software economics
● Grove's law
● Moore's second law
● Is info-demand infinite?
● The death of Grosch's law
53. Gordon Bell’s Seven Price Tiers
10$: wrist watch computers
100$: pocket/palm computers
1,000$: portable computers
10,000$: personal computers (desktop)
100,000$: departmental computers (closet)
1,000,000$: site computers (glass house)
10,000,000$: regional computers (glass castle)
Super server: costs more than $100,000
"Mainframe": costs more than $1 million; must be an array of processors, disks, tapes, comm ports
54. Information at your fingertips.
Bill Gates is known for his long-standing belief that, as he once put it, "any piece of information you want should be available to you" (Putting Information at Your Fingertips).
Gates championed it as early as 1989, and he was in a position to do something about it. It remained his overriding goal for the next two decades.
55. The Vision: Global Data Federation
● Massive datasets live near their owners:
– Near the instrument's software pipeline
– Near the applications
– Near data knowledge and curation
● Each archive publishes a (web) service
– Schema: documents the data
– Methods on objects (queries)
● Scientists get "personalized" extracts
● Uniform access to multiple archives
– A common global schema (federation)
56. Gray and Bell worked closely at Digital and, from 1994, at Microsoft's Bay Area Research Center.
● MyLifeBits
● TerraServer
57. Gordon Bell's MyLifeBits
● MyLifeBits is a lifetime store of everything. It is the fulfillment of Vannevar Bush's 1945 Memex vision, including full-text search, text and audio annotations, and hyperlinks.
● The experiment: Gordon Bell has captured a lifetime's worth of articles, books, cards, CDs, letters, memos, papers, photos, pictures, presentations, home movies, videotaped lectures, and voice recordings and stored them digitally. He is now paperless, and is beginning to capture phone calls, IM transcripts, television, and radio.
59. TerraServer
In late spring of 1996, Paul Flessner, the General Manager of the SQL Server team, asked our lab to build a database application that would test and demonstrate the scalability of the next release of SQL Server, code-named "Sphinx".
One of Jim's greatest abilities was to clearly define and articulate the problem. The SQL team gave us two goals:
1. Test SQL's ability to scale up to support a database of one terabyte or larger.
2. Build an internet application where SQL marketing could demonstrate Windows and SQL Server's scalability.
61. TerraServer Requirements
● BIG — 1 TB of data including catalog, temporary space, etc.
● PUBLIC — available on the world wide web
● INTERESTING — to a wide audience
● ACCESSIBLE — using standard browsers (IE, Netscape)
● REAL — a LOB application (users can buy imagery)
● FREE — cannot require an NDA or payment from users for access
● FAST — usable at low speed (56 kbps) and high speed (T-1+)
● EASY — we do not want a large group to develop, deploy, or maintain the application
● CHEAP — an unwritten requirement: (1) because TerraServer was only a prototype, test, and free demonstration; and (2) Jim Gray was a very frugal person!
62. An Interesting Internet Server
Imagery sources: SOVINFORMSPUTNIK (the Russian space agency) and Aerial Images; the United States Geological Survey (USGS).
http://msdn.microsoft.com/en-us/library/aa226316(v=sql.70).aspx
63. Thesis: Scaleable Servers
● Scaleable servers
– Commodity hardware allows new applications
– New applications need huge servers
– Clients and servers are built of the same "stuff":
• Commodity software and
• Commodity hardware
● Servers should be able to
– Scale up (grow a node by adding CPUs, disks, networks)
– Scale out (grow by adding nodes)
– Scale down (can start small)
● Key software technologies
– Objects, transactions, clusters, parallelism
65. Scaleable Servers: BOTH SMP And Cluster
● Grow up with SMP: a 4xP6 SMP is now a standard server.
● Grow out with a cluster: clusters are built from inexpensive parts.
[Figure: a spectrum from personal system to departmental server to SMP super server, and outward to a cluster of PCs.]
66. SMPs Have Advantages
● Single system image: easier to manage, easier to program (threads in shared memory, disk, net)
● 4x SMP is commodity
● Software capable of 16x
● Problems:
– >4x is not commodity
– Scale-down problem (starter systems expensive)
● There is a BIGGEST one
67. Grow UP and OUT
● Cluster: a collection of nodes, as easy to program and manage as a single node.
[Figure: grow up from personal system to departmental server to SMP super server; grow out to a cluster running a 1-terabyte DB at 1 billion transactions per day.]
68. Clusters Have Advantages
● Clients and servers made from the same stuff
● Inexpensive:
– Built with commodity components
● Fault tolerance:
– Spare modules mask failures
● Modular growth
– Grow by adding small modules
● Unlimited growth: no biggest one
69. Windows NT Clusters
● Microsoft & 60 vendors defining NT clusters
– Almost all big hardware and software vendors involved
● No special hardware needed - but it may help
● Fault-tolerant first, scaleable second
– Microsoft, Oracle, SAP giving demos today
● Enables
– Commodity fault-tolerance
– Commodity parallelism (data mining, virtual reality…)
– Also great for workgroups!
70. Parallelism
The OTHER aspect of clusters
● Clusters of machines allow two
kinds of parallelism
– Many little jobs: online transaction
processing
• TPC-A, B, C…
– A few big jobs: data search and
analysis
• TPC-D, DSS, OLAP
● Both give automatic parallelism
71. Kinds of Parallel Execution
● Pipeline: any sequential program feeds its output into another sequential program.
● Partition: inputs merge M ways into many copies of any sequential program, and outputs split N ways.
Jim Gray & Gordon Bell: VLDB 95 Parallel Database Systems Survey
72. Data Rivers: Split + Merge Streams
● N producers and M consumers connected by N x M data streams (the "river").
● Producers add records to the river; consumers consume records from the river.
● Purely sequential programming: the river does flow control and buffering, and does the partition and merge of data records.
● River = Split/Merge in Gamma = Exchange operator in Volcano.
Jim Gray & Gordon Bell: VLDB 95 Parallel Database Systems Survey
73. Partitioned Execution
Spreads computation and IO among processors.
[Figure: a Count operator runs on each partition (A...E, F...J, K...N, O...S, T...Z) of a table, feeding a combining Count.]
Partitioned data gives NATURAL parallelism.
Jim Gray & Gordon Bell: VLDB 95 Parallel Database Systems Survey
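Partitioned execution works because the common aggregates decompose over partitions; a worked statement of the algebra behind the Count example, for a table T split into partitions T_1, ..., T_N:

```latex
\mathrm{COUNT}(T)=\sum_{i=1}^{N}\mathrm{COUNT}(T_i),\qquad
\mathrm{SUM}(T)=\sum_{i=1}^{N}\mathrm{SUM}(T_i),\qquad
\mathrm{AVG}(T)=\frac{\sum_i \mathrm{SUM}(T_i)}{\sum_i \mathrm{COUNT}(T_i)}
```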
74. N x M way Parallelism
[Figure: per-partition Join, Sort, and Merge operators over partitions A...E through T...Z.]
N inputs, M outputs, no bottlenecks: partitioned and pipelined data flows over partitioned data.
Jim Gray & Gordon Bell: VLDB 95 Parallel Database Systems Survey
75. Year 2000
The Year 2000 commodity PC, the "4B machine":
● 1 Bips processor (a billion instructions/sec)
● .1 billion bytes of RAM
● A billion bits/s net
● 10 billion bytes (10 GB) of disk
● A billion-pixel display (3000 x 3000 x 24)
● About 1,000 $
Jim Gray & Gordon Bell: 1997 presentations
76. Super Server: 4T Machine
● An array of 1,000 4B machines:
– 1 Bips processors
– 1 billion-byte DRAM
– 10 billion-byte disks
– 1 Bbps comm lines
– 1 TB tape robot
● A few megabucks
● Challenge: manageability, programmability, security, availability, scaleability, affordability
[Figure: a "Cyber Brick", a 4B machine: CPU, 5 GB RAM, 50 GB disc.]
Future servers are CLUSTERS of processors and discs; distributed database techniques make clusters work, as easy as a single system.
Jim Gray & Gordon Bell: 1997 presentations
77. Jim Gray’s quest for real problems and
real data … led to a collaboration with
Astronomers. Why Astronomy Data?
● It has no commercial value
– No privacy concerns
– Can freely share results with others
– Great for experimenting with algorithms
● It is real and well documented
– High-dimensional data (with confidence intervals)
– Spatial data
– Temporal data
● Many different instruments from many different
places and many different times
● Federation is a goal
● There is a lot of it (petabytes)
(Pictured: Alex Szalay)
78. Availability and the ability to handle very large volumes of storage and complex computing are redefining how we do science.
79. Galileo and his telescope
First Paradigm: For thousands of years, science was about empirically describing natural phenomena.
81. Third Paradigm: Computational Science: Simulating Complex Phenomena
Over the last 25 years scientists have used computer simulation to validate theories.
[Image: a hurricane computer simulation.]
82. Fourth Paradigm: Data-Intensive Science
The scientific method was traditionally driven by hypothesis: scientists first make a prediction, then collect experimental data to validate the theory against its predictions. In the new data-driven approach, however, researchers start by collecting data and analyze it later.
83. Scientists are collecting data
How to codify data and extract insights and knowledge?
[Figure: experiments and instruments, simulations, literature, and other archives all feed a shared data store; scientists pose a question and get an answer.]
84. Astronomy
● Help build world-wide telescope
– All astronomy data and literature online
and cross indexed
– Tools to analyze the data
● Built SkyServer.SDSS.org
● Built Analysis system
– MyDB
– CasJobs (batch job)
● Results:
– It works and is used every day
– Spatial extensions in SQL 2005
– A good example of Data Grid
– Good examples of Web Services.
85. World Wide Telescope
Virtual Observatory
http://www.us-vo.org/ http://www.ivoa.net/
● Premise: Most data is (or could be) online
● So, the Internet is the world's best telescope:
– It has data on every part of the sky
– In every measured spectral band: optical, x-ray, radio...
– As deep as the best instruments (2 years ago).
– It is up when you are up. The "seeing" is always great (no working at night, no clouds, no moons, no...).
– It's a smart telescope: links objects and data to literature on them.
86. SkyServer.SDSS.org
● A modern archive
– Access to Sloan Digital Sky Survey
Spectroscopic and Optical surveys
– Raw Pixel data lives in file servers
– Catalog data (derived objects) lives in Database
– Online query to any and all
● Also used for education
– 150 hours of online Astronomy
– Implicitly teaches data analysis
● Interesting things
– Spatial data search
– Client query interface via Java Applet
– Query from Emacs, Python, ….
– Cloned by other surveys (a template design)
– Web services are core of it.
87. SkyServer
SkyServer.SDSS.org
● Like the TerraServer, but looking the other way: a picture of ¼ of the universe
● Sloan Digital Sky Survey data: pixels + data mining
● About 400 attributes per "object"
● Spectrograms for 1% of objects
89. SkyQuery (http://skyquery.net/)
● Distributed query tool using a set of web services
● Many astronomy archives from Pasadena, Chicago, Baltimore, Cambridge (England)
● Has grown from 4 to 15 archives, now becoming an international standard
● A web-service poster child
● Allows queries like:
SELECT o.objId, o.r, o.type, t.objId
FROM SDSS:PhotoPrimary o, TWOMASS:PhotoPrimary t
WHERE XMATCH(o, t) < 3.5
AND AREA(181.3, -0.76, 6.5)
AND o.type = 3 AND (o.i - t.m_j) > 2
90. SkyServer/SkyQuery Evolution
MyDB and Batch Jobs
Problem: need multi-step data analysis (not just a single query).
Solution: Allow personal databases on the portal.
Problem: some queries are monsters.
Solution: "Batch schedule" on the portal; deposits the answer in a personal database.
91. Ecosystem Sensor Net
LifeUnderYourFeet.Org
● Small sensor net monitoring soil
● Sensors feed to a database
● Helping build a system to collect & organize data.
● Working on data analysis tools
● Prototype for other LIMS
Laboratory Information Management Systems
92. RNA Structural Genomics
● Goal: Predict secondary and
tertiary structure
from sequence.
Deduce tree of life.
● Technique: Analyze
sequence variations sharing
a common structure
across tree of life
● Representing
structurally aligned sequences
is a key challenge
● Creating a database-driven alignment
workbench accessing public and private
sequence data
93. VHA Health Informatics
● VHA: largest standardized electronic medical records system in US.
● Design, populate and tune a ~20 TB Data Warehouse and Analytics
environment
● Evaluate population health and treatment outcomes
● Support epidemiological studies
– 7 million enrollees
– 5 million patients
– Example milestones:
• 1 billionth vital sign loaded in April '06
• 30 minutes to population-wide obesity analysis (next slide)
• Discovered seasonality in blood pressure (NEJM, fall '06)
95. Jim Gray’s work on Fourth Paradigm
and eScience has had a profound
impact on the scientific community.
This work continues …
96. Jim Gray eScience Award
Each year, Microsoft Research presents the Jim Gray eScience
Award to a researcher who has made an outstanding
contribution to the field of data-intensive computing. The
award recognizes innovators whose work truly makes science
easier for scientists.
98. Jim Gray’s Legacy
● The Prolific Writer
– Jim Gray's two rules for authorship:
• The person who types puts their name first, and
• It's easier to add a name to the list of authors than deal with someone's hurt feelings.
● The Masterful Presenter
● The Sense of Community
● The Patient Listener
[Figure: a triangle of Ideas, Community, People.]
99. Jim's Life was a Textbook on Mentoring
● Making time
● Simply listening
● Inspiring self-confidence
● Lighting the way
● Nurturing and pushing
● Following the muse
● Connecting good people and good ideas without boundaries
● Promoting the young
● Sharing knowledge selflessly
● Displaying professional integrity
● Advocating for the field
● Keeping things in perspective
● Being a friend
103. The University of California, Berkeley and Gray's family hosted a tribute to him on May 31, 2008.
http://www.youtube.com/user/UCBerkeleyEvents/videos?query=jim+gray
105. Good references
● Microsoft Faculty Summit 2011
– http://research.microsoft.com/en-us/events/fs2011/
– Tony Hey's presentations at the event
– http://research.microsoft.com/en-us/events/fs2011/welcome_introduction_hey_faculitysummit_071811.pdf
● The Fourth Paradigm book
– http://research.microsoft.com/en-us/collaboration/fourthparadigm/4th_paradigm_book_complete_lr.pdf
● Jim Gray's work
– http://research.microsoft.com/en-us/um/people/gray/
● Alex Szalay's work on large databases and science
– http://www.sdss.jhu.edu/~szalay/servers.html
Editor's notes
Jim Gray refined the notion of a database transaction. He explained that application-initiated data manipulation actions can be classified as "unprotected", "protected", and "real" actions [Gray 1981b]. Unprotected actions involve transient and internal state, such as temporary files. Protected actions, on the other hand, are grouped into transactions and are reflected in the state of the transaction outcome. The outcome of a transaction must be to either commit the effects of its protected actions to the system state, or to abort and remove the protected actions' effects from the system state. This means that protected actions must be undone on transaction failure or abort, and their effects must be ensured in the case of transaction commit. Real actions involve sensors, actuators, and messages outside the DBMS. While real actions cannot be "undone", they can be compensated. For example, if the missile is fired, the compensation could be "debit quantity on hand and send apologies".
In order to achieve durable transaction atomicity (all or nothing for protected actions) in the presence of processor, memory, storage, communication, or environmental failures, multiple copies of the stored data must be maintained, and a record of the protected action sequence is needed to complete or undo transactions interrupted by system failures. To achieve durable transaction atomicity, the transition to the "committed" state must be accomplished by a single write to non-volatile storage. To these ends Jim Gray defined the Write Ahead Log (WAL) protocol [Gray 1978, Gray 1981a] while at IBM Research. The WAL protocol records the old and new states induced by protected actions separately from the actual state changes. The logged changes are written to stable storage before the actual changes are written back to stable storage (that's the "Write Ahead" part). Transactions are committed by simply appending and writing a 'commit' record to the recovery log. Logged changes are used to undo protected actions of aborted transactions and of transactions in progress at the time of a system failure. Log records are also used to redo committed actions whose actual changes have not been written back to stable storage at the time of a system failure. The WAL protocol allows changed data to be written to their stable storage home at any time after the log records describing the changes have been written into the stable log. This gives the Database Manager great flexibility in managing the contents of its volatile data buffer pools.
The recovery techniques developed by Jim Gray and the System R team have been instrumental to the deployment of on-line transaction processing applications. With the ability to recover from equipment and environmental failures, without loss of committed, protected actions, along with atomic (all-or-nothing) transaction completion, on-line business-critical applications become reliable enough to replace batch and paper-based transaction processing. The impact of Dr. Gray's recovery technologies for transaction reliability cannot be overstated – without adequate reliability and durability for transactional applications, the transition to on-line transaction processing would not have been possible.