Managing XML and Semistructured Data Lecture 19: Compressing XML Data Prof. Dan Suciu Spring 2001
In this lecture
Compression: The Problem
An Example: Web Server Logs
ASCII file: 15.9 MB (gzipped 1.6 MB)
202.239.238.16|GET / HTTP/1.0|text/html|200|1997/10/01-00:00:02|-|4478|-|-|http://www.net.jp/|Mozilla/3.1[ja](I)
XML-ized, the file inflates to 24.2 MB (gzipped 2.1 MB)
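A minimal sketch of the "XML-izing" step, to show where the inflation comes from: every field value gets wrapped in an opening and a closing tag. Only `apache:entry` and `apache:host` appear in the slides; the other element names (`field2` to `field11`) and the function name `xmlize` are placeholders assumed here for illustration.

```python
from xml.sax.saxutils import escape

# The sample record from the slide: one pipe-delimited line per request.
LOG_LINE = ("202.239.238.16|GET / HTTP/1.0|text/html|200|"
            "1997/10/01-00:00:02|-|4478|-|-|http://www.net.jp/|"
            "Mozilla/3.1[ja](I)")

# Assumed element names: only "host" is taken from the slides,
# the remaining ten are generic placeholders.
FIELD_NAMES = ["host"] + [f"field{i}" for i in range(2, 12)]


def xmlize(line):
    """Wrap each pipe-delimited field of one log line in its own element."""
    values = line.split("|")
    if len(values) != len(FIELD_NAMES):
        raise ValueError("unexpected number of fields")
    parts = ["<apache:entry>"]
    for name, value in zip(FIELD_NAMES, values):
        # escape() protects &, < and > inside the field values.
        parts.append(f"<apache:{name}>{escape(value)}</apache:{name}>")
    parts.append("</apache:entry>")
    return "".join(parts)


if __name__ == "__main__":
    xml_entry = xmlize(LOG_LINE)
    print(len(LOG_LINE), "characters as ASCII")
    print(len(xml_entry), "characters as XML")
```

The tags repeat identically in every entry, which is why gzip recovers much of the overhead but not all of it.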
XMill
How XMill Works: Three Ideas
Compress the structure separately from the data:
Structure: <apache:entry> <apache:host> </apache:host> . . . </apache:entry>
Data: 202.239.238.16  GET / HTTP/1.0  text/html  200 …
gzip Structure + gzip Data = 1.75 MB
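The first idea can be sketched as follows: tokenize the document, send tags to one stream and text values to another, and compress each stream on its own. This is an illustration only, not XMill's actual container format. The regex tokenizer, the `#` placeholder, and the function names are assumptions; `zlib` stands in for gzip, and the sketch ignores attributes, comments, and values that contain newlines.

```python
import re
import zlib

# A tag (<...>) or a run of text between tags.
TOKEN = re.compile(r"<[^>]+>|[^<]+")


def split_structure_and_data(xml_text):
    """Return (structure tokens, data values) for a simple XML string."""
    structure, data = [], []
    for token in TOKEN.findall(xml_text):
        if token.startswith("<"):
            structure.append(token)
        elif token.strip():
            # Leave a placeholder in the structure where the value was.
            structure.append("#")
            data.append(token)
    return structure, data


def compress_separately(xml_text):
    """Compress the structure stream and the data stream independently."""
    structure, data = split_structure_and_data(xml_text)
    packed_structure = zlib.compress("\n".join(structure).encode("utf-8"))
    packed_data = zlib.compress("\n".join(data).encode("utf-8"))
    return packed_structure, packed_data


if __name__ == "__main__":
    doc = "<apache:entry><apache:host>202.239.238.16</apache:host></apache:entry>"
    s, d = compress_separately(doc)
    print(len(s), "bytes of structure,", len(d), "bytes of data")
```

Because the placeholders keep the position of every value, the original document can be rebuilt by walking the structure stream and pulling the next value from the data stream at each `#`.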
How XMill Works: Three Ideas
Group the data values according to their types:
Structure: <apache:entry> . . . </apache:entry>
Data1: 202.23.23.16 224.42.24.55 …
Data2: GET / HTTP/1.0 GET / HTTP/1.1 …
gzip Structure + gzip Data1 + gzip Data2 = 1.33 MB
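A rough sketch of the grouping idea, assuming that the "type" of a value is simply the name of its enclosing element: all host values land in one container, all request values in another, and each container is compressed on its own. The element name `apache:request`, the `<log>` wrapper, and the function names are assumptions made for this example; `zlib` stands in for gzip.

```python
import re
import zlib
from collections import defaultdict

# A tag (<...>) or a run of text between tags.
TOKEN = re.compile(r"<[^>]+>|[^<]+")


def group_by_enclosing_tag(xml_text):
    """Collect text values into one list per enclosing element name."""
    containers = defaultdict(list)
    stack = []  # names of the currently open elements
    for token in TOKEN.findall(xml_text):
        if token.startswith("<?") or token.startswith("<!") or token.endswith("/>"):
            continue  # skip declarations, comments, and empty elements
        if token.startswith("</"):
            stack.pop()
        elif token.startswith("<"):
            stack.append(token[1:-1].split()[0])
        elif token.strip() and stack:
            containers[stack[-1]].append(token)
    return dict(containers)


def compress_containers(containers):
    """Compress each container separately, so similar values sit together."""
    return {
        tag: zlib.compress("\n".join(values).encode("utf-8"))
        for tag, values in containers.items()
    }


if __name__ == "__main__":
    doc = (
        "<log>"
        "<apache:entry><apache:host>202.23.23.16</apache:host>"
        "<apache:request>GET / HTTP/1.0</apache:request></apache:entry>"
        "<apache:entry><apache:host>224.42.24.55</apache:host>"
        "<apache:request>GET / HTTP/1.1</apache:request></apache:entry>"
        "</log>"
    )
    for tag, values in group_by_enclosing_tag(doc).items():
        print(tag, values)
```

Placing similar values next to each other gives a dictionary-based compressor longer repeated substrings to exploit than the interleaved stream does.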
How XMill Works: Three Ideas
Apply semantic (specialized) compressors:
gzip Structure + gzip c1(Data1) + gzip c2(Data2) + ... = 0.82 MB
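As one illustration of what a specialized compressor c1 could look like, the sketch below packs each dotted IPv4 address from the host container into 4 bytes before handing the result to the general-purpose compressor. The list of compressors on the original slide did not survive in this copy, so this particular encoder is an assumption chosen for the example, not a statement of what XMill ships; `zlib` stands in for gzip.

```python
import ipaddress
import zlib


def pack_ipv4(addresses):
    """Encode dotted-quad strings as 4 raw bytes each."""
    return b"".join(ipaddress.IPv4Address(a).packed for a in addresses)


def unpack_ipv4(blob):
    """Inverse of pack_ipv4: split into 4-byte groups and format them."""
    return [
        str(ipaddress.IPv4Address(blob[i:i + 4]))
        for i in range(0, len(blob), 4)
    ]


def compress_hosts(addresses):
    """gzip c1(Data1): apply the semantic encoder, then the generic one."""
    return zlib.compress(pack_ipv4(addresses))


def decompress_hosts(blob):
    """Undo compress_hosts and return the original address strings."""
    return unpack_ipv4(zlib.decompress(blob))


if __name__ == "__main__":
    hosts = ["202.23.23.16", "224.42.24.55"]
    print(len("\n".join(hosts)), "bytes as text")
    print(len(pack_ipv4(hosts)), "bytes after the semantic encoder")
```

The encoder is lossless only for values that really are canonical dotted-quad addresses, so a practical scheme would need a fallback for values that do not match the expected type.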
XML Compression
Compression Tradeoff
Summary of XML Data Management
