SlideShare una empresa de Scribd logo
1 de 47
Descargar para leer sin conexión
Creating Open Data with
Open Source
Sammy Fung
sammy.hk
[ITFest.HK] Seminar of Free / Open Source in Hong Kong, April 2013.
Agenda
● What is Open Data ?
● Use of Open Source Software in web crawling.
● Starting new Open Source projects to create
Open Data.
Sammy Fung
● Software Developer using open source.
– Perl → PHP → Python.
– Data Mining / Web Crawling.
– Also deploying OpenStack Cloud and Linux Solutions.
● Open Source Community Leader.
– opensource.hk, HKLUG, GNOME Asia committee, Mozilla
Rep, and program committee member of the largest
Taiwan open source conference - COSCUP.
● Blogger at sammy.hk.
Open Data
Three Laws of Open Government Data by David Eaves.
1.If it can't be spidered or indexed, it doesn't exist.
2.If it isn't available in open and machine readable format, it
can't engage.
3.If a legal framework doesn't allow it to be repurposed, it
doesn't empower.
http://eaves.ca/2009/09/30/three-law-of-open-government-data/
Open Data
● Tim Berners-Lee, the inventor of the Web.
– 5stardata.info
– 5 star deployment scheme of Open Data.
* One Star - Open Data
1.make your stuff available on the Web (whatever format) under an
open license.
2.make it available as structured data (e.g., Excel instead of image
scan of a table)
3.use non-proprietary formats (e.g., CSV instead of Excel)
4.use URIs to denote things, so that people can point at your stuff.
5.link your data to other data to provide context.
5stardata.info by Tim Berners-Lee, the inventor of the Web.
** Two Star - Open Data
1.make your stuff available on the Web (whatever format) under an
open license.
2.make it available as structured data (e.g., Excel instead of image
scan of a table)
3.use non-proprietary formats (e.g., CSV instead of Excel)
4.use URIs to denote things, so that people can point at your stuff.
5.link your data to other data to provide context.
5stardata.info by Tim Berners-Lee, the inventor of the Web.
*** Three Star - Open Data
1.make your stuff available on the Web (whatever format) under an
open license.
2.make it available as structured data (e.g., Excel instead of image
scan of a table)
3.use non-proprietary formats (e.g., CSV instead of Excel)
4.use URIs to denote things, so that people can point at your stuff.
5.link your data to other data to provide context.
5stardata.info by Tim Berners-Lee, the inventor of the Web.
**** Four Star - Open Data
1.make your stuff available on the Web (whatever format) under an
open license.
2.make it available as structured data (e.g., Excel instead of image
scan of a table)
3.use non-proprietary formats (e.g., CSV instead of Excel)
4.use URIs to denote things, so that people can point at your stuff.
5.link your data to other data to provide context.
5stardata.info by Tim Berners-Lee, the inventor of the Web.
***** Five Star - Open Data
1.make your stuff available on the Web (whatever format) under an
open license.
2.make it available as structured data (e.g., Excel instead of image
scan of a table)
3.use non-proprietary formats (e.g., CSV instead of Excel)
4.use URIs to denote things, so that people can point at your stuff.
5.link your data to other data to provide context.
5stardata.info by Tim Berners-Lee, the inventor of the Web.
Open Data from HK Government ?
● 2 Use Cases of Data:
– Legco Meeting Minutes and Voting Results.
– Weather at Data.One.
Legco Meeting Minutes
and Voting Results
Legco Meeting Minutes
and Voting Results
Legco Meeting Minutes
and Voting Results
● All legco voting results are scanned and
released in PDF, it is only possible to retrieve
voting results manually.
● In recent years, it seems scanned minutes
from sheets scanned are replaced by minutes
converted from original computer document
files.
Improving Legco Vote Result Data ?
● Legcovotes.net is created by Hong Kong
netitizens(?).
● Only 20 famous vote results are included.
● It is possible to let public to input other vote
results by hand, and submissions should be
verified by legcovotes.net authoritative.
● Including other data, eg. Minutes in plain text
or paragraphs related to a counciler.
Weather at Data.One
● My Chinese Blog Post 「香港政府機構開放資
料 Open Data 情況」 on 2013/1/17.
● Data.One released on 2011/3/31.
● Weather at Data.One provides 7 dataset URLs,
returns RSS (XML) format (Eng/TChi/SChi)
– One word: Useless.
– Data.One dataset (RSS) is completely different
with HKO own paid service (XML).
Weather at Data.One
● Example - Current local weather report:
● Plain text report in RSS.
● Difference to quote report content:
– Website: a pair of HTML tags, eg. <PRE>....</PRE>.
– Data.One: a pair of RSS description tags,
<description>....</description>.
● Other weather data is missing, eg. Regional
temperture updates per each 12 mins.
Weather at Data.One
● Weather at Data.One is 'report' but not 'data'.
● Weather RSS is already released by HKO
before launch of Data.One.
● Technically, json/xml format is better
readable by computer programs.
Oversea Open Data Project
Examples
● Toronto:
– City Data: http://map.toronto.ca/wellbeing/
– Transportation: http://www.rocketradar.net/
– Pollution: http://www.emitter.ca/
● US & Canada:
– https://www.crimereports.com/
Use of Open Source Software in
Web Crawling
● Use Open Source Tools to collect useful and
meaningful machine-readable data.
● Doesn't need to wait provider to release data
in machine-readable format.
Open Source Tools
● Python programming lanugage
● with Regular Expression library
● Scrapy web crawling framework
Why python + scrapy ?
● python: my current favourite programming
language for few years.
● scrapy: web crawling framework written in
Python.
Scrapy
● scrapy: web crawling framework written in
Python.
● HtmlXPathSelector
● Output: built-in JSON, CSV, XML.
● Python: import re
My Products
● WeatherHK ← ← ←
● TCTrack
WeatherHK
● http://twitter.com/weatherhk
● hourly current weather report
● weather forecast report
● tropical signal warning
WeatherHK
● Backend: Python + Scrapy + Database +
Twitter + NNTP......
● Frontend: Twitter + Newsgroup
WeatherHK
● http://twitter.com/weatherhk
● Interview by MetroPop in 2009.
My Products
● WeatherHK
● TCTrack ← ← ←
TCTrack
● http://sammy.hk/projects/tctrack/tctrack.php
● Plot TC current and forecast tracks over
Google Map.
● Source:
– JTWC
– HKO
TCTrack
● http://sammy.hk/projects/tctrack/tctrack.php
● Probably first tctrack map in HK using
GoogleMap
● Use of GMap: TCTrack -> Weather
Underground Hong Kong -> HKO
TCTrack
● http://twitter.com/tctrack
● Tweet JTWC updates for Northwest Pacific.
Starting new Open Source projects
to create Open Data
● Develop a open source project.
● Release data in standard machine-readable
data format.
Open Source Project Examples
● Hk0weather
● My weather related open source project.
hk0weather
● https://github.com/sammyfung/hk0weather
● Open Source Hong Kong Weather Project.
● convert to JSON data from HKO webpages.
● python + scrapy
● 1st version: from current weather report,
extracting temperture and humidity from 20+
weather stations, export in json format.
hk0weather
● https://github.com/sammyfung/hk0weather
● $ virtualenv hk0weatherenv
● $ source hk0weatherenv/bin/activate
● $ pip install scrapy
● $ git clone
https://github.com/sammyfung/hk0weather.git
● $ cd hk0weather
● $ scrapy crawl currwx -t json -o testresult
hk0weather
[{"humidity": 80, "station": "hko", "temperture": 17, "time": 1360785720},
{"station": "kingspark", "temperture": 16, "time": 1360785720},
{"station": "wongchukhang", "temperture": 17, "time": 1360785720},
{"station": "takwuling", "temperture": 16, "time": 1360785720},
{"station": "laufaushan", "temperture": 15, "time": 1360785720},
{"station": "taipo", "temperture": 16, "time": 1360785720},
{"station": "shatin", "temperture": 17, "time": 1360785720},
{"station": "tuenmun", "temperture": 17, "time": 1360785720},
{"station": "tseungkwano", "temperture": 16, "time": 1360785720},
{"station": "saikung", "temperture": 16, "time": 1360785720},
{"station": "cheungchau", "temperture": 17, "time": 1360785720},
{"station": "cheungchau", "temperture": 17, "time": 1360785720},
{"station": "tsingyi", "temperture": 17, "time": 1360785720},
{"station": "shekkong", "temperture": 15, "time": 1360785720},
{"station": "tsuenwanhokoon", "temperture": 15, "time": 1360785720},
{"station": "tsuenwanshingmunvalley", "temperture": 17, "time": 1360785720},
{"station": "hongkongpark", "temperture": 17, "time": 1360785720},
{"station": "shaukeiwan", "temperture": 16, "time": 1360785720},
{"station": "kowlooncity", "temperture": 16, "time": 1360785720},
{"station": "happyvalley", "temperture": 18, "time": 1360785720},
{"station": "wongtaisin", "temperture": 17, "time": 1360785720},
{"station": "stanley", "temperture": 16, "time": 1360785720},
{"station": "kwuntong", "temperture": 15, "time": 1360785720},
{"station": "shamshuipo", "temperture": 17, "time": 1360785720}]
Items.py
class Hk0WeatherItem(Item):
time = Field()
station = Field()
temperture = Field()
humidity = Field()
Currwx.py
start_urls = (
'http://www.weather.gov.hk/wxinfo/currwx/curr
entc.htm',
)
Currwx.py
def parse(self, response):
laststation = ''
temperture = int()
stations = []
hxs = HtmlXPathSelector(response)
report = hxs.select('//div[@id="ming"]')
libhk0
class hk0:
stations = [
(u' 天 文 台 ', 'hko'),
(u' 京 士 柏 ', 'kingspark'),
(u' 黃 竹 坑 ', 'wongchukhang'),
(u' 打 鼓 嶺 ', 'takwuling'),
(u' 流 浮 山 ', 'laufaushan'),
libhk0
class hk0:
def gettime(self, report):
…
def hk0current(self, report):
…
hk0weather
● Future Planning:
● Add more weather reports.
● Getting ideas and/or cooperate with 'pro'
Weather hobbists.
● Remarks:
● Development of hk0weather is started from
ZERO, its code is different than my twitter
@weatherhk.
Challenge
● Challenge on first day of hk0weather release.
● Director of a mobile app developer company
told me by leaving a Facebook comment.
– HKO provides data in pretty XML format with their 
annual service plan for commerical companies.
– He think that ***MAYBE*** HKO would provide XML 
to you ***without*** any charges if I asked.
● Remark: This is an assumption only, not listed on HKO 
website.
Challenge
● I replied the following to him after googling for HKO XML
schema.
– HKO didn't mention 'free of charge service' of XML data feed on
website.
– I registered and got authorization from HKO to re-distribute their
weather information for non-profit making. And I received some
emails from HKO for any updates of website and HTML structure,
but never mention about XML data feed service.
– Weather data available on HKO XML data feed is still fewer than its
HTML website.
●
So, this challenge is FAIL! XD
Open Data Project Examples
● Open Government initiative from HKU JMSC.
● http://opengov.jmsc.hku.hk/
● https://github.com/jmschku
Agenda
● What is Open Data ?
● Use of Open Source Software in web crawling.
● Starting new Open Source projects to create
Open Data.
Thank You!
sammy.hk

Más contenido relacionado

La actualidad más candente

RDF Stream Processing Tutorial: RSP implementations
RDF Stream Processing Tutorial: RSP implementationsRDF Stream Processing Tutorial: RSP implementations
RDF Stream Processing Tutorial: RSP implementationsJean-Paul Calbimonte
 
Introduction of g0v.tw at OpenDataHK.meet.12
Introduction of g0v.tw at OpenDataHK.meet.12Introduction of g0v.tw at OpenDataHK.meet.12
Introduction of g0v.tw at OpenDataHK.meet.12Sammy Fung
 
Realtimestream and realtime fastcatsearch
Realtimestream and realtime fastcatsearchRealtimestream and realtime fastcatsearch
Realtimestream and realtime fastcatsearch상욱 송
 
Actionable data in life sciences
Actionable data in life sciencesActionable data in life sciences
Actionable data in life sciencesJorge Boucas
 
Connecting Stream Reasoners on the Web
Connecting Stream Reasoners on the WebConnecting Stream Reasoners on the Web
Connecting Stream Reasoners on the WebJean-Paul Calbimonte
 
Triplewave: a step towards RDF Stream Processing on the Web
Triplewave: a step towards RDF Stream Processing on the WebTriplewave: a step towards RDF Stream Processing on the Web
Triplewave: a step towards RDF Stream Processing on the WebDaniele Dell'Aglio
 
The Lonesome LOD Cloud
The Lonesome LOD CloudThe Lonesome LOD Cloud
The Lonesome LOD CloudRuben Verborgh
 
Query Rewriting in RDF Stream Processing
Query Rewriting in RDF Stream ProcessingQuery Rewriting in RDF Stream Processing
Query Rewriting in RDF Stream ProcessingJean-Paul Calbimonte
 
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...Martin Junghanns
 
Introduction to Research Objects - Collaboartions Workshop 2015, Oxford
Introduction to Research Objects - Collaboartions Workshop 2015, OxfordIntroduction to Research Objects - Collaboartions Workshop 2015, Oxford
Introduction to Research Objects - Collaboartions Workshop 2015, Oxfordmatthewgamble
 
Bigdive 2014 - RDF, principles and case studies
Bigdive 2014 - RDF, principles and case studiesBigdive 2014 - RDF, principles and case studies
Bigdive 2014 - RDF, principles and case studiesDiego Valerio Camarda
 
Staab programming thesemanticweb
Staab programming thesemanticwebStaab programming thesemanticweb
Staab programming thesemanticwebAneta Tu
 
Programming the Semantic Web
Programming the Semantic WebProgramming the Semantic Web
Programming the Semantic WebSteffen Staab
 
Collaborations in the Extreme: 
The rise of open code development in the scie...
Collaborations in the Extreme: 
The rise of open code development in the scie...Collaborations in the Extreme: 
The rise of open code development in the scie...
Collaborations in the Extreme: 
The rise of open code development in the scie...Kelle Cruz
 
Introduction to OpenRefine
Introduction to OpenRefineIntroduction to OpenRefine
Introduction to OpenRefineHeather Myers
 

La actualidad más candente (20)

RDF Stream Processing Tutorial: RSP implementations
RDF Stream Processing Tutorial: RSP implementationsRDF Stream Processing Tutorial: RSP implementations
RDF Stream Processing Tutorial: RSP implementations
 
Introduction of g0v.tw at OpenDataHK.meet.12
Introduction of g0v.tw at OpenDataHK.meet.12Introduction of g0v.tw at OpenDataHK.meet.12
Introduction of g0v.tw at OpenDataHK.meet.12
 
Realtimestream and realtime fastcatsearch
Realtimestream and realtime fastcatsearchRealtimestream and realtime fastcatsearch
Realtimestream and realtime fastcatsearch
 
Actionable data in life sciences
Actionable data in life sciencesActionable data in life sciences
Actionable data in life sciences
 
Connecting Stream Reasoners on the Web
Connecting Stream Reasoners on the WebConnecting Stream Reasoners on the Web
Connecting Stream Reasoners on the Web
 
Reminiscing about interoperability
Reminiscing about interoperabilityReminiscing about interoperability
Reminiscing about interoperability
 
Signposting Overview
Signposting OverviewSignposting Overview
Signposting Overview
 
Triplewave: a step towards RDF Stream Processing on the Web
Triplewave: a step towards RDF Stream Processing on the WebTriplewave: a step towards RDF Stream Processing on the Web
Triplewave: a step towards RDF Stream Processing on the Web
 
The Lonesome LOD Cloud
The Lonesome LOD CloudThe Lonesome LOD Cloud
The Lonesome LOD Cloud
 
Linked Data Fragments
Linked Data FragmentsLinked Data Fragments
Linked Data Fragments
 
Query Rewriting in RDF Stream Processing
Query Rewriting in RDF Stream ProcessingQuery Rewriting in RDF Stream Processing
Query Rewriting in RDF Stream Processing
 
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...
 
Introduction to Research Objects - Collaboartions Workshop 2015, Oxford
Introduction to Research Objects - Collaboartions Workshop 2015, OxfordIntroduction to Research Objects - Collaboartions Workshop 2015, Oxford
Introduction to Research Objects - Collaboartions Workshop 2015, Oxford
 
Bigdive 2014 - RDF, principles and case studies
Bigdive 2014 - RDF, principles and case studiesBigdive 2014 - RDF, principles and case studies
Bigdive 2014 - RDF, principles and case studies
 
IIIF & Digital Humanities
IIIF & Digital Humanities     IIIF & Digital Humanities
IIIF & Digital Humanities
 
Staab programming thesemanticweb
Staab programming thesemanticwebStaab programming thesemanticweb
Staab programming thesemanticweb
 
Programming the Semantic Web
Programming the Semantic WebProgramming the Semantic Web
Programming the Semantic Web
 
Collaborations in the Extreme: 
The rise of open code development in the scie...
Collaborations in the Extreme: 
The rise of open code development in the scie...Collaborations in the Extreme: 
The rise of open code development in the scie...
Collaborations in the Extreme: 
The rise of open code development in the scie...
 
Introduction to OpenRefine
Introduction to OpenRefineIntroduction to OpenRefine
Introduction to OpenRefine
 
AINL 2016: Bugaychenko
AINL 2016: BugaychenkoAINL 2016: Bugaychenko
AINL 2016: Bugaychenko
 

Destacado

Software Freedom and Open Source Community
Software Freedom and Open Source CommunitySoftware Freedom and Open Source Community
Software Freedom and Open Source CommunitySammy Fung
 
Mozilla - Openness of the Web
Mozilla - Openness of the WebMozilla - Openness of the Web
Mozilla - Openness of the WebSammy Fung
 
香港中文開源軟件翻譯
香港中文開源軟件翻譯香港中文開源軟件翻譯
香港中文開源軟件翻譯Sammy Fung
 
香港開放原始碼社群 在香港社會發聲
香港開放原始碼社群 在香港社會發聲香港開放原始碼社群 在香港社會發聲
香港開放原始碼社群 在香港社會發聲Sammy Fung
 
Global Open Source Development 2011-2014 Review and 2015 Forecast
Global Open Source Development 2011-2014 Review and 2015 ForecastGlobal Open Source Development 2011-2014 Review and 2015 Forecast
Global Open Source Development 2011-2014 Review and 2015 ForecastSammy Fung
 
Firefox 4 介紹短講
Firefox 4 介紹短講Firefox 4 介紹短講
Firefox 4 介紹短講Sammy Fung
 
From Hk0weather to Open Data
From Hk0weather to Open DataFrom Hk0weather to Open Data
From Hk0weather to Open DataSammy Fung
 
讓網誌博客自由的 Wordpress - 香港 wordpress 博客用家分享
讓網誌博客自由的 Wordpress - 香港 wordpress 博客用家分享讓網誌博客自由的 Wordpress - 香港 wordpress 博客用家分享
讓網誌博客自由的 Wordpress - 香港 wordpress 博客用家分享Sammy Fung
 
Open source communities in hong kong and asia (2012 updates) (Summer BarCam...
Open source communities in hong kong and asia  (2012 updates)  (Summer BarCam...Open source communities in hong kong and asia  (2012 updates)  (Summer BarCam...
Open source communities in hong kong and asia (2012 updates) (Summer BarCam...Sammy Fung
 
Use open source software to develop ideas at work
Use open source software to develop ideas at workUse open source software to develop ideas at work
Use open source software to develop ideas at workSammy Fung
 
Mozilla Community and Hong Kong
Mozilla Community and Hong KongMozilla Community and Hong Kong
Mozilla Community and Hong KongSammy Fung
 
Open Data and Web API
Open Data and Web APIOpen Data and Web API
Open Data and Web APISammy Fung
 
IBus Chinese input methods for HongKongers - Problem, Solution, Future.
IBus Chinese input methods for HongKongers - Problem, Solution, Future.IBus Chinese input methods for HongKongers - Problem, Solution, Future.
IBus Chinese input methods for HongKongers - Problem, Solution, Future.Sammy Fung
 
Building your own job site with Drupal
Building your own job site with DrupalBuilding your own job site with Drupal
Building your own job site with DrupalSammy Fung
 
Open Source Technology and Community
Open Source Technology and CommunityOpen Source Technology and Community
Open Source Technology and CommunitySammy Fung
 
Python, web scraping and content management: Scrapy and Django
Python, web scraping and content management: Scrapy and DjangoPython, web scraping and content management: Scrapy and Django
Python, web scraping and content management: Scrapy and DjangoSammy Fung
 

Destacado (16)

Software Freedom and Open Source Community
Software Freedom and Open Source CommunitySoftware Freedom and Open Source Community
Software Freedom and Open Source Community
 
Mozilla - Openness of the Web
Mozilla - Openness of the WebMozilla - Openness of the Web
Mozilla - Openness of the Web
 
香港中文開源軟件翻譯
香港中文開源軟件翻譯香港中文開源軟件翻譯
香港中文開源軟件翻譯
 
香港開放原始碼社群 在香港社會發聲
香港開放原始碼社群 在香港社會發聲香港開放原始碼社群 在香港社會發聲
香港開放原始碼社群 在香港社會發聲
 
Global Open Source Development 2011-2014 Review and 2015 Forecast
Global Open Source Development 2011-2014 Review and 2015 ForecastGlobal Open Source Development 2011-2014 Review and 2015 Forecast
Global Open Source Development 2011-2014 Review and 2015 Forecast
 
Firefox 4 介紹短講
Firefox 4 介紹短講Firefox 4 介紹短講
Firefox 4 介紹短講
 
From Hk0weather to Open Data
From Hk0weather to Open DataFrom Hk0weather to Open Data
From Hk0weather to Open Data
 
讓網誌博客自由的 Wordpress - 香港 wordpress 博客用家分享
讓網誌博客自由的 Wordpress - 香港 wordpress 博客用家分享讓網誌博客自由的 Wordpress - 香港 wordpress 博客用家分享
讓網誌博客自由的 Wordpress - 香港 wordpress 博客用家分享
 
Open source communities in hong kong and asia (2012 updates) (Summer BarCam...
Open source communities in hong kong and asia  (2012 updates)  (Summer BarCam...Open source communities in hong kong and asia  (2012 updates)  (Summer BarCam...
Open source communities in hong kong and asia (2012 updates) (Summer BarCam...
 
Use open source software to develop ideas at work
Use open source software to develop ideas at workUse open source software to develop ideas at work
Use open source software to develop ideas at work
 
Mozilla Community and Hong Kong
Mozilla Community and Hong KongMozilla Community and Hong Kong
Mozilla Community and Hong Kong
 
Open Data and Web API
Open Data and Web APIOpen Data and Web API
Open Data and Web API
 
IBus Chinese input methods for HongKongers - Problem, Solution, Future.
IBus Chinese input methods for HongKongers - Problem, Solution, Future.IBus Chinese input methods for HongKongers - Problem, Solution, Future.
IBus Chinese input methods for HongKongers - Problem, Solution, Future.
 
Building your own job site with Drupal
Building your own job site with DrupalBuilding your own job site with Drupal
Building your own job site with Drupal
 
Open Source Technology and Community
Open Source Technology and CommunityOpen Source Technology and Community
Open Source Technology and Community
 
Python, web scraping and content management: Scrapy and Django
Python, web scraping and content management: Scrapy and DjangoPython, web scraping and content management: Scrapy and Django
Python, web scraping and content management: Scrapy and Django
 

Similar a Creating Open Data with Open Source (beta2)

Local Weather Information and GNOME Shell Extension
Local Weather Information and GNOME Shell ExtensionLocal Weather Information and GNOME Shell Extension
Local Weather Information and GNOME Shell ExtensionSammy Fung
 
Use of Open Data in Hong Kong
Use of Open Data in Hong KongUse of Open Data in Hong Kong
Use of Open Data in Hong KongSammy Fung
 
Use of Open Data in Hong Kong (LegCo 2014)
Use of Open Data in Hong Kong (LegCo 2014)Use of Open Data in Hong Kong (LegCo 2014)
Use of Open Data in Hong Kong (LegCo 2014)Sammy Fung
 
Web Scraping_ Gathering Data from Websites.pptx
Web Scraping_ Gathering Data from Websites.pptxWeb Scraping_ Gathering Data from Websites.pptx
Web Scraping_ Gathering Data from Websites.pptxHitechIOT
 
PSUG 1 - 2024-01-22 - Onboarding Best Practices
PSUG 1 - 2024-01-22 - Onboarding Best PracticesPSUG 1 - 2024-01-22 - Onboarding Best Practices
PSUG 1 - 2024-01-22 - Onboarding Best PracticesTomas Moser
 
Leancamp - are you ready to rock
Leancamp - are you ready to rockLeancamp - are you ready to rock
Leancamp - are you ready to rockChristian Heilmann
 
Full Stack Monitoring with Prometheus and Grafana
Full Stack Monitoring with Prometheus and GrafanaFull Stack Monitoring with Prometheus and Grafana
Full Stack Monitoring with Prometheus and GrafanaJazz Yao-Tsung Wang
 
What Goes In Must Come Out: Egress-Assess and Data Exfiltration
What Goes In Must Come Out: Egress-Assess and Data ExfiltrationWhat Goes In Must Come Out: Egress-Assess and Data Exfiltration
What Goes In Must Come Out: Egress-Assess and Data ExfiltrationCTruncer
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...Demi Ben-Ari
 
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...Codemotion
 
Softshake 2013: Introduction to NoSQL with Couchbase
Softshake 2013: Introduction to NoSQL with CouchbaseSoftshake 2013: Introduction to NoSQL with Couchbase
Softshake 2013: Introduction to NoSQL with CouchbaseTugdual Grall
 
Web scraping 1 2-3 with python + scrapy (Summer BarCampHK 2012 version)
Web scraping 1 2-3 with python + scrapy (Summer BarCampHK 2012 version)Web scraping 1 2-3 with python + scrapy (Summer BarCampHK 2012 version)
Web scraping 1 2-3 with python + scrapy (Summer BarCampHK 2012 version)Sammy Fung
 
Akash rajguru project report sem v
Akash rajguru project report sem vAkash rajguru project report sem v
Akash rajguru project report sem vAkash Rajguru
 
SplunkLive! Munich 2018: Data Onboarding Overview
SplunkLive! Munich 2018: Data Onboarding OverviewSplunkLive! Munich 2018: Data Onboarding Overview
SplunkLive! Munich 2018: Data Onboarding OverviewSplunk
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixC4Media
 
Consuming open and linked data with open source tools
Consuming open and linked data with open source toolsConsuming open and linked data with open source tools
Consuming open and linked data with open source toolsJoanne Cook
 
SplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding OverviewSplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding OverviewSplunk
 

Similar a Creating Open Data with Open Source (beta2) (20)

Local Weather Information and GNOME Shell Extension
Local Weather Information and GNOME Shell ExtensionLocal Weather Information and GNOME Shell Extension
Local Weather Information and GNOME Shell Extension
 
Use of Open Data in Hong Kong
Use of Open Data in Hong KongUse of Open Data in Hong Kong
Use of Open Data in Hong Kong
 
Use of Open Data in Hong Kong (LegCo 2014)
Use of Open Data in Hong Kong (LegCo 2014)Use of Open Data in Hong Kong (LegCo 2014)
Use of Open Data in Hong Kong (LegCo 2014)
 
Web Scraping_ Gathering Data from Websites.pptx
Web Scraping_ Gathering Data from Websites.pptxWeb Scraping_ Gathering Data from Websites.pptx
Web Scraping_ Gathering Data from Websites.pptx
 
PSUG 1 - 2024-01-22 - Onboarding Best Practices
PSUG 1 - 2024-01-22 - Onboarding Best PracticesPSUG 1 - 2024-01-22 - Onboarding Best Practices
PSUG 1 - 2024-01-22 - Onboarding Best Practices
 
Leancamp - are you ready to rock
Leancamp - are you ready to rockLeancamp - are you ready to rock
Leancamp - are you ready to rock
 
Full Stack Monitoring with Prometheus and Grafana
Full Stack Monitoring with Prometheus and GrafanaFull Stack Monitoring with Prometheus and Grafana
Full Stack Monitoring with Prometheus and Grafana
 
What Goes In Must Come Out: Egress-Assess and Data Exfiltration
What Goes In Must Come Out: Egress-Assess and Data ExfiltrationWhat Goes In Must Come Out: Egress-Assess and Data Exfiltration
What Goes In Must Come Out: Egress-Assess and Data Exfiltration
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
 
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
 
Softshake 2013: Introduction to NoSQL with Couchbase
Softshake 2013: Introduction to NoSQL with CouchbaseSoftshake 2013: Introduction to NoSQL with Couchbase
Softshake 2013: Introduction to NoSQL with Couchbase
 
Web scraping 1 2-3 with python + scrapy (Summer BarCampHK 2012 version)
Web scraping 1 2-3 with python + scrapy (Summer BarCampHK 2012 version)Web scraping 1 2-3 with python + scrapy (Summer BarCampHK 2012 version)
Web scraping 1 2-3 with python + scrapy (Summer BarCampHK 2012 version)
 
Deploy your own P2P network
Deploy your own P2P networkDeploy your own P2P network
Deploy your own P2P network
 
Akash rajguru project report sem v
Akash rajguru project report sem vAkash rajguru project report sem v
Akash rajguru project report sem v
 
SplunkLive! Munich 2018: Data Onboarding Overview
SplunkLive! Munich 2018: Data Onboarding OverviewSplunkLive! Munich 2018: Data Onboarding Overview
SplunkLive! Munich 2018: Data Onboarding Overview
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
 
Consuming open and linked data with open source tools
Consuming open and linked data with open source toolsConsuming open and linked data with open source tools
Consuming open and linked data with open source tools
 
Scrapy
ScrapyScrapy
Scrapy
 
Handout: 'Open Source Tools & Resources'
Handout: 'Open Source Tools & Resources'Handout: 'Open Source Tools & Resources'
Handout: 'Open Source Tools & Resources'
 
SplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding OverviewSplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding Overview
 

Más de Sammy Fung

Python 爬網⾴工具 - Scrapy 介紹
Python 爬網⾴工具 - Scrapy 介紹Python 爬網⾴工具 - Scrapy 介紹
Python 爬網⾴工具 - Scrapy 介紹Sammy Fung
 
DevRel - Transform article writing from printing to online
DevRel - Transform article writing from printing to onlineDevRel - Transform article writing from printing to online
DevRel - Transform article writing from printing to onlineSammy Fung
 
Introduction to Open Source by opensource.hk (2019 Edition)
Introduction to Open Source by opensource.hk (2019 Edition)Introduction to Open Source by opensource.hk (2019 Edition)
Introduction to Open Source by opensource.hk (2019 Edition)Sammy Fung
 
My Open Source Journey - Developer and Community
My Open Source Journey - Developer and CommunityMy Open Source Journey - Developer and Community
My Open Source Journey - Developer and CommunitySammy Fung
 
Introduction to development with Django web framework
Introduction to development with Django web frameworkIntroduction to development with Django web framework
Introduction to development with Django web frameworkSammy Fung
 
Access Open Data with Open Source Software Tools
Access Open Data with Open Source Software ToolsAccess Open Data with Open Source Software Tools
Access Open Data with Open Source Software ToolsSammy Fung
 
Installation of LAMP Server with Ubuntu 14.10 Server Edition
Installation of LAMP Server with Ubuntu 14.10 Server EditionInstallation of LAMP Server with Ubuntu 14.10 Server Edition
Installation of LAMP Server with Ubuntu 14.10 Server EditionSammy Fung
 
Software Freedom and Community
Software Freedom and CommunitySoftware Freedom and Community
Software Freedom and CommunitySammy Fung
 
Open Source Job Board
Open Source Job BoardOpen Source Job Board
Open Source Job BoardSammy Fung
 
Introduction of Mozilla Hong Kong (COSCUP 2014)
Introduction of Mozilla Hong Kong (COSCUP 2014)Introduction of Mozilla Hong Kong (COSCUP 2014)
Introduction of Mozilla Hong Kong (COSCUP 2014)Sammy Fung
 
Introduction of Open Source Job Board with Drupal CMS
Introduction of Open Source Job Board with Drupal CMSIntroduction of Open Source Job Board with Drupal CMS
Introduction of Open Source Job Board with Drupal CMSSammy Fung
 
ITFest 2014 - Open Source Marketing
ITFest 2014 - Open Source MarketingITFest 2014 - Open Source Marketing
ITFest 2014 - Open Source MarketingSammy Fung
 
How Open Data can help entrepreneurs - ITFest 2014 E2
How Open Data can help entrepreneurs - ITFest 2014 E2How Open Data can help entrepreneurs - ITFest 2014 E2
How Open Data can help entrepreneurs - ITFest 2014 E2Sammy Fung
 
Air Pollution Weather Map at OpenDataHK.make.02
Air Pollution Weather Map at OpenDataHK.make.02Air Pollution Weather Map at OpenDataHK.make.02
Air Pollution Weather Map at OpenDataHK.make.02Sammy Fung
 
15+ years of open source movements in Hong Kong
15+ years of open source movements in Hong Kong15+ years of open source movements in Hong Kong
15+ years of open source movements in Hong KongSammy Fung
 

Más de Sammy Fung (15)

Python 爬網⾴工具 - Scrapy 介紹
Python 爬網⾴工具 - Scrapy 介紹Python 爬網⾴工具 - Scrapy 介紹
Python 爬網⾴工具 - Scrapy 介紹
 
DevRel - Transform article writing from printing to online
DevRel - Transform article writing from printing to onlineDevRel - Transform article writing from printing to online
DevRel - Transform article writing from printing to online
 
Introduction to Open Source by opensource.hk (2019 Edition)
Introduction to Open Source by opensource.hk (2019 Edition)Introduction to Open Source by opensource.hk (2019 Edition)
Introduction to Open Source by opensource.hk (2019 Edition)
 
My Open Source Journey - Developer and Community
My Open Source Journey - Developer and CommunityMy Open Source Journey - Developer and Community
My Open Source Journey - Developer and Community
 
Introduction to development with Django web framework
Introduction to development with Django web frameworkIntroduction to development with Django web framework
Introduction to development with Django web framework
 
Access Open Data with Open Source Software Tools
Access Open Data with Open Source Software ToolsAccess Open Data with Open Source Software Tools
Access Open Data with Open Source Software Tools
 
Installation of LAMP Server with Ubuntu 14.10 Server Edition
Installation of LAMP Server with Ubuntu 14.10 Server EditionInstallation of LAMP Server with Ubuntu 14.10 Server Edition
Installation of LAMP Server with Ubuntu 14.10 Server Edition
 
Software Freedom and Community
Software Freedom and CommunitySoftware Freedom and Community
Software Freedom and Community
 
Open Source Job Board
Open Source Job BoardOpen Source Job Board
Open Source Job Board
 
Introduction of Mozilla Hong Kong (COSCUP 2014)
Introduction of Mozilla Hong Kong (COSCUP 2014)Introduction of Mozilla Hong Kong (COSCUP 2014)
Introduction of Mozilla Hong Kong (COSCUP 2014)
 
Introduction of Open Source Job Board with Drupal CMS
Introduction of Open Source Job Board with Drupal CMSIntroduction of Open Source Job Board with Drupal CMS
Introduction of Open Source Job Board with Drupal CMS
 
ITFest 2014 - Open Source Marketing
ITFest 2014 - Open Source MarketingITFest 2014 - Open Source Marketing
ITFest 2014 - Open Source Marketing
 
How Open Data can help entrepreneurs - ITFest 2014 E2
How Open Data can help entrepreneurs - ITFest 2014 E2How Open Data can help entrepreneurs - ITFest 2014 E2
How Open Data can help entrepreneurs - ITFest 2014 E2
 
Air Pollution Weather Map at OpenDataHK.make.02
Air Pollution Weather Map at OpenDataHK.make.02Air Pollution Weather Map at OpenDataHK.make.02
Air Pollution Weather Map at OpenDataHK.make.02
 
15+ years of open source movements in Hong Kong
15+ years of open source movements in Hong Kong15+ years of open source movements in Hong Kong
15+ years of open source movements in Hong Kong
 

Último

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 

Último (20)

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 

Creating Open Data with Open Source (beta2)

  • 1. Creating Open Data with Open Source Sammy Fung sammy.hk [ITFest.HK] Seminar of Free / Open Source in Hong Kong, April 2013.
  • 2. Agenda ● What is Open Data ? ● Use of Open Source Software in web crawling. ● Starting new Open Source projects to create Open Data.
  • 3. Sammy Fung ● Software Developer using open source. – Perl → PHP → Python. – Data Mining / Web Crawling. – Also deploying OpenStack Cloud and Linux Solutions. ● Open Source Community Leader. – opensource.hk, HKLUG, GNOME Asia committee, Mozilla Rep, and program committee member of the largest Taiwan open source conference - COSCUP. ● Blogger at sammy.hk.
  • 4. Open Data Three Laws of Open Government Data by David Eaves. 1.If it can't be spidered or indexed, it doesn't exist. 2.If it isn't available in open and machine readable format, it can't engage. 3.If a legal framework doesn't allow it to be repurposed, it doesn't empower. http://eaves.ca/2009/09/30/three-law-of-open-government-data/
  • 5. Open Data ● Tim Berners-Lee, the inventor of the Web. – 5stardata.info – 5 star deployment scheme of Open Data.
  • 6. * One Star - Open Data 1.make your stuff available on the Web (whatever format) under an open license. 2.make it available as structured data (e.g., Excel instead of image scan of a table) 3.use non-proprietary formats (e.g., CSV instead of Excel) 4.use URIs to denote things, so that people can point at your stuff. 5.link your data to other data to provide context. 5stardata.info by Tim Berners-Lee, the inventor of the Web.
  • 7. ** Two Star - Open Data 1.make your stuff available on the Web (whatever format) under an open license. 2.make it available as structured data (e.g., Excel instead of image scan of a table) 3.use non-proprietary formats (e.g., CSV instead of Excel) 4.use URIs to denote things, so that people can point at your stuff. 5.link your data to other data to provide context. 5stardata.info by Tim Berners-Lee, the inventor of the Web.
  • 8. *** Three Star - Open Data 1.make your stuff available on the Web (whatever format) under an open license. 2.make it available as structured data (e.g., Excel instead of image scan of a table) 3.use non-proprietary formats (e.g., CSV instead of Excel) 4.use URIs to denote things, so that people can point at your stuff. 5.link your data to other data to provide context. 5stardata.info by Tim Berners-Lee, the inventor of the Web.
  • 9. **** Four Star - Open Data 1.make your stuff available on the Web (whatever format) under an open license. 2.make it available as structured data (e.g., Excel instead of image scan of a table) 3.use non-proprietary formats (e.g., CSV instead of Excel) 4.use URIs to denote things, so that people can point at your stuff. 5.link your data to other data to provide context. 5stardata.info by Tim Berners-Lee, the inventor of the Web.
  • 10. ***** Five Star - Open Data 1.make your stuff available on the Web (whatever format) under an open license. 2.make it available as structured data (e.g., Excel instead of image scan of a table) 3.use non-proprietary formats (e.g., CSV instead of Excel) 4.use URIs to denote things, so that people can point at your stuff. 5.link your data to other data to provide context. 5stardata.info by Tim Berners-Lee, the inventor of the Web.
  • 11. Open Data from HK Government ? ● 2 Use Cases of Data: – Legco Meeting Minutes and Voting Results. – Weather at Data.One.
  • 12. Legco Meeting Minutes and Voting Results
  • 13. Legco Meeting Minutes and Voting Results
  • 14. Legco Meeting Minutes and Voting Results ● All legco voting results are scanned and released in PDF, it is only possible to retrieve voting results manually. ● In recent years, it seems scanned minutes from sheets scanned are replaced by minutes converted from original computer document files.
  • 15. Improving Legco Vote Result Data ? ● Legcovotes.net is created by Hong Kong netitizens(?). ● Only 20 famous vote results are included. ● It is possible to let public to input other vote results by hand, and submissions should be verified by legcovotes.net authoritative. ● Including other data, eg. Minutes in plain text or paragraphs related to a counciler.
  • 16. Weather at Data.One ● My Chinese Blog Post 「香港政府機構開放資 料 Open Data 情況」 on 2013/1/17. ● Data.One released on 2011/3/31. ● Weather at Data.One provides 7 dataset URLs, returns RSS (XML) format (Eng/TChi/SChi) – One word: Useless. – Data.One dataset (RSS) is completely different with HKO own paid service (XML).
  • 17. Weather at Data.One ● Example - Current local weather report: ● Plain text report in RSS. ● Difference to quote report content: – Website: a pair of HTML tags, eg. <PRE>....</PRE>. – Data.One: a pair of RSS description tags, <description>....</description>. ● Other weather data is missing, eg. Regional temperture updates per each 12 mins.
  • 18. Weather at Data.One ● Weather at Data.One is 'report' but not 'data'. ● Weather RSS is already released by HKO before launch of Data.One. ● Technically, json/xml format is better readable by computer programs.
  • 19. Oversea Open Data Project Examples ● Toronto: – City Data: http://map.toronto.ca/wellbeing/ – Transportation: http://www.rocketradar.net/ – Pollution: http://www.emitter.ca/ ● US & Canada: – https://www.crimereports.com/
  • 20. Use of Open Source Software in Web Crawling ● Use Open Source Tools to collect useful and meaningful machine-readable data. ● Doesn't need to wait provider to release data in machine-readable format.
  • 21. Open Source Tools ● Python programming lanugage ● with Regular Expression library ● Scrapy web crawling framework
  • 22. Why python + scrapy ? ● python: my current favourite programming language for few years. ● scrapy: web crawling framework written in Python.
  • 23. Scrapy ● scrapy: web crawling framework written in Python. ● HtmlXPathSelector ● Output: built-in JSON, CSV, XML. ● Python: import re
  • 24. My Products ● WeatherHK ← ← ← ● TCTrack
  • 25. WeatherHK ● http://twitter.com/weatherhk ● hourly current weather report ● weather forecast report ● tropical signal warning
  • 26. WeatherHK ● Backend: Python + Scrapy + Database + Twitter + NNTP...... ● Frontend: Twitter + Newsgroup
  • 28. My Products ● WeatherHK ● TCTrack ← ← ←
  • 29. TCTrack ● http://sammy.hk/projects/tctrack/tctrack.php ● Plot TC current and forecast tracks over Google Map. ● Source: – JTWC – HKO
  • 30. TCTrack ● http://sammy.hk/projects/tctrack/tctrack.php ● Probably first tctrack map in HK using GoogleMap ● Use of GMap: TCTrack -> Weather Underground Hong Kong -> HKO
  • 31. TCTrack ● http://twitter.com/tctrack ● Tweet JTWC updates for Northwest Pacific.
  • 32. Starting new Open Source projects to create Open Data ● Develop a open source project. ● Release data in standard machine-readable data format.
  • 33. Open Source Project Examples ● Hk0weather ● My weather related open source project.
  • 34. hk0weather ● https://github.com/sammyfung/hk0weather ● Open Source Hong Kong Weather Project. ● convert to JSON data from HKO webpages. ● python + scrapy ● 1st version: from current weather report, extracting temperture and humidity from 20+ weather stations, export in json format.
  • 35. hk0weather ● https://github.com/sammyfung/hk0weather ● $ virtualenv hk0weatherenv ● $ source hk0weatherenv/bin/activate ● $ pip install scrapy ● $ git clone https://github.com/sammyfung/hk0weather.git ● $ cd hk0weather ● $ scrapy crawl currwx -t json -o testresult
  • 36. hk0weather [{"humidity": 80, "station": "hko", "temperture": 17, "time": 1360785720}, {"station": "kingspark", "temperture": 16, "time": 1360785720}, {"station": "wongchukhang", "temperture": 17, "time": 1360785720}, {"station": "takwuling", "temperture": 16, "time": 1360785720}, {"station": "laufaushan", "temperture": 15, "time": 1360785720}, {"station": "taipo", "temperture": 16, "time": 1360785720}, {"station": "shatin", "temperture": 17, "time": 1360785720}, {"station": "tuenmun", "temperture": 17, "time": 1360785720}, {"station": "tseungkwano", "temperture": 16, "time": 1360785720}, {"station": "saikung", "temperture": 16, "time": 1360785720}, {"station": "cheungchau", "temperture": 17, "time": 1360785720}, {"station": "cheungchau", "temperture": 17, "time": 1360785720}, {"station": "tsingyi", "temperture": 17, "time": 1360785720}, {"station": "shekkong", "temperture": 15, "time": 1360785720}, {"station": "tsuenwanhokoon", "temperture": 15, "time": 1360785720}, {"station": "tsuenwanshingmunvalley", "temperture": 17, "time": 1360785720}, {"station": "hongkongpark", "temperture": 17, "time": 1360785720}, {"station": "shaukeiwan", "temperture": 16, "time": 1360785720}, {"station": "kowlooncity", "temperture": 16, "time": 1360785720}, {"station": "happyvalley", "temperture": 18, "time": 1360785720}, {"station": "wongtaisin", "temperture": 17, "time": 1360785720}, {"station": "stanley", "temperture": 16, "time": 1360785720}, {"station": "kwuntong", "temperture": 15, "time": 1360785720}, {"station": "shamshuipo", "temperture": 17, "time": 1360785720}]
  • 37. Items.py class Hk0WeatherItem(Item): time = Field() station = Field() temperture = Field() humidity = Field()
  • 39. Currwx.py def parse(self, response): laststation = '' temperture = int() stations = [] hxs = HtmlXPathSelector(response) report = hxs.select('//div[@id="ming"]')
  • 40. libhk0 class hk0: stations = [ (u' 天 文 台 ', 'hko'), (u' 京 士 柏 ', 'kingspark'), (u' 黃 竹 坑 ', 'wongchukhang'), (u' 打 鼓 嶺 ', 'takwuling'), (u' 流 浮 山 ', 'laufaushan'),
  • 41. libhk0 class hk0: def gettime(self, report): … def hk0current(self, report): …
  • 42. hk0weather ● Future Planning: ● Add more weather reports. ● Getting ideas and/or cooperate with 'pro' Weather hobbists. ● Remarks: ● Development of hk0weather is started from ZERO, its code is different than my twitter @weatherhk.
  • 43. Challenge ● Challenge on first day of hk0weather release. ● Director of a mobile app developer company told me by leaving a Facebook comment. – HKO provides data in pretty XML format with their  annual service plan for commerical companies. – He think that ***MAYBE*** HKO would provide XML  to you ***without*** any charges if I asked. ● Remark: This is an assumption only, not listed on HKO  website.
  • 44. Challenge ● I replied the following to him after googling for HKO XML schema. – HKO didn't mention 'free of charge service' of XML data feed on website. – I registered and got authorization from HKO to re-distribute their weather information for non-profit making. And I received some emails from HKO for any updates of website and HTML structure, but never mention about XML data feed service. – Weather data available on HKO XML data feed is still fewer than its HTML website. ● So, this challenge is FAIL! XD
  • 45. Open Data Project Examples ● Open Government initiative from HKU JMSC. ● http://opengov.jmsc.hku.hk/ ● https://github.com/jmschku
  • 46. Agenda ● What is Open Data ? ● Use of Open Source Software in web crawling. ● Starting new Open Source projects to create Open Data.