BCS Address Day - Open Addresses

In this presentation I talk about why open addresses are a public good, and the mechanisms that we're using within Open Addresses to navigate the legal and technical challenges of building an open address list for the UK.

Published in: Technology

  1. Address Day
     what next after the Address Wars
     Jeni Tennison - @JeniT
     5 March 2015
     https://openaddressesuk.org @openaddressesuk
  2. "In economics, a public good is a good that is both non-excludable and non-rivalrous in that individuals cannot be effectively excluded from use and where use by one individual does not reduce availability to others."
     Wikipedia - Public good
  3. "Tompkins Square Park Central Knoll" by David Shankbone - (CC BY-SA 3.0) via Wikimedia Commons
  4. open data public good
  5. When should a good be public?
     sum of what everyone would pay
     what it costs to maintain
  6. Address data should be open data
     ● National Information Infrastructure
     ● Not just for posting mail...
       ○ geocoding for route finding
       ○ associating people with areas
       ○ classification for targeting interventions
       ○ linking datasets together
     ● Denmark has taken this step
       ○ 1000% increase in use of address data
       ○ costs = €0.2M, benefits = €14M
  7. Current real life problems
     ● startup wanting to build an application
       ○ prohibitive costs
       ○ prohibitive licensing complexity
     ● SME with a geodemographic product
       ○ prohibitive costs
       ○ limiting customer base & growth
     ● New build owners
       ○ 3 months to register to vote, order pizza
  8. Funding public goods
     ● Government via taxation
     ● Collaborative bound by contract
     ● Cross-subsidy by selling other goods
     ● Voluntary effort
     ● Social norms
  9. "The sale of the PAF with the Royal Mail was a mistake. Public access to public sector data must never be sold or given away again. This type of information, like census information and many other data sets, is very expensive to collect and collate into useable form, but it also has huge potential value to the economy and society as a whole if it is kept as an open, public good."
     Bernard Jenkin, Chair of Public Administration Select Committee
  10. Hypothesis 1: the maintenance of open address data can only be effectively funded through taxation
      Hypothesis 2: it is possible to build and maintain a sustainable open address database using collaboration, cross-subsidy and voluntary effort
  11. Goals
      ● Free, openly licensed, up-to-date bulk downloads of addresses
      ● Freemium services over that data
        ○ eg validation, auto-completion, geocoding
      ● 100% open source, collaboratively maintained
      ● Initial ~£400k investment from government
        ○ compared with £25M annual cost maintaining PAF
  12. Eventual Architecture
      "Definitive" UK address list
        - where the address data is safe to use
        - where each record has confidence and provenance
      Bulk: Download, Upload
      APIs: Add, Sort, Validate, Search
      URLs: Linked data, Extensibility
      Service Providers: Aggregators, digital, telecoms, public sector, distribution, academics, manufacturers etc
      Services - Websites, Users
      Value
      Revenue for sustainability
  13. This takes time
      ● Large datasets and inference to tackle the bulk of the challenge: "80/20" rule
      ● Ongoing, collaborative maintenance
      ● Targeted work. Low-volume records to fill existing gaps in available datasets
      NB: dates are "just for fun"
  14. Approaches
      1. Load open datasets containing addresses
      2. Build out crowdsourcing mechanisms
      3. Use inference to fill gaps
      and throughout:
      ● keep track of provenance
      ● keep track of confidence
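The two cross-cutting requirements on slide 14, provenance and confidence, can be pictured as fields carried by every address record. The following is a minimal sketch only: the class and field names (`Provenance`, `AddressRecord`, `source`, `method`) and the sample values are hypothetical and are not taken from the Open Addresses code base.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Provenance:
    """Where a record came from (hypothetical fields, for illustration only)."""
    source: str   # name of the dataset or service that supplied the address
    method: str   # "loaded", "crowdsourced" or "inferred", mirroring the three approaches


@dataclass
class AddressRecord:
    street: str
    town: str
    postcode: str
    confidence: float  # 0.0 to 1.0; how it is calculated is left open in this sketch
    provenance: List[Provenance] = field(default_factory=list)

    def add_source(self, source: str, method: str) -> None:
        """Append a provenance entry instead of overwriting earlier ones."""
        self.provenance.append(Provenance(source, method))


# Illustrative values only; the confidence figure is a placeholder.
record = AddressRecord("St James Square", "Cheltenham", "GL50 3PR", confidence=0.5)
record.add_source("Companies House", "loaded")
record.add_source("address parsing service", "crowdsourced")
```

Keeping provenance as a growing list, rather than a single field, is one way to let several independent sources back the same address.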
  15. Loading datasets
      Third Party IPR
      Possibly infected if validated against PAF or AddressBase
      ⇒ most Government "open" data is infected
      A few not:
      ● Companies House
      ● err...
  16. Platform for loading bulk data
      Originally developed for OpenCorporates
      Sandboxed environment for running scripts
  17. Motivating crowdsourcing
      Bulk: Download, Upload
      APIs: Add, Sort, Validate, Search
      URLs: Linked data, Extensibility
      Value
      Building Blocks - towns, postcodes, streets
        - used to parse data and provide confidence in the address list
        - links between towns, postcodes and streets are learned from addresses
      Authoritative and definitive UK address list
        - where the address data is safe to use
        - where each record has confidence and provenance
      Revenue for sustainability
  18. Address parsing service
      ● Turn free-text addresses into building blocks
      ● Can be used with data containing third party IPR
      ● Optional "contribute" option
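As a rough illustration of turning a free-text address into building blocks (slide 18), here is a minimal Python sketch. It assumes comma-separated input that ends in a UK postcode, and it uses a deliberately simplified postcode pattern; the function name `parse_address` and the output keys are hypothetical, and the real parsing service is presumably far more tolerant of messy input.

```python
import re
from typing import Dict, Optional

# Simplified UK postcode pattern (outward code, space, inward code); not exhaustive.
POSTCODE_RE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]?\s+\d[A-Z]{2}$")


def parse_address(text: str) -> Optional[Dict[str, str]]:
    """Split 'name or number, street, town, postcode' into building blocks.

    Returns None when the text does not fit this simple shape.
    """
    parts = [p.strip() for p in text.split(",") if p.strip()]
    if len(parts) < 3 or not POSTCODE_RE.match(parts[-1].upper()):
        return None
    return {
        "premise": ", ".join(parts[:-3]),  # everything before the street; may be empty
        "street": parts[-3],
        "town": parts[-2],
        "postcode": parts[-1].upper(),
    }


parsed = parse_address("Apt. 3, St James Place, Jessop Avenue, Cheltenham, GL50 3PR")
```

Only the street, town and postcode blocks need to leave the parser for the lists of towns, postcodes and streets to grow, which may be why such a service can be used with data containing third party IPR.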
  19. Inference
  20. Fogralea ZE1 0SE
      © Open Addresses Ltd.
  21. Fogralea ZE1 0SE
      Odd numbers: 7 9 11 13 15 17 19 21 23 25 27 29
      Even numbers: 6 8 10 12 14 16 18 20 22 24 26 28
  22. Fogralea ZE1 0SE
      Odd numbers: 7 9 11 13 15 17 19 21 23 25 27 29
      Even numbers: 6 8 10 12 14 16 18 20 22 24 26 28
  23. Fogralea ZE1 0SE
      What about nos. 1 to 4? Same postcode? We cannot know!
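Slides 20 to 23 suggest that inference can fill gaps inside a run of known house numbers but should not extrapolate beyond it. A minimal sketch of that idea follows, under the assumption (not stated in the slides) that odd and even numbers run along opposite sides of the street and are therefore interpolated separately. The function name `infer_between` and the sample input are hypothetical.

```python
from typing import Iterable, Set


def infer_between(known: Iterable[int]) -> Set[int]:
    """Infer house numbers lying between known numbers of the same parity.

    Numbers outside the observed range are never inferred, because nothing
    in the data says whether they share the street's postcode.
    """
    known_set = set(known)
    inferred: Set[int] = set()
    for parity in (0, 1):
        side = sorted(n for n in known_set if n % 2 == parity)
        if len(side) >= 2:
            # Fill every same-parity number from the lowest to the highest known one.
            inferred.update(range(side[0], side[-1] + 1, 2))
    return inferred - known_set


known_numbers = [7, 13, 29, 6, 28]  # hypothetical sample of already-known addresses
gaps = infer_between(known_numbers)
```

With this input the sketch fills in the remaining odd numbers between 7 and 29 and the remaining even numbers between 6 and 28, and leaves nos. 1 to 4 alone.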
  24. Enabling collaborative maintenance
  25. Calculating confidence
      St James House, St James Square, Cheltenham, GL50 3PR
      7, St James Square, Cheltenham, GL50 3PT
      St James North 1, St James Square, Cheltenham, GL50 3PR
      St James North 3, St James Square, Cheltenham, GL50 3PR
      3, St James Square, Cheltenham, GL50 3PR
      St James House, St James Square, Cheltenham Spa, GL50 3PR
      St James North 1, St James Square, Cheltenham, GL50 3PR
      St James Place, Jessop Avenue, Cheltenham, GL50 3PR
      St James House, St James Square, Cheltenham, GL50 3PR
      Apt. 3, St James Place, Jessop Avenue, Cheltenham, GL50 3PR
      56, Cheltenham Road, London, SE15 3AR
  26. Calculating confidence
      St James House, St James Square, Cheltenham, GL50 3PR
      7, St James Square, Cheltenham, GL50 3PT
      St James North 1, St James Square, Cheltenham, GL50 3PR
      St James North 3, St James Square, Cheltenham, GL50 3PR
      3, St James Square, Cheltenham, GL50 3PR
      St James House, St James Square, Cheltenham Spa, GL50 3PR
      St James North 1, St James Square, Cheltenham, GL50 3PR
      St James Place, Jessop Avenue, Cheltenham, GL50 3PR
      St James House, St James Square, Cheltenham, GL50 3PR
      Apt. 3, St James Place, Jessop Avenue, Cheltenham, GL50 3PR
      56, Cheltenham Road, London, SE15 3AR
  27. Calculating confidence

      Sector   Town             Count   Total   Confidence
      ...
      HD3 4    HUDDERSFIELD        66      66       87.71%
      ...
      DG8 6    NEWTON STEWART      11      12       65.69%
      DG8 6    STRANRAER            1      12        0.00%
      DG8 7    NEWTON STEWART       1       1        0.00%
      ...
      W3 6     LONDON             196     196       92.96%
      ...
      CH44 4   WALLASEY            23      29       76.06%
      CH44 4   WIRRAL               6      29        8.22%

      ● This postcode/town association is right but confidence is low because of the low count
      ● This postcode/town association is incorrect
      ● Another correct postcode/town association, but with a higher count
      ● This is what happens when post towns are re-organised; Wirral is now split into Birkenhead, Wallasey, Wirral and Prenton
      ● This is what a correct postcode/town association looks like
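The Count and Total columns on slide 27 can be reproduced by tallying how often each town appears with each postcode sector. The sketch below does only that: the slides do not give the formula behind the Confidence percentages, so no confidence value is computed here. The function names and the sample pairs are hypothetical, and the sector is assumed to be the outward code plus the first digit of the inward code, as in "HD3 4".

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def postcode_sector(postcode: str) -> str:
    """'GL50 3PR' -> 'GL50 3' (outward code plus first digit of the inward code)."""
    outward, inward = postcode.upper().split()
    return f"{outward} {inward[0]}"


def sector_town_counts(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str, int, int]]:
    """Return (sector, town, count, total) rows from (town, postcode) pairs."""
    counts: Counter = Counter()
    totals: Dict[str, int] = defaultdict(int)
    for town, postcode in pairs:
        sector = postcode_sector(postcode)
        counts[(sector, town.upper())] += 1  # occurrences of this town in this sector
        totals[sector] += 1                  # all addresses seen in this sector
    return [(s, t, c, totals[s]) for (s, t), c in sorted(counts.items())]


# Illustrative sample in the style of the addresses on slides 25 and 26.
sample = [
    ("Cheltenham", "GL50 3PR"),
    ("Cheltenham", "GL50 3PT"),
    ("Cheltenham Spa", "GL50 3PR"),
    ("London", "SE15 3AR"),
]
rows = sector_town_counts(sample)
```

A confidence score would then be some function of count and total that also penalises small counts, as the 1-out-of-1 row at 0.00% on the slide indicates.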
  28. Provenance
  29. Summary
      ● Built most of the supporting platform
        ○ parsing free text / messy addresses
        ○ collaborative loading of data
        ○ providing downloads, search & URL identity
        ○ recording provenance & assigning confidence
        ○ using inference to fill in gaps
      ● We have low numbers of addresses currently
        ○ but the right mechanisms to add more
        ○ and many potential partners
  30. What next?
      ● Building the platform
      ● Building the community of collaborators
      ● Building services to aid cross-subsidy
      ● Increasing quantity & quality of addresses
      ● Can anyone else reuse the technology?
      ● Can anyone else reuse the approach?
  31. Any Questions?
      @JeniT - jeni.tennison@openaddressesuk.org
      https://openaddressesuk.org
      info@openaddressesuk.org
      @openaddressesuk
  32. Open Addresses Ltd. is a new company being set up to create and maintain an address database for the UK that will be made available to the public as Open Data. It will facilitate the collaborative maintenance of the address database with various stakeholders from the UK Government, industry and non-profit.
      Offices Where?
