This document discusses a twinning project between the EU and Turkey aimed at improving data quality in public accounts. It covers topics such as data warehousing, business intelligence, dashboards, OLAP tools, data visualization techniques including maps, charts and infographics. Open data practices and linked data are also discussed. Tools mentioned include Tableau, Google Public Data Explorer, and VIDI modules for the Drupal content management system.
IT tools for statistics, visualization, open data
1. EU TWINNING PROJECT
TR 08 IB FI 02
“Improving Data Quality in Public Accounts”
Turkish title: AB Eşleştirme Projesi, “Kamu Hesaplarında Veri Kalitesinin Artırılması”
IT tools for statistics, visualization, open data
Carlo Vaccari (ISTAT / Formez)
Twinning Project “Improving data quality in public accounts”
2. Data Warehouse
Business Intelligence to analyze data
Business Intelligence processing operates on the Data Warehouse.
A Data Warehouse is a collection of data that supports decision making and has the following characteristics:
• subject-oriented (organized around the subjects of interest)
• integrated and consistent
• time-variant (representative of how data evolve over time)
• non-volatile (data are added and read, not changed in place)
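These characteristics can be seen in a tiny star schema. The sketch below uses Python's built-in sqlite3 module only as a stand-in for a real warehouse engine; the table names (dim_time, dim_entity, fact_payments) and the amounts are hypothetical. Fact rows are only appended (non-volatile), each carries a time key (time-variant), and queries are organized around a subject, here payments by entity type.

```python
import sqlite3

def build_warehouse() -> sqlite3.Connection:
    """Create a minimal star schema: one fact table and two dimension tables."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE dim_time   (time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        CREATE TABLE dim_entity (entity_id INTEGER PRIMARY KEY, name TEXT, entity_type TEXT);
        -- Fact rows point to the dimensions; they are appended, never updated.
        CREATE TABLE fact_payments (
            time_id   INTEGER REFERENCES dim_time(time_id),
            entity_id INTEGER REFERENCES dim_entity(entity_id),
            amount    REAL
        );
    """)
    return con

def load_snapshot(con, time_row, facts):
    """Append one period of data; earlier periods are left untouched."""
    con.execute("INSERT OR IGNORE INTO dim_time VALUES (?, ?, ?)", time_row)
    con.executemany("INSERT INTO fact_payments VALUES (?, ?, ?)",
                    [(time_row[0], entity_id, amount) for entity_id, amount in facts])

def totals_by_year_and_type(con):
    """Subject-oriented query: total payments per year and entity type."""
    return con.execute("""
        SELECT t.year, e.entity_type, SUM(f.amount)
        FROM fact_payments f
        JOIN dim_time t   ON t.time_id = f.time_id
        JOIN dim_entity e ON e.entity_id = f.entity_id
        GROUP BY t.year, e.entity_type
        ORDER BY t.year, e.entity_type
    """).fetchall()

if __name__ == "__main__":
    con = build_warehouse()
    con.executemany("INSERT INTO dim_entity VALUES (?, ?, ?)",
                    [(1, "Entity A", "municipality"), (2, "Entity B", "province")])
    # Illustrative amounts only, not real data.
    load_snapshot(con, (201001, 2010, 1), [(1, 100.0), (2, 50.0)])
    load_snapshot(con, (201101, 2011, 1), [(1, 120.0), (2, 40.0)])
    print(totals_by_year_and_type(con))
```

A production warehouse would add surrogate keys, slowly changing dimensions and indexes; the point here is only the shape of the schema.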
3. Data Warehouse tools
Current transactional procedures → Operational data → (from operational data to data warehouse) → Data Warehouse
Tools working on the Data Warehouse:
Dashboards
Data Mining
Advanced OLAP tools
Reporting
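The step from operational data to the Data Warehouse is usually an ETL (extract, transform, load) process, and it is a natural place for data quality checks. A minimal sketch, assuming operational records arrive as dictionaries with hypothetical fields `entity`, `date` (ISO format) and `amount`; rows that fail basic checks are set aside instead of being loaded.

```python
from datetime import date

def extract(source_rows):
    """Extract: read raw operational records (here an in-memory iterable)."""
    return list(source_rows)

def transform(rows):
    """Transform: standardize valid rows and set aside rejected ones."""
    clean, rejected = [], []
    for row in rows:
        try:
            day = date.fromisoformat(row["date"])  # expects YYYY-MM-DD
            amount = float(row["amount"])
        except (KeyError, TypeError, ValueError):
            rejected.append(row)                    # missing or unparseable field
            continue
        # Collapse repeated spaces and normalize case so the same entity matches.
        entity = " ".join(str(row.get("entity") or "").split()).upper()
        if not entity:
            rejected.append(row)
            continue
        clean.append({"entity": entity, "year": day.year,
                      "month": day.month, "amount": amount})
    return clean, rejected

def load(warehouse, rows):
    """Load: append cleaned rows; a list stands in for a warehouse table."""
    warehouse.extend(rows)
    return len(rows)

if __name__ == "__main__":
    raw = [
        {"entity": "  entity  a ", "date": "2010-03-31", "amount": "1500.50"},
        {"entity": "Entity B", "date": "31/03/2010", "amount": "10"},
        {"entity": "", "date": "2010-04-30", "amount": "5"},
    ]
    warehouse = []
    clean, rejected = transform(extract(raw))
    print(load(warehouse, clean), "loaded,", len(rejected), "rejected")
```

Keeping rejected rows, rather than silently dropping them, lets the data owners see and correct the errors at the source.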
4. Dashboard
Dashboard: data visualization tool that displays the current status of
metrics and key performance indicators (KPIs) for an enterprise.
Dashboards consolidate and arrange numbers, metrics and
sometimes performance scorecards on a single screen.
Various kinds of dashboards:
“Business Dashboards” – business-related dashboards
“Executive Dashboards” – dashboards meant to be used by CEOs, managers, etc.
“Operational Dashboards” – dashboards that monitor day-to-day activity
Dashboards are designed to help us monitor what is going on at a glance.
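At its core, a dashboard compares the current value of each KPI with a target and shows a status at a glance. A minimal text-only sketch; the KPI names, targets and the 10% warning band are hypothetical choices, not taken from any specific dashboard product.

```python
from dataclasses import dataclass

@dataclass
class Kpi:
    name: str
    value: float
    target: float
    higher_is_better: bool = True

def status(kpi: Kpi, warning_band: float = 0.10) -> str:
    """GREEN if the target is met, AMBER if within the warning band, RED otherwise.

    Assumes value and target are positive numbers.
    """
    ratio = kpi.value / kpi.target if kpi.higher_is_better else kpi.target / kpi.value
    if ratio >= 1.0:
        return "GREEN"
    if ratio >= 1.0 - warning_band:
        return "AMBER"
    return "RED"

def render(kpis) -> str:
    """Lay out all KPIs on a single 'screen' (here, a block of text)."""
    rows = [f"{'KPI':<30}{'Value':>10}{'Target':>10}  Status"]
    for k in kpis:
        rows.append(f"{k.name:<30}{k.value:>10.1f}{k.target:>10.1f}  {status(k)}")
    return "\n".join(rows)

if __name__ == "__main__":
    # Hypothetical KPIs for illustration only.
    print(render([
        Kpi("Statements received (%)", 95.0, 100.0),
        Kpi("Days to publish accounts", 20.0, 25.0, higher_is_better=False),
        Kpi("Records passing checks (%)", 80.0, 98.0),
    ]))
```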
5. Dashboard
6. Dashboard
7. OLAP
OnLine Analytical Processing: decision support software that allows the
user to quickly analyze information that has been summarized into
multidimensional views and hierarchies
OLAP tools are used to perform trend analysis on financial information
Multidimensional data
Many operators
Complex, non-predefined (ad hoc) analysis
Data:
- not operational
- current and historical
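Typical OLAP operators (roll-up, slice, dice) can be illustrated on a tiny in-memory cube. A minimal sketch in plain Python; the dimensions (year, region, item) and the figures are hypothetical, and real OLAP servers precompute and index such aggregates instead of scanning every row.

```python
from collections import defaultdict

# Each fact has three dimensions and one measure. Illustrative values only.
FACTS = [
    {"year": 2010, "region": "North", "item": "Revenue", "amount": 100.0},
    {"year": 2010, "region": "South", "item": "Revenue", "amount": 80.0},
    {"year": 2010, "region": "North", "item": "Expenditure", "amount": 90.0},
    {"year": 2011, "region": "North", "item": "Revenue", "amount": 110.0},
    {"year": 2011, "region": "South", "item": "Expenditure", "amount": 70.0},
]

def roll_up(facts, *dims):
    """Sum the measure over every dimension not listed in dims."""
    totals = defaultdict(float)
    for f in facts:
        totals[tuple(f[d] for d in dims)] += f["amount"]
    return dict(totals)

def slice_(facts, dim, value):
    """Fix one dimension to a single value (e.g. item == 'Revenue')."""
    return [f for f in facts if f[dim] == value]

def dice(facts, **allowed):
    """Keep facts whose dimension values fall within the given sets."""
    return [f for f in facts if all(f[d] in vals for d, vals in allowed.items())]

if __name__ == "__main__":
    print(roll_up(FACTS, "year"))
    print(roll_up(slice_(FACTS, "item", "Revenue"), "region"))
```

Drill-down is the inverse of roll-up: calling `roll_up(FACTS, "year", "region")` instead of `roll_up(FACTS, "year")` exposes the finer level.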
8. An OLAP implementation with BusinessObjects (BO): Italian case
[Figure: design of the OLAP implementation, shown as an E/R outline, a DFM Fact outline and a DFM Functionality outline. The model covers public bodies (Ente, Tipo Ente), population size bands (Fascia di Dimensione), the Istat survey year (Anno rilevazione Istat), cash statements (Prospetto Cassa) and PSI statements (Prospetto PSI) with their items (Voce Cassa, Voce PSI, Voce Istat, Voce Patto), and detected anomalies (Tipo Anomalia, Anagrafica Anomalia, Dettaglio Anomalia). Project phases: Business Rules, Analysis, Development, BO Universe, BO Report. Data flow: ETL → EDW → ETL → Data Mart; the Data Mart physical schema lists tables such as D_DMA_DAEN_ANAGRAFICA_ENTE, D_DMA_FMUE_MOV_USCITE_ENTRATE and DMA_DC00_TEMPO.]
9. Tools
Examples of tools for data management and Business Intelligence
(open-source applications)
Google Refine http://code.google.com/p/google-refine/
Open-source Business Intelligence tools:
- http://www.pentaho.com/
- http://www.jaspersoft.com/
- http://www.palo.net/
Free software, with paid support available
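Google Refine (since renamed OpenRefine) helps clean messy values by clustering different spellings that probably refer to the same thing. The sketch below is a simplified reimplementation of the idea behind its "fingerprint" key-collision method, not Refine's own code; the sample values are illustrative.

```python
import re
import unicodedata
from collections import defaultdict

def fingerprint(value: str) -> str:
    """Grouping key that ignores accents, case, punctuation and token order."""
    text = unicodedata.normalize("NFKD", value)
    text = text.encode("ascii", "ignore").decode("ascii")  # drop accents (simplified)
    text = re.sub(r"[^\w\s]", "", text.strip().lower())     # remove punctuation
    tokens = sorted(set(text.split()))                       # dedupe and sort tokens
    return " ".join(tokens)

def cluster(values):
    """Group values sharing a fingerprint; keep groups with 2+ distinct spellings."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]

if __name__ == "__main__":
    print(cluster(["Ministry of Finance", "finance, ministry of",
                   "Ministry  of Finance ", "Court of Accounts"]))
```

Each cluster is a candidate for merging into one canonical value; a person should still confirm the merge, since different entities can share a fingerprint.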
10. Visualization techniques
Visualization techniques (cartography, advanced visualization tools)
Mind maps
Displaying news
Displaying data
Displaying connections
Displaying websites
Articles & resources
Tools and services
Tableau http://www.tableausoftware.com/public/community
CACS http://www.cacs.org/post/index
12. Visualization techniques
Google Public Data Explorer:
a simple way to start presenting data with advanced visualization techniques is Google Public Data Explorer (GPDE, http://www.google.com/publicdata/home), a tool with which any organization can publish its data on the Web so that users can find, explore, and share it.
GPDE can be adopted in two steps:
1 - MoF (Ministry of Finance) can start testing the tool by uploading datasets for visualization and exploration by privileged users
2 - in a second phase, MoF can agree with Google on the formal inclusion of its data in the Dataset Directory http://www.google.com/publicdata/directory
Many organizations have chosen this way of publishing (often in addition to their own website), among them the World Bank, IMF, OECD, Eurostat, etc.
13. Visualization techniques
VIDI
The VIDI suite is a set of modules for Drupal (an open-source CMS) designed to enable the creation of visual data displays. Using VIDI tools you can display changes in data values over time, relate data in various ways to geographical maps, or display static datasets through different types of charts.
Two ways to use it:
1. Use VIDI on the Dataviz website http://www.dataviz.org: load your data, choose among the available visualizations and store your visualization
2. Download the VIDI modules and install them in your Drupal website, then import your datasets and prepare data displays
See http://www.patchworknation.org/
14. Visualization techniques
Future: HTML5, the new standard for the Web
Some demos: http://www.apple.com/html5/
Some effects from http://slides.html5rocks.com
15. Open Data
Open Data:
- data freely available to everyone
- elementary (raw) data
- free to use and republish
- without restrictions from copyright or patents
Best practice: World Bank http://data.worldbank.org/
Data by country, by topic or by indicator (1000+ indicators)
All indicators are available as tables, maps and graphs, and downloadable as XLS and XML
The WB website also offers modules to access WB data directly from the Stata and R statistical tools
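Besides the Stata and R modules, World Bank indicator data can be retrieved as JSON from its public Indicators API (version 2). A minimal sketch with the standard library; the indicator code NY.GDP.MKTP.CD (GDP, current US$) is just an example, and the parser assumes the API's usual two-element response, paging metadata followed by a list of observations. The test payload is synthetic, not real data.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.worldbank.org/v2"

def indicator_url(country: str, indicator: str, start: int, end: int) -> str:
    """Build an API v2 query for one country and one indicator over a year range."""
    query = urlencode({"format": "json", "date": f"{start}:{end}", "per_page": 1000},
                      safe=":")
    return f"{API_BASE}/country/{country}/indicator/{indicator}?{query}"

def parse_observations(payload) -> dict:
    """Turn a [metadata, observations] response into {year: value}, skipping gaps."""
    if not isinstance(payload, list) or len(payload) < 2 or payload[1] is None:
        return {}  # error responses contain only a message element
    return {int(obs["date"]): obs["value"]
            for obs in payload[1] if obs["value"] is not None}

def fetch(country: str, indicator: str, start: int, end: int) -> dict:
    """Download and parse one series (requires network access)."""
    with urlopen(indicator_url(country, indicator, start, end)) as resp:
        return parse_observations(json.load(resp))
```

For example, `fetch("TUR", "NY.GDP.MKTP.CD", 2005, 2010)` would return a year-to-value mapping for that series.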
16. Open Data
Tim Berners-Lee: Linked Data rated with gold stars, like the ones you got in school.
1 - make your stuff available on the web (whatever format)
2 - make it available as structured data (e.g. Excel instead of a scanned image)
3 - use a non-proprietary format (e.g. CSV instead of XLS)
4 - use URLs to identify things, so that people can point at your stuff
5 - link your data to other people’s data to provide context
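The stars are cumulative: each one presupposes the previous ones. A small sketch that scores a dataset description; the field names are hypothetical, and the classification of formats as "structured" or "proprietary" is a simplified assumption.

```python
from dataclasses import dataclass

# Simplified, assumed classification of common formats.
STRUCTURED = {"xls", "xlsx", "csv", "json", "xml", "rdf"}
PROPRIETARY = {"xls", "xlsx"}

@dataclass
class Dataset:
    on_web: bool
    fmt: str               # e.g. "pdf", "xls", "csv"
    uses_uris: bool        # things are identified by URLs
    links_to_others: bool  # links to other people's data

def stars(d: Dataset) -> int:
    """Count stars, stopping at the first requirement that is not met."""
    fmt = d.fmt.lower()
    checks = [
        d.on_web,                                      # 1: on the web
        fmt in STRUCTURED,                             # 2: structured data
        fmt in STRUCTURED and fmt not in PROPRIETARY,  # 3: non-proprietary
        d.uses_uris,                                   # 4: URLs identify things
        d.links_to_others,                             # 5: linked to other data
    ]
    score = 0
    for ok in checks:
        if not ok:
            break
        score += 1
    return score
```

For instance, a scanned PDF on the web earns one star, while the same figures published as CSV earn three.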
17. Linked Data
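In Linked Data each thing gets a URI, and statements about it are published as RDF triples that can point to other people's data. A minimal sketch that writes N-Triples by hand; the http://example.org/ namespace is hypothetical, while rdfs:label, owl:sameAs and the DBpedia resource URI are real identifiers used only as an example of an outbound link.

```python
BASE = "http://example.org/publicaccounts/"  # hypothetical namespace
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def iri(value: str) -> str:
    """Write an IRI in N-Triples syntax."""
    return f"<{value}>"

def literal(value: str) -> str:
    """Write a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

def triple(s: str, p: str, o: str) -> str:
    """One N-Triples statement: subject, predicate, object, final dot."""
    return f"{s} {p} {o} ."

def describe_country(code: str, label: str, dbpedia_name: str) -> list:
    """Describe a country with a label and a link to an external dataset."""
    subject = iri(BASE + "country/" + code)
    return [
        triple(subject, iri(RDFS_LABEL), literal(label)),
        # The link to someone else's data is what earns the fifth star.
        triple(subject, iri(OWL_SAME_AS), iri("http://dbpedia.org/resource/" + dbpedia_name)),
    ]

if __name__ == "__main__":
    print("\n".join(describe_country("TR", "Turkey", "Turkey")))
```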
18. Open Data tools: CKAN
CKAN stands for Comprehensive Knowledge Archive Network
Developed by the Open Knowledge Foundation (OKFN)
Open-source package that makes data accessible by providing tools to streamline publishing, sharing, finding and using data.
CKAN is aimed at data publishers (national and regional governments, companies and organizations) wanting to make their data open and available
Used by many central governments (e.g. Denmark, Norway, UK) and local governments
Features:
Publish & Find Datasets (import, keywords, versioning)
Store & Manage Data (raw data, metadata, statistics, geospatial data)
Engage with Users & Others (community management)
Customize & Extend (APIs, extensions, open source)
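CKAN exposes its functions through a web API (the Action API, version 3). A minimal sketch of a dataset search using only the standard library; the portal address in the example is a placeholder to replace with a real CKAN site, and the test relies on a synthetic response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def search_url(site: str, query: str, rows: int = 10) -> str:
    """Build a package_search call for a CKAN portal (Action API v3)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{site.rstrip('/')}/api/3/action/package_search?{params}"

def parse_search(response: dict) -> list:
    """Return (name, title) pairs from a package_search response."""
    if not response.get("success"):
        raise RuntimeError("CKAN request failed")
    return [(pkg["name"], pkg.get("title", "")) for pkg in response["result"]["results"]]

def search(site: str, query: str, rows: int = 10) -> list:
    """Run the search against a live portal (requires network access)."""
    with urlopen(search_url(site, query, rows)) as resp:
        return parse_search(json.load(resp))
```

For example, `search("https://ckan.example.org", "budget")` would list matching datasets on that (placeholder) portal.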
19. Open Data tools: data.gov / Drupal
Drupal is an open-source CMS (Content Management System) often used in Open Data projects
The Data.gov code has been released as OSS (a modified version of Drupal) and has also been used for India → Open Government Platform
Drupal Open Data working group
http://groups.drupal.org/opendata-working-group
Data Journalism:
http://www.guardian.co.uk/world/datablog/2010/feb/01/united-nations-population-world-data