International Marketing and Output Database Conference
Blarney, Cork, 24th –28th September 2007




                        What's a City Transport System
                                Got to Do With
                    Publishing Data in an Output Database?

                                         Katja Šnuderl
                        Statistical Office of the Republic of Slovenia
                                     katja.snuderl@gov.si



Abstract

It is easy to compare a city transport system to the process of publishing statistical
data on a statistical website. There are completely unorganized systems, where
everyone drives to work in their own cars, takes whatever route is most convenient at
the time and expects to park as close as possible to their destinations. This is similar
to those systems in which there are no rules as to how, when, where and in what
form data are published. There are several reasons why neither such a transport
system nor such a statistical output database is preferable.

Conversely, there are completely organized systems, where all of the commuters use
a public transportation system designed to their needs. Users adjust to the various
schedules and transportation availability in order to reach their goals. This
corresponds to a metadata-driven system where a well organized metadata repository
runs data publishing through a pre-defined process based on integrated databases
and templates.

This article focuses on work done and lessons learned during a project of upgrading
the Slovenian statistical output database from a file server to a macro database.


Context

Following the general trend of making statistical data available on the web, the
Statistical Office of the Republic of Slovenia (Statistics Slovenia) decided to build an
output database. The first databases (Agriculture Census and Population Census), built
in 2003, were based on the PC-Axis file format and tools. As the concept proved to be
efficient, Statistics Slovenia decided to migrate all of its dissemination to the
output database. The dilemma of choosing either a file server system or an SQL
macro model was always present, until some of the largest tables hit the technical
limitations of the file server system. Within a new project in the field of External
Trade a new PC-Axis SQL macro database was built. Having experience with both
systems and with migrating from one to the other helped in identifying a metaphor
that can help "non-IT people" understand the differences between table and database
management.
Katja Šnuderl: What's a City Transport System Got to Do With Publishing Data in an Output Database?




    1. Introduction

It is all about people. There is no IT solution being run by machines for machines.
Each is created by people, maintained by people and used by people. Therefore, when
building an output database it is important to understand how the human mind
works.

Somehow we seem to believe that everything that looks simple is simple. But in
reality, making a simple application, where a user can understand the features easily
and learn just by doing, takes thorough analysis of users' needs, their behaviour,
technical possibilities and an exacting decision process. It takes less work to make
something that looks complicated and is difficult to use.

In terms of a transport system we could say that good transport networks don't just
happen. It takes a lot of effort to turn a chaotic situation into a well run public
service. Good route maps and schedules are based on user needs analysis and
technical possibilities. They evolve for years.

Basic preconditions for a successful project are sharing information (among all
participants and cooperating parties in the project), understanding the project goal
and decisions, and (management) support. No support is possible without
understanding the problems. The comparison of building an output database with a
transport system can sometimes help us explain the basics of standardization and
change to someone who sees building a database purely as an IT matter.
Management can support our needs even without understanding IT matters, if we
know how to explain them in an understandable way. Since transport is something
most people know and use, it makes a useful comparison.


    2. "Keep it as it is, we're fine"

There is always a problem when a system
changes. The new one doesn't always
support all the options the old one had.
Many people ask why a system that runs
well should be changed at all, but if this
view had always been respected we would
still be using carriages.

The project on External Trade was built in order to replace dissemination of data in
the Statistical Databank, an older instance of the output database. The Statistical
Databank had a lot of regular users who extracted data monthly. However, only one
kind of extraction was possible: one flow (exports or imports) for one time period by
tariff codes (for one country or total) or by countries (for one tariff code or total). In
the new database users can combine flows, several time periods, many tariff codes
and many countries. The output table always has a multidimensional structure and
also shows empty cells: if a user selects a country with no flows, the country is
listed in the table with the appropriate statistical sign.
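
The multidimensional output described above can be illustrated with a minimal sketch. It builds the full cross-product of the selected flows, periods and countries and marks cells without data with a placeholder sign. The sample figures, the name build_table and the choice of "-" as the sign are assumptions made for illustration only; they are not taken from the Statistics Slovenia database.

```python
from itertools import product

# Hypothetical sample data: (flow, period, country) -> value.
DATA = {
    ("exports", "2007M01", "AT"): 120.5,
    ("imports", "2007M01", "AT"): 98.0,
    ("exports", "2007M02", "AT"): 131.2,
}

NO_DATA_SIGN = "-"  # assumed placeholder for a cell with no flow


def build_table(data, flows, periods, countries):
    """Return one row per combination of the selected dimension values.

    Combinations without data are kept and marked with NO_DATA_SIGN,
    so a selected country never silently disappears from the output.
    """
    rows = []
    for flow, period, country in product(flows, periods, countries):
        value = data.get((flow, period, country), NO_DATA_SIGN)
        rows.append((flow, period, country, value))
    return rows


table = build_table(DATA, ["exports", "imports"], ["2007M01"], ["AT", "HR"])
for row in table:
    print(row)
```

In this sketch the country "HR" has no flows in the sample data, yet it still appears in both the exports and the imports rows, each carrying the placeholder sign.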







Regular users, who had been adjusting to the old database for years, had many problems
and special requests when the new system was introduced. We had to raise the
selection size limit in the first week, and we introduced new functions to filter data
according to existing data flows. Luckily all users will benefit from the new functions,
though not all parameters of the previous output were met.


    3. "Don't just replace cars with buses"

We often say that there is no IT solution that
could change a process by itself. Changing
only the technical part of the process is
similar to giving people buses instead of cars.
Without changing anything else, people
would probably start driving one bus each to
the workplace. A project manager should
take care to prevent new tools being used
in old and obsolete ways. At the same time it is
essential to know that users have to adjust to
new tools at different levels and not all of
them can be the "drivers".

At Statistics Slovenia we chose a step-by-step approach when building the output
database. The first stage was building the file server, where procedures and tools are
easiest to understand for statisticians who were used to preparing tables in
spreadsheets. The first tables were always prepared by the support team in order to
meet all the general rules. The first examples also helped statisticians understand the
multidimensional table structure. At the beginning we always took what was available
and tried to create a comprehensive multidimensional table from existing tabulations
(published tables). In the last year a major step was made when we introduced new
tabulation rules based on our experiences. The new rules introduce a clear
multidimensional structure, where the statistician only defines the content of the
table. The programming unit then prepares a new tabulation with the available tool
(from the view of the source or the responsible person) by the general rules of
tabulation for the PC-Axis database. The main result of the whole exercise is a better
understanding of the multidimensional table structure by the statisticians and the
programming unit. However, to prepare these tables statisticians had to learn and use
new tools for table management. They have to update existing tables with new time
periods themselves.

When building the new macro database, the next step was taken. Here statisticians
only deal with content definition and don't manage the tables in any way. Once the
data for the new time period are ready, the support unit pulls data into the macro
database. The statistician can make the final check whether data and metadata are
ready to be published. The procedure of pulling data is manual for now and will be
automated when it is stable. At the early stage we prefer to do it manually in order to
learn how the automated process should run.








    4. Transport logistics is complex

In reality nobody expects a city tram system to cover all the
areas of the city. Transport modules (trains, trams, metro,
buses, cars, etc.) are differentiated but at the same time
integrated and can be used successively. In the same way a
good IT system should be developed in modules - coherent,
integrated and supporting each other.

When building our new output database a decision was made that new applications
shouldn't depend on any other system within Statistics Slovenia. It was understood
that the dissemination "module" will be integrated with the metadata system, but
only at a later stage. Working the other way round could well slow down the project or
even cause failure. For classifications we decided to pull them from the classification
server and maybe at a later stage use direct views. But, as not all classifications are
always prepared in the server, a backup option to be able to import classifications as
TXT files was introduced.
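
The fallback described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation: the classification server is stood in for by a plain dictionary, the TXT layout is assumed to be one tab-separated "code, label" pair per line, and the names load_classification and parse_txt are hypothetical.

```python
import csv
import io

# Stand-in for the classification server: name -> {code: label}.
SERVER = {"countries": {"AT": "Austria", "HR": "Croatia"}}


def parse_txt(text):
    """Parse tab-separated 'code<TAB>label' lines into a dict."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return {row[0].strip(): row[1].strip() for row in reader if len(row) >= 2}


def load_classification(name, server=SERVER, txt_fallback=None):
    """Prefer the server; fall back to TXT content if it is not there yet."""
    if name in server:
        return dict(server[name]), "server"
    if txt_fallback is not None:
        return parse_txt(txt_fallback), "txt"
    raise LookupError(f"classification {name!r} is not available")


# "flows" is not on the stand-in server, so the TXT content is used.
codes, source = load_classification(
    "flows", txt_fallback="1\tExports\n2\tImports\n"
)
print(source, codes)
```

Returning the source alongside the codes makes it easy to list which tables still rely on imported classifications.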

A similar solution was introduced for importing data into the output database. We
expect all data to be available in micro or macro databases eventually. Currently at
Statistics Slovenia we still maintain a variety of data sources. Input tables are
created from relational databases, flat files and Excel spreadsheets. Tools for
tabulation are varied, from SQL queries and views to Cobol, TPL, SAS and Excel
tabulations. We even prepared a simple converter that turns TXT files from TPL
into the correct CSV structure. So even though the project was run on data
for External Trade (available in an Oracle database), procedures to import data from
other SQL databases or CSV files or even existing PC-Axis files were developed.

Having the old output database (file server) and building the new one at the same
time brought us the luxury of having an option to keep them both. Our strategy is to
eventually migrate all data to the SQL Macro Database, but there is no need to do it
before input data sources are consolidated. For now both systems will be supported
and integrated.

Another aspect of the coexistence of transport systems is the image of simplicity. When a
system runs smoothly and is easy to use, usually a lot of effort has gone into
integrating and coordinating different modules. Intuitive tools are based on a lot of
experience, selection of needs and testing. On the other hand, a system that looks
complicated and is difficult to use is very easy to develop. You simply respect all needs
and make no selection. In the process of preparing the specifications of the output
database project a lot of emphasis was given to the expected outcome, especially
with the end-user solution (web interface to view the data) in order to make it
intuitive and easy to use. Unfortunately less experience was available when
building the database management application, so the tool turned out to be rather
complicated to use.








    5. "Let the grass grow, please!"

Allowing exceptions to rules is similar to
building parking places where people tend
to park on the grass. In the end everything is a
parking place and the chaos remains.
There is no green colour left to calm the
nervous drivers down.

Already in our first output database some general rules were introduced. We had the
file naming convention (unique file names within the whole system), corporate
metadata, common classifications and some standard links (to methodological
explanations, the release calendar and questionnaires). But in a file server it is
difficult to check whether each and every file complies with the rules. As this was
done manually, not all exceptions were noticed and some were even agreed upon.
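
A rule such as "unique file names within the whole system" lends itself to an automated check. The sketch below is a minimal illustration: the file listing is invented, and treating names as case-insensitive is an assumption rather than a documented rule of the Statistics Slovenia file server.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_duplicate_names(paths):
    """Group paths by file name (case-insensitive) and return the clashes."""
    by_name = defaultdict(list)
    for path in paths:
        by_name[PurePosixPath(path).name.lower()].append(path)
    return {name: found for name, found in by_name.items() if len(found) > 1}


# Hypothetical file server listing.
files = [
    "agriculture/1501101E.px",
    "population/0501101E.px",
    "archive/1501101e.px",
]
duplicates = find_duplicate_names(files)
print(duplicates)
```

Run regularly, a check of this kind would report exceptions that a manual review can miss.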

On the other hand, when we built the macro database we formed some very strict
rules. For example, all classifications in use have to be maintained in the classification
server. Even though there is an alternative to import classifications, all tables with
exceptions will be maintained in the file server. This decision is based on workload
balancing: in the macro database the management of metadata is done by the
support unit. If statisticians insist on maintaining an exception to the rule, they have
to manage the table themselves. They can only do that within the file server. Even in
the long run we don't plan to transfer management of the metadata from the support
unit to the statisticians.


    6. Why bother with anything else than a taxi?

In some big cities around the world people don't use public transportation but the taxi
service. There is no need to worry about schedules or to learn which route and
which number to take. In output database management terms there can be a support
unit that manages all the
dissemination of statistical data.
Statisticians are only involved in
managing the statistical process up
to dissemination. They don't have to
learn or use any new tools to
prepare data for dissemination.

Statistics Slovenia is relatively small. The output database support unit grew to 5
members who work on regular production and development in parallel. Therefore the
process of producing files to be published was organized within the subject-matter
units from the very beginning. One of the arguments for such a decision was also
knowledge, as only statisticians knew the content of a statistical survey and could
define expected outputs. But through managing the file server the support unit
also gained experience and knowledge. While building the new macro
database we wondered whether there was any need to put technical burdens on the
content managers. We decided not to do so at the start, so all technical matters are
handled within the output database unit.







It is always a matter of balancing – if all management is given to subject-matter
units, it is not very probable that the coherence principles would be met. If all
management is centralized, subject-matter units could oppose solutions that don't
support their special requirements. So it is important to set some clear rules and
introduce validation tools that support these rules on one hand, and balance
management between the content managers and output database team on the other.




    7. "Lost"

Not many people get lost in the Paris Metro network. At
every station it is easy to find maps and information on
where to exit and how to continue in the right
direction. But in another country it is fairly easy to miss the
train station in The Hague and end up in Rotterdam.

As an output database includes more and more data, it
also grows larger and larger. It is important to build a
navigation system that helps users easily navigate within the database. This applies
both to entering the database to find the data and to finding the way back later.

The first challenge is how to build an efficient way to find the data. The new output
database at Statistics Slovenia offers several options. One is browsing through the
content tree from the starting page of the database. There all subjects are available
and users have to open the content tree and check whether the table titles seem to
match their needs. This option is available without additional maintenance of
metadata, just using the database content definitions. In addition to the entry page,
we've introduced an option to open the content tree at any level within a subject
area. For this purpose we use content identification numbers, unique and
standardized among different dissemination products. For example, on our website
every theme (e.g. Prices) has an ID number. Opening the database content tree with
the same ID number opens only items within the same theme (Prices). Identifications
go down to a single table. When the content tree opens partially, the current location
is read from the database and written in the header section.
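
Opening the content tree at a given ID can be sketched as a search that returns both the matching subtree and the path leading to it, which could then be shown as the current location in the header. The tree layout, the ID values and the name find_subtree are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical content tree: each node has an id, a title and children.
TREE = {
    "id": "0", "title": "All subjects", "children": [
        {"id": "04", "title": "Prices", "children": [
            {"id": "0410", "title": "Consumer prices", "children": []},
            {"id": "0420", "title": "Producer prices", "children": []},
        ]},
        {"id": "24", "title": "External trade", "children": []},
    ],
}


def find_subtree(node, content_id, path=()):
    """Depth-first search; return (subtree, path of titles) or None."""
    path = path + (node["title"],)
    if node["id"] == content_id:
        return node, path
    for child in node["children"]:
        found = find_subtree(child, content_id, path)
        if found is not None:
            return found
    return None


subtree, location = find_subtree(TREE, "04")
print(" > ".join(location))
print([child["title"] for child in subtree["children"]])
```

Because the same ID is assumed to be used on the website and in the database, a link from a theme page only needs to pass that ID to open the matching part of the tree.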

In the next step we will add an option to search for tables. We plan to introduce a
keyword search, where a pre-defined list of keywords will be prepared and linked to
the tables. Users will only be able to select words from the list. The words will be
suggested while typing the letters. The list of keywords will be maintained regularly in
order to support users' needs.
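
A keyword search restricted to a pre-defined list, with suggestions offered while typing, could work along the following lines. This is only a sketch of the planned behaviour: the keywords, the table identifiers and the names suggest and tables_for are invented for illustration.

```python
from bisect import bisect_left

# Hypothetical pre-defined keyword list linked to table identifiers.
KEYWORDS = {
    "earnings": ["T-0701"],
    "exports": ["T-2401", "T-2402"],
    "external trade": ["T-2401"],
    "imports": ["T-2401", "T-2403"],
    "tourism": ["T-2101"],
}
SORTED_KEYWORDS = sorted(KEYWORDS)


def suggest(prefix, limit=5):
    """Return up to `limit` keywords from the list that start with `prefix`."""
    prefix = prefix.strip().lower()
    if not prefix:
        return []
    # In a sorted list all words sharing a prefix are adjacent.
    start = bisect_left(SORTED_KEYWORDS, prefix)
    matches = []
    for word in SORTED_KEYWORDS[start:]:
        if not word.startswith(prefix) or len(matches) == limit:
            break
        matches.append(word)
    return matches


def tables_for(keyword):
    """Return the tables linked to a keyword selected from the list."""
    return KEYWORDS.get(keyword, [])


print(suggest("ex"))
print(tables_for("imports"))
```

Since users can only pick words from the list, a search never ends without a result, and the list itself remains the single place to maintain.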

When users select data from a table in the database, they are often interested in
continuing work on other tables from the same content. To support such requests, we
introduced a command "List of tables" in the menu bar, which opens the content tree
for the same content.








    8. "Shinkansen or a good old tram?"

We are still far from the Shinkansen and the Japanese transportation system. Actually, in
Slovenia one can experience that it is not enough to replace old trains with new ones
that can reach speeds of up to 200 km/h. In some places they have to slow down to 50
km/h or less, otherwise the tracks would collapse. Or you get stuck at a train station
because nobody knows how to unlock a secured carriage, and after half an hour of
trying and thinking they have to move people and uncouple the carriage so the train
can proceed.

So, what we have done for now is to limit parking places for cars within the city and
introduce many bus routes, one intercity train route ending in the suburbs and one tram
route from the suburbs to the centre. The system might not be the most modern, but it has
proven to be reliable. In reality we reduced the number of published Excel
spreadsheets in favour of multidimensional tables, introduced standard procedures for
tabulation of multidimensional tables, included the classification server in the
dissemination process and built a macro database for data on External Trade. The
next "tram" routes will be prepared for Earnings and Tourism Statistics.

After deciding to maintain both systems (the file server and the macro database) our
main goal was to integrate them without burdening users when they search
for data. The basic principles are:
          a)     Single entry point
          b)     Same "Look and feel"
          c)     Same functions + advanced options in the macro database
          d)     Same support (header menus)
          e)     Single registration for advanced users (option to save queries).

A lot of effort was put into coherent design of the two systems, adjusted to the design
of the Statistics Slovenia website. The only connecting point of the two databases is
the content tree view, the entry page of the database. From there users are
redirected either to a table in the file server database or in the macro database. In
the tree view there are also links to related content: First Releases, methodological
explanations, statistical questionnaires, special publications, links to external websites
(data on websites of other governmental bodies) and links to the Eurostat database.

To view or download, a data user can select any values from the table, switch between
text and code presentation of values, pivot the table, view selection-specific footnotes,
change the presentation of decimals, display data in a graph or on a map and download
data in several formats. Advanced features in the macro database support selection and
filtering of hierarchical variables by levels, removing empty lines, sorting and a better
structured presentation of footnotes.

With the new database structure we are also introducing pre-defined tables, where
less experienced users can look at data just by clicking the table title. The content of
pre-defined tables was defined by each theme editor.








    9. Conclusion

Our habits differ in different societies. Not every country has as many problems with
transport systems as Slovenia. But still, there are some basic principles that everyone
understands and that can be used when explaining the principles of building a new IT
solution to a non-IT person.

During the project of building the new dissemination macro database our main goal
was to build a system that will support different contents, different input data formats
and diverse users. From the start we have been careful about standardisation,
coherence and process management. We are building on our experience with the file
server database. At the same time we are trying to meet most users' needs.

Statistics is produced by people for people and our role in this process is to make it
accessible, reliable and understandable.




                                                       -8-

Más contenido relacionado

Destacado

your PR Consultant
your PR Consultantyour PR Consultant
your PR Consultantmistertipr
 
Seminar Nasional Asbanda 18 Juli 2014
Seminar Nasional Asbanda 18 Juli 2014 Seminar Nasional Asbanda 18 Juli 2014
Seminar Nasional Asbanda 18 Juli 2014 mistertipr
 
Kiat Mengendalikan Pertanyaan Wartawan
Kiat Mengendalikan Pertanyaan WartawanKiat Mengendalikan Pertanyaan Wartawan
Kiat Mengendalikan Pertanyaan Wartawanmistertipr
 
แผนการตลาดแบบรมิตา
แผนการตลาดแบบรมิตาแผนการตลาดแบบรมิตา
แผนการตลาดแบบรมิตาtonypuy
 
Branding New Party by Tarsih Ekaputra
Branding New Party by Tarsih EkaputraBranding New Party by Tarsih Ekaputra
Branding New Party by Tarsih Ekaputramistertipr
 
Pengumuman Pemenang Optifog Video Contest Dec 2011 - Feb 2012
Pengumuman Pemenang Optifog Video Contest Dec 2011 - Feb 2012Pengumuman Pemenang Optifog Video Contest Dec 2011 - Feb 2012
Pengumuman Pemenang Optifog Video Contest Dec 2011 - Feb 2012mistertipr
 
Laporan harian Lensa optifog 01 juli 2011
Laporan harian Lensa optifog 01 juli 2011Laporan harian Lensa optifog 01 juli 2011
Laporan harian Lensa optifog 01 juli 2011mistertipr
 
Mediaclipping Impian 1 Milyar
Mediaclipping Impian 1 MilyarMediaclipping Impian 1 Milyar
Mediaclipping Impian 1 Milyarmistertipr
 
Laporan bonchon non magz & tabloid
Laporan bonchon non magz & tabloidLaporan bonchon non magz & tabloid
Laporan bonchon non magz & tabloidmistertipr
 
Tagging: Can User-Generated Content Improve Our Services?
Tagging: Can User-Generated Content Improve Our Services?Tagging: Can User-Generated Content Improve Our Services?
Tagging: Can User-Generated Content Improve Our Services?Katja Šnuderl
 
Berita Lensa Optifog 3
Berita Lensa Optifog 3Berita Lensa Optifog 3
Berita Lensa Optifog 3mistertipr
 
Metadata and Dissemination
Metadata and DisseminationMetadata and Dissemination
Metadata and DisseminationKatja Šnuderl
 
Laporan harian Berita Lensa Optifog 28 juni 2011
Laporan harian Berita Lensa Optifog 28 juni 2011Laporan harian Berita Lensa Optifog 28 juni 2011
Laporan harian Berita Lensa Optifog 28 juni 2011mistertipr
 
C:\Fakepath\Kiat Mengendalikan Pertanyaan Wartawan
C:\Fakepath\Kiat Mengendalikan Pertanyaan WartawanC:\Fakepath\Kiat Mengendalikan Pertanyaan Wartawan
C:\Fakepath\Kiat Mengendalikan Pertanyaan Wartawanmistertipr
 
Panen Rejeki BPD Pontianak
Panen Rejeki BPD PontianakPanen Rejeki BPD Pontianak
Panen Rejeki BPD Pontianakmistertipr
 
Media Training Module
Media Training ModuleMedia Training Module
Media Training Modulemistertipr
 
Richtimeteam Pre
Richtimeteam PreRichtimeteam Pre
Richtimeteam Pretonypuy
 

Destacado (20)

your PR Consultant
your PR Consultantyour PR Consultant
your PR Consultant
 
Seminar Nasional Asbanda 18 Juli 2014
Seminar Nasional Asbanda 18 Juli 2014 Seminar Nasional Asbanda 18 Juli 2014
Seminar Nasional Asbanda 18 Juli 2014
 
Kiat Mengendalikan Pertanyaan Wartawan
Kiat Mengendalikan Pertanyaan WartawanKiat Mengendalikan Pertanyaan Wartawan
Kiat Mengendalikan Pertanyaan Wartawan
 
Archiving the website
Archiving the websiteArchiving the website
Archiving the website
 
แผนการตลาดแบบรมิตา
แผนการตลาดแบบรมิตาแผนการตลาดแบบรมิตา
แผนการตลาดแบบรมิตา
 
Branding New Party by Tarsih Ekaputra
Branding New Party by Tarsih EkaputraBranding New Party by Tarsih Ekaputra
Branding New Party by Tarsih Ekaputra
 
Pengumuman Pemenang Optifog Video Contest Dec 2011 - Feb 2012
Pengumuman Pemenang Optifog Video Contest Dec 2011 - Feb 2012Pengumuman Pemenang Optifog Video Contest Dec 2011 - Feb 2012
Pengumuman Pemenang Optifog Video Contest Dec 2011 - Feb 2012
 
Ps Writing
Ps WritingPs Writing
Ps Writing
 
Laporan harian Lensa optifog 01 juli 2011
Laporan harian Lensa optifog 01 juli 2011Laporan harian Lensa optifog 01 juli 2011
Laporan harian Lensa optifog 01 juli 2011
 
Mediaclipping Impian 1 Milyar
Mediaclipping Impian 1 MilyarMediaclipping Impian 1 Milyar
Mediaclipping Impian 1 Milyar
 
Laporan bonchon non magz & tabloid
Laporan bonchon non magz & tabloidLaporan bonchon non magz & tabloid
Laporan bonchon non magz & tabloid
 
Tagging: Can User-Generated Content Improve Our Services?
Tagging: Can User-Generated Content Improve Our Services?Tagging: Can User-Generated Content Improve Our Services?
Tagging: Can User-Generated Content Improve Our Services?
 
Berita Lensa Optifog 3
Berita Lensa Optifog 3Berita Lensa Optifog 3
Berita Lensa Optifog 3
 
Metadata and Dissemination
Metadata and DisseminationMetadata and Dissemination
Metadata and Dissemination
 
Laporan harian Berita Lensa Optifog 28 juni 2011
Laporan harian Berita Lensa Optifog 28 juni 2011Laporan harian Berita Lensa Optifog 28 juni 2011
Laporan harian Berita Lensa Optifog 28 juni 2011
 
C:\Fakepath\Kiat Mengendalikan Pertanyaan Wartawan
C:\Fakepath\Kiat Mengendalikan Pertanyaan WartawanC:\Fakepath\Kiat Mengendalikan Pertanyaan Wartawan
C:\Fakepath\Kiat Mengendalikan Pertanyaan Wartawan
 
Panen Rejeki BPD Pontianak
Panen Rejeki BPD PontianakPanen Rejeki BPD Pontianak
Panen Rejeki BPD Pontianak
 
Media Training Module
Media Training ModuleMedia Training Module
Media Training Module
 
Richtimeteam Pre
Richtimeteam PreRichtimeteam Pre
Richtimeteam Pre
 
PR Proposal
PR ProposalPR Proposal
PR Proposal
 

Más de Katja Šnuderl

Povezani odprti podatki SURS?
Povezani odprti podatki SURS?Povezani odprti podatki SURS?
Povezani odprti podatki SURS?Katja Šnuderl
 
Значение метаданных (2014)
Значение метаданных (2014)Значение метаданных (2014)
Значение метаданных (2014)Katja Šnuderl
 
The Importance of Metadata
The Importance of MetadataThe Importance of Metadata
The Importance of MetadataKatja Šnuderl
 
Современные сайты статистики (2014)
Современные сайты статистики (2014)Современные сайты статистики (2014)
Современные сайты статистики (2014)Katja Šnuderl
 
Statistical Website Principles
Statistical Website PrinciplesStatistical Website Principles
Statistical Website PrinciplesKatja Šnuderl
 
Planning and persuading: the organizational implications
Planning and persuading: the organizational implicationsPlanning and persuading: the organizational implications
Planning and persuading: the organizational implicationsKatja Šnuderl
 
What’s a City Transport System Got to Do With Publishing Data in an Output Da...
What’s a City Transport System Got to Do With Publishing Data in an Output Da...What’s a City Transport System Got to Do With Publishing Data in an Output Da...
What’s a City Transport System Got to Do With Publishing Data in an Output Da...Katja Šnuderl
 

Más de Katja Šnuderl (7)

Povezani odprti podatki SURS?
Povezani odprti podatki SURS?Povezani odprti podatki SURS?
Povezani odprti podatki SURS?
 
Значение метаданных (2014)
Значение метаданных (2014)Значение метаданных (2014)
Значение метаданных (2014)
 
The Importance of Metadata
The Importance of MetadataThe Importance of Metadata
The Importance of Metadata
 
Современные сайты статистики (2014)
Современные сайты статистики (2014)Современные сайты статистики (2014)
Современные сайты статистики (2014)
 
Statistical Website Principles
Statistical Website PrinciplesStatistical Website Principles
Statistical Website Principles
 
Planning and persuading: the organizational implications
Planning and persuading: the organizational implicationsPlanning and persuading: the organizational implications
Planning and persuading: the organizational implications
 
What’s a City Transport System Got to Do With Publishing Data in an Output Da...
What’s a City Transport System Got to Do With Publishing Data in an Output Da...What’s a City Transport System Got to Do With Publishing Data in an Output Da...
What’s a City Transport System Got to Do With Publishing Data in an Output Da...
 

Último

My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
What's a City Transport System Got to Do With Publishing Data in an Output Database?

  • 1. International Marketing and Output Database Conference Blarney, Cork, 24th –28th September 2007 What's a City Transport System Got to Do With Publishing Data in an Output Database? Katja Šnuderl Statistical Office of the Republic of Slovenia katja.snuderl@gov.si Abstract It is easy to compare a city transport system to the process of publishing statistical data on a statistical website. There are completely unorganized systems, where everyone drives to work in their own cars, takes whatever route is most convenient at the time and expects to park as close as possible to their destinations. This is similar to those systems in which there are no rules as to how, when, where and in what form data are published. There are several reasons why neither such a transport system nor such a statistical output database is preferable. Conversely, there are completely organized systems, where all of the commuters use a public transportation system designed to their needs. Users adjust to the various schedules and transportation availability in order to reach their goals. This corresponds to a metadata-driven system where a well organized metadata repository runs data publishing through a pre-defined process based on integrated databases and templates. This article focuses on work done and lessons learned during a project of upgrading the Slovenian statistical output database from a file server to a macro database. Context Following the general trend of making statistical data available on the web, the Statistical Office of the Republic of Slovenia (Statistics Slovenia) decided to build an output database. First databases (Agriculture Census and Population Census) in 2003 were based on the PC-Axis file format and tools. As the concept has proven to be efficient, Statistics Slovenia has decided to migrate all of its dissemination to the output database. 
The dilemma of choosing either a file server system or an SQL macro model was always present, until some of the largest tables hit the technical limitations of the file server system. Within a new project in the field of External Trade, a new PC-Axis SQL macro database was built. Experience with both systems, and with migrating from one to the other, helped identify a metaphor that can help "non-IT people" understand the differences between table and database management.
  • 2. Katja Šnuderl: What's a City Transport System Got to Do With Publishing Data in an Output Database? 1. Introduction It is all about people. No IT solution is run by machines for machines. Each is created by people, maintained by people and used by people. Therefore, when building an output database it is important to understand how the human mind works. Somehow we seem to believe that everything that looks simple is simple. But in reality, making a simple application, where a user can understand the features easily and learn just by doing, takes a thorough analysis of users' needs, their behaviour and the technical possibilities, as well as an exacting decision process. It takes less work to make something that looks complicated and is difficult to use. In terms of a transport system we could say that good transport networks don't just happen. It takes a lot of effort to turn a chaotic situation into a well-run public service. Good route maps and schedules are based on user needs analysis and technical possibilities. They evolve over years. The basic preconditions for a successful project are sharing information (among all participants and cooperating parties in the project), understanding the project goal and decision and (management) support. No support is possible without understanding the problems. Comparing the building of an output database with a transport system can sometimes help us explain the basics of standardization and change to someone who sees building a database purely as an IT matter. Management can support our needs even without understanding IT matters – if we know how to explain them in an understandable way. Since transport is something most people know and use, it makes a useful comparison. 2. "Keep it as it is, we're fine" There is always a problem when a system changes. The new one doesn't always support all the options the old one had. 
Many people ask why a system that runs well should be changed at all, but if this view had always been respected we'd still be using carriages. The project on External Trade was built in order to replace dissemination of data in the Statistical Databank, an older instance of the output database. The Statistical Databank had a lot of regular users who extracted data monthly. However, only one kind of extraction was possible: one flow (exports or imports) for one time period by tariff codes (for one country or total) or by countries (for one tariff code or total). In the new database users can combine flows, several time periods, many tariff codes and many countries. The output table always has a multidimensional structure and also presents empty cells – if a user selects a country with no flows, the country is listed in the table with the appropriate statistical sign.
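The difference between the old single-purpose extraction and the new multidimensional selection can be illustrated with a minimal sketch. It is not the PC-Axis implementation: the sample data, the function name `extract` and the placeholder `EMPTY_SIGN` are assumptions made purely for illustration.

```python
from itertools import product

# Hypothetical sample cube: (flow, period, tariff_code, country) -> value.
DATA = {
    ("exports", "2007M01", "0101", "AT"): 120.0,
    ("imports", "2007M01", "0101", "AT"): 80.0,
    ("exports", "2007M02", "0101", "AT"): 95.5,
}

# Assumed placeholder for the statistical sign shown in an empty cell.
EMPTY_SIGN = "-"


def extract(data, flows, periods, tariff_codes, countries):
    """Return one row per combination of the selected values.

    Combinations without data are kept and filled with EMPTY_SIGN,
    so a selected country with no flows still appears in the output.
    """
    rows = []
    for key in product(flows, periods, tariff_codes, countries):
        rows.append((*key, data.get(key, EMPTY_SIGN)))
    return rows


rows = extract(
    DATA,
    flows=["exports", "imports"],
    periods=["2007M01", "2007M02"],
    tariff_codes=["0101"],
    countries=["AT", "HR"],
)
```

Because every combination is generated, the size of the output grows with the product of the selected values, which is one plausible reason a selection size limit matters in such a system.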
  • 3. Regular users, who had been adjusting to the old database for years, had many problems and special requests when the new system was introduced. We had to enlarge the selection size limit in the first week and we introduced new functions to filter data according to existing data flows. Luckily all users will benefit from the new functions, though not all parameters of the previous output were met. 3. "Don't just replace cars with buses" We often say that there is no IT solution that could change a process by itself. Changing only the technical part of the process is similar to giving people buses instead of cars. Without changing anything else, people would probably start driving one bus each to the workplace. A project manager should take care to prevent new tools from being used in old and obsolete ways. At the same time it is essential to know that users have to adjust to new tools at different levels and not all of them can be the "drivers". At Statistics Slovenia we chose a step-by-step approach when building the output database. The first stage was building the file server, where procedures and tools are easiest to understand for statisticians who were used to preparing tables in spreadsheets. The first tables were always prepared by the support team in order to meet all the general rules. The first examples also helped statisticians understand the multidimensional table structure. At the beginning we always took what was available and tried to create a comprehensive multidimensional table from existing tabulations (published tables). In the last year a major step was made when we introduced new tabulation rules based on our experience. The new rules introduce a clear multidimensional structure, where the statistician only defines the content of the table. 
The programming unit then prepares a new tabulation with the available tool (from the view of the source or the responsible person) following the general rules of tabulation for the PC-Axis database. The main result of the whole exercise is a better understanding of the multidimensional table structure among the statisticians and the programming unit. But when preparing these tables statisticians had to learn and use new tools for table management. They have to update existing tables with new time periods themselves. When building the new macro database, the next step was taken. Here statisticians only deal with content definition and don't manage the tables in any way. Once the data for the new time period are ready, the support unit pulls the data into the macro database. The statistician can make the final check whether data and metadata are ready to be published. The procedure of pulling data is manual for now and will be automated when it is stable. At this early stage we prefer to do it manually in order to learn how the automated process should run.
  • 4. 4. Transport logistics is complex In reality nobody expects a city tram system to cover all the areas of the city. Transport modules (trains, trams, metro, buses, cars, etc.) are differentiated but at the same time integrated and can be used successively. In the same way a good IT system should be developed in modules that are coherent, integrated and support each other. When building our new output database a decision was made that new applications shouldn't depend on any other system within Statistics Slovenia. It was understood that the dissemination "module" will be integrated with the metadata system, but only at a later stage. Working the other way round could well slow down the project or even cause it to fail. For classifications we decided to pull them from the classification server and maybe at a later stage use direct views. But as not all classifications are always prepared in the server, a backup option to import classifications as TXT files was introduced. A similar solution was introduced for importing data into the output database. We expect all data to be available in micro or macro databases eventually. Currently at Statistics Slovenia we still maintain a variety of data sources. Input tables are created from relational databases, flat files and Excel spreadsheets. Tools for tabulation are varied, from SQL queries and views to Cobol, TPL, SAS and Excel tabulations. We even prepared a simple converter that turns TXT files from TPL into the correct CSV structure. So even though the project was run on data for External Trade (available in an Oracle database), procedures to import data from other SQL databases, CSV files or even existing PC-Axis files were developed. Having the old output database (file server) and building the new one at the same time brought us the luxury of having an option to keep them both. 
Our strategy is to eventually migrate all data to the SQL Macro Database, but there is no need to do it before input data sources are consolidated. For now both systems will be supported and integrated. Another aspect of the coexistence of transport systems is the image of simplicity. When a system runs smoothly and is easy to use, usually a lot of effort has gone into integrating and coordinating different modules. Intuitive tools are based on lots of experience, selection of needs and testing. On the other hand, a system that looks complicated and is difficult to use is very easy to develop. You simply respect all needs and make no selection. In the process of preparing the specifications of the output database project a lot of emphasis was given to the expected outcome, especially with the end-user solution (web interface to view the data), in order to make it intuitive and easy to use. Unfortunately less experience was available when building the database management application, so the tool turned out to be rather complicated to use.
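The backup option for classifications described in section 4 (prefer the classification server, fall back to an imported TXT file) can be sketched as follows. This is an illustration only: the function names, the in-memory `server` dictionary standing in for the classification server, and the tab-separated "code, label" layout of the TXT file are all assumptions, not the actual interfaces used at Statistics Slovenia.

```python
import csv
import io


def parse_classification_txt(text, delimiter="\t"):
    """Parse 'code<delimiter>label' lines into a dict; short or blank lines are skipped."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return {row[0].strip(): row[1].strip() for row in reader if len(row) >= 2}


def load_classification(name, server, txt_fallbacks):
    """Return (classification, source), preferring the server over a TXT import.

    `server` is a stand-in for the classification server (name -> dict);
    `txt_fallbacks` maps a classification name to the raw text of a TXT file.
    """
    if name in server:
        return server[name], "server"
    if name in txt_fallbacks:
        return parse_classification_txt(txt_fallbacks[name]), "txt"
    raise KeyError(f"classification {name!r} not available from any source")


server = {"COUNTRY": {"AT": "Austria", "HR": "Croatia"}}
txt_fallbacks = {"FLOW": "1\tExports\n2\tImports\n\n"}

countries, countries_source = load_classification("COUNTRY", server, txt_fallbacks)
flows, flows_source = load_classification("FLOW", server, txt_fallbacks)
```

Returning the source alongside the classification makes it easy to report which tables still depend on the fallback, which may help when deciding what remains to be consolidated.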
  • 5. 5. "Let the grass grow, please!" Allowing exceptions to rules is similar to building parking places where people tend to park on the grass. In the end everything is a parking place and the chaos remains. There is no green colour left to calm the nervous drivers down. Already in our first output database some general rules were introduced. We had the file naming convention (unique file names within the whole system), corporate metadata, common classifications and some standard links (to methodological explanations, the release calendar and questionnaires). But in a file server it is difficult to validate whether each and every file is compliant with the rules. As this was done manually, not all exceptions were noticed and some were even agreed upon. On the other hand, when we built the macro database we formed some very strict rules. For example, all classifications in use have to be maintained in the classification server. Even though there is an alternative to import classifications, all tables with exceptions will be maintained in the file server. This decision is based on workload balancing – in the macro database the management of metadata is done by the support unit. If statisticians demand to maintain an exception to the rule, they have to manage the table themselves. They can only do that within the file server. Even in the long run we don't plan to transfer management of the metadata from the support unit to the statisticians. 6. Why bother with anything else than a taxi? In some big cities around the world people don't use public transportation but the taxi service. There is no worrying about schedules or need to learn which route to go and which number to take. In output database management terms there can be a support unit that manages all the dissemination of statistical data. 
Statisticians are only involved in managing the statistical process up to dissemination. They don't have to learn or use any new tools to prepare data for dissemination. Statistics Slovenia is relatively small. The output database support unit grew to 5 members who work on regular production and development in parallel. Therefore the process of producing files to be published was organized within the subject-matter units from the very beginning. One of the arguments for such a decision was also knowledge, as only statisticians knew the content of a statistical survey and could define expected outputs. But through the management of the file server, experience and knowledge were also accumulated within the support unit. While building the new macro database we wondered whether there is any need to put any technical burdens on the content managers. We decided not to do so at the start, so all technical matters are handled within the output database unit.
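Section 5 notes that manual checking of the file server let some exceptions to the rules slip through. As a hedged illustration of how one such rule, unique file names within the whole system, could be checked automatically, consider the sketch below. The function name, the sample paths and the choice to compare names case-insensitively are assumptions for the example, not a description of the tools actually in use.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_duplicate_names(paths):
    """Group paths by file name and return only the names used more than once.

    Names are compared case-insensitively (an assumption), so '0401.px'
    and '0401.PX' in different folders count as the same name.
    """
    by_name = defaultdict(list)
    for path in paths:
        by_name[PurePosixPath(path).name.lower()].append(path)
    return {name: found for name, found in by_name.items() if len(found) > 1}


# Hypothetical file server listing.
paths = [
    "prices/0401.px",
    "tourism/2101.px",
    "archive/0401.PX",
]

duplicates = find_duplicate_names(paths)
```

A check like this could run before each publication, so that a violation is reported to whoever manages the table instead of being discovered later.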
  • 6. It is always a matter of balancing – if all management is given to subject-matter units, it is unlikely that the coherence principles would be met. If all management is centralized, subject-matter units could oppose solutions that don't support their special requirements. So it is important to set some clear rules and introduce validation tools that support these rules on one hand, and balance management between the content managers and the output database team on the other. 7. "Lost" Not many people get lost in the Paris Metro network. At every station it is easy to find maps and information on where to exit and how to continue in the right direction. But in another country it is fairly easy to miss The Hague train station and end up in Rotterdam. As an output database includes more and more data, it also grows larger and larger. It is important to build a navigation system that helps users easily navigate within the database. This applies both to entering the database to find the data and to finding the way back later. The first challenge is how to build an efficient way to find the data. The new output database at Statistics Slovenia offers several options. One is browsing through the content tree from the starting page of the database. There all subjects are available and users have to open the content tree and check whether the table titles seem to match their needs. This option is available without additional maintenance of metadata, just using the database content definitions. But besides the entry page we've introduced an option to open the content tree at any level within a subject area. For this purpose we use content identification numbers, unique and standardized among different dissemination products. For example, on our website every theme (e.g. Prices) has an ID number. 
Opening the database content tree with the same ID number opens only items within the same theme (Prices). Identifications go down to a single table. When the content tree opens partially, the current location is read from the database and written in the header section. In the next step we will add an option to search for tables. We plan to introduce a keyword search, where a pre-defined list of keywords will be prepared and linked to the tables. Users will only be able to select words from the list. The words will be suggested while the user is typing. The list of keywords will be maintained regularly in order to support users' needs. When users select data from a table in the database, they are often interested in continuing work on other tables from the same content. To support such requests, we introduced a command "List of tables" in the menu bar, which opens the content tree for the same content.
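The two navigation aids described in section 7, opening the content tree at a given content ID and suggesting keywords from a pre-defined list while typing, can be sketched as follows. The `Node` structure, the ID values, the " > " header format and the function names are hypothetical; the sketch only shows the general idea, not the actual database schema or web interface.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    content_id: str
    title: str
    children: list = field(default_factory=list)


def find_path(root, content_id, trail=()):
    """Return the list of nodes from the root down to `content_id`, or None."""
    trail = trail + (root,)
    if root.content_id == content_id:
        return list(trail)
    for child in root.children:
        found = find_path(child, content_id, trail)
        if found:
            return found
    return None


def open_tree(root, content_id):
    """Open the tree partially: return (header text, subtree root)."""
    path = find_path(root, content_id)
    if path is None:
        raise KeyError(content_id)
    header = " > ".join(node.title for node in path)
    return header, path[-1]


def suggest(keywords, typed, limit=5):
    """Suggest keywords from the pre-defined list that start with the typed text."""
    prefix = typed.casefold()
    matches = sorted(k for k in keywords if k.casefold().startswith(prefix))
    return matches[:limit]


# Hypothetical content tree with made-up IDs.
tree = Node("0", "Database", [
    Node("04", "Prices", [Node("0401", "Consumer price indices")]),
    Node("21", "Tourism", [Node("2101", "Tourist arrivals")]),
])

header, subtree = open_tree(tree, "04")
suggestions = suggest(["Prices", "Price index", "Population", "Tourism"], "pri")
```

Restricting suggestions to a maintained list, as the text proposes, means a user can never search for a word that has no table behind it.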
  • 7. 8. "Shinkansen or a good old tram"? We are still far from the Shinkansen and the Japanese transportation system. Actually in Slovenia one can experience that it is not enough to replace old trains with new ones that can speed up to 200 km/h. Here at some places they have to slow down to 50 km/h or less, otherwise the tracks would collapse. Or you get stuck at a train station because nobody knows how to unlock a secured carriage and after half an hour of trying and thinking they have to move people and uncouple the carriage so the train can proceed. So what we have done for now is limit parking places for cars within the city and introduce many bus routes, one intercity train route ending in the suburbs and one tram route from the suburbs to the centre. The system might not be the most modern, but it has proven to be reliable. In reality we reduced the number of published Excel spreadsheets in favour of multidimensional tables, introduced standard procedures for tabulation of multidimensional tables, included the classification server in the dissemination process and built a macro database for data on External Trade. The next "tram" routes will be prepared for Earnings and Tourism Statistics. After deciding to maintain both systems (the file server and the macro database) our main goal was to integrate them without putting a burden on the user when searching for data. The basic principles are: a) Single entry point b) Same "Look and feel" c) Same functions + advanced options in the macro database d) Same support (header menus) e) Single registration for advanced users (option to save queries). A lot of effort was put into a coherent design of the two systems, adjusted to the design of the Statistics Slovenia website. The only connecting point of the two databases is the content tree view, the entry page of the database. 
From there users are redirected either to a table in the file server database or in the macro database. In the tree view there are also links to related content: First Releases, methodological explanations, statistical questionnaires, special publications, links to external websites (data on websites of other governmental bodies) and links to the Eurostat database. To view or download data, a user can select any values from the table, change the text/code presentation of values, pivot the table, view selection-specific footnotes, change the presentation of decimals, display data as a graph or map and download data in several formats. Advanced features in the macro database support selection and filtering of hierarchical variables by levels, removing empty lines, sorting and a better structured presentation of footnotes. With the new database structure we are also introducing pre-defined tables, where less experienced users can look at data just by clicking the table title. The content of pre-defined tables was defined by each theme editor.
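Two of the advanced features mentioned above, filtering a hierarchical variable by level and removing empty lines, can be illustrated with a small sketch. The row layout (code followed by values), the rule that a tariff code's level is half its length, and the "-" sign for empty cells are assumptions chosen for the example only.

```python
EMPTY_SIGN = "-"  # assumed sign for an empty cell


def level_of(code):
    """Hypothetical rule: a 2-digit code is level 1, a 4-digit code level 2, etc."""
    return len(code) // 2


def filter_by_level(rows, wanted_level):
    """Keep only rows whose code (first item) sits at the wanted hierarchy level."""
    return [row for row in rows if level_of(row[0]) == wanted_level]


def drop_empty_rows(rows, empty_sign=EMPTY_SIGN):
    """Remove rows in which every value cell carries the empty sign."""
    return [row for row in rows if any(v != empty_sign for v in row[1:])]


# Hypothetical output table: (tariff code, exports, imports).
table = [
    ("01", 10, 5),
    ("0101", 10, 5),
    ("0102", "-", "-"),
    ("02", "-", "-"),
]

level_two = filter_by_level(table, 2)
compact = drop_empty_rows(level_two)
```

Applying the level filter before dropping empty lines keeps the two options independent, so a user could switch either one on or off.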
  • 8. 9. Conclusion Our habits differ in different societies. Not every country has as many problems with transport systems as Slovenia. But still, there are some basic principles that everyone understands and that can be used when explaining the principles of building a new IT solution to a non-IT person. During the project of building the new dissemination macro database our main goal was to build a system that will support different contents, different input data formats and a wide variety of users. From the start we have been careful about standardisation, coherence and process management. We are building on our experience with the file server database. At the same time we are trying to meet most users' needs. Statistics are produced by people for people and our role in this process is to make them accessible, reliable and understandable.