The document discusses the history and process of data journalism at the Guardian Datablog. It describes how the Datablog started as a small blog publishing underlying datasets for stories and has grown to publish hundreds of datasets, visualizations, and analyses. The process involves locating data, examining it for quality, performing calculations to identify potential stories, and publishing stories, graphics, or visualizations using tools like Excel, Fusion Tables, and custom graphics developed internally. It notes that the first example of data journalism at the Guardian was an 1821 table of school data in Manchester that revealed inaccuracies in official statistics and caused a large reaction.
Strata in Santa Clara, California has gathered over 2,000 developers, journalists and data scientists in one place to discuss data - big and small - at what has become the data event of the year. Oh and we're there too. See where the data enthusiasts came from, what they want to talk about - and how much data they process
Locate untapped sources
Refine data rather than just selling it. For instance, the analysis of georeferenced photos you have seen previously has led to the production of a new layer of information for navigation systems.
Research Challenge on Visualization: http://www.w3.org/2012/06/pmod/visualization.pdf
Introduction and definition
As Google CEO Eric Schmidt pointed out in 2010, as much information is now created in the world every two days as was created from the appearance of man until 2003. This is due to the explosion in computing techniques, which has led to the generation of a tremendous amount of data that are stored on the internet and processed in IT systems all over the world. In fact, as predicted by Cisco, by 2015 the annual global IP traffic will reach 966 exabytes (10^18 bytes), nearly a zettabyte (10^21 bytes), increasing fourfold from about 900 petabytes (10^15 bytes) back in 2000 and around 2,500 petabytes in 2010. But data are not only stored on the internet; they also sit in an exponentially increasing number of IT infrastructures.
Materialize data into new services or into new ‘data products’.
Some examples of new technologies for data collection are: web logs; RFID; sensor networks; social networks; social data (due to the social data revolution); Internet text and documents; Internet search indexing; call detail records; astronomy, atmospheric science, genomics, biogeochemical and biological data; military surveillance; medical records; photography archives; video archives; large-scale eCommerce. To manage this huge amount of data in human-computer interaction, there is a need to distil the most important information and present it in a humanly understandable and comprehensive way. This is where visualisation comes in: a way to interpret and translate data from computer-understandable formats to human ones by employing graphical models, charts, graphs and other images that are conventional for humans. In a sense we can define visualisation as any technique for creating images, diagrams, or animations to communicate a message or an idea. Since the beginning of human history, visualisation has been an effective way to communicate both abstract and concrete ideas.
http://www.livework.co.uk/articles/data-is-the-new-oil-part-1-business-information
Data, whilst valuable, is a commodity
This is where the process of refinement comes in. We need to refine the data into services. And these services need to meet the needs and issues of the businesses that information providers hope to sell to.
Data owners need to think about how to use their data to help fix their customers’ challenges rather than focusing on the number of data sets they can sell.
We use information about location, weather and traffic conditions in ways that help us make decisions and fit well into our lives. We all know that information can be live, dynamic and personal to our life context.
If data providers do not adopt this kind of Service Thinking then they will be superseded by more agile providers or by Google themselves. The opportunity is there for information businesses to significantly add value to their data assets by treating the provision of information as a service.
http://ana.blogs.com/maestros/2006/11/data_is_the_new.html
“Data is just like crude. It’s valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc., to create a valuable entity that drives profitable activity; so must data be broken down, analyzed for it to have value.”
http://www.forbes.com/sites/perryrotella/2012/04/02/is-data-the-new-oil/
According to IBM, the digital universe will grow to eight zettabytes by 2015. The real impetus is the potential insights we can derive from this new, vast, and growing natural resource. If data is the next big thing, then companies need to think about a new business model that exploits this valuable resource.
Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.
Today’s commodity hardware, cloud architectures and open source software bring big data processing into the reach of the less well-resourced. Big data processing is eminently feasible for even the small garage startups, who can cheaply rent server time in the cloud.
The value of big data to an organization falls into two categories: analytical use, and enabling new products. Big data analytics can reveal insights hidden previously by data too costly to process, such as peer influence among customers, revealed by analyzing shoppers’ transactions, social and geographical data. Being able to process every item of data in reasonable time removes the troublesome need for sampling and promotes an investigative approach to data, in contrast to the somewhat static nature of running predetermined reports.
Noah Iliinsky’s Designing Data Visualizations
Author of Beautiful Visualization & O’Reilly’s Designing Data Visualizations
Noah Iliinsky, of Complex Diagrams and Designing Data Visualizations, takes our focus from the clear and factual to good storytelling. While data has its properties that need to be honored, he places equal emphasis on knowing your audience and being able to state exactly what it is you want to convey. In terms of design advice, Iliinsky is slightly less explicit about established rules. He borrows a quote from Moritz Stefaner, that "position is everything, color is difficult." No one wants to see arbitrarily chosen, confusing color schemes, but that is no reason to shy away from color completely.
Jock Mackinlay’s The Science of Visualization - Tableau
Goal: pop out important information to present effectively
Take advantage of human visual comparison/system
“The representation and presentation of data that exploits our visual perception abilities in order to amplify cognition”
http://complexdiagrams.com/2009/03/tire-chart/
Toughness axis (vertical) isn’t well-defined/ordered: “burly” vs “svelte” gives an idea but is intentionally ambiguous (loose categorical grouping)
Rim sizes are preattentively differentiable
Price & special features not included in this level of use
Other ideas: filter by rim size (and price), use icons, reduce grid lines (nominal categories?)
Hans Rosling: TEDTalks “Myths about the developing world” (2006)
When you don’t yet have a story to tell
Each color corresponds to a different group within the professional network, which can be labeled by the user. The graph should allow users to recognize connections that share mutual people, or identify areas that might be underrepresented.
Zoomable interface. Select a node to see highlighted nodes that are mutual connections.
http://qph.cf.quoracdn.net/main-qimg-40df8574b885918dde4c2496025a323f
Use visuals to think
Experience is active and involves people trying to answer questions
Task: “question answering”
Visual properties don’t help us compare the share of each client
Use defaults: timelines for timeseries, maps for geographic data
This just takes technology and pours it into a periodic table-shaped box. Timelines are great — it’s a really powerful axis, that time axis, because you can see where there are clumps and trends. Pour it into a box like [the periodic table] and you get none of that.
Timeline is obvious
Placement is key
See departure and arrival times and flight duration in relation to one another
Time bar across the top has both time zones listed
Sort order (ranked): “agony” filter? “Agony” is a combination of price, time of day, number of stopovers. That’s the one you want! That’s really smart.
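A ranked "agony" sort of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the notes say agony combines price, time of day and number of stopovers, but give no formula, so the weights, the "comfortable" departure window and the sample flights below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Flight:
    name: str
    price: float      # ticket price
    depart_hour: int  # local departure hour, 0-23
    stops: int        # number of stopovers


# Hypothetical weights: how much "agony" each factor adds.
OFF_HOUR_WEIGHT = 40.0  # per hour outside the comfortable window
STOP_WEIGHT = 75.0      # per stopover
COMFORT_START, COMFORT_END = 8, 20  # assumed comfortable departure hours


def off_hours(hour: int) -> int:
    """Hours by which a departure falls outside the comfortable window."""
    if hour < COMFORT_START:
        return COMFORT_START - hour
    if hour > COMFORT_END:
        return hour - COMFORT_END
    return 0


def agony(flight: Flight) -> float:
    """Combine price, time of day and stopovers into a single score."""
    return (flight.price
            + OFF_HOUR_WEIGHT * off_hours(flight.depart_hour)
            + STOP_WEIGHT * flight.stops)


flights = [
    Flight("red-eye, two stops", 220.0, 5, 2),
    Flight("midday direct", 340.0, 12, 0),
    Flight("evening, one stop", 280.0, 22, 1),
]

# Least agonising flight first.
ranked = sorted(flights, key=agony)
for f in ranked:
    print(f"{f.name}: {agony(f):.0f}")
```

With these made-up weights the cheapest ticket ends up last, which is the point of the filter: a single ranked axis that folds several annoyances into one ordering.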
Axes give you information for free About targets When searching (think grouping)
The top image is an example of poor use of colour to represent sea elevation and land topology. The hues have no natural order and simply disrupt the reading.
The bottom map uses natural colours (blue for ocean and brown for land). It shows ordering and depth/height using varied levels of saturation and luminance.
What does data tell us about ourselves and the places (cities, streets, buildings) we live in?
A researcher and engineer in the domains of user experience and data science. Investigates the interplay between people and data.
“A good sketch is better than a long speech.”
We have been focusing on a specific type of data that we call ‘network data’. Network data are the byproducts of our interactions with digital infrastructures, as nicely animated by our friend Timo Arnall in his project ‘Wireless in the world’ http://www.nearfield.org/2010/06/new-film-wireless-in-the-world-2. Practically, we have been materializing information from pretty much anything that is networked in our cities: cellphones, cars, shared bikes, digital cameras, credit cards, ...
Video: making invisible wireless technologies visible, in order to better understand and communicate with and about them. Here we are creating communicative material that uses dashed-line abstractions to visualise the presence of wireless technologies in the everyday environment. What if we could see every field produced by an Oyster card or NFC-enabled mobile phone, for instance?
http://villevivante.ch/
Based on this conclusion the City of Geneva decided to take up the challenge of visualizing these digital traces created by our mobile phones. The objective of this installation is to make this data visible and allow you to explore these streams of connected people around the city, in their everyday life.
Cumulative activity of the city per hour & per day
Size + brightness indicates aggregate activity at that hour
Every mobile phone leaves digital traces permanently while interacting with the mobile infrastructure. Geneva generates approximately 15 million connections from 2 million phone calls per day. These 'digital traces' offer new insights about the city, which are of great interest from both an economic and a political perspective:
• innovation opportunity for new citizen services like traffic jam detectors or nightlife buzz indicators
• public administration can evaluate urban planning strategies
• reveal insights for businesses on how popular certain districts are, and during what time periods
• reveal information that is invisible in traditional visualization techniques such as cartography
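The per-day, per-hour aggregation behind such a view can be sketched as follows. This is a minimal illustration, not the installation's actual pipeline: the connection timestamps are made up, and the idea of scaling each cell against the busiest one (to drive size and brightness) is an assumption about how the encoding could work.

```python
from collections import Counter
from datetime import datetime

# Hypothetical connection records: one ISO timestamp per network connection.
records = [
    "2012-03-05T08:15:00",
    "2012-03-05T08:40:00",
    "2012-03-05T18:05:00",
    "2012-03-06T08:30:00",
]


def activity_by_day_hour(timestamps):
    """Count connections in each (day, hour) cell."""
    counts = Counter()
    for ts in timestamps:
        t = datetime.fromisoformat(ts)
        counts[(t.date().isoformat(), t.hour)] += 1
    return counts


def relative_activity(counts):
    """Scale each cell to 0..1 against the busiest cell.

    The scaled value could drive both marker size and brightness.
    """
    peak = max(counts.values())
    return {cell: n / peak for cell, n in counts.items()}


counts = activity_by_day_hour(records)
scaled = relative_activity(counts)
for (day, hour), level in sorted(scaled.items()):
    print(f"{day} {hour:02d}:00 -> {level:.2f}")
```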
The process of innovating with (network) data demands several clear steps, each with its own set of questions and answers: from data access and collection techniques, which feed data to obfuscation algorithms and big data management systems, which are in turn interrogated by basic data mining operations or advanced statistical inquiries. Information visualization techniques are then used to build evidence and indicators used to interrogate the data further.
Innovate with data: iterate through process, métiers, sketch, sketch and sketch
The process involves multiple practices and skills, from engineering to statistics, design, strategy planning, product management and law.
Sketches with the data at hand at each step. We use these sketches to answer some questions, which generate new interrogations for the next phase.
Sketching is not a new practice as part of a creative activity. Sketching has been widely used to innovate in drawing, painting and architecture, all domains related to visualization and communication. For instance Le Corbusier, who changed the face of architecture, was famous for sketching while presenting his projects and ideas:
“Through visual artifacts, architects can transform, manipulate, and develop architectural concepts in anticipation of future construction. It may, in fact, be through this alteration that architectural ideas find form”
The project gathered multiple practices, from a network engineer who helped access the data to a product manager who had to transform insights into product scenarios.
Engineer's view: the data is a network of cells that distribute phone conversations
Product manager's view: sees the data through customers and their interactions
The ability to quickly sketch an interactive system is a way to develop a common language amongst varied stakeholders
It allows them to focus on tangible opportunities for products or services that are hidden within their data
We produced a sketch to show the data we were trying to transform, for instance revealing the quality of the data for measuring mobility and the type of information that could be extracted (here mobility and density of activity on the network).
In this project, we first helped the Louvre formulate its needs for measuring occupancy levels and flows. We created an inventory of the available datasets, both internal and external, in partnership with sensor network providers. We then considered the complementarity of the information to define indicators that help facility managers, museologists and architects evaluate their strategies. We helped them design novel strategies to control hyper-congestion and ensure a good visiting experience.
So far, administrators of the museum only had a partial understanding of the problem based on observations and surveys.
Used BitCarrier to collect empirical data on flows and densities of visitors in key areas
Based on the measures of occupancy levels, visiting times, and centrality of trails, we developed a solution that measures the influence of hyper-congestion on the visiting experience in the most popular rooms of the museum.
These results can influence the remodeling of areas and the deployment of information kiosks and help evaluate strategies and policies to control hyper-congestion.
Limitations of quants: how to qualify how people walk, etc.
Doors were closed because the crowds became too large
People in the field have the experience to help contextualize the data and early results
So we used our sketches to confront our measures and indicators with people in the field. Their *qualitative evidence* helped contextualize and qualify the early results as well as explain the detected irregularities. This qualitative view reinforced the quantitative observations and consolidated the overall knowledge on hyper-congestion. In other words, network data tell a story, not THE story.
Explore new roles of banks in the smart cities of the near future: need
We used maps (see examples) and an interactive proof of concept to provoke the exploration of opportunities for innovative BBVA internal and external services. This investigation process led us to co-create opportunities to exploit data in the domains of distribution strategies, audience profiling and social navigation.
New perspectives for innovative services
As part of our consulting work, we sketched a pretty advanced dashboard for participants of the project to explore and interrogate their data with fresh perspectives (here a mix of social network and credit card activity in Madrid). The use of the dashboard helped the participants craft and tune indicators that qualify the space (e.g. the streets of a city) based on its business activity. This experience was used to develop specific scenarios involving services and products that a bank could take advantage of.
The multiple perspectives extracted from the use of exploratory data visualizations are crucial to quickly answer some basic questions and provoke many better ones, which generate new interrogations for the next phase.
Quadrigram is an online platform with a Visual Programming Language that can be used to gather data and generate meaning through data processing and information visualization.
Modular interface to design information flows, linking data resources to operators, controls and viz methods within a node-based GUI that displays the structure of your process. These modules form a data flow when you link them together. Each time you modify a module, the update is propagated throughout the flow.
Access, manipulate, analyze and visualize
Freely explore multiple dimensions of a single dataset, each time generating a set of questions and answers.
Additionally, these tools reduce the prototyping time necessary to sketch interactive visualizations that allow the different stakeholders of an organization to take an active part in the design of services or products.
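The idea of linked modules whose updates propagate through the flow can be illustrated with a small Python sketch. This is not Quadrigram's API: the `Module` and `DataSource` classes, their method names and the traffic-speed values are hypothetical, and the propagation is deliberately naive (a module with two paths to the same source would recompute twice).

```python
class Module:
    """A node in a data flow: applies an operation to its inputs' values."""

    def __init__(self, name, operation, inputs=()):
        self.name = name
        self.operation = operation
        self.inputs = list(inputs)
        self.downstream = []
        self.value = None
        # Linking: register this module with every module it reads from.
        for upstream in self.inputs:
            upstream.downstream.append(self)
        self.refresh()

    def refresh(self):
        """Recompute this module, then push the update down the flow."""
        self.value = self.operation(*(m.value for m in self.inputs))
        for module in self.downstream:
            module.refresh()


class DataSource(Module):
    """A module with no inputs that simply holds a dataset."""

    def __init__(self, name, data):
        self._data = data
        super().__init__(name, lambda: self._data)

    def set_data(self, data):
        self._data = data
        self.refresh()


# Hypothetical flow: source -> filter operator -> mean operator.
speeds = DataSource("traffic speeds", [42, 55, 38])
fast = Module("keep >= 40", lambda xs: [x for x in xs if x >= 40], [speeds])
mean = Module("mean", lambda xs: sum(xs) / len(xs), [fast])
first_mean = mean.value

# Modifying the source propagates through every linked module.
speeds.set_data([60, 20, 80])
second_mean = mean.value
print(first_mean, second_mean)
```

A visual programming tool adds a GUI on top of this kind of graph, but the underlying behaviour described in the notes is the same: change one module and everything linked after it is recomputed.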
Real-time traffic information: their sensor networks measure the quantity and speed of the traffic in key areas of a city.
Exploratory data analysis approach to create an interactive application
Five representations of a single data set:
• Table visualizer (rows & columns)
• Network visualization to see relationships between points
• Geodata to view points on a map and see context. View trajectory of traffic in a single slice of time
• Data in real-time. Incoming up-to-the-second data to see motion of traffic between points, moving at different velocities. Data as a living material
• Temporal data: temperature data
This example shows how multiple interrelated perspectives on the same data (temporal bar charts, quadrifications, maps, and scatter plots) can create a powerful tool that permits us to explore the activities of a company by projects, sectors, location, and profitability.
This application collects and analyses the sentiment expressed in real-time on Twitter. The results show the positive and negative polarities with respect to a word you define.
So, we have seen that our world produces a new type of data, network data, that is now treated as a material. There are both processes and tools that help innovate with this evolution. From our experience, there is value in sketching with data, in the same way that strategists, innovators and world changers have used sketches in the past.
Visualization is one of the most advanced fields in policy modeling, being able to foster the design of more effective and efficient policies, as well as to make sense of large datasets, such as those provided as open government data. In fact the huge increase in data availability is also due to the so called "open data" movement, characterized by the fact that all across Europe and the US, governments are increasingly publishing their data repositories for other people to access and use it.
This map visualizes crowd-sourced radiation Geiger counter readings from across Japan. Click on the labels to get more information on the source of each reading.
The number of locations fluctuates due to the validity of the data feeds. There are approximately 185 feeds from the official Japanese government source MEXT and the rest are from other sources such as the Tokyo hackspace, universities, local councils and concerned individuals.
http://cpstiers.opencityapps.org/
Simon Rogers is editor of The Guardian Data Blog (www.guardian.co.uk/data, @datastore) an online data resource which publishes hundreds of raw datasets and encourages its users to visualise and analyse them. He is also a news editor on the Guardian, working with the graphics team to visualise and interpret huge datasets.
The tools we have to analyse the data may have changed; that motivation has stayed exactly the same.
How all the spending fits together: see department cuts and which programmes received big increases (nuclear, defence)
Most comprehensive atlas of public spending available
Every year every government department publishes an annual report which includes breakdowns of spending
Manually pick out data from PDF to extract specific information
The data itself covers over 194,000 individual transactions, payments to suppliers and bills covered by government departments in the first five months of the life of the Coalition. There's lots excluded, though: the NHS, benefit payments, spending by quangos, information removed for "national security" and personally confidential reports. It's about £80bn of an annual spend of £670bn.
We figured 170 spreadsheets is too much for most people to browse, so Guardian lead software architect Matthew Wall has built this useful spending data explorer app. It's designed to make it easier for you to search and download the key data you're interested in.
We may even have done some of the analysis you're looking for already. We've combined spending for each department into single spreadsheets. Here's what you can find:
• Sheet 1: Every item for the department
• Sheet 2: Detailed breakdown of type of spending
• Sheet 3: Broader breakdown into fewer areas
• Sheet 4: Every supplier listed in alphabetical order and by size (watch out on this one for different spellings of the same supplier)
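The warning about different spellings of the same supplier matters as soon as you total spending by supplier. The sketch below shows one simple way to handle it in Python, assuming the rows have already been exported as (supplier, amount) pairs. The supplier names, amounts and the list of company suffixes to ignore are invented for illustration, and real data would need more careful matching than this.

```python
import re
from collections import defaultdict

# Assumed list of company suffixes to ignore when comparing names.
SUFFIXES = {"ltd", "limited", "plc", "llp"}


def normalise_supplier(name):
    """Lower-case, strip punctuation and drop company suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in SUFFIXES)


def total_by_supplier(rows):
    """Sum amounts per normalised supplier, largest total first."""
    totals = defaultdict(float)
    for supplier, amount in rows:
        totals[normalise_supplier(supplier)] += amount
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rows: three spellings of one supplier, plus another supplier.
rows = [
    ("Acme Services Ltd", 1200.0),
    ("ACME SERVICES LIMITED", 800.0),
    ("Acme Services Ltd.", 500.0),
    ("Bolt & Nut plc", 2000.0),
]

totals = total_by_supplier(rows)
for supplier, amount in totals:
    print(f"{supplier}: {amount:,.0f}")
```

Without the normalisation step the three Acme rows would appear as three smaller suppliers, and the ranking by size would be wrong.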
Soldiers are good at entering data: locations where soldiers died in Afghanistan (date, what happened, number of casualties, summaries)
Interactive map displaying the region using WikiLeaks war log data
WikiLeaks: every IED attack, with co-ordinates, 2004-2009
Made it more interesting/rewarding for people:
• asked people to do smaller tasks with a reduced amount of data
• “Zooniverse”: citizen science project to transcribe documents, visually classify images, categorize, etc.
• added recognition for users: keep track of task assignments, see progress
• reward for work: identification from journalists / editorial feedback
• allowed users to skip over uninteresting docs, which led to users reviewing more docs on average
• ability to view data about your own MP
BlackoutGate: massive cover-up of MPs' expenses after the Commons authorities released hundreds of thousands of claims documents and receipts with huge sections of detail blacked out, in the belief that publication would be in breach of the Data Protection Act.
http://www.guardian.co.uk/politics/2009/jun/18/mps-expenses-censorship-black-out
First time such a major attempt had been made to forensically examine the motivations behind a riot since the work in Detroit in 1967
Gathered qualitative data from the interviews and quantitative responses to a set of questions
UK riots: every verified incident
Collected key reported incidents from as many sources as possible
Raw data in Google spreadsheets: approx time, date, place, location details, local authority, what happened, source
Mapped with Google Fusion Tables