Data Speaks: Data Stories That Matter

This book offers a compilation of content published on our Data Speaks blog during our first year of life. LUCA is the Big Data unit of Telefónica, and its mission is to accompany organizations on their transformation to becoming data-driven. The book is divided into three parts: Business, Big Data for Social Good, and Artificial Intelligence. We hope you enjoy this snapshot of our work and encourage you to follow us on Twitter (@LUCA_D3) and LinkedIn for more daily content!

Published in: Data & Analytics

  2. Data Speaks
     Data stories that matter
     Dr. Richard Benjamins (Editor), Florence Broderick, Dr. Pedro de Alarcon, Javier Carro, Ana Zamora
     With contributions from Chema Alonso, Cambria Hayashino, María Lucaya, Gavin Hastings, José María Álvarez Pallete, Nicolas de Cordes (Orange), Dr. Yves-Alexandre de Montjoye (MIT), María Esperanza Díaz (cover design)
     LUCA Data-Driven Decisions, Telefónica
  3. Foreword
     Introduction
     Part I – Business
       From Data Exhaust to Data-Driven: How CEOs face Big Data
       Big Data: What's the economic value of it?
       54% of organizations now have Chief Data Officers, but should mine?
       Open Data and Business - a paradox?
       International call traffic may tell you more than first thought
       Using data-driven decisions in Brazil's tourism sector
       Big Data and Tourism: How this Girona Festival became Data-Driven
       Can Big Data reshape the Outdoor Media sector?
       How these 4 sports are using Data Science
     Part II – Big Data for Social Good – Data with a Soul
       Big Data: moneymaker and force for social good?
       The 6 challenges of Big Data for Social Good
       7 ways mobile data is being used to change the world
       It's time to scale up Big Data for Social Good – Interview with Nicolas de Cordes
       What does Big Data tell us about the way people move around Brazil's favelas?
       Poverty Insights in Guatemala using Big Data
       Big Data for understanding the impact of Climate Change on migration
       Refugees of the Syrian War; a humanitarian drama reflected in data
       Commuter Traffic: Can Big Data solve the problem?
       Air Quality: How can Open Data and Mobile Data provide actionable insights?
       Can Mobile Data Combat Climate Change in Germany?
       How Big Data helped Stuttgart improve commutes and tackle Climate Change
  4.   Open Algorithms, what are they? Yves-Alexandre de Montjoye explains
     Part III – Artificial Intelligence
       Chatbots? New? You haven't met ELIZA
       Artificial Intelligence: What even is that?
       What are the sub areas of AI?
       How "intelligent" can Artificial Intelligence get?
       Can machines think? Or are humans machines?
       Artificial Intelligence vs Cognitive Computing: What's the difference?
     Final words
       Our top 5 Big Data blogs
     About LUCA
     About the authors and contributors
     References
  5. Foreword
     Observe, Record, Analyze, Interpret…
     A human being is curious. He observes his environment, he asks questions, he tries to understand what is happening around him: how things work, and how to turn a difficult and onerous task into an easier one. He looks for answers.
     Ancient civilizations of the Middle East in Assyria, Babylon and further afield in India and China were agrarian cultures. For these economies, it was very important to understand and measure the cycles of Nature, and to create an agrarian calendar. Thus, as early as the 8th century BC, they began to look skyward to observe the stars and the planets. Eventually, they were able to precisely measure the movement of the planets and thus determine the beginning of the month, which was the day following the new moon. At the beginning, this date was determined by directly observing the sky. Then, the Babylonians took a step forward and created the first calculation systems to represent the movement of the stars. For the first time, they used Mathematics as a language able to “record” and “explain” the complexity of the world and natural events. In this way, they learnt how to predict periodic phenomena and how to identify, represent and analyze patterns, using mathematical models to predict the positions of the Moon and the Sun and when an eclipse was expected. They accumulated and rationalized empirical concepts about nature and society, from which the early origins of Astronomy, Mathematics, Ethics and Logic arose.
     The mantra was: Observe, Record, Analyze, Explain, Predict.
     Nowadays, our world and Science itself have changed a lot. However, the mantra “Observe, Record, Analyze, Explain, Predict” is still valid. The first scientists made their observations by sight, recorded them in
  6. clay tablets, did their calculations by hand using cuneiform script… Today, we are surrounded by, and even wear, all kinds of state-of-the-art sensors. These devices produce the huge and complex volume of data that we have to collect, record, protect and analyze. We don't need to calculate by hand on clay tablets anymore. We have developed complex mathematical algorithms that can be processed in parallel across many computers in a cluster, so that we can obtain the answers we need quickly and efficiently. Nevertheless, the mantra of understanding “why”, identifying patterns, and predicting future behaviors remains the same. The objectives of today’s Data Scientists are also the same: helping to make our lives easier, solving our business problems and organizing society in a fairer and more intelligent way.
     Chema Alonso
  7. Introduction
     Big Data, and increasingly Artificial Intelligence, are hot topics in business and society, and many people and organizations want to master them well enough to apply them to their business or activities. This book gives many examples of how Big Data is used in different use cases and applications across a variety of sectors. In this book we let data “speak” to tell stories: stories that matter for business, for people and for society. The book also explains some fundamental concepts of Artificial Intelligence (AI); we feel there is a need to equip readers with the knowledge to sift through the many popular articles appearing on the impact of AI, and to let them form their own opinion.
     Much of the content of this book has been published on the Internet in LUCA's Data Speaks blog (data-speaks.luca-). Writing quality content is a time-consuming process, and we believe there is added value in packaging this content in a thematic and coherent way. While this printed book gives a snapshot of six months of blog writing, the blog continues to grow with one post per working day.
     This book is a joint effort of the LUCA team. LUCA is the Big Data business unit of Telefónica. The mission of LUCA is to accompany organizations (enterprises and public administrations) on their transformational journey to becoming data-driven.
     LUCA stands for the Last Universal Common Ancestor, the last organism that all life on Earth has in common. After LUCA, organisms started to diverge. In the same way, we believe that in the digital world, data is something that all organizations will have in common for the times to come, hence LUCA.
  8. The book is organized in three parts, each representing a specific theme:
     • Part I - Business. This part describes examples of how Big Data can be applied to create value for businesses in different sectors.
     • Part II - Big Data for Social Good. Apart from creating value for businesses, Big Data can also be used to improve the world by helping achieve the UN’s Sustainable Development Goals.
     • Part III - Artificial Intelligence. Although AI has existed for more than 50 years, it is currently (2017) surrounded by hype, and it is not always easy to understand the claims made about its impact. In this part we explain the fundamental concepts underlying Artificial Intelligence, so readers can develop their own criteria to sift through the large number of publications that are appearing today.
     There is no need to read the book in order, as the chapters are individual stories about, or told by, data. However, the chapters do appear in a deliberate order, from the more generic and theoretical to the more concrete and practical. While this book is a snapshot of the best content in the period between November 2016 and March 2017, new posts are published daily on our Data Speaks blog at data-speaks.luca-
  9. Part I – Business
  10. Big Data started entering the boardrooms of the Fortune 500 companies after McKinsey published its now seminal report on the value of Big Data for different sectors [1]. Since then, many large enterprises have started their Big Data journey, as part of their digital transformation, to become data-driven organizations. Many of those enterprises have invested significant amounts in setting up Big Data platforms and data science teams. However, a large percentage of them are struggling to see tangible results from their investments, and this seems to be more the rule than the exception.
      Experience has taught us that creating value from Big Data is not easy, as it involves much more than just technology. In our experience, companies need to work in parallel on the following tasks to start and complete their Big Data journey as part of their digital transformation. This is illustrated in Figure 1.
      • Data Engineering. Organizations that want to become data-driven should consider data a strategic asset, just like their financial or real estate assets. Taking data-driven decisions based on incorrect data may be worse than not being data-driven at all. Therefore, it is very important to have processes in place for data sourcing, data quality and data governance.
      • Tools & Infrastructure. You can’t take advantage of Big Data without the proper platforms and infrastructure. Companies have to decide where to store their data: in the cloud or on-premise. They also need to define the Big Data architecture, whether it is open source or based on a vendor's proprietary solution. To bring analytics capabilities to as many employees as possible, there are specialist tools that allow non-technical people to rapidly build dashboards or execute Machine Learning algorithms.
  11. • Data Science is the process of analysing all the data to solve specific problems. Techniques come from Artificial Intelligence, Machine Learning (including Deep Learning), Mathematics and Statistics. To do so, organizations need access to data engineers and data scientists, either in-house or outsourced.
      • Business Insights are the results of data science that help improve business processes and results. The more advanced organizations can even generate insights from their data that help other organizations become more data-driven.
      • Strategy & Transformation. While achieving those four tasks is necessary to become data-driven, it is not sufficient. For many Big Data use cases, people still form a critical element in the data value chain. The value of Big Data can only be realized if there are people interested in, willing to, or passionate about taking action based on the insights. That requires a data-friendly culture. A data-friendly culture implies that people are willing to share data that used to be the realm of heads of departments, even though sharing data may imply loss of control. It also means accepting the conclusions of the data even if you don’t like them. It requires managers to kill their own projects and products quickly if they do not give the expected results, and that is not easy.
      • Security is the last, but not the least, part of becoming a successful data-driven organization. It is very important because much of the data used originated as, or still is, personal data. Therefore, it is important to apply the principles of Privacy & Security by Design and by Default to all Big Data initiatives. Although this sometimes slows initiatives down, it must not be overlooked before anything goes into production.
  12. Figure 1 The ingredients of a successful Big Data journey.
      In the posts in the first part of this book we discuss relevant aspects of Big Data for businesses: aspects that organizations will probably struggle with on their journey to becoming data-driven. We will also give several concrete examples of value creation from Big Data applied to different sectors including tourism, traffic, media, and sports.
  13. From Data Exhaust to Data-Driven: How CEOs face Big Data [2]
      By Richard Benjamins
      Since Big Data became a buzzword in company boardrooms some years ago (thanks to McKinsey's report "Big Data: The next frontier for innovation, competition, and productivity"), many organizations have started Big Data initiatives in the hope of achieving its full potential. Over time, many companies have started pilot projects to address some of their most important business issues. Often, these initial steps have not shown immediate results, for a number of reasons that have been amply documented online.
      In our experience, one of the main reasons behind the failure of Big Data projects is poor data access and data quality. This is especially true for "non-digital native" companies, and stems from the fact that such organizations never considered that the data their systems generated could be of strategic value. In other words, data was considered an "exhaust": a side-effect or a mere by-product of running the business. While some of this data was put to use, for example in descriptive business intelligence (i.e. what has happened), data was never considered a strategic asset. Normally, organizations take meticulous care of their strategic assets and manage them explicitly, keeping a close eye on them at all times.
  14. Figure 2 Gartner infographic about CEOs on Data as an Asset
      When companies start their data journey, they often don't realize that their data has not been carefully collected or looked after. It might be incomplete, duplicated, hidden, incorrect or even missing. When Data Scientists first get their hands on the data, they have many questions, and they will find insights that do not make sense from a business perspective, perhaps even leading to wrong conclusions. Big Data Analytics and Machine Learning are no exception to the rule "garbage in, garbage out".
      For all of these reasons, it is important for organizations to have the right expectations when starting their data journey. We are not saying that a large upfront investment needs to go into data asset management, but that organizations must be aware of the potential pitfalls in their Big Data pilots. Ideally, business leaders should move things forward in parallel: starting to create value through pilots, but also starting on data management, so that when they are ready to industrialize or scale Data Science projects, the data is in good shape and a first-class asset.
  16. Big Data: What's the economic value of it? [3]
      By Richard Benjamins
      How do we put an economic value on Big Data initiatives in our organizations? How can we measure the impact of such projects on our businesses? How can we convince senior leadership to continue and increase their investment? In this post we share our perspective.
      Most of us who are familiar with the Big Data boom are also familiar with the big and bold promises made about its value for our economies and society. For example, McKinsey estimated in 2011 that Big Data would bring $300bn in value for healthcare, €250bn for the European Public Sector and $800bn for global personal location data [4]. Recently, McKinsey also published an estimate of what percentage of that originally identified value had become a reality as of December 2016: up to 30%, with the exception of location-based data at 50-60% [5].
      These astronomic numbers have convinced, and are still convincing, many organizations to start their Big Data journey. In fact, only recently Forbes [6] and IDC [7] have estimated that the market for Big Data and Analytics technology will grow from $130B in 2016 to $203B in 2020.
      However, these sky-high numbers do not tell individual companies and institutions how to measure the value they generate with their Big Data initiatives. Many organizations are struggling to put an economic value on their Big Data investments, which is one of the main reasons why so many initiatives are not reaching the ambitious goals they once set.
  17. So how can we put numbers on Big Data and Analytics initiatives? In our experience, there are four main sources of economic value:
      Reducing costs with Big Data IT infrastructure
      There are considerable savings to be made on IT infrastructure by moving from proprietary software to open source. The traditional model of Data Warehouse providers is to charge a license fee for the software and to charge separately for the professional services needed. Some solutions, in addition, come with specific hardware. Before the age of Big Data this model worked well, but with the increasing amount of data (much of which is unstructured and real-time), existing solutions have become prohibitively expensive. This, in combination with so-called "vendor lock-in" (committed investments and complexity make it very costly and hard to change to another vendor's solution), has forced many organizations to look for alternative, more economical solutions. The most popular alternative is now provided by the Open Source Hadoop [8] ecosystem of tools to manage Big Data.
      Open Source software has no license cost, and is therefore very attractive. However, to take advantage of Open Source solutions for Big Data, organizations need to have the appropriate skill set and experience available, either in-house or outsourced.
      The Hadoop ecosystem software runs on commodity hardware and scales linearly, and is therefore much more cost-effective. For those reasons many organizations have replaced part of their proprietary data infrastructure with Open Source, potentially saving up to millions of euros annually. While saving on IT
  18. doesn't give you the largest economic value, it is relatively easy to measure in the Total Cost of Ownership (TCO) of your data infrastructure, and therefore it is a good strategy to start with.
      Optimization of your business
      There is no question that Big Data and Analytics can improve your core business. There are two ways to achieve such economic benefits: by generating additional revenues or by reducing costs. Generating additional revenues means doing more with the same, or in other words, using Big Data to generate more revenues from the same resources. The problem is that it is not easy to decide where to start, and it can be hard to work out how to measure the "more". Reducing costs means doing the same with less, or in other words, using Big Data to make business processes more efficient while maintaining the same results.
      External Data Monetization
      Here, the economic value of Big Data is not generated by optimizing your business, but by new, data-centric business. This is only for organizations that have reached a certain level of maturity in Big Data. Once organizations are able to realize the benefits of Big Data in optimizing their own business, they can start looking to create new business around data, either by creating new data value propositions (i.e. new products with data at their heart) or by creating insights from Big Data that help other organizations optimize their business. In this case, measuring the economic value of Big Data is no different from launching new products in the market and managing their P&L.
  19. We believe that in the coming three to five years, the lion's share of the value of Big Data will come from business optimization, that is, from turning companies and institutions into data-driven organizations that take data-driven decisions. And those are exactly the kind of Big Data initiatives that organizations struggle to put an economic value on. Savings from IT are a good starting point, but will not scale with the business, while revenues from data monetization will become huge in the future, but are currently still modest compared to the potential value that can be generated from business optimization.
      Most businesses start their Big Data journey the right way. They make an opportunity-feasibility matrix, which plots the value of a use case against how feasible it is to realize that value. Figure 3 shows an example from EMC. The use cases to select would be those in the upper right quadrant:
  20. Figure 3 Opportunity Matrix for Big Data Use Cases - value versus feasibility.
      A good way to estimate the business value of a use case is to multiply the business volume by the estimated percentage of optimization. For instance, if the churn rate of a company is 1% per month and there are about 10M customers with an ARPU (average revenue per user, per month) of €10, then the business volume lost to churn amounts to €1M per month, or €12M per year. If Big Data could reduce the churn rate by 25%, that is, from 1% to 0.75%, then the estimated value would be €250,000 per month. As an example of a cost-saving use case, consider procurement. Suppose an organization spends €100M on procurement every year. Analytics might lead to a 0.5% optimization, which would amount to a potential value of €500,000 per year.
      There are hundreds of Big Data use cases, and the TM Forum [9] gives an extensive overview of some of the most relevant ones in the telecommunications sector.
      However, once the initial use cases have been selected, how should you measure the benefits? This is all about comparing the situation before and after, measuring the difference, and
  21. knowing how to extrapolate its value if it were applied as business as usual. Over the years, we have learned that there are two main issues that make it hard to measure and disseminate the economic impact of Big Data in an organization:
      1. Big Data is almost never the only reason for an improvement. Other business areas will be involved, and it then becomes hard to decide how much value to assign to Big Data.
      2. Telling the whole organization and top management about the results obtained. Giving exposure to the value of Big Data is fundamental in raising awareness and creating a data-driven culture in your company.
      With regard to point 1, Big Data is almost never the only reason for creating value. Let's consider the churn use case, and assume you use Analytics to better identify which customers are most likely to leave in the next month. Once the customers have been identified, other parts of the company need to define a retention campaign, and yet another department executes the campaign, e.g. by calling the top 3000 people at risk. Once the campaign is done and the results are in, it is hard to decide whether the results, or what part of them, are due to Analytics, to the retention offer, or to the execution through the call centres. There are two ways to deal with this issue:
      1. Start with use cases that have never been done before. An example of such a use case would be real-time, contextual campaigns. Real-time campaigns are not yet frequently used in many industries, and require Big Data technology. Imagine you are a mobile customer with a data tariff, watching a video. The use case is to detect in real-time that you are watching a
  22. video and that you have almost reached the limit of your data bundle. What usually happens in those cases is that you are either throttled or completely cut off from the Internet. Either situation results in a bad customer experience. In the new situation, you would receive a message in real-time telling you that your bundle is about to run out, and asking whether you want to buy an extra 500MB for €2. If you accepted this offer, the service would be provisioned in real-time, and you would be able to continue watching your video. The value of this use case is easy to calculate: simply take the number of customers that have accepted the offer, and multiply it by the price charged to the customer. Since there is no previous experience with this use case, few people will dispute that the value is due to Big Data and Analytics.
      2. Compare with what would happen if you didn't use analytics. The second solution is a bit more complex, but applies more often than the previous case. Let's go back to the churn example. It is unlikely that an organization has never done anything about retention, either in a basic or a more sophisticated way. So, when you run your Analytics initiative to identify customers that are likely to leave the company, and you get a good result, you can't just say that it is all due to Analytics. You need to compare it with what would have happened without Analytics, all other things being equal. This requires using control groups. When you select a target customer set for your campaign, you should reserve a small, random part of this set and treat it exactly the same as the target customers, but without the Analytics part. If you do so, then any statistically significant difference between the target set and the control group can be attributed to the influence of Analytics. For instance, if with this, you retain 2%
  23. more customers than the control group, you then calculate how much revenue you would retain annually if the retention campaign were run every month. Some companies are able to run control groups for every single campaign, so they can always calculate the "uplift" and continuously report the economic value that can be attributed to Analytics. However, most companies will only use control groups at the beginning to make and confirm the business case; once it is confirmed they consider it business as usual (BAU), and a new baseline has been created.
      With regard to point 2, sharing the results of Big Data within the organization in the right way is fundamental. In our experience, while business owners love Analytics for the additional revenues or cost reductions, at first they are not always willing to tell the rest of the organization about it. But evangelizing within the organization about the success of internal Big Data projects is critical to getting top management on board and changing the culture. Why would individual business owners hesitate to share? The reason is simple: we are human. Showing the wider organization that Big Data and Analytics create additional revenue makes some business owners worry about being given higher targets without more resources (apart from Big Data). Similarly, other business owners might not want to share a cost saving of 5%, since it might reduce their next budget accordingly. Haven't they shown, through Big Data, that they can achieve the same goals with less?
      This is an example of a cultural challenge. Luckily, such a situation cannot be sustained for long, and in the end all organizations get used to publishing the value. But it might be a problem especially at the beginning of the Big Data journey, when such economic numbers are most needed.
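The control-group comparison described earlier in this chapter can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the campaign size of 3000 comes from the text, but the control-group size, the retention counts, the ARPU and the `months_retained` parameter are hypothetical, and the pooled two-proportion z-test is one common way to check statistical significance, not necessarily the method the authors used.

```python
from math import sqrt
from statistics import NormalDist


def uplift_report(target_retained, target_size, control_retained, control_size):
    """Return (uplift, z, p_value) for target-group vs control-group retention."""
    p_target = target_retained / target_size
    p_control = control_retained / control_size
    uplift = p_target - p_control

    # Pooled two-proportion z-test (an assumed choice of significance test).
    pooled = (target_retained + control_retained) / (target_size + control_size)
    std_err = sqrt(pooled * (1 - pooled) * (1 / target_size + 1 / control_size))
    if std_err == 0:
        raise ValueError("retention is 0% or 100% in both groups; test undefined")
    z = uplift / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided
    return uplift, z, p_value


def annual_retained_revenue(uplift, campaign_size, arpu,
                            months_retained=12, campaigns_per_year=12):
    """Extrapolate the uplift to a year of monthly campaigns.

    months_retained is a hypothetical assumption: how many months of ARPU
    each additionally retained customer is credited with.
    """
    extra_customers = uplift * campaign_size
    return extra_customers * arpu * months_retained * campaigns_per_year


# Hypothetical numbers: 3000 targeted customers, 300 held out as control.
uplift, z, p_value = uplift_report(2460, 3000, 240, 300)
print(f"uplift={uplift:.3f}, z={z:.2f}, p={p_value:.2f}")
print(f"annual value: EUR {annual_retained_revenue(uplift, 3000, 10.0):,.0f}")
```

With these made-up counts the 2-point uplift is not statistically significant, which illustrates why the size of the reserved control group matters before any value is attributed to Analytics.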
  24. Organizations that in the end do not succeed in measuring any concrete economic impact should not worry too much either. Experience teaches us that, whereas organizations in the early phase of their journey are obsessed with measuring value, more mature organizations know that there is value and no longer feel the need to measure improvements. Taking full advantage of Big Data has changed the way departments interact, and that is one of the main value drivers. Big Data has become fully integrated with Business As Usual. Big Data = BAU.
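To recap the estimation rule used in this chapter (business volume multiplied by the estimated percentage of optimization), the worked figures can be reproduced with a short sketch. The function names are illustrative only, and the number of accepted real-time offers is a hypothetical value that does not appear in the text; the other inputs are the chapter's own examples.

```python
def monthly_revenue_lost_to_churn(customers, monthly_churn_rate, arpu):
    """Business volume at stake: customers leaving each month times their ARPU."""
    return customers * monthly_churn_rate * arpu


def optimization_value(business_volume, improvement):
    """Estimated value = business volume x estimated percentage of optimization."""
    return business_volume * improvement


def offer_value(accepted_offers, price):
    """Value of a brand-new use case: accepted offers times the price charged."""
    return accepted_offers * price


# Churn example: 10M customers, 1% monthly churn, ARPU of EUR 10.
churn_volume = monthly_revenue_lost_to_churn(10_000_000, 0.01, 10)
print(f"{churn_volume:,.0f}")                               # 1,000,000 per month
print(f"{churn_volume * 12:,.0f}")                          # 12,000,000 per year
print(f"{optimization_value(churn_volume, 0.25):,.0f}")     # 250,000 per month

# Procurement example: EUR 100M annual spend, 0.5% optimization.
print(f"{optimization_value(100_000_000, 0.005):,.0f}")     # 500,000 per year

# Real-time bundle offer at EUR 2; 20,000 acceptances is a hypothetical figure.
print(f"{offer_value(20_000, 2):,.0f}")                     # 40,000
```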
  25. 54% of organizations now have Chief Data Officers, but should mine? [10]
      By Richard Benjamins
      With Big Data becoming such a big deal in the world of business, it is no surprise that the Chief Data Officer (CDO) has managed to wriggle its way into an extra seat around the boardroom table. More and more organizations, in both the private and public sector, consider data to be a strategic asset, and for this reason the most forward-thinking companies are appointing CDOs. In fact, according to a recent survey [11], 54% of firms now report having appointed a CDO, up from just 12% in 2012.
      Until the appearance of this new role, Business Intelligence (BI) and Big Data initiatives had often been scattered throughout organizations, working in isolated departments, even if there was supposedly a central BI department keeping tabs on the overall company data strategy. So, what kind of questions will an organization be asking itself ahead of appointing a CDO? We thought of a few:
      • How far should the CDO be from the CEO? CEO-1 or CEO-n?
      • If it is CEO-1, how does the CDO relate to the other officers, in particular the CIO and CTO?
      • If it is CEO-n, to which officer should the CDO report? The CIO, COO, CMO, CFO, the Chief Transformation Officer, or the Chief Digital Officer?
  26. 26. Data Stories that Matter 25 To leverage the full potential of data, the CDO is best placed in an area whose mission is cross-company and that represents a large chunk of the business. In this way, the value creation is not limited to one specific area (e.g. marketing), and the value is relevant for the business. Doing otherwise creates a bias towards creating value only from data in a specific area, or in an area that doesn't really matter. Therefore, many argue that the best place for the CDO is at CEO-1, or at CEO-2 under the COO, whose remit is cross-company. Having the CDO report directly to the CEO gets him or her a seat on the Executive Committee, which delivers a strong message both internally and externally. There are two alternative Officers who also ensure cross-organizational application and relevance: the Chief Transformation Officer and the Chief Digital Officer. While those two roles are temporary by nature (albeit lasting several years), they work in a cross-organizational manner and are tasked with the mission of adapting their business to the digital world, of which data is a pivotal part. Of course, having the CDO report directly to the CEO is not necessarily suitable for all organizations at all times. It requires a level of "data literacy", and is likely to be reserved for the more forward-looking organizations who really know and embrace the fact that they have to adapt to the digital world in a data-driven way. So why might organizations not yet want a CEO-1 position for the CDO? • Some companies may be too immature from a data perspective (i.e. not fully data-literate) and therefore might want to place the CDO under the CIO with IT to
  27. 27. Data Stories that Matter 26 make sure that there is sufficient quality data before starting to exploit it. • Some organizations have a very clear idea of where to start exploiting data, so they place it under the corresponding department. For example, companies in sectors such as FMCG with a strong interest in improving their consumer marketing might place the CDO under the CMO. Those who want to innovate with data might even place it under the CTO (R&D), whilst organizations which want to save money, might place it under the Global Resources Officer. In general, if the CDO is placed within a specific area, it normally implies that the CDO inherits some of the objectives of that area. If it is under marketing, then objectives will probably be phrased in terms of sales or revenues. If it is under Global Resources, then it will likely be related to savings. Helping areas outside of their specific area then becomes a best-effort thing, rather than a core responsibility - depending on the bandwidth of the area of the CDO. However, experience teaches us that it is challenging to see this kind of cooperation beyond the day-to-day corporate limits of KPIs. So, if an organization decides to place the CDO under one of the Officers without a cross-organizational responsibility, they create an unnecessary limitation to value creation from data. But why then are most CDOs not CEO-1, but -2 or sometimes even CEO-3 or -4? Below, we briefly list the pros & cons for why an organization might do it this way: Pros & Cons of CDO under:
  28. 28. Data Stories that Matter 27 Figure 4 The Pros and Cons of a CDO's position in the org chart Of course, whether a CDO is successful in his or her job does not only depend on how the role is placed in the organization, but it is an important factor. Other relevant factors, such as business sponsorship or a lack of clarity on the role, are discussed elsewhere online.12 In Telefónica, the CDO function was introduced to the Executive Committee at the end of 2015 and is currently held by Chema Alonso, whilst 5 years ago it was between CEO-5 and -4. Three years ago it became -3, then two years ago -2 and now it is CEO-1 - showing just how fundamental data is in Telefónica’s strategy going forwards in its quest to put customers at the centre of everything it does. Of course, this discussion is much more relevant for those organizations who are on their journey to becoming data-driven. However, there are many companies who are already data companies (i.e. their business is the data) and in their case, the CDO has very different requirements. Gartner wrote a report on the four types of Chief Data Officer Organizations highlighting that in data companies, the CDO is even more critical13. We think that in such companies the CDO might even be the CEO. We may
  29. 29. Data Stories that Matter 28 not know what the future holds for big corporates, but we do know that it will be driven by data.
  30. 30. Data Stories that Matter 29 Open Data and Business - a paradox?14 By Richard Benjamins While Open Data has a wide range of definitions, Wikipedia15 provides one of the most commonly accepted: "Open Data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control." From our perspective, the most important word in this definition is "freely". And we pose the question: does this mean that Open Data and Business are incompatible? The short answer: absolutely not. McKinsey stated in a 2013 report16 that Open Data (public information and shared data from private sources) can help to create $3 trillion a year of value in seven areas of the global economy. The opportunities that arise when data is opened up to the masses are clear. However, the longer answer is that anyone who has tried to get some Open Data and perform an analysis knows that this is not trivial. Open Data varies widely in terms of quality, formats, frequency of updates, support, etc. Moreover, it is very hard to find the right Open Data you are looking for. Today, most business and value from Open Data is generated through ad hoc consultancy projects that search for, find and incorporate Open Data to solve a specific business problem. However, one of the visions of Open Data is to create a thriving ecosystem of, on the one hand, Open Data publishers, and on the other hand, users, developers, start-ups and businesses that
  31. 31. Data Stories that Matter 30 process, combine and analyse this Open Data to create value (e.g. to solve specific problems, or to discover important and actionable insights). The current state of play is that those thriving ecosystems are still being formed, and there are several initiatives and companies that try to position themselves, mostly in specific niche markets. A few players in the field include: • OpenCorporates17. A large open database of companies in the world. • Transport API18. A digital platform for transport collecting all kinds of transport data, especially in the UK. • Quandl19. A financial and economic data portal. Those companies and organizations focus on aggregating Open Data in a specific niche area, and their business model is built around access to curated quality data. Other types of companies can then use this Open Data to run a specific business. A typical example of such a business is Claim my Refund20, which uses Transport Open Data (e.g. from Transport API) to automatically claim refunds for their customers in case there are delays on their underground trips in London. Another business model around Open Data is to help institutions publish their Open Data in a structured way. Such projects are mostly performed for governmental institutions: • Socrata21 and Junar22 are cloud platforms that allow government organizations to put their data online. • Localidata23 focuses on Location Data, especially in Spain. • FiWare24 is an independent, open community to build an open sustainable ecosystem around public, royalty-free
  32. 32. Data Stories that Matter 31 and implementation-driven software platform standards. Once the data is published as Open Data, developers and other companies can then access that data and build value-added applications. In the governmental space it is not uncommon for Public Administration to pay for having their data published as Open Data, and then to pay again for an innovative application that uses this Open Data to provide value to citizens (e.g. with information about schools). In conclusion, there is definitely a business model for Open Data: in the short term, around specific niche areas such as transport, or through ad hoc consultancy projects; in the mid-term, business will evolve around Open Data ecosystems fed by both the public and the private sector. However, the current state of play is relatively immature. The bottom line is that public Open Data still lacks quality and private Open Data is barely available. But this doesn't mean that Open Data is not powerful yet. A great example of this is an amazingly innovative advertising campaign by British Airways in Piccadilly Circus in London25, which uses only three Open Data sets. On a huge screen, a boy stands up and points to a passing plane, but only if it is a BA flight and it can be seen (i.e. there are no clouds). This advert is based on bringing together three data sources, which are all publicly available: GPS data, plane tracking data and weather data. This work illustrates the power of Open Data when combined with creativity.
  33. 33. Data Stories that Matter 32 International call traffic may tell you more than first thought26 By Pedro de Alarcón and Javier Carro This post debates the value of international phone calls in understanding society. Like many telecommunications operators, Telefónica has a wide global infrastructure of networks which can be used by other service providers to carry their international call and data traffic. Telefónica Business Solutions27 sells this service, amongst others, negotiating wholesale business deals. Its role throughout this process is to collect call and data traffic in one country (provided by a telecommunications operator) and effectively transport and pass this on to another operator in a different country. We recently had the chance to process and analyse a few months’ worth of data relating to this service. The aim was to understand the data and to discover some interesting facts with a simple analysis. All the information is stored and processed anonymously, ensuring a secure working environment. When dealing with voice calls, each record (or event) in the dataset can be summed up as the phone number from the country of origin, a destination phone number, a timestamp and the call duration. For a deeper analysis we could also use further parameters, but we won’t do that on this occasion. Whilst the phone numbers we deal with are anonymous, we can access the country code and in some cases the region or
  34. 34. Data Stories that Matter 33 province of the number of the person making the call and the person receiving the call. This dataset may be limited in terms of variety but it is expansive in terms of volume. In terms of structure it is very similar to some popular open data relating to air traffic28. In fact, this resemblance has allowed us to easily reuse some representations of the data that were previously developed by Carto29. Let’s listen to what the data tells us: Given the basic information that this dataset provides, the first thing we explored is the evolution of the total number of calls. The following graph is a representation of the daily traffic that was studied. Figure 5 A specific representation of the amount of calls managed by Telefónica. We can see a clear weekly pattern and the curious changes that happen in different weeks. We can easily see the weekly pattern with dips during the weekend. What is most notable is the variation from week to week, especially when the variations are very pronounced. The data is starting to show something worth debating, so what is the data trying to tell us? The answer becomes clearer when we start to travel between countries. For example, in the following graphic we can see the daily progress of the number of calls to Italy from various countries. The biggest peaks that appear on the right hand of the
  35. 35. Data Stories that Matter 34 graph are from the 24th of August 2016, which is when a large earthquake took place in Italy30 . Figure 6 Representation of the amount of calls made to Italy from different countries worldwide during the earthquake that took place on the 24th of August in Italy.
  36. 36. Data Stories that Matter 35 The data allows us to analyse international events when the ties between countries are noted through our data. Let’s hold that thought, the information starts to appear more subtly: why was the response of Ecuador or Argentina more notable than other countries? We can try to explain the situation with a few well thought out arguments, but, we want the data to do the talking. Google gives us a very useful tool to help us interpret this information we are finding. This platform is referred to as GDELT31 and it monitors in real time what’s happening in the world and the impact it is having. It also takes into consideration the language and where in the world it has happened. This means that we can further develop the information that we already have by combining the local and global. This tool can be used with the BigQuery platform from Google. Depending on how you choose to set the parameters the results may vary or you can simply stick to the preconfigured analytical tools. As an example, in June 2016, the United Kingdom voted over whether they would remain in the European Union. Does our data reflect this? It certainly does. We aren’t just talking about the immediate effect but also about the impact it had in the following weeks. We can see in Figure 7 the amount of calls between the United Kingdom and Belgium (location of the headquarters of the European Commission). The first marked date (in red) is the day of the vote (Thursday June 23, 2016). We can also see the impact in the weeks after the event. The second marked date, exactly one month later, coincides with the first published economic index which highlighted the economic contraction of the United Kingdom.
  37. 37. Data Stories that Matter 36 Figure 7 Representation of the amount of calls between the UK and Belgium around the time of the Brexit vote in the UK. The left red line is the date of the voting and the right red line is exactly one month later when the first economic index was published predicting an economic contraction of the UK. These initial investigations help to create a more formal model, where we can look at anonymous phone numbers or at geographical regions with their origin and destination. We can also create a series of indicators (amount of minutes sent and received), or this information can be combined so that it would be talking about a network or graph in which the hubs are the numbers or geographical regions, and the arcs connect those hubs with others where there has been traffic. This allows different types of analyses as illustrated in Figure 8.
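The network view described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the model used by the authors: the record layout (origin region, destination region, minutes), the sample values and the function names `build_traffic_graph`, `minutes_sent` and `minutes_received` are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical anonymised call records: (origin region, destination region, minutes).
# The values are invented for illustration only.
calls = [
    ("ES", "MA", 12.0),
    ("ES", "MA", 3.5),
    ("ES", "FR", 7.0),
    ("FR", "ES", 4.0),
]


def build_traffic_graph(records):
    """Aggregate minutes into a weighted directed graph {origin: {destination: minutes}}."""
    graph = defaultdict(lambda: defaultdict(float))
    for origin, destination, minutes in records:
        graph[origin][destination] += minutes
    return graph


def minutes_sent(graph, node):
    """Indicator: total minutes on arcs leaving `node`."""
    return sum(graph[node].values()) if node in graph else 0.0


def minutes_received(graph, node):
    """Indicator: total minutes on arcs arriving at `node`."""
    return sum(targets.get(node, 0.0) for targets in graph.values())


graph = build_traffic_graph(calls)
print(minutes_sent(graph, "ES"), minutes_received(graph, "ES"))
```

The nodes here are geographical regions, but the same structure would work with anonymised numbers as nodes. A nested dictionary keeps the sketch dependency-free; a graph library would be the natural choice for larger analyses.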
  38. 38. Data Stories that Matter 37 Figure 8 Type of data and suggested analysis. Figure 9 shows an example of a graph, drawn on a map, that represents the existing connections between Spain and other countries. It shows the situation on July 7, 2016, and highlights the connections with Islamic countries as it was the last day of Ramadan. It also shows links to countries that contribute to tourism in the summer.
  39. 39. Data Stories that Matter 38 Figure 9 Graph showing the connections between Spain and the rest of the world on the 7th of July 2016 (the end of Ramadan) Time Series The sequential and temporal nature of the data allows us to model it as a time series. The analysis of time series is a popular statistical discipline and therefore library functions have been developed in almost all programming languages which are regularly used in the world of data analysis (R32, Python and MATLAB). There are even free tools such as iNZight33 which allow us to do some basic analyses without even writing one line of code. As a first step before making any analysis, it is important to verify that our data series is stationary (the mean, variance and autocovariance of its values do not depend on time) and, if it is not, to transform it so that it is. A series of data from the call data set usually isn’t stationary, so this transformation is needed. Put simply, a time series like the one we have identified in the call traffic data can be divided into three parts that, added together, produce the original series.
  40. 40. Data Stories that Matter 39 • Trend: this depends on the volume of traffic that Telefónica processes with a particular country. It is mainly linked to the growth or contraction of the business. • Seasonality: There are notable weekly cycles, in which a significant increase in calls happens during the weekend. • Residuals: This is the difference between the values of the original series and the values explained by trend and seasonality. This is the interesting part of the data, as the peaks and troughs can be related to international events, public holidays or technical issues. Ultimately, the residuals are where we should look if we want to analyse what happened outside the normal trend and seasonality. Time series tools (such as the R packages zoo, xts or timeSeries) allow us to easily separate these three components, as is shown in Figure 10.
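The three components listed above can be illustrated with a classical additive decomposition. The sketch below is a minimal pure-Python version written for this example, assuming a weekly cycle (period of 7 days) and a synthetic series; it is not the R tooling mentioned in the text, and the names `moving_average` and `decompose_additive` are hypothetical.

```python
def moving_average(series, period):
    """Centred moving average over an odd `period`; the ends are left as None."""
    half = period // 2
    trend = [None] * len(series)
    for i in range(half, len(series) - half):
        trend[i] = sum(series[i - half:i + half + 1]) / period
    return trend


def decompose_additive(series, period=7):
    """Split a series into trend + seasonal + residual (classical additive scheme)."""
    trend = moving_average(series, period)
    # Average the detrended values for each position in the cycle (day of the week).
    buckets = [[] for _ in range(period)]
    for i, (value, t) in enumerate(zip(series, trend)):
        if t is not None:
            buckets[i % period].append(value - t)
    seasonal_means = [sum(b) / len(b) for b in buckets]
    # Centre the seasonal component so that it sums to zero over one cycle.
    offset = sum(seasonal_means) / period
    seasonal_means = [s - offset for s in seasonal_means]
    seasonal = [seasonal_means[i % period] for i in range(len(series))]
    residual = [
        None if t is None else value - t - s
        for value, t, s in zip(series, trend, seasonal)
    ]
    return trend, seasonal, residual


# Synthetic daily traffic (invented numbers): slow growth, a weekly cycle,
# and one unusual day (index 17) standing in for an external event.
weekly = [10, 12, 11, 9, 8, -25, -25]
series = [100 + 0.5 * day + weekly[day % 7] for day in range(28)]
series[17] += 50

trend, seasonal, residual = decompose_additive(series)
valid_days = [i for i, r in enumerate(residual) if r is not None]
spike_day = max(valid_days, key=lambda i: abs(residual[i]))
print("largest residual on day", spike_day)
```

The largest residual points back to the injected event, which is the kind of signal the text associates with earthquakes, votes or holidays. With real data, a library routine (for example an STL decomposition) would usually be preferable to this hand-rolled version.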
  41. 41. Data Stories that Matter 40 Figure 10 Breakdown into trend, seasonality and residuals of the number of minutes directed to one single country. The usual interest in doing a time series analysis is to be able to generate a predictive model that allows us to anticipate, for example, how much traffic we will have in the next few days. It can also help us to find the true outliers in the series (values outside the predicted intervals). Because of their reliability for short-term predictions, the family of exponential smoothing methods known as Holt-Winters34 has become popular, and these methods are available in analysis tools like Tableau or TIBCO Spotfire. See Figure 11.
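To make the idea of Holt-Winters exponential smoothing more concrete, here is a minimal sketch of the additive variant in plain Python. It is an illustration under stated assumptions, not the implementation found in Tableau or TIBCO Spotfire: the simple initialisation from the first two seasons, the smoothing parameters and the sample data are all choices made for this example, and `holt_winters_additive` is a hypothetical name.

```python
def holt_winters_additive(series, period, alpha, beta, gamma, horizon):
    """Forecast `horizon` steps ahead with additive Holt-Winters smoothing."""
    if len(series) < 2 * period:
        raise ValueError("need at least two full seasons to initialise")
    first, second = series[:period], series[period:2 * period]
    # Simple initialisation (an assumption of this sketch).
    level = sum(first) / period
    trend = (sum(second) - sum(first)) / period ** 2
    seasonal = [y - level for y in first]

    for t in range(period, len(series)):
        y = series[t]
        s_prev = seasonal[t % period]
        prev_level = level
        # Level: deseasonalised observation blended with the previous projection.
        level = alpha * (y - s_prev) + (1 - alpha) * (prev_level + trend)
        # Trend: change in level blended with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
        # Seasonal term for this position in the cycle.
        seasonal[t % period] = gamma * (y - level) + (1 - gamma) * s_prev

    n = len(series)
    return [
        level + h * trend + seasonal[(n + h - 1) % period]
        for h in range(1, horizon + 1)
    ]


# Invented example: four identical weeks of daily call volumes.
pattern = [120, 118, 121, 119, 117, 90, 85]
history = pattern * 4
forecast = holt_winters_additive(history, period=7, alpha=0.3, beta=0.1,
                                 gamma=0.2, horizon=7)
print([round(value, 1) for value in forecast])
```

On a perfectly periodic history like this one, the forecast simply repeats the weekly pattern. Observed values that fall far from such a forecast are candidates for the outliers mentioned above.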
  42. 42. Data Stories that Matter 41 The ARIMA35 models are more complex to apply but in most cases improve on the previous predictions, because they explicitly model how each value depends on earlier values of the series. Figure 11 Prediction of traffic generated with exponential smoothing (Holt-Winters). Multi-country social networks: The data generated through the use of social networks is used extensively by businesses, mainly for viral marketing or other viral processes. However, the main obstacles for businesses when trying to exploit these sources are complexity and cost. Telefónica has distinctive expertise in the construction of social network models, or SNA36 (Social Network Analysis), using information gained from calling patterns. Here we want to understand the relationships at country level that are formed through telecommunication social networks. We have been inspired by social initiatives like Combatting global epidemics37
  43. 43. Data Stories that Matter 42 with big mobile data and also Behavioural insights for the 2030 agenda38. Figure 12 gives us a first look at the data from this perspective. It only takes into account the volume of calls that were actually answered, and it can be interpreted by combining common sense with global socio-economic data. Figure 12 The amount of calls between countries by origin and destination for August 2016. This only includes the countries with the highest volumes of calls. There are good sources of international socio-economic data with which to contrast and complement what is observed in our data. For example, large amounts of economic data can be found in the economic observatory of MIT, in the Databank of the World Bank, or Eurostat. More social (and also economic) data is available in the United Nations or UNICEF databases. This type of data can be very useful even if there are issues related to temporal or spatial granularity.
  44. 44. Data Stories that Matter 43 Before continuing to understand how countries interact, we need to stop for a moment and think about how people behave when it comes to making a call. In Figure 13, we have divided the calls that are made daily into four different user groups: those who tend to call during working hours (green), those who call in their free time (blue), those who call during the weekend (red), and finally those who call at night (purple). Although this first division may seem simple, it allows us to differentiate the users who will normally call for personal reasons from those who call because of work-related activity. We can notice how, for example, the level of calls during the weekend easily exceeds those from Monday to Friday, and furthermore that once you are in one of these groups you tend to stay there. It is easy to say this was expected, but it is the data that has been able to confirm and quantify these statements. Figure 13 The daily evolution of calls made by users who normally call during office hours (green), those who call during the afternoon Monday-Friday (blue), weekend callers (red) and night callers (purple). Figure 14 shows a specific European zone and its communications.
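The four user groups described above could be derived along the following lines. This is only a sketch: the text does not give the exact time boundaries or the rule for assigning a user to a group, so the hours used here (09:00-18:00 as working hours, 18:00-23:00 as free time), the "most frequent bucket wins" rule, the sample log and the names `call_bucket` and `segment_users` are all assumptions.

```python
from collections import Counter
from datetime import datetime


def call_bucket(timestamp):
    """Assign one call to a time bucket (assumed boundaries, see lead-in)."""
    if timestamp.weekday() >= 5:  # Saturday or Sunday, at any hour
        return "weekend"
    if 9 <= timestamp.hour < 18:
        return "working_hours"
    if 18 <= timestamp.hour < 23:
        return "free_time"
    return "night"


def segment_users(call_log):
    """Label each anonymised user with the bucket in which they call most often."""
    counts = {}
    for user, timestamp in call_log:
        counts.setdefault(user, Counter())[call_bucket(timestamp)] += 1
    return {user: c.most_common(1)[0][0] for user, c in counts.items()}


# Hypothetical anonymised log: (user id, start time of the call).
call_log = [
    ("a1", datetime(2016, 8, 1, 10, 30)),  # Monday morning
    ("a1", datetime(2016, 8, 2, 11, 0)),   # Tuesday morning
    ("a1", datetime(2016, 8, 6, 12, 0)),   # Saturday
    ("b2", datetime(2016, 8, 6, 20, 0)),   # Saturday
    ("b2", datetime(2016, 8, 7, 9, 0)),    # Sunday
    ("c3", datetime(2016, 8, 3, 2, 15)),   # Wednesday, small hours
]

segments = segment_users(call_log)
print(segments)
```

Counting daily calls per segment would then give curves comparable to those in Figure 13. A production version would also need a rule for ties and for users with very few calls.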
  45. 45. Data Stories that Matter 44 Figure 14 Geographical representation of communications through a defined zone in Europe. To end this post, we suggest some interesting analyses that could be performed by combining this location data of calling behaviour with the professional / personal calling patterns: • We could analyse communications between eminently industrial zones and compare those with relation to commercial seaports which are connected by transport links. • The combined knowledge of country communications with historic immigration patterns could allow us to better understand migration questions. Would this mean that this call time information could become an indicator
  46. 46. Data Stories that Matter 45 for modern day immigration? It is possible that one day we might be able to predict the flow of people through data.
  47. 47. Data Stories that Matter 46 Using data-driven decisions in Brazil's tourism sector39 By Florence Broderick Brazil's tourism industry had a huge boost in 2016 with more than half a million tourists40 descending upon Rio de Janeiro for the Olympics - making it a record year with a total of 6.6 million international visitors to the country, a 4.8% increase on 201541. Injecting a total of $6.2 billion into the local economy, international tourism has become extremely important in the growth of the country - with Brazil now becoming the second most visited country in Latin America after Mexico42. To support Brazil’s step towards data-driven decision making in this sector, we have partnered with the state of Espírito Santo. With a population of almost 4 million and 40% of its territory on the coast43, Espírito Santo attracts a great number of national tourists from neighbouring states. This partnership allows the state to measure its progress and make decisions on its tourism offering based on Big Data - leading the digital transformation of the Brazilian tourism sector. Through special data compilation technology, Espírito Santo can understand the behavioural patterns of tourists as well as the profiles of visitors in certain locations throughout the state. In turn, decision-makers will be able to provide new statistics on the direct and indirect impacts of tourism on the local economy - as well as taking actions such as optimized marketing campaigns to attract more visitors from certain locations and of certain profiles.
  48. 48. Data Stories that Matter 47 The project, which was contracted by Secretaria do Turismo do Estado de Espírito Santo, will focus on the analysis of 10 touristic events in the state. By looking at the times of day when most visitors attend and where they come from, the state can understand which events are more profitable for the local economy, compare them year on year and make data-driven decisions about the organization of future events. The data will be used by the Tourism Observatory of Espírito Santo. Big Data and Tourism: How this Girona Festival became Data-Driven44 By Ana Zamora Every year, from the 9th to the 17th of May, Girona celebrates the “Temps de Flors,” one of the most popular flower festivals in Europe. For ten days, the streets of the city come to life with music, colour and the smell of exotic flowers. During this period, thousands of visitors flood the city enjoying the charm of this unique Catalonian festival with the backdrop of one of the most famous Game of Thrones filming locations45. For 2 years in a row, we have been working with Girona's local government to enable them to take a more data-driven approach to this touristic event, ensuring that the festival is as successful as possible for the organisers. A study was conducted to analyse crowd behaviour by aggregating and anonymising mobile network event data to provide actionable insights to decision-makers in the public sector in areas such as mobility, infrastructure planning and in this case, tourism. The study46 enabled the city of Girona to become a pioneer in Big Data analysis for tourism, analysing millions of mobile data
  49. 49. Data Stories that Matter 48 events per day to understand the behaviour of tourists as well as where they come from and how long they stay. The compilation technology prioritises security and privacy at all times, carrying out a robust and exhaustive anonymization and aggregation process to analyse the movements of groups of people, rather than individual tourists, to provide trend insights and patterns. After this, an extrapolation is also applied to provide an accurate representation of both national and international tourists. Key insights allowed Girona to estimate that they had a total of 244,199 visits to the city during the festival, with 90% of visitors coming from Catalonia, 2% from the rest of Spain and 8% from other countries. Figure 15 Thousands of people walk the streets of Girona during Temps de Flors. Furthermore, of the 92% of visitors coming from Spain, we could identify that 60% were from regions within Girona, 35% from Barcelona, 2% from Tarragona and 1% from Lleida. The study also showed that the gender and age split was consistent among the different regions, apart from in Girona
  50. 50. Data Stories that Matter 49 where visitors were slightly younger on the whole compared to other areas of Catalonia. Figure 16 Heat map of national tourists on the Festival de las Flores. The study also saw in the analysis that 18,881 visitors came from outside of Spain, 82% of which came from 9 countries: France, Holland, Germany, Belgium, Great Britain, Italy, Poland, Russia and the USA. French visitors were the most prominent, accounting for 45% of the total. Some years ago, the only way of obtaining this kind of data was by carrying out more traditional visitor surveys. However, now, thanks to Big Data, it is possible to obtain an in-depth analysis of the movements and behaviours of large groups of people. These tourism insights are extremely valuable to both public and private sector decision makers, as they allow them to adapt their offering to give tourists an even better experience.
  51. 51. Data Stories that Matter 50 Can Big Data reshape the Outdoor Media sector?47 By Florence Broderick Out-of-home (OOH) adspend in the UK rose to £1 billion in 2014, and is predicted to grow by 4.8% in 2016 according to a recent report48. To ensure they benefit from this growth, Outdoor Media players are looking to embrace top technological trends such as Big Data and the Internet of Things, allowing them to sell audiences: moving from panels to people. At the same time, global spend for programmatic digital display advertising is estimated to reach $53 billion by 201849 and OOH will play a big part in that growth. So what does this shift towards near real-time mean when it comes to data? How can leading companies in the sector adapt their data strategy to get closer to their audiences and communicate in the most digital way possible? How do they propel themselves from the realms of traditional Business Intelligence into the world of Big Data? Alongside more traditional data sources such as surveys and panels, the mobile phone and its corresponding mobile event data offer a unique opportunity to organisations who want to understand their users (or audiences) better, with 90% of people keeping their phone within 1 metre reach, 24 hours a day. Using anonymized and aggregated data, mobile data compilation solutions allow OOH decision makers to understand their audiences by converting mobile event data into actionable insights to help them bring more value to their customers, enabling them to sell inventory in a more data-driven way.
  52. 52. Data Stories that Matter 51 Figure 17 Insights provided by mobile data compilation solutions In the UK, our team of expert Data Scientists and Data Engineers process and analyse over 4 billion mobile data events per day, providing extrapolated data so that our insights represent the entire population - giving an accurate picture of the behaviour of users in the area of study to a range of clients in different sectors. Figure 17 shows the insights that can be generated. Recently, our team have been working with Exterion Media, the largest privately owned Outdoor Media company in Europe. Exterion's data strategy team were keen to move from static and modelled information to more dynamic, mobile-based audience-led data. To do this, the team developed a world-class analytics tool which is on the desktops of all of Exterion's sales team. This software empowers them to optimize their sales to clients demonstrating insights on how their audiences travel throughout the London Underground network, showing them where they can find them and when they can find them - which is particularly relevant for retail clients who want to drive
  53. 53. Data Stories that Matter 52 footfall, targeting their audience at the right time in the right place. Figure 18 An example of the Insights Viewer. Data Strategy Director Mick Ridley said: "this has really given us the opportunity to speak confidently about our audiences and how we engage with our media throughout the day, giving clients the ability to target more effectively." To accelerate digital transformation, expanding and diversifying data sources in all industries is fundamental. This ground- breaking project with Exterion clearly shows their commitment to being a data-driven company, leading innovation in their space to bring even greater benefits to their customers.
  54. 54. Data Stories that Matter 53 How these 4 sports are using Data Science50 By Richard Benjamins and Florence Broderick Thousands of companies around the world may have started their journey to become data-driven, harnessing the full potential of Big Data. However, the world of professional sports is only just starting to explore how Data Science can be applied to gain a competitive advantage. Until now, sports coaches have been able to boast about their experience or their gut feelings when making decisions and have therefore been somewhat resistant to the world of Big Data - something which we all saw so perfectly illustrated in Moneyball, where Brad Pitt shows the tension between human experience and data-driven decision making. However, things are changing - and slowly but surely we're starting to see a lot more research around the role of data in sport as well as an increasing number of jobs working directly with professional sports teams to enhance their performance. But which sports are leading the way? We took a look: 1. Formula 1 Formula 1 teams are pioneers when it comes to data-driven decisions. With every race generating huge amounts of data on the track, vehicles, conditions and drivers, Williams saw a unique opportunity. They optimized the team's pit stops by taking biometric measurements from the technical team, allowing them to understand when each team member functions optimally.
  55. 55. Data Stories that Matter 54 Eventually, they ended up reducing their pit stop time to 1.92 seconds - the fastest ever recorded.51 Figure 19 The Williams F1 team holds the record for the fastest pit stop in 1.92 seconds. 2. Football Some years ago, we obtained some data from the Spanish football league for the 2012-2013 season, allowing our Data Scientists to carry out an in-depth analysis. The data was generated by cameras that take up to 10 photos per second, and post-processed so that individual players can be identified. In Figure 20, you can see heat maps of Barcelona vs Atletico Madrid. The area represents the field, and the goal of the team is located in the “pointed” parts with the darkest colours. The darker the colour, the longer the players are at a certain location. It becomes immediately clear that Barcelona were more of an attacking team throughout that season, unlike Atletico who tended to have a more defensive approach.
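A heat map of this kind can be approximated by counting how many tracking samples fall into each cell of a grid laid over the pitch. The sketch below is only an illustration: the pitch dimensions, the grid size, the coordinate convention (metres from one corner) and the sample data are all assumptions, and only the 10 samples per second comes from the text.

```python
from collections import Counter

# Assumed pitch size in metres; the text does not give the real dimensions.
PITCH_LENGTH_M = 105.0
PITCH_WIDTH_M = 68.0


def occupancy_grid(positions, cols=21, rows=14):
    """Count tracking samples per grid cell (about 5 m x 5 m with the defaults)."""
    counts = Counter()
    for x, y in positions:
        # Clamp samples on the far touchline or goal line into the last cell.
        col = min(int(x / PITCH_LENGTH_M * cols), cols - 1)
        row = min(int(y / PITCH_WIDTH_M * rows), rows - 1)
        counts[(col, row)] += 1
    return counts


def time_in_cell_seconds(counts, samples_per_second=10):
    """Convert sample counts into seconds spent in each cell."""
    return {cell: n / samples_per_second for cell, n in counts.items()}


# Made-up samples: 30 near one goal, 10 near the other.
samples = [(12.0, 30.0)] * 30 + [(92.0, 30.0)] * 10
grid = occupancy_grid(samples)
seconds = time_in_cell_seconds(grid)
```

The darker cells of the heat map correspond to the cells with the largest values in `seconds`.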
  56. 56. Data Stories that Matter 55 Figure 20 Barcelona's pitch activity (left) vs Atletico Madrid's pitch activity (right). It was also possible to follow individual players, and in Figure 21, we can see the paths of two players throughout a match. The green points show that the player ran at approximately 5 m/s (the equivalent of running 100m in 20 seconds) and red points at approximately 7 m/s. It is clear that the first player runs much more than the second, but what does that mean? That the first player is better than the second? That they have different roles? Looking at only this data, if you were the trainer, which player would you prefer to buy? Figure 21 The "work rate" of Xavi Hernandez (left) vs Leo Messi (right).
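The speeds behind the coloured points can be derived from consecutive positions, and the "100m in 20 seconds" equivalence is simple arithmetic. The following is a minimal sketch, assuming positions in metres sampled 10 times per second; the colour thresholds and the sample track are our own illustrative choices rather than the method used for Figure 21.

```python
import math


def speeds_m_per_s(track, samples_per_second=10):
    """Speed between consecutive (x, y) samples, in metres per second."""
    return [
        math.dist(a, b) * samples_per_second
        for a, b in zip(track, track[1:])
    ]


def colour_for(speed, green_from=5.0, red_from=7.0):
    """Assumed colour bands: green from about 5 m/s, red from about 7 m/s."""
    if speed >= red_from:
        return "red"
    if speed >= green_from:
        return "green"
    return "other"


def seconds_for_100m(speed):
    """Time needed to cover 100 m at a constant speed."""
    return 100.0 / speed


# Made-up track: three positions, 0.1 s apart.
track = [(0.0, 0.0), (0.55, 0.0), (1.35, 0.0)]
speeds = speeds_m_per_s(track)
colours = [colour_for(s) for s in speeds]
```

Here `seconds_for_100m(5.0)` gives 20 seconds, matching the equivalence quoted above.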
  57. 57. Data Stories that Matter 56 Well, the first player is midfielder Xavi Hernandez, and the second player is Leo Messi, who doesn't need any further introduction. 3. Cycling More recently, we had the opportunity to analyse data from the 2016 "Vuelta a España", looking at Movistar Team's performance. We had access to the data of 8 cyclists from the team throughout the 21 stages from start to finish. Every second, 7 types of data are captured for each cyclist, resulting in more than 2 million data feeds. The variables captured include location, altitude, force, speed, heart rate and pedal rate. Figure 22 The Movistar Team looking at their Big Data. With this data, apart from analysing individual cyclists, it becomes possible to analyse how the team works together, and to understand and compare different stages. Looking at the
  58. 58. Data Stories that Matter 57 data, it becomes very evident that professional cycling is a team sport with differentiated roles for the different team members: today it is impossible to win one of the main competitions "flying solo". What we have learned is that it is important to:
• Understand when team members peak in terms of performance, so that training can be planned for peaks to coincide with competitions.
• Determine which context variables (altitude, weather), training variables and personal cyclist variables have the greatest impact on the cyclist's performance and subjective experience.
• Combine the roles that cyclists play in the different stages with performance and fatigue variables to plan the recovery of the cyclists and the next stages during the competition.
4. Cricket Cricket, which is the most popular sport in India and the second most popular sport in the world52, is also embracing the growing value of Big Data. IBM launched their #ScorewithData53 campaign during the Cricket World Cup, which included a Social Sentiment Index that correctly predicted who would win certain phases of the tournament. The England Cricket team have also been pioneers and their former team coach, Peter Moores, even said "we use advanced data analytics as the sole basis for some of our decisions – even affecting who we select for the team54." Nathan Leamon, who was hired by the new head coach for his expertise in maths and statistics, also used to create spreadsheets using Hawk-Eye technology to run match simulations which ended up being
  59. 59. Data Stories that Matter 58 accurate to within 5% - breaking the field up into different segments for players to target when batting. Figure 23 Big Data in the world of cricket. As you can see, Big Data and Data Science aren't just limited to the world of big business - they are in fact affecting every single part of our lives. In the context of sport, teams will need to embrace data on and off the field if they want to fill up their trophy cabinets any time soon.
  60. 60. Data Stories that Matter 59 Part II – Big Data for Social Good – Data with a Soul
  61. 61. Data Stories that Matter 60 In the first part we have seen how Big Data can improve business by discussing some relevant themes and concrete examples. However, Big Data is not only good for business, but also for society. Big Data can help improve the world. “On September 25th 2015, the United Nations adopted a set of goals to end poverty, protect the planet, and ensure prosperity for all as part of a new sustainable development agenda55. Each goal has specific targets to be achieved over the next 15 years. For the goals to be reached, everyone needs to do their part: governments, the private sector, civil society and people like you.”56 The 17 resulting Sustainable Development Goals of the United Nations have 169 targets to be achieved by 2030, and they are measured through 241 KPIs. Not all of the 241 indicators are equally easy to measure. The Inter-Agency and Expert Group on SDG Indicators57 has divided these indicators into three tiers (as of March 2016):
• Tier I comprises 98 indicators (41%) for which statistical methodologies are agreed and global data are regularly available;
• Tier II comprises 50 indicators (21%) with clear statistical methodologies, but little available data;
• Tier III comprises 78 indicators (32%) for which there are no agreed standards or methodology and there is no data available; and
• 15 indicators (6%) are still unclassified.
It is the responsibility of the Offices of National Statistics to monitor all KPIs, and government Open Data is also set to play an important role. However, as we can see from the different Tiers, there are still many KPIs without data or measuring methodology. We believe that private big data from specific sectors can help measure those KPIs, in particular data from
  62. 62. Data Stories that Matter 61 mobile phone operators, satellite images, financial institutions and supermarkets. Figure 24 shows an overview of sectors whose private data has been used for research projects contributing to the SDGs58. Figure 24 Number of research projects that use private data from specific sectors. Source: World Bank. Figure 25 shows some examples of how private big data can be used to estimate KPIs of the targets of the Sustainable Development Goals. For example, payment data from financial institutions can help to estimate a consumer price index or a poverty index. While there are official indicators to measure those indexes, they are usually difficult to measure in rural areas of developing countries. Search queries in Google have been used to estimate and/or follow the propagation of influenza outbreaks. Satellite images have been used to estimate GDP growth through measuring light emissions. Mobile phone data has been used to estimate the rate of illiteracy in developing countries (ratio between SMS and calls) and to predict socio-economic levels.
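The SMS-to-call ratio mentioned above is straightforward to compute once events are aggregated per region. The sketch below uses made-up region names and counts, and the assumption that a low ratio flags a region for closer study is ours; turning such a proxy into an illiteracy estimate would need calibration against survey data, which is not shown here.

```python
def sms_to_call_ratio(sms_count, call_count):
    """Ratio of SMS sent to calls made; None when there are no calls."""
    if call_count == 0:
        return None
    return sms_count / call_count


# Hypothetical aggregated counts per region: (SMS, calls).
regions = {
    "region_a": (1200, 4000),
    "region_b": (150, 3000),
}

ratios = {
    name: sms_to_call_ratio(sms, calls)
    for name, (sms, calls) in regions.items()
}

# Regions ordered from lowest to highest ratio (no zero-call regions here).
ranked = sorted(ratios, key=ratios.get)
```

Working on aggregated counts rather than individual records also keeps the calculation compatible with anonymized data.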
  63. 63. Data Stories that Matter 62 Figure 25 Use of private data for measuring KPIs of targets of the SDGs. A quick analysis of all the Tier II and Tier III targets shows that initially there are about 10 clear KPIs where mobile phone data could be of help, as shown in Table 1. A more thorough analysis is needed to see how private data can contribute to support measuring of Tier II and Tier III KPIs, and to understand how it can improve some of the Tier I measurements in terms of frequency and granularity.
Table 1 Tier II and III KPIs of targets of the SDGs where Telco mobile data could help.
Goal | Target | Indicator | Description | Measure with Mobile Data? | Crisis management | Other data? | Tier proposed by agency | Tier revised by
1 | 1.5 | 1.5.1 | Number of deaths, missing persons and persons affected by disaster per 100,000 people | Yes | Yes | www.emdat.be | Tier II | Tier II
4 | 4.4 | 4.4.1 | Proportion of youth and adults with information and communications technology (ICT) skills, by | Yes | No | open data on population | Tier II | Tier II
5 | 5.b | 5.b.1 | Proportion of individuals who own a mobile telephone, by sex | Yes | No | open data on population | Tier II | Tier II
9 | 9.1 | 9.1.1 | Proportion of the rural population who live within 2 km of an all-season road | Yes | No | open data on population | Tier II | Tier III
10 | 10.2 | 10.2.1 | Proportion of people living below 50 per cent of median income, by age, sex and persons with | Yes | No | open data | Tier III | Tier III
11 | 11.2 | 11.2.1 | Proportion of population that has convenient access to public transport, by sex, age and | Yes | No | open data on population | Tier II | Tier II
11 | 11.3 | 11.3.1 | Ratio of land consumption rate to population growth rate | Yes | No | open data on population | Tier II | Tier II
11 | 11.5 | 11.5.1 | Number of deaths, missing persons and persons affected by disaster per 100,000 people | Yes | Yes | www.emdat.be | Tier II | Tier II
There are now about 8 billion mobile phones in the world59 and collectively, the activity of those mobile phones generates a huge amount of data. Much research60,61,62 has shown that this Big Data can function as a proxy reflecting important
  64. 64. Data Stories that Matter 63 alterations happening to crowds of people upon important events. The challenge is now to turn those promising research results into concrete systems that help improve the world and save lives on a continuous basis. Issues to resolve include privacy and the structural involvement of the private sector, apart from the public sector. In this part, we will discuss the main concepts of how Big Data can be used for Social Good and present various concrete examples. The first posts are about the motivation and the challenges to scale up Big Data for Social Good.
  65. 65. Data Stories that Matter 64 Big Data: moneymaker and force for social good63? This article was originally posted by the Chairman of Telefónica, José María Álvarez Pallete, on the World Economic Forum blog64. Big data and, more recently, artificial intelligence have become some of the world’s favourite boardroom buzzwords in the past three years. Every CEO feels some level of nervous excitement about the immense opportunity of big data, with the International Data Corporation predicting revenues will rocket from $122 billion last year to more than $187 billion in 201965. Forrester Research forecasts that the big data technology market will grow three times faster than the IT market overall66. All this buzz about becoming data-driven has led us to make significant investments in big data technology, convinced that data scientists and their advanced analytics will give us answers and reshape our businesses as we know them. But how should we be measuring our success when it comes to big data? Is it a lift in our average revenue per user (ARPU)? Is it a slight drop in our churn? Or is it a marginal increase in the efficiency of our network? For me, it’s all of those things, but equally, as a private sector multinational, we want to ensure that we are going above and beyond to apply the capabilities we develop in big data for social good. When I think about big data in telecommunications, I think about the 350 million customers we serve worldwide and the 23 billion mobile events they create every day in 21 countries. When I think about social good, I think about the commitments we have all made to the UN when it comes to the 17 Sustainable Development Goals (SDGs) for 2030, which represent 169
  66. 66. Data Stories that Matter 65 targets with 241 proposed indicators. Forging a relationship between our big data and work for global social good is fundamental, especially as 80% of the 6 billion mobile phones in the world are in developing countries (Figure 26), which is where we can have the greatest impact. Figure 26 Mobile cellular subscriptions per 100 inhabitants. Source: World Bank. To maximize this impact, I strongly believe we have to go much further than just telco data. To accurately measure our progress on the SDGs we will need to focus on goal 17, which is about working in partnership to achieve the goals. Collaboration between the public and private sector is crucial to advance a global open data initiative, but it is also of great importance to ensure more private sector data is used. This means bringing together data from the financial services sector, utilities providers, retailers and search engines, amongst others, to investigate how combining multiple data sources can provide actionable insights for policy-makers and NGOs. As an example of this, linked to SDG 3 on good health and well-being, our Telefónica R&D team carried out research on our data in Mexico during the H1N1 flu outbreak67. Human mobility directly accelerates the spread of diseases, so, using our data
  67. 67. Data Stories that Matter 66 compilation technology,68 we investigated mobility patterns before and after the government advised citizens to stay at home. This showed that only 30% of people actually stayed at home, whilst 70% showed barely any changes in their day-to-day behaviour (see Figure 27). Over time, this data-driven approach to epidemic response will inevitably help us to control such challenges for global health. Figure 27 Shutting down key infrastructure reduces mobility by between 10% and 30% and consequently disease propagation by 10%. (Legend: intervention phases Alert, Closed and Shutdown; mobility during interventions compared with normal mobility.) Equally, data from the financial services sector can also help us achieve SDG 13, which focuses on climate action. In September 2016, BBVA used its sale payments and ATM cash withdrawal data to measure people’s economic resilience to natural disasters during Hurricane Odile69, one of the most destructive hurricanes in almost 25 years in Mexico. Of course, combatting climate change requires a shift in the way we behave, but big data projects such as this, or our study on commuting and pollution70, are pivotal in ensuring climate doubters start to take this seriously. We’re already taking it seriously, having committed to run 100% of our business with renewable energy sources by 203071.
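A before-and-after comparison like the one in the H1N1 study can be illustrated with a very simple mobility measure. The sketch below assumes anonymized user IDs and, for each day, the list of locations (for example cell towers) where a user was seen; the measure, the 50% threshold and the data are illustrative assumptions, not the method used by the Telefónica R&D team.

```python
def mean_daily_locations(days):
    """Average number of distinct locations seen per day."""
    return sum(len(set(day)) for day in days) / len(days)


def reduced_mobility(before, after, min_drop=0.5):
    """True if mobility fell by at least `min_drop` (assumed threshold)."""
    baseline = mean_daily_locations(before)
    during = mean_daily_locations(after)
    if baseline == 0:
        return False
    return (baseline - during) / baseline >= min_drop


# Made-up data: user -> (days before the advice, days after the advice).
users = {
    "u1": ([["A", "B", "C", "D"], ["A", "B", "C", "E"]], [["A"], ["A", "B"]]),
    "u2": ([["A", "B", "C"], ["A", "C", "D"]], [["A", "B", "C"], ["A", "B", "D"]]),
    "u3": ([["A", "B"], ["A", "C"]], [["A", "B"], ["A", "C"]]),
    "u4": ([["A", "B", "C"], ["A", "B"]], [["A", "B"], ["A", "B", "C"]]),
}

flags = {uid: reduced_mobility(before, after) for uid, (before, after) in users.items()}
share_reduced = sum(flags.values()) / len(flags)
```

In this toy data set only `u1` changes behaviour, so `share_reduced` is 0.25. In practice only the aggregated share would be reported, never the per-user flags.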
  68. 68. Data Stories that Matter 67 Making big data for social good a success has its challenges. Chief Data Officers are inevitably concerned about privacy and security. Legal implications vary in every country, meaning that anonymization and aggregation processes need to be adaptable and exceptionally robust. Equally, Chief Communications Officers are worried about what using customer data does to their organizational reputation – even if it has an overwhelmingly positive outcome for society. However, if we want to make big data the success everybody projects it to be, then we must ensure we overcome these challenges and start measuring its success, not only on its commercial potential, but also its ability to bring value to society. If we can find a consistent way for NGOs and the private and public sectors to work together, such as the Open Algorithms Project (OPAL72 ), then we will see big data as a different kind of buzzword: not just as a money-maker, but as a society-shaker.
  69. 69. Data Stories that Matter 68 The 6 challenges of Big Data for Social Good73 By Richard Benjamins Many of us are familiar with the Sustainable Development Goals74 set by the United Nations for 2030, and more and more companies and organizations are contributing to their achievement75. However, there are some specific companies in certain sectors that hold invaluable assets which can be key in accelerating the journey towards achieving these goals. One of those assets is Big Data. A data-driven approach can be taken for each and every one of the Sustainable Development Goals, using data to measure how the public and private sector are progressing, as well as helping policy makers to shape their decisions and have the greatest social impact possible. As we can see in Figure 28, there are many different use cases that can be considered by organizations.
  70. 70. Data Stories that Matter 69 Figure 28 Big Data can support the SDGs. Source: United Nations Global Pulse. However, many of the examples above refer to one-off projects and pilots, and the real acceleration of the SDGs will come from running these projects on a continuous basis with (near) real-time data feeds to ensure stability and continuity for the next generation of social Data Scientists. So what are the biggest challenges for companies and organizations that want to contribute their data for the greater good? Is it risky that the data has to leave the company's premises for analysis by other organizations? We've outlined
  71. 71. Data Stories that Matter 70 the challenges decision makers are currently facing when it comes to Big Data for Social Good:
1. Privacy & Security. Data needs to be anonymized and aggregated. But will the anonymization process be good enough? Is it impossible to re-identify customers or users? Once the data is somewhere else, how secure is it? If it becomes a constant data feed, how safe is it?
2. Legal. For many companies, most of the relevant data is based on customer data. And although it is likely to be anonymized, aggregated and extrapolated, there is no full consensus on whether this is allowed or not. Organizations also have to deal with a wide range of different data protection laws in the different countries across their footprints.
3. Corporate reputation. Even if things are completely legal, professionals may still worry about public opinion and how customers may see things differently. What happens after a data breach, even if the use of data had a social purpose?
4. Big Data is the Key Asset. Businesses also have strategic commercial issues that they may struggle with. Many companies have only just learned that Big Data is a key asset, so they may ask why they should share it with someone else, even if it is for the greater good.
5. Competition. Could the competition get hold of my data (asset) and make inappropriate use of it? How would I explain that one in the boardroom? The competition is tough and sending data to an external platform has most CSOs concerned.
  72. 72. Data Stories that Matter 71 6. Cannibalization. Does this use of data for social good cannibalize some of my external Big Data revenue? What if I jeopardize an existing business opportunity in order to carry out a Big Data for Social Good project? However, there is an existing solution which addresses the first three challenges. The OPAL Project76 (which stands for Open Algorithms) doesn't require companies to move their data off their premises; it stays where it is. Using OPAL, the algorithms are transferred to the data, are certified (against viruses and malware) and produce the insights they are designed for (ensuring quality). Although simple, this is an extremely powerful technology and, as it is an Open Source project, all software developed will be freely available. The algorithms will be developed by the community and certified by OPAL. OPAL is still at an early stage, but we firmly believe that it will encourage a wider range of companies to contribute to the Sustainable Development Goals. And while OPAL is an interesting solution for the privacy, legal and reputation concerns, it doesn't yet solve the strategic and business concerns mentioned above. Until now, there has been a general consensus that Big Data for Social Good should be free of charge, meaning that Social Good implies Data Philanthropy77: a form of collaboration in which private sector companies share data for public benefit. However, Big Data for Social Good projects do not necessarily have to be free of charge. While data philanthropy is very important to start the social good movement, in the long run we expect progress to be much quicker if there are also commercial opportunities. Companies are simply more willing to invest in something with a business model.
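The idea of moving algorithms to the data, rather than data to the algorithms, can be sketched in a few lines. What follows is not the OPAL software or its API: the registry, the `certified` decorator, the `run_on_premises` function and the minimum group size are all hypothetical names and values chosen to illustrate the principle that only vetted code runs next to the data and only aggregates leave the premises.

```python
from collections import defaultdict

MIN_GROUP_SIZE = 3          # assumed disclosure threshold, not an OPAL value
APPROVED_ALGORITHMS = {}    # hypothetical registry of vetted algorithms


def certified(fn):
    """Register a function as vetted (stands in for a real certification step)."""
    APPROVED_ALGORITHMS[fn.__name__] = fn
    return fn


@certified
def subscribers_per_region(records):
    """Count distinct users per region; returns aggregates only."""
    users_by_region = defaultdict(set)
    for record in records:
        users_by_region[record["region"]].add(record["user_id"])
    return {region: len(users) for region, users in users_by_region.items()}


def run_on_premises(algorithm_name, records):
    """Run a vetted algorithm next to the data and release only safe aggregates."""
    if algorithm_name not in APPROVED_ALGORITHMS:
        raise PermissionError(f"{algorithm_name} is not a certified algorithm")
    result = APPROVED_ALGORITHMS[algorithm_name](records)
    # Suppress small groups so that rare individuals are harder to single out.
    return {key: value for key, value in result.items() if value >= MIN_GROUP_SIZE}


# Made-up records that never leave the data holder's premises.
records = [
    {"user_id": "a", "region": "north"},
    {"user_id": "b", "region": "north"},
    {"user_id": "c", "region": "north"},
    {"user_id": "a", "region": "north"},   # repeat event from the same user
    {"user_id": "d", "region": "south"},
]
insights = run_on_premises("subscribers_per_region", records)
```

With this toy data, `insights` contains only the "north" region: the single-user "south" group is suppressed, and a request for an unregistered algorithm raises `PermissionError`.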
  73. 73. Data Stories that Matter 72 At the moment there are several examples of Big Data for Social Good not being free:
• Many international organizations are spending a significant part of their budgets on monitoring and achieving the Sustainable Development Goals, including The World Bank, United Nations, UN Global Pulse, UNICEF and the Inter-American Development Bank. While it may not be appropriate to charge commercial rates, it may be possible to have an "at-cost" model.
• Several philanthropists are donating large amounts for social purposes, such as the Bill & Melinda Gates Foundation for gender equality78, or Facebook's founder, Mark Zuckerberg, who committed to donate $3bn to fight diseases79.
• Many projects with a social purpose are a high priority for local and national governments, for example generating a poverty index, anticipating pandemic spreads or reducing CO2 emissions in large cities. Governments are spending considerable amounts of their budgets on such projects and there is no reason why initiatives with a social purpose couldn't also be charged for.
• Sometimes a freemium model works: pilots (or proofs of concept) are done free of charge, but putting the project into production requires investment. Or, insights with limited granularity (frequency and geography) are free of charge, but more detailed insights often have a price tag.
While the discussion about data for SDGs and Data Philanthropy is far from over, some visionaries predict that any future commercial business opportunity will have a strong social component. A great read on this is "Breakthrough Business Models: Exponentially more social, lean, integrated and
  74. 74. Data Stories that Matter 73 circular"80, which was recently commissioned by the Business and Sustainable Development Commission81. Will Big Data cause the next revolution in social impact? We believe it can and we're 100% behind it.
  75. 75. Data Stories that Matter 74 7 ways mobile data is being used to change the world82 By Florence Broderick and Richard Benjamins The smartphone has changed our world as we know it, with 79% of people between 18 and 44 having their devices within reach 22 hours a day83. This new extension of our hands has become a “digital echo” of our real behaviour, with thousands of mobile applications accelerating our data generation and providing a wide range of companies with a scary amount of data on who we are, where we go, what we do and why we do it. In the era of “Big Data”, an innocent online search for some golf clubs turns into a commercial pursuit to convert interest into investment – which has started to cause some alarm amongst consumers. Over the past few years, an increasing number of questions around privacy and the value of our data have come to the forefront of political debate. With cyber attack headlines and aggressive advertising at the top of our minds, people have become more aware of the negative uses of their data, often neglecting the many positive ways in which data is being used around the world. The mainstream media tends to focus on the Internet giants of this world and how they exploit data. However, telecommunications companies also have access to incredibly insightful data sets. The telcos of this world are increasingly finding ways to have a social impact with the data created by their customers, who spend an average of 145 minutes per day tapping and swiping away84. Of course, mobile behaviour varies immensely in different geographies, and the extent of this social impact depends on
  76. 76. Data Stories that Matter 75 how developed the country of analysis is. For this reason, projects on Big Data for Social Good based on mobile data have tended to focus on developing countries85, targeting the alarming problems linked to Sustainable Development Goals 1 and 2 – “No Poverty” and “Zero Hunger”. In the context of the UN World Data Forum that took place in Cape Town from 15-18 January, 2017, we took a look at 7 projects, linked to 7 different Sustainable Development Goals, to underline just how far mobile data can go in monitoring the progress we make on the world’s biggest challenges: GOAL 1: NO POVERTY – Poverty Analysis in Senegal (Orange and the State University of New York at Buffalo, USA). By using mobile phone usage data and regional level mobility information, Orange and the State University of New York are creating poverty maps showcasing a wide range of perspectives, which can provide decision makers with better insights to eradicate poverty in the most efficient way possible in Senegal86. GOAL 3: GOOD HEALTH AND WELL BEING – Mobility Data Analysis in Mexico during the H1N1 Flu Outbreak (Movistar). Scientific experts in the Telefónica Research and Development team used their technology to understand the efficiency of government measures during the H1N1 flu outbreak of 200987, which was estimated to have affected up to 375,000 people88. Human movement directly accelerates the spread of diseases so
  77. 77. Data Stories that Matter 76 they analysed mobility patterns before and after the government advised citizens to stay at home, uncovering that only 30% of people stayed at home, whilst 70% barely showed any changes in their day-to-day behaviour. In the future, this data-driven approach to handling health pandemics will inevitably save lives and help governments to optimize their response. GOAL 4: QUALITY EDUCATION – Using mobile data to increase the reach and effectiveness of digital education for the children that most need it (Profuturo, Telefónica). Profuturo89 aims to bring digital education to millions of children who are currently deprived of quality education. The program captures data from every interaction with the platform for all students and teachers, as well as contextual data about the country, region, etc. With this mobile data, learning analytics takes place to predict the success of students, teachers and projects. The project has started in Angola, where the education program is now offered to 8000 children. It is now being extended to other countries in Africa as well as in Central and South America. GOAL 7: AFFORDABLE AND CLEAN ENERGY – Using Mobile Data for Electrification Planning in Senegal (University of Manchester, Ecole supérieure polytechnique de Dakar UCAD and the Santa Fe Institute)
  78. 78. Data Stories that Matter 77 Mobile phone data has proved to be an accurate proxy of the energy needs of populations in Senegal, allowing telecommunications operators to help utilities providers build bottom-up demand models90. This is especially important where there is scarce information on the constantly evolving energy needs of people and companies in developing countries. In the future, mobile data will be crucial in helping governments and utilities providers to decide where to invest in renewable energies – making them more affordable for citizens. GOAL 11: SUSTAINABLE CITIES AND COMMUNITIES – Crime Prediction in the city of London in the UK (O2 Telefónica and the University of Trento) Academic and mobile data experts used anonymized and aggregated mobile data and police data to predict crime hotspots in London91, identifying them with an accuracy of 70%, 6% higher than when police data was used on its own. The analysis showed that some components of mobile phone data are more important than others. For example, the data about the phone’s home location showed a strong correlation with crime patterns. In the future, these insights could be invaluable to law enforcement authorities in making our cities safer. GOAL 13: CLIMATE ACTION – Using Mobile Data to measure CO2 emissions in Nuremberg, Germany (O2 Telefónica, Teralytics and the South Pole Group)
  79. 79. Data Stories that Matter 78 Local governments are facing immense challenges with accelerating rates of CO2 emissions causing serious air pollution problems in cities. The first and most important step to combat this is to collect accurate data to identify where the major air pollution hotspots are, ahead of investing in solutions such as improved public transport or new infrastructure. In Nuremberg, local government decision makers are working with O2, Teralytics and the South Pole Group to understand mobility patterns using mobile data, extracting insights on traffic which allow them to make predictions on pollution in a more cost-efficient way than surveys or sensors92. GOAL 17: PARTNERSHIPS FOR THE GOALS – The Open Algorithms Project In order to overcome the challenges facing private sector organizations wanting to use their Big Data for Social Good, the Data-Pop Alliance93, Massachusetts Institute of Technology, Orange Group, the World Economic Forum and Telefónica have come together in the Open Algorithms Project94. OPAL will consist of an open platform and algorithms that can be run on the servers of partner companies behind their firewalls to extract key development indicators for society, in a privacy-preserving, commercially sensitive, stable, scalable and sustainable manner. Hopefully, this cutting-edge approach to Big Data will allow for more private and public sector partnerships to thrive, providing better and deeper insights to policy makers around the world. So, it’s clearly not all doom and gloom when it comes to data exploitation. In fact, many innovation teams in private and
  80. 80. Data Stories that Matter 79 public sector organizations are regularly lobbying to implement disruptive data projects with a social impact – as we can see in the examples above. However, to make Big Data for Social Good a real success, it is fundamental to find sustainable business models which allow these data analyses to become recurring projects, not just one-off pilots or exploratory academic work. Taking a data-driven approach to the world’s biggest problems is fundamental, and regular measurement of our collective progress on the 17 Sustainable Development Goals is crucial in enabling policy makers to mould their decisions in the most responsible and effective way possible.
  81. 81. Data Stories that Matter 80 It's time to scale up Big Data for Social Good – Interview with Nicolas de Cordes95 By Florence Broderick We interviewed Nicolas de Cordes, Director for Marketing Anticipation at Orange and a pioneer in the field of Big Data for Social Good. After spending his career between strategy and innovation, both in consulting at Boston Consulting Group and in marketing and innovation at Orange, Nicolas has more recently become a member of the World Economic Forum’s Global Agenda Council on Data-Driven Development and the Council on the Future of Humanitarian System, as well as being a key part of the UN Secretary General’s Expert Advisory Group on the Data Revolution for Development. We quizzed Nicolas on his experience in using data to have a social impact, as well as gauging just how big this opportunity could be in the next few years. So Nicolas, how did you end up working in this space and where did the Data for Development Initiative come from? Well, everything started at the NetMob conference in 201196 at MIT, where I was taking part as a member of the scientific committee for these analyses on mobile data. In one of those typical “networking coffee breaks” I ended up discussing the data for social good opportunity with several colleagues, and a few months later we formed the Data for Development initiative at Orange97. Fortunately, my current role allows me to explore new business opportunities in the telco sector, and mobile data for social good is one of the most exciting ones I’m working on at the moment. And which project within the initiative are you proudest of?
  82. There are so many incredible projects out there. However, I was particularly fascinated by one analysis which combined mobility data with medical data to fight malaria in Senegal. We received a request from a local coordination body to provide Data as a Service (DaaS), sending weekly data to doctors on the ground to enable them to optimize their deployment of resources. If we manage to develop this service, they could, for example, distribute mosquito nets and vaccines in the most efficient way possible. Other projects have looked at poverty indices and literacy rates. This was particularly rewarding because we really helped the National Statistics Office to be a lot more precise. Mobile data allows them to measure things more frequently, which is so important in terms of development. Of course there is a lot of bias in our data (e.g. market penetration), but you can still gather great insights and take better decisions as a result.
  What are the challenges of working with national statistics organizations as a mobile operator?
  There are many, but fortunately there is a common opportunity. They need more frequent and more granular data, in particular to measure the new Sustainable Development Goals, and we want to contribute. However, there are always some disparities in culture and skills between the private and public sectors. By nature, telcos are able to develop Big Data tools with weekly and monthly insights (sometimes even in real time) from data provided by their mobile networks, whilst public organizations don’t always have the resources or infrastructure to extract insights at the same pace, and their decisions are taken with different time scales in mind. The key is to work closely together to find a healthy way to collaborate going forwards for both
  83. policy-making or strategic decision needs on one side and operational and tactical needs on the other.
  And how do you think we can achieve that healthy collaboration?
  Advancing social good projects is unfortunately quite slow. However, if you take a step back, you can see significant progress over the past 3 to 4 years. Going forwards, data privacy is going to be one of the greatest challenges we have to address. Everybody is talking about it, whether it’s the data of groups of people or of individuals, and it’s a legitimate concern which needs to be taken into account. We must work through that issue together to overcome these barriers.
  From your experience, do attitudes towards privacy vary much across different geographies?
  Absolutely yes, but to answer this one I’ll use an anecdote from a project in Senegal that we participated in. The first thing we did there was to visit the Data Protection Commission. Coming from Europe, I was expecting to encounter some challenges. However, the head of their legal department stopped me in my tracks and said: “We have a different attitude here, in this country people die because of the lack of information, so we really need your project.” This was the moment when I realized that the mind-set is completely different depending on the stage of development the country is in.
  Many are starting to talk about Data Philanthropy in the world of Big Data for Social Good. How do you think we can set a sustainable model which keeps all parties happy?