
Data Scientist - Good Rebels -

What does a data scientist actually do? Here at Good Rebels we wanted to outline a profile of this new profession, with the help of various industry leaders from academia, business and institutions. In short, we concluded that the main tasks of a data scientist are to identify data, transform it when incomplete, categorize it, prepare it for analysis, perform the analysis, visualize the results and communicate them.


  1. Data Scientists: Who are they? What do they do? How do they work?
  2. “The sexiest job in the next ten years will be that of the statistician. People think I’m joking, but who would’ve guessed that computer engineers would’ve had the sexiest job of the 1990s?” Hal Varian, October 2008.
  3. Introduction: Data Scientist, the sexiest job of the decade
       - Data, data and more data
       - A little bit of history
     1. Where do Data Scientists come from?
       - Understanding the role of each specialist
     2. Data Scientists: seeking their place in the organizational chart
       - The data was already in-house
       - Are companies ready to listen to the Data Scientist?
     3. Who needs a Data Scientist?
     4. The Data Scientist skill set
       - Technical skills
       - Above and beyond technical skills
       - How to choose your data scientist
       - Struggling to find a data scientist? Train them in-house
       - Supermen and superwomen? No, super teams!
     5. The Data Scientist’s tools
       - Data processing system construction, databases, visualization, and data wrangling tools
       - Open source or proprietary software?
     6. Getting down to it: the work process
       - Three obstacles to overcome before accessing data
       - From data to decision... if nothing goes wrong
     7. Evaluating the Data Scientist’s work
     8. Trust: an essential component in the process of data science
       - Ethics: science’s essential accessory
     9. Data scientists in Spain today
       - Who’s making the most out of data science in Spain?
     10. Conclusions: still a great deal to be done
       - What does the adulthood of big data look like?
  4. The data scientist is a sort of mix between a programmer, an analyst, a communicator and an adviser. A very difficult combination to come across.
  5. Data scientist, the sexiest job of the decade
     The figure of the data scientist first emerged in the early twenty-first century. A decade after the widespread business adoption of the Internet, Hal Varian, chief economist at Google, predicted in an interview in October 2008: “The sexiest job in the next ten years will be statisticians. People think I’m joking, but who would’ve guessed that computer engineers would’ve had the sexiest job of the 1990s?”
     Varian, also a professor at the University of California, Berkeley, was one of the first to recognize the strategic importance of extracting information from data, and not just at a corporate level. “The ability to take data - to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it - that’s going to be a hugely important skill in the next decades. And not only at the professional level, but even at the educational level for elementary school kids, for high school kids, for college kids. Because now we really do have essentially free and ubiquitous data. So, the complementary scarce factor is the ability to understand that data and extract value from it.”
     The truth is that by 2008 a few companies had already created the position in order to manage a volume of information that was unprecedented in its variety and sheer scope, in a quest for findings relevant to the business. Until then nobody had called these professionals “data scientists”. The first to do so were DJ Patil and Jeff Hammerbacher, then heads of data analytics at LinkedIn and Facebook respectively. Eight years later, in 2016, with an ever-increasing volume of data generated on a daily basis, Varian’s predictions are more pertinent than ever.
     According to the McKinsey Global Institute report “Game changers: Five opportunities for US growth and renewal”, the big data industry in the United States could increase annual GDP by 325 billion dollars by 2020. According to the same report, the United States alone will face a shortage of up to 190,000 data scientists and 1.5 million professionals with enough proficiency to use big data effectively. Between 2010 and 2020, the number of companies seeking to hire a data scientist will grow by 18.7%, according to the EMC study “The Digital Universe in 2020”. An estimated 40,000 exabytes of data will be created by 2020, underlining the need for organizations to bring in talent capable of in-depth analysis of information.
  6. In reality, many companies (the biggest or the most pioneering ones) have already incorporated the data scientist in one form or another. Their sudden appearance in the business world and the high demand for these professionals expected over the coming years confirm that there is a growing need to process large volumes of information and transform it into a valuable asset, given that data “in its raw state” is not useful for companies. Only an in-depth analysis offers the chance to reveal patterns and trends, which in turn streamline business processes and optimize decision-making. This is where data science emerges as the process that enables the collection, preparation, analysis, visualization, management and preservation of large volumes of data. Extracting valuable information from all types of sources provides solutions to a company’s vital strategic issues, such as time and cost savings, new product development, the optimization of offers, and faster and more accurate decision-making.
     But what does a data scientist actually do? Here at Good Rebels we wanted to outline a profile of this new profession, with the help of various industry leaders from academia, business and institutions. In short, we concluded that the main tasks of a data scientist are to identify data, transform it when incomplete, categorize it, prepare it for analysis, perform the analysis, visualize the results and communicate them. To do this, the data scientist must have technical training in programming, data management, statistics and data mining. And, aside from the analytical part, let’s not forget the ability to focus on creating value for the company.
     This is why, in a competitive scenario where challenges are constantly renewed and data doesn’t stop flowing, the data scientist’s work enables managers to move from ad hoc analysis to an ongoing conversation with the data. What kind of person is able to perform this task? The data scientist is a mix between a programmer, an analyst, a communicator and an adviser, with proficiency in statistics, technology, math and data architecture, and all this without forgetting human qualities. A very difficult skill set to find in one person? Probably so, simply because there are not many people who can do all that.
  7. So basically, we’re talking about a well-rounded jack-of-all-trades: proficient in mathematics, IT and data architecture, knowledgeable about business, with strong communication skills and a capacity for empathy. Given the practical impossibility of finding such a person on the market, professionals refer to this ideal with labels such as “El Dorado”, “Unicorn”, “The Data Science Superhero”, “The Dark Beast” or “The New Renaissance Man”. It is an extremely powerful combination, and very hard to find, because demand is growing and such professionals are in short supply. The solution: training, retraining and building teams that together cover a profile like the one described.
     Read more: Hal Varian interview; DJ Patil biography; Building Data Science Teams.
  8. Data, data and more data
     With countless services and connected devices, it is estimated that 90% of all data has been generated in the last two years. In other words, more information was created in those two years than in all of previous human history. This is also very good news for anyone who specializes in data management and processing: they’ll probably never be short of work for the rest of their lives. Numerous indicators illustrate this spectacular explosion of data. For example:
     - In 2020, 1.7 MB of information will be created every second for every human being, according to EMC forecasts.
     - Information is constantly being generated, and someone needs to monitor it. For example, on Google alone there are 40,000 searches every second.
     - Facebook is another behemoth when it comes to data generation. Every minute, its users send an average of 31.2 million messages and watch 2.77 million videos.
     - In May 2016, Facebook and Microsoft began laying a 6,600-km underwater cable between Europe and the US, capable of transmitting 160 TB of data per second.
     - 80% of photos will be taken with smartphones in 2017. A high percentage of them will be shared via the Internet.
     - It is estimated that in 2020 more smartphones will be in use than landlines, with a total of 6.1 billion users worldwide.
     - Also in 2020, there will be 50 billion smart devices in use worldwide, all collecting, analyzing and sharing data. A third of data will travel through the cloud.
     - 80% of data generated today is unstructured. This includes data found in emails, spreadsheets, social media, the Internet, etc.
  9. - The market for Hadoop (an open-source framework for the distributed storage and processing of large data sets across clusters of computers) will grow at an annual rate of 58%, exceeding 1 billion dollars in value in 2020.
     - For an average Fortune 1000 company, an improvement of just 10% in data accessibility will result in over $60 million of additional net income.
     - Businesses that make full use of the potential of data could boost their operating margins by up to 60%.
     - Perhaps the most mind-boggling fact, and the one that best highlights the enormous potential that lies ahead for the big data industry: according to MIT, less than 0.5% of all data generated right now is analyzed.
     Read more: Big Data: 20 Mind-Boggling Facts Everyone Must Read; Internet Live Stats; Big data: The next frontier for innovation, competition, and productivity.
  10. A little bit of history
      The Cyclopædia of Commercial and Business Anecdotes, published in 1865 by Richard Miller Devens, contains the first recorded reference to the term “business intelligence”. The author described how a banker, Sir Henry Furnese, succeeded by understanding market conditions before his competitors: “Throughout Holland, Flanders, France, and Germany, he maintained a complete and perfect train of business intelligence. The news…was thus received first by him”, Devens writes. Furnese ultimately used this advance knowledge to duplicitous ends and became renowned as a corrupt financier. However, he can be credited with sowing the seeds of business intelligence.
      Technology did not advance to the point where it could be considered an agent of business intelligence until well into the 20th century. The first commercial computers arrived in the United States in the 1950s. In 1958 Hans Peter Luhn, a pioneering researcher at IBM, published the article “A Business Intelligence System”, in which he defined business intelligence as “the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal”. Luhn envisaged an automatic and intelligent system, built on document processing equipment, capable of producing target-specific action guidelines for the various sections of any organization. With this article, Luhn, considered the father of business intelligence, laid the foundations for analyzing and distributing information to serve the needs of a company. It wasn’t until three decades later, in 1989 to be exact, that the analyst Howard Dresner brought the modern definition of business intelligence into the common vernacular.
Encompassing somewhat cumbersome-sounding concepts related to data storage and data processing, Dresner summed up the idea of business intelligence as “concepts and methods to improve business decision-making by using fact-based support systems”. From the 2000s, the intersection between different technologies and business needs prompted new concepts and terminologies: data engineering, business analytics, data mining, etc. There is currently no clear consensus on exactly where the skills of each of these disciplines begin and end, nor to what extent some overlap with others. But what’s clear is that they all coexist under the umbrella of big data.
  11. Read more: Richard Miller Devens, Cyclopædia of Commercial and Business Anecdotes; Hans Peter Luhn, “A Business Intelligence System”; Howard Dresner’s blog.
  13. The following people have participated in this study:
      - Bosco Aranguren, Chief Marketing Officer, Microsoft Iberia. CMO at Microsoft Iberia since March 2017. Previously, he was responsible for Programmatic Media Buying at Google. He joined Google in 2010 as Industry Head Automotive, and in 2012 he became Industry Head CPG & Entertainment.
      - Álvaro Barbero, Chief Data Scientist at Instituto de Ingeniería del Conocimiento (IIC). Expert in the fields of machine learning, optimization and algorithm engineering. His work is to transform advances in these areas into practical Big Data systems, from predictive and recommender systems to automated text analysis and resource optimization.
      - Richard Benjamins, Director of External Positioning & Big Data, LUCA: Data-Driven Decisions. Director of External Positioning & Big Data for Social Good in Telefonica’s Chief Data Office. In his previous position as Group Director BI & Big Data he was responsible for the internal exploitation of Big Data across Telefonica. He was also Director of Business Intelligence at Telefonica Digital, and before that Director of User Modelling, where he led global BI programs.
  14. - Fuencisla Clemares, Country Manager at Google Spain & Portugal. Joined Google in 2009 as Manager of Retail and Consumer Goods; after that, she led the Telecommunications, Banking and Insurance sectors, along with the mobile strategy for Spain. Prior to Google, she worked for seven years as a strategic consultant at McKinsey & Company, and later became Director of Purchasing in the Carrefour home division.
      - Manuel Marín, Data Analytics Manager, PwC. Before that, he was Chief Technical Officer at APARA, and applied predictive analytics in telco, banking, insurance, energy, health, sports and retail companies in the areas of fraud detection and customer intelligence.
      - Esteban Moro, Associate Professor at Universidad Carlos III de Madrid. Esteban is a professor at Universidad Carlos III de Madrid, a member of the Joint Institute UC3M-Santander on Big Data and academic director of the Master of Data Science and Big Data on Finance by AFI. He serves as a consultant for many public and private institutions. His areas of interest are applied mathematics, financial mathematics, viral marketing and social networks.
  15. - Felipe Ortega, Director of the Master in Data Science at Universidad Rey Juan Carlos. Assistant Professor in the Department of Signal Theory and Communications and Telematic Systems and Computing, School of Telecommunications Engineering at Universidad Rey Juan Carlos (Madrid). He is co-founder of the Data Science Lab at the Center for Intelligent Information Systems (CETINIA) and Academic Director of the Master in Data Science at URJC. His main areas of research are data engineering, computational statistics, machine learning, quantitative methods, open source software, large-scale data management and data visualization.
      - Pep Porrà, Business Performance Director. Business Performance Director at …, where he leads a team of data scientists and business performance managers focused on evaluating, anticipating and understanding the monetization impact of game features. Before moving to the corporate world, he was a Statistics and Mathematics Professor at the University of Barcelona.
      - Alejandro Rodríguez, Professor at Universidad Politécnica de Madrid. Professor at the Department of Computer Languages and Systems and Software Engineering at UPM. Specialized researcher in the fields of medical informatics, knowledge representation, expert systems and the semantic web.
      - Marcelo Soria, Partner at …. From mid-2016, partner at …. Between May 2014 and May 2016, he was VP of Data Services at BBVA Data Analytics, and before that he was co-leader of the Big Data // Smart Cities initiative at BBVA.
  16. 1. Where do data scientists come from?
  18. Where are data scientists? More than half of these professionals are concentrated in the United States. Spain ranks eighth in the world by number of data scientists in employment. Source: “The State of Data Science”.
  19. DJ Patil, currently Chief Data Scientist for the US government, was the first to coin the term “data scientist”, during his tenure at LinkedIn. But nearly a decade later, there is still some controversy about its exact meaning, and about whether this role differs from the one data analysts have performed in companies for many years.
      For some, the origin of data science lies in machine learning, the branch from which all prediction and classification models have been developed. Professionals trained in this discipline were mainly mathematicians who also had the programming skills to implement and test predictive models, since machine learning is an applied rather than a theoretical branch of mathematics.
      The huge change in the amount of data being handled by organizations is the main driving force behind the new profile. If elements such as big data and machine learning are added to traditional data analytics, we may well be talking about a new theoretical discipline, and also a new job category, whose terms are being defined virtually at the same time as the market creates demand. What distinguishes a data scientist is a different, more scientific type of training, which allows them to use the very latest techniques to work with massive data, not only in terms of exploration but also of speed. The profile combines academic training with professional experience.
      Due to the current lack of consensus on their characteristics and skills, a wide spectrum of professionals is included in the category of data scientist. It is important, though, that they meet a set of characteristics: they should be able to use their knowledge to extract non-obvious information from data and empirical evidence, and to present it in an understandable way.
      Each specialist has their place and time
      Data science, big data, data analytics... These are terms we’ve been hearing for years now, yet their definitions and competencies are still somewhat shrouded in confusion. What’s involved in each of these disciplines? First and foremost, it’s important to stress that the role of data scientist is different from that of an analyst who designs models or forecasts. The data scientist is not only expected to explain the effect that the data will have on the company’s future, but also to provide solutions that help the company to grow, both in the present and in the future.
  20. “You cannot communicate a relevant decision in your business if you are not able to explain how you reached it, what data you have used, and what processes you have followed to break it down.” Esteban Moro.
  21. Data science
      - Faced with structured or unstructured data, data science is a field that encompasses everything related to the cleaning (curation), preparation and analysis of data.
      - Data science consists of a medley of statistics, mathematics and programming, peppered with problem-solving, data extraction using as much ingenuity as required and the ability to scrutinize a problem from different perspectives.
      - The data scientist shifts business cases to an analytical plane, develops hypotheses and patterns, and evaluates their impact on the business. This deep analysis has the ultimate goal of solving complex business issues efficiently and anticipating future needs.
      Big Data
      - Big data refers to huge volumes of data, proprietary or third-party and usually non-aggregated, the size of which prevents it from being processed effectively using traditional applications.
      - Big data is a term that is gaining more and more ground in firms and industries. The analysis of data trends using sophisticated algorithms and other cutting-edge information processing methods ultimately improves the strategic decisions that drive the business.
      Data analytics
      - Data analytics uses data to examine market and business trends, and to develop or improve methods linked to productivity and cost reduction.
      - The essence of data analytics is inference, which is the process of drawing conclusions based solely on what the researcher already knows.
      - Data analytics is used in many industries to help companies improve decision-making, as well as to verify or refute existing theories and models.
  22. “The next big challenge in the gaming industry is to create smart systems. To convert data into new value for the company.” Pep Porrà.
  23. A hypothetical case will let us see the different processes involved in a data science project. Let’s imagine that every day millions of images are uploaded to a restaurant review site and they need to be catalogued: are they pictures of food? What kind of food? Or are they of a restaurant? Of the outside or the inside? Machine learning automatically classifies each image into its respective category. Properly “trained”, a computer can figure out, for example, if the photo of a restaurant is of the inside or the outside. The data scientist oversees the entire project, from selecting the right algorithm to engineering design.
      - The data scientist creates the model which allows the computer to make this distinction, using different sources of information ranging from manually classified images to keywords in screenshots.
      - Using data engineering techniques, a data feed and storage system is created, to which algorithms are applied on a large scale.
      - Finally, the business implications of the innovation are analyzed: is it useful for the business? Will it help the website generate more traffic? And so on. The findings are then presented using visualization tools.
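The "training" step in the restaurant-photo case above can be sketched with a toy classifier. This is only an illustration under stated assumptions, not the method of any real review site: the two numeric features (average brightness and share of sky-coloured pixels), the sample values and the function names are all hypothetical, and a nearest-centroid rule stands in for whatever image model a real project would choose.

```python
from statistics import mean

# Hypothetical, manually labelled examples: (brightness, sky_fraction) -> label.
# In a real project these features would be extracted from the images themselves.
LABELLED = [
    ((0.30, 0.00), "inside"),
    ((0.40, 0.05), "inside"),
    ((0.35, 0.02), "inside"),
    ((0.80, 0.40), "outside"),
    ((0.70, 0.55), "outside"),
    ((0.75, 0.35), "outside"),
]


def train_centroids(examples):
    """'Train' by averaging the feature vectors of each label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def classify(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], features))
    return min(centroids, key=distance)


centroids = train_centroids(LABELLED)
print(classify(centroids, (0.78, 0.45)))  # a bright photo with a lot of sky
print(classify(centroids, (0.33, 0.01)))  # a dim photo with almost no sky
```

The manually classified images play the role of `LABELLED`; the data engineering step described above would be what feeds millions of new feature vectors through `classify` at scale.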
  24. 2. Data scientists: seeking their place within the organizational chart
  26. “The problem we often find is that data has been managed in isolation. And then the time comes to put that data to use and there’s no communication going on.” Bosco Aranguren.
  27. The data scientist isn’t a radically new profile that’s being defined from scratch. Companies have long resorted to in-depth data analysis as a valuable tool for meeting or exceeding their goals. What has changed is the scale of this analysis: a greater volume of data calls for a different approach, with regard both to procedures and to the purpose of the analysis. Many experts stress the idea of rediscovering data, or rather, discovering the value it contributes to the company. The person who used to manage data, target customers or identify the products with the greatest turnover quite clearly added value to the company. But the data scientist’s role goes much further.
      The data was already in-house
      It’s true that the figure of data manager has existed in companies for some time now. Data analytics has been used in the telecommunications industry for at least 20 years. Banking has also been using business intelligence for several years, as have, somewhat more quietly, all the major companies at the helm of their respective industries. However, far from being a cross-disciplinary practice, data analysis has often been applied only in specific departments, mainly Marketing, Communication and Customer Insights. This pigeonholing has to a certain extent diminished its standing within the hierarchy of company priorities.
      The main problem in companies without a data-focused corporate culture is that data has often been managed in a decentralized and disorganized way. As a result of this siloed management, each corporate department has taken whatever technology-related decisions it deemed most appropriate at any given time. Now that the time has come to deal with data, experts are encountering barriers and incompatibilities that hugely complicate their work. In institutions with enormous historical repositories, grouping together and processing data files is a colossal effort, but once this path of self-learning has been completed, the work translates into improvements in internal processes, people management and/or customer service.
  28. “Technically you can do just about everything, but the organization must then be prepared to use it.” Richard Benjamins.
  29. The difference compared with the situation in recent years is that data analytics specialists now have much more powerful and effective technological resources, allowing them to extract greater value from the information. Computing costs are lower, data availability is higher and connectivity between the two is greater, which raises the chances of finding patterns or potential case-based reasoning and helps to update the practice of using data to improve management.
      In this process of recognizing the status of data scientists, one fundamental advance in their professional standing must be mentioned: they have taken on the crucial responsibility of committing to improve company results. Their mission is no longer limited to guiding or advising other departments, nor to crunching data and presenting it to the managers responsible for decision-making. The data scientist’s work culminates in the delivery of new business opportunities founded on a comprehensive inspection of data.
      Is the company ready to listen to the data scientist?
      In many cases the data scientist faces another crucial battle to have this new status within the company acknowledged: overcoming resistance to change. Digital inertia is pushing many companies towards the culture of data, but in more traditional or larger organizations, where digital natives are seldom part of the management, this can end up being a costly journey if it is long, or a traumatic one if it is short.
      The first leg of the company’s journey towards big data must receive firm support from the management. There are so many departments involved (IT, Business Intelligence, e-Commerce, Marketing, etc.), and so much coordination among them is needed for data to flow, be shared and be properly used, that change will only take place if resources are provided from the top. Without agility and cooperation, there can be no results.
      In companies with a tendency towards complacency or resistance to change, the data scientist might even be seen as a gatecrasher who has turned up to lecture experts on how to run the business. Executives who have long established the rules of the game are wary of the mathematician, who even seems to be speaking a language that is foreign to the company.
  30. The first step in a company’s journey towards Big Data needs support from top management.
  31. This is a cultural issue: the scientific endorsement behind the data scientist’s recommendations must find its place alongside traditional decision-making processes, based on experience or other types of indicators, sometimes as simple as a spreadsheet. Some people may even ignore the data scientist’s contributions for fear of being held to a commitment to improve results: meeting KPIs can be a painful goal. This phenomenon recurs in all kinds of organizations, including startups, because ultimately everyone tends to protect their own teams and projects. That’s why, as we shall see later on, empathy and communication are two of the essential non-technical qualities required to work as a data scientist.
  32. 3. Who needs a data scientist?
  34. In the United States, data scientist was listed in 2016 as the job with the best prospects, based on three factors: job openings, salary and potential for career development. Source: 25 Best Jobs in America.
  35. Companies and organizations in countless industries today are embarking upon projects related to data analysis: banking, communications, entertainment, healthcare, education, natural resources, insurance, retail, transport, energy, etc. Many institutions publish their big data repositories, and technologies to visualize and analyze data are generally available. This scenario facilitates investigation, as anyone with basic training can raise a company-related issue and collect the data required to solve it.
      Why would a company venture into a big data project? The main objective is usually to improve customer experience, but other goals include reducing costs, refocusing marketing strategies, streamlining internal processes or improving security.
      We know that we have unprecedented access to information and data. What’s more, complex systems appear in every field of knowledge. Unpredictability can manifest itself in all kinds of disciplines: mathematics, physics, chemistry, engineering, programming, economics, sociology, psychology, etc. There is a continual challenge to find order or a behavior pattern in the seemingly chaotic nature of any system. As a result, there is no shortage of data or, obviously, of problems to solve. And there is so much knowledge out there that it is difficult to create new knowledge, understood here as any algorithm or model that helps improve business performance.
      Taking on all these challenges requires, in addition to a solid technical background, huge doses of passion and motivation. That’s why defining how critical the problem to be solved is becomes crucial for the data scientist. But how do you define a good problem? How is it recognized, and how are resources allocated to solve this particular issue and not another? The answer may be subjective, depending on who you ask. But basically, a good problem should meet three conditions:
      • Demonstrate a clear and direct impact on the business.
      • Prove solvable with the data at hand.
      • Provide sufficient motivation to the data scientist and their team.
  36. 36. 36 Data Scientists: Who are they? What do they do? How do they work? “It’s impossible to have someone who is knowledgeable in all the businesses in the world. The company may have a generalist data scientist and specialists in the areas where business can be developed”. Álvaro Barbero.
  37. 37. 37 Data Scientists: Who are they? What do they do? How do they work? The last question is who can take charge of solving such problems. In his book Building Data Science Teams, DJ Patil sums up the essence of a guide for employing or hiring a data scientist: “The inventor of LinkedIn’s ‘People You May Know’ was an experimental physicist. A computational chemist on my decision sciences team had solved a 100-year-old problem on energy states of water. An oceanographer made major impacts on the way we identify fraud. Perhaps most surprising was the neurosurgeon who turned out to be a wizard at identifying rich underlying trends in the data”. Ultimately, all scientists, whatever their training, are able to meet the challenge of extracting information from data, as long as they convey enough passion for problem-solving. And it is always beneficial to test the robustness of a model based on the variety of perspectives provided by different scientific disciplines.
  38. 38. 38 Data Scientists: Who are they? What do they do? How do they work? 4. Skills of a data scientist
  40. 40. 40 Data Scientists: Who are they? What do they do? How do they work? “MOOCs are very useful for training, because they are very specific and oriented towards a specific objective.”. Alejandro Rodríguez.
  41. 41. 41 Data Scientists: Who are they? What do they do? How do they work? The data scientist is not necessarily a professional with a “numbers” training. It’s not essential to come from disciplines such as mathematics, statistics, physics or exact sciences, although these educational backgrounds provide a useful foundation. Some data scientists come from fields such as telecommunications, engineering or computer science, and even from seemingly obscure areas such as communication, economics, finance or biomedicine. Why? Because the most important part of their job is ultimately to analyze data: play with it, work with it, question it, and love it. The data scientist should be a curious, creative, innovative and even defiant person, capable of questioning the status quo. And that’s why their training is not as decisive as their attitude is. Technical skills What is clear is that the data scientist’s work revolves around the combination of technology, creativity and data. There are likely common core requirements when it comes to their qualifications and performance, but as time goes by, the profile will gradually diversify into multiple branches and specializations. In short, the data scientist should be fully at ease with the following four disciplines: • Statistics / Mathematics: they should be able to analyze databases, build models, make statistical forecasts and distinguish what is representative from what is not. Therefore, they should have a strong mathematical background that allows them to control supervised models with predictive techniques (data mining, machine learning) and unsupervised segmentation models. Prior to this modelling, they should be able to work with all mathematical techniques of data pre-processing, and once the model is built, of data evaluation. In short, they should be familiar with a skill set of techniques to enable them to construct and to evaluate a predictive model, as well as apply statistical logic to programming languages. 
• Technology: as a requirement for transforming data into knowledge, the data scientist must understand the business’s technologies and have the know-how to implement them. Algorithm design is key to data transformation, and calls for fluency in multiple computer languages, as well as full knowledge of database management. It’s very important to be proficient in automation, since many processes are repeated on a computer while the data scientist is working on refining or calibrating the model.
  42. 42. 42 Data Scientists: Who are they? What do they do? How do they work? “In Spain, we lack the mindset to help people grow, take risks, even train them to grow in their job positions”. Fuencisla Clemares.
  43. 43. 43 Data Scientists: Who are they? What do they do? How do they work? • Business analytics: the data scientist should speak the corporate language, understand the company’s goals, the industry in which it operates and the processes that drive profit and growth. Only in this way will they be able to discern which problems can be feasibly solved through data processing, and only by understanding the inner workings of the company will they be able to convert data analysis into insights and valuable recommendations for the company. Without some knowledge of the business environment, mere technical qualifications can lead to rejection of the “techie” or difficulty in understanding them, or even awkward situations where all they are offered are obvious answers. • Communication: the data scientist will at some point have to present meticulous and accurate results of their work - not based on experience, but on their analysis - to professionals, often managers with decision-making powers and extensive business experience but who lack technical training. That’s why they should possess the ability to communicate with ease and create a dialogue tailored to the level of their audience. It’s paramount that the result of an analytical process can be understood by any manager within the company, whether that be an engineer or a social media specialist. Skills above and beyond technical ones The data scientist doesn’t only subsist on technical know-how. Ideally, the above capabilities are complemented by a series of personal characteristics, thereby forming a skill set (sometimes merely utopian) that merges specialisation with human qualities. • Creativity: the ability to use new methods to collect, interpret and analyze data, which gives the analysis a different perspective. The technology itself is not a differential factor from the moment that a program is made available to any organization. 
That’s why the significance of know-how is vital: the tools may be the same for everyone, but the minds handling them are not. Technological uniformity melts down when intelligence is added, turning the results offered by a software solution – one which may even be used by the competition - into unique ones. • Intuition: the ability to choose between one way or another of reaching a solution is extremely important. Experts underline the importance of applying an artistic component to a technical working process that usually triggers a fixed sequence
  44. 44. 44 Data Scientists: Who are they? What do they do? How do they work? “To stay on top of everything and constantly refresh one’s knowledge, curiosity is essential”. Marcelo Soria.
  45. 45. 45 Data Scientists: Who are they? What do they do? How do they work? (data processing, curation, modelling, etc.), but which requires an intuitive spark to discriminate which steps are suited to critical analysis. • Flexibility: Trial and error mechanisms allow us to evaluate and choose one option or another for the work already underway, complementing - or even rectifying - decisions made before starting the project. Mathematical models are not unique, but are grouped into toolboxes that encompass different techniques. Therefore, agility is required to opt for one technique or analytical tool or another, depending on the structure of the data, the information available, etc. For professionals trained in theory but with little practical experience, this may represent a point of weakness. • Curiosity: understood as the ability to ask questions, to comprehend what is asked and to envisage the right path to take. Curiosity is essential for keeping abreast of new techniques and the state of the art, as well as for constantly refreshing one’s knowledge base. Ultimately, this will lead the data scientist to draw meaningful inferences from the data. • Empathy: Although their work is the result of hours and hours spent in front of a computer, the data scientist is not a lone wolf. The human factor must be present in their daily lives, in the sense that their work depends on collaboration with other departments, and it is impossible to pull it off without cooperation. Accustomed to mobility between projects and areas, the challenge lies in creating free-flowing dialogue with other parts of the organization. What’s more, they may sometimes have to present undesirable results to clients or superiors, further reinforcing the importance of the personal touch. • Pragmatism: Finally, there’s no point in all this theoretical analysis if it isn’t accompanied by a practical impact. 
Technical skills are of little use if the data scientist isn’t able to integrate into a team or convert all their analytical potential into results that benefit the company or other working groups. Therefore, they must be able to transfer data analysis into insights or actions with a direct impact on the business.
  46. 46. 46 Data Scientists: Who are they? What do they do? How do they work? “At Google, we try to work extensively in the ecosystem, which is a word we’re very fond of. We aren’t the ones who are going to train people, but we can influence other experts to encourage such initiatives”. Fuencisla Clemares.
  47. 47. 47 Data Scientists: Who are they? What do they do? How do they work? How to choose your data scientist For a profession that is still evolving, traditional recruitment processes are of no use. Companies like Facebook, Amazon, Google or Microsoft are at the forefront of corporate use of data science, serving as a benchmark for companies from all industries to understand the professional profile of recruits and the type of work they perform. It goes without saying that their technological background is critical: without the relevant technical training, it is impossible to address the mission of data processing. That’s why above all it is important to evaluate training and experience in mathematics and computer science. But we must also assess the ability to refresh knowledge, grow and learn in an ever-changing environment, because we’re likely to recruit someone who doesn’t know which challenges they are going to face in three years’ time. Therefore, in the selection process it is important to test reasoning skills through problems where it is not as important to find the right solution as it is to follow a logical process. Nor is it uncommon to consult references seldom used in other selection processes, for example, work developed on platforms such as GitHub. Struggling to find a data scientist? Train them in-house When recruiting a data processing specialist becomes a complex or financially costly chore, some companies opt for internal promotion. Professionals already working in an area related to data analytics are trained or re-trained in disciplines adapted to the new needs of the company. This is a widespread and perfectly valid procedure for companies that choose to re-train their specialists in data analytics. 
This re-training is favored by the trend towards standardization brought on by technology: there are countless tools that make the prior task of data analysis and cleansing easier, and which allow professionals already in the workforce - especially in business intelligence - to be re-trained in data science. The pull effect of what some describe as today’s coolest profession, along with technological standardization, has somewhat lowered the bar of technical knowledge required to perform the role of data scientist, which actually poses a risk that threatens the quality of the decision-making process. The tools that automate some of the work with
  48. 48. 48 Data Scientists: Who are they? What do they do? How do they work? Where have data scientists studied? When looking at data scientists’ academic backgrounds, it’s surprising that Business Administration is the second-most common course of study. Source: “The State of Data Science”,
  49. 49. 49 Data Scientists: Who are they? What do they do? How do they work? less specific knowledge globalize and streamline the practice of extracting value from data, without the need to aspire towards having a data scientist, or at least a data analyst, on the payroll. Another advantage of in-house training stems from the unique nature of the data scientist’s work. Their concerns and personal motivations do not always coincide with those of other professionals. Their passion for research - let’s not forget that we’re talking about scientists - and their motivation to learn may actually replace the priority levels they give to variables such as their rank in the company, advancement, salary or responsibilities. In this regard, the profile lies halfway between professional and academic, although we must remember that performance metrics in a company are not the same as those at a university. Supermen and superwomen? No, super teams! Statistics, Technology, Analytics, Communication... Without forgetting human qualities. Is this skill set very difficult to come across all in one person? Probably so. Simply because there aren’t many people who can do all that. The alternative is simple: working in multidisciplinary teams. This involves creating groups that, as a whole, satisfy all these qualities. A collaborative effort that goes beyond the work of a single person, where the most important thing is to create a climate where curiosity, motivation, knowledge sharing and cooperation are encouraged. Each team member has a clearly defined role, and does not need to know everything: the modelling expert will work alongside the analytics expert; and the business specialist with the head of communication. But what is important is that the generalist data scientist has a global vision of the entire work process, which will avoid situations where, for example, they invent a mathematical model that cannot be run with the available hardware. 
The group should operate smoothly, within a dynamic rather than a rigid structure, because once the general problem has been identified, specialists centered in a particular area can be incorporated. Such a smooth operation, besides oiling the wheels of the team, will allow each group member to focus on areas that most appeal to them.
  50. 50. 50 Data Scientists: Who are they? What do they do? How do they work? “Right now, there is demand from our Data Science students even before they complete their training”. Esteban Moro.
  51. 51. 51 Data Scientists: Who are they? What do they do? How do they work? The ideal CV Looking to work as a data scientist? In that case, you should make sure that your CV features as many as possible of the following skills and qualifications: • Programming - R - Python - Spreadsheets - JavaScript and HTML - C/C++ or Java, Julia • Statistics - Descriptive and inferential statistics - Experimental design • Mathematics - Functions and graphs - Multivariable calculus - Linear algebra • Data management - Database systems - SQL • Data communication and visualization - Visual coding - Data presentation - Knowledge of audiences • Bonus: Intuition - Project management - Industry knowledge • Machine learning - Supervised learning - Unsupervised learning - Reinforcement learning And an essential complement: a good command of English, the language in which an enormous amount of new knowledge is generated. How much does each specialist earn? Salaries (in the US): Data Scientist, $113,000 / year; Big Data Specialist, $62,000 / year; Data Analyst, $60,000 / year. Source:
  52. 52. 52 Data Scientists: Who are they? What do they do? How do they work? 5. The Data Scientist’s Tools
  54. 54. 54 Data Scientists: Who are they? What do they do? How do they work? “Expectations are the issue. Companies don’t understand that in research, there are times when things just don’t work out”. Alejandro Rodríguez.
  55. 55. 55 Data Scientists: Who are they? What do they do? How do they work? Construction of data processing systems, databases, visualization tools, and data wrangling tools Within engineering related to the construction of data processing systems, there are three basic tools to embark upon the analysis of huge volumes of information: Python, R and Hadoop. While these tools are relatively new and not as widespread, they are easier to grasp for professionals already proficient in programming languages like Java or C. R Project. Considered the standard among statistical programming languages, some know it as “the golden boy” of data science. R is a free software environment dedicated to statistical computing and graphics, compatible with UNIX, Windows, and MacOS platforms. It is a must in data science, and being proficient in it practically guarantees a job offer, given the increasing number of commercial applications and its advantageous versatility. - R is free: anyone can install, use, upgrade, clone, modify, redistribute, and even resell R. Not only does it save money on technology projects, but it also provides constant updates, which are always useful for any statistical programming language. - R is a high-performance language, which helps users handle large data sets, making it a great tool for managing big data. It’s also ideal for demanding, resource-intensive simulations. - Given all its advantages, it is increasingly popular. It has about 2 million users, who make up an active and supportive community. There are more than 2,000 free libraries with statistical resources devoted to finance, cluster analysis, and much more.
  56. 56. 56 Data Scientists: Who are they? What do they do? How do they work? Any cultural change is costly or takes a long time; and if it’s short, it’s traumatic.
  57. 57. 57 Data Scientists: Who are they? What do they do? How do they work? Python. Another flexible and straightforward open-source programming language. A programmer working with Python ends up writing less code thanks to its “friendly” features for beginners, such as code readability, simplified syntax and ease of implementation. - As with R, programming in Python is suited to many industries and applications. Python powers Google’s search engine, as well as YouTube, Dropbox, or Reddit. Institutions such as NASA, IBM, and Mozilla also depend heavily on Python. - Python is also free, which benefits startups and small businesses. Since the language favors simplification, it can be handled by small teams. And a good knowledge of the basics of this object-oriented language lets you migrate to another similar language just by learning the syntax of the new language. - As a high-performance language, Python is the option often chosen to construct fast-access applications. Plus, its huge library of resources provides the necessary help to ensure that productivity is just a few clicks away. Hadoop. Another staple for anyone who wants to venture into the analysis of big data. Available as an open-source framework, Hadoop facilitates the storage and processing of huge amounts of data. It is considered the cornerstone of any flexible and forward-thinking data platform. - Hadoop is one of the technologies with the greatest potential for growth within the data industry. Companies like Dell, Amazon Web Services, IBM, Yahoo, Microsoft, Google, eBay, and Oracle are firmly committed to Hadoop’s implementation. - One of its major benefits is to help companies with their marketing needs: identifying customer behavior patterns on the website, providing recommendations and custom targeting, etc. - Hadoop opens up great career opportunities in a wide variety of positions. 
Given its relevance in many industries, Hadoop specialists can find work as an architect, developer, administrator or data scientist.
  58. 58. 58 Data Scientists: Who are they? What do they do? How do they work? “The reality of Data Scientist’s work is that you do not know what you’re going to find behind the data. If you want to work agilely, you have to be flexible and, above all, be very practical”. Álvaro Barbero.
  59. 59. 59 Data Scientists: Who are they? What do they do? How do they work? Another frequent interaction in the data scientist’s work is with databases. Here it’s common to work with NoSQL databases and with processing tools such as Apache Storm and Spark. Visualization tools are not as important for creating value as they are for convincing. In this sense, they’re associated with the results communication phase and the actual work of rediscovering the value of the data: it’s not the same to trawl through numbers as it is to present them. Programs such as QlikView, Tableau, and Spotfire are used for this. Finally, there’s a pretty unglamorous part of the data scientist’s work, which is a process known as data wrangling. Raw data is often presented in a confused or imperfect way, so the data first needs to be manually collected and cleaned up before it can be converted into a structured format to be explored and analyzed. And this is a task that can take up more than 50% of the data scientist’s working time, using tools like OpenRefine or Fusion Tables. Open source or proprietary software? As in any area where specific software is required, data science professionals can choose between programs marketed by private companies and open-source software. Before embarking on a data science project, it’s very important to know exactly which technological needs will be required to adapt resources and budgets accordingly. This is one of the reasons why more and more companies are opting for the flexibility of open-source alternatives. The variety of options arising from the open-source environment has also helped to expand the use of new technologies and knowledge. Fee-charging commercial tools that dominated the market up until recently are increasingly seeing their prominence diminished in favor of free alternatives. 
Some experts have warned about manufacturers who try to impose their commercial solutions on businesses, which end up investing heavily in proprietary applications for which an open-source alternative exists. This vendor lock-in can be avoided with open-source projects, which are scalable and can offer a performance that’s comparable to proprietary software.
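The data wrangling step described on this slide can be illustrated with a minimal Python sketch that uses only the standard library. The raw export, its column names (`customer`, `city`, `monthly_spend`) and the cleaning rules are all invented for the illustration; real projects will need rules specific to their own data.

```python
import csv
import io

# Hypothetical raw export: inconsistent casing, stray whitespace,
# a duplicate row, and a missing value.
RAW = """customer, city ,monthly_spend
Ana , madrid,120.50
ana,Madrid ,120.50
Luis,BARCELONA,
Marta, Sevilla ,89
"""


def clean_rows(raw_text):
    """Return de-duplicated, normalized rows as a list of dicts."""
    reader = csv.reader(io.StringIO(raw_text))
    header = [name.strip() for name in next(reader)]
    seen = set()
    cleaned = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        row = dict(zip(header, (value.strip() for value in fields)))
        row["customer"] = row["customer"].title()
        row["city"] = row["city"].title()
        # Keep missing amounts explicit as None instead of guessing a value.
        spend = row["monthly_spend"]
        row["monthly_spend"] = float(spend) if spend else None
        key = (row["customer"], row["city"], row["monthly_spend"])
        if key in seen:
            continue  # drop rows that are duplicates after normalization
        seen.add(key)
        cleaned.append(row)
    return cleaned


rows = clean_rows(RAW)
for row in rows:
    print(row)
```

Even in this toy case most of the code deals with inconsistencies rather than analysis, which is consistent with the share of working time the slide attributes to wrangling.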
  60. 60. 60 Data Scientists: Who are they? What do they do? How do they work? 6. Getting down to it: the work process
  62. 62. 62 Data Scientists: Who are they? What do they do? How do they work? “Some people get scared because they think you want to impose an army of mathematicians on them”. Manuel Marín.
  63. 63. 63 Data Scientists: Who are they? What do they do? How do they work? The coexistence between analysts and specialists in a company within mixed teams involves starting out on a journey that will ideally culminate in the opening of new lines of business. Results don’t sprout up from one day to the next, but data science makes once seemingly unattainable milestones feasible. Three obstacles before accessing data Before buckling down to work, the data scientist first must overcome three obstacles: 1. Access to data Many companies may amass huge amounts of customer data, but the nature of their services includes restrictions related to security and privacy. This presents a ‘chicken and egg’ type of dilemma: as a condition for access to data, management will want to know the potential value it can bring to the company. No matter how much the analyst may sound off about this, the real benefits for the company cannot be demonstrated if the necessary data cannot be accessed. How can we get out of this quandary? One way of doing so is by pressing on through scaled models which progressively show the management team the benefits analytics can bring. Access to a sample of data will help create a model that solves a specific problem. A small-scale study of specific customers, which can trigger a decision with immediate impact on the company, is a good starting point. Once the management team can verify the model’s suitability, by applying it to immediate decisions, the first step will have been taken. In this scenario, choosing a suitable problem that has a visible impact on the business is crucial. Therefore, the analyst needs to show their skills, intuition, and knowledge of the business. It goes without saying that a model built from a limited sample will have limited significance, but it is a requirement to fling open the doors of data.
  64. 64. 64 Data Scientists: Who are they? What do they do? How do they work? “There will be a lot of demand from companies that we could consider more traditional”. Bosco Aranguren.
  65. 65. 65 Data Scientists: Who are they? What do they do? How do they work? 2. Technological means Having overcome the first obstacle, the next one appears: having the necessary technological infrastructure to support access to data, analysis, and the exploration of results. It’s not about looking for a culprit if such means are not available: there might not be anybody in the organization cognizant of the impact that data analysis can have on the business. But, this path offers no shortcuts: if this work isn’t done, someone will have to deal with it. A further problem that often comes up is the decentralization of data. With disaggregated departments and dispersed databases, each with its own access and security protocols, the data scientist, sometimes with the help of an engineer, will have to focus on grouping the data in one place, before they can even get to work. 3. Human resource management Part of data science, like any other science, is exploration. And exploration calls for a great deal of inspiration and the lowest possible number of strict orders that stifle creativity. Passion, perseverance, and curiosity are qualities required in this type of work, and are often not compatible with rigid organizational structures. Therefore, managers must be patient and understanding, and always within the varying pressure dictated by financial results, should grant the data scientist the necessary time and freedom to move forward with his or her investigation. Once the balance has been achieved between what motivates employees and the business’s priorities, the results should start to appear. From data to decision... if nothing goes wrong Once the data is available, the data scientist generally undertakes a scaled process. He or she will have to devote much of their time to cleaning the data, and then set off on a route that begins with small samples and will end, if all goes well, with the extraction of useful conclusions based on a predictive model.
  66. 66. 66 Data Scientists: Who are they? What do they do? How do they work? “Oftentimes the reason they end up hiring you astonishes you”. Manuel Marín.
  67. 67. 67 Data Scientists: Who are they? What do they do? How do they work? If all goes well... Because data science is not a foolproof process. As in any research project, there are no absolute certainties. Therefore, we must be prepared for possible failure, however hard that may be for companies that have high expectations and often do not consider a lack of results to be an acceptable outcome. In projects involving vast databases, it’s not always necessary to use all the data. Therefore, it is important to scale: starting with a manageable database, going back and forth, and setting up a permanent dialogue with the person or department most interested in the project. Then, once a small insight into the potential scope has been gained, scaling can begin. The road to this point is sometimes littered with issues related to decision-making: the focus of the investigation, the data to be used, the analytics to be used… Technical knowledge does not guarantee the customization of specific projects, which are always subject to unforeseen circumstances that are not covered in training centers. The ratio between available information and decisions is very unbalanced towards the former. The process of transforming data into decisions may lead to swathes of information being lost, and the way the process is transmitted plays a role in this journey. An important decision for the company cannot be conveyed if it is not backed up with solid arguments about the source of this conclusion, which data has been used and which processes have been followed to analyze this information and turn it into the nugget that is the decision.
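The idea of starting with a manageable sample and scaling only once the results look stable can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the data is synthetic, the quantity being estimated is a simple mean, and the function name, starting size, growth factor and tolerance are hypothetical choices rather than recommendations.

```python
import random
import statistics


def estimate_on_growing_samples(values, start=100, factor=4, tolerance=0.02, seed=0):
    """Estimate the mean on progressively larger random samples.

    Stops as soon as two consecutive estimates differ by at most
    `tolerance` (relative), or when the whole data set has been used.
    Returns a list of (sample_size, estimate) pairs.
    """
    rng = random.Random(seed)
    history = []
    size = min(start, len(values))
    while True:
        sample = rng.sample(values, size)
        estimate = statistics.fmean(sample)
        history.append((size, estimate))
        if len(history) >= 2:
            previous = history[-2][1]
            if abs(estimate - previous) <= tolerance * abs(previous):
                break  # the estimate has stabilized; no need to scale further
        if size == len(values):
            break  # nothing left to scale to
        size = min(size * factor, len(values))
    return history


# Synthetic "customer spend" figures, invented for the illustration.
data_rng = random.Random(42)
population = [data_rng.gauss(100, 15) for _ in range(50_000)]

history = estimate_on_growing_samples(population)
for size, estimate in history:
    print(f"sample of {size:>6}: estimated mean {estimate:.2f}")
```

In practice the stopping rule would be agreed with the person or department that owns the problem, since they decide how much precision the decision actually needs.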
  68. 68. 68 Data Scientists: Who are they? What do they do? How do they work? 7. Evaluating the data scientist’s work
  70. 70. 70 Data Scientists: Who are they? What do they do? How do they work? In what industries can we find data scientists? Technology-heavy industries account for the largest accumulation of data scientists. Source: “The State of Data Science”,
  71. 71. 71 Data Scientists: Who are they? What do they do? How do they work? Mathematician George E. P. Box, considered one of the most important statisticians of the twentieth century, famously once said: “All models are wrong, but some are useful”. Wrong in the sense that they cannot capture all the details of a system, because if they did that, the model would be so complex that it would contradict the very purpose of modeling. Yet that does not render models useless; it does force them to be constantly reinterpreted and validated using empirical data and knowledge of the system itself, regardless of the techniques or algorithms used in the analysis. How can we measure the results of the data scientist’s work? First, we must take the time horizon into account: benefits are never seen in the short term. The data scientist develops a predictive model, whose execution depends on whether it is accepted by management. Machine learning techniques will then be run on the model created to improve accuracy. For team leaders, it is important to emphasize the work’s practical application. It is fundamental, especially in large companies, to ensure that algorithms do not end up simply as beautiful theories. The responsibility of the data scientist can officially be wrapped up once they have finished constructing their model, but personal responsibility presses on, even at the risk of sounding gloomy, until the model is run. Then comes the wait for results. Models are not foolproof: a key parameter may have been left out, either because a wrong variable altering the outcome has been entered or because the subtleties of the business have not been grasped. Execution may also fail: the insight might be good, but it is not put into practice in the right way. The quality of the algorithm is not the exclusive yardstick to measure the data scientist’s performance. 
Their responsibilities include some sales-related work: dealing with customers, explaining to them what they have found, and guiding them on what to do with their data, always using the communication skills that the data scientist - or any member of their team - should have. Another type of valuation can be extracted from this work. Finally, let’s once again remind ourselves of the importance of the human factor. Data science is not a black box enshrouded in mystery. Data scientists are not oracles, nor are their words prophecies: the algorithm may make a specific prediction, but the option to translate that insight to the business or not, with all the consequences that it may incur, ultimately depends on the person who makes the decision. Hence the importance of the human factor in the whole process.
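The point that a model has to be validated against empirical data, not just built, can be made concrete with a small Python sketch. It fits a straight line to part of a synthetic data set and checks it on held-out observations against a naive baseline. The data, the 150/50 split and the choice of mean absolute error as the metric are assumptions made for the illustration; the slides do not prescribe a particular validation method.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(actual, predicted):
    """Average absolute gap between observed and predicted values."""
    return statistics.fmean(abs(a - p) for a, p in zip(actual, predicted))


# Synthetic data, invented for the illustration: y = 3x + 5 plus noise.
rng = random.Random(7)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [3 * x + 5 + rng.gauss(0, 1) for x in xs]

# Hold out the last 50 observations; the model never sees them while fitting.
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

slope, intercept = fit_line(train_x, train_y)
model_mae = mean_absolute_error(test_y, [slope * x + intercept for x in test_x])

# Naive baseline: always predict the average seen during training.
baseline = statistics.fmean(train_y)
baseline_mae = mean_absolute_error(test_y, [baseline] * len(test_y))

print(f"model MAE: {model_mae:.2f}, baseline MAE: {baseline_mae:.2f}")
```

A model that cannot beat such a baseline on data it has not seen is "wrong" without being "useful", in Box's terms, however elegant the algorithm behind it.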
  72. 8. Trust: an essential component in the data science process.
  74. “In terms of training, I don’t think there is a gap between Spain and the United States or the United Kingdom”. Pep Porrà.
  75. Data is highly sensitive, especially when working with outside information. In such cases, the customer relationship should be respectful and diplomatic: it’s their business, it’s their data and it’s often their most valuable asset. In some industries, there is a certain idea of harnessing a return on data, but the lack of experience with big data leads to reservations before they even dare to venture into data analytics. Younger companies are more cautious, perhaps waiting for others in their industry to take the first step. It’s also common for companies to take the big data route but later be reluctant to hand over their data, either because they hold back from sharing any conclusions with the market or because they don’t even want analysts to know them. In this context, the most common formula is: acquire the tool, train the team in the tool, and then give support. Another delicate situation arises with the dangers of do-it-yourself data science. Some people choose to apply tools blindly after learning about them only superficially, with unpredictable results. This creates a buzz that is detrimental to the entire data science industry, in the sense that companies don’t receive the advertised benefits of big data and don’t truly understand why they haven’t reaped the full rewards. Many disoriented companies have heard the fanfare about big data and spend lots of money without knowing what they’re spending it on, or have yet to see any results. They need to be treated sensitively, with sound judgement and common sense, clarifying and simplifying the guidelines for action. In an industry where the raw material is so perilous, trust is essential. Ethics: the essential complement to science The data scientist takes on a strong ethical commitment, in the sense that they must ensure a responsible use of the information given to them. 
In an increasingly digitalized society where everyone unwittingly and involuntarily leaves trails, it would be possible to invade anybody’s freedom simply by using the appropriate knowledge and powerful servers. But nobody wants that to happen. Ethical commitment is not just a sign of sound judgement; it is also imperative in an information society that may face dangers that are not fully known: mass surveillance, lack of privacy, large-scale loss of data, etc. It is therefore the data scientist’s duty to work transparently, explaining in a simple and accessible way what their job is and how
  76. “Clients sometimes come across things that they weren’t expecting, and communicating it requires specialists who are very good with people”. Felipe Ortega.
  77. they do it, to quash the threat to privacy that people might often associate with big data. Few people are interested in knowing the intricacies of an algorithm, but they do want an outline of the route that the data follows. One way to ensure that data gets used ethically is to work on open data projects, where anyone can access the data, contributing in some way to social awareness and utility. For example, Spanish bank BBVA has launched several of these projects, designed to improve the quality of life of citizens or to optimize efficiency in cities through the intelligent use of information. Open the data, give something back to society, become an aggregated data platform for others to use for the creation of value in cutting-edge projects where altruism replaces the quest for profit. That is the ethical commitment that many data scientists have made to safeguard the good name of their specialty.
  78. 9. Data scientists in Spain today
  80. To stay on top of everything and constantly refresh one’s knowledge, curiosity is essential.
  81. Are Spanish data scientists more qualified or less qualified than other nationalities? Is there a shortage of professionals? Will academic programs keep up with the expected demand in the years to come? Overall, experts agree that Spain is on a par with the leading countries in data science. There is no shortage of highly qualified professionals, or of startups specializing in big data processing that stand out among the most advanced in Europe, if not the world. The professional level is so high that it’s not unreasonable to think of Spain as a global powerhouse in data science. This opportunity must be managed well to make sure it doesn’t fail. As in other scientific disciplines, excellent professionals are going to other countries to pursue their careers. It’s true that money draws professionals to places like California, but a high concentration does not necessarily imply a higher level. For Spanish data scientists to prove their worth, they should start by believing in themselves, acting with professionalism and discretion to ensure a promising future. The range of academic programs is also increasingly extensive in both public and private colleges, where there are countless Master’s programs and specialized courses. This mix is indispensable in a discipline that constantly coexists with innovation and research. So, if something were to jeopardize the advancement of data science in Spain, it wouldn’t be the academic level of specialists, but rather some of the endemic problems caused by how work is organized in Spanish corporations. For example, agility when implementing projects is not comparable to the United States, where there are far fewer bureaucratic obstacles. Similarly, there is still a gap between academia and the business world: there’s a lack of dynamism when integrating the work of a data scientist into the business world. 
In Spain, there are claims that there is less flexibility in the labor market when it comes to re-training. Once a professional has focused on a career path, taking the risk to change it is more difficult than in other countries, due to a tendency towards convenience or pigeonholing. Therefore, it is important for organizations to support their employees. That said, Spanish professionals, as well as those from Latin America, have a bonus that can give them a competitive advantage over their peers in the rest of the world: creativity, understood as the ability to seek out alternative problem-solving processes that nobody else has imagined. And that fits in with and complements the empathy side. In other words, creativity lets Spanish data scientists apply a part of art (the other is science) to problem-solving.
  82. “Everyone must realize that our daily life is going to be very dependent on and influenced by data analysis”. Felipe Ortega.
  83. Who’s making the most out of data science in Spain? Three industries are at the forefront of the implementation of data science in Spain: banking, telecommunications, and tourism. Overall, large companies are investing more resources into data science. These include entities such as Santander, BBVA, Telefónica, Bankinter, Sabadell, La Caixa, Amadeus, Kayak, etc. But this investment isn’t exclusive to large companies. More moderately-sized companies are using data science in a very creative and innovative way, with worldwide recognition of their work. Two examples: Carto. Founded in Madrid in 2012, originally as CartoDB. Its most popular tool is Carto Builder, which allows visualization enthusiasts to build interactive maps from geodata with no programming skills required. With more than 1,400 customers, 200,000 registered users and an office in New York, its goals focus on offering large corporations an optimization tool for decision-making and predicting consumer trends. Stratio. Also founded in 2012, as an offshoot of predecessor Paradigma. Stratio develops platforms and products from big data technologies such as Cassandra and Apache Spark, as well as proprietary developments. Customers using its real-time processing solution come from banking, insurance, tourism, and retail. More than 25 specialists in big data architecture work out of Stratio’s Madrid headquarters. Stratio also has an office in Palo Alto, California, the heart of Silicon Valley.
  84. 10. Conclusions: still a great deal to be done
  86. “People ask us: are you opening up data so that everyone can do business? Well, yes: we let others have a better knowledge of reality from our data”. Marcelo Soria.
  87. The analysis of big data has already left behind the emerging technology phase (hype cycle) and is taking hold in many companies. Or, at the very least, certain “core” technologies are, such as distributed databases, real-time processing, large analytical layers, etc. With the initial implementation being wrapped up, data science professionals are moving towards specialization. As the field continues to grow, it is normal for it to split up into specialties, to form an ecosystem. Companies, to some extent, are promoting this trend because they cannot afford to properly compensate large teams of data scientists. The same is happening in training. It’s no longer possible to offer a single set of core courses, so the range of academic content is beginning to diversify. As they define their needs, companies will increasingly demand these sought-after professionals, who are often awarded grants by the companies that recruit them or guaranteed immediate employment upon completing their education. Lots of companies invest huge sums into market research. Some will realize that data science represents another data source, a new form of R&D that converts data into new value for the company. But big data is still in its teenage years. Many challenges lie ahead, derived from handling large volumes of information and its conversion into useful tools. What does the adulthood of big data look like? Attention should be shifted from the “bigness” of data to its application. The famous “Four Vs” of big data (Volume, Velocity, Variety and Veracity) must be expanded to bring in a new concept: Value. This involves reducing the noise of data and increasing its contribution. Data science will mature, strengthen its position, gain recognition as a career and surprise us with future discoveries. 
It should be designed as a tool to not only bring transparency to the present, but to anticipate the future in a way conducive to business growth.
  88. “It is our duty to give something back to society. With all the information companies have about people, they can greatly improve their lives”. Richard Benjamins.
  89. This will be possible by converting data into knowledge, and that knowledge into practical actions, whether to provide better customer service, boost efficiency through automation, or create new business opportunities by identifying cross-sells or opening new markets. At present, most projects related to data analysis focus on cost optimization and process integration. In the future, predictive analysis will place emphasis on data monetization and the delivery of new applications and business opportunities. Predictive models in cloud environments, parallel data processing or sophisticated machine learning algorithms will optimize or guide the decision-making process. Ultimately, companies will have to reinvent or reinterpret themselves as their business becomes more digital, and customer proposals will increasingly depend on lessons learned from data. Companies like Siemens, defined by its CEO as “a software company”, have already fully embarked on this process. A key element of this evolution will be an environment of experimentation, tolerance, and short development cycles that drives innovation. The companies leading this evolution will be those that place the figure of the data scientist at the core of their strategy. This way, they will be able to develop the conditions (talent acquisition, employee commitment and priority-setting) needed to place them at the head of the race to turn data into a long-lasting and tangible competitive advantage. In our daily lives, we are already using applications and products that come from processing a huge amount of data: spam filters in email inboxes, recommendations on social networks, search engine results, medical tests and prescriptions, investment funds, etc. And with the future promised by the Internet of Things, the need to process more and more information will only grow. 
Our lives may end up highly conditioned, or heavily influenced at the very least, by the analysis of all the data surrounding us. A future, in any case, where all those involved in the analysis of big data should be very cautious with everything related to data privacy and consumer confidence. It doesn’t matter if our data is used to better manage our time or our money, customize the advertising we see or improve our health. If we believe that it will improve our lives, we won’t object to anybody’s use of it.
  90. Annex.
  91. Business case 1: Commerce360. What are my customers most interested in? On what day does my competition outsell me? Are their items more expensive or cheaper than mine? When do I sell the most? Where do my buyers live? What is their gender, their age, and how much do they spend on every purchase? Any business would like to know the answers to these and similar questions. Large and medium companies can do this by allocating resources to business intelligence, but it’s more difficult for independent traders or local stores. That’s why Spanish bank BBVA has developed Commerce360, a tool that aims to make business intelligence accessible to any company. Based on aggregated and anonymous data from BBVA card payments, the application extracts indicators related to the industry and profile of customers who buy items in a specific area. “Commerce360 is a tool for retailers, where by using our information on card payments we can provide a store with its economic activity, purchasing dynamics, socio-demographic information on what its customers are like, age, gender, where and when they shop, etc., comparing all this with aggregated businesses that are their competition or other businesses in the area that perform the same type of activity,” as Marcelo Soria explains. As a result, retailers once guided by intuition or other traditional methods have access to an analytical tool that lets them discover the origin of their customers, measure their loyalty, study their demographic characteristics and identify high-value customers. “For us it is a very interesting line for democratizing access to data and data-based intelligence. This is the future of retail,” adds Soria.
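To make the notion of “aggregated and anonymous” indicators more concrete, here is a minimal sketch of one common approach: grouping payments by area and customer profile, and publishing only groups with enough distinct customers. This is not BBVA’s actual method; the field names, the sample payments, the aggregate_payments function and the MIN_CUSTOMERS threshold are all hypothetical.

```python
from collections import defaultdict

MIN_CUSTOMERS = 3  # hypothetical threshold: smaller groups are suppressed

def aggregate_payments(payments, min_customers=MIN_CUSTOMERS):
    """Summarize payments per (zone, age_band), hiding groups that are too small."""
    groups = defaultdict(lambda: {"amount": 0.0, "count": 0, "customers": set()})
    for p in payments:
        group = groups[(p["zone"], p["age_band"])]
        group["amount"] += p["amount"]
        group["count"] += 1
        group["customers"].add(p["customer_id"])

    report = {}
    for key, group in groups.items():
        # Customer identifiers are only counted, never published.
        if len(group["customers"]) >= min_customers:
            report[key] = {
                "payments": group["count"],
                "avg_ticket": round(group["amount"] / group["count"], 2),
            }
    return report

# Hypothetical card payments (identifiers and amounts are made up).
payments = [
    {"customer_id": "c1", "zone": "28001", "age_band": "25-34", "amount": 20.0},
    {"customer_id": "c2", "zone": "28001", "age_band": "25-34", "amount": 30.0},
    {"customer_id": "c3", "zone": "28001", "age_band": "25-34", "amount": 40.0},
    {"customer_id": "c4", "zone": "28002", "age_band": "35-44", "amount": 55.0},
]
report = aggregate_payments(payments)
```

In this made-up sample, the second group contains a single customer, so it is left out of the report rather than exposing an individual’s spending.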
  93. Business case 2: Smart Steps. Smart Steps is a geo-marketing program developed by Telefónica using data from its mobile phone network. Data is aggregated and extrapolated anonymously to extract information on user trends or behavior patterns in a specific area. The project captures billions of data points from Telefónica’s mobile network, 365 days a year, 24 hours a day. This data is matched with different sociodemographic and mobility indicators (residence, means of transport, age) that can offer companies precise targeting based on the movements of their potential customers. Smart Steps can be applied to any industry in which the movement and knowledge of the user profile are important, such as travel and transport, tourism, or outdoor advertising. For example, local retailers could find out whether participants in an event such as San Fermín are regular or sporadic, where they come from, where they are staying, the length of their visit, etc., and with this information they can tailor their sales approach. It is also useful in the public sector, as knowing people’s movement patterns helps improve traffic management in the city, adapt public transport, or analyze the need to build new infrastructure. In 2014, the program was used to map out the most crime-prone areas in London: the generated algorithm obtained an accuracy of 70% when predicting crime hotspots.
  95. Business case 3: Home Fire Risk Map. 25,000 people are killed or injured in house fires every year in the United States. The American Red Cross aims to reduce the number of victims through an initiative based on big data. The Home Fire Risk Map program identifies the locations most prone to house fires across the country, and will be used by Red Cross volunteers to install smoke alarms and provide fire safety courses where they’re needed most. Data suggests that 60% of fires can be prevented simply by having a working smoke alarm and by knowing what to do in the event of a fire. Using different open data repositories, 50 volunteers worked for over a year to create a map that identifies high-risk areas throughout the country. First, they built a model to identify those communities with the least smoke alarm coverage. After that, another algorithm predicted the places most prone to fires. Lastly, a third program calculated the likelihood of injury or death when a home fire does occur. The three models and their results come together on the map presented here. Thanks to this initiative launched in June 2016, the first month saw the installation of 400,000 smoke alarms in households across the United States, with the goal of reaching 2.5 million alarms. Smoke alarms have an average lifespan of 10 years, which signals that a year’s work is expected to result in medium-term benefits.
  97. Business case 4: The Huffington Post. The Huffington Post is one of the most widely read digital media outlets in the world. It is also an environment where data analysts enjoy almost as much prominence as editors, since much of its success is due to big data, which optimizes content, authenticates comments, boosts advertising clout, and improves user experience. Real-time statistics and analytical platforms define the editorial process. For HuffPost it is essential to provide the right content to each reader straight away and in the right format. For example, data analytics for the Parents section showed that this demographic mainly uses mobile devices to connect, especially when children are in bed, and is more active on weekend mornings. Content and advertising are tailored to these habits. The huge number of comments received on the website (more than 300 million in 2013) also encouraged HuffPost executives to debug data to improve user experience. This was achieved by means of conjoint analysis, a statistical technique used to evaluate the different characteristics of a product or service. The analysis found that the quality of comments increased with geographic proximity and among identified users, which led HuffPost to ban anonymous comments. Big data was also used to improve user loyalty. In collaboration with technology company Gravity, HuffPost identified topics of interest for its readers, connecting the most compelling content for each type of reader through what it calls “passive personalization”. The technology also provides information on where each reader accesses content, and helps optimize navigation around the website. With an average of 10 to 12 articles read in each session, the goal is to reach 15.
  99. Business case 5: Hillary Clinton’s 2016 campaign. Few Americans will have heard the name Elan Kriegel. Yet millions of them were in his sights during the 2016 presidential campaign. Kriegel led a team of 60 mathematicians and analysts responsible for guiding, with absolute precision, each of the Democratic candidate’s promotional activities in the campaign, from the party primaries up to the final vote. For example, Kriegel’s team developed an algorithm that decided where to spend each cent of the $60 million TV advertising budget during the primaries. With hundreds of local and state TV networks scattered throughout the country, the victory over Bernie Sanders was molded by carefully choosing the states, networks, programs, and schedules where Clinton would convey her message to voters. Unlike in other countries, election campaigns in the United States are fully customized. Key decisions were made based on the work of analysts, such as when and how to send email messages to voters, which doors canvassers would knock on, which numbers phone bankers would dial, which voters to target via a Facebook ad, and which to address through regular mail. This meticulous work turned Clinton’s campaign into more of a mathematical than an inspirational exercise. It was a ground-breaking and efficient campaign organized around models defined by data analysis, one that paves the way for a new era in the definition of political campaigns, based on data culture. And in the meantime, Kriegel’s team is already incubating the next generation of talent within the Democratic Party, names unknown for now but which will play a key role in 2020.
  100. #REBELTHINKING. REBEL THINKERS: Iñaki Bagazgoitia, Mar Castaño, Carlos Corredor, Laura Dinneen, Carlota García-Abril, Amelia Hernández, Natasha Morrison, Ellen Thomas. HAVE COLLABORATED: Fuencisla Clemares, Bosco Aranguren, Richard Benjamins, Marcelo Soria, Álvaro Barbero, Alejandro Rodríguez, Manuel Marín, Esteban Moro, Felipe Ortega, Pep Porrà. ACKNOWLEDGMENT