Challenges of Big Data Research
Manfred M. Fischer
Vienna University of Economics and Business
Regional Science Academy, Special Academic Session, ERSA conference,
Tue 23 August 2016
The Big Data problem may be characterized in terms of three dimensions of data management challenges:
Ø Volume refers to the size of the data.
Ø Velocity addresses the speed at which data can be received as well as analyzed.
This also refers to the rate of change of data, which is especially relevant in the
area of stream processing.
Ø Variety refers to the issue of disparate and incompatible data formats. Data can
come in from many different sources and take on many different forms, including
text, audio, video, graph, and more.
The Big Data Problem
The Big Data Analysis Pipeline
[Figure: the Big Data analysis pipeline]
Major Steps in Analysis of Big Data: Data Acquisition and Recording -> Information Extraction and Cleaning -> Data Integration, Aggregation, and Representation -> Query Processing, Data Modelling, and Analysis -> Interpretation
Cross-Cutting Challenges: Heterogeneity and Incompleteness; Scale; Timeliness; Privacy; Human Collaboration
Additional figure label: Overall System
Major steps in analysis of Big Data are shown in the flow at the top. Below it are Big Data needs that make these tasks challenging.
Ø Big Data does not arise out of a vacuum, but is recorded from some data
generating source. Much of this data is of no interest, and can be filtered and
compressed by orders of magnitude.
Ø One challenge is to define these filters in such a way that they do not discard
useful information. We need research in the science of data reduction that can
intelligently process this raw data, and we need on-line techniques that can
process such streaming data on the fly, since we cannot afford to store first and
reduce afterwards.
Ø The second big challenge is to automatically generate the right metadata to
describe what data is recorded and how it is recorded and measured.
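One way to picture on-line reduction of streaming data is reservoir sampling, which keeps a fixed-size uniform sample from a stream of unknown length without storing the stream first. The sketch below is only an illustration: the technique is a stand-in chosen here, not a method named in the slides, and the function name and the synthetic stream are assumptions.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Keep a uniform random sample of at most k items from a stream.

    The stream is read once and never stored, so memory use stays at k items.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Hypothetical sensor stream, generated lazily so it is never held in memory.
readings = (x * 0.5 for x in range(100_000))
sample = reservoir_sample(readings, k=5)
print(sample)
```

Uniform sampling of this kind can easily miss rare but important events, which is exactly the risk of discarding useful information described above.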
Data Acquisition and Recording
Ø Frequently the information collected will not be in a format ready for analysis.
Ø We require an information extraction process that pulls out the required
information from the underlying sources and expresses it in a standard form
suitable for analysis.
Ø Doing this correctly and completely is a continuing technical challenge. Note that
this data also includes images and will in the future include video. Such extraction
is often highly application dependent.
Ø Existing work on data cleaning assumes well-recognized constraints on valid data
or well-understood error models. But for many emerging Big Data domains these
do not exist.
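As a small illustration of extraction into a standard form followed by cleaning against a known constraint, consider the sketch below. The log format, the field names, and the valid temperature range are all hypothetical, chosen only for this example; in emerging Big Data domains such constraints may not be available at all.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    station: str
    temperature_c: float


# Hypothetical free-text format: "station=<ID> temp=<number><C or F>"
PATTERN = re.compile(
    r"station=(?P<station>\w+)\s+temp=(?P<value>-?\d+(?:\.\d+)?)(?P<unit>[CF])"
)


def extract(line: str) -> Optional[Reading]:
    """Pull one reading out of a text line and express it in a standard form."""
    match = PATTERN.search(line)
    if match is None:
        return None  # nothing recognisable in this line
    value = float(match.group("value"))
    if match.group("unit") == "F":
        value = (value - 32.0) * 5.0 / 9.0  # normalise Fahrenheit to Celsius
    # Assumed validity constraint: plausible surface air temperatures only.
    if not -90.0 <= value <= 60.0:
        return None
    return Reading(match.group("station"), round(value, 2))


for raw in ["log: station=A1 temp=68F ok", "station=B2 temp=999C", "no data here"]:
    print(raw, "=>", extract(raw))
```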
Information Extraction and Cleaning
Ø Data analysis is considerably more challenging than simply locating, identifying,
understanding, and citing data. For effective large-scale analysis all of this has to
happen in a completely automated manner.
Ø This requires differences in data structuring and semantics to be expressed in
forms that are computer understandable.
Ø There is a strong body of work in data integration that can provide some of the
answers. However, considerable additional work is required to achieve automated
error-free difference resolution.
Ø Even for simpler analyses that depend on only one data set, there remains an
important question of suitable database design.
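To make the idea of expressing differences in structure and semantics in a computer-understandable form more concrete, the sketch below declares, for each source, how its fields map onto a common schema. The two sources, their field names, and the unit conversion are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

# A mapping says: target field -> (source field, conversion function).
Mapping = Dict[str, Tuple[str, Callable[[Any], Any]]]

# Hypothetical source A reports income in euros under "income_eur".
SOURCE_A: Mapping = {
    "region": ("region_code", str),
    "income_eur": ("income_eur", float),
}

# Hypothetical source B uses another field name and reports thousands of euros.
SOURCE_B: Mapping = {
    "region": ("nuts2", str),
    "income_eur": ("income_keur", lambda v: float(v) * 1000.0),
}


def to_common(record: Dict[str, Any], mapping: Mapping) -> Dict[str, Any]:
    """Rewrite one source record into the common schema."""
    return {
        target: convert(record[source])
        for target, (source, convert) in mapping.items()
    }


def integrate(
    sources: Iterable[Tuple[List[Dict[str, Any]], Mapping]]
) -> List[Dict[str, Any]]:
    """Apply each source's mapping and concatenate the results."""
    merged: List[Dict[str, Any]] = []
    for records, mapping in sources:
        merged.extend(to_common(r, mapping) for r in records)
    return merged


rows = integrate([
    ([{"region_code": "AT13", "income_eur": "25000"}], SOURCE_A),
    ([{"nuts2": "AT12", "income_keur": 24.5}], SOURCE_B),
])
print(rows)
```

The mappings here are written by hand; the open problem described above is resolving such differences automatically and without error.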
Data Integration, Aggregation, and Representation
Ø Methods for querying and mining Big Data are fundamentally different from
traditional statistical analysis on small samples. Big Data is often noisy, dynamic,
heterogeneous, interrelated and untrustworthy.
Ø Mining requires integrated, cleaned, trustworthy, and efficiently accessible data,
declarative query and mining interfaces, scalable mining algorithms, and Big-Data
computing environments.
Ø In the future, queries towards Big Data will be automatically generated for content
creation on websites, to populate hot-lists or recommendations, and to provide an
ad hoc analysis of the value of a data set to decide whether to store or to discard it.
Ø Scaling complex query processing techniques to terabytes while enabling
interactive response times is a major open research problem today.
Ø Another problem with current Big Data analysis is the lack of coordination
between database systems, which host the data and provide SQL querying, and
analytics packages, which perform various forms of non-SQL processing such as
data mining and statistical analyses.
Query Processing, Data Modelling, and Analysis
Ø It is rarely enough to provide just the results. Rather, one must provide
supplementary information [called the provenance of the (result) data] that
explains how each result was derived, and based upon precisely what inputs.
Ø Systems with a rich palette of visualizations become important in conveying to the
users the results of the queries in a way that is best understood in the particular
domain.
Ø Furthermore, with a few clicks the user should be able to drill down into each
piece of data that she sees and understand its provenance, which is a key feature to
understanding the data.
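A minimal sketch of provenance tracking, assuming a toy setting in which every derived value keeps a reference to the operation and the inputs that produced it. The class and helper names are hypothetical and do not come from any particular system.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Traced:
    """A value together with the operation and inputs it was derived from."""
    value: float
    operation: str = "input"
    inputs: Tuple["Traced", ...] = ()

    def explain(self, depth: int = 0) -> List[str]:
        """Return an indented trace of how this value was derived."""
        lines = ["  " * depth + f"{self.operation} -> {self.value}"]
        for parent in self.inputs:
            lines.extend(parent.explain(depth + 1))
        return lines


def source(value: float, name: str) -> Traced:
    return Traced(value, f"input:{name}")


def total(*items: Traced) -> Traced:
    return Traced(sum(i.value for i in items), "sum", items)


def ratio(a: Traced, b: Traced) -> Traced:
    return Traced(a.value / b.value, "ratio", (a, b))


# Hypothetical inputs: the result can be "drilled into" down to its sources.
result = ratio(total(source(30.0, "a"), source(70.0, "b")), source(4.0, "c"))
print("\n".join(result.explain()))
```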
Interpretation
Having described the five stages of the Big Data analysis pipeline, we now turn to
some common challenges that underlie many, and sometimes all, of these stages.
These are:
Ø Heterogeneity and incompleteness
Ø Scale
Ø Timeliness
Ø Privacy
Ø Human collaboration
Common Challenges
Ø When humans consume information, a great deal of heterogeneity is comfortably
tolerated. In fact, the nuance and richness of natural language can provide valuable
depth. However, machine analysis algorithms expect homogeneous data, and
cannot understand nuance. In consequence, data must be carefully structured as a
first step in or prior to data analysis.
Ø Even after data cleaning and error correction, some incompleteness and some
errors in data are likely to remain. This incompleteness and these errors must be
managed during data analysis.
Ø Doing this correctly is a challenge. Recent work on managing probabilistic data
suggests one way to make progress.
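One simple reading of probabilistic data is a table in which each record carries a probability of being correct, so that queries return expected values rather than exact ones. The sketch below assumes records are independent of each other, which is a strong simplification; the records and probabilities are invented for illustration.

```python
from typing import Any, Callable, Dict, Iterable, Tuple

# Each row is (record, probability that the record is correct).
Row = Tuple[Dict[str, Any], float]


def expected_count(rows: Iterable[Row], predicate: Callable[[Dict[str, Any]], bool]) -> float:
    """Expected number of correct records that satisfy the predicate."""
    return sum(p for record, p in rows if predicate(record))


def prob_any(rows: Iterable[Row], predicate: Callable[[Dict[str, Any]], bool]) -> float:
    """Probability that at least one matching record is correct.

    Assumes the records are independent of each other.
    """
    prob_none = 1.0
    for record, p in rows:
        if predicate(record):
            prob_none *= 1.0 - p
    return 1.0 - prob_none


# Hypothetical uncertain records left over after cleaning.
table = [
    ({"city": "Vienna"}, 0.5),
    ({"city": "Vienna"}, 0.5),
    ({"city": "Graz"}, 0.9),
]


def in_vienna(record: Dict[str, Any]) -> bool:
    return record["city"] == "Vienna"


print(expected_count(table, in_vienna), prob_any(table, in_vienna))
```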
Heterogeneity and Incompleteness
Ø  Managing large and rapidly increasing volumes of data has been a challenging issue for
many decades. In the past, this challenge was mitigated by processors getting faster.
Ø  Over the last five years processor technology has undergone a dramatic shift. Rather than
processors doubling their clock frequency every 18-24 months, clock speeds have now
largely stalled due to power constraints, and processors are being built with increasing
numbers of cores.
Ø  In the past, large data processing systems had to worry about parallelism across nodes in a
cluster. Now, one has to deal with parallelism within a single node. This requires us to
rethink how we design, build and operate data processing components.
Ø  Another dramatic shift that is underway is the move towards cloud computing, which now
aggregates multiple disparate workloads with varying performance goals into very large
clusters.
Ø  The level of sharing of resources on expensive and large clusters requires new ways of
determining how to run and execute data processing jobs and to deal with system failures.
Scale
Ø The design of a system that effectively deals with size is likely also to result in a
system that can process a given size of data set faster. However, it is not just this
speed that is usually meant when one speaks of Velocity in the context of Big
Data. Rather, there is an acquisition rate challenge and a timeliness challenge.
Ø There are many situations in which the result of the analysis is required
immediately. For example, if a fraudulent credit card transaction is suspected, it
should ideally be flagged before the transaction is completed, potentially
preventing the transaction from taking place at all.
Ø Obviously, a full analysis of a user’s purchase history is not likely to be feasible in
real-time. Rather, we need to develop partial results in advance so that a small
amount of incremental computation with new data can be used to arrive at a quick
determination.
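The idea of keeping partial results so that each new transaction needs only a small incremental computation can be sketched with running statistics per card. The sketch uses Welford's online update for mean and variance and a simple deviation rule; the rule, the threshold, and the class names are assumptions for illustration, not a real fraud model.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Mean and variance maintained incrementally (Welford's method)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self) -> float:
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


class FraudScreen:
    """Flags amounts far from a card's history without rescanning that history."""

    def __init__(self, threshold: float = 3.0, min_history: int = 5) -> None:
        self.stats = defaultdict(RunningStats)
        self.threshold = threshold      # assumed cut-off, in standard deviations
        self.min_history = min_history  # do not judge cards with little history

    def check(self, card: str, amount: float) -> bool:
        s = self.stats[card]
        suspicious = False
        if s.n >= self.min_history:
            std = s.std()
            suspicious = std > 0 and abs(amount - s.mean) > self.threshold * std
        if not suspicious:
            s.update(amount)  # only accepted transactions extend the history
        return suspicious


screen = FraudScreen()
for amount in [20.0, 22.0, 19.0, 21.0, 20.0, 23.0, 500.0]:
    print(amount, screen.check("card-1", amount))
```

Each check costs constant time and memory per card, whatever the length of the purchase history.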
Timeliness
Ø The privacy of data is another huge concern, and one that increases in the context
of Big Data.
Ø Managing privacy effectively is both a technical and a sociological problem,
which must be addressed jointly from both perspectives to realize the promise of
Big Data.
Ø It is important to rethink security for information sharing in Big Data use cases.
Many online services today require us to share private information, but beyond
record-level access control we do not understand what it means to share data, how
the shared data can be linked, and how to give users fine-grained control over this
sharing.
Privacy
Ø  Ideally, analytics for Big Data will be designed to have a human in the loop. The new field
of visual analytics is attempting to do this, at least with respect to the modelling and
analysis phase in the pipeline.
Ø  A popular new method of harnessing human ingenuity to solve problems is through
crowd-sourcing. Wikipedia, the on-line encyclopedia, is perhaps the best known example of
crowd-sourced data.
Ø  Crowd-sourcing relies upon information provided by unvetted strangers, some of which
will be wrong. While most such errors will be detected and corrected by others in the
crowd, we need technologies to facilitate this.
Ø  The issues of uncertainty and error become even more pronounced in a specific type of
crowd-sourcing, termed participatory-sensing. In this case, every person with a mobile
phone can act as a multi-modal sensor collecting various types of data instantaneously.
Ø  The extra challenge here is the inherent uncertainty of the data collection devices. The fact
that collected data are probably spatially and temporally correlated can be exploited to
better assess their correctness.
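A rough sketch of how spatial and temporal correlation might be used to assess a participatory-sensing report: compare it with the median of reports made nearby and at about the same time. The radius, time window, tolerance, and the report fields are all hypothetical choices for this illustration.

```python
import math
from dataclasses import dataclass
from statistics import median
from typing import List, Optional


@dataclass
class Report:
    x: float      # position, arbitrary units
    y: float
    t: float      # time in seconds
    value: float  # measured quantity, e.g. temperature


def is_plausible(
    report: Report,
    others: List[Report],
    radius: float = 1.0,
    window: float = 60.0,
    tolerance: float = 5.0,
    min_neighbours: int = 3,
) -> Optional[bool]:
    """Compare a report with the median of nearby, recent reports.

    Returns None when there are too few neighbours to judge.
    """
    neighbours = [
        o.value
        for o in others
        if o is not report
        and math.hypot(o.x - report.x, o.y - report.y) <= radius
        and abs(o.t - report.t) <= window
    ]
    if len(neighbours) < min_neighbours:
        return None
    return abs(report.value - median(neighbours)) <= tolerance


# Hypothetical reports from phones close together in space and time.
crowd = [
    Report(0.0, 0.0, 0.0, 20.0),
    Report(0.1, 0.0, 10.0, 21.0),
    Report(0.0, 0.2, 20.0, 19.5),
    Report(5.0, 5.0, 0.0, 40.0),
]
print(is_plausible(Report(0.1, 0.1, 5.0, 20.5), crowd))
print(is_plausible(Report(0.1, 0.1, 5.0, 35.0), crowd))
```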
Human Collaboration
Ø With Big Data, the use of separate systems for different workloads becomes
prohibitively expensive, given the large size of the data sets. The expense is due
not only to the cost of the systems themselves, but also to the time needed to load
the data into multiple systems.
Ø Big Data has made it necessary to run heterogeneous workloads on a single
infrastructure that is sufficiently flexible to handle all these workloads.
Ø The challenge here is not to build a system that is ideally suited for all processing
tasks. Instead, the need is for the underlying system architecture to be flexible
enough that the components built on top of it, for expressing the various kinds of
processing tasks, can tune it to efficiently run these different workloads.
System Architecture
Ø Through better analysis of the large volumes of data, there is the potential for
making faster advances in many scientific disciplines and improving the
profitability and success of many enterprises.
Ø But many technical challenges must be addressed before this potential can be
realized fully.
Ø The challenges include not just the obvious issues of scale, but also heterogeneity,
lack of structure, error-handling, privacy, timeliness, provenance, and
visualization, at all stages of the analysis pipeline from data acquisition to result
interpretation.
Ø These challenges will require transformative solutions, and will not be addressed
naturally by the next generation of industrial products.
Ø Hence, we must support and encourage fundamental research towards addressing
these technical challenges if we are to achieve the promised benefits of Big Data.
Closing Remarks
Thank you for your attention
 

Recently uploaded (20)

Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 

Challenges of Big Data Research

  • 1. Challenges of Big Data Research. Manfred M. Fischer, Vienna University of Economics and Business. Regional Science Academy, Special Academic Session, ERSA conference, Tue 23 August 2016
  • 2. The Big Data Problem may be characterized in terms of three dimensions of data management challenges:
      ◦ Volume refers to the size of the data.
      ◦ Velocity addresses the speed at which data can be received as well as analyzed. It also refers to the rate of change of data, which is especially relevant in the area of stream processing.
      ◦ Variety refers to the issue of disparate and incompatible data formats. Data can come in from many different sources and take on many different forms, including text, audio, video, graph, and more.
  • 3. The Big Data Analysis Pipeline (Overall System)
      ◦ Major Steps in Analysis of Big Data: Data Acquisition and Recording; Information Extraction and Cleaning; Data Integration, Aggregation, and Representation; Query Processing, Data Modelling, and Analysis; Interpretation.
      ◦ Cross-Cutting Challenges: Heterogeneity and Incompleteness; Scale; Timeliness; Privacy; Human Collaboration.
      ◦ Figure caption: Major steps in analysis of Big Data are shown in the flow at the top. Below it are Big Data needs that make these tasks challenging.
  • 4. Data Acquisition and Recording
      ◦ Big Data does not arise out of a vacuum, but is recorded from some data generating source. Much of this data is of no interest, and can be filtered and compressed by orders of magnitude.
      ◦ One challenge is to define these filters in such a way that they do not discard useful information. We need research in the science of data reduction that can intelligently process this raw data, and we need on-line techniques that can process such streaming data on the fly, since we cannot afford to store first and reduce afterwards.
      ◦ The second big challenge is to automatically generate the right metadata to describe what data is recorded and how it is recorded and measured.
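One well-known example of reducing a stream on the fly, without storing it first, is reservoir sampling. The sketch below is only an illustration of that general idea; the slides do not name a specific technique, and the function name `reservoir_sample` and its parameters are assumptions made for this example.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Keep a uniform random sample of at most k items from a stream.

    The stream is read once and never stored in full, so memory use
    stays proportional to k regardless of the stream length.
    """
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    sample: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace a stored item with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive at both ends
            if j < k:
                sample[j] = item
    return sample


if __name__ == "__main__":
    # A generator stands in for a sensor feed that is too large to store.
    readings = (x * 0.5 for x in range(1_000_000))
    print(reservoir_sample(readings, k=5))
```

A uniform sample is only one possible filter; whether it preserves the "useful information" depends on the application, which is exactly the open question the slide raises.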
  • 5. Information Extraction and Cleaning
      ◦ Frequently the information collected will not be in a format ready for analysis.
      ◦ We require an information extraction process that pulls out the required information from the underlying sources and expresses it in a standard form suitable for analysis.
      ◦ Doing this correctly and completely is a continuing technical challenge. Note that this data also includes images and will in the future include video. Such extraction is often highly application dependent.
      ◦ Existing work on data cleaning assumes well-recognized constraints on valid data or well-understood error models. But for many emerging Big Data domains these do not exist.
  • 6. Data Integration, Aggregation, and Representation
      ◦ Data analysis is considerably more challenging than simply locating, identifying, understanding, and citing data. For effective large-scale analysis all of this has to happen in a completely automated manner.
      ◦ This requires differences in data structuring and semantics to be expressed in forms that are computer understandable.
      ◦ There is a strong body of work in data integration that can provide some of the answers. However, considerable additional work is required to achieve automated error-free difference resolution.
      ◦ Even for simpler analyses that depend on only one data set, there remains an important question of suitable database design.
  • 7. Query Processing, Data Modelling, and Analysis
      ◦ Methods for querying and mining Big Data are fundamentally different from traditional statistical analysis on small samples. Big Data is often noisy, dynamic, heterogeneous, interrelated and untrustworthy.
      ◦ Mining requires integrated, cleaned, trustworthy, and efficiently accessible data, declarative query and mining interfaces, scalable mining algorithms, and Big Data computing environments.
      ◦ In the future, queries towards Big Data will be automatically generated for content creation on websites, to populate hot-lists or recommendations, and to provide an ad hoc analysis of the value of a data set to decide whether to store or to discard it.
      ◦ Scaling complex query processing techniques to terabytes while enabling interactive response times is a major open research problem today.
      ◦ Another problem with current Big Data analysis is the lack of coordination between database systems, which host the data and provide SQL querying, and analytics packages, which perform various forms of non-SQL processing such as data mining and statistical analyses.
  • 8. Interpretation
      ◦ It is rarely enough to provide just the results. Rather, one must provide supplementary information (called the provenance of the result data) that explains how each result was derived, and based upon precisely what inputs.
      ◦ Systems with a rich palette of visualizations become important in conveying to the users the results of the queries in a way that is best understood in the particular domain.
      ◦ Furthermore, with a few clicks the user should be able to drill down into each piece of data that she sees and understand its provenance, which is a key feature to understanding the data.
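To make the idea of provenance concrete, the following minimal sketch attaches to every derived value a record of the operation and the inputs that produced it, so a user can "drill down" from a result to its sources. The `Traced` class and the `derive` helper are hypothetical names introduced only for this illustration; they do not correspond to any system mentioned in the slides.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Traced:
    """A value together with the operation and inputs it was derived from."""
    value: float
    operation: str = "input"
    inputs: Tuple["Traced", ...] = ()

    def lineage(self, depth: int = 0) -> str:
        """Render the derivation tree, one indented line per step."""
        line = "  " * depth + f"{self.operation} -> {self.value}"
        return "\n".join([line] + [p.lineage(depth + 1) for p in self.inputs])


def derive(operation: str, fn: Callable[..., float], *inputs: Traced) -> Traced:
    """Apply fn to the input values and remember where the result came from."""
    return Traced(fn(*(p.value for p in inputs)), operation, inputs)


if __name__ == "__main__":
    a = Traced(2.0)
    b = Traced(3.0)
    total = derive("sum", lambda x, y: x + y, a, b)
    scaled = derive("times_ten", lambda x: x * 10, total)
    print(scaled.lineage())
```

Keeping the full input tree is the simplest design; at Big Data scale a real system would presumably need a more compact representation.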
  • 9. Common Challenges
      Having described the five stages of the Big Data analysis pipeline, we now turn to some common challenges that underlie many, and sometimes all, of these stages. These are:
      ◦ Heterogeneity and incompleteness
      ◦ Scale
      ◦ Timeliness
      ◦ Privacy
      ◦ Human collaboration
  • 10. Heterogeneity and Incompleteness
      ◦ When humans consume information, a great deal of heterogeneity is comfortably tolerated. In fact, the nuance and richness of natural language can provide valuable depth. However, machine analysis algorithms expect homogeneous data, and cannot understand nuance. In consequence, data must be carefully structured as a first step in or prior to data analysis.
      ◦ Even after data cleaning and error correction, some incompleteness and some errors in data are likely to remain. This incompleteness and these errors must be managed during data analysis.
      ◦ Doing this correctly is a challenge. Recent work on managing probabilistic data suggests one way to make progress.
  • 11. Scale
      ◦ Managing large and rapidly increasing volumes of data has been a challenging issue for many decades. In the past, this challenge was mitigated by processors getting faster.
      ◦ Over the last five years the processor technology has made a dramatic shift. Rather than processors doubling their clock cycle frequency every 18-24 months, now, due to power constraints, clock speeds have largely stalled and processors are being built with increasing numbers of cores.
      ◦ In the past, large data processing systems had to worry about parallelism across nodes in a cluster. Now, one has to deal with parallelism within a single node. This requires us to rethink how we design, build and operate data processing components.
      ◦ Another dramatic shift that is underway is the move towards cloud computing, which now aggregates multiple disparate workloads with varying performance goals into very large clusters.
      ◦ The level of sharing of resources on expensive and large clusters requires new ways of determining how to run and execute data processing jobs and to deal with system failures.
  • 12. Timeliness
      ◦ The design of a system that effectively deals with size is likely also to result in a system that can process a given size of data set faster. However, it is not just this speed that is usually meant when one speaks of Velocity in the context of Big Data. Rather, there is an acquisition rate challenge and a timeliness challenge.
      ◦ There are many situations in which the result of the analysis is required immediately. For example, if a fraudulent credit card transaction is suspected, it should ideally be flagged before the transaction is completed, potentially preventing the transaction from taking place at all.
      ◦ Obviously, a full analysis of a user's purchase history is not likely to be feasible in real-time. Rather, we need to develop partial results in advance so that a small amount of incremental computation with new data can be used to arrive at a quick determination.
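The idea of keeping partial results so that each new transaction needs only a small incremental computation can be sketched with a running mean and variance (Welford's online algorithm). This is a toy illustration, not a real fraud model: the z-score rule, the threshold of 3.0, and the names `RunningStats` and `is_suspicious` are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Partial results kept per user: count, mean, and sum of squared deviations."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x: float) -> None:
        """Fold one new amount into the summary in constant time (Welford)."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self) -> float:
        """Sample standard deviation; 0.0 until at least two values are seen."""
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def is_suspicious(stats: RunningStats, amount: float, z_threshold: float = 3.0) -> bool:
    """Flag an amount far from the user's history, without rescanning that history."""
    spread = stats.std()
    if stats.n < 2 or spread == 0.0:
        return False  # not enough history to judge
    return abs(amount - stats.mean) / spread > z_threshold


if __name__ == "__main__":
    history = RunningStats()
    for amount in [10.0, 12.0, 11.0, 9.0, 10.0]:
        history.update(amount)
    print(is_suspicious(history, 500.0))  # far outside the usual range
    print(is_suspicious(history, 11.0))   # within the usual range
```

The check touches only three stored numbers per user, which is what makes a decision before the transaction completes plausible.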
  • 13. Privacy
      ◦ The privacy of data is another huge concern, and one that increases in the context of Big Data.
      ◦ Managing privacy effectively is a technical as well as a sociological problem, which must be addressed jointly from both perspectives to realize the promise of Big Data.
      ◦ It is important to rethink security for information sharing in Big Data use cases. Many online services today require us to share private information, but beyond record-level access control we do not understand what it means to share data, how the shared data can be linked, and how to give users fine-grained control over this sharing.
  • 14. Human Collaboration
      ◦ Ideally, analytics for Big Data will be designed to have a human in the loop. The new field of visual analytics is attempting to do this, at least with respect to the modelling and analysis phase in the pipeline.
      ◦ A popular new method of harnessing human ingenuity to solve problems is through crowd-sourcing. Wikipedia, the on-line encyclopedia, is perhaps the best known example of crowd-sourced data.
      ◦ With crowd-sourcing we are relying upon information provided by unvetted strangers, so errors are to be expected. While most such errors will be detected and corrected by others in the crowd, we need technologies to facilitate this.
      ◦ The issues of uncertainty and error become even more pronounced in a specific type of crowd-sourcing, termed participatory-sensing. In this case, every person with a mobile phone can act as a multi-modal sensor collecting various types of data instantaneously.
      ◦ The extra challenge here is the inherent uncertainty of the data collection devices. The fact that collected data are probably spatially and temporally correlated can be exploited to better assess their correctness.
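As a rough illustration of how temporal correlation might be exploited to assess correctness, the sketch below compares each reading with the median of its neighbours in time and flags readings that deviate strongly. The window size, the tolerance, and the function name `flag_outliers` are assumptions chosen for this example; a spatial variant would compare readings from nearby devices instead.

```python
from statistics import median
from typing import List


def flag_outliers(readings: List[float], window: int = 2, tolerance: float = 5.0) -> List[bool]:
    """Mark readings that disagree with the median of their temporal neighbours.

    Relies on the assumption that consecutive readings are correlated,
    so an isolated jump more likely reflects a device error than a real change.
    """
    flags: List[bool] = []
    for i, value in enumerate(readings):
        lo = max(0, i - window)
        hi = min(len(readings), i + window + 1)
        neighbours = readings[lo:i] + readings[i + 1:hi]  # exclude the reading itself
        if not neighbours:
            flags.append(False)  # a lone reading cannot be cross-checked
            continue
        flags.append(abs(value - median(neighbours)) > tolerance)
    return flags


if __name__ == "__main__":
    # Hypothetical temperature readings from one phone, with one glitch.
    temps = [20.1, 20.3, 20.2, 35.0, 20.2, 20.4, 20.3]
    print(flag_outliers(temps))
```

Using the median rather than the mean keeps a single bad neighbour from distorting the comparison.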
  • 15. System Architecture
      ◦ With Big Data, the use of separate systems becomes prohibitively expensive, given the large size of the data sets. The expense is due not only to the cost of the systems themselves, but also the time to load the data into multiple systems.
      ◦ Big Data has made it necessary to run heterogeneous workloads on a single infrastructure that is sufficiently flexible to handle all these workloads.
      ◦ The challenge here is not to build a system that is ideally suited for all processing tasks. Instead, the need is for the underlying system architecture to be flexible enough that the components built on top of it, for expressing the various kinds of processing tasks, can tune it to efficiently run these different workloads.
  • 16. Closing Remarks
      ◦ Through better analysis of the large volumes of data, there is the potential for making faster advances in many scientific disciplines and improving the profitability and success of many enterprises.
      ◦ But many technical challenges must be addressed before this potential can be realized fully.
      ◦ The challenges include not just the obvious issues of scale, but also heterogeneity, lack of structure, error-handling, privacy, timeliness, provenance, and visualization, at all stages of the analysis pipeline from data acquisition to result interpretation.
      ◦ These challenges will require transformative solutions, and will not be addressed naturally by the next generation of industrial products.
      ◦ Hence, we must support and encourage fundamental research towards addressing these technical challenges if we are to achieve the promised benefits of Big Data.
  • 17. Thank you for your attention