SlideShare una empresa de Scribd logo
1 de 9
Descargar para leer sin conexión
Some ideas on Theory of Data Systems
Ansu Chatterjee
School of Statistics,
University of Minnesota
February 12, 2018
What’s contained below
This is partially based on papers by Chandrasekaran, De
Domenico, Smith, Johnson, and other talks we heard
earlier today.
Contains a lot of ramblings.
Important disclaimer: I’m not an expert on most things
discussed below!
The main idea
Chandrasekaran (and Jordan, Soh, Berthet) proposed
using a bit of both of these considerations: so there is a
cost for time/storage, there is also a quality constraint on
the resulting estimate, and both depend on the sample
size.
De Domenico brought in multi-layered networks: the
different data repositories, the different computational
resources, algorithms, models, can form networks.
Smith introduced different blocking methods.
Crichton brought out the different levels of abstractions and
complexities that underlie a data analytic task.
Johnson brought together all the above elements.
The main idea
The way we do Statistics now: (i) data is a resource, (ii)
the more we use data the better our estimates get, (iii)
there is no cost associated with obtaining data,
pre-processing, storing, time required to access data and
to run the algorithm.
The way algorithms are evaluated in (theoretical) computer
science: (i) time for implementation and storage are the
important things to consideration, (ii) quality and usage of
the output does not matter, (iii) data is a constraint (the
more data, the greater is the demand on storage and time).
Constrained view of what is optimal.
The goals may converge
Example
Consider i.i.d. observations X1, . . . , Xn from Np(0, Σ).
The goal is to estimate (λ1, P1), the highest eigenvalue
and the corresponding eigenvector of Σ.
This can be done in O(p2) steps more or less (depends on
assumptions, skills).
That might be a futile exercise, since in general the sample
maximum eigenvalue is not consistent for the population
maximum eigenvalue.
Solution: Make assumptions about Σ, which may reduce
computational complexity and make the computational
meaningful.
Many goals in both computation and statistics
CS: Traditionally, we consider time and storage
requirements as important quantifiers of complexity.
CS: The distributed nature of data, and distributed
computing, generates additional quantifiers. (Think today’s
talks, think MapReduce and beyond.)
Statistics: Robustness (to assumptions, model, quality of
data) are additional quantifiers of desirable statistical
properties.
Statistics: High-dimensional data requires strong
assumptions.
Statistics is forgiving
Example
An estimator of the mean is ¯Xn.
Anything wrong with ¯Xn + 42/n? Absolutely nothing (up to
first order asymptotics).
We do not need the estimators to be more than O(n−α)
precise.
Statistical inference often does not depend on being
ultra-precise, and shouldn’t be considered trustworthy if it
does (takes us back to robustness).
(As Venkat pointed out, and George Box before him:)
Statistical models are approximations anyway. :
Essentially, all models are wrong, but some are useful. (G.
E. P. Box)
True for not just statistical models, but also for all other
sciences as well.
There is more to it
Example
Facebook, allegedly, has a data center in Lapland:
There are benefits to Facebook with reduced costs of
cooling towers.
There are environmental costs to Lapland.
There are possible economic benefits to Sweden.
Possibly a cost-benefit analysis, the way economist do,
may be useful here. (Real options?)
Don’t forget the stakeholders
Example
All tweets are stored (hopefully):
Is that necessary? Informative?
For example: all tweets of the president may be personally
important to him.
They may be important to future historians.
There are different kinds of stakeholders for each
computation/statistics exercise. Often, the core issue is
one of inference.
My personal view: Think of streaming data as the template,
and consider careful preferential sampling and online
updating. (Leads to non-regular asymptotics!)

Más contenido relacionado

La actualidad más candente

Mining on Relationships in Big Data era using Improve Apriori Algorithm with ...
Mining on Relationships in Big Data era using Improve Apriori Algorithm with ...Mining on Relationships in Big Data era using Improve Apriori Algorithm with ...
Mining on Relationships in Big Data era using Improve Apriori Algorithm with ...KamleshKumar394
 
poem_presentation_v5_linkedIn_version
poem_presentation_v5_linkedIn_versionpoem_presentation_v5_linkedIn_version
poem_presentation_v5_linkedIn_versionIliada Eleftheriou
 
Clustering big spatiotemporal interval data
Clustering big spatiotemporal interval dataClustering big spatiotemporal interval data
Clustering big spatiotemporal interval dataNexgen Technology
 
Introduction to Topological Data Analysis
Introduction to Topological Data AnalysisIntroduction to Topological Data Analysis
Introduction to Topological Data AnalysisHendri Karisma
 
ComputableFacts: a Secure System to Store Documents and Graphs
ComputableFacts: a Secure System to Store Documents and GraphsComputableFacts: a Secure System to Store Documents and Graphs
ComputableFacts: a Secure System to Store Documents and GraphsAccumulo Summit
 
Search: Probabilistic Information Retrieval
Search: Probabilistic Information RetrievalSearch: Probabilistic Information Retrieval
Search: Probabilistic Information RetrievalVipul Munot
 

La actualidad más candente (8)

Mining on Relationships in Big Data era using Improve Apriori Algorithm with ...
Mining on Relationships in Big Data era using Improve Apriori Algorithm with ...Mining on Relationships in Big Data era using Improve Apriori Algorithm with ...
Mining on Relationships in Big Data era using Improve Apriori Algorithm with ...
 
poem_presentation_v5_linkedIn_version
poem_presentation_v5_linkedIn_versionpoem_presentation_v5_linkedIn_version
poem_presentation_v5_linkedIn_version
 
Data science
Data scienceData science
Data science
 
Clustering big spatiotemporal interval data
Clustering big spatiotemporal interval dataClustering big spatiotemporal interval data
Clustering big spatiotemporal interval data
 
Introduction to Topological Data Analysis
Introduction to Topological Data AnalysisIntroduction to Topological Data Analysis
Introduction to Topological Data Analysis
 
ComputableFacts: a Secure System to Store Documents and Graphs
ComputableFacts: a Secure System to Store Documents and GraphsComputableFacts: a Secure System to Store Documents and Graphs
ComputableFacts: a Secure System to Store Documents and Graphs
 
Search: Probabilistic Information Retrieval
Search: Probabilistic Information RetrievalSearch: Probabilistic Information Retrieval
Search: Probabilistic Information Retrieval
 
Building intelligent systems (that can explain)
Building intelligent systems (that can explain)Building intelligent systems (that can explain)
Building intelligent systems (that can explain)
 

Similar a CLIM Program: Remote Sensing Workshop, Some Ideas on Theory of Data Systems - Ansu Chatterjee, Feb 12, 2018

UNIT1-2.pptx
UNIT1-2.pptxUNIT1-2.pptx
UNIT1-2.pptxcsecem
 
s40537-015-0030-3-data-analytics-a-survey.pdf
s40537-015-0030-3-data-analytics-a-survey.pdfs40537-015-0030-3-data-analytics-a-survey.pdf
s40537-015-0030-3-data-analytics-a-survey.pdfAkuhuruf
 
International Collaboration Networks in the Emerging (Big) Data Science
International Collaboration Networks in the Emerging (Big) Data ScienceInternational Collaboration Networks in the Emerging (Big) Data Science
International Collaboration Networks in the Emerging (Big) Data Sciencedatasciencekorea
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactDr. Sunil Kr. Pandey
 
Context, Causality, and Information Flow: Implications for Privacy Engineerin...
Context, Causality, and Information Flow: Implications for Privacy Engineerin...Context, Causality, and Information Flow: Implications for Privacy Engineerin...
Context, Causality, and Information Flow: Implications for Privacy Engineerin...Sebastian Benthall
 
Data Science in 2016: Moving Up
Data Science in 2016: Moving UpData Science in 2016: Moving Up
Data Science in 2016: Moving UpPaco Nathan
 
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015Big Data Spain
 
Data science e machine learning
Data science e machine learningData science e machine learning
Data science e machine learningGiuseppe Manco
 
Data Center Computing for Data Science: an evolution of machines, middleware,...
Data Center Computing for Data Science: an evolution of machines, middleware,...Data Center Computing for Data Science: an evolution of machines, middleware,...
Data Center Computing for Data Science: an evolution of machines, middleware,...Paco Nathan
 
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...Editor IJCATR
 
Competitive advantage from Data Mining: some lessons learnt ...
Competitive advantage from Data Mining: some lessons learnt ...Competitive advantage from Data Mining: some lessons learnt ...
Competitive advantage from Data Mining: some lessons learnt ...butest
 
Competitive advantage from Data Mining: some lessons learnt ...
Competitive advantage from Data Mining: some lessons learnt ...Competitive advantage from Data Mining: some lessons learnt ...
Competitive advantage from Data Mining: some lessons learnt ...butest
 
Data Science-1 (1).ppt
Data Science-1 (1).pptData Science-1 (1).ppt
Data Science-1 (1).pptSanjayAcharaya
 
DSSG Speaker Series: Paco Nathan
DSSG Speaker Series: Paco NathanDSSG Speaker Series: Paco Nathan
DSSG Speaker Series: Paco NathanPaco Nathan
 
Python's Role in the Future of Data Analysis
Python's Role in the Future of Data AnalysisPython's Role in the Future of Data Analysis
Python's Role in the Future of Data AnalysisPeter Wang
 
Introduction to Data Science 1113.pptx
Introduction to Data Science 1113.pptxIntroduction to Data Science 1113.pptx
Introduction to Data Science 1113.pptxmark828
 

Similar a CLIM Program: Remote Sensing Workshop, Some Ideas on Theory of Data Systems - Ansu Chatterjee, Feb 12, 2018 (20)

Slides barcelona risk data
Slides barcelona risk dataSlides barcelona risk data
Slides barcelona risk data
 
UNIT1-2.pptx
UNIT1-2.pptxUNIT1-2.pptx
UNIT1-2.pptx
 
s40537-015-0030-3-data-analytics-a-survey.pdf
s40537-015-0030-3-data-analytics-a-survey.pdfs40537-015-0030-3-data-analytics-a-survey.pdf
s40537-015-0030-3-data-analytics-a-survey.pdf
 
International Collaboration Networks in the Emerging (Big) Data Science
International Collaboration Networks in the Emerging (Big) Data ScienceInternational Collaboration Networks in the Emerging (Big) Data Science
International Collaboration Networks in the Emerging (Big) Data Science
 
Sensors1(1)
Sensors1(1)Sensors1(1)
Sensors1(1)
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
 
Context, Causality, and Information Flow: Implications for Privacy Engineerin...
Context, Causality, and Information Flow: Implications for Privacy Engineerin...Context, Causality, and Information Flow: Implications for Privacy Engineerin...
Context, Causality, and Information Flow: Implications for Privacy Engineerin...
 
Data Science in 2016: Moving Up
Data Science in 2016: Moving UpData Science in 2016: Moving Up
Data Science in 2016: Moving Up
 
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
 
Data science e machine learning
Data science e machine learningData science e machine learning
Data science e machine learning
 
Data Center Computing for Data Science: an evolution of machines, middleware,...
Data Center Computing for Data Science: an evolution of machines, middleware,...Data Center Computing for Data Science: an evolution of machines, middleware,...
Data Center Computing for Data Science: an evolution of machines, middleware,...
 
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...
 
Competitive advantage from Data Mining: some lessons learnt ...
Competitive advantage from Data Mining: some lessons learnt ...Competitive advantage from Data Mining: some lessons learnt ...
Competitive advantage from Data Mining: some lessons learnt ...
 
Competitive advantage from Data Mining: some lessons learnt ...
Competitive advantage from Data Mining: some lessons learnt ...Competitive advantage from Data Mining: some lessons learnt ...
Competitive advantage from Data Mining: some lessons learnt ...
 
Data Science-1 (1).ppt
Data Science-1 (1).pptData Science-1 (1).ppt
Data Science-1 (1).ppt
 
DSSG Speaker Series: Paco Nathan
DSSG Speaker Series: Paco NathanDSSG Speaker Series: Paco Nathan
DSSG Speaker Series: Paco Nathan
 
2018 Modern Math Workshop - Nonparametric Regression and Classification for M...
2018 Modern Math Workshop - Nonparametric Regression and Classification for M...2018 Modern Math Workshop - Nonparametric Regression and Classification for M...
2018 Modern Math Workshop - Nonparametric Regression and Classification for M...
 
Python's Role in the Future of Data Analysis
Python's Role in the Future of Data AnalysisPython's Role in the Future of Data Analysis
Python's Role in the Future of Data Analysis
 
Introduction to Data Science 1113.pptx
Introduction to Data Science 1113.pptxIntroduction to Data Science 1113.pptx
Introduction to Data Science 1113.pptx
 
CLIM Program: Remote Sensing Workshop, A Notional Framework for a Theory of D...
CLIM Program: Remote Sensing Workshop, A Notional Framework for a Theory of D...CLIM Program: Remote Sensing Workshop, A Notional Framework for a Theory of D...
CLIM Program: Remote Sensing Workshop, A Notional Framework for a Theory of D...
 

Más de The Statistical and Applied Mathematical Sciences Institute

Más de The Statistical and Applied Mathematical Sciences Institute (20)

Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
 
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
 
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
 
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
 
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
 
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
 
Causal Inference Opening Workshop - Difference-in-differences: more than meet...
Causal Inference Opening Workshop - Difference-in-differences: more than meet...Causal Inference Opening Workshop - Difference-in-differences: more than meet...
Causal Inference Opening Workshop - Difference-in-differences: more than meet...
 
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
 
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
 
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
 
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
 
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
 
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
 
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
 
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
 
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
 
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
 
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
 
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
 
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
 

Último

HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxEsquimalt MFRC
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxDr. Ravikiran H M Gowda
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibitjbellavia9
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jisc
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...Nguyen Thanh Tu Collection
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsMebane Rash
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfDr Vijay Vishwakarma
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...Poonam Aher Patil
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Pooja Bhuva
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxmarlenawright1
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfNirmal Dwivedi
 

Último (20)

HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 

CLIM Program: Remote Sensing Workshop, Some Ideas on Theory of Data Systems - Ansu Chatterjee, Feb 12, 2018

  • 1. Some ideas on Theory of Data Systems Ansu Chatterjee School of Statistics, University of Minnesota February 12, 2018
  • 2. What’s contained below This is partially based on papers by Chandrasekaran, De Domenico, Smith, Johnson, and other talks we heard earlier today. Contains a lot of ramblings. Important disclaimer: I’m not an expert on most things discussed below!
  • 3. The main idea Chandrasekaran (and Jordan, Soh, Berthet) proposed using a bit of both of these considerations: so there is a cost for time/storage, there is also a quality constraint on the resulting estimate, and both depend on the sample size. De Domenico brought in multi-layered networks: the different data repositories, the different computational resources, algorithms, models, can form networks. Smith introduced different blocking methods. Crichton brought out the different levels of abstractions and complexities that underlie a data analytic task. Johnson brought together all the above elements.
  • 4. The main idea The way we do Statistics now: (i) data is a resource, (ii) the more we use data the better our estimates get, (iii) there is no cost associated with obtaining data, pre-processing, storing, time required to access data and to run the algorithm. The way algorithms are evaluated in (theoretical) computer science: (i) time for implementation and storage are the important things to consideration, (ii) quality and usage of the output does not matter, (iii) data is a constraint (the more data, the greater is the demand on storage and time). Constrained view of what is optimal.
  • 5. The goals may converge Example Consider i.i.d. observations X1, . . . , Xn from Np(0, Σ). The goal is to estimate (λ1, P1), the highest eigenvalue and the corresponding eigenvector of Σ. This can be done in O(p2) steps more or less (depends on assumptions, skills). That might be a futile exercise, since in general the sample maximum eigenvalue is not consistent for the population maximum eigenvalue. Solution: Make assumptions about Σ, which may reduce computational complexity and make the computational meaningful.
  • 6. Many goals in both computation and statistics CS: Traditionally, we consider time and storage requirements as important quantifiers of complexity. CS: The distributed nature of data, and distributed computing, generates additional quantifiers. (Think today’s talks, think MapReduce and beyond.) Statistics: Robustness (to assumptions, model, quality of data) are additional quantifiers of desirable statistical properties. Statistics: High-dimensional data requires strong assumptions.
  • 7. Statistics is forgiving Example An estimator of the mean is ¯Xn. Anything wrong with ¯Xn + 42/n? Absolutely nothing (up to first order asymptotics). We do not need the estimators to be more than O(n−α) precise. Statistical inference often does not depend on being ultra-precise, and shouldn’t be considered trustworthy if it does (takes us back to robustness). (As Venkat pointed out, and George Box before him:) Statistical models are approximations anyway. : Essentially, all models are wrong, but some are useful. (G. E. P. Box) True for not just statistical models, but also for all other sciences as well.
  • 8. There is more to it Example Facebook, allegedly, has a data center in Lapland: There are benefits to Facebook with reduced costs of cooling towers. There are environmental costs to Lapland. There are possible economic benefits to Sweden. Possibly a cost-benefit analysis, the way economist do, may be useful here. (Real options?)
  • 9. Don’t forget the stakeholders Example All tweets are stored (hopefully): Is that necessary? Informative? For example: all tweets of the president may be personally important to him. They may be important to future historians. There are different kinds of stakeholders for each computation/statistics exercise. Often, the core issue is one of inference. My personal view: Think of streaming data as the template, and consider careful preferential sampling and online updating. (Leads to non-regular asymptotics!)