A recently completed archaeological predictive model (APM) of the state of Pennsylvania provided an unprecedented opportunity to explore the current status of APM methods and extend them using methods drawn from related scientific fields, medicine, and statistical computing. Through this process many different types of models were created and tested for validity, predictive performance, and adherence to archaeological theory. One result of this project is a comprehensive view of the problems that beset existing APM methodologies, solutions to some of these problems, and the nature of the challenges we will face going forward with new techniques. Most, if not all, of the findings of this project are applicable to the eastern deciduous United States, and much of the methodological scope is useful to APMs in any geography. This paper will discuss the primary lessons learned from this project with regard to archaeological data, modeling methods, and theory, and will touch on best practices for future APM efforts.
2. FHWA Statement
“The contents of the report reflect the views of the author(s) who are
responsible for the facts and accuracy of the data presented within. The
contents do not necessarily reflect the official view or policies of the
Department or FHWA at the time of publication.”
Report available at: www.penndotcrm.org
3. “Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful.”
~ George E. P. Box, 1987
4. Organization of talk
• Introduction to PA Model
• Data lessons
• Methodological lessons
• Policy lessons
• Concluding observations
10. DATA Lessons Learned
• Unique characteristics of archaeological data
• Representation of archaeological data
• Archaeological site prevalence
• Covariates and correlation
• Dealing with uncertainty
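The covariate-and-correlation concern above can be screened for directly before modeling. A minimal sketch in Python (stdlib only; the covariate names and values are illustrative placeholders, not the PA model's actual variables):

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two covariate columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical covariates sampled at surveyed cells
covariates = {
    "elevation":     [310, 295, 402, 380, 290, 415],
    "slope":         [2.1, 1.8, 6.5, 5.9, 1.6, 7.0],
    "dist_to_water": [120, 450, 80, 200, 500, 60],
}

# Flag covariate pairs whose |r| exceeds a screening threshold
names = list(covariates)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        r = pearson(covariates[names[i]], covariates[names[j]])
        if abs(r) > 0.7:
            print(f"{names[i]} vs {names[j]}: r = {r:.2f}")
```

Highly correlated covariates inflate each other's apparent importance and destabilize coefficient estimates, so flagged pairs are candidates for dropping or combining before model fitting.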
11. Characteristics of Archaeological Data
Population Generating Process:
• Highly dynamic & complex
• Non-mechanistic
• Culture and agency
• Dynamic environment
• Changing parameters
• Subjectively defined expression
• Censored through taphonomy
Sample Generating Process:
• Non-systematic
• Subjective & inconsistent
• Extensive measurement error
• Imperfect detectability
• Non-representative of population
• Spatially biased
• Oversimplification
19. Methodological Lessons Learned
• Define your objectives and assumptions
• Reproducibility
• Create a model building system
• ArcGIS is only part of the answer
• Understand your algorithms
• Test and validate all results
22. Model Building System
● Variable creation and analysis
● Tune model hyperparameters
● Algorithm selection
● Test error with Cross-Validation
● Assess performance
● Model selection
● Mosaic and aggregate
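The "test error with cross-validation" step in the system above can be sketched in Python. This is a stdlib-only illustration: the threshold "algorithm", covariate values, and fold count are placeholder assumptions, not the PA model's actual method.

```python
import random

def k_fold_indices(n, k, seed=42):
    """Shuffle row indices and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(fit, predict, X, y, k=5):
    """Mean held-out accuracy over k folds (the CV test-error step)."""
    scores = []
    for test_idx in k_fold_indices(len(y), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [predict(model, X[i]) for i in test_idx]
        hits = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(hits / len(test_idx))
    return sum(scores) / k

# Stand-in "algorithm": predict site-present when a favorability score
# exceeds a trained threshold (a placeholder, not the PA model's method).
def fit_threshold(X, y):
    present = [x[0] for x, lab in zip(X, y) if lab == 1]
    absent = [x[0] for x, lab in zip(X, y) if lab == 0]
    return (sum(present) / len(present) + sum(absent) / len(absent)) / 2

def predict_threshold(thresh, x):
    return 1 if x[0] > thresh else 0

# Toy data: [favorability score] per cell, 1 = site present (hypothetical)
X = [[8], [9], [7], [8.5], [2], [1], [2.5], [1.5], [9.5], [0.5]]
y = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]

print("CV accuracy:", cross_validate(fit_threshold, predict_threshold, X, y))
```

The same loop generalizes to the other steps in the list: wrap it in an outer loop over candidate algorithms and hyperparameter settings, and select the model whose held-out error is lowest.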
26. Policy Lessons Learned
• Model purpose dictates policy applications
• Implementation requires explicit assumptions
• Error rates and uncertainty must be known
• Scale of data is critical in scale of use
• Methods to visualize uncertainty
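The "error rates must be known" point can be made concrete with a validation-sample tally. A minimal sketch, assuming hypothetical validation data (the labels and predictions below are illustrative, not project results):

```python
def error_rates(y_true, y_pred):
    """Sensitivity, specificity, and false-negative rate from predictions.

    For an APM used in policy, a false negative (a real site mapped as
    low-probability) is usually the costliest error.
    """
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "sensitivity": tp / (tp + fn),          # sites correctly flagged
        "specificity": tn / (tn + fp),          # non-sites correctly cleared
        "false_negative_rate": fn / (tp + fn),  # sites the model would miss
    }

# Hypothetical validation cells: 1 = known site, paired with model output
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
rates = error_rates(y_true, y_pred)
print(rates)
```

Reporting these rates alongside the model surface lets policy users weigh the cost of missed sites against the cost of surveying low-probability areas.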
27. How it all works...
PURPOSE → ASSUMPTIONS → METHODS → ALGORITHMS / MODELS → INTERPRETATION → POLICY
32. Purpose
Assess all aspects of a model relative to its purpose
Policy and implementation are based on model purpose
33. Not all doom and gloom!
• Face modeling issues head-on
• Model for our unique data
• Standardize our approaches
• Formalize our theory
• Compare our results