SlideShare una empresa de Scribd logo
1 de 20
Descargar para leer sin conexión
NAPE ICE
2019
Outline
● Why Process Petrophysical Big Data?
● What Big Data processing challenges?
● ETL Workflow
● Conclusion
● References
NAPE ICE
2019Why Process Petrophysical
Big Data?
● Re-evaluate old well logs for opportunities
● Conducting pre-drill analysis of offset wells
● Unable to effectively assess well / field reserves
● Challenge with inferring geological features
NAPE ICE
2019
● For 1 to 10 well log files?
- Copying the link and pasting on the browser is straightforward
- Quickly download log data
- Easier to perform ETL with such amount of data
What Big Data
Processing Challenges?
NAPE ICE
2019
● For 1 to 10 well log files?
● For 1000 well log files ???
What Big Data
Processing Challenges?
NAPE ICE
2019
● Link to ~ 1000 well log data from 5 fields in excel sheets
What Big Data
Processing Challenges?
NAPE ICE
2019
● Download each well log file individually from the web
● Read log data from each file
● Enrich metadata and actual data files and save as Apache Arrow data
format before loading to AWS S3 bucket
● GOAL: Making data ready for Apache Spark ML and Tensorflow
Deep Learning Pipeline
Extract Transform Load
NAPE ICE
2019
● Link to ~ 1000 well log data from 5 fields in excel sheets
● Download each well log file individually from the web
- get the links to the files
- append all the extracted links to a list
- account for errors
- save the file
ETL Workflow
NAPE ICE
2019ETL Workflow
NAPE ICE
2019ETL Workflow
NAPE ICE
2019
● Link to ~ 1000 well log data from 5 fields in excel sheets
● Download each well log file individually from the web
● Read log data from each file
- extract their actual data and Metadata / Header data
- account for errors
ETL Workflow
NAPE ICE
2019ETL Workflow
NAPE ICE
2019
● Link to ~ 1000 well log data from 5 fields in excel sheets
● Download each well log file individually from the web
● Read log data from each file
● Enrich metadata and actual data files and save as Apache Arrow data
format before loading to AWS s3 bucket
ETL Workflow
NAPE ICE
2019
Why Apache Arrow?
ETL Workflow
NAPE ICE
2019
● Link to ~ 1000 well log data from 5 fields in excel sheets
● Download each well log file individually from the web
● Read log data from each file
● Enrich metadata and actual data files and save as Apache Arrow data format before
loading to AWS s3 bucket
● Making data ready for Apache Spark ML / Keras Deep Learning Pipeline
- drop columns: 152 to 13 , drop duplicates , null / NA values, account for missing values
- Split-apply-combine on grouped data by field and API: @pandas_udf
- Caching dataframe
ETL Workflow
NAPE ICE
2019Conclusion
● Apache Airflow to orchestrate ETL process
● Moving towards real time data processing:
-WITSML data processing
● Apache Kafka, Apache Flink, Apache Storm, Apache Spark
NAPE ICE
2019References
● http://www.searchanddiscovery.com/pdfz/documents/2018/42234ejimuda/ndx_ejimuda.pdf.html
● https://www.slideshare.net/wesm/high-performance-python-on-apache-spark
● https://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html
● https://airflow.apache.org/installation.html
NAPE ICE
2019
Questions?
NAPE ICE
2019
Thank you!
NAPE 2019 Presentation

Más contenido relacionado

La actualidad más candente

Five Ways to Move Data to Excel in EnergyCAP
Five Ways to Move Data to Excel in EnergyCAPFive Ways to Move Data to Excel in EnergyCAP
Five Ways to Move Data to Excel in EnergyCAPEnergyCAP, Inc.
 
Hydraulic Modelling with GIS Data
Hydraulic Modelling with GIS DataHydraulic Modelling with GIS Data
Hydraulic Modelling with GIS DataSafe Software
 
Processing genetic data at scale
Processing genetic data at scaleProcessing genetic data at scale
Processing genetic data at scaleMark Schroering
 
Review on Apache Spark Technology
Review on Apache Spark TechnologyReview on Apache Spark Technology
Review on Apache Spark TechnologyIRJET Journal
 
Data Quality Assurance
Data Quality AssuranceData Quality Assurance
Data Quality AssuranceSafe Software
 
2016 conservation track: under the hood of an rea: what is within a rapid ec...
2016 conservation track: under the hood of an rea:  what is within a rapid ec...2016 conservation track: under the hood of an rea:  what is within a rapid ec...
2016 conservation track: under the hood of an rea: what is within a rapid ec...GIS in the Rockies
 
Integrating Ontario’s Provincially Tracked Species Data Using FME
Integrating Ontario’s Provincially Tracked Species Data Using FMEIntegrating Ontario’s Provincially Tracked Species Data Using FME
Integrating Ontario’s Provincially Tracked Species Data Using FMESafe Software
 
SDE to SPS (Synergi Pipeline Simulator) - Spatial Data to Text
SDE to SPS (Synergi Pipeline Simulator) - Spatial Data to TextSDE to SPS (Synergi Pipeline Simulator) - Spatial Data to Text
SDE to SPS (Synergi Pipeline Simulator) - Spatial Data to TextSafe Software
 
Stinger hadoop summit june 2013
Stinger hadoop summit june 2013Stinger hadoop summit june 2013
Stinger hadoop summit june 2013alanfgates
 
Using FME for the City of Palo Alto Topobase Implentation
Using FME for the City of Palo Alto Topobase ImplentationUsing FME for the City of Palo Alto Topobase Implentation
Using FME for the City of Palo Alto Topobase ImplentationSafe Software
 
Tenisha Hamilton -BI
Tenisha Hamilton -BITenisha Hamilton -BI
Tenisha Hamilton -BITenishaH
 
Filtering vs Enriching Data in Apache Spark
Filtering vs Enriching Data in Apache SparkFiltering vs Enriching Data in Apache Spark
Filtering vs Enriching Data in Apache SparkDatabricks
 
Why Open Source Works for DevOps Monitoring
Why Open Source Works for DevOps MonitoringWhy Open Source Works for DevOps Monitoring
Why Open Source Works for DevOps MonitoringDevOps.com
 
Active Record Internals - Medellin.rb
Active Record Internals - Medellin.rbActive Record Internals - Medellin.rb
Active Record Internals - Medellin.rbOscar Rendon
 
Spark Summit EU 2015: Revolutionizing Big Data in the Enterprise with Spark
Spark Summit EU 2015: Revolutionizing Big Data in the Enterprise with SparkSpark Summit EU 2015: Revolutionizing Big Data in the Enterprise with Spark
Spark Summit EU 2015: Revolutionizing Big Data in the Enterprise with SparkDatabricks
 

La actualidad más candente (20)

Five Ways to Move Data to Excel in EnergyCAP
Five Ways to Move Data to Excel in EnergyCAPFive Ways to Move Data to Excel in EnergyCAP
Five Ways to Move Data to Excel in EnergyCAP
 
Hydraulic Modelling with GIS Data
Hydraulic Modelling with GIS DataHydraulic Modelling with GIS Data
Hydraulic Modelling with GIS Data
 
Loading Data into Redshift
Loading Data into RedshiftLoading Data into Redshift
Loading Data into Redshift
 
Processing genetic data at scale
Processing genetic data at scaleProcessing genetic data at scale
Processing genetic data at scale
 
ETL to RDF with Talend and AllegroGraph
ETL to RDF with Talend and AllegroGraphETL to RDF with Talend and AllegroGraph
ETL to RDF with Talend and AllegroGraph
 
Review on Apache Spark Technology
Review on Apache Spark TechnologyReview on Apache Spark Technology
Review on Apache Spark Technology
 
Data Quality Assurance
Data Quality AssuranceData Quality Assurance
Data Quality Assurance
 
2016 conservation track: under the hood of an rea: what is within a rapid ec...
2016 conservation track: under the hood of an rea:  what is within a rapid ec...2016 conservation track: under the hood of an rea:  what is within a rapid ec...
2016 conservation track: under the hood of an rea: what is within a rapid ec...
 
Pilot Project for HDF5 Metadata Structures for SWOT
Pilot Project for HDF5 Metadata Structures for SWOTPilot Project for HDF5 Metadata Structures for SWOT
Pilot Project for HDF5 Metadata Structures for SWOT
 
Integrating Ontario’s Provincially Tracked Species Data Using FME
Integrating Ontario’s Provincially Tracked Species Data Using FMEIntegrating Ontario’s Provincially Tracked Species Data Using FME
Integrating Ontario’s Provincially Tracked Species Data Using FME
 
SDE to SPS (Synergi Pipeline Simulator) - Spatial Data to Text
SDE to SPS (Synergi Pipeline Simulator) - Spatial Data to TextSDE to SPS (Synergi Pipeline Simulator) - Spatial Data to Text
SDE to SPS (Synergi Pipeline Simulator) - Spatial Data to Text
 
Stinger hadoop summit june 2013
Stinger hadoop summit june 2013Stinger hadoop summit june 2013
Stinger hadoop summit june 2013
 
FME & Governement
FME & GovernementFME & Governement
FME & Governement
 
Using FME for the City of Palo Alto Topobase Implentation
Using FME for the City of Palo Alto Topobase ImplentationUsing FME for the City of Palo Alto Topobase Implentation
Using FME for the City of Palo Alto Topobase Implentation
 
Tenisha Hamilton -BI
Tenisha Hamilton -BITenisha Hamilton -BI
Tenisha Hamilton -BI
 
Filtering vs Enriching Data in Apache Spark
Filtering vs Enriching Data in Apache SparkFiltering vs Enriching Data in Apache Spark
Filtering vs Enriching Data in Apache Spark
 
Why Open Source Works for DevOps Monitoring
Why Open Source Works for DevOps MonitoringWhy Open Source Works for DevOps Monitoring
Why Open Source Works for DevOps Monitoring
 
Active Record Internals - Medellin.rb
Active Record Internals - Medellin.rbActive Record Internals - Medellin.rb
Active Record Internals - Medellin.rb
 
Spark Summit EU 2015: Revolutionizing Big Data in the Enterprise with Spark
Spark Summit EU 2015: Revolutionizing Big Data in the Enterprise with SparkSpark Summit EU 2015: Revolutionizing Big Data in the Enterprise with Spark
Spark Summit EU 2015: Revolutionizing Big Data in the Enterprise with Spark
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 

Similar a NAPE 2019 Presentation

Actionable Insights with AI - Snowflake for Data Science
Actionable Insights with AI - Snowflake for Data ScienceActionable Insights with AI - Snowflake for Data Science
Actionable Insights with AI - Snowflake for Data ScienceHarald Erb
 
Continuous Data Replication into Cloud Storage with Oracle GoldenGate
Continuous Data Replication into Cloud Storage with Oracle GoldenGateContinuous Data Replication into Cloud Storage with Oracle GoldenGate
Continuous Data Replication into Cloud Storage with Oracle GoldenGateMichael Rainey
 
Snowflake for Data Engineering
Snowflake for Data EngineeringSnowflake for Data Engineering
Snowflake for Data EngineeringHarald Erb
 
Data Pipeline for The Big Data/Data Science OKC
Data Pipeline for The Big Data/Data Science OKCData Pipeline for The Big Data/Data Science OKC
Data Pipeline for The Big Data/Data Science OKCMark Smith
 
Cloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AICloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AITorsten Steinbach
 
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Yohei Onishi
 
Combining Logs, Metrics, and Traces for Unified Observability
Combining Logs, Metrics, and Traces for Unified ObservabilityCombining Logs, Metrics, and Traces for Unified Observability
Combining Logs, Metrics, and Traces for Unified ObservabilityElasticsearch
 
IBM THINK 2019 - Cloud-Native Clickstream Analysis in IBM Cloud
IBM THINK 2019 - Cloud-Native Clickstream Analysis in IBM CloudIBM THINK 2019 - Cloud-Native Clickstream Analysis in IBM Cloud
IBM THINK 2019 - Cloud-Native Clickstream Analysis in IBM CloudTorsten Steinbach
 
Peteris Arajs - Where is my data
Peteris Arajs - Where is my dataPeteris Arajs - Where is my data
Peteris Arajs - Where is my dataAndrejs Vorobjovs
 
Todd vatalaro oracle 2004
Todd vatalaro oracle 2004Todd vatalaro oracle 2004
Todd vatalaro oracle 2004Todd Vatalaro
 
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and KafkaStream, Stream, Stream: Different Streaming Methods with Apache Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and KafkaDatabricks
 
Using Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformUsing Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformDatabricks
 
From Idea to Model: Productionizing Data Pipelines with Apache Airflow
From Idea to Model: Productionizing Data Pipelines with Apache AirflowFrom Idea to Model: Productionizing Data Pipelines with Apache Airflow
From Idea to Model: Productionizing Data Pipelines with Apache AirflowDatabricks
 
Flink Forward Berlin 2018: Timo Walther - "Flink SQL in Action"
Flink Forward Berlin 2018: Timo Walther - "Flink SQL in Action"Flink Forward Berlin 2018: Timo Walther - "Flink SQL in Action"
Flink Forward Berlin 2018: Timo Walther - "Flink SQL in Action"Flink Forward
 
Delivering rapid-fire Analytics with Snowflake and Tableau
Delivering rapid-fire Analytics with Snowflake and TableauDelivering rapid-fire Analytics with Snowflake and Tableau
Delivering rapid-fire Analytics with Snowflake and TableauHarald Erb
 
Airflow at lyft for Airflow summit 2020 conference
Airflow at lyft for Airflow summit 2020 conferenceAirflow at lyft for Airflow summit 2020 conference
Airflow at lyft for Airflow summit 2020 conferenceTao Feng
 
IBM THINK 2019 - A Sharing Economy for Analytics: SQL Query in IBM Cloud
IBM THINK 2019 - A Sharing Economy for Analytics: SQL Query in IBM CloudIBM THINK 2019 - A Sharing Economy for Analytics: SQL Query in IBM Cloud
IBM THINK 2019 - A Sharing Economy for Analytics: SQL Query in IBM CloudTorsten Steinbach
 

Similar a NAPE 2019 Presentation (20)

AAPG Geoscience Technology Workshop 2019
AAPG Geoscience Technology Workshop 2019AAPG Geoscience Technology Workshop 2019
AAPG Geoscience Technology Workshop 2019
 
Actionable Insights with AI - Snowflake for Data Science
Actionable Insights with AI - Snowflake for Data ScienceActionable Insights with AI - Snowflake for Data Science
Actionable Insights with AI - Snowflake for Data Science
 
Continuous Data Replication into Cloud Storage with Oracle GoldenGate
Continuous Data Replication into Cloud Storage with Oracle GoldenGateContinuous Data Replication into Cloud Storage with Oracle GoldenGate
Continuous Data Replication into Cloud Storage with Oracle GoldenGate
 
Mcneill 01
Mcneill 01Mcneill 01
Mcneill 01
 
Snowflake for Data Engineering
Snowflake for Data EngineeringSnowflake for Data Engineering
Snowflake for Data Engineering
 
Data Pipeline for The Big Data/Data Science OKC
Data Pipeline for The Big Data/Data Science OKCData Pipeline for The Big Data/Data Science OKC
Data Pipeline for The Big Data/Data Science OKC
 
Cloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AICloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AI
 
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
 
Combining Logs, Metrics, and Traces for Unified Observability
Combining Logs, Metrics, and Traces for Unified ObservabilityCombining Logs, Metrics, and Traces for Unified Observability
Combining Logs, Metrics, and Traces for Unified Observability
 
IBM THINK 2019 - Cloud-Native Clickstream Analysis in IBM Cloud
IBM THINK 2019 - Cloud-Native Clickstream Analysis in IBM CloudIBM THINK 2019 - Cloud-Native Clickstream Analysis in IBM Cloud
IBM THINK 2019 - Cloud-Native Clickstream Analysis in IBM Cloud
 
Peteris Arajs - Where is my data
Peteris Arajs - Where is my dataPeteris Arajs - Where is my data
Peteris Arajs - Where is my data
 
Enterprise Data Lakes
Enterprise Data LakesEnterprise Data Lakes
Enterprise Data Lakes
 
Todd vatalaro oracle 2004
Todd vatalaro oracle 2004Todd vatalaro oracle 2004
Todd vatalaro oracle 2004
 
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and KafkaStream, Stream, Stream: Different Streaming Methods with Apache Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and Kafka
 
Using Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformUsing Databricks as an Analysis Platform
Using Databricks as an Analysis Platform
 
From Idea to Model: Productionizing Data Pipelines with Apache Airflow
From Idea to Model: Productionizing Data Pipelines with Apache AirflowFrom Idea to Model: Productionizing Data Pipelines with Apache Airflow
From Idea to Model: Productionizing Data Pipelines with Apache Airflow
 
Flink Forward Berlin 2018: Timo Walther - "Flink SQL in Action"
Flink Forward Berlin 2018: Timo Walther - "Flink SQL in Action"Flink Forward Berlin 2018: Timo Walther - "Flink SQL in Action"
Flink Forward Berlin 2018: Timo Walther - "Flink SQL in Action"
 
Delivering rapid-fire Analytics with Snowflake and Tableau
Delivering rapid-fire Analytics with Snowflake and TableauDelivering rapid-fire Analytics with Snowflake and Tableau
Delivering rapid-fire Analytics with Snowflake and Tableau
 
Airflow at lyft for Airflow summit 2020 conference
Airflow at lyft for Airflow summit 2020 conferenceAirflow at lyft for Airflow summit 2020 conference
Airflow at lyft for Airflow summit 2020 conference
 
IBM THINK 2019 - A Sharing Economy for Analytics: SQL Query in IBM Cloud
IBM THINK 2019 - A Sharing Economy for Analytics: SQL Query in IBM CloudIBM THINK 2019 - A Sharing Economy for Analytics: SQL Query in IBM Cloud
IBM THINK 2019 - A Sharing Economy for Analytics: SQL Query in IBM Cloud
 

Más de Chijioke “CJ” Ejimuda

Revolutionizing Crtitical Infrastructure Connectivity
Revolutionizing Crtitical Infrastructure ConnectivityRevolutionizing Crtitical Infrastructure Connectivity
Revolutionizing Crtitical Infrastructure ConnectivityChijioke “CJ” Ejimuda
 
Leveraging big data to maximize value from rail and power infrastructure assets.
Leveraging big data to maximize value from rail and power infrastructure assets.Leveraging big data to maximize value from rail and power infrastructure assets.
Leveraging big data to maximize value from rail and power infrastructure assets.Chijioke “CJ” Ejimuda
 
Internet of Energy: "Can python prevent California wildfires?"
Internet of Energy: "Can python prevent California wildfires?"Internet of Energy: "Can python prevent California wildfires?"
Internet of Energy: "Can python prevent California wildfires?"Chijioke “CJ” Ejimuda
 
Using Deep Learning and Computer Vision to improve Corrosion risk assessments
Using Deep Learning and Computer Vision to improve Corrosion risk assessmentsUsing Deep Learning and Computer Vision to improve Corrosion risk assessments
Using Deep Learning and Computer Vision to improve Corrosion risk assessmentsChijioke “CJ” Ejimuda
 
Could Edge Computing Lead to the End of Real Time Operating Centers?
Could Edge Computing Lead to the End of Real Time Operating Centers?Could Edge Computing Lead to the End of Real Time Operating Centers?
Could Edge Computing Lead to the End of Real Time Operating Centers?Chijioke “CJ” Ejimuda
 
Optimizing PV energy yield with Elasticsearch and graphQL
Optimizing PV energy yield with Elasticsearch and graphQLOptimizing PV energy yield with Elasticsearch and graphQL
Optimizing PV energy yield with Elasticsearch and graphQLChijioke “CJ” Ejimuda
 
Self Driving Directional Drilling on the Edge
Self Driving Directional Drilling on the EdgeSelf Driving Directional Drilling on the Edge
Self Driving Directional Drilling on the EdgeChijioke “CJ” Ejimuda
 
hybriData IIoT Workshop for AAPG Short Course
hybriData IIoT Workshop for AAPG Short CoursehybriData IIoT Workshop for AAPG Short Course
hybriData IIoT Workshop for AAPG Short CourseChijioke “CJ” Ejimuda
 
IIoT: The Whole Gamut - Exploration --> Drilling --> Production --> Facility
IIoT: The Whole Gamut - Exploration --> Drilling --> Production --> FacilityIIoT: The Whole Gamut - Exploration --> Drilling --> Production --> Facility
IIoT: The Whole Gamut - Exploration --> Drilling --> Production --> FacilityChijioke “CJ” Ejimuda
 

Más de Chijioke “CJ” Ejimuda (12)

Revolutionizing Crtitical Infrastructure Connectivity
Revolutionizing Crtitical Infrastructure ConnectivityRevolutionizing Crtitical Infrastructure Connectivity
Revolutionizing Crtitical Infrastructure Connectivity
 
Leveraging big data to maximize value from rail and power infrastructure assets.
Leveraging big data to maximize value from rail and power infrastructure assets.Leveraging big data to maximize value from rail and power infrastructure assets.
Leveraging big data to maximize value from rail and power infrastructure assets.
 
Internet of Energy: "Can python prevent California wildfires?"
Internet of Energy: "Can python prevent California wildfires?"Internet of Energy: "Can python prevent California wildfires?"
Internet of Energy: "Can python prevent California wildfires?"
 
Using Deep Learning and Computer Vision to improve Corrosion risk assessments
Using Deep Learning and Computer Vision to improve Corrosion risk assessmentsUsing Deep Learning and Computer Vision to improve Corrosion risk assessments
Using Deep Learning and Computer Vision to improve Corrosion risk assessments
 
Learning from Autonomous Vehicle Industry
Learning from Autonomous Vehicle IndustryLearning from Autonomous Vehicle Industry
Learning from Autonomous Vehicle Industry
 
Could Edge Computing Lead to the End of Real Time Operating Centers?
Could Edge Computing Lead to the End of Real Time Operating Centers?Could Edge Computing Lead to the End of Real Time Operating Centers?
Could Edge Computing Lead to the End of Real Time Operating Centers?
 
Optimizing PV energy yield with Elasticsearch and graphQL
Optimizing PV energy yield with Elasticsearch and graphQLOptimizing PV energy yield with Elasticsearch and graphQL
Optimizing PV energy yield with Elasticsearch and graphQL
 
Self Driving Directional Drilling on the Edge
Self Driving Directional Drilling on the EdgeSelf Driving Directional Drilling on the Edge
Self Driving Directional Drilling on the Edge
 
hybriData Energy Services and Data Products
hybriData Energy Services and Data ProductshybriData Energy Services and Data Products
hybriData Energy Services and Data Products
 
hybriData IIoT Workshop for AAPG Short Course
hybriData IIoT Workshop for AAPG Short CoursehybriData IIoT Workshop for AAPG Short Course
hybriData IIoT Workshop for AAPG Short Course
 
IIoT: The Whole Gamut - Exploration --> Drilling --> Production --> Facility
IIoT: The Whole Gamut - Exploration --> Drilling --> Production --> FacilityIIoT: The Whole Gamut - Exploration --> Drilling --> Production --> Facility
IIoT: The Whole Gamut - Exploration --> Drilling --> Production --> Facility
 
elasticsearch X react
elasticsearch X reactelasticsearch X react
elasticsearch X react
 

Último

Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdfKamal Acharya
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VDineshKumar4165
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...Call Girls in Nagpur High Profile
 
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...tanu pandey
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Christo Ananth
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfJiananWang21
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueBhangaleSonal
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapRishantSharmaFr
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfKamal Acharya
 
Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01KreezheaRecto
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...SUHANI PANDEY
 
Vivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design SpainVivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design Spaintimesproduction05
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 

Último (20)

Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
NFPA 5000 2024 standard .
NFPA 5000 2024 standard                                  .NFPA 5000 2024 standard                                  .
NFPA 5000 2024 standard .
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 
Vivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design SpainVivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design Spain
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 

NAPE 2019 Presentation

  • 1.
  • 2. NAPE ICE 2019 Outline ● Why Process Petrophysical Big Data? ● What Big Data processing challenges? ● ETL Workflow ● Conclusion ● References
  • 3. NAPE ICE 2019Why Process Petrophysical Big Data? ● Re-evaluate old well logs for opportunities ● Conducting pre-drill analysis of offset wells ● Unable to effectively assess well / field reserves ● Challenge with inferring geological features
  • 4. NAPE ICE 2019 ● For 1 to 10 well log files? - Copying the link and pasting on the browser is straightforward - Quickly download log data - Easier to perform ETL with such amount of data What Big Data Processing Challenges?
  • 5. NAPE ICE 2019 ● For 1 to 10 well log files? ● For 1000 well log files ??? What Big Data Processing Challenges?
  • 6. NAPE ICE 2019 ● Link to ~ 1000 well log data from 5 fields in excel sheets What Big Data Processing Challenges?
  • 7. NAPE ICE 2019 ● Download each well log file individually from the web ● Read log data from each file ● Enrich metadata and actual data files and save as Apache Arrow data format before loading to AWS S3 bucket ● GOAL: Making data ready for Apache Spark ML and Tensorflow Deep Learning Pipeline Extract Transform Load
  • 8. NAPE ICE 2019 ● Link to ~ 1000 well log data from 5 fields in excel sheets ● Download each well log file individually from the web - get the links to the files - append all the extracted links to a list - account for errors - save the file ETL Workflow
  • 11. NAPE ICE 2019 ● Link to ~ 1000 well log data from 5 fields in excel sheets ● Download each well log file individually from the web ● Read log data from each file - extract their actual data and Metadata / Header data - account for errors ETL Workflow
  • 13. NAPE ICE 2019 ● Link to ~ 1000 well log data from 5 fields in excel sheets ● Download each well log file individually from the web ● Read log data from each file ● Enrich metadata and actual data files and save as Apache Arrow data format before loading to AWS s3 bucket ETL Workflow
  • 14. NAPE ICE 2019 Why Apache Arrow? ETL Workflow
  • 15. NAPE ICE 2019 ● Link to ~ 1000 well log data from 5 fields in excel sheets ● Download each well log file individually from the web ● Read log data from each file ● Enrich metadata and actual data files and save as Apache Arrow data format before loading to AWS s3 bucket ● Making data ready for Apache Spark ML / Keras Deep Learning Pipeline - drop columns: 152 to 13 , drop duplicates , null / NA values, account for missing values - Split-apply-combine on grouped data by field and API: @pandas_udf - Caching dataframe ETL Workflow
  • 16. NAPE ICE 2019Conclusion ● Apache Airflow to orchestrate ETL process ● Moving towards real time data processing: -WITSML data processing ● Apache Kafka, Apache Flink, Apache Storm, Apache Spark
  • 17. NAPE ICE 2019References ● http://www.searchanddiscovery.com/pdfz/documents/2018/42234ejimuda/ndx_ejimuda.pdf.html ● https://www.slideshare.net/wesm/high-performance-python-on-apache-spark ● https://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html ● https://airflow.apache.org/installation.html