WITSML data processing example with Kafka and Spark Streaming
Houston Hadoop Meetup, 4/26/2016
About me - Dmitry Kniazev
Currently Solution Architect at EPAM Systems
- About 4 years in Oil & Gas here in Houston
- Started working with Hadoop about 2 years ago
Before that BI/DW Specialist at EPAM Systems for 6 years
- Reports, ETL with Oracle, Microsoft, Cognos and other tools
- Enjoyed not SO HOT life in Eastern Europe
Before that Performance Analyst at EPAM Systems for 4 years
- Web Applications and Databases optimization
What is the problem?
Source: http://www.croftsystems.net/blog/conventional-vs.-unconventional
What is WITSML?
DATA EXCHANGE STANDARD FOR THE UPSTREAM OIL AND GAS INDUSTRY
[Diagram labels: Service Company #1, Service Company #2, Operator #1; Rig Aggregation Solution (x2); WITSML Data Store (x2); Corp Store; WITSML-based Applications; WITSML as the exchange format between them]
Architecture
[Diagram: the WITSML Data Store in the Service Company DC is polled via WITSML over SOAP across the Internet; inside the Operator Company Data Center a Producer (Scala) writes to Kafka and two Consumers (Scala) read from it. Other labels: HBase, Email / Browser]
What is Kafka?
What is Spark Streaming?
Discretized Stream
Producer - prep
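// some important imports
import com.mycompany.witsml.client.WitsmlClient // based on jwitsml 1.0
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.joda.time.DateTime // assumption: DateTime is Joda-Time; the slide omits this import
import scala.xml.{Elem, Node, XML}
// variables initialization
var producer: KafkaProducer[String, String] = null // created in main once the properties are loaded
var startTimeIndex = DateTime.now()
var topic = ""
var pollInterval = 5 // polling interval; the unit is not stated on the slide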
Producer - Kafka Properties
bootstrap.servers = srv1:9092,srv2:9092
key.serializer = org.apache.kafka.common.serialization.StringSerializer
value.serializer = org.apache.kafka.common.serialization.StringSerializer
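The `props` object passed to `KafkaProducer` on the next slide is not shown being built. A minimal sketch of one way to do it, using only `java.util.Properties`, follows. The names `ProducerConfig` and `kafkaProps` are hypothetical and not from the talk; the three settings are the ones listed above.

```scala
import java.util.Properties

object ProducerConfig {
  // Builds the producer settings listed on the slide.
  // The broker list is passed in so it can differ per environment.
  def kafkaProps(bootstrapServers: String): Properties = {
    val props = new Properties()
    props.setProperty("bootstrap.servers", bootstrapServers)
    props.setProperty("key.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    props.setProperty("value.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    props
  }
}
```

Calling `ProducerConfig.kafkaProps("srv1:9092,srv2:9092")` would then yield an object suitable for `new KafkaProducer[String, String](props)`.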
Producer - main function
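producer = new KafkaProducer[String, String](props)
// each wellbore is a separate Kafka topic, which is partitioned by log
topic = args(0)
while (true) {
  val logs = WitsmlClient.getWitsmlResponse(logsQuery)
  // parse logs and send messages to Kafka
  // (\ selects direct children, \\ selects descendants at any depth)
  (logs \ "log").foreach { node: Node =>
    // send all data from one log to the same partition
    val key = (node \ "@uidLog").text
    // \\ is used here on the assumption that <data> may be nested below <log>
    (node \\ "data").foreach { data =>
      // no explicit partition: Kafka derives it from the key
      val message = new ProducerRecord[String, String](topic, key, data.text)
      producer.send(message)
    }
  }
  // assumption: pause between polls, with pollInterval taken as seconds
  Thread.sleep(pollInterval * 1000L)
}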
Producer - results
"Well123" => Topic
"5207KFSJ18" => Key (Partition)
Content of <data> element => Message
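The key-to-partition relationship on this slide can be illustrated with a small sketch. Kafka's default partitioner hashes the serialized key with murmur2; the version below substitutes `String.hashCode` as a stand-in (an assumption made to keep the example dependency-free), so the partition numbers will differ from Kafka's. The property that matters is the same: one key always lands in one partition. `PartitionSketch` and `partitionFor` are hypothetical names.

```scala
object PartitionSketch {
  // Stand-in for Kafka's default partitioner, NOT the real algorithm:
  // Kafka applies murmur2 to the key bytes, this sketch uses hashCode.
  // floorMod keeps the result in [0, numPartitions) even for negative hashes.
  def partitionFor(key: String, numPartitions: Int): Int =
    Math.floorMod(key.hashCode, numPartitions)
}
```

Because every message of a log carries the same `uidLog` key, all of its `<data>` rows go to one partition and keep their relative order there.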
Consumer - prep
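import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.StructType
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka.KafkaUtils
var schema: StructType = null // must be assigned before the stream starts
val conf = new SparkConf().setAppName("WitsmlKafkaDemo")
val ssc = new StreamingContext(conf, Seconds(1)) // 1-second micro-batches
// direct (receiver-less) stream of (key, message) pairs
val dStream: InputDStream[(String, String)] =
  KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)
val sqlContext = new SQLContext(ssc.sparkContext)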
Consumer - Rules Definition
# fields for Spark SQL query
`Co. Man G/L`,`Gain Loss - Spare`,`ACC_DRILL_STRKS`
# where clause for SQL query
`Co. Man G/L`>100 OR `Gain Loss - Spare`<(-42.1)
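The two rule lines above end up as the `fields` and `condition` strings that the consumer concatenates into a Spark SQL statement. A minimal sketch of that composition follows; `RuleQuery` and `build` are hypothetical names, and the default table name `timeLog` is the one registered in the consumer's main function.

```scala
object RuleQuery {
  // Composes the SELECT statement from the two rule lines.
  // The backticks around column names are required because the WITSML
  // mnemonics contain spaces, dots and slashes.
  def build(fields: String, condition: String, table: String = "timeLog"): String =
    s"SELECT $fields FROM $table WHERE $condition"
}
```

Since the rule text is spliced into SQL as-is, this approach presumably only suits rule files written by trusted users.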
Consumer - main function
dStream.foreachRDD { batchRDD =>
  // keep the message value and turn each comma-separated line into a Row
  // (values stay strings, so this assumes the fields in schema are StringType)
  val messages = batchRDD.map(_._2).map(line => Row.fromSeq(line.split(",").toSeq))
  // create DataFrame with a custom schema
  val df = sqlContext.createDataFrame(messages, schema)
  // register temp table and test against rule
  df.registerTempTable("timeLog")
  val collected = sqlContext.sql("SELECT " + fields + " FROM timeLog WHERE " + condition).collect()
  if (collected.length > 0) {
    // send email alert
    WitsmlKafkaUtil.sendEmail(collected)
  }
}
ssc.start()
ssc.awaitTermination()
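To see what the rule does to a single message, the same check can be written in plain Scala without Spark. This is only an illustration of the logic, not the talk's implementation: `RuleCheck`, `parse` and `violates` are hypothetical names, and the column order is an assumption (in the real pipeline it would come from the log's mnemonic list).

```scala
object RuleCheck {
  // Assumed column order of one comma-separated <data> row.
  val columns: Seq[String] = Seq("Co. Man G/L", "Gain Loss - Spare", "ACC_DRILL_STRKS")

  // Splits one message and pairs each value with its column name.
  def parse(message: String): Map[String, Double] =
    columns.zip(message.split(",").toSeq.map(_.trim.toDouble)).toMap

  // Same condition as the WHERE clause on the rules slide.
  def violates(rec: Map[String, Double]): Boolean =
    rec("Co. Man G/L") > 100 || rec("Gain Loss - Spare") < -42.1
}
```

A row such as `"120.5,0.0,300"` trips the first half of the rule, `"10,-50,300"` trips the second, and `"10,0,300"` passes.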
Visualization with Highcharts
Why Highcharts?
- WebSockets support -> real-time data visualization
- Multiple Y-axes that automatically scale -> many mnemonics on the same chart
- Inverted X-axis -> great for Depth Logs
- 3D charts that can be rotated -> Trajectories
- Area range with custom colors -> Formations on the background
- 100% client-side JavaScript -> easy to deploy
Lessons Learned
- Throw away and re-design:
  - Logs should be Topics, Wells (Wellbores) should be Partitions for Scalability
  - Producers and Consumers should be Managed Services (Flume Agents?)
- Backend:
  - Land data to HBase (and probably OpenTSDB)
- Frontend:
  - WebApp to visualize both NRT and historical data?
  - Mobile App for Alerts?
- Improve Producers:
  - Speak many WITSML dialects?
- Get ready for Real-time:
  - Support for ETP standard
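The first lesson (logs as topics, wellbores as partitions) reverses the keying used in the demo. A rough sketch of what that routing could look like follows; `TopicNaming`, `topicFor` and `route` are hypothetical names. The sanitizing step is there because Kafka topic names may contain only ASCII letters, digits, `.`, `_` and `-`, which a free-form log identifier might violate.

```scala
object TopicNaming {
  // Replaces every character Kafka does not allow in a topic name with '_'.
  def topicFor(logId: String): String =
    logId.replaceAll("[^a-zA-Z0-9._-]", "_")

  // Returns (topic, key): the log selects the topic,
  // the wellbore becomes the message key and so selects the partition.
  def route(wellboreId: String, logId: String): (String, String) =
    (topicFor(logId), wellboreId)
}
```

With this layout a consumer interested in one kind of log subscribes to a single topic, and adding wells spreads load across that topic's partitions.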
Thank you!
dmitry_kniazev@epam.com
Links:
http://www.energistics.org/
http://www.highcharts.com/
https://spark.apache.org/
http://kafka.apache.org/
