PRECISION AGRICULTURE SUPPORT
USING SCALA/SPARK
Project Report
SRIRAM RV
SPRING SEMESTER
ADVISOR: PROFESSOR BRAD RUBIN
Table of Contents
1.0 PURPOSE OF PROJECT
2.0 PROJECT DESCRIPTION
2.0 Why Agriculture Data
3.0 DATASET
3.1 Data Source:
3.2 Details about Dataset:
3.3 Sample Data
Weather data
Moisture Data
Image Data
3.4 Schema
Weather data
Moisture Data
3.5 Data Description:
Weather Data:
Moisture Data
4.0 PROJECT IMPLEMENTATION
4.1 Data Ingestion using Kafka
4.2 Kafka producer
4.4 Kafka Broker
4.5 Kafka Consumer
5.0 ADDITIONAL TOOLS
5.1 Maven
5.2 Scala Build tool
5.3 Git
6.0 OUTPUT INTERPRETATION
7.0 IMPROVING THE KAFKA ARCHITECTURE
7.1. Making kafka architecture more robust
7.2. Having dedicated Kafka Broker to improve performance
8.0 FUTURE RESEARCH
9.0. CONCLUSION
BIBLIOGRAPHY
1.0 PURPOSE OF PROJECT
Over the last few years, big data tools have focused on both structured and unstructured data.
Image processing, however, is one area that needs more attention, and it is also an area of
personal interest. This project gives me an opportunity to experiment with streaming images
and weather data captured in the UST greenhouse, and to get a feel for image processing with
Scala/Spark on Hadoop more generally.
I will gain experience in technologies such as Scala, Spark, Spark Streaming, and image
processing in the domain of food technology, skills that I cannot otherwise obtain in the
GPS curriculum.
2.0 PROJECT DESCRIPTION
The purpose of the project is to stream real-time weather data captured by direct sensors,
together with RGB images captured by drones, and to perform image processing and weather
data analytics using the Scala/Spark ecosystem on a Hadoop computing cluster. Since image
processing and streaming with Spark are new technologies to GPS, part of the project will
focus on experimenting with different tools to find a more reliable way of storing images
and streamed data in HDFS.
The UST greenhouse will be growing plants for the Precision Agriculture project run by the UST
School of Engineering. The greenhouse has a local weather station that will be broadcasting
weather data such as temperature, humidity, light intensity, barometric pressure, position
(latitude/longitude), wind speed and direction, and rainfall. The broadcast will be continuous at
10-second intervals (in CSV format). The equipment in the greenhouse is a prototype for field
use, which is useful both for analyzing plant health and for creating a model for each of the six
plant species that will be grown. In addition, high-resolution images of the plants will be taken
in the visible and near-IR regions of the light spectrum. These images will be taken
every couple of days.
2.0 Why Agriculture Data
Agricultural data gives me an opportunity to experiment with streaming images and weather data
captured in the UST greenhouse. The data captured in the greenhouse is highly detailed and gives
me experience working with data from food technology.
3.0 DATASET
3.1 Data Source:
The data source for this project is a live stream of weather and moisture data, captured by
sensors through an Arduino chip and streamed using a Kafka producer.
3.2 Details about Dataset:
• Sensor data was captured every second.
• Total number of days of weather data stored in HDFS: 90 days.
• Total number of days of moisture data stored in HDFS: 85 days.
• Total number of days of image data stored: 90 days.
3.3 Sample Data
Weather data
Fig 1: Sensor Weather data from Arduino
Moisture Data
Fig 2: Sensor data from Arduino
Image Data
Image data was captured every other day over a period of 90 days.
Fig 3: Images from the greenhouse
3.4 Schema
Weather data
Date | Time | Wind direction | Wind Speed | Humidity | Temperature | Rain | Pressure | Battery | Light Level
Table 1: Weather Data Schema
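As a minimal sketch of how one CSV line following the weather schema in Table 1 could be turned into a typed record, the Scala snippet below may help. The object and field names are hypothetical, the field types are assumptions (date, time and wind direction are kept as strings because the report does not state their formats), and the sample line used afterwards is invented for illustration rather than taken from the greenhouse data.

```scala
import scala.util.Try

object WeatherCsv {
  // One row of the weather schema, in the column order of Table 1.
  // Types are assumptions: the report does not state units or formats.
  final case class WeatherRecord(
    date: String, time: String, windDirection: String, windSpeed: Double,
    humidity: Double, temperature: Double, rain: Double, pressure: Double,
    battery: Double, lightLevel: Double)

  // Returns None when the line does not have ten fields or a numeric
  // field cannot be parsed, so malformed sensor lines can be skipped.
  def parse(line: String): Option[WeatherRecord] = {
    val f = line.split(",", -1).map(_.trim)
    if (f.length != 10) None
    else Try {
      WeatherRecord(f(0), f(1), f(2), f(3).toDouble, f(4).toDouble,
        f(5).toDouble, f(6).toDouble, f(7).toDouble, f(8).toDouble,
        f(9).toDouble)
    }.toOption
  }
}
```

For example, `WeatherCsv.parse("2016-03-01,12:00:10,NW,3.2,45.0,21.5,0.0,1013.2,4.1,550")` yields a record whose `temperature` field is 21.5, while a line with missing columns yields `None`.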
Moisture Data
Date | Time | Moist 2 | Moist 6 | Moist 8 | Moist 11 | Moist 10 | Moist 1 | Moist 9 | Moist 7 | Moist 5 | Temp | Par
Table 2: Moisture Data Schema
3.5 Data Description:
Weather Data:
Date & Time: Timestamp of the recording
Wind Direction: Direction of wind
Wind Speed: Speed of wind
Wind Gust: Gust of wind
Humidity: Percentage of water in air
Temperature: Temperature
Rain: Rain percentage
Pressure: Air pressure
Battery: Battery of Arduino
Light: Light exposure
Moisture Data
Moist 2: Moisture of plot 2
Moist 6: Moisture of plot 6
Moist 8: Moisture of plot 8
Moist 11: Moisture of plot 11
Moist 10: Moisture of plot 10
Moist 1: Moisture of plot 1
Moist 9: Moisture of plot 9
Moist 7: Moisture of plot 7
Moist 5: Moisture of plot 5
Temp: Soil temperature
PAR: Moisture metrics
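Because the Moist columns in the moisture schema are not in plot order, a small lookup from plot number to reading can make the data easier to work with. The sketch below assumes that each "Moist N" column holds the reading for plot N, that a row has 13 comma-separated fields (Date, Time, nine Moist columns, Temp, Par), and that readings are numeric. The object and method names are hypothetical.

```scala
import scala.util.Try

object MoistureCsv {
  // Plot numbers in the order the Moist columns appear in the schema.
  val plotOrder: Vector[Int] = Vector(2, 6, 8, 11, 10, 1, 9, 7, 5)

  // Maps plot number -> moisture reading for one CSV row.
  // Returns None for rows with the wrong field count or non-numeric values.
  def plotReadings(line: String): Option[Map[Int, Double]] = {
    val f = line.split(",", -1).map(_.trim)
    if (f.length != 13) None
    else Try {
      // Fields 2 to 10 are the nine Moist columns.
      val values = f.slice(2, 11).toVector.map(_.toDouble)
      plotOrder.zip(values).toMap
    }.toOption
  }
}
```

With an invented row such as `"2016-03-01,12:00:10,10,20,30,40,50,60,70,80,90,18.5,300"`, the reading for plot 11 is the fourth Moist value (40.0) and the reading for plot 1 is the sixth (60.0).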
4.0 PROJECT IMPLEMENTATION
4.1 Data Ingestion using Kafka
Kafka is a distributed messaging system that transmits the moisture and weather data from the
Arduino chip to HDFS. The Kafka architecture depends mainly on three components: producer, broker and
consumer. Zookeeper is used to monitor the frequency of data flowing in and out of the Kafka broker.
The diagram below shows the architecture of the precision agriculture project. The Kafka producer streams
the data produced in the greenhouse and sends it to the Kafka broker. The Kafka producer gets the
addresses of the brokers through Zookeeper.
Fig 4: Kafka Architectural Diagram
4.2 Kafka producer
The Kafka producer is the sender side of the Kafka distributed messaging system. The producer assigns
messages to their respective topics and sends them to the brokers by topic. The producer also obtains the
addresses of the Kafka brokers, which are attached to the header of each packet when the data is sent.
The weather data, moisture data and image data are kept apart using separate topics: “weather-data”,
“moisture-data” and “image-data”.
Below is the snippet that sets up the Kafka producer with string serialization for both key and value. The
bootstrap server setting is the list of Kafka broker addresses (host and port) that the producer connects to initially.
Fig 5 : Configuring the kafka producer
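Since the figure is an image, a rough textual sketch of such a configuration may be useful. It is not the project's actual code. It builds the three standard producer settings (`bootstrap.servers`, `key.serializer`, `value.serializer`) with Kafka's `StringSerializer` class name, using only `java.util.Properties` so that it runs without the Kafka client library. The broker address `broker-host:9092` and the object name are hypothetical.

```scala
import java.util.Properties

object ProducerConfigSketch {
  // Hypothetical broker address; 9092 is Kafka's default listener port.
  val brokers: String = "broker-host:9092"

  // Builds producer settings with string serialization for key and value.
  def producerProps(bootstrapServers: String): Properties = {
    val props = new Properties()
    props.setProperty("bootstrap.servers", bootstrapServers)
    props.setProperty("key.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    props.setProperty("value.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    props
  }

  // With the kafka-clients library on the classpath, these properties
  // would be passed to the producer constructor:
  //   new KafkaProducer[String, String](producerProps(brokers))
}
```

Several brokers can be listed in `bootstrap.servers` as a comma-separated string, which lets the producer start even if one of the listed brokers is down.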
Below is the snippet used to create the message object, which contains the topic and the message to be
sent to the Kafka broker. The send function of the Kafka producer binds the Kafka configuration instance
to the message and sends it to the broker.
Fig 6: Sending the message to kafka broker
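The pairing of a topic with a message can be illustrated with a small sketch. The topic names come from this report; the `SensorSource` type, the `Message` case class and the method names are hypothetical stand-ins, and the report does not say how the real producer decided which topic a payload belongs to.

```scala
object TopicRouting {
  // Hypothetical tags for the three kinds of data in the project.
  sealed trait SensorSource
  case object Weather extends SensorSource
  case object Moisture extends SensorSource
  case object Image extends SensorSource

  // Topic names as given in the report.
  def topicFor(source: SensorSource): String = source match {
    case Weather  => "weather-data"
    case Moisture => "moisture-data"
    case Image    => "image-data"
  }

  // Stand-in for a Kafka message: a topic plus the payload to send.
  final case class Message(topic: String, value: String)

  def toMessage(source: SensorSource, payload: String): Message =
    Message(topicFor(source), payload)

  // With kafka-clients available, the equivalent real calls would be:
  //   producer.send(new ProducerRecord[String, String](topic, value))
}
```

Keeping the topic lookup in one function means a new data source only needs one extra case, and the compiler warns if a case is missing because the trait is sealed.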
4.4 Kafka Broker
The Kafka broker is the server side of the Kafka distributed messaging system and is capable of handling
hundreds of read and write operations per second. It can be expanded elastically without downtime.
Data streams are partitioned and spread over a cluster of machines, which allows data streams larger than
the capacity of a single machine. The Kafka broker can be monitored through Zookeeper on port
2181. By default, the Kafka broker has a retention period of 168 hours (7 days).
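To get a feel for what a 168-hour retention period means for this data, the short calculation below estimates how many records a topic would hold before the oldest ones become eligible for deletion. It assumes one record per sampling interval and time-based retention only; the object and method names are hypothetical. In a broker configuration file this default corresponds to the setting `log.retention.hours=168`.

```scala
object Retention {
  // Default Kafka broker retention period, in hours.
  val retentionHours: Int = 168

  // Number of records produced during the retention window when one
  // record is sent every `intervalSeconds` seconds.
  def recordsRetained(intervalSeconds: Int): Long =
    retentionHours.toLong * 3600L / intervalSeconds
}
```

At one record per second this gives 604,800 retained records per stream, and at one record every 10 seconds it gives 60,480, so a consumer that is offline for less than a week can still catch up from the broker.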
Fig 7 : Monitoring the messages using Zookeeper
4.5 Kafka Consumer
The Kafka consumer is the receiver side of the Kafka distributed messaging system; it fetches data topic by
topic from the brokers. The consumer runs in the cluster and stores the data in HDFS for further processing.
Below is the sample consumer code, which connects to the PA cluster. The topic set contains the list of topics
that we want to fetch from the broker.
Fig 8 : Configuring Kafka Consumer
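The figure is not reproduced as text, so the following is only a guess at the general shape of such a configuration. It assumes the consumer used the Spark Streaming direct integration for Kafka 0.8, where a topic set and a parameter map with `metadata.broker.list` are passed to `KafkaUtils.createDirectStream`. The broker address passed in and the object name are hypothetical, and the snippet uses only the Scala standard library so that it runs on its own.

```scala
object ConsumerConfigSketch {
  // Turns a comma-separated topic list into the set of topics to fetch.
  def topicsSet(topics: String): Set[String] =
    topics.split(",").map(_.trim).filter(_.nonEmpty).toSet

  // Connection parameters for the direct stream (Kafka 0.8 integration).
  def kafkaParams(brokers: String): Map[String, String] =
    Map("metadata.broker.list" -> brokers)

  // With spark-streaming-kafka on the classpath and a StreamingContext
  // named ssc, the stream would be created roughly as:
  //   KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  //     ssc, kafkaParams(brokers), topicsSet("weather-data,moisture-data"))
}
```

Using a set for the topics removes duplicates and makes it easy to add or drop a topic without touching the rest of the consumer.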
5.0 ADDITIONAL TOOLS
5.1 Maven
Maven was used for dependency management, bringing all the required jars from the server into the local
repository. This made it possible to develop the code in a Windows environment. Maven was also used to
specify the versions of Spark and Kafka, and all the jar files for that version of Spark were
stored in the local repository.
Fig 9 : Dependency Injection
5.2 Scala Build tool
The Scala Build Tool (SBT) was used to create the package and jar files, which were transferred to the
cluster and VM using WinSCP.
Fig 10 : SBT build
5.3 Git
Git is a distributed revision control and source code management (SCM) system. A GitHub repository
was used for the precision agriculture project to store all the project code online and share it
with the team.
Below is the link to the precision agriculture repository.
https://github.com/sri303030/Data-Ingestion-using-Kafka
6.0 OUTPUT INTERPRETATION
The consumer sends the streamed data to HDFS, where it is stored in two different folders to
distinguish weather data from moisture data.
Below is the output from the weather data folder
Fig 11: weather data folder
Below is the output from the moisture data folder
Fig 12: Moisture Data Folder
7.0 IMPROVING THE KAFKA ARCHITECTURE
The Kafka architecture can be improved in two ways:
1. Making the Kafka architecture more robust.
2. Having a dedicated Kafka broker to improve performance.
7.1. Making kafka architecture more robust
In the precision agriculture project, both the broker and the consumer ran on the same system, because
the data ingestion requirement was to store data in HDFS. To make the architecture more robust, the
consumer should run on a remote system or cluster that has access to the Kafka broker. That way,
if the Kafka broker fails, the data can still be retrieved from the
consumer.
7.2. Having dedicated Kafka Broker to improve performance
The Kafka broker runs as part of the cluster in the precision agriculture project. To avoid noise in the
cluster, the broker should be a dedicated system or set of systems. This also removes the overhead that
the Kafka broker places on the Hadoop environment and speeds up all the processes.
8.0 FUTURE RESEARCH
1. Implement a bridge between HDFS and Spark SQL and store tables as persistent data in Hive.
2. Implement real-time machine learning using Spark MLlib.
3. Connect the live data to a reporting tool to analyze it and create useful reports.
9.0. CONCLUSION
Kafka is a rapidly growing distributed messaging system with various applications in the field of
engineering. With the help of the precision agriculture project, agricultural data from the greenhouse was
captured and streamed to the Hadoop environment using Kafka and Spark. This project also gave me exposure
to handling different big data problems in real-time situations and helped me understand the Kafka architecture.
BIBLIOGRAPHY
http://kafka.apache.org/
Jain, R. (2014). Real time Analytics with Apache Kafka and Apache Spark.
Wang, H., Can, D., Kazemzadeh, A., Bar, F., & Narayanan, S. (2012). A System for Real-time Twitter
Sentiment Analysis of 2012 U.S. Presidential Election Cycle. Paper presented at the Proceedings of
the 50th Annual Meeting of the Association for Computational Linguistics, Jeju, Republic of
Korea. http://www.aclweb.org/anthology/P12-3020

Más contenido relacionado

La actualidad más candente

SAP Activate Elements.pdf
SAP Activate Elements.pdfSAP Activate Elements.pdf
SAP Activate Elements.pdfAslamAs1
 
BMC Control M Advantage
BMC Control M Advantage BMC Control M Advantage
BMC Control M Advantage Vyom Labs
 
SAP ERP Solutions - How It Evolved Over Time
SAP ERP Solutions - How It Evolved Over TimeSAP ERP Solutions - How It Evolved Over Time
SAP ERP Solutions - How It Evolved Over TimeAPPSeCONNECT
 
Core Data Service
Core Data ServiceCore Data Service
Core Data ServiceSujoy Saha
 
Day1 Sap Basis Overview V1 1
Day1 Sap Basis Overview V1 1Day1 Sap Basis Overview V1 1
Day1 Sap Basis Overview V1 1Guang Ying Yuan
 
Introduction to SAP Gateway and OData
Introduction to SAP Gateway and ODataIntroduction to SAP Gateway and OData
Introduction to SAP Gateway and ODataChris Whealy
 
Togaf for innovation sandiego-v1
Togaf for innovation sandiego-v1Togaf for innovation sandiego-v1
Togaf for innovation sandiego-v1SUNDAR RAMANATHAN
 
SITIST 2015 Dev - Abap on Hana
SITIST 2015 Dev - Abap on HanaSITIST 2015 Dev - Abap on Hana
SITIST 2015 Dev - Abap on Hanasitist
 
OPEN TEXT ADMINISTRATION
OPEN TEXT ADMINISTRATIONOPEN TEXT ADMINISTRATION
OPEN TEXT ADMINISTRATIONSUMIT KUMAR
 
ServiceNow Configuration Management Database
ServiceNow Configuration Management Database ServiceNow Configuration Management Database
ServiceNow Configuration Management Database Jade Global
 
What is Business Objects
What is Business Objects What is Business Objects
What is Business Objects BigClasses.com
 
Mainframe Architecture & Product Overview
Mainframe Architecture & Product OverviewMainframe Architecture & Product Overview
Mainframe Architecture & Product Overviewabhi1112
 
Enterprise-architecture and the service-oriented enterprise
Enterprise-architecture and the service-oriented enterpriseEnterprise-architecture and the service-oriented enterprise
Enterprise-architecture and the service-oriented enterpriseTetradian Consulting
 
Example IDEF 0 Flow Diagrams
Example IDEF 0 Flow DiagramsExample IDEF 0 Flow Diagrams
Example IDEF 0 Flow DiagramsMandar Trivedi
 
Technical Overview of CDS View – SAP HANA Part I
Technical Overview of CDS View – SAP HANA Part ITechnical Overview of CDS View – SAP HANA Part I
Technical Overview of CDS View – SAP HANA Part IAshish Saxena
 
SAP ODATA Overview & Guidelines
SAP ODATA Overview & GuidelinesSAP ODATA Overview & Guidelines
SAP ODATA Overview & GuidelinesAshish Saxena
 
HfS Webinar Slides - Achieving Intelligent Automation in Business Operations
HfS Webinar Slides - Achieving Intelligent Automation in Business OperationsHfS Webinar Slides - Achieving Intelligent Automation in Business Operations
HfS Webinar Slides - Achieving Intelligent Automation in Business OperationsHfS Research
 

La actualidad más candente (20)

SAP Activate Elements.pdf
SAP Activate Elements.pdfSAP Activate Elements.pdf
SAP Activate Elements.pdf
 
BMC Control M Advantage
BMC Control M Advantage BMC Control M Advantage
BMC Control M Advantage
 
Oracle BPM 11G
Oracle BPM 11GOracle BPM 11G
Oracle BPM 11G
 
SAP ERP Solutions - How It Evolved Over Time
SAP ERP Solutions - How It Evolved Over TimeSAP ERP Solutions - How It Evolved Over Time
SAP ERP Solutions - How It Evolved Over Time
 
Sap grc-access-control-solution
Sap grc-access-control-solutionSap grc-access-control-solution
Sap grc-access-control-solution
 
Core Data Service
Core Data ServiceCore Data Service
Core Data Service
 
Day1 Sap Basis Overview V1 1
Day1 Sap Basis Overview V1 1Day1 Sap Basis Overview V1 1
Day1 Sap Basis Overview V1 1
 
Introduction to SAP Gateway and OData
Introduction to SAP Gateway and ODataIntroduction to SAP Gateway and OData
Introduction to SAP Gateway and OData
 
Sap architecture
Sap architectureSap architecture
Sap architecture
 
Togaf for innovation sandiego-v1
Togaf for innovation sandiego-v1Togaf for innovation sandiego-v1
Togaf for innovation sandiego-v1
 
SITIST 2015 Dev - Abap on Hana
SITIST 2015 Dev - Abap on HanaSITIST 2015 Dev - Abap on Hana
SITIST 2015 Dev - Abap on Hana
 
OPEN TEXT ADMINISTRATION
OPEN TEXT ADMINISTRATIONOPEN TEXT ADMINISTRATION
OPEN TEXT ADMINISTRATION
 
ServiceNow Configuration Management Database
ServiceNow Configuration Management Database ServiceNow Configuration Management Database
ServiceNow Configuration Management Database
 
What is Business Objects
What is Business Objects What is Business Objects
What is Business Objects
 
Mainframe Architecture & Product Overview
Mainframe Architecture & Product OverviewMainframe Architecture & Product Overview
Mainframe Architecture & Product Overview
 
Enterprise-architecture and the service-oriented enterprise
Enterprise-architecture and the service-oriented enterpriseEnterprise-architecture and the service-oriented enterprise
Enterprise-architecture and the service-oriented enterprise
 
Example IDEF 0 Flow Diagrams
Example IDEF 0 Flow DiagramsExample IDEF 0 Flow Diagrams
Example IDEF 0 Flow Diagrams
 
Technical Overview of CDS View – SAP HANA Part I
Technical Overview of CDS View – SAP HANA Part ITechnical Overview of CDS View – SAP HANA Part I
Technical Overview of CDS View – SAP HANA Part I
 
SAP ODATA Overview & Guidelines
SAP ODATA Overview & GuidelinesSAP ODATA Overview & Guidelines
SAP ODATA Overview & Guidelines
 
HfS Webinar Slides - Achieving Intelligent Automation in Business Operations
HfS Webinar Slides - Achieving Intelligent Automation in Business OperationsHfS Webinar Slides - Achieving Intelligent Automation in Business Operations
HfS Webinar Slides - Achieving Intelligent Automation in Business Operations
 

Destacado

Embedded training
Embedded trainingEmbedded training
Embedded trainingsowmiya437
 
Final presentation
Final presentationFinal presentation
Final presentationDao Tran
 
Motores de búsqueda
Motores de búsquedaMotores de búsqueda
Motores de búsquedaMario Hernan
 
Fiqih kelas 7 sm 2 pelajaran 3
Fiqih kelas 7 sm 2 pelajaran 3Fiqih kelas 7 sm 2 pelajaran 3
Fiqih kelas 7 sm 2 pelajaran 3mas_mughni
 
4th Nov 15 - Creating Great Minimum Viable Products - Brian Crofts
4th Nov 15 - Creating Great Minimum Viable Products - Brian Crofts4th Nov 15 - Creating Great Minimum Viable Products - Brian Crofts
4th Nov 15 - Creating Great Minimum Viable Products - Brian CroftsCity Unrulyversity
 
The Analysis of the Impact of Capital Mobility on Bubbly Episodes Creation in...
The Analysis of the Impact of Capital Mobility on Bubbly Episodes Creation in...The Analysis of the Impact of Capital Mobility on Bubbly Episodes Creation in...
The Analysis of the Impact of Capital Mobility on Bubbly Episodes Creation in...Andrii Chlechko
 
Documentación Proyecto # 73 Premios Eureka 2011 Mención Innovatividad Técnica
Documentación Proyecto # 73 Premios Eureka 2011 Mención Innovatividad TécnicaDocumentación Proyecto # 73 Premios Eureka 2011 Mención Innovatividad Técnica
Documentación Proyecto # 73 Premios Eureka 2011 Mención Innovatividad TécnicaProyecto Red Eureka
 
Presentación Proyecto # 51 Eureka 2011 Mención Innovatividad Social
Presentación Proyecto # 51 Eureka 2011 Mención Innovatividad SocialPresentación Proyecto # 51 Eureka 2011 Mención Innovatividad Social
Presentación Proyecto # 51 Eureka 2011 Mención Innovatividad SocialProyecto Red Eureka
 
Resume - Mechanical Engineer
Resume - Mechanical EngineerResume - Mechanical Engineer
Resume - Mechanical EngineerAdeel Khan
 
House Sale Price Prediction
House Sale Price PredictionHouse Sale Price Prediction
House Sale Price Predictionsriram30691
 
Quemados Graves. Resultados comparativos Indisa
Quemados Graves. Resultados comparativos IndisaQuemados Graves. Resultados comparativos Indisa
Quemados Graves. Resultados comparativos IndisaSebastian Villegas
 
Slideshare 1os auxilios paula_vicent_paco
Slideshare 1os auxilios paula_vicent_pacoSlideshare 1os auxilios paula_vicent_paco
Slideshare 1os auxilios paula_vicent_pacoVicentMenaAsix
 

Destacado (20)

UC-83FNA0DB
UC-83FNA0DBUC-83FNA0DB
UC-83FNA0DB
 
Embedded training
Embedded trainingEmbedded training
Embedded training
 
Trabajo de didactica
Trabajo de didacticaTrabajo de didactica
Trabajo de didactica
 
CV HARIS
CV HARISCV HARIS
CV HARIS
 
Separación siamesas
Separación siamesasSeparación siamesas
Separación siamesas
 
Final presentation
Final presentationFinal presentation
Final presentation
 
Motores de búsqueda
Motores de búsquedaMotores de búsqueda
Motores de búsqueda
 
Fiqih kelas 7 sm 2 pelajaran 3
Fiqih kelas 7 sm 2 pelajaran 3Fiqih kelas 7 sm 2 pelajaran 3
Fiqih kelas 7 sm 2 pelajaran 3
 
Anticoagulación y cirugía
Anticoagulación y cirugíaAnticoagulación y cirugía
Anticoagulación y cirugía
 
4th Nov 15 - Creating Great Minimum Viable Products - Brian Crofts
4th Nov 15 - Creating Great Minimum Viable Products - Brian Crofts4th Nov 15 - Creating Great Minimum Viable Products - Brian Crofts
4th Nov 15 - Creating Great Minimum Viable Products - Brian Crofts
 
The Analysis of the Impact of Capital Mobility on Bubbly Episodes Creation in...
The Analysis of the Impact of Capital Mobility on Bubbly Episodes Creation in...The Analysis of the Impact of Capital Mobility on Bubbly Episodes Creation in...
The Analysis of the Impact of Capital Mobility on Bubbly Episodes Creation in...
 
Documentación Proyecto # 73 Premios Eureka 2011 Mención Innovatividad Técnica
Documentación Proyecto # 73 Premios Eureka 2011 Mención Innovatividad TécnicaDocumentación Proyecto # 73 Premios Eureka 2011 Mención Innovatividad Técnica
Documentación Proyecto # 73 Premios Eureka 2011 Mención Innovatividad Técnica
 
Presentación Proyecto # 51 Eureka 2011 Mención Innovatividad Social
Presentación Proyecto # 51 Eureka 2011 Mención Innovatividad SocialPresentación Proyecto # 51 Eureka 2011 Mención Innovatividad Social
Presentación Proyecto # 51 Eureka 2011 Mención Innovatividad Social
 
Desforramiento de extremidad inferior
Desforramiento de extremidad inferior Desforramiento de extremidad inferior
Desforramiento de extremidad inferior
 
Resume - Mechanical Engineer
Resume - Mechanical EngineerResume - Mechanical Engineer
Resume - Mechanical Engineer
 
House Sale Price Prediction
House Sale Price PredictionHouse Sale Price Prediction
House Sale Price Prediction
 
Plasma Technology
Plasma TechnologyPlasma Technology
Plasma Technology
 
Digital plan for Men's Biore
Digital plan for Men's BioreDigital plan for Men's Biore
Digital plan for Men's Biore
 
Quemados Graves. Resultados comparativos Indisa
Quemados Graves. Resultados comparativos IndisaQuemados Graves. Resultados comparativos Indisa
Quemados Graves. Resultados comparativos Indisa
 
Slideshare 1os auxilios paula_vicent_paco
Slideshare 1os auxilios paula_vicent_pacoSlideshare 1os auxilios paula_vicent_paco
Slideshare 1os auxilios paula_vicent_paco
 

Similar a Scala Spark Agriculture Data Analytics

Real-time monitoring system for weather and air pollutant measurement with HT...
Real-time monitoring system for weather and air pollutant measurement with HT...Real-time monitoring system for weather and air pollutant measurement with HT...
Real-time monitoring system for weather and air pollutant measurement with HT...journalBEEI
 
4 realtime wether station for monitoring and control of agricultre
4 realtime wether station for monitoring and control of agricultre4 realtime wether station for monitoring and control of agricultre
4 realtime wether station for monitoring and control of agricultreBhushan Deore
 
OpenWeatherMap on the Open GIS Conference 2012
OpenWeatherMap on the Open GIS Conference 2012OpenWeatherMap on the Open GIS Conference 2012
OpenWeatherMap on the Open GIS Conference 2012Dennsy
 
Wireless Sensor Network for AgriTech Applications
Wireless Sensor Network for AgriTech Applications Wireless Sensor Network for AgriTech Applications
Wireless Sensor Network for AgriTech Applications IoTForum | TiE Bangalore
 
23 2 may17 28apr 16137 (6575 new)(edit)
23 2 may17 28apr 16137 (6575 new)(edit)23 2 may17 28apr 16137 (6575 new)(edit)
23 2 may17 28apr 16137 (6575 new)(edit)IAESIJEECS
 
Bulk Loading Into HBase With MapReduce
Bulk Loading Into HBase With MapReduceBulk Loading Into HBase With MapReduce
Bulk Loading Into HBase With MapReduceEdureka!
 
Intelligent Weather Service
Intelligent Weather Service Intelligent Weather Service
Intelligent Weather Service Uday Sharma
 
Building a fully-automated Fast Data Platform
Building a fully-automated Fast Data PlatformBuilding a fully-automated Fast Data Platform
Building a fully-automated Fast Data PlatformComsysto Reply GmbH
 
Building a fully-automated Fast Data Platform
Building a fully-automated Fast Data PlatformBuilding a fully-automated Fast Data Platform
Building a fully-automated Fast Data PlatformManuel Sehlinger
 
Realtime wether station for monitoring and control of agricultre
Realtime wether station for monitoring and control of agricultreRealtime wether station for monitoring and control of agricultre
Realtime wether station for monitoring and control of agricultreBhushan Deore
 
IRJET- Smart Management of Crop Cultivation using IoT and Machine Learning
IRJET- Smart Management of Crop Cultivation using IoT and Machine LearningIRJET- Smart Management of Crop Cultivation using IoT and Machine Learning
IRJET- Smart Management of Crop Cultivation using IoT and Machine LearningIRJET Journal
 
Distributed Cache With MapReduce
Distributed Cache With MapReduceDistributed Cache With MapReduce
Distributed Cache With MapReduceEdureka!
 

Similar a Scala Spark Agriculture Data Analytics (20)

Dynamic Integrations of Crop Data and Corresponding Meteorological Data based...
Dynamic Integrations of Crop Data and Corresponding Meteorological Data based...Dynamic Integrations of Crop Data and Corresponding Meteorological Data based...
Dynamic Integrations of Crop Data and Corresponding Meteorological Data based...
 
Dynamic integrations of crop data and corresponding meteorological data based...
Dynamic integrations of crop data and corresponding meteorological data based...Dynamic integrations of crop data and corresponding meteorological data based...
Dynamic integrations of crop data and corresponding meteorological data based...
 
Real-time monitoring system for weather and air pollutant measurement with HT...
Real-time monitoring system for weather and air pollutant measurement with HT...Real-time monitoring system for weather and air pollutant measurement with HT...
Real-time monitoring system for weather and air pollutant measurement with HT...
 
4 realtime wether station for monitoring and control of agricultre
4 realtime wether station for monitoring and control of agricultre4 realtime wether station for monitoring and control of agricultre
4 realtime wether station for monitoring and control of agricultre
 
OpenWeatherMap on the Open GIS Conference 2012
OpenWeatherMap on the Open GIS Conference 2012OpenWeatherMap on the Open GIS Conference 2012
OpenWeatherMap on the Open GIS Conference 2012
 
Wireless Sensor Network for AgriTech Applications
Wireless Sensor Network for AgriTech Applications Wireless Sensor Network for AgriTech Applications
Wireless Sensor Network for AgriTech Applications
 
finalDraftPoster
finalDraftPosterfinalDraftPoster
finalDraftPoster
 
FinalReport
FinalReportFinalReport
FinalReport
 
23 2 may17 28apr 16137 (6575 new)(edit)
23 2 may17 28apr 16137 (6575 new)(edit)23 2 may17 28apr 16137 (6575 new)(edit)
23 2 may17 28apr 16137 (6575 new)(edit)
 
Bulk Loading Into HBase With MapReduce
Bulk Loading Into HBase With MapReduceBulk Loading Into HBase With MapReduce
Bulk Loading Into HBase With MapReduce
 
Ashwin_Thesis
Ashwin_ThesisAshwin_Thesis
Ashwin_Thesis
 
Process Model
Process ModelProcess Model
Process Model
 
Intelligent Weather Service
Intelligent Weather Service Intelligent Weather Service
Intelligent Weather Service
 
Hh3413401342
Hh3413401342Hh3413401342
Hh3413401342
 
Building a fully-automated Fast Data Platform
Building a fully-automated Fast Data PlatformBuilding a fully-automated Fast Data Platform
Building a fully-automated Fast Data Platform
 
Building a fully-automated Fast Data Platform
Building a fully-automated Fast Data PlatformBuilding a fully-automated Fast Data Platform
Building a fully-automated Fast Data Platform
 
Realtime wether station for monitoring and control of agricultre
Realtime wether station for monitoring and control of agricultreRealtime wether station for monitoring and control of agricultre
Realtime wether station for monitoring and control of agricultre
 
UDP Report
UDP ReportUDP Report
UDP Report
 
IRJET- Smart Management of Crop Cultivation using IoT and Machine Learning
IRJET- Smart Management of Crop Cultivation using IoT and Machine LearningIRJET- Smart Management of Crop Cultivation using IoT and Machine Learning
IRJET- Smart Management of Crop Cultivation using IoT and Machine Learning
 
Distributed Cache With MapReduce
Distributed Cache With MapReduceDistributed Cache With MapReduce
Distributed Cache With MapReduce
 

Scala Spark Agriculture Data Analytics

  • 1. PRECISION AGRICULTURE SUPPORT USING SCALA/SPARK Project Report SRIRAM RV SPRING SEMESTER ADVISOR: PROFESSOR BRAD RUBIN
  • 2. Table of Contents
    1.0 PURPOSE OF PROJECT (p. 4)
    2.0 PROJECT DESCRIPTION (p. 4)
    2.0 Why Agriculture Data (p. 5)
    3.0 DATASET (p. 5)
    3.1 Data Source (p. 5)
    3.2 Details about Dataset (p. 5)
    3.3 Sample Data (p. 5)
        Weather data (p. 5)
        Moisture Data (p. 5)
        Image Data (p. 6)
    3.4 Schema (p. 6)
        Weather data (p. 6)
        Moisture Data (p. 6)
    3.5 Data Description (p. 7)
        Weather Data (p. 7)
        Moisture Data (p. 7)
    4.0 PROJECT IMPLEMENTATION (p. 8)
    4.1 Data Ingestion using Kafka (p. 8)
    4.2 Kafka producer (p. 8)
    4.4 Kafka Broker (p. 9)
    4.5 Kafka Consumer (p. 10)
    5.0 ADDITIONAL TOOLS (p. 10)
    5.1 Maven (p. 10)
    5.2 Scala Build tool (p. 11)
    5.3 Git (p. 11)
    6.0 OUTPUT INTERPRETATION (p. 12)
    7.0 IMPROVING THE KAFKA ARCHITECTURE (p. 12)
    7.1. Making kafka architecture more robust (p. 12)
    7.2. Having dedicated Kafka Broker to improve performance (p. 13)
  • 3. Table of Contents (continued)
    8.0 FUTURE RESEARCH (p. 13)
    9.0. CONCLUSION (p. 13)
    BIBLIOGRAPHY (p. 14)
  • 4. 1.0 PURPOSE OF PROJECT
    Over the last few years, big data tools have focused on both structured and unstructured data. However, image processing is one area that needs more attention, and it is also an area of interest of mine. This project gives me an opportunity to experiment with streaming images and weather data captured in the UST greenhouse, and to get a feel for image processing with Scala/Spark on Hadoop more generally. I will gain experience in technologies such as Scala, Spark, Spark Streaming, and image processing in the domain of food technology, skills that I cannot otherwise obtain in the GPS curriculum.
    2.0 PROJECT DESCRIPTION
    The purpose of the project is to stream real-time weather data captured by direct sensors, together with RGB images captured by drones, and to perform image processing and weather data analytics leveraging the Scala/Spark ecosystem on a Hadoop computing cluster. Since image processing and streaming with Spark are new technologies to GPS, part of the project will focus on experimenting with different tools to find a more reliable way of storing images and streamed data in HDFS.
    The UST greenhouse will be growing plants for the Precision Agriculture project run by the UST School of Engineering. The greenhouse has a local weather station that will broadcast weather data such as temperature, humidity, light intensity, barometric pressure, position (latitude/longitude), wind speed and direction, and rainfall. The broadcast will be continuous at 10-second intervals (in CSV format). The equipment in the greenhouse is a prototype for field use, which is useful both for analyzing plant health and for creating a model for each of the six plant species that will be grown. In addition, high-resolution images of the plants will be taken in the visible and near-IR regions of the light spectrum every couple of days.
  • 5. 2.0 Why Agriculture Data
    Agricultural data gives me an opportunity to experiment with streaming images and weather data captured in the UST greenhouse. The data captured in the greenhouse is highly detailed and gives me experience working with data from food technology.
    3.0 DATASET
    3.1 Data Source:
    The data source for this project is the live stream of weather and moisture data captured by sensors through an Arduino chip and streamed using a Kafka producer.
    3.2 Details about Dataset:
    • The sensor data were captured every second.
    • Total number of days of weather data stored in HDFS: 90 days.
    • Total number of days of moisture data stored in HDFS: 85 days.
    • Total number of days of image data stored: 90 days.
    3.3 Sample Data
    Weather data
    Fig 1: Sensor Weather data from Arduino
    Moisture Data
  • 6. Fig 2: Sensor data from Arduino
    Image Data
    Image data was captured every other day over a period of 90 days.
    Fig 3: Images from the greenhouse
    3.4 Schema
    Weather data
    Date | Time | Wind direction | Wind Speed | Humidity | Temperature | Rain | Pressure | Battery | Light Level
    Table 1: Weather Data Schema
    Moisture Data
    Date | Time | Moist 2 | Moist 6 | Moist 8 | Moist 11 | Moist 10 | Moist 1 | Moist 9 | Moist 7 | Moist 5 | Temp | Par
    Table 2: Moisture Data Schema
  • 7. 3.5 Data Description:
    Weather Data:
    • Date & time: Timestamp of the recording
    • Wind Direction: Direction of wind
    • Wind Speed: Speed of wind
    • Wind Gust: Gust of wind
    • Humidity: Percentage of water in air
    • Temperature: Temperature
    • Rain: Rain percentage
    • Pressure: Air pressure
    • Battery: Battery level of the Arduino
    • Light: Light exposure
    Moisture Data
    • Moist 2: Moisture of plot 2
    • Moist 6: Moisture of plot 6
    • Moist 8: Moisture of plot 8
    • Moist 11: Moisture of plot 11
    • Moist 10: Moisture of plot 10
    • Moist 1: Moisture of plot 1
    • Moist 9: Moisture of plot 9
    • Moist 7: Moisture of plot 7
    • Moist 5: Moisture of plot 5
    • Temp: Soil temperature
    • PAR: Moisture metrics
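The column order of the weather schema (Table 1) can be turned into a typed record. The sketch below shows one possible way to parse a single CSV line of weather data in Scala. The names `WeatherReading` and `WeatherCsv`, the choice of field types, and the sample line in the usage note are assumptions for illustration only, since the report does not show the raw value formats.

```scala
import scala.util.Try

// Hypothetical record type; fields follow the column order of Table 1.
final case class WeatherReading(
  date: String,
  time: String,
  windDirection: String, // kept as text: the report does not say whether it is degrees or a compass point
  windSpeed: Double,
  humidity: Double,
  temperature: Double,
  rain: Double,
  pressure: Double,
  battery: Double,
  lightLevel: Double
)

object WeatherCsv {
  // Number of columns listed in Table 1.
  val ColumnCount = 10

  // Returns None when the line has the wrong number of columns
  // or when one of the numeric columns cannot be read as a Double.
  def parse(line: String): Option[WeatherReading] = {
    val cols = line.split(",", -1).map(_.trim)
    if (cols.length != ColumnCount) None
    else Try {
      val n = cols.drop(3).map(_.toDouble) // the seven numeric columns
      WeatherReading(cols(0), cols(1), cols(2), n(0), n(1), n(2), n(3), n(4), n(5), n(6))
    }.toOption
  }
}
```

For example, `WeatherCsv.parse("2016-04-01,10:15:20,NE,3.2,41.0,22.5,0.0,1013.2,4.1,512.0")` (a made-up line) yields a reading, while a line with missing columns yields `None`, so malformed sensor output can be skipped rather than crashing the stream.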
  • 8. 4.0 PROJECT IMPLEMENTATION
    4.1 Data Ingestion using Kafka
    Kafka is a distributed messaging system that transmits the moisture and weather data from the Arduino chip to HDFS. The Kafka architecture depends mainly on three components: producer, broker and consumer. Zookeeper is used to monitor the frequency of data flowing in and out of the Kafka broker. The diagram below shows the architecture of the precision agriculture project. The Kafka producer streams the data produced in the greenhouse and sends it to the Kafka broker. The Kafka producer gets the addresses of the brokers through Zookeeper.
    Fig 4: Kafka Architectural Diagram
    4.2 Kafka producer
    The Kafka producer is the sender side of the Kafka distributed messaging system. The producer assigns messages to their respective topics and sends them to the brokers based on topic. The producer also gets the address of the Kafka brokers, which is attached to the header of the packet when sending the data. The weather data, moisture data and image data are differentiated using the topics "weather-data", "moisture-data" and "image-data". Below is the snippet to set up the Kafka producer with the key and value serializers set to string serialization. The bootstrap server setting is the list of Kafka broker addresses (host:port) that the producer uses for its initial connection.
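The producer setup described in section 4.2 can be sketched as a plain `java.util.Properties` object. This is a minimal illustration, not the project's actual snippet: the object name `ProducerSettings` and the broker address in the usage note are assumptions. The property keys and the `StringSerializer` class name are standard Kafka client settings.

```scala
import java.util.Properties

object ProducerSettings {
  // Standard class name of Kafka's string serializer.
  val StringSerializer = "org.apache.kafka.common.serialization.StringSerializer"

  // bootstrapServers is a comma-separated host:port list, e.g. "broker1:9092,broker2:9092".
  def build(bootstrapServers: String): Properties = {
    val props = new Properties()
    props.setProperty("bootstrap.servers", bootstrapServers)
    props.setProperty("key.serializer", StringSerializer)   // keys sent as strings
    props.setProperty("value.serializer", StringSerializer) // values sent as strings
    props
  }
}
```

With the kafka-clients library on the classpath, the result of `ProducerSettings.build("localhost:9092")` (a placeholder address) could be passed to `new KafkaProducer[String, String](props)`. Keeping the settings in a separate function makes it easy to point the same producer code at a different broker list.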
  • 9. Fig 5: Configuring the Kafka producer
    Below is the snippet used to create the message object, which contains the topic and the message to be sent to the Kafka broker. The send function of the Kafka producer binds the Kafka configuration instance to the message and sends it to the broker.
    Fig 6: Sending the message to Kafka broker
    4.4 Kafka Broker
    The Kafka broker is the server side of the Kafka distributed messaging system, capable of handling hundreds of read and write operations per second. It can be expanded elastically without downtime. Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capacity of a single machine. The Kafka broker can be monitored through Zookeeper on port 2181. By default, the Kafka broker comes with a retention period of 168 hours (7 days).
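The message object described for Fig 6 pairs a topic with a message value before it is handed to the producer's send function. The following dependency-free sketch mimics that flow. `Message` stands in for Kafka's `ProducerRecord`, and `InMemoryBroker` is a hypothetical substitute for a real broker that simply keeps one log per topic; neither is part of the project code. Only the three topic names come from the report.

```scala
import scala.collection.mutable

object GreenhouseMessages {
  // Topic names as given in section 4.2.
  val Topics: Set[String] = Set("weather-data", "moisture-data", "image-data")

  // Stand-in for a Kafka ProducerRecord: a topic plus the message value.
  final case class Message(topic: String, value: String)

  // Builds a message, rejecting topics the project does not use.
  def message(topic: String, value: String): Message = {
    require(Topics.contains(topic), s"unknown topic: $topic")
    Message(topic, value)
  }

  // Hypothetical in-memory broker: appends each value to the log of its topic.
  final class InMemoryBroker {
    private val logs = mutable.Map.empty[String, Vector[String]]

    def send(m: Message): Unit =
      logs.update(m.topic, logs.getOrElse(m.topic, Vector.empty) :+ m.value)

    def log(topic: String): Vector[String] =
      logs.getOrElse(topic, Vector.empty)
  }
}
```

With the real client library, the equivalent of `message(...)` would be `new ProducerRecord[String, String](topic, value)`, and `send` would be the `send` method of `KafkaProducer`. The sketch only illustrates how topics keep the weather, moisture and image streams separate.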
  • 10. Fig 7: Monitoring the messages using Zookeeper
    4.5 Kafka Consumer
    The Kafka consumer is the receiver side of the Kafka distributed messaging system; it fetches data topic by topic from the brokers. The consumer runs in the cluster and also stores the data in HDFS for further processing. Below is the sample consumer code, which connects to the PA cluster. The topic set contains the list of topics to fetch from the broker.
    Fig 8: Configuring Kafka Consumer
    5.0 ADDITIONAL TOOLS
    5.1 Maven
    Maven was used for dependency management, bringing all the jars from the server into the local repository. This dependency management made it possible to develop the code in a Windows environment. Maven was also used to specify the versions of Spark and Kafka; all the jar files for those versions were stored in the local repository.
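The consumer configuration described in section 4.5 combines connection parameters with a topic set. A minimal sketch of both is shown here. The object name `ConsumerSettings`, the group id in the usage note, and the choice to subscribe only to the weather and moisture topics are assumptions; the property keys and the `StringDeserializer` class name are standard Kafka consumer settings.

```scala
object ConsumerSettings {
  // Standard class name of Kafka's string deserializer.
  val StringDeserializer = "org.apache.kafka.common.serialization.StringDeserializer"

  // Assumed topic set: the two streams that the report stores in HDFS.
  val topics: Set[String] = Set("weather-data", "moisture-data")

  // Connection parameters for a consumer; groupId identifies the consumer group.
  def params(bootstrapServers: String, groupId: String): Map[String, String] = Map(
    "bootstrap.servers"  -> bootstrapServers,
    "group.id"           -> groupId,
    "key.deserializer"   -> StringDeserializer,
    "value.deserializer" -> StringDeserializer
  )
}
```

A call such as `ConsumerSettings.params("localhost:9092", "pa-consumer")` (placeholder values) produces the parameter map, and `ConsumerSettings.topics` is the set a consumer would subscribe to. Adding `"image-data"` to the set would be enough to fetch the image topic as well.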
  • 11. Fig 9: Dependency Injection
    5.2 Scala Build tool
    The Scala Build Tool (SBT) was used to create the package and jar files, which were transferred to the cluster and VM using WinSCP.
    Fig 10: SBT build
    5.3 Git
    Git is a distributed revision control and source code management (SCM) system. It was used for the precision agriculture project to store the code in an online repository and share it with the team. Below is the Git link for the precision agriculture project.
    https://github.com/sri303030/Data-Ingestion-using-Kafka
  • 12. 6.0 OUTPUT INTERPRETATION
    The streamed data is sent by the consumer to HDFS and stored in two different folders to distinguish weather data from moisture data. Below is the output from the weather data folder.
    Fig 11: Weather data folder
    Below is the output from the moisture data folder.
    Fig 12: Moisture Data Folder
    7.0 IMPROVING THE KAFKA ARCHITECTURE
    The Kafka architecture can be improved in two ways:
    1. Making the Kafka architecture more robust.
    2. Having a dedicated Kafka broker to improve performance.
    7.1. Making kafka architecture more robust
    In the precision agriculture project, both the broker and the consumer ran on the same system, since the requirement of data ingestion was to store data in HDFS. To make the architecture more robust, the consumer should run on a remote system or cluster that has access to the Kafka broker. That way, in case of a failure in the Kafka broker, the data can be retrieved from the consumer.
  • 13. 7.2. Having dedicated Kafka Broker to improve performance
    The Kafka broker runs as part of the cluster in the precision agriculture project. To avoid noise in the cluster, the broker should be a dedicated system or set of systems. This also removes the overhead that the Kafka broker places on the Hadoop environment and speeds up all the processes.
    8.0 FUTURE RESEARCH
    1. Implement the bridge between HDFS and Spark SQL, and store tables as persistent data in Hive.
    2. Implement real-time machine learning using Spark MLlib.
    3. Connect the live data to a reporting tool to analyze it and create useful reports.
    9.0. CONCLUSION
    Kafka is a rapidly growing distributed messaging system with various applications in the field of engineering. With the help of the precision agriculture project, agricultural data from the greenhouse was captured and streamed to the Hadoop environment using Kafka and Spark. This project also gave me exposure to handling different big data problems in real-time situations and helped me understand the Kafka architecture.
  • 14. BIBLIOGRAPHY
    • http://kafka.apache.org/
    • Jain, R. (2014). Real time Analytics with Apache Kafka and Apache Spark.
    • Wang, H., Can, D., Kazemzadeh, A., Bar, F., & Narayanan, S. (2012). A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle. Paper presented at the Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju, Republic of Korea. http://www.aclweb.org/anthology/P12-3020