Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Novotny Jan Scherbaum

•

3 recomendaciones•2,349 vistas

Data lineage tracking is one of the significant problems that financial institutions face when using modern big data tools. This presentation describes Spline – a data lineage tracking and visualization tool for Apache Spark. Spline captures and stores lineage information from internal Spark execution plans and visualizes it in a user-friendly manner.

Datos y análisis

Jan Scherbaum & Marek Novotny
Barclays Africa Group Limited
SPLINE:
APACHE SPARK LINEAGE, NOT ONLY
FOR THE BANKING INDUSTRY
#EUent3

7.34
Stuff happens
under the hood…
#EUent3 2

Data type
changes
Adding new dataComplex calcs
7.34
#EUent3 3

Overview
• Barclays Africa
• Spline
• Live demo
• Future work
• The big picture
• Questions
4#EUent3

• Pan-African financial services provider
– Providing services to South Africa (ABSA) and 11 other
African countries (Barclays)
• Subject to strict regulatory compliance
– Basel Committee on Banking Supervision (BCBS)
• Accuracy
• Comprehensiveness
• Clarity
• Usefulness
5#EUent3
Barclays Africa Group Limited

• SPark LINEage
• Open source project
• Goals
– Satisfy initial interpretations of regulatory
requirements, specifically on “Clarity”
• Data lineage from Spark’s execution plans
• Visualize in an “explorable” user-friendly format
6#EUent3
Spline

Spline – How it works
7#EUent3
Spark job
Spark Job
Spark library
Spark Session
Action
Transformations
Generate execution plans

Spline – How it works
8#EUent3
Spark job
1 line initialization
Use SQLContext listeners
Generate execution plans
Spark Job & Spline
Spark library
Spark Session
Action
Spline
Transformations

Spline UI
Spline – How it works
9#EUent3
Spark job
1 line initialization
Use SQLContext listeners
Generate execution plans
Spark Job & Spline
Spark library
Spark Session
Action
Spline
Transformations
…

Demo use case
• Find the countries with the highest annual beer
consumption per person
– Correlation with GDP??
10#EUent3

Data
11#EUent3
Country 2011 2010 2009
Czech Republic 15,583,000 15,549,000 16,190,000
Ireland 4,721,000 4,814,000 4,832,000
Country Metric 2011 2010 2009
Czech Republic GDP $21,717 $19,764 $19,698
Czech Republic Population 10,496,088 10,474,410 10,443,936
Ireland GDP $52,567 $48,538 $51,983
Ireland Population 4,576,794 4,560,155 4,535,375
Beer consumption per country
Development indicators from the world bank

Analysis
• Marek’s job
– Data prep
– Analyze the correlation between beer consumption
and GDP growth
• Jan’s beer job
– Calculate the consumption of beer per country per
capita
12#EUent3

Dependencies
<dependency>
<groupId>za.co.absa</groupId>
<artifactId>spline-core</artifactId>
<version>${spline.version}</version>
</dependency>
<dependency>
<groupId>za.co.absa</groupId>
<artifactId>spline-persistence-mongo</artifactId>
<version>${spline.version}</version>
</dependency>
13#EUent3

Initialization
14#EUent3
// Initializing library to hook up to Apache Spark
import za.co.absa.spline.core.SparkLineageInitializer._
spark.enableLineageTracking()

Next steps
• Enterprise features
– Authentication (Kerberos, SSO)
– Authorization
– User management
• Interoperability with other tools
– Cloudera Manager, Informatica, Apache Atlas
• Support other Spark data sources and actions
– Streaming, ML
16#EUent3

The bigger picture
• Develop open source conformance & ingestion
engine on Spark
– BCBS compliant (lineage, dataflow controls, error tracking)
– Transfer & transform data from different source systems
• To strongly typed datasets
• On Hadoop
• Conforming to enterprise level data dictionaries & data quality
• In development – stay tuned J
17#EUent3

We’re open source!
• Contributions are most welcome
• Released versions mirrored on
– https://github.com/AbsaOSS/spline
• Wiki and docs on
– https://absaoss.github.io/spline/
18#EUent3

Questions
19#EUent3
• Now is a good time
• Or feel free to contact us
– Jan Scherbaum
• jan.scherbaum@barclays.com
– Marek Novotny
• marek.x.novotny@barclays.com
– Oleksandr Vayda
• oleksandr.vayda@barclays.com
• Acknowledgements:
– Dennis Chu, Aaisha Bibi Osman, Adam Smyczek, Andrew Baker

Más contenido relacionado

La actualidad más candente

It’s very easy to be distracted by the latest and greatest approaches with technology, but sometimes there’s a reason old approaches stand the test of time. Star Schemas & Kimball is one of those things that isn’t going anywhere, but as we move towards the “Data Lakehouse” paradigm – how appropriate is this modelling technique, and how can we harness the Delta Engine & Spark 3.0 to maximise it’s performance?

Achieving Lakehouse Models with Spark 3.0

Databricks

DW Migration Webinar-March 2022.pptx

Databricks

Webinar Data Mesh - Part 3

Jeffrey T. Pollock

Data Mesh using Microsoft Fabric

Nathan Bijnens

Azure Data Factory ETL Patterns in the Cloud

Mark Kromer

Modern Data architecture Design

Kujambu Murugesan

Cloud-native Semantic Layer on Data Lake

Databricks

Introducing Databricks Delta

Databricks

Databricks' founders caused a seismic shift in data analysis community when they created Apache Spark which has become a cornerstone of Big Data processing pipelines and tools in large and small companies all around the world. Now they've built a revolutionary, comprehensive and easy-to-use platform around Apache Spark and their other inventions, such as MLFlow and Koalas frameworks and most importantly the Data Lakehouse: a concept of fusing data warehouse and data lake architectures into a single versatile and fast platform. Technical foundation for Databricks Data Lakehouse is Delta Lake. More than 7000 organizations today rely on Databricks to enable massive-scale data engineering, collaborative data science, full-lifecycle machine learning and business analytics. Come to the talk and see the demo to find out why.

[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic

DataScienceConferenc1

Applying DevOps to Databricks can be a daunting task. In this talk this will be broken down into bite size chunks. Common DevOps subject areas will be covered, including CI/CD (Continuous Integration/Continuous Deployment), IAC (Infrastructure as Code) and Build Agents. We will explore how to apply DevOps to Databricks (in Azure), primarily using Azure DevOps tooling. As a lot of Spark/Databricks users are Python users, will will focus on the Databricks Rest API (using Python) to perform our tasks.

DevOps for Databricks

Databricks

Organizations with on-premises Hadoop infrastructure are bogged down by system complexity, unscalable infrastructure, and the increasing burden on DevOps to manage legacy architectures. Costs and resource utilization continue to go up while innovation has flatlined. In this session, you will learn why, now more than ever, enterprises are looking for cloud alternatives to Hadoop and are migrating off of the architecture in large numbers. You will also learn how elastic compute models’ benefits help one customer scale their analytics and AI workloads and best practices from their experience on a successful migration of their data and workloads to the cloud.

Modernizing to a Cloud Data Architecture

Databricks

Time to Talk about Data Mesh

LibbySchulze

Databricks Platform.pptx

Alex Ivy

Apache Kafka in conjunction with Apache Spark became the de facto standard for processing and analyzing data. Both frameworks are open, flexible, and scalable. Unfortunately, the latter makes operations a challenge for many teams. Ideally, teams can use serverless SaaS offerings to focus on business logic. However, hybrid and multi-cloud scenarios require a cloud-native platform that provides automated and elastic tooling to reduce the operations burden. This session explores different architectures to build serverless Apache Kafka and Apache Spark multi-cloud architectures across regions and continents. We start from the analytics perspective of a data lake and explore its relation to a fully integrated data streaming layer with Kafka to build a modern data Data Lakehouse. Real-world use cases show the joint value and explore the benefit of the "delta lake" integration.

Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture

Kai Wähner

Apache Spark is one of the most popular big data projects, offering greatly improved performance over traditional MapReduce models. Much of Apache Spark’s power comes from lazy evaluation along with intelligent pipelining, which can make debugging more challenging. This talk will examine how to debug Apache Spark applications, the different options for logging in Spark’s variety of supported languages, as well as some common errors and how to detect them. Spark’s own internal logging can often be quite verbose, and this talk will examine how to effectively search logs from Apache Spark to spot common problems. In addition to the internal logging, this talk will look at options for logging from within our program itself. Spark’s accumulators have gotten a bad rap because of how they interact in the event of cache misses or partial recomputes, but this talk will look at how to effectively use Spark’s current accumulators for debugging as well as a look to future for data property type accumulators which may be coming to Spark in future version. In addition to reading logs, and instrumenting our program with accumulators, Spark’s UI can be of great help for quickly detecting certain types of problems.

Debugging PySpark: Spark Summit East talk by Holden Karau

Spark Summit

Informatica MDM Presentation

MaxHung

The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse. Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today. Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow. This is an educational event. Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.

Data Lakehouse Symposium | Day 4

Databricks

This session covers how to work with PySpark interface to develop Spark applications. From loading, ingesting, and applying transformation on the data. The session covers how to work with different data sources of data, apply transformation, python best practices in developing Spark Apps. The demo covers integrating Apache Spark apps, In memory processing capabilities, working with notebooks, and integrating analytics tools into Spark Applications.

Programming in Spark using PySpark

Mostafa

Delta Lake, an open-source innovations which brings new capabilities for transactions, version control and indexing your data lakes. We uncover how Delta Lake benefits and why it matters to you. Through this session, we showcase some of its benefits and how they can improve your modern data engineering pipelines. Delta lake provides snapshot isolation which helps concurrent read/write operations and enables efficient insert, update, deletes, and rollback capabilities. It allows background file optimization through compaction and z-order partitioning achieving better performance improvements. In this presentation, we will learn the Delta Lake benefits and how it solves common data lake challenges, and most importantly new Delta Time Travel capability.

Databricks Delta Lake and Its Benefits

Databricks

Azure Synapse Analytics is Azure SQL Data Warehouse evolved: a limitless analytics service, that brings together enterprise data warehousing and Big Data analytics into a single service. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs. This is a huge deck with lots of screenshots so you can see exactly how it works.

Azure Synapse Analytics Overview (r2)

James Serra

La actualidad más candente (20)

Achieving Lakehouse Models with Spark 3.0

DW Migration Webinar-March 2022.pptx

Webinar Data Mesh - Part 3

Data Mesh using Microsoft Fabric

Azure Data Factory ETL Patterns in the Cloud

Modern Data architecture Design

Cloud-native Semantic Layer on Data Lake

Introducing Databricks Delta

[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic

DevOps for Databricks

Modernizing to a Cloud Data Architecture

Time to Talk about Data Mesh

Databricks Platform.pptx

Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture

Debugging PySpark: Spark Summit East talk by Holden Karau

Informatica MDM Presentation

Data Lakehouse Symposium | Day 4

Programming in Spark using PySpark

Databricks Delta Lake and Its Benefits

Azure Synapse Analytics Overview (r2)

Similar a Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Novotny Jan Scherbaum

>Sarah: My Spark SQL query failed. How can I fix it? >Jeeves: Your Spark query driver went out of memory. >Jeeves: You can set spark.driver.memory to 2.2GB and rerun the query to complete it successfully. Who is Jeeves? An experienced Spark developer? A seasoned administrator? No, Jeeves is a chatbot created to simplify data operations management for enterprise Spark clusters. This chatbot is powered by advanced AI algorithms and an intuitive conversational interface that together provide answers to get users in and out of performance problems quickly. Instead of just being stuck to screens displaying performance logs and metrics, users can now have more refreshing experience; and consume performance insights via a two-way conversation with their own personal Spark expert. This talk will give an overview of the chatbot, its architecture, and how it fits in a complex Spark environment. The chatbot connects to a large number of sources to get the data to power its AI algorithms. It can detect anomalies in performance and push key insights via alerts to users when they need them the most. The chatbot can also be told to take actions like creating tickets and making configuration changes. You will learn how to build chatbots that tackle your complex data operations challenges with AI algorithms and automation, keeping a cool head at all times.

An AI-Powered Chatbot to Simplify Apache Spark Performance Management

Databricks

Kafka Summit NYC 2017 - The Real-time Event Driven Bank: A Kafka Story

confluent

SANOG 38: RPKI Update

APNIC

Splunk at Scotiabank

Splunk

SplunkSummit 2015 - Update on Splunk Enterprise 6.3 & Hunk 6.3

Splunk

Exascale Computing Project (ECP) Update

inside-BigData.com

Я хочу рассказать о нашем опыте создания высоконагруженного REST API, который имеет следующие параметры: * Интуитивно понятный * Масштабируемый * Отказоустойчивой * Защита от атаки * Высокая пропускная способность Также я расскажу о том: * как мы делали нагрузочное тестирование * как мы пережили множество DDoS-атак * как мы сделали Blue-Green deploy Использовался следующий список технологий: .NET Core 2, ASP.NET Core 2, Consul, Fabio, Orleans.NET, Kafka, RabbitMQ, Serilog, xMetrics, Grafana, ELK

.NET Fest 2019. Андрей Винда. Создание REST API с поддержкой высокой нагрузки

NETFest

Introducing Blockchain With Java.pdf

LuisFeliciano23

ION Krakow - BCOP Update

Deploy360 Programme (Internet Society)

Introduce to PredictionIO

Wei-Yuan Chang

16Gb Fibre Channel Deployment Guide

TheFibreChannel

Clean Slate Design - O Que é?

Antonio Marcos Alberti

Portland Splunk User Group May 2020

Amanda Richardson

bmp.pptx

ReddyR12

Customer Presentation

Splunk

Splunk live! Customer Presentation – Wellsfargo

Splunk

Sql Server 2008 Portfolio

Eugene Kilpatrick

IET programmable components

DTGeek

[Tel Aviv merge world tour] Introducing Perforce Insights

Perforce

Life occurs in real-time, and not surprisingly, more solutions are being built using streaming technologies. Event-based architectures are becoming the norm, and customers are expecting immediate access to their data. This new world offers many exciting opportunities, but also some new challenges. What do you do when your streaming data is not complete? What if it relies on another data source? Does the dependent data exist yet, and does it come from a 3rd party? How do we merge a complete picture of data when data is sourcing from multiple places at the same time? A new norm in the world of distributed services. Join us as we dive deep into the technical details around these scenarios and more. Expect to learn about stream-stream joins, enriching stream data using local or remote data, and ways to anticipate and correct errors within the stream. Leave with a better understanding of managing data dependencies within a Spark Structured Streaming application.

Life is but a Stream

Databricks

Similar a Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Novotny Jan Scherbaum (20)

An AI-Powered Chatbot to Simplify Apache Spark Performance Management

Kafka Summit NYC 2017 - The Real-time Event Driven Bank: A Kafka Story

SANOG 38: RPKI Update

Splunk at Scotiabank

SplunkSummit 2015 - Update on Splunk Enterprise 6.3 & Hunk 6.3

Exascale Computing Project (ECP) Update

.NET Fest 2019. Андрей Винда. Создание REST API с поддержкой высокой нагрузки

Introducing Blockchain With Java.pdf

ION Krakow - BCOP Update

Introduce to PredictionIO

16Gb Fibre Channel Deployment Guide

Clean Slate Design - O Que é?

Portland Splunk User Group May 2020

bmp.pptx

Customer Presentation

Splunk live! Customer Presentation – Wellsfargo

Sql Server 2008 Portfolio

IET programmable components

[Tel Aviv merge world tour] Introducing Perforce Insights

Life is but a Stream

Más de Spark Summit

In this session we will present a Configurable FPGA-Based Spark SQL Acceleration Architecture. It is target to leverage FPGA highly parallel computing capability to accelerate Spark SQL Query and for FPGA’s higher power efficiency than CPU we can lower the power consumption at the same time. The Architecture consists of SQL query decomposition algorithms, fine-grained FPGA based Engine Units which perform basic computation of sub string, arithmetic and logic operations. Using SQL query decomposition algorithm, we are able to decompose a complex SQL query into basic operations and according to their patterns each is fed into an Engine Unit. SQL Engine Units are highly configurable and can be chained together to perform complex Spark SQL queries, finally one SQL query is transformed into a Hardware Pipeline. We will present the performance benchmark results comparing the queries with FGPA-Based Spark SQL Acceleration Architecture on XEON E5 and FPGA to the ones with Spark SQL Query on XEON E5 with 10X ~ 100X improvement and we will demonstrate one SQL query workload from a real customer.

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang

Spark Summit

In this talk, we’ll present techniques for visualizing large scale machine learning systems in Spark. These are techniques that are employed by Netflix to understand and refine the machine learning models behind Netflix’s famous recommender systems that are used to personalize the Netflix experience for their 99 millions members around the world. Essential to these techniques is Vegas, a new OSS Scala library that aims to be the “missing MatPlotLib” for Spark/Scala. We’ll talk about the design of Vegas and its usage in Scala notebooks to visualize Machine Learning Models.

VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...

Spark Summit

This presentation introduces how we design and implement a real-time processing platform using latest Spark Structured Streaming framework to intelligently transform the production lines in the manufacturing industry. In the traditional production line there are a variety of isolated structured, semi-structured and unstructured data, such as sensor data, machine screen output, log output, database records etc. There are two main data scenarios: 1) Picture and video data with low frequency but a large amount; 2) Continuous data with high frequency. They are not a large amount of data per unit. However the total amount of them is very large, such as vibration data used to detect the quality of the equipment. These data have the characteristics of streaming data: real-time, volatile, burst, disorder and infinity. Making effective real-time decisions to retrieve values from these data is critical to smart manufacturing. The latest Spark Structured Streaming framework greatly lowers the bar for building highly scalable and fault-tolerant streaming applications. Thanks to the Spark we are able to build a low-latency, high-throughput and reliable operation system involving data acquisition, transmission, analysis and storage. The actual user case proved that the system meets the needs of real-time decision-making. The system greatly enhance the production process of predictive fault repair and production line material tracking efficiency, and can reduce about half of the labor force for the production lines.

Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu

Spark Summit

As common sense would suggest, weather has a definite impact on traffic. But how much? And under what circumstances? Can we improve traffic (congestion) prediction given weather data? Predictive traffic is envisioned to significantly impact how driver’s plan their day by alerting users before they travel, find the best times to travel, and over time, learn from new IoT data such as road conditions, incidents, etc. This talk will cover the traffic prediction work conducted jointly by IBM and the traffic data provider. As a part of this work, we conducted a case study over five large metropolitans in the US, 2.58 billion traffic records and 262 million weather records, to quantify the boost in accuracy of traffic prediction using weather data. We will provide an overview of our lambda architecture with Apache Spark being used to build prediction models with weather and traffic data, and Spark Streaming used to score the model and provide real-time traffic predictions. This talk will also cover a suite of extensions to Spark to analyze geospatial and temporal patterns in traffic and weather data, as well as the suite of machine learning algorithms that were used with Spark framework. Initial results of this work were presented at the National Association of Broadcasters meeting in Las Vegas in April 2017, and there is work to scale the system to provide predictions in over a 100 cities. Audience will learn about our experience scaling using Spark in offline and streaming mode, building statistical and deep-learning pipelines with Spark, and techniques to work with geospatial and time-series data.

Improving Traffic Prediction Using Weather Data with Ramya Raghavendra

Spark Summit

Graph is on the rise and it’s time to start learning about scalable graph analytics! In this session we will go over two Spark-based Graph Analytics frameworks: Tinkerpop and GraphFrames. While both frameworks can express very similar traversals, they have different performance characteristics and APIs. In this Deep-Dive by example presentation, we will demonstrate some common traversals and explain how, at a Spark level, each traversal is actually computed under the hood! Learn both the fluent Gremlin API as well as the powerful GraphFrame Motif api as we show examples of both simultaneously. No need to be familiar with Graphs or Spark for this presentation as we’ll be explaining everything from the ground up!

A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...

Spark Summit

Building accurate machine learning models has been an art of data scientists, i.e., algorithm selection, hyper parameter tuning, feature selection and so on. Recently, challenges to breakthrough this “black-arts” have got started. In cooperation with our partner, NEC Laboratories America, we have developed a Spark-based automatic predictive modeling system. The system automatically searches the best algorithm, parameters and features without any manual work. In this talk, we will share how the automation system is designed to exploit attractive advantages of Spark. The evaluation with real open data demonstrates that our system can explore hundreds of predictive models and discovers the most accurate ones in minutes on a Ultra High Density Server, which employs 272 CPU cores, 2TB memory and 17TB SSD in 3U chassis. We will also share open challenges to learn such a massive amount of models on Spark, particularly from reliability and stability standpoints. This talk will cover the presentation already shown on Spark Summit SF’17 (#SFds5) but from more technical perspective.

No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...

Spark Summit

In Sweden, from the Rise ICE Data Center at www.hops.site, we are providing to reseachers both Spark-as-a-Service and, more recently, Tensorflow-as-a-Service as part of the Hops platform. In this talk, we examine the different ways in which Tensorflow can be included in Spark workflows, from batch to streaming to structured streaming applications. We will analyse the different frameworks for integrating Spark with Tensorflow, from Tensorframes to TensorflowOnSpark to Databrick’s Deep Learning Pipelines. We introduce the different programming models supported and highlight the importance of cluster support for managing different versions of python libraries on behalf of users. We will also present cluster management support for sharing GPUs, including Mesos and YARN (in Hops Hadoop). Finally, we will perform a live demonstration of training and inference for a TensorflowOnSpark application written on Jupyter that can read data from either HDFS or Kafka, transform the data in Spark, and train a deep neural network on Tensorflow. We will show how to debug the application using both Spark UI and Tensorboard, and how to examine logs and monitor training.

Apache Spark and Tensorflow as a Service with Jim Dowling

Spark Summit

Apache Spark and Tensorflow as a Service with Jim Dowling

Spark Summit

With the rapid growth of available datasets, it is imperative to have good tools for extracting insight from big data. The Spark ML library has excellent support for performing at-scale data processing and machine learning experiments, but more often than not, Data Scientists find themselves struggling with issues such as: low level data manipulation, lack of support for image processing, text analytics and deep learning, as well as the inability to use Spark alongside other popular machine learning libraries. To address these pain points, Microsoft recently released The Microsoft Machine Learning Library for Apache Spark (MMLSpark), an open-source machine learning library built on top of SparkML that seeks to simplify the data science process and integrate SparkML Pipelines with deep learning and computer vision libraries such as the Microsoft Cognitive Toolkit (CNTK) and OpenCV. With MMLSpark, Data Scientists can build models with 1/10th of the code through Pipeline objects that compose seamlessly with other parts of the SparkML ecosystem. In this session, we explore some of the main lessons learned from building MMLSpark. Join us if you would like to know how to extend Pipelines to ensure seamless integration with SparkML, how to auto-generate Python and R wrappers from Scala Transformers and Estimators, how to integrate and use previously non-distributed libraries in a distributed manner and how to efficiently deploy a Spark library across multiple platforms.

MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...

Spark Summit

The Next Accelerator Logging Service (NXCALS) is a new Big Data project at CERN aiming to replace the existing Oracle-based service. The main purpose of the system is to store and present Controls/Infrastructure related data gathered from thousands of devices in the whole accelerator complex. The data is used to operate the machines, improve their performance and conduct studies for new beam types or future experiments. During this talk, Jakub will speak about NXCALS requirements and design choices that lead to the selected architecture based on Hadoop and Spark. He will present the Ingestion API, the abstractions behind the Meta-data Service and the Spark-based Extraction API where simple changes to the schema handling greatly improved the overall usability of the system. The system itself is not CERN specific and can be of interest to other companies or institutes confronted with similar Big Data problems.

Next CERN Accelerator Logging Service with Jakub Wozniak

Spark Summit

In Between (A mobile App for couples, downloaded 20M in Global), from daily batch for extracting metrics, analysis and dashboard. Spark is widely used by engineers and data analysts in Between, thanks to the performance and expendability of Spark, data operating has become extremely efficient. Entire team including Biz Dev, Global Operation, Designers are enjoying data results so Spark is empowering entire company for data driven operation and thinking. Kevin, Co-founder and Data Team leader of Between will be presenting how things are going in Between. Listeners will know how small and agile team is living with data (how we build organization, culture and technical base) after this presentation.

Powering a Startup with Apache Spark with Kevin Kim

Spark Summit

Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra

Spark Summit

In many cases, Big Data becomes just another buzzword because of the lack of tools that can support both the technological requirements for developing and deploying of the projects and/or the fluency of communication between the different profiles of people involved in the projects. In this talk, we will present Moriarty, a set of tools for fast prototyping of Big Data applications that can be deployed in an Apache Spark environment. These tools support the creation of Big Data workflows using the already existing functional blocks or supporting the creation of new functional blocks. The created workflow can then be deployed in a Spark infrastructure and used through a REST API. For better understanding of Moriarty, the prototyping process and the way it hides the Spark environment to the Big Data users and developers, we will present it together with a couple of examples based on a Industry 4.0 success cases and other on a logistic success case.

Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...

Spark Summit

Large-scale testing of new data products or enhancements to existing products in a research and development environment can be a technical challenge for data scientists. In some cases, tools available to data scientists lack production-level capacity, whereas other tools do not provide the algorithms needed to run the methodology. At Nielsen, the Databricks platform provided a solution to both of these challenges. This breakout session will cover a specific Nielsen business case where two methodology enhancements were developed and tested at large-scale using the Databricks platform. Development and large-scale testing of these enhancements would not have been possible using standard database tools.

How Nielsen Utilized Databricks for Large-Scale Research and Development with...

Spark Summit

Since the invention of SQL and relational databases, data production has been about specifying how data is transformed through queries. While Apache Spark can certainly be used as a general distributed query engine, the power and granularity of Spark’s APIs enables a revolutionary increase in data engineering productivity: goal-based data production. Goal-based data production concerns itself with specifying WHAT the desired result is, leaving the details of HOW the result is achieved to a smart data warehouse running on top of Spark. That not only substantially increases productivity, but also significantly expands the audience that can work directly with Spark: from developers and data scientists to technical business users. With specific data and architecture patterns spanning the range from ETL to machine learning data prep and with live demos, this session will demonstrate how Spark users can gain the benefits of goal-based data production.

Goal Based Data Production with Sim Simeonov

Spark Summit

Have you imagined a simple machine learning solution able to prevent revenue leakage and monitor your distributed application? To answer this question, we offer a practical and a simple machine learning solution to create an intelligent monitoring application based on simple data analysis using Apache Spark MLlib. Our application uses linear regression models to make predictions and check if the platform is experiencing any operational problems that can impact in revenue losses. The application monitor distributed systems and provides notifications stating the problem detected, that way users can operate quickly to avoid serious problems which directly impact the company’s revenue and reduce the time for action. We will present an architecture for not only a monitoring system, but also an active actor for our outages recoveries. At the end of the presentation you will have access to our training program source code and you will be able to adapt and implement in your company. This solution already helped to prevent about US$3mi in losses last year.

Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...

Spark Summit

Getting Ready to use Redis with Apache Spark is a technical tutorial designed to address integrating Redis with an Apache Spark deployment to increase the performance of serving complex decision models. To set the context for the session, we start with a quick introduction to Redis and the capabilities Redis provides. We cover the basic data types provided by Redis and cover the module system. Using an ad serving use-case, we look at how Redis can improve the performance and reduce the cost of using complex ML-models in production. Attendees will be guided through the key steps of setting up and integrating Redis with Spark, including how to train a model using Spark then load and serve it using Redis, as well as how to work with the Spark Redis module. The capabilities of the Redis Machine Learning Module (redis-ml) will be discussed focusing primarily on decision trees and regression (linear and logistic) with code examples to demonstrate how to use these feature. At the end of the session, developers should feel confident building a prototype/proof-of-concept application using Redis and Spark. Attendees will understand how Redis complements Spark and how to use Redis to serve complex, ML-models with high performance.

Getting Ready to Use Redis with Apache Spark with Dvir Volk

Spark Summit

Here we present a general supervised framework for record deduplication and author-disambiguation via Spark. This work differentiates itself by – Application of Databricks and AWS makes this a scalable implementation. Compute resources are comparably lower than traditional legacy technology using big boxes 24/7. Scalability is crucial as Elsevier’s Scopus data, the biggest scientific abstract repository, covers roughly 250 million authorships from 70 million abstracts covering a few hundred years. – We create a fingerprint for each content by deep learning and/or word2vec algorithms to expedite pairwise similarity calculation. These encoders substantially reduce compute time while maintaining semantic similarity (unlike traditional TFIDF or predefined taxonomies). We will briefly discuss how to optimize word2vec training with high parallelization. Moreover, we show how these encoders can be used to derive a standard representation for all our entities namely such as documents, authors, users, journals, etc. This standard representation can simplify the recommendation problem into a pairwise similarity search and hence it can offer a basic recommender for cross-product applications where we may not have a dedicate recommender engine designed. – Traditional author-disambiguation or record deduplication algorithms are batch-processing with small to no training data. However, we have roughly 25 million authorships that are manually curated or corrected upon user feedback. Hence, it is crucial to maintain historical profiles and hence we have developed a machine learning implementation to deal with data streams and process them in mini batches or one document at a time. We will discuss how to measure the accuracy of such a system, how to tune it and how to process the raw data of pairwise similarity function into final clusters. Lessons learned from this talk can help all sort of companies where they want to integrate their data or deduplicate their user/customer/product databases.

Deduplication and Author-Disambiguation of Streaming Records via Supervised M...

Spark Summit

The use of large-scale machine learning and data mining methods is becoming ubiquitous in many application domains ranging from business intelligence and bioinformatics to self-driving cars. These methods heavily rely on matrix computations, and it is hence critical to make these computations scalable and efficient. These matrix computations are often complex and involve multiple steps that need to be optimized and sequenced properly for efficient execution. This work presents new efficient and scalable matrix processing and optimization techniques based on Spark. The proposed techniques estimate the sparsity of intermediate matrix-computation results and optimize communication costs. An evaluation plan generator for complex matrix computations is introduced as well as a distributed plan optimizer that exploits dynamic cost-based analysis and rule-based heuristics The result of a matrix operation will often serve as an input to another matrix operation, thus defining the matrix data dependencies within a matrix program. The matrix query plan generator produces query execution plans that minimize memory usage and communication overhead by partitioning the matrix based on the data dependencies in the execution plan. We implemented the proposed matrix techniques inside the Spark SQL, and optimize the matrix execution plan based on Spark SQL Catalyst. We conduct case studies on a series of ML models and matrix computations with special features on different datasets. These are PageRank, GNMF, BFGS, sparse matrix chain multiplications, and a biological data analysis. The open-source library ScaLAPACK and the array-based database SciDB are used for performance evaluation. Our experiments are performed on six real-world datasets are: social network data ( e.g., soc-pokec, cit-Patents, LiveJournal), Twitter2010, Netflix recommendation data, and 1000 Genomes Project sample. Experiments demonstrate that our proposed techniques achieve up to an order-of-magnitude performance.

MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...

Spark Summit

Kapil Malik and Arvind Heda will discuss a solution for interactive querying of large scale structured data, stored in a distributed file system (HDFS / S3), in a scalable and reliable manner using a unique combination of Spark SQL, Apache Zeppelin and Spark Job-server (SJS) on Yarn. The solution is production tested and can cater to thousands of queries processing terabytes of data every day. It contains following components – 1. Zeppelin server : A custom interpreter is deployed, which de-couples spark context from the user notebooks. It connects to the remote spark context on Spark Job-server. A rich set of APIs are exposed for the users. The user input is parsed, validated and executed remotely on SJS. 2. Spark job-server : A custom application is deployed, which implements the set of APIs exposed on Zeppelin custom interpreter, as one or more spark jobs. 3. Context router : It routes different user queries from custom interpreter to one of many Spark Job-servers / contexts. The solution has following characteristics – * Multi-tenancy There are hundreds of users, each having one or more Zeppelin notebooks. All these notebooks connect to same set of Spark contexts for running a job. * Fault tolerance The notebooks do not use Spark interpreter, but a custom interpreter, connecting to a remote context. If one spark context fails, the context router sends user queries to another context. * Load balancing Context router identifies which contexts are under heavy load / responding slowly, and selects the most optimal context for serving a user query. * Efficiency We use Alluxio for caching common datasets. * Elastic resource usage We use spark dynamic allocation for the contexts. This ensures that cluster resources are blocked by this application only when it’s doing some actual work.

Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...

Spark Summit

Más de Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang

VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...

Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu

Improving Traffic Prediction Using Weather Data with Ramya Raghavendra

A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...

No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...

Apache Spark and Tensorflow as a Service with Jim Dowling

MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...

Next CERN Accelerator Logging Service with Jakub Wozniak

Powering a Startup with Apache Spark with Kevin Kim

Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra

Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...

How Nielsen Utilized Databricks for Large-Scale Research and Development with...

Goal Based Data Production with Sim Simeonov

Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...

Getting Ready to Use Redis with Apache Spark with Dvir Volk

Deduplication and Author-Disambiguation of Streaming Records via Supervised M...

MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...

Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...

Último

Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage with sex Booking Contact Details :- WhatsApp Chat :- +91-9920725232 4-May-2024(SMW) Call Girls In Model Towh +91-9920725232 !! Best Woman Seeking Man Call Girls Service, Escorts Service in Home Hotel in NCR 24 Hours Available Service Call Girls, Contact Us +91-9920725232 (Any Time. Any Where) Call Girls in , Noida, Gurgaon, Ghaziabad,Sexy Indian Female Escorts Service NCRWelcome To Escorts Service – An All Over New Very Sexy Hot Call Girls Agency Service Escorts In South NCR’s No. 1 High Profile Independent Female Escorts Service. We Provide Good Quality Educated Profile At Very Regnebal Price 100% Safe And Original.We Are Provide Escorts Service All OYO Hotels ,3*,4*,5* Star Hotel And Home Flat, Apartment. Guest-House. Services In -Call And Out – Call Both Are Services Available. 24Hrs. Any Time Any Where. In All Over Noida Gurgaon Ghaziabad Faridabad.More Information And Contact Profile Real Pic Visit Our Website City Wise Escorts Service Agency.Good Looking Cheap And Best Models Girls U Can Get Best Click On Link……Night Call Girls Now In Hotel Le Meridien Gurgaon Near Female Escort One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —9920725232 We are available 24*7 all days of the year. Call us — 9920725232 Thank you for Visiting.

Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...

amitlee9823

Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (Erode ) Booking Contact Details :- WhatsApp Chat :- +91-9352988975 If you're looking for India Call girls you've come to the right place. You'll find some of the most beautiful call girls in our location with. These ladies have pleasing personalities, hot figures, and a passion for physical pleasure. Call girls in India Lucknow Many men have booked them for their erotic and soul-mixing performances, which are sure to leave you with unforgettable memories. #K09 Escort Service India is available in the city for men and women of all ages. They can satisfy your sexual needs and will make your experience even more enjoyable and memorable. Whether you're looking for a blow-job, stripping, lovemaking, or other dirty acts, you'll be able to find a match for your tastes and budget. These highly trained professionals will help you have an unforgettable night. One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —9352988975 We are available 24*7 all days of the year. Call us — 9352988975 Thank you for Visiting.

Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...

gajnagarg

Building Real-Time Pipelines With FLaNK Timothy Spann, Principal Developer Advocate, Streaming - Cloudera Future of Data meetup, startup grind, AI Camp The combination of Apache Flink, Apache NiFi, and Apache Kafka for building real-time data processing pipelines is extremely powerful, as demonstrated by this case study using the FLaNK-MTA project. The project leverages these technologies to process and analyze real-time data from the New York City Metropolitan Transportation Authority (MTA). FLaNK-MTA demonstrates how to efficiently collect, transform, and analyze high-volume data streams, enabling timely insights and decision-making. Apache NiFi Apache Kafka Apache Flink Apache Iceberg LLM Generative AI Slack Postgresql

DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK

Timothy Spann

Klinik_ Apotek Onlin 085657271886 Solusi Menggugurkan Masalah Kehamilan Anda Jual Obat Aborsi Asli KLINIK ABORSI TERPEECAYA _ Jual Obat Aborsi Cytotec Misoprostol Asli 100% Ampuh Hanya 3 Jam Langsung Gugur || OBAT PENGGUGUR KANDUNGAN AMPUH MANJUR OBAT ABORSI OLINE" APOTIK Jual Obat Cytotec, Gastrul, Gynecoside Asli Ampuh. JUAL ” Obat Aborsi Tuntas | Obat Aborsi Manjur | Obat Aborsi Ampuh | Obat Penggugur Janin | Obat Pencegah Kehamilan | Obat Pelancar Haid | Obat terlambat Bulan | Ciri Obat Aborsi Asli | Obat Telat Bulan | Pil Aborsi Asli | Cara Menggugurkan Konten | Cara Aborsi Tuntas | Harga Obat Aborsi Asli | Pil Aborsi | Jual Obat Aborsi Cytotec | Cara Aborsi Sendiri | Cara Aborsi Usia 1 Bulan | Cara Aborsi Usia 2 Tahun | Cara Aborsi Usia 3 Bulan | Obat Aborsi Usia 4 Bulan | Cara Abrasi Usia 5 Bulan | Cara Menggugurkan Konten | Kandungan Obat Penggugur | Cara Menghitung Usia Konten | Cara Mengatasi Terlambat Bulan | Penjual Obat Aborsi Asli | Obat Aborsi Garansi | Kandungan Obat Peluntur | Obat Telat Datang Bulan | Obat Telat Haid | Obat Aborsi Paling Murah | Klinik Jual Obat Aborsi | Jual Pil Cytotec | Apotik Jual Obat Aborsi | Kandungan Dokter Abrasi | Cara Aborsi Cepat | Jual Obat Aborsi Bergaransi | Jual Obat Cytotec Asli | Obat Aborsi Aman Manjur | Obat Misoprostol Cytotec Asli. "APA ITU ABORSI" “Aborsi Adalah dengan membendung hormon yang di perlukan untuk mempertahankan kehamilan yaitu hormon progesteron, karena hormon ini dibendung, maka jalur kehamilan mulai membuka dan leher rahim menjadi melunak,sehingga mengeluarkan darah yang merupakan tanda bahwa obat telah bekerja || maksimal 1 jam obat diminum || PENJELASAN OBAT ABORSI USIA 1 _7 BULAN Pada usia kandungan ini, pasien akan merasakan sakit yang sedikit tidak berlebihan || sekitar 1 jam ||. namun hanya akan terjadi pada saatdarah keluar merupakan pertanda menstruasi. Hal ini dikarenakan pada usiakandungan 3 bulan,janin sudah terbentuk sebesar kepalan tangan orang dewasa. Cara kerja obat aborsi : JUAL OBAT ABORSI AMPUH dosis 3 bulan secara umum sama dengan cara kerja || DOSIS OBAT ABORSI 2 bulan”, hanya berbedanya selain mengisolasijanin juga menghancurkan janin dengan formula methotrexate dikandungdidalamnya. Formula methotrexate ini sangat ampuh untuk menghancurkan janinmenjadi serpihan-serpihan kecil akan sangat berguna pada saat dikeluarkan nanti. APA ALASAN WANITA MELAKUKAN ABORSI? Aborsi di lakukan wanita hamil baik yang sudah menikah maupun belum menikah dengan berbagai alasan , akan tetapi alasan yang utama adalah alasan-alasan non medis (termasuk aborsi sendiri / di sengaja/ buatan] MELAYANI PEMESANAN OBAT ABORSI SETIAP HARI, SIAP KIRIM KESELURUH KOTA BESAR DI INDONESIA DAN LUAR NEGERI. HUBUNGI PEMESANAN LEBIH NYAMAN VIA WA/: 085657271886

Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...

ZurliaSoop

This project aims to predict whether a loan application will be approved or denied based on various factors such as applicant's income, credit score, loan amount, etc. Using a dataset containing historical loan application data, we employed machine learning algorithms to build a predictive model. The model was trained on features such as applicant's income, credit history, loan amount, loan term, and others. After training the model, we evaluated its performance using metrics like accuracy, precision, recall, and F1 score. The insights from this project can help financial institutions streamline their loan approval process and make informed decisions. Visit for more information: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/

Predicting Loan Approval: A Data Science Project

Boston Institute of Analytics

Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (Bangalore) Booking Contact Details :- WhatsApp Chat :- +91-7737669865 2-May-2024(SMW) Call Girls In Model Towh Bangalore +91-7737669865 !! Best Woman Seeking Man Call Girls Service, Escorts Service in Home Hotel in Bangalore NCR 24 Hours Available Service Call Girls, Contact Us +91-7737669865 (Any Time. Any Where) Call Girls in Bangalore, Noida, Gurgaon, Ghaziabad,Sexy Indian Female Escorts Service Bangalore NCRWelcome To Bangalore Escorts Service – An All Over New Bangalore Very Sexy Hot Call Girls Agency Service Escorts In South BangaloreNCRBangalore’s No. 1 High Profile Independent Female Escorts Service. We Provide Good Quality Educated Profile At Very Regnebal Price 100% Safe And Original.We Are Provide Escorts Service All OYO Hotels ,3*,4*,5* Star Hotel And Home Flat, Apartment. Guest-House. Services In -Call And Out – Call Both Are Services Available. 24Hrs. Any Time Any Where. In All Over Bangalore Noida Gurgaon Ghaziabad Faridabad.More Information And Contact Profile Real Pic Visit Our Website City Wise Escorts Service Agency.Good Looking Cheap And Best Models Girls U Can Get Best Click On Link……Night Call Girls Now In Hotel Le Meridien Gurgaon Near Female Escort One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —7737669865 We are available 24*7 all days of the year. Call us — 7737669865 Thank you for Visiting.

Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...

amitlee9823

Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand Booking Contact Details :- WhatsApp Chat :- +91-7737669865 Call Girls In Model Towh +91-7737669865 !! Best Woman Seeking Man Call Girls Service, Escorts Service in Home Hotel in NCR 24 Hours Available Service Call Girls, Contact Us +91-7737669865 (Any Time. Any Where) Call Girls in , Noida, Gurgaon, Ghaziabad,Sexy Indian Female Escorts Service NCRWelcome To Escorts Service – An All Over New Very Sexy Hot Call Girls Agency Service Escorts In South NCR’s No. 1 High Profile Independent Female Escorts Service. We Provide Good Quality Educated Profile At #K09 Very Regnebal Price 100% Safe And Original.We Are Provide Escorts Service All OYO Hotels ,3*,4*,5* Star Hotel And Home Flat, Apartment. Guest-House. Services In -Call And Out – Call Both Are Services Available. 24Hrs. Any Time Any Where. In All Over Noida Gurgaon Ghaziabad Faridabad.More Information And Contact Profile Real Pic Visit Our Website City Wise Escorts Service Agency.Good Looking Cheap And Best Models Girls U Can Get Best Click On Link……Night Call Girls Now In Hotel Le Meridien Gurgaon Near Female Escort One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —7737669865 We are available 24*7 all days of the year. Call us — 7737669865 Thank you for Visiting.

Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand

amitlee9823

Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore Booking Contact Details :- WhatsApp Chat :- +91-7737669865 2-May-2024(SMW) Call Girls In Model Towh Bangalore +91-7737669865 !! Best Woman Seeking Man Call Girls Service, Escorts Service in Home Hotel in Bangalore NCR 24 Hours Available Service Call Girls, Contact Us +91-7737669865 (Any Time. Any Where) Call Girls in Bangalore, Noida, Gurgaon, Ghaziabad,Sexy Indian Female Escorts Service Bangalore NCRWelcome To Bangalore Escorts Service – An All Over New Bangalore Very Sexy Hot Call Girls Agency Service Escorts In South BangaloreNCRBangalore’s No. 1 High Profile Independent Female Escorts Service. We Provide Good Quality Educated Profile At Very Regnebal Price 100% Safe And Original.We Are Provide Escorts Service All OYO Hotels ,3*,4*,5* Star Hotel And Home Flat, Apartment. Guest-House. Services In -Call And Out – Call Both Are Services Available. 24Hrs. Any Time Any Where. In All Over Bangalore Noida Gurgaon Ghaziabad Faridabad.More Information And Contact Profile Real Pic Visit Our Website City Wise Escorts Service Agency.Good Looking Cheap And Best Models Girls U Can Get Best Click On Link……Night Call Girls Now In Hotel Le Meridien Gurgaon Near Female Escort One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —7737669865 We are available 24*7 all days of the year. Call us — 7737669865 Thank you for Visiting.

Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...

amitlee9823

➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts Service Booking Contact Details :- WhatsApp Chat :- +91-7737669865 Call Girls In Model Towh +91-7737669865 !! Best Woman Seeking Man Call Girls Service, Escorts Service in Home Hotel in NCR 24 Hours Available Service Call Girls, Contact Us +91-7737669865 (Any Time. Any Where) Call Girls in , Noida, Gurgaon, Ghaziabad,Sexy Indian Female Escorts Service NCRWelcome To Escorts Service – An All Over New Very Sexy Hot Call Girls Agency Service Escorts In South NCR’s No. 1 High Profile Independent Female Escorts Service. We Provide Good Quality Educated Profile At #K09 Very Regnebal Price 100% Safe And Original.We Are Provide Escorts Service All OYO Hotels ,3*,4*,5* Star Hotel And Home Flat, Apartment. Guest-House. Services In -Call And Out – Call Both Are Services Available. 24Hrs. Any Time Any Where. In All Over Noida Gurgaon Ghaziabad Faridabad.More Information And Contact Profile Real Pic Visit Our Website City Wise Escorts Service Agency.Good Looking Cheap And Best Models Girls U Can Get Best Click On Link……Night Call Girls Now In Hotel Le Meridien Gurgaon Near Female Escort One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —7737669865 We are available 24*7 all days of the year. Call us — 7737669865 Thank you for Visiting.

➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...

amitlee9823

Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore Escorts Service Booking Contact Details :- WhatsApp Chat :- +91-7737669865 2-May-2024(SMW) Call Girls In Model Towh Bangalore +91-7737669865 !! Best Woman Seeking Man Call Girls Service, Escorts Service in Home Hotel in Bangalore NCR 24 Hours Available Service Call Girls, Contact Us +91-7737669865 (Any Time. Any Where) Call Girls in Bangalore, Noida, Gurgaon, Ghaziabad,Sexy Indian Female Escorts Service Bangalore NCRWelcome To Bangalore Escorts Service – An All Over New Bangalore Very Sexy Hot Call Girls Agency Service Escorts In South BangaloreNCRBangalore’s No. 1 High Profile Independent Female Escorts Service. We Provide Good Quality Educated Profile At Very Regnebal Price 100% Safe And Original.We Are Provide Escorts Service All OYO Hotels ,3*,4*,5* Star Hotel And Home Flat, Apartment. Guest-House. Services In -Call And Out – Call Both Are Services Available. 24Hrs. Any Time Any Where. In All Over Bangalore Noida Gurgaon Ghaziabad Faridabad.More Information And Contact Profile Real Pic Visit Our Website City Wise Escorts Service Agency.Good Looking Cheap And Best Models Girls U Can Get Best Click On Link……Night Call Girls Now In Hotel Le Meridien Gurgaon Near Female Escort One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —7737669865 We are available 24*7 all days of the year. Call us — 7737669865 Thank you for Visiting.

Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...

amitlee9823

➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escorts Service Booking Contact Details :- WhatsApp Chat :- +91-7737669865 Call Girls In Model Towh +91-7737669865 !! Best Woman Seeking Man Call Girls Service, Escorts Service in Home Hotel in NCR 24 Hours Available Service Call Girls, Contact Us +91-7737669865 (Any Time. Any Where) Call Girls in , Noida, Gurgaon, Ghaziabad,Sexy Indian Female Escorts Service NCRWelcome To Escorts Service – An All Over New Very Sexy Hot Call Girls Agency Service Escorts In South NCR’s No. 1 High Profile Independent Female Escorts Service. We Provide Good Quality Educated Profile At #K09 Very Regnebal Price 100% Safe And Original.We Are Provide Escorts Service All OYO Hotels ,3*,4*,5* Star Hotel And Home Flat, Apartment. Guest-House. Services In -Call And Out – Call Both Are Services Available. 24Hrs. Any Time Any Where. In All Over Noida Gurgaon Ghaziabad Faridabad.More Information And Contact Profile Real Pic Visit Our Website City Wise Escorts Service Agency.Good Looking Cheap And Best Models Girls U Can Get Best Click On Link……Night Call Girls Now In Hotel Le Meridien Gurgaon Near Female Escort One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —7737669865 We are available 24*7 all days of the year. Call us — 7737669865 Thank you for Visiting.

➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...

amitlee9823

Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl (roorkee ) Booking Contact Details :- WhatsApp Chat :- +91-9352988975 If you're looking for India Call girls you've come to the right place. You'll find some of the most beautiful call girls in our location with. These ladies have pleasing personalities, hot figures, and a passion for physical pleasure. Call girls in India Lucknow Many men have booked them for their erotic and soul-mixing performances, which are sure to leave you with unforgettable memories. #K09 Escort Service India is available in the city for men and women of all ages. They can satisfy your sexual needs and will make your experience even more enjoyable and memorable. Whether you're looking for a blow-job, stripping, lovemaking, or other dirty acts, you'll be able to find a match for your tastes and budget. These highly trained professionals will help you have an unforgettable night. One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —9352988975 We are available 24*7 all days of the year. Call us — 9352988975 Thank you for Visiting.

Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...

gajnagarg

Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore Booking Contact Details :- WhatsApp Chat :- +91-7737669865 2-May-2024(SMW) Call Girls In Model Towh Bangalore +91-7737669865 !! Best Woman Seeking Man Call Girls Service, Escorts Service in Home Hotel in Bangalore NCR 24 Hours Available Service Call Girls, Contact Us +91-7737669865 (Any Time. Any Where) Call Girls in Bangalore, Noida, Gurgaon, Ghaziabad,Sexy Indian Female Escorts Service Bangalore NCRWelcome To Bangalore Escorts Service – An All Over New Bangalore Very Sexy Hot Call Girls Agency Service Escorts In South BangaloreNCRBangalore’s No. 1 High Profile Independent Female Escorts Service. We Provide Good Quality Educated Profile At Very Regnebal Price 100% Safe And Original.We Are Provide Escorts Service All OYO Hotels ,3*,4*,5* Star Hotel And Home Flat, Apartment. Guest-House. Services In -Call And Out – Call Both Are Services Available. 24Hrs. Any Time Any Where. In All Over Bangalore Noida Gurgaon Ghaziabad Faridabad.More Information And Contact Profile Real Pic Visit Our Website City Wise Escorts Service Agency.Good Looking Cheap And Best Models Girls U Can Get Best Click On Link……Night Call Girls Now In Hotel Le Meridien Gurgaon Near Female Escort One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —7737669865 We are available 24*7 all days of the year. Call us — 7737669865 Thank you for Visiting.

Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...

amitlee9823

Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore Escorts Service Booking Contact Details :- WhatsApp Chat :- +91-7737669865 2-May-2024(SMW) Call Girls In Model Towh Bangalore +91-7737669865 !! Best Woman Seeking Man Call Girls Service, Escorts Service in Home Hotel in Bangalore NCR 24 Hours Available Service Call Girls, Contact Us +91-7737669865 (Any Time. Any Where) Call Girls in Bangalore, Noida, Gurgaon, Ghaziabad,Sexy Indian Female Escorts Service Bangalore NCRWelcome To Bangalore Escorts Service – An All Over New Bangalore Very Sexy Hot Call Girls Agency Service Escorts In South BangaloreNCRBangalore’s No. 1 High Profile Independent Female Escorts Service. We Provide Good Quality Educated Profile At Very Regnebal Price 100% Safe And Original.We Are Provide Escorts Service All OYO Hotels ,3*,4*,5* Star Hotel And Home Flat, Apartment. Guest-House. Services In -Call And Out – Call Both Are Services Available. 24Hrs. Any Time Any Where. In All Over Bangalore Noida Gurgaon Ghaziabad Faridabad.More Information And Contact Profile Real Pic Visit Our Website City Wise Escorts Service Agency.Good Looking Cheap And Best Models Girls U Can Get Best Click On Link……Night Call Girls Now In Hotel Le Meridien Gurgaon Near Female Escort One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —7737669865 We are available 24*7 all days of the year. Call us — 7737669865 Thank you for Visiting.

Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...

amitlee9823

➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Service Booking Contact Details :- WhatsApp Chat :- +91-7737669865 Call Girls In Model Towh +91-7737669865 !! Best Woman Seeking Man Call Girls Service, Escorts Service in Home Hotel in NCR 24 Hours Available Service Call Girls, Contact Us +91-7737669865 (Any Time. Any Where) Call Girls in , India,Sexy Indian Female Escorts Service NCRWelcome To Escorts Service – An All Over New Very Sexy Hot Call Girls Agency Service Escorts In South NCR’s No. 1 High Profile Independent Female Escorts Service. We Provide Good Quality Educated Profile At #K09 Very Regnebal Price 100% Safe And Original.We Are Provide Escorts Service All OYO Hotels ,3*,4*,5* Star Hotel And Home Flat, Apartment. Guest-House. Services In -Call And Out – Call Both Are Services Available. 24Hrs. Any Time Any Where. In All Over Noida Gurgaon Ghaziabad Faridabad.More Information And Contact Profile Real Pic Visit Our Website City Wise Escorts Service Agency.Good Looking Cheap And Best Models Girls U Can Get Best Click On Link……Night Call Girls Now In Hotel Le Meridien Gurgaon Near Female Escort One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —7737669865 We are available 24*7 all days of the year. Call us — 7737669865 Thank you for Visiting.

➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...

amitlee9823

Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl (Palakkad ) Booking Contact Details :- WhatsApp Chat :- +91-9352988975 If you're looking for India Call girls you've come to the right place. You'll find some of the most beautiful call girls in our location with. These ladies have pleasing personalities, hot figures, and a passion for physical pleasure. Call girls in India Lucknow Many men have booked them for their erotic and soul-mixing performances, which are sure to leave you with unforgettable memories. #K09 Escort Service India is available in the city for men and women of all ages. They can satisfy your sexual needs and will make your experience even more enjoyable and memorable. Whether you're looking for a blow-job, stripping, lovemaking, or other dirty acts, you'll be able to find a match for your tastes and budget. These highly trained professionals will help you have an unforgettable night. One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —9352988975 We are available 24*7 all days of the year. Call us — 9352988975 Thank you for Visiting.

Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...

gajnagarg

Digital advertising, or paid media, encompasses the strategic deployment of online advertisements to reach target audiences efficiently and effectively. This includes any digital platform that supports advertising to deliver unique messages for any objective. Understanding the mechanics of digital advertising platforms, along with insights into audience behaviors and preferences, allows marketers to optimize their ad spend and achieve significant engagement and conversion rates. This lecture is for Advanced Digital & Social Media Strategy (MGMTX 466.05) at UCLA Extension.

Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...

Valters Lauzums

Discover Why Less is More in B2B Research

michael115558

5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed

amy56318795

Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore Booking Contact Details :- WhatsApp Chat :- +91-7737669865 2-May-2024(SMW) Call Girls In Model Towh Bangalore +91-7737669865 !! Best Woman Seeking Man Call Girls Service, Escorts Service in Home Hotel in Bangalore NCR 24 Hours Available Service Call Girls, Contact Us +91-7737669865 (Any Time. Any Where) Call Girls in Bangalore, Noida, Gurgaon, Ghaziabad,Sexy Indian Female Escorts Service Bangalore NCRWelcome To Bangalore Escorts Service – An All Over New Bangalore Very Sexy Hot Call Girls Agency Service Escorts In South BangaloreNCRBangalore’s No. 1 High Profile Independent Female Escorts Service. We Provide Good Quality Educated Profile At Very Regnebal Price 100% Safe And Original.We Are Provide Escorts Service All OYO Hotels ,3*,4*,5* Star Hotel And Home Flat, Apartment. Guest-House. Services In -Call And Out – Call Both Are Services Available. 24Hrs. Any Time Any Where. In All Over Bangalore Noida Gurgaon Ghaziabad Faridabad.More Information And Contact Profile Real Pic Visit Our Website City Wise Escorts Service Agency.Good Looking Cheap And Best Models Girls U Can Get Best Click On Link……Night Call Girls Now In Hotel Le Meridien Gurgaon Near Female Escort One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —7737669865 We are available 24*7 all days of the year. Call us — 7737669865 Thank you for Visiting.

Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...

amitlee9823

Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Novotny Jan Scherbaum

1. Jan Scherbaum & Marek Novotny Barclays Africa Group Limited SPLINE: APACHE SPARK LINEAGE, NOT ONLY FOR THE BANKING INDUSTRY #EUent3

2. 7.34 Stuff happens under the hood… #EUent3 2

3. Data type changes Adding new dataComplex calcs 7.34 #EUent3 3

4. Overview • Barclays Africa • Spline • Live demo • Future work • The big picture • Questions 4#EUent3

5. • Pan-African financial services provider – Providing services to South Africa (ABSA) and 11 other African countries (Barclays) • Subject to strict regulatory compliance – Basel Committee on Banking Supervision (BCBS) • Accuracy • Comprehensiveness • Clarity • Usefulness 5#EUent3 Barclays Africa Group Limited

6. • SPark LINEage • Open source project • Goals – Satisfy initial interpretations of regulatory requirements, specifically on “Clarity” • Data lineage from Spark’s execution plans • Visualize in an “explorable” user-friendly format 6#EUent3 Spline

7. Spline – How it works 7#EUent3 Spark job Spark Job Spark library Spark Session Action Transformations Generate execution plans

8. Spline – How it works 8#EUent3 Spark job 1 line initialization Use SQLContext listeners Generate execution plans Spark Job & Spline Spark library Spark Session Action Spline Transformations

9. Spline UI Spline – How it works 9#EUent3 Spark job 1 line initialization Use SQLContext listeners Generate execution plans Spark Job & Spline Spark library Spark Session Action Spline Transformations …

10. Demo use case • Find the countries with the highest annual beer consumption per person – Correlation with GDP?? 10#EUent3

11. Data 11#EUent3 Country 2011 2010 2009 Czech Republic 15,583,000 15,549,000 16,190,000 Ireland 4,721,000 4,814,000 4,832,000 Country Metric 2011 2010 2009 Czech Republic GDP $21,717 $19,764 $19,698 Czech Republic Population 10,496,088 10,474,410 10,443,936 Ireland GDP $52,567 $48,538 $51,983 Ireland Population 4,576,794 4,560,155 4,535,375 Beer consumption per country Development indicators from the world bank

12. Analysis • Marek’s job – Data prep – Analyze the correlation between beer consumption and GDP growth • Jan’s beer job – Calculate the consumption of beer per country per capita 12#EUent3

13. Dependencies <dependency> <groupId>za.co.absa</groupId> <artifactId>spline-core</artifactId> <version>${spline.version}</version> </dependency> <dependency> <groupId>za.co.absa</groupId> <artifactId>spline-persistence-mongo</artifactId> <version>${spline.version}</version> </dependency> 13#EUent3

14. Initialization 14#EUent3 // Initializing library to hook up to Apache Spark import za.co.absa.spline.core.SparkLineageInitializer._ spark.enableLineageTracking()

15. 15#EUent3

16. Next steps • Enterprise features – Authentication (Kerberos, SSO) – Authorization – User management • Interoperability with other tools – Cloudera Manager, Informatica, Apache Atlas • Support other Spark data sources and actions – Streaming, ML 16#EUent3

17. The bigger picture • Develop open source conformance & ingestion engine on Spark – BCBS compliant (lineage, dataflow controls, error tracking) – Transfer & transform data from different source systems • To strongly typed datasets • On Hadoop • Conforming to enterprise level data dictionaries & data quality • In development – stay tuned J 17#EUent3

18. We’re open source! • Contributions are most welcome • Released versions mirrored on – https://github.com/AbsaOSS/spline • Wiki and docs on – https://absaoss.github.io/spline/ 18#EUent3

19. Questions 19#EUent3 • Now is a good time • Or feel free to contact us – Jan Scherbaum • jan.scherbaum@barclays.com – Marek Novotny • marek.x.novotny@barclays.com – Oleksandr Vayda • oleksandr.vayda@barclays.com • Acknowledgements: – Dennis Chu, Aaisha Bibi Osman, Adam Smyczek, Andrew Baker

Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Novotny Jan Scherbaum

Recomendados

Recomendados

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Similar a Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Novotny Jan Scherbaum

Similar a Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Novotny Jan Scherbaum (20)

Más de Spark Summit

Más de Spark Summit (20)

Último

Último (20)

Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Novotny Jan Scherbaum