Spark on Azure HDInsight - Spark Meetup Seattle


Since HDInsight launched Spark clusters last year, the HDInsight Spark team's mission has been to make Spark easy to use and production-ready. In the process, we have explored many open source technologies such as Livy, Jupyter, and Zeppelin. In this talk, we will demo top customer features, deep dive into the HDInsight Spark architecture, and share lessons learned from building the perfect cluster. Speakers: Judy Nash and Lin Chan


Spark on HDInsight
Seattle Spark Meetup on March 9, 2016
Presenters: Judy Nash & Lin Chan

About Us
• Azure HDInsight Service
  • Azure's answer to big data with open source tech
  • Deploy and manage clusters hosting Hadoop, HBase, Storm, and now Spark
• Our Goal: Make Spark easy to use on Azure
• How Do We Make It Happen
  • Deploy new Spark clusters via SDK and Portal
  • Pre-configure and tune the cluster for an optimal experience
  • Adopt open source technologies to enhance Spark workloads
  • Contribute back to open source

About the Talk
• How to Build an Enterprise-ready Spark System
• Deep Dive of HDInsight's Spark Cluster
  • Cluster Architecture
  • Resource Manager
• End-to-end Workflows
  • Business Intelligence
  • Remote Job Submission

Why YARN?
• Standalone
  • Better UI
  • Less memory overhead
  • Faster application launch time
• YARN
  • Better community support
  • More powerful resource management
  • Shares resources with other job workflows
  • Friendlier to users who already know Hadoop on YARN
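
From the submitter's side, the choice between the two resource managers mostly shows up in the `--master` argument. The sketch below only builds and prints a `spark-submit` command line; the queue name, the standalone master URL, and the jar name are hypothetical placeholders, not values from the talk.

```shell
#!/usr/bin/env bash
# Build (but do not run) a spark-submit command line for a given cluster manager.
# Master URL, queue name, and jar are illustrative assumptions.
build_submit_cmd() {
  local master="$1" jar="$2" queue="${3:-}"
  local cmd="spark-submit --master ${master}"
  # --queue is only meaningful when YARN is the resource manager.
  if [ "${master}" = "yarn" ] && [ -n "${queue}" ]; then
    cmd="${cmd} --queue ${queue}"
  fi
  printf '%s %s\n' "${cmd}" "${jar}"
}

build_submit_cmd yarn app.jar default
build_submit_cmd spark://headnode:7077 app.jar
```

On YARN, the queue argument is what lets a Spark job share the cluster with other workflows under the same scheduler.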

Addressing Multi-tenancy
• Fair Scheduler
  • Allows queries within the Thrift server to share resources
  • Important for BI customers who share a cluster: it keeps one bad query from taking over the cluster
  • To use, set the default queue type to “fair” scheduling
• Dynamic Allocation
  • Allows the Thrift server and other applications to share resources
  • Leaves a minimal footprint for customers who do not use Thrift, but can expand to the maximum resources allowed when customers execute expensive queries
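
As a rough illustration of the two settings above, the sketch below prints the standard Spark properties for fair scheduling and dynamic allocation as `--conf` flags. The property names are standard Spark configuration keys; the executor bounds are made-up values, and the configuration HDInsight actually ships is not shown in the slides.

```shell
#!/usr/bin/env bash
# Emit --conf flags for a Thrift server that shares a YARN cluster.
# The min/max executor counts are caller-supplied, illustrative values.
thrift_server_conf() {
  local min_executors="$1" max_executors="$2"
  cat <<EOF
--conf spark.scheduler.mode=FAIR
--conf spark.dynamicAllocation.enabled=true
--conf spark.shuffle.service.enabled=true
--conf spark.dynamicAllocation.minExecutors=${min_executors}
--conf spark.dynamicAllocation.maxExecutors=${max_executors}
EOF
}

thrift_server_conf 1 20
```

Dynamic allocation on YARN relies on the external shuffle service, which is why `spark.shuffle.service.enabled` appears alongside it.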

What is Livy?
• REST server allowing remote job submission
• Two modes currently: batch and interactive
• Open source project
• Co-developed with Cloudera

Sample Call
• Submit a batch job
curl -k --user "admin:mypassword1!" -v \
  -H 'Content-Type: application/json' \
  -X POST \
  -d '{ "file":"wasb://mycontainer@mystorageaccount.blob.core.windows.net/data/SparkSimpleTest.jar", "className":"com.microsoft.spark.test.SimpleFile" }' \
  "https://mysparkcluster.azurehdinsight.net/livy/batches"
• Check the job status
curl -k --user "admin:mypassword1!" -v \
  -X GET \
  "https://mysparkcluster.azurehdinsight.net/livy/batches/{batchId}"
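
The two batch calls can be combined into a submit-then-poll helper. This is a minimal sketch, not the presenters' tooling: the function names are hypothetical, the cluster URL and credentials are the placeholders from the slide, and it assumes the batch response carries a `state` field with values such as `starting` and `running`. The network call sits inside a function that is never invoked at load time.

```shell
#!/usr/bin/env bash
# Placeholder cluster URL and credentials taken from the slide; override via environment.
LIVY_URL="${LIVY_URL:-https://mysparkcluster.azurehdinsight.net/livy}"
LIVY_USER="${LIVY_USER:-admin:mypassword1!}"

# Build the JSON body for POST /batches from a jar path and a main class.
batch_payload() {
  printf '{"file":"%s","className":"%s"}' "$1" "$2"
}

# Pull the "state" field out of a JSON response read from stdin (no jq needed).
extract_state() {
  sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Poll GET /batches/{batchId} until the batch is no longer starting or running.
# An empty state (for example on a failed request) also ends the loop.
wait_for_batch() {
  local batch_id="$1" state=""
  while :; do
    state="$(curl -k -s --user "${LIVY_USER}" "${LIVY_URL}/batches/${batch_id}" | extract_state)"
    echo "batch ${batch_id}: ${state}"
    case "${state}" in
      starting|running) sleep 5 ;;
      *) break ;;
    esac
  done
}

# Print the request body only; nothing is sent to a cluster here.
batch_payload "wasb://mycontainer@mystorageaccount.blob.core.windows.net/data/SparkSimpleTest.jar" \
  "com.microsoft.spark.test.SimpleFile"
```

Calling `wait_for_batch 0` after a successful submission would print the state every five seconds until the job finishes or fails.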

Sample Call
• Start a Scala interactive session
curl -k --user "admin:mypassword1!" -v \
  -H 'Content-Type: application/json' \
  -X POST \
  -d '{ "kind":"spark" }' \
  "https://mysparkcluster.azurehdinsight.net/livy/sessions"
• Post a statement
curl -k --user "admin:mypassword1!" -v \
  -H 'Content-Type: application/json' \
  -X POST \
  -d '{ "code":"1+1" }' \
  "https://mysparkcluster.azurehdinsight.net/livy/sessions/{sessionId}/statements"
• Check the statement result
curl -k --user "admin:mypassword1!" -v \
  -X GET \
  "https://mysparkcluster.azurehdinsight.net/livy/sessions/{sessionId}/statements"
• Terminate a session
curl -k --user "admin:mypassword1!" -v \
  -X DELETE \
  "https://mysparkcluster.azurehdinsight.net/livy/sessions/{sessionId}"
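
When scripting the interactive calls, two details need care: the code snippet must be escaped before it is embedded in the JSON body, and the session id must be read from the creation response. The helpers below are a hedged sketch with hypothetical function names. They handle single-line snippets only and assume the creation response contains a numeric `id` field.

```shell
#!/usr/bin/env bash
# Placeholder cluster URL and credentials taken from the slide; override via environment.
LIVY_URL="${LIVY_URL:-https://mysparkcluster.azurehdinsight.net/livy}"
LIVY_USER="${LIVY_USER:-admin:mypassword1!}"

# Escape backslashes and double quotes so a one-line snippet fits in a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the body for POST /sessions/{sessionId}/statements.
statement_payload() {
  printf '{"code":"%s"}' "$(json_escape "$1")"
}

# Read the first numeric "id" field from a JSON response on stdin.
extract_id() {
  grep -o '"id"[[:space:]]*:[[:space:]]*[0-9][0-9]*' | head -n 1 | grep -o '[0-9][0-9]*$'
}

# Thin curl wrapper: livy_call METHOD PATH [JSON_BODY]
livy_call() {
  local method="$1" path="$2" body="${3:-}"
  if [ -n "${body}" ]; then
    curl -k -s --user "${LIVY_USER}" -H 'Content-Type: application/json' \
      -X "${method}" -d "${body}" "${LIVY_URL}${path}"
  else
    curl -k -s --user "${LIVY_USER}" -X "${method}" "${LIVY_URL}${path}"
  fi
}

# Print a request body only; nothing is sent to a cluster here.
statement_payload 'println("hello")'
```

A session script would then chain the calls from the slide: `id="$(livy_call POST /sessions '{"kind":"spark"}' | extract_id)"`, followed by `livy_call POST "/sessions/${id}/statements" "$(statement_payload '1+1')"`, and finally `livy_call DELETE "/sessions/${id}"`.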

Livy vs Job Server
• We had Job Server initially
• Job Server is not easy to use for simple jar submission or the notebook case
• Job Server is good for embedding Spark work within a bigger app
• Client mode is coming to Livy soon
• Partnering with Cloudera is important

More on Livy
• HDI online documentation: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-apache-spark-livy-rest-interface
• Livy repo: https://github.com/cloudera/livy

More on HDInsight
• HDInsight Blog
  • https://blogs.msdn.microsoft.com/azuredatalake/
• Contact Us
  • Lin Chan https://www.linkedin.com/in/linchanms
  • Judy Nash https://www.linkedin.com/in/judynash



Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024

Victor Rentea

JohnPollard-hybrid-app-RailsConf2024.pptx

JohnPollard37

Effective data discovery is crucial for maintaining compliance and mitigating risks in today's rapidly evolving privacy landscape. However, traditional manual approaches often struggle to keep pace with the growing volume and complexity of data. Join us for an insightful webinar where industry leaders from TrustArc and Privya will share their expertise on leveraging AI-powered solutions to revolutionize data discovery. You'll learn how to: - Effortlessly maintain a comprehensive, up-to-date data inventory - Harness code scanning insights to gain complete visibility into data flows leveraging the advantages of code scanning over DB scanning - Simplify compliance by leveraging Privya's integration with TrustArc - Implement proven strategies to mitigate third-party risks Our panel of experts will discuss real-world case studies and share practical strategies for overcoming common data discovery challenges. They'll also explore the latest trends and innovations in AI-driven data management, and how these technologies can help organizations stay ahead of the curve in an ever-changing privacy landscape.

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery

TrustArc

Tracing the root cause of a performance issue requires a lot of patience, experience, and focus. It’s so hard that we sometimes attempt to guess by trying out tentative fixes, but that usually results in frustration, messy code, and a considerable waste of time and money. This talk explains how to correctly zoom in on a performance bottleneck using three levels of profiling: distributed tracing, metrics, and method profiling. After we learn to read the JVM profiler output as a flame graph, we explore a series of bottlenecks typical for backend systems, like connection/thread pool starvation, invisible aspects, blocking code, hot CPU methods, lock contention, and Virtual Thread pinning, and we learn to trace them even if they occur in library code you are not familiar with. Attend this talk and prepare for the performance issues that will eventually hit any successful system. About authorWith two decades of experience, Victor is a Java Champion working as a trainer for top companies in Europe. Five thousands developers in 120 companies attended his workshops, so he gets to debate every week the challenges that various projects struggle with. In return, Victor summarizes key points from these workshops in conference talks and online meetups for the European Software Crafters, the world’s largest developer community around architecture, refactoring, and testing. Discover how Victor can help you on victorrentea.ro : company training catalog, consultancy and YouTube playlists.

Finding Java's Hidden Performance Traps @ DevoxxUK 2024

Victor Rentea

Scaling API-first – The story of a global engineering organization Ian Reasor, Senior Computer Scientist - Adobe Radu Cotescu, Senior Computer Scientist - Adobe Apidays New York 2024: The API Economy in the AI Era (April 30 & May 1, 2024) ------ Check out our conferences at https://www.apidays.global/ Do you want to sponsor or talk at one of our conferences? https://apidays.typeform.com/to/ILJeAaV8 Learn more on APIscene, the global media made by the community for the community: https://www.apiscene.io Explore the API ecosystem with the API Landscape: https://apilandscape.apiscene.io/

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe

apidays

Keynote 2: APIs in 2030: The Risk of Technological Sleepwalk Paolo Malinverno, Growth Advisor - The Business of Technology Apidays New York 2024: The API Economy in the AI Era (April 30 & May 1, 2024) ------ Check out our conferences at https://www.apidays.global/ Do you want to sponsor or talk at one of our conferences? https://apidays.typeform.com/to/ILJeAaV8 Learn more on APIscene, the global media made by the community for the community: https://www.apiscene.io Explore the API ecosystem with the API Landscape: https://apilandscape.apiscene.io/

Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...

apidays

CNIC Information System with Pakdata Cf In Pakistan

danishmna97

Following the popularity of "Cloud Revolution: Exploring the New Wave of Serverless Spatial Data," we're thrilled to announce this much-anticipated encore webinar. In this sequel, we'll dive deeper into the Cloud-Native realm by uncovering practical applications and FME support for these new formats, including COGs, COPC, FlatGeoBuf, GeoParquet, STAC, and ZARR. Building on the foundation laid by industry leaders Michelle Roby of Radiant Earth and Chris Holmes of Planet in the first webinar, this second part offers an in-depth look at the real-world application and behind-the-scenes dynamics of these cutting-edge formats. We will spotlight specific use-cases and workflows, showcasing their efficiency and relevance in practical scenarios. Discover the vast possibilities each format holds, highlighted through detailed discussions and demonstrations. Our expert speakers will dissect the key aspects and provide critical takeaways for effective use, ensuring attendees leave with a thorough understanding of how to apply these formats in their own projects. Elevate your understanding of how FME supports these cutting-edge technologies, enhancing your ability to manage, share, and analyze spatial data. Whether you're building on knowledge from our initial session or are new to the serverless spatial data landscape, this webinar is your gateway to mastering cloud-native formats in your workflows.

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME

Safe Software

Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows. We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases. This video focuses on the deployment of external web forms using Jotform for Bonterra Impact Management. This solution can be customized to your organization’s needs and deployed to support the common use cases below: - Intake and consent - Assessments - Surveys - Applications - Program registration Interested in deploying web form automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.

Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...

Jeffrey Haguewood

Dubai, known for its towering skyscrapers, luxurious lifestyle, and relentless pursuit of innovation, often finds itself in the global spotlight. However, amidst the glitz and glamour, the emirate faces its own set of challenges, including the occasional threat of flooding. In recent years, Dubai has experienced sporadic but significant floods, disrupting normalcy and posing unique challenges to its infrastructure. Among the critical nodes in this bustling metropolis is the Dubai International Airport, a vital hub connecting the world. This article delves into the intersection of Dubai flood events and the resilience demonstrated by the Dubai International Airport in the face of such challenges.

Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf

Orbitshub

Corporate and higher education. Two industries that, in the past, have had a clear divide with very little crossover. The difference in goals, learning styles and objectives paved the way for differing learning technologies platforms to evolve. Now, those stark lines are blurring as both sides are discovering they have content that’s relevant to the other. Join Tammy Rutherford as she walks through the pros and cons of corporate and higher ed collaborating. And the challenges of these different technology platforms working together for a brighter future.

Corporate and higher education May webinar.pptx

Rustici Software

EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER

MadyBayot

Spark on Azure HDInsight - spark meetup seattle

1. Spark on HDInsight
Seattle Spark Meetup on March 9, 2016
Presenters: Judy Nash & Lin Chan

2. About Us
• Azure HDInsight Service
  • Azure's answer to big data with open source tech
  • Deploy and manage clusters hosting Hadoop, HBase, Storm, and now Spark
• Our Goal: Make Spark easy to use on Azure
• How Do We Make It Happen
  • Deploy new Spark clusters via SDK and Portal
  • Pre-configure and tune cluster for optimal experience
  • Adopt open source technologies to enhance Spark workload
  • Contribute back to open source

3. About the Talk
• How to Build an Enterprise-ready Spark System
• Deep Dive of HDInsight's Spark Cluster
• Cluster Architecture
• Resource Manager
• End-to-end Workflows
• Business Intelligence
• Remote Job Submission

4. Spark Cluster Architecture

5. Why YARN?
• Standalone
  • Better UI
  • Less memory overhead
  • Faster application launch time
• YARN
  • Better community support
  • More powerful resource management
  • Shares resources with other job workflows
  • More user-friendly for users who already know Hadoop on YARN

6. Business Intelligence Workflow

7. Addressing Multi-tenancy
• Fair Scheduler
  • Allows sharing resources between queries within the Thrift server
  • Important for BI customers who share a cluster: it prevents a bad query from taking over the cluster
  • To use, set the default queue type to "fair" scheduling
• Dynamic Allocation
  • Allows sharing resources between the Thrift server and other applications
  • Leaves a minimal footprint for customers who do not use Thrift, but can expand to the maximum allowed resources when customers run expensive queries
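
The two mechanisms on this slide map onto a handful of Spark properties. The following is a minimal sketch, not the configuration HDInsight actually ships: it assumes the "fair" setting on the slide corresponds to Spark's `spark.scheduler.mode` property, and the executor bounds are illustrative placeholders. The helper names `multitenant_spark_conf` and `render_spark_defaults` are hypothetical.

```python
def multitenant_spark_conf(min_executors=1, max_executors=20):
    """Return Spark properties for fair scheduling plus dynamic allocation.

    The default executor bounds are placeholders, not recommended values.
    """
    if min_executors > max_executors:
        raise ValueError("min_executors must not exceed max_executors")
    return {
        # Share resources fairly between jobs (queries) inside one application.
        "spark.scheduler.mode": "FAIR",
        # Let the application grow and shrink its executor count with load.
        "spark.dynamicAllocation.enabled": "true",
        # Dynamic allocation on YARN has traditionally required the
        # external shuffle service, so it is enabled here as well.
        "spark.shuffle.service.enabled": "true",
        # Small idle footprint, bounded maximum for expensive queries.
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }


def render_spark_defaults(conf):
    """Render properties as whitespace-separated spark-defaults.conf lines."""
    return "\n".join(f"{key} {value}" for key, value in sorted(conf.items()))


if __name__ == "__main__":
    print(render_spark_defaults(multitenant_spark_conf()))
```

Keeping the minimum low is what leaves room for other applications when the Thrift server is idle; the maximum caps how far a single expensive query can expand.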

8. What is Livy?
• REST server allowing remote job submission
• 2 modes currently: batch & interactive
• Open source project
• Co-development with Cloudera

9. Batch Job Submission

10. Sample Call
• Submit a batch job
    curl -k --user "admin:mypassword1!" -v \
      -H 'Content-Type: application/json' \
      -X POST \
      -d '{ "file":"wasb://mycontainer@mystorageaccount.blob.core.windows.net/data/SparkSimpleTest.jar", "className":"com.microsoft.spark.test.SimpleFile" }' \
      "https://mysparkcluster.azurehdinsight.net/livy/batches"
• Check the job status
    curl -k --user "admin:mypassword1!" -v \
      -X GET \
      "https://mysparkcluster.azurehdinsight.net/livy/batches/{batchId}"
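
The same batch calls can be issued from a program instead of curl. The sketch below uses only the Python standard library and builds the two requests shown on this slide. The cluster URL, credentials, and JAR path are the placeholders from the slide; the function names are hypothetical, and unlike `curl -k` this version does not disable TLS certificate verification.

```python
import base64
import json
import urllib.request

# Placeholder endpoint taken from the slide; replace with a real cluster.
LIVY_URL = "https://mysparkcluster.azurehdinsight.net/livy"


def basic_auth_header(user, password):
    """Build the HTTP Basic Authorization header that `curl --user` sends."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def build_batch_request(jar_path, class_name, user, password, base_url=LIVY_URL):
    """Build POST /batches with the JAR location and main class as JSON."""
    body = json.dumps({"file": jar_path, "className": class_name}).encode("utf-8")
    headers = {"Content-Type": "application/json", **basic_auth_header(user, password)}
    return urllib.request.Request(
        f"{base_url}/batches", data=body, headers=headers, method="POST"
    )


def build_status_request(batch_id, user, password, base_url=LIVY_URL):
    """Build GET /batches/{batchId} to check on a submitted job."""
    return urllib.request.Request(
        f"{base_url}/batches/{batch_id}",
        headers=basic_auth_header(user, password),
        method="GET",
    )


def send(request):
    """Send a request and decode the JSON response (needs a reachable cluster)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Against a live cluster, one would pass the result of `build_batch_request(...)` to `send`, read the batch identifier from the JSON reply, and then call `send(build_status_request(...))` with that identifier until the job finishes.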

11. Interactive session

12. Sample Call
• Start a Scala interactive session
    curl -k --user "admin:mypassword1!" -v \
      -H 'Content-Type: application/json' \
      -X POST \
      -d '{ "kind":"spark" }' \
      "https://mysparkcluster.azurehdinsight.net/livy/sessions"
• Post a statement
    curl -k --user "admin:mypassword1!" -v \
      -H 'Content-Type: application/json' \
      -X POST \
      -d '{"code":"1+1" }' \
      "https://mysparkcluster.azurehdinsight.net/livy/sessions/{sessionId}/statements"
• Check the statement result
    curl -k --user "admin:mypassword1!" -v \
      -X GET \
      "https://mysparkcluster.azurehdinsight.net/livy/sessions/{sessionId}/statements"
• Terminate a session
    curl -k --user "admin:mypassword1!" -v \
      -X DELETE \
      "https://mysparkcluster.azurehdinsight.net/livy/sessions/{sessionId}"
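
The four interactive calls form a small lifecycle: start a session, post statements, read their results, and terminate the session. As a rough sketch, they can be wrapped in a thin client. The class name `LivySessionClient`, its method names, and the injectable `opener` parameter are hypothetical choices made for illustration; only the HTTP methods, paths, and JSON bodies come from the slide.

```python
import json
import urllib.request


class LivySessionClient:
    """Minimal wrapper around the interactive-session endpoints on the slide."""

    def __init__(self, base_url, auth_header, opener=urllib.request.urlopen):
        self.base_url = base_url.rstrip("/")
        self.auth_header = auth_header  # e.g. a precomputed "Basic ..." value
        self.opener = opener            # injectable so the client can be tested offline

    def _call(self, method, path, payload=None):
        """Send one request and return the decoded JSON body, if any."""
        data = json.dumps(payload).encode("utf-8") if payload is not None else None
        headers = {"Authorization": self.auth_header}
        if data is not None:
            headers["Content-Type"] = "application/json"
        request = urllib.request.Request(
            self.base_url + path, data=data, headers=headers, method=method
        )
        with self.opener(request) as response:
            body = response.read().decode("utf-8")
        return json.loads(body) if body else None

    def start_session(self, kind="spark"):
        """POST /sessions; kind "spark" starts a Scala session, as on the slide."""
        return self._call("POST", "/sessions", {"kind": kind})

    def run_statement(self, session_id, code):
        """POST /sessions/{sessionId}/statements with the code to execute."""
        return self._call("POST", f"/sessions/{session_id}/statements", {"code": code})

    def list_statements(self, session_id):
        """GET /sessions/{sessionId}/statements to check statement results."""
        return self._call("GET", f"/sessions/{session_id}/statements")

    def close_session(self, session_id):
        """DELETE /sessions/{sessionId} to terminate the session."""
        return self._call("DELETE", f"/sessions/{session_id}")
```

Because a statement runs asynchronously, a caller would typically poll `list_statements` until the result appears, and close the session afterwards so that it stops holding cluster resources.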

13. Integration with Jupyter

14. Livy vs Job Server
• We used Job Server initially
• Job Server is not easy to use for simple JAR submission or the notebook case
• Job Server is good for embedding Spark work within a bigger app
• Client mode is coming to Livy soon
• The partnership with Cloudera is important

15. More on Livy
• HDI online documentation: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-apache-spark-livy-rest-interface
• Livy Repo: https://github.com/cloudera/livy

16. More on HDInsight
• HDInsight Blog: https://blogs.msdn.microsoft.com/azuredatalake/
• Contact Us
  • Lin Chan: https://www.linkedin.com/in/linchanms
  • Judy Nash: https://www.linkedin.com/in/judynash

Editor's notes

HDInsight: an Azure service dedicated to deploying and managing clusters that host big data solutions from open source communities.

Key concepts:
• What each node type does
• Introduce the cluster daemons
• Mention HA, monitoring, and telemetry as future Spark talk topics

Talk points:
• What is business intelligence? Who are the customers?
• What is Thrift? An open source protocol that handles data transfers between clients and services, similar to SOAP in functionality.
• Spark Thrift server: at launch time it creates a Spark SQL application session, then sends queries to Spark SQL for processing.
