Modern Data Warehousing with the Microsoft Analytics Platform System - James Serra
The traditional data warehouse has served us well for many years, but new trends are causing it to break in four different ways: data growth, fast query expectations from users, non-relational/unstructured data, and cloud-born data. How can you prevent this from happening? Enter the modern data warehouse, which is able to handle and excel with these new trends. It handles all types of data (Hadoop), provides a way to easily interface with all these types of data (PolyBase), and can handle “big data” and provide fast queries. Is there one appliance that can support this modern data warehouse? Yes! It is the Analytics Platform System (APS) from Microsoft (formerly called Parallel Data Warehouse or PDW), which is a Massively Parallel Processing (MPP) appliance that has been recently updated (v2 AU1). In this session I will dig into the details of the modern data warehouse and APS. I will give an overview of the APS hardware and software architecture, identify what makes APS different, and demonstrate the increased performance. In addition, I will discuss how Hadoop, HDInsight, and PolyBase fit into this new modern data warehouse.
Optimizing the Supply Chain with Knowledge Graphs, IoT and Digital Twins_Moor... - Neo4j
With the world’s supply chain system in crisis, it’s clear that better solutions are needed. Digital twins built on knowledge graph technology allow you to achieve an end-to-end view of the process, supporting real-time monitoring of critical assets.
Data Ingestion in Big Data and IoT platforms - Guido Schmutz
Many Big Data and IoT use cases are based on combining data from multiple data sources and making it available on a Big Data platform for analysis. The data sources are often very heterogeneous, ranging from simple files and databases to high-volume event streams from sensors (IoT devices). It’s important to retrieve this data in a secure and reliable manner and integrate it with the Big Data platform so that it is available for analysis in real time (stream processing) as well as in batch (typical big data processing). In the recent past, some new tools have emerged that are especially capable of handling the process of integrating data from outside, often called Data Ingestion. From an outside perspective, they are very similar to traditional Enterprise Service Bus infrastructures, which larger organizations often use to handle message-driven and service-oriented systems. But there are also important differences: they are typically easier to scale horizontally, offer a more distributed setup, are capable of handling high volumes of data/messages, provide very detailed monitoring at the message level, and integrate very well with the Hadoop ecosystem. This session will present and compare Apache NiFi, StreamSets, and the Kafka ecosystem and show how they handle data ingestion in a Big Data solution architecture.
Data Quality Patterns in the Cloud with Azure Data Factory - Mark Kromer
This is my slide presentation from Pragmatic Works' Azure Data Week 2019: Data Quality Patterns in the Cloud with Azure Data Factory using Mapping Data Flows
Data Catalog in Denodo Platform 7.0: Creating a Data Marketplace with Data Vi... - Denodo
Watch Alberto's session from Fast Data Strategy on-demand here: https://buff.ly/2wByS41
Gartner’s recently published report “Data Catalogs Are the New Black in Data Management Analytics” emphasizes the importance of data catalogs.
Watch this session to learn more about:
• The vision behind the Denodo Data Catalog
• How to maximize information value with the Denodo Data Catalog
• Why it is essential to combine data delivery with a data catalog
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW... - Amazon Web Services
In this session, we discuss architectural principles that help simplify big data analytics.
We'll apply principles to various stages of big data processing: collect, store, process, analyze, and visualize. We'll discuss how to choose the right technology in each stage based on criteria such as data structure, query latency, cost, request rate, item size, data volume, durability, and so on.
Finally, we provide reference architectures, design patterns, and best practices for assembling these technologies to solve your big data problems at the right cost.
Kappa vs Lambda Architectures and Technology Comparison - Kai Wähner
Real-time data beats slow data. That’s true for almost every use case. Nevertheless, enterprise architects build new infrastructures with the Lambda architecture that includes separate batch and real-time layers.
This video explores why a single real-time pipeline, called Kappa architecture, is the better fit for many enterprise architectures. Real-world examples from companies such as Disney, Shopify, Uber, and Twitter explore the benefits of Kappa but also show how batch processing fits into this discussion positively without the need for a Lambda architecture.
The main focus of the discussion is on Apache Kafka (and its ecosystem) as the de facto standard for event streaming to process data in motion (the key concept of Kappa), but the video also compares various technologies and vendors such as Confluent, Cloudera, IBM Red Hat, Apache Flink, Apache Pulsar, AWS Kinesis, Amazon MSK, Azure Event Hubs, Google Pub Sub, and more.
Video recording of this presentation:
https://youtu.be/j7D29eyysDw
Further reading:
https://www.kai-waehner.de/blog/2021/09/23/real-time-kappa-architecture-mainstream-replacing-batch-lambda/
https://www.kai-waehner.de/blog/2021/04/20/comparison-open-source-apache-kafka-vs-confluent-cloudera-red-hat-amazon-msk-cloud/
https://www.kai-waehner.de/blog/2021/05/09/kafka-api-de-facto-standard-event-streaming-like-amazon-s3-object-storage/
Deep-dive into Microservices Patterns with Replication and Stream Analytics
Target Audience: Microservices and Data Architects
This is an informational presentation about microservices event patterns, GoldenGate event replication, and event stream processing with Oracle Stream Analytics. This session will discuss some of the challenges of working with data in a microservices architecture (MA), and how the emerging concept of a “Data Mesh” can go hand-in-hand to improve microservices-based data management patterns. You may have already heard about common microservices patterns like CQRS, Saga, Event Sourcing and Transaction Outbox; we’ll share how GoldenGate can simplify these patterns while also bringing stronger data consistency to your microservice integrations. We will also discuss how complex event processing (CEP) and stream processing can be used with event-driven MA for operational and analytical use cases.
Business pressures for modernization and digital transformation drive demand for rapid, flexible DevOps, which microservices address, but also for data-driven Analytics, Machine Learning and Data Lakes which is where data management tech really shines. Join us for this presentation where we take a deep look at the intersection of microservice design patterns and modern data integration tech.
This is a brief technology introduction to Oracle Stream Analytics, and how to use the platform to develop streaming data pipelines that support a wide variety of industry use cases
We have embraced Cloud and Open-Source further enabling the analytics ecosystems by creating new integration capabilities at scale.
Simplifying technology footprints to make it easier to buy
Bringing scale to analytics
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J... - Flink Forward
Netflix processes trillions of events and petabytes of data a day in the Keystone data pipeline, which is built on top of Apache Flink. As Netflix has scaled up original productions annually enjoyed by more than 150 million global members, data integration across the streaming service and the studio has become a priority. Scalably integrating data across hundreds of different data stores in a way that enables us to holistically optimize cost, performance and operational concerns presented a significant challenge. Learn how we expanded the scope of the Keystone pipeline into the Netflix Data Mesh, our real-time, general-purpose, data transportation platform for moving data between Netflix systems. The Keystone Platform’s unique approach to declarative configuration and schema evolution, as well as our approach to unifying batch and streaming data and processing will be covered in depth.
“The upcoming sections cover introductory topic areas pertaining to the fundamental models used to categorize and define clouds and their most common service offerings, along with definitions of organizational roles and the specific set of characteristics that collectively distinguish a cloud.”
The new Microsoft Azure SQL Data Warehouse (SQL DW) is an elastic data warehouse-as-a-service and is a Massively Parallel Processing (MPP) solution for "big data" with true enterprise class features. The SQL DW service is built for data warehouse workloads from a few hundred gigabytes to petabytes of data with truly unique features like disaggregated compute and storage allowing for customers to be able to utilize the service to match their needs. In this presentation, we take an in-depth look at implementing a SQL DW, elastic scale (grow, shrink, and pause), and hybrid data clouds with Hadoop integration via Polybase allowing for a true SQL experience across structured and unstructured data.
Working From Anywhere with Advanced Load Balancing and VMware Horizon VDI - Avi Networks
In this webinar, you will learn how to:
- Simplify your infrastructure and operations to deliver virtual desktops and apps
- Troubleshoot end-user experience issues with point and click simplicity
- Eliminate costly over-provisioning of load balancers and save costs for VDI deployments
- Deploy load balancing consistently for virtual desktops in any cloud environment
Introduction to Event Driven Architecture - CitiusTech
In this document, we present the idea of EDA as a favored software architecture pattern, and check out how a leading healthcare system leveraged EDA to successfully meet its dynamic business needs.
Big data real-time architectures:
How do we do big data processing in real time?
What architectures are out there to support this paradigm?
Which one should we choose?
What advantages and pitfalls do they have?
This is a presentation on three-tier architecture, compiled from information gathered from several websites and from a project we prepared at university.
AWS offers a wide variety of database services that adapt to your application's requirements. The database services are fully managed and can be deployed in a matter of minutes with just a few clicks.
https://aws.amazon.com/es/products/databases/
UADY / FMAT seminar, March 2014. Presents topics related to cloud computing from an academic point of view, covering both current topics and future research challenges and opportunities.
The exponential growth of web and mobile applications and the steady arrival of internet-connected devices brought a change in data management and an unprecedented transformation compared with how things were done decades ago and with how technology platforms were designed and operated. Requirements arising from the new Internet economy pushed companies launching new projects and solutions beyond the limits of relational databases (RDBMS) and introduced a new type of database into technology environments: NoSQL architectures.
There is a long way to go before considering implementing a solution on a platform that is entirely new to our local environment, which is related to the fact that there is little or no knowledge of, or reference to, existing implementations of such platforms.
That is why people speak of a paradigm shift: it is a new approach to building, implementing, and supporting IT architectures at massive scale. Today we are used to many matters being treated as sometimes unquestionable facts; this is the result of vendors' marketing and sales campaigns, combined with the resignation of customers who have grown up believing that nothing better is available.
Walmart and Denodo Case Study: How to Successfully Tackle the Move to the Cloud of the... - Denodo
Watch full webinar here: https://bit.ly/33eqN3y
Cloud data management systems have reached the levels of maturity and security needed to support complex BI architectures that run 100% in the cloud. However, the transition is not simple and in many cases can take years. In this session, together with Miguel Angel Burguete, Senior Data Architect at Walmart, we will analyze the company's cloud adoption strategy and its use of the Denodo data virtualization tool to streamline its data management processes. We will also review the complications of the hybrid state, in which new and old systems are combined, and see how to use tools such as data virtualization to simplify and speed up the process.
This is a diagram of technical assistance, or technical support, which companies provide so that their customers can use their products or services in the way they were put on sale.
In this document we analyze certain concepts related to worksheets 1 and 2, and we conclude by explaining why it is important to develop our thinking skills.
Sara Sofia Bedoya Montezuma.
9-1.
High-intensity discharge lamps... - espinozaernesto427
High-intensity discharge lamps are a type of electric gas-discharge lamp that produces light by means of an electric arc between tungsten electrodes housed inside a translucent or transparent tube of fused quartz or alumina.
the most efficient lamps on the market, owing to their lower consumption and the amount of light they emit. They have a service life of up to 50,000 hours and generate no heat at all. If you want to change your home's lighting to make it much more efficient, this is your best option!
The newer high-intensity discharge lamps produce more visible light per unit of electrical energy consumed than fluorescent and incandescent lamps, since a larger proportion of their radiation is visible light rather than infrared. However, the lumen output of HID lighting can deteriorate by up to 70% over 10,000 hours of operation.
Many modern vehicles use HID bulbs for their main lighting systems, although some applications are now moving from HID bulbs to LED and laser technology. Lamp models range from the typical 35 to 100 W automotive lamps to those of more than 15 kW used in IMAX cinema projectors.
HID technology is not new; it was first demonstrated by Francis Hauksbee in 1705.
Related lamp types: Nernst lamp, incandescent lamp, discharge lamp, fluorescent lamp, compact fluorescent lamp, metal-halide lamp, sodium-vapor lamp, mercury-vapor lamp, neon lamp, deuterium lamp, xenon lamp, LED lamp, plasma lamp, flash (photography).
High-intensity discharge (HID) lamps are a type of gas-discharge lamp widely used in the lighting industry. These lamps produce light by creating an electric arc between two electrodes through an ionized gas. HID lamps are known for their high efficacy in converting electricity into light and for their long service life.
Unlike fluorescent lights, which need a phosphor coating to emit visible light, HID lamps need no coating on the inside of their tubes. The electric arc itself emits visible light. However, some metal-halide lamps and many mercury-vapor lamps have a phosphor coating on the inside of the bulb to improve the light spectrum and color rendering. HID lamps are available in various wattages, ranging from 25 watts for self-ballasted metal-halide lamps and 35 watts for high-intensity sodium-vapor lamps up to 1,000 watts for mercury-vapor and high-intensity sodium-vapor lamps, and even up to 1,500 watts for metal-halide lamps.
HID lamps require special control gear called a ballast in order to operate.
Artificial Intelligence and Cybersecurity.pdf - Emilio Casbas
A compilation of the most interesting points from various presentations, from the visionary concepts of Alan Turing, through Hans Moravec's paradox and Max Tegmark's description of the Singularity, to the innovative advances of ChatGPT, and how AI is transforming digital security and protecting our lives.
(PROJECT) Boundaries between Art, the Media, and Computing - vazquezgarciajesusma
In this research project we will delve into the fascinating world of the intersection between art and the media in the field of computing.
The rapid evolution of technology has led to an ever closer fusion between art and digital media, generating new forms of expression and communication.
Continuing with the development of our project, we will use the inductive method because we organize our research from the particular to the general. The methodological design of the work is non-experimental and cross-sectional, since there is no deliberate manipulation of the variables or of the situation; instead, the phenomena are observed as they occur in their natural context and then analyzed.
The design is cross-sectional because the data are collected at a single point in time and its purpose is to describe variables and analyze their interrelationship; we only want to know the incidence and value of one or more variables. The design will be descriptive because it requires establishing the relationship between two or more of them.
We gathered the information for this project through a survey, so that students are aware of the evolution of art and the media in computing and of its importance for the institution.
3Redu: Responsibility, Resilience, and Respect - cdraco
Hello! We are 3Redu, made up of Juan Camilo and Cristian. We understand the difficulties many students face when trying to grasp mathematical concepts. Our goal is to provide an inclusive and accessible solution for everyone.
3. What are we going to talk about?
• Messaging, real time, and batch
• Data integration problems
• Lambda architecture
• How does it help me?
• Kafka, Spark, Cassandra, Redis
• Implementation
5. Business needs
• Scalable infrastructure
• Data partitioning
• Replication
• Decentralized (shared-nothing) architecture
• Parallelization
• Isolation
• Data locality
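The partitioning, replication, and shared-nothing items above can be illustrated with a minimal sketch. It assumes hash-based partitioning and a simple "next nodes in the ring" replica placement; the function names, node names, and placement rule are hypothetical and are not taken from the slides or from any specific product.

```python
import hashlib
from typing import List


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition using a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


def replicas_for(partition: int, nodes: List[str], replication_factor: int) -> List[str]:
    """Place a partition on consecutive nodes; each node stores only its own partitions."""
    start = partition % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]


nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
p = partition_for("customer-42", num_partitions=6)
owners = replicas_for(p, nodes, replication_factor=2)
```

Because every key always maps to the same partition, work on different partitions can proceed in parallel on different nodes without shared state.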
6. The problem
• The problem today is that not every application can be batch-only (batch processing)
• Some applications need to compute over the data in order to make decisions in real time
• E.g., financial alerts, fraud detection, etc.
7. The need
I need fast access to historical data (Big Data) to run predictive models, but also to data in real time.
9. What does the Lambda architecture consist of?
• It was developed by Nathan Marz
• It has three layers
• Batch layer
• Serving layer
• Speed layer
10. Batch layer
• Responsible for storing all incoming data in a repository such as HDFS, Cassandra, Ceph, etc.
• Runs the computation over that data to produce views or arbitrary information.
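As a rough illustration of the batch layer's two duties, the sketch below keeps an append-only master dataset and recomputes a view from scratch. The in-memory list is only a stand-in for a store such as HDFS or Cassandra, and the event fields (`account`, `amount`) are invented for the example.

```python
from collections import Counter

# Stand-in for the master dataset; a real deployment would use HDFS, Cassandra, Ceph, etc.
master_dataset = []


def append_event(event: dict) -> None:
    """Store every incoming event; existing events are never updated in place."""
    master_dataset.append(dict(event))


def compute_batch_view(events) -> Counter:
    """Recompute the view over all stored events (here: total amount per account)."""
    view = Counter()
    for event in events:
        view[event["account"]] += event["amount"]
    return view


append_event({"account": "A", "amount": 100})
append_event({"account": "B", "amount": 40})
append_event({"account": "A", "amount": 25})
batch_view = compute_batch_view(master_dataset)
```

Recomputing from the full dataset is slow but simple: a bug in the view logic can be fixed and the view rebuilt, because the raw events are still there.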
11. Serving layer
• The output of the batch layer is a dataset containing the computed views
• The serving layer is responsible for indexing and exposing those views so that they can be queried
12. Speed layer
• The speed layer is in charge of performing the computation in real time.
• Real-time views are transient: they can be discarded as soon as the data has propagated to the batch and serving layers
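How the three layers fit together at query time can be sketched as follows. This is a simplified, single-process illustration under the assumption that views are additive counters; the class and method names are hypothetical.

```python
from collections import Counter


class LambdaViews:
    """Toy model: a query merges the batch view with the transient real-time view."""

    def __init__(self):
        self.batch_view = Counter()
        self.realtime_view = Counter()

    def on_event(self, key, amount):
        """Speed layer: update the real-time view incrementally as events arrive."""
        self.realtime_view[key] += amount

    def run_batch(self, all_events):
        """Batch layer caught up: rebuild the batch view and drop the real-time view."""
        self.batch_view = Counter()
        for key, amount in all_events:
            self.batch_view[key] += amount
        self.realtime_view.clear()

    def query(self, key):
        """Serving side: answer from both views."""
        return self.batch_view[key] + self.realtime_view[key]


history = [("A", 100), ("B", 40)]
views = LambdaViews()
views.run_batch(history)

views.on_event("A", 25)            # arrives after the last batch run
before = views.query("A")          # served from batch view + real-time view

history.append(("A", 25))          # the event reaches the master dataset
views.run_batch(history)
after = views.query("A")           # now served from the batch view alone
```

The answer is the same before and after the batch run; only the layer that supplies the most recent part of it changes.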
14. The data integration problem
• There are many systems, and they need to communicate with one another
• Through APIs, REST services, web services, etc.
• Likewise, there are other data repositories where this data must be stored and interacted with.
15. The data integration problem
[Diagram: web applications, logs, indexing and search, legacy applications, DB1, DB2, DB3, DB4]
17. Kafka
• High-performance distributed messaging
• Decouples data flows
• Handles massive data loads
• Supports massive numbers of consumers
• Distribution and partitioning across nodes
• Automatic recovery from broker failures
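The decoupling and partitioning ideas above can be shown with a toy in-memory log. This is not the Kafka client API: `MiniTopic`, its methods, and the CRC32 partitioner are assumptions made only to keep the sketch runnable without a broker.

```python
import zlib
from collections import defaultdict


class MiniTopic:
    """Toy stand-in for a partitioned topic with per-consumer-group offsets."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]
        # Each consumer group tracks its own read position in every partition.
        self.offsets = defaultdict(lambda: [0] * num_partitions)

    def produce(self, key, value):
        """Append a record; records with the same key land in the same partition."""
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p

    def consume(self, group):
        """Return the records this group has not read yet and advance its offsets."""
        records = []
        for p, log in enumerate(self.partitions):
            start = self.offsets[group][p]
            records.extend(log[start:])
            self.offsets[group][p] = len(log)
        return records


topic = MiniTopic(num_partitions=3)
p1 = topic.produce("sensor-1", "t=20")
p2 = topic.produce("sensor-1", "t=21")
topic.produce("sensor-2", "t=19")

to_cassandra = topic.consume("cassandra-writer")  # hypothetical group names
to_spark = topic.consume("spark-job")
again = topic.consume("spark-job")                # nothing new to read
```

Producers never address a consumer directly, and each group reads the full stream at its own pace, which is what lets one data flow feed several downstream systems.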
18. Kafka
[Diagram: web applications, logs, indexing and search, legacy applications, DB1, DB2, DB3, DB4]
19. Lambda architecture
[Diagram: App 1, App 2, App 3, App 4; batch layer; serving layer; speed layer; real-time views; batch views; admin, users, dashboards]
20. Use cases
• Stream data from Kafka to Cassandra
• Stream data from Kafka to Spark and write to Cassandra
• Read data from Spark Streaming and write it to Cassandra
• Read data from Cassandra into Spark
22. Spark Streaming
• I need continuous results on a stream
• The data must be processed and either returned to an application or persisted
• Continuous flow of data through discretized streams (DStreams)
• Guarantees that each record is processed only once
• Machine learning algorithms from MLlib can be applied to incoming data
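The idea behind a discretized stream is that a continuous flow is cut into small batches by time interval, and each batch is then processed like ordinary batch data. The sketch below imitates that in plain Python; it does not use the Spark API, and the `discretize` helper and the sample events are invented for illustration.

```python
from collections import defaultdict


def discretize(events, batch_interval):
    """Group (timestamp, value) events into consecutive micro-batches by interval."""
    batches = defaultdict(list)
    for timestamp, value in events:
        batches[int(timestamp // batch_interval)].append(value)
    # Intervals with no events are simply skipped in this sketch.
    return [batches[i] for i in sorted(batches)]


events = [(0.2, 5), (0.9, 7), (1.1, 1), (2.5, 4), (2.7, 6)]
micro_batches = discretize(events, batch_interval=1.0)

# Each micro-batch is processed with normal batch logic, yielding a continuous
# sequence of results (here: one sum per interval).
sums = [sum(batch) for batch in micro_batches]
```

A shorter interval lowers latency but means more, smaller batches to schedule.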
24. OK, OK, but I want real-time dashboards
There are several options
1. Once Spark Streaming / Storm has done the computation, I persist it in a cache such as Redis
2. Redis is an in-memory data structure server (similar to Memcached)
3. I use Node.js with Socket.IO and Redis publish/subscribe to push the data
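The publish/subscribe step in option 3 can be sketched with an in-memory stand-in. `MiniPubSub` is a hypothetical class, not the Redis client; it only mirrors the pattern in which a publisher sends to a channel and every current subscriber receives the message.

```python
from collections import defaultdict


class MiniPubSub:
    """Toy in-memory publish/subscribe bus, used here in place of a Redis server."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        """Register a callback to run for every message published on the channel."""
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        """Deliver the message to all subscribers and return how many received it."""
        for callback in self.subscribers[channel]:
            callback(message)
        return len(self.subscribers[channel])


# Two dashboards, modelled as lists that collect whatever is pushed to them.
dashboard_a, dashboard_b = [], []
bus = MiniPubSub()
bus.subscribe("metrics", dashboard_a.append)
bus.subscribe("metrics", dashboard_b.append)

# The streaming job publishes a freshly computed result (made-up payload).
delivered = bus.publish("metrics", {"fraud_alerts": 3})
missed = bus.publish("other-channel", {"x": 1})  # no subscribers on this channel
```

In the setup the slide describes, the subscriber would be a Node.js process that forwards each message to browsers over Socket.IO; messages published while nobody is subscribed are not stored.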
25. Second option
• The result computed by Spark Streaming is sent back to Kafka, where some consumer (an application) will render the data
• Kafka also integrates with Node.js
26. Conclusion
• The Lambda architecture involves many technologies and a lot of infrastructure
• It can be very useful in business use cases
• The configurations of Kafka, Spark Streaming, Spark (cluster), Cassandra, Redis, etc. must be taken into account
• GIVE VALUE TO YOUR DATA