How to use Big Data


Many companies generate very large volumes of data that can be analyzed profitably with suitable reports. Often, however, such reports are laboriously compiled by hand, so that effort and return rarely match. The market for modern data analysis has matured in recent years: the tools have become simpler, more efficient, and more scalable, and data can now be analyzed and presented interactively and in real time. In his talk, speaker Matthias Gessenay showed how data can be analyzed and presented in a visually appealing way with simple tools such as PowerBI, PowerView, PowerPivot, and SharePoint.

• Which data you can use
• Benefits of Big Data
• What can Microsoft Excel do?
• What takes the most effort?
• How best to consolidate data
• Natural Language Query
• How simple PowerBI is
• What SharePoint offers in this context

In a demo, Matthias Gessenay also showed practical applications with Natural Language Query and PowerBI. We are happy to make the slides of the talk available to you.


Digicomp
The Microsoft BI Platform in the Cloud
Instructor: Matthias Gessenay, 20 January 2016 / Matthias.gessenay@corporatesoftware.ch

Copyrights
• Some slides are taken from Microsoft's Azure Readiness slide deck (https://github.com/Azure-Readiness/CloudDataCamp/blob/master/Presentation/HDInsight/Hadoop%20in%20Azure.pptx)
• Some slides are taken from the MS Ignite session PowerBI Overview (http://www.google.ch/url?sa=t&rct=j&q=&esrc=s&source=web&cd=8&cad=rja&uact=8&ved=0ahUKEwiH3pygp7XKAhVBVRoKHQ9KCJwQFghcMAc&url=http%3A%2F%2Fvideo.ch9.ms%2Fsessions%2Fignite%2F2015%2Fdecks%2FBRK2556_Doyle.pptx&usg=AFQjCNHOr7Kb8pJEFnLKHvAMUho0AOBhjA)

Data volume
Hadoop stores files in a distributed file system
• Distributed across many servers
• Files can be spread across many nodes
Hadoop can store very large volumes of data
• Scales from a few to many thousands of nodes
• Files can be larger than the capacity of a single node

Data variety
• Hadoop stores files in a non-relational format

Hadoop vs. SQL
Scale (storage & processing)

                Relational Database              Hadoop Platform
schema          Required on write                Required on read
speed           Reads are fast                   Writes are fast
governance      Standards and structured         Loosely structured
processing      Limited, no data processing      Processing coupled with data
data types      Structured                       Multi and unstructured
best fit use    Interactive OLAP analytics,      Data discovery,
                complex ACID transactions,       processing unstructured data,
                operational data store           massive storage/processing
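
The "schema required on write" versus "schema required on read" row can be illustrated with a minimal Python sketch. It is not Hadoop or SQL Server code: the `SCHEMA` dictionary, the function names, and the sample records are hypothetical and only show where the structure is enforced in each model.

```python
import json

# Hypothetical target schema: column name -> type constructor.
SCHEMA = {"user": str, "amount": float}


def write_with_schema(table, record):
    """Schema on write: validate and coerce before the record is stored."""
    # A record lacking a column raises KeyError, so the write is rejected.
    row = {name: cast(record[name]) for name, cast in SCHEMA.items()}
    table.append(row)


def read_with_schema(raw_lines):
    """Schema on read: raw lines are stored as-is; structure is applied per query."""
    for line in raw_lines:
        doc = json.loads(line)
        # Missing fields become None instead of blocking the earlier write.
        yield {name: (cast(doc[name]) if name in doc else None)
               for name, cast in SCHEMA.items()}


relational_table = []
write_with_schema(relational_table, {"user": "anna", "amount": "12.5"})

# The raw store accepts any line; writing is just an append.
raw_store = ['{"user": "ben", "amount": 3}', '{"user": "cara", "clicks": 7}']
rows_on_read = list(read_with_schema(raw_store))
```

In this sketch the write path pays the validation cost once, while the read path pays it on every query. That is one way to read the "reads are fast" versus "writes are fast" row of the comparison.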

YARN: Next Generation Hadoop (Azure Data Lake is built on YARN)

1st Gen of Hadoop: Single Use System (Batch Apps)
• MapReduce (cluster resource management & data processing)
• HDFS (redundant, reliable storage)

2nd Gen of Hadoop: Multi Use Data Platform (Batch, Interactive, Online, Streaming, …)
• Flexible Data Processing: Hive, Pig, others…
• Batch: MapReduce
• Batch & Interactive: Tez
• Online Data Processing: HBase, Accumulo
• Stream Processing: Storm
• others …
• Efficient Cluster Resource Management & Shared Services (YARN)
• Redundant, Reliable Storage (HDFS)
• Classic Hadoop Apps
Hadoop 2.0: YARN
http://hortonworks.com/blog/introducing-apache-hadoop-yarn/

HDFS: Hadoop Storage
Data nodes
• Distributed
• Local storage
• Fault-tolerant (3 copies per block)
• Files are split into blocks
Name node
• Stores no data
• But knows where each block is located
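
The division of labor between data nodes and the name node can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the HDFS implementation: the 4-byte block size, the round-robin placement, and the names `place_blocks` and `read_file` are chosen for illustration only. Real HDFS uses much larger blocks and a rack-aware placement policy.

```python
BLOCK_SIZE = 4      # bytes per block; tiny on purpose for the example
REPLICATION = 3     # "3 copies per block", as on the slide


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut a file's bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(filename, data, data_nodes, replication=REPLICATION):
    """Store each block on `replication` distinct nodes, chosen round-robin.

    Assumes len(data_nodes) >= replication. Returns the name node's metadata:
    (filename, block index) -> list of node names. It holds locations only,
    never block contents.
    """
    name_node = {}
    names = list(data_nodes)
    for index, block in enumerate(split_into_blocks(data)):
        targets = [names[(index + k) % len(names)] for k in range(replication)]
        for node in targets:
            data_nodes[node][(filename, index)] = block
        name_node[(filename, index)] = targets
    return name_node


def read_file(filename, name_node, data_nodes, failed=()):
    """Reassemble a file block by block, skipping any nodes listed in `failed`."""
    parts = []
    index = 0
    while (filename, index) in name_node:
        alive = [n for n in name_node[(filename, index)] if n not in failed]
        parts.append(data_nodes[alive[0]][(filename, index)])
        index += 1
    return b"".join(parts)


nodes = {"dn1": {}, "dn2": {}, "dn3": {}, "dn4": {}}
meta = place_blocks("log.txt", b"hello hadoop!", nodes)
restored = read_file("log.txt", meta, nodes, failed={"dn1", "dn2"})
```

With three replicas spread over four nodes, every block in this model still has a live copy after two nodes fail, which is why the read succeeds.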

Hadoop MapReduce
Do work()   Do work()   Do work()
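
What each "Do work()" step might look like can be sketched with the classic word count. The following is a minimal single-process Python illustration of the map, shuffle, and reduce phases, not the Hadoop API. The function names and the sample chunks are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(chunk):
    """The per-node "Do work()": emit a (word, 1) pair for every word."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the values of each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


chunks = ["big data big", "data lake", "big lake"]   # one chunk per node
mapped = [map_phase(chunk) for chunk in chunks]      # would run in parallel on a cluster
word_counts = reduce_phase(shuffle(mapped))
```

Because each call to `map_phase` sees only its own chunk, the calls are independent of one another. That independence is what lets a cluster run them on different nodes at the same time.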

HDInsight: What’s Different?
• Not that much …
• HDP on Windows
• HDP on Linux
• Compute and storage are separated
• Azure Blob Storage

HDInsight Storage Infrastructure
HDInsight Compute Nodes (Large VMs): map, sort, shuffle, reduce
Azure Flat Network Storage: stream data to compute, push data back to storage
Azure Blob Storage
http://dennyglee.com/2013/03/18/why-use-blob-storage-with-hdinsight-on-azure/

Powerful self-service BI with Excel 2013

Power Query
• Suited for self-service data that fits in Excel
• Data-driven shaping: design while you drive
• Ideal for sampling data
• Partition data in Hadoop/Hive based on user workloads
• No governors to prevent users from pulling «too much data»
• Does not read compressed or binary files (yet)
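
Since there are no governors against pulling too much data, one option is to sample a large result set before it reaches Excel. The Python sketch below uses reservoir sampling, a generic technique and not a Power Query feature. The `big_result` generator is a hypothetical stand-in for a large Hive result, and the sample size of 1,000 is arbitrary.

```python
import random


def reservoir_sample(rows, k, seed=0):
    """Keep a uniform random sample of k rows from an iterable of unknown length."""
    rng = random.Random(seed)          # fixed seed so the example is repeatable
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)         # fill the reservoir first
        else:
            j = rng.randint(0, i)      # inclusive on both ends
            if j < k:
                sample[j] = row        # replace an existing entry with probability k/(i+1)
    return sample


# Hypothetical stand-in for a large Hive result set; rows are produced lazily.
big_result = ({"id": n, "value": n * n} for n in range(100_000))
sample = reservoir_sample(big_result, k=1_000)
```

The sample is built in a single pass and never holds more than `k` rows in memory, so the full result set does not have to fit on the client.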

Azure Data Lake
• Based on Apache YARN
• Practically unlimited data volumes / compute power
• Pay per use
• Currently still by invitation only
• New language: U-SQL

PowerBI
• Cloud dashboards
• On-premises technology available (DataZen)
• Connecting data via PowerBI is very easy
• Hybrid possible
