SlideShare una empresa de Scribd logo
1 de 2227
Descargar para leer sin conexión
Contents
BigInsights 4.1.0 1
BigInsights 3
Accessibility 3
Using keyboard shortcuts and accelerators 4
Terms and Conditions 7
Notices 8
Product overview 13
BigInsights information roadmap 14
Introduction to BigInsights 18
IBM Open Platform with Apache Hadoop and the BigInsights value-add services 20
IBM BigInsights Quick Start Edition for Non-Production Environments 21
Features and architecture 22
File systems 24
Hadoop Distributed File System 25
MapReduce frameworks 26
Hadoop MapReduce 27
Yarn 28
Open source technologies 29
Ambari 32
Flume 34
Hadoop 36
HBase 38
Hive 40
Kafka 43
Oozie 44
Pig 45
Slider 47
Solr 49
Spark 50
Sqoop 53
ZooKeeper 54
Other Apache projects 55
Text Analytics 56
IBM Big SQL 57
Integration with other IBM products 60
Suggested services layout for IBM Open Platform with Apache Hadoop and
BigInsights value-added services 62
Scenarios for working with big data 63
Predictive modeling 64
Consumer sentiment insight 66
Research and business development 67
Where BigInsights fits in an enterprise data architecture 69
Release notes 70
Release notes - IBM Open Platform with Apache Hadoop, and BigInsights value-
add services, Version 4.1 71
What's new for Version 4.1 75
Installing 80
Installing IBM Open Platform with Apache Hadoop 81
Important installation information 83
Preparing to install IBM Open Platform with Apache Hadoop 86
Reference Architecture 88
Suggested services layout for IBM Open Platform with Apache Hadoop and
BigInsights value-added services 89
Meeting minimum system requirements 91
Collect host names and other information for your installation 92
Preparing your environment 93
Users and groups for IBM Open Platform with Apache Hadoop 102
Default Ports created by a typical installation 104
Setting up port forwarding from private to edge nodes 106
Configuring your browser 108
Configuring authentication 109
Configuring LDAP authentication on RHEL 6 110
Obtaining software for the IBM Open Platform with Apache Hadoop 113
Downloading the IBM repository definition for the IBM Open Platform with
Apache Hadoop 115
Creating a mirror repository for the IBM Open Platform with Apache Hadoop
software 116
Creating a repository for Spectrum Scale 118
Installing IBM Open Platform with Apache Hadoop on SUSE Linux Enterprise
Server (SLES) by using tar files 120
Running the installation package 122
Validating your installation 131
Upgrading the Java (JDK) version 132
Installing and configuring HttpFS on IBM Open Platform with Apache Hadoop 134
Installing and configuring WebHDFS with Knox on IBM Open Platform with Apache
Hadoop 137
Installing additional services in your IBM Open Platform 139
Cleaning up nodes before reinstalling software 140
HostCleanup.ini file 144
HostCleanup_Custom_Actions.ini file 146
Advanced installation planning 147
Serviceability tools 148
Directories created when installing IBM Open Platform with Apache Hadoop150
Planning for high availability 152
Configuring Slider HBase 153
Setting up a dual network for IBM Open Platform with Apache Hadoop 157
Enabling CMX compression to compress the output of the intermediate jobs
generated by Pig 159
Configuring YARN container execution 160
Installing the IBM BigInsights value-added services on IBM Open Platform with Apache
Hadoop 162
Preparing to install the BigInsights value-add services 164
Users, groups, and ports for BigInsights value-add services 169
Default Ports created by a typical BigInsights value-add services installation 171
Obtaining the BigInsights value-add services 172
Additional related software 175
Obtaining the BigInsights Quick Start Edition for non-production use 177
Installing the BigInsights value-add packages 178
Installing BigInsights Home 185
Installing the BigSheets service 187
Installing the BigInsights - Big SQL service 190
Preinstallation checker utility for Big SQL 194
Clean-up utility for Big SQL 198
Migrating from Big SQL 1.0 200
Installing the BigInsights - Data Server Manager service 203
Installing the Text Analytics service 207
Installing the Big R service 211
Installing Big R on a workstation or notebook 217
Enabling Knox for value-add services 218
Removing BigInsights value-add services 222
Installing the Enterprise Manager module for IBM Open Platform with Apache Hadoop
226
Acquiring Spectrum Scale and Platform Symphony 227
Spectrum Scale FPO (GPFS) 228
Installing Spectrum Scale FPO (GPFS) 229
Performance analysis for GPFS 230
Platform Symphony 232
Installing and configuring Platform Symphony 234
Installing the IBM Platform Symphony service stack on the Ambari server
235
Adding Platform Symphony as an Ambari service 237
Post-installation checks 239
Running a service check on the Platform Symphony cluster 240
Verifying component installation and configuration 241
Cleaning up an incomplete Symphony deployment from Ambari 242
Post-installation configuration 244
Mapping Hadoop YARN queues to Symphony YARN queues 245
Granting MySQL access privileges 248
Updating the Ambari GUI manually after integrating Symphony 249
Changing port numbers 251
Configuring master failover in an Ambari environment 252
Configuring automatic failure recovery for the Symphony YARN Resource
Manager in an Ambari environment 253
Opening the Platform Management Console from the Ambari console 254
Controlling hosts with Platform Symphony 255
Integrating open source MapReduce, YARN, or Spark in a Symphony cluster
256
Undoing integration of open source MapReduce, YARN, or Spark in a
Symphony cluster 257
Upgrading IBM Platform Symphony to IBM Open Platform 4.1 258
Getting started with Spark on EGO 261
Submitting a sample Spark job 263
Spark on EGO FAQs 265
Installing the free IBM Open Platform with Apache Hadoop and BigInsights Quick Start
Edition, non-production software 266
IBM Open Platform with Apache Hadoop and IBM BigInsights Quick Start Edition
for non-production environments, v4.1: Docker image README 267
Optional: Get nodes on a cloud environment 276
IBM BigInsights Quick Start Edition for Non-Production Environment, v4.1: VMware
image README 279
Tutorials 284
Tutorial: Analyzing big data with BigSheets 285
Lesson 1: Creating master workbooks from social media data 287
Lesson 2: Tailoring your data by creating child workbooks 290
Lesson 3: Combining the data from two workbooks 293
Lesson 4: Creating columns by grouping data 297
Lesson 5: Viewing data in BigSheets diagrams 299
Lesson 6: Visualizing and refining the results in charts 300
Lesson 7: Exporting data from your workbooks 306
Summary of analyzing data with BigSheets tutorial 308
Tutorial: Developing Big SQL queries to analyze big data 309
Setting up the Big SQL tutorial environment 311
Creating a directory in the distributed file system to hold your samples 312
Getting the sample data 313
Accessing the Big SQL sample data installed in the Big SQL service 314
Downloading sample data from a developerWorks source 316
Creating tables and loading sample data 317
Module 1: Creating and running SQL script files 320
Lesson 1.1: Creating an SQL script file 322
Lesson 1.2: Creating and running a simple query to begin data analysis 323
Lesson 1.3: Creating a view that represents the inventory shipped by branch
327
Lesson 1.4: Analyzing products and market trends with Big SQL Joins and
Predicates 329
Lesson 1.5: Creating advanced Big SQL queries that include common table
expressions, aggregate functions, and ranking 334
Lesson 1.6: Advanced: Porting an existing Hive UDF to Big SQL 337
Lesson 1.7: Advanced: Creating and running a simple Big SQL query from a
JDBC client application 339
Module 2: Analyzing big data by using Big SQL and BigSheets 343
Lesson 2.1: Preparing queries to export to BigSheets that examine the results
of sales by year 345
Lesson 2.2: Exporting Big SQL data about total sales by year to BigSheets 347
Lesson 2.3: Creating tables for BigSheets from other tables 349
Lesson 2.4: Exporting BigSheets data about IBM Watson blogs to Big SQL
tables 351
Task 2.4.1: Creating and modifying a BigSheets workbook from JSON
Array formatted data 352
Task 2.4.2: Exporting the BigSheets blog data workbook to a TSV file 354
Task 2.4.3: Creating a Big SQL script that creates Big SQL tables from the
exported TSV file 355
Task 2.4.4: Exporting the BigSheets workbook as a JSON Array for use
with a SerDe application in Big SQL 357
Task 2.4.5: Creating a Big SQL table that uses the SerDe application to
process the Watson blog data 359
Module 3: Analyzing Big SQL data in a client spreadsheet program 361
Lesson 3.1: Installing the IBM Data Server Driver Package for the client ODBC
drivers 362
Lesson 3.2: Importing Big SQL data to a client spreadsheet program 365
Module 4: Using Federation in Big SQL 367
Lesson 4.1: Setting up the client data source 368
Lesson 4.2: Configuring Big SQL as the Federated Server 370
Lesson 4.3: Integrating information from two companies 373
Module 5: Working with HBase tables 375
Lesson 5.1: Creating and populating HBase tables 376
Lesson 5.2: Creating and using HBase views 378
Summary of developing Big SQL queries to analyze big data 379
Tutorial: Analyzing big data with Big R 380
Lesson 1: Uploading the airline data set to the BigInsights server with Big R 382
Lesson 2: Exploring the structure of the data set with Big R 384
Lesson 3: Analyzing data with Big R 385
Lesson 4: Visualizing big data with Big R 387
Lesson 5: Extending R packages to work with big data 389
Lesson 6: Building scalable machine learning models with Big R 392
Summary of analyzing data with Big R tutorial 395
Tutorial: Analyzing text with BigInsights Text Analytics 396
Lesson 1: Setting up your project 398
Lesson 2: Selecting input documents and identifying examples 400
Lesson 3: Creating and testing extractors 403
Lesson 4: Writing and testing extractors for candidates 406
Lesson 5: Creating and testing final extractors 411
Lesson 6: Finalizing and saving the extractor 413
Summary of creating your first Text Analytics extractor 414
Importing and exporting data 415
Identifying your data resources 417
Recommended tools for importing data 420
Importing data at rest 421
Importing data by using Hadoop shell commands 423
Importing data in motion 425
Importing data by using Flume 427
Importing data from a data warehouse 428
Big SQL LOAD 430
How to diagnose and correct LOAD HADOOP problems 460
How to monitor the progress of LOAD HADOOP 463
Importing and exporting DB2 data by using Sqoop 464
Importing IMS data by using Sqoop 466
Integrating the Teradata Connector for Hadoop 468
Installing the Teradata Connector for Hadoop 469
Importing data with the Teradata Connector for Hadoop 471
Exporting data with the Teradata Connector for Hadoop 475
Running tdimport or tdexport from an Oozie application 479
Corresponding Sqoop and Teradata options 482
Setting up and administering security 484
Securing IBM Open Platform with Apache Hadoop 485
Setting up HTTPS with a self-signed certificate for the Ambari web interface 487
Setting up HTTPS with an authority certificate for the Ambari web interface 489
Setting up two-way SSL between the Ambari server and Ambari agents 491
Manually configuring SSL support for HBase REST gateway with Knox 492
Manually configuring SSL support for HBase, MapReduce, YARN, and HDFS web
interfaces 495
Manually configuring SSL support for HiveServer2 499
Manually configuring SSL support for Oozie 501
Apache Knox gateway overview 503
Hadoop service access in Knox 506
Knox Gateway directories 509
Knox Gateway samples 511
Changing the Knox Gateway port or path 513
Managing the master secret 514
Redeploying cluster topologies 515
Manually starting and stopping Apache Knox 518
Adding a new service to the Knox Gateway 519
Cluster topology definition in Apache Knox Gateway 520
Knox topology configuration to connect to Hadoop cluster services 522
Setting up Hadoop service URLs 523
Example: service definitions 524
Service connectivity validation 526
Configuring authentication on Knox and Ambari 528
Setting up LDAP authentication in Knox 529
Example: Active Directory configuration 531
Example: OpenLDAP configuration 532
Setting up LDAP or Active Directory authentication in Ambari 533
Knox Gateway Identity Assertion 537
Defining an identify-assertion provider 539
Adding a user mapping rule to an identity-assertion provider: 540
Concat Identity Assertion 541
User Mapping Example 542
Configuring group mapping 543
Knox Gateway security 545
Implementing web application security 546
Configuring Knox with a secured Hadoop cluster 548
Configuring wire encryption (SSL) 550
Using CA-signed certificates for production 551
Kerberos in IBM Open Platform with Apache Hadoop 552
Overview of Kerberos in IBM Open Platform with Apache Hadoop 553
Setting up Kerberos for IBM Open Platform with Apache Hadoop 556
Setting up a KDC manually 561
Manually generating keytabs for Kerberos authentication 563
Properties in the Kerberos descriptors 570
Enabling SPNEGO authentication for IBM Open Platform with Apache Hadoop
572
User and group management 573
Changing the administrator account password 575
Creating a local user 576
Changing the password of a local user 577
Deleting a local user 578
Creating a local group 579
Managing local group membership 580
Deleting a local group 581
Enabling transparent data encryption 582
Securing the BigInsights value-added services 585
Restarting Knox to access value added components 586
Setting up Kerberos for the BigInsights - Big R service 587
Setting up Kerberos for the BigInsights - Big SQL service 588
Setting up Kerberos for the BigInsights - Text Analytics service 590
Setting up Kerberos for the BigInsights - BigSheets service 591
Understanding and configuring BigSheets access of Big SQL table data 593
Enabling SSL encryption for the Big R service 595
Managing Access Control Lists (ACL) and Authorizations 597
ACL Management for Hive 598
Storage-Based Authorization 600
SQL-Standard Based Authorization 603
ACL Management for HDFS 607
ACL Management for HBase 609
ACL Management for YARN 612
Administering Ambari and components 613
High availability 614
Setting up NameNode high availability 615
Pointing to a new NameNode location in the Hive metastore 618
Setting up Oozie high availability 621
Setting up Resource Manager high availability 624
Enabling work-preserving ResourceManager restart 625
Enabling Hive metastore high availability 627
Enabling HiveServer2 high availability 629
High availability in Big SQL 632
Enabling Big SQL high availability 635
Disabling Big SQL high availability 638
Example of configuring clients to work with Big SQL high availability 639
Configuring high availability for HBase 641
Managing Flume 642
Flume configuration scenario 644
Decommissioning slave nodes 646
Ambari alerts and monitoring services 649
Working with alerts in the Ambari web interface 652
Configuring notifications 654
Creating or editing alert groups 657
Creating alert notification to track status changes in high availability failover 659
Pre-defined alerts in Ambari 663
HDFS service alerts 664
NameNode High Availability service alerts 667
YARN service alerts 669
MapReduce2 service alerts 671
HBase service alerts 672
Hive service alerts 674
Oozie servie alerts 675
ZooKeeper service alerts 676
Ambari alerts 677
Ambari metrics 678
Working with widgets 682
Working with service specific widgets 685
How to switch the Ambari metrics system to a distributed mode 689
Ambari views 691
Creating a Capacity-Scheduler view instance 692
Creating a Files view instance 694
Creating a Hive view instance 697
Additional Hive view configurations and setup 701
Creating a Pig view instance 703
Creating a Slider view instance 707
Developing applications to access and manage data 709
Developing Big SQL applications in your Hadoop environment 710
What's New with the Big SQL server 712
Big SQL configuration and log management 714
JSqsh client 715
Big SQL connections 716
Security in Big SQL 717
HBase tables 718
Data types that are supported by Big SQL 719
Big SQL Catalog schema 721
What you need to know before writing Big SQL applications 722
File formats supported by Big SQL 723
User-defined external scalar functions in Big SQL 728
Transactional behavior of Hadoop tables 731
Transactional behavior of CREATE TABLE ... AS in Big SQL 732
Transactional behavior of INSERT in Big SQL 733
Transactional behavior of LOAD HADOOP USING 734
Understanding data types 735
Data types migrated from Hive applications 736
Data types that are supported by Big SQL 737
How Hive handles NULL values on a partitioning column of type String 743
How to work with HBase tables 745
Security considerations when Big SQL accesses HBase objects 755
HDFS caching 757
Memory calculator worksheet 764
Developing routines 767
Routines 769
Overview of routines 770
Benefits of using routines 772
Types of routines 774
Built-in and user-defined routines 777
Built-in 778
User-defined 779
Comparison of user-defined and built-in routines 781
Choosing to use built-in or user-defined routines 783
Functional types of routines 784
Procedures 786
Functions 788
Scalar functions 790
Row functions 792
Table functions 793
Methods 794
Comparison of routine functional types 795
Choosing a routine functional type 798
Implementations of routines 800
Built-in routines 802
Sourced routines 803
SQL routines 804
External routines 805
Supported APIs and programming languages 807
Comparison of APIs and programming languages 808
Comparison of routine implementations 812
Choosing a routine implementation 815
Usage of routines 817
Administering databases with built-in routines 818
Extension of SQL function support with user-defined functions 820
Auditing using SQL table functions 821
Tools for developing routines 824
IBM Data Studio routine development support 825
SQL statements that can be executed in routines and triggers 826
SQL access levels 833
Determining what SQL statements can be executed in routines 835
Portability of routines 837
Interoperability of routines 838
Performance of routines 839
Security of routines 847
Securing routines 849
Authorizations and binding of routines that contain SQL 851
Data conflicts when procedures read from or write to tables 855
Debugging compiled SQL PL objects overview 857
External routines 858
External routine features 860
External function and method features 862
Scalar user-defined functions 864
External scalar function and method invocation 866
External table functions 867
External table function processing 868
Generic table functions 870
Using generic table functions 871
Java table function execution model 873
Scratchpads for external functions and external methods 875
Scratchpads for 32-bit and 64-bit operating systems 879
SQL in external routines 881
Parameter styles for external routines 884
Parameter handling 890
Supported routine programming languages 892
Comparison of APIs and programming languages 894
Performance considerations for developing routines 894
Security considerations for routines 897
Routine code page considerations 900
Application and routine support 901
32-bit and 64-bit support for external routines 903
Performance of 32-bit routines in 64-bit environments 904
XML data type support 905
Restrictions on external routines 907
Creating external routines 910
Writing routines 913
Debugging routines 915
Library and class management considerations 917
Deployment of routine library or class files 919
Security of external routine library or class files 921
Resolution of external routine library or class files 922
Modifications to external routine library or class files 923
Backup and restore of external routine library and class files 924
Performance and library management 925
C and C++ routines 926
Supported software (C) 928
Supported software (C++) 929
Tools for developing C and C++ routines 930
Designing C and C++ routines 931
Include file required for C and C++ routine development 933
Parameters in C and C++ routines 935
Parameter styles supported 937
Parameter null indicators 938
Parameter style SQL C and C++ procedures 939
Parameter style SQL C and C++ functions 943
Passing parameters by value and by reference 946
Parameters not required for result sets 947
The dbinfo structure routine parameter 948
Scratchpad as function parameter 952
Program type MAIN support for procedures 954
SQL data type representation 956
SQL data type handling 960
How to pass arguments to C routines 969
Graphic host variables 981
C++ type decoration 982
Returning result sets from procedures 984
Creating C and C++ routines 986
Building C and C++ routine code 989
Building C and C++ routine code using the sample bldrtn script 990
Building routines in C or C++ using the sample build script (UNIX)
992
Building C/C++ routines on Windows 992
Building C and C++ routine code from the command line 992
Compile and link options for C and C++ routines 994
AIX C routine compile and link options 995
AIX C++ routine compile and link options 995
HP-UX C routine compile and link options 995
HP-UX C++ routine compile and link options 995
Linux C routine compile and link options 995
Linux C++ routine compile and link options 995
Solaris C routine compile and link options 995
Solaris C++ routine compile and link options 995
Windows C and C++ routine compile and link options 995
Rebuilding routine shared libraries 995
Updating the database manager configuration parameters 996
COBOL procedures 997
Supported software 1000
Supported SQL data types in COBOL embedded SQL applications 1001
Building COBOL routines 1001
Compile and link options for COBOL routines 1002
AIX IBM COBOL routine compile and link options 1003
AIX Micro Focus COBOL routine compile and link options 1003
HP-UX Micro Focus COBOL routine compile and link options 1003
Solaris Micro Focus COBOL routine compile and link options 1003
Linux Micro Focus COBOL routine compile and link options 1003
Windows IBM COBOL routine compile and link options 1003
Windows Micro Focus COBOL routine compile and link options 1003
Building IBM COBOL routines on AIX 1003
Building UNIX Micro Focus COBOL routines 1003
Building IBM COBOL routines on Windows 1003
Building Micro Focus COBOL routines on Windows 1003
Java routines 1003
Supported software 1005
JDBC and SQLJ API support 1006
Specifying JDK for Java routine development (Linux and UNIX) 1007
Specification of a driver for Java routines 1009
Tools for developing Java routines 1010
Designing Java routines 1011
SQL data type representation 1013
Connection contexts in SQLJ routines 1015
Parameters in Java routines 1016
Parameter style JAVA procedures 1018
Parameter style JAVA functions 1020
Parameter style HIVE functions 1021
Supported SQL data types in HIVE routines 1024
Parameter style DB2GENERAL routines 1025
DB2GENERAL UDFs 1026
Supported SQL data types in DB2GENERAL routines 1029
Java classes for DB2GENERAL routines 1031
DB2GENERAL Java class: COM.ibm.db2.app.StoredProc
1032
DB2GENERAL Java class: COM.ibm.db2.app.UDF 1034
DB2GENERAL Java class: COM.ibm.db2.app.Lob 1037
DB2GENERAL Java class: COM.ibm.db2.app.Blob 1038
DB2GENERAL Java class: COM.ibm.db2.app.Clob 1039
Passing parameters of data type ARRAY to Java routines 1040
Returning result sets from Java (JDBC) procedures 1042
Returning result sets from Java (SQLJ) procedures 1043
Retrieving procedure result sets in Java (JDBC) applications and
procedures 1044
Retrieving procedure result sets in Java (SQLJ) applications and
procedures 1046
Restrictions on Java routines 1048
Java table function execution model 1050
Creating Java routines 1050
Creating Java routines from the command line 1052
Building Java routine code 1055
Building JDBC routines 1056
Building SQLJ routines 1058
Compile and link options for Java (SQLJ) routines 1059
SQLJ routine options for Linux and UNIX 1060
Deploying Java routines 1061
JAR file administration 1063
Updating Java routines 1065
Examples of Java (JDBC) routines 1067
Example: Array data type in Java (JDBC) procedure 1068
Example: XML and XQuery support in Java (JDBC) procedure 1069
Invoking routines 1074
Authorizations and binding of routines that contain SQL 1077
Routine names and paths 1077
Nested routine invocations 1079
Invoking 32-bit routines on a 64-bit database server 1080
References to procedures 1081
Calling procedures 1082
Calling procedures from applications or external routines 1084
Calling procedures from triggers or SQL routines 1086
Calling stored procedures from the CLP 1089
Calling stored procedures from CLI applications 1092
Calling stored procedures with array parameters from CLI
applications 1092
Procedure result sets 1092
Result sets from SQL data changes 1094
Result sets from SQL data changes using cursors 1097
References to functions 1098
Function selection 1100
Distinct types as UDF or method parameters 1102
LOB values as UDF parameters 1103
Invoking scalar functions or methods 1104
Invoking user-defined table functions 1106
Analyzing big data by using BigInsights value-added services 1108
Analyzing data with BigSheets 1109
Overview of BigSheets 1110
Workbooks and sheets 1112
Sheet types 1114
Group sheet 1116
Building sets of data 1120
Creating master workbooks from catalog tables 1121
Creating workbooks from existing workbooks 1122
Changing a column data type in a master workbook 1123
Data types 1124
Changing the data source for master workbooks 1125
Copying workbooks to a new cluster 1126
Exporting workbook metadata 1127
Importing workbook metadata 1128
Discovering data 1129
Adding columns in sheets 1130
Modifying columns in sheets 1131
Adding sheets to workbooks 1132
Viewing related sheets 1133
Viewing related workbooks 1134
Changing the data reader for workbooks 1135
Data readers 1136
Running workbooks 1140
Visualizing your result data in charts and maps 1141
Chart and map types 1142
Understanding null behavior 1145
Deleting workbooks 1146
Restoring deleted workbooks 1147
Purging deleted workbooks 1148
Formulas 1149
Functions 1150
Conditional functions 1152
DateTime functions 1157
Pattern syntax for custom DateTime formats 1165
Entity functions 1167
HTML and XML functions 1171
Math functions 1177
Geospatial functions 1182
Statistical functions 1190
Selection functions 1193
Text functions 1195
Text comparison functions 1214
URL functions 1219
Formula examples 1225
Sharing data 1227
Exporting data from a workbook 1228
Sharing workbooks with other users 1229
Creating and deleting catalog tables 1230
Extending BigSheets 1232
Administering BigSheets by using REST APIs 1233
Creating BigSheets plug-ins 1243
Building customized functions 1246
Building customized readers 1251
Building customized charts 1257
Uploading custom plug-ins to BigSheets 1261
Analyzing big data with Text Analytics 1262
Developing extractors in the web tool 1263
Information Extraction Web Tool 1264
Designing text extraction projects 1265
Design your project 1267
What are the provided extractors? 1269
Linguistic support 1270
Case study: Extracting insights from financial documents 1271
Create the extractor 1272
Refine the results 1275
Manage projects and extractors 1276
Use the workspace 1277
Manage projects 1278
Adding and removing sample documents 1279
Document size limitations 1283
Manage extractors 1285
Create, edit, and combine extractors 1287
Define dictionaries 1289
Define a list 1290
Define a mapping table 1291
Define a literal 1292
Define regular expressions 1293
Define sequence patterns 1295
Add proximity rules 1297
Define unions of extractors 1299
Run an extractor and refine results 1301
Refine results 1303
Eliminate duplicate and overlapping results 1306
Refine results using filters 1308
Export refined extractor results 1310
Extract in languages other than English 1312
Extend the provided extractors 1314
Define new extractors based on linguistic patterns 1316
Custom extractors 1317
Exporting Extractors 1318
Exporting extractors to AQL 1319
Exporting extractors as map/reduce jobs 1321
Exporting extractors as BigSheets functions 1323
Developing Text Analytics extractors using Annotation Query Language (AQL)1326
Annotation Query Language (AQL) 1328
Extractors 1329
Modules 1331
Scenarios that illustrate modules 1334
Best practices for developing modules 1342
AQL files 1343
Views 1345
Dictionaries 1347
Tables 1348
Functions 1349
Pre-built extractor libraries 1350
Named entity extractors 1354
Financial extractors 1361
Generic extractors 1372
Other extractors 1376
Sentiment extractors 1379
Machine Data Analytics extractors 1382
Base modules 1485
Guidelines for writing AQL 1487
Using basic feature AQL statements 1491
Using candidate generation AQL statements 1493
Using filter and consolidate AQL statements 1496
Creating complex AQL statements 1499
Enhancing content of AQL views 1502
Using naming conventions 1505
Data collection formats 1507
UTF-8 encoded text files 1508
UTF-8 encoded CSV files 1510
UTF-8 encoded JSON files in Hadoop text input format 1512
Multilingual support for Text Analytics 1515
Text Analytics Optimizer 1517
Execution plan 1522
Operators 1525
Relational operators 1526
Span aggregation operators 1528
Span extraction operators 1529
Specialized operators 1531
Tokenization 1532
Running Text Analytics extractors 1535
Run extractors on distributed files from the web tool 1536
Running extractors with the Java Text Analytics APIs 1538
Reading document collections with the DocReader API 1550
Text Analytics URI formats 1553
Improving extractor performance 1556
Know your performance requirements 1557
Follow the guidelines for regular expressions 1558
Use the consolidate clause wisely 1563
Using external resources in Text Analytics 1566
Analyzing and manipulating big data with Big SQL 1568
Configuring security for Big SQL 1570
Enabling authentication for Big SQL 1572
Authorization of Big SQL objects 1574
Database authorization 1577
Enabling SSL (Secure Socket Layer) encryption 1581
Default privileges granted on the bigsql database 1582
Configuration parameters 1584
Big SQL architecture 1587
Managing the Big SQL server 1589
Configuring the IBM Big SQL server 1590
Big SQL Scheduler 1593
Big SQL Input/Output 1595
JDBC and ODBC drivers 1596
JDBC driver 1597
ODBC driver for Linux 1599
ODBC driver for Windows 1601
Connecting to the Big SQL server that is part of the Big SQL service 1603
Downloading and Installing IBM Data Studio 1604
Creating or changing a JDBC driver definition 1605
Connecting to a Big SQL server 1606
Analyzing data with Big SQL 1608
How to run Big SQL queries 1610
Running Big SQL queries with Big SQL monitoring and edit tool 1611
Java SQL Shell (JSqsh) 1612
Big SQL statistics 1616
Statistics gathered from expression columns in statistical views 1617
Extending Big SQL 1619
Working with Hive ACID tables in Big SQL 1622
LOAD performance guidelines 1625
Tuning HBase performance 1627
HBase basics 1628
General HBase tuning 1631
Major compaction and data locality 1634
Hints for designing HBase tables 1636
Mapping data types to fields in HBase 1639
Hints for designing indexes 1642
Properties that can optimize HBase table scans 1645
Hints for optimizing LOAD 1646
Monitoring Big SQL in the IBM Open Platform with Apache Hadoop environment
1649
Monitoring metrics with the Big SQL query interface 1650
Monitoring the cluster status of your Big SQL queries 1653
Analyzing data with Big R 1654
Overview of Big R 1655
Connecting to a data set with Big R 1656
Running Big R scripts 1657
Troubleshooting and support 1658
Resolving problems with BigInsights 1659
Logging 1660
Logs and their locations 1661
Problems and workarounds 1664
Installation 1665
Unable to open Ambari browser 1666
Installing IBM Open Platform with Apache Hadoop does not complete
successfully because of connection issues 1667
Starting the Ambari server on Linux Power operating systems fails due to
connection error when the number of cores in the machine is greater than
48 1668
Components and value-add services 1669
Cannot stop all BigInsights value-add services from web interface 1671
Oozie: Cannot start Oozie service - ERROR XSDB6 1672
Restart all Hive services does not start all services correctly 1673
Failed to get schema version when starting Hive Metastore Service 1674
Adding additional Kafka Brokers after the initial installation might result in
an error when starting the broker 1675
Running Kafka producer with localhost generates error 1676
Continual "Hive Metastore Process" alerts showing in the Ambari web
interface even when process is running on pLinux 1678
After Kerberos is enabled, Ambari Quick Links might no longer work 1679
Ambari is unable to create a Files view for a cluster after Kerberos is
enabled 1680
Spark Thrift Server (1.5.1) goes down when Kerberos is enabled on the
cluster 1681
Restarting the Solr service might fail 1682
Big SQL 1683
Hive and Big SQL catalogs inconsistent 1685
TPCIP ports in FIN_WAIT1 state 1687
JSqsh Big SQL connection profile points to the wrong Big SQL service port
1688
Command fails with authorization error after successfully connecting to a
Big SQL server 1689
Installing the Big SQL service failed because of a tty requirement 1690
Cannot decommission a dead Big SQL worker node 1691
How to delete a Big SQL worker node 1693
Big SQL monitoring utility fails to install 1695
Big SQL monitoring utility (DSM) is not configured properly because of a
packaging error 1696
Uninstalling the BigInsights - Big SQL service does not completely uninstall
components 1697
Cannot use Hive directly to work with Big SQL HBase tables 1698
Big SQL instance owner does not exist in the server that hosts the
NameNode service causing Hadoop operations to fail 1699
Big SQL authorization errors after switching to an HDFS standby
NameNode 1700
How to remove a faulty node from the Big SQL service 1701
Interrupted operations can cause metadata inconsistency 1703
The Big SQL scheduler can report an incorrect number of nodes in a
Spectrum Scale FPO (GPFS) environment, affecting the Big SQL plan
quality 1704
Text Analytics 1705
Stopping and starting the Hive service that is installed on the same node as
BigInsights - Text Analytics might result in a failure. 1706
BigSheets 1707
Workbook names must be unique 1708
Workbook run does not progress 1709
BigSheets reader cannot view data 1710
BigSheets service start fails in a Kerberos environment 1712
Area charts hide data sets 1713
BigSheets hangs when you remove columns 1714
BigInteger columns cannot be Y axis 1715
Data missing from processed results 1716
Unable to create a table from BigSheets 1718
BigInsights Home service 1719
After installing the BigInsights Home service, other services will not start
1720
Big R problems and workarounds 1721
Removing Big R is not always successful 1722
General troubleshooting techniques and resources 1723
Subscribing to IBM Support updates 1724
Searching knowledge bases 1726
Getting fixes from Fix Central 1727
Contacting IBM Support 1729
Exchanging information with IBM 1731
Reference 1733
IBM Big SQL reference 1734
IBM Big SQL Reference 1735
How to read the syntax diagrams 1736
Conventions used for the SQL topics 1739
Error conditions 1740
Highlighting conventions 1741
Conventions describing Unicode data 1742
Language elements 1743
Characters 1744
Tokens 1746
Identifiers 1748
Data types 1777
Data type list 1779
Numbers 1780
Character strings 1783
Graphic strings 1788
National character strings 1790
Binary strings 1791
Large objects (LOBs) 1792
Datetime values 1794
Boolean values 1798
Cursor values 1799
XML values 1800
Array values 1801
Row values 1804
Anchored types 1806
User-defined types 1807
Promotion of data types 1811
Casting between data types 1813
Assignments and comparisons 1822
Rules for result data types 1842
Rules for string conversions 1849
String comparisons in a Unicode database 1851
Resolving the anchor object for an anchored type 1853
Resolving the anchor object for an anchored row type 1855
Database partition-compatible data types 1857
Constants 1859
Special registers 1865
CURRENT CLIENT_ACCTNG 1868
CURRENT CLIENT_APPLNAME 1869
CURRENT CLIENT_USERID 1870
CURRENT CLIENT_WRKSTNNAME 1871
CURRENT DATE 1872
CURRENT DBPARTITIONNUM 1873
CURRENT DECFLOAT ROUNDING MODE 1874
CURRENT DEFAULT TRANSFORM GROUP 1875
CURRENT DEGREE 1876
CURRENT EXPLAIN MODE 1877
CURRENT EXPLAIN SNAPSHOT 1879
CURRENT FEDERATED ASYNCHRONY 1880
CURRENT IMPLICIT XMLPARSE OPTION 1881
CURRENT ISOLATION 1882
CURRENT LOCALE LC_MESSAGES 1883
CURRENT LOCALE LC_TIME 1884
CURRENT LOCK TIMEOUT 1885
CURRENT MAINTAINED TABLE TYPES FOR OPTIMIZATION 1886
CURRENT MDC ROLLOUT MODE 1887
CURRENT MEMBER 1888
CURRENT OPTIMIZATION PROFILE 1889
CURRENT PACKAGE PATH 1890
CURRENT PATH 1891
CURRENT QUERY OPTIMIZATION 1892
CURRENT REFRESH AGE 1893
CURRENT SCHEMA 1894
CURRENT SERVER 1895
CURRENT SQL_CCFLAGS 1896
CURRENT TEMPORAL BUSINESS_TIME 1897
CURRENT TEMPORAL SYSTEM_TIME 1899
CURRENT TIME 1901
CURRENT TIMESTAMP 1902
CURRENT TIMEZONE 1904
CURRENT USER 1905
SESSION_USER 1906
SYSTEM_USER 1907
USER 1908
Global variables 1909
Types of global variables 1910
Authorization required for global variables 1912
Resolution of global variable references 1913
Using global variables 1915
Functions 1917
Methods 1933
Conservative binding semantics 1941
Expressions 1944
Datetime operations and durations 1957
CASE expression 1963
CAST specification 1966
Field reference 1972
XMLCAST specification 1974
ARRAY element specification 1976
Array constructor 1977
Dereference operation 1979
Method invocation 1981
OLAP specification 1983
ROW CHANGE expression 1998
Sequence reference 2000
Subtype treatment 2004
Determining data types of untyped expressions 2005
Row expression 2012
Predicates 2014
Search conditions 2015
Basic predicate 2018
Quantified predicate 2021
ARRAY_EXISTS predicate 2024
BETWEEN predicate 2025
Cursor predicates 2026
EXISTS predicate 2028
IN predicate 2029
LIKE predicate 2031
NULL predicate 2037
REGEXP_LIKE predicate 2038
Trigger event predicates 2041
TYPE predicate 2042
VALIDATED predicate 2044
XMLEXISTS predicate 2047
Built-in global variables 2050
CATALOG_SYNC_MODE global variable 2052
CLIENT_HOST global variable 2053
CLIENT_IPADDR global variable 2054
CLIENT_ORIGUSERID global variable 2055
COMPATIBILITY_MODE global variable 2056
CLIENT_USRSECTOKEN global variable 2057
MON_INTERVAL_ID global variable 2058
NLS_STRING_UNITS global variable 2059
PACKAGE_NAME global variable 2060
PACKAGE_SCHEMA global variable 2061
PACKAGE_VERSION global variable 2062
ROUTINE_MODULE global variable 2063
ROUTINE_SCHEMA global variable 2064
ROUTINE_SPECIFIC_NAME global variable 2065
ROUTINE_TYPE global variable 2066
TRUSTED_CONTEXT global variable 2067
Built-in functions 2068
Aggregate functions 2082
ARRAY_AGG 2083
AVG 2088
CORRELATION 2090
COUNT 2092
COVARIANCE 2094
COVARIANCE_SAMP 2096
GROUPING 2098
LISTAGG 2100
MAX 2103
MEDIAN 2105
MIN 2107
PERCENTILE_CONT 2109
PERCENTILE_DISC 2111
Regression functions (REGR_AVGX, REGR_AVGY, REGR_COUNT, ...)
2113
STDDEV 2117
STDDEV_SAMP 2119
SUM 2121
VARIANCE 2123
VARIANCE_SAMP 2125
XMLAGG 2127
XMLGROUP 2129
Scalar functions 2133
ABS or ABSVAL 2134
ACOS 2135
ADD_DAYS 2136
ADD_HOURS 2138
ADD_MINUTES 2140
ADD_MONTHS 2142
ADD_SECONDS 2144
ADD_YEARS 2146
AGE 2148
ARRAY_DELETE 2150
ARRAY_FIRST 2152
ARRAY_LAST 2153
ARRAY_NEXT 2154
ARRAY_PRIOR 2156
ASCII 2158
ASIN 2159
ATAN 2160
ATAN2 2161
ATANH 2162
BIGINT 2163
BINARY 2165
BITAND, BITANDNOT, BITOR, BITXOR, and BITNOT 2167
BLOB 2170
CARDINALITY 2171
CEILING or CEIL 2172
CHAR 2173
CHARACTER_LENGTH 2180
CHR 2182
CLOB 2183
COALESCE 2184
COLLATION_KEY 2185
COLLATION_KEY_BIT 2187
COMPARE_DECFLOAT 2189
CONCAT 2191
COS 2193
COSH 2194
COT 2195
CURSOR_ROWCOUNT 2196
DATAPARTITIONNUM 2197
DATE 2199
DAY 2201
DAYNAME 2203
IBM BigInsights
IBM BigInsights 4.1 documentation
Welcome to IBM® BigInsights®, a collection of powerful value-add services that
can be installed on top of the IBM Open Platform with Apache Hadoop. IBM Open
Platform with Apache Hadoop is a platform for analyzing and visualizing Internet-
scale data volumes that is powered by Apache Hadoop, an open source distributed
computing platform. The value-add services include Big SQL, BigSheets, Big R,
and Text Analytics. This information was updated December 2015.
Getting started
Introduction
Hadoop-Dev
FAQs
What’s new?
Release notes
Installing IBM Open Platform with Apache Hadoop
Installing the value-added services
Detailed system requirements
Common tasks
Security
Analyzing data with IBM Big SQL
A typical Big SQL scenario
Analyzing data (Big R, BigSheets, Big SQL, Text Analytics)
CREATE TABLE (HADOOP) (Big SQL)
CREATE TABLE (HBASE) (Big SQL)
Troubleshooting and support
Troubleshooting BigInsights
1
BigInsights support portal
Query IBM Support knowledge base
BigInsights for Hadoop
More information
Tutorials
Best Practices
IBM Big Data education
Understanding BigInsights
IBM Redbooks
© Copyright IBM Corporation 2009, 2015
2
-
-
-
-
IBM BigInsights
Accessibility
Accessibility features help users with physical disabilities, such as restricted mobility
or limited vision, to use software products successfully.
The following list specifies the major accessibility features:
All IBM® Open Platform with Apache Hadoop functionality is available using the
keyboard for navigation instead of the mouse.
You can customize the size of the fonts on IBM Open Platform with Apache
Hadoop user interfaces with your web browser.
BigInsights documentation is provided in an accessible format.
Accessible documentation
Documentation for BigInsights products is provided in XHTML 1.0 format, which is
viewable in most Web browsers. XHTML allows you to view documentation
according to the display preferences set in your browser. It also allows you to use
screen readers and other assistive technologies.
Syntax diagrams are provided in dotted decimal format. This format is available
only if you are accessing the online documentation using a screen-reader.
Using keyboard shortcuts and accelerators
You can use keys or key combinations to perform operations that can also be
done by using a mouse.
Parent topic:BigInsights
3
-
-
-
-
-
IBM BigInsights
Using keyboard shortcuts and accelerators
You can use keys or key combinations to perform operations that can also be done
by using a mouse.
About this task
You can initiate menu actions from the keyboard. Some menu items have
accelerators, which allow you to invoke the menu option without expanding the
menu. For example, you can enter CTRL+F for find, when the focus is on the details
view.
The major accessibility features in the Knowledge Center enable users to use
assistive technologies, magnify what is displayed on the screen, and initiate menu
actions from the keyboard. In addition, all images are provided with alternative text
so that users with vision impairments can understand the contents of the images.
Procedure
You can initiate menu actions from the keyboard in the following ways:
Press F10 to activate the keyboard. Then press the arrow keys to access specific
options, or press the same letter as the one that is underlined in the name of the
menu option you want to select. For example, to select the Help Index, press F10
to activate the main menu; then use the arrow keys to select Help > Help Index.
Press and hold the Alt key. Then press the same letter as the one that is
underlined in the name of the main menu option that you want to select. For
example, to select the General Help menu option, press Alt+H; then press G.
Tip: On some UNIX operating systems, you might need to press Ctrl instead of Alt
.
To exit the main menu without selecting an option, press Esc.
Often the directions for how to access a window or wizard instruct you to right-click
an object and select an object from the pop-up menu.
To open the pop-up menu by using keyboard shortcuts, first select the object then
press Shift+F10.
To access a specific menu option on the pop-up menu, press the same letter as
the one that is underlined in the name of the menu option you want to select.
The following tables provide instructions for using keyboard shortcuts and
accelerators.Table 1. General keyboard shortcuts and accelerators
Action Shortcut
Access the menu bar Alt or F10
Exit the main menu without selecting an option Esc
Go to the next menu item arrow keys, or the underlined letter in the menu
option
Go to the next field in a window Tab
Go back to the previous field in a window Shift+Tab
Go from the browser address bar to the browser
content area
F6
4
-
-
-
Table 2. Keyboard shortcuts for table actions
Table 3. Tree navigation
Table 4. Editing actions
Knowledge Center navigation: The major accessibility features in the Knowledge
Center enable users to do the following:
Use assistive technologies, such as screen-reader software and digital speech
synthesizers, to hear what is displayed on the screen. In this Knowledge Center, all
information is provided in HTML format. Consult the product documentation of the
assistive technology for details on using assistive technologies with HTML-based
information.
Operate specific or equivalent features by using only the keyboard.
Magnify what is displayed on the screen.
The following table gives instructions for how to navigate the Knowledge Center by
using the keyboard.Table 5. Keyboard shortcuts in the Knowledge Center
Find Ctrl+F
Find Next Alt+N
Action Shortcut
Move to the cell above or below up or down arrows
Move to the cell to the left or right left or right arrows
Give the next component focus Tab
Give the previous component focus Shift+Tab
Action Shortcut
Navigate out forward Tab
Navigate out backward Shift+Tab
Expand entry Right
Collapse entry Left
Toggle expand/collapse for entry Enter
Move up/down one entry up or down arrows
Move to first entry Home
Move to last visible entry End
Action Shortcut
Copy Ctrl+C
Cut Ctrl+X
Paste Ctrl+V
Select All Ctrl+A
Undo Ctrl+Z
Action Shortcut
5
- Standard operating system keystrokes are used for standard operating system
operations.
Parent topic:Accessibility
Go to the next link, button, or topic
branch from inside a frame (page)
Tab
Expand or collapse a topic branch Right and Left arrow keys
Move to the next topic branch Down arrow or Tab
Move to the previous topic branch Up arrow or Shift+Tab
Scroll to the top Home
Scroll to the bottom End
Go back Alt+Left arrow
Go forward Alt+Right arrow
Next frame Ctrl+Tab
Previous frame Shift+Ctrl+Tab
Print the current page or active frame Ctrl+P
6
IBM BigInsights
Terms and Conditions
Permissions for the use of these publications is granted subject to the following
terms and conditions.
Personal use: You may reproduce these Publications for your personal, non
commercial use provided that all proprietary notices are preserved. You may not
distribute, display or make derivative work of these Publications, or any portion
thereof, without the express consent of IBM.
Commercial use: You may reproduce, distribute and display these Publications
solely within your enterprise provided that all proprietary notices are preserved. You
may not make derivative works of these Publications, or reproduce, distribute or
display these Publications or any portion thereof outside your enterprise, without the
express consent of IBM.
Except as expressly granted in this permission, no other permissions, licenses or
rights are granted, either express or implied, to the Publications or any information,
data, software or other intellectual property contained therein.
IBM reserves the right to withdraw the permissions granted herein whenever, in its
discretion, the use of the Publications is detrimental to its interest or, as determined
by IBM, the above instructions are not being properly followed.
You may not download, export or re-export this information except in full compliance
with all applicable laws and regulations, including all United States export laws and
regulations.
IBM MAKES NO GUARANTEE ABOUT THE CONTENT OF THESE
PUBLICATIONS. THE PUBLICATIONS ARE PROVIDED "AS-IS" AND WITHOUT
WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING
BUT NOT LIMITED TO IMPLIED WARRANTIES OF MERCHANTABILITY, NON-
INFRINGEMENT, AND FITNESS FOR A PARTICULAR PURPOSE.
Parent topic:BigInsights
7
IBM BigInsights
Notices
This information was developed for products and services offered in the US. This
material might be available from IBM in other languages. However, you may be
required to own a copy of the product or product version in that language in order to
access it.
IBM may not offer the products, services, or features discussed in this document in
other countries. Consult your local IBM representative for information on the
products and services currently available in your area. Any reference to an IBM
product, program, or service is not intended to state or imply that only that IBM
product, program, or service may be used. Any functionally equivalent product,
program, or service that does not infringe any IBM intellectual property right may be
used instead. However, it is the user's responsibility to evaluate and verify the
operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter
described in this document. The furnishing of this document does not grant you any
license to these patents. You can send license inquiries, in writing, to:IBM Director
of Licensing
IBM Corporation
North Castle Drive, MD-NC119
Armonk, NY 10504-1785
US
For license inquiries regarding double-byte character set (DBCS) information,
contact the IBM Intellectual Property Department in your country or send inquiries, in
writing, to:Intellectual Property Licensing
Legal and Intellectual Property Law
IBM Japan Ltd.
19-21, Nihonbashi-Hakozakicho, Chuo-ku
Tokyo 103-8510, Japan
INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS
PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A
PARTICULAR PURPOSE. Some jurisdictions do not allow disclaimer of express or
implied warranties in certain transactions, therefore, this statement may not apply to
you.
This information could include technical inaccuracies or typographical errors.
Changes are periodically made to the information herein; these changes will be
incorporated in new editions of the publication. IBM may make improvements and/or
changes in the product(s) and/or the program(s) described in this publication at any
time without notice.
Any references in this information to non-IBM websites are provided for convenience
only and do not in any manner serve as an endorsement of those websites. The
materials at those websites are not part of the materials for this IBM product and use
8
of those websites is at your own risk.
IBM may use or distribute any of the information you provide in any way it believes
appropriate without incurring any obligation to you.
Licensees of this program who wish to have information about it for the purpose of
enabling: (i) the exchange of information between independently created programs
and other programs (including this one) and (ii) the mutual use of the information
which has been exchanged, should contact:IBM Director of Licensing
IBM Corporation
North Castle Drive, MD-NC119
Armonk, NY 10504-1785
US
Such information may be available, subject to appropriate terms and conditions,
including in some cases, payment of a fee.
The licensed program described in this document and all licensed material available
for it are provided by IBM under terms of the IBM Customer Agreement, IBM
International Program License Agreement or any equivalent agreement between us.
The performance data and client examples cited are presented for illustrative
purposes only. Actual performance results may vary depending on specific
configurations and operating conditions.
Information concerning non-IBM products was obtained from the suppliers of those
products, their published announcements or other publicly available sources. IBM
has not tested those products and cannot confirm the accuracy of performance,
compatibility or any other claims related to non-IBMproducts. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those
products.
Statements regarding IBM's future direction or intent are subject to change or
withdrawal without notice, and represent goals and objectives only.
All IBM prices shown are IBM's suggested retail prices, are current and are subject
to change without notice. Dealer prices may vary.
This information is for planning purposes only. The information herein is subject to
change before the products described become available.
This information contains examples of data and reports used in daily business
operations. To illustrate them as completely as possible, the examples include the
names of individuals, companies, brands, and products. All of these names are
fictitious and any similarity to actual people or business enterprises is entirely
coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which
illustrate programming techniques on various operating platforms. You may copy,
modify, and distribute these sample programs in any form without payment to IBM,
for the purposes of developing, using, marketing or distributing application programs
conforming to the application programming interface for the operating platform for
which the sample programs are written. These examples have not been thoroughly
tested under all conditions. IBM, therefore, cannot guarantee or imply reliability,
serviceability, or function of these programs. The sample programs are provided
"AS IS", without warranty of any kind. IBM shall not be liable for any damages
arising out of your use of the sample programs.
9
-
-
-
-
-
-
Each copy or any portion of these sample programs or any derivative work must
include a copyright notice as follows:
© (your company name) (year).
Portions of this code are derived from IBM Corp. Sample Programs.
© Copyright IBM Corp. _enter the year or years_.
Parent topic:BigInsights
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of
International Business Machines Corp., registered in many jurisdictions worldwide.
Other product and service names might be trademarks of IBM or other companies.
A current list of IBM trademarks is available on the web at "Copyright and trademark
information" at www.ibm.com/legal/copytrade.shtml.
The following terms are trademarks or registered trademarks of other companies
and have been used in at least one of the documents in the BigInsights
documentation library:
Microsoft, Windows, and the Windows logo are trademarks of Microsoft
Corporation in the United States, other countries, or both.
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo,
Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or
registered trademarks of Intel Corporation or its subsidiaries in the United States
and other countries.
Java™ and all Java-based trademarks and logos are trademarks or registered
trademarks of Oracle and/or its affiliates.
UNIX is a registered trademark of The Open Group in the United States and other
countries.
Linux is a registered trademark of Linus Torvalds in the United States, other
countries, or both.
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered
trademarks or trademarks of Adobe Systems Incorporated in the United States,
and/or other countries.
Other company, product, or service names may be trademarks or service marks of
others.
Terms and conditions for product documentation
Permissions for the use of these publications are granted subject to the following
terms and conditions.
Applicability
These terms and conditions are in addition to any terms of use for the IBM website.
Personal use
You may reproduce these publications for your personal, noncommercial use
provided that all proprietary notices are preserved. You may not distribute, display or
make derivative work of these publications, or any portion thereof, without the
10
express consent of IBM.
Commercial use
You may reproduce, distribute and display these publications solely within your
enterprise provided that all proprietary notices are preserved. You may not make
derivative works of these publications, or reproduce, distribute or display these
publications or any portion thereof outside your enterprise, without the express
consent of IBM.
Rights
Except as expressly granted in this permission, no other permissions, licenses or
rights are granted, either express or implied, to the publications or any information,
data, software or other intellectual property contained therein.
IBM reserves the right to withdraw the permissions granted herein whenever, in its
discretion, the use of the publications is detrimental to its interest or, as determined
by IBM, the above instructions are not being properly followed.
You may not download, export or re-export this information except in full compliance
with all applicable laws and regulations, including all United States export laws and
regulations.
IBM MAKES NO GUARANTEE ABOUT THE CONTENT OF THESE
PUBLICATIONS. THE PUBLICATIONS ARE PROVIDED "AS-IS" AND WITHOUT
WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING
BUT NOT LIMITED TO IMPLIED WARRANTIES OF MERCHANTABILITY, NON-
INFRINGEMENT, AND FITNESS FOR A PARTICULAR PURPOSE.
IBM Online Privacy Statement
IBM Software products, including software as a service solutions, (“Software
Offerings”) may use cookies or other technologies to collect product usage
information, to help improve the end user experience, to tailor interactions with the
end user, or for other purposes. In many cases no personally identifiable information
is collected by the Software Offerings. Some of our Software Offerings can help
enable you to collect personally identifiable information. If this Software Offering
uses cookies to collect personally identifiable information, specific information about
this offering’s use of cookies is set forth below.
This Software Offering does not use cookies or other technologies to collect
personally identifiable information.
If the configurations deployed for this Software Offering provide you as customer the
ability to collect personally identifiable information from end users via cookies and
other technologies, you should seek your own legal advice about any laws
applicable to such data collection, including any requirements for notice and
consent.
For more information about the use of various technologies, including cookies, for
these purposes, see IBM’s Privacy Policy at http://www.ibm.com/privacy and
IBM’s Online Privacy Statement at http://www.ibm.com/privacy/details in the
section entitled “Cookies, Web Beacons and Other Technologies,” and the “IBM
Software Products and Software-as-a-Service Privacy Statement” at
11
http://www.ibm.com/software/info/product-privacy.
12
-
-
-
-
-
IBM BigInsights
Product overview
BigInsights® is a flexible software platform that provides capabilities to discover and
analyze business insights that are hidden in large volumes of structured and
unstructured data, giving value to previously dormant data.
BigInsights information roadmap
This document provides links to the information resources that are available for
IBM® Open Platform with Apache Hadoop and the BigInsights value-add services.
Introduction to BigInsights
BigInsights is a software platform for discovering, analyzing, and visualizing data
from disparate sources. You use this software to help process and analyze the
volume, variety, and velocity of data that continually enters your organization every
day. BigInsights is a collection of value-added services that can be installed on top
of the IBM Open Platform with Apache Hadoop, wihch is the open Hadoop
foundation.
Release notes
Release notes contain information about the installation and administration of IBM
BigInsights Enterprise Edition and its components.
Installing the free IBM Open Platform with Apache Hadoop and BigInsights Quick
Start Edition, non-production software
What's new for Version 4.1
13
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
IBM BigInsights
BigInsights information roadmap
This document provides links to the information resources that are available for
IBM® Open Platform with Apache Hadoop and the BigInsights® value-add services.
Product overview
Evaluating
Planning
Installing
Administering
Getting started
Developing
Analyzing
Troubleshooting and support
Reference
Community resources
Product overview
BigInsights home page
This web page provides an overview of BigInsights and its related components.
Introduction
These topics introduce you to BigInsights, including its product modules and
components. Last updated: 2015-008
New features and capabilities
These topics include information about the features, capabilities, and updates
that are included in the most recent version of BigInsights. Last updated: 2015-
08
SQL-on-Hadoop without compromise
This white paper contains information on the updated Big SQL for BigInsights
V3.0 and the speed, portability, and robust functionality that this SQL on
Hadoop solution provides. Last updated: 2014-04
Evaluating
Analyzing social media and structured data with InfoSphere BigInsights
This article provides a quick start on BigSheets. You'll learn how to model big
data in BigSheets, manipulate this data using built-in macros and functions,
create charts to visualize your work, and export the results of your analysis in
one of several popular output formats. Last updated: 2012
Big Data Networked Storage Solution for Hadoop
This IBM® Redpaper™ provides a reference architecture, based on Apache
Hadoop, to help businesses gain control over their data, meet tight service level
agreements (SLAs) around their data applications, and turn data-driven insight
into effective action. Big Data Networked Storage Solution for Hadoop delivers
the capabilities for ingesting, storing, and managing large data sets with high
reliability. IBM BigInsights provides an innovative analytics platform that
processes and analyzes all types of data to turn large complex data into
14
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
insight.Last updated: 2013-07
Quick Start Edition
IBM BigInsights Quick Start Edition is a free, downloadable non-production
version of BigInsights that enables new solutions that cost effectively turn large,
complex volumes of data into insight by combining Apache Hadoop, (including
the MapReduce framework and the Hadoop Distributed File Systems), with
unique, enterprise-ready technologies and capabilities from across IBM,
including Big SQL, text analytics and BigSheets. Last updated: 2015-08
Planning
System requirements
This document describes the system requirements for BigInsights. Last
updated: 2015-08
Performance and Capacity Implications for Big Data
The purpose of this IBM Redpaper™ publication is to consider the performance
and capacity implications of big data solutions, which must be taken into
account for them to be viable. This paper describes the benefits that big data
approaches can provide. We then cover performance and capacity
considerations for creating big data solutions. We conclude with what this
means for big data solutions, both now and in the future. Last updated: 2014-
02-07
Using the IBM Big Data and Analytics Platform to Gain Operational Efficiency
This IBM® Redbooks® Solution Guide describes how to use IBM Big Data and
Analytics Platform to provide a more comprehensive view of a customer’s
interaction with an organization’s products and services. It provides an example
about how to take multiple sources of information and analyze that data to gain
insight. This example uses disparate data sources, real-time analytics, an
appliance data warehouse, analytic modeling, and reporting tools to support the
business decision-making process. Last updated: 2014-04
Big SQL: Data warehouse-grade performance on Hadoop
This Impact 2014 presentation introduces Big SQL and places it in the SQL on
Hadoop context. Last updated: 2014-04
Taming Big Data with Big SQL
This Impact 2014 presentation introduces big data, BigInsights, and Big SQL. It
provides performance and security best practices for Big SQL and announces
security improvements over Hive 0.12. Last updated: 2014-04
Installing
Release notes
The release notes contain critical information to ensure the successful
installation and operation of BigInsights. Last updated: 2015-08
Administering
Setting up Security and Administering BigInsights
These topics describe how to complete general administration tasks, such as
configuring user security, administering individual components, and deploying
and running programs. Last updated: 2015-08
15
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
Getting started
BigInsights FAQs
The FAQs area in the Hadoop Dev community provides answers to questions
frequently asked about BigInsights, Big SQL, Hive, and HBase.
BigInsights tutorials
These topics include tutorials that you can use to quickly get started with
BigInsights. Last updated: 2015-08
IBM big data education
IBM offers classroom, on-site, and e-Learning training classes to help build and
enhance your skills BigInsights.
IBM BigInsights education
These e-Learning training classes include material on BigInsights Foundation,
Big SQL, BigInsights Analytics for Business Analysts, and BigInsights Analytics
for Programmers.
Big Data University
The Big Data University website contains courses, downloads, and educational
materials about Hadoop and other big data applications.
Understanding BigInsights
This developerWorks® article provides an introduction to BigInsights, including
architecture, capabilities, and scenarios for how you can use the product in your
organization. Last updated: 2011-10
Developing
Developing and administering applications
These topics describe how to develop and maintain BigInsights applications.
Last updated: 2015-08
Set up and use federation
This developerWorks article introduces Big SQL federation capabilities by using
many data sources, including IBM DB2 for Linux, UNIX, and Windows, IBM
PureData System for Analytics, IBM PureData System for Operational
Analytics, Teradata, and Oracle. Federation enables you to sent distributed
requests to multiple data sources within a single SQL statement. Last updated:
2014-07-08
Analyzing
Analyzing big data
These topics describe how to analyze data with IBM BigSheets, analyzing and
manipulating data with Jaql, and analyzing documents with text analytics. Last
updated: 2015-08
BigSheets
This website includes information about BigSheets, which is a browser-based
visualization tool that you can use to extend the scope of your business
intelligence data.
Troubleshooting and support
16
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
Troubleshooting
These topics describe how to troubleshoot issues with BigInsights components,
security, and text analytics Last updated: 2015-08
dW Answers for BigInsights
This Q&A area in the Hadoop Dev community provides a place to ask questions
and get answers from experts.
BigInsights product support
The IBM Support Portal is a unified, customizable view of all technical support
tools and information for all IBM systems, software, and services. Updated
continuously.
Reference
Reference
Use the reference information to read more about commands and functions,
supported languages, and console messages. Last updated: 2015-08
Community resources
Hadoop Dev
This dev-to-dev community site where you can find resources and tips from
experts, ask questions, and share with others. Updated continuously.
IBM Meetup Groups
This website, designed for developers, data scientists, and big data
enthusiasts, provides an opportunity to work hands-on with the solutions and
tools in the big data portfolio.
IBM developerWorks
The IBM developerWorks website contains developer resources, tutorials, and
articles about BigInsights.
The Big Data and Analytics Hub
This website provides links to big data communities, events, blogs, videos and
podcasts, and developer-centric material. Updated continuously.
Video Guide
This developerWorks article provides links to new and trending videos from the
IBM Big Data channel on YouTube. You can also go to the Videos area in
Hadoop Dev for a searchable and continuously updated list.
Parent topic:Product overview
17
-
-
-
-
-
IBM BigInsights
Introduction to BigInsights
BigInsights® is a software platform for discovering, analyzing, and visualizing data
from disparate sources. You use this software to help process and analyze the
volume, variety, and velocity of data that continually enters your organization every
day. BigInsights is a collection of value-added services that can be installed on top
of the IBM® Open Platform with Apache Hadoop, wihch is the open Hadoop
foundation.
.BigInsights helps your organization understand and analyze massive volumes of
unstructured information as easily as smaller volumes of information. The flexible
platform is built on an Apache Hadoop open source framework that runs in parallel
on commonly available, low-cost hardware. You can easily scale the platform to
analyze hundreds of terabytes, petabytes, or more of raw data that is derived from
various sources. As information grows, you add more hardware to support the influx
of data.
BigInsights helps application developers, data scientists, and administrators in your
organization quickly build and deploy custom analytics to capture insight from data.
This data is often integrated into existing databases, data warehouses, and
business intelligence infrastructure. By using BigInsights, users can extract new
insights from this data to enhance knowledge of your business. For more
information about the IBM Open Source Platform, see Installing IBM Open Source
Platorm for Apache Hadoop.
BigInsights incorporates tooling and value-add services for numerous users,
speeding time to value and simplifying development and maintenance:
Software developers can use the value-add services that are provided to develop
custom text analytic functions to analyze loosely structured or largely unstructured
text data.
Data scientists and business analysts can use the data analysis tools within the
value-add services to explore and work with unstructured data in a familiar
spreadsheet-like environment.
IBM Open Platform with Apache Hadoop and the BigInsights value-add services
The content of IBM Open Platform with Apache Hadoop and the BigInsights value-
add services includes the following:
IBM BigInsights Quick Start Edition for Non-Production Environments
Test drive the IBM Open Platform with Apache Hadoop and BigInsights value-add
modules, Version 4.1 by downloading the Quick Start Edition, which is a free, non-
production software.
BigInsights features and architecture
BigInsights provides distinct capabilities for discovering and analyzing business
insights that are hidden in large volumes of data. These technologies and features
combine to help your organization manage data from the moment that it enters
your enterprise.
18
-
-
-
Suggested services layout for IBM Open Platform with Apache Hadoop and
BigInsights value-added services
In your multi-node cluster, it is suggested that you have at least one management
node in your non-high availability environment, if performance is not an issue. If
performance is a concern, consider configuring at least three management nodes.
If you use the BigInsights - Big SQL service, consider configuring four
management nodes. If you use a high availability environment, consider six
management nodes. Use the following list as a guide for the nodes in your cluster.
Scenarios for working with big data
BigInsights provides capabilities to derive business value from complex,
unstructured information. BigInsights supports various scenarios that can help
different organizations grow by finding value that is hidden in data and data
relationships.
Where BigInsights fits in an enterprise data architecture
Reusing business investments and incorporating existing assets is important when
expanding your enterprise data architecture. BigInsights supports data exchange
with a number of sources, relational data stores, and applications so that it can
integrate into your existing architecture.
Parent topic:Product overview
19
IBM BigInsights
IBM Open Platform with Apache Hadoop and the
BigInsights value-add services
The content of IBM® Open Platform with Apache Hadoop and the BigInsights®
value-add services includes the following:
Table 1. Supported features of the BigInsights editions
Parent topic:Introduction to BigInsights
Modules Supported features
IBM Open Platform with Apache Hadoop Cluster management, and services such as
Hive, HBase, Oozie, Flume, HDFS
IBM BigInsights Analyst Module Big SQL and BigSheets
IBM BigInsights Data Scientist Module The contents of the IBM BigInsights Analyst
Module, plus Text Analytics and Big R
IBM BigInsights Enterprise Management
Module
GPFS and Platform Symphony
IBM BigInsights for Apache Hadoop The contents of the IBM BigInsights Data
Scienties module, the BigInsights Analyst
module, and the BigInsights Enterprise
Management module. In addition, it contains a
license that provides limited-use licenses for
other software so that you can get even more
value out of Hadoop.
IBM BigInsights Quick Start Edition Big SQLIBM BigInsights Big RBigSheetsText
AnalyticsConnectorsIBM Hadoop core
20
-
-
-
-
IBM BigInsights
IBM BigInsights Quick Start Edition for Non-
Production Environments
Test drive the IBM® Open Platform with Apache Hadoop and BigInsights® value-
add modules, Version 4.1 by downloading the Quick Start Edition, which is a free,
non-production software.
Use the Quick Start Edition to begin exploring the features of IBM Open Platform
with Apache Hadoop and BigInsights value-add modules by using real data and
running real applications.
The Quick Start Edition comes loaded with most of the same features as the IBM
Open Platform with Apache Hadoop, and the related services bundled in the Data
Scientist and Business Analyst packages, without any need to upgrade or uninstall
your current products. The Quick Start Edition puts no data limit on the cluster and
there is no time limit on the license.
Download the software
You can download the native software. See Installing the value-add services
for information how downloading and installing. Or, you can download the
VM image, the latter of which comes preconfigured.
Complete the tutorials
After you download the software, use the BigInsights tutorials to begin working
with big data.
For more information, view the video tutorials on the BigInsights home page.
The following table highlights the supported and unsupported features of the Quick
Start Edition.
Table 1. Supported and unsupported features of the Quick Start Edition
Parent topic:Introduction to BigInsights
Related concepts:
IBM BigInsights Quick Start Edition for Non-Production Environments: VM image
README
Supported features Unsupported features
Big SQLIBM BigInsights Big RBigSheetsText
AnalyticsWorkload optimizationQuery
SupportConnectorsManagement toolsIBM Open
Platform with Apache Hadoop
High availability (HA) capabilityGeneral Parallel
File System (GPFS™)Production support
21
IBM BigInsights
BigInsights features and architecture
BigInsights® provides distinct capabilities for discovering and analyzing business
insights that are hidden in large volumes of data. These technologies and features
combine to help your organization manage data from the moment that it enters your
enterprise.
By combining these technologies, BigInsights extends the Hadoop open source
framework with enterprise-grade security, governance, availability, integration into
existing data stores, tools that simplify developer productivity, and more.
Hadoop is a computing environment built on top of a distributed, clustered file
system that is designed specifically for large-scale data operations. Hadoop is
designed to scan through large data sets to produce its results through a highly
scalable, distributed batch processing system. Hadoop comprises two main
components: a file system, known as the Hadoop Distributed File System (HDFS),
and a programming paradigm, known as Hadoop MapReduce. To develop
applications for Hadoop and interact with HDFS, you use additional technologies
and programming languages such as Pig, Hive, Flume, and many others.
Apache Hadoop helps enterprises harness data that was previously difficult to
manage and analyze. BigInsights features Hadoop and its related technologies as a
core component.
22
-
-
-
-
-
-
File systems
The Hadoop Distributed File System (HDFS) comes with IBM Open Platform with
Apache Hadoop as your distributed file system.
MapReduce frameworks
The MapReduce framework is the core of Apache Hadoop. This programming
paradigm provides for massive scalability across hundreds or thousands of servers
in a Hadoop cluster.
Open source technologies
The following open source technologies are included with IBM Open Platform with
Apache Hadoop version 4.1.
Text Analytics
BigInsights includes Text Analytics, which extracts structured information from
unstructured and semistructured data.
IBM Big SQL
Big SQL is a massively parallel processing (MPP) SQL engine that deploys directly
on the physical Hadoop Distributed File System (HDFS) cluster.
Integration with other IBM products
BigInsights complements and extends existing business capabilities by integrating
with other IBM products. These integration points extend existing technologies to
encompass more comprehensive information types, enabling a complete view of
your business.
Parent topic:Introduction to BigInsights
23
-
IBM BigInsights
File systems
The Hadoop Distributed File System (HDFS) comes with IBM® Open Platform with
Apache Hadoop as your distributed file system.
Hadoop Distributed File System (HDFS)
The Hadoop Distributed File System (HDFS) allows applications to run across
multiple servers. HDFS is highly fault tolerant, runs on low-cost hardware, and
provides high-throughput access to data.
Parent topic:BigInsights features and architecture
24
IBM BigInsights
Hadoop Distributed File System (HDFS)
The Hadoop Distributed File System (HDFS) allows applications to run across
multiple servers. HDFS is highly fault tolerant, runs on low-cost hardware, and
provides high-throughput access to data.
Data in a Hadoop cluster is broken into smaller pieces called blocks, and then
distributed throughout the cluster. Blocks, and copies of blocks, are stored on other
servers in the Hadoop cluster. That is, an individual file is stored as smaller blocks
that are replicated across multiple servers in the cluster.
Each HDFS cluster has a number of DataNodes, with one DataNode for each node
in the cluster. DataNodes manage the storage that is attached to the nodes on
which they run. When a file is split into blocks, the blocks are stored in a set of
DataNodes that are spread throughout the cluster. DataNodes are responsible for
serving read and write requests from the clients on the file system, and also handle
block creation, deletion, and replication.
An HDFS cluster supports NameNodes, an active NameNode and a standby
NameNode, which is a common setup for high availability. The NameNode regulates
access to files by clients, and tracks all data files in HDFS. The NameNode
determines the mapping of blocks to DataNodes, and handles operations such as
opening, closing, and renaming files and directories. All of the information for the
NameNode is stored in memory, which allows for quick response times when adding
storage or reading requests. The NameNode is the repository for all HDFS
metadata, and user data never flows through the NameNode.
A typical HDFS deployment has a dedicated computer that runs only the
NameNode, because the NameNode stores metadata in memory. If the computer
that runs the NameNode fails, then metadata for the entire cluster is lost, so this
computer is typically more robust than others in the cluster.
Parent topic:File systems
Related reference:
Read the HDFS Architecture Guide
Read the HDFS User Guide
25
-
-
IBM BigInsights
MapReduce frameworks
The MapReduce framework is the core of Apache Hadoop. This programming
paradigm provides for massive scalability across hundreds or thousands of servers
in a Hadoop cluster.
Hadoop MapReduce
In IBM® Open Platform with Apache Hadoop, the MapReduce framework,
MapReduce version 2, is run as a YARN workload framework. The benefits of this
new approach are that resource management is separated from workload
management, and MapReduce applications can coexist with other types of
workloads such as Spark or Slider.
Yarn
The current version of the product supports the new Apache Hadoop YARN
framework and integrates it with the rest of the IBM Open Platform with Apache
Hadoop components. Yarn decouples resource management from workload
management.
Parent topic:BigInsights features and architecture
26
IBM BigInsights
Hadoop MapReduce
In IBM® Open Platform with Apache Hadoop, the MapReduce framework,
MapReduce version 2, is run as a YARN workload framework. The benefits of this
new approach are that resource management is separated from workload
management, and MapReduce applications can coexist with other types of
workloads such as Spark or Slider.
In this programming paradigm, applications are divided into self-contained units of
work. Each of these units of work can be run on any node in the cluster. In a
Hadoop cluster, a MapReduce program is known as a job. A job is run by being
broken down into pieces, known as tasks. These tasks are scheduled to run on the
nodes in the cluster where the data exists.
MapReduce version 2 jobs are executed by YARN in the Hadoop cluster. The YARN
ResourceManager spawns a MapReduce ApplicationMaster container, which
requests additional containers for mapper and reducer tasks. The ApplicationMaster
communicates with the NameNode to determine where all of the data required for
the job exists across the cluster. It attempts to schedule tasks on the cluster where
the data is stored, rather than sending data across the network to complete a task.
The YARN framework and the Hadoop Distributed File System (HDFS) typically
exist on the same set of nodes, which enables the ResourceManager program to
schedule tasks on nodes where the data is stored.
As the name MapReduce implies, the reduce task is always completed after the map
task. A MapReduce job splits the input data set into independent chunks that are
processed by map tasks, which run in parallel. These bits, known as tuples, are
key/value pairs. The reduce task takes the output from the map task as input, and
combines the tuples into a smaller set of tuples.
Each MapReduce ApplicationMaster monitors its spawned tasks. If a task fails to
complete, the ApplicationMaster will reschedule that task on another node in the
cluster.
This distribution of work enables map tasks and reduce tasks to run on smaller
subsets of larger data sets, which ultimately provides maximum scalability. The
MapReduce framework also maximizes parallelism by manipulating data stored
across multiple clusters. MapReduce applications do not have to be written in
Java™, though most MapReduce programs that run natively under Hadoop are
written in Java.
Parent topic:MapReduce frameworks
27
IBM BigInsights
Yarn
The current version of the product supports the new Apache Hadoop YARN
framework and integrates it with the rest of the IBM® Open Platform with Apache
Hadoop components. Yarn decouples resource management from workload
management.
The YARN framework uses a ResourceManager service, a NodeManagers service,
and an Application master service.
The Application master service is an import type of YARN services that is a per-
application service. It is responsible for negotiating with the Resource Manager to
apply resources for a particular application. It also monitors the status of the
application execution and provides tracking information. The bottleneck for the
central service, like Resource Manager, on a highly concurrent and utilized cluster is
resolved by transferring the scheduling responsibility from Resource Manager to
Application Manager.
The ResourceManager is in charge of scheduling resources for jobs. The basic
allocation unit is a container. Containers are workload agnostic, and they can
represent any type of computation, such as a map or reduce task in MapReduce.
The ResourceManager ensures that the cluster capacity is not exceeded by keeping
track of the scheduled containers and queueing requests when resources are busy.
NodeManagers spawn containers scheduled by the ResourceManager and monitor
that they do not go beyond the expected resource utilization. Containers that use
more memory or CPU than allocated are terminated.
For more information about the Yarn architecture, see
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html
Parent topic:MapReduce frameworks
28
-
-
-
-
IBM BigInsights
Open source technologies
The following open source technologies are included with IBM® Open Platform with
Apache Hadoop version 4.1.
Table 1. Open source technology versions by IBM BigInsights value-add services
release
Ambari
Apache Ambari is an open framework for provisioning, managing, and monitoring
Apache Hadoop clusters. Ambari provides an intuitive and easy-to-use Hadoop
management web UI backed by its collection of tools and APIs that simplify the
operation of Hadoop clusters.
Flume
Apache Flume is a distributed, reliable, and available service for efficiently
collecting, aggregating, and moving large amounts of streaming event data. Flume
helps you aggregate data from many sources, manipulate the data, and then add
the data into your Hadoop environment.
Hadoop
Apache Hadoop contains open-source software for reliable, scalable, distributed
computing and storage. The Apache Hadoop software library is a framework that
allows for the distributed processing of large data sets across clusters of
computers using simple programming models. It is designed to scale up from
single servers to thousands of machines, each offering local computation and
storage.
HBase
Apache HBase is a column-oriented database management system that runs on
top of HDFS and is often used for sparse data sets. Unlike relational database
systems, HBase does not support a structured query language like SQL. HBase
Open source
technology
4.1.0.0 4.1.0.1 4.1.0.2
Ambari 2.1.0 2.1.0 2.1.0
Flume 1.5.2 1.5.2 1.5.2
Hadoop (HDFS, YARN
MapReduce)
2.7.1 2.7.1 2.7.1
HBase 1.1.1 1.1.1 1.1.1
Hive 1.2.1 1.2.1 1.2.1
Kafka 0.8.2.1 0.8.2.1 0.8.2.1
Knox 0.6.0 0.6.0 0.6.0
Oozie 4.2.0 4.2.0 4.2.0
Pig 0.15.0 0.15.0 0.15.0
Slider 0.80.0 0.80.0 0.80.0
Solr 5.1.0 5.1.0 5.1.0
Spark 1.4.1 1.4.1 1.5.1
Sqoop 1.4.6 1.4.6 1.4.6
Zookeeper 3.4.6 3.4.6 3.4.6
29
-
-
-
-
-
-
-
-
applications are written in Java™, much like a typical MapReduce application.
HBase allows many attributes to be grouped into column families so that the
elements of a column family are all stored together. This approach is different from
a row-oriented relational database, where all columns of a row are stored together.
Hive
Apache Hive is a data warehouse infrastructure that facilitates data extract-
transform-load (ETL) operations, in addition to analyzing large data sets that are
stored in the Hadoop Distributed File System (HDFS). IBM Open Platform with
Apache Hadoop includes a JDBC driver that is used for programming with Hive
and for connecting with Cognos Business Intelligence software.
Kafka
Apache Kafka is a distributed publish-subscribe messaging system rethought as a
distributed commit log. It is designed to be fast, scalable, durable, and fault-tolerant
providing a unified, high-throughput, low-latency platform for handling real-time
data feeds. Kafka is often used in place of traditional message brokers because of
its higher throughput, reliability and replication.
Oozie
Apache Oozie is a management application that simplifies workflow and
coordination between MapReduce jobs. Oozie provides users with the ability to
define actions and dependencies between actions. Oozie then schedules actions
to run when the required dependencies are met. Workflows can be scheduled to
start based on a given time or based on the arrival of specific data in the file
system.
Pig
Apache Pig is a platform for analyzing large data sets that consist of a high-level
language for expressing data analysis programs, coupled with infrastructure for
evaluating these programs. A key property of Pig programs is that their structure is
amenable to substantial parallelization, which in turns enables them to handle very
large data sets.
Slider
Apache Slider (incubating) is a YARN application to deploy existing distributed
applications on YARN, monitor them, and make them larger or smaller as desired,
even while they are running.
Solr
Solr is an enterprise search tool from the Apache Lucene project that offers
powerful search tools, including hit highlighting, as well as indexing capabilities,
reliability and scalability, a central configuration system, and failover and recovery.
Spark
Spark is a component of IBM Open Platform with Apache Hadoop that includes
Apache Spark. Apache Spark is a fast and general-purpose cluster computing
system. It provides high-level APIs in Java, Scala and Python, and an optimized
engine that supports general execution graphs. It also supports a rich set of higher-
level tools including Spark SQL for SQL and structured data processing, MLLib for
machine learning, GraphX for combined data-parallel and graph-parallel
computations, and Spark Streaming for streaming data processing.
Sqoop
Sqoop is a tool designed to easily import information from structured databases
(such as SQL) and related Hadoop systems (such as Hive and HBase) into your
30
-
-
Hadoop cluster. You can also use Sqoop to extract data from Hadoop and export it
to relational databases and enterprise data warehouses.
ZooKeeper
ZooKeeper is a centralized infrastructure and set of services that enable
synchronization across a cluster. ZooKeeper maintains common objects that are
needed in large cluster environments, such as configuration information,
distributed synchronization, and group services. Many other open source projects
that use Hadoop clusters require these cross-cluster services. Having these
services available in ZooKeeper ensures that each project can embed ZooKeeper
without having to build new synchronization services into each project.
Other Apache projects
The IBM Open Platform with Apache Hadoop is a pure open source offering with
the latest components in the Apache Hadoop and Spark ecosystems.
Parent topic:BigInsights features and architecture
Related reference:
Apache Hadoop website
Related information:
Apache Solr
31
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
IBM BigInsights
Ambari
Apache Ambari is an open framework for provisioning, managing, and monitoring
Apache Hadoop clusters. Ambari provides an intuitive and easy-to-use Hadoop
management web UI backed by its collection of tools and APIs that simplify the
operation of Hadoop clusters.
Core Ambari
The release of IBM® Open Platform with Apache Hadoop includes an updated
Apache Ambari 2.1.0 with more functionality and improvements.
Customizable Dashboards [AMBARI-9792] : Ability to customize the Metric widgets
displayed on HDFS, YARN and HBase on Service Summary pages. Includes
ability for Operators to create new widgets and share widgets in a Widget Library.
Guided Configs [AMBARI-9794] : Service Configs for HDFS, YARN, Hive and
HBase included new UI controls (such as slider-bars) and an improved
organization/layout.
Manual Kerberos [AMBARI-9783] : When enabling Kerberos, ability to perform
setup Kerberos manually.
New User Views : Hive, Pig, Files and Capacity Scheduler user views are included
by default with Ambari.
Rack Awareness [AMBARI-6646] : Ability to set Rack ID on hosts. Ambari will
generate a topology script automatically and set the configuration for HDFS.
Alerts Log Appender [AMBARI-10249] : Log alert state change events to ambari-
alerts.log.
JDK 1.8 [AMBARI-9784] : Added support for Oracle JDK 1.8.
RHEL/CentOS/Oracle Linux 7 [AMBARI-979] : Added support for
RHEL/CentOS/Oracle Linux 7.
Ambari Alerts (AMBARI-6354)
Ambari Metrics (AMBARI-5707)
Simplified Kerberos Setup (AMBARI-7204)
Hive Metastore HA (AMBARI-6684)
HiveServer2 HA (AMBARI-8906)
Oozie HA (AMBARI-6683)
Add HDFS-NFS gateway as a new component to HDFS in Ambari stack (AMBARI-
9224)
Extensibility
Blueprints: Host Discovery [AMBARI-10750] : Ability to automatically add hosts to
a blueprint-created cluster.
Views Framework: Auto-create [AMBARI-10424] : Ability to specify how to
automatically create a view instance.
Views Framework: Auto-configure [AMBARI-10306] : Ability to specify how to
automatically configuration a view instance based on the cluster being managed by
Ambari.
32
For more information about the updates, see https://issues.apache.org/jira/browse/
Parent topic:Open source technologies
33
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
IBM BigInsights
Flume
Apache Flume is a distributed, reliable, and available service for efficiently
collecting, aggregating, and moving large amounts of streaming event data. Flume
helps you aggregate data from many sources, manipulate the data, and then add
the data into your Hadoop environment.
Use the following terms as a guide for working with Flume:
sources
Any data source that Flume supports,
channels
A repository where the data are staged.
sinks
The target of data where Flume sends the interceptors to.
IBM® Open Platform with Apache Hadoop and BigInsights include the following
changes on top of Flume 1.5.2:
FLUME-2095: JMS source with TIBCO
FLUME-924: Implement a JMS source for Flume NG
FLUME-997: Support secure transport mechanism
FLUME-1502: Support for running simple configurations embedded in host process
FLUME-1516: FileChannel Write Dual Checkpoints to avoid replays
FLUME-1632: Persist progress on each file in file spooling client/source
FLUME-1735: Add support for a plugins.d directory
FLUME-1894: Implement Thrift RPC
FLUME-1917: FileChannel group commit (coalesce fsync)
FLUME-2010: Support Avro records in Log4jAppender and the HDFS Sink
FLUME-2048: Avro container file deserializer
FLUME-2070: Add a Flume Morphline Solr Sink
FLUME-1227: Introduce some sort of SpillableChannel
FLUME-2056: Allow SpoolDir to pass just the filename that is the source of an
event
FLUME-2071: Flume Context doesn’t support float or double configuration values.
FLUME-2185: Upgrade morphlines to 0.7.0
FLUME-2188: flume-ng-log4jappender Support user supplied headers
FLUME-2225: Elasticsearch Sink for ES HTTP API
FLUME-2294: Add a sink for Kite Datasets
FLUME-2309: Spooling directory should not always consume the oldest file first.
For a complete list of the new features, improvements, and bug fixes available, refer
to the CHANGELOG.txt file located in your Flume installation directory.
For more information about Flume, see http://flume.apache.org/.
Parent topic:Open source technologies
34
35
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
IBM BigInsights
Hadoop
Apache Hadoop contains open-source software for reliable, scalable, distributed
computing and storage. The Apache Hadoop software library is a framework that
allows for the distributed processing of large data sets across clusters of computers
using simple programming models. It is designed to scale up from single servers to
thousands of machines, each offering local computation and storage.
The Apache Hadoop 2.7.1 release includes important new features and
improvements since Hadoop 2.2.0:
Support for Access Control Lists in HDFS
Native support for Rolling Upgrades in HDFS
Usage of protocol-buffers for HDFS FSImage for smooth operational upgrades
Complete HTTPS support in HDFS
Enhanced support for new applications on YARN with Application History Server
and Application Timeline Server
Support for strong SLAs in YARN CapacityScheduler via Preemption
Support for Heterogeneous Storage hierarchy in HDFS.
In-memory cache for HDFS data with centralized administration and management.
Simplified distribution of MapReduce binaries with HDFS in YARN Distributed
Cache.
The IBM-specific changes are:
Backport HDFS-8432:Introduce a minimum compatible layout version to allow
downgrade in more rolling upgrade use cases.
Backport HADOOP-9431: TestSecurityUtil#testLocalHostNameForNullOrWild on
systems where hostname contains capital letters
Fix to remove cross-site scripting / scripting injection in HDFS webapps
Backport HADOOP-11138: Stream yarn daemon and container logs through log4j
Fix hadoop2 scripts to enable log streaming
Backport HADOOP-10420: Add support to Swift-FS to support tempAuth
Backport MAPREDUCE-5621: mr-jobhistory-daemon.sh doesn't have to execute
mkdir and chown all the time
Make the location of container executor config file configurable
Add log streaming for container logs
Backport HADOOP-7436: Bundle Log4j socket appender Metrics plugin in Hadoop
Upgrade jsch to 0.1.50 because of JDK 1.7 incompatibilities
Backport MAPREDUCE-6191: TestJavaSerialization fails with getting incorrect MR
job result
Backport HADOOP-11418: Property "io.compression.codec.lzo.class" does not
work with other value besides default
Fix race condition in Configuration.write
Backport MAPREDUCE-6246: DBOutputFormat.java appending extra semicolon to
query which is incompatible with DB2
Backport HDFS-7282. Fix intermittent TestShortCircuitCache and
TestBlockReaderFactory failures resulting from TemporarySocketDirectory GC
Backport HDFS-7182. JMX metrics aren't accessible when NN is busy
36
-
-
-
Upgrade jetty version to 6.1.26-ibm
Backport HADOOP-10062. race condition in
MetricsSystemImpl#publishMetricsNow that causes incorrect results.
Backport HDFS-6874: Add GET_BLOCK_LOCATIONS operation to HttpFS
Parent topic:Open source technologies
37
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1
Big Insights v4.1

Más contenido relacionado

La actualidad más candente

Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
DataWorks Summit
 

La actualidad más candente (20)

Ingesting Data at Blazing Speed Using Apache Orc
Ingesting Data at Blazing Speed Using Apache OrcIngesting Data at Blazing Speed Using Apache Orc
Ingesting Data at Blazing Speed Using Apache Orc
 
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
 
Ibm integrated analytics system
Ibm integrated analytics systemIbm integrated analytics system
Ibm integrated analytics system
 
Nordic infrastructure Conference 2017 - SQL Server on Linux Overview
Nordic infrastructure Conference 2017 - SQL Server on Linux OverviewNordic infrastructure Conference 2017 - SQL Server on Linux Overview
Nordic infrastructure Conference 2017 - SQL Server on Linux Overview
 
Openshift 3.10 & Container solutions for Blockchain, IoT and Data Science
Openshift 3.10 & Container solutions for Blockchain, IoT and Data ScienceOpenshift 3.10 & Container solutions for Blockchain, IoT and Data Science
Openshift 3.10 & Container solutions for Blockchain, IoT and Data Science
 
MOUG17 Keynote: Oracle OpenWorld Major Announcements
MOUG17 Keynote: Oracle OpenWorld Major AnnouncementsMOUG17 Keynote: Oracle OpenWorld Major Announcements
MOUG17 Keynote: Oracle OpenWorld Major Announcements
 
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
 
Expert summit SQL Server 2016
Expert summit   SQL Server 2016Expert summit   SQL Server 2016
Expert summit SQL Server 2016
 
SQL On Hadoop
SQL On HadoopSQL On Hadoop
SQL On Hadoop
 
Exploring microservices in a Microsoft landscape
Exploring microservices in a Microsoft landscapeExploring microservices in a Microsoft landscape
Exploring microservices in a Microsoft landscape
 
Accelerating Big Data Insights
Accelerating Big Data InsightsAccelerating Big Data Insights
Accelerating Big Data Insights
 
Open Innovation with Power Systems
Open Innovation with Power Systems Open Innovation with Power Systems
Open Innovation with Power Systems
 
Bootcamp 2017 - SQL Server on Linux
Bootcamp 2017 - SQL Server on LinuxBootcamp 2017 - SQL Server on Linux
Bootcamp 2017 - SQL Server on Linux
 
Sharing metadata across the data lake and streams
Sharing metadata across the data lake and streamsSharing metadata across the data lake and streams
Sharing metadata across the data lake and streams
 
Built-In Security for the Cloud
Built-In Security for the CloudBuilt-In Security for the Cloud
Built-In Security for the Cloud
 
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
 
PASS Summit - SQL Server 2017 Deep Dive
PASS Summit - SQL Server 2017 Deep DivePASS Summit - SQL Server 2017 Deep Dive
PASS Summit - SQL Server 2017 Deep Dive
 
Hybrid Data Platform
Hybrid Data Platform Hybrid Data Platform
Hybrid Data Platform
 
Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...
 
Intel and Cloudera: Accelerating Enterprise Big Data Success
Intel and Cloudera: Accelerating Enterprise Big Data SuccessIntel and Cloudera: Accelerating Enterprise Big Data Success
Intel and Cloudera: Accelerating Enterprise Big Data Success
 

Destacado

Destacado (8)

Big Data: Querying complex JSON data with BigInsights and Hadoop
Big Data:  Querying complex JSON data with BigInsights and HadoopBig Data:  Querying complex JSON data with BigInsights and Hadoop
Big Data: Querying complex JSON data with BigInsights and Hadoop
 
Big Data: Using free Bluemix Analytics Exchange Data with Big SQL
Big Data: Using free Bluemix Analytics Exchange Data with Big SQL Big Data: Using free Bluemix Analytics Exchange Data with Big SQL
Big Data: Using free Bluemix Analytics Exchange Data with Big SQL
 
Big Data: Getting started with Big SQL self-study guide
Big Data:  Getting started with Big SQL self-study guideBig Data:  Getting started with Big SQL self-study guide
Big Data: Getting started with Big SQL self-study guide
 
Big Data: HBase and Big SQL self-study lab
Big Data:  HBase and Big SQL self-study lab Big Data:  HBase and Big SQL self-study lab
Big Data: HBase and Big SQL self-study lab
 
Big Data: Big SQL and HBase
Big Data:  Big SQL and HBase Big Data:  Big SQL and HBase
Big Data: Big SQL and HBase
 
Big Data: Working with Big SQL data from Spark
Big Data:  Working with Big SQL data from Spark Big Data:  Working with Big SQL data from Spark
Big Data: Working with Big SQL data from Spark
 
Big Data: Explore Hadoop and BigInsights self-study lab
Big Data:  Explore Hadoop and BigInsights self-study labBig Data:  Explore Hadoop and BigInsights self-study lab
Big Data: Explore Hadoop and BigInsights self-study lab
 
Big Data: SQL on Hadoop from IBM
Big Data:  SQL on Hadoop from IBM Big Data:  SQL on Hadoop from IBM
Big Data: SQL on Hadoop from IBM
 

Similar a Big Insights v4.1

The art of .net deployment automation
The art of .net deployment automationThe art of .net deployment automation
The art of .net deployment automation
MidVision
 
Building & managing wa app wely
Building & managing wa app   welyBuilding & managing wa app   wely
Building & managing wa app wely
Spiffy
 
Building & Managing Windows Azure
Building & Managing Windows AzureBuilding & Managing Windows Azure
Building & Managing Windows Azure
K.Mohamed Faizal
 

Similar a Big Insights v4.1 (20)

Continuous Integration & Continuous Delivery
Continuous Integration & Continuous DeliveryContinuous Integration & Continuous Delivery
Continuous Integration & Continuous Delivery
 
Introduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and PythonIntroduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and Python
 
Introduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and PythonIntroduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and Python
 
Exploring sql server 2016 bi
Exploring sql server 2016 biExploring sql server 2016 bi
Exploring sql server 2016 bi
 
Sybase IQ Course Content EDB785
Sybase IQ Course Content EDB785Sybase IQ Course Content EDB785
Sybase IQ Course Content EDB785
 
The art of .net deployment automation
The art of .net deployment automationThe art of .net deployment automation
The art of .net deployment automation
 
SQL Server 2019 Big Data Cluster
SQL Server 2019 Big Data ClusterSQL Server 2019 Big Data Cluster
SQL Server 2019 Big Data Cluster
 
Session 8 - Creating Data Processing Services | Train the Trainers Program
Session 8 - Creating Data Processing Services | Train the Trainers ProgramSession 8 - Creating Data Processing Services | Train the Trainers Program
Session 8 - Creating Data Processing Services | Train the Trainers Program
 
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive session
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive sessionMicrosoft ignite 2018 SQL server 2019 big data clusters - deep dive session
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive session
 
Get Ready for SharePoint 2016
Get Ready for SharePoint 2016Get Ready for SharePoint 2016
Get Ready for SharePoint 2016
 
NTC 326 Effective Communication/tutorialrank.com
 NTC 326 Effective Communication/tutorialrank.com NTC 326 Effective Communication/tutorialrank.com
NTC 326 Effective Communication/tutorialrank.com
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
 
Day 13 - Creating Data Processing Services | Train the Trainers Program
Day 13 - Creating Data Processing Services | Train the Trainers ProgramDay 13 - Creating Data Processing Services | Train the Trainers Program
Day 13 - Creating Data Processing Services | Train the Trainers Program
 
Make Web, Not War - Installfest: Extend Your Web Server, Rodney Buike
Make Web, Not War - Installfest: Extend Your Web Server, Rodney BuikeMake Web, Not War - Installfest: Extend Your Web Server, Rodney Buike
Make Web, Not War - Installfest: Extend Your Web Server, Rodney Buike
 
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...
 
Ntc 326 Great Wisdom / tutorialrank.com
Ntc 326 Great Wisdom / tutorialrank.comNtc 326 Great Wisdom / tutorialrank.com
Ntc 326 Great Wisdom / tutorialrank.com
 
Building & managing wa app wely
Building & managing wa app   welyBuilding & managing wa app   wely
Building & managing wa app wely
 
Building & Managing Windows Azure
Building & Managing Windows AzureBuilding & Managing Windows Azure
Building & Managing Windows Azure
 
SQL Server 2019 hotlap - WARDY IT Solutions
SQL Server 2019 hotlap - WARDY IT SolutionsSQL Server 2019 hotlap - WARDY IT Solutions
SQL Server 2019 hotlap - WARDY IT Solutions
 
Mta cert paths
Mta cert pathsMta cert paths
Mta cert paths
 

Último

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Último (20)

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 

Big Insights v4.1

  • 1. Contents BigInsights 4.1.0 1 BigInsights 3 Accessibility 3 Using keyboard shortcuts and accelerators 4 Terms and Conditions 7 Notices 8 Product overview 13 BigInsights information roadmap 14 Introduction to BigInsights 18 IBM Open Platform with Apache Hadoop and the BigInsights value-add services 20 IBM BigInsights Quick Start Edition for Non-Production Environments 21 Features and architecture 22 File systems 24 Hadoop Distributed File System 25 MapReduce frameworks 26 Hadoop MapReduce 27 Yarn 28 Open source technologies 29 Ambari 32 Flume 34 Hadoop 36 HBase 38 Hive 40 Kafka 43 Oozie 44 Pig 45 Slider 47 Solr 49 Spark 50 Sqoop 53 ZooKeeper 54 Other Apache projects 55 Text Analytics 56 IBM Big SQL 57 Integration with other IBM products 60 Suggested services layout for IBM Open Platform with Apache Hadoop and BigInsights value-added services 62 Scenarios for working with big data 63 Predictive modeling 64 Consumer sentiment insight 66 Research and business development 67 Where BigInsights fits in an enterprise data architecture 69 Release notes 70 Release notes - IBM Open Platform with Apache Hadoop, and BigInsights value- add services, Version 4.1 71 What's new for Version 4.1 75
  • 2. Installing 80 Installing IBM Open Platform with Apache Hadoop 81 Important installation information 83 Preparing to install IBM Open Platform with Apache Hadoop 86 Reference Architecture 88 Suggested services layout for IBM Open Platform with Apache Hadoop and BigInsights value-added services 89 Meeting minimum system requirements 91 Collect host names and other information for your installation 92 Preparing your environment 93 Users and groups for IBM Open Platform with Apache Hadoop 102 Default Ports created by a typical installation 104 Setting up port forwarding from private to edge nodes 106 Configuring your browser 108 Configuring authentication 109 Configuring LDAP authentication on RHEL 6 110 Obtaining software for the IBM Open Platform with Apache Hadoop 113 Downloading the IBM repository definition for the IBM Open Platform with Apache Hadoop 115 Creating a mirror repository for the IBM Open Platform with Apache Hadoop software 116 Creating a repository for Spectrum Scale 118 Installing IBM Open Platform with Apache Hadoop on SUSE Linux Enterprise Server (SLES) by using tar files 120 Running the installation package 122 Validating your installation 131 Upgrading the Java (JDK) version 132 Installing and configuring HttpFS on IBM Open Platform with Apache Hadoop 134 Installing and configuring WebHDFS with Knox on IBM Open Platform with Apache Hadoop 137 Installing additional services in your IBM Open Platform 139 Cleaning up nodes before reinstalling software 140 HostCleanup.ini file 144 HostCleanup_Custom_Actions.ini file 146 Advanced installation planning 147 Serviceability tools 148 Directories created when installing IBM Open Platform with Apache Hadoop150 Planning for high availability 152 Configuring Slider HBase 153 Setting up a dual network for IBM Open Platform with Apache Hadoop 157 Enabling CMX compression to compress the output of the intermediate jobs generated by Pig 159 Configuring YARN container execution 160 Installing the IBM BigInsights value-added services on IBM Open Platform with Apache Hadoop 162 Preparing to install the BigInsights value-add services 164 Users, groups, and ports for BigInsights value-add services 169
  • 3. Default Ports created by a typical BigInsights value-add services installation 171 Obtaining the BigInsights value-add services 172 Additional related software 175 Obtaining the BigInsights Quick Start Edition for non-production use 177 Installing the BigInsights value-add packages 178 Installing BigInsights Home 185 Installing the BigSheets service 187 Installing the BigInsights - Big SQL service 190 Preinstallation checker utility for Big SQL 194 Clean-up utility for Big SQL 198 Migrating from Big SQL 1.0 200 Installing the BigInsights - Data Server Manager service 203 Installing the Text Analytics service 207 Installing the Big R service 211 Installing Big R on a workstation or notebook 217 Enabling Knox for value-add services 218 Removing BigInsights value-add services 222 Installing the Enterprise Manager module for IBM Open Platform with Apache Hadoop 226 Acquiring Spectrum Scale and Platform Symphony 227 Spectrum Scale FPO (GPFS) 228 Installing Spectrum Scale FPO (GPFS) 229 Performance analysis for GPFS 230 Platform Symphony 232 Installing and configuring Platform Symphony 234 Installing the IBM Platform Symphony service stack on the Ambari server 235 Adding Platform Symphony as an Ambari service 237 Post-installation checks 239 Running a service check on the Platform Symphony cluster 240 Verifying component installation and configuration 241 Cleaning up an incomplete Symphony deployment from Ambari 242 Post-installation configuration 244 Mapping Hadoop YARN queues to Symphony YARN queues 245 Granting MySQL access privileges 248 Updating the Ambari GUI manually after integrating Symphony 249 Changing port numbers 251 Configuring master failover in an Ambari environment 252 Configuring automatic failure recovery for the Symphony YARN Resource Manager in an Ambari environment 253 Opening the Platform Management Console from the Ambari console 254 Controlling hosts with Platform Symphony 255 Integrating open source MapReduce, YARN, or Spark in a Symphony cluster 256 Undoing integration of open source MapReduce, YARN, or Spark in a Symphony cluster 257 Upgrading IBM Platform Symphony to IBM Open Platform 4.1 258
  • 4. Getting started with Spark on EGO 261 Submitting a sample Spark job 263 Spark on EGO FAQs 265 Installing the free IBM Open Platform with Apache Hadoop and BigInsights Quick Start Edition, non-production software 266 IBM Open Platform with Apache Hadoop and IBM BigInsights Quick Start Edition for non-production environments, v4.1: Docker image README 267 Optional: Get nodes on a cloud environment 276 IBM BigInsights Quick Start Edition for Non-Production Environment, v4.1: VMware image README 279 Tutorials 284 Tutorial: Analyzing big data with BigSheets 285 Lesson 1: Creating master workbooks from social media data 287 Lesson 2: Tailoring your data by creating child workbooks 290 Lesson 3: Combining the data from two workbooks 293 Lesson 4: Creating columns by grouping data 297 Lesson 5: Viewing data in BigSheets diagrams 299 Lesson 6: Visualizing and refining the results in charts 300 Lesson 7: Exporting data from your workbooks 306 Summary of analyzing data with BigSheets tutorial 308 Tutorial: Developing Big SQL queries to analyze big data 309 Setting up the Big SQL tutorial environment 311 Creating a directory in the distributed file system to hold your samples 312 Getting the sample data 313 Accessing the Big SQL sample data installed in the Big SQL service 314 Downloading sample data from a developerWorks source 316 Creating tables and loading sample data 317 Module 1: Creating and running SQL script files 320 Lesson 1.1: Creating an SQL script file 322 Lesson 1.2: Creating and running a simple query to begin data analysis 323 Lesson 1.3: Creating a view that represents the inventory shipped by branch 327 Lesson 1.4: Analyzing products and market trends with Big SQL Joins and Predicates 329 Lesson 1.5: Creating advanced Big SQL queries that include common table expressions, aggregate functions, and ranking 334 Lesson 1.6: Advanced: Porting an existing Hive UDF to Big SQL 337 Lesson 1.7: Advanced: Creating and running a simple Big SQL query from a JDBC client application 339 Module 2: Analyzing big data by using Big SQL and BigSheets 343 Lesson 2.1: Preparing queries to export to BigSheets that examine the results of sales by year 345 Lesson 2.2: Exporting Big SQL data about total sales by year to BigSheets 347 Lesson 2.3: Creating tables for BigSheets from other tables 349 Lesson 2.4: Exporting BigSheets data about IBM Watson blogs to Big SQL tables 351 Task 2.4.1: Creating and modifying a BigSheets workbook from JSON Array formatted data 352
  • 5. Task 2.4.2: Exporting the BigSheets blog data workbook to a TSV file 354 Task 2.4.3: Creating a Big SQL script that creates Big SQL tables from the exported TSV file 355 Task 2.4.4: Exporting the BigSheets workbook as a JSON Array for use with a SerDe application in Big SQL 357 Task 2.4.5: Creating a Big SQL table that uses the SerDe application to process the Watson blog data 359 Module 3: Analyzing Big SQL data in a client spreadsheet program 361 Lesson 3.1: Installing the IBM Data Server Driver Package for the client ODBC drivers 362 Lesson 3.2: Importing Big SQL data to a client spreadsheet program 365 Module 4: Using Federation in Big SQL 367 Lesson 4.1: Setting up the client data source 368 Lesson 4.2: Configuring Big SQL as the Federated Server 370 Lesson 4.3: Integrating information from two companies 373 Module 5: Working with HBase tables 375 Lesson 5.1: Creating and populating HBase tables 376 Lesson 5.2: Creating and using HBase views 378 Summary of developing Big SQL queries to analyze big data 379 Tutorial: Analyzing big data with Big R 380 Lesson 1: Uploading the airline data set to the BigInsights server with Big R 382 Lesson 2: Exploring the structure of the data set with Big R 384 Lesson 3: Analyzing data with Big R 385 Lesson 4: Visualizing big data with Big R 387 Lesson 5: Extending R packages to work with big data 389 Lesson 6: Building scalable machine learning models with Big R 392 Summary of analyzing data with Big R tutorial 395 Tutorial: Analyzing text with BigInsights Text Analytics 396 Lesson 1: Setting up your project 398 Lesson 2: Selecting input documents and identifying examples 400 Lesson 3: Creating and testing extractors 403 Lesson 4: Writing and testing extractors for candidates 406 Lesson 5: Creating and testing final extractors 411 Lesson 6: Finalizing and saving the extractor 413 Summary of creating your first Text Analytics extractor 414 Importing and exporting data 415 Identifying your data resources 417 Recommended tools for importing data 420 Importing data at rest 421 Importing data by using Hadoop shell commands 423 Importing data in motion 425 Importing data by using Flume 427 Importing data from a data warehouse 428 Big SQL LOAD 430 How to diagnose and correct LOAD HADOOP problems 460 How to monitor the progress of LOAD HADOOP 463 Importing and exporting DB2 data by using Sqoop 464
  • 6. Importing IMS data by using Sqoop 466 Integrating the Teradata Connector for Hadoop 468 Installing the Teradata Connector for Hadoop 469 Importing data with the Teradata Connector for Hadoop 471 Exporting data with the Teradata Connector for Hadoop 475 Running tdimport or tdexport from an Oozie application 479 Corresponding Sqoop and Teradata options 482 Setting up and administering security 484 Securing IBM Open Platform with Apache Hadoop 485 Setting up HTTPS with a self-signed certificate for the Ambari web interface 487 Setting up HTTPS with an authority certificate for the Ambari web interface 489 Setting up two-way SSL between the Ambari server and Ambari agents 491 Manually configuring SSL support for HBase REST gateway with Knox 492 Manually configuring SSL support for HBase, MapReduce, YARN, and HDFS web interfaces 495 Manually configuring SSL support for HiveServer2 499 Manually configuring SSL support for Oozie 501 Apache Knox gateway overview 503 Hadoop service access in Knox 506 Knox Gateway directories 509 Knox Gateway samples 511 Changing the Knox Gateway port or path 513 Managing the master secret 514 Redeploying cluster topologies 515 Manually starting and stopping Apache Knox 518 Adding a new service to the Knox Gateway 519 Cluster topology definition in Apache Knox Gateway 520 Knox topology configuration to connect to Hadoop cluster services 522 Setting up Hadoop service URLs 523 Example: service definitions 524 Service connectivity validation 526 Configuring authentication on Knox and Ambari 528 Setting up LDAP authentication in Knox 529 Example: Active Directory configuration 531 Example: OpenLDAP configuration 532 Setting up LDAP or Active Directory authentication in Ambari 533 Knox Gateway Identity Assertion 537 Defining an identify-assertion provider 539 Adding a user mapping rule to an identity-assertion provider: 540 Concat Identity Assertion 541 User Mapping Example 542 Configuring group mapping 543 Knox Gateway security 545 Implementing web application security 546 Configuring Knox with a secured Hadoop cluster 548 Configuring wire encryption (SSL) 550 Using CA-signed certificates for production 551
  • 7. Kerberos in IBM Open Platform with Apache Hadoop 552 Overview of Kerberos in IBM Open Platform with Apache Hadoop 553 Setting up Kerberos for IBM Open Platform with Apache Hadoop 556 Setting up a KDC manually 561 Manually generating keytabs for Kerberos authentication 563 Properties in the Kerberos descriptors 570 Enabling SPNEGO authentication for IBM Open Platform with Apache Hadoop 572 User and group management 573 Changing the administrator account password 575 Creating a local user 576 Changing the password of a local user 577 Deleting a local user 578 Creating a local group 579 Managing local group membership 580 Deleting a local group 581 Enabling transparent data encryption 582 Securing the BigInsights value-added services 585 Restarting Knox to access value added components 586 Setting up Kerberos for the BigInsights - Big R service 587 Setting up Kerberos for the BigInsights - Big SQL service 588 Setting up Kerberos for the BigInsights - Text Analytics service 590 Setting up Kerberos for the BigInsights - BigSheets service 591 Understanding and configuring BigSheets access of Big SQL table data 593 Enabling SSL encryption for the Big R service 595 Managing Access Control Lists (ACL) and Authorizations 597 ACL Management for Hive 598 Storage-Based Authorization 600 SQL-Standard Based Authorization 603 ACL Management for HDFS 607 ACL Management for HBase 609 ACL Management for YARN 612 Administering Ambari and components 613 High availability 614 Setting up NameNode high availability 615 Pointing to a new NameNode location in the Hive metastore 618 Setting up Oozie high availability 621 Setting up Resource Manager high availability 624 Enabling work-preserving ResourceManager restart 625 Enabling Hive metastore high availability 627 Enabling HiveServer2 high availability 629 High availability in Big SQL 632 Enabling Big SQL high availability 635 Disabling Big SQL high availability 638 Example of configuring clients to work with Big SQL high availability 639 Configuring high availability for HBase 641 Managing Flume 642
  • 8. Flume configuration scenario 644 Decommissioning slave nodes 646 Ambari alerts and monitoring services 649 Working with alerts in the Ambari web interface 652 Configuring notifications 654 Creating or editing alert groups 657 Creating alert notification to track status changes in high availability failover 659 Pre-defined alerts in Ambari 663 HDFS service alerts 664 NameNode High Availability service alerts 667 YARN service alerts 669 MapReduce2 service alerts 671 HBase service alerts 672 Hive service alerts 674 Oozie servie alerts 675 ZooKeeper service alerts 676 Ambari alerts 677 Ambari metrics 678 Working with widgets 682 Working with service specific widgets 685 How to switch the Ambari metrics system to a distributed mode 689 Ambari views 691 Creating a Capacity-Scheduler view instance 692 Creating a Files view instance 694 Creating a Hive view instance 697 Additional Hive view configurations and setup 701 Creating a Pig view instance 703 Creating a Slider view instance 707 Developing applications to access and manage data 709 Developing Big SQL applications in your Hadoop environment 710 What's New with the Big SQL server 712 Big SQL configuration and log management 714 JSqsh client 715 Big SQL connections 716 Security in Big SQL 717 HBase tables 718 Data types that are supported by Big SQL 719 Big SQL Catalog schema 721 What you need to know before writing Big SQL applications 722 File formats supported by Big SQL 723 User-defined external scalar functions in Big SQL 728 Transactional behavior of Hadoop tables 731 Transactional behavior of CREATE TABLE ... AS in Big SQL 732 Transactional behavior of INSERT in Big SQL 733 Transactional behavior of LOAD HADOOP USING 734 Understanding data types 735 Data types migrated from Hive applications 736
  • 9. Data types that are supported by Big SQL 737 How Hive handles NULL values on a partitioning column of type String 743 How to work with HBase tables 745 Security considerations when Big SQL accesses HBase objects 755 HDFS caching 757 Memory calculator worksheet 764 Developing routines 767 Routines 769 Overview of routines 770 Benefits of using routines 772 Types of routines 774 Built-in and user-defined routines 777 Built-in 778 User-defined 779 Comparison of user-defined and built-in routines 781 Choosing to use built-in or user-defined routines 783 Functional types of routines 784 Procedures 786 Functions 788 Scalar functions 790 Row functions 792 Table functions 793 Methods 794 Comparison of routine functional types 795 Choosing a routine functional type 798 Implementations of routines 800 Built-in routines 802 Sourced routines 803 SQL routines 804 External routines 805 Supported APIs and programming languages 807 Comparison of APIs and programming languages 808 Comparison of routine implementations 812 Choosing a routine implementation 815 Usage of routines 817 Administering databases with built-in routines 818 Extension of SQL function support with user-defined functions 820 Auditing using SQL table functions 821 Tools for developing routines 824 IBM Data Studio routine development support 825 SQL statements that can be executed in routines and triggers 826 SQL access levels 833 Determining what SQL statements can be executed in routines 835 Portability of routines 837 Interoperability of routines 838 Performance of routines 839 Security of routines 847
  • 10. Securing routines 849 Authorizations and binding of routines that contain SQL 851 Data conflicts when procedures read from or write to tables 855 Debugging compiled SQL PL objects overview 857 External routines 858 External routine features 860 External function and method features 862 Scalar user-defined functions 864 External scalar function and method invocation 866 External table functions 867 External table function processing 868 Generic table functions 870 Using generic table functions 871 Java table function execution model 873 Scratchpads for external functions and external methods 875 Scratchpads for 32-bit and 64-bit operating systems 879 SQL in external routines 881 Parameter styles for external routines 884 Parameter handling 890 Supported routine programming languages 892 Comparison of APIs and programming languages 894 Performance considerations for developing routines 894 Security considerations for routines 897 Routine code page considerations 900 Application and routine support 901 32-bit and 64-bit support for external routines 903 Performance of 32-bit routines in 64-bit environments 904 XML data type support 905 Restrictions on external routines 907 Creating external routines 910 Writing routines 913 Debugging routines 915 Library and class management considerations 917 Deployment of routine library or class files 919 Security of external routine library or class files 921 Resolution of external routine library or class files 922 Modifications to external routine library or class files 923 Backup and restore of external routine library and class files 924 Performance and library management 925 C and C++ routines 926 Supported software (C) 928 Supported software (C++) 929 Tools for developing C and C++ routines 930 Designing C and C++ routines 931 Include file required for C and C++ routine development 933 Parameters in C and C++ routines 935 Parameter styles supported 937
  • 11. Parameter null indicators 938 Parameter style SQL C and C++ procedures 939 Parameter style SQL C and C++ functions 943 Passing parameters by value and by reference 946 Parameters not required for result sets 947 The dbinfo structure routine parameter 948 Scratchpad as function parameter 952 Program type MAIN support for procedures 954 SQL data type representation 956 SQL data type handling 960 How to pass arguments to C routines 969 Graphic host variables 981 C++ type decoration 982 Returning result sets from procedures 984 Creating C and C++ routines 986 Building C and C++ routine code 989 Building C and C++ routine code using the sample bldrtn script 990 Building routines in C or C++ using the sample build script (UNIX) 992 Building C/C++ routines on Windows 992 Building C and C++ routine code from the command line 992 Compile and link options for C and C++ routines 994 AIX C routine compile and link options 995 AIX C++ routine compile and link options 995 HP-UX C routine compile and link options 995 HP-UX C++ routine compile and link options 995 Linux C routine compile and link options 995 Linux C++ routine compile and link options 995 Solaris C routine compile and link options 995 Solaris C++ routine compile and link options 995 Windows C and C++ routine compile and link options 995 Rebuilding routine shared libraries 995 Updating the database manager configuration parameters 996 COBOL procedures 997 Supported software 1000 Supported SQL data types in COBOL embedded SQL applications 1001 Building COBOL routines 1001 Compile and link options for COBOL routines 1002 AIX IBM COBOL routine compile and link options 1003 AIX Micro Focus COBOL routine compile and link options 1003 HP-UX Micro Focus COBOL routine compile and link options 1003 Solaris Micro Focus COBOL routine compile and link options 1003 Linux Micro Focus COBOL routine compile and link options 1003 Windows IBM COBOL routine compile and link options 1003 Windows Micro Focus COBOL routine compile and link options 1003 Building IBM COBOL routines on AIX 1003 Building UNIX Micro Focus COBOL routines 1003
  • 12. Building IBM COBOL routines on Windows 1003 Building Micro Focus COBOL routines on Windows 1003 Java routines 1003 Supported software 1005 JDBC and SQLJ API support 1006 Specifying JDK for Java routine development (Linux and UNIX) 1007 Specification of a driver for Java routines 1009 Tools for developing Java routines 1010 Designing Java routines 1011 SQL data type representation 1013 Connection contexts in SQLJ routines 1015 Parameters in Java routines 1016 Parameter style JAVA procedures 1018 Parameter style JAVA functions 1020 Parameter style HIVE functions 1021 Supported SQL data types in HIVE routines 1024 Parameter style DB2GENERAL routines 1025 DB2GENERAL UDFs 1026 Supported SQL data types in DB2GENERAL routines 1029 Java classes for DB2GENERAL routines 1031 DB2GENERAL Java class: COM.ibm.db2.app.StoredProc 1032 DB2GENERAL Java class: COM.ibm.db2.app.UDF 1034 DB2GENERAL Java class: COM.ibm.db2.app.Lob 1037 DB2GENERAL Java class: COM.ibm.db2.app.Blob 1038 DB2GENERAL Java class: COM.ibm.db2.app.Clob 1039 Passing parameters of data type ARRAY to Java routines 1040 Returning result sets from Java (JDBC) procedures 1042 Returning result sets from Java (SQLJ) procedures 1043 Retrieving procedure result sets in Java (JDBC) applications and procedures 1044 Retrieving procedure result sets in Java (SQLJ) applications and procedures 1046 Restrictions on Java routines 1048 Java table function execution model 1050 Creating Java routines 1050 Creating Java routines from the command line 1052 Building Java routine code 1055 Building JDBC routines 1056 Building SQLJ routines 1058 Compile and link options for Java (SQLJ) routines 1059 SQLJ routine options for Linux and UNIX 1060 Deploying Java routines 1061 JAR file administration 1063 Updating Java routines 1065 Examples of Java (JDBC) routines 1067 Example: Array data type in Java (JDBC) procedure 1068
  • 13. Example: XML and XQuery support in Java (JDBC) procedure 1069 Invoking routines 1074 Authorizations and binding of routines that contain SQL 1077 Routine names and paths 1077 Nested routine invocations 1079 Invoking 32-bit routines on a 64-bit database server 1080 References to procedures 1081 Calling procedures 1082 Calling procedures from applications or external routines 1084 Calling procedures from triggers or SQL routines 1086 Calling stored procedures from the CLP 1089 Calling stored procedures from CLI applications 1092 Calling stored procedures with array parameters from CLI applications 1092 Procedure result sets 1092 Result sets from SQL data changes 1094 Result sets from SQL data changes using cursors 1097 References to functions 1098 Function selection 1100 Distinct types as UDF or method parameters 1102 LOB values as UDF parameters 1103 Invoking scalar functions or methods 1104 Invoking user-defined table functions 1106 Analyzing big data by using BigInsights value-added services 1108 Analyzing data with BigSheets 1109 Overview of BigSheets 1110 Workbooks and sheets 1112 Sheet types 1114 Group sheet 1116 Building sets of data 1120 Creating master workbooks from catalog tables 1121 Creating workbooks from existing workbooks 1122 Changing a column data type in a master workbook 1123 Data types 1124 Changing the data source for master workbooks 1125 Copying workbooks to a new cluster 1126 Exporting workbook metadata 1127 Importing workbook metadata 1128 Discovering data 1129 Adding columns in sheets 1130 Modifying columns in sheets 1131 Adding sheets to workbooks 1132 Viewing related sheets 1133 Viewing related workbooks 1134 Changing the data reader for workbooks 1135 Data readers 1136 Running workbooks 1140
  • 14. Visualizing your result data in charts and maps 1141 Chart and map types 1142 Understanding null behavior 1145 Deleting workbooks 1146 Restoring deleted workbooks 1147 Purging deleted workbooks 1148 Formulas 1149 Functions 1150 Conditional functions 1152 DateTime functions 1157 Pattern syntax for custom DateTime formats 1165 Entity functions 1167 HTML and XML functions 1171 Math functions 1177 Geospatial functions 1182 Statistical functions 1190 Selection functions 1193 Text functions 1195 Text comparison functions 1214 URL functions 1219 Formula examples 1225 Sharing data 1227 Exporting data from a workbook 1228 Sharing workbooks with other users 1229 Creating and deleting catalog tables 1230 Extending BigSheets 1232 Administering BigSheets by using REST APIs 1233 Creating BigSheets plug-ins 1243 Building customized functions 1246 Building customized readers 1251 Building customized charts 1257 Uploading custom plug-ins to BigSheets 1261 Analyzing big data with Text Analytics 1262 Developing extractors in the web tool 1263 Information Extraction Web Tool 1264 Designing text extraction projects 1265 Design your project 1267 What are the provided extractors? 1269 Linguistic support 1270 Case study: Extracting insights from financial documents 1271 Create the extractor 1272 Refine the results 1275 Manage projects and extractors 1276 Use the workspace 1277 Manage projects 1278 Adding and removing sample documents 1279 Document size limitations 1283
  • 15. Manage extractors 1285 Create, edit, and combine extractors 1287 Define dictionaries 1289 Define a list 1290 Define a mapping table 1291 Define a literal 1292 Define regular expressions 1293 Define sequence patterns 1295 Add proximity rules 1297 Define unions of extractors 1299 Run an extractor and refine results 1301 Refine results 1303 Eliminate duplicate and overlapping results 1306 Refine results using filters 1308 Export refined extractor results 1310 Extract in languages other than English 1312 Extend the provided extractors 1314 Define new extractors based on linguistic patterns 1316 Custom extractors 1317 Exporting Extractors 1318 Exporting extractors to AQL 1319 Exporting extractors as map/reduce jobs 1321 Exporting extractors as BigSheets functions 1323 Developing Text Analytics extractors using Annotation Query Language (AQL)1326 Annotation Query Language (AQL) 1328 Extractors 1329 Modules 1331 Scenarios that illustrate modules 1334 Best practices for developing modules 1342 AQL files 1343 Views 1345 Dictionaries 1347 Tables 1348 Functions 1349 Pre-built extractor libraries 1350 Named entity extractors 1354 Financial extractors 1361 Generic extractors 1372 Other extractors 1376 Sentiment extractors 1379 Machine Data Analytics extractors 1382 Base modules 1485 Guidelines for writing AQL 1487 Using basic feature AQL statements 1491 Using candidate generation AQL statements 1493 Using filter and consolidate AQL statements 1496 Creating complex AQL statements 1499
  • 16. Enhancing content of AQL views 1502 Using naming conventions 1505 Data collection formats 1507 UTF-8 encoded text files 1508 UTF-8 encoded CSV files 1510 UTF-8 encoded JSON files in Hadoop text input format 1512 Multilingual support for Text Analytics 1515 Text Analytics Optimizer 1517 Execution plan 1522 Operators 1525 Relational operators 1526 Span aggregation operators 1528 Span extraction operators 1529 Specialized operators 1531 Tokenization 1532 Running Text Analytics extractors 1535 Run extractors on distributed files from the web tool 1536 Running extractors with the Java Text Analytics APIs 1538 Reading document collections with the DocReader API 1550 Text Analytics URI formats 1553 Improving extractor performance 1556 Know your performance requirements 1557 Follow the guidelines for regular expressions 1558 Use the consolidate clause wisely 1563 Using external resources in Text Analytics 1566 Analyzing and manipulating big data with Big SQL 1568 Configuring security for Big SQL 1570 Enabling authentication for Big SQL 1572 Authorization of Big SQL objects 1574 Database authorization 1577 Enabling SSL (Secure Socket Layer) encryption 1581 Default privileges granted on the bigsql database 1582 Configuration parameters 1584 Big SQL architecture 1587 Managing the Big SQL server 1589 Configuring the IBM Big SQL server 1590 Big SQL Scheduler 1593 Big SQL Input/Output 1595 JDBC and ODBC drivers 1596 JDBC driver 1597 ODBC driver for Linux 1599 ODBC driver for Windows 1601 Connecting to the Big SQL server that is part of the Big SQL service 1603 Downloading and Installing IBM Data Studio 1604 Creating or changing a JDBC driver definition 1605 Connecting to a Big SQL server 1606 Analyzing data with Big SQL 1608
  • 17. How to run Big SQL queries 1610 Running Big SQL queries with Big SQL monitoring and edit tool 1611 Java SQL Shell (JSqsh) 1612 Big SQL statistics 1616 Statistics gathered from expression columns in statistical views 1617 Extending Big SQL 1619 Working with Hive ACID tables in Big SQL 1622 LOAD performance guidelines 1625 Tuning HBase performance 1627 HBase basics 1628 General HBase tuning 1631 Major compaction and data locality 1634 Hints for designing HBase tables 1636 Mapping data types to fields in HBase 1639 Hints for designing indexes 1642 Properties that can optimize HBase table scans 1645 Hints for optimizing LOAD 1646 Monitoring Big SQL in the IBM Open Platform with Apache Hadoop environment 1649 Monitoring metrics with the Big SQL query interface 1650 Monitoring the cluster status of your Big SQL queries 1653 Analyzing data with Big R 1654 Overview of Big R 1655 Connecting to a data set with Big R 1656 Running Big R scripts 1657 Troubleshooting and support 1658 Resolving problems with BigInsights 1659 Logging 1660 Logs and their locations 1661 Problems and workarounds 1664 Installation 1665 Unable to open Ambari browser 1666 Installing IBM Open Platform with Apache Hadoop does not complete successfully because of connection issues 1667 Starting the Ambari server on Linux Power operating systems fails due to connection error when the number of cores in the machine is greater than 48 1668 Components and value-add services 1669 Cannot stop all BigInsights value-add services from web interface 1671 Oozie: Cannot start Oozie service - ERROR XSDB6 1672 Restart all Hive services does not start all services correctly 1673 Failed to get schema version when starting Hive Metastore Service 1674 Adding additional Kafka Brokers after the initial installation might result in an error when starting the broker 1675 Running Kafka producer with localhost generates error 1676 Continual "Hive Metastore Process" alerts showing in the Ambari web interface even when process is running on pLinux 1678
  • 18. After Kerberos is enabled, Ambari Quick Links might no longer work 1679 Ambari is unable to create a Files view for a cluster after Kerberos is enabled 1680 Spark Thrift Server (1.5.1) goes down when Kerberos is enabled on the cluster 1681 Restarting the Solr service might fail 1682 Big SQL 1683 Hive and Big SQL catalogs inconsistent 1685 TPCIP ports in FIN_WAIT1 state 1687 JSqsh Big SQL connection profile points to the wrong Big SQL service port 1688 Command fails with authorization error after successfully connecting to a Big SQL server 1689 Installing the Big SQL service failed because of a tty requirement 1690 Cannot decommission a dead Big SQL worker node 1691 How to delete a Big SQL worker node 1693 Big SQL monitoring utility fails to install 1695 Big SQL monitoring utility (DSM) is not configured properly because of a packaging error 1696 Uninstalling the BigInsights - Big SQL service does not completely uninstall components 1697 Cannot use Hive directly to work with Big SQL HBase tables 1698 Big SQL instance owner does not exist in the server that hosts the NameNode service causing Hadoop operations to fail 1699 Big SQL authorization errors after switching to an HDFS standby NameNode 1700 How to remove a faulty node from the Big SQL service 1701 Interrupted operations can cause metadata inconsistency 1703 The Big SQL scheduler can report an incorrect number of nodes in a Spectrum Scale FPO (GPFS) environment, affecting the Big SQL plan quality 1704 Text Analytics 1705 Stopping and starting the Hive service that is installed on the same node as BigInsights - Text Analytics might result in a failure. 1706 BigSheets 1707 Workbook names must be unique 1708 Workbook run does not progress 1709 BigSheets reader cannot view data 1710 BigSheets service start fails in a Kerberos environment 1712 Area charts hide data sets 1713 BigSheets hangs when you remove columns 1714 BigInteger columns cannot be Y axis 1715 Data missing from processed results 1716 Unable to create a table from BigSheets 1718 BigInsights Home service 1719 After installing the BigInsights Home service, other services will not start 1720
  • 19. Big R problems and workarounds 1721 Removing Big R is not always successful 1722 General troubleshooting techniques and resources 1723 Subscribing to IBM Support updates 1724 Searching knowledge bases 1726 Getting fixes from Fix Central 1727 Contacting IBM Support 1729 Exchanging information with IBM 1731 Reference 1733 IBM Big SQL reference 1734 IBM Big SQL Reference 1735 How to read the syntax diagrams 1736 Conventions used for the SQL topics 1739 Error conditions 1740 Highlighting conventions 1741 Conventions describing Unicode data 1742 Language elements 1743 Characters 1744 Tokens 1746 Identifiers 1748 Data types 1777 Data type list 1779 Numbers 1780 Character strings 1783 Graphic strings 1788 National character strings 1790 Binary strings 1791 Large objects (LOBs) 1792 Datetime values 1794 Boolean values 1798 Cursor values 1799 XML values 1800 Array values 1801 Row values 1804 Anchored types 1806 User-defined types 1807 Promotion of data types 1811 Casting between data types 1813 Assignments and comparisons 1822 Rules for result data types 1842 Rules for string conversions 1849 String comparisons in a Unicode database 1851 Resolving the anchor object for an anchored type 1853 Resolving the anchor object for an anchored row type 1855 Database partition-compatible data types 1857 Constants 1859 Special registers 1865
  • 20. CURRENT CLIENT_ACCTNG 1868 CURRENT CLIENT_APPLNAME 1869 CURRENT CLIENT_USERID 1870 CURRENT CLIENT_WRKSTNNAME 1871 CURRENT DATE 1872 CURRENT DBPARTITIONNUM 1873 CURRENT DECFLOAT ROUNDING MODE 1874 CURRENT DEFAULT TRANSFORM GROUP 1875 CURRENT DEGREE 1876 CURRENT EXPLAIN MODE 1877 CURRENT EXPLAIN SNAPSHOT 1879 CURRENT FEDERATED ASYNCHRONY 1880 CURRENT IMPLICIT XMLPARSE OPTION 1881 CURRENT ISOLATION 1882 CURRENT LOCALE LC_MESSAGES 1883 CURRENT LOCALE LC_TIME 1884 CURRENT LOCK TIMEOUT 1885 CURRENT MAINTAINED TABLE TYPES FOR OPTIMIZATION 1886 CURRENT MDC ROLLOUT MODE 1887 CURRENT MEMBER 1888 CURRENT OPTIMIZATION PROFILE 1889 CURRENT PACKAGE PATH 1890 CURRENT PATH 1891 CURRENT QUERY OPTIMIZATION 1892 CURRENT REFRESH AGE 1893 CURRENT SCHEMA 1894 CURRENT SERVER 1895 CURRENT SQL_CCFLAGS 1896 CURRENT TEMPORAL BUSINESS_TIME 1897 CURRENT TEMPORAL SYSTEM_TIME 1899 CURRENT TIME 1901 CURRENT TIMESTAMP 1902 CURRENT TIMEZONE 1904 CURRENT USER 1905 SESSION_USER 1906 SYSTEM_USER 1907 USER 1908 Global variables 1909 Types of global variables 1910 Authorization required for global variables 1912 Resolution of global variable references 1913 Using global variables 1915 Functions 1917 Methods 1933 Conservative binding semantics 1941 Expressions 1944 Datetime operations and durations 1957
  • 21. CASE expression 1963 CAST specification 1966 Field reference 1972 XMLCAST specification 1974 ARRAY element specification 1976 Array constructor 1977 Dereference operation 1979 Method invocation 1981 OLAP specification 1983 ROW CHANGE expression 1998 Sequence reference 2000 Subtype treatment 2004 Determining data types of untyped expressions 2005 Row expression 2012 Predicates 2014 Search conditions 2015 Basic predicate 2018 Quantified predicate 2021 ARRAY_EXISTS predicate 2024 BETWEEN predicate 2025 Cursor predicates 2026 EXISTS predicate 2028 IN predicate 2029 LIKE predicate 2031 NULL predicate 2037 REGEXP_LIKE predicate 2038 Trigger event predicates 2041 TYPE predicate 2042 VALIDATED predicate 2044 XMLEXISTS predicate 2047 Built-in global variables 2050 CATALOG_SYNC_MODE global variable 2052 CLIENT_HOST global variable 2053 CLIENT_IPADDR global variable 2054 CLIENT_ORIGUSERID global variable 2055 COMPATIBILITY_MODE global variable 2056 CLIENT_USRSECTOKEN global variable 2057 MON_INTERVAL_ID global variable 2058 NLS_STRING_UNITS global variable 2059 PACKAGE_NAME global variable 2060 PACKAGE_SCHEMA global variable 2061 PACKAGE_VERSION global variable 2062 ROUTINE_MODULE global variable 2063 ROUTINE_SCHEMA global variable 2064 ROUTINE_SPECIFIC_NAME global variable 2065 ROUTINE_TYPE global variable 2066 TRUSTED_CONTEXT global variable 2067
  • 22. Built-in functions 2068 Aggregate functions 2082 ARRAY_AGG 2083 AVG 2088 CORRELATION 2090 COUNT 2092 COVARIANCE 2094 COVARIANCE_SAMP 2096 GROUPING 2098 LISTAGG 2100 MAX 2103 MEDIAN 2105 MIN 2107 PERCENTILE_CONT 2109 PERCENTILE_DISC 2111 Regression functions (REGR_AVGX, REGR_AVGY, REGR_COUNT, ...) 2113 STDDEV 2117 STDDEV_SAMP 2119 SUM 2121 VARIANCE 2123 VARIANCE_SAMP 2125 XMLAGG 2127 XMLGROUP 2129 Scalar functions 2133 ABS or ABSVAL 2134 ACOS 2135 ADD_DAYS 2136 ADD_HOURS 2138 ADD_MINUTES 2140 ADD_MONTHS 2142 ADD_SECONDS 2144 ADD_YEARS 2146 AGE 2148 ARRAY_DELETE 2150 ARRAY_FIRST 2152 ARRAY_LAST 2153 ARRAY_NEXT 2154 ARRAY_PRIOR 2156 ASCII 2158 ASIN 2159 ATAN 2160 ATAN2 2161 ATANH 2162 BIGINT 2163 BINARY 2165 BITAND, BITANDNOT, BITOR, BITXOR, and BITNOT 2167
  • 23. BLOB 2170 CARDINALITY 2171 CEILING or CEIL 2172 CHAR 2173 CHARACTER_LENGTH 2180 CHR 2182 CLOB 2183 COALESCE 2184 COLLATION_KEY 2185 COLLATION_KEY_BIT 2187 COMPARE_DECFLOAT 2189 CONCAT 2191 COS 2193 COSH 2194 COT 2195 CURSOR_ROWCOUNT 2196 DATAPARTITIONNUM 2197 DATE 2199 DAY 2201 DAYNAME 2203
  • 24. IBM BigInsights IBM BigInsights 4.1 documentation Welcome to IBM® BigInsights®, a collection of powerful value-add services that can be installed on top of the IBM Open Platform with Apache Hadoop. IBM Open Platform with Apache Hadoop is a platform for analyzing and visualizing Internet- scale data volumes that is powered by Apache Hadoop, an open source distributed computing platform. The value-add services include Big SQL, BigSheets, Big R, and Text Analytics. This information was updated December 2015. Getting started Introduction Hadoop-Dev FAQs What’s new? Release notes Installing IBM Open Platform with Apache Hadoop Installing the value-added services Detailed system requirements Common tasks Security Analyzing data with IBM Big SQL A typical Big SQL scenario Analyzing data (Big R, BigSheets, Big SQL, Text Analytics) CREATE TABLE (HADOOP) (Big SQL) CREATE TABLE (HBASE) (Big SQL) Troubleshooting and support Troubleshooting BigInsights 1
  • 25. BigInsights support portal Query IBM Support knowledge base BigInsights for Hadoop More information Tutorials Best Practices IBM Big Data education Understanding BigInsights IBM Redbooks © Copyright IBM Corporation 2009, 2015 2
  • 26. - - - - IBM BigInsights Accessibility Accessibility features help users with physical disabilities, such as restricted mobility or limited vision, to use software products successfully. The following list specifies the major accessibility features: All IBM® Open Platform with Apache Hadoop functionality is available using the keyboard for navigation instead of the mouse. You can customize the size of the fonts on IBM Open Platform with Apache Hadoop user interfaces with your web browser. BigInsights documentation is provided in an accessible format. Accessible documentation Documentation for BigInsights products is provided in XHTML 1.0 format, which is viewable in most Web browsers. XHTML allows you to view documentation according to the display preferences set in your browser. It also allows you to use screen readers and other assistive technologies. Syntax diagrams are provided in dotted decimal format. This format is available only if you are accessing the online documentation using a screen-reader. Using keyboard shortcuts and accelerators You can use keys or key combinations to perform operations that can also be done by using a mouse. Parent topic:BigInsights 3
  • 27. - - - - - IBM BigInsights Using keyboard shortcuts and accelerators You can use keys or key combinations to perform operations that can also be done by using a mouse. About this task You can initiate menu actions from the keyboard. Some menu items have accelerators, which allow you to invoke the menu option without expanding the menu. For example, you can enter CTRL+F for find, when the focus is on the details view. The major accessibility features in the Knowledge Center enable users to use assistive technologies, magnify what is displayed on the screen, and initiate menu actions from the keyboard. In addition, all images are provided with alternative text so that users with vision impairments can understand the contents of the images. Procedure You can initiate menu actions from the keyboard in the following ways: Press F10 to activate the keyboard. Then press the arrow keys to access specific options, or press the same letter as the one that is underlined in the name of the menu option you want to select. For example, to select the Help Index, press F10 to activate the main menu; then use the arrow keys to select Help > Help Index. Press and hold the Alt key. Then press the same letter as the one that is underlined in the name of the main menu option that you want to select. For example, to select the General Help menu option, press Alt+H; then press G. Tip: On some UNIX operating systems, you might need to press Ctrl instead of Alt . To exit the main menu without selecting an option, press Esc. Often the directions for how to access a window or wizard instruct you to right-click an object and select an object from the pop-up menu. To open the pop-up menu by using keyboard shortcuts, first select the object then press Shift+F10. To access a specific menu option on the pop-up menu, press the same letter as the one that is underlined in the name of the menu option you want to select. The following tables provide instructions for using keyboard shortcuts and accelerators.Table 1. General keyboard shortcuts and accelerators Action Shortcut Access the menu bar Alt or F10 Exit the main menu without selecting an option Esc Go to the next menu item arrow keys, or the underlined letter in the menu option Go to the next field in a window Tab Go back to the previous field in a window Shift+Tab Go from the browser address bar to the browser content area F6 4
  • 28. - - - Table 2. Keyboard shortcuts for table actions Table 3. Tree navigation Table 4. Editing actions Knowledge Center navigation: The major accessibility features in the Knowledge Center enable users to do the following: Use assistive technologies, such as screen-reader software and digital speech synthesizers, to hear what is displayed on the screen. In this Knowledge Center, all information is provided in HTML format. Consult the product documentation of the assistive technology for details on using assistive technologies with HTML-based information. Operate specific or equivalent features by using only the keyboard. Magnify what is displayed on the screen. The following table gives instructions for how to navigate the Knowledge Center by using the keyboard.Table 5. Keyboard shortcuts in the Knowledge Center Find Ctrl+F Find Next Alt+N Action Shortcut Move to the cell above or below up or down arrows Move to the cell to the left or right left or right arrows Give the next component focus Tab Give the previous component focus Shift+Tab Action Shortcut Navigate out forward Tab Navigate out backward Shift+Tab Expand entry Right Collapse entry Left Toggle expand/collapse for entry Enter Move up/down one entry up or down arrows Move to first entry Home Move to last visible entry End Action Shortcut Copy Ctrl+C Cut Ctrl+X Paste Ctrl+V Select All Ctrl+A Undo Ctrl+Z Action Shortcut 5
  • 29. - Standard operating system keystrokes are used for standard operating system operations. Parent topic:Accessibility Go to the next link, button, or topic branch from inside a frame (page) Tab Expand or collapse a topic branch Right and Left arrow keys Move to the next topic branch Down arrow or Tab Move to the previous topic branch Up arrow or Shift+Tab Scroll to the top Home Scroll to the bottom End Go back Alt+Left arrow Go forward Alt+Right arrow Next frame Ctrl+Tab Previous frame Shift+Ctrl+Tab Print the current page or active frame Ctrl+P 6
  • 30. IBM BigInsights Terms and Conditions Permissions for the use of these publications is granted subject to the following terms and conditions. Personal use: You may reproduce these Publications for your personal, non commercial use provided that all proprietary notices are preserved. You may not distribute, display or make derivative work of these Publications, or any portion thereof, without the express consent of IBM. Commercial use: You may reproduce, distribute and display these Publications solely within your enterprise provided that all proprietary notices are preserved. You may not make derivative works of these Publications, or reproduce, distribute or display these Publications or any portion thereof outside your enterprise, without the express consent of IBM. Except as expressly granted in this permission, no other permissions, licenses or rights are granted, either express or implied, to the Publications or any information, data, software or other intellectual property contained therein. IBM reserves the right to withdraw the permissions granted herein whenever, in its discretion, the use of the Publications is detrimental to its interest or, as determined by IBM, the above instructions are not being properly followed. You may not download, export or re-export this information except in full compliance with all applicable laws and regulations, including all United States export laws and regulations. IBM MAKES NO GUARANTEE ABOUT THE CONTENT OF THESE PUBLICATIONS. THE PUBLICATIONS ARE PROVIDED "AS-IS" AND WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO IMPLIED WARRANTIES OF MERCHANTABILITY, NON- INFRINGEMENT, AND FITNESS FOR A PARTICULAR PURPOSE. Parent topic:BigInsights 7
  • 31. IBM BigInsights Notices This information was developed for products and services offered in the US. This material might be available from IBM in other languages. However, you may be required to own a copy of the product or product version in that language in order to access it. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service. IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to:IBM Director of Licensing IBM Corporation North Castle Drive, MD-NC119 Armonk, NY 10504-1785 US For license inquiries regarding double-byte character set (DBCS) information, contact the IBM Intellectual Property Department in your country or send inquiries, in writing, to:Intellectual Property Licensing Legal and Intellectual Property Law IBM Japan Ltd. 19-21, Nihonbashi-Hakozakicho, Chuo-ku Tokyo 103-8510, Japan INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some jurisdictions do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you. This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice. Any references in this information to non-IBM websites are provided for convenience only and do not in any manner serve as an endorsement of those websites. The materials at those websites are not part of the materials for this IBM product and use 8
  • 32. of those websites is at your own risk. IBM may use or distribute any of the information you provide in any way it believes appropriate without incurring any obligation to you. Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of information between independently created programs and other programs (including this one) and (ii) the mutual use of the information which has been exchanged, should contact:IBM Director of Licensing IBM Corporation North Castle Drive, MD-NC119 Armonk, NY 10504-1785 US Such information may be available, subject to appropriate terms and conditions, including in some cases, payment of a fee. The licensed program described in this document and all licensed material available for it are provided by IBM under terms of the IBM Customer Agreement, IBM International Program License Agreement or any equivalent agreement between us. The performance data and client examples cited are presented for illustrative purposes only. Actual performance results may vary depending on specific configurations and operating conditions. Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBMproducts. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. Statements regarding IBM's future direction or intent are subject to change or withdrawal without notice, and represent goals and objectives only. All IBM prices shown are IBM's suggested retail prices, are current and are subject to change without notice. Dealer prices may vary. This information is for planning purposes only. The information herein is subject to change before the products described become available. This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to actual people or business enterprises is entirely coincidental. COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. The sample programs are provided "AS IS", without warranty of any kind. IBM shall not be liable for any damages arising out of your use of the sample programs. 9
  • 33. - - - - - - Each copy or any portion of these sample programs or any derivative work must include a copyright notice as follows: © (your company name) (year). Portions of this code are derived from IBM Corp. Sample Programs. © Copyright IBM Corp. _enter the year or years_. Parent topic:BigInsights Trademarks IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml. The following terms are trademarks or registered trademarks of other companies and have been used in at least one of the documents in the BigInsights documentation library: Microsoft, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Java™ and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. UNIX is a registered trademark of The Open Group in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries. Other company, product, or service names may be trademarks or service marks of others. Terms and conditions for product documentation Permissions for the use of these publications are granted subject to the following terms and conditions. Applicability These terms and conditions are in addition to any terms of use for the IBM website. Personal use You may reproduce these publications for your personal, noncommercial use provided that all proprietary notices are preserved. You may not distribute, display or make derivative work of these publications, or any portion thereof, without the 10
  • 34. express consent of IBM. Commercial use You may reproduce, distribute and display these publications solely within your enterprise provided that all proprietary notices are preserved. You may not make derivative works of these publications, or reproduce, distribute or display these publications or any portion thereof outside your enterprise, without the express consent of IBM. Rights Except as expressly granted in this permission, no other permissions, licenses or rights are granted, either express or implied, to the publications or any information, data, software or other intellectual property contained therein. IBM reserves the right to withdraw the permissions granted herein whenever, in its discretion, the use of the publications is detrimental to its interest or, as determined by IBM, the above instructions are not being properly followed. You may not download, export or re-export this information except in full compliance with all applicable laws and regulations, including all United States export laws and regulations. IBM MAKES NO GUARANTEE ABOUT THE CONTENT OF THESE PUBLICATIONS. THE PUBLICATIONS ARE PROVIDED "AS-IS" AND WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO IMPLIED WARRANTIES OF MERCHANTABILITY, NON- INFRINGEMENT, AND FITNESS FOR A PARTICULAR PURPOSE. IBM Online Privacy Statement IBM Software products, including software as a service solutions, (“Software Offerings”) may use cookies or other technologies to collect product usage information, to help improve the end user experience, to tailor interactions with the end user, or for other purposes. In many cases no personally identifiable information is collected by the Software Offerings. Some of our Software Offerings can help enable you to collect personally identifiable information. If this Software Offering uses cookies to collect personally identifiable information, specific information about this offering’s use of cookies is set forth below. This Software Offering does not use cookies or other technologies to collect personally identifiable information. If the configurations deployed for this Software Offering provide you as customer the ability to collect personally identifiable information from end users via cookies and other technologies, you should seek your own legal advice about any laws applicable to such data collection, including any requirements for notice and consent. For more information about the use of various technologies, including cookies, for these purposes, see IBM’s Privacy Policy at http://www.ibm.com/privacy and IBM’s Online Privacy Statement at http://www.ibm.com/privacy/details in the section entitled “Cookies, Web Beacons and Other Technologies,” and the “IBM Software Products and Software-as-a-Service Privacy Statement” at 11
  • 36. - - - - - IBM BigInsights Product overview BigInsights® is a flexible software platform that provides capabilities to discover and analyze business insights that are hidden in large volumes of structured and unstructured data, giving value to previously dormant data. BigInsights information roadmap This document provides links to the information resources that are available for IBM® Open Platform with Apache Hadoop and the BigInsights value-add services. Introduction to BigInsights BigInsights is a software platform for discovering, analyzing, and visualizing data from disparate sources. You use this software to help process and analyze the volume, variety, and velocity of data that continually enters your organization every day. BigInsights is a collection of value-added services that can be installed on top of the IBM Open Platform with Apache Hadoop, wihch is the open Hadoop foundation. Release notes Release notes contain information about the installation and administration of IBM BigInsights Enterprise Edition and its components. Installing the free IBM Open Platform with Apache Hadoop and BigInsights Quick Start Edition, non-production software What's new for Version 4.1 13
  • 37. - - - - - - - - - - - - - - - - - - - - - - - IBM BigInsights BigInsights information roadmap This document provides links to the information resources that are available for IBM® Open Platform with Apache Hadoop and the BigInsights® value-add services. Product overview Evaluating Planning Installing Administering Getting started Developing Analyzing Troubleshooting and support Reference Community resources Product overview BigInsights home page This web page provides an overview of BigInsights and its related components. Introduction These topics introduce you to BigInsights, including its product modules and components. Last updated: 2015-008 New features and capabilities These topics include information about the features, capabilities, and updates that are included in the most recent version of BigInsights. Last updated: 2015- 08 SQL-on-Hadoop without compromise This white paper contains information on the updated Big SQL for BigInsights V3.0 and the speed, portability, and robust functionality that this SQL on Hadoop solution provides. Last updated: 2014-04 Evaluating Analyzing social media and structured data with InfoSphere BigInsights This article provides a quick start on BigSheets. You'll learn how to model big data in BigSheets, manipulate this data using built-in macros and functions, create charts to visualize your work, and export the results of your analysis in one of several popular output formats. Last updated: 2012 Big Data Networked Storage Solution for Hadoop This IBM® Redpaper™ provides a reference architecture, based on Apache Hadoop, to help businesses gain control over their data, meet tight service level agreements (SLAs) around their data applications, and turn data-driven insight into effective action. Big Data Networked Storage Solution for Hadoop delivers the capabilities for ingesting, storing, and managing large data sets with high reliability. IBM BigInsights provides an innovative analytics platform that processes and analyzes all types of data to turn large complex data into 14
  • 38. - - - - - - - - - - - - - - - - insight.Last updated: 2013-07 Quick Start Edition IBM BigInsights Quick Start Edition is a free, downloadable non-production version of BigInsights that enables new solutions that cost effectively turn large, complex volumes of data into insight by combining Apache Hadoop, (including the MapReduce framework and the Hadoop Distributed File Systems), with unique, enterprise-ready technologies and capabilities from across IBM, including Big SQL, text analytics and BigSheets. Last updated: 2015-08 Planning System requirements This document describes the system requirements for BigInsights. Last updated: 2015-08 Performance and Capacity Implications for Big Data The purpose of this IBM Redpaper™ publication is to consider the performance and capacity implications of big data solutions, which must be taken into account for them to be viable. This paper describes the benefits that big data approaches can provide. We then cover performance and capacity considerations for creating big data solutions. We conclude with what this means for big data solutions, both now and in the future. Last updated: 2014- 02-07 Using the IBM Big Data and Analytics Platform to Gain Operational Efficiency This IBM® Redbooks® Solution Guide describes how to use IBM Big Data and Analytics Platform to provide a more comprehensive view of a customer’s interaction with an organization’s products and services. It provides an example about how to take multiple sources of information and analyze that data to gain insight. This example uses disparate data sources, real-time analytics, an appliance data warehouse, analytic modeling, and reporting tools to support the business decision-making process. Last updated: 2014-04 Big SQL: Data warehouse-grade performance on Hadoop This Impact 2014 presentation introduces Big SQL and places it in the SQL on Hadoop context. Last updated: 2014-04 Taming Big Data with Big SQL This Impact 2014 presentation introduces big data, BigInsights, and Big SQL. It provides performance and security best practices for Big SQL and announces security improvements over Hive 0.12. Last updated: 2014-04 Installing Release notes The release notes contain critical information to ensure the successful installation and operation of BigInsights. Last updated: 2015-08 Administering Setting up Security and Administering BigInsights These topics describe how to complete general administration tasks, such as configuring user security, administering individual components, and deploying and running programs. Last updated: 2015-08 15
  • 39. - - - - - - - - - - - - - - - - - - - - Getting started BigInsights FAQs The FAQs area in the Hadoop Dev community provides answers to questions frequently asked about BigInsights, Big SQL, Hive, and HBase. BigInsights tutorials These topics include tutorials that you can use to quickly get started with BigInsights. Last updated: 2015-08 IBM big data education IBM offers classroom, on-site, and e-Learning training classes to help build and enhance your skills BigInsights. IBM BigInsights education These e-Learning training classes include material on BigInsights Foundation, Big SQL, BigInsights Analytics for Business Analysts, and BigInsights Analytics for Programmers. Big Data University The Big Data University website contains courses, downloads, and educational materials about Hadoop and other big data applications. Understanding BigInsights This developerWorks® article provides an introduction to BigInsights, including architecture, capabilities, and scenarios for how you can use the product in your organization. Last updated: 2011-10 Developing Developing and administering applications These topics describe how to develop and maintain BigInsights applications. Last updated: 2015-08 Set up and use federation This developerWorks article introduces Big SQL federation capabilities by using many data sources, including IBM DB2 for Linux, UNIX, and Windows, IBM PureData System for Analytics, IBM PureData System for Operational Analytics, Teradata, and Oracle. Federation enables you to sent distributed requests to multiple data sources within a single SQL statement. Last updated: 2014-07-08 Analyzing Analyzing big data These topics describe how to analyze data with IBM BigSheets, analyzing and manipulating data with Jaql, and analyzing documents with text analytics. Last updated: 2015-08 BigSheets This website includes information about BigSheets, which is a browser-based visualization tool that you can use to extend the scope of your business intelligence data. Troubleshooting and support 16
  • 40. - - - - - - - - - - - - - - - - - - Troubleshooting These topics describe how to troubleshoot issues with BigInsights components, security, and text analytics Last updated: 2015-08 dW Answers for BigInsights This Q&A area in the Hadoop Dev community provides a place to ask questions and get answers from experts. BigInsights product support The IBM Support Portal is a unified, customizable view of all technical support tools and information for all IBM systems, software, and services. Updated continuously. Reference Reference Use the reference information to read more about commands and functions, supported languages, and console messages. Last updated: 2015-08 Community resources Hadoop Dev This dev-to-dev community site where you can find resources and tips from experts, ask questions, and share with others. Updated continuously. IBM Meetup Groups This website, designed for developers, data scientists, and big data enthusiasts, provides an opportunity to work hands-on with the solutions and tools in the big data portfolio. IBM developerWorks The IBM developerWorks website contains developer resources, tutorials, and articles about BigInsights. The Big Data and Analytics Hub This website provides links to big data communities, events, blogs, videos and podcasts, and developer-centric material. Updated continuously. Video Guide This developerWorks article provides links to new and trending videos from the IBM Big Data channel on YouTube. You can also go to the Videos area in Hadoop Dev for a searchable and continuously updated list. Parent topic:Product overview 17
  • 41. - - - - - IBM BigInsights Introduction to BigInsights BigInsights® is a software platform for discovering, analyzing, and visualizing data from disparate sources. You use this software to help process and analyze the volume, variety, and velocity of data that continually enters your organization every day. BigInsights is a collection of value-added services that can be installed on top of the IBM® Open Platform with Apache Hadoop, wihch is the open Hadoop foundation. .BigInsights helps your organization understand and analyze massive volumes of unstructured information as easily as smaller volumes of information. The flexible platform is built on an Apache Hadoop open source framework that runs in parallel on commonly available, low-cost hardware. You can easily scale the platform to analyze hundreds of terabytes, petabytes, or more of raw data that is derived from various sources. As information grows, you add more hardware to support the influx of data. BigInsights helps application developers, data scientists, and administrators in your organization quickly build and deploy custom analytics to capture insight from data. This data is often integrated into existing databases, data warehouses, and business intelligence infrastructure. By using BigInsights, users can extract new insights from this data to enhance knowledge of your business. For more information about the IBM Open Source Platform, see Installing IBM Open Source Platorm for Apache Hadoop. BigInsights incorporates tooling and value-add services for numerous users, speeding time to value and simplifying development and maintenance: Software developers can use the value-add services that are provided to develop custom text analytic functions to analyze loosely structured or largely unstructured text data. Data scientists and business analysts can use the data analysis tools within the value-add services to explore and work with unstructured data in a familiar spreadsheet-like environment. IBM Open Platform with Apache Hadoop and the BigInsights value-add services The content of IBM Open Platform with Apache Hadoop and the BigInsights value- add services includes the following: IBM BigInsights Quick Start Edition for Non-Production Environments Test drive the IBM Open Platform with Apache Hadoop and BigInsights value-add modules, Version 4.1 by downloading the Quick Start Edition, which is a free, non- production software. BigInsights features and architecture BigInsights provides distinct capabilities for discovering and analyzing business insights that are hidden in large volumes of data. These technologies and features combine to help your organization manage data from the moment that it enters your enterprise. 18
  • 42. - - - Suggested services layout for IBM Open Platform with Apache Hadoop and BigInsights value-added services In your multi-node cluster, it is suggested that you have at least one management node in your non-high availability environment, if performance is not an issue. If performance is a concern, consider configuring at least three management nodes. If you use the BigInsights - Big SQL service, consider configuring four management nodes. If you use a high availability environment, consider six management nodes. Use the following list as a guide for the nodes in your cluster. Scenarios for working with big data BigInsights provides capabilities to derive business value from complex, unstructured information. BigInsights supports various scenarios that can help different organizations grow by finding value that is hidden in data and data relationships. Where BigInsights fits in an enterprise data architecture Reusing business investments and incorporating existing assets is important when expanding your enterprise data architecture. BigInsights supports data exchange with a number of sources, relational data stores, and applications so that it can integrate into your existing architecture. Parent topic:Product overview 19
  • 43. IBM BigInsights IBM Open Platform with Apache Hadoop and the BigInsights value-add services The content of IBM® Open Platform with Apache Hadoop and the BigInsights® value-add services includes the following: Table 1. Supported features of the BigInsights editions Parent topic:Introduction to BigInsights Modules Supported features IBM Open Platform with Apache Hadoop Cluster management, and services such as Hive, HBase, Oozie, Flume, HDFS IBM BigInsights Analyst Module Big SQL and BigSheets IBM BigInsights Data Scientist Module The contents of the IBM BigInsights Analyst Module, plus Text Analytics and Big R IBM BigInsights Enterprise Management Module GPFS and Platform Symphony IBM BigInsights for Apache Hadoop The contents of the IBM BigInsights Data Scienties module, the BigInsights Analyst module, and the BigInsights Enterprise Management module. In addition, it contains a license that provides limited-use licenses for other software so that you can get even more value out of Hadoop. IBM BigInsights Quick Start Edition Big SQLIBM BigInsights Big RBigSheetsText AnalyticsConnectorsIBM Hadoop core 20
  • 44. - - - - IBM BigInsights IBM BigInsights Quick Start Edition for Non- Production Environments Test drive the IBM® Open Platform with Apache Hadoop and BigInsights® value- add modules, Version 4.1 by downloading the Quick Start Edition, which is a free, non-production software. Use the Quick Start Edition to begin exploring the features of IBM Open Platform with Apache Hadoop and BigInsights value-add modules by using real data and running real applications. The Quick Start Edition comes loaded with most of the same features as the IBM Open Platform with Apache Hadoop, and the related services bundled in the Data Scientist and Business Analyst packages, without any need to upgrade or uninstall your current products. The Quick Start Edition puts no data limit on the cluster and there is no time limit on the license. Download the software You can download the native software. See Installing the value-add services for information how downloading and installing. Or, you can download the VM image, the latter of which comes preconfigured. Complete the tutorials After you download the software, use the BigInsights tutorials to begin working with big data. For more information, view the video tutorials on the BigInsights home page. The following table highlights the supported and unsupported features of the Quick Start Edition. Table 1. Supported and unsupported features of the Quick Start Edition Parent topic:Introduction to BigInsights Related concepts: IBM BigInsights Quick Start Edition for Non-Production Environments: VM image README Supported features Unsupported features Big SQLIBM BigInsights Big RBigSheetsText AnalyticsWorkload optimizationQuery SupportConnectorsManagement toolsIBM Open Platform with Apache Hadoop High availability (HA) capabilityGeneral Parallel File System (GPFS™)Production support 21
  • 45. IBM BigInsights BigInsights features and architecture BigInsights® provides distinct capabilities for discovering and analyzing business insights that are hidden in large volumes of data. These technologies and features combine to help your organization manage data from the moment that it enters your enterprise. By combining these technologies, BigInsights extends the Hadoop open source framework with enterprise-grade security, governance, availability, integration into existing data stores, tools that simplify developer productivity, and more. Hadoop is a computing environment built on top of a distributed, clustered file system that is designed specifically for large-scale data operations. Hadoop is designed to scan through large data sets to produce its results through a highly scalable, distributed batch processing system. Hadoop comprises two main components: a file system, known as the Hadoop Distributed File System (HDFS), and a programming paradigm, known as Hadoop MapReduce. To develop applications for Hadoop and interact with HDFS, you use additional technologies and programming languages such as Pig, Hive, Flume, and many others. Apache Hadoop helps enterprises harness data that was previously difficult to manage and analyze. BigInsights features Hadoop and its related technologies as a core component. 22
  • 46. - - - - - - File systems The Hadoop Distributed File System (HDFS) comes with IBM Open Platform with Apache Hadoop as your distributed file system. MapReduce frameworks The MapReduce framework is the core of Apache Hadoop. This programming paradigm provides for massive scalability across hundreds or thousands of servers in a Hadoop cluster. Open source technologies The following open source technologies are included with IBM Open Platform with Apache Hadoop version 4.1. Text Analytics BigInsights includes Text Analytics, which extracts structured information from unstructured and semistructured data. IBM Big SQL Big SQL is a massively parallel processing (MPP) SQL engine that deploys directly on the physical Hadoop Distributed File System (HDFS) cluster. Integration with other IBM products BigInsights complements and extends existing business capabilities by integrating with other IBM products. These integration points extend existing technologies to encompass more comprehensive information types, enabling a complete view of your business. Parent topic:Introduction to BigInsights 23
  • 47. - IBM BigInsights File systems The Hadoop Distributed File System (HDFS) comes with IBM® Open Platform with Apache Hadoop as your distributed file system. Hadoop Distributed File System (HDFS) The Hadoop Distributed File System (HDFS) allows applications to run across multiple servers. HDFS is highly fault tolerant, runs on low-cost hardware, and provides high-throughput access to data. Parent topic:BigInsights features and architecture 24
  • 48. IBM BigInsights Hadoop Distributed File System (HDFS) The Hadoop Distributed File System (HDFS) allows applications to run across multiple servers. HDFS is highly fault tolerant, runs on low-cost hardware, and provides high-throughput access to data. Data in a Hadoop cluster is broken into smaller pieces called blocks, and then distributed throughout the cluster. Blocks, and copies of blocks, are stored on other servers in the Hadoop cluster. That is, an individual file is stored as smaller blocks that are replicated across multiple servers in the cluster. Each HDFS cluster has a number of DataNodes, with one DataNode for each node in the cluster. DataNodes manage the storage that is attached to the nodes on which they run. When a file is split into blocks, the blocks are stored in a set of DataNodes that are spread throughout the cluster. DataNodes are responsible for serving read and write requests from the clients on the file system, and also handle block creation, deletion, and replication. An HDFS cluster supports NameNodes, an active NameNode and a standby NameNode, which is a common setup for high availability. The NameNode regulates access to files by clients, and tracks all data files in HDFS. The NameNode determines the mapping of blocks to DataNodes, and handles operations such as opening, closing, and renaming files and directories. All of the information for the NameNode is stored in memory, which allows for quick response times when adding storage or reading requests. The NameNode is the repository for all HDFS metadata, and user data never flows through the NameNode. A typical HDFS deployment has a dedicated computer that runs only the NameNode, because the NameNode stores metadata in memory. If the computer that runs the NameNode fails, then metadata for the entire cluster is lost, so this computer is typically more robust than others in the cluster. Parent topic:File systems Related reference: Read the HDFS Architecture Guide Read the HDFS User Guide 25
  • 49. - - IBM BigInsights MapReduce frameworks The MapReduce framework is the core of Apache Hadoop. This programming paradigm provides for massive scalability across hundreds or thousands of servers in a Hadoop cluster. Hadoop MapReduce In IBM® Open Platform with Apache Hadoop, the MapReduce framework, MapReduce version 2, is run as a YARN workload framework. The benefits of this new approach are that resource management is separated from workload management, and MapReduce applications can coexist with other types of workloads such as Spark or Slider. Yarn The current version of the product supports the new Apache Hadoop YARN framework and integrates it with the rest of the IBM Open Platform with Apache Hadoop components. Yarn decouples resource management from workload management. Parent topic:BigInsights features and architecture 26
  • 50. IBM BigInsights Hadoop MapReduce In IBM® Open Platform with Apache Hadoop, the MapReduce framework, MapReduce version 2, is run as a YARN workload framework. The benefits of this new approach are that resource management is separated from workload management, and MapReduce applications can coexist with other types of workloads such as Spark or Slider. In this programming paradigm, applications are divided into self-contained units of work. Each of these units of work can be run on any node in the cluster. In a Hadoop cluster, a MapReduce program is known as a job. A job is run by being broken down into pieces, known as tasks. These tasks are scheduled to run on the nodes in the cluster where the data exists. MapReduce version 2 jobs are executed by YARN in the Hadoop cluster. The YARN ResourceManager spawns a MapReduce ApplicationMaster container, which requests additional containers for mapper and reducer tasks. The ApplicationMaster communicates with the NameNode to determine where all of the data required for the job exists across the cluster. It attempts to schedule tasks on the cluster where the data is stored, rather than sending data across the network to complete a task. The YARN framework and the Hadoop Distributed File System (HDFS) typically exist on the same set of nodes, which enables the ResourceManager program to schedule tasks on nodes where the data is stored. As the name MapReduce implies, the reduce task is always completed after the map task. A MapReduce job splits the input data set into independent chunks that are processed by map tasks, which run in parallel. These bits, known as tuples, are key/value pairs. The reduce task takes the output from the map task as input, and combines the tuples into a smaller set of tuples. Each MapReduce ApplicationMaster monitors its spawned tasks. If a task fails to complete, the ApplicationMaster will reschedule that task on another node in the cluster. This distribution of work enables map tasks and reduce tasks to run on smaller subsets of larger data sets, which ultimately provides maximum scalability. The MapReduce framework also maximizes parallelism by manipulating data stored across multiple clusters. MapReduce applications do not have to be written in Java™, though most MapReduce programs that run natively under Hadoop are written in Java. Parent topic:MapReduce frameworks 27
  • 51. IBM BigInsights Yarn The current version of the product supports the new Apache Hadoop YARN framework and integrates it with the rest of the IBM® Open Platform with Apache Hadoop components. Yarn decouples resource management from workload management. The YARN framework uses a ResourceManager service, a NodeManagers service, and an Application master service. The Application master service is an import type of YARN services that is a per- application service. It is responsible for negotiating with the Resource Manager to apply resources for a particular application. It also monitors the status of the application execution and provides tracking information. The bottleneck for the central service, like Resource Manager, on a highly concurrent and utilized cluster is resolved by transferring the scheduling responsibility from Resource Manager to Application Manager. The ResourceManager is in charge of scheduling resources for jobs. The basic allocation unit is a container. Containers are workload agnostic, and they can represent any type of computation, such as a map or reduce task in MapReduce. The ResourceManager ensures that the cluster capacity is not exceeded by keeping track of the scheduled containers and queueing requests when resources are busy. NodeManagers spawn containers scheduled by the ResourceManager and monitor that they do not go beyond the expected resource utilization. Containers that use more memory or CPU than allocated are terminated. For more information about the Yarn architecture, see http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html Parent topic:MapReduce frameworks 28
  • 52. - - - - IBM BigInsights Open source technologies The following open source technologies are included with IBM® Open Platform with Apache Hadoop version 4.1. Table 1. Open source technology versions by IBM BigInsights value-add services release Ambari Apache Ambari is an open framework for provisioning, managing, and monitoring Apache Hadoop clusters. Ambari provides an intuitive and easy-to-use Hadoop management web UI backed by its collection of tools and APIs that simplify the operation of Hadoop clusters. Flume Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of streaming event data. Flume helps you aggregate data from many sources, manipulate the data, and then add the data into your Hadoop environment. Hadoop Apache Hadoop contains open-source software for reliable, scalable, distributed computing and storage. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. HBase Apache HBase is a column-oriented database management system that runs on top of HDFS and is often used for sparse data sets. Unlike relational database systems, HBase does not support a structured query language like SQL. HBase Open source technology 4.1.0.0 4.1.0.1 4.1.0.2 Ambari 2.1.0 2.1.0 2.1.0 Flume 1.5.2 1.5.2 1.5.2 Hadoop (HDFS, YARN MapReduce) 2.7.1 2.7.1 2.7.1 HBase 1.1.1 1.1.1 1.1.1 Hive 1.2.1 1.2.1 1.2.1 Kafka 0.8.2.1 0.8.2.1 0.8.2.1 Knox 0.6.0 0.6.0 0.6.0 Oozie 4.2.0 4.2.0 4.2.0 Pig 0.15.0 0.15.0 0.15.0 Slider 0.80.0 0.80.0 0.80.0 Solr 5.1.0 5.1.0 5.1.0 Spark 1.4.1 1.4.1 1.5.1 Sqoop 1.4.6 1.4.6 1.4.6 Zookeeper 3.4.6 3.4.6 3.4.6 29
  • 53. - - - - - - - - applications are written in Java™, much like a typical MapReduce application. HBase allows many attributes to be grouped into column families so that the elements of a column family are all stored together. This approach is different from a row-oriented relational database, where all columns of a row are stored together. Hive Apache Hive is a data warehouse infrastructure that facilitates data extract- transform-load (ETL) operations, in addition to analyzing large data sets that are stored in the Hadoop Distributed File System (HDFS). IBM Open Platform with Apache Hadoop includes a JDBC driver that is used for programming with Hive and for connecting with Cognos Business Intelligence software. Kafka Apache Kafka is a distributed publish-subscribe messaging system rethought as a distributed commit log. It is designed to be fast, scalable, durable, and fault-tolerant providing a unified, high-throughput, low-latency platform for handling real-time data feeds. Kafka is often used in place of traditional message brokers because of its higher throughput, reliability and replication. Oozie Apache Oozie is a management application that simplifies workflow and coordination between MapReduce jobs. Oozie provides users with the ability to define actions and dependencies between actions. Oozie then schedules actions to run when the required dependencies are met. Workflows can be scheduled to start based on a given time or based on the arrival of specific data in the file system. Pig Apache Pig is a platform for analyzing large data sets that consist of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. A key property of Pig programs is that their structure is amenable to substantial parallelization, which in turns enables them to handle very large data sets. Slider Apache Slider (incubating) is a YARN application to deploy existing distributed applications on YARN, monitor them, and make them larger or smaller as desired, even while they are running. Solr Solr is an enterprise search tool from the Apache Lucene project that offers powerful search tools, including hit highlighting, as well as indexing capabilities, reliability and scalability, a central configuration system, and failover and recovery. Spark Spark is a component of IBM Open Platform with Apache Hadoop that includes Apache Spark. Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala and Python, and an optimized engine that supports general execution graphs. It also supports a rich set of higher- level tools including Spark SQL for SQL and structured data processing, MLLib for machine learning, GraphX for combined data-parallel and graph-parallel computations, and Spark Streaming for streaming data processing. Sqoop Sqoop is a tool designed to easily import information from structured databases (such as SQL) and related Hadoop systems (such as Hive and HBase) into your 30
  • 54. - - Hadoop cluster. You can also use Sqoop to extract data from Hadoop and export it to relational databases and enterprise data warehouses. ZooKeeper ZooKeeper is a centralized infrastructure and set of services that enable synchronization across a cluster. ZooKeeper maintains common objects that are needed in large cluster environments, such as configuration information, distributed synchronization, and group services. Many other open source projects that use Hadoop clusters require these cross-cluster services. Having these services available in ZooKeeper ensures that each project can embed ZooKeeper without having to build new synchronization services into each project. Other Apache projects The IBM Open Platform with Apache Hadoop is a pure open source offering with the latest components in the Apache Hadoop and Spark ecosystems. Parent topic:BigInsights features and architecture Related reference: Apache Hadoop website Related information: Apache Solr 31
  • 55. - - - - - - - - - - - - - - - - - - IBM BigInsights Ambari Apache Ambari is an open framework for provisioning, managing, and monitoring Apache Hadoop clusters. Ambari provides an intuitive and easy-to-use Hadoop management web UI backed by its collection of tools and APIs that simplify the operation of Hadoop clusters. Core Ambari The release of IBM® Open Platform with Apache Hadoop includes an updated Apache Ambari 2.1.0 with more functionality and improvements. Customizable Dashboards [AMBARI-9792] : Ability to customize the Metric widgets displayed on HDFS, YARN and HBase on Service Summary pages. Includes ability for Operators to create new widgets and share widgets in a Widget Library. Guided Configs [AMBARI-9794] : Service Configs for HDFS, YARN, Hive and HBase included new UI controls (such as slider-bars) and an improved organization/layout. Manual Kerberos [AMBARI-9783] : When enabling Kerberos, ability to perform setup Kerberos manually. New User Views : Hive, Pig, Files and Capacity Scheduler user views are included by default with Ambari. Rack Awareness [AMBARI-6646] : Ability to set Rack ID on hosts. Ambari will generate a topology script automatically and set the configuration for HDFS. Alerts Log Appender [AMBARI-10249] : Log alert state change events to ambari- alerts.log. JDK 1.8 [AMBARI-9784] : Added support for Oracle JDK 1.8. RHEL/CentOS/Oracle Linux 7 [AMBARI-979] : Added support for RHEL/CentOS/Oracle Linux 7. Ambari Alerts (AMBARI-6354) Ambari Metrics (AMBARI-5707) Simplified Kerberos Setup (AMBARI-7204) Hive Metastore HA (AMBARI-6684) HiveServer2 HA (AMBARI-8906) Oozie HA (AMBARI-6683) Add HDFS-NFS gateway as a new component to HDFS in Ambari stack (AMBARI- 9224) Extensibility Blueprints: Host Discovery [AMBARI-10750] : Ability to automatically add hosts to a blueprint-created cluster. Views Framework: Auto-create [AMBARI-10424] : Ability to specify how to automatically create a view instance. Views Framework: Auto-configure [AMBARI-10306] : Ability to specify how to automatically configuration a view instance based on the cluster being managed by Ambari. 32
  • 56. For more information about the updates, see https://issues.apache.org/jira/browse/ Parent topic:Open source technologies 33
  • 57. - - - - - - - - - - - - - - - - - - - - - - - - - - IBM BigInsights Flume Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of streaming event data. Flume helps you aggregate data from many sources, manipulate the data, and then add the data into your Hadoop environment. Use the following terms as a guide for working with Flume: sources Any data source that Flume supports, channels A repository where the data are staged. sinks The target of data where Flume sends the interceptors to. IBM® Open Platform with Apache Hadoop and BigInsights include the following changes on top of Flume 1.5.2: FLUME-2095: JMS source with TIBCO FLUME-924: Implement a JMS source for Flume NG FLUME-997: Support secure transport mechanism FLUME-1502: Support for running simple configurations embedded in host process FLUME-1516: FileChannel Write Dual Checkpoints to avoid replays FLUME-1632: Persist progress on each file in file spooling client/source FLUME-1735: Add support for a plugins.d directory FLUME-1894: Implement Thrift RPC FLUME-1917: FileChannel group commit (coalesce fsync) FLUME-2010: Support Avro records in Log4jAppender and the HDFS Sink FLUME-2048: Avro container file deserializer FLUME-2070: Add a Flume Morphline Solr Sink FLUME-1227: Introduce some sort of SpillableChannel FLUME-2056: Allow SpoolDir to pass just the filename that is the source of an event FLUME-2071: Flume Context doesn’t support float or double configuration values. FLUME-2185: Upgrade morphlines to 0.7.0 FLUME-2188: flume-ng-log4jappender Support user supplied headers FLUME-2225: Elasticsearch Sink for ES HTTP API FLUME-2294: Add a sink for Kite Datasets FLUME-2309: Spooling directory should not always consume the oldest file first. For a complete list of the new features, improvements, and bug fixes available, refer to the CHANGELOG.txt file located in your Flume installation directory. For more information about Flume, see http://flume.apache.org/. Parent topic:Open source technologies 34
  • 58. 35
  • 59. - - - - - - - - - - - - - - - - - - - - - - - - - - IBM BigInsights Hadoop Apache Hadoop contains open-source software for reliable, scalable, distributed computing and storage. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. The Apache Hadoop 2.7.1 release includes important new features and improvements since Hadoop 2.2.0: Support for Access Control Lists in HDFS Native support for Rolling Upgrades in HDFS Usage of protocol-buffers for HDFS FSImage for smooth operational upgrades Complete HTTPS support in HDFS Enhanced support for new applications on YARN with Application History Server and Application Timeline Server Support for strong SLAs in YARN CapacityScheduler via Preemption Support for Heterogeneous Storage hierarchy in HDFS. In-memory cache for HDFS data with centralized administration and management. Simplified distribution of MapReduce binaries with HDFS in YARN Distributed Cache. The IBM-specific changes are: Backport HDFS-8432:Introduce a minimum compatible layout version to allow downgrade in more rolling upgrade use cases. Backport HADOOP-9431: TestSecurityUtil#testLocalHostNameForNullOrWild on systems where hostname contains capital letters Fix to remove cross-site scripting / scripting injection in HDFS webapps Backport HADOOP-11138: Stream yarn daemon and container logs through log4j Fix hadoop2 scripts to enable log streaming Backport HADOOP-10420: Add support to Swift-FS to support tempAuth Backport MAPREDUCE-5621: mr-jobhistory-daemon.sh doesn't have to execute mkdir and chown all the time Make the location of container executor config file configurable Add log streaming for container logs Backport HADOOP-7436: Bundle Log4j socket appender Metrics plugin in Hadoop Upgrade jsch to 0.1.50 because of JDK 1.7 incompatibilities Backport MAPREDUCE-6191: TestJavaSerialization fails with getting incorrect MR job result Backport HADOOP-11418: Property "io.compression.codec.lzo.class" does not work with other value besides default Fix race condition in Configuration.write Backport MAPREDUCE-6246: DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2 Backport HDFS-7282. Fix intermittent TestShortCircuitCache and TestBlockReaderFactory failures resulting from TemporarySocketDirectory GC Backport HDFS-7182. JMX metrics aren't accessible when NN is busy 36
  • 60. - - - Upgrade jetty version to 6.1.26-ibm Backport HADOOP-10062. race condition in MetricsSystemImpl#publishMetricsNow that causes incorrect results. Backport HDFS-6874: Add GET_BLOCK_LOCATIONS operation to HttpFS Parent topic:Open source technologies 37