Currently, the Yahoo EC Taiwan team provides business performance metrics to users by acquiring data from the Web production and back-office ERP systems. The reporting system is built on traditional BI technologies such as RDBMS, ETL tools, OLAP tools, home-made reporting tools, stored procedures, web pages, and so on. As user browsing data plays a growing role in daily business decisions, the ability to provide data analytics on this Big Data is becoming more and more important. Traditional RDBMSs have reached their limit in processing big data when connected to an OLAP tool. We started by studying the feasibility of connecting MicroStrategy with Hive 0.9 and created a prototype system to test two scenarios: ad-hoc queries to Hive, and the performance of a predefined MicroStrategy Intelligent Cube for ad-hoc analytics. We ran performance tests on ad-hoc queries via HiveQL and on queries against the MicroStrategy cube, and will share the results in the session. Based on our test results, we will be able to provide the following applications to different types of users. A) Ad-hoc queries running against Hadoop allow well-trained data analysts or power users to perform deeper analysis on data within Hadoop. B) OLAP reports running against the MicroStrategy Intelligent Cube provide quicker response times for ad-hoc analytics on the predefined data in the cube.
3. Tim Hsu
• Senior Data Engineer at Yahoo!
• Data modeling, BI application design
• Wants to provide an integrated, easy-to-use BI system to Yahoo! EC users
7/11/133
4. Neal Lee
• Data Engineer at Yahoo!
• Aims to build up an easy-to-use self-service BI platform connecting to Hadoop
6. Who Are We?
§ APAC is the best region where Yahoo! runs EC business
§ Major EC properties
› 2001 Auction
› 2004 Shopping Mall
› 2008 Store Market
§ Yahoo! is the leading eCommerce company in Taiwan
[Bar chart: 2011 Taiwan Retail Revenue, in MM USD (axis 0 to 5,000). Retailers shown: EHS, National 3C Chains, Fubon momo TV shopping, FarEastern Dept store, TK 3C, PC Home, SOGO Dept Store, Y!EC, RT Mart (hyper mart), FamiMart, PxMart (hyper mart), Carrefour, ShinKwan Mitsukoshi, 7-Eleven]
7. Types of End Users in Yahoo! EC Taiwan
GM, BU Heads
Business Analysts
Marketers, Data Analysts
Category Managers
Suppliers, Sellers
8. BI Needs for Different Types of Users
[Diagram: User Types & Analytical Needs. Axis labels: Data Scale (Summarized, Sophisticated); Business (low analytics), Technical (high analytics). User types shown: GM, BU; Business Analysts; Marketers, Data Analysts; Category Managers; Suppliers, Sellers]
15. Performance Test
§ Use case: Visitor distribution by demographic and device preference
§ Source data: 293 TB of web logs over 60 days
§ Transformed cube: 2.3 GB, 60.5M rows
§ Test environment
› MicroStrategy Server: 8 cores at 2.5 GHz, 16 GB RAM, v9.2.1
› Hive Server: 4 cores at 2.5 GHz, 4 GB RAM, v0.9
› Hadoop clusters: 300+ nodes, v0.23
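To put the sizes on this slide in proportion, the following is a minimal sketch that derives a few ratios from the reported figures. It assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes) and that 293 TB is the total log volume for the 60 days; the variable names are illustrative and not part of the original setup.

```python
# Figures taken from the slide; unit sizes are an assumption (decimal units).
TB = 10**12
GB = 10**9

source_bytes = 293 * TB   # web logs over 60 days
cube_bytes = 2.3 * GB     # transformed cube size
cube_rows = 60.5e6        # rows in the cube
days = 60

# How much smaller the cube is than the raw logs.
reduction_ratio = source_bytes / cube_bytes

# Average storage per cube row.
bytes_per_row = cube_bytes / cube_rows

# Average raw log volume per day, in TB.
source_tb_per_day = 293 / days

print(f"reduction ratio: about {reduction_ratio:,.0f}x")
print(f"bytes per cube row: about {bytes_per_row:.1f}")
print(f"raw logs per day: about {source_tb_per_day:.2f} TB")
```

Under these assumptions the cube is roughly five orders of magnitude smaller than the raw logs, which is why it can be served from the memory of a single MicroStrategy server.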
16. Test Cases
Case C1: Cross tab with date slice
Case C2: Dynamic prompt on date
Case C3: Dynamic data grouping (Browser)
Case C4: 80/20 Analysis
Case C5: Data grouping & charting
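As an illustration of what an 80/20 analysis such as Case C4 computes, here is a minimal sketch in plain Python. It is not the MicroStrategy implementation; the function name `pareto_cutoff`, the 80% threshold default, and the browser visit counts are all hypothetical.

```python
def pareto_cutoff(counts, threshold=0.8):
    """Return the top categories whose cumulative share first reaches
    `threshold`, together with the share they cover."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")

    # Rank categories from largest to smallest.
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

    selected = []
    running = 0
    for name, value in ranked:
        selected.append(name)
        running += value
        if running / total >= threshold:
            break
    return selected, running / total


# Hypothetical visit counts per browser (not from the test data).
visits = {
    "Browser A": 550,
    "Browser B": 300,
    "Browser C": 80,
    "Browser D": 40,
    "Browser E": 30,
}

top, share = pareto_cutoff(visits)
print(top, f"{share:.0%}")
```

With this made-up data, two of the five browsers account for at least 80% of visits, which is the kind of concentration an 80/20 report is meant to surface.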
17. Test Result
Avg. response time (sec) by concurrent users (CU) and data volume in cube

Data volume   10 CU   25 CU   50 CU   100 CU
20 Days        1.8     3.5     6.1    11.9
40 Days        3.1     6.8    12.1    24.5
60 Days        4.7     9.6    19.2    36.1

[Charts: "Avg. Resp. Time by Concurrent Users" (one series per data volume) and "Avg. Resp. Time by Data Volume" (one series per concurrent-user level); y-axis: Avg. Response Time (sec), 0 to 40]
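The response times reported on this slide can be used to estimate how the cube scales with load. The sketch below computes, for each data volume, how much the average response time grows when going from 10 to 100 concurrent users. The numbers are the reported averages; the dictionary layout and the `scaling_factor` helper are illustrative assumptions.

```python
# Avg. response time in seconds, keyed by data volume, then concurrent users.
response_time = {
    "20 Days": {10: 1.8, 25: 3.5, 50: 6.1, 100: 11.9},
    "40 Days": {10: 3.1, 25: 6.8, 50: 12.1, 100: 24.5},
    "60 Days": {10: 4.7, 25: 9.6, 50: 19.2, 100: 36.1},
}


def scaling_factor(times, low=10, high=100):
    """Ratio of response time at `high` users to response time at `low` users."""
    return times[high] / times[low]


factors = {
    volume: round(scaling_factor(times), 2)
    for volume, times in response_time.items()
}

for volume, factor in factors.items():
    print(f"{volume}: 10x users gives about {factor}x response time")
```

In these figures a tenfold increase in concurrent users raises the average response time by roughly 6.6x to 7.9x, so response time grows somewhat less than linearly with the number of users over the tested range.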
19. Use Cases in Demonstration
§ Dynamic OLAP analysis using in-memory cubes
§ Direct access to Hadoop through Hive
§ Self-service Business Intelligence