FusionOps|7/18/2016
FusionOps
DS Developer Test
Solution-1
Flow Chart
Sales Data Analysis
Summary:
Statistic  Sales Date   Sales Quantity  Unit Cost   Unit Price   Marketing Spent  Discount Flag
Count      4000         4000            4000        4000         4000             4000
Mean       20132640.4   5047.348838     4.12248     12.1791675   193.22525        0.3215
Std        9731.630793  26503.76132     8.65621441  25.80803025  318.6396473      0.467110584
Min        20120131     9               0.07        0.32         3                0
25%        20121105.25  408.75          0.91        2.9          60               0
50%        20130880.5   963             1.46        4.32         108              0
75%        20140655.25  2117            2.2         6.51         203.25           1
Max        20150430     390630          52.38       166.43       3966             1
TABLE 1: SUMMARY OF THE SALES-LEVEL DATA
Table 1 summarizes the central tendency and spread of each variable.
For Sales Quantity, the mean is 5047.35 while the median (50th percentile) is only 963, so the mean is well above the median: the distribution is strongly right-skewed, with a few very large orders (maximum 390630).
For Marketing Spent, the mean is 193.23 and the median is 108, so here too the mean is higher than the median.
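Table 1 has the layout of pandas' describe() output. As a library-free sketch (an assumption about how such a summary is computed, not the original code), the same statistics can be obtained with Python's statistics module, using the sample standard deviation and linearly interpolated quartiles, which are the pandas defaults. The five input values are hypothetical and show how a single very large value pulls the mean far above the median, as happens with Sales Quantity and Marketing Spent.

```python
import statistics


def describe(values):
    """Summary statistics in the layout of Table 1 (count, mean, std, min, quartiles, max)."""
    data = sorted(values)
    # 'inclusive' quantiles interpolate linearly between order statistics,
    # matching the default percentile method used by pandas/NumPy
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "count": len(data),
        "mean": statistics.mean(data),
        "std": statistics.stdev(data),  # sample standard deviation (n - 1 denominator)
        "min": data[0],
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": data[-1],
    }


# Hypothetical right-skewed sample: one very large order
summary = describe([1, 2, 3, 4, 100])
print(summary["mean"], summary["50%"])  # the mean lies far above the median
```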
Univariate and Bivariate Analysis:
FIGURE 1: PAIRWISE PLOT OF THE SALES-LEVEL DATA
FIGURE 2: BAR PLOT OF ABC CLASS
Figure 2 shows the number of sales records in each ABC class; the table below also gives how many of those records have Discount Flag = 1:
ABC Class  Total Count  Discount Flag = 1
A          1080         351
B          1480         453
C          1440         482
TABLE 2: CATEGORIES OF CLASS ABC
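The counts in Table 2 amount to grouping the sales records by ABC class, counting rows, and summing the 0/1 Discount Flag. A minimal sketch, assuming each record is reduced to a (class, flag) pair; the sample pairs are hypothetical:

```python
from collections import Counter


def class_counts(records):
    """Return {class: (total records, records with Discount Flag == 1)}."""
    totals = Counter()
    flagged = Counter()
    for abc_class, discount_flag in records:
        totals[abc_class] += 1
        flagged[abc_class] += discount_flag  # flag is 0 or 1, so summing counts the 1s
    return {c: (totals[c], flagged[c]) for c in sorted(totals)}


sample = [("A", 1), ("A", 0), ("B", 0)]  # hypothetical (class, flag) pairs
print(class_counts(sample))
```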
Summary of the ABC Classes (Product Segmentation):
Statistic  Sales Date   Sales Quantity  Unit Cost    Unit Price   Marketing Spent  Discount Flag
Count      1080         1080            1080         1080         1080             1080
Mean       20132640.4   16049.80648     4.633481481  17.26198148  453.6935185      0.325
Std        9734.921816  49349.07531     8.36362872   35.78886613  520.6330711      0.468591841
Min        20120131     47              0.07         0.32         54               0
25%        20121105.25  1266.75         0.45         1.55         190.5            0
50%        20130880.5   4185.5          1.03         3.12         297              0
75%        20140655.25  10848           2.9          6.725        479.25           1
Max        20150430     390630          32.39        166.43       3966             1
TABLE 3: SUMMARY OF CLASS A
Statistic  Sales Date   Sales Quantity  Unit Cost    Unit Price   Marketing Spent  Discount Flag
Count      1480         1480            1480         1480         1480             1480
Mean       20132640.4   1460.31         4.752783784  12.00458108  132.7682432      0.306081081
Std        9733.702872  992.2210921     9.222109011  22.10871481  71.55556198      0.461019588
Min        20120131     33.9            0.45         1.78         27               0
25%        20121105.25  783.5           0.8875       2.77         75               0
50%        20130880.5   1394            1.2          3.29         117              0
75%        20140655.25  1937.5          2.32         7.65         173.25           1
Max        20150430     6951            39.21        110.8        531              1
TABLE 4: SUMMARY OF CLASS B
Statistic  Sales Date   Sales Quantity  Unit Cost    Unit Price   Marketing Spent  Discount Flag
Count      1440         1440            1440         1440         1440             1440
Mean       20132640.4   482.1844097     3.091416667  8.546493056  60.01041667      0.334722222
Std        9733.794272  294.1944846     8.167083891  18.85617808  34.8026187       0.472057205
Min        20120131     9               0.66         3.02         3                0
25%        20121105.25  248             1.29         4.12         33               0
50%        20130880.5   464.5           1.74         5.01         53               0
75%        20140655.25  646.25          2.02         6.4725       78               1
Max        20150430     1895            52.38        134.41       216              1
TABLE 5: SUMMARY OF CLASS C
From the summaries above, we observe that:
Class A has a much higher Sales Quantity than Class B and Class C (mean 16049.81 vs 1460.31 and 482.18).
Class A also has the highest Marketing Spent (mean 453.69 vs 132.77 and 60.01).
Class A has the highest mean Unit Price (17.26) and the highest maximum (166.43), so it contains more of the expensive sales records, although its 75th percentile (6.725) is slightly below that of Class B (7.65).
The standard deviation of Marketing Spent in Class A is about 520.6, far larger than in Class B (71.6) and Class C (34.8), so marketing spend varies much more within Class A.
Class C has the highest share of records with Discount Flag = 1 (0.335, vs 0.325 for Class A and 0.306 for Class B).
Overall, products in Class A are in much higher demand than those in Class B and Class C.
Class  Total Sales Quantity
A      17333791.0
B      2161258.8
C      694345.55
TABLE 6: TOTAL SALES QUANTITY BY CLASS
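Each total in Table 6 equals the class's mean Sales Quantity (Tables 3 to 5) multiplied by its record count, which gives a quick consistency check on the tables. The check below uses only numbers from the document:

```python
# (mean Sales Quantity, record count) per class, taken from Tables 3, 4 and 5
class_stats = {
    "A": (16049.80648, 1080),
    "B": (1460.31, 1480),
    "C": (482.1844097, 1440),
}
reported_totals = {"A": 17333791.0, "B": 2161258.8, "C": 694345.55}  # Table 6

for cls, (mean_qty, count) in class_stats.items():
    total = mean_qty * count
    # The means are rounded in the tables, so allow a small tolerance
    assert abs(total - reported_totals[cls]) < 0.01, cls
    print(cls, round(total, 2))
```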
Bar Plot of Discount Flags
FIGURE 3: BAR PLOT OF DISCOUNT FLAG
From the bar plot of the Discount Flag, more sales records fall under Discount than under Promotion, as Table 7 shows:
Discount Flag  Total Count
Promotion      1286
Discount       2714
TABLE 7: TOTAL COUNTS OF DISCOUNT FLAGS
That is, 67.85 percent of the sales records (2714 of 4000) are under Discount and the remaining 32.15 percent (1286 of 4000) are under Promotion.
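The percentages follow directly from the counts, and the per-class flagged counts of Table 2 add up to the Promotion count. A quick check using only the document's numbers:

```python
promotion, discount = 1286, 2714  # Table 7
total = promotion + discount      # 4000 sales records (Table 1)

print(f"Discount:  {100 * discount / total:.2f}%")
print(f"Promotion: {100 * promotion / total:.2f}%")

# Per-class flagged counts from Table 2 add up to the Promotion count
assert 351 + 453 + 482 == promotion
```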
Correlation of Sales-Level Data:
Table 8 shows a fairly strong positive correlation (0.623) between Marketing Spent and Sales Quantity: records with higher marketing spend tend to have higher sales quantities, although correlation alone does not show that increasing the spend would raise sales.
Unit Price and Unit Cost are very strongly correlated (0.941): products that cost more also tend to be priced higher.
                 Sales Date    Sales Quantity  Unit Cost     Unit Price    Marketing Spent  Discount Flag
Sales Date       1             0.018289693     0.000642644   0.000710024   0.022292488      -0.006249567
Sales Quantity   0.018289693   1               -0.079925357  -0.077920384  0.622740066      0.000442116
Unit Cost        0.000642644   -0.079925357    1             0.940819548   0.106927754      0.00759269
Unit Price       0.000710024   -0.077920384    0.940819548   1             0.237334958      -0.035743076
Marketing Spent  0.022292488   0.622740066     0.106927754   0.237334958   1                -0.150267176
Discount Flag    -0.006249567  0.000442116     0.00759269    -0.035743076  -0.150267176     1
TABLE 8: CORRELATION OF SALES-LEVEL DATA
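Table 8 presumably holds Pearson correlation coefficients (the default of pandas' DataFrame.corr(); the document does not say which method was used). A minimal, dependency-free sketch of the coefficient for two columns, with hypothetical values:

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance of x and y divided by the product of their spreads."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical marketing spend vs. quantity sold
spend = [10, 20, 30, 40]
quantity = [15, 25, 45, 50]
print(round(pearson(spend, quantity), 3))
```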
Summary of Product-Level Data:
Statistic  Unit Cost    Sales Quantity  Unit Price   Discount Flag  Marketing Spent
Count      100          100             100          100            100
Mean       4.12248      201893.9535     12.1791675   12.86          7729.01
Std        8.689409387  1036170.382     25.76945362  1.675853469    11271.81413
Min        0.0775       1116            0.3785       10             931
25%        0.9100625    17890           2.892375     11             2897.25
50%        1.43425      38601.5         4.379625     13             4688
75%        2.157125     85749.45        6.6428125    14             8529.75
Max        51.201       10290924        147.223      17             78490
TABLE 9: SUMMARY OF THE PRODUCT-LEVEL DATA
Table 9 summarizes the data at the product level.
The mean Sales Quantity is 201893.95 while the median is only 38601.5, so the mean is far above the median and the distribution of sales quantity across products is strongly right-skewed.
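The product-level figures are consistent with aggregating each product's sales records by summing Sales Quantity, Marketing Spent and Discount Flag and averaging Unit Cost and Unit Price: for instance, 100 × 201893.9535 equals the grand total of Table 6 (20189395.35), and 100 × 12.86 equals the 1286 flagged records. The document does not state these aggregation rules, so the sketch below is an assumption inferred from those totals; the field names and records are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Assumed aggregation rules (inferred, not stated in the original)
SUMMED = ("sales_quantity", "marketing_spent", "discount_flag")
AVERAGED = ("unit_cost", "unit_price")


def to_product_level(records):
    """Collapse sales-level records (dicts) into one row per product."""
    by_product = defaultdict(list)
    for rec in records:
        by_product[rec["product"]].append(rec)
    rows = {}
    for product, recs in by_product.items():
        row = {col: sum(r[col] for r in recs) for col in SUMMED}
        row.update({col: mean(r[col] for r in recs) for col in AVERAGED})
        rows[product] = row
    return rows


sample_records = [  # hypothetical sales-level records
    {"product": "P1", "sales_quantity": 10, "marketing_spent": 5, "discount_flag": 1,
     "unit_cost": 1.0, "unit_price": 3.0},
    {"product": "P1", "sales_quantity": 30, "marketing_spent": 7, "discount_flag": 0,
     "unit_cost": 2.0, "unit_price": 5.0},
    {"product": "P2", "sales_quantity": 4, "marketing_spent": 1, "discount_flag": 1,
     "unit_cost": 0.5, "unit_price": 1.5},
]
print(to_product_level(sample_records))
```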
Bar Plot of Class ABC (Product Level)
FIGURE 1: BAR PLOT OF CLASS ABC (PRODUCT-LEVEL DATA)
The figure above shows the number of products in each ABC class:
ABC Class  Total Count
A          27
B          37
C          36
TABLE 10: TOTAL COUNTS OF THE CLASS ABC
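Dividing the record counts of Table 2 by the product counts of Table 10 gives exactly 40 sales records per product in every class. That would be consistent with one record per product per month from January 2012 to April 2015 (the Sales Date range in Table 1), although the document does not state this. The arithmetic:

```python
records_per_class = {"A": 1080, "B": 1480, "C": 1440}  # Table 2
products_per_class = {"A": 27, "B": 37, "C": 36}       # Table 10

for cls in records_per_class:
    print(cls, records_per_class[cls] / products_per_class[cls])

# Months from January 2012 to April 2015 inclusive (Sales Date range in Table 1)
months = (2015 - 2012) * 12 + (4 - 1) + 1
print("months:", months)
```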
Summary of the ABC Classes (Product-Level Data):
Statistic  Unit Cost    Sales Quantity  Unit Price   Discount Flag  Marketing Spent
Count      27           27              27           27             27
Mean       4.633481481  641992.2593     17.26198148  13             18147.74074
Std        8.509789031  1951783.957     36.22357567  1.61721508     17879.64391
Min        0.0775       3933            0.3785       11             7353
25%        0.454375     56306.5         1.671125     11.5           9903.5
50%        1.005        177058          2.99625      13             12979
75%        2.666375     432854          6.612875     14.5           15234
Max        31.823       10290924        147.223      16             78490
TABLE 11: SUMMARY OF CLASS A (PRODUCT LEVEL)
Statistic  Unit Cost    Sales Quantity  Unit Price   Discount Flag  Marketing Spent
Count      37           37              37           37             37
Mean       4.752783784  58412.4         12.00458108  12.24324324    5310.72973
Std        9.335749421  37122.25589     22.24332308  1.516674092    1464.675687
Min        0.485        1754.8          2.1505       10             3181
25%        0.88975      35531           2.81275      11             4255
50%        1.179        61624           3.155        12             5090
75%        2.27975      72011           7.22575      13             5879
Max        38.32575     152059          95.622       15             8979
TABLE 12: SUMMARY OF CLASS B (PRODUCT LEVEL)
Statistic  Unit Cost    Sales Quantity  Unit Price   Discount Flag  Marketing Spent
Count      36           36              36           36             36
Mean       3.091416667  19287.37639     8.546493056  13.38888889    2400.416667
Std        8.271471763  10679.29319     19.00472153  1.711770642    755.2505876
Min        0.716        1116            3.60075      11             931
25%        1.369125     10757.75        4.29525      12             1789.25
50%        1.7075       20211.5         5.016625     13             2385.5
75%        2.0568125    25934           6.2524375    14.25          2921.75
Max        51.201       38983           119.047      17             3811
TABLE 13: SUMMARY OF CLASS C (PRODUCT LEVEL)
From the summaries of the product-level data above, we observe that:
Class A has a much higher Sales Quantity per product than Class B and Class C (mean 641992.26 vs 58412.4 and 19287.38).
Class A also has the highest Marketing Spent per product (mean 18147.74 vs 5310.73 and 2400.42).
Class A has the highest mean Unit Price (17.26) and the highest maximum (147.22), so it contains more of the expensive products, although its 75th percentile (6.61) is slightly below that of Class B (7.23).
The standard deviation of Marketing Spent in Class A is about 17879.64, far larger than in Class B (1464.68) and Class C (755.25).
Class C has the highest mean Discount Flag (13.39, vs 13 for Class A and 12.24 for Class B).
Products in Class A are therefore in much higher demand than those in Class B and Class C.
Correlation of Product-Level Data
The correlation table below again shows strong correlations between Marketing Spent and Sales Quantity (0.694) and between Unit Price and Unit Cost (0.948).
                 Unit Cost     Sales Quantity  Unit Price    Discount Flag  Marketing Spent
Unit Cost        1             -0.082260483    0.947731937   0.092942086    0.12113583
Sales Quantity   -0.082260483  1               -0.080638021  0.017179196    0.694245221
Unit Price       0.947731937   -0.080638021    1             0.130603683    0.258419279
Discount Flag    0.092942086   0.017179196     0.130603683   1              0.053700919
Marketing Spent  0.12113583    0.694245221     0.258419279   0.053700919    1
TABLE 14: CORRELATION OF PRODUCT-LEVEL DATA
Results and Conclusions:
Across all products, ARIMA stands out as the best of the time-series models we fitted in the R statistical programming language. Its auto.arima() function (from the forecast package) selects the model orders automatically: the order of differencing 'd' (used to make the series stationary) is chosen with unit-root tests (KPSS by default), and the AR lag 'p' and MA lag 'q' are chosen by minimizing an information criterion based on AIC (Akaike's Information Criterion; by default the small-sample corrected AICc).
The forecast demand for each product for January, February and March is given in the table below.
Comparing the AIC bar plots for each product, ARIMA has the lowest AIC except in some products, where exponential smoothing (ETS) is slightly better than ARIMA.
Products 2, 4, 5, 6 and 7 show an increasing trend in demand, whereas most of the remaining products fluctuate around a roughly constant rolling mean.
Product 3 is the exception, showing a decreasing trend in demand over time.
Products 4, 5, 6, 7, 8 and 9 also show seasonal behaviour.
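For reference, the criteria behind this model comparison are given below, where L̂ is the maximized likelihood, k the number of estimated parameters and n the number of observations. auto.arima() minimizes AICc by default (its ic argument also accepts "aic" and "bic"). Lower values are better, and values are only comparable between models whose likelihoods are computed on the same data in the same way, so AIC comparisons between ETS and ARIMA fits are best read as indicative.

```latex
\mathrm{AIC} = -2\log\hat{L} + 2k, \qquad
\mathrm{AICc} = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1}
```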
Product January February March
Prod_1 540 476 466
Prod_2 2935 2941 2951
Prod_3 3279 3225 3229
Prod_4 8033 8108 7849
Prod_5 12450 12174 12255
Prod_6 1578 1609 1679
Prod_7 21174 20865 20204
Prod_8 233 284 257
Prod_9 82 125 60
Prod_10 1 0 2
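The trend readings above compare each demand series with its rolling mean. A minimal sketch of a trailing rolling mean in Python, assuming the series is a plain list of monthly demand values; the window length (12 would cover one year of monthly data) is a parameter, not something the original specifies:

```python
from collections import deque


def rolling_mean(series, window):
    """Trailing moving average; the first window - 1 positions have no value (None)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    buf = deque(maxlen=window)
    total = 0.0
    out = []
    for value in series:
        if len(buf) == window:
            total -= buf[0]  # this value drops out of the window on append
        buf.append(value)
        total += value
        out.append(total / window if len(buf) == window else None)
    return out


print(rolling_mean([1, 2, 3, 4, 5], 3))
```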
Code:
# library(dataiku)  # only needed when running inside Dataiku DSS; not used below
library(forecast)
library(dplyr)
library(zoo)  # as.yearmon()

# Importing the data for DataSet-2
demandingdata <- read.csv('/home/prerna/Documents/DSDeveloperTest/DataSet-2.csv')
# Date is stored as YYYYMM; convert it to a year-month value for sorting
demandingdata <- transform(demandingdata, ym = as.yearmon(as.character(Date), "%Y%m"))

pr <- NULL
for (i in unique(demandingdata$Product)) {
  # Rows for the current product, in chronological order
  temp <- demandingdata[demandingdata$Product == i, ]
  temp <- temp[order(temp$ym), ]

  # Monthly demand series (assumes every product starts in January 2013)
  timedata <- ts(temp$Demand, start = 2013, frequency = 12)
  plot(timedata)

  # Exponential smoothing (ETS) forecast
  m_ets <- ets(timedata)
  f_ets <- forecast(m_ets, h = 3)  # forecast 3 months into the future
  plot(f_ets)

  # ARIMA with automatic order selection
  m_aa <- auto.arima(timedata)
  f_aa <- forecast(m_aa, h = 3)
  plot(f_aa)

  # TBATS forecast
  m_tbats <- tbats(timedata)
  f_tbats <- forecast(m_tbats, h = 3)
  plot(f_tbats)

  # Bar plot comparing the AIC of the ETS, ARIMA and TBATS models
  barplot(c(ETS = m_ets$aic, ARIMA = m_aa$aic, TBATS = m_tbats$AIC),
          col = "lightblue", ylab = "AIC", main = paste("Product", i))

  # Keep the ARIMA point forecasts for this product (one row per product)
  x <- as.data.frame(f_aa)
  pr <- rbind(pr, x$`Point Forecast`)
}
rownames(pr) <- unique(demandingdata$Product)
# Exporting the forecasts to CSV
write.csv(pr, file = 'predictedresult.csv')
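ets() searches over a family of exponential smoothing models and estimates their parameters from the data. Its simplest member, simple exponential smoothing (ETS(A,N,N)), is sketched below in Python under two simplifying assumptions that ets() does not make: a fixed smoothing parameter alpha and the first observation as the initial level. The h-step point forecast of this model is flat at the final level:

```python
def ses_forecast(series, alpha, h):
    """Simple exponential smoothing: level_t = alpha * y_t + (1 - alpha) * level_{t-1}."""
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # simplifying assumption: initial level = first observation
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * h  # flat point forecast for horizons 1..h


print(ses_forecast([0, 8], alpha=0.5, h=3))
```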
Solution-3
Task-1
1) Creating the database
a) $ mysql -u root -p
b) mysql> CREATE DATABASE FusionOps;
2) Loading the CSVs into MySQL using Python:
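#!/usr/bin/env python3
import os
from urllib.parse import quote_plus

import pandas as pd
from sqlalchemy import create_engine

# Open the database connection. pandas needs an SQLAlchemy engine for MySQL;
# the password is read from the MYSQL_PASSWORD environment variable instead of
# being hard-coded.
engine = create_engine("mysql+mysqldb://root:{}@localhost/FusionOps".format(
    quote_plus(os.environ["MYSQL_PASSWORD"])))


def load_csv(path, date_column, table):
    """Load one CSV file into a MySQL table, replacing the table if it exists."""
    df = pd.read_csv(path)
    df[date_column] = pd.to_datetime(df[date_column])
    # Use underscores in column names (e.g. 'Sales Order Date' -> 'Sales_Order_Date'),
    # which is how the queries in Task-2 refer to them
    df.columns = df.columns.str.replace(' ', '_')
    df.to_sql(name=table, con=engine, if_exists='replace', index=False)
    # Print the number of rows written as a quick check
    print(table, pd.read_sql('SELECT COUNT(*) AS n FROM ' + table, con=engine)['n'][0])


load_csv('/home/prerna/Desktop/SalesData.csv', 'Sales Order Date', 'SalesData')
load_csv('/home/prerna/Desktop/PurchasingData.csv', 'Replenishment Date', 'PurchasingData')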
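The per-part, per-shop queries in Task-2 are safest with bound parameters rather than string concatenation. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server; the table layout mirrors the column names used in Task-2, and the rows are hypothetical. (MySQLdb uses %s placeholders where sqlite3 uses ?.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SalesData (Part_Number TEXT, ShopNo TEXT, "
    "Sales_Order_Date TEXT, Sales_Quantity REAL)"
)
conn.executemany(
    "INSERT INTO SalesData VALUES (?, ?, ?, ?)",
    [  # hypothetical sample rows
        ("P1", "S1", "2015-01-02", 5),
        ("P1", "S1", "2015-01-02", 3),
        ("P1", "S2", "2015-01-03", 7),
    ],
)


def daily_sales(conn, part, shop):
    """Total quantity per day for one part/shop pair, using bound parameters."""
    cur = conn.execute(
        "SELECT Sales_Order_Date, SUM(Sales_Quantity) FROM SalesData "
        "WHERE Part_Number = ? AND ShopNo = ? "
        "GROUP BY Sales_Order_Date ORDER BY Sales_Order_Date",
        (part, shop),
    )
    return cur.fetchall()


print(daily_sales(conn, "P1", "S1"))
```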
Task-2
1) Creating the DailySalesAndStockData table:
#!/usr/bin/env python3
import os
from datetime import date
from urllib.parse import quote_plus

import pandas as pd
from sqlalchemy import create_engine, text

# Open the database connection (password from the MYSQL_PASSWORD environment variable)
engine = create_engine("mysql+mysqldb://root:{}@localhost/FusionOps".format(
    quote_plus(os.environ["MYSQL_PASSWORD"])))

data_you_need = pd.DataFrame(columns=['Date', 'PartNo', 'ShopNo', 'Sales_Quantity',
                                      'Sales_Quantity_Cum', 'End-of-day Stock'])

# Every distinct (part, shop) pair; replaces "GROUP BY CONCAT(...)", which MySQL 5.7+
# rejects under the default ONLY_FULL_GROUP_BY mode
df1 = pd.read_sql('SELECT DISTINCT Part_Number, ShopNo FROM SalesData', con=engine)
for index, row in df1.iterrows():
    # Bound parameters instead of string concatenation (proper quoting, no SQL injection)
    params = {'part': str(row['Part_Number']), 'shop': str(row['ShopNo'])}
    df_sale = pd.read_sql(
        text('SELECT * FROM SalesData WHERE Part_Number = :part AND ShopNo = :shop'),
        con=engine, params=params, parse_dates=['Sales_Order_Date'])
    df_purchase = pd.read_sql(
        text('SELECT * FROM PurchasingData WHERE Part_Number = :part AND ShopNo = :shop'),
        con=engine, params=params, parse_dates=['Replenishment_Date'])
    # One row per calendar day in the analysis window
    date1 = pd.date_range(date(2014, 12, 15), date(2015, 3, 31), freq='D')
    df = pd.DataFrame({'Date': date1})
    df['PartNo'] = str(row['Part_Number'])
    df['ShopNo'] = str(row['ShopNo'])
    # Daily totals, so each date matches at most one row in the left joins below
    sales = df_sale.groupby('Sales_Order_Date', as_index=False)['Sales_Quantity'].sum()
    bought = df_purchase.groupby('Replenishment_Date', as_index=False)['Quantity_Produced/Bought'].sum()
    result = pd.merge(df, sales, how='left', left_on='Date', right_on='Sales_Order_Date')
    df = result.fillna({'Sales_Quantity': 0})
    result = pd.merge(df, bought, how='left', left_on='Date', right_on='Replenishment_Date')
    # Days without a sale or replenishment get a quantity of 0
    result = result[['Date', 'PartNo', 'ShopNo', 'Sales_Quantity',
                     'Quantity_Produced/Bought']].fillna(0)