Young Statisticians Conference, 7 February 2013
Extreme electricity demand (section: Going to extremes)
The problem We want to forecast the peak electricity demand in a half-hour period in ten years time. We have twelve years ...
South Australian demand data (annotation: Black Saturday →)
South Australian demand data                                                    South Australia state wide demand (summer ...
South Australian demand data                                              South Australia state wide demand (January 2011)...
Demand boxplots (Sth Aust)                                        Time: 12 midnight              3.5              3.0     ...
Temperature data (Sth Aust)                                         Time: 12 midnight              3.5                    ...
Monash Electricity Forecasting Model                                                J log(yt ) = hp (t ) + fp (w1,t , w2,t...
Fitted results (Summer 3pm)                                                                      Time: 3:00 pm            ...
0.4     Fitted results (Summer 3pm)                                        Time: 3:00 pm                                  ...
Half-hourly models     x x1 x2 x3 x4 x5 x6 x48 x96 x144 x192 x240 x288 d d1 d2 d3 d4 d5 d6 d48 d96 d144 d192 d240 d288 x+ ...
Half-hourly models                                                     R−squared                90R−squared (%)           ...
Half-hourly models                                                    South Australian demand (January 2011)              ...
Adjusted modelOriginal model                                                  J   log(yt ) = hp (t ) + fp (w1,t , w2,t ) +...
Peak demand forecasting                                             J     qt,p = hp (t ) + fp (w1,t , w2,t ) +         cj ...
Peak demand backcasting                                             J     qt,p = hp (t ) + fp (w1,t , w2,t ) +         cj ...
Peak demand backcasting                                                  PoE (annual interpretation)             4.0      ...
Peak demand forecasting                                                                         South Australia GSP       ...
Peak demand distribution                                                     Annual POE levels             6              ...
Results    We have successfully forecast the extreme upper tail in    ten years time using only twelve years of data!    T...
Crazy clients The client who wouldn’t tell me the problem. The client who wanted all meetings held at random locations for...
Go forth and consult: A good statistician is not smarter than everyone else, he merely has his ignorance better organised. ...
Go forth and consult: All models are wrong, some are useful. (George E P Box)
Go forth and consult: It is better to solve the right problem the wrong way than the wrong problem the right way. ...
  1. 1. Young Statisticians Conference, 7 February 2013
  4. 4. Outline: 1 Where fools fear to tread; 2 Working with inadequate tools; 3 When you can’t lose; 4 Getting dirty with data; 5 Going to extremes; 6 Final thoughts. (Slide footer: Man vs Wild Data, Where fools fear to tread.)
  5. 5. My story: Olympic video poker slots. Beware of smelly clients. Threats and slander. Nerves in court. Three university consulting services. Reviewing my own work. Six times an expert witness. Hundreds of clients.
  13. 13. Outline (section 2: Working with inadequate tools)
  17. 17. Disposable tableware company. Problem: Want forecasts of each of hundreds of items. Series can be stationary, trended or seasonal. They currently have a large forecasting program written in-house but it doesn’t seem to produce sensible forecasts. They want me to tell them what is wrong and fix it. Additional information: Program written in COBOL, making numerical calculations limited. It is not possible to do any optimisation. Their programmer has little experience in numerical computing. They employ no statisticians and want the program to produce forecasts automatically.
  18. 18. Disposable tableware company. Methods currently used: A: 12 month average. C: 6 month average. E: straight line regression over last 12 months. G: straight line regression over last 6 months. H: average slope between last year’s and this year’s values (equivalent to differencing at lag 12 and taking the mean). I: same as H except over 6 months. K: I couldn’t understand the explanation.
  19. 19. Disposable tableware company. My solution: Use first differencing to deal with trend, or seasonal differencing to deal with seasonality. Use simple exponential smoothing (SES) on the (differenced) data with the parameter selected from {0.1, 0.3, 0.5, 0.7, 0.9}. For each series, try 15 models: three differencing options (no differencing, first differencing, seasonal differencing), each combined with SES at 5 parameter values. Model selected based on smallest MSE. (Only one parameter for each model, so no need to penalize for model size.)
  23. 23. Disposable tableware company. Some lessons: Be pragmatic. Understand your tools well enough to be able to adapt them. A successful consulting job often uses very simple methods.
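The 15-model search described in the "My solution" slide can be sketched as below. This is a minimal illustration in Python, not the COBOL program from the talk. The function names, the seasonal lag of 12 (assuming monthly data), and the choice to initialise the SES level at the first observation are all assumptions made for the example.

```python
from itertools import product

# Grid of SES smoothing parameters, as listed on the slide.
ALPHAS = (0.1, 0.3, 0.5, 0.7, 0.9)
# Differencing options; the seasonal lag of 12 assumes monthly data.
DIFF_LAGS = {"none": 0, "first": 1, "seasonal": 12}


def difference(series, lag):
    """Return the lag-differenced series (unchanged when lag is 0)."""
    if lag == 0:
        return list(series)
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def ses_mse(series, alpha):
    """One-step-ahead MSE of simple exponential smoothing.

    Assumption: the level starts at the first observation.
    """
    level = series[0]
    total = 0.0
    for obs in series[1:]:
        err = obs - level          # one-step forecast error
        total += err * err
        level += alpha * err       # error-correction form of SES
    return total / (len(series) - 1)


def select_model(series):
    """Try all 3 x 5 = 15 candidates and return (mse, differencing, alpha)."""
    best = None
    for (name, lag), alpha in product(DIFF_LAGS.items(), ALPHAS):
        diffed = difference(series, lag)
        if len(diffed) < 2:
            continue               # not enough data for this candidate
        mse = ses_mse(diffed, alpha)
        if best is None or mse < best[0]:
            best = (mse, name, alpha)
    return best


if __name__ == "__main__":
    pattern = [5, 7, 9, 20, 3, 4, 8, 15, 6, 2, 11, 30]
    print(select_model(pattern * 4))
```

A one-step error on the differenced series equals the one-step error on the original scale, so the MSE values are roughly comparable across the three differencing options, although they are averaged over slightly different numbers of observations.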
  24. 24. Outline (section 3: When you can’t lose)
  25. 25. Forecasting the PBS
  26. 26. Forecasting the PBS. The Pharmaceutical Benefits Scheme (PBS) is the Australian government drugs subsidy scheme. Many drugs bought from pharmacies are subsidised to allow more equitable access to modern drugs. The cost to government is determined by the number and types of drugs purchased. Currently nearly 1% of GDP. The total cost is budgeted based on forecasts of drug usage.
  31. 31. Forecasting the PBS. In 2001: $4.5 billion budget, under-forecasted by $800 million. Thousands of products. Seasonal demand. Subject to covert marketing, volatile products, uncontrollable expenditure. Although monthly data available for 10 years, data are aggregated to annual values, and only the first three years are used in estimating the forecasts. All forecasts being done with the FORECAST function in MS-Excel!
  36. 36. ATC drug classification: A Alimentary tract and metabolism; B Blood and blood forming organs; C Cardiovascular system; D Dermatologicals; G Genito-urinary system and sex hormones; H Systemic hormonal preparations, excluding sex hormones and insulins; J Anti-infectives for systemic use; L Antineoplastic and immunomodulating agents; M Musculo-skeletal system; N Nervous system; P Antiparasitic products, insecticides and repellents; R Respiratory system; S Sensory organs; V Various.
  37. 37. ATC drug classification. 14 classes: A Alimentary tract and metabolism. 84 classes: A10 Drugs used in diabetes. Then: A10B Blood glucose lowering drugs; A10BA Biguanides; A10BA02 Metformin.
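The nesting in the metformin example can be made concrete with a small helper that splits a code into its ancestor groups. This is only an illustrative sketch: the function name is hypothetical, and the prefix lengths (1, 3, 4, 5, 7 characters) are read off the A / A10 / A10B / A10BA / A10BA02 example on the slide.

```python
# Prefix lengths of the five hierarchy levels, taken from the slide's
# example: A, A10, A10B, A10BA, A10BA02.
ATC_LEVEL_LENGTHS = (1, 3, 4, 5, 7)


def atc_levels(code):
    """Return the code's groups from the broadest level down to the code itself."""
    code = code.strip().upper()
    if len(code) not in ATC_LEVEL_LENGTHS:
        raise ValueError(f"unexpected ATC code length: {code!r}")
    return [code[:n] for n in ATC_LEVEL_LENGTHS if n <= len(code)]


if __name__ == "__main__":
    print(atc_levels("A10BA02"))
```

Grouping series by one of these prefixes is one way to aggregate product-level data to a coarser drug group before forecasting.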
  38. 38. Forecasting the PBS. Monthly data on thousands of drug groups and 4 concession types available from 1991. Method needs to be automated and implemented within MS-Excel. Exponential smoothing seems appropriate (monthly data with changing trends and seasonal patterns), but in 2001, automated exponential smoothing was not well-developed, and not available in MS-Excel. As part of this project, we developed an automatic forecasting algorithm for exponential smoothing state space models based on the AIC. Forecast MAPE reduced from 15–20% to about 0.6%.
  43. 43. Forecasting the PBS. [Figure: Total cost, A03 concession safety net group; $ thousands, 1995–2010.]
  44. 44. Forecasting the PBS. [Figure: Total cost, A05 general copayments group; $ thousands, 1995–2010.]
  45. 45. Forecasting the PBS. [Figure: Total cost, D01 general copayments group; $ thousands, 1995–2010.]
  46. 46. Forecasting the PBS. [Figure: Total cost, S01 general copayments group; $ thousands, 1995–2010.]
  47. 47. Forecasting the PBS. [Figure: Total cost, R03 general copayments group; $ thousands, 1995–2010.]
  48. 48. Forecasting the PBS. Some lessons: Often what people do is very bad, and it is easy to make a big difference. Sometimes you have to invent new methods, and that can lead to publications. You have to implement solutions in the client’s software environment. Be aware of the politics.
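The idea of choosing an exponential smoothing model automatically by AIC can be illustrated with a toy example. This is not the algorithm developed in the project. It is a simplified sketch that compares only two candidates (simple exponential smoothing and an additive-trend model), picks smoothing parameters by grid search, uses heuristic initial states, and takes AIC as n log(SSE/n) + 2k under Gaussian errors. All function names and the parameter counts are assumptions made for the example.

```python
import math


def ses_sse(y, alpha):
    """Sum of squared one-step errors for simple exponential smoothing."""
    level, sse = y[0], 0.0
    for obs in y[1:]:
        err = obs - level
        sse += err * err
        level += alpha * err
    return sse


def holt_sse(y, alpha, beta):
    """Sum of squared one-step errors for an additive-trend model.

    Error-correction form: level and trend are both updated by the
    one-step error. Initial states are set heuristically (assumption).
    """
    level, trend, sse = y[0], y[1] - y[0], 0.0
    for obs in y[1:]:
        err = obs - (level + trend)
        sse += err * err
        level += trend + alpha * err
        trend += beta * err
    return sse


def aic(sse, n_errors, n_params):
    """Gaussian AIC up to an additive constant."""
    return n_errors * math.log(max(sse, 1e-12) / n_errors) + 2 * n_params


def select_by_aic(y):
    """Return (aic, model_name) for the candidate with the smaller AIC."""
    grid = [i / 20 for i in range(1, 20)]          # 0.05, 0.10, ..., 0.95
    n = len(y) - 1                                 # number of one-step errors
    best_ses = min(ses_sse(y, a) for a in grid)
    best_holt = min(holt_sse(y, a, b) for a in grid for b in grid if b <= a)
    # Parameter counts (smoothing parameters plus initial states) are an
    # assumption of this sketch: 2 for SES, 4 for the trend model.
    return min((aic(best_ses, n, 2), "ses"), (aic(best_holt, n, 4), "holt"))


if __name__ == "__main__":
    trended = [100 + 5 * t + ((7 * t) % 5 - 2) for t in range(60)]
    print(select_by_aic(trended))
```

The penalty term is what allows models with different numbers of parameters to be compared, which is why a plain MSE comparison is not enough once the candidate models differ in size.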
  49. 49. Outline1 Where fools fear to tread2 Working with inadequate tools3 When you can’t lose4 Getting dirty with data5 Going to extremes6 Final thoughts Man vs Wild Data Getting dirty with data 17
  50. 50. Airline passenger traffic
  51. 51. Airline passenger traffic [Figure: three panels for Melbourne–Sydney, 1988–1993: first class, business class and economy class passengers]
  52. 52. Airline passenger traffic [Same figure, annotated: Not the real data! Or is it?]
  53. 53. Airline passenger traffic [Figure: Economy class passengers (thousands), Melbourne–Sydney, 1988–1993]
  56. 56. Possible model: $Y_t = Y_t^* + Z_t$ and $Y_t^* = \beta_0 + \sum_j \beta_j x_{t,j} + N_t$, where $Y_t$ = observed data for one passenger class; $Y_t^*$ = reconstructed data; $Z_t$ = latent process (usually equal to zero); $x_{t,j}$ are covariates and dummy variables; $N_t$ = seasonal ARIMA process of period 52.
  57. 57. Possible model. Some lessons: Real data is often very messy. Be aware of the causes. Get an answer even if it isn’t pretty. What to do with the non-integer (average 52.19) seasonality? How to deal with the correlations between classes and between routes? You often think of better approaches long after the project is finished.
  58. 58. Outline: 1 Where fools fear to tread; 2 Working with inadequate tools; 3 When you can’t lose; 4 Getting dirty with data; 5 Going to extremes; 6 Final thoughts
  59. 59. Extreme electricity demand
  60. 60. The problem. We want to forecast the peak electricity demand in a half-hour period in ten years’ time. We have twelve years of half-hourly electricity data, temperature data and some economic and demographic data. The location is South Australia: home to the most volatile electricity demand in the world. Sounds impossible?
  65. 65. South Australian demand data
  66. 66. South Australian demand data [Figure annotation: Black Saturday →]
  67. 67. South Australian demand data [Figure: South Australia state-wide demand (GW), summer 10/11, Oct 10 to Mar 11]
  68. 68. South Australian demand data [Figure: South Australia state-wide demand (GW), January 2011, by date in January]
  69. 69. Demand boxplots (Sth Aust) [Figure: boxplots of demand (GW) by day of week, Mon to Sun; time: 12 midnight]
  70. 70. Temperature data (Sth Aust) [Figure: demand (GW) against temperature (deg C) for workdays and non-workdays; time: 12 midnight]
  71. 71. Monash Electricity Forecasting Model: $\log(y_t) = h_p(t) + f_p(w_{1,t}, w_{2,t}) + \sum_{j=1}^{J} c_j z_{j,t} + n_t$, where $y_t$ denotes per capita demand at time $t$ (measured in half-hourly intervals) and $p$ denotes the time of day, $p = 1, \dots, 48$; $h_p(t)$ models all calendar effects; $f_p(w_{1,t}, w_{2,t})$ models all temperature effects, where $w_{1,t}$ is a vector of recent temperatures at location 1 and $w_{2,t}$ is a vector of recent temperatures at location 2; $z_{j,t}$ is a demographic or economic variable at time $t$; $n_t$ denotes the model error at time $t$.
  76. 76. Monash Electricity Forecasting Model: $\log(y_t) = h_p(t) + f_p(w_{1,t}, w_{2,t}) + \sum_{j=1}^{J} c_j z_{j,t} + n_t$. The term $h_p(t)$ handles annual, weekly and daily seasonal patterns as well as public holidays: $h_p(t) = \ell_p(t) + \alpha_{t,p} + \beta_{t,p} + \gamma_{t,p} + \delta_{t,p}$, where $\ell_p(t)$ is the “time of summer” effect (a regression spline); $\alpha_{t,p}$ is the day of week effect; $\beta_{t,p}$ is the “holiday” effect; $\gamma_{t,p}$ is the New Year’s Eve effect; $\delta_{t,p}$ is the millennium effect.
  82. 82. Fitted results (Summer 3pm) [Figure: effect on demand at 3:00 pm against day of summer, day of week, and holiday category (Normal, Day before, Holiday, Day after)]
  83. 83. Monash Electricity Forecasting Model: $\log(y_t) = h_p(t) + f_p(w_{1,t}, w_{2,t}) + \sum_{j=1}^{J} c_j z_{j,t} + n_t$, with temperature effects $f_p(w_{1,t}, w_{2,t}) = \sum_{k=0}^{6} \left[ f_{k,p}(x_{t-k}) + g_{k,p}(d_{t-k}) \right] + q_p(x_t^+) + r_p(x_t^-) + s_p(\bar{x}_t) + \sum_{j=1}^{6} \left[ F_{j,p}(x_{t-48j}) + G_{j,p}(d_{t-48j}) \right]$, where $x_t$ is the average temperature across two sites (Kent Town and Adelaide Airport) at time $t$; $d_t$ is the temperature difference between the two sites at time $t$; $x_t^+$ is the maximum of the $x_t$ values in the past 24 hours; $x_t^-$ is the minimum of the $x_t$ values in the past 24 hours; $\bar{x}_t$ is the average temperature in the past seven days. Each function is smooth and estimated using regression splines.
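To make the temperature inputs concrete, here is a minimal sketch of how the predictors named on the slide could be built from two half-hourly temperature series. It is an illustration, not the authors' code: the function name `temperature_features` and the dictionary keys are hypothetical, and it assumes that "past 24 hours" means the 48 half-hours ending at time t and that "past seven days" means the 336 half-hours ending at time t.

```python
def temperature_features(site1, site2, t):
    """Temperature predictors at half-hourly index t (needs t >= 335).

    site1, site2: equal-length lists of half-hourly temperatures.
    Window conventions (windows end at t, inclusive) are assumptions.
    """
    if len(site1) != len(site2):
        raise ValueError("both sites need the same number of observations")
    if t < 335 or t >= len(site1):
        raise ValueError("t must leave seven days of history and be in range")

    x = [(a + b) / 2 for a, b in zip(site1, site2)]  # average of the two sites
    d = [a - b for a, b in zip(site1, site2)]        # difference between sites

    feats = {}
    for k in range(0, 7):          # current value and lags of 1..6 half-hours
        feats[f"x_lag{k}"] = x[t - k]
        feats[f"d_lag{k}"] = d[t - k]
    for j in range(1, 7):          # same time of day, 1..6 days earlier
        feats[f"x_lag{48 * j}"] = x[t - 48 * j]
        feats[f"d_lag{48 * j}"] = d[t - 48 * j]

    last_day = x[t - 47 : t + 1]    # 48 half-hours ending at t
    last_week = x[t - 335 : t + 1]  # 336 half-hours ending at t
    feats["x_max"] = max(last_day)
    feats["x_min"] = min(last_day)
    feats["x_avg"] = sum(last_week) / len(last_week)
    return feats
```

In the model on the slide each of these quantities enters through its own smooth function, fitted separately for each of the 48 half-hourly periods; the sketch only prepares the inputs.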
  90. 90. Fitted results (Summer 3pm) [Figure: effect on demand at 3:00 pm against temperature, lag 1, lag 2 and lag 3 temperature, lag 1 day temperature, last week average temperature, previous max temperature and previous min temperature]
  91. 91. Monash Electricity Forecasting Model: $\log(y_t) = h_p(t) + f_p(w_{1,t}, w_{2,t}) + \sum_{j=1}^{J} c_j z_{j,t} + n_t$. Same predictors used for all 48 models. Predictors chosen by cross-validation on summer of 2007/2008 and 2009/2010. Each model is fitted to the data twice, first excluding the summer of 2009/2010 and then excluding the summer of 2010/2011. The average out-of-sample MSE is calculated from the omitted data for the time periods 12noon–8.30pm.
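The hold-out scheme described on this slide (fit with one summer omitted, score on that summer, average the scores) can be sketched as follows. This is a generic illustration under stated assumptions: `cv_mse`, `fit_mean`, `predict_mean` and the summer labels are hypothetical, and a trivial mean model stands in for the real half-hourly models. The optional `keep` argument marks where a restriction such as 12 noon to 8.30 pm would be applied.

```python
def cv_mse(data_by_summer, holdouts, fit, predict, keep=lambda x: True):
    """Average out-of-sample MSE over the held-out summers.

    data_by_summer: dict mapping a summer label to a list of (x, y) rows.
    holdouts: labels of the summers to omit, one at a time.
    fit(rows) -> model; predict(model, x) -> forecast of y.
    keep(x): True for rows that should count towards the MSE.
    """
    fold_scores = []
    for held in holdouts:
        # Training data: every summer except the one being held out.
        train = [row for name, rows in data_by_summer.items()
                 if name != held for row in rows]
        model = fit(train)
        scored = [(x, y) for x, y in data_by_summer[held] if keep(x)]
        errors = [(y - predict(model, x)) ** 2 for x, y in scored]
        fold_scores.append(sum(errors) / len(errors))
    return sum(fold_scores) / len(fold_scores)

def fit_mean(rows):
    """Placeholder model: the mean of the training responses."""
    return sum(y for _, y in rows) / len(rows)

def predict_mean(model, x):
    """The placeholder model ignores the predictors."""
    return model

# Made-up data: three summers with a dummy predictor and a response.
example = {
    "summer_a": [(0, 1.0), (0, 3.0)],
    "summer_b": [(0, 2.0), (0, 2.0)],
    "summer_c": [(0, 5.0)],
}
score = cv_mse(example, ["summer_a", "summer_b"], fit_mean, predict_mean)
```

Comparing candidate predictor sets then amounts to calling `cv_mse` once per candidate and keeping the set with the smallest score.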
  94. 94. Half-hourly models [Table: predictors included (•) in each of 31 candidate models, with out-of-sample MSE. Candidate predictors: x, x1, x2, x3, x4, x5, x6, x48, x96, x144, x192, x240, x288, d, d1, d2, d3, d4, d5, d6, d48, d96, d144, d192, d240, d288, x+, x−, x̄, dow, hol, dos.] MSE by model: 1: 1.037; 2: 1.034; 3: 1.031; 4: 1.027; 5: 1.025; 6: 1.020; 7: 1.025; 8: 1.026; 9: 1.035; 10: 1.044; 11: 1.057; 12: 1.076; 13: 1.102; 14: 1.018; 15: 1.021; 16: 1.037; 17: 1.074; 18: 1.152; 19: 1.180; 20: 1.021; 21: 1.027; 22: 1.038; 23: 1.056; 24: 1.086; 25: 1.135; 26: 1.009; 27: 1.063; 28: 1.028; 29: 3.523; 30: 2.143; 31: 1.523.
  95. 95. Half-hourly models [Figure: R-squared (%) by time of day, 12 midnight to 12 midnight]
  96. 96. Half-hourly models [Figure: South Australian demand (GW), January 2011, actual and fitted, by date in January]
  99. 99. Adjusted model. Original model: $\log(y_t) = h_p(t) + f_p(w_{1,t}, w_{2,t}) + \sum_{j=1}^{J} c_j z_{j,t} + n_t$. Model allowing saturated usage: $q_t = h_p(t) + f_p(w_{1,t}, w_{2,t}) + \sum_{j=1}^{J} c_j z_{j,t} + n_t$, with $\log(y_t) = q_t$ if $q_t \le \tau$ and $\log(y_t) = \tau + k(q_t - \tau)$ if $q_t > \tau$.
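The piecewise link between $q_t$ and $\log(y_t)$ is easy to state in code. The sketch below is only an illustration of the formula on the slide; the function names are hypothetical, and the remark that k would lie between 0 and 1 for demand to flatten above the threshold is an assumption, since the slide does not give values for $\tau$ or k.

```python
import math

def saturated_log_demand(q, tau, k):
    """log(y_t) as a function of q_t: unchanged up to tau, slope k above it."""
    if q <= tau:
        return q
    return tau + k * (q - tau)

def saturated_demand(q, tau, k):
    """Demand y_t on the original scale."""
    return math.exp(saturated_log_demand(q, tau, k))
```

The two branches agree at q = tau, so the adjusted model is continuous; with k = 1 it reduces to the original model.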
  101. 101. Peak demand forecasting: $q_{t,p} = h_p(t) + f_p(w_{1,t}, w_{2,t}) + \sum_{j=1}^{J} c_j z_{j,t} + n_t$. Multiple alternative futures created: $h_p(t)$ known; simulate future temperatures using double seasonal block bootstrap with variable blocks (with adjustment for climate change); use assumed values for GSP, population and price; resample residuals using double seasonal block bootstrap with variable blocks.
  102. 102. Peak demand backcasting: $q_{t,p} = h_p(t) + f_p(w_{1,t}, w_{2,t}) + \sum_{j=1}^{J} c_j z_{j,t} + n_t$. Multiple alternative pasts created: $h_p(t)$ known; simulate past temperatures using double seasonal block bootstrap with variable blocks; use actual values for GSP, population and price; resample residuals using double seasonal block bootstrap with variable blocks.
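A simplified sketch of a seasonal block bootstrap with variable block lengths may help fix ideas. It is not the exact procedure used in the project: it assumes each year's summer is stored as a list of whole-day records of equal length, draws block lengths uniformly between two hypothetical bounds, and copies each block from a randomly chosen year at the same position in the season. Keeping whole days intact preserves the daily pattern, and keeping the position in the season preserves the annual pattern.

```python
import random

def seasonal_block_bootstrap(years, min_len=5, max_len=15, rng=None):
    """Build one simulated season from blocks of whole days.

    years: list of seasons, each a list of daily records (for example the
           48 half-hourly temperatures or residuals of that day).
    min_len, max_len: bounds on the block length in days (assumed values).
    rng: optional random.Random instance, for reproducibility.
    """
    rng = rng or random.Random()
    n_days = len(years[0])
    if any(len(season) != n_days for season in years):
        raise ValueError("all seasons must have the same number of days")

    simulated = []
    while len(simulated) < n_days:
        length = rng.randint(min_len, max_len)  # variable block length
        start = len(simulated)                  # same position in the season
        source = rng.choice(years)              # year the block is copied from
        simulated.extend(source[start:start + length])
    return simulated
```

Repeating the call many times gives many alternative temperature (or residual) histories, each of which can be passed through the fitted model to produce one simulated demand path.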
  103. 103. Peak demand backcasting [Figure: PoE (annual interpretation): PoE demand at the 10%, 50% and 90% levels by year, 98/99 to 10/11]
  104. 104. Peak demand forecasting [Figure: high, base and low scenarios, 1990–2020, for South Australia GSP (billion dollars, 08/09 dollars), South Australia population (million) and average electricity prices (c/kWh); a further panel is titled “Major industrial offset demand”]
  105. 105. Peak demand distribution [Figure: annual PoE levels (1%, 5%, 10%, 50% and 90% PoE) of PoE demand, with actual annual maxima, 98/99 to 20/21]
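The PoE (probability of exceedance) levels in these figures are percentiles of the simulated annual maximum, read from the top: the 10% PoE level is the demand exceeded in about one simulated year in ten, which is the 90th percentile. A minimal sketch, assuming the simulated annual maxima are already available as a list and using linear interpolation between order statistics (one of several common percentile conventions); `poe_level` is a hypothetical name.

```python
def poe_level(simulated_maxima, poe_percent):
    """Demand level exceeded in about poe_percent % of simulated years."""
    ordered = sorted(simulated_maxima)
    prob = 1 - poe_percent / 100      # non-exceedance probability
    pos = prob * (len(ordered) - 1)   # fractional position in the sorted list
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    # Linear interpolation between the two neighbouring order statistics.
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])
```

A smaller PoE percentage therefore always gives a higher demand level, which is the point clients often find counter-intuitive.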
  106. 106. Results. We have successfully forecast the extreme upper tail in ten years’ time using only twelve years of data! This method has now been adopted for the official long-term peak electricity demand forecasts for all states except WA. Some lessons: Cross-validation is very useful in prediction problems. Statistical modelling is an iterative process. Getting client understanding of percentiles is extremely difficult. Beware of clients who think they know more than you!
  110. 110. Outline: 1 Where fools fear to tread; 2 Working with inadequate tools; 3 When you can’t lose; 4 Getting dirty with data; 5 Going to extremes; 6 Final thoughts
  111. 111. Crazy clients. The client who wouldn’t tell me the problem. The client who wanted all meetings held at random locations for security reasons. The client who didn’t like the answer. Expert witnessing on the color purple (and now yellow).
  115. 115. Go forth and consult: A good statistician is not smarter than everyone else, he merely has his ignorance better organised. (Anonymous)
  116. 116. Go forth and consult: All models are wrong, some are useful. (George E P Box)
  118. 118. Go forth and consult: It is better to solve the right problem the wrong way than the wrong problem the right way. (John W Tukey) Slides available from robjhyndman.com
