Tuesday, January 23, 2007

Creating a Forecast using AWM

Building the Forecast
There are many reasons for recommending Oracle’s OLAP option within a data warehouse environment. The OLAP home page on OTN has presentations and technical white papers that help to explain these benefits. One of these benefits is the ability to support complex data models that are difficult to manage within a purely relational environment. For example, the creation of a forecast.

In the past I have presented many times at various Oracle user group conferences and OracleWorld sessions on how to use the results from a forecast within a BI Beans application, Spreadsheet Addin and Discoverer OLAP. But I have never gone into detail about how to actually create a forecast. Why? Because until now, you needed to have a deep understanding of OLAP DML and it was difficult to integrate the execution of the forecast into the normal procedures of building an analytic workspace.

With the release of AWM10.2.0.3 (you will need to apply the patchset to your database which can be downloaded from Metalink) it is now possible to very quickly and easily create a new forecast without the need to have a deep understanding of the internals of Oracle OLAP or OLAP DML. If you do want to get a better understanding of the mechanics of building a forecast, the Help topics contained in Analytic Workspace Manager for OLAP DML provide some excellent examples of how to manually create a forecast and this will provide some insight into what is actually happening behind the scenes of AWM in the analytic workspace.

Creating a forecast is a very simple. But before you start you need to have additional time periods within your time dimension to hold the results of your forecast. This requirement will be familiar to Sales Analyzer customers. If you load the BI sample OLAP schema (SH_OLAP) you will notice the last year in the time dimension contains no data for any of the measures (revenue, costs and quantity). However, the first four time periods do contain data. Therefore, we can use the prior periods to compute new forecasted values for the last year in the time dimension.

The steps to create a new forecast are as follows:

1) Identify the measure you want to forecast and then create a new measure to hold the results of that forecast. For example, to create a forecast for sales revenue, add a new measure called Revenue Forecast to the Sales cube. You can create as many forecast measures, as you need. But remember a forecast measure must be in the same cube as the measure that will be used to source the data for the forecast. In this case since we want to forecast revenue, the forecast measure must be in the Sales Cube as this contains the source measure namely “Revenue”.

2) This new measure does not need to be mapped as the data will be sourced from existing revenue measure and calculated once the data for the revenue measure has been loaded.

3) Next create a new calculation plan to build and populate the forecast. In 10gR2 versions of AWM there is a new node in the Model view to manage calculation plans.

4) Defining the calculation plan is a two-stage process. The first part is to create the plan. This involves simply providing a name and corresponding labels. Each plan is then composed of a series of steps and there can be as many steps as required and the order of the steps can be controlled within the dialog.

5) There are three types of step: Aggregation, Allocate and Forecast. All these steps may be used when creating a forecast. Which steps are used to depends on how the forecast is created. Ideally the forecast should be generated at the lowest level for each dimension. But this may or may not be required. If this is not the case then before the forecast can be aggregated up the various hierarchies within each dimension the data has to be allocated down to the lowest levels. So AWM provides both “Allocation” and “Aggregation” steps to manage these operations.

To keep this simple, let us assume we are going to forecast data at the lowest level for each dimension in our Sales cube: Product, Channel, Geography and Time. First, a forecast step is created to generate the required data at the base level.

6) The New Forecast Step dialog is displayed. This panel contains three tabs:
  • General - controls the main inputs to the model, such as source measure, target measure, time dimension and type of forecast.
  • Advanced Settings – controls the parameters for the forecast model
  • Status - controls the dimension values that will be used to by the model

The various labels can be set to anything but it is useful to keep the information meaningful for future reference. The information entered here is not displayed to end-users;they will only see the measure we created in Step 1. These labels are purely for documentation purposes.

7) Next step is to select the cube (remember your target measure that is going to contain the forecast has to be in the same cube as the source measure). In this case we will use the SALES cube (dimensioned by Product, Channel, Geography and Time. It contains three measures: revenue, costs, and quantity. Plus the additional measure Forecast Sales).
8) Set the source measure to be Revenue
9) Set the target measure to be Revenue Forecast (as created at step 1)
10) The relevant Time dimension should be automatically selected.
11) Oracle OLAP supports a number of different forecast methods and if you have a good understanding of forecasting you can select the required method from the list in the pulldown. There is a description associated with each type of forecast to help you. However, the easiest method to use is the “Automatic” method. The forecast engine identifies the best fit by quickly testing each statistical method against the historical data. It selects the method that would have generated the most accurate forecast in the past. This method usually produces the most accurate short-term forecasts.

12) If you need to fine-tune the various parameters for the selected method, the Advanced Settings tab allows you fine-grained control to tweak these settings. Below are the settings for the “Automatic” method.

13) The last step on the General tab is to set the number of time periods for the forecast. In the common schema sample there are time periods (months and quarters and years) for five years within the time dimension, however, the last year does not contain any data. So there are 48 time months with data and 12 months without data. In this case, therefore, we set the number of time periods to forecast to 48. On the Status tab we will select the historical time periods and the forecast time periods and the forecast periods will be the last ones in status, based on the number specified on this page. As a result a forecast will be generated for the final year in the time dimension.
14) Switching to the Status tab allows us to define the dimension values to be used in the forecast. As stated earlier to keep this simple we will forecast data at the lowest level for each dimension. If you have used the Query Builder in BI Beans, or Spreadsheet Addin or Discoverer the next series of steps will be familiar to you as the selection process uses the same query builder panel (nice consistent look and feel and one of the benefits of reusable components).

For the Time dimension we will select to use values at the month level, as you can see the query builder panel makes it very easy to create this selection.

Repeat this step for each dimension. Once you have set the selections for each of the dimensions (Time, Channel, Geography, and Product), click the Create button to create the new forecast step.

15) Once this calculation plan is executed it will run a forecast for each product, within each channel, for each country, for each month and store the result in the new measure - Revenue Forecast. In addition, new revenue figures will be forecast for the last year in the time dimension.
16) However, at the moment the data for the forecast is only shown at the base level for each dimension. Therefore, another calculation step is required to roll-up the values so we can look at the forecast for All Products, All Channels, All Regions and across years. This is an Aggregation step.

17) This step allows us to define how the forecast measure should be aggregated. In this case the aggregation method will be the same as the other measures in the sales cube and the process for defining the aggregation step is almost the same as the process for defining a new cube.
18) Enter descriptive labels, then select the Sales Cube from the pulldown and then select Revenue Forecast as the measure to aggregate.

19) The “Summarize To” tab allows you to control which levels are pre-calculated as part of this process. As the sales cube is quite small and aggregated extremely quickly (a couple of minutes on my laptop) the best option is to select all the levels for each dimension.

Depending on the size of your cube and user expectations regarding performance you can tweak these settings to manage the amount of time it takes to create the forecast and aggregate the data. For example, if your forecast measure is taking a long time to aggregate you could try aggregating every other level. But make sure you always aggregate the top level.

20) Done! The completed panel should look the one shown below.

All that remains now is to execute this new calculation plan to generate the forecast data into our new measure.

Once the forecast has been executed and aggregated the results can be viewed in BI Beans, Spreadsheet Addin, Discoverer OLAP or BI Enterprise Edition.

Displaying the Results
To display the results of the forecast, it is useful to create a new folder and add the forecast measure to that folder. This makes it easier for users to find the new measure and makes it obvious the data is actually a forecast. In addition, it can be useful to create corresponding variance and % variance measures to allow users to compare the accuracy of the forecast to the base measure. Again these calculated measures could be added to the new forecast measure folder to make the easy to find.