Wednesday, October 31, 2007

Optimising Aggregations in OLAP

I have been investigating two very useful interesting, and possibly useful, procedures this week. They are part of the DBMS_AW package and allow you to tune your aggregation process. One of the many questions that get asked during OLAP projects is: which levels to precompute and which levels to compute at query time? Most people seem to think you must pre-compute all levels across all their dimensions. Well, the flip side of this approach is the data load and aggregations stage can take more time than you expected and possibly have available. So is it possible to balance the time taken for the data load/aggregation and pre-solving as much as possible to maintain query performance? As part of the DBMS_AW package there are two procedures that can help you optimise this aggregation process by identifying the most expensive members within a dimension. The two packages are:
  • ADVISE_CUBE
  • ADVISE_REL
Both allow you to define a percentage value to precompute as part of the aggregation step that forms the normal build process. In 11g this is all included directly within AWM, but for those customers using 10g here is a quick overview of how to use these procedures.


Advise_Cube
The ADVISE_CUBE procedure helps you determine how to preaggregate a standard form cube in an analytic workspace. The procedure takes two arguments:
  • Aggmap_name:The name of an aggmap associated with the cube.
  • Precompute_percentage: The percentage of the cube's data to preaggregate. The default is 20%.
Each RELATION statement in the aggmap must have a pre-compute clause containing a valueset. ADVISE_CUBE returns a list of dimension members in each valueset. If the valueset is not empty, ADVISE_CUBE deletes its contents before adding new values.

The aggmap must have a pre-compute clause in each of its RELATION statements. The precompute clause must consist of a valueset. Based on the precompute percentage that you specify, ADVISE_CUBE returns a set of dimension members in each valueset.

Now I tried this using the common schema and ran into a few problems in getting the correct results. For each measure within a cube there is a formula that refers to an AGGMAP object, as shown below where the aggmap is OBJ1123208571

DEFINE SALES_PRT_TOPFRML FORMULA DECIMAL
EQ aggregate(this_aw!SALES_PRT_TOPVAR using this_aw!OBJ1124208571)

The aggmap looks like this:

DEFINE OBJ1124208571 AGGMAP
AGGMAP
RELATION this_aw!TIME_PARENTREL(this_aw!TIME_AGGRHIER_VSET) PRECOMPUTE(this_aw!TIME_AGGRDIM_VSET) OPERATOR SUM args DIVIDEBYZERO YES DECIMALOVERFLOW YES NASKIP YES COUNT NO
RELATION this_aw!CHANNELS_PARENTREL(this_aw!CHANNELS_AGGRHIER_VSET) PRECOMPUTE(this_aw!CHANNELS_AGGRDIM_VSET) OPERATOR SUM args DIVIDEBYZERO YES DECIMALOVERFLOW YES NASKIP YES COUNT NO
RELATION this_aw!GEOGRAPHIES_PARENTREL(this_aw!GEOGRAPHIES_AGGRHIER_VSET) PRECOMPUTE(this_aw!GEOGRAPHIES_AGGRDIM_VSET) OPERATOR SUM args DIVIDEBYZERO YES DECIMALOVERFLOW YES NASKIP YES COUNT NO
RELATION this_aw!PRODUCTS_PARENTREL(this_aw!PRODUCTS_AGGRHIER_VSET) PRECOMPUTE(this_aw!PRODUCTS_AGGRDIM_VSET) OPERATOR SUM args DIVIDEBYZERO YES DECIMALOVERFLOW YES NASKIP YES COUNT NO
MODEL this_aw!SALES_OBJ1124208571_TIME_AWXMLMODEL PRECOMPUTE(NA)
MODEL this_aw!SALES_OBJ1124208571_CHANNELS_AWXMLMODEL PRECOMPUTE(NA)
MODEL this_aw!SALES_OBJ1124208571_GEOGRAPHIES_AWXMLMODEL PRECOMPUTE(NA)
MODEL this_aw!SALES_OBJ1124208571_PRODUCTS_AWXMLMODEL PRECOMPUTE(NA)
AGGINDEX NO
CACHE SESSION
END

In the documentation the example shows a much simpler AGGMAP as follows:

DEFINE UNITS_AGG AGGMAP
RELATION product_parentrel PRECOMPUTE (prodvals)
RELATION time_parentrel PRECOMPUTE (timevals)

Executing the ADVISE_CUBE procedure against the standard form AGGMAP does not generate an error but it does not generate a result either. The only way I could get this to work was to create another AGGMAP using a different set of valusets as follows:

DEFINE SALES_REVENUE_AGGMAP AGGMAP
AGGMAP
RELATION TIME_PARENTREL PRECOMPUTE (TIME_VALS)
RELATION CHANNELS_PARENTREL PRECOMPUTE (CHANNELS_VALS)
RELATION GEOGRAPHIES_PARENTREL PRECOMPUTE (GEOGRAPHIES_VALS)
RELATION PRODUCTS_PARENTREL PRECOMPUTE (PRODUCTS_VALS)
END

And then copy the results from the new valueset to the master valueset. This is the script I used to execute the procedure:

SQL>set serveroutput on
SQL>exec dbms_aw.execute('aw detach SH_AW')
SQL>exec dbms_aw.execute('aw attach SH_AW RW')
SQL>exec dbms_aw.advise_cube ('SALES_REVENUE_AGGMAP', 40)
SQL>exec dbms_aw.execute('status PRODUCTS_VALS')
SQL>exec dbms_aw.execute('status GEOGRAPHIES_VALS')
SQL>exec dbms_aw.execute('status CHANNELS_VALS')
SQL>exec dbms_aw.execute('status TIME_VALS')
SQL>exec dbms_aw.execute('limit PRODUCT_AGGRDIM_VSET to PRODUCTS_VALS')
SQL>exec dbms_aw.execute('limit TIME_AGGRDIM_VSET to TIME_VALS')
SQL>exec dbms_aw.execute('limit CHANNELS_AGGRDIM_VSET to CHANNELS_VALS')
SQL>exec dbms_aw.execute('limit GEOGRAPHIES_AGGRDIM_VSET to GEOGRAPHIES_VALS')
SQL>exec dbms_aw.execute('update;commit;aw detach SH_AW')

I am not sure how this works, in terms of computing a cost for a specific member(s) within a dimension. Things are a little cleared with the other package, Advise_Rel.

Advise_Rel
I had a lot more success with the ADIVISE_REL procedure. This works on a specified dimension and can use the existing the valuesets referenced by the standard form aggmap. The procedure takes three arguments:
  • Family_relation_name: The name of a family relation, which specifies a dimension and the hierarchical relationships between the dimension members.
  • Valueset_name: The name of a valueset to contain the results of the procedure. The valueset must be defined from the dimension in the family relation. If the valueset is not empty, ADVISE_REL deletes its contents before adding new values.
  • Precompute_percentage: A percentage of the dimension to preaggregate. The default is 20%.
The valueset must be based on the dimension to be analyzed and in this case can be the valueset listed in the standard form aggmap. Based on the pre-compute percentage that you specify, ADVISE_REL returns a set of dimension members into the valueset specified. The values to a pre-compute are selected on the cost of drilling on a parent member. The more children a parent has the longer it will take to compute at query time.

This is the script I used to execute the procedure:

SET SERVEROUTPUT ON
spool advise_dims.log
EXEC dbms_aw.execute('aw attach SH_AW rw');
EXEC dbms_aw.advise_rel('PRODUCT_PARENTREL', 'PRODUCT_AGGRDIM_VSET', 25);
EXEC dbms_aw.advise_rel('CHANNELS_PARENTREL', 'CHANNELS_AGGRDIM_VSET', 25);
EXEC dbms_aw.advise_rel('GEOGRAPHIES_PARENTREL', 'GEOGRAPHIES_AGGRDIM_VSET', 25);
EXEC dbms_aw.advise_rel('TIME_PARENTREL', 'TIME_AGGRDIM_VSET', 25);
EXEC dbms_aw.execute('show values(PRODUCTS_AGGRDIM_VSET)');
EXEC dbms_aw.execute('show values(GEOGRAPHIES_AGGRDIM_VSET)');
EXEC dbms_aw.execute('show values(CHANNELS_AGGRDIM_VSET)');
EXEC dbms_aw.execute('show values(TIME_AGGRDIM_VSET)');
execute dbms_aw.execute('update;commit;aw detach SH_AW');
spool off
exit

Once this is complete the last step is load data into the cube and the aggregation engine will now use these values to pre-compute specific members.

For more information on these procedures check the OLAP Reference Manual:

http://download.oracle.com/docs/cd/B19306_01/olap.102/b14350/dbms_aw004.htm#CEGJCHHA
http://download-uk.oracle.com/docs/cd/B14117_01/olap.101/b10334/dbms_aw.htm#CEGFHDII

Hope this is useful

Keith