Monday, July 16, 2007

What no GUI? How to Maintain Analytic Workspaces using Scripts

This is the second part of a series on a non-GUI based approach to building and maintaining analytic workspaces. Some people might struggle with the requirement to provide a script to rebuild and analytic workspace when Oracle provides a simple GUI tool called Analytic Workspace Manager. Unless of course you are a DBA.

But when it comes to day-to-day data loading scripting is really the only way forward. Fortunately, Analytic Workspace Manager can take the strain out of creating these scripts, however, once generated they are extremely easy to modify and move around from server to server.

The purpose of this post is to provide an overview of these maintenance scripts. There are three basic categories of scripts that need to be run during a maintenance cycle:
  • Dimension maintenance
  • Cube loading
  • Aggregation

Options to Control Dimension Maintenance
Dimension scripts are relatively simple and the likely modifications relate to options to setting all current attributes to NA and removing dimension members no longer present in the source table.

How to identify the dimension being processed? The XML file has a tag call XMLIDref that identifies the dimension to be processed. As shown here:

XMLIDref="AIRLINE.DIMENSION"

In a maintenance script you either maintain each dimension using individual scripts or you can insert multiple lines of XMLIDref identifiers.

Why and when do you need reset attributes? This is only necessary when you need to set an attribute to NA to remove its current display label and the next planned load will not replace it label with another one. Otherwise, pre-existing entries for attribute labels are automatically over written during a load. The tag to control this process is:

CleanAttrs="false" or CleanAttrs="true"

Why and when do you need remove dimension members? There may be cases where you actually want to remove a dimension member from the AW. Some people might see this as a way to manage slowly changing dimensions within OLAP, but there is a very nasty side effect of using this feature. The process of deleting it removes all the data related to that member across all cubes to which it contributed. Therefore, you should think very carefully before enabling this tag. Once the data is gone the data is gone, unless you have a backup or sufficient flashback space enabled (which from an AW perspective is highly unlikely). The tag to control this process is:

CleanDim="false" or CleanDim="true"


Option to Control Cube Loading
There are four basic tags used to support loading of data within a cube. The first identifies, XMLIDref, the cube to be processed. It is not possible to maintain a single measure within a cube as processing is done at the cube level (although if you are using a view as the source into your cube mapping you could modify the view to only provide values for the specific measure and return null for all other measures). The XMLIDRef appears as follows, where SALES is the name of the cube to be processed.

XMLIDref="SALES.CUBE"

When you load data into a cube you have the option to remove all existing values from the cube. I have to admit the documentation is not very helpful regarding this tag:

Indicates whether the analytic workspace deletes existing measures before it loads new measures. This method returns true if the analytic workspace deletes existing before it loads new measures or false otherwise.

I can only assume this will set a variable to NA prior to refreshing data within a cube. Obviously the process of removing all existing data will take some time and this needs to be considered when trying to determine how long the maintenance job will take to run.

CleanMeasures="false"


The last two tags work in exactly the same way as outlined in the dimension maintenance section. During a cube build it is possible to maintain the base dimensions within the same job. In this case the dimensions are processed first to ensure all the required members are present before the cube is processed.

CleanAttrs="true"
CleanDim="true"

My personal preference is to do the dimension maintenance first and then run a separate job to load and aggregate any cubes. This helps reduce the amount of temp space required during a build as you can at least commit all the dimension changes to disk before loading data into the cube.


Options to Control Aggregation
The tag TrackStatus, which can have the value of true or false, determines if a load should aggregate only incoming values. If this is set to false, the whole cube is aggregated once the data load has completed. Obviously the aim should be to set this to true to limit the amount of work performed during the aggregation process. During data loads into a partitioned cube you can limit processing to just two partitions – data driven target partition (for example a specific month) and the default partition (quarters and years if partitioned by month) by using this feature, which will make data loading a very fast process.

TrackStatus=”true” or TrackStatus=”false”



Options to Control Execution
The tag ID is just used to mark the job. It must be unique within a group of jobs submitted at the same time. But, apart from that, it has no real significance and there is no link though to data recorded within XML_LOAD_LOG. The tag should look something like this:

Id="Action54"

The tag AWName identifies the analytic workspace that will be attached as part of this job. For example below the aw called SH_AW in the schema SH_OLAP will be used:

AWName="SH_OLAP.SH_AW"


The tag BuildType is a very important tag as it controls how the build is managed. It has two modes:

  • Execute – this forces the calling process, your SQL program, to wait until the procedure has completed before regaining control.
  • Background - this allows the calling process, your SQL program, to continue processing while the job continues in the background. If you have multiple processors (see section on MaxJobQueues) this is the method you must use to allow the controlling job to spawn multiple jobs to get the work done.

Interestingly, the main problem with using this tag within in a script is no errors are returned even if the build fails. The only message that gets returned is “Completed Successfully”. Therefore, to determine the status of a job you need to monitor XML_LOAD_LOG to check for any reported errors.

BuildType="BACKGROUND"


If you have a server with multiple CPUs you can run parallel execution jobs by setting the MaxJobQueues tag.

MaxJobQueues="5"

Controls the number of parallel jobs used to execute the aggregation step. The result is the following entries in XML_LOAD_LOG:

08:58:50 Running Jobs: AWXML$_483_1. Waiting for Tasks to Finish...
08:58:50 Started 1 Finished 2 out of 10 Tasks.
08:58:50 Running Jobs: AWXML$_483_1.


Don’t set this figure too high! The danger is if you try to set this too close to the total number of CPUs swamps the machine and performance of the load and aggregation drops off. The best starting point is to aim to use half the number of CPUs


And last but not least…
If you look at a maintenance script generated by AWM you will notice a number after the word XML_CLOB. In the example below the number is 190:

dbms_lob.writeappend(xml_clob, 190,

this refers to the number of characters, including spaces, between the next single quotation mark and the last quotation mark at the end of the line:


(xml_clob, 190,' ………………………………………………………………… ');


if modify any of the entries for tags outlined above you need to make sure the number after XML_CLOB correctly identifies the number of characters between the quotation marks. The easiest way to do this is to use Word (Tools->Word Count). If the number is wrong the job will fail but you will not get an error message returned via SQLPlus, only the statement “PL/SQL Procedure Completed Successfully”. If you are really lucky and the issue is with the BuildDatabase tag the job might actually run to completion as the defaults for the missing tags might kick in and allow the job to run. Otherwise, all you are likely to see in XML_LOAD_LOG is the statement such as:

14:13:48 Attached AW SH_OLAP.SH_AW in RW Mode


And nothing else. This means the job managed to find the correct AW and attach it but the rest of your job failed. So check the numbers!

And make sure you detach the AW otherwise all other jobs will be blocked by that session which will have the AW attached in read-write mode.