Monday, December 17, 2007

OLAP Workshop Part 3 :Building an Analytic Workspace

OLAP Workshop Part 3 :Building an Analytic Workspace

In this lesson, we look at how to use the Analytic Workspace Manager 10g (AWM 10g) tool in conjunction with 10g OLAP to build multidimensional database objects. We will use AWM to perform the following tasks:
  • Create an analytic workspace
  • Define dimensions
  • Define cubes
  • Load data from source relational tables
  • View data
Throughout this post I have tried to add observations and best practices I have picked up while working with various customers across the US and EMEA. As a result I am going to split this posting into possibly three postings:
  • Create an analytic workspace and defining dimensions
  • Modelling and mapping different types of dimensions
  • Define cubes and load data from source relational tables
So what is the difference between AWM and OWB? AWM should be considered an “EL” tool, it does not contain transformation tools (in AWM 11g simple transformations are possible), for building analytic workspaces. .The target audience for AWM is business users and also developers already using another ETL tools that does not provide support for OLAP data modelling.

For this workshop we are going to focus on using AWM. For more information on using OWB to build OLAP data models see the links posted in Workshop 2.


Building Blocks of the Multi-dimensional Model







The first step is design the logical data model, including the dimensions and measures that are needed in the AW. Obviously Warehouse Builder provides the perfect environment for creating a logical data model, and there are some presentations that cover this on the OWB OTN Home.

However, if you prefer a more to use a pure data modelling tool then I would recommend looking at CWD4ALL from one of our partners, IKAN. To quote directly from their website:

CWD4ALL is a database & OLAP modeling and design tool, fully conformant to the OMG CWM™ specifications. Its advanced modeling and design capabilities provide the means to align your modeling activities with this new worldwide standard. CWD4ALL provides both forward and reverse engineering functionalities. Reverse engineering constructs a graphical model from your existing DB/2, UDB, Oracle and SQL Server 2000/2005 database or schema, which on their part can be forward engineered to an Oracle relational model and the corresponding DDL or Oracle Analytical Workspaces.

……

Multi-dimensional modelling and OLAP
The multi-dimensional model is integrated with the CWD4ALL data models, which stay on top them and use metadata from the logical model and therefore can be mapped to different implementations of relational models. Thus schema and naming independence for used tables and columns is provided, allowing a high flexibility during deployment of the model. Detailed and compact multi-dimensional diagrams provide an easy way to define dependencies in the multi-dimensional model even going beyond possibilities of traditional Star and Snowflake schemas. The multi-dimensional model provides support for OLAP & Oracle Analytical Workspaces. For relational data warehousing you will be able to deploy definitions of levels, dimensions and hierarchies to the Oracle dictionary, thus building a basis for the query rewrite option of the Oracle database. Moreover it will assist in building materialized view definition and defining bitmap join indexes. For Oracle OLAP data warehouse, CWD4ALL will deploy the multi-dimensional model to Oracle 10g Analytical Workspaces and assist in defining the SQL path to calculations in Oracle Analytical Workspaces through view definitions based on OLAP-TABLE functionality. In addition to designing and maintaining correct database relational model behavior, CWD4ALL allows the designer to examine and specify actual application behavior so that space management, undo datasets and the full range of implementation-specific parameters can be modeled and defined appropriately.

For more information on CWD4ALL click here to go to the website.

Obviously AWM can also help you design your logical data model but there is one important difference between AWM and OWB/CWD4ALL. AWM is best described as “design time at run time” which means as you create dimensions, cubes etc using AWM the object is created immediately in the analytic workspace, i.e. there is no deployment phase. Which from a business user perspective makes this an ideal product as it simplifies the whole process.

In this workshop we will create the following model based on the sales history schema (SH)

Dimensions
  • Time: Shows how data varies over time
  • Product: Shows how data varies by product
  • Geography: Shows how data varies by geography
  • Channel: Shows how data varies according to each distribution channel

Stored Measures
  • Sales
  • Costs
  • Quantity

Examining the Logical Model:
Dimensions
After you have identified dimensions, you can identify the levels of summarization within each dimension. Analysis requirements reveal that:
  • Channel dimension has three levels: Total, Class, and Channel
  • Geography dimension has four levels: Total, Region, Subregion, and Country
  • Product dimension has four levels: Total, Category, Subcategory, and Product.
  • Time dimension has three levels: Year, Quarter, and Month.
Note with OLAP dimensions and additional top level is always added to allow business users to fully analyse. In Excel terms this provides the “All” level, or in relational terms allows the dimension to be pivoted out of the query.

Cubes
Cubes provide a convenient way of collecting similar measures of the same dimensionality. It is not uncommon for many measures to have the same shape, and so by defining their shape (and other shared characteristics) for a cube, you can save time when building your AW. Multidimensional cubes are stored in AWs. A particular AW may contain more than one cube, and each cube may describe a different dimensional shape.

Dimensions defined the edges of a cube. Although there is no limit to the number of edges of a cube, BI tools typically organize the display along three edges: row edge, column edge, and page edge. A single dimension or multiple dimensions can be placed on each edge.

A cube is simply a logical object that helps an administrator to build and maintain an AW. It also aids in the definition of measures with common characteristics, such as sparsity patterns and aggregation rules. Measures in the same cube have the same relationships to other logical objects and can easily be analyzed and displayed together.

In this lesson, we will create our first cube, Sales, containing three measures. These are base measures, which store the facts collected about the business. Each measure that belongs to a particular cube shares particular characteristics with other measures in the cube, such as the same dimensions. The Sales cube includes:
  • Dimensions: Time, Geography, Product, and Channel
  • Measures: sales, costs, quantity
The data for these measures, and the dimensions that organize the measures, will be sourced from tables in the SH schema, as discussed in the next slide.

Getting Started
Configuring your database:
This is an area that seems to cause the most problems. This is always surprising to me considering that OLAP 10g is completely integrated into the database engine. The key here is making sure you have the correct patches applied to your database kernel and database instance. You can validate your existing configuration against the OLAP certification matrix, which can be viewed from here:

http://www.oracle.com/technology/products/bi/olap/collateral/olap_certification.html

When applying database patches please note that the majority of patches are composed of two parts. Firstly you need to use the Universal Installer to apply the kernel updates. Secondly your database instance needs to be upgraded via a series of SQL scripts. This is all documented but many people get caught out.

In this case I am using 10.2.0.3 version of the database (Note that OLAP is available only in the Enterprise Edition of the database and is a costed option) with an additional OLAP patched described as the “OLAP A Patch” in Metalink. All this is explained in the certification matrix as stated above.

You can quickly and easily check the status of your schema by connecting to your database instance using SQLPlus (or use SQLDeveloper) and run the following commands:

col comp_name format a25 heading 'Component'
col version format a12 heading 'Version'
col status format a10 heading 'Status'
col modified heading 'Modified'

SELECT comp_name, version, status, modified
FROM dba_registry
WHERE comp_name like '%OLAP%';

Component Version Status Modified
------------------------- ------------ ---------- --------------------
OLAP Analytic Workspace 10.2.0.3.0 VALID 19-NOV-2006 08:13:33
Oracle OLAP API 10.2.0.3.0 VALID 19-NOV-2006 08:13:35
OLAP Catalog 10.2.0.3.0 VALID 19-NOV-2006 08:13:38


Setting up AWM
The first step is to download AWM from the OTN OLAP home page:

http://www.oracle.com/technology/software/htdocs/devlic.html?url=http://download.oracle.com/otn/java/olap/AWM_102030A_Win32.zip

and the associated readme file that explains installation requirements is here:

http://www.oracle.com/technology/products/bi/olap/awm102030A_readme.html

After installing AWM you can run the awm.exe file located in the ..awm/bin directory or click on the desktop icon.








Making a connection to your database instance
The first step is to create a new user to own our analytic workspace. For this example we will create a user called SH_OLAP and this user will need to have a special role assigned to it to allow the user to create and manage analytic workspaces. This role is OLAP_USER. Some people are often tempted to use a different role, OLAP_DBA. This is similar to providing a normal user with the DBA privileges. Do not be tempted to use this role as provides a lot of additional privileges that can in some cases cause lots of problems during use. This role should only be assigned to the user OLAPSYS.

CREATE USER "SH_OLAP" PROFILE "DEFAULT"
IDENTIFIED BY "SH_OLAP" DEFAULT TABLESPACE "SH_OLAP"
TEMPORARY TABLESPACE "SH_OLAP_TEMP"
ACCOUNT UNLOCK;
GRANT "CONNECT" TO "SH_OLAP";
GRANT "OLAP_USER" TO "SH_OLAP";

This user will need SELECT priviledges on the source tables that will be used to populate the dimensions and the cubes.

As a best practice I will normally assign my OLAP user to its own tablespace and create a temp tablespace specifically for that user as well. The reasons for this will become evident later when we look at what happens during the loading of data into a dimension and/or a cube. Make sure this new user has sufficient quota on both these tablespaces.

When you start AWM for the first time you will need to define a new connection to your database instance.



This step can cause problems and a number of people have posted questions on the OLAP forum regarding connecting to a database instance. Using a TNSNames alias as the connection string causes the majority of problems. You can only use a TNS alias if you also install the SQL net layer from the database client CD. This is not provided as part of the AWM installation so needs to be added as part of a separate process.

Personally I always a JDBC connection as this does require any additional software to be installed. The syntax for a jdbc connection is :

Hostname:port:sid

An example would be something like this:

klaker-uk.uk.oracle.com:1521:beans

Alternatively you can use localhost or 127.0.0.1 to identify the host:


To now connect to our database instance simply click on “+” sign next to the database name. You will notice from the picture below you can define as many connections within AWM as you need. Once the connection dialog is shown enter the username and password in the dialog box. And then click OK to start the connection process.



Creating an Analytic Workspace
To create an analytic workspace, we need to perform the following steps:

  • Find the schema name under which you want the AW to reside, in this case our user is SH_OLAP. Expand that schema name to display the Analytic Workspaces node.
  • Next right-click the “Analytic Workspaces” node. This will show the “Create New Analytic Workspace” dialog box appears.
  • Lastly we can enter a name for our AW which in this case is SH_AW.

Optionally, we can choose a tablespace where this AW is stored. By default, the default tablespace is used for the schema. This is set up by the database administrator (DBA) when the schema is created. In this case the default tablespace is SH_OLAP.

Now we have created the AW, it appears in the navigator under the node where it is created. It is attached in read/write mode, which means that you can make changes to it.

At this point we have two options:
  • Manually define our dimensions and cubes
  • Load a predefined model from a template
You can create a workspace directly from a template. A template holds the definition of objects. You can use templates to create analytic workspaces, cubes, and dimensions. In this case we are going to manually define the objects within our AW.




Note: Once we have created the AW, a table named AW$SH_OLAP is created at the database level. (The format for the naming convention is AW$aw_name, WHERE aw_name is the name that you have chosen for your AW.) This table stores all of the multidimensional objects in your AW.

Does Oracle OLAP Support Multiple Languages?
Yes, an AW can support multiple languages. This enables the users of your OLAP applications and tools to view the metadata and descriptive attributes in their native languages.
The number and choice of languages is restricted only by the database character set and your ability to provide translated text. To add support for multiple languages, perform the following steps:
  • In the Model View navigation tree, expand the folder for the AW.
  • Click the Languages folder, and select the languages for the AW on
  • the General tabbed page.
  • As you create your objects, such as dimensions, levels, hierarchies, attributes, cubes, measures, calculated measures, and measure folders, open the Translations tabbed page of the property sheet. Enter the object labels and descriptions in each language.
When you map the dimensions, map the attributes to columns for each language.

Note: In this class, a single language is used, which is American, although the image shown below has multiple languages selected.




Creating Dimensions
Dimensions are lists of unique values that identify and categorize data. They form the edges of measures (facts). Dimensions have structure that helps in the navigation of data and the definition of calculations. This structure includes levels, hierarchies, and attributes in the logical model. You define these supporting objects, in addition to the dimension itself, in order to have a fully functional dimension.

Dimension Type
AWM provides two types of dimensions:
  • User Dimensions
  • Time Dimensions
Most dimensions that we will create during this workshop are of the type default “User Dimension”. In the example shown below, a Time dimension is created. If you explicitly set the dimension type to “Time Dimension,” AWM automatically prepares some additional time attributes. When populated, these attributes facilitate time series calculations on the measures that share this dimension. It is recommended that all your time dimensions be created with this setting.

Again this seems to regularly come up on the forums – you will only see the time series calculations in the Calculation Builder (See the workshop on building cubes). These calculations require two additional time based attributes to be populated – Timespan and End Date. Most customers do not have these attributes in their existing relational schemas so they need to be added. Fortunately OWB will generate an OLAP compliant time dimension. If you are not using OWB then you will need to find a different way of creating these additional attributes (there was a posting on the OLAP forum where someone actually posted all the code required to create an OLAP compliant time dimension but unfortunately I cannot find the thread now, may be someone else can locate it).



Other Tabbed Pages
The Translations tabbed page enables you to provide labels in languages that your AW uses:



The "Implementation Details" tabbed page enables you to identify certain dimension characteristics. By default I always recommend using the Surrogate Key option as this ensures unique members are created across all the levels within a dimension.




For people used to relational data models this can be quite confusing. What happens during when loading data into a dimension is all the source columns are collapsed into a single column within the AW. Therefore, across all the source columns the keys must be unique. Let’s consider an example with time:

YEAR_ID YEAR_DESC QUARTER_ID QUARTER_DESC MONTH_ID MONTH_DESC 31-12-2007 Yr 2007 31-12-2007 Q4 2007 31-12-2007 December 2007
Using this as a source table to populate our time dimension we have three keys (YEAR_ID, QUARTER_ID, MONTH_ID) that will be collapsed into a single column. It is obvious in this case the same key is used to identify three different levels within our dimension. In this case the last key that is read will win and the end result will be a single dimension member will be added to our time dimension and that member will be either a month or quarter or year (most likely a year). If we switch to using surrogate keys then three members will be added because the data load program will concatenate the level name with the source key to ensure uniqueness:
  • YEAR_31-12-2007
  • QUARTER_31-12-2007
  • MONTH_31-12-12007
Simple really! But does this have any impact on the AW? In some cases ‘Yes’. I have found that it is prudent to keep your level names as short as possible. When building levels I normally assign simple level names such as L1, L2, L3 etc etc. This keeps the surrogate keys small and compact. What I found working with one customer was that the OLAP engine had certain issues when using very large (75 characters) text strings as the source key and by adding the level to start of each key as well caused data loading problems. Probably an edge case but since then I have always preferred using simple level ids.

Creating Levels
For business analysis, data is typically summarized at various levels. For example, our database schema, SH, contains daily snapshots from a transactional schema (OE, HR, etc). Days are thus the base level. However, in this case the decision has been take to summarize this data up to the monthly level and then add quarterly, and yearly levels.

Levels have parent-child or one-to-many relationships, which form a hierarchy. For example, each month summarizes days, each quarter summarizes months, and each year summarizes quarters. This hierarchical structure enables analysts to detect trends at the higher levels and then drill down to the lower levels to identify factors that contributed to a trend.

To create a level for any dimension, right-click the Level icon beneath the dimension, and then select Create Level. Fill in the name, labels, and a description. The labels can be used in reports; the description enables you to comment on the object. Repeat the process for each level in the dimension.




The end result might look something like this (note I did not follow my own advice and created level names that are quite long, just to make it easier to explain certain steps in this workshop):





Creating Hierarchies
A hierarchy is a way of organizing dimension members. Oracle OLAP supports two basic types of hierarchy:
  • Value based
  • Level Based
In this workshop all the dimensions use level based hierarchies. This is where ordered levels are used as a means of organizing data. These hierarchies can be used to define data aggregation. For example, using our newly created Time dimension, we can define a hierarchy to aggregate data from the month level to the quarter level to the year level. This hierarchy can also be used to define a navigational drilldown path, regardless of whether the levels in the hierarchy represent aggregated totals.

Dimensions can have one or more hierarchies. If you define multiple hierarchies, make sure to define one of them as the default hierarchy.

As shown in the image below right-clicking on the hierarchy icon in the navigator provides a pop menu option to “Create Hierarchy”.



Name the hierarchy. We can provide descriptive labels that can be used in reports. Select the “Set as Default Hierarchy” option if this is the only hierarchy for the dimension or if it is the hierarchy that will be used most frequently for analysis. In this case we can choose the Level Based Hierarchy option.

Finally we can select the levels for our hierarchy. The levels are organized from the highest level of aggregation to the lowest.




Creating Attributes
Attributes contain descriptive information about dimension members that are used for data selection and identification. They are used for labelling cross-tabular and graphical data displays, selecting data, organizing dimension members, and so on. AWM 10g defines basic attributes automatically. For each dimension, it creates long and short description attributes. For a Time dimension, it also creates time-span and end-date attributes.

We can easily add additional attributes, such as month number or quarter number. These additional attributes further enrich the AW. The Implementation Details tabbed page identifies the data type for the attribute. This data type should match the source data.

For most attributes it is useful to consider creating an Index if the attribute has a reasonably low cardinality. This will improve query performance if the attribute is used as a filter within a business query. What happens under the covers is an additional dimension is created containing the members of the attribute and a relation is created to map the attribute dimension members to the base dimension members. Oracle OLAP is very efficient at using relations during queries as a way of filtering members.





In the next sub-section of this workshop (Workshop 3), we will review how to map dimensions to source data and how to manage different types of dimensions (value based, skip level, ragged, star source table and snowflake source tables).

Don't forget the other sections of this series:

OLAP Workshop Part 1 : Basic Overview of OLAP Concepts
OLAP Workshop Part 2 : Understanding OLAP Technology