Wednesday, November 21, 2007

OLAP Workshop : Basic Overview of OLAP Concepts

I have recently been working with two of our key OLAP experts (Stuart Bunby and Kevin Lancaster) helping deliver various OLAP workshops. Based on the questions and discussions I had with many people during this years OpenWorld I think it might be useful to share some of information we put together for those workshops via this blog. This series of postings (not sure how many it will be at this stage) are designed to take you from the OLAP basics right through to more advanced features such as creating your own calculations using the huge library of OLAP analytical functions and on to more DBA type topics such as monitoring cube data load and aggregation processes. I am hoping I can keep the flow of topics in the correct order....

Let’s start with the most obvious question “What Is OLAP”?

Online analytical processing (OLAP) is in my opinion a highly abused term that has lost much of its original meaning. In its original guise, dating from the early 1990’s, the term was used to describe a class of computer systems that were designed and optimized for analysis. This is still true of Oracle OLAP but not for many of the other proprietary solutions currently available in the marketplace.

From my perspective OLAP is about working with data in business terms without having to understand the underlying storage mechanism and having the ability to intelligently and transparently support the many different types of business rules that always exist within an organisation. For example, a very simple and obvious example is Stock. Most query tools do not understand how to correctly analyse stock over time, it is left to the user to select the correct aggregation method.

By using this term, it is possible to differentiate the more analytical requirements of the business analyst and senior management community from the requirements of the more general basic requirements that are easily and quickly answered by most directly query a transaction processing (OLTP) system. OLAP has now evolved into a more generic environment that is centred around use of the term “business intelligence”. Here the emphasis is more on “online” or active access as well as being far more “analytical” in terms of the reports that are generated.

What do these terms, online and analytical mean?

Online: Although most OLAP tools and applications enable development of reports that can be saved and printed when not connected to live data, OLAP emphasizes live access to data rather than static reporting. Analytic queries are submitted against the database in real time, and the results are returned in real time.

Analytical processing: This is the key concept with OLAP. End users can:
  • Easily navigate multidimensional data to perform unpredictable ad hoc queries and to display the results in a variety of interesting layouts
  • Transparently manage business rules across dimensions and cubes
  • Drill through levels of detail to uncover significant aspects of data
  • Rapidly and efficiently obtain the results of sophisticated data calculation and selection across multiple dimensions of data
A standard transactional report or query might ask, “When did order 84305 ship?” This query reflects the basic mechanics of doing business. It involves simple data selection and little or no calculation processing. It can be answered directly from the transactional system, probably without impacting other operations. Every organisation needs this basic level of information.

In contrast, OLAP systems are typically deployed to extend and enhance an organization’s ability to answer a much broader range of business questions about the data they are collecting in their transactional systems:
  • How do sales for our top 10 most profitable products across Europe for this quarter compare with sales a year ago?
  • What are the differences in the product-sales mix between the regions, relative to the global sales mix?
  • What are our forecast units, unit price per service, unit cost per product, sales, cost trends, and profit for the next 12 months?
  • In what ways does the mix vary by salesperson, and what is the relative performance of our salespeople?
  • What are the products making up 40% of our profit for each region over time?

These questions are more analytical and complex, and the answer to one question often leads immediately to another question as the user follows a train of thought in researching a business problem or opportunity.

OLAP is designed to make it easy for end users to ask these types of analytical questions without requiring:
  • Assistance from the IT department
  • Programming skills
  • Technical knowledge about the organization of the database
The results of queries also need to be rapid so that the analyst’s train of thought is not interrupted and the value of the analysis is not diminished.

A typical multidimensional business query, would be something like the following:

For each region of the world, what was the percentage change in revenue for our top 20% products, over a rolling three-month period this year compared to the same period last year?
This simple business question describes both the data that the user wants to examine and the structural form of that data. Business users typically want to answer questions that include terms such as what, where, who, and when. For example, you find the following essential questions embedded in the sample question:

  • What products are selling best? (“…top 20%…”)
  • Where are they selling? (“…each region of the world…”)
  • When have they performed the best? (“…percentage change in revenue…”)
If you examine the query in detail it appears to be translate into a very complex query. When I have discussed this type of query during presentations at conferences and with customers you can always spot the DBAs because they immediately try to translate this question into a SQL statement. Of course it is possible to create a SQL query to answer this question. But lets start by breaking this query down and examining it in more detail:

  • There are two calculations (percentage change in revenue and rolling three-month total).
  • There is a ranking element (the top 20% of products)
  • There are aggregations (region level of the geography dimension)
  • There are multidimensional selections (specific products, specific time periods, and specific regions)
  • The result of the query is a multidimensional view of the data (perhaps as a tabular display on the screen, perhaps as a graph, or both)

Now I think many people would agree this does in fact look like a complex query. To try and frame this within a traditional query and reporting tool would require considerable skill. And don’t forget this is the starting point for the analysis not the final result. The data returned by this query will drive other even more interesting and complex queries.

But the complexity of this query is a technical issue rather than a business issue.
Therefore, a key goal of OLAP technology is to make it very easy for end users to ask such questions about their data without placing a burden on the IT department.

How Does OLAP Make This Easy?

Business users think dimensionally. By design, OLAP technology stores, processes, and presents data in a dimensional way. The data model of OLAP systems reflects the users’ picture of their business data, making it easy to formulate queries in business terms.

OLAP systems are optimised for fast retrieval of data for dimensional analysis.
We will now examine the multidimensional logical model, which serves as the basis for OLAP systems.

The key Objects within the OLAP Model Most OLAP data models are built around two key concepts: measures and dimensions.

Measures represent factual data; they are sometimes called “facts.” Typical examples of measures are sales, cost, profit, and margin. Measures are organized by one or more dimensions. Many people visualize measures as being a simple cube type shape, in which the edges of the shape are the dimensions and the contents of the shape are the measure values. The image below shows a generic simple three dimensional measure.

Of course measures are not restricted to just three-dimensional definitions. A measure can have as many or as few dimensions as required to accurately manage the data associated with the measure. In additional Oracle’s OLAP option allows you to design and manage multiple cubes each with different dimensionality. In addition Oracle OLAP supports a number of different data types for measure such as: numeric (Oracle SQL data type), decimal, integer, text, date and boolean.

Measures can be divided into two categories:
  • Stored Measures
  • Calculated (or derived) Measures
Stored measures are loaded, aggregated and stored directly within the database. Alternatively, they can be derived from the results of calculations that are stored. For example a forecast could be derived from another stored measure such as revenue and the results of the forecast calculation stored in the database.

Calculated measures are measures whose values are calculated dynamically at query time. Only the calculation rule(s) is stored in the database. Common calculations include measures such as ratios, differences, moving totals, and averages. Calculations do not require disk storage space, and they do not extend the processing time required for data maintenance.

Note: Oracle OLAP has a library of several hundred multidimensional calculation functions that can be used in calculated measures. It is even possible for expert users of Oracle OLAP to define their own functions to perform virtually any calculation.

From a business user perspective both measures appear and are used in exactly the same way and have equally as fast query access. All measures are equivalent as far as the end-user interface is concerned. However, it can be useful to at least provide them with the access to the definition of a calculated measure via a tooltip for example, just to avoid confusion.

Below is an example of a typical report contained both stored and calculated measures. The two calculated columns are “Profit” and “Margin”. Because both types of measures are treated the same, business users can use them in queries, conditions, to drive colour coding, exception reporting etc.

So what is the difference between a cube and a measure? If you have used Analytic Workspace Manager you will be familiar with the concept of a cube. This is a high level container, which is invisible to business users querying the OLAP data, for grouping together measures that share the same dimensionality. Cubes do make your life much easier in terms of being able to manage a whole group of measures collectively, store them all to an XML template, load data into a group of measures via single reference and so on.

Dimensions identify and categorize the data within your measures by forming the edges of the measures. Examples of dimensions include product, geography, time, and distribution channel.

Dimensions have three key components:
· Hierarchies
· Levels
· Attributes

With the Oracle OLAP data model, dimensions are stored once and are used repeatedly. This allows dimensions, and their members, to be shared across measures. While dimensions form the edge of a measure, the members point to individual cells inside the multi-dimensional measure, as can be seen above.

In the example below, there is just one Time dimension even though it appears three times. The three measures in the picture have different shapes, or dimensionality. Sales and Units are both dimensioned by the Customer, Product, and Time. Price is only dimensioned by Product and Time; it does not use the Customer dimension because the price does not vary by customer.

Hierarchies Dimension hierarchies are optional but are common in OLAP systems. A hierarchy is a logical structure that groups like members of a dimension together for the purpose of analysis. For example:
  • A Time dimension might have a hierarchy that describes how months are grouped together to represent a quarter and how quarters are grouped together to represent a full year.
  • An Organization dimension might have a hierarchy that makes it easy for you to identify the direct reports of a specific manager.
Each dimension can have multiple hierarchies if required. For example, the time dimension can have a hierarchy that represents the Julian calendar and another hierarchy that represents a fiscal calendar.

A dimension’s structure is organized hierarchically based on parent-child relationships. These relationships enable:
  • Navigation between levels: Hierarchies on dimensions enable drilling down to lower levels or navigating (rolling up) to higher levels. Drilling down on the Time dimension member “2005” will likely navigate you to the quarters Q1 2005 through Q4 2005. In a calendar year hierarchy, drilling down on Q1 2005 would navigate you to the months January 05 through March 05. These kinds of relationships make it easy for users to navigate large volumes of multidimensional data.
  • Aggregation from child values to parent values: The parent represents the aggregation of its children. Data values at lower levels aggregate into data values at higher levels. Dimensions are structured hierarchically so that data at different levels of aggregation can be manipulated together efficiently for analysis and display. You learn about the aggregation capabilities of Oracle OLAP in the lesson titled “Applying Advanced Dimensional Design and Cube Processing Techniques.”
  • Allocation from parent values to child values: The reverse of aggregation is allocation and is heavily used by planning, budgeting, and similar applications. Here, the role of the hierarchy is to identify the children and descendants of particular dimension members for “top-down” allocation of budgets (among other uses).
  • Grouping of members for calculations: Share and index calculations take advantage of hierarchical relationships (for example, the percentage of total profit contributed by each product, or the percentage share of product revenue for a certain category, or costs as a percentage of the geographical region for a retail location).

In this example, you can do the following in the Product hierarchy:
  • Navigate up through each level in the hierarchy from the lowest level to the highest level
  • Navigate down the hierarchy from the highest level to the lowest level
  • Aggregate data from the lowest level (individual products) up through the hierarchy to the highest level (total product)

Levels Each level represents a position in the hierarchy. The level above the base level contains aggregate values for the levels below it. The members at different levels have a one-to-many parent-child relationship. A hierarchy typically contains several levels, and a single level can be included in more than one hierarchy.

If data for the Sales measure is stored at the Product level, then the higher levels of the product dimension enable the sales data to be aggregated correctly into Subcategory, Category, and All Products levels.

If there are multiple hierarchies built over a dimension, it may be that a level would appear in more than one hierarchy or may exist in only one hierarchy.

Types of Hiearchies Within a multi-dimensional model there are two basic types of hierarchies:
  • Level Based
  • Value Based
Most of the hierarchies are level based, including the Product dimension hierarchy shown in the previous slide and the Time dimension hierarchy shown in this slide. In the time hierarchy example, there are Day, Month, Quarter, and Year levels in the hierarchy.

Sales forces also generally have a level-based structure, as in the following example:
Representative > Area > Region > Country > Continent > World

Other dimensions may have hierarchies that are not strictly level based. For example,
there is clearly a hierarchy in an organization chart, but all the direct reports of the President may not be at the same level. In the example, the two VPs (vice presidents) and the President’s Admin (administrative assistant) are all direct reports of the President but are not at the same level. The VPs are not at the Admin level, and the Admin is not at the VP level.

Attributes provide descriptive information about the dimension members and are also useful when you are selecting dimension members for analysis:
  • Select the products whose color (attribute) is “Blue.”
  • Select the customers who have two children.
  • Select the promotions that are of type “Multipack.”
  • Select all time periods whose description contains “January.”
Most types of attributes are entirely optional. Oracle OLAP permits a large number of attributes to be created if required. Some attributes are valid for all the members of the dimension, regardless of level. For example, all products at all levels have a description. Others attributes are valid for certain levels or certain hierarchies only. For example, only individual product items have a color.

In the above picture certain products are displayed together with their parent levels in the hierarchy (subcategories and categories) on the Product dimension. In addition, two sample attributes (Color and Product Manager) are displayed for these products.
Other examples of typical attributes might include:

For Product dimensions
: Color, Flavor, Pack Size, Brand Manager Name, and so on
For Customer dimensions: Gender, Marital Status, Date of Birth, and other types of demographic information

Note: Sometimes, attributes may also be modeled as levels in an alternate hierarchy. Consider the above example: If the business requirement was that measures should be aggregated by Product Manager (total for Bruce, John, Karl, Mary, and so on.) or by Color (total for Blue, Green, Red, White, Yellow, and so on) and by the ability to drill down, aggregate, allocate, or calculate data based on these values, then many designers would consider creating additional hierarchies on the Product dimension for this purpose. However, many attributes have little business use as aggregates and are used simply in filtering.

What's coming next?
The next posting will provide an overview of the various products that can be used to build, maintain and query Oracle OLAP cubes.