Friday, November 04, 2005

Intelligently building OLAP cubes with data mining

The great thing about working at Oracle is that we just keep adding more and more features to our database. This means as BI product managers we can get access to some very interesting concepts. Sometimes it takes two or three concepts to come together and then something truly exceptional happens. During our discussions of possible presentations for this years Oracle World we brought together three BI technologies to create something we named "The Cube Factory"

The OLAP team announced a new Java API for AW XML for 10g, and this was combined with major improvements in cube build times. Internally we have seen load times and refreshes for some cubes drop from dramatically for a build and 1 minute for a refresh. This sort of performance makes you completely rethink the need for building cubes as part of a normal ETL process. With these sort of build and refresh times users could build their own cubes as part of application. You could call this "real-time" cubes, but that term is a very loaded concept. May be something for a future posting.

The second piece of this jigsaw to drop into place concerns the new data mining 10g PL/SQL packages. These new packages contain two procedures: predict and explain. Explain makes it easy for users to better understand their data. The first step is to select a target attribute of interest, for example "high value customers. The Explain procedure will then sift through the data searching for attributes that have the greatest influence on a target attribute, ranks them in order, and creates a table that contains the results. For example, Explain could be used to identify attributes most associated with:
  • Customers who churn to a competitor
  • Patients who develop a disease
  • Customers who represent the highest value to you
  • People who don't comply with regulations
  • Parts that fail
The explain package is very interesting when you combine this with the new AW Java APIs and the new performance improvements for cube build times. When you bring all these features together it is possible to intelligently build OLAP cubes. At Oracle World this year I presented a paper called the "Cube Factory". This presented a 27 dimensional model that could be customized using attribute importance to intelligently build a simpler cube based on only the most important customer attributes. In our example this resulted in a simple 5 dimensional model.

So how would you bring all this together? For Oracle World we created an HTML application using Business Intelligence Beans. We created a page to show a list of measures and all the associated 27 dimensions. Using the predictive analytics we could score the importance of each dimension and then load that information into a variable, which could then be used to display the results in a graph. By provide saved selection scripts users could toggle between viewing all scored dimensions or just the most important dimensions.
A "create cube" button calls the AW Java API to build a cube based on the most important dimensions. The resulting cube can then be displayed within a crosstab with pre-defined color-coding etc. The application we very easy to build as Business Intelligence Beans contains a number of powerful JSP Tags specifically designed to help build these types of applications.

The integration of data mining into the database does create some other interesting scenarios that we will be exploiting in other demos. For example, data mining provides a concept called market "basket analysis". This groups complimentary products together into groups, which could be directly used within an OLAP dimension to define custom aggregates. Custom aggregates are calculated dimension members and can be used to create user-defined hierarchies. Mark Rittman's Oracle Weblog (this link will take you directly to Mark's weblog)contains some more information about this feature.

Extending this even further, why restrict yourself to the set of packages within DBMS_PREDICTIVE_ANALYTICS package. The results from data mining models could be integrated into an ETL process. For example customers could be classified into groups which could then be used to create hierarchies within a customer dimension. Something like this could be built into a Warehouse Builder process flow and mapping. Enterprising developers could create an OWB Expert (those with OWB Paris) to provide an integrated solution.
And so it goes on and on...The possibilities are only really limited by your own imagination.

For even more information about data mining see Abhinav’s previous posting: Oracle Data Miner 10gR2 beta now available, with a Discoverer bridge, this links directly to the BI Blog website and the listed article.

More information on the new data mining predictive analytic package can be found Click here for details. The link will take you to the Data Mining home page on OTN. This covers the Excel Worksheet Addin, but also provides information on both PREDICT and EXPLAIN procedures.

There is an excellent overview of the Explain and Predict procedures on the OTN Data Mining page: The OLAP home page on OTN provides more details on the OLAP APIs and designing and building cubes with AWM. The following links are from the OLAP OTN Home page:
Oracle OLAP Analytic Workspace Java API Reference
Analytic Workspace Manager 10g