Thursday, February 09, 2006

Using Excel as a data source for building dimensions and cubes

Did you know you could use data stored in Excel as a source for populating dimensions and cubes created within Analytic Workspace Manager? If not then read on…

During a recent meeting we were discussing how to create a hierarchy editor for use within our various data modeling products. My response was that most people manage their hierarchies in Excel, so why not just read the hierarchy directly from Excel. As a consultant, building a multi-dimensional model was always a process of collecting two data sources for each dimension. First you walk down to the IT department and ask for access to the source table(s) to create a specific dimension. Then walk back to the users and ask them for their spreadsheets that actually contained the hierarchy they had modeled outside of the database. The dimensions modeled in most data warehouses are never quite what users want or at worst reflect structures that don’t actually relate to how the users manage their business.

Anyway, back to the meeting…so the response was: “Well we cannot read data from Excel, end of story”. After a quick chat with some of the Warehouse Builder Product Managers I found we could in fact read data directly from Excel and it is extremely easy – assuming you have your Oracle database running on Windows, otherwise this gets very expensive as ODBC drivers for UNIX are not cheap.

So assuming you have Excel and your Oracle database instance is running on a Microsoft platform how would you do get data from an Excel worksheet into your multi-dimensional model? Lets assume you are a Category Manager and you want to reorganize your brands across existing sub-categories and also create some new categories. Then you want to analyze the effect on revenue and costs across each subcategory and may be even forecast future revenue and costs as well. Most managers use Excel because it is everyone’s favorite desktop productivity tool. So you create a take a download of your product dimension data into Excel and model the new hierarchy directly within Excel. Now how to do you get that new model back into your multi-dimensional model to see the results….

Following the scenario above the steps below show how you could use the Excel Worksheet as a source within Analytic Workspace Manager to populate a dimension.

1) Step 1 – get the original Product hierarchy into Excel

In Excel you can connect directly to your Oracle instance using the normal ODBC connection processes. Using the Data Source Administrator that is part of Windows XP you can define a new DSN source to connect to your Oracle database instance:

The Server references my TNSNAMES.ora entry that connects to my Oracle instance. In this case I am connecting to the SH schema. Next step is within Excel

The query wizard allows me to choose a source table, in this case I have selected the Products table:

Finishing the wizard loads the data directly into Excel

Now the products table can be modified to remodel the hierarchy simply by using Cut and Paste to move the cells around. It would be really nice to be able to use the Excel Outline mode so the hierarchy could be edited and viewed like a normal hierarchy. The only problem is this implies reading an embedded total dimension within Analytic Workspace Manager and it seems this is not currently supported. However, I have a simple workaround for this. I will cover this in another blog.

OK, we have the new model ready in Excel. Next we define a named data range that covers all the dimension members. This range name is how we will refer to the data from within SQLPlus.

2) Create a new ODBC source for your worksheet

Now the spreadsheet is complete we need to copy it to the Windows server hosting our Oracle database instance. Once the file is on the server we can create an ODBC connection to the Worksheet using the Data Source Administrator tool. Note, this time we are creating the ODBC on the Server machine.

3) Configuring Oracle to use ODBC

This section will appear to be very scary, but as long as you follow these steps nothing can wrong. Trust me. After you have done one ODBC source setting up the next one is easy as riding a bike. Just remember to write down the names you use at each step.

When you install an Oracle database we also install something called heterogeneous data services. This is a little known free option that is actually very powerful.

First you need to setup a data source: In the Oracle Home on your server there is directory called “hs”, for example on my server the directory is e:\oracle\product\10.1.0\Db_1\hs. In this directory there is an admin directory and you need to create file called “initODBCName.ora”. In this example I created a file called initProdBEANS.ora. This file contains the following lines

# This is a sample agent init file that contains the HS parameters that
# are needed for an ODBC Agent.
# HS init parameters
HS_DB_NAME = hsodbc

The HS_FDS_CONNECT_INFO should point to the name of you’re the reference you created in the ODBC Data Services Administrator.

Secondly you need to update the database Listener configuration file to refer to the new ODBC data source. Use notepad to edit the Listener.ora file located in your ORACLE_HOME\Network\admin directory. In the SID_LIST_LISTENER section add an entry similar to the following:

(ORACLE_HOME = E:\oracle\product\10.1.0\Db_1)
(PROGRAM = hsodbc)

The SID_NAME is the reference to the name of the file you created in the previous step minus the “init”. So in this case the file was called “initProdBEANS.ora”, therefore, the SID_NAME is “ProdBEANS”. Now reload the listener configuration file using LSNRCTL RELOAD and check the new SID is available using LSRNCTL STATUS. The new SID, ProdBEANS, should be listed.

Last step is to add a TNSNAMES entry to allow us to point to the new SID quickly and easily when we setup the database link.


4) Creating a view in the database

To be able to view the data from Excel we need to create a database link to allow us to connect to the Excel Worksheet and then a view that is based on the source data range we defined in Excel.

In SQLPlus, assuming you have the correct privileges, create a database link as follows:

Create database link ProdBEANS connect to SCOTT identified by TIGER using ‘PRODBEANS’

We can now use the database link to define the source table for our view that will pull data directly from Excel.

Create view EXCEL_PRODUCTS as select * from PRODUCTS@ProdBEANS

5) Create a mapping using Analytic Workspace Manager

This view can now be used as a source view within Analytic Workspace Manager to populate a dimension. Having created the dimension using the AWM wizards the mapping process is simply a question of drawing lines between the columns in the view to the corresponding levels/attributes in the dimension.

There you have it, using Excel as a data source for your multi-dimensional model. You can do all your hierarchy design within Excel and then quickly and easily load the results.