Creating a basic Mondrian OLAP Cube

This post is a hands on tutorial on how to create an analysis cube for the Mondrian OLAP engine. It is an introductory post for an audience with no OLAP experience at all. I will assume some experience with relational databases and the Kettle ETL tool. If you’d like to follow the examples you will need access to a database, a copy of Pentaho Kettle and a Mondrian installation. I will be using MySQL as RDBMS and JasperServer 3.7. CE for the Mondrian installation. Other possibilities include Pentaho BI-Server and a bare bones Mondrian/JPivot and PAT installation.

The example data

For the example data I would like to use a small and simple dataset, so it is easy to share and easy to understand. Therefore I decided to use an Excel extract from the public issue tracker for the Kettle tool. It has one thousand lines of issues, bugs and feature requests that I would like to put into an OLAP cube for analysis. Each row contains the issue type, a summary, the assignee, a priority,  a status and a resolution. I took the liberty to replace the real names of the assignees with figures from Sesame Street in the input data. And no, I will not reveal who is who :-)

Download the Excel file that contains the example data.

Transforming the Excel file to a Mondrian cube

A Mondrian cube is stored in a relational database. The central table for an OLAP cube is called the fact table. It contains entities (think: rows) whose properties are analyzed in the OLAP cube. For the example cube an issue will be a fact. It will have the properties type, assignee, priority, status and resolution. I’d like to drop the summary field, because it contains free text information that is of little use in aggregations or breakdowns.

The simplest path from the Excel sheet to the database is to create the table in the database and to import the Excel sheet. I will be using Kettle for that. I’ll be using a counter to create a key for each issue. Other than that the import transformation is just your garden variety database import.

As a result the facts table looks like this:

CREATE TABLE fact_issue
(
	id INT PRIMARY KEY,
	issue_type TINYTEXT,
	assignee TINYTEXT,
	priority TINYTEXT,
	status TINYTEXT,
	resolution TINYTEXT
);

Get the import transformation if you want to follow along.

Now that the data has made it to the database Mondrian needs to be configured to use it.

Telling Mondrian about the cube

Mondrian cubes are configured using schema files. The schema is going to hold a single cube. We must give it a name and define dimensions for it. A dimension is an OLAP concept that I will not bother to define with academic precision at this point. For the time being I’d like to think of a Mondrian dimension as a property of a fact (i.e. issue) that we can break down along. As such the issue type, assignee, priority, status and resolution are each dimensions of an issue. OLAP dimensions are always hierarchical in nature. The issue dimensions only have one hierarchy level each. In real world applications dimensions may have multiple hierarchies with multiple levels each. If you’re wondering how this can possibly be useful, imagine a date dimension that breaks down by year/quarter/month/day or by year/week/weekday. That’s two hierarchies with four levels and three levels respectively. An OLAP cube also needs at least one measure. Again, academic definitions are probably not useful at this point. Therefore I’d like to think of a measure as something describing a quantitative value. In the cube at hand an obvious measure seems to be the amount of issues in any given breakdown. It is interesting how many open issues are not assigned yet, for instance. In real world cubes measures usually address questions beginning with “how many?” and “how much?” :-)

The discussion above leads to the following schema file:

<?xml version="1.0"?>
<Schema name="IssueSchema">
	<Cube name="Issue">
		<Table name="fact_issue"/>
		<Dimension name="Type">
			<Hierarchy hasAll="true" allMemberName="All Types">
				<Level name="Type" column="issue_type" uniqueMembers="true"/>
			</Hierarchy>
		</Dimension>
		<Dimension name="Assignee">
			<Hierarchy hasAll="true" allMemberName="All Assignees">
				<Level name="Assignee" column="assignee" uniqueMembers="true"/>
			</Hierarchy>
		</Dimension>
		<Dimension name="Priority">
			<Hierarchy hasAll="true" allMemberName="All Priorities">
				<Level name="Priority" column="priority" uniqueMembers="true"/>
			</Hierarchy>
		</Dimension>
		<Dimension name="Status">
			<Hierarchy hasAll="true" allMemberName="All Status">
				<Level name="Status" column="status" uniqueMembers="true"/>
			</Hierarchy>
		</Dimension>
		<Dimension name="Resolution">
			<Hierarchy hasAll="true" allMemberName="All Resolution">
				<Level name="Resolution" column="resolution" uniqueMembers="true"/>
			</Hierarchy>
		</Dimension>
		<Measure name="Issue Count" column="id" aggregator="count" formatString="Standard"/>
	</Cube>
</Schema>

The schema file must be presented to the Mondrian engine. I can not give a complete account on how to do this, since the necessary steps depend on the server software (and version) you are using. The screencast shows the steps necessary on JasperServer 3.7 CE. You’ll have to consult the manual of your server solution to find out how to introduce a new OLAP schema.

Regardless of which server you use, at some point you’ll likely need to provide an MDX query for a view of the new cube. The initial query usually is not very important in interactive environments, since you’ll be able to use the UI to navigate the cube. And since this post is not dealing with MDX as such, I’d like to present a short MDX query that will work as a kickstart for the cube without comment.

SELECT   
    Type.Children ON COLUMNS,
    Assignee.Children ON ROWS
FROM Issue

Conclusion

This post gives a practical introduction to Mondrian cubes. It shows how to create a minimal, but fully operational Mondrian cube. It also explains that an OLAP cube always revolves around a fact table and tries to give an intuitive understanding of dimensions and measures. Dimensions are essentially hierarchical properties attributed to facts that can be used to examine facts with certain characteristics at various levels of detail. Measures are describing facts in terms of numbers. They answer quantitative questions.

I hope that this little introduction enables Mondrian beginners to get up to speed faster. There is much more to know, of course. The follow up article deals with dimension tables. Check it out if you want the cube to be stored more efficiently.

If you would like to suggest a Mondrian topic, feel free to leave a comment.

Cheers

Slawo

Related Literature:

41 comments to Creating a basic Mondrian OLAP Cube

  • Anonymous

    I have a question,can the schema file be define dynamic and load dynamic? I mean that the schema file define at runtime with java code

  • Slawomir Chodnicki

    Hi there,
    well when programming Mondrian at some point you create a connection and you have to supply a schema for that. This schema can come from a file or be defined on the fly. In principle this is possible to define it dynamically. You’d have to study the Mondrian source and javadoc and probably some of the existing examples of how to do it. If I understand your question correctly you want to do what Wabit, JasperServer or Pentahos BI-Server do. All of those are open source so check them out :)

    Cheers

    Slawo

  • Wendy

    Hi, great article. I’m new to OLAP and I have a question regarding the Mondrian engine. I would like to setup an OLAP cube using Mondrian for reporting in my app. The reporting front-end will be provided by my app so I do not need the client app (or JPivot) provided in the Mondrian install (i.e. the WAR). I plan to connect iReport (from Jasper) to my Mondrian cube to generate the report. What components do I need to install in order to get the Mondrian engine (i.e. the in-memory OLAP analysis engine)? The Mondrian install comes in a WAR format, but I don’t need a web app.

  • Slawomir Chodnicki

    Hi Wendy,

    Yeah sure, it can be done. Have a look at the Wabit project here:
    http://www.sqlpower.ca/page/wabit

    Wabits starts an in-memory mondrian server once you suppy a schema and the db connection. It allows you to query it to build reports. I guess it’s pretty much what you need. Wabit uses olap4j api.

    Wabit is open source, get the sources here:
    http://code.google.com/p/wabit/

    I suggest you try the tool, then get the source and have a look at the package ca.sqlpower.wabit.rs.olap
    Maybe the OlapQuery or the OlapConnectionPool class would be a good starting point.

    Lots of luck with your app :)

    Cheers

    Slawo

  • Maxi

    Hi, really helpful what you do,

    I have a question, I’m using JasperServer CE, with mondrian OLAP Cubes, and i have some Analysis Views with many rows so i wanted to create a row called ‘others’ where i want to aggregate all rows that are less important ‘cos of their values.

    The aim of this is to make the view less difficult to understand.

    It’s possible?

    Thanks

  • Slawomir Chodnicki

    Hi Maxi,
    thanks for reading :)
    The general answer to your question is yes, it can be done. The details vary depending on your concrete requirements, but all in all it seems to me that you essentially need a top-n query and an aggregation of the rest.

    In the context of the example cube, suppose I want the top 5 people with bugs assigned, the rest should be aggregated in “others”.
    I might start with something like this to first list everybody:


    SELECT {[Measures].[Issue Count]} ON COLUMNS,
    {[Assignee].[All Assignees].Children} ON ROWS
    FROM [Issue]

    Now I’d refine it to list only the top5 assignees. I’d use the TopCount function to retrieve them:

    WITH
    SET TopBugHunters AS 'TopCount([Assignee].[All Assignees].Children, 5.0, [Measures].[Issue Count])'
    SELECT [Measures].[Issue Count] ON COLUMNS,
    [TopBugHunters] ON ROWS
    from [Issue]

    Now that I’ve identified the top guys, I’d create a set with the top guys removed, thus creating a set expression for the ‘Others’. Then I create a calculated member that aggregates over those, and display them below the top 5.


    WITH
    SET TopBugHunters as 'TopCount([Assignee].[All Assignees].Children, 5.0, [Measures].[Issue Count])'
    SET OtherBugHunters as 'Except({[Assignee].[All Assignees].Children}, [TopBugHunters])'
    MEMBER Assignee.[All Assignees].[Other Bughunters] as Aggregate (OtherBugHunters)

    SELECT {[Measures].[Issue Count]} ON COLUMNS,
    Union([TopBugHunters],{Assignee.[All Assignees].[Other Bughunters]}) ON ROWS
    FROM [Issue]

    Hope that helps you along

    Cheers

    Slawo

  • Maxi

    That was just what i needed!

    Thanks a lot!

  • [...] the samples I’ll use the schema from the introductory post about Mondrian cubes. The cube represents a bug-tracker OLAP and stores how many bugs are assigned [...]

  • Jon Kears

    Slawo

    This is a brilliant article, been looking all over the web for a step-by-step, and was amazed how simply it all fits together. Have subscribed to your blog, will read with interest.

    Thanks again!

  • Yanis Guenane

    I am just dropping a line to thank you for this article.

    Clear and efficient that all I needed.
    As Jon, I just subscribed.

    Hope to read more articles soon.

  • Slawomir Chodnicki

    Thanks for the nice comment Yanis, much appreciated :)

  • Sharath Soundararajan

    Searched through the net and found one article that gave a great understanding and its none other than yours… Thanks!

  • Very nice, i’m working on this BI Solutions. More help of this kind is always welcome to my mail id forprogramuse@gmail.com. Thanks

  • Aymeric

    Hi,

    I’m working on Jasper, and i’ve made mondrian cube, i want to know if something is possible :
    When the cube appear, can I deploy automatically a row ? I don’t want to clic on the “+” button.

    Thanks!

  • Slawomir Chodnicki

    Hi Aymeric,

    I’m assuming you mean the default view of the cube, and the fact that it comes unfolded on both dimensions. The sample MDX query asks for an unfolded view. If you want to change that, and you’re working on JasperServer the easiest thing is probably to use the user interface to produce the view you want (single row if you like). The user interface of JasperServer uses manipulates the MDX query in the background. Once you are happy with your view, you can just save it. There’s a save and a save as button at the bottom left. The next time you open that view it will show you the view as you last saved it.

    Cheers

    Slawo

  • Aymeric

    Hi Slawo,

    In fact, i don’t want to modify the default cube view, on the other hand i want when i deploy one dimension with the button +, the under informations of this dimension are yet deployed, because actually i have to deploy one by one, each information, and i have more than fifty clic on button + for visualize the global result.

    Thanks for your prewious answer.

    Aymeric

  • Aymeric

    An idea?

    Aymeric

  • Slawomir Chodnicki

    Hi Aymeric,

    If I understand correctly you want to be flexible about opening the various sublevels of dimensions at once. I’m afraid the jasper user interface does not support that. If you want to do that, you’d have to write a corresponding MDX query and go with that, maybe saving the query as a separate view. MDX functions like Descendants are your friend here.

    Cheers

    Slawo

  • Aymeric

    Thanks Slawo, i will explore this possibility and i’ll keep you informed!

    Aymeric

  • Thanks to all. Quite a good way of sharing the knowledge.

  • dhaker

    Hi,
    i would like to know is it possible to query multiple said 2 cubes within one olap query
    if yes how we could perform this with Mondrian
    thanks

  • Slawomir Chodnicki

    Hi dhaker,

    OLAP queries cannot operate on different cubes, they always query a single cube. You can have virtual cubes however, which combine different fact tables taking their shared dimensions into account. The result is a virtual cube, that is fed from multiple fact tables.

    Cheers
    Slawo

  • Terance Cambel

    Hi,

    Below is a part of an hierarchy. I included SQL, but it stop producing the results. Also I want to know
    how a Query can be included in a Join.

    Best Regards
    Terance (terance.cambel@gmail.com)

    SELECT cm_ins_name
    FROM “clmreg”, “nmmasclaim”
    WHERE “clmreg”.”osc_claims_ref” = “nmmasclaim”.”cm_claims_ref”
    AND “clmreg”.”osc_change_no” = “nmmasclaim”.”cm_change_no”

  • Slawomir Chodnicki

    Hey Terrance,

    can you try to be a bit more specific about what you are trying to achieve? I’m afraid I don’t seem to get the idea. I also think that wordpress ate up most of the XML you posted, sanitizing the tags away. If you’d like to repost, maybe you’d like to use code tags, so the schema XML may be preserved.

    Cheers

    Slawo

  • mespub

    Hi,

    Thank you for this great article.

    Do you know if it’s possible to use a parameter (like inputcontrol) when launching an Analysis View in JasperServer CE 3.7.1.

    Regards

  • Slawomir Chodnicki

    Hi mesapub,

    sadly, no, I have no idea about the level of integration in JasperServer CE. So far I’ve never seen anything that would imply parameter support for analysis views in CE.

    Cheers
    Slawo

  • Leo van den Berg

    Hi,

    We are working on a cube. We want to set a red color when the
    amount < 0 but it is not working. We have a linux server. This
    is the code:
    FORMAT_STRING=IIF([Measure].[amount]<0,"#B22222","#B22222")
    Can you help me out.

    Thank you.

  • Slawomir Chodnicki

    Hey Leo,

    you’ll have to adhere to the string format specs to make the style appear properly.

    A cell format in JPivot works something like this:
    vb_format_string|style=somestyle|arrow=some_image
    somestyle is going to be added to the css classes of the cell (as “cell-style-somestyle”, if I recall correctly), and some_image wil be a html image added to the cell on the right (“arrow_some_image.gif” if I recall correctly).

    Please use Firebug for Firefox or Chrome Developer Tools to check what exactly gets rendered.

    Try something like this for starters:
    Iif([Measures].[amount] > 0, ‘|#|style=green’, ‘|#|style=red’)

    Cheers
    Slawo

  • Hey Slawomir!

    First, thanks for the article. It cleared up some things for me as a beginner with OLAP.

    I also have a question. I’m using JasperServer 4.2.1 CE, and it has a new interface compared to the 3.7 version you are using. The only problem I have is that I can’t find a way to upload a new OLAP schema. The Jaspersoft OLAP CP User Guide says that this should be done by selecting Add Resource > File > OLAP Schema with a right-click on a folder. However, I don’t see the File option after selecting Add Resource.

    My temporary work-around is to copy other schemas and edit them However, this has the drawback that their resource ID cannot be changed, so I have a lot of schemas with the same ID.

    Thanks in advance in case you have the time to answer.

    Cheers,
    Vlad

  • Slawomir Chodnicki

    Hi Vlad,

    I’m sorry, I have not looked at JasperServer 4 in detail. I suspect maybe that maybe the docs are wrong, and you’d have to choose a different type. Or you’d have to create a deeper level of nesting (non-top level folder) before you can add files. Not sure. I guess it’s best to ask on the jasper community forums.

    Cheers
    Slawo

  • Connie

    I am also using the newer version of Jasper server (4.1.3)
    The schema is added when you add the connection. As you have probably figured out by now…

    The problem I have is it tells me that the MDX query is not valid. I should think that the query language has not changed in this version. Any ideas as to what would be the issue with the MDX query??

  • Senthil

    This is a great a article. Thank you !!

  • kop

    Hi Folks,

    Nice article.But i have a problem
    I want to create a cube , which contains the aggregated data and want the cube to use as a source to my report.
    I have now created an xml file from the schema workbench which contains dimensions and facts.
    So for me the next after creating this xml file. how do i proceed??????
    How do i populate the data out of my cube.??????????
    My xml is just a skeleton with all the dimension names and facts.
    So please folks let me know how do i procced to next step so that i can succefully creat cube.????
    Please help.

    Thanks
    Kop

  • naresh

    Hi , I am new to OLAP .I hav 1 question that i have a “stores(id,name,url)” table in mysql how to create a cube for this stores Table Can u explain in detailed steps? Bcoz i don’t know how to write a schema for olap generation. Please It’s urgent .Thanks In advance

  • sanjoy

    Dear Slawo
    Can u provide me any step by step process of writing SQL Query in cube.I m trying this code for example:

    select
    Socialgroupname as SGName,
    Socialgroupcode as SGCode
    from
    SocialGroup
    where

    Socialgroupcode =’${filter}’

    now where to define this ‘filter’ or what is the next step? in SpagoBI OLAP.Can u provide any example.

  • Alex

    These articles are great for me starting on Mondrian. Any examples of non additive measures ? Thanks

  • Ravinda R

    Sir….
    the article was useful….
    I am new to this. We are implementing a student datawarehouse. We have a plan to use mondrian and jpivot. Can you help us to create schema and cubes. and please suggest a best way to install this.

    Thank you…

  • kaushal

    is there some open source tool which can generate this type of schema so that we can develop these cubes fast just drag and drop !!!

    regards
    kaushal

  • khyati

    Hi
    I am using JPivot for creating olap cubes.
    i have 2 parameters-VERTICAL,DESIGNATION in a cube and have generated separate reports for each. now i want to generate a report using the parameter selected by user. so if the user selects VERTICAL the report is generated using vertical parameter and vice versa without user explicitly changing the selected template.

Leave a Reply

 

 

 

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>