Here goes a simple sample for a datawarehousing project from a bank. (note that this is just simple current accounts and fixed deposits, we're not even talking about equities, derivatives, bonds or forex).
In a dwh project, usually there will be some existing reports from old system to be compared against a new EDW. EDW will consist of ETL (processing source data and implementation of all data transformation) and OLAP (reporting layer where further filters and metrics can be applied). With this, data accuracy will be challenged and it's troublesome to find which part exactly has data gone wrong. Sometimes numbers can be critical, especially in banks. These numbers appearing in the report are read by Marketing or Sales people for their forecast, or even CEOs. A cent lost has to be digged out.
In layman's terms: my edw has 20 apples (15 are apple pies, 5 are apple muffins) and 30 oranges (10 are orange pies, 20 are orange muffins). Existing report shows 30 apples and 30 oranges. Somewhere data has gone wrong, it maybe during tagging of pies and muffins or it maybe there is a certain formula for classifying what is a valid apple and a valid orange. Literally comparing apples to oranges. (I hope it gives a better idea now) :D Depends on the type of project, if numbers dont tally, normally it can NOT go into production until the culprit is found. Or sometimes maybe the existing report is actually wrong. Or sometimes the business requirements are not correctly captured. Endless possibilities.
I'm actually not a SQL expert. I can only do simple queries and subqueries, not the too fancy ones. Think about it, it's quite fun to do those intersect, union to extract the correct data that you need in order to find the culprit.
No comments:
Post a Comment