Thursday, December 17, 2009

Data Checks (1 of 2)

Sharing some techniques. I drafted this one time when I was asked to do some 'knowledge sharing'. It's actually just common sense and some room for imagination to cater for 'intelligent assumptions'. :)

Scenario: Amounts do not tally.

1) Open the report in UAT instance. If specific account number or cust is given, search for that number. Replicate the issue for those accounts/CIFs raised, you can bring in other fields into the report (eg Sector Code, Sub Product Type, etc) if you need more information.

2) Check how the amount is calculated. The amount field can be an OLAP variable, look for the specific table.column in ETL that is used and do the calculation manually. If it’s a direct fetch from EDW, then query the database.

3) Next is to query the source itself. If it’s a lookup field from another instance, then look for specific join conditions in actual ETL mapping. If let’s say all your amounts are null but in Raw has value, then something is wrong in EDW. Either it’s an ETL lookup issue related to data, or requirement itself is wrong.

Eg: Issues:
- All debit fields are null because debit stream lookup to VBM in ETL in incorrect
- Some FD Credit balances are not matching and consistently lower in OLAP than in Raw, because VBM lookup in ETL has a component for renewal counters, and this field appears differently in VBM and GFDR and has to be padded with 0.

4) It could also happen that for some of the accounts match, some do not match, and there is no data format (trimming, padding) issue or ETL lookup error. This would mean that some of the filters (in either ETL side or OLAP side) are not proper

Eg: We had issues wherein we actually have to modify the filters, upon investigation that something is consistently wrong in our side. Sample issues:
BO: add filter to include Accounts Closed within Current Month. Remove filter on Receipt Status is Closed
ETL: remove filter on sub product type = GT.
Net, it can happen that requirement itself is incorrect. There are times when we cannot fully rely on what is stated in signed off mapping specs/report specs.

5) Another way is the comparison of total number of accounts. In this project, we normally go by account. Get the total number of accounts in OLAP vs the total number of accounts in Raw.
- It can happen that there are MISSING accounts from Raw but in EDW we take them in.
- Or the other way around, there are EXTRA accounts from Raw but in EDW we don’t have. With this, then you have to extract the accounts that are missing/extra, justify why it is or it is not there in EDW. Then do analysis and find a pattern where possible.

No comments: