Thursday 13 November 2014

Postgresql - crosstab function - rows to columns anyone?

Recently, I was tasked on enhancing a report for our system. Just to get an idea, here is a little background that you need to know.

Customers are registered in the system and have exactly one profile associated with them. The profile contained a bunch of answers to certain questions. Up until now, these questions were static - so conceptually you have one column for the answer to every question. Pretty simple right?

profile1 - answer1 - answer2
profile2 - answer1 - answer2
....

However, now the questions are asked *dynamically*. The sizes of questions can grow or shrink. All of a sudden that one little neat profile had to contain a collection of answers - enter a one to many relationship.

Now about that report, the columns would contain the answers which in the past was just a matter of adding the relevant answer to the select clause. This would result in a column in the result set and that would be that.

However, I can't do that any more. Once I join the answer to the profile, because of the one to many; I get many result sets for the same profile.

profile1 - answer1
profile1 - answer2
profile2 - answer1
profile2 - answer2
....

Yikes! I can no longer just iterate over my result set and put answers in my respective columns. 

So there are 2 solutions to this problem. The easy solution would be to not to join the profile to answers - rather loop through each profile; get my answers for that profile, and then use that for my remaining columns.

But wouldn't it be nice if I could just take those stupid rows and transpose them into columns? Enter crosstab. This confused me at first; so let me break it down the way I understood it.

I am going to borrow the example from postgres's official documentation http://www.postgresql.org/docs/9.1/static/tablefunc.html

The key thing to remember is that 3 things are required:

1. The unique identifier - This is the guy that represents each unique record in our new transposed result set. This of this as what you would put in GROUP BY if you were performing aggregations. You can also think of it as our seed value. In the above, we want everything *GROUPED BY* year.

2. The categories you want transposed - These are the buckets that are coming up in your pesky rows instead of columns! Note these are not the values of the buckets - they just uniquely classify what should be transposed. In my business case, I wanted the questions to be transposed. In the above example, we want the months to be transposed.

3. The value of that category for each unique identifier - This makes sense yeah? We have a bucket for every record and we want that value. In the above, its the quantity the corresponding month. In my case, it is the answer to the question (buckets) for each profile.

Notice how you need to have a 1-1 relationship between your bucket and answer.

The rest is all syntax really, which I will let you read from the postgres docs.

No comments:

Post a Comment