InsideOut

Bring your inside out...

My Photo
Name:
Location: Pune, Bangalore, India

I like doing things in new ways... music... touring... swimming... teaching...mathematics...

Tuesday, January 04, 2005

Oracle's Funny Interpretter

Quite sometime back I got interested in the GROUP BY clause of SQL-92. I feels that's an interesting clause, especially when it works in combination with expressions and aggregate functions.

Oh! Before I dive deep into the thought provoking title, let me put some background for those who have a faint idea of SQL-92 constructs. SQL is a query-language (something that helps to fetch/update data using small chunks of code). It expands to Structured Query language. The question 'how much structured' really makes me sink deep into thoughts. Hmmm...let me not spoil the spirit by discussing the structuredness of SQL. But I'd say it's successful (like HTML) because of its simplicity and flexible nature (I think, I sound applauding!).

I don't have patience to teach you SQL here (I think its pretty simple, get C.J. Date's book in front of you). My focus would be on the SELECT construct, that helps you fetch the data of your desire. GROUP BY clause helps you partition the data as per your desire (i.e each partition following some characteristic that you specify). Rest of this blog, I'll talk about partitioning (I guess, 'grouping' is the appropriate term for the SQL programmers).

I'd also mention this:
SELECT COL1, COL2
FROM TAB1
GROUP BY COL1, COL2, COL3
The column-list in SL (short for 'SELECT') clause must be a subset of that in GB (short for 'Group By') clause. So the query above is valid. What follows is an invalid query (as COL2 is not in the GB-list):

SELECT COL1, COL2
FROM TAB1
GROUP BY COL1, COL3

Let me use Pink color for GB-List and Green for the SL-List. SL clause can take any column if it's an argument to an aggregate function. Thus, the following query is valid.
SELECT COL1, sum(COL2)
FROM TAB1
GROUP BY COL1, COL3
If you know a bit of SQL, you see every thing is in perfect shape. But what if we bring in expressions? OK, let's see:

SELECT (COL2 * 2)
FROM TAB1
GROUP BY COL1, (COL2 * 2)
is valid, because the SL-list is a subset of the GB-list (as per SQL-92 requirement). The following queries are all valid and are treated well by the Oracle's SQL Interpretter:

SELECT (COL2 * 2 * 2)
FROM TAB1
GROUP BY COL1, (COL2 * 2)

SELECT (COL1 + COL2)
FROM TAB1
GROUP BY (COL1 * COL2)

SELECT (COL1 + COL2)
FROM TAB1
GROUP BY COL1, COL2

SELECT (COL2 * 2)
FROM TAB1
GROUP BY COL1, (2 * COL2)

SELECT (COL2 * 2 * 2)
FROM TAB1
GROUP BY COL1, (COL2 * 4)

Sorry if this confuses you! The Subset rule is just a rule of thumb. To judge their validity, requires a bit of going into SQL-92, which gives a wide allowance of what is acceptable in a query with a GB. I won't try to explain it here. But I may take if offline through comments.

I was a fan of Oracle. I believed the developers really pumped in good amount of AI to achieve the expression comparision, until I came across the following observation with the Oracle's Interpretter:

SQL> select (2*comm) + (2*sal) from bonus group by (2*comm) + (2*sal);

(2*COMM)+(2*SAL)
----------------
150000
150020
150220




SQL> select (2*comm) + (2*sal) from bonus group by (2*comm) + 2*(sal);

(2*COMM)+(2*SAL)
----------------
150000
150020
150220




SQL> select (2*comm) + (2*sal) from bonus group by (2*comm) + (sal)*2;
select (2*comm) + (2*sal) from bonus group by (2*comm) + (sal)*2
*
ERROR at line 1:
ORA-00979: not a GROUP BY _expression

You observed what happened?...NO?? Let me hand it over to you. Oracle's Interpretter says, the following SQL query is invalid (in fact, it's a valid SQL-92 query)
SELECT COL1 * 2 + COL2 *2
FROM TAB1
GROUP BY COL1 * 2 + 2* (COL2)
Whoa! I am not criticizing the failure of Oracle to treate this one correctly. But rather I want to bring your attention to a pattern. Go back and read Oracle's reaponse in Red. What does the success of the first two queries suggest you? To me it appeared that Oracle implements a good AI (this is just not as trivial as Expression-String-matching or Expression-Syntax-Tree matching) to resolve the expression-comparision! In fact, this conclusion is buttressed by the support of other sofisticated GB queries involving expression (re-read the statement in Blue, back).

But hey! What happened to the third query in Red? Are COL1 * 2 + COL2 *2 and COL1 * 2 + 2* (COL2) not equivalent? They are! Then why does Oracle's interpretter fail here? This suggests Donkey-work done!

Another valid SQL, that fails in Oracle.
SELECT 2*(COL1) + 2*(COL2)
FROM TAB1
GROUP BY 2*(COL1 + COL2)
We know that 2*(COL1) + 2*(COL2) and 2*(COL1 + COL2) are equivalent. The question is how do you make the Computer recognize this! One way is to write code for each pattern you can think of (this is exactly what I mean by Donkey-Coding!). Another is to have a robust AI that matches expression as Human Brain does (here your system will work in case of unforeseen patterns as well; not dependent on how many patterns you could think of!).

The title of this blog could appropriately have been Oracle's Donkey interpretter! Whatever is it... it's really costly to implement such a robust AI. But at the same time, it's needed to support all the common patterns and Oracle has done that! No wonder, it's successful.



6 Comments:

Anonymous Anonymous said...

The following seems to be invalid:

SELECT (COL1 + COL2)
FROM TAB1
GROUP BY (COL1 * COL2)

2:41 PM  
Blogger Sujeet Banerjee said...

Oh! You are right! That's a typo :(

The correct one is

SELECT (COL1 + COL2)
FROM TAB1
GROUP BY (COL1 + COL2)

3:46 PM  
Blogger Sujeet Banerjee said...

This comment has been removed by a blog administrator.

3:48 PM  
Blogger Srihari SN said...

Dude,

did u send this to oracle ?

1:44 PM  
Blogger Sujeet Banerjee said...

Well, I do not wish to. I feel this is the place where I express my own views and expressions :)

I'd certainly love to address any issues or questions through comments. Please feel free.

- Sujeet

7:24 PM  
Blogger Sujeet Banerjee said...

Hi Manivannan

Thanks for asking the questions. In fact, I was eager to write out the validity rules for GB. You just gave me the opportunity.

Let me write everything a fresh as a new blog. Simply, as I'd get better editing facilities.

- Sujeet

8:38 PM  

Post a Comment

<< Home