Correlated Subquery – Performance Tuning
A correlated subquery (CS) is a subquery whose outer query results are processed a row at a time, against the subquery result. The subquery result is computed for each row processed.
- a CS eliminates the need for intermediate or temporary tables
- a CS is fully integrated with global join planning to minimize costs
- CS is significantly faster than the query using temporary tables
Example:
1. Employee with highest salary in each department
SELECT L_Na,Sal, D_No
FROM test.employee ee
WHERE Sal = (SELECT MAX (Sal)
FROM test.employee em
WHERE ee.D_No=em.D_No);
Answer Set:
Last_Name | Salary_Amount | Dept_No |
stifler | 60000 | 50 |
pitt | 70000 | 10 |
young | 60000 | 50 |
jones | 60000 | 30 |
paul | 70000 | 70 |
penn | 60000 | 60 |
white | 75000 | 90 |
nelson | 70000 | 80 |
pitt | 45000 | 100 |
lucas | 75000 | 20 |
phips | 65000 | 40 |
2. Employees whose salary is greater than the department average salary
SELECT L_Na, Sal, D_No
FROM test.employee ee
WHERE Sal > (SELECT AVG (Sal)
FROM test.employee em
WHERE ee.D_No= em.D_No);
Answer Set:
Last_Name | Salary_Amount | Dept_No |
paul | 70000 | 70 |
phips | 65000 | 40 |
shook | 65000 | 10 |
pitt | 70000 | 10 |
jones | 60000 | 30 |
lucas | 75000 | 20 |
wall | 65000 | 10 |
Concatenation & Correlated subqueries performance tuning
Concatenation allows to retrieve data correlated to the MIN/MAX function in a single pass
Example:
1. Employees with highest salary in each department
This can be written as using correlated subquery
SELECT D_No, Sal, L_Na, F_Na
FROM test.employee ee
WHERE Sal IN
(SELECT MAX (Sal)
FROM test.employee em
WHERE ee.D_No=em.D_No
GROUP BY D_No)
ORDER BY D_No;
Answer Set:
Dept_No | Salary_Amount | Last_Name | First_Name |
10 | 70000 | pitt | arun |
20 | 75000 | lucas | frank |
30 | 60000 | jones | indiana |
40 | 65000 | phips | carla |
50 | 60000 | stifler | tom |
50 | 60000 | young | nikie |
60 | 60000 | penn | arun |
70 | 70000 | paul | ak |
80 | 70000 | nelson | julie |
90 | 75000 | white | sam |
100 | 45000 | pitt | mark |
It can also be written as
SELECT D_No,
MAX(Sal || ' ' || L_Na || ',' ||F_Na)
FROM test.employee
GROUP BY D_No
ORDER BY D_No ;
Answer Set:
Dept_No | Maximum(((((Salary_Amount||' ')||Last_Name)||',')||First_Nam |
10 | 70000 pitt ,arun |
20 | 75000 lucas ,frank |
30 | 60000 jones ,indiana |
40 | 65000 phips ,carla |
50 | 60000 young ,nikie |
60 | 60000 penn ,arun |
70 | 70000 paul ,ak |
80 | 70000 nelson ,julie |
90 | 75000 white ,sam |
100 | 45000 pitt ,mark |
Did you observe the two answer sets? Can you tell the difference and how to overcome it? Please leave your answers in comments.