MS SQL Select Weird Behaviour
SQLWeirdness
Setup
Let's say you have a table #myTable, such that...
| Type | Year | NumA | NumB | |
|---|---|---|---|---|
| 1 | A | 2020 | 5 | 4 |
| 2 | B | 2020 | 3 | 2 |
| 3 | A | 2020 | 6 | 1 |
| 4 | B | 2020 | 8 | 6 |
| ... | ... | ... | ... | ... |
| 1999 | A | 2025 | 8 | 7 |
| 2000 | B | 2025 | 6 | 3 |
Code
Now, run the SQL query:
SELECT * FROM (
SELECT TOP (100) *
FROM #myTable
) AS t
ORDER BY [Type]
You might imagine that the nested subquery would be ran first followed by the ordering of that data.
But you actually end up with all rows containing the [Type] = 'A'.
| Type | Year | NumA | NumB | |
|---|---|---|---|---|
| 1 | A | 2020 | 5 | 4 |
| 2 | A | 2020 | 6 | 1 |
| 3 | A | 2020 | 8 | 6 |
| 4 | A | 2020 | 3 | 2 |
| ... | ... | ... | ... | ... |
| 99 | A | 2021 | 3 | 3 |
| 100 | A | 2021 | 2 | 3 |
ORDER BY can infact affect the inner subquery!
Larger Example
However, it's more complicated than that, as after trying this on a much larger table with 18million rows.
You can see that the nested makes a major difference on the speed.
Performing a simple TOP (1000) takes 1 second:
SELECT TOP (1000) * FROM #myLargeTable
Performing a TOP (1000) with an ORDER BY takes 34 seconds.
This makes sense as ordering requires more power:
SELECT TOP (1000) * FROM #myLargeTable ORDER BY 1
But performing a TOP (1000) with an ORDER BY outside of that inner subquery takes 1 second!
SELECT * FROM (
SELECT TOP (1000) *
FROM #myLargeTable
) AS t
ORDER BY 1
So clearly the subquery is making some kind of difference.
Similar Notes
WITH Clause
Important to note that a WITH clause makes no difference to this result:
WITH mysubtable AS (
SELECT TOP (100) *
FROM #myTable
) AS t
SELECT * FROM mysubtable ORDER BY [Type]
Also returns a table with all 'A's.
ROW_NUMBER Function
Important to note that using a ROW_NUMBER function produces the same result:
SELECT *, r = ROW_NUMBER () OVER (ORDER BY [Type]) FROM (
SELECT TOP (100) *
FROM #myTable
) AS t
Also returns a table with all 'A's.
Thoughts
Minimum Batch Size
My hypothesis, is the SELECT query can only fetch rows in larger batch sizes.
So it grabs a larger batch (at least 2000 rows) and remembers it only wants 100.
Then it orders it. And then it perform the row cutoff at the end.
Being Smart
Alternatively, maybe SQL is being smart and realises that it can use a larger row count initially without affecting speed while still producing "better" results.