MS SQL Select Weird Behaviour

SQL

Weirdness

Setup

Let's say you have a table #myTable, such that...
TypeYearNumANumB
1A202054
2B202032
3A202061
4B202086
...............
1999A202587
2000B202563
Where there are 2000 rows, [Type] alternates between A and B, [Year] is split so about 20% of the rows are each. And [NumA] and [NumB] are random numbers.

Code

Now, run the SQL query: SELECT * FROM ( SELECT TOP (100) * FROM #myTable ) AS t ORDER BY [Type] You might imagine that the nested subquery would be ran first followed by the ordering of that data.
But you actually end up with all rows containing the [Type] = 'A'.
TypeYearNumANumB
1A202054
2A202061
3A202086
4A202032
...............
99A202133
100A202123
So it seems, the ORDER BY can infact affect the inner subquery!

Larger Example

However, it's more complicated than that, as after trying this on a much larger table with 18million rows. You can see that the nested makes a major difference on the speed.

Performing a simple TOP (1000) takes 1 second: SELECT TOP (1000) * FROM #myLargeTable Performing a TOP (1000) with an ORDER BY takes 34 seconds. This makes sense as ordering requires more power: SELECT TOP (1000) * FROM #myLargeTable ORDER BY 1 But performing a TOP (1000) with an ORDER BY outside of that inner subquery takes 1 second! SELECT * FROM ( SELECT TOP (1000) * FROM #myLargeTable ) AS t ORDER BY 1 So clearly the subquery is making some kind of difference.

Similar Notes

WITH Clause

Important to note that a WITH clause makes no difference to this result: WITH mysubtable AS ( SELECT TOP (100) * FROM #myTable ) AS t SELECT * FROM mysubtable ORDER BY [Type] Also returns a table with all 'A's.

ROW_NUMBER Function

Important to note that using a ROW_NUMBER function produces the same result: SELECT *, r = ROW_NUMBER () OVER (ORDER BY [Type]) FROM ( SELECT TOP (100) * FROM #myTable ) AS t Also returns a table with all 'A's.

Thoughts

Minimum Batch Size

My hypothesis, is the SELECT query can only fetch rows in larger batch sizes. So it grabs a larger batch (at least 2000 rows) and remembers it only wants 100. Then it orders it. And then it perform the row cutoff at the end.

Being Smart

Alternatively, maybe SQL is being smart and realises that it can use a larger row count initially without affecting speed while still producing "better" results.