Thursday, March 6, 2008

How to manipulate data through time.

Plenty of us are use bit column as piece of information if record is valid or not. But if  you're selecting only valid records, you must filter table with filtering statement like


SELECT 10 Id
FROM SomeTable
WHERE Del=0


or something like that statement. Multiple queries should take disadvantage of getting full scan of table rows, because we couldn't use index on this column. But there should be more disadvantages.

First of them is absence of history related to our data collections. Second one is higher CPU and I/O load during executing this statement compared to filtering columns which should be indexed.

So I decided to choose two columns in my tables to localize records in time. First of them is column with GETDATE() default value (you know it probably :)), "CreateDate" name with nonclustered index applied, ordered in descend order.

A several weeks ago I spoked with one software company owner about NULL values in columns. "Here is no reason to use delegated values instead of NULLs", he said. "When database engine comes with new version, here can be plenty of difficulties with those values.Why you have choosed this kind of mechanism?"

I thought that seeking some column values  with datetime type was faster then seeking columns which has NULL values enabled. Scanning NULLs is slower then filtering specified values as I noticed in my last article. I answered him that database engine always use index scan and never use index seek. "You're wrong", he said. "Server always use index scan."

I talked about it with my boss and he said: "Compare those two mechanisms".
I decided to write testing script. I've created table with primary key (integer type) with identity incrementation and two columns with datatime type. I inserted about 1 million records into it and then I selected all records with specified values, records with null values and then I seeked records in some random interval. Interval is random for every iteration in both test runs. Selecting range over NULL enabled column is more slower, because SARG is not applied and query is much slower.

You can change previous statement to this:

SELECT [some columns]
FROM dbo.Table
WHERE CreateDate <= @ParameterDate AND DeleteDate >= @ParameterDate

to get records which are valid for specified range and query doesn't consume so much server resources.


Here you can download testing script.