Thursday, March 6, 2008

How to manipulate data through time.

Plenty of us are use bit column as piece of information if record is valid or not. But if  you're selecting only valid records, you must filter table with filtering statement like

FROM SomeTable

or something like that statement. Multiple queries should take disadvantage of getting full scan of table rows, because we couldn't use index on this column. But there should be more disadvantages.

First of them is absence of history related to our data collections. Second one is higher CPU and I/O load during executing this statement compared to filtering columns which should be indexed.

So I decided to choose two columns in my tables to localize records in time. First of them is column with GETDATE() default value (you know it probably :)), "CreateDate" name with nonclustered index applied, ordered in descend order.

A several weeks ago I spoked with one software company owner about NULL values in columns. "Here is no reason to use delegated values instead of NULLs", he said. "When database engine comes with new version, here can be plenty of difficulties with those values.Why you have choosed this kind of mechanism?"

I thought that seeking some column values  with datetime type was faster then seeking columns which has NULL values enabled. Scanning NULLs is slower then filtering specified values as I noticed in my last article. I answered him that database engine always use index scan and never use index seek. "You're wrong", he said. "Server always use index scan."

I talked about it with my boss and he said: "Compare those two mechanisms".
I decided to write testing script. I've created table with primary key (integer type) with identity incrementation and two columns with datatime type. I inserted about 1 million records into it and then I selected all records with specified values, records with null values and then I seeked records in some random interval. Interval is random for every iteration in both test runs. Selecting range over NULL enabled column is more slower, because SARG is not applied and query is much slower.

You can change previous statement to this:

SELECT [some columns]
FROM dbo.Table
WHERE CreateDate <= @ParameterDate AND DeleteDate >= @ParameterDate

to get records which are valid for specified range and query doesn't consume so much server resources.

Here you can download testing script.

Monday, January 21, 2008

To be with NULL or not to be with NULL

using ISNULL and COALESCE  sql functions.

From time to time you probably catch an idea about optimizing performance in your sql queries.
At time I was written this article, I was using  SQL Server 2005.

I had read much articles about optimizing queries in SQL queries. Some authors had wroted, that ISNULL function is better and other had said that COALESCE is faster, because COALESCE function work similar to CASE statement in programming languages.

So I created a small SQL script to measure which method should be faster.

I ran this script on my machine, so results were better for ISNULL function. I thought it's because COALESCE must hold datatype for each tested value.

Please also notice differences between selecting values from NULL-enabled and non NULL enabled columns with the same data types. I think, that querying over not NULL enabled columns should gain some speed.

FunctionData typeNullValues PresenceMinCPUTimeMaxCPUTimeAvgCPUTimeMin Timelength [ms]Max Timelength [ms]Avg Timelength [ms]