SQL Programmability & API Development Team Blog

All posts are AS IS, without any further guarantees or warranties.

Mon, 29 Jun 2009 22:56:00 GMT

Interesting issue with Filtered indexes.

Recently, an ISV I work with ran into an interesting problem with Filtered Indexes.

The application does all SQL INSERT, DELETE, and UPDATE operations using individual stored procedures. To improve performance, they decided to create a Filtered Index (new in SQL 2008) to restrict the data such that it does not contain NULL values, in this case drastically reducing the number of rows in the index. See documentation here: http://msdn.microsoft.com/en-us/library/ms175049.aspx

This all worked fine, until months later, when the ‘UPDATE’ stored procedure was updated as part of a routine application upgrade. The stored proc was replaced with a newer version using a TSQL script. After adding the ‘new’ stored proc the application was tested and the following was observed;

·       the SELECT statements accessing the table continued to use the filtered index

·       the ‘INSERT’, and ‘DELETE’ stored procs continued to work

·       However, the modified ‘UPDATE’ stored proc returned the following error:

InnerException: System.Data.SqlClient.SqlException: UPDATE failed because the following SET options have incorrect settings: 'QUOTED_IDENTIFIER'. Verify that SET options are correct for use with indexed views and/or indexes on computed columns and/or filtered indexes and/or query notifications and/or XML data type methods and/or spatial index operations.

Note that the error message is very explicit and actually points us to the source of the problem. Can you guess what it is?

You are correct! The TSQL script used SET options that were set differently for the new ‘UPDATE’ stored proc than they were originally, and they don’t conform to the rules required to utilize Filtered Indexes.  In this case, the SET QUOTED_IDENTIFIER was set to an invalid setting.

The solution to the problem was to recreate the stored proc using the ‘correct’ settings required to use Filtered indexes, documented here: http://msdn.microsoft.com/en-us/library/ms188783.aspx

The following simplified TSQL example shows the problem and the solution.

NOTE: the fact that Stored Procedures were used is important, because they ‘inherit’ the SET statement values they were created with, and NOT the values they are executed with.

USE USE master

GO

CREATE DATABASE FI_Test

GO

USE FI_Test

GO

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

CREATE TABLE [FactSalesQuota](

      [SalesQuotaKey] [int] IDENTITY(1,1) NOT NULL,

      [EmployeeKey] [int] NOT NULL,

      [DateKey] [int] NOT NULL,

      [CalendarYear] [smallint] NOT NULL,

      [CalendarQuarter] [tinyint] NOT NULL,

      [SalesAmountQuota] [money] NOT NULL,

 CONSTRAINT [PK_FactSalesQuota_SalesQuotaKey] PRIMARY KEY CLUSTERED

([SalesQuotaKey] ASC) ON [PRIMARY]

) ON [PRIMARY]

GO

CREATE NONCLUSTERED INDEX FI_FactSalesQuota

ON FactSalesQuota(Employeekey,CalendarQuarter)

WHERE CalendarYear = 2009 -- <---- This makes it a Filtered index

GO

INSERT FactSalesQuota values(53,20090101,2009,4, 37000.00)

GO

SELECT * FROM FactSalesQuota

GO

CREATE PROCEDURE UPDATE_FactSalesQuota

 @EmployeeKey int,

 @CalendarYear smallint,

 @CalendarQuarter tinyint,

 @SalesAmountQuota money

AS

UPDATE FactSalesQuota

 SET SalesAmountQuota =  @SalesAmountQuota

    

 WHERE      EmployeeKey = @EmployeeKey and

            CalendarYear = @CalendarYear and

            CalendarQuarter = @CalendarQuarter 

GO

EXECUTE UPDATE_FactSalesQuota 53,2009,4,50000.00

GO

SELECT * FROM FactSalesQuota

GO

 

--- this was the script to update the sp

use FI_Test

GO

sp_rename UPDATE_FactSalesQuota , UPDATE_FactSalesQuota_V1

GO

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER OFF -- <------ HERE is what caused the error

--SET QUOTED_IDENTIFIER ON -- <------ it will work if this is set correctly

GO

CREATE PROCEDURE UPDATE_FactSalesQuota

 @EmployeeKey int,

 @CalendarYear smallint,

 @CalendarQuarter tinyint,

 @SalesAmountQuota money

AS

UPDATE FactSalesQuota

 SET  SalesAmountQuota =  @SalesAmountQuota,

        DateKey = cast((CONVERT (char(8) ,getdate(), 112)) as int)

 WHERE      EmployeeKey = @EmployeeKey and

            CalendarYear = @CalendarYear and

            CalendarQuarter = @CalendarQuarter 

GO

--this fails

EXECUTE UPDATE_FactSalesQuota 53,2009,4,52000.00

GO

SELECT * FROM FactSalesQuota

GO

-- Now go back fix the script, and retry the sript and now it will work

 

 

 

Cross Posted from http://blogs.microsoft.com/mssqlisv

Mon, 18 May 2009 20:57:00 GMT

Why did the size of my indexes expand when I rebuilt my indexes?

Recently I worked with a partner who was seeing some interesting behavior.  Upon rebuilding their indexes they noticed that the total space used by all indexes increased significantly.   The table has no clustered index but does have a total of nine non-clustered indexes. 

The sequence of events is as follows:

·       Step 1: Approximately 12 million rows are inserted into an existing table via some batch loading of the data. 

 

·       Step 2: All the indexes on the table are rebuilt using:

 

ALTER INDEX MyIndex ON MyTable WITH (SORT_IN_TEMPDB=ON, ONLINE=ON)

The sp_spaceused procedure was used before and after each of the steps above to measure the amount of space used by the table and indexes.  Here are the results:

BEFORE STEP 1:

 

sp_spaceused MyTable

 

Name          Rows         Reserved       Data          Index Size      Unused

------        ------       ----------     -------       ----------      -----------

MyTable       1156563588   324009704 KB   88318384 KB   235511080 KB    180240 KB

 

AFTER STEP 1, BEFORE STEP 2 (Data added, nothing done to indexes):

 

sp_spaceused MyTable

 

Name          Rows      Reserved       Data         Index Size   Unused

------        ------    ----------     -------      ----------   -----------

MyTable       1169556034 329729960 KB   89645944 KB 240051312 KB 32704 KB

 

 

AFTER STEP 2 (All indexes rebuilt):

 

sp_spaceused MyTable

 

Name          Rows      Reserved       Data         Index Size   Unused

------        ------    ----------     -------      ----------   -----------

MyTable       1169595370 459848840 KB   89649160 KB 363548216 KB 6651464 KB

 

Notice the size of the index after adding the rows to the table and then after rebuilding the indexes (highlighted in red above).  Before the rebuild the index size was approximately 240GB but after the rebuild it was nearly 365GB, an increase of nearly 50%.

 

It is also worth noting that in the above sample there were concurrent inserts against this table while the 12 million rows were added, as well as during the index rebuilds. This is why the indexes were rebuilt online and explains the differences in rowcount between each step.

 

Why did my index sizes increase so much?

The answer is related to the fact that 1) RCSI was enabled on the database and 2) the index was rebuilt ONLINE.   When RCSI is enabled on a database there is an additional 14 bytes appended to each row as it is inserted, updated or deleted.  This applies to the table as well as any index modified by the action. The purpose of this extra space is to maintain information about row versions that is needed for the RCSI functionality.  This is described in greater depth in the following blog: http://blogs.msdn.com/sqlserverstorageengine/archive/2008/03/30/overhead-of-row-versioning.aspx.

When an index is rebuilt using the option ONLINE=ON SQL Server will append these 14 bytes to every row during the rebuild.  However, when the index is rebuilt with the option ONLINE=OFF the 14 bytes are not appended but instead removed from any existing rows, unless the row is currently part of an active transaction.   The difference in behavior is by design.   When RCSI is not enabled, the above does not apply since there is no need to maintain row version information.

An existing index can be expected to increase in size after an online rebuild when either of the following is true.

1.       An index with existing data has been rebuilt offline prior to the online rebuild. In this case the size of the index will increase because the 14 bytes removed during the offline rebuild are added to each row during the online rebuild. 

2.       The database had existing data prior to RCSI being enabled.  In this case any online rebuild will add the additional 14 bytes to each row.

 

For this specific scenario the database had existing data prior to RCSI being enabled and these indexes were also rebuilt offline at some point in time before the online index rebuilds were performed.  

How much can I expect my indexes to grow in size?

The size increase as a percentage of the original index sizes will be dependent on the size of the index keys for the indexes.  This can be much higher in cases when the size of the key columns in bytes is relatively small. 

The below illustrates the approximate size of the index keys and the increase in size as a percentage of the key size for two of the nine indexes.  The Row Identifier below (RID) is added to each row since this table is a heap (no clustered index) and is used to identify the FILEID:PAGEID:ROWID for each index row.

CREATE UNIQUE NONCLUSTERED INDEX [NC_Idx_1] ON [MyTable]

(

      [Column1] ASC, --[PersonID] [numeric](16, 0)  (9 bytes)

      [Column2] ASC, --[int]                   (4 bytes)

      [Column3] ASC, --[varchar](20)                (20 bytes max)

      [Column4] ASC, --[varchar](20)                (20 bytes max)

      [Column5] ASC  --[char](10)                   (10 bytes)

 

--9+4+20+20+10+8(RID)=71 (Maximum original key size)

--                    +14 (RCSI – versioning information)

--                    =85

-- (~20% increase in size, could be more if varchar columns have <20 bytes)

)

 

CREATE UNIQUE NONCLUSTERED INDEX [NC_Idx_2] ON [MyTable]

(

      [Column2] ASC, --[int]                 (4 bytes)

      [Column5] ASC  --[char](10)           (10 bytes)

 

--4+10+8(RID)=22 (Approx. original key size)

--           +14 (RCSI – versioning information)

--            =36

-- (~60% increase in size)

)

 

As shown above, the additional 14 bytes introduced as part of the rebuild is significantly more as a percentage of the total row size for NC_Idx_2 than for NC_Idx_1.  For this particular example there were nine indexes on the existing table, five of which had relative small key sizes (similar to NC_Idx_2 above). This explains way there was such a large increase in size as a percentage of the original size.  In addition to the sp_spaceused procedure, the DMV sys.dm_index_physical_stats exposes a column avg_record_size_in_bytes which can be used to measure the average row size within an index before and after index rebuilds.  The avg_record_size_in_bytes includes the 14 bytes added by RCIS, if present, as part of the calculation.

Related to this behavior, there are some other interesting considerations.

1.       After an index is rebuilt offline, workloads that do many updates or deletes may introduce fragmentation to the index.  When an index is rebuilt offline any existing versioning information is removed from the row however any update or delete will add these 14 bytes back into to the row.  When data pages are nearly full, as is the case after index rebuilds, the increase in row size as a result of the addition 14 bytes may result in page splits.  This problem can be avoided by explicitly specifying a FILLFACTOR less than 100% when rebuilding an index.  This will leave free space on the data/index pages and reduce the likelihood of splits. This is not a consideration if indexes are rebuilt online since the row versioning information will exist on each row already.

 

2.       Data compression (in SQL Server 2008) performed on an index or table (either ROW or PAGE compression) using the ONLINE=ON option may result in less space saving than expected when RCSI is enabled due to the addition of the 14 bytes.  These 14 bytes are initialized with a timestamp and place holder for the version record pointer and the information is not compressed by either ROW or PAGE compression.  The stored procedure sp_estimate_data_compression_savings can be used as a method to measure the expected impact of data compression on the index.

 

It is possible that this behavior may change in a future release of SQL Server.

Cross Posted from http://blogs.microsoft.com/mssqlisv

Fri, 10 Apr 2009 23:43:00 GMT

SQL Server 2005 / 2008 table partitioning : Important things to consider when switching-out partitions.

An ISV recently found an anomoly in their implementation of a "drop table partition" function which could lead to unexpected partitions being switched out.

Typically, to do this "drop partition logic" SQL Server partitions are manipulated using the following operations:

·      switch-out the partition which isn't needed anymore (or being archived) into an empty target table

·      merge / reset the partition function ranges of the partitioned source table. For example; move some of the remaining data into a new partition / filegroup

·      possibly drop the target table

The simple, standard way to switch-out a partition is to specify a partition number. The ISV implemented a select statement which used a boundary value of a partition range to return the corresponding partition number. Then the TSQL switch command was used to switch out this partition. This works fine as long as there is only one process doing the "switching".

But now let's assume that two processes would try this at the same time. Both processes will run the selects to get the partition numbers. Then the first one does the switch-out and the merge of the partition function. Unfortunately, the latter command will change the partition numbers as they are dynamically maintained by SQL Server. An activity like merge or splits of partitions will trigger a re-enumeration of all or parts of the partitions of a the table.

Afterwards the partition number which the second process got before the re-enumaration (tiggered by the first process) might not be accurate any more. A switch-out using the old partition number could result in switching out the wrong partition.

The solution is simple. We recommend using the $PARTITION function for the switch-out which allows you to specify a partition boundary value instead of a partition number.

The boundary values are not dynamic and therefore this issue won't come up.

Repro :

The repro script below does the following: 

·       create a partitioned test table with 5 boundary values: 1960, 1970, 1980, 1990, 2000

·       insert 1 row into the 1970 range, 2 rows into the 1980 range and 5 rows into the 1990 range

·       now switch out the 1970 range and the 1980 range by specifying the boundary values

·       as expected, three ranges remain: 1960, 1990 with 5 rows and 2000

·       now repeat the same test by specifying a partition number instead of the boundary value

·       the output after creating the test table shows partition number 2 for boundary 1970 and 3 for boundary 1980

·       using "hard-coded" partition numbers for the "switch partition" function simulates the select mentioned above

·       however, now the results look different. Like before the boundaries 1960, 1990 and 2000 remain  as expected. But the number of rows are not at all what we would expect!

·       instead of 5 rows for boundary 1990 we see only 2 ! What happened ?

·       well - the first process did the merge of the partition function. This changed the partition number of boundary 1990 from 4 to 3 and the one of  boundary value 1980 from 3 to 2. The call of the "switch partition" function with partition number 3 will now switch-out the rows of boundary value 1990. Therfore the 5 rows are gone and the 2 rows of 1980 will be kept. The merge function would still use the correct boundary value. So the list of boundary values looks ok but the content is wrong !

 

The solution to use the $PARTITION function also works in case two processes interfere between switch-out and the partition function merge.

TSQL Repro:

 

if exists ( select * from sys.procedures where name = 'dp_reset_test' )

drop procedure dp_reset_test

 

if exists ( select * from sys.procedures where name = 'dp_list_partitions' )

drop procedure dp_list_partitions

 

if exists ( select * from sys.procedures where name = 'dp_switch_partition_via_boundary' )

drop procedure dp_switch_partition_via_boundary

 

if exists ( select * from sys.procedures where name = 'dp_switch_partition_via_partno' )

drop procedure dp_switch_partition_via_partno

 

go

set nocount on

go

 

-- stored procedure to create partitioned test table

create procedure dp_reset_test

as

begin

 

if exists ( select * from sys.objects where name = 'dp_test1' and type = 'U' )

drop table dp_test1

if exists ( select * from sys.objects where name = 'dp_test1_clone' and type = 'U' )

drop table dp_test1_clone

if exists ( select * from sys.partition_schemes where name = 'ps_year' )

drop partition scheme ps_year

if exists ( select * from sys.partition_functions where name = 'pf_year' )

drop partition function pf_year

 

CREATE PARTITION FUNCTION pf_year (int)

AS

RANGE LEFT FOR VALUES ( 1960, 1970, 1980, 1990, 2000 )

CREATE PARTITION SCHEME ps_year AS PARTITION pf_year ALL TO ([PRIMARY])

create table dp_test1

( col1 int default 99 ,

col2 int,

col3 int primary key

) on ps_year(col3)

create table dp_test1_clone

( col1 int default 99 ,

col2 int,

col3 int primary key

)

 

insert into dp_test1 values ( 1966,1966,1966 )

insert into dp_test1 values ( 1971,1971,1971 )

insert into dp_test1 values ( 1972,1972,1972 )

insert into dp_test1 values ( 1984,1984,1984 )

insert into dp_test1 values ( 1985,1985,1985 )

insert into dp_test1 values ( 1986,1986,1986 )

insert into dp_test1 values ( 1987,1987,1987 )

insert into dp_test1 values ( 1988,1988,1988 )

print ''

end

go

 

-- stored procedure to print partitions info

create procedure dp_list_partitions ( @tabname char(20) )

as

begin

declare @p_number int

declare @p_rows int

declare @boundary_value int

declare p_details cursor for

select partition_number, rows, convert(int,sprv.value)

from sys.partitions sp,

sys.partition_functions spf,

sys.partition_range_values sprv

where object_id = OBJECT_ID(@tabname) and

spf.function_id = sprv.function_id and

sprv.boundary_id = sp.partition_number and

( sp.index_id = 1 or sp.index_id = 0 ) and

spf.name = 'pf_year'

order by partition_number

 

open p_details

FETCH NEXT FROM p_details

INTO @p_number, @p_rows, @boundary_value

 

print 'part no     ' +

'# rows        ' +

'boundary'

 

WHILE @@FETCH_STATUS = 0

begin

print convert(char(10), @p_number) +

'    ' +

convert(char(10), @p_rows) +

'    ' +

convert(char(10), @boundary_value)

FETCH NEXT FROM p_details

INTO @p_number, @p_rows, @boundary_value

end

close p_details

deallocate p_details

print ''

end

go

 

-- stored procedure to get rid of a partition by specifying the partion number

create procedure dp_switch_partition_via_partno ( @partno int, @boundary int )

as

begin

truncate table dp_test1_clone

alter table dp_test1 switch partition @partno to dp_test1_clone

alter partition function [pf_year]() merge range (@boundary)

end

go

 

-- stored procedure to get rid of a partition by specifying the boundary value

create procedure dp_switch_partition_via_boundary ( @boundary int, @merge_flag int )

as

begin

if( @merge_flag = 1 )

begin

truncate table dp_test1_clone

alter table dp_test1 switch partition $PARTITION.pf_year(@boundary) to dp_test1_clone

alter partition function [pf_year]() merge range (@boundary)

end

if( @merge_flag = 2 )

begin

truncate table dp_test1_clone

alter table dp_test1 switch partition $PARTITION.pf_year(@boundary) to dp_test1_clone

end

if( @merge_flag = 3 )

begin

alter partition function [pf_year]() merge range (@boundary)

end

end

go

 

-- test sample

-- create partitioned test table

execute dp_reset_test

print 'Test table with 5 boundary values : '

print ''

execute dp_list_partitions 'dp_test1'

print 'Switch partitions with boundary 1970,1980 via boundary value : '

print ''

execute dp_switch_partition_via_boundary 1970, 1

execute dp_switch_partition_via_boundary 1980, 1

execute dp_list_partitions 'dp_test1'

 

print 'as expected boundary values 1970 and 1980 are gone and 1990 remains with 5 rows'

print ''

print ''

 

-- reset test table

execute dp_reset_test

print 'Switch partitions with boundary 1970,1980 via partno : '

print ''

execute dp_switch_partition_via_partno 2, 1970

execute dp_switch_partition_via_partno 3, 1980

execute dp_list_partitions 'dp_test1'

 

print 'boundary values 1970 and 1980 are gone but 1990 remains with 2 rows which is unexpected !'

print ''

print ''

 

-- reset test table

execute dp_reset_test

print 'Switch partitions with boundary 1970,1980 via boundary value with '

print '"deferred merge of the partition function" : '

print ''

 

 

-- switch out only

execute dp_switch_partition_via_boundary 1970, 2

execute dp_switch_partition_via_boundary 1980, 2

 

-- merge partition function only

execute dp_switch_partition_via_boundary 1970, 3

execute dp_switch_partition_via_boundary 1980, 3

execute dp_list_partitions 'dp_test1'

 

print 'as expected boundary values 1970 and 1980 are gone and 1990 remains with 5 rows'

print ''

Cross Posted from http://blogs.microsoft.com/mssqlisv

Fri, 27 Mar 2009 00:30:00 GMT

Avoid using JDK Date APIs to handle timezone sensitive date and time

JDK APIs for Class “java.util.Date” and “java.sql.Timestamp(subclass of Date) including getHours(), getMinutes(), getSeconds(), getTimestamp() allow you to retrieve date/time related information. However, the JVM (Java Virtual Machine) won’t handle timezone sensitive data properly using these APIs. As matter of fact, these APIs were deprecated starting JDK 1.1 (http://java.sun.com/j2se/1.5.0/docs/api/java/util/Date.html). But I am still seeing ISV developers use these APIs in their JAVA applications, leading to incorrect results or behavior.

What happens is when date/time data is inserted into the SQL Server database, it’s stored correctly in SQL Server. However, when the date/time is read using the APIs mentioned above, the retrieved value is implicitly  converted to “local time” depending on where the host of JVM (Java Virtual Machine) is. For example, when a java application reads “1/8/2009 3:30:00 AM” from SQL Server database, you would get different results depending on the location of the app.

 

SQL Server (in pacific time zone):

create table datetime_tbl (id INTEGER unique not null, dateTime_v DATETIME not null)

insert into datetime_tbl (id, dateTime_v) values (1, '2009-01-08 03:30:00')

go

 

Java application:

Statement s = connection.createStatement();

query = "select id, dateTime from datetime_tbl where id = " + 1;

s.execute(query);

ResultSet rs = s.getResultSet();

rs.next();

String localApptimezone = Calendar.getInstance().getTimeZone().getID();

// -8 is offset of GMT to read the time as pacific time.

java.util.TimeZone timeZoneP = new java.util.SimpleTimeZone(-8*3600000, "GMT-8 (Pacific)");

java.util.Calendar cal= java.util.Calendar.getInstance(timeZoneP);

Timestamp tStamp = rs.getTimestamp(2, cal);

System.out.println("local application timezone: " + localApptimezone);

System.out.println("time: " + tStamp.toString() + " in " + timeZoneP.getID());

Results (note: 1 hour difference for Arizona test):

 

Location of app/JVM

Result

Washington (Pacific time zone)

local application timezone: America/Los_Angeles

time: 2009-01-08 03:30:00.0 in GMT-8 (Pacific)

Arizona (Mountain time zone)

local application timezone: America/Phoenix

time: 2009-01-08 04:30:00.0  in GMT-8 (Pacific)

 

The recommended way of handling this type of scenario is to utilize DateFormat and avoid getTimestamp() and other aforementioned APIs. Alternatively, you can convert the date/time to character string directly.

 

Statement s = connection.createStatement();

query = "select id, dateTime from datetime_tbl where id = " + 1;

s.execute(query);

ResultSet rs = s.getResultSet();

rs.next();

String localApptimezone = Calendar.getInstance().getTimeZone().getID();

// -8 is offset of GMT to read the time as pacific time.

java.util.TimeZone timeZoneP = new java.util.SimpleTimeZone(-8*3600000, "GMT-8 (Pacific)");

java.util.Calendar cal= java.util.Calendar.getInstance(timeZoneP);

java.text.DateFormat dateFormat = java.text.DateFormat.getInstance();

dateFormat.setTimeZone(timeZoneP);

System.out.println("local application timezone: " + localApptimezone);

System.out.println("time: " + dateFormat.format(cal.getTime()) + " in " + timeZoneP.getID());

 

Results (note: consistent results for both region tests as expected):

 

Location of app/JVM

Result

Washington (Pacific time zone)

local application timezone: America/Los_Angeles

time: 1/8/09 3:30 AM in GMT-8 (Pacific)

Arizona (Mountain time zone)

local application timezone: America/Phoenix

time: 1/8/09 3:30 AM in GMT-8 (Pacific)

 

Cross Posted from http://blogs.microsoft.com/mssqlisv

Mon, 02 Feb 2009 21:20:00 GMT

Zeroing in on blocking on seemingly unrelated tables

In one of our recent lab tests we were surprised to see blocking occur on a table that did not participate in the transaction being reported as the cause of the blocking.  From the sp_lock output we noticed an ‘X’ lock being held on the table, but we could guarantee that there was no insert, delete or update activity on the table.   Needless to say, this was odd and baffled us for while and it was only when we analyzed the definitions of the tables that we could determine the source of the problem.  Let’s take a look at this scenario via a simplified example and explain the cause of the blocking.

Consider the case where we have two tables, ORDERS and ORDER_LINE and a foreign-key relationship as shown below.

 

These tables are populated with the following 4 rows of data:

INSERT INTO ORDERS (ORDER_ID, CREATED_BY, DUE_DATE) VALUES

       (100, 'Burzin', '11/24/2003'),

       (101, 'Burzin', '11/28/2003');

      

INSERT INTO ORDER_LINE (ORDER_ID, ORDER_LINE_ID, ITEM, QUANTITY) VALUES

       (100, 1, 1028, 12),

       (101, 2, 1029, 24);

Furthermore, we had Read Committed Snapshot Isolation (RCSI) enabled on the database.

In our scenario we had two transactions executing the following two T-SQL statements via separate database connections (different SPIDs).

SPID-56

BEGIN TRAN

UPDATE ORDER_LINE SET ORDER_ID = 101 WHERE ORDER_ID = 100;

...

 

SPID-57

BEGIN TRAN

DELETE ORDERS WHERE ORDER_ID = 100;

...

 

(NOTE:  both these transactions operate on different tables.)

 

When these transactions were executed, we observed that blocking occured on the ORDERS table. This was a bit non-intuitive and baffling at first.  To get to the root of the problem we started by investigating the common causes but couldn’t find any reason for the two transactions operating on different tables to block each other.  However, on further analysis we noticed that the ORDER_LINE table had a foreign-key relationship to the ORDERS table and because of this when the ORDER_LINE table was updated, the ORDERS table was referenced to ensure that the foreign-key relationship was being preserved. This was why the DELETE statement held a shared (‘S’) lock on the rows of child table, ORDER_LINE, even though it wass only deleting from parent table, ORDERS.

 

This solved a part of the mystery.  Upon looking at the output of sp_lock we observed that there were two exclusive (X) Keylocks (see rows 11 and 12 in the screenshot below) acquired on the ORDERS table which were causing other transactions operating on the ORDERS table to block.

 

 

 

 

This once again was a bit confusing.  If the ORDERS table was being accessed solely to verify the referential integrity of the data and preserve the foreign-key relationship, why were exclusive locks being acquired?  Furthermore, since RCSI was enabled on the database we expected the read and write operations to not cause any blocking.

On digging deeper we determined that the database engine had to acquire an ‘X’ lock on the ORDERS table as soon as the second transaction tried to modify a row to prevent the possibility of the referential integrity being broken.  One could imagine a pathological case where the first transaction updated the ORDER_LINE table with a value that qualified the foreign-key relationship, and then the second transaction DELETE the value from the ORDERS table leaving the referential integrity broken and the foreign-key pointing to a phantom value. In fact this is exactly what occurs in the example presented above and is depicted by the screenshot where ‘S’ lock on ORDER_LINE table requested by the DELETE statement is blocked by the UDATE statement. Therefore the blocking chain can be viewed as: Queries accessing the ORDERS table à blocked by DELETE statement à blocked by UPDATE statement.  To prevent this situation from occurring, the database engine acquires an ‘X’ lock on a referenced table (ORDERS) as soon as it determines that there is a data modification transaction operating on it. This is by design and expected behavior as without this behavior there is a possibility of the referential integrity between the tables breaking.

If your application is encountering a similar issue you may want to try work around it by:

1.       Modifying your application so that two transactions do not operate on the same or referenced objects concurrently, e.g. in the above example the update and the delete could have been performed via a single transaction

2.       Removing the referential integrity.  This assumes that your application does not need this, or can enforce it via some other means, e.g. within the business logic layer of the application.

 

Cross Posted from http://blogs.microsoft.com/mssqlisv

Thu, 29 Jan 2009 19:19:00 GMT

Using SQL Server 2008 Management Data Warehouse for database monitoring in my application

SQL Server 2008 introduced what we call the Management Data Warehouse. The Management Data Warehouse is a relational database that contains the data that is collected from a server using the new SQL Server 2008 data collection mechanism. The Warehouse consists of these components:

·       An extensible data collector :
ms-help://MS.SQLCC.v10/MS.SQLSVR.v10.en/s10de_4deptrbl/html/b2c2bd5e-62c5-4129-a252-ef9439a05487.htm

·       A database schema which is indirectly extensible by additions in the data collection: ms-help://MS.SQLCC.v10/MS.SQLSVR.v10.en/s10de_4deptrbl/html/9874a8b2-7ccd-494a-944c-ad33b30b5499.htm

·       Stored procedures which allow the DBA to create their own data collection set and own the resultant data collection items: ms-help://MS.SQLCC.v10/MS.SQLSVR.v10.en/s10de_6tsql/html/9dd2824f-ea55-439b-8cd5-3a81fedb1432.htm

·       Three Data Collections Sets which are delivered with SQL Server 2008 and which can be enabled at any time: ms-help://MS.SQLCC.v10/MS.SQLSVR.v10.en/s10de_4deptrbl/html/0e49bcc7-3bab-4dd5-b5f5-62efa13864f3.htm

·       Standard reports delivered with SQL Server 2008 Management Studio display data collected by the three predefined Data Collection Sets. For DBA created Data Collections,  reports need to be generated by the DBA or the data can be queried with normal T-SQL queries.

However, this DBA/management toolset can be used by ISVs as well to collect and monitor performance data of SQL Server through their applications. The Management Data Warehouse as delivered with SQL Server can be used as a framework by an application as a basis for database monitoring functionality. Due to the high flexibility of the Management Warehouse this actually a very easy task! The usage of this new component as a database monitoring framework can make support way easier; it can make it simple to find answers to questions like: ‘Was the performance issue users complained about a few hours ago rooted on the database  side, the storage backend, or somewhere in the application?’

In the following sections we’ll describe the principle steps to enable SQL Server Management Data Warehouse and Data Collections:

After the customer deployed the application, the customer would need to configure the Management Data Warehouse with SQL Server Management Studio (Object Explorer à Management à Data Collection). Thereby the customer needs to define which database the database schema of the Management Data Warehouse needs to be deployed in. The customer now can be instructed to create a database of a certain name and a certain database file locations or to use a database the application already created during its installation. The application can take reference on this database and can query the database after the user contexts the application uses are assigned into the ‘dbo’ role of the Managment Warehouse database. SQL Server now will create now the following entities in the Management Warehouse database:

·       A user named ‘mdw_check_operator_admin’.

·       A schema named ‘core’ with a set of tables (see
ms-help://MS.SQLCC.v10/MS.SQLSVR.v10.en/s10de_4deptrbl/html/9874a8b2-7ccd-494a-944c-ad33b30b5499.htm

·       A schema ‘snapshot’ which contains the tables needed for the System Data Collector Set which are delivered with SQL Server 2008 already.  A closer description can be found here: ms-help://MS.SQLCC.v10/MS.SQLSVR.v10.en/s10de_4deptrbl/html/0e49bcc7-3bab-4dd5-b5f5-62efa13864f3.htm

Other functionality needed for making the whole warehouse working already is deployed in SQL Server’s msdb database.

In order to start the collection of the 3 collectors which are deployed with SQL Server the customer needs to have SQLAgent started and needs to enable the 3 collection sets manually. However these 3 collection sets only cover minimal aspects and often are not sufficient in a support case to detect whether the issue is on the database side or whether the issue is somewhere else. Even worse if the performance problem currently is not present, but happened hours or days ago, there hardly is any way to figure out what happened hours or days ago.

Another possibility to enable the Management Warehouse and to configure it can be done by T-SQL stored procedure as they are delivered in msdb.

Extending the Data Collections to our Need

The nice thing about the Management Data Warehouse is the fact that the data collector is a framework which allows the definition of own data collection sets and data collection items. The extension of the data collection or definition of an own data collection also will extend the schema of the warehouse automatically. For an application vendor it means collecting what their typical support scenarios are. As a great example on how to extend such a collection set, one could script the 3 collection sets which are delivered into a Query Window. This can be done by marking one of the sets, click the right mouse button and select the options ‘Script Data Collection As’ à ‘Create to’ à ‘Query Window’. Unfortunately all the collections are ‘TSQL Query Collector Type’ and ‘Query Activity Collector Type based, whereas the SQL Server Data Collector also can be enhanced reading Windows Performance Counters and SQL Server Trace information. For different Collector types please see:
ms-help://MS.SQLCC.v10/MS.SQLSVR.v10.en/s10de_4deptrbl/html/a7c8a8ed-d989-4d74-a885-188336c37863.htm

In order to demonstrate how to extend the data collector we use one of the cases which keep us busy very often supporting ISV applications running on SQL Server and databases in general. ‘How was the I/O performance during the time the users were complaining of the application being so slow’

 

Build a Data Collection which collects Query Performance Counters

Being faced with a support situation where hours or days ago users were complaining about performance of the application, it becomes extremely tricky to figure out what really happened. When everybody points to the database as source of the problems, it is hard to find any evidence at all unless the customer has a comprehensive set of Performance Counters recorded steadily. That however is a rare case all too often. No blame to a customer not having a comprehensive Perfmon Traces since the administration and archiving of the trace can be rather cumbersome. Therefore we want to show in our first case of extending the SQL Server 2008 Data Collector a case where we collect Performance Monitor data. In opposite to the usage of the Perfmon on the OS side, the Data Collector and the Management Data Warehouse take responsibility to administrate the data. As first we’ll show the script to extend the Data Collector and then walk through it step by step.

So the script generating such a collection would look like:

use msdb;

Begin Transaction

Begin Try

Declare @collection_set_id_1 int

Declare @collection_set_uid_2 uniqueidentifier

EXEC [dbo].[sp_syscollector_create_collection_set]

      @name=N'Disk Performance and SQL CPU',

      @collection_mode=1,

      @description=N'Collects logical disk performance counters and SQL Process CPU',

      @target=N'',

      @logging_level=0,

      @days_until_expiration=7,

      @proxy_name=N'',

      @schedule_name=N'CollectorSchedule_Every_5min',

      @collection_set_id=@collection_set_id_1 OUTPUT,

      @collection_set_uid=@collection_set_uid_2 OUTPUT

Select @collection_set_id_1, @collection_set_uid_2

 

Declare @collector_type_uid_3 uniqueidentifier

Select @collector_type_uid_3 = collector_type_uid From [dbo].[syscollector_collector_types] Where name = N'Performance Counters Collector Type';

Declare @collection_item_id_4 int

EXEC [dbo].[sp_syscollector_create_collection_item]

@name=N'Logical Disk Collection and SQL Server CPU',

@parameters=N'<ns:PerformanceCountersCollector xmlns:ns="DataCollectorType">

      <PerformanceCounters Objects="LogicalDisk"

            Counters="Avg. Disk Bytes/Read"

            Instances="*" />

      <PerformanceCounters Objects="LogicalDisk"

            Counters="Avg. Disk Bytes/Write"

            Instances="*" />

      <PerformanceCounters Objects="LogicalDisk"

            Counters="Avg. Disk sec/Read"

            Instances="*" />

      <PerformanceCounters Objects="LogicalDisk"

            Counters="Avg. Disk sec/Write"

            Instances="*" />

      <PerformanceCounters Objects="LogicalDisk"

            Counters="Disk Read Bytes/sec"

            Instances="*" />

      <PerformanceCounters Objects="LogicalDisk"

            Counters="Disk Write Bytes/sec"

            Instances="*" />

      <PerformanceCounters Objects="Process"

            Counters="% Privileged Time"

            Instances="sqlservr" />

      <PerformanceCounters Objects="Process"

            Counters="% Processor Time"

            Instances="sqlservr" />

</ns:PerformanceCountersCollector>',

@collection_item_id=@collection_item_id_4 OUTPUT,

@frequency=5,

@collection_set_id=@collection_set_id_1,

@collector_type_uid=@collector_type_uid_3

Select @collection_item_id_4

 

Commit Transaction;

End Try

Begin Catch

Rollback Transaction;

DECLARE @ErrorMessage NVARCHAR(4000);

DECLARE @ErrorSeverity INT;

DECLARE @ErrorState INT;

DECLARE @ErrorNumber INT;

DECLARE @ErrorLine INT;

DECLARE @ErrorProcedure NVARCHAR(200);

SELECT @ErrorLine = ERROR_LINE(),

       @ErrorSeverity = ERROR_SEVERITY(),

       @ErrorState = ERROR_STATE(),

       @ErrorNumber = ERROR_NUMBER(),

       @ErrorMessage = ERROR_MESSAGE(),

       @ErrorProcedure = ISNULL(ERROR_PROCEDURE(), '-');

RAISERROR (14684, @ErrorSeverity, 1 , @ErrorNumber, @ErrorSeverity, @ErrorState, @ErrorProcedure, @ErrorLine, @ErrorMessage);

 

End Catch;

 

GO

Now let’s go through it step by step. As mentioned vital functionality already is delivered in the SQL Server database msdb. Therefore we execute the script above in msdb.

First step is to create an own Data Collection Set.  The name we give will show up in SQL Server Management Studio and at the end will be the name we use to query for results.

Also very important information to define is the data retention period with the parameter named @days_until_expiration. Since we don’t want to end up blowing the volume of the Management Data Warehouse beyond the size of the application to monitor, this parameter needs to be set. Data beyond that expiration age is getting purged.

Other important information to give is the schedule which should be used to execute the collection. The creation of the Management Data Warehouse created pre-defined schedules with the following names:

·       CollectorSchedule_Every_5min                                                                                                    

·       CollectorSchedule_Every_10min                                                                                                    

·       CollectorSchedule_Every_15min                                                                                                   

·       CollectorSchedule_Every_30min                                                                                                    

·       CollectorSchedule_Every_60min                                                                                                   

·       CollectorSchedule_Every_6h                                                                                                       

 

The names also can be checked in msdb.dbo.sysschedules. Sure own schedules can also be defined and can be used instead.

Another parameter is worth mentioning. Having read the Books Online Documentation about the Data Collector we pointed out earlier, it got mentioned that one can cache the data for a while before uploading into the Management Data Warehouse. Whether one wants to do so is determined with the parameter called @collection_mode. In our case the value of 1 means a direct upload after the collection without any caching.

The data about the collection set now is entered into a table in msdb. The important thing now is to get the ID and UID  of the collection set which are delivered as output of the procedure to create the collection set.

The set of collections already existing also can be retrieved out of msdb.dbo.syscollector_collection_sets_internal

The second step is to get the UID of one of the 4 different Data Collection types we mentioned above already. This is done with this query in the script:

Select @collector_type_uid_3 = collector_type_uid From [dbo].[syscollector_collector_types] Where name = N'Performance Counters Collector Type';

 

The other 3 Data Collection types would need to be defined by these names:

·       Generic T-SQL Query Collector Type

·       Generic SQL Trace Collector Type

·       Query Activity Collector Type

 

Now the third and probably least documented step so far. The specific data collection item is going to be defined. The first parameter is the name of the collection. The second parameter which simply has the name ‘parameter’ defines what really has to happen. As one can see it is a XML structure which has kind of a one line header and then the same structure for any of the definition of a counter again. Looks pretty simple as one can see above where we define 6 different counters of Logical Disks over all disk partitions visible to the server. We also added collecting the Privileged CPU and the overall CPU SQL Server is consuming. Everything is set now. One just needs to enable the new Data Collection Set with the stored procedure sp_syscollector_start_collection_set (see: ms-help://MS.SQLCC.v10/MS.SQLSVR.v10.en/s10de_6tsql/html/d8357180-f51e-4681-99f9-0596fe2d2b53.htm ) and have SQLAgent running and the data collection starts.

 

How to read the data

In order to programmatically read the data one needs to look a bit into the different tables of the Management Data Warehouse schema where the data is getting stored in:

snapshots.performance_counter_values: will store the raw and formatted data identified with a performance_counter_instance_id, the snapshot_id and the date of the snapshot. So we already have two important components of data with a query like this:

select spcv.formatted_value as 'Formatted Value',

spcv.collection_time as 'Collection Time'

from snapshots.performance_counter_values spcv

order by spcv.collection_time desc

In order to get the Path and the name of the counters based on the id, we need to join the table snapshots.performance_counter_instances with the table containing the values. A query could look like this:

select spci.path as 'Counter Path', spci.object_name as 'Object Name',

spci.counter_name as 'counter Name', spci.instance_name,

spcv.formatted_value as 'Formatted Value',

spcv.collection_time as 'Collection Time'

from snapshots.performance_counter_values spcv,

snapshots.performance_counter_instances spci

where spcv.performance_counter_instance_id = spci.performance_counter_id

order by spcv.collection_time desc

So far so good. However the data we are selecting so far might be from different collection sets even from different database instances (if one allows centralizing). Therefore we need to restrict the data now to the data collected by our custom build data collection set. In order to do so we need to look at several different tables to make this connection. At the end the query looks like:

select spci.path as 'Counter Path', spci.object_name as 'Object Name',

spci.counter_name as 'counter Name', spci.instance_name,

spcv.formatted_value as 'Formatted Value',

spcv.collection_time as 'Collection Time',

csii.instance_name as 'SQL Server Instance'

from snapshots.performance_counter_values spcv,

snapshots.performance_counter_instances spci,

msdb.dbo.syscollector_collection_sets_internal scsi,

core.source_info_internal csii,

core.snapshots_internal csi

where spcv.performance_counter_instance_id = spci.performance_counter_id and

scsi.collection_set_uid=csii.collection_set_uid and

csii.source_id = csi.source_id and csi.snapshot_id=spcv.snapshot_id and

scsi.name = 'Disk Performance and SQL CPU'

order by spcv.collection_time desc

 

Well that is the first example of how can extend the SQL Server 2008 Data Collector and use the Management Data Warehouse as a base for monitoring through an application. We’ll continue the series introducing more extensions covering other areas of monitoring or suppor Cross Posted from http://blogs.microsoft.com/mssqlisv

Thu, 27 Nov 2008 01:35:00 GMT

OPTIMIZE FOR UNKNOWN – a little known SQL Server 2008 feature

Using parameterized queries is a well known SQL Server Best Practice. This technique ensures caching and reuse of existing query execution plans (instead of constantly compiling new plans), as well as avoiding SQL injection by mandating that input data be data type safe.

See more about SQL Server parameterization Best Practices here: http://blogs.msdn.com/sqlprogrammability/archive/2007/01/13/6-0-best-programming-practices.aspx

An application that I work with presented me with an interesting dilemma; It wanted to utilize the benefits of plan reuse but the parameter values that the application initially sends to SQL Server are not representative of the values passed in the subsequent re-execution of the statement. SQL Server compiled and cached a ‘good’ plan for the first parameter values. Unfortunately, this had the unintended side effect of caching a poor execution plan for all subsequent parameter values. To make this clearer let’s look at the following example query;

Select * from t where col1 > @P1 or col2 > @P2 order by col1;

Let’s assume for simplicities sake that col1 is unique and is ever increasing in value, col2 has 1000 distinct values and there are 10,000,000 rows in the table, and that the clustered index consists of col1, and a nonclustered index exists on col2.

Imagine the query execution plan created for the following initially passed parameters: @P1= 1 @P2=99

These values would result in an optimal queryplan for the following statement using the substituted parameters:

Select * from t where col1 > 1 or col2 > 99 order by col1;

Now, imagine the query execution plan if the initial parameter values were:  @P1 = 6,000,000 and @P2 = 550.

As before, an optimal queryplan would be created after substituting the passed parameters:

Select * from t where col1 > 6000000 or col2 > 550 order by col1;

These two identical parameterized SQL Statements would potentially create and cache very different execution plans due to the difference of the initially passed parameter values. However, since SQL Server only caches one execution plan per query, chances are very high that in the first case the query execution plan will utilize a clustered index scan because of the ‘col1 > 1’ parameter substitution. Whereas, in the second case a query execution plan using index seek would most likely be created.

Unfortunately if the initial parameter values are similar to the first example above, then a ‘table scan’ execution plan gets created and cached, even though most of the following queries would rather use a plan that contains the index seek.

There are a number of ways to work-around this issue;

·      Recompile every time the query is executed using the RECOMPILE hint - This can be very CPU intensive and effectively eliminates the benefits of caching queryplans.

·      Unparameterize the query – Not a viable option in most cases due to SQL injection risk.

·      Hint with specific parameters using the OPTIMIZE FOR hint (However, what value(s) should the app developer use?) This is a great option if the values in the rows are static, that is; not growing in number, etc. – However in my case the rows were not static.

·      Forcing the use of a specific index

·      Use a plan guide – Using any of the recommendations above.

SQL Server 2008 provides another alternative: OPTIMIZE FOR UNKNOWN

 

SQL Server 2008 provides a different alternative; the OPTIMIZE FOR UNKNOWN optimizer hint. This hint directs the query optimizer to use the standard algorithms it has always used if no parameters values had been passed to the query at all. In this case the optimizer will look at all available statistical data to reach a determination of what the values of the local variables used to generate the queryplan should be, instead of looking at the specific parameter values that were passed to the query by the application.

Full documentation of optimizer hints can be found here:

http://msdn.microsoft.com/en-us/library/ms181714(SQL.100).aspx

Example:

@p1=1, @p2=9998,

Select * from t where col > @p1 or col2 > @p2 order by col1

option (OPTIMIZE FOR (@p1 UNKNOWN, @p2 UNKNOWN))

Using this new optimizer hint option has allowed the ISV to generate queries that result in the benefits of parameterization; such as plan reuse, while eliminating the problems caused by the caching of queryplans that were created using nontypical initially passed parameter values.

NOTE: This new optimizer hint option, like all optimizer hints, should be used only by experienced developers and database administrators in cases where SQL Server cannot create an optimal plan.

Cross Posted from http://blogs.microsoft.com/mssqlisv

Fri, 31 Oct 2008 23:49:00 GMT

SQL Server 2008 : new binary – hex string conversion functionality can dramatically improve related query performance by orders of magnitude.

In previous SQL Server releases it wasn’t possible to convert binary data to string characters in hex format directly, because SQL Server did not have a built-in Transact-SQL command for converting binary data to a hexadecimal string. The Transact-SQL CONVERT command converted binary data to character data in a one byte to one character fashion. SQL Server would take each byte of the source binary data, convert it to an integer value, and then uses that integer value as the ASCII value for the destination character data. This behavior applied to the binary, varbinary, and timestamp datatypes.

 

The only workarounds were to use either a stored procedure as described in a Knowledge Base Article:  "INFO: Converting Binary Data to Hexadecimal String" (  http://support.microsoft.com/kb/104829   ) or by writing a CLR function.

 

An ISV I work with doesn’t support CLR and therefore they implemented their own version of a custom convert function in form of a stored procedure. This one was even faster than everything else they found on the Internet.

 

NEW – IN SQL SERVER 2008 the convert function was extended to support binary data – hex string conversion. It looks like a tiny improvement almost not worth mentioning.

 

However, for the ISV it was a big step forward as some critical queries need this functionality. Besides the fact that they no longer have to ship and maintain their own stored procedure, a simple repro showed a tremendous performance improvement.

 

Repro:

=====

 

I transformed the procedure described in the KB article mentioned above into a simple function. The stored procedure below will create a simple test table with one varbinary column and insert some test rows in 10K packages ( e.g. nr_rows = 100 -> 1 million rows in the table ).

 

The repro shows two different test cases:

1. insert 0x0 two million times

2. insert 0x0123456789A12345 two million times

 

Depending on the length of the value the disadvantage of the stored procedure solution will be even bigger. On my test machine the results of the test queries below were:

(both tests were done with the same SQL Server 2008 instance - no change of any settings)

 

1. two million times value 0x0

 

    a, using stored procedure : about 3460 logical reads, no disk IO, ~52 secs elapsed time

    b, using new convert feature : about 5200 logical reads,  no disk IO, < 1 sec elapsed time

 

2. two million times value 0x0123456789A12345

    a, using stored procedure : about 3460 logical reads, no disk IO, ~157 secs elapsed time

    b, using new convert feature : about 5200 logical reads,  no disk IO, < 1 sec elapsed time

 

Repro Script:

========

 

create function sp_hexadecimal ( @binvalue varbinary(255) )

returns varchar(255)

as

begin

      declare @charvalue varchar(255)

      declare @i int

      declare @length int

      declare @hexstring char(16)

      select @charvalue = '0x'

      select @i = 1

      select @length = datalength(@binvalue)

      select @hexstring = '0123456789abcdef'

      while (@i <= @length)

      begin

            declare @tempint int

            declare @firstint int

            declare @secondint int

            select @tempint = convert(int, substring(@binvalue,@i,1))

            select @firstint = floor(@tempint/16)

            select @secondint = @tempint - (@firstint*16)

            select @charvalue = @charvalue +

            substring(@hexstring, @firstint+1, 1) +

            substring(@hexstring, @secondint+1, 1)

            select @i = @i + 1

      end

return ( @charvalue )

end

 

 

create procedure cr_conv_test_table ( @value varbinary(16), @nr_rows int )

as

begin

      declare @exist int

      declare @counter int

      set NOCOUNT ON

      set statistics time off

      set statistics io off

      set statistics profile off

      set @exist = ( select count(*) from sys.objects

                              where name = 'conv_test_table' and

                                    type = 'U' )

      if( @exist = 1 )

            drop table conv_test_table

 

      set @exist = ( select count(*) from sys.objects

                              where name = 'conv_test_table_temp' and

                                    type = 'U' )

      if( @exist = 1 )

            drop table conv_test_table_temp

 

      create table conv_test_table ( varbincol varbinary(16) )

      create table conv_test_table_temp ( varbincol varbinary(16) )

      set @counter = 10000

      while @counter > 0

            begin

                  insert into conv_test_table_temp values ( @value )

                  set @counter = @counter - 1

            end

      set @counter = @nr_rows

      while @counter > 0

      begin

            insert into conv_test_table select * from conv_test_table_temp

            set @counter = @counter - 1

      end

end

 

-- create 2 million test rows

execute cr_conv_test_table 0x0, 200

 

set statistics time on

set statistics io on

 

-- compare runtime of stored procedure with new convert feature

select count(*) from conv_test_table

 where dbo.sp_hexadecimal(varbincol) = '0x00'

select count(*) from conv_test_table

 where CONVERT(varchar(255),varbincol,1) = '0x00'

 

-- create 2 million test rows

execute cr_conv_test_table 0x0123456789A12345, 200

 

set statistics time on

set statistics io on

 

-- compare runtime of stored procedure with new convert feature

select count(*) from conv_test_table

 where dbo.sp_hexadecimal(varbincol) = '0x0123456789A12345'

select count(*) from conv_test_table

 where CONVERT(varchar(255),varbincol,1) = '0x0123456789A12345'

 

 

 

Cross Posted from http://blogs.microsoft.com/mssqlisv

Fri, 22 Aug 2008 23:54:00 GMT

How to create an autonomous transaction in SQL Server 2008

I have been asked by many customers and partners, especially those migrating from Oracle, this question: how to create an autonomous transaction in SQL Server? It turns out to be a tricky thing to do since SQL Server doesn't have built-in autonomous transaction support like Oracle.

An Autonomous transaction is essentially a nested transaction where the inner transaction is not affected by the state of the outer transaction. In other words, you can leave the context of current transaction (outer transaction) and call another transaction (autonomous transaction). Once you finish work in the autonomous transaction, you can come back to continue on within current transaction. What is done in the autonomous transaction is truly DONE and won't be changed no matter what happens to the outer transaction. To make it easier to understand, here is an example of the described scenario.

BEGIN TRAN OuterTran

      INSERT TABLE1

      BEGIN “AUTONOMOUS” TRAN InnerTran

            INSERT TABLE2

      COMMIT “AUTONOMOUS” TRAN InnerTran

ROLLBACK TRAN OuterTran

The above pseudo script is meant to preserve result of INSERT TABLE2”. In SQL Server 2008 or prior versions, "ROLLBACK TRAN" would always rollback all inner transactions to the outermost "BEGIN TRAN" statement (without specifiying savepoint). So the "InnerTran" transaction would be rolled back as well, which is not the desired behavior for the particular scenario.

You could wonder why we need an autonomous transaction in the first place. Why can't we just implement two separate transactions so they don't interfere with each other? There are scenarios where people do need logic structured like this. Logging errors in database is one of the most common scenarios. Below is a TSQL script demonstrating a nested transaction where the inner transaction attempts to save the runtime errors in a table.

USE TEMPDB

GO

CREATE TABLE ErrorLogging (logTime DATETIME, msg VARCHAR(255))

CREATE TABLE TestAT (id INT PRIMARY KEY)

GO

CREATE PROCEDURE usp_ErrorLogging

      @errNumber INT

AS

      INSERT INTO ErrorLogging VALUES (GETDATE(), 'Error ' + CAST(@errNumber AS VARCHAR(8)) +' occurred.')

GO

 

DECLARE @ERROR AS INT

INSERT INTO TestAT VALUES (1)

BEGIN TRAN OuterTran

      INSERT INTO TestAT VALUES (1) -- This will raise primary key constraint violation error

     

      SELECT @ERROR = @@ERROR

      IF @ERROR <> 0

      BEGIN

            BEGIN TRAN InnerTran

                  EXEC usp_ErrorLogging @ERROR

            COMMIT TRAN InnerTran

     

            ROLLBACK TRAN OuterTran

      END

 

IF @@TRANCOUNT > 0     

COMMIT TRAN OuterTran

GO

SELECT * FROM TestAT

SELECT * FROM ErrorLogging

GO

If you run above script against SQL Server, you would see no error message recorded in table "ErrorLogging" due to the "ROLLBACK TRAN OuterTran" statement. So, how can we make it work?

In SQL Server 2008, you can implement a loopback linked server to achieve the same goal. For more information about loopback linked server, check Books Online for details (http://msdn.microsoft.com/en-us/library/ms188716.aspx).

 

USE MASTER

GO

EXEC sp_addlinkedserver @server = N'loopback',@srvproduct = N' ',@provider = N'SQLNCLI', @datasrc = @@SERVERNAME

GO

EXEC sp_serveroption loopback,N'remote proc transaction promotion','FALSE'

Go

Note 'remote proc transaction promotion' is a new option on SQL Server 2008, which allows you to control whether or not you want to enlist remote stored procedure call in a distributed transaction. When this option is off  (FALSE) as we set in the above example, the local transaction will not be promoted to distributed transaction. This is how we are able to separate outer and inner transactions in a "autonomous transaction" fashion.

The Inner transaction above can be replaced by:

      BEGIN TRAN InnerTran

            EXEC loopback.tempdb.dbo.usp_ErrorLogging @ERROR

      COMMIT TRAN InnerTran

Full working script is in the appendix below. I want to point out that this method of using a loopback linked server might not scale well if it's executed very frequently. And it only works in SQL Server 2008 due to new server option of 'remote proc transaction promotion' as discussed above. As always, test before you use it.

If you are looking for alternative ways of creating autonomous transaction on SQL 2008 or 2005, you have these options:

  1. Loopback connection from SQLCLR procedure to start a new transaction. Compared to more rigid structure need of loopback linked server, SQLCLR is more flexible and gives you more control over how you want to handle interaction with database. If the logic of the autonomous transaction includes computational tasks, it's one of SQLCLR's strengths to provide performance gain as extra benefit.
  2. Using table variable to save data within transaction. Table variables are not affected by transaction rollback thus serve as temporary buffer for transaction data. Once transaction is done, you can dump data out of table variable to a permanent table. Table variables have limited scope and are less flexible. Usually they would also be slower due to lack of index/statistics. However, it does offer you a pure TSQL option with no need to create anything new.
  3. Loopback connection from Extended Stored Procedures.

Extended Stored Procedure are on the SQL Server deprecation list and we strongly recommend NOT using it.

In a future blog, I'll provide sample SQLCLR code and a script using a table variable to create autonomous transactions. I will also compare their performance differences with loopback linked server in a scalability test. Stay tuned.

Appendix

USE MASTER

GO

EXEC sp_addlinkedserver @server = N'loopback',@srvproduct = N' ',@provider = N'SQLNCLI', @datasrc = @@SERVERNAME

GO

EXEC sp_serveroption loopback,N'remote proc transaction promotion','FALSE'

EXEC sp_serveroption loopback,N'RPC OUT','TRUE' -- Enable RPC to the given server.

Go

USE TEMPDB

GO

CREATE TABLE ErrorLogging (logTime DATETIME, msg VARCHAR(255))

CREATE TABLE TestAT (id INT PRIMARY KEY)

GO

CREATE PROCEDURE usp_ErrorLogging

      @errNumber INT

AS

      INSERT INTO ErrorLogging VALUES (GETDATE(), 'Error ' + CAST(@errNumber AS VARCHAR(8)) +' occurred.')

GO

 

DECLARE @ERROR AS INT

INSERT INTO TestAT VALUES (1)

BEGIN TRAN OuterTran

      INSERT INTO TestAT VALUES (1) -- This will raise primary key constraint violation error

     

      SELECT @ERROR = @@ERROR

      IF @ERROR <> 0

      BEGIN

            BEGIN TRAN InnerTran

                  EXEC loopback.tempdb.dbo.usp_ErrorLogging @ERROR

            COMMIT TRAN InnerTran

           

            ROLLBACK TRAN OuterTran

      END

 

IF @@TRANCOUNT > 0     

COMMIT TRAN OuterTran

GO

SELECT * FROM TestAT

SELECT * FROM ErrorLogging

GO
Cross Posted from http://blogs.microsoft.com/mssqlisv

Fri, 11 Jul 2008 23:51:00 GMT

UPDATE with OUTPUT clause – Triggers – and SQLMoreResults

NOTE:  the code in this BLOG is TSQL instead of ODBC calls.  Since ODBC can be hard to understand and other API’s will have the same basic issues, I decided to use the simpler and more concise TSQL, which should also appeal to a wider audience.

 

An ISV I work with recently ran into an interesting problem; here is the description and solution.

 

PROBLEM:

Adding an unexpected trigger caused application code to fail due to incomplete SQL Syntax, and not reading through all returned results.

 

The ISV wanted to utilize the OUTPUT Clause of the UPDATE statement in their ODBC (SNAC) based application. The OUTPUT clause is very useful in providing data back to the application regarding the row, or rows, which were updated (or: inserted / deleted).  In the example I use below, the application is interested in knowing the date/time of the updated row(s).

 

This could be accomplished by issuing the following statement:

UPDATE T SET COL2 = @Pcol2, COL3 = getdate() OUTPUT CAST(INSERTED.COL3 AS varchar(30))WHERE COL1 = @Pcol1

 

The ISV coded up the application expecting a return value for number of rows affected, and if that value was greater than 0 then it also returned the value of the inserted date/time.

 

This worked well, until an external Partner application added a trigger to the table listed in the UPDATE statement.

 

Example: CREATE TRIGGER [dbo].[TTrigger1] on [dbo].[T] after update as update t2 set col3 = 0

 

Now the application failed on the UPDATE statement with the following error message:

[Microsoft][SQL Native Client][SQL Server]The target table 'T' of the DML statement cannot have any enabled triggers if the statement contains an OUTPUT clause without INTO clause.

 

The error message is self-explanatory, but was a surprise to the ISV application (and the application developer).  The developer did not expect a trigger to ever be created on the table.

There are two different methods of getting OUTPUT data from an UPDATE statement;

·       UPDATE with the OUTPUT clause only – this returns output results directly as part of the statement. This option cannot have a trigger defined on the table.

·       UPDATE with OUTPUT and INTO clauses – this returns the output a specific table, or table variable. This option must be used if there is any possibility the table will have a trigger on it at any point.

·       See the following website for complete the OUTPUT Clause documentation:

http://msdn.microsoft.com/en-us/library/ms177564.aspx

The developer then utilized the following syntax to send the same statement to SQL Server, and also to get the expected result back: declare @p165 table (col2 varchar(30));UPDATE T SET COL2 = ?, COL3 = getdate() OUTPUT CAST(INSERTED.COL3 AS varchar(30)) into @p165 WHERE COL1 = 1;select * from @p165

 

Now a subtlety occurred, can you guess what it was? If you guessed that additional results are returned you are correct.

The ODBC code returned data in a loop utilizing the following API calls:  SQLFetch, SQLNumResultCols, SQLRowCount, SQLMoreResults:

·       The first results returned were the number of rows affected by the trigger, not the number of rows affected by the UPDATE statement, which was what the application was actually expecting

·       The second set of results were the number of rows affected by the UPDATE statement

·       The third set of results were the number of rows returned by the SELECT statement reading the table variable

·       And finally, the actual data from the updated row(s) – which is what we really wanted in the first place!

So, the lessons to be learned here are:

1.   Be aware that triggers will affect your UPDATE statements if utilizing the OUTPUT clause

2.    You should utilize the INTO clause to avoid the issue

3.    Always use SQLMoreResults to read all of the result-sets that could be returned from SELECT, UPDATE, INSERT, or DELETE statements.

4.    Triggers should include the ‘SET NOCOUNT ON’ statement to avoid returning the ‘affected number of rows’.

SOLUTION:

The application was changed to utilize the INTO clause, and SQLMoreResults was used to return all the resulting data.  Using SET NOCOUNT ON in trigger logic is also a best practice that prevents additional results ‘Rows affected’ from being generated.

 

Here is a script to duplicate the issues I’ve described:

USE tempdb

GO

------You may want to run this script in steps from comment – to comment

------so you can follow along, instead of running the entire script at once

 

CREATE TABLE t(

      [col1] [int] NOT NULL,

      [col2] [varchar](30) NULL,

      [col3] [datetime] NULL

) ON [PRIMARY]

GO

insert into t values (1,'abc', getdate())

insert into t values (1,'abc', getdate())

insert into t values (1,'abc', getdate())

GO

select * from t

GO

UPDATE t SET col2 = 'Peter', col3 = getdate()

OUTPUT CAST(INSERTED.col3 AS varchar(30))WHERE col1 = 1

GO

select * from t

GO

------So far everything is good, Now let’s add the new table and the trigger

CREATE TABLE t2(

      [col1] [int] NULL,

      [col2] [datetime] NULL

) ON [PRIMARY]

GO

insert into t2 values (2, getdate())

insert into t2 values (2, getdate())

GO

select * from t2

GO

------In this example, the trigger: ttr1 will update the rows

------of a second table: t2

CREATE TRIGGER ttr1 on t after update as update t2 set col1 = 0

GO

------OK, let’s try now with the trigger on

UPDATE t SET col2 = 'Peter', col3 = getdate() OUTPUT CAST(INSERTED.col3 AS varchar(30))WHERE col1 = 1

GO

------Chances are good you got the following error message

--Msg 334, Level 16, State 1, Line 1

--The target table 't' of the DML statement cannot have any enabled triggers --if the statement contains an OUTPUT clause without INTO clause.

----- let’s fix that now.

declare @p1 varchar(30)

UPDATE t SET col2 = 'Peter', col3 = getdate() OUTPUT CAST(INSERTED.col3 AS varchar(30))into @p1 WHERE col1 = 1

GO

------Notice this failed as well with the following error message

--Msg 1087, Level 16, State 1, Line 2

--Must declare the table variable "@p1".

------We need to use a table

------for this to work correctly we must use a table or

------a table variable where the ‘INTO’ data will reside,

------and be retrieved from

declare @p1 table (col2 varchar(30))

UPDATE t SET col2 = 'Peter', col3 = getdate() OUTPUT CAST(INSERTED.col3 AS varchar(30))into @p1 WHERE col1 = 1

select * from @p1

--Now you get what we were originally looking for

--    the date/times of the rows that were updated

--Look at the results under the 'Messages' tab as well...

--you will see the number of rows affected:

--    2 for the rows inserted as part of the trigger

--    3 for the rows Updated

--    and 3 for the rows we selected from the table variable

--Now, you can see that the application must utilize SQLMoreResults if it

--wants to return all the valid results.

 

Cross Posted from http://blogs.microsoft.com/mssqlisv

Thu, 26 Jun 2008 20:13:00 GMT

Use SQL Server replay tools to reproduce and resolve customer issues

For many ISVs run that into issues at customer sites, it is sometimes difficult to isolate underlying problems, especially on a 24x7 production environment, where limitations apply to real time troubleshooting and live debugging. In situations like this, constructing a repro scenario in a separate environment would be ideal to minimize impact to live production system, and to speed up resolution process.

 

SQL Server Profiler

Allow me introduce SQL Profiler, which offers replay trace function.  Well, it’s not something new. First shipped in SQL Server 7.0, the feature has gone through many improvements in later releases. You can use the tool to take a captured trace as input and replay it against test database(s). It helps identify issues that could be reproduced by replaying the events in the trace. Profiler itself uses ODBC only. In SQL Server 2005 and 2008, the replay function can be configured to use multiple threads (up to 64) to replay workloads.

Advantages:

1.       SQL Server profiler is a built-in tool with full support of Microsoft product team. It works out of box.

2.       Easy to set up and run. Capture a trace using predefined replay template with all required events, and replay it against original database(s) (target machine needs to meet certain requirements http://msdn2.microsoft.com/en-us/library/ms175525.aspx)

3.       In addition to multi-threaded replay, it also provides option of step through to replay events in the order they were traced.

Disadvantages:

1.       Certain events can’t be replayed including replication, events involving GUID, session binding events, operations on Blobs using bcp, full-text, READTEXT, WRITETEXT, and etc. See BOL for more details (http://msdn2.microsoft.com/en-us/library/ms188238.aspx)

2.       The tool does not support multiple machine replay (running multiple instances of Profiler from multiple machines to replay the trace).

3.       Profiler GUI tool is client side tracing and might be intrusive and generate significant performance overhead when capturing events for replay. Be careful of what events to capture and consider using server side tracing (sp_trace_create).

 

RML Utilities

Starting in SQL Server 2000, SQL Server Customer Support Services team (CSS) started a project of similar tool, called Ostress, with higher flexibility and scalability to help troubleshoot some of the challenging SQL problems. The latest version is packaged in “Replay Markup Language(RML) Utilities” supporting both SQL 2005 and SQL 2000 (http://support.microsoft.com/kb/944837). The tool can replay multi-threaded events as profiler does but with multiple machine replay support. It can simulate up to 1000 concurrent threads. The tool has a component called Ostress (just like old version), which takes a TSQL batch file and “stress” test it by opening arbitrary number of connections and iterate the TSQL batch in each connection configurable number of loops. This is useful when workload can be characterized as same or similar batch from various number of users (connections).

 

Advantages:

1.       The tool offers both replay and stress test options.

2.       It supports multiple machine replay (multiple instances of OStress) with up to 1000 concurrent threads.

3.       OStress supports 7.0, 2000, and 2005 trace formats.

Disadvantages:

1.       The tool is provided as is, no technical support from Microsoft. But you can submit questions via contact in readme of the tool.

2.       Requires extra steps to process trace file and convert to RML format before being replayed.

3.       Does not support MARS replay.

 

Recommendation

When to use SQL profiler and when to use RML Utilities? If you have a workload that can be replayed/reproduced with no or low concurrency requirement (<64 concurrent threads), use profiler that offers flexibility of step through or multi-threaded replay options. If you need to replay a workload with high concurrency requirement (> 64 threads) or an isolated batch that can be “stress” tested for simulation, use RML Utilities. Keep in mind, for concurrency replay, full sync of ordered events is very hard to replay and no tools exist today to exactly duplicate the original trace. So the issues that happened on traced source server might not be reproduced consistently afterwards even on same environment.

 

Both tools above are for database replay. For a simulation test of multi-tier application environment, consider load-test tool of Visual Studio (Team edition) or 3rd-party vendor products.

Cross Posted from http://blogs.microsoft.com/mssqlisv

Tue, 27 May 2008 20:49:00 GMT

SQL Server Intermittent Connectivity Issue

Recently many customers of an ISV I work with, reported intermittent connectivity issues when running the ISV application on SQL Server. Some customers reported the issue to be SQL Server 2005 specific. Others stated that they are experiencing the same issue on both SQL Server 2000 and 2005. Due to the intermittent nature, and the variation of the issue, it took us quite a while  to collect all the data, (odbc trace, netmon trace, sql trace…), analyse it, and understand the exact cause.

SynAttackProtect

The first issue we found was a subtle Winsock behavior change in Window 2003 SP1. Windows 2003 SP1 introduces a configurable registry setting: SynAttackProtect, that protects the server from network Denial-Of-Service attacks.  By default the protection is on.  In a SQL Server environment, when the number of simultaneous client connection requests is more than the system can handle and SQL Server backlog queue is full, the client will receive a 'connection failed' error from SQL Server:

TCP Provider: An existing connection was forcibly closed by the remote host

The SQL Protocols team has a good Blog that explains the detailed interaction between SynAttackProtect setting and SQL Server. See http://blogs.msdn.com/sql_protocols/archive/2006/04/12/574608.aspx.

In Windows 2003, this issue could be worked-around by configuring the registry setting to disable SynAttackProtect.

1)    Launch regedit.exe

2)    Add DWORD value named SynAttackProtect under registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\

3)    Set data value to 0

See http://technet2.microsoft.com/windowsserver/en/library/8d3a9f4d-13d1-4280-ac57-30242504d8ba1033.mspx?mfr=true for additional information.

After applying the registry change, two customers reported the intermittent connection issue went away. Both customers had been stress-testing SQL Server.  SynAttackProtect is more likely to become an issue in a lab environment where SQL Server is being stress-tested with extreme load.  Customers running stress-testing in lab environments should turn off SynAttackProtect.  I am not sure I’d recommend proactively turning it off in a production system given the potential security risk.  If a production system ever runs into the SynAttackProtect issue, the where/why of the large number of connection requests should be examined first.

Windows “Scalable Networking Pack”

Windows Scalable Networking Pack was a second network stack change that was released as part of Windows 2003 SP1 + KB91222, or Windows 2003 SP2. With the Scalable Networking Pack, the TCP Chimney Offload feature is enabled by default to increase performance. However implementations on certain network cards are problematic when TCP Chimney Offload enabled, and can cause intermittent connection drop. When the connection is dropped due to incompatibility between the network card and Windows Scalable Networking Pack, typical error message is

[08S01] [Microsoft][SQL Native Client]Communication link failure

A workaround for this issue could be to disable to TCP Chimney Offload feature.

1)    Launch regedit.exe

2)    Edit DWORD EnableTCPChimney under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters, set data value to 0

3)    Edit DWORD EnableRSS under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters, set data value to 0

4)    Edit DWORD EnableTCPA under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters, set data value to 0

5)    Restart the server

Amongst others, network cards using Broadcom 5708 chipsets are known to have compatibility issues w/ TCP Chimney Offload.  Disabling TCP Chimney Offload would fix the incompatibility issue. However it would also remove the benefit of Windows TCP stack performance improvements. It should only be used as a workaround until a fix becomes available from the network card vendor. 

See support article http://support.microsoft.com/kb/942861 for additional information.

By now, the majority of customers solved the intermittent connectivity issue after applying the SynAttackProtect and/or TCP Chimney Offload changes. Some customers connecting to SQL Server through Citrix have to turn off the TCP Chimney Offload feature on the Citrix server to fix the issue.

Query Timeout

The particular ISV application runs with a configurable Query timeout, by default the timeout is set to 1 second.  When a query timeout happens, the application will retry the query for 16 times, if it still fails, the query will be submitted again with nolock hint.  How would that affect the connection?  At the first glance, it seems to be irrelevant, but it does. Here’s the sequence of what could happen.

1)    The query is submitted to SNAC for execution

2)    The execution of the query takes too long so the timeout expires

3)    Once the timeout happens, client attempts to cancel the query and sends an ATTN packet to the server. After sending ATTN, the client then waits for the response from the server. A timeout is set for this wait, the value of the timeout is obtained through SQL_ATTR_CONNECT_TIMEOUT and if not set, the default to 120 seconds.

4)    The wait for server to response also times out. This is treated as an error from the client and basically the connection is considered dead in such scenarios

5)    The client marks the connection as dead and then returns the “Query timeout expired” error message.

6)     The application code, on seeing a timeout expired message attempts to execute the request again on the same connection, but immediately hits the “communication link failure” message because the connection is deemed as dead by the client

 

During the investigation, we discovered a regression in SQL Server 2005. SQL Server 2005 may not response to a query cancel (timeout) request in a timely manner if the query requires index scan through a large number of pages. Checking for any attention requests is delayed when the SQL Server storage engine is busy bringing pages in from disk. Resulting in

[08S01] [Microsoft][SQL Native Client]Communication link failure

 

A hotfix is available to fix the regression, see details in

http://support.microsoft.com/kb/945442.

 

All the remaining customers had some long running batch queries that index scan a large table. This is just the last piece we needed to complete the puzzle. After applying the hotfix, all customers reported the problem solved.

Cross Posted from http://blogs.microsoft.com/mssqlisv


Wed, 19 Mar 2008 02:25:00 GMT

Using time zone data in SQL Server 2008

 

In SQL Server 2008 Microsoft has introduced a number of new date and time data types.  One of these is the datetimeoffset data type. This data type includes an offset from UTC time as well as the datetime value and ensures that the datetime can be retrieved in UTC or a particular timezone based on this offset.  There are also new functions to allow for conversions between different time zones using the new function SWITCHOFFSET(). 

 

An example from SQL Server 2008 Books On Line (BOL):

CREATE TABLE dbo.test

    (

    ColDatetimeoffset datetimeoffset

    );

GO

INSERT INTO dbo.test

VALUES ('1998-09-20 7:45:50.71345 -5:00');

GO

SELECT SWITCHOFFSET (ColDatetimeoffset, '-08:00')

FROM dbo.test;

GO

--Returns: 1998-09-20 04:45:50.7134500 -08:00

SELECT ColDatetimeoffset

FROM dbo.test;

--Returns: 1998-09-20 07:45:50.7134500 -05:00

 

One of the most common questions we are asked is why we use the offset and not a timezone name.  A timezone name is much easier to remember than an offset, and most people do not know an offset without looking it up, making queries more difficult. 

 

Unfortunately, there is no current international standard authority for timezone names and values.  Each system needs to use a system of their own choosing, and until there is an international standard, it is not feasible to try to have SQL Server provide one, and would ultimately cause more problems than it would solve.  However, there are a couple of common systems that are well recognized around the world.  One is the Dynamic timezone data that is stored in the Windows Vista registry.  This data can be read from the registry into a file, which is then imported into SQL Server.


Another cross-platform standard is the public domain Olson Timezone database (
http://www.twinsun.com/tz/tz-link.htm). There are many public domain programs for extracting a time zone from these files, but at this time most are programmatic solutions.   So a programmatic function could be written in the CLR, but to provide full database functionality and query-ability, a table is needed.

 

The programmatic solutions take a date, then apply the many different rules that determine when a zone is in daylight savings time and when it is not.  However, there are also historical changes.  Daylight savings time changed in 2007, meaning that determining what the offset for a particular zone is at a particular time is different depending on the year.  Then there are times when leap seconds need to be added.  Therefore any data-driven solution must have rows that have valid time ranges as well.

 

The approach recommended here is to take one of the DLLs found on the web and instead of providing a programmatic solution around a specific date – to write all of the rows out into a database as an offset with the valid ranges.  Currently this example uses the Windows standard naming conventions for timezones, with a mapping to the Olson timezone names, but you could easily add other names in other languages as well.

 

Working with timezones is very complex, and the following is a suggestion only for some ideas on how to use time zone data more effectively.   This is an example program (no guarantees) that uses a .NET library and writes the data from the Olson tz files in table format, and which can then be imported into SQL Server.    The .NET timezone library can be found at http://www.babiej.demon.nl/Tz4Net/main.htm and they request a small donation.   

 

Here is some sample code to write to files the timezone data (no guarantees – does not include leap seconds):

 

            StreamWriter sr = File.CreateText(@"D:\TZMapping.txt");

            StreamWriter tr = File.CreateText(@"D:\TZZones.txt");

 

            string[] zoneNames = OlsonTimeZone.AllNames;

            sr.WriteLine("ID\tDaylightName\tStandardName\tRawUtcOffset\tOffsetSeconds\tWin32Id");

            tr.WriteLine("ID\tTransitionStart\tTransitionEnd\tDeltaSeconds\tDST");

 

            for (int i = 0; i < zoneNames.Length; i++)

            {

                OlsonTimeZone tz = OlsonTimeZone.GetInstanceFromOlsonName(zoneNames[i].ToString());

                sr.Write(i.ToString() + "\t");

                sr.Write(tz.DaylightName.Trim() + "\t");

                sr.Write(tz.StandardName.Trim() + "\t");

                sr.Write(tz.RawUtcOffset.ToString() + "\t");

                sr.Write(tz.RawUtcOffset.TotalSeconds.ToString() + "\t");

                sr.WriteLine(tz.Win32Id == null ? "" : tz.Win32Id.Trim());

 

                DaylightTime[] times = tz.AllTimeChanges;

                for (int j = 0; j < times.Length; j++)

                {

                    tr.Write(i.ToString() + "\t");

                    tr.Write(times[j].Start.ToString("yyyy-MM-dd HH:mm:ss") + "\t");

                    tr.Write(times[j].End.ToString("yyyy-MM-dd HH:mm:ss") + "\t");

                    tr.Write(times[j] is StandardTime ? "0\t" :times[j].Delta.TotalSeconds.ToString() + "\t");

                    tr.WriteLine(times[j] is StandardTime ? false.ToString() : true.ToString() );

                }

            }

            tr.WriteLine();

            sr.WriteLine();

            tr.Close();

            sr.Close();

 

Import the TZMapping file, which will become the parent table, with the ID as the primary key.   Your table structure might look like this:

  

 

 

 Please note:  If you use the Flat File Datasource in the Import Data Wizard in SQL Server 2008 Management Studio, you will need to open the Advanced Tab to set the source OutPutColumnWidth to greater than the default of 50.   Then import the TZZones file, which will become the child table with the ID, TransitionStart, and TransitionEnd as the composite primary key with a foreign key reference to the TZMapping table.  The TZZones table includes historical timezone data.    Joining these new tables into your data into queries now allows for queries that include standard names, Windows IDs, etc.

 

For example, offsets can now be retrieved by a preferred name:

 

select UtcOffset from TZmapping where StandardName = 'US/Pacific (PST)'

 

The following two queries return different offset amounts for the same day in two different years.  This is because the US changed daylight savings time, and the date in March now falls into daylight savings when it did not before.

 

 

select (dbo.TZMapping.OffsetSeconds + dbo.TZZones.DeltaSeconds)/3600

from dbo.TZMapping 

join dbo.TZZones

on dbo.TZMapping.id = dbo.TZZones.id

where dbo.TZMapping.StandardName = 'America/Los_Angeles (PST)'

and '2006-03-15'

between dbo.TZZones.TransitionStart

and dbo.TZZones.TransitionEnd

 

 

 

select (dbo.TZMapping.OffsetSeconds + dbo.TZZones.DeltaSeconds)/3600

from dbo.TZMapping 

join dbo.TZZones

on dbo.TZMapping.id = dbo.TZZones.id

where dbo.TZMapping.StandardName = 'America/Los_Angeles (PST)'

and '2007-03-15'

between dbo.TZZones.TransitionStart

and dbo.TZZones.TransitionEnd

 

 

Again, timezones are a complex area and each application will need to address how you are going to handle time zone data to make programs more user friendly.  This is just one small example.

Cross Posted from http://blogs.microsoft.com/mssqlisv

Wed, 19 Mar 2008 01:49:00 GMT

Increase your SQL Server performance by replacing cursors with set operations

You have probably heard many times, from different sources, that as a best practice; avoid using TSQL cursors.

During a recent visit to a partner we ran into a common cursor case, which I wanted to use as an example to demonstrate why you should avoid TSQL cursors in most cases, and how to convert cursor logic to simple set join operations. Now there are certain scenarios where using a cursor makes sense. For example, a cursor is ideal for row by row processing that can’t be accomplished by set based operations. A cursor is flexible in that it provides a window, or subset, of data and that allows manipulation of the data in various ways. Study carefully what you want to achieve on case by case basis before using a cursor. Keep in mind SQL Server, as a modern RDBMS system, performs much better with set operations.

Here is simplified version of a real cursor that was used to update a big table with over 200 million rows.

DECLARE @EntityId Varchar(16)

DECLARE @PerfId Varchar(16)

DECLARE @BaseId Varchar(16)

DECLARE @UpdateStatus Int

 

DECLARE outerCursor CURSOR FOR

SELECT EntityId, BaseId

FROM outerTable

--Returns 204,000 rows

 

OPEN outerCursor

FETCH NEXT FROM outerCursor INTO @EntityId, @BaseId

 

WHILE @@FETCH_STATUS = 0

BEGIN

            DECLARE innerCursor CURSOR FOR

            SELECT PRFMR_ID

            FROM innerTable

            WHERE ENTY_ID = @BaseId

                       

            OPEN innerCursor

            FETCH NEXT FROM innerCursor INTO @PerfId

            SET @UpdateStatus = @@FETCH_STATUS

           

            WHILE @UpdateStatus = 0

            BEGIN

                  UPDATE 200MilRowTable

                  SET ENTY_ID = @EntityId

                  WHERE PRFMR_ID = @PerfId

           

                  FETCH NEXT FROM innerCursor INTO @PerfId

                  SET @UpdateStatus = @@FETCH_STATUS

            END

           

            CLOSE innerCursor

            DEALLOCATE innerCursor --clean up inner cursor

                       

            FETCH NEXT FROM outerCursor INTO @EntityId, @BaseId

END

 

CLOSE outerCursor

DEALLOCATE outerCursor –cleanup outer cursor

 

 

You might notice that this is a nested cursor with 204,000 loops in total for outerCursor. The innerTable has 10 million rows but innerCursor varies in number of loops depending on @BaseId of outerCursor. When I arrived at the customer this cursor had been running for over a day. The developer was “hoping” that it would finish soon given another day or two. The problem was nobody knew for sure how much time this thing would need to complete. Well, we can find out how much progress it has made so far to make an educated guess:

SELECT execution_count, st.text

FROM sys.dm_exec_query_stats as qs

CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as st

WHERE st.text like '%FETCH NEXT FROM outerCursor INTO%'

 

This would tell us how many times the outer cursor has looped already. It returned 107. That’s only around 0.5% (107/204,000) after 30 hours of running. If the trend were to continue, the cursor would need another 8+ months!!!

A rewrite of the batch to take advantage of set operations is as simple as this:

SELECT i.PRFMR_ID, o.EntityId INTO #tempTable

FROM innerTable i join outerTable o on i.ENTY_ID = o.BaseId

Go

UPDATE 200MilRowTable

SET m.ENTY_ID = t.EntityId

FROM 200MilRowTable m join #tempTable t on m.PRFMR_ID = t.PRFMR_ID

Go

--note this is only one of a few ways to rewrite.

In this particular case, “SELECT INTO” is minimally logged under simple recovery mode. The two statement approach makes it easier to understand the conversion logic.

This batch took approximately 17 hours to complete. Between the statement, I also put the database into simple recovery mode and added appropriate indexes to the temp table. I also dropped indexes from 200MilRowTable that touched “ENTY_ID” to speed this up. Adding indexes back took another 7 hours. The total time was approximately 24 hours, which is just a small fraction of the original cursor batch. I need to point out that the non-cursor batch uses more resources since the UPDATE now spawns multiple threads to process parallely. Remember our goal here is to make this finish faster not worrying about how much resources it consumes.

Note: this might not be a perfect example because the nested cursor is magnifying the slow performance. 

However, the bottom line is; aviod cursors if possible and use joins / set operations whenever you can.

Cross Posted from http://blogs.microsoft.com/mssqlisv

Thu, 06 Mar 2008 22:33:00 GMT

Appending Data Using SQL 2008 Filestream

SQL Server 2008 has a new feature called Filestream, which allows you to save large binary files in the file system, instead of in the database.  This is targeted directly at the scenario that many document management and web applications have today where they save some metadata in the database, with a column holding the path to the actual file.  However, there is no transactional context between the two; therefore, the data in the database can be changed to no longer point to the actual location, or the file moved without the pointer being updated. Users want to be able to have documents, pictures, video, etc. in the file system with the streaming support the file system provides, but have the metadata be consistent.  This is exactly what the Filestream feature is targeted for.

 

When a large binary files gets ‘updated’ most users/applications will replace the file in its entirety (you do not usually update part of an image through the database). Filestream does not currently support in-place updates.  Therefore an update to a column with the filestream attribute is implemented by creating a new zero-byte file, which then has the entire new data value written to it.  When the update is committed, the file pointer is then changed to point to the new file, leaving the old file to be deleted at garbage collection time.  This happens at a checkpoint for simple recovery, and at a backup or log backup.

 

Code that updates large value data types (varchar(max), nvarchar(max), and varbinary(max)) may use the .Write(offset) UPDATE clause; however since an in-place update of a column with the Filestream attribute is not supported, the .Write(offset) UPDATE clause will error out. Therefore, even though filestream is only a storage attribute, the update code does not work against both systems, requiring knowledge of underlying storage mechanism.

 

Many of the update actions taken against large data types are actually appends.  And many logging scenarios continuously append to an ever-growing file.  Filestream offers an option for this case, to avoid the scenario where the original data is pulled to the client, data appended, and then a new file written with the combined data – leaving the old file to be garbage collected later.

 

 In order to avoid this round trip, a device FS control (FSCTL_SQL_FILESTREAM_FETCH_OLD_CONTENT) can be issued to trigger a server-side copy of the old content.  This avoids the data moving over the network. This control is not yet documented; note that the DeviceIoControl ControlCode constant that is needed is: 599392.  This will be added to the Filestream API documentation at a future date.

 

Example code:

[DllImport(“kernel32.dll”, SetLastError = true)]

Static extern bool DeviceIoControl(

IntPtr hDevice,

Uint dwIoControlCode,

IntPter lpInBuffer,

Uint ninBufferSize,

[out] IntPtr lpOutBuffer,

Uiny nOutBufferSize,

Ref uint lpBytesReturned,

IntPtr lpOverlapped );

 

                IntPtr hFile = IntPtr.Zero;

Uint bytesReturned;

                // hFile = mySafeHafle.DangereousGetHandle();  or something similar

               

Bool result = DeviceIoControl ( hFile, 599392, IntPtr.Zero, 0, IntPtr.Zero, 0, ref lpBytesReturned, InPtr.Zero );

 

As Filestream is minimally logged, if there is no activity other than filestream activity, there is usually not enough activity to trigger a checkpoint, and the old files will not get deleted.  In non-simple recovery mode, the log will not increase, but the db will grow until those files get garbage collected.  This will occur in a background thread, and can impact the server through this activity.

 

Therefore, it is a best practice to manually issue that checkpoint if in simple recovery mode; and to maintain an optimal transaction log backup to avoid the file growth.

 

Other quick notes:

 

Filestream deletes are significantly faster than blob deletes.

If an application needs to commonly delete large data values, filestream will be more scalable as the metadata is changed, then the file garbage collected asynchronously.  For instance, the delete of a 6GB nvarbinary(max) file that took 6 seconds, happens in milliseconds with a filestream delete.

 

Filestream maximum size is different for SNAC 9 and SNAC 10.

Files using SNAC9 have a maximum size of 2GB.  SNAC 10 supports unlimited size.

 

Note:  Altering an existing column without a filestream attribute to having the filestream attribute is not supported.

This means that moving data from a blob storage mechanism to filestream mechanism requires that the data be copied to the new datatype with the filestream attribute.

 

 

Cross Posted from http://blogs.microsoft.com/mssqlisv