SQL Server-ing you

A blog about SQL Server

SQL Server – Load data in a partition table using switch command

leave a comment »

I suggest you read the first part before reading this post.

Yesterday we have seen how to implement a sliding window partition strategy.
Today we are going to use it and load data using a staging table and then switch the data into the master table.

To make for a complete test, we are going to add an index to yesterday’s table on creation_date and description.

It’s important to understand what happens inside the data files.

When loading the data into the staging table you can imagine your table name like a pointer. tableName_Load points to where the data are located.
When the switch command is applied, The master table tableName  points to the new slot of data stored until then into the staging table and the staging table points to Null.
The data now belongs to the staging table. In this scenario, the data never moves on the hard disk. That is why a switch command is so fast and that’s also why there’s no transaction log produced for a switch.

Important point: The switch command request for a schema change lock. It’s a very broad lock which will require that only the user performing the switch access the table at the time the switch occur. Failing that, the switch command will wait for its lock until it gets it. This, if the database is busy can be pretty long. If you experience waiting for a switch, keep in mind that it’s not a long transaction. It’s the wait crippling your process.

Last but not least:  It might look attractive to create a partition on a date because it fits our organized mind of grouping things by dates but it might not be the way which will give you the best performance. Know your data intimately, know your queries intimately. If you pull a lot of data by city id for exemple, it might be a good idea to partition your table by city_id…

Create table [tablename]
(

id int,
creation_date smalldatetime,
description varchar(50)
CONSTRAINT PK_id_date PRIMARY KEY CLUSTERED
(

id asc,
creation_date smalldatetime,
description varchar(50)
) ON [pschTableName](creation-date)
) ON [pschTableName](creation_date)

CREATE INDEX IDX_PK2_3 ON tableName (creation_date, description) ON [pschTableName](creation_date)

The idea here is to insert the data in a table of similar structure in columns, primary key and indexes as the master table.

So we are going to create a table called tablename_load. Since this table has to be on the filegroup where the data for that day will be located,  it’s best to write a stored procedure for that which will do it for you.


CREATE PROCEDURE [dbo].[uspPartitionDailyCreateSwitchTables]

@day smalldatetime
AS
— exec uspPartitionDailyCreateSwitchTables ‘2009/03/01’
DECLARE @Sqlcommand varchar(4000)
,@tablename varchar(50)
,@filegroup varchar(50)

BEGIN

SET @tableName = ‘tableName_load’
SET @filegroup = replace(@tablename,’_load’,”) + ‘_’ + replace(convert(varchar(10),@day,120),’-‘,”)

if exists(select * from sys.objects where name = @tablename)
BEGIN
SET @sqlcommand = ‘drop table ‘ + @tablename
exec (@sqlcommand)
END

SET @sqlcommand = ‘Create table [tablename] ( id int,creation_date smalldatetime,description varchar(50) ON [‘ + @filegroup + ‘]’
exec (@sqlcommand)

END
GO

This stored procedure will drop the load table if it exists and create a new one one the filegroup created for that day. What’s important to noticeis  that the primary key and indexes are not created yet at that point.

The reason is that it’s much much faster to insert data on a table with no index or primary key. So it’s important to make sure that your data integrity is good.  Otherwise the job will fail when the primary key will be created. That’s after tableName_Load has been loaded with SSIS or BCP.

Now that the table has been created and populated using your favorite loading tool, we need to create the index and add a constraint on the date (if a column using integer was used for partitioning then the constraint would be on that integer column).

The usage of a constraint is not required by me being excessively picky about my data. That’s SQL Server requirements to be able to switch data into a partition. Even though you’re sure that your data is good, the switch will fail with an error saying you’re missing a constrait. Or that the constraint for the load table is not matching the constraint for the partition.

This is the stored procedure used:

CREATE PROCEDURE [dbo].[uspLoadDataAddConstraintsIndexes]
@day smalldatetime
AS

DECLARE @sqlcommand varchar(4000)
DECLARE @filegroup varchar(50)

BEGIN
–define filegroup to create the table on
SET @filegroup = ‘tableName_’ + replace(convert(varchar(10), @day,120),’-‘,”)
IF  EXISTS (SELECT * FROM sys.check_constraints WHERE object_id = OBJECT_ID(N'[dbo].[CK_tableName_load]’) AND parent_object_id = OBJECT_ID(N'[dbo].[tableName_load]’))
ALTER TABLE [dbo].[tableName_load] DROP CONSTRAINT [CK_tableName_load]

–create the constraint
SET @sqlcommand  = ‘ALTER TABLE [dbo].[tableName_load] WITH CHECK ADD CONSTRAINT [CK_tableName_load] CHECK  ([creation_date]>= convert(smalldatetime,”’+ convert(varchar(10),@day,120) +”’) AND [creation_date]< convert(smalldatetime,”’+ convert(varchar(10),dateadd(dd,1,@day),120)+”’))’
exec (@sqlcommand)
SET @sqlcommand = ‘ALTER TABLE [dbo].[tableName_load] CHECK CONSTRAINT [CK_tableName_load]’
exec (@sqlcommand)
set @sqlcommand = ‘ALTER TABLE [dbo].[tableName_load] ADD CONSTRAINT [PK_tableName_load] PRIMARY KEY CLUSTERED([id ASC, creation_date ASC) ON [‘ + @filegroup + ‘]’
exec (@sqlcommand)
–create the index
set @sqlcommand = ‘CREATE INDEX IDX_PK2_3 ON [dbo].[tableName_load] (creation_date ASC , description ASC) ON [‘ + @filegroup + ‘]’
exec (@sqlcommand)
END
GO

Finally all we need to do is compile the whole as follow:

CREATE PROCEDURE [dbo].[uspLoadData]
@date smalldatetime
AS

DECLARE @partitionNumber varchar(2)

BEGIN

–Create constraints before switching and after BCP
exec uspLoadDataAddConstraintsIndexes @date

–Switch the partition tables into place
SET @partitionNumber = dbo.getPartitionNumber(‘pfntableName’,@date)
exec Partition2Switch ‘tableName_load’, @partitionNumber, ‘tableName’

END
GO

The function getPartitionNumber and the stored procedyre Partition2Switch can be find in the article posted just before this one.

To summarize everything since yesterday:

1) There’s  a procedure that creates a new partition every day on a new filegroup and datafile
2) there’s  another procedure to clean up the oldest partition and get rid of the filegroup and data file
3) There’s a procedure to create a staging table on the filegroup created for that day.
4) data are loaded in the staging table.
5) After data are loaded on the staging table, we create constraint and indexes prior to switching
6) We switch the data to the master table.

Advertisements

Written by Nicolas de Fontenay

2009/07/17 at 7:37 am

Posted in administration

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: