SQL Server-ing you

A blog about SQL Server

Archive for the ‘administration’ Category

SQL Server – Load data in a partition table using switch command

leave a comment »

I suggest you read the first part before reading this post.

Yesterday we have seen how to implement a sliding window partition strategy.
Today we are going to use it and load data using a staging table and then switch the data into the master table.

To make for a complete test, we are going to add an index to yesterday’s table on creation_date and description.

It’s important to understand what happens inside the data files.

When loading the data into the staging table you can imagine your table name like a pointer. tableName_Load points to where the data are located.
When the switch command is applied, The master table tableName  points to the new slot of data stored until then into the staging table and the staging table points to Null.
The data now belongs to the staging table. In this scenario, the data never moves on the hard disk. That is why a switch command is so fast and that’s also why there’s no transaction log produced for a switch.

Important point: The switch command request for a schema change lock. It’s a very broad lock which will require that only the user performing the switch access the table at the time the switch occur. Failing that, the switch command will wait for its lock until it gets it. This, if the database is busy can be pretty long. If you experience waiting for a switch, keep in mind that it’s not a long transaction. It’s the wait crippling your process.

Last but not least:  It might look attractive to create a partition on a date because it fits our organized mind of grouping things by dates but it might not be the way which will give you the best performance. Know your data intimately, know your queries intimately. If you pull a lot of data by city id for exemple, it might be a good idea to partition your table by city_id…

Create table [tablename]
(

id int,
creation_date smalldatetime,
description varchar(50)
CONSTRAINT PK_id_date PRIMARY KEY CLUSTERED
(

id asc,
creation_date smalldatetime,
description varchar(50)
) ON [pschTableName](creation-date)
) ON [pschTableName](creation_date)

CREATE INDEX IDX_PK2_3 ON tableName (creation_date, description) ON [pschTableName](creation_date)

The idea here is to insert the data in a table of similar structure in columns, primary key and indexes as the master table.

So we are going to create a table called tablename_load. Since this table has to be on the filegroup where the data for that day will be located,  it’s best to write a stored procedure for that which will do it for you.


CREATE PROCEDURE [dbo].[uspPartitionDailyCreateSwitchTables]

@day smalldatetime
AS
— exec uspPartitionDailyCreateSwitchTables ‘2009/03/01’
DECLARE @Sqlcommand varchar(4000)
,@tablename varchar(50)
,@filegroup varchar(50)

BEGIN

SET @tableName = ‘tableName_load’
SET @filegroup = replace(@tablename,’_load’,”) + ‘_’ + replace(convert(varchar(10),@day,120),’-‘,”)

if exists(select * from sys.objects where name = @tablename)
BEGIN
SET @sqlcommand = ‘drop table ‘ + @tablename
exec (@sqlcommand)
END

SET @sqlcommand = ‘Create table [tablename] ( id int,creation_date smalldatetime,description varchar(50) ON [‘ + @filegroup + ‘]’
exec (@sqlcommand)

END
GO

This stored procedure will drop the load table if it exists and create a new one one the filegroup created for that day. What’s important to noticeis  that the primary key and indexes are not created yet at that point.

The reason is that it’s much much faster to insert data on a table with no index or primary key. So it’s important to make sure that your data integrity is good.  Otherwise the job will fail when the primary key will be created. That’s after tableName_Load has been loaded with SSIS or BCP.

Now that the table has been created and populated using your favorite loading tool, we need to create the index and add a constraint on the date (if a column using integer was used for partitioning then the constraint would be on that integer column).

The usage of a constraint is not required by me being excessively picky about my data. That’s SQL Server requirements to be able to switch data into a partition. Even though you’re sure that your data is good, the switch will fail with an error saying you’re missing a constrait. Or that the constraint for the load table is not matching the constraint for the partition.

This is the stored procedure used:

CREATE PROCEDURE [dbo].[uspLoadDataAddConstraintsIndexes]
@day smalldatetime
AS

DECLARE @sqlcommand varchar(4000)
DECLARE @filegroup varchar(50)

BEGIN
–define filegroup to create the table on
SET @filegroup = ‘tableName_’ + replace(convert(varchar(10), @day,120),’-‘,”)
IF  EXISTS (SELECT * FROM sys.check_constraints WHERE object_id = OBJECT_ID(N'[dbo].[CK_tableName_load]’) AND parent_object_id = OBJECT_ID(N'[dbo].[tableName_load]’))
ALTER TABLE [dbo].[tableName_load] DROP CONSTRAINT [CK_tableName_load]

–create the constraint
SET @sqlcommand  = ‘ALTER TABLE [dbo].[tableName_load] WITH CHECK ADD CONSTRAINT [CK_tableName_load] CHECK  ([creation_date]>= convert(smalldatetime,”’+ convert(varchar(10),@day,120) +”’) AND [creation_date]< convert(smalldatetime,”’+ convert(varchar(10),dateadd(dd,1,@day),120)+”’))’
exec (@sqlcommand)
SET @sqlcommand = ‘ALTER TABLE [dbo].[tableName_load] CHECK CONSTRAINT [CK_tableName_load]’
exec (@sqlcommand)
set @sqlcommand = ‘ALTER TABLE [dbo].[tableName_load] ADD CONSTRAINT [PK_tableName_load] PRIMARY KEY CLUSTERED([id ASC, creation_date ASC) ON [‘ + @filegroup + ‘]’
exec (@sqlcommand)
–create the index
set @sqlcommand = ‘CREATE INDEX IDX_PK2_3 ON [dbo].[tableName_load] (creation_date ASC , description ASC) ON [‘ + @filegroup + ‘]’
exec (@sqlcommand)
END
GO

Finally all we need to do is compile the whole as follow:

CREATE PROCEDURE [dbo].[uspLoadData]
@date smalldatetime
AS

DECLARE @partitionNumber varchar(2)

BEGIN

–Create constraints before switching and after BCP
exec uspLoadDataAddConstraintsIndexes @date

–Switch the partition tables into place
SET @partitionNumber = dbo.getPartitionNumber(‘pfntableName’,@date)
exec Partition2Switch ‘tableName_load’, @partitionNumber, ‘tableName’

END
GO

The function getPartitionNumber and the stored procedyre Partition2Switch can be find in the article posted just before this one.

To summarize everything since yesterday:

1) There’s  a procedure that creates a new partition every day on a new filegroup and datafile
2) there’s  another procedure to clean up the oldest partition and get rid of the filegroup and data file
3) There’s a procedure to create a staging table on the filegroup created for that day.
4) data are loaded in the staging table.
5) After data are loaded on the staging table, we create constraint and indexes prior to switching
6) We switch the data to the master table.

Written by Nicolas de Fontenay

2009/07/17 at 7:37 am

Posted in administration

SQL Server – How to implement a sliding windows partition strategy on a table

with one comment

When a table grows big,  it’s important to start thinking about partitioning.

The answer to the question of  “When is it a good time to start implementing partitions?” is: It depends. If your queries takes more than a minute on your current hardware after optimizing your SQL code then it might be a good idea to consider it.

The stored procedures that follows support a table with 1.7 billion rows and still growing.

This post address the question after the decision have been made to implement partitioning:

For this sliding windows partitioning strategy, following decisions have been made:

1) The table will be partitioned daily on the creation date
2) Each day is stored on a different filegroup and data file
3) Filegroup and data file have the following name: tablename_yyyymmdd
4) Data will be removed from the table after 28 days as well as the filegroup and data file.
5) We have decided to start loading data from 2009/07/01 only.

First of all we need to create the partitioning functions and scheme that our table will use and we need at least a few filegroups with datafiles for that.

database structure

CREATE PARTITION FUNCTION [pfnTableName](smalldatetime)
AS RANGE RIGHT
FOR VALUES (‘2009/07/01’, ‘2009/07/02′,’2009/12/2020’)

What’s important to notice here is that the partition will use a “RANGE RIGHT” function. The dates are just boundaries and the partitions will start from these dates on the right so everything bigger than 2009/07/01 will be the partition No2. What’s partition No1 then? It’s everything else that’s before ‘2009/07/01’.  We will see that with the partition scheme below

CREATE PARTITION SCHEME [pschTableName]
AS PARTITION [fnSmartSearchAssignments]
TO ([PRIMARY], [tablename_20090701],[tablename_20090702], [PRIMARY])

From the partition function above we had 3 boundaries. As a result, we have 4 partitions. The left most and right most partitions should be always empty. Otherwise you might have to deal with a right mess when you realize that your new partition for the next day have not been created and now data from 2009/07/02 and the 7 other days later are all mixed up.

Since it’s empty I use [primary] to make more sense here. But I use a default filegroup called [User] (Oracle habit) which stores any table which hasn’t been assigned explicitly to a filegroup. That way I ensure that I never get any data on [Primary].

We can now create our table:

Create table [tablename]
(

id int,
creation_date smalldatetime,
description varchar(50)
CONSTRAINT PK_id_date PRIMARY KEY CLUSTERED
(
id asc,
creation_date
) ON [pschTableName](creation-date)
) ON [pschTableName](creation_date)

Now we have a table with a clustered index both using the same partition function and scheme. It is said that the index is aligned with the data.

Now that we got our structure, we have got to automate this whole world. For that purpose, I’ve created a set of procedures which will:

Create a new filegroup
Create a new datafile
Assign the new filegroup to be used for the new partition
Create the new partition
Switch data in or out of the master table to/from a staging table

CREATE PROCEDURE [dbo].[uspPartitionCreateNewFilegroup]
@nameDB varchar(50),
@filegroup varchar(50)

AS

DECLARE @sqlcommand varchar(1000)

BEGIN

set @sqlcommand = ‘ALTER DATABASE ‘ + @nameDB + ‘ ADD FILEGROUP ‘ + @filegroup
print @sqlcommand
exec (@sqlcommand)

END
GO

CREATE PROCEDURE [dbo].[uspPartitionAddDataFiletoFilegroup]
@nameDB            varchar(50),
@logicalName    varchar(50),
@destination    varchar(100),
@filegroup        varchar(50),
@size            varchar(5)

AS

DECLARE @datafile varchar(50)
DECLARE @sqlcommand varchar(1000)

BEGIN

–Get the left part of the datafile (before the dot) to be used as logical name
set @datafile = @logicalName + ‘.ndf’
set @sqlcommand = ‘ALTER DATABASE ‘ + @nameDB  + ‘ ADD FILE (NAME = N”’ + @logicalName + ”’, FILENAME = N”’ + @destination + ‘\’ + @datafile + ”’, SIZE = ‘+ @size +’MB, MAXSIZE = UNLIMITED, FILEGROWTH = 10%) TO FILEGROUP [‘ + @filegroup + ‘]’
print @sqlcommand
exec (@sqlcommand)

END
GO

CREATE PROCEDURE [dbo].[uspPartitionDefineNextUsedFilegroup]
@schemeName varchar(50),
@filegroup varchar(50)

As

DECLARE @sqlcommand varchar(1000)

BEGIN
set @sqlcommand = ‘ALTER PARTITION SCHEME ‘ + @schemeName + ‘ NEXT USED [‘ + @filegroup + ‘]’
exec (@sqlcommand)

END
GO

CREATE PROCEDURE [dbo].[uspPartitionAddNewPartition]
@partitionFunction varchar(50),
@nextPeriod varchar(10)
AS

DECLARE @sqlcommand varchar(1000)

BEGIN

set @sqlcommand = ‘ALTER PARTITION FUNCTION ‘ + @partitionFunction + ‘() SPLIT RANGE (”’ + @nextPeriod +”’)’
exec (@sqlcommand)

END
GO

CREATE PROCEDURE [dbo].[uspPartition2Switch]
@sourceTable varchar(50),
@partitionNumber varchar(2),
@destinationTable varchar(50)
AS

DECLARE @sqlcommand varchar(1000)

BEGIN

IF (RIGHT(@sourceTable,6) = ‘switch’ or RIGHT(@sourceTable,4) = ‘load’)
SET @sqlcommand = ‘ALTER TABLE ‘ + @sourceTable + ‘ SWITCH  TO ‘ + @destinationTable + ‘ PARTITION ‘ + @partitionNumber
ELSE
SET @sqlcommand = ‘ALTER TABLE ‘ + @sourceTable + ‘ SWITCH PARTITION ‘ + @partitionNumber + ‘ TO ‘ + @destinationTable
exec (@sqlcommand)

END
GO

CREATE PROCEDURE [dbo].[PartitionMergeLeft]
@partitionFunction varchar(50),
@mergeDate varchar(10)
AS

DECLARE @sqlcommand varchar(1000)

BEGIN

SET @sqlcommand = ‘ALTER PARTITION FUNCTION ‘ + @partitionFunction + ‘() MERGE RANGE (”’ + convert(varchar(10),@mergeDate,120) + ”’)’

exec (@sqlcommand)

END
GO

These are very portable procedures generating the sql commands that will allow me to automate the whole process.
The missing ingredient is the ability to tell which partition is being used for a day. That function does it for me:

CREATE FUNCTION [dbo].[ufngetPartitionNumber]
(@functionName varchar(50)
,@rangeValue smalldatetime)
RETURNS int
AS

BEGIN

DECLARE @partitionNumber int

SELECT        @partitionNumber = [boundary_id]
FROM        sys.partition_range_values a
INNER JOIN    sys.partition_functions b
ON        a.function_id = b.function_id
WHERE        [name] = @functionName
AND            [VALUE] = @rangeValue

SET @partitionNumber = @partitionNumber + 1

RETURN @partitionNumber

END
GO

Now that all this is set up, all I got to do is assemble a procedure that will do the job for me:

CREATE PROCEDURE [dbo].[uspPartitionDailyTableNameRightSide]
@date smalldatetime = ”
AS
BEGIN

DECLARE @filegroup                    varchar(50)
,@nextDay                varchar(10)
,@partitionFunction        varchar(50)
,@schemeName            varchar(50)
,@nameDB                varchar(50)
,@destination            varchar(100)
,@tableName                varchar(50)
,@size                    varchar(50)

———–
–Create the new filegroup for tomorrow
–Add a datafile to the filegroup
–Set the tomorrow’s file group to be next used for the partition
–Create the staging table on the filegroup
–Create the new partition
–exec PartitionDailyTableNameRightSide ‘2009/07/17’
———–

–Initialization
IF @date = ”
SET @nextDay = convert(varchar(10),dateadd(dd,1,getDate()), 120)
ELSE
SET @nextday = convert(varchar(10),@date,120)

SET @partitionFunction = ‘pfnTableName’
SET @tableName = ‘TableName’
SET @filegroup = @TableName + ‘_’ + replace(@nextday,’-‘,”) –format Tablename_yyyymmdd
SET @SchemeName = ‘pschTableName’
SET @nameDB = ‘MyDB’
SET @destination = ‘C:\Database\Datafiles\MyDB’
SET @size = ‘13312’

exec PartitionCreateNewFilegroup @nameDB, @filegroup
exec PartitionAddDataFiletoFilegroup @nameDB, @filegroup, @destination, @filegroup, @size
exec PartitionDefineNextUsedFilegroup @schemeName, @filegroup
exec PartitionAddNewPartition @partitionFunction, @nextday

END
GO

For good performance during loading, it’s nice to set the expected size of the data file in advance, hence the “size” option.

On the left hand side it looks like this:

CREATE PROCEDURE [dbo].[uspPartitionDailyTableNameLeftSide]
@mergeDate smalldatetime = ”
AS

DECLARE @date                varchar(10)
,@partitionNumber    varchar(2)
,@filegroup            varchar(50)
,@sourceTable        varchar(50)
,@destinationTable    varchar(50)
,@partitionFunction varchar(50)
,@nameDB            varchar(50)

BEGIN

–initialization
Set @partitionFunction = ‘pfnTableName’
Set @sourceTable = ‘TableName’
Set @destinationTable = ‘TableName_switch’
Set @nameDB = ‘MyDB’

if @mergeDate = ”
BEGIN
SET @date = convert(varchar(10),dateadd(dd,-29,getdate()),120)
SET @partitionNumber = ‘2’
END
else
BEGIN
SET @date = convert(varchar(10),@mergeDate,120)
SET @partitionNumber = dbo.getPartitionNumber(@partitionFunction,@date)
END

SET @filegroup = @sourceTable + ‘_’ + replace(@date,’-‘,”)

–If the table exists, drop it.
if exists(select * from sys.objects where name = @destinationTable)
BEGIN
exec PartitionDropTable @destinationTable
END
–Then create it.
exec PartitionCreateTableInFilegroup @destinationTable, @filegroup, @date

exec Partition2Switch @sourceTable, @partitionNumber, @destinationTable
exec PartitionMergeLeft @partitionFunction, @date
exec PartitionDropTable @destinationTable
exec PartitionRemoveDataFile @nameDB, @filegroup –it requires data file logical name but logical name and filegroup –are the same
exec PartitionRemoveFilegroup @nameDB, @filegroup

END
GO

In this procedure there’s 2 things: First I got a CreateTableInFilegroup because to switch the data out of the master to the staging table, I need to create the table with the exact same structure as the master table on the filegroup where the data are located.

Here’s the procedure:

CREATE PROCEDURE [dbo].[PartitionCreateTableInFilegroup]
(@tableName varchar(50)
,@filegroup varchar(50)
,@day smalldatetime
)
AS
–exec PartitionCreateTableInFilegroup ‘TableName_Switch’,TableName_20090212′, ‘2009/07/18’

DECLARE @Sqlcommand varchar(4000)

BEGIN

SET @sqlcommand = ‘CREATE TABLE [dbo].TableName_switch](id int, description varchar(50), creation_date smalldatetime) ON [‘ + @filegroup + ‘]) ON [‘ + @filegroup + ‘]’
exec (@sqlcommand)

END
GO

Also, there’s a drop table stored procdure I created and use to ensure I will never throw a master table and instead just a staging table.

You can re-use the generic stored procedure as is on your environment but anybody else using
[dbo].[uspPartitionDailyTableNameRightSide]
or
[dbo].[uspPartitionDailyTableNameRightSide]
Should review and arrange it for his own environment and at his own risk.

Now that we have seen how to add a new partition on the right side and remove them on the left side, we will see in the next post how to insert huge amount data in a partition table while taking advantage of the switch ability.

Written by Nicolas de Fontenay

2009/07/16 at 11:13 am

Posted in administration