SQL Server-ing you

A blog about SQL Server

SQL Server – How to implement a sliding windows partition strategy on a table

with one comment

When a table grows big,  it’s important to start thinking about partitioning.

The answer to the question of  “When is it a good time to start implementing partitions?” is: It depends. If your queries takes more than a minute on your current hardware after optimizing your SQL code then it might be a good idea to consider it.

The stored procedures that follows support a table with 1.7 billion rows and still growing.

This post address the question after the decision have been made to implement partitioning:

For this sliding windows partitioning strategy, following decisions have been made:

1) The table will be partitioned daily on the creation date
2) Each day is stored on a different filegroup and data file
3) Filegroup and data file have the following name: tablename_yyyymmdd
4) Data will be removed from the table after 28 days as well as the filegroup and data file.
5) We have decided to start loading data from 2009/07/01 only.

First of all we need to create the partitioning functions and scheme that our table will use and we need at least a few filegroups with datafiles for that.

database structure

CREATE PARTITION FUNCTION [pfnTableName](smalldatetime)
AS RANGE RIGHT
FOR VALUES (‘2009/07/01’, ‘2009/07/02′,’2009/12/2020’)

What’s important to notice here is that the partition will use a “RANGE RIGHT” function. The dates are just boundaries and the partitions will start from these dates on the right so everything bigger than 2009/07/01 will be the partition No2. What’s partition No1 then? It’s everything else that’s before ‘2009/07/01’.  We will see that with the partition scheme below

CREATE PARTITION SCHEME [pschTableName]
AS PARTITION [fnSmartSearchAssignments]
TO ([PRIMARY], [tablename_20090701],[tablename_20090702], [PRIMARY])

From the partition function above we had 3 boundaries. As a result, we have 4 partitions. The left most and right most partitions should be always empty. Otherwise you might have to deal with a right mess when you realize that your new partition for the next day have not been created and now data from 2009/07/02 and the 7 other days later are all mixed up.

Since it’s empty I use [primary] to make more sense here. But I use a default filegroup called [User] (Oracle habit) which stores any table which hasn’t been assigned explicitly to a filegroup. That way I ensure that I never get any data on [Primary].

We can now create our table:

Create table [tablename]
(

id int,
creation_date smalldatetime,
description varchar(50)
CONSTRAINT PK_id_date PRIMARY KEY CLUSTERED
(
id asc,
creation_date
) ON [pschTableName](creation-date)
) ON [pschTableName](creation_date)

Now we have a table with a clustered index both using the same partition function and scheme. It is said that the index is aligned with the data.

Now that we got our structure, we have got to automate this whole world. For that purpose, I’ve created a set of procedures which will:

Create a new filegroup
Create a new datafile
Assign the new filegroup to be used for the new partition
Create the new partition
Switch data in or out of the master table to/from a staging table

CREATE PROCEDURE [dbo].[uspPartitionCreateNewFilegroup]
@nameDB varchar(50),
@filegroup varchar(50)

AS

DECLARE @sqlcommand varchar(1000)

BEGIN

set @sqlcommand = ‘ALTER DATABASE ‘ + @nameDB + ‘ ADD FILEGROUP ‘ + @filegroup
print @sqlcommand
exec (@sqlcommand)

END
GO

CREATE PROCEDURE [dbo].[uspPartitionAddDataFiletoFilegroup]
@nameDB            varchar(50),
@logicalName    varchar(50),
@destination    varchar(100),
@filegroup        varchar(50),
@size            varchar(5)

AS

DECLARE @datafile varchar(50)
DECLARE @sqlcommand varchar(1000)

BEGIN

–Get the left part of the datafile (before the dot) to be used as logical name
set @datafile = @logicalName + ‘.ndf’
set @sqlcommand = ‘ALTER DATABASE ‘ + @nameDB  + ‘ ADD FILE (NAME = N”’ + @logicalName + ”’, FILENAME = N”’ + @destination + ‘\’ + @datafile + ”’, SIZE = ‘+ @size +’MB, MAXSIZE = UNLIMITED, FILEGROWTH = 10%) TO FILEGROUP [‘ + @filegroup + ‘]’
print @sqlcommand
exec (@sqlcommand)

END
GO

CREATE PROCEDURE [dbo].[uspPartitionDefineNextUsedFilegroup]
@schemeName varchar(50),
@filegroup varchar(50)

As

DECLARE @sqlcommand varchar(1000)

BEGIN
set @sqlcommand = ‘ALTER PARTITION SCHEME ‘ + @schemeName + ‘ NEXT USED [‘ + @filegroup + ‘]’
exec (@sqlcommand)

END
GO

CREATE PROCEDURE [dbo].[uspPartitionAddNewPartition]
@partitionFunction varchar(50),
@nextPeriod varchar(10)
AS

DECLARE @sqlcommand varchar(1000)

BEGIN

set @sqlcommand = ‘ALTER PARTITION FUNCTION ‘ + @partitionFunction + ‘() SPLIT RANGE (”’ + @nextPeriod +”’)’
exec (@sqlcommand)

END
GO

CREATE PROCEDURE [dbo].[uspPartition2Switch]
@sourceTable varchar(50),
@partitionNumber varchar(2),
@destinationTable varchar(50)
AS

DECLARE @sqlcommand varchar(1000)

BEGIN

IF (RIGHT(@sourceTable,6) = ‘switch’ or RIGHT(@sourceTable,4) = ‘load’)
SET @sqlcommand = ‘ALTER TABLE ‘ + @sourceTable + ‘ SWITCH  TO ‘ + @destinationTable + ‘ PARTITION ‘ + @partitionNumber
ELSE
SET @sqlcommand = ‘ALTER TABLE ‘ + @sourceTable + ‘ SWITCH PARTITION ‘ + @partitionNumber + ‘ TO ‘ + @destinationTable
exec (@sqlcommand)

END
GO

CREATE PROCEDURE [dbo].[PartitionMergeLeft]
@partitionFunction varchar(50),
@mergeDate varchar(10)
AS

DECLARE @sqlcommand varchar(1000)

BEGIN

SET @sqlcommand = ‘ALTER PARTITION FUNCTION ‘ + @partitionFunction + ‘() MERGE RANGE (”’ + convert(varchar(10),@mergeDate,120) + ”’)’

exec (@sqlcommand)

END
GO

These are very portable procedures generating the sql commands that will allow me to automate the whole process.
The missing ingredient is the ability to tell which partition is being used for a day. That function does it for me:

CREATE FUNCTION [dbo].[ufngetPartitionNumber]
(@functionName varchar(50)
,@rangeValue smalldatetime)
RETURNS int
AS

BEGIN

DECLARE @partitionNumber int

SELECT        @partitionNumber = [boundary_id]
FROM        sys.partition_range_values a
INNER JOIN    sys.partition_functions b
ON        a.function_id = b.function_id
WHERE        [name] = @functionName
AND            [VALUE] = @rangeValue

SET @partitionNumber = @partitionNumber + 1

RETURN @partitionNumber

END
GO

Now that all this is set up, all I got to do is assemble a procedure that will do the job for me:

CREATE PROCEDURE [dbo].[uspPartitionDailyTableNameRightSide]
@date smalldatetime = ”
AS
BEGIN

DECLARE @filegroup                    varchar(50)
,@nextDay                varchar(10)
,@partitionFunction        varchar(50)
,@schemeName            varchar(50)
,@nameDB                varchar(50)
,@destination            varchar(100)
,@tableName                varchar(50)
,@size                    varchar(50)

———–
–Create the new filegroup for tomorrow
–Add a datafile to the filegroup
–Set the tomorrow’s file group to be next used for the partition
–Create the staging table on the filegroup
–Create the new partition
–exec PartitionDailyTableNameRightSide ‘2009/07/17’
———–

–Initialization
IF @date = ”
SET @nextDay = convert(varchar(10),dateadd(dd,1,getDate()), 120)
ELSE
SET @nextday = convert(varchar(10),@date,120)

SET @partitionFunction = ‘pfnTableName’
SET @tableName = ‘TableName’
SET @filegroup = @TableName + ‘_’ + replace(@nextday,’-‘,”) –format Tablename_yyyymmdd
SET @SchemeName = ‘pschTableName’
SET @nameDB = ‘MyDB’
SET @destination = ‘C:\Database\Datafiles\MyDB’
SET @size = ‘13312’

exec PartitionCreateNewFilegroup @nameDB, @filegroup
exec PartitionAddDataFiletoFilegroup @nameDB, @filegroup, @destination, @filegroup, @size
exec PartitionDefineNextUsedFilegroup @schemeName, @filegroup
exec PartitionAddNewPartition @partitionFunction, @nextday

END
GO

For good performance during loading, it’s nice to set the expected size of the data file in advance, hence the “size” option.

On the left hand side it looks like this:

CREATE PROCEDURE [dbo].[uspPartitionDailyTableNameLeftSide]
@mergeDate smalldatetime = ”
AS

DECLARE @date                varchar(10)
,@partitionNumber    varchar(2)
,@filegroup            varchar(50)
,@sourceTable        varchar(50)
,@destinationTable    varchar(50)
,@partitionFunction varchar(50)
,@nameDB            varchar(50)

BEGIN

–initialization
Set @partitionFunction = ‘pfnTableName’
Set @sourceTable = ‘TableName’
Set @destinationTable = ‘TableName_switch’
Set @nameDB = ‘MyDB’

if @mergeDate = ”
BEGIN
SET @date = convert(varchar(10),dateadd(dd,-29,getdate()),120)
SET @partitionNumber = ‘2’
END
else
BEGIN
SET @date = convert(varchar(10),@mergeDate,120)
SET @partitionNumber = dbo.getPartitionNumber(@partitionFunction,@date)
END

SET @filegroup = @sourceTable + ‘_’ + replace(@date,’-‘,”)

–If the table exists, drop it.
if exists(select * from sys.objects where name = @destinationTable)
BEGIN
exec PartitionDropTable @destinationTable
END
–Then create it.
exec PartitionCreateTableInFilegroup @destinationTable, @filegroup, @date

exec Partition2Switch @sourceTable, @partitionNumber, @destinationTable
exec PartitionMergeLeft @partitionFunction, @date
exec PartitionDropTable @destinationTable
exec PartitionRemoveDataFile @nameDB, @filegroup –it requires data file logical name but logical name and filegroup –are the same
exec PartitionRemoveFilegroup @nameDB, @filegroup

END
GO

In this procedure there’s 2 things: First I got a CreateTableInFilegroup because to switch the data out of the master to the staging table, I need to create the table with the exact same structure as the master table on the filegroup where the data are located.

Here’s the procedure:

CREATE PROCEDURE [dbo].[PartitionCreateTableInFilegroup]
(@tableName varchar(50)
,@filegroup varchar(50)
,@day smalldatetime
)
AS
–exec PartitionCreateTableInFilegroup ‘TableName_Switch’,TableName_20090212′, ‘2009/07/18’

DECLARE @Sqlcommand varchar(4000)

BEGIN

SET @sqlcommand = ‘CREATE TABLE [dbo].TableName_switch](id int, description varchar(50), creation_date smalldatetime) ON [‘ + @filegroup + ‘]) ON [‘ + @filegroup + ‘]’
exec (@sqlcommand)

END
GO

Also, there’s a drop table stored procdure I created and use to ensure I will never throw a master table and instead just a staging table.

You can re-use the generic stored procedure as is on your environment but anybody else using
[dbo].[uspPartitionDailyTableNameRightSide]
or
[dbo].[uspPartitionDailyTableNameRightSide]
Should review and arrange it for his own environment and at his own risk.

Now that we have seen how to add a new partition on the right side and remove them on the left side, we will see in the next post how to insert huge amount data in a partition table while taking advantage of the switch ability.

Advertisements

Written by Nicolas de Fontenay

2009/07/16 at 11:13 am

Posted in administration

One Response

Subscribe to comments with RSS.

  1. […] a comment » I suggest you read the first part before reading this […]


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: