SQL Server-ing you

A blog about SQL Server

SQL Server – How to implement a sliding window partition strategy on a table


When a table grows big, it’s important to start thinking about partitioning.

The answer to the question “When is it a good time to start implementing partitions?” is: it depends. If your queries take more than a minute on your current hardware after you have optimized your SQL code, then it might be a good idea to consider it.

The stored procedures that follow support a table with 1.7 billion rows and still growing.

This post addresses what comes after the decision to implement partitioning has been made:

For this sliding window partitioning strategy, the following decisions have been made:

1) The table will be partitioned daily on the creation date
2) Each day is stored on a different filegroup and data file
3) Filegroup and data file have the following name: tablename_yyyymmdd
4) Data will be removed from the table after 28 days, along with its filegroup and data file.
5) We have decided to start loading data from 2009/07/01 only.

First of all, we need to create the partition function and scheme that our table will use, and for that we need at least a few filegroups with data files.

(diagram: database structure)
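
For the two daily filegroups the partition scheme below expects, the setup looks like this (a sketch, assuming the database name MyDB and the data file path used later in this post):

ALTER DATABASE MyDB ADD FILEGROUP [tablename_20090701]
GO
--13312MB matches the pre-sized files used further down; adjust to your own volume
ALTER DATABASE MyDB ADD FILE (NAME = N'tablename_20090701', FILENAME = N'C:\Database\Datafiles\MyDB\tablename_20090701.ndf', SIZE = 13312MB, MAXSIZE = UNLIMITED, FILEGROWTH = 10%) TO FILEGROUP [tablename_20090701]
GO
ALTER DATABASE MyDB ADD FILEGROUP [tablename_20090702]
GO
ALTER DATABASE MyDB ADD FILE (NAME = N'tablename_20090702', FILENAME = N'C:\Database\Datafiles\MyDB\tablename_20090702.ndf', SIZE = 13312MB, MAXSIZE = UNLIMITED, FILEGROWTH = 10%) TO FILEGROUP [tablename_20090702]
GO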

CREATE PARTITION FUNCTION [pfnTableName](smalldatetime)
AS RANGE RIGHT
FOR VALUES ('2009/07/01', '2009/07/02', '2009/07/03')

What’s important to notice here is that the partition function uses RANGE RIGHT. The dates are just boundaries, and each boundary belongs to the partition on its right: everything from 2009/07/01 up to (but not including) 2009/07/02 lands in partition No. 2. What’s partition No. 1 then? It’s everything before ‘2009/07/01’. We will see that with the partition scheme below.
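
You can sanity-check where a given date lands with the built-in $PARTITION function:

SELECT $PARTITION.pfnTableName('2009/06/30') --returns 1: before the first boundary
SELECT $PARTITION.pfnTableName('2009/07/01') --returns 2: with RANGE RIGHT, the boundary belongs to the partition on its right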

CREATE PARTITION SCHEME [pschTableName]
AS PARTITION [pfnTableName]
TO ([PRIMARY], [tablename_20090701], [tablename_20090702], [PRIMARY])

From the partition function above we had 3 boundaries. As a result, we have 4 partitions. The leftmost and rightmost partitions should always stay empty. Otherwise you might have to deal with a right mess when you realize that the new partition for the next day has not been created and the data from 2009/07/02 and the 7 days after it are all mixed up.

Since these partitions stay empty, I use [PRIMARY] here to make the point. In practice I use a default filegroup called [User] (an Oracle habit) which stores any table that hasn’t been assigned explicitly to a filegroup. That way I ensure that I never get any data on [PRIMARY].

We can now create our table:

Create table [tablename]
(
id int,
creation_date smalldatetime,
description varchar(50),
CONSTRAINT PK_id_date PRIMARY KEY CLUSTERED
(
id asc,
creation_date asc
)
) ON [pschTableName](creation_date)

Now we have a table and a clustered index both using the same partition function and scheme. The index is said to be aligned with the data.
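
To verify that rows land in the partitions you expect, you can look at the row count per partition of the clustered index:

SELECT partition_number, rows
FROM sys.partitions
WHERE [object_id] = OBJECT_ID('tablename')
AND index_id = 1 --the clustered index
ORDER BY partition_number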

Now that we’ve got our structure, we have to automate the whole process. For that purpose, I’ve created a set of procedures which will:

Create a new filegroup
Create a new datafile
Assign the new filegroup to be used for the new partition
Create the new partition
Switch data in or out of the master table to/from a staging table

CREATE PROCEDURE [dbo].[uspPartitionCreateNewFilegroup]
@nameDB varchar(50),
@filegroup varchar(50)

AS

DECLARE @sqlcommand varchar(1000)

BEGIN

set @sqlcommand = 'ALTER DATABASE ' + @nameDB + ' ADD FILEGROUP ' + @filegroup
print @sqlcommand
exec (@sqlcommand)

END
GO

CREATE PROCEDURE [dbo].[uspPartitionAddDataFiletoFilegroup]
@nameDB varchar(50),
@logicalName varchar(50),
@destination varchar(100),
@filegroup varchar(50),
@size varchar(5)

AS

DECLARE @datafile varchar(50)
DECLARE @sqlcommand varchar(1000)

BEGIN

--Append .ndf to the logical name to build the physical data file name
set @datafile = @logicalName + '.ndf'
set @sqlcommand = 'ALTER DATABASE ' + @nameDB + ' ADD FILE (NAME = N''' + @logicalName + ''', FILENAME = N''' + @destination + '\' + @datafile + ''', SIZE = ' + @size + 'MB, MAXSIZE = UNLIMITED, FILEGROWTH = 10%) TO FILEGROUP [' + @filegroup + ']'
print @sqlcommand
exec (@sqlcommand)

END
GO
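
As an illustration (the date is hypothetical), creating the file for 2009/07/04 would look like this; the print statement shows the generated command before it runs:

exec uspPartitionAddDataFiletoFilegroup 'MyDB', 'TableName_20090704', 'C:\Database\Datafiles\MyDB', 'TableName_20090704', '13312'
--prints and runs:
--ALTER DATABASE MyDB ADD FILE (NAME = N'TableName_20090704', FILENAME = N'C:\Database\Datafiles\MyDB\TableName_20090704.ndf', SIZE = 13312MB, MAXSIZE = UNLIMITED, FILEGROWTH = 10%) TO FILEGROUP [TableName_20090704]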

CREATE PROCEDURE [dbo].[uspPartitionDefineNextUsedFilegroup]
@schemeName varchar(50),
@filegroup varchar(50)

AS

DECLARE @sqlcommand varchar(1000)

BEGIN
set @sqlcommand = 'ALTER PARTITION SCHEME ' + @schemeName + ' NEXT USED [' + @filegroup + ']'
exec (@sqlcommand)

END
GO

CREATE PROCEDURE [dbo].[uspPartitionAddNewPartition]
@partitionFunction varchar(50),
@nextPeriod varchar(10)
AS

DECLARE @sqlcommand varchar(1000)

BEGIN

set @sqlcommand = 'ALTER PARTITION FUNCTION ' + @partitionFunction + '() SPLIT RANGE (''' + @nextPeriod + ''')'
exec (@sqlcommand)

END
GO
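
For example, adding the partition for 2009/07/04 (an illustrative date) generates:

exec uspPartitionAddNewPartition 'pfnTableName', '2009/07/04'
--which runs: ALTER PARTITION FUNCTION pfnTableName() SPLIT RANGE ('2009/07/04')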

CREATE PROCEDURE [dbo].[uspPartition2Switch]
@sourceTable varchar(50),
@partitionNumber varchar(2),
@destinationTable varchar(50)
AS

DECLARE @sqlcommand varchar(1000)

BEGIN

IF (RIGHT(@sourceTable,6) = 'switch' or RIGHT(@sourceTable,4) = 'load')
SET @sqlcommand = 'ALTER TABLE ' + @sourceTable + ' SWITCH TO ' + @destinationTable + ' PARTITION ' + @partitionNumber
ELSE
SET @sqlcommand = 'ALTER TABLE ' + @sourceTable + ' SWITCH PARTITION ' + @partitionNumber + ' TO ' + @destinationTable
exec (@sqlcommand)

END
GO

CREATE PROCEDURE [dbo].[PartitionMergeLeft]
@partitionFunction varchar(50),
@mergeDate varchar(10)
AS

DECLARE @sqlcommand varchar(1000)

BEGIN

SET @sqlcommand = 'ALTER PARTITION FUNCTION ' + @partitionFunction + '() MERGE RANGE (''' + convert(varchar(10),@mergeDate,120) + ''')'

exec (@sqlcommand)

END
GO

These are very portable procedures: they generate the SQL commands that allow me to automate the whole process.
The missing ingredient is the ability to tell which partition is being used for a given day. This function does that for me:

CREATE FUNCTION [dbo].[ufngetPartitionNumber]
(@functionName varchar(50)
,@rangeValue smalldatetime)
RETURNS int
AS

BEGIN

DECLARE @partitionNumber int

SELECT @partitionNumber = a.[boundary_id]
FROM sys.partition_range_values a
INNER JOIN sys.partition_functions b
ON a.function_id = b.function_id
WHERE b.[name] = @functionName
AND a.[value] = @rangeValue

SET @partitionNumber = @partitionNumber + 1

RETURN @partitionNumber

END
GO
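
A quick check against the boundaries created earlier:

SELECT dbo.ufngetPartitionNumber('pfnTableName', '2009/07/02') --boundary_id 2 + 1 = partition 3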

Now that all this is set up, all I have to do is assemble a procedure that will do the job for me:

CREATE PROCEDURE [dbo].[uspPartitionDailyTableNameRightSide]
@date smalldatetime = ''
AS
BEGIN

DECLARE @filegroup varchar(50)
,@nextDay varchar(10)
,@partitionFunction varchar(50)
,@schemeName varchar(50)
,@nameDB varchar(50)
,@destination varchar(100)
,@tableName varchar(50)
,@size varchar(50)

-----------
--Create the new filegroup for tomorrow
--Add a datafile to the filegroup
--Set tomorrow's filegroup to be the next used by the partition scheme
--Create the new partition
--exec uspPartitionDailyTableNameRightSide '2009/07/17'
-----------

--Initialization
IF @date = ''
SET @nextDay = convert(varchar(10),dateadd(dd,1,getDate()),120)
ELSE
SET @nextDay = convert(varchar(10),@date,120)

SET @partitionFunction = 'pfnTableName'
SET @tableName = 'TableName'
SET @filegroup = @tableName + '_' + replace(@nextDay,'-','') --format TableName_yyyymmdd
SET @schemeName = 'pschTableName'
SET @nameDB = 'MyDB'
SET @destination = 'C:\Database\Datafiles\MyDB'
SET @size = '13312'

exec uspPartitionCreateNewFilegroup @nameDB, @filegroup
exec uspPartitionAddDataFiletoFilegroup @nameDB, @filegroup, @destination, @filegroup, @size
exec uspPartitionDefineNextUsedFilegroup @schemeName, @filegroup
exec uspPartitionAddNewPartition @partitionFunction, @nextDay

END
GO

For good performance during loading, it’s nice to set the expected size of the data file in advance, hence the “size” option.

On the left hand side it looks like this:

CREATE PROCEDURE [dbo].[uspPartitionDailyTableNameLeftSide]
@mergeDate smalldatetime = ''
AS

DECLARE @date varchar(10)
,@partitionNumber varchar(2)
,@filegroup varchar(50)
,@sourceTable varchar(50)
,@destinationTable varchar(50)
,@partitionFunction varchar(50)
,@nameDB varchar(50)

BEGIN

--Initialization
Set @partitionFunction = 'pfnTableName'
Set @sourceTable = 'TableName'
Set @destinationTable = 'TableName_switch'
Set @nameDB = 'MyDB'

if @mergeDate = ''
BEGIN
SET @date = convert(varchar(10),dateadd(dd,-29,getdate()),120)
SET @partitionNumber = '2'
END
else
BEGIN
SET @date = convert(varchar(10),@mergeDate,120)
SET @partitionNumber = dbo.ufngetPartitionNumber(@partitionFunction,@date)
END

SET @filegroup = @sourceTable + '_' + replace(@date,'-','')

--If the staging table exists, drop it.
if exists(select * from sys.objects where name = @destinationTable)
BEGIN
exec PartitionDropTable @destinationTable
END
--Then create it.
exec PartitionCreateTableInFilegroup @destinationTable, @filegroup, @date

exec uspPartition2Switch @sourceTable, @partitionNumber, @destinationTable
exec PartitionMergeLeft @partitionFunction, @date
exec PartitionDropTable @destinationTable
exec PartitionRemoveDataFile @nameDB, @filegroup --it requires the data file logical name, but logical name and filegroup are the same
exec PartitionRemoveFilegroup @nameDB, @filegroup

END
GO

In this procedure there are two things worth noting. First, there’s a PartitionCreateTableInFilegroup procedure: to switch the data out of the master table into the staging table, I need to create the staging table with the exact same structure as the master table (including the clustered primary key, otherwise the SWITCH will fail), on the filegroup where the data is located.

Here’s the procedure:

CREATE PROCEDURE [dbo].[PartitionCreateTableInFilegroup]
(@tableName varchar(50)
,@filegroup varchar(50)
,@day smalldatetime
)
AS
--exec PartitionCreateTableInFilegroup 'TableName_switch', 'TableName_20090718', '2009/07/18'

DECLARE @sqlcommand varchar(4000)

BEGIN

SET @sqlcommand = 'CREATE TABLE [dbo].[' + @tableName + '](id int, creation_date smalldatetime, description varchar(50), CONSTRAINT PK_' + @tableName + ' PRIMARY KEY CLUSTERED (id asc, creation_date asc)) ON [' + @filegroup + ']'
exec (@sqlcommand)

END
GO

Second, there’s a drop-table stored procedure which I created and use to ensure that I never drop a master table by accident, only a staging table.

You can re-use the generic stored procedures as-is in your environment, but anybody using
[dbo].[uspPartitionDailyTableNameRightSide]
or
[dbo].[uspPartitionDailyTableNameLeftSide]
should review and adapt them for their own environment, at their own risk.

Now that we have seen how to add new partitions on the right side and remove them on the left side, we will see in the next post how to insert huge amounts of data into a partitioned table while taking advantage of the switch ability.

Written by Nicolas de Fontenay

2009/07/16 at 11:13 am

Posted in administration

10 good habits with SQL Server


Here are a number of good habits to have when working with databases:

1) Don’t use dynamic SQL, that is, concatenating strings to form a SQL query and then executing it with
exec (@mysqlcommand). This prevents you and your users from re-using the cached execution plan for that query. If the query is frequently used, it can sink your performance.
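
If you really have no choice, sp_executesql with parameters at least lets the plan be reused. A minimal sketch, borrowing the tablename table from the partitioning post:

DECLARE @sqlcommand nvarchar(1000)
SET @sqlcommand = N'SELECT description FROM tablename WHERE id = @id'
--one plan is cached and reused for every @id value
EXEC sp_executesql @sqlcommand, N'@id int', @id = 42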

2) Store your tables outside of the PRIMARY filegroup and split them by function. As your DB grows and your company is successful, you will be able to move the data files belonging to a filegroup to another disk, improving your performance by taking advantage of RAID 10 and RAID 5.

3) Store your tempdb on a different disk from where your transaction log and data files are (that would be 3 disks or RAID systems). If you can’t afford that, group tempdb and the transaction log together and put your data files on a different disk.
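
Moving tempdb is a one-time operation. A sketch, where the target paths are assumptions for your own disks; the files move on the next restart of the instance:

--tempdev and templog are the default logical names of the tempdb files
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'E:\tempdb\tempdb.mdf')
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'F:\tempdblog\templog.ldf')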

4) Know your database intimately. How much does it grow every month? How much is the business expected to grow? Is the growth smooth, or does it double suddenly? That will allow you to define the size of your data file when you create it. Reserving the disk space for this data file up front allows the data inserted into the tables belonging to it to be contiguous. Less fragmentation, better performance.

5) Look at the execution plan. If you’re learning how to write good SQL, looking at it and trying different ways will help. In the long run, you’ll learn how to identify bottlenecks and understand the different methods used by the SQL Server engine.

6) The join command is not a filter. That one is a tricky one. Example:

if I do:

select a.column1, a.column2, b.column1
from tablea a
inner join tableb b
on a.id = b.id
where b.column1 = 'somefilter'

That will produce good results as long as the database is small.

But as it grows big and reaches a few million rows, tablea will still need to be scanned from top to bottom to determine which rows match b.id for the rows of b where b.column1 = 'somefilter'.

If tablea is the big table, that’s a serious problem. The answer: filter a. Does the query really need all rows, or only the last day of inserted data? It could be this instead:

select id, column1, column2
into #tempa
from tablea
where date = @yesterday

select a.column1, a.column2, b.column1
from #tempa a
inner join tableb b
on a.id = b.id
where b.column1 = 'somefilter'

7) If you’ve got to use the LIKE operator, try to avoid filters that look like this: '%myfilter%'.
You won’t be able to use an index with it, because the SQL Server engine can’t tell where to look; it could be anything. Instead use like 'myfilter%'. That way the SQL Server engine knows that you’re looking for something starting with 'myfilter' and can seek an index.

8) Give rights on a need-to-know basis and ask for reasons when you’re being asked for privileges. Data is sensitive, and in the end you’re the one in charge.

9) Do good backups and test them regularly. This point would be a long post in itself but… really. I’ve seen places making backups and feeling safe just because of that. Better to feel truly unsafe than wrongly safe.
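
Testing can be as simple as restoring the latest backup under a throwaway name on a test server. A sketch, where the backup path and the logical file names are assumptions:

RESTORE DATABASE MyDB_restoretest
FROM DISK = 'E:\backup\MyDB.bak'
WITH MOVE 'MyDB' TO 'E:\restoretest\MyDB.mdf', --logical data file name (assumed)
MOVE 'MyDB_log' TO 'E:\restoretest\MyDB_log.ldf' --logical log file name (assumed)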

10) When deploying solutions, use methods that won’t leave passwords in clear text in files. The best way I’ve found is to stick my stored procedures in SSIS, which is then protected either with my user key or with a password of its own stored in a configuration file.

11) Have fun doing what you do. Databases are sensitive. Pressure is often high. If you’re doing a good job, nobody will notice, because it means nothing is breaking, so really have a lot of fun and pleasure implementing something great. It will make you feel good 🙂

Written by Nicolas de Fontenay

2009/07/14 at 4:03 am

Transfer data from multiple tables to the same tables somewhere else


I’ve got this small project today which makes for a very simple yet interesting package in SSIS.

The problem is as follows: we’ve got a database in which a new table is generated every day with some traffic-generated data in it.

We recently got a new server onto which we have to migrate the data. I know, I know. Why didn’t we do a backup? I wonder too, but I was standing in front of a situation already done. Now we have data here and old data there… and we need them here.

I love using expressions because they make a package really flexible when changing environments, from dev to production for instance.

But here’s another use for expressions:

The tables have the following format: stat_daily_yyyymmdd. They need to be created, then the data will be pushed from Server1 to Server2.

For this, I used a For Loop container, a SQL task and a data flow.

In the data flow, I use an OLE DB Source and an OLE DB destination.

1) We will need a couple of initial variables to start with, as follows:

counter int32 0
date datetime 2009/01/01 (whenever the tables start being created)

2) Then we need some variables to store our generated table name and query:

tableName String
properties: evaluate as expression True
Expression:

"stat_daily_" + (DT_STR,4,1252)YEAR(DATEADD("dd", @[User::counter], @[User::date]))
+ RIGHT("0" + (DT_STR,4,1252)MONTH(DATEADD("dd", @[User::counter], @[User::date])), 2)
+ RIGHT("0" + (DT_STR,4,1252)DAY(DATEADD("dd", @[User::counter], @[User::date])), 2)
This adds @counter days to the initial date and pads the month and day with an extra 0 where needed. The end result looks like this:

stat_daily_20090101
stat_daily_20090102
and so on…

I found this technique on the excellent blog of Jamie Thomson, SSIS Junkie.

Statdaily String
evaluate as expression: True
expression: "select * from " + @[User::tableName]

createTable String
evaluate as expression: True
expression: "Create table " + @[User::tableName] +
"( column1 varchar(10),
column2 int) on [myfilegroup]"

Now that we’ve got our variables ready, we need to configure our components:

1) The For Loop container:

initExpression: @counter = 0
evalExpression: DATEADD("dd", @counter, @date) < (DT_DATE)"yyyy-mm-dd" (replace "yyyy-mm-dd" with the date where the loop should end)
assignExpression: @counter = @counter + 1

2) Insert a SQL task into the For Loop container; here are its parameters:

connection string: set it to your destination database

Then click on “Expressions”:
Property: SqlStatementSource
Expression: @[User::createTable]

3) Inside the For Loop container, add a data flow task and link the SQL task to it.

In the data flow task:

a) Add an OLE DB source component.

Choose your connection string (pointing at the source server) as usual, but the data access mode should be:

SQL command from variable. Select the Statdaily variable.

b) Add an OLE DB destination and link the source to it.

Configure it as usual but choose

Table name or view name variable – fast load

Then select [User::tablename] as the variable.

Here we hit a problem…

The table is not created yet so SSIS fails to map the fields saying that the table does not exist.

To work around this, create the table in the database manually, perform the mapping, then drop the table.

When executing the package, it will still fail validation, saying that the table does not exist. To avoid this, change the DelayValidation property to True at package level.

Now you can take a break, take a drink and enjoy your favorite facebook game while the computer does the job for you.

Hope this post will help some people out there 🙂

Note: the technique of mapping a non-existent table can also be used to work with temporary tables. It’s the same problem, really.

Written by Nicolas de Fontenay

2009/07/08 at 10:31 am

Posted in SSIS

SSIS odds


I just realised that it’s possible for a component to hide behind a container, such as a For Loop or Sequence container.

If a component is suddenly missing, check under those!

Written by Nicolas de Fontenay

2009/07/08 at 9:56 am

Posted in Uncategorized


How to get a table definition if it’s a temporary table


It works pretty much like it does for any other table. You just need to be in the correct database first:

USE MyDatabase
GO

select a.name, b.name, c.name, b.max_length, b.[precision], b.scale
from sys.objects a
inner join sys.columns b
on a.[object_id] = b.[object_id]
inner join sys.types c
on b.system_type_id = c.system_type_id
where a.[type] = 'U'
and a.name = 'some_table_name'
order by b.column_id

If it’s a temporary table however there’s a little trick.

The first thing to change is trivial:

USE tempdb
GO

select a.name, b.name, c.name, b.max_length, b.[precision], b.scale
from sys.objects a
inner join sys.columns b
on a.[object_id] = b.[object_id]
inner join sys.types c
on b.system_type_id = c.system_type_id
where a.[type] = 'U'
and a.name like '#temp%'
order by b.column_id

Things to know:

1) A temporary table is a user-created table, and therefore its type is ‘U’.
2) The name can be a lot longer than what you would expect. For #temp, you can get #temp_____________________________________________0000000039A1.

This is to ensure that two tables with identical names created by two different users won’t collide.
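
A variant that sidesteps the padded name altogether, assuming #temp exists in your session: resolve the object id directly.

USE tempdb
GO

select b.name, c.name, b.max_length, b.[precision], b.scale
from sys.columns b
inner join sys.types c
on b.system_type_id = c.system_type_id
where b.[object_id] = OBJECT_ID('tempdb..#temp')
order by b.column_id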

Written by Nicolas de Fontenay

2009/05/06 at 5:00 am

Welcome to SQL Server Whore


Hello everyone.

Passers-by or people with a problem (presumably with SQL Server) are all welcome here.
Except those posting comments about enlarging their penis.

I was a DBA on Oracle (I still am, but I don’t use it much these days). Since November last year I’ve been working with SQL Server 2005, and soon 2008.

That’s quite a big change. I’ve found out a lot about Microsoft’s philosophy on many things, and I want to use this blog mainly as my lifetime repository of ideas for SQL Server.

What you can expect here in the coming future: queries to make life easier (though hopefully SQL Server 2008 solves that part), my own thoughts about query optimization, a bit of query analysis with the execution plan and certainly some Oracle/SQL Server comparisons.

Written by Nicolas de Fontenay

2009/05/06 at 4:42 am

Posted in General