SQL Server-ing you

A blog about SQL Server

SSIS – How to get the last date of a month?


I need to get the first and last date of a month in SSIS. These will be used to process partitions in my SSAS Cube afterwards.

The answer to the riddle is: the last day of the month is the day before the first day of the next month.

1) Create 3 variables: Date, MonthBeginning and MonthEnding

2) Set the MonthBeginning variable's EvaluateAsExpression property to True and edit the expression as follows:
(DT_WSTR,4)YEAR(@[User::Date]) + "/" + (DT_WSTR,4)MONTH(@[User::Date]) + "/01"
3) Set the MonthEnding variable's EvaluateAsExpression property to True and edit the expression as follows:
(DT_WSTR,4)YEAR(DATEADD("dd",-1,DATEADD("mm",1,(DT_DATE) @[User::MonthBeginning]))) + "/" + (DT_WSTR,4)MONTH(DATEADD("dd",-1,DATEADD("mm",1,(DT_DATE) @[User::MonthBeginning]))) + "/" + (DT_WSTR,4)DAY(DATEADD("dd",-1,DATEADD("mm",1,(DT_DATE) @[User::MonthBeginning])))

I like my dates in the yyyy/mm/dd format, hence the big expression above; otherwise it returns something looking like mm/dd/yyyy. I hate this.

The yyyy/mm/dd format makes it easier for me, since I just need to replace / with "" (an empty string) to get nice integers, 20091001 and 20091031, which will represent the foreign key in my fact table 🙂 yay.

Then you can go along and use this to partition your cube and so on.

To summarize: we get the first day of the month we seek, say 2009/10/01. We add one month (2009/11/01) and we remove one day (2009/10/31). Bingo: last date of the month.
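For reference, here is the same trick in plain T-SQL (a quick sketch, separate from the SSIS package; the sample date is just an example):

DECLARE @Date smalldatetime
DECLARE @MonthBeginning smalldatetime
DECLARE @MonthEnding smalldatetime

SET @Date = '2009/10/15'
-- First day of the month: style 111 gives yyyy/mm/dd, keep the yyyy/mm part and append /01
SET @MonthBeginning = CONVERT(smalldatetime, CONVERT(varchar(7), @Date, 111) + '/01')
-- Last day of the month: add one month, then remove one day
SET @MonthEnding = DATEADD(dd, -1, DATEADD(mm, 1, @MonthBeginning))

SELECT @MonthBeginning AS MonthBeginning, @MonthEnding AS MonthEnding -- 2009/10/01 and 2009/10/31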

Written by Nicolas de Fontenay

2009/10/29 at 8:18 am

SSIS and file reading using parallelism


In this post, I’m discussing the usage of parallelism in SSIS when reading multiple flat files of different type located in the same folder.

So what we have is an SSIS control flow with 5 different data flow components.
Each data flow performs a very simple task, using a MultiFlatFile connection as a source and a database as a destination, each one feeding a different table.

I can't even give the names of the different file types because of the confidentiality clause in my contract. But let's call them:

Type1, Type2, Type3, Type4 and Type5.

Every hour a job collects data from a cluster of 7 servers and FTPs it to our server. That is 24 files * 7 servers per day = 168 files per file type, and 840 files in total.

The data is pretty large too, so to speed up the process, rather than running the loads one after the other I decided to run them all together. We have a powerful server with plenty of RAM and processors and I wanted to take advantage of it. It wasn't processors or RAM that failed in this design; it was access to the folder by each of the data flow components.

Since the files arrive in an hourly zip file, they are all unzipped into one folder. All of the data flows access that folder and pick up their assigned files as defined by the MultiFlatFile connection (D:\project\yyyymmdd\type1*.txt, D:\project\yyyymmdd\type2*.txt, etc.).

A few days after deployment, we started noticing that some of the tables did not have the correct amount of data. Sometimes it was significantly less. We also noticed that the afflicted tables were more often those coming first alphabetically (tables with a name starting with a or c).

As it turns out, each data flow, in the process of identifying which files belong to it, locks each file to read its name and select or reject it. When they all start at the same time, there is a good chance that when the data flow a file actually belongs to tries to read its content, the file is locked by another one. The problem gets worse as the number of data flows reading files from the same folder increases.

Solution:

1) Put each file type in a different folder.

Using a Foreach Loop container doesn't help solve the problem, because it also locks the files in the folder while it reads them to build its collection of objects.

2) Change your package property "FailParentOnFailure" to True. This ensures that if one of the data flows fails (including on locked files), the job also fails rather than keep loading whatever is left.

Written by Nicolas de Fontenay

2009/10/05 at 2:32 am

Posted in SSIS

T-SQL – Stored Procedure to create a time dimension


This is a stored procedure that will build a 24-hour time dimension with its boundaries.

Note: you should create the final table and plan for a surrogate key with identity(1,1).

Also, to store hours in 24h style it's not possible to use an int data type, as that would wipe out the leading 0 (zero).
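For reference, a minimal sketch of what that final table could look like (the column names are taken from the result shown further down; adjust them to your own model):

CREATE TABLE Dim_Time
(
Dim_Time_Sk   int IDENTITY(1,1) NOT NULL PRIMARY KEY, -- surrogate key
Dim_Time_Time varchar(2) NOT NULL, -- hour of day, '00' to '23'
Dim_Time_Min  varchar(8) NOT NULL, -- lower boundary, e.g. '13:00:00'
Dim_Time_Max  varchar(8) NOT NULL  -- upper boundary, e.g. '13:59:59'
)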

Anyway, here goes:

ALTER PROCEDURE Sp_Create_Dim_Time
AS

BEGIN

DECLARE @COUNT INT
SET @COUNT = 1

-- Work table; the final Dim_Time table adds the identity surrogate key
CREATE TABLE Dim_Time_TEMP
(

Dim_Time_Time varchar(2) not null,
Dim_Time_min varchar(8) not null,
Dim_Time_max varchar(8) not null
)

-- 24 hourly offsets from now cover every hour 00 to 23 exactly once
WHILE @COUNT <= 24
BEGIN

INSERT INTO Dim_Time_TEMP
SELECT CASE WHEN
LEN(CONVERT(VARCHAR(2),DATEPART(HH,DATEADD(HH,@COUNT,GETDATE())))) = 1
THEN
'0' + CONVERT(VARCHAR(2),DATEPART(HH,DATEADD(HH,@COUNT,GETDATE())))
ELSE
CONVERT(VARCHAR(2),DATEPART(HH,DATEADD(HH,@COUNT,GETDATE())))
END [TIME], -- hour padded to two digits, e.g. '07'

CASE WHEN
LEN(CONVERT(VARCHAR(2),DATEPART(HH,DATEADD(HH,@COUNT,GETDATE())))) = 1
THEN
'0' + CONVERT(VARCHAR(2),DATEPART(HH,DATEADD(HH,@COUNT,GETDATE()))) + ':00:00'
ELSE
CONVERT(VARCHAR(2),DATEPART(HH,DATEADD(HH,@COUNT,GETDATE()))) + ':00:00'
END [MIN], -- lower boundary of the hour

CASE WHEN
LEN(CONVERT(VARCHAR(2),DATEPART(HH,DATEADD(HH,@COUNT,GETDATE())))) = 1
THEN
'0' + CONVERT(VARCHAR(2),DATEPART(HH,DATEADD(HH,@COUNT,GETDATE()))) + ':59:59'
ELSE
CONVERT(VARCHAR(2),DATEPART(HH,DATEADD(HH,@COUNT,GETDATE()))) + ':59:59'
END [MAX] -- upper boundary of the hour

SET @COUNT = @COUNT + 1

END

-- The ORDER BY makes the identity surrogate key follow the hour order
INSERT INTO Dim_Time (Dim_Time_Time, Dim_Time_Min, Dim_Time_Max)
SELECT Dim_Time_Time, Dim_Time_Min, Dim_Time_Max
FROM Dim_Time_TEMP
ORDER BY Dim_Time_Time

DROP TABLE Dim_Time_TEMP

END
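To populate the dimension, just run the procedure:

EXEC Sp_Create_Dim_Time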

The result will look like this:

Dim_Time_Sk    Dim_Time_Time    Dim_Time_Min    Dim_Time_Max
1    00    00:00:00    00:59:59
2    01    01:00:00    01:59:59
3    02    02:00:00    02:59:59
4    03    03:00:00    03:59:59
5    04    04:00:00    04:59:59
6    05    05:00:00    05:59:59
7    06    06:00:00    06:59:59
8    07    07:00:00    07:59:59
9    08    08:00:00    08:59:59
10    09    09:00:00    09:59:59
11    10    10:00:00    10:59:59
12    11    11:00:00    11:59:59
13    12    12:00:00    12:59:59
14    13    13:00:00    13:59:59
15    14    14:00:00    14:59:59
16    15    15:00:00    15:59:59
17    16    16:00:00    16:59:59
18    17    17:00:00    17:59:59
19    18    18:00:00    18:59:59
20    19    19:00:00    19:59:59
21    20    20:00:00    20:59:59
22    21    21:00:00    21:59:59
23    22    22:00:00    22:59:59
24    23    23:00:00    23:59:59

Hope you enjoyed

Nico

Written by Nicolas de Fontenay

2009/09/04 at 8:29 am

Posted in Data Warehouse

a send-mail function for powershell


I've been working a lot with PowerShell.

Our data warehouse processes are going to be controlled by PowerShell, which in turn calls SSIS packages.

Every step is logged for success or failure. Everything is tracked.

At the PowerShell level, however, I needed to be informed if there's a string malformation when dtexec is called, for example.

So I've built this send-mail function, which I think will be useful to anyone who needs to send mail:

function Send-Email
{
    param
    (
        [string]$message,
        [string]$subject,
        [string]$to,
        [string]$from,
        [string]$attachment
    )
    $SmtpClient = New-Object System.Net.Mail.SmtpClient
    $MailMessage = New-Object System.Net.Mail.MailMessage
    $SmtpClient.Host = "smtp.mail.com"
    $MailMessage.From = ($from)
    $MailMessage.To.Add($to)
    $MailMessage.Subject = $subject
    $MailMessage.Body = $message
    # Only attach a file when an attachment path was passed in
    if ($attachment)
    {
        $att = New-Object Net.Mail.Attachment($attachment)
        $MailMessage.Attachments.Add($att)
    }
    $SmtpClient.Send($MailMessage)
}

usage:

Send-Email "some message" "my subject" "to@anaddress.com" "from@me.com"

or

Send-Email "some message" "my subject" "to@anaddress.com" "from@me.com" "D:\Test.txt"

The function checks whether $attachment was passed and only attaches a document when a path is provided.

Written by Nicolas de Fontenay

2009/09/01 at 7:59 am

Posted in Uncategorized


SSIS – variables take their initial value before using expressions?


wait a second… Could it… Is it…

I have a pretty straightforward SSIS job using a bunch of expression-driven variables, and I was seeing something really weird: every day, I would get an extra row with data from the same date, 2009/07/15. Even though I have more than one file for that day (2009/07/15), only the first file (which has one row) would be loaded. Then it loads the rest for the current day.

What happened here?! Of course, when something like that occurs, you check your sources.

The way things work, I have a date variable which is set when calling dtexec with the day to be loaded (say 2009/07/19). This date is then used to set the path for the files to be loaded: D:\Load\20090719.

The date variable's initial value was set to 2009/07/15. That very date for which I always get an extra row, every single day.

I've changed it to blank. I'm going to run some tests to see what happens. Will keep you posted later today 🙂

Written by Nicolas de Fontenay

2009/07/20 at 2:19 am

Posted in SSIS

SQL Server – Load data in a partition table using switch command


I suggest you read the first part before reading this post.

Yesterday we saw how to implement a sliding window partition strategy.
Today we are going to use it: we will load data into a staging table and then switch the data into the master table.

To make for a complete test, we are going to add an index on creation_date and description to yesterday's table.

It’s important to understand what happens inside the data files.

When loading the data into the staging table, you can think of the table name as a pointer: tableName_Load points to where the data is located.
When the switch command is applied, the master table tableName points to the slot of data stored until then in the staging table, and the staging table points to nothing.
The data now belongs to the master table. In this scenario, the data never moves on the hard disk. That is why a switch command is so fast, and that's also why there's virtually no transaction log produced for a switch.
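Concretely, the command it boils down to is a single metadata operation (a sketch using the names from this post; the partition number is just an example):

-- No rows are copied, which is why this completes almost instantly
ALTER TABLE tableName_load SWITCH TO tableName PARTITION 5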

Important point: the switch command requests a schema modification lock. It's a very restrictive lock, which requires that only the user performing the switch is accessing the table at the time the switch occurs. Failing that, the switch command will wait until it gets its lock, which can take a while if the database is busy. If you find yourself waiting on a switch, keep in mind that it's not a long transaction; it's the wait that is crippling your process.

Last but not least: it might look attractive to partition on a date because it fits our organized habit of grouping things by date, but it might not be the scheme that gives you the best performance. Know your data intimately, know your queries intimately. If you pull a lot of data by city id, for example, it might be a better idea to partition your table by city_id…

Create table [tableName]
(

id int,
creation_date smalldatetime,
description varchar(50),
CONSTRAINT PK_id_date PRIMARY KEY CLUSTERED
(

id asc,
creation_date asc
) ON [pschTableName](creation_date)
) ON [pschTableName](creation_date)

CREATE INDEX IDX_PK2_3 ON tableName (creation_date, description) ON [pschTableName](creation_date)

The idea here is to insert the data into a table whose columns, primary key and indexes are identical to those of the master table.

So we are going to create a table called tableName_load. Since this table has to be created on the filegroup where the data for that day will be located, it's best to write a stored procedure that does it for you.


CREATE PROCEDURE [dbo].[uspPartitionDailyCreateSwitchTables]

@day smalldatetime
AS
-- exec uspPartitionDailyCreateSwitchTables '2009/03/01'
DECLARE @sqlcommand varchar(4000)
,@tablename varchar(50)
,@filegroup varchar(50)

BEGIN

SET @tablename = 'tableName_load'
SET @filegroup = replace(@tablename,'_load','') + '_' + replace(convert(varchar(10),@day,120),'-','')

--Drop the load table if it is left over from a previous run
if exists(select * from sys.objects where name = @tablename)
BEGIN
SET @sqlcommand = 'drop table ' + @tablename
exec (@sqlcommand)
END

--Recreate it, without primary key or index, on the filegroup for that day
SET @sqlcommand = 'Create table [' + @tablename + '] (id int, creation_date smalldatetime, description varchar(50)) ON [' + @filegroup + ']'
exec (@sqlcommand)

END
GO

This stored procedure drops the load table if it exists and creates a new one on the filegroup created for that day. What's important to notice is that the primary key and indexes are not created at that point.

The reason is that it's much, much faster to insert data into a table with no index or primary key. So it's important to make sure that your data integrity is good; otherwise the job will fail when the primary key is created, which happens after tableName_Load has been loaded with SSIS or BCP.

Now that the table has been created and populated using your favorite loading tool, we need to create the index and add a constraint on the date (if an integer column was used for partitioning, the constraint would be on that integer column instead).

The constraint is not there because I'm excessively picky about my data; it's a SQL Server requirement for switching data into a partition. Even if you're sure your data is good, the switch will fail with an error saying you're missing a constraint, or that the constraint on the load table does not match the constraint for the partition.

This is the stored procedure used:

CREATE PROCEDURE [dbo].[uspLoadDataAddConstraintsIndexes]
@day smalldatetime
AS

DECLARE @sqlcommand varchar(4000)
DECLARE @filegroup varchar(50)

BEGIN
--define the filegroup the objects will be created on
SET @filegroup = 'tableName_' + replace(convert(varchar(10), @day,120),'-','')
IF EXISTS (SELECT * FROM sys.check_constraints WHERE object_id = OBJECT_ID(N'[dbo].[CK_tableName_load]') AND parent_object_id = OBJECT_ID(N'[dbo].[tableName_load]'))
ALTER TABLE [dbo].[tableName_load] DROP CONSTRAINT [CK_tableName_load]

--create the check constraint matching the partition boundaries
SET @sqlcommand = 'ALTER TABLE [dbo].[tableName_load] WITH CHECK ADD CONSTRAINT [CK_tableName_load] CHECK ([creation_date] >= convert(smalldatetime,''' + convert(varchar(10),@day,120) + ''') AND [creation_date] < convert(smalldatetime,''' + convert(varchar(10),dateadd(dd,1,@day),120) + '''))'
exec (@sqlcommand)
SET @sqlcommand = 'ALTER TABLE [dbo].[tableName_load] CHECK CONSTRAINT [CK_tableName_load]'
exec (@sqlcommand)
--create the primary key
SET @sqlcommand = 'ALTER TABLE [dbo].[tableName_load] ADD CONSTRAINT [PK_tableName_load] PRIMARY KEY CLUSTERED (id ASC, creation_date ASC) ON [' + @filegroup + ']'
exec (@sqlcommand)
--create the index
SET @sqlcommand = 'CREATE INDEX IDX_PK2_3 ON [dbo].[tableName_load] (creation_date ASC, description ASC) ON [' + @filegroup + ']'
exec (@sqlcommand)
END
GO

Finally, all we need to do is put the whole thing together, as follows:

CREATE PROCEDURE [dbo].[uspLoadData]
@date smalldatetime
AS

DECLARE @partitionNumber varchar(2)

BEGIN

--Create constraints before switching and after BCP
exec uspLoadDataAddConstraintsIndexes @date

--Switch the partition table into place
SET @partitionNumber = dbo.ufngetPartitionNumber('pfnTableName',@date)
exec uspPartition2Switch 'tableName_load', @partitionNumber, 'tableName'

END
GO

The function ufngetPartitionNumber and the stored procedure uspPartition2Switch can be found in the article posted just before this one.

To summarize everything since yesterday:

1) There's a procedure that creates a new partition every day, on a new filegroup and data file
2) There's another procedure that cleans up the oldest partition and gets rid of its filegroup and data file
3) There's a procedure that creates a staging table on the filegroup created for that day
4) Data is loaded into the staging table
5) After the data is loaded into the staging table, we create the constraint and indexes prior to switching
6) We switch the data into the master table
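Put end to end, one day's load looks roughly like this (a sketch; the date is just an example, and the bulk load happens in between the two calls):

-- 1) Build the empty staging table on the filegroup for that day
exec uspPartitionDailyCreateSwitchTables '2009/10/01'

-- 2) Load tableName_load here with SSIS or BCP

-- 3) Add the constraint, primary key and index, then switch the data into the master table
exec uspLoadData '2009/10/01'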

Written by Nicolas de Fontenay

2009/07/17 at 7:37 am

Posted in administration

Picture needs


I definitely need to be able to store pictures here for my SSIS needs.

Written by Nicolas de Fontenay

2009/07/17 at 2:26 am

Posted in Uncategorized

SQL Server – How to implement a sliding windows partition strategy on a table


When a table grows big,  it’s important to start thinking about partitioning.

The answer to the question "When is it a good time to start implementing partitions?" is: it depends. If your queries take more than a minute on your current hardware after you've optimized your SQL code, then it might be a good time to consider it.

The stored procedures that follow support a table with 1.7 billion rows and still growing.

This post addresses what comes after the decision has been made to implement partitioning.

For this sliding window partitioning strategy, the following decisions have been made:

1) The table will be partitioned daily on the creation date
2) Each day is stored on a different filegroup and data file
3) Filegroup and data file have the following name: tablename_yyyymmdd
4) Data will be removed from the table after 28 days as well as the filegroup and data file.
5) We have decided to start loading data from 2009/07/01 only.

First of all, we need to create the partition function and scheme that our table will use, and we need at least a few filegroups with data files for that.

database structure

CREATE PARTITION FUNCTION [pfnTableName](smalldatetime)
AS RANGE RIGHT
FOR VALUES ('2009/07/01', '2009/07/02', '2020/12/31') -- the last value is just a far-future guard boundary

What's important to notice here is that the partition function uses RANGE RIGHT. The dates are just boundaries, and each partition starts at its boundary date and extends to the right: everything on or after 2009/07/01 (and before the next boundary) is partition no. 2. What's partition no. 1 then? It's everything before 2009/07/01. We will see that with the partition scheme below.

CREATE PARTITION SCHEME [pschTableName]
AS PARTITION [pfnTableName]
TO ([PRIMARY], [tablename_20090701], [tablename_20090702], [PRIMARY])

From the partition function above we have 3 boundaries; as a result, we have 4 partitions. The leftmost and rightmost partitions should always stay empty. Otherwise you might have to deal with a right mess when you realize that the new partition for the next day has not been created, and now data from 2009/07/02 and the following days are all mixed up.

Since those partitions stay empty, I use [PRIMARY] for them here to keep the example readable. In reality I use a default filegroup called [User] (an Oracle habit) which stores any table that hasn't been explicitly assigned to a filegroup; that way I make sure no data ever lands on [PRIMARY].
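You can also ask SQL Server directly which partition a given date would land in, using the $PARTITION function (a small sketch; the dates are just examples):

-- With RANGE RIGHT, each boundary date belongs to the partition on its right
SELECT $PARTITION.pfnTableName('2009/06/30') AS p -- returns 1: before the first boundary (empty guard)
SELECT $PARTITION.pfnTableName('2009/07/01') AS p -- returns 2
SELECT $PARTITION.pfnTableName('2009/07/02') AS p -- returns 3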

We can now create our table:

Create table [tablename]
(

id int,
creation_date smalldatetime,
description varchar(50),
CONSTRAINT PK_id_date PRIMARY KEY CLUSTERED
(
id asc,
creation_date asc
) ON [pschTableName](creation_date)
) ON [pschTableName](creation_date)

Now we have a table and a clustered index that both use the same partition function and scheme; the index is said to be aligned with the data.
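To sanity-check that rows are landing where you expect, a query along these lines (a sketch against the standard catalog views) lists each partition with its filegroup, upper boundary and row count:

SELECT p.partition_number,
fg.name AS filegroup_name,
prv.value AS upper_boundary, -- exclusive upper bound, since the function is RANGE RIGHT
p.rows
FROM sys.partitions p
JOIN sys.indexes i ON i.object_id = p.object_id AND i.index_id = p.index_id
JOIN sys.partition_schemes ps ON ps.data_space_id = i.data_space_id
JOIN sys.destination_data_spaces dds ON dds.partition_scheme_id = ps.data_space_id AND dds.destination_id = p.partition_number
JOIN sys.filegroups fg ON fg.data_space_id = dds.data_space_id
LEFT JOIN sys.partition_range_values prv ON prv.function_id = ps.function_id AND prv.boundary_id = p.partition_number
WHERE p.object_id = OBJECT_ID('tableName')
AND i.index_id IN (0, 1) -- heap or clustered index only
ORDER BY p.partition_number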

Now that we have our structure, we have to automate this whole world. For that purpose, I've created a set of procedures which will:

Create a new filegroup
Create a new datafile
Assign the new filegroup to be used for the new partition
Create the new partition
Switch data in or out of the master table to/from a staging table

CREATE PROCEDURE [dbo].[uspPartitionCreateNewFilegroup]
@nameDB varchar(50),
@filegroup varchar(50)

AS

DECLARE @sqlcommand varchar(1000)

BEGIN

set @sqlcommand = 'ALTER DATABASE ' + @nameDB + ' ADD FILEGROUP ' + @filegroup
print @sqlcommand
exec (@sqlcommand)

END
GO

CREATE PROCEDURE [dbo].[uspPartitionAddDataFiletoFilegroup]
@nameDB         varchar(50),
@logicalName    varchar(50),
@destination    varchar(100),
@filegroup      varchar(50),
@size           varchar(5)

AS

DECLARE @datafile varchar(50)
DECLARE @sqlcommand varchar(1000)

BEGIN

--Build the physical file name from the logical name
set @datafile = @logicalName + '.ndf'
set @sqlcommand = 'ALTER DATABASE ' + @nameDB + ' ADD FILE (NAME = N''' + @logicalName + ''', FILENAME = N''' + @destination + '\' + @datafile + ''', SIZE = ' + @size + 'MB, MAXSIZE = UNLIMITED, FILEGROWTH = 10%) TO FILEGROUP [' + @filegroup + ']'
print @sqlcommand
exec (@sqlcommand)

END
GO

CREATE PROCEDURE [dbo].[uspPartitionDefineNextUsedFilegroup]
@schemeName varchar(50),
@filegroup varchar(50)

As

DECLARE @sqlcommand varchar(1000)

BEGIN
set @sqlcommand = 'ALTER PARTITION SCHEME ' + @schemeName + ' NEXT USED [' + @filegroup + ']'
exec (@sqlcommand)

END
GO

CREATE PROCEDURE [dbo].[uspPartitionAddNewPartition]
@partitionFunction varchar(50),
@nextPeriod varchar(10)
AS

DECLARE @sqlcommand varchar(1000)

BEGIN

set @sqlcommand = 'ALTER PARTITION FUNCTION ' + @partitionFunction + '() SPLIT RANGE (''' + @nextPeriod + ''')'
exec (@sqlcommand)

END
GO

CREATE PROCEDURE [dbo].[uspPartition2Switch]
@sourceTable varchar(50),
@partitionNumber varchar(2),
@destinationTable varchar(50)
AS

DECLARE @sqlcommand varchar(1000)

BEGIN

--Switching in (from a load/switch table into the master) or out (from the master to a staging table)
IF (RIGHT(@sourceTable,6) = 'switch' or RIGHT(@sourceTable,4) = 'load')
SET @sqlcommand = 'ALTER TABLE ' + @sourceTable + ' SWITCH TO ' + @destinationTable + ' PARTITION ' + @partitionNumber
ELSE
SET @sqlcommand = 'ALTER TABLE ' + @sourceTable + ' SWITCH PARTITION ' + @partitionNumber + ' TO ' + @destinationTable
exec (@sqlcommand)

END
GO

CREATE PROCEDURE [dbo].[PartitionMergeLeft]
@partitionFunction varchar(50),
@mergeDate varchar(10)
AS

DECLARE @sqlcommand varchar(1000)

BEGIN

SET @sqlcommand = 'ALTER PARTITION FUNCTION ' + @partitionFunction + '() MERGE RANGE (''' + convert(varchar(10),@mergeDate,120) + ''')'

exec (@sqlcommand)

END
GO

These are very portable procedures generating the SQL commands that allow me to automate the whole process.
The missing ingredient is the ability to tell which partition is used for a given day. This function does it for me:

CREATE FUNCTION [dbo].[ufngetPartitionNumber]
(@functionName varchar(50)
,@rangeValue smalldatetime)
RETURNS int
AS

BEGIN

DECLARE @partitionNumber int

SELECT        @partitionNumber = [boundary_id]
FROM        sys.partition_range_values a
INNER JOIN    sys.partition_functions b
ON        a.function_id = b.function_id
WHERE        [name] = @functionName
AND            [VALUE] = @rangeValue

SET @partitionNumber = @partitionNumber + 1

RETURN @partitionNumber

END
GO
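For example, to find which partition holds a given day (a sketch):

SELECT dbo.ufngetPartitionNumber('pfnTableName', '2009/07/02') AS partition_number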

Now that all this is set up, all I have to do is assemble a procedure that does the job for me:

CREATE PROCEDURE [dbo].[uspPartitionDailyTableNameRightSide]
@date smalldatetime = ''
AS
BEGIN

DECLARE @filegroup          varchar(50)
,@nextDay                   varchar(10)
,@partitionFunction         varchar(50)
,@schemeName                varchar(50)
,@nameDB                    varchar(50)
,@destination               varchar(100)
,@tableName                 varchar(50)
,@size                      varchar(50)

-----------
--Create the new filegroup for tomorrow
--Add a datafile to the filegroup
--Set tomorrow's filegroup to be next used for the partition
--Create the staging table on the filegroup
--Create the new partition
--exec uspPartitionDailyTableNameRightSide '2009/07/17'
-----------

--Initialization
IF @date = ''
SET @nextDay = convert(varchar(10),dateadd(dd,1,getdate()), 120)
ELSE
SET @nextDay = convert(varchar(10),@date,120)

SET @partitionFunction = 'pfnTableName'
SET @tableName = 'TableName'
SET @filegroup = @tableName + '_' + replace(@nextDay,'-','') --format TableName_yyyymmdd
SET @schemeName = 'pschTableName'
SET @nameDB = 'MyDB'
SET @destination = 'C:\Database\Datafiles\MyDB'
SET @size = '13312'

exec uspPartitionCreateNewFilegroup @nameDB, @filegroup
exec uspPartitionAddDataFiletoFilegroup @nameDB, @filegroup, @destination, @filegroup, @size
exec uspPartitionDefineNextUsedFilegroup @schemeName, @filegroup
exec uspPartitionAddNewPartition @partitionFunction, @nextDay

END
GO

For good performance during loading, it's nice to set the expected size of the data file in advance, hence the size parameter.

On the left hand side it looks like this:

CREATE PROCEDURE [dbo].[uspPartitionDailyTableNameLeftSide]
@mergeDate smalldatetime = ''
AS

DECLARE @date               varchar(10)
,@partitionNumber           varchar(2)
,@filegroup                 varchar(50)
,@sourceTable               varchar(50)
,@destinationTable          varchar(50)
,@partitionFunction         varchar(50)
,@nameDB                    varchar(50)

BEGIN

--initialization
Set @partitionFunction = 'pfnTableName'
Set @sourceTable = 'TableName'
Set @destinationTable = 'TableName_switch'
Set @nameDB = 'MyDB'

if @mergeDate = ''
BEGIN
SET @date = convert(varchar(10),dateadd(dd,-29,getdate()),120)
SET @partitionNumber = '2'
END
else
BEGIN
SET @date = convert(varchar(10),@mergeDate,120)
SET @partitionNumber = dbo.ufngetPartitionNumber(@partitionFunction,@date)
END

SET @filegroup = @sourceTable + '_' + replace(@date,'-','')

--If the switch table exists, drop it.
if exists(select * from sys.objects where name = @destinationTable)
BEGIN
exec PartitionDropTable @destinationTable
END
--Then create it.
exec PartitionCreateTableInFilegroup @destinationTable, @filegroup, @date

exec uspPartition2Switch @sourceTable, @partitionNumber, @destinationTable
exec PartitionMergeLeft @partitionFunction, @date
exec PartitionDropTable @destinationTable
exec PartitionRemoveDataFile @nameDB, @filegroup --it requires the data file logical name, but logical name and filegroup are the same
exec PartitionRemoveFilegroup @nameDB, @filegroup

END
GO

There are two things to note in this procedure. First, there's a call to PartitionCreateTableInFilegroup: to switch the data out of the master table into the staging table, I need to create that table with the exact same structure as the master table, on the filegroup where the data is located.

Here’s the procedure:

CREATE PROCEDURE [dbo].[PartitionCreateTableInFilegroup]
(@tableName varchar(50)
,@filegroup varchar(50)
,@day smalldatetime
)
AS
--exec PartitionCreateTableInFilegroup 'TableName_Switch', 'TableName_20090212', '2009/07/18'

DECLARE @sqlcommand varchar(4000)

BEGIN

SET @sqlcommand = 'CREATE TABLE [dbo].[' + @tableName + '] (id int, description varchar(50), creation_date smalldatetime) ON [' + @filegroup + ']'
exec (@sqlcommand)

END
GO

Second, there's a drop-table stored procedure I created and use to make sure I never drop a master table, only a staging table.

You can re-use the generic stored procedures as-is in your environment, but anybody using
[dbo].[uspPartitionDailyTableNameRightSide]
or
[dbo].[uspPartitionDailyTableNameLeftSide]
should review and adapt them for their own environment, at their own risk.

Now that we have seen how to add a new partition on the right side and remove one on the left side, we will see in the next post how to insert a huge amount of data into a partitioned table while taking advantage of the switch ability.

Written by Nicolas de Fontenay

2009/07/16 at 11:13 am

Posted in administration

10 good habits with SQL Server


This is a number of good habits to have when working with databases:

1) Don't use dynamic SQL. That is, concatenating strings to form a SQL query and then executing it with
exec (@mysqlcommand). This prevents you and your users from re-using the execution plan cached for the query. If the query is frequently used, it can sink your performance.
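If you really have to build a statement dynamically, sp_executesql with parameters at least lets the plan be reused (a sketch; the table and column names are made up):

DECLARE @sqlcommand nvarchar(4000)
DECLARE @id int
SET @id = 42
-- The statement text stays constant, so its plan can be reused across values of @id
SET @sqlcommand = N'SELECT column1, column2 FROM tablea WHERE id = @id'
EXEC sp_executesql @sqlcommand, N'@id int', @id = @id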

2) Store your tables outside of the PRIMARY filegroup, split by function. As your database grows and your company is successful, you will be able to move the data files belonging to a filegroup onto another disk, improving your performance by taking advantage of RAID 10 and RAID 5.

3) Store tempdb on a different disk from your transaction log and data files (that would be 3 disks or RAID systems). If you can't afford that, group tempdb and the transaction log together and put your data files on a different disk.

4) Know your database intimately. How much does it grow every month? How much is the business expected to grow? Is growth smooth, or does it double suddenly? That lets you define the size of your data file when you create it. Reserving the disk space for a data file up front allows the data inserted into its tables to be contiguous: less fragmentation, better performance.

4) Look at the execution plan. If you're learning how to write good SQL, looking at it and trying different approaches will help. In the long run, you'll learn how to identify bottlenecks and understand the different methods used by the SQL Server engine.

5) The join command is not a filter. That one is a tricky one. For example, if I do:

select a.column1, a.column2, b.column1
from tablea a
inner join tableb b
on a.id = b.id
where b.column1 = 'somefilter'

That will produce good results as long as the database is small.

But as it grows big and reaches a few million rows, tablea will still need to be scanned from top to bottom to determine which rows match b.id for the rows where b.column1 = 'somefilter'.

If tablea is the big table, that's a serious problem. The answer: filter tablea. Does the query really need to run on all rows, or only on the last day of inserted data? It could be this instead:

select id, column1, column2
into #tempa
from tablea
where date = @yesterday

select a.column1, a.column2, b.column1
from #tempa a
inner join tableb b
on a.id = b.id
where b.column1 = 'somefilter'

6) If you've got to use the like command, try to avoid filters that look like '%myfilter%'.
You won't be able to use an index with them, because the SQL Server engine can't tell where to look; it could be anything. Instead use like 'myfilter%'. That way the SQL Server engine knows you're looking for something starting with 'myfilter'.

7) Give rights on a need-to-know basis, and ask for reasons when you're asked for privileges. Data is sensitive, and in the end you're the one in charge.

8) Do good backups and test them regularly. This point would be a long post in itself, but really: I've seen places making backups and feeling safe just because of that. Better to feel truly unsafe than wrongly safe.

9) When deploying solutions, use methods that won't leave passwords in clear text in files. The best way I've found is to put my stored procedure calls in SSIS, which is then protected either with my user key or with a password of its own stored in a configuration file.

10) Have fun doing what you do. Databases are sensitive and the pressure is often high. If you're doing a good job, nobody will notice, because it means nothing is breaking. So really, have a lot of fun and pleasure implementing something great. It will make you feel good 🙂

Written by Nicolas de Fontenay

2009/07/14 at 4:03 am

Transfer data from multiple tables to the same tables somewhere else


I’ve got this small project today which makes for a very simple yet interesting package in SSIS.

The problem is as follows: we've got a database in which a table is generated every day, with some traffic data in it.

We recently got a new server onto which we had to migrate the data. I know, I know: why didn't we just do a backup and restore? I wonder too, but I was presented with the situation after the fact. Now we have data here and old data there… and we need it all here.

I love using expressions because they make a package really flexible when changing environments, from dev to production for instance.

But here’s another use for expressions:

The tables have the following naming format: stat_daily_yyyymmdd. They need to be created on the destination, then the data will be pushed from Server1 to Server2.

For this, I used a For Loop container, an SQL task and a data flow.

In the data flow, I use an OLE DB Source and an OLE DB destination.

1) We will need a couple of initial variable to start with as follow:

counter int32 0
date datetime 2009/01/01 (whenever the tables starts being created)

2) Then we need some variables to store our generated table and query

tableName string
properties: evaluate as expression True
Expression:

"stat_daily_" + (DT_STR,4,1252)YEAR(DATEADD("dd", @[User::counter], @[User::date]))
+ RIGHT("0" + (DT_STR,4,1252)MONTH(DATEADD("dd", @[User::counter], @[User::date])), 2)
+ RIGHT("0" + (DT_STR,4,1252)DAY(DATEADD("dd", @[User::counter], @[User::date])), 2)
This adds @counter days to the initial date and pads the month and day with an extra 0 where needed. I found this technique on Jamie Thomson's excellent blog, SSIS Junkie. The end result looks like this:

stat_daily_20090101
stat_daily_20090102
and so on…

Statdaily String
evaluate as expression: True
expression: "select * from " + @[User::tablename]

createTable String:
evaluate as expression: True
expression: "Create table " + @[User::tablename] +
"( column1 varchar(10),
column2 int) on [myfilegroup]"
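For 2009/01/01, for example, createTable evaluates to something like this:

Create table stat_daily_20090101( column1 varchar(10),
column2 int) on [myfilegroup]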

Now that we've got our variables ready, we need to configure our components:

1) The for loop container:

initExpression: @counter = 0
evalExpression: DATEADD("dd", @counter, @date) <= (DT_DATE)"yyyy/mm/dd" — replace "yyyy/mm/dd" with the date where the loop should end.
assignExpression: @counter = @counter + 1

2) Insert an SQL task into the for loop container and here’s its parameters:

connection string: Set it to your destination database

Then click on “Expressions”:
Property: SqlStatementSource
Expression: @[User::createTable]

3) Inside the for loop container, add a data flow task and link the SQL task to it.

In the data flow task:

a) Add an OLE DB Source component

choose your connection string as usual, but the data access mode should be:

SQL command from variable. Select the Statdaily variable.

b) Add an OLE DB destination and link the source to it.

Configure it as usual but choose

Table name or view name variable – fast load

Then select [User::tablename] as the variable.

Here we hit a problem…

The table is not created yet, so SSIS fails to map the fields, saying that the table does not exist.

To work around this, create the table in the database manually, perform the mapping, then drop the table.

When executing the package, it will still fail validation saying that the table does not exist. To avoid this, change the DelayValidation property to True at package level.

Now you can take a break, have a drink and enjoy your favorite Facebook game while the computer does the job for you.

Hope this post will help some people out there 🙂

Note: the technique used to map a non-existent table can also be used to work with temporary tables. It's the same problem really.

Written by Nicolas de Fontenay

2009/07/08 at 10:31 am

Posted in SSIS