SQL Server-ing you

A blog about SQL Server

SSIS – How to get the last date of a month?


I need to get the first and last date of a month in SSIS. These will be used to process partitions in my SSAS Cube afterwards.

The answer to the riddle: the last day of the month is the day before the first day of the next month.

1) Create 3 variables: Date, MonthBeginning and MonthEnding

2) Set the MonthBeginning variable's EvaluateAsExpression property to True and edit the expression as follows (the month is padded with RIGHT so that January comes out as "01" rather than "1"):
(DT_WSTR,4)YEAR(@[User::Date]) + "/" + RIGHT("0" + (DT_WSTR,2)MONTH(@[User::Date]), 2) + "/01"
3) Set the MonthEnding variable's EvaluateAsExpression property to True and edit the expression as follows (no padding is needed for the day, since the last day of a month is always 28 to 31):
(DT_WSTR,4)YEAR(DATEADD("dd",-1,DATEADD("mm",1,(DT_DATE)@[User::MonthBeginning]))) + "/" + RIGHT("0" + (DT_WSTR,2)MONTH(DATEADD("dd",-1,DATEADD("mm",1,(DT_DATE)@[User::MonthBeginning]))), 2) + "/" + (DT_WSTR,2)DAY(DATEADD("dd",-1,DATEADD("mm",1,(DT_DATE)@[User::MonthBeginning])))

I like my dates in the yyyy/mm/dd format, hence the big expression above; otherwise the conversion returns something like mm/dd/yyyy, which I hate.

The yyyy/mm/dd format makes things easier for me, since I just need to replace "/" with "" (empty string) to get nice integers, 20091001 and 20091031, which represent the foreign keys in my fact table :) yay.

Then you can go along and use this to partition your cube and so on.

To summarize: we take the first day of the month we want, say 2009/10/01. We add one month (2009/11/01) and subtract one day (2009/10/31). Bingo: last date of the month.
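For reference, the same trick works in plain T-SQL too. A minimal sketch (style 111 gives the same yyyy/mm/dd format):

DECLARE @Date smalldatetime
DECLARE @MonthBeginning smalldatetime
DECLARE @MonthEnding smalldatetime

SET @Date = '2009/10/29'
-- First day of the month: count whole months since day 0, add them back to day 0
SET @MonthBeginning = DATEADD(mm, DATEDIFF(mm, 0, @Date), 0)
-- Last day of the month: first day of next month, minus one day
SET @MonthEnding = DATEADD(dd, -1, DATEADD(mm, 1, @MonthBeginning))

SELECT CONVERT(varchar(10), @MonthBeginning, 111) AS MonthBeginning -- 2009/10/01
     , CONVERT(varchar(10), @MonthEnding, 111) AS MonthEnding       -- 2009/10/31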

Written by Nicolas de Fontenay

2009/10/29 at 8:18 am

SSIS and file reading using parallelism


In this post, I'm discussing the use of parallelism in SSIS when reading multiple flat files of different types located in the same folder.

So what we have is an SSIS control flow with 5 different data flow components.
Each data flow component performs a very simple task, using a MULTIFLATFILE connection as a source and a database as a destination, each feeding a different table.

I can't even give the names of the different file types because of the confidentiality clauses in my contract. But let's call them:

Type1, Type2, Type3, Type4 and Type5.

Every hour, a job collects data from a cluster of 7 servers and FTPs it to our server. That is 24 files a day * 7 servers = 168 files per file type, and 840 files in total.

The data is pretty large too, so to speed up the process, rather than running the data flows one after the other, I decided to run them all together. We have a powerful server with plenty of RAM and processors, and I wanted to take advantage of it. It wasn't the processors or the RAM that failed in this design; it was access to the folder by each of the data flow components.

Since the files arrive in an hourly zip file, they are all unzipped into one folder. Each data flow accesses the folder and picks up its assigned files as defined by its MULTIFLATFILE connection (D:\project\yyyymmdd\type1*.txt, D:\project\yyyymmdd\type2*.txt, etc.).

A few days after deployment, we started noticing that some of the tables did not have the correct amount of data; sometimes it was significantly less. We also noticed that the afflicted tables were most often those whose names start with the first letters of the alphabet (names starting with a or c).

As it turns out, each data flow, in the process of identifying which files belong to it, locks each file to read its name and accept or reject it. When they all start at the same time, there is a good chance that by the time the data flow a file actually belongs to tries to read its content, the file is locked by another. The problem gets more serious as the number of data flows reading files from the same folder increases.

Solution:

1) Put each file type in a different folder.

Using a Foreach Loop container doesn't solve the problem either, because it also locks the files in the folder while reading them to build its collection of objects.

2) Set the FailParentOnFailure property to True. This ensures that if one of the data flows fails (including on a locked file), the job fails as well, rather than keeping on loading whatever is left.

Written by Nicolas de Fontenay

2009/10/05 at 2:32 am

Posted in SSIS

T_SQL – Stored Procedure to create a time dimension


This is a stored procedure that builds a 24-hour time dimension with its boundaries.

Note: you should create the final table and plan for a surrogate key with identity(1,1).

Also, to store hours in 24-hour style it's not possible to use an int data type, as that would strip the leading 0 (zero); hence the varchar columns below.
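A minimal sketch of what that final table could look like (column names assumed from the result below; Dim_Time_Sk is the identity surrogate key):

CREATE TABLE Dim_Time
(
Dim_Time_Sk int identity(1,1) NOT NULL PRIMARY KEY,
Dim_Time_Time varchar(2) NOT NULL, -- '00' to '23'; varchar keeps the leading zero
Dim_Time_Min varchar(8) NOT NULL, -- lower boundary, e.g. '09:00:00'
Dim_Time_Max varchar(8) NOT NULL -- upper boundary, e.g. '09:59:59'
)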

Anyway, here goes:

CREATE PROCEDURE Sp_Create_Dim_Time
AS

BEGIN

DECLARE @COUNT INT
SET @COUNT = 1

CREATE TABLE Dim_Time_TEMP
(

Dim_Time_Time varchar(2) not null,
Dim_Time_min varchar(8) not null,
Dim_Time_max varchar(8) not null
)

WHILE @COUNT <= 24
BEGIN

INSERT INTO Dim_Time_TEMP
SELECT CASE WHEN
LEN(CONVERT(VARCHAR(2),DATEPART(HH,DATEADD(HH,@COUNT,GETDATE())))) = 1
THEN
'0' + CONVERT(VARCHAR(2),DATEPART(HH,DATEADD(HH,@COUNT,GETDATE())))
ELSE
CONVERT(VARCHAR(2),DATEPART(HH,DATEADD(HH,@COUNT,GETDATE())))
END [TIME],

CASE WHEN
LEN(CONVERT(VARCHAR(2),DATEPART(HH,DATEADD(HH,@COUNT,GETDATE())))) = 1
THEN
'0' + CONVERT(VARCHAR(2),DATEPART(HH,DATEADD(HH,@COUNT,GETDATE()))) + ':00:00'
ELSE
CONVERT(VARCHAR(2),DATEPART(HH,DATEADD(HH,@COUNT,GETDATE()))) + ':00:00'
END [MIN],

CASE WHEN
LEN(CONVERT(VARCHAR(2),DATEPART(HH,DATEADD(HH,@COUNT,GETDATE())))) = 1
THEN
'0' + CONVERT(VARCHAR(2),DATEPART(HH,DATEADD(HH,@COUNT,GETDATE()))) + ':59:59'
ELSE
CONVERT(VARCHAR(2),DATEPART(HH,DATEADD(HH,@COUNT,GETDATE()))) + ':59:59'
END [MAX]

SET @COUNT = @COUNT + 1

END

INSERT INTO Dim_Time (Dim_Time_Time, Dim_Time_Min, Dim_Time_Max)
SELECT        Dim_Time_Time, Dim_Time_Min, Dim_Time_Max
FROM        Dim_Time_TEMP
ORDER BY    Dim_Time_Time

DROP TABLE Dim_Time_TEMP

END
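Run it once to populate the dimension:

EXEC Sp_Create_Dim_Time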

The result will look like this:

Dim_Time_Sk    Dim_Time_Time    Dim_Time_Min    Dim_Time_Max
1    00    00:00:00    00:59:59
2    01    01:00:00    01:59:59
3    02    02:00:00    02:59:59
4    03    03:00:00    03:59:59
5    04    04:00:00    04:59:59
6    05    05:00:00    05:59:59
7    06    06:00:00    06:59:59
8    07    07:00:00    07:59:59
9    08    08:00:00    08:59:59
10    09    09:00:00    09:59:59
11    10    10:00:00    10:59:59
12    11    11:00:00    11:59:59
13    12    12:00:00    12:59:59
14    13    13:00:00    13:59:59
15    14    14:00:00    14:59:59
16    15    15:00:00    15:59:59
17    16    16:00:00    16:59:59
18    17    17:00:00    17:59:59
19    18    18:00:00    18:59:59
20    19    19:00:00    19:59:59
21    20    20:00:00    20:59:59
22    21    21:00:00    21:59:59
23    22    22:00:00    22:59:59
24    23    23:00:00    23:59:59

Hope you enjoyed it.

Nico

Written by Nicolas de Fontenay

2009/09/04 at 8:29 am

Posted in Data Warehouse

a send-mail function for powershell


I've been working a lot with PowerShell.

Our data warehouse processes are going to be controlled by PowerShell, which in turn calls SSIS packages.

Every step is logged for success or failure. Everything is tracked.

At the PowerShell level, however, I needed to be informed if, for example, there is a string malformation when dtexec is called.

So I've built this send-mail function, which I think will be useful to anyone who needs to send mail:

function Send-Email
{
param
(
[string]$message,
[string]$subject,
[string]$to,
[string]$from,
[string]$attachment
)
# Build the SMTP client and the message itself
$SmtpClient = New-Object System.Net.Mail.SmtpClient
$MailMessage = New-Object System.Net.Mail.MailMessage
$SmtpClient.Host = "smtp.mail.com" # your SMTP server goes here
$MailMessage.From = $from
$MailMessage.To.Add($to)
$MailMessage.Subject = $subject
$MailMessage.Body = $message
# Attach a document only if one was supplied
if ($attachment)
{
$att = New-Object System.Net.Mail.Attachment($attachment)
$MailMessage.Attachments.Add($att)
}
$SmtpClient.Send($MailMessage)
}

usage:

Send-Email "some message" "my subject" "to@anaddress.com" "from@me.com"

or

Send-Email "some message" "my subject" "to@anaddress.com" "from@me.com" "D:\Test.txt"

The function checks whether $attachment was supplied and attaches the document if it was.

Written by Nicolas de Fontenay

2009/09/01 at 7:59 am

Posted in Uncategorized


SSIS – variables take their initial value before expressions are evaluated?


wait a second… Could it… Is it…

I have a pretty straightforward SSIS job using a bunch of expression-driven variables, and I was seeing something really weird: every day, I would get an extra row with data from the same date, 2009/07/15. Even though I have more than one file for that day (2009/07/15), only the first file (which has one row) would be loaded. Then it loads the rest for the current day.

What happened here?! Of course, when something like that occurs, you check your sources.

The way things work, I have a date variable which is defined when calling dtexec with the day to be loaded (say 2009/07/19). This date is then used to set the path of the files to be loaded: D:\Load\20090719.
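The call looks something like this (the package path is hypothetical):

dtexec /F "D:\Load\LoadPackage.dtsx" /SET "\Package.Variables[User::Date].Value;2009/07/19"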

The date variable's initial value was set to 2009/07/15: the very date for which I always get an extra row, every single day.

I've changed it to blank. I'm going to run some tests to see what happens. Will keep you posted later today :)

Written by Nicolas de Fontenay

2009/07/20 at 2:19 am

Posted in SSIS

SQL Server – Load data in a partition table using switch command


I suggest you read the first part before reading this post.

Yesterday we saw how to implement a sliding-window partition strategy.
Today we are going to use it: load data into a staging table and then switch it into the master table.

To make for a complete test, we are going to add an index to yesterday’s table on creation_date and description.

It’s important to understand what happens inside the data files.

When loading data into the staging table, you can think of a table name as a pointer: tableName_Load points to where the data is located.
When the SWITCH command is applied, the master table tableName now points to the slot of data stored until then in the staging table, and the staging table points to null.
The data now belongs to the master table. In this scenario, the data never moves on the hard disk. That is why a SWITCH command is so fast, and that's also why a switch produces virtually no transaction log.
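The switch itself is a single statement; a sketch, where 42 stands in for whatever partition number the day maps to:

-- No rows are copied and nothing moves on disk: only metadata changes hands
ALTER TABLE tableName_load SWITCH TO tableName PARTITION 42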

Important point: the SWITCH command requests a schema modification lock. It's a very broad lock which requires that the user performing the switch be the only one accessing the table at the moment the switch occurs. Failing that, the SWITCH command waits until it gets its lock, and on a busy database that can take a long time. If you find yourself waiting on a switch, keep in mind that it's not a long transaction; it's the wait that is crippling your process.

Last but not least: it might look attractive to partition on a date because it fits our organized minds' habit of grouping things by date, but it might not be the choice that gives you the best performance. Know your data intimately; know your queries intimately. If you pull a lot of data by city id, for example, it might be a better idea to partition your table by city_id.
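For instance, partitioning on an integer key rather than a date would look something like this (a sketch with hypothetical boundary values):

CREATE PARTITION FUNCTION pfnCityId (int)
AS RANGE RIGHT FOR VALUES (100, 200, 300)

CREATE PARTITION SCHEME pschCityId
AS PARTITION pfnCityId ALL TO ([PRIMARY])

Back to our example: here is yesterday's table again, with the index added.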

Create table [tableName]
(
id int,
creation_date smalldatetime,
description varchar(50),
CONSTRAINT PK_id_date PRIMARY KEY CLUSTERED
(
id ASC,
creation_date ASC
) ON [pschTableName](creation_date)
) ON [pschTableName](creation_date)

CREATE INDEX IDX_PK2_3 ON tableName (creation_date, description) ON [pschTableName](creation_date)

The idea here is to insert the data into a table whose columns, primary key and indexes match those of the master table.

So we are going to create a table called tableName_load. Since this table has to be on the filegroup where the data for that day will be located, it's best to write a stored procedure that does it for you.


CREATE PROCEDURE [dbo].[uspPartitionDailyCreateSwitchTables]

@day smalldatetime
AS
-- exec uspPartitionDailyCreateSwitchTables '2009/03/01'
DECLARE @sqlcommand varchar(4000)
,@tablename varchar(50)
,@filegroup varchar(50)

BEGIN

SET @tablename = 'tableName_load'
-- Filegroup naming convention: tableName_yyyymmdd
SET @filegroup = replace(@tablename,'_load','') + '_' + replace(convert(varchar(10),@day,120),'-','')

-- Drop the load table if a previous run left it behind
if exists(select * from sys.objects where name = @tablename)
BEGIN
SET @sqlcommand = 'drop table ' + @tablename
exec (@sqlcommand)
END

-- Recreate it on the filegroup holding that day's partition
SET @sqlcommand = 'create table [' + @tablename + '] (id int, creation_date smalldatetime, description varchar(50)) ON [' + @filegroup + ']'
exec (@sqlcommand)

END
GO

This stored procedure drops the load table if it exists and creates a new one on the filegroup created for that day. What's important to notice is that the primary key and indexes are not created at this point.

The reason is that it's much, much faster to insert data into a table with no index or primary key. So it's important to make sure your data integrity is good; otherwise the job will fail when the primary key is created, after tableName_Load has been loaded with SSIS or BCP.

Now that the table has been created and populated with your favorite loading tool, we need to create the index and add a constraint on the date (if the table were partitioned on an integer column, the constraint would be on that integer column instead).

The constraint is not there because I'm excessively picky about my data; it's a SQL Server requirement for switching data into a partition. Even if you're sure your data is good, the switch will fail with an error saying you're missing a constraint, or that the constraint on the load table does not match the constraint for the partition.

This is the stored procedure used:

CREATE PROCEDURE [dbo].[uspLoadDataAddConstraintsIndexes]
@day smalldatetime
AS

DECLARE @sqlcommand varchar(4000)
DECLARE @filegroup varchar(50)

BEGIN
-- Filegroup the load table was created on for that day
SET @filegroup = 'tableName_' + replace(convert(varchar(10),@day,120),'-','')

-- Drop the check constraint if it is left over from a previous load
IF EXISTS (SELECT * FROM sys.check_constraints WHERE object_id = OBJECT_ID(N'[dbo].[CK_tableName_load]') AND parent_object_id = OBJECT_ID(N'[dbo].[tableName_load]'))
ALTER TABLE [dbo].[tableName_load] DROP CONSTRAINT [CK_tableName_load]

-- Create the check constraint: creation_date must fall inside the day being loaded
SET @sqlcommand = 'ALTER TABLE [dbo].[tableName_load] WITH CHECK ADD CONSTRAINT [CK_tableName_load] CHECK ([creation_date] >= convert(smalldatetime,''' + convert(varchar(10),@day,120) + ''') AND [creation_date] < convert(smalldatetime,''' + convert(varchar(10),dateadd(dd,1,@day),120) + '''))'
exec (@sqlcommand)
SET @sqlcommand = 'ALTER TABLE [dbo].[tableName_load] CHECK CONSTRAINT [CK_tableName_load]'
exec (@sqlcommand)
-- Create the primary key, matching the master table's
SET @sqlcommand = 'ALTER TABLE [dbo].[tableName_load] ADD CONSTRAINT [PK_tableName_load] PRIMARY KEY CLUSTERED (id ASC, creation_date ASC) ON [' + @filegroup + ']'
exec (@sqlcommand)
-- Create the index, matching the master table's
SET @sqlcommand = 'CREATE INDEX IDX_PK2_3 ON [dbo].[tableName_load] (creation_date ASC, description ASC) ON [' + @filegroup + ']'
exec (@sqlcommand)
END
GO

Finally, all we need to do is put the whole thing together as follows:

CREATE PROCEDURE [dbo].[uspLoadData]
@date smalldatetime
AS

DECLARE @partitionNumber varchar(2)

BEGIN

-- Create constraints before switching and after BCP
exec uspLoadDataAddConstraintsIndexes @date

-- Switch the partition tables into place
SET @partitionNumber = dbo.getPartitionNumber('pfntableName',@date)
exec Partition2Switch 'tableName_load', @partitionNumber, 'tableName'

END
GO

The function getPartitionNumber and the stored procedure Partition2Switch can be found in the article posted just before this one.

To summarize everything since yesterday:

1) There's a procedure that creates a new partition every day on a new filegroup and data file.
2) There's another procedure that cleans up the oldest partition and gets rid of its filegroup and data file.
3) There's a procedure that creates a staging table on the filegroup created for that day.
4) Data is loaded into the staging table.
5) After the data is loaded, we create the constraint and indexes prior to switching.
6) We switch the data into the master table.
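Put together, one day's load boils down to something like this (the date is hypothetical):

-- 1) Create the load table on that day's filegroup
exec uspPartitionDailyCreateSwitchTables '2009/07/17'
-- 2) Bulk load tableName_load with SSIS or BCP
-- 3) Add the constraint and indexes, then switch into the master table
exec uspLoadData '2009/07/17'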

Written by Nicolas de Fontenay

2009/07/17 at 7:37 am

Posted in administration

Picture needs


I definitely need to be able to store pictures here for my SSIS needs.

Written by Nicolas de Fontenay

2009/07/17 at 2:26 am

Posted in Uncategorized
