SQL Server-ing you

A blog about SQL Server

Archive for October 2009

SSIS – How to get the last date of a month?

leave a comment »

I need to get the first and last date of a month in SSIS. These will be used to process partitions in my SSAS Cube afterwards.

The riddle answer is: The last day of the month is the day before the first day of next month.

1) Create 3 variables: Date, MonthBeginning and MonthEnding

2) Change MonthBeginning property “evaluate as expression to true” and edit the expression as follow
(DT_WSTR,4)YEAR(@[User::Date])  + “/” + (DT_WSTR,4)MONTH(@[User::Date]) + “/01”
3) Change MonthEnding property “evaluate as expression to true” and edit the expression as follow:
(DT_WSTR,4)YEAR(DATEADD(“dd”,-1,DATEADD(“mm”,1,(DT_DATE) @[User::MonthBeginning]))) + “/” + (DT_WSTR,4)MONTH(DATEADD(“dd”,-1,DATEADD(“mm”,1,(DT_DATE) @[User::MonthBeginning]))) + “/” + (DT_WSTR,4)DAY(DATEADD(“dd”,-1,DATEADD(“mm”,1,(DT_DATE) @[User::MonthBeginning])))

I like my dates in the yyyy/mm/dd format hence the big expression above otherwise it returns something looking like this: mm/dd/yyyy. I hate this.

yyyy/mm/dd format makes it easier for me since I just need to replace / with “” (empty string) to get a nice integer 20091001 and 20091031 which will represent the foreign key in my fact table 🙂 yay.

Then you can go along and use this to partition your cube and so on.

To summarize: we get the first day of the month we seek, say 2009/10/01. We add one month: 2009/11/01 and we remove one day: 2009/10/31. Bingo. Last date of the month.

Advertisements

Written by Nicolas de Fontenay

2009/10/29 at 8:18 am

SSIS and file reading using parallelism

leave a comment »

In this post, I’m discussing the usage of parallelism in SSIS when reading multiple flat files of different type located in the same folder.

So what we have is an SSIS control flow with 5 different data flow components.
Each data flow components perform a very simple task using a multiflatfile connection as a source and a database as a destinationeach feeding a different table.

I can’t even give the name of the different file types because of the secrecy lines in my contract. But let’s call them:

Type1, Type2, Type3,Type4 and Type5.

Every hour a job collects data from a cluster of 7 servers  and ftp it to our server. That is 24 files * 7 a day per type = 168 files per file type and 840 files total.

They are pretty large data too, so to speed up the process rather than running them one after the other, I have decided to run them all together. We have a powerful server with plenty of RAM and processors and I wanted to take advantage of it. It’s not processors nor Ram that failed in this design. It was access to the folder by each of the data flow component.

Since the files are sent in a zip file hourly, they are all unzip in a folder. All of the control flow will access the folder and get their assigned files as defined by the multiflatfile connection (D:\project\yyyymmdd\type1*.txt, D:\project\yyyymmdd\type2*.txt) etc..

A few days after deployment, we started noticing that some of  tables was not having the correct amount of data. Sometimes it was significantly less.  We also noticed that the afflicted table was more often those starting with the first alphabetical letters (tables with a name starting by a or c).

As it turns out, each data flow in the process of identifying which file belongs to him, locks it to read the name and select or reject it. In a situation where all starts at the same time, the data flow to which the file belongs has big chances that when he tries to access it to read its content, the said file is locked… The problem will get more serious as the number of data flow trying to access files in the same folder increase.

Solution:

1) Put each file types in a different folder

Using a foreach … loop component doesn’t help solving the problem because it also locks the files in the folder to read it and build its collection of objects.

2) Change your package property “FailParentOnFailure” to true. This will ensure that in the event where one of the data flow fail (including lock files),  the job will also fail rather than keep loading whatever is left.

Written by Nicolas de Fontenay

2009/10/05 at 2:32 am

Posted in SSIS