SQL Server-ing you

A blog about SQL Server

Archive for the ‘SSIS’ Category

SSIS and file reading using parallelism


In this post, I'm discussing the use of parallelism in SSIS when reading multiple flat files of different types located in the same folder.

So what we have is an SSIS control flow with 5 different data flow components.
Each data flow component performs a very simple task, using a multiflatfile connection as a source and a database as a destination, each feeding a different table.

I can't even give the names of the different file types because of the confidentiality clauses in my contract. But let's call them:

Type1, Type2, Type3, Type4 and Type5.

Every hour a job collects data from a cluster of 7 servers and FTPs it to our server. That is 24 files * 7 servers a day = 168 files per file type, and 840 files in total across the 5 types.

The data is pretty large too, so to speed up the process, rather than running the loads one after the other, I decided to run them all together. We have a powerful server with plenty of RAM and processors, and I wanted to take advantage of it. It's neither the processors nor the RAM that failed in this design; it was access to the folder by each of the data flow components.

Since the files are sent in a zip file hourly, they are all unzipped into a single folder. Each of the data flows accesses that folder and picks up its assigned files as defined by its multiflatfile connection (D:\project\yyyymmdd\type1*.txt, D:\project\yyyymmdd\type2*.txt, etc.).

A few days after deployment, we started noticing that some of the tables did not contain the correct amount of data; sometimes it was significantly less. We also noticed that the afflicted tables were more often those whose names start with the first letters of the alphabet (names starting with a or c).

As it turns out, each data flow, in the process of identifying which files belong to it, locks every file to read its name and select or reject it. When they all start at the same time, there is a good chance that by the time the data flow a file actually belongs to tries to open it to read its content, the file is already locked… The problem gets worse as the number of data flows trying to access files in the same folder increases.

Solution:

1) Put each file type in a different folder (see the layout sketch after this list).

Using a Foreach… Loop container doesn't help solve the problem, because it also locks the files in the folder while reading them to build its collection of objects.

2) Change your package property "FailParentOnFailure" to True. This ensures that if one of the data flows fails (including on locked files), the job also fails rather than keep loading whatever is left.
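To make option 1 concrete, here is a sketch of what the layout could look like (the subfolder names are only an illustration): the unzip step drops each file type into its own subfolder, so each multiflatfile connection only ever sees its own files:

D:\project\yyyymmdd\type1\type1*.txt (connection for the Type1 data flow)
D:\project\yyyymmdd\type2\type2*.txt (connection for the Type2 data flow)
…
D:\project\yyyymmdd\type5\type5*.txt (connection for the Type5 data flow)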

Written by Nicolas de Fontenay

2009/10/05 at 2:32 am

Posted in SSIS

SSIS – variables take their initial value before using expressions?


wait a second… Could it… Is it…

I have a pretty straightforward SSIS job using a bunch of expression-set variables, and I was seeing something really weird: every day, I would get an extra row with data from the same date, 2009/07/15. Even though I have more than one file for that day (2009/07/15), only the first file (which has one row) would be loaded. Then it loads the rest for the current day.

What happened here?! Of course, when something like that occurs, you check your sources.

The way things work, I have a date variable which is set when calling dtexec with the day to be loaded (say 2009/07/19). This date is then used to build the path of the files to be loaded: D:\Load\20090719.
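What follows is a rough sketch of that setup, not the actual package (LoadPackage.dtsx, User::LoadDate and FilePath are illustrative names): the job passes the date on the dtexec command line, and a second variable builds the folder path from it with an expression:

dtexec /F LoadPackage.dtsx /SET \Package.Variables[User::LoadDate].Properties[Value];2009/07/19

FilePath (String, EvaluateAsExpression = True):
"D:\\Load\\" + (DT_STR,4,1252)YEAR(@[User::LoadDate])
+ RIGHT("0" + (DT_STR,4,1252)MONTH(@[User::LoadDate]),2)
+ RIGHT("0" + (DT_STR,4,1252)DAY(@[User::LoadDate]),2)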

The date variable's initial value in the package was 2009/07/15: the very date for which I get an extra row every single day.

I've changed it to blank and I'm going to run some tests to see what happens. Will keep you posted later today 🙂

Written by Nicolas de Fontenay

2009/07/20 at 2:19 am

Posted in SSIS

Transfer data from multiple tables to the same tables somewhere else


I’ve got this small project today which makes for a very simple yet interesting package in SSIS.

The problem is as follows: we have a database in which a table is generated every day with some traffic data in it.

We recently got a new server onto which we had to migrate the data. I know, I know. Why didn't we just do a backup? I wonder too, but I was presented with the situation as a done deal. Now we have new data here and old data there… and we need all of it here.

I love using expressions because they make a package really flexible when changing environments, from dev to production for instance.

But here’s another use for expressions:

The tables are named in the following format: stat_daily_yyyymmdd. They need to be created on the destination, and then the data will be pushed from Server1 to Server2.

For this, I used a For… Loop container, an SQL task and a data flow.

In the data flow, I use an OLE DB Source and an OLE DB destination.

1) We will need a couple of initial variables to start with, as follows:

counter Int32 0
date DateTime 2009/01/01 (or whenever the tables started being created)

2) Then we need some variables to store our generated table name and queries:

tableName String
properties: EvaluateAsExpression = True
Expression:

"stat_daily_" + (DT_STR,4,1252)YEAR(DATEADD("dd", @[User::counter], @[User::date]))
+ RIGHT("0" + (DT_STR,4,1252)MONTH(DATEADD("dd", @[User::counter], @[User::date])),2)
+ RIGHT("0" + (DT_STR,4,1252)DAY(DATEADD("dd", @[User::counter], @[User::date])),2)
This adds @counter days to the initial date and pads the months and days with an extra 0 where needed. I found this technique on Jamie Thomson's excellent blog, SSIS Junkie. The end result looks like this:

stat_daily_20090101
stat_daily_20090102
and so on…

Statdaily String
EvaluateAsExpression: True
Expression: "select * from " + @[User::tableName]
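For the first iteration, Statdaily then evaluates to a query along these lines:

select * from stat_daily_20090101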

createTable String
EvaluateAsExpression: True
Expression: "Create table " + @[User::tableName] +
"( column1 varchar(10),
column2 int) on [myfilegroup]"
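With the first generated table name, the SQL task configured below therefore receives a statement along these lines (column1, column2 and [myfilegroup] are the same placeholders as in the expression):

Create table stat_daily_20090101
( column1 varchar(10),
column2 int) on [myfilegroup]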

Now that we've got our variables ready, we need to configure our components:

1) The For Loop container:

InitExpression: @counter = 0
EvalExpression: DATEADD("dd", @counter, @date) < (DT_DATE)"yyyy-mm-dd" (replace "yyyy-mm-dd" with the date where the loop should end; the condition has to use @counter, since @date itself never changes)
AssignExpression: @counter = @counter + 1

2) Insert an SQL task into the For Loop container; here are its parameters:

connection string: Set it to your destination database

Then click on “Expressions”:
Property: SqlStatementSource
Expression: @[User::createTable]

3) Inside the For Loop container, add a data flow task and link the SQL task to it.

In the data flow task:

a) Add an OLE DB Source component

Choose your connection as usual, but the data access mode should be:

SQL command from variable. Select the Statdaily variable.

b) Add an OLE DB destination and link the source to it.

Configure it as usual but choose

Table name or view name variable – fast load

Then select [User::tableName] as the variable.

Here we hit a problem…

The table is not created yet so SSIS fails to map the fields saying that the table does not exist.

To work around this, create the table in the database once, perform the mapping, and then drop the table.
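As a sketch of the workaround on the destination server (the table definition is the same placeholder one used in the createTable expression, and stat_daily_20090101 is simply whatever tableName evaluates to at design time):

Create table stat_daily_20090101 ( column1 varchar(10), column2 int) on [myfilegroup]
-- map the OLE DB destination against this table in the designer, then:
Drop table stat_daily_20090101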

When executing the package, it will still fail validation, saying that the table does not exist. To avoid this, change the DelayValidation property to True at the package level.

Now you can take a break, have a drink and enjoy your favorite Facebook game while the computer does the job for you.

Hope this post will help some people out there 🙂

Note that the technique of mapping a non-existent table can also be used to work with temporary tables. It's the same problem really.

Written by Nicolas de Fontenay

2009/07/08 at 10:31 am

Posted in SSIS