SQL Server-ing you

A blog about SQL Server

SSIS and file reading using parallelism

leave a comment »

In this post, I’m discussing the usage of parallelism in SSIS when reading multiple flat files of different type located in the same folder.

So what we have is an SSIS control flow with 5 different data flow components.
Each data flow components perform a very simple task using a multiflatfile connection as a source and a database as a destinationeach feeding a different table.

I can’t even give the name of the different file types because of the secrecy lines in my contract. But let’s call them:

Type1, Type2, Type3,Type4 and Type5.

Every hour a job collects data from a cluster of 7 servers  and ftp it to our server. That is 24 files * 7 a day per type = 168 files per file type and 840 files total.

They are pretty large data too, so to speed up the process rather than running them one after the other, I have decided to run them all together. We have a powerful server with plenty of RAM and processors and I wanted to take advantage of it. It’s not processors nor Ram that failed in this design. It was access to the folder by each of the data flow component.

Since the files are sent in a zip file hourly, they are all unzip in a folder. All of the control flow will access the folder and get their assigned files as defined by the multiflatfile connection (D:\project\yyyymmdd\type1*.txt, D:\project\yyyymmdd\type2*.txt) etc..

A few days after deployment, we started noticing that some of  tables was not having the correct amount of data. Sometimes it was significantly less.  We also noticed that the afflicted table was more often those starting with the first alphabetical letters (tables with a name starting by a or c).

As it turns out, each data flow in the process of identifying which file belongs to him, locks it to read the name and select or reject it. In a situation where all starts at the same time, the data flow to which the file belongs has big chances that when he tries to access it to read its content, the said file is locked… The problem will get more serious as the number of data flow trying to access files in the same folder increase.

Solution:

1) Put each file types in a different folder

Using a foreach … loop component doesn’t help solving the problem because it also locks the files in the folder to read it and build its collection of objects.

2) Change your package property “FailParentOnFailure” to true. This will ensure that in the event where one of the data flow fail (including lock files),  the job will also fail rather than keep loading whatever is left.

Advertisements

Written by Nicolas de Fontenay

2009/10/05 at 2:32 am

Posted in SSIS

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: