Apr 7, 2010

SSIS - Delete files in a folder older than a specified number of days

Scenario: How to delete files created a number of days before today. The files will have the date appended to the file name and we will not check the file properties to decide the file age. The file name would follow the naming convention: File_ddmmyyyy.txt

This can be achieved very easily in SSIS. Create an integer variable to store the value stating how old files do you wish to delete. Suppose you wish to delete all files older than 3 days, the variable would have the value 3. I name this variable intFileMaxAge. Make this variable configurable so that it can be changed as and when required.

Next I show the layout of the package.

Control Flow

                                                       Fig. 1

The Foreach Loop Container is configured to pick the files from a particular folder(you should make this connection configurable) and the “Retrieve file name” property is set to “Name and extension”. Refer fig. 2

ForEach loop

                                                       Fig. 2

Next we need to save the file name for each iteration to a string variable strFileName. For this go to the Variable Mappings tab and select the variable strFileName from the drop down box and set the index to zero. Refer fig.  3

 

ForEach loop_VarMapping

                                                       Fig. 3

Once the above is done put a script task in the Foreach Loop Container. Do not make any changes to the script task. It is just there so that we can have Precedence Constraint before the File system task to control the flow. We need the File System Task to be triggered only when the condition is met i.e. files created before certain days (Refer fig. 1). Double click the precedence  and set the Evaluation operation to Expression. In the Expression put the below expression:

DATEDIFF("dd",(DT_Date)(SUBSTRING(@[User::strFileName],6,2)+"-"+SUBSTRING(@[User::strFileName],8,2)+"-"+SUBSTRING(@[User::strFileName],10,4)),GETDATE()) > @[User::intFileMaxAge]

The above expression extracts the date part from the file name and converts it to date type. Once that’s done it gets the difference of the between the file date and today’s date. After this the expression checks whether the difference is greater than the value specified in the variable intFileMaxAge.

If the above expression will be evaluated to true the control will go to the File Sytem Task.

Setting up the File System Task. (refer fig. 4) The operation is set to Delete file  as we need to delete files. Set the IsSourcePathVariable to False as I create a connection to connect to the file and use expression to modify the connection at run time.

FileSystem Task

                                                       Fig. 4

In the Connection Managers select the connection manager configured above(refer fig. 4) go to its properties and in the expression set connection string to the following expression:

@[User::strFolderPath]+ @[User::strFileName]

Where the variable strFolderPath contains the folder in which the files are placed. Again make this configurable.

That completes the package.

9 comments:

  1. Hi,
    good blog specially on SSIS,
    I am interest in contents,
    Glad to familiar with your blog.

    Regards,
    Reza Rad,
    http://www.rad.pasfu.com

    ReplyDelete
  2. great job .....
    it saved me alot time ... thank you so much sudeep

    ReplyDelete
  3. Hi, my experience with SSIS packages is limited. I do not see how you set up the DeleteFile connection with an expression. Also, I keep getting stuck at setting up the precedence. My file is formatted Ayyyy-mm-dd-archive.txt. The error I get is:

    Error at Constraint: Error code 0x80020005 occurred attempting to convert from data type DT_WSTR to data type DT_DATE.

    Error at Constraint: Casting expression "(SUBSTRING(@[User::strFileName],10,2) + "-" + SUBSTRING(@[User::strFileName],7,2) + "-" + SUBSTRING(@[User::strFileName],2,4))" from data type "DT_WSTR" to data type "DT_DATE" failed with error code 0xC00470C2.

    Error at Constraint: Evaluating function "DATEDIFF" failed with error code 0xC00470C4.

    ReplyDelete
  4. hi Johnson,
    Is this the complete expression that you have set up? Where are you using the datediff function. And could you explain exactly whats your requirement.

    ReplyDelete
  5. I get the same error as Johnson. I copied your formula into the Precedence Constraint Editor properly . I followed your instructions step by step and get the same error as Johnson.

    DATEDIFF("dd",(DT_Date)(SUBSTRING(@[User::strFileName],6,2)+"-"+SUBSTRING(@[User::strFileName],8,2)+"-"+SUBSTRING(@[User::strFileName],10,4)),GETDATE()) > @[User::intFileMaxAge]

    ReplyDelete
  6. Could you provide your sample file name.

    ReplyDelete
  7. I received the same "Error at Constraint" message, but I was able to resolve it by placing a value in strFileName, formatted exactly like the names of the files I was going to be deleting. Even though strFileName is going to be updated at run-time, the expression editor needed a value to evaluate.

    I'm very inexperienced with SSIS, so forgive me if I'm stating the obvious, but I thought this might be helpful to other neophytes who may have made the same mistake I did.

    ReplyDelete
  8. You are correct. What happens is at the run time the variable gets populated but while development we need to populate this value. Thanks for pointing this.

    ReplyDelete

Thanks for your valuable comment!!