Nov 24, 2009

Dynamic file handling and formatting dateTime.

Hi,
I’m back with the second post. As I said I would be updating on how to set the file name dynamically for multiple files kept in a folder or different folder.
Point to note that all the files have the same metadata.
Scenario:
For simplicity sake I will be using a case where the file names and file paths are saved in a DB table. There is an ID auto increment field to based on which the files need to be picked and inserted into the destination Table.
The above table’s Schema is:


You can download this package from here.

CREATE TABLE [dbo].[FileConfig](
[ID] [int] IDENTITY(1,1) NOT NULL,
[FileName] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[FilePath] [nvarchar](max) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]

The input file data is comma delimited
The destination is SQL Table.
The data will be modified as per the definition given below.


Input  File Columns
Output  File Columns
ID -> String
ID -> unsigned Integer
FName -> String(50)
FName -> String(50)
LName -> String(50)
LName -> String(50)
Age -> String
Age -> unsigned Integer
DOB > String(DD/MM/YYYY)
DOB > DateTime


So the Package will look like this:


Create 2 string Variable: FileName and FilePath with the scope as the package(Practice in my case). Set the value of the variable to a file name and path that exists (I set it to “File1.txt” & “C:\SSIS\Practice\Files”) just while development.
Next create an object variable(ObjfileDetails) to store the result set from the initial Execute SQL Task.


Now set up the Execute SQL Task to get the file name and path. We save the full result set to an object variable.
ResultSet : Full Result Set
ConnectionType : OLEDB(that’s what I have used you could also use ADO.NET)
Connection : Select the appropriate one.
Note: it is ordered by ID column.


Set up the Result set as shown in figure


Note: The Result Name is set to 0.
Once the SQL Task will be executed this object variable will containing the entire result set returned by the query.
In the collections tab do the below settings:
Use the object variable in the For each loop.
Set the enumerator to Foreach ADO Enumerator.


Next goto the Variable mapping and do the below mentioned settings:
Select the 2 string variable to which the file name and file path will be saved for each iteration of the object variable.

Note: Set the index starting from 0 and the order should be same as the order of the columns fetched in the SELECT query above.

Once this is done we put a Data Flow Task(DFT) in the above For Each Loop.
In the DFT use:
1. Flat File Source to fetch the data from the input file.
2. Derived Column transformation to change the string date field to DB_DBDATE
3. OLEDB Destination to send the valid records to database table.
4. Flat File destination to log the error records.
Let’s have a look at the layout of the DFT.


In the Flat file source we make a new connection manager where we set browse and set the file path of an existing file.
In the Advanced tab set the Column Names as per the need and set the data type to which you can convert the data without the loss of information like the numeric columns. Leave the other to default or set the max width of the string columns.


Once the Flat File connection manager is set up select the columns you need in the flow and your done.
Next connect the Flat File Source to the Derived column to change the DOB column to DT_DBDATE format.
To do this in the derived column add a new column give it a name and set the expression to :
(DT_DBDATE)(SUBSTRING(DOB,4,2) + "-" + SUBSTRING(DOB,1,2) + "-" + SUBSTRING(DOB,7,4))
Once the expression is set click the Configure Error Output button at the bottom.

In case of error in the Derived Column transform (e.g. the date field is No Value, “00/00/000” or any invalid date) set it to redirect row so that it can be logged and analyzed.


For the correct records send the data to Table
In the OLEDB Destination:
Select the OLEDB Connection needed to connect to the destination Data base.
Select the table where the columns need to be inserted.


Goto the mapping
Go the mapping and do the mapping as required and click OK.


For the error records(red arrow) after the derived Column set a flat file destination to log the error records.
The DFT is now all set.
Now the DFT Flat File source needs to be made dynamic.
To achieve this follow the steps below.
Goto the connection manager right click-> Properties


In the properties pane for the above connection manager:
Set the DelayValidation to True
Expand the Expressions by clicking the ellipse.


On the Property Expressions Editor window:
In the Property column select the ConnectionString property from the drop down list.
Now expand the Expression column by clicking the ellipse. An Expression Builder window will pop up.

On the Expression Builder window:
Set the Expression as: @[User::FilePath]+ “\\” + @[User::FileName]
Click the Evaluate expression button at the left hand bottom to see if the expression is creating the desired output.
Click OK.
Click OK on the Property Expression Editor as well.


Save and build the package.
Now when you execute the package the package will take the filenames from the DB table and dump the data in order in the final table.




Let me know if you need any further clarification or I am not clear in any step.

3 comments:

  1. How to avoid if we get multiple headers when we merge files ?

    ReplyDelete
  2. If the number of header columns are fixed in ur Flat file source you could select how many rows to skip.
    Else you need to use Script Component and select just the detailed records.

    ReplyDelete
  3. You can have a look at the complete discussion in the SSIS Forum.

    ReplyDelete

Thanks for your valuable comment!!