Nov 24, 2009

Dynamic file handling and formatting dateTime.

Hi,
I’m back with the second post. As I said I would be updating on how to set the file name dynamically for multiple files kept in a folder or different folder.
Point to note that all the files have the same metadata.
Scenario:
For simplicity sake I will be using a case where the file names and file paths are saved in a DB table. There is an ID auto increment field to based on which the files need to be picked and inserted into the destination Table.
The above table’s Schema is:


You can download this package from here.

CREATE TABLE [dbo].[FileConfig](
[ID] [int] IDENTITY(1,1) NOT NULL,
[FileName] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[FilePath] [nvarchar](max) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]

The input file data is comma delimited
The destination is SQL Table.
The data will be modified as per the definition given below.


Input  File Columns
Output  File Columns
ID -> String
ID -> unsigned Integer
FName -> String(50)
FName -> String(50)
LName -> String(50)
LName -> String(50)
Age -> String
Age -> unsigned Integer
DOB > String(DD/MM/YYYY)
DOB > DateTime


So the Package will look like this:


Create 2 string Variable: FileName and FilePath with the scope as the package(Practice in my case). Set the value of the variable to a file name and path that exists (I set it to “File1.txt” & “C:\SSIS\Practice\Files”) just while development.
Next create an object variable(ObjfileDetails) to store the result set from the initial Execute SQL Task.


Now set up the Execute SQL Task to get the file name and path. We save the full result set to an object variable.
ResultSet : Full Result Set
ConnectionType : OLEDB(that’s what I have used you could also use ADO.NET)
Connection : Select the appropriate one.
Note: it is ordered by ID column.


Set up the Result set as shown in figure


Note: The Result Name is set to 0.
Once the SQL Task will be executed this object variable will containing the entire result set returned by the query.
In the collections tab do the below settings:
Use the object variable in the For each loop.
Set the enumerator to Foreach ADO Enumerator.


Next goto the Variable mapping and do the below mentioned settings:
Select the 2 string variable to which the file name and file path will be saved for each iteration of the object variable.

Note: Set the index starting from 0 and the order should be same as the order of the columns fetched in the SELECT query above.

Once this is done we put a Data Flow Task(DFT) in the above For Each Loop.
In the DFT use:
1. Flat File Source to fetch the data from the input file.
2. Derived Column transformation to change the string date field to DB_DBDATE
3. OLEDB Destination to send the valid records to database table.
4. Flat File destination to log the error records.
Let’s have a look at the layout of the DFT.


In the Flat file source we make a new connection manager where we set browse and set the file path of an existing file.
In the Advanced tab set the Column Names as per the need and set the data type to which you can convert the data without the loss of information like the numeric columns. Leave the other to default or set the max width of the string columns.


Once the Flat File connection manager is set up select the columns you need in the flow and your done.
Next connect the Flat File Source to the Derived column to change the DOB column to DT_DBDATE format.
To do this in the derived column add a new column give it a name and set the expression to :
(DT_DBDATE)(SUBSTRING(DOB,4,2) + "-" + SUBSTRING(DOB,1,2) + "-" + SUBSTRING(DOB,7,4))
Once the expression is set click the Configure Error Output button at the bottom.

In case of error in the Derived Column transform (e.g. the date field is No Value, “00/00/000” or any invalid date) set it to redirect row so that it can be logged and analyzed.


For the correct records send the data to Table
In the OLEDB Destination:
Select the OLEDB Connection needed to connect to the destination Data base.
Select the table where the columns need to be inserted.


Goto the mapping
Go the mapping and do the mapping as required and click OK.


For the error records(red arrow) after the derived Column set a flat file destination to log the error records.
The DFT is now all set.
Now the DFT Flat File source needs to be made dynamic.
To achieve this follow the steps below.
Goto the connection manager right click-> Properties


In the properties pane for the above connection manager:
Set the DelayValidation to True
Expand the Expressions by clicking the ellipse.


On the Property Expressions Editor window:
In the Property column select the ConnectionString property from the drop down list.
Now expand the Expression column by clicking the ellipse. An Expression Builder window will pop up.

On the Expression Builder window:
Set the Expression as: @[User::FilePath]+ “\\” + @[User::FileName]
Click the Evaluate expression button at the left hand bottom to see if the expression is creating the desired output.
Click OK.
Click OK on the Property Expression Editor as well.


Save and build the package.
Now when you execute the package the package will take the filenames from the DB table and dump the data in order in the final table.




Let me know if you need any further clarification or I am not clear in any step.

Expressions

Hi,
Lets talks about EXPRESSIONS.
It is one of the most interesting and used feature available in SSIS. At times we tend to over do in out attempt to work using Expressions. One needs to decide when it is best to use an expression. A few pointers that I would like to provide.
  • Use expressions where they are small
  • Do not make the expression overtly complicated. It makes management difficult.
  • Avoid too many nested conditions in expressions.
Expressions can be used at various places:
  1. Precedence constraint - Should evaluate to TRUE or FALSE
  2. Setting up variable value - Should be of the variable data type
  3. For Loop
  4. Setting up properties for various objects dynamically especially Connection Strings for various Connection Managers
  5. Derived Column Transform(most often used here)
  6. Conditional Split

Sample expressions:

Get today’s date in the format DD/MM/YYYY
RIGHT("0" + (DT_WSTR,2)DAY(GETDATE()),2) + "/" + RIGHT("0" + (DT_WSTR,2)MONTH(DATEADD("d",31,GETDATE())),2) + "/" + (DT_WSTR,4)YEAR(GETDATE())
Pad a string with zeroes on the left(output length-10)
RIGHT((“0000000000”+“STRING”),10)or
REPLICATE("0",10 - LEN("String")) + "STRING"

Convert date of input string column to date time. Validate that if its value is NULL or empty or illegal date it should be converted to Null date.
ISNULL([Column 0]) || LEN(TRIM((DT_WSTR,10)[Column 0])) == 0 || [Column 0]==”00-00-0000” ? NULL(DT_DATE) : (DT_DBDATE)((DT_WSTR,10)[Column 0])

Create file name with the current time stamp in the format: FileName_YY-MM-DD-HHMMSS.txt
"FileName_" + SUBSTRING((DT_WSTR,30)GETDATE(),1,10) + SUBSTRING((DT_WSTR,30)GETDATE(),12,2) + SUBSTRING((DT_WSTR,30)GETDATE(),15,2) + SUBSTRING((DT_WSTR,30)GETDATE(),18,2) + ".txt" 


Get the difference of the date in a column w.r.t. today’s date
DATEDIFF("d",(DT_DATE)(SUBSTRING([Column 0],10,2) + "-" + SUBSTRING([Column 0],12,2) + "-" + SUBSTRING([Column 0],6,4)),GETDATE()) 


If length of a column or variable is zero set its value to “asd” else set it to the variable value.
LEN([Column 0]) == 0 ? "asd" : [Column 0] 


FindString function:
FINDSTRING("111abcda","aa",1)  this will give 0
FINDSTRING("111abcda","a",1) this will give 4
FINDSTRING("111abcda","a",2) this will give 8

Check if a string contains a value do Step1 else Step2
FINDSTRING("111abcda","aa",1) >0? Step1 : Step2 

Set File name in the following format: FileName_MMMDD_YYYY.txt (FileName_Nov23_2009.txt)
"FileName_" + (MONTH(GETDATE()) == 1 ? "Jan" : MONTH(GETDATE()) == 2 ? "Feb" : MONTH(GETDATE()) == 3 ? "Mar" : MONTH(GETDATE()) == 4 ? "Apr" : MONTH(GETDATE()) == 5 ? "May" : MONTH(GETDATE()) == 6 ? "Jun" : MONTH(GETDATE()) == 7 ? "Jul" : MONTH(GETDATE()) == 8 ? "Aug" : MONTH(GETDATE()) == 9 ? "Sep" : MONTH(GETDATE()) == 10 ? "Oct" : MONTH(GETDATE()) == 11 ? "Nov" : MONTH(GETDATE()) == 12 ? "Dec" : "InvalidMonth") + (DT_WSTR,3)DAY(GETDATE()) + "_" + (DT_WSTR,5)YEAR(GETDATE()) + ".txt" 

Functions available to create EXPRESSIONS
I will talk about some of the most commonly used functions in the next section.


Mathematical Functions
FUNCTION
Result
DataType
ROUND( 2.23 , 2 )
2
Integer
ROUND( 2.53 , 2 )
3
Integer
CEILING(2.23 )
3
Numeric
FLOOR(2.9)
2
Numeric
ABS(3.2)
3
Numeric
ABS(3.9)
4
Numeric
ABS(-3.2)
3
Numeric
ABS(-3.9)
4
Numeric

String Functions
FUNCTION
Result
DataType
LOWER( "ABD" )
abc
String
UPPER( "abd" )
ABD
String
LTRIM( "    abc   " )
"abc   "
String
RTRIM( "   abc   " )
"   abc"
String
TRIM( "   abc   " )
"abc"
String
REVERSE( "abc" )
"cba"
String
RIGHT( abc, 2 )
"bc"
String
SUBSTRING( "abcde", 2 , 3 )
"bcd"
String
REPLICATE("a",3)
"aaa"
String
FINDSTRING("abcd","ab",1)
"ab" 1
String
REPLACE("Apple", "pp", "n app")
"An apple"
String

Date/Time Functions
FUNCTION
Result
DataType
DATEADD( "mm", 1, (DT_DATE)"12/24/2009" )
1/24/2010 0:00
DateTime
DATEADD( "dd", -1, (DT_DATE)"12/24/2009" )
12/23/2009 0:00
DateTime
DATEADD( "yy", 1, (DT_DATE)"12/24/2009" )
12/24/2010 0:00
DateTime
DATEDIFF( "dd", (DT_DATE)"12/24/2009", DT_DATE)"01/26/2010" )
33
Integer
DATEDIFF( "mm", (DT_DATE)"12/24/2009", (DT_DATE)"01/26/2010" )
1
Integer
DATEDIFF( "yy", (DT_DATE)"12/24/2009", (DT_DATE)"01/26/2010" )
1
Integer
DATEPART( "dd", (DT_DATE)"12/24/2009" )
24
Integer
DATEPART( "mm", (DT_DATE)"12/24/2009" )
12
Integer
DATEPART( "yy", (DT_DATE)"12/24/2009" )
2009
Integer
DAY( (DT_DATE)"12/24/2009" )
24
Integer
MONTH( (DT_DATE)"12/24/2009" )
12
Integer
YEAR( (DT_DATE)"12/24/2009" )
2009
Integer


NULL Functions
FUNCTION
Result
DataType
ISNULL( «expression» )
TRUE/FALSE
Boolean
NULL(DT_DATE)
NULL
DateTime


Type Casts
FUNCTION
(DT_STR, «length», «code_page»)
(DT_WSTR, «length»)
(DT_NUMERIC, «precision», «scale»)
(DT_DECIMAL, «scale»)

Operators

FUNCTION
Comments
?:
 If then Else -> "Condition"?"True Action": "False Action"
&&
Logical AND
||
Logical OR