1. Introduction
SSIS (SQL Server Integration Services) is an enterprise-class Extract, Transform, Load (ETL) tool that ships with SQL Server 2005/2008/2008R2/2012. SSIS enables database administrators and application developers to design, implement, and manage complex, high-performance ETL applications. Using SSIS, we can select data from one or more sources and standardize, join, merge, cleanse, augment, derive, calculate, and perform just about any other function and operation required for our data integration applications. One common use for SSIS is to move data from one data source to another. The reasons for moving data are too numerous to count. Some common business reasons for using SSIS include migrating business data from one application to another, extracting data for distribution to external entities, integrating data from external entities, creating sample test data sources for development environments, and extracting and loading data into business intelligence (BI) application systems. Listed below are some SQL Server Integration Services (SSIS) best practices.
2. Use of Package configuration file



Never hard-code database names or other changeable values such as mail server names or file paths in SSIS packages; always use an XML package configuration file. The connection managers should also be made configurable using expressions. Then, if a query, server name, or output file name changes, we can quickly update the configuration file and execute the package without opening and editing the package itself. For complex packages we can even split the configuration across multiple files, which is more modular and easier to maintain. A SQL Server database and table can even be created to hold the actual package configurations.
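As an illustration, a minimal .dtsConfig file that overrides a connection manager's connection string might look like the following sketch. The connection manager name, server, and database are placeholder values, not taken from this article:

```xml
<?xml version="1.0"?>
<DTSConfiguration>
  <DTSConfigurationHeading>
    <DTSConfigurationFileInfo GeneratedBy="admin" GeneratedFromPackageName="SamplePackage" />
  </DTSConfigurationHeading>
  <!-- Overrides the ConnectionString of the connection manager named "OLEDB_Connection" -->
  <Configuration ConfiguredType="Property"
                 Path="\Package.Connections[OLEDB_Connection].Properties[ConnectionString]"
                 ValueType="String">
    <ConfiguredValue>Data Source=MyServer;Initial Catalog=MyDb;Provider=SQLNCLI10.1;Integrated Security=SSPI;</ConfiguredValue>
  </Configuration>
</DTSConfiguration>
```

Editing the `ConfiguredValue` element is then enough to repoint the package at a different server, with no change to the package itself.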
3. Calling a Child Package from a Parent Package:
If a piece of logic is used multiple times in the same package or across multiple packages, we can separate the reused logic into a child package that is called from the parent package. When a child package is executed from a parent package, the parameters passed from the parent package need to be configured in the child package. For this, we can use the "Parent Package Configuration" option in the child package. To use "Parent Package Configuration", we need to specify the name of the "Parent Package Variable" that is passed to the child package. If we want to call the same child package multiple times (each time with a different parameter value), declare the parent package variables (with the same names as given in the child package) with a scope limited to the Execute Package Tasks.
4. Protection.
To avoid most package deployment errors when moving from one system to another, set the package protection level to "DontSaveSensitive".
5. Sequence Containers
Use sequence containers to group logically related tasks into a single group for better visibility and understanding.
6. Delay Validation:
Consider a scenario in which the first component of a package creates an object, e.g. a temporary table, which is referenced by the second component of the package. During package validation, the first component has not yet executed, so the object has not been created, causing a validation failure on the second component. SSIS will throw a validation exception and will not start the package execution. To overcome this, every component has a DelayValidation property (default = FALSE). If we set it to TRUE, early validation is skipped and the component is validated only at the component level (late validation), which happens during package execution.
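As an illustration of the scenario, the two tasks might run T-SQL like the following sketch (hypothetical table and column names, not from this article). Note that for a temp table to survive across tasks, the connection's RetainSameConnection property also typically needs to be True:

```sql
-- Task 1: create and populate a temp table; it does not exist at design time
CREATE TABLE #StagedOrders (OrderID INT, Amount MONEY);
INSERT INTO #StagedOrders (OrderID, Amount)
SELECT OrderID, Amount FROM dbo.Orders WHERE Status = 'New';

-- Task 2 (set DelayValidation = TRUE on this task):
-- validation is deferred until execution, when #StagedOrders actually exists
SELECT OrderID, Amount FROM #StagedOrders;
```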
7. Logging:
Logs help to gather run-time information
about a package and troubleshoot a package every time it is run. For this the
Package Level Logging Mode needs to be set as Enabled. We can enable or Disable
the Logging for each task separately or can use UseParentSetting option.
We can use any type of log providers given in the
provider type list. Also make sure to tick the type of log provider added,
otherwise nothing will be written into the log file.
Make log file names dynamic so that you get
a new log file for each execution.
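For example, the ConnectionString property of the log file's connection manager can be driven by an SSIS expression along these lines (the folder path is a placeholder):

```
"C:\\Logs\\PkgLog_" + @[System::PackageName] + "_"
  + (DT_WSTR, 4) YEAR(GETDATE())
  + RIGHT("0" + (DT_WSTR, 2) MONTH(GETDATE()), 2)
  + RIGHT("0" + (DT_WSTR, 2) DAY(GETDATE()), 2) + ".log"
```

This yields a date-stamped file name such as PkgLog_MyPackage_20120315.log, so each day's execution writes to a fresh file.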
8. Package Name Limitations:
When an SSIS package with a name exceeding 100 characters is deployed to SQL Server, the package name is trimmed to 100 characters, which may cause an execution failure. So, limit package names to a maximum of 100 characters.
9. Annotations:
Please provide annotations in the package
wherever necessary for better understanding of the logic.
10. Naming Conventions in SSIS:
When creating connections, do not use the default names. For example, for the package log flat-file destination use "FF_DEST_Conn_SSISPkgLog", for a SQL connection use "OLEDB_Connection", and for SMTP use "SMTP_Server_Connection". Package variable names should be meaningful and describe their contents and use. The names of the package and the tasks it contains should likewise reflect their function.
11. OLEDB Destination Settings:
There are several settings on the OLEDB destination that can impact the performance of data transfer; they are listed below.
Data Access Mode – This setting provides the 'fast load' option, which internally uses a BULK INSERT statement for uploading data into the destination table instead of a simple INSERT statement per row, as the other options do. So unless you have a reason to change it, keep the default of fast load. If you select the 'fast load' option, there are several other settings you can use, as discussed below.
Keep Identity – By default this setting is unchecked, which means the destination table (if it has an identity column) will generate identity values on its own. If you check this setting, the dataflow engine will ensure that the source identity values are preserved and the same values are inserted into the destination table.
Keep Nulls – Again, by default this setting is unchecked, which means that if a NULL value comes from the source for a column, the default value is inserted into the destination table (if a default constraint is defined on the target column). If you check this option, the default constraint on the destination table's column is ignored and the NULL from the source column is preserved and inserted into the destination.
Table Lock – By default this setting is checked, and the recommendation is to leave it checked unless the same table is being used by some other process at the same time. It specifies that a table lock will be acquired on the destination table instead of multiple row-level locks, which could otherwise turn into lock escalation problems.
Check Constraints – Again, by default this setting is checked, and the recommendation is to uncheck it if you are sure the incoming data will not violate the constraints of the destination table. This setting specifies that the dataflow pipeline engine will validate the incoming data against the constraints of the target table; unchecking it will improve the performance of the data load.
12. Environment Issue for Excel Connections:
The Excel source and Excel connection manager work only with the 32-bit runtime. When a package using the Excel source is enabled for the 64-bit runtime (it is enabled by default), it will fail on a production server using the 64-bit runtime. Go to the solution property pages > Debugging and set Run64BitRuntime to False. If we are using a batch file for execution, the dtexec command will by default point to the 64-bit executable. To avoid package failure, specify the 32-bit dtexec, e.g. "E:\Program Files (x86)\Microsoft SQL Server\100\DTS\Binn\DTExec".
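For example, a batch file might invoke the 32-bit dtexec explicitly with the /FILE option (the package path below is a placeholder):

```
"E:\Program Files (x86)\Microsoft SQL Server\100\DTS\Binn\DTExec.exe" /FILE "D:\Packages\LoadExcelData.dtsx"
```

Because the executable under Program Files (x86) is the 32-bit build, the Excel connection manager can load its 32-bit provider even on a 64-bit server.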
13. Retaining Connections in SSIS:
The control flow of an SSIS package threads together various control tasks. It is possible to have a transaction span multiple tasks using the same connection. To enable this, the RetainSameConnection property of the connection manager should be set to True. But in some cases, say we are creating an Excel file and the next task sends the file via email to a few intended recipients, the package will fail with an error that the file is already in use. In such situations, the RetainSameConnection property of the connection manager should be set to False.
14. Type Casting for Excel Destinations:
When writing data to an Excel file, columns with string data types need to be cast to the [DT_WSTR] (Unicode string) data type.
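In a Derived Column transformation, for instance, the cast expression might look like this (the column name and length are placeholders):

```
(DT_WSTR, 50) [CustomerName]
```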
15. OLEDB Connection Establishment Using a C# Script Task:
When connecting to the database server using Windows authentication in a C# script task, build the connection string so that the Provider parameter comes in fourth position, after the Integrated Security parameter.
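Following that ordering, such a connection string might look like the sketch below; the server, database, and provider version are placeholder values, not taken from this article:

```
Data Source=MyServer;Initial Catalog=MyDb;Integrated Security=SSPI;Provider=SQLNCLI10.1;
```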
16. Excel Provider Issues
If the machine runs the BIDS (Business Intelligence Development Studio) tool, which is a 32-bit process (devenv.exe; you can verify this in Task Manager), it cannot reach the 64-bit ACE OLEDB provider. To avoid this, the 32-bit ACE OLEDB provider should be installed on the machine. If the 32-bit provider cannot be installed, the connection string for Excel needs to be "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=;Extended Properties="EXCEL 8.0;HDR=YES";".
If we are using Access DB connection managers, the provider should be "Provider=Microsoft.Jet.OLEDB.4.0;".
17. Sorting:
If you know that the data in a source is sorted, set IsSorted=TRUE on the source adapter output. This may save unnecessary Sort transformations later in the pipeline, which can be expensive. Setting this value does not perform a sort operation; it only indicates that the data is sorted.
18. Effective use of queries:
Select only the columns you need in the pipeline, to reduce buffer size and reduce OnWarning events at execution time. Use a variable for the SQLSourceType property of the Execute SQL Task instead of a direct SQL statement. This removes ambiguity when different OLE DB providers are being used, and it is also easier to maintain, as the queries are passed in from the configuration file and any modification is simpler.
19. Use of Event Handlers:
Whenever an error, a warning, or any other event occurs in a package, the package can be configured to send an email notifying the specified users. Event handlers are used for this.
20. Folder Structure:
Use the same folder structure for all packages, so that the code that validates folders in the SSIS packages can be reused.