SQL Server 2008 introduced a new category of datatypes known as
spatial datatypes which store spatial information. The new spatial datatypes are
geography and geometry. SQL Server Management Studio comes with
good good support
for these spatial data types like the spatial results tab and facilitated access
to .NET Common Language Runtime (CLR) functions on these datatypes.
SQL Server Integration Services (SSIS) is one of the business intelligence
applications which is most sensitive to datatypes. Even in the R2 version
of SQL Server 2008, it can be safely said that there is no real out-of-box
support for these spatial data types. The only possible datatype the maps in
SSIS to any spatial datatype is DT_IMAGE, which means that any spatial data is
treated like a binary large object (blob) data. There may be cases when
spatial data from two different database servers are brought in and one may
wish to perform some spatial functions on the these. However , if
the data is DT_IMAGE, it is almost impossible to dissect the
different pieces of information from this data.
Solution :
There is no built-in support for spatial data types in
SSIS, which means that there is no datatype which exactly matches the spatial
datatype and when any spatial field is read using a source adapter, the
datatype that will be mapped to any spatial field is DT_IMAGE. The
solution to this issue is re-building the datatype in SSIS. Script
transformation is the only component for rebuilding any spatial datatype,
and we will also examine the reason for this.
For
this demonstration, we will consider a dataset
which has any spatial datatype field, and to keep our focus on the concept, we
will
consider the geography datatype. We will read this datatype from a SQL Server table
and then extract information from this datatype. This exercise will
demonstrate how to use spatial datatypes in SSIS to the same level as in SQL Server using SSMS.
Environmental setup:
For
this exercise, will use a table consisting of at least
one field of the geography datatype having at least one record. To save time and
effort, I will reuse the Address table that we built in the
BI-Satellite project,
as below:
Key Factors in the Design Decision:
Before we start with the package
design, there are several points to keep in mind :
1) Every geography datatype has a spatial reference identifier (Srid) associated
with it, which will be stored along with the value stored in the field.
2) To rebuild a .NET CLR datatype (i.e. geography) we need to use a transform
that is capable of facilitating access to .NET assemblies, which is the Script
transform.
3) Almost all of the spatial static or instance .NET CLR functions capable of
creating a spatial object take two input parameters: i) Srid and ii)
geometrical structure in text, xml, binary or a byte stream format.
4) Microsoft.SQLServer.Types namespace contains classes related to spatial
datatypes, and for the geography datatype the class that we require is SqlGeography.
5) Geography Markup Language (GML) is one of the XML based data format standard
defined by Open Geospatial Consortium (OGC), and fortunately the geography
datatype
has built-in functions to export GML data.
6) We need to make sure that when we read data into the package,
this should
be read in some form of text and not as a DT_IMAGE datatype. If the data is read
as DT_IMAGE, then it will be impossible to cast it to a datatype which
can be fed to a .NET CLR function to create a spatial datatype.
Achieve Spatial Data Support in SSIS
Continues…