Geocoding Text-Based Spatial Data for Use In SSIS Packages Part-I
)
Overview
Text based location information is the starting point for spatial data
analysis, but this information is generally not directly consumable by
applications that are used for representing data in analyzable form. To make
this data usable
by an array of applications that analyze or consume geographic / spatial data,
we need to convert it to a more standardized form by geocoding it.
We discussed this process of geocoding using SSIS 2008 R2 and Bing Maps’ webservice in the first part of our BI-Satellite article series.
After geocoding, the next step in the process is to model an in-memory or
persistent data repository for storing geocoded data. Geocoded data is
represented for analysis by a geometrical shape, and this shape can be anything
from a point to a polygon. So it is very apparent that if we want to represent a
geometry on a geographical plane, we need to create a mapping between a geocode
value to geometrical shape that is intended to be plotted on the geography. For
example, we can have latitudes and longitudes of a list of records as we saw in
the first part of BI-Satellite article series and if we want to represent
this on
a geographical map, we can represent it either as a point, polygon, square etc.
Also there can be cases where you may want to draw a line connecting all the
locations in the same continent. In this case the line needs to be passed
co-ordinates that joins all the intended locations.
If the above paragraph seems too confusing, it would get more clear through the
course of the exercise we will work through. The objective of
the exercise is to store geocoded data and
model an approach to store this data.
Storage approach
A detailed discussion of Latitude and Longitude is
beyond the scope of this article, but for those curious to dig
deeper into this area, a
good explanation can be found
here.
Once the geocoded data is obtained, the co-ordinate values (i.e. latitude and
longitude) are stored in their respective fields along with their corresponding
address records. But with the introduction of the geography data type in SQL Server
2008, the approach of storing this data has reached the next level of flexibility
and become more concise and intelligent.
We learned in our mathematics class in our school days that any geometric shape
is a collection of points. We are going to use the same fundamental concept in
storing this geocoded data. We are interested in recording just a single
location and we just need to create a point on our target area which is represented
by a latitude and longitude combination. From an object oriented point of view,
geography data type can be seen as a class. If your object oriented programming
fundamentals are still fresh, you will recall that a class can have two
different kind of methods, instance and static. In this case, instance of
the geography data type would mean a variable or field of the same type, and static
methods would mean that you can use methods that can be accessed on the
geography data type itself.
From the above paragraph, at least two things should be
clear:
1) We need to create a point and store it in a field of the geography data
type.
2) We will require some functions / methods from the geography data type
which would take latitude and longitude as input and return a point out of the
same.
We have had enough of theory by now, so let’s jump into some practicals. From a
implementation point of view, we need to accomplish two tasks:
1) Modify the “Address” table to accommodate the point that we would
create on the geocoded data.
2) Modify the SSIS package to update the “Address” table with the value
of this point.
Modification
Follow the below
outlined steps the modify the table and
package:
1) Alter the Address Table and add a field named “Location” of “Geography” data
type to the table. After this has been done, your table should look
something like the below screenshot.
2) Open the package, add a user variable named “UpdateCommand” from the control
flow with package level scope.
3) Switch to the Data Flow task, edit the script transform properties and add a user variable to the list of ReadWriteVariables property as shown in the below
screenshot.
Working with Spatial Data Part II – Plotting Shapes and Storing Geocoded Data
Continues…