Working with Spatial Data Part I – Geocoding Text-Based Spatial Data for Use In SSIS Packages
Address information is one of the most common attributes of data stored in a business data repository. Examples of such data can be addresses of employees, store locations of retail chains, sales of a product etc. You
may wonder about the last example and how sales of a product can be linked with an address. In most of the retail chains, customers are provided with a
loyalty card, on which they collect points for each product purchased. This card contains the address of the customer
and so geographic data on sales is collected. This data facilitates data mining using algorithms such as market-basket analysis by geography, and using the same data in spatial form one can analyze consumption of any particular product in an
The focus for this article is geocoding of any location data. In simplest definition,
geocoding is the process of deriving the position of a location in any standard geographical notation. The most well-known form of the
geocoding is latitude and longitude , by which any position on the earth’s perimeter can be
Spatial (geographical) data analysis for any purpose includes the below steps:
1) Geocoding text based location information.
2) Generating a repository of the same geocoded information for an applications
3) Representing this information in an analyzable form, such as on a geographical map.
4) Customizing the presentation layer to make dynamic retrieval of data and formatting the same in a presentable manner.
For this demonstration we will focus on Step 1 – Geocoding location information.
In order to create a demo project for our concept, we need to make three different artifacts available to our project which are as below:
1) Test Data: We need to create some test data which we can be used for geocoding. As we want to apply this concept to data that is found in some real world database systems, we need to create an “Address” table as shown in the below screenshot. I have inserted some data, which are some famous buildings across the globe. I have intentionally left some values NULL in either “AddressLine2” field or “City” field,
so that it simulates the data in legacy or some on-line transaction processing (OLTP) systems. Many systems allow
for entering data in a free text format and we may not always be able to find data precise to the last digit of
a zip code. For now, to keep our focus on the concept, we will keep this data simple and easy to geocode.
2) Geocode processing engine: As we ourselves cannot create a database of all the places across the globe, we need
a service such as Google Maps or Bing Maps which can take text based location information as
an input and return the latitude and longitude of the corresponding location.
These web based applications expose web-services which can be consumed by
applications, our purpose in this case is to geocode the text based data we have in our “Address” table.
For the purposes of this sample application, we will use the webservice exposed by Bings Maps.
First we will need to subscribe to the service, to do so, navigate to www.bingmapsportal.com, sign up and create a key which
will be your gateway to the webservices exposed by Bing Maps. In the below
screenshot, you can see that I have created my application-key pair.
3) Geocode webservice wrapper class: We need to create a client or wrapper class which we can use in our SSIS package, so that we can conveniently consume it’s webservices.
The .NET Framework ships with the WSDL.exe tool, which can create a proxy class by taking the webservice’s wsdl (Web
Service Description Language) as the input. You need to have any version of the
.NET Framework installed, and use this tool with the below command from
the Visual Studio Command prompt:
wsdl /out:C:/geocode.cs http://dev.virtualearth.net/webservices/v1/geocodeservice/geocodeservice.svc?wsdl
Upon executing this command, the tool will create a geocode.cs C# class file, which is the proxy class we
will use in our SSIS package (the URL string in the command is used to retrieve
the wsdl of the Bing Maps geocode webservice).