Working with Spatial Data Part II – Plotting Shapes and Storing Geocoded Data

Geocoding Text-Based Spatial Data for Use In SSIS Packages Part-I ) Overview
Text based location information is the starting point for spatial data analysis, but this information is generally not directly consumable by applications that are used for representing data in analyzable form. To make this data usable  by an array of applications that analyze or consume geographic / spatial data, we need to convert it  to  a more standardized form by geocoding it. We discussed this process of geocoding using SSIS 2008 R2 and Bing Maps’ webservice in the first part of our BI-Satellite article series. After geocoding, the next step in the process is to model an in-memory or persistent data repository for storing geocoded data. Geocoded data is represented for analysis by a geometrical shape, and this shape can be anything from a point to a polygon. So it is very apparent that if we want to represent a geometry on a geographical plane, we need to create a mapping between a geocode value to geometrical shape that is intended to be plotted on the geography. For example, we can have latitudes and longitudes of a list of records as we saw in the first part of BI-Satellite article series and if we want to represent this  on a geographical map, we can represent it either as a point, polygon, square etc. Also there can be cases where you may  want to draw a line connecting all the locations in the same continent. In this case the line needs to be passed co-ordinates that joins all the intended locations. If the above paragraph seems too confusing, it would get more clear through the course of the exercise we will work through. The objective of the exercise is to store geocoded data and model an approach to store this data. Storage approach
A detailed discussion of Latitude and Longitude is beyond the  scope of this article, but for those  curious to dig deeper into this area, a good  explanation can be found  here. Once the geocoded data is obtained, the co-ordinate values (i.e. latitude and longitude) are stored in their respective fields along with their corresponding address records. But with the introduction of the geography data type in SQL Server 2008, the approach of storing this data has reached the next level of flexibility and become more concise and intelligent. We learned in our mathematics class in our school days that any geometric shape is a collection of points. We are going to use the same fundamental concept in storing this geocoded data. We are interested in recording just a single location and  we just need to create a point on our target area which is represented by a latitude and longitude combination. From an object oriented point of view, geography data type can be seen as a class. If your object oriented programming fundamentals are still fresh, you will recall that a class can have two different kind of methods, instance and static. In this case,  instance of the geography data type would mean a variable or field of the same type, and static methods would mean that you can use methods that can be accessed on the geography data type itself. From the above paragraph,  at least two things should be clear: 1) We need to create a point and store it in a field of the geography data type. 2) We will  require some functions / methods from the geography data type which would take latitude and longitude as input and return a point out of the same. We have had enough of theory by now, so let’s jump into  some practicals. From a implementation point of view, we need to accomplish two tasks: 1) Modify the “Address” table to accommodate the point that we would create on the geocoded data. 2) Modify the SSIS package to update the “Address” table with the value of this point. Modification
Follow the below outlined steps the modify the table and package: 1) Alter the Address Table and add a field named “Location” of “Geography” data type to the table. After this  has been done, your table should look something like the below screenshot. 2) Open the package, add a user variable named “UpdateCommand” from the control flow with package level scope. 3) Switch to the Data Flow task, edit the script transform properties and add a  user variable to the list of ReadWriteVariables property as shown in the below screenshot.

Continues…

Leave a comment

Your email address will not be published.