Monday, April 28, 2014

How to convert a Shapefile from UTM (WGS84) Coordinates into GPS Latitude/Longitude in R

According to Gartner, more than 80% of all information is supposed to have a spatial reference.
The business value of any data with a spatial reference can be increased substantially by integrating and visualizing it together with geographical, demographic and geopolitical data.

Developers and IT professionals often choose to create their own geographical master data instead of relying on traditional GIS applications, which are often too specialized to be integrated into existing information systems.

For this purpose, many open data sources and technologies can be used; the most common data format is the ESRI shapefile. Many different tools can import a shapefile from the filesystem into a database; in SQL Server environments I suggest the free and fast Shape2SQL tool.

Important: during the upload, Shape2SQL uses a single transaction. If the creation of the SQL spatial index fails for any reason, the entire transaction is rolled back and you won't see any newly created table in your target database. I suggest disabling this automatic and buggy index creation feature. Also, do not forget to set the SRID to 4326.

You might think that once your "shape" table has been created, your pairs of latitude and longitude coordinates are ready to uniquely identify every surface/polygon as well as every point on the earth's surface. Unfortunately, it's not quite that simple.

If your goal is to integrate and visualize data on standard platforms such as Google Maps, OpenStreetMap or Bing Maps, you in fact need GPS latitude/longitude coordinates in WGS 1984 format, otherwise known as EPSG 4326. Most open data sources, however, publish shapefile data in UTM, a projected coordinate system that expresses positions as easting/northing values in meters within a zone rather than as angles in degrees.
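
To make the difference concrete: UTM splits the globe into 60 zones, each 6 degrees of longitude wide, and a position only makes sense together with its zone. The sketch below is an illustration only. The helper names utm_zone and utm_epsg are hypothetical (they do not come from any package), and the special zone boundaries around Norway and Svalbard are ignored. It derives the zone number and the matching "WGS 84 / UTM" EPSG code for a longitude/latitude pair.

```r
# Hypothetical helpers (not part of any package): illustrate how UTM zones work.

# UTM zone number (1-60) for a longitude in decimal degrees.
utm_zone <- function(lon) {
  zone <- floor((lon + 180) / 6) + 1
  pmin(pmax(zone, 1), 60)  # clamp so that lon = 180 maps to zone 60
}

# EPSG code of the WGS 84 / UTM zone covering a point:
# 32601-32660 for the northern hemisphere, 32701-32760 for the southern.
utm_epsg <- function(lon, lat) {
  ifelse(lat >= 0, 32600, 32700) + utm_zone(lon)
}

utm_zone(9.19)          # Milan lies in zone 32
utm_epsg(9.19, 45.46)   # 32632
utm_epsg(-43.2, -22.9)  # Rio de Janeiro: 32723
```

Knowing the zone matters because the same easting/northing pair refers to a different place in every zone.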

How do you convert a shapefile from UTM to GPS latitude/longitude? Here is a fast and no-cost solution using the popular open-source data manipulation and analysis framework "R", together with the rgdal package.

First of all, we install rgdal (the R bindings for GDAL) and switch to our working directory (i.e., the directory in which we copied the original UTM shapefile):

install.packages("rgdal")
library(rgdal)  # also loads sp, which provides spTransform and CRS
setwd("[yourworkingdirectory]")
getwd()


We then import the shapefile by creating a dataset in our workspace:

shape <- readOGR("directory", layer="filename_without_extension")

A dataset called "shape" of type SpatialPolygonsDataFrame has now been created. We are curious to see what exactly we just imported and what it looks like:

dim(shape)
summary(shape)
plot(shape)

The plot should show the outlines of the imported polygons.

We are now ready for the UTM to GPS Lat/Long conversion:

shape_gps <- spTransform(shape, CRS("+proj=longlat +datum=WGS84"))
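
spTransform can only reproject data whose source coordinate system is known; readOGR normally picks it up from the .prj file that sits next to the shapefile. If that file is missing, the source system has to be declared first. The following sketch builds a PROJ string for a WGS84-based UTM zone. Note that utm_proj4 is a hypothetical helper and zone 32 is just an example value, not something taken from your data.

```r
# Hypothetical helper: PROJ string for a WGS84-based UTM zone.
utm_proj4 <- function(zone, south = FALSE) {
  stopifnot(zone >= 1, zone <= 60)
  paste0("+proj=utm +zone=", zone,
         if (south) " +south" else "",
         " +datum=WGS84 +units=m +no_defs")
}

src_crs <- utm_proj4(32)  # assumption: the data is in UTM zone 32 north
src_crs

# With rgdal/sp loaded, the string could then be assigned before transforming:
# proj4string(shape) <- CRS(src_crs)
```

Assigning a CRS this way only labels the coordinates; it is spTransform that actually recomputes them.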

Finally, we write the result back to the filesystem:

writeOGR(shape_gps, ".", "shape_gps", driver="ESRI Shapefile")

A file called "shape_gps.shp" will now contain your GPS coordinate data.
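
A quick sanity check is to look at the bounding box: longitude values must fall within [-180, 180] and latitude values within [-90, 90], while UTM eastings and northings are usually six- and seven-digit numbers in meters. Here is a minimal sketch in base R, assuming a 2x2 matrix laid out like the result of sp's bbox() (rows x and y, columns min and max). The name looks_like_lonlat is hypothetical and the sample numbers are made up for illustration.

```r
# Hypothetical check: TRUE if a bounding box is plausible for
# longitude/latitude in degrees (rows: x, y; columns: min, max).
looks_like_lonlat <- function(bb) {
  all(bb[1, ] >= -180 & bb[1, ] <= 180) &&
    all(bb[2, ] >= -90 & bb[2, ] <= 90)
}

# Made-up bounding boxes; matrix() fills column by column,
# so the order is xmin, ymin, xmax, ymax.
box_names <- list(c("x", "y"), c("min", "max"))
utm_bb <- matrix(c(500000, 5000000, 520000, 5020000), nrow = 2, dimnames = box_names)
gps_bb <- matrix(c(9.0, 45.2, 9.3, 45.4), nrow = 2, dimnames = box_names)

looks_like_lonlat(utm_bb)  # FALSE: values are far outside the degree ranges
looks_like_lonlat(gps_bb)  # TRUE
```

With the real data, the same function could be called as looks_like_lonlat(bbox(shape_gps)). Passing the check does not prove the conversion is right, but failing it is a clear sign that the coordinates are still projected.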

In case SQL Server complains about the validity of your shape data, make use of the MakeValid() SQL (CLR) function.

Tuesday, April 22, 2014

Talend Open Studio Cookbook by Rick Barton



Packt Publishing has recently published a new book: the "Talend Open Studio Cookbook" by Rick Barton.

As a data warehouse developer and engineer, I have used Talend extensively for many years. Apart from the basic tutorials provided by Talend, however, my only source for learning has always been the proactive Talend online community.

This book doesn't digress too much into theory and provides a comprehensive view of many everyday, concrete situations and their corresponding solution patterns. The learning-by-doing "recipe" approach followed by the book makes it easier to read and understand. The book also provides a good foundation of XML principles; it lacks, however, a chapter illustrating the development of custom components, a scenario that is more likely to occur than expected.

Prerequisites for the book are a basic knowledge of Java or any C-like object-oriented programming language, as well as a rudimentary understanding of relational concepts.

I highly recommend this book for both IT experts and novices with a focus on system and data integration.