I recently had to implement a solution to prove it was possible to integrate a shape file (.SHP) in Azure Data Lake Store (ADLS) for post geographic spatial analysis using Azure Data Lake Analytics (ADLA).
A shape file is a data set used by a geographic analysis application that stores a collection of geographic features, such as streets or zip code boundaries, in the form of points, lines or area features.
As you already figured, storing a shape file in ADLS is not a difficult goal to achieve, however, how can you possibly use ADLA to obtain the geographic data from the file? In this blog I’ll explain how we can extract the data to a supported format, such as CSV, and use it to run geographic spatial analysis in ADLA, with the support of the spatial data types introduced in the SQL Server 2008 (details here).
As always, whenever we face a limitation of ADLA, C# is our best friend. In order to read the content of a shape file, we need to start by adding a geospatial assembly to our solution, which, in my case, was the “Catfood” ESRI Shapefile Reader (details here).
The shape file used in this example contains a list of parks in London. The following code demonstrates how to extract the metadata and the geographic shapes to a CSV file. The only shapes extracted are polygons, although it is possible to add more if needed.
public static void CreateWorkForThreads()
//Create a new dataset and store the data in a table
DataSet ds = CreateNewDataSet();
DataTable dt = ds.Tables;
int count = 0;
// Parse the shapefile and select the columns we are interested in
using (Shapefile shapefile = new Shapefile(@"path\file.shp"))
foreach (Shape shape in shapefile)
string metadataNames = shape.GetMetadataNames();
string geometry = "";
int countParts = 0;
int countShape = 0;
DataRow dr = dt.NewRow();
//Extract the metadata. The first iteraction will extract the name of the columns
if (metadataNames != null)
foreach (string metadataName in metadataNames)
if (count == 0)
dr[metadataName] = metadataName;
dr[metadataName] = shape.GetMetadata(metadataName);
//Shape is not part of the metadata, so manually defining the name of the column
if (count == 0)
dr["shape"] = "shape";
// cast shape based on the type
// a point is just a single x/y point
ShapePoint shapePoint = shape as ShapePoint;
MessageBox.Show("Point (" + shapePoint.Point.X.ToString() + ", " + shapePoint.Point.Y.ToString() + ")");
// a polygon contains one or more parts - each part is a list of points which
// are clockwise for boundaries and anti-clockwise for holes
// see http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf
ShapePolygon shapePolygon = shape as ShapePolygon;
foreach (PointD part in shapePolygon.Parts)
countShape = 0;
if (countParts == 0)
geometry = "(";
geometry = geometry + " | (";
foreach (PointD point in part)
if (part.Length - 1 != countShape)
geometry = geometry + point.X + " " + point.Y + " |";
geometry = geometry + point.X + " " + point.Y + " )";
//Build our Polygon.
//Eg. POLYGON((-122.358 47.653, -122.348 47.649| -122.348 47.658, -122.358 47.658, -122.358 47.653))
dr["shape"] = "POLYGON(" + geometry + ")";
//Extract the data to a csv file
using (System.IO.StreamWriter sw =
foreach (DataRow row in dt.Rows)
object array = row.ItemArray;
for (i = 0; i < array.Length - 1; i++)
sw.Write(array[i].ToString() + ",");
public static DataSet CreateNewDataSet()
DataSet dsTemp = new DataSet();
DataTable dtTemp = new DataTable("londonparks");
Now that we have a valid file that can be processed by ADLA, we can upload it to ADLS and start performing geospatial analysis. To do so, I simply followed the logic described in Sacha’s blog (here).
The following U-SQL has in consideration a dataset that contains details of the trajectory of a courier, tracked on a daily basis. With the following code, we identify if a courier drove by a park by using the Intersect function. Because we have to cross two datasets, a C# function was created to help the evaluation of multiple events.
// Reference the assemblies we require in our script.
REFERENCE SYSTEM ASSEMBLY [System.Xml];
REFERENCE ASSEMBLY [SQLServerExtensions].[SqlSpatial];
REFERENCE ASSEMBLY [USQL.Core];
// Once the appropriate assemblies are registered, we can alias them using the USING keyword.
USING Geometry = Microsoft.SqlServer.Types.SqlGeometry;
USING Geography = Microsoft.SqlServer.Types.SqlGeography;
USING SqlChars = System.Data.SqlTypes.SqlChars;
// Extract the list of parks
USING Extractors.Text(delimiter : ',', silent : false, quoting : true, skipFirstNRows : 1);
//Extract data from the file containing the courier trajectory
USING Extractors.Text(delimiter : '|', silent : false, quoting : true, skipFirstNRows : 1);
//Get the list of vehicles that drove by the park.
SELECT DISTINCT a. *,
"1" AS VehicleIntersected
FROM @trajectories AS a
@parks AS b
WHERE Utilities.Intersect(b.[SHAPE], a.[Longitude], a.[Latitude]).ToString() == "True";
//Get the list of vehicles that didn't drive by the park.
SELECT a. *,
"0" AS VehicleIntersected
FROM @trajectories AS a
@vehicleIntersection AS b
ON b.VehicleKey == a.VehicleKey
AND b.GPSDateTimeUTC == a.GPSDateTimeUTC
WHERE b.VehicleKey IS NULL;
//Union both datasets to get the complete set of data
//Export the results to a csv file
@finalData TO "LABORATORY/GeoSpatialIntersection.csv"
USING Outputters.Text(outputHeader : true, delimiter : ',', quoting : true);
And here is the C# function. It accepts three parameters and calculate the intersection of a point with a shape.
public static string Intersect(string shape, double longitude, double latitude)
//Because we had a csv file, the coordinates in the polygon were separated by |
//It is important to use the .MakeValid() method to validate any invalid shape
//In case the dataset had multypoligon shapes, without the MakeValid(), the function would throw an error
var g =
var h = Geography.Point(longitude, latitude, 4326);
As always, if you have any questions or comments, do let me know.