Francesco Sbrescia's Blog

Object detection with Data Lake Analytics

In this blog post I’m going to show one of the advantages of linking Data Lake Analytics with Machine Learning.

We’ll upload a series of images to the Data Lake, then run a U-SQL script that detects objects in the images and writes the corresponding tags to a text file.

First of all, you need an instance of Data Lake Store and one of Data Lake Analytics. Once these are up and running, you need to enable Python/R/Cognitive in your Data Lake Analytics instance (here is a blog to help you out on this).

Next, we need to put our images in the Data Lake Store. Following Azure Data Lake best practices, I put the images in my laboratory subfolder.


Once our images are in place we need to create a script. In your Data Lake Analytics instance, click on New Job.


This will open a new blade with an empty script. Let’s give our new job a name: “ImageTagging”.

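In order to use image tagging we need to reference the relevant assemblies. For the cognitive image extensions these should be ImageCommon and ImageTagging:

REFERENCE ASSEMBLY ImageCommon;
REFERENCE ASSEMBLY ImageTagging;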


Next we need to extract information (location, file name, etc.) about the image file(s) we want to analyse. In this case we’ll process all images in the specified folder.

@images =
    EXTRACT FileName string, ImgData byte[]
    FROM @"/Laboratory/Desks/CSbrescia/ImageTagging/{FileName:*}.jpg"
    USING new Cognition.Vision.ImageExtractor();

The following step is where the magic happens: the script analyses all the images in the folder indicated above, detects the objects present in each image and creates tags. Here is the structure of the resulting rowset “variable”:

  • Image name
  • Number of tagged objects detected
  • A string with all the tags

@TaggedObjects =
    PROCESS @images
    PRODUCE FileName,
            NumObjects int,
            Tags string
    READONLY FileName
    USING new Cognition.Vision.ImageTagger();

Now we can write our rowset with all the tags to an output file:

OUTPUT @TaggedObjects
   TO "/Laboratory/Desks/CSbrescia/ImageTagging/ImageTags.tsv"
   USING Outputters.Tsv();
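
Once the job has finished and the file has been downloaded, the tags can be loaded for further processing. The sketch below is a minimal Python example under a few assumptions: the helper name `read_image_tags` is hypothetical, the file is assumed to contain the three columns produced above in that order, and the tags are assumed to be separated by semicolons inside the Tags string (adjust `tag_separator` if your output differs). The sample rows are made up purely to illustrate the shape of the data.

```python
import csv
import io


def read_image_tags(tsv_file, tag_separator=";"):
    """Map each file name to its list of tags.

    Expects three tab-separated columns: FileName, NumObjects, Tags.
    `tag_separator` is an assumption about how the Tags string is delimited.
    """
    tags_by_image = {}
    # csv.reader strips the double quotes that a TSV writer may put around strings.
    for row in csv.reader(tsv_file, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        file_name, tags = row[0], row[2]
        tags_by_image[file_name] = [
            tag.strip() for tag in tags.split(tag_separator) if tag.strip()
        ]
    return tags_by_image


# Made-up sample rows, only to show the expected shape of the file.
sample = io.StringIO('"fruit"\t3\t"fruit;apple;table"\n"desk"\t2\t"indoor;desk"\n')
print(read_image_tags(sample))
```

With a real file, pass an open file handle instead of the in-memory sample, for example `open("ImageTags.tsv", newline="")`.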

Here are the images I used in this example


And here is the list of objects detected



In conclusion, we have created a pretty handy tool for automatic image tagging using Data Lake, with very little knowledge required of the background processes involved.

Note that there seems to be an image size limit: I had to resize all images to about 500 KB.
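
Because of this limit, it can help to check file sizes locally before uploading. The following is a minimal Python sketch, not part of the original job: the helper name `find_oversized_images` is hypothetical, and the 500 KB default is an assumption based on the resize figure mentioned above, not a documented threshold.

```python
from pathlib import Path

# Assumed threshold: the ~500 KB that worked in this example, not a documented limit.
DEFAULT_LIMIT_BYTES = 500 * 1024


def find_oversized_images(folder, limit_bytes=DEFAULT_LIMIT_BYTES):
    """Return (file name, size in bytes) for each .jpg in `folder` above `limit_bytes`."""
    oversized = []
    for path in sorted(Path(folder).glob("*.jpg")):
        size = path.stat().st_size
        if size > limit_bytes:
            oversized.append((path.name, size))
    return oversized
```

Calling `find_oversized_images("path/to/local/images")` returns the files that are candidates for resizing. Pass a different `limit_bytes` if a larger size turns out to work for you.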


Comments (5)

  • Michael Rys

    4/28/2017 6:30:47 PM

    Great post :) One small nitpick: the syntax on the file set should be just {filename} and not {filename:*}. I am even surprised you are not getting an error on the latter (this form was deprecated a while ago).

    There is a row size limit of 4 MB. Since you are using the processor, the data needs to be provided in a rowset, which limits the images to at most 4 MB (minus the other column value sizes). If the object detection were available inside an extractor, you would not run into this limit.


  • Tris Robinson

    5/2/2017 8:26:34 AM

    Nice blog Chicco! Overall, how accurate would you say the detection rate was, and what about the false positive rate? Is the tool reliable? For instance, some tags on the fruit seem to be guesses, such as table: you can't actually tell that from the picture, but I'm guessing it usually recognises fruit bowls appearing on tables, so it has tagged the photo as table even though it's on a white background. Useful all the same though! :)

    • Chicco Sbrescia

      5/9/2017 10:29:23 AM

      For image tagging it follows a category taxonomy; the algorithm is quite accurate when searching for things within the taxonomy.

  • Jose Mendes

    5/4/2017 11:37:46 AM

    Great tutorial Chicco.
    Can you tell me what the source for the tagging is, i.e. what is behind the method Cognition.Vision.ImageTagger?

    • Chicco Sbrescia

      5/9/2017 10:31:52 AM

      It's all part of a taxonomy. I suppose that with time Microsoft will expand the list of detectable objects.