
Recently I've been playing with a lot of AI and seeing how it can be used in mobile apps to enhance the experience offered to the user. Currently I am playing with facial recognition using the Azure Cognitive Services FaceAPI. This is a stupidly powerful API that can do a LOT of different things:

  • Detect faces in images
  • Analyze those faces to detect characteristics such as hair color, gender, age
  • Detect the different points on a face, such as the pupils or lips
  • Compare two faces and verify if they are from the same person
  • Group faces by similar facial characteristics
  • Identify faces from a repository of up to a million images

This is a very powerful set of APIs with a large number of different use cases. For example, if you were building a social network you could use facial identification to automatically tag people's friends in images. If you were building a ride-share system you could use facial verification to ensure the driver is who you expect them to be. For now I'm going to focus on one particular example - identifying faces from a photo in a mobile app.

Face Finder

I've built a sample mobile app to show off the facial recognition tools in this API, and you can grab the code from my GitHub repo. This app takes a photo, then finds all the faces in that photo, giving a breakdown of the details of each face.

Animated GIF of the Face Finder app in action

In the rest of this post I'll go through how you can get signed up for the FaceAPI, and how the app works.

Getting started with FaceAPI

Start by heading to the FaceAPI page. From this page you can see some of the features in action, such as face verification and face detection. You can even upload pictures yourself to see what faces it detects. Once you are ready, click the big green Try Face API button. From there you will see an option to get an API key. Click the button, agree to the T&Cs (assuming you do agree to them, of course), and log in with your preferred identity provider. Once logged in you will see the API limits for the free tier, an endpoint and two keys.

The Face API keys, limits and endpoint

These trial keys have a short lifespan - only 30 days. You are also limited to 30,000 API calls in those 30 days, at a maximum of 20 calls to the API per minute. After the 30 days you can generate new keys and start all over again. This is just designed for you to try out the API; once you are ready to use it in a production system you can subscribe using the Azure portal and pay for more API calls.

Building and running the app

The Face Finder app is pretty complete; all you need to do is update the ApiKeys.cs file with your API key and endpoint. For the FaceApiKey, just copy yours and paste it in. For the FaceApiRegion, find the relevant entry in the AzureRegions enum that matches the endpoint shown. For example, for me the endpoint is https://westcentralus.api.cognitive.microsoft.com/vision/v1.0, so I set my region to be AzureRegions.Westcentralus.
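
For reference, here is a rough sketch of what that file might look like - the property names come from the description above, but the actual ApiKeys.cs in the repo may differ slightly:

// A hypothetical sketch of ApiKeys.cs - adjust to match the real file in the repo
public static class ApiKeys
{
    // Paste in one of the two keys from your trial subscription
    public const string FaceApiKey = "<your api key>";

    // Pick the AzureRegions value that matches the endpoint shown with your keys
    public static readonly AzureRegions FaceApiRegion = AzureRegions.Westcentralus;
}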

Once you have done this, build and run the app. When it loads, tap the Take photo button and take a picture of one or more faces. The app will then show a list of all the faces detected, describing them using the detected age and gender. Tap on a face in the list to see more details, including whether that person is smiling, whether they are wearing glasses, what hair, facial hair and makeup they have, and their emotion.

So how does it work?

This app is a simple Xamarin.Forms app, with three pages and some view models. The first page, FaceFinderPage.xaml, has a button you tap to take a photo, wired up to a command on the FaceFinderViewModel. This command uses the Xam.Plugin.Media plugin from James Montemagno to launch the camera and take a picture. This picture is then run through the face API.
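
The photo-taking part of that command boils down to something like the sketch below. This is a simplified version rather than the exact code from the repo, and DetectFacesAsync is just a placeholder name for the Face API code we'll get to shortly:

// Requires the Plugin.Media and Plugin.Media.Abstractions namespaces
public async Task TakePhotoAsync()
{
    await CrossMedia.Current.Initialize();

    if (!CrossMedia.Current.IsCameraAvailable || !CrossMedia.Current.IsTakePhotoSupported)
        return;

    // Launch the platform camera UI and wait for the user to take a picture
    var photo = await CrossMedia.Current.TakePhotoAsync(new StoreCameraMediaOptions
    {
        PhotoSize = PhotoSize.Medium  // a smaller image keeps the upload to Azure quick
    });

    if (photo == null)
        return;  // the user cancelled

    // Placeholder for the Face API code shown in the rest of this post
    await DetectFacesAsync(photo);
}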

The face API is accessed via an SDK from a NuGet package. Currently there are a load of NuGet packages from Microsoft with names containing ProjectOxford - the code name for the various vision cognitive services. These are being replaced with new packages that are called Microsoft.Azure.CognitiveServices.*, and these packages are currently in pre-release. For the face API, I'm using the Microsoft.Azure.CognitiveServices.Vision package.

Adding the Vision package

The two classes of note in this package are FaceAPI and FaceOperations. FaceAPI wraps your connection to the Azure Face API and is configured using your API key and endpoint; FaceOperations uses this API to perform the different face-related operations on an image.

Initializing the FaceAPI

Before you can use the Face API you have to configure it to use one of your keys and the appropriate endpoint. When constructing an instance of FaceAPI you need to pass it credentials, in the form of an instance of ApiKeyServiceClientCredentials, which takes one of the API keys assigned to your account as a string constructor argument:

var creds = new ApiKeyServiceClientCredentials("<your api key>");  

You can then pass this to the constructor of the FaceAPI:

var faceApi = new FaceAPI(creds);  

Finally you set the appropriate Azure region using the enum value that matches the endpoint shown with your keys, for example if your endpoint is in the West Central US region use AzureRegions.Westcentralus:

faceApi.AzureRegion = AzureRegions.Westcentralus;  

Detecting faces

Once you have your instance of the FaceAPI you can then use that to detect faces using the FaceOperations class. Construct an instance of this class passing in the face API:

var fo = new FaceOperations(faceApi);  

Once you have the face operations class, you can use this to perform all the different operations the API supports. The method I'm interested in is the DetectInStreamAsync method. This takes a stream containing the image, and sends it up to Azure to detect faces. The Async suffix is because it is an async method that you can await (not sure why they've added this suffix - they don't have non-async versions on the API).

var faces = await fo.DetectInStreamAsync(imageStream);  

The imageStream comes from the media plugin. When you use this plugin to take a photo it returns a MediaFile, which has a GetStream method to get the image as a stream that can be passed to the detect call. The detect method also has some other parameters which we'll look at later.
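
Putting those pieces together, the hand-off from the photo to the detect call looks roughly like this - a minimal sketch, where photo is the MediaFile returned by the plugin and fo is the FaceOperations instance from above:

using (var imageStream = photo.GetStream())
{
    // One DetectedFace comes back per face found in the photo
    var faces = await fo.DetectInStreamAsync(imageStream);
}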

The result of this call is a list of detected faces - literally a List<DetectedFace>, with one entry per face that was detected in the image. Each DetectedFace in the list contains a set of properties about that face, including the coordinates of a rectangle that shows where the face is in the image. The picture below shows a face with this rectangle drawn on top.

A picture of the author with a bounding box showing the face rectangle
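
If you want to draw that rectangle yourself, each DetectedFace exposes it as a FaceRectangle with Left, Top, Width and Height properties. A minimal sketch (property names are from the pre-release SDK, so double-check them against the version you install):

foreach (var face in faces)
{
    var rect = face.FaceRectangle;
    System.Diagnostics.Debug.WriteLine($"Face at ({rect.Left},{rect.Top}), {rect.Width}x{rect.Height} pixels");
}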

Detecting face attributes

So far, so good - we can find where a face is. Now what about more details about the face? This is where the extra parameters on the DetectInStreamAsync method come in.

DetectInStreamAsync(Stream image,  
                    bool? returnFaceId = true, 
                    bool? returnFaceLandmarks = false, 
                    IList<FaceAttributeTypes> returnFaceAttributes = null, 
                    CancellationToken cancellationToken = default(CancellationToken));

So what do these parameters do?

  • returnFaceId - set this to true (the default) to return an Id for the face. This Id can then be used for face searching operations - outside the scope of this post!
  • returnFaceLandmarks - set this to true (default is false) to return the coordinates of the facial landmarks, for example the positions of the pupils, nose, lips etc.
  • returnFaceAttributes - this is a list of the different face attributes you want returned. There are a lot of these! In the Face Finder app I get them all, and they are:
    • Age - a guess at the age of the face. Seeing as it predicted me at 9 years older than I am in the image above, it's either buggy, or (more likely) I need more sleep and to look after myself!
    • Gender - a guess at the presented gender. Just limited to male or female.
    • HeadPose - the position the head is in: pitch, roll and yaw.
    • Smile - the percentage certainty that the face is smiling.
    • FacialHair - the percentage certainty that the face has a beard, mustache or sideburns.
    • Glasses - the type of glasses (if any) the person is wearing, such as normal glasses, sunglasses etc.
    • Emotion - the percentages that the face is displaying a set of emotions (e.g. anger, happiness, surprise).
    • Hair - the percentage certainty that the face has different colored hair. If no hair is detected then this list is empty, otherwise it covers all natural hair colors and 'other'. This also specifies if the face is bald or if the hair is invisible (such as under a hat or scarf).
    • Makeup - the percentage certainty that the face has eye or lip makeup on.
    • Occlusion - how much of the face is occluded (such as by a mask, bandana, hair etc.).
    • Accessories - any accessories on the face, such as glasses or a hat.
    • Blur - how blurry the face is.
    • Exposure - how well exposed the picture is.
    • Noise - how much noise there is in the image.

These landmarks and attributes come back on the DetectedFace instances, and are present for all faces in the image.
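
Pulling that together, requesting attributes looks something like the sketch below. It only asks for a handful of attributes rather than the full set the Face Finder app uses, and the enum and property names follow the pre-release signature shown above, so they may differ in later SDK versions:

// Requires System.Collections.Generic for List<T>
var attributes = new List<FaceAttributeTypes>
{
    FaceAttributeTypes.Age,
    FaceAttributeTypes.Gender,
    FaceAttributeTypes.Smile,
    FaceAttributeTypes.Emotion
};

var faces = await fo.DetectInStreamAsync(imageStream,
                                         returnFaceId: true,
                                         returnFaceLandmarks: true,
                                         returnFaceAttributes: attributes);

foreach (var face in faces)
{
    var attrs = face.FaceAttributes;
    System.Diagnostics.Debug.WriteLine($"Age {attrs.Age}, gender {attrs.Gender}, smile {attrs.Smile}");
}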

What sort of apps can I build with this?

This API is great and provides a LOT of power, but is it just for mucking around, or can we build real-world apps with it? Well, a few examples I can think of without trying too hard are:

  • Passport photo app. Passports have strict requirements about photos, so you could use this API to ensure the person is looking at the camera (HeadPose), is not wearing glasses (Glasses) or make-up (Makeup), and has a neutral expression (Emotion). The photo also needs to be of good quality (Exposure, Blur, Noise).
  • Facial blurring. You could detect the face rectangle and blur out faces automatically, for example in apps that make photos public. You could even do it by age to only blur out children if needed.
  • Adding decoration to faces. Using the landmarks you could add fake glasses, hats, dog noses or other things to a face to make a comedy picture. I've even seen an example replacing faces with emojis that match the emotion being shown.
  • Auto-picture selection. You could take a selection of pictures and have it choose the best one based on quality (Exposure, Blur, Noise) and whether the people in it are smiling (Emotion).

Where can I learn more?

We've got loads of great content online showing all the cool things you can do with this API, from all different languages. Check them out:




About the Author

Jim Bennett

Cloud Developer Advocate at Microsoft, Xamarin Certified Developer, blogger, author of Xamarin in Action, speaker, father and lover of beer, whisky and Thai food. Opinions are my own.

 
