Recognize text, faces and landmarks: Add Machine Learning to your Android apps

Machine learning (ML) can help you create innovative, immersive and unique experiences for your mobile users.

Once you have mastered ML, you can use it to create a variety of applications, including apps that automatically organize photos based on their subject, recognize and track a person's face in a live stream, extract text from an image and much more.

But ML is not really beginner friendly! If you want to improve your Android apps with powerful machine learning capabilities, where exactly do you start?

In this article, I'll give an overview of an SDK (Software Development Kit) that promises to put the power of ML within reach, even if you have zero ML experience. By the end of this article, you'll have the foundation you need to start creating intelligent, ML-driven apps that can label images, scan barcodes, recognize faces and famous landmarks, and perform many other powerful ML tasks.

Meet Google's ML Kit

With the introduction of technologies such as TensorFlow and Cloud Vision, ML is becoming more widely used, but these technologies are not for the faint of heart! You usually need a deep understanding of neural networks and data analysis just to get started with a technology such as TensorFlow.

Even if you do have some experience with ML, creating a machine learning-powered mobile app can be a time-consuming, complex and expensive process, where you need to gather enough data to train your own ML models, and then optimize those ML models to work efficiently in the mobile environment. If you are an individual developer or have limited resources, it may not be possible to put your ML knowledge into practice.

ML Kit is Google's attempt to bring machine learning to the masses.

Under the hood, ML Kit bundles several powerful ML technologies that typically require extensive ML knowledge, including Cloud Vision, TensorFlow and the Android Neural Networks API. ML Kit combines these specialized ML technologies with pre-trained models for common mobile use cases, including extracting text from an image, scanning a barcode and identifying the contents of a photo.

Regardless of whether you already have some knowledge of ML, you can use ML Kit to add powerful machine learning capabilities to your Android and iOS apps – simply pass some data to the appropriate part of ML Kit, such as the Text Recognition or Language Identification API, and that API uses machine learning to return a response.

How do I use the ML Kit APIs?

ML Kit is subdivided into various APIs that are distributed as part of the Firebase platform. If you want to use one of the ML Kit APIs, you must make a connection between your Android Studio project and a corresponding Firebase project and then communicate with Firebase.

Most ML Kit models are available as on-device models that you can download and use locally, but some models are also available in the cloud, allowing your app to perform ML tasks over the device's internet connection.

Each approach has its own unique strengths and weaknesses, so you must decide whether local or external processing is the most useful for your specific app. You can even add support for both models and then let your users decide which model to use at runtime. Alternatively, you can configure your app to select the best model for the current conditions, for example, only using the cloud-based model when the device is connected to Wi-Fi.
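To make the "pick the best model for the current conditions" idea concrete, here is a minimal sketch in plain Java. It does not call the ML Kit SDK at all: `ModelChooser`, its `Model` enum and the two boolean inputs are hypothetical names invented for this illustration, and in a real app you would obtain the Wi-Fi state from Android's connectivity APIs.

```java
// Hypothetical sketch: none of these names come from the ML Kit SDK.
final class ModelChooser {

    enum Model { ON_DEVICE, CLOUD }

    /**
     * Picks a model for the current conditions.
     *
     * @param userPrefersCloud the user's runtime preference (assumed app setting)
     * @param onWifi           whether the device is currently connected to Wi-Fi
     */
    static Model choose(boolean userPrefersCloud, boolean onWifi) {
        // Use the cloud model only when the user has opted in AND Wi-Fi is available;
        // otherwise fall back to the on-device model, which also works offline.
        return (userPrefersCloud && onWifi) ? Model.CLOUD : Model.ON_DEVICE;
    }
}
```

With this policy, a user who prefers the cloud model but is on mobile data still gets the on-device model, which avoids both unexpected data usage and failed requests.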

If you opt for the local model, your app's machine learning features are always available, regardless of whether the user has an active internet connection. Because all the work is done locally, on-device models are ideal when your app needs to process large amounts of data quickly, for example if you use ML Kit to manipulate a live video stream.

Meanwhile, cloud-based models typically offer greater accuracy than their on-device counterparts, because the cloud models draw on the power of Google Cloud Platform's machine learning technology. For example, the on-device model of the Image Labeling API includes 400 labels, but the cloud model has over 10,000 labels.

Depending on the API, there may also be some functionality that is only available in the cloud. For example, the text recognition API can only identify non-Latin characters if you use the cloud-based model.

The cloud-based APIs are only available for Blaze-level Firebase projects, so you must upgrade to a Blaze plan before you can use any of the ML Kit cloud models.

If you decide to explore the cloud models, a free quota was available for all ML Kit APIs at the time of writing. If you just want to experiment with cloud-based image labeling, you can upgrade your Firebase project to the Blaze plan, test the API on fewer than 1,000 images, and then switch back to the free Spark plan without being billed. However, terms and conditions have an annoying habit of changing over time, so be sure to read the fine print before upgrading to Blaze, just to make sure you don't get hit by unexpected bills!

Identify text in any image, with the Text Recognition API

The Text Recognition API can intelligently identify, analyze and process text.

You can use this API to create applications that extract text from an image, so your users don't have to waste time on laborious manual data entry. For example, you can use the text recognition API to help your users extract and record the data from receipts, invoices, business cards or even food labels, simply by taking a photo of the item in question.

You can even use the text recognition API as the first step in a translation app, where the user takes a picture of an unknown text and the API extracts all text from the image, ready to be passed on to a translation service.

The on-device ML Kit Text Recognition API can identify text in any Latin-based language, while its cloud counterpart can recognize a wider variety of languages and characters, including Chinese, Japanese and Korean characters. The cloud-based model has also been optimized to extract sparse text from images and text from densely packed documents, which you should take into account when deciding which model to use in your app.

Do you want some practical experience with this API? Then view our step-by-step guide for creating an application that can extract the text from any image using the text recognition API.

Understanding the content of an image: the Image Labeling API

The Image Labeling API can recognize entities in an image, including locations, people, products, and animals, without the need for additional contextual metadata. The Image Labeling API returns information about the detected entities in the form of labels. For example, in the following screenshot, I provided the API with a nature photo and it responded with labels like 'Forest' and 'River'.

This ability to recognize the content of an image can help you create apps that tag photos based on their subject; filters that automatically identify and remove inappropriate content submitted by users from your app; or as a basis for advanced search functionality.

Many of the ML Kit APIs return multiple possible results, complete with associated confidence scores – including the Image Labeling API. If you pass it a photo of a poodle, it can return labels such as "poodle", "dog", "pet" and "small animal", all with varying scores that indicate the API's confidence in each label. Hopefully "poodle" has the highest confidence score in this scenario!

You can use this confidence score to create a threshold that must be met before your application acts on a particular label, for example by displaying it to the user or tagging a photo with it.
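As a rough illustration of such a threshold, the sketch below filters a list of labels by score. It is plain Java rather than ML Kit code: the `Label` class and `LabelFilter.atLeast` helper are hypothetical stand-ins for whatever label objects the SDK returns, and the scores are assumed to range from 0 to 1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a label returned by an image labeling API.
final class Label {
    final String text;
    final float confidence; // assumed to range from 0.0 to 1.0

    Label(String text, float confidence) {
        this.text = text;
        this.confidence = confidence;
    }
}

final class LabelFilter {
    /** Returns only the labels whose confidence meets or exceeds the threshold, in their original order. */
    static List<Label> atLeast(List<Label> labels, float threshold) {
        List<Label> kept = new ArrayList<>();
        for (Label label : labels) {
            if (label.confidence >= threshold) {
                kept.add(label);
            }
        }
        return kept;
    }
}
```

The right threshold depends on your app: a photo-tagging feature can probably tolerate a lower bar than a filter that removes user-submitted content.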

Image labeling is available both on the device and in the cloud, but if you choose the cloud model, you will have access to more than 10,000 labels compared to the 400 labels included in the on-device model.

For a more in-depth look at the Image Labeling API, check out Determining the content of an image with machine learning. In that article, we build an application that processes an image and then returns the labels and confidence scores for each entity detected in that image. We also implement both the on-device and cloud models in this app, so you can see exactly how the results differ depending on the model you choose.

Expressions and tracking: the Face Detection API

The Face Detection API can locate human faces in photos, videos, and live streams and extracts information about each detected face, including position, size, and orientation.

You can use this API to help users edit their photos, for example by automatically cropping all the blank space around their latest headshot.

The Face Detection API is not limited to images – you can also apply this API to videos. For example, you can create an app that identifies all faces in a video feed and then blurs everything except those faces, similar to Skype's background blur feature.

Face detection is always performed on the device, where it is fast enough to be used in real time, so unlike the majority of ML Kit's APIs, Face Detection does not include a cloud model.

In addition to detecting faces, this API has a few extra features that are worth exploring. First, the Face Detection API can identify facial landmarks, such as eyes, lips, and ears, and then retrieve the exact coordinates for each of these landmarks. This landmark data provides you with an accurate map of every detected face – perfect for creating augmented reality (AR) apps that add Snapchat-style masks and filters to the user's camera feed.

The Face Detection API also offers facial classification. ML Kit currently supports two facial classifications: eyes open and smiling.

You can use this classification as a basis for accessibility services, such as hands-free operation, or to create games that respond to the player's facial expression. The ability to detect whether someone is smiling or has their eyes open can also come in handy when you're creating a camera app – after all, there's nothing worse than taking a bunch of photos only to find out later that someone had their eyes closed in every single shot.
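For the camera-app scenario, a sketch of the decision logic might look like the following. This is plain Java, not SDK code: it assumes each classification arrives as a probability between 0 and 1, and that a negative value means the classification was not computed. `ShotChecker`, `isKeeper` and that negative-value convention are all hypothetical names and assumptions made for this example.

```java
// Hypothetical sketch: assumes classifications arrive as probabilities in [0, 1],
// with a negative value meaning "not computed" for this face.
final class ShotChecker {

    /**
     * Decides whether a shot is worth keeping, based on one detected face.
     *
     * @param smilingProbability  how likely it is that the person is smiling
     * @param eyesOpenProbability how likely it is that the person's eyes are open
     * @param threshold           minimum probability required for both classifications
     */
    static boolean isKeeper(float smilingProbability, float eyesOpenProbability, float threshold) {
        // If either classification is missing, play it safe and reject the shot.
        if (smilingProbability < 0f || eyesOpenProbability < 0f) {
            return false;
        }
        return smilingProbability >= threshold && eyesOpenProbability >= threshold;
    }
}
```

A camera app could run a check like this on every face in a burst of photos and suggest the frame where everyone passes.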

Finally, the Face Detection API includes a face tracking component, which assigns an ID to a face and then tracks that face across multiple consecutive images or video frames. Note that this is face tracking and not true facial recognition. Behind the scenes, the Face Detection API tracks the position and movement of the face and then infers that this face probably belongs to the same person, but it ultimately has no knowledge of the person's identity.

Try the Face Detection API yourself! Discover how you can build a face-detecting app with machine learning and Firebase ML Kit.

Barcode scanning with Firebase and ML

Barcode scanning may not sound as exciting as some of the other machine learning APIs, but it is one of the most accessible parts of ML Kit.

Barcode scanning does not require specialist hardware or software, so you can use the Barcode Scanning API and ensure that your app remains accessible to as many people as possible, including users on older or budget devices. As long as a device has a working camera, it should have no problems scanning a barcode.

ML Kit's Barcode Scanning API can extract a wide range of information from printed and digital barcodes, making it a fast, easy and accessible way to pass real-world information to your application, without users having to perform tedious manual data entry.

There are nine different data types that the Barcode Scanning API can recognize and parse from a barcode:

  • TYPE_CALENDAR_EVENT. This contains information such as the location of the event, the organizer and the start and end time. If you are promoting an event, you can put a printed barcode on your posters or flyers or place a digital barcode on your website. Potential attendees can then extract all information about your event by simply scanning the barcode.
  • TYPE_CONTACT_INFO. This data type includes information such as the e-mail address, name, telephone number and contact title.
  • TYPE_DRIVER_LICENSE. This contains information such as the street, city, state, name and date of birth associated with the driver's license.
  • TYPE_EMAIL. This data type includes an e-mail address, plus the subject line of the e-mail and body text.
  • TYPE_GEO. This contains the latitude and longitude of a specific geographic point, which is an easy way to share a location with your users or to let them share their location with others. You can even use geographic barcodes to trigger location-based events, such as displaying useful information about the user's current location, or as a basis for location-based mobile games.
  • TYPE_PHONE. This contains the telephone number and the number type, for example whether it is a work or a home telephone number.
  • TYPE_SMS. This contains a text message body and the telephone number that is linked to the text message.
  • TYPE_URL. This data type contains a URL and the title of the URL. Scanning a TYPE_URL barcode is much easier than relying on your users to manually type a long, complex URL without typos or spelling errors.
  • TYPE_WIFI. This contains the SSID and password of a Wi-Fi network plus the encryption type such as OPEN, WEP or WPA. A Wi-Fi barcode is one of the easiest ways to share Wi-Fi login information, while also completely eliminating the risk of your users entering this information incorrectly.
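Once you know a barcode's data type, your app typically branches on it to decide what to offer the user. The sketch below shows one way to structure that branching in plain Java. The `ValueType` enum mirrors the nine data types listed above but is a hypothetical stand-in, not an SDK type, and the action strings are invented for illustration.

```java
// Hypothetical sketch: ValueType mirrors the nine data types listed above,
// but it is a stand-in defined here, not a type from the SDK.
final class BarcodeRouter {

    enum ValueType {
        CALENDAR_EVENT, CONTACT_INFO, DRIVER_LICENSE, EMAIL, GEO, PHONE, SMS, URL, WIFI
    }

    /** Maps a barcode's data type to an action the app could offer the user. */
    static String suggestedAction(ValueType type) {
        switch (type) {
            case CALENDAR_EVENT: return "Add event to calendar";
            case CONTACT_INFO:   return "Save contact";
            case DRIVER_LICENSE: return "Fill in identity form";
            case EMAIL:          return "Compose email";
            case GEO:            return "Open location in maps";
            case PHONE:          return "Dial number";
            case SMS:            return "Compose text message";
            case URL:            return "Open link in browser";
            case WIFI:           return "Join Wi-Fi network";
            default:             return "Show raw value";
        }
    }
}
```

Keeping this mapping in one place makes it easy to add a confirmation step before sensitive actions, such as joining a network or opening a link.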

The Barcode Scanning API can parse data from a range of different barcodes, including linear formats such as Codabar, Code 39, EAN-8, ITF and UPC-A, and 2D formats such as Aztec, Data Matrix and QR codes.

To make it easier for your end users, this API simultaneously scans for all supported barcodes and can also extract data regardless of the orientation of the barcode – so it doesn't matter if the barcode is completely upside down when the user scans it!

Machine Learning in the Cloud: the Landmark Recognition API

You can use ML Kit's Landmark Recognition API to identify known natural and constructed landmarks within an image.

If you provide this API with an image containing a famous landmark, it returns the name of that landmark, the landmark's latitude and longitude values, and a bounding box indicating where the landmark was found within the image.

You can use the Landmark Recognition API to create applications that automatically tag the user's photos, or to provide a more personalized experience. For example, if your app recognizes that a user is taking photos of the Eiffel Tower, it might offer some interesting facts about this landmark, or suggest similar, nearby tourist attractions that the user might want to visit next.

Unusually for ML Kit, the Landmark Recognition API is only available as a cloud-based API, so your application can only perform landmark recognition when the device has an active internet connection.

The Language Identification API: developing for an international audience

Nowadays, Android apps are used all over the world by users who speak many different languages.

The ML Kit Language Identification API can help your Android app appeal to an international audience by taking a string of text and determining the language in which it is written. The Language Identification API can identify more than a hundred different languages, including romanized text for Arabic, Bulgarian, Chinese, Greek, Hindi, Japanese and Russian.

This API can be a valuable addition to any application that processes user-supplied text, as this text rarely comes with any language information. You can also use the Language Identification API in translation apps, since the first step to translating anything is knowing which language you're working with! For example, if the user points their device's camera at a menu, your app can use the Language Identification API to determine that the menu is written in French and then offer to translate it using a service such as the Cloud Translation API (perhaps after extracting the text using the Text Recognition API?).

Depending on the string in question, the Language Identification API can return multiple possible languages, accompanied by confidence scores, so you can determine which detected language is most likely correct. Note that at the time of writing, ML Kit could not identify multiple different languages within the same string.
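Choosing among several candidate languages comes down to taking the highest-scoring one and checking that its score is high enough to trust. Here is a minimal plain-Java sketch of that logic. `LanguageCandidate` and `LanguagePicker` are hypothetical names rather than SDK types, and the fallback value "und" is the BCP 47 code for an undetermined language, used here as an assumed convention.

```java
import java.util.List;

// Hypothetical stand-in for one (language, score) pair returned by a language identifier.
final class LanguageCandidate {
    final String languageCode; // e.g. "fr" for French
    final float confidence;    // assumed to range from 0.0 to 1.0

    LanguageCandidate(String languageCode, float confidence) {
        this.languageCode = languageCode;
        this.confidence = confidence;
    }
}

final class LanguagePicker {
    /** BCP 47 code for "undetermined", used here as the fallback result. */
    static final String UNDETERMINED = "und";

    /** Returns the highest-scoring language code, or "und" if nothing meets minConfidence. */
    static String mostLikely(List<LanguageCandidate> candidates, float minConfidence) {
        LanguageCandidate best = null;
        for (LanguageCandidate candidate : candidates) {
            if (best == null || candidate.confidence > best.confidence) {
                best = candidate;
            }
        }
        return (best != null && best.confidence >= minConfidence)
                ? best.languageCode
                : UNDETERMINED;
    }
}
```

Returning an explicit "undetermined" value lets the calling code fall back to asking the user, instead of translating from the wrong language.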

To ensure that it can identify languages in real time, the Language Identification API is only available as an on-device model.

Coming soon: Smart Reply

Google plans to add more APIs to ML Kit in the future, but we already know about one upcoming API.

According to the ML Kit website, the upcoming Smart Reply API will let you offer contextual messaging replies in your applications by suggesting snippets of text that fit the current context. Based on what we already know about this API, it seems that Smart Reply will be similar to the suggested-response feature that is already available in the Android Messages app, Wear OS and Gmail.

The following screenshot shows what the suggested-response feature currently looks like in Gmail.

What's next? Use TensorFlow Lite with ML Kit

ML Kit offers ready-made models for common mobile use cases, but at some point you may want to go further than these ready-made models.

It is possible to create your own ML models with TensorFlow Lite and then distribute them with ML Kit. Keep in mind, however, that unlike ML Kit's ready-made APIs, working with your own ML models requires a significant amount of ML expertise.

After you have created your TensorFlow Lite models, you can upload them to Firebase, and Google then manages the hosting and serves these models to your end users. In this scenario, ML Kit acts as an API layer on top of your custom model, which simplifies some of the hard work involved in using custom models. In particular, ML Kit automatically pushes the latest version of your model to your users, so you don't have to update your app every time you want to adjust your model.

To provide the best possible user experience, you can specify the conditions that must be met before your application will download new versions of your TensorFlow Lite model, for example, only updating the model when the device is idle, charging or connected to Wi-Fi. You can even use ML Kit and TensorFlow Lite alongside other Firebase services, for example using Firebase Remote Config and Firebase A/B Testing to serve different models to different sets of users.
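The idea behind download conditions can be sketched in a few lines of plain Java. The `DownloadConditions` class below is a hypothetical model written for this article, not the SDK's own conditions type: each flag marks one requirement, and a download is allowed only when every requirement that is switched on is met by the device's current state.

```java
// Hypothetical sketch of download conditions; not the SDK's own conditions class.
final class DownloadConditions {
    final boolean requireWifi;
    final boolean requireCharging;
    final boolean requireIdle;

    DownloadConditions(boolean requireWifi, boolean requireCharging, boolean requireIdle) {
        this.requireWifi = requireWifi;
        this.requireCharging = requireCharging;
        this.requireIdle = requireIdle;
    }

    /** Returns true when every requirement that is switched on is met by the device state. */
    boolean isSatisfiedBy(boolean onWifi, boolean charging, boolean idle) {
        return (!requireWifi || onWifi)
                && (!requireCharging || charging)
                && (!requireIdle || idle);
    }
}
```

A common pattern is to use relaxed conditions for the very first download, so the feature works as soon as possible, and stricter conditions for later updates.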

If you want to go beyond pre-built models, or if the existing ML Kit models do not fully meet your needs, you can find more information about creating your own machine learning models in the official Firebase documentation.

Wrapping up

In this article, we looked at every component of Google's ML Kit and covered some common scenarios in which you might want to use each of the ML Kit APIs.

Google plans to add more APIs in the future, so which APIs would you like to see added to ML Kit next? Let us know in the comments below!


Written by

Don Bradman
