Welcome to the third part of this blog series “Enhancing Pepper robot with AI with Google’s ML Kit”! In case you missed the previous articles, I recommend you start reading here for an introduction to what we are building in this series. In the second part, we saw how to use ML Kit to teach Pepper to recognize objects around it and point at them.
In this article, we will explore the Digital Ink Recognition API of Google’s ML Kit. We are going to leverage the Kit’s ability to recognize sketches and handwritten text on a digital surface to implement the following game: we will draw or write something on the tablet on Pepper’s chest and Pepper should be able to recognize what it is. Let’s look at an example video to see how it is supposed to work before we dive in.
Funny, isn’t it? It does a good job recognizing my drawings even given my poor skills! Let’s see how to implement this in our Android App.
Implementation
Here you can find the full code of the application.
The implementation of the recognition is based on the Quickstart sample application.
Regarding the architecture, we have the following classes:
- our fragment,
- the ViewModel,
- the DrawingView, the main view we interact with by touch,
- the ModelManager, which manages the download and selection of the ML models,
- the StrokeManager, the center of the recognition logic,
- the RecognitionTask, which contains the task that runs asynchronously to obtain the recognition results.
Of course, we also need a layout that includes the DrawingView. In addition to recognizing drawings, this API also includes a mode to recognize handwritten text in different languages. We will use this as well and offer both options in the game by adding a toggle to the layout, through which the user can change from one mode to the other.
How to draw
To draw something on a touch screen, we need four basic components: a Bitmap to hold the pixels, a Canvas to host the draw calls that write into the bitmap, a drawing primitive such as a Rect, Path, text, or Bitmap, and a Paint to describe the colors and styles of the drawing.
The main component that allows us to recognize what has been written or drawn on the screen is the Ink object. It represents the user input as a sequence of strokes (a stroke being the sequence of touch points between the finger-down and finger-up events) and will later serve as input to the recognizer. We build it by drawing on the canvas held by our DrawingView, the main view for rendering content with the basic components mentioned above. This view reacts to touch input, renders it on the canvas, and passes the content to the StrokeManager, which stores the points of the drawn strokes in the Ink object.
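To make this concrete, here is a minimal sketch of such a view. The class and field names are illustrative, not the exact ones from the repository: it renders the strokes with a Path and a Paint and, in parallel, records the same touch points into an Ink.Builder for the recognizer.

import android.content.Context
import android.graphics.Canvas
import android.graphics.Color
import android.graphics.Paint
import android.graphics.Path
import android.util.AttributeSet
import android.view.MotionEvent
import android.view.View
import com.google.mlkit.vision.digitalink.Ink

// Minimal drawing view: renders strokes on the canvas and records the same
// points into an Ink.Builder, the input format the recognizer expects.
class SimpleDrawingView(context: Context, attrs: AttributeSet? = null) : View(context, attrs) {

    private val paint = Paint().apply {
        color = Color.BLACK
        style = Paint.Style.STROKE
        strokeWidth = 8f
        isAntiAlias = true
    }
    private val path = Path()

    private val inkBuilder = Ink.builder()
    private var strokeBuilder = Ink.Stroke.builder()

    override fun onTouchEvent(event: MotionEvent): Boolean {
        val point = Ink.Point.create(event.x, event.y, System.currentTimeMillis())
        when (event.actionMasked) {
            MotionEvent.ACTION_DOWN -> {
                path.moveTo(event.x, event.y)
                strokeBuilder = Ink.Stroke.builder().apply { addPoint(point) }
            }
            MotionEvent.ACTION_MOVE -> {
                path.lineTo(event.x, event.y)
                strokeBuilder.addPoint(point)
            }
            MotionEvent.ACTION_UP -> {
                strokeBuilder.addPoint(point)
                // A stroke is everything between finger-down and finger-up.
                inkBuilder.addStroke(strokeBuilder.build())
            }
        }
        invalidate()
        return true
    }

    override fun onDraw(canvas: Canvas) {
        super.onDraw(canvas)
        canvas.drawPath(path, paint)
    }

    // The accumulated user input as an Ink object, ready for recognition.
    fun ink(): Ink = inkBuilder.build()
}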
How to recognize a drawing
The StrokeManager, in turn, manages the recognition logic and the added content: whenever it is notified that new content was added, it schedules the recognition on a background thread.
The RecognitionTask performs this work asynchronously: it passes the Ink object to the DigitalInkRecognizer, the core component of ML Kit’s Digital Ink API, and the recognizer returns a list of recognition candidates.
The StrokeManager will notify the fragment through an interface when the recognition content changes.
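Under the hood, the recognition step boils down to a call like the following. This is a simplified sketch, not the repository’s exact StrokeManager/RecognitionTask code; the helper function and callback are assumptions for illustration.

import com.google.mlkit.vision.digitalink.DigitalInkRecognition
import com.google.mlkit.vision.digitalink.DigitalInkRecognitionModel
import com.google.mlkit.vision.digitalink.DigitalInkRecognizerOptions
import com.google.mlkit.vision.digitalink.Ink

// Runs recognition on the collected Ink and reports the best candidate text.
// In a real app the recognizer would be created once and reused, then closed.
fun recognizeInk(
    ink: Ink,
    model: DigitalInkRecognitionModel,
    onResult: (String?) -> Unit
) {
    val recognizer =
        DigitalInkRecognition.getClient(DigitalInkRecognizerOptions.builder(model).build())

    recognizer.recognize(ink)
        .addOnSuccessListener { result ->
            // Candidates are ordered by decreasing confidence; take the best one.
            onResult(result.candidates.firstOrNull()?.text)
        }
        .addOnFailureListener { onResult(null) }
}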
How to set everything up
To set it all up, the first thing we do in onViewCreated in our fragment is initialize the StrokeManager in our DrawingView.
Then, we download or update the ML Kit Ink Recognition models, if needed, and select the active mode (draw or text) based on the user selection on the screen through the toggle. We also set the event listeners for the interaction with the user: to start recognizing, to clear the screen, and to repeat the rules.
Lastly, Pepper will explain the rules and afterward wait for the user to finish drawing and either click the button or say that they are done. When either of those happens, the recognition task will be started asynchronously from the StrokeManager using the DigitalInkRecognizer object.
This is how we initialize everything in the onViewCreated method of our fragment:
override fun onViewCreated(view: View, savedInstanceState: Bundle?) {
    super.onViewCreated(view, savedInstanceState)

    // Init Stroke Manager
    binding.drawingDrawingView.setStrokeManager(strokeManager)
    strokeManager.setContentChangedListener(this)
    strokeManager.setClearCurrentInkAfterRecognition(true)
    strokeManager.setTriggerRecognitionAfterInput(false)
    strokeManager.reset()

    // Download ML Kit Ink Recognition models and select active model
    strokeManager.downloadModels()
    mode = selectActiveMode()
    strokeManager.setActiveModel(mode.string)

    // Model changed listener
    binding.drawingModelSwitch.setOnCheckedChangeListener { _, _ ->
        mode = selectActiveMode()
        strokeManager.setActiveModel(mode.string)
        mainViewModel.setQiChatVariable(
            getString(R.string.GameMode),
            if (mode == Mode.DRAWING) getString(R.string.drawingMode)
            else getString(R.string.textMode)
        )
        mainViewModel.goToQiChatBookmark(getString(R.string.changedModeBookmark))
    }

    // Button Listeners
    viewModel.uiEvents.observe(viewLifecycleOwner) {
        when (it) {
            DrawingViewModel.UiEvent.ExplainRules -> explainDrawingRules()
            DrawingViewModel.UiEvent.Clear -> clear()
            DrawingViewModel.UiEvent.RecognizeDrawing -> recognizeDrawing()
        }
    }

    explainDrawingRules()
}
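The downloadModels() and setActiveModel() calls above hide the ML Kit model management. Here is a rough sketch of what such helpers can do; the function names are assumptions, and the exact language tags the app uses for the two modes may differ (see the ML Kit documentation for the full list of supported tags).

import com.google.mlkit.common.model.DownloadConditions
import com.google.mlkit.common.model.RemoteModelManager
import com.google.mlkit.vision.digitalink.DigitalInkRecognitionModel
import com.google.mlkit.vision.digitalink.DigitalInkRecognitionModelIdentifier

// Illustrative model setup: "en-US" handles handwritten English text,
// "zxx-Zsym-x-autodraw" is the tag of the sketch (drawing) recognizer.
fun buildModel(languageTag: String): DigitalInkRecognitionModel {
    val identifier = DigitalInkRecognitionModelIdentifier.fromLanguageTag(languageTag)
        ?: error("No digital ink model available for tag $languageTag")
    return DigitalInkRecognitionModel.builder(identifier).build()
}

// Downloads the model only if it is not on the device yet.
fun downloadIfNeeded(model: DigitalInkRecognitionModel, onReady: () -> Unit) {
    val manager = RemoteModelManager.getInstance()
    manager.isModelDownloaded(model).addOnSuccessListener { downloaded ->
        if (downloaded) {
            onReady()
        } else {
            manager.download(model, DownloadConditions.Builder().build())
                .addOnSuccessListener { onReady() }
        }
    }
}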
How to process the results
As mentioned earlier, our fragment needs to implement the interface StrokeManager.ContentChangedListener and override the method onDrawingContentChanged to be notified when the recognized content changes.
This is what we will do when the recognized content has changed after the user is done drawing:
override fun onDrawingContentChanged() {
    if (strokeManager.getContent().isNotEmpty()) {
        translateAndSayResult(strokeManager.getContent()[0].text ?: "")
    } else {
        goToNotRecognizedDrawingBookmark()
    }
}
At that moment, the result will be translated if needed, using the ML Kit Translation API (explored in detail in another article of this series), and made available to be told to the user via the chat, using a bookmark and a QiChat variable. If nothing was recognized, Pepper will respond accordingly, informing the user that it could not recognize anything.
private fun translateAndSayResult(text: String) {
    // Translate from english to the robot language if necessary and then go to the bookmark
    if (mode != Mode.DRAWING || mainViewModel.language == Language.ENGLISH) {
        goToRecognizedDrawingBookmark(text)
    } else {
        mainViewModel.translate(Language.ENGLISH, mainViewModel.language, text)
            .addOnSuccessListener { translated ->
                goToRecognizedDrawingBookmark(translated)
            }
    }
}
private fun goToRecognizedDrawingBookmark(text: String) {
    mainViewModel.setQiChatVariable(getString(R.string.recognizedDrawing), text)
    mainViewModel.goToQiChatBookmark(getString(R.string.recognizedDrawingBookmark))
}
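For completeness, the mainViewModel.translate call wraps the ML Kit Translation API, which will be covered in detail in another article of this series. A minimal standalone sketch of such a helper could look like this; the signature and callback are assumptions, not the repository’s actual ViewModel method.

import com.google.mlkit.nl.translate.TranslateLanguage
import com.google.mlkit.nl.translate.Translation
import com.google.mlkit.nl.translate.TranslatorOptions

// Translates the recognized label (e.g. from English into the robot's language)
// and hands the result to a callback such as the bookmark call above.
fun translateText(sourceTag: String, targetTag: String, text: String, onTranslated: (String) -> Unit) {
    val options = TranslatorOptions.Builder()
        .setSourceLanguage(TranslateLanguage.fromLanguageTag(sourceTag) ?: TranslateLanguage.ENGLISH)
        .setTargetLanguage(TranslateLanguage.fromLanguageTag(targetTag) ?: TranslateLanguage.ENGLISH)
        .build()
    val translator = Translation.getClient(options)

    // The translation model is downloaded on first use if it is not present yet.
    translator.downloadModelIfNeeded()
        .addOnSuccessListener {
            translator.translate(text)
                .addOnSuccessListener { translated -> onTranslated(translated) }
        }
}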
Voice interaction
Finally, with the UI and the logic in place, the last missing piece is the voice interaction of the game.
The dialog is always a very important part of a Pepper application: it is how the user communicates with the robot, and the experience is heavily influenced by how the dialog is built. There are a few essential points to keep in mind when designing it to create a meaningful experience:
- Pepper should react to the many different ways a user may phrase a question or statement; if it only knows one, chances are the user will pick another and nothing will be recognized. How many variations? As many as you can think of.
- The same applies in the other direction: let Pepper answer slightly differently every time, so the conversation does not become repetitive and boring. This can be done by writing several variations of an answer and having one picked at random, for example using QiChat concepts.
- Have a plan B in case the dialog does not go as planned. For example, replicate the most important functions on the tablet so the user has another way to access a feature, e.g. when it is too loud for Pepper and the user to understand each other.
- Pay attention to the language the robot uses: it should be clear and understandable, both acoustically, by setting pauses at the right moments, and in terms of content, avoiding ambiguity.
- Adapt the content to the setting, the context, and the user: not too formal, not too informal, friendly, respectful, and so on.
This is what the part in our chat topic related to this game looks like. As you can see, we make extensive use of Bookmarks and QiChatVariables to communicate back and forth with the logic:
concept:(ready) ["ready" "i'm ready" "done" "i'm done" "i am ready" "i am done"]
concept:(recognizing) ^rand["mmm let me think just a moment" "okay, give me a second, what could that be?" "oh, that one's easy!" "let me think about it"]
concept:(correctguess) ^rand["yay super!" "i knew it!" "cool! I'm glad" "cool! you've done a very nice drawing"]
concept:(again) ^rand["let's play one more time" "do you want to play again?" "would you like to play once more?" "that was cool, let's do it again" "are you up for another round?"]

u:(^empty) %drawingRulesBookmark Hey, let's play a drawing game! \pau=600\ Think of something and draw it on my Tablet! \pau=600\ Using machine learning I will try to recognize what it is. Are you ready?
    u1:(~yes) great! \pau=600\ let's see %startGameBookmark \pau=600\ say done or press the button when you're done
    u1:(~no) okay, maybe later!

u:(drawing) %startDrawingBookmark

u:(~ready) %readyToRecognizeBookmark

u:(^empty) %recognizedDrawingBookmark ~recognizing \pau=500\ I would say it's a \pau=500\ $recognizedDrawing \pau=1000\ is it correct?
    u1:(~yes) ~correctguess \pau=500\ ~again %clearBookmark
    u1:(~no) ouch! please try again %clearBookmark

u:(^empty) %notRecognizeDrawingBookmark hmmm I can't really say what that is... would you like to try again?
    u1:(~yes) great
    u1:(~no) okay, maybe later!
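On the Kotlin side, setQiChatVariable and goToQiChatBookmark essentially boil down to two QiSDK calls. Here is a hedged sketch, written as free functions with explicit parameters rather than the repository’s exact ViewModel methods:

import com.aldebaran.qi.sdk.`object`.conversation.AutonomousReactionImportance
import com.aldebaran.qi.sdk.`object`.conversation.AutonomousReactionValidity
import com.aldebaran.qi.sdk.`object`.conversation.QiChatbot
import com.aldebaran.qi.sdk.`object`.conversation.Topic

// Sets a QiChat variable (e.g. $recognizedDrawing) so the topic can speak its value.
fun setQiChatVariable(qiChatbot: QiChatbot, name: String, value: String) {
    qiChatbot.variable(name).async().setValue(value)
}

// Jumps the dialog to a bookmark defined in the topic (e.g. %recognizedDrawingBookmark).
fun goToQiChatBookmark(qiChatbot: QiChatbot, topic: Topic, bookmarkName: String) {
    val bookmark = topic.bookmarks[bookmarkName] ?: return
    qiChatbot.async().goToBookmark(
        bookmark,
        AutonomousReactionImportance.HIGH,
        AutonomousReactionValidity.IMMEDIATE
    )
}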
Conclusion
This is only a demo use case combining the ML Kit Digital Ink Recognition API and Pepper, not all you can do: the combination opens the door to many more cool apps and fun games that involve recognizing handwritten text and drawings on Pepper’s tablet.
I hope you enjoyed the implementation of this game! Check out the rest of the articles of this series, where we’re going to see more use cases and how to implement them in our ML-Kit-powered Android app for the Pepper robot!
- Introduction
- Demo with ML Kit’s Object Detection API
- Demo with ML Kit’s Digital Ink Recognition API (this article)