Welcome to the third part of this blog series “Enhancing Pepper robot with AI with Google’s ML Kit”! In case you missed the previous articles, I recommend you start reading here for an introduction to what we are building in this series. In the second part, we saw how to use ML Kit to teach Pepper to recognize objects around it and point at them.
In this article, we will explore the Digital Ink Recognition API of Google’s ML Kit. We are going to leverage the Kit’s ability to recognize sketches and handwritten text on a digital surface to implement the following game: we will draw or write something on the tablet on Pepper’s chest and Pepper should be able to recognize what it is. Let’s look at an example video to see how it is supposed to work before we dive in.
Funny, isn’t it? It does a good job recognizing my drawings even given my poor skills! Let’s see how to implement this in our Android App.
Implementation
Here you can find the full code of the application.
The implementation of the recognition is based on the Quickstart sample application.
Regarding the architecture, we have the following classes:
- our fragment,
- the ViewModel,
- the DrawingView, the main view we interact with by touch,
- the ModelManager, which manages the download and selection of the ML models,
- the StrokeManager, the center of the recognition logic,
- the RecognitionTask, which contains the task that runs asynchronously to obtain the recognition results.
Of course, we also need a layout that includes the DrawingView. In addition to recognizing drawings, this API also includes a mode to recognize handwritten text in different languages. We will use this as well and offer both options in the game by adding a toggle to the layout, through which the user can change from one mode to the other.
How to draw
To draw something on a touch screen, we need four basic components: a Bitmap to hold the pixels, a Canvas to host the draw calls that write into the bitmap, a drawing primitive such as a Rect, Path, text, or Bitmap, and a Paint to describe the colors and styles of the drawing.
The main component that allows us to recognize what has been written or drawn on the screen is the Ink object. It represents the user input as a sequence of strokes (a stroke being the sequence of touch points between the finger-down and finger-up events) and will later serve as input to the recognizer. We build it by drawing on the canvas held by our DrawingView, the main view for rendering content with the basic components mentioned above. This view reacts to touch input, renders it on the canvas, and passes the content to the StrokeManager, which stores the points of the drawn strokes in the Ink object.
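To make this concrete, here is a minimal sketch of such a view. The class and field names are illustrative, not the exact ones from the repository: it renders the strokes with a Path and a Paint and, in parallel, records the same touch points into an Ink.Builder for the recognizer.

import android.content.Context
import android.graphics.Canvas
import android.graphics.Color
import android.graphics.Paint
import android.graphics.Path
import android.util.AttributeSet
import android.view.MotionEvent
import android.view.View
import com.google.mlkit.vision.digitalink.Ink

// Minimal drawing view: renders strokes on the canvas and records the same
// points into an Ink.Builder, the input format the recognizer expects.
class SimpleDrawingView(context: Context, attrs: AttributeSet? = null) : View(context, attrs) {

    private val paint = Paint().apply {
        color = Color.BLACK
        style = Paint.Style.STROKE
        strokeWidth = 8f
        isAntiAlias = true
    }
    private val path = Path()

    private val inkBuilder = Ink.builder()
    private var strokeBuilder = Ink.Stroke.builder()

    override fun onTouchEvent(event: MotionEvent): Boolean {
        val point = Ink.Point.create(event.x, event.y, System.currentTimeMillis())
        when (event.actionMasked) {
            MotionEvent.ACTION_DOWN -> {
                path.moveTo(event.x, event.y)
                strokeBuilder = Ink.Stroke.builder().apply { addPoint(point) }
            }
            MotionEvent.ACTION_MOVE -> {
                path.lineTo(event.x, event.y)
                strokeBuilder.addPoint(point)
            }
            MotionEvent.ACTION_UP -> {
                strokeBuilder.addPoint(point)
                // A stroke is everything between finger-down and finger-up.
                inkBuilder.addStroke(strokeBuilder.build())
            }
        }
        invalidate()
        return true
    }

    override fun onDraw(canvas: Canvas) {
        super.onDraw(canvas)
        canvas.drawPath(path, paint)
    }

    // The accumulated user input as an Ink object, ready for recognition.
    fun ink(): Ink = inkBuilder.build()
}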
How to recognize a drawing
The StrokeManager, in turn, manages the recognition logic and the added content: whenever it is notified that new content was added, it schedules the recognition on a background thread.
The RecognitionTask performs this work asynchronously: it passes the Ink object to the DigitalInkRecognizer, the core component of ML Kit’s Digital Ink API, and the recognizer returns a list of recognition candidates.
The StrokeManager will notify the fragment through an interface when the recognition content changes.
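Under the hood, the recognition step boils down to a call like the following. This is a simplified sketch, not the repository’s exact StrokeManager/RecognitionTask code; the helper function and callback are assumptions for illustration.

import com.google.mlkit.vision.digitalink.DigitalInkRecognition
import com.google.mlkit.vision.digitalink.DigitalInkRecognitionModel
import com.google.mlkit.vision.digitalink.DigitalInkRecognizerOptions
import com.google.mlkit.vision.digitalink.Ink

// Runs recognition on the collected Ink and reports the best candidate text.
// In a real app the recognizer would be created once and reused, then closed.
fun recognizeInk(
    ink: Ink,
    model: DigitalInkRecognitionModel,
    onResult: (String?) -> Unit
) {
    val recognizer =
        DigitalInkRecognition.getClient(DigitalInkRecognizerOptions.builder(model).build())

    recognizer.recognize(ink)
        .addOnSuccessListener { result ->
            // Candidates are ordered by decreasing confidence; take the best one.
            onResult(result.candidates.firstOrNull()?.text)
        }
        .addOnFailureListener { onResult(null) }
}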
How to set everything up
To set it all up, the first thing we do in onViewCreated in our fragment is initialize the StrokeManager in our DrawingView.
Then, we download or update the ML Kit Ink Recognition models, if needed, and select the active mode (draw or text) based on the user selection on the screen through the toggle. We also set the event listeners for the interaction with the user: to start recognizing, to clear the screen, and to repeat the rules.
Lastly, Pepper will explain the rules and afterward wait for the user to finish drawing and either click the button or say that they are done. When either of those happens, the recognition task will be started asynchronously from the StrokeManager using the DigitalInkRecognizer object.
This is how we initialize everything in the onViewCreated method of our fragment:
override fun onViewCreated(view: View, savedInstanceState: Bundle?) {
    super.onViewCreated(view, savedInstanceState)

    // Init Stroke Manager
    binding.drawingDrawingView.setStrokeManager(strokeManager)
    strokeManager.setContentChangedListener(this)
    strokeManager.setClearCurrentInkAfterRecognition(true)
    strokeManager.setTriggerRecognitionAfterInput(false)
    strokeManager.reset()

    // Download ML Kit Ink Recognition models and select active model
    strokeManager.downloadModels()
    mode = selectActiveMode()
    strokeManager.setActiveModel(mode.string)

    // Model changed listener
    binding.drawingModelSwitch.setOnCheckedChangeListener { _, _ ->
        mode = selectActiveMode()
        strokeManager.setActiveModel(mode.string)
        mainViewModel.setQiChatVariable(
            getString(R.string.GameMode),
            if (mode == Mode.DRAWING) getString(R.string.drawingMode)
            else getString(R.string.textMode)
        )
        mainViewModel.goToQiChatBookmark(getString(R.string.changedModeBookmark))
    }

    // Button Listeners
    viewModel.uiEvents.observe(viewLifecycleOwner) {
        when (it) {
            DrawingViewModel.UiEvent.ExplainRules -> explainDrawingRules()
            DrawingViewModel.UiEvent.Clear -> clear()
            DrawingViewModel.UiEvent.RecognizeDrawing -> recognizeDrawing()
        }
    }

    explainDrawingRules()
}
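The downloadModels() and setActiveModel() calls above hide the ML Kit model management. Here is a rough sketch of what such helpers can do; the function names are assumptions, and the exact language tags the app uses for the two modes may differ (see the ML Kit documentation for the full list of supported tags).

import com.google.mlkit.common.model.DownloadConditions
import com.google.mlkit.common.model.RemoteModelManager
import com.google.mlkit.vision.digitalink.DigitalInkRecognitionModel
import com.google.mlkit.vision.digitalink.DigitalInkRecognitionModelIdentifier

// Illustrative model setup: "en-US" handles handwritten English text,
// "zxx-Zsym-x-autodraw" is the tag of the sketch (drawing) recognizer.
fun buildModel(languageTag: String): DigitalInkRecognitionModel {
    val identifier = DigitalInkRecognitionModelIdentifier.fromLanguageTag(languageTag)
        ?: error("No digital ink model available for tag $languageTag")
    return DigitalInkRecognitionModel.builder(identifier).build()
}

// Downloads the model only if it is not on the device yet.
fun downloadIfNeeded(model: DigitalInkRecognitionModel, onReady: () -> Unit) {
    val manager = RemoteModelManager.getInstance()
    manager.isModelDownloaded(model).addOnSuccessListener { downloaded ->
        if (downloaded) {
            onReady()
        } else {
            manager.download(model, DownloadConditions.Builder().build())
                .addOnSuccessListener { onReady() }
        }
    }
}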
How to process the results
As mentioned earlier, our fragment needs to implement the interface StrokeManager.ContentChangedListener and override the method onDrawingContentChanged to be notified when the recognized content changes.
This is what we will do when the recognized content has changed after the user is done drawing:
override fun onDrawingContentChanged() {
    if (strokeManager.getContent().isNotEmpty()) {
        translateAndSayResult(strokeManager.getContent()[0].text ?: "")
    } else {
        goToNotRecognizedDrawingBookmark()
    }
}
At that moment, the result will be translated if needed, using the ML Kit Translation API (explored in detail in another article of this series), and made available to be told to the user via the chat, using a bookmark and a QiChat variable. If nothing was recognized, Pepper will respond accordingly, informing the user that it could not recognize anything.
private fun translateAndSayResult(text: String) {
    // Translate from english to the robot language if necessary and then go to the bookmark
    if (mode != Mode.DRAWING || mainViewModel.language == Language.ENGLISH) {
        goToRecognizedDrawingBookmark(text)
    } else {
        mainViewModel.translate(Language.ENGLISH, mainViewModel.language, text)
            .addOnSuccessListener { translated ->
                goToRecognizedDrawingBookmark(translated)
            }
    }
}
private fun goToRecognizedDrawingBookmark(text: String) {
    mainViewModel.setQiChatVariable(getString(R.string.recognizedDrawing), text)
    mainViewModel.goToQiChatBookmark(getString(R.string.recognizedDrawingBookmark))
}
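For completeness, the mainViewModel.translate call wraps the ML Kit Translation API, which will be covered in detail in another article of this series. A minimal standalone sketch of such a helper could look like this; the signature and callback are assumptions, not the repository’s actual ViewModel method.

import com.google.mlkit.nl.translate.TranslateLanguage
import com.google.mlkit.nl.translate.Translation
import com.google.mlkit.nl.translate.TranslatorOptions

// Translates the recognized label (e.g. from English into the robot's language)
// and hands the result to a callback such as the bookmark call above.
fun translateText(sourceTag: String, targetTag: String, text: String, onTranslated: (String) -> Unit) {
    val options = TranslatorOptions.Builder()
        .setSourceLanguage(TranslateLanguage.fromLanguageTag(sourceTag) ?: TranslateLanguage.ENGLISH)
        .setTargetLanguage(TranslateLanguage.fromLanguageTag(targetTag) ?: TranslateLanguage.ENGLISH)
        .build()
    val translator = Translation.getClient(options)

    // The translation model is downloaded on first use if it is not present yet.
    translator.downloadModelIfNeeded()
        .addOnSuccessListener {
            translator.translate(text)
                .addOnSuccessListener { translated -> onTranslated(translated) }
        }
}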
Voice interaction
Finally, with the UI and the logic in place, the last missing piece is the voice interaction of the game.
The dialog is always a very important part of a Pepper application: it is how the user communicates with the robot, and the experience is heavily influenced by how the dialog is built. There are a few essential points to keep in mind when designing it to create a meaningful experience:
- Pepper should react to the many different ways a user may phrase a question or statement; if it only knows one, chances are the user will pick another and nothing will be recognized. How many variations? As many as you can think of.
- The same applies in the other direction: let Pepper answer slightly differently every time, so the conversation does not become repetitive and boring. This can be done by writing several variations of an answer and having one picked at random, for example using QiChat concepts.
- Have a plan B in case the dialog does not go as planned. For example, replicate the most important functions on the tablet so the user has another way to access a feature, e.g. when it is too loud for Pepper and the user to understand each other.
- Pay attention to the language the robot uses: it should be clear and understandable, both acoustically, by setting pauses at the right moments, and in terms of content, avoiding ambiguity.
- Adapt the content to the setting, the context, and the user: not too formal, not too informal, friendly, respectful, and so on.
This is what the part in our chat topic related to this game looks like. As you can see, we make extensive use of Bookmarks and QiChatVariables to communicate back and forth with the logic:
concept:(ready) ["ready" "i'm ready" "done" "i'm done" "i am ready" "i am done"]
concept:(recognizing) ^rand["mmm let me think just a moment" "okay, give me a second, what could that be?" "oh, that one's easy!" "let me think about it"]
concept:(correctguess) ^rand["yay super!" "i knew it!" "cool! I'm glad" "cool! you've done a very nice drawing"]
concept:(again) ^rand["let's play one more time" "do you want to play again?" "would you like to play once more?" "that was cool, let's do it again" "are you up for another round?"]

u:(^empty) %drawingRulesBookmark Hey, let's play a drawing game! \pau=600\ Think of something and draw it on my Tablet! \pau=600\ Using machine learning I will try to recognize what it is. Are you ready?
    u1:(~yes) great! \pau=600\ let's see %startGameBookmark \pau=600\ say done or press the button when you're done
    u1:(~no) okay, maybe later!

u:(drawing) %startDrawingBookmark

u:(~ready) %readyToRecognizeBookmark

u:(^empty) %recognizedDrawingBookmark ~recognizing \pau=500\ I would say it's a \pau=500\ $recognizedDrawing \pau=1000\ is it correct?
    u1:(~yes) ~correctguess \pau=500\ ~again %clearBookmark
    u1:(~no) ouch! please try again %clearBookmark

u:(^empty) %notRecognizeDrawingBookmark hmmm I can't really say what that is... would you like to try again?
    u1:(~yes) great
    u1:(~no) okay, maybe later!
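On the Kotlin side, setQiChatVariable and goToQiChatBookmark essentially boil down to two QiSDK calls. Here is a hedged sketch, written as free functions with explicit parameters rather than the repository’s exact ViewModel methods:

import com.aldebaran.qi.sdk.`object`.conversation.AutonomousReactionImportance
import com.aldebaran.qi.sdk.`object`.conversation.AutonomousReactionValidity
import com.aldebaran.qi.sdk.`object`.conversation.QiChatbot
import com.aldebaran.qi.sdk.`object`.conversation.Topic

// Sets a QiChat variable (e.g. $recognizedDrawing) so the topic can speak its value.
fun setQiChatVariable(qiChatbot: QiChatbot, name: String, value: String) {
    qiChatbot.variable(name).async().setValue(value)
}

// Jumps the dialog to a bookmark defined in the topic (e.g. %recognizedDrawingBookmark).
fun goToQiChatBookmark(qiChatbot: QiChatbot, topic: Topic, bookmarkName: String) {
    val bookmark = topic.bookmarks[bookmarkName] ?: return
    qiChatbot.async().goToBookmark(
        bookmark,
        AutonomousReactionImportance.HIGH,
        AutonomousReactionValidity.IMMEDIATE
    )
}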
Conclusion
This is only a demo use case combining the ML Kit Digital Ink Recognition API and Pepper, not all you can do: the combination opens the door to many more cool apps and fun games that involve recognizing handwritten text and drawings on Pepper’s tablet.
I hope you enjoyed the implementation of this game! Check out the rest of the articles of this series, where we’re going to see more use cases and how to implement them in our ML-Kit-powered Android app for the Pepper robot!
- Introduction
- Demo with ML Kit’s Object Detection API
- Demo with ML Kit’s Digital Ink Recognition API (this article)