In this blog post, we measure and investigate the glass-to-glass latency in Android.
Have you ever wondered why the camera preview in your favorite photo app has a lag? Never noticed it? Just try it out:
Pick up your favorite phone – Android or iPhone, it does not matter. Start the camera app, look at the preview on the screen, and slowly wave your hand in front of the camera. Then wave a bit faster. Did you notice the lag between the real movement of your hand and the captured hand on the screen? It works best when you can see your hand and the captured hand at the same time.
Ok. Now you should see that there is a small lag between the real movement and the camera image on the screen. It is just a small lag. It does not take seconds, but how long is the delay exactly? And why is it noticeable? And can it be shorter? Or can the image be lag-free altogether?
Read on, or watch the video
If you are curious and want to know more about the current technology of camera sensors, smartphone displays, and the software side, read on! Afterwards, you will know why augmented and mixed reality are technologically challenging.
If you want to watch a 30-minute video instead of reading this article, you have the choice. I gave a presentation about this topic at the AOSP and AAOS Meetup in November 2023. The video is available here: Measure and investigate the glass-to-glass latency in Android.
A bit of definition: the glass-to-glass latency
Let’s define our problem first. For this blog post, we want to focus on the so-called glass-to-glass latency. Take any physical object, like your hand, from the experiment described above. Your hand emits photons, which travel through the air until they hit the lens of your smartphone camera. Behind the lens, the photons hit the image sensor, which converts them into electrical signals.
The electrical signals are transformed into pixels, and then the smartphone processor reads them out and stores them in the memory. Some additional circuits in the processor convert and optimize the pixel data and hand over the image to the application software. In our case, the app just displays the received image on the smartphone display. That means it requests the GPU to draw the image onto the screen. Some other specialized circuit on the processor converts the rendered pixels into electrical signals and sends them to the display of your smartphone. The display emits photons, which travel through the display glass and eventually hit your eye.
The time span between the first glass, the camera lens, and the second glass, the display, is the glass-to-glass latency. It is the amount of time the whole system needs to process the image and display it again.
Why is the glass-to-glass latency important?
The glass-to-glass latency matters for every use case where you look at a display showing a live image of the real world and want to interact with that world at the same time. Take, for example, an augmented or mixed reality headset that overlays objects in the real world with markers or text. If you move your head and the latency is too high, the markers will not stay on top of the detected objects. Instead, they lag behind and cause a bad user experience.
Other examples are remote-controlled cars or even model airplanes. In these setups, a low latency between the image capture and the display is crucial for the operator to control the device precisely.
My device under test – DUT
The abbreviation DUT stands for "device under test". My DUT was a Pixel 2 smartphone, developed by Google. It is quite an outdated phone: it was released in 2017 and discontinued in 2019. Google also ended software support in October 2020 and will not release security updates for it anymore.
Nevertheless, it had some advantages. First, it was available on my desk when I did my investigations in 2022, so I did not need to buy an extra device. Second, the Pixel 2 is supported by the Android Open Source Project, which is distributed by Google. I could easily build my own firmware for it and get root access to the device.
For reference, the camera runs at 30 frames per second. When you divide 1 second by 30 frames, you get 0.0333 seconds per frame, which is 33.3 milliseconds. This means the camera takes a photo every 33.3 milliseconds. That is a typical frame rate for video cameras.
The display runs at 60 frames per second, so the smartphone refreshes the contents of the display 60 times per second. The faster the display refresh rate, the smoother the animations, e.g., for scrolling. If you do the same division, you get 16.6 milliseconds between two display frames.
Why did I calculate these numbers? The durations, 33.3 milliseconds between two photos and 16.6 milliseconds between two display frames, will come up again later on.
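As a tiny sanity check, here is the same arithmetic as a Kotlin snippet (purely illustrative, not part of the test code):

```kotlin
// Frame period in milliseconds for a given frame rate.
fun framePeriodMs(fps: Int): Double = 1000.0 / fps

fun main() {
    println("Camera at 30 fps: ${framePeriodMs(30)} ms per frame")  // ~33.3 ms
    println("Display at 60 fps: ${framePeriodMs(60)} ms per frame") // ~16.6 ms
}
```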
I cross-checked my latency findings with newer devices, like my recent smartphone and an iPad. The results are similar. That is because the technology (and challenges) are still the same nowadays.
The naive way of measuring the latency
When you search for the glass-to-glass or camera-to-display latency on the internet, you find videos like "How to measure the LATENCY of your iPhone, camera, monitor,.." or other blog articles. They use the following simple measuring setup:
Get a clock that can show the current time in the millisecond range. Place the DUT behind the clock so it can film the clock’s display. Get an additional camera and place it behind the DUT and the clock. It should capture the display of the DUT and the display of the clock in one shot.
When the additional camera takes a picture, it contains two timestamps: the current time on the clock and the clock's time shown on the display of the DUT. Since the DUT needs some time, the latency, to capture, process, and display the image, the two timestamps are not the same. But they were captured at the same moment, because they are in the same image. The difference between the two timestamps is the glass-to-glass latency of the DUT.
The following photo shows the DUT, my Pixel 2 phone, in the foreground and the clock, my Fairphone 3 phone, in the background.
Why the naive way is problematic
There are a couple of problems with this approach. First, the measuring equipment – the clock and the camera – uses the same technology as the DUT. So any DUT-specific technical limitations also apply to the measuring system and may distort the results. For example, how the time is shown on the display of the clock or how a photo or video is captured by the camera. More about these technical limitations and why they matter later.
Secondly, the precision is not great. The clock may have only two decimal places for the seconds (± 10 milliseconds), and the camera captures maybe 30 frames per second, which means 33 milliseconds between two shots. So the measuring accuracy is in the same order of magnitude as the expected result.
My naive test results
Nevertheless, I tested this approach, too, and here is my result: in this naive measurement setup, the latency is 85 milliseconds, simply the difference between the two timestamps.
Even though I called the test naive, the result is "correct". It is one of the correct values. But I learned this only later, when I had built a more sophisticated test setup.
The better way – photodiodes and oscilloscope
A better test setup was needed. When you dive into the topic, you quickly find out that an exact and easy way to measure the latency is to use some off-the-shelf electronics and an oscilloscope. So I built the following circuit:
On the left side, you see an LED, a button, a resistor, and a voltage source. When the button is pressed, the LED shines directly into the smartphone camera. This way, I can control when the camera sensor sees full white or no light at all.
The smartphone is running the camera app in preview mode, so it continuously captures images and shows them on the display. Depending on whether the LED is turned off or on, it captures either a dark image or a fully white image.
On the right side of the smartphone, I have placed a photodiode directly on the display. It is connected to a voltage source and a resistor. When the display switches between showing black or white pixels, the photodiode changes its internal resistance, and this change can be measured with the oscilloscope.
Here is also a real-world image of the test setup:
Tests, tests, and more tests
This setup allows for precisely measuring the glass-to-glass latency of any device. Depending on its specs, an oscilloscope can measure voltage changes in sub-millisecond ranges. The reaction times of LEDs and photodiodes are in the same range or even faster. So the error bars of this test setup are negligibly small for measurements in the millisecond range.
Apart from this setup, I also did a lot of other tests. My investigations took roughly four weeks to complete, and I performed 15 different tests. But after that, I got a result.
Final measurement results
The following graphic is my final finding for the glass-to-glass latency on Android:
The graphic shows a systrace graph in the top half and the oscilloscope capture in the bottom half. They are aligned so that the events on the outside, where the LED and the display light up, line up with the events on the inside, where the app receives the image. The blue line is the voltage on the LED, and the yellow line is the voltage on the photodiode.
From this graphic, you can deduce which component of the system introduces which latency:
- 0-33 ms camera sensor exposure
- 24 ms sensor to application (onImageAvailable Callback)
- ~33 ms application to display (GPU, surfaceflinger, VSYNC)
- 0-16 ms display scanout
So the total latency ranges from a minimum of 57 to a maximum of 106 milliseconds. With this result, we can say that our naive test setup from above was "correct": there, we measured 85 milliseconds, which fits perfectly into this range.
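To make the arithmetic behind the range explicit, here is the budget summed up in a small Kotlin sketch (the values are the ones from the list above; the exposure and scan-out terms are the variable parts):

```kotlin
// Per-component latency budget in milliseconds, taken from the measurement above.
val exposure = 0.0..33.0        // camera sensor exposure (depends on where the event hits the rolling shutter)
val sensorToApp = 24.0..24.0    // sensor to application (onImageAvailable callback)
val appToDisplay = 33.0..33.0   // application to display (GPU, surfaceflinger, VSYNC)
val scanOut = 0.0..16.0         // display scan-out (depends on the row of the affected pixels)

fun main() {
    val minMs = exposure.start + sensorToApp.start + appToDisplay.start + scanOut.start
    val maxMs = exposure.endInclusive + sensorToApp.endInclusive + appToDisplay.endInclusive + scanOut.endInclusive
    println("glass-to-glass latency: $minMs ms .. $maxMs ms") // 57.0 ms .. 106.0 ms
}
```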
To understand what is going on, we have to look at the technical details. First, we will look at the display and how a display refresh works. Then we will dive into the inner workings of the camera sensor, and lastly, we will look at the software side and inspect the Camera HAL and the VSYNCs in Android.
The display scan-out
To inspect what happens during the display scan-out, I used two photodiodes: one located at the top of the display and one at the bottom. I wrote a simple test app that shows a black screen, switches the whole screen to white for a single frame, and then switches back to black for the following frames.
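A minimal sketch of such a test app could look like the following Kotlin activity. This is my own simplified illustration, not the code from the repository, and it assumes that toggling the background color for one animation frame is precise enough:

```kotlin
import android.graphics.Color
import android.os.Bundle
import android.view.View
import androidx.appcompat.app.AppCompatActivity

// Shows a black screen; on tap, the whole screen turns white for roughly one frame.
class FlashActivity : AppCompatActivity() {

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        val screen = View(this).apply { setBackgroundColor(Color.BLACK) }
        setContentView(screen)

        screen.setOnClickListener {
            screen.setBackgroundColor(Color.WHITE)
            // Schedule the switch back to black two animation frames later,
            // so the white image is submitted to the display for a single frame.
            screen.postOnAnimation {
                screen.postOnAnimation {
                    screen.setBackgroundColor(Color.BLACK)
                }
            }
        }
    }
}
```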
When this setup is captured on the oscilloscope, you see the following graph.
The yellow line is the top photodiode. You can see when it receives light from the white pixels: the voltage is roughly rectangular for around 16 milliseconds, which is also the refresh period of the display. The blue line is the bottom photodiode. It shows a similar rectangular voltage, but 12.4 milliseconds later.
The conclusion is that the pixels of a display are not refreshed all at once. They are updated one after another, to be precise: from left to right and from top to bottom. This update process takes time. In the oscilloscope capture, the two voltage peaks are 12.4 milliseconds apart, which is already close to the refresh period of the display. Since the photodiodes do not sit exactly on the first and the last row of pixels, the whole refresh takes even longer than 12.4 milliseconds. If, for example, the photodiodes cover only about 80 percent of the display height, the full scan-out takes roughly 12.4 ms / 0.8 ≈ 15.5 ms. With a refresh period of 16 milliseconds, this means the display needs nearly the whole frame duration to update all its pixels.
Rolling shutter effect
The next component is the camera sensor. Here we have to talk about the rolling shutter effect.
Traditional cameras, whether analog or digital, have a global shutter. It is a mechanical mechanism that controls when and for how long light from the outside can shine through the lens onto the analog film or digital sensor. The exposure time controls how long the shutter is open.
Smartphone camera sensors, in contrast, typically have no mechanical shutter; they use an electronic rolling shutter. The digital sensor is a 2D array of pixels divided into rows. When the exposure of a frame starts, each row is activated separately from top to bottom. A row is only active for the exposure time, but the rows are activated one after another with a small delay. The following graphic illustrates the process.
Since there is a delay between every row, a fast-moving object will be captured at different positions. For example, look at the following image I took with my smartphone: it shows an airplane cockpit and captures a rotating propeller.
Since the propeller is moving fast and the rows of the sensor are activated at different times, every row of pixels captures the propeller at a different position. So the actual straight propeller looks bent in the image.
The time between the exposure of the first row and the exposure of the last row is called the skew time. Interestingly, the skew time does not depend on the exposure time of each row. It is always the same in different lighting situations and takes roughly 33 ms, which matches the frame period of the camera sensor.
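As a side note, you do not necessarily need an oscilloscope to get this number: the Camera2 API reports it per frame as SENSOR_ROLLING_SHUTTER_SKEW. Here is a hedged sketch of a capture callback that logs it, assuming a Camera2 capture session is already running (not every device reports these keys):

```kotlin
import android.hardware.camera2.CameraCaptureSession
import android.hardware.camera2.CaptureRequest
import android.hardware.camera2.CaptureResult
import android.hardware.camera2.TotalCaptureResult
import android.util.Log

// Logs exposure time and rolling shutter skew for every completed capture.
val captureCallback = object : CameraCaptureSession.CaptureCallback() {
    override fun onCaptureCompleted(
        session: CameraCaptureSession,
        request: CaptureRequest,
        result: TotalCaptureResult
    ) {
        // Both values are reported in nanoseconds and may be null on some devices.
        val exposureMs = result.get(CaptureResult.SENSOR_EXPOSURE_TIME)?.div(1_000_000)
        val skewMs = result.get(CaptureResult.SENSOR_ROLLING_SHUTTER_SKEW)?.div(1_000_000)
        Log.d("Latency", "exposure=$exposureMs ms, rolling shutter skew=$skewMs ms")
    }
}
```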
The inside world – the software
While the camera sensor captures the images, the pixel data is already transferred to the main memory of the system. The transfer and the further processing of the image are done by drivers in the Linux kernel and Android’s Camera HAL. When the image is ready, the OnImageAvailableListener callback fires in the Android application. This is the first time the application can read and inspect the pixel data of the captured image.
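For illustration, the app-side entry point looks roughly like the following sketch. It is not the exact code from my test app; the preview size and buffer count are made-up example values:

```kotlin
import android.graphics.ImageFormat
import android.media.ImageReader
import android.os.SystemClock
import android.util.Log

// Creates an ImageReader whose callback fires as soon as the Camera HAL hands a
// finished frame over to the application. The reader's surface would be added
// as a target of the camera capture request.
fun createLatencyImageReader(): ImageReader {
    val reader = ImageReader.newInstance(640, 480, ImageFormat.YUV_420_888, 3)
    reader.setOnImageAvailableListener({ r ->
        val image = r.acquireLatestImage() ?: return@setOnImageAvailableListener
        // Image.timestamp is the start-of-exposure timestamp of the sensor.
        // If the sensor timestamp source is REALTIME, comparing it to "now"
        // gives a rough sensor-to-application latency.
        val latencyMs = (SystemClock.elapsedRealtimeNanos() - image.timestamp) / 1_000_000
        Log.d("Latency", "sensor -> app: $latencyMs ms")
        image.close()
    }, null) // null handler: the callback runs on the calling thread's looper
    return reader
}
```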
In our setup, the image is just drawn on the screen at once, without further application-specific processing. But "at once" does not mean immediately. It means: when the next VSYNC happens!
The VSYNC is an event that occurs at the same rate as the refresh rate of the display. In our case, it happens every 16 milliseconds. It is the point in time just before the display starts to update the first pixel, the upper left one, of the screen.
In Android, the display VSYNC is called "HW_VSYNC_0". There are two additional VSYNC signals:
- VSYNC-sf: When surfaceflinger starts to compose the screen out of the app's screen, the toolbar, and overlays.
- VSYNC-app: When the app starts to draw its screen.
So in total, there are three VSYNCs in Android, and the documentation page VSYNC describes them in detail.
The three different VSYNC events fire one after another, and the corresponding work happens partially in parallel. While the display updates the screen based on the current frame, surfaceflinger composes the next screen frame at the same time. And at the same time, the app draws the frame after that. This mechanism is called triple buffering because three different frame buffers are active at the same time.
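You can observe the app-side VSYNC yourself with the Choreographer API: a frame callback fires on every VSYNC-app tick and receives the VSYNC timestamp. A small sketch, not taken from the repository:

```kotlin
import android.util.Log
import android.view.Choreographer

// Logs the interval between consecutive VSYNC-app ticks, which should hover
// around 16.6 ms on a 60 Hz display.
class VsyncLogger : Choreographer.FrameCallback {
    private var lastFrameTimeNanos = 0L

    override fun doFrame(frameTimeNanos: Long) {
        if (lastFrameTimeNanos != 0L) {
            val deltaMs = (frameTimeNanos - lastFrameTimeNanos) / 1_000_000.0
            Log.d("Vsync", "frame interval: $deltaMs ms")
        }
        lastFrameTimeNanos = frameTimeNanos
        Choreographer.getInstance().postFrameCallback(this) // re-arm for the next frame
    }
}

// Start logging (must be called on a thread with a Looper, e.g. the main thread):
// Choreographer.getInstance().postFrameCallback(VsyncLogger())
```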
Recap
We have looked at the technical components behind the glass-to-glass latency. Every component introduces its own latency and limits. Additionally, the camera sensor and the display do not operate instantly: not every pixel is exposed at the same time, and not every pixel of the display is lit up at once. Therefore, the glass-to-glass latency is a range and not a precise number.
From the last exposed pixel of the camera sensor to the first updated pixel of the display, the latency is 57 milliseconds. From the first exposed pixel of the camera sensor to the last updated pixel of the display, the latency is 106 milliseconds. So it is roughly in the range of 80 ± 25 milliseconds.
Show me the code
Apart from the external test setup, I have written some test applications and code to investigate various aspects and behaviors of the system. The result, which I also cleaned up a bit and documented, can be found in the following GitHub repository:
https://github.com/inovex/android-glass-to-glass-latency
One interesting thing in this repo is that I reverse-engineered how the torchlight of the Pixel 2 phone is activated. Since parts of the camera HAL implementation are open source, this was quite easy. I implemented the needed ioctl calls in a small C library; see
https://github.com/inovex/android-glass-to-glass-latency/tree/main/pixeltorch
I used the torchlight to synchronize the inside world, the systrace graph, with the outside world, the LED and photodiode voltage changes.
External References
The glass-to-glass latency is not a new topic. This property of a system is relevant for multiple use cases, from remote-controlled cars to augmented reality. So there are already other resources available.
For example, these two blog posts by Daniel Wagner. He talks about the challenges of augmented and virtual reality. If you want to know how head-tracking is used to synchronize the captured images from the real world with the current head position, read these.
Another recommendation is the blog post Virtual Reality – Blatant Latency and How to Avoid It on the ARM community website. It explains how double/triple buffering increases the latency and how to avoid it with tricky synchronization.
Additionally, there are some research papers about measuring the glass-to-glass latency:
- A system for high-precision glass-to-glass delay measurements in video communication
- A LED-Based IR/RGB End-to-End Latency Measurement Device
They use similar or even more sophisticated techniques to measure the latency.
Outlook
For me, it was a fun investigation. I learned a lot, e.g., about camera sensors and the Android graphics subsystem. It was also nice to use my basic electrical engineering skills to build, refine, and use my LED- and photodiode-based test setup.
Now I am very curious about whether it’s possible to reduce this latency. Which hardware part or software component must be tweaked or configured differently to make it faster?
TL;DR: The camera sensor to display latency (the glass-to-glass latency on Android) of a recent smartphone is 80 milliseconds ± 25 milliseconds.