Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi.
Hello everyone.
My name is John Komar.
I'm a former senior engineer at Google, and I currently work at Meta.
I have around 13 years of experience working across different stacks of software development.
I'm mostly a full stack developer, but my recent experience has been mostly working on native Android applications at Meta/Facebook.
So today I'm going to present my white paper, which is about pre-computed surround lighting for reduced latency,
and I hope we gain something useful out of it in this conference.
Let's get started.
So the main things we are going to cover in this particular presentation are: what the system architecture looks like; what the performance advantages of this approach are compared to the conventional approaches; how predictive modeling helps make this system more performant than the conventional ways surround lights are handled; what the user experience advantages of the system are; and what challenges I faced when I was writing this paper as well as developing the system.
First, let me give some context on one of the major challenges in the existing systems of ambient lighting.
When I say ambient lighting, I mean that when you are watching something on your screen, let's say a movie or a game, you can have these colorful surround lights set up in your home.
Some of the newer companies, for example Philips and a bunch of other manufacturers, have released these smart bulbs which can sync with the video content being played on your screen.
The usual conventional way of doing this is that they have a camera pointed towards the screen.
The camera sees what is being played on the screen, and then there's a small box which processes what is happening in the video.
It then sends out commands to the different bulbs or lights set up across the room to change their colors to match the screen.
Now the biggest problem in this entire approach is latency, because everything is computed on the fly: the camera seeing what exactly is going on, then the computation happening locally on the device over what it is seeing, then deciding which colors and which brightness make sense, and then sending out the signals to the bulbs.
This entire process is heavily computationally expensive, not to mention the added latency of passing these signals, usually over ZigBee, to these surround smart lights.
And because they have to do this at runtime, there is a limited number of colors and a limited number of brightness levels these machines can play around with, because the more computationally heavy they make it, the more it adds to the latency.
So this entire system is bound by the challenges of physics: basically how much time it takes to process a certain amount of information, and how much time it takes to send those signals to the smart lights, smart bulbs, or different light strips.
To solve this problem, the approach I came up with was to pre-compute, or basically pre-process, what content will be displayed on the screen via the signal pass-through.
So this approach does not rely on a camera or a physical device looking at what is going on on the screen; it pre-processes what the scene might entail, and a predictive buffer basically predicts what the lights for the scene are going to be.
I'll explain this with an example.
Let's say you're watching a movie and there's a police car chase scene.
We know that in most countries police cars have red and blue lights.
So this system would pre-compute that this particular car chase scene requires blue and red lights flickering through the room.
The system can pre-compute and pre-prepare these encodings and send the signals to the bulbs right before the scene starts on the screen, so the users do not experience any latency, and the signals reach the surround lights at the same time the scene appears on the main screen.
What happens in the conventional systems, as I have personally witnessed myself, is that the bulbs only start reacting to the scene, flickering blue and red to match the police car chase (I'm just giving an example), when the scene is already over and no longer even on the screen.
So this whole experience is really jarring.
The latency is extremely high, even though this entire experience is supposed to be more immersive and more engaging.
In my personal experience, as well as in the survey we conducted with internal test users, the conventional way of doing this has been, I would say, completely disconnected: what you are watching on the screen versus what you are experiencing in the smart lights.
So this approach of pre-computation not only bridges that latency barrier, but also gives you a lot of flexibility to play around with.
In this case we did some benchmark analysis, and we found that while the conventional systems had a latency of around 200 to 300 milliseconds, with our approach we can send out the signals with an overall latency of 60.7 milliseconds, which is roughly a 70 to 80 percent reduction compared to the conventional systems.
And because we have the capability of pre-computing this information about the scenes in the movie, the game you are playing, or the TV content you are watching, this information could even be streamed by online streaming platforms like Netflix, Amazon Prime, or Disney Plus, et cetera.
Because this information is pre-processed, the system has the capability of peeking at the frames which will be visible in the next few seconds.
So with this look-ahead approach, the system can not only pre-compute the light signals it has to send out for the current scene, it can even do a look-ahead for the next scene.
That not only saves the pre-processing time when the scene actually comes, which avoids a jarring experience in case the stream buffers or loads, it also creates a smooth experience when you transition to the next scene: the lights are already pre-computed.
It completely avoids even the possibility of a delay, because we already pre-buffered what the smart lights should show, in terms of color and brightness, two seconds before the scene is even on the screen.
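To make the look-ahead idea concrete, here is a minimal Python sketch of how such a predictive buffer could be structured. The names and numbers here (PredictiveBuffer, the two-second look-ahead, the toy usage at the bottom) are my own illustration of the idea, not the actual implementation described in the paper.

```python
from collections import deque

class PredictiveBuffer:
    """Illustrative sketch: holds pre-computed light states for frames a few
    seconds ahead of playback, so signals are ready before a scene appears."""

    def __init__(self, lookahead_seconds=2.0, frame_rate=60):
        self.lookahead_frames = int(lookahead_seconds * frame_rate)
        self.buffer = deque()  # (frame_index, light_state) pairs

    def fill(self, current_frame, compute_light_state, get_future_frame):
        """Pre-compute light states up to `lookahead_frames` ahead of playback."""
        next_needed = self.buffer[-1][0] + 1 if self.buffer else current_frame
        while next_needed <= current_frame + self.lookahead_frames:
            frame = get_future_frame(next_needed)   # decoded frame from the stream
            if frame is None:                       # stream not buffered that far yet
                break
            self.buffer.append((next_needed, compute_light_state(frame)))
            next_needed += 1

    def pop_due(self, current_frame):
        """Return light states whose frames are now (or about to be) on screen."""
        due = []
        while self.buffer and self.buffer[0][0] <= current_frame:
            due.append(self.buffer.popleft()[1])
        return due

# Toy usage: "frames" are just indices; the light state is a dummy tuple per frame.
buf = PredictiveBuffer(lookahead_seconds=2.0, frame_rate=30)
buf.fill(current_frame=0,
         compute_light_state=lambda f: ("rgb", f % 255),
         get_future_frame=lambda i: i if i < 1000 else None)
print(buf.pop_due(current_frame=5))   # states for frames 0..5, already pre-computed
```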
Also, because of this pre-computation, it can support multiple frame rates.
For example, let's say you are playing games: the frequency is not really around 24 or 60 hertz anymore, it can go up to 120 hertz, and newer monitors are about 240 hertz.
With this approach, the system is not capped by the frame rate of what you are watching; it is basically bound by how many seconds of content ahead you are going to process.
For example, if you're playing a game, the game developers, in their game engines, could pre-encode the signals for the colors the game environment has.
Because the character can move in certain directions on the map, they can pre-compute that if the character moves, let's say, in the direction of snow, we can make the entire room snow white, or if the character moves towards the sea, we can make the entire room blue.
All this pre-computation can be done, and game developers can pre-encode certain information which the system can handle and directly turn into signals, making sure they are ready before the scene actually comes.
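As a rough illustration of how a game engine could ship pre-encoded lighting cues like these, here is a hypothetical sketch; the region names, coordinates, and RGB values are all made up for the example and are not from the paper.

```python
# Hypothetical pre-encoded lighting cues a game could ship with its levels.
# Keys are map regions the player can move toward; values are the room-wide
# light state to pre-send when that transition is predicted.
LIGHT_CUES = {
    "snow_field": {"rgb": (255, 255, 255), "brightness": 0.9},   # snow-white room
    "ocean":      {"rgb": (20, 80, 200),   "brightness": 0.7},   # deep blue room
    "cave":       {"rgb": (40, 25, 10),    "brightness": 0.15},  # dim warm glow
}

def predict_next_region(player_position, heading, regions):
    """Toy prediction: pick the region whose center the player is heading toward."""
    def score(region_center):
        dx = region_center[0] - player_position[0]
        dy = region_center[1] - player_position[1]
        return dx * heading[0] + dy * heading[1]   # alignment with movement direction
    return max(regions, key=lambda name: score(regions[name]))

# Example usage with made-up coordinates: the player at the origin, walking north.
regions = {"snow_field": (0, 100), "ocean": (100, 0), "cave": (-50, -50)}
next_region = predict_next_region((0, 0), (0, 1), regions)
cue = LIGHT_CUES[next_region]   # send this cue to the lights before the transition
print(next_region, cue)
```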
We also developed a system for error correction.
For example, when a scene is being shown, or while the predictive analysis is being done for the next few scenes, the system can pre-check itself that the colors it is sending to the smart lights actually make sense.
I'll explain this with an example.
Let's say a character is looking at a beautiful rainbow in the sky.
Even though the primary color in the scene would be the blue sky, the conventional systems in that case would just show blue lights in the room.
But what the user is actually looking at is the rainbow.
In this particular case, with the machine learning algorithms, the system can determine what the main subject and main focus of that particular scene is; in this case, it's a rainbow.
So with this new system, it can actually send rainbow colors, which could be spread across the entire room and make the scene more immersive compared to existing lights, which would only show, I would say, maybe sky blue.
And as I previously mentioned, because the conventional systems have to rely on real-time processing, on the camera input, and on sending the signals on the fly, they are bound by physical limitations on how much they can compute.
So instead of using the entire color gamut, and instead of using the entire brightness spectrum from, let's say, zero to a hundred percent, they play it really safe.
They only play with the major primary colors.
These systems are really risk averse: they will not tinker with the brightness, and they will not tinker with too many color combinations and risk getting it wrong.
So they usually play within a very safe range of colors.
But our new system, because it is pre-computing and has algorithms and checks and balances to make sure the scene and the light signal it is sending actually match, can play with the entire color spectrum, all the millions of color combinations it can produce from different combinations of RGB lights, and it can also go from zero to a hundred percent brightness.
For example, let's say you're playing a horror game: it can go completely dark to create that immersion that you are playing a horror game or a horror scene.
Or if you are watching, let's say, the latest Superman movie with bright, beautiful colors, it can go to a hundred percent brightness with really immersive bright colors to match what the content creator intended the movie or the content to look like.
How did we achieve a sub-hundred-millisecond response time?
These are the key areas we focused on.
The first one is really efficient frame processing.
If we do not have a good frame processing algorithm in place, or good pre-encoding logic in place, there is no way around it: if you are not able to find the colors a particular scene should be showing, then just processing the raw video content and deciding which colors to show becomes the bottleneck.
The other areas in this system are basically optimizations.
Pre-computing is basically applying this frame processing to a few frames in the future, making sure you have enough buffer so that when those scenes come onto the screen, you already have their light states ready.
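As one possible shape for that frame processing step, here is a minimal sketch that splits a frame into zones and averages each zone into a color that could drive one light. This is an assumption about how such a step might work, using plain Python lists instead of a real video decoder; it is not the paper's actual algorithm.

```python
def zone_colors(frame, zones=4):
    """Split a frame (a list of rows of (r, g, b) pixels) into vertical zones and
    average each zone's color. Each zone could drive one light in the room."""
    height = len(frame)
    width = len(frame[0])
    zone_width = max(1, width // zones)
    colors = []
    for z in range(zones):
        r_sum = g_sum = b_sum = count = 0
        for row in frame:
            for x in range(z * zone_width, min((z + 1) * zone_width, width)):
                r, g, b = row[x]
                r_sum += r; g_sum += g; b_sum += b; count += 1
        colors.append((r_sum // count, g_sum // count, b_sum // count)
                      if count else (0, 0, 0))
    return colors

# Tiny synthetic "frame": left half red, right half blue.
frame = [[(255, 0, 0)] * 8 + [(0, 0, 255)] * 8 for _ in range(8)]
print(zone_colors(frame, zones=4))  # -> two red zones, two blue zones
```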
Then there is node communication, basically the talking to the smart lights, because your system knows how far those smart bulbs are.
Let's say your smart bulbs are spread really far apart in your huge home theater setup, or let's say the system is even being used in a commercial theater.
Because of the physical limitation that the bulbs or the lights are located really far away, you can pre-calibrate for that latency: you know that for that bulb it takes, say, one second for the signal to reach it.
And because we have pre-computation, you can actually send the light signal for the scene which is coming a second later, so that it is completely in sync with the video.
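Here is a hedged sketch of that per-bulb calibration idea: since we know roughly how long a signal takes to reach each bulb, we can schedule each bulb's signal that much earlier. The bulb names and latency numbers below are placeholders, not measured values from the paper.

```python
# Illustrative per-bulb transmission latencies (seconds), e.g. from calibration pings.
BULB_LATENCY = {
    "sofa_left": 0.05,
    "back_wall": 0.30,
    "balcony":   1.00,   # very far away / slow network hop
}

def schedule_sends(scene_start_time, light_state):
    """Return (send_time, bulb, state) tuples so every bulb changes exactly at scene_start_time."""
    sends = [(scene_start_time - latency, bulb, light_state)
             for bulb, latency in BULB_LATENCY.items()]
    return sorted(sends, key=lambda item: item[0])

# Scene appears on screen at t = 120.0 s into playback; all bulbs should flip then.
for send_time, bulb, state in schedule_sends(120.0, {"rgb": (255, 0, 0)}):
    print(f"send to {bulb} at t={send_time:.2f}s -> {state}")
```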
This would have been, and actually is, completely impossible in the current systems, because the current systems rely on the camera: they cannot send the signal to the bulb ahead of time.
And if the bulb is really far away, or even if there is network lag, let's say your router is not super efficient and there is a delay between your set-top box or smart device and those bulbs, the latency is really pronounced and it is a really bad user experience.
And by using dedicated systems for pre-computing this, the better hardware and the better GPUs we have, the further this system can be scaled in terms of depth, basically reducing the time frame processing takes as well as increasing how much pre-computation we can do.
Moving on, aside from these approaches, we can further improve the overall immersion by identifying the scene.
Take the previous example of the rainbow in the sky: classifying that the image the user is looking at right now, the scene, is a rainbow in the sky makes or breaks what the signal is going to be.
Existing systems just rely on whatever the most present color in the scene is, which is going to be blue, but our scene classification system will actually recognize that this is not just a blue sky, it has a rainbow inside it.
That way it can completely differentiate which kind of color signals to send to the smart bulbs.
It can also do color mapping.
As I mentioned, previous systems rely on camera input: if your screen does not have a color-accurate display, or the display itself is not of high quality, or let's say you are watching content on a CRT or some other dated hardware, those systems also fail, because they rely completely on what exactly is on the screen, not on what signals are being sent to the screen.
We can also do motion analysis.
Let's say your monitor has a really high frame rate, 140 hertz.
Because the existing systems rely on a camera, the monitor's frame rate could be far higher than the computational capabilities of those systems.
In that case the scenes could actually be changing faster than what those systems can compute and send to the bulbs, and they have so much lag that it is almost impossible to keep the video or game content in sync with the smart lights.
There is also focus detection, which is similar to scene classification, but it can additionally identify whether there is a main subject or some emotional theme, and it can determine which areas to focus on and emphasize.
With the adoption of the system over time, content creators, movie producers, as well as other people who work in color mixing or video engineering, can eventually employ pattern learning algorithms on how accurate the previous predictions were, and this system can grow more accurate over a period of time.
And this entire encoding could be very similar to, let's say, how sound engineers work today.
Whenever we watch a movie, there are so many sound engineers in the background who have watched that content and specifically chosen that this sound should go to the stereo speakers, this sound should go to the woofer, this sound should go to the Atmos speakers.
Very similarly, just as there is an entire ecosystem built around the sound part of the video we are watching, I believe that eventually, down the line, we can also have this surround part, with different smart lights reacting to the video content.
So the major application of this particular approach is going to be gaming immersion.
The gaming industry, I believe, has already crossed other sports in terms of the profits and money it makes.
Gaming is getting very popular, be it phone games, consoles, or PC.
So having a more immersive gaming experience, where your entire room reacts to the game you are playing, is only going to add to the overall experience of the users, of the gamers.
We can also adopt this approach not only for gamers and cinematic movie watchers, but also for educational content, as well as for reducing visual discomfort.
For people who have reduced or impaired visibility, having lights in the room which react to the content could make the experience somewhat more pleasant and soothing.
And people who are not able to focus entirely on the minute details on the screen could still get the overall experience, the vibe of the movie or the scene, simply because of the surround lights reacting to the content on the screen.
Some optimizations could also be done specific to the content types.
As gaming has progressed, we have moved on from 60 hertz being the benchmark of smooth gaming; now 120 hertz is pretty common, and with the advancements in new GPUs, new monitor hardware, as well as virtual reality, it is only going to keep going up.
Since our logic is not really capped to the frequency of the content, and is based on the pre-encoding and pre-detection of what the scenes are going to be, our system is going to easily scale with the improvements in gaming over a period of time.
Cinemas, I believe, have been very stagnant in terms of the movie-watching experience in the past, I would say, 30 years.
Aside from new IMAX and sound formats, there have not been really radical improvements in the cinema-watching experience in the past few years.
Now with this new system, the directors and the sound engineers, and maybe a new occupation, the surround light engineers, can actually include this new information: that corresponding to this movie scene, the theater lights should react in this way, the backlights should react in this way, and if it is a horror movie, the lights should flicker in this way.
For example, imagine watching a movie which has a thunderstorm scene, and the entire theater's lights flash according to the scene in the movie.
It is going to create a much more immersive experience compared to the movie experiences we are used to right now.
Also, with the rise of short-form content, people's focus duration has gone down significantly.
We are all aware of this; I think the studies say the average focus time for people is now around seven seconds, which is alarming.
But with these surround lights and the surround encoding of content, people might be able to focus more, as it can reduce distractions and people can actually focus on the screen they are supposed to.
It can maybe dim the lights around the room when you are focusing on, let's say, your coding tasks, or when you are trying to read your book.
So it will not be limited to just videos: based on the content which is on your screen, it can pre-process and determine what you are working on right now and put you in a focus mode.
It could also be that you are watching some calming videos or listening to soothing music, and the lights in your room could react accordingly, making the experience much deeper.
Now, the architecture, which I briefly touched upon in the previous slides; we can go over that.
Basically, the system is based on the content input, what the video or audio content actually is.
Then we have the frame analysis engine, which sifts through this content, decodes it, and breaks it down into what the lights should be.
Then there is the predictive buffer, which is an optimization that predicts what the next few frames are going to look like.
Then there is the node controller, which is responsible for sending out the actual signals to the surround lights.
Those could be strips, bulbs, or even virtual devices for the sake of testing, eventually replaced with actual bulbs, the physical lights which are responsible for showing the actual colors.
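To tie those components together, here is a very small end-to-end sketch of how that pipeline could be wired: content input, frame analysis engine, predictive buffer, node controller. Everything in it, the stub analysis, the virtual bulbs, the synthetic frames, is illustrative rather than the actual system.

```python
def frame_analysis(frame):
    """Stub frame analysis engine: average the whole frame into a single RGB light state."""
    pixels = [p for row in frame for p in row]
    n = len(pixels)
    return tuple(sum(px[i] for px in pixels) // n for i in range(3))

class NodeController:
    """Stub node controller: would normally talk ZigBee/Wi-Fi; here it just records sends."""
    def __init__(self, lights):
        self.lights = lights
        self.sent = []
    def send(self, state):
        for light in self.lights:
            self.sent.append((light, state))

# Content input: two tiny synthetic frames (a dark frame, then a bright red one).
frames = [
    [[(10, 10, 10)] * 4 for _ in range(4)],
    [[(250, 20, 20)] * 4 for _ in range(4)],
]

# Predictive buffer: pre-compute every frame's light state before "playback" starts.
predictive_buffer = [frame_analysis(f) for f in frames]

controller = NodeController(lights=["virtual_bulb_1", "virtual_bulb_2"])
for state in predictive_buffer:   # during playback, just replay the pre-computed states
    controller.send(state)
print(controller.sent)
```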
Some of the major challenges I encountered while working on this particular project: the first was latency reduction.
As I previously mentioned, traditional systems rely completely on camera input; basically their systems watch the video and translate it the way humans do, which I believe is a really unoptimized way of doing this, because we have algorithms which can pre-process the video format, so we need not actually rely on what is on the screen.
We can pre-compute this information, and we can even predict what the information is going to be in the next few seconds.
Also, content and frame rates change so rapidly in today's environment that some of those existing systems just cannot keep up with the way the technology is advancing: movie scenes are much more immersive, with much more color depth, and games have adopted much higher frame rates.
The hardware that would need to be built to keep up with these technologies would be extremely expensive and not consumer friendly.
And the existing systems really have limitations on how much computation they can actually do while keeping the devices affordable, and on how color accurate and brightness accurate they can actually be with the technical limitations they have.
Now, the implementation considerations I made when I wrote up the system.
Basically, the system could scale from a casual TV or home theater viewer who has some surround lights, all the way to commercial movie theaters, where people go to a movie theater and the system actually runs behind the movie and controls the smart lights around you.
Imagine a 4D movie, except the chairs would not be shaking, but the experience would still be much more immersive.
And the best part about this approach is that it can convert any of the existing theaters into these smart theaters, which respond to the content on the screen, with very minimal additional expenditure.
Converting a regular theater into a 4D theater is going to be significantly expensive compared to this approach, in which you have to spend maybe a few hundred dollars to install these smart lights, and all of the internal logic to pre-compute how the signals will be sent to the bulbs takes care of itself.
And this system could easily live inside a gaming environment.
Let's say you have a PS5, an Xbox, or a computer: if this information is pre-encoded, all of these devices have enough hardware to handle it on the device, live, and they could easily handle sending signals to the smart devices in your home.
So this can scale from the entire movie-watching experience to casual and professional gamers who consume live visual content.
So I believe this approach could be the next step in redefining how we consume media, basically video and audio content, because the performance is going to be extremely fast.
With predictive intelligence it can keep up with new content, new games, new movies, and it can easily keep up with new video formats and hardware, which is not really possible with the existing systems.
And with the advancements in surround lights (better bulbs, smart light bands, light strips, light lamps) the experience is going to keep getting even better.
As the system scales, it can handle more and more lights, maybe even hundreds of lights in the future.
And because all this information is pre-computed, and it is computed once, not everyone relying on the system or using the system has to compute it on their own device.
Movie producers and content producers could pre-compute and provide this information.
So aside from the video, just like subtitles or the surround sound input, this could be a new input, and the users' devices could simply consume this input and send it to the smart lights.
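One way to picture that new input track, alongside subtitles and the surround sound track, is as a simple timestamped list of light cues shipped with the content. The layout below is purely an assumption about what such a track could look like, not an existing format from the paper or any streaming platform.

```python
import json

# Hypothetical pre-computed lighting track shipped with the content,
# analogous to a subtitle file: timestamped cues instead of text.
lighting_track = [
    {"t": 12.0,  "rgb": [200, 30, 30],   "brightness": 0.8},  # red/blue chase begins
    {"t": 12.5,  "rgb": [30, 60, 220],   "brightness": 0.8},
    {"t": 47.25, "rgb": [255, 255, 255], "brightness": 1.0},  # bright daylight scene
]

def cues_due(track, playback_time, send_ahead=0.5):
    """Return cues the player device should transmit now, `send_ahead` seconds early."""
    return [cue for cue in track
            if playback_time <= cue["t"] <= playback_time + send_ahead]

print(json.dumps(cues_due(lighting_track, playback_time=11.8), indent=2))
```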
Overall, this would reduce the dependence on consumers having to buy really expensive new hardware.
For example, the current hardware which does something similar, in a very inefficient manner, costs at least 250 dollars, not to mention the cost of buying additional bulbs and additional cameras, which become outdated really fast.
I hope this new system sounds useful, and hopefully it will eventually be adopted by the industry.
I have written a defensive publication for the same approach, with the same name.
It would be a good read if you want to find out more, and I hope you have fun in the rest of the conference.
Thank you so much.