Google DeepMind, the new AI team formed by the merger of Google’s DeepMind and Google Brain, has shared details about how one of its visual language models (VLMs) is being used to generate descriptions for YouTube Shorts. Because Shorts are created in just a few minutes and often lack descriptions and helpful titles, they are harder to find through search, and this innovation can help with discoverability.
Flamingo, the AI model developed by Google DeepMind, generates descriptions by analyzing the initial frames of a video to explain what is going on. For instance, it can describe a dog balancing a stack of crackers on its head. The generated text is stored as metadata to better categorize videos and match them to viewer search queries. The metadata is not user-facing, but it is accurate and aligns with Google’s responsibility standards. Google DeepMind’s chief business officer, Colin Murdoch, explained that creators often skip adding metadata to Shorts because the creation process is more streamlined than it is for longer-form video. Moreover, Shorts are mostly watched in a feed where people simply swipe to the next video rather than actively browsing, so there is less incentive to add metadata.
Flamingo solves this problem by understanding the videos and generating descriptive text. That text feeds the systems that already rely on such metadata, helping them understand videos more effectively and match them to users’ search queries. Flamingo is already applying auto-generated descriptions to new Shorts uploads, and it has done so for a large corpus of existing videos, including the most-viewed ones.
Todd Sherman, the director of product management for Shorts, noted that for a longer-form video a creator might spend hours on pre-production, filming, and editing, so adding metadata is a relatively small part of the process. And because viewers often choose longer-form videos based on the title and thumbnail, creators of those videos have an incentive to add metadata that helps with discoverability. Still, applying something like Flamingo to longer-form YouTube videos does not seem outside the realm of possibility, given Google’s major push to infuse AI into nearly everything it offers. That could have a significant impact on YouTube search in the future.
Google DeepMind’s Flamingo model is a significant improvement for YouTube Shorts, which are becoming increasingly popular. It should improve discoverability and make it easier for users to find the videos they are interested in. However, Google must ensure that the generated descriptions remain accurate and aligned with its responsibility standards: any mistakes from Flamingo could harm creators and open Google up to significant criticism.
In short, Flamingo is an exciting innovation for discoverability on YouTube Shorts. It is already being applied to new uploads and a large corpus of existing videos, including the most-viewed ones. And while it is not currently applied to longer-form YouTube videos, it could be in the future, given Google’s broader push to infuse AI into nearly everything it offers.