Google’s video generator is coming to a few more customers — Google Cloud customers, to be precise.
On Tuesday, Google announced that Veo, its AI model that can generate short video clips from images and prompts, will be available in private preview for customers using Vertex AI, Google Cloud’s AI development platform.
Google says that the launch will enable one customer, Quora, to bring Veo to its Poe chatbot platform, and another, Oreo owner Mondelez International, to create marketing content with its agency partners.
“We created Poe to democratize access to the world’s best generative AI models,” Poe product lead Spencer Chan said in a statement. “Through partnerships with leaders like Google, we’re expanding creative possibilities across all AI modalities.”
Flagship generator
Unveiled in April, Veo can generate 1080p clips of animals, objects, and people up to six seconds in length at either 24 or 30 frames per second. Google says that Veo is able to capture different visual and cinematic styles, including shots of landscapes and time lapses, and make edits to already-generated footage.
Why the long wait for the API? “Enterprise readiness,” says Warren Barkley, senior director of product management at Google Cloud.
“Since Veo was announced, our teams have augmented, hardened, and improved the model for enterprise customers on Vertex AI,” he said. “As of today, you can create high definition videos in 720p, in 16:9 landscape or 9:16 portrait aspect ratios. Similar to how we have improved capabilities of other models such as Gemini on Vertex AI, we will continue to do this for Veo.”
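Since the Vertex AI preview is private, the actual API schema isn't public. As a rough illustration only, a text-to-video request might look something like the sketch below; the field names (`aspectRatio`, `resolution`) and parameter values are assumptions based on the capabilities Barkley describes, not a documented interface.

```python
import json

def build_veo_request(prompt: str, aspect_ratio: str = "16:9") -> dict:
    """Build a hypothetical Veo-on-Vertex-AI request body.

    Field names are illustrative assumptions -- the private-preview
    API's real schema has not been published.
    """
    # Google cites two supported aspect ratios: 16:9 landscape, 9:16 portrait.
    if aspect_ratio not in ("16:9", "9:16"):
        raise ValueError("Veo supports 16:9 landscape or 9:16 portrait")
    return {
        "instances": [{"prompt": prompt}],
        "parameters": {
            "aspectRatio": aspect_ratio,
            "resolution": "720p",  # per Google, HD output at launch
        },
    }

body = build_veo_request("a timelapse of clouds over a mountain lake", "9:16")
print(json.dumps(body, indent=2))
```

In practice, a payload along these lines would be sent through Vertex AI's prediction endpoints with standard Google Cloud authentication; the sketch only shows how the stated constraints (two aspect ratios, 720p) might surface as request parameters.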
Google says Veo has a reasonable understanding of VFX cues in prompts (think "enormous explosion") and something of a grasp of physics, including fluid dynamics. The model also supports masked editing for changes to specific regions of a video, and is technically capable of stringing together footage into longer projects.
In these ways, Veo is competitive with today’s leading video-generating models — not only OpenAI’s Sora, but models from Adobe, Runway, Luma, Meta, and others.
That's not to suggest Veo is perfect. Reflecting the limitations of today's AI, objects in Veo's videos can disappear and reappear without explanation or consistency. And Veo often gets its physics wrong; cars, for example, will inexplicably, impossibly reverse on a dime.
Training and risks
Veo was trained on lots of footage. That’s generally how it works with generative AI models: provided with example after example of some form of data, the models pick up on patterns in the data that enable them to generate new data — videos, in Veo’s case.
Google, like many of its AI rivals, won’t say exactly where it sources the data to train its generative models. Asked about Veo specifically, Barkley would only say the model “may” be trained on “some” YouTube content “in accordance with [Google’s] agreement with YouTube creators.” (Google’s parent company, Alphabet, owns YouTube.)
“Veo has been trained on a variety of high-quality, video-description data sets that are heavily curated for safety and security,” he added. “Google’s foundational models are trained primarily on publicly available sources.”
Reporting by The New York Times in April revealed that Google broadened its terms of service last year in part to allow the company to tap more data to train its AI models. Under the old ToS, it wasn’t clear whether Google could use YouTube data to build products beyond the video platform. Not so under the new terms, which loosen the reins considerably.
While Google hosts tools to let webmasters block the company’s bots from scraping training data from their websites, it doesn’t offer a mechanism to let creators remove their works from its existing training sets. Google maintains that training models using publicly available data is fair use, meaning the company believes it isn’t obligated to ask permission from — or compensate — data owners. (Google says it doesn’t use customer data to train its models, however.)
Because of the way today's generative models learn, they carry certain risks, such as regurgitation, in which a model generates a near-mirror copy of its training data. Tools like Runway's have been found to spit out stills substantially similar to frames from copyrighted videos, creating a potential legal minefield for users of the tools.
Google’s solution is prompt-level filters for Veo, including for violent and explicit content. In the event those fail, the company says its indemnity policy provides a defense for eligible Veo users against allegations of copyright infringement.
“We plan to indemnify Veo outputs on Vertex AI when it becomes generally available,” Barkley said.
Veo everywhere
Over the past few months, Google has slowly built Veo into more of its apps and services as it works to polish the model.
In May, Google brought Veo to Google Labs, its early access program, for select testers. And in September, Google announced a Veo integration for YouTube Shorts, YouTube’s short-form video format, to allow creators to generate backgrounds and six-second video clips.
What about the deepfake risks of all this, you might be wondering? Google says it's using its proprietary watermarking technology, SynthID, to embed invisible markers in the frames Veo generates. Granted, SynthID isn't foolproof against edits, and Google hasn't made its detection tooling available to third parties.
These may be moot points if Veo doesn’t gain meaningful traction. On the partnerships front, Google has ceded ground to generative AI rivals, who’ve moved quickly to woo producers, studios, and creative agencies with their tools. Runway recently signed a deal with Lionsgate to train a custom model on the studio’s movie catalog, and OpenAI teamed up with brands and independent directors to showcase Sora’s potential.
Google at one point said it was exploring Veo’s applications in collaboration with artists including Donald Glover (AKA Childish Gambino). The company gave no update on those outreach efforts today.
Google’s pitch for Veo — a way to reduce costs and quickly iterate on video content — runs the risk of alienating creatives. A 2024 study commissioned by the Animation Guild, a union representing Hollywood animators and cartoonists, estimates that more than 100,000 U.S.-based film, television, and animation jobs will be disrupted by AI by 2026.
That might explain Google’s cautious, “slow and steady” approach. When asked, Barkley wouldn’t give an ETA for Veo’s general availability in Vertex, nor would he say when Veo might come to additional Google platforms and services.
“We typically release products in preview first, as it allows us to get real-world feedback from a select group of our enterprise customers before it becomes generally available for wider use,” he said. “This helps improve functionality and ensure the product meets the needs of our customers.”
In a related announcement today, Google said that its flagship image generator, Imagen 3, is now available for all Vertex AI customers without a waitlist. It’s gained new customization and image editing features — but these are gated behind a separate waitlist for now.