Google I/O 2025 unveiled new virtual try-on tools for fashion and makeup that blend cutting-edge AI and AR technologies. These tools allow users to virtually try on clothing and beauty products in Google’s platforms, powered by advanced machine learning models and integrated with Google’s Shopping services. In this post, we’ll dive into the technical architecture of these tools – including the custom image-generation model for apparel, the AR makeup try-on system, and how they compare to existing platforms like Apple’s ARKit and Meta’s (now-discontinued) Spark AR. We’ll also highlight real-time rendering improvements, personalization features, cross-platform capabilities, key quotes from the I/O keynote, and provide links to relevant SDKs, APIs, and documentation for developers.
At I/O 2025, Google introduced a virtual dressing room experience that is “the first of its kind working at this scale”, enabling shoppers to “try on billions of items of clothing” using their own photos. Instead of relying on preset models, users can upload a full-body photo of themselves and see AI-generated images of how clothes would look on them “within moments”. For beauty products, Google expanded its AR Beauty features to let users try on multiple makeup products simultaneously, inspired by trending looks and celebrities. As Google’s VP Vidhya Srinivasan emphasized during the keynote, “Our try-on experience works with your photo. It’s not some pre-captured image or a model that doesn’t look like you.” This shift toward personal photos and full makeup looks marks a significant advancement in personalization and immersion for online shopping.
Under the hood, these try-on tools leverage a combination of generative AI models, Google’s vast Shopping Graph data, and AR face and body tracking technologies. In the sections below, we explore the fashion and makeup try-on systems separately, then compare them to other platforms and discuss performance and developer resources.
Google’s virtual try-on for apparel is powered by a custom generative image model for fashion. This model was developed by Google’s Shopping AI research team and builds on a diffusion-based architecture introduced in 2023. The generative model is designed to understand how clothing items interact with the human body, capturing subtleties like draping, stretching, and wrinkles on different body shapes. In essence, it can render a garment onto a person’s image from scratch, rather than simply overlaying or warping the clothing. Google says the feature uses an AI model that “understands the human body and nuances of clothing — like how different materials fold, stretch and drape on different bodies.” This results in highly realistic try-on images that reflect both the garment’s details and the user’s pose.
Architecture – “TryOnDiffusion” Model: Google’s research paper TryOnDiffusion: A Tale of Two U-Nets (CVPR 2023) details the core of this technology. The model uses a parallel dual-U-Net diffusion architecture with a cross-attention mechanism to merge a person image and a clothing image into a new composite. In practice, the pipeline involves several steps:
- Preprocessing
The user’s full-body photo is processed to segment out the person and remove their current outfit, creating a “clothing-agnostic” base image (essentially the person’s shape and pose in plain form). Similarly, the clothing product image (e.g. a blouse or pants) is segmented to isolate just the garment. Both the person and garment poses are also computed (using pose estimation) to guide the alignment.
- Dual U-Net Diffusion
The model runs a diffusion process conditioned on the two inputs. One U-Net branch processes the person (with their pose and segmented body), and another U-Net processes the garment image. Through cross-attention, the garment features are implicitly “warped” and blended into the person’s image during the generative process. This approach preserves garment details (patterns, texture, etc.) while adapting the garment to the person’s shape and pose, achieving a more realistic result than traditional geometry-based warping. (A minimal sketch of this cross-attention conditioning appears after this list.)
- Multi-Stage & Super-Resolution
The generation is done in a coarse-to-fine manner – for example, an initial 128×128 image, then refined at 256×256 with the combined U-Nets, and finally upscaled (via a super-resolution diffusion model) to a high-resolution output (Google’s production system can produce images up to 1024×1024).
- Output
The end result is a photorealistic image of the user wearing the selected clothing item. According to Google, this diffusion-based approach “produces life-like portrayals of clothing on people” and handles a wide range of body types and poses that prior methods struggled with.
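To make the conditioning step more concrete, here is a minimal PyTorch sketch of the idea at the heart of the dual U-Net design: features from the person branch act as attention queries, while features from the garment branch supply keys and values, so garment detail is “pulled” into the person representation rather than geometrically warped. This is an illustration of the mechanism only – the module names, channel sizes, and toy encoders below are invented and are not Google’s actual TryOnDiffusion code.

```python
# Illustrative sketch only: the cross-attention conditioning idea behind a
# dual-branch ("two U-Net") try-on model. Shapes, channel counts, and module
# names are invented for clarity; this is not Google's TryOnDiffusion code.
import torch
import torch.nn as nn

class GarmentCrossAttention(nn.Module):
    """Fuses garment features into person features via cross-attention."""
    def __init__(self, channels: int = 64, heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, person_feat: torch.Tensor, garment_feat: torch.Tensor):
        # person_feat, garment_feat: (B, C, H, W) feature maps from the two branches
        b, c, h, w = person_feat.shape
        q = person_feat.flatten(2).transpose(1, 2)    # (B, H*W, C) queries: person
        kv = garment_feat.flatten(2).transpose(1, 2)  # (B, H*W, C) keys/values: garment
        fused, _ = self.attn(self.norm(q), kv, kv)    # garment detail attends into the person
        fused = fused.transpose(1, 2).reshape(b, c, h, w)
        return person_feat + fused                    # residual connection

# Toy usage: two small conv encoders stand in for the person/garment U-Net branches.
person_enc = nn.Conv2d(3, 64, 3, padding=1)   # would take the clothing-agnostic person image + pose
garment_enc = nn.Conv2d(3, 64, 3, padding=1)  # would take the segmented garment image + pose
fuse = GarmentCrossAttention()

person_img = torch.randn(1, 3, 32, 32)   # stand-in for a noisy latent of the person
garment_img = torch.randn(1, 3, 32, 32)  # stand-in for the garment image
out = fuse(person_enc(person_img), garment_enc(garment_img))
print(out.shape)  # torch.Size([1, 64, 32, 32])
```

In the real pipeline this kind of fusion happens repeatedly inside the denoising U-Nets at multiple resolutions, which is why patterns and textures survive large changes in pose.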
This architecture is a significant leap over earlier virtual try-on techniques. Traditional methods often used 2D image warping (cut-and-paste of clothing textures), which led to unnatural results such as misaligned patterns and incorrect folds. By generating every pixel with a learned model, Google’s approach achieves much higher realism. In tests, TryOnDiffusion achieved state-of-the-art quality, preserving both the person’s features and the garment’s details even across different poses. The model was trained on a large dataset of people and clothing images, which likely included diverse body shapes and garment types to generalize well (Google referenced using the Monk Skin Tone scale for diversity in their earlier model version).
The try-on service is deeply integrated into Google’s Shopping Graph and Search. When a user searches for apparel (shirts, dresses, pants, etc.) on Google and selects an item with a “Try On” badge, the system fetches the product’s image and feeds it, along with the user’s uploaded photo, into the generation pipeline. Because the Shopping Graph indexes billions of product listings, the model had to be efficient enough to scale across many brands and items. Google’s Lilian Rincon noted that this state-of-the-art technology is “the first of its kind working at this scale”, capable of handling “billions of apparel listings” in this virtual try-on system. The heavy lifting of the image generation happens on Google’s servers (leveraging TPU or GPU farms), with the result delivered to the user as an image within seconds. In a live I/O demo, the try-on result was generated in near real time after a photo upload, showcasing the real-time rendering improvements in this iteration. (It’s worth noting that users are advised to upload a clear, well-lit, full-body photo, and for best results wear form-fitting clothes – likely to help the AI accurately map the new garment onto their pose.)
As of I/O 2025, this powerful fashion try-on capability is available to end-users via Google Search (currently through Search Labs in the U.S.), but not yet exposed as a standalone API for third-party developers. However, the research underpinning it is public – developers and researchers can refer to Google’s TryOnDiffusion paper and even unofficial open-source implementations on GitHub (e.g. tryonlabs/tryondiffusion). It’s conceivable that Google could integrate this model into a cloud API or an SDK in the future (similar to how Google offers Vision AI services), but for now there’s no out-of-the-box “virtual dressing room” API from Google. That said, retailers can take advantage of Google’s integration by ensuring their product images are in Google’s Shopping Graph, thus automatically enabling the try-on for their items. There are also startups like Vybe, Daydream, and others working on virtual try-on APIs, indicating a growing ecosystem for developers who want to add similar functionality to apps. If you’re eager to experiment, you could explore academic code or use computer vision libraries to implement a basic try-on (for instance, using pose estimation + image segmentation to overlay clothes), but achieving Google’s level of quality would require training on large datasets with a diffusion model.
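For illustration, here is a rough sketch of that basic approach: MediaPipe pose estimation locates the shoulders, and a flat garment cutout is scaled and alpha-blended onto the photo. The file names are hypothetical placeholders, and the result is a naive 2D overlay – useful as a baseline, but nowhere near the realism of a trained diffusion model.

```python
# Naive 2D "paper doll" overlay using pose landmarks: a far cry from generative
# try-on, but a useful baseline. File paths are hypothetical placeholders, and
# the code assumes a person is detected and the garment PNG has an alpha channel.
import cv2
import mediapipe as mp
import numpy as np

mp_pose = mp.solutions.pose

person = cv2.imread("person_full_body.jpg")                    # hypothetical input photo
garment = cv2.imread("tshirt_flat.png", cv2.IMREAD_UNCHANGED)  # RGBA product cutout

with mp_pose.Pose(static_image_mode=True) as pose:
    results = pose.process(cv2.cvtColor(person, cv2.COLOR_BGR2RGB))

h, w = person.shape[:2]
lm = results.pose_landmarks.landmark
l_sh = lm[mp_pose.PoseLandmark.LEFT_SHOULDER]
r_sh = lm[mp_pose.PoseLandmark.RIGHT_SHOULDER]

# Scale the garment to roughly 1.4x the shoulder span and anchor it at the shoulders.
shoulder_px = abs(l_sh.x - r_sh.x) * w
scale = (shoulder_px * 1.4) / garment.shape[1]
gw, gh = int(garment.shape[1] * scale), int(garment.shape[0] * scale)
garment = cv2.resize(garment, (gw, gh))

x0 = int(min(l_sh.x, r_sh.x) * w - 0.2 * shoulder_px)
y0 = int(min(l_sh.y, r_sh.y) * h - 0.1 * gh)

# Alpha-blend the garment onto the photo, clipped to the image bounds.
x1, y1 = min(x0 + gw, w), min(y0 + gh, h)
x0c, y0c = max(x0, 0), max(y0, 0)
patch = garment[y0c - y0 : y1 - y0, x0c - x0 : x1 - x0]
region = person[y0c:y1, x0c:x1]
alpha = patch[:, :, 3:4].astype(np.float32) / 255.0
person[y0c:y1, x0c:x1] = (alpha * patch[:, :, :3] + (1 - alpha) * region).astype(np.uint8)

cv2.imwrite("naive_tryon.jpg", person)
```

A segmentation mask (e.g. MediaPipe Selfie Segmentation) could further restrict the overlay to the person’s silhouette, but realistic draping and folds still require a generative model.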
For those looking to explore or implement similar technologies, here are some useful resources:
- TryOnDiffusion Research Paper (Google, 2023):
“TryOnDiffusion: A Tale of Two U-Nets” – Introduces Google’s diffusion-based virtual try-on model for apparel (arXiv preprint and project page). This is a must-read for understanding the architecture and could inspire custom implementations.
- ARCore Augmented Faces (Google):
The ARCore developer guide for face effects explains how to get a 3D face mesh and overlay textures for things like makeup. Available for Android, iOS, and Unity (via AR Foundation).
- Apple ARKit Face Tracking:
Apple’s ARKit documentation provides details on ARFaceAnchor and using SceneKit/RealityKit to render content on faces. Great for iOS developers making AR try-on apps.
- MediaPipe Solutions (Google):
MediaPipe offers pre-built ML models for face mesh, iris tracking, hair segmentation, and more, which can be used in Python, on mobile, or in the browser. Check out MediaPipe Face Mesh and Selfie Segmentation in the MediaPipe documentation for building custom AR filters (a face-mesh makeup sketch appears after this list).
- Snap Lens Studio:
Snap’s Lens Studio is a creative tool (not an open SDK) for building AR effects, and documentation on makeup and face effects is available on Snap’s developer site. Snap’s Camera Kit SDK can also embed Snap’s AR into your own app (if you’re an approved Snap partner developer).
- WebAR Libraries:
If you prefer the web, look into libraries like MindAR (mind-ar-js) or three.js combined with a face-tracking model, and projects like AR.js. The WebXR Device API can also power AR experiences in the browser, though high-level face tracking is usually handled by additional JS libraries.
- Google Search Labs – Try-On Experiment:
While not a developer tool, if you’re in the U.S. you can sign up for Search Labs and try the Google clothing try-on yourself. It’s useful to see the latency and quality first-hand when evaluating the tech.
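To ground the makeup side of these resources, the sketch below uses MediaPipe Face Mesh (referenced in the list above) to apply a simple lip tint: the lip landmark indices from FACEMESH_LIPS are filled and a color is alpha-blended over the region. The input file name and tint values are placeholders, and real AR makeup pipelines add per-pixel shading, lighting estimation, and temporal smoothing on top of this.

```python
# Minimal lip-tint demo with MediaPipe Face Mesh: fill the lip region defined by
# the FACEMESH_LIPS connections and blend a color over it. Illustrative only;
# assumes one face is detected in the (placeholder) input photo.
import cv2
import mediapipe as mp
import numpy as np

mp_face_mesh = mp.solutions.face_mesh

img = cv2.imread("selfie.jpg")                 # hypothetical input photo
h, w = img.shape[:2]

with mp_face_mesh.FaceMesh(static_image_mode=True, refine_landmarks=True) as fm:
    results = fm.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

face = results.multi_face_landmarks[0]          # assume one face was detected

# FACEMESH_LIPS is a set of (start, end) landmark-index pairs outlining the lips.
lip_idx = {i for pair in mp_face_mesh.FACEMESH_LIPS for i in pair}
pts = np.array(
    [(int(face.landmark[i].x * w), int(face.landmark[i].y * h)) for i in lip_idx],
    dtype=np.int32,
)

# Fill the convex hull of the lip landmarks and blend a tint color into it.
mask = np.zeros((h, w), dtype=np.uint8)
cv2.fillConvexPoly(mask, cv2.convexHull(pts), 255)
mask = cv2.GaussianBlur(mask, (15, 15), 0)      # soften the edge for a more natural look

tint = np.zeros_like(img)
tint[:, :] = (80, 40, 180)                      # BGR lipstick shade (placeholder)
alpha = (mask.astype(np.float32) / 255.0 * 0.5)[..., None]  # 50% max opacity
out = (alpha * tint + (1 - alpha) * img).astype(np.uint8)
cv2.imwrite("lip_tint.jpg", out)
```

The same pattern generalizes to other products: swap FACEMESH_LIPS for eye or cheek landmark sets, or feed the landmarks into a 3D renderer for proper lighting.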
Google’s I/O 2025 virtual try-on announcements showcase a fusion of computer vision, graphics, and commerce. For developers, they offer a glimpse into the future of shopping interfaces – one where AI and AR work hand-in-hand. Whether you aim to use Google’s offerings or build your own, the key takeaways are: invest in realistic rendering (be it via generative AI or careful 3D design), make it personalized, keep it performant in real time, and integrate it with the user journey (not just as a gimmick but as a functional step toward a goal, like buying an item or choosing a style). The tools and models unveiled at I/O 2025 set a new bar for what’s possible, and we can expect rapid evolution in this space as these technologies become more accessible to the developer community.