MatE: Material Extraction from Single-Image via Geometric Prior
Abstract
The creation of high-fidelity, physically-based rendering (PBR) materials remains a bottleneck in many graphics pipelines, typically requiring specialized equipment and expert-driven post-processing. To democratize this process, we present MatE, a novel method for generating tileable PBR materials from a single image taken under unconstrained, real-world conditions. Given an image and a user-provided mask, MatE first performs coarse rectification using an estimated depth map as a geometric prior, and then employs a dual-branch diffusion model. Leveraging consistency learned from rotation-aligned and scale-aligned training data, this model further rectifies residual distortions in the coarse result and translates it into a complete set of material maps, including albedo, normal, roughness, and height. Our framework is invariant to the unknown illumination and perspective of the input image, allowing intrinsic material properties to be recovered from casual captures. Through comprehensive experiments on both synthetic and real-world data, we demonstrate the efficacy and robustness of our approach, enabling users to create realistic materials from real-world images.
Method Overview
Overview of our pipeline, $\mathcal{E}$ denotes the pre-trained encoder. (Left) Our model consists of a Reference U-Net that processes the masked input latents to extract conditional KV features and a Main U-Net that denoises the latent material maps ($z_{t^{\prime}}$) guided by the injected KV features. (Right) Visualization of our coarse rectification based on geometric prior.
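To make the KV injection concrete, the minimal PyTorch sketch below shows one way self-attention in the main branch can attend to reference-branch tokens. The module and variable names (KVInjectedAttention, ref_feats) are our own illustration here, a simplified sketch rather than our exact implementation.

import torch
import torch.nn.functional as F
from torch import nn

class KVInjectedAttention(nn.Module):
    """Self-attention whose key/value set is augmented with tokens
    extracted by a reference U-Net (illustrative sketch)."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.heads = heads
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_kv = nn.Linear(dim, dim * 2, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, ref_feats: torch.Tensor) -> torch.Tensor:
        # x:         (B, N, C) Main U-Net tokens at one resolution
        # ref_feats: (B, M, C) matching-resolution Reference U-Net tokens
        b, n, c = x.shape
        q = self.to_q(x)
        # Concatenate main and reference tokens before computing K/V,
        # so the main branch can attend to the masked-input condition.
        kv_src = torch.cat([x, ref_feats], dim=1)
        k, v = self.to_kv(kv_src).chunk(2, dim=-1)

        def split(t):
            return t.view(b, -1, self.heads, c // self.heads).transpose(1, 2)

        out = F.scaled_dot_product_attention(split(q), split(k), split(v))
        out = out.transpose(1, 2).reshape(b, n, c)
        return self.proj(out)

Concatenating the reference tokens into the key/value set leaves the query path untouched, so the Main U-Net keeps its own spatial layout while absorbing the condition.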
Overview of our dataset construction pipeline. We apply thin-plate spline (TPS) transformations to planar meshes to introduce geometric distortions. PBR materials are then mapped onto these meshes using UV coordinates, and HDRIs provide realistic environmental illumination. From randomly sampled camera positions and viewpoints, we use Blender to render synthetic images and their corresponding masks, while saving the camera poses required during training.
To circumvent artifacts such as physically implausible scaling variations (e.g., inset (a)) and structural disruptions caused by discontinuous UVs (e.g., insets (b) and (c)) that arise when mapping materials to complex 3D models, we generate our dataset using topologically simpler, planar meshes distorted via thin-plate spline transformations.
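To sketch what such a distortion looks like in practice, the short NumPy/SciPy example below warps a regular planar grid with a thin-plate spline fitted to a few randomly displaced control points. The grid resolution, control-point count, and displacement scale are illustrative assumptions, not our dataset settings.

import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)

# Regular planar grid serving as mesh vertices (UVs stay untouched).
res = 64
u, v = np.meshgrid(np.linspace(0, 1, res), np.linspace(0, 1, res))
grid = np.stack([u.ravel(), v.ravel()], axis=-1)            # (res*res, 2)

# A handful of control points with small random displacements drives the TPS.
n_ctrl = 16
ctrl_src = rng.uniform(0, 1, size=(n_ctrl, 2))
ctrl_dst = ctrl_src + rng.normal(scale=0.03, size=(n_ctrl, 2))

# Thin-plate-spline deformation field fitted on the control points.
tps = RBFInterpolator(ctrl_src, ctrl_dst - ctrl_src, kernel="thin_plate_spline")
warped = grid + tps(grid)                                   # distorted vertex positions

# 'warped' gives the in-plane positions of a gently bent plane; exporting such
# a mesh to Blender yields the distorted geometry used for rendering.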
Illustration of orientation ambiguity. Given only an input image and mask, the model faces inherent uncertainty regarding the canonical orientation of the material.
Experiment Result Overview
Qualitative comparison of material extraction results with state-of-the-art methods on real-world images. Within each cell, the four maps are (top-left to bottom-right) albedo, normal, roughness, and height. Material Palette does not predict height, so that map is left blank. $^*$ denotes our re-implementation of MaterialPicker, built upon the Video DiT framework of Wan-Video.
Our unprojection generates a coarsely rectified texture that suffers from holes (Column 4). Our interpolation fills these artifacts to produce a dense map (Column 5).
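The sketch below mimics this two-stage behavior: source pixels are forward-splatted onto plane coordinates, leaving holes where no pixel lands, and the holes are then filled by interpolation. The plane-coordinate input and the scipy.interpolate.griddata fill are stand-ins that convey the idea, not our exact unprojection and interpolation.

import numpy as np
from scipy.interpolate import griddata

def rectify_and_fill(image, plane_uv, out_res=256):
    """Splat source pixels onto rectified plane coordinates, then fill holes.

    image:    (H, W, 3) float image in [0, 1]
    plane_uv: (H, W, 2) per-pixel coordinates on the estimated plane
              (e.g., from unprojecting a monocular depth map; assumed given)
    """
    uv = plane_uv.reshape(-1, 2)
    uv = (uv - uv.min(0)) / (uv.max(0) - uv.min(0) + 1e-8)  # normalize to [0, 1]
    xy = np.clip((uv * (out_res - 1)).round().astype(int), 0, out_res - 1)

    tex = np.zeros((out_res, out_res, 3), dtype=np.float32)
    hit = np.zeros((out_res, out_res), dtype=bool)
    tex[xy[:, 1], xy[:, 0]] = image.reshape(-1, 3)  # forward splat (last write wins)
    hit[xy[:, 1], xy[:, 0]] = True                  # covered texels; the rest are holes

    ys, xs = np.nonzero(hit)
    gy, gx = np.mgrid[0:out_res, 0:out_res]
    # Interpolate each channel from covered texels; fall back to nearest
    # neighbor outside the convex hull of the covered set.
    out = np.empty_like(tex)
    for c in range(3):
        lin = griddata((ys, xs), tex[ys, xs, c], (gy, gx), method="linear")
        nn = griddata((ys, xs), tex[ys, xs, c], (gy, gx), method="nearest")
        out[..., c] = np.where(np.isnan(lin), nn, lin)
    return out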
Qualitative results of our geometry-based coarse rectification as a plug-and-play module on Material Palette.
Visualization of generated tileable materials. The figure displays tileable materials generated by noise rolling.
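As a concrete sketch of noise rolling, the loop below circularly shifts the latent by a random offset before each denoising step and shifts it back afterwards, so the wrap-around seam is denoised like any interior region. The unet and scheduler calls follow diffusers-style conventions and are assumptions rather than our exact sampler.

import torch

@torch.no_grad()
def sample_tileable(unet, scheduler, cond, latent_shape, device="cuda"):
    """Diffusion sampling with noise rolling (illustrative sketch)."""
    z = torch.randn(latent_shape, device=device)
    for t in scheduler.timesteps:
        # Random circular shift; torch.roll wraps content around the borders.
        dy = int(torch.randint(0, latent_shape[-2], (1,)))
        dx = int(torch.randint(0, latent_shape[-1], (1,)))
        z = torch.roll(z, shifts=(dy, dx), dims=(-2, -1))
        eps = unet(z, t, cond)                      # predict noise on rolled latent
        z = scheduler.step(eps, t, z).prev_sample   # one denoising step
        z = torch.roll(z, shifts=(-dy, -dx), dims=(-2, -1))  # undo the shift
    return z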
Sensitivity analysis of hyperparameters. We demonstrate that a large depth shift $d_{\text{shift}}$ diminishes perspective correction strength by flattening depth, while a small scale factor $s_{\text{sample}}$ fills projection holes at the cost of image sharpness.
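The depth-flattening effect of $d_{\text{shift}}$ can be seen with a few lines of arithmetic: adding a constant to all depths pushes the near/far depth ratio toward 1, shrinking the perspective scaling that the rectification has to undo (smaller $s_{\text{sample}}$ analogously trades sharpness for fewer holes by packing more source pixels per texel). The numbers below are illustrative, not our chosen hyperparameters.

import numpy as np

depth = np.array([1.0, 1.5, 2.0])   # illustrative relative depths in the masked region

for d_shift in (0.0, 1.0, 5.0):
    d = depth + d_shift
    # Perspective foreshortening between nearest and farthest points scales
    # with the depth ratio; a larger d_shift drives it toward 1 (flatter).
    print(f"d_shift={d_shift}: near/far ratio = {d.max() / d.min():.2f}")
# d_shift=0.0: 2.00, d_shift=1.0: 1.50, d_shift=5.0: 1.17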
Quality comparison of synthetic datasets. We compare materials generated by our pipeline against MaterialPicker. MaterialPicker utilizes complex underlying geometries, which inevitably lead to severe distortions and discontinuities, breaking the structure of the texture. In contrast, our approach effectively preserves the structural integrity and coherence of the patterns.
Qualitative results on real-world images. We present material extraction results from inputs captured by mobile devices in unconstrained environments.
3D Examples Overview
More Results
BibTeX
@ARTICLE{2025arXiv251218312Z,
author = {{Zhang}, Zeyu and {Zhai}, Wei and {Yang}, Jian and {Cao}, Yang},
title = "{MatE: Material Extraction from Single-Image via Geometric Prior}",
journal = {arXiv e-prints},
keywords = {Computer Vision and Pattern Recognition},
year = 2025,
month = dec,
eid = {arXiv:2512.18312},
pages = {arXiv:2512.18312},
archivePrefix = {arXiv},
eprint = {2512.18312},
primaryClass = {cs.CV},
adsurl = {https://ui.adsabs.harvard.edu/abs/2025arXiv251218312Z},
adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}