Fig. 1. We present Human-LRM, a template-free large reconstruction model for feed-forward 3D human digitalization from a single image. Trained on a large dataset comprising multi-view captures and 3D scans, our model generalizes to a broader range of scenarios than prior methods. Guided by dense novel views generated by a conditional diffusion model, it produces high-fidelity full-body humans from a single image.
Fig. 2. Comparison of Human-LRM with SoTA single-view human reconstruction methods on in-the-wild images. Compared to volumetric reconstruction methods, our method achieves superior generalizability to challenging poses (a) and higher-fidelity appearance prediction (b). Compared to generalizable human NeRF methods (c), our results have much better geometry quality.
Fig. 4: Geometry and appearance comparison with PIFu, GTA, and SIFU on in-the-wild images.
Fig. 5: Comparison of our single-view reconstruction model to previous volumetric reconstruction methods: PIFu, PIFuHD, ECON, LRM, GTA, and SIFU. All models are trained on THuman 2.0. For each example, we show the geometry (colored by vertex normals) from 4 views.
Fig. 6: Novel-view rendering results on HuMMan v1.0.
Fig. 8: Example novel-view results after each stage. Results for Stages I and III are mesh renderings; results for Stage II are diffusion model outputs (i.e., images).
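As a rough illustration of how the three stages in Fig. 8 fit together, below is a minimal Python sketch of the inference flow, assuming a coarse feed-forward reconstruction stage (Stage I), a conditional novel-view diffusion stage (Stage II), and a dense-view reconstruction stage (Stage III). All module and method names (coarse_lrm, view_diffusion, dense_lrm, render) are hypothetical placeholders, not the actual implementation.

import torch

def reconstruct_human(image: torch.Tensor,
                      coarse_lrm,      # Stage I: feed-forward reconstruction model (hypothetical)
                      view_diffusion,  # Stage II: conditional novel-view diffusion model (hypothetical)
                      dense_lrm,       # Stage III: reconstruction guided by dense views (hypothetical)
                      camera_poses):
    # Stage I: predict a coarse 3D representation from the single input image
    # and render it from the target camera poses (the Stage I mesh renderings).
    coarse_3d = coarse_lrm(image)
    coarse_renders = coarse_3d.render(camera_poses)

    # Stage II: the conditional diffusion model generates dense novel views,
    # conditioned on the input image and the coarse renderings
    # (the Stage II image outputs).
    novel_views = view_diffusion(image, coarse_renders, camera_poses)

    # Stage III: feed the dense novel views back into a reconstruction model
    # to obtain the final high-fidelity geometry and appearance.
    final_3d = dense_lrm(novel_views, camera_poses)
    return final_3d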
Figure S2. Depth comparison to HDNet, ZoeDepth, and DPT. Red indicates regions close to the camera.
Figure S3. Normal comparison to HDNet.
@article{humanlrm2023,
author = {Zhenzhen Weng and Jingyuan Liu and Hao Tan and Zhan Xu and Yang Zhou and Serena Yeung-Levy and Jimei Yang},
title = {Template-Free Single-View 3D Human Digitalization with Diffusion-Guided LRM},
journal = {Preprint},
year = {2023},
}