Weicai Ye 叶伟才

I received my Ph.D. from ZJU3DV, the State Key Lab of CAD&CG, Zhejiang University, in 2024, advised by Prof. Hujun Bao and Prof. Guofeng Zhang. During my Ph.D., I also interned at Shanghai AI Laboratory, working with Dr. Tong He, Prof. Wanli Ouyang, and Prof. Yu Qiao.

Previously, I was a visiting researcher in the Computer Vision and Geometry Group (CVG) at ETH Zurich, advised by Prof. Marc Pollefeys.

My research goal is to enable computers (robots) to perceive, localize, reconstruct, reason about, and interact with the real world as humans do, i.e., to move toward AGI. I'm interested in 3D Vision Foundation Models, World Models, Physical World Simulators, and Embodied AI, especially correspondence, 3D/4D reconstruction, rendering, generation, and robotic manipulation.

Email  /  CV  /  Google Scholar  /  Twitter  /  GitHub

profile photo
Research

* denotes equal contribution, † denotes corresponding author, ‡ denotes project lead. Representative works are highlighted.

Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction
Junyi Chen, Di Huang, Weicai Ye, Wanli Ouyang, Tong He
arXiv 2024, Under Review
project page / arXiv / code / framework

Proposed a novel auto-regressive framework that jointly addresses spatial localization and view prediction.

HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction
Shengji Tang, Weicai Ye†, Peng Ye, Weihao Lin, Yang Zhou, Tao Chen, Wanli Ouyang
arXiv 2024, Under Review
project page / arXiv / code

Proposed a hierarchical approach to generalizable 3D Gaussian Splatting that constructs hierarchical 3D Gaussians via a coarse-to-fine strategy, significantly enhancing reconstruction quality and cross-dataset generalization.

DGTR: Distributed Gaussian Turbo-Reconstruction for Sparse-View Vast Scenes
Hao Li, Yuanyuan Gao, Haosong Peng, Chenming Wu, Weicai Ye, Yufeng Zhan, Chen Zhao, Dingwen Zhang, Jingdong Wang, Junwei Han
arXiv 2024, Under Review
project page / arXiv / code / framework

Proposed a novel distributed framework for efficient Gaussian reconstruction of sparse-view vast scenes, leveraging a feed-forward Gaussian model for fast inference and a global alignment algorithm to ensure geometric consistency.

DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion
Weicai Ye*‡, Chenhao Ji*, Zheng Chen, Junyao Gao, Xiaoshui Huang, Song-Hai Zhang, Wanli Ouyang, Tong He†, Cairong Zhao†, Guofeng Zhang†
NeurIPS, 2024, CCF-A
project page / arXiv / code

Proposed scalable and consistent text-to-panorama generation with spherical epipolar-aware diffusion. Established large-scale panoramic video-text datasets with corresponding depth and camera poses. Achieved long-term, consistent, and diverse panoramic scene generation given unseen text and camera poses with SOTA performance.

MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers
Yiwen Chen, Tong He, Di Huang, Weicai Ye, Sijin Chen, Jiaxiang Tang, Xin Chen, Zhongang Cai, Lei Yang, Gang Yu, Guosheng Lin, Chi Zhang
arXiv 2024, Under Review
project page / arXiv / code / huggingface

MeshAnything mimics human artists in extracting meshes from any 3D representation. It can be combined with various 3D asset production pipelines, such as 3D reconstruction and generation, to convert their results into artist-created meshes that can be seamlessly applied in the 3D industry.

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers
Peng Gao, Le Zhuo, Dongyang Liu, Ruoyi Du, Xu Luo, Longtian Qiu, Yuhang Zhang, Rongjie Huang, Shijie Geng, Renrui Zhang, Junlin Xie, Wenqi Shao, Zhengkai Jiang, Tianshuo Yang, Weicai Ye, Tong He, Jingwen He, Yu Qiao, Hongsheng Li
arXiv 2024, Under Review
project page / arXiv / code / demo / huggingface

Proposed a flow-based large diffusion transformer foundation model for transforming text into any modality (image, video, 3D, audio, music, etc.), resolution, and duration.

DATAP-SfM: Dynamic-Aware Tracking Any Point for Robust Dense Structure from Motion in the Wild
Weicai Ye‡, Xinyu Chen, Ruohao Zhan, Di Huang, Xiaoshui Huang, Haoyi Zhu, Hujun Bao, Wanli Ouyang, Tong He†, Guofeng Zhang†
arXiv 2024 (coming soon), Under Review
project page / arXiv / code

Proposed a concise, elegant, and robust SfM pipeline with point tracking for smooth camera trajectories and dense point clouds from casual monocular videos.

FedSurfGS: Scalable 3D Surface Gaussian Splatting with Federated Learning for Large Scene Reconstruction
Weicai Ye, Hao Li, Yuanyuan Gao, Yalun Dai, Junyi Chen, Nanqing Dong, Dingwen Zhang, Hujun Bao, Wanli Ouyang, Yu Qiao, Tong He, Guofeng Zhang
arXiv 2024, coming soon
project page / arXiv / code

First cloud-edge-device hierarchical framework with federated learning for large-scale, high-fidelity surface reconstruction in a distributed manner, achieving a balance between high-precision reconstruction and low memory cost.

PGSR: Planar-based Gaussian Splatting for Efficient and High-Fidelity Surface Reconstruction
Danpeng Chen, Hai Li, Weicai Ye, Yifan Wang, Weijian Xie, Shangjin Zhai, Nan Wang, Haomin Liu, Hujun Bao, Guofeng Zhang
arXiv 2024, Under Review
project page / arXiv / code / Result1 / Result2

Proposed a photorealistic rendering and efficient high-fidelity surface reconstruction model without any pretrained priors, outperforming 3DGS-based (SuGaR, 2DGS, Gaussian Opacity Fields, etc.) and SDF-based methods on T&T, DTU, etc., with much faster training (ours: only 1 hour vs. 128+ hours for Neuralangelo).

DynaSurfGS: Dynamic Surface Reconstruction with Planar-based Gaussian Splatting
Weiwei Cai, Weicai Ye†‡, Peng Ye, Tong He, Tao Chen†
arXiv 2024, Under Review
project page / arXiv / code

Based on PGSR, proposed the DynaSurfGS framework, which enables real-time photorealistic rendering and high-fidelity dynamic surface reconstruction, achieving smooth surfaces with meticulous geometry.

GigaGS: Scaling up Planar-Based 3D Gaussians for Large Scene Surface Reconstruction
Junyi Chen*, Weicai Ye*†‡, Yifan Wang, Danpeng Chen, Di Huang, Wanli Ouyang, Guofeng Zhang, Yu Qiao, Tong He†
arXiv 2024, Under Review
project page / arXiv / code / More results

Based on PGSR, proposed photorealistic rendering and efficient high-fidelity large-scale surface reconstruction in a divide-and-conquer manner with an LOD structure, outperforming Neuralangelo.

StreetSurfGS: Scalable Large Scene Surface Reconstruction with Gaussian Splatting for Urban Street Scenes
Xiao Cui*, Weicai Ye*†‡, Yifan Wang, Guofeng Zhang, Wengang Zhou, Tong He†, Houqiang Li
arXiv 2024, Under Review
project page / arXiv / code

Based on PGSR, proposed photorealistic rendering and efficient high-fidelity large-scale surface reconstruction for urban street scenes with free camera trajectories, outperforming F2-NeRF.

NeuRodin: A Two-stage Framework for High-Fidelity Neural Surface Reconstruction
Yifan Wang, Di Huang, Weicai Ye†, Guofeng Zhang, Wanli Ouyang, Tong He†
NeurIPS, 2024, CCF-A
project page / arXiv / code / result1 / result2

Identified two main factors in SDF-based approaches that degrade surface quality and proposed a two-stage neural surface reconstruction framework without any pretrained priors, achieving faster training (only 18 GPU hours) and high-fidelity surface reconstruction with fine-grained details, outperforming Neuralangelo on T&T, ScanNet++, etc.

ND-SDF: Learning Normal Deflection Fields for High-Fidelity Indoor Reconstruction
Ziyu Tang, Weicai Ye†, Yifan Wang, Di Huang, Hujun Bao, Tong He†, Guofeng Zhang
arXiv 2024, Under Review
project page / arXiv / code / More results

Proposed Normal Deflection fields to represent the angle deviation between the scene normals and the prior normals, achieving smooth surfaces with fine-grained structures, outperforming MonoSDF.

D3FlowSLAM: Self-Supervised Dynamic SLAM with Flow Motion Decomposition and DINO Guidance
Xingyuan Yu*, Weicai Ye*‡, Xiyue Guo, Yuhang Ming, Jinyu Li, Hujun Bao, Zhaopeng Cui, Guofeng Zhang†
arXiv 2024, Under Review
project page / arXiv / code

Proposed self-supervised dynamic SLAM with Flow Motion Decomposition and DINO Guidance, outperforming DROID-SLAM.

Point Mamba: A Novel Point Cloud Backbone Based on State Space Model with Octree-Based Ordering Strategy
Jiuming Liu, Ruiji Yu, Yian Wang, Yu Zheng, Tianchen Deng, Weicai Ye, Hesheng Wang
arXiv 2024, Under Review
project page / arXiv / code

Proposed an efficient point cloud backbone based on the Mamba state space model and achieved SOTA performance.

Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning
Haoyi Zhu, Yating Wang, Di Huang, Weicai Ye, Wanli Ouyang, Tong He
NeurIPS 2024 Datasets and Benchmarks Track, CCF-A
project page / arXiv / code

Showed that point cloud observations, i.e., explicit 3D information, matter for robot learning. With point clouds as input, the agent achieved higher mean success rates and exhibited better generalization ability.

NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth Supervision for Indoor Multi-View 3D Detection
Chenxi Huang, Yuenan Hou, Weicai Ye, Di Huang, Xiaoshui Huang, Binbin Lin, Deng Cai, Wanli Ouyang
arXiv 2024, Under Review
project page / arXiv / code / Video

Incorporating semantic cues and perspective-aware depth supervision, NeRF-Det++ outperforms NeRF-Det by +1.9% in mAP@0.25 and +3.5% in mAP@0.50 on ScanNetV2.

DifFlow3D: Toward Robust Uncertainty-Aware Scene Flow Estimation with Diffusion Model
Jiuming Liu, Guangming Wang, Weicai Ye, Chaokang Jiang, Jinru Han, Zhe Liu, Guofeng Zhang, Dalong Du, Hesheng Wang
CVPR, 2024, CCF-A
arXiv / code / demo

Proposed a plug-and-play, iterative diffusion refinement framework for robust scene flow estimation. Achieved unprecedented millimeter-level accuracy on KITTI, with 6.7% and 19.1% EPE3D reductions on FlyingThings3D and KITTI 2015, respectively.

IntrinsicNeRF: Learning Intrinsic Neural Radiance Fields for Editable Novel View Synthesis
Weicai Ye*, Shuo Chen*, Chong Bao, Hujun Bao, Marc Pollefeys, Zhaopeng Cui, Guofeng Zhang
ICCV, 2023, CCF-A
project page / arXiv / code / poster

Introduced intrinsic decomposition into NeRF-based rendering and performed editable novel view synthesis in room-scale scenes.

PVO: Panoptic Visual Odometry
Weicai Ye*, Xinyue Lan*, Shuo Chen, Yuhang Ming, Xingyuan Yu, Zhaopeng Cui, Hujun Bao, Guofeng Zhang
CVPR, 2023, CCF-A
project page / arXiv / code / poster

Introduced a panoptic visual odometry framework to achieve comprehensive modeling of scene motion, geometry, and panoptic segmentation information.

DeFlowSLAM: Self-Supervised Scene Motion Decomposition for Dynamic Dense SLAM
Weicai Ye*, Xingyuan Yu*, Xinyue Lan, Yuhang Ming, Jinyu Li, Zhaopeng Cui, Hujun Bao, Guofeng Zhang
arXiv, 2022
project page / code / arXiv

Proposed a novel dual-flow representation of self-supervised scene motion decomposition for dynamic dense SLAM.

iDF-SLAM: End-to-End RGB-D SLAM with Neural Implicit Mapping and Deep Feature Tracking
Yuhang Ming, Weicai Ye, Andrew Calway
arXiv, 2022
arXiv / video

Proposed a novel end-to-end RGB-D SLAM system, which adopts a feature-based deep neural tracker as the frontend and a NeRF-based neural implicit mapper as the backend.

Improving Feature-based Visual Localization by Geometry-Aided Matching
Hailin Yu, Youji Feng, Weicai Ye, Mingxuan Jiang, Hujun Bao, Guofeng Zhang
arXiv, 2022
arXiv / code / video

Serves as a main solution for feature matching and visual localization, and has been integrated into OpenXRLab.

Hybrid Tracker with Pixel and Instance for Video Panoptic Segmentation
Weicai Ye*, Xinyue Lan*, Ge Su, Zhaopeng Cui, Hujun Bao, Guofeng Zhang
arXiv, 2022
arXiv / video

Achieved SOTA performance on video panoptic segmentation from two perspectives: feature space (Instance Tracker) and spatial location (Pixel Tracker).

Coxgraph: Multi-Robot Collaborative, Globally Consistent, Online Dense Reconstruction System
Xiangyu Liu, Weicai Ye, Chaoran Tian, Zhaopeng Cui, Hujun Bao, Guofeng Zhang
IROS, 2021, Best Paper Award Finalist on Safety, Security, and Rescue Robotics in memory of Motohiro Kisoi.
project page / arXiv / code / video

Proposed an efficient system named Coxgraph for multi-robot collaborative dense reconstruction in real time. To facilitate transmission, proposed a compact 3D representation that transforms SDF submaps into mesh packs.

ARCargo: Multi-Device Integrated Cargo Loading Management System with Augmented Reality
Tianxiang Zhang, Chong Bao, Hongjia Zhai, Jiazhen Xia, Weicai Ye‡, Guofeng Zhang
CyberSciTech, 2021
paper / video

Proposed a multi-device integrated cargo loading management system with AR, which monitors cargo by fusing perceptual information from multiple devices in real time.

SuperPlane: 3D Plane Detection and Description from a Single Image
Weicai Ye, Hai Li, Tianxiang Zhang, Xiaowei Zhou, Hujun Bao, Guofeng Zhang
VR, 2021, CCF-A
paper / video

Introduced robust plane matching in texture-less scenes and achieved SOTA performance in image-based localization.

Saliency Guided Subdivision for Single-View Mesh Reconstruction
Hai Li*, Weicai Ye*, Guofeng Zhang, Sanyuan Zhang, Hujun Bao
3DV, 2020
paper / video / poster / slides

Proposed a novel saliency-guided subdivision method to balance detail generation and memory consumption, producing visually pleasing mesh reconstructions with fine details while achieving better performance.

Learning Bipartite Graph Matching for Camera Localization
Hailin Yu, Weicai Ye, Youji Feng, Hujun Bao, Guofeng Zhang
ISMAR, 2020
paper / poster

Proposed a bipartite graph network with a Hungarian pooling layer for 2D-3D matching, which finds more correct matches and improves both the robustness and accuracy of localization.

Experience
Research Intern
General 3D Vision Team, Shanghai AI Laboratory
As first author, corresponding author, or project lead, proposed Match Anything (Correspondence Foundation Model), DiffusionSfM, InternVerse (Reconstruction Foundation Models including SurfelGS, FedSurfGS, GigaGS, StreetSurfGS, InvrenderGS, and NeuRodin), DiffPano (Text-to-Multi-view Panorama Generation), and MAIL (Embodied Foundation Model for Imitation Learning), among others. Working with Dr. Tong He, Prof. Wanli Ouyang, and Prof. Yu Qiao. Mentoring 10+ junior researchers at Shanghai AI Lab.
2023.10-Present
Visiting Researcher
Computer Vision and Geometry Lab, ETH Zürich, advised by Prof. Marc Pollefeys
2022.09-2023.03
3D Vision Research Intern
3D Reconstruction of Indoor Scenes from RGB-D Images, SenseTime
2018.01-2018.05
Software Engineer Intern
Video Search System, Baidu
2017.02-2017.07
Awards and Honors
  • 2022 Academic Rising Star at Zhejiang University.
  • 2021 Best Paper Award Finalist on Safety, Security, and Rescue Robotics in memory of Motohiro Kisoi (IROS2021).
  • 2020 5th place in the ECCV GigaVision Challenge.
  • 2020 6th place among 1,945 teams in the Taobao Live Product Identification Contest.
  • 2019 Zhijun He Outstanding Scholarship.
  • 2019 Chiang Chen Industrial Charity Foundation Grant.
  • 2018 Champion of the CloudWalk Headcount Challenge with a ¥31,500 bonus.
  • 2017 National Encouragement Scholarship, ranked 3rd of 111 students.
  • 2017 Meritorious Winner in the Mathematical Contest in Modeling (MCM).
  • 2016 First Prize (Sichuan Province contest district) in the China Undergraduate Mathematical Contest in Modeling.
Services
  • Conference Reviewer: ICML, NeurIPS, ICLR, CVPR, ECCV, ICRA, IROS, VR, ISMAR, AAAI
  • Journal Reviewer: TIP, TCSVT, RAL, CVM
Collaborators
    Mentored 20+ junior researchers at Shanghai AI Lab, ETH, TUM, CUHK, NTU, THU, ZJU, SJTU, FDU, USTC, NWPU, and TJU.

Design and source code from Jon Barron's website