CV
Click the PDF icon on the right to download my CV.
Basics
| Name | Yun Zhang |
| Label | First-year PhD student at UCLA's Mobility Lab |
| Email | yun666@g.ucla.edu |
| Phone | (310) 694-6791 |
| URL | https://HandsomeYun.github.io/ |
| Summary | Researcher specializing in physical intelligence, autonomous driving, and computer vision. |
Education
-
2025.09 - Present United States
PhD
University of California, Los Angeles (UCLA)
- First-year PhD student in the Mobility Lab
-
2021.09 - 2025.06 United States
Undergraduate
University of California, Los Angeles (UCLA)
B.S. in Mathematics in Computer Science, B.S. in Statistics and Data Science
- Cumulative GPA: 3.823/4.0
- Dean's Honors List (Fall 2021, Winter/Spring/Fall 2022, Winter/Spring 2023)
-
2015.09 - 2021.06 Athens, Greece
High School
American Community School of Athens (ACS Athens)
- Weighted Cumulative GPA: 4.886/4.0
- Final IB Score: 44/45
Publications
-
2025.09.01 MIC-BEV: Multi-Infrastructure Camera Bird's-Eye-View Transformer with Relation-Aware Fusion for 3D Object Detection
Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI); Best Paper Award at the DriveX Workshop
First Author. MIC-BEV is a Transformer-based framework for multi-camera infrastructure perception. It performs 3D object detection and BEV segmentation by fusing features from multiple cameras through a geometry-aware graph module. Designed for diverse camera setups and harsh conditions, MIC-BEV remains robust under sensor degradation. To support this setting, we introduce M2I, a synthetic dataset covering varied layouts, weather, and viewpoints. Experiments on both M2I and the real-world RoScenes dataset show that MIC-BEV achieves state-of-the-art performance and the reliability needed for real-world deployment. (The workshop version is currently being released.)
-
2025.03.09 AutoVLA: Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning
Accepted at the Conference on Neural Information Processing Systems (NeurIPS)
Fourth Author. AutoVLA is a vision-language-action model for end-to-end autonomous driving with adaptive reasoning and reinforcement fine-tuning capabilities.
-
2025.03.09 InSPE: Rapid Evaluation of Heterogeneous Multi-Modal Infrastructure Sensor Placement
Submitted to the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2026
Co-first Author. This paper introduces InSPE, a framework for evaluating heterogeneous multi-modal infrastructure sensor placement. It integrates metrics such as sensor coverage, occlusion, and information gain, supported by a new dataset and benchmarking experiments for optimizing perception at intelligent intersections.
-
2025.03.09 AgentAlign: Misalignment-Adapted Multi-Agent Perception for Resilient Inter-Agent Sensor Correlations
Submitted to the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2026
Second Author. This work presents AgentAlign, a real-world multi-agent perception framework that mitigates multi-modality misalignment in cooperative autonomous systems through cross-modality feature alignment, and introduces the V2XSet-Noise dataset for robustness evaluation.
-
2025.03.09 RelMap: Enhancing Online Map Construction with Class-Aware Spatial Relation and Semantic Priors
Submitted to the 40th Annual AAAI Conference on Artificial Intelligence (AAAI)
Second Author. RelMap is an online HD map construction framework that enhances vectorized map generation using class-aware spatial relations and semantic priors, significantly improving accuracy and data efficiency on the nuScenes and Argoverse 2 datasets.
-
2025.03.09 V2XPnP: Vehicle-to-Everything Spatio-Temporal Fusion for Multi-Agent Perception and Prediction
Accepted at the IEEE/CVF International Conference on Computer Vision (ICCV)
Sixth Author. V2XPnP proposes a spatio-temporal fusion framework for multi-agent perception and prediction, leveraging a Transformer-based architecture and a novel sequential dataset to benchmark when, what, and how to fuse information in V2X scenarios.
Research
-
2024.05 - 2024.08 Researcher
HKU Summer Research Program
Leveraged large language models (MiniGPT-4) for multi-modality brain tumor segmentation, integrating four distinct MRI modalities (T1c, T1w, T2w, and FLAIR) into a common space to improve segmentation accuracy.
- Awarded Best Presenter and received a PhD offer with a Presidential Scholarship.
-
2023.02 - 2025.09 Research Assistant
Mobility Lab, UCLA
Contributed to multi-agent perception, sensor fusion, and infrastructure-aware autonomous driving, co-authoring five papers on multi-modal sensor placement (InSPE), misalignment adaptation in cooperative perception (AgentAlign), class-aware map construction (RelMap), spatio-temporal fusion for V2X perception (V2XPnP), and real-world cooperative perception datasets (V2X-ReaLO).
- Participated in the U.S. DOT Intersection Safety Challenge, winning $750,000.
-
2023.01 - 2024.12 Research Assistant
Vwani Roychowdhury's Lab, UCLA
Contributed to the implementation and deep-learning models of the Hilbert (HIL) detector in PyHFO, a multi-window desktop application providing time-efficient high-frequency oscillation (HFO) detection along with artifact and HFO-with-spike classification.
- Reduced detection runtime by 50x compared with the state of the art, with a comparative study to verify correctness.
Work
-
2023.06 - 2023.12 AI/Data Analyst Intern
Office of Palo Alto Councilmember Greg Tanaka
Analyzed voter data from public social media, HubSpot, and voter profiles within a California congressional district, identifying trends and developing predictive models to anticipate voting behavior.
- Utilized LLMs to generate personalized campaign emails and outreach materials, increasing efficiency.
-
2022.12 - 2023.03 Data Analysis Intern
Uber, Hong Kong
Contributed backend utilities to a facial-mask recognition project during the COVID-19 pandemic.
- Performed in-depth analysis and demand forecasting for Uber's regional operations, evaluating the influence of factors such as humidity, wind, time of day, and origin and destination.
Skills
| Autonomous Systems & Simulation | CARLA, OpenCDA, OpenSCENARIO Documentation, Scenario Runner |
| Multi-Agent & Cooperative Perception | V2X, Cooperative Perception, Sensor Fusion, Multi-Agent Perception, Intermediate Fusion, Multi-Sensor Misalignment |
| Programming Languages | Python, C++, JavaScript, C#, R, LaTeX, Bash/Shell Scripting |
| Machine Learning & Data Science | PyTorch, TensorFlow, scikit-learn, Pandas, NumPy, MATLAB, Jupyter Notebooks |
| Medical Imaging & Biomedical Analysis | Segment Anything Model (SAM), nnU-Net, BraTS, Image Segmentation |
| DevOps & Cloud Computing | Docker, AWS, Git, GitKraken |
| Web Development & Frontend Technologies | React, Node.js, HTML, CSS, JavaScript, Tableau |
Languages
| Chinese | Native speaker |
| English | Fluent |
Interests
| Cooking | Chinese Cuisine, Japanese Cuisine, Western Cuisine, Desserts, Fusion Cooking |