
Yangguang Li (李阳光)

I have been working at SenseTime since 2019. I obtained my master's degree from Beijing University of Posts and Telecommunications and my bachelor's degree from Tianjin University.

Currently, I am focusing on the exploration and exploitation of camera perception models for autonomous driving (monocular and BEV). Before that, I focused on general vision (multi-modal / large models) and detection (face/human/traffic/structure/keypoint/video/AutoML) research.

Email  /  LinkedIn  /  Zhihu  /  GitHub  /  Google Scholar

  • January 2023: One paper gets accepted to Pattern Recognition 2023.
  • October 2022: One paper gets accepted to NeurIPS ML4AD 2022.
  • October 2022: One paper gets accepted to EMNLP 2022.
  • September 2022: One paper gets accepted to BMVC 2022.
  • September 2022: One paper gets accepted to NeurIPS 2022.
  • August 2022: One paper gets accepted to COLING 2022.
  • July 2022: One paper gets accepted to ECCV 2022.
  • June 2022: 1st place in Embodied AI Challenge @ CVPR 2022.
  • June 2022: 2nd place in UG2+ Challenge @ CVPR 2022.
  • June 2022: One paper gets accepted to ICML workshops 2022.
  • April 2022: One paper gets accepted to CVPR workshops 2022.
  • April 2022: One paper gets accepted to IJCAI 2022 as a long oral.
  • April 2022: OpenGVLab is released.
  • January 2022: One paper gets accepted to ICLR 2022.
  • November 2021: INTERN general vision technical report is released.
Towards Accurate Binary Neural Networks via Modeling Contextual Dependencies
Xingrun Xing, Yangguang Li, Wei Li, Wenrui Ding, Yalong Jiang, Yufeng Wang, Jing Shao, Chunlei Liu, Xianglong Liu
ECCV, 2022
PDF, bibtex, code

We propose BCDNet, a newly designed binary neural module that enables BNNs to learn effective contextual dependencies.

Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
Yangguang Li*, Feng Liang*, Lichen Zhao*, Yufeng Cui, Wanli Ouyang, Jing Shao, Fengwei Yu, Junjie Yan
ICLR, 2022
PDF, bibtex, code, video presentation

We propose Data efficient CLIP (DeCLIP), a method that trains CLIP efficiently by utilizing the widespread supervision within image-text data.

Democratizing Contrastive Language-Image Pre-training: A CLIP Benchmark of Data, Model, and Supervision
Yufeng Cui*, Lichen Zhao*, Feng Liang*, Yangguang Li, Jing Shao
ICMLW, 2022
PDF, bibtex, code

We propose a CLIP benchmark of data, model, and supervision.

RePre: Improving Self-Supervised Vision Transformer with Reconstructive Pre-training
Luya Wang, Feng Liang, Yangguang Li, Honggang Zhang, Wanli Ouyang, Jing Shao
IJCAI, 2022, Long oral
PDF, bibtex

We propose RePre, which extends contrastive frameworks by adding a branch that reconstructs raw image pixels in parallel with the existing contrastive objective.

Task-Balanced Distillation for Object Detection
Ruining Tang, Zhenyu Liu, Yangguang Li, Yiguo Song, Hui Liu, Qide Wang, Jing Shao, Guifang Duan, Jianrong Tan
Pattern Recognition, 2023
PDF, bibtex

We alleviate the problem of inconsistent spatial distributions between classification scores and localization quality (IoU) in detection by designing a customized teacher-student knowledge distillation workflow.

1st Place Solutions for RxR-Habitat Vision-and-Language Navigation Competition (CVPR 2022)
Dong An*, Zun Wang*, Yangguang Li, Yi Wang, Yicong Hong, Yan Huang, Liang Wang, Jing Shao
Winner of the 2nd RxR-Habitat Competition, 2022
PDF, bibtex

We present a modular plan-and-control approach that consists of three modules: the candidate waypoints predictor (CWP), the history-enhanced planner, and the tryout controller.

IMCI: Integrate Multi-view Contextual Information for Fact Extraction and Verification
Hao Wang, Yangguang Li, Zhen Huang, Yong Dou
COLING, 2022
PDF, bibtex, code

We propose IMCI to integrate multi-view contextual information for fact extraction and verification.

R2F: A General Retrieval, Reading and Fusion Framework for Document-level Natural Language Inference
Hao Wang, Yixin Cao, Yangguang Li, Zhen Huang, Kun Wang, Jing Shao
EMNLP, 2022
PDF, bibtex, code

By analyzing the main challenges of DocNLI — interpretability, long-range dependency, and cross-sentence inference — we establish a general solution, the Retrieval, Reading and Fusion (R2F) framework, together with a new evaluation setting.

INTERN: A New Learning Paradigm Towards General Vision
Jing Shao*, Siyu Chen*, Yangguang Li*, Kun Wang*, Zhenfei Yin*, Yinan He*, Jianing Teng*, Qinghong Sun*, Mengya Gao*, Jihao Liu*, Gengshi Huang*, et al.
CoRR, 2021
PDF, bibtex, code, project

We propose a new learning paradigm named INTERN, which introduces a continuous learning scheme: a highly extensible upstream pretraining pipeline leveraging large-scale data and various supervisory signals, together with flexible downstream adaptation to diversified tasks.

Selected Honors
  • SenseTime Team Award, SenseTime’s highest award, 2021.
  • SenseTime Outstanding Intern, 2020.
  • SenseTime Outstanding Intern, 2019.

  • Reviewer for CVPR, ICML, ECCV, NeurIPS, and TCSVT.
  • Organizer of the ECCV 2022 Computer Vision in the Wild Challenge.

  • Thanks to Feng (Jeff) Liang