Youze "Hargen" Zheng

I'm an undergraduate student at the University of California, San Diego with a broad interest in artificial intelligence, machine learning, and natural language processing (NLP). Currently, I'm a research assistant in the Laboratory for Emerging Intelligence, advised by Prof. Leon Bergen and Prof. Ramamohan Paturi.

My research focuses on advancing our understanding of cutting-edge large language models and developing novel methods to improve their capabilities. Previously, I worked on building efficient sentence-level retrievers for long contexts [1] and contributed to a biomedical benchmark that evaluates model performance and identifies critical gaps in current systems [2].

I plan to pursue graduate studies in Computer Science in the areas of machine learning and NLP.

News

08/20/2025: The RoBBR Benchmark has been accepted to EMNLP 2025 🎉!
07/07/2025: Our SPScanner paper has been accepted to COLM 2025 🎉! Check out the paper on arXiv.
04/01/2024: Joined the Laboratory for Emerging Intelligence (LEI) as a research assistant.
08/01/2022: Moved to La Jolla to pursue my undergraduate degree at UC San Diego.

Publications

Below is a list of my published papers. You can also find them on my Google Scholar page.

Single-Pass Document Scanning for Question Answering

Weili Cao*, Jianyou Wang*, Youze Zheng*, Longtian Bao*, Qirui Zheng, Taylor Berg-Kirkpatrick, Ramamohan Paturi, Leon Bergen

COLM 2025. Oral Spotlight Presentation (Top 2%)

TL;DR: A single-pass scanner for long-document QA outperforms embedding-based methods while nearly matching full-context LLMs at much lower cost.

@inproceedings{cao2025singlepass,
    title={Single-Pass Document Scanning for Question Answering},
    author={Weili Cao and Jianyou Wang and Youze Zheng and Longtian Bao and Qirui Zheng and Taylor Berg-Kirkpatrick and Ramamohan Paturi and Leon Bergen},
    booktitle={Second Conference on Language Modeling},
    year={2025}}

Measuring Risk of Bias in Biomedical Reports: The RoBBR Benchmark

Jianyou Wang*, Weili Cao*, Longtian Bao, Youze Zheng, Gil Pasternak, Kaicheng Wang, Xiaoyue Wang, Ramamohan Paturi, Leon Bergen

EMNLP 2025.

TL;DR: A biomedical risk-of-bias benchmark with novel subtasks that measure the retrieval and reasoning abilities of LLMs and embedding models when performing risk-of-bias assessment.

@inproceedings{wang2025measuring,
    title={Measuring Risk of Bias in Biomedical Reports: The Ro{BBR} Benchmark},
    author={Jianyou Wang and Weili Cao and Longtian Bao and Youze Zheng and Gil Pasternak and Kaicheng Wang and Xiaoyue Wang and Ramamohan Paturi and Leon Bergen},
    booktitle={The 2025 Conference on Empirical Methods in Natural Language Processing},
    year={2025}}

*Equal Contribution.

Teaching Experiences

Below is a complete list of courses for which I have served as a teaching assistant at UC San Diego.

CSE 151A: Introduction to Machine Learning | Materials

UC San Diego, Spring 2025 (with Prof. Sanjoy Dasgupta)

Held office hours; graded homework and exams. The course covers basic ML concepts such as nearest neighbors, basic probability, statistics, optimization, linear/logistic regression, and decision trees.

CSE 151B: Deep Learning

UC San Diego, Winter 2025 (with Prof. Garrison W. Cottrell)

Graded projects and exams. The course covers DL topics such as backpropagation, CNNs, RNNs, Transformers, and reinforcement learning. Students implement backpropagation from scratch and fine-tune a 330M-parameter BERT model for intent classification.

LIGN 167: Deep Learning for Natural Language Understanding

UC San Diego, Fall 2024 (with Prof. Leon Bergen)

Graded assignments; built autograder scripts. The course introduces basic neural network architectures, optimization methods, masked/autoregressive language modeling, and encoder-decoder architectures.

DSC 10: Principles of Data Science | Materials

UC San Diego, Fall 2024 (with Dr. Janine Tiefenbruck)

Held office hours; graded assignments and exams; maintained the course website. This introductory data science course covers data exploration, analysis, and visualization, bootstrapping, hypothesis testing, and linear regression.

DSC 20: Data Structures for Data Science | Materials

UC San Diego, Fall 2023, Winter/Spring/Summer 2024 (with Dr. Marina Langlois, Dr. Jamal Tayeb)

Held office hours; developed programming assignments; graded homework and exams. The course covers data structures, algorithms, and basic programming concepts in Python.