Research

📄 Selected Research

Pre-print

AURA: Adaptive Uncertainty-aware Refinement for LLM-as-a-Judge Auditing.

Zilong Zhang, Noah Yi-Ting Hung, Weiyi He, Lei Ding, Junxi Chang, Chi-Kuang Yeh.

Submitted·arXiv 2026

LLM Evaluation · Active Learning
BibTeX
@misc{zhang2025aura,
  title={AURA: Adaptive Uncertainty-aware Refinement for LLM-as-a-Judge Auditing},
  author={Zhang, Zilong and Hung, Noah Yi-Ting and He, Weiyi and Ding, Lei and Chang, Junxi and Yeh, Chi-Kuang},
  year={2025},
  note={arXiv preprint}
}
Large language models (LLMs) are increasingly used as judges for open-ended generation, as large-scale human evaluation is often expensive and difficult to scale, yet their preferences remain imperfect proxies for human judgment. Existing auditing pipelines often assume that a reliable subset of examples or clean supervision signals are available beforehand, for example from human annotation, heuristic filtering, or the outputs of strong judges. In LLM evaluation, this assumption is fragile: the initial split may inherit judge bias, while human verification is typically too scarce to define stable groups at scale. We propose AURA, an adaptive uncertainty-aware refinement framework for auditing pairwise LLM-as-a-judge decisions under selected human verification. AURA iteratively learns a human-consistency signal, propagates reliable evidence, and prioritizes uncertain comparisons for human review. The key idea is to treat trust in a judge as a latent quantity that is progressively refined as evidence accumulates. We provide a compact formulation, a stable refinement procedure, and a comprehensive evaluation on both synthetic and real pairwise LLM-answer data.
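The two ideas in the abstract above, trust as a latent quantity refined by evidence and review effort directed at uncertain comparisons, can be illustrated with a minimal sketch. This is not AURA's procedure. The Beta-Bernoulli update, the entropy ranking, and every name here (`BetaTrust`, `binary_entropy`, `select_for_review`) are assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class BetaTrust:
    """Beta(alpha, beta) belief about how often a judge agrees with humans.

    Hypothetical stand-in for a latent trust quantity; not taken from the paper.
    """
    alpha: float = 1.0  # pseudo-count of agreements (uniform prior)
    beta: float = 1.0   # pseudo-count of disagreements (uniform prior)

    def update(self, agrees: bool) -> None:
        # Each human-verified comparison adds one unit of evidence.
        if agrees:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        # Posterior mean of the agreement rate.
        return self.alpha / (self.alpha + self.beta)


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) variable; zero when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def select_for_review(consistency: dict[str, float], k: int) -> list[str]:
    """Return the k comparison ids whose predicted human-consistency is most uncertain."""
    ranked = sorted(
        consistency,
        key=lambda cid: binary_entropy(consistency[cid]),
        reverse=True,
    )
    return ranked[:k]
```

In this sketch, a comparison with predicted consistency near 0.5 is sent to a human first, and each verified answer then updates the judge's `BetaTrust`, so the estimate sharpens as evidence accumulates.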
Quantifying and Auditing LLM Evaluation via Positive-Unlabeled Learning.

Zilong Zhang*, Noah Yi-Ting Hung*, Lei Ding, Chi-Kuang Yeh.

Submitted·arXiv 2026

LLM Evaluation · PU Learning
BibTeX
@misc{zhang2025llm_pu,
  title={Quantifying and Auditing LLM Evaluation via Positive-Unlabeled Learning},
  author={Zhang, Zilong and Hung, Noah Yi-Ting and Ding, Lei and Yeh, Chi-Kuang},
  year={2025},
  note={arXiv preprint}
}
Large Language Models (LLMs) are increasingly used as judges for scalable evaluation, yet such LLM-as-a-Judge systems exhibit systematic biases that are decoupled from semantic quality, most notably verbosity bias. Meanwhile, human supervision is costly and typically selective, yielding reliable positive judgments but leaving most outputs unlabelled and potentially mixed in quality. We formulate LLM evaluation under selective human supervision as a positive–unlabelled learning problem and propose a geometric auditing framework based on Partial Optimal Transport. By aligning a small set of human-verified positives with a reliable subset of unlabelled outputs in a fixed embedding space, our method identifies human-consistent preferences and corrects biased judges without retraining. Experiments demonstrate improved alignment with human preferences, increased robustness to presentation biases, and interpretable confidence estimates, offering a scalable and statistically grounded alternative to existing LLM-as-a-judge pipelines.
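For orientation, the standard partial optimal transport problem can be written as follows. This is the generic textbook formulation, given as a sketch; the cost function, the weights, and the choice of transported mass used in the paper may differ, and all symbols below are illustrative assumptions.

```latex
% Notation is illustrative, not taken from the paper:
%   a \in \mathbb{R}_+^{n}: weights on the n human-verified positives
%   b \in \mathbb{R}_+^{m}: weights on the m unlabelled outputs
%   C_{ij}: cost between the embeddings of positive i and unlabelled output j
%   s: total mass to transport, 0 \le s \le \min(\|a\|_1, \|b\|_1)
\begin{equation*}
\min_{T \in \mathbb{R}_+^{n \times m}} \; \sum_{i=1}^{n} \sum_{j=1}^{m} T_{ij} C_{ij}
\quad \text{s.t.} \quad
T \mathbf{1}_m \le a, \qquad
T^{\top} \mathbf{1}_n \le b, \qquad
\mathbf{1}_n^{\top} T \mathbf{1}_m = s .
\end{equation*}
```

Because only a mass s is moved, some unlabelled outputs receive little or no mass. Under this reading, the outputs that do receive mass are the candidates for the "reliable subset" aligned with the verified positives.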
Sparse Deep Additive Model with Interactions: Enhancing Interpretability and Predictability

Noah Yi-Ting Hung, Li-Hsiang Lin, Vince D. Calhoun

Submitted·arXiv 2025

Interpretable ML · Non-parametric · Deep Learning
BibTeX
@article{hung2025sparse,
  title={Sparse Deep Additive Model with Interactions: Enhancing Interpretability and Predictability},
  author={Hung, Noah Yi-Ting and Lin, Li-Hsiang and Calhoun, Vince D.},
  journal={arXiv preprint arXiv:2509.23068},
  year={2025}
}
The Sparse Deep Additive Model with Interactions (SDAMI) combines sparsity-driven feature selection with deep subnetworks for flexible function approximation. Unlike conventional black-box models, SDAMI explicitly disentangles main effects from interaction effects, which enhances interpretability while maintaining predictive power on small or moderate samples with high-dimensional features.
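A generic additive model with pairwise interactions has the form below. It is shown only to make "main effects and interaction effects" concrete; the index sets and the exact decomposition used by SDAMI are assumptions here, not taken from the paper.

```latex
% Illustrative notation (assumed, not from the paper):
%   \mathcal{S}_1: indices of the selected main-effect features
%   \mathcal{S}_2: selected feature pairs with an interaction effect
%   f_j, f_{jk}: component functions, each of which could be fit by its own subnetwork
\begin{equation*}
f(x) = \beta_0 + \sum_{j \in \mathcal{S}_1} f_j(x_j) + \sum_{(j,k) \in \mathcal{S}_2} f_{jk}(x_j, x_k)
\end{equation*}
```

Sparsity corresponds to keeping both index sets small, so each retained component can be plotted and inspected on its own.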

Journal

Deep P-Spline: Theory, Fast Tuning, and Application

Noah Yi-Ting Hung, Li-Hsiang Lin, Vince D. Calhoun

Accepted·Journal of the American Statistical Association (JASA) 2026

Non-parametric · 🏆 JSM SPES Award
BibTeX
@article{hung2026deep,
  title={Deep p-spline: Theory, fast tuning, and application},
  author={Hung, Noah Yi-Ting and Lin, Li-Hsiang and Calhoun, Vince D},
  journal={Journal of the American Statistical Association},
  number={just-accepted},
  pages={1--28},
  year={2026},
  publisher={Taylor \& Francis}
}
We introduce a difference penalty that automates knot selection and simplifies neuron selection in deep neural networks (DNNs). Deep P-Spline (DPS) extends conventional DNN modeling and forms the basis for a latent variable framework that uses the expectation/conditional maximization (ECM) algorithm for efficient structure tuning with theoretical guarantees.
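As background, the classical P-spline difference penalty can be sketched in a few lines. This shows the textbook penalty on adjacent spline coefficients, not the DPS penalty itself; the function names and the default order of 2 are assumptions made for illustration.

```python
from typing import Sequence


def differences(coefs: Sequence[float], order: int = 2) -> list[float]:
    """Apply the first-difference operator `order` times to a coefficient vector."""
    out = list(coefs)
    for _ in range(order):
        # Each pass replaces the vector with the gaps between neighbours.
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def difference_penalty(coefs: Sequence[float], lam: float = 1.0, order: int = 2) -> float:
    """Return lam times the sum of squared order-th differences of `coefs`.

    With order=2 the penalty is zero for coefficients that lie on a straight
    line, so a larger `lam` pulls the fitted curve toward a linear trend.
    """
    return lam * sum(d * d for d in differences(coefs, order))
```

For example, `difference_penalty([1, 2, 3, 4])` is zero because the coefficients are linear, while a quadratic sequence such as `[0, 1, 4, 9, 16]` has constant second differences of 2 and is penalized. A penalty of this kind lets a model start with many knots and rely on a single smoothing parameter, rather than on knot placement, to control flexibility.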
Improving Sugarcane Extraneous Matter Prediction Using Transformed Near-Infrared Spectral Data and Functional Regression

Stephania Imbachi Ordonez, Noah Yi-Ting Hung, Kevin McPeak, Gillian Eggleston, Li-Hsiang Lin

Accepted·Journal of Agricultural, Biological, and Environmental Statistics (JABES) 2026

Functional Analysis
Semi-Markov Process-Driven Maintenance Scheduling for Tainter Gate System Considering Multiple Limit States

John Thedy, Kuo-Wei Liao, Noah Yi-Ting Hung

Accepted·Journal of Structural Health Monitoring (JSHM) 2024

Reliability · Stochastic Processes
BibTeX
@article{thedy2024smc,
  title={Semi-Markov Process-Driven Maintenance Scheduling for Tainter Gate System Considering Multiple Limit States},
  author={Thedy, John and Liao, Kuo-Wei and Hung, Noah Yi-Ting},
  journal={Journal of Structural Health Monitoring},
  year={2024}
}
This work adopts a semi-Markov process, which accommodates non-exponential distributions of state durations, to formulate optimal maintenance strategies for Tainter gate systems, which are noted for prolonged dormancy and significant operational uncertainty.
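The difference from an ordinary Markov chain is that the time spent in each state need not be exponentially distributed. A minimal simulation sketch follows, assuming a hypothetical three-state deterioration model with Weibull sojourn times. The states, transition probabilities, and parameters are invented for illustration and do not come from the paper.

```python
import random

# Hypothetical embedded chain: state -> list of (next_state, probability).
TRANSITIONS = {
    "good": [("degraded", 1.0)],
    "degraded": [("good", 0.3), ("failed", 0.7)],
    "failed": [("good", 1.0)],
}

# Hypothetical Weibull (scale, shape) for the sojourn time in each state, in years.
# A shape other than 1 gives a non-exponential duration, which is what
# distinguishes a semi-Markov process from a continuous-time Markov chain.
SOJOURN = {"good": (10.0, 2.0), "degraded": (4.0, 1.5), "failed": (0.5, 1.0)}


def simulate(horizon: float, rng: random.Random, start: str = "good") -> list[tuple[str, float]]:
    """Return the visited (state, entry_time) pairs with entry_time < horizon."""
    t, state, path = 0.0, start, []
    while t < horizon:
        path.append((state, t))
        scale, shape = SOJOURN[state]
        # Draw how long the system stays in the current state.
        t += rng.weibullvariate(scale, shape)
        # Then pick the next state from the embedded chain.
        next_states, weights = zip(*TRANSITIONS[state])
        state = rng.choices(next_states, weights=weights)[0]
    return path
```

Calling `simulate(50.0, random.Random(0))` yields one sample path over a 50-year horizon. Averaging a cost over many such paths is one simple way to compare candidate maintenance intervals, although the paper's optimization may be formulated differently.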