Publications

AutoFocus-IL: VLM-based Saliency Maps for Data-Efficient Visual Imitation Learning without Extra Human Annotations

Litian Gong, Fatemeh Bahrani, Yutai Zhou, Amin Banayeeanzade, Jiachen Li, Erdem Bıyık

arXiv 2025

AutoFocus-IL is a simple yet effective method to improve data efficiency and generalization in visual imitation learning by guiding policies to attend to task-relevant features rather than distractors and spurious correlations. Although saliency regularization has emerged as a promising way to achieve this, existing approaches typically require costly supervision such as human gaze data or manual saliency annotations. In contrast, AutoFocus-IL leverages vision-language models (VLMs) to automatically identify and track key objects in demonstrations, generating temporal saliency maps that highlight causal visual signals while suppressing distractors. These maps are then used to regularize behavior cloning policies, yielding stronger alignment between visual attention and task-relevant cues. Experiments in both the CARLA simulator and real-robot manipulation tasks demonstrate that AutoFocus-IL not only outperforms standard behavior cloning but also surpasses state-of-the-art baselines that assume privileged access to human supervision, such as gaze data.

ORIC: Benchmarking Object Recognition under Contextual Incongruity in Large Vision-Language Models

Zhaoyang Li*, Zhan Ling*, Yuchen Zhou, Litian Gong, Erdem Bıyık, Hao Su

arXiv 2025

Large Vision-Language Models (LVLMs) excel at captioning, visual question answering, and robotics by combining vision and language, yet they often miss obvious objects or hallucinate nonexistent ones in atypical scenes. We examine these failures through the lens of uncertainty, focusing on contextual incongruity, where objects appear unexpectedly or fail to appear in expected contexts, and show that such cases increase recognition difficulty for state-of-the-art LVLMs. To study this regime, we introduce the Object Recognition in Incongruous Context (ORIC) framework, which constructs incongruous object-context pairs through two complementary strategies: (1) LLM-guided sampling to identify hard-to-recognize objects present in the image and (2) CLIP-guided sampling to mine plausible but absent ones. Applied to MSCOCO, ORIC produces ORIC-Bench and ORIC-style training data. Evaluating 18 LVLMs and 2 open-vocabulary detectors reveals substantial performance drops and bias patterns under incongruous contexts. Fine-tuning Qwen3-VL-8B-Instruct with Visual Reinforcement Fine-Tuning on 600 ORIC-style samples improves results on ORIC-Bench, AMBER, and HallusionBench. Overall, we show that contextual incongruity is a key source of uncertainty and provide tools for more reliable LVLMs.

A Friendly Grid-connected Distribution System with PV and ESS for Remote Rural Residential Family

Litian Gong, Jiaxuan Ren, Shuoyu Jin, Shaorong Wang

3rd International Conference on New Energy and Power Engineering (ICNEPE) 2023

This paper uses distributed PV and an energy storage system (ESS) to both extend the capacity and improve the supply reliability of the distribution system for a remote rural residential family. The proposed distribution system has the following highlights. First, maximum power point tracking (MPPT) for multiple groups of parallel PV arrays is implemented simultaneously by adjusting only the DC bus operating voltage in real time, which simplifies the topology and control of the PV generation subsystem and reduces its cost. Second, grid power is injected into the DC bus in a controlled manner through an access circuit composed of a six-phase diode rectifier with a power-frequency isolation transformer and a boost circuit; this uses mature circuitry and simple control to achieve higher reliability at lower cost than a scheme based on a three-phase rectifier with fully controllable power electronic switches. Third, the three-phase DC/AC inverter is built from three single-phase DC/AC inverters, each with a power-frequency isolation transformer, an independent voltage amplitude control loop, and a coordinated voltage phase control loop; its AC port uses Y0 three-phase four-wire wiring and can tolerate a certain degree of power imbalance among phases. The paper describes in detail the topology, functional units, operating modes, and control methods of the proposed distribution system, and presents the design method and case simulation results.
