Publications

You can also find my articles on my Google Scholar profile or on my Semantic Scholar profile.

Mind the Gap: Aligning Knowledge Bases with User Needs to Enhance Mental Health Retrieval

Published in NeurIPS GenAI4Health Workshop, 2025

Access to reliable mental health information is vital for early help-seeking, yet expanding knowledge bases is resource-intensive and often misaligned with user needs. This results in poor performance of retrieval systems when the concerns users present are not covered or are expressed in informal or contextualized language. We present an AI-based gap-informed framework for corpus augmentation that authentically identifies underrepresented topics (gaps) by overlaying naturalistic user data such as forum posts in order to prioritize expansions based on coverage and usefulness. In a case study, we compare Directed augmentation (gap-informed additions) with Non-Directed augmentation (random additions), evaluating the relevance and usefulness of retrieved information across four retrieval-augmented generation (RAG) pipelines. Directed augmentation achieved near-optimal performance with modest expansions, requiring only a 42% increase for Query Transformation, 74% for Reranking and Hierarchical, and 318% for Baseline to reach ~95% of the performance of an exhaustive reference corpus. In contrast, Non-Directed augmentation required substantially larger and thus practically infeasible expansions to achieve comparable performance (232%, 318%, 403%, and 763%, respectively). These results show that strategically targeted corpus growth can reduce content creation demands while sustaining high retrieval and provision quality, offering a scalable approach for building trusted health information repositories and supporting generative AI applications in high-stakes domains.

Recommended citation: Chan et al. (2025). "Mind the Gap: Aligning Knowledge Bases with User Needs to Enhance Mental Health Retrieval." NeurIPS GenAI4Health Workshop. https://arxiv.org/abs/2509.13626

Harnessing digital phenotyping to advance university student mental health (Brightline) in Singapore: study protocol for a prospective observational study

Published in BMJ Open, 2025

TLDR: This study will use an observational design over a 6-month period, recruiting 500 students from a major public university in Singapore, to identify digital biomarkers associated with depression, anxiety, stress, loneliness and affect among university students.

Recommended citation: Ito et al. (2025). "Harnessing digital phenotyping to advance university student mental health (Brightline) in Singapore: study protocol for a prospective observational study." BMJ Open. https://bmjopen.bmj.com/lookup/doi/10.1136/bmjopen-2025-103652

Conversational Self-Play for Discovering and Understanding Psychotherapy Approaches

Published in AI4X Conference, 2025

This paper explores conversational self-play with LLMs as a scalable approach for analyzing and exploring psychotherapy approaches, evaluating how well AI-generated therapeutic dialogues align with established modalities.

Recommended citation: Kampman, Onno P. (2025). "Conversational Self-Play for Discovering and Understanding Psychotherapy Approaches." AI4X Conference. https://www.semanticscholar.org/paper/Conversational-Self-Play-for-Discovering-and-Kampman-Xing/5cfc7ea13348b11fb52bed98dd431b8c1809f4b6

SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages

Published in EMNLP, 2024

This work introduces SEACrowd, a comprehensive resource center that fills the resource gap by providing standardized corpora in nearly 1,000 SEA languages across three modalities, and assesses the quality of AI models on 36 indigenous languages across 13 tasks.

Recommended citation: Lovenia et al. (2024). "SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages." EMNLP. https://aclanthology.org/2024.emnlp-main.296.pdf

Time-varying functional connectivity as Wishart processes

Published in Imaging Neuroscience, 2024

The Wishart process (WP) outperformed a sliding window approach with adaptive cross-validated window lengths and a dynamic conditional correlation-multivariate generalized autoregressive conditional heteroskedasticity (MGARCH) baseline on the external stimulus prediction task, while being less prone to false positives in the time-varying functional connectivity (TVFC) null models.

Recommended citation: Kampman et al. (2024). "Time-varying functional connectivity as Wishart processes." Imaging Neuroscience. https://direct.mit.edu/imag/article/doi/10.1162/imag_a_00184/121101/Time-varying-functional-connectivity-as-Wishart