|
High-Fidelity Tuning of Olfactory Mixture Distances in the Perceptual Space of Smell Through a Community Effort
Vahid Satarifard*, Laura Sisson*, Yikun Han*, Pedro Ilidio*, Matej Hladis*, Maxence Lalis*, Xuebo Song, Wenjie Yin, Aharon Ravia, CiCi Xingyu Zheng, Gaia Andreoletti, Jake Albrecht, Robert Pellegrino, Zehua Wang, Stephen Yang, Robbe D'hondt, Achilleas Ghinis, Jasper de Boer, Felipe Kenji Nakano, Alireza Gharahighehi, DREAM Olfactory Mixtures Prediction Consortium, Benjamin Sanchez-Lengeling, Andreas Keller, Leslie B. Vosshall, Sebastien Fiorucci, Ambuj Tewari, Jeremie Topin, Celine Vens, Marten Bjorkman, Danica Kragic, Noam Sobel, Nicholas A. Christakis, Joel D. Mainland, Pablo Meyer
bioRxiv, 2025
arxiv
/ code
We present an ensemble model derived from the DREAM Olfactory Mixtures Prediction Challenge that accurately predicts the perceptual similarity of complex odor mixtures. By aggregating top-performing architectures, our approach outperforms state-of-the-art methods, establishing a robust, validated framework for mapping molecular combinations to human olfactory perception.
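A minimal sketch of how such an ensemble can be assembled by weighted averaging of per-model predictions; the model types, weights, and values below are hypothetical placeholders, not the consortium's actual aggregation scheme:

```python
import numpy as np

# Hypothetical predicted perceptual distances for 3 mixture pairs,
# one row per model (all values are placeholders).
model_preds = np.array([
    [0.42, 0.10, 0.77],   # e.g., a graph-neural-network model
    [0.39, 0.15, 0.70],   # e.g., a gradient-boosting model
    [0.45, 0.08, 0.81],   # e.g., a transformer-based model
])

# Assumed weights, e.g., proportional to leaderboard performance.
weights = np.array([0.35, 0.30, 0.35])

# Ensemble output: weighted average of the models' predictions.
ensemble = weights @ model_preds
print(ensemble)  # one aggregated distance per mixture pair
```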
|
|
Teaching Machine Olfaction in an Undergraduate Deep Learning Course: An Interdisciplinary Approach Based on Chemistry, Machine Learning, and Sensory Evaluation
Yikun Han, Michelle Krell Kydd, Joseph Ward, Ambuj Tewari
arXiv, 2025
code
We integrated machine olfaction into an undergraduate deep learning course, introducing smell as a new modality alongside traditional data types. Hands-on activities and graph neural network exercises enhanced student engagement and comprehension; we discuss remaining challenges and future improvements.
|
|
Beyond Answers: Transferring Reasoning Capabilities to Smaller LLMs Using Multi-Teacher Knowledge Distillation
Yijun Tian*, Yikun Han*, Xiusi Chen*, Wei Wang, Nitesh V. Chawla
International Conference on Web Search and Data Mining (WSDM), 2025
paper
/ arxiv
/ code
We present TinyLLM, a knowledge distillation approach that transfers reasoning abilities from multiple large language models (LLMs) to smaller ones. TinyLLM enables smaller models to generate both accurate answers and rationales, achieving superior performance despite a significantly reduced model size.
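A minimal sketch of the multi-teacher idea on next-token logits, assuming equal teacher weighting; TinyLLM's full objective also distills rationales, which this sketch omits:

```python
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, T=2.0):
    """KL divergence between the student and the teachers' mean distribution."""
    # Soften and average the teachers' distributions (equal weights assumed).
    teacher_probs = torch.stack(
        [F.softmax(t / T, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)
    student_log_probs = F.log_softmax(student_logits / T, dim=-1)
    # The T^2 factor is the standard KD scaling for gradient magnitudes.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * T * T

# Toy usage: random logits over a 10-token vocabulary, batch of 4.
student = torch.randn(4, 10)
teachers = [torch.randn(4, 10) for _ in range(3)]
print(multi_teacher_kd_loss(student, teachers))
```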
|
|
Mapping from Meaning: Addressing the Miscalibration of Prompt-Sensitive Language Models
Kyle Cox, Jiawei Xu, Yikun Han, Abby Xu, Tianhao Li, Chi-Yang Hsu, Tianlong Chen, Walter Gerych, Ying Ding
AAAI Conference on Artificial Intelligence (AAAI), 2025
paper
/ arxiv
/ code
We explore prompt sensitivity in large language models (LLMs), where semantically identical prompts can yield vastly different outputs. By modeling this sensitivity as generalization error, we improve uncertainty calibration using paraphrased prompts. Additionally, we propose a new metric to quantify uncertainty caused by prompt variations, offering insights into how LLMs handle semantic continuity in natural language.
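A minimal sketch of the paraphrase-ensembling intuition: average the model's confidence over semantically equivalent prompts and read the spread as prompt-induced uncertainty. The prompts and probabilities below are illustrative, not the paper's data:

```python
import numpy as np

# Hypothetical confidences one model assigns to the same answer under
# paraphrases of a single question (values are illustrative only).
paraphrase_confidences = {
    "What is the capital of France?":   0.97,
    "Name the capital city of France.": 0.88,
    "France's capital is which city?":  0.71,
}

probs = np.array(list(paraphrase_confidences.values()))

# Calibrated estimate: average confidence over the equivalence class.
mean_conf = probs.mean()
# Prompt-induced uncertainty: spread of confidence across paraphrases.
prompt_sensitivity = probs.std()

print(f"confidence: {mean_conf:.2f}, prompt sensitivity: {prompt_sensitivity:.2f}")
```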
|
|
When Large Language Models Meet Vector Databases: A Survey
Zhi Jing*, Yongye Su*, Yikun Han*
Artificial Intelligence x Multimedia (AIxMM), 2025
paper
/ arxiv
We survey the integration of Large Language Models (LLMs) and Vector Databases (VecDBs), highlighting VecDBs’ role in addressing LLM challenges like hallucinations, outdated knowledge, and memory inefficiencies. This review outlines foundational concepts and explores how VecDBs enhance LLM performance by efficiently managing vector data, paving the way for future advancements in data handling and knowledge extraction.
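A minimal sketch of the LLM-plus-VecDB retrieval pattern the survey covers, with brute-force cosine search and random unit vectors standing in for a real vector database and encoder:

```python
import numpy as np

# Stand-in for a sentence encoder; a real system would use a trained model.
def embed(text: str, dim: int = 64) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

# A toy "vector database": documents indexed by their embeddings.
docs = [
    "LLMs can hallucinate facts.",
    "Vector databases store embeddings.",
    "ANNS trades accuracy for speed.",
]
index = np.stack([embed(d) for d in docs])

def retrieve(query: str, k: int = 2) -> list[str]:
    sims = index @ embed(query)      # cosine similarity (unit vectors)
    return [docs[i] for i in np.argsort(-sims)[:k]]

# The retrieved passages would be prepended to the LLM prompt as context.
print(retrieve("How do embedding stores work?"))
```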
|
|
A Community Detection and Graph-Neural-Network-Based Link Prediction Approach for Scientific Literature
Chunjiang Liu*, Yikun Han*, Haiyun Xu, Shihan Yang, Kaidi Wang, Yongye Su
Mathematics, 2024
paper
/ arxiv
We integrate the Louvain community detection algorithm with various GNN models to improve link prediction in scientific literature networks. This approach consistently boosts performance, with models like GAT seeing AUC increases from 0.777 to 0.823, demonstrating the effectiveness of combining community insights with GNNs.
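A minimal sketch of the combination, assuming networkx >= 2.8 for its built-in Louvain implementation: detect communities, then expose community membership as a feature for the downstream GNN link predictor (the GNN itself is elided here):

```python
import networkx as nx

# Toy graph standing in for a citation network.
G = nx.karate_club_graph()

# Louvain community detection (networkx >= 2.8).
communities = nx.community.louvain_communities(G, seed=42)

# Map each node to its community ID; in the full pipeline this becomes
# an extra input feature for the GNN link predictor.
node2comm = {n: cid for cid, nodes in enumerate(communities) for n in nodes}

def same_community(u: int, v: int) -> int:
    # Same-community node pairs are more likely to be linked.
    return int(node2comm[u] == node2comm[v])

print(same_community(0, 1), same_community(0, 33))
```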
|
|
A Comprehensive Survey on Vector Database: Storage and Retrieval Technique, Challenge
Le Ma*, Ran Zhang*, Yikun Han*, Shirui Yu, Zaitian Wang, Zhiyuan Ning, Jinghan Zhang, Ping Xu, Pengjiang Li, Wei Ju, Chong Chen, Dongjie Wang, Kunpeng Liu, Pengyang Wang, Pengfei Wang, Yanjie Fu, Chunjiang Liu, Yuanchun Zhou, Chang-Tien Lu
arXiv, 2023
arxiv
We present a comprehensive survey of vector database techniques—covering hash-, tree-, graph-, and quantization-based ANNS methods—and outline integration opportunities with large language models for emerging research.
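A minimal sketch of one family the survey covers, quantization-based ANNS: scalar-quantize vectors to uint8 so search runs over a compressed index. This is purely illustrative, not a production codebook:

```python
import numpy as np

rng = np.random.default_rng(1)
db = rng.normal(size=(1000, 32)).astype(np.float32)   # toy vector collection

# Scalar quantization: map each dimension to uint8 over its observed range.
lo, hi = db.min(axis=0), db.max(axis=0)
codes = np.round((db - lo) / (hi - lo) * 255).astype(np.uint8)

def search(query: np.ndarray, k: int = 5) -> np.ndarray:
    # Quantize the query the same way, then rank by L2 distance over codes.
    q = np.round((query - lo) / (hi - lo) * 255).astype(np.float32)
    dists = ((codes.astype(np.float32) - q) ** 2).sum(axis=1)
    return np.argsort(dists)[:k]     # approximate nearest-neighbor indices

print(search(rng.normal(size=32).astype(np.float32)))
```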
|
|
DREAM Olfactory Mixtures Prediction Challenge
Yikun Han, Zehua Wang, Stephen Yang, Ambuj Tewari
RECOMB/ISCB Conference on Regulatory & Systems Genomics with DREAM Challenges, 2024
writeup
/ video
/ code
/ website
/ news
/ slide
We use pre-trained graph neural networks and boosting techniques to predict odor-mixture discriminability, transforming single-molecule embeddings into mixture-level predictions with improved robustness and accuracy.
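A minimal sketch of the embedding-to-mixture step, with random vectors standing in for pre-trained GNN embeddings and scikit-learn's GradientBoostingRegressor standing in for the actual booster:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Stand-ins for pre-trained GNN embeddings: one 16-d vector per molecule.
mol_emb = {f"mol{i}": rng.normal(size=16) for i in range(20)}

def mixture_embedding(molecules: list[str]) -> np.ndarray:
    # Mean-pool single-molecule embeddings into one mixture vector.
    return np.mean([mol_emb[m] for m in molecules], axis=0)

def pair_features(mix_a: list[str], mix_b: list[str]) -> np.ndarray:
    a, b = mixture_embedding(mix_a), mixture_embedding(mix_b)
    # Symmetric pair features: absolute difference and elementwise product.
    return np.concatenate([np.abs(a - b), a * b])

# Toy training pairs with placeholder discriminability labels.
X = np.stack([pair_features(["mol0", "mol1"], [f"mol{i}", f"mol{i + 1}"])
              for i in range(2, 18)])
y = rng.uniform(size=len(X))

model = GradientBoostingRegressor().fit(X, y)
print(model.predict(X[:3]))
```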
|
|
Advisor: Prof. Ambuj Tewari
Research Topics:
[1] Graph Neural Networks
[2] Molecular Property Prediction
[3] Protein-Ligand Affinity Prediction
|
|
Advisor: Prof. Ying Ding, Prof. Jiliang Tang
Research Topics:
[1] Graph Retrieval-Augmented Generation
[2] Medical AI
[3] Collaborator Recommendation
|
|
Advisor: Prof. Nitesh V. Chawla
Research Topics:
[1] Knowledge Distillation
[2] Multi-Teacher Collaboration
[3] In-Context Learning
|
|
Advisor: Prof. Gang Chen
Research Topics:
[1] LAPACK Optimization
[2] Parallel Computation for Large-Scale Matrices
[3] High-Performance Matrix Factorization and Back Substitution
|
|
PhD
Information Sciences
GPA: 4.00/4.00
|
|
Master
Data Science
GPA: 3.97/4.00
|
|
Bachelor
Information Resources Management
GPA: 3.87/4.00
Rank: 2/76
|
|
RSGDREAM Travel Award, 2024
Outstanding Graduate, 2023
Second Prize Scholarship, 2022
Outstanding Student, 2021
Outstanding Student, 2020
|
|
Program Committee Member: GenAI4Health@NeurIPS 2025
Reviewer: ICWSM 2026, AMIA 2026, IEEE TNNLS
|
|