|
|
Visual Words Meet BM25: Sparse Auto-encoder Visual Word Scoring for Image Retrieval
Donghoon Han,
Eunhwan Park,
Seunghyeon Seo
Preprint, 2026
arXiv
/
project page
We apply Okapi BM25 scoring to sparse visual words produced by a Sparse Autoencoder over ViT patch tokens,
enabling efficient, interpretable image retrieval that matches dense-retrieval accuracy via a two-stage pipeline.
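The scoring step is standard Okapi BM25 applied to visual-word counts instead of text terms. A minimal sketch (the function name, corpus format, and default `k1`/`b` values are illustrative, not the paper's implementation):

```python
import math
from collections import Counter

def bm25_scores(query_words, doc_word_counts, k1=1.2, b=0.75):
    """Okapi BM25 over sparse visual-word counts.

    query_words: list of visual-word ids for the query image
    doc_word_counts: list of Counter (word id -> count), one per gallery image
    """
    n_docs = len(doc_word_counts)
    avg_len = sum(sum(c.values()) for c in doc_word_counts) / n_docs
    # document frequency: how many gallery images contain each query word
    df = {w: sum(1 for c in doc_word_counts if w in c) for w in set(query_words)}
    scores = []
    for counts in doc_word_counts:
        doc_len = sum(counts.values())
        s = 0.0
        for w in set(query_words):
            tf = counts.get(w, 0)
            idf = math.log((n_docs - df[w] + 0.5) / (df[w] + 0.5) + 1.0)
            s += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))
        scores.append(s)
    return scores
```

Rare visual words get a high IDF weight, so distinctive image structures dominate the ranking, mirroring how BM25 favors rare terms in text retrieval.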
|
|
|
CacheFocus: Dynamic Cache Re-Positioning for Efficient
Retrieval-Augmented Generation
Kun-Hui Lee*,
Eunhwan Park*,
Donghoon Han,
Seung-Hoon Na
Under Review
arXiv
CacheFocus improves long-context handling in LLMs by optimizing cache reuse and reducing inference
latency without additional training. It introduces Layer-Adaptive Cache Pruning and Adaptive
Positional Allocation to enhance efficiency and mitigate performance degradation. Experiments show
that CacheFocus outperforms existing methods, maintaining strong performance even with extended
input lengths.
|
|
|
Layerwise-priority-based gradient adjustment for few-shot learning
Jangho Kum,
Junhoo Lee,
Donghoon Han,
Nojun Kwak
Expert Systems with Applications 2025
paper
/
code
GAIL (Gradient Adjustment in Inner Loop) improves few-shot learning by adjusting per-layer gradient
updates based on priority, enhancing task adaptation. Unlike prior methods that either freeze or
selectively update layers, GAIL aligns gradients across tasks for better convergence. Experiments
show that GAIL outperforms existing inner-loop adaptation techniques in both speed and accuracy.
|
|
|
MERLIN: Multimodal Embedding Refinement via LLM-based Iterative Navigation for
Text-Video Retrieval-Rerank Pipeline
Donghoon Han*,
Eunhwan Park*,
Gisang Lee*,
Adam Lee,
Nojun Kwak
EMNLP 2024 Industry Track
code /
arXiv
MERLIN leverages LLMs in a training-free, iterative feedback pipeline to refine text-video
retrieval, significantly enhancing alignment between user queries and video content, with notable
gains in Recall@1 across datasets.
|
|
|
Unleash the Potential of CLIP for Video Highlight Detection
Donghoon Han*,
Seunghyeon Seo*,
Eunhwan Park,
SeongUk Nam,
Nojun Kwak
CVPR 2024 Workshop on Efficient Large Vision Models
arXiv
We leverage the pre-trained multimodal model CLIP to achieve state-of-the-art performance in video
highlight detection by fine-tuning the encoder and integrating a novel saliency pooling technique.
|
|
|
ConcatPlexer: Additional Dim1 Batching for Faster ViTs
Donghoon Han,
Seunghyeon Seo,
DongHyeon Jeon,
Jiho Jang,
Chaerin Kong,
Nojun Kwak
NeurIPS 2023 Workshop on Advancing Neural Network Training (Oral)
arXiv
We expedite ViT inference by concatenating abstract visual tokens from multiple images along dim=1
and processing them collectively.
|
|
|
MixNeRF: Modeling a Ray with Mixture Density for Novel View Synthesis from Sparse Inputs
Seunghyeon Seo,
Donghoon Han*,
Yeonjin Chang*,
Nojun Kwak
CVPR 2023 (Qualcomm Innovation Fellowship Korea 2023 Winner)
project page
/
code
/
video
/
arXiv
We model a ray with a mixture density model, enabling efficient learning of the density distribution
from sparse inputs, and propose ray depth estimation as an effective auxiliary task for few-shot
novel view synthesis.
|
|
|
Few-shot Image Generation with Mixup-based Distance Learning
Chaerin Kong,
Jeesoo Kim,
Donghoon Han,
Nojun Kwak
ECCV 2022
project page
/
code
/
arXiv
Instead of directly combatting memorization for few-shot (n<100) image synthesis, we propose
latent space smoothing regularizations that empower the generator to produce a diverse
(perceptually continuous) set of samples.
|
|
|
Correlation-Concealing Adversarial Noise
Injection for Improved Disentanglement in
Label-Based Image Translation
Seonguk Park,
Jookyung Song,
Donghoon Han,
Nojun Kwak
IEEE Access 2023
paper
This study reveals a significant limitation in multi-domain image translation models: the inability
to perform effective recursive translations. The authors propose a simple solution using additive
perturbations during training, which not only addresses this issue but also enhances overall
translation quality.
|
|
|
The U-Net based GLOW for Optical-Flow-free Video Interframe Generation
Saem Park,
Donghoon Han,
Nojun Kwak
ICPRAM 2022
arXiv
This study presents a new method for video frame interpolation using an invertible U-Net-based
Generative Flow, avoiding optical-flow estimation. The approach maintains temporal consistency and
image quality, offering an innovative optical-flow-free baseline for video interpolation.
|
Thanks for sharing the website template, Jon Barron. :)
|
|