Few-Shot Pattern Detection via Template Matching and Regression

Eunchan Jo Dahyun Kang Sanghyun Kim Yunseon Choi Minsu Cho
Pohang University of Science and Technology (POSTECH)
✨ ICCV 2025 (Highlight) ✨

TL;DR

We propose TMR, a simple template-matching detector for few-shot pattern detection, achieving strong results on diverse datasets including our new dataset RPINE.

Intro teaser image

Few-shot pattern detection. Given a few exemplars for each target pattern, the task is to detect all matching instances of each pattern. This example includes non-object patterns as well as object patterns.

Abstract

We address the problem of few-shot pattern detection, which aims to detect all instances of a given pattern, typically represented by a few exemplars, from an input image. Although similar problems have been studied in few-shot object counting and detection (FSCD), previous methods and their benchmarks have narrowed patterns of interest to object categories and often fail to localize non-object patterns. In this work, we propose a simple yet effective detector based on template matching and regression, dubbed TMR. While previous FSCD methods typically represent target exemplars as spatially collapsed prototypes and lose structural information, we revisit classic template matching and regression. It effectively preserves and leverages the spatial layout of exemplars through a minimalistic structure with a small number of learnable convolutional or projection layers on top of a frozen backbone. We also introduce a new dataset, dubbed RPINE, which covers a wider range of patterns than existing object-centric datasets. Our method outperforms the state-of-the-art methods on the three benchmarks, RPINE, FSCD-147, and FSCD-LVIS, and demonstrates strong generalization in cross-dataset evaluation.

Method Overview

Overall network architecture

The proposed method, dubbed template matching and regression (TMR), is designed to be aware of the structure and shape of the given exemplars. Given an input image, TMR first extracts a feature map with a backbone network. It then crops a template feature from the support exemplar's bounding box using an RoIAlign-based template extraction step. This template is correlated with the image feature map to produce a template-matching feature map. From this correlation map, the model learns bounding-box regression parameters that adaptively rectify the template box size. This process, termed template-conditioned regression, lets the model handle support exemplars of varying sizes effectively. Notably, TMR consists of only a few 3×3 convolution and linear projection layers, without any complicated modules such as cross-attention.
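As a concrete illustration, here is a minimal PyTorch sketch of this pipeline. It is an assumption-laden toy version, not the released implementation: the module names, channel sizes, template size, and single-exemplar layout are all illustrative choices.

import torch
import torch.nn.functional as F
from torchvision.ops import roi_align

class TMRSketch(torch.nn.Module):
    def __init__(self, feat_dim=256, template_size=7):
        super().__init__()
        self.template_size = template_size
        # Small learnable projection on top of frozen backbone features.
        self.proj = torch.nn.Conv2d(feat_dim, feat_dim, kernel_size=3, padding=1)
        # Regression head over the matching map: a score plus
        # (dx, dy, dw, dh) offsets that rectify the template box.
        self.reg_head = torch.nn.Conv2d(1, 5, kernel_size=3, padding=1)

    def forward(self, feat, exemplar_box, stride=16):
        # feat: frozen backbone feature map, shape (1, C, H, W)
        # exemplar_box: (x1, y1, x2, y2) in image pixels, float tensor
        feat = self.proj(feat)
        # Template extraction: RoIAlign crops the exemplar region into a
        # fixed-size template, preserving its spatial layout.
        rois = torch.cat([torch.zeros(1, 1, device=feat.device),
                          exemplar_box.view(1, 4).to(feat.device)], dim=1)
        template = roi_align(feat, rois, output_size=self.template_size,
                             spatial_scale=1.0 / stride)  # (1, C, k, k)
        # Template matching: cosine-style correlation, with the template
        # acting as a convolution kernel over the image features.
        t = F.normalize(template.flatten(1), dim=1).view_as(template)
        corr = F.conv2d(F.normalize(feat, dim=1), t,
                        padding=self.template_size // 2)  # (1, 1, H, W)
        # Template-conditioned regression on the correlation map.
        out = self.reg_head(corr)
        return out[:, :1], out[:, 1:]  # matching score, box offsets

In this sketch, peaks in the matching score serve as candidate pattern locations, and the regressed offsets rescale the exemplar-sized box at each location before non-maximum suppression.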

Proposed Dataset: RPINE

RPINE dataset overview image

Existing benchmarks (e.g., FSCD-147, FSCD-LVIS) mainly target object-level patterns, limiting the evaluation of general pattern detection. To address this, we introduce a new dataset, Repeated Patterns IN Everywhere (RPINE), which covers diverse repeated patterns in the real world. RPINE contains images with varying degrees of objectness, from well-defined object-level patterns to non-object patterns, all annotated with bounding boxes via crowd-sourcing. Compared to FSCD datasets, RPINE provides broader coverage, including both non-object patterns and nameless parts of objects.

Results

Quantitative comparison

Method       SD   MAE (↓)   RMSE (↓)   AP (↑)   AP50 (↑)   AP75 (↑)
C-DETR              9.58     21.24     13.88     32.20     10.22
SAM-C              18.77     37.14     18.80     34.04     18.74
PseCo              48.20     88.16     23.18     44.54     21.24
GeCo                9.57     17.07     23.33     45.93     21.19
TMR (ours)          8.45     19.87     33.59     64.05     30.52
TMR (ours)          8.30     19.40     29.66     58.94     25.41

One-shot pattern counting and detection results on the RPINE dataset.
SD denotes box refinement with the SAM decoder. All models are trained with their official code.
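For reference, the MAE and RMSE columns follow the standard counting-error definitions over per-image counts. A short sketch (independent of the paper's evaluation code), where a predicted count would simply be the number of detected boxes in an image:

import numpy as np

def counting_errors(pred_counts, gt_counts):
    # Standard counting metrics over per-image instance counts.
    pred = np.asarray(pred_counts, dtype=float)
    gt = np.asarray(gt_counts, dtype=float)
    mae = np.abs(pred - gt).mean()             # Mean Absolute Error
    rmse = np.sqrt(((pred - gt) ** 2).mean())  # Root Mean Squared Error
    return mae, rmse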
                  Seen                Unseen
Method         AP (↑)   AP50 (↑)   AP (↑)   AP50 (↑)
FSDetView-PB    2.72      7.57      1.03      2.89
AttRPN-PB       4.08     11.15      3.15      7.87
C-DETR          4.92     14.49      3.85     11.28
DAVE            6.75     22.51      4.12     14.16
PseCo          22.37     42.56       -         -
GeCo             -         -       11.47     24.49
TMR (ours)     27.49     48.48     22.71     39.68

Three-shot results of detection-based counting methods on the FSCD-LVIS seen and unseen splits.

Cross-dataset comparison

                               AP              AP50
Train      Test             GeCo    TMR     GeCo    TMR
FSCD-147   FSCD-147         43.42   44.43   75.06   73.83
FSCD-147   FSCD-LVIS seen   13.96   21.25   25.87   37.18
FSCD-147   RPINE            19.47   26.21   38.69   52.01
RPINE      FSCD-147         36.99   41.39   60.38   69.19
RPINE      FSCD-LVIS seen   10.01   20.92   17.44   37.87
RPINE      RPINE            23.33   29.66   45.93   58.94

Cross-dataset comparison of GeCo and TMR (trained on one dataset, tested on another).
TMR outperforms GeCo across nearly all train/test combinations, demonstrating strong generalization.

Qualitative Results

qualitative image
Qualitative comparison with the state-of-the-art models on RPINE (the first two images) and FSCD-147 (the last two images).
qualitative image
Additional qualitative results on the FSCD-147 dataset.

References

PseCo: Huang, Zhizhong, et al. "Point, segment and count: A generalized framework for object counting." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

GeCo: Pelhan, Jer, et al. "GeCo: A novel unified architecture for low-shot counting by detection and segmentation." In Advances in Neural Information Processing Systems (NeurIPS), 2024, pp. 66260-66282.

BibTeX

@inproceedings{jo2025tmr,
  title     = {Few-Shot Pattern Detection via Template Matching and Regression},
  author    = {Eunchan Jo and Dahyun Kang and Sanghyun Kim and Yunseon Choi and Minsu Cho},
  booktitle = {International Conference on Computer Vision (ICCV)},
  year      = {2025},
}