
Doubly Efficient Fuzzy Private Set Intersection for High-Dimensional Data With Cosine Similarity

  • Hyunjung Son
  • Seunghun Paik
  • Yunki Kim
  • Sunpill Kim
  • Heewon Chung
  • Jae Hong Seo*
  • *Corresponding author for this work
    • Hanyang University

    Research output: Contribution to journal › Journal article › peer-review

    Abstract

    Fuzzy private set intersection (fuzzy PSI) is a cryptographic protocol that enables privacy-preserving similarity matching, where a client securely learns which of its items are sufficiently close to some item in the service provider’s dataset. This functionality is essential for various real-world applications, including biometric authentication, information retrieval, and recommendation systems. However, existing fuzzy PSI protocols face two major barriers to deployment. First, many cannot handle the high-dimensional feature vectors used in practical applications (e.g., 128 to 512 dimensions), because their computation and communication overheads grow exponentially with the dimension. Second, existing protocols supporting L2 distance cannot accommodate cosine similarity: we found that their distributional assumptions on the data enforce matching thresholds that are overly strict for the application, leading to prohibitively many false negatives. In this paper, we present FPHC, the first fuzzy PSI protocol that efficiently handles high-dimensional data with cosine similarity. FPHC features linear complexity in both computation and communication with respect to the dimension of the set elements. We leverage CKKS, a homomorphic encryption scheme supporting approximate real-valued arithmetic, to compute similarity scores and perform threshold comparison, along with a clever packing method for efficiency. Moreover, we introduce a novel proof technique that harmonizes the approximation error from the sign function with noise flooding, proving the security of FPHC in the semi-honest model. Finally, we extend FPHC to functionalities such as labeled and circuit fuzzy PSI. Through experiments, we demonstrate that FPHC can perform fuzzy PSI over 512-dimensional data in a few minutes, which was computationally infeasible for previous fuzzy PSI proposals.
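
    To make the matching rule in the abstract concrete, the following is a minimal plaintext sketch of the functionality a fuzzy PSI with cosine similarity is meant to compute. It is not the FPHC protocol: nothing is encrypted, no CKKS or packing is involved, and it offers no privacy. The function names, the toy 3-dimensional vectors, and the threshold of 0.9 are illustrative assumptions, not values from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero vectors.

    The cost is linear in the dimension: one pass for the dot
    product and one pass for each norm.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def fuzzy_intersection_plain(client_items, server_items, threshold):
    """Indices of client items that are close to SOME server item.

    Hypothetical helper for illustration only: a real fuzzy PSI
    protocol reveals this result to the client without exposing
    either party's vectors, which this plaintext version does not do.
    """
    return [
        i
        for i, c in enumerate(client_items)
        if any(cosine_similarity(c, s) >= threshold for s in server_items)
    ]


# Toy data (assumed for illustration; real feature vectors would
# have 128 to 512 dimensions).
client = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
server = [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]

matches = fuzzy_intersection_plain(client, server, threshold=0.9)
print(matches)  # [0]
```

    Only the first client vector matches, since its cosine similarity with the first server vector is about 0.994, while the second client vector scores at most about 0.110 against any server vector. A secure protocol would have to evaluate both the similarity score and the threshold comparison without revealing the vectors themselves.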

    Original language: English
    Pages (from-to): 217108-217125
    Number of pages: 18
    Journal: IEEE Access
    Volume: 13
    State: Published - 2025

    Keywords

    • Fuzzy matching
    • fuzzy private set intersection
    • homomorphic encryption