An efficient partition-based filtering for similarity joins on MapReduce framework

  • Miyoung Jang
  • Archana B. Lokhande
  • Naeun Baek
  • Jae Woo Chang*

  *Corresponding author for this work

    Research output: Contribution to conference; Conference paper; peer-review

    Abstract

    Similarity join is an important operation in the MapReduce framework for finding pairs of similar objects such as images, videos, and time series. Because basic MapReduce does not support efficient join processing, reducing duplicate candidates and balancing load among partitions are the major challenges. Recently, many partition-based similarity join algorithms have been proposed to address these problems. However, the existing algorithms still have limitations in supporting efficient join processing over large-scale data sets. In this paper, we propose a similarity join algorithm with an efficient filtering technique on MapReduce that overcomes the limitations of the traditional partitioning method in two ways: (1) the number of output records generated by the filtering matrix reduces duplicates, and (2) the estimated join cost generated by using a partition matrix leads to a better load balance among reducers. Moreover, we have conducted experimental evaluations using sequential data to show the speed-up and scale-up of the proposed method.
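
The two ideas named in the abstract (filtering partition pairs before the join, and using an estimated join cost to balance reducers) can be illustrated with a small single-process sketch. This is not the paper's algorithm: the abstract does not define its filtering matrix or partition matrix, so everything below is an assumption made for illustration, including pivot-based partitioning, the triangle-inequality filter, the cost estimate (number of pairwise comparisons), the greedy assignment, and all function names.

```python
from itertools import combinations
import heapq
import math


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.dist(a, b)


def partition(points, pivots):
    """Assign each point to its nearest pivot and track each partition's radius."""
    parts = {i: [] for i in range(len(pivots))}
    radius = {i: 0.0 for i in range(len(pivots))}
    for p in points:
        i = min(range(len(pivots)), key=lambda k: dist(p, pivots[k]))
        parts[i].append(p)
        radius[i] = max(radius[i], dist(p, pivots[i]))
    return parts, radius


def candidate_pairs(parts, radius, pivots, eps):
    """Filter step (assumed): keep only partition pairs that may hold a match.

    By the triangle inequality, points in partitions i and j are at least
    dist(pivot_i, pivot_j) - radius_i - radius_j apart, so a pair whose
    lower bound exceeds eps is dropped. Returns {(i, j): estimated_cost},
    where the cost is the number of point comparisons the pair requires.
    """
    cost = {}
    for i in parts:
        n = len(parts[i])
        if n > 1:
            cost[(i, i)] = n * (n - 1) // 2
    for i, j in combinations(parts, 2):
        if not parts[i] or not parts[j]:
            continue
        if dist(pivots[i], pivots[j]) - radius[i] - radius[j] <= eps:
            cost[(i, j)] = len(parts[i]) * len(parts[j])
    return cost


def assign_to_reducers(cost, num_reducers):
    """Load-balancing step (assumed): heaviest task goes to the lightest reducer."""
    heap = [(0, r) for r in range(num_reducers)]
    heapq.heapify(heap)
    plan = {r: [] for r in range(num_reducers)}
    for pair, c in sorted(cost.items(), key=lambda kv: -kv[1]):
        load, r = heapq.heappop(heap)
        plan[r].append(pair)
        heapq.heappush(heap, (load + c, r))
    return plan


def similarity_join(points, pivots, eps, num_reducers):
    """Return all pairs of points within eps, simulating reducers in a loop."""
    parts, radius = partition(points, pivots)
    cost = candidate_pairs(parts, radius, pivots, eps)
    plan = assign_to_reducers(cost, num_reducers)
    result = set()
    for tasks in plan.values():  # each iteration stands in for one reducer
        for i, j in tasks:
            if i == j:
                pairs = combinations(parts[i], 2)
            else:
                pairs = ((a, b) for a in parts[i] for b in parts[j])
            for a, b in pairs:
                if dist(a, b) <= eps:
                    result.add((min(a, b), max(a, b)))
    return result
```

Because each point lands in exactly one partition and each surviving partition pair is assigned to exactly one reducer, no candidate pair is compared twice in this sketch; a real MapReduce job would replace the loop over `plan` with a shuffle keyed by reducer id.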

    Original language: English
    Title of host publication: Advanced Multimedia and Ubiquitous Engineering - MUE/FutureTech 2017
    Editors: James J. Park, Shu-Ching Chen, Kim-Kwang Raymond Choo
    Publisher: Springer Verlag
    Pages: 528-533
    Number of pages: 6
    ISBN (Print): 9789811050404
    State: Published - 2017
    Event: 12th International Conference on Future Information Technology, FutureTech 2017 - Seoul, Korea, Republic of
    Duration: 2017.05.22 to 2017.05.24

    Publication series

    Name: Lecture Notes in Electrical Engineering
    Volume: 448
    ISSN (Print): 1876-1100
    ISSN (Electronic): 1876-1119

    Conference

    Conference: 12th International Conference on Future Information Technology, FutureTech 2017
    Country/Territory: Korea, Republic of
    City: Seoul
    Period: 17.05.22 to 17.05.24

    Keywords

    • Join matrix
    • Load balancing
    • MapReduce-based similarity join algorithm
    • Parallel join processing

    Quacquarelli Symonds (QS) Subject Topics

    • Engineering - Mechanical
