Skip to main navigation Skip to search Skip to main content

A grid-based k-nearest neighbor join for large scale datasets on MapReduce

  • Miyoung Jang
  • , Young Sung Shin
  • , Jae Woo Chang*
  • *Corresponding author for this work

    Research output: Contribution to conferenceConference paperpeer-review

    Abstract

    Because MapReduce supports efficient parallel data processing, MapReduce-based query processing algorithms have been widely studied. Among various query types, k-nearest neighbor join, which aims to produce the k nearest neighbors of each point of a dataset from another dataset, has been considered most important in data analysis. Existing k-NN join query processing algorithms on MapReduce suffer from high index construction and computation costs which make them unsuitable for big data processing. In this paper, we propose a new grid-based k-NN join query processing algorithm on MapReduce. First, we design a dynamic grid index that represents the distribution of join datasets. Based on this index, we prune out unnecessary cells for the join with the distance-based filtering. This can reduce the data transmission and computation overheads. From performance analysis, we show that our algorithm outperforms the existing scheme up to seven times in terms of query processing time while achieving high query result accuracy.

    Original languageEnglish
    Title of host publicationProceedings - 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security and 2015 IEEE 12th International Conference on Embedded Software and Systems, HPCC-CSS-ICESS 2015
    PublisherInstitute of Electrical and Electronics Engineers Inc.
    Pages888-891
    Number of pages4
    ISBN (Electronic)9781479989362
    DOIs
    StatePublished - 2015.11.23
    Event17th IEEE International Conference on High Performance Computing and Communications, IEEE 7th International Symposium on Cyberspace Safety and Security and IEEE 12th International Conference on Embedded Software and Systems, HPCC-ICESS-CSS 2015 - New York, United States
    Duration: 2015.08.242015.08.26

    Publication series

    NameProceedings - 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security and 2015 IEEE 12th International Conference on Embedded Software and Systems, HPCC-CSS-ICESS 2015

    Conference

    Conference17th IEEE International Conference on High Performance Computing and Communications, IEEE 7th International Symposium on Cyberspace Safety and Security and IEEE 12th International Conference on Embedded Software and Systems, HPCC-ICESS-CSS 2015
    Country/TerritoryUnited States
    CityNew York
    Period15.08.2415.08.26

    Keywords

    • Bitmap encryption
    • Cloud computing
    • Density aware
    • Grid index
    • Location data protection

    Quacquarelli Symonds(QS) Subject Topics

    • Computer Science & Information Systems

    Fingerprint

    Dive into the research topics of 'A grid-based k-nearest neighbor join for large scale datasets on MapReduce'. Together they form a unique fingerprint.

    Cite this