
A semi-clustering scheme for high performance PageRank on Hadoop

  • Seungtae Hong
  • Jeonghoon Lee
  • Jaewoo Chang
  • Dong Hoon Choi*

  *Corresponding author for this work

    Research output: Contribution to conference, Conference paper, peer-review

    Abstract

    As global Internet business has evolved, large-scale graphs have become common. PageRank computation on large-scale graphs using Hadoop with the default data partitioning method suffers from poor performance because Hadoop scatters even directly connected vertices across arbitrary nodes. In this paper we propose a semi-clustering scheme to address this problem and improve the performance of PageRank on Hadoop. Our scheme divides a graph into a set of semi-clusters, each consisting of connected vertices, and assigns each semi-cluster to a single data partition in order to reduce the cost of data exchange between nodes during the PageRank computation. The semi-clusters are merged and split before the PageRank computation so that a large-scale graph is distributed evenly across the data partitions. Our semi-clustering scheme drastically improves performance: total elapsed time, including the cost of the semi-clustering computation, is reduced by up to 36%. Furthermore, the effectiveness of our scheme increases as the size of the graph increases.
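
    The partitioning idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm and does not use Hadoop: connected groups found by breadth-first search stand in for semi-clusters, and the function names, the `capacity` parameter, and the greedy least-loaded placement are all assumptions made for illustration. The sketch only shows why keeping connected vertices in one partition lowers the number of edges that cross partitions compared with hash-style placement.

```python
from collections import defaultdict, deque


def connected_groups(edges):
    """Group vertices into connected sets via BFS (an assumed stand-in for semi-clusters)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, groups = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), []
        while queue:
            node = queue.popleft()
            group.append(node)
            for nxt in sorted(adj[node]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        groups.append(group)
    return groups


def assign_partitions(groups, num_partitions, capacity):
    """Split oversized groups, then place each piece on the least-loaded partition."""
    pieces = []
    for group in groups:
        # Split step: cut any group larger than `capacity` into chunks.
        for i in range(0, len(group), capacity):
            pieces.append(group[i:i + capacity])
    pieces.sort(key=len, reverse=True)  # place the largest pieces first
    loads = [0] * num_partitions
    owner = {}
    for piece in pieces:
        # Merge step: small pieces end up sharing a partition.
        target = loads.index(min(loads))
        loads[target] += len(piece)
        for vertex in piece:
            owner[vertex] = target
    return owner


def cut_edges(edges, owner):
    """Count edges whose endpoints live on different partitions."""
    return sum(1 for u, v in edges if owner[u] != owner[v])


# Two separate chains: 0-1-2-3 and 4-5-6-7 (toy data, not from the paper).
edges = [(0, 1), (1, 2), (2, 3), (4, 5), (5, 6), (6, 7)]
hashed = {v: v % 2 for v in range(8)}  # hash-style placement by vertex id
clustered = assign_partitions(connected_groups(edges), num_partitions=2, capacity=4)
print(cut_edges(edges, hashed), cut_edges(edges, clustered))  # prints "6 0"
```

    On this toy graph, placing vertices by id modulo 2 cuts every edge, while placing each connected group on one partition cuts none. Lowering `capacity` forces groups to be split, which trades some cut edges for more even partition sizes.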

    Original language: English
    Title of host publication: Advances in Conceptual Modeling - ER 2014 Workshops, ENMO, MoBiD, MReBA, QMMQ, SeCoGIS, WISM, and ER Demos, Proceedings
    Editors: Marta Indulska, Sandeep Purao
    Publisher: Springer Verlag
    Pages: 35-44
    Number of pages: 10
    ISBN (Electronic): 9783319122557
    State: Published - 2014
    Event: 33rd International Conference on Conceptual Modeling, ER 2014 - Atlanta, United States
    Duration: 2014.10.27 - 2014.10.29

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 8823
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: 33rd International Conference on Conceptual Modeling, ER 2014
    Country/Territory: United States
    City: Atlanta
    Period: 14.10.27 - 14.10.29

    Keywords

    • Hadoop
    • Large-scale graph analysis
    • PageRank
    • Semi-clustering

    Quacquarelli Symonds(QS) Subject Topics

    • Computer Science & Information Systems
