A new efficient resource management framework for iterative MapReduce processing in large-scale data analysis

  • Seungtae Hong
  • Kyongseok Park
  • Chae Deok Lim
  • Jae Woo Chang*

  *Corresponding author for this work

    Research output: Contribution to journal, Journal article, peer-review

    Abstract

    To analyze large-scale data efficiently, Hadoop, one of the most popular MapReduce frameworks, has been actively studied. Meanwhile, most large-scale data analysis applications, e.g., data clustering, must apply the same map and reduce functions repeatedly. However, Hadoop cannot provide optimal performance for iterative MapReduce jobs because it derives a result from a single phase of map and reduce functions. To solve this problem, we propose a new efficient resource management framework for iterative MapReduce processing in large-scale data analysis. First, we design an iterative job state machine for managing iterative MapReduce jobs. Second, we propose an invariant data caching mechanism that reduces the I/O cost of data accesses. Third, we propose an iterative resource management technique for efficiently managing the resources of a Hadoop cluster. Fourth, we devise a stop-condition check mechanism that prevents unnecessary computation. Finally, we show the performance superiority of the proposed framework by comparing it with existing frameworks.
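
    The abstract names three ideas that a small example can make concrete: repeating the same map and reduce functions, caching the data that does not change between iterations, and checking a stop condition to avoid unnecessary passes. The sketch below illustrates them with a one-dimensional clustering loop in plain Python. It is not the authors' framework and does not use any Hadoop API; the function names (`map_phase`, `reduce_phase`, `iterative_mapreduce`), the `load_points` callback, and the `tol` and `max_iters` parameters are all assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for each input point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        yield nearest, p


def reduce_phase(pairs, centroids):
    """Reduce: average the points assigned to each centroid."""
    groups = defaultdict(list)
    for key, p in pairs:
        groups[key].append(p)
    # A centroid with no assigned points keeps its previous value.
    return [
        sum(groups[i]) / len(groups[i]) if groups[i] else c
        for i, c in enumerate(centroids)
    ]


def iterative_mapreduce(load_points, centroids, tol=1e-6, max_iters=100):
    """Repeat map and reduce until the centroids stop moving (hypothetical driver)."""
    # Invariant data: the input points never change, so they are loaded once
    # and reused by every iteration instead of being re-read each time.
    points = list(load_points())
    iteration = 0
    for iteration in range(1, max_iters + 1):
        new_centroids = reduce_phase(map_phase(points, centroids), centroids)
        # Stop condition: largest centroid movement in this iteration.
        shift = max(abs(a - b) for a, b in zip(new_centroids, centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids, iteration
```

    Under these assumptions, calling `iterative_mapreduce(lambda: [1.0, 2.0, 9.0, 10.0], [0.0, 5.0])` reads the input once and stops as soon as an iteration leaves the centroids unchanged, rather than running all `max_iters` passes.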

    Original language: English
    Pages (from-to): 704-717
    Number of pages: 14
    Journal: IEICE Transactions on Information and Systems
    Volume: E100D
    Issue number: 4
    State: Published - 2017.04

    Keywords

    • Hadoop
    • Iterative data processing framework
    • Large-scale data analysis
    • MapReduce

    Quacquarelli Symonds (QS) Subject Topics

    • Computer Science & Information Systems
    • Engineering - Electrical & Electronic
    • Engineering - Petroleum
    • Data Science
