Enabling Strategies for Big Data Analytics in Hybrid Infrastructures

Published in HPCS 2018, 2018

Recommended citation: Julio C. S. Anjos; Kassiano Matteussi; Paulo R. R. De Souza Jr; Alexandre da Silva Veith; Gilles Fedak; Jorge Luis Victoria Barbosa and Claudio R. Geyer

[Paper] [BIBTEX]

Abstract

A huge volume of data is produced every day by social networks (e.g., Facebook, Instagram, WhatsApp), sensors, mobile devices, and other applications. Although Cloud computing has grown rapidly in recent years, it still lacks standardized resource management for Big Data applications such as MapReduce. In this context, users face a major challenge in understanding an application's requirements and in consolidating resources properly. This scenario raises significant challenges in several areas (systems, infrastructure, and platforms) and opens several research opportunities in Big Data Analytics. This work proposes the use of hybrid infrastructures, such as Cloud and Volunteer Computing, for Big Data processing and analysis. In addition, it provides a data distribution model that improves the resource management of Big Data applications in hybrid infrastructures. The results indicate that hybrid infrastructures are feasible, since they support the reproducibility and predictability of Big Data processing in both low- and high-scale simulations.