
## 2012

Farivar, Reza; Raghunathan, Anand; Chakradhar, Srimat; Kharbanda, Harshit; Campbell, Roy H.: PIC: Partitioned Iterative Convergence for Clusters. In: *2012 IEEE International Conference on Cluster Computing*, pp. 391-401, IEEE, 2012, ISBN: 978-0-7695-4807-4.

Tags: Clustering algorithms, Computational modeling, Convergence, Data models, Integrated circuit modeling, Partitioning algorithms

```bibtex
@conference{198,
  title        = {PIC: Partitioned Iterative Convergence for Clusters},
  author       = {Farivar, Reza and Raghunathan, Anand and Chakradhar, Srimat and Kharbanda, Harshit and Campbell, Roy H.},
  booktitle    = {2012 IEEE International Conference on Cluster Computing},
  pages        = {391-401},
  publisher    = {IEEE},
  organization = {IEEE},
  isbn         = {978-0-7695-4807-4},
  year         = {2012},
  date         = {2012-01-01}
}
```

Abstract: Iterative-convergence algorithms are frequently used in a variety of domains to build models from large data sets. Cluster implementations of these algorithms are commonly realized using parallel programming models such as MapReduce. However, these implementations suffer from significant performance bottlenecks, especially due to the large volumes of network traffic generated by intermediate data and model updates during the iterations. To address these challenges, we propose partitioned iterative convergence (PIC), a new approach to programming and executing iterative-convergence algorithms on frameworks like MapReduce. In PIC, we execute the iterative-convergence computation in two phases: the best-effort phase, which quickly produces a good initial model, and the top-off phase, which further refines this model to produce the final solution.

The best-effort phase iteratively performs the following steps: (a) partition the input data and the model to create several smaller model-building sub-problems, (b) independently solve these sub-problems using iterative-convergence computations, and (c) merge the solutions of the sub-problems to create the next version of the model. This partitioned, loosely coupled execution produces a model of good quality while drastically reducing the network traffic due to intermediate data and model updates. The top-off phase further refines this model by applying the original iterative-convergence computation to the entire (un-partitioned) problem until convergence. However, the number of iterations executed in the top-off phase is quite small, resulting in a significant overall improvement in performance. We have implemented a library for PIC on top of the Hadoop MapReduce framework and evaluated it using five popular iterative-convergence algorithms (PageRank, k-means clustering, neural network training, a linear equation solver, and image smoothing). Our evaluations on clusters ranging from 6 to 256 nodes demonstrate a 2.5X-4X speedup compared to conventional implementations using Hadoop.
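The best-effort/top-off structure described in the abstract can be sketched in plain Python, using 1-D k-means as the iterative-convergence computation. This is a single-process illustration under assumed details (partition count, merge-by-averaging of sorted sub-models, fixed number of best-effort rounds); the paper's actual library runs on Hadoop MapReduce, and the names below (`pic_kmeans`, `run_to_convergence`, etc.) are hypothetical, not the paper's API.

```python
# Hypothetical single-process sketch of PIC's two phases, with 1-D k-means
# as the iterative-convergence computation. Not the paper's Hadoop library.
import random

def kmeans_step(points, centroids):
    """One iterative-convergence step: assign points, recompute centroids."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        clusters[nearest].append(p)
    # Keep a centroid unchanged if its cluster ends up empty.
    return [sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)]

def run_to_convergence(points, centroids, tol=1e-6, max_iter=100):
    """Iterate until the model stops changing (standard k-means loop)."""
    for _ in range(max_iter):
        new = kmeans_step(points, centroids)
        if max(abs(a - b) for a, b in zip(new, centroids)) < tol:
            return new
        centroids = new
    return centroids

def pic_kmeans(points, k, init=None, num_partitions=4, best_effort_rounds=2):
    centroids = init if init is not None else random.sample(points, k)
    # Best-effort phase, per the abstract:
    # (a) partition the input data into smaller sub-problems,
    parts = [points[i::num_partitions] for i in range(num_partitions)]
    for _ in range(best_effort_rounds):
        # (b) independently solve each sub-problem to convergence,
        sub_models = [sorted(run_to_convergence(p, centroids)) for p in parts]
        # (c) merge sub-solutions (here: average matched centroids)
        #     to form the next version of the model.
        centroids = [sum(m[j] for m in sub_models) / len(sub_models)
                     for j in range(k)]
    # Top-off phase: refine on the entire un-partitioned data set;
    # starting from a good model, this typically needs few iterations.
    return run_to_convergence(points, centroids)
```

In a real MapReduce deployment, step (b) is what avoids the network bottleneck: each sub-problem iterates locally on one partition, so intermediate data and model updates stay node-local until the merge in step (c).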