EFFECTIVE AND EFFICIENT WAY OF REDUCE DEPENDENCY ON DATASET WITH THE HELP OF MAPREDUCE ON BIG DATA

Satish Londhe; Smita Mahajan

doi:10.18510/ijsrtm.2015.364

Submitted: September 30, 2015

Accepted: September 30, 2015

Published: September 30, 2015

Abstract

With the rapid growth of networks, organizations today are overflowing with millions of data records in a large number of combinations. This big data creates challenges for business operations and calls for deeper analysis through high-performance processing. Hadoop and MapReduce methods are discussed from a data mining standpoint. In the proposed research work, we aim to improve performance by parallelizing different operations, such as loading the data, building indexes and evaluating queries. The performance analysis is carried out with a minimum of three nodes in the Amazon cloud environment. HBase is an open-source, non-relational, distributed database that runs on top of Hadoop. It stores a single key with multiple values. Looping is avoided when retrieving particular data from huge datasets, so less time is needed to process the data. The HDFS file system is used to store the data after the MapReduce operations are performed, and the execution time decreases as the number of nodes increases. The performance analysis is tuned with parameters such as the execution complexity.
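The abstract describes parallelizing data loading, index building and query evaluation with MapReduce, then retrieving values by key instead of looping over the whole dataset. What follows is a minimal single-machine Python sketch of that idea under stated assumptions, not the authors' implementation. A thread pool stands in for cluster nodes, and three input splits stand in for the three-node setup. The functions `map_split`, `shuffle`, `reduce_group`, `build_index` and `lookup` are hypothetical names. The resulting dictionary only imitates HBase's model of one row key holding many values and does not use the Hadoop or HBase APIs.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape (assumption): (row key, value).
Record = Tuple[str, str]


def map_split(split: List[Record]) -> List[Tuple[str, str]]:
    """Map phase: emit (key, value) pairs for one input split."""
    # This mapper only normalises the key; a real job would parse raw input.
    return [(key.strip().lower(), value) for key, value in split]


def shuffle(mapped: Iterable[List[Tuple[str, str]]]) -> Dict[str, List[str]]:
    """Shuffle phase: group every emitted value under its key."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_group(key: str, values: List[str]) -> Tuple[str, List[str]]:
    """Reduce phase: collapse one key's values into a sorted, de-duplicated list."""
    return key, sorted(set(values))


def build_index(splits: List[List[Record]], workers: int = 3) -> Dict[str, List[str]]:
    """Run map and reduce on a thread pool (stand-in for nodes); return key -> values."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_split, splits))  # one map task per split
        groups = shuffle(mapped)  # grouping happens between the two phases
        reduced = pool.map(lambda kv: reduce_group(*kv), groups.items())
        return dict(reduced)


def lookup(index: Dict[str, List[str]], key: str) -> List[str]:
    """Point lookup by key: a single hash probe instead of a loop over all records."""
    return index.get(key.lower(), [])


if __name__ == "__main__":
    # Three hypothetical splits, one per assumed node.
    splits = [
        [("user1", "login"), ("User2", "view")],
        [("user1", "purchase"), ("user3", "view")],
        [("user2", "view"), ("user1", "login")],
    ]
    index = build_index(splits)
    print(lookup(index, "user1"))  # ['login', 'purchase']
```

Python threads share one interpreter lock, so this sketch shows how the map, shuffle and reduce phases are structured rather than any real speed-up. On a Hadoop cluster the same phases run on separate nodes, and the index would be stored in HDFS or HBase instead of an in-memory dictionary.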

Keywords

MapReduce, Data Mining, Big Data, Hadoop

License

Authors retain the copyright without restrictions for their published content in this journal. IJSRTM is a SHERPA/RoMEO journal.

Publishing License

This is an open-access article distributed under the terms of

How to Cite

Londhe, S., & Mahajan, S. (2015). EFFECTIVE AND EFFICIENT WAY OF REDUCE DEPENDENCY ON DATASET WITH THE HELP OF MAPREDUCE ON BIG DATA. International Journal of Students’ Research in Technology & Management, 3(6), 401–405. https://doi.org/10.18510/ijsrtm.2015.364
