Abstract

With the rapid development of networks, organizations are now overflowing with millions of data records in a large number of combinations. This big data poses challenges for business operations and requires further analysis through high-performance processing. The newer Hadoop and MapReduce methods are discussed from a data mining standpoint. In the proposed research work, we improve performance by parallelizing different operations such as loading the data, building indexes, and evaluating queries. The performance analysis is carried out with a minimum of three nodes in the Amazon cloud environment. HBase is an open-source, non-relational, distributed database model that runs on top of Hadoop. It stores data as a single key with multiple values. Looping is avoided when retrieving a particular data item from huge datasets, so less time is spent processing the data. The HDFS file system is used to store the data after the MapReduce operations are performed, and the execution time decreases as the number of nodes increases. The performance analysis is tuned with parameters such as execution complexity.
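
The single-key, multiple-value layout and the map and reduce steps mentioned above can be illustrated with a minimal single-process sketch. It is not the authors' implementation and uses neither Hadoop nor HBase; the function names (`map_phase`, `shuffle`, `reduce_phase`) and the sample records are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def map_phase(records):
    """Emit one (key, value) pair per input record."""
    for key, value in records:
        yield key, value

def shuffle(pairs):
    """Group values under their key: a single key with multiple values."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)

def reduce_phase(grouped):
    """Combine the values of each key into one result (here, a sum)."""
    return {key: sum(values) for key, values in grouped.items()}

# Hypothetical sample data, not taken from the article.
records = [("sensor-a", 3), ("sensor-b", 5), ("sensor-a", 4)]

grouped = shuffle(map_phase(records))
totals = reduce_phase(grouped)

# A keyed lookup reads the values directly instead of looping over every record.
values_for_a = grouped["sensor-a"]
```

In a real cluster, the map and reduce calls would run in parallel on separate nodes and the grouped output would be written to HDFS; the sketch only shows the data flow.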

Keywords

MapReduce, Data Mining, Big Data, Hadoop

Article Details

How to Cite
Londhe, S., & Mahajan, S. (2015). EFFECTIVE AND EFFICIENT WAY OF REDUCE DEPENDENCY ON DATASET WITH THE HELP OF MAPREDUCE ON BIG DATA. International Journal of Students’ Research in Technology & Management, 3(6), 401–405. https://doi.org/10.18510/ijsrtm.2015.364
