Key Features
- Become an expert Hadoop administrator and optimize your Hadoop cluster
- Import data into and export data from Hive, and use Oozie to manage workflows
- Practical recipes to help you plan and secure your Hadoop cluster, and make it highly available
Book Description
Hadoop enables distributed storage and processing of large data sets across clusters of computers. Learning to administer Hadoop is crucial for exploiting its unique features. With this book, you will be able to overcome common problems encountered in Hadoop administration.
This book begins by laying the foundation: it walks you through the steps to set up a Hadoop cluster and its various nodes. You will gain a better understanding of how to maintain a Hadoop cluster, especially on the HDFS layer and with YARN and MapReduce. Further, you will explore the durability and high availability of a Hadoop cluster. You will learn about the schedulers in Hadoop and how to configure and use them for your tasks. You will also get hands-on experience with the backup and recovery options and the performance-tuning aspects of Hadoop. Finally, you will gain a better understanding of troubleshooting, diagnostics, and best practices in Hadoop administration.
By the end of this book, you will have a proper understanding of working with Hadoop clusters and will be able to secure and encrypt them and configure auditing for your Hadoop clusters.
What you will learn
- Set up the Hadoop architecture to run a Hadoop cluster smoothly.
- Maintain a Hadoop cluster on HDFS, YARN, and MapReduce.
- Understand high availability with ZooKeeper and JournalNodes.
- Configure Flume for data ingestion and Oozie to run various workflows.
- Tune the Hadoop cluster for optimal performance.
- Schedule jobs on a Hadoop cluster using the Fair and Capacity schedulers.
- Secure your cluster and troubleshoot it for various common pain points.
About the Author
Gurmukh Singh has been an infrastructure engineer for over 10 years and has worked on big data platforms for the past 5 years. He started his career as a field engineer, setting up leased lines and radio links. He has vast experience in enterprise servers and network design, and in scaling infrastructures and tuning them for performance. He is the founder of a small start-up called Netxillon Technologies, which specializes in big data training and consultancy. He speaks at various technical meetups and is an active participant in open source community activities. He writes at http://linuxaddict.org and maintains his GitHub account at https://github.com/gdhillon.