    Learning Hadoop 2

    By Garry Turkington

    About

    Design and implement data processing, lifecycle management, and analytic workflows with the cutting-edge toolbox of Hadoop 2

    About This Book

    • Construct state-of-the-art applications using higher-level interfaces and tools beyond the traditional MapReduce approach
    • Use the unique features of Hadoop 2 to model and analyze Twitter’s global stream of user-generated data
    • Develop a prototype on a local cluster and deploy to the cloud (Amazon Web Services)

    Who This Book Is For

    If you are a system or application developer interested in learning how to solve practical problems using the Hadoop framework, then this book is ideal for you. You are expected to be familiar with the Unix/Linux command-line interface and have some experience with the Java programming language. Familiarity with Hadoop would be a plus.

    What You Will Learn

    • Write distributed applications using the MapReduce framework
    • Go beyond MapReduce and process data in real time with Samza and iteratively with Spark
    • Familiarize yourself with data mining approaches that work with very large datasets
    • Prototype applications on a VM and deploy them to a local cluster or to a cloud infrastructure (Amazon Web Services)
    • Conduct batch and real-time data analysis using SQL-like tools
    • Build data processing flows using Apache Pig and see how it enables the easy incorporation of custom functionality
    • Define and orchestrate complex workflows and pipelines with Apache Oozie
    • Manage your data lifecycle and changes over time

    In Detail

    This book introduces you to the world of building data-processing applications with the wide variety of tools supported by Hadoop 2. Starting with the core components of the framework—HDFS and YARN—this book will guide you through how to build applications using a variety of approaches.

    You will learn how YARN completely changes the relationship between MapReduce and Hadoop and allows the latter to support more varied processing approaches and a broader array of applications. These include real-time processing with Apache Samza and iterative computation with Apache Spark. Next up, we discuss Apache Pig and the dataflow data model it provides. You will discover how to use Pig to analyze a Twitter dataset.

    You will also learn how to simplify your work with tools such as Apache Hive, Apache Oozie, Hadoop Streaming, Apache Crunch, and Kite SDK. The last part of this book discusses the likely future direction of major Hadoop components and how to get involved with the Hadoop community.
