Key Features
- Optimize your data science workflow with Spark and get solutions to your big data problems
- Large-scale data science made easy with Spark
- Get recipes to make the most of Spark’s power and speed in predictive analytics
Book Description
Spark has emerged as the big data platform of choice for data scientists. Apache Spark's real power and value lie in its ability to execute data science tasks at scale. What sets Spark apart is that it combines ETL, batch analytics, real-time stream analysis, machine learning, graph processing, and visualization in a single platform, allowing data scientists to tackle the complexities that come with raw, unstructured datasets.
This hands-on, practical resource lets you dive in and become comfortable and confident working with Spark for data science. We walk you through various techniques for handling both simple and complex data science tasks with Spark, and we offer solutions to challenging data science problems using Spark's data science libraries. At every step, simple yet efficient recipes help you derive meaningful insights, showing you not only how to implement algorithms but also how to optimize your work.
What you will learn
- Explore the topics of data mining, text mining, NLP, information retrieval, and machine learning
- Solve real-world analytical problems with large data sets
- Understand common challenges in data science and address them with a variety of analytical tools on a distributed system such as Spark, whose in-memory processing suits iterative algorithms and offers more flexibility for data analysis at scale
About the Author
Padma Priya Chitturi is Analytics Lead at Fractal Analytics Pvt Ltd and has over four years of experience in Big Data processing. She is currently part of capability development at Fractal, where she is responsible for developing solutions to large-scale analytical problems across multiple business domains. Before this, she worked at Amadeus Software Labs on an airline product built on a real-time processing platform serving one million user requests per second. She has worked on implementing large-scale deep networks for image classification (based on Jeffrey Dean's work at Google Brain) on the big data platform Spark. She works closely with Big Data technologies such as Spark, Storm, Cassandra, and Hadoop. She is an open source contributor to Apache Storm, and her name appears in the Storm community.
She has also authored technical and research articles.