Key Features
- Clean, format, and explore data using graphical and numerical summaries
- Leverage the IPython environment to efficiently analyze data with Python
- Packed with easy-to-follow examples to develop advanced computational skills for the analysis of complex data
Book Description
Python, a multi-paradigm programming language, has become the language of choice among data scientists for data analysis, visualization, and machine learning. Have you ever wondered how to become an expert at approaching data analysis problems, solving them, and extracting all of the available information from your data? Look no further: this is the book you want!
Through this comprehensive guide, you will explore data and present results and conclusions from statistical analysis in a meaningful way. You'll be able to quickly and accurately perform hands-on sorting, reduction, and subsequent analysis of data, and fully appreciate how data analysis methods can support business decision-making.
You'll start off by learning about the tools available for data analysis in Python and will then explore the statistical models that are used to identify patterns in data. Gradually, you'll move on to review statistical inference using Python, Pandas, and SciPy. After that, you'll focus on performing regression using computational tools and come to understand the problem of identifying clusters in data algorithmically. Finally, you'll delve into advanced techniques to quantify cause and effect using Bayesian methods and discover how to use Python's tools for supervised machine learning.
What you will learn
- Read, sort, and map various data into Python and Pandas
- Recognize patterns so you can understand and explore data
- Use statistical models to discover patterns in data
- Review classical statistical inference using Python, Pandas, and SciPy
- Detect similarities and differences in data with clustering
- Clean your data to make it useful
- Work in Jupyter Notebook to produce publication-ready figures for inclusion in reports
About the Author
Magnus Vilhelm Persson is a scientist with a passion for Python and open source software usage and development. He obtained his PhD in Physics/Astronomy from Copenhagen University's Centre for Star and Planet Formation (StarPlan) in 2013. Since then, he has continued his research in Astronomy at various academic institutes across Europe. In his research, he uses various types of data and analysis to gain insights into how stars are formed. He has participated in radio shows about Astronomy and also organized workshops and intensive courses about the use of Python for data analysis.
You can check out his web page at http://vilhelm.nu.
Luiz Felipe Martins holds a PhD in applied mathematics from Brown University and has worked as a researcher and educator for more than 20 years. His research is mainly in the field of applied probability. He has been involved in developing code for the open source homework system WeBWorK, for which he wrote a library for the visualization of systems of differential equations. He was supported by an NSF grant for this project. Currently, he is an associate professor in the department of mathematics at Cleveland State University, Cleveland, Ohio, where he has developed several courses in applied mathematics and scientific computing. His current duties include coordinating all first-year calculus sessions.
Table of Contents
- Tools of the Trade
- Exploring Data
- Learning About Models
- Regression
- Clustering
- Bayesian Methods
- Supervised and Unsupervised Learning
- Time Series Analysis
- More on Jupyter Notebook and matplotlib Styles