I am Wes McKinney, creator of the Python pandas project and author of Python for Data Analysis. I have been using Python for data work since 2007 and have worked extensively in the open source community to build accessible and fast data processing tools for Python programmers.
Polars is a modern DataFrame library, built on Apache Arrow, whose adoption has grown rapidly in the Python ecosystem in recent years. It provides many of the same capabilities as pandas, but with substantially better performance and scalability.
I recommend that Python programmers become proficient in both Polars and pandas: pandas is ubiquitous, but more and more work will shift to Polars in the coming years.
Want to speed up your data analysis and work with larger-than-memory datasets? Python Polars offers a blazingly fast, multithreaded, and elegant API for data loading, manipulation, and processing. With this hands-on guide, you'll walk through every aspect of Polars and learn how to tackle practical use cases using real-world datasets.
Jeroen Janssens and Thijs Nieuwdorp from Xomnia in Amsterdam show you how this superfast DataFrame library is perfect for efficient data wrangling, ETL pipelines, and so much more. This book helps you quickly learn the syntax and understand Polars' underlying concepts. You don't need to have experience with pandas or…
This is one of the best books I’ve read on how to write better code and build more maintainable software in Python. It is well-written, concise, and to the point.
Brett’s book is a perfect companion to the other books on this list, which are more focused on data analysis and using specific libraries to build data systems.
It's easy to start developing programs with Python, which is why the language is so popular. However, Python's unique strengths, charms, and expressiveness can be hard to grasp, and there are hidden pitfalls that can easily trip you up. This second edition of Effective Python will help you master a truly "Pythonic" approach to programming, harnessing Python's full power to write exceptionally robust and well-performing code. Using the concise, scenario-driven style pioneered in Scott Meyers' best-selling Effective C++, Brett Slatkin brings together 90 Python best practices, tips, and shortcuts, and explains them with realistic…
While this book overlaps a good deal with mine, it provides a valuable introduction to scikit-learn, one of the most popular machine learning libraries in Python. It also has excellent material for improving your data visualization skills with matplotlib.
For many researchers, Python is a first-class tool mainly because of its libraries for storing, manipulating, and gaining insight from data. Several resources exist for individual pieces of this data science stack, but only with the Python Data Science Handbook do you get them all: IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and other related tools. Working scientists and data crunchers familiar with reading and writing Python code will find this comprehensive desk reference ideal for tackling day-to-day issues: manipulating, transforming, and cleaning data; visualizing different types of data; and using data to build statistical or machine learning models. Quite simply, this is…
This is a very useful, more recently published book that shows how to make the most of pandas’s deep toolbelt of features.
Compared with Python for Data Analysis, it explores some of the newer features added to pandas, and I think any advanced pandas user will become more effective in their day-to-day work by reading it.
Best practices for manipulating data with Pandas. This book will arm you with years of knowledge and experience condensed into an easy-to-follow format. Rather than spending months reading blogs and websites and searching mailing lists and groups, you can learn from this book how to write good Pandas code.
It covers:
Series manipulation
Creating columns
Summary statistics
Grouping, pivoting, and cross-tabulation
Time series data
Visualization
Chaining
Debugging code
and more...
This is a great follow-up book to Python Data Science Handbook.
Co-authored by one of the core developers of scikit-learn, this book provides a deeper introduction to doing machine learning work in Python. It will give you a solid foundation for moving on later to more advanced topics such as deep learning and other areas of AI.
Machine learning has become an integral part of many commercial applications and research projects, but this field is not exclusive to large companies with extensive research teams. If you use Python, even as a beginner, this book will teach you practical ways to build your own machine learning solutions. With all the data available today, machine learning applications are limited only by your imagination. You'll learn the steps necessary to create a successful machine-learning application with Python and the scikit-learn library. Authors Andreas Müller and Sarah Guido focus on the practical aspects of using machine learning algorithms, rather than the…
My book provides a foundation in Python programming and its add-on libraries, enough to become competent at essential data wrangling, analytics, and visualization. Much of the book is focused on pandas, but it also gives an overview of the Python language along with Jupyter, NumPy, and matplotlib.
It's a practical, modern introduction to data science tools in Python. It's ideal for analysts new to Python and for Python programmers new to data science and scientific computing. Data files and related material are available on GitHub.