Practical Python Data Wrangling and Data Quality
The world around us is full of data that holds unique insights and valuable stories, and this book will help you uncover them. Whether you already work with data or want to learn more about its possibilities, the examples and techniques in this practical book will help you more easily clean, evaluate, and analyze data so that you can generate meaningful insights and compelling visualizations.
Complementing foundational concepts with expert advice, author Susan E. McGregor provides the resources you need to extract, evaluate, and analyze a wide variety of data sources and formats, along with the tools to communicate your findings effectively. This book delivers a methodical, jargon-free way for data practitioners at any level, from true novices to seasoned professionals, to harness the power of data.
Use Python 3.8+ to read, write, and transform data from a variety of sources; Understand and use programming basics in Python to wrangle data at scale; Organize, document, and structu ...
Practical SQL, 2nd Edition
Practical SQL is an approachable and fast-paced guide to SQL (Structured Query Language), the standard programming language for defining, organizing, and exploring data in relational databases. Anthony DeBarros, a journalist and data analyst, focuses on using SQL to find the story within your data. The examples and code use the open-source database PostgreSQL and its companion pgAdmin interface, and the concepts you learn will apply to most database management systems, including MySQL, Oracle, SQLite, and others.
You'll first cover the fundamentals of databases and the SQL language, then build skills by analyzing data from real-world datasets such as US Census demographics, New York City taxi rides, and earthquakes from the US Geological Survey. Each chapter includes exercises and examples that teach even those who have never programmed before all the tools necessary to build powerful databases and access information quickly and efficiently.
You'll learn how to: Create databases and ...
Data Science on the Google Cloud Platform, 2nd Edition
Learn how easy it is to apply sophisticated statistical and machine learning methods to real-world problems when you build using Google Cloud Platform (GCP). This hands-on guide shows data engineers and data scientists how to implement an end-to-end data pipeline with cloud native tools on GCP.
Throughout this updated second edition, you'll work through a sample business decision by employing a variety of data science approaches. Follow along by building a data pipeline in your own project on GCP, and discover how to solve data science problems in a transformative and more collaborative way.
You'll learn how to: Employ best practices in building highly scalable data and ML pipelines on Google Cloud; Automate and schedule data ingest using Cloud Run; Create and populate a dashboard in Data Studio; Build a real-time analytics pipeline using Pub/Sub, Dataflow, and BigQuery; Conduct interactive data exploration with BigQuery; Create a Bayesian model with Spark on Cloud Dataproc; Fore ...
Intermediate Statistics with R
Introductory statistics courses prepare students to think statistically but cover relatively few statistical methods. Building on the basic statistical thinking emphasized in an introductory course, a second course in statistics at the undergraduate level can explore a large number of statistical methods. This text covers more advanced graphical summaries, One-Way ANOVA with pair-wise comparisons, Two-Way ANOVA, Chi-square testing, and simple and multiple linear regression models. Models with interactions are discussed in the Two-Way ANOVA and multiple linear regression setting with categorical explanatory variables. Randomization-based inferences are used to introduce new parametric distributions and to enhance understanding of what evidence against the null hypothesis "looks like". Throughout, the use of the statistical software R via RStudio is emphasized with all useful code and data sets provided within the text. ...
Essential Math for Data Science
Master the math needed to excel in data science, machine learning, and statistics. In this book, author Thomas Nield guides you through areas like calculus, probability, linear algebra, and statistics and how they apply to techniques like linear regression, logistic regression, and neural networks. Along the way you'll also gain practical insights into the state of data science and how to use those insights to maximize your career.
Learn how to: Use Python code and libraries like SymPy, NumPy, and scikit-learn to explore essential mathematical concepts like calculus, linear algebra, statistics, and machine learning; Understand techniques like linear regression, logistic regression, and neural networks in plain English, with minimal mathematical notation and jargon; Perform descriptive statistics and hypothesis testing on a dataset to interpret p-values and statistical significance; Manipulate vectors and matrices and perform matrix decomposition; Integrate and build upon incremental ...
R in Action, 3rd Edition
R in Action, 3rd Edition makes learning R quick and easy. That's why thousands of data scientists have chosen this guide to help them master the powerful language. Far from being a dry academic tome, every example you'll encounter in this book is relevant to scientific and business developers, and helps you solve common data challenges. R expert Rob Kabacoff takes you on a crash course in statistics, from dealing with messy and incomplete data to creating stunning visualizations. This revised and expanded third edition contains fresh coverage of the new tidyverse approach to data analysis and R's state-of-the-art graphing capabilities with the ggplot2 package.
Used daily by data scientists, researchers, and quants of all types, R is the gold standard for statistical data analysis. This free and open source language includes packages for everything from advanced data visualization to deep learning. Instantly comfortable for mathematically minded users, R easily handles practical prob ...
Advanced Analytics with PySpark
The amount of data being generated today is staggering and growing. Apache Spark has emerged as the de facto tool to analyze big data and is now a critical part of the data science toolbox. Updated for Spark 3.0, this practical guide brings together Spark, statistical methods, and real-world datasets to teach you how to approach analytics problems using PySpark, Spark's Python API, and other best practices in Spark programming.
Data scientists Akash Tandon, Sandy Ryza, Uri Laserson, Sean Owen, and Josh Wills offer an introduction to the Spark ecosystem, then dive into patterns that apply common techniques, including classification, clustering, collaborative filtering, and anomaly detection, to fields such as genomics, security, and finance. This updated edition also covers NLP and image processing.
If you have a basic understanding of machine learning and statistics and you program in Python, this book will get you started with large-scale data analysis.
Familiarize yourself wi ...
Beginning Data Science in R 4, 2nd Edition
Discover best practices for data analysis and software development in R and start on the path to becoming a fully-fledged data scientist. Updated for the R 4.0 release, this book teaches you techniques for both data manipulation and visualization and shows you the best way to develop new software packages for R.
Beginning Data Science in R 4, Second Edition details how data science is a combination of statistics, computational science, and machine learning. You'll see how to efficiently structure and mine data to extract useful patterns and build mathematical models. This requires computational methods and programming, and R is an ideal programming language for this.
This book is based on a number of lecture notes for classes the author has taught on data science and statistical programming using the R programming language. Modern data analysis requires computational skills and usually a minimum of programming. ...
Even You Can Learn Statistics and Analytics, 4th Edition
This book discusses statistics and analytics using plain language and avoiding mathematical jargon. If you thought you couldn't learn these data analysis subjects because they were too technical or too mathematical, this book is for you!
This edition delivers more everyday examples and end-of-chapter exercises and contains updated instructions for using Microsoft Excel. You'll use downloadable data sets and spreadsheet solutions, template-based solutions you can put right to work. Using this book, you will understand the important concepts of statistics and analytics, including learning the basic vocabulary of these subjects.
Create tabular and visual summaries and learn to avoid common charting errors; Gain experience working with common descriptive statistics measures, including the mean, median, and mode, as well as standard deviation and variance, among others; Understand the probability concepts that underlie inferential statistics; Learn how to apply hypothesis tests, using Z, t, chi ...
Time Series Analysis with Python Cookbook
Time series data is everywhere, available at a high frequency and volume. It is complex and can contain noise, irregularities, and multiple patterns, making it crucial to be well-versed with the techniques covered in this book for data preparation, analysis, and forecasting.
This book covers practical techniques for working with time series data, starting with ingesting time series data from various sources and formats, whether in private cloud storage, relational databases, non-relational databases, or specialized time series databases such as InfluxDB. Next, you'll learn strategies for handling missing data, dealing with time zones and custom business days, and detecting anomalies using intuitive statistical methods, followed by more advanced unsupervised ML models. The book will also explore forecasting using classical statistical models such as Holt-Winters, SARIMA, and VAR. The recipes will present practical techniques for handling non-stationary data, using power transforms, A ...