IT eBooks
Download, Read, Use
Beginning Data Science in R 4, 2nd Edition

Discover best practices for data analysis and software development in R and start on the path to becoming a fully-fledged data scientist. Updated for the R 4.0 release, this book teaches you techniques for both data manipulation and visualization and shows you how best to develop new software packages for R. Beginning Data Science in R 4, Second Edition details how data science is a combination of statistics, computational science, and machine learning. You'll see how to efficiently structure and mine data to extract useful patterns and build mathematical models. This requires computational methods and programming, and R is an ideal programming language for this. This book is based on a number of lecture notes for classes the author has taught on data science and statistical programming using the R programming language. Modern data analysis requires computational skills and usually a minimum of programming. ...
Data Engineering with Alteryx

Alteryx is a GUI-based development platform for data analytic applications. Data Engineering with Alteryx will help you leverage Alteryx's code-free aspects which increase development speed while still enabling you to make the most of the code-based skills you have. This book will teach you the principles of DataOps and how they can be used with the Alteryx software stack. You'll build data pipelines with Alteryx Designer and incorporate the error handling and data validation needed for reliable datasets. Next, you'll take the data pipeline from raw data, transform it into a robust dataset, and publish it to Alteryx Server following a continuous integration process. By the end of this Alteryx book, you'll be able to build systems for validating datasets, monitoring workflow performance, managing access, and promoting the use of your data sources. ...
Time Series Analysis with Python Cookbook

Time series data is everywhere, available at a high frequency and volume. It is complex and can contain noise, irregularities, and multiple patterns, making it crucial to be well-versed in the techniques covered in this book for data preparation, analysis, and forecasting. This book covers practical techniques for working with time series data, starting with ingesting time series data from various sources and formats, whether in private cloud storage, relational databases, non-relational databases, or specialized time series databases such as InfluxDB. Next, you'll learn strategies for handling missing data, dealing with time zones and custom business days, and detecting anomalies using intuitive statistical methods, followed by more advanced unsupervised ML models. The book will also explore forecasting using classical statistical models such as Holt-Winters, SARIMA, and VAR. The recipes will present practical techniques for handling non-stationary data, using power transforms, A ...
DevOps in Python, 2nd Edition

Take advantage of Python to automate complex systems with readable code. This new edition will help you move from operations/system administration into easy-to-learn coding. You'll start by writing command-line scripts and automating simple DevOps-style tasks followed by creating reliable and fast unit tests designed to avoid incidents caused by buggy automation. You'll then move on to more advanced cases, like using Jupyter as an auditable remote-control panel and writing Ansible and Salt extensions. The updated information in this book covers best practices for deploying and updating Python applications. This includes Docker, modern Python packaging, and internal Python package repositories. You'll also see how to use the AWS API, and the Kubernetes API, and how to automate Docker container image building and running. Finally, you'll work with Terraform from Python to allow more flexible templating and customization of environments. ...
Hands-on Machine Learning with Python

Here is the perfect comprehensive guide for readers with basic to intermediate level knowledge of machine learning and deep learning. It introduces tools such as NumPy for numerical processing, Pandas for panel data analysis, Matplotlib for visualization, Scikit-learn for machine learning, and PyTorch for deep learning with Python. It also serves as a long-term reference manual for practitioners, who will find solutions to commonly occurring scenarios. The book is divided into three sections. The first section introduces you to number crunching and data analysis tools using Python with in-depth explanations of environment configuration, data loading, numerical processing, data analysis, and visualizations. The second section covers machine learning basics and the Scikit-learn library. It also explains supervised learning, unsupervised learning, implementation, and classification of regression algorithms, and ensemble learning methods in an easy manner with theoretical and practical le ...
Introduction to Scientific Programming with Python

This open book offers an initial introduction to programming for scientific and computational applications using the Python programming language. The presentation style is compact and example-based, making it suitable for students and researchers with little or no prior experience in programming. The book uses relevant examples from mathematics and the natural sciences to present programming as a practical toolbox that can quickly enable readers to write their own programs for data processing and mathematical modeling. These tools include file reading, plotting, simple text analysis, and using NumPy for numerical computations, which are fundamental building blocks of all programs in data science and computational science. At the same time, readers are introduced to the fundamental concepts of programming, including variables, functions, loops, classes, and object-oriented programming. Accordingly, the book provides a sound basis for further computer science and programming studies. ...
Fundamentals of Data Engineering

Data engineering has grown rapidly in the past decade, leaving many software engineers, data scientists, and analysts looking for a comprehensive view of this practice. With this practical book, you'll learn how to plan and build systems to serve the needs of your organization and customers by evaluating the best technologies available through the framework of the data engineering lifecycle. Authors Joe Reis and Matt Housley walk you through the data engineering lifecycle and show you how to stitch together a variety of cloud technologies to serve the needs of downstream data consumers. You'll understand how to apply the concepts of data generation, ingestion, orchestration, transformation, storage, and governance that are critical in any data environment regardless of the underlying technology. This book will help you: Get a concise overview of the entire data engineering landscape; Assess data engineering problems using an end-to-end framework of best practices; Cut through mar ...
Distributed Machine Learning with Python

Reducing time cost in machine learning leads to a shorter waiting time for model training and a faster model updating cycle. Distributed machine learning enables machine learning practitioners to shorten model training and inference time by orders of magnitude. With the help of this practical guide, you'll be able to put your Python development knowledge to work to get up and running with the implementation of distributed machine learning, including multi-node machine learning systems, in no time. You'll begin by exploring how distributed systems work in the machine learning area and how distributed machine learning is applied to state-of-the-art deep learning models. As you advance, you'll see how to use distributed systems to enhance machine learning model training and serving speed. You'll also get to grips with applying data parallel and model parallel approaches before optimizing the in-parallel model training and serving pipeline in local clusters or cloud environments. By the en ...
The Azure Data Lakehouse Toolkit

Design and implement a modern data lakehouse on the Azure Data Platform using Delta Lake, Apache Spark, Azure Databricks, Azure Synapse Analytics, and Snowflake. This book teaches you the intricate details of the Data Lakehouse Paradigm and how to efficiently design a cloud-based data lakehouse using highly performant and cutting-edge Apache Spark capabilities using Azure Databricks, Azure Synapse Analytics, and Snowflake. You will learn to write efficient PySpark code for batch and streaming ELT jobs on Azure. And you will follow along with practical, scenario-based examples showing how to apply the capabilities of Delta Lake and Apache Spark to optimize performance, and secure, share, and manage a high volume, high velocity, and high variety of data in your lakehouse with ease. The patterns of success that you acquire from reading this book will help you hone your skills to build high-performing and scalable ACID-compliant lakehouses using flexible and cost-efficient decoupled sto ...
Machine Learning for Streaming Data with Python

Streaming data is the new top technology to watch out for in the field of data science and machine learning. As business needs become more demanding, many use cases require real-time analysis as well as real-time machine learning. This book will help you to get up to speed with data analytics for streaming data and focus strongly on adapting machine learning and other analytics to the case of streaming data. You will first learn about the architecture for streaming and real-time machine learning. Next, you will look at the state-of-the-art frameworks for streaming data like River. Later chapters will focus on various industrial use cases for streaming data like Online Anomaly Detection and others. As you progress, you will discover various challenges and learn how to mitigate them. In addition to this, you will learn best practices that will help you use streaming data to generate real-time insights. By the end of this book, you will have gained the confidence you need to stream ...
Reproduction of site books is authorized only for informative purposes and strictly for personal, private use.
IT eBooks Group © 2011-2025