IT eBooks
Download, Read, Use
Modern Data Engineering with Apache Spark
Modern Data Engineering with Apache Spark

Leverage Apache Spark within a modern data engineering ecosystem. This hands-on guide will teach you how to write fully functional applications, follow industry best practices, and learn the rationale behind these decisions. With Apache Spark as the foundation, you will follow a step-by-step journey beginning with the basics of data ingestion, processing, and transformation, and ending up with an entire local data platform running Apache Spark, Apache Zeppelin, Apache Kafka, Redis, MySQL, Minio (S3), and Apache Airflow. Apache Spark applications solve a wide range of data problems from traditional data loading and processing to rich SQL-based analysis as well as complex machine learning workloads and even near real-time processing of streaming data. Spark fits well as a central foundation for any data engineering workload. This book will teach you to write interactive Spark applications using Apache Zeppelin notebooks, write and compile reusable applications and modules, and fully t ...
Designing Autonomous AI
Designing Autonomous AI

Early rules-based artificial intelligence demonstrated intriguing decision-making capabilities but lacked perception and didn't learn. AI today, primed with machine learning perception and deep reinforcement learning capabilities, can perform superhuman decision-making for specific tasks. This book shows you how to combine the practicality of early AI with deep learning capabilities and industrial control technologies to make robust decisions in the real world. Using concrete examples, minimal theory, and a proven architectural framework, author Kence Anderson demonstrates how to teach autonomous AI explicit skills and strategies. You'll learn when and how to use and combine various AI architecture design patterns, as well as how to design advanced AI without needing to manipulate neural networks or machine learning algorithms. Students, process operators, data scientists, machine learning algorithm experts, and engineers who own and manage industrial processes can use the methodolo ...
How to Lead in Data Science
How to Lead in Data Science

How to Lead in Data Science is full of techniques for leading data science at every seniority level - from heading up a single project to overseeing a whole company's data strategy. Authors Jike Chong and Yue Cathy Chang share hard-won advice that they've developed building data teams for LinkedIn, Acorns, Yiren Digital, large asset-management firms, Fortune 50 companies, and more. You'll find advice on plotting your long-term career advancement, as well as quick wins you can put into practice right away. Carefully crafted assessments and interview scenarios encourage introspection, reveal personal blind spots, and highlight development areas. Lead your data science teams and projects to success! To make a consistent, meaningful impact as a data science leader, you must articulate technology roadmaps, plan effective project strategies, support diversity, and create a positive environment for professional growth. This book delivers the wisdom and practical skills you need to thrive a ...
Grokking Machine Learning
Grokking Machine Learning

Grokking Machine Learning teaches you how to apply ML to your projects using only standard Python code and high school-level math. No specialist knowledge is required to tackle the hands-on exercises using Python and readily available machine learning tools. Packed with easy-to-follow Python-based exercises and mini-projects, this book sets you on the path to becoming a machine learning expert. Discover powerful machine learning techniques you can understand and apply using only high school math! Put simply, machine learning is a set of techniques for data analysis based on algorithms that deliver better results as you give them more data. ML powers many cutting-edge technologies, such as recommendation systems, facial recognition software, smart speakers, and even self-driving cars. This unique book introduces the core concepts of machine learning, using relatable examples, engaging exercises, and crisp illustrations. Grokking Machine Learning presents machine learning algorithm ...
Data Privacy
Data Privacy

Data Privacy teaches you to design, develop, and measure the effectiveness of privacy programs. You'll learn from author Nishant Bhajaria, an industry-renowned expert who has overseen privacy at Google, Netflix, and Uber. The terminology and legal requirements of privacy are all explained in clear, jargon-free language. The book's constant awareness of business requirements will help you balance trade-offs, and ensure your user's privacy can be improved without spiraling time and resource costs. Data privacy is essential for any business. Data breaches, vague policies, and poor communication all erode a user's trust in your applications. You may also face substantial legal consequences for failing to protect user data. Fortunately, there are clear practices and guidelines to keep your data secure and your users happy. Data Privacy: A runbook for engineers teaches you how to navigate the trade-off s between strict data security and real world business needs. In this practical book ...
Understanding Machine Learning
Understanding Machine Learning

The subject of this book is automated learning, or, as we will more often call it, Machine Learning (ML). That is, we wish to program computers so that they can "learn" from input available to them. Roughly speaking, learning is the process of converting experience into expertise or knowledge. The input to a learning algorithm is training data, representing experience, and the output is some expertise, which usually takes the form of another computer program that can perform some task. Seeking a formal-mathematical understanding of this concept, we'll have to be more explicit about what we mean by each of the involved terms: What is the training data our programs will access? How can the process of learning be automated? How can we evaluate the success of such a process (namely, the quality of the output of a learning program)? ...
Python Challenges
Python Challenges

Augment your knowledge of Python with this entertaining learning guide, which features 100 exercises and programming puzzles and solutions. Python Challenges will help prepare you for your next exam or a job interview, and covers numerous practical topics such as strings, data structures, recursion, arrays, and more. Each topic is addressed in its own separate chapter, starting with an introduction to the basics and followed by 10 to 15 exercises of various degrees of difficulty, helping you to improve your programming skills effectively. Detailed sample solutions, including the algorithms used for all tasks, are included to maximize your understanding of each area. Author Michael Inden also describes alternative solutions and analyzes possible pitfalls and typical errors. Three appendices round out the book: the first covers the Python command line interpreter, which is often helpful for trying out the code snippets and examples in the book, followed by an overview of Pytest for ...
Advanced Analytics with PySpark
Advanced Analytics with PySpark

The amount of data being generated today is staggering and growing. Apache Spark has emerged as the de facto tool to analyze big data and is now a critical part of the data science toolbox. Updated for Spark 3.0, this practical guide brings together Spark, statistical methods, and real-world datasets to teach you how to approach analytics problems using PySpark, Spark's Python API, and other best practices in Spark programming. Data scientists Akash Tandon, Sandy Ryza, Uri Laserson, Sean Owen, and Josh Wills offer an introduction to the Spark ecosystem, then dive into patterns that apply common techniques-including classification, clustering, collaborative filtering, and anomaly detection, to fields such as genomics, security, and finance. This updated edition also covers NLP and image processing. If you have a basic understanding of machine learning and statistics and you program in Python, this book will get you started with large-scale data analysis. Familiarize yourself wi ...
Q# Pocket Guide
Q# Pocket Guide

Ready to build quantum computing applications using Q# and the Microsoft Quantum Development Kit? This is the book for you. Q# is a domain-specific language for expressing quantum algorithms that combines familiar "classical" language constructs with quantum-specific ones. Ideal for any developer familiar with (or willing to learn) the basics of quantum computing and looking to get started with quantum programming, this pocket guide quickly helps you find syntax and usage information for unfamiliar aspects of Q#. You'll explore the quantum software development lifecycle from implementing the program to running it on quantum simulators to testing and debugging it. You'll learn to use the tools provided by Microsoft's Quantum Development Kit for each step of the process. You'll explore: Q# language details, including data types, statements, and operators; Guidelines for organizing Q# code and invoking it from different environments; Information on simulators and tools in the Micros ...
In-Memory Analytics with Apache Arrow
In-Memory Analytics with Apache Arrow

Apache Arrow is designed to accelerate analytics and allow the exchange of data across big data systems easily. In-Memory Analytics with Apache Arrow begins with a quick overview of the Apache Arrow format, before moving on to helping you to understand Arrow's versatility and benefits as you walk through a variety of real-world use cases. You'll cover key tasks such as enhancing data science workflows with Arrow, using Arrow and Apache Parquet with Apache Spark and Jupyter for better performance and hassle-free data translation, as well as working with Perspective, an open source interactive graphical and tabular analysis tool for browsers. As you advance, you'll explore the different data interchange and storage formats and become well-versed with the relationships between Arrow, Parquet, Feather, Protobuf, Flatbuffers, JSON, and CSV. In addition to understanding the basic structure of the Arrow Flight and Flight SQL protocols, you'll learn about Dremio's usage of Apache Arrow to e ...
← Prev       Next →
Reproduction of site books is authorized only for informative purposes and strictly for personal, private use.
Only Direct Download
IT eBooks Group © 2011-2025