Data Pipelines Pocket Reference
Data pipelines are the foundation for success in data analytics. Moving data from numerous diverse sources and transforming it to provide context is the difference between having data and actually gaining value from it. This pocket reference defines data pipelines and explains how they work in today's modern data stack.
You'll learn common considerations and key decision points when implementing pipelines, such as batch versus streaming data ingestion and build versus buy. This book addresses the most common decisions made by data professionals and discusses foundational concepts that apply to open source frameworks, commercial products, and homegrown solutions.
You'll learn: What a data pipeline is and how it works; How data is moved and processed on modern data infrastructure, including cloud platforms; Common tools and products used by data engineers to build pipelines; How pipelines support analytics and reporting needs; Considerations for pipeline maintenance, testing ...
The Economics of Data, Analytics, and Digital Transformation
In today's digital era, every organization has data, but just possessing enormous amounts of data is not a sufficient market discriminator. The Economics of Data, Analytics, and Digital Transformation aims to provide actionable insights into the real market discriminators, including an organization's data-fueled analytics products that inspire innovation, deliver insights, help make practical decisions, generate value, and produce mission success for the enterprise.
The book begins by building your mindset to be value-driven and introducing the Big Data Business Model Maturity Index, its maturity index phases, and how to navigate the index. You will explore value engineering, where you will learn how to identify key business initiatives, stakeholders, advanced analytics data sources, and instrumentation strategies that are essential to data science success. The book will help you accelerate and optimize your company's operations through AI and machine learn ...
Advanced Analytics in Power BI with R and Python
This easy-to-follow guide provides R and Python recipes to help you learn and apply the top languages in the field of data analytics to your work in Microsoft Power BI. Data analytics expert and author Ryan Wade shows you how to use R and Python to perform tasks that are extremely hard, if not impossible, to do using native Power BI tools. For example, you will learn to score Power BI data using custom data science models and powerful models from Microsoft Cognitive Services.
The R and Python languages are powerful complements to Power BI. They enable advanced data transformation techniques that are difficult to perform in Power BI in its default configuration but become easier by leveraging the capabilities of R and Python. If you are a business analyst, data analyst, or a data scientist who wants to push Power BI and transform it from being just a business intelligence tool into an advanced data analytics tool, then this is the book to help you do that. ...
IoT and Edge Computing for Architects, 2nd Edition
Industries are embracing IoT technologies to improve operational expenses, product life, and people's well-being. An architectural guide is needed if you want to traverse the spectrum of technologies needed to build a successful IoT system, whether that's a single device or millions of IoT devices.
IoT and Edge Computing for Architects, Second Edition encompasses the entire spectrum of IoT solutions, from IoT sensors to the cloud. It examines modern sensor systems, focusing on their power and functionality. It also looks at communication theory, paying close attention to near-range PAN, including the new Bluetooth® 5.0 specification and mesh networks. Then, the book explores IP-based communication in LAN and WAN, including 802.11ah, 5G LTE cellular, Sigfox, and LoRaWAN. It also explains edge computing, routing and gateways, and their role in fog computing, as well as the messaging protocols of MQTT 5.0 and CoAP.
With the data now in internet form, you'll get an understanding of ...
Beginning Apache Spark Using Azure Databricks
Analyze vast amounts of data in record time using Apache Spark with Databricks in the Cloud. Learn the fundamentals, and more, of running analytics on large clusters in Azure and AWS, using Apache Spark with Databricks on top. Discover how to squeeze the most value out of your data at a mere fraction of what classical analytics solutions cost, while at the same time getting the results you need, incrementally faster.
This book explains how the confluence of these pivotal technologies gives you enormous power, and cheaply, when it comes to huge datasets. You will begin by learning how cloud infrastructure makes it possible to scale your code to large numbers of processing units, without having to pay for the machinery in advance. From there you will learn how Apache Spark, an open source framework, can enable all those CPUs for data analytics use. Finally, you will see how services such as Databricks provide the power of Apache Spark, without you having to know a ...
Pro Power BI Desktop, 3rd Edition
Deliver eye-catching and insightful business intelligence with Microsoft Power BI Desktop. This new edition has been updated to cover all the latest features of Microsoft's continually evolving visualization product. New in this edition is help with storytelling - adapted to PCs, tablets, and smartphones - and the building of a data narrative. You will find coverage of templates and JSON style sheets, data model annotations, and the use of composite data sources. Also provided is an introduction to incorporating Python visuals and the much-awaited Decomposition Tree visual.
Pro Power BI Desktop shows you how to use source data to produce stunning dashboards and compelling reports that you mold into a data narrative to seize your audience's attention. Slice and dice the data with remarkable ease and then add metrics and KPIs to project the insights that create your competitive advantage. Convert raw data into clear, accurate, and interactive information with Microsoft's free self-ser ...
Beginning Microsoft Power BI
Analyze company data quickly and easily using Microsoft's powerful data tools. Learn to build scalable and robust data models, clean and combine different data sources effectively, and create compelling and professional visuals.
Beginning Power BI is a hands-on, activity-based guide that takes you through the process of analyzing your data using the tools that encompass the core of Microsoft's self-service BI offering. Starting with Power Query, you will learn how to get data from a variety of sources, and see just how easy it is to clean and shape the data prior to importing it into a data model. Using Power BI tabular and Data Analysis Expressions (DAX), you will learn to create robust, scalable data models that will serve as the foundation of your data analysis. From there you will enter the world of compelling interactive visualizations to analyze and gain insight into your data. You will wrap up your Power BI journey by learning how to package and share your reports an ...
Explore the modern market of data analytics platforms and the benefits of using Snowflake computing, the data warehouse built for the cloud.
With the rise of cloud technologies, organizations prefer to deploy their analytics using cloud providers such as Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform. Cloud vendors are offering modern data platforms for building cloud analytics solutions to collect data and consolidate it into single storage solutions that provide insights for business users. The core of any analytics framework is the data warehouse, and previously customers did not have many choices of platform to use.
Snowflake was built specifically for the cloud and it is a true game changer for the analytics market. This book will help onboard you to Snowflake and present best practices for deploying and using the Snowflake data warehouse. In addition, it covers modern analytics architecture and use cases. It provides use ...
Text Analytics with Python, 2nd Edition
Leverage Natural Language Processing (NLP) in Python and learn how to set up your own robust environment for performing text analytics. This second edition has gone through a major revamp and introduces several significant changes and new topics based on the recent trends in NLP.
You'll see how to use the latest state-of-the-art frameworks in NLP, coupled with machine learning and deep learning models for supervised sentiment analysis powered by Python to solve actual case studies. Start by reviewing Python for NLP fundamentals on strings and text data and move on to engineering representation methods for text data, including both traditional statistical models and newer deep learning-based embedding models. Improved techniques and new methods around parsing and processing text are discussed as well.
Text summarization and topic models have been overhauled so the book showcases how to build, tune, and interpret topic models in the context of an interesting dataset on NIPS conferenc ...
Understanding Azure Data Factory
Improve your analytics and data platform to solve major challenges, including operationalizing big data and advanced analytics workloads on Azure. You will learn how to monitor complex pipelines, set alerts, and extend your organization's custom monitoring requirements.
This book starts with an overview of Azure Data Factory as a hybrid ETL/ELT orchestration service on Azure. The book then dives into data movement and the connectivity capability of Azure Data Factory. You will learn about the support for hybrid data integration from disparate sources such as on-premises systems, the cloud, or SaaS applications. Detailed guidance is provided on how to transform data and on control flow. Demonstration of operationalizing the pipelines and ETL with SSIS is included. You will learn how to leverage Azure Data Factory to run existing SSIS packages. As you advance through the book, you will wrap up by learning how to create a single pane for end-to-end monitoring, which is a key s ...
Apache Spark 2: Data Processing and Real-Time Analytics
Apache Spark is an in-memory, cluster-based data processing system that provides a wide range of functionalities such as big data processing, analytics, machine learning, and more. With this Learning Path, you can take your knowledge of Apache Spark to the next level by learning how to expand Spark's functionality and build your own data flow and machine learning programs on this platform.
You will work with the different modules in Apache Spark, such as interactive querying with Spark SQL, using DataFrames and datasets, implementing streaming analytics with Spark Streaming, and applying machine learning and deep learning techniques on Spark using MLlib and various external tools.
By the end of this elaborately designed Learning Path, you will have all the knowledge you need to master Apache Spark, and build your own big data processing and analytics pipeline quickly and without any hassle. ...