Getting Started with Amazon RedshiftAmazon Redshift is a fast, fully managed, petabyte-scale data warehouse service. It provides an excellent approach to analyzing all your data using your existing business intelligence tools.
Getting Started with Amazon Redshift is an easy-to-read, descriptive guide that breaks down the complex topics of data warehousing and Amazon Redshift. You will learn the fundamentals of Redshift technology and how to implement your own Redshift cluster, through practical, real-world examples. This exciting new technology is a powerful tool in your arsenal of data management and this book is a must-have to implement and manage your next enterprise Data Warehouse. ...
Real-Time Big Data AnalyticsFive or six years ago, analysts working with big datasets made queries and got the results back overnight. The data world was revolutionized a few years ago when Hadoop and other tools made it possible to get the results from queries in minutes. But the revolution continues. Analysts now demand sub-second, near real-time query results. Fortunately, we have the tools to deliver them. This report examines tools and technologies that are driving real-time big data analytics. ...
Think BayesIf you know how to program with Python and also know a little about probability, you're ready to tackle Bayesian statistics. With this book, you'll learn how to solve statistical problems with Python code instead of mathematical notation, and use discrete probability distributions instead of continuous mathematics. Once you get the math out of the way, the Bayesian fundamentals will become clearer, and you'll begin to apply these techniques to real-world problems.
Bayesian statistical methods are becoming more common and more important, but not many resources are available to help beginners. ...
Microsoft SQL Server 2012 with HadoopWith the explosion of data, the open source Apache Hadoop ecosystem is gaining traction, thanks to its huge ecosystem that has arisen around the core functionalities of its distributed file system (HDFS) and Map Reduce. As of today, being able to have SQL Server talking to Hadoop has become increasingly important because the two are indeed complementary. While petabytes of unstructured data can be stored in Hadoop taking hours to be queried, terabytes of structured data can be stored in SQL Server 2012 and queried in seconds. This leads to the need to transfer and integrate data between Hadoop and SQL Server. ...
Getting Started with MariaDBIn the modern age, storing data is of paramount importance, and this is where databases enter the picture. MariaDB is a relatively new database that has become very popular in a short amount of time. It is a community-developed fork of MySQL and it is designed to be an enhanced and backward compatible database solution.
Getting Started with MariaDB is a practical, hands-on, beginner-friendly guide to installing and using MariaDB. This book will start with the installation of MariaDB before moving on to the basics. You will then learn how to configure and maintain your database with the help of real-world examples. ...
Learning Cython ProgrammingCython is a very powerful combination of Python and C. Using Cython, you can write Python code that calls back and forth from and to C or C++ code natively at any point. It is a language with extra syntax allowing for optional static type declarations. It is also a very popular language as it can be used for multicore programming.
Learning Cython Programming will provide you with a detailed guide to extending your native applications in pure Python; imagine embedding a twisted web server into your native application with pure Python code. You will also learn how to get your new applications up and running by reusing Python's extensive libraries such as Logging and Config Parser to name a few. ...
Doing Data ScienceNow that people are aware that data can make the difference in an election or a business model, data science as an occupation is gaining ground. But how can you get started working in a wide-ranging, interdisciplinary field that's so clouded in hype? This insightful book, based on Columbia University's Introduction to Data Science class, tells you what you need to know.
In many of these chapter-long lectures, data scientists from companies such as Google, Microsoft, and eBay share new algorithms, methods, and models by presenting case studies and the code they use. If you're familiar with linear algebra, probability, and statistics, and have programming experience, this book is an ideal introduction to data science. ...
Anonymizing Health DataWith this practical book, you will learn proven methods for anonymizing health data to help your organization share meaningful datasets, without exposing patient identity. Leading experts Khaled El Emam and Luk Arbuckle walk you through a risk-based methodology, using case studies from their efforts to de-identify hundreds of datasets.
Clinical data is valuable for research and other types of analytics, but making it anonymous without compromising data quality is tricky. This book demonstrates techniques for handling different data types, based on the authors' experiences with a maternal-child registry, inpatient discharge abstracts, health insurance claims, electronic medical record databases, and the World Trade Center disaster registry, among others. ...
Thinking with DataMany analysts are too concerned with tools and techniques for cleansing, modeling, and visualizing datasets and not concerned enough with asking the right questions. In this practical guide, data strategy consultant Max Shron shows you how to put the why before the how, through an often-overlooked set of analytical skills.
Thinking with Data helps you learn techniques for turning data into knowledge you can use. You'll learn a framework for defining your project, including the data you want to collect, and how you intend to approach, organize, and analyze the results. You'll also learn patterns of reasoning that will help you unveil the real problem that needs to be solved. ...
KNIME EssentialsKNIME is an open source data analytics, reporting, and integration platform, which allows you to analyze a small or large amount of data without having to reach out to programming languages like R.
KNIME Essentials teaches you all you need to know to start processing your first data sets using KNIME. It covers topics like installation, data processing, and data visualization including the KNIME reporting features. Data processing forms a fundamental part of KNIME, and KNIME Essentials ensures that you are fully comfortable with this aspect of KNIME before showing you how to visualize this data and generate reports. ...