Data Analysis with Open Source Tools
Collecting data is relatively easy, but turning raw information into something useful requires that you know how to extract precisely what you need. With this insightful book, intermediate to experienced programmers interested in data analysis will learn techniques for working with data in a business environment. You'll learn how to look at data to discover what it contains, how to capture those ideas in conceptual models, and then feed your understanding back into the organization through business plans, metrics dashboards, and other applications.
Along the way, you'll experiment with concepts through hands-on workshops at the end of each chapter. Above all, you'll learn how to think about the results you want to achieve - rather than rely on tools to think for you. ...
Big Data Glossary
To help you navigate the large number of new data tools available, this guide describes 60 of the most recent innovations, from NoSQL databases and MapReduce approaches to machine learning and visualization tools. Descriptions are based on first-hand experience with these tools in a production environment.
This handy glossary also includes a chapter of key terms that help define many of these tool categories: NoSQL Databases, MapReduce, Storage, Servers, Processing, Natural Language Processing, Machine Learning, Visualization, Acquisition, Serialization. ...
Data Mashups in R
How do you use R to import, manage, visualize, and analyze real-world data? With this short, hands-on tutorial, you learn how to collect online data, massage it into a reasonable form, and work with it using R facilities to interact with web servers, parse HTML and XML, and more. Rather than use canned sample data, you'll plot and analyze current home foreclosure auctions in Philadelphia.
This practical mashup exercise shows you how to access spatial data in several formats locally and over the Web to produce a map of home foreclosures. It's an excellent way to explore how the R environment works with R packages and performs statistical analysis. ...
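The book itself works in R; as a loose illustration only, here is a rough Python analogue of the same mashup workflow, fetching a listing page over HTTP and pulling text out of its markup. The URL, the CSS class, and the field being extracted are all placeholders, not the book's actual data source.

# Rough Python analogue of the fetch-and-parse step described above.
# The URL and the "address" class are illustrative assumptions only.
import urllib.request
from html.parser import HTMLParser

class AddressCollector(HTMLParser):
    """Collect the text of every table cell tagged with class="address"."""
    def __init__(self):
        super().__init__()
        self.in_address = False
        self.addresses = []

    def handle_starttag(self, tag, attrs):
        if tag == "td" and ("class", "address") in attrs:
            self.in_address = True

    def handle_data(self, data):
        if self.in_address and data.strip():
            self.addresses.append(data.strip())
            self.in_address = False

with urllib.request.urlopen("http://example.com/foreclosure-listings") as resp:
    html = resp.read().decode("utf-8", errors="replace")

collector = AddressCollector()
collector.feed(html)
print(collector.addresses)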
Think Python
If you want to learn how to program, working with Python is an excellent way to start. This hands-on guide takes you through the language one step at a time, beginning with basic programming concepts before moving on to functions, recursion, data structures, and object-oriented design.
Through exercises in each chapter, you'll try out programming concepts as you learn them. Think Python is ideal for students at the high school or college level, as well as self-learners, home-schooled students, and professionals who need to learn programming basics. ...
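As a taste of the topics mentioned above, here is a minimal Python sketch in the spirit of the book's early chapters: a recursive function followed by a dictionary used as a simple data structure. The examples are illustrative and are not taken from the book's exercises.

def factorial(n):
    """Compute n! recursively."""
    if n <= 1:
        return 1
    return n * factorial(n - 1)

def word_counts(text):
    """Build a dict mapping each word to how often it appears."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts

print(factorial(5))                      # 120
print(word_counts("the quick brown fox jumps over the lazy dog"))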
Spring Data
You can choose from several data access frameworks when building Java enterprise applications that work with relational databases. But what about big data? This hands-on introduction shows you how Spring Data makes it relatively easy to build applications across a wide range of new data access technologies such as NoSQL and Hadoop.
Through several sample projects, you'll learn how Spring Data provides a consistent programming model that retains NoSQL-specific features and capabilities, and helps you develop Hadoop applications across a wide range of use cases such as data analysis, event stream processing, and workflow. ...
Beginning Python
As an open source, object-oriented programming language, Python is easy to understand, extendable, and user-friendly. This book covers every aspect of Python so that you can get started writing your own programs with Python today. Author James Payne begins with the most basic concepts of the Python language - placing a special focus on the 2.6 and 3.1 versions - and he offers an in-depth look at existing Python programs so you can learn by example. Topics progress from strings, lists, and dictionaries to classes, objects, and modules. With this book, you will learn how to quickly and confidently create a robust, reliable, and reusable Python application. ...
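As a loose illustration of that progression, the sketch below moves from strings and lists to a dictionary and a small class. The names and data are invented for the example and do not come from the book.

class Contact:
    """A tiny class tying the earlier building blocks together."""
    def __init__(self, name, email):
        self.name = name
        self.email = email

    def domain(self):
        # String methods: split the email address on "@" and keep the host part.
        return self.email.split("@")[-1]

contacts = [Contact("Ada", "ada@example.org"), Contact("Alan", "alan@example.com")]
by_domain = {}
for c in contacts:
    by_domain.setdefault(c.domain(), []).append(c.name)
print(by_domain)  # {'example.org': ['Ada'], 'example.com': ['Alan']}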
Bad Data Handbook
What is bad data? Some people consider it a technical phenomenon, like missing values or malformed records, but bad data includes a lot more. In this handbook, data expert Q. Ethan McCallum has gathered 19 colleagues from every corner of the data arena to reveal how they've recovered from nasty data problems.
From cranky storage to poor representation to misguided policy, there are many paths to bad data. Bottom line? Bad data is data that gets in the way. This book explains effective ways to get around it. ...
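As a small, hedged illustration of that "data that gets in the way" idea, the sketch below cleans a toy table containing a malformed number and missing values using pandas. The column names and the cleaning choices are assumptions for the example, not recipes from the handbook.

import io
import pandas as pd

raw = io.StringIO(
    "city,price\n"
    "Philadelphia,125000\n"
    "Philadelphia,not_available\n"
    ",98000\n"
)
df = pd.read_csv(raw)

# Coerce malformed numbers to NaN, then decide explicitly what to do with gaps.
df["price"] = pd.to_numeric(df["price"], errors="coerce")
df = df.dropna(subset=["city"])                    # drop rows with no city at all
df["price"] = df["price"].fillna(df["price"].median())
print(df)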
SciPy and NumPy
Want to learn SciPy and NumPy quickly? Cut through the complexity of online documentation with this concise and illustrated book, and discover how easily you can get up to speed with these Python libraries. You'll understand why they're powerful enough for many of today's leading scientists and engineers.
Learn how to use NumPy for numerical processing, including array indexing, math operations, and loading and saving data. With SciPy, you'll work with advanced mathematical functions such as optimization, interpolation, integration, clustering, statistics, and other tools that take scientific programming to a whole new level. ...
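For a quick sense of what that looks like in practice, here is a short, illustrative sketch using the two libraries side by side: NumPy for array indexing and vectorized math, SciPy for a basic optimization call. The function being minimized is an arbitrary example, not one from the book.

import numpy as np
from scipy import optimize

# NumPy: build an array, slice it, and do vectorized math.
x = np.linspace(0.0, 2.0 * np.pi, 100)
y = np.sin(x)
print(y[:5], y[y > 0.9].size)   # first five samples, count of values above 0.9

# SciPy: minimize a simple one-variable function.
result = optimize.minimize_scalar(lambda t: (t - 1.5) ** 2 + 0.25)
print(result.x)                 # approximately 1.5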
Python
Python: Create-Modify-Reuse is designed for all levels of Python developers interested in a practical, hands-on way of learning Python development. This book is designed to show you how to use Python (in combination with the raw processing power of your computer) to accomplish real-world tasks in a more efficient way. Don't look for an exhaustive description of the Python language - you won't find it. The book's main purpose is not to thoroughly cover the Python language, but rather to show how you can use Python to create robust, real-world applications. ...
Distributed Network Data
Build your own distributed sensor network to collect, analyze, and visualize real-time data about our human environment - including noise level, temperature, and people flow. With this hands-on book, you'll learn how to turn your project idea into working hardware, using the easy-to-learn Arduino microcontroller and off-the-shelf sensors.
Authors Alasdair Allan and Kipp Bradford walk you through the entire process, from prototyping a simple sensor node to performing real-time analysis on data captured by a deployed multi-sensor network. ...
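The node firmware in the book is Arduino code; as a hedged illustration of the host-side collection step only, the Python sketch below assumes a node that prints one "sensor_name,value" line per reading over USB serial. The port name, baud rate, and line format are assumptions, and the pyserial package is required.

import serial  # pip install pyserial

readings = {}
with serial.Serial("/dev/ttyUSB0", 9600, timeout=2) as port:
    for _ in range(100):                      # collect 100 lines, then stop
        line = port.readline().decode("ascii", errors="ignore").strip()
        if not line:
            continue
        try:
            name, value = line.split(",")
            readings.setdefault(name, []).append(float(value))
        except ValueError:
            continue                          # skip malformed lines

for name, values in readings.items():
    print(name, sum(values) / len(values))    # crude per-sensor average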