The Visual Organization
The era of Big Data has arrived, and most organizations are woefully unprepared. Slowly, many are discovering that stalwarts like Excel spreadsheets, KPIs, standard reports, and even traditional business intelligence tools aren't sufficient. These old standbys can't begin to handle today's increasing streams, volumes, and types of data.
In The Visual Organization, award-winning author and technology expert Phil Simon looks at how an increasing number of organizations are embracing new dataviz tools and, more important, a new mind-set based upon data discovery and exploration. Simon adroitly shows how Amazon, Apple, Facebook, Google, Twitter, and other tech heavyweights use powerful data visualization tools to garner fascinating insights into their businesses. But make no mistake: these companies are hardly alone. Organizations of all types, industries, and sizes are representing their data in new and amazing ways. As a result, they are asking better questions and making better business ...
Learning to Love Data Science
Until recently, many people thought big data was a passing fad. "Data science" was an enigmatic term. Today, big data is taken seriously, and data science is considered downright sexy. With this anthology of reports from award-winning journalist Mike Barlow, you'll appreciate how data science is fundamentally altering our world, for better and for worse.
Barlow paints a picture of the emerging data space in broad strokes. From new techniques and tools to the use of data for social good, you'll find out how far data science reaches. ...
Web Scraping with Python
The Internet contains the most useful set of data ever assembled, largely publicly accessible for free. However, this data is not easily reusable. It is embedded within the structure and style of websites and needs to be carefully extracted to be useful. Web scraping is becoming increasingly useful as a means to easily gather and make sense of the plethora of information available online. Using a simple language like Python, you can crawl the information out of complex websites using simple programming.
This book is the ultimate guide to using Python to scrape data from websites. The early chapters cover how to extract data from static web pages and how to use caching to manage the load on servers. After the basics, we'll get our hands dirty building a more sophisticated crawler with threads and more advanced topics. Learn step-by-step how to use Ajax URLs, employ the Firebug extension for monitoring, and indirectly scrape data. Discover more scraping nitty-gritties such ...
R in Action
R in Action is the first book to present both the R system and the use cases that make it such a compelling package for business developers. The book begins by introducing the R language, including the development environment. Focusing on practical solutions, the book also offers a crash course in practical statistics and covers elegant methods for dealing with messy and incomplete data using features of R.
R is a powerful language for statistical computing and graphics that can handle virtually any data-crunching task. It runs on all important platforms and provides thousands of useful specialized modules and utilities. This makes R a great way to get meaningful information from mountains of raw data. ...
The Definitive Guide to MongoDB, 3rd Edition
The Definitive Guide to MongoDB, Third Edition, is updated for MongoDB 3 and includes all of the latest MongoDB features, including the aggregation framework introduced in version 2.2 and hashed indexes in version 2.4. The Third Edition also now includes Node.js along with Python.
MongoDB is the most popular of the "Big Data" NoSQL database technologies, and it's still growing. David Hows from 10gen and experienced MongoDB authors Peter Membrey and Eelco Plugge share their expertise and experience in teaching you everything you need to know to become a MongoDB pro. ...
Client-Side Data Storage
One of the most useful features of today's modern browsers is the ability to store data right on the user's computer or mobile device. Even as more people move toward the cloud, client-side storage can still save web developers a lot of time and money, if you do it right. This hands-on guide demonstrates several storage APIs in action. You'll learn how and when to use them, their plusses and minuses, and steps for implementing one or more of them in your application.
Ideal for experienced web developers familiar with JavaScript, this book also introduces several open source libraries that make storage APIs easier to work with. ...
Spark for Python Developers
Looking for a cluster computing system that provides high-level APIs? Apache Spark is your answer: an open source, fast, and general-purpose cluster computing system. Spark's multi-stage in-memory primitives provide performance up to 100 times faster than Hadoop, and it is also well suited to machine learning algorithms.
Are you a Python developer inclined to work with the Spark engine? If so, this book will be your companion as you create data-intensive apps using Spark as a processing engine, Python visualization libraries, and web frameworks such as Flask.
To begin with, you will learn the most effective way to install the Python development environment powered by Spark, Blaze, and Bokeh. You will then find out how to connect with data stores such as MySQL, MongoDB, Cassandra, and Hadoop.
You'll expand your skills throughout, becoming familiar with the various data sources (GitHub, Twitter, Meetup, and blogs), their data structures, and solutions to effectively tackle complex ...
Panel Data Analysis using EViews
A comprehensive and accessible guide to panel data analysis using EViews software.
This book explores the use of EViews software in creating panel data analysis using appropriate empirical models and real datasets. Guidance is given on developing alternative descriptive statistical summaries for evaluation and providing policy analysis based on pool panel data. Various alternative models based on panel data are explored, including univariate general linear models, fixed effect models and causal models, and guidance on the advantages and disadvantages of each one is given. ...
Learning Informatica PowerCenter 9.x
Informatica PowerCenter provides the perfect platform to utilize and leverage business data. It allows you to easily, conveniently, and efficiently work on different types of data.
This book covers functionality such as creating and importing sources and targets, identifying errors, and debugging your mappings through a series of comprehensive tutorials. Besides learning about types of Slowly Changing Dimensions (SCDs), you will learn to create and link workflows. As you progress, transformations and techniques to create folders, migrate code, and optimize system performance are explored in detail.
The step-by-step approach and adoption of real-time scenarios will guide you through effectively accessing all core functionalities offered by Informatica PowerCenter. ...
VMware vRealize Operations Performance and Capacity Management
VMware vRealize Operations is a suite of products that automates operations management using patented analytics and an integrated approach to performance, capacity, and configuration management. vRealize Operations Manager is the most important component of this suite; it helps administrators maintain and troubleshoot their VMware environment as well as their physical environment.
This book takes you through the fundamental differences between a Software-Defined Data Center and a classic physical data center, and how these differences impact both architecture and operations. From a strategic point of view, you will come across the most common challenges associated with performance management in a Software-Defined Data Center. Furthermore, you will learn all the key counters in vSphere and vRealize Operations, understand their dependencies, and acquaint yourself with practical solutions to configure them for a healthy virtual environment. ...