Social Media Mining with R
The growth of social media over the last decade has revolutionized the way individuals interact and industries conduct business. Individuals produce data at an unprecedented rate by interacting, sharing, and consuming content through social media. However, analyzing this ever-growing pile of data is quite tricky and, if done erroneously, could lead to wrong inferences.
By using this essential guide, you will gain hands-on experience with generating insights from social media data. This book provides detailed instructions on how to obtain, process, and analyze a variety of socially-generated data while providing a theoretical background to help you accurately interpret your findings. You will be shown R code and examples of data that can be used as a springboard as you get the chance to undertake your own analyses of business, social, or political data. ...
IPython Interactive Computing and Visualization Cookbook
IPython is at the heart of the Python scientific stack. With its widely acclaimed web-based notebook, IPython is today an ideal gateway to data analysis and numerical computing in Python.
IPython Interactive Computing and Visualization Cookbook contains many ready-to-use, focused recipes for high-performance scientific computing and data analysis. The first part covers programming techniques, including code quality and reproducibility; code optimization; and high-performance computing through dynamic compilation, parallel computing, and graphics card programming. The second part tackles data science, statistics, machine learning, signal and image processing, dynamical systems, and pure and applied mathematics. ...
Fast Data Processing with Spark
Spark is a framework for writing fast, distributed programs. Spark solves problems similar to those addressed by Hadoop MapReduce, but with a fast in-memory approach and a clean functional-style API. With its ability to integrate with Hadoop and its built-in tools for interactive query analysis (Shark), large-scale graph processing and analysis (Bagel), and real-time analysis (Spark Streaming), it can be used interactively to quickly process and query big data sets.
Fast Data Processing with Spark covers how to write distributed MapReduce-style programs with Spark. The book will guide you through every step required to write effective distributed programs, from setting up your cluster and interactively exploring the API to deploying your job to the cluster and tuning it for your purposes. ...
Graphing Data with R
It's much easier to grasp complex data relationships with a graph than by scanning numbers in a spreadsheet. This introductory guide shows you how to use the R language to create a variety of useful graphs for visualizing and analyzing complex data for science, business, media, and many other fields. You'll learn methods for highlighting important relationships and trends, reducing data to simpler forms, and emphasizing key numbers at a glance.
Anyone who wants to analyze data will find something useful here, even without a background in mathematics, statistics, or computer programming. If you want to examine data related to your work, this book is the ideal way to start. ...
Mastering Apache Spark
Apache Spark is an in-memory, cluster-based parallel processing system that provides a wide range of functionality such as graph processing, machine learning, stream processing, and SQL. It operates at unprecedented speeds, is easy to use, and offers a rich set of data transformations.
This book aims to take your limited knowledge of Spark to the next level by teaching you how to expand Spark functionality. The book commences with an overview of the Spark ecosystem. You will learn how to use MLlib to create a fully working neural net for handwriting recognition. You will then discover how stream processing can be tuned for optimal performance and to ensure parallel processing. The book goes on to show how to incorporate H2O for machine learning, Titan for graph-based storage, and Databricks for cloud-based Spark. Intermediate Scala-based code examples are provided for Apache Spark module processing in a CentOS Linux and Databricks cloud environment. ...
Algorithms, 4th Edition
The latest version of Sedgewick's best-selling series, reflecting an indispensable body of knowledge developed over the past several decades.
Full treatment of data structures and algorithms for sorting, searching, graph processing, and string processing, including fifty algorithms every programmer should know.
New Java implementations written in an accessible modular programming style, where all of the code is exposed to the reader and ready to use.
Algorithms are studied in the context of important scientific, engineering, and commercial applications. Clients and algorithms are expressed in real code, not the pseudo-code found in many other books. ...
Storm Real-time Processing Cookbook
Storm is a free and open source distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!
Storm Real-time Processing Cookbook offers basic to advanced recipes on Storm for real-time computation.
The book begins with setting up the development environment and then teaches log stream processing. This is followed by a real-time payments workflow, distributed RPC, integration with other software such as Hadoop and Apache Camel, and more. ...
Mastering QlikView
QlikView and its new sister product, Qlik Sense, are the leading tools for BI and data discovery. They both feature the ability to consolidate relevant data from multiple sources into a single application, an associative data model to allow you to explore the data the way your brain works, and state-of-the-art visualizations, dashboards, analysis, and reports.
The book starts by reviewing the best performance-tuning techniques and then advances to help you discover strategies to improve performance and test scalability with JMeter. You will also learn dimensional data modeling and best-practice ETL techniques using the QlikView script and QlikView's graphical ETL tool, Expressor. Following this, you will deploy QlikView Governance Dashboard to import multiple data sources and view all the information in a single location. Finally, you will learn why virtualization is important and what the best practices for virtualization in QlikView are. ...
RStudio for R Statistical Computing Cookbook
The requirement of handling complex datasets, performing unprecedented statistical analysis, and providing real-time visualizations to businesses has concerned statisticians and analysts across the globe. RStudio is a useful and powerful tool for statistical analysis that harnesses the power of R for computational statistics, visualization, and data science, in an integrated development environment.
This book is a collection of recipes that will help you learn and understand RStudio features so that you can effectively perform statistical analysis and reporting, code editing, and R development. The first few chapters will teach you how to set up your own data analysis project in RStudio, acquire data from different data sources, and manipulate and clean data for analysis and visualization purposes. You'll get hands-on with various data visualization methods using ggplot2, and you will create interactive and multidimensional visualizations with D3.js ...
Building Machine Learning Systems with Python, 2nd Edition
Using machine learning to gain deeper insights from data is a key skill required by modern application developers and analysts alike. Python is a wonderful language to develop machine learning applications. As a dynamic language, it allows for fast exploration and experimentation. With its excellent collection of open source machine learning libraries, you can focus on the task at hand while being able to quickly try out many ideas.
This book shows you exactly how to find patterns in your raw data. You will start by brushing up on your Python machine learning knowledge and getting introduced to the libraries. You'll quickly get to grips with serious, real-world projects on datasets, using modeling and creating recommendation systems. Later on, the book covers advanced topics such as topic modeling, basket analysis, and cloud computing. These will extend your abilities and enable you to create large, complex systems. ...