||Mastering Apache Spark|
Apache Spark is an in-memory cluster based parallel processing system that provides a wide range of functionality like graph processing, machine learning, stream processing and SQL. It operates at unprecedented speeds, is easy to use and offers a rich set of data transformations.
This book aims to take your limited knowledge of Spark to the next level by teaching you how to expand Spark functionality. The book commences with an overview of the Spark eco-system. You will learn how to use MLlib to create a fully working neural net for handwriting recognition. You will then discover how stream processing can be tuned for optimal performance and to ensure parallel processing. The book extends to show how to incorporate H20 for machine learning, Titan for graph based storage, Databricks for cloud-based Spark. Intermediate Scala based code examples are provided for Apache Spark module processing in a CentOS Linux and Databricks cloud environment.
||Apache Mesos Essentials|
Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. It allows developers to concurrently run the likes of Hadoop, Spark, Storm, and other applications on a dynamically shared pool of nodes. With Mesos, you have the power to manage a wide range of resources in a multi-tenant environment.
Starting with the basics, this book will give you an insight into all the features that Mesos has to offer. You will first learn how to set up Mesos in various environments from data centers to the cloud. You will then learn how to implement self-managed Platform as a Service environment with Mesos using various service schedulers, such as Chronos, Aurora, and Marathon. You will then delve into the depths of Mesos fundamentals and learn how to build distributed applications using Mesos primitives.
||Apache Mahout Essentials|
Apache Mahout is a scalable machine learning library with algorithms for clustering, classification, and recommendations. It empowers users to analyze patterns in large, diverse, and complex datasets faster and more scalably.
This book is an all-inclusive guide to analyzing large and complex datasets using Apache Mahout. It explains complicated but very effective machine learning algorithms simply, in relation to real-world practical examples.
Starting from the fundamental concepts of machine learning and Apache Mahout, this book guides you through Apache Mahout's implementations of machine learning techniques including classification, clustering, and recommendations. During this exciting walkthrough, real-world applications, a diverse range of popular algorithms and their implementations, code examples, evaluation strategies, and best practices are given for each technique. Finally, you will learn vdata visualization techniques for Apache Mahout to bring your data to life.
||Apache Solr Enterprise Search Server, 3rd Edition|
Solr is a widely popular open source enterprise search server that delivers powerful search and faceted navigation features—features that are elusive with databases. Solr supports complex search criteria, faceting, result highlighting, query-completion, query spell-checking, relevancy tuning, geospatial searches, and much more.
This book is a comprehensive resource for just about everything Solr has to offer, and it will take you from first exposure to development and deployment in no time. Even if you wish to use Solr 5, you should find the information to be just as applicable due to Solr's high regard for backward compatibility. The book includes some useful information specific to Solr 5.
Get a solid grounding in Apache Oozie, the workflow scheduler system for managing Hadoop jobs. With this hands-on guide, two experienced Hadoop practitioners walk you through the intricacies of this powerful and flexible platform, with numerous examples and real-world use cases.
Once you set up your Oozie server, you'll dive into techniques for writing and coordinating workflows, and learn how to write complex data pipelines. Advanced topics show you how to handle shared libraries in Oozie, as well as how to implement and manage Oozie's security capabilities.
||Learning Apache Mahout|
In the past few years the generation of data and our capability to store and process it has grown exponentially. There is a need for scalable analytics frameworks and people with the right skills to get the information needed from this Big Data. Apache Mahout is one of the first and most prominent Big Data machine learning platforms. It implements machine learning algorithms on top of distributed processing platforms such as Hadoop and Spark.
Starting with the basics of Mahout and machine learning, you will explore prominent algorithms and their implementation in Mahout development. You will learn about Mahout building blocks, addressing feature extraction, reduction and the curse of dimensionality, delving into classification use cases with the random forest and Naïve Bayes classifier and item and user-based recommendation. You will then work with clustering Mahout using the K-means algorithm and implement Mahout without MapReduce. Finish with a flourish by exploring end-to-end use cases on customer analytics and test analytics to get a real-life practical know-how of analytics projects.
||Apache Solr Search Patterns|
Apache Solr is an open source search platform built on a Java library called Lucene. It serves as a search platform for many websites, as it has the capability of indexing and searching multiple websites to fetch desired results.
We begin with a brief introduction of analyzers and tokenizers to understand the challenges associated with implementing large-scale indexing and multilingual search functionality. We then move on to working with custom queries and understanding how filters work internally. While doing so, we also create our own query language or Solr plugin that does proximity searches. Furthermore, we discuss how Solr can be used for real-time analytics and tackle problems faced during its implementation in e-commerce search. We then dive deep into the spatial features such as indexing strategies and search/filtering strategies for a spatial search. We also do an in-depth analysis of problems faced in an ad serving platform and how Solr can be used to solve these problems.
||Apache Solr Essentials|
Search is everywhere. Users always expect a search facility in mobile or web applications that allows them to find things in a fast and friendly manner.
Apache Solr Essentials is a fast-paced guide to help you quickly learn the process of creating a scalable, efficient, and powerful search application. The book starts off by explaining the fundamentals of Solr and then goes on to cover various topics such as data indexing, ways of extending Solr, client APIs and their indexing and data searching capabilities, an introduction to the administration, monitoring, and tuning of a Solr instance, as well as the concepts of sharding and replication. Next, you'll learn about various Solr extensions and how to contribute to the Solr community. By the end of this book, you will be able to create excellent search applications with the help of Solr.
||Apache Hive Essentials|
In this book, we prepare you for your journey into big data by firstly introducing you to backgrounds in the big data domain along with the process of setting up and getting familiar with your Hive working environment. Next, the book guides you through discovering and transforming the values of big data with the help of examples. It also hones your skill in using the Hive language in an efficient manner. Towards the end, the book focuses on advanced topics such as performance, security, and extensions in Hive, which will guide you on exciting adventures on this worthwhile big data journey.
By the end of the book, you will be familiar with Hive and able to work efficiently to find solutions to big data problems.
||Learning Apache Mahout Classification|
This book is a practical guide that explains the classification algorithms provided in Apache Mahout with the help of actual examples. Starting with the introduction of classification and model evaluation techniques, we will explore Apache Mahout and learn why it is a good choice for classification.
Next, you will learn about different classification algorithms and models such as the Naïve Bayes algorithm, the Hidden Markov Model, and so on.
Finally, along with the examples that assist you in the creation of models, this book helps you to build a mail classification system that can be produced as soon as it is developed. After reading this book, you will be able to understand the concept of classification and the various algorithms along with the art of building your own classifiers.
||Learning Apache Kafka, 2nd Edition|
Kafka is one of those systems that is very simple to describe at a high level but has an incredible depth of technical detail when you dig deeper.
Learning Apache Kafka Second Edition provides you with step-by-step, practical examples that help you take advantage of the real power of Kafka and handle hundreds of megabytes of messages per second from multiple clients. This book teaches you everything you need to know, right from setting up Kafka clusters to understanding basic blocks like producer, broker, and consumer blocks. Once you are all set up, you will then explore additional settings and configuration changes to achieve ever more complex goals. You will also learn how Kafka is designed internally and what configurations make it more effective. Finally, you will learn how Kafka works with other tools such as Hadoop, Storm, and so on.