Build an enterprise search engine using Apache Solr: index and search documents; ingest data from varied sources; apply various text processing techniques; utilize different search capabilities; and customize Solr to retrieve the desired results. Apache Solr: A Practical Approach to Enterprise Search explains each essential concept - backed by practical and industry examples - to help you attain expert-level knowledge.
The book, which assumes a basic knowledge of Java, starts with an introduction to Solr, followed by steps for setting it up, indexing your first set of documents, and searching them. It then introduces you to information retrieval and its implementation in Apache Solr; this will help you understand your search problem, decide on an approach to building an effective solution, and use various metrics to evaluate the results.
The book next covers the schema design and techniques to build a text analysis chain for cleansing, normalizing and enriching your documents and addressing different types of search queries. It describes various popular matching techniques which are generally applied to improve the precision and recall of searches.
You will learn the end-to-end process of data ingestion from varied sources, metadata extraction, pre-processing and transformation of content, various search components, query parsers and other advanced search capabilities.
Maven is the #1 build tool used by developers and it has been around for more than a decade. Maven stands out among other build tools due to its extremely extensible architecture, which is built on the concept of convention over configuration. It's widely used by many open source Java projects under the Apache Software Foundation, SourceForge, Google Code, and more.
Maven Essentials is a fast-paced guide to show you the key concepts in Maven and build automation. We get started by introducing you to Maven and exploring its core concepts and architecture. Next, you will learn about and write a Project Object Model (POM) while creating your own Maven project. You will also find out how to create custom archetypes and plugins to establish the most common goals in build automation. After this, you'll get to know how to design the build to prevent any maintenance nightmares, with proper dependency management. We then explore Maven build lifecycles and Maven assemblies. Finally, you will discover how to apply the best practices when designing a build system to improve developer productivity.
||Learning Apache Thrift|
This book will help you learn the basics of service-oriented systems by building your first Apache Thrift-powered app. Then, progressing to more complex examples, it will provide you with tips for running large-scale applications in production environments.
||Mastering Apache Spark|
Apache Spark is an in-memory, cluster-based parallel processing system that provides a wide range of functionality such as graph processing, machine learning, stream processing, and SQL. It operates at unprecedented speeds, is easy to use, and offers a rich set of data transformations.
This book aims to take your limited knowledge of Spark to the next level by teaching you how to expand Spark functionality. The book commences with an overview of the Spark ecosystem. You will learn how to use MLlib to create a fully working neural net for handwriting recognition. You will then discover how stream processing can be tuned for optimal performance and to ensure parallel processing. The book goes on to show how to incorporate H2O for machine learning, Titan for graph-based storage, and Databricks for cloud-based Spark. Intermediate Scala-based code examples are provided for Apache Spark module processing in a CentOS Linux and Databricks cloud environment.
||Apache Mesos Essentials|
Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. It allows developers to concurrently run the likes of Hadoop, Spark, Storm, and other applications on a dynamically shared pool of nodes. With Mesos, you have the power to manage a wide range of resources in a multi-tenant environment.
Starting with the basics, this book will give you an insight into all the features that Mesos has to offer. You will first learn how to set up Mesos in various environments, from data centers to the cloud. You will then learn how to implement a self-managed Platform as a Service environment with Mesos using various service schedulers, such as Chronos, Aurora, and Marathon. You will then delve into the depths of Mesos fundamentals and learn how to build distributed applications using Mesos primitives.
||Apache Mahout Essentials|
Apache Mahout is a scalable machine learning library with algorithms for clustering, classification, and recommendations. It empowers users to analyze patterns in large, diverse, and complex datasets faster and at greater scale.
This book is an all-inclusive guide to analyzing large and complex datasets using Apache Mahout. It explains complicated but very effective machine learning algorithms simply, in relation to real-world practical examples.
Starting from the fundamental concepts of machine learning and Apache Mahout, this book guides you through Apache Mahout's implementations of machine learning techniques including classification, clustering, and recommendations. During this exciting walkthrough, real-world applications, a diverse range of popular algorithms and their implementations, code examples, evaluation strategies, and best practices are given for each technique. Finally, you will learn data visualization techniques for Apache Mahout to bring your data to life.
||Apache Solr Enterprise Search Server, 3rd Edition|
Solr is a widely popular open source enterprise search server that delivers powerful search and faceted navigation features—features that are elusive with databases. Solr supports complex search criteria, faceting, result highlighting, query-completion, query spell-checking, relevancy tuning, geospatial searches, and much more.
This book is a comprehensive resource for just about everything Solr has to offer, and it will take you from first exposure to development and deployment in no time. Even if you wish to use Solr 5, you should find the information to be just as applicable due to Solr's high regard for backward compatibility. The book includes some useful information specific to Solr 5.
Get a solid grounding in Apache Oozie, the workflow scheduler system for managing Hadoop jobs. With this hands-on guide, two experienced Hadoop practitioners walk you through the intricacies of this powerful and flexible platform, with numerous examples and real-world use cases.
Once you set up your Oozie server, you'll dive into techniques for writing and coordinating workflows, and learn how to write complex data pipelines. Advanced topics show you how to handle shared libraries in Oozie, as well as how to implement and manage Oozie's security capabilities.
||Learning Apache Mahout|
In the past few years, the generation of data and our capability to store and process it have grown exponentially. There is a need for scalable analytics frameworks and people with the right skills to get the information needed from this Big Data. Apache Mahout is one of the first and most prominent Big Data machine learning platforms. It implements machine learning algorithms on top of distributed processing platforms such as Hadoop and Spark.
Starting with the basics of Mahout and machine learning, you will explore prominent algorithms and their implementation in Mahout. You will learn about Mahout's building blocks, addressing feature extraction, reduction, and the curse of dimensionality. You will then delve into classification use cases with the random forest and Naïve Bayes classifiers, and into item-based and user-based recommendation. You will then work with clustering in Mahout using the K-means algorithm and implement Mahout without MapReduce. Finish with a flourish by exploring end-to-end use cases on customer analytics and test analytics to gain real-life, practical know-how of analytics projects.
||Apache Solr Search Patterns|
Apache Solr is an open source search platform built on a Java library called Lucene. It serves as a search platform for many websites, as it has the capability of indexing and searching multiple websites to fetch desired results.
We begin with a brief introduction to analyzers and tokenizers to understand the challenges associated with implementing large-scale indexing and multilingual search functionality. We then move on to working with custom queries and understanding how filters work internally. While doing so, we also create our own query language or Solr plugin that does proximity searches. Furthermore, we discuss how Solr can be used for real-time analytics and tackle problems faced during its implementation in e-commerce search. We then dive deep into spatial features, such as indexing strategies and search and filtering strategies for spatial search. We also do an in-depth analysis of problems faced in an ad serving platform and how Solr can be used to solve them.
||Apache Solr Essentials|
Search is everywhere. Users always expect a search facility in mobile or web applications that allows them to find things in a fast and friendly manner.
Apache Solr Essentials is a fast-paced guide to help you quickly learn the process of creating a scalable, efficient, and powerful search application. The book starts off by explaining the fundamentals of Solr and then goes on to cover various topics such as data indexing, ways of extending Solr, client APIs and their indexing and data searching capabilities, an introduction to the administration, monitoring, and tuning of a Solr instance, as well as the concepts of sharding and replication. Next, you'll learn about various Solr extensions and how to contribute to the Solr community. By the end of this book, you will be able to create excellent search applications with the help of Solr.