||High Performance Spark|
If you've successfully used Apache Spark to solve medium-sized problems, but still struggle to realize the "Spark promise" of unparalleled performance on big data, this book is for you. High Performance Spark shows you how to take advantage of Spark at scale, so you can grow beyond the novice level. It's ideal for software engineers, data engineers, developers, and system administrators working with large-scale data applications.
Learn how to make Spark jobs run faster; productionize exploratory data science with Spark; handle even larger data sets with Spark; and reduce pipeline running times for faster insights.
||Apache Cordova in Action|
Apache Cordova in Action teaches you how to design, create, and launch hybrid mobile apps people will want to use. With the help of straightforward, real-world examples, you'll learn to build apps from the Cordova CLI and to make use of native device features like the camera and accelerometer. You'll learn testing techniques and discover the PhoneGap Build service and how to submit your apps to Google Play and the Apple App Store. Along the way, this helpful guide discusses mobile app design and shows you how to create effective, professional-quality UI and UX.
||Practical Graph Analytics with Apache Giraph|
Practical Graph Analytics with Apache Giraph helps you build data mining and machine learning applications using the Apache Foundation's Giraph framework for graph processing. This is the same framework as used by Facebook, Google, and other social media analytics operations to derive business value from vast amounts of interconnected data points.
Graphs arise in a wealth of data scenarios and describe the connections that are naturally formed in both digital and real worlds. Examples of such connections abound in online social networks such as Facebook and Twitter, among users who rate movies from services like Netflix and Amazon Prime, and are useful even in the context of biological networks for scientific research. Whether in the context of business or science, viewing data as connected adds value by increasing the amount of information available to be drawn from that data and put to use in generating new revenue or scientific opportunities.
Apache Giraph offers a simple yet flexible programming model targeted to graph algorithms and designed to scale easily to accommodate massive amounts of data. Originally developed at Yahoo!, Giraph is now a top-level project at the Apache Foundation, and it enlists contributors from companies such as Facebook, LinkedIn, and Twitter.
||Apache Solr: A Practical Approach to Enterprise Search|
Build an enterprise search engine using Apache Solr: index and search documents; ingest data from varied sources; apply various text processing techniques; utilize different search capabilities; and customize Solr to retrieve the desired results. Apache Solr: A Practical Approach to Enterprise Search explains each essential concept - backed by practical and industry examples - to help you attain expert-level knowledge.
The book, which assumes a basic knowledge of Java, starts with an introduction to Solr, followed by the steps for setting it up, indexing your first set of documents, and searching them. It then introduces you to information retrieval and its implementation in Apache Solr; this will help you understand your search problem, decide on an approach for building an effective solution, and use various metrics to evaluate the results.
The book next covers the schema design and techniques to build a text analysis chain for cleansing, normalizing and enriching your documents and addressing different types of search queries. It describes various popular matching techniques which are generally applied to improve the precision and recall of searches.
You will learn the end-to-end process of data ingestion from varied sources, metadata extraction, pre-processing and transformation of content, various search components, query parsers and other advanced search capabilities.
||Maven Essentials|
Maven is the #1 build tool used by developers and it has been around for more than a decade. Maven stands out among other build tools due to its extremely extensible architecture, which is built on top of the concept of convention over configuration. It's widely used by many open source Java projects under the Apache Software Foundation, Sourceforge, Google Code, and more.
Maven Essentials is a fast-paced guide to show you the key concepts in Maven and build automation. We get started by introducing you to Maven and exploring its core concepts and architecture. Next, you will learn about and write a Project Object Model (POM) while creating your own Maven project. You will also find out how to create custom archetypes and plugins to establish the most common goals in build automation. After this, you'll get to know how to design the build to prevent any maintenance nightmares, with proper dependency management. We then explore Maven build lifecycles and Maven assemblies. Finally, you will discover how to apply the best practices when designing a build system to improve developer productivity.
||Learning Apache Thrift|
This book will help you learn the basics of service-oriented systems through your first Apache Thrift-powered app. Then, progressing to more complex examples, it will provide you with tips for running large-scale applications in production environments.
||Apache Mesos Essentials|
Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. It allows developers to concurrently run the likes of Hadoop, Spark, Storm, and other applications on a dynamically shared pool of nodes. With Mesos, you have the power to manage a wide range of resources in a multi-tenant environment.
Starting with the basics, this book will give you an insight into all the features that Mesos has to offer. You will first learn how to set up Mesos in various environments, from data centers to the cloud. You will then learn how to implement a self-managed Platform as a Service environment with Mesos using various service schedulers, such as Chronos, Aurora, and Marathon. You will then delve into the depths of Mesos fundamentals and learn how to build distributed applications using Mesos primitives.
||Apache Mahout Essentials|
Apache Mahout is a scalable machine learning library with algorithms for clustering, classification, and recommendations. It empowers users to analyze patterns in large, diverse, and complex datasets faster and more scalably.
This book is an all-inclusive guide to analyzing large and complex datasets using Apache Mahout. It explains complicated but very effective machine learning algorithms simply, in relation to real-world practical examples.
Starting from the fundamental concepts of machine learning and Apache Mahout, this book guides you through Apache Mahout's implementations of machine learning techniques including classification, clustering, and recommendations. During this exciting walkthrough, real-world applications, a diverse range of popular algorithms and their implementations, code examples, evaluation strategies, and best practices are given for each technique. Finally, you will learn data visualization techniques for Apache Mahout to bring your data to life.
||Apache Solr Enterprise Search Server, 3rd Edition|
Solr is a widely popular open source enterprise search server that delivers powerful search and faceted navigation features—features that are elusive with databases. Solr supports complex search criteria, faceting, result highlighting, query-completion, query spell-checking, relevancy tuning, geospatial searches, and much more.
This book is a comprehensive resource for just about everything Solr has to offer, and it will take you from first exposure to development and deployment in no time. Even if you wish to use Solr 5, you should find the information to be just as applicable due to Solr's high regard for backward compatibility. The book includes some useful information specific to Solr 5.
Get a solid grounding in Apache Oozie, the workflow scheduler system for managing Hadoop jobs. With this hands-on guide, two experienced Hadoop practitioners walk you through the intricacies of this powerful and flexible platform, with numerous examples and real-world use cases.
Once you set up your Oozie server, you'll dive into techniques for writing and coordinating workflows, and learn how to write complex data pipelines. Advanced topics show you how to handle shared libraries in Oozie, as well as how to implement and manage Oozie's security capabilities.
||Learning Apache Mahout|
In the past few years, the generation of data and our capability to store and process it have grown exponentially. There is a need for scalable analytics frameworks and people with the right skills to get the information needed from this Big Data. Apache Mahout is one of the first and most prominent Big Data machine learning platforms. It implements machine learning algorithms on top of distributed processing platforms such as Hadoop and Spark.
Starting with the basics of Mahout and machine learning, you will explore prominent algorithms and their implementation in Mahout development. You will learn about Mahout building blocks, addressing feature extraction, reduction, and the curse of dimensionality, before delving into classification use cases with the random forest and Naïve Bayes classifiers and into item-based and user-based recommendation. You will then work with clustering in Mahout using the K-means algorithm and implement Mahout without MapReduce. Finish with a flourish by exploring end-to-end use cases on customer analytics and test analytics to gain real-life, practical know-how of analytics projects.