Pro Apache Hadoop, 2nd EditionPro Apache Hadoop, Second Edition brings you up to speed on Hadoop – the framework of big data. Revised to cover Hadoop 2.0, the book covers the very latest developments such as YARN (aka MapReduce 2.0), new HDFS high-availability features, and increased scalability in the form of HDFS Federations. All the old content has been revised too, giving the latest on the ins and outs of MapReduce, cluster design, the Hadoop Distributed File System, and more.
This book covers everything you need to build your first Hadoop cluster and begin analyzing and deriving value from your business and scientific data. Learn to solve big-data problems the MapReduce way, by breaking a big problem into chunks and creating small-scale solutions that can be flung across thousands upon thousands of nodes to analyze large data volumes in a short amount of wall-clock time. Learn how to let Hadoop take care of distributing and parallelizing your software - you just focus on the code; Hadoop takes care ...
Learning StormStarting with the very basics of Storm, you will learn how to set up Storm on a single machine and move on to deploying Storm on your cluster. You will understand how Kafka can be integrated with Storm using the Kafka spout.
You will then proceed to explore the Trident abstraction tool with Storm to perform stateful stream processing, guaranteeing single message processing in every topology. You will move ahead to learn how to integrate Hadoop with Storm. Next, you will learn how to integrate Storm with other well-known Big Data technologies such as HBase, Redis, and Kafka to realize the full potential of Storm. ...
Apache Solr High PerformanceApache Solr is one of the most popular open source search servers available on the web. However, simply setting up Apache Solr is not enough to ensure the success of your web product. To maximize efficiency, you need to use techniques to boost Solr performance in order to return relevant results faster. You need to implement robust techniques that focus on optimizing the performance of your Solr instances and also troubleshoot issues that are prone to arise while maintaining Solr.
Apache Solr High Performance is a practical guide that will help you explore and take full advantage of the robust nature of Apache Solr so as to achieve optimized Solr instances, especially in terms of performance. ...
Apache Hadoop YARNApache Hadoop is helping drive the Big Data revolution. Now, its data processing has been completely overhauled: Apache Hadoop YARN provides resource management at data center scale and easier ways to create distributed applications that process petabytes of data. And now in Apache Hadoop YARN, two Hadoop technical leaders show you how to develop new applications and adapt existing code to fully leverage these revolutionary advances.
YARN project founder Arun Murthy and project lead Vinod Kumar Vavilapalli demonstrate how YARN increases scalability and cluster utilization, enables new programming models and services, and opens new options beyond Java and batch processing. They walk you through the entire YARN project lifecycle, from installation through deployment. ...
60 Recipes for Apache CloudStackPlanning to deploy and maintain a public, private, or hybrid cloud service? This cookbook's handy how-to recipes help you quickly learn and install Apache CloudStack, along with several API clients, API wrappers, data architectures, and configuration management technologies that work as part of CloudStack's ecosystem.
You'll learn how to use Vagrant, Ansible, Chef, Fluentd, Libcloud, and several other open source tools that let you build and operate CloudStack better and faster. If you're an experienced programmer, system administrator, or DevOps practitioner familiar with bash, Git, package management, and some Python, you're ready to go. ...
Pro Apache Hadoop, 2nd EditionPro Apache Hadoop, Second Edition brings you up to speed on Hadoop – the framework of big data. Revised to cover Hadoop 2.0, the book covers the very latest developments such as YARN (aka MapReduce 2.0), new HDFS high-availability features, and increased scalability in the form of HDFS Federations. All the old content has been revised too, giving the latest on the ins and outs of MapReduce, cluster design, the Hadoop Distributed File System, and more.
This book covers everything you need to build your first Hadoop cluster and begin analyzing and deriving value from your business and scientific data. Learn to solve big-data problems the MapReduce way, by breaking a big problem into chunks and creating small-scale solutions that can be flung across thousands upon thousands of nodes to analyze large data volumes in a short amount of wall-clock time. Learn how to let Hadoop take care of distributing and parallelizing your software - you just focus on the code; Hadoop takes care ...
Learning StormStarting with the very basics of Storm, you will learn how to set up Storm on a single machine and move on to deploying Storm on your cluster. You will understand how Kafka can be integrated with Storm using the Kafka spout.
You will then proceed to explore the Trident abstraction tool with Storm to perform stateful stream processing, guaranteeing single message processing in every topology. You will move ahead to learn how to integrate Hadoop with Storm. Next, you will learn how to integrate Storm with other well-known Big Data technologies such as HBase, Redis, and Kafka to realize the full potential of Storm. ...
Cassandra High AvailabilityApache Cassandra is a massively scalable, peer-to-peer database designed for 100 percent uptime, with deployments in the tens of thousands of nodes supporting petabytes of data.
This book offers readers a practical insight into building highly available, real-world applications using Apache Cassandra. The book starts with the fundamentals, helping you to understand how the architecture of Apache Cassandra allows it to achieve 100 percent uptime when other systems struggle to do so. You'll have an excellent understanding of data distribution, replication, and Cassandra's highly tunable consistency model. This is followed by an in-depth look at Cassandra's robust support for multiple data centers, and how to scale out a cluster. Next, the book explores the domain of application design, with chapters discussing the native driver and data modeling. Lastly, you'll find out how to steer clear of common antipatterns and take advantage of Cassandra's ability to fail gra ...
Beginning Apache Cassandra DevelopmentBeginning Apache Cassandra Development introduces you to one of the most robust and best-performing NoSQL database platforms on the planet. Apache Cassandra is a document database following the JSON document model. It is specifically designed to manage large amounts of data across many commodity servers without there being any single point of failure. This design approach makes Apache Cassandra a robust and easy-to-implement platform when high availability is needed.
Apache Cassandra can be used by developers in Java, PHP, Python, and JavaScript—the primary and most commonly used languages. In Beginning Apache Cassandra Development, author and Cassandra expert Vivek Mishra takes you through using Apache Cassandra from each of these primary languages. Mishra also covers the Cassandra Query Language (CQL), the Apache Cassandra analog to SQL. You'll learn to develop applications sourcing data from Cassandra, query that data, and delive ...
Mastering Apache Maven 3Maven is the number one build tool used by developers for more than a decade. Maven stands out among other build tools due to its extremely extensible architecture, which is built on top of the concept "convention over configuration". This has made Maven the de-facto tool used to manage and build Java projects.
This book is a technical guide to the difficult and complex concepts in Maven and build automation. It starts with the core Maven concepts and its architecture, and then explains how to build extensions such as plugins, archetypes, and lifecycles in depth.
This book is a step-by-step guide that shows you how to use Apache Maven in an optimal way to address your enterprise build requirements. ...
Learning Karaf CellarApache Karaf is a popular OSGi container that provides rich and broad features, and together with Cellar, you can easily manage farms of containers that provide synchronization between the instances of Karaf. In a real production system, users require a farm of containers to implement failover and scalability, as well as the tools required to provision the different members of a cluster. This book will help you understand the architecture, installation, and configuration of a cluster and teach you about different components and features to get the best out of a clustering solution using Apache Karaf Cellar.
Learning Karaf Cellar starts with an introduction to some of the key features of Karaf. After a quick but detailed understanding of OSGi and Karaf, this book takes you through the concept of provisioning clusters and then covers what Cellar is and how to use it. ...