||Pro Apache Hadoop, 2nd Edition|
Pro Apache Hadoop, Second Edition brings you up to speed on Hadoop – the framework of big data. Revised to cover Hadoop 2.0, the book covers the very latest developments such as YARN (aka MapReduce 2.0), new HDFS high-availability features, and increased scalability in the form of HDFS Federations. All the old content has been revised too, giving the latest on the ins and outs of MapReduce, cluster design, the Hadoop Distributed File System, and more.
This book covers everything you need to build your first Hadoop cluster and begin analyzing and deriving value from your business and scientific data. Learn to solve big-data problems the MapReduce way, by breaking a big problem into chunks and creating small-scale solutions that can be flung across thousands upon thousands of nodes to analyze large data volumes in a short amount of wall-clock time. Learn how to let Hadoop take care of distributing and parallelizing your software - you just focus on the code; Hadoop takes care of the rest.
||60 Recipes for Apache CloudStack|
Planning to deploy and maintain a public, private, or hybrid cloud service? This cookbook's handy how-to recipes help you quickly learn and install Apache CloudStack, along with several API clients, API wrappers, data architectures, and configuration management technologies that work as part of CloudStack's ecosystem.
You'll learn how to use Vagrant, Ansible, Chef, Fluentd, Libcloud, and several other open source tools that let you build and operate CloudStack better and faster. If you're an experienced programmer, system administrator, or DevOps practitioner familiar with bash, Git, package management, and some Python, you're ready to go.
||Apache Hadoop YARN|
Apache Hadoop is helping drive the Big Data revolution. Now, its data processing has been completely overhauled: Apache Hadoop YARN provides resource management at data center scale and easier ways to create distributed applications that process petabytes of data. And now in Apache Hadoop YARN, two Hadoop technical leaders show you how to develop new applications and adapt existing code to fully leverage these revolutionary advances.
YARN project founder Arun Murthy and project lead Vinod Kumar Vavilapalli demonstrate how YARN increases scalability and cluster utilization, enables new programming models and services, and opens new options beyond Java and batch processing. They walk you through the entire YARN project lifecycle, from installation through deployment.
||Apache Solr High Performance|
Apache Solr is one of the most popular open source search servers available on the web. However, simply setting up Apache Solr is not enough to ensure the success of your web product. To maximize efficiency, you need to use techniques to boost Solr performance in order to return relevant results faster. You need to implement robust techniques that focus on optimizing the performance of your Solr instances and also troubleshoot issues that are prone to arise while maintaining Solr.
Apache Solr High Performance is a practical guide that will help you explore and take full advantage of the robust nature of Apache Solr so as to achieve optimized Solr instances, especially in terms of performance.
||Apache Camel Developer's Cookbook|
Apache Camel is a de-facto standard for developing integrations in Java, and is based on well-understood Enterprise Integration Patterns. It is used within many commercial and open source integration products. Camel makes common integration tasks easy while still providing the developer with the means to customize the framework when the situation demands it. Tasks such as protocol mediation, message routing and transformation, and auditing are common usages of Camel. Apache Camel Developer's Cookbook provides hundreds of best practice tips for using Apache Camel in a format that helps you build your Camel projects.
Message publishing is a mechanism of connecting heterogeneous applications together with messages that are routed between them, for example by using a message broker like Apache Kafka. Such solutions deal with real-time volumes of information and route it to multiple consumers without letting information producers know who the final consumers are.
Apache Kafka is a practical, hands-on guide providing you with a series of step-by-step practical implementations, which will help you take advantage of the real power behind Kafka, and give you a strong grounding for using it in your publisher-subscriber based architectures.
||Apache Accumulo for Developers|
Accumulo is a sorted and distributed key/value store designed to handle large amounts of data. Being highly robust and scalable, its performance makes it ideal for real-time data storage. Apache Accumulo is based on Google's BigTable design and is built on top of Apache Hadoop, Zookeeper, and Thrift.
Apache Accumulo for Developers is your guide to building an Accumulo cluster both as a single-node and multi-node, on-site and in the cloud. Accumulo has been proven to be able to handle petabytes of data, with cell-level security, and real-time analyses so this is your step by step guide in taking full advantage of this power.
||Microsoft SQL Server 2012 with Hadoop|
With the explosion of data, the open source Apache Hadoop ecosystem is gaining traction, thanks to its huge ecosystem that has arisen around the core functionalities of its distributed file system (HDFS) and Map Reduce. As of today, being able to have SQL Server talking to Hadoop has become increasingly important because the two are indeed complementary. While petabytes of unstructured data can be stored in Hadoop taking hours to be queried, terabytes of structured data can be stored in SQL Server 2012 and queried in seconds. This leads to the need to transfer and integrate data between Hadoop and SQL Server.
||CMIS and Apache Chemistry in Action|
CMIS and Apache Chemistry in Action is a comprehensive guide to the CMIS standard and related ECM concepts. In it, you'll find clear teaching and instantly useful examples for building content-centric client and server-side applications that run against any CMIS-compliant repository. In fact, using the CMIS Workbench and the InMemory Repository from Apache Chemistry, you'll have running code talking to a real CMIS server by the end of chapter 1.
This book requires some familiarity with content management systems and a standard programming language like Java or C#. No exposure to CMIS or Apache Chemistry is assumed.
||Apache CloudStack Cloud Computing|
Cloud computing is changing the way IT is delivered in enterprises around the world. The world's leading open source cloud computing platform, Cloudstack, helps you implement a cloud computing service in your enterprise or set up an infrastructure as a service (IaaS) offering for your customers.
With Apache Cloudstack Cloud Computing, learn the leading open source cloud computing platform in an easy step-by-step approach, from understanding the basics of setting up an infrastructure as a service cloud to actual deployment scenarios and extensibility features of CloudStack.
||Apache Sqoop Cookbook|
Integrating data from multiple sources is essential in the age of big data, but it can be a challenging and time-consuming task. This handy cookbook provides dozens of ready-to-use recipes for using Apache Sqoop, the command-line interface application that optimizes data transfers between relational databases and Hadoop.
Sqoop is both powerful and bewildering, but with this cookbook's problem-solution-discussion format, you'll quickly learn how to deploy and then apply Sqoop in your environment. The authors provide MySQL, Oracle, and PostgreSQL database examples on GitHub that you can easily adapt for SQL Server, Netezza, Teradata, or other relational systems.