Bioinformatics with R Cookbook
Bioinformatics is an interdisciplinary field that develops and improves upon the methods for storing, retrieving, organizing, and analyzing biological data. R is the primary language used for handling most of the data analysis work done in the domain of bioinformatics.
Bioinformatics with R Cookbook is a hands-on guide whose recipes offer solutions, in the form of packages and tested code, to the computational tasks related to bioinformatics.
With the help of this book, you will learn how to analyze biological data using R, allowing you to infer new knowledge from your data coming from different types of experiments stretching from microarray to NGS and mass spectrometry. ...
HBase Essentials
With an example-oriented approach, this book begins by providing you with a step-by-step learning process to effortlessly set up HBase clusters and design schemas. Gradually, you will be taken through advanced data modeling concepts and the intricacies of the HBase architecture. Moreover, you will also get acquainted with the HBase client API and HBase shell. Essentially, this book aims to provide you with a solid grounding in the NoSQL columnar database space and also helps you take advantage of the real power of HBase using data scans, filters, and the MapReduce framework. Most importantly, the book also provides you with practical use cases covering various HBase clients, HBase cluster administration, and performance tuning. ...
scikit-learn Cookbook
Python is quickly becoming the go-to language for analysts and data scientists due to its simplicity and flexibility, and within the Python data space, scikit-learn is the unequivocal choice for machine learning. Its consistent API and plethora of features help solve any machine learning problem it comes across.
The book starts by walking through different methods to prepare your data—be it a dataset with missing values or text columns that require the categories to be turned into indicator variables. After the data is ready, you'll learn different techniques aligned with different objectives—be it a dataset with known outcomes such as sales by state, or more complicated problems such as clustering similar customers. Finally, you'll learn how to polish your algorithm to ensure that it's both accurate and resilient to new datasets. ...
IPython Notebook Essentials
In data science, it is difficult to present interesting visual or technical content, as it involves scientific notations that are not easy to type in a normal document format. IPython provides a web-based UI called Notebook, which creates a working environment for interactive computing that combines code execution with computational documents. IPython Notebook makes the task simpler as it was developed for scientific programming to solve larger problems through a series of smaller programs. IPython Notebook is used to learn Python in a fun and interactive way and to do some serious parallel / technical computing.
The book begins with an introduction to the efficient use of IPython Notebook for interactive computation. It then focuses on the integration of technologies such as matplotlib, pandas, and SciPy. The book aims to empower you to work with IPython Notebook for interactive computing, to configure it, and to create your own notebooks and research documents. You will learn ...
Mastering Hadoop
Hadoop is synonymous with Big Data processing. Its simple programming model, "code once and deploy at any scale" paradigm, and ever-growing ecosystem make Hadoop an all-encompassing platform for programmers with different levels of expertise.
This book explores the industry guidelines for optimizing MapReduce jobs and higher-level abstractions such as Pig and Hive in Hadoop 2.0. Then, it dives deep into Hadoop 2.0-specific features such as YARN and HDFS Federation.
This book is a step-by-step guide that focuses on advanced Hadoop concepts and aims to take your Hadoop knowledge and skill set to the next level. The data processing flow dictates the order of the concepts in each chapter, and each chapter is illustrated with code fragments or schematic diagrams. ...
Mastering Splunk
Splunk is the definitive technology solution used to manage the ever-growing volumes of machine-generated data. This technology is indispensable for industries involved in big data analysis, online services, education, finance, healthcare, retail, and telecommunications. So, having Splunk experience will be relevant for a long time to come!
This book will first take you through the evolution of Splunk and how it fits into an organization's architectural roadmap. Master advanced search topics and explore in-depth methods to leverage Splunk tables, charts, fields, and other cases. As we advance through the chapters, you will master the best practices for values and lookups, indexes, and business-effective dashboards, and discover the cornerstones of how to evolve your current Splunk application and its monitoring capabilities. Finally, we round things off with a discussion of transactions from an enterprise perspective. ...
Time Series Databases
Time series data is of growing importance, especially with the rapid expansion of the Internet of Things. This concise guide shows you effective ways to collect, persist, and access large-scale time series data for analysis. You'll explore the theory behind time series databases and learn practical methods for implementing them. Authors Ted Dunning and Ellen Friedman provide a detailed examination of open source tools such as OpenTSDB and new modifications that greatly speed up data ingestion. ...
Apache Hive Essentials
In this book, we prepare you for your journey into big data by first introducing the background of the big data domain, along with the process of setting up and getting familiar with your Hive working environment. Next, the book guides you through discovering and transforming the value of big data with the help of examples. It also hones your skill in using the Hive language efficiently. Towards the end, the book focuses on advanced topics such as performance, security, and extensions in Hive, which will guide you on exciting adventures on this worthwhile big data journey.
By the end of the book, you will be familiar with Hive and able to work efficiently to find solutions to big data problems. ...
SQL Server 2012 Data Integration Recipes
SQL Server 2012 Data Integration Recipes provides focused and practical solutions to real-world problems of data integration. Need to import data into SQL Server from an outside source? Need to export data and send it to another system? SQL Server 2012 Data Integration Recipes has your back. You'll find solutions for importing from Microsoft Office data stores such as Excel and Access, from text files such as CSV files, from XML, from other database brands such as Oracle and MySQL, and even from other SQL Server databases. You'll learn techniques for managing metadata, transforming data to meet the needs of the target system, handling exceptions and errors, and much more. ...
Coding Interviews
This book is about the coding interview questions asked by software and Internet companies. It covers five key factors that determine a candidate's performance: (1) the basics of programming languages, data structures, and algorithms, (2) approaches to writing high-quality code, (3) tips for solving difficult problems, (4) methods to optimize code, and (5) the soft skills required in interviews. The basics of languages, algorithms, and data structures are discussed, as well as questions that explore how to write robust solutions after breaking down problems into manageable pieces. It also includes examples that focus on modeling and creative problem solving. ...