Biodoop

Biodoop is a suite of tools for computational biology that focuses on the efficient, distributed implementation of the most computationally demanding and/or data-intensive tasks. It consists of a core component, which includes a set of general-purpose modules, plus a number of application-specific components.

Current applications focus on sequence alignment and manipulation of alignment records. Applications generally run on the Pydoop API for Hadoop and are built to scale well in both the number of computing nodes available and the amount of data to process, making them particularly well suited for processing large data sets.

Core

Biodoop’s core currently contains a few modules for handling FASTA streams, wrappers for BLAST, I/O modules for some bioinformatics file formats, a module for converting sequences to the nib format, and protobuf serializers for several objects.
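To give a sense of what handling a FASTA stream involves, here is a minimal sketch in plain Python. It does not use Biodoop's own modules; the function name `read_fasta` and the sample data are hypothetical and chosen only for illustration.

```python
import io


def read_fasta(stream):
    """Yield (header, sequence) tuples from a text stream in FASTA format.

    Hypothetical helper for illustration; not part of Biodoop's API.
    """
    header, chunks = None, []
    for line in stream:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence data may span several lines; collect the pieces.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Made-up input: two records, the first one wrapped over two lines.
example = io.StringIO(">seq1 test\nACGT\nTTGA\n\n>seq2\nGGCC\n")
records = list(read_fasta(example))
```

Because the reader is a generator, records are produced one at a time and the whole stream never has to fit in memory, which matters when the input is large.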

Release Notes

Release 0.2.0:

- added genotyping-related sub-package “gt” (for now it only contains protobuf serialization code)
- added “io” sub-package with readers/writers for some bioinformatics file formats
- added “messages” sub-package with general-purpose protobuf modules
- added “seq/align” sub-package with tools for reading SAM files
- added a module for converting sequences to the nib format
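For readers unfamiliar with SAM, the kind of record such tools read can be sketched as follows. This is a plain-Python illustration based on the eleven mandatory tab-separated fields of the SAM format, not the “seq/align” API; `SamRecord`, `parse_sam_line` and `read_sam` are hypothetical names.

```python
from collections import namedtuple

# The eleven mandatory SAM fields, plus a list of optional TAG:TYPE:VALUE items.
SamRecord = namedtuple(
    "SamRecord",
    "qname flag rname pos mapq cigar rnext pnext tlen seq qual tags",
)


def parse_sam_line(line):
    """Parse one SAM alignment line into a SamRecord (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("expected at least 11 tab-separated fields")
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = fields[:11]
    return SamRecord(
        qname, int(flag), rname, int(pos), int(mapq), cigar,
        rnext, int(pnext), int(tlen), seq, qual, fields[11:],
    )


def read_sam(stream):
    """Yield SamRecord objects, skipping '@' header lines and blank lines."""
    for line in stream:
        if line.startswith("@") or not line.strip():
            continue
        yield parse_sam_line(line)
```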

Installation

1. Install the prerequisites:
   - NumPy
   - SciPy
   - Protocol Buffers
   - Pydoop
2. Get biodoop-core from the download page.
3. Unpack the biodoop-core tarball.
4. Build the protobuf code in bl/core/messages and bl/core/gt/messages.
5. Move to the distribution’s root directory and run:
```
python setup.py install
```
for a system-wide installation, or:
```
python setup.py install --user
```
for a local installation.
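Before running the setup script, it can help to confirm that the prerequisites listed above are importable. The following is a minimal sketch, assuming Python 3 and the usual import names `numpy`, `scipy`, `google.protobuf` and `pydoop`; those names and the helper functions are assumptions for illustration and are not taken from the Biodoop distribution.

```python
import importlib.util

# Display name -> import name. The import names are assumptions based on
# each project's usual package name.
PREREQUISITES = {
    "NumPy": "numpy",
    "SciPy": "scipy",
    "Protocol Buffers": "google.protobuf",
    "Pydoop": "pydoop",
}


def is_importable(module_name):
    """Return True if the module can be located without fully importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. "google") is itself missing
        # or a module has no usable spec.
        return False


def missing_prerequisites(prereqs=PREREQUISITES):
    """Return the display names of prerequisites that cannot be found."""
    return [name for name, module in prereqs.items() if not is_importable(module)]


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("All prerequisites found.")
```

The check only confirms that each package can be located; it does not verify versions or that the protobuf code has been built.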

BLAST

The BLAST package provides a wrapper-based MapReduce implementation of BLAST for Hadoop. See the Biodoop-BLAST documentation for details.
