Biodoop is a suite of tools for computational biology that focuses on the efficient, distributed implementation of the most computationally demanding and/or data-intensive tasks. It consists of a core component, which includes a set of general-purpose modules, plus a number of application-specific components.
Current applications focus on sequence alignment and manipulation of alignment records. Applications generally run on the Pydoop API for Hadoop and are built to scale well in both the number of computing nodes available and the amount of data to process, making them particularly well suited for processing large data sets.
Currently, Biodoop’s core contains a few modules for handling FASTA streams, wrappers for BLAST, I/O modules for some bio formats, a module for converting sequences to the nib format and protobuf serializers for several objects.
Release 0.2.0:
get biodoop-core from the download page
unpack the biodoop-core tarball
build the protobuf code in bl/core/messages and bl/core/gt/messages
move to the distribution’s root directory and run:
python setup.py install
for a system-wide installation, or:
python setup.py install --user
for a local installation
The BLAST package provides a wrapper-based MapReduce implementation of BLAST for Hadoop. See the Biodoop-BLAST documentation for details.