HPC-BLAST Project at AACE
While I was at the Joint Institute for Computational Sciences (JICS), I worked on a project with the Application Acceleration Center of Excellence (AACE) to develop a high-performance version of NCBI BLAST. This project was funded by Intel through an Intel Parallel Computing Center (IPCC) grant. You can read the university’s announcement here.
NCBI BLAST is one of the most important and widely used collections of software tools for sequence alignment in bioinformatics. A new sequence of genetic material or protein can be compared to known sequences, allowing biologists to infer function and relationships. NCBI BLAST was a huge leap forward over earlier methods, but it was still seen as too slow for the amount of work required as known nucleotide and protein databases exploded in size. Some researchers developed variations on the BLAST algorithm that increased throughput, but these suffered from poor adoption in practice because NCBI BLAST results are the established gold standard. Other teams had worked to parallelize BLAST for use on HPC clusters, but did so with the older BLAST toolkit written in C.
We overcame these two limitations by implementing a parallel version of the NCBI BLAST C++ toolkit. By working with the new C++ toolkit, our implementation could easily adapt to new releases of the toolkit. And by keeping the BLAST algorithm at the center of our development, our sequence alignment results closely matched the results generated by the standard NCBI version. The design of our parallel implementation allowed query searches to be distributed both across processor cores and across compute nodes in a cluster, i.e., fine- and coarse-grained parallelization, respectively. In this way, we were able to achieve commendable parallel speedup on a variety of compute architectures.
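To make the two-level idea concrete, here is a minimal Python sketch of how a batch of queries could be split first across nodes (coarse-grained) and then across the cores within each node (fine-grained). This is only an illustration, not the HPC-BLAST code, which is built on the NCBI C++ toolkit. The function names, the round-robin scheme, and the `query_N` identifiers are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def split_round_robin(items: List[str], n_parts: int) -> List[List[str]]:
    """Deal items into n_parts lists so that sizes differ by at most one."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    return [items[i::n_parts] for i in range(n_parts)]


def two_level_partition(
    queries: List[str], n_nodes: int, cores_per_node: int
) -> Dict[Tuple[int, int], List[str]]:
    """Assign each query to a (node, core) pair.

    Hypothetical scheme: split across nodes first (coarse-grained),
    then split each node's share across its cores (fine-grained).
    """
    plan: Dict[Tuple[int, int], List[str]] = {}
    for node, node_batch in enumerate(split_round_robin(queries, n_nodes)):
        core_batches = split_round_robin(node_batch, cores_per_node)
        for core, core_batch in enumerate(core_batches):
            plan[(node, core)] = core_batch
    return plan


# Made-up query identifiers, purely for illustration.
queries = [f"query_{i}" for i in range(10)]
plan = two_level_partition(queries, n_nodes=2, cores_per_node=2)

for (node, core), batch in sorted(plan.items()):
    print(f"node {node}, core {core}: {batch}")
```

Every query lands on exactly one (node, core) pair, so each search can run independently and the results can be merged afterwards. A real implementation would also have to account for uneven query lengths and for how the database itself is laid out, which this sketch ignores.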
Links
- The code repository is here.
- The poster that was presented at ACM-BCB 2015 is here.
- The paper that was presented at BiCOB 2019 can be accessed here.