Wednesday, February 23, 2011

Looking for alignment quality control

I've noticed that my linkage scores are highly dependent on my alignment. Since I'm gathering a lot of sequences automatically, I occasionally get a few crappy ones, and these throw off the entire alignment. So I've started looking at automated ways to remove these bad sequences. A question on BioStar, "Automagically remove “badly” aligning sequence from Multiple-Sequence Alignment", pointed me to two suggestions: NorMD and GUIDANCE.

NorMD is a command-line C program. The README for the newest version (v1.4) does not match the actual code: it describes functions that are no longer there, and the code contains functions that are not in the README. The function I need, normd_rd, is not present, and the help files that come with the other code are not helpful. I've gone back to version 1.1, which has the desired files. However, it is terribly slow: 4 hours so far and it still hasn't finished one file.

GUIDANCE is a web service (http://guidance.tau.ac.il/). I've uploaded the core.fasta file; it's been 4 hours and it still isn't done.

I'll give both of them the night to finish; otherwise I'll need to code something quick and dirty.
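For what it's worth, here is roughly what I have in mind for the quick and dirty route. This is only a sketch and not what NorMD or GUIDANCE do: it assumes that a badly aligning sequence shows up as a row that is mostly gaps in the aligned FASTA, and it drops any row above a gap-fraction cutoff. The function names and the 0.5 cutoff are my own placeholders, not anything from either tool.

```python
def read_fasta(lines):
    """Parse an iterable of FASTA text lines into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def gap_fraction(seq, gap_chars="-."):
    """Fraction of alignment columns in which this row holds a gap character."""
    if not seq:
        return 1.0
    return sum(1 for c in seq if c in gap_chars) / len(seq)


def drop_gappy(records, max_gap_fraction=0.5):
    """Split aligned records into (kept, dropped) by gap fraction.

    max_gap_fraction is an arbitrary placeholder cutoff, not a tuned value.
    """
    kept, dropped = [], []
    for header, seq in records:
        if gap_fraction(seq) > max_gap_fraction:
            dropped.append((header, seq))
        else:
            kept.append((header, seq))
    return kept, dropped


if __name__ == "__main__":
    # Tiny made-up alignment, just to exercise the functions.
    demo = [">a", "ACGT-ACGT", ">b", "A------T-", ">c", "ACGTTACGT"]
    kept, dropped = drop_gappy(read_fasta(demo))
    print("kept:", [h for h, _ in kept])
    print("dropped:", [h for h, _ in dropped])
```

After dropping rows, the remaining sequences would presumably need to be realigned from scratch, since the columns were laid out with the bad sequences still in place.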
