Friday, March 11, 2011

Sequence-space figures

So I've normalized the figures to "sequence-space" instead of alignment-space ... all positions are now relative to the AF054250 sequence (it would be trivial to switch to a different reference).
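For anyone curious what "normalized to the AF054250 sequence" involves, here is a minimal sketch. It assumes the reference appears as a gapped row in the alignment, with '-' or '.' marking gaps. The function names are hypothetical, and this is not the code that produced these figures. Each alignment column maps to a position in the ungapped reference, and columns where the reference has a gap are dropped.

```python
def alignment_to_sequence_positions(aligned_ref, gap_chars="-."):
    """Map each alignment column (0-based) to a 1-based position in the
    ungapped reference sequence, or None where the reference is gapped."""
    mapping = []
    pos = 0
    for ch in aligned_ref:
        if ch in gap_chars:
            mapping.append(None)  # reference has no residue in this column
        else:
            pos += 1
            mapping.append(pos)
    return mapping


def to_sequence_space(column_values, aligned_ref):
    """Re-key per-column values by reference position, dropping columns
    where the (hypothetical) gapped reference row has a gap."""
    mapping = alignment_to_sequence_positions(aligned_ref)
    return {pos: val for pos, val in zip(mapping, column_values) if pos is not None}


# Example: a toy gapped reference row.
print(alignment_to_sequence_positions("MK--LV-A"))  # [1, 2, None, None, 3, 4, None, 5]
```

To use a different reference, you would only swap which aligned row is passed in as `aligned_ref`.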
This cleaned things up relatively well ... although there are still large chunks of unmeasured space. These are due to the low-entropy cutoff, and I'm not sure there's much I can do about that one :(. The only fix I can think of would be to allow a window to grow indefinitely if it's below the entropy cutoff ... right now, if the "conserved window" is longer than 5 AAs, I just skip past it. While this would fix my problem, I'm not sure it is worth the time it would take to re-compute everything (and it wouldn't add much biological significance).
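A rough sketch helps show the difference between "skip past it" and "grow indefinitely". It rests on an assumption: a window's entropy is taken to be the Shannon entropy (in bits) of the distinct peptides found at that window across all sequences. The post doesn't say how entropy is defined, and `window_entropy`, `grow_window`, and the cutoff value are hypothetical. With `max_len=5` the window is abandoned, matching the current behaviour. With `max_len=None` it keeps extending until the cutoff is met or the sequence runs out.

```python
import math
from collections import Counter


def window_entropy(seqs, start, length):
    """Shannon entropy (bits) of the peptides seqs[i][start:start+length]
    across all sequences (an assumed definition, for illustration only)."""
    counts = Counter(s[start:start + length] for s in seqs)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def grow_window(seqs, start, cutoff, min_len=1, max_len=5):
    """Extend a window at `start` until its entropy reaches `cutoff`.

    Returns the window length, or None if the window is still below the
    cutoff at max_len (the "skip past it" behaviour). max_len=None lets the
    window grow until the cutoff is met or the shortest sequence ends."""
    seq_len = min(len(s) for s in seqs)
    length = min_len
    while start + length <= seq_len:
        if window_entropy(seqs, start, length) >= cutoff:
            return length
        if max_len is not None and length >= max_len:
            return None  # conserved stretch too long: skip it
        length += 1
    return None


seqs = ["AAAAAAAC", "AAAAAAAD"]  # toy data: conserved until the last residue
print(grow_window(seqs, 0, cutoff=1.0))                # None (skipped at 5 AAs)
print(grow_window(seqs, 0, cutoff=1.0, max_len=None))  # 8
```

The cost is what the post worries about. An unbounded window has to re-evaluate the entropy at every extra residue across the whole conserved stretch.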

I could instead fill the gaps with the adjacent values (if I allowed the window to extend, the result would be the same as the adjacent value). When I do that I get this:
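Here is one way to do "take the adjacent values", sketched under an assumption. Unmeasured positions are stored as `None` in a per-position list (in sequence-space). Each gap takes the nearest measured value to its left, and a leading gap falls back to the first measured value to its right. `fill_from_adjacent` is a hypothetical helper and not necessarily how the figure was produced.

```python
def fill_from_adjacent(values):
    """Replace None entries with the nearest measured value to the left;
    leading gaps (no left neighbour) take the first measured value."""
    filled = list(values)
    last = None
    # Forward pass: carry the last measured value into each gap.
    for i, v in enumerate(filled):
        if v is None:
            filled[i] = last
        else:
            last = v
    # Leading gaps are still None; back-fill them from the right.
    first = next((v for v in filled if v is not None), None)
    for i, v in enumerate(filled):
        if v is not None:
            break
        filled[i] = first
    return filled


print(fill_from_adjacent([None, 0.8, None, None, 0.7, None]))
# [0.8, 0.8, 0.8, 0.8, 0.7, 0.7]
```

The filled values come from neighbouring windows, not from the low-entropy region itself, so they describe the neighbours rather than that region.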
Which looks pretty bad ... and misleading, since it seems to imply that the whole genome can be predicted from any other part with at least 70% accuracy. That is NOT true ... if you were given one of those low-entropy regions (without the nearby AAs), you'd be out of luck.
