So I used imagesc to create a new representation of the linkage data:
It looks quite different from the scatter-plot I made yesterday ... reposted here:
As you can see ... in the scatter figure E1 and E2 seem to have the highest levels of predictive power, while in the image figure the NS3-NS5A region seems to hold the majority of the predictive power.
There is a simple reason for the discrepancy ... the scatter-plot shows SINGLE AA linkages while the image figure shows MULTIPLE AA linkages. So it would seem that the linkage between NS3/NS5 and the rest of the genome only occurs at the "motif" level, while the E1/E2 linkage occurs at the single AA level.
In the area plot below the image figure you can see small spikes that correspond to the spikes in the scatter figure. However, they are drastically overshadowed by the other linkages around them.
I'm not sure of the biological significance, but I saw a similar effect in the HIV data ... it just never made it into the most recent version of the manuscript.
The thing that really bugs me is the "white-space" in the image figure. This is mostly due to two things: low entropy cutoffs and multiple alignment issues. There's not much I can do to fix the low entropy regions ... I could cut those sections out of the figure, but that would leave it weirdly distorted. I'm hoping that converting from "alignment-space" to "sequence-space" will clean up some of these issues.
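For anyone following along, here is a minimal Python sketch of the two ideas in that paragraph: flagging low-entropy alignment columns, and mapping alignment columns back to positions in an ungapped sequence. This is not the code behind the figures (those were made with imagesc). The toy alignment, the 0.5-bit cutoff, the choice of the first row as the reference, and all function names are assumptions made up for illustration.

```python
import math
from collections import Counter

GAP = "-"  # assumed gap character

def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column, ignoring gaps."""
    residues = [aa for aa in column if aa != GAP]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def low_entropy_columns(alignment, cutoff):
    """Indices of alignment columns whose entropy is below the cutoff."""
    columns = zip(*alignment)  # transpose rows (sequences) into columns
    return [i for i, col in enumerate(columns) if column_entropy(col) < cutoff]

def alignment_to_sequence_map(aligned_seq):
    """For each alignment column, the 0-based position in the ungapped
    sequence, or None where this sequence has a gap."""
    mapping = []
    pos = 0
    for ch in aligned_seq:
        if ch == GAP:
            mapping.append(None)
        else:
            mapping.append(pos)
            pos += 1
    return mapping

# Toy alignment (hypothetical data, not from the real analysis).
alignment = [
    "MK-TAY",
    "MR-TAY",
    "MKGSAY",
]

# Columns that would show up as "white-space" under a 0.5-bit cutoff.
masked = low_entropy_columns(alignment, cutoff=0.5)

# Alignment-space -> sequence-space, using the first row as the reference.
col_to_pos = alignment_to_sequence_map(alignment[0])
```

In this sketch, columns that are gaps in the reference map to None, so they can simply be dropped when redrawing the figure in sequence-space. The low-entropy columns remain either way, since there is no variation there to link.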

