Hi Krzysztof,
I wrote a kernel module that allocates some memory and maps it to the ESRAM. The "stats" interface shows that the mapping succeeded. The module then copies two arrays within that region. I only got about a 20% speedup, but the documentation says a 3x speedup should be expected.
It was a while ago, but I think I tried to use the whole ESRAM to avoid cache effects.
I also tried mmap (I wrote a module that provides a file interface). It made my user-space code slightly slower!
Regards,
-Ehsan