Improve memcpy performance from 290 MiB/s to 340 MiB/s (17% improvement)
Use 64-byte cache lines, reduce the main loop from 128 bytes to 64 bytes per iteration, and adjust the prefetch distance to the optimal value.
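
For illustration only, the shape of such a loop can be sketched in C. This is not the code from this change: the function name copy_with_prefetch and the PREFETCH_DISTANCE value of 256 are hypothetical (the tuned distance is not stated here), and __builtin_prefetch assumes a GCC- or Clang-compatible compiler.

```c
#include <stddef.h>
#include <string.h>

#define CACHE_LINE        64   /* cache-line size assumed by the change */
#define PREFETCH_DISTANCE 256  /* hypothetical value; must be tuned per CPU */

/* Copy n bytes, one 64-byte cache line per main-loop iteration. */
static void *copy_with_prefetch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= CACHE_LINE) {
        /* Hint the line we will read a few iterations from now,
           but only while that address is still inside the source. */
        if (n > PREFETCH_DISTANCE)
            __builtin_prefetch(s + PREFETCH_DISTANCE, 0, 0);

        memcpy(d, s, CACHE_LINE);  /* fixed size: compilers expand this inline */
        d += CACHE_LINE;
        s += CACHE_LINE;
        n -= CACHE_LINE;
    }

    if (n > 0)
        memcpy(d, s, n);           /* tail shorter than one cache line */

    return dst;
}
```

The prefetch distance is the tuning knob: too short and the data has not arrived when the loop reaches it, too long and the line may be evicted before it is used.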