Motion estimation is the most time-consuming stage of MPEG-family encoding and has been shown to absorb up to 90% of the total execution time of MPEG processing. Therefore, we propose a hardware/software co-design paradigm that uses a PIM module to efficiently execute motion estimation operations by reducing the memory access penalty caused by a large number of memory accesses.
We segment the PIM module into smaller modules so that they can execute the operations in parallel. However, parallel execution incurs critical overhead operations, which involve replicating a huge amount of data to many of these smaller PIM modules. These replications require not only a large number of additional memory accesses but also calculations to generate addresses. Therefore, we also present an efficient data distribution mechanism to effectively support parallel execution among these smaller PIM modules.
With our paradigm, the host processor is relieved of the computationally intensive and data-intensive workloads of motion estimation. We observed a reduction of up to 2034× in the number of memory accesses and a performance improvement of up to 439× for the execution of motion estimation operations when using our computing paradigm.
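To illustrate why motion estimation generates so many memory accesses, the following is a minimal sketch of block matching on a conventional processor. It assumes a full-search strategy with a sum-of-absolute-differences (SAD) cost, neither of which is specified above; the function names `sad` and `full_search`, the frame layout (lists of rows), and the read counter are hypothetical and serve only as illustration.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, r):
    """Test every displacement in [-r, r] x [-r, r] (assumed full search).

    Returns (best_vector, best_sad, pixel_reads), where pixel_reads counts
    the pixels fetched from the current and reference frames.
    """
    h, w = len(ref), len(ref[0])
    best_vec, best_cost, reads = None, None, 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            # Skip candidate blocks that fall outside the reference frame.
            if by + dy < 0 or by + dy + n > h or bx + dx < 0 or bx + dx + n > w:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            reads += 2 * n * n  # one read per pixel from each frame
            if best_cost is None or cost < best_cost:
                best_vec, best_cost = (dx, dy), cost
    return best_vec, best_cost, reads


# Synthetic 16 x 16 frames: `cur` is `ref` shifted by (2, 1) with wrap-around.
SIZE = 16
ref = [[7 * x + 13 * y for x in range(SIZE)] for y in range(SIZE)]
cur = [[ref[(y + 1) % SIZE][(x + 2) % SIZE] for x in range(SIZE)]
       for y in range(SIZE)]

vector, cost, reads = full_search(cur, ref, bx=4, by=4, n=4, r=2)
print(vector, cost, reads)
```

Even for this single 4 × 4 block with a search range of ±2, the sketch reads 800 pixels (25 candidates × 16 pixels × 2 frames). Neighboring blocks also have overlapping search windows, which gives a sense of why distributing data to parallel modules involves replication.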