* MPI_NEM_LMT_DMA_SYNC_OR_DMA?

* EXTRA_CFLAGS will be deprecated soon

* add verbose check to loopback_perf
* test the sync API as well

* offload pinning as well
  + and support pinning overlap properly
* pin while submitting DMA copies to overlap more

* fix the lockup when a process is forcibly killed?
