Graphics processing units (GPUs) can improve deep neural network inference throughput via batch processing, in which multiple tasks are processed concurrently. We focus on a novel scenario in which energy-constrained mobile devices offload inference tasks to a GPU-equipped edge server. Each task is partitioned into sub-tasks for finer-grained offloading and scheduling, and the user energy consumption minimization pro...