In modern data centers, many flow-based and task-based schemes have been proposed to speed up the data transmission in order to provide fast, reliable services for millions of users. However, existing flow-based schemes treat all flows in isolation, contributing less to or even hurting user experience due to the stalled flows. Other prevalent task-based approaches, such as centralized and decentralized scheduling, are sophisticated or unable to share task information. In this work, we first reveal that relinquishing bandwidth of leading flows to the stalled ones effectively reduces the task completion time. We further present the design and implementation of a general supporting scheme that shares the flow-tardiness information through a receiver-driven coordination. Our scheme can be flexibly and widely integrated with the state-of-the-art TCP protocols designed for data centers, while making no modification on switches. Through the testbed experiments and simulations of typical data center applications, we show that our scheme reduces the task completion time by 70% and 50% compared with the flow-based protocols (e.g. DCTCP, L2DCT) and task-based scheduling (e.g. Baraat), respectively. Moreover, our scheme also outperforms other approaches by 18% to 25% in prevalent topologies of data center.