Programmers are taking advantage of the increasing availability of on-chip parallelism to meet the rising performance demands of diverse applications. Tool support for detecting incorrect program execution in the presence of concurrent threads is critical to this evolution. Many concurrency bugs manifest as some form of data race, and detecting them at runtime is inherently difficult because of the high overhead of the required memory trace comparisons. Various software and hardware tools have been proposed to detect concurrency bugs at runtime. However, software-based schemes incur significant performance overhead, while hardware-based schemes require significant hardware modifications. To enable cost-efficient design of data race detectors, it is desirable to utilize available on-chip resources. The recent integration of CPU cores with data-parallel accelerator cores, such as GPUs, provides the opportunity to offload the task of data race detection to these accelerator cores. In this paper, we explore this opportunity by designing a GPU Accelerated Data Race Detector (GUARD) that utilizes GPU cores to process memory traces and detect data races in parallel applications executing on the CPU cores. GUARD further explores optimization techniques for: (i) reducing the size of memory traces by employing signatures; and (ii) improving the accuracy of signatures using coherence-based filtering. Overall, GUARD achieves the performance of hardware-based data race detection mechanisms with minimal hardware modifications.
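
To make the role of signatures concrete, the following is a minimal sketch of signature-based conflict checking in general, not GUARD's actual design. It assumes that each thread's reads and writes within one synchronization-free epoch are hashed into fixed-width bit vectors, and that two threads are flagged when one thread's write signature overlaps the other's read or write signature. The names `SIG_BITS`, `LINE_SHIFT`, `bucket`, `signature`, and `potential_races`, the 64-bit width, and the single modulo hash are all hypothetical choices made for illustration.

```python
from itertools import combinations

SIG_BITS = 64    # hypothetical signature width in bits
LINE_SHIFT = 6   # assumes 64-byte cache lines


def bucket(addr):
    """Map an address to one signature bit via its cache-line index."""
    return (addr >> LINE_SHIFT) % SIG_BITS


def signature(addrs):
    """Compress a list of addresses into a single bit vector."""
    sig = 0
    for addr in addrs:
        sig |= 1 << bucket(addr)
    return sig


def potential_races(epoch):
    """Flag thread pairs whose signatures conflict.

    `epoch` maps a thread id to (read addresses, write addresses) observed
    with no intervening synchronization (an assumption of this sketch).
    A pair is flagged when a write of one thread may overlap a read or a
    write of the other. Hash collisions can produce false positives, but
    a true overlap is never missed.
    """
    sigs = {tid: (signature(r), signature(w)) for tid, (r, w) in epoch.items()}
    flagged = []
    for a, b in combinations(sorted(sigs), 2):
        (ra, wa), (rb, wb) = sigs[a], sigs[b]
        if (wa & (rb | wb)) or (wb & ra):
            flagged.append((a, b))
    return flagged
```

Comparing two fixed-width signatures costs one bitwise AND regardless of how many accesses they summarize, which is why signatures shrink the trace. The price is aliasing: in this sketch, addresses `0x1000` and `0x2000` fall into the same bucket and are reported as a potential race even though they never overlap, which is the kind of inaccuracy that an additional filtering step would aim to reduce.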