Dynamic binary translation (DBT) is a core technology to many important applications such as system virtualization, dynamic binary instrumentation and security. However, there are several factors that often impede its performance: (1) emulation overhead before translation; (2) translation and optimization overhead, and (3) translated code quality. On the dynamic binary translator itself, the issues also include its retargetability to support guest applications from different instruction-set architectures (ISAs) to host machines also with different ISAs, an important feature for system virtualization. In this work, we take advantage of the ubiquitous multicore platforms, using multithreaded approach to implement DBT. By running the translators and the dynamic binary optimizers on different threads on different cores, it could off-load the overhead caused by DBT on the target applications; thus, afford DBT of more sophisticated optimization techniques as well as the support of its retargetability. Using QEMU (a popular retargetable DBT for system virtualization) and LLVM (Low Level Virtual Ma- chine) as our building blocks, we demonstrated in a multi- threaded DBT prototype, called HQEMU, that it could improve QEMU performance by a factor of 2.4X and 4X on the SPEC 2006 integer and floating point benchmarks for x86 to x86-64 emulations, respectively, i.e. it is only 2.5X and 2.1X slower than native execution of the same benchmarks on x86-64, as opposed to 6X and 8.4X slowdown on QEMU. For ARM to x86-64 emulation, HQEMU could gain a factor of 2.4X speedup over QEMU for the SPEC 2006 integer benchmarks.