Hybrid Binary-Unary Hardware Accelerator

S. Rasoul Faraji, Kia Bazargan

Research output: Contribution to journal › Article › peer-review

13 Scopus citations

Abstract

Stream-based computing, such as stochastic computing, has been used in recent years to create designs with significantly smaller area by harnessing the unary encoding of data. However, the area saving comes at an exponential price in latency, which makes the area × delay cost unattractive. In this article, we present a novel method that uses a hybrid binary/unary representation to perform computations. We first divide the input range into a few sub-regions, perform unary computations on each sub-region individually, and finally pack the outputs of all sub-regions back into compact binary. Moreover, we propose a synthesis methodology and a regression model that predict an optimal or close-to-optimal design in the design space. To the best of our knowledge, we are the first to show a scalable method based on a parallel bit-stream data representation that can beat conventional binary in terms of a real cost, i.e., area × delay and energy consumption, in almost all of the functions we tried at 8-, 10-, and 12-bit resolutions. Our method outperforms the binary, stochastic, and fully unary methods on a number of functions, especially low-cost binary CORDIC-based functions, and on a common edge detection algorithm, in both FPGA and ASIC implementations. In terms of area × delay cost, our cost on FPGA and in ASIC is on average only 4.72 and 24.36 percent, respectively, of the parallel binary pipeline implementation at 8-bit resolution, and 20.16 and 60.12 percent, respectively, at 10-bit resolution. These numbers are 2-3 orders of magnitude better than the results of traditional stochastic methods. Our method is not competitive with the parallel CORDIC-based pipelined binary method for high-resolution (12-bit), highly oscillating functions such as sin(15x). However, for complex functions such as the gamma function, the proposed method can beat all other methods in terms of area × delay, throughput, latency, and energy-per-sample costs.
For the Roberts cross edge detection algorithm, the proposed method takes 5.7 and 39.45 percent of the area × delay cost of the binary method's FPGA and ASIC implementations, respectively. In terms of energy efficiency on FPGA, our method uses only 8.4, 12.7, and 27.7 percent of the energy per sample of serial binary implementations at 8-, 10-, and 12-bit resolutions, respectively. Compared with parallel binary implementations, these numbers become 23.9, 38.54, and 99.3 percent.
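The three steps in the abstract (split the input range into sub-regions, compute in unary within each sub-region, pack the result back to binary) can be illustrated with a small Python software model. This is a sketch under stated assumptions, not the authors' synthesis flow: the upper `k_bits` of an `n_bits` input select a sub-region, the sub-region's minimum output is kept as a binary bias, and each unary (thermometer) output bit is formed by XORing the input thermometer bits at the points where that output bit toggles. The XOR construction, the choice of bias, and all names (`thermometer`, `build_region_logic`, `HybridBinaryUnaryModel`) are hypothetical; the article's scaling network, alternator logic, and regression-based design-space search are not modeled here.

```python
from functools import reduce
from operator import xor
from typing import Callable, List, Tuple

# Hypothetical representation of one unary output bit:
# (bit value at local input 0, local inputs where the bit flips)
BitLogic = Tuple[int, List[int]]


def thermometer(x_local: int, width: int) -> List[int]:
    """Unary (thermometer) code of x_local: t[p] = 1 iff x_local >= p."""
    return [1 if x_local >= p else 0 for p in range(width)]


def build_region_logic(offsets: List[int]) -> List[BitLogic]:
    """For each unary output bit j of one sub-region (bit j = offset > j),
    record its initial value and the local inputs where it toggles."""
    logic: List[BitLogic] = []
    for j in range(max(offsets)):
        bit = [1 if v > j else 0 for v in offsets]
        toggles = [p for p in range(1, len(bit)) if bit[p] != bit[p - 1]]
        logic.append((bit[0], toggles))
    return logic


class HybridBinaryUnaryModel:
    """Assumed model: the upper k bits of an n-bit input pick a sub-region,
    the lower n-k bits are evaluated in unary, and the unary result is
    packed back to binary and added to that sub-region's binary bias."""

    def __init__(self, f: Callable[[int], int], n_bits: int, k_bits: int):
        self.m = n_bits - k_bits  # number of bits handled in unary
        width = 1 << self.m
        self.regions: List[Tuple[int, List[BitLogic]]] = []
        for r in range(1 << k_bits):
            seg = [f(r * width + x) for x in range(width)]  # f must be >= 0
            bias = min(seg)  # kept in binary; only the residual goes unary
            self.regions.append((bias, build_region_logic([v - bias for v in seg])))

    def __call__(self, x: int) -> int:
        region, x_local = x >> self.m, x & ((1 << self.m) - 1)
        bias, logic = self.regions[region]
        t = thermometer(x_local, 1 << self.m)
        # Each output bit = its initial value XOR the input thermometer bits
        # at its toggle points, i.e. a wiring-plus-XOR network.
        out_bits = [reduce(xor, (t[p] for p in toggles), init) for init, toggles in logic]
        return bias + sum(out_bits)  # count the ones: unary -> binary


if __name__ == "__main__":
    square8 = lambda x: (x * x) >> 8  # example 8-bit function
    model = HybridBinaryUnaryModel(square8, n_bits=8, k_bits=3)
    assert all(model(x) == square8(x) for x in range(256))
    # Number of unary output wires each sub-region needs
    print([len(logic) for _, logic in model.regions])
```

The printed widths show why the hybrid split helps in this toy model: each sub-region only carries the spread of its own outputs in unary, rather than the full output range of the function.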

Original language: English (US)
Article number: 8981875
Pages (from-to): 1308-1319
Number of pages: 12
Journal: IEEE Transactions on Computers
Volume: 69
Issue number: 9
State: Published - Sep 1 2020

Bibliographical note

Publisher Copyright: © 1968-2012 IEEE.

Keywords

  • CORDIC
  • Hybrid computing system
  • alternator logic
  • edge detection
  • hardware accelerators
  • scaling network
  • stochastic computing
  • unary computing system
