Most modern neural networks rely on the basic multiply-accumulate (MAC) operation in some form, and as networks grow larger, their computational requirements grow rapidly. Neural networks implemented on FPGAs typically use variable multipliers so that any weight can be fed to the MAC, but this also forces the accelerator designer to store the model's weights in off-chip memory (DRAM). Moreover, the high computational power and memory bandwidth requirements of today's CNNs make it difficult for FPGA platforms to deliver the best performance. In this paper, we propose a fully parallel, pipelined Hybrid Binary-Unary Neural Network (HBUNN) architecture to implement a low-cost, high-performance ResNet-18 convolutional neural network. We use a hybrid binary-unary method to implement constant-coefficient multipliers and batch normalization units; these two units reduce hardware cost by 30.7% and 47.97% on average, respectively, compared to their conventional binary equivalents. Moreover, we propose a novel training scheme using our hardware cost-aware regularizers that not only improves the area cost of the proposed architecture and the conventional binary architecture by 59.3% and 76.7%, respectively, but also maintains the same accuracy. Finally, we implement three trained networks using different regularizers. The proposed HBUNN architectures reduce area cost by 30% and area × delay by 69% on average compared to the conventional binary architectures. The error rate of the proposed design is 12.93%, while its throughput is 278 Kfps.
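As a rough illustration of the hardware cost-aware regularization idea, the sketch below adds a hardware-cost penalty to the usual task loss. The specific cost proxy here (counting set bits in a fixed-point encoding of each weight, since constant-coefficient multipliers for weights with fewer nonzero bits need fewer adders) is an assumption for illustration only; the paper's actual regularizers are defined in the body of the work.

```python
# Illustrative sketch only: a hypothetical bit-count cost proxy, not the
# paper's exact regularizer. Assumes weights are quantized to a fixed-point
# format with `frac_bits` fractional bits.

def nonzero_bits(w, frac_bits=8):
    """Count set bits in the fixed-point encoding of weight w."""
    q = int(round(abs(w) * (1 << frac_bits)))
    return bin(q).count("1")

def hw_cost_regularizer(weights, lam=1e-3, frac_bits=8):
    """Hypothetical penalty: lam times the total set-bit count of all weights."""
    return lam * sum(nonzero_bits(w, frac_bits) for w in weights)

def total_loss(task_loss, weights, lam=1e-3):
    # Training minimizes the task loss plus the hardware-cost term, pushing
    # the optimizer toward weights that are cheap as constant multipliers.
    return task_loss + hw_cost_regularizer(weights, lam)
```

For example, a weight of 0.5 encodes as a single set bit and is cheaper under this proxy than 0.75, which needs two; the regularizer therefore biases training toward the former when accuracy permits.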