Polar codes have become one of the most favorable capacity achieving error correction codes (ECC) along with their simple encoding method. However, among the very few prior successive cancellation (SC) polar decoder designs, the required long code length makes the decoding latency high. In this paper, conventional decoding algorithm is transformed with look-ahead techniques. This reduces the decoding latency by 50%. With pipelining and parallel processing schemes, a parallel SC polar decoder is proposed. Sub-structure sharing approach is employed to design the merged processing element (PE). Moreover, inspired by the real FFT architecture, this paper presents a novel input generating circuit (ICG) block that can generate additional input signals for merged PEs on-the-fly. Gate-level analysis has demonstrated that the proposed design shows advantages of 50% decoding latency and twice throughput over the conventional one with similar hardware cost.