
In this chapter, we have discussed several state-of-the-art RTL to C reverse engineering techniques. We have also presented various simulation-based verification, phase-wise verification, and end-to-end verification techniques for checking the correctness of the HLS tool.

The state-of-the-art Trojan detection mechanisms are also discussed in this chapter. In the process, we have identified some limitations of the existing reverse engineering, verification, and hardware Trojan detection methods for HLS. In the subsequent chapters, we present the fast simulation framework, end-to-end verification of HLS, RTL to C reverse engineering for faster simulation, and Trojan detection frameworks, which overcome the limitations identified in this chapter.

3 FastSim: A Fast Simulation Framework for High-Level Synthesis

3.1 Introduction

The ever-increasing complexity of digital designs and the growth of information technology are driving design methodologies towards high-level behavioural descriptions rather than the register transfer level (RTL). In this scenario, high-level synthesis (HLS) plays a significant role by enabling the automatic generation of an RTL design from a high-level description.

Several HLS tools, such as Vivado HLS [10], Catapult-C [3], Intel OpenCL HLS [6], and SCC [13] from industry, and Bambu [107] and LegUp [36] from academia, have been introduced for both field-programmable gate array (FPGA) and application-specific integrated circuit (ASIC) hardware designs. HLS enables algorithm developers to target FPGAs by abstracting away lower-level details such as clocking, target architecture, and reconfigurability.

Since its introduction, HLS has made significant progress in shortening system design time by providing flexible and immediate optimization opportunities at the behavioural level, such as pipelining, loop unrolling, and parallel dataflow streams. However, verification of the synthesized model is still primarily carried out by time-consuming RTL simulation.

Although various phase-wise formal verification methods for HLS have been proposed [87, 24, 46], a monolithic end-to-end formal verification of HLS is not yet available because of the semantic gap between the high-level programming constructs of the various source languages and the generated RTL. Consequently, HLS designers still depend on RTL simulators (e.g., XSIM, VCS) or hardware emulators (e.g., Zebu [12]) for verification. Because RTL simulation carries a large time overhead and non-FPGA experts cannot be expected to understand the details of the RTL code, HLS tools also provide software-based verification. In the Vivado HLS design suite [10], verifying a design takes two steps [110]:

(i) C simulation: before synthesis, the behavioural specification to be synthesized is validated with a test bench using C simulation. The test bench is self-checking and validates that the results from the design to be synthesized are correct. (ii) C/RTL co-simulation: Vivado HLS verifies that the synthesized RTL is functionally identical to the C source code using the same test bench. Although Vivado HLS uses both C simulation and C/RTL co-simulation to determine the correctness of the design, C simulation is faster than C/RTL co-simulation.
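As a rough illustration of step (i), a self-checking C test bench might look like the sketch below. The function `sum_of_squares`, the closed-form `golden` model, and `run_testbench` are hypothetical names chosen for this example; they do not come from Vivado HLS or from the benchmarks used in this chapter.

```c
#include <stdio.h>

/* Hypothetical design under test: the function that would be synthesized. */
unsigned int sum_of_squares(unsigned int n)
{
    unsigned int acc = 0;
    for (unsigned int i = 1; i <= n; i++)
        acc += i * i;
    return acc;
}

/* Independent golden model (assumption of this sketch):
 * closed form n(n+1)(2n+1)/6, valid here because n stays small. */
static unsigned int golden(unsigned int n)
{
    return n * (n + 1) * (2 * n + 1) / 6;
}

/* Self-checking test bench: compares the design against the golden
 * model for every stimulus and returns the number of mismatches. */
int run_testbench(void)
{
    int failures = 0;
    for (unsigned int n = 0; n <= 100; n++) {
        unsigned int got = sum_of_squares(n);
        unsigned int want = golden(n);
        if (got != want) {
            printf("MISMATCH n=%u: got %u, expected %u\n", n, got, want);
            failures++;
        }
    }
    printf("%s\n", failures ? "FAIL" : "PASS");
    return failures;
}
```

In a complete test bench, `main` would simply return `run_testbench()`, so that a zero exit status signals a pass. Because the check is built into the test bench, the same file can be reused unchanged for step (ii).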

Table 3.1 compares the simulation time (in seconds) of C simulation against two RTL simulators, the Vivado RTL co-simulator and ModelSim, for a few HLS benchmarks run on 30k input test cases. C simulation is faster than Vivado RTL co-simulation by a factor of roughly 300 (aes dec) to 2,700 (mips). The two RTL simulators are comparable, with ModelSim taking about 4% to 10% longer than the co-simulator.

Table 3.1: C simulation vs RTL simulation comparison

Bench     C-sim (s)   RTL co-sim (s)   ModelSim (s)
des       28.01       34672            36024
mips      0.985       2620             2885
aes enc   12.656      4389             4693
aes dec   14.442      4467             4780

Table 3.1 suggests that C-like behavioural code recovered from HLS-generated RTL is likely to simulate faster than traditional RTL simulation. Abstracting a high-level model from RTL has been used in industry for decades for cycle-based simulation in early verification phases. For example, Tenison VTOC [1] automatically generates C++ or SystemC models from an RTL hardware description. Emulating RTL behaviour in a sequential language like C imposes several important constraints: the generated code must retain the significant features of RTL simulators, such as cycle accuracy, accurate performance estimation, and the capability to simulate instruction-level and task-level parallelism, while remaining readable.

The two closest works with these targets are FLASH [44] and Verilator [11]. FLASH [44] incorporates scheduling information into the source C code for cycle-accurate simulation. Although FLASH guarantees faster simulation of the scheduled C code, it does not consider the design transformations performed during the allocation, binding, and datapath and controller generation phases of HLS. Consequently, FLASH cannot guarantee the correctness of the RTL generated by HLS. Verilator [11], on the other hand, generates C++ code from any synthesizable Verilog RTL. The C++ code simulates faster than RTL simulation and can verify the functional correctness of the RTL generated by the HLS tool. However, being a generic tool, Verilator does not exploit the finite state machine with datapath (FSMD) structure of HLS-generated RTL, in which the datapath and the controller are cleanly separable and the controller is specified as a well-defined finite state machine (FSM). Hence, the generated C++ code is highly complex and much slower than FLASH.

3.1.1 Contributions

In this work, we develop a simulator that overcomes the limitations of both FLASH and Verilator. Specifically, we propose a framework that, like Verilator, converts HLS-generated RTL to equivalent C code, but takes advantage of the structure of the HLS-generated RTL. We extract the register transfer (RT) operation(s) performed in the datapath in each state of the controller FSM from the control signal assignments of that state. In this way, our simulator automatically generates the behavioural FSMD in C from the HLS-generated Verilog RTL while maintaining the state machine sequence of the synthesized RTL. Our framework provides fast simulation, functional correctness checking of the RTL, cycle accuracy, and accurate performance estimation. It generates highly readable and debug-friendly simulation code by preserving all register and port names, which eases correlation with the HLS synthesis report. In addition to typical C programming constructs, our framework supports advanced HLS constructs such as array mapping to external memory modules, non-inlined function calls, parallel execution introduced by loop unrolling, pipelining, etc., and accurate simulation of pipeline stalls during external memory access.
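To make the idea of a behavioural FSMD in C concrete, the following minimal sketch models a small, hypothetical GCD design. It is not output of our framework: the state encoding, the register names `a_reg` and `b_reg`, and the helper `gcd_sim` are assumptions made for illustration, and `ap_CS_fsm` is used only as an example of a state register name carried over from the RTL. Each call to `fsmd_clock` stands for one clock cycle, so the cycle counter gives a latency estimate.

```c
/* Controller states (hypothetical encoding). */
enum state { ST_IDLE, ST_CMP, ST_SUB, ST_DONE };

/* Registers of the hypothetical design; in generated code the
 * names would mirror those found in the RTL. */
struct fsmd {
    enum state ap_CS_fsm;    /* current-state register           */
    unsigned int a_reg;      /* datapath register                */
    unsigned int b_reg;      /* datapath register                */
    unsigned long cycles;    /* clock cycles simulated so far    */
};

/* One call models one clock edge: the current state selects the
 * register transfers to perform and the next state. */
static void fsmd_clock(struct fsmd *m, unsigned int a_in, unsigned int b_in)
{
    switch (m->ap_CS_fsm) {
    case ST_IDLE:            /* load the input ports */
        m->a_reg = a_in;
        m->b_reg = b_in;
        m->ap_CS_fsm = ST_CMP;
        break;
    case ST_CMP:             /* controller decision only */
        m->ap_CS_fsm = (m->a_reg == m->b_reg) ? ST_DONE : ST_SUB;
        break;
    case ST_SUB:             /* subtract the smaller from the larger */
        if (m->a_reg > m->b_reg)
            m->a_reg -= m->b_reg;
        else
            m->b_reg -= m->a_reg;
        m->ap_CS_fsm = ST_CMP;
        break;
    case ST_DONE:
        break;
    }
    m->cycles++;
}

/* Runs the model to completion. Both inputs must be positive,
 * otherwise the subtraction loop never terminates. */
unsigned int gcd_sim(unsigned int a, unsigned int b, unsigned long *cycles)
{
    struct fsmd m = { ST_IDLE, 0, 0, 0 };
    while (m.ap_CS_fsm != ST_DONE)
        fsmd_clock(&m, a, b);
    if (cycles)
        *cycles = m.cycles;
    return m.a_reg;
}
```

For example, `gcd_sim(12, 18, &n)` returns 6 and sets `n` to 6 modeled clock cycles. Keeping one `case` per controller state is what lets the simulated state sequence match that of the RTL.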

The contributions of this chapter are summarized as follows:

• We demonstrate the efficiency of FSMD-aware RTL to C conversion for faster HLS design verification.

• We present FastSim, a completely automated, fast, and cycle-accurate simulation-based verification framework for HLS-generated RTL. The framework ensures the end-to-end correctness of HLS. It is also equipped to give accurate design performance estimation.

• FastSim can model various forms of hardware parallelism, such as loop-level and task-level pipelines. Our simulator also generates well-indented and well-organized simulation C code for convenient design debugging.

• We also present a detailed experimental comparison of our simulation framework, for RTL generated by the Vivado HLS tool, with state-of-the-art simulators such as Verilator, ModelSim, and the Vivado RTL co-simulator (XSIM), on diverse workloads from the CHStone benchmark suite and several other standard programs.
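The claim that a sequential C program can model a loop pipeline can be illustrated with a generic sketch. It is not FastSim's own strategy, which Section 3.5 describes; the two-stage pipeline, the function `pipelined_sum`, and the register names are assumptions made for this example. Evaluating the stages from last to first within each simulated cycle lets every stage read the pipeline register value written in the previous cycle.

```c
/* Models a hypothetical two-stage loop pipeline with an initiation
 * interval of one cycle:
 *   stage 1: read x[i] and multiply it by 3
 *   stage 2: add the product to the accumulator
 * Returns the number of simulated clock cycles and stores the
 * accumulated sum in *result. */
unsigned long pipelined_sum(const int *x, int n, long *result)
{
    int  s1_valid = 0;   /* valid bit of the stage-1 to stage-2 register */
    long s1_data  = 0;   /* payload of that pipeline register            */
    long acc      = 0;   /* accumulator register written by stage 2      */
    int  i        = 0;   /* loop counter register                        */
    unsigned long cycles = 0;

    while (i < n || s1_valid) {
        /* Stage 2 is evaluated first, so it consumes the value that
         * stage 1 produced in the previous cycle. */
        if (s1_valid)
            acc += s1_data;

        /* Stage 1 then overwrites the pipeline register, or marks it
         * empty once all iterations have been issued. */
        if (i < n) {
            s1_data  = 3L * x[i];
            s1_valid = 1;
            i++;
        } else {
            s1_valid = 0;
        }
        cycles++;
    }
    *result = acc;
    return cycles;
}
```

For an input of four elements, the model takes five cycles: one iteration is issued per cycle, plus one extra cycle to drain the second stage.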

The remainder of this chapter is organized as follows. Section 3.2 presents our proposed framework and flow diagrams. The RTL to C conversion and its implementation details are discussed in Section 3.3. Section 3.4 describes the major challenges faced during RTL to C conversion. In Section 3.5, we elaborate on the parallel execution strategy adopted by FastSim. The debug strategies and design performance estimation are discussed in Section 3.6. Experimental results are presented in Section 3.7. Finally, Section 3.8 concludes the chapter.