To my knowledge, it has not been submitted elsewhere for the award of a degree. In the first application, we show that RTL-C can be used for faster simulation-based verification of HLS.
High-Level Synthesis Flow
- Preprocessing
- Scheduling
- Scheduling Algorithms in HLS
- Allocation and Binding
- Datapath and Controller Generation
But, in the case of ALAP scheduling, it can be observed that operations V8 and V9 do not have any direct successors. In case of register connection, it can be observed that t1, t3 and t6 are connected to R1.
Advances in High-Level Synthesis
The Rathlin Image Processing Language (RIPL) [120] is a self-contained high-level domain-specific image processing language for developing memory-efficient image processing applications on FPGAs. Other recent application areas on HLS include safe hardware implementation of logic locking at the RTL and C levels [22], HLS for machine learning application [60] is gaining popularity.
Motivations and Objectives
- HLS tool Usage
- Slower RTL Simulation
- Lack of Formal Verification Support
- Threat of Hardware Trojan in HLS
- Objectives
RTL co-simulation is the primary way to verify the correctness of the generated RTL of an HLS tool. Therefore, generating an equivalent C code from the RTL would greatly reduce the HLS verification time.
Contributions
- FastSim: A Fast Simulation Framework for High-Level Synthesis
- DEEQ: Data-driven End-to-End Equivalence Checking of High-Level
- REVAMP: Reverse Engineering Register to Variable Mapping in High-
- BLAST: Belling the Black-Hat High-Level Synthesis Tool
Specifically, the proposed method will formally prove the equivalence between the input C and the generated C from the RTL. This work contributes a C to RTL equivalence checking method to prove the correctness of the HLS-generated RTL against its original C code.
Organization of the Thesis
Generic RTL to C Conversion for Fast Simulation
While Verilator has been incorporated over the years with advanced optimizations to speed up simulations, it ignores the inherent FSMD framework of the HLS-based RTLs. As discussed above, most existing works fail to address the inherent FSMD framework of the HLS-based RTLs and debug errors that could occur during mapping, bindings, and data path and controller generation phases of HLS, and simulate stalls from accessing external memory.
Verification of High-level Synthesis
Formal Verification of HLS
- Phase wise Verification of HLS
- End to end Verification of HLS
It uses a state-based equivalence checking methodology to verify the correctness of the controller behavior. In this method, two key optimizations (operating gate and global variables) are identified that complicate equivalence checking for high-level synthesis.
Hardware Trojan Detection
Therefore, more advanced detection mechanisms are required to eliminate the requirement for the golden model. However, most of the HT detection method discussed above compared the golden model with a design circuit to detect HT.
Conclusion
Contributions
Specifically, we propose a framework that converts an HLS-generated RTL into an equivalent C code similar to Verilator, but takes advantage of the structure of the HLS-generated RTL. In this way, our simulator automatically generates the C-code behavioral FSMD semantics of the HLS-generated Verilog RTL, while preserving the state machine order of the synthesized RTL.
Our Proposed Framework
We also present a detailed experimental comparison of our RTL simulation framework generated by the Vivado HLS tool with state-of-the-art simulators such as Verilator, ModelSim, Vivado RTL co-simulator (XSIM), etc. Then we have a simulation phase where the behavioral C source and RTL-C code are co-simulated with C compilers like GCC or using C-simulation HLS tools with test cases to verify the functional correctness of the generated RTL.
RTL to C Conversion
- RTL Structure
- AST Representation
- Automatic Pre-processing of RTL
- Concat Operation
- Part-select Operation
- RTL to C Conversion
- Extraction of Variables, Controller and State wise Micro-
- Rewrite Method
- RAM, ROM and Modules
- Generation of Cycle Accurate RTL-C Code
The next task is to identify the RT operations in each state of the FSM controller from the active microoperations in that state. This FSMD constitutes state-separated register transfers and is a behavioral description of the RTL.
Challenges Resolved in RTL to C Conversion
- Data Inconsistency
- Sign Conversion
- Data-width Mismatch
- Level-triggered Operations
On the other hand, the variable r C in RTL-C stores all 42 bits of the result. Second, we use the new value of the variable (instead of the old value as in edge.
Modelling Hardware Parallelism in C
Loop Unrolling
3.17, a is updated to an old + 5 before copying to b as shown in the solution given in Fig. The bare FSM model is similar to the basic FSM where register transfers form loop iterations.
Instruction Level Pipelining
The self loop transitionT3 of stateS3 indicates the inner for loop,loop-1.2 of the function warp. The main difference from conventional states is that, in addition to usual FSM state variables, there are two additional variables, S3 1 flag and S3 2 flag that represent the activation flags of substages S3-1 and S3-2, respectively, of stage S3.
Task Level Pipelining
In a no-data-flow case, FastSim makes a function call to such an FSMD in the state where the function was scheduled. In the data flow scenario, it also extracts the FIFO/PIPO structural instances and models the FIFO/PIPO transactions using FIFO/PIPO module function call.
Debug Framework and Performance Estimation
For loop boundaries and data-dependent conditionals, the execution cycles depend on the input. In fact, additional performance results such as the number of execution cycles for each module or loop and the number of cycles when a particular FIFO was full/empty can also be reported using additional trace variables in our RTL-C code.
Experimental Results
- Experimental Setup and Benchmark Characteristics
- RTL to C Conversion Results
- HLS Simulation Results
- Performance Estimation
The experimental details of the RTL to C conversion process are presented in Table 3.3 for the benchmark programs. In Table 3.7, we compare the performance estimates of FastSim with respect to different state-of-art simulation frameworks for 30,000 test cases.
Conclusion
Contributions
In this chapter, we propose a C to RTL equivalence check method called DEEQ to prove the correctness of the RTL generated by the HLS with respect to the input C code. It reduces many-to-many trace equivalence to one-to-one trace equivalence in most cases.
Equivalence Problem Formulation
The FSMD Model
Each arithmetic predicate is of the form eR0, where e is an arithmetic expression andR P t““,‰,ě,ą,ă,ďu. The execution conditions and path data transformation can also be defined in a similar way.
Equivalence of FSMDs
Equivalence Checking between C and RTL-C
- Generate all Traces in Both Behaviors
- Merge Compatible Traces
- Find Potential Corresponding Traces
- Checking One-to-one Equivalence
- Checking One-to-many Equivalence
The union of the condition for the execution of all such traces must correspond to the condition for the execution of τ0 (in line 28). Since the behavior is deterministic, the union of the condition for the executions of all such traces must be the same as the trace of the other behavior.
Correctness
Termination
Although SMT expects the program to be in static single assignment (SSA) [32] form, we do not need to convert C and RTL-C to SSA. We do not use every line of the program in the SMT formulation. If the SMT solver returns SAT, it means that the corresponding formulas are not equivalent.
Soundness
For each τ0 in the rest of the trace in T0, a subset of traces is identified in T1 such that the data transformation of each of them is equivalent to τ0 and the union of its execution condition is equivalent to the execution condition of τ0. So the union of condition of performances of all such corresponding traces in T1 is also true.
Completeness
For a subset of traces in T0, one-to-one equivalence is shown with T1 in the while loop. Therefore, the underlying SMT theory used to encode the trace equivalence problem is undecidable.
Experiment Results and Analysis
The results of the equivalence check in terms of total time spent are given in 13. For Dfadd, the number of traces after merging is not equal. the equivalence check takes more time.
Conclusion
Contributions
In our approach, we first obtain SD-C and RTL-C from the scheduling information generated by the HLS tool and the output RTL, respectively. With this mapping information, we can rewrite the RTL-C in terms of variables in SD-C and finally generate an equivalent planned C code of the RTL-C.
Daikon based Reverse Engineering of Register to Variable Mapping
- Reverse Engineering Steps
- RTL-C Abstraction from RTL
- Scheduled C code
- Combine Two C Codes
- Invariant Generation using Daikon
- Extract Useful Mapping
- Challenges Resolved
- Generating Quality Test Cases
- Unmapped Temporary Variables
- Transitive Analysis of Daikon Output
- One Register to Many Variables Mapping in a State
- Mealy vs Moore Models
- Correctness of Reverse Engineering Flow
- Experimental Results
The useful mapping of the registers of RTL-C and the variables of SD-C is shown in Fig.5.8. The RTL-C and SD-C codes are then combined manually and the rest of the flow is automated.
SMT based Reverse Engineering of Register to Variable Mapping
Reverse Engineering Steps
- FSMD Extraction from HLS Generated RTL
- SSA Transformation
- SAT Formulation
The reverse engineering method is presented as Algorithm 3. The lines 1-10 of Algorithm 3 perform this step. All the registers in the RTL-FSMD can be replaced by the corresponding variable with mvr.
Experimental Results
We have done simulation-based verification for our benchmarks to check the correctness of the registry for variable reverse engineering. The outputs match all benchmarks used, confirming the correctness of our reverse engineering flow.
Applications of REVAMP Framework
Register Allocation and Security Aspect
For example, fpvaq "fpvbq "fpvcq " rx means that the variables va, vb and vc are mapped to the register rx. In this work, we target the reverse engineering of the register to variable mapping in register allocation.
Correlating C and RTL
Fast Simulation and Debug
Comparisons of Daikon and SMT based Reverse Engineering Frameworks
Conclusion
Contributions
In [16], we have used the high-level behavior of RTL to prove the correctness of HLS by showing the equivalence between C and RTL. For the completeness of the chapter, we discuss the ideas of [15] and [16] and briefly here as well.
HLS Trojan Detection Framework
In [15], we showed a way to extract high-level behavior from HLS-generated RTLs. Specifically, we look for any inconsistencies or differences between high-level behavior extraction from RTL in [15] or during equivalence checking in [16].
Detection of Battery Exhaustion Attack
Attack Model
The output of the FUs is bit flipped and then multiplexed with the actual input of the FU. We consider only the relevant part of the data path to explain the effect of the battery exhaustion attack.
RTL-FSMD Extraction
Battery drain attack will be detected while deriving RTL-FSMD from Verilog RTL. The general idea of the RTL-FSMD extraction process will be explained in the next subsection and how the BE attack will be detected during that phase in the subsequent subsection.
Detection
1 Collect all bit-reversed registers in state sn in Rbf from RSn (Ref. Algorithm 1);. Since r4 and r5 appear in the input sample of the multiplier, Algorithm 4 will report a possible battery drain attack.
Detection of Degradation Attack
Attack Model
As a result, the number of tracks in the RTL-FSMD increased and the execution state of some tracks also changed. Consequently, the equivalence of some traces of C-FSMD could not be found in RTL-FSMD.
C to RTL Equivalence Checking
DA will be detected during equivalence checking of two FSMDs (between C-FSMD and RTL-FSMD). In such a situation, we will analyze to find a set of tracks in RTL-FSMD whose association corresponds to the track in C-FSMD.
Detection
A potentially matched pair of tracks is equivalent if their respective execution condition and data transformations are equivalent (in lines 8-10). Therefore, a careful manual inspection of the execution condition of the traces will identify if a spurious HT trigger condition has been added in the RTL-FSMD.
Detection of Downgrade Attack
Attack Model
We only consider the relevant part of the data path to explain the effect of a downgrade attack. If the trigger condition is True, the loop is repeated in2 times instead of in1 times.
Detection
By carefully analyzing the traces included in the ccomb, we can identify a false HT trigger condition for a step-down attack and the loop counter value is changed. In addition to the number of loops, the actual operations are also affected by the change.
Experimental Results
In the case of exhaustion attack, the detection time is shorter compared to other attacks because the BE attack is identified during the FSMD extraction phase. From the device usage report summary obtained after synthesis, we calculate the overhead (slices, flipflops and LUTs) required by the additional logic added to implement BE, DA and DG in the original RTL.
Performance of BLAST for HLS Optimizations
Such optimizations do not affect the control flow of the input behavior much. Therefore, the performance of BLAST will not be affected by applications of such software optimizations in the front-end of HLS.
Conclusion
Daikon and SMT-based frameworks for reverse engineering the register to variable mapping in high-level synthesis are presented in Chapter 5. The experimental results show that our methods detect HTs from the black-hat HLS tool.
Future Perspectives
Critical Path Reporting at the Behavioral Level: In circuit design, a critical path is the longest combinational path in a circuit that must be completed within a target time period. Although the overhead area is not important for such locking, additional logic can be inserted into the critical path of the design.
Conclusion
FastSim: A fast simulation framework for high-level synthesis”, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. A fast simulation framework for high-level synthesis. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
High-level Synthesis flow
Example of 2 nd order differential equation solver (DIFFEQ)
Basic Blocks with 3-address codes for DIFFEQ
RTL: (a) Datapath unit, (b) Controller unit
Contributions of the Thesis
FastSim: Overall flow
RTL structure generated by HLS
AST Code Snippet
Example of pre-processing of Concat operation
Example of pre-processing part-select operation
Parser Flow Diagram
Data-structure for storing variables
Data-structure for storing Controller
Data-structure for state-wise Micro-operations with their conditions
RTL-C code snippet for RAM module
Generated RTL-C code outline
Data inconsistency problem
A Solution to Data inconsistency solution
Sign conversion issue and solution
Data-width mismatch and solution
RTL with level triggered operation
RTL-C Code with level-triggered always block issue and its solution
Sample function warp with pipelined loop
Generated FSM for warp module
A representative code snippet depicting the pipelined state S3 in warp module 56
Control flow of RTL-C: Task level pipeline
Representative RTL-C code structure of main
Representative RTL-C code structure of module 2
Example to illustrate FSMD Model
Proposed equivalence checking framework
Daikon based register to variable reverse engineering flow
C source code of Diffeq example
State 1 of diffeq.verbose.rpt file generated by Vivado HLS
State 1 of scheduled C code after decoding verbose file
State 1 in RTL-C of Diffeq example
State 1 of combined C code
Daikon invariants output for state 1
Mapping of State 1 registers to variables
SMT based Register to variable reverse engineering flow
SSA example
The overall flow of our HT detection framework
An example to illustrate the effect of battery exhaustion attack
Overview of EDA Tools for High-Level Synthesis
C simulation vs RTL simulation comparison
Characteristics of Benchmarks
RTL to C conversion results for Benchmarks
Comparisons of FastSim with various RTL Simulators
Comparisons of FastSim with various RTL Simulators after applying pipeline
Comparisons of results for task level pipelining using ping pong (pp) or FIFO
Performance estimation in clock cycles by Vivado synthesis report, Fastsim
Experimental results for different high-level synthesis benchmarks
Experimental Results for different high-level synthesis benchmarks
Experimental Results for different high-level synthesis benchmarks
Experimental Results of Battery exhaustion degrade, and downgrade attacks
Comparisons of area overhead for RTL (original) with respect to the RTL(BE),
Comparisons of increase in delay for RTL (original) with RTL(BE) and