• No results found

Reverse Engineering of High-Level Synthesis and its Applications

N/A
N/A
Protected

Academic year: 2023

Share "Reverse Engineering of High-Level Synthesis and its Applications"

Copied!
176
0
0

Loading.... (view fulltext now)

Full text

To my knowledge, it has not been submitted elsewhere for the award of a degree. In the first application, we show that RTL-C can be used for faster simulation-based verification of HLS.

High-Level Synthesis Flow

  • Preprocessing
  • Scheduling
    • Scheduling Algorithms in HLS
  • Allocation and Binding
  • Datapath and Controller Generation

But, in the case of ALAP scheduling, it can be observed that operations V8 and V9 do not have any direct successors. In case of register connection, it can be observed that t1, t3 and t6 are connected to R1.

Figure 1.2: Example of 2 nd order differential equation solver (DIFFEQ)
Figure 1.2: Example of 2 nd order differential equation solver (DIFFEQ)

Advances in High-Level Synthesis

The Rathlin Image Processing Language (RIPL) [120] is a self-contained high-level domain-specific image processing language for developing memory-efficient image processing applications on FPGAs. Other recent application areas on HLS include safe hardware implementation of logic locking at the RTL and C levels [22], HLS for machine learning application [60] is gaining popularity.

Table 1.1: Overview of EDA Tools for High-Level Synthesis
Table 1.1: Overview of EDA Tools for High-Level Synthesis

Motivations and Objectives

  • HLS tool Usage
  • Slower RTL Simulation
  • Lack of Formal Verification Support
  • Threat of Hardware Trojan in HLS
  • Objectives

RTL co-simulation is the primary way to verify the correctness of the generated RTL of an HLS tool. Therefore, generating an equivalent C code from the RTL would greatly reduce the HLS verification time.

Contributions

  • FastSim: A Fast Simulation Framework for High-Level Synthesis
  • DEEQ: Data-driven End-to-End Equivalence Checking of High-Level
  • REVAMP: Reverse Engineering Register to Variable Mapping in High-
  • BLAST: Belling the Black-Hat High-Level Synthesis Tool

Specifically, the proposed method will formally prove the equivalence between the input C and the generated C from the RTL. This work contributes a C to RTL equivalence checking method to prove the correctness of the HLS-generated RTL against its original C code.

Organization of the Thesis

Generic RTL to C Conversion for Fast Simulation

While Verilator has been incorporated over the years with advanced optimizations to speed up simulations, it ignores the inherent FSMD framework of the HLS-based RTLs. As discussed above, most existing works fail to address the inherent FSMD framework of the HLS-based RTLs and debug errors that could occur during mapping, bindings, and data path and controller generation phases of HLS, and simulate stalls from accessing external memory.

Verification of High-level Synthesis

Formal Verification of HLS

  • Phase wise Verification of HLS
  • End to end Verification of HLS

It uses a state-based equivalence checking methodology to verify the correctness of the controller behavior. In this method, two key optimizations (operating gate and global variables) are identified that complicate equivalence checking for high-level synthesis.

Hardware Trojan Detection

Therefore, more advanced detection mechanisms are required to eliminate the requirement for the golden model. However, most of the HT detection method discussed above compared the golden model with a design circuit to detect HT.

Conclusion

Contributions

Specifically, we propose a framework that converts an HLS-generated RTL into an equivalent C code similar to Verilator, but takes advantage of the structure of the HLS-generated RTL. In this way, our simulator automatically generates the C-code behavioral FSMD semantics of the HLS-generated Verilog RTL, while preserving the state machine order of the synthesized RTL.

Our Proposed Framework

We also present a detailed experimental comparison of our RTL simulation framework generated by the Vivado HLS tool with state-of-the-art simulators such as Verilator, ModelSim, Vivado RTL co-simulator (XSIM), etc. Then we have a simulation phase where the behavioral C source and RTL-C code are co-simulated with C compilers like GCC or using C-simulation HLS tools with test cases to verify the functional correctness of the generated RTL.

Figure 3.1: FastSim: Overall flow
Figure 3.1: FastSim: Overall flow

RTL to C Conversion

  • RTL Structure
  • AST Representation
  • Automatic Pre-processing of RTL
    • Concat Operation
    • Part-select Operation
  • RTL to C Conversion
    • Extraction of Variables, Controller and State wise Micro-
    • Rewrite Method
    • RAM, ROM and Modules
    • Generation of Cycle Accurate RTL-C Code

The next task is to identify the RT operations in each state of the FSM controller from the active microoperations in that state. This FSMD constitutes state-separated register transfers and is a behavioral description of the RTL.

Figure 3.4: AST Code Snippet
Figure 3.4: AST Code Snippet

Challenges Resolved in RTL to C Conversion

  • Data Inconsistency
  • Sign Conversion
  • Data-width Mismatch
  • Level-triggered Operations

On the other hand, the variable r C in RTL-C stores all 42 bits of the result. Second, we use the new value of the variable (instead of the old value as in edge.

Figure 3.13: Data inconsistency problem
Figure 3.13: Data inconsistency problem

Modelling Hardware Parallelism in C

Loop Unrolling

3.17, a is updated to an old + 5 before copying to b as shown in the solution given in Fig. The bare FSM model is similar to the basic FSM where register transfers form loop iterations.

Instruction Level Pipelining

The self loop transitionT3 of stateS3 indicates the inner for loop,loop-1.2 of the function warp. The main difference from conventional states is that, in addition to usual FSM state variables, there are two additional variables, S3 1 flag and S3 2 flag that represent the activation flags of substages S3-1 and S3-2, respectively, of stage S3.

Figure 3.20: Generated FSM for warp module
Figure 3.20: Generated FSM for warp module

Task Level Pipelining

In a no-data-flow case, FastSim makes a function call to such an FSMD in the state where the function was scheduled. In the data flow scenario, it also extracts the FIFO/PIPO structural instances and models the FIFO/PIPO transactions using FIFO/PIPO module function call.

Figure 3.22: Sample function “model flow”
Figure 3.22: Sample function “model flow”

Debug Framework and Performance Estimation

For loop boundaries and data-dependent conditionals, the execution cycles depend on the input. In fact, additional performance results such as the number of execution cycles for each module or loop and the number of cycles when a particular FIFO was full/empty can also be reported using additional trace variables in our RTL-C code.

Experimental Results

  • Experimental Setup and Benchmark Characteristics
  • RTL to C Conversion Results
  • HLS Simulation Results
  • Performance Estimation

The experimental details of the RTL to C conversion process are presented in Table 3.3 for the benchmark programs. In Table 3.7, we compare the performance estimates of FastSim with respect to different state-of-art simulation frameworks for 30,000 test cases.

Table 3.3: RTL to C conversion results for Benchmarks
Table 3.3: RTL to C conversion results for Benchmarks

Conclusion

Contributions

In this chapter, we propose a C to RTL equivalence check method called DEEQ to prove the correctness of the RTL generated by the HLS with respect to the input C code. It reduces many-to-many trace equivalence to one-to-one trace equivalence in most cases.

Equivalence Problem Formulation

The FSMD Model

Each arithmetic predicate is of the form eR0, where e is an arithmetic expression andR P t““,‰,ě,ą,ă,ďu. The execution conditions and path data transformation can also be defined in a similar way.

Figure 4.1: Example to illustrate FSMD Model
Figure 4.1: Example to illustrate FSMD Model

Equivalence of FSMDs

Equivalence Checking between C and RTL-C

  • Generate all Traces in Both Behaviors
  • Merge Compatible Traces
  • Find Potential Corresponding Traces
  • Checking One-to-one Equivalence
  • Checking One-to-many Equivalence

The union of the condition for the execution of all such traces must correspond to the condition for the execution of τ0 (in line 28). Since the behavior is deterministic, the union of the condition for the executions of all such traces must be the same as the trace of the other behavior.

Figure 4.2: Proposed equivalence checking framework
Figure 4.2: Proposed equivalence checking framework

Correctness

Termination

Although SMT expects the program to be in static single assignment (SSA) [32] form, we do not need to convert C and RTL-C to SSA. We do not use every line of the program in the SMT formulation. If the SMT solver returns SAT, it means that the corresponding formulas are not equivalent.

Soundness

For each τ0 in the rest of the trace in T0, a subset of traces is identified in T1 such that the data transformation of each of them is equivalent to τ0 and the union of its execution condition is equivalent to the execution condition of τ0. So the union of condition of performances of all such corresponding traces in T1 is also true.

Completeness

For a subset of traces in T0, one-to-one equivalence is shown with T1 in the while loop. Therefore, the underlying SMT theory used to encode the trace equivalence problem is undecidable.

Experiment Results and Analysis

The results of the equivalence check in terms of total time spent are given in 13. For Dfadd, the number of traces after merging is not equal. the equivalence check takes more time.

Table 4.1: Experimental results for different high-level synthesis benchmarks
Table 4.1: Experimental results for different high-level synthesis benchmarks

Conclusion

Contributions

In our approach, we first obtain SD-C and RTL-C from the scheduling information generated by the HLS tool and the output RTL, respectively. With this mapping information, we can rewrite the RTL-C in terms of variables in SD-C and finally generate an equivalent planned C code of the RTL-C.

Daikon based Reverse Engineering of Register to Variable Mapping

  • Reverse Engineering Steps
    • RTL-C Abstraction from RTL
    • Scheduled C code
    • Combine Two C Codes
    • Invariant Generation using Daikon
    • Extract Useful Mapping
  • Challenges Resolved
    • Generating Quality Test Cases
    • Unmapped Temporary Variables
    • Transitive Analysis of Daikon Output
    • One Register to Many Variables Mapping in a State
    • Mealy vs Moore Models
  • Correctness of Reverse Engineering Flow
  • Experimental Results

The useful mapping of the registers of RTL-C and the variables of SD-C is shown in Fig.5.8. The RTL-C and SD-C codes are then combined manually and the rest of the flow is automated.

Figure 5.1: Daikon based register to variable reverse engineering flow
Figure 5.1: Daikon based register to variable reverse engineering flow

SMT based Reverse Engineering of Register to Variable Mapping

Reverse Engineering Steps

  • FSMD Extraction from HLS Generated RTL
  • SSA Transformation
  • SAT Formulation

The reverse engineering method is presented as Algorithm 3. The lines 1-10 of Algorithm 3 perform this step. All the registers in the RTL-FSMD can be replaced by the corresponding variable with mvr.

Figure 5.11: SMT based Register to variable reverse engineering flow
Figure 5.11: SMT based Register to variable reverse engineering flow

Experimental Results

We have done simulation-based verification for our benchmarks to check the correctness of the registry for variable reverse engineering. The outputs match all benchmarks used, confirming the correctness of our reverse engineering flow.

Applications of REVAMP Framework

Register Allocation and Security Aspect

For example, fpvaq "fpvbq "fpvcq " rx means that the variables va, vb and vc are mapped to the register rx. In this work, we target the reverse engineering of the register to variable mapping in register allocation.

Correlating C and RTL

Fast Simulation and Debug

Comparisons of Daikon and SMT based Reverse Engineering Frameworks

Conclusion

Contributions

In [16], we have used the high-level behavior of RTL to prove the correctness of HLS by showing the equivalence between C and RTL. For the completeness of the chapter, we discuss the ideas of [15] and [16] and briefly here as well.

HLS Trojan Detection Framework

In [15], we showed a way to extract high-level behavior from HLS-generated RTLs. Specifically, we look for any inconsistencies or differences between high-level behavior extraction from RTL in [15] or during equivalence checking in [16].

Detection of Battery Exhaustion Attack

Attack Model

The output of the FUs is bit flipped and then multiplexed with the actual input of the FU. We consider only the relevant part of the data path to explain the effect of the battery exhaustion attack.

RTL-FSMD Extraction

Battery drain attack will be detected while deriving RTL-FSMD from Verilog RTL. The general idea of ​​the RTL-FSMD extraction process will be explained in the next subsection and how the BE attack will be detected during that phase in the subsequent subsection.

Detection

1 Collect all bit-reversed registers in state sn in Rbf from RSn (Ref. Algorithm 1);. Since r4 and r5 appear in the input sample of the multiplier, Algorithm 4 will report a possible battery drain attack.

Detection of Degradation Attack

Attack Model

As a result, the number of tracks in the RTL-FSMD increased and the execution state of some tracks also changed. Consequently, the equivalence of some traces of C-FSMD could not be found in RTL-FSMD.

C to RTL Equivalence Checking

DA will be detected during equivalence checking of two FSMDs (between C-FSMD and RTL-FSMD). In such a situation, we will analyze to find a set of tracks in RTL-FSMD whose association corresponds to the track in C-FSMD.

Detection

A potentially matched pair of tracks is equivalent if their respective execution condition and data transformations are equivalent (in lines 8-10). Therefore, a careful manual inspection of the execution condition of the traces will identify if a spurious HT trigger condition has been added in the RTL-FSMD.

Detection of Downgrade Attack

Attack Model

We only consider the relevant part of the data path to explain the effect of a downgrade attack. If the trigger condition is True, the loop is repeated in2 times instead of in1 times.

Figure 6.6: (a) C Code from AES before adding tj, (b) Representative RTL-C after downgrade attack
Figure 6.6: (a) C Code from AES before adding tj, (b) Representative RTL-C after downgrade attack

Detection

By carefully analyzing the traces included in the ccomb, we can identify a false HT trigger condition for a step-down attack and the loop counter value is changed. In addition to the number of loops, the actual operations are also affected by the change.

Figure 6.8: An example to illustrate the effect of downgrade attack: (a) Trace of τ 00 from C- C-FSMD obtained after unrolling (nb = 64) (b) Trace of τ 10 from RTL-FSMD obtained after unrolling (nb = 32) (c) Trace of τ 11 from RTL-FSMD obtained after unrol
Figure 6.8: An example to illustrate the effect of downgrade attack: (a) Trace of τ 00 from C- C-FSMD obtained after unrolling (nb = 64) (b) Trace of τ 10 from RTL-FSMD obtained after unrolling (nb = 32) (c) Trace of τ 11 from RTL-FSMD obtained after unrol

Experimental Results

In the case of exhaustion attack, the detection time is shorter compared to other attacks because the BE attack is identified during the FSMD extraction phase. From the device usage report summary obtained after synthesis, we calculate the overhead (slices, flipflops and LUTs) required by the additional logic added to implement BE, DA and DG in the original RTL.

Table 6.1: Experimental Results of Battery exhaustion degrade, and downgrade attacks for dif- dif-ferent high-level synthesis benchmarks
Table 6.1: Experimental Results of Battery exhaustion degrade, and downgrade attacks for dif- dif-ferent high-level synthesis benchmarks

Performance of BLAST for HLS Optimizations

Such optimizations do not affect the control flow of the input behavior much. Therefore, the performance of BLAST will not be affected by applications of such software optimizations in the front-end of HLS.

Figure 6.9: Representation of pipelined loop in C
Figure 6.9: Representation of pipelined loop in C

Conclusion

Daikon and SMT-based frameworks for reverse engineering the register to variable mapping in high-level synthesis are presented in Chapter 5. The experimental results show that our methods detect HTs from the black-hat HLS tool.

Future Perspectives

Critical Path Reporting at the Behavioral Level: In circuit design, a critical path is the longest combinational path in a circuit that must be completed within a target time period. Although the overhead area is not important for such locking, additional logic can be inserted into the critical path of the design.

Conclusion

FastSim: A fast simulation framework for high-level synthesis”, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. A fast simulation framework for high-level synthesis. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

High-level Synthesis flow

Example of 2 nd order differential equation solver (DIFFEQ)

Basic Blocks with 3-address codes for DIFFEQ

RTL: (a) Datapath unit, (b) Controller unit

Contributions of the Thesis

FastSim: Overall flow

RTL structure generated by HLS

AST Code Snippet

Example of pre-processing of Concat operation

Example of pre-processing part-select operation

Parser Flow Diagram

Data-structure for storing variables

Data-structure for storing Controller

Data-structure for state-wise Micro-operations with their conditions

RTL-C code snippet for RAM module

Generated RTL-C code outline

Data inconsistency problem

A Solution to Data inconsistency solution

Sign conversion issue and solution

Data-width mismatch and solution

RTL with level triggered operation

RTL-C Code with level-triggered always block issue and its solution

Sample function warp with pipelined loop

Generated FSM for warp module

A representative code snippet depicting the pipelined state S3 in warp module 56

Control flow of RTL-C: Task level pipeline

Representative RTL-C code structure of main

Representative RTL-C code structure of module 2

Example to illustrate FSMD Model

Proposed equivalence checking framework

Daikon based register to variable reverse engineering flow

C source code of Diffeq example

State 1 of diffeq.verbose.rpt file generated by Vivado HLS

State 1 of scheduled C code after decoding verbose file

State 1 in RTL-C of Diffeq example

State 1 of combined C code

Daikon invariants output for state 1

Mapping of State 1 registers to variables

SMT based Register to variable reverse engineering flow

SSA example

The overall flow of our HT detection framework

An example to illustrate the effect of battery exhaustion attack

Overview of EDA Tools for High-Level Synthesis

C simulation vs RTL simulation comparison

Characteristics of Benchmarks

RTL to C conversion results for Benchmarks

Comparisons of FastSim with various RTL Simulators

Comparisons of FastSim with various RTL Simulators after applying pipeline

Comparisons of results for task level pipelining using ping pong (pp) or FIFO

Performance estimation in clock cycles by Vivado synthesis report, Fastsim

Experimental results for different high-level synthesis benchmarks

Experimental Results for different high-level synthesis benchmarks

Experimental Results for different high-level synthesis benchmarks

Experimental Results of Battery exhaustion degrade, and downgrade attacks

Comparisons of area overhead for RTL (original) with respect to the RTL(BE),

Comparisons of increase in delay for RTL (original) with RTL(BE) and

Figure

Figure 1.1: High-level Synthesis flow
Figure 1.5: (a) ASAP, (b) ALAP, (c) List, (D) Force directed scheduling of DIFFEQ example
Figure 1.6: (a) Registers allocation and binding , (b) Multiplier allocation and binding of DIFFEQ example
Figure 1.7: RTL: (a) Datapath unit, (b) Controller unit
+7

References

Related documents

Teachers have reported that students are more likely to complete assignments if they have access to internet at home.] In response to school closures caused by COVID-19, UNESCO