Performance of BLAST for HLS Optimizations
Table 6.3: Comparison of the increase in delay for RTL (original) against RTL (BE) and RTL (DA)

                         Time summary (ns)
Benchmark   RTL code   RTL (original)   RTL (BE)   Increase in delay   RTL (DA)   Increase in delay
Parker      MIATBC          6.425         7.724          1.299           7.969          1.544
            MORTAC         10.785        11.857          1.072          11.913          1.128
            MCPD            9.625        10.574          0.949          10.689          1.064
WAKA        MIATBC          5.585         6.714          1.129           6.925          1.340
            MORTAC          7.921         8.985          1.064           9.566          1.645
            MCPD            7.243         8.123          0.880           8.964          1.721
Motion      MIATBC          9.265        10.148          0.883          10.282          1.017
            MORTAC         12.709        13.814          1.105          13.965          1.256
            MCPD            9.869        10.871          1.002          11.120          1.251
Array add   MIATBC          2.151         4.179          2.028           4.653          2.502
            MORTAC          5.212         6.871          1.659           6.789          1.577
            MCPD            4.947         5.986          1.039           6.125          1.178
DFadd       MIATBC          2.880         4.200          1.320           4.542          1.662
            MORTAC          5.330         6.812          1.482           7.152          1.822
            MCPD            4.947         6.100          1.153           6.512          1.565
AES         MIATBC          1.900         2.810          0.910           3.010          1.110
            MORTAC          5.781         6.662          0.881           6.753          0.972
            MCPD            4.977         6.170          1.193           6.341          1.364
DES         MIATBC          2.891         3.743          0.852           3.934          1.043
            MORTAC          5.542         6.336          0.794           6.482          0.940
            MCPD            4.943         5.891          0.948           5.982          1.039
BLAST: Belling the Black-Hat High-Level Synthesis Tool
lence checking phase. The RTL-FSMD extraction phase relies on the FastSim [15] tool.
This tool is equipped to handle all kinds of optimizations applied in HLS. To detect a BE attack, BLAST adds a module (i.e., Algorithm 4) to the FastSim flow that analyzes each state for bit-flipped operations, as discussed in Subsection 6.3.3.
Therefore, BLAST can detect the BE attack irrespective of which HLS optimizations are applied by the HLS tool. Since BLAST analyzes the RTL generated by the HLS tool state by state over the controller FSM, the run time of BE attack detection is not impacted much by the application of HLS optimizations. Usually, the BE attack is identified in seconds.
The detection of DA and DG attacks relies on C to RTL equivalence checking, in which the RTL-FSMD extracted in phase one is formally compared with the input C behaviour (i.e., the C-FSMD). Since BLAST checks trace-level equivalence between these two behaviours, a major change in the control flow due to HLS optimizations will affect the probability of detecting DA and DG attacks. Front-end optimizations such as constant propagation, copy propagation, common sub-expression elimination, dead code elimination, static single assignment, code motion and operator strength reduction (e.g., multiplication by a constant is replaced by a left shift by a constant) mostly affect the data dependences in the behaviour. Such optimizations have little impact on the control flow of the input behaviour. Therefore, the performance of BLAST will not be impacted by the application of such software optimizations in the front-end of HLS.
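As a rough illustration of why such front-end optimizations leave the traces unchanged, the following Python sketch compares a small behaviour before and after common sub-expression elimination and operator strength reduction. The functions `original` and `optimized` and the exhaustive check over a small input range are assumptions made for this example only; they are not taken from the benchmarks or from BLAST itself.

```python
def original(x, y):
    """Behaviour as written: returns (branch taken, result)."""
    if x > y:
        return "then", (x - y) * 8 + (x - y)
    return "else", (y - x) * 8


def optimized(x, y):
    """Same behaviour after CSE and strength reduction."""
    if x > y:
        t = x - y                    # common sub-expression computed once
        return "then", (t << 3) + t  # multiplication by 8 becomes a left shift by 3
    return "else", (y - x) << 3


def same_paths_and_values(lo=-8, hi=8):
    """Check that both versions take the same branch and produce the same value."""
    return all(
        original(x, y) == optimized(x, y)
        for x in range(lo, hi)
        for y in range(lo, hi)
    )
```

Both versions contain the same single branch, so each input follows the same path in either version; only the expressions along the path differ, which is the part an equivalence checker has to discharge.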
Let us now discuss the hardware-oriented optimizations. Array partitioning breaks an array into multiple arrays and maps them to multiple RAMs in order to improve memory access time. Array merging is the reverse process of array partitioning.
In our case, the RAMs are represented as arrays in the RTL-FSMD. So, we have two behaviours in which the number of intermediate arrays differs. The control structure of the input behaviour is not impacted by this optimization. Therefore, array partitioning/merging will not impact our DA and DG detection. Loop unrolling unrolls the loops of the input C. In Algorithm 5, we use Klee to identify the traces in the behaviours. Klee unrolls loops to identify the traces.
Although loop unrolling changes the control structure, it will not impact the detection of DA and DG attacks in BLAST, since loops are unrolled during detection anyway.
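The two arguments above can be made concrete with a small Python sketch. It assumes a cyclic partitioning into two banks and an unroll factor of 2; both choices, and all function names, are hypothetical and serve only to show that the computed result is unchanged.

```python
def sum_rolled(a):
    """Reference behaviour: one array, one rolled loop."""
    total = 0
    for i in range(len(a)):
        total += a[i]
    return total


def sum_partitioned(a):
    """Cyclic partitioning into two banks: element i lives in bank i % 2 at offset i // 2."""
    banks = [a[0::2], a[1::2]]
    total = 0
    for i in range(len(a)):  # same loop and same control structure as sum_rolled
        total += banks[i % 2][i // 2]
    return total


def sum_unrolled_by_2(a):
    """Loop unrolled by a factor of 2, with an epilogue for odd lengths."""
    total = 0
    n = len(a)
    i = 0
    while i + 1 < n:
        total += a[i]
        total += a[i + 1]
        i += 2
    if i < n:  # leftover element when n is odd
        total += a[i]
    return total
```

In the partitioned version only the memory mapping changes, while the loop is identical. In the unrolled version the loop structure differs, but a trace-based comparison that itself unrolls loops sees the same sequence of additions.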
Loop pipelining creates multiple stages within a state, where each stage works on the data of a different iteration of the loop. This allows multiple iterations of the loop to run in parallel, which improves the latency. For a pipelined function, the pipelined stages work in a similar manner. FastSim creates a sequential representation of the pipelined stages with suitable logic to handle the inherent dataflow among the subsequent stages.

    stage1:
        a = I1 + 10;
        b = I2 + 5;
    stage2:
        c = a + b;
    stage3:
        d = c * c;
    (a)

    State 1:
        // Code to update stage1, stage2 and stage3 flags
        a_t = a, b_t = b, c_t = c;
        if (stage1) {
            a = I1 + 10;
            b = I2 + 5;
        }
        if (stage2)
            c = a_t + b_t;
        if (stage3)
            d = c_t * c_t;
    (b)

Figure 6.9: Representation of pipelined loop in C

Consider the example in Fig. 6.9. Assume the operations within a loop body are scheduled in three pipeline stages as shown in Fig. 6.9(a). The corresponding RTL-FSMD behaviour is shown in Fig. 6.9(b). Each pipeline stage is activated by a flag.
In the first clock, only stage 1 is active, and in the second clock, both stage 1 and stage 2 are active. From the third clock onwards, all stages are active. FastSim copies the value of each intermediate variable into a temporary variable and uses the temporaries in the right-hand expressions of the operations. Consequently, at the ith clock, stage 1 works on the ith inputs, stage 2 works on the (i-1)th data and stage 3 works on the (i-2)th data. During equivalence checking between the C-FSMD and the RTL-FSMD, such a pipelined loop results in a single trace. The corresponding loop of the C-FSMD also results in a single trace. Thus, loop pipelining does not introduce any difference in control flow between the C-FSMD and the RTL-FSMD. Therefore, BLAST can detect DA and DG attacks in the presence of loop and function pipelining.
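The sequential representation of Fig. 6.9(b) can be mimicked in a few lines of Python to check that the temporaries really make each stage work on the right iteration. This is only a sketch: the flag schedule (stage k becomes active k-1 clocks after stage 1 and drains for as long afterwards) is an assumption, since the figure elides the flag-update code, and the function names are hypothetical.

```python
def reference(inputs):
    """Fig. 6.9(a): the three stages executed back to back for each iteration."""
    out = []
    for i1, i2 in inputs:
        a = i1 + 10
        b = i2 + 5
        c = a + b
        out.append(c * c)
    return out


def pipelined(inputs):
    """Fig. 6.9(b): one state executed every clock, stages guarded by flags."""
    n = len(inputs)
    a = b = c = 0
    out = []
    for clk in range(n + 2):  # two extra clocks to drain the pipeline
        # Assumed flag schedule: stage k starts k-1 clocks after stage 1.
        stage1 = clk < n
        stage2 = 1 <= clk < n + 1
        stage3 = 2 <= clk < n + 2
        # Copy intermediate values into temporaries before any stage overwrites them.
        a_t, b_t, c_t = a, b, c
        if stage1:
            i1, i2 = inputs[clk]
            a = i1 + 10
            b = i2 + 5
        if stage2:
            c = a_t + b_t
        if stage3:
            out.append(c_t * c_t)  # d = c_t * c_t
    return out
```

For the inputs `[(1, 2), (3, 4), (5, 6)]` both functions return `[324, 484, 676]`. Without the temporaries, stage 2 would read the values of `a` and `b` that stage 1 has just overwritten in the same clock, and the outputs would be shifted by one iteration.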
In data-flow optimization, the producer-consumer relations between the various modules in the input C code are identified, and such modules are executed in parallel in the RTL. A first-in-first-out (FIFO) or Ping-Pong buffer is used between each producer-consumer pair for asynchronous data communication. To model such parallel behaviour in the RTL-FSMD (which is a sequential behaviour), we first extract the RTL-FSMD of each module. We then generate a global RTL-FSMD in which one state of each module is executed in each clock (since the RTL-FSMD is a cycle-accurate model, the operations executed in each clock can be tracked). The next state to be executed in a module is determined by the state transition of the RTL-FSMD of the corresponding module. The details of such modeling may be found in [15]. Since the control flow of the RTL-FSMD in the presence of data-flow optimization is completely different from that of the C-FSMD, BLAST cannot detect a DA or DG attack in such a scenario. Specifically, Algorithm 5 returns a false negative (at line 19) if data-flow optimizations are applied. In general, if the control flow of the input behaviour is modified significantly by HLS, BLAST may return false-negative results.

We have used Klee to obtain the traces in a program (line 3 of Algorithm 5) and the Z3 SMT solver to check the equivalence of traces in our algorithms. So, the run time of BLAST for detecting DA and DG attacks largely depends on these two tools.
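To give an impression of the global RTL-FSMD described above for data-flow optimization, the following Python sketch executes one state of a producer module and one state of a consumer module in every clock, with a FIFO between them. The module states (`COMPUTE`, `PUSH`, `POP`, `ACC`), the FIFO depth and the computation are all hypothetical; the actual modeling is the one described in [15], which is not reproduced here.

```python
from collections import deque


def run_global_fsmd(inputs, fifo_depth=2):
    """Step a producer FSMD and a consumer FSMD together, one state each per clock."""
    fifo = deque()
    p_state, p_idx, p_reg = "COMPUTE", 0, 0  # producer state, read index, register
    c_state, c_reg = "POP", 0                # consumer state and register
    total, consumed = 0, 0
    schedule = []                            # (producer state, consumer state) per clock

    while consumed < len(inputs):
        schedule.append((p_state, c_state))
        # The consumer only sees data that was already in the FIFO at the start of the clock.
        had_data = len(fifo) > 0

        # Producer: execute its current state and take its own state transition.
        if p_state == "COMPUTE":
            if p_idx < len(inputs):
                p_reg = inputs[p_idx] * 2
                p_idx += 1
                p_state = "PUSH"
            else:
                p_state = "DONE"
        elif p_state == "PUSH" and len(fifo) < fifo_depth:
            fifo.append(p_reg)
            p_state = "COMPUTE"

        # Consumer: execute its current state and take its own state transition.
        if c_state == "POP":
            if had_data:
                c_reg = fifo.popleft()
                c_state = "ACC"
        elif c_state == "ACC":
            total += c_reg
            consumed += 1
            c_state = "POP"

    return total, schedule
```

For the input `[1, 2, 3]` the accumulated result equals that of the sequential code (12), but the schedule contains clocks such as `("PUSH", "ACC")` in which both modules do useful work at once. A sequential C-FSMD has no counterpart to such interleaved states, which is consistent with the claim that the control flow differs too much for trace-level equivalence checking.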