
Tutorial on GCC for Parallelization

Parallelization and Vectorization in GCC

Uday Khedker, Supratim Biswas, and Prashant Singh Rawat

(www.cse.iitb.ac.in/grc)

GCC Resource Center,

Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

February 2012


Outline

• Transformation for parallel and vector execution

• Data dependence

• Auto-parallelization and auto-vectorization in the Lambda framework

• Introduction to Polyhedral Framework

• Conclusion


The Scope of This Tutorial

• What this tutorial does not address

Details of algorithms, code and data structures used for parallelization and vectorization

Machine level issues related to parallelization and vectorization

• What this tutorial addresses

GCC’s approach of discovering and exploiting parallelism

Illustrated using carefully chosen examples


Part 1

Transformations for Parallel and Vector Execution

Vectorization: SISD ⇒ SIMD

• Parallelism in executing operations on shorter operands (8-bit, 16-bit, or 32-bit operands)

• Existing 32- or 64-bit arithmetic units are used to perform multiple operations in parallel

A 64-bit word ≡ a vector of 2 × (32 bits), 4 × (16 bits), or 8 × (8 bits)
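A minimal, self-contained illustration (not from the slides): GCC's vector extensions let a 64-bit quantity be treated as four 16-bit lanes, so a single addition performs four additions at once.

    /* Illustrative sketch using GCC vector extensions (GNU C). */
    #include <stdio.h>
    #include <string.h>

    typedef short v4hi __attribute__((vector_size(8)));   /* 4 x 16 bits = 64 bits */

    int main(void)
    {
        v4hi a = {1, 2, 3, 4};
        v4hi b = {10, 20, 30, 40};
        v4hi c = a + b;               /* element-wise: {11, 22, 33, 44} */

        short out[4];
        memcpy(out, &c, sizeof out);  /* copy the lanes out for printing */
        printf("%d %d %d %d\n", out[0], out[1], out[2], out[3]);
        return 0;
    }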

Example 1

Vectorization (SISD ⇒ SIMD): Yes    Parallelization (SISD ⇒ MIMD): Yes

Original Code

int A[N], B[N], i;
for (i=1; i<N; i++)
    A[i] = A[i] + B[i-1];

Observe the reads and writes into a given location.

[Figure: arrays A[0..N] and B[0..N], with the locations accessed in each iteration marked. Iteration i reads and writes A[i] and reads B[i-1], so no location is touched by more than one iteration.]


Example 1: Vectorized Code (vectorization factor = 4)

int A[N], B[N], i;
for (i=1; i<N; i=i+4)
    A[i:i+3] = A[i:i+3] + B[i-1:i+2];

Four consecutive iterations of the original loop are performed by each vector statement.
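As an aside (not on the slides), these are the mainline GCC options relevant to the transformations discussed here; the file name is a placeholder, and the reporting options differ across GCC releases (older releases use -ftree-vectorizer-verbose=<n>, newer ones -fopt-info-vec):

    gcc -O3 -c example1.c                              # -O3 turns on the loop vectorizer (-ftree-vectorize)
    gcc -O2 -ftree-vectorize -c example1.c             # enable vectorization explicitly at -O2
    gcc -O2 -ftree-parallelize-loops=4 -c example1.c   # auto-parallelization onto 4 threads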



Example 1: Parallelized Code

int A[N], B[N], i;
foreach (i=1; i<N; )
    A[i] = A[i] + B[i-1];

Here foreach denotes that the iterations may execute in any order, or simultaneously on different processing elements.
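In C source this parallel loop can be written with an OpenMP directive, which GCC accepts with -fopenmp. This is only an illustrative sketch; the tutorial itself uses the abstract foreach notation.

    /* Illustrative sketch: OpenMP rendering of the foreach loop above.
       Compile with: gcc -fopenmp file.c   (the file name is a placeholder) */
    #define N 1024
    int A[N], B[N];

    void update(void)
    {
        int i;
        #pragma omp parallel for   /* iterations touch disjoint locations, so they may run concurrently */
        for (i = 1; i < N; i++)
            A[i] = A[i] + B[i-1];
    }

GCC's auto-parallelizer (enabled with -ftree-parallelize-loops=<n>) attempts to derive such a parallel form automatically when its dependence analysis allows it.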


Example 1: The Moral of the Story

Vectorization (SISD ⇒ SIMD): Yes    Parallelization (SISD ⇒ MIMD): Yes

int A[N], B[N], i;
for (i=1; i<N; i++)
    A[i] = A[i] + B[i-1];

When the same location is accessed across different iterations, the order of reads and writes must be preserved.

Nature of accesses in our example:

  Iteration i | Iteration i + k | Observation
  ------------+-----------------+-----------------
  Read        | Write           | No
  Write       | Read            | No
  Write       | Write           | No
  Read        | Read            | Does not matter

No location is read in one iteration and written in another, so both vectorization and parallelization are safe.

Example 2

Vectorization (SISD ⇒ SIMD): Yes    Parallelization (SISD ⇒ MIMD): No

Original Code

int A[N], B[N], i;
for (i=0; i<N; i++)
    A[i] = A[i+1] + B[i];

Observe the reads and writes into a given location.

[Figure: arrays A[0..N] and B[0..N], with the locations accessed in each iteration marked. Iteration i reads A[i+1], which iteration i+1 then writes, so consecutive iterations touch the same location.]


Example 2

Vectorization (SISD ⇒ SIMD): Yes    Parallelization (SISD ⇒ MIMD): No

int A[N], B[N], i;
for (i=0; i<N; i++)
    A[i] = A[i+1] + B[i];

• A vector instruction is synchronized: all reads in the instruction are performed before any of its writes

• Reads and writes across multiple instructions executing in parallel may not be synchronized
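The slides do not spell out the vectorized form of this loop; in the slice notation used for Example 1 it would look roughly like this (a sketch with vectorization factor 4):

    int A[N], B[N], i;
    for (i=0; i<N; i=i+4)
        A[i:i+3] = A[i+1:i+4] + B[i:i+3];

All of A[i+1:i+4] is read before any element of A[i:i+3] is written, so the sequential order of accesses to each location is preserved.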


Example 2: The Moral of the Story

Vectorization (SISD ⇒ SIMD): Yes    Parallelization (SISD ⇒ MIMD): No

int A[N], B[N], i;
for (i=0; i<N; i++)
    A[i] = A[i+1] + B[i];

When the same location is accessed across different iterations, the order of reads and writes must be preserved.

Nature of accesses in our example:

  Iteration i | Iteration i + k | Observation
  ------------+-----------------+-----------------
  Read        | Write           | Yes
  Write       | Read            | No
  Write       | Write           | No
  Read        | Read            | Does not matter

A location read in iteration i is written in a later iteration. A synchronized vector instruction still performs that read before the write, so vectorization is safe; independently executing parallel iterations may perform the write first, so parallelization is not.

Example 3

Vectorization (SISD ⇒ SIMD): No    Parallelization (SISD ⇒ MIMD): No

int A[N], B[N], i;
for (i=0; i<N; i++)
    A[i+1] = A[i] + B[i+1];

Observe the reads and writes into a given location.

[Figure: arrays A[0..N] and B[0..N], with the locations accessed in each iteration marked. Iteration i writes A[i+1], which iteration i+1 then reads, so every iteration depends on the result of the previous one.]


Example 3: The Moral of the Story

Vectorization (SISD ⇒ SIMD): No    Parallelization (SISD ⇒ MIMD): No

int A[N], B[N], i;
for (i=0; i<N; i++)
    A[i+1] = A[i] + B[i+1];

Nature of accesses in our example:

  Iteration i | Iteration i + k | Observation
  ------------+-----------------+-----------------
  Read        | Write           | No
  Write       | Read            | Yes
  Write       | Write           | No
  Read        | Read            | Does not matter

A location written in iteration i is read in a later iteration, so that write must complete before the read. A vector instruction performs all its reads before its writes, and parallel iterations give no ordering guarantee at all, so neither vectorization nor parallelization is safe.


Example 4

Vectorization (SISD ⇒ SIMD): No    Parallelization (SISD ⇒ MIMD): Yes

• This case is not possible

• Vectorization is parallelization at a limited granularity

• If parallelization is possible, then vectorization is trivially possible

Data Dependence

Let statements Si and Sj access memory location m at time instants t and t + k.

  Access in Si | Access in Sj | Dependence         | Notation
  -------------+--------------+--------------------+----------
  Read m       | Write m      | Anti (or Pseudo)   | Si δ̄ Sj
  Write m      | Read m       | Flow (or True)     | Si δ Sj
  Write m      | Write m      | Output (or Pseudo) | Si δ^O Sj
  Read m       | Read m       | Does not matter    |

• Pseudo dependences may be eliminated by some transformations

• True dependence prohibits parallel execution of Si and Sj
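A small straight-line fragment (an illustrative sketch, not from the slides) exhibiting each kind of dependence; S1, S2 and S3 label the statements:

    /* Illustrative sketch: the three kinds of dependence in straight-line code. */
    int a, b, x, y;

    void kinds(void)
    {
        x = a + 1;   /* S1: writes x                                */
        y = x * 2;   /* S2: reads x   -> S1 δ S2   (flow, or true)  */
        x = b - 3;   /* S3: writes x  -> S2 δ̄ S3   (anti)           */
                     /*                  S1 δ^O S3 (output)         */
    }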

Data Dependence

Consider a dependence between statements Si and Sj in a loop.

• Loop independent dependence: t and t + k occur in the same iteration of the loop.
  Si and Sj must be executed sequentially.
  Different iterations of the loop can be parallelized.

• Loop carried dependence: t and t + k occur in different iterations of the loop.
  Within an iteration, Si and Sj can be executed in parallel.
  Different iterations of the loop must be executed sequentially.

• Si and Sj may have both loop carried and loop independent dependences.
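As a concrete illustration (not from the slides), the following loop has both kinds of dependence between the same pair of statements:

    /* Illustrative sketch: loop independent and loop carried dependences together. */
    #define N 1024
    int A[N], B[N], C[N];

    void example(void)
    {
        int i;
        for (i = 1; i < N; i++) {
            A[i] = B[i] + 1;         /* S1 */
            C[i] = A[i] + A[i-1];    /* S2 */
            /* S2 reads A[i], written by S1 in the same iteration:
                   loop independent flow dependence (S1 must precede S2).
               S2 reads A[i-1], written by S1 in the previous iteration:
                   loop carried flow dependence (iteration i-1 must precede i). */
        }
    }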


Dependence in Example 1

• Program

  int A[N], B[N], i;
  for (i=1; i<N; i++)
      A[i] = A[i] + B[i-1];    /* S1 */

• Dependence graph: S1 δ̄ S1, an anti dependence of S1 on itself within the same iteration

• No loop carried dependence
  Both vectorization and parallelization are possible
  Vectorization is possible since all reads are done before all writes


Dependence in Example 2

• Program

  int A[N], B[N], i;
  for (i=0; i<N; i++)
      A[i] = A[i+1] + B[i];    /* S1 */

• Dependence graph: S1 δ̄ S1, an anti dependence of S1 on itself carried by the outermost loop

• Loop carried anti dependence
  Parallelization is not possible
  Vectorization is possible since all reads are done before all writes

Dependence in Example 3

• Program

  int A[N], B[N], i;
  for (i=0; i<N; i++)
      A[i+1] = A[i] + B[i+1];    /* S1 */

• Dependence graph: S1 δ S1, a flow dependence of S1 on itself carried by the outermost loop

• Loop carried flow dependence
  Neither parallelization nor vectorization is possible
  Vectorization fails because the write of iteration i must happen before the read in iteration i+1, whereas a vector instruction performs all its reads before its writes

Iteration Vectors and Index Vectors: Example 1

for (i=0; i<4; i++)
    for (j=0; j<4; j++) {
        a[i+1][j] = a[i][j] + 2;
    }

Loop carried dependence exists if

• there are two distinct iteration vectors such that

• the index vector of the LHS in one of them is identical to the index vector of the RHS in the other

Conclusion: Dependence exists. For instance, the LHS index vector (1, 0) of iteration (0, 0) equals the RHS index vector (1, 0) of iteration (1, 0).

  Iteration | Index Vector | Index Vector
  Vector    | (LHS)        | (RHS)
  ----------+--------------+-------------
  0, 0      | 1, 0         | 0, 0
  0, 1      | 1, 1         | 0, 1
  0, 2      | 1, 2         | 0, 2
  0, 3      | 1, 3         | 0, 3
  1, 0      | 2, 0         | 1, 0
  1, 1      | 2, 1         | 1, 1
  1, 2      | 2, 2         | 1, 2
  1, 3      | 2, 3         | 1, 3
  2, 0      | 3, 0         | 2, 0
  2, 1      | 3, 1         | 2, 1
  2, 2      | 3, 2         | 2, 2
  2, 3      | 3, 3         | 2, 3
  3, 0      | 4, 0         | 3, 0
  3, 1      | 4, 1         | 3, 1
  3, 2      | 4, 2         | 3, 2
  3, 3      | 4, 3         | 3, 3

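The criterion above can be checked mechanically for this small nest; the following brute-force sketch (an illustration, not the tutorial's algorithm) enumerates all pairs of distinct iteration vectors and reports exactly the matches visible in the table:

    /* Illustrative sketch: brute-force dependence test for a[i+1][j] = a[i][j] + 2. */
    #include <stdio.h>

    int main(void)
    {
        int i1, j1, i2, j2;
        /* try every pair of distinct iteration vectors (i1,j1) and (i2,j2) */
        for (i1 = 0; i1 < 4; i1++)
          for (j1 = 0; j1 < 4; j1++)
            for (i2 = 0; i2 < 4; i2++)
              for (j2 = 0; j2 < 4; j2++) {
                  if (i1 == i2 && j1 == j2)
                      continue;                  /* iteration vectors must be distinct */
                  /* LHS index vector of (i1,j1) is (i1+1, j1);
                     RHS index vector of (i2,j2) is (i2, j2).   */
                  if (i1 + 1 == i2 && j1 == j2)
                      printf("(%d,%d) writes a[%d][%d]; (%d,%d) reads it: dependence\n",
                             i1, j1, i1 + 1, j1, i2, j2);
              }
        return 0;
    }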


Iteration Vectors and Index Vectors: Example 2

for (i=0; i<4; i++)
    for (j=0; j<4; j++) {
        a[i][j] = a[i][j] + 2;
    }

Loop carried dependence exists if

• there are two distinct iteration vectors such that

• the index vector of the LHS in one of them is identical to the index vector of the RHS in the other

Conclusion: No dependence. The LHS and RHS index vectors coincide only within the same iteration.

  Iteration | Index Vector | Index Vector
  Vector    | (LHS)        | (RHS)
  ----------+--------------+-------------
  0, 0      | 0, 0         | 0, 0
  0, 1      | 0, 1         | 0, 1
  0, 2      | 0, 2         | 0, 2
  0, 3      | 0, 3         | 0, 3
  1, 0      | 1, 0         | 1, 0
  1, 1      | 1, 1         | 1, 1
  1, 2      | 1, 2         | 1, 2
  1, 3      | 1, 3         | 1, 3
  2, 0      | 2, 0         | 2, 0
  2, 1      | 2, 1         | 2, 1
  2, 2      | 2, 2         | 2, 2
  2, 3      | 2, 3         | 2, 3
  3, 0      | 3, 0         | 3, 0
  3, 1      | 3, 1         | 3, 1
  3, 2      | 3, 2         | 3, 2
  3, 3      | 3, 3         | 3, 3

Example 4: Dependence

Program to swap arrays:

for (i=0; i<N; i++)
{
    T = A[i];       /* S1 */
    A[i] = B[i];    /* S2 */
    B[i] = T;       /* S3 */
}

Dependence graph (all edges between S1, S2 and S3):

• S1 δ̄ S2 : loop independent anti dependence due to A[i] (S1 reads A[i], S2 then writes it)

• S2 δ̄ S3 : loop independent anti dependence due to B[i] (S2 reads B[i], S3 then writes it)

• S1 δ S3 : loop independent flow dependence due to T (S1 writes T, S3 reads it)

• S3 δ̄ S1 : loop carried anti dependence due to T (S3 reads T, S1 of the next iteration overwrites it)

• S1 δ^O S1 : loop carried output dependence due to T (every iteration writes T)

Data Dependence Theorem

There exists a dependence from statement S1 to statement S2 in a common nest of loops if and only if there exist two iteration vectors i and j for the nest such that

1. i < j, or i = j and there exists a path from S1 to S2 in the body of the loop,

2. statement S1 accesses memory location M on iteration i and statement S2 accesses location M on iteration j, and

3. one of these accesses is a write access.


Anti Dependence and Vectorization

Read precedes Write lexicographically

int A[N], B[N], C[N], i;
for (i=0; i<N; i++) {
    C[i] = A[i+2];
    A[i] = B[i];
}

Vectorized Code:

int A[N], B[N], C[N], i;
for (i=0; i<N; i=i+4) {
    C[i:i+3] = A[i+2:i+5];
    A[i:i+3] = B[i:i+3];
}


Anti Dependence and Vectorization

Write precedes Read lexicographically

int A[N], B[N], C[N], i;
for (i=0; i<N; i++) {
    A[i] = B[i];
    C[i] = A[i+2];
}

Reordered Code (each read of A[i+2] still happens before the write to that location two iterations later, so the anti dependence is preserved):

int A[N], B[N], C[N], i;
for (i=0; i<N; i++) {
    C[i] = A[i+2];
    A[i] = B[i];
}

Vectorized Code:

int A[N], B[N], C[N], i;
for (i=0; i<N; i=i+4) {
    C[i:i+3] = A[i+2:i+5];
    A[i:i+3] = B[i:i+3];
}


True Dependence and Vectorization

Write precedes Read lexicographically

int A[N], B[N], C[N], i;
for (i=0; i<N; i++) {
    A[i+2] = C[i];
    B[i] = A[i];
}

Vectorized Code:

int A[N], B[N], C[N], i;
for (i=0; i<N; i=i+4) {
    A[i+2:i+5] = C[i:i+3];
    B[i:i+3] = A[i:i+3];
}


Multiple Dependences and Vectorization

Anti Dependence and True Dependence

int A[N], i;
for (i=0; i<N; i++) {
    A[i] = A[i+2];
}

Splitting the statement with a scalar temporary:

int A[N], i, temp;
for (i=0; i<N; i++) {
    temp = A[i+2];
    A[i] = temp;
}

Expanding the scalar temporary into an array:

int A[N], T[N], i;
for (i=0; i<N; i++) {
    T[i] = A[i+2];
    A[i] = T[i];
}

Vectorized Code:

int A[N], T[N], i;
for (i=0; i<N; i=i+4) {
    T[i:i+3] = A[i+2:i+5];
    A[i:i+3] = T[i:i+3];
}


Multiple Dependences and Vectorization

True Dependence and Anti Dependence

int A[N], B[N], i;
for (i=0; i<N; i++) {
    A[i] = B[i];
    B[i+2] = A[i+1];
}

Reordered Code:

int A[N], B[N], i;
for (i=0; i<N; i++) {
    B[i+2] = A[i+1];
    A[i] = B[i];
}

Vectorized Code:

int A[N], B[N], i;
for (i=0; i<N; i=i+4) {
    B[i+2:i+5] = A[i+1:i+4];
    A[i:i+3] = B[i:i+3];
}


Cyclic Dependences and Vectorization

Cyclic True Dependence

int A[N], B[N], i;
for (i=0; i<N; i++) {
    B[i+2] = A[i];
    A[i+1] = B[i];
}

Cyclic Anti Dependence

int A[N], B[N], i;
for (i=0; i<N; i++) {
    B[i] = A[i+1];
    A[i] = B[i+2];
}

Rescheduling the statements cannot break a cyclic dependence, so these loops cannot be vectorized.


Last but not the least . . .

Thank You!
