• No results found

An efficient methodology for building custom processors based on instruction set extensions

N/A
N/A
Protected

Academic year: 2022

Share "An efficient methodology for building custom processors based on instruction set extensions"

Copied!
10
0
0

Loading.... (view fulltext now)

Full text

(1)

AN EFFICIENT METHODOLOGY FOR BUILDING CUSTOM PROCESSORS

BASED ON INSTRUCTION SET EXTENSIONS

by

NAGARAJU POTHINENI

Department of Computer Science & Engineering

Submitted

in fulfillment of the requirements of the degree of Doctor of Philiosophy

to the

Indian Institute of Technology Delhi

October, 2008

(2)

I.T. DELHI

DD ~tl ~'`~f D.. Ti

Acc. No.. r :. .3G ,

Y1-E.

(3)

Certificate

This is to certify that the thesis titled An Efficient Methodology for Building Custom Processors based on Instruction Set Extensions being submitted by Nagaraju Pothineni for the award of Doctor of Philosophy in Computer Sci- ence & Engg. is a record of bona fide work carried out by him under our guidance and supervision at the Dept. of Computer Science & Engg., Indian Institute of Technology Delhi. The work presented in this thesis has not been submitted else- where, either in part or full, for the award of any other degree or diploma.

Anshu Kumar Professor

Dept. of Computer Science & Engg.

Indian Institute of Technology Delhi

olin Paul Assistant Professor

Dept. of Computer Science & Engg.

Indian Institute of Technology Delhi

(4)

Acknowledgments

First and foremost, I would like to express my deep gratitude to Prof. Anshul Ku- mar for his invaluable guidance from the early stages of my PhD till to date. He has spent several intense hours on my thesis and provided constant and constructive feed- back. He has supported and nurtured innovative thinking and shown the path of doing research in a dedicated manner. I can not forget the numerous times he spent long night hours working on my reports and providing the comments with impeccable patience. It has been very memorable and enjoying to have worked with him, as I have learnt not just how to develop and explore new ideas, but how to be professional, ethical and kind to others. He is undoubtedly the one person who has improved my technical abilities and writing skills most.

I would also deeply thank my co-advisor Dr. Kolin Paul for his guidance throughout my PhD. He has always injected many innovative and non-conventional ideas in me and encouraged to think beyond one narrow field. He is very prompt in any help I had asked whether it is evaluating the reports or providing feedback on my ideas etc. During the interaction with him, he used to ask very thought provoking questions that helped me to shape my work.

I had carried my work on datapath merging at EPFL, Switzerland. I thank Prof.

Paolo Ienne for giving me an opportunity to work in his lab and for technical guidance.

He has kindly given me access to MIMOSYS tool. I also would be greatly indebted to Dr. Philip Brisk for his guidance and help while I was at EPFL.

I thank Prof. Balakrishnan, Prof. Preeti Ranjan Panda, and Prof. Ranjan Bose for serving on my SRC committee and for their feedback on every presentation I gave before them.They had given several useful and constructive feedback on my work.

I would not have have been able to work without the cooperation and help from the laboratory staff, Mr. Somdutt Sharma and Ms. Vandana. They are very prompt in providing the hardware equipment or software whenever I had asked for.

Working for intense four years would have very painful without the pleasing and entertaining company of my colleagues. I need to acknowledge the personal and moral support from Anant Vishnoi, Aryabartta Sahu, Neeraj Goel, Lava Bhargav, Krishnaiah

(5)

and Murali during my stay at IITD.

Last but not least, I thank my parents for their support and encouragement during the course of my PhD.

Nagaraju Pothineni

(6)

Abstract

In today's consumer electronics market, device vendors have been constantly facing the challenge of delivering more energy efficient, feature rich mobile platforms in a very short span of time. With the advantage of growing number of transistors packaged in a single chip, we are now able to build all the components of a system on a single chip. It is a challenging task of the designer to exploit the large number of transistors and deliver an innovative platform that meets the user's requirements and has less aggregated design cost.

In the recent past, researchers have proposed paradigm of extensible processors, wherein designers augment the instruction set of the processor with custom instructions that implement the critical portions of the application on dedicated hardware units called Custom Functional Units (CFUs) that are tightly integrated into the processor pipeline. CFUs can significantly enhance the performance of the application and also reduce the energy consumption, which is a key requirement in today's design.

We believe that for extensible processors to become successfully employed in main- stream design, there must be automation framework that can assist the designer starting from profiling till the hardware generation. Though there are quite a few commercial tools available in the industry for the design of extensible processor, their use is still limited and designers are required to manually make decisions about the choice of CFUs and their use in the application. This puts a heavy burden on the designer and conse- quently, prolongs the design cycle and has direct effect on the design cost. To overcome these gaps, in this thesis, we have developed an efficient methodology for building cus- tomized processors in a short amount of time.

The methodology is built around the principle of thorough and accurate analysis of the design space. To handle the potentially large design space, we present efficient algorithms for exhaustive enumeration of custom instruction candidates and efficient spatial reuse computation based on preprocessing step known as Recurring Pattern Computation (RPC).

(7)

Also, we show that when the processor supports serialization of operands, largest legal arithmetic clusters (MaxMIM0s) yield maximum performance. We present an efficient method of identifying and evaluating the MaxMIMO patterns.

Finally, we present a methodology called as Data Path Merging (DPM), for the area efficient synthesis of generated custom instructions. The DPM is based on the principle of sharing resources among the mutually exclusive operations. When the number of inputs/outputs of the custom instruction are larger than the available register ports, the proposed synthesis methodology automatically pipelines and serializes the operands before performing the resource sharing.

(8)

Contents

List of Figures v

List of Tables ix

1 Introduction 1

1.1 Issues in Automatic Customization . . . 2

1.2 Limitations of Previous Work . . . 3

1.3 Overview of our Approach . . . 5

1.4 Main Contributions . . . 7

1.5 Layout of the Thesis . . . 7

2 Custom Processor Architectures 9 2.1 Tensilica LX2 . . . 10

2.2 ARC ARCompact . . . 12

2.3 MIPS CorExtend . . . 13

2.4 NIOS II ... 14

2.5 Influence of Base Processor Architecture on ISE . . . . . . 16

2.5.1 Architectural Constraints . . . 16

2.6 Execution Model . . . 17

3 Custom Instruction Generation 19 3.1 Introduction . . . 19

3.2 Related Work . . . 21

3.3 Problem Formulation . . . 26

3.4 Generating Convex Subgraphs . . . 27

3.5 Useful k-node Subgraphs . . . .. . . 34 i

(9)

ii CONTENTS

3.5.1 Useful k-node subgraphs . . . 34

3.6 Our Algorithm . . . 35

3.6.1 Generating set of k + 1 node useful subgraphs, UF[k + 1] . . . 36

3.7 Experimental Results . . . 44

3.7.1 Experiment Setup . . . 44

3.7.2 Results . . . 44

3.8 Summary . . . 46

4 Reuse Driven Customization 49 4.1 Introduction . . . 49

4.2 Outline of Pattern Analysis & Selection . . . . . . . 50

4.3 Related Work . . . 52

4.4 Recurring Pattern Computation . . . 54

4.4.1 Problem Formulation . . . 54

4.4.2 A Simple Exhaustive Approach . . . . . . 56

4.4.3 Our Algorithm . . . . . . . 58

4.4.4 Recurrence Extraction . . . 65

4.5 Instruction Selection . . . 68

4.5.1 Pattern Characterization . . . 68

4.5.2 Pattern Selection . . . 70

4.6 Experiments & Results . . . 72

4.7 Summary . . . 74

5 Flexible I/O FUs 81 5.1 Introduction . . . 81

5.2 Related Work . . . .. 83

5.3 Pattern Size and Performance Gain . . . 84

5.4 Our Methodology . . . .. . . . 86

5.5 Convex MaxMIMOs Generation . . . 88

5.5.1 Convex MaxMIMO Identification . . . . . . . . . . . . . . 88

5.5.2 Conditions for Convexity & Maximality . . . . . . 89

5.5.3 Algorithm based on Groupwise Incompatibility . . . . . . 91

5.6 Timeshaping of Patterns . . . . . . . 92

(10)

CONTENTS iii

5.7 Iterative Selection Algorithm . . . 98

5.8 Experiments . . . 99

5.9 Summary . . . 103

6 Custom Instruction Synthesis 105 6.1 Introduction . . . 105

6.2 Combinational Datapath Merging . . . 106

6.2.1 Problem Statement . . . 106

6.2.2 Related Work . . . 108

6.2.3 Problem Formulation . . . 109

6.2.4 Our Solution . . . 111

6.2.5 Evaluating the Supergraph . . . 112

6.2.6 Example . . . 113

6.2.7 ILP Formultaion . . . 113

6.2.8 Heuristic Solution . . . 116

6.2.9 Experiments & Results . . . 120

6.3 Multicycle Datapath Merging . . . 124

6.3.1 Timeshaping . . . 127

6.3.2 Timeshape Refinement . . . 130

6.3.3 Operation Merging . . . 133

6.3.4 Register Allocation . . . 137

6.3.5 Experiments & Results . . . 141

6.4 Summary . . . 145

7 Conclusion 147 7.1 Future Directions . . . 148

Bibliography 151

References

Related documents

This is to certify that the thesis titled “PERFORMANCE STUDIES OF 802.11 MAC BASED ON ORDER DEPENDENT CAPTURE CAPABILITY AT PHY” being submitted by VEGAD MAYUR MANSUKHLAL to

Chapter 5 of the thesis describes a novel methodology (Intercalate) for an efficient prediction of ligand binding modes and binding free energies for DNA-ligand

In this thesis, an attempt has been made to generate test data automatically for traditional methodology based on the automated generated control flow graph using three

This is to certify that the thesis entitled “Development of a Methodology for Supportability of Mechanical Systems” being submitted by Mohammad Asjad to the

This is to certify that the thesis titled ‘Development of Nonwoven Fabric Based Filtration System for Potable Water’, being submitted by Mr. Rahul Rajkumar

This is to certify that the thesis entitled " Efficient Systolic Architectures for Matrix, Signal Processing and Graph Related Problems", being submitted by M.Zubair to

This is to certify that the thesis titled EMPLOYING REDUNDANCY BASED TECHNIQUES TO PROVIDE RELIABILITY, SECURITY AND ACCOUNTABILITY IN MODERN PROCESSORS being submitted by

Carmona et. In the theory of Random SchrOdinger Operators, one deals with a collection of random operators in a single fixed Hilbert Space. The assumption of strict