Impact of Higher Level Functional Units on High Performance Multi-Core Node Architectures

Aravind Vasudevan‡ WARAN Research Foundation [email protected]

Balaji Subramaniam‡ WARAN Research Foundation [email protected]

Vidya Sangkar L‡ WARAN Research Foundation [email protected]

Abstract

As computationally intensive applications become more demanding, recognizing and exploiting the computational concurrency of an application becomes vital in the design of a node. Performing these computations faster has a direct impact on performance. As the complexity and problem size of such applications increase, the design of functional units needs to change. To address these issues, this paper presents a new paradigm involving Higher Level Functional Units (HLFUs) that could replace ALU based units in a processing node, and examines its impact on performance.

1. Introduction

The evolution of new architectural concepts is required to harness the power of the technology in the current supercomputing era [2]. With the increase in complexity of applications, there is a need for change in the design of functional units. In conventional multiprocessors, cores are replicated processors with a common cache shared through non-uniform access, as in TRIPS [3]. These designs have increased the raw processing power of the node. But with increasing application complexity, the number of basic operations handled by the cores, and hence by the processing nodes, increases. Analysis of the characteristics of the application therefore becomes a vital step in design, because different applications are computationally intensive with respect to different classes of algorithms. As the complexity and problem size handled by the processing node grow for such applications, the node architecture has to incorporate higher level functional units. The class of functional units for the node architecture can be found by analyzing a set of applications [4]. HLFUs such as matrix multiplication units, sorter units, max/min finders etc. can thus replace a large number of ALU based resources. Effective utilization of such units can give performance that cannot be achieved by ALU based nodes.

The rest of this paper is organized as follows. Section 2 describes in detail the advantages of Higher Level Functional Units over conventional ALU based units. Section 3 presents simulation results that show the advantages of HLFUs in terms of instruction counts and memory fetches with respect to problem size for different classes of algorithms.

‡ Undergraduate Research Trainee at WARAN Research Foundation, Chennai, India

2. Higher Level Functional Units

As discussed earlier, Higher Level Functional Units such as matrix multiplication units, graph partitioning units, graph traversal units, sorter units etc. can deliver higher performance. The types of functional units used are determined by the characteristics of the application [4]. The corresponding number of instruction fetches and memory accesses is lower than in conventional ALU based processors. Hence, the use of higher level functional units has a significant effect on the mapping, computation and communication complexity of application execution at the cluster level.

One would expect power consumption to increase when functional units are scaled from ALUs to higher level functional units. However, adopting a low power design for the functional units results in power consumption comparable to that of ALU based units, as discussed in detail in [5].

Not all algorithms can be realized as higher level functional units. This is evident in the case of Singular Value Decomposition (SVD) [6], where the algorithm is not data partitionable, so SVD cannot be fabricated as a single unit. To overcome this, the units that the algorithm requires can be placed close together, and the scheduler can delegate the algorithm to these functional units. Although this is not the same as fabricating a dedicated SVD unit, the mapping reduces the inter-unit communication complexity.

Figure 1: Higher level instruction set

Figure 2: Instruction set comparison

2.1 Impact on Instruction Set Architecture (ISA):

The ISA must extract the performance of the underlying node by exploiting hardware level parallelism. A new ISA [1, 4] is designed to govern the proposed higher level functional units. In the higher level ISA, a single instruction corresponds to a number of ALU instructions (Figure 2), which are executed after resolving their dependencies. The higher level ISA has a direct impact on the mapping and communication complexity within a node of a cluster. The detailed ISA for the HLFUs is given in Figure 1.
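As a minimal sketch of this one-to-many correspondence, the Python below expands a hypothetical higher level instruction, here called MATMUL, into three-address ALU instructions and checks that each one reads only values that are already available. The opcode names, the register naming scheme and the helper functions are assumptions made for illustration; they are not taken from the ISA of [1, 4].

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AluOp:
    """One ALU-level instruction in a hypothetical three-address form."""
    opcode: str  # "MUL" or "ADD"
    dest: str
    src1: str
    src2: str


def expand_matmul(n):
    """Expand one hypothetical 'MATMUL n' instruction into ALU instructions.

    Inputs are named a<i>_<k> and b<k>_<j>; products (p...) and partial
    sums (s...) are temporaries produced by the expansion itself.
    """
    ops = []
    for i in range(n):
        for j in range(n):
            acc = None  # name of the running sum for element c[i][j]
            for k in range(n):
                prod = f"p{i}_{j}_{k}"
                ops.append(AluOp("MUL", prod, f"a{i}_{k}", f"b{k}_{j}"))
                if acc is None:
                    acc = prod
                else:
                    partial = f"s{i}_{j}_{k}"
                    ops.append(AluOp("ADD", partial, acc, prod))
                    acc = partial
    return ops


def dependencies_resolved(ops):
    """Return True if every temporary is produced before it is consumed."""
    ready = set()
    for op in ops:
        for src in (op.src1, op.src2):
            is_temporary = src[0] in "ps"  # inputs start with 'a' or 'b'
            if is_temporary and src not in ready:
                return False
        ready.add(op.dest)
    return True


ops = expand_matmul(2)
counts = Counter(op.opcode for op in ops)
print(len(ops), counts["MUL"], counts["ADD"])  # 12 8 4
print(dependencies_resolved(ops))              # True
```

Under these assumptions, the single MATMUL instruction for a 2×2 problem stands in for 12 ALU instructions, and the additions can only issue once the products they consume have been computed.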

Graph 1: LU Decomposition (data accessed in KB vs. scale factor; ALU and HLFU)

Graph 2: Convex Hull (data accessed in KB vs. scale factor; ALU and HLFU)

Graph 3: Kernighan-Lin Algorithm (data accessed in KB vs. scale factor; ALU and HLFU)

Graph 4: LU Decomposition (instruction count vs. scale factor; ALU and HLFU)

Graph 5: Convex Hull (instruction count vs. scale factor; ALU and HLFU)

Graph 6: Kernighan-Lin Algorithm (instruction count vs. scale factor; ALU and HLFU)

3. Simulation Analysis:

To compare the use of ALUs and HLFUs, a simulation methodology has been adopted that makes use of the IDA disassembler [7]. The first step is to analyze the code fragments obtained from the disassembled output. This output consists of flow graphs and code sections that make up the original code. The flow graphs depict the control flow of the program, and each code block represents a functional block of code. The next step is to identify a pattern in this output and assign a sequence of activities to each instruction. For example, a single add instruction breaks down into the following sub-steps: load the contents of the registers into temporary registers, issue the “add” select signal to the ALU, store the result in the temporary output register, and finally move it to the destination register.

This is not the case with HLFUs. Since they are hard-wired, there is no need to store intermediate data. The data required is fed directly to the functional unit and routed through it, which removes all the intermediate loads and stores. The only memory accesses for an HLFU are its input and output, which vary with the type of unit. This reduction in memory access is shown in the graphs labeled 1-3, which plot, for a given algorithm, the data accessed as the problem size increases.

As discussed in the previous sections, using higher level functional units abstracts the instruction set to a higher level. Although each instruction becomes more complex, the number of instructions reduces. This is shown in the graphs labeled 4-7. For example, in a conventional ALU architecture, a single 2×2 matrix multiplication requires 8 multiplications and 4 additions.
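The accounting described above can be illustrated with a small toy model in Python. It is not the simulation used to produce the graphs. It assumes that every ALU instruction follows the four data movements listed for the add example (two operand loads, one store to the temporary output register, one move to the destination), and that a matrix multiplication HLFU reads each input element once and writes each output element once. The function names and the notion of a "data move" are hypothetical.

```python
# Sub-steps of one ALU instruction, following the add example in the text.
ALU_MICRO_STEPS = ("load_src1", "load_src2", "execute", "store_tmp", "move_dest")

# Every sub-step except the ALU operation itself moves data (assumption).
DATA_MOVES_PER_ALU_OP = sum(step != "execute" for step in ALU_MICRO_STEPS)


def alu_cost(n):
    """Instruction and data-move counts for an n x n matrix multiplication
    built from scalar ALU multiplications and additions."""
    multiplications = n ** 3
    additions = n ** 2 * (n - 1)
    instructions = multiplications + additions
    return {
        "instructions": instructions,
        "data_moves": instructions * DATA_MOVES_PER_ALU_OP,
    }


def hlfu_cost(n):
    """Counts for a hypothetical matrix multiplication HLFU: one instruction,
    with data movement limited to the two input matrices and the result."""
    inputs = 2 * n * n
    outputs = n * n
    return {"instructions": 1, "data_moves": inputs + outputs}


print(alu_cost(2))   # {'instructions': 12, 'data_moves': 48}
print(hlfu_cost(2))  # {'instructions': 1, 'data_moves': 12}
```

The absolute numbers depend entirely on the assumptions stated above; the point of the sketch is only that the ALU counts grow with the number of scalar operations, while the HLFU counts grow with the size of the input and output.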
If a unit performs the 2×2 matrix multiplication directly, a single instruction suffices. This is the reason for the lower number of instructions in the case of HLFUs.

Choosing the type of HLFU is also important. The choice is heavily influenced by the characteristics of the application to be executed. For example, if a matrix-centric application has to be run, it is more meaningful to use matrix related units than graph theoretic units. Likewise, a unit that can perform n additions together makes more sense than n separate adder units.

4. Conclusion

With the increasing need for computational power, extraction of parallelism and pipelining, we move towards the use of higher level functional units. This paper described the advantages of HLFUs over the conventional ALU based architecture, and presented simulation results that contrast the two. These results indicate that HLFUs reduce both instruction counts and memory accesses relative to the conventional ALU based architecture; the feasibility of such units is discussed in [1, 5].

5. References

[1] N. Venkateswaran et al., “Towards Node Architecture Designs for Realizing High Productivity Supercomputers”, presented at the pre-conference of the 23rd International Supercomputing Conference 2008 (ISC 08), Dresden, Germany.
[2] Peter Hildebrand (Chair), Warren Wiscombe (Science Lead) et al., “Earth Science Vision 2030 Predictive Pathways for a Sustainable Future”, NASA Working Group Report.
[3] Doug Burger et al., “Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture”, in ACM SIGARCH Computer Architecture News, Proceedings of the 30th Annual International Symposium on Computer Architecture (ISCA ’03).
[4] Shyamsundar Gopalakrishnan, “High Performance Node Architecture for Supercomputing Clusters: A Generalized Design Methodology and Development of CAD Environment”, thesis proposal submitted to WARAN Research Foundation.
[5] Ravindhiran Mukundrajan, “Power Estimation of Higher Level Functional Units in Heterogeneous Multi-Core Processors”, submitted to HiPC ’08 Student Research Symposium.
[6] Gene H. Golub and William Kahan (1965), “Calculating the singular values and pseudo-inverse of a matrix”, Journal of the Society for Industrial and Applied Mathematics: Series B, Numerical Analysis 2(2): 205–224.
[7] IDA Disassembler, http://www.datarescue.com/idabase/
