Performance Measurement of Processes and Threads Controlling, Tracking and Monitoring Based on Shared-Memory Parallel Processing Approach

A Thesis Submitted to the Council of the Faculty of Science and Science Education, School of Science, at the University of Sulaimani, in partial fulfillment of the requirements for the degree of Master of Science in Computer Science

By
Karzan Hussein Sharif
B.Sc. Computer Science (2009), University of Sulaimani

Supervised by
Dr. Subhi Rafeeq Muhammad Zeebaree
Assistant Professor

Rebandan 2715

January 2016

In the name of Allah, the Most Gracious, the Most Merciful

"We raise in degrees whom We will, but over every possessor of knowledge is one more knowing."

Surah Yusuf, verse (76)

Dedication

To my mother,

The spirit of my father, whom I wished could be with me at this time,

My dear wife and

My daughter, Alaa


Acknowledgements

First of all, I would like to thank GOD for inspiring me with the hope and strength to continue my M.Sc. study.

I express my greatest gratitude and appreciation to the Kurdistan Regional Government, the Ministry of Higher Education and Scientific Research, the Presidency of the University of Sulaimani, the School of Science, the Department of Computer, and the Ministry of Education, for their support and for the opportunities and facilities they provided for carrying out this work.

My great thanks go to my supervisor, Asst. Prof. Dr. Subhi Rafeeq Muhammad Zeebaree, for his kindness, valuable guidance, advice and assistance. He gave me a great deal of his time and effort, and I truly benefited from his experience, especially in this particular subject.

I also give deep thanks to my dear wife, who has helped me keep a balance between my work, my quest for knowledge, and life.


Abstract

Modern systems have moved from traditional single-core processor architectures to multicore architectures. It is therefore very important to study how these systems work, what their problems and essential drawbacks are, and how those problems and drawbacks can be overcome. This thesis goes deeply into this field: a professional, integrated operating-system performance-measurement system is proposed, designed and implemented. Because monitoring by itself is not enough, the system also provides full control over CPU-changing, priority-changing, and the pausing, resuming and killing of processes and threads. During these complex operations, the proposed system measures and computes the details of total execution time, CPU time, user time, kernel time, context switching, CPU usage, and the priority of the processes and threads. The proposed system can be used with both Windows and Linux operating systems, and with any multicore processor architecture. Although measuring all of these parameters in one system is very difficult and complex (it has never been done by any previous work), especially the kernel time and the context switching, all of them have been applied and determined successfully in this thesis. In addition to these features, which have never before been integrated in one system, and in order to illustrate the real abilities of the system, a shared-memory parallel processing approach has been proposed and applied. The design and implementation of such an additional task was previously treated as a complete MSc work in its own right; in this thesis it is just an additional step, applied to run all possible cases of the processes and threads that construct the under-tested program, which may be one of: Single-Process/Single-Thread, Single-Process/Multi-Thread, Multi-Process/Single-Thread, Multi-Process/Multi-Thread and Multi-Process/Single-Multi-Thread.

The algorithms of this system are designed and implemented in the C++ programming language, which, being among the languages closest to the operating system, provides higher processing speed.

Table of Contents

Subject                                                                Page No.

Dedication ..................................................................... i
Acknowledgements ............................................................... ii
Abstract ...................................................................... iii
Table of Contents .............................................................. iv
List of Tables ................................................................ vii
List of Figures .............................................................. viii
List of Abbreviations .......................................................... xi
List of Algorithms ........................................................... xiii

Chapter One: Introduction
1.1 Overview .................................................................... 1
1.2 Literature Survey ........................................................... 5
1.3 Problem Statement ........................................................... 7
1.4 Significant Points .......................................................... 7
1.5 Aims of the Thesis .......................................................... 8
1.6 Contribution of the Thesis .................................................. 9
1.7 Thesis Layout ............................................................... 9

Chapter Two: Parallel Processing and Multi-Core Systems
2.1 Introduction ............................................................... 10
2.2 Parallel Processing: Classes, Types and Approaches ......................... 11
2.3 Shared Memory Parallel Processing Approach ................................. 12
2.4 Trending from Single-Core to Multi-Core Processor Systems ................. 15
2.5 Processes .................................................................. 18
2.6 Threads .................................................................... 20
2.7 Processes and Threads Life Cycles .......................................... 23
2.8 Context Switching of Processes and Threads ................................. 25
2.9 User and Kernel Level Threads .............................................. 27
2.10 Multithreading and Multithreaded Execution ................................ 28
2.11 Processes and Threads Monitoring .......................................... 32

Chapter Three: Structure of the Proposed System
3.1 Introduction ............................................................... 34
3.2 Structure of the Proposed System ........................................... 35
3.3 Features Measured and Calculated by the Proposed System ................... 36
3.3.1 Self-Checking Part ....................................................... 37
3.3.2 Under-Testing Programs Part .............................................. 38
3.3.2.a Program-Information Section ............................................ 39
3.3.2.b Processes-Information Section .......................................... 40
3.3.2.c Threads-Information Section ............................................ 42
3.4 Controlling Stage Algorithm ................................................ 43
3.5 Tracking Stage Algorithm ................................................... 47

Chapter Four: Implementation Results of the Proposed System
4.1 Introduction ............................................................... 50
4.2 Mechanism of Application Software Operation ................................ 50
4.3 Monitoring-Controlling-Tracking Implemented Results ....................... 52
4.3.1 Controlling-Tracking Stage Cases ......................................... 52
4.3.2 Discussion of Controlling Stage Cases .................................... 60
4.3.3 Implementation and Monitoring Results .................................... 61
4.3.4 Discussion of Implementation and Monitoring Results ..................... 77
4.4 Evaluation of the Obtained Results ......................................... 79
4.5 Multi-Changing Controlling Options (MCCO) .................................. 89
4.6 Comparison between the PMS and Previous Works .............................. 90

Chapter Five: Conclusions and Suggestions for Future Works
5.1 Conclusions ................................................................ 93
5.2 Suggestions for Future Works ............................................... 94
Appendix A ..................................................................... 96
References .................................................................... 103


List of Tables

Table   Title                                                          Page No.

(4.1)  Elapsed CPU Time for Processes of MP/ST using Core i7 ............... 63
(4.2)  Elapsed User Time for Threads of MP/MT using Core i7 ................ 64
(4.3)  Recorded Elapsed CPU Time for Processes and Threads of MP/SMT using Core i7 ..... 65
(4.4)  Relation between Threads Priority and Context Switch of MP/ST using Core i7 ..... 67
(4.5)  Effect of Changing Priority on Elapsed Running Time of MP/SMT using Core i7 ..... 69
(4.6)  Effect of Changing Selected CPU on Elapsed Running Time for P2 of MP/SMT using Core i7 ..... 71
(4.7)  Effect of Changing the Priority on Elapsed Running Time for Threads of P2 of MP/SMT using Core i7 ..... 72
(4.8)  Effect of Increasing No. of Participated CPUs on CPU-Usage of SP/MT using Core i5 ..... 73
(4.9)  Elapsed CPU Time for P1 and P2 of MP/SMT using Core i5 .............. 74
(4.10) Elapsed CPU Time for Threads of MP/SMT using Core 2 Duo ............. 76
(4.11) Effects of MCCO using Different Case Studies ........................ 89


List of Figures

Figure  Title                                                          Page No.

(2.1)  Shared Memory and Distributed Memory System Architectures ........... 13
(2.2)  Shared Memory (UMA) ................................................. 14
(2.3)  Shared Memory (NUMA) ................................................ 14
(2.4)  Single Core CPU Chip ................................................ 15
(2.5)  Multi Core CPU Chip ................................................. 17
(2.6)  Thread Tree ......................................................... 21
(2.7)  Process Life Cycle .................................................. 23
(2.8)  Multithreaded System Architecture ................................... 28
(3.1)  Flowchart of Main Mechanism for the Proposed System ................. 35
(4.1)  GUI of the Proposed System .......................................... 51
(4.2)  Browsing the Features of the Existing Processes ..................... 51
(4.3)  GUI of the PMS with MPMT Structure before Applying the Controlling Cases on its Processes and Threads ..... 53
(4.4)  Changing the Priority of T5P1 (Thread 6712 of Process 4924) from Normal to High ..... 54
(4.5)  Changing the Allocated CPU of T2P1 (Thread 2376 of Process 4924) from Core3 to Core4 ..... 55
(4.6)  Pausing T1P2 (Thread 6880 of Process 6564) .......................... 56
(4.7)  Killing T4P2 (Thread 7620 of Process 6564) .......................... 57
(4.8)  Resuming T1P2 (Thread 6880 of Process 6564) ......................... 58
(4.9)  The Status of the Case Study after Normal Termination of all its Processes and Threads ..... 59
(4.10) User Time of MP/ST using Core i7 .................................... 63
(4.11) Kernel Time of MP/ST using Core i7 .................................. 64
(4.12) User Time of an MP/MT using Core i7 ................................. 65
(4.13) Threads User Time of MP/SMT for P2 using Core i7 .................... 66
(4.14) Threads Kernel Time of MP/SMT for P2 using Core i7 .................. 66
(4.15) Context Switching of MP/ST for Process 1 using Core i7 .............. 68
(4.16) Context Switching of MP/ST for Process 2 using Core i7 .............. 68
(4.17) Context Switching of MP/ST for Process 3 using Core i7 .............. 69
(4.18) Context Switching of MP/ST for Process 4 using Core i7 .............. 69
(4.19) Elapsed Running Time of MP/SMT (Only 2 Processes) using Core i7 ..... 70
(4.20) Elapsed Running Time of MP/SMT using Core i7 ........................ 70
(4.21) ERT for All Threads of P2 on CPU7 of MP/SMT using Core i7 ........... 72
(4.22) Elapsed Running Time for Threads of P2 of MP/SMT, Thread3 High Priority, using Core i7 ..... 72
(4.23) Elapsed Running Time for Threads of P2 of MP/SMT, Thread2 Low Priority and Thread4 Real Time Priority, using Core i7 ..... 73
(4.24) CPU Usage of SP/MT for 8 CPUs using Core i5 ......................... 73
(4.25) Core i5 of MP/SMT (P1 Single Thread on CPU0) ........................ 75
(4.26) Core i5 of MP/SMT (P2 with 3 Threads on CPUs 1, 2 and 3) ............ 75
(4.27) User Time for P1 with 1 Thread of MP/SMT on CPU0 using Core i5 ...... 76
(4.28) User Time for P2 with 3 Threads of MP/SMT on CPU1 using Core i5 ..... 77
(4.29) Before CPU-Changing by PMS and the Status of CPU-Usage of System's Task Manager ..... 81
(4.30) Effect of CPU-Changing by PMS on the Status of CPU-Usage of System's Task Manager (P1, P2 and P3 Allocated to CPU1, and P4 to CPU0) ..... 82
(4.31) Effect of CPU-Changing by PMS on the Status of CPU-Usage of System's Task Manager (P1 and P2 to CPU3, P3 to CPU2, and P4 to CPU0) ..... 83
(4.32) Effect of CPU-Changing by PMS on the Status of CPU-Usage of System's Task Manager (P1 and P2 to CPU3, P3 to CPU2, and P4 to CPU4) ..... 84
(4.33) Effect of CPU-Changing by PMS on the Status of CPU-Usage of System's Task Manager (P1 and P2 to CPU5, P3 to CPU2, and P4 to CPU4) ..... 85
(4.34) Effect of CPU-Changing by PMS on the Status of CPU-Usage of System's Task Manager (P1 and P2 to CPU5, P3 to CPU2, and P4 to CPU4 then Killed) ..... 86
(4.35) Effect of CPU-Changing by PMS on the Status of CPU-Usage of System's Task Manager (P1 and P2 to CPU5, P3 to CPU2 then Killed, and P4 Killed) ..... 87
(4.36) Effect of CPU-Changing by PMS on the Status of CPU-Usage of System's Task Manager (P1 and P2 to CPU5 then Killed, P3 and P4 Killed) ..... 88

List of Abbreviations

Abbreviation   Meaning

ATET        Average of Total-Execution-Time
CMP         Chip Multi-Processor
CMT/CMP     Chip Multi-Threading / Chip Multi-Processing
CC          Cache Coherent
CC-UMA      Cache Coherent Uniform Memory Access
CC-NUMA     Cache Coherent Non-Uniform Memory Access
CPU         Central Processing Unit
CPU-KT      CPU Kernel-Time
CPU-UT      CPU User-Time
DS          Data Structure
DMP         Distributed-Memory Parallel Processing
ECPUT       Elapsed CPU Time
ERT         Elapsed Running Time
GPU         Graphics Processing Unit
GUI         Graphical User Interface
HPC         High Performance Computing
I/O         Input/Output
LC          Life Cycle
LT          Lost-Time
MP/MT       Multi-Process/Multi-Thread
MP/SMT      Multi-Process/Single-Multi-Thread
MP/ST       Multi-Process/Single-Thread
M-Process   Monitor-Process
MPI         Message Passing Interface
MTA         Multi-Threaded Architecture
MIMD        Multiple Instruction Multiple Data
NUMA        Non-Uniform Memory Access
OS          Operating System
P           Process
PC          Personal Computer
PCB         Process Control Block
PP          Parallel Processing
P-Process   Program-Process
QT          Quasar Toolkit
RAM         Random Access Memory
SMP         Symmetric Multi-Processing
SP/ST       Single-Process/Single-Thread
SP/MT       Single-Process/Multi-Thread
SDK         Software Development Kit
T           Thread
TCB         Thread Control Block
TET         Total-Execution-Time
TP          Target Program
UMA         Uniform Memory Access
UTP         Under-Test Program


List of Algorithms

Algorithm   Title                                                      Page No.

(3.1)  Pseudo Steps of Controlling Stage ................................... 46
(3.2)  Pseudo Steps of Monitoring Stage .................................... 48
(3.3)  Pseudo Steps of Tracking Stage ...................................... 49



Chapter One
Introduction

1.1 Overview

Complex real-time systems that contain more than one processor exhibit complex behavior, especially when dealing with multitask processing. Multiprocessor systems can execute multiple processes efficiently, and execution productivity can be improved by time-sharing among these processes in a real-time style [11]. Consequently, the fundamentals of multiprocessing can be generalized from processes to threads. Providing good multiprocessing systems requires system development that collects details of the intercommunication among processes and the inter-cooperation of threads within the same process, in order to reduce the execution time of programs, processes and threads. Hence, the performance of the system can be measured and a final decision on its reliability can be reached [28].

Building on the principles of the multiprogramming execution approach together with modern programming techniques, it became necessary to move toward multiprocessing technology, which depends on the multicore structuring of processors. Because single-core processors are bounded in execution speed, multiprocessors have been designed and adopted in computer systems. The characteristics and features of multicore clusters cannot be made clear without good, comprehensive and detailed studies of processor behavior. In this manner, the optimal performance of these systems as hardware can be determined, which is reflected directly in the processing time elapsed for processes and threads in a parallel approach [4].

The time spent accessing memory affects the time consumed by application execution. To understand the causes of inefficient memory accessing, tools are needed that detect these causes. Due to new technology, processor speed has increased much faster than memory access speed [25].

In general, perfect computer system operation needs optimal memory management, making as much memory space available as possible so that new processes can be received and take their turn at execution. This can be done efficiently when treating multiprogramming (i.e., multiple processes and multiple threads). Using multiprocessor systems therefore provides the ability to keep suitable memory space continuously available for newly entered processes. This is one of the reasons for providing monitoring software that monitors the activities of processes and threads [26].

In multiprocessing systems, the degree of concurrency and the dependency among tasks are important. Providing a parallel system means that more than one process and more than one thread should be able to execute at the same time. Parallel processing can be done on either single-processor or multiprocessor systems. Parallelism on single-processor systems is achieved via the time-sharing technique, while multiprocessor systems need time-sharing only when more than one process or thread is allocated to a single processor [33].

In general, it is preferred to keep the degree of concurrency at a high level, so that the instructions of any thread are executed independently of those of other threads. The instructions of the same thread must execute sequentially, but they may be interleaved with those of other threads when the system is a single-processor one using the time-sharing approach, or threads may execute simultaneously when the system is a multiprocessor one and more than one thread is allocated per core (i.e., processor). This independence may face problems with shared resources [33].

Processing processes and threads in a parallel manner to a high degree means the system is working as High Performance Computing (HPC), which is called high-degree parallelism. This facility has been advanced by increasing the number of internal cores in the processor. The multiplicity of cores can be extended to cover different types of cores with different capabilities and specializations, and hence can deliver exceptional system performance. This increase is called the transition from single-core to multicore processor architecture, and in the last decade the demand for multicore architectures became prevalent [29].

Evaluating parallel processing is not straightforward even for the most experienced programmers. Hence, there is a need for a good grasp of criteria such as total execution time, consumed CPU time and the number of context switches. Complementing that, the need for programming models that solve complex, computation-heavy problems keeps returning the spotlight to parallel programming, which leads to parallel processing. Consequently, such complex problems can be solved efficiently using multicore systems [15].

Concurrent systems comprise two classes: competitive and cooperative systems. In the first class, the individual components of the system compete to obtain the shared resources, while collaboration is applied within the second class. So, to provide communication and synchronization among active objects, both competition and collaboration must be addressed by the programming languages [19].

For security policy enforcement, hardware snooping, simulation and binary instrumentation can be relied on as memory tracking methods. The intention is that inserting monitoring code within the application (i.e., rewriting the application) enforces compliance with security. However, memory tracing approaches usually have limitations of time, accuracy and capacity. Also, the monitoring code should interfere with the application as little as possible (conservatively and transparently) [24].

The demand for integrity management systems has increased for embedded systems that may be vulnerable to attacks. So, it is necessary to study the process and thread Life Cycle (LC) to provide efficient monitoring. It is not important for normal users to know what has happened to the system in relation to LC states. But for programmers, studying all the states (creation, ready, running, waiting and terminating), in addition to the suspension states, is very important. Hence, the programmer may overcome most related problems (including the conditions that may cause deadlock) by rearranging the structure of his programs [10].

A monitoring system can be designed and applied for any class of computer system. One of these classes is the symmetric multithreaded (multiprocessor) systems, which are constructed on shared memory principles, so efficient resource sharing among the system's processors requires efficient monitoring programs. The mechanism of a process monitoring system depends on the configuration of the processing unit (single-core or multicore), the number of available resources, the level of parallel programming (which is related to the type of the operating system) and, finally, the number of processes and threads entering the system to be executed at the same time, together with the number of pending requests for resources [8].


1.2 Literature Survey

Sewon Moon and Byeong-Mo Chang [31], 2006, developed a thread monitoring system for multithreaded Java programs, which can trace or monitor running threads and synchronization. They designed a monitoring system with options to select interesting threads and synchronized actions. Using this tool, programmers can monitor only the interesting threads and synchronization in more detail by selecting options. It also provides profile information after execution, which summarizes the behavior of running threads and synchronized actions during execution. They implemented the system with SDK 1.4.2 on Windows XP on a Pentium 4 processor.

Kirk Kelsey et al. [20], 2008, presented the interface design and system implementation for fast track. It lets a programmer or a profiling tool mark fast-track code regions to manage the parallel execution of the speculative process, to check processes and to ensure the correct display of program outputs. The core of the run-time system balanced exploitable parallelism and available processors when the fast track is too slow or too fast. The programming interface closely affects the run-time support.

Michelle Goodstein et al. [26], 2009, worked on solving the multithreaded application monitoring problem using a combination of techniques. First, it leverages cache coherence to record the interleaving of memory accesses from different threads. Second, it extends the framework to comprehend "logical races": events involving at least two threads where the relative order may be important for particular lifeguards, but where not all events are memory accesses. Finally, it uses special thread-switch messages to ensure proper synchronization with cores when application threads are swapped out.


Subhi R. M. Zebari [35], 2010, built a monitoring program for checking the effects of multithreading on program execution time, and he addressed a useful approach which enabled the user (programmer) to monitor and track the system status, the program, each process and each thread in the system, and to make suitable changes to any state during its Life Cycle (LC). He made relations between serial and parallel executions of the threads.

Ban B. Fatohi [2], 2011, addressed the building of a software application that monitors and tracks the program currently running on a multiprocessor system. The algorithms related to this work were designed to give the capability of running different possible cases of processes/threads. The algorithms of this software application were designed with the QT Designer application and executed with the QT Creator application and the QT library.

Hyun-Ji Kim, Byoung-Kwi Lee, Ok-Kyoon Ha and Yong-Kee Jun [13], 2014, designed and implemented a "Dynamic Monitoring Tool based on Vector Clocks for Multithread Programs". They presented a dynamic monitoring tool, called VcTrace, that analyzes the partial ordering of concurrent threads and their accesses to shared memory locations during an execution of the program, based on a vector clock system. Empirical results show that VcTrace is sound and practical for data race detection as well as for analyzing multithreaded programs. The implementation and experimentation were carried out on a system with two Intel Xeon quad-core CPUs and 48 GB of main memory under the Linux operating system (kernel 2.6). The FastTrack algorithm was connected to VcTrace to detect data races.

As a summarized view of the system proposed by this thesis against the previous works mentioned in this section, it can be concluded that each previous work took a good step toward the process monitoring field, but each still suffers from drawbacks that keep it from being considered a dependable system in this field. The main problem is how to assemble, in one system, all monitoring-controlling-tracking operations with their many details, covering all possible types of programs (single or multi) of processes and threads. So, the proposed system has been constructed to overcome the drawbacks of previous works and to produce one new system capable of providing full monitoring-controlling-tracking functions.

1.3 Problem Statement

This thesis deals with how to overcome the inefficiency of programs, written by programmers, that may cause problems with resource allocation, deallocation and reallocation from/to processes and threads during the operation of the computer. Handling these operations requires a software system capable of monitoring the processes and threads during their LCs, and of controlling their execution, pausing, priority, allocated CPUs and time recording. Another problem related to this subject is how to teach students (in the computer field) to monitor processes and threads with flexible software.

1.4 Significant Points

The first significant point of this thesis relates to studying the features of a PC-operation monitoring system that depends on the shared-memory parallel processing approach, which gives power to this work. The second point is providing sufficient knowledge about the mechanism of process and thread LCs. The third point is enabling programmers to see what happens to the system when the allocated CPU and the priority level of their programs (processes and threads) are changed. Finally, meeting the demand of students to learn and apply process-thread monitoring, controlling and tracking.

1.5 Aims of the Thesis

The main aim of this thesis is to design and implement a monitoring system that self-checks the PC system information and monitors all processes currently in the system with full details. The thesis aims to provide full monitoring, controlling, tracking and re-monitoring of all possible cases of single and multiple processes and threads (which represent the principles of shared-memory parallel processing), with all information related to them, including elapsed kernel time and context switching. It also aims to enable the programmer to control the pausing, running and killing of these processes and threads, to force them onto specified CPUs, and to change their priorities.

1.6 Contribution of the Thesis

This thesis adds an important contribution to the operating systems and parallel processing fields. The integration of the proposed and implemented system is gained by utilizing an efficient shared-memory parallel processing system as a platform for OS performance measurement. The proposed approach of dividing a heavy load and processing it in parallel among the processors of the same system is relied on to create and force (single and multiple) processes that consist of (single and multiple) threads. Hence, this system merges the facilities of parallel processing with those of the OS. The integrated system has the ability of:

• Creating (single and multiple) processes and threads that use a heavy data load for processing and are treated as components of a parallel processing system.

• Using PCs that contain more than one core as processors; these cores receive the created processes and threads in different groups, and then the principles of shared-memory parallel processing are applied to execute them.

• Applying, during the parallel processing functionality, the concepts of controlling (forcing) the processes and threads among the cores, with change-tracking and monitoring applied during any controlling activity.

• Measuring or calculating, during all of these activities, the following values for each process and each thread from the start to the end of execution: all related time values (total, average, CPU, user and kernel), context switching, CPU usage, the sizes of the program, process and thread, and the currently allocated core.

Integrating all of these important activities in one system has not been provided by any previous work in the related field.

1.7 Thesis Layout

In addition to Chapter One, the thesis is outlined as follows:

Chapter Two describes the background theory of the parallel processing approaches, concentrating on the shared memory approach. It also deals with systems monitoring and the related classes and types of different approaches.

Chapter Three deals with the proposed algorithms for the suggested approach of self-checking the system and monitoring, controlling and tracking the user's under-test program.

Chapter Four introduces the implementation results obtained from executing the proposed algorithms of Chapter Three, then compares the results of this work with those of the previous works.

Chapter Five presents the conclusions of this work and the necessary suggestions for future studies.



Chapter Two
Parallel Processing and Multi-Core Systems

2.1 Introduction

This chapter presents the background theory for the main subjects of interest in this thesis. The main topics start from the platform work, an illustrative case study that provides the ability to measure the features relevant to a monitoring system. This platform is the family of parallel processing approaches, and the one closest to such measurements is the shared-memory parallel processing approach. In addition, the distributed-memory parallel processing approach can be relied on, on a narrow scale, to illustrate the capability of measuring the total and CPU consumed times. On the basis of this platform case study, it is necessary to prepare the background theory related to the structure of monitoring systems (application software) and the mechanism of their operation. The important topics also include single-core and multicore processor structures; process and thread LCs; types of processes and threads; modes of operation (kernel or user); details of the consumed time (total, CPU, kernel and user); process and thread context switching; and, finally, the structure of monitoring systems according to their tasks (monitoring, controlling and tracking).


2.2 Parallel Processing: Classes, Types and Approaches

Parallel processing means using more than one CPU at the same time to execute a program. Ideally, Parallel Processing (PP) makes a program run faster because there are more engines (CPUs) running it. In practice, it is often difficult to divide a program in such a way that separate CPUs can execute different portions without interfering with each other. Most computers have just one CPU, but some models have several. With single-CPU computers, it is possible to perform PP by connecting the computers in a network; however, this type of parallel processing requires very sophisticated software called distributed processing software [6].

There are three major classes of PP machines: multiprocessors, clusters and grids. Multiprocessors offer the lowest communication delay, as the compute nodes are tightly coupled. Cluster computers require low communication latency and high bandwidth between several closely related machines. Grid computers often use the Internet for the interconnection between machines; these machines vary in computing power, memory and processor type. Multiprocessors are computers with more than one CPU or core, and three types are in use today: SMP machines, which are common and have shared memory banks; DMP machines, which have distributed memory banks (separate memory for each processor) and are mostly used in specialized high performance computing; and vector processors, which use multiple data paths and memory banks but whose processors all perform the same task [17].

There are two types of PP architectures: message passing architecture and shared memory architecture. The two differ in their memory organization, resulting in different speed and communication characteristics. Programs written for one type of architecture might not perform well when executed on the other [7].

A message passing system, such as the Message Passing Interface (MPI) or Parallel Virtual Machine, provides a library of message-passing procedure calls. Parallel programs communicate with each other synchronously and directly by sending messages. In an MPI system, a message buffer is defined as a triplet (address, count, data type), where the address specifies the start of the buffer, count specifies the number of data units contained in the buffer, and data type specifies the type of the information being transferred. There are two basic message operations: send and receive. Each message is marked by a unique tag that is used by the receiver for rearranging the message sequence if messages arrive out of order. The tags are allocated at runtime and no wildcard tag matching is allowed in message receiving [7].

2.3 Shared Memory Parallel Processing Approach
Shared-memory parallel computers are those in which processors have the capability to access the memory as a global address space. Processors can operate independently, and the changes made to a data item by one processor are visible to all other processors. The proximity of the processors makes data sharing faster [12]. The fundamental abstraction in the SM programming model is the concept of a thread. Multiple threads share the address space of a program and enable concurrent and parallel execution. Although the address space is kept coherent, there is often a need to synchronize the threads to ensure the correctness of the program; this is a characteristic feature of the SM programming model. Unprotected accesses to the same shared data item cause a so-called race condition or data race [22]. A major characteristic of most SM systems is that access to data is independent of the processor making the request and is relatively fast, almost as fast as typical

memory access times in a uniprocessor system. However, when many processors make simultaneous requests to a single memory location or bank and memory access becomes a bottleneck, access times can increase greatly. For this reason, the physical memory layout and the data organization within the memory are critical to ensure that the memory system can handle as many simultaneous requests as possible. Synchronization between processes must be maintained to ensure this data consistency. Figure (2.1) illustrates the shared-memory and distributed-memory system architectures [7].

Figure (2.1): Shared Memory and Distributed Memory system architectures [7].
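The race condition mentioned above, and the synchronization that prevents it, can be sketched with Python threads, which share their process's address space just as the SM programming model describes (a minimal illustration; the names are illustrative):

```python
# Several threads update one shared counter. The lock serializes the
# read-modify-write on the shared data item, so the final value is
# deterministic; removing the lock would expose a data race.
import threading

counter = 0
lock = threading.Lock()

def add_many(n):
    global counter
    for _ in range(n):
        with lock:          # synchronized access to the shared data item
            counter += 1

def run_demo(threads=4, per_thread=10_000):
    global counter
    counter = 0
    workers = [threading.Thread(target=add_many, args=(per_thread,))
               for _ in range(threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return counter

if __name__ == "__main__":
    print(run_demo())  # 40000, deterministic because of the lock
```

Note that the threads never exchange messages: they simply read and write the same memory, which is exactly the contrast with the message-passing model above.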

Physically, an SM system shares a single memory among its processors, so that a value written to shared memory by one processor can be directly accessed by any other processor. Alternatively, shared memory can be implemented logically for systems in which each processor has its own memory, by converting each non-local memory reference into an appropriate inter-processor communication. Either implementation of SM is generally considered easier to use than message passing [1]. Memory access in an SM system can be classified into two types: Uniform Memory Access (UMA) and Non-Uniform Memory Access (NUMA). In UMA,

the access time to memory from all processors is the same, and all the processors in the system are symmetric and identical. NUMA is non-symmetric, as the access time to memory differs among processors. Mostly, a NUMA shared-memory system is formed by linking more than one Symmetric Multi-Processor (SMP) together [12].

Figure (2.2): Shared Memory (UMA) [5].

Figure (2.3): Shared Memory (NUMA) [5].

Both UMA and NUMA have cache-coherent variants (CC-UMA and CC-NUMA). The main problem with this type of architecture is its scalability: the larger the

number of processors in the system, the larger the traffic to access the memory [12]. An SM parallel computer is one whose individual processors share memory (and I/O) in such a way that each of them can access any memory location with the same speed; that is, they have a uniform memory access (UMA) time. Many small SM machines are symmetric in this sense. Larger SM machines, however, usually do not satisfy this definition; even though the difference may be relatively small, some memory may be nearer to one or more of the processors and thus accessed faster by them. Such machines are said to have CC-NUMA [30].

2.4 Trending from Single-Core to Multi-Core Processor Systems
A single-core CPU utilizes one core inside the processor. This was the very first type of CPU and, today, it is not used in many machines. Figure (2.4) illustrates a single-core CPU chip [14].

Figure (2.4): Single Core CPU Chip


In order to improve processor performance, the response of the industry has been to increase the number of cores on the die. While microprocessor technology has delivered significant improvements in clock speed over the past decade, it has also exposed a variety of other performance bottlenecks. To alleviate these bottlenecks, microprocessor designers have explored alternate routes to cost-effective performance gains, which has led to the use of multiple cores on a die. The design of contemporary multi-core architectures has progressively diverged from more conventional architectures. An important feature of these new architectures is the integration of a large number of simple cores with a software-managed cache hierarchy and local storage [23]. Multi-core processors represent an evolutionary change in conventional computing as well as setting the new trend for high-performance computing (HPC) - but parallelism is nothing new. Intel has a long history with the concept of parallelism and the development of hardware-enhanced threading capabilities, and has been delivering threading-capable products for more than a decade. The move toward chip-level multiprocessing architectures with a large number of cores continues to offer dramatically increased performance and power characteristics [27].

Multicore Central Processing Units (CPUs) are becoming the standard for the current era of processors through the significant level of performance that they offer. This includes multiple multicore architectures, different levels of parallelism and different levels of performance; with this variety of architectures, it becomes necessary to compare multicore architectures to make sure that the performance aligns with the expected specifications [14]. Multi-core processors, as the name implies, contain two or more distinct cores in the same physical package. In this design, each core has its own execution pipeline, and each core has the resources required to run without blocking resources needed

by the other software threads. The multi-core design enables two or more cores to run at somewhat slower speeds and at much lower temperatures. Multi-core processors are MIMD machines, because different cores execute different threads (Multiple Instructions) operating on different parts of memory (Multiple Data); they are also shared-memory multiprocessors in which all cores share the same memory [14].

Figure (2.5): Multi Core CPU Chip [14]

Multi-core is a design in which a single physical processor contains the core logic of more than one processor; for example, as if an Intel Xeon processor were opened up and all the circuitry and logic for two (or more) Intel Xeon processors were packaged inside. The multi-core design takes several such processor “cores” and packages them as a single physical processor. The goal of this design is to enable a system to run more tasks simultaneously and thereby achieve greater overall system performance [14].
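A program can inspect this multi-core layout at runtime, and restrict which cores it may run on. The sketch below uses Python's `os.sched_getaffinity`/`os.sched_setaffinity`, which are Linux-specific calls; pinning to core 0 assumes that core exists and that changing the affinity is permitted. This is the same mechanism that "forcing processes among the available processors" (discussed later for the proposed system) relies on.

```python
# Inspect the cores visible to the OS and the subset this process
# may currently be scheduled on (Linux-only APIs).
import os

def core_info():
    total = os.cpu_count()            # cores visible to the OS
    usable = os.sched_getaffinity(0)  # cores this process may run on (0 = self)
    return total, usable

if __name__ == "__main__":
    total, usable = core_info()
    print(f"{total} cores, runnable on {sorted(usable)}")
    # Pin the current process to core 0 only, then restore the old mask
    # (assumption: core 0 is present and pinning is allowed).
    os.sched_setaffinity(0, {0})
    os.sched_setaffinity(0, usable)
```

On Windows the equivalent control is the processor affinity mask set through the Win32 API; the idea is the same.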


2.5 Processes
From the programmer's perspective, a program is an ordered set of instructions. From the point of view of an operating system, on the other hand, it is an executable file stored in secondary (auxiliary) memory, typically on a disk [32]. A process is not a program (application). A program is a static sequence of instructions (text). Programs and processes are synergistic: the program needs the process in order to execute, and the process needs the program in order to have a purpose. It is sometimes easy to confuse the two because of their dependency. The two can be further distinguished by defining a program as a passive entity and a process as an active entity. A process generally includes the current activity (program counter), the contents of the processor's registers, the process stack, a data section containing global variables, and the text (the program). This description of a process is known as its data structure [37]. In operating-system terminology, the notion of process is used in relation to execution instead of the term program. It designates a commission, or quantum of work, dealt with as an entity; consequently, required resources such as address space will typically be allocated on a per-process basis [37]. A process is an entity that can be assigned to and executed on a processor. Moreover, a process is also an entity that reserves resources so that it can use those resources to execute a program. Lastly, a process is an instance of a program (this definition will prove useful in the discussion of swapping and the “context of a process”). As can be seen, the role of a process is given in its definition - an entity that reserves system resources for the execution of a program [37]. With process creation, memory space and, when needed, extra resources like I/O devices should be allocated to the process. Lastly, to execute a process, a processor is

to be allocated. This is expressed in operating-system terminology by saying that the process is to be scheduled for execution. Earlier operating systems, such as the IBM/360 operating systems (/DOS, /MFT, /MVT, etc.), used the term task in the same sense as recent operating systems use the notion of a process. Processes and threads are necessary to the OS because the OS utilizes both as a means of executing programs. The term process has been given many different definitions. Generically, a process can be defined as a program in execution, an entity that can be assigned to and executed on a processor, or the “animated spirit” of a program. In the Windows NT OS (WINNT), a process is defined as a set of resources reserved for the threads that execute a program. The Unix OS defines a process as an instance of a program in execution. These definitions are not in conflict with one another; just the opposite is true - they support and enhance one another [37].

Each process has a lifecycle, which consists of creation, an execution phase and termination. In order to execute a program, a corresponding process first has to be created. A process is brought into existence by using system services (system calls or supervisor macros). The creation of a process means commissioning the operating system to execute a program.

The creation of a process involves the following four main actions:
• setting up the process description,
• allocating an address space,
• loading the program into the allocated address space, and
• passing the process description to the scheduler.
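The create-execute-terminate lifecycle described above can be sketched with Python's `multiprocessing` module, which wraps the underlying system calls (the function names here are illustrative):

```python
# Sketch of the process lifecycle: the parent commissions the OS to run
# a program in a new process, waits for it, and collects the exit code.
from multiprocessing import Process
import os

def child_task():
    # The new process executes its program in its own address space.
    print(f"child pid={os.getpid()} parent={os.getppid()}")

def run_demo():
    p = Process(target=child_task)  # set up the process description
    p.start()                       # allocate resources and schedule it
    p.join()                        # wait for termination
    return p.exitcode               # exit record collected by the parent

if __name__ == "__main__":
    print(run_demo())  # 0 on normal termination
```

The exit code returned to the parent corresponds to the "record containing an exit code" that a terminated process leaves behind, discussed later in the life-cycle section.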


Usually, OSs describe a process by means of a description table called the Process Control Block (PCB). A PCB contains all the information that can be relevant during the whole lifecycle of a process. On the one hand, it holds basic data such as the process identification, owner, process status and a description of the allocated address space; on the other hand, it provides space for all implementation-dependent, process-specific additional information. Such supplementary information may be required during process management, in connection with memory management and scheduling; for instance, page tables, working-set lists and various timers related to the execution of the process, which may account for a considerable part of the PCB [32].
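The PCB itself is internal to the OS, but its role can be illustrated with a hypothetical Python structure holding the fields named above (all field names and values here are invented for illustration, not an actual OS layout):

```python
# Hypothetical, simplified Process Control Block: basic identification
# data plus space for implementation-dependent extras such as page
# tables and timers.
from dataclasses import dataclass, field

@dataclass
class PCB:
    pid: int
    owner: str
    status: str = "ready"            # e.g. ready / running / waiting
    address_space: tuple = (0, 0)    # (base, limit) of the allocated space
    page_table: dict = field(default_factory=dict)  # memory-management extras
    timers: dict = field(default_factory=dict)      # accounting extras

def run_demo():
    pcb = PCB(pid=42, owner="user1", address_space=(0x1000, 0x8000))
    pcb.status = "running"           # updated by the scheduler on dispatch
    return pcb

if __name__ == "__main__":
    print(run_demo())
```

A real PCB lives in kernel memory and is updated by the scheduler and memory manager, but the shape is the same: one record per process, alive for the whole process lifecycle.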

2.6 Threads
The notion of a thread was introduced in the framework of the process-thread model in order to express more parallelism in code than the process model allows. This is achieved by declaring smaller chunks of code, called threads (lightweight processes), within a process as entities that can be executed concurrently or in parallel. A thread, like a process, is a sequence of instructions. Threads are created within, and belong to, processes. All the threads created within one process share the resources of the process, above all the address space. Of course, scheduling is performed on a per-thread basis; in other words, the process-thread model is a finer-grain scheduling model than the process model.

Although this model is far more affordable than the process model, it also has numerous further advantages. Evidently, with finer-grained entities more parallelism can be exposed than in the case of processes. In addition, the creation of threads, and communication, synchronization and switching among threads,

are far less expensive operations than those for processes, since all threads belonging to the same process are sharing the same resources.

Threads have a similar lifecycle to processes and are managed mainly in the same way. Initially, each process is created with one single thread. However, threads are usually allowed to create new ones using particular system calls; then, typically, a thread tree is created for each process, as shown in Figure (2.6).

Figure (2.6): Thread tree

During creation, each thread is declared by a particular data structure, mostly called a Thread Control Block (TCB). The scheduling of threads is performed in a similar way as described above for processes. Correspondingly, threads can basically be in one of three states: running, ready to run, or waiting (blocked). Of course, each real operating system maintains, beyond the basic ones, a number of system-specific additional states. Thread management is performed by setting up TCB queues for each state and performing the state transitions according to the state transition diagram and the scheduling policy. The scheduling part of the operating system takes over the responsibility for managing all these queues in much the same way as it does with processes. At the end of thread creation, the

TCB is placed into the queue of ready-to-run threads and contends for the processor [32].

A traditional process is thus equivalent to a process containing a single thread. Threads have the advantage that they can simplify programming when concurrency is naturally present but awkward to express in serial languages (for example, a web server responding to multiple client requests can more cleanly separate the state of each transaction). Transaction processing is a typical, commercially important workload with an abundance of threads [9]. For systems that support threads, the OS is designed to schedule threads instead of processes for a quantum; this is the case in Windows. A thread is an entity spawned from a process that executes a program. More simply stated, a thread is a precisely measurable, controlled unit of work (a basic unit of CPU utilization). Threads enable resources to be shared and accessed concurrently within the same process (although threads do not reserve resources), which is useful for related jobs. A thread can be spawned from a process or from another thread; a thread can only spawn another thread (a child thread). Many modern OSs use processes and threads [16]. Threads are very similar to processes; however, they are used to enhance a process's multi-tasking ability. Threads allow resources to be shared and accessed concurrently within the same process by executing in the same address space (domain). They are sometimes called lightweight processes, and they generally consist of a program counter, a register set and a stack space. A thread, like a process, is assigned to run in user space or system space depending on its privilege [38].
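The thread tree of Figure (2.6) - a thread spawning a child thread, all within one process - can be sketched directly with Python's `threading` module (names are illustrative):

```python
# Sketch of a thread tree: the initial thread creates a child thread,
# which in turn creates its own child; all share the process's memory
# (here, the shared `events` list).
import threading

events = []

def grandchild():
    events.append("grandchild ran")

def child():
    t = threading.Thread(target=grandchild)  # a thread spawns a thread
    t.start()
    t.join()
    events.append("child ran")

def run_demo():
    events.clear()
    t = threading.Thread(target=child)       # the main thread spawns a child
    t.start()
    t.join()
    return events

if __name__ == "__main__":
    print(run_demo())  # ['grandchild ran', 'child ran']
```

All three threads append to the same list without any copying, illustrating that the whole tree lives in one address space.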


2.7 Processes and Threads Life Cycles
One of the important aspects of a process or a thread is its life cycle. Processes make it possible for multi-tasking OSs running on a single processor to multi-task and make the computer appear to process multiple applications simultaneously. Processes accomplish this primarily through life-cycle transitions and time allocation from the OS. There are several stages in the life cycle of a process; a complete process state transition diagram is depicted in Figure (2.7) [21].

Figure (2.7): Process Life Cycle [21]

Figure (2.7) shows the transition state diagram for processes in the OS. Every process, regardless of the OS that supports it, can be represented in part or fully by the figure; not every OS's supported process will have the full array of transition states shown in Figure (2.7). As a general rule, a process will always have one of the following states:

new/created, running, waiting, ready, or terminated. Each transition state is defined as follows [37]:
• User running: the process is executing in user mode.
• Kernel running: the process is executing in kernel mode.
• Ready to run in memory: the process is not executing but is ready to run as soon as the kernel schedules it.
• Asleep in memory: the process is sleeping and resides in main memory.
• Ready to run, swapped: the process is ready to run, but the swapper (process 0) must swap the process into main memory before the kernel can schedule it to execute (although some refer to process 0 as a swapper, it may be more precise to think of it as a transitional mechanism that moves a process to the ready-to-run-in-memory state; the process remains in main memory throughout the transition).
• Sleep, swapped: the process is sleeping, and the swapper (the mechanism that actually moves the process from main memory to secondary memory) has swapped the process to secondary storage to make room for other processes in main memory.
• Preempted: the process is returning from kernel to user mode, but the kernel preempts it and does a context switch to schedule another process. The distinction between this state and the “ready to run in memory” state is that a process running in kernel mode can be preempted only when it is about to return to user mode; otherwise, the preempted state is the same as “ready to run in memory”.
• Fork (created): the process is newly created and in a transition state; it exists, but it is not ready to run, nor is it sleeping. This state is the start state for all processes except process 0.
• Terminated (zombie): the process executed the exit system call and is in the finishing state. The process no longer exists, but it leaves a record containing an exit code and some timing statistics for its parent process to collect. The zombie state is the final state of a process.

There are several triggers that cause a process to transition from one state to another. One of the major triggers is central processing unit (CPU) time: the time that is allocated to a process is known as a quantum. For a single-user, single-task OS, processes execute sequentially, and once a process begins, it runs until it ends. For the single user of a single-task system, the process has complete control of the CPU and resources until it has accomplished its task - the current process monopolizes the CPU until it completes. The user cannot move between various programs or open multiple windows [37].
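The CPU time charged to a process splits into user time and kernel time, two of the quantities the proposed system measures. On POSIX systems those two accumulators can be read with `os.times()`; the sketch below (illustrative names, granularity depends on the OS clock tick) burns some user-mode CPU and issues some system calls, then reports the deltas:

```python
# Measure the user-mode and kernel-mode CPU time consumed by a stretch
# of code, using the per-process accumulators exposed by os.times().
import os

def cpu_times_delta():
    before = os.times()
    # User-mode work: pure computation, no system calls.
    total = sum(i * i for i in range(200_000))
    # Kernel-mode work: repeated system calls.
    for _ in range(2_000):
        os.getpid()
    after = os.times()
    return after.user - before.user, after.system - before.system

if __name__ == "__main__":
    user, system = cpu_times_delta()
    print(f"user={user:.3f}s kernel={system:.3f}s")
```

The same distinction appears later in the thesis as User-time versus Kernel-time of the processes and threads under test.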

2.8 Context Switching of Processes and Threads
The method that allows the CPU in multi-tasking OSs to move from one process to another in accordance with a scheduling algorithm is known as process swapping, and it is sometimes referred to as process switching. A process must be in main memory to be executed. As mentioned earlier, to avoid starvation because one process controls or monopolizes the CPU, each process is given a quantum of time during which it has exclusive use of the CPU. Once the process's quantum expires, the process is removed from execution and another process is scheduled for execution. Depending on the design of the OS, there may be additional events, besides the expiration of the process's quantum, that cause a process to transition out of the running state. Some of the more general events that are common to most OSs are preemption, which can be caused by process priority (one process having a higher priority than another), I/O operations, and traps. Traps are for mode switching, not process switching, but a snapshot of the state of the machine (the process or thread context) must be maintained during a trap [34].


A process with a higher priority may preempt a lower-priority process. Note that many OSs have a mechanism in place to ensure that low-priority processes are not starved. I/O operations can cause a process to wait; in such cases, to optimize CPU usage, the process that is waiting on the I/O operation may transition out of the running state. A trap is a term used to describe a processor's mechanism for capturing an executing process or thread when an exception or an interrupt occurs, requiring a snapshot of the process's context; the process's context is its registers, stack information and other pertinent information, which needs to be preserved when the system mode switches from user mode to kernel mode. The cause of a trap is usually associated with the execution of the current instruction, and a trap is usually used to handle an error or an exceptional condition [37]. During process transitions, the OS must save the state and pertinent information of the currently executing process. The state and pertinent information of a process are its process context. In general (expanding the earlier definition), the process context usually consists of the program counter, other process registers, and stack information from both user and kernel space (remember that a process has data, text and a stack - the earlier definition of a process as an instance of a program corresponds to the “text” portion of the process). By freezing the process and capturing a snapshot of it as it exists in its halted or frozen state, the OS is able to move it from executing memory to storage, and is later able to move it back into executing memory with the precise settings it had when frozen or halted. The OS prevents arbitrary process swapping and transitions, thereby maintaining consistency [37].
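Context switches, which the proposed system counts for the processes and threads under test, can be observed from user space. On Unix-like systems a sketch uses the `resource` module: `ru_nvcsw` counts voluntary switches (the process blocked, e.g. on I/O or sleep) and `ru_nivcsw` involuntary ones (quantum expiry or preemption). The function names are illustrative, and the module is Unix-only:

```python
# Read the per-process context-switch counters before and after a
# blocking operation; sleeping yields the CPU, so at least one
# voluntary switch is recorded.
import resource
import time

def switch_counts():
    ru = resource.getrusage(resource.RUSAGE_SELF)
    return ru.ru_nvcsw, ru.ru_nivcsw   # voluntary, involuntary

def run_demo():
    v0, iv0 = switch_counts()
    time.sleep(0.05)                   # block -> voluntary context switch
    v1, iv1 = switch_counts()
    return v1 - v0, iv1 - iv0

if __name__ == "__main__":
    vol, invol = run_demo()
    print(f"voluntary={vol} involuntary={invol}")
```

On Windows, comparable counters are exposed per thread through the performance-counter APIs; the voluntary/involuntary split maps onto blocking versus preemption as described above.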


2.9 User- and Kernel-Level Threads
Threads are visible only from within the process, where they share all process resources such as the address space, open files, and so on. The following state is unique to each thread: thread ID, register state (including PC and stack pointer), stack, signal mask, priority and thread-private storage [36]. Because threads share the process's instructions and most of its data, a change made to shared data by one thread can be seen by the other threads in the process. When a thread needs to interact with other threads in the same process, it can do so without involving the OS. Threads are the primary programming interface in multithreaded programming. User-level threads are handled in user space and so can avoid kernel context-switching penalties [36]. An application can have thousands of threads and still not consume many kernel resources; how many kernel resources the application uses is largely determined by the application itself. By default, threads are very lightweight. But to get more control over a thread (for instance, to control its scheduling policy), the application can bind the thread; when an application binds threads to execution resources, the threads become kernel resources [36]. Figure (2.8) illustrates both the kernel and user levels.


Figure (2.8): Multithreaded System Architecture

2.10 Multithreading and Multithreaded Execution
Threads are an inherent part of software products, being the fundamental unit of CPU utilization and the basic building block of multithreaded systems. The use of threads has evolved over the years from each program consisting of a single thread as its sole path of execution. The notion of multithreading is the expansion of the original application thread to multiple threads running in parallel, handling multiple events and performing multiple tasks concurrently. Today's modern operating systems support multiple threads controlled by a single process, all within the same address space. Multithreading brings a higher level of responsiveness to the user, as one thread can run while other threads are on hold awaiting instructions. As all threads are contained within a parent process, they share the resources and memory allocated to the process and work within the same address space, making it less costly to generate multiple threads than multiple processes.

These benefits increase even further when executed on a multiprocessor architecture, as multiple threads can run in parallel across multiple processors, whereas only one process may execute on one processor [18]. Windows supports concurrency among processes because threads in different processes may execute concurrently. Moreover, multiple threads within the same process may be allocated to separate processors and execute simultaneously. A multithreaded process achieves concurrency without the overhead of using multiple processes. Threads within the same process can exchange information through their common address space and have access to the shared resources of the process. Threads in different processes can exchange information through shared memory that has been set up between the two processes [39].
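The last point - separate processes exchanging information through a shared-memory region set up between them - can be sketched with Python's `multiprocessing.Value`, a single integer placed in shared memory with a built-in lock (illustrative names):

```python
# Two processes increment one integer that lives in a shared-memory
# segment; the value's lock keeps the updates consistent.
from multiprocessing import Process, Value

def bump(shared, times):
    for _ in range(times):
        with shared.get_lock():   # synchronized access to shared memory
            shared.value += 1

def run_demo():
    shared = Value("i", 0)        # one C int in shared memory
    ps = [Process(target=bump, args=(shared, 1000)) for _ in range(2)]
    for p in ps:
        p.start()
    for p in ps:
        p.join()
    return shared.value

if __name__ == "__main__":
    print(run_demo())  # 2000
```

Unlike threads, the two processes share only this explicitly created region, not their whole address spaces.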

An existing Windows thread is in one of six states [39]:
• Ready: it may be scheduled for execution. The microkernel dispatcher keeps track of all ready threads and schedules them in priority order.
• Standby: a standby thread has been selected to run next on a particular processor. The thread waits in this state until that processor is made available. If the standby thread's priority is high enough, the running thread on that processor may be preempted in favor of the standby thread; otherwise, the standby thread waits until the running thread blocks or exhausts its time slice.
• Running: once the microkernel performs a thread or process switch, the standby thread enters the running state and begins execution, continuing until it is preempted, exhausts its time slice, blocks, or terminates. In the first two cases, it goes back to the ready state.

• Waiting: a thread enters the waiting state when (1) it is blocked on an event (e.g., I/O), (2) it voluntarily waits for synchronization purposes, or (3) an environment subsystem directs the thread to suspend itself. When the waiting condition is satisfied, the thread moves to the ready state if all of its resources are available.
• Transition: a thread enters this state after waiting if it is ready to run but the resources are not available; for example, the thread's stack may be paged out of memory. When the resources become available, the thread goes to the ready state.
• Terminated: a thread can be terminated by itself, by another thread, or when its parent process terminates. Once housekeeping chores are completed, the thread is removed from the system, or it may be retained by the executive for future reinitialization.

Typically, some threads will be available for execution, whilst others are blocked waiting either to transmit or to receive data. The computation consists of processing interleaved with communication, and the threads have a lifetime and a state of their own. This is a control-flow design (the program code says when data transfers are to take place), but the data flow determines which threads are available for execution at any given time. Such a system could be implemented on a conventional (multiprocessor) machine using software scheduling of the threads, but it is far more efficient if the processor is modified to hold several thread contexts internally; this is a Multithreaded Architecture (MTA). The communication of most interest is not usually between threads but between a thread and the memory system, and MTAs are particularly concerned with hiding memory latency. The technique of context switching when a particular thread

cannot immediately make progress is more general, though, and can also be used to hide functional units' latencies, if the context switch is fast enough. It should be noted that multithreading does not directly improve the execution time of an individual thread; rather, it improves the throughput of the system as a whole. In this respect, it is similar to multiprocessing, and can be viewed as multiplexing several virtual processors onto one physical processor. Multithreaded architectures differ depending on the policy determining when to switch threads: fine-grained MTAs can switch to a different thread each cycle, while the usual alternative is to switch on a cache miss or an explicit instruction, termed coarse-grained or block multithreading. In many modern OSs the execution model is as follows: each program is represented as a process. The process, created when program execution is requested, is a container for the various resources and attributes of the program. A process also owns threads and has the following features:
• it owns an address space and open resources (files, sockets, ...) that all its threads share;
• it holds information about the user and the environment (user/group ID, directory);
• it contains at least one thread.
A thread is an execution path of a program; it is the smallest unit of execution. It consists of a stack, the state of the CPU registers, and an entry in the execution list of the system scheduler. Each thread shares all of the process's resources and has the following features [9]:
• it exists within a process;
• it has its own independent control and scheduling


• it uses the process's resources;
• it is terminated when its process terminates;
• it shares the process's memory space, but has its own stack;
• it has its own priority level.

2.11 Processes and Threads Monitoring
Multiprogramming is usually discussed in the context of OSs as opposed to applications. Multiprogramming is a scheduling technique that allows more than one job to be in an executable state at any time. In a multiprogrammed system, the jobs (or processes) share system resources such as the main system memory and the processor. On a single-core system there is an illusion that the processes are executing simultaneously, because the OS uses the technique of time slices: each process is given a small interval in which to execute, after which the OS switches contexts and lets another process execute for an interval. These intervals are called time slices, and they are so small that the OS switches context fast enough to give the illusion that more than one process or job is executing at the same time. In contrast, a multiprocessor is a computer that has more than one processor; here this refers to having two or more general-purpose processors (technically speaking, a computer with a CPU and a GPU is a multiprocessor). Multicore application design and implementation uses parallel programming techniques to design software that can take advantage of a Chip Multi-Processor

(CMP). The design process specifies the work of some task as either two or more threads, two or more processes, or some combination of threads and processes. That design can then be implemented using template libraries, class libraries, thread libraries, OS calls, or low-level programming techniques. An OS utility generates a report that summarizes execution statistics for the current processes. This information can be used to monitor the status of current processes. In a multiprocessor environment, this utility is useful for monitoring the state, CPU and memory usage, processor used, priority, and start time of the currently executing processes; command options control which processes are listed and what information is displayed about each process. The priority level of a process can be changed by using a system function. Each process has a value that is used to calculate the priority level of the calling process, and a process inherits the priority of the process that created it. The priority of a process can be lowered by raising this value; only super-user and kernel processes can raise priority levels [3]. The runtime system can use knowledge of a program's expected behavior, provided by code annotations, to inform its thread scheduling and migration decisions on a heterogeneous multi-core architecture. However, code annotations are not the only way of providing this behavior information: the runtime system could directly monitor certain aspects of a program's behavior at runtime. By using this monitored information to infer a program's behavior, the runtime system can enhance the behavior information provided by annotations and even support efficient exploitation of heterogeneous cores by completely unmodified applications that have not been augmented with behavior annotations [29].
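The priority mechanism described above - each process carrying a value, inherited from its creator, that can be raised to lower the priority - can be sketched on Unix with `os.getpriority`/`os.setpriority` (the nice value; raising priority, i.e. lowering this value, normally requires superuser rights). The function name is illustrative:

```python
# Read the calling process's nice value, then lower its priority by
# raising that value (which any process is allowed to do).
import os

def lower_priority(delta=1):
    before = os.getpriority(os.PRIO_PROCESS, 0)        # 0 = calling process
    os.setpriority(os.PRIO_PROCESS, 0, before + delta) # raise the nice value
    return before, os.getpriority(os.PRIO_PROCESS, 0)

if __name__ == "__main__":
    before, after = lower_priority()
    print(f"nice value: {before} -> {after}")
```

On Windows, the analogous control is the process priority class set through the Win32 API; the proposed system's priority-changing feature builds on these per-OS mechanisms.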


Structure of the Proposed System

C H A P T E R T H R E E

Chapter Three
Structure of the Proposed System

3.1 Introduction
The proposed system has been structured so that it can be used easily and effectively. It provides the programmer (i.e. the user) with facilities for monitoring the overall system in general, and in more detail for a number of proposed programs with different structures: Single-Process-Single-Thread (SPST), Single-Process-Multi-Thread (SPMT), Multi-Process-Single-Thread (MPST), Multi-Process-Multi-Thread (MPMT), and Multi-Process-Single-Multi-Thread (MPSMT). Seven sorting techniques are used by the above structures: Selection, Insertion, Quick, Heap, Merge, Bidirectional and Bubble sort; Appendix A illustrates the flowcharts of these sorting techniques. In addition to the monitoring activities, the proposed system is capable of controlling the processes and threads of the under-test programs, with features of Running-Pausing-Resuming, Taking-off/Returning the CPU, Forcing the processes/threads among the available processors, Priority-Changing of the processes/threads, and Recording the CPU-Kernel-User timings of the processes/threads during their execution. As a consequence, there is tracking of all these activities from the start of running the programs until they close. Hence, the tracking stage provides the ability of handling and updating all changes made during the controlling stage and recollecting this information to be used again by the monitoring stage, which always monitors the system. This chapter illustrates the algorithms and steps of operation for each of the above stages as pseudo-code. Also, the GUIs that browse all information related to the existing processes and to the under-test program, its processes and its threads

will be addressed here, together with the dynamic tables that collect the information appearing or changing on these GUIs. In addition to the structured code of the algorithms illustrated in this chapter, Appendix A provides their flowcharts for further illustration.

3.2 Structure of the Proposed System
The proposed system is constructed to work as a closed circle connecting its three main stages: Controlling, Tracking and Monitoring. There are close relations among them, depending on the changes occurring to the existing processes or to the under-test programs, their processes and threads. The general structure and its mechanism are shown in the flowchart of Figure (3.1).

Figure (3.1): Flowchart of the main mechanism of the proposed system.
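To make the under-test structures of section 3.1 concrete, an SPMT program can run one sorting thread per data chunk inside a single process. A sketch of this idea; the data sizes, thread count, and use of bubble sort are illustrative assumptions, not the thesis's actual test programs:

```python
import random
import threading

def bubble_sort(items):
    """Bubble sort, one of the seven depended techniques; deliberately CPU-heavy."""
    a = list(items)
    for i in range(len(a)):
        for j in range(len(a) - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a

# One process, several worker threads: the SPMT structure.
data = [random.sample(range(1000), 200) for _ in range(4)]
results = [None] * len(data)

def worker(index):
    results[index] = bubble_sort(data[index])

threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(data))]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

An MPMT structure would replace the threads with processes that each spawn their own workers; the monitoring stages treat both uniformly.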

The system can produce the general features of the computer that runs it, as given by the system-information file: the hardware components of the computer and the type/version of the OS. In addition, it provides:
1. The names of the processes that are currently in use before the test operation, as managed by the OS.
2. The memory size of each existing process (whatever its state) during its life cycle (LC).
The above steps are related to the instantaneous status of the system. The application software also gives the ability of applying monitoring to any desired program, called the Under Test Program (UTP), including determining its type (single or multi process) and number of processes, and controlling the guidance of the UTP's processes to the desired processors (according to the user's choices).
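On Linux, the names of the existing processes that this stage reports can be gathered from the /proc filesystem; the thesis's Windows implementation would rely on the corresponding Win32 APIs instead. A hedged, Linux-only sketch:

```python
import os

def existing_processes():
    """Return {pid: name} for all processes currently managed by the OS (Linux)."""
    procs = {}
    for entry in os.listdir('/proc'):
        if entry.isdigit():                      # numeric directories are PIDs
            try:
                with open(f'/proc/{entry}/comm') as f:
                    procs[int(entry)] = f.read().strip()
            except OSError:
                pass                             # process exited while scanning
    return procs

snapshot = existing_processes()
```

A repeated snapshot, diffed against the previous one, is enough to detect processes that appear or terminate between samples.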

3.3 Features Measured and Calculated by the Proposed System
In addition to the fact that the UTP may be a single- or multi-process program, each process of the UTP has one or more threads according to its type (i.e. SP-ST, SP-MT, MP-ST, MP-MT and MP-SMT). So, for the UTP, for each process and for each thread, the software provides the ability of determining the following information:
1. Total-Execution-Time (TET).
2. Average of Total-Execution-Time (ATET).
3. Lost-Time (LT).
4. CPU Burst-Time (CPU-BT).
5. CPU Kernel-Time (CPU-KT).
6. CPU User-Time (CPU-UT).
7. CPU Usage.
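Most of the listed timings can be sketched with standard clocks: a wall clock for the total execution time, a process-CPU clock for the CPU burst time, and os.times() for the user/kernel split, with CPU usage derived as CPU time over elapsed time. This illustrates the definitions only, not the proposed system's implementation:

```python
import os
import time

wall_start = time.perf_counter()
cpu_start = time.process_time()           # user + kernel CPU time of this process
t0 = os.times()                           # separate user and system (kernel) fields

x = 0
for i in range(2_000_000):                # a CPU-bound workload to measure
    x += i * i

t1 = os.times()
tet = time.perf_counter() - wall_start    # Total-Execution-Time (TET)
cpu_bt = time.process_time() - cpu_start  # CPU Burst-Time (CPU-BT)
cpu_ut = t1.user - t0.user                # CPU User-Time (CPU-UT)
cpu_kt = t1.system - t0.system            # CPU Kernel-Time (CPU-KT)
cpu_usage = 100.0 * cpu_bt / tet          # CPU Usage %
```

Lost time (LT) then follows as the difference between the total execution time and the CPU time actually consumed.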


This work deals with two main parts: the existing-processes part and the under-test-program part. Each one has its own Data Structures (DS) and GUI with full details about the important features and information related to the processes and threads.

3.3.1 Self-Checking Part
This part monitors all existing processes (those outside the testing programs) before and during the running of this software application. A special DS represents this part; it relates to the existing processes and can hold all of the information provided by the Windows Task Manager for any of these processes, in addition to much other information added here, which gives the user or programmer the ability to monitor. The structure of this DS is as below:
Process Name: The name of the (existing or under-test) process. This data remains constant during the process-LC.
PID: The process identification number assigned by the OS. This value is unique for each process and is not duplicated during the process-LC.
User Name: The name of the user that opened the process, depending on its services. This data remains constant during the process-LC.
Priority: The priority of the process to dispatch from the Ready state to the Running state. This value is assigned by the OS and may be changed during the process-LC, possibly more than once.
Start Time: The starting execution time of the process, which depends on the PC clock. This data remains constant during the process-LC.


Elapsed Running Time: The total consumed time of the process-LC. This value changes during the process-LC.
Elapsed CPU Time: The consumed CPU time of the process during its LC. This value changes during the process-LC.
User Time: The part of the consumed CPU time related to the user activities of the process during its LC. This value changes during the process-LC.
Kernel Time: The part of the consumed CPU time related to the kernel activities of the process during its LC. This value changes during the process-LC.
CPU Usage %: The percentage of CPU occupied by the process. This value changes during the process-LC.
RAM: The process size (in bytes) within physical memory. This value remains constant during the process-LC.
Threads: The number of threads in this process. This data remains constant during the process-LC.
Processor ID: The name of the processor assigned to that process. This assigned CPU may be changed during the process-LC.
This part is browsed on a special GUI that collects all the above information, called the General-GUI.

3.3.2 Under-Testing Programs Part
This part is related to the under-test programs, which need more detailed DSs to be monitored correctly during their running. These


DSs are related to the status of these programs and consequently to their processes and, in turn, the threads of these processes. The selection of any of the above types of under-test programs is done from the General-GUI, but the information about the program, its processes and its threads is illustrated on a new GUI, the Under-Test-Program GUI. This new GUI is constructed of four sections:
a. Program-Information Section
The DS of this section provides all related information about the program under test, including the following fields:
Processes: The total number of processes of the UTP.
Threads: The total number of threads of the UTP.
Creation Time: The starting execution time of the program, which depends on the PC clock. This data remains constant during the tested-program-LC.
User Time: The part of the consumed CPU time related to the user activities of the program during its LC. This value changes during the tested-program-LC.
Kernel Time: The part of the consumed CPU time related to the kernel activities of the program during its LC. This value changes during the tested-program-LC.
Elapsed CPU Time: The consumed CPU time of the program during its LC. This value changes during the tested-program-LC.


Elapsed Running Time: The total consumed time of the program-LC. This value changes during the tested-program-LC.
CPU Usage %: The percentage of CPU occupied by the program. This value changes during the tested-program-LC.
Priority: The priority of the program to dispatch from the Ready state to the Running state. This value is assigned by the OS and may be changed during the tested-program-LC, possibly more than once.
Creation Time Average: The average creation time of all processes belonging to the UTP.
Running-Time Average: The average running time of all processes belonging to the UTP.
CPU Time Average: The average elapsed CPU time of all processes.
CPU Usage % Average: The average CPU usage of all processes.
End Time Average: The average ending-execution time of all processes belonging to the UTP.
End Time: The UTP's ending-execution time (taken from the PC clock).
b. Processes-Information Section
The DS of this section provides all related information about the processes of the program under test, including the following fields:
Process Name: The name of the process.
ID: The process identification number assigned by the OS. This value is unique for each process and is not duplicated during the process-LC.

RAM: The process size (in bytes) within physical memory. This value remains constant during the process-LC.
Threads: The total number of threads of each process.
Creation Time: The starting execution time of the process, which depends on the PC clock. This data remains constant during the process-LC.
User Time: The part of the consumed CPU time related to the user activities of the process during its LC. This value changes during the process-LC.
Kernel Time: The part of the consumed CPU time related to the kernel activities of the process during its LC. This value changes during the process-LC.
Elapsed CPU Time: The consumed CPU time of the process during its LC. This value changes during the process-LC.
Elapsed Running Time: The total consumed time of the process-LC. This value changes during the process-LC.
CPU Usage %: The percentage of CPU occupied by the process. This value changes during the process-LC.
Processor ID: The name of the processor assigned to that process. This assigned CPU may be changed during the process-LC.
Priority: The priority of the process to dispatch from the Ready state to the Running state. This value is assigned by the OS and may be changed during the process-LC, possibly more than once.


Read Count: The number of times the process has read from files.
Read Bytes: The number of bytes that have been read from the process's data. This value remains constant during the process-LC.
End Time Average: The average ending time of all threads belonging to a specific process.
End Time: The ending execution time of the process.
c. Threads-Information Section
The DS of this section provides all related information about the threads within the processes of the program under test, including the following fields:
Thread ID: The thread identification number assigned by the OS. This value is unique and is not duplicated during the thread-LC.
Owned by Process ID: The ID of the process owning this thread.
Creation Time: The starting execution time of the thread, which depends on the PC clock. This data remains constant during the thread-LC.
User Time: The part of the consumed CPU time related to the user activities of the thread during its LC. This value changes during the thread-LC.
Kernel Time: The part of the consumed CPU time related to the kernel activities of the thread during its LC. This value changes during the thread-LC.
Elapsed CPU Time: The consumed CPU time of the thread during its LC. This value changes during the thread-LC.


Elapsed Running Time: The total consumed time of the thread-LC. This value changes during the thread-LC.
CPU Usage %: The percentage of CPU occupied by the thread. This value changes during the thread-LC.
Processor ID: The name of the processor assigned to that thread. This assigned CPU may be changed during the thread-LC.
Priority: The priority of the thread to dispatch from the Ready state to the Running state. This value is assigned by the OS and may be changed during the thread-LC, possibly more than once.
State: The state of the thread (Running, Killed, Paused, Resumed).
Context Switch: The number of times the processor takes up and leaves the thread. This value changes during the thread-LC.
Wait Time (s:ms:µs): Lost time spent standing by in the Ready state between releases and re-acquisitions of the CPU, which may occur more than once. It does not include time wasted in the Waiting state. This value changes during the thread-LC.
End Time: The ending execution time of the thread. This value remains constant after the thread-LC.

3.4 Controlling Stage Algorithm
The DSs of this section provide all related information about the abilities of making changes to the operation, priority and target processors of all processes and threads within the program under test. The types of these values are Boolean,

and their effects will be reflected on the entries of the monitoring DSs through an intermediate DS called the Tracking-DS. These changes include the following options:
i. Start-Kill: A process/thread can be started or created by the user, who has the ability to start and/or kill it at any time during its LC.
ii. Stop-Resume: A process/thread can be stopped by the user at any time during its LC and resumed again later.
iii. Change Priority: The priority of a process/thread can be changed by the user at any time during its LC; the priority levels are Real Time, High, Above Normal, Normal, Below Normal, Low and Idle.
iv. Change Target CPU: The execution of a process/thread can be moved by the user at any time during its LC from the current CPU to any other CPU within the same PC.
This stage provides the ability of making changes to the processes and threads of the tested program. These changes include their operation status (i.e. Starting, Pausing, Resuming and Killing), their priority, and forcing them to target processors. The stage depends on the instantaneous status of the processes and threads, imported from the continuously updated data of the monitoring stage while these processes and threads run. There are four operations related to this stage: killing, pausing then resuming, changing priority, and changing the target CPU.
For the killing operation: if all threads of a process are killed by the user, that process terminates (finishes) automatically without any interference from the user, but it still stays in the system. So, to remove this process from

the system, the user needs to apply the kill operation to it. In addition, this stage gives the user the ability to kill any process directly, without first killing all its threads; doing so kills all of its threads.
For the pausing-resuming operation, the user can pause any thread and resume it later at any time. In the same manner as the killing operation, a process gets 0% CPU usage if all its threads are paused, and if a process is paused directly, all its threads are paused.
For the priority-changing operation, this stage provides the ability to change the priority of any thread within a certain process. This means the thread takes a new importance only in comparison with the other threads of that process, not with the threads of other processes. So, the priority of the process is not affected by the new priorities of its threads, even if all of them are given new priority values. Keep in mind that the priority of a process is compared with those of other processes, neglecting the effects of changing the priorities of its threads. Also, the user can change the priority of the process itself, which is compared with those of other processes, and this value has no effect on the priorities of its threads.
For the CPU change, this stage provides the ability to move the execution of processes or threads from one CPU to another. When all threads belonging to the same process are moved to one target CPU, the process itself is not moved to the target CPU completely, because on creation of a process in the shared-memory system the OS gives it the choice of using all CPUs; the process keeps this ability regardless of the movement of its threads among the CPUs. On the other hand, if the user moves a process from one CPU to another, all its threads are forced automatically to that target CPU.
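On POSIX systems the killing and pausing-resuming operations described above map onto standard signals (SIGKILL, SIGSTOP, SIGCONT); the thesis's Windows implementation would use the equivalent Win32 calls. A hedged Linux sketch that controls a stand-in child process rather than a real UTP:

```python
import os
import signal
import subprocess
import time

child = subprocess.Popen(["sleep", "30"])   # stand-in for an under-test process

os.kill(child.pid, signal.SIGSTOP)          # pause: the child gets 0% CPU usage
time.sleep(0.1)
os.kill(child.pid, signal.SIGCONT)          # resume execution

os.kill(child.pid, signal.SIGKILL)          # kill: remove it from the system
child.wait()                                # reap so it does not remain a zombie
```

Note that SIGSTOP applies to the whole process; per-thread pausing as in the proposed system requires OS-specific thread handles (e.g. SuspendThread/ResumeThread on Windows).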
General pseudo-steps of the controlling stage are illustrated in Algorithm (3.1).

Begin
Step 1.  Depend on the UTP selected during the monitoring stage.
Step 2.  Select the type of controlling: change CPU, change priority, kill processes/threads, or pause/resume processes/threads.
Step 3.  IF change CPU
Step 4.    Select the target CPU ID.
Step 5.    Select the process or thread IDs to be changed.
Step 6.    Move the execution of the selected part to the target CPU.
         ELSE
Step 7.    Go to Step 8.
         End IF
Step 8.  IF change priority
Step 9.    Select the priority type: Real Time, High, Above Normal, Normal, Below Normal, Low or Idle.
Step 10.   Select the process or thread IDs to be changed.
Step 11.   Change the priority of the selected processes or threads.
         ELSE
Step 12.   Go to Step 13.
         End IF
Step 13. IF kill processes or threads
Step 14.   Select the process or thread IDs to be killed.
Step 15.   Kill the selected processes or threads.
         ELSE
Step 16.   Go to Step 17.
         End IF
Step 17. IF pause processes or threads
Step 18.   Select the process or thread IDs to be paused.
Step 19.   Pause the selected processes or threads.
         ELSE
Step 20.   Go to End.
         End IF
End
Algorithm (3.1): Pseudo-steps of the Controlling Stage

3.5 Monitoring Stage Algorithm
The monitoring stage works from the moment this software starts running, in order to monitor the system. All of the information about the programs, processes and threads (existing or under test) explained in section 3.3 above, with all GUIs and tables and their details, lies within the responsibilities of the monitoring stage. In addition to the processes existing in the system before the monitoring software runs, any newly started process (outside the under-test ones) is entered directly into the table of existing processes, and the monitoring stage monitors it too with all the details listed in the first GUI. The monitoring stage is capable of receiving the changes that occur during the controlling stage, handled by the tracking stage and sent on to the monitoring stage. These changes occur either due to the termination of processes among the existing processes, which are then removed from the monitoring GUIs and tables, or due to the changes applied at the controlling stage to the under-test program, its processes and threads. General pseudo-steps of the monitoring stage are illustrated in Algorithm (3.2).

Begin
Step 1. Input the existing processes into the software.
Step 2. Handle any changes that occur to any existing process.
Step 3. Display the current status of the system related to the existing processes before the tests.
Step 4. Test a UTP for monitoring, which will be one of: SP-ST, SP-MT, MP-ST, MP-MT, MP-SMT.
Step 5. Handle the changes that occur at the controlling stage.
Step 6. Display the status of the target program throughout the whole execution time of the UTP.
Step 7. Record all of the above information in suitable files for a certain period selected by the user.
End
Algorithm (3.2): Pseudo-steps of the Monitoring Stage

3.6 Tracking Stage Algorithm
There are two phases of the tracking stage. The first is related to the processes existing before and after running this software. When any running process completes, the OS terminates it and removes it from the system; when the user terminates a process by force (i.e. kills it), the OS also removes it from the system. So, for correct monitoring, this stage updates all of the above changes and delivers them to the monitoring stage. The second phase is related to the controlling stage, which deals only with the under-test program. When there are no control operations there is no need for this stage, and only the monitoring stage operates. In this phase the software traces any changes that occur during the controlling stage and updates the current status of the program, processes and threads affected by these changes. The updated changes are delivered to the monitoring stage, which is in operation

48

continuously. General pseudo-steps of the tracking stage are illustrated in Algorithm (3.3).

Begin
Step 1. Run the monitoring software.
Step 2. IF changes occurred at the controlling stage, go to Step 5.
        ELSE
Step 3.   Record all information related to the changes that happened to the UTP and its processes and threads.
Step 4.   Save the results in files to be processed by the monitoring or controlling stage.
        End IF
Step 5. IF more controlling is needed, go to Step 2.
Step 6. ELSE call the monitoring stage.
        End IF
End
Algorithm (3.3): Pseudo-steps of the Tracking Stage
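The periodic recording performed by the monitoring stage (Step 7 of Algorithm (3.2)) amounts to sampling the timing features at a fixed interval and appending each sample to a table or file. A simplified sketch; the interval and workload are illustrative and much shorter than the 4-second record times used in Chapter Four:

```python
import os
import time

def sample_process(interval, samples):
    """Record (elapsed, user, kernel) for the current process every `interval` seconds."""
    records = []
    start = time.perf_counter()
    for _ in range(samples):
        # Busy work standing in for the under-test program.
        sum(i * i for i in range(200_000))
        t = os.times()
        records.append((time.perf_counter() - start, t.user, t.system))
        time.sleep(interval)
    return records

log = sample_process(interval=0.05, samples=3)
```

In the proposed system the same idea is applied per process and per thread, with the tracking stage merging in any controlling-stage changes between samples.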

49

Implementation Results of the Proposed System

C H A P T E R F O U R

Chapter Four
Implementation Results of the Proposed System

4.1 Introduction
The results of implementing the algorithms explained in Chapter Three are produced and discussed in this chapter. The chapter concentrates on the programs SPST, SPMT, MPST, MPMT and MPSMT that are tested and considered as user programs. Depending on the targets of this thesis, the results deal with running and pausing, CPU forcing, and priority changing of the under-test processes and threads. In addition, this chapter presents the related timings recorded during these changes to the processes and threads: total-execution-time, CPU-execution-time, kernel-time and user-time. Beyond that, the results of another test style, related to the shared-memory parallel processing approach, are produced.

4.2 Mechanism of Application Software Operation
In a general view of the proposed system's GUI, four main parts can be seen. The first is related to the Controlling part, which in turn can be interpreted as Monitoring and Tracking too. The second part is concerned with the features of the Under Test Program (UTP). The third part illustrates the features of all processes of this UTP. Finally, the fourth part is related to the features of all the threads of these processes. Figure (4.1) represents the GUI of the Proposed Monitoring System (PMS), while Figure (4.2) shows the currently existing processes with all their features. The Control-Part (CP) consists of five sub-parts: Case Study (CS), Priority Change (PC), Core Change (CC), Kill-Pause-Resume-Process-Thread (KPR-PT), and Data Recording (DR). In addition, there are four options: Start (to start processing the selected case study as a UTP), Clear (to leave the current UTP in order to select another case study), Exit (to exit the monitoring system), and Processes Information Table (to browse the important

information of the existing processes) by selecting the Processes and Information Table from the GUI.

Figure (4.1): GUI of the Proposed System

Figure (4.2): Browsing the Features of the Existing Processes.

The proposed system needs to assume programs with different structures related to a number of processes and threads, so sorting algorithms have been relied upon.

51

In addition, algorithms other than sorting could be used. The mechanism of working with the PMS is as follows:
a. Select one of the case studies: SPST, SPMT, MPST, MPMT, or MPSMT.
b. Select the Start button.
c. Control the priority of the processes/threads: Idle, Low, Below Normal, Normal, Above Normal, High, or Real Time.
d. Change the CPUs that execute the processes/threads: Core 0 through Core 7.
e. Control the operation of the processes and threads: Kill, Pause, or Resume.
f. Control the data recording.
In order to illustrate the activities of the PMS, several case studies have been assumed. The details of the obtained results are illustrated in section (4.3).

4.3 Monitoring & Controlling & Tracking Implemented Results
The proposed system has been implemented using more than one structure of UTP with single/multi processes and threads. So, different case studies have been relied upon, as explained in the controlling part of the monitoring system's GUI. For a better understanding of the mechanism of the PMS, this section deals with the controlling options of the program with the MPMT structure.

4.3.1 Controlling & Tracking Stage Cases
Figure (4.3) represents the GUI of the PMS with the MPMT structure implemented on a PC with a Core i7 with 8 logical processors. Several cases are considered in order to illustrate the abilities of the system, taking this figure as the reference for these cases.
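The core-change option of section 4.2 (and the Core Change sub-part of the CP) corresponds on Linux to changing a CPU-affinity mask; Windows exposes the same idea through SetProcessAffinityMask/SetThreadAffinityMask. A hedged sketch that restricts the calling process to a single core and then restores its original mask:

```python
import os

pid = 0                                    # 0 means "the calling process"
original = set(os.sched_getaffinity(pid))  # the cores currently allowed

target = sorted(original)[0]               # pick one allowed core, e.g. Core 0
os.sched_setaffinity(pid, {target})        # force execution onto that core
assert os.sched_getaffinity(pid) == {target}

os.sched_setaffinity(pid, original)        # restore the original mask
```

As discussed in section 3.4, restricting every thread this way does not remove the owning process's right to use all CPUs; only an explicit process-level change does that.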

52

Figure (4.3): GUI of the PMS with MPMT structure before applying the controlling cases on its Processes and Threads.

Case-1 (Priority Controlling): Changing the priority of Thread 5 of Process 1 from Normal to High, as shown in Figure (4.4).
Case-2 (CPU Controlling): Changing the allocated CPU of T2P1 from Core3 to Core4, as shown in Figure (4.5).
Case-3 (Thread-Operation Controlling): Pausing T1P2 (Thread 7620 of Process 6564), as shown in Figure (4.6).

Case-4 (Thread-Operation Controlling): Killing T4P2 (Thread 6880 of Process 6564), as shown in Figure (4.7).
Case-5 (Thread-Operation Controlling): Resuming T1P2 (Thread 7620 of Process 6564), as shown in Figure (4.8).
Case-6: The status of the case study after normal termination of all its processes and threads, as shown in Figure (4.9).

Figure (4.4): Changing the Priority of Thread 5 Process 1 (Thread 6712 of Process 4924) from Normal to High.

54

Figure (4.5): Changing the allocated CPU of T2P1 (Thread 2376 of Process 4924) from Core3 to Core4.

Figure (4.6): Pausing T1P2 (Thread 6880 of Process 6564).

Figure (4.7): Killing T4P2 (Thread 7620 of Process 6564).

Figure (4.8): Resuming T1P2 (Thread 6880 of Process 6564).

58

Figure (4.9): The Status of the Case Study after Normal Termination of all its Processes and Threads.

4.3.2 Discussion of Controlling Stage Cases
Figure (4.3) shows an illustrative example of MPMT on 8 logical processors. The proposed program here consists of 3 processes (P1 with 5 threads, P2 with 3 threads, and P3 with 7 threads); hence there are 3 processes with 15 threads allocated to the 8 CPUs randomly. This figure was relied upon to check the abilities of the PMS for the controlling cases, which were selected to display the activities and efficiency of the PMS. Since there are four main parts within the controlling options, each controlling part was tested.
Figure (4.4) represents Priority Controlling. In general, the OS sets the priority of all processes and threads to the default Normal level. For this test, the priority level of the fifth thread of the first process is changed to the High level. The figure illustrates that the priority change does not stop the execution of the processes and threads.
Figure (4.5) represents CPU Controlling. In general, the OS makes all CPUs available to each process, while each thread must be allocated to only one CPU. For this test, the second thread of the first process was initially allocated to CPU-3; in the test it is reallocated to CPU-4. The figure also illustrates that changing the allocated CPU does not stop the execution of the processes and threads.
Figure (4.6) represents Operation Controlling (Pausing). In general, the OS keeps all processes and threads in the execution state (even when they are out of the running state). For this test, the first thread of the second process was executing; in the test its operation is paused. The figure also illustrates that pausing a thread does not pause or stop the execution of the other processes and threads.

60

Figure (4.7) represents Operation Controlling (Killing). For this test, the fourth thread of the second process was executing; in the test it is killed. The figure also illustrates that killing a thread does not pause or stop the execution of the other processes and threads.
Figure (4.8) represents Operation Controlling (Resuming). For this test, the first thread of the second process had been paused; in the test its operation is resumed. The figure also illustrates that resuming a thread does not pause or stop the execution of the other processes and threads.
Figure (4.9) represents the status of the processes and threads after their normal termination. Here all related features have been obtained for the program, its processes and its threads.

4.3.3 Implementation and Monitoring Results
The implementation part is applied to cover all features of the PMS. The obtained results belong to the monitoring and tracking stages and are shown in Tables (4.1 to 4.10) and Figures (4.10 to 4.28). Different types of multiprocessor systems (i7, i5 and Core 2 Duo) have been relied upon for the implementation results.
Table (4.1) represents the Elapsed CPU Time for processes of MP-ST using the i7 processor, while the User and Kernel Times are illustrated in Figures (4.10) and (4.11) respectively.

Table (4.2) represents the Elapsed User Time for threads of MP-MT using the i7 processor, plotted in Figure (4.12).
Table (4.3) illustrates the recorded Elapsed CPU Time for processes and threads of MP-SMT using the i7 processor; Figures (4.13) and (4.14) plot these results for P2.
The relation between thread priority and context switches of MP-ST using the i7 processor is illustrated in Table (4.4); Figures (4.15), (4.16), (4.17) and (4.18) show the context switching of MP-ST for P1, P2, P3 and P4 respectively.
Table (4.5) represents the effect of changing priority on the Elapsed Running Time of MP-SMT using the i7 processor; Figures (4.19) and (4.20) plot these results with the same and different MP-SMT priorities.
Table (4.6) represents the effect of changing the selected CPU on the Elapsed Running Time for P2 of MP-SMT using the i7 processor; Figure (4.21) plots these results for all/selected numbers of threads.
Table (4.7) represents the effect of changing the priority on the Elapsed Running Time for threads of P2 of MP-SMT using the i7 processor; Figures (4.22) and (4.23) plot these results for all/selected numbers of threads and different priorities.
Table (4.8) represents the effect of increasing the number of participating CPUs on the CPU usage of SP-MT using the i5 processor, plotted in Figure (4.24).
Table (4.9) represents the Elapsed CPU Time for P1 and P2 of MP-SMT using the i5 processor system; Figures (4.25) and (4.26) plot these results with different selections of threads and processors.
Table (4.10) represents the Elapsed CPU Time for threads of MP-SMT using the Core 2 Duo processor system; these results are plotted in Figures (4.27) and (4.28).
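Context-switch counts of the kind recorded in Table (4.4) can be read on Linux from /proc/<pid>/status; the Windows implementation would use its performance counters instead. A hedged sketch, not the proposed system's method:

```python
import os

def context_switches(pid):
    """Parse voluntary/involuntary context-switch counters from /proc (Linux)."""
    vol = invol = 0
    with open(f'/proc/{pid}/status') as f:
        for line in f:
            if line.startswith('voluntary_ctxt_switches'):
                vol = int(line.split()[1])
            elif line.startswith('nonvoluntary_ctxt_switches'):
                invol = int(line.split()[1])
    return vol, invol

vol, invol = context_switches(os.getpid())
```

Sampling these counters at each record time, per thread, yields exactly the kind of priority-versus-context-switch relation that Table (4.4) reports.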

Table (4.1): Elapsed CPU Time for Processes of MP-ST using Core i7 Process

Kernel Time (Sec)

User Time (Sec)

P1 P2 P3 P4

0.062 0.015 0.000 0.015

890.015 890.375 882.812 915.082

Processes

P4

915.082

P3

882.812

P2

890.375

P1

890.015

860

870

880

890

900

910

User Time (Sec)

Figure (4.10): User Time of MP-ST using Core i7


Figure (4.11): Kernel Time of MP-ST using Core i7

Table (4.2): Elapsed User Time for Threads of MP-MT using Core i7 Processor

Process    Thread    User Time (sec)
P1         T1        182.687
P1         T2        163.718
P1         T3        0.484
P1         T4        0.484
P1         T5        636.890
P2         T1        610.000
P2         T2        0.562
P2         T3        0.515
P2         T4        163.609
P2         T5        117.890
P2         T6        636.718


Figure (4.12): User Time of MP-MT using Core i7

Table (4.3): Recorded Elapsed CPU Time for Processes and Threads of MP-SMT using Core i7

Process    Thread    Record Time (sec)    User Time (sec)    Kernel Time (sec)
P1         T1        4                    2.312              0.000
P1         T1        8                    4.437              0.000
P1         T1        12                   6.546              0.000
P1         T1        16                   8.625              0.000
P1         T1        20                   10.734             0.000
P1         T1        24                   12.812             0.000
P2         T1        4                    2.218              0.015
P2         T1        8                    4.343              0.015
P2         T1        12                   6.437              0.015
P2         T1        16                   8.468              0.015
P2         T1        20                   10.593             0.015
P2         T1        24                   12.687             0.015
P2         T2        4                    2.375              0.015
P2         T2        8                    4.437              0.015
P2         T2        12                   6.531              0.015
P2         T2        16                   8.687              0.015
P2         T2        20                   10.765             0.015
P2         T2        24                   12.859             0.015
P2         T3        4                    2.312              0.015
P2         T3        8                    4.375              0.015
P2         T3        12                   6.437              0.015
P2         T3        16                   8.546              0.015
P2         T3        20                   10.625             0.015
P2         T3        24                   12.750             0.015


Figure (4.13): Threads User Time of MP-SMT for P2 using Core i7


Figure (4.14): Threads Kernel Time of MP-SMT for P2 using Core i7


Table (4.4): Relation between Threads Priority and Context Switch of MP-ST using Core i7

Record       P1T                          P2T                          P3T                          P4T
Time (sec)   Ctx.Sw.  Priority            Ctx.Sw.  Priority            Ctx.Sw.  Priority            Ctx.Sw.  Priority
4            94       Normal              96       Normal              94       Normal              95       Normal
8            168      Normal              174      Normal              181      Normal              183      Normal
12           243      Normal              247      Normal              256      Normal              255      Normal
16           318      Normal              321      Normal              330      Normal              329      Normal
20           392      Normal              395      Normal              404      Normal              404      Normal
24           467      Normal              471      Normal              481      Normal              480      Normal
28           545      Normal              547      Normal              558      Normal              558      Normal
32           622      Normal              624      Normal              635      Normal              635      Normal
36           700      Normal              702      Normal              712      Normal              713      Normal
40           776      Normal              779      Normal              789      Normal              789      Normal
44           854      Normal              856      Normal              867      Normal              869      Normal
48           930      Normal              934      Normal              944      Normal              947      Normal
52           1008     Normal              1011     Normal              1021     Normal              1024     Normal
56           1085     Normal              1088     Normal              1098     Normal              1101     Normal
60           1115     Above Normal        1128     Normal              1165     Normal              1179     Normal
64           1118     Above Normal        1129     Real Time           1232     Normal              1247     Normal
68           1120     Above Normal        1130     Real Time           1234     High                1249     Normal
72           1121     Above Normal        1131     Real Time           1234     High                1249     High
76           1124     Above Normal        1132     Real Time           1236     High                1250     High
80           1126     Above Normal        1133     Real Time           1237     High                1251     High
84           1228     Above Normal        1134     Real Time           1238     High                1252     High
88           1129     Above Normal        1135     Real Time           1239     High                1253     High
92           1130     Above Normal        1136     Real Time           1240     High                1253     High
96           1131     Above Normal        1137     Real Time           1241     High                1255     High
100          1132     Above Normal        1138     Real Time           1242     High                1256     High
104          1133     Above Normal        1139     Real Time           1243     High                1257     High
108          1135     Above Normal        1140     Real Time           1244     High                1258     High
112          1136     Above Normal        1141     Real Time           1245     High                1259     High
116          1139     Above Normal        1142     Real Time           1248     High                1260     High
120          1141     Above Normal        1143     Real Time           1249     High                1261     High
124          1142     Above Normal        1144     Real Time           1250     High                1262     High
128          1144     Above Normal        1145     Real Time           1252     High                1263     High


Figure (4.15): Context Switching of MP-ST for (Process 1) using Core i7


Figure (4.16): Context Switching of MP-ST for (Process 2) using Core i7


Figure (4.17): Context Switching of MP-ST for (Process 3) using Core i7


Figure (4.18): Context Switching of MP-ST for (Process 4) using Core i7

Table (4.5): Effect of Changing Priority on Elapsed Running Time of MP-SMT using Core i7

           Elapsed Running Time
Process    P1 & P2 Normal Priority (sec)    P1 Normal, P2 High Priority (sec)
P1         890.241                          957.933
P2         980.441                          616.593


Figure (4.19): Elapsed Running Time of MP-SMT (Only 2 Processes) using Core i7


Figure (4.20): Elapsed Running Time of MP-SMT using Core i7


Table (4.6): Effect of Changing Selected CPU on Elapsed Running Time for P2 of MP-SMT using Core i7

Recording     Elapsed Running Time (sec)
Time (sec)    All Threads of P2 on CPU7    Each Thread of P2 on an Individual CPU
4             35.030                       26.781
8             39.062                       30.812
12            43.093                       34.843
16            47.124                       38.875
20            51.155                       42.906
24            55.187                       46.937
28            59.218                       50.969
32            63.249                       55.000
36            67.281                       59.031
40            71.312                       63.062
44            75.343                       67.094
48            79.374                       71.125
52            83.406                       75.156
56            87.437                       79.188
60            91.468                       83.219
64            95.500                       87.250
68            99.531                       91.281
72            103.562                      95.313
76            107.593                      99.344
80            111.625                      103.375


Figure (4.21): Elapsed Running Time for All Threads of P2 on CPU7 of MP-SMT using Core i7

Table (4.7): Effect of Changing the Priority on Elapsed Running Time for Threads of P2 of MP-SMT using Core i7

          Elapsed Running Time of P2 (sec)
Thread    All Threads Normal Priority    Only T3 High Priority    T2 Low and T3 Real Time Priority
T1        125.125                        125.704                  125.672
T2        170.510                        170.110                  273.267
T3        622.130                        598.115                  589.145


Figure (4.22): Elapsed Running Time for Threads of P2 of MP-SMT, Thread 3 High Priority, using Core i7


Figure (4.23): Elapsed Running Time for Threads of P2 of MP-SMT, Thread 2 Low Priority and Thread 3 Real Time Priority, using Core i7

Table (4.8): Effect of Increasing the No. of Participated CPUs on CPU-Usage of SP-MT using Core i5

No. of CPUs    CPU Usage (%)
1              25
2              23.84
3              68.26
4              97.28


Figure (4.24): CPU Usage of SP-MT for 4 CPUs using Core i5

Table (4.9): Elapsed CPU Time for P1 and P2 of MP-SMT using Core i5

Recording    P1 Running on CPU0                     P2 Running on All Other CPUs
Time (sec)   User      Kernel    Elapsed CPU        User      Kernel    Elapsed CPU
             Time      Time      Time               Time      Time      Time
4            33.281    0.031     33.312             35.830    0.046     35.877
8            37.312    0.031     37.343             39.908    0.046     39.955
12           41.328    0.031     41.359             43.987    0.046     44.034
16           45.359    0.031     45.390             48.065    0.046     48.112
20           49.390    0.031     49.421             52.175    0.046     52.222
24           53.453    0.031     53.484             56.253    0.046     56.300
28           57.484    0.031     57.515             60.331    0.046     60.378
32           61.515    0.031     61.546             64.410    0.046     64.457
36           65.531    0.031     65.562             68.472    0.046     68.519
40           69.546    0.031     69.578             72.535    0.046     72.582
44           73.562    0.031     73.593             76.598    0.046     76.645
48           77.578    0.031     77.609             80.660    0.046     80.707
52           81.593    0.031     81.625             84.723    0.046     84.770
56           85.625    0.031     85.656             88.801    0.046     88.848
60           89.640    0.031     89.671             92.864    0.046     92.911
64           93.656    0.031     93.687             96.656    0.046     96.974
68           97.671    0.031     97.703             100.990   0.046     101.037
72           101.687   0.031     101.718            105.052   0.046     105.099
76           105.703   0.031     105.734            109.115   0.046     109.162
80           109.718   0.031     109.750            113.178   0.046     113.225


Figure (4.25): Elapsed CPU Time of P1 (Single Thread on CPU0) of MP-SMT using Core i5


Figure (4.26): Elapsed CPU Time of P2 (3 Threads on CPUs 1, 2 and 3) of MP-SMT using Core i5


Table (4.10): Elapsed CPU Time for Threads of MP-SMT using Core 2 Duo

Recording    P1 T on CPU0    P2 T1 on CPU1    P2 T2 on CPU1    P2 T3 on CPU1
Time (sec)   User Time       User Time        User Time        User Time
5            4.446           2.839            2.870            1.310
10           9.437           5.350            5.366            2.750
15           14.430          7.846            7.909
20           19.437          10.374           10.389
25           24.414          12.870           12.870
30           29.359          15.334           15.350
35           34.335          17.799           17.940
40           39.374          20.326           20.467
45           44.366          22.854           22.963
50           49.343          25.365           25.459
55           54.350          27.877           27.939
60           59.358          30.357           30.420
65           64.366          32.869           32.916
70           69.326          35.381           35.412
75           74.318          37.814           37.955
80           79.310          40.295           40.451


Figure (4.27): User Time for P1 with 1 Thread of MP-SMT on CPU0 using Core 2 Duo


Figure (4.28): User Time for P2 with 3 Threads of MP-SMT on CPU1 using Core 2 Duo

4.3.4 Discussion of Implementation and Monitoring Results

Table (4.1) and the related Figures (4.10 and 4.11) illustrate that the PMS has the ability to determine the burst time of each process (MP-ST, as an example, with 4 processes). This time represents the exact CPU time consumed by the process/thread, and it consists of two parts: the first is called the Kernel-Time and the second the User-Time. Because these programs are user programs, the User-Time is much larger than the Kernel-Time; the Kernel-Time is approximately equal to zero. Another test was adopted for the PMS, related to its ability to determine the Elapsed User Time for 11 threads (belonging to 2 processes) of MP-MT. These results are illustrated in Table (4.2) and the related Figure (4.12).


The system is able to determine both the User-Time and the Kernel-Time during a certain period (24 seconds) for MP-SMT. The adopted program consists of 2 processes (P1 with a single thread and P2 with 3 threads). With this feature of the PMS, it is easy for the programmer to monitor the behavior of his program during execution. Table (4.3) and the related Figures (4.13 and 4.14) represent these results.

The relation between process priority and context switching of MP-ST is illustrated in Table (4.4) and Figures (4.15, 4.16, 4.17 and 4.18) for the four processes, each having a single thread. Raising the priority of each process to an upper level decreased its number of context switches. The priorities were changed for P1, P2, P3 and P4 at seconds (60, 64, 68 and 72) respectively.

Changing a process priority also affects its execution time. When executing P1 and P2 with Normal priority, the execution times are (890.241 and 980.441) seconds respectively. After changing the priority of P2 to the High level, these times become (957.933 and 616.593) seconds for (P1 and P2) respectively. Table (4.5) and the related Figures (4.19 and 4.20) illustrate these results.

Table (4.6) and the related Figure (4.21) represent the results of the effect of changing the selected CPU on the Elapsed Running Time for P2 (which has 3 threads). It is clear that this time decreases when the threads are distributed among the CPUs to be executed individually, instead of forcing all of them onto a single CPU.

Table (4.7) represents the effect of changing the priority on the Elapsed Running Time for the threads of P2. P2 has three threads and was executed three times. Firstly, all threads had Normal priority. Secondly, the priority of T3 was changed to the High level, which reduced its execution time from (622.130 to 598.115) seconds while the other two threads did not change. Finally, the priority of T2 was reduced to the Low level and the priority of T3 was changed to the Real Time level; this last run increased the execution time of T2 from (170.510 to 273.267) seconds, while that of T3 decreased from (622.130 to 589.145) seconds. Figures (4.22 and 4.23) show the plots of these results.

Table (4.8) represents the results of the effect of increasing the number of participated CPUs on the CPU-Usage of SP-MT (14 threads) using the Core i5 processor, plotted in Figure (4.24). The overall CPU usage of the system equals 100%. So, for a system that has 4 cores, using one CPU should give a CPU usage of 25% of the 4 cores; for 2 CPUs it would be 50%, for 3 cores 75%, and for all 4 cores 100%. The obtained results are near to these values.

Table (4.9) represents the results of the Elapsed CPU Time for P1 (single thread) on CPU0 and P2 (3 threads) of MP-SMT using the Core i5 (i.e. 4 CPUs); the threads of P2 are executed on (CPU1, CPU2 and CPU3). Figures (4.25 and 4.26) show the plots of these results with different selections of threads and processors.

Table (4.10) represents the results of the Elapsed CPU Time for P1 (single thread) on CPU0 and P2 (3 threads) of MP-SMT using the Core 2 Duo (i.e. 2 CPUs); all of the threads of P2 are executed on CPU1. Figures (4.27 and 4.28) show the plots of these results.

4.4 Evaluation of the Obtained Results

In order to check the efficiency of the proposed monitoring system (PMS), some tests related to forcing the processes onto the CPUs have been applied in parallel with observing the status reported by the Windows operating system (i.e. the Task Manager tool). Figure (4.29) represents the status of the CPU-Usage History of the Task Manager for a system with a Core i7 (i.e. 8 CPUs) before executing the under-test program (UTP). Figure (4.30) represents the status of the CPU-Usage History of the Task Manager after starting execution of one of the UTPs (MP-ST: 4 processes and 4 threads). Here, the processes have by default been allocated to CPU1, CPU1, CPU1 and CPU0 respectively, before forcing the processes among the CPUs.

Figure (4.31) represents the PMS and the Task Manager after forcing the processes to CPUs (3, 3, 2 and 0). It is clear that the effects of the forcing appear in the Task Manager.

Figure (4.32) represents the PMS and the Task Manager after forcing the processes to CPUs (3, 3, 2 and 4); the effects of the forcing appear in the Task Manager.

Figure (4.33) represents the PMS and the Task Manager after forcing the processes to CPUs (5, 5, 2 and 4); again the effects appear in the Task Manager.

Figure (4.34) represents the PMS and the Task Manager after forcing the processes to CPUs (5, 5, 2 and 4), but with P4 killed and therefore stopped. Notice that the usage of CPU4 in the Task Manager starts decreasing.

Figure (4.35) represents the PMS and the Task Manager after forcing the processes to CPUs (5, 5 and 2), but with P3 killed and stopped; P4 is not running because it was killed at the previous step. Notice that the usage of CPU2 in the Task Manager starts decreasing, and CPU4 has approximately no usage.

Figure (4.36) represents the PMS and the Task Manager after forcing the processes to CPUs (5 and 5), but with both P1 and P2 killed and stopped; P3 and P4 are not running because they were killed at the previous steps. Notice that the usage of CPU5 in the Task Manager starts decreasing, and both CPU2 and CPU4 have approximately no usage.

Figure (4.29): Before CPU-Changing by PMS and the Status of CPU-Usage of the System's Task Manager

Figure (4.30): Effect of CPU-Changing by PMS on the Status of CPU-Usage of the System's Task Manager (P1, P2 and P3 allocated to CPU1 and P4 to CPU0)

Figure (4.31): Effect of CPU-Changing by PMS on the Status of CPU-Usage of the System's Task Manager (P1 and P2 to CPU3, P3 to CPU2, and P4 to CPU0)

Figure (4.32): Effect of CPU-Changing by PMS on the Status of CPU-Usage of the System's Task Manager (P1 and P2 to CPU3, P3 to CPU2, and P4 to CPU4)

Figure (4.33): Effect of CPU-Changing by PMS on the Status of CPU-Usage of the System's Task Manager (P1 and P2 to CPU5, P3 to CPU2, and P4 to CPU4)

Figure (4.34): Effect of CPU-Changing by PMS on the Status of CPU-Usage of the System's Task Manager (P1 and P2 to CPU5, P3 to CPU2, and P4 to CPU4 then Killed)

Figure (4.35): Effect of CPU-Changing by PMS on the Status of CPU-Usage of the System's Task Manager (P1 and P2 to CPU5, P3 to CPU2 then Killed, and P4 Killed)

Figure (4.36): Effect of CPU-Changing by PMS on the Status of CPU-Usage of the System's Task Manager (P1 and P2 to CPU5 then Killed, P3 and P4 Killed)

4.5 Multi-Changing Controlling Options (MCCO)

One of the efficient capabilities of the proposed monitoring system is the Multi-Changing of Controlling Options (MCCO). This section demonstrates the abilities of MCCO through different case studies.

Table (4.11): Effects of MCCO using different case studies

Case      Process    Thread    Before Changing                       After Changing
Study     ID         ID        Priority       CPU    Status          Priority       CPU    Status
SP/ST     8776       4864      High           C6     Paused          Real Time      C3     Running
SP/MT     8572       5900      Normal         C1     Running         High           C1     Running
SP/MT     8572       9092      Normal         C1     Running         Normal         C7     Running
SP/MT     8572       4560      High           C0     Running         High           C0     Paused
SP/MT     8572       4820      Normal         C6     Running         Normal         C6     Killed
MP/ST     8992       9632      High           C1     Running         Above Normal   C1     Running
MP/ST     8924       1568      Normal         C0     Running         High           C3     Running
MP/ST     9408       9088      Real Time      C7     Running         Real Time      C7     Killed
MP/ST     6648       5104      Normal         C5     Running         Normal         C5     Paused
MP/MT     7036       4580      High           C0     Running         High           C0     Paused
MP/MT     7036       8136      Real Time      C7     Running         Real Time      C7     Killed
MP/MT     7036       9688      Above Normal   C5     Running         Above Normal   C0     Running
MP/MT     7496       8664      Normal         C7     Running         Normal         C4     Running
MP/MT     7496       7660      Normal         C3     Running         High           C3     Running
MP/SMT    7484       980       Normal         C5     Running         High           C5     Running
MP/SMT    5924       7368      Real Time      C1     Running         Real Time      C2     Running
MP/SMT    5924       4736      High           C3     Running         High           C3     Paused
MP/SMT    5924       5916      Normal         C7     Running         Normal         C7     Killed

Table (4.11) represents MCCO using different case studies. Three controlling options have been adopted to demonstrate the ability of this monitoring system: the priority level, the CPU (i.e. core number) and the execution status of the under-test threads belonging to the related processes. For example, looking at the MP-MT case study within this table, it can be observed that Process (7036) has three threads (4580, 8136 and 9688), while Process (7496) has two threads (8664 and 7660). For these threads, triple changes have been applied to the allocated CPUs, the priority levels and the execution status. The following changes have been applied:

1. The execution status of Thread (4580) has been changed from Running to Paused.
2. The execution status of Thread (8136) has been changed from Running to Killed.
3. The CPU core of Thread (9688) has been changed from C5 to C0.
4. The CPU core of Thread (8664) has been changed from C7 to C4.
5. The priority level of Thread (7660) has been changed from Normal to High.

In the same manner, MCCO has been applied to the remaining cases (SP-ST, SP-MT and MP-SMT), so there is no need to repeat these details for them.

4.6 Comparison between the PMS and the Previous Works

In order to compare the algorithms and results adopted in this work with those of the previous works, several works are addressed. In comparison with the system of Sewon Moon and Byeong-Mo Chang [31] 2006, it can be noticed that:

1. That work extended an existing monitoring system to monitor multiple threads instead of a single thread. In contrast, the PMS of this thesis:
   o Has been proposed from scratch and was not used previously.
   o Has the abilities of monitoring, controlling and tracking.
   o Treats processes and threads at the same time.
2. That work depended on Java programs, while the PMS of this thesis is built in the C++ programming language, which deals directly with the operating system and is faster.
3. Finally, that work implemented the system only on Windows XP and only on a Pentium 4 processor, while the PMS of this thesis can be implemented on any version of the Windows OS and for any processor type.

In comparison with the system of Ban B. Fatohi [2] 2011, the following important points are realized:

1. That work depended on the Qt tool for programming the application software, which is an additional layer on top of the C++ language and limits the utilization of the C++ language features. In contrast, the PMS of this thesis deals directly with the C++ language, which:
   o Provides more accuracy.
   o Reduces the OS overhead.
   o Makes all C++ features available, whereas not all privileges are provided by the Qt tool.
2. The previous work avoided using multiple controlling features, while the PMS of this thesis provides the full controlling options for the under-test program (UTP).
3. That work implemented the system only on Windows XP, while the PMS of this thesis can be implemented on any version of the Windows OS, and also on the Linux OS.

In comparison with the system of Hyun-Ji Kim, Byoung-Kwi Lee, Ok-Kyoon Ha and Yong-Kee Jun [13] 2014, the following important points can be observed:

1. That work presented a monitoring tool for concurrent threads and their accesses to shared memory locations during an execution of the program, while the PMS of this thesis presents a monitoring and full controlling system that treats threads as well as processes.
2. Their implementation and experimentation were carried out on a system with two Intel Xeon CPUs under the Linux operating system (kernel 2.6), while the PMS of this thesis can be implemented for any processor type and for both the Linux and Windows OSs.



Chapter Five
Conclusions and Suggestions for Future Works

5.1 Conclusions

The following are the important points concluded from the proposed system:

1. One of the most important conclusions of this thesis is the production of a professional, integrated OS performance-measurement system that provides monitoring, controlling and tracking stages for all processes and threads of under-test programs of the types (SP-ST, SP-MT, MP-ST, MP-MT and MP-SMT), applied efficiently. The integration covers all controlling operations related to pausing-resuming-killing, CPU changing, priority changing and real-time recording for all processes and threads of the under-test programs.

2. The second most important conclusion is that the performance measurement of the OS has been merged with another very important computation technique, the shared-memory parallel processing approach. Dealing with the field of parallel processing is itself a challenge, so merging this technique with full OS performance measurement in one system is considered a contribution of this thesis.

3. Some of the performance measurements have not been produced by any previous work, namely the (Kernel and User) CPU execution times and the context switching. This is another contribution to this field.


4. All existing processes (not only those of the under-test programs) can be browsed and monitored in the same style as the Task Manager tool of the Windows OS.

5. The problem of using just one type and one version of OS has been overcome. This system can be applied successfully to any version of the Windows OSs, and the important features of this system can also be used with the Linux OS. This is a new step that has not been applied in such works before.

6. Undergraduate and postgraduate students can benefit from this system during their studies, especially those working in the operating-systems field, by applying it as a real application that treats, directly and in real time, the processes, threads and resources which represent the heart of the OS. This is a very important point, because there is no such integrated system dealing with the OS and parallel-processing fields that can be used in university computer labs for teaching students.

5.2 Suggestions for Future Work

1. It is important to improve this system to monitor and measure the real changes that occur to processes when they are transferred from logical memory to physical memory and vice versa. The monitoring must cover the instant changes that happen to the Process Control Block (PCB), and how and when the related resources are allocated, deallocated and reallocated during the transfer.

2. In another direction, this system can be improved to measure (i.e. compute) the time spent by processes and threads in each state of their life cycles. This direction can be applied when treating the two types of processes (preemptive and non-preemptive), to compute the real time of the long-term scheduling, medium-term scheduling and short-term scheduling, and the dispatcher latency time.

3. It is suggested to use distributed-memory and hybrid-memory systems to apply monitoring, controlling and tracking of processes and threads both locally and remotely. The important point here will be distinguishing the processes and threads sent by each computer, and guiding the results, with the computed execution times, from the servers back to the same processes and threads of the same specified clients.


Appendix A
Figures for all the algorithms explained in Chapter Four.


Figure (A-1): Bubble Sort Flowchart


Figure (A-2): Insertion Sort Flowchart


Figure (A-3): Merge Sort Flowchart


Figure (A-4): Selection Sort Flowchart


Figure (A-5): Quick Sort Flowchart


Figure (A-6): Heap Sort Flowchart


Figure (A-7): Bidirectional Sort Flowchart


References

[1] Ananth Grama, Anshul Gupta, George Karypis and Vipin Kumar, (2003) "Introduction to Parallel Computing", Addison-Wesley.
[2] Ban Bihnam Fatohi, (2011) "Modified Approach for Processes and Threads Monitoring and Tracking", MSc Thesis, University of Zakho.
[3] Cameron Hughes and Tracey Hughes, (2001) "Professional Multicore Programming: Design and Implementation for C++ Developers", Wiley Publishing Inc.
[4] Chai, Lei, Qi Gao and Dhabaleswar K. Panda, (2007) "Understanding the Impact of Multi-core Architecture in Cluster Computing: A Case Study with Intel Dual-core System", Seventh IEEE International Symposium on.
[5] Chapman B., Jost G. and Van Der Pas R., (2008) "Using OpenMP: Portable Shared Memory Parallel Programming", The MIT Press.
[6] Coutu J., (2009) "Reduction of Co-Simulation Runtime through Parallel Processing", M.Sc. Thesis, University of Saskatchewan.
[7] Dietz H., (2004) "Linux Parallel Processing HOWTO", V2.0, available: http://aggregate.org/LDP/
[8] Ewrgey Nikolaevich, Stanislav Viktorovich et al., (2010) "Method of Efficient Performance Monitoring for Symmetric Multi-Threading Systems", U.S. Patent US 7,836,447 B2.
[9] Gregory M. Wright, (2011) "A Single-chip Multiprocessor Architecture with Hardware Thread Support", Ph.D. Thesis, University of Manchester.
[10] Hiromasa Shimada, (2010) "Optimistic Synchronization for External Monitoring Service", MSc Thesis, Waseda University.
[11] Holenderski, Mike, Reinder J. Bril and Johan J. Lukkien, (2013) "Grasp: Visualizing the Behavior of Hierarchical Multiprocessor Real-time Systems", Journal of Systems Architecture 59.6: 307-314.


[12] Huisman J. A., (2010) "High-speed Parallel Processing on CUDA-enabled Graphics Processing Units", M.Sc. Thesis, Delft University of Technology.
[13] Hyun-Ji Kim, Byoung-Kwi Lee, Ok-Kyoon Ha and Yong-Kee Jun, (2014) "Dynamic Monitoring Tool based on Vector Clocks for Multithread Programs", Advanced Science and Technology Letters, Vol. 76, pp. 45-49.
[14] Rakhee Chhibber and Dr. R.B. Garg, (2014) "Multicore Processor, Parallelism and Their Performance Analysis", International Journal of Advanced Research in Computer Science & Technology (IJARCST).
[15] Jaakko Kotimäki, (2010) "Measuring System Activity on Multi-core and Multiprocessor Platforms", MSc Thesis, Aalto University.
[16] Jari Porras, (2004) "Tuning Performance of Multi-threaded Programs", MSc Thesis, Lappeenranta University of Technology, Lappeenranta.
[17] Kamboh A. M. and Das R., (2005) "Parallel Processing to Enhance Performance of ATPGs", Project Report for EECS-570, University of Michigan.
[18] Kevin Haghighat, (2008) "Multithreading: an Operating System Analysis".
[19] Kienzle, Jörg, and Rachid Guerraoui, (2002) "AOP: Does It Make Sense? The Case of Concurrency and Failures", Springer Berlin Heidelberg, 37-61.
[20] Kirk Kelsey, Tongxin Bai, Chen Ding and Chengliang Zhang, (2009) "Fast Track: A Software System for Speculative Program Optimization", 7th Annual IEEE/ACM International Symposium on Code Generation and Optimization.
[21] Larus and M. Parkes, (2002) "Using Cohort Scheduling to Enhance Server Performance", USENIX Annual Technical Conf., pages 103-114.
[22] Löf H., (2006) "Iterative and Adaptive PDE Solvers for Shared Memory Architectures", Ph.D. Dissertation, Uppsala University.
[23] M.R. Pimple and S.R. Sathe, (2011) "Architecture Aware Programming on Multi-Core Systems", International Journal of Advanced Computer Science and Applications (IJACSA), Vol. 2.

[24] Mads Dam, Bart Jacobs, Andreas Lundblad and Frank Piessens, (2009) "Security Monitor Inlining for Multithreaded Java", 23rd European Conference, Genoa, Italy.
[25] Marathe and Jaydeep, (2007) "METRIC: Memory Tracing via Dynamic Binary Rewriting to Identify Cache Inefficiencies", ACM Transactions on Programming Languages and Systems (TOPLAS).
[26] Michelle Goodstein, Evangelos Vlachos et al., (2009) "Parallel LBA: Coherence-based Parallel Monitoring of Multithreaded Applications", Carnegie Mellon University, CMU-CS-09-108.
[27] Paweł Gepner and Michał F. Kowalik, (2006) "Multi-Core Processors: New Way to Achieve High System Performance", International Symposium on Parallel Computing in Electrical Engineering (PARELEC 2006).
[28] Roberta Coelho, Ayla Dantas et al., (2006) "The Application Monitor Aspect Pattern", ACM 978-1-60558-151.
[29] Ross McIlroy, (2010) "Using Program Behaviour to Exploit Heterogeneous Multi-Core Processors", Ph.D. Thesis, University of Glasgow.
[30] Schulz M. and McKee S. A., (2003) "A Framework for Portable Shared Memory Programming", IEEE International Parallel and Distributed Processing Symposium, pp. 1113-1121.
[31] Sewon Moon and Byeong-Mo Chang, (2006) "A Thread Monitoring System for Multithreaded Java Programs", ACM SIGPLAN Notices.
[32] Shaaban, (2011) "Introduction to Parallel Processing", Spring EECC756.
[33] Shamlin and David, (2004) "Threads Unraveled: A Parallel Processing Primer", Proceedings of the Twenty-ninth Annual SAS Institute Inc.


[34] Solomon D. A., (1998) "Inside Windows NT", Microsoft Press.
[35] Subhi R. M. Zebari, (2010) "A New Approach for Process Monitoring", 1st Scientific Research Conference of Technical Education, Duhok.
[36] Sun Microsystems, (2008) "Multithreaded Programming Guide", Sun Microsystems Inc.
[37] Titus R. Burns, (2011) "Analyzing Threads and Processes in Windows", M.Sc. Thesis, Naval Postgraduate School, California.
[38] Vipin Saxena and Manish Shrivastava, (2009) "UML Modeling and Performance Evaluation of Multithreaded Programs on Dual Core Processor", International Journal of Hybrid Information Technology, Vol. 2, No. 3.
[39] William Stallings, (2005) "Operating Systems: Internals and Design Principles", Fifth Edition, Prentice Hall, ISBN 0-13-147954-7.


[Arabic abstract: the original text is irrecoverably garbled by PDF extraction; it parallels the English abstract.]

‫@‬ ‫ة‪ ،‬ا ر م وا‬ ‫ا‬ ‫س أداء ا ! ت وا ط‬ ‫‪ / 0‬ا‪ .‬ب ا ! ‪ ,‬ا & از* ذات ا (اآ ة ا '& آ‬

‫ا‬

‫ءً‬

‫@‬ ‫@‬ ‫@‬ ‫@‬

‫ر‬ ‫آ‬

‫ا مو‬ ‫ا‬ ‫ا‬ ‫لا م‬ ‫آ ‪+‬ء ) ( ' ت ‪ #$ %‬دة‬ ‫‪ ,‬ما '‬

‫) ‪%'-‬‬ ‫آ رزان ــــــ ‪4* 3‬‬ ‫ر‪ 0‬س ‪ ,‬م ا ' )‪،(2009‬‬

‫م‬

‫ا‬

‫ا‬

‫‪.‬‬

‫@‬ ‫@‬ ‫‪ $‬اف‬ ‫@ د‪ 56 7 .‬ر; ‪ 9 6 :‬ز* ري‬ ‫ا ذا ‪,‬‬

‫@‬

‫@‬

‫‪ )0 7‬ا‪6‬ول ‪2016‬‬

‫ر ‪9‬ا‪8‬‬

‫‪1437‬‬

[Kurdish abstract: the original text is irrecoverably garbled by legacy font encoding; it parallels the English abstract.]

[Kurdish title page: the original text is irrecoverably garbled by legacy font encoding; it mirrors the English title page.]
