DRMonitor – A Distributed Resource Monitoring System

Patrício Domingues, ESTG – Leiria, Portugal, [email protected]
Luís Silva, Univ. Coimbra, Portugal, [email protected]
João Gabriel Silva, Univ. Coimbra, Portugal, [email protected]

Abstract

DRMonitor is a system for monitoring the usage of computing resources in networks of heterogeneous (Linux and Windows NT-derived) personal computers. DRMonitor is aimed at serving resource-monitoring applications and at assisting load-balancing policies by providing performance and load data about each machine included in the system. Through a reduced set of primitives, applications can periodically receive updated information about system usage. The paper describes the available primitives, discusses some internal aspects of the monitoring system, and presents the monitoring results of a classroom with 10 personal computers.

Keywords: Distributed monitoring, Resource monitoring, Network of personal computers.

1. Introduction

This paper presents DRMonitor, a distributed system for monitoring the computational resources of networks of personal computers. The system aims to assist applications that depend on workload information by periodically diffusing updated metrics of the monitored nodes. The main goal of DRMonitor is to permit performance and usage monitoring of a set of computers connected by a local area network.

The ability to monitor resource behavior and usage in distributed systems of loosely coupled heterogeneous computers can be a precious help to system managers and planners, letting them detect troubled nodes and assess current resource usage to better plan future needs. Performance and usage monitoring may also come in handy for load-balancing purposes, when distributed computational resources are used for the execution of distributed or parallel applications. In fact, many load-balancing policies rely on up-to-date metrics to properly assess a system's workload and thus detect imbalanced states [1].

The remainder of this paper is organized as follows. Section 2 describes the metrics made available by DRMonitor. Section 3 examines the implementation, while section 4 briefly focuses on the availability of the system. In section 5, a 48-hour monitoring of a laboratory classroom, involving 10 personal computers, is analyzed. Section 6 reviews related work, while section 7 discusses future plans. Finally, section 8 concludes the paper.

2. Description

DRMonitor is built around three main components: a per-organisation coordinator, a per-node agent and a per-application library. The coordinator is primarily responsible for receiving, storing and distributing the current system metric values, as well as managing the state of the monitored computers. An agent daemon running on each node performs information collection, periodically sampling the local workload and forwarding it to DRMonitor's coordinator. In return, together with the coordinator's acknowledgment, the agent receives updated activity values for all monitored nodes. Metrics are made available to applications running on monitored nodes through the library component, which exposes the DRMonitor API.

Available system metrics are grouped in two categories, static and dynamic. Static metrics describe characteristics that remain constant over time and that greatly influence system performance (CPU performance, memory size, etc.). Dynamic metrics are related to computational activity, measuring the usage of the system's main resources (percentage of CPU usage, percentage of used memory, etc.). Dynamic metrics are time dependent, requiring periodic sampling in order to keep an up-to-date view of each individual node.

2.1 Node Identification

Monitored nodes are identified by a persistent 16-bit integer (ID) assigned by the coordinator. When an agent node first connects to the system, it sends a unique identifier to the coordinator (currently, the network interface card's MAC address is used). Checking its repository against the node's identifier, the coordinator retrieves the node's ID, or assigns a new one if this is the node's first ever connection to the system. With this methodology, IDs are centralized at the coordinator and made persistent, remaining constant even if part or the whole of the system goes down. The main motivation for using an integer ID instead of the node's MAC address lies in its compactness (16 bits against 48 bits). A drawback of relying on the MAC address for node ID matching occurs when a faulty NIC needs to be replaced. However, based on our experience at ESTG-Leiria, faulty NICs are rare.

2.2 Metrics

Static metrics comprise the following items:
− IP address: the node's Internet Protocol address. The IP address was chosen instead of the computer name, since the IP address is sufficient for accessing the computer, while a computer name requires a name service (like DNS) for name resolution. Besides, client computers, like the ones targeted by DRMonitor, are seldom registered in a name service. Another advantage of the IP address over the name lies in its compactness. The IP address was not chosen as the node's system-wide identifier because more and more organisations use dynamic IP attribution, whereby a node's IP is assigned through the DHCP protocol [2] and can thus change over time;
− Processor type: identifies the system's CPU architecture (currently, only x86 systems are supported);
− Integer performance: an index evaluating the system's performance on integer computation. Integer performance is assessed through BYTEmark [3], a benchmark that exposes CPU and FPU performance. The native mode of BYTEmark, which is the one used in DRMonitor, relies on well-known algorithms to summarise computer performance with two numerical indexes: one for integer performance, and another exposing floating-point behaviour. Both performance indexes are relative to a Pentium 90 MHz system. Reducing performance metrics to only two indexes is particularly attractive for DRMonitor, since it eases information distribution and simplifies performance comparisons amongst monitored nodes;
− Floating-point performance: also computed through the BYTEmark benchmark;
− Operating system: a numerical code that identifies the computing node's operating system. Currently Linux, Windows NT, Windows 2000 and Windows XP are supported;
− Total main memory: indicates how much main memory (RAM) is installed;
− Total swap: returns the size of the swap space;
− Boot time: a timestamp (Unix epoch time) indicating when the node was booted. To avoid the need for time synchronization of monitored machines, the timestamp refers to the coordinator's clock.

Dynamic metrics comprise the following items:
− Load: on UNIX systems, this metric measures the average number of processes in the ready state over the last minute, that is, processes waiting for CPU. On WIN32, it indicates the instantaneous processor queue length, in units of threads;
− Process count: the number of processes in the system. This metric is further refined into low-priority and high-priority processes. The former value counts sleeping, suspended and niced processes, while the latter counts processes running at regular or high priority;
− User count: counts the users currently logged on to the system. The user metric is split into interactive and idle users. A user is counted as interactive if she has registered input activity (keyboard, mouse) in the last n minutes (by default, n is 3). Conversely, a user inactive for more than n minutes is counted as idle;
− Memory usage: this metric assesses memory usage. For that, two values are returned: percentage of free main memory, and percentage

of used swap. Both percentages are relative to the memory sizes reported by the static metrics;
− Processor usage: processor usage is represented through three percentages: CPU consumed by high-priority processes (that is, high or regular priority), CPU used by low-priority processes, and CPU left idle.

The rather limited set of captured metrics is mainly due to differences between operating systems, which use different representations for accounting metrics. While some metrics such as CPU utilisation, boot time and memory usage are universal, others are system specific. For instance, the process count is more meaningful on Linux, since these systems are process-oriented, while Windows systems are thread-oriented. Regarding metric accessibility, Windows NT-derived systems provide two APIs: the performance data helper [4] (PDH, used in DRMonitor) and the Windows Management Instrumentation (WMI). Under Linux, many of the metrics are captured through the /proc pseudo-filesystem, while others depend on specific system calls.

2.3. Interface

Through the DRMonitor C library, a programmer can easily retrieve the current static and dynamic metrics of monitored nodes. In fact, the programmer just has to call the DRM_GetInfo primitive, which returns static and dynamic metric values. The static and dynamic metrics are returned through separate dynamically allocated arrays of StaticInfo_t and DynamicInfo_t data types, respectively, which are C structures that encapsulate the above-described metrics. An array element is returned for each currently monitored node, so the arrays have variable length. The primitive can be instructed to retrieve only one type of information (static or dynamic). Information can also be retrieved in an incremental way, that is, a DRM_GetInfo call only returns the information that was updated since the program's last access to DRMonitor metrics. Besides information access, two other primitives serve administrative purposes: DRM_Init and DRM_Exit. The former instantiates the library in the calling process and must be called once, prior to the first information access. DRM_Exit terminates the library instantiation, freeing allocated resources.

3. Implementation

3.1. Assumptions

In developing DRMonitor, two basic assumptions were made about the computing environment: the existence of a TCP/IP stack and multithreading support. Since the TCP/IP protocol stack is available for most computing platforms, being the de facto standard for network communication, this assumption is in no way restrictive. The multithreading requirement seems more restrictive but, currently, all major operating systems available for personal computers provide multithreading support, either natively (WIN32, some UNIX) or through a programming library (Linux). The hurdle is that the thread interface is not uniform across the targeted platforms (for instance, pthreads on Linux and native threads on WIN32), requiring the development of an abstraction layer to hide the particularities of thread manipulation. In a future version of DRMonitor we plan to experiment with pthreads-Win32 [5], a pthreads implementation for WIN32 systems.

3.1.1. Coordinator. For each DRMonitor system there is a unique coordinator node running the coordinator daemon (drmd), which controls the whole system state. It receives, stores and distributes the collected DRMonitor information to the participating agents, the only system components with which it communicates. The coordinator normally waits for agent messages carrying collected metrics. Upon message reception, the coordinator replies, sending the current DRMonitor status, which only includes data updated since the last message sent to the communicating agent.

3.1.2. Agent. The agent daemon (drm) is made up of two threads, agent and interface, which cooperate to perform the monitoring and information distribution tasks. The coordinator's name or network address must be given when the agent daemon is started, either by a command-line switch or through a configuration file.

The agent thread manages the local node throughout the three phases of the monitoring protocol: join, monitor and termination. It is the only thread that communicates with the DRMonitor coordinator. In the join phase, the agent thread declares its intention to join the DRMonitor system, sending its identifying key (i.e. its MAC address) to the coordinator. Thereafter, if the local node was authorized to join DRMonitor (receiving its system-wide ID), the agent thread enters the monitoring phase. In this phase, the agent thread continuously repeats the same sequence of operations: it samples the local node's workload, sends the collected information to the coordinator, and then awaits reception of the coordinator's acknowledgment message.

The local node's workload is sampled every few seconds (by default, the sampling period is one second). Periodically, the last samples of each metric type are combined through a weighted average, obtained by way of exponential smoothing [6], which emphasizes newer samples over older ones. The averaged measures are then forwarded to the coordinator node. Thereafter, the agent thread awaits the coordinator's acknowledgment, a message that also includes the metrics of the whole monitored system. The agent thread terminates when the coordinator node signals the monitoring system's shutdown, or when no message is received from the coordinator node over a long period of time.

Created after a successful local agent join phase, the interface thread interacts only with local processes linked to the library component. This thread processes the service requests resulting from DRMonitor API calls issued by local processes. For each initialized library handle, the interface thread holds the data needed to manage the library instance. Specifically, two categories of data are maintained: authentication keys and access timestamps. The authentication keys, which are 32-bit random integers generated at handle initialization, certify the provenance of all messages exchanged between the interface thread and the linked library. Each key certifies one side of the communication. This basic certification prevents interference amongst client applications running on the same host. Interference could occur if an ill-behaved or buggy application tried to use another application's library handle (handles are simple integer indexes, like file descriptors). For each library handle, two timestamps register the absolute time of the last accesses to static and dynamic DRMonitor metrics, respectively.
Whenever an incremental access to DRMonitor monitoring information is performed by way of that library handle, only the static and/or dynamic metric values newer than the respective time mark are returned to the requesting application.

3.1.3. Library. The library component is simply the DRMonitor code that implements the library API, linked into the user's application. In practice, the library component acts as a wrapper layer between the user application and the local agent's interface thread, hiding the latter from the application programmer. In fact, each API call is converted into a service request sent for processing to the local agent's interface thread, which then answers back to the library component. Finally, the library component adapts the returned data (if any), packaging it under the appropriate DRMonitor public interface.

3.2. Communication

DRMonitor's communication needs are fulfilled by two communication subsystems. One subsystem transports the messages exchanged between the system coordinator and each of the monitoring agents, while the other is in charge of the locally exchanged agent-library messages.

Data exchange between agents and the system coordinator is performed through UDP sockets. UDP was selected instead of the connection-oriented TCP protocol due to its lower resource consumption and higher scalability. To overcome UDP's unreliability, a simple layer was built on top of UDP to ensure the reliable delivery of single-datagram messages through a retry-with-acknowledgment scheme. Support for messages bigger than a single datagram was not considered, since this would have significantly increased the complexity of the added layer. In fact, if such messages were needed, TCP would have been the appropriate protocol to select.

Data exchange between the local node agent and applications that access DRMonitor metrics is performed through UNIX datagram sockets on UNIX platforms and UDP sockets on other platforms. As this communication is local, we assume that it is not affected by common datagram problems, especially lost and out-of-order messages.

Figure 1 depicts the architecture of DRMonitor: the coordinator, the per-node agents on nodes 1 through n, and programs A and B, two unrelated programs that access DRMonitor's metrics through the library interface.

Figure 1: Architecture of DRMonitor

4. Availability

Networked personal computers are usually known for their instability, needing frequent reboots, either to resolve a system crash or to complete a software installation. Moreover, a personal computer is usually assigned to an individual who effectively controls the machine. Therefore, computers can be switched on and off in an almost unpredictable manner.

Since DRMonitor is centralized around a coordinator, it is vulnerable to a failure of its coordinator node. Therefore, the coordinator daemon should be run on a robust system. Usually, networks of personal computers have one or more dedicated server machines whose availability is critical to the whole computing environment (for instance, file servers and name servers). These machines are treated with particular care and have long uptimes, and are therefore good candidates for hosting the coordinator daemon.

To reinforce system availability, both the coordinator and agent daemons are run under a software watchdog application named runme. Basically, runme forks off a process that executes the application under its control. When the controlled process stops, the software watchdog launches it again. Besides this basic operation, runme can also be configured to perform a binary update of the controlled application on first execution or each time the application is launched. Another feature is notification, with the possibility of an e-mail message signaling each application launch.

5. Monitoring Experiment

The DRMonitor system was used in a 48-hour monitoring of a laboratory room at ESTG Leiria, from 7 pm of 4th June to 7 pm of 6th June 2002. The laboratory is mainly used for class teaching. When no classes are being taught, computer science students are allowed to use the laboratory's personal computers for their practical assignments and homework. On weekdays the classroom closes from 4 am to 8 pm.

The laboratory is equipped with 10 personal computers, all fitted with Pentium III processors. Eight of the machines are identical, the ninth is a spare, and the last one is a Linux server. All computers other than the Linux server run the professional edition of Windows 2000. Also, except for the Linux server, which is always switched on, the other machines may be switched off. For the experiment, the Linux server was chosen as DRMonitor's coordinator, mainly due to its high availability and the fact that it was the only machine that did not allow logon at the console. The main characteristics of the computers are shown in Table 1. Columns "INT" and "FPU" refer, respectively, to DRMonitor's integer and floating-point performance indexes.

Table 1: Characteristics of monitored computers

Qty  CPU (MHz)  INT    FPU    OS     RAM (MB)  Swap (MB)
1    728        10.67  9.51   Linux  512       125
1    450        5.92   6.72   W2K    256       427
8    1100       15.07  16.96  W2K    256       617

To avoid changes of behavior that could falsify the results, only the system administrators knew about the monitoring. For the purpose of the experiment, we developed a DRMonitor-based application that recorded, each minute, the dynamic metrics of the monitored nodes. Thus, over 48 hours, 2880 samples were collected. Averages of the collected metrics are shown in Table 2 and Table 3. Except for the boot count column, which shows the number of boot cycles, all values are percentages. In Table 2 and Table 3, the Linux server is identified as L, while the Windows 2000 machines are labeled W1 to W9. W1 corresponds to the slowest machine, while W2 through W9 are the 8 identical machines.

Table 2: Average CPU, RAM and swap usage

Machine  CPU Usage  RAM Usage  Swap Usage
L        3.51       68.27      0.00
W1       3.10       79.20      47.43
W2       2.29       61.39      18.79
W3       3.47       65.38      16.90
W4       7.37       73.52      17.27
W5       3.40       47.87      17.12
W6       5.52       67.81      16.03
W7       5.77       63.53      20.29
W8       11.28      60.66      20.67
W9       8.81       63.72      18.75
Average  5.45       65.82      19.15

Table 3: Average uptime, boot count and interactive usage

Machine  Uptime  Boot  Interactive Usage
L        100.00  0     83.24
W1       100.00  0     45.32
W2       99.72   1     49.27
W3       76.84   2     27.95
W4       77.40   1     34.46
W5       80.43   1     40.65
W6       48.78   2     18.70
W7       40.88   1     65.08
W8       34.40   3     80.67
W9       80.99   6     59.29
Average  73.95   1.7   50.46

As expected, and as shown by similar works [7], the average CPU usage was very low, slightly above 5%, confirming that many CPU cycles go unused. However, memory appeared to be a scarcer resource than CPU, as shown by the 65.82% average RAM usage. Nonetheless, in a 256 MB computer, 35% of unused memory still represents almost 90 MB. Average swap usage was 19.15%, which indicates, jointly with the memory usage, that the monitored computers' RAM size is appropriate. The average uptime of the 10 monitored machines was 73.95%. Note that all the downtime resulted from machines being switched off. Except for machine W9, which had 6 reboots, the boot count was quite low, mostly limited to 1 or 2. It is interesting to note that interactive usage of the machines represents only half of the machines' uptime; that is, almost half of the time machines are switched on with no users logged on.

The distributions of the percentages of idle CPU (top line), free RAM (middle line) and used swap (bottom line) over the 48 hours are plotted in Figure 2. During the monitoring, CPU idleness always remained above 55%. In fact, CPU idleness was frequently near 100%, even if some bursts occurred. Surprisingly, some of this burst activity was also detected during the laboratory's closing hours, especially on the second day. We plan to investigate this issue in a future monitoring assessment. Figure 2 also shows that CPU usage is tightly coupled with RAM usage: an increase in CPU usage augments RAM utilization and vice-versa. This is an obvious consequence of the fact that new processes need memory and CPU; the reverse happens when a process terminates. Swap usage remained practically constant, slowly oscillating around 20%. This strengthens our idea that the machines are fitted with enough RAM, at least for the workload observed during the monitoring. Comparing RAM and swap usage, it can be observed that the swap curve roughly follows memory usage, although strongly attenuating high frequencies. This is a natural consequence of how memory systems are organized.

Figure 2: Resource usage

During the experiment we also assessed DRMonitor's intrusiveness. The coordinator daemon, run on the Linux server, had a very low CPU usage, well below 1%. However, on the same machine the agent used, on average, 1.3% of CPU time. We also observed that the agent's CPU usage is coupled to the number of processes in the system (it rises when the process count is higher), and thus suspect that much of the CPU consumption occurs when processes are being counted and classified (high and low priority). According to [11], accessing Linux's pseudo-filesystem is quite costly. Since DRMonitor resorts to the /proc filesystem for process accounting and classification, the observed overhead is probably caused by /proc usage. We plan to address this issue in a future version. Under Windows 2000, the agent's intrusiveness was practically imperceptible.

6. Related Work

Resource performance and usage monitoring has been an active area of research. The Distributed System Monitor (DSMon) [8] is a distributed program that gathers system information, distributing it to all participating processors. The main emphasis of DSMon is on fault tolerance.

Ganglia [9] is a comprehensive monitoring system developed at the University of California, Berkeley. Though restricted to Unix machines (a Win32 version is in beta), Ganglia permits the incorporation of user-defined metrics. Ganglia information is centrally managed, with monitored data diffused through UDP multicast or by way of TCP connections. The Ganglia system is mainly oriented toward clusters.

The Network Weather Service (NWS) [10] is a distributed framework that aims to provide short-term forecasts of the dynamically changing performance characteristics of a distributed set of computing resources. NWS measures the fraction of CPU time available to new processes, TCP connection time, end-to-end TCP network latency, and end-to-end network bandwidth. It periodically monitors and dynamically forecasts the short-term expected performance of various computational resources and networks for the monitored machines. NWS operates through a set of performance sensors ("monitors") from which it gathers readings of instantaneous conditions. Performance forecasts are drawn from a mathematical analysis of the collected metrics. The fact that NWS only supports Unix environments appears to be a major drawback, especially considering that a high percentage of machines run Windows-based operating systems.

Compared to the above-mentioned systems, DRMonitor is lightweight (only a client daemon needs to be installed on monitored machines) and fairly portable (Windows and Linux). Regarding application-level security, DRMonitor can be run in user-level mode, thus requiring no privileged rights. DRMonitor also offers a simple API through which monitored data can be accessed. Also, by way of its two performance evaluation indexes, DRMonitor can be used to implement static load-balancing policies for distributed or parallel applications. The main weaknesses of DRMonitor concern scalability and security. Scalability has not yet been properly tested, while security has not been a primary concern. In trustworthy and controlled environments, DRMonitor can be a useful monitoring system, appropriate for resource-monitoring applications and for assisting load-balancing policies.

7. Future work

One of our short-term goals is to optimize DRMonitor's agent in order to reduce its intrusiveness, possibly by developing an auto-control scheme that spaces out metric sampling when the CPU time consumed by the agent process grows above a defined threshold. Based on DRMonitor, we also plan to develop a monitoring logging tool that will use a database for storing monitoring data, and a web interface for ease of access. The main goal of the tool will be to permit quick analysis of resource usage in networks of personal computers, allowing the detection of performance problems (machines that need to be upgraded) or of opportunities for idle resource harvesting.

8. Conclusions

This paper has presented the DRMonitor monitoring system. DRMonitor is a simple, yet useful, performance monitoring system that can be used for resource planning, or easily integrated into load-balancing methodologies for harvesting idle CPU cycles and unused memory. We also presented a monitoring analysis of 48 consecutive hours of a laboratory classroom equipped with 10 personal computers. Our results showed that, on average, as much as 95% of CPU capacity remains idle. Combined with the 35% of unused RAM, laboratory classrooms appear to be good candidates for CPU cycle harvesting. The attractiveness of laboratory classrooms for idle CPU cycle exploitation is strengthened by the fact that the machines have no real personal "owner", being centrally, and thus more closely, managed.

References

[1] M. H. Willebeek-LeMair and A. P. Reeves, "Strategies for Dynamic Load Balancing," IEEE Transactions on Parallel and Distributed Systems, vol. 4, pp. 979-993, September 1993.
[2] R. Droms, "RFC 2131 - Dynamic Host Configuration Protocol," March 1997.
[3] BYTEmark, http://www.byte.com/bmark/.
[4] M. Pietrek, "Under The Hood," in Microsoft Systems Journal, March 1998.
[5] Pthreads-Win32, http://sources.redhat.com/pthreads-Win32/, 2002.
[6] W. Stallings, "Exponential Smoothing," Dr. Dobb's Journal, vol. 283, pp. 127-130, March 1998.
[7] T. E. Anderson, D. E. Culler, D. A. Patterson, and the NOW team, "A Case for NOW (Networks of Workstations)," IEEE Micro, pp. 54-64, February 1995.
[8] M. Bearden and R. Bianchini, "Efficient and fault-tolerant distributed host monitoring using system-level diagnosis," presented at the IFIP/IEEE International Conference on Distributed Platforms: Client/Server and Beyond, Dresden, Germany, 1996.
[9] Ganglia, http://ganglia.sourceforge.net/.
[10] R. Wolski, N. Spring, and J. Hayes, "The Network Weather Service: A Distributed Resource Performance Forecasting Service for Metacomputing," Journal of Future Generation Computing Systems, vol. 15, pp. 757-768, 1999.
[11] R. Minnich and K. Reid, "Supermon: High performance monitoring for Linux clusters," presented at the 5th Annual Linux Showcase & Conference, Oakland, California, USA, 2001.
