uCloud Lab

Grid Computing

Introduction

The Grid service environment is the main structure of a virtual organization, and a member of a virtual organization uses various scattered computational or information resources as a single virtual computer. For a user to solve a massive problem using the Grid, a mechanism or policy is required that enables access to Grid resources [1-3]. Grid resources are located in disparate places and participate with different intentions and purposes, and Grid applications have disparate operation capacities and specific user requirements; therefore, resource management and task management models are required for both Grid resources and applications [4-6]. This paper develops GridRMF (Grid Resource Management Framework), which provides a Grid service that responds to the variability of Grid resources and the diversity of Grid applications.

GridRMF constructs a 3-hierarchical resource management structure from the point of view of the Grid user’s resource sharing, and in terms of resource management it can be categorized into VMS (Virtual organization Management System) and RMS (Resource Management System). The infrastructure of this framework defines three models: a communication model for maintaining consistency of subservient communication under hierarchical resource management, an information storage model that accounts for redundancy of Grid information, and a remote object reference model for reducing the server’s overhead. VMS is a Grid resource brokering system, which controls access to Grid resources and satisfies the application’s requirements. For VMS, this paper proposes two strategies: an LRM auto-recovery strategy that accounts for the availability of the Grid service, and an optimized virtual organization selection strategy based on the application’s characteristics. RMS is the system that ultimately determines the service’s performance, featuring the existing PDP’s [7] task allocation algorithm, which considers the adaptability of Grid resources [8]. In collecting and integrating Grid resources, this framework provides resource status monitoring and various visualization views so that resource state and performance can contribute to task scheduling. Moreover, GridRMF minimizes coupling with its infrastructure and constructs a standardized API application proxy that requires minimal code modification, so that it remains independent of particular applications. The implemented Grid framework is applied to applications with different requirements and computational levels, showing the validity of the suggested service.

Related works

Grid can realize efficient information management through the selection of a resource management model that logically integrates various resources belonging to different individuals or organizations. Among the structure models for Grid resource management, hierarchical models (Globus [9], Legion [10], Ninf, NetSolve [11]) comply with the mutual relationship between resource providers and requesters, and abstract owner models include mechanisms for granting permission regarding the environment in which real resources should be used and how they are used. This paper reconfigures the passive and active components of the hierarchical model into a 3-hierarchical resource management model in terms of their function and management. Among the components, resources that influence performance are categorized as either static or dynamic resource components according to their variability, and they are used as scheduling information. The domain control agent, an active component, uses a dynamic virtual observer to monitor integrated status information, whereas other frameworks, such as Maui Scheduler, Globus, GRAM, and Legion Host Object, provide status information by publishing it to an information service through direct queuing. Moreover, for the virtual organization that satisfies the application’s computational level and requirements (i.e., the group selected to carry out the actual computation), the framework provides a mechanism that grants the LRM permission to share resources according to log file information.

Globus is the best-known Grid research project that follows the hierarchical model; it connects and manages scattered heterogeneous resources in the infrastructure so that they can be used in Grid tasks, providing the services required by the superstructure. Globus is not an inseparable single system, but a system that provides the various services a grid requires as independent elements. The elements of Globus are similar to what this paper constructs in that they provide a function for integrating systems that differ in both hardware and software while minimizing performance degradation. However, Globus is ineffective from the metadata management point of view, in that it transforms and transmits data at a uniform period regardless of the distinct characteristics of the data, thus impeding overall grid performance. GridRMF, as modeled in this paper, is categorized into VMS and RMS, thus providing a Grid service that considers the relationship between resources. VMS provides the most suitable virtual organization for a user request by hierarchically grouping node structure information. RMS calculates running time by receiving metadata status information through the virtual observer and running the task allocation algorithm that drives the local resource scheduler. In particular, GridRMF adds monitoring technology that observes Grid resource information in real time and presents it in various visualization views, thus supporting easy management of Grid information and response to changes in that information.

Research on tools, methods, and applications for Grid system performance monitoring is ongoing. Each monitor is not independent and can be run on its own Grid system. This framework extracts monitoring information and transforms it into any required form, whether by integrating it or by filtering the relationships within it. It then creates log files of the monitoring information and distributes them to the system administrator, displaying the processed information [12, 13]. Several monitoring systems support such a mechanism: DRMonitor [14], TOPSYS [15], TMP, Supermon, and WatchTower [16]. This paper applies the services suggested by each monitoring system to the Grid service environment and concentrates on enhancing the performance of the monitoring components. Table 1 compares the suggested framework with these monitoring systems.

Grid resource management framework

GridRMF architecture

As shown in Fig. 1, GridRMF categorizes components into four categories according to the resources’ participation purpose, and divides them into two hierarchical management systems according to resource information management. There are four types of components: RR (Resource Requester), which requests resources for running a Grid application; RP (Resource Provider), which provides idle resources; LRM (Local Resource Manager), which assigns actual tasks to RPs and manages them; and GIM (Global Information Manager), which controls and connects overall task requests. The resource information management system is, as stated, divided into two hierarchies: RMS, which manages the physically divided metadata owned by individuals, and VMS, which contains the virtual organization. Figure 1 shows the architecture of GridRMF and its control flow. Table 2 shows the cockpit functions of adaptive resource management; their specifics are explained later.
Table 1

GridRMF infrastructure

GridRMF constructs a common infrastructure for the smooth data flow of Grid resources in three steps.

First, a communication model is required that can respond uniformly to active integration while data, commands, and queries are exchanged between components (systems and resources connected by the Grid). The communication model has two objects: a connectivity object, which defines the connection between resources that follow the hierarchical structure as a session, and a protocol object, which creates and interprets requirements from any commands or queries. The protocol object has a Protocol Data Parser for each component that translates commands or queries into a form that can be read by the corresponding component and interprets the created form; messages created during this process are used by the monitoring component. Table 3 shows some of the message header definitions and remarks.

Second, resources can be classified as either local or wide resources; minimal duplication of resource information and consistent usage of resources are required. As shown in Fig. 2, resource information is created as entries and managed according to their function and use. SuperEntry controls every entry and gives each an ID and a name. Using its ID, a specified entry is applied to connection control between components, scheduling, and monitoring, and is connected with other entries. EntryTable groups similar components into a list and stores it in the form of a hash table. APPEntry stores the independent information of the corresponding applications. Applications are used directly by RR, LRM, and RP; therefore, APPEntry records the information required by each of these components. From RR, the UI name for defining the user interface is recorded. From RP, the name of the proxy interface that acquires common proxy applications and processes them is recorded. From LRM, the scheduler name for acquiring reference data for scheduling is recorded. RREntry records the name of the application being run by RP and the name of the class that is necessary for getting a result remotely. LRMEntry is managed by GIM and records the benchmarking information (TotCpuSpeed, TotCpuNum, NumofRP, etc.) of every RP owned by the corresponding LRM. RPEntry is used by LRM’s monitoring components and records the metadata information, status information, and task level of RP.
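
The entry model described above can be illustrated with a minimal Python sketch. It assumes that SuperEntry only carries an ID and a name and that EntryTable is a hash table keyed by that ID; the field names other than TotCpuSpeed, TotCpuNum, and NumofRP, and the ID counter, are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical ID source; the paper does not say how IDs are assigned.
_next_id = count(1)


@dataclass
class SuperEntry:
    """Base entry: every entry receives an ID and a name."""
    name: str
    entry_id: int = field(default_factory=lambda: next(_next_id))


@dataclass
class RPEntry(SuperEntry):
    """Metadata, status, and task level of one RP (field names assumed)."""
    cpu_speed: float = 0.0
    status: str = "idle"
    task_level: int = 0


@dataclass
class LRMEntry(SuperEntry):
    """Aggregated benchmarking information of the RPs owned by one LRM."""
    tot_cpu_speed: float = 0.0
    tot_cpu_num: int = 0
    num_of_rp: int = 0


class EntryTable:
    """Stores entries of one kind in a hash table keyed by entry ID."""

    def __init__(self):
        self._by_id = {}

    def add(self, entry):
        # Refuse duplicates so that resource information is stored only once.
        if entry.entry_id in self._by_id:
            raise KeyError(f"duplicate entry id {entry.entry_id}")
        self._by_id[entry.entry_id] = entry
        return entry.entry_id

    def get(self, entry_id):
        return self._by_id[entry_id]

    def __len__(self):
        return len(self._by_id)
```

Looking an entry up by ID, rather than copying it, is one way to keep a single record per resource while scheduling and monitoring both refer to it.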

Third, GridRMF is constructed so that RP sends computation results directly to RR through a Remote Object Reference, which reduces the overhead on particular resources that compose the framework. The Remote Object Reference mechanism functions as follows:

LRM creates a RefBinder that holds the RREntry list, converts it into RRRefBroker, a remote interface, and searches for RR and RP by binding the created RRRefBroker. RR receives the search result through the TaskReceiver class, sends the result to LRM through the RemoteTaskResult interface, and registers the remote result process class with the RefBinder class created by LRM. RP receives the corresponding class from RRRefBroker and sends the task result to RR. This mechanism reduces communication time, because RP sends the result directly to RR rather than through LRM.
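
The idea of delivering results directly from RP to RR can be sketched in a single process as follows. This is only an illustration under simplifying assumptions: remote references are modeled as plain Python callables, and the class and method names other than RefBinder are hypothetical rather than the framework's actual interfaces.

```python
class RefBinder:
    """Registry kept by the LRM: maps an RR identifier to its result handler."""

    def __init__(self):
        self._handlers = {}

    def register(self, rr_id, handler):
        # The RR registers the callable that processes its results.
        self._handlers[rr_id] = handler

    def lookup(self, rr_id):
        return self._handlers[rr_id]


class ResourceRequester:
    """Collects task results delivered to it."""

    def __init__(self, rr_id):
        self.rr_id = rr_id
        self.results = []

    def receive_result(self, task_id, value):
        self.results.append((task_id, value))


class ResourceProvider:
    """Runs a task and hands the result straight to the requester."""

    def run(self, binder, rr_id, task_id, work):
        deliver = binder.lookup(rr_id)  # reference obtained from the LRM's binder
        deliver(task_id, work())        # the LRM is not on the result path


binder = RefBinder()
rr = ResourceRequester("rr-1")
binder.register(rr.rr_id, rr.receive_result)
ResourceProvider().run(binder, "rr-1", task_id=7, work=lambda: 6 * 7)
```

The LRM is involved only in the lookup; the result value itself never passes through it, which is the source of the reduced communication time described above.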
Table 2
Figure 1
Table 3
Figure 2

Virtual organization management system of GridRMF

The Grid gathers resources and users across scattered domains and creates one virtual organization. VMS is a brokering system composed of GIM and LRM (shown in Fig. 1) that enables users to use remote resources. VMS thus enhances the performance of the framework through efficient resource brokerage.

Resource broker

Resource brokering connects grid users with resources and controls the connection; it can be categorized into either a connection brokering rule or a job brokering rule.

Connection brokering rule has three major functions: user access control, user authorization, and task submission and monitoring.

RREntry is created automatically upon RR’s connection request, and includes additional information such as image files or data files according to the specification of the application installed in RR. GIM searches for an LRM using the RREntry information and the name of the application. GIM invokes the job brokering service when a suitable LRM is found, and registers RR on the RR waiting list when no such LRM exists. RP is disconnected after sending a DLL file for obtaining the LRM list and system hardware property information through GIM. LRM creates an LRMEntry upon outside connection requests, and an LRM virtual observer is created automatically in order to monitor the LRM itself. GIM responds to a fault originating from an unexpected or intentional system error by saving LRM’s updated information and status information and tracing a log file. The job brokering rule follows the four factors described in Fig. 3 when selecting a virtual organization.

  • Possibility of application execution: One application is assumed to be executed by one virtual organization, and each LRM can execute applications independently. GIM provides a list of applications that have already been executed and that are to be executed, and LRM selects the applications it can execute. This first-stage selection depends on LRM, which knows the capability and size of its virtual organization.
  • Idle State: Through the LRM virtual observer, GIM senses the wait state of the LRM that is going to execute the requested task and then decides whether execution is possible.
  • Application Power: The operation result of a grid application is recorded as a log file. The attributes of the log file are the application name, execution time, LRM information, number of LRMs and RPs, total static CPU speed, etc.
  • LRM Performance Index: The performance index is obtained by analyzing the application log file. The selection picks the value closest to the expected value in proportion to the average CPU speed and the number of nodes for executing the requested application.
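
The four factors above can be sketched as a small selection routine. The sketch assumes a concrete form for the performance index (average CPU speed multiplied by the number of RPs) and an expected value supplied by the caller; the paper derives both from application log files and does not give formulas, so these and all names here are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class LRMInfo:
    name: str
    apps: frozenset       # applications this LRM reports it can execute
    idle: bool            # wait state sensed through the LRM virtual observer
    avg_cpu_speed: float  # taken from log/benchmark information
    num_rp: int


def performance_index(lrm):
    # Assumed form of the index: proportional to CPU speed and node count.
    return lrm.avg_cpu_speed * lrm.num_rp


def select_lrm(lrms, app, expected_index):
    """Return the idle LRM able to run `app` whose index is closest to the
    expected value, or None if no LRM qualifies."""
    candidates = [l for l in lrms if app in l.apps and l.idle]
    if not candidates:
        return None  # the caller would place the RR on the waiting list
    return min(candidates,
               key=lambda l: abs(performance_index(l) - expected_index))


lrms = [
    LRMInfo("LRM_A", frozenset({"fractal"}), True, 2.0, 4),
    LRMInfo("LRM_B", frozenset({"fractal"}), False, 3.0, 4),
    LRMInfo("LRM_C", frozenset({"fractal"}), True, 3.0, 2),
]
chosen = select_lrm(lrms, "fractal", expected_index=6.5)
```

Choosing the closest index rather than the largest reflects the text's wording: a small application does not need to occupy the most powerful virtual organization.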

Strategy of LRM state control and auto-recovery

LRM is a set of the framework’s resources and a communication broker with frequent message transmissions. VMS creates a virtual observer for each LRM and separates the operation function from the control function. The LRM virtual observer schedules the applications delegated by GIM according to its individual policy. It also monitors the status of the LRM that manages the RPs within the virtual organization.

For LRM, an Auto-Recovery strategy is proposed in order to guarantee the availability of the service by monitoring any malfunction within an application in real time and receiving a suitable replacement resource from another LRM. RR must maintain continuous usage of resources until it is manually disconnected from the grid system. RR should also be able to use resources connected to LRM despite changes in the application specification. GIM detects faults with the LRM sensor and the LRM virtual observer, which records information. When a fault of a certain LRM is detected, GIM searches for another unoccupied LRM. RR sends a remote reference object, an address referenced to the new LRM, and its task table, thus automatically initiating computation by the new LRM. If the search for a new LRM fails, RR goes into stand-by mode and is placed on a waiting list. Because of RR’s task table and remote object reference, computation is not replicated even when an LRM fault occurs. Figure 4 demonstrates an LRM_0 fault in which RR finishes the rest of the requested task with a new LRM_n.
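
A minimal sketch of the recovery behavior, assuming the task table is a mapping from task ID to result (None while pending) and that a fault is simulated by an exception after a given number of completed tasks. All function and parameter names are hypothetical.

```python
class LRMFault(Exception):
    """Raised to simulate the failure of an LRM."""


def run_tasks(task_table, lrm_name, compute, fail_after=None):
    """Run the pending tasks on one LRM.

    If fail_after is given, the LRM fails once it has completed that many
    tasks and is asked for another one.
    """
    done_here = 0
    for task_id in list(task_table):
        if task_table[task_id] is not None:
            continue  # already computed on an earlier LRM, never recomputed
        if fail_after is not None and done_here == fail_after:
            raise LRMFault(lrm_name)
        task_table[task_id] = compute(task_id)
        done_here += 1


def execute_with_recovery(task_table, lrms, compute, faults):
    """Try unoccupied LRMs in order; `faults` maps an LRM name to fail_after.

    Returns the list of LRMs used, or None if every LRM failed (the RR
    would then wait on the waiting list).
    """
    used = []
    for lrm in lrms:
        used.append(lrm)
        try:
            run_tasks(task_table, lrm, compute, faults.get(lrm))
            return used
        except LRMFault:
            continue  # GIM would now search for another unoccupied LRM
    return None


calls = []


def square(task_id):
    calls.append(task_id)
    return task_id * task_id


table = {i: None for i in range(5)}
used = execute_with_recovery(table, ["LRM_0", "LRM_n"], square, {"LRM_0": 2})
```

Because the task table records which results already exist, the replacement LRM resumes with the first pending task, so each task is computed exactly once.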
Figure 3
Figure 4
Figure 5

Monitoring visualization of VMS

VMS provides monitoring information to the administrator and system constructor in various visualization views. VMS visualizes LRM and RP. The visualization component includes LRMEntry, which records the status tracked by the LRM virtual observer, and the renewed information of RREntry. As shown in Fig. 5, VMS monitoring visualization provides two types of view. The LRM/RR Join View shows the relationship between RR and LRM as a table component, and the specific information of a selected row is displayed as a text component. Figure 5b shows LRMs and a structure of 7 RPs, and the area enclosed by dotted lines shows that a single RP is running two operations (LRM#1, LRM#2) simultaneously.

Resource management system of GridRMF

RMS proposes a task allocation mechanism that uses monitoring information to enhance the framework’s performance.

Task allocation algorithms

The task allocation algorithms of RMS can be categorized as static, dynamic, or adaptive according to resource management.

UTA is an algorithm that allocates tasks uniformly regardless of RP’s performance. It does not consider RP’s status, and reallocation does not take place. CPTA reallocates tasks according to the result of Linpack benchmarking of RP performance. It shows a faster running time than UTA; however, RP’s performance is treated as static while the task is running. SPTA performs RP performance-based reallocation; it reallocates a task that has been allocated to a low-performance RP to an RP that has already completed a task, thus exhibiting fast processing. SPTA, however, makes its decision during reallocation based on the initial result of RP performance benchmarking; therefore, it cannot respond to varying RP performance or status. The performance evaluation shown in Fig. 6 compares the static task allocation algorithms.
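
The difference between a uniform split and a benchmark-based split can be shown with a short sketch. The first function follows the uniform behavior described for UTA; the second assumes that benchmark-based allocation divides tasks in proportion to a static benchmark score, which is an interpretation, since the paper does not state the exact rule. Function names and the example scores are hypothetical.

```python
def allocate_uniform(num_tasks, num_rps):
    """Split tasks as evenly as possible, ignoring RP performance."""
    base, extra = divmod(num_tasks, num_rps)
    return [base + (1 if i < extra else 0) for i in range(num_rps)]


def allocate_proportional(num_tasks, scores):
    """Split tasks in proportion to static benchmark scores (assumed rule)."""
    total = sum(scores)
    shares = [num_tasks * s / total for s in scores]
    counts = [int(share) for share in shares]
    # Hand out the tasks lost to rounding, largest fractional part first.
    leftover = num_tasks - sum(counts)
    order = sorted(range(len(scores)),
                   key=lambda i: shares[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


uniform = allocate_uniform(10, 4)
weighted = allocate_proportional(12, [1.0, 2.0, 3.0])
```

Both functions decide once, before execution starts, which is why neither can react to an RP whose performance changes while the task is running.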

While processing a task, DPTA takes into consideration the performance fluctuation of RPs as well as their participation and secession; it reallocates tasks based on CPU speed and job history. Figure 7 shows the capacity results for the cases where a certain RP cannot execute tasks due to a system fault and where a new RP participates in the task processing.

APTA determines RP’s performance not from a simple physical figure but from real-time monitoring information. Since the effect of CPU usage by users and the internal system on total processing time can vary, the standard value that measures RP’s performance is adjusted by a Proportional Factor (PF). PF is obtained by calculating RP’s task processing time as a function of its CPU usage while two identical tasks are run on two identical RPs. If the CPU speeds of the RPs are defined as {sRP1, sRP2, ..., sRPNumRP−1}, the expected value of total CPU usage during reallocation as cpuUsageAvg, and the expected value of job processor usage as jobUsageAvg, the performance value of RPi can be calculated from the following equation:

RPiPerformance = (100 − (cpuUsageAvg − jobUsageAvg)) × PF × sRPi × 0.01
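
The equation can be evaluated directly, as in the following sketch. The function mirrors the formula term by term, with usages expressed as percentages; the normalization of performance values into task shares is an assumed follow-up step, and the numeric inputs are illustrative values, not measurements from the paper.

```python
def rp_performance(s_rp, cpu_usage_avg, job_usage_avg, pf):
    """RPiPerformance as written in the text; usages are percentages (0-100)."""
    # CPU share consumed by users and the system rather than by the job.
    other_load = cpu_usage_avg - job_usage_avg
    return (100 - other_load) * pf * s_rp * 0.01


def task_shares(performances):
    """Assumed follow-up step: normalize performance values into fractions."""
    total = sum(performances)
    return [p / total for p in performances]


# Two RPs with the same CPU speed; the second also carries 40% non-job load.
idle_rp = rp_performance(s_rp=2000, cpu_usage_avg=50, job_usage_avg=50, pf=1.0)
busy_rp = rp_performance(s_rp=2000, cpu_usage_avg=90, job_usage_avg=50, pf=1.0)
shares = task_shares([idle_rp, busy_rp])
```

With these illustrative numbers, the RP whose CPU is also used by other work scores lower and would receive the smaller share of tasks.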

The performance evaluation shown in Fig. 9 compares the task allocation algorithms on a virtual organization constructed from four RPs with different performance levels. In the cases of SPTA, DPTA, and APTA, there is an improvement in performance and a change in task size. APTA decreases the total execution time through efficient reallocation based on RP’s performance variation. Panel (c) shows the expected values, calculated relative to the total CPU usage and measured periodically, of the Job CPU Usage used in processing a task and the User CPU Usage used independently by users. Because job CPU usage decreases relative to total CPU usage, the higher the user CPU usage, the greater its effect on the total running time. Panel (d) shows the result of applying the task allocation algorithms with randomly imposed user CPU usage.
Table 4
Figure 6
Figure 7
Figure 8
Figure 9

Monitoring visualization of RMS

RMS monitoring visualization enhances operation efficiency and responds to faults by calculating the change in RP’s status. When RP’s metadata are renewed, RMS monitoring visualization receives the status information through a dynamically created RP virtual observer, sends the information to RPEntry, and visualizes it. Figure 10 shows five different views of the monitoring information visualization of RMS. The RP Information View shows the metadata of a selected RP as a progress bar and a text component, analyzes the extracted hardware information, and applies it to the RP performance evaluation. The RP Real-time Table View shows periodically updated information on every RP in the form of a table. The RP State Graph View shows the CPU usage and user CPU usage of every RP as a real-time graph. The Connection Graph View observes the task process status of the RPs connected to LRM. The RP Network Information View shows, as a bar graph, the total running time of RP tasks and the ratio of network transfer time.

GridRMF implementation and performance result

Application architecture

Because Grid application specifics are so diverse, the Grid service environment requires hierarchical separation to keep coupling to a minimum. As a mechanism for minimizing the interdependence between different applications and GridRMF, the Application Proxy is designed as shown in Fig. 11. The Application Proxy is a standardized common API and includes the core functions for interaction between an application and GridRMF. An application proxy is identified by the Application Name of the plug-in for accessing a specific application, and its object is generated dynamically by the Application Templet.
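
One way to picture a common proxy API with plug-ins created dynamically by application name is the sketch below. The method names (split, run, merge), the registry, and the example application are all hypothetical; the paper does not list the actual proxy interface.

```python
from abc import ABC, abstractmethod


class ApplicationProxy(ABC):
    """Hypothetical common API between an application and the framework."""

    @abstractmethod
    def split(self, n):
        """Divide the job into n independent tasks."""

    @abstractmethod
    def run(self, task):
        """Compute one task (this part would execute on an RP)."""

    @abstractmethod
    def merge(self, results):
        """Combine task results (this part would execute on the RR)."""


# Plug-in registry keyed by application name.
_registry = {}


def register(app_name):
    """Class decorator that records a proxy class under its application name."""
    def wrap(cls):
        _registry[app_name] = cls
        return cls
    return wrap


def create_proxy(app_name):
    """Create a proxy object dynamically from the application name."""
    return _registry[app_name]()


@register("sum-of-squares")
class SumOfSquares(ApplicationProxy):
    def split(self, n):
        return list(range(n))

    def run(self, task):
        return task * task

    def merge(self, results):
        return sum(results)


proxy = create_proxy("sum-of-squares")
total = proxy.merge(proxy.run(t) for t in proxy.split(4))
```

The framework side only ever calls the three abstract methods, so supporting a new application means registering one more class rather than modifying the framework.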
Figure 10
Figure 11
Figure 12
Figure 13

Application execution

For the GridRMF execution, an application is used whose tasks are independent of one another and that requires large computation with little data. The applied application creates a rendered image using Mandelbrot’s fractal image process with the LSM coloring technique [19]. Figure 12 is an example of the LRM Auto-Recovery strategy, and the dotted portion shows the execution process: (1) while independent applications were being executed by two virtual organizations with 6 RPs (LRM[0]-4RP, LRM[1]-2RP), (2) LRM[1] malfunctioned; however, (3) LRM[0], which had finished its operation, took over LRM[1]’s job through brokering between LRM[0] and RR[1]. GIM records LRM’s task process information with the LRM virtual observer in order to prevent redundant operation. Figure 13 shows an example where 2 RPs are added while a task is being executed with the original 2 RPs, and the task is reallocated across the 4 RPs by the APTA mechanism. The color change in (c) visualizes the reallocation due to the 2 additional RPs, and (d) shows that the operation continues with a change in task specification unless the framework is terminated.

Conclusions

This paper constructed GridRMF for efficient resource management in the Grid service environment. GridRMF proposes a hierarchical resource management structure in which components are categorized into GIM, LRM, RP, and RR according to the Grid resource’s characteristics and participation purpose, and the resource management domain is hierarchically divided into VMS and RMS. The communication model is defined so that data, commands, and queries are transmitted in an actively integrated way between components that follow the hierarchical resource management structure. The information storage model creates and manages entries, supporting consistent resource usage and preventing redundant storage of resource information. The remote object reference model, which sends task results directly to RR, is defined in order to reduce the server’s overhead. VMS proposes an optimal virtual organization selection mechanism based on the LRM performance level, and an LRM Auto-Recovery strategy that recovers from a VO’s fault without redundant operation. In order to respond to status changes of Grid resources and to their participation, secession, and faults, RMS proposes the Adaptive Performance Task Allocation algorithm, which supports load balancing and fault tolerance. Adaptive Performance Task Allocation uses the RPPerformance value extracted by real-time resource status monitoring, thus enhancing performance in a way that accounts for the resource’s variability. Also, in order to minimize the dependency between GridRMF and Grid applications, an application proxy is used to keep the framework independent of particular applications. Lastly, this paper presented two examples applying two applications to the services suggested by GridRMF, and analyzed the adaptability and operation level of Grid resources.

For future studies, a user access interface that follows a subdivided task process will be added to the suggested framework for user-friendly task operation, and applications with interdependent tasks or other special cases will be applied. For the framework developed in this paper to become a web service component, further development of a WSRF-based interface API standard and technology is required.