
Overview of SOA Programming Model and Runtime System for Windows HPC Server 2008

Microsoft Corporation Published: May 2008

Abstract
With the increasing number and size of the problems being tackled on ever-larger clusters, developers, users, and administrators face increasing challenges in meeting time-to-result goals. Applications must be developed quickly, run efficiently on the cluster, and be effectively managed so that application performance, reliability, and resource utilization are optimized. Building applications with a Service-Oriented Architecture (SOA) on Windows HPC Server 2008 can help meet these challenges. Windows HPC Server 2008 provides a platform for SOA-based applications. The SOA programming model allows solution developers and architects to rapidly develop new high performance computing (HPC) cluster-enabled interactive applications and easily modify existing distributed computing applications. With Windows HPC Server 2008, the developer build/debug/deploy experience is streamlined, the speed of processing is accelerated, and the management of the applications and systems is simplified. This white paper provides a technical overview of SOA applications and the Windows HPC Server 2008 functions that support the SOA model, including building and deploying SOA applications; their architecture, runtime system, scaling, and performance considerations; and monitoring and troubleshooting.

This document was developed prior to the product's release to manufacturing, and as such, we cannot guarantee that all details included herein will be exactly as what is found in the shipping product. The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication. This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED, OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation. Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

© 2008 Microsoft Corporation. All rights reserved. Microsoft, Active Directory, Excel, SharePoint, SQL Server, Visual Basic, Visual Studio, Windows, the Windows logo, Windows PowerShell, and Windows Server are trademarks of the Microsoft group of companies. All other trademarks are property of their respective owners.

Contents
Windows HPC Server 2008 Overview
Service-Oriented Architecture Application Overview
Building an Application: The SOA Programming Model
Running the SOA Application: Architectural Considerations
Monitoring and Managing the SOA Infrastructure
Troubleshooting and Diagnosing SOA Application Runtime Errors
Advanced Programming Topics
Summary
Glossary

Windows HPC Server 2008 Overview


High performance computing (HPC) applications use a cluster of computers working together to solve a single computational problem or single set of closely related computational problems. Windows HPC Server 2008 enables such cluster-based supercomputing based on x64 versions of the Windows Server 2008 operating system. Windows HPC Server 2008 can efficiently scale to thousands of processing cores and provides a comprehensive set of deployment, administration, and monitoring tools that are easy to deploy, manage, and integrate with an existing infrastructure. A wide range of software vendors in various vertical markets have been designing their applications to work seamlessly with Windows HPC Server 2008, so that users can submit and monitor jobs from within familiar applications without having to learn new or complex user interfaces. Windows HPC Server 2008 includes an advanced Job Scheduler, a new and faster Microsoft Message Passing Interface (MS-MPI), rapid deployment options using Windows Deployment Services (WDS), and a new management interface built on the Microsoft System Center user interface (UI) that supports Windows PowerShell as a preferred scripting language. Windows HPC Server 2008 takes advantage of Windows Server 2008 failover services, in addition to the failover clustering capabilities of Microsoft SQL Server, for cluster failover and redundancy. Windows HPC Server 2008 integrates with other Microsoft products to help increase HPC productivity and improve the overall user experience. This includes collaboration through Microsoft Office SharePoint Server 2007 and the Windows Workflow Foundation (WF), in addition to improved management and efficiency through integration with System Center solutions. 
Windows HPC Server 2008 delivers an integrated platform that makes it possible to create a new breed of applications that can be run in interactive settings, in addition to the traditional batch applications in the engineering, oil and gas, and life science market segments. These new interactive applications include trade and risk management applications in financial services, Microsoft Office Excel, and insurance risk modeling applications. Windows HPC Server 2008 can be used for massively parallel programs (computational fluid dynamics, reservoir simulation) in addition to embarrassingly parallel programs (Basic Local Alignment Search Tool [BLAST], Monte Carlo simulations). Through integration with the Windows Communication Foundation (WCF), Windows HPC Server 2008 empowers software developers working with Service-Oriented Architecture (SOA) applications to harness the power of parallel computing offered by HPC solutions.

Note: For general information about Windows HPC Server features and capabilities, see the white paper Windows HPC Server 2008 Technical Overview. For overall management and deployment information, see the white paper Windows HPC Server 2008 System Management Overview. For information about the Windows HPC Server 2008 Job Scheduler, see the white paper Windows HPC Server 2008 Job Scheduler. These papers can be found at http://www.microsoft.com/hpc/default.aspx.

Job Operation in Windows HPC Server 2008


Jobs, defined as discrete activities scheduled to run on the compute cluster, are the key to operating in a Windows HPC Server environment. Compute cluster jobs consist of tasks; a job can be a single task, or it can include many individual tasks. Tasks can be serial, running one after another, or parallel, running across multiple processors. Tasks can also run interactively as SOA applications. The structure of the tasks in a job is determined by the dependencies among tasks and the type of application being run. In addition, jobs and tasks can be targeted to specific nodes within the cluster. Nodes can be reserved exclusively for particular jobs, or they can be shared between different jobs and tasks. To understand job operation, it is helpful to understand the components of an HPC cluster. Figure 1 shows cluster components and how they relate to each other.

Figure 1 Elements of a compute cluster

A cluster consists of a single head node (or a primary and secondary head node, if the deployed cluster is made highly available) and compute nodes. For interactive SOA applications, the cluster also includes one or more WCF broker nodes. The head node, which can also operate as a compute node, is the central management node for the cluster. The head node deploys the compute nodes, runs the Job Scheduler, monitors job and node status, runs diagnostics on nodes, and provides reports on node and job activities. Compute nodes execute job tasks. WCF broker nodes act as intermediaries between the application and the services. The broker load-balances the service requests across the services and then returns the results to the application.

When a user submits a job to the cluster, the Job Scheduler validates the job properties and stores the job in a SQL Server database. The job is entered into the job queue based on the specified policy. When the necessary resources are available, the job is sent to the compute nodes assigned to the job and runs under the user's security context. As a result, the complexity of using and synchronizing different credentials is eliminated, and the user does not have to employ different methods of sharing data or compensate for permission differences among different operating systems.

An SOA application differs from traditional HPC batch-oriented applications in several ways. The admission, allocation, and activation boundaries are blurred. The initial admission involves a session in addition to the actual job, and the job admission request comes from the library implementing the session, not directly from the application code. Allocation is still fairly typical, with the Job Scheduler still managing resource allocation. Once a session is created, requests are sent to the broker node, and results are returned to the client through the broker node.
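The broker's role as an intermediary can be illustrated with a small, self-contained sketch. The RoundRobinBroker class below is hypothetical and is not part of Windows HPC Server 2008; it assumes a simple round-robin policy, which this paper does not specify, and it models each service instance as an in-process delegate rather than a WCF endpoint.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a broker node: it forwards each request to the
// next service instance in turn and hands the response back to the caller.
public sealed class RoundRobinBroker
{
    private readonly List<Func<string, string>> instances;
    private int next; // index of the instance that receives the next request

    public RoundRobinBroker(IEnumerable<Func<string, string>> serviceInstances)
    {
        instances = new List<Func<string, string>>(serviceInstances);
        if (instances.Count == 0)
            throw new ArgumentException("At least one service instance is required.");
    }

    // Not thread-safe; a real broker would also queue and retry requests.
    public string Send(string request)
    {
        Func<string, string> target = instances[next];
        next = (next + 1) % instances.Count;
        return target(request);
    }

    public static void Main()
    {
        // Two in-process "service instances" that echo with a node name.
        var broker = new RoundRobinBroker(new Func<string, string>[]
        {
            request => "node1:" + request,
            request => "node2:" + request
        });

        for (int i = 0; i < 4; i++)
            Console.WriteLine(broker.Send("request" + i));
    }
}
```

The client only ever talks to the broker, so service instances can be added or removed without changing client code.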

Service-Oriented Architecture Application Overview


What Is SOA?
A Service-Oriented Architecture is an approach to building distributed, loosely coupled applications. SOA separates functions into distinct services that can be distributed over a network, and combined and reused. These functions are loosely coupled with the operating systems and programming languages underlying the applications. SOA defines and provisions the IT infrastructure to support and participate in the exchange of data between different applications. SOA services communicate with each other by passing data or by coordinating an activity between several services. The SOA architecture is not tied to a specific technology. It may be implemented using a wide range of technologies (including SOAP, Web services, WCF) and a variety of languages across different operating systems. The defining characteristic of SOA is independent services with defined interfaces that can be called to perform their tasks in a standard way: the service does not need to know the calling application, and the application does not need to know how the service actually performs its tasks.

Batch Applications and Interactive Applications


While the first version of Windows HPC Server 2008 supports traditional HPC applications in the engineering, oil and gas, and life science market segments (applications that generally run in batch fashion), Windows HPC Server 2008 now delivers a platform that supports a new breed of applications that run in interactive settings, including trade and risk management applications in financial services and WCF or Web services-based applications (see Figure 2).

Figure 2 Windows HPC Server 2008 now focuses on interactive applications

Target Applications for SOA


HPC applications submitted to compute clusters are typically classified as either message intensive or embarrassingly parallel. While message-intensive applications comprise sequential tasks, embarrassingly parallel problems can be easily divided into very large numbers of parallel tasks, with no dependency or communication between them. To solve embarrassingly parallel problems without having to write the low-level code, developers need to encapsulate the core calculations as a software module. The SOA programming model makes this encapsulation possible and effectively hides the details for data serialization and distributed computing. Windows HPC Server 2008 includes support for embarrassingly parallel applications that use the SOA programming model; these applications use compute clusters interactively to provide near real-time calculation of complex algorithms. Table 1 shows some example applications and the related tasks.

Table 1 Examples of SOA Applications

Example Application | Example Task | Units of Work
--- | --- | ---
Monte Carlo problems that simulate the behavior of various mathematical or physical systems. Monte Carlo methods are used in physics, physical chemistry, economics, and related fields. | Predicting the price of a financial instrument. | The pricing of each security.
BLAST searches. | Gene matching. | Individual matching of genes.
Genetic algorithms. | Evolutionary computational metaheuristics. | Computational steps.
Ray Tracing. | Computational physics and rendering. | Each pixel to be rendered.
Microsoft Office Excel add-in calculations. | Calling add-in functions. | Each add-in function call.

The Monte Carlo problem, a frequently used example of an SOA application, simulates the behavior of various mathematical or physical systems; it is used in physics, physical chemistry, economics, and related fields. Monte Carlo methods are computational algorithms that rely on repeated random sampling. Because of the reliance on repeated computation and random or pseudorandom numbers, Monte Carlo methods are well-suited for HPC. Monte Carlo methods tend to be used when it is infeasible or impossible to compute an exact result with a deterministic algorithm. The Monte Carlo method is widely used by financial analysts who want to construct stochastic or probabilistic financial models (as opposed to the traditional static and deterministic models). Many financial corporations use Monte Carlo methods for making investment decisions or for valuing mergers and acquisitions; for example, financial corporations may need to formulate trading strategy against historical market data, complete risk analysis via Monte Carlo simulation in near real time, and price new derivative instruments.

Another SOA example is Basic Local Alignment Search Tool (BLAST), a computer program that identifies homologous genes (genes in different species that share similar structures and functions) in different organisms. For example, there may be a gene in mice related to liking (or not liking) the consumption of alcohol; using BLAST, it is possible to search the human genome for a homologous gene. Because of the many iterations required, BLAST is well-suited for SOA and HPC.

Table 2 describes the features and tools that Windows HPC Server 2008 provides for meeting the needs of SOA applications.

Table 2 Benefits of Windows HPC Server 2008 for SOA Applications

Tasks | User Needs | Windows HPC Server 2008 Features
--- | --- | ---
Build | The ability to solve embarrassingly parallel problems without having to write the low-level code. An integrated development environment (IDE) tool that lets developers develop, deploy, and debug applications on a cluster. | A service-oriented programming model based on WCF that effectively hides the details for data serialization and distributed computing. Microsoft Visual Studio 2008 with tools to debug services and clients.
Run | Ability to distribute short calculation requests efficiently. Ability to run user applications securely. A system that decides where to run the tasks of the application and dynamically adjusts cluster resource allocation to the processing priorities of the workload. | Low latency round-trip. End-to-end Kerberos with WCF transport-level security. Dynamic allocation of resources to the service instances.
Manage | The ability to monitor application performance from a single point of control. The ability to monitor and report service usage. | Runtime monitoring of performance counters, including the number and status of outstanding service calls and resource usage. Service resource usage reports.
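The way a Monte Carlo problem divides into independent units of work, as described earlier in this section, can be sketched in a few lines. This is an illustrative example only: it estimates pi rather than pricing a security, and it uses Parallel.For from the Task Parallel Library (which is newer than the tooling this paper describes) as a local stand-in for distributing units across service instances. The class and method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class MonteCarloSketch
{
    // One unit of work: sample random points in the unit square and count
    // how many fall inside the quarter circle. Each unit has its own seed,
    // so units share no state and can run anywhere, in any order.
    public static double EstimatePi(int seed, int samples)
    {
        var rng = new Random(seed);
        int inside = 0;
        for (int i = 0; i < samples; i++)
        {
            double x = rng.NextDouble();
            double y = rng.NextDouble();
            if (x * x + y * y <= 1.0) inside++;
        }
        return 4.0 * inside / samples;
    }

    // Runs the units in parallel and averages the partial estimates.
    // On a cluster, each call to EstimatePi would be one service request.
    public static double EstimatePiInParallel(int units, int samplesPerUnit)
    {
        var partial = new double[units];
        Parallel.For(0, units, u => partial[u] = EstimatePi(u, samplesPerUnit));
        return partial.Average();
    }

    public static void Main()
    {
        Console.WriteLine(EstimatePiInParallel(8, 200000));
    }
}
```

Because no unit depends on another, adding more workers shortens the elapsed time without changing the calculation, which is the property that makes such problems a good fit for the SOA model.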

Building an Application: The SOA Programming Model


SOA applications need to be developed quickly, run efficiently on the cluster, and be effectively managed so that application performance, reliability, and resource use are guaranteed. Developers need to be able to encapsulate the core calculations as software modules that can be deployed and run on the cluster; these software modules identify and marshal the data required for each calculation and optimize performance by minimizing the data movement and communication overhead. The SOA programming model provides the specifications and open technologies that enable developers to write service programs and client programs using the widely adopted WCF platform. The Microsoft Visual Studio development system provides easy-to-use WCF service templates and service referencing utilities that let developers quickly prototype, debug, and unit-test applications.

Benefits of Windows HPC Server 2008 for SOA Applications


Windows HPC Server 2008 provides a scalable, reliable, and secure interactive application platform that empowers developers to rapidly develop and easily modify cluster-enabled interactive applications.

Getting Started: Building an Application with the SOA Programming Model


Building an SOA application using the SOA programming model consists of three steps:
1. Creating the service.
2. Deploying the service to a cluster.
3. Creating a client application.

Creating the Service
A service in the SOA programming model is defined as a program exposing a collection of endpoints; all communication with a service happens via the service's endpoints. Each endpoint specifies a contract that identifies which methods are accessible via this endpoint, a binding that determines how a client application can communicate with this endpoint, and an address that indicates where this endpoint can be found.

The following steps can be used to create a service:
1. Launch Visual Studio 2008 and create a Class Library project. Name the project EchoService.
2. In the Visual Studio Solution Explorer pane, navigate to the EchoService project.
3. Right-click References, and then click Add Reference. The Add Reference dialog appears.
4. In the Add Reference dialog, select the .NET tab.
5. Select System.ServiceModel. This reference is required for writing the WCF Service code.
6. If this reference is not listed on the .NET tab, follow these steps:
   a. Click the Browse tab.
   b. Locate and select the file System.ServiceModel.dll from %windir%\Microsoft.Net\Framework\v3.0\Windows Communication Foundation.
   c. Click OK.
7. In Solution Explorer, navigate to the EchoService item, and then rename the file Class1.cs to EchoService.cs.
8. Open the file EchoService.cs, and then copy and paste the following code into it:

using System;
using System.Collections.Generic;
using System.Text;
using System.Diagnostics;
using System.ServiceModel;

namespace EchoService
{
    [ServiceContract]
    public interface IEchoService
    {
        [OperationContract]
        string Echo(string input);
    }

    [ServiceBehavior(IncludeExceptionDetailInFaults = true)]
    public class EchoService : IEchoService
    {
        #region IEchoService Members

        public string Echo(string input)
        {
            return Environment.MachineName + ":" + input;
        }

        #endregion
    }
}

9. Compile the service to create EchoService.dll. This file should reside in Visual Studio 2008\Projects\EchoService\EchoService\bin\[Debug|Release]\.

Deploying the Service to a Compute Cluster
The following steps can be used to deploy the service DLL file to the compute cluster.
1. Copy the file EchoService.dll to the \Services folder on the local drive of all compute nodes.
2. Register the service DLL on each node in the cluster by creating the EchoService.config file in the %CCP_HOME%ServiceRegistration folder (the default folder is C:\Program Files\Microsoft HPC Pack\ServiceRegistration):
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <configSections>
    <!--Register service's custom configuration sections and group-->
    <sectionGroup name="microsoft.Hpc.Session.ServiceRegistration"
                  type="Microsoft.Hpc.Scheduler.Session.Configuration.ServiceRegistration, Microsoft.Hpc.Scheduler.Session, Version=2.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35">
      <section name="service"
               type="Microsoft.Hpc.Scheduler.Session.Configuration.ServiceConfiguration, Microsoft.Hpc.Scheduler.Session, Version=2.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35"
               allowDefinition="Everywhere"
               allowExeDefinition="MachineToApplication" />
    </sectionGroup>
  </configSections>
  <microsoft.Hpc.Session.ServiceRegistration>
    <service assembly="c:\Services\EchoService.dll"
             contract="EchoService.IEchoService"
             type="EchoService.EchoService" />
  </microsoft.Hpc.Session.ServiceRegistration>
</configuration>

The contract and type attributes are optional if the service defines only one interface; otherwise, specify these values for each interface that the service defines.

Creating a Client Program
Before creating the client program, install the Microsoft HPC Pack 2008 Client Utilities and the Microsoft HPC Pack 2008 SDK on the client computer.

The following steps can be used to create the EchoService client proxy code from the EchoService DLL and build the client:
1. Navigate to the following folder: Visual Studio 2008\Projects\EchoService\EchoService\bin\[Debug|Release]\
2. Run the command svcutil EchoService.dll. This command generates the WSDL and XSD files for the service.
3. Run the command svcutil *.wsdl *.xsd /async /language:C# /out:EchoClientProxy.cs.
4. Launch Visual Studio 2008 and create a Console Application project. Name it EchoClient.
5. In Solution Explorer, navigate to EchoClient and add references to the files Microsoft.Hpc.Scheduler.dll, Microsoft.Hpc.Scheduler.Properties.dll, and Microsoft.Hpc.Scheduler.Session.dll (these files are in C:\Program Files\Microsoft HPC Pack 2008 SDK\bin).
6. In the Solution Explorer/EchoClient pane, add a reference to System.ServiceModel.
7. Add the file EchoClientProxy.cs (produced in step 3) to the client program.
8. Right-click EchoClient, click Add, and then click Existing Item. Windows Explorer appears.
9. Browse to the folder where the file EchoClientProxy.cs is located, and select it.
10. Click OK.
11. Add the following code to the file Program.cs, and then compile the client program and run it:
using System;
using System.Collections.Generic;
using System.Text;
using System.ServiceModel;
using System.Threading;
using Microsoft.Hpc.Scheduler.Session;

namespace EchoClient
{
    class Program
    {
        static void Main(string[] args)
        {
            string scheduler = "localhost";
            string serviceName = "EchoService";
            if (args.Length > 0)
            {
                scheduler = args[0];
                if (args.Length > 1)
                {
                    serviceName = args[1];
                }
            }

            // Create a session object that specifies the head node
            // to which to connect and the name of the WCF service to use.
            // This example uses the default start information for a
            // session.
            SessionStartInfo info = new SessionStartInfo(scheduler, serviceName);
            info.ResourceUnitType = Microsoft.Hpc.Scheduler.Properties.JobUnitType.Node;
            info.MinimumUnits = 1;
            info.MaximumUnits = 4;

            Console.WriteLine("Creating a session...");

            // Create the session by calling the factory method
            using (Session session = Session.CreateSession(info))
            {
                Console.WriteLine("Session's Endpoint Reference:{0}",
                    session.EndpointReference.ToString());

                // Bind the session to the client proxy using NetTcp
                // binding (specify only NetTcp binding). The
                // security mode must be Transport and you cannot
                // enable reliable sessions.
                EchoServiceClient client = new EchoServiceClient(
                    new NetTcpBinding(SecurityMode.Transport, false),
                    session.EndpointReference);

                AsyncResultCount = 100;
                for (int i = 0; i < 100; i++)
                {
                    // This call does not block. As results become
                    // available, the EchoCallback method (defined
                    // later in this class) is invoked.
                    client.BeginEcho("hello world", EchoCallback,
                        new RequestState(client, i));
                }

                // Wait until the last callback signals completion
                AsyncResultsDone.WaitOne();
                client.Close();

                Console.WriteLine("Please enter any key to continue...");
                Console.ReadLine();
            }
        }

        static int AsyncResultCount = 0;
        static AutoResetEvent AsyncResultsDone = new AutoResetEvent(false);

        // Encapsulates the context of the function callback
        class RequestState
        {
            int input;
            EchoServiceClient client;

            public RequestState(EchoServiceClient client, int input)
            {
                this.client = client;
                this.input = input;
            }

            public int Input
            {
                get { return input; }
            }

            public string GetResult(IAsyncResult result)
            {
                return client.EndEcho(result);
            }
        }

        static void EchoCallback(IAsyncResult result)
        {
            RequestState state = result.AsyncState as RequestState;
            Console.WriteLine("Response({0}) = {1}", state.Input,
                state.GetResult(result));

            // The callback that brings the count to zero releases Main
            if (Interlocked.Decrement(ref AsyncResultCount) <= 0)
            {
                AsyncResultsDone.Set();
            }
        }
    }
}
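The client above waits for all asynchronous responses by counting down with Interlocked.Decrement and signaling an AutoResetEvent from the last callback. The following self-contained sketch isolates that completion pattern so it can be run without a cluster. The CompletionCountdown class is hypothetical, and thread-pool work items stand in for the WCF callbacks.

```csharp
using System;
using System.Threading;

public static class CompletionCountdown
{
    // Starts requestCount simulated callbacks and blocks until all have run.
    // Returns the number of callbacks that completed.
    public static int RunRequests(int requestCount)
    {
        if (requestCount <= 0) return 0; // nothing to wait for

        int remaining = requestCount; // callbacks still outstanding
        int completed = 0;            // callbacks that have run

        using (var allDone = new AutoResetEvent(false))
        {
            for (int i = 0; i < requestCount; i++)
            {
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    Interlocked.Increment(ref completed);

                    // Only the callback that brings the counter to zero
                    // signals the waiting thread.
                    if (Interlocked.Decrement(ref remaining) == 0)
                        allDone.Set();
                });
            }

            allDone.WaitOne();
        }

        return completed;
    }

    public static void Main()
    {
        Console.WriteLine("Completed callbacks: {0}", RunRequests(100));
    }
}
```

Setting the counter before issuing any request, as the client does with AsyncResultCount = 100, matters: if the counter were incremented per request instead, an early callback could bring it to zero and release the waiting thread while requests were still being sent.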

Session API
The key abstraction of the SOA programming model is that a client application creates a session with the Job Scheduler that allocates a pool of service instances on the compute nodes as workers for the session, distributing the calculations over multiple service instances to accelerate the processing speed. The client application then invokes the methods exposed by the service instances. The namespace for the session API is Microsoft.Hpc.Scheduler.Session. It contains two key classes, Session and SessionStartInfo, which are described in Table 3.

Table 3 Microsoft.Hpc.Scheduler.Session Namespace

Class | Description
--- | ---
Session | Session enables the client code to create a virtual pool of service instances for a given service and distribute the calculations over multiple service instances to accelerate processing speed.
SessionStartInfo | SessionStartInfo specifies a set of values used when creating a session. SessionStartInfo offers most common controls over the session you start.

Key SessionStartInfo properties are listed in Table 4.

Table 4 Microsoft.Hpc.Scheduler.Session.SessionStartInfo Properties

Properties | Descriptions
--- | ---
Headnode | Head node of the cluster.
JobTemplate | Name of the job template. A job template is an administrator-defined submission policy.
ResourceUnitType | Type of allocation units requested. Options include: Microsoft.Hpc.Scheduler.Properties.JobUnitType.{Node, Socket or Core}.
MinimumUnits | Minimum number of requested resource units.
MaximumUnits | Maximum number of requested resource units.
NodeGroup | Name of the node group. A node group is an administrator-defined collection of nodes.
Priority | Priority of the session. Options include: Microsoft.Hpc.Scheduler.Properties.JobPriority.{Highest, AboveNormal, Normal, BelowNormal, Lowest}. The default is Normal.
Project | Project name of the session.
RequestedNodes | A list of nodes on which to run the service instances of the session.
Runtime | Wall clock runtime limit of the session.
Secure | If set to false, security will be turned off (see Table 9 Security Approaches for details). The default is true.
ServiceJobName | Name of the service job.
ServiceName | Name of the service.
ShareSession | If false, only the person that created the session can send requests to the broker. If true, anyone who can submit jobs based on the job template can send requests to the broker. The default is false.
UserName | User name. The default is the submission user.
Password | Password of the user. The default is the submission user.
TransportScheme | TransportScheme.Http or TransportScheme.NetTcp.

Service Deployment
Two steps are involved in deploying a service:
1. Copying and placing the service DLLs across the cluster.
2. Registering the service with Windows HPC Server 2008.

Copying and Placing the Service DLLs
There are three options for copying and placing the service DLLs:
1. Local Deployment: Service DLLs are copied to the compute nodes. This option yields the best performance, but updating the service binaries can be time-consuming in a large cluster, especially if not all the nodes are online at the same time.
2. Central Deployment: Service DLLs are placed on a file share. This option makes it easy to update the service binaries, but loading the DLLs from the share on the compute nodes may result in long startup times; the service binaries may be large, and it may be necessary to set up .NET security permissions for the compute nodes to access the service binaries on the share. To set up .NET security permissions, use the caspol command (see the MSDN documentation for details).
3. Hybrid Deployment: Some large, not-so-frequently-updated service binaries are copied to the local nodes, while the small or more frequently updated services are copied to a file share. This provides the best of both worlds by minimizing the initial load time and keeping service updates easy.

Registering the Service
To register a service with Windows HPC Server 2008, create a service registration XML file and put it in the service registration folder. The service registration file contains the metadata that describes the service assembly path, the interface contract type and class that implements the contract, whether the service is composed of 32-bit or 64-bit binaries, and the environment variables needed for the service. The service configuration file must be named servicename.config, where servicename is the same name that is passed into the SessionStartInfo constructor (see Table 4 for details). Information for registering a service assembly on nodes is provided in the following code sample:
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <configSections>
    <!--Register service's custom configuration sections and group-->
    <sectionGroup name="microsoft.Hpc.Session.ServiceRegistration"
                  type="Microsoft.Hpc.Scheduler.Session.Configuration.ServiceRegistration, Microsoft.Hpc.Scheduler.Session, Version=2.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35">
      <section name="service"
               type="Microsoft.Hpc.Scheduler.Session.Configuration.ServiceConfiguration, Microsoft.Hpc.Scheduler.Session, Version=2.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35"
               allowDefinition="Everywhere"
               allowExeDefinition="MachineToApplication" />
    </sectionGroup>
  </configSections>
  <microsoft.Hpc.Session.ServiceRegistration>
    <service assembly="%CCP_HOME%bin\EchoSvc.dll"
             contract="EchoSvc.IEchoSvc"
             type="EchoSvc.EchoSvc"
             architecture="x86">
      <!--Below is a sample for adding environment variables to the service-->
      <environmentVariables>
        <add name="PATH" value="%MY_SERVICES_HOME%Bin"/>
        <add name="myname2" value="myvalue2"/>
      </environmentVariables>
    </service>
  </microsoft.Hpc.Session.ServiceRegistration>
</configuration>

Table 5 provides service metadata and Table 6 provides registration methods for the three deployment options:

Table 5 Service Metadata

Metadata | Description | Required or Optional
--- | --- | ---
assembly | The full path to the service DLL. | Required
contract | The interface of the service (WCF contract). | Optional if there is only one interface in the service DLL
type | The class that implements the WCF contract. | Optional if there is only one interface in the DLL
architecture | The architecture on which your WCF service can run. The possible values include AnyCpu, x86, and x64. | Optional (Default is AnyCpu)
environmentVariables | Environment variables used by your service. | Optional

Table 6 Registration Methods for the Three Deployment Options

Deployment Options | Registration Methods
Local Deployment | Register the service DLL on each node in the cluster by creating the servicename.config file in the %CCP_HOME%ServiceRegistration folder (the default folder is C:\Program Files\Microsoft HPC Pack\ServiceRegistration).
Central Deployment and Hybrid Deployment | The central service registration folder can be anywhere that is accessible by the compute nodes. The path of the folder is configured through the CCP_SERVICEREGISTRATION_PATH cluster environment variable: C:\> cluscfg setenvs CCP_SERVICEREGISTRATION_PATH=\\filer\serviceregistration

Maintaining Multiple Versions of a Service
The goal of the service deployment is to enable the XCOPY style of deployment. For each version of the service, a new service registration file needs to be created. A client that uses the new version of the service needs to use a different service name when creating a session. For example, when moving to service version 2, the client application is changed as follows:

// headnode and serviceV2 are strings that hold the head node name and the
// name of the version 2 service registration
SessionStartInfo info = new SessionStartInfo(headnode, serviceV2);
info.ResourceUnitType = Microsoft.Hpc.Scheduler.Properties.JobUnitType.Node;
// Create the session by calling the factory method
Session session = Session.CreateSession(info);

Running the SOA Application: Architectural Considerations


The underlying architecture for supporting the SOA programming model and the general steps for running the SOA application are shown in Figure 3.

Figure 3 Interactive sessions through the WCF

The head node enables an administrator to monitor job status, view service usage reports, and view application logs. The compute nodes let the administrator view service performance counters, compute node health, and event logs. At the back end, the WCF broker node virtualizes the service endpoints, balances requests, collects responses, and grows or shrinks the service pool. The compute nodes track service usage, run the service as the user, restart the service upon failure, and write trace output.

Running the SOA Application


The following steps can be used to run the SOA application on a Windows HPC Server 2008 cluster:
1. The SOA client application initiates a session with the Job Scheduler.
2. The Job Scheduler allocates the compute nodes and starts the service instances (which load the service DLL files) on those nodes through the node manager. Service instances are responsible for hosting endpoints, which are registered on compute nodes. The Job Scheduler allocates a broker node to launch the WCF broker job, using the round-robin strategy when selecting a broker node. At startup, the broker job publishes its endpoint reference (EPR) by setting the session's EPR property. The number of broker and service instance processes depends on the session's resource requirements, node availability, and workload conditions. These requirements are specified by the client application or by the preconfigured administrative scheduler templates, which are customized according to the resource requirements of the usage scenario.
3. The client retrieves the broker node's EPR from the Job Scheduler.
4. The client sends requests to the broker node.
5. The broker node routes and load-balances service requests between the client and the service instances. Broker nodes also assist the scheduler service with managing service instance lifetimes and the grow/shrink policies for cluster resources.
6. The broker node forwards the responses received from the service instances back to the client application.

The SOA component roles are summarized in Table 7.

Table 7 SOA Component Roles

Components | Roles | Descriptions
WCF Broker | Request forwarder | Stores and forwards request/response messages between client application and service instances.
Service Instance | Service | Performs computation.
Job Scheduler | Resource allocator | Allocates resources to sessions.
Node Manager | Job nodal agent/authorization service | Starts the job on the node and authorizes the service.

When the SOA client application creates a session, the session API creates two jobs: a WCF broker job (started on the broker nodes) and a service job (started on the allocated compute nodes).

Table 8 Broker and Service Jobs in an SOA Session

Jobs | Programs | Execution Nodes
WCF Broker Job: one task | HpcWcfBroker | Broker nodes
Service Job: as many tasks as there are allocated units | HpcServiceHost | Compute nodes

The number of service instances can change during processing, according to the dynamic workload condition of the cluster. While the job is running, the administrator can use the Windows HPC Server 2008 Administration Console to monitor the heat map of the broker and compute nodes (which provides an overview of system utilization), and use the job manager to monitor the progress and resource usage of the session job. The resource usage of services is logged so that usage reports based on users, projects, or service names can be created.

Recovering from a Node Failure


Occasionally, an HPC cluster can experience the failure of a compute node or a broker node.

Compute Node Failure
If a compute node fails, the outstanding requests that were sent to that node are rerouted to the remaining service instances. To restore the processing capacity, the WCF broker node requests that the Job Scheduler start a new service instance; the Job Scheduler then determines whether new resources should be allocated to this session based on the available resources, the relative priority of the session compared to other pending sessions, and any other running jobs or sessions. If the request is granted, the new service instance is started and added to the WCF broker node's service instance pool.

WCF Broker Node Failure
When the WCF broker node fails, the processing disruption is more severe. There are two ways to recover from a WCF broker node failure: server-side initiated recovery or client-side initiated recovery.

Server side. For a server-side initiated recovery, the WCF broker node must provide transactional semantics for the message exchange between the client application, the broker node, and the services. A server-side initiated recovery negatively impacts performance and adds management complexity for the persistent storage.

Client side. For a client-side initiated recovery, the client application must keep track of all unfulfilled messages; after a failure, it re-establishes the session and resends the outstanding messages. Because recovery initiated by the client application does not require transactional semantics or central storage for message persistence, it is more efficient and adds no extra management overhead. This type of recovery is straightforward for an interactive scenario.

The following code shows how the client application can recover from a broker node failure. The client application uses a queue to track the unfulfilled requests. Initially the queue contains all the requests, and the client application retrieves the requests and sends them asynchronously. When a CommunicationException occurs, the client application re-queues the affected messages into the unfulfilled request queue, recreates the session, and resumes from where it left off. The requests can all be sent in spite of broker node failures, thereby achieving reliable message delivery. This sample code also shows the use of an anonymous delegate. The anonymous delegate serves as the AsyncCallback function and makes the code more concise by capturing the free variables used in the context; this eliminates the need to create a context class (such as a RequestState class) to stash the variables required by the callback function. It also makes the thread synchronization performed through the ManualResetEvent more readable:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.ServiceModel;
using System.Threading;
using LongRunningSvcClient.ServiceReference1;
using Microsoft.Hpc.Scheduler.Session;

namespace LongRunningSvcClient
{
    class Program
    {
        const int TotalRequests = 40000;
        static Semaphore outstandingRequests = null;

        static void Main(string[] args)
        {
            EndpointAddress epr = null;
            bool createSession = true;
            int numServiceInstances = 1;
            int maxOutstandingRequests = 10;

            if (args.Length > 0)
            {
                numServiceInstances = int.Parse(args[0]);
            }
            if (args.Length > 1)
            {
                maxOutstandingRequests = int.Parse(args[1]);
            }

            // Requests that have not been answered yet. Callbacks run on
            // other threads, so concurrent access is guarded by a lock.
            Queue<int> unfulfilled = new Queue<int>();
            for (int i = 0; i < TotalRequests; i++)
            {
                unfulfilled.Enqueue(i);
            }

            //
            // Loop until all the service calls are completed
            //
            Stopwatch timer = Stopwatch.StartNew();
            long start = 0;
            ManualResetEvent finishedEvt = new ManualResetEvent(false);
            // Declared outside the loop so that a later pass can dispose it
            Session session = null;

            for (;;)
            {
                int cnt = unfulfilled.Count; // requests to send in this pass
                int pending = cnt;           // requests of this pass not yet accounted for
                finishedEvt.Reset();

                if (createSession == true)
                {
                    Console.WriteLine("Creating session...");
                    session = CreateSession(numServiceInstances);
                    if (session == null)
                        return;
                    epr = session.EndpointReference;
                    createSession = false;
                }

                // Create Client Proxy
                Service1Client client = new Service1Client(
                    new NetTcpBinding(SecurityMode.Transport, false), epr);
                client.InnerChannel.OperationTimeout = new TimeSpan(1, 0, 0, 0);
                Console.WriteLine("Proxy created EndpointReference = {0}", epr.ToString());

                bool brokerConnectionBroken = false;
                outstandingRequests = new Semaphore(maxOutstandingRequests, maxOutstandingRequests);
                start = timer.ElapsedMilliseconds;

                for (int sent = 0; sent < cnt; sent++)
                {
                    int n;
                    lock (unfulfilled)
                    {
                        n = unfulfilled.Dequeue();
                    }

                    // Keep the outstanding requests to within [0, maxOutstandingRequests]
                    outstandingRequests.WaitOne();
                    try
                    {
                        client.BeginSquare(
                            n,
                            delegate(IAsyncResult result)
                            {
                                try
                                {
                                    int reply = client.EndSquare(result);
                                    // Console.WriteLine("Square({0})={1}", result.AsyncState, reply);
                                }
                                catch (CommunicationException)
                                {
                                    lock (unfulfilled) { unfulfilled.Enqueue((int)result.AsyncState); }
                                    brokerConnectionBroken = true;
                                }
                                catch (TimeoutException)
                                {
                                    lock (unfulfilled) { unfulfilled.Enqueue((int)result.AsyncState); }
                                }

                                // Signal the main thread once every request of this pass is accounted for
                                if (Interlocked.Decrement(ref pending) == 0)
                                {
                                    finishedEvt.Set();
                                }
                                outstandingRequests.Release();
                            },
                            n); // n is the callback context (AsyncState)
                    }
                    catch (CommunicationException)
                    {
                        // The send itself failed: put the request back and stop
                        // waiting for the requests of this pass that were never sent.
                        lock (unfulfilled) { unfulfilled.Enqueue(n); }
                        brokerConnectionBroken = true;
                        outstandingRequests.Release();
                        if (Interlocked.Add(ref pending, sent - cnt) == 0)
                        {
                            finishedEvt.Set();
                        }
                        break;
                    }
                }

                finishedEvt.WaitOne();

                // All callbacks of this pass have completed, so the queue is stable here
                if (unfulfilled.Count == 0)
                    break;

                if (brokerConnectionBroken == true)
                {
                    // Discard the broken session; a new one is created on the next pass
                    session.Dispose();
                    createSession = true;
                }
            }

            timer.Stop();
            long end = timer.ElapsedMilliseconds;
            // start marks the beginning of the last pass, so this figure is
            // only meaningful when no recovery pass was needed
            Console.WriteLine("throughput is {0}", TotalRequests / ((end - start) / 1000.0));
            Console.WriteLine("Please enter any key to continue...");
            Console.ReadLine();
        }

        static Session CreateSession(int numServiceInstances)
        {
            SessionStartInfo startInfo = new SessionStartInfo("r25-1183d1002", "SquareService1_0");

            #region resource requirements
            startInfo.ResourceUnitType = Microsoft.Hpc.Scheduler.Properties.JobUnitType.Core;
            startInfo.MinimumUnits = 1;
            startInfo.MaximumUnits = numServiceInstances;
            startInfo.Priority = Microsoft.Hpc.Scheduler.Properties.JobPriority.AboveNormal;
            #endregion

            Session session = null;
            try
            {
                session = Session.CreateSession(startInfo);
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
                if (ex.InnerException != null)
                    Console.WriteLine(ex.InnerException.Message);
                return null;
            }
            return session;
        }
    }
}

In a batch scenario, however, there are additional considerations. For example, the client application, which must be kept highly available, may fail after the session has been created. One way to maintain client application availability is to submit the client application as a job: the HPC Job Scheduler Service automatically restarts jobs if the node that they are running on fails. For this scenario, the client application must be able to maintain its checkpoint (when restarted, it must resume from where it left off). To ensure that the client application is fungible (able to run on any compute node), the checkpoint storage (for example, a shared file system or message queuing system) should be accessible from all compute nodes.
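The checkpoint requirement can be illustrated with a minimal sketch. It assumes that the checkpoint is simply the list of request IDs that are still unanswered, written to a text file; the Checkpoint class, the file name, and the use of the local temporary folder as a stand-in for shared storage are all hypothetical and are not part of the Windows HPC Server 2008 API.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: persists the IDs of the requests that are still unanswered.
static class Checkpoint
{
    // Writes one request ID per line. The write is not crash-safe; a production
    // client would need an atomic replace or a transactional store.
    public static void Save(string path, IEnumerable<int> unfulfilled)
    {
        List<string> lines = new List<string>();
        foreach (int id in unfulfilled)
        {
            lines.Add(id.ToString());
        }
        File.WriteAllLines(path, lines.ToArray());
    }

    // Resumes from the checkpoint if one exists; otherwise starts with all requests.
    public static Queue<int> Load(string path, int totalRequests)
    {
        Queue<int> queue = new Queue<int>();
        if (File.Exists(path))
        {
            foreach (string line in File.ReadAllLines(path))
            {
                queue.Enqueue(int.Parse(line));
            }
        }
        else
        {
            for (int i = 0; i < totalRequests; i++)
            {
                queue.Enqueue(i);
            }
        }
        return queue;
    }
}

static class Program
{
    static void Main()
    {
        // Stand-in for a path on storage that every compute node can reach.
        string path = Path.Combine(Path.GetTempPath(), "soa-client.checkpoint");
        File.Delete(path); // start the demonstration from a clean state

        Queue<int> work = Checkpoint.Load(path, 5); // first run: requests 0..4
        work.Dequeue();                             // request 0 completes
        work.Dequeue();                             // request 1 completes
        Checkpoint.Save(path, work);                // record what is left

        Queue<int> resumed = Checkpoint.Load(path, 5); // simulated restart
        Console.WriteLine("Resuming with {0} requests, first is {1}",
                          resumed.Count, resumed.Peek());
        File.Delete(path);
    }
}
```

In a real deployment the path would point to the shared file system or queue mentioned above, so that a client restarted on a different node finds the same checkpoint.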

Security
The service broker supports the standard, interoperable transport, HTTP, and a more efficient transport, TCP. This enables client applications running on third-party platforms to invoke WCF services and lets native Windows clients get the best performance. Table 9 details the security approaches that the SOA system takes to authenticate and authorize user requests for TCP and HTTP bindings. You can turn security on or off by setting the SessionStartInfo.Secure property to true or false.

Table 9 Security Approaches

Bindings | SessionStartInfo.Secure = true | SessionStartInfo.Secure = false
NetTcp | Service broker establishes endpoints on NetTcpBinding with Transport security. Clients are authenticated using Windows integrated security (Kerberos or NTLM). Broker authorizes clients based on their Windows identity. | Service broker establishes endpoints on NetTcpBinding with no security. Clients are not authenticated. Broker accepts every connection and lets it send and receive messages.
Http | Service broker establishes endpoints on BasicHttpBinding with TransportWithMessageCredential security. Traffic is secured by HTTPS. Broker authenticates clients by their user name and password, which are passed in the message headers. Broker authorizes clients based on their Windows identity. | Service broker establishes endpoints on BasicHttpBinding with no security. Clients are not authenticated. Broker accepts every connection and lets it send and receive messages.

Figure 4 shows the security model for SOA.

[Figure 4: diagram. Recoverable labels: Excel with Remote XLL; Client; Session Request/Response; Service Request/Response; Service EPR; WS Msg; Service Router; Service Instance; CCP Service Container; WAS; HTTP/TCP bindings with Transport security mode (credential types: Basic/Windows); TCP binding with Transport security mode (credential types: Basic/Windows).]

Figure 4 Security model for SOA

Sharable Sessions
By default, each client application initiates a new broker on the broker node; this is suitable for applications that require dedicated compute resources for mission-critical, deadline-sensitive workloads. Each service typically primes itself upon startup (placing the data into memory); this helps to ensure a fast response time. However, there are scenarios where multiple users run low-compute, but data-intensive, applications that require each service request to access a wide range of domain data. Having each client application create its own copy of the services at startup can incur a high startup time and can be cost prohibitive. Sharable sessions provide a solution. The Job Scheduler API lets the job property of a session be queried by jobid or jobname; this can then be shared among the client applications, enabling a session created by one client application to be used by other client applications. The main steps are:
1. Creating a shared session (for example, mysharedsession).
2. Starting the broker on the WCF broker node and the service instances on the compute nodes.
3. Getting the EPR by the jobname (in the example, mysharedsession).

Figure 5 shows the steps and the architecture for shared sessions.

Figure 5 Shared sessions

The following code can be used by a producer client application to create a sharable session:
SessionStartInfo startInfo = new SessionStartInfo("HeadNode", "MyService");
// Create a sharable session
startInfo.ShareSession = true;
Session session = Session.CreateSession(startInfo);
// Write the jobId to the output
Console.WriteLine("Broker Job Id is {0}", session.BrokerJob.Id);

The following code can be used by the consumer client applications:


IScheduler sched = new Scheduler();
sched.Connect("HeadNode");
// jobId is the broker job ID that the producer wrote to its output
ISchedulerJob job = sched.OpenJob(jobId);
// Get the Endpoint Reference of the Broker
EndpointAddress epr = new EndpointAddress(job.EndpointAddresses[0]);
// Create a client proxy bound to the broker's endpoint
Service1Client client = new Service1Client(new NetTcpBinding(SecurityMode.None, false), epr);

Service Instance Resourcing Model


The HPC Job Scheduler Service allocates the compute nodes and starts the service instances, which are responsible for hosting endpoints registered on the compute nodes. The Service Instance Resourcing model defines how service instances are mapped to computing resources. There are three Service Instance Resourcing models, as shown in Table 10.

Table 10 Service Instance Resource Models

Resourcing Model | Description
One service process per processor | Used to host services that are linked with non-thread-safe libraries.
One service process per node | Multithreaded services.
One service process per socket | Single-threaded services that are memory-bus intensive.

Table 11 shows the details for each of the resourcing models.

Table 11 Resource Model Details

Resourcing Model | Job Scheduling Type | Example
One service process per processor | SessionStartInfo.ResourceUnitType = Core | C++ analytics services in capital market firms.
One service process per node | SessionStartInfo.ResourceUnitType = Node | Service code that uses multiple processors on a given node.
One service process per socket | SessionStartInfo.ResourceUnitType = Socket | Memory-intensive calculation services.

How Broker Dispatches Requests to Service Instances


The number of batched messages the broker node sends to the services is based on the service resource unit type and the service throttling behavior.

Table 12 Broker and Service Request Dispatching

Resourcing Model | Number of Requests Broker Sends to the Service in One Batch
Core-wide | 1
Node-wide | Number of cores on the node
Socket-wide | Number of cores on the socket

To override the default behavior, configure the ServiceThrottlingBehavior section of your service.dll.config file to specify the maximum concurrent calls a service can take. For example, if you are using the Parallel Extension to write a service and you want to override the default behavior of the node-wide service instance to only receive one request at a time, you can specify the following service behavior in the service.dll.config:
<serviceBehaviors>
  <behavior name="Throttled">
    <serviceThrottling maxConcurrentCalls="1" />
  </behavior>
</serviceBehaviors>

The broker uses the maxConcurrentCalls value as the capacity of the service. This lets the administrator or software developer use a standard WCF setting to fine-tune the broker node's dispatching algorithm to fit the processing capacity of the service.
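The dispatch rule above can be summarized in a short sketch. It assumes that the batch size defaults to the values in Table 12 and is replaced by maxConcurrentCalls whenever that setting is configured; the ResourceUnit enum and the Dispatch class are hypothetical names used only for illustration and are not the broker's actual implementation.

```csharp
using System;

// Hypothetical types for illustration; they are not part of the HPC session API.
enum ResourceUnit { Core, Socket, Node }

static class Dispatch
{
    // Returns the number of requests sent to one service instance in a batch.
    // maxConcurrentCalls is null when no serviceThrottling override is configured.
    public static int BatchSize(ResourceUnit unit, int coresPerSocket,
                                int socketsPerNode, int? maxConcurrentCalls)
    {
        if (maxConcurrentCalls.HasValue)
        {
            return maxConcurrentCalls.Value; // the override is used as the capacity
        }

        switch (unit)
        {
            case ResourceUnit.Core:
                return 1;
            case ResourceUnit.Socket:
                return coresPerSocket;
            case ResourceUnit.Node:
                return coresPerSocket * socketsPerNode;
            default:
                throw new ArgumentOutOfRangeException("unit");
        }
    }
}

static class Program
{
    static void Main()
    {
        // Example node: 2 sockets with 4 cores each.
        Console.WriteLine(Dispatch.BatchSize(ResourceUnit.Core, 4, 2, null));   // 1
        Console.WriteLine(Dispatch.BatchSize(ResourceUnit.Socket, 4, 2, null)); // 4
        Console.WriteLine(Dispatch.BatchSize(ResourceUnit.Node, 4, 2, null));   // 8
        // Node-wide instance throttled to one call at a time.
        Console.WriteLine(Dispatch.BatchSize(ResourceUnit.Node, 4, 2, 1));      // 1
    }
}
```

The last call corresponds to the Parallel Extension example: a node-wide instance that would receive eight requests by default receives one at a time once maxConcurrentCalls is set to 1.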

Broker Configuration Parameters


The following parameters govern the behavior of the broker:

Table 13 Broker Configuration Parameters

Parameters | Descriptions | Defaults
loadSamplingInterval | Service load sampling interval. Unit: millisecond | 1,000
allocationAdjustInterval | Service resource allocation adjustment interval. Unit: millisecond | 60,000
allocationGrowLoadRatioThreshold, allocationShrinkLoadRatioThreshold | Let the load be the number of unfulfilled messages in the broker, and the load ratio be: 100 * load / (number of service instances * number of cores per instance). The processing capacity is considered appropriate if (allocationShrinkLoadRatioThreshold) < (load ratio) < (allocationGrowLoadRatioThreshold). The broker grows the allocation if (load ratio) > (allocationGrowLoadRatioThreshold), and shrinks the allocation if (load ratio) < (allocationShrinkLoadRatioThreshold). | Upper threshold: 125; lower threshold: 75
clientConnectionTimeout | After a session is created, if no client application connects within this timeout period, the session is closed (see Session Life Cycle Model for details). Unit: millisecond | 300,000
clientIdleTimeout | After a client application connects to a session, if the client application does not send messages within this timeout period, the connection is closed by the broker (see Session Life Cycle Model for details). Unit: millisecond | 300,000
sessionIdleTimeout | When all the client applications are idle (timed out), if no more client applications connect within this timeout period, the session is closed (see Session Life Cycle Model for details). sessionIdleTimeout is not supported for the HTTP binding. Unit: millisecond | 0
statusUpdateInterval | The timer interval for the broker to publish service statistics to the job. Unit: millisecond | 15,000
messageThrottleStartThreshold, messageThrottleStopThreshold | The broker stops receiving request messages from the client if the number of queued messages exceeds messageThrottleStartThreshold, and accepts request messages again when the number of queued messages goes below messageThrottleStopThreshold. | Start threshold: 5120; stop threshold: 3840
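As a worked illustration of the load ratio and the grow and shrink thresholds described for Table 13, the following sketch applies the formula to a few load values. The BrokerLoad class and the AllocationAction enum are hypothetical names; the sketch models only the decision rule stated in the table, not the broker's actual code.

```csharp
using System;

enum AllocationAction { Shrink, Keep, Grow }

// Hypothetical helper that applies the load-ratio rule from Table 13.
static class BrokerLoad
{
    // load ratio = 100 * load / (number of service instances * cores per instance)
    public static double LoadRatio(int unfulfilledMessages, int serviceInstances,
                                   int coresPerInstance)
    {
        return 100.0 * unfulfilledMessages / (serviceInstances * coresPerInstance);
    }

    public static AllocationAction Decide(double loadRatio, double growThreshold,
                                          double shrinkThreshold)
    {
        if (loadRatio > growThreshold) return AllocationAction.Grow;
        if (loadRatio < shrinkThreshold) return AllocationAction.Shrink;
        return AllocationAction.Keep; // capacity is considered appropriate
    }
}

static class Program
{
    static void Main()
    {
        // 8 single-core service instances, default thresholds (125 and 75).
        foreach (int load in new int[] { 4, 8, 12 })
        {
            double ratio = BrokerLoad.LoadRatio(load, 8, 1);
            Console.WriteLine("load={0} ratio={1} action={2}",
                              load, ratio, BrokerLoad.Decide(ratio, 125, 75));
        }
        // Prints:
        // load=4 ratio=50 action=Shrink
        // load=8 ratio=100 action=Keep
        // load=12 ratio=150 action=Grow
    }
}
```

With the default thresholds, the allocation stays unchanged while the backlog is between 75 and 125 percent of the available service capacity.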

Session Life Cycle Model


To understand how clientConnectionTimeout, clientIdleTimeout, and sessionIdleTimeout work, it is helpful to understand how the broker manages the session life-cycle model.

[Figure 6: state diagram. States: Created, Busy, Idle, Closed. Transitions: A client connects; All clients disconnect; Client connection timeout; All clients idle timeout; Session idle timeout.]

Figure 6 Broker's session life-cycle model

Figure 6 shows the life-cycle model of a session. After a session is created, it goes through a busy state, an idle state, and ends up in a closed state. If no client application connects within the clientConnectionTimeout period, the session is closed. When a session is in the busy state, a client application that sends no messages for longer than the clientIdleTimeout period has its connection closed by the broker. When all the client applications have disconnected or timed out, the session is in the idle state; if no client application connects to an idle session within the sessionIdleTimeout period, the session is closed.
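The life cycle can also be read as a small state machine. The sketch below is one interpretation of Figure 6, assuming that both "all clients disconnect" and "all clients idle timeout" move a busy session to the idle state and that a new client connection moves an idle session back to busy; the enum and class names are hypothetical.

```csharp
using System;

enum SessionState { Created, Busy, Idle, Closed }

enum SessionEvent
{
    ClientConnects,
    AllClientsDisconnect,
    ClientConnectionTimeout,
    AllClientsIdleTimeout,
    SessionIdleTimeout
}

// Hypothetical model of the broker's session life cycle.
static class SessionLifeCycle
{
    // Returns the next state; events that do not apply leave the state unchanged.
    public static SessionState Next(SessionState state, SessionEvent evt)
    {
        switch (state)
        {
            case SessionState.Created:
                if (evt == SessionEvent.ClientConnects) return SessionState.Busy;
                if (evt == SessionEvent.ClientConnectionTimeout) return SessionState.Closed;
                break;
            case SessionState.Busy:
                if (evt == SessionEvent.AllClientsDisconnect ||
                    evt == SessionEvent.AllClientsIdleTimeout) return SessionState.Idle;
                break;
            case SessionState.Idle:
                if (evt == SessionEvent.ClientConnects) return SessionState.Busy;
                if (evt == SessionEvent.SessionIdleTimeout) return SessionState.Closed;
                break;
        }
        return state;
    }
}

static class Program
{
    static void Main()
    {
        SessionState s = SessionState.Created;
        SessionEvent[] events =
        {
            SessionEvent.ClientConnects,        // Created -> Busy
            SessionEvent.AllClientsIdleTimeout, // Busy -> Idle
            SessionEvent.ClientConnects,        // Idle -> Busy
            SessionEvent.AllClientsDisconnect,  // Busy -> Idle
            SessionEvent.SessionIdleTimeout     // Idle -> Closed
        };
        foreach (SessionEvent e in events)
        {
            s = SessionLifeCycle.Next(s, e);
            Console.WriteLine("{0} -> {1}", e, s);
        }
    }
}
```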

These broker configuration parameters can be controlled at two levels:
1. Broker node level
2. Session level

Broker Node Level Settings
To control the broker settings at the per-node level, specify the monitor element of the HpcWcfBroker.exe.config file in the %CCP_HOME%\bin folder as follows:
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <microsoft.Hpc.Broker>
    <!--configuration to control broker's monitoring behavior-->
    <monitor messageThrottleStartThreshold="5120"
             messageThrottleStopThreshold="3840"
             loadSamplingInterval="1000"
             allocationAdjustInterval="60000"
             allocationGrowLoadRatioThreshold="125"
             allocationShrinkLoadRatioThreshold="75"
             clientIdleTimeout="300000"
             clientConnectionTimeout="300000"
             sessionIdleTimeout="0"
             statusUpdateInterval="15000"/>
  </microsoft.Hpc.Broker>
</configuration>
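The messageThrottleStartThreshold and messageThrottleStopThreshold values in the monitor element form a hysteresis band: throttling starts above the higher value and stops only below the lower one. The following sketch illustrates that behavior under this reading of the parameter descriptions; the MessageThrottle class is hypothetical and is not the broker's implementation.

```csharp
using System;

// Hypothetical model of start/stop message throttling with hysteresis.
sealed class MessageThrottle
{
    private readonly int startThreshold;
    private readonly int stopThreshold;
    private bool throttling;

    public MessageThrottle(int startThreshold, int stopThreshold)
    {
        this.startThreshold = startThreshold;
        this.stopThreshold = stopThreshold;
    }

    // Returns true if request messages are accepted at the given queue depth.
    public bool Update(int queuedMessages)
    {
        if (!throttling && queuedMessages > startThreshold)
        {
            throttling = true;  // too many queued messages: stop receiving
        }
        else if (throttling && queuedMessages < stopThreshold)
        {
            throttling = false; // backlog drained: accept requests again
        }
        return !throttling;
    }
}

static class Program
{
    static void Main()
    {
        MessageThrottle throttle = new MessageThrottle(5120, 3840);
        foreach (int depth in new int[] { 5000, 5200, 4000, 3800 })
        {
            Console.WriteLine("queued={0} accepting={1}", depth, throttle.Update(depth));
        }
        // Prints:
        // queued=5000 accepting=True
        // queued=5200 accepting=False
        // queued=4000 accepting=False
        // queued=3800 accepting=True
    }
}
```

The gap between the two thresholds keeps the broker from switching rapidly between accepting and refusing requests when the queue depth hovers around a single limit.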

Session Level Settings
The broker node settings can be overridden by session level settings from the client application code using the session API. For example, the following code sets the client idle timeout to 1,000 seconds (1,000,000 milliseconds):

SessionStartInfo startInfo = new SessionStartInfo("headnode", "servicename");
startInfo.BrokerSettings.clientIdleTimeout = 1000000;
Session session = Session.CreateSession(startInfo);

Configuring the system for large sessions


The default system configuration supports small to medium sized sessions (under 300 service instances). To support large sessions, you need to change the settings in two places:
1. Broker settings in the HpcWcfBroker.exe.config file
2. .NET port sharing settings in the smsvchost.exe.config file

Broker Settings
To support larger sessions, change the registration binding and the throttle behavior attribute values in the HpcWcfBroker.exe.config file in %CCP_HOME%\bin. For example:
<binding name="RegistrationBinding"
         portSharingEnabled="true"
         maxConnections="4000"
         listenBacklog="1000"
         openTimeout="00:02:00"
         receiveTimeout="00:02:00"
         sendTimeout="00:02:00">
  <security mode="Transport" />
</binding>

<behavior name="Throttled">
  <serviceThrottling maxConcurrentCalls="4096"
                     maxConcurrentSessions="4096" />
</behavior>

Tables 14 and 15 describe the attributes for the broker registration binding and throttle behavior.

Table 14 Broker Registration Binding Attributes

Registration binding attribute | Description
maxConnections | The maximum number of outstanding connections that WCF is expected to handle at once. WCF is most stressed during the initial registration storm. Set this number higher than the maximum number of connections that you expect to service simultaneously.
listenBacklog | The maximum number of queued connection requests that can be pending. For synchronous callers, TCP connection time includes the WCF connection time.
Open, Receive, and Send timeouts | The default timeouts are 10 seconds. Because the WCF broker registration connections are secured and authenticated, the broker can occasionally incur a Domain Controller hit when the cached Kerberos tickets become stale. If the timeouts are set too low, you might get connection failures. Change the timeouts to the HTTP standard disconnect time of 2 minutes.

Table 15 Broker Throttle Behavior Attributes

Throttle behavior attribute | Description
maxConcurrentCalls | The maximum number of calls concurrently in dispatch that WCF can handle. There is no penalty in setting it high, because the incoming startup burst could have many calls concurrently in flight.
maxConcurrentSessions | The maximum number of sessions supported concurrently by the endpoint. Set this higher than the maximum number of expected connections. If this number is not set high enough, session establishment will be refused.

.NET Port Sharing Settings
To support larger sessions, change the net.tcp attribute values in the smsvchost.exe.config file in %windir%\microsoft.net\framework64\v3.0\windows communication foundation\. For example:
<net.tcp listenBacklog="1000"
         maxPendingConnections="1000"
         maxPendingAccepts="8"
         receiveTimeout="00:02:00" />

Table 16 describes the attributes for the net.tcp settings.

Table 16 Port Sharing Attributes

.NET port sharing attribute | Description
maxPendingConnections | The maximum number of connections awaiting final accept. This number should be equal to the registration binding listenBacklog value.
maxPendingAccepts | The maximum number of concurrent threads that the process spins up to accept incoming requests. This number should be set to approximately the number of cores on the cluster.

Monitoring and Managing the SOA Infrastructure


All IT systems need to be maintained efficiently to sustain a high return on investment (ROI) over time. With Windows HPC Server 2008, administrators can monitor user sessions through the Node Management and Job Management Wunderbar in the Administration Console. An administrator can also configure nodes, monitor broker nodes, manage sessions, and troubleshoot run-time problems.

Monitoring the Cluster


Clicking Node Management in the Navigation pane opens the Node Management view in the Administration Console. There are two basic views available in the Node Management center pane:
List View: Shows node properties and resources in a standard list format.
Heat Map View: Provides an at-a-glance view of the node health metrics in a heat map format.

For a quick overview of the overall health and status of all nodes (or for a subset of nodes based on the filtering properties), display the nodes as a metrics heat map, as shown in the following figure.

Figure 7 Node state heat map

From the Heat Map view, you can quickly switch to List view, or take action on the node directly. The list of actions available for a selected node is populated in the Actions pane on the right of the Administration Console, or on the shortcut menu. Double-clicking a selected node opens a dialog box that provides details about the node.

Advanced Monitoring with System Center Operations Manager
Windows HPC Server 2008 provides basic built-in monitoring. It also includes a custom Microsoft System Center Operations Manager Management Pack (to be made available when the product is released to manufacturing [RTM]) that supports advanced monitoring of Windows HPC Server 2008 clusters within the familiar and extensive System Center enterprise management environment. With System Center Operations Manager, administrators can monitor and aggregate events, send e-mail alerts, monitor applications, and perform other services.

Enabling a Broker Node


To run SOA applications, it is necessary to enable a broker node. The head node can be configured as a WCF broker node; however, when a node is configured as a WCF broker, it cannot also be a compute node. To verify that the broker role is enabled on a node, launch the Administration Console. If enabled, WcfBrokerNodes is listed under Groups, as shown in the following figure:

Figure 8 Verifying that broker node is enabled

If the broker node role is not enabled, a WCF broker node can be configured using the following steps:
1. Click Node Management in the Navigation Pane.
2. Navigate to HeadNodes, and then select By Group.
3. On the Actions pane, select Take Offline.
4. Verify in the central Results pane that the state of the node changes to Offline.
5. On the Actions pane, select Change Role.
6. The Change Additional Role dialog is displayed. Choose Select Router Node, and then click OK.

Figure 9 Change Additional Role dialog box

7. On the Actions pane, click Take Online.
8. Verify that RouterNode appears under the head node's Groups column.

Monitoring a Session
When an SOA client application creates a session, the session API creates two jobs: a WCF broker job and a service job. The WCF broker job can only be started on the broker nodes, and the service job can only be started on the compute nodes. If a session uses the core as the allocation type, then there are as many service instances as there are cores allocated to the service job.

Monitoring the WCF Broker Node
WCF broker nodes host the critical SOA infrastructure that serves as the intermediary between the client application and the service. As such, the performance of an SOA application is contingent upon their health. To let an administrator effectively monitor the broker nodes, Windows HPC Server 2008 provides built-in performance counters that address several areas: system, network, and WCF call rates. These performance counters can be viewed from the heat map of the Administration Console and make it possible for the administrator to determine whether the system is in a critical health condition. The Heat Map view is shown in the following figure.

Figure 10 Heat map

There is also a List view, as shown in the following figure.

Figure 11 Monitoring a broker node

By viewing the memory usage of the nodes, for example, an administrator can determine whether certain nodes are reaching their memory threshold, rendering them unfit for new jobs. To ensure that no new jobs start on these nodes, the administrator can take the node offline until the node is below the memory threshold.

Monitoring a Service Job
To monitor a service job, select Active Jobs from the Job Management pane, and then click a running job in the Active Jobs pane. Details of the Job Properties are provided, as shown in the following figure.

Figure 12 Monitoring a service job

Reporting
To view reports of resource usage by a service, select Job Resource Usage from the Charts and Reports pane, and then set Group By to Service. Details of the Service Resource Usage Reports are provided, as shown in the following figure.

Figure 13 Service resource usage report

Troubleshooting and Diagnosing SOA Application Runtime Errors


SOA applications are distributed in nature, and they can present practical challenges for troubleshooting. Sources of errors can include application service errors and system configuration issues. Because services run on remote compute nodes that are commonly shielded behind a firewall and behind the head node, error conditions are hard to access programmatically from the client application. Windows HPC Server 2008 provides exception propagation, making it possible for service faults to be caught and processed by the client application in a transparent fashion. To enable exception propagation, apply the attribute [ServiceBehavior(IncludeExceptionDetailInFaults = true)] to the declaration of the class that implements the service:

[ServiceBehavior(IncludeExceptionDetailInFaults = true)]
public class EchoService : IEchoService
{
    #region IEchoService Members
    // (service operations are not shown in the original sample)
    #endregion
}

Because services can be deployed in an out-of-band fashion and because multiple bindings and topologies for the broker and compute nodes are supported in Windows HPC Server 2008, services may be misdeployed and the system may be misconfigured, resulting in potential application runtime failure.

Application exceptions cannot provide detailed diagnostic information, because the client application often does not have the privileges to access system information. Windows HPC Server 2008 therefore provides two diagnostic tools that let the administrator troubleshoot the system effectively: the service repository test and the service model test.

Service Repository Test


With the service repository test, an administrator can determine which services are installed on particular nodes. The test report contains two sections: a summary section and a details section. The summary section displays a table of services and their registered nodes. Using this section, the administrator can verify whether a service has been successfully deployed on computers that are accessible to users. The details section shows the path, the service, and the contract type of each service for each node. This effectively serves as a post-deployment validation, as shown in the following figure.

Figure 14 Service repository test results

Service Model Test


The service model test checks the system configuration and run-time performance of the SOA infrastructure so that the administrator can ensure that the system is ready to run SOA workloads and can determine whether the system has any bottlenecks. To run the test, perform the following steps:
1. Navigate to the page and select Service Model Test.
2. Click Run test. A node selection dialog appears.
3. Select the nodes on which to run the test, and click OK.
4. Navigate to the Show Result page, as shown in the following figure.

Figure 15 Broker service test

Advanced Programming Topics


Throttling Requests
Given the asynchronous nature of the client programming model, an application can potentially be memory-demanding if the size of the data or the number of messages is very large. To control the memory footprint of these applications, the client application can throttle the requests by sending them in batches. Doing so bounds the memory used on both the client side and the broker side. For throttling to work, the client application uses a semaphore to control the number of outstanding requests that it issues. In the following sample code, the client application limits the outstanding requests to 10: the sending thread blocks further requests if there are 10 requests outstanding, and it resumes sending when it receives a signal from the receiving threads:
// Create a semaphore that can satisfy up to 10 concurrent requests. Use an
// initial count of 10 so that initially the sending thread (main program) can
// send up to 10 requests.
outstandingRequests = new Semaphore(10, 10);
SessionStartInfo info = new SessionStartInfo(scheduler, serviceName);
using (Session session = Session.CreateSession(info))
{
    int i;
    NetTcpBinding binding = new NetTcpBinding(SecurityMode.Transport, false);
    EchoSvcClient client = new EchoSvcClient(binding, session.EndpointReference);
    // set the timeout to 1 day
    client.InnerChannel.OperationTimeout = new TimeSpan(1, 0, 0, 0);
    AsyncResultCount = 100;

    for (i = 0; i < 100; i++)
    {
        // Enters the semaphore. This call will
        // block if there are 10 outstanding
        // requests, until the receiving thread
        // signals it from the callback function
        // EchoCallback()
        outstandingRequests.WaitOne();
        client.BeginEcho("hello world", EchoCallback, new RequestState(client, i));
    }
    // Wait until every response has been processed before closing the client
    AsyncResultsDone.WaitOne();

    client.Close();
}
} // closes the enclosing method (its declaration is not shown)

// receiving thread entry point
static void EchoCallback(IAsyncResult result)
{
    RequestState state = result.AsyncState as RequestState;
    string response = state.GetResult(result);
    Console.WriteLine("Response({0}) = {1}", state.Input, response);
    // Signals the sending thread to resume sending
    outstandingRequests.Release();
    // Signal completion only after the last response has been processed,
    // so that the final response is not skipped
    if (Interlocked.Decrement(ref AsyncResultCount) <= 0)
    {
        AsyncResultsDone.Set();
    }
}
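
The throttling pattern can also be tried without a cluster. The following is a minimal, self-contained sketch, not the sample from the product: the class ThrottleDemo and its Run method are hypothetical names, Task.Run with a short sleep stands in for the BeginEcho call and its callback, and SemaphoreSlim is used in place of Semaphore. It returns the highest number of requests that were outstanding at any one time, which should never exceed the limit.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottleDemo
{
    // Issues totalRequests simulated requests while keeping at most
    // maxOutstanding of them in flight. Returns the peak in-flight count.
    public static int Run(int totalRequests, int maxOutstanding)
    {
        int inFlight = 0;
        int peak = 0;
        var pending = new List<Task>(totalRequests);

        using (var slots = new SemaphoreSlim(maxOutstanding, maxOutstanding))
        {
            for (int i = 0; i < totalRequests; i++)
            {
                // Blocks while maxOutstanding requests are already pending.
                slots.Wait();
                int now = Interlocked.Increment(ref inFlight);
                peak = Math.Max(peak, now); // only the sending thread writes peak

                // Hypothetical stand-in for the asynchronous service call.
                pending.Add(Task.Run(() =>
                {
                    Thread.Sleep(1); // simulated service latency
                    Interlocked.Decrement(ref inFlight);
                    slots.Release(); // lets the sending thread resume
                }));
            }

            // Wait for all responses before disposing the semaphore.
            Task.WaitAll(pending.ToArray());
        }
        return peak;
    }

    public static void Main()
    {
        int peak = Run(100, 10);
        Console.WriteLine("Peak outstanding requests: {0}", peak);
    }
}
```

Because each request takes a slot before it is counted and returns the slot only after it is uncounted, the in-flight count is bounded by the semaphore size regardless of how the thread pool schedules the simulated responses.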

Handling Large Messages


The default WCF message size is 64 KB. To send messages larger than this, set the message size on both the client application side and the service side. On the client application side, set the buffer sizes when creating the net.tcp binding as follows:
NetTcpBinding binding = new NetTcpBinding(SecurityMode.Transport);
binding.MaxBufferSize = 262144;
binding.MaxReceivedMessageSize = 262144;
binding.MaxBufferPoolSize = 1048576;
binding.ReaderQuotas.MaxArrayLength = 131072;
binding.ReaderQuotas.MaxBytesPerRead = 262144;
Service1Client client = new Service1Client(binding, session.EndpointReference);

On the service side, specify the message sizes in the service's configuration file, servicename.dll.config, as follows:
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <system.serviceModel>
    <bindings>
      <!-- configure a binding that support a session -->
      <netTcpBinding>
        <binding name="myBinding">
          <readerQuotas maxArrayLength="1048576"/>
          <security mode="Transport"/>
        </binding>
      </netTcpBinding>
    </bindings>
    <services>
      <service name="WCFService1.Service1">
        <!--behaviorConfiguration="myBinding"-->
        <endpoint address="" binding="netTcpBinding" bindingConfiguration="myBinding" contract="WCFService1.IService1"/>
      </service>
    </services>
  </system.serviceModel>
</configuration>

Reducing the Message Passing Overhead


Because WCF hides object serialization from the developer, it provides an attractive, transparent way of invoking services, and developers may wish to use this model to pass objects in WCF messages. However, larger objects lead to longer serialization times. Depending on the underlying processor speed and memory architecture, there is a point beyond which sending larger messages yields diminishing returns with respect to the network bandwidth, and adding cores to the session does not reduce the processing time. It is good practice to keep messages within the default WCF message size (64 KB) to help ensure efficient processing and resource use.

To reduce the size of the messages, the developer can send references to the objects instead. When the service side loads the data itself, large objects no longer pass through the broker, which avoids the serialization time associated with the WCF stack. Moreover, because parametric sweep applications typically access shared global data, having the client application stash the global data in a data cache and having the services load the data at startup eliminates further data transfer cost: the client application sends only processing-specific data, as opposed to duplicating the global data in each request.
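
The idea of loading shared data on the service side can be sketched as follows. This is an illustrative sketch under stated assumptions rather than code from the product: SweepService, Evaluate, and the one-number-per-line file format are hypothetical, the data cache is assumed to be a file that the service can read, and the WCF contract attributes are omitted so that the example runs on its own. The global data is loaded once on first use, and each request carries only a small, request-specific parameter.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Threading;

// Hypothetical service implementation: the request carries only "scale";
// the large shared data set is read from a path the service can access.
public sealed class SweepService
{
    private readonly Lazy<double[]> _globalData;

    public SweepService(string sharedDataPath)
    {
        // Load the shared data once, on first use, even under concurrent calls.
        _globalData = new Lazy<double[]>(
            () => File.ReadAllLines(sharedDataPath)
                      .Select(line => double.Parse(line, CultureInfo.InvariantCulture))
                      .ToArray(),
            LazyThreadSafetyMode.ExecutionAndPublication);
    }

    // Small request, small response: the global data never crosses the wire.
    public double Evaluate(double scale)
    {
        return _globalData.Value.Sum() * scale;
    }
}

public static class SweepDemo
{
    public static void Main()
    {
        // Stand-in for the client stashing global data into a shared cache.
        string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        File.WriteAllLines(path, new[] { "1", "2", "3", "4" });

        var service = new SweepService(path);
        foreach (double scale in new[] { 0.5, 1.0, 2.0 })
        {
            Console.WriteLine("scale {0} -> {1}", scale, service.Evaluate(scale));
        }
        File.Delete(path);
    }
}
```

In a real WCF service, the cached data would more likely live in a static field so that it outlives individual service instances; the instance field is used here only to keep the sketch easy to exercise.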

Supporting Long Service Calls


To support long service calls, you must adjust the receiveTimeout attribute value on the broker side and the OperationTimeout property value in the client-side code. If a service call lasts more than ten minutes, the broker closes the connection to the client. To avoid interrupting long service calls, include a <servicename>.dll.config file with the service assembly file. Use this configuration file to set a receiveTimeout value on the broker side that allows enough time for the service calls to complete. For example, to allow a service call to take up to 30 minutes, specify the receiveTimeout attribute of the binding element as follows, where <ServiceImplName> and <ServiceContractName> are the implementation and contract names of your service:
<system.serviceModel>
  <services>
    <service name="<ServiceImplName>">
      <endpoint binding="netTcpBinding" bindingConfiguration="ServiceBinding" name="tcpbinding0" contract="<ServiceContractName>" />
      <host>
        <baseAddresses>
          <add baseAddress="net.tcp://localhost:9088/" />
        </baseAddresses>
      </host>
    </service>
  </services>
  <bindings>
    <netTcpBinding>
      <binding name="ServiceBinding" receiveTimeout="00:30:00" portSharingEnabled="true">
        <security mode="Transport" />
      </binding>
    </netTcpBinding>
  </bindings>
</system.serviceModel>

Specify the OperationTimeout property value in the client side code as follows:
NetTcpBinding binding = new NetTcpBinding(SecurityMode.Transport);
//binding.ReceiveTimeout = TimeSpan.FromMinutes(30);
Service1Client client = new Service1Client(binding, session.EndpointReference);
client.InnerChannel.OperationTimeout = TimeSpan.FromMinutes(30);

Summary
Windows HPC Server 2008 combines the power of the Windows Server platform with rich, out-of-the-box functionality to help improve the productivity and reduce the complexity of the HPC environment. Windows HPC Server 2008 can efficiently scale to thousands of processing cores and provides a comprehensive set of deployment, administration, and monitoring tools that are easy to deploy, manage, and integrate with your existing infrastructure. A wide range of software vendors, in various verticals, have been designing their applications to work seamlessly with Windows HPC Server 2008, so that users can submit and monitor jobs from within familiar applications without having to learn new or complex user interfaces.

Windows HPC Server 2008 includes a new and more scalable Job Scheduler that was built to address both batch and newer service-oriented jobs. New job scheduling tools and enhancements include support for SOA workloads. Compute nodes can communicate with the submitting systems through Windows Communication Foundation (WCF) broker nodes, which are dedicated nodes that act as proxies between the public and private networks and that can be added as necessary for additional scalability.

Windows HPC Server 2008 allows solution developers and architects to rapidly develop new interactive applications and easily modify existing distributed computing applications. The developer experience of build/debug/deploy is streamlined, the speed of processing is accelerated, and the management of the applications and systems is simplified with Windows HPC Server 2008.

Glossary
The following terminology is helpful when running Windows HPC Server.

Administration Console
The Administration Console is the overall management interface for cluster administration. Based on the Microsoft System Center user interface, it uses Navigation Bars to quickly change the context and view. The Job Manager Navigation Bar provides a graphical interface to job management and scheduling.

Cluster
A cluster is the top-level unit of Windows HPC Server. A cluster contains the following elements:
Node: A single physical or logical computer with one or more processors. Nodes can be a head node, compute nodes, or WCF Broker nodes.
Queue: An element providing queuing and job scheduling. Each Windows HPC Server cluster contains only one queue, and that queue contains pending jobs. Completed jobs are purged periodically from the queue.
Job: A collection of tasks that a user initiates. Jobs are used to reserve resources for subsequent use by one or more tasks. Users can submit jobs in either interactive or batch processes.
Tasks: A task represents the execution of a program on given compute nodes. A task can be a serial program (single process), or a parallel program (using multi-threading, OpenMP, or MPI).

Job Scheduler
The Job Scheduler queues jobs and their tasks. It allocates resources to these jobs; initiates the tasks on the compute nodes of the cluster; and monitors the status of jobs, tasks, and compute nodes. Job scheduling uses scheduling policies to decide how to allocate resources.
The interface layer provides for job and task submission, manipulation, and monitoring services accessible through various entry points.
The scheduling layer provides a decision-making mechanism that balances supply and demand by applying scheduling policies. The workload is distributed across available nodes in the cluster according to the job profile.
The execution layer provides the workspace used by tasks. This layer creates and monitors the job execution environment and releases the resources assigned to the task upon task completion. The execution environment supplies the workspace customization for the task, including environment variables, scratch disk settings, security context, and execution integrity, in addition to application-specific launching and recovery mechanisms.

Navigation Buttons
A set of buttons at the lower left of the Administration Console that shift the view and context to different areas of Windows HPC Server 2008 management and administration. For example, clicking the Job Management Navigation button opens the Job Manager.

Node Manager
The job agent and authorization service on the compute node. The Node Manager also starts the job on the node.

Scheduling Policies
Windows HPC Server uses the following scheduling policies:
Priority-based first come, first served (FCFS)
Backfilling
Exclusive scheduling
Resource matchmaking
Job template
Multilevel compute resource allocation (MCRA)
Preemption
Adaptive allocation (grow/shrink)

Service-Oriented Architecture
Service orientation provides an evolutionary approach to building distributed computing software that facilitates loosely coupled integration and resilience to change. The advent of the WS-* Web services architecture has made service-oriented software development feasible, with mainstream development tools support and broad industry interoperability. Although most frequently implemented using industry-standard Web services, service orientation is independent of technology and its architectural patterns and can be used to connect with earlier computing packages. Service orientation need not require rewriting functionality from the ground up. By wrapping existing HPC code into modular services, the developer can extract more value from what is already there and extend and evolve the existing applications beyond the boundaries of what they were designed to do. For example, a batch computation solver can be rendered as interactive solver services.

Session
A session is a connection between the application and the services. A session consists of a managed pool of service instances on the compute nodes so that the application can decompose the domain and distribute the calculation requests to the pool to accelerate the processing speed.

Task Execution

Windows HPC Server 2008 has two types of tasks: basic tasks and parametric tasks. A basic task uses a single command line that includes the run command, along with metadata that describes how to run the command. A basic task can be a parallel task and can be run across multiple nodes or cores. Parallel tasks typically communicate with other parallel tasks in the job using the Microsoft Message Passing Interface (MS-MPI), or through shared memory when running on multiple cores on a single node. A parametric task contains a command line with wildcards, letting the same task run many times with different inputs for each step. A parametric task can be a parallel task and can be run across multiple nodes or cores.

Windows Communication Foundation
Windows Communication Foundation (WCF) is the unified programming model from Microsoft for building SOA applications. WCF enables developers to build secure, reliable, transacted solutions that integrate across platforms and interoperate with existing investments.

WCF Broker
Stores and forwards request/response messages between the client application and service instances.

Windows HPC Server 2008
For those seeking productive solutions for high performance computing, Windows HPC Server 2008 provides a comprehensive platform built on Windows Server 2008 that helps to simplify deployment, management, and integration with existing infrastructure, thus helping to improve the productivity of your system administrators, application developers, and users. Windows HPC Server 2008 unites the power of commodity x64-based computers, the security of Active Directory, and the Windows Server 2008 operating system to provide an affordable, easy-to-use, and scalable HPC solution. Windows HPC Server 2008 uses node templates to help simplify and speed deployment of compute nodes using standard Windows Server 2008 deployment technologies. Additional compute nodes can be added to a cluster by simply connecting computers to the network. The Microsoft Message Passing Interface implementation is compatible with the reference MPICH2 and uses high-speed network direct drivers. Integration with Active Directory helps enable role-based security for administration and users, and the use of the System Center user interface model provides a familiar administrative and scheduling interface. The Windows HPC Server 2008 Job Scheduler supports heterogeneous clusters and enables the use of Service-Oriented Architecture applications on the cluster.
