This is a first draft of a technical architecture for SORMA, provided by
Lars Rasmusson from SICS. An architectural task force within SORMA is
currently developing a consolidated architecture.
At the top of the layered architecture are user agents: software
components that create resource specifications encoded in an SLA
language. The SLA language is also understood by brokers, software
components that can act to create the requested service. A broker need
only understand a subset of the SLA language; for instance, one broker
may only be able to understand explicit requests for SUNgrid resources.
Users and brokers register their requests and capabilities in a
messaging framework, the Open Grid Market, which can be either a
centralized database or a peer-to-peer system. The Grid Market matches
the requests with the advertised capabilities and reports back to the
requesting user.
Figure 1. The layered architecture. User agents and brokers speak the
expressive SLA language. The Open Grid Market matches their requests and
puts them in contact. Brokers and resource fabrics speak a
fabric-specific language that allows brokers to acquire resources. The
running nodes are monitored through a shared information gathering
layer, which simplifies the brokers' task of discovering and keeping
track of nodes and their status.
As an example (the numbers correspond to the numbers in Figure 1):
(1) A user submits a request for nodes that can run gcc2.96.
(2) A broker declares that it can provide nodes that run linux2.4 with a
base redhat 8.1 distribution, including gcc2.96 and more. (1+2) The Grid
Market forwards the user's request to the broker, and forwards the
broker's ID to the user agent.
(3) The user agent sends a request to the broker, which replies with an
offer; the user may either ignore the offer or accept it.
(4) The resource
fabrics register their presence in the Information Gathering
Infrastructure, which is a kind of bus where status messages and logs
are aggregated.
(5) The brokers collect information from the bus about the available
resources and their status. Different resource fabrics can advertise
different information, and it is up to the brokers to interpret that
information correctly.
(6) A broker whose offer has been accepted (see message 3 above) contacts
the resource fabric's market/reservation service in order to acquire
resources in accordance with the SLA.
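The negotiation in steps (3) and (6) can be sketched as a small broker-side state machine. This is a minimal sketch, not part of the draft: the `Broker`, `Offer` and `OfferState` names, the integer offer IDs, and the `reserve` callable standing in for the fabric's market/reservation service are all hypothetical. An ignored offer is simply never accepted, so it stays in the `OFFERED` state.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable


class OfferState(Enum):
    OFFERED = auto()   # sent to the user agent; ignored offers stay here
    ACCEPTED = auto()  # user accepted; broker is now talking to the fabric
    RESERVED = auto()  # fabric granted resources in accordance with the SLA


@dataclass
class Offer:
    offer_id: int
    request: str
    state: OfferState = OfferState.OFFERED


class Broker:
    """Hypothetical broker-side bookkeeping for steps (3) and (6)."""

    def __init__(self, reserve: Callable[[str], bool]) -> None:
        # `reserve` stands in for the fabric-specific reservation service (step 6).
        self._reserve = reserve
        self._offers: dict[int, Offer] = {}
        self._next_id = 1

    def handle_request(self, request: str) -> Offer:
        """Step (3): answer a user agent's request with an offer."""
        offer = Offer(self._next_id, request)
        self._offers[offer.offer_id] = offer
        self._next_id += 1
        return offer

    def handle_accept(self, offer_id: int) -> Offer:
        """Step (3) accept, then step (6): try to acquire resources from the fabric."""
        offer = self._offers[offer_id]
        if offer.state is not OfferState.OFFERED:
            raise ValueError(f"offer {offer_id} is not open")
        offer.state = OfferState.ACCEPTED
        if self._reserve(offer.request):
            offer.state = OfferState.RESERVED
        return offer
```

If the fabric refuses the reservation, the offer remains `ACCEPTED`, which leaves the broker free to retry or report failure; the draft does not say which.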
In this architecture there is no centralized bank or currency. Not
having to convert between currencies makes it easier to plug in other,
already existing frameworks. So, at least initially, currency is
specific to each resource fabric: a user can, for instance, request
Tycoon nodes only if it has money in the Tycoon bank.
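To illustrate what fabric-specific currency means in practice, here is a minimal sketch assuming a hypothetical `FabricWallets` ledger that keys balances by (user, fabric) and deliberately offers no conversion between fabrics. The second fabric name and all amounts are illustrative only.

```python
class FabricWallets:
    """Hypothetical per-fabric balances; there is no exchange rate between fabrics."""

    def __init__(self) -> None:
        self._balances: dict[tuple[str, str], float] = {}

    def deposit(self, user: str, fabric: str, amount: float) -> None:
        if amount <= 0:
            raise ValueError("deposit must be positive")
        key = (user, fabric)
        self._balances[key] = self._balances.get(key, 0.0) + amount

    def balance(self, user: str, fabric: str) -> float:
        return self._balances.get((user, fabric), 0.0)

    def can_request(self, user: str, fabric: str, price: float) -> bool:
        # Only money held in this fabric's own bank counts; funds elsewhere are ignored.
        return self.balance(user, fabric) >= price
```

Because balances are never converted, supporting a new fabric only means accepting a new fabric key, which is the plug-in property the paragraph argues for.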
The architecture sketch presented here does not yet cover security, nor
does it specify which protocols should be used. These issues will be
addressed by iteratively upgrading the protocols based on the use-case
requirements.
Example:
An initial, simple SLA language can only describe applications (yes, this is VERY simplified!). Its messages have the format:
SLA-version: 0.1
Action: (Tell|Ask)
Application: <name>
Broker-IP: <IP>:<port>
Host: <IP>:<port>
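Because every SLA 0.1 message is just a list of `Field: value` lines, parsing and printing one takes only a few lines. The following sketch is illustrative rather than a reference implementation: the names `parse_sla`, `format_sla` and `split_endpoint` are hypothetical, and rejecting unknown fields is our own assumption.

```python
SLA_FIELDS = ("SLA-version", "Action", "Application", "Broker-IP", "Host")


def parse_sla(text: str) -> dict[str, str]:
    """Parse an SLA 0.1 message into a field -> value dict."""
    fields: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split on the first colon only, so "Broker-IP: 193.10.66.141:7685" keeps its port.
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or key not in SLA_FIELDS:
            raise ValueError(f"unexpected line: {raw!r}")
        fields[key] = value.strip()
    if fields.get("SLA-version") != "0.1":
        raise ValueError("only SLA-version 0.1 is supported")
    if fields.get("Action") not in ("Tell", "Ask"):
        raise ValueError("Action must be Tell or Ask")
    return fields


def format_sla(fields: dict[str, str]) -> str:
    """Print the fields back in the order used in the examples."""
    return "".join(f"{k}: {fields[k]}\n" for k in SLA_FIELDS if k in fields)


def split_endpoint(value: str) -> tuple[str, int]:
    """Split an "<IP>:<port>" value such as a Broker-IP or Host field."""
    host, _, port = value.rpartition(":")
    return host, int(port)
```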
A broker registers with the Grid Market by sending, for example:
SLA-version: 0.1
Action: Tell
Application: gcc4.0.2
Broker-IP: 193.10.66.141:7685
A user queries the Grid Market by sending, for example:
SLA-version: 0.1
Action: Ask
Application: gcc4.0.2
The Grid Market has saved all the broker announcements, and does a
simple string match on the SLA-version: and Application: fields to
determine which brokers can handle the request. In our example, the Grid
Market replies to the user with
SLA-version: 0.1
Action: Tell
Application: gcc4.0.2
Broker-IP: 193.10.66.141:7685
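The Grid Market's version 0.1 behaviour (remember every Tell, and answer an Ask with every stored Tell whose SLA-version: and Application: strings match exactly) fits in a short class. In this sketch messages are plain dicts of field to value; the `GridMarket` class and its `handle` method are hypothetical, and storage is in memory only.

```python
class GridMarket:
    """Hypothetical in-memory Grid Market for SLA 0.1."""

    def __init__(self) -> None:
        self._announcements: list[dict[str, str]] = []

    def handle(self, msg: dict[str, str]) -> list[dict[str, str]]:
        """Store a broker's Tell, or answer a user's Ask with the matching Tells."""
        if msg["Action"] == "Tell":
            self._announcements.append(dict(msg))
            return []
        # Ask: simple exact string match on SLA-version and Application.
        return [
            dict(a)
            for a in self._announcements
            if a.get("SLA-version") == msg.get("SLA-version")
            and a.get("Application") == msg.get("Application")
        ]
```

Replaying the example, the Ask for gcc4.0.2 returns exactly the broker's announcement, which is the reply shown above.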
The user can now connect directly to the broker and send its query again:
SLA-version: 0.1
Action: Ask
Application: gcc4.0.2
to which the broker replies
SLA-version: 0.1
Action: Tell
Application: gcc4.0.2
Broker-IP: 193.10.66.141:7685
Host: 193.10.66.20:8483
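How the broker chooses the Host: value is not specified for version 0.1. One plausible policy, sketched here purely as an assumption, is to recommend the least loaded node that the information gathering infrastructure reports for the requested application. The `broker_reply` function and the shape of its `candidates` argument are hypothetical.

```python
from typing import Optional


def broker_reply(
    query: dict[str, str],
    broker_ip: str,
    candidates: list[tuple[str, float]],
) -> Optional[dict[str, str]]:
    """Answer a user's Ask with a Tell naming one Host, or None if no host fits.

    `candidates` is assumed to hold (Host, CPU-Load) pairs for nodes already
    known to run query["Application"].
    """
    if query.get("SLA-version") != "0.1" or query.get("Action") != "Ask":
        return None
    if not candidates:
        return None
    # Assumed policy: recommend the least loaded node.
    host, _load = min(candidates, key=lambda c: c[1])
    return {
        "SLA-version": "0.1",
        "Action": "Tell",
        "Application": query["Application"],
        "Broker-IP": broker_ip,
        "Host": host,
    }
```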
This means that the user can connect to a gcc service at that host. We
will of course eventually have to specify whether the interface is SOAP,
RPC, ssh, or something else. For version 0.1, however, we are satisfied
with a simple interface: one telnets to the host, pipes a tarball of
files to stdin, and reads the tarballed result from stdout. Note that
the 0.1 protocol has no provision for any other transport protocol. We
leave that for later versions, perhaps by adding a Transport: field to
the SLA.
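The version 0.1 job interface can be exercised with Python's standard `socket` and `tarfile` modules. This is a sketch under stated assumptions: "telnet" is taken to mean a raw TCP connection (a real telnet client may treat some bytes in binary data as telnet commands), and the service is assumed to read stdin until end-of-file before writing its result tarball and closing the connection. The helper names are hypothetical.

```python
import io
import socket
import tarfile


def make_tarball(files: dict[str, bytes]) -> bytes:
    """Pack {name: contents} into an uncompressed tar archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_tarball(data: bytes) -> dict[str, bytes]:
    """Unpack the regular files of a tar archive into {name: contents}."""
    out: dict[str, bytes] = {}
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as tar:
        for member in tar.getmembers():
            if member.isfile():
                extracted = tar.extractfile(member)
                if extracted is not None:
                    out[member.name] = extracted.read()
    return out


def run_job(host: str, port: int, files: dict[str, bytes], timeout: float = 60.0) -> dict[str, bytes]:
    """Send the input tarball to the service's stdin and return its result tarball."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(make_tarball(files))
        sock.shutdown(socket.SHUT_WR)  # assumed to signal end of stdin to the remote process
        chunks = []
        while True:
            chunk = sock.recv(65536)
            if not chunk:  # remote side closed: the result is complete
                break
            chunks.append(chunk)
    return read_tarball(b"".join(chunks))
```

For the example above one would call `run_job("193.10.66.20", 8483, {"hello.c": b"..."})`; what comes back depends entirely on the gcc service. Half-closing the socket lets the client mark the end of its input while still reading the reply.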
Anyway, the point is that the user can now easily get jobs running. And,
more importantly, the protocol is so simple, and not the least bit
general, that you are all following what is happening so far.
Now, how did the broker know that there was a resource fabric host with
a gcc service running? The answer is that it talked to the information
gathering infrastructure, which in its first incarnation is very similar
to the Grid Market. It understands messages of the format:
RESOURCE-version: 0.1
Action: (Tell|Ask)
Application: <name>
Host: <IP>:<port>
Poll: <IP>:<port>
CPU-Load: <load>
Date: <time>
Just as a broker registers with the Grid Market, a resource fabric node
registers with the information gathering infrastructure, which then
starts polling the node every 10 seconds for its current CPU-Load. It
records this information so that brokers can ask for load information,
for instance to recommend a lightly loaded node to their users. The
Poll: field gives the IP address and port to which one should telnet to
get updated information.
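The polling duty can be sketched as follows. Everything beyond the 10-second interval stated above is an assumption: the `Poller` class is hypothetical, a node is assumed to answer a connection on its Poll: address by writing text that contains a `CPU-Load: <value>` line and then closing, and the fetch function is injectable so that the scheduling logic can be tried without a network. Time is passed in explicitly, in seconds.

```python
import socket
from typing import Callable

POLL_INTERVAL = 10.0  # seconds, as in the version 0.1 description


def fetch_over_tcp(endpoint: str, timeout: float = 5.0) -> str:
    """Connect to an "<IP>:<port>" Poll address and read until the node closes."""
    host, _, port = endpoint.rpartition(":")
    with socket.create_connection((host, int(port)), timeout=timeout) as sock:
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_cpu_load(text: str) -> float:
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "CPU-Load":
            return float(value)
    raise ValueError("no CPU-Load field in poll answer")


class Poller:
    """Hypothetical poller refreshing each registered node's CPU-Load every POLL_INTERVAL."""

    def __init__(self, fetch: Callable[[str], str] = fetch_over_tcp) -> None:
        self._fetch = fetch
        self.nodes: dict[str, dict[str, str]] = {}  # Poll address -> latest registration
        self._next_due: dict[str, float] = {}

    def register(self, msg: dict[str, str], now: float) -> None:
        poll = msg["Poll"]
        self.nodes[poll] = dict(msg)
        self._next_due[poll] = now + POLL_INTERVAL

    def poll_due(self, now: float) -> list[str]:
        """Poll every node whose interval has elapsed; return the addresses polled."""
        polled = []
        for poll, due in list(self._next_due.items()):
            if due > now:
                continue
            polled.append(poll)
            self._next_due[poll] = now + POLL_INTERVAL
            try:
                load = parse_cpu_load(self._fetch(poll))
            except (OSError, ValueError):
                continue  # unreachable or garbled node: keep the last known values
            self.nodes[poll]["CPU-Load"] = str(load)
        return polled
```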
The resource node sends
RESOURCE-version: 0.1
Action: Tell
Application: gcc4.0.2
Host: 193.10.66.20:8483
Poll: 193.10.66.20:8484
CPU-Load: 0.2
Date: 34857934
to the information gathering infrastructure. The broker queries the information gathering infrastructure with
RESOURCE-version: 0.1
Action: Ask
Application: gcc4.0.2
CPU-Load: 0.5
Date: 34857924
which, in protocol version 0.1, means that we want one registration
where Date is greater than 34857924, CPU-Load is less than 0.5, and
Application is exactly gcc4.0.2. In our example, the broker gets back
the message that the resource fabric node sent to the information
gathering infrastructure. Again, more sophisticated queries, such as
subscriptions and ranges, are left for later versions of the protocol.
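The version 0.1 query semantics (exact Application, strictly later Date, strictly lower CPU-Load, one registration returned) translate directly into a predicate. In this sketch the function names are hypothetical, Date is compared numerically on the assumption that it is an integer timestamp, and returning the first match in arrival order is our own choice, since the text only asks for one registration.

```python
from typing import Optional


def parse_resource(text: str) -> dict[str, str]:
    """Parse a RESOURCE 0.1 message of "Field: value" lines."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # first colon only, so ports survive
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def matches(reg: dict[str, str], query: dict[str, str]) -> bool:
    return (
        reg.get("Action") == "Tell"
        and reg.get("Application") == query.get("Application")
        and int(reg["Date"]) > int(query["Date"])
        and float(reg["CPU-Load"]) < float(query["CPU-Load"])
    )


def answer(registrations: list[dict[str, str]], query: dict[str, str]) -> Optional[dict[str, str]]:
    """Return one matching registration, or None."""
    for reg in registrations:
        if matches(reg, query):
            return reg
    return None
```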
A simple version 0.1 implementation of the information gathering
infrastructure will simply keep a list of all the Tell messages it has
received during the last hour, and will poll every registered resource
node once every 10 seconds by telnetting to its Poll address and reading
stdout. Other implementations are of course possible; one could, for
example, implement an infrastructure that capped the number of polls at
one per second. The point is that this can be done without changing the
protocols or any of the other components.
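The two policies mentioned here, keeping Tell messages for one hour and, as an alternative, capping polls at one per second, are small enough to sketch side by side. Both classes are hypothetical and take time explicitly in seconds; swapping one policy for another changes nothing in the message formats, which is the point the paragraph makes.

```python
from collections import deque
from typing import Optional


class TellLog:
    """Hypothetical v0.1 store: every Tell message received during the last hour."""

    def __init__(self, retention: float = 3600.0) -> None:
        self._retention = retention
        self._entries: deque = deque()  # (arrival_time, message), oldest first

    def add(self, msg: dict[str, str], now: float) -> bool:
        """Record msg if it is a Tell; return whether it was stored."""
        if msg.get("Action") != "Tell":
            return False
        self._entries.append((now, dict(msg)))
        self._expire(now)
        return True

    def recent(self, now: float) -> list[dict[str, str]]:
        self._expire(now)
        return [msg for _, msg in self._entries]

    def _expire(self, now: float) -> None:
        # Entries arrive in time order, so expired ones are always at the left end.
        while self._entries and now - self._entries[0][0] > self._retention:
            self._entries.popleft()


class PollCap:
    """Alternative policy: allow at most one poll per `min_gap` seconds overall."""

    def __init__(self, min_gap: float = 1.0) -> None:
        self._min_gap = min_gap
        self._last: Optional[float] = None

    def allow(self, now: float) -> bool:
        if self._last is not None and now - self._last < self._min_gap:
            return False
        self._last = now
        return True
```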
So, with this simple first version in place, we can now decide which
parts of the protocol should change in the next versions. For instance,
if SOAP-based messages turn out to be necessary, we should define a
SOAP-based interface for some later version of the protocol. Again, the
protocol need not be feature-complete from the start, but the transition
from where we currently are to where we want to go should be absolutely
clear.