
The ZOOM ErrorLogger Package:

Collective Error Logging

The General Idea

The Need

The experiment has several (or many) processes, probably on multiple CPUs, running jobs as part of a unified overall production. These jobs will on occasion issue error messages using the ZOOM ErrorLogger package. (We will refer to these jobs as the "clients.")

However, instead of many distinct Error Log files, the experiment would like all these error messages to come out in a unified way from one collecting process which we will refer to as the "server."

Meeting this Need

The package now provides a new ELdestination named ELcollected. ELcollected has a user interface just like the usual ELoutput destination, and in fact is derived from ELoutput, so the changeover to collective logging for a client program already doing logging should be very easy.

The package also has a way to invoke an ErrorLogger supplying a buffer of message data. This call, used on the server side, lets you take the message that has come in and get it to the server's error log as if it were an ordinary locally-issued error message (but with the remote process and context indicated).

The Transport Mechanism

Each experiment (and possibly each level of trigger and/or reconstruction within an experiment) is likely to have its own preferred means of moving data between the clients and the server. The ErrorLogger package does not try to dictate how this is done. Instead, it lets the experiment provide the transport itself: a sending class on the client side and the receiving code on the server side, as described in the sections that follow.

Setting Up at the Sending End

Establishing the Destination

In the place where your framework had been setting up an ELoutput destination, set up an ELcollected instead (or in addition). This class is derived from ELoutput, and you can control its behavior in the same way. The difference is that the constructor for an ELcollected takes an ELsender object (see below) that defines how the transport is done. Note that you can, if you wish, establish multiple ELcollected destinations, with different transport mechanisms, thresholds, and so forth.

Defining the Transport Mechanism

To define the transport mechanism used by the experiment, you will need to declare a class derived from ELsender. The header of this class is quite simple and follows a fixed boilerplate, as does the implementation of the clone method.

The constructor can be completely trivial, or it might take arguments to specify something about the desired transport. For example, if there are several possible points of collection in a VME-based system, you can have the constructor take a VME bus address or a similar identifier.

We have a toy example in BasicColl.cc which mocks up a transport mechanism by writing to a file which will later be read by a server job; there, MySender has a data member specifying the file.

The key method is send(nbytes, data); this has to take the nbytes bytes starting at pointer data and transmit them to the desired server.
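As an illustration of the shape such a class might take, here is a minimal sketch of a file-based sender. It is not the code from BasicColl.cc: the ELsender base class is re-declared locally as a stand-in (the real header may use different signatures), FileSender is a hypothetical name, and the length-prefixed record layout is an assumption chosen so that a reader can find message boundaries.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

// Stand-in for the package's ELsender base class (assumption: the real
// header may declare these methods with different signatures).
class ELsender {
public:
  virtual ~ELsender() {}
  virtual void send(int nbytes, const char* data) = 0;
  virtual ELsender* clone() const = 0;
};

// Hypothetical sender that appends each message to a file as
// [4-byte length][payload], so a reader can recover message boundaries.
class FileSender : public ELsender {
public:
  explicit FileSender(const std::string& path) : path_(path) {}

  // Transmit nbytes starting at data; here "transmit" means append to the file.
  void send(int nbytes, const char* data) override {
    std::ofstream out(path_, std::ios::binary | std::ios::app);
    std::uint32_t len = static_cast<std::uint32_t>(nbytes);
    out.write(reinterpret_cast<const char*>(&len), sizeof len);
    out.write(data, nbytes);
  }

  // Return a heap-allocated copy; the caller owns the result.
  FileSender* clone() const override { return new FileSender(*this); }

private:
  std::string path_;  // file that stands in for the transport
};
```

The constructor argument plays the role described above: it tells the sender where its messages should go. A real transport would replace the file write with whatever the experiment uses to move bytes.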

Logging Error Messages

The rest of the code, which might log error messages in various places using an ErrorLog object, remains unchanged.

Setting Up at the Collecting (Server) End

Fundamentally, the server has to set up its own error log, with whatever destinations it wants. Then when each message arrives via the transport mechanism, it must pass the bytecount and data to the errlog object.

A toy example in BasicRecv.cc acts as the server end of the mock transport mechanism used in BasicColl.cc, reading one message at a time from the file that was created and responding to each as if it had come from a genuine asynchronous transport mechanism.
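Reading one message at a time from a file requires some way to tell where each message ends. The sketch below assumes each record is stored as a 4-byte length followed by the payload; that layout and the helper name readNextMessage are illustrative assumptions, not the format used by BasicRecv.cc.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: read the next [4-byte length][payload] record from
// an open stream into buffer. Returns false at end of file or when a
// record is truncated.
bool readNextMessage(std::istream& in, std::vector<char>& buffer) {
  std::uint32_t len = 0;
  if (!in.read(reinterpret_cast<char*>(&len), sizeof len)) return false;
  buffer.resize(len);
  if (len > 0 && !in.read(buffer.data(), len)) return false;
  return true;
}
```

After a successful call, buffer.size() and buffer.data() supply the byte count and data that the server then hands to its error log.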

Setting up the Server's Error Log

The Server process is like any job in its use of the ErrorLogger package. That is, it calls ELadministrator::instance() to get an ELadministrator, attaches one or more ELdestinations to that, and instantiates an ErrorLog which we conventionally call errlog.

But here errlog is intended to field error messages stemming from the remote client processes, rather than from problems arising in the server itself.

Notice that we assigned a very short pkgName to errlog in this example. That is because this name will be prepended to the module name of each message coming from a client. For readability (and especially in statistics keeping, where only the first 16 characters of the module name are keyed upon) it is desirable to keep this name short.
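To see why a long package name hurts statistics keeping, consider the following sketch. It assumes the package name is joined to the client's module name with a single ":" character; the real package may format the combined name differently, and both function names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumption: the server's package name is joined to the client's module
// name with one separator character.
std::string combinedModule(const std::string& pkgName,
                           const std::string& module) {
  return pkgName + ":" + module;
}

// Only the first 16 characters of the module name serve as the key.
std::string statisticsKey(const std::string& moduleName) {
  const std::size_t keyLength = 16;
  return moduleName.substr(0, keyLength);
}
```

Under these assumptions, a 15-character package name uses up the entire key, so messages from different client modules would be counted together, while a two-character name leaves room for most of the module name.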

The server process may well have duties beyond acting as a central collection point for error messages. If the server process itself may need to issue error messages, we recommend using one or more other ErrorLogs, each of which can have an appropriate package name.

Fielding Messages

The key activities in fielding a message sent from the client process are tied into the transport mechanism, as discussed below. Once the process is aware that a message has been received, and has the nbytes of the message located at *data, the server must pass these to errlog, supplying nbytes and data as its arguments.

This looks like issuing any error message, but notice that the arguments of errlog now pertain to the message, rather than being a severity and an id. This special form of issuing an error message takes its information from what was captured on the client when the ELcollected destination got the error, rather than using local information. Thus the output to any destination on the server will look like the output the issued error message would have produced on the client.

The following example code hypothetically detects the presence of a message by polling, assuming that a function getAmessage creates a buffer and places the message data in it, or returns false if no new message is present.

In realistic cases it may be more likely that some asynchronous mechanism is used instead.
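A polling loop of this kind can be sketched as follows. Everything here is a stand-in: the in-memory queue replaces a real transport, the getAmessage signature is a guess at one workable form, and CollectingLog only imitates an error log that can be called with a byte count and a data pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Stand-in for the transport: messages that have "arrived" wait in a queue.
std::deque<std::string> incoming;

// Hypothetical getAmessage: copies the next message into buffer and
// returns true, or returns false if no new message is present.
bool getAmessage(std::vector<char>& buffer) {
  if (incoming.empty()) return false;
  buffer.assign(incoming.front().begin(), incoming.front().end());
  incoming.pop_front();
  return true;
}

// Stand-in for the server's error log: callable with (nbytes, data).
struct CollectingLog {
  std::vector<std::string> received;
  void operator()(int nbytes, const char* data) {
    received.push_back(std::string(data, static_cast<std::size_t>(nbytes)));
  }
};

// Drain every message currently available and hand each one to the log.
// Returns the number of messages handled in this pass.
int pollOnce(CollectingLog& errlog) {
  std::vector<char> buffer;
  int handled = 0;
  while (getAmessage(buffer)) {
    errlog(static_cast<int>(buffer.size()), buffer.data());
    ++handled;
  }
  return handled;
}
```

A server with other duties would call pollOnce periodically between its other tasks; with an asynchronous transport, the body of the loop would instead run in the callback that signals a message's arrival.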

Two small wrinkles in the product as currently implemented:

Receiving a Message - The Server Side of the Transport Mechanism

Of course, the transport mechanism is up to the experiment. However, it needs to have the following properties:



Mark Fischler
Last modified: August 6, 1999