Collective Error Logging
When many processes each do their own logging, the experiment would like, instead of many distinct error log files, all these error messages to come out in a unified way from one collecting process, which we will refer to as the "server."
ELcollected.
ELcollected has a user interface just like the usual ELoutput destination,
and in fact is derived from ELoutput, so the changeover to collective
logging for a client program already doing logging should be very easy.
The package also has a way to invoke an ErrorLogger supplying a buffer of message data. This call, used on the server side, lets you take the message that has come in and get it to the server's error log as if it were an ordinary locally-issued error message (but with the remote process and context indicated).
On the client, where you would ordinarily set up an ELoutput
destination, instead (or in addition) set up an ELcollected.
This class is derived from ELoutput, and you can control its
behavior in the same way. But the constructor for an ELcollected takes
an ELsender object (see below) that defines how the transport
mechanism works.
MySender transmitter; // not "MySender transmitter ( );", which C++ parses as a function declaration
ELcollected collectedD ( transmitter );
ELdestControl collected ( logger->attach(collectedD) );
collected.setThreshold ( ELwarning );
// and other controlling statements
Note that you can, if you wish, establish multiple ELcollected
destinations, with different transport mechanisms, thresholds, and so forth.
The transport mechanism is defined by a class you write, derived from ELsender.
The header of this class is quite simple and should follow this boilerplate:
class MySender : public ELsender {
public:
  MySender ( ); // or a signature taking some arguments
  void send (int nbytes, const char * data);
  ZM_COVARIANT_TYPE(ELsender *, MySender *) clone() const;
protected:
  // Any data you need to know to do your transport mechanism.
};
The implementation of the clone method also follows
a simple boilerplate:
ZM_COVARIANT_TYPE(ELsender *, MySender *) MySender::clone() const {
  return new MySender ( *this );
}
The constructor can be completely trivial, or it might take arguments
to specify something about the desired transport. For example, if there
are several possible points of collection in a VME-based system, you can
have the constructor take a VME bus address or whatever.
We have a toy example in BasicColl.cc which mocks up a transport mechanism by writing to a file which will later be read by a server job; there MySender has a data member specifying the file, and the constructor looks like
MySender::MySender ( ofstream & fs ) : file (fs) {}
The key method is send(nbytes, data); this has to take the
nbytes bytes starting at pointer data and transmit them
to the desired server.
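As a rough illustration of what a send method might do, here is a minimal, self-contained sketch of a stream-based sender in the spirit of the file-based toy transport. It does not use the ErrorLogger headers: SenderBase is a stand-in for ELsender (without clone), and StreamSender, framedBytes, and the record format (a 4-byte length in host byte order followed by the raw bytes) are assumptions made for this sketch, not the format used by BasicColl.cc.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Stand-in for the package's ELsender base class (assumption: the real
// class comes from the ErrorLogger headers and also declares clone()).
class SenderBase {
public:
  virtual ~SenderBase() {}
  virtual void send(int nbytes, const char* data) = 0;
};

// Hypothetical sender: appends each message to a stream as a 4-byte
// length (host byte order) followed by the message bytes.
class StreamSender : public SenderBase {
public:
  explicit StreamSender(std::ostream& os) : out(os) {}

  void send(int nbytes, const char* data) override {
    if (nbytes < 0) return;  // nothing sensible to transmit
    const std::uint32_t len = static_cast<std::uint32_t>(nbytes);
    out.write(reinterpret_cast<const char*>(&len), sizeof len);
    out.write(data, nbytes);
    out.flush();  // make the record visible to a reader promptly
  }

private:
  std::ostream& out;  // the "transport": any output stream, e.g. a file
};

// Helper used only to exercise the sketch: the bytes written for one message.
std::string framedBytes(const std::string& msg) {
  std::ostringstream os;
  StreamSender sender(os);
  sender.send(static_cast<int>(msg.size()), msg.data());
  return os.str();
}
```

The length prefix is there so that a reader can tell where one message ends and the next begins; because it is written in host byte order, this sketch assumes the reader runs on the same kind of machine as the writer.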
errlog
object.
A toy example in BasicRecv.cc acts as the server end of the mock transport mechanism used in BasicColl.cc, reading one message at a time from the file created and responding to it as if the message came from a genuine asynchronous transport mechanism.
Like any program that does logging, the server calls
ELadministrator::instance() to get an
ELadministrator, attaches one or more ELdestinations
to that, and instantiates an ErrorLog, which we conventionally
call errlog.
But here errlog is intended to field error messages stemming from
the remote client processes, rather than from problems arising in the server
itself.
ELadministrator * logger = ELadministrator::instance();
ELoutput logfileD( "logFileName.errlog" );
ELdestControl logfile ( logger->attach(logfileD) );
ErrorLog errlog ("*");
Notice that we assigned a very short package name (pkgName) to
errlog in this example.
That is because this name will be
prepended to the module name of each message coming from a client.
For readability (and especially in statistics keeping, where only the first
16 characters of the module name are keyed upon) it is desirable to keep
this name short.
The server process may well have duties beyond acting as a central collection
point for error messages. If the server process itself may need to
issue error messages, we recommend using one or more other
ErrorLogs,
each of which can have an appropriate package name.
ErrorLog servererrlog ("From Server");
When the transport delivers a message, with nbytes of the message located at *data, the
server must do:
errlog (nbytes,data) << endmsg;
This looks like issuing any error message, but notice that the arguments of
errlog now pertain to the message,
rather than a severity and an id.
This special form of issuing an error message gets the following information
which was captured on the client when the ELcollected destination got the
error, rather than using local information:
The following example code hypothetically detects the presence of a
message by polling, assuming that a function getAmessage
creates a buffer and places the message data in it, or returns false
if no new message is present.
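int n;       // number of bytes in the received message (assumed type)
char * data; // buffer allocated by the hypothetical getAmessage
while (1) {
  if (getAmessage(n, data)) {
    errlog (n,data) << endmsg;
    delete[] data;
  }
}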
In realistic cases it may be more likely that some asynchronous mechanism
is used instead.
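To make the polling idea concrete, here is a minimal sketch of what a getAmessage-style function could look like for a stream-based mock transport. Everything in it is an assumption made for illustration: the name getAmessageFrom, the extra stream argument, and the record format (a 4-byte host-order length followed by the message bytes) are not taken from BasicRecv.cc, and the input is assumed to be well formed.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical poll: tries to read one complete record from `in`.
// On success, allocates a buffer with new[] (the caller must delete[] it),
// sets n and data, and returns true. If no complete record is available,
// restores the read position and returns false so a later poll can retry.
bool getAmessageFrom(std::istream& in, int& n, char*& data) {
  const std::streampos start = in.tellg();
  std::uint32_t len = 0;
  if (!in.read(reinterpret_cast<char*>(&len), sizeof len)) {
    in.clear();        // reset eof/fail so the stream is usable again
    in.seekg(start);   // forget any partial length we consumed
    return false;
  }
  char* buf = new char[len > 0 ? len : 1];
  if (len > 0 && !in.read(buf, len)) {
    delete[] buf;      // incomplete payload: treat as "no message yet"
    in.clear();
    in.seekg(start);
    return false;
  }
  n = static_cast<int>(len);
  data = buf;
  return true;
}

// Helper for the sketch: builds one record in the assumed format.
std::string frame(const std::string& msg) {
  const std::uint32_t len = static_cast<std::uint32_t>(msg.size());
  return std::string(reinterpret_cast<const char*>(&len), sizeof len) + msg;
}

// Helper for the sketch: polls until no complete record remains and
// returns the payloads joined by '|'. A real server would instead pass
// each (n, data) pair to errlog.
std::string drain(std::istream& in) {
  std::string all;
  int n = 0;
  char* data = nullptr;
  while (getAmessageFrom(in, n, data)) {
    if (!all.empty()) all += '|';
    all.append(data, static_cast<std::size_t>(n));
    delete[] data;
  }
  return all;
}
```

Restoring the read position on a short read means a record that is still being written is simply picked up on a later poll instead of being lost.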
Two small wrinkles in the product as currently implemented:
The time reported for a message is the time at which the server passed it to
errlog, rather than the time the
message was issued on the client.
If you attach an ELstatistics destination on the
server, similar messages from two distinct processes are not lumped together,
since the process name is considered part of the message id for that purpose.
Whenever a message arrives at the server, errlog(nbytes, data) must be invoked.
Detection of the presence of a message
is part of the transport mechanism. For example, if MPI were used, you would
want to post a receive handler which copies the data into a local
(stack) buffer, invokes the code that calls
errlog(nbytes, data), and finally
posts another receive-with-handler.
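The receive-with-handler pattern can be sketched without committing to any particular transport. The following is a minimal illustration in standard C++ only, not MPI code: MockTransport, postReceive, deliver, logRemote, onMessage, and startReceiving are all hypothetical names, logRemote stands in for the code that would call errlog(nbytes, data) << endmsg, and the 1024-byte local buffer (with truncation of longer messages) is an arbitrary assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an asynchronous transport: at most one receive
// handler is posted at a time, and it is used up when a message arrives.
class MockTransport {
public:
  using Handler = std::function<void(int, const char*)>;

  void postReceive(Handler h) { pending = std::move(h); }

  // Simulates arrival of a message; returns false if no receive was posted.
  bool deliver(const std::string& bytes) {
    if (!pending) return false;
    Handler h = std::move(pending);
    pending = nullptr;  // the posted receive is now consumed
    h(static_cast<int>(bytes.size()), bytes.data());
    return true;
  }

private:
  Handler pending;
};

// Stand-in for the code that would hand the buffer to the server's errlog.
std::vector<std::string> logged;
void logRemote(std::size_t nbytes, const char* data) {
  logged.emplace_back(data, nbytes);
}

// The handler: copy into a local (stack) buffer, log, then post the next
// receive so that the following message is also handled.
void onMessage(MockTransport& t, int nbytes, const char* data) {
  char local[1024];  // assumed maximum message size for this sketch
  const std::size_t n =
      std::min<std::size_t>(nbytes > 0 ? static_cast<std::size_t>(nbytes) : 0,
                            sizeof local);
  std::memcpy(local, data, n);
  logRemote(n, local);
  t.postReceive([&t](int nb, const char* d) { onMessage(t, nb, d); });
}

// Posts the first receive; later ones are posted by the handler itself.
void startReceiving(MockTransport& t) {
  t.postReceive([&t](int nb, const char* d) { onMessage(t, nb, d); });
}
```

Re-posting from inside the handler is what keeps the server listening: each arrival consumes one posted receive, so the handler's last step is to post the next one.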