The ZOOM ErrorLogger Package:

Further Options for the Frameworker


Error counts

You can obtain a cumulative count of errors of a given severity -- including those which occurred but were not sent to a destination or saved.
	logger->severityCount (ELerror);
	logger->severityCount (ELsevere, ELfatal);
The two-argument form adds the counts of the severities from the first to the second level; for example, severe1, severe2, abort and fatal.

These counts can be reset to zero by

	logger->resetSeverityCount(ELerror);
	logger->resetSeverityCount(ELerror, ELfatal);
	logger->resetSeverityCount();
The last form resets all the severity counts. Note that these methods of the ELadministrator "logger" have nothing to do with the counts kept in ELstatistics for summary purposes.

In the two-argument forms of these routines, if the first argument is a higher level than the second, the count will be zero and nothing will be reset.
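
The range semantics above can be illustrated with a small stand-alone model. This is only a sketch and not the ELadministrator code: the Severity enum, its ordering, and the SeverityCounts class are hypothetical stand-ins, and the real package has more severity levels than shown here.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for the package's severity levels, lowest first.
enum Severity { SevInfo, SevWarning, SevError, SevSevere, SevFatal, SevCount };

// Toy model of cumulative per-severity counters (not the real ELadministrator).
class SeverityCounts {
public:
    void record(Severity s) { ++counts_[s]; }

    // One-argument form: the count for a single level.
    int severityCount(Severity s) const { return counts_[s]; }

    // Two-argument form: sum over the inclusive range [from, to].
    // If from is above to, the range is empty and the result is zero.
    int severityCount(Severity from, Severity to) const {
        int total = 0;
        for (int s = from; s <= to; ++s) total += counts_[s];
        return total;
    }

    // Reset an inclusive range; an inverted range resets nothing.
    void resetSeverityCount(Severity from, Severity to) {
        for (int s = from; s <= to; ++s) counts_[s] = 0;
    }
    void resetSeverityCount(Severity s) { counts_[s] = 0; }
    void resetSeverityCount() { counts_.fill(0); }

private:
    std::array<int, SevCount> counts_{};
};
```

In this model an inverted range is simply an empty loop, which gives the "count is zero, nothing is reset" behavior without a special case.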


Error History: ELerrorList

You can arrange to keep a std::list of all the Error Objects that were produced as a result of logging error messages. To do this, first instantiate a std::list<ErrorObj>. Then construct an ELerrorList destination, supplying that list, and attach that destination to the logger.
  std::list<ErrorObj> theList;
  ELerrorList theErrorListD ( theList );
  ELdestControl theErrorList ( logger->attach(theErrorListD) );   
You can control what sort of errors the ELerrorList destination will react to, as per other destinations, by setting thresholds, limits, and a module filter if desired.
  theErrorList.setThreshold(ELerror);
  theErrorList.setLimit("*", 10);
When the ELerrorList reacts to a message, it creates an ErrorObj which is identical to that of the message, adds one last item containing the full context of this error, and places that ErrorObj at the end of the list.

The list (in the above example, theList) remains owned and controlled by the instantiator. Thus at any time the framework can find out information about the list, examine any of the ErrorObjs present, or clear the list so that it does not grow indefinitely.

  int numErrors = theList.size();
  std::list<ErrorObj>::const_iterator e;
  for ( e = theList.begin(); e != theList.end(); ++e ) {
    if (e->xid().id == "Tracking error") {
      // do whatever is to be done if a tracking error has happened
    }
  }
  theList.clear();

One warning: Do not allow the list to be destroyed while errors may still be logged to it!

Unlike what the logger does with destination objects (it creates and retains a clone so that a destination object can safely be discarded once attached to the logger), ELerrorList does not take over control of the list supplied to its constructor. It simply adds ErrorObjs to that list, trusting that the reference to the list remains valid.
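
The ownership rule can be sketched with a toy destination that holds a reference to a caller-owned list. Everything here is hypothetical and for illustration only: ToyError stands in for ErrorObj, ToyErrorListDest for ELerrorList, and countWithId is a helper of the kind a framework might write when polling the list.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Hypothetical stand-in for ErrorObj; only an id is modelled.
struct ToyError { std::string id; };

// Toy destination that, like ELerrorList, stores a reference to a
// caller-owned list rather than a copy of it.
class ToyErrorListDest {
public:
    explicit ToyErrorListDest(std::list<ToyError>& target) : target_(target) {}
    void log(const ToyError& e) { target_.push_back(e); }
private:
    std::list<ToyError>& target_;   // valid only while the caller's list lives
};

// Count the entries with a given id, as a framework might do when polling.
std::size_t countWithId(const std::list<ToyError>& errors, const std::string& id) {
    std::size_t n = 0;
    for (const ToyError& e : errors) {
        if (e.id == id) ++n;
    }
    return n;
}
```

Because the destination keeps only a reference, clearing the list is safe at any time, but the list must outlive every call to log(); declaring it at a scope that encloses all logging activity is the simplest way to guarantee that.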


Getting Information from an ErrorObj

The header ErrorObj.h defines the ErrorObj class; this in turn contains an ELextendedID.

The information you can extract from an ErrorObj is:


Control Over whether traces are logged

Obviously, if a message is not logged at some destination (because it misses the severity threshold or has occurred more times than its limit) no trace is sent to that destination either. But even if the message is logged, you may not want the additional information (and clutter) of a trace unless the message is sufficiently severe:
   dest.setTrace(severity)   -- For errors logged to this destination,
                                include the trace (if available) if severity
                                is at least this level.
The default provided for ELoutput is setTrace(ELerror).

(However, at this time, we do not have the ability to generate useful trace information based on the calling stack.)


Timespan in conjunction with limits

Sometimes, you may have a large bunch of errors logged in a short period of time, due to a specific problem within, say, a peculiar event. The limit mechanism prevents an output destination from being swamped with more than N of these identical messages.

If the messages continue to arrive on a regular basis, then this throttling should continue to apply. But sometimes, there may be a burst of errors, hitting the limit, followed by a long hiatus, followed by another instance or burst of that type of error. In that case, you may well wish to see the errors immediately after the dry spell.

The way this is provided is by the concept of a timespan. A timespan is a number of seconds associated with a given error type. When a message arrives at ELoutput (which uses the same ELlimitTable class to implement its throttling behavior), the count (toward throttling based on the limit) might be zeroed. The count is zeroed if the number of seconds between the previous occurrence of this type of error and the present occurrence exceeds the timespan for this type of error. Of course, if the count is zeroed, this and the next N errors will not be suppressed.

(On systems where getting time information is impractical, timespan will be moot.)
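
The interplay of limit and timespan can be sketched as follows. This is a simplified model and not the ELlimitTable code: ToyThrottle is a hypothetical class tracking a single message type, it assumes a plain cutoff once the limit is reached, and times are passed in explicitly (in seconds) so that no system clock is needed.

```cpp
#include <cassert>

// Toy throttle for one message type (not the real ELlimitTable).
class ToyThrottle {
public:
    ToyThrottle(int limit, double timespan)
        : limit_(limit), timespan_(timespan) {}

    // Returns true if the message arriving at time `now` should be output.
    bool accept(double now) {
        // A gap longer than the timespan since the previous occurrence
        // zeroes the count, so output resumes after a dry spell.
        if (seen_ && now - last_ > timespan_) count_ = 0;
        seen_ = true;
        last_ = now;
        ++count_;
        return count_ <= limit_;
    }

private:
    int limit_;
    double timespan_;
    int count_ = 0;
    double last_ = 0.0;
    bool seen_ = false;
};
```

Note that in this sketch the gap is measured from the previous occurrence, whether or not that occurrence was output, so a steady stream of messages stays throttled.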

The semantics of setting timespans, and of choosing which timespan is applicable, are exactly the same as for setting limits. However, a timespan is expressed as a float number of seconds t.

   dest.setTimespan      (id, t)
   dest.setTimespan      (severity, t)
   dest.setTimespan      ("*", t)
   logger->setTimespans  (id, t)
   logger->setTimespans  (severity, t)
   logger->setTimespans  ("*", t)
A timespan (or a limit) can be "unset" by supplying a honking large value that will never be reached.


How ELstatistics information is kept

The error statistics kept by the ELstatistics destination are logically a map. The combination of 20-byte message id, severity, process, module and subroutine, which we for convenience combine to form an ELextendedID, acts as the key for this map. The data is a (zero-able) count and an aggregate count for each type of message, plus the (brief) contexts (run/event) of the first two instances and the latest instance of such a message with non-null contexts.

A last piece of data for each entry is a flag telling whether any message of this type has been throttled out of all the destinations because of limits. In "How the framework sets up for logging," it was mentioned that ELstatistics should be attached after all the ordinary destinations.

	ELdestControl  logcerr  = logger->attach ( ELoutput(cerr)      );
	ELdestControl  logfile  = logger->attach ( ELoutput("myFileName.log"));
	ELdestControl  logstats = logger->attach ( ELstatistics(5000)  );
This is the recommended order, for the following reason: Each destination indicates whether it has ignored a given message because of a limit or threshold. An ELstatistics, when passed an error object, notes whether any destination has YET actually logged the error message; if not, it will mark that error type as having an instance that was ignored by every destination due to limits. So if you want that asterisk in the statistics summary to be meaningful, attach ELstatistics after all the destinations which might output the message.

In the above example, we attached logstats after the other destinations because we want the * to appear in the summary if an error was logged to neither logfile nor logcerr. But of course the frameworker has flexibility. For instance, one could imagine attaching a scrolling screen that will get millions of messages AFTER logstats, meaning that if an instance was output only there, we want to note in the summary that some instance appears in no destination logs.
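
Why attachment order matters can be seen in a toy dispatcher. This is a sketch under simplifying assumptions, not the package's implementation: ToyDispatcher is hypothetical, destinations are modelled as callables that return true when they actually logged the message, and a single flag stands in for the per-message-type asterisk.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Toy dispatcher: destinations are visited in attachment order.
// Each ordinary destination reports whether it actually logged the message.
struct ToyDispatcher {
    std::vector<std::function<bool(int)>> before;  // attached before statistics
    std::vector<std::function<bool(int)>> after;   // attached after statistics
    bool ignoredByAll = false;   // models the asterisk flag for one message type

    void dispatch(int severity) {
        bool loggedYet = false;
        for (auto& d : before) {
            if (d(severity)) loggedYet = true;
        }
        // The statistics destination sees only what has happened so far.
        if (!loggedYet) ignoredByAll = true;
        for (auto& d : after) d(severity);   // too late to influence the flag
    }
};
```

A destination placed in `after` can log every message and the flag will still be set, which is exactly the effect described for the scrolling screen.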

In addition to the information kept in the individual error table, a count and an aggregate count of errors by severity level is kept, so that that information may be supplied in the summary.

You can access the ELstatistics information by the stats.summary() methods.


Getting ELstatistics information as a map

The frameworker can obtain a std::map< ELextendedID, StatsCount > containing the same information available when an ELstatistics summary is requested.

The difference is that for purposes within a program, it may be easier to extract information from the ELextendedID keys and StatsCount structures than to parse summary output lines.

  ELdestControl logstats = logger->attach ( ELstatistics()  );
  ... 
  std::map< ELextendedID, StatsCount > m = logstats.statisticsMap();
To use this, one should know about the ELextendedID struct from ELextendedID.h, and the StatsCount struct from ELmap.h.

An example of usage is in the file testStatsMap.cc.
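
As a rough illustration of working with such a map, the sketch below sums aggregate counts above a severity cut. The struct layouts are assumptions made for this example only: ToyExtendedID and ToyStatsCount are hypothetical, reduced stand-ins, and the real member names must be taken from ELextendedID.h and ELmap.h.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>

// Hypothetical stand-ins; the real field names are in ELextendedID.h and ELmap.h.
struct ToyExtendedID {
    std::string id;
    int severity;
    std::string module;
    // A map key needs a strict weak ordering over all its fields.
    bool operator<(const ToyExtendedID& o) const {
        return std::tie(id, severity, module) < std::tie(o.id, o.severity, o.module);
    }
};
struct ToyStatsCount {
    int n = 0;          // resettable count
    int aggregate = 0;  // count since the start of the job
};
typedef std::map<ToyExtendedID, ToyStatsCount> ToyStatsMap;

// Sum the aggregate counts of all entries at or above a severity.
int aggregateAtOrAbove(const ToyStatsMap& m, int minSeverity) {
    int total = 0;
    for (ToyStatsMap::const_iterator it = m.begin(); it != m.end(); ++it) {
        if (it->first.severity >= minSeverity) total += it->second.aggregate;
    }
    return total;
}
```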


Changing the ostream used by an ELoutput

The frameworker can switch ostreams used by an ELoutput destination. For example, one set of runs can write to one file, and then this can be closed for examination while another file is used for further messages. This is done, of course, via the associated destControl.
  ELdestControl logfile = logger->attach ( ELoutput("earlyFile.txt")  );
  ...
  logfile.changeFile("laterFile.txt"); 
Just as an ELoutput may be constructed supplying either a file name (in which case it will create and own a new ofstream) or by supplying a reference to an ostream, so too the ostream may be switched by supplying a file name (as above) or an ostream:
  std::ostream os1, os2;
  ELdestControl logfile = logger->attach ( ELoutput(os1)  );
  ...
  logfile.changeFile(os2); 

A good general rule is that if an ELoutput was constructed by giving an ostream, it should then be switched by supplying an ostream; and if it was constructed by file name, it should be switched by file name.

There is a potential trap if this rule is violated and the two files or streams involved refer to the same actual object. A documentation page shows the details of this trap.

In addition, if you just want to inspect a file while a long job is running, knowing that it is up to date on the error messages issued, a flush command is provided:

  logfile.flush(); 


Formatting control available in ELoutput

The ELoutput destination provided allows the frameworker to control some aspects of the format of each error message. The methods in the second column of this chart reflect the default behavior.
  dest.suppressTime()        dest.includeTime()
  dest.suppressGMT()         dest.includeGMT()
  dest.suppressModule()      dest.includeModule()
  dest.suppressSubroutine()  dest.includeSubroutine()
  dest.suppressText()        dest.includeText()
  dest.suppressContext()     dest.includeContext()
  dest.includeSerial()       dest.suppressSerial()
  dest.useFullContext()      dest.useContext()
Note that these methods should be called as methods of the destination control obtained when the ELoutput was attached to the logger.

Users can use \n at the start or end of items sent to the log, to control line formatting. ELoutput also supports a couple of routines to force newlines into the "epilogue" of context and time information appended after the last item:

dest.separateEpilogue()      dest.attachEpilogue()
dest.separateTime()          dest.attachTime()
The former places a newline after the last user-supplied item so that the run/event context, module name, and time stamp come out on a fresh line. The latter forces the timestamp to come out at the start of its own line (after the usual indent). These may help with easy scanning of the log output. Their inverse functions in the second column can be called to reverse the process.

If a timestamp is included, by default it tacks on the Greenwich Mean Time (more technically known as Coordinated Universal Time, UTC). This feature can be suppressed. dest.suppressGMT() and dest.includeGMT() are moot if the overall timestamp is suppressed.

Another flexibility is setting line length: The ELoutput destination by default formats message lines by starting a new line whenever an item would go past column 80. One can change that line length:

dest.setLineLength(len)      dest.getLineLength()
Note that this is done separately for each destination; thus a log might have 132-column formatting while a screen output sticks to 80. Note also that for the ELstatistics destination, the formatting -- which assigns fixed column widths for various fields -- is unaffected by this line length.
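
The item-wise wrapping rule can be sketched in a few lines. This is an illustration and not the ELoutput formatter: wrapItems is a hypothetical helper, it omits the indent that ELoutput applies to continuation lines, and it assumes an item longer than a whole line is simply left unbroken.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy item wrapper: start a new line whenever the next item would run
// past lineLength. Items are never split.
std::string wrapItems(const std::vector<std::string>& items, std::size_t lineLength) {
    std::string out;
    std::size_t column = 0;
    for (const std::string& item : items) {
        if (column > 0 && column + item.size() > lineLength) {
            out += '\n';
            column = 0;
        }
        out += item;
        column += item.size();
    }
    return out;
}
```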

Another flexibility is squelching the "ErrorLog Established" message that comes out at the top of the log. This message can be very useful in cases where a log was appended to an existing file. Nonetheless, it is sometimes desirable that a program with no problematic occurrences emit absolutely nothing to the error logging stream. This is done when constructing the ELoutput destination. A second argument is accepted; this is a boolean which defaults to true, but if false will squelch the ErrorLog Established message.

ELdestControl logcerr  = logger->attach ( ELoutput(cerr, false) );
ELdestControl logfile  = logger->attach ( ELoutput("myFileName.log", false) );
The default constructor for ELoutput does not provide this flexibility.


Hex output formatting via ErrorLog

When data of integer type is output, it is occasionally preferable to see the value in hex. The package does not support streaming the hex manipulator into an ErrorLog, because technical considerations involving the details of the signature of hex make doing this in a portable manner too difficult.

Instead, the frameworker or a user can set an ErrorLog to print all integer items at or above some trigger value in a format like 258 [0x00000102]:

  errlog.setHexTrigger (int trigger);
The rules for deciding whether an integer will have its hex value output along with its decimal value are as follows:
  1. The logic determining whether to use the hex form is on a per-ErrorLog basis. Messages using a different ErrorLog will see that other ErrorLog's hexTrigger.
  2. Hex output is done whenever an int, long, or short, (or the unsigned form of one of those) is streamed to the ErrorLog, and the absolute value of that integer is greater than or equal to the hexTrigger established for that ErrorLog.
  3. The default value of hexTrigger is negative. A negative value turns off hex formatting.

An example of usage is in the file testHex.cc.
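
For readers without testHex.cc at hand, the rules above can be approximated by the following sketch. It is not the ErrorLog code: formatInt is a hypothetical free function, it handles only int, and the use of uppercase hex digits and a fixed 8-digit width is an assumption based on the sample format 258 [0x00000102].

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Toy formatter following the three rules above (not the ErrorLog code).
// A negative trigger disables hex; otherwise a value whose absolute value
// is at least the trigger gets an 8-digit hex form appended.
std::string formatInt(int value, int hexTrigger) {
    char buf[64];
    // Widen before negating so the most negative int does not overflow.
    long long magnitude = value < 0 ? -static_cast<long long>(value) : value;
    if (hexTrigger >= 0 && magnitude >= hexTrigger) {
        std::snprintf(buf, sizeof buf, "%d [0x%08X]", value,
                      static_cast<unsigned int>(value));
    } else {
        std::snprintf(buf, sizeof buf, "%d", value);
    }
    return buf;
}
```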


Spaces after ints are output

The default behavior when an int-type variable is part of a message is not to append a space after it. (Strings and floating point numbers do have ending spaces automatically appended.) Often, output will be easier to read if spaces are appended. To enable this behavior, the framework can do:
  errlog.setSpaceAfterInt (true);


Recovering an ELdestControl handle

If you wish to modify the behavior of a destination, but no longer have in scope the ELdestControl obtained when that destination was attached to the logger, that handle can be recovered as long as it was assigned an identifying string when attached. For example, if the setup code looked like
  ELdestControl logfile;
  logfile = logger->attach(ELoutput ( "filename.log" ), "myLogFile" );
then later, after that logfile object has been lost, you can get the handle back by
  ELdestControl logfile;
  bool handleFound;
  handleFound = errlog.getELdestControl ("myLogFile", logfile);
  if (!handleFound) { /* ... oops, wrong name -- user chooses what to do */ }
Assuming the string supplied is recognized as the id of an attached destination, logfile, which is passed by non-const reference, will become a handle to control that destination.

If getELdestControl() is supplied a string it does not recognize, then an error message of severity SEVERE2 will be sent to errlog, and logfile will be left unmodified. In the above example logfile was initialized as a default ELdestControl, and methods invoked on this default ELdestControl will have no effect on any destinations.
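
The lookup contract (return a bool, fill in the handle only on success) can be modelled as below. The names are hypothetical and chosen for this sketch: ToyDestControl and ToyRegistry are not the package's classes, and the sketch leaves out the SEVERE2 message that the real routine issues for an unknown name.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy handle: a default-constructed one controls nothing.
struct ToyDestControl {
    int destIndex = -1;                       // -1 means "no destination"
    bool valid() const { return destIndex >= 0; }
};

// Toy registry of named destinations (not the real ELadministrator).
class ToyRegistry {
public:
    ToyDestControl attach(const std::string& name) {
        ToyDestControl h;
        h.destIndex = next_++;
        byName_[name] = h;
        return h;
    }
    // On success, overwrite `out` and return true; otherwise leave it alone.
    bool getDestControl(const std::string& name, ToyDestControl& out) const {
        std::map<std::string, ToyDestControl>::const_iterator it = byName_.find(name);
        if (it == byName_.end()) return false;
        out = it->second;
        return true;
    }
private:
    std::map<std::string, ToyDestControl> byName_;
    int next_ = 0;
};
```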


Limiting the size of Error Count Tables

By default, each unique type of error message (that is, each new type of extended id encountered) will result in an entry to a table in each destination, which tracks the frequency of occurrence of that type of message for purposes of imposing a limit on the number of times it is output. Since the space of possible id's is huge, under some circumstances, one may wish to impose an absolute limit on the number of id's tracked.
  ELdestControl logfile;
  logfile = logger->attach(ELoutput ( "filename.log" ), "myLogFile" );
  logfile.setLimit("*", 10);
  logfile.setTableLimit (200);
In the above example, if 250 different types of messages are logged, only the first 200 go into the table. Thus if 50 messages of the first type are issued, the limit will suppress messages of the first type after 10 have been output. But since messages of type 201 won't have an entry in the table, all 50 messages of that type would be responded to and output.
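
The consequence of a full table can be reproduced with a toy limit table. This sketch is not the package's table: ToyLimitTable is hypothetical, it keys on a plain string rather than an extended id, and it assumes a simple cutoff once the per-id limit is reached.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Toy limit table with a cap on the number of tracked ids.
class ToyLimitTable {
public:
    ToyLimitTable(int limit, std::size_t tableLimit)
        : limit_(limit), tableLimit_(tableLimit) {}

    // Returns true if this occurrence of `id` should be output.
    bool accept(const std::string& id) {
        std::map<std::string, int>::iterator it = counts_.find(id);
        if (it == counts_.end()) {
            // Table full: the id is not tracked, so it is never throttled.
            if (counts_.size() >= tableLimit_) return true;
            it = counts_.insert(std::make_pair(id, 0)).first;
        }
        return ++it->second <= limit_;
    }
private:
    int limit_;
    std::size_t tableLimit_;
    std::map<std::string, int> counts_;
};
```

The trade-off is visible here: capping the table bounds memory use, at the price of losing throttling for every id that arrives after the table has filled.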


Mark Fischler
Last modified: Thu Mar 15 2001