-----------------------------
-----------------------------
The ZOOM ErrorLogger Package
testing coverage
-----------------------------
-----------------------------

How the Physicist Logs Errors

     1) How the Physicist has access to the logger
     2) How a physicist issues a log message
     3) How a physicist can issue a log message in multiple statements
     4) How an instance of ErrorObj may be formed
     5) What will the error message output look like
     6) What summary information will be available
     7) How the physicist can indicate what is being done
     8) A list of the available severity levels

Frameworker's Basics of Setting up and Using Error Loggers

     9) How the frameworker sets up for logging
    10) How the Module (or Package) Sets Up errlog
    11) Global ErrorLog Instantiation
    12) ELcontextSupplier: How the run and event numbers are indicated
    13) Basics of Thresholds and Limits
  13.5) Filtering by Module or multiple modules
  13.7) DiscardThreshold
    14) Controlling abort behavior
    15) Obtaining error statistics summaries
    16) Clearing statistics and/or limits

Further Options for the Frameworker

    17) Checking on errors
    18) Error counts
    19) Traces and control over whether traces are logged
    20) Time span in conjunction with limits
    21) ELerrorList Local Semantics
    22) Accessing Saved Error Messages
    23) How ELstatistics information is kept
  23.5) Access to the statistics map
    24) Formatting control available in ELoutput
  24.5) Hex output formatting via ErrorLog

Customization Hooks and Issues Relating to Collective Logging

    25) What an ELdestination Must Do
    26) (removed)
    27) Avoiding lists, strings, streams, and templates
    28) Table Limitations
    29) Direct logging to one destination
    30) Custom Severity Levels?
****************************************
* The ErrorLogger Syntax and Semantics *
****************************************

How the Physicist Logs Errors
-----------------------------

1) How the Physicist has access to the logger
---------------------------------------------

2) How a physicist issues a log message:
----------------------------------------

    errlog (ELerror, "Too much energy") << "E = " << totalEnergy << endmsg;
    a       b        c                     d         e              f

Here is the breakdown of this syntax:

a) The ErrorLog object has an operator() method which returns something
   you can treat like an ostream; that is, you can apply << to it. It takes
   two mandatory arguments: a severity and a message ID. The presence of
   these arguments signals the start of an error message.

b) ELerror is one of the ErrorLogger severity levels. They all start with
   EL (they are listed below), and they are all instances (provided by the
   ErrorLogger package) of the class --ELseverityLevel--. Invoking this
   operator() of the ErrorLog object starts a new error message and
   establishes the severity of that error.

c) The second argument is a string (or char*), and determines the message
   identifier (ID). Although the entire content of the string will go into
   the output text, for statistics and limits purposes the message ID is
   considered to be the leading 20 characters of this string, padded with
   trailing blanks.

d) There follow an arbitrary number of further outputs to the message.
   These together form an informational string, which most destinations
   will print after the ID, with suitable line-break insertions. These
   further outputs are optional.

e) Notice that the further arguments need not be strings (although you
   could, if you wish, format them all into one string by using a
   stringstream). In the example, the double totalEnergy is put to the
   stream. Any data type or class that could be streamed to cout can be
   sent to the log message in this way.
f) The endmsg at the end of the message is treated specially, and
   indicates that the message is over. Only one endmsg should be placed in
   a message; the user is free to insert explicit "\n" characters and/or
   endl should control of multiple lines be desired.

See Note 1 about endmsg.

An alternative to writing errlog (sev, id) is to use the provided ERRLOG
macro:

    ERRLOG ( sev, id )

is equivalent to

    errlog ( sev, id ) << __FILE__ << ":" << __LINE__ << " "

This assumes that the ErrorLog available is named errlog; a second macro,

    ERRLOGTO ( logname, sev, id )

is equivalent to

    logname ( sev, id ) << __FILE__ << ":" << __LINE__ << " "

2.5) Messages for Debugging:
----------------------------

An alternative way to issue a message is to provide to errlog(), instead of
a severity and id, a single integer representing a debugging level.

    errlog (3) << "Some key information" << endmsg;

If the framework for the job has set the debug verbosity level to this
value or higher, then the message will emerge. But if the verbosity level
is lower than the integer provided (and by default, if the framework does
nothing, the level is considered to be zero), then the message will be
ignored with little runtime overhead.

-- How the Physicist Logs Errors --

3) How a physicist can issue a log message in multiple statements:
------------------------------------------------------------------

    errlog (ELerror, "Too much energy") << "E = " << totalEnergy;
    if ( condition ) {
      errlog << "more stuff";
    }
    errlog << "yet more stuff" << endmsg;
    g                             h

g) The ErrorLog object also has a direct << method. This streams more into
   its message string, without declaring a new error or establishing a
   severity level. Thus you may continue to build up the message with
   repeated statements that do not use endmsg until the last one.
h) Although key error information (id, severity, timestamp, context, and
   so forth) is captured immediately when errlog (severity, id) is invoked,
   the --ErrorObj-- object remains "open" to additional data until either
   the endmsg is encountered or some other invocation of errlog () occurs.

If the physicist forgets to start a new message with errlog (severity, id)
and instead just does errlog << stuff, then, assuming the previous message
was closed off with an endmsg (or this is the first one for this log), a
new message will be opened. The severity assigned to such a message will be
ELunspecified.

An error message is dispatched to each of the potential logging
destinations when it is closed. If upon termination there is still an
active message being issued, the destinations and ELadministrator will
terminate it, thus sending it to each destination and preventing
information from being lost. See Note 12: "Termination issues," for an
outline of the technical steps needed to accomplish this.

4) How an instance of ErrorObj may be formed independently
----------------------------------------------------------

An --ErrorObj-- is an object representing an error message. Although
normally neither the physicist nor the frameworker works with an ErrorObj
directly, one can form such an object and later send it to a log. This
might be done, for example, if you want to build up some history of what
was done, but only log it if some ghastly condition later occurs.

    ErrorObj myMsg ( ELwarning, "Suspicious Pt" );
    j                k          l
    errlog (myMsg);
    m
    myMsg = ErrorObj ( ELsevere, "Out of space" );
    myMsg << "was doing step" << 20 ;
          n                         p
    errlog (myMsg);

j) You can instantiate some pre-defined ErrorObj and, when and if an error
   happens, use it.

k,l) The constructor for ErrorObj takes the two mandatory fields: severity
     and id.

m) We support sending one of these formed messages to the log.
n) ErrorObj has operator<< so you can pump further info into it, just as
   in the case of pumping more into the log.

p) Notice that in both this and the previous example we did not put
   endmsg. endmsg is inappropriate for an ErrorObj and ought not to
   compile. When an ErrorObj is supplied to a log (as in errlog(myMsg)) it
   is known to be complete, and the implied endmsg is supplied
   automatically.

5) What will the error message output look like:
------------------------------------------------

Although each possible destination may do different things with an error
message, the ErrorLogger package supplies a standard --ELoutput--
destination which will output (to a stream or file) as follows:

    %ERLOG-w Too much energy: E = 834.750032 d0L2proc5 CTCDRVmodule CTCTRKsubr
             10-Jul-1999 14:49:03 CST run=234 event=543

This is the default formatting; if some portion of the message would
overrun the end of an 80-column line, it will instead be started on the
next line, indented to align with the first character of the id.

A space is inserted to separate each item (each object put to the log via
the << operator) from the next. The user can force line breaks by putting a
\n as the first or last character in an item. (If \n appears in the middle
of an item, indentation will not be done, and the column formatter may
introduce an unneeded new line when it thinks 80 columns would be
exceeded.)

6) What summary information will be available
---------------------------------------------

There will be a method (normally invoked by the framework at the end of a
job or run) to deliver a summary, either to one or more of the log
destinations or to another ostream or file. The summary information will be
a table of counts by message type (message type means the combination of
ID, process, module, subroutine, and severity).
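As a rough illustration of how a table keyed on message type could be kept, the sketch below truncates or pads the ID to 20 characters (the rule given for message IDs in section 2) and counts occurrences per (ID, process, module, subroutine, severity) tuple. The names `makeIdKey`, `MessageType`, and `StatsTable` are hypothetical stand-ins, not ErrorLogger classes, and severity is reduced to a plain int.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>

// Hypothetical: the statistics key uses only the leading 20 characters
// of the ID, padded with trailing blanks.
std::string makeIdKey(const std::string& id) {
  std::string key = id.substr(0, 20);  // truncate long IDs
  key.resize(20, ' ');                 // pad short IDs with blanks
  return key;
}

// Hypothetical message type: ID key, process, module, subroutine, severity.
struct MessageType {
  std::string idKey, process, module, subroutine;
  int severity;
  bool operator<(const MessageType& o) const {
    return std::tie(idKey, process, module, subroutine, severity) <
           std::tie(o.idKey, o.process, o.module, o.subroutine, o.severity);
  }
};

// Counts occurrences of each message type.
class StatsTable {
 public:
  void record(const std::string& id, const std::string& process,
              const std::string& module, const std::string& subroutine,
              int severity) {
    ++counts_[MessageType{makeIdKey(id), process, module, subroutine, severity}];
  }
  int count(const std::string& id, const std::string& process,
            const std::string& module, const std::string& subroutine,
            int severity) const {
    auto it = counts_.find(
        MessageType{makeIdKey(id), process, module, subroutine, severity});
    return it == counts_.end() ? 0 : it->second;
  }
  std::size_t size() const { return counts_.size(); }

 private:
  std::map<MessageType, int> counts_;
};
```

One consequence worth noting: two IDs that agree in their first 20 characters fall into the same row of such a table.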
If there were any occurrences of a message that were not logged anywhere
due to limits, that is indicated by an asterisk. A second part supplies up
to 3 example contexts for each message type: the first two occurrences, and
the last one. A third part gives the total counts at each severity level.
Counts are given since the last clear, and also as a total for the whole
job.

The responsibility of triggering the output of summary information belongs
to the frameworker rather than the individual physicists. An illustration
of the format of the summary information is given in the section on
"Obtaining error statistics summaries."

7) How the physicist can indicate what is being done:
-----------------------------------------------------

The physicist may OPTIONALLY declare the name of the subroutine currently
executing. When an error is logged, this name will go into the message, and
also into the overall statistics. To avoid overhead, the user will
generally declare only fairly large steps.

    errlog.setSubroutine( "myName" );

An alternative is available: if, after the ID string, an item is a string
of the form "@SUB=myname", then myname is treated as the subroutine name,
regardless of what setSubroutine() has set up.

    errlog ( ELsevere, "Bank Confusion" ) << "@SUB=prepare_IO" << endmsg;

8) A list of the available severity levels:
-------------------------------------------

Each severity level has a corresponding ELseverityLevel object, for
example, ELwarning. Each such object has two relevant behaviors:

a) Methods getSymbol() and getName() return its symbol and full name
   respectively; the destinations use these methods to prepare text.

b) There is a comparison to ask whether one ELseverityLevel is more severe
   than another, so the concept of logging everything above a certain
   severity is meaningful.
The ELseverityLevel objects, instantiated globally by the package, are:

    Severity object     Symbol  Full name  Intention
    ---------------     ------  ---------  ---------
    ELzeroSeverity      --      --         used for thresholds only
    ELincidental        ..      ..         flash this on a screen
    ELsuccess           -!      Success    report reaching a milestone
    ELinfo              -i      Info       information
    ELwarning           -w      Warning    warning
    ELwarning2          -W      Warning!   more serious warning
    ELerror             -e      Error      error detected
    ELerror2            -E      Error!     more serious error
    ELnextEvent         -n      Next       advise to skip to next event
    ELunspecified       ??      ??         severity was not specified
    ELsevere            -s      Severe     future results are suspect
    ELsevere2           -S      Severe!    more severe
    ELabort             -A      Abort!     suggest aborting
    ELfatal             -F      Fatal!     strongly suggest aborting!
    ELhighestSeverity   !!      !!         used for thresholds only

The frameworker can control whether declaring severe, abort, or fatal
errors will actually abort the job. The intentions listed for all the
error types are only advisory; it is up to the framework to do what the
experiment intends. If no abort threshold is set, then even ELfatal errors
will not automatically abort the job.

ELzeroSeverity and ELhighestSeverity are not supposed to be used in forming
error messages, but are available to the frameworker when setting up
various thresholds.

An ELseverityLevel may be constructed from a string (or char*). This allows
a user or framework to accept run-time input specifying what severity to
assign to a given possible error.

    ELstring input;
    cin >> input;
    ELseverityLevel mysev (input);
    // ...
    errlog (mysev, "this error");

The options accepted for a given level are its symbol, its full name, the
name of its severity object (these are as listed in the above table), and
the all-caps names:

    "ZERO"  "INCIDENTAL"  "SUCCESS"  "INFO"  "WARNING"  "WARNING2"
    "ERROR"  "ERROR2"  "NEXT"  "UNSPECIFIED"  "SEVERE"  "SEVERE2"
    "ABORT"  "FATAL"  "HIGHEST"

The translation used is case-sensitive. If no match is found, then
ELunspecified will be used.
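The ordering and the string translation described above can be sketched with a plain enumeration declared in table order, so that "more severe" is simply "greater". This is only a model of the documented behavior, assuming the table order is the severity order; `Severity`, `severityFromString`, and `moreSevere` are hypothetical names, not the ELseverityLevel API.

```cpp
#include <array>
#include <cassert>
#include <string>

// Hypothetical stand-in for the ELseverityLevel objects, declared in the
// order of the table so that a larger value means "more severe".
enum class Severity {
  Zero, Incidental, Success, Info, Warning, Warning2, Error, Error2,
  NextEvent, Unspecified, Severe, Severe2, Abort, Fatal, Highest
};

struct SeverityNames {
  Severity level;
  const char* symbol;
  const char* fullName;
  const char* objectName;
  const char* caps;
};

const std::array<SeverityNames, 15> kSeverityNames = {{
    {Severity::Zero,        "--", "--",       "ELzeroSeverity",    "ZERO"},
    {Severity::Incidental,  "..", "..",       "ELincidental",      "INCIDENTAL"},
    {Severity::Success,     "-!", "Success",  "ELsuccess",         "SUCCESS"},
    {Severity::Info,        "-i", "Info",     "ELinfo",            "INFO"},
    {Severity::Warning,     "-w", "Warning",  "ELwarning",         "WARNING"},
    {Severity::Warning2,    "-W", "Warning!", "ELwarning2",        "WARNING2"},
    {Severity::Error,       "-e", "Error",    "ELerror",           "ERROR"},
    {Severity::Error2,      "-E", "Error!",   "ELerror2",          "ERROR2"},
    {Severity::NextEvent,   "-n", "Next",     "ELnextEvent",       "NEXT"},
    {Severity::Unspecified, "??", "??",       "ELunspecified",     "UNSPECIFIED"},
    {Severity::Severe,      "-s", "Severe",   "ELsevere",          "SEVERE"},
    {Severity::Severe2,     "-S", "Severe!",  "ELsevere2",         "SEVERE2"},
    {Severity::Abort,       "-A", "Abort!",   "ELabort",           "ABORT"},
    {Severity::Fatal,       "-F", "Fatal!",   "ELfatal",           "FATAL"},
    {Severity::Highest,     "!!", "!!",       "ELhighestSeverity", "HIGHEST"},
}};

// Case-sensitive lookup over all accepted spellings; anything
// unrecognized maps to Unspecified.
Severity severityFromString(const std::string& s) {
  for (const auto& n : kSeverityNames) {
    if (s == n.symbol || s == n.fullName || s == n.objectName || s == n.caps)
      return n.level;
  }
  return Severity::Unspecified;
}

bool moreSevere(Severity a, Severity b) { return a > b; }
```

Because the lookup is case-sensitive, "fatal" falls through to Unspecified while "FATAL", "Fatal!", "-F", and "ELfatal" all map to the fatal level.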
Frameworker's Basics of Setting up and Using Error Loggers
----------------------------------------------------------

9) How the frameworker sets up for logging
-------------------------------------------

The ErrorLog objects that modules set up and that users see require that an
ELadministrator exist. This encapsulates all the framework control over how
logging is done; in particular, the ELadministrator holds the information
as to what destinations should be used, and the way to get context
information when needed.

The framework, outside the context of any ordinary module, should get an
instance by invoking the ELadministrator::instance() method. It will then
use methods of this instance to establish how to get context information,
and to attach various ELdestination sinks.

So first the framework instantiates the ELadministrator:

    #include "ErrorLogger/ELadministrator.h"
    ZM_USING_NAMESPACE( zmel )    /* using namespace zmel; */

    ELadministrator * logger = ELadministrator::instance();

Notice that the Singleton pattern returns a pointer to the single instance.

Next the framework provides the ELcontextSupplier. This is done by creating
a class derived from ELcontextSupplier and overriding the pure virtual
methods such as context(). An instance of this class is passed to
logger->setContextSupplier().

    MyContextObj contx;      // See below for how the derived context
                             // supplier looks.
    logger->setContextSupplier (contx);

Providing a context supplier is optional; it provides a way for the logger
to get at the run and event numbers when an error is logged. If no supplier
is provided, no run/event information will be associated with messages.
Further details are given below in "ELcontextSupplier: How the run and
event numbers are indicated."

The frameworker may also declare the name of the overall process (or job,
or node in a farm situation). When an error is logged, this name will go
into the message.
This could be used to distinguish among many cooperating nodes which share
the same output destinations.

    logger->setProcess( "PCfarm81" );

-- Frameworker's Basics --

Having instantiated logger and told it how to get context information, the
framework must attach one or more destinations: sinks for the messages and
statistics to be sent to. An error logger with no sinks is probably
useless.

The logged information can go to each destination attached to the logger.
But not every message will be acted on by every destination; each is
subject to thresholds and limits, as discussed below in "Basics of
Thresholds and Limits."

Each destination must be derived from ELdestination. Four classes derived
from ELdestination are provided by the ErrorLogger package:

--ELoutput-- is constructed taking an ostream, and is the typical way to
   associate a log with either cout, cerr, or an ofstream for a file. It
   has two forms of constructor:

       ELoutput xx(ostream &);  // An ostream (e.g. cerr).

       ELoutput xx("xfile");    // A file name.
                                // This is mostly a convenience, since
                                // one could have supplied an ofstream.
                                // See Note 9: "ELoutput (fileName)"
                                // for details.

--ELcollected-- may be constructed taking an object of a class derived
   from ELsender, which implements the framework's chosen transport
   mechanism. A frameworker in a multi-process job could put one or more
   ELcollected destinations on the list to route messages to one or more
   collection points.

--ELerrorList-- may be constructed taking a list of ErrorObj's. A
   frameworker could attach an ELerrorList so that code can periodically
   examine any ErrorObj's that were logged.

--ELstatistics-- is a table of message frequencies and sample contexts,
   which contains methods for generating run summaries. It may also have
   an associated ostream, to which unreported statistics would be sent
   when a job terminates.

The usual framework will use ELstatistics and one or more ELoutput sinks.
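The fan-out model above (one administrator, several destinations, each applying its own threshold) can be sketched with an abstract base class. This is a minimal sketch assuming severities are plain ints; `Message`, `Destination`, `StreamDestination`, `ListDestination`, and `Administrator` are hypothetical names chosen to echo ELdestination, ELoutput, ELerrorList, and ELadministrator, not the package's actual classes.

```cpp
#include <cassert>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified message: severity as an int, plus ID and text.
struct Message {
  int severity;
  std::string id;
  std::string text;
};

// Hypothetical base class playing the role of ELdestination.
class Destination {
 public:
  virtual ~Destination() = default;
  void setThreshold(int t) { threshold_ = t; }
  // Each destination applies its own threshold before acting.
  void offer(const Message& m) {
    if (m.severity >= threshold_) log(m);
  }

 protected:
  virtual void log(const Message& m) = 0;

 private:
  int threshold_ = 0;
};

// Writes one line per message to an ostream, in the spirit of ELoutput.
class StreamDestination : public Destination {
 public:
  explicit StreamDestination(std::ostream& os) : os_(os) {}

 protected:
  void log(const Message& m) override { os_ << m.id << ": " << m.text << '\n'; }

 private:
  std::ostream& os_;
};

// Keeps messages for later inspection, in the spirit of ELerrorList.
class ListDestination : public Destination {
 public:
  const std::vector<Message>& messages() const { return list_; }

 protected:
  void log(const Message& m) override { list_.push_back(m); }

 private:
  std::vector<Message> list_;
};

// Owns the destinations and offers each message to every one of them.
class Administrator {
 public:
  Destination& attach(std::unique_ptr<Destination> d) {
    dests_.push_back(std::move(d));
    return *dests_.back();
  }
  void dispatch(const Message& m) {
    for (auto& d : dests_) d->offer(m);
  }

 private:
  std::vector<std::unique_ptr<Destination>> dests_;
};
```

In this sketch the reference returned by `attach` loosely plays the role of a control handle; the real package returns an ELdestControl object instead.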
To attach a destination, the frameworker must instantiate it and attach it
to the ELadministrator. Attaching the destination returns an
--ELdestControl-- object, which is a handle for controlling the behavior of
that destination. If no destination is attached to the ELadministrator,
then one is attached automatically to cout when the first error is logged.

    ELoutput logfileD("myfile.log");
    ELdestControl logfile = logger->attach( logfileD );

Later, the ELdestControl may be used to control the behavior of this
destination, as in

    logfile.setLimit("*", 20);

Sometimes it is desirable to modify the behavior of a destination from code
other than the method which attached that destination to the logger. In
that case, it is likely that the ELdestControl handle will have gone out of
scope. To be able to recover this handle, attach takes an optional second
argument, meant to be the name of the destination.

    ELdestControl logfile = logger->attach( logfileD, "logfile" );

Subsequently, code can recover a handle to logfileD by that name:

    // In later code, where the original logfile handle has been lost:
    ELdestControl logfile;
    bool handleFound;
    handleFound = errlog.getELdestControl ("logfile", logfile);
    if (!handleFound) { /* ... oops, wrong name ... */ }

A bit about object ownership and such is given in Note 5: "ELdestControl
details." The important thing to know is that all actions that apply to the
ELdestination class can be invoked via the ELdestControl; it is a faithful
proxy. Such methods include setting limits, thresholds, and time spans,
triggering summary output, and even directly sending an ErrorObj.

Sometimes, a framework may wish to permanently or temporarily "turn a
destination off."
This can be done by

    logfile.setThreshold(ELhighestSeverity);

10) How the Module (or Package) Sets Up errlog
----------------------------------------------

Thus in the Module base class, the lines setting up the ErrorLog could look
something like:

    class Module {
    public:
      Module(module_selection) {
        ...
        errlog.setModule( "moduleName" );
      }
    protected:
      ...
      ErrorLog errlog;
    };

(An additional routine, setPackage(name), is identical in operation to
setModule(name).)

If this setup is used, the classes derived from Module, which contain the
physicists' code, need do nothing further to be able to issue messages to
errlog.

An alternative constructor is available which lets a package declare at
global scope an ErrorLog with a given package name:

    ErrorLog myerrlog ("myPackageName");

11) Global ErrorLog Instantiation
---------------------------------

    #include "ErrorLogger/ErrorLog.h"

    ErrorLog errlog;

    int main() {                       // The main framework
      ...
      errlog.setModule ("Framework Level");
      errlog (ELsevere, "Event Sequence Bad")
        << "Events went from" << prior_event << "to" << event << endmsg;
      ...
    }

The problem with using a global errlog as shown above can be seen from the
following scenario:

Module Tracker (with name "TRACKER") calls the utility chiSqFit, which is
at global scope (not a member function of Tracker). chiSqFit detects a
problem and, having been instrumented for logging, does

    errlog (ELwarning, "bad fit") << ndegrees << endmsg;

where errlog is now a global-scope logger which we assume has been set up
with errlog.setModule("Framework Level"). This message will appear in the
log as having come from module "Framework Level" rather than module
"TRACKER." The frameworker may well judge that the identity of the
originating module is much more important to know, particularly since the
utility does have the opportunity to identify the subroutine in which the
problem occurred.
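The misattribution in this scenario can be reproduced with a tiny model. The sketch assumes a logger that simply stamps each record with whatever module name is current when the message is issued; `SimpleLog`, `globalLog`, `chiSqFit`, and `Tracker` are hypothetical stand-ins, not ErrorLogger code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, stripped-down stand-in for a global ErrorLog: it only
// remembers the module name current at the time each message is issued.
struct SimpleLog {
  std::string module;
  std::vector<std::string> records;  // "module: id" for each message
  void setModule(const std::string& m) { module = m; }
  void log(const std::string& id) { records.push_back(module + ": " + id); }
};

SimpleLog globalLog;  // plays the role of the global errlog

// A utility at global scope: the only logger it can reach is the global one.
void chiSqFit() { globalLog.log("bad fit"); }

// A module whose work happens to call the utility.
struct Tracker {
  void doit() { chiSqFit(); }
};
```

With globalLog set to "Framework Level", a call to `Tracker().doit()` records "Framework Level: bad fit"; only a `setModule("TRACKER")` made before entering the module changes the attribution.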
The solution to this dilemma lies in explicitly doing a setModule(name) to
the global errlog when entering a new module. (You could conceivably
automate this by doing ::errlog.setModule in the Module code before the key
function is invoked; but for simplicity we illustrate doing this
explicitly.) Thus:

    int main() {                       // The main framework
      ...
      errlog.setModule ("Framework Level");
      errlog (ELsevere, "Event Sequence Bad")
        << "Events went from" << prior_event << "to" << event << endmsg;
      ...
      errlog.setModule("FindTracks");  // In case of use of the global errlog
      FindTracks.doit(event);          // The FindTracks module.
      errlog.setModule("DoPhysics");   // In case of use of the global errlog
      DoPhysics.doit(event);           // The DoPhysics module.
      ...
    }

Notice that those setModule calls will not be necessary if all the error
logging happens in places where the errlog in scope is that of the module;
in that case, the automatic mechanism gets the proper module name with no
extra concern for the frameworker.

12) ELcontextSupplier: How the run and event numbers are indicated:
-------------------------------------------------------------------

The run and event are printed in the text of every error message. However,
the framework does not call some setEvent() method every time a new event
is started. To avoid overhead for every event (when errors might be logged
in only a tiny fraction of those events), this information is not
pre-supplied by the framework. Instead, when an error is being logged, the
logger asks for the run and event numbers at that point. It asks by
invoking the context() function of the --ELcontextSupplier-- object set up
for the logger.

The frameworker should create a class derived from ELcontextSupplier and
pass an instance of that class to the ELadministrator via its
setContextSupplier() method, as shown above ("How the frameworker sets up
for logging").
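The "pull" design described here can be illustrated with a small sketch in which the logger asks its supplier for context only when a message is actually logged, so processing many events without errors costs nothing. `ContextSupplier`, `FrameworkContext`, and `Logger` are hypothetical names; the sketch holds a non-owning pointer to the supplier, which is a simplification and not necessarily how ELadministrator stores it.

```cpp
#include <cassert>
#include <string>

// Hypothetical interface in the spirit of ELcontextSupplier.
class ContextSupplier {
 public:
  virtual ~ContextSupplier() = default;
  virtual std::string context() const = 0;
};

// The framework's supplier reads run/event from the framework's own state.
class FrameworkContext : public ContextSupplier {
 public:
  int run = 0;
  int event = 0;
  mutable int calls = 0;  // how many times the logger asked for context
  std::string context() const override {
    ++calls;
    return "run=" + std::to_string(run) + " event=" + std::to_string(event);
  }
};

class Logger {
 public:
  void setContextSupplier(const ContextSupplier* s) { supplier_ = s; }
  // Context is fetched here, once per logged message, never per event.
  std::string log(const std::string& id) const {
    std::string ctx = supplier_ ? supplier_->context() : "";
    return id + " " + ctx;
  }

 private:
  const ContextSupplier* supplier_ = nullptr;  // not owned
};
```

Over a loop of 1000 events with a single logged message, `context()` is called exactly once, which is the saving this section is after.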
If setContextSupplier() is not invoked, no context information will be
associated with error messages, no user function to massage error objects
will be invoked, and no trace information will be output.

An ELcontextSupplier defines the following simple interface:

    class ELcontextSupplier {
    public:
      virtual ELcontextSupplier * clone() const = 0;
      virtual ELstring context () const = 0;
      virtual ELstring summaryContext () const = 0;
      virtual ELstring fullContext () const = 0;
      virtual void editErrorObj (ErrorObj & msg) const;
      virtual ELstring traceRoutine () const;
    };

As seen from this, the class derived from ELcontextSupplier **must** define
the clone() method (a trivial recipe for this is given below) and the three
virtual methods returning ELstring, though some of those may be identical
to others. Although destinations are free to call for and use whichever
context string they wish, the intent of the three forms is as follows:

context() is the form used by the typical output-to-a-log-file destination,
   for example ELoutput. Here the string ought not to be too long, to avoid
   clutter in the log; but no limitation or truncation is imposed per se.
   A typical string would be "run= 1234 event= 12345".

summaryContext() is the form used by ELstatistics to get a string suitable
   for insertion into a table. This is length-critical, and will be
   truncated at 16 characters in the statistics summary. A typical string
   would be "1234/12345".

fullContext() is a form intended for use where extra info may be useful
   and length is not a big consideration. ELoutput can enable fullContext()
   instead of context() on a per-destination basis. fullContext() would
   typically just return context(), but another example might be
   "run= 1234 event= 12345 reco version 3.2".

If a custom form of ELstring is used which may have a fixed length limit,
the context supplier should protect against over-writing past the end of
the available space.

Another mandatory method is the clone() method.
This, however, is just a trivial copy into new memory. The following
"mantra" for writing a context supplier can be followed:

    class myContextSupplier : public ELcontextSupplier {
    public:
      myContextSupplier * clone() const;
      ELstring context() const;
      ELstring fullContext() const;
      ELstring summaryContext() const;
    };  // myContextSupplier

    myContextSupplier * myContextSupplier::clone() const {
      return new myContextSupplier( *this );
    }

    ELstring myContextSupplier::context() const {
      ostringstream ev;
      /* whatever code it takes to form the context string in ev */
      return ev.str();
    }

    ELstring myContextSupplier::fullContext() const {
      return context();
    }

    ELstring myContextSupplier::summaryContext() const {
      return context();
    }

This mantra assumes that fullContext and summaryContext are the same as the
context string; it should be obvious how to put different code there.

Those familiar with Visual C++ on NT will note that the above mantra will
not work, since clone() is declared to return an ELcontextSupplier* in the
base class ELcontextSupplier. Standard C++, but not Microsoft, supports
covariant return types; VC++ requires exactly matching return types for
virtual functions. For this purpose, ZMenvironment.h provides a macro which
resolves to ELcontextSupplier* on NT:

    ZM_COVARIANT_TYPE(ELcontextSupplier *, myContextSupplier *) clone() const;

We recommend this be used to promote portability to NT while keeping type
safety wherever possible. However, none of the functionality of the
ErrorLogger package is hindered by the assumption (used on NT) that clone()
returns an ELcontextSupplier*.

The remaining routines are optional and need not be supplied in the
user-specific class derived from ELcontextSupplier:

The editErrorObj() routine will be called when the message is started. It
provides a hook to modify the module or subroutine information, the id, or
any part of the message other than the actual text items (which would not
have been established yet).
This method is optional; the editErrorObj() method of the base
ELcontextSupplier class is an inline method doing nothing. (The
editErrorObj() routine also provides a backdoor by which the framework can
cause a specific routine to be called every time an error message is
started. It will be called exactly once for each message instantiated, and
the message id, module and subroutine, and severity will be available at
that point.)

The traceRoutine() routine will be called for each destination that is
going to log a trace (see section 19, "Traces and control over whether
traces are logged"). It must return an ELstring containing the text to be
appended as the trace. This method is optional; if it is not present, the
traceRoutine() method of the base ELcontextSupplier class provides a trace
as best the package knows how. Currently, we don't know how to provide a
trace, so the string returned will be empty. This may change for some or
all systems in the future.

(Obviously, traceRoutine() is another possible hook to which a framework
could tie handling routines, but since it is called only for destinations
with appropriate trace thresholds, and may be called more than once for a
given message, it may be unwise to use it this way.)

13) Basics of Thresholds and Limits
-----------------------------------

Some nomenclature: thresholds and limits both apply to particular
destinations.

When we say "limit", we shall always mean some count of messages of a
particular description, beyond which some action will cease to happen. A
limit can be specified by message id, or for multiple ids in two ways: a
general wild card, or all messages of some severity level.

When we say "threshold", we shall always mean some ELseverityLevel, at or
above which some action would happen.

One can set a limit or a threshold for each individual ELdestination. To do
this, you use methods of the associated ELdestControl, which we will call
"dest" for these examples.
The effects are as follows:

    Method                          Effect
    ------                          ------
    dest.setThreshold (severity)    Suppress logging or acting on messages
                                    below this severity, for this
                                    destination.

    dest.setLimit (id, n)           For this destination, don't log past n
    dest.setLimit (severity, n)     instances of any given message id
    dest.setLimit ("*", n)          matching the specified type.

    logger->setLimits ("*", n)

In case one has established two or more applicable limits, the limit used
is the most specific applicable case: a specified ID takes precedence over
a specific severity level, and both take precedence over the wild card "*".
(See Note 6: "Limit Semantics" for further details.)

As implied by the above chart, each ELdestination owns a threshold level
(if no threshold is explicitly set, the threshold is ELzeroSeverity). Each
ELdestination also owns a general limits table (indexed by severity level)
and a limits table (indexed by message id). Each limits table entry
contains as data the limit and the count.

Notice that the ELerrorList and ELcollected are just particular
destinations. That implies that filtering by severity and throttling by
frequency for those are set up via setThreshold and setLimit, just as for
other destinations.

Also notice that ELstatistics is also a particular destination. That means
that filtering by severity is set up via setThreshold. In the case of an
ELstatistics destination, however, setLimit has no effect: once an error
message id gets into that table, there is absolutely no cost to
incrementing its count, so a limit would be useless.

A limit can be made "infinite" by setting it to -1.

Finally, two features slightly soften the concept of throttling via a
limit. The first is that once a count is past its limit, action does not
cease completely: if the applicable limit for a given error is L, and the
excess is E = count - L, then each instance for which E/L = 2**N for some
non-negative integer N will still be logged.
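The throttling rule just stated can be written as a small predicate, which also makes it easy to check the worked sequence for a limit of 5. This is a sketch of the stated arithmetic only; `passesLimit` and `loggedInstances` are hypothetical names, and treating a limit of 0 as "log nothing" is an assumption, since the text does not cover that case.

```cpp
#include <cassert>
#include <vector>

// Decide whether the count-th occurrence (1-based) of a message is logged,
// given limit L. Up to L occurrences are always logged; past that, only
// occurrences whose excess E = count - L satisfies E / L == 2^N (N >= 0).
// A negative limit stands for "infinite".
bool passesLimit(long count, long limit) {
  if (limit < 0) return true;         // -1 means no limit
  if (count <= limit) return true;    // still within the limit
  if (limit == 0) return false;       // assumption: limit 0 suppresses all
  long excess = count - limit;
  if (excess % limit != 0) return false;
  long ratio = excess / limit;        // ratio >= 1 here
  return (ratio & (ratio - 1)) == 0;  // true iff ratio is a power of two
}

// Lists which occurrences, from 1 to upTo, would be logged.
std::vector<long> loggedInstances(long limit, long upTo) {
  std::vector<long> out;
  for (long c = 1; c <= upTo; ++c)
    if (passesLimit(c, limit)) out.push_back(c);
  return out;
}
```

For a limit of 5 and the first 100 occurrences, `loggedInstances(5, 100)` yields 1, 2, 3, 4, 5, 10, 15, 25, 45, 85.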
For example, with a limit of 5, the instances logged are numbers 1, 2, 3,
4, 5, 10, 15, 25, 45, 85, and so forth.

The second concept applies when a large bunch of errors of some type is
followed by a long period with no such errors. At that point the count
toward the limit may be reset, so that a new cluster of these errors can
again be logged. This is explained below in "Time spans on Limits."

13.5) Filtering by Module or multiple modules
---------------------------------------------

Individual destinations can filter the messages they will react to by
module. Typical framework code using these routines might look like:

    logfile.filterModule("Tracking");   // Ignore messages except those from
                                        // the Tracking module.

The model is that there is a `respondToModule' list and an `ignoreModule'
list. At any given time, either the `respondToModule' list or the
`ignoreModule' list may be active, but not both. There are two ways to
call each of two methods to influence these:

    dest.ignoreModule("*");        Clear the `respondToModule' list, and
                                   filter out all messages, except those
                                   coming from any module that is later
                                   added to the `respondToModule' list.

    dest.respondToModule("name");  Add this module to the `respondToModule'
                                   list, and remove it (if present) from
                                   the `ignoreModule' list.

    dest.ignoreModule("name");     Add this module to the `ignoreModule'
                                   list, and remove it (if present) from
                                   the `respondToModule' list.

    dest.respondToModule("*");     Clear the `ignoreModule' list, and
                                   respond to all messages, except those
                                   coming from any module that is later
                                   added to the `ignoreModule' list.
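The two-list model in the chart above can be captured in a few lines: a mode flag says which list is active, "*" switches the mode and clears the newly active list, and a named module moves between the lists. `ModuleFilter` is a hypothetical class, not ELdestControl, and the default of accepting every module is an assumption, since the text does not state the initial state.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical model of per-destination module filtering: either the
// respond-to list or the ignore list is active, never both.
class ModuleFilter {
 public:
  void respondToModule(const std::string& name) {
    if (name == "*") {          // respond to all, except later ignores
      ignore_.clear();
      mode_ = Mode::IgnoreList;
    } else {
      respond_.insert(name);
      ignore_.erase(name);
    }
  }
  void ignoreModule(const std::string& name) {
    if (name == "*") {          // ignore all, except later respond-tos
      respond_.clear();
      mode_ = Mode::RespondList;
    } else {
      ignore_.insert(name);
      respond_.erase(name);
    }
  }
  // Would this destination react to a message from this module?
  bool accepts(const std::string& module) const {
    return mode_ == Mode::RespondList ? respond_.count(module) > 0
                                      : ignore_.count(module) == 0;
  }

 private:
  enum class Mode { RespondList, IgnoreList };
  Mode mode_ = Mode::IgnoreList;  // assumption: empty ignore list, accept all
  std::set<std::string> respond_, ignore_;
};
```

Calling `ignoreModule("*")` followed by `respondToModule("Tracking")` leaves only Tracking accepted, matching the intent of the filterModule call shown at the start of this section.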
Two other available methods are:

    dest.filterModule("a")     has the same effect as
                                   dest.ignoreModule("*"); dest.respondToModule("a");

    dest.excludeModule("a")    has the same effect as
                                   dest.respondToModule("*"); dest.ignoreModule("a");

13.7 DiscardThreshold
---------------------

The action taken when a message is sent to an ErrorLog includes formatting, timestamping, and "shopping" the message to each destination. Even if no destination reacts to the message, this can take a fair amount of time. The frameworker (or for that matter any user with access to errlog) can short-circuit this by doing

    errlog.setDiscardThreshold (sev);

The meaning is that any messages sent through the usual mechanism of

    errlog ( severity, id ) << items << endmsg;

which have severity less than sev, will be discarded with the absolute minimum possible work. The message will not be shopped to destinations, and each operator<< (until another errlog(sev, id) is started) will simply return, doing no work. The net effect is a speedup of null-message processing by a factor of 30.

The discardThreshold works like other thresholds in that a message whose severity is greater than or equal to the threshold can be reacted to. The discardThreshold takes precedence over individual destinations' thresholds. The default discardThreshold is ELzeroSeverity.

13.8 Establishing Debug Verbosity Levels
----------------------------------------

The ErrorLogger package supports a separate syntax for issuing information intended for debugging. The idea is that this output can easily be disabled at an overall level, set on a per-ErrorLog basis by the framework. The fundamental notion is that of a debug "verbosity" level, which is simply an integer. If a message is given a level which is higher than the cutoff chosen by the framework for that logger, then that message will be completely (and efficiently) ignored.
    errlog.setDebugVerbosity(3);

The default, if no debug verbosity level is set for an ErrorLog, is a level of 0: all debug messages assigned any positive verbosity level will be ignored by default. The cutoff on debugging verbosity level is controllable for each individual ErrorLog. This allows the framework to accommodate cases where verbosity needs to be high for one module but not for others.

By default, every message issued through this debug message mechanism is assigned a severity of ELinfo and a message id of "DEBUG". The framework can alter this for a given ErrorLog, for example:

    errlog.setDebugMessages ( ELwarning, "differentID" );

The mechanism of debug verbosity levels acts very much like that of discard thresholds for the ErrorLog, in that:

  + If a message's verbosity level is above the ErrorLog's cutoff, the message is discarded regardless of the thresholds set for various destinations, and never even makes it to statistics.

  + If the message passes the debug verbosity cut, individual output destinations may still reject it because its severity is not high enough.

14) Controlling abort behavior
------------------------------

The higher severity levels are used by physicists to indicate that they consider this error so severe as to warrant terminating the job. However, ultimate control of this rests in the hands of the framework. The control is associated with the logger (the ELadministrator) rather than any specific destination.

    logger->setAbortThreshold (ELfatal);

By default, the abort threshold is ELabort.

If the physicist issues a log message with severity at or above the abort threshold, then UPON COMPLETION OF THE LOG MESSAGE and its dispatch to the various destinations, the logger will terminate the job by invoking exit (severityLevel).

Assuming the framework had established an atexit() handler, that handler would be called as the job exits.
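Three of the gates described above reduce to simple comparisons: the per-ErrorLog discard threshold, the per-ErrorLog debug verbosity cutoff, and the logger-wide abort threshold. The sketch below models them in C++ under stated assumptions. The enum Sev is a simplified, hypothetical stand-in for ELseverityLevel and does not reproduce the package's real list of levels. LogGates and AbortPolicy are illustrative names only.

```cpp
#include <cassert>
#include <cstdlib>

// Simplified, hypothetical ordering standing in for ELseverityLevel.
enum class Sev { Zero, Info, Warning, Error, Next, Severe, Abort, Fatal, Highest };

// Per-ErrorLog gates (sections 13.7 and 13.8).
struct LogGates {
  Sev discardThreshold = Sev::Zero;  // documented default: ELzeroSeverity
  int debugVerbosity = 0;            // documented default: 0

  // Shopped to destinations only if severity reaches the discard threshold.
  bool shouldShop(Sev s) const { return s >= discardThreshold; }

  // A debug message survives only if its level does not exceed the cutoff.
  bool shouldShopDebug(int level) const { return level <= debugVerbosity; }
};

// Logger-wide abort rule (section 14).
struct AbortPolicy {
  Sev abortThreshold = Sev::Abort;  // documented default: ELabort

  bool wouldAbort(Sev s) const { return s >= abortThreshold; }

  // Call only after the completed message has been offered to every
  // destination. std::exit runs any handlers registered with std::atexit.
  void afterDispatch(Sev s) const {
    if (wouldAbort(s)) std::exit(static_cast<int>(s));
  }
};
```

Setting abortThreshold to the top level (Sev::Highest here) corresponds to the "no user message aborts the job" policy discussed next.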
A plausible philosophy is to set the abort threshold at ELhighestSeverity, so that NO user message aborts the job. In that case, we advise periodically checking for the presence of severe errors (see "Checking on errors").

15) Obtaining error statistics summaries
----------------------------------------

Assuming the frameworker has attached an instance of the provided ELstatistics destination to the logger, statistics summaries may be obtained. ELstatistics has no visible reaction to being sent an error message, but places information into a table, which can be formatted and output. For the purposes of the illustrations below, we will assume that the logger has attached an ELstatistics destination, with an ELdestControl called "stats."

The summary of error statistics can be obtained as a string. It can also be sent to a destination device (as a series of summary lines); ELoutput places these lines into its associated stream to output the summary. Other destinations may treat these lines differently; in particular, ELcollected and ELstatistics ignore them.

To obtain the summary:

    stats->summary(dest);         // Sends the summary to this ELdestination.
    stats->summary(destControl);  // Sends the summary to the destination
                                  // controlled by this ELdestControl.
    stats->summary(os);           // Sends the whole summary string as a char*,
                                  // to some arbitrary ostream os.
    stats->summary(s);            // Sends the whole summary string to the
                                  // ELstring s (passed by reference).

Except when the framework supplies a string& s, the summary will be sent using char* arguments, a single line (with a fixed maximum length) at a time.

Each of the above forms also has an optional last argument, containing a char* to insert into the first line of the summary string.

    stats->summary(dest, "summary title");
    stats->summary(os,   "summary title");
    stats->summary(s,    "summary title");

ELoutput places at most 40 characters of this title in the first line of the summary output.
(Custom destinations should protect themselves against misbehaving if sent an arbitrarily long title.)

There is an additional means of having the summary sent to an ostream: if the constructor of ELstatistics is supplied an ostream argument, then when the job completes, the destructor for that ELstatistics will check whether any information has changed since the last summary request. If so, one final summary will automatically be sent to that ostream. (This is safe assuming the user does not explicitly delete the ostream before the statistics destination goes away.) If ELstatistics was constructed without specifying this termination ostream, it will default to using cerr as the "termination ostream."

To suppress the termination summary altogether:

    stats.noTerminationSummary();

Not every message will become a summary entry: the frameworker can set a threshold for the ELstatistics destination, filtering out errors with severity below (say) ELerror. Also, ELstatistics can be constructed with a maximum number of error IDs it will keep track of.

The format of the summary information is that of three parts:

  The first part lists the errors which have occurred, with their frequencies.

  The second part re-lists just the identification, and supplies the contexts (that is, the run/event or up to 17 characters of the context string) of the first two and last examples of occurrences of each error.

  The third part lists the count of occurrences of each severity level.

The presence of an unterminated message at the end of a job will terminate the message and trigger the termination summary.
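Here is a minimal C++ sketch of the termination-summary bookkeeping. It assumes a single "changed since last summary" flag. MiniStats is a hypothetical class, not ELstatistics, and it keeps only per-ID counts. The 40-character title cut borrows the ELoutput behavior described above, so an arbitrarily long title cannot run away.

```cpp
#include <cassert>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical miniature of ELstatistics' summary bookkeeping.
class MiniStats {
public:
  explicit MiniStats(std::ostream& terminationStream) : term_(&terminationStream) {}

  // Final summary on destruction, but only if something changed since the
  // last summary request and the termination summary was not suppressed.
  ~MiniStats() {
    if (term_ != nullptr && changed_) summary(*term_, "termination");
  }

  void record(const std::string& id) {
    ++counts_[id.substr(0, 20)];  // ID key: leading 20 characters (no padding)
    changed_ = true;
  }

  void summary(std::ostream& os, const std::string& title = "") {
    os << "Summary: " << title.substr(0, 40) << '\n';  // at most 40 title chars
    for (const auto& entry : counts_) os << entry.first << ' ' << entry.second << '\n';
    changed_ = false;
  }

  void noTerminationSummary() { term_ = nullptr; }

private:
  std::ostream* term_;
  std::map<std::string, int> counts_;
  bool changed_ = false;
};
```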
16) Clearing statistics and/or limits
-------------------------------------

The method to clear the counts in the statistics being kept in the destination whose ELdestControl is "stats" is

    stats->clearSummary();

This clears both the individual statistics (kept by message ID) and the counts for the various severity levels. It would normally be used after having invoked stats->summary(..) in some form to stream the information somewhere.

clearSummary() does not zero the aggregate counts for individual error IDs or for severity levels. It also does not affect any limits set, or wipe the knowledge of the message IDs from the statistics tables. (Thus there would remain a bunch of errors with counts of zero; when statistics are output these do not appear.)

There is a more sweeping method:

    stats->wipe();

This clears everything: counts and aggregate counts for individual IDs and for severity levels. The statistics table is wiped clean, which may be relevant if memory management issues have imposed a limit on the number of message ID entries in the statistics table.

The same routine also wipes out any information in the limits table. This includes values which have been supplied by setLimit() (or setTimespan()), and counts of the individual instances of each message ID. For the ELstatistics destination this is moot (since limits do not apply), but for ELoutput or ELerrorList this may be useful when a maximum number of message ID entries in the limits table has been imposed.

    dest->wipe();
    logger->wipe();     // Applies wipe() to all attached destinations

This clears everything: counts and aggregate counts for severity levels and for individual IDs, as well as any limits established. (This includes the limit for "*", that is, all messages.) The table is wiped clean, which may be relevant if memory management issues impose a limit on the number of message ID entries in the statistics table.
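The difference between clearSummary() and wipe() is easiest to see on a toy table. The C++ sketch below is hypothetical: StatsTable and Counts are not package names, and the sketch models only per-ID counts, not severity-level counts. clearSummary() zeroes the counts but keeps the IDs and aggregates, and zero-count IDs drop out of the summary. wipe() forgets everything.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-ID record: a zero-able count plus an aggregate count.
struct Counts {
  int count = 0;      // zeroed by clearSummary()
  int aggregate = 0;  // survives clearSummary(); cleared only by wipe()
};

class StatsTable {
public:
  void record(const std::string& id) {
    Counts& c = table_[id];
    ++c.count;
    ++c.aggregate;
  }

  // Zero the counts, but keep the IDs and their aggregate counts.
  void clearSummary() {
    for (auto& entry : table_) entry.second.count = 0;
  }

  // Forget everything, including which IDs have been seen.
  void wipe() { table_.clear(); }

  // IDs that would appear in a summary: zero-count entries are skipped.
  std::vector<std::string> summaryIds() const {
    std::vector<std::string> ids;
    for (const auto& entry : table_)
      if (entry.second.count > 0) ids.push_back(entry.first);
    return ids;
  }

  std::size_t knownIds() const { return table_.size(); }

  int aggregate(const std::string& id) const {
    auto it = table_.find(id);
    return it == table_.end() ? 0 : it->second.aggregate;
  }

private:
  std::map<std::string, Counts> table_;
};
```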
Finally, there is a way to zero the counts going toward the limits of all IDs in a destination. This does not affect ELstatistics (which does not use limits) and will not compactify the limits table (see Note 6: "Limit Semantics"). It also does not affect aggregate counts.

    dest->zero();

****************************************
* Further Options for the Frameworker *
****************************************

17) Checking on errors
----------------------

Since the severity levels above ELnext indicate advice that the flow of processing ought to be modified or ceased, it is advisable (in all but the most time-critical applications) that the framework check for the presence of such errors, between modules or between events. To make this inexpensive, a routine checkSeverity() is provided.

    ELseverityLevel highest = logger->checkSeverity();
    if ( highest >= ELnext ) { /* do whatever */ }

This routine provides the severity level of the highest error declared since the last checkSeverity(), or ELzeroSeverity if none have been declared since the last check.

18) Error counts
----------------

You can obtain a cumulative count of errors of a given severity, including those which occurred but were not sent to a destination or saved.

    logger->severityCount(ELerror);
    logger->severityCount(ELsevere, ELfatal);

The two-argument form adds the counts of the severities from the first to the second level, inclusive; for example, severe1, severe2, abort and fatal.

These counts can be reset to zero by

    logger->resetSeverityCount(ELerror);
    logger->resetSeverityCount(ELerror, ELfatal);
    logger->resetSeverityCount();

The latter resets all the severity counts.

Note that these methods of the ELadministrator "logger" have nothing to do with the counts kept in ELstatistics for summary purposes.

In the two-argument forms of these routines, if the first argument is a higher level than the second, the count will be zero and nothing will be reset.
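Both checkSeverity() and the range forms of severityCount() can be sketched in C++. The enum Sev is a simplified, hypothetical ordering of severity levels, and SeverityCounter is an illustrative name, not a package class. The sketch shows three rules: the range is inclusive, an inverted range counts zero and resets nothing, and checkSeverity() starts over after each call.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Simplified, hypothetical ordering of severity levels.
enum class Sev { Zero, Info, Warning, Error, Next, Severe, Severe2, Abort, Fatal };
constexpr std::size_t kLevels = 9;

class SeverityCounter {
public:
  void note(Sev s) {
    ++counts_[idx(s)];
    if (s > highest_) highest_ = s;
  }

  // Highest severity noted since the previous call (Zero if none), then reset.
  Sev checkSeverity() {
    Sev h = highest_;
    highest_ = Sev::Zero;
    return h;
  }

  // Inclusive range; if `from` is above `to`, the loop never runs.
  int severityCount(Sev from, Sev to) const {
    int total = 0;
    for (std::size_t i = idx(from); i <= idx(to); ++i) total += counts_[i];
    return total;
  }
  int severityCount(Sev s) const { return severityCount(s, s); }

  void resetSeverityCount(Sev from, Sev to) {
    for (std::size_t i = idx(from); i <= idx(to); ++i) counts_[i] = 0;
  }
  void resetSeverityCount(Sev s) { resetSeverityCount(s, s); }
  void resetSeverityCount() { counts_.fill(0); }

private:
  static std::size_t idx(Sev s) { return static_cast<std::size_t>(s); }
  std::array<int, kLevels> counts_{};
  Sev highest_ = Sev::Zero;
};
```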
-- Further Options --

20) Time span in conjunction with limits
---------------------------------------

Sometimes, you may have a large bunch of errors logged in a short period of time, due to a specific problem within, say, a peculiar event. The limit mechanism prevents an output destination from being swamped with more than N of these identical messages. If the messages continue to arrive on a regular basis, then this throttling should continue to apply. But sometimes, there may be a burst of errors, hitting the limit, followed by a long hiatus, followed by another instance or burst of that type of error. In that case, you may well wish to see the errors immediately after the dry spell.

The way this is provided is by the concept of a time span. A time span is a number of seconds associated with a given error type. When a message arrives at ELoutput or ELerrorList (which use the same ELlimitTable class to implement their throttling behavior), the count (toward throttling based on the limit) might be zeroed. The count is zeroed if the number of seconds between the previous occurrence of this type of error and the present occurrence exceeds the time span for this type of error. Of course, once the count is zeroed, this error and the following ones of its type, up to N in all, will not be suppressed. (On systems where getting time information is impractical, time span will be moot.)

The semantics of setting time spans, and of choosing which time span is applicable, are exactly the same as for setting limits. However, a time span is expressed as a float number of seconds t.

    dest->setTimespan (id, t)
    dest->setTimespan (severity, t)
    dest->setTimespan ("*", t)
    logger->setTimespans (id, t)
    logger->setTimespans (severity, t)
    logger->setTimespans ("*", t)

A time span (or a limit) can be "unset" by supplying a honking large value that will never be reached.
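One small table can combine three rules: the most specific key wins, logging backs off exponentially past the limit, and the count resets after a long enough gap. This is a C++ sketch under assumptions, not ELlimitTable. LimitTable is hypothetical. IDs and severity names share one string key space for brevity. A negative value stands for "no limit" or "no time span", and a limit of 0 is assumed to suppress every instance.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <string>

// Hypothetical stand-in for one destination's limit table (not ELlimitTable).
// Keys are looked up most-specific-first: message ID, then severity, then "*".
class LimitTable {
public:
  void setLimit(const std::string& key, int n) { limits_[key] = n; }
  void setTimespan(const std::string& key, double seconds) { spans_[key] = seconds; }

  // Returns true if this occurrence (at time `now`, in seconds) should be logged.
  bool shouldLog(const std::string& id, const std::string& severity, double now) {
    Entry& e = entries_[id];
    const double span = lookup(spans_, id, severity, -1.0);  // negative: no span
    if (e.count > 0 && span >= 0.0 && now - e.lastTime > span) {
      e.count = 0;  // gap since the previous occurrence exceeds the span
    }
    e.lastTime = now;
    const int n = ++e.count;
    const int limit = lookup(limits_, id, severity, -1);  // negative: no limit
    if (limit < 0 || n <= limit) return true;
    if (limit == 0) return false;  // assumption: a zero limit logs nothing
    // Past the limit: log only when (n - limit) / limit is exactly 2**N.
    const int excess = n - limit;
    if (excess % limit != 0) return false;
    const int ratio = excess / limit;
    return (ratio & (ratio - 1)) == 0;
  }

private:
  struct Entry {
    int count = 0;
    double lastTime = 0.0;
  };

  template <typename T>
  static T lookup(const std::map<std::string, T>& table, const std::string& id,
                  const std::string& severity, T fallback) {
    for (const std::string& key : {id, severity, std::string("*")}) {
      auto it = table.find(key);
      if (it != table.end()) return it->second;
    }
    return fallback;
  }

  std::map<std::string, int> limits_;
  std::map<std::string, double> spans_;
  std::map<std::string, Entry> entries_;
};
```

With a "*" limit of 5 and no time span, this sketch logs occurrences 1, 2, 3, 4, 5, 10, 15, 25, 45, 85, the same sequence given for limits in section 13.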
21) ELerrorList Local Semantics
--------------------------------

The ELerrorList is intended to support something that was done in Run I: at least one experiment had the framework examine errors AFTER an event was completed. One could imagine storing the error information in a data bank tied to the event, for example.

The ELerrorList records an ErrorObj by pushing it onto a std::list using push_back(theErrorObj). The list it works with is the list supplied to its constructor. ELerrorList will add one item to the ErrorObj: it invokes fullContext() to get the full run/event context and adds that as a last item before pushing the ErrorObj onto the list.

The list remains the property (and the responsibility) of the framework. Thus the ELerrorList Local Semantics is merely the semantics of std::list. Of relevance:

  * The user can iterate through the list to get at each ErrorObj, and can examine each one.

  * The user can invoke list.clear() between events to keep the list size minimal.

  * IMPORTANT -- The list must not be allowed to destruct while errors might still be logged. For example, the following is wrong and will likely core dump:

        ErrorLog errlog;

        void setup();
        void doThingsThatMightLogErrors();

        int main() {
          setup();
          doThingsThatMightLogErrors();
        }

        void setup() {
          // ..
          std::list<ErrorObj> theList;
          logger->attach(ELerrorList(theList));
        }   // theList is destroyed here, but the ELerrorList still refers to it

        void doThingsThatMightLogErrors() {
          // Things had better **not** log errors here, because the ELerrorList
          // will still want to record them, but theList is no longer valid.
        }

The operations from the user or frameworker that affect ELerrorList mostly come from the fact that it is an ELdestination subclass:

    setThreshold()      Thresholds, limits and time spans are set
    setLimit()          as for any other ELdestination class.
    setTimespan()

23) How ELstatistics information is kept
----------------------------------------

The error statistics kept by the ELstatistics destination are logically a map.
The combination of 20-byte message id, severity, process, module and subroutine, which we for convenience combine to form --ELextendedID--, acts as the key for this map. The data is a (zero-able) count and an aggregate count for each type of message, plus the (brief) contexts (run/event) of the first two and the latest occurrences of such a message with non-null contexts. A last piece of data for each entry is a flag telling whether any message of this type has been throttled out of all the destinations because of limits.

In "How the frameworker sets up for logging," it was mentioned that ELstatistics should be attached after all the ordinary destinations.

    ELdestControl logcerr  = logger->attach ( ELoutput(cerr) );
    ELdestControl logfile  = logger->attach ( ELoutput("myFileName.log") );
    ELdestControl logstats = logger->attach ( ELstatistics(cout) );
    ELdestControl errbuf   = logger->attach ( ELsaveBuffer(20000) );

This is the recommended order, for the following reason: each destination indicates whether it has ignored a given message because of a limit or threshold. An ELstatistics, when passed an error object, notes whether any destination has YET actually logged the error message; if not, it will mark that error type as having an instance that was ignored by every destination due to limits. So if you want that asterisk in the statistics summary to be meaningful, attach ELstatistics after all the destinations which might output the message.

In the above example, we attached errbuf after ELstatistics because we want the * to appear in the summary if an error was logged to neither logfile nor logcerr. For instance, one could imagine attaching, AFTER logstats, a scrolling screen that will get millions of messages: if an instance was output only there, we still want the summary to note that it appears in no relevant destination log.
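The attach-order argument can be modeled with a dispatch loop in which each destination reports whether it logged the message. In this hypothetical C++ sketch, Dispatcher is not a package class. The statistics slot counts an "asterisk" event whenever no destination attached before it has logged the message. A destination attached after the statistics slot therefore does not affect the mark.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical dispatch loop over destinations in attach order.
class Dispatcher {
public:
  // An ordinary destination: returns true if it actually logged the message.
  void attach(std::function<bool(const std::string&)> dest) {
    slots_.push_back({std::move(dest), false});
  }

  // The statistics destination, which only observes what came before it.
  void attachStatistics() { slots_.push_back({nullptr, true}); }

  void send(const std::string& msg) {
    bool loggedSoFar = false;
    for (const Slot& s : slots_) {
      if (s.isStats) {
        if (!loggedSoFar) ++asteriskCount;  // nobody before it logged this
      } else if (s.dest(msg)) {
        loggedSoFar = true;
      }
    }
  }

  int asteriskCount = 0;

private:
  struct Slot {
    std::function<bool(const std::string&)> dest;
    bool isStats;
  };
  std::vector<Slot> slots_;
};
```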
In addition to the information kept in the individual error table, a count and an aggregate count of errors by severity level are kept, so that this information may be supplied in the summary.

You can access the ELstatistics information via the stats.summary(..) methods.

The ELstatistics class provides the default constructor (using cerr as its ostream), and also constructors taking two pieces of information. The first is a limit on the number of distinct entries for which statistics will be kept. The second is an ostream to be used to output a final statistics summary upon job termination (or whenever the ELstatistics object is destructed) if statistics have changed since the last requested summary; see Note 12: "Termination issues".

    ELstatistics( int spaceLimit = -1 );
    ELstatistics( std::ostream & osp );
    ELstatistics( int spaceLimit, std::ostream & osp );

The example above,

    ELdestControl logstats = logger->attach ( ELstatistics(cout) );

sets up statistics with no limit on the number of entries. When logstats is destructed, if it has changed since the last summary, it sends its summary as a string to cout.

23.5 Access to the statistics map
---------------------------------

The frameworker can obtain a std::map containing the same information available when an ELstatistics summary is requested. The difference is that for purposes within a program, it may be easier to extract information from the ELextendedID keys and StatsCount structures than to parse summary output lines.

    ELdestControl logstats = logger->attach ( ELstatistics() );
    ...
    std::map<ELextendedID, StatsCount> m = logstats.statisticsMap();

To use this, one should know about the ELextendedID struct from ELextendedID.h, and the StatsCount struct from ELmap.h. (There is no guarantee that ELstatistics keeps this information in the form of a map. However, if it is kept in another form, then statisticsMap() will do the transformation and supply a map.)
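The real key and value types returned by statisticsMap() are defined in ELextendedID.h and ELmap.h. The C++ sketch below only mirrors the information described in section 23, using hypothetical types (ExtendedId and Count). Its field names are illustrative and do not match those in the real headers. It shows one way to retain the first two and the latest non-null contexts for each key.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>

// Hypothetical key, loosely modeled on the description of ELextendedID.
struct ExtendedId {
  std::string id;  // leading 20 characters of the message ID
  int severity;
  std::string process, module, subroutine;

  bool operator<(const ExtendedId& o) const {
    return std::tie(id, severity, process, module, subroutine) <
           std::tie(o.id, o.severity, o.process, o.module, o.subroutine);
  }
};

// Hypothetical value, loosely modeled on the description of StatsCount.
struct Count {
  int n = 0;                 // zero-able count
  int aggregate = 0;         // aggregate count
  std::string context1, context2, contextLast;  // first two and latest
  bool ignoredAll = false;   // throttled out of every destination

  void add(const std::string& context) {
    ++n;
    ++aggregate;
    if (context.empty()) return;  // only non-null contexts are kept
    if (context1.empty()) context1 = context;
    else if (context2.empty()) context2 = context;
    contextLast = context;
  }
};

using StatsMap = std::map<ExtendedId, Count>;
```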
24) Formatting control available in ELoutput
--------------------------------------------

The ELoutput destination provided allows the frameworker to control some aspects of the format output for each error message. The methods in the second column of this chart reflect the default behavior.

    dest->suppressTime()          dest->includeTime()
    dest->suppressModule()        dest->includeModule()
    dest->suppressSubroutine()    dest->includeSubroutine()
    dest->suppressText()          dest->includeText()
    dest->suppressContext()       dest->includeContext()
    dest->includeSerial()         dest->suppressSerial()
    dest->useFullContext()        dest->useContext()

Note that these methods should be called as methods of the destination control obtained when the ELoutput was attached to the logger.

Users can use \n at the start or end of items sent to the log, to control line formatting. ELoutput also supports a couple of routines to force newlines into the "epilogue" of context and time information appended after the last item:

    dest->separateEpilogue()      dest->attachEpilogue()
    dest->separateTime()          dest->attachTime()

The former (separateEpilogue) places a newline after the last user-supplied item so that the run/event context, module name, and time stamp come out on a fresh line. The latter (separateTime) forces the timestamp to come out at the start of its own line (after the usual indent). These may help with easy scanning of the log output. Their inverse functions in the second column can be called to reverse the process.

Another flexibility is setting line length: the ELoutput destination by default formats message lines by starting a new line whenever an item would go past column 80. One can change that line length:

    dest->setLineLength(len)      dest->getLineLength()

Note that this is done separately for each destination; thus a log might have 132-column formatting while a screen output sticks to 80.
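The default wrapping rule can be sketched as a small function. This C++ version is an illustration under assumptions: wrapItems is a hypothetical name, continuation lines get a two-space indent, and an item longer than the line is kept whole rather than split.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical ELoutput-style wrapping: append each item to the current line
// unless doing so would pass column `lineLength`; then start an indented line.
std::string wrapItems(const std::vector<std::string>& items,
                      std::size_t lineLength = 80,
                      const std::string& indent = "  ") {
  std::string out;
  std::size_t column = 0;
  for (const std::string& item : items) {
    if (column > 0 && column + item.size() > lineLength) {
      out += '\n';
      out += indent;
      column = indent.size();
    }
    out += item;  // an over-long item is never split
    column += item.size();
  }
  return out;
}
```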
Note also that for the ELstatistics destination, the formatting (which assigns fixed column widths for various fields) is unaffected by this line length.

Another flexibility is squelching the "ErrorLog Established" message that comes out at the top of the log. This message can be very useful in cases where a log was appended to an existing file. Nonetheless, it is sometimes desirable that a program with no problematic occurrences emit absolutely nothing to the error logging stream. This is done when constructing the ELoutput destination. A second argument is accepted; this is a boolean which defaults to true, but if false will squelch the ErrorLog Established message.

    ELdestControl logcerr = logger->attach ( ELoutput(cerr, false) );
    ELdestControl logfile = logger->attach ( ELoutput("myFileName.log", false) );

The default constructor for ELoutput does not provide this flexibility. For further flexibility, a custom ELdestination must be created.

24.5 Hex output formatting via ErrorLog
---------------------------------------

When data of integer type is output, it is occasionally preferable to see the value in hex. The package does not support streaming the hex manipulator into an ErrorLog, because technical considerations involving the details of the signature of hex make doing this in a portable manner too difficult.

Instead, the frameworker or a user can set an ErrorLog to print all integer items at least as large as some trigger value in a format like 258 [0x00000102]:

    errlog.setHexTrigger (int n);

The rules are as follows:

  The logic determining whether to use the hex form is on a per-ErrorLog basis.

  Hex output is done whenever an int, long, or short (or the unsigned form of one of those) is streamed to the errlog, and the absolute value of that integer is greater than or equal to the hexTrigger established for that ErrorLog.

  The default value of hexTrigger is negative. A negative value turns off hex formatting.
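The hex trigger rule can be sketched as a formatting function. In this hypothetical C++ version, formatInt is not a package function. A negative trigger disables hex output, and the hex digits show the value's bit pattern padded to at least 8 digits. How the real package renders negative integers is an assumption here.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical rendering of one integer item under a hex trigger.
std::string formatInt(long long value, long long trigger) {
  std::ostringstream os;
  os << value;
  // |value| computed in unsigned arithmetic so the most negative value is safe.
  const unsigned long long magnitude =
      value < 0 ? 0ULL - static_cast<unsigned long long>(value)
                : static_cast<unsigned long long>(value);
  if (trigger >= 0 && magnitude >= static_cast<unsigned long long>(trigger)) {
    os << " [0x" << std::hex << std::setw(8) << std::setfill('0')
       << static_cast<unsigned long long>(value) << ']';
  }
  return os.str();
}
```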
29) Direct logging to one destination
--------------------------------------

Although in general all logging is done through ErrorLog objects, which work through the ELadministrator, the frameworker may log a formed ErrorObj directly to an ELdestControl.

    bool reacted = logfile.log ( myMsg );     // ErrorObj & myMsg

As implied, the destination's thresholds and limits still apply; the log() method returns false if the destination did not react to this ErrorObj.

If this backdoor mechanism is used, it bypasses the ErrorLog functions. In particular, ErrorLog normally provides module and subroutine strings. To restore this ability, the ErrorObj class has two further methods:

    myMsg.setModule ( "whatever" );
    myMsg.setSubroutine ( "whatever" );

Note that (unless the destination happens to be ELstatistics) error messages logged in this manner will not be reflected in the statistics. Generally, it is a better idea to send things to all destinations using

    errlog (myMsg);

and let each destination filter out what it does not want.

Another point for custom destinations: in ELdestination.h we have defined the method log(), which is used in two senses. Generally, it is the method by which errlog sends the full message to a destination (e.g., ELstatistics) that wants the full message at once. However, it can also be invoked directly as a way to log the message to a destination via a back door. The problem is that the former usage implies NOT processing individual items, because they presumably have already been processed. The latter implies processing items if that is how they get output.

To allow for this, the destination must be prepared for a backdoor log, if it has an appropriate action to take. In some cases (ELoutput, for instance) it can know absolutely that log() was coming through the backdoor, because skipMessageObject was set so the ELadministrator would never call its log(). Then it can know to process the individual items.
I can imagine a destination that reacts both to items and to the overall message. In that case, one should check msg.isFromELadministrator to see whether the backdoor mechanism was used.
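The two uses of log() can be reconciled by checking where the message came from. The C++ sketch below is hypothetical. Msg::fromAdministrator stands in for whatever isFromELadministrator reports. ItemDest accumulates items as they arrive on the normal path, but formats them itself when called through the back door. Either way, the line is produced exactly once.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical message: items plus a flag telling whether it arrived through
// the administrator's normal path.
struct Msg {
  std::vector<std::string> items;
  bool fromAdministrator = false;
};

class ItemDest {
public:
  // Normal path: items are streamed in one at a time before log() is called.
  void item(const std::string& s) { current_ += s; }

  bool log(const Msg& msg) {
    if (!msg.fromAdministrator) {
      for (const std::string& s : msg.items) current_ += s;  // back door
    }
    lines_.push_back(current_);
    current_.clear();
    return true;  // reacted
  }

  const std::vector<std::string>& lines() const { return lines_; }

private:
  std::string current_;
  std::vector<std::string> lines_;
};
```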