Combining HepHistogram Objects
from Multiple FileManagers

John Marraffino
5 March 2001

From time to time, particularly in anticipation of dealing with multiple HepTuple output files from farm analysis, it will be useful to be able to combine several sets of HepHistogram objects into a single file. As you know, it is already possible to combine individual HepHistogram objects, roughly as follows.


     HepHist1D & h1 = manager1->retrieveHist1D( "Title1" );
     HepHist1D & h2 = manager2->retrieveHist1D( "Title1" );
     h1 += h2;

where we can think of h2 as an update to h1 even though initially h1 may have been empty. If there were multiple instances of h2 and we sequentially applied += to each of them, always with h1 as the left hand side, then each instance of h2 would be referred to as an incremental update.
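The accumulation semantics can be illustrated with a minimal, self-contained sketch. ToyHist1D and accumulateUpdates are hypothetical stand-ins invented for this illustration; they are not HepTuple classes, and the real HepHist1D certainly carries more state than bin contents and an entry count.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 1D histogram; NOT the HepHist1D class.
struct ToyHist1D {
    std::vector<double> bins;   // bin contents
    long entries;               // number of fills

    explicit ToyHist1D(std::size_t nBins) : bins(nBins, 0.0), entries(0) {}

    // Accumulate another histogram bin by bin (assumes identical binning).
    ToyHist1D& operator+=(const ToyHist1D& update) {
        assert(bins.size() == update.bins.size());
        for (std::size_t i = 0; i < bins.size(); ++i) {
            bins[i] += update.bins[i];
        }
        entries += update.entries;
        return *this;
    }
};

// Apply a sequence of incremental updates to an initially empty target,
// always keeping the target on the left hand side of +=.
inline ToyHist1D accumulateUpdates(std::size_t nBins,
                                   const std::vector<ToyHist1D>& updates) {
    ToyHist1D h1(nBins);
    for (std::size_t i = 0; i < updates.size(); ++i) {
        h1 += updates[i];
    }
    return h1;
}
```

Starting from an empty target is harmless in this toy model because adding to zero-filled bins simply copies the first update.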

Now this has been generalized to collections of HepHistogram objects as well. This note describes that ability and gives some hints, tips and warnings.

Since the semantics are those of accumulation (in the sense of a += operation), we will refer to the manager of the file intended to hold the accumulated HepHistogram objects as the targetManager and the manager of the file intended to supply incremental updates as the sourceManager. There may, of course, be many sourceManagers but, for the moment, we consider only one. The most straightforward operation would then begin with something like


     HepFileManager * sourceManager =
                      new HepHBookFileManager( fileName,
                                         HepFileManager::HEP_READONLY,
                                         blocksize, topDirName );

     HepFileManager * targetManager =
                      new HepRootFileManager( ofileName,
                                        HepFileManager::HEP_REFRESH,
                                        otopDirName );

There is some significance to the second argument to the HepFileManager constructor but we defer discussion of that for the time being.

A new member function named combine and an operator overload of += (implemented to simply pass its arguments to combine) have been added to the HepFileManager class that allow something like the following.


  // Combine two sets of HepHistograms using HepFileManager::combine 

     HepFileManager& target = targetManager->combine( sourceManager );

Alternatively, one can write


  // Combine two sets of HepHistograms using the += operator

     *targetManager += *sourceManager;
 

The combine member function scans the directory trees for the two files searching for HepHistogram objects in the same directories and having the same titles. This is the definition of a match. Note that the id is not considered. When a matching pair is detected, the members of that pair are combined, using the += operator from the corresponding object class (HepHist1D, HepHist2D or HepHistProf) and the resulting object is left on the file that corresponds to targetManager. Ntuples are not considered since there are other mechanisms for combining them.

In the case of no match, one of two things occurs, depending on the nature of the mismatch. If the unmatched object is among those managed by targetManager, it is simply left unmolested. On the other hand, if the unmatched object is among those managed by sourceManager, it is cloned into the targetManager set, including, if necessary, as many levels of new directories from its original path as are needed to graft it into the targetManager tree. That is, an entire new branch may be created somewhere on the targetManager directory tree. Note that the entire operation takes place in two phases. In the first phase, the source and target directory structures are reconciled and then, in the second phase, the HepHistogram objects are combined. As a consequence of this two phase approach, even if no histograms are combined, the targetManager directory structure may have been altered in the sense of new directories being added. Under no circumstances is anything ever removed.
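The matching and grafting rules above can be sketched with a toy model built from standard containers. Everything here (ToyBins, ToyDirectory, ToyFile, toyCombine) is hypothetical and is not part of HepTuple. A file is flattened to a map from directory path to titled histograms, and "same number of bins" stands in for the real compatibility test, which this sketch does not try to reproduce.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy model; none of these names come from HepTuple.
typedef std::vector<double> ToyBins;                   // bin contents only
typedef std::map<std::string, ToyBins> ToyDirectory;   // title -> histogram
typedef std::map<std::string, ToyDirectory> ToyFile;   // directory path -> contents

// Merge source into target in two phases. Returns the number of matched
// pairs that were actually added together. Nothing is ever removed.
inline int toyCombine(ToyFile& target, const ToyFile& source) {
    assert(&target != &source);   // combining a file with itself is not modelled

    // Phase 1: reconcile directories. operator[] creates any directory
    // that the target lacks, even if nothing ends up being combined there.
    for (ToyFile::const_iterator d = source.begin(); d != source.end(); ++d) {
        target[d->first];
    }

    // Phase 2: combine histograms that share a directory and a title.
    int merged = 0;
    for (ToyFile::const_iterator d = source.begin(); d != source.end(); ++d) {
        ToyDirectory& tdir = target[d->first];
        for (ToyDirectory::const_iterator h = d->second.begin();
             h != d->second.end(); ++h) {
            ToyDirectory::iterator match = tdir.find(h->first);
            if (match == tdir.end()) {
                tdir[h->first] = h->second;          // unmatched source: clone it
            } else if (match->second.size() == h->second.size()) {
                for (std::size_t i = 0; i < h->second.size(); ++i) {
                    match->second[i] += h->second[i];
                }
                ++merged;
            }
            // Otherwise the pair is incompatible and the merge is abandoned.
        }
    }
    return merged;
}
```

Objects that exist only in the target are never visited by either loop, which is how the sketch leaves them untouched.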

Since combine returns a HepFileManager reference, one can write things like


     HepFileManager& target = targetManager->combine( sourceManager1 )
                                            .combine( sourceManager2 )
                                            .
                                            .
                                            .
                                            .combine( sourceManagerN );

for combining several sets of HepHistogram objects at one stroke, assuming the corresponding managers have all been instantiated previously. This represents the basic functionality of the HepFileManager::combine member function and its co-conspirator, +=.
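Why chaining works, and how += can simply forward to combine, is easiest to see in a stripped-down sketch. ToyManager is a hypothetical class written for this note; it only counts entries per title and makes no claim about how HepFileManager is implemented internally.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical toy manager; it only keeps an entry count per title.
class ToyManager {
public:
    void fill(const std::string& title, long n) { counts_[title] += n; }

    long entries(const std::string& title) const {
        std::map<std::string, long>::const_iterator it = counts_.find(title);
        return it == counts_.end() ? 0 : it->second;
    }

    // Returning *this is what makes t.combine(a).combine(b) possible.
    ToyManager& combine(const ToyManager* source) {
        assert(source != 0 && source != this);
        for (std::map<std::string, long>::const_iterator it =
                 source->counts_.begin();
             it != source->counts_.end(); ++it) {
            counts_[it->first] += it->second;
        }
        return *this;
    }

    // The operator form just passes its argument on to combine.
    ToyManager& operator+=(const ToyManager& source) {
        return combine(&source);
    }

private:
    std::map<std::string, long> counts_;
};
```

Because each call returns the same target object, every link in the chain accumulates into that one target and the sources are left unchanged.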

Earlier, we deferred discussion of some details of the HepFileManager constructor as it is used here. There are two issues involved. First, one would like to play it safe and instantiate the file manager for the source files so as to protect its HepHistogram objects from inadvertent damage. The natural tendency would be to instantiate a const file manager and be done with it. For technical reasons, that will not work. (Among other things, a const file manager will not allow use of the cd member function!) The alternative is what is shown above. There is a HepFileManager constructor that takes a so-called mode parameter. The allowed values are defined through an enum in the file manager base class header and HepFileManager::HEP_READONLY is one of them, with the obvious meaning.

The second issue is more subtle. Recall that both HBook and Root will carry multiple versions of an object, referred to as cycles. For histogram objects, these are denoted on directory listings by something like ObjectTitle;n where n is the cycle number. By default, each merge of two histogram objects will produce a new cycle of that object without purging any of the previous cycles. Consequently, a three-increment merge, for example, of some object will produce a target manager file with four cycles of that object: the original one and an additional one for each merge. Indeed, for some applications, this may be desirable but the price is that the file will grow fairly quickly if there are many histogram objects and many merges. In the case where retaining all cycles is not needed, setting mode to HepFileManager::HEP_REFRESH is useful. This has the effect of suppressing creation of multiple cycles and is the logical equivalent of doing the incremental update "in place." Note however that this only works for the Root manager. The other managers will simply ignore this mode value and carry on in their default fashion.
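The cycle arithmetic can be checked with a small toy model. ToyMode, TOY_DEFAULT, TOY_REFRESH and ToyObject are hypothetical names made up for this sketch; it models one object as a list of cycle contents and does not touch HBook or Root.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model of one object and its cycles.
enum ToyMode { TOY_DEFAULT, TOY_REFRESH };

struct ToyObject {
    std::vector<double> cycles;   // cycles[n-1] holds the content of cycle n

    explicit ToyObject(double initial) : cycles(1, initial) {}

    // Merge one increment into the newest cycle.
    void merge(double increment, ToyMode mode) {
        assert(!cycles.empty());
        const double updated = cycles.back() + increment;
        if (mode == TOY_REFRESH) {
            cycles.back() = updated;      // update in place, no new cycle
        } else {
            cycles.push_back(updated);    // keep old cycles, add a new one
        }
    }
};
```

In this model three default merges leave four cycles while three refresh merges leave one, and the newest cycle holds the same content either way.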

At this point, a few more words on what was rather glibly referred to as a "matching pair" of HepHistogram objects are in order. We said that the only requirements were that the two objects be in the same directory and have the same title. More properly, two such objects should be thought of as match candidates. When the += operators from the HepHistogram classes are used, additional basic compatibility conditions are imposed on the pair.

If any of these conditions is not satisfied, a warning level ZMexception is thrown and the merge attempt for this pair of HepHistograms is abandoned. While these requirements are necessary (indeed the underlying combine algorithms require them), it is still possible that a user may want even closer control of the definition of a "match." To that end, another level of abstraction is available.

The HepTuple kit now includes a new class named HepHistCombiner that is intended to serve as a base class for a user-defined, descendant class. An instance of that class is then passed to the HepFileManager::combine member function. At the point where a candidate pair has been defined, the HepHistCombiner::combine function is invoked. If the user has provided an overriding function, that function is called with references to the two HepHistogram objects as arguments and the user may do whatever higher level tests are required. If the user has provided no overriding function, the base class member function does nothing and abandons the merge. It is prudent for the user's combine function to impose the basic compatibility tests in addition to whatever else is needed because the standard implementation of += will fail without them. Most of this is shown in the following example where, for brevity, we show only our override of the combine function for HepHist1D objects.


     #include <iostream>
     #include "HepTuple/HepHistCombiner.h"

     using std::cout;
     using std::endl;

     class myCombiner : public HepHistCombiner {

     public:

       void combine( HepHist1D& target, const HepHist1D& source )
       {
     // Use the default test for compatibility. Then use the default 
     // combination algorithm for compatible histograms. This differs
     // from the default case only in that we bother to check that the
     // source histogram is not empty.  We also make some noise so 
     // they know we've been here.

         if( target.compatibleHist1D( source ) ) {
           if( source.entries() > 0 ) {
             target += source;
             cout << "HepHist1Ds with title " << source.title()
                  << " are compatible and have been combined" << endl;
           } else {
             cout << "Source HepHist1D is empty. Skip it." << endl;
           }
         } else {
           cout << "HepHist1Ds are not compatible." << endl;
         }
       }

     // Similar overridden function definitions for 2D and Profile
     // histogram objects would go here.

     };

With this, the driving program starts the action as follows.


     // Merge the two files using my custom combiner algorithm.
     // Instantiate a myCombiner object and pass it to
     // HepFileManager::combine

       myCombiner mycomb;
       HepFileManager& target
           = targetManager->combine( sourceManager, mycomb );

Although the example is basically silly, it shows the mechanics of how to take complete control of the HepHistogram combining. The business of testing for empty histograms could just as easily be some sophisticated Kolmogorov test or some such. At the same time, it is no painful stretch of the imagination to see how one might use this machinery for something entirely apart from just merging two histogram sets. But that, as the saying goes, is another story.
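As a hint of what a more demanding criterion might look like, here is a rough sketch of a Kolmogorov-style distance between two binned histograms. The function toyMaxCdfDistance is hypothetical and works on plain vectors of bin contents; it computes only the largest gap between the two normalized cumulative distributions, not a probability, and binned data makes it an approximation at best. A user-written combine function could compare such a number against a threshold of its own choosing before applying +=.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest gap between the normalized cumulative distributions of two
// histograms with identical binning. Returns -1.0 if either histogram
// is empty, since there is then no shape to compare.
inline double toyMaxCdfDistance(const std::vector<double>& a,
                                const std::vector<double>& b) {
    assert(a.size() == b.size());

    double sumA = 0.0;
    double sumB = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sumA += a[i];
        sumB += b[i];
    }
    if (sumA <= 0.0 || sumB <= 0.0) {
        return -1.0;
    }

    double cumA = 0.0;
    double cumB = 0.0;
    double maxGap = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        cumA += a[i] / sumA;
        cumB += b[i] / sumB;
        const double gap = std::fabs(cumA - cumB);
        if (gap > maxGap) {
            maxGap = gap;
        }
    }
    return maxGap;
}
```

Identical shapes give a distance of zero and completely disjoint shapes give one, so a threshold somewhere in between is a policy decision left to the user.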