It is important to understand that any limitations on block and column names imposed by the underlying implementation still apply here. As an example, we note that HBOOK limits block and column names to a maximum of 8 characters and will truncate longer names to fit. On the other hand, HepTuple itself will recognize the full names as presented, resulting in a mismatch between itself and HBOOK that can produce startling behaviour. The user will find it wise to observe the inherent limitations of the underlying implementation.
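A quick way to guard against this mismatch is to check, before defining any columns, whether two names would become identical once cut to 8 characters. The following is a minimal sketch, not part of the package: `truncatedForHbook` and `findTruncationClashes` are hypothetical helper names, and the limit of 8 is taken from the HBOOK behaviour described above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Assumed limit, taken from the text: HBOOK keeps only the first 8 characters.
constexpr std::size_t kHbookNameLimit = 8;

// Hypothetical helper: the name as HBOOK would see it after truncation.
std::string truncatedForHbook(const std::string& name) {
    return name.substr(0, kHbookNameLimit);
}

// Hypothetical helper: returns every name whose truncated form was already
// claimed by a different, earlier name in the list.
std::vector<std::string> findTruncationClashes(const std::vector<std::string>& names) {
    std::map<std::string, std::string> firstSeen;  // truncated form -> first full name
    std::vector<std::string> clashes;
    for (const std::string& name : names) {
        const std::string shortName = truncatedForHbook(name);
        assert(shortName.size() <= kHbookNameLimit);
        auto inserted = firstSeen.emplace(shortName, name);
        if (!inserted.second && inserted.first->second != name) {
            clashes.push_back(name);
        }
    }
    return clashes;
}
```

Calling `findTruncationClashes` on the full list of intended block and column names and refusing to proceed when the result is non-empty keeps the two views of the names consistent.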
The single anomalous case is that of an Ntuple defined by supplying only the manager and the title.
In HBOOK the designated variables are set up on a per-block basis. The start of a block is provided via the VARIABLE argument to HBNAME, or the TUPLE argument to HBOOKNC.
In HBOOK the correct definition of a block is as a COMMON block.
In this package, a block should be a struct.
That option is also available in this package, via the method
COMMON block.
The analogous concept in C++ is either a global struct, or a struct allocated
off the heap via new. Unlike a Fortran COMMON block, a struct may be
"padded" by the C or C++ compiler to optimize word alignment.
For example, in the following struct, the relative addresses
likely to be assigned by a compiler are shown:
struct x {
    Int2 a;    // Address 0
    Int4 b;    // Address 4 (skipped 2 to word-align)
    Float4 c;  // Address 8
    Float8 d;  // Address 16 (skipped 4 to longword-align)
    Float16 e; // Address 32 or 24 (might skip 8 to quadword-align)
};
The implementation of the Ntuple manager needs to know about this padding, so that it can capture the proper data for each column. For simple homogeneous blocks, such as a collection of N floats, the word alignment happens automatically without need for padding -- for such structures, the chform syntax is appropriate. Whenever a struct is inhomogeneous, one should explicitly describe each column, to provide its designated variable. The package will then compute the padding used, and allow for it in implementing the Ntuple.
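The padding can be observed directly by comparing the address of each designated variable with the start of its block. The sketch below is illustrative only: `Sample` uses standard fixed-width types as stand-ins for the package's Int2, Int4, Float4 and Float8, and `columnOffset` is a hypothetical helper, not a package routine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-ins for Int2, Int4, Float4 and Float8 (assumed widths).
struct Sample {
    std::int16_t a;
    std::int32_t b;
    float        c;
    double       d;
};

// Hypothetical helper: byte offset of a designated variable from the start
// of its block, which is what must be known to locate each column's data.
std::ptrdiff_t columnOffset(const void* blockStart, const void* variable) {
    return static_cast<const char*>(variable) - static_cast<const char*>(blockStart);
}

// Total number of padding bytes the compiler added to Sample.
constexpr std::size_t paddingBytes() {
    return sizeof(Sample) - (sizeof(std::int16_t) + sizeof(std::int32_t) +
                             sizeof(float) + sizeof(double));
}
```

On a typical platform `columnOffset(&s, &s.b)` is 4 rather than 2, and `paddingBytes()` is non-zero, which is why supplying only the block start and a list of types is not enough for an inhomogeneous struct.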
FlatNtuples do not use designated variables; the accumulate method takes a pointer to a data array as an argument.
COMMON declaration.
This format is retained not only for the convenience of experienced HBOOK users, but because in some cases it may be easier to use than explicitly filling in the properties of each column. However, when a block is inhomogeneous and might involve padding, it becomes necessary to supply the designated variables for each column explicitly.
This package supplies routines to convert from a chform string to an array of ColumnAttribs structures, and vice-versa:
The ColumnAttribs structure has:
- char* tag; // . . . . . . Must be unique across entire Ntuple
- ColumnData_t type; // . . Int2_ct, Int4_ct, Float4_ct, Float8_ct, ...
- void* variable; // . . .. Designated variable
- void* defaultVal; // . .. Value assigned to uncaptured data
- char* block; // . . . . . Name of block this column is in
- void* blockStart; // . .. Address of start of this column's block
- int nbits; // . . . . . . Packing - number of bits. 0 = no packing
- float rangeLo; // . . . . Packing - minimum value in range
- float rangeHi; // . . . . Packing - maximum value in range
- int indexLo; // . . . . . Span of index variable - minimum
- int indexHi; // . . . . . Span of index variable - maximum
- int ndim; // . . . . . .. Number of dimensions in array of columns
- int* extents; // . . . .. Array containing extents in each of ndim dimensions
- char* index; // . . . . . Column tag for column to use as index for first dimension
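Written out as C++, the list above might look like the following. This is a sketch under stated assumptions rather than the package's actual header: the enumerator values are arbitrary, only a few type codes are shown, `const char*` is used in place of `char*` so that string literals can be assigned, the maximum of the index span is named `indexHi`, and `Track` and `describePt` are hypothetical names used only to show one column being described.

```cpp
#include <cassert>
#include <cstdint>

// A few of the column type codes named in the text; values are arbitrary here.
enum ColumnData_t { Int2_ct, Int4_ct, Float4_ct, Float8_ct };

// One member per entry in the list above (assumed layout, not the real header).
struct ColumnAttribs {
    const char*  tag;         // unique across the entire Ntuple
    ColumnData_t type;
    void*        variable;    // designated variable
    void*        defaultVal;  // value assigned to uncaptured data
    const char*  block;       // name of the block this column is in
    void*        blockStart;  // address of the start of this column's block
    int          nbits;       // 0 = no packing
    float        rangeLo;
    float        rangeHi;
    int          indexLo;
    int          indexHi;
    int          ndim;
    int*         extents;
    const char*  index;
};

// Hypothetical user block with one integer and one float column.
struct Track {
    std::int32_t nHits;
    float        pt;
};

// Describe the pt column of a Track block; all other fields stay zeroed.
ColumnAttribs describePt(Track& track, float& defaultPt) {
    ColumnAttribs attribs = {};
    attribs.tag        = "pt";
    attribs.type       = Float4_ct;
    attribs.variable   = &track.pt;
    attribs.defaultVal = &defaultPt;
    attribs.block      = "TRACK";
    attribs.blockStart = &track;
    return attribs;
}
```

Because `variable`, `defaultVal` and `blockStart` are raw pointers, the objects they point to must outlive the ColumnAttribs that refers to them.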
A design decision needs to be made about these various pointers in a structure which will itself often be new-ed.
- Int1_ct; Uint1_ct
- Int2_ct; Uint2_ct
- Int4_ct; Uint4_ct
- Int8_ct; Uint8_ct
- Float4_ct;
- Float8_ct;
- Float16_ct;
- Bool_ct;
- Bool2_ct; // For compatibility with LOGICAL*2 in HBOOK
- Bool4_ct; // For compatibility with LOGICAL*4 in HBOOK
- Char_ct;
- Char4_ct; Char8_ct; Char12_ct; Char16_ct;
- Char20_ct; Char24_ct; Char28_ct; Char32_ct;
variable argument when defining a column
The variable argument is a void * because the contents of the column might be an int, float, or whatever. The package could also provide runtime type-safe methods by supplying several different signatures.
The range is specified in terms of floats; there is no provision for specifying the minimum and maximum values with greater precision than that.
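To make the role of the bit count and the float range concrete, here is one common way such packing can be done: linear quantization of the range into 2^nbits - 1 steps. This is an assumed scheme for illustration only; the text does not state the formula actually used by the package or by HBOOK, and `packValue` and `unpackValue` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumed scheme: map [rangeLo, rangeHi] linearly onto the integer codes
// 0 .. 2^nbits - 1, clamping values that fall outside the range.
std::uint32_t packValue(float value, int nbits, float rangeLo, float rangeHi) {
    assert(nbits > 0 && nbits < 32 && rangeHi > rangeLo);
    const std::uint32_t maxCode = (1u << nbits) - 1u;
    const float clamped = std::fmin(std::fmax(value, rangeLo), rangeHi);
    const double fraction = (static_cast<double>(clamped) - rangeLo) /
                            (static_cast<double>(rangeHi) - rangeLo);
    return static_cast<std::uint32_t>(std::llround(fraction * maxCode));
}

// Inverse mapping; the result differs from the original value by at most
// half a quantization step for in-range inputs.
float unpackValue(std::uint32_t code, int nbits, float rangeLo, float rangeHi) {
    assert(nbits > 0 && nbits < 32 && rangeHi > rangeLo);
    const std::uint32_t maxCode = (1u << nbits) - 1u;
    return static_cast<float>(rangeLo +
        (static_cast<double>(rangeHi) - rangeLo) * code / maxCode);
}
```

Under this scheme the precision of a packed column is (rangeHi - rangeLo) / (2^nbits - 1), so a range that is wider than necessary wastes resolution.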
Note that the numbering is zero-based rather than 1-based; this agrees with the convention used by HBook.
The labels are generated with numbers in "Fortran order", which is the reverse of the C++ ordering. That is, while in an array declared as
float x[4][3][2];
Assuming columnwise storage, the entire array of columns is not made to share one buffer. Instead, each individual column goes to one buffer and hence is blocked in big chunks on the disk. Thus a viewing job which only scans X(1,2,3) for every row need not read in extraneous data pertaining to X(1,2,1) and so forth.
For example, if J is an index variable for X(J), and J goes from 1 to 5, then five columns X(0), X(1), ... X(4) are formed. Now if 4 rows have respectively J=1, J=3, J=2, J=5, the actual column buffers would hold 4, 3, 2, 1, and 1 values respectively. Note that when an index variable is used, the index is the count of how many columns are to be filled, and the columns are still labeled as 0-based.
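The bookkeeping in this example can be reproduced with a few lines of code. The sketch below tallies how many values each column buffer receives, using the rule stated above that an index value of J fills the columns labeled 0 through J-1; `valuesPerColumn` is a hypothetical name, not a package routine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For columns labeled 0 .. nColumns-1, count the rows that fill each one.
// A row with index value J fills the columns labeled 0 .. J-1.
std::vector<std::size_t> valuesPerColumn(const std::vector<int>& indexPerRow,
                                         int nColumns) {
    assert(nColumns >= 0);
    std::vector<std::size_t> counts(static_cast<std::size_t>(nColumns), 0);
    for (int index : indexPerRow) {
        for (int col = 0; col < nColumns && col < index; ++col) {
            ++counts[static_cast<std::size_t>(col)];
        }
    }
    return counts;
}
```

Passing the per-row index values and the number of columns returns one count per column buffer, which shows how buffers for the higher-numbered columns stay short when most rows have small index values.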
For arrays with more than one dimension, the index variable is always the range of the slowest varying dimension. In C++ that is the first index in an array x[J][5][3], while in Fortran that would be the last index X(3,5,J).
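The reversal between the two orderings amounts to writing the C++ indices back to front. The sketch below illustrates this with a hypothetical label format "NAME(i,j,k)"; the exact label text generated by the package is not specified here, and `fortranOrderLabel` is an assumed helper name. Indices are kept zero-based, as described above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Build a Fortran-order label from C++-order indices by reversing them:
// the C++ element x[i][j][k] is labeled NAME(k,j,i).
std::string fortranOrderLabel(const std::string& name,
                              const std::vector<int>& cppIndices) {
    assert(!cppIndices.empty());
    std::string label = name + "(";
    for (std::size_t n = cppIndices.size(); n-- > 0;) {
        label += std::to_string(cppIndices[n]);
        if (n != 0) {
            label += ",";
        }
    }
    return label + ")";
}
```

With this convention the index variable, which is the first C++ index, always ends up as the last number in the label.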
An index variable must always have a defined span of possible values. The data for an index variable is packed in a lossless way, which relies on that span being accurate. If a column array might have no entries, the index span should start from zero. An index outside its defined span will cause a warning message, and the storage of only the first spanMax array elements for any column using that index for that row.
This default value may be modified by the columnDefault method. Note that this takes an int, float, or single char; the int form can be used to supply a value for short or long, and the float form for double. More sophisticated precision on this default value is not supported.
Whether or not the retrieval method selected simultaneously supplies the value of the index variable, columns which are "not present" for a given row because the index for that column array for that row was too small, will be set to their default values when retrieval of the row, block, or array of columns would cause them to be filled.
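The effect on retrieval can be pictured as follows. This is a minimal sketch of the behaviour described, not the package's retrieval code: `retrieveColumnArray` is a hypothetical function, `stored` stands for the values actually captured for one row, and `capacity` for the full length of the column array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rebuild a full-length column array for one row: captured values first,
// then the default value for every column that was "not present".
std::vector<float> retrieveColumnArray(const std::vector<float>& stored,
                                       std::size_t capacity,
                                       float defaultValue) {
    std::vector<float> out(capacity, defaultValue);
    const std::size_t present = stored.size() < capacity ? stored.size() : capacity;
    assert(present <= out.size());
    for (std::size_t i = 0; i < present; ++i) {
        out[i] = stored[i];
    }
    return out;
}
```

A caller therefore never sees stale or uninitialized data in the trailing columns, only the default.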
Some underlying managers will also allow for entire blocks to be uncaptured for a row; in these, the contents of the retrieved block will be every column holding its default value, if that block was not captured for the retrieved row.
For example, the user might have some complicated calorimetry structure which will sequentially be filled ten times for the event, based on different raw data. Although ten distinct instances of that structure to be Ntuple-d might be good coding practice, it is permissible to use just one instance and capture it into block 1, 2, 3, ... 10 before doing register() to move to the next row.
Actually, all columns with no specified block which were defined between one row-store and another will end up in the same block.
Block names again are treated as their first 8 characters. When a column is defined, you may use a longer form for its name than just its 8-byte column tag -- the format is bname::ctag and we call this a nametag. For example, in the nametag "BLOCK5::PxDev" the column tag is PxDev and it is in block BLOCK5.
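Splitting a nametag into its two parts is straightforward. The sketch below is one possible reading of the format, not the package's parser: `splitNametag` is a hypothetical name, a nametag without "::" is assumed to belong to the un-named block, and the block name is cut to its first 8 characters as described above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Split "bname::ctag" into (block name, column tag).
// The block name is reduced to its first 8 characters; a nametag with no
// "::" is returned with an empty block name (assumed to mean "un-named").
std::pair<std::string, std::string> splitNametag(const std::string& nametag) {
    const std::string separator = "::";
    const std::size_t pos = nametag.find(separator);
    if (pos == std::string::npos) {
        return std::make_pair(std::string(), nametag);
    }
    const std::string block = nametag.substr(0, pos).substr(0, 8);
    assert(block.size() <= 8);
    return std::make_pair(block, nametag.substr(pos + separator.size()));
}
```

For the nametag "BLOCK5::PxDev" this yields the block BLOCK5 and the column tag PxDev.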
Unfortunately, one cannot use block names as a namespace mechanism to allow disjoint users to choose their column tags arbitrarily: The column tag must be unique across the entire HepNtuple. (HBOOK imposes this restriction.) Some other form of column tag naming agreement must be reached for a given program.
Other than the un-named ("don't care") block, which can be assigned anew when new columns are added after a store has been done, the block name assigned is the block name used. We have abandoned the thought of automatically changing the block name under various conditions.
This has a manager-specific consequence for HBOOK:
Character string columns may have to be placed into distinct blocks from ordinary variables. The manual implies this is necessary. If so, it will be an error to mix character and numeric values in the same block if an HBOOK manager is used.