Good, robust, and reliable debuggers are required in code development. Such debuggers are traditionally used with compiled languages. They are a de facto part of the programming environment: one must have one to do physics analysis. GUI front ends are probably preferred to command-line-driven ones. As new scripting languages are proposed, one must be able to "debug" in those languages. Since a scripting, or interpreted, language is interactive, one could argue that the debugger is de facto already there, as variables can be printed and scripts can be executed one line at a time. However, this does not mean that the interactive programming environment brings all the features of a good debugger for a compiled language: for instance, it is unclear how breakpoints, conditional or not, are managed in complicated scripts. Thus, the scripting language must have a debugger. Likewise, profiling is particularly relevant when building large software systems. One must be able to locate the code segments that are used most often, so that they can be optimized or migrated to the compiled language. Integration of these tools is a sticking point in a mixed-language environment. Although running nested debuggers (e.g., starting with the C or C++ debugger and firing up the scripting debugger underneath) is possible, it could be confusing at times. Thus, seamless integration of the debugger/profiler should be on the wish list.
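As an illustration of the profiling requirement, the following is a minimal sketch of locating the most frequently called code segments from within a scripting language. It assumes Python as the scripting language, which this document does not prescribe, and uses the standard cProfile and pstats modules. The functions slow_sum and analysis_pass are hypothetical stand-ins for user analysis code.

```python
import cProfile
import pstats


def slow_sum(n):
    """Deliberately naive loop: the kind of hot spot a profiler should expose."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def analysis_pass():
    """Hypothetical analysis step that calls the hot function repeatedly."""
    return [slow_sum(20_000) for _ in range(50)]


def profile_calls(func):
    """Run func under cProfile and return {function name: total call count}."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) to
    # (primitive calls, total calls, own time, cumulative time, callers).
    return {key[2]: value[1] for key, value in stats.stats.items()}


call_counts = profile_calls(analysis_pass)
print("slow_sum was called", call_counts["slow_sum"], "times")
```

A function that dominates the call counts or the cumulative time in such a report is a candidate for optimization or for migration to the compiled language. On the debugging side, Python's pdb accepts a condition on a breakpoint (break lineno, condition), which is one possible answer to the conditional-breakpoint question raised above. Neither tool addresses the harder problem of seamless integration with a C or C++ debugger.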
The analysis system (or framework) must accommodate user-written modules, so that these modules can be called interactively. These modules are written in the preferred compiled language (C, C++, or FORTRAN) or in the scripting language, and can be executed within the "framework". This capability can be based on dynamic linking, pipes, RPC calls, and shared memory access on UNIX systems, or on similar access methods. It is mandatory that the data structures created in the user's compiled-language code be accessible from the "framework" while running the interactive scripts. It is also desirable that all user-written methods or functions be accessible in an interactive session.
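The scripting-language half of this requirement can be sketched as follows. This is a minimal example under stated assumptions: Python stands in for the scripting language, the loader load_user_module is a hypothetical framework helper, and the Track class and select function are invented user code. Modules written in a compiled language would need a dynamic-linking or similar mechanism instead, which is not shown here.

```python
import importlib.util
import tempfile
from pathlib import Path

# Stand-in for a user-written module; in practice this file would already exist.
USER_SOURCE = '''
class Track:
    def __init__(self, pt):
        self.pt = pt

def select(tracks, min_pt):
    return [t for t in tracks if t.pt >= min_pt]
'''


def load_user_module(path, name="user_module"):
    """Load a source file at run time and return the resulting module object."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as workdir:
    source_path = Path(workdir) / "user_module.py"
    source_path.write_text(USER_SOURCE)
    user = load_user_module(source_path)

# Both the user's data structure (Track) and the user's function (select)
# are now reachable from the interactive session.
tracks = [user.Track(pt) for pt in (0.5, 2.0, 7.5)]
selected = user.select(tracks, min_pt=1.0)
selected_pts = [t.pt for t in selected]
print(selected_pts)
```

The point of the sketch is that the framework never needs to know the module's contents in advance: everything the user defined is visible by name once the module is loaded.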
It is strongly desired that we (the RunII-CD/D0/CDF staff assigned to support the Run II analysis tasks) have access to the source code of the "shareware" or freeware components of the Run II analysis system or framework. Read access to commercial source code is preferred, but probably not very realistic.
While it is tempting to define robustness as synonymous with being absolutely bulletproof, a little reflection forces us to conclude that this is not possible and that a more realistic definition is called for. Turning our thinking upside down, it is probably more fruitful to say what robust is not. The one word that seems to embody the opposite of robust is fragile, in the sense that a fragile system is one that collapses utterly at the first irregularity. In general, there seem to be two broad sorts of irregularities: the first being things for which the user is responsible (pilot error), and the second being system resources which the user had the right to expect were present and functioning but which were, in fact, missing or broken.
To a reasonable extent, sorting exception conditions into these two classes helps to localize them. The first class is connected with the user's interaction with the system and suggests that the user interface needs to pay a lot of attention to validating the user's input before acting on it and potentially doing serious damage. This is painful and tedious, but errors of this sort can and should almost always be identified, reported, and perhaps even logged from within the interface. Thus the user interface should also be regarded as a sort of gatekeeper, denying access to the internals of the system unless the action request is properly formed and completely valid within the current context. On the other hand, the second class of exceptions tends to be related to the system's management of its resources. Again, that tends to localize attention to the interaction with things other than the user, such as the file system or the network. Simply hanging or crashing when, for instance, the event data server is unavailable is not acceptable.
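The two classes of exceptions described above can be sketched in code. This is a minimal illustration, not a proposed design: all names (RequestError, ResourceUnavailable, KNOWN_ACTIONS, fetch_events, handle) are hypothetical, and the server_up flag merely simulates whether the event data server is reachable.

```python
class RequestError(Exception):
    """User-side problem: the request is malformed or invalid in this context."""


class ResourceUnavailable(Exception):
    """System-side problem: an expected resource is missing or broken."""


KNOWN_ACTIONS = {"read_events"}  # hypothetical action names


def validate(request):
    """Gatekeeper: reject bad requests before they reach the internals."""
    if request.get("action") not in KNOWN_ACTIONS:
        raise RequestError(f"unknown action: {request.get('action')!r}")
    count = request.get("count")
    if not isinstance(count, int) or count <= 0:
        raise RequestError("count must be a positive integer")
    return request


def fetch_events(count, server_up):
    """Stand-in for the event data server; server_up simulates availability."""
    if not server_up:
        raise ResourceUnavailable("event data server is not responding")
    return list(range(count))


def handle(request, server_up=True):
    """Return (status, payload) rather than crashing or hanging."""
    try:
        checked = validate(request)
        return "ok", fetch_events(checked["count"], server_up)
    except RequestError as err:
        return "rejected", str(err)
    except ResourceUnavailable as err:
        return "unavailable", str(err)


print(handle({"action": "read_events", "count": 3}))
print(handle({"action": "delete_all", "count": 3}))
print(handle({"action": "read_events", "count": 2}, server_up=False))
```

Keeping the two exception types separate lets the interface report pilot error back to the user immediately, while resource failures can be logged, retried, or escalated without being confused with bad input.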
This appears to be one of the easy items. Both within the scientific community and the commercial world, web-based documentation is rapidly becoming a standard way of distributing information, occasionally to the exclusion of the more traditional media. One typical example we happen to know is the documentation that accompanies the Rogue Wave Tools.h++ product. There is a set of printed manuals one can buy from the company but they appear to be little more than a hard-copy version of what can be had free from their web site. Moreover, given the production delays inherent in producing and distributing books, it is nearly certain that the documentation available on the web is more current.
The point here is that it is hardly necessary to make an issue of this, since the rest of the world seems to have already decided that it is "A Good Thing." On the contrary, the only substantive question appears to be one of obtaining or providing the tools necessary to put our own internally produced documentation on the web as well.
With respect to standards, it is important to realize that there are both de facto and de jure standards and that neither Fermilab nor even the DOE is in a position to determine either. Certain decisions and strategies have already been adopted by the Run II management that acknowledge this. The adoption of cvs as the code management system is an example of the use of a de facto standard and the adoption of the KAI CC compiler is in anticipation of a de jure standard, as soon as ISO accepts the language committee's recommendation, designated ISO/IEC FDIS 14882. As their press release points out, the FDIS stands for "Final Draft International Standard." At present, it appears that both of these decisions were wise and have already begun to pay dividends, the first clearly more than the second so far.
In fairness, it should be noted that CERN has reached a very different conclusion after starting with a very similar premise. Based on their own experience, CERN has chosen to work with the vendor compilers wherever possible and accept vendor-to-vendor variation, known to be large in the case of C++ code. This was driven in part by heavy reliance on third-party software (such as the NAG library) developed under the "native" compilers.
At the same time, certain other strategies, based on accepting developments at other DOE labs rather than on industry standards, appear less than stunningly successful. Two examples come to mind. A substantial amount of effort has been invested in HPSS as the event data storage and management system. It now appears there is a good probability that effort will have been wasted and that another solution needs to be found. Another example is the adoption of SoftRelTools, developed by the BaBar collaboration at SLAC, instead of the more widely known and used imake from the X Consortium or autoconf from the Free Software Foundation. The wisdom of this choice is not completely clear yet, so little can be learned, but at least one of us (JMM) already believes that decision was a mistake, at least in part.
Based on these few, admittedly incomplete cases, it is possible to offer a tentative generalization. Where there is an industry standard, adopt it even if some other lab has not. Where there is no acceptable industry standard but some sister lab or major experiment has developed a tool that survives critical inspection, adopt it.