Notes on troubleshooting, or what to do when there's a problem.

====================================================================================================
CORRUPT DATA.

On occasion, telephone will be "working", but the information is garbled, nonsense, or just plain wrong. This has happened when there has been an ORACLE (or other) error at the BSS end of things, and the data file that they ship is corrupt.

UPDATE, 22 Dec 1997: the telemaint.job (in the $TELESERVER_DIR/cron directory) is now smart enough to ABORT if the incoming data file contains any lines of text beginning with "ERROR at line". So these types of Oracle errors should NOT cause the telephone database to disappear in the future (however, the data will be a week old). In other words, for Oracle errors you can skip the part below about "telephone.broken.html".

If OTHER errors cause the database to be corrupt, here's what you do. On node www.fnal.gov (aka www-tele.fnal.gov), from the products account (or another privileged account):

    $ setup teledata
    $ cd $TELEDATA_DIR/www
    $ cp telephone.html telephone.save.html
    $ cp telephone.broken.html telephone.html

(Make sure you use cp, not mv: mv won't update the timestamp on the file, so people's browsers won't reload the new page!)

This will put up a "front page" saying that the telephone directory is undergoing maintenance, buying you some time!

5 Jan 1999: I am in the process of breaking the data out of the teleserver product (putting the data into a separate teledata product), and we want to put the data into AFS space. This may mean that we'd have to volrelease, etc., to really change the telephone web page... stay tuned!
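The UPDATE above says telemaint.job refuses a data file containing lines that begin with "ERROR at line". If you want to check a file by hand before rerunning the job, a minimal sketch of that kind of guard follows. It is a hypothetical helper (check_extract is not part of teleserver, and this is not the real telemaint.job code); also flagging lines that begin with an Oracle code such as ORA-04052, and treating an empty file as bad, are extra assumptions beyond the documented rule.

```shell
#!/bin/sh
# check_extract: hypothetical helper (NOT part of teleserver, and not the
# real telemaint.job code). Reports whether a BSS extract looks corrupt.
#   return 0 - no error lines found
#   return 1 - Oracle error text found (ask BSS to re-send the file)
#   return 2 - no argument, or file missing or empty
check_extract() {
    file=${1:-}
    if [ -z "$file" ]; then
        echo "usage: check_extract /path/to/cd_extract.data" >&2
        return 2
    fi
    # -s is true only if the file exists and is non-empty; treating an
    # empty extract as bad is an assumption, not a telemaint.job rule.
    if [ ! -s "$file" ]; then
        echo "MISSING OR EMPTY: $file" >&2
        return 2
    fi
    # Documented rule: a line that begins with "ERROR at line".
    # Extra assumption: a line that begins with an Oracle code like "ORA-04052".
    pattern='^(ERROR at line|ORA-[0-9]+)'
    if grep -Eq "$pattern" "$file"; then
        echo "CORRUPT: Oracle error text found in $file" >&2
        # Show the first few offending lines, with line numbers.
        grep -En "$pattern" "$file" | head -n 5 >&2
        return 1
    fi
    echo "OK: $file"
    return 0
}
```

After sourcing the file that defines it, run it as "check_extract ~telemain/cd_extract.data"; a non-zero result means it is time to ask BSS for a new file rather than rerunning telemaint.job.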
Then, in ALL cases of errors, look at the incoming data file:

    $ cd ~telemain
    $ more cd_extract.data

When I've seen problems, they've been immediately apparent, because the top of this file contains things like:

    translate_section(section) section_name,
    *
    ERROR at line 7:
    ORA-04052: error occurred when looking up remote object GLC.GLC_ORGS@GLPRD.WORLD
    ORA-00604: error occurred at recursive SQL level 1
    ORA-01035: ORACLE only available to users with RESTRICTED SESSION privilege
    ORA-02063: preceding line from GLPRD

If this is the case (or if there is any question whether or not the file is corrupt), call the person responsible for sending the BSS data files to us; they are usually able to send a new file promptly. This was originally Paul B. Czarapata, x3773, czarjr@fnal.gov.

NOTE: as of 30 Jan 1998, Paul Czarapata no longer works at the lab. The contact for telephone issues should now be Edith Brown, x2648, glynne@fnal.gov.

Once you have a new ~telemain/cd_extract.data, on node www.fnal.gov:

    $ setup teleserver
    $ cd $TELESERVER_DIR/cron
    $ ./telemaint.job

This will take between 30 minutes and 3 hours depending on the load. Don't panic.

Once the data files have been rebuilt, you can test things by looking at http://www.fnal.gov/telephone/telephone.save.html (fill in a name and see if things come out ok). If everything is ok, put the real page back:

    $ setup teledata
    $ cd $TELEDATA_DIR/www
    $ cp telephone.save.html telephone.html

(Make sure you use cp, not mv!)

====================================================================================================
FILE NOT FOUND.

On occasion, if there is a problem on www.fnal.gov and the disk containing the teleserver product is unavailable, users will get a "File not found" message when they try to load telephone. There's not much that can be done about this, except to fix the disk (currently /usr0).

====================================================================================================
WEB SERVER DOWN.
If www-tele.fnal.gov has some kind of hardware problem, and the telephone directory needs to be moved to a new node, here are the things that must be done. NOTE: this is being written BEFORE being tested. Hopefully I will have thought of everything, but in case not, sorry.

- teledata must be available and configured on the target node.
  Presumably, teledata is in AFS space and the target node also has AFS; but if not, then somebody will need to "install" a recent copy of teledata. This would mean tarring the original files on the host node, untarring them on the target node, then doing

    $ ups declare -c -r /path/to/teledata/files -m teledata.table -0 teledata v1_0

  or something similar.

  NOTE: the target node MUST be the same flavor as the original www-tele node! The gdbm data files used by teleserver are flavor-dependent! If you must change the flavor of the web server node (the node running the teleserver script), then you must regenerate the gdbm files by manually running the $TELESERVER_DIR/cron/telemaint.job script:

    $ setup teleserver
    $ $TELESERVER_DIR/cron/telemaint.job [input [output]]

  The default input file is $TELEDATA_DIR/data/RAWDATA; the default output file is $TELEDATA_DIR/data/NASTDATA. These should be fine (but you may wish to save a copy of the NASTDATA file from the original platform, so that you can easily revert when www-tele is alive again).

- links to the teledata html files.
  The web administrator will need to go to the default html area on the target node and create a link named telephone which points to /usr/local/products/teledata/current/www, so that http://target.node/telephone/ maps to /usr/local/products/teledata/current/www/index.html on the target node. See the teledata README file for details.

- teleserver must be available and configured on the target node.
  This should be as simple as doing a "upd install teleserver -G -c" on the target node.

- links to the teleserver code files.
  The web administrator will need to go to the default cgi-bin area on the target node and create a link to the teleserver script, so that http://target.node/cgi-bin/telephone.script/ points to /usr/local/products/teleserver/current/bin/telephone.script.

- web server restart.
  The web server on the target node must be running in a FUE environment, and must set and pass the environment variables needed by teleserver. These variables include:

    PATH            - must have an appropriate version of perl (>= v5_004) in the path
    PERL5LIB        - must point to the PERL5 libraries
    TELESERVER_DIR  - points to the teleserver code files
    TELEDATA_DIR    - points to the telephone data files

  Typically, this is done by doing a "setup teleserver,perl" before starting the web server (and by making sure that the web server's configuration files pass the above environment variables).

- change the www-tele alias.
  Contact the networking folks, and make sure that www-tele.fnal.gov points to the new target node.
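Before restarting the web server on the target node, it may help to confirm that the environment it will inherit really contains the variables listed above. The following is a minimal sketch, assuming it is run from the same shell that will start the server (after "setup teleserver,perl"). check_tele_env is a hypothetical name, not part of teleserver or teledata, and the two file locations it checks ($TELESERVER_DIR/bin/telephone.script and $TELEDATA_DIR/www/index.html) are assumptions that mirror the /usr/local/products/.../current paths given above.

```shell
#!/bin/sh
# check_tele_env: hypothetical pre-flight check (NOT part of teleserver or
# teledata). Run it from the shell that will start the web server, after
# "setup teleserver,perl". Returns 0 if everything looks OK, 1 otherwise.
check_tele_env() {
    status=0
    # "env" with no arguments prints only EXPORTED variables, so this also
    # catches variables that are set but would not be passed to the server.
    for var in PERL5LIB TELESERVER_DIR TELEDATA_DIR; do
        if ! env | grep -q "^$var=."; then
            echo "MISSING: $var is not set, empty, or not exported" >&2
            status=1
        fi
    done
    # PATH must contain a perl of at least version 5.004.
    # In perl, $] is the numeric version (e.g. 5.004 or 5.036000).
    if ! command -v perl >/dev/null 2>&1; then
        echo "MISSING: no perl found on PATH" >&2
        status=1
    elif ! perl -e 'exit($] >= 5.004 ? 0 : 1)'; then
        echo "TOO OLD: perl on PATH is older than 5.004" >&2
        status=1
    fi
    # Assumed layout: .../teleserver/current/bin/telephone.script and
    # .../teledata/current/www/index.html, relative to the two variables.
    if [ -n "${TELESERVER_DIR:-}" ] && [ ! -x "$TELESERVER_DIR/bin/telephone.script" ]; then
        echo "MISSING: $TELESERVER_DIR/bin/telephone.script (or not executable)" >&2
        status=1
    fi
    if [ -n "${TELEDATA_DIR:-}" ] && [ ! -f "$TELEDATA_DIR/www/index.html" ]; then
        echo "MISSING: $TELEDATA_DIR/www/index.html" >&2
        status=1
    fi
    if [ "$status" -eq 0 ]; then
        echo "environment looks OK"
    fi
    return "$status"
}
```

Anything it reports goes to stderr; fix those items before restarting the server. It does not check the gdbm flavor issue described earlier, which still has to be handled by hand.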