Update on RSTDC debugging

The current status of RSTDC debugging follows. To remind you, there are two broad classes of problems: those that corrupt the data without hanging the readout (RSTDC Disaster), and those that crash or hang the readout. Renee, Peter, and Erik have been studying the data-corruption mechanisms that do not hang the readout; these are characterized, at a minimum, by TDC and/or DYC ordinal number mismatches. The problems that crash or hang the readout have varied; one of them is discussed below. In an effort to reproduce some of these problems, Renee, George, and I got the readout running with Level-0 RSMON triggers at high volume (8 MB/spill of RSTDC data) and high rate (900+ events/spill). This was achieved by turning off the FERA ADC, TD, and Fastbus readout. For comparison, at the highest beam rate we ran during the fall, the volume was 5 MB/spill and the rate was 300-400 events/spill. After running for about 10 hours in this configuration, NO ordinal number mismatches occurred.
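For readers unfamiliar with the "ordinal number mismatch" signature, the sketch below shows the kind of consistency check involved. It assumes, hypothetically, that each decoded DYC or TDC header carries an event ordinal from a counter of limited width (8 bits here, an arbitrary choice), and that every header belonging to one event should agree with the expected ordinal. The `Header` record and `ordinal_mismatches` function are illustrative only and do not reflect the actual RSTDC data format or analysis code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Header:
    """Hypothetical decoded header; NOT the real RSTDC word layout."""
    kind: str     # "DYC" or "TDC"
    module: int   # DYC or TDC number
    ordinal: int  # event ordinal carried in the header (assumed)


def ordinal_mismatches(headers: List[Header], expected: int,
                       counter_bits: int = 8) -> List[Header]:
    """Return the headers whose ordinal disagrees with the expected event
    ordinal. Both are compared modulo an assumed counter width, since a
    hardware event counter wraps around."""
    mask = (1 << counter_bits) - 1
    return [h for h in headers if (h.ordinal & mask) != (expected & mask)]


if __name__ == "__main__":
    event = [
        Header("DYC", 4, 17),
        Header("TDC", 0, 17),
        Header("TDC", 1, 16),  # one event behind: flagged as a mismatch
        Header("DYC", 5, 17),
    ]
    for bad in ordinal_mismatches(event, expected=17):
        print(f"{bad.kind}{bad.module}: ordinal {bad.ordinal}, expected 17")
```

In a clean run like the 10-hour test, a check of this kind would report nothing for every event.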

We are, however, able to reproduce one of the DAQ hanging/crashing conditions, previously referred to as "Unexpected DC2 header", which is the message the RSTDC PowerPC (ppc02) prints before it hangs and requires a reboot. At this high volume and rate, this error occurs about every 20 minutes. On closer inspection of the data, it turns out that this error is caused by the header of the first DYC (DYC#4) in DC2 (Memory 2) going AWOL: the DC2 byte count is present (recall that the DC2 does not generate a formal header), but the next longword is a TDC header instead of the expected DYC#4 header. The problem follows the RS-485 input cable into the DC2 and does not follow a UTN or DTN (RS-485 cable terminator), so it is likely due to either the RS-485 cable or DYC#4. Swapping DYC#4 looks promising: we have now run for about an hour at these rates with a new DYC#4. Stay tuned. If this continues to look good, George and I will work on getting Level 1.2 going and on looking into the RSTDC-associated readout problems.
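The "Unexpected DC2 header" condition amounts to a simple structural rule about the start of each DC2 block. The sketch below expresses that rule as an offline check one might run over dumped data. It is not the ppc02 readout code: the tuple representation of decoded longwords and the `check_dc2_block` name are hypothetical, and the real header bit patterns are not modeled. Only the rule itself (byte count, then the DYC#4 header) comes from the observations above.

```python
from typing import List, Optional, Tuple

# Hypothetical decoded longword: (kind, value). kind is "COUNT" (the DC2
# byte count), "DYC" or "TDC" (value = module number), or "DATA".
Word = Tuple[str, int]


def check_dc2_block(words: List[Word], first_dyc: int = 4) -> Optional[str]:
    """Describe the first structural problem at the start of one DC2 block,
    or return None if the block starts as expected.

    The DC2 writes no formal header of its own, so the block should begin
    with the byte count followed immediately by the first DYC's header."""
    if not words or words[0][0] != "COUNT":
        return "missing DC2 byte count"
    if len(words) < 2:
        return "block ends after the byte count"
    kind, value = words[1]
    if kind == "DYC" and value == first_dyc:
        return None
    if kind == "DYC":
        return f"expected DYC{first_dyc} header, found DYC{value} header"
    if kind == "TDC":
        # The failure reproduced at high rate: the DYC header is absent and
        # the word after the byte count is already a TDC header.
        return f"DYC{first_dyc} header missing; found TDC{value} header"
    return f"unexpected {kind} word after the byte count"


if __name__ == "__main__":
    # Byte-count values are arbitrary here; the check does not verify them.
    good = [("COUNT", 12), ("DYC", 4), ("TDC", 0), ("DATA", 0x1234)]
    bad = [("COUNT", 8), ("TDC", 0), ("DATA", 0x1234)]
    print(check_dc2_block(good))  # None
    print(check_dc2_block(bad))   # DYC4 header missing; found TDC0 header
```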

On further inspection of the system, I discovered a couple of other potential problems. The first, which was known before this visit, is that the "Buffer Almost Full" (BAF) signal on the DC2s is NOT OR'd into the global BUSY logic. Renee has set the BAF threshold to 31 MB on both 32 MB memories. Given that our high-rate beam running generated only 5 MB per spill for both memories combined, well under the 31 MB threshold, this is unlikely to be an issue. No action was taken, but it is worth keeping in mind.

More of an issue is that the DYC input buffer "Half Full" (HF) signals are NOT set to be internally OR'd into the DYC BUSY signal. This can be seen with a CAMAC read (F0) of the DYCs, and I have verified it by visually inspecting the jumpers. I don't know the history of this, but it is a mistake. Our maximum per-event data volume is about 6 TDCs * 32 channels * 16 hits * 4 bytes + headers = 12.5 KByte, while the HF mark is 8 KByte and the FIFO capacity is 16 KByte. Since even a maximal event fits in the FIFO, one might imagine this situation is acceptable if the DC2 never asserts WAIT on the DYC (wishful thinking). However, independent of whether HF is internally OR'd into BUSY or not, the DYC will NOT issue a REN in response to REQ while its HF signal is set. If BUSY is not sent back to the trigger when HF is exceeded on a particular DYC, triggers can be sent and accepted by OTHER DYCs and other frontends while that DYC does not respond, leading to synchronization problems. I have addressed this by internally OR'ing the HF signal on each DYC into that DYC's BUSY output.
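The half-full argument can be made concrete with a toy model. The sketch below is a simplification, not a description of the DYC hardware: it recomputes the maximum event size from the figures quoted above (taking 1 KByte as 1024 bytes and using a hypothetical 512-byte header allowance chosen to reproduce the quoted 12.5 KByte), then classifies what happens to the next trigger when one DYC's FIFO is past its HF mark, with and without HF OR'd into BUSY. All names are illustrative, and BUSY sources other than HF are ignored.

```python
from typing import List

# Figures quoted in the text; 1 KByte = 1024 bytes is an assumption.
KBYTE = 1024
TDCS, CHANNELS, MAX_HITS, BYTES_PER_HIT = 6, 32, 16, 4
HEADER_BYTES = 512            # hypothetical allowance for headers
HF_MARK = 8 * KBYTE           # DYC input FIFO "Half Full" mark
FIFO_CAPACITY = 16 * KBYTE    # DYC input FIFO capacity

MAX_EVENT_BYTES = TDCS * CHANNELS * MAX_HITS * BYTES_PER_HIT + HEADER_BYTES


def trigger_outcome(fifo_levels: List[int], hf_or_into_busy: bool) -> str:
    """Classify what happens to the next trigger, given the number of bytes
    queued in each DYC's input FIFO.

    Simplified model: only the HF contribution to BUSY is considered, and a
    DYC whose FIFO is past the HF mark never answers REQ with REN."""
    half_full = [level > HF_MARK for level in fifo_levels]
    if hf_or_into_busy and any(half_full):
        # HF reaches the trigger as BUSY, so every frontend waits.
        return "held off"
    accepted = [not hf for hf in half_full]
    if all(accepted):
        return "accepted by all"
    # Some DYCs took the trigger while at least one refused it.
    return "DESYNC"


if __name__ == "__main__":
    print(MAX_EVENT_BYTES / KBYTE)                      # 12.5
    print(HF_MARK < MAX_EVENT_BYTES <= FIFO_CAPACITY)   # True
    levels = [MAX_EVENT_BYTES, 0]  # one DYC still holds a maximal event
    print(trigger_outcome(levels, hf_or_into_busy=False))  # DESYNC
    print(trigger_outcome(levels, hf_or_into_busy=True))   # held off
```

The model shows why fitting a maximal event in the 16 KByte FIFO is not sufficient on its own: once one DYC is past its HF mark, only BUSY keeps the other frontends from taking a trigger that DYC will refuse.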

Other observations: All DYCs are set to a 16 usec holdoff before issuing REN in response to REQ. This was verified by visually inspecting the DYC jumper settings; the holdoff is not readable via a CAMAC instruction. This is a reasonable value. I also noticed that at least two DYCs have BUSY and PERMIT RG-174 connectors that are somewhat loose on the PC boards. These connectors are not constrained by holes in the DYC front panels, so take care not to wiggle them too hard when inserting or extracting cables.

More later, BobT


Bob Tschirhart
Last modified: Sun Dec 2 17:50:45 CST 2001