Chasing RS TDC hangs and errors

At the end of the run in October 2001 we were seeing multiple RS TDC errors, some of which led to serious corruption of the data. Runs would also sometimes terminate unexpectedly, hanging the trigger: the RS TDC would assert busy and the DYC lights would be left in some odd state. The following studies were done by George, Peter and Sasha to classify and fix these problems:

Summary of data losses due to RS TDC errors in the 2001 run

Extract from Peter's email:

At Steve's suggestion I've searched for runs with potentially bad RSTDC data in the fall 2001 data using the km21 pass2 ntuples. This work can be found in bnlku9:~pcooper/pass2/.

There is no variable stored in the ntuple which directly tests the validity of the RSTDC data. There is IERRTDC, but we know that for data before Oct 25 the event number was not being loaded into the DYC headers, so all those events look like failures. I chose to use npvrd, the number of range stack counters with ADCs and prompt TDCs associated with both ends. For kmu21 triggers this number should be >20 since most muons traverse the whole range stack. For some runs a significant fraction of the events have npvrd<15. These are candidates for runs with missing or incorrect RSTDC data. This can be seen on the top plot of the attached figure, which box-plots npvrd vs run number.

The distribution of runs (PS) vs fraction is shown in the lower left plot. I've chosen to define suspect bad runs as those having >5% of the events with npvrd<15. There are 48 such runs out of 507 runs with km21 triggers. The list of suspect runs is below. The lower left plot is the npvrd distribution for all data. 5.9% of the events have npvrd<18. These events are probably bad. Among the causes for these bad events is bad RSTDC data.
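
The run selection described in the email can be sketched in a few lines of Python. This is only an illustration of the cut, not the code actually used for the pass2 ntuples: the function name, the (run, npvrd) input format and the parameter names are assumptions made for the example. Only the thresholds (npvrd<15, more than 5% of the events) come from the text above.

```python
from collections import defaultdict

def suspect_runs(events, npvrd_cut=15, frac_cut=0.05):
    """Return {run: fraction of low-npvrd events} for suspect runs.

    events    -- iterable of (run_number, npvrd) pairs (hypothetical format)
    npvrd_cut -- an event counts as "low" if npvrd < npvrd_cut
    frac_cut  -- a run is suspect if its low fraction is > frac_cut
    """
    total = defaultdict(int)  # events seen per run
    low = defaultdict(int)    # events with npvrd below the cut, per run
    for run, npvrd in events:
        total[run] += 1
        if npvrd < npvrd_cut:
            low[run] += 1

    result = {}
    for run in sorted(total):
        fraction = low[run] / total[run]
        if fraction > frac_cut:
            result[run] = fraction
    return result
```

Called on all km21 events, the keys of the returned dictionary would be the suspect run numbers and the values the fraction of events failing the npvrd cut in each of them.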

More detailed mail exchange on this subject

Fixing RS TDC hanging the trigger

The RS TDC would hang the trigger when the L1.2 trigger was running and a long MPI (long compared to the L1.0 trigger) was generated by a special circuit built in NIM logic. We studied the logic analyzer output of the DYC MPI signal. It is the bottom signal on the plot as seen in Seascape mode in ghostview - note the little vertical line. We noticed that the MPI signal was reset 12 usec after the RS TDC stop (as it is supposed to be), but came back 10 ns later, which was a clear indication of abnormal behaviour.
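
The signature we were looking for, a signal that is reset and then comes back almost immediately, is easy to search for automatically if the logic analyzer trace is available as a list of transitions. The sketch below assumes a hypothetical export format of (time in ns, level) pairs sorted by time; the function name and the default gap of 100 ns are likewise chosen only for illustration and are not part of our setup.

```python
def find_reassertions(edges, max_gap_ns=100.0):
    """Find falling edges followed by a rising edge within max_gap_ns.

    edges -- list of (time_ns, level) transitions sorted by time, with
             level 1 for asserted and 0 for reset (hypothetical format)
    Returns a list of (fall_time_ns, rise_time_ns) pairs.
    """
    glitches = []
    last_fall = None  # time of the most recent reset, if any
    for time_ns, level in edges:
        if level == 0:
            last_fall = time_ns
        elif last_fall is not None:
            # Signal came back: flag it if the gap is suspiciously short.
            if time_ns - last_fall <= max_gap_ns:
                glitches.append((last_fall, time_ns))
            last_fall = None
    return glitches
```

For a trace like the one described above, with the MPI reset at 12 usec and coming back 10 ns later, the function would report the pair (12000, 12010).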

Following a suggestion from Bob T, we studied the MPI signal for the L1.0 trigger while varying the delay of the MPI stop signal (left bottom gate generator on the NIM logic diagram), effectively changing the length of the MPI signal. MPI signals of smaller length (<50 usec) were all OK, but when the length was increased to about 50 usec the sudden glitch appeared. The cause turned out to be very odd behaviour of the gate generator (P/S 794): once we set its dial to 1 sec (it used to be 0.1 sec) the problem disappeared.

Verifying RS TDC operation

After we fixed the generator we no longer experienced any RS TDC errors or trigger hangs. Of course this does not prove that they are gone, because the DAQ setup was quite simple compared to real data taking, but it was still encouraging. Here is a list of studies that we performed.

Sasha Kushnirenko
Last modified: Mon Jan 14 17:17:43 CST 2002