MSHEAR Binary Patch 0531-01
R.W. Busby 26 Sep 99
Caltech/USGS TriNet Seismic Network
An archive file, msbp990926.slip.lzh, is available to correct an error in SLIP communication links that can render the link inoperable. This patch applies to MSHEAR 36/09-0531. It is also backward compatible with all previous versions of MSHEAR. It is the first patch of this release.
Description of the problem
The datalogger enters a condition in which all SLIP communication is disabled. The system sends IP packets out the serial port but accepts no packets in return. The serial port of the datalogger is not polled for input, and hence no incoming SLIP packets are received. The ifslip process that transmits packets monopolizes the serial port and denies access to the ifslip process that receives packets.
In our case, the trigger for entering the problem state was the onset of acknowledgments to the SEC comlink during a period when the PRI comlink was still retransmitting frequently but receiving no acknowledgments. An error in the configuration parameters exacerbated the situation by setting the comlink resend packet window size to six packets instead of two, in combination with a very short resend timeout of 2 seconds (see Configuration Details below). In lab simulations, we used ws=16, resendpkts=6 and resend=10 for normal operations and could trigger the problem state by changing to resend=2 via option K of the aqshell menu. In our case, killing the dacommo process could reactivate the link.
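As a rough illustration of why the shorter timeout matters, the sketch below estimates an upper bound on the retransmission rate. It assumes, without confirmation from the comlink documentation, that each unacknowledged comlink resends up to resendpkts packets once per resend interval. The function name resend_load is hypothetical.

```python
def resend_load(resendpkts: int, resend_timeout_s: float, links: int = 1) -> float:
    """Upper bound on retransmitted packets per second.

    Assumption: every unacknowledged comlink resends up to `resendpkts`
    packets once per `resend_timeout_s` seconds.
    """
    if resend_timeout_s <= 0:
        raise ValueError("resend timeout must be positive")
    return links * resendpkts / resend_timeout_s


# Lab settings for normal operation: resendpkts=6, resend=10
normal = resend_load(6, 10)
# Settings that triggered the problem state: resendpkts=6, resend=2
problem = resend_load(6, 2)

print(f"normal:  {normal} packets/s per comlink")
print(f"problem: {problem} packets/s per comlink")
```

Under this assumption, shortening the timeout from 10 to 2 seconds raises the worst-case retransmission rate of a single comlink fivefold.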
The flooding option of a comlink does not cause the slip driver to hang. A gross overload of packets appears to make the driver discard packets rather than hang. There appear to be only certain conditions in which the driver attempts to keep up transmission at the expense of reception.
The problem is clearly identified through the slipstat reports. Viewing this status at 5-minute intervals shows that the count of outbound packets increases while the count of inbound packets does not. The number of polls of the serial port also does not increase. See the example output in Diagnostic Details below.
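The check described above can be sketched as a small predicate over two slipstat snapshots. The class and field names are hypothetical; the sample values are the IP packet and GS_READY poll counts from the two reports in Diagnostic Details.

```python
from dataclasses import dataclass


@dataclass
class SlipCounters:
    """Counters read by hand from one slipstat report (names are hypothetical)."""
    packets_in: int
    packets_out: int
    polls: int


def receive_side_stalled(earlier: SlipCounters, later: SlipCounters) -> bool:
    """True when output keeps advancing but input and serial polls do not."""
    return (
        later.packets_out > earlier.packets_out
        and later.packets_in == earlier.packets_in
        and later.polls == earlier.polls
    )


first = SlipCounters(packets_in=1823507, packets_out=2844113, polls=9621394)
second = SlipCounters(packets_in=1823507, packets_out=2845192, polls=9621394)

print(receive_side_stalled(first, second))
```

A quiet link also shows no inbound packets, so the unchanged poll count is the part of the test that distinguishes the hang from simple inactivity.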
Solution to the Problem
A revised slip driver (ifslip) works in combination with a revised serial port descriptor (5x8530) to correctly arbitrate for the serial port. The serial port driver is available for Q4120 and Q730. A version for Q680 systems is not yet available. The new slip driver will work with older versions of the serial port descriptor (including the Q680 version) to avoid the error, though better performance is achieved with the new combination. The revised files are offered as a binary lharc archive at:
ftp://quake.geo.berkeley.edu/pub/quanterra/mshear/release/msbp990926.slip.lzh
This archive includes a copy of the old version and the new version of each module, and it also updates the file actually used by the system to the new version.
To install the archive, transfer the binary file to the datalogger (to /h0/HOLDING). Extract the file by entering:
chd /h0
lharc -xf holding/msbp990926.slip.lzh
The versions of these modules released with MSHEAR 36/09-0531 were:
/h0/isp98/cmds/ifslip version 11 crc $2686D0 edition #21
/h0/overlays/4120/5x8530 version 23 crc $A391B3 edition #22
The new versions are:
/h0/isp98/cmds/ifslip version 12 crc $1D0880 edition #21
/h0/overlays/4120/5x8530 version 26 crc $DD5D03 edition #26
Use the CRC to uniquely identify the module. To determine which version is loaded in memory, enter:
sysop: ident -m ifslip
Header for: ifslip
Module size: $26EC #9964
Owner: 0.0
Module CRC: $1D0880 Good CRC <<==== look at this value
Header parity: $2564 Good parity
Edition: $15 #21
Ty/La At/Rev $E01 $A001
Permission: $555 -----e-r-e-r-e-r
Dev Drv, 68000 obj, Sharable, System State Process
For more details of the versions and the filesystem organization see Version Details below.
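When many dataloggers need checking, the CRC comparison can be automated on a host that has captured the ident output as text. The following is a minimal sketch, not part of the patch: the function name and lookup table are hypothetical, and the CRC values are the ones listed above.

```python
import re

# CRC values listed in this note, keyed by (module name, CRC).
KNOWN_CRCS = {
    ("ifslip", "2686D0"): "version 11 (released with MSHEAR 36/09-0531)",
    ("ifslip", "1D0880"): "version 12 (patched)",
    ("5x8530", "A391B3"): "version 23 (released with MSHEAR 36/09-0531)",
    ("5x8530", "DD5D03"): "version 26 (patched)",
}


def identify_module(ident_output: str) -> str:
    """Map captured ident output to a known module version by its CRC."""
    name = re.search(r"Header for:\s*(\S+)", ident_output)
    crc = re.search(r"Module CRC:\s*\$([0-9A-Fa-f]+)", ident_output)
    if name is None or crc is None:
        raise ValueError("not recognizable as ident output")
    key = (name.group(1), crc.group(1).upper())
    return KNOWN_CRCS.get(key, "unknown module or CRC")


SAMPLE = """\
Header for: ifslip
Module size: $26EC #9964
Owner: 0.0
Module CRC: $1D0880 Good CRC
Header parity: $2564 Good parity
Edition: $15 #21
"""

print(identify_module(SAMPLE))
```

The edition number is not used for the lookup because both ifslip versions report edition #21.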
Additional Details Section
Diagnostic Details
VSP) date
August 19, 1999 Thursday 10:00:06 pm
VSP) slipstat /sl2
=======================================================================
IFSLIP Device Information Statistics:
-----------------------------------------------------------------------
Device = sl2 Driver = ifslip
MTU = 1006 bytes
Flags = 0x0132 [ BROADCAST PT_TO_PT NO-TRAILERS NO-ARP ]
if_this = 0x00e38190 if_next = 0x00eb5290 if_prev = 0x00e30010
if_static = 0x00e31990 if_size = 0x000000d0
Socket Address (Internet Style):
-----------------------------------------------------------------------
Address Family = 2 IP Port = 0 IP Address = 131.215.58.9
IFSLIP Driver Static Storage:
-----------------------------------------------------------------------
Input Output
--------------- ---------------
Serial Device: /x2 /x2
Process ID's: 19 20
Compression: OFF OFF
Mbuf Queue Head: 0x00000000 0x00e0ca10
Bytes In/Out: 68908970 1438618817
IP Packets In/Out: 1823507 2844113
Compressed Packets: 187430 0
Uncompressed Packets: 23299 0
Biggest IP Packet: 414 552
Smallest IP Packet: 3 40
Errors: 409832 0
Reopens: 0 0
System path: 20 21
Death Flag: 0 0
Mbuf Size: 4096
Failed InMbuf Alloc: 0
Runts: 6896
GS_READY Polls: 9621394
SS_SIG Waits: 9621394
IFSLIP Device Descriptor Options:
-----------------------------------------------------------------------
Serial Device - Input: /x2
Serial Device - Output: /x2
Process Priority - Input: 128
Process Priority - Output: 128
Receive Buffer Size: 4096
Compression: OFF
Parity-Stop Bits-Bits/Char: 0x00
Baud Rate Code: 0x0f
=======================================================================
VSP) ifcontrol /sl2
mbuf control module revision: 1
total mbuf size: 393216
total allocated: 15552
minimum reserve: 49152
failed attempts: 0
allocation mode: NO WAIT
looking for if control module ifi.83D73A09
if control module revision: 1
ip address: 83D73A09
if device name: sl2
total queued on xmit: 13700
xmit queue limit: 15000
discarded xmit bytes: 257912912
discarded xmit packets: 470644
total queued on recv: 0
discarded recv packets: 0
queued bytes in serial xmit buffer: 1024
total input packets: 1406780
total output packets: 2844169
A second report is obtained five minutes later.
VSP) date
August 19, 1999 Thursday 10:06:12 pm
VSP) slipstat /sl2
=======================================================================
IFSLIP Device Information Statistics:
-----------------------------------------------------------------------
Device = sl2 Driver = ifslip
MTU = 1006 bytes
Flags = 0x0132 [ BROADCAST PT_TO_PT NO-TRAILERS NO-ARP ]
if_this = 0x00e38190 if_next = 0x00eb5290 if_prev = 0x00e30010
if_static = 0x00e31990 if_size = 0x000000d0
Socket Address (Internet Style):
-----------------------------------------------------------------------
Address Family = 2 IP Port = 0 IP Address = 131.215.58.9
IFSLIP Driver Static Storage:
-----------------------------------------------------------------------
Input Output
--------------- ---------------
Serial Device: /x2 /x2
Process ID's: 19 20
Compression: OFF OFF
Mbuf Queue Head: 0x00000000 0x00e27350
Bytes In/Out: 68908970 1439210109
IP Packets In/Out: 1823507 2845192
Compressed Packets: 187430 0
Uncompressed Packets: 23299 0
Biggest IP Packet: 414 552
Smallest IP Packet: 3 40
Errors: 409832 0
Reopens: 0 0
System path: 20 21
Death Flag: 0 0
Mbuf Size: 4096
Failed InMbuf Alloc: 0
Runts: 6896
GS_READY Polls: 9621394
SS_SIG Waits: 9621394
IFSLIP Device Descriptor Options:
-----------------------------------------------------------------------
Serial Device - Input: /x2
Serial Device - Output: /x2
Process Priority - Input: 128
Process Priority - Output: 128
Receive Buffer Size: 4096
Compression: OFF
Parity-Stop Bits-Bits/Char: 0x00
Baud Rate Code: 0x0f
=======================================================================
VSP) ifcontrol /sl2
mbuf control module revision: 1
total mbuf size: 393216
total allocated: 13248
minimum reserve: 49152
failed attempts: 0
allocation mode: NO WAIT
looking for if control module ifi.83D73A09
if control module revision: 1
ip address: 83D73A09
if device name: sl2
total queued on xmit: 12056
xmit queue limit: 15000
discarded xmit bytes: 258174308
discarded xmit packets: 471121
total queued on recv: 0
discarded recv packets: 0
queued bytes in serial xmit buffer: 604
total input packets: 1406780
total output packets: 2845252
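To compare two reports without reading them side by side, the counter lines can be parsed and differenced. The sketch below assumes each counter sits on its own line in the form "label: value" or "label: input output"; the function names are hypothetical and the excerpts contain only a few of the lines shown above.

```python
import re

# "Label: 123" or "Label: 123 456" with integer values only.
COUNTER_LINE = re.compile(r"^\s*(?P<label>[^:]+):\s+(?P<values>\d+(?:\s+\d+)*)\s*$")


def parse_counters(report: str) -> dict:
    """Collect integer counters from report text, keyed by label."""
    counters = {}
    for line in report.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            values = [int(v) for v in match.group("values").split()]
            counters[match.group("label").strip()] = values
    return counters


def changed_counters(before: dict, after: dict) -> dict:
    """Return later-minus-earlier deltas for counters whose values changed."""
    return {
        label: [b - a for a, b in zip(before[label], after[label])]
        for label in before
        if label in after and before[label] != after[label]
    }


FIRST = """\
Bytes In/Out: 68908970 1438618817
IP Packets In/Out: 1823507 2844113
GS_READY Polls: 9621394
SS_SIG Waits: 9621394
discarded xmit packets: 470644
total input packets: 1406780
total output packets: 2844169
"""

SECOND = """\
Bytes In/Out: 68908970 1439210109
IP Packets In/Out: 1823507 2845192
GS_READY Polls: 9621394
SS_SIG Waits: 9621394
discarded xmit packets: 471121
total input packets: 1406780
total output packets: 2845252
"""

deltas = changed_counters(parse_counters(FIRST), parse_counters(SECOND))
for label, delta in deltas.items():
    print(f"{label}: {delta}")
```

For these excerpts, only output-side counters appear in the result; every input-side counter and both poll counters are unchanged between the two reports.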
Module Version Details
The slip driver is located in /h0/isp98/cmds as ifslip.11 for the old version and ifslip.12 for the new version. The module used by the system is /h0/isp98/cmds/ifslip and is a copy of ifslip.12.
sysop: ident ifslip.11
Header for: ifslip
Module size: $264A #9802
Owner: 0.0
Module CRC: $2686D0 Good CRC
Header parity: $25C2 Good parity
Edition: $15 #21
Ty/La At/Rev $E01 $A001
Permission: $555 -----e-r-e-r-e-r
Dev Drv, 68000 obj, Sharable, System State Process
sysop: ident ifslip.12
Header for: ifslip
Module size: $26EC #9964
Owner: 0.0
Module CRC: $1D0880 Good CRC
Header parity: $2564 Good parity
Edition: $15 #21
Ty/La At/Rev $E01 $A001
Permission: $555 -----e-r-e-r-e-r
Dev Drv, 68000 obj, Sharable, System State Process
The serial port descriptor for Q4120 and Q730 systems is located in /h0/overlays/4120. The old version is located in the subdirectory 23/5x8530, while the new version is in the subdirectory 26/5x8530. The module used by the system is /h0/overlays/4120/5x8530 and is a copy of 26/5x8530. The module used by Q680 systems is /h0/overlays/147/5x8530 or /h0/overlays/00/5x8530, depending on the CPU type. Installing the lharc file will not update these Q680 serial port descriptors, but the new slip driver will avoid the error.
Module released with MSHEAR 36/09-0531:
sysop: ident 23/5x8530
Header for: 5x8530
Module size: $1120 #4384
Owner: 0.0
Module CRC: $A391B3 Good CRC
Header parity: $1E75 Good parity
Edition: $16 #22
Ty/La At/Rev $E01 $A001
Permission: $555 -----e-r-e-r-e-r
Dev Drv, 68000 obj, Sharable, System State Process
Revised Module:
sysop: ident 26/5x8530
Header for: 5x8530
Module size: $1160 #4448
Owner: 0.0
Module CRC: $DD5D03 Good CRC
Header parity: $1E79 Good parity
Edition: $1A #26
Ty/La At/Rev $E01 $A001
Permission: $555 -----e-r-e-r-e-r
Dev Drv, 68000 obj, Sharable, System State Process
Configuration Details
Portion of desired Key file for a SLIP link:
ws1 6
ws2 6
resend1 2
resend2 2
rspkt1 3
rspkt2 3
This produces the desired comlink configuration section of aqcfg:
* comlink section for IP mode on pri
*
[pri]
levels=32 mprio=20 port=35145 ipaddr=131.215.63.5 pkts=2500
fmt=QSL rce=y
resend=2 maxresends=15 synctime=20 ws=6
resendpkts=3 netdly=120 netto=60 delay=5
grpsize=1 grpto=0 detprio=14 timeprio=24
notify=y station=GOR udp=y keepnew=y
*
*
* comlink section for IP mode on sec
*
[sec]
levels=32 mprio=20 port=37145 ipaddr=131.215.63.6 pkts=2500
fmt=QSL rce=y
resend=2 maxresends=15 synctime=20 ws=6
resendpkts=3 netdly=120 netto=60 delay=5
grpsize=1 grpto=0 detprio=14 timeprio=24
notify=y station=GOR udp=y keepnew=y
*
Portion of the key file that caused the error when both comlinks became active. The keys are misspelled here as rspkts1 and rspkts2; the correct key is rspkt1, as in resendpkts=%rspkt1%.
ws1 6
ws2 6
resend1 2
resend2 2
rspkts1 3
rspkts2 3
That produces the following comlink configuration section of aqcfg, in which the default key value of 6 is used for resendpkts, rather than 2:
* comlink section for IP mode on pri
*
[pri]
levels=32 mprio=20 port=35145 ipaddr=131.215.63.5 pkts=2500
fmt=QSL rce=y
resend=2 maxresends=15 synctime=20 ws=6
resendpkts=6 netdly=120 netto=60 delay=5
grpsize=1 grpto=0 detprio=14 timeprio=24
notify=y station=GOR udp=y keepnew=y
*
*
* comlink section for IP mode on sec
*
[sec]
levels=32 mprio=20 port=37145 ipaddr=131.215.63.6 pkts=2500
fmt=QSL rce=y
resend=2 maxresends=15 synctime=20 ws=6
resendpkts=6 netdly=120 netto=60 delay=5
grpsize=1 grpto=0 detprio=14 timeprio=24
notify=y station=GOR udp=y keepnew=y
*
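The effect of the misspelled key can be reproduced with a small stand-in for the key substitution step. This is only a sketch of the behavior described above, not the actual aqcfg generator: the template line, the %resend1% and %ws1% tokens, and both function names are assumptions; only resendpkts=%rspkt1% and the default of 6 come from this note.

```python
import re

TOKEN = re.compile(r"%(\w+)%")


def expand(template: str, keys: dict, defaults: dict) -> str:
    """Replace %name% tokens from keys, silently falling back to defaults."""
    def substitute(match):
        name = match.group(1)
        if name in keys:
            return keys[name]
        return defaults[name]  # KeyError if neither a key nor a default exists
    return TOKEN.sub(substitute, template)


def unused_keys(template: str, keys: dict) -> set:
    """Keys that no token in the template refers to (likely typos)."""
    return set(keys) - set(TOKEN.findall(template))


# Hypothetical template line; only the resendpkts token is taken from this note.
TEMPLATE = "resend=%resend1% ws=%ws1% resendpkts=%rspkt1%"
DEFAULTS = {"rspkt1": "6"}

good_keys = {"ws1": "6", "resend1": "2", "rspkt1": "3"}
bad_keys = {"ws1": "6", "resend1": "2", "rspkts1": "3"}  # misspelled key

print(expand(TEMPLATE, good_keys, DEFAULTS))
print(expand(TEMPLATE, bad_keys, DEFAULTS))
print(unused_keys(TEMPLATE, bad_keys))
```

Reporting keys that match no template token, as unused_keys does, is one way such a misspelling could be caught before the configuration is deployed.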