Swarm Monitor Program

Design Goals:

- Watch for "swarms" of events in one or more geographical regions.
- Issue alarms (mail and/or pager messages) to internal users when the rate of events exceeds an adjustable threshold.
- Issue notifications (mail) that the swarm rate continues above the threshold.
- Report (mail) when the swarm rate has decayed below a turn-off threshold.
- Minimize "flapping" (rapid changes of state between "on" and "off").

The current implementation of the swarm monitor, called "swarmon", is a Perl script running as a client of the CISN Process Control System (PCS). It is being tested using events from the Geysers area (where geothermal energy extraction has resulted in numerous small earthquakes) and a replay of a major swarm in the Long Valley area in 1997-1998.

The following describes the behavior of swarmon so that seismologists and other network operators can understand the program and evaluate its suitability.

Introduction
------------

The swarmon program is concerned with rates of earthquakes, always expressed in events per hour. However, because earthquakes are discrete events, the rate must be measured by counting events over some time interval. There are several places in the swarmon configuration where both rate thresholds and measurement intervals can be specified.

swarmon can be configured to monitor swarms in one or more geographical regions. A region is specified by a polygon of latitude and longitude points. The program assumes that these regions do not overlap; that is, any event is expected to lie in at most one swarm region.

As swarmon encounters new events from PCS, it queries the database to determine whether the event lies within any of the swarm regions. If the event occurred within a swarm region, then the event is recorded in the swarm_events database table along with the event origin time and region name, and that region's event rate is analyzed.
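swarmon itself leaves the region test to the database. Purely to illustrate what "an event lies in at most one polygon region" means, the following Python sketch uses a standard ray-casting test. The function names, the region name "Box_A", and its coordinates are made up for this example, and latitude/longitude are treated as flat coordinates, which is only reasonable for small regions.

```python
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def point_in_polygon(lat: float, lon: float, polygon: Sequence[Point]) -> bool:
    """Ray-casting test: True if (lat, lon) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which this edge crosses the point's latitude.
            cross_lon = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < cross_lon:
                inside = not inside
    return inside


def find_region(lat: float, lon: float,
                regions: Dict[str, Sequence[Point]]) -> Optional[str]:
    """Return the first region containing the point, or None.

    Because regions are assumed not to overlap, at most one can match.
    """
    for name, polygon in regions.items():
        if point_in_polygon(lat, lon, polygon):
            return name
    return None


# Hypothetical square region, for illustration only.
REGIONS = {
    "Box_A": [(37.0, -119.0), (37.0, -118.0), (38.0, -118.0), (38.0, -119.0)],
}
```

With these made-up values, find_region(37.5, -118.5, REGIONS) selects "Box_A", and a point north of the box matches no region.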
Swarm Detection
---------------

For each swarm region there is a configured base rate threshold and a detection time interval. The product of these two numbers gives a threshold number of events. The count time interval over which swarmon counts events ends at the current time and goes back in time for the configured detection time interval, but never before the last notification time.

Each time a new event occurs within a swarm region, swarmon counts from the swarm_events table the number of events that have occurred during the above count time interval. If this number of events meets or exceeds the threshold number of events, then a swarm is declared to be in progress.

Note that the threshold number of events is always based on the configured detection time interval and the threshold rate. But the count time interval may be shorter than the configured detection time interval, since the count time interval never starts before the last notification time. The result is that when the threshold number of events is reached, the actual rate of events per hour may be higher than the threshold rate. This design was chosen to minimize the number of alarms generated for a given swarm. Once a given event is counted as part of one notification, it will NOT be counted as part of the following notification.

The following actions are taken when a swarm is first declared:

- Email and pager notifications are sent to a configured list of recipients. The message is:
      Earthquake swarm for <region> <rate> events per hour since <time>
  For email, the subject line is:
      Swarm Alarm for <region>
- The threshold rate for the region is multiplied by the region's configured "increment" factor (expected to be 1.0 or larger). This new threshold rate will be used to determine whether further alarms should be issued for this swarm region.
- A timer is set NOTIFY_INTERVAL hours in the future for renotification in case of no further rate threshold exceedance.
- A timer is set RERATE_INTERVAL hours in the future to re-examine the adjustable rate threshold in case the swarm has abated.
- The swarmon state for this region is saved to disk.

If the swarm continues with increasing rate, then the event rate may reach the now possibly raised threshold. The actions for this case are similar to the initial case:

- Email and pager notifications are sent to a configured list of recipients. For re-alarms, the message is:
      Earthquake swarm for <region> continues at increasing rate: <rate> events per hour since <time>
  For email, the subject line is:
      Swarm Alarm for <region>
- The threshold rate for the region is multiplied by the region's configured "increment" factor (must be 1.0 or larger). This new threshold rate will be used to determine whether further alarms should be issued for this swarm region.
- Any previously set timers are removed.
- A timer is set NOTIFY_INTERVAL hours in the future for renotification in case of no further rate threshold exceedance.
- A timer is set RERATE_INTERVAL hours in the future to re-examine the adjustable rate threshold in case the swarm has abated.
- The swarmon state for this region is saved to disk.

Thus the threshold rate may be adjusted higher as the swarm develops, provided that the configured increment factor is larger than 1.0.

Swarm Continuation Without Threshold Exceedance
-----------------------------------------------

After a swarm has been declared for a region, and after there has been no threshold exceedance for the duration of the configured NOTIFY_INTERVAL, the renotification timer expires and the renotification procedure is performed:

- Email (only) notifications are sent to a configured list of recipients. The message is:
      Earthquake swarm for <region> continues: <rate> events per hour since <time>. The current alarm threshold is <threshold> events per hour.
  The subject line for this email is:
      Swarm continues for <region>
- The swarmon state for this region is saved to disk.
- A timer is set NOTIFY_INTERVAL hours in the future for the next renotification.
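The counting rule from the Swarm Detection section can be sketched in Python as follows. This is an illustration under stated assumptions, not the Perl implementation: the names RegionState and check_event are invented, times are plain floats in hours, and the start of the count interval is treated as exclusive so that an event counted for one alarm is not counted for the next. Timers, mail, and state files are left out.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RegionState:
    base_rate: float                     # BASE_RATE_THRESHOLD, events per hour
    detect_interval: float               # DETECT_INTERVAL, hours
    increment: float                     # INCREMENT, expected >= 1.0
    current_rate: float = 0.0            # adjustable threshold, events per hour
    last_notify: Optional[float] = None  # time of the last alarm, hours

    def __post_init__(self) -> None:
        # The adjustable threshold starts at the base threshold.
        self.current_rate = self.base_rate


def check_event(state: RegionState, event_times: Sequence[float],
                now: float) -> Optional[float]:
    """Return the measured rate if an alarm is due at `now`, else None."""
    # The count interval goes back DETECT_INTERVAL hours,
    # but never before the last notification time.
    start = now - state.detect_interval
    if state.last_notify is not None:
        start = max(start, state.last_notify)
    count = sum(1 for t in event_times if start < t <= now)

    # The threshold count always uses the full configured interval.
    threshold_count = state.current_rate * state.detect_interval
    if count < threshold_count:
        return None

    elapsed = now - start
    rate = count / elapsed if elapsed > 0 else float("inf")
    # Raise the threshold for further alarms and restart the count interval.
    state.current_rate *= state.increment
    state.last_notify = now
    return rate
```

With the sample values BASE_RATE_THRESHOLD = 1.67 and DETECT_INTERVAL = 6, the threshold count is 10.02, so the eleventh event within six hours triggers the first alarm. With INCREMENT = 1.5 the next alarm then requires 2.505 * 6 = 15.03 events counted after that alarm.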
Swarm Abatement
---------------

After a swarm has been declared for a region, and after there has been no threshold exceedance for the duration of the configured RERATE_INTERVAL, the rerate timer expires and the rerate procedure is performed:

- The rate of events is computed since the last threshold exceedance or the last rerate, whichever is later.
- A proposed next rate threshold is computed by dividing the current rate threshold by the increment factor.
- If the rate of events for this rerate period is less than the configured turnoff threshold AND the proposed next rate threshold is less than the base threshold, the swarm is declared to be over. The following actions are taken:
    - Email notification is sent with the following text:
          Earthquake swarm terminated for <region>
      The email subject is:
          Swarm Terminated for <region>
    - The current rate threshold is set to the base rate threshold.
    - The pending renotification timer is deleted.
- If the rate of events for this rerate period is less than the configured base rate threshold, then the current rate threshold is set to the proposed next rate threshold. No notification of this change is made.
- The swarmon state for this region is saved to disk.
- A timer is set RERATE_INTERVAL hours in the future for the next rerate procedure.
- The expiration time of the region's renotification timer is checked. If it would expire shortly (less than 1/4 of the renotification interval) before the next rerate timer, then the renotification timer is adjusted to expire 5 seconds after the rerate timer. This avoids a notification that a swarm is continuing followed shortly thereafter by a notification that it has terminated.

The rerate procedure is designed to slowly reduce the current rate threshold as a swarm is ending, requiring that the actual swarm rate be low for an extended period of time before the swarm is declared to be over.
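The threshold arithmetic of the rerate procedure can be illustrated with a short Python sketch. The names Thresholds and rerate are invented for this example, and the numbers in the usage note are made up so that the arithmetic is exact. The sketch follows the text literally: it covers only the threshold decision (no mail, timers, or state files), and because the text does not say whether a lowered threshold is clamped at the base threshold, no clamping is applied here.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    base: float       # BASE_RATE_THRESHOLD, events per hour
    turnoff: float    # TURNOFF_RATE_THRESHOLD, <= base
    increment: float  # INCREMENT, >= 1.0
    current: float    # adjustable threshold, events per hour


def rerate(th: Thresholds, events_in_period: int, period_hours: float) -> str:
    """Apply one rerate step and report what happened to the threshold."""
    rate = events_in_period / period_hours
    proposed = th.current / th.increment

    # Swarm is over: the rate is below turnoff AND the threshold
    # has already decayed back to the base threshold.
    if rate < th.turnoff and proposed < th.base:
        th.current = th.base
        return "terminated"

    # Swarm is quieter than the base rate: step the threshold down.
    if rate < th.base:
        th.current = proposed
        return "lowered"

    # Rate is still at or above the base rate: leave the threshold alone.
    return "unchanged"
```

For example, with base = 2.0, turnoff = 1.0, increment = 2.0, and a threshold that has risen to 8.0, three consecutive quiet rerate periods at 0.5 events per hour give "lowered" (to 4.0), "lowered" (to 2.0), and then "terminated". This is the slow decay described above: the swarm is not declared over until the threshold has stepped all the way back to the base value.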
By setting the turnoff threshold (which must be less than or equal to the base threshold) to a low value, you extend the time before swarmon declares the swarm to be over. That reduces the incidence of "flapping", or rapid changes in state between "swarm in progress" and "swarm terminated".

Program Restart
---------------

Since swarmon saves its state to files every time there is a state change, swarmon can be stopped and restarted at any time, such as for a configuration change or system maintenance. On startup, swarmon reads any state files after it has read its configuration. For any swarms which were in progress at the time of shutdown, swarmon sets the appropriate timers to continue monitoring the swarm. If any timers would have expired during the shutdown period, swarmon sets new timers to expire shortly after startup. This allows swarmon to process any pending events in PCS before the renotification and rerate procedures are run.

Configuration Files
-------------------

Following are sample configuration files including explanatory comments.
Base Configuration File:

    # Database connection parameters:
    DB_USER = some_user
    DB_PASSWORD = some_password
    DB_NAME = db_name

    # PCS parameters:
    PCS_GROUP = EventStream
    PCS_TABLE = ewdb
    PCS_STATE = SwarmAlarm

    # time interval in seconds between queries to PCS tables:
    WATCH_INTERVAL = 60

    # Directory where config files are found:
    CONF_DIR = /home/ncss/ncpp/swarmAlarm/config

    # Directory where state files are saved (for restarts):
    STATE_DIR = /home/ncss/ncpp/swarmAlarm/state

    # Directory where log files are written:
    LOG_DIR = /home/ncss/ncpp/swarmAlarm/logs

    DEBUG = 1

    # Number of days to keep old events in swarm_events table:
    MAXDAYSTOKEEP = 30

    # swarm region config files; use as many as needed, one per line:
    SWARM_CONF = Geysers
    SWARM_CONF = Lassen
    SWARM_CONF = Long_Valley
    SWARM_CONF = Shasta-Tennant-Medicine_Lake

Region Configuration File:

    # long name of swarm region (spaces permitted)
    NAME = Long Valley

    # region name as known to database (no spaces allowed here)
    REGION = Long_Valley

    # The largest time interval in hours over which the rate is determined:
    DETECT_INTERVAL = 6

    # The base rate in events per hour above which a swarm is declared:
    BASE_RATE_THRESHOLD = 1.67

    # The rate in events per hour below which the swarm is declared terminated:
    TURNOFF_RATE_THRESHOLD = 1.0

    # Factor by which base rate is multiplied each time threshold is exceeded:
    INCREMENT = 1.5

    # Interval in hours of no turnoff threshold exceedance after which adjustable
    # threshold will be adjusted:
    RERATE_INTERVAL = 24

    # Interval in hours of no threshold exceedance after which a "swarm continues"
    # email will be sent:
    NOTIFY_INTERVAL = 24

    # email address to notify of swarm alarm, continuation, and termination.
    # multiple addresses can be listed on one line, separated by commas, no spaces.
    # Repeat this command as necessary for more addresses:
    NOTIFY_MAIL = lombard@seismo.berkeley.edu

    # pager/qpage name to notify of swarm alarm.
    # multiple names can be listed on one line, separated by commas, no spaces.
    # Repeat this command as necessary for more names:
    # NOTIFY_PAGE = lombard

Database Table
--------------

swarmon uses one database table to keep track of recent events in the monitored regions. This table saves the work of having to repeatedly determine whether an event is inside a swarm region.

    CREATE TABLE "SWARM_EVENTS" (
        "EVID"     NUMBER(15,0)  NOT NULL,
        "REGION"   VARCHAR2(24)  NOT NULL,
        "DATETIME" NUMBER(25,10) NOT NULL,
        CONSTRAINT "SWARM_EVENTSKEY01" PRIMARY KEY ("EVID", "REGION") ENABLE
    );
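To show how such a table supports per-region event counting, here is a small Python sketch using an in-memory SQLite database as a stand-in for the production database. The helper name count_events, the sample rows, and the treatment of DATETIME as a numeric timestamp in seconds are assumptions made for this example; the column types are SQLite approximations of the types in the table definition.

```python
import sqlite3


def count_events(conn: sqlite3.Connection, region: str,
                 start: float, end: float) -> int:
    """Count events in `region` with start < datetime <= end."""
    row = conn.execute(
        "SELECT COUNT(*) FROM swarm_events "
        "WHERE region = ? AND datetime > ? AND datetime <= ?",
        (region, start, end),
    ).fetchone()
    return row[0]


# In-memory stand-in for the SWARM_EVENTS table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE swarm_events ("
    " evid INTEGER NOT NULL,"
    " region TEXT NOT NULL,"
    " datetime REAL NOT NULL,"
    " PRIMARY KEY (evid, region))"
)

# Made-up event IDs and timestamps.
conn.executemany(
    "INSERT INTO swarm_events VALUES (?, ?, ?)",
    [
        (1001, "Long_Valley", 100.0),
        (1002, "Long_Valley", 200.0),
        (1003, "Geysers", 150.0),
        (1004, "Long_Valley", 400.0),
    ],
)
```

Because each row already carries its region name, a count for one region over one time window is a single query and needs no polygon test.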