Core watcher

Introduction

Core watcher is an application which runs on a host under test and continuously monitors for new core dump files. If it discovers that a core dump was generated by one of the binaries located in the Test Agent directory, it obtains a backtrace and logs it. This simplifies debugging of segmentation faults in the Test Agent, RPC server and other applications.

Requirements

Core watcher can be used only on Linux hosts. It requires kernel version 3.15 or later (where F_OFD* fcntl() locks were added). It also needs gdb to be installed, since it relies on it when investigating core dumps.
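
A minimal sketch of how these requirements could be verified at startup; the check_requirements() helper and the shell-based gdb lookup are illustrative, not the exact checks core watcher performs:

#include <stdio.h>
#include <stdlib.h>
#include <sys/utsname.h>

/* Return 0 if the running kernel supports OFD locks (>= 3.15)
 * and gdb can be executed, -1 otherwise (illustrative helper). */
static int check_requirements(void)
{
    struct utsname un;
    int major;
    int minor;

    if (uname(&un) != 0 || sscanf(un.release, "%d.%d", &major, &minor) != 2)
        return -1;

    /* F_OFD* fcntl() locks appeared in Linux 3.15. */
    if (major < 3 || (major == 3 && minor < 15))
        return -1;

    /* gdb must be available in PATH to investigate core dumps. */
    if (system("gdb --version >/dev/null 2>&1") != 0)
        return -1;

    return 0;
}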

Usage

In the builder configuration file, add a TE_TA_APP macro to build core watcher for a given Test Agent, for example:

TE_TA_APP([ta_core_watcher], [${$1_TA_TYPE}],
          [${$1_TA_TYPE}], [], [], [], [], [],
          [ta_core_watcher], [])

In the RCF configuration file, add core_watcher and core_pattern properties to the ta object, for example:

<ta ...>
    ...
    <conf name="core_watcher">yes</conf>
    <conf name="core_pattern">/var/tmp/core.te.%h-%p-%t</conf>
</ta>

core_pattern may be omitted if there is no need to change the system core pattern value. Note that changing the system core pattern is often necessary: by default the core pattern frequently pipes core dumps to a helper program (e.g. systemd-coredump or apport) which may not save core dumps at all for binaries unknown to the system. However, even if core watcher cannot change the core pattern to the requested value (because it is not run under root or because other core watchers are already running), it will try to obtain the current system value of the core pattern and use it.
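
Reading and changing the system core pattern comes down to accessing /proc/sys/kernel/core_pattern; a minimal sketch of both operations (the helper names are illustrative, not core watcher's actual functions):

#include <stdio.h>
#include <string.h>

#define CORE_PATTERN_PATH "/proc/sys/kernel/core_pattern"

/* Read the current system core pattern into buf. */
static int read_core_pattern(char *buf, size_t len)
{
    FILE *f = fopen(CORE_PATTERN_PATH, "r");
    int rc = -1;

    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) != NULL)
    {
        buf[strcspn(buf, "\n")] = '\0';
        rc = 0;
    }
    fclose(f);
    return rc;
}

/* Set a new core pattern; writing this file requires root. */
static int write_core_pattern(const char *pattern)
{
    FILE *f = fopen(CORE_PATTERN_PATH, "w");

    if (f == NULL)
        return -1;
    fprintf(f, "%s\n", pattern);
    return fclose(f) == 0 ? 0 : -1;
}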

If multiple Test Agents run on the same host (each possibly started by a different testing session), core watchers can run simultaneously for all of them. Every core watcher prints logs only about binaries located in its specific TA directory, ignoring other core dumps. However, if a core pattern change is requested, only one of the simultaneously running core watchers will be able to perform it.

WARNING: core watcher is not compatible with setting the core pattern via the configuration tree, using the /agent/sys/core_pattern node. This node is currently used by some test suites to save core dumps to a known location while testing runs, so that they can be examined manually later. The problem with this approach is that the core pattern can be set via the configuration tree only once the TA is fully initialized, so any core dumps generated during TA initialization will not be saved to the expected location and may even be lost completely. Such core dumps matter because they may appear when you implement new configuration objects on the TA and there is a bug in the initialization code which must be debugged.

Architecture

The normal way of collecting logs from a remote host is via the Test Agent. A process forked from the TA sends logs to the TA; the TA stores them in a buffer. Logger regularly collects the accumulated logs from the TA with the help of rcf_ta_get_log().

Unfortunately, core watcher cannot use this mechanism because the Test Agent itself may crash producing a core dump, in which case logging anything about that core dump via the TA is impossible.

So core watcher is started directly by RCF (via ssh) and does not depend on the TA in any way. It prints its logs to stdout; RCF reads this output from ssh and sends it to Logger.

When started, every core watcher opens the file /tmp/te_core_pattern_lock (creating it if necessary). It then obtains a shared lock on it and holds it until termination, so that other core watchers cannot change the core pattern in an unexpected way if they run simultaneously. If a core pattern change is requested, core watcher also tries to obtain an exclusive lock and changes the core pattern only if that succeeds; after that the lock is converted back to a shared one.
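
The lock protocol described above can be sketched with open file description (F_OFD*) locks; set_ofd_lock() below is an illustrative helper, not the actual core watcher code:

#define _GNU_SOURCE         /* for F_OFD_SETLK / F_OFD_SETLKW */
#include <fcntl.h>
#include <unistd.h>

#define LOCK_PATH "/tmp/te_core_pattern_lock"

/* Put an OFD lock of the given type (F_RDLCK - shared,
 * F_WRLCK - exclusive) on the whole lock file; an existing lock
 * owned by this open file description is converted in place. */
static int set_ofd_lock(int fd, short type, int wait)
{
    struct flock fl = {
        .l_type = type,
        .l_whence = SEEK_SET,
        .l_start = 0,
        .l_len = 0,     /* lock the whole file */
        .l_pid = 0,     /* must be 0 for OFD locks */
    };

    return fcntl(fd, wait ? F_OFD_SETLKW : F_OFD_SETLK, &fl);
}

/*
 * Usage sketch:
 *
 *   int fd = open(LOCK_PATH, O_RDWR | O_CREAT, 0666);
 *
 *   set_ofd_lock(fd, F_RDLCK, 1);            // shared lock, held until exit
 *
 *   if (set_ofd_lock(fd, F_WRLCK, 0) == 0)   // try exclusive lock
 *   {
 *       // change the core pattern here
 *       set_ofd_lock(fd, F_RDLCK, 1);        // convert back to shared
 *   }
 */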

The core watcher which changed the core pattern is responsible for restoring it. If on termination it cannot immediately obtain an exclusive lock on the te_core_pattern_lock file, it forks a child process and terminates immediately. The child process then waits until the lock can be obtained and changes the system core pattern back to its original value.
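
A sketch of that hand-off, reusing the illustrative set_ofd_lock() and write_core_pattern() helpers from the previous sketches; the flow is an assumption about how such a restore could be structured:

#include <unistd.h>

/* Called on termination by the core watcher which changed the core
 * pattern; fd refers to the lock file, saved_pattern holds the value
 * remembered before the change. */
static void restore_core_pattern_on_exit(int fd, const char *saved_pattern)
{
    /* Try to upgrade to an exclusive lock without blocking. */
    if (set_ofd_lock(fd, F_WRLCK, 0) == 0)
    {
        write_core_pattern(saved_pattern);
        return;
    }

    /* Other core watchers still hold shared locks: hand the job over
     * to a child process and let the parent terminate immediately. */
    if (fork() == 0)
    {
        set_ofd_lock(fd, F_WRLCK, 1);   /* block until the lock is free */
        write_core_pattern(saved_pattern);
        _exit(0);
    }
    /* The OFD lock survives the parent's exit because it is tied to
     * the open file description, which the child still holds. */
}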

Core watcher uses the inotify API to monitor the location where core dumps are saved. Every time a new file is detected, it uses gdb to check whether this is a core dump from one of the binaries in the TA directory; if so, a backtrace is obtained from gdb and logged.
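
A minimal sketch of such a monitoring loop, assuming all core dumps land in a single known directory; the directory and the gdb invocation are illustrative and not necessarily what core watcher does:

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/inotify.h>
#include <unistd.h>

int main(void)
{
    const char *core_dir = "/var/tmp";  /* directory from the core pattern */
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    int fd = inotify_init();
    ssize_t len;

    /* IN_CLOSE_WRITE fires once the kernel finishes writing a core file. */
    inotify_add_watch(fd, core_dir, IN_CLOSE_WRITE);

    while ((len = read(fd, buf, sizeof(buf))) > 0)
    {
        char *p = buf;

        while (p < buf + len)
        {
            struct inotify_event *ev = (struct inotify_event *)p;
            char cmd[PATH_MAX];

            if (ev->len > 0)
            {
                /* Ask gdb which binary produced the dump ("Core was
                 * generated by ..."); a real implementation would parse
                 * this, check that the binary is in the TA directory and
                 * only then run gdb again with that binary to log a full
                 * backtrace. */
                snprintf(cmd, sizeof(cmd), "gdb --batch -c '%s/%s'",
                         core_dir, ev->name);
                system(cmd);
            }
            p += sizeof(struct inotify_event) + ev->len;
        }
    }
    return 0;
}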