Commit

Multi-instance plugin configuration support
* Converted plugins to configuration objects
* Added multi-instance support to samplers:
  * procnetdev2
  * meminfo
  * avro_kafka store
  * csv store
  * sos store
* Converted all samplers to support new plugin infrastructure
* Changed configuration object reference counting to use ref_t
tom95858 committed Jan 15, 2025
1 parent 2902a3c commit 6f14669
Showing 99 changed files with 2,132 additions and 2,546 deletions.
46 changes: 16 additions & 30 deletions ldms/man/ldmsd.man
@@ -9,53 +9,40 @@ ldmsd \- Start an ldms daemon
ldmsd [OPTION...]

.SH DESCRIPTION
The ldmsd command can be used
to start an ldms process. Plugin configuration of the ldmsd can be done via the
a configuration file or the ldmsd_controller.

Starting ldmsd with the configuration file option enables you to statically configure a
sampler without requiring python. Dynamically configuring samplers with ldmsd_controller requires python.
Currently, v2's ldmsctl can still be used to dynamically configure a sampler without requiring
python. This capability will be replaced and it is not recommended that you use this option.
The \c ldmsd command is used to start an instance of an ldmsd server.
Configuration of the ldmsd is accomplished statically with a configuration
file provided on the command line, or dynamically with the ldmsd_controller
or distributed configuration management server Maestro.

.SH ENVIRONMENT
.SS
The ldmsd-check-env program will dump currently set environment variables that may influence ldmsd and plugin behavior.
The following environment variables must often be set:
.TP
LD_LIBRARY_PATH
Path to ovis/lib and libevent2/lib, if not in a system default path. Depending on the system these may be lib64 instead of lib.
.TP
PATH
Include the path to sbin directory containing ldmsd.
.SS The following environment variables may be set to override compiled-in defaults:
.TP
ZAP_LIBPATH
Path to ovis/lib/ovis-ldms
Path to the location of the LDMS transport libraries.
.TP
LDMSD_PLUGIN_LIBPATH
Path to ovis/lib/ovis-ldms
Path to the location of the LDMS plugin libraries.
.TP
LDMSD_PIDFILE
Full path name of pidfile overriding the default /var/run/ldmsd.pid unless the command line argument "-r pidfilepath" is present.
Full path name of a file overriding the default of /var/run/ldmsd.pid. The command line
argument "-r pid-file-path" takes precedence over this value.
.TP
LDMSD_LOG_TIME_SEC
If present, log messages are stamped with the epoch time rather than the date string. This is useful when sub-second information is desired or correlating log messages with other epoch-stamped data.
.TP
LDMSD_SOCKPATH
Path to the unix domain socket for the ldmsd. Default is created within /var/run. If you must change the default (e.g., not running as root and hence /var/run is not writeable), set this variable (e.g., /tmp/run/ldmsd) or specify "-S socketpath" to ldmsd.
.TP
LDMSD_MEM_SZ
The size of memory reserved for metric sets. Set this variable or specify "-m"
to ldmsd. See the -m option for further details. If both are specified, the -m
option takes precedence over this environment variable.
.TP
LDMSD_UPDTR_OFFSET_INCR
The increment to the offset hint in microseconds. This is only for updaters that
determine the update interval and offset automatically. For example, the offset
hint is 100000 which is 100 millisecond of the second. The updater offset will
be 100000 + LDMSD_UPDTR_OFFSET_INCR. The default is 100000 (100 milliseconds).
The increment to the offset hint in microseconds for updaters that
determine the update interval and offset automatically. For example, if the offset
hint is 100000, the updater offset will be 100000 + LDMSD_UPDTR_OFFSET_INCR.
The default is 100000 (100 milliseconds).

.SS CRAY Specific Environment variables for ugni transport
.TP
ZAP_UGNI_PTAG
For XE/XK, the PTag value as given by apstat -P.
For XC, the value does not matter but the environment variable must be set.
@@ -65,7 +52,7 @@ For XE/XK, the Cookie value corresponding to the PTag value as given by apstat -
For XC, the Cookie value (not Cookie2) as given by apstat -P
.TP
ZAP_UGNI_CQ_DEPTH
Optional value for the CQ depth. Default is 2048.
Optional value for the CQ depth. The default is 2048.
.TP
ZAP_UGNI_STATE_INTERVAL
Optional. If set, then ldmsd will check all nodes' states via rca interface.
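The precedence rules in the environment section of the man page (a command-line option wins over the environment variable, which wins over the compiled-in default, and the updater offset is the offset hint plus `LDMSD_UPDTR_OFFSET_INCR`) can be sketched as follows. This is a minimal illustration and not ldmsd's implementation; the function names `resolve_setting` and `updater_offset` are hypothetical.

```python
import os

# Defaults as stated in the man page text.
DEFAULT_PIDFILE = "/var/run/ldmsd.pid"
DEFAULT_OFFSET_INCR_US = 100000  # 100 milliseconds


def resolve_setting(cli_value, env_name, default, environ=None):
    """Hypothetical helper: command line > environment variable > default."""
    environ = os.environ if environ is None else environ
    if cli_value is not None:
        return cli_value
    return environ.get(env_name, default)


def updater_offset(offset_hint_us, environ=None):
    """Hypothetical helper: offset hint plus LDMSD_UPDTR_OFFSET_INCR (microseconds)."""
    environ = os.environ if environ is None else environ
    incr = int(environ.get("LDMSD_UPDTR_OFFSET_INCR", DEFAULT_OFFSET_INCR_US))
    return offset_hint_us + incr
```

With no `-r` argument and no `LDMSD_PIDFILE` set, `resolve_setting(None, "LDMSD_PIDFILE", DEFAULT_PIDFILE, {})` falls back to the default path; passing an explicit first argument models the `-r` option taking precedence.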
@@ -155,9 +142,8 @@ Silence can be obtained by specifying /dev/null for the log file or using comman
.TP
.BI "-v, --log_level" " LOG_LEVEL"
.br
LOG_LEVEL can be one of DEBUG, INFO, ERROR, CRITICAL or QUIET.
LOG_LEVEL can be one of DEBUG, INFO, WARN, ERROR, CRITICAL or QUIET.
The default level is ERROR. QUIET produces only user-requested output.
(Note: this has changed from the previous release where q designated no (QUIET) logging).
.TP
.BI -L, --log_config " <CINT:PATH> | <CINT> | <PATH>"
.br
10 changes: 10 additions & 0 deletions ldms/python/ldms.pxd
@@ -927,9 +927,19 @@ cdef extern from "zap/zap.h" nogil:
uint64_t sq_sz
int pool_idx
uint64_t thread_id
pid_t tid
uint64_t idle_time
uint64_t active_time
timespec start
timespec wait_start
timespec wait_end
int waiting
void *app_ctxt

struct zap_thrstat_result:
int count
zap_thrstat_result_entry entries[0]

zap_thrstat_result *zap_thrstat_get_result()

cdef extern from "asm/byteorder.h" nogil:
34 changes: 21 additions & 13 deletions ldms/python/ldmsd/ldmsd_communicator.py
@@ -57,13 +57,13 @@
import errno

#:Dictionary contains the cmd_id, required attribute list
#:and optional attribute list of each ldmsd commands. For example,
#:and optional attribute list of each ldmsd command. For example,
#:LDMSD_CTRL_CMD_MAP['load']['req_attr'] is the list of the required attributes
#:of the load command.
#:LDMSD_CTRL_CMD_MAP['load']['opt_attr'] is the list of the optional attributes
#:of the load command.
LDMSD_CTRL_CMD_MAP = {'usage': {'req_attr': [], 'opt_attr': ['name']},
'load': {'req_attr': ['name']},
'load': {'req_attr': ['name'], 'opt_attr' : ['inst']},
'term': {'req_attr': ['name']},
'config': {'req_attr': ['name']},
'source': {'req_attr': ['path'], 'opt_attr':[]},
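How a map of required and optional attributes like `LDMSD_CTRL_CMD_MAP` can be used to check a command's arguments is sketched below. This is an assumption about usage, not the controller's actual `handle_args` logic; `CMD_MAP` is a two-entry excerpt and `check_attrs` is a hypothetical helper.

```python
# Excerpt mirroring the shape of LDMSD_CTRL_CMD_MAP; only two commands shown.
CMD_MAP = {
    "load": {"req_attr": ["name"], "opt_attr": ["inst"]},
    "term": {"req_attr": ["name"]},
}


def check_attrs(cmd, attrs, cmd_map=CMD_MAP):
    """Hypothetical helper: return (missing required, unknown) attribute names."""
    spec = cmd_map[cmd]
    required = spec.get("req_attr", [])
    optional = spec.get("opt_attr", [])  # entries may omit 'opt_attr' entirely
    missing = [a for a in required if a not in attrs]
    unknown = [a for a in attrs if a not in required and a not in optional]
    return missing, unknown
```

Using `spec.get(..., [])` keeps entries such as `'term'`, which define no `'opt_attr'` key, from raising `KeyError`.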
@@ -1639,22 +1639,26 @@ def store_time_stats(self, name=None, reset = False):
except Exception as e:
return errno.ENOTCONN, str(e)

def plugn_load(self, name):
def plugn_load(self, name, instance=None):
"""
Load an LDMSD plugin.
Load a plugin instance.
Parameters:
name - The plugin name
instance - The configuration instance name. If None, 'name' is used
Returns:
A tuple of status, data
- status is an errno from the errno module
- data is an error message if status != 0 or None
"""
if not instance:
instance = name
req = LDMSD_Request(
command_id=LDMSD_Request.PLUGN_LOAD,
attrs=[
LDMSD_Req_Attr(attr_id=LDMSD_Req_Attr.NAME, value=name),
LDMSD_Req_Attr(attr_id=LDMSD_Req_Attr.INSTANCE, value=instance)
])
try:
req.send(self)
@@ -1665,10 +1669,10 @@ def plugn_load(self, name):

def plugn_term(self, name):
"""
Terminate a plugin
Terminate a plugin instance
Parameters:
name - The plugin name
name - The plugin instance name
Returns:
A tuple of status, data
@@ -1686,7 +1690,7 @@ def plugn_term(self, name):

def plugn_config(self, name, cfg_str):
"""
Configure an LDMSD plugin
Configure a plugin instance
Parameters:
name - The plugin name
@@ -1709,10 +1713,10 @@ def plugn_config(self, name, cfg_str):

def plugn_stop(self, name):
"""
Stop a LDMSD Plugin
Stop a plugin instance
Parameters:
name - The plugin name
name - The plugin instance name
Returns:
A tuple of status, data
- status is an errno from the errno module
@@ -1733,10 +1737,12 @@ def plugn_stop(self, name):

def plugn_status(self, name=None):
"""
Get the status of a named plugin, or all plugins if no name is specified
Get the status of a plugin instance
If a name is not specified, the status is returned for all plugins.
Parameters:
[name] - The plugin name
name - The plugin instance name
Returns:
A tuple of status, data
@@ -1758,10 +1764,12 @@ def plugn_status(self, name=None):

def plugn_sets(self, name=None):
"""
List the sets by plugin that provides that sets. If name is provided only provide sets for that plugin
List the sets provided by a plugin instance
If name is not provided the sets for each plugin instance are returned.
Parameters:
[name] - The plugin name
name - The plugin name
Returns:
A tuple of status, data
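The instance-name defaulting that `plugn_load` gains in this file can be illustrated without a running ldmsd. The sketch below only models the rule "if no instance is given, the plugin name is used"; `load_attrs` is a hypothetical stand-in that returns a plain dictionary rather than building an `LDMSD_Request`.

```python
def load_attrs(name, instance=None):
    """Hypothetical stand-in for the attributes plugn_load sends.

    Mirrors the defaulting rule from the diff: an empty or missing
    instance name falls back to the plugin name.
    """
    if not instance:
        instance = name
    return {"name": name, "instance": instance}


# One plugin loaded twice under distinct instance names, plus the default case.
requests = [
    load_attrs("meminfo", "meminfo_fast"),
    load_attrs("meminfo", "meminfo_slow"),
    load_attrs("procnetdev2"),
]
```

The last entry shows the backward-compatible case: a caller that passes only the plugin name gets an instance named after the plugin.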
77 changes: 23 additions & 54 deletions ldms/python/ldmsd/ldmsd_controller
@@ -165,48 +165,6 @@ class LdmsdCmdParser(cmd.Cmd):
"""
print(args)

def do_connect(self, args):
if self.state != self.comm.CONNECTED:
if self.comm:
self.comm.connect()
kwargs = {
"xprt": "sock",
"host": "localhost",
"port": None,
"auth": "none",
"auth_opt": {},
}
r = re.compile(r"\s*(\w+)=(\w+)")
for attr, value in r.findall(args):
if attr in kwargs:
kwargs[attr] = value
else:
# treat as auth options
kwargs["auth_opt"][attr] = value
self.comm = Communicator(kwargs['xprt'],
kwargs['host'],
kwargs['port'],
kwargs['auth'],
kwargs['auth_opt'])
rc = self.comm.connect()
if rc:
self.promt = "Error connecting to {1} on port {2} with {0}".format(xprt, host, port)
else:
self.prompt = "{0}:{1}:{2}> ".format(xprt, host, port)
else:
print(f"WARN: Already connected, do nothing.")
return None

def complete_connect(self, text, line, begidx, endidx):
name_list = ["xprt=", "host=", "port=", "auth="]
full_list = ["xprt=(sock|rdma|ugni)",
"host=HOSTNAME",
"port=PORT",
"auth=AUTH_METHOD"]
if not text:
return full_list
return [x for x in name_list if x.startswith(text)]

def precmd(self, line):
if line[0:1] == '#':
return ''
@@ -276,6 +234,9 @@ class LdmsdCmdParser(cmd.Cmd):
if rc:
print(f'Error {rc} printing current configuration: {msg}')

def complete_dump_cfg(self, text, line, begidx, endidx):
return self.__complete_attr_list('dump_cfg', text)

def do_source(self, arg):
"""
Parse commands from the specified file as if they were entered
@@ -330,13 +291,19 @@ class LdmsdCmdParser(cmd.Cmd):
"""
Load a plugin at the Aggregator/Producer
Parameters:
name= The plugin name
name= The plugin to load
inst= The configuration name for the plugin (default = <name>)
"""
arg = self.handle_args('load', arg)
if arg:
rc, msg = self.comm.plugn_load(arg['name'])
name = arg['name']
if arg['inst']:
inst = arg['inst']
else:
inst = name
rc, msg = self.comm.plugn_load(name, inst)
if rc:
print(f'Error loading plugin {arg["name"]}: {msg}')
print(f'Error loading plugin {name} as {inst}: {msg}')

def complete_load(self, text, line, begidx, endidx):
return self.__complete_attr_list('load', text)
@@ -1311,6 +1278,8 @@ class LdmsdCmdParser(cmd.Cmd):

def __avg_update(self, cur_avg, cur_cnt, v, cnt):
new_cnt = cur_cnt + cnt
if new_cnt == 0:
return (0.0, new_cnt)
return (cur_avg * (cur_cnt / new_cnt) + v * (cnt / new_cnt), new_cnt)

def __min_max(self, tbl, data):
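The weighted running-average update performed by `__avg_update` can be exercised standalone. The sketch below is a free-function version written under the assumption that callers unpack an `(average, count)` pair, so the zero-count guard returns a pair as well; the name `avg_update` is hypothetical.

```python
def avg_update(cur_avg, cur_cnt, v, cnt):
    """Merge a new average `v` over `cnt` samples into a running average.

    Returns (new_average, new_count). When no samples exist at all the
    division is skipped and (0.0, 0) is returned.
    """
    new_cnt = cur_cnt + cnt
    if new_cnt == 0:
        return (0.0, 0)
    # Each average is weighted by its share of the combined sample count.
    return (cur_avg * (cur_cnt / new_cnt) + v * (cnt / new_cnt), new_cnt)
```

For example, merging an average of 2.0 over 2 samples with an average of 4.0 over 2 samples weights each by one half and gives 3.0 over 4 samples.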
@@ -1496,8 +1465,9 @@ class LdmsdCmdParser(cmd.Cmd):

try:
strgp_tbl, schema_tbl, thread_tbl = self.__store_time_process(j)
except:
raise
except Exception as e:
print(e)
return

# Schema Table
print(f"{'='*(4+21+16+16+16+21+21+11+11)}")
@@ -1567,12 +1537,11 @@ class LdmsdCmdParser(cmd.Cmd):
rc, msg = self.comm.plugn_status(arg['name'])
if rc == 0:
plugins = fmt_status(msg)
print("Name Type Interval Offset Libpath")
print("------------ ------------ ------------ ------------ ------------")
print("Plugin Instance Type libpath")
print("------------ ------------------------ ------------ ------------")
for plugn in plugins:
print("{0:12} {1:12} {2:12} {3:12} {4:12}".format(
plugn['name'], plugn['type'],
plugn['sample_interval_us'], plugn['sample_offset_us'],
print("{0:12} {1:24} {2:12} {3}".format(
plugn['name'], plugn['instance'], plugn['type'],
plugn['libpath']))

def do_plugn_sets(self, arg):
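The new `plugn_status` table layout (plugin, instance, type, libpath) can be previewed with the same format string used in the hunk above. The sample records and library paths below are made up for illustration, and `format_plugin_rows` is a hypothetical helper, not part of ldmsd_controller.

```python
ROW_FMT = "{0:12} {1:24} {2:12} {3}"  # same field widths as the diff


def format_plugin_rows(plugins):
    """Hypothetical helper: render one table row per plugin instance."""
    return [
        ROW_FMT.format(p["name"], p["instance"], p["type"], p["libpath"])
        for p in plugins
    ]


# Made-up sample data: two instances of the same sampler plugin.
sample = [
    {"name": "meminfo", "instance": "meminfo_a", "type": "sampler",
     "libpath": "/opt/lib/libmeminfo.so"},
    {"name": "meminfo", "instance": "meminfo_b", "type": "sampler",
     "libpath": "/opt/lib/libmeminfo.so"},
]

for row in format_plugin_rows(sample):
    print(row)
```

The libpath column is left without a width because paths vary widely in length and it is the last field on the line.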
@@ -1673,7 +1642,7 @@ class LdmsdCmdParser(cmd.Cmd):
if arg:
rc, msg = self.comm.plugn_config(arg['name'], arg['cfg_str'])
if rc:
print(f'Error configuring {arg["name"]} plugin: {msg}')
print(f'Error configuring the \'{arg["name"]}\' plugin: {msg}')

def complete_config(self, text, line, begidx, endidx):
return self.__complete_attr_list('config', text)
@@ -2228,7 +2197,7 @@ class LdmsdCmdParser(cmd.Cmd):
if msg == "":
return
if rc != 0:
print(f"Error {msg['errcode']}: {resp['msg']}")
print(f"Error {rc}: {msg}")
return
msg = fmt_status(msg)
self.display_worker_thread_stats(msg['worker_threads'])
@@ -481,7 +481,7 @@ static int config(struct ldmsd_plugin *self, struct attr_value_list *kwl, struct
* This makes call into core LDMS functions for initializing the sampler
*/
errno = 0;
g_base = base_config(avl, SAMP, SAMP, mylog);
g_base = base_config(avl, self->inst_name, SAMP, mylog);
ovis_log(mylog, OVIS_LDEBUG, SAMP": Base config() called.\n");
if (g_base == NULL) {
rc = errno ? errno : -1;
@@ -254,7 +254,7 @@ static int config(struct ldmsd_plugin *self,

// Create an instance from the base "class". This is effectively calling
// the base class constructor.
base = base_config(attribute_value_list, SAMP, SAMP, __gpu_metrics_log);
base = base_config(attribute_value_list, self->inst_name, SAMP, __gpu_metrics_log);
if (!base) {
rc = errno;
goto err;
2 changes: 1 addition & 1 deletion ldms/src/contrib/sampler/ipmireader/ipmireader.c
@@ -406,7 +406,7 @@ static int config(struct ldmsd_plugin *self, struct attr_value_list *kwl, struct
if (rc != 0)
sleep(retry);

base = base_config(avl, SAMP, SAMP, mylog);
base = base_config(avl, self->inst_name, SAMP, mylog);
if (!base) {
rc = errno;
goto err;
2 changes: 1 addition & 1 deletion ldms/src/contrib/sampler/ipmireader/ipmisensors.c
@@ -286,7 +286,7 @@ static int config(struct ldmsd_plugin *self, struct attr_value_list *kwl, struct
goto err;


base = base_config(avl, SAMP, SAMP, mylog);
base = base_config(avl, self->inst_name, SAMP, mylog);
if (!base) {
rc = errno;
goto err;