Device operation
Before instantiation, only two settings may need to be configured: the number of workers in the pool stage and the ZeroMQ connections (both described below). The former will soon be replaced by automatic scaling at runtime.
During runtime, most actions are performed via a set of slots:
- Start, Stop
Start or stop the data pipeline. Before the pipeline can operate, a context must have been initialized successfully via Reconfigure, and the device must be in the ACTIVE state. Stopping the pipeline while an operator runs cancels this operator as well.
- Reconfigure
Initialize a new context from the provided source file. This slot is central to any change of the pipeline configuration during runtime, e.g. for code changes to take effect or to recover from an error condition. It is possible to perform this action in the PASSIVE, ACTIVE, PROCESSING (called hot reconfigure) and ERROR states. Any internal buffers are cleared as well (see Clear buffer below), while constant data and parameters are preserved from the previous context if possible.
- Clear buffer
Clear transient data in the data pipeline, which is typically used for reduction operations such as averaging. When the context is reconfigured, this happens implicitly as all internal objects are recreated. This will not clear any feedback data in the pipeline (see Clear const below).
- Clear const
Clear constant data in the data pipeline, which is typically used for reference or background signals. This kind of data will persist through a reconfiguration process and can only be cleared with this slot.
- Suspend
Set an empty context, which will cause all device proxies and pipeline connections to be closed. In this PASSIVE state, the device will consume almost no system resources. The device can be recovered by reconfiguring it with a non-empty context.
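To make the difference between transient and constant data concrete, the following Python sketch models how Reconfigure, Clear buffer and Clear const affect each kind. It is a hypothetical illustration that assumes both kinds can be treated as plain dictionaries; the PipelineData class and its methods are not part of the device or pipeline API.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineData:
    """Hypothetical model of the two kinds of data a context may hold."""
    transient: dict = field(default_factory=dict)  # e.g. running averages
    constant: dict = field(default_factory=dict)   # e.g. background signals

    def clear_buffer(self) -> None:
        # "Clear buffer" drops transient data only.
        self.transient.clear()

    def clear_const(self) -> None:
        # "Clear const" is the only way to drop constant data.
        self.constant.clear()

    def reconfigure(self) -> None:
        # Reconfiguration recreates all internal objects, so transient data
        # is lost, while constant data is carried over to the new context.
        self.transient = {}


data = PipelineData()
data.transient["average"] = 1.5
data.constant["background"] = 0.2
data.reconfigure()
print(data.transient, data.constant)  # transient is empty, constant survives
```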
The context is contained in a source file, which must be readable by the xctrl user on the node the device runs on. Currently, this excludes storing the file on GPFS in a proposal directory.
States
- INIT
Occurs during the initial device initialization as well as whenever the pipeline context is reconfigured during normal operation.
- PASSIVE
When the device is suspended, an empty context is loaded. Thus, there are no proxies or pipeline connections in this state and the device will consume almost no resources at all. This state is recommended when the device will not be used in the foreseeable future, in order to reduce the load on fast data producers. It is reached by pressing “Suspend” or reconfiguring with an empty context path.
- ACTIVE
In this state, a valid context is initialized, all device proxies and pipeline connections are established and the pipeline may start processing at any time. It is typically entered when a context is reconfigured successfully from the INIT, PASSIVE or ERROR state.
- PROCESSING
The pipeline is in full operation in this state and matched data is being sent for processing. The context may still be reconfigured, after which the device immediately returns to this state, provided no error condition occurs.
- ENGAGING
The device enters this state while the pipeline is running and an operator is executed. Hence this can only be reached from the PROCESSING state and will return to it once the operator finishes or a recoverable error occurs. The context may not be reconfigured while an operator is running.
- ERROR
This condition is reached whenever an unrecoverable error occurs. Typically, this is caused by a problem in context code. The device can only recover via a context reconfiguration, which will always transition into ACTIVE if successful, even if the error occurred in PROCESSING or ENGAGING.
- STOPPING
Occurs only during device shutdown.
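The states and slot restrictions described above can be summarized as a small transition model. The Python sketch below is hypothetical: State, ALLOWED, can_invoke and reconfigure_target are illustrative names, and it assumes that a failed reconfiguration always ends in ERROR and that reconfiguring with an empty context ends in PASSIVE.

```python
from enum import Enum


class State(Enum):
    INIT = "INIT"
    PASSIVE = "PASSIVE"
    ACTIVE = "ACTIVE"
    PROCESSING = "PROCESSING"
    ENGAGING = "ENGAGING"
    ERROR = "ERROR"
    STOPPING = "STOPPING"


# States in which each slot may be invoked, following the descriptions above.
# ENGAGING is absent for "reconfigure": the context may not be reconfigured
# while an operator is running.
ALLOWED = {
    "start": {State.ACTIVE},
    "stop": {State.PROCESSING, State.ENGAGING},  # in ENGAGING, stop cancels the operator
    "reconfigure": {State.PASSIVE, State.ACTIVE, State.PROCESSING, State.ERROR},
}


def can_invoke(slot: str, state: State) -> bool:
    return state in ALLOWED.get(slot, set())


def reconfigure_target(state: State, *, empty: bool = False, ok: bool = True) -> State:
    """State reached after a reconfiguration (the device passes through INIT)."""
    if not can_invoke("reconfigure", state):
        raise RuntimeError(f"cannot reconfigure in {state.name}")
    if empty:
        return State.PASSIVE     # an empty context suspends the device
    if not ok:
        return State.ERROR       # e.g. a problem in the context code
    if state is State.PROCESSING:
        return State.PROCESSING  # hot reconfigure resumes processing
    return State.ACTIVE          # from PASSIVE, ACTIVE or ERROR
```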
Built-in scenes
There are two scenes provided with the device:
- The default overview scene provides all properties and slots for diagnostic purposes at runtime.
- The optional compact scene includes only the top-most portion of the overview scene with the most critical information for normal operations.
For operators, it is recommended to start from the compact scene and extend it with any parameters, action slots, operators or other GUI elements relevant for a given application.
Context-dependent properties
The pipeline context may inject additional properties (via parameters) or slots (via action views and operators) into the device schema, which are then available under their respective nodes parameters, actionViews and operators.
The device will attempt to preserve a parameter value across context reconfiguration, as long as the new context still defines the parameter and its type has not changed. To reset a value to the default set in the code, suspend the device before reconfiguring.
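One way to express this carry-over rule is sketched below. This is a minimal Python sketch assuming parameters are plain name/value pairs; carry_over_parameters is a hypothetical helper, not the device's implementation, and it compares Python types strictly (an int is not treated as a float).

```python
def carry_over_parameters(old_values: dict, new_defaults: dict) -> dict:
    """Hypothetical: parameter values of the new context after reconfiguration."""
    result = {}
    for name, default in new_defaults.items():
        if name in old_values and type(old_values[name]) is type(default):
            # Still defined with the same type: keep the value set at runtime.
            result[name] = old_values[name]
        else:
            # New parameter or changed type: fall back to the code default.
            result[name] = default
    # Parameters no longer defined by the new context are simply dropped.
    return result
```

Passing an empty dictionary as old_values, which corresponds to reconfiguring after a suspension, yields the code defaults for every parameter.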
Matching strategies
The data from all input sources must be matched for a train before it can be sent to the data pipeline, as in particular fast data arrives asynchronously at the device. The MetroProcessor provides different strategies on how to deal with this situation, which may differ in train latency and order through the pipeline. All these implementations use the max_train_latency property of the device for fine tuning. In all cases, input data older than this value is discarded immediately.
- PATIENT
This is the most basic matching strategy, which buffers all input events and processes these entries in the pipeline when the corresponding train reaches maximum train latency. Already buffered data will be overwritten with newer data for the same train and input source, if it arrives before the latency cap is reached.
Latency always equal to maximum train latency.
Within train latency window, pipeline data is overwritten by newer data.
- GREEDY
In contrast to the patient matching strategy above, this implementation distinguishes between complete and incomplete trains. A train is considered complete once data from all input sources required by the context’s views is present. Such a train is processed immediately and may therefore enter the pipeline before an earlier but incomplete train. A train is processed at the latest when its age reaches the maximum train latency.
While the overall rate is the same as for the patient matcher, this implementation can offer significantly faster feedback if all input sources have similar latency. At the same time, it avoids the excessive stuttering of the cunning train matcher below when one or more inputs drop trains frequently (at the cost of higher latency for incomplete trains). It is therefore recommended for online visualization of results, in particular if fast feedback as a reaction to control action is desired.
Duplicate entries are dropped for the same input source and train, as a train may have been executed the moment the original input arrived.
Minimum latency for complete trains, same as patient for incomplete trains.
Good balance between smooth output and fast feedback.
Duplicate data for same input and train is discarded.
- CUNNING
The cunning train matching strategy extends the greedy algorithm by the assumption that input sources always deliver data in order: once data is received for a particular train, the same input will never send data for an earlier train. This makes it possible to execute incomplete trains much sooner, as soon as there is no hope of completing them anymore. Hence few trains will reach the maximum train latency, unless its value is set too low or a device drops too many trains.
This strategy will yield the lowest latency, but may experience stuttering when the input sources drop trains at different rates and/or drop different train IDs. In this case, trains will be released in bursts with possibly long lags in between. However, unlike the greedy matcher, trains are always executed in order. It is therefore best suited for transformation of data without any real-time visualization.
As with the greedy train matching above, data will not be replaced if multiple entries arrive for the same input and train.
As many complete trains as early as possible.
Overall minimum latency, but may cause stuttering.
Duplicate data for same input and train is discarded.
In general, it is recommended to use GREEDY for real-time visualization and CUNNING for transformation applications. The PATIENT strategy offers the smoothest output rates at the cost of higher, but constant latency.
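The behavior of the three strategies can be compared in a small offline simulation. The Python sketch below is hypothetical and simplified: it assumes that time is counted in train periods with train N produced at time N, that events are already sorted by arrival, and it ignores everything except matching. The function match and its arguments are illustrative names, not the MetroProcessor API.

```python
from collections import defaultdict


def match(events, sources, strategy, max_latency):
    """Simulate PATIENT, GREEDY or CUNNING matching (hypothetical sketch).

    events: (arrival_time, source, train_id, value) tuples sorted by arrival.
    Returns (release_time, train_id, {source: value}) in processing order.
    """
    sources = set(sources)
    buffer = defaultdict(dict)  # train_id -> {source: value}
    last_seen = {}              # source -> newest train_id received from it
    released = set()
    out = []

    def release(train, when):
        out.append((when, train, buffer.pop(train)))
        released.add(train)

    def drain(when):
        # CUNNING: release strictly in train order while the oldest buffered
        # train is complete (nothing missing) or every missing source has
        # already sent a newer train, so it can never be completed.
        while buffer:
            oldest = min(buffer)
            missing = sources - set(buffer[oldest])
            if all(last_seen.get(s, -1) > oldest for s in missing):
                release(oldest, when)
            else:
                break

    def expire(now):
        # Any train whose age reaches max_latency is processed, complete or not.
        for train in sorted(buffer):
            deadline = train + max_latency
            if train in buffer and deadline <= now:
                release(train, deadline)
                if strategy == "CUNNING":
                    drain(deadline)

    for now, source, train, value in events:
        expire(now)
        last_seen[source] = max(last_seen.get(source, -1), train)
        stale = (now - train >= max_latency or train in released
                 or (strategy == "CUNNING" and train < max(released, default=-1)))
        if not stale:
            slot = buffer[train]
            if strategy == "PATIENT" or source not in slot:
                slot[source] = value  # PATIENT overwrites, others keep the first value
            if strategy == "GREEDY" and set(slot) >= sources:
                release(train, now)   # complete: process immediately
        if strategy == "CUNNING":
            drain(now)
    expire(float("inf"))
    return out


# Two sources; "b" skips train 1 and "a" sends train 1 twice.
EXAMPLE = [
    (0.1, "a", 0, "a0"), (0.2, "b", 0, "b0"),
    (1.1, "a", 1, "a1"), (1.5, "a", 1, "a1-dup"),
    (2.1, "a", 2, "a2"), (2.2, "b", 2, "b2"),
    (3.5, "b", 3, "b3"),
]

for name in ("PATIENT", "GREEDY", "CUNNING"):
    print(name, [(t, n) for t, n, _ in match(EXAMPLE, {"a", "b"}, name, 3)])
```

In this example, PATIENT releases every train exactly at its latency cap and keeps the duplicate value for train 1, GREEDY processes complete trains immediately but releases the incomplete train 1 only after train 2, and CUNNING releases train 1 as soon as source b delivers train 2, keeping the train order intact.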
Train offsets
A train offset may be applied to any fast data if the source device appears to have an offset relative to the pipeline device. This allows matching data that actually belongs to the same train but is reported with different train IDs.
The input path must be an unambiguous qualifier for a karabo# data path. Only the source and pipeline name are used; any hash key is ignored.
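How such paths could be reduced to the part an offset applies to is sketched below. This Python snippet assumes that a karabo# path has the shape karabo#<source>:<pipeline>[<key>] with an optional key, and that the configured offset is added to the reported train ID; the regular expression, the device names and the helpers offset_target, build_offsets and matched_train_id are all hypothetical.

```python
import re

# Assumed shape of a fast data path: "karabo#<source>:<pipeline>[<key>]",
# where the trailing "[<key>]" part is optional.
_KARABO_PATH = re.compile(
    r"^karabo#(?P<source>[^:\[\]]+):(?P<pipeline>[^:\[\]]+)(?:\[(?P<key>[^\]]*)\])?$")


def offset_target(path: str) -> tuple:
    """Reduce a karabo# path to the (source, pipeline) pair an offset applies to."""
    m = _KARABO_PATH.match(path)
    if m is None:
        raise ValueError(f"not an unambiguous karabo# path: {path!r}")
    return m["source"], m["pipeline"]  # the hash key, if any, is ignored


def build_offsets(configured: dict) -> dict:
    """Map configured {path: offset} entries onto (source, pipeline) pairs."""
    return {offset_target(path): offset for path, offset in configured.items()}


def matched_train_id(path: str, train_id: int, offsets: dict) -> int:
    # Assumption: the offset is added to the reported train ID.
    return train_id + offsets.get(offset_target(path), 0)


offsets = build_offsets({"karabo#EXAMPLE/CAM/1:output": -1})
print(matched_train_id("karabo#EXAMPLE/CAM/1:output[data.image]", 1000, offsets))
```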
Worker pool
The pool size should be large enough to ensure this stage is able to process trains at 10 Hz or more, i.e. >= ceil(10 * t) workers for a processing time t (in seconds per train) in the pool stage. This can be verified by ensuring the load reported in the worker statistics node is less than 1.0 for all pool workers.
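The sizing rule follows from each worker handling one train per processing time t, so that N workers sustain N / t trains per second. A minimal Python sketch of the rule and of the load check follows; the function names are hypothetical, and the processing time is taken in milliseconds only to keep the arithmetic exact.

```python
import math


def min_pool_size(processing_time_ms: int, rate_hz: int = 10) -> int:
    """Smallest pool that sustains rate_hz for a given per-train processing time."""
    # One worker handles 1000 / processing_time_ms trains per second, so N workers
    # handle N * 1000 / processing_time_ms; require that to be >= rate_hz.
    return max(1, math.ceil(processing_time_ms * rate_hz / 1000))


def pool_keeps_up(loads) -> bool:
    """True if every pool worker reports a load below 1.0."""
    return all(load < 1.0 for load in loads)


print(min_pool_size(250))  # 250 ms per train needs 3 workers for 10 Hz
```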
ZeroMQ configuration
There are several ZeroMQ connections between the device, the pipeline stages and any clients:
- Output
The reduce stage binds a PUB socket to this address and sends the pipeline results. It is recommended to set this to an InfiniBand address for maximum bandwidth to clients.
- Control
The device binds a ROUTER socket to this address to control the pipeline stages. By default, this is set to an IPC connection on the same node using the device ID as path.
- Reduce
The reduce stage binds a PULL socket to this address to receive the pool stage results directly via PUSH. By default, this is set to an IPC connection on the same node using the device ID as path.
In most cases, only the output address should require any adjustments, e.g. in order to run several device instances on the same node.
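When several instances share a node, the endpoints each of them binds must not collide. The Python sketch below is a hypothetical pre-deployment check, not part of the device: MetroSockets and bind_conflicts are illustrative names, and the addresses in the example are placeholders rather than the device's actual defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetroSockets:
    """Hypothetical summary of one device instance's bound ZeroMQ endpoints."""
    device_id: str
    node: str
    output: str   # reduce stage binds PUB here for clients
    control: str  # device binds ROUTER here to control the stages
    reduce: str   # reduce stage binds PULL here for pool results sent via PUSH


def bind_conflicts(instances):
    """Return (address, first_device, second_device) for addresses bound twice on a node."""
    seen = {}
    conflicts = []
    for inst in instances:
        for addr in (inst.output, inst.control, inst.reduce):
            key = (inst.node, addr)
            if key in seen:
                conflicts.append((addr, seen[key], inst.device_id))
            else:
                seen[key] = inst.device_id
    return conflicts


a = MetroSockets("EXAMPLE/METRO/A", "node1", "tcp://10.0.0.1:45000",
                 "ipc:///tmp/A-control", "ipc:///tmp/A-reduce")
b = MetroSockets("EXAMPLE/METRO/B", "node1", "tcp://10.0.0.1:45000",
                 "ipc:///tmp/B-control", "ipc:///tmp/B-reduce")
print(bind_conflicts([a, b]))  # both instances bind the same output address
```

Since the control and reduce addresses default to IPC paths derived from the device ID, a conflict found by a check like this will normally involve the output address.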