Stepping Stones
===============

Generates events when TCP connections resume after a lengthy pause
(controlled by `SteppingStones::min_pause`). This package is meant to provide
a building block for stepping-stone (and possibly data-relay) analysis.
It's not meant to be customer-visible.

| Option | Default Value | User Configurable? | Description |
|--------|---------------|--------------------|-------------|
| table_expire_interval | 100 msec | ? | Polling interval for expired table entries. |
| SteppingStones::min_pause | 500 msec | ? | Key setting: the minimum time a connection must pause before its resumption generates an event. |
| step_delta | 80 msec | ? | How close two pauses have to be for connections to be viewed as coinciding. |
| stp_ratio_thresh | 0.3 | ? | Proportion of idle times that must coincide. | 
| stp_common_host_thresh | 2 | ? | Number of consecutive coincidences a connection pair must show to be deemed a stepping stone. Applies to pairs that share a common intermediary host. |
| stp_random_pair_thresh | 4 | ? | Number of consecutive coincidences a connection pair must show to be deemed a stepping stone. Applies to pairs that do not share a common intermediary host. |
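
To make the interplay of `step_delta` and the two consecutive-coincidence
thresholds concrete, here is a minimal Python sketch. It is an illustrative
model only, not the package's Zeek code: the function names `coincide` and
`is_stepping_stone` are hypothetical, and treating "at most `step_delta`
apart" and "at least the threshold" as the comparisons is an assumption.

```python
# Defaults taken from the option table above (times in seconds).
STEP_DELTA = 0.080          # step_delta
COMMON_HOST_THRESH = 2      # stp_common_host_thresh
RANDOM_PAIR_THRESH = 4      # stp_random_pair_thresh


def coincide(t_a: float, t_b: float, step_delta: float = STEP_DELTA) -> bool:
    """Assumed rule: two pauses coincide if they are at most step_delta apart."""
    return abs(t_a - t_b) <= step_delta


def is_stepping_stone(consecutive: int, share_host: bool) -> bool:
    """Assumed rule: flag a pair once its run of consecutive coincidences
    reaches the threshold that applies to it."""
    thresh = COMMON_HOST_THRESH if share_host else RANDOM_PAIR_THRESH
    return consecutive >= thresh
```

With the defaults, a pair that shares an intermediary host is flagged after
two consecutive coincidences, while an unrelated pair needs four.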



Experimental
------------

To build and run this package in a container based on zeekstack-docker:

    $ docker run \
        --rm \
        -it -v $(pwd):/src/package \
        -w /src/package \
        zeek-build:latest \
        bash -c 'make && cd ./testing && btest -c ./btest.cfg -v'


Clustering Approach 1a
----------------------

All correlation of connections happens on the manager.

In the first version, the timestamp used for correlating connections is
the local time when the manager starts processing a forwarded pause event.
Concretely, the timestamp for the pause event on a worker is not used.

This works as long as there is zero or close to zero latency between the
pause event on the worker and the manager starting to process it.

In real life, messages are delayed and events queue up. Once this happens,
the timestamps the manager assigns to pause events become arbitrary.
Worse, connections whose events sit in the queue together may be correlated
even though their corresponding pause events happened at very different
times.
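
The failure mode can be illustrated with a small Python simulation. This is
a sketch under stated assumptions, not the package's code: the function
`stamp_on_manager`, the backlog start time, and the per-event processing
cost are all hypothetical values chosen for illustration.

```python
def stamp_on_manager(events, manager_start, per_event_cost):
    """Model of approach 1a: ignore the worker timestamp and stamp each
    event with the (simulated) manager time at which processing starts.

    events: list of (conn_id, worker_time) in arrival order.
    Returns a list of (conn_id, worker_time, manager_time).
    """
    now = manager_start
    stamped = []
    for conn_id, worker_time in events:
        # The manager cannot process an event before it occurred.
        now = max(now, worker_time)
        stamped.append((conn_id, worker_time, now))
        now += per_event_cost
    return stamped


# Two pauses that really happened one second apart ...
queued = [("conn-a", 0.0), ("conn-b", 1.0)]
# ... but the manager only gets to them at t=5.0, draining the queue quickly.
stamped = stamp_on_manager(queued, manager_start=5.0, per_event_cost=0.001)

worker_gap = abs(stamped[0][1] - stamped[1][1])
manager_gap = abs(stamped[0][2] - stamped[1][2])
```

Here `worker_gap` is far above the default `step_delta` of 80 msec, while
`manager_gap` is only about 1 msec, so the two connections would appear to
coincide.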


Clustering Approach 1b
----------------------

Switch from `network_time()` to `current_time()`, because `network_time()`
stops advancing if events queue up on the manager (this may be an issue
elsewhere, too). A stalled `network_time()` also prevents table expiration
from happening.

Apart from handling overload better because expiration is based on real
time, this approach still uses the manager's time and may therefore cause
spurious correlations.


Clustering Approach 2
---------------------

Summary: Instead of using the local manager time, send the `network_time()`
of pause events from workers and discard messages that arrive too late.

All correlation still happens on the manager.

Workers forward pause events with their local `network_time()` attached.
As `network_time()` for each worker should come from the same NIC, we
assume they can be compared.

The manager tracks a `max_worker_time` variable that is updated whenever
a pause event is handled. It represents the maximum `network_time()` received
from pause events across all workers at any given point.

The `max_worker_time` variable tracked on the manager is used for expiring
and discarding connections in `recent_awakenings` events. Pause events that
are received with a `network_time()` value lower than
`max_worker_time - step_delta` are discarded.
This can happen if the pause events of one worker lag more than `step_delta`
behind those coming from another worker.
This is considered a problem, and we can report it with `Reporter::warning()`.

When real life hits and events are delayed, we may now discard messages.
No correlation happens for discarded pause events.
This seems no worse than for approach 1 where we timestamp delayed messages
with the current time of the manager. Further, it is feasible to detect
this situation and report it.
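
The discard rule can be sketched in Python as follows. This is a minimal
model under stated assumptions, not the package's Zeek implementation: the
class `PauseFilter`, its `accept` method, and the `discarded` counter are
hypothetical names, and the counter merely stands in for the reporting
mentioned above.

```python
class PauseFilter:
    """Model of the manager-side check in approach 2."""

    def __init__(self, step_delta: float = 0.080):
        self.step_delta = step_delta      # seconds
        self.max_worker_time = 0.0        # highest worker timestamp seen
        self.discarded = 0                # stand-in for a reporter warning

    def accept(self, worker_time: float) -> bool:
        """Return True if the pause event should be correlated."""
        if worker_time < self.max_worker_time - self.step_delta:
            # The event lags too far behind the fastest worker: drop it.
            self.discarded += 1
            return False
        self.max_worker_time = max(self.max_worker_time, worker_time)
        return True


f = PauseFilter()
results = [f.accept(t) for t in (10.0, 10.05, 9.0, 10.0)]
```

In this run the event stamped 9.0 is dropped because it trails
`max_worker_time` by more than `step_delta`, while the late but still recent
event stamped 10.0 is kept.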


Circuit Breaker / Overload Protection
-------------------------------------

The stepping-stones package is very performance sensitive with respect to:

* the pause event rate
* the rate at which connections are established and removed
* the number of concurrent connections

As the connection correlation is a centralized process on the manager, it
is fairly easy to overload it when the above parameters become too high.

To prevent overload, a configurable limit is provided:

* `SteppingStones::max_conns`

When the `conn_pauses` table reaches the `max_conns` limit, the manager will
start discarding pause events for connections that are not already tracked.
Further, workers will stop forwarding pause events for new connections if
their locally tracked connection table reaches `max_conns` as well.
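
The admission rule can be sketched in Python like this. It is a simplified
model, not the package's code: the class `ConnTable` and its `on_pause`
method are hypothetical, and using a dictionary to stand in for the
`conn_pauses` table is an assumption.

```python
class ConnTable:
    """Model of the max_conns limit on the conn_pauses table."""

    def __init__(self, max_conns: int):
        self.max_conns = max_conns
        self.conn_pauses = {}   # conn_id -> list of pause timestamps

    def on_pause(self, conn_id: str, ts: float) -> bool:
        """Record a pause; return False if the event is discarded."""
        if conn_id not in self.conn_pauses:
            if len(self.conn_pauses) >= self.max_conns:
                # Table is full: ignore connections not already tracked.
                return False
            self.conn_pauses[conn_id] = []
        self.conn_pauses[conn_id].append(ts)
        return True


table = ConnTable(max_conns=2)
outcomes = [
    table.on_pause("conn-a", 1.0),
    table.on_pause("conn-b", 1.1),
    table.on_pause("conn-c", 1.2),   # new connection, table already full
    table.on_pause("conn-a", 2.0),   # already tracked, still accepted
]
```

Connections that are already tracked keep being updated once the limit is
reached; only events for new connections are dropped.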

There may be potential to have each worker use a lower limit that depends
on the total number of workers. But it's unclear whether this is correct
with regard to load balancing.
