
Conversation

@lalitb (Member) commented Jan 23, 2026

Fan-out Processor Implementation

Implements all four discussed scenarios:

| Scenario | Config | Description |
|---|---|---|
| 1 | `mode: parallel`, `await_ack: primary` | Duplicate to all, wait for primary only |
| 2 | `mode: parallel`, `await_ack: all` | Duplicate to all, wait for all (with per-destination timeout) |
| 3 | `mode: sequential` | Send one-by-one, advance after ack |
| 4 | `fallback_for: <port>` | Failover to backup on nack/timeout |
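
To make the scenarios concrete, here is a minimal sketch of how they could be modeled as configuration types in Rust. All names here (`FanoutMode`, `AwaitAck`, `DestinationConfig`, `FanoutConfig`) are hypothetical illustrations, not the actual config structs in this PR.

```rust
// Hypothetical config model for the four fanout scenarios (illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum FanoutMode {
    /// Scenarios 1 and 2: duplicate the request to every destination at once.
    Parallel,
    /// Scenario 3: send to one destination at a time, advancing on ack.
    Sequential,
}

#[derive(Debug, Clone, PartialEq)]
pub enum AwaitAck {
    /// Fire-and-forget: ack upstream immediately after sending.
    None,
    /// Scenario 1: wait only for the primary destination's ack.
    Primary,
    /// Scenario 2: wait for every destination (with per-destination timeout).
    All,
}

#[derive(Debug, Clone)]
pub struct DestinationConfig {
    pub port: String,
    /// Scenario 4: if set, this destination is a backup used only when the
    /// named port nacks or times out.
    pub fallback_for: Option<String>,
    pub timeout: Option<std::time::Duration>,
}

#[derive(Debug, Clone)]
pub struct FanoutConfig {
    pub mode: FanoutMode,
    pub await_ack: AwaitAck,
    pub destinations: Vec<DestinationConfig>,
}
```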

Why Stateful (not Stateless like Go collector)

The Go Collector's fanout is stateless because it uses synchronous, blocking calls:

```go
err := consumer.ConsumeLogs(ctx, ld)  // blocks until complete, error returns directly
```

Our OTAP engine uses async message passing with explicit ack/nack routing:

```rust
effect_handler.send_message_to(port, pdata).await?;  // returns immediately
// ack arrives later as a separate NodeControlMsg::Ack
```
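
For illustration, a minimal sketch of the correlation this forces on the processor, assuming the ack control message carries a request id; the `PendingRequests` type and the `request_id` key are hypothetical, not the engine's actual API.

```rust
// Sketch only: correlating a deferred ack with the original request.
use std::collections::HashMap;

struct PendingRequests<P> {
    // request_id -> original pdata kept for upstream routing.
    pending: HashMap<u64, P>,
}

impl<P> PendingRequests<P> {
    fn on_send(&mut self, request_id: u64, original: P) {
        // Remember the pre-subscription pdata before sending downstream.
        self.pending.insert(request_id, original);
    }

    fn on_ack(&mut self, request_id: u64) -> Option<P> {
        // The ack arrives later as a separate control message; look up the
        // original request so the upstream ack can be routed correctly.
        self.pending.remove(&request_id)
    }
}
```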

I explored making scenarios 1 and 3 stateless but hit three blockers:

  1. subscribe_to() mutates context - Fanout must subscribe to receive acks, which pushes a frame onto the context stack. For correct upstream routing, we need the original pdata (pre-subscription). We cannot use ack.accepted from downstream.

  2. Downstream may mutate/drop payloads - into_parts(), transformers, and filters mean we can't rely on getting intact pdata back in ack/nack messages.

  3. Sequential/fallback/timeout require coordination - Need to know which destination is active, when to advance to the next, and when to trigger fallbacks or finish.

Even if downstream guaranteed returning intact payloads, we'd still need state for await_all completion tracking, fallback chains, and sequential advancement. The only gain would be a minor memory optimization (not storing original_pdata), not true statelessness.

Adopting Go's synchronous model would require fundamental engine architecture changes, not just fanout changes.

Memory Optimizations

While full statelessness isn't possible, I have implemented fast paths to minimize allocations for common configurations:

| Configuration | Fast Path | State Per Request |
|---|---|---|
| `await_ack: none` | Fire-and-forget | None (zero inflight tracking) |
| parallel + primary + no fallback + no timeout | Slim primary | Minimal (request_id → original_pdata) |
| All other configs | Full | Complete endpoint tracking |

Fast Path Details

  • Fire-and-forget (await_ack: none)
    Bypasses all inflight state. Clone, send, and ACK upstream immediately.
    Zero allocations per request.

  • Slim primary path
    Uses a tiny HashMap<u64, OtapPdata> instead of the full Inflight struct with EndpointVec.
    Ignores non-primary ACKs and NACKs.

  • Full path
    Required for:

    • Sequential mode
    • await_all
    • Any fallback
    • Any timeout

    Tracks all endpoints and request state.
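
As a rough illustration of the dispatch described above, the path selection could look like the following; the `FastPath` enum and function are hypothetical, not the PR's actual code.

```rust
// Illustration only: deciding which of the three paths applies.
enum FastPath {
    FireAndForget,
    SlimPrimary,
    Full,
}

fn select_fast_path(
    parallel: bool,
    await_ack: &str, // "none" | "primary" | "all"
    has_fallback: bool,
    has_timeout: bool,
) -> FastPath {
    match await_ack {
        // await_ack: none bypasses all inflight state.
        "none" => FastPath::FireAndForget,
        // parallel + primary-only with no fallback and no timeout:
        // just request_id -> original_pdata.
        "primary" if parallel && !has_fallback && !has_timeout => FastPath::SlimPrimary,
        // Sequential mode, await_all, any fallback, or any timeout.
        _ => FastPath::Full,
    }
}
```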

Code Structure

Inflight holds per-request state:

  • original_pdata - pre-subscription pdata, used for all upstream acks/nacks
  • endpoints[] - per-destination status (Acked/Nacked/InFlight/PendingSend)
  • next_send_queue - drives sequential mode advancement
  • completed_origins - tracks completion for await_ack: all
  • timeout_at - per-destination deadlines for timeout/fallback triggering

Not all fields are used for every scenario, but the overhead is minimal - empty HashSets don't allocate, SmallVec is inline for ≤4 items, and clone cost is O(1) for bytes::Bytes.
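
For illustration, a simplified sketch of what such a per-request record could look like; the exact field types differ in the PR (e.g. it uses SmallVec for the endpoint list), and the layout below is an assumption based on the description above.

```rust
// Illustrative shape of the per-request state; the real code may differ.
use std::collections::HashSet;
use std::time::Instant;

enum EndpointStatus {
    PendingSend,
    InFlight,
    Acked,
    Nacked,
}

struct Endpoint {
    port: String,
    status: EndpointStatus,
    // Per-destination deadline for timeout/fallback triggering.
    timeout_at: Option<Instant>,
}

struct Inflight<P> {
    /// Pre-subscription pdata, used for all upstream acks/nacks.
    original_pdata: P,
    /// Per-destination status (SmallVec in the PR, inline for <=4 items).
    endpoints: Vec<Endpoint>,
    /// Drives sequential mode: indices of destinations still waiting to be sent.
    next_send_queue: Vec<usize>,
    /// Tracks completion for `await_ack: all`.
    completed_origins: HashSet<String>,
}
```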

Documentation

See crates/otap/src/fanout_processor/README.md for configuration examples and behavior details.

@lalitb lalitb requested a review from a team as a code owner January 23, 2026 20:46
@lalitb lalitb marked this pull request as draft January 23, 2026 20:46
@github-actions bot added the rust label (Pull requests that update Rust code) Jan 23, 2026
codecov bot commented Jan 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.84%. Comparing base (6ad291b) to head (a2bd621).

Additional details and impacted files
```
@@             Coverage Diff             @@
##             main    #1878       +/-   ##
===========================================
- Coverage   85.18%   81.84%    -3.35%     
===========================================
  Files         508      181      -327     
  Lines      153846    51433   -102413     
===========================================
- Hits       131047    42093    -88954     
+ Misses      22265     8806    -13459     
  Partials      534      534               
```
| Component | Coverage Δ |
|---|---|
| otap-dataflow | ∅ <ø> (∅) |
| query_abstraction | 80.61% <ø> (ø) |
| query_engine | 90.23% <ø> (-0.35%) ⬇️ |
| syslog_cef_receivers | ∅ <ø> (∅) |
| otel-arrow-go | 53.50% <ø> (ø) |
| quiver | ∅ <ø> (∅) |

@lalitb changed the title from "WIP: Fanout-processor" to "Fanout Processor" Jan 25, 2026
@lalitb (Member Author) commented Jan 25, 2026

Fanout now runs stateless in one case (`await_ack: none` fire-and-forget) and with slim state in one case (parallel + primary-only with no fallback/timeout: just request_id -> original_pdata). All other configurations remain fully stateful to preserve ordering, failover, and upstream routing semantics. I've updated the PR description and README.md accordingly. Ready for review with that explicit split.

@lalitb lalitb marked this pull request as ready for review January 25, 2026 08:24
@lquerel (Contributor) commented Jan 26, 2026

Regarding the stateless vs stateful debate, I'm wondering if it's actually more a question of stack vs stackless, since we're based on message passing. I'm not a Go Collector specialist, but from what I understand the state is kept on the stack across the different sequential calls. So in the end, the presence and management of state don't seem that different to me.

In our case, having support for both a parallel mode and a sequential mode should allow users to choose between speed with limited control, or a precisely defined sequence that can be interrupted at any step. That would make this fanout processor something truly powerful and expressive.

@lalitb (Member Author) commented Jan 26, 2026

> Regarding the stateless vs stateful debate, I'm wondering if it's actually more a question of stack vs stackless, since we're based on message passing. I'm not a Go Collector specialist, but from what I understand the state is kept on the stack across the different sequential calls. So in the end, the presence and management of state don't seem that different to me.

You're right - state always exists somewhere. In Go fanout it's implicit on the call stack during the synchronous call, then gone when it returns; in our async pipeline we make it explicit via the message's context stack or a map when coordinating multiple outcomes. The key difference is blocking vs non-blocking - Go's blocking model gives implicit correlation, while our async model requires explicit correlation since acks arrive later as separate messages.

> In our case, having support for both a parallel mode and a sequential mode should allow users to choose between speed with limited control, or a precisely defined sequence that can be interrupted at any step. That would make this fanout processor something truly powerful and expressive.

Yes, parallel vs sequential gives users that flexibility - speed when order doesn't matter, precise control when it does. Currently, the Go Collector's fanout is sequential only.

@lquerel (Contributor) left a review comment:

I think this is a very good first PR on this new processor, which is definitely complex (and will likely become even more so in the future). I have a few comments that I think should improve a couple of things here and there.

Comment on lines 170 to 172
| `sent` | Requests dispatched (per incoming PData) |
| `acked` | Requests successfully acked upstream |
| `nacked` | Requests nacked upstream |
Contributor:

These metrics will be redundant with the existing pdata channel metrics.

@lalitb (Member Author) replied:

These metrics answer different questions:

  • Channel metrics: Per-destination sends/receives/errors (one count per destination)
  • Fanout metrics: Aggregated request outcomes after await_ack/fallback logic

Specifically:

  • `sent`/`acked`/`nacked` - request-level outcomes, not per-destination
  • `timed_out` - destination-level (how many destinations hit their timeout)
  • `rejected_max_inflight` - fanout-only; channels don't track this

For example, with `await_ack: primary` + fallback: if the primary fails but the fallback succeeds, fanout reports 1 ack, while channel metrics would show 1 nack + 1 ack across destinations.

Happy to remove these if we see metrics bloat - some could potentially be aggregated at channel level to get the equivalent.

- **Fallback cycles**: Detected and rejected at config validation
- **Fallback with `await_ack: none`**: Rejected; fire-and-forget ignores fallbacks
- **Timeout with `await_ack: none`**: Rejected; fire-and-forget doesn't track responses
- **Shutdown**: Inflight requests are dropped (not proactively nacked)
Contributor:

Not sure I understand this one.

@lalitb (Member Author) replied:

Agreed the original text was terse. Updated to:

> **Shutdown**: Inflight state is dropped; no nacks sent to upstream. Upstream will not receive notification for in-progress requests.

This clarifies that on shutdown, we don't proactively nack pending requests - upstream simply won't hear back. Downstream may still process them successfully; we just won't see the ack.

Alternative would be to nack all inflight requests on shutdown, so upstream knows they weren't confirmed. But:

  • Downstream may still process them successfully (we just won't see the ack)
  • Proactive nacking could cause duplicate processing if upstream retries
  • Shutdown is typically a terminal event anyway

If you think we should nack inflight requests on shutdown (fail-safe), I can add that in a follow-up PR and file a tracking issue for now.

@jmacd (Contributor) commented Jan 27, 2026

I think we can overcome the blockers and avoid memory allocation for the "slim primary" case, meaning really this case ought to work and we can modify OtapPdata and/or Context to help. Here's how I think this will work:

When the request arrives, use into_parts() to split into context/data; clone the data once for each output. For the primary request, use OtapPdata::new(original_context, data), and for the other requests (for now) we can use Context::new() creating an empty context. Send all the requests in parallel.

You don't need any calls to subscribe_to() in this arrangement: since the primary keeps the context it had, any Ack/Nack it receives will go straight through the fanout processor directly to the original recipient. If we have other useful information in the Context (later), we will want a way to erase the subscription information and keep everything else (I think).
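
For illustration, a sketch of this proposal under the signatures described in the comment (`into_parts()`, `OtapPdata::new(context, data)`, `Context::new()`, `send_message_to()`); the wrapper function and the `EffectHandler`, `PortName`, and `Error` types are placeholders, not verified against the engine's actual API.

```rust
// Sketch of the proposal above (illustration only); placeholder types throughout.
async fn fanout_slim_primary(
    effect_handler: &mut EffectHandler,
    pdata: OtapPdata,
    primary_port: PortName,
    other_ports: &[PortName],
) -> Result<(), Error> {
    // Split the incoming request into its routing context and payload.
    let (original_context, data) = pdata.into_parts();

    // Non-primary copies carry an empty context, so their acks/nacks are not
    // routed back upstream (and no subscribe_to() frame is pushed).
    for port in other_ports {
        let copy = OtapPdata::new(Context::new(), data.clone());
        effect_handler.send_message_to(port.clone(), copy).await?;
    }

    // The primary keeps the original context: its ack/nack passes straight
    // through the fanout processor to the original upstream recipient,
    // leaving no per-request state behind in the fanout.
    let primary = OtapPdata::new(original_context, data);
    effect_handler.send_message_to(primary_port, primary).await
}
```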

@jmacd (Contributor) left a review comment.

@lalitb (Member Author) commented Jan 28, 2026

> I think we can overcome the blockers and avoid memory allocation for the "slim primary" case, meaning really this case ought to work and we can modify OtapPdata and/or Context to help. Here's how I think this will work:
>
> When the request arrives, use into_parts() to split into context/data; clone the data once for each output. For the primary request, use OtapPdata::new(original_context, data), and for the other requests (for now) we can use Context::new() creating an empty context. Send all the requests in parallel.
>
> You don't need any calls to subscribe_to() in this arrangement: since the primary keeps the context it had, any Ack/Nack it receives will go straight through the fanout processor directly to the original recipient. If we have other useful information in the Context (later), we will want a way to erase the subscription information and keep everything else (I think).

Interesting idea - keeping the original context on primary so acks route directly upstream would eliminate the subscription overhead entirely.

Right now the slim path still needs the subscription so fanout can see the primary's ack/nack - it clears slim_inflight, bumps metrics, and enforces max_inflight. Letting acks bypass fanout would leave entries dangling and break backpressure. Non-primary acks would also be unrouted/dropped with empty contexts, which is fine only if the engine treats them as benign.

I'm open to a zero-alloc slim path, but it would require engine support for an "observer" mode where the fanout can see the primary outcome without adding a subscription frame, or a way to strip/restore subscription state in Context while still delivering the control message back to the fanout.

Given the tiny context clone cost today, I kept the existing slim path. Happy to revisit with a concrete engine change or a safe "passthrough observer" design.

@lalitb (Member Author) commented Jan 28, 2026

> I think this is a very good first PR on this new processor, which is definitely complex (and will likely become even more so in the future). I have a few comments that I think should improve a couple of things here and there.

Thanks for the thorough review! Addressed the comments in this PR and created tracking issues for follow-up improvements.
