Skip to content

26. An Event-Driven Pipeline

The previous chapter modeled a request/response SaaS backend with sync calls as the dominant style. This chapter takes a system shaped the other way: an event-driven data pipeline where almost every interaction is async, where modules don’t call each other directly, and where the architectural questions are different.

Specifically: an analytics pipeline. Raw events come in from product surfaces, get enriched with user attributes, get filtered and aggregated, land in a warehouse and a real-time dashboard. The shape that matters: most modules are subscribers reading from sources, not callers requesting from providers.

What we’re modeling

ProductApp ──► RawEvents
StoreFront ──► RawEvents
└─► Enricher.Enrich (subscribes)
└─► EnrichedEvents
├─► Aggregator.Roll (subscribes)
│ │
│ └─► Metrics.Push
└─► Warehouse.Land (subscribes)
└─► Dashboard.Refresh (subscribes)

Six modules: two emitters (ProductApp, StoreFront), one transformer (Enricher), one aggregator, one warehouse, one dashboard. Two distinct event interfaces (RawEvents, EnrichedEvents). Every cross-module link is async.

Package layout

analytics/
├── package.archspace
├── sources.arch # ProductApp, StoreFront (event emitters)
├── pipeline.arch # Enricher, Aggregator (transformers)
├── sinks.arch # Warehouse, Dashboard
└── views.arch

Manifest:

name: acme.analytics
version: "0.1.0"
use * from arch.modules
use * from arch.kinds

The modules

sources.arch:

frontend ProductApp {
team: Product
labels {
domain: Analytics
security.zone: Internal
}
"Customer-facing product. Emits clickstream and lifecycle events."
event RawEvents {
"page_view, click, purchase, signup, churn"
}
}
frontend StoreFront {
team: Frontend
labels {
domain: Analytics
security.zone: Internal
}
"E-commerce frontend. Emits its own clickstream into the same pipeline."
event RawEvents
}

Two emitters, both publishing RawEvents. The shape of the events differs between the two, but the interface kind (event) and the topic (RawEvents on each module) is what the wiring resolves against — subscribers see each as a separate <Module>.RawEvents.

pipeline.arch:

service Enricher {
team: Data
labels {
domain: Analytics
security.zone: Internal
}
"Joins raw events with user profile data. Emits enriched events downstream."
command Enrich {
"Subscribed to every RawEvents source."
subscribes: ProductApp.RawEvents
}
event EnrichedEvents {
"Raw event + resolved user_id, session_id, plan_tier."
}
}
service Aggregator {
team: Data
labels {
domain: Analytics
security.zone: Internal
}
"Rolls events into minute / hour / day buckets. Pushes to Metrics."
command Roll {
"Subscribed to the enriched stream."
subscribes: Enricher.EnrichedEvents
}
}
service Metrics {
team: Data
labels {
domain: Analytics
security.zone: Internal
}
"Tier-1 metrics surface — counters and rollups."
command Push { "Receive rolled-up metric increments from Aggregator." }
}

Note Enricher.Enrich subscribes to one source’s RawEvents. The pipeline has two sources (ProductApp.RawEvents and StoreFront.RawEvents) — both need wiring. Two options:

Option A: one subscribe-handler per source.

service Enricher {
command EnrichProduct { subscribes: ProductApp.RawEvents }
command EnrichStore { subscribes: StoreFront.RawEvents }
event EnrichedEvents
}

Option B: one canonical event topic, sources publish to a shared one. Requires a project-local interface kind that captures “this event participates in the analytics topic.” Subtler — covered in Chapter 29 when we talk about metamodel design.

For this chapter, Option A — explicit per-source handlers. It’s clearer and the validator catches everything.

sinks.arch:

database Warehouse {
team: Data
labels {
domain: Analytics
data.classification: pii
}
"Long-term analytical store. Append-only, partitioned by event date."
command Land {
"Append enriched events to the canonical table."
subscribes: Enricher.EnrichedEvents
}
event LandedEvents {
"Emitted after a successful Land. Downstream consumers (Dashboard) subscribe here."
}
}
frontend Dashboard {
team: Data
labels {
domain: Analytics
security.zone: Internal
}
"Real-time analytics dashboard. Subscribes to warehouse landings for live refresh."
command Refresh {
"Refresh active visualizations when new data lands."
subscribes: Warehouse.LandedEvents
}
}

Warehouse is a database kind (with required labels.data.classification from the stdlib). Land is the inbound command subscribed to enriched events; LandedEvents is the outbound event the downstream Dashboard listens to. The two-interface pattern — one to receive, one to broadcast — is the right shape whenever a node both consumes and emits.

Mistake to avoid. Don’t introduce a fake “EventBus” or “Broker” module just to anchor subscriptions. The wiring already lives on the handler. Adding a broker module bloats the diagram with a node that doesn’t represent an architectural element — it represents implementation. If your real architecture has a managed broker (Kafka, SNS), model that as an external_system only if your services explicitly call its admin API. The data plane through it is invisible to the model; that’s correct.

Processes for the rare sync flow

Even an event-driven pipeline has sync moments — usually administration:

process #flow01 BackfillReplay {
DataEngineer > Warehouse.Land : "manual replay of archived events"
}
process #flow02 DashboardLoad {
User > Dashboard.Refresh : "initial page load"
}

Two processes for the two sync touchpoints. The dataflow itself isn’t captured here — it’s captured by the subscribes: wiring, which the renderer joins into the diagram automatically.

Why not write a giant process covering the whole pipeline? Processes encode causality with an explicit step order. The pipeline’s causality is already encoded in the subscribe wiring — Aggregator.Roll happens because Enricher.EnrichedEvents fired, not because some process said “Aggregator goes after Enricher.” Forcing it into a process duplicates information and constrains the diagram unnecessarily.

Views

view PipelineDataflow {
focus domain: Analytics
layout dagre
"Full pipeline — sources, transformers, sinks. Async edges show subscribe wiring."
}
view PIIScope {
focus data.classification: pii
group by team
layout elk
"Every module touching PII data. Used for data-classification audits."
}

PIIScope works because of the data.classification: pii label on Warehouse. Cascade pulls it through to anything nested inside; the focus filter does the rest.

What an event-driven model looks like

After validation:

  • Six modules, two event interfaces, four async handlers wired via subscribes:.
  • Diagram with directed edges that follow the data flow — sources at the top, sinks at the bottom, dashed edges for async.
  • Two views: dataflow for engineers, PII scope for governance.

The dependency graph is implicit in the subscribe wiring. Where the SaaS backend (Chapter 25) had every dependency captured in a process step, this pipeline has most dependencies in subscriber declarations. Both are valid — pick the surface that matches how the system actually works.

Decisions you’ll face

One handler per source vs. one shared topic. Per-source is more explicit and validator-friendly; a shared topic is more idiomatic for actual Kafka-style brokers. If you have many sources publishing the same shape of event, define a project-local interface kind (Chapter 29) so the shape is documented in one place.

Where does the broker live? As above: usually nowhere in the canonical model. Brokers are implementation. Exception: when your services call the broker’s admin API (creating topics, listing subscribers programmatically), the broker becomes an external_system with those administrative commands as interfaces.

Modeling fan-out. Don’t enumerate every subscriber on the source — let each subscriber declare itself. The source publishes; the subscribers wire. This keeps the source decoupled from how it’s consumed.

Backpressure and retries. Not in the model. Those are operational concerns. The model captures who consumes what; how reliably it does so is a different artifact (an SLO doc, an incident-response runbook).

Labels that flow. domain: Analytics is on every module here; you could cascade it from a system Analytics { ... } container that holds all six modules instead. Whether to introduce that container is taste — it reads cleanly in the diagram as one cluster but adds one level of nesting in source.

Summary

  • Event-driven systems model async flow via subscribes: on handlers, not via process steps.
  • Per-source handlers (Option A) are clearer; shared topics require a project-local interface kind.
  • Don’t model the broker — its data plane is invisible to architecture.
  • Use processes for the remaining sync moments (replays, dashboard loads, admin operations).
  • Views for dataflow (layout dagre) and governance scopes (PII, security zones) work directly off cascading labels.

What’s next

Chapter 27: An External Integration → — your system plus a third-party API plus their webhooks, modeling the boundary precisely.