CK Flows

SaaS — Lead Routing Hardening

Refactor of a monolithic record-triggered Flow into modular subflows with defensive fault paths and event-driven retries. Result: −92% Flow errors and +18% speed-to-lead.

SaaS · Record-Triggered, Subflow · 2025-09-24 · 2 min read

TL;DR

  • Split monolithic record-triggered Flow into intent-specific subflows (enrichment, scoring, assignment).
  • Added defensive fault paths + platform-event retry pattern; centralized logging for observability.
  • Bulk-safe updates + strict entry criteria eliminated recursion and reduced governor risk.

Context

Mid-market B2B SaaS with spiky inbound volume across regions. Existing routing relied on one large record-triggered Flow and ad-hoc decision logic maintained by multiple admins.

Problem

Under peak load the Flow faulted and left records partially assigned, causing first-response SLA misses, duplicate owner handoffs, and inconsistent task creation.

Intervention

Architecture — Decomposed routing into subflows: ① enrichment, ② score & segment, ③ owner assignment, ④ post-assign tasks/SLAs.

Reliability — Added fault paths that raise a platform event with context; retry subflow consumes events and replays safe operations.
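The mechanics, sketched in Python rather than Flow/platform-event syntax (the event shape, field names, and retry cap below are illustrative, not the client's actual schema): the fault path publishes a small retry event carrying the record ID, the failed step, and an attempt count; a consumer replays the step, or parks the record once the cap is hit.

    # Illustrative sketch of the fault-path -> retry-event pattern (plain Python,
    # not Flow or Apex; RoutingRetryEvent and its fields are hypothetical).
    from dataclasses import dataclass

    MAX_ATTEMPTS = 3  # illustrative retry cap

    @dataclass
    class RoutingRetryEvent:
        record_id: str      # Lead being routed
        failed_step: str    # e.g. "enrichment", "assignment"
        error_message: str  # fault context, also written to the error log
        attempt: int        # retries so far for this step

    def on_fault(record_id, step, error, attempt, publish):
        """Fault path: raise a retry event instead of leaving the record half-routed."""
        publish(RoutingRetryEvent(record_id, step, str(error), attempt + 1))

    def on_retry_event(event, replay_step, dead_letter):
        """Retry consumer: replay the failed step until the cap, then escalate."""
        if event.attempt > MAX_ATTEMPTS:
            dead_letter(event)  # surfaces on the ops dashboard for manual routing
        else:
            replay_step(event.failed_step, event.record_id)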

Bulk safety — Consolidated DML to a single commit subflow using collections; guarded entry/exit criteria to avoid recursion.
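In Flow terms that means appending changed records to a collection inside the loop and issuing a single Update Records element at the end. A rough Python equivalent of the commit pattern (field and function names are placeholders):

    # "Collect, then commit once": one bulk write per transaction instead of a
    # write inside the loop. OwnerId and assign_owner are placeholders.
    def route_leads(leads, assign_owner):
        to_commit = []                           # collection variable, in Flow terms
        for lead in leads:                       # triggered batch (up to 200 records)
            owner = assign_owner(lead)
            if owner and lead.get("OwnerId") != owner:
                to_commit.append({**lead, "OwnerId": owner})
        return to_commit                         # handed to one bulk update at the end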

Observability — Error log custom object tracks Flow, element, exception, record id, and attempt count; daily dashboard and alerts.
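A minimal sketch of the log record shape in Python (the fields mirror the description above but are not the client's actual custom-object API names):

    # One error-log entry: flow, element, exception, record ID, attempt count.
    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    @dataclass
    class FlowErrorLog:
        flow_name: str     # e.g. "Lead_Routing_Assignment"
        element: str       # Flow element that faulted
        exception: str     # fault message captured on the fault path
        record_id: str     # Lead (or related) record ID
        attempt: int       # retry attempt count at the time of failure
        logged_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))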

Change safety — Added unit test template for subflow decisions (sample inputs/expected outputs) to catch regression in sandboxes.
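The template is table-driven: sample inputs in, expected owner/queue out. Sketched in Python with made-up regions, scores, and queue names:

    # Decision-subflow test template: each row is a sample input and the queue
    # the decision should produce. Segments and queue names are invented here.
    CASES = [
        # (region, score, segment)        expected queue
        (("EMEA", 85, "enterprise"),      "Q_EMEA_Enterprise"),
        (("NA",   40, "smb"),             "Q_NA_SMB"),
        (("APAC", 95, "enterprise"),      "Q_APAC_Enterprise"),
    ]

    def test_assignment_decision(decide):
        """Run every sample through the decision logic and flag regressions."""
        for inputs, expected in CASES:
            actual = decide(*inputs)
            assert actual == expected, f"{inputs}: expected {expected}, got {actual}"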

Outcomes

Window: 90 days pre vs 90 days post go-live
Industry: SaaS
Clouds: Sales Cloud
Flow Types: Record-Triggered, Subflow

  • −92% Flow error rate
  • +18% Speed-to-lead (median)
  • −88% Assignment faults per 1k leads (measured from error log records / 1,000 leads)

Errors were counted via the platform event + error-log object; speed-to-lead was measured as Lead.CreatedDate → first owner activity. Data excludes weekends/holidays per the client's reporting convention.

Timeline

1 week design + 1 week build + 1 week bake-in with monitoring.

Stack

Sales Cloud, Platform Events, Custom Error Log object (reports/dashboards).

Artifacts

  • Before/after routing Flow diagram
  • Retry pattern sequence diagram (platform events)
  • Error trend chart (90d window)
  • Decision table (segment → owner/subqueue)

FAQ

How did you ensure bulk safety and avoid recursion?

All writes were consolidated into a commit subflow operating on collections. Entry conditions prevent re-entry on the same context, and updates are batched.

What happens when an external dependency fails (e.g., enrichment)?

A fault path raises a platform event with the record id and failure context. The retry consumer evaluates idempotency and replays only safe steps.
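The idempotency check is the important part; roughly (a Python sketch: "already applied" here just means the step's effect is already visible on the record, and the step names are illustrative):

    # Idempotency gate in the retry consumer: replay a step only if it is marked
    # replay-safe and its effect is not already present.
    SAFE_TO_REPLAY = {"enrichment", "scoring", "assignment"}

    def replay_if_safe(step, lead, run_step):
        if step not in SAFE_TO_REPLAY:
            return "skipped: step has external side effects"
        if step == "assignment" and lead.get("OwnerId"):
            return "skipped: already assigned"   # effect already applied
        run_step(step, lead)
        return "replayed"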

How are failures monitored day-to-day?

Error log records are summarized on a dashboard by flow, element, and hour. Thresholds trigger alerts to an ops channel and a weekly email digest.
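The aggregation behind those thresholds is a simple group-and-count; a Python sketch (the 25-per-hour threshold is an arbitrary illustration, not the client's setting):

    # Count error-log rows by (flow, element, hour) and flag buckets that
    # cross the alert threshold.
    from collections import Counter

    ALERT_THRESHOLD = 25  # illustrative only

    def error_buckets(logs):
        return Counter(
            (log["flow_name"], log["element"], log["logged_at"].strftime("%Y-%m-%d %H:00"))
            for log in logs
        )

    def breaches(logs):
        """Buckets that should page the ops channel."""
        return [(key, n) for key, n in error_buckets(logs).items() if n >= ALERT_THRESHOLD]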

What changed for admins?

Admins update a small decision subflow or a decision table instead of the monolith; unit test templates catch regressions in sandboxes before deploy.
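Conceptually, that decision table is a plain lookup the admins own. A Python sketch with hypothetical segments and queue names:

    # Segment -> owner/subqueue decision table. Admins edit this mapping (or its
    # Flow/custom-metadata equivalent) rather than the routing monolith.
    DECISION_TABLE = {
        ("EMEA", "enterprise"): "Q_EMEA_Enterprise",
        ("EMEA", "smb"):        "Q_EMEA_SMB",
        ("NA",   "enterprise"): "Q_NA_Enterprise",
        ("NA",   "smb"):        "Q_NA_SMB",
    }
    FALLBACK_QUEUE = "Q_Routing_Review"  # unmatched leads go to a human-reviewed queue

    def assign_queue(region, segment):
        return DECISION_TABLE.get((region, segment), FALLBACK_QUEUE)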