Supervision

In most languages, supervision requires a framework or a special runtime construct. In Saga, supervision is a handler pattern: catch a failure, log it, and restart the computation. This falls directly out of the effect system with no special language support.

The Basic Pattern

A supervisor is a handler for Fail that restarts the computation instead of propagating the error:

fun supervise : (Unit -> Unit needs {Fail, ..e}) -> Unit needs {..e}
supervise f = f () with {
  fail reason = {
    dbg $"crashed: {reason}"
    supervise f
  }
}

When the computation calls fail!, the handler catches it, logs the reason, and calls supervise f again. The computation restarts from scratch.

Because the supervisor doesn't call resume, the failed computation is abandoned. A fresh invocation starts clean.

fun unreliable_worker : Unit -> Unit needs {Fail}
unreliable_worker () = {
  dbg "working..."
  fail! "something broke"
}

main () = supervise unreliable_worker
# working...
# crashed: something broke
# working...
# crashed: something broke
# (loops forever)

Retry Limits

An infinite restart loop is rarely what you want. Add a retry counter:

fun supervise_n : Int -> (Unit -> a needs {Fail, ..e}) -> Result a String needs {..e}
supervise_n 0 _ = Err "retries exhausted"
supervise_n n f = {
  let result = f () with {
    fail reason = {
      dbg $"attempt failed: {reason}"
      supervise_n (n - 1) f
    }
    return value = Ok value
  }
  result
}

After n failures, the supervisor gives up and returns Err.

Using `Std.Supervisor`

The standard library provides supervised, which wraps this pattern:

import Std.Supervisor (supervised)

fun unreliable : Unit -> Int needs {Fail String}
unreliable () = fail! "something went wrong"

main () = {
  let result = supervised 3 (fun () -> unreliable ())
  dbg $"result: {debug result}"
}
# result: Err("something went wrong")

supervised catches failures and retries up to the given number of times. It returns Ok value on success or Err reason with the last failure if all retries are exhausted.

Backoff

Since supervision is just a function, adding backoff is straightforward. Use the Timer effect to add a delay between retries:

import Std.Actor (Timer)

fun supervise_backoff : Int -> Int -> (Unit -> a needs {Fail, ..e})
  -> Result a String needs {Timer, ..e}
supervise_backoff 0 _ _ = Err "retries exhausted"
supervise_backoff n delay f = {
  let result = f () with {
    fail reason = {
      dbg $"failed: {reason}, retrying in {show delay}ms"
      sleep! delay
      supervise_backoff (n - 1) (delay * 2) f
    }
    return value = Ok value
  }
  result
}

# Usage: exponential backoff starting at 100ms, up to 5 retries
supervise_backoff 5 100 my_worker

The delay doubles on each retry: 100ms, 200ms, 400ms, 800ms, 1600ms. Because the Timer effect is just another effect in the needs clause, no special integration is required.

Let It Crash

The BEAM's "let it crash" philosophy says: don't try to handle every possible error defensively. Instead, let processes fail and have a supervisor restart them. This works because BEAM processes are isolated. One crashing process cannot corrupt another's state.

In Saga, this philosophy maps directly to effects:

Write your logic assuming everything works. Use fail! when something goes wrong.
Wrap it in a supervisor handler at the boundary.
The supervisor decides the restart policy. The business logic doesn't know or care.

fun server_loop : Unit -> Unit needs {Fail, Actor Request, Process}
server_loop () = {
  let req = receive { r -> r }
  let response = process_request req
  send! req.reply_to response
  server_loop ()
}

main () = {
  supervise (fun () -> server_loop ())
} with beam_actor

If process_request fails, the supervisor restarts the loop. The requesting process gets no response (it should have its own timeout), but the server keeps running.

Supervision and Resources

When a supervised computation acquires resources, combine supervision with scoped cleanup to ensure resources are released on restart:

fun worker : Unit -> Unit needs {Fail, Scope}
worker () = {
  let db = acquire_scoped! (fun () -> connect "postgres") disconnect
  do_work db
}

main () = {
  supervise (fun () -> {
    worker () with run_scoped
  })
}