Read this first!

Clojure issue tracking now lives at http://dev.clojure.org/jira, and the wiki is at http://dev.clojure.org. These Assembla pages are kept online for historical interest only.

This is an issues list for agent error handlers. It is not a promise of any feature. See also ticket #30.

Background

Since agents run their actions asynchronously on thread-pool threads, there is no direct path from a thrown exception back to the sender of the action. This is unlike all of the other reference types, where the initiator is on the stack. So any exception must be caught on the thread-pool thread and put somewhere; currently that place is the agent itself. Agent error handlers will provide a hook for that exception to go somewhere else. They might also provide other features.
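For illustration, the current behavior can be observed with the agent error API that later shipped in Clojure (from 1.2: agent-error, restart-agent, :error-handler, :error-mode). This is a sketch against that later API, not part of this proposal, and wait-for-error is a hypothetical polling helper. With no handler installed, the exception is caught on the pool thread and stored in the agent; deref still works and further sends are refused:

```clojure
;; Sketch against the agent error API available since Clojure 1.2.

(defn wait-for-error
  "Hypothetical helper: polls until agent a holds an error, or gives up
  after timeout-ms and returns nil."
  [a timeout-ms]
  (let [deadline (+ (System/currentTimeMillis) timeout-ms)]
    (loop []
      (or (agent-error a)
          (when (< (System/currentTimeMillis) deadline)
            (Thread/sleep 10)
            (recur))))))

(def a (agent 0))   ; no handler, so the error mode defaults to :fail

;; The action throws on a pool thread; this send itself returns normally.
(send a (fn [n] (/ n 0)))

;; The exception was caught on the pool thread and stored in the agent.
(def caught (wait-for-error a 2000))

;; Deref still works and shows the unchanged state.
(def state-after @a)

;; Further sends are refused until the agent is restarted.
(def send-result
  (try (send a inc) :accepted
       (catch Exception _ :rejected)))

(shutdown-agents)
```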

Synch or not?
- i.e. does the handler run in the agent's thread-pool thread?
  - if so, an arbitrary function
  - if not, some queue mechanism
Basic facilities
- convey the exception and agent
- convey the action and args?
  - if actions are function objects, they are quite opaque
    - but may be useful, if only for "re-running" the action with a changed agent or world state
  - could be e.g. vars, which are callable and named
    - but agents rarely know the var of their actions
    - that's an architectural choice - if you were using handlers you might choose to send vars
- Control over:
  - pending sends
  - watcher notification
  - pending actions
  - future agent interaction (e.g. 'kill' the agent)
- if synch
  - what operations are allowed in the arbitrary function?
    - esp. sends to the same or other agents
  - opens the door to possibly 'fixing' the problem and returning a new state
- if asynch
  - need to be able to create an error-queue
    - and attach handler to that?
  - or some blocking consumer?
  - we don't have a queue abstraction yet
- since the handler is known to be asynch, a fixed policy on pending sends and watcher notifications?
  - can't alter state
  - can do arbitrary sends
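One way to get asynch handling on top of a synch hook is for the handler to do nothing but enqueue the error, with a blocking consumer on its own thread. The sketch below uses java.util.concurrent.LinkedBlockingQueue as a stand-in for the missing queue abstraction, together with the :error-handler and :error-mode options that later shipped in Clojure; error-queue and enqueue-error are hypothetical names:

```clojure
(import '(java.util.concurrent LinkedBlockingQueue TimeUnit))

;; Hypothetical error queue standing in for a queue abstraction.
(def error-queue (LinkedBlockingQueue.))

(defn enqueue-error
  "Synch handler that only enqueues; all real processing happens elsewhere."
  [ag ex]
  (.put error-queue {:agent ag :error ex}))

;; :continue keeps the agent healthy after each error.
(def a (agent 0 :error-handler enqueue-error :error-mode :continue))

;; Blocking consumer on its own thread: collects error messages until it
;; has seen two, or the queue stays empty for 2 seconds.
(def consumed
  (future
    (loop [msgs []]
      (if-let [item (.poll error-queue 2 TimeUnit/SECONDS)]
        (let [msgs (conj msgs (.getMessage (:error item)))]
          (if (= 2 (count msgs)) msgs (recur msgs)))
        msgs))))

(send a (fn [_] (throw (IllegalStateException. "first"))))
(send a inc)
(send a (fn [_] (throw (IllegalStateException. "second"))))
(await a)

(def messages @consumed)   ; ["first" "second"]
(def final-state @a)       ; 1: the failed actions left the state alone
(shutdown-agents)
```

Because the handler only enqueues, moving to a real queue model later would not change what the handler is allowed to do.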

The original idea behind a synch handler function was that it could put the error on a queue if desired. The only complexities with an arbitrary synch handler function are:

In what context does it run? Can it send to the same or other agents?
Can it communicate policy back to the agent?
- e.g. how to proceed with respect to pending messages, pending actions, future agent interaction and agent state
  - if it can do any of these, then it can't be moved to a queue model later
- if it can't do any of these, then we'll need to be able to set up a fixed policy on the agent itself
  - :on-error
    - :die
    - :continue
    - :clear-and-continue
    - :reset
    - ???
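For comparison, in the API Clojure later shipped, this per-agent setting is called :error-mode rather than :on-error and accepts :continue or :fail. It is fixed on the agent, can be changed with set-error-mode!, and the handler's return value plays no part in it. A minimal sketch under that later API (not this proposal's names):

```clojure
;; Sketch against the error-mode API available since Clojure 1.2.

(def plain   (agent 0))                                    ; no handler
(def handled (agent 0 :error-handler (fn [_ _] :ignored))) ; return value is ignored

;; Defaults: :fail without a handler, :continue with one.
(def default-modes [(error-mode plain) (error-mode handled)])

;; The policy lives on the agent and can be changed after creation.
(set-error-mode! plain :continue)

(send plain (fn [_] (throw (IllegalStateException. "dropped"))))
(send plain inc)
(await plain)
(def plain-state @plain)              ; 1: the queued inc still ran
(def plain-error (agent-error plain)) ; nil: agent stays healthy under :continue

;; The handler's :ignored return value does not become the agent state.
(send handled (fn [_] (throw (IllegalStateException. "x"))))
(await handled)
(def handled-state @handled)          ; 0

(shutdown-agents)
```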

The Plan:

Top level error policy controlled per-agent by :on-error setting
- :continue -- perfect for simple logging, keeps agent always "healthy"
  - any pending sends from the action are thrown away, watchers will not be notified
  - synch handler function
    - passed agent and exception, can deref to get agent state
      - could be passed agent state and use *agent* if needed?
      - no, because that won't work if we switch to asynch/queues
    - may send to same or other agents
    - errors from handler are ignored?
      - could fall back to :fail policy, but it seems weird to connect policies like this
      - yes, ignore
    - return value of handler is ignored (no "fixing" of agent state or other policy in return value)
  - queued actions proceed after handler is done
  - derefs of agent and sends never fail (agent is always in a "healthy" state)
  - :continue is the default when a handler is provided
- :fail -- allows more complex control, with the possibility of fixing errors without losing queued actions
  - pending sends from the action are thrown away, watchers will not be notified
  - queued actions however are held, waiting for a possible restart/clear of the agent
  - synch handler function, called while the agent is not yet in :fail state
    - passed agent and exception, can deref to get agent state
      - could be passed agent state and use *agent* if needed?
      - no, as above, would dictate synch
    - may send to same or other agents, but sends to self will also wait for restart/clear
    - errors from handler are ignored?
      - could replace the action's exception with the handler's exception, but probably should match :continue policy
      - yes, match by ignoring
    - return value is ignored -- can't "fix" agent state here, nor return policy, see restart below.
  - after handler returns, agent is atomically moved to :fail state
    - sends to the agent now re-throw the exception
    - deref of agent is allowed
      - necessary for (restart a @a) to work
    - (agent-error a) returns the exception (nil if not :fail state)
    - (restart a new-state :clear-actions bool) un-fails the agent
      - throws an exception if agent is not :fail state
      - clears queued actions if :clear-actions is true (defaults to false)
      - sets state to new-state
        - validation is done: failure throws immediately from 'restart', agent remains unchanged (:fail state with old value)
        - watchers will not be notified
      - moves agent out of :fail, allowing any queued actions to continue. new sends now ok
  - Deprecate agent-errors and clear-agent-errors. For temporary backward-compatibility:
    - agent-errors now does (list (agent-error a)) or nil if not :fail state
    - clear-agent-errors now does (restart a @a)
      - Note the semantics aren't identical but very similar
  - :fail is the default policy when no handler is provided
Errors from watchers always ignored
Validator fail is handled the same as action exception
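The :fail workflow above corresponds closely to what later shipped in Clojure, where restart is spelled restart-agent and takes the same :clear-actions option. The sketch below is against that later API; wait-for-error, gate and gate2 are hypothetical helpers, with the promises used to keep a second action queued behind the failing one. It shows a validator failure putting the agent into the failed state, the queued action being held, sends being refused, and restart both with and without :clear-actions:

```clojure
;; Sketch against restart-agent / agent-error, available since Clojure 1.2.

(defn wait-for-error
  "Hypothetical helper: polls until agent a holds an error, or gives up
  after timeout-ms and returns nil."
  [a timeout-ms]
  (let [deadline (+ (System/currentTimeMillis) timeout-ms)]
    (loop []
      (or (agent-error a)
          (when (< (System/currentTimeMillis) deadline)
            (Thread/sleep 10)
            (recur))))))

;; No handler, so :fail. The validator rejects negative states.
(def a (agent 10 :validator (fn [n] (not (neg? n)))))

;; gate holds the failing action until a second action is queued behind it.
(def gate (promise))
(send a (fn [n] @gate (- n 20)))   ; -10 fails validation, like a thrown exception
(send a + 5)                        ; queued; held while the agent is failed
(deliver gate :go)

(def err (wait-for-error a 2000))   ; IllegalStateException from the validator
(def held-state @a)                 ; 10: deref still allowed
(def send-refused?
  (try (send a inc) false (catch Exception _ true)))

;; Restart without clearing: the held (+ 5) runs against the new state.
(restart-agent a 100)
(await a)
(def after-restart @a)              ; 105

;; Restarting an agent that is not failed throws.
(def restart-healthy
  (try (restart-agent a 0) :restarted (catch Exception _ :refused)))

;; Fail again, this time discarding the held action on restart.
(def gate2 (promise))
(send a (fn [n] @gate2 (- n 1000)))
(send a + 5)
(deliver gate2 :go)
(wait-for-error a 2000)
(restart-agent a 0 :clear-actions true)
(await a)
(def after-clear @a)                ; 0: the held (+ 5) was dropped

(shutdown-agents)
```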