Septum FSM Troubleshooting¶
This guide covers common issues, error messages, and debugging techniques for Septum finite state machines.
Common Errors¶
BlockedInUntimedState¶
Error message:
Cause: The FSM is in a state without a timeout configured, and no message is available to process.
Solutions:
- Configure a timeout:
- Send a message:
- Use can_dwell=True for intentionally blocking states:
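The condition behind this error can be sketched with the standard library alone. The snippet below is an illustration, not Septum's implementation: `BlockedError`, `next_message`, and the `timeout` and `can_dwell` parameters are hypothetical stand-ins that mirror the options named above.

```python
import asyncio
from typing import Optional


class BlockedError(RuntimeError):
    """Hypothetical stand-in for the BlockedInUntimedState error."""


async def next_message(
    queue: asyncio.Queue,
    timeout: Optional[float] = None,
    can_dwell: bool = False,
):
    """Return the next message, or None if the timeout expires first."""
    if timeout is not None:
        try:
            # A timeout bounds the wait, so the state cannot block forever.
            return await asyncio.wait_for(queue.get(), timeout)
        except asyncio.TimeoutError:
            return None
    if queue.empty() and not can_dwell:
        # No timeout, no pending message, and dwelling is not allowed:
        # this is the situation the error reports.
        raise BlockedError("no timeout configured and no message available")
    # Either a message is ready or the state may wait indefinitely.
    return await queue.get()
```

Each of the three solutions removes one of the conditions that lead to the `raise`: a timeout bounds the wait, a queued message makes the queue non-empty, and `can_dwell=True` declares the indefinite wait intentional.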
PopFromEmptyStack¶
Error message:
Cause: A Pop was attempted when no states had been pushed onto the stack.
Solutions:
- Ensure Push before Pop:
@septum.state
class MainMenu:
    @septum.transitions
    def transitions():
        return [
            # Must push before popping later
            LabeledTransition(Events.SETTINGS, Push(SettingsMenu, MainMenu)),
        ]
@septum.state
class SettingsMenu:
    @septum.transitions
    def transitions():
        return [
            # Now pop is safe
            LabeledTransition(Events.BACK, Pop),
        ]
- Track stack depth in state logic:
@septum.state
class SafeState:
    @septum.on_state
    async def on_state(ctx):
        stack_depth = ctx.common.get("stack_depth", 0)
        if stack_depth > 0:
            return Events.POP_BACK
        else:
            return Events.GO_HOME
    @septum.on_enter
    async def on_enter(ctx):
        # Make sure the depth counter exists (defaults to 0)
        ctx.common["stack_depth"] = ctx.common.get("stack_depth", 0)
    @septum.transitions
    def transitions():
        return [
            LabeledTransition(Events.POP_BACK, Pop),
            LabeledTransition(Events.GO_HOME, HomeState),
        ]
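The invariant both solutions protect is that a pop only happens when something was pushed. A minimal standalone sketch of that guard follows; `StateStack` and `pop_or` are hypothetical names for illustration and are not part of Septum.

```python
class StateStack:
    """Minimal pushdown stack that never pops when empty."""

    def __init__(self):
        self._frames = []

    def push(self, return_state: str) -> None:
        # Remember where to return once the pushed state finishes.
        self._frames.append(return_state)

    def can_pop(self) -> bool:
        return bool(self._frames)

    def pop_or(self, fallback: str) -> str:
        """Pop the return state, or use the fallback if nothing was pushed."""
        return self._frames.pop() if self._frames else fallback
```

Choosing a fallback target when the stack is empty plays the same role as the GO_HOME branch in SafeState above.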
Timeout Errors¶
Error message:
Cause: State exceeded its configured timeout duration.
Solutions:
- Increase timeout:
- Handle timeout in on_timeout:
@septum.state(config=StateConfiguration(timeout=30.0))
class ExternalAPIState:
    @septum.on_timeout
    async def on_timeout(ctx):
        logger.warning("API call timed out, using fallback")
        return Events.USE_FALLBACK
    @septum.transitions
    def transitions():
        return [
            LabeledTransition(Events.USE_FALLBACK, FallbackState),
        ]
- Make the state operation faster:
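One way to keep a state inside its time budget is to bound the slow call itself and fall back early. The sketch below uses only `asyncio.wait_for` from the standard library; `call_with_fallback` and `slow_api_call` are hypothetical helpers, not Septum functions.

```python
import asyncio


async def call_with_fallback(operation, timeout: float, fallback):
    """Await operation(); return fallback if it exceeds the time budget."""
    try:
        return await asyncio.wait_for(operation(), timeout)
    except asyncio.TimeoutError:
        return fallback


async def slow_api_call():
    # Hypothetical slow dependency, simulated with a sleep.
    await asyncio.sleep(1.0)
    return "live"
```

Calling `call_with_fallback(slow_api_call, 0.01, "cached")` inside a handler yields the fallback value once the inner budget is spent, provided that budget is shorter than the state's own timeout.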
Retry Exhaustion¶
Error message:
Cause: State with retry configuration has exhausted all retry attempts.
Solutions:
- Handle retry exhaustion:
@septum.state(config=StateConfiguration(retries=3))
class RetryState:
    @septum.on_fail
    async def on_fail(ctx):
        logger.error(f"Operation failed after {ctx.retry_count} retries")
        # Transition to error state
        return Events.FAILED
    @septum.transitions
    def transitions():
        return [
            LabeledTransition(Events.FAILED, ErrorState),
        ]
- Increase retry count:
- Fix underlying issue:
@septum.state(config=StateConfiguration(retries=3))
class DatabaseState:
    @septum.on_state
    async def on_state(ctx):
        try:
            # Fix the actual issue causing retries
            result = await db_connection.execute(query)
            return Events.SUCCESS
        except ConnectionError:
            # Retry only for transient errors
            return Events.RETRY
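The "retry only for transient errors" idea can be sketched independently of Septum's retry configuration. In this illustration `retry_transient` is a hypothetical helper, and treating `ConnectionError` as the only transient error is an assumption carried over from the example above.

```python
import asyncio


async def retry_transient(operation, retries: int = 3, base_delay: float = 0.1):
    """Retry operation() on ConnectionError with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return await operation()
        except ConnectionError:
            if attempt == retries:
                # Retries exhausted: surface the last error to the caller.
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            await asyncio.sleep(base_delay * 2 ** attempt)
```

Any other exception type propagates on the first attempt, so a programming error is never hidden behind repeated retries.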
Validation Errors¶
Error message:
Cause: FSM construction detected structural issues (unreachable states, invalid transitions, etc.).
Solutions:
- Check state references:
- Ensure all states are reachable:
- Handle all event enum values:
# BAD: Missing transition for Events.ERROR
@septum.state
class MyState:
    class Events(Enum):
        SUCCESS = auto()
        ERROR = auto()  # Not handled!
    @septum.transitions
    def transitions():
        return [
            LabeledTransition(Events.SUCCESS, NextState),
            # ERROR not handled!
        ]
# GOOD: All events handled
@septum.transitions
def transitions():
    return [
        LabeledTransition(Events.SUCCESS, NextState),
        LabeledTransition(Events.ERROR, ErrorState),
    ]
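A quick way to spot a missing transition before constructing the FSM is to compare the enum's members against the events that have handlers. This is a plain-Python sketch; `unhandled_events` is a hypothetical helper and does not reproduce Septum's own validation.

```python
from enum import Enum, auto


class Events(Enum):
    SUCCESS = auto()
    ERROR = auto()


def unhandled_events(events, handled):
    """Return the enum members that have no transition, in definition order."""
    handled = set(handled)
    return [event for event in events if event not in handled]
```

For the BAD example above, `unhandled_events(Events, [Events.SUCCESS])` returns `[Events.ERROR]`; an empty list means every event has a transition.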
Debugging Techniques¶
Enable Debug Logging¶
import logging
# Enable detailed FSM logging
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s [%(levelname)s] %(name)s: %(message)s'
)
logger = logging.getLogger("mycorrhizal.septum")
logger.setLevel(logging.DEBUG)
Debug log output:
[FSM] Transitioned to MyState
[FSM] [DEBUG] common object id: 140234567890
[FSM] [DEBUG] on_enter is None? False
[FSM] [DEBUG] After on_enter, common: {'counter': 1}
[FSM] Executing on_state handler
[FSM] Transitioned to NextState
Inspect FSM State at Runtime¶
async def debug_fsm(fsm: StateMachine):
    """Print detailed FSM state."""
    print(f"Current state: {fsm.current_state.name}")
    print(f"Stack depth: {len(fsm.state_stack)}")
    print("Stack contents:")
    for i, state in enumerate(fsm.state_stack):
        print(f"  {i}: {state.name}")
    print(f"Common context: {fsm.context.common}")
    print(f"Message queue depth: {fsm.message_queue.qsize()}")
Visualize FSM Structure¶
Export FSM to Mermaid diagram for visual debugging:
from mycorrhizal.septum.util import to_mermaid
fsm = StateMachine(initial_state=MyState)
await fsm.initialize()
# Export to Mermaid
mermaid = to_mermaid(fsm)
print(mermaid)
# Copy output to https://mermaid.live/ for visualization
Add Breakpoints in State Handlers¶
@septum.state
class DebugState:
    @septum.on_enter
    async def on_enter(ctx):
        print(f"[DEBUG] Entering {DebugState.__name__}")
        print(f"[DEBUG] Context: {ctx.common}")
        # Add breakpoint here in debugger
        import pdb; pdb.set_trace()
    @septum.on_state
    async def on_state(ctx):
        print(f"[DEBUG] Executing {DebugState.__name__}")
        result = await some_operation()
        print(f"[DEBUG] Result: {result}")
        return Events.DONE
    @septum.on_leave
    async def on_leave(ctx):
        print(f"[DEBUG] Exiting {DebugState.__name__}")
Trace State Transitions¶
class TracingFSM(StateMachine):
    """FSM with transition tracing."""
    async def transition_to(self, target_state):
        """Override to trace transitions."""
        from_state = self.current_state.name if self.current_state else "None"
        to_state = target_state.name
        print(f"[TRACE] Transition: {from_state} -> {to_state}")
        print(f"[TRACE] Stack: {[s.name for s in self.state_stack]}")
        print(f"[TRACE] Context: {self.context.common}")
        # Call parent implementation
        result = await super().transition_to(target_state)
        print("[TRACE] Transition complete")
        return result
Common Issues¶
FSM Hangs¶
Symptoms:
- FSM stops responding
- No state transitions occurring
- Messages not being processed
Diagnosis:
async def diagnose_hang(fsm: StateMachine):
    """Diagnose why FSM is hanging."""
    print(f"Current state: {fsm.current_state.name}")
    print(f"State has timeout: {fsm.current_state.config.timeout}")
    print(f"State can dwell: {fsm.current_state.config.can_dwell}")
    print(f"Messages in queue: {fsm.message_queue.qsize()}")
    print(f"Stack depth: {len(fsm.state_stack)}")
Solutions:
- Check for blocking operations:
- Ensure the state returns an event:
- Add a timeout:
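A common cause of a hang is a synchronous call inside an async handler, which stalls the whole event loop. One way to avoid it, sketched with the standard library (Python 3.9+), is to move the call to a worker thread; `blocking_work` and `run_blocking` are hypothetical names for illustration.

```python
import asyncio
import time


def blocking_work(seconds: float) -> str:
    # Stand-in for a synchronous call (file I/O, a blocking client, etc.).
    time.sleep(seconds)
    return "done"


async def run_blocking(func, *args):
    """Run a synchronous function in a worker thread so the loop stays free."""
    return await asyncio.to_thread(func, *args)
```

Inside a handler, `await run_blocking(blocking_work, 0.5)` lets other coroutines, including message delivery and timers, keep running while the work completes.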
Unexpected State Transitions¶
Symptoms:
- FSM transitions to wrong state
- States skipped or repeated
- Transition order incorrect
Diagnosis:
@septum.state
class DebugTransitionState:
    @septum.on_state
    async def on_state(ctx):
        # Log decision-making
        logger.debug(f"Current context: {ctx.common}")
        if ctx.common.get("should_retry"):
            logger.debug("Deciding: RETRY")
            return Events.RETRY
        else:
            logger.debug("Deciding: DONE")
            return Events.DONE
    @septum.transitions
    def transitions():
        # Build the list once so the logged and returned transitions always match
        available = [
            LabeledTransition(Events.RETRY, RetryState),
            LabeledTransition(Events.DONE, DoneState),
        ]
        logger.debug("Available transitions:")
        for t in available:
            logger.debug(f"  {t.event} -> {t.target}")
        return available
Solutions:
- Check transition logic:
- Check for event name conflicts:
# BAD: Event name conflicts
@septum.state
class State1:
    class Events(Enum):
        NEXT = auto()
@septum.state
class State2:
    class Events(Enum):
        NEXT = auto()  # Same name, different enum!
# GOOD: Unique event names
@septum.state
class State1:
    class Events(Enum):
        STATE1_NEXT = auto()
@septum.state
class State2:
    class Events(Enum):
        STATE2_NEXT = auto()
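The reason identical names are risky can be shown with plain Python enums: two members that share a name and a value but come from different enum classes are still different objects. The sketch below assumes transitions are looked up by enum member, which is an assumption about Septum rather than something this guide confirms; `State1Events` and `State2Events` are illustrative names.

```python
from enum import Enum, auto


class State1Events(Enum):
    NEXT = auto()


class State2Events(Enum):
    NEXT = auto()


# Same member name and same value (1), yet the members compare unequal,
# because members of different Enum classes are distinct objects.
same = State1Events.NEXT == State2Events.NEXT

# A lookup table keyed by one enum's member does not match the other's.
table = {State1Events.NEXT: "StateA"}
found = table.get(State2Events.NEXT)
```

Here `same` is `False` and `found` is `None`, so returning the wrong state's `NEXT` looks correct when reading the code but matches no transition. Unique names make that mistake visible at a glance.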
Memory Leaks¶
Symptoms:
- Memory usage grows over time
- FSM instance count increasing
- Context data accumulating
Diagnosis:
import tracemalloc
import gc
async def check_memory_usage():
    """Check for memory leaks."""
    # Tracing must be started before a snapshot can be taken
    tracemalloc.start()
    gc.collect()
    snapshot1 = tracemalloc.take_snapshot()
    # Run FSM for a while
    await run_fsm_extended()
    gc.collect()
    snapshot2 = tracemalloc.take_snapshot()
    # Compare snapshots
    top_stats = snapshot2.compare_to(snapshot1, 'lineno')
    for stat in top_stats[:10]:
        print(stat)
Solutions:
- Clean up context data:
- Avoid accumulating data:
# BAD: Unbounded growth
@septum.state
class AccumulatingState:
    @septum.on_state
    async def on_state(ctx):
        # List grows forever
        ctx.common.setdefault("history", []).append(data)
        return Events.DONE
# GOOD: Bounded size
@septum.state
class BoundedState:
    @septum.on_state
    async def on_state(ctx):
        history = ctx.common.setdefault("history", [])
        history.append(data)
        # Keep only last 100 items
        if len(history) > 100:
            history.pop(0)
        return Events.DONE
- Reuse FSM instances:
# BAD: Creating new FSM for each request
async def handle_request(request):
    fsm = StateMachine(initial_state=ProcessState)
    await fsm.initialize()
    await fsm.run()
    # FSM discarded
# GOOD: Pool of FSMs
import asyncio
class FSMPool:
    def __init__(self, state_class, size=10):
        self.pool = asyncio.Queue(maxsize=size)
        self.state_class = state_class
        self.size = size  # Stored so initialize() can use it
    async def initialize(self):
        for _ in range(self.size):
            fsm = StateMachine(initial_state=self.state_class)
            await fsm.initialize()
            await self.pool.put(fsm)
    async def acquire(self):
        return await self.pool.get()
    async def release(self, fsm):
        # Reset context if needed
        await self.pool.put(fsm)
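For the bounded-history case, the standard library's `collections.deque` with `maxlen` is one alternative to trimming a list by hand, since it drops the oldest item automatically and avoids the O(n) cost of `list.pop(0)`. In this sketch `record` is a hypothetical helper and `common` stands in for a dict like `ctx.common`.

```python
from collections import deque


def record(common: dict, item, limit: int = 100) -> None:
    """Append item to a history that never holds more than `limit` entries."""
    history = common.setdefault("history", deque(maxlen=limit))
    # When the deque is full, appending discards the oldest entry.
    history.append(item)


common = {}
for i in range(250):
    record(common, i)
```

After the loop, `common["history"]` holds the 100 most recent values, 150 through 249.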
Performance Issues¶
Symptoms:
- Slow state transitions
- High CPU usage
- Poor throughput
Diagnosis:
import time
@septum.state
class ProfiledState:
    @septum.on_state
    async def on_state(ctx):
        start = time.perf_counter()
        # State logic
        result = await expensive_operation()
        elapsed = time.perf_counter() - start
        if elapsed > 0.1:  # Log if > 100ms
            logger.warning(f"Slow state: {elapsed:.3f}s")
        return Events.DONE
Solutions:
- Profile and optimize hot paths:
- Avoid unnecessary work:
# BAD: Expensive operation every tick
@septum.on_state
async def on_state(ctx):
    # Recalculated every time
    result = expensive_computation(input_data)
    return Events.DONE
# GOOD: Cache when possible
@septum.on_enter
async def on_enter(ctx):
    # Calculate once on entry
    ctx.common["cached_result"] = expensive_computation(ctx.common["input_data"])
@septum.on_state
async def on_state(ctx):
    # Use cached result
    result = ctx.common["cached_result"]
    return Events.DONE
- Use async operations:
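When a handler awaits several independent operations, running them concurrently is one way to cut latency. A minimal standard-library sketch follows; `fetch` is a hypothetical I/O call simulated with a sleep.

```python
import asyncio


async def fetch(name: str, delay: float) -> str:
    # Hypothetical I/O-bound call, simulated with a sleep.
    await asyncio.sleep(delay)
    return name


async def fetch_concurrent() -> list:
    """Run three independent calls at once instead of one after another."""
    return await asyncio.gather(
        fetch("a", 0.05),
        fetch("b", 0.05),
        fetch("c", 0.05),
    )
```

`asyncio.gather` returns results in argument order, and the total wait is roughly that of the slowest call rather than the sum of all three.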
Getting Help¶
If you're still stuck after trying these solutions:
- Check the examples:
- Review the API reference:
- Enable debug logging:
- Create a minimal reproduction:
- Report issues:
  - Include error messages
  - Share relevant code snippets
  - Describe expected vs actual behavior
  - Include FSM structure (Mermaid export)
See Also¶
- Production Guide - Deployment and performance
- API Reference - Complete API documentation
- Best Practices - Design patterns
- PDA Guide - Hierarchical state machines