
Event Model Learning from Story-Based Traces

My ANU honours thesis: adapting a classical AI planning algorithm to learn finite state machine models of events from narrative data, producing 165,364 FSMs in under 6 seconds.

October 25, 2024
Python · AI Planning · Machine Learning · Research · ANU

Read the full report

View the poster

What this was

For my ANU honours project (COMP4560, 2024), I adapted LOCM2, an algorithm originally designed to learn action models from AI planning traces, to work on story-based narrative data instead. The goal: given a dataset of narrative plans describing fictional events, automatically construct finite state machine (FSM) models for each kind of event.

The upstream problem is narrative AI. If you want a system that can reason about stories, generate them coherently, or answer questions about them, you need some structured representation of what events mean and how they relate to objects. My thesis was a step toward generating that representation automatically from existing story data.

Supervisor: Dr. Patrik Haslum, ANU School of Computing.

The starting point: LOCM2

LOCM2 (Learning Object-Centric Models 2) takes sequences of planning actions and figures out how objects transition between states. Feed it actions like "pick-up(block)" and "stack(block, table)", and it constructs an FSM for block that captures the valid transitions.
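
The core idea can be sketched in a few lines. This is a deliberately simplified illustration, not the real LOCM2: each (action, argument-position) pair is treated as a transition, and consecutive transitions of the same object are recorded as adjacent. The `obj_sort` placeholder stands in for LOCM2's induced object sorts, which the real algorithm computes rather than assumes.

```python
from collections import defaultdict

def obj_sort(obj):
    # Placeholder: LOCM2 induces object sorts itself; here we
    # pretend every object belongs to a single sort.
    return "object"

def observed_transitions(traces):
    """Record, per object sort, which (action, arg-position)
    transition follows which for the same object. The real algorithm
    merges transition endpoints into FSM states; this sketch only
    collects the observed adjacency."""
    edges = defaultdict(set)
    for trace in traces:
        last = {}  # object -> last transition it appeared in
        for action, args in trace:
            for pos, obj in enumerate(args):
                t = (action, pos)
                if obj in last:
                    edges[obj_sort(obj)].add((last[obj], t))
                last[obj] = t
    return edges

trace = [("pick-up", ["block1"]), ("stack", ["block1", "table1"])]
edges = observed_transitions([trace])
# block1 moves from argument 0 of pick-up to argument 0 of stack
```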

The catch: it was built for clean, formal planning traces. My dataset was narrative plans, 3,487 of them, stored as JSON. They had nested event structures, ambiguous object references, and no guarantees about ordering. None of that fit the assumptions LOCM2 was designed around.

The actual problems

Nested events. Narrative events contain sub-events. LOCM2 expects flat action sequences. I wrote a recursive parser that flattened the tree structure into ordered sequences while preserving logical grouping.
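
A minimal sketch of that flattening step, assuming (hypothetically) that each event is a dict with a `name` and an optional `sub_events` list. The actual thesis code also preserved logical grouping, which this depth-first walk does not show:

```python
def flatten(event):
    """Depth-first walk that yields leaf event names in order, turning
    a nested narrative event tree into a flat sequence. Hypothetical
    schema: {'name': str, 'sub_events': [event, ...]}."""
    subs = event.get("sub_events", [])
    if not subs:
        yield event["name"]
    else:
        for sub in subs:
            yield from flatten(sub)

nested = {"name": "quest", "sub_events": [
    {"name": "depart"},
    {"name": "battle", "sub_events": [{"name": "strike"},
                                      {"name": "flee"}]},
]}
print(list(flatten(nested)))  # ['depart', 'strike', 'flee']
```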

Ambiguous object types. Planning traces have typed objects. Story plans do not. An object called "sword" needs to be categorized somehow before you can build state machines for it. I used WordNet word sense frequency as a probabilistic proxy, assigning the most common sense as the type. It is not perfect, but it is reproducible and produces reasonable groupings across a large corpus.

Partial ordering. Many events in a narrative plan are not strictly sequenced. LOCM2 assumes total order. I built a precedence graph layer that encodes partial orderings and generates all valid linearizations before passing sequences downstream. This was the most involved part of the implementation.
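
Enumerating the valid linearizations of a precedence graph is the classic "all topological sorts" problem. A backtracking sketch of that layer, under the assumption that precedence constraints come as (before, after) pairs (the thesis implementation is more involved than this):

```python
def linearizations(events, precedes):
    """Enumerate all total orders consistent with a partial order.
    `precedes` holds (a, b) pairs meaning a must occur before b.
    Backtracks over events whose remaining in-degree is zero."""
    indeg = {e: 0 for e in events}
    succs = {e: [] for e in events}
    for a, b in precedes:
        indeg[b] += 1
        succs[a].append(b)
    order, results = [], []

    def backtrack():
        if len(order) == len(events):
            results.append(tuple(order))
            return
        for e in events:
            if indeg[e] == 0:
                indeg[e] = -1              # mark e as placed
                for s in succs[e]:
                    indeg[s] -= 1
                order.append(e)
                backtrack()
                order.pop()
                for s in succs[e]:
                    indeg[s] += 1
                indeg[e] = 0               # e is available again

    backtrack()
    return results

print(linearizations(["depart", "battle", "feast"],
                     {("depart", "battle"), ("depart", "feast")}))
# [('depart', 'battle', 'feast'), ('depart', 'feast', 'battle')]
```

The number of linearizations can explode combinatorially, which is presumably why this was the most involved part of the pipeline.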

Cycles. The original algorithm had no cycle detection. Narrative data loops. I added explicit detection and handling so the FSM construction would not diverge.
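
The standard guard here is a three-color depth-first search over the transition graph: a back edge to a node on the current path means a cycle. A sketch of that check, not the thesis code:

```python
def has_cycle(graph):
    """Three-color DFS cycle check over a transition graph given as an
    adjacency dict {node: [successor, ...]}. GRAY marks nodes on the
    current DFS path; reaching a GRAY node is a back edge."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def dfs(v):
        color[v] = GRAY                    # v is on the current path
        for w in graph.get(v, ()):
            c = color.get(w, WHITE)
            if c == GRAY:                  # back edge: cycle found
                return True
            if c == WHITE and dfs(w):
                return True
        color[v] = BLACK                   # fully explored, no cycle
        return False

    return any(color.get(v, WHITE) == WHITE and dfs(v) for v in graph)

print(has_cycle({"tavern": ["road"], "road": ["tavern"]}))   # True
print(has_cycle({"tavern": ["road"], "road": ["castle"]}))   # False
```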

What came out

On the full dataset of 3,487 plans:

  • 165,364 FSMs generated
  • 425,022 states, 327,168 transitions total
  • Average FSM: 2.57 states, 1.98 transitions
  • 93.5% well-formed (154,663 FSMs)
  • 6.5% ill-formed due to ambiguous or conflicting traces (10,701 FSMs)
  • Total processing time: 5.68 seconds

The well-formed rate held up well given how noisy narrative data is compared to formal planning domains. The ill-formed cases were concentrated in event types with the most ambiguous object references, which is what you would expect.

What I learned

The algorithmic work was interesting, but the harder problem was making the whole pipeline robust to data that was never designed to be machine-readable. Narrative plans are written for human storytelling systems. They encode assumptions about context, object identity, and event order that formal planning domains make explicit. Bridging that gap required a lot of careful preprocessing before LOCM2 could do anything useful.

The Union-Find data structure did a lot of heavy lifting for grouping event sequences by object identity across plans. Getting that right was what made the scale feasible.
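
For flavour, a minimal union-find with path compression, of the kind that merges event-sequence groups sharing an object identity; keys here are hypothetical (plan, object) pairs:

```python
class UnionFind:
    """Minimal union-find with path compression. Two keys end up in
    the same group once any chain of unions connects them, which is
    near-constant time per operation in practice."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path compression: point x at its grandparent as we walk up.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

uf = UnionFind()
uf.union(("plan1", "sword"), ("plan2", "sword"))
print(uf.find(("plan1", "sword")) == uf.find(("plan2", "sword")))  # True
```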

This project sits at the boundary between classical AI planning and narrative intelligence. The FSMs it produces are the kind of structured event knowledge that downstream story reasoning systems need but rarely have in quantity. Whether they are good enough for those systems is the next question, and one I would be interested in picking back up.