Architecture

DeepDriveWE-Academy is built on the Academy multi-agent framework. Simulations and resampling logic run as independent agents that communicate asynchronously through typed messages.

Agent Model

Every workflow has two agent types:

graph LR
    M["main()"] -->|launch| SA["SimulationAgent(s)"]
    M -->|launch| WA["WestpaAgent"]

    SA -->|"SimResult"| WA
    WA -->|"SimMetadata"| SA

SimulationAgent : Runs a single MD simulation per invocation. Receives SimMetadata (walker weight, restart file, parent pcoord), runs the simulation in a thread pool via agent_run_sync, and sends the SimResult back to the WestpaAgent.

WestpaAgent : Orchestrates the iteration cycle. Buffers incoming SimResult objects until the full batch is collected, then runs user-defined inference (binning, recycling, resampling) and dispatches the next iteration of simulations round-robin.
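As a rough illustration of the two message types and of round-robin dispatch, here is a minimal stand-alone sketch. The field names on `SimMetadata` and `SimResult` are assumptions inferred from the description above (walker weight, restart file, parent pcoord), and `assign_round_robin` is a hypothetical helper; none of this is the package's actual definition.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class SimMetadata:
    """Hypothetical stand-in: what a simulation needs in order to start."""
    weight: float
    restart_file: str
    parent_pcoord: list


@dataclass(frozen=True)
class SimResult:
    """Hypothetical stand-in: what a finished simulation reports back."""
    metadata: SimMetadata
    pcoord: list


def assign_round_robin(sims, agent_names):
    """Map each agent name to the simulations it would receive, in turn."""
    assignments = {name: [] for name in agent_names}
    for sim, name in zip(sims, cycle(agent_names)):
        assignments[name].append(sim)
    return assignments


# Four equally weighted walkers spread over three simulation agents.
sims = [
    SimMetadata(weight=0.25, restart_file=f'walker-{i}.rst', parent_pcoord=[float(i)])
    for i in range(4)
]
assignments = assign_round_robin(sims, ['sim-agent-0', 'sim-agent-1', 'sim-agent-2'])
```

With more walkers than agents, the first agents in the list simply receive one extra simulation each.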

Communication Flow

1. run_westpa_workflow registers and launches all agents.
2. The initial batch of SimMetadata (from ensemble.next_sims) is dispatched round-robin to SimulationAgent instances.
3. Each SimulationAgent calls its run_simulation method, then sends the result to WestpaAgent.receive_simulation_data.
4. Once all results arrive, WestpaAgent.run_westpa fires:
   - Calls run_inference (user-defined resampling).
   - Advances ensemble state via ensemble.advance_iteration.
   - Checkpoints the ensemble.
   - Dispatches the next round of simulations.
5. Steps 3 and 4 repeat until max_iterations is reached.
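The buffer-then-advance cycle above can be sketched with plain `asyncio`, without Academy. `ToyOrchestrator` and `toy_simulation` are hypothetical stand-ins that only borrow the method names mentioned in this section; the real agents exchange messages through Academy handles rather than direct method calls, and the real `run_westpa` also resamples and checkpoints.

```python
import asyncio


async def toy_simulation(sim_id, iteration):
    """Stand-in for SimulationAgent.run_simulation (hypothetical payload)."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return {'sim_id': sim_id, 'iteration': iteration}


class ToyOrchestrator:
    """Buffers results and advances once the whole batch has arrived."""

    def __init__(self, batch_size, max_iterations):
        self.batch_size = batch_size
        self.max_iterations = max_iterations
        self.iteration = 1
        self.buffer = []
        self.history = []   # number of results consumed per iteration
        self.tasks = set()  # keep references so tasks are not garbage collected
        self.done = asyncio.Event()

    def dispatch(self):
        """Launch one simulation task per walker in the batch."""
        for sim_id in range(self.batch_size):
            task = asyncio.create_task(self._run_one(sim_id, self.iteration))
            self.tasks.add(task)
            task.add_done_callback(self.tasks.discard)

    async def _run_one(self, sim_id, iteration):
        result = await toy_simulation(sim_id, iteration)
        await self.receive_simulation_data(result)

    async def receive_simulation_data(self, result):
        self.buffer.append(result)
        if len(self.buffer) == self.batch_size:
            await self.run_westpa()

    async def run_westpa(self):
        # Resampling, state advance and checkpointing would happen here.
        self.history.append(len(self.buffer))
        self.buffer.clear()
        if self.iteration >= self.max_iterations:
            self.done.set()
            return
        self.iteration += 1
        self.dispatch()


async def main():
    orchestrator = ToyOrchestrator(batch_size=3, max_iterations=2)
    orchestrator.dispatch()
    await orchestrator.done.wait()
    return orchestrator.history


history = asyncio.run(main())
```

Because the orchestrator only acts when the buffer is full, slow simulations delay the next iteration but never cause a partial batch to be resampled.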

Academy Primitives

See the Academy documentation for more information.

| Primitive | Purpose |
| --- | --- |
| `Agent` | Base class for all agents. Subclass to add state and methods. |
| `@action` | Marks an async method as remotely callable by other agents. |
| `@loop` | Marks an async method as a background control loop. |
| `Handle[T]` | Typed proxy for calling actions on a remote agent of type `T`. |
| `Manager` | Orchestrates agent registration, launching, and shutdown. |
| `LocalExchangeFactory` | In-process message transport (single machine). |
| `HttpExchangeFactory` | Cloud message transport via the Academy Exchange (multi-node). |

Execution Model

Simulations are CPU/GPU-bound and are offloaded to a Parsl executor. The WestpaAgent runs on a CPU thread. The Academy Manager is initialized with named executors:

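from concurrent.futures import ThreadPoolExecutor

from academy.exchange import LocalExchangeFactory
from academy.manager import Manager
from parsl.concurrent import ParslPoolExecutor

# parsl_config is a parsl.Config defined elsewhere.
async with await Manager.from_exchange_factory(
    factory=LocalExchangeFactory(),
    executors={
        'gpu': ParslPoolExecutor(parsl_config),    # simulation agents
        'cpu': ThreadPoolExecutor(max_workers=1),  # WestpaAgent
    },
    default_executor='gpu',
) as manager:
    ...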

Agents are assigned to executors at launch time:

await manager.launch(OpenMMSimAgent, ..., executor='gpu')
await manager.launch(HuberKimWestpaAgent, ..., executor='cpu')
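To see why blocking simulation work is routed to a dedicated executor rather than run on the orchestrating thread, consider this standard-library sketch. It assumes nothing about Academy or Parsl: two `ThreadPoolExecutor` pools stand in for the named executors, and `blocking_simulation` is a hypothetical placeholder for an MD engine call.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def blocking_simulation(steps):
    """Hypothetical stand-in for a long-running, blocking MD engine call."""
    return sum(range(steps))


async def main():
    # Named pools, mirroring the 'gpu' / 'cpu' split described above.
    executors = {
        'sim': ThreadPoolExecutor(max_workers=4),
        'cpu': ThreadPoolExecutor(max_workers=1),
    }
    loop = asyncio.get_running_loop()
    try:
        # Each blocking call runs in the 'sim' pool; the event loop stays free.
        futures = [
            loop.run_in_executor(executors['sim'], blocking_simulation, n)
            for n in (10, 100, 1000)
        ]
        return await asyncio.gather(*futures)
    finally:
        for executor in executors.values():
            executor.shutdown()


results = asyncio.run(main())
```

The single-worker pool mirrors the one-thread executor used for the orchestrating agent, which keeps its bookkeeping serialized while simulations run concurrently elsewhere.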

Checkpointing

EnsembleCheckpointer saves the full ensemble state after each iteration:

- JSON checkpoints: one file per iteration (checkpoint-000001.json) containing the serialized WeightedEnsemble.
- HDF5 log: a cumulative west.h5 file in WESTPA-compatible format for analysis with standard WE tools.

Workflows automatically resume from the latest checkpoint when restarted.
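The per-iteration JSON files and resume-from-latest behavior can be approximated as follows. This is a sketch under assumptions, not the EnsembleCheckpointer implementation: `save_checkpoint` and `load_latest_checkpoint` are hypothetical helpers, the state is a plain dict rather than a serialized WeightedEnsemble, and only the `checkpoint-000001.json` naming pattern is taken from the text above.

```python
import json
import tempfile
from pathlib import Path


def save_checkpoint(directory, iteration, state):
    """Write one JSON file per iteration, zero-padded to six digits."""
    path = directory / f'checkpoint-{iteration:06d}.json'
    path.write_text(json.dumps(state))
    return path


def load_latest_checkpoint(directory):
    """Return (iteration, state) for the newest checkpoint, or None if absent."""
    # Zero-padded names sort lexically in iteration order.
    paths = sorted(directory.glob('checkpoint-*.json'))
    if not paths:
        return None
    latest = paths[-1]
    iteration = int(latest.stem.split('-')[1])
    return iteration, json.loads(latest.read_text())


with tempfile.TemporaryDirectory() as tmp:
    checkpoint_dir = Path(tmp)
    fresh_start = load_latest_checkpoint(checkpoint_dir)
    for iteration in (1, 2, 3):
        save_checkpoint(checkpoint_dir, iteration, {'iteration': iteration, 'weights': [0.5, 0.5]})
    resumed = load_latest_checkpoint(checkpoint_dir)
```

A restarted run would continue from the iteration after the one returned, or start from scratch when the helper returns `None`.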

Extending the Framework

To build a custom workflow:

1. Subclass SimulationAgent: override run_simulation to run your simulation engine and compute progress coordinates.
2. Subclass WestpaAgent: override run_inference to plug in your binning, recycling, and resampling strategy.
3. Call run_westpa_workflow: pass your agent types, ensemble configuration, and compute settings.
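The two override points can be pictured with stand-in base classes. Everything here is hypothetical: `SimulationAgentBase` and `WestpaAgentBase` are local placeholders rather than the package's classes, the dict payloads are invented for illustration, the real methods may be asynchronous and take typed arguments, and the inference step shown only recycles walkers (no binning, splitting, or merging).

```python
from abc import ABC, abstractmethod


class SimulationAgentBase(ABC):
    """Hypothetical placeholder for the framework's simulation base class."""

    @abstractmethod
    def run_simulation(self, metadata):
        """Run one simulation and return its result."""


class WestpaAgentBase(ABC):
    """Hypothetical placeholder for the framework's orchestrator base class."""

    @abstractmethod
    def run_inference(self, results):
        """Turn a full batch of results into the next batch of metadata."""


class DriftSimAgent(SimulationAgentBase):
    """Toy engine: the progress coordinate drifts forward by one unit."""

    def run_simulation(self, metadata):
        return {'weight': metadata['weight'], 'pcoord': metadata['parent_pcoord'] + 1.0}


class RecyclingWestpaAgent(WestpaAgentBase):
    """Toy inference: walkers that reach the target restart from 0.0."""

    def __init__(self, target):
        self.target = target

    def run_inference(self, results):
        next_sims = []
        for result in results:
            reached_target = result['pcoord'] >= self.target
            start = 0.0 if reached_target else result['pcoord']
            next_sims.append({'weight': result['weight'], 'parent_pcoord': start})
        return next_sims


initial = [
    {'weight': 0.5, 'parent_pcoord': 0.0},
    {'weight': 0.5, 'parent_pcoord': 2.5},
]
sim_agent = DriftSimAgent()
westpa_agent = RecyclingWestpaAgent(target=3.0)
results = [sim_agent.run_simulation(metadata) for metadata in initial]
next_sims = westpa_agent.run_inference(results)
```

Keeping the engine and the resampling strategy in separate classes means either can be swapped without touching the other.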

See the OpenMM NTL9 tutorial for a complete example.