# Architecture
DeepDriveWE-Academy is built on the Academy multi-agent framework. Simulations and resampling logic run as independent agents that communicate asynchronously through typed messages.
## Agent Model
Every workflow has two agent types:
```mermaid
graph LR
    M["main()"] -->|launch| SA["SimulationAgent(s)"]
    M -->|launch| WA["WestpaAgent"]
    SA -->|"SimResult"| WA
    WA -->|"SimMetadata"| SA
```
`SimulationAgent`
: Runs a single MD simulation per invocation. It receives `SimMetadata`
(walker weight, restart file, parent progress coordinate), runs the
simulation in a thread pool via `agent_run_sync`, and sends the
`SimResult` back to the `WestpaAgent`.
`WestpaAgent`
: Orchestrates the iteration cycle. It buffers incoming `SimResult`
objects until the full batch has arrived, then runs the
user-defined inference step (binning, recycling, resampling) and
dispatches the next iteration's simulations round-robin across the
`SimulationAgent` instances.
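The interaction between the two agent types can be sketched with plain `asyncio`, independent of Academy. This is a minimal sketch under stated assumptions. The `SimMetadata`/`SimResult` fields, the `ToySimAgent`/`ToyWestpaAgent` classes, the `run_md` function, and the restart file names are all hypothetical stand-ins. `loop.run_in_executor` plays the role that `agent_run_sync` plays in the real `SimulationAgent`: it keeps blocking work off the event loop.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class SimMetadata:
    # Hypothetical fields mirroring the description above.
    walker_id: int
    weight: float
    restart_file: str
    parent_pcoord: float


@dataclass
class SimResult:
    metadata: SimMetadata
    pcoord: float  # progress coordinate reported by the simulation


def run_md(meta: SimMetadata) -> SimResult:
    # Blocking stand-in for an MD engine: advances the parent pcoord.
    return SimResult(metadata=meta, pcoord=meta.parent_pcoord + 1.0)


class ToyWestpaAgent:
    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self.buffer: list[SimResult] = []
        self.batch_done = asyncio.Event()

    async def receive_simulation_data(self, result: SimResult) -> None:
        # Buffer results; signal only once the whole batch has arrived.
        self.buffer.append(result)
        if len(self.buffer) == self.batch_size:
            self.batch_done.set()


class ToySimAgent:
    def __init__(self, westpa: ToyWestpaAgent, pool: ThreadPoolExecutor) -> None:
        self.westpa = westpa
        self.pool = pool

    async def run_simulation(self, meta: SimMetadata) -> None:
        loop = asyncio.get_running_loop()
        # Offload the blocking simulation to a worker thread, then report back.
        result = await loop.run_in_executor(self.pool, run_md, meta)
        await self.westpa.receive_simulation_data(result)


async def run_one_batch(n_walkers: int = 4, n_agents: int = 2) -> list[SimResult]:
    westpa = ToyWestpaAgent(batch_size=n_walkers)
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        agents = [ToySimAgent(westpa, pool) for _ in range(n_agents)]
        # Hypothetical initial walkers with equal weights.
        sims = [
            SimMetadata(i, 1.0 / n_walkers, f"seg-{i}.rst", float(i))
            for i in range(n_walkers)
        ]
        # Round-robin dispatch: walker i goes to agent i % n_agents.
        tasks = [
            asyncio.create_task(agents[i % n_agents].run_simulation(meta))
            for i, meta in enumerate(sims)
        ]
        await westpa.batch_done.wait()
        await asyncio.gather(*tasks)
    return westpa.buffer
```

Calling `asyncio.run(run_one_batch())` returns the buffered batch. Results can arrive in any order, which is why the toy `WestpaAgent` counts arrivals instead of relying on order.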
## Communication Flow
1. `run_westpa_workflow` registers and launches all agents.
2. The initial batch of `SimMetadata` (from `ensemble.next_sims`) is dispatched round-robin to the `SimulationAgent` instances.
3. Each `SimulationAgent` calls its `run_simulation` method, then sends the result to `WestpaAgent.receive_simulation_data`.
4. Once all results arrive, `WestpaAgent.run_westpa` fires:
    - Calls `run_inference` (user-defined resampling).
    - Advances the ensemble state via `ensemble.advance_iteration`.
    - Checkpoints the ensemble.
    - Dispatches the next round of simulations.
5. Steps 3 and 4 repeat until `max_iterations` is reached.
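The control flow of this cycle can be condensed into a synchronous sketch. Everything here is hypothetical. Walkers are reduced to their weights, `run_inference` is a placeholder that only renormalizes weights, and each `history` entry stands in for a checkpoint. The real workflow performs these steps asynchronously through Academy messages.

```python
from itertools import cycle


def round_robin(sims: list, n_agents: int) -> list[tuple[int, object]]:
    """Pair each simulation with agent indices 0, 1, ..., n_agents-1, 0, 1, ..."""
    return list(zip(cycle(range(n_agents)), sims))


def run_inference(weights: list[float]) -> list[float]:
    # Placeholder for user-defined binning/recycling/resampling:
    # it only renormalizes so the total weight stays 1.
    total = sum(weights)
    return [w / total for w in weights]


def run_workflow(n_walkers: int, n_agents: int, max_iterations: int) -> list[dict]:
    next_sims = [1.0 / n_walkers] * n_walkers  # initial batch of walker weights
    history = []
    for iteration in range(1, max_iterations + 1):
        assignments = round_robin(next_sims, n_agents)
        # Each agent "runs" its simulation and returns a result.
        results = [weight for _agent, weight in assignments]
        # Once every result is in: resample and advance to the next iteration.
        next_sims = run_inference(results)
        history.append({  # stands in for the per-iteration checkpoint
            "iteration": iteration,
            "agents": [agent for agent, _ in assignments],
            "total_weight": sum(next_sims),
        })
    return history  # the loop stops once max_iterations is reached
```

With 5 walkers and 2 agents, each iteration assigns walkers to agents `[0, 1, 0, 1, 0]`, so load stays balanced to within one simulation per agent.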
## Academy Primitives
See the Academy documentation for more information.
| Primitive | Purpose |
|---|---|
| `Agent` | Base class for all agents. Subclass to add state and methods. |
| `@action` | Marks an async method as remotely callable by other agents. |
| `@loop` | Marks an async method as a background control loop. |
| `Handle[T]` | Typed proxy for calling actions on a remote agent of type `T`. |
| `Manager` | Orchestrates agent registration, launching, and shutdown. |
| `LocalExchangeFactory` | In-process message transport (single machine). |
| `HttpExchangeFactory` | Cloud message transport via the Academy Exchange (multi-node). |
## Execution Model
Simulations are CPU/GPU-bound and are offloaded to a Parsl executor.
The `WestpaAgent` only coordinates, so it runs in a single-worker
thread pool on the CPU. The Academy `Manager` is initialized with named
executors:
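```python
from concurrent.futures import ThreadPoolExecutor

from academy.exchange import LocalExchangeFactory
from academy.manager import Manager
from parsl.concurrent import ParslPoolExecutor

# Inside an async function; `parsl_config` is a parsl.config.Config defined elsewhere.
async with await Manager.from_exchange_factory(
    factory=LocalExchangeFactory(),
    executors={
        'gpu': ParslPoolExecutor(parsl_config),  # simulation agents
        'cpu': ThreadPoolExecutor(max_workers=1),  # the WestpaAgent
    },
    default_executor='gpu',
) as manager:
    ...
```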
Agents are assigned to executors at launch time:
```python
await manager.launch(OpenMMSimAgent, ..., executor='gpu')
await manager.launch(HuberKimWestpaAgent, ..., executor='cpu')
```
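The idea of named executors with a default can be illustrated with `concurrent.futures` alone. In this sketch, `ExecutorRegistry` and `which_thread` are hypothetical helpers. Two `ThreadPoolExecutor`s stand in for the Parsl `gpu` pool and the single-worker `cpu` pool. The sketch shows how a name selects where work runs, not how Academy implements it internally.

```python
import threading
from concurrent.futures import Executor, Future, ThreadPoolExecutor
from typing import Callable, Optional


class ExecutorRegistry:
    """Hypothetical helper: route work to a named executor, with a default."""

    def __init__(self, executors: dict[str, Executor], default_executor: str) -> None:
        if default_executor not in executors:
            raise KeyError(f"unknown default executor: {default_executor!r}")
        self.executors = executors
        self.default = default_executor

    def submit(self, fn: Callable, *args, executor: Optional[str] = None) -> Future:
        # Fall back to the default executor when no name is given.
        return self.executors[executor or self.default].submit(fn, *args)

    def shutdown(self) -> None:
        for ex in self.executors.values():
            ex.shutdown(wait=True)


def which_thread(label: str) -> str:
    # Report which executor thread ran the task.
    return f"{label} ran on {threading.current_thread().name}"
```

For example, with `thread_name_prefix="cpu"` on the one-worker pool, `registry.submit(which_thread, "westpa", executor="cpu")` always runs on thread `cpu_0`. A call without `executor=` lands in the default pool, which mirrors `default_executor='gpu'` above.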
## Checkpointing
`EnsembleCheckpointer` saves the full ensemble state after each
iteration:

- JSON checkpoints: one file per iteration (`checkpoint-000001.json`) containing the serialized `WeightedEnsemble`.
- HDF5 log: a cumulative `west.h5` file in WESTPA-compatible format for analysis with standard WE tools.
Workflows automatically resume from the latest checkpoint when restarted.
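Here is a minimal sketch of per-iteration JSON checkpoints with resume-from-latest. It assumes the ensemble can be reduced to a JSON-serializable `dict`. `save_checkpoint` and `load_latest_checkpoint` are hypothetical helpers, not the actual `EnsembleCheckpointer` API. The zero-padded names follow the `checkpoint-000001.json` pattern, and the HDF5 log is out of scope here.

```python
import json
import re
from pathlib import Path
from typing import Optional

# Matches names like checkpoint-000001.json (six-digit, zero-padded).
CHECKPOINT_RE = re.compile(r"checkpoint-(\d{6})\.json")


def save_checkpoint(directory: Path, iteration: int, state: dict) -> Path:
    """Write one JSON file per iteration (hypothetical helper)."""
    path = directory / f"checkpoint-{iteration:06d}.json"
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(state))
    # Write-then-rename so a crash cannot leave a truncated checkpoint
    # (Path.replace is atomic on POSIX within one filesystem).
    tmp.replace(path)
    return path


def load_latest_checkpoint(directory: Path) -> Optional[tuple[int, dict]]:
    """Return (iteration, state) for the newest checkpoint, or None."""
    found = [
        (int(m.group(1)), p)
        for p in directory.iterdir()
        if (m := CHECKPOINT_RE.fullmatch(p.name))
    ]
    if not found:
        return None
    iteration, path = max(found)
    return iteration, json.loads(path.read_text())
```

On restart, a workflow built this way would call `load_latest_checkpoint` and continue from `iteration + 1`. If the call returns `None`, it starts from scratch.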
## Extending the Framework
To build a custom workflow:
- Subclass `SimulationAgent`: override `run_simulation` to run your simulation engine and compute progress coordinates.
- Subclass `WestpaAgent`: override `run_inference` to plug in your binning, recycling, and resampling strategy.
- Call `run_westpa_workflow`: pass your agent types, ensemble configuration, and compute settings.
See the OpenMM NTL9 tutorial for a complete example.
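The subclassing pattern can be sketched with stand-in base classes. `BaseSimAgent`, `BaseWestpaAgent`, `DriftSimAgent`, `BinnedResamplingAgent`, and `Walker` are hypothetical and do not reproduce the framework's actual classes or signatures. The `run_inference` override uses one simple weight-conserving scheme. Walkers are binned by progress coordinate, and each occupied bin is resampled to a fixed number of equal-weight walkers, with parents chosen by systematic resampling. It illustrates where a strategy plugs in, not the resampler used in the tutorial.

```python
import math
from dataclasses import dataclass


@dataclass
class Walker:
    pcoord: float
    weight: float


class BaseSimAgent:  # stand-in for SimulationAgent
    def run_simulation(self, walker: Walker) -> Walker:
        raise NotImplementedError


class BaseWestpaAgent:  # stand-in for WestpaAgent
    def run_inference(self, walkers: list[Walker]) -> list[Walker]:
        raise NotImplementedError


class DriftSimAgent(BaseSimAgent):
    # Toy "engine": advances the progress coordinate by a fixed drift.
    def __init__(self, drift: float) -> None:
        self.drift = drift

    def run_simulation(self, walker: Walker) -> Walker:
        return Walker(walker.pcoord + self.drift, walker.weight)


class BinnedResamplingAgent(BaseWestpaAgent):
    def __init__(self, bin_width: float, walkers_per_bin: int) -> None:
        self.bin_width = bin_width
        self.walkers_per_bin = walkers_per_bin

    def run_inference(self, walkers: list[Walker]) -> list[Walker]:
        # Group walkers into fixed-width bins along the progress coordinate.
        bins: dict[int, list[Walker]] = {}
        for w in walkers:
            bins.setdefault(math.floor(w.pcoord / self.bin_width), []).append(w)
        out = []
        n = self.walkers_per_bin
        for members in bins.values():
            total = sum(w.weight for w in members)
            # Systematic resampling: n evenly spaced points over the bin's
            # cumulative weight select parents; children share weight equally,
            # so each bin's total weight is conserved.
            cumulative, idx = members[0].weight, 0
            for k in range(n):
                target = (k + 0.5) * total / n
                while cumulative < target and idx < len(members) - 1:
                    idx += 1
                    cumulative += members[idx].weight
                out.append(Walker(members[idx].pcoord, total / n))
        return out
```

A heavier walker is selected as parent more often, so it is effectively split, while light walkers may be dropped (merged). Every child in a bin ends up with weight `bin_total / walkers_per_bin`.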