Workflow Internals
This page is a technical map of the main PyAR workflow implementations. If you are using PyAR for chemistry, start with the task pages instead: Aggregation and Cluster Search, Reaction Search, Solvation and Growth Around a Core, and Bond Scan.
The word “workflow” is used here in the developer sense: a coordinated set of sampling, optimisation, selection, restart, and reporting steps.
Aggregation internals
Aggregation searches for low-energy packings of one or more fragments. This is the internal route behind molecular clusters, noncovalent complexes, and small aggregate models.
Examples:
pyar-cli aggregate C H -as 1 4 -N 8
pyar-cli -a C H -as 1 4 -N 8
pyar-cli --aggregate --formula C5H4 -N 8
Aggregation restart state is stored as readable JSON:
aggregates/
    state.json
    ag_.../
        selected/
    selected/
    stoichiometry_.../
state.json records the input geometry and calculation settings, selected
pathway order, completed pathways, pathway-level selected results, and final
selected results. Re-running an interrupted aggregation with the same request
resumes only unfinished pathways while reusing their existing step outputs.
Legacy pyar.log pathway markers are imported once into JSON state when an
older aggregates/ calculation is resumed.
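The resume rule above can be sketched in a few lines of Python. This is only an illustration: the key names `pathway_order` and `completed_pathways`, and the pathway names, are hypothetical stand-ins, and the real layout of state.json may differ.

```python
import json


def unfinished_pathways(state: dict) -> list:
    """Return pathways that still need to run, in the stored pathway order."""
    done = set(state.get("completed_pathways", []))
    return [name for name in state.get("pathway_order", []) if name not in done]


# Hypothetical state record; the key names and pathway names are assumptions.
example_state = json.loads("""
{
  "pathway_order": ["pathway_a", "pathway_b", "pathway_c"],
  "completed_pathways": ["pathway_a"]
}
""")

for name in unfinished_pathways(example_state):
    print("would resume:", name)
```

Keeping the stored order, rather than sorting, means a resumed run visits the remaining pathways in the same sequence as the original request.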
For a chemistry researcher, the main outputs to inspect are:
aggregates/state.json for restart and provenance
selected/ for the chosen low-energy candidates
the energy table output for quick ranking of the structures
Reaction internals
Reaction searches operate on exactly two input structures and are meant for reaction discovery, bond formation, or close-contact pathway exploration.
pyar-cli -r A.xyz B.xyz -N 8 -gmin 100 -gmax 1000 --software xtb
pyar-cli react A.xyz B.xyz -N 8 -gmin 100 -gmax 1000 --software xtb
The geomeTRIC/TRIC reaction route is used for registered energy-gradient
providers. At present, this route is wired for xtb, aimnet_2,
orca, and gaussian. In practice, xtb and aimnet_2 are the easiest to
use out of the box; orca and gaussian require the corresponding
executable and should be validated on the target installation.
Ordinary aggregation and standalone optimization continue to use each
backend’s native optimizer. A bonded reaction candidate is relaxed again with
gamma=0.0 before product identity is assessed.
For a chemist, the reaction workflow is useful when you want to:
search for candidate products without hand-building every starting guess
compare multiple orientations of the same reactants
inspect whether a close-contact structure relaxes back to starting material or becomes a new product
review the trace summary for energetic trends and bond changes
Reaction restart state is stored as readable JSON and XYZ snapshots:
reaction/
    state.json
    state/
        geometries/
    gamma_0100/
    products/
state.json records the numeric gamma schedule, current cycle, pending and
retained geometries, completed jobs, discovered products, and the calculation
settings used for restart validation. Re-running the same command in an
interrupted calculation directory resumes compatible pending work. Completed
state is retained as a run record. An existing reaction/ directory without
a compatible state record is never overwritten automatically; start from a new
directory or remove archived output deliberately.
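The restart decision described above amounts to three cases, sketched here in Python. The function name, the `settings` key, and the plain equality check are assumptions made for illustration; the actual compatibility test in PyAR may compare more or fewer fields.

```python
from typing import Optional


def plan_reaction_run(dir_exists: bool, state: Optional[dict], requested: dict) -> str:
    """Decide how to treat a reaction/ directory (hypothetical helper)."""
    if not dir_exists:
        # Nothing on disk yet: a fresh calculation can start.
        return "start"
    if state is not None and state.get("settings") == requested:
        # A compatible state record exists: pending work can be resumed.
        return "resume"
    # Existing output without a compatible record is never overwritten.
    raise RuntimeError(
        "reaction/ exists without a compatible state record; "
        "use a new directory or remove the old output deliberately"
    )


requested = {"software": "xtb", "gmin": 100, "gmax": 1000}
print(plan_reaction_run(False, None, requested))
print(plan_reaction_run(True, {"settings": dict(requested)}, requested))
```

Raising an error in the third case, instead of silently starting over, mirrors the rule that archived output must be removed by hand.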
Useful files to inspect are:
reaction/
    state.json
    gamma_.../
        orientation_.../
            reaction_trace/
                trace.jsonl
                steps/step_*.xyz
                path_summary.csv
                candidate_ts/
                    highest_backend_energy.xyz
                    highest_total_energy.xyz
                    pre_product_geometry.xyz
                    max_bond_change.xyz
                    metadata.json
                trace_plots/
                    reaction_profile.png
backend_energy_hartree is the physical backend energy without the AFIR
bias. total_energy_hartree is the optimization objective, including AFIR,
that geomeTRIC follows. The candidate file
candidate_ts/highest_backend_energy.xyz is usually the first structure to
inspect for later NEB, string, dimer, or TS attempts. The file
candidate_ts/pre_product_geometry.xyz is based on the first persistent
connectivity change and should not be treated as a confirmed transition state.
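The difference between the two energies can be checked with a short Python sketch. It assumes that each trace record is one JSON object carrying both `backend_energy_hartree` and `total_energy_hartree`, plus a hypothetical `step` index; which file actually holds these fields is not stated here, and the sample values are invented for illustration.

```python
import json


def summarize_trace(lines):
    """Parse JSON-lines records and find the step with the highest backend energy."""
    records = [json.loads(line) for line in lines if line.strip()]
    for rec in records:
        # The AFIR contribution is whatever the objective adds to the physical energy.
        rec["afir_bias_hartree"] = (
            rec["total_energy_hartree"] - rec["backend_energy_hartree"]
        )
    highest = max(records, key=lambda rec: rec["backend_energy_hartree"])
    return records, highest


# Invented sample records; real traces will contain different fields and values.
sample_lines = [
    '{"step": 0, "backend_energy_hartree": -10.50, "total_energy_hartree": -10.40}',
    '{"step": 1, "backend_energy_hartree": -10.42, "total_energy_hartree": -10.37}',
    '{"step": 2, "backend_energy_hartree": -10.61, "total_energy_hartree": -10.60}',
]

records, highest = summarize_trace(sample_lines)
print("highest backend energy at step", highest["step"])
```

Ranking by the backend energy rather than the total energy keeps the AFIR bias out of the choice of a starting structure for later transition-state work.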
Legacy jobs.pkl reaction checkpoints are imported once when their gamma
schedule is unambiguous. If the formatted keys of a legacy checkpoint no
longer distinguish fractional gamma values, the import stops with a clear
error instead of resuming an uncertain calculation.
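To see why formatted keys can become ambiguous, consider a label that keeps only the integer part of gamma. The sketch below assumes a four-digit zero-padded label, chosen to match the gamma_0100 directory name; the real key format in legacy checkpoints is not documented here.

```python
def gamma_label(gamma: float) -> str:
    """Format gamma as a zero-padded integer label (assumed format)."""
    return f"gamma_{gamma:04.0f}"


def ambiguous_labels(schedule):
    """Map each label to the distinct gamma values that collapse onto it."""
    seen = {}
    for gamma in schedule:
        seen.setdefault(gamma_label(gamma), set()).add(gamma)
    return {label: sorted(vals) for label, vals in seen.items() if len(vals) > 1}


# An integer schedule keeps one label per gamma value.
print(ambiguous_labels([100.0, 550.0, 1000.0]))
# Fractional values that round to the same integer share a label.
print(ambiguous_labels([100.2, 100.4, 200.0]))
```

Once two gamma values share a label, the original schedule cannot be recovered from the labels alone, which is the situation in which refusing to resume is the safe choice.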
Solvation internals
solvate is the command name, but the internal route is broader than solvent
placement. It explores how a central core grows when units are added around
it. Microsolvation is a common use case, and so is adding ligands to a
transition metal center to build an organometallic complex.
pyar-cli -s solute.xyz solvent.xyz --software xtb -ss 10 -N 16
pyar-cli solvate solute.xyz solvent.xyz --software xtb -ss 10 -N 16
Solvation restart state is stored as readable JSON:
solvation/
    state.json
    state/
        geometries/
    aggregate_002/
    aggregate_003/
state.json records the input seeds, added fragment, calculation settings,
next cycle, completed cycles, and the current seeds to continue from.
Re-running an interrupted solvation with the same request resumes from the
last completed cycle and reuses the stored seed geometries.
For a chemistry researcher, the main outputs to inspect are:
solvation/state.json for restart and cycle progress
solvation/state/geometries/ for saved seed structures
the selected seed geometries from the final cycle
Bond-scan internals
Bond scanning evaluates a distance scan between two fragments. It is a simple way to probe whether a bond-forming or bond-breaking coordinate behaves as expected before committing to a more expensive reaction search.
pyar-cli --scan-bond 1 2 A.xyz B.xyz -N 8
pyar-cli scan-bond 1 2 A.xyz B.xyz -N 8
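The geometric idea behind a distance scan can be sketched in Python. The helper below rigidly translates one fragment along the axis between two chosen atoms until they sit at a target separation. It is an assumption-laden illustration: the function name is hypothetical, and whether PyAR moves fragments rigidly or relaxes the structure at each distance is not stated here.

```python
import math


def translate_fragment(coords_b, anchor_a, anchor_b, target):
    """Shift fragment B so that |anchor_a - anchor_b| equals the target distance."""
    current = math.dist(anchor_a, anchor_b)
    if current == 0.0:
        raise ValueError("anchor atoms coincide; the scan axis is undefined")
    # Displacement along the A->B axis that changes the distance to the target.
    shift = [(target - current) * (b - a) / current
             for a, b in zip(anchor_a, anchor_b)]
    return [[x + s for x, s in zip(atom, shift)] for atom in coords_b]


# Invented coordinates in angstrom: one atom of A at the origin, two atoms in B.
atom_a = [0.0, 0.0, 0.0]
fragment_b = [[2.0, 0.0, 0.0], [3.0, 0.0, 0.0]]

for distance in (2.0, 1.5, 1.0):
    moved = translate_fragment(fragment_b, atom_a, fragment_b[0], distance)
    print(distance, moved[0])
```

Each scan point starts from the original fragment coordinates, so rounding errors do not accumulate from one distance to the next.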