# Cookbook

Task-oriented recipes for the workflows that landed in v4.x. Each
section is self-contained — copy, paste, modify the inputs, run.

For background on the underlying CLI flags and Python API, see the
[programmatic API guide](api_guide.md) and the
[main README](README.md).

---

## 1. One file → one structured result

The lowest-friction path for notebooks and scripts. Replaces the older
15-positional-arg `calc_bbe()` constructor.

```python
from goodvibes import compute_thermo

r = compute_thermo("ethane.log", QH=True, spc="TZ", temperature=313.15)

print(f"qh-G(T) = {r.qh_gibbs_free_energy:.6f} Hartree")
print(f"point group: {r.point_group}, σ = {r.symmno}")
print(f"level of theory (auto-detected): {r.level_of_theory}")
print(f"frequency scale factor (auto-applied): {r.bbe.scale_fac}")
```

`compute_thermo` returns a frozen `ThermoResult` dataclass with every
attribute `calc_bbe` produces, plus `r.bbe` and `r.qcdata` for advanced
reads. Defaults match the CLI: gas-phase concentration (`P/RT`),
auto-lookup of the frequency scaling factor from the level of theory
via the Truhlar database.

---

## 2. Batch a directory with parallel parsing → DataFrame

The most common notebook workflow: parse hundreds of conformers,
filter, sort, export.

```python
import glob
from goodvibes import compute_batch, to_dataframe
from goodvibes.constants import KCAL_TO_AU

paths = sorted(glob.glob("conformers/*.log"))
results = compute_batch(paths, jobs=8)        # 8 worker processes

df = to_dataframe(results)
df = df.sort_values("qh_gibbs_free_energy")
df["ΔG_kcal"] = (df.qh_gibbs_free_energy - df.qh_gibbs_free_energy.min()) * KCAL_TO_AU

# Drop conformers more than 3 kcal/mol above the lowest
keep = df[df["ΔG_kcal"] < 3.0]
print(f"{len(keep)} of {len(df)} conformers within 3 kcal/mol of the lowest")
keep[["name", "qh_gibbs_free_energy", "ΔG_kcal"]].to_csv("survivors.csv", index=False)
```

`jobs=0` uses all CPU cores. Output preserves input order. Pandas is
optional — install with `pip install goodvibes[full]`.

The same thing from the shell, no Python:

```bash
goodvibes conformers/*.log --jobs 8 --csv all_thermo.csv
```

---

## 3. N-way selectivity (replaces `--ee`)

The v4.1 redesign generalizes `--ee a:b` (2-bucket only) to N-way
selectivity. Each bucket is named explicitly with `--label NAME=PATTERN`
(repeatable). The patterns are `fnmatch` globs against the input
filenames — no filesystem walks.

The example fixture in `goodvibes/examples/selectivity/` is a
Diels–Alder TS set: 8 transition states across two regiochemistries
(1,2- vs 1,4-) and two diastereomers (exo / endo).

**2-way (exo vs endo)**

```bash
cd goodvibes/examples/selectivity
goodvibes DA_*.out --label exo='*_exo_*' --label endo='*_endo_*'
```

```text
Selectivity, Boltzmann-averaged (gibbs, T = 298.15 K)
       Species   Files   Population (%)   ΔΔG (kcal/mol)
       exo           4             2.56            2.156
★      endo          4            97.44            0.000

Ratio exo:endo = 3:97   Major: endo   excess = 94.88%   ΔΔG = 2.16 kcal/mol

Selectivity, Lowest conformer only (gibbs, T = 298.15 K)
       Species   Files   Population (%)   ΔΔG (kcal/mol)
       exo           1             1.84            2.355
★      endo          1            98.16            0.000

Ratio exo:endo = 2:98   Major: endo   excess = 96.31%   ΔΔG = 2.36 kcal/mol
```

The two tables answer different questions: the Boltzmann row shows the
selectivity once you average over conformers; the lowest-conformer row
shows what the selectivity would be if only the most stable TS in each
species mattered. The gap between them tells you how much of the
selectivity is driven by conformer mixing.

**4-way (regio × stereo)**

```bash
goodvibes DA_*.out \
  --label exo_12='*_exo_12*'   --label endo_12='*_endo_12*' \
  --label exo_14='*_exo_14*'   --label endo_14='*_endo_14*'
```

For N > 2 the summary line drops `excess` and `ΔΔG` (those are 2-bucket
concepts) and just reports the ratio — `Ratio exo_12:endo_12:exo_14:endo_14 = 2:97:0:0`.

**Per-species subdirectories**

If your conformers are organized into one directory per species,
`--label` patterns are matched against the immediate parent
directory's basename in addition to the file's basename. So a
layout like

```text
selectivity_separated/
  exo/
    DA_exo_12_i.out
    DA_exo_12_ii.out
    ...
  endo/
    DA_endo_12_i.out
    ...
```

works with the directory names as labels:

```bash
cd selectivity_separated
goodvibes */*out --label exo='exo*' --label endo='endo*'
```

The shell expands `*/*out` to relative paths like `exo/DA_exo_12_i.out`,
and the `'exo*'` pattern matches the parent dir `exo`. The same
patterns also keep working on flat layouts (where the species is
encoded in the filename), so you don't need to know in advance
which layout your data uses.

**JSON output**

Add `--json results.json` and the file gets two top-level blocks,
`selectivity` and `selectivity_lowest`, each with the per-species
populations, ΔΔG, ee (when N=2), and the source files for every
species. Schema is `0.4`.

**Strip plot**

To visualize where the selectivity comes from — lowest-TS gap vs
conformer mixing — write a per-species ΔG strip plot:

```bash
goodvibes DA_*.out \
  --label exo='*_exo_*' --label endo='*_endo_*' \
  --strip-plot selectivity.png
```

The image shows one column per species with scattered conformer
ΔG values (relative to the global lowest). A tight cluster near the
bottom of a column means that species is dominated by its lowest
conformer; a wide spread means conformer mixing is contributing.

In Python:

```python
import matplotlib.pyplot as plt
from goodvibes import compute_batch
from goodvibes.selectivity import (
    compute_selectivity, parse_label_args, assign_files_to_labels,
)
from goodvibes.plot import plot_selectivity_strip

results = compute_batch(glob.glob("DA_*.out"))
thermo = {r.file: r.qh_gibbs_free_energy for r in results}
labels = parse_label_args(["exo=*_exo_*", "endo=*_endo_*"])
files_per_label = assign_files_to_labels(list(thermo), labels)
sel = compute_selectivity(thermo, files_per_label, 298.15)

ax = plot_selectivity_strip(sel, thermo)
plt.savefig("selectivity.png", dpi=200, bbox_inches="tight")
```

matplotlib is in the optional `[plot]` extras (or `[full]`) — install
with `pip install goodvibes[plot]`.

**Migration from `--ee`**

```bash
# v3.x
goodvibes *.log --ee 'P_R_*:P_S_*'

# v4.x equivalent
goodvibes *.log --label R='P_R_*' --label S='P_S_*'
```

`--ee` still works in v4.x with a `DeprecationWarning`; it's slated
for removal in v5.0.

---

## 4. PES with the new YAML format

The legacy line-based PES file (`--- # PES` markers) is auto-detected
and still works, but it isn't real YAML and has no stoichiometry
support. v4.2 adds a proper YAML schema with `pathways:` / `species:`
/ `format:` top-level keys and a `coeff*name` syntax for stoichiometric
sums.

```yaml
# azabor_PES_v2.yaml
pathways:
  Ph:
    - "R1-An + Aza-Phos"
    - "R1-Comp + THF"
    - "AmTS + THF"
    - "Azir-Comp + THF"
    - "OpenTS + THF"
    - "Syn-P + THF"

species:
  R1-An:      {files: "r1-li-3thf-c1*"}
  Aza-Phos:   {files: "azaoxy-phosphine-full*"}
  THF:        {files: "thf*"}
  R1-Comp:    {files: "r1-phosphine-2thf-full*"}
  Azir-Comp:  {files: "aziridinium-phos-full*"}
  Syn-P:      {files: "syn-product-phos-full*"}
  OpenTS:     {files: "openTS-phos-full*"}
  AmTS:       {files: "aminationTS-full-unfrz-c1*"}

format:
  units: kcal/mol
  decimals: 1
```

Stoichiometric example: a bimolecular reaction would write a point
as `"2*A + B"`. Each species' `files:` is a glob (single string) or
explicit list (`[a.log, b.log]`).

**Assigning species by directory.** When each species lives in its
own subdirectory, use `dir:` (single) or `dirs:` (list) instead of
file globs:

```yaml
species:
  R1-An:      {dir: "R1-An"}
  Aza-Phos:   {dir: "Aza-Phos"}
  AmTS:      {dir: "AmTS"}
  # combine if a species has both subdir conformers and a separate
  # explicit file:
  THF:        {files: "thf_extra.log", dir: "THF"}
```

`dir:` matches files whose immediate parent directory's basename
equals the value (or matches it as an fnmatch glob — `dir: "TS_*"`
catches every `TS_R/`, `TS_S/`, ...). Trailing `/`, `/*` or `/**`
on the dir name is ignored.

Run it from the directory above the per-species subdirectories:

```bash
cd goodvibes/examples/pes_separated
goodvibes */*log --spc tzpop --pes azabor_PES.yaml
```

The shell `*/*log` glob hands GoodVibes relative paths like
`R1-An/r1-li-3thf-c1.log` — the `dir: "R1-An"` rule sees `R1-An`
as the parent dir basename and assigns the file there.

Run it:

```bash
cd goodvibes/examples/pes
goodvibes *.log --pes azabor_PES_v2.yaml --spc sp_tzpop
```

By default each species' contribution is **gconf-corrected**: lowest
qh-G conformer + Boltzmann adjustment + the −R Σ pᵢ ln pᵢ mixing
entropy. Two flags change that:

| Mode | Flag | What it does |
| --- | --- | --- |
| gconf (default) | — | lowest + adjustment + mixing entropy |
| pure Boltzmann | `--nogconf` | Boltzmann-weighted average, no mixing entropy |
| lowest only | `--lowest-only` | use each species' single lowest qh-G conformer |

The mode tag appears in the table title:

```text
RXN: Ph  (kcal/mol)  at T = 298.15 K, p = 1 atm — lowest conformer per species
```

**Reaction-profile diagram**

```bash
goodvibes *.log --pes azabor_PES_v2.yaml --spc sp_tzpop \
                --pes-plot pes.png
```

Saves a step-plot of the pathway's qh-G profile (one column per
point, horizontal bar at each level, smooth bezier connectors).
matplotlib via `pip install goodvibes[plot]`.

If your PES YAML defines multiple pathways (e.g. an R-side and an
S-side TS sharing reactants and products), `--pes-plot` overlays
them on the same axes by default — different colors from the
matplotlib cycle, with a legend and shared x-axis. Pathways must
have the same number of points to be comparable.

For full control over colors, single-pathway selection, point
annotations, or per-conformer scatter, drop down to the API:

```python
from goodvibes.pes_loader import load_pes
from goodvibes.plot import plot_pes
import matplotlib.pyplot as plt

pes = load_pes("R_vs_S.yaml", thermo_data)        # 2-pathway YAML
ax = plot_pes(
    pes,
    colors=["#26a6a4", "#e76f51"],     # custom per-pathway palette
    connector_style="bezier",          # or "linear"
    label_points=True,                 # annotate ΔqhG at each level
)
plt.savefig("R_vs_S.png", dpi=200, bbox_inches="tight")

# Or pick one pathway and overlay individual conformer dots:
ax = plot_pes(
    pes, pathway_index=0,
    show_conformers=True,
    thermo_lookup={f: bbe.qh_gibbs_free_energy
                   for f, bbe in thermo_data.items()},
)
```

The legacy `--graph FILE.yaml` flag is still supported and reads
styling (dpi, color, title, legend, gridlines, ylim, ...) from a
YAML's `--- # FORMAT` block. It will be deprecated in v5.1 once
`--pes-plot` covers the remaining gaps.

---

## 5. PES + JSON for downstream analysis

```bash
goodvibes *.log --pes azabor_PES_v2.yaml --spc sp_tzpop --json pes.json
```

The JSON gets a `pes` block (schema v1.0):

```python
import json

with open("pes.json") as f:
    payload = json.load(f)

for path in payload["pes"]["pathways"]:
    print(f"\n=== {path['name']} ({path['units']}) ===")
    for pt in path["points"]:
        print(f"  {pt['label']:25s}  ΔqhG = {pt['relative']['qh_g']:+7.2f}")
```

Each point carries `label`, `species` (name + coefficient + resolved
files), and `relative` (Δ-values for E, ZPE, H, qh-H, T·S, T·qh-S, G,
qh-G, plus SPC variants when `--spc` was set). Plug straight into
plotting libraries or downstream pipelines.

---

## 6. Parse once, re-analyze many times

QC outputs are slow to parse, especially for large conformer ensembles
or composite-method SPCs. The unified v1.0 JSON payload (`--export`)
captures every parsed field once; subsequent runs read it back via
`--import` and skip the QC files entirely. Useful for re-running at a
different temperature, concentration, frequency cutoff, or quasi-RRHO
scheme without touching the original `.log`/`.out` files.

```bash
# First pass — parse + apply SPC + export the structured payload.
goodvibes conformers/*.log --spc TZ --export thermo.json

# Re-run at 350 K with the same files but no QC parsing. Re-pass --spc
# to keep the cached SPC numbers driving G(T)_SPC; drop it for plain G.
goodvibes --import thermo.json --spc TZ -t 350

# Re-run with a stricter Truhlar low-frequency cutoff. Still no parsing.
goodvibes --import thermo.json --spc TZ -f 150 --QH

# Combine cached --spc with selectivity at a new temperature.
goodvibes --import thermo.json --spc TZ -t 313.15 \
          --label R='cat_R*' --label S='cat_S*'
```

`--export` writes the same payload as `--json`, so a single file covers
both downstream pipelines and re-import. Once exported, the original
`.log`/`.out` files can be archived, moved, or deleted — `--import`
works with just the JSON. The `--spc` energies are cached on the QCData
record, so re-passing `--spc <suffix>` reuses them without ever
re-reading the SPC files.

---

## See also

- The full CLI flag table in the [main README](README.md).
- The [programmatic API reference](api_guide.md) for `compute_thermo`,
  `compute_batch`, `ThermoResult`, and `to_dataframe`.
- The full module reference covers `goodvibes.pes_loader`,
  `goodvibes.pes_model`, `goodvibes.selectivity`, etc., for users
  embedding GoodVibes in larger pipelines.