# `qlat_scripts.v1.gen_data`: Propagator Generation and Field Selection Utilities

Source: `qlat/qlat_scripts/v1/gen_data.py`

> **Note:** Update this document when updating the source file.

## Outline

- [Overview](#overview)
- [Terminology](#terminology)
- [Output Data Layout](#output-data-layout)
- [Inverter Setup](#inverter-setup)
- [Wall-Source Propagators](#wall-source-propagators)
- [Field Selection Weight Computation](#field-selection-weight-computation)
- [Random Field and Selection Probability](#random-field-and-selection-probability)
- [Sub-Sampling](#sub-sampling)
- [Selection Splitting](#selection-splitting)
- [Point-Source Propagators](#point-source-propagators)
- [HVP Computation](#hvp-computation)
- [Random U(1) Volume-Source Propagators](#random-u-1-volume-source-propagators)
- [Smeared-Source Propagators](#smeared-source-propagators)
- [Typical Workflow](#typical-workflow)
- [Key Parameters](#key-parameters)
- [Notes](#notes)

## Overview

This module provides the core data generation pipeline for RBC/UKQCD lattice QCD calculations. It handles:

- Wall-source and point-source quark propagator inversions
- Smeared-source propagators
- Random U(1) volume-source propagators
- Field/point selection with importance sampling (weight-based)
- Sub-sampling of existing selections
- HVP (hadronic vacuum polarization) computation
- AMA (all-mode-accumulation) error correction support

All `run_*` functions follow a **lazy-evaluation** pattern: they return a callable `get_*` (wrapped with `@q.lazy_call`) that loads or computes data on first invocation and caches the result. If the output already exists on disk, the function returns the loader immediately without performing any computation.

## Terminology

| Term | Meaning |
|------|---------|
| `job_tag` | Ensemble identifier, e.g. `"24D"`, `"test-4nt8"` |
| `traj` | Trajectory number (integer) |
| `inv_type` | Quark flavor index: `0` = light, `1` = strange, `2` = charm |
| `inv_acc` | Inversion accuracy: `0` = sloppy, `1` = medium, `2` = exact |
| `gf` | Gauge field (`q.GaugeField`) |
| `gt` | Gauge transformation (`q.GaugeTransform`) |
| `eig` | Eigenvectors for deflation (from `run_eig`) |
| `psel` | Point selection (`q.PointsSelection`) |
| `fsel` | Field selection (`q.FieldSelection`) |
| `psel_prob` | Point selection with probability weights (`q.SelectedPointsRealD`) |
| `fsel_prob` | Field selection with probability weights (`q.SelectedFieldRealD`) |
| `wi` | Wall-source index list: list of `(idx, tslice, inv_type, inv_acc)` tuples |
| `sfw` | Selected-field writer (`q.open_fields(..., "a")`) |
| `qar_sp` | QAR archive for point-selected props (`q.open_qar_info(...)`) |

## Output Data Layout

All outputs are saved under `{job_tag}/...` with trajectory subdirectory `traj-{traj}/`. The typical naming conventions:

- **Full propagators**: `{job_tag}/prop-wsrc-full-{flavor}/traj-{traj}/`
- **Sparse wall-source props**: `{job_tag}/prop-wsrc-{flavor}/traj-{traj}/`
- **Point-source props**: `{job_tag}/prop-psrc-{flavor}/traj-{traj}/`
- **Smeared-source props**: `{job_tag}/prop-smear-{flavor}/traj-{traj}/`
- **Random U(1) props**: `{job_tag}/prop-rand-u1-{flavor}/traj-{traj}/`
- **Random U(1) sparse props**: `{job_tag}/prop-rand-u1-{type}-sparse-{flavor}/traj-{traj}/`
- **HVP fields**: `{job_tag}/hvp-psrc-{flavor}/traj-{traj}/`
- **Field selection weights**: `{job_tag}/field-selection-weight/traj-{traj}/`
- **Point-selected propagators**: `{job_tag}/psel-prop-{type}-{flavor}/traj-{traj}/`

where `{flavor}` is `"light"`, `"strange"`, or a name from `quark_flavor_list`.

Data is written atomically: files are first saved with `.acc` suffix, then renamed upon completion via `q.qrename_info`.
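The write-then-rename convention can be illustrated with the standard library alone. The sketch below is not the module's implementation: `save_atomically` is a hypothetical helper, and `os.replace` stands in for `q.qrename_info`. It only shows why a reader never sees a partially written file under the final name.

```python
import os
import tempfile


def save_atomically(path, data):
    """Write `data` (bytes) to `path` via a temporary `.acc` file.

    Hypothetical helper for illustration; the real module uses
    `q.qrename_info` for the rename step.
    """
    path_acc = path + ".acc"
    with open(path_acc, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    # The final name only appears once the content is complete.
    os.replace(path_acc, path)


# Usage with a throwaway directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "weight.field")
    save_atomically(target, b"example payload")
    final_exists = os.path.exists(target)
    acc_left_over = os.path.exists(target + ".acc")
```

A job that is interrupted mid-write leaves only the `.acc` file behind, so a later run can tell finished output from unfinished output by name alone.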
## Inverter Setup

### `run_get_inverter(job_tag, traj, *, inv_type, get_gf, get_gt=None, get_eig=None)`

Pre-computes and caches the inverter for all accuracy levels (`inv_acc` = 0, 1, 2) for a given quark flavor. Calls `ru.get_inv(gf, job_tag, inv_type, inv_acc, gt=gt, eig=eig)` internally.

**Parameters:**

- `get_gf`: callable returning gauge field (required)
- `get_gt`: callable returning gauge transform (optional, defaults to `None`)
- `get_eig`: callable returning eigenvectors (optional, defaults to `None`)

## Wall-Source Propagators

### `compute_prop_wsrc_1(job_tag, traj, *, gf, gt, eig, idx, tslice, inv_type, inv_acc)`

Core wall-source inversion. Creates a wall source at `tslice`, applies the inverter, and returns the solution propagator.

**Returns:** `q.Prop`, the solution field.

### `run_prop_wsrc_full(job_tag, traj, *, inv_type, get_gf, get_eig, get_gt, get_wi)`

Computes **full** (un-sampled) wall-source propagators for all time slices and saves them along with their `qnorm_field` (used later for importance-sampled field selection). Skips if sparse wsrc data already exists.

**Output:** `{job_tag}/prop-wsrc-full-{flavor}/traj-{traj}/`

**Key parameters:**

- `get_wi`: callable returning the wall-source index list

### `run_prop_wsrc_sparse(job_tag, traj, *, inv_type, get_gf, get_gt, get_eig, get_psel, get_fsel, get_wi)`

Generates sparse (sampled) wall-source propagators by loading full wsrc data or performing on-the-fly inversions, then projecting onto `psel` and `fsel`. Saves both the selected-field propagator and the wall-sink projected propagator.

**Output:**

- `{job_tag}/prop-wsrc-{flavor}/traj-{traj}/` (selected-field data)
- `{job_tag}/psel-prop-wsrc-{flavor}/traj-{traj}/` (point-selected data + wall-sink)

**Behavior:** If full wsrc data is unavailable and `is_performing_inversion_if_no_full_prop_available` is `False` (default), prints a warning and returns.
## Field Selection Weight Computation

### `run_f_weight_from_wsrc_prop_full(job_tag, traj)`

Computes importance-sampling weights from full wall-source propagator norms. Returns a `get_f_weight` callable that yields a `q.FieldRealD(geo, 1)` with per-site weights averaged around 1.

**Algorithm:**

1. Loads `qnorm_field` from full wsrc data for both light and strange quarks
2. Computes global-sum-per-tslice profiles
3. Averages weight profiles across time slices
4. Combines light (25%) and strange (25%) with uniform baseline (50%)

**Output:** `{job_tag}/field-selection-weight/traj-{traj}/weight.field`

**Returns:** `get_f_weight` callable, or `None` if data is unavailable.

### `run_f_weight_uniform(job_tag, traj)`

Alternative to `run_f_weight_from_wsrc_prop_full`: creates a uniform weight field (`f_weight.set_unit()`). Useful for testing or when full wsrc data is not yet available.

**Returns:** `get_f_weight` callable.

### `run_f_weight_load(job_tag, traj)`

Loads an existing weight field from disk. Raises `Exception` if the file does not exist.

**Returns:** `get_f_weight` callable.

## Random Field and Selection Probability

### `run_f_rand_01(job_tag, traj)`

Generates a reproducible uniform random field in [0, 1) used for stochastic selection. The RNG seed is derived from `get_job_seed(job_tag)` and the trajectory number.

**Output:** `{job_tag}/field-selection-weight/traj-{traj}/f-rand-01.field`

**Returns:** `get_f_rand_01` callable yielding `q.FieldRealD(geo, 1)`.

### `run_fsel_prob(job_tag, traj, *, get_f_rand_01, get_f_weight)`

Creates the field selection (stochastic spatial sampling) with probability weights.

**Algorithm:**

1. Selects sites where `f_weight * fsel_rate >= f_rand_01`
2. Saves the `FieldSelection` and a `SelectedFieldRealD` of selection probabilities

**Parameters:**

- `get_f_rand_01`: set to `None` to load existing data
- `get_f_weight`: set to `None` to load existing data

**Output:**

- `{job_tag}/field-selection/traj-{traj}.field`
- `{job_tag}/field-selection-weight/traj-{traj}/fsel-prob.sfield`

**Returns:** `get_fsel_prob` callable yielding `q.SelectedFieldRealD(fsel, 1)`.

### `run_psel_prob(job_tag, traj, *, get_f_rand_01, get_f_weight, tag=None)`

Creates the point selection with probability weights. Same algorithm as `run_fsel_prob` but for points.

**Parameters:**

- `tag`: optional tag for named selections (e.g. `"small"`, `"median"`, `"large"`)

**Output:**

- `{job_tag}/points-selection/traj-{traj}.lati` (default, no tag)
- `{job_tag}/field-selection-weight/traj-{traj}/psel-prob.lat` (default, no tag)
- `{job_tag}/psel_{tag}/traj-{traj}/psel.lati` (with tag)
- `{job_tag}/psel_{tag}/traj-{traj}/psel-prob.lat` (with tag)

**Returns:** `get_psel_prob` callable yielding `q.SelectedPointsRealD(psel, 1)`.

### `run_fsel_from_fsel_prob(get_fsel_prob)` / `run_psel_from_psel_prob(get_psel_prob)`

Convenience wrappers that extract `fsel` / `psel` from the probability-weighted objects.

**Returns:** `lambda: get_fsel_prob().fsel` / `lambda: get_psel_prob().psel`, or `None` if input is `None`.

## Sub-Sampling

### `run_fsel_prob_sub_sampling(job_tag, traj, *, sub_sampling_rate, get_fsel_prob, get_f_rand_01, get_f_weight)`

Creates a sub-sample of an existing field selection. Approximately a fraction `sub_sampling_rate` of the original selection is kept.

**Parameters:**

- `sub_sampling_rate`: fraction in [0, 1]; `1.0` keeps the entire original selection
- `get_f_weight`: if `None`, uses `fsel_prob * sub_sampling_rate` as probability (not exactly equivalent to using `f_weight`)

**Returns:** `get_fsel_prob_sub` callable.
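The weight mixing, the selection rule `f_weight * fsel_rate >= f_rand_01`, and sub-sampling can be sketched on a toy one-dimensional "lattice" using plain Python lists. Everything here is illustrative: `mix_weights`, `select_sites`, and `normalized` are hypothetical helpers, the weight profiles are random numbers rather than propagator norms, and reusing the same random field with a reduced rate is an assumption about how sub-sampling behaves, not a transcription of the source.

```python
import random


def normalized(values):
    """Rescale values so that their mean is exactly one."""
    mean = sum(values) / len(values)
    return [v / mean for v in values]


def mix_weights(w_light, w_strange):
    """Combine two mean-one profiles with a uniform baseline (25% / 25% / 50%)."""
    return [0.25 * wl + 0.25 * ws + 0.5 for wl, ws in zip(w_light, w_strange)]


def select_sites(f_weight, f_rand_01, rate):
    """Return {site: probability} for sites where f_weight * rate >= f_rand_01."""
    selected = {}
    for site, (w, r) in enumerate(zip(f_weight, f_rand_01)):
        if w * rate >= r:
            selected[site] = min(1.0, w * rate)
    return selected


n_site = 1000
rng = random.Random(12345)  # fixed seed so the toy selection is reproducible

# Stand-ins for the light and strange norm profiles (assumed, not real data).
w_light = normalized([rng.random() + 0.5 for _ in range(n_site)])
w_strange = normalized([rng.random() + 0.5 for _ in range(n_site)])
f_weight = mix_weights(w_light, w_strange)
f_rand_01 = [rng.random() for _ in range(n_site)]

fsel_rate = 0.1
sub_sampling_rate = 0.5
fsel_prob = select_sites(f_weight, f_rand_01, fsel_rate)
# Same random field, smaller rate: every sub-sampled site is also in fsel_prob.
fsel_prob_sub = select_sites(f_weight, f_rand_01, fsel_rate * sub_sampling_rate)
```

Because both selections compare against the same random field, lowering the rate can only remove sites, which is why the sub-sample is automatically a subset of the original selection in this sketch.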
### `run_psel_prob_sub_sampling(job_tag, traj, *, sub_sampling_rate, get_psel_prob, get_f_rand_01, get_f_weight)`

Point-selection analogue of `run_fsel_prob_sub_sampling`. If `get_param(job_tag, "use_simple_psel_sub_sampling", default=False)` is `True`, simply takes the first `sub_sampling_rate * original_num` points.

**Returns:** `get_psel_prob_sub` callable.

## Selection Splitting

### `run_psel_split(job_tag, traj, *, get_psel, num_piece)` / `run_fsel_split(job_tag, traj, *, get_fsel, num_piece)`

Splits a point/field selection into `num_piece` sub-selections with increased spatial separation (for independent measurement chunks). `num_piece` should be a power of 2.

**Output:** `{job_tag}/points-selection-split/traj-{traj}/num-piece-{num_piece}/` or `{job_tag}/field-selection-split/traj-{traj}/num-piece-{num_piece}/`

**Returns:** `get_psel_list` callable yielding `list[q.PointsSelection]` of length `num_piece`.

## Point-Source Propagators

### `compute_prop_psrc(job_tag, traj, xg_src, inv_type, inv_acc, *, idx, gf, gt, sfw, qar_sp, psel, fsel, f_rand_01, sfw_hvp, qar_hvp_ts, eig)`

Core point-source inversion at position `xg_src`. Performs AMA-style field selection for the solution: sites with large propagator norm (above `field_selection_fsel_psrc_prop_norm_threshold`) are probabilistically added to the field selection.

**Saves:**

- Selected-field propagator (in `sfw`)
- Point-selected propagator + wall-sink projection (in `qar_sp`)
- Additional `fsel-prob-psrc-prop` field for the adaptive selection
- HVP contraction (if `sfw_hvp` / `qar_hvp_ts` are provided)

### `run_prop_psrc(job_tag, traj, *, inv_type, get_gf, get_eig, get_gt, get_psel, get_fsel, get_f_rand_01)`

Runs point-source propagator generation for all points in `psel` with AMA multi-accuracy. For each source point, sloppy inversion is always performed; medium and exact inversions are done stochastically based on `prob_acc_1_psrc` and `prob_acc_2_psrc`.
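The stochastic multi-accuracy scheme described above can be made concrete with a scalar toy model. This is a generic reweighting sketch, not the module's code: `ama_estimate` and `expectation` are hypothetical names, the three "solutions" are plain numbers, and deciding both accuracy levels from a single uniform draw `r` is an assumption. The point is only that dividing each correction by its probability makes the average equal to the exact value.

```python
def ama_estimate(r, sloppy, medium, exact, prob_acc_1, prob_acc_2):
    """Reweighted estimate for one source, given a uniform draw r in [0, 1)."""
    value = sloppy  # the sloppy solve is always available
    if r < prob_acc_1:
        # Medium-accuracy correction, reweighted by its probability.
        value += (medium - sloppy) / prob_acc_1
    if r < prob_acc_2:
        # Exact-accuracy correction, reweighted by its probability.
        value += (exact - medium) / prob_acc_2
    return value


def expectation(sloppy, medium, exact, prob_acc_1, prob_acc_2):
    """Average ama_estimate over r exactly by splitting [0, 1) at the thresholds."""
    edges = sorted({0.0, prob_acc_1, prob_acc_2, 1.0})
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mid = 0.5 * (lo + hi)  # the estimate is constant inside each interval
        total += (hi - lo) * ama_estimate(
            mid, sloppy, medium, exact, prob_acc_1, prob_acc_2
        )
    return total


mean_value = expectation(
    sloppy=1.00, medium=1.04, exact=1.05, prob_acc_1=0.25, prob_acc_2=0.125
)
```

With these illustrative numbers the expensive exact solve is needed for only one source in eight on average, yet `mean_value` agrees with the exact result up to rounding.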
**Output:**

- `{job_tag}/prop-psrc-{flavor}/traj-{traj}/`
- `{job_tag}/psel-prop-psrc-{flavor}/traj-{traj}/`
- `{job_tag}/hvp-psrc-{flavor}/traj-{traj}/` (HVP fields)
- `{job_tag}/hvp-sum-tslice-psrc-{flavor}/traj-{traj}/` (HVP time-slice sums)

## HVP Computation

### `calc_hvp_sum_tslice(chvp_16)`

Computes the HVP summed over spatial slices in all 4 directions from a `chvp_16` field (produced by `q.contract_chvp_16`).

**Returns:** `ld_hvp_ts`, a LatData with shape `[t_dir, t, mu, nu]` where `t_dir` ∈ {x, y, z, t}.

### `compute_hvp_average(job_tag, traj, *, inv_type, psel_prob, data_path, geo)`

Computes the AMA-corrected average HVP field from point-source HVP data, incorporating probability weights and source-position shifting.

**Returns:** `hvp_average`, a `q.FieldComplexD(geo, 16)`.

### `run_hvp_average(job_tag, traj, *, inv_type, get_psel_prob)`

Wrapper that loads or computes the average HVP field.

**Output:** `{job_tag}/hvp-average/traj-{traj}/hvp_average_{flavor}.field`

**Returns:** `load` callable yielding `q.FieldComplexD(geo, 16)`.

## Random U(1) Volume-Source Propagators

### `run_field_rand_u1_dict(job_tag, traj)`

Generates reproducible random U(1) fields for both `fsel` and `psel` source types, plus their conjugates.

**Output:** `{job_tag}/field-rand-u1/traj-{traj}/`

**Returns:** `get_field_rand_u1_dict` callable yielding a dict with keys `"fsel-src"`, `"fsel-src-dag"`, `"psel-src"`, `"psel-src-dag"`, each mapping to a `q.FieldComplexD`.

### `run_prop_sparse_rand_u1_src(job_tag, traj, *, inv_type, get_gf, get_psel, get_fsel, get_field_rand_u1_dict, get_psel_list=None, get_fsel_psel_list=None, get_eig=None)`

Computes propagators from random U(1) sparse sources defined on `psel_list` or `fsel_psel_list` sub-selections. Supports both dagger and non-dagger inversions with AMA multi-accuracy.
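For readers unfamiliar with random U(1) sources, the toy below shows what such a field and its conjugate look like: one unit-modulus complex phase per site, reproducible from a seed. It uses Python's `random` module rather than the module's RNG, and `make_rand_u1` and `conjugate` are hypothetical helpers introduced only for this illustration.

```python
import cmath
import math
import random


def make_rand_u1(n_site, seed):
    """Return n_site random phases exp(i * theta), reproducible for a given seed."""
    rng = random.Random(seed)
    return [cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi)) for _ in range(n_site)]


def conjugate(field):
    """Site-wise complex conjugate, the analogue of the '-dag' fields."""
    return [z.conjugate() for z in field]


src = make_rand_u1(8, seed=2024)
src_dag = conjugate(src)

# On every site the source times its conjugate is 1, because |exp(i*theta)| = 1.
products = [a * b for a, b in zip(src, src_dag)]
```

Phases on different sites are independent, so products of a source with the conjugate at a different site average to zero over many noise samples; only the same-site product survives, which is what makes such fields usable as stochastic sources.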
**Parameters:**

- Set exactly one of `get_psel_list` or `get_fsel_psel_list`
- `get_psel_list`: callable returning `list[q.PointsSelection]` for psel-type sources
- `get_fsel_psel_list`: callable returning `list[q.PointsSelection]` for fsel-type sources

**Output:**

- `{job_tag}/prop-rand-u1-{type}-sparse-{flavor}/traj-{traj}/`
- `{job_tag}/psel-prop-rand-u1-{type}-sparse-{flavor}/traj-{traj}/`

### `run_prop_rand_u1(job_tag, traj, *, inv_type, get_gf, get_fsel, get_eig=None)`

Computes random U(1) volume-source propagators using `q.mk_rand_u1_prop`. Each source uses a unique RNG seed derived from `(job_seed, traj, idx_rand_u1)`.

**Parameters:**

- Number of sources controlled by `get_param(job_tag, "n_rand_u1_fsel")`

**Output:** `{job_tag}/prop-rand-u1-{flavor}/traj-{traj}/`

## Smeared-Source Propagators

### `run_prop_smear(job_tag, traj, *, inv_type, get_gf, get_gf_ape, get_eig, get_gt, get_psel, get_fsel, get_psel_smear, get_psel_smear_median)`

Generates propagators from APE-smeared sources. The smearing parameters are read from job parameters:

- `get_param(job_tag, "prop_smear_coef")`
- `get_param(job_tag, "prop_smear_step")`

Both the original and smeared-sink solutions are saved.

**Output:**

- `{job_tag}/prop-smear-{flavor}/traj-{traj}/`
- `{job_tag}/psel-prop-smear-{flavor}/traj-{traj}/`
- `{job_tag}/psel_smear_median-prop-smear-{flavor}/traj-{traj}/`

## Typical Workflow

A typical data generation script (see `examples-py-gpt/gpt-qlat-auto-simple.py`) follows this sequence:

```python
from qlat_scripts.v1 import *

# job_tag, traj, and traj_gf are assumed to be set by the calling script.

# 1. Gauge field and transform
get_gf = run_gf(job_tag, traj_gf)
get_gt = run_gt(job_tag, traj_gf, get_gf)

# 2. Eigenvectors (for light quark deflation)
get_eig_light = run_eig(job_tag, traj_gf, get_gf)
get_eig_strange = run_eig_strange(job_tag, traj_gf, get_gf)

# 3. Wall-source index list
get_wi = run_wi(job_tag, traj)

# 4. Full wall-source propagators (for importance sampling weights)
run_prop_wsrc_full(job_tag, traj, inv_type=0, get_gf=get_gf,
                   get_eig=get_eig_light, get_gt=get_gt, get_wi=get_wi)
run_prop_wsrc_full(job_tag, traj, inv_type=1, get_gf=get_gf,
                   get_eig=get_eig_strange, get_gt=get_gt, get_wi=get_wi)

# 5. Field/point selection from wsrc propagator norms
get_f_weight = run_f_weight_from_wsrc_prop_full(job_tag, traj)
get_f_rand_01 = run_f_rand_01(job_tag, traj)
get_fsel_prob = run_fsel_prob(job_tag, traj,
                              get_f_rand_01=get_f_rand_01,
                              get_f_weight=get_f_weight)
get_psel_prob = run_psel_prob(job_tag, traj,
                              get_f_rand_01=get_f_rand_01,
                              get_f_weight=get_f_weight)
get_fsel = run_fsel_from_fsel_prob(get_fsel_prob)
get_psel = run_psel_from_psel_prob(get_psel_prob)

# 6. Sparse wall-source propagators (sampled from full data)
run_prop_wsrc_sparse(job_tag, traj, inv_type=0, get_gf=get_gf, get_gt=get_gt,
                     get_eig=get_eig_light, get_psel=get_psel,
                     get_fsel=get_fsel, get_wi=get_wi)

# 7. Point-source propagators (with AMA)
run_prop_psrc(job_tag, traj, inv_type=0, get_gf=get_gf,
              get_eig=get_eig_light, get_gt=get_gt, get_psel=get_psel,
              get_fsel=get_fsel, get_f_rand_01=get_f_rand_01)

# 8. Random U(1) propagators
run_prop_rand_u1(job_tag, traj, inv_type=0, get_gf=get_gf,
                 get_fsel=get_fsel, get_eig=get_eig_light)
```

### Alternative: Uniform Weights (Testing)

For testing or when full wsrc data is unavailable:

```python
get_f_weight = run_f_weight_uniform(job_tag, traj)
```

### Multi-Flavor with Extended Quark Masses

For ensembles with charm-like quarks (see `examples-py-gpt/gpt-qlat-data-gen-prop.py`):

```python
quark_mass_list = get_param(job_tag, "quark_mass_list")
for inv_type, mass in enumerate(quark_mass_list):
    # Only the light quark (inv_type 0) uses eigenvector deflation here.
    if inv_type == 0:
        get_eig = get_eig_light
    else:
        get_eig = None
    run_prop_rand_vol_u1_src(job_tag, traj, inv_type=inv_type, get_gf=get_gf,
                             get_psel=get_psel, get_fsel=get_fsel,
                             get_eig=get_eig)
```

## Key Parameters

These parameters are read via `get_param(job_tag, ...)`:

| Parameter | Description |
|-----------|-------------|
| `total_site` | Lattice dimensions, e.g. `[24, 24, 24, 64]` |
| `fermion_params` | Fermion action parameters per `(inv_type, inv_acc)` |
| `cg_params-{inv_type}-{inv_acc}` | CG solver parameters (`maxiter`, `maxcycle`) |
| `field_selection_fsel_rate` | Fraction of sites in field selection |
| `field_selection_psel_rate` | Fraction of sites in point selection |
| `field_selection_fsel_psrc_prop_norm_threshold` | Threshold for adaptive psrc field selection |
| `prob_exact_wsrc` | Probability of exact-accuracy wall-source inversion |
| `prob_acc_1_psrc` | Probability of medium-accuracy point-source inversion |
| `prob_acc_2_psrc` | Probability of exact-accuracy point-source inversion |
| `prob_acc_1_rand_u1` | Probability of medium-accuracy rand U(1) inversion |
| `prob_acc_2_rand_u1` | Probability of exact-accuracy rand U(1) inversion |
| `n_rand_u1_fsel` | Number of random U(1) sources |
| `prop_smear_coef` | APE smearing coefficient |
| `prop_smear_step` | APE smearing steps |
| `quark_flavor_list` | List of flavor names, e.g. `["light", "strange", "charm-1"]` |
| `quark_mass_list` | List of quark masses |
| `lanc_params` | Lanczos parameters for eigenvector generation |
| `clanc_params` | Chebyshev-Lanczos parameters |

## Notes

- All `run_*` functions use `q.obtain_lock()` / `q.release_lock()` for file-based mutual exclusion in multi-node environments.
- AMA (all-mode-accumulation) is implemented stochastically: sloppy inversions are always performed, while medium and exact inversions are done with the probabilities given by the corresponding `prob_acc_1_*` and `prob_acc_2_*` parameters. These probabilities are used later to reweight results.
- The `@q.lazy_call` decorator ensures each `get_*` callable executes its computation at most once.
- The `@q.timer` and `@q.timer_verbose` decorators provide automatic timing instrumentation.
- Functions decorated with `@q.timer(is_timer_fork=True)` support timer forking for parallel execution contexts.
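The compute-at-most-once behavior attributed to `@q.lazy_call` can be mimicked in a few lines. The decorator below, `lazy_call_sketch`, is a hypothetical stand-in written for this document, not the library's implementation; it only demonstrates the contract that a `get_*` callable does its work on the first call and returns the cached object afterwards.

```python
import functools


def lazy_call_sketch(func):
    """Cache the result of a zero-argument callable after its first call."""
    cache = []

    @functools.wraps(func)
    def wrapper():
        if not cache:
            cache.append(func())  # computed on first use only
        return cache[0]

    return wrapper


calls = []  # records how many times the expensive body actually runs


@lazy_call_sketch
def get_weight():
    calls.append(1)
    return [1.0, 1.0, 1.0]  # placeholder for loaded or computed data


first = get_weight()
second = get_weight()
```

Passing such callables between `run_*` functions, instead of the data itself, means a downstream step that finds its output already on disk never triggers the upstream load.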