alpha-forge explore

Manage exploration pipeline state and run the full pipeline in one command. These commands are used internally by the AI agent skill /explore-strategies.

| Subcommand | Description |
| --- | --- |
| run | Run backtest → optimize → WFT → DB registration end-to-end (main command) |
| import | Bulk-import a Markdown log into the exploration DB |
| log | Manually record an exploration trial to the DB |
| status | Show coverage map against a goal |
| result | Show details of the latest trial saved in the exploration DB |
| diagnose | Estimate whether a longer backtest period would let a WFT-failed strategy pass, via linear extrapolation of the trade rate (issue #685) |
| health | Detect consecutive failures and scaffold fixation from recent trials (quality gate for unattended runs) |
| recommend | Write next-exploration candidates to recommendations.yaml |
| coverage | Update or view parameter coverage YAML |

alpha-forge explore run

Runs validation → auto data fetch → backtest → optimize → walk-forward test (WFT) → coverage update → DB registration in a single command. Returns exit code 1 on failure (except --dry-run / --pre-check).
Called internally by the /explore-strategies agent skill.

alpha-forge explore run <SYMBOL> --strategy <NAME> --goal <GOAL> [--no-cleanup] [--dry-run] [--pre-check] [--json] [--db <PATH>]

| Option | Description | Default |
| --- | --- | --- |
| --strategy | Strategy name (required) | |
| --goal | Goal name; applies pre_filter / target_metrics from goals.yaml | default |
| --no-cleanup | Skip file / DB cleanup on failure (for debugging) | off |
| --dry-run | Print planned steps and exit without running | off |
| --pre-check | Run backtest only (default params), skip optimization and WFT (#321) | off |
| --json | Output result as JSON to stdout (deprecated: use alpha-forge explore result show <id> --json instead) | off |
| --db | Path to exploration DB (defaults to path from forge.yaml) | |

Using --pre-check

Use for rapid screening during strategy design. Optimization and WFT are not executed.

alpha-forge explore run SPY --strategy my_rsi_v1 --pre-check
alpha-forge explore run SPY --strategy my_rsi_v1 --pre-check --json

Sample text output with --pre-check:

📊 Pre-check (backtest, default params)
  Sharpe:     0.821
  MaxDD:      19.9%
  Trades:     24 ⚠️ low (may be insufficient for WFT windows)
  Signals:    31
  Pre-filter: FAIL ❌

→ Optimization and WFT are skipped.

Output JSON example

{
  "symbol": "SPY",
  "strategy_id": "spy_hmm_rsi_v3",
  "passed": false,
  "backtest": {
    "sharpe": 0.82,
    "max_dd": 19.9,
    "trades": 42
  },
  "pre_filter_pass": true,
  "wft_avg_sharpe": 1.12,
  "wft_target": 1.5,
  "skip_reason": "wft_failed",
  "cleanup_done": true,
  "entry_signals": 31
}

| Field | Description |
| --- | --- |
| passed | true when WFT meets target_metrics |
| skip_reason | Reason for skip/failure: validation_failed / no_signals / pre_filter_failed / wft_failed / pre_check_only / dry_run / null |
| cleanup_done | true when strategy JSON and result JSON were automatically removed on failure |
| entry_signals | Number of days with long entry signal (set during --pre-check; may be null for backward compatibility) |
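
A minimal sketch of how a caller might branch on these fields. The payload is copied from the example above; the `next_step` function and the action strings it returns are hypothetical and not part of alpha-forge.

```python
import json

# Payload copied from the "Output JSON example" above.
RUN_JSON = """
{
  "symbol": "SPY",
  "strategy_id": "spy_hmm_rsi_v3",
  "passed": false,
  "backtest": {"sharpe": 0.82, "max_dd": 19.9, "trades": 42},
  "pre_filter_pass": true,
  "wft_avg_sharpe": 1.12,
  "wft_target": 1.5,
  "skip_reason": "wft_failed",
  "cleanup_done": true,
  "entry_signals": 31
}
"""


def next_step(result: dict) -> str:
    """Map a run result to a follow-up action (illustrative policy only)."""
    if result["passed"]:
        return "done"
    reason = result.get("skip_reason")
    if reason == "pre_filter_failed":
        return "inspect pre_filter_diagnostics"
    if reason == "wft_failed":
        return "inspect wft_diagnostics, then consider explore diagnose"
    if reason in ("pre_check_only", "dry_run"):
        return "informational run, nothing to fix"
    # validation_failed, no_signals, or any value this sketch does not know.
    return f"investigate: {reason}"


if __name__ == "__main__":
    print(next_step(json.loads(RUN_JSON)))
```

Because --json on explore run is marked deprecated, the same branching would more likely be applied to the output of alpha-forge explore result show <id> --json; this sketch assumes both outputs expose passed and skip_reason.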

alpha-forge explore result show

Display the latest exploration result for a strategy from the DB. Use this to inspect failure details after alpha-forge explore run exits with code 1.

alpha-forge explore result show <STRATEGY_ID> [--goal <GOAL>] [--json] [--db <PATH>]

| Option | Description | Default |
| --- | --- | --- |
| --goal | Filter by goal name | |
| --json | Output result as JSON to stdout | off |
| --db | Path to exploration DB (defaults to path from forge.yaml) | |

Examples

# Display latest result in human-readable format
alpha-forge explore result show gc_bb_hmm_rsi_v1

# Filter by goal and output as JSON (includes wft_diagnostics and more)
alpha-forge explore result show gc_bb_hmm_rsi_v1 --goal commodities --json

Typical failure investigation flow after alpha-forge explore run returns exit code 1:

FORGE_CONFIG=forge.yaml alpha-forge explore run GC=F --strategy gc_bb_hmm_rsi_v1 --goal commodities
# exit code 1 → retrieve details from DB
FORGE_CONFIG=forge.yaml alpha-forge explore result show gc_bb_hmm_rsi_v1 --goal commodities --json

The --json output includes wft_diagnostics, pre_filter_diagnostics, and opt_metrics fields.

pre_filter_diagnostics structure (issue #409)

When skip_reason is "pre_filter_failed", the pre_filter_diagnostics field contains a structured {value, threshold, passed, gap} object for each criterion, so autonomous exploration agents can determine programmatically which criteria failed and by how much.

{
  "pre_filter_diagnostics": {
    "sharpe_ratio":      {"value": 0.716, "threshold": 1.0,  "passed": false, "gap": -0.284},
    "max_drawdown":      {"value": 1.66,  "threshold": 25.0, "passed": true,  "gap": 23.34},
    "trades":            {"value": 16,    "threshold": 30,   "passed": false, "gap": -14},
    "monthly_volume_usd":{"value": 10606.43, "threshold": 0.0, "passed": null, "note": "未チェック(monthly_volume_usd_min が 0 以下です)"},
    "verdict": "failed",
    "failed_criteria": ["sharpe_ratio", "trades"]
  }
}

| Field | Description |
| --- | --- |
| value | Observed metric from the backtest. For monthly_volume_usd the value is always computed, but when its threshold (monthly_volume_usd_min) is 0 or below the criterion is treated as "not evaluated": passed becomes null and a note is attached. The note text is emitted in Japanese regardless of locale |
| threshold | Threshold resolved from the pre_filter section of goals.yaml |
| passed | Whether the criterion is met (null means not evaluated) |
| gap | "value − threshold" (for max_drawdown it is "threshold − value"). Negative = shortfall, positive = headroom |
| verdict | "passed" if all criteria pass, otherwise "failed" |
| failed_criteria | Names of failed criteria in a stable order: sharpe_ratio, max_drawdown, trades |
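
The gap sign convention can be checked against the example above with a short sketch. The `expected_gap` and `worst_failure` helpers are hypothetical, and ranking failures by shortfall relative to the threshold is an illustrative choice for an agent, not something alpha-forge is documented to do.

```python
# Criteria copied from the pre_filter_diagnostics example above
# (monthly_volume_usd omitted because it is "not evaluated" there).
PRE_FILTER = {
    "sharpe_ratio": {"value": 0.716, "threshold": 1.0, "passed": False, "gap": -0.284},
    "max_drawdown": {"value": 1.66, "threshold": 25.0, "passed": True, "gap": 23.34},
    "trades": {"value": 16, "threshold": 30, "passed": False, "gap": -14},
    "verdict": "failed",
    "failed_criteria": ["sharpe_ratio", "trades"],
}


def expected_gap(name: str, value: float, threshold: float) -> float:
    """Recompute gap: value - threshold, reversed for max_drawdown."""
    if name == "max_drawdown":  # lower is better, so headroom is threshold - value
        return threshold - value
    return value - threshold


def worst_failure(diag: dict):
    """Return the failed criterion with the largest relative shortfall, or None."""
    failed = diag["failed_criteria"]
    if not failed:
        return None

    def relative_gap(name: str) -> float:
        scale = abs(diag[name]["threshold"]) or 1.0  # guard against a zero threshold
        return diag[name]["gap"] / scale

    return min(failed, key=relative_gap)


if __name__ == "__main__":
    for name in ("sharpe_ratio", "max_drawdown", "trades"):
        entry = PRE_FILTER[name]
        print(name, round(expected_gap(name, entry["value"], entry["threshold"]), 3))
    print("worst:", worst_failure(PRE_FILTER))
```

In this example trades misses its threshold by 14 of 30 (about 47%) while sharpe_ratio misses by about 28%, so the sketch singles out trades as the criterion to address first.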

wft_diagnostics structure (issue #684)

When skip_reason is "wft_insufficient_oos_data" or "wft_no_valid_oos_windows", the wft_diagnostics field contains structured per-window verdicts and an aggregate summary, mirroring the style of pre_filter_diagnostics. Agents can determine which windows failed and why.

{
  "wft_diagnostics": {
    "total_oos_trades": 17,
    "oos_trades_by_window": [3, 3, 0, 6, 5],
    "valid_windows": 4,
    "required_valid_windows": 3,
    "min_oos_trades_per_window": 3,
    "windows": [
      {
        "window_index": 1,
        "oos_trades": 3,
        "oos_metric": -0.01,
        "valid": true,
        "skip_reason": null,
        "failed_criteria": [],
        "criteria": {
          "min_trades":     {"value": 3, "threshold": 3, "passed": true, "gap": 0},
          "metric_finite":  {"value": -0.01, "passed": true}
        }
      },
      {
        "window_index": 3,
        "oos_trades": 0,
        "oos_metric": null,
        "valid": false,
        "skip_reason": null,
        "failed_criteria": ["min_trades", "metric_finite"],
        "criteria": {
          "min_trades":     {"value": 0, "threshold": 3, "passed": false, "gap": -3},
          "metric_finite":  {"value": null, "passed": false}
        }
      }
    ],
    "summary": {
      "total_windows": 5,
      "valid_windows": 4,
      "required_valid_windows": 3,
      "min_required_trades": 3,
      "min_valid_windows_ratio": 0.6,
      "min_trades_violated_windows": [3],
      "metric_invalid_windows": [3],
      "skipped_windows": []
    }
  }
}

| Field | Description |
| --- | --- |
| windows[].window_index | 1-based window index |
| windows[].oos_trades | Number of trades during the OOS period |
| windows[].oos_metric | OOS optimization metric (NaN/inf normalized to null) |
| windows[].valid | True iff both min_trades and metric_finite pass |
| windows[].failed_criteria | List of failed criteria (min_trades, metric_finite, window_skip:<reason>) |
| windows[].criteria | Per-criterion {value, threshold, passed, gap}; in the example above, metric_finite carries only value and passed |
| summary.min_trades_violated_windows | 1-based indices where min_trades failed |
| summary.metric_invalid_windows | 1-based indices where the metric was NaN/inf/None |
| summary.skipped_windows | 1-based indices where the engine itself skipped the window |
| summary.required_valid_windows | Required valid windows = ceil(total × min_valid_windows_ratio) |

The legacy fields (total_oos_trades, oos_trades_by_window, valid_windows, required_valid_windows, min_oos_trades_per_window) are kept alongside the new fields for backward compatibility.
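
As a rough illustration, the summary counts can be recomputed from the legacy oos_trades_by_window list. This sketch checks only the min_trades criterion (it ignores metric_finite and engine-level skips), and the function names are hypothetical. The rounding step before ceil is a defensive choice of this sketch against floating-point noise, not documented behavior.

```python
import math


def required_valid_windows(total_windows: int, min_valid_windows_ratio: float) -> int:
    """ceil(total x ratio), rounding first so 10 * 0.7 does not become 8."""
    return math.ceil(round(total_windows * min_valid_windows_ratio, 9))


def summarize_windows(oos_trades_by_window, min_trades, min_valid_windows_ratio):
    """Recompute a subset of the summary block using the min_trades criterion only."""
    violated = [
        index
        for index, trades in enumerate(oos_trades_by_window, start=1)  # 1-based
        if trades < min_trades
    ]
    total = len(oos_trades_by_window)
    return {
        "total_windows": total,
        "valid_windows": total - len(violated),
        "required_valid_windows": required_valid_windows(total, min_valid_windows_ratio),
        "min_trades_violated_windows": violated,
    }


if __name__ == "__main__":
    # Values taken from the wft_diagnostics example above.
    print(summarize_windows([3, 3, 0, 6, 5], min_trades=3, min_valid_windows_ratio=0.6))
```

With the example values this yields 5 total windows, 4 valid, 3 required, and window 3 flagged, which agrees with the summary block shown above.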

alpha-forge explore diagnose

Estimate whether a longer backtest period would let a WFT-failed strategy pass, using linear extrapolation of the trade rate (issue #685). Designed as a follow-up to alpha-forge explore result show when you see wft_failed.

alpha-forge explore diagnose <STRATEGY_ID> [--goal <GOAL>] [--periods 10y,20y,30y] \
                                    [--windows 5] [--min-oos-trades 3] \
                                    [--db <PATH>] [--json]

| Option | Description | Default |
| --- | --- | --- |
| --goal | Filter records by goal | DB-attached goal |
| --periods | Comma-separated periods to evaluate (e.g. 10y,20y,30y) | 10y,20y,30y |
| --windows | WFT window count | goals.yaml wft config or 5 |
| --min-oos-trades | Required OOS trades per window | goals.yaml wft config or 3 |
| --json | JSON output | off |

Extrapolation logic

  • trade_rate = total_trades / current_period_years
  • For each scenario: expected = trade_rate × (period / windows), with period in years, giving the expected trades per window
  • ratio = expected / min_oos_trades_per_window
  • pass_probability: ratio ≥ 3 → 90%, ≥ 2 → 70%, ≥ 1.5 → 50%, ≥ 1 → 30%, < 1 → 0%
  • recommendation is the shortest period whose pass_probability is at least 70%. If none qualifies, it falls back to ≥ 50%, then to the highest-probability scenario. Returns null if every scenario is 0%.
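
The steps above can be sketched as follows. This is an illustration of the documented formulas, not the actual implementation: the function names are hypothetical, and how ties are broken in the "highest" fallback is an assumption (the first such scenario wins here).

```python
def pass_probability(ratio: float) -> float:
    """Tiered probability from the documented ratio thresholds."""
    if ratio >= 3:
        return 0.9
    if ratio >= 2:
        return 0.7
    if ratio >= 1.5:
        return 0.5
    if ratio >= 1:
        return 0.3
    return 0.0


def extrapolate(total_trades, current_years, periods, windows=5, min_oos_trades=3):
    """Linear extrapolation of the trade rate over each candidate period."""
    trade_rate = total_trades / current_years  # trades per year
    scenarios = []
    for period in periods:
        years = float(period.rstrip("y"))  # "10y" -> 10.0
        expected = trade_rate * (years / windows)  # trades per window
        ratio = expected / min_oos_trades
        scenarios.append({
            "period": period,
            "years": years,
            "expected": expected,
            "ratio": ratio,
            "pass_probability": pass_probability(ratio),
        })
    return trade_rate, scenarios


def recommend(scenarios):
    """Shortest period at >= 0.7, else >= 0.5, else the highest; None if all are 0."""
    for floor in (0.7, 0.5):
        eligible = [s for s in scenarios if s["pass_probability"] >= floor]
        if eligible:
            return min(eligible, key=lambda s: s["years"])
    best = max(scenarios, key=lambda s: s["pass_probability"], default=None)
    if best is None or best["pass_probability"] == 0:
        return None
    return best


if __name__ == "__main__":
    # Inputs taken from the sample output in this section: 1167 trades over 20 years.
    rate, scenarios = extrapolate(1167, 20.0, ["10y", "20y", "30y"])
    print(round(rate, 2), recommend(scenarios)["period"])
```

Fed with the inputs from the sample output in this section, the sketch gives a trade rate of 58.35/y, about 116.7 expected trades per window for the 10y scenario, and a 10y recommendation.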

Sample output

WFT diagnose: nvda_ema_macd_supertrend_lt_v1 (symbol=NVDA, goal=long-term-stocks, skip_reason=wft_failed)

Current observation:
  backtest_period: 20.0y  total_trades: 1167  trade_rate: 58.35/y
  wft_windows: 5  min_oos_trades_per_window: 3

Extrapolation by period:
  ✓ 10.0y / 2.0y/window → ~116.7 trades/window (req 3, ratio 38.9, pass_prob ≈ 90%)
  ✓ 20.0y / 4.0y/window → ~233.4 trades/window (req 3, ratio 77.8, pass_prob ≈ 90%)
  ✓ 30.0y / 6.0y/window → ~350.1 trades/window (req 3, ratio 116.7, pass_prob ≈ 90%)

Recommendation:
  goals.yaml: exploration.backtest_period: "10y"
  alpha-forge data fetch NVDA --provider yfinance --period 10y --interval 1d
  Estimated pass probability: ~90% (tier: high)

alpha-forge explore health

Aggregate the most recent N trials and detect consecutive failures or scaffold fixation (issue #408). Designed to be invoked at the start of every iteration of the unattended /explore-strategies --runs 0 loop, so structural failures (scaffold bugs, goals.yaml drift) can be caught early instead of burning runs forever.

alpha-forge explore health --goal <GOAL> [--last N] [--strict] [--json] [--db <PATH>]

| Option | Description | Default |
| --- | --- | --- |
| --goal | Goal name to aggregate | default |
| --last | Number of recent trials to analyze | 5 |
| --strict | Exit with code 1 when escalation: true (used to break the unattended loop). Returns 0 when only warning: true (issue #467) | off |
| --json | Output result as JSON to stdout | off |
| --db | Path to exploration DB (defaults to path from forge.yaml) | |

Output JSON example

{
  "goal": "default",
  "last_n": 5,
  "pass_rate": 0.0,
  "failure_breakdown": {"pre_filter_failed": 3, "no_signals": 2},
  "scaffold_transformation_rate": 1.0,
  "most_common_combo": "ATR+BB+RSI",
  "same_combo_streak": 5,
  "escalation": true,
  "warning": false,
  "escalation_type": "scaffold_degradation",
  "major_indicator_concentration": {"EMA": 0.8, "SUPERTREND": 0.4, "BB": 0.2},
  "recommended_actions": [
    "Pass rate over the last 5 trials is 0%. Check pre_filter thresholds, target symbols, and candidate indicators in goals.yaml.",
    "All recent trials had their indicators transformed by the scaffold. Inspect the indicator filters in `alpha_forge.strategy.scaffold` (see alpha-forge issues #399 and #400)."
  ]
}

| Field | Description |
| --- | --- |
| last_n | Actual number of trials analyzed (capped by DB row count when fewer than --last exist) |
| pass_rate | Ratio of trials with passed=True (0.0–1.0) |
| failure_breakdown | Failure counts grouped by skip_reason |
| scaffold_transformation_rate | Ratio of trials whose scaffold transformed the requested indicators (excluding the auto-added ATR-only case) |
| same_combo_streak | How many of the most recent trials share the same indicator_combo |
| escalation | true when pass_rate==0 and the root cause is scaffold-related (scaffold_transformation_rate>=0.5, or the mid-range between 0.1 and 0.5). Hard-stop signal (issue #467) |
| warning | true when pass_rate==0 AND same_combo_streak==last_n AND scaffold_transformation_rate<=0.1 (only agent_selection_bias). Loop continues (exit 0) and the agent is expected to switch to a different indicator combo on the next run (issue #467) |
| escalation_type | Cause classification (issues #436 / #467): "scaffold_degradation" (escalation) / "agent_selection_bias" (warning) / null |
| major_indicator_concentration | Indicator-level concentration over the recent N scaffold trials (issue #858). Maps {indicator: containment_ratio (0–1)}. dominant_combo aggregates full-string combos, so ATR+EMA+SUPERTREND and ATR+EMA+HMM+SMA count as distinct; this field aggregates per indicator instead and so exposes dominant indicators such as EMA. ATR is excluded (auto-added by scaffold) |
| recommended_actions | Human-facing remediation hints derived from the detected pattern |
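
To illustrate the difference between combo-level and indicator-level aggregation, here is a small sketch. It assumes that containment_ratio means the share of recent trials whose "+"-separated combo contains the indicator; that reading, the function name, and the sample combos are assumptions made for illustration.

```python
def indicator_concentration(combos):
    """Share of trials containing each indicator, with ATR excluded."""
    if not combos:
        return {}
    counts = {}
    for combo in combos:
        # A set avoids double-counting an indicator repeated within one combo.
        for indicator in set(combo.split("+")) - {"ATR"}:
            counts[indicator] = counts.get(indicator, 0) + 1
    return {name: count / len(combos) for name, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical recent trials: no full-string combo dominates, but EMA does.
    recent = [
        "ATR+EMA+SUPERTREND",
        "ATR+EMA+HMM+SMA",
        "ATR+EMA+SUPERTREND",
        "ATR+EMA+RSI",
        "ATR+BB+RSI",
    ]
    print(indicator_concentration(recent))
```

In this sample the most frequent full-string combo covers only 2 of 5 trials, yet EMA appears in 4 of 5, which is the kind of concentration the per-indicator view is meant to surface.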

Escalation rules

If the DB contains fewer than --last rows for the goal, the report stays observational (escalation: false and warning: false are both forced) and never blocks the loop. Once enough history accumulates, the report takes one of the following shapes:

  • 0% pass rate and scaffold transformation rate >= 50% → escalation: true / escalation_type: "scaffold_degradation" (hard stop)
  • 0% pass rate and all of the most recent N trials share the same indicator_combo:
      • scaffold transformation rate <= 10% → warning: true / escalation: false / escalation_type: "agent_selection_bias" (loop continues; downgraded to warning by issue #467 because the agent can resolve it by picking a different combo)
      • mid-range (10% < rate < 50%) → conservatively classified as escalation: true / "scaffold_degradation"
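
The rules above can be expressed as a small decision function. This is a sketch under stated assumptions rather than the actual implementation: the function names are hypothetical, and the rules do not say what happens for a mid-range rate when the recent trials do not all share one combo, so the sketch raises no flag in that case.

```python
def classify_health(n_rows, last, pass_rate, transformation_rate, same_combo_streak):
    """Apply the documented escalation rules to aggregated trial statistics."""
    report = {"escalation": False, "warning": False, "escalation_type": None}
    # Too little history, or at least one pass: stay observational.
    if n_rows < last or pass_rate > 0:
        return report
    if transformation_rate >= 0.5:
        report.update(escalation=True, escalation_type="scaffold_degradation")
    elif same_combo_streak >= last:  # all recent trials share one indicator_combo
        if transformation_rate <= 0.1:
            report.update(warning=True, escalation_type="agent_selection_bias")
        else:  # mid-range: conservatively treated as a scaffold problem
            report.update(escalation=True, escalation_type="scaffold_degradation")
    return report


def strict_exit_code(report, strict=True):
    """With --strict, only escalation maps to exit code 1; warnings exit 0."""
    return 1 if strict and report["escalation"] else 0


if __name__ == "__main__":
    # Statistics from the "Output JSON example" above.
    report = classify_health(n_rows=5, last=5, pass_rate=0.0,
                             transformation_rate=1.0, same_combo_streak=5)
    print(report, strict_exit_code(report))
```

For the statistics in the JSON example (0% pass rate, transformation rate 1.0) the sketch reports escalation with type scaffold_degradation, matching the documented output.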

Use inside the unattended skill

# Run at the start of every iteration of /explore-strategies
FORGE_CONFIG=forge.yaml alpha-forge explore health \
  --goal default --last 5 --strict --json
# exit code 1 → surface recommended_actions to a human and break the loop

alpha-forge explore recommend show

Displays the next exploration candidates from recommendations.yaml (produced by auto-relax / analyze-exploration).

alpha-forge explore recommend show [--recs <path>] [--json] [--validated-only]

| Option | Description | Default |
| --- | --- | --- |
| --recs <path> | Path to recommendations.yaml | data/explorer/recommendations.yaml |
| --json | Emit JSON | off |
| --validated-only | Strict mode: only show candidates whose consumable strategy_id exists in the DB (intended for agent consumption, issue #831) | off |

Agent consumption notes (issue #831)

Each candidate carries two fields: strategy_id (the consumable identifier) and variant_of (a reference to the base strategy it was derived from). When passing a candidate to explore run / backtest run, always use strategy_id. Using variant_of often fails with ValueError: Unknown template name because the base is no longer registered in the DB / template registry.

Example output (post issue #831):

Recommendations (auto-relax / 2026-05-19T13:10:59+00:00)
  #1 NVDA ATR+EMA+SUPERTREND (score=1.083) [strategy_id: nvda_ema_supertrend_v2_optimized] [variant of nvda_ema_supertrend_v2 (missing)]
     auto-relax variant

  • [strategy_id: ...] is what agents / users should pass to --strategy.
  • [variant of <id> (missing)]: the (missing) marker indicates that the base is no longer in the DB (informational only; do not consume it).
  • Candidates with strategy_id=None AND a DB-missing variant_of are automatically removed from the file.
  • With --validated-only, only candidates whose strategy_id exists in the DB are shown, so agents can safely forward the value to --strategy.
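
A minimal sketch of the consumption rule, for agents that filter candidates themselves. The JSON shape of recommend show --json is not documented in this section, so the candidate dictionaries, the `db_ids` set, and the `gone_base` identifier below are all hypothetical. Only the rule itself (forward strategy_id, never variant_of) comes from the notes above.

```python
def consumable_ids(candidates, db_ids):
    """Return strategy_id values that are safe to pass to --strategy.

    Mirrors the intent of --validated-only: keep a candidate only when its
    strategy_id is set and present in the DB. variant_of is never returned.
    """
    return [
        candidate["strategy_id"]
        for candidate in candidates
        if candidate.get("strategy_id") is not None
        and candidate["strategy_id"] in db_ids
    ]


if __name__ == "__main__":
    # Hypothetical candidates; the first mirrors the example output above.
    candidates = [
        {"strategy_id": "nvda_ema_supertrend_v2_optimized",
         "variant_of": "nvda_ema_supertrend_v2"},
        {"strategy_id": None, "variant_of": "gone_base"},
    ]
    db_ids = {"nvda_ema_supertrend_v2_optimized"}
    print(consumable_ids(candidates, db_ids))
```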