Advanced workflows
==================

This page highlights optional steps available when training models. The
``advanced_demo.ipynb`` notebook demonstrates the commands.

Grid search
-----------

Run ``mlcls-train -g`` to search a wider range of hyperparameters. The job
records metrics for each candidate and stores the best estimator under
``artefacts/``. To run grid search on a single pipeline, pass the model option,
for example for the random-forest or gradient-boosting pipeline::

   mlcls-train --model random_forest -g
   mlcls-train --model gboost -g
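
The search itself is not shown on this page. As a rough illustration of what
"records metrics for each candidate and keeps the best estimator" can look
like, here is a minimal sketch that assumes scikit-learn's ``GridSearchCV``.
The synthetic data, the parameter grid and the scoring choice are hypothetical
and are not taken from ``mlcls-train``::

   from sklearn.datasets import make_classification
   from sklearn.ensemble import RandomForestClassifier
   from sklearn.model_selection import GridSearchCV

   # Synthetic data stands in for the project's training set.
   X, y = make_classification(n_samples=200, n_features=8, random_state=0)

   # Hypothetical grid; the real search space lives in the project code.
   param_grid = {"n_estimators": [10, 30], "max_depth": [3, None]}

   search = GridSearchCV(
       RandomForestClassifier(random_state=0),
       param_grid,
       scoring="roc_auc",
       cv=3,
   )
   search.fit(X, y)

   # One mean cross-validated score per candidate parameter combination.
   for params, score in zip(
       search.cv_results_["params"], search.cv_results_["mean_test_score"]
   ):
       print(params, round(score, 3))

   # The best candidate, refitted on all of X and y.
   best_model = search.best_estimator_

A fitted estimator like ``best_model`` can then be written to disk with
``joblib.dump``.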

Calibration
-----------

After training a model you can calibrate the predicted probabilities::

   mlcls-eval --calibrate isotonic

This fits a calibration model on the validation fold and reports the Brier
score in the output table. The calibrated estimator is saved with the suffix
``_calibrated.joblib``.
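
To make the idea concrete, the following sketch fits an isotonic calibrator on
a made-up validation fold and compares Brier scores before and after. It
assumes NumPy and scikit-learn; the synthetic scores and variable names are
hypothetical, and the internals of ``mlcls-eval --calibrate`` may differ::

   import numpy as np
   from sklearn.isotonic import IsotonicRegression
   from sklearn.metrics import brier_score_loss

   rng = np.random.default_rng(0)

   # Hypothetical validation fold: labels follow p_true, but the model's
   # raw scores (p_true squared) are systematically too low.
   p_true = rng.uniform(0.05, 0.95, size=2000)
   y_val = (rng.uniform(size=2000) < p_true).astype(int)
   raw_scores = p_true ** 2

   # Isotonic regression learns a monotone map from raw score to probability.
   calibrator = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
   calibrated = calibrator.fit_transform(raw_scores, y_val)

   # Brier score: mean squared gap between probability and outcome
   # (lower is better). Both scores use the fold the calibrator was fitted
   # on, so the improvement shown here is optimistic.
   brier_before = brier_score_loss(y_val, raw_scores)
   brier_after = brier_score_loss(y_val, calibrated)
   print(f"Brier before: {brier_before:.3f}, after: {brier_after:.3f}")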

Run the standalone helper to draw reliability plots for both models::

   python -m src.calibration

The script reads the saved models and generates ``*_calibration.png``
files under ``artefacts/``.
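
A reliability plot compares the mean predicted probability in each bin with
the observed share of positives. The sketch below computes those points in
plain Python; ``reliability_bins`` is a hypothetical helper written for
illustration and is not part of ``src.calibration``::

   def reliability_bins(y_true, y_prob, n_bins=5):
       """Return (mean predicted prob, observed positive rate, count) per bin."""
       sums = [[0.0, 0, 0] for _ in range(n_bins)]  # prob sum, positives, count
       for label, prob in zip(y_true, y_prob):
           # Equal-width bins on [0, 1]; prob == 1.0 goes into the last bin.
           idx = min(int(prob * n_bins), n_bins - 1)
           sums[idx][0] += prob
           sums[idx][1] += label
           sums[idx][2] += 1
       # Empty bins are skipped because they have no point to plot.
       return [(s / n, pos / n, n) for s, pos, n in sums if n > 0]

   y_true = [0, 0, 1, 0, 1, 1, 1, 1]
   y_prob = [0.1, 0.15, 0.3, 0.35, 0.6, 0.7, 0.9, 0.95]

   bins = reliability_bins(y_true, y_prob, n_bins=4)
   for mean_pred, frac_pos, count in bins:
       print(f"predicted={mean_pred:.3f} observed={frac_pos:.2f} n={count}")

Points close to the diagonal (predicted equal to observed) indicate
well-calibrated probabilities.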

Fairness checks
---------------

Use the evaluation command with ``--group-col`` to compute group metrics such
as statistical parity and equal opportunity. Repeat the flag to check several
columns. The summary table includes an ``equal_opp`` column showing the ratio
of the worst to the best true positive rate across groups::

   mlcls-eval --group-col gender --group-col marital

This command prints parity ratios for each group and stores them in
``artefacts/group_metrics.csv``. ``summary_metrics.csv`` records the
``equal_opp`` ratio for each model. The same file also stores ``eq_odds``,
which is the difference between the true- and false-positive rate gaps.
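
The two ratios can be illustrated with a few lines of plain Python. This is a
sketch under common definitions (selection rate for statistical parity, true
positive rate for equal opportunity, each reported as worst divided by best);
the helper names are hypothetical and the tool's exact formulas may differ::

   from collections import defaultdict

   def group_rates(y_true, y_pred, groups):
       """Per-group selection rate and true positive rate."""
       stats = defaultdict(lambda: {"n": 0, "selected": 0, "pos": 0, "tp": 0})
       for label, pred, group in zip(y_true, y_pred, groups):
           s = stats[group]
           s["n"] += 1
           s["selected"] += pred
           s["pos"] += label
           s["tp"] += label * pred
       return {
           g: {
               "selection_rate": s["selected"] / s["n"],
               "tpr": s["tp"] / s["pos"] if s["pos"] else float("nan"),
           }
           for g, s in stats.items()
       }

   def worst_to_best(values):
       """Ratio of the smallest to the largest value (1.0 means parity)."""
       values = list(values)
       return min(values) / max(values) if max(values) > 0 else float("nan")

   # Toy labels, thresholded predictions and a made-up group column.
   y_true = [1, 1, 0, 0, 1, 1, 0, 0]
   y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
   gender = ["f", "f", "f", "f", "m", "m", "m", "m"]

   rates = group_rates(y_true, y_pred, gender)
   parity = worst_to_best(r["selection_rate"] for r in rates.values())
   equal_opp = worst_to_best(r["tpr"] for r in rates.values())
   print(f"parity ratio={parity:.2f}, equal opportunity ratio={equal_opp:.2f}")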

Set a custom probability cutoff for these metrics with ``--threshold``. When
the option is omitted, the tool chooses the cutoff that maximises the Youden J
statistic::

   mlcls-eval --group-col gender --threshold 0.6
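
The Youden J statistic is the true positive rate minus the false positive
rate, and the default cutoff is the one where it peaks. A minimal sketch of
that selection in plain Python follows; ``youden_threshold`` is a hypothetical
helper, and it assumes both classes occur in ``y_true``::

   def youden_threshold(y_true, y_prob):
       """Return (cutoff, J) for the cutoff that maximises J = TPR - FPR."""
       positives = sum(y_true)
       negatives = len(y_true) - positives
       best_j, best_cut = -1.0, None
       # Every distinct score is a candidate cutoff: predict 1 when p >= cut.
       for cut in sorted(set(y_prob)):
           tp = sum(1 for y, p in zip(y_true, y_prob) if p >= cut and y == 1)
           fp = sum(1 for y, p in zip(y_true, y_prob) if p >= cut and y == 0)
           j = tp / positives - fp / negatives
           if j > best_j:  # ties keep the lowest cutoff
               best_j, best_cut = j, cut
       return best_cut, best_j

   y_true = [0, 0, 0, 1, 0, 1, 1, 1]
   y_prob = [0.1, 0.2, 0.3, 0.4, 0.55, 0.6, 0.8, 0.9]

   cutoff, j = youden_threshold(y_true, y_prob)
   print(f"cutoff={cutoff}, J={j:.2f}")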

Select specific pipelines with ``--models``. Pass multiple names to evaluate
only those models::

   mlcls-eval --models logreg random_forest svm

The ``advanced_demo.ipynb`` notebook walks through these steps and shows the
additional plots.

SHAP values
-----------

Provide a DataFrame to ``logreg_coefficients`` or ``tree_feature_importances``
and pass ``shap_csv_path`` to store per-feature SHAP values::

   from src.feature_importance import logreg_coefficients

   # X_test is assumed to be a feature DataFrame that is already in scope.
   shap_df = logreg_coefficients(
       "artefacts/lr.joblib",
       shap_csv_path="artefacts/logreg_shap_values.csv",
       X=X_test,
   )

The helper function ``compute_shap_values`` creates the table with columns
matching the input DataFrame.
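
For intuition about what such a table holds, SHAP values of a linear model
have a closed form when features are treated as independent: each value is the
coefficient times the feature's deviation from its mean. The sketch below
applies that formula in plain Python. It is an illustration only;
``linear_shap_values`` and the toy data are hypothetical, and
``compute_shap_values`` may use a different method::

   def linear_shap_values(weights, rows):
       """phi_j(x) = w_j * (x_j - mean_j), assuming independent features."""
       n = len(rows)
       means = [sum(row[j] for row in rows) / n for j in range(len(weights))]
       return [
           [w * (x - m) for w, x, m in zip(weights, row, means)]
           for row in rows
       ]

   feature_names = ["age", "income"]  # hypothetical column names
   weights = [0.5, -2.0]              # hypothetical model coefficients
   rows = [[1.0, 0.0], [3.0, 1.0], [5.0, 2.0]]

   # One row per sample, one column per input feature.
   shap_rows = linear_shap_values(weights, rows)
   for row in shap_rows:
       print(dict(zip(feature_names, row)))

Each row sums to the difference between that sample's linear score and the
average score, which is the additivity property SHAP values are built around.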

SHAP plots
----------

Use ``plot_shap_summary`` to visualise these values::

   from src.feature_importance import plot_shap_summary

   plot_shap_summary(
       "artefacts/lr.joblib",
       X=X_test,
       png_path="artefacts/logreg_shap.png",
   )

The image ``logreg_shap.png`` appears under ``artefacts/``.

Report artefacts
----------------

Gather recent metrics and plots with the report command::

   mlcls-report

The tool copies the latest files into ``report_artifacts/``. You can zip the
folder and share it with collaborators.
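
Zipping can be done from Python with the standard library. A small sketch,
assuming the folder sits in the current working directory; ``zip_report`` is a
hypothetical helper and not part of ``mlcls-report``::

   import shutil
   from pathlib import Path

   def zip_report(folder="report_artifacts"):
       """Create <folder>.zip next to the folder and return the archive path."""
       folder = Path(folder)
       archive = shutil.make_archive(
           str(folder),                  # archive path without the .zip suffix
           "zip",
           root_dir=str(folder.parent),  # paths in the archive are relative to this
           base_dir=folder.name,         # keep the folder name inside the archive
       )
       return Path(archive)

   # Only archive when the report folder actually exists.
   if Path("report_artifacts").is_dir():
       print(zip_report("report_artifacts"))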