Advanced workflows
==================

This page highlights optional steps available when training models. The
``advanced_demo.ipynb`` notebook demonstrates the commands.

Grid search
-----------

Run ``mlcls-train -g`` to search a wider range of parameters. The job records
metrics for each candidate and stores the best estimator under ``artefacts/``.

To perform grid search on a specific pipeline, such as ``random_forest`` or
``gboost``, pass the model option::

    mlcls-train --model random_forest -g
    mlcls-train --model gboost -g

Calibration
-----------

After training a model you can calibrate the predicted probabilities::

    mlcls-eval --calibrate isotonic

This fits a calibration model on the validation fold and reports the Brier
score in the output table. The calibrated estimator is saved with the suffix
``_calibrated.joblib``.

Run the standalone helper to draw reliability plots for both models::

    python -m src.calibration

The script reads the saved models and generates ``*_calibration.png`` files
under ``artefacts/``.

Fairness checks
---------------

Use the evaluation command with ``--group-col`` to compute group metrics such
as statistical parity and equal opportunity. The summary table now includes an
``equal_opp`` column showing the ratio of the worst to the best true positive
rate across groups::

    mlcls-eval --group-col gender --group-col marital

This command prints parity ratios for each group and stores them in
``artefacts/group_metrics.csv``. ``summary_metrics.csv`` records the
``equal_opp`` ratio for each model. It also stores ``eq_odds``, which is the
difference between the true- and false-positive rate gaps.

Set a custom probability cutoff for these metrics with ``--threshold``. When
omitted, the tool chooses the cutoff that maximises the Youden J statistic::

    mlcls-eval --group-col gender --threshold 0.6

Select specific pipelines with ``--models``. Pass multiple names to evaluate
only those models::

    mlcls-eval --models logreg random_forest svm

The ``advanced_demo.ipynb`` notebook walks through these steps and shows the
additional plots.

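To make the fairness quantities above concrete, the following is a minimal
pure-Python sketch of a Youden J cutoff and a worst-to-best true positive rate
ratio. It is an illustration only: the function names ``youden_threshold`` and
``equal_opportunity_ratio`` and the toy data are hypothetical, and the way
``mlcls-eval`` computes these values internally may differ.

```python
from collections import defaultdict


def youden_threshold(y_true, y_prob):
    """Return the observed score that maximises J = TPR - FPR.

    Assumes y_true contains at least one positive and one negative label.
    Ties are resolved in favour of the lowest cutoff.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(y_prob)):
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= cutoff)
        fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= cutoff)
        j = tp / positives - fp / negatives
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff


def equal_opportunity_ratio(y_true, y_pred, groups):
    """Return (worst TPR / best TPR, per-group TPR) over the given groups."""
    hits = defaultdict(int)
    positives = defaultdict(int)
    for y, pred, group in zip(y_true, y_pred, groups):
        if y == 1:  # only actual positives contribute to the TPR
            positives[group] += 1
            hits[group] += pred
    tpr = {group: hits[group] / positives[group] for group in positives}
    best = max(tpr.values())
    ratio = min(tpr.values()) / best if best > 0 else float("nan")
    return ratio, tpr


# Hypothetical toy data, invented purely for illustration.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.7, 0.4, 0.6, 0.8, 0.1, 0.3]
gender = ["f", "f", "f", "f", "m", "m", "m", "m"]

# Default behaviour: pick the cutoff from the Youden J statistic.
default_cutoff = youden_threshold(y_true, y_prob)

# Custom behaviour: a fixed cutoff, analogous to passing --threshold 0.6.
custom_pred = [int(p >= 0.6) for p in y_prob]
custom_ratio, custom_tpr = equal_opportunity_ratio(y_true, custom_pred, gender)

print("Youden cutoff:", default_cutoff)
print("Per-group TPR at 0.6:", custom_tpr)
print("equal_opp-style ratio at 0.6:", custom_ratio)
```

A ratio of 1 means every group has the same true positive rate at the chosen
cutoff. Values closer to 0 indicate a larger gap between the worst and best
group.
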
SHAP values
-----------

Provide a DataFrame to ``logreg_coefficients`` or ``tree_feature_importances``
and pass ``shap_csv_path`` to store per-feature SHAP values::

    from src.feature_importance import logreg_coefficients

    # X_test is the DataFrame of features to explain (prepared beforehand).
    shap_df = logreg_coefficients(
        "artefacts/lr.joblib",
        shap_csv_path="artefacts/logreg_shap_values.csv",
        X=X_test,
    )

The helper function ``compute_shap_values`` creates the table with columns
matching the input DataFrame.

SHAP plots
----------

Use ``plot_shap_summary`` to visualise these values::

    from src.feature_importance import plot_shap_summary

    # Reuses the same fitted model and feature DataFrame as above.
    plot_shap_summary(
        "artefacts/lr.joblib",
        X=X_test,
        png_path="artefacts/logreg_shap.png",
    )

The image ``logreg_shap.png`` appears under ``artefacts/``.

Report artefacts
----------------

Gather recent metrics and plots with the report command::

    mlcls-report

The tool copies the latest files into ``report_artifacts/``. You can zip the
folder and share it with collaborators.
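
One possible way to zip the report folder from Python is sketched below, using
only the standard library. The helper name ``zip_report`` is hypothetical and
not part of the project. The demonstration runs against a temporary folder with
a placeholder file so that it works without a real ``report_artifacts/``
directory.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def zip_report(report_dir, archive_stem):
    """Zip the contents of report_dir into <archive_stem>.zip.

    Returns the path of the archive that was written.
    """
    report_dir = Path(report_dir)
    if not report_dir.is_dir():
        raise FileNotFoundError(f"{report_dir} is not a directory")
    # make_archive appends the ".zip" suffix and returns the final file name.
    return Path(shutil.make_archive(str(archive_stem), "zip", root_dir=report_dir))


# Demonstration on a throwaway folder; the file below is only a placeholder.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "report_artifacts"
    demo_dir.mkdir()
    (demo_dir / "placeholder.txt").write_text("demo file\n")

    archive = zip_report(demo_dir, Path(tmp) / "report_artifacts")
    archive_suffix = archive.suffix
    with zipfile.ZipFile(archive) as zf:
        archived_names = zf.namelist()

print(archive_suffix, archived_names)
```

In a real checkout, calling ``zip_report("report_artifacts", "report_artifacts")``
after ``mlcls-report`` would write ``report_artifacts.zip`` to the current
directory.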