📝 Summary
- Content: Built an ensemble model for a new competition
- Conclusion: Completed parameter tuning for individual models and weight tuning for the ensemble

- 📝 Summary
- 🎯 Goal
- 📍 Current Position and Challenges
- 💬 Prompts and Chat Log
- Overall workflow for today and implementation of “1. Preprocessing”
- CatBoost Version — Implementation of “2. Model Building”
- CatBoost Version — Implementation of “3. Test Data Preprocessing”, “4. Prediction”, “5. Submission”
- XGBoost Version — Implementation of “2. Model Building”
- XGBoost / LightGBM Version — Implementation of “3. Test Data Preprocessing”
- XGBoostClassifier Version — Implementation of “2. Model Building”
- LightGBM Version — Implementation of “2. Model Building”
- Ensemble Version — Implementation of “2. Model Building” and “4. Prediction”
- 📊 Results (Score & Ranking)
- 📚 Insights
- 🔮 Next
🎯 Goal
This time, I’m joining a new competition and trying to implement a base model using the ensemble method that performed well in Episode 2!
📍 Current Position and Challenges
- Starting the Kaggle Playground Series competition “Predicting Irrigation Need”❗
- In the previous competition I focused on understanding the data, but in the competition before that, model selection brought the more noticeable performance improvement
- In this new competition, I want to improve performance by combining model selection and data understanding
The challenges this time are:
“First, build a base model by comparing and combining models”
“Create prompts that generate complex code combining multiple models”
“Ensure the performance of the constructed base model”
These three points need to be cleared.
💬 Prompts and Chat Log
*Blue text indicates prompts sent to the AI*
Overall workflow for today and implementation of “1. Preprocessing”
I want to take on a Kaggle competition. Nice to work with you.
I ultimately want to build an ensemble model using CatBoost, XGBClassifier, and LightGBM. I will upload the sample submission and the first few rows of the training data.
Here is the overall structure I want the code to follow:
5-Step code
1. Training data preprocessing ⇒ X_train, y_train
2. Model building ⇒ model
3. Test data preprocessing ⇒ X_test
4. Prediction ⇒ y_pred
5. Submission ⇒ submission.csv
First, I want to implement:
1. Training data preprocessing ⇒ X_train, y_train
I think we need to handle categorical variables for each model.
For CatBoost: categorical column indices → cat_idx and original X_train
For XGBoost and LightGBM: X_train_encoded (numerically encoded), and encoders X_le for reuse during inference
For the target variable, should we also prepare y_train_encoded and y_le?
If there is any other necessary preparation, please let me know. If everything is fine, please output only the preprocessing code.
Uploaded files: 5 rows of 📝sample_submission.csv and 100 rows of 📝train.csv
Copilot’s Response
🌱 Conclusion: What needs to be prepared in preprocessing
✔ For CatBoost
• X_train (original DataFrame)
• cat_idx (indices of categorical columns)
• Target variable can remain as string
✔ For XGBoost / LightGBM
• X_train_encoded (LabelEncoded features)
• X_le (LabelEncoders for each column)
• y_train_encoded (integer-encoded target)
• y_le (LabelEncoder for target)
✔ Other
• Auto-detection of numeric and categorical columns
• Prepare encoders so the same ones are used for test data
• Keep categorical column indices for LightGBM (same after encoding)
🧩 Preprocessing Code (Step 1 Only)
import pandas as pd
from sklearn.preprocessing import LabelEncoder
# =========================================
# 1. Training Data Preprocessing
# =========================================
# --- Target variable ---
y_train = train_df["Irrigation_Need"] # For CatBoost (keep as string)
y_le = LabelEncoder()
y_train_encoded = y_le.fit_transform(y_train) # For XGB / LGBM
# --- Features ---
X_train = train_df.drop(columns=["Irrigation_Need"])
# --- Extract categorical columns ---
cat_cols = X_train.select_dtypes(include=["object"]).columns.tolist()
num_cols = X_train.select_dtypes(exclude=["object"]).columns.tolist()
# CatBoost categorical column indices
cat_idx = [X_train.columns.get_loc(c) for c in cat_cols]
# --- LabelEncoding for XGBoost / LightGBM ---
X_train_encoded = X_train.copy()
X_le = {} # store encoders
for col in cat_cols:
    le = LabelEncoder()
    X_train_encoded[col] = le.fit_transform(X_train[col])
    X_le[col] = le
# LightGBM categorical column indices (same after encoding)
lgbm_cat_idx = cat_idx
CatBoost Version — Implementation of “2. Model Building”
Thanks! Next, let’s implement the CatBoost-only model building!
Please make it run on GPU (T4×2).
I want to tune using KFold (5 splits) and Optuna (20 trials).
Is it enough to use y_train, X_train, and cat_idx as inputs?
Please output the model-building code. The tuned model should be model_cat, and the best parameters best_params_cat.
Can we use OOF predictions for ensemble weighting? Should we store them as oof_cat?
Copilot’s Response
🌱 Are y_train / X_train / cat_idx enough?
Yes, completely enough.
- CatBoost accepts categorical columns as strings
- cat_idx enables fast GPU training
- y_train can remain as string (CatBoost handles it internally)
🌟 Can OOF predictions be used for ensemble?
Absolutely.
- Weight optimization with Optuna
- Second-layer stacking
- Stable model comparison
All can use oof_cat.
🔥 CatBoost (GPU: T4×2) × Optuna (20 trials) × KFold (5 splits)
Below is Step 2 model-building code, runnable as-is.
🚀 Step 2: CatBoost Model Building (GPU / Optuna / KFold)
import optuna
from sklearn.model_selection import KFold
from sklearn.metrics import balanced_accuracy_score
from catboost import CatBoostClassifier
import numpy as np
# =========================================
# 2. CatBoost Model Building (GPU + Optuna)
# =========================================
N_SPLITS = 5
N_TRIALS = 20
kf = KFold(n_splits=N_SPLITS, shuffle=True, random_state=42)
oof_cat = np.zeros(len(X_train)) # store OOF predictions
# -----------------------------
# Optuna objective
# -----------------------------
def objective_cat(trial):
    params = {
        "iterations": trial.suggest_int("iterations", 300, 1500),
        "depth": trial.suggest_int("depth", 4, 10),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3),
        "l2_leaf_reg": trial.suggest_float("l2_leaf_reg", 1e-3, 10.0, log=True),
        "bagging_temperature": trial.suggest_float("bagging_temperature", 0.0, 5.0),
        "random_strength": trial.suggest_float("random_strength", 0.0, 5.0),
        "border_count": trial.suggest_int("border_count", 32, 255),
        "task_type": "GPU",
        "devices": "0,1",  # T4 × 2
        "loss_function": "MultiClass",
        "eval_metric": "BalancedAccuracy",
        "verbose": False
    }
    oof_pred = np.zeros(len(X_train))
    for train_idx, valid_idx in kf.split(X_train):
        X_tr, X_val = X_train.iloc[train_idx], X_train.iloc[valid_idx]
        y_tr, y_val = y_train.iloc[train_idx], y_train.iloc[valid_idx]
        model = CatBoostClassifier(**params)
        model.fit(
            X_tr, y_tr,
            cat_features=cat_idx,
            eval_set=(X_val, y_val),
            verbose=False
        )
        pred = model.predict(X_val)
        oof_pred[valid_idx] = pred.reshape(-1)
    score = balanced_accuracy_score(y_train, oof_pred)
    return score
# -----------------------------
# Run Optuna
# -----------------------------
study_cat = optuna.create_study(direction="maximize")
study_cat.optimize(objective_cat, n_trials=N_TRIALS)
best_params_cat = study_cat.best_params
best_params_cat["task_type"] = "GPU"
best_params_cat["devices"] = "0,1"
best_params_cat["loss_function"] = "MultiClass"
best_params_cat["eval_metric"] = "BalancedAccuracy"
best_params_cat["verbose"] = False
print("Best params (CatBoost):")
print(best_params_cat)
# -----------------------------
# Train final model on full data
# -----------------------------
model_cat = CatBoostClassifier(**best_params_cat)
model_cat.fit(
    X_train, y_train,
    cat_features=cat_idx,
    verbose=False
)
# -----------------------------
# Recompute OOF predictions
# -----------------------------
oof_cat = np.zeros(len(X_train))
for train_idx, valid_idx in kf.split(X_train):
    X_tr, X_val = X_train.iloc[train_idx], X_train.iloc[valid_idx]
    y_tr, y_val = y_train.iloc[train_idx], y_train.iloc[valid_idx]
    model_tmp = CatBoostClassifier(**best_params_cat)
    model_tmp.fit(
        X_tr, y_tr,
        cat_features=cat_idx,
        verbose=False
    )
    pred = model_tmp.predict(X_val)
    oof_cat[valid_idx] = pred.reshape(-1)
print("OOF Balanced Accuracy:", balanced_accuracy_score(y_train, oof_cat))
Fixed 3 bugs (sent error messages to Copilot):
1. GPU device format: changed devices from "0,1" to "0:1"
2. loss_function and eval_metric mismatch → changed eval_metric from "BalancedAccuracy" to "Accuracy"
3. y_train cannot be string → changed y_train to y_train_encoded
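Putting the three fixes together, the corrected fixed settings might look like the sketch below. The dict name `fixed_params` and the comments are mine, mirroring the fix list above, not Copilot's exact output:

```python
# Sketch of the fixed (non-tuned) CatBoost settings after the three fixes.
# The name `fixed_params` is illustrative, not from the original code.
fixed_params = {
    "task_type": "GPU",
    "devices": "0:1",            # fix 1: colon-separated, not "0,1"
    "loss_function": "MultiClass",
    "eval_metric": "Accuracy",   # fix 2: "BalancedAccuracy" was rejected here
    "verbose": False,
}

# Fix 3: fit on the integer-encoded target rather than the raw strings, e.g.
# model_cat.fit(X_train, y_train_encoded, cat_features=cat_idx, verbose=False)
```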
CatBoost Version — Implementation of “3. Test Data Preprocessing”, “4. Prediction”, “5. Submission”
It’s fixed! Thank you.
I would like to run inference and submission with this model alone for now and check the score.
Please output the code for Step 3 (test data preprocessing), Step 4 (prediction), and Step 5 (submission).
Copilot’s Response
🌱 Step3: Test Data Preprocessing (same as Train)
CatBoost can handle categorical variables as strings, so we only need to match the structure of X_train.
# =========================================
# 3. Test Data Preprocessing
# =========================================
# Assuming test_df is already loaded
X_test = test_df.copy()
# Use the same categorical columns as train
cat_cols_test = X_test.select_dtypes(include=["object"]).columns.tolist()
# CatBoost can handle string categories as-is
# No additional processing required
🌾 Step4: Prediction (CatBoost only)
# =========================================
# 4. Prediction
# =========================================
y_pred_test_encoded = model_cat.predict(X_test).reshape(-1)
# Predictions are integers (0,1,2), so convert back to original labels
y_pred_test = y_le.inverse_transform(y_pred_test_encoded)
📦 Step5: Create Submission File (match sample_submission)
Your uploaded sample_submission_5.csv format:
id,Irrigation_Need
630000,Low
630001,Low
...
We will match this format.
# =========================================
# 5. Submission File Creation
# =========================================
submission = pd.DataFrame({
    "id": test_df["id"],
    "Irrigation_Need": y_pred_test
})
submission.to_csv("submission_catboost.csv", index=False)
print("Created submission_catboost.csv.")
🎉 CatBoost-only submission is now complete!
Your 5‑Step workflow is now:
- Preprocessing (done)
- CatBoost model building (done)
- Test preprocessing (done)
- Prediction (done)
- Submission (done)
Now you can upload to Kaggle and check your score.
The full code is available here
The CatBoost-only score was 0.95802
XGBoost Version — Implementation of “2. Model Building”
Next, please implement the XGBoost-only model.
Assume Step 1 (training data preprocessing) is already done. Please output only Step 2 (model building).
Use GPU (T4×2), KFold (5 splits), and Optuna (20 trials).
Can we use y_train_encoded and X_train_encoded as inputs?
Copilot’s Response
import optuna
from sklearn.model_selection import KFold
from sklearn.metrics import balanced_accuracy_score
import xgboost as xgb
import numpy as np
# =========================================
# 2. XGBoost Model Building (GPU + Optuna)
# =========================================
N_SPLITS = 5
N_TRIALS = 20
kf = KFold(n_splits=N_SPLITS, shuffle=True, random_state=42)
oof_xgb = np.zeros(len(X_train_encoded))
# -----------------------------
# Optuna objective
# -----------------------------
def objective_xgb(trial):
    params = {
        "objective": "multi:softmax",
        "num_class": len(np.unique(y_train_encoded)),
        "tree_method": "gpu_hist",
        "predictor": "gpu_predictor",
        "gpu_id": 0,  # Use GPU0 only (stable)
        "max_depth": trial.suggest_int("max_depth", 3, 10),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.5, 1.0),
        "min_child_weight": trial.suggest_int("min_child_weight", 1, 10),
        "gamma": trial.suggest_float("gamma", 0.0, 5.0),
        "lambda": trial.suggest_float("lambda", 1e-3, 10.0, log=True),
        "alpha": trial.suggest_float("alpha", 1e-3, 10.0, log=True),
        "n_estimators": trial.suggest_int("n_estimators", 200, 1500)
    }
    oof_pred = np.zeros(len(X_train_encoded))
    for train_idx, valid_idx in kf.split(X_train_encoded):
        X_tr, X_val = X_train_encoded.iloc[train_idx], X_train_encoded.iloc[valid_idx]
        y_tr, y_val = y_train_encoded[train_idx], y_train_encoded[valid_idx]
        model = xgb.XGBClassifier(**params)
        model.fit(
            X_tr, y_tr,
            eval_set=[(X_val, y_val)],
            verbose=False
        )
        pred = model.predict(X_val)
        oof_pred[valid_idx] = pred
    score = balanced_accuracy_score(y_train_encoded, oof_pred)
    return score
# -----------------------------
# Run Optuna
# -----------------------------
study_xgb = optuna.create_study(direction="maximize")
study_xgb.optimize(objective_xgb, n_trials=N_TRIALS)
best_params_xgb = study_xgb.best_params
best_params_xgb["objective"] = "multi:softmax"
best_params_xgb["num_class"] = len(np.unique(y_train_encoded))
best_params_xgb["tree_method"] = "gpu_hist"
best_params_xgb["predictor"] = "gpu_predictor"
best_params_xgb["gpu_id"] = 0
print("Best params (XGBoost):")
print(best_params_xgb)
# -----------------------------
# Train full model
# -----------------------------
model_xgb = xgb.XGBClassifier(**best_params_xgb)
model_xgb.fit(
    X_train_encoded, y_train_encoded,
    verbose=False
)
# -----------------------------
# Recompute OOF
# -----------------------------
oof_xgb = np.zeros(len(X_train_encoded))
for train_idx, valid_idx in kf.split(X_train_encoded):
    X_tr, X_val = X_train_encoded.iloc[train_idx], X_train_encoded.iloc[valid_idx]
    y_tr, y_val = y_train_encoded[train_idx], y_train_encoded[valid_idx]
    model_tmp = xgb.XGBClassifier(**best_params_xgb)
    model_tmp.fit(X_tr, y_tr, verbose=False)
    pred = model_tmp.predict(X_val)
    oof_xgb[valid_idx] = pred
print("OOF Balanced Accuracy (XGB):", balanced_accuracy_score(y_train_encoded, oof_xgb))
The error message indicated that "gpu_hist" was not supported, so I gave up on the GPU and switched to CPU instead.
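The failure is consistent with recent XGBoost releases, where `gpu_hist`, `gpu_id`, and `predictor` were deprecated (in 2.0) and later removed in favor of a single `device` parameter. A sketch of the parameter change, assuming XGBoost ≥ 2.0 (values illustrative, not tuned):

```python
# Parameter style for XGBoost >= 2.0: a single "device" switch replaces
# "gpu_hist" / "gpu_id" / "predictor". Values here are illustrative.
params_gpu = {
    "objective": "multi:softmax",
    "tree_method": "hist",   # "gpu_hist" is no longer how GPU is requested
    "device": "cuda",        # or "cuda:0"; replaces gpu_id / gpu_predictor
}

# The CPU fallback ultimately used amounts to:
params_cpu = {**params_gpu, "device": "cpu"}
```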
XGBoost / LightGBM Version — Implementation of “3. Test Data Preprocessing”
Now we need to convert X_test into X_test_encoded for inference. Could you output the code for Step 3 (test data preprocessing)?
Copilot’s Response
🌱 Step3: Test Data Preprocessing (X_test → X_test_encoded)
CatBoost can use string categories as-is,
but XGBoost / LightGBM require numeric encoding,
so we convert using the same LabelEncoders from train.
🚀 Step3: Test Data Preprocessing Code
# =========================================
# 3. Test Data Preprocessing (X_test → X_test_encoded)
# =========================================
import numpy as np  # needed for np.append below

# Assuming test_df is already loaded
X_test = test_df.copy()
# --- LabelEncoding for XGBoost / LightGBM ---
X_test_encoded = X_test.copy()
for col in X_le.keys():  # reuse encoders fitted on train
    le = X_le[col]
    # Handle unseen categories: replace with "Unknown"
    X_test_encoded[col] = X_test_encoded[col].map(
        lambda x: x if x in le.classes_ else "Unknown"
    )
    # Add "Unknown" to classes if needed
    if "Unknown" not in le.classes_:
        le.classes_ = np.append(le.classes_, "Unknown")
    # Transform
    X_test_encoded[col] = le.transform(X_test_encoded[col])
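The unseen-category trick above can be verified in isolation. A minimal self-contained sketch (the category names are made up; in the real code `X_le` holds one such fitted encoder per column):

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Fit an encoder on hypothetical training categories.
le = LabelEncoder()
le.fit(["Clay", "Loam", "Sandy"])  # classes_ are sorted: Clay, Loam, Sandy

# "Peat" never appeared in training, so map it to "Unknown" first.
test_values = ["Loam", "Peat"]
mapped = [v if v in le.classes_ else "Unknown" for v in test_values]

# Extend the encoder's classes so "Unknown" gets its own integer code.
if "Unknown" not in le.classes_:
    le.classes_ = np.append(le.classes_, "Unknown")

encoded = le.transform(mapped)
print(encoded)  # [1 3] — "Unknown" is appended after the sorted train classes
```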
The full code is available here
The XGBoost-only score was 0.95966
XGBoostClassifier Version — Implementation of “2. Model Building”
Next, please implement the XGBoostClassifier-only model.
Assuming Step 1 (training data preprocessing) is already completed, please output only the code for Step 2 (model building).
Use CPU for computation, with KFold (5 splits) and Optuna (20 trials) for tuning.
The outputs should be model_xgbcls and best_params_xgbcls, and the inputs should be y_train_encoded and X_train_encoded.
Copilot’s Response
import optuna
from sklearn.model_selection import KFold
from sklearn.metrics import balanced_accuracy_score
import xgboost as xgb
import numpy as np
# =========================================
# 2. XGBoostClassifier Model Building (CPU + Optuna)
# =========================================
N_SPLITS = 5
N_TRIALS = 20
kf = KFold(n_splits=N_SPLITS, shuffle=True, random_state=42)
oof_xgbcls = np.zeros(len(X_train_encoded))
# -----------------------------
# Optuna objective
# -----------------------------
def objective_xgbcls(trial):
    params = {
        "objective": "multi:softmax",
        "num_class": len(np.unique(y_train_encoded)),
        "tree_method": "hist",  # Fast CPU training
        "predictor": "cpu_predictor",
        "max_depth": trial.suggest_int("max_depth", 3, 10),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.5, 1.0),
        "min_child_weight": trial.suggest_int("min_child_weight", 1, 10),
        "gamma": trial.suggest_float("gamma", 0.0, 5.0),
        "lambda": trial.suggest_float("lambda", 1e-3, 10.0, log=True),
        "alpha": trial.suggest_float("alpha", 1e-3, 10.0, log=True),
        "n_estimators": trial.suggest_int("n_estimators", 200, 1500)
    }
    oof_pred = np.zeros(len(X_train_encoded))
    for train_idx, valid_idx in kf.split(X_train_encoded):
        X_tr, X_val = X_train_encoded.iloc[train_idx], X_train_encoded.iloc[valid_idx]
        y_tr, y_val = y_train_encoded[train_idx], y_train_encoded[valid_idx]
        model = xgb.XGBClassifier(**params)
        model.fit(
            X_tr, y_tr,
            eval_set=[(X_val, y_val)],
            verbose=False
        )
        pred = model.predict(X_val)
        oof_pred[valid_idx] = pred
    score = balanced_accuracy_score(y_train_encoded, oof_pred)
    return score
# -----------------------------
# Run Optuna
# -----------------------------
study_xgbcls = optuna.create_study(direction="maximize")
study_xgbcls.optimize(objective_xgbcls, n_trials=N_TRIALS)
best_params_xgbcls = study_xgbcls.best_params
best_params_xgbcls["objective"] = "multi:softmax"
best_params_xgbcls["num_class"] = len(np.unique(y_train_encoded))
best_params_xgbcls["tree_method"] = "hist"
best_params_xgbcls["predictor"] = "cpu_predictor"
print("Best params (XGBoostClassifier):")
print(best_params_xgbcls)
# -----------------------------
# Train full model with best params
# -----------------------------
model_xgbcls = xgb.XGBClassifier(**best_params_xgbcls)
model_xgbcls.fit(
    X_train_encoded, y_train_encoded,
    verbose=False
)
# -----------------------------
# Recompute OOF
# -----------------------------
oof_xgbcls = np.zeros(len(X_train_encoded))
for train_idx, valid_idx in kf.split(X_train_encoded):
    X_tr, X_val = X_train_encoded.iloc[train_idx], X_train_encoded.iloc[valid_idx]
    y_tr, y_val = y_train_encoded[train_idx], y_train_encoded[valid_idx]
    model_tmp = xgb.XGBClassifier(**best_params_xgbcls)
    model_tmp.fit(X_tr, y_tr, verbose=False)
    pred = model_tmp.predict(X_val)
    oof_xgbcls[valid_idx] = pred
print("OOF Balanced Accuracy (XGBClassifier):", balanced_accuracy_score(y_train_encoded, oof_xgbcls))
The full code is available here
The XGBoostClassifier-only score was 0.96171 — the highest among the models tested this time!
LightGBM Version — Implementation of “2. Model Building”
Next, please implement the LightGBM-only model.
Assuming Step 1 (training data preprocessing) is already completed, please output only the code for Step 2 (model building).
Use GPU (P100), KFold (5 splits), and Optuna (20 trials) for tuning.
Please output model_lgbm and best_params_lgbm, using y_train_encoded and X_train_encoded as inputs.
Copilot’s Response
import optuna
from sklearn.model_selection import KFold
from sklearn.metrics import balanced_accuracy_score
import lightgbm as lgb
import numpy as np
# =========================================
# 2. LightGBM Model Building (GPU + Optuna)
# =========================================
N_SPLITS = 5
N_TRIALS = 20
kf = KFold(n_splits=N_SPLITS, shuffle=True, random_state=42)
oof_lgbm = np.zeros(len(X_train_encoded))
# -----------------------------
# Optuna objective
# -----------------------------
def objective_lgbm(trial):
    params = {
        "objective": "multiclass",
        "num_class": len(np.unique(y_train_encoded)),
        "metric": "multi_logloss",
        "device": "gpu",  # Use GPU
        "gpu_platform_id": 0,
        "gpu_device_id": 0,
        "boosting_type": "gbdt",
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3),
        "num_leaves": trial.suggest_int("num_leaves", 16, 256),
        "max_depth": trial.suggest_int("max_depth", -1, 12),
        "feature_fraction": trial.suggest_float("feature_fraction", 0.5, 1.0),
        "bagging_fraction": trial.suggest_float("bagging_fraction", 0.5, 1.0),
        "bagging_freq": trial.suggest_int("bagging_freq", 1, 10),
        "min_data_in_leaf": trial.suggest_int("min_data_in_leaf", 10, 100),
        "lambda_l1": trial.suggest_float("lambda_l1", 1e-3, 10.0, log=True),
        "lambda_l2": trial.suggest_float("lambda_l2", 1e-3, 10.0, log=True),
        "verbose": -1
    }
    oof_pred = np.zeros(len(X_train_encoded))
    for train_idx, valid_idx in kf.split(X_train_encoded):
        X_tr, X_val = X_train_encoded.iloc[train_idx], X_train_encoded.iloc[valid_idx]
        y_tr, y_val = y_train_encoded[train_idx], y_train_encoded[valid_idx]
        train_data = lgb.Dataset(
            X_tr, label=y_tr, categorical_feature=lgbm_cat_idx
        )
        valid_data = lgb.Dataset(
            X_val, label=y_val, categorical_feature=lgbm_cat_idx
        )
        model = lgb.train(
            params,
            train_data,
            valid_sets=[valid_data]
        )
        pred = np.argmax(model.predict(X_val), axis=1)
        oof_pred[valid_idx] = pred
    score = balanced_accuracy_score(y_train_encoded, oof_pred)
    return score
# -----------------------------
# Run Optuna
# -----------------------------
study_lgbm = optuna.create_study(direction="maximize")
study_lgbm.optimize(objective_lgbm, n_trials=N_TRIALS)
best_params_lgbm = study_lgbm.best_params
best_params_lgbm["objective"] = "multiclass"
best_params_lgbm["num_class"] = len(np.unique(y_train_encoded))
best_params_lgbm["metric"] = "multi_logloss"
best_params_lgbm["device"] = "gpu"
best_params_lgbm["gpu_platform_id"] = 0
best_params_lgbm["gpu_device_id"] = 0
best_params_lgbm["boosting_type"] = "gbdt"
best_params_lgbm["verbose"] = -1
print("Best params (LightGBM):")
print(best_params_lgbm)
# -----------------------------
# Train full model with best params
# -----------------------------
train_data_full = lgb.Dataset(
    X_train_encoded, label=y_train_encoded, categorical_feature=lgbm_cat_idx
)
model_lgbm = lgb.train(
    best_params_lgbm,
    train_data_full
)
# -----------------------------
# Recompute OOF
# -----------------------------
oof_lgbm = np.zeros(len(X_train_encoded))
for train_idx, valid_idx in kf.split(X_train_encoded):
    X_tr, X_val = X_train_encoded.iloc[train_idx], X_train_encoded.iloc[valid_idx]
    y_tr, y_val = y_train_encoded[train_idx], y_train_encoded[valid_idx]
    train_data = lgb.Dataset(
        X_tr, label=y_tr, categorical_feature=lgbm_cat_idx
    )
    model_tmp = lgb.train(
        best_params_lgbm,
        train_data
    )
    pred = np.argmax(model_tmp.predict(X_val), axis=1)
    oof_lgbm[valid_idx] = pred
print("OOF Balanced Accuracy (LightGBM):", balanced_accuracy_score(y_train_encoded, oof_lgbm))
Since verbose_eval=False was not accepted, that line was removed.
Also, because predictions contained probabilities instead of class integers, the following code was added:
# Prediction (probabilities)
proba = model_lgbm.predict(X_test_encoded)
# Probabilities → class index
y_pred_encoded = np.argmax(proba, axis=1)
# Class index → original labels (Low / Medium / High)
y_pred = y_le.inverse_transform(y_pred_encoded)
The full code is available here
The LightGBM-only score was 0.95900
Ensemble Version — Implementation of “2. Model Building” and “4. Prediction”
Great, it worked! Now let’s finish by building the ensemble model.
Each model has already been trained on the full training data using best_params — that is, model_cat, model_xgb, model_xgbcls, and model_lgbm are all fit.
Please output the code for weight optimization using Optuna as the second half of Step 2.
Also, please output the code for Step 4 (prediction) separately.
No other parts are needed. Thank you.
Copilot’s Response
🌟 Assumptions (your environment)
- model_cat
- model_xgb
- model_xgbcls
- model_lgbm
are all trained on the full training data.
- OOF predictions already exist (oof_cat, oof_xgb, oof_xgbcls, oof_lgbm)
============================================
✅ 2. Model Building (Part 2) — Ensemble Weight Optimization
============================================
# =========================================
# 2. Ensemble Weight Optimization (Optuna)
# =========================================
import optuna
from sklearn.metrics import balanced_accuracy_score
import numpy as np
# Stack OOF predictions (shape: (n_samples, 4))
oof_stack = np.vstack([
    oof_cat,
    oof_xgb,
    oof_xgbcls,
    oof_lgbm
]).T  # (n_samples, 4)
def objective_ensemble(trial):
    # Weights for the 4 models
    w_cat = trial.suggest_float("w_cat", 0.0, 1.0)
    w_xgb = trial.suggest_float("w_xgb", 0.0, 1.0)
    w_xgbcls = trial.suggest_float("w_xgbcls", 0.0, 1.0)
    w_lgbm = trial.suggest_float("w_lgbm", 0.0, 1.0)
    weights = np.array([w_cat, w_xgb, w_xgbcls, w_lgbm])
    # Normalize (sum to 1)
    if weights.sum() == 0:
        return 0
    weights = weights / weights.sum()
    # Weighted voting (integer labels)
    pred = np.sum(oof_stack * weights, axis=1)
    # Round to nearest class index
    pred = np.round(pred).astype(int)
    # Score
    score = balanced_accuracy_score(y_train_encoded, pred)
    return score
study_ens = optuna.create_study(direction="maximize")
study_ens.optimize(objective_ensemble, n_trials=50)
best_weights = study_ens.best_params
print("Best ensemble weights:", best_weights)
# Normalize and store
w_cat = best_weights["w_cat"]
w_xgb = best_weights["w_xgb"]
w_xgbcls = best_weights["w_xgbcls"]
w_lgbm = best_weights["w_lgbm"]
weight_sum = w_cat + w_xgb + w_xgbcls + w_lgbm
w_cat, w_xgb, w_xgbcls, w_lgbm = (
    w_cat / weight_sum,
    w_xgb / weight_sum,
    w_xgbcls / weight_sum,
    w_lgbm / weight_sum
)
print("Normalized weights:", w_cat, w_xgb, w_xgbcls, w_lgbm)
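Mechanically, this weighted vote averages the models' integer class indices with the normalized weights and rounds the result back to an index, so it relies on the rounded average landing on a sensible class. A toy sketch with made-up predictions and weights:

```python
import numpy as np

# Rows = samples, columns = the 4 models' predicted class indices (made up).
pred_stack = np.array([
    [1, 1, 1, 2],   # three models vote class 1, one votes class 2
    [0, 0, 2, 0],   # one dissenting model
])
weights = np.array([0.4, 0.3, 0.2, 0.1])  # already normalized (sums to 1)

blended = pred_stack @ weights             # weighted average of class indices
y_pred_encoded = np.round(blended).astype(int)
print(y_pred_encoded)  # [1 0] — the dissenters are outvoted
```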
============================================
✅ 4. Prediction (Ensemble)
============================================
CatBoost uses X_test,
XGB / XGBClassifier / LightGBM use X_test_encoded.
# =========================================
# 4. Prediction (Ensemble)
# =========================================
# --- Predictions from each model ---
# CatBoost → direct class index
pred_cat = model_cat.predict(X_test).reshape(-1)
# XGB → probabilities → argmax
pred_xgb = np.argmax(model_xgb.predict_proba(X_test_encoded), axis=1)
# XGBClassifier → probabilities → argmax
pred_xgbcls = np.argmax(model_xgbcls.predict_proba(X_test_encoded), axis=1)
# LightGBM → probabilities → argmax
pred_lgbm = np.argmax(model_lgbm.predict(X_test_encoded), axis=1)
# --- Stack predictions ---
pred_stack = np.vstack([
    pred_cat,
    pred_xgb,
    pred_xgbcls,
    pred_lgbm
]).T  # (n_samples, 4)
# --- Weighted voting ---
y_pred_encoded = (
    pred_stack[:, 0] * w_cat +
    pred_stack[:, 1] * w_xgb +
    pred_stack[:, 2] * w_xgbcls +
    pred_stack[:, 3] * w_lgbm
)
# Round to integer class index
y_pred_encoded = np.round(y_pred_encoded).astype(int)
# --- Convert back to original labels ---
y_pred = y_le.inverse_transform(y_pred_encoded)
🎉 Ensemble model complete!
- Weight optimization with Optuna
- Weighted voting across 4 models
- Correct flow: integer labels → inverse_transform
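One detail worth keeping in mind about that inverse_transform step: LabelEncoder assigns integer codes in alphabetical order, not semantic order, so for this competition's three labels the mapping is not Low < Medium < High. A minimal check:

```python
from sklearn.preprocessing import LabelEncoder

# The three Irrigation_Need labels from this competition.
y_le = LabelEncoder()
y_le.fit(["Low", "Medium", "High"])

print(list(y_le.classes_))                      # ['High', 'Low', 'Medium']
print(list(y_le.inverse_transform([0, 1, 2])))  # ['High', 'Low', 'Medium']
```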
This concludes today’s conversation.
Here is the full code created today:
📊 Results (Score & Ranking)
Score: 0.96171 (previous: -)
Rank🏆: 493/814 (previous: -)

📚 Insights
1. Successfully implemented tuning code for individual models including encoding, and computed optimal parameters.
2. From model comparison, XGBoostClassifier performed best for this competition, while the ensemble did not outperform the best single model.
3. Several GPU-related bugs occurred, suggesting GPU code generation is currently a weak area for Copilot. I plan to switch to generating CPU-based code and manually adjusting parameters for GPU afterward.
🔮 Next
Next time, I want to explore how to further improve performance, discussing strategies with Copilot.