
# 70. Hyperparameter Tuning: Finding the Best Settings

*Akhilesh · DEV Community*

You picked a model. You trained it. You got decent accuracy. Then someone asks: did you tune the hyperparameters?

You picked `max_depth=5` because it felt right. Learning rate 0.1 because you saw it in a tutorial. 100 trees because 100 is a round number. That's guessing. Hyperparameter tuning replaces guessing with a systematic search: it finds the combination of settings that actually works best for your specific data.

What we'll cover:

- What hyperparameters are and why they matter
- Grid search: exhaustive but slow
- Random search: faster and often just as good
- Bayesian optimization with Optuna: smarter search
- How to avoid overfitting your validation set during tuning
- Nested cross-validation for honest evaluation
- A practical tuning strategy for real projects

## What hyperparameters are and why they matter

First, the distinction, because people mix these up. Parameters are learned by the model during training: the weights in a neural network, the split thresholds in a decision tree. You don't set these; the training algorithm finds them. Hyperparameters are set by you before training. They control how the training happens.

Model parameters (learned):

- Decision tree split thresholds
- Linear regression coefficients
- Neural network weights

Hyperparameters (you set these):

- `max_depth` in a decision tree
- `n_estimators` in a random forest
- `learning_rate` in XGBoost
- `C` and `gamma` in SVM
- `n_neighbors` in KNN

Changing hyperparameters changes how the model learns. Wrong settings lead to overfitting, underfitting, or slow convergence. Good settings squeeze out the best possible performance.
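To make the distinction concrete, here's a minimal sketch with a single decision tree: `max_depth` is something you choose up front, while the split thresholds in `tree_.threshold` only exist after fitting.

```python
# Minimal sketch: a hyperparameter is set before training,
# parameters are learned from the data during fit().
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Hyperparameter: you pick max_depth before training starts.
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X, y)

# Parameters: the split thresholds the training algorithm learned.
# (scikit-learn marks leaf nodes with -2 in this array.)
print("Hyperparameter max_depth:", tree.get_params()['max_depth'])
print("First learned split thresholds:", tree.tree_.threshold[:5])
```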
## Grid search: exhaustive but slow

Grid search is the simplest approach. You define a grid of hyperparameter values, it tries every possible combination, and it returns the best one.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score
import pandas as pd
import time

data = load_breast_cancer()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Define the grid
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [3, 5, 10, None],
    'min_samples_leaf': [1, 2, 4],
}

# Total combinations = 3 * 4 * 3 = 36
# With 5-fold CV = 36 * 5 = 180 model fits
total_fits = (len(param_grid['n_estimators'])
              * len(param_grid['max_depth'])
              * len(param_grid['min_samples_leaf'])) * 5
print(f"Grid combinations: {total_fits // 5}")
print(f"Total model fits with 5-fold CV: {total_fits}")

rf = RandomForestClassifier(random_state=42, n_jobs=-1)

start = time.time()
grid_search = GridSearchCV(
    estimator=rf,
    param_grid=param_grid,
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    verbose=1
)
grid_search.fit(X_train, y_train)
elapsed = time.time() - start

print(f"\nSearch time: {elapsed:.1f}s")
print(f"Best params: {grid_search.best_params_}")
print(f"Best CV score: {grid_search.best_score_:.3f}")
print(f"Test accuracy: {accuracy_score(y_test, grid_search.predict(X_test)):.3f}")
```

Output:

```
Grid combinations: 36
Total model fits with 5-fold CV: 180
Fitting 5 folds for each of 36 candidates...
Search time: 8.2s
Best params: {'max_depth': None, 'min_samples_leaf': 1, 'n_estimators': 200}
Best CV score: 0.967
Test accuracy: 0.974
```

Grid search is thorough, but it scales badly. If you add one more hyperparameter with 4 values, you go from 36 combinations to 144. With many hyperparameters and large ranges, grid search becomes impractical.
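To see how quickly the fit count grows, here's a quick back-of-the-envelope sketch. The extra `max_features` values are only there to illustrate the effect of adding one more hyperparameter, not a tuning recommendation.

```python
# Back-of-the-envelope: the grid size is the product of the list lengths,
# and every combination is refit once per CV fold.
from itertools import product

expanded_grid = {
    'n_estimators': [50, 100, 200],              # 3 values
    'max_depth': [3, 5, 10, None],               # 4 values
    'min_samples_leaf': [1, 2, 4],               # 3 values
    'max_features': ['sqrt', 'log2', 0.5, 0.7],  # one extra hyperparameter, 4 values
}

n_combinations = len(list(product(*expanded_grid.values())))
print(f"Combinations: {n_combinations}")                   # 3 * 4 * 3 * 4 = 144
print(f"Model fits with 5-fold CV: {n_combinations * 5}")  # 720
```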
Beyond the single best combination, `GridSearchCV` stores every trial in `cv_results_`:

```python
# Look at all results as a dataframe
results_df = pd.DataFrame(grid_search.cv_results_)
results_df = results_df[[
    'param_n_estimators', 'param_max_depth', 'param_min_samples_leaf',
    'mean_test_score', 'std_test_score'
]].sort_values('mean_test_score', ascending=False)

print("Top 10 results:")
print(results_df.head(10).to_string(index=False))
```

Reading these results helps you understand which parameters matter most and which ones barely affect performance.

## Random search: faster and often just as good

Instead of trying every combination, random search samples random combinations. It covers a much wider range with fewer trials.

Why does it work? Most hyperparameters have large "flat" regions. Moving `max_depth` from 7 to 8 might not matter, but moving it from 3 to 15 might matter a lot. Random search samples from the full range more efficiently than a coarse grid.

```python
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint, uniform

# Define distributions instead of fixed lists
param_dist = {
    'n_estimators': randint(50, 500),   # sample from range 50 to 500
    'max_depth': [3, 5, 7, 10, 15, None],
    'min_samples_leaf': randint(1, 10),
    'max_features': ['sqrt', 'log2', 0.5, 0.7],
    'min_samples_split': randint(2, 20),
}

rf_r = RandomForestClassifier(random_state=42, n_jobs=-1)

start = time.time()
random_search = RandomizedSearchCV(
    estimator=rf_r,
    param_distributions=param_dist,
    n_iter=50,            # try 50 random combinations
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    random_state=42,
    verbose=1
)
random_search.fit(X_train, y_train)
elapsed = time.time() - start

print(f"Search time: {elapsed:.1f}s")
print(f"Best params: {random_search.best_params_}")
print(f"Best CV score: {random_search.best_score_:.3f}")
print(f"Test accuracy: {accuracy_score(y_test, random_search.predict(X_test)):.3f}")
```

Output:

```
Search time: 6.3s
Best params: {'max_depth': 10, 'max_features': 'sqrt', 'min_samples_leaf': 1, 'min_samples_split': 4, 'n_estimators': 347}
Best CV score: 0.971
Test accuracy: 0.982
```

Random search found a better result in similar time because it explored a wider space. The grid search only tried 3 values for `n_estimators`; random search sampled from 50 to 500 continuously.

Rule of thumb: use random search over grid search almost always. Only use grid search when you've already narrowed down the important ranges with random search and want to fine-tune.

## Bayesian optimization with Optuna: smarter search

Grid and random search have no memory. Each trial is independent; they don't learn from previous results. Optuna uses Bayesian optimization: it builds a model of which parameter regions are promising and focuses future trials there. It's smarter and usually finds better results in fewer trials.

```bash
pip install optuna
```

```python
import optuna
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
import numpy as np

# Suppress optuna logging
optuna.logging.set_verbosity(optuna.logging.WARNING)

def objective(trial):
    # Define the search space
    n_estimators = trial.suggest_int('n_estimators', 50, 500)
    max_depth = trial.suggest_categorical('max_depth', [3, 5, 7, 10, 15, None])
    min_samples_leaf = trial.suggest_int('min_samples_leaf', 1, 10)
    max_features = trial.suggest_categorical('max_features', ['sqrt', 'log2', 0.5])
    min_samples_split = trial.suggest_int('min_samples_split', 2, 20)

    model = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        min_samples_leaf=min_samples_leaf,
        max_features=max_features,
        min_samples_split=min_samples_split,
        random_state=42,
        n_jobs=-1
    )

    score = cross_val_score(model, X_train, y_train, cv=5, scoring='accuracy').mean()
    return score

start = time.time()
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=50, show_progress_bar=True)
elapsed = time.time() - start

print(f"\nSearch time: {elapsed:.1f}s")
print(f"Best params: {study.best_params}")
print(f"Best CV score: {study.best_value:.3f}")

# Train final model with best params
best_rf = RandomForestClassifier(
    **study.best_params, random_state=42, n_jobs=-1
)
best_rf.fit(X_train, y_train)
print(f"Test accuracy: {accuracy_score(y_test, best_rf.predict(X_test)):.3f}")
```

Output:

```
Search time: 12.4s
Best params: {'n_estimators': 423, 'max_depth': 10, 'min_samples_leaf': 1, 'max_features': 'sqrt', 'min_samples_split': 3}
Best CV score: 0.974
Test accuracy: 0.982
```

Optuna found the best result because it focused on promising regions. With more trials, the gap between Optuna and random search grows larger.

```python
import matplotlib.pyplot as plt

# Plot optimization history
trials_df = study.trials_dataframe()

plt.figure(figsize=(10, 4))

plt.subplot(1, 2, 1)
plt.plot(trials_df['number'], trials_df['value'], alpha=0.5, color='blue', linewidth=1)
best_so_far = trials_df['value'].cummax()
plt.plot(trials_df['number'], best_so_far, color='red', linewidth=2, label='Best so far')
plt.xlabel('Trial')
plt.ylabel('CV Accuracy')
plt.title('Optimization History')
plt.legend()

plt.subplot(1, 2, 2)
# Parameter importance
importances = optuna.importance.get_param_importances(study)
params = list(importances.keys())
values = list(importances.values())
plt.barh(params, values, color='steelblue')
plt.xlabel('Importance')
plt.title('Hyperparameter Importance')

plt.tight_layout()
plt.savefig('optuna_results.png', dpi=100)
plt.show()

print("\nHyperparameter importance:")
for param, imp in importances.items():
    print(f"  {param}: {imp:.3f}")
```

The importance plot shows which hyperparameters actually mattered. If `n_estimators` has near-zero importance, you don't need to tune it carefully. Focus on the ones that matter.

XGBoost has many hyperparameters, and Optuna handles this better than grid search:
```python
import xgboost as xgb

def xgb_objective(trial):
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 100, 1000),
        'max_depth': trial.suggest_int('max_depth', 3, 8),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
        'subsample': trial.suggest_float('subsample', 0.5, 1.0),
        'colsample_bytree': trial.suggest_float('colsample_bytree', 0.5, 1.0),
        'reg_alpha': trial.suggest_float('reg_alpha', 1e-8, 10.0, log=True),
        'reg_lambda': trial.suggest_float('reg_lambda', 1e-8, 10.0, log=True),
        'random_state': 42,
        'eval_metric': 'logloss',
        'verbosity': 0
    }
    model = xgb.XGBClassifier(**params)
    score = cross_val_score(model, X_train, y_train, cv=5, scoring='accuracy').mean()
    return score

study_xgb = optuna.create_study(direction='maximize')
study_xgb.optimize(xgb_objective, n_trials=50, show_progress_bar=True)

print(f"\nXGBoost best CV: {study_xgb.best_value:.3f}")
print(f"Best params: {study_xgb.best_params}")

best_xgb = xgb.XGBClassifier(**study_xgb.best_params, random_state=42, verbosity=0)
best_xgb.fit(X_train, y_train)
print(f"Test accuracy: {accuracy_score(y_test, best_xgb.predict(X_test)):.3f}")
```

## How to avoid overfitting your validation set during tuning

Here's a subtle trap. Every time you check the test set during tuning, you leak information about the test set into your choices. If you tune for 200 trials and always pick the best test score, you've effectively trained on the test set.

## Nested cross-validation for honest evaluation

The solution is nested cross-validation. The inner loop tunes; the outer loop evaluates.

```python
from sklearn.model_selection import cross_val_score, KFold, GridSearchCV

# Inner CV for tuning, outer CV for honest evaluation
outer_cv = KFold(n_splits=5, shuffle=True, random_state=42)
inner_cv = KFold(n_splits=3, shuffle=True, random_state=42)

# Simple param grid for speed
param_grid_nested = {
    'n_estimators': [50, 100],
    'max_depth': [5, 10, None],
}

rf_nested = RandomForestClassifier(random_state=42, n_jobs=-1)
grid_nested = GridSearchCV(rf_nested, param_grid_nested, cv=inner_cv, scoring='accuracy')

# Outer CV gives the honest estimate
nested_scores = cross_val_score(grid_nested, X, y, cv=outer_cv, scoring='accuracy')

print(f"Nested CV accuracy: {nested_scores.mean():.3f} +/- {nested_scores.std():.3f}")
print("This is the honest estimate of real-world performance.")
print()

# Compare to non-nested (optimistically biased)
non_nested_scores = cross_val_score(
    GridSearchCV(rf_nested, param_grid_nested, cv=3), X, y, cv=outer_cv
)
print(f"Non-nested CV: {non_nested_scores.mean():.3f} +/- {non_nested_scores.std():.3f}")
print("This can be overly optimistic on small datasets.")
```

Nested CV is slower but gives you an unbiased estimate. Use it when reporting final results, especially on small datasets.

## Practical tuning strategy for real projects

Here's the workflow that works well in practice:

- Step 1: Start with default hyperparameters. Know your baseline before you tune.
- Step 2: Use random search with 50-100 trials across a wide range of values. This finds the good region fast.
- Step 3: Narrow the range based on the step 2 results. Run Optuna with 50-100 trials in the narrowed space.
- Step 4: Focus on the hyperparameters that matter. Check Optuna's importance plot and ignore the ones with near-zero importance.
- Step 5: Evaluate the final model on the test set once. Only once. Never tune based on test set results.
Here's that workflow end to end on the same data:

```python
# Full practical example
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import optuna

optuna.logging.set_verbosity(optuna.logging.WARNING)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Step 1: Baseline
baseline_model = RandomForestClassifier(random_state=42, n_jobs=-1)
baseline_score = cross_val_score(baseline_model, X_train, y_train, cv=5).mean()
print(f"Step 1 - Baseline CV: {baseline_score:.3f}")

# Step 2: Random search over a wide range
param_dist_wide = {
    'n_estimators': randint(10, 1000),
    'max_depth': [2, 3, 5, 7, 10, 15, None],
    'min_samples_leaf': randint(1, 20),
    'max_features': ['sqrt', 'log2', 0.3, 0.5, 0.7],
}
rs = RandomizedSearchCV(
    RandomForestClassifier(random_state=42, n_jobs=-1),
    param_dist_wide, n_iter=30, cv=5, random_state=42
)
rs.fit(X_train, y_train)
print(f"Step 2 - Random search CV: {rs.best_score_:.3f}")
print(f"         Best params: {rs.best_params_}")

# Step 3: Optuna in narrowed space based on step 2
def narrow_objective(trial):
    model = RandomForestClassifier(
        n_estimators=trial.suggest_int('n_estimators', 100, 600),
        max_depth=trial.suggest_categorical('max_depth', [5, 7, 10, None]),
        min_samples_leaf=trial.suggest_int('min_samples_leaf', 1, 5),
        max_features=trial.suggest_categorical('max_features', ['sqrt', 0.5, 0.7]),
        random_state=42,
        n_jobs=-1
    )
    return cross_val_score(model, X_train, y_train, cv=5).mean()

study_narrow = optuna.create_study(direction='maximize')
study_narrow.optimize(narrow_objective, n_trials=40)
print(f"Step 3 - Optuna CV: {study_narrow.best_value:.3f}")

# Step 5: Final evaluation on test set (only once)
final_model = RandomForestClassifier(
    **study_narrow.best_params, random_state=42, n_jobs=-1
)
final_model.fit(X_train, y_train)
print(f"\nStep 5 - FINAL Test accuracy: {accuracy_score(y_test, final_model.predict(X_test)):.3f}")

print("\nFinal classification report:")
print(classification_report(y_test, final_model.predict(X_test),
                            target_names=data.target_names))
```

Finally, a quick head-to-head of the three search methods on the same problem:

```python
import time

results = {}

# Grid search
start = time.time()
gs = GridSearchCV(
    RandomForestClassifier(random_state=42, n_jobs=-1),
    {'n_estimators': [50, 100, 200], 'max_depth': [5, 10, None]},
    cv=5, n_jobs=-1
)
gs.fit(X_train, y_train)
results['Grid Search'] = {'cv': gs.best_score_, 'time': time.time() - start, 'trials': 9}

# Random search
start = time.time()
rs2 = RandomizedSearchCV(
    RandomForestClassifier(random_state=42, n_jobs=-1),
    {'n_estimators': randint(50, 500), 'max_depth': [3, 5, 10, None],
     'min_samples_leaf': randint(1, 10)},
    n_iter=50, cv=5, random_state=42, n_jobs=-1
)
rs2.fit(X_train, y_train)
results['Random Search'] = {'cv': rs2.best_score_, 'time': time.time() - start, 'trials': 50}

# Optuna
optuna.logging.set_verbosity(optuna.logging.WARNING)
start = time.time()

def comp_obj(trial):
    m = RandomForestClassifier(
        n_estimators=trial.suggest_int('n_estimators', 50, 500),
        max_depth=trial.suggest_categorical('max_depth', [3, 5, 10, None]),
        min_samples_leaf=trial.suggest_int('min_samples_leaf', 1, 10),
        random_state=42, n_jobs=-1
    )
    return cross_val_score(m, X_train, y_train, cv=5).mean()

s = optuna.create_study(direction='maximize')
s.optimize(comp_obj, n_trials=50)
results['Optuna'] = {'cv': s.best_value, 'time': time.time() - start, 'trials': 50}

print(f"\n{'Method':<16} {'CV Score':<12} {'Time':<10} {'Trials'}")
print("-" * 45)
for method, r in results.items():
    print(f"{method:<16} {r['cv']:.3f}        {r['time']:.1f}s      {r['trials']}")
```
When to use which:

| Method | When to use | Trials needed |
| --- | --- | --- |
| Grid Search | Fine-tuning 1-2 params with known ranges | Low (exhaustive) |
| Random Search | First pass, many params, wide ranges | 50-100 |
| Optuna | When you need the best result and have compute | 100-500 |

## Cheat sheet

| Task | Code |
| --- | --- |
| Grid search | `GridSearchCV(model, param_grid, cv=5)` |
| Random search | `RandomizedSearchCV(model, param_dist, n_iter=50, cv=5)` |
| Best params | `.best_params_` |
| Best CV score | `.best_score_` |
| Best model | `.best_estimator_` |
| Optuna study | `optuna.create_study(direction='maximize')` |
| Run Optuna | `study.optimize(objective, n_trials=100)` |
| Optuna importance | `optuna.importance.get_param_importances(study)` |
| Nested CV | `cross_val_score(GridSearchCV(...), X, y, cv=outer_cv)` |

## Practice

- Level 1: Grid-search a RandomForest on `load_wine()`. Try `n_estimators` of [50, 100, 200] and `max_depth` of [3, 5, None]. Print the full results table. Which parameter matters more?
- Level 2: Repeat the searches with several different `random_state` values. Which method is more consistent across runs?
- Level 3: Tune XGBoost with Optuna over `learning_rate`, `max_depth`, `subsample`, `reg_alpha`, and `n_estimators`. Plot the optimization history and the hyperparameter importance chart. What are the two most important hyperparameters?

## Further reading

- Scikit-learn: GridSearchCV
- Scikit-learn: RandomizedSearchCV
- Optuna documentation
- Optuna: hyperparameter importance
- Bergstra & Bengio: Random Search for Hyper-Parameter Optimization

Next up, Post 71: End-to-End ML Project: Predict Something Real. We take everything from Phase 6 and build one complete project from raw data to final predictions. Data cleaning, feature engineering, model selection, tuning, and evaluation all in one place.