Sling Academy

Scikit-Learn Complete Cheat Sheet

Last updated: December 21, 2024

Scikit-learn is one of the most widely used machine learning libraries in the Python ecosystem. It offers consistent, efficient tools for data preprocessing, model training, and evaluation. This cheat sheet walks through the core workflow: loading and splitting data, preprocessing features, fitting a model, evaluating it, and tuning hyperparameters.

Installation

Before diving into Scikit-learn, ensure that you have the library installed. You can install it using pip:

pip install scikit-learn
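To confirm the installation worked, one quick check is to import the package and print its version. Note that the package is installed as scikit-learn but imported as sklearn:

```python
import sklearn

# The installed version string; the exact value depends on your environment
print(sklearn.__version__)
```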

Loading and Splitting Data

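Scikit-learn ships with small built-in datasets, such as the Iris dataset returned by load_iris. The train_test_split function then holds out part of the data for testing. In the example below, test_size=0.2 reserves 20% of the samples for the test set, and random_state=42 makes the split reproducible: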

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)
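For classification data it is often useful to keep the class proportions identical in both subsets. The following is a minimal sketch of a stratified split using the stratify parameter of train_test_split; the variable names X and y are chosen here for illustration and are not part of the example above:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# return_X_y=True returns the feature matrix and labels directly
X, y = load_iris(return_X_y=True)

# stratify=y keeps the class proportions the same in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
print(np.bincount(y_train))         # [40 40 40]
print(np.bincount(y_test))          # [10 10 10]
```

Iris has 50 samples per class, so a stratified 80/20 split yields exactly 40 training and 10 test samples for each class.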

Data Preprocessing

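Preprocessing is a crucial step in machine learning, because many estimators are sensitive to the scale of their input features. Scikit-learn provides several scalers for this: StandardScaler standardizes each feature to zero mean and unit variance, while MinMaxScaler rescales each feature to a fixed range (0 to 1 by default). The scaler is fitted on the training data only and then reused to transform the test data, so no information from the test set leaks into training: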

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
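MinMaxScaler is mentioned above but not shown. The sketch below applies it to a tiny hypothetical feature matrix (the numbers are made up purely for illustration) so the effect is easy to verify by hand:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Small hypothetical feature matrices, used only for illustration
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[2.5, 40.0]])

scaler = MinMaxScaler()  # default feature_range is (0, 1)
X_train_scaled = scaler.fit_transform(X_train)  # learns per-feature min and max
X_test_scaled = scaler.transform(X_test)        # reuses the training min and max

print(X_train_scaled)  # rows: [0, 0], [0.5, 0.5], [1, 1]
print(X_test_scaled)   # [[0.75, 1.5]]
```

The second test value is 1.5 because 40 lies above the training maximum of 30. Scaled test data can fall outside the target range when it contains values the scaler never saw during fitting.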

Building a Model

Building a model in Scikit-learn is straightforward: choose an estimator, such as LogisticRegression for a classification problem, and call its fit method on the training data:

from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X_train_scaled, y_train)
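After fitting, an estimator exposes what it learned through attributes that end in an underscore. The following self-contained sketch fits on the full scaled Iris data for brevity, and max_iter=200 is an assumption made here to leave the solver some headroom; neither choice comes from the example above:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

model = LogisticRegression(max_iter=200)
model.fit(X_scaled, y)

print(model.classes_)     # class labels seen during fit: [0 1 2]
print(model.coef_.shape)  # (3, 4): one row per class, one column per feature

# Class probabilities for the first sample; each row sums to 1
proba = model.predict_proba(X_scaled[:1])
print(proba)
```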

Model Prediction and Evaluation

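Once your model is trained, call predict on the scaled test set to generate predictions. Scikit-learn's metrics module then compares them against the true labels: accuracy_score returns the fraction of correct predictions, and confusion_matrix counts predictions per true class: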

from sklearn.metrics import accuracy_score, confusion_matrix

# Make predictions
y_pred = model.predict(X_test_scaled)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print(f"Accuracy: {accuracy}")
print(f"Confusion Matrix:\n{cm}")
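Accuracy alone can hide per-class weaknesses. As a complement, classification_report summarizes precision, recall, and F1-score for each class. The sketch below repeats the earlier steps so that it runs on its own:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Fit the scaler on the training data only, then reuse it for the test data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = LogisticRegression()
model.fit(X_train_scaled, y_train)
y_pred = model.predict(X_test_scaled)

# labels fixes the class order so it lines up with target_names
report = classification_report(
    y_test, y_pred, labels=[0, 1, 2], target_names=iris.target_names
)
print(report)
```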

Model Selection

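Scikit-learn provides utilities like GridSearchCV for hyperparameter tuning. GridSearchCV evaluates every combination in a parameter grid using cross-validation (5 folds in the example below) and keeps the combination with the best mean score: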

from sklearn.model_selection import GridSearchCV

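# Define parameter grid; both solvers support the multinomial loss
# (liblinear is limited to one-vs-rest on multiclass data such as Iris)
param_grid = {'C': [0.1, 1, 10], 'solver': ['lbfgs', 'newton-cg']}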

# Set up GridSearch
grid_search = GridSearchCV(LogisticRegression(), param_grid, cv=5)
grid_search.fit(X_train_scaled, y_train)

# Best parameters and their mean cross-validated score
print(grid_search.best_params_)
print(grid_search.best_score_)
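One caveat with searching over pre-scaled data is that the scaler has already seen every cross-validation fold. A common alternative, sketched here as one possible setup rather than the only way to do it, is to wrap the scaler and the estimator in a Pipeline so that scaling is refitted inside each fold. The step names "scaler" and "clf" and the value max_iter=1000 are arbitrary choices made for this illustration:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scaling and classification are fitted together on each training fold
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Pipeline parameters are addressed as <step name>__<parameter name>
param_grid = {"clf__C": [0.1, 1, 10]}

grid_search = GridSearchCV(pipe, param_grid, cv=5)
grid_search.fit(X_train, y_train)  # raw features; the pipeline scales them

print(grid_search.best_params_)
print(grid_search.best_score_)            # mean cross-validated accuracy
print(grid_search.score(X_test, y_test))  # accuracy on the held-out test set
```

Because GridSearchCV refits the best pipeline on the whole training set by default, grid_search can be used directly for prediction and scoring afterwards.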

Conclusion

Scikit-learn offers a rich toolkit for every stage of the machine learning workflow, with easy-to-use interfaces and a consistent API. This cheat sheet provides a foundation for beginners and professionals eager to leverage Scikit-learn’s capabilities to solve practical problems. Whether you are building a simple classification model or tuning hyperparameters to improve accuracy, Scikit-learn offers the tools you need.
