What is scikit-learn?

Scikit-learn is an open-source library for machine learning in Python. It is built on top of NumPy, SciPy and Matplotlib, and it provides implementations of a large number of algorithms for the core machine learning problems of classification, regression and clustering.
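To make the three problem types concrete, here is a minimal sketch that fits one estimator of each kind on datasets bundled with scikit-learn. The particular estimators (KNeighborsClassifier, LinearRegression, KMeans) and datasets (iris, diabetes) are illustrative choices, not the only options.

```python
# Sketch: one estimator per problem type, trained on bundled toy datasets.
# The estimator and dataset choices are illustrative assumptions.
from sklearn.cluster import KMeans
from sklearn.datasets import load_diabetes, load_iris
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

# Classification: predict a discrete label (the iris species).
X_iris, y_iris = load_iris(return_X_y=True)
clf = KNeighborsClassifier(n_neighbors=5).fit(X_iris, y_iris)

# Regression: predict a continuous value (a measure of disease progression).
X_diab, y_diab = load_diabetes(return_X_y=True)
reg = LinearRegression().fit(X_diab, y_diab)

# Clustering: group samples without using any labels.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_iris)

print(clf.predict(X_iris[:3]))   # predicted class labels
print(reg.predict(X_diab[:3]))   # real-valued predictions
print(km.labels_[:3])            # cluster assignments
```

All three follow the same pattern: construct an estimator, then call fit() on the data.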

Scikit-learn also has extensive support for other components of a machine learning pipeline, such as feature preprocessing, model selection and model evaluation.
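As a sketch of how these components fit together, the example below assumes the iris dataset and chains feature scaling (StandardScaler), a small hyperparameter search (GridSearchCV over the regularization strength C) and a held-out accuracy score. The chosen grid, split size and model are assumptions for illustration.

```python
# Sketch: preprocessing, model selection and evaluation chained together.
# Grid values, split size and model choice are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
# Hold out 25% of the data so the final score is measured on unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing + model as one estimator; make_pipeline names each step
# after its lowercased class name ("standardscaler", "logisticregression").
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Model selection: try a few regularization strengths with 5-fold cross-validation.
search = GridSearchCV(pipe, {"logisticregression__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# Model evaluation on the held-out test set.
test_acc = accuracy_score(y_test, search.predict(X_test))
print(search.best_params_, test_acc)
```

Wrapping the scaler and the model in one pipeline means the scaler is fit only on the training part of each cross-validation fold, which keeps test information out of the preprocessing step.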

Scikit-learn started out as a Google Summer of Code project by David Cournapeau.

Today, scikit-learn is one of the most widely used Python libraries for machine learning and is considered one of the best-maintained open-source projects, with a vibrant community.

Who uses scikit-learn?

Machine learning practitioners and researchers who build predictive or explanatory models use scikit-learn regularly in their workflows. With scikit-learn, training models and making predictions takes only a few lines of code, which lets you prototype quickly and iterate fast over hypotheses and models.
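For example, that kind of quick iteration might look like the following sketch, which compares three arbitrarily chosen classifiers on the iris dataset using 5-fold cross-validation. The model choices and settings are illustrative assumptions.

```python
# Sketch: compare several candidate models with the same few lines of code.
# The candidate models and their settings are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "support vector machine": SVC(),
}

# All estimators share the same fit/score interface, so one loop
# with cross_val_score can evaluate every candidate.
scores = {}
for name, model in candidates.items():
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean 5-fold accuracy = {scores[name]:.3f}")
```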

Scikit-learn also has APIs for basic neural network models. Training a model as complex as a random forest is as simple as calling fit() on a RandomForestClassifier object.
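A minimal sketch, assuming the iris dataset and a default train/test split: both a RandomForestClassifier and a basic neural network (MLPClassifier) are trained with a single fit() call. The hyperparameter values shown are arbitrary choices for illustration.

```python
# Sketch: a random forest and a basic neural network, each trained with one fit() call.
# Hyperparameter values are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A random forest of 100 decision trees: the whole training procedure is one call.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# A basic multi-layer perceptron with one hidden layer of 50 units, trained the same way.
mlp = MLPClassifier(hidden_layer_sizes=(50,), max_iter=2000, random_state=0)
mlp.fit(X_train, y_train)

print("random forest accuracy:", forest.score(X_test, y_test))
print("MLP accuracy:", mlp.score(X_test, y_test))
```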

How can you get started with scikit-learn?

Installing scikit-learn

1. If you use pip, run the following command to install scikit-learn in your Python environment (the -U flag upgrades it if it is already installed):

pip install -U scikit-learn

2. If you use the Anaconda distribution, you don't have to install anything, since scikit-learn comes pre-installed. To upgrade it, run conda update scikit-learn.
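To check that the installation worked, you might run a short script like this one in the same environment. It assumes only that scikit-learn is installed.

```python
# Check that scikit-learn can be imported and report which version is installed.
import sklearn

print(sklearn.__version__)

# Optional: print the versions of scikit-learn and its dependencies (NumPy, SciPy, ...).
sklearn.show_versions()
```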

Building a model with scikit-learn

To build your first model with scikit-learn, first import the LogisticRegression class and the built-in datasets module:

from sklearn.linear_model import LogisticRegression
from sklearn import datasets

Create a LogisticRegression object:

lr = LogisticRegression()

Load the iris dataset and separate the features (X) from the target labels (y):

iris = datasets.load_iris()
X, y = iris.data, iris.target
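If you want to inspect what was loaded before training, a quick check like the following sketch shows the shapes and names of the iris data. The return_X_y=True shortcut on the last line is an alternative way to get X and y directly.

```python
from sklearn import datasets

iris = datasets.load_iris()
X, y = iris.data, iris.target

print(X.shape)              # (150, 4): 150 flowers, 4 numeric features each
print(y.shape)              # (150,): one class label per flower
print(iris.feature_names)   # names of the 4 feature columns
print(iris.target_names)    # ['setosa' 'versicolor' 'virginica']

# Shortcut: get the feature matrix and the label vector directly.
X2, y2 = datasets.load_iris(return_X_y=True)
```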

Train the model by calling fit() with the features (X) and the labels (y) to learn from. The fit() method is where the actual learning takes place.

lr.fit(X, y)

The model is now trained on your data, and you can call its predict() method to make predictions on new instances. (Despite its name, logistic regression is a classification model.)
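As a sketch of that last step, the example below repeats the loading and training steps and then predicts the species of two new flowers. The measurements are made-up values for illustration, and max_iter=1000 is an assumption added to avoid a possible convergence warning with the default of 100 iterations.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X, y = iris.data, iris.target

# max_iter raised from the default (100) to avoid a possible convergence warning.
lr = LogisticRegression(max_iter=1000)
lr.fit(X, y)

# Two hypothetical new flowers, in cm:
# sepal length, sepal width, petal length, petal width.
X_new = np.array([[5.0, 3.4, 1.5, 0.2],
                  [6.7, 3.0, 5.2, 2.3]])

pred = lr.predict(X_new)          # predicted class index for each row
proba = lr.predict_proba(X_new)   # one probability per class for each row
print(iris.target_names[pred])
print(proba.round(3))
```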

To learn more about the various functions that scikit-learn offers, you can go through the official scikit-learn documentation.

