This post skips the usual introduction to what ML is and instead notes some basic details about ML. It is based on Aurélien Géron's ML book.

Types of ML

ML systems can be classified along the following criteria:

  • Whether or not they are trained with human supervision

    • Supervised

    • Unsupervised

    • Semi-supervised

    • Reinforcement Learning

  • Whether or not they can learn incrementally on the fly

    • Online learning

    • Batch learning

  • Whether they work by simply comparing new data points to known data points, or instead by detecting patterns in the training data and building a predictive model, much like scientists do

    • Instance-based learning

    • Model-based learning

Common Supervised Learning Algorithms

  • k-Nearest Neighbors

  • Linear Regression

  • Logistic Regression

  • Support Vector Machines (SVMs)

  • Decision Trees and Random Forests

  • Neural networks

Common Unsupervised Learning Algorithms

  • Clustering

    • K-Means

    • DBSCAN

    • Hierarchical Cluster Analysis (HCA)

  • Anomaly detection and novelty detection

    • One-class SVM

    • Isolation Forest

  • Visualization and dimensionality reduction

    • Principal Component Analysis (PCA)

    • Kernel PCA

    • Locally Linear Embedding (LLE)

    • t-Distributed Stochastic Neighbor Embedding (t-SNE)

  • Association rule learning

    • Apriori

    • Eclat

Most semi-supervised learning algorithms are combinations of unsupervised and supervised algorithms.

For example, deep belief networks (DBNs) are based on unsupervised components called restricted Boltzmann machines (RBMs) stacked on top of one another. RBMs are trained sequentially in an unsupervised manner, and then the whole system is fine-tuned using supervised learning techniques.
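As a much simpler illustration of the same idea than a DBN, the sketch below combines an unsupervised step (a tiny 1-D k-means with two clusters) with a supervised step (naming each cluster by a majority vote of the few labeled points that fall in it). The data, the helper names (two_means_1d, assign, label_clusters, predict) and the choice of k-means are assumptions made for this example, not something taken from the book.

```python
from collections import Counter


def assign(point, centers):
    """Index of the center closest to the point."""
    return 0 if abs(point - centers[0]) <= abs(point - centers[1]) else 1


def two_means_1d(points, iterations=10):
    """Unsupervised step: split 1-D points into two clusters with k-means."""
    centers = [min(points), max(points)]  # deterministic starting centers
    for _ in range(iterations):
        groups = [[], []]
        for p in points:
            groups[assign(p, centers)].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


def label_clusters(centers, labeled):
    """Supervised step: name each cluster by majority vote of its labeled points."""
    votes = [Counter(), Counter()]
    for x, y in labeled:
        votes[assign(x, centers)][y] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]


# Hypothetical data: many unlabeled points, only two labeled ones.
unlabeled = [1.0, 1.2, 0.8, 1.1, 5.0, 5.3, 4.9, 5.1]
labeled = [(0.9, "small"), (5.2, "large")]

centers = two_means_1d(unlabeled + [x for x, _ in labeled])
cluster_names = label_clusters(centers, labeled)


def predict(x):
    """Label a new point with the name of its nearest cluster."""
    return cluster_names[assign(x, centers)]
```

Here predict(1.3) falls in the cluster named by the single "small" example, even though most points in that cluster were never labeled.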

Reinforcement Learning Algorithms

No list of common algorithms is given for this category. The learner receives rewards and penalties for its actions, and learning happens over time as it is run through many trials.

  • Algorithms which play Chess or Go are an example

  • Programs used in Robots are another example
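To make the reward and penalty idea concrete, here is a minimal sketch of tabular Q-learning, one well-known reinforcement learning method that is not covered in this post. The corridor environment, its reward values, and the parameters alpha and gamma are all assumptions chosen for illustration.

```python
import random

N_STATES = 5            # states 0..4; states 0 and 4 are terminal
ACTIONS = (-1, +1)      # step left or step right


def step(state, action):
    """Hypothetical environment: +1 reward at the right end, -1 penalty at the left end."""
    next_state = state + action
    if next_state == N_STATES - 1:
        return next_state, 1.0, True
    if next_state == 0:
        return next_state, -1.0, True
    return next_state, 0.0, False


def train(episodes=2000, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    # q[state][action_index] is the estimated long-term value of that action.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = N_STATES // 2, False
        while not done:
            a = rng.randrange(2)  # explore with purely random actions
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move the estimate toward
            # reward + discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Greedy policy: in each non-terminal state pick the action with the highest value.
policy = {s: ACTIONS[q_table[s].index(max(q_table[s]))]
          for s in range(1, N_STATES - 1)}
```

After enough trials the greedy policy should step right in every non-terminal state, because that direction leads to the reward rather than the penalty.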

Batch and Online Learning algorithms

As the names suggest, batch algorithms have to be trained offline on the full dataset, while online algorithms can learn incrementally, on the fly, as new data arrives.
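The difference can be sketched with a one-variable linear model. The batch version needs every example up front, while the online version updates itself one example at a time. The function and class names, the learning rate, and the data are assumptions for illustration only.

```python
def fit_batch(xs, ys):
    """Batch learning: needs the whole dataset at once (closed-form least squares)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    return w, mean_y - w * mean_x


class OnlineLinearModel:
    """Online learning: updates its parameters one example at a time (SGD)."""

    def __init__(self, learning_rate=0.1):
        self.w, self.b, self.lr = 0.0, 0.0, learning_rate

    def learn_one(self, x, y):
        error = (self.w * x + self.b) - y
        # Gradient step on half the squared error of this single example.
        self.w -= self.lr * error * x
        self.b -= self.lr * error


# Hypothetical noise-free data following y = 2x + 1.
xs = [i / 10 for i in range(10)]
ys = [2 * x + 1 for x in xs]

w_batch, b_batch = fit_batch(xs, ys)

online = OnlineLinearModel()
for i in range(5000):            # simulate a stream arriving one example at a time
    x = xs[i % len(xs)]
    online.learn_one(x, 2 * x + 1)
```

Both approaches should end up close to a slope of 2 and an intercept of 1 on this data. The online model never holds more than one example in memory and could keep learning if the stream continued.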

Instance vs Model algorithms

An instance-based algorithm keeps the training examples and predicts for a new input by comparing it with them (k-Nearest Neighbors, for example). A model-based algorithm instead fits a model to the training data, such as the line or plane in Linear Regression, and uses that model to predict.
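A rough sketch of the two approaches side by side, using only the standard library. The data points and the helper names knn_predict and fit_line are hypothetical.

```python
from collections import Counter


def knn_predict(train, x, k=3):
    """Instance-based: keep the training points and vote among the k nearest."""
    nearest = sorted(train, key=lambda item: abs(item[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def fit_line(points):
    """Model-based: compress the training points into two parameters."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    return slope, mean_y - slope * mean_x


# Hypothetical labeled points for the instance-based learner.
labeled = [(1.0, "low"), (1.5, "low"), (2.0, "low"),
           (8.0, "high"), (8.5, "high"), (9.0, "high")]
knn_label = knn_predict(labeled, 7.0)

# Hypothetical numeric points lying on y = 3x + 2 for the model-based learner.
slope, intercept = fit_line([(0, 2), (1, 5), (2, 8), (3, 11)])
line_prediction = slope * 10 + intercept
```

Note that knn_predict needs the full training set at prediction time, whereas the fitted line only needs its slope and intercept.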

Challenges in ML

  • Bad Data

  • Bad Model

What can be done about bad data or a bad model?

  • Feature Selection

  • Feature Extraction

  • Regularization

  • Hyperparameters
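As a small illustration of the last two items, the sketch below fits a one-parameter linear model with an L2 (ridge) penalty. The penalty strength alpha is a hyperparameter: it is set before training rather than learned from the data. The function name fit_ridge_1d and the data are assumptions for this example.

```python
def fit_ridge_1d(xs, ys, alpha):
    """Least-squares slope through the origin with an L2 penalty alpha * w**2."""
    # Minimizing sum((w*x - y)**2) + alpha * w**2 over w gives this closed form.
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


# Hypothetical data following y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_plain = fit_ridge_1d(xs, ys, alpha=0.0)    # no regularization -> 2.0
w_ridge = fit_ridge_1d(xs, ys, alpha=14.0)   # strong regularization -> 1.0
```

A larger alpha pulls the weight toward zero, which constrains the model and can reduce overfitting at the cost of a worse fit to the training data.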

Data Load

The function below is useful for downloading and extracting data from online datasets.

import os
import tarfile
import urllib.request  # plain "import urllib" does not reliably expose urllib.request

DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml2/master/"
HOUSING_PATH = os.path.join("datasets", "housing")
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"

def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    # Create the target directory if it does not already exist.
    os.makedirs(housing_path, exist_ok=True)
    tgz_path = os.path.join(housing_path, "housing.tgz")
    # Download the archive, then extract its contents into housing_path.
    urllib.request.urlretrieve(housing_url, tgz_path)
    housing_tgz = tarfile.open(tgz_path)
    housing_tgz.extractall(path=housing_path)
    housing_tgz.close()

See the next few posts on the end-to-end ML process for more information.