ML Functions

Overview

Machine learning (ML) functions enable trained models to be run directly within SQL queries. They support real-time classification of new data and detection of anomalies without requiring custom code. These functions allow predictions to be embedded directly into workflows that operationalize insights at the data layer.

Introduction to Machine Learning

Machine learning is a field of study in artificial intelligence that develops and applies methods for learning patterns from historical data and using those patterns to make predictions or decisions on new data. Unlike rule-based systems, ML models adapt automatically based on the data. Common use cases include fraud detection, predictive maintenance, and customer behavior analysis. Once trained, models can be deployed and invoked using ML functions to generate predictions at scale.

Classification

Classification is a supervised learning technique that assigns each input to one of a predefined set of classes or labels. Models are trained on labeled datasets, where each input is paired with its correct output, and then used to classify new data. For example, a model may predict whether an incoming email is "spam" or "not spam" based on its content and metadata.

The types of classification are as follows:

  • Binary Classification: Predicts one of two possible outcomes (for example, fraud vs. non-fraud).

  • Multiclass Classification: Predicts one label from multiple possible categories (for example, product type A, B, or C).

  • Multilabel Classification: Assigns multiple labels to a single data point (for example, tagging an image with "beach" and "sunset").
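The three types differ mainly in how model scores are turned into labels. The following is a minimal Python sketch of those decision rules. The scores, label names, and 0.5 threshold are hypothetical values chosen for illustration; this does not show how any particular ML function computes its output.

```python
from typing import Dict, List


def predict_binary(score: float, threshold: float = 0.5) -> str:
    """Binary: a single score is mapped to one of two labels."""
    return "fraud" if score >= threshold else "non-fraud"


def predict_multiclass(scores: Dict[str, float]) -> str:
    """Multiclass: exactly one label is returned, the highest-scoring one."""
    return max(scores, key=scores.get)


def predict_multilabel(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Multilabel: every label whose score clears the threshold is returned."""
    return sorted(label for label, score in scores.items() if score >= threshold)


if __name__ == "__main__":
    print(predict_binary(0.91))
    print(predict_multiclass({"A": 0.2, "B": 0.7, "C": 0.1}))
    print(predict_multilabel({"beach": 0.8, "sunset": 0.6, "city": 0.1}))
```

Binary and multiclass prediction always return exactly one label, whereas multilabel prediction can return zero, one, or several.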

Examples:

  • A credit-card transaction classified as “fraudulent” or “legitimate.”

  • Customer support tickets categorized as “billing,” “technical issue,” or “account upgrade.”

The ML_CLASSIFY function is a supervised machine learning function for classification tasks. It supports both binary (two classes) and multi-class (more than two classes) classification. It leverages algorithms such as logistic regression, random forest, and gradient boosting. Call ML_CLASSIFY from a SQL query to return predicted class labels.

Anomaly Detection

Anomaly detection identifies data points that deviate significantly from expected patterns. An anomaly is any value or pattern that does not match normal behavior. Anomalies can signal critical issues such as fraud, equipment failure, or network intrusions. More generally, anomalies can indicate the following:

  • Performance issues (for example, server overload)

  • System faults (for example, failed jobs or memory leaks)

  • Opportunities (for example, traffic spikes caused by a marketing campaign)

For example, if a cluster's CPU usage normally stays between 20% and 60% and suddenly rises to 95%, the spike is an anomaly.

Time-Series Anomaly Detection

Time-series anomaly detection analyzes data collected over time, for example, CPU or memory utilization per minute or hour. It considers not only individual values, but also the sequence and patterns in the data. It learns seasonal patterns (daily or weekly), long‑term trends, and normal variability ranges. The system flags values that deviate from these learned patterns.
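To illustrate why the time context matters, the following Python sketch learns a simple daily pattern (a mean and spread per hour of day) and flags readings that fall far outside it. The function names, the (hour, value) input shape, and the three-standard-deviation rule are assumptions made for this example; it is not a description of how ML_ANOMALY_DETECT works internally.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Tuple

Reading = Tuple[int, float]  # (hour_of_day, cpu_percent)


def learn_hourly_baseline(history: List[Reading]) -> Dict[int, Tuple[float, float]]:
    """Learn the mean and standard deviation of the metric for each hour of day."""
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    return {hour: (mean(vals), pstdev(vals)) for hour, vals in by_hour.items()}


def is_anomalous(reading: Reading,
                 baseline: Dict[int, Tuple[float, float]],
                 k: float = 3.0) -> bool:
    """Flag a reading more than k standard deviations from its hour's mean."""
    hour, value = reading
    mu, sigma = baseline[hour]
    # Guard against a zero spread when every historical value is identical.
    return abs(value - mu) > k * max(sigma, 1e-9)


if __name__ == "__main__":
    history = [(3, 20.0), (3, 22.0), (3, 21.0), (3, 19.0),
               (14, 55.0), (14, 60.0), (14, 58.0), (14, 57.0)]
    baseline = learn_hourly_baseline(history)
    print(is_anomalous((3, 58.0), baseline))   # 58% CPU at 03:00
    print(is_anomalous((14, 58.0), baseline))  # 58% CPU at 14:00
```

In this sample data the same value, 58% CPU, is flagged at 03:00 but not at 14:00, because the learned pattern for each hour differs.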

The types of time-series anomaly detection are as follows:

  • Supervised: Supervised models use labeled anomalies to learn failure patterns.

  • Unsupervised: Unsupervised models learn normal behavior from historical data and flag deviations without labels.

The ML_ANOMALY_DETECT function is currently an unsupervised time-series anomaly detection function. It supports statistical methods (e.g., z-score, interquartile range) and machine learning methods (e.g., Isolation Forest, One-Class SVM). The function returns a prediction for each row, identifying it as normal or anomalous. These predictions can trigger alerts or be recorded for further analysis.
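The two statistical methods named above can be sketched in a few lines of Python. This is a generic textbook illustration using only the standard library; the function names, thresholds, and sample values are assumptions, and the sketch does not reflect the function's actual implementation or output format.

```python
from statistics import mean, pstdev, quantiles
from typing import List


def zscore_outliers(values: List[float], threshold: float = 3.0) -> List[bool]:
    """Flag values whose distance from the mean exceeds `threshold` standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [False] * len(values)  # no spread, so nothing can stand out
    return [abs(v - mu) / sigma > threshold for v in values]


def iqr_outliers(values: List[float], k: float = 1.5) -> List[bool]:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles; needs at least two values
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


if __name__ == "__main__":
    cpu = [35, 42, 38, 51, 47, 40, 44, 39, 95]
    # In a small sample a single extreme value inflates the standard deviation,
    # so a threshold lower than the conventional 3.0 is used here.
    print(zscore_outliers(cpu, threshold=2.5))
    print(iqr_outliers(cpu))
```

Both methods flag only the final reading (95) in this sample. The IQR rule is less sensitive to the outlier itself because quartiles, unlike the mean and standard deviation, are barely shifted by a single extreme value.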

Install ML Functions

To install ML Functions, navigate to AI > AI & ML Functions and select the deployment on which to install ML Functions. In the ML Functions tab, select Install, review the ML Functions Summary, and then select Deploy.

Once the ML Functions are installed, query them in the SQL Editor or SingleStore Notebooks. SingleStore provides the following ML Functions:

Category                                Function
Statistical and Predictive Functions    ML_CLASSIFY(model_name, TO_JSON(selected_data.*))
                                        ML_ANOMALY_DETECT(model_name, TO_JSON(selected_data.*))

Statistical and Predictive Functions

ML_CLASSIFY

Performs binary and multi-class classification on a dataset using standard machine learning algorithms. Supports common algorithms including:

  • Logistic Regression

  • Random Forest

  • Gradient Boosting

Syntax

ML_CLASSIFY(model_name, TO_JSON(selected_data.*))

Arguments

  • model_name: Name of the trained ML model to use.

  • selected_data: A row or set of rows selected for prediction.

Return Type

string

Usage

Basic usage

SELECT cluster.ML_CLASSIFY(model_name, TO_JSON(selected_data.*)) AS predictions
FROM (SELECT * FROM table) AS selected_data;

Basic usage with LIMIT

SELECT cluster.ML_CLASSIFY(model_name, TO_JSON(selected_data.*)) AS predictions
FROM (SELECT * FROM table WHERE column1 > 100000 LIMIT 100) AS selected_data;

Insert predictions into a table

INSERT INTO predictions_table (id, prediction)
SELECT selected_data.id,
cluster.ML_CLASSIFY(model_name, TO_JSON(selected_data.*)) AS prediction
FROM (SELECT * FROM table LIMIT 100) AS selected_data;

ML_ANOMALY_DETECT

Detects outliers and anomalies in datasets using statistical or machine learning-based methods. Suitable for security, monitoring, and anomaly detection applications. Supports the following methods:

  • Statistical: z-score, interquartile range (IQR)

  • ML-based: Isolation Forest, One-Class SVM

Syntax

ML_ANOMALY_DETECT(model_name, TO_JSON(selected_data.*))

Arguments

  • model_name: Name of the trained ML model to use.

  • selected_data: A row or set of rows selected for prediction.

Return Type

string

Usage

Basic usage

SELECT cluster.ML_ANOMALY_DETECT(model_name, TO_JSON(selected_data.*)) AS predictions
FROM (SELECT * FROM table) AS selected_data;

Basic usage with LIMIT

SELECT cluster.ML_ANOMALY_DETECT(model_name, TO_JSON(selected_data.*)) AS predictions
FROM (SELECT * FROM table WHERE column1 > 100000 LIMIT 100) AS selected_data;

Insert predictions into a table

INSERT INTO predictions_table (id, prediction)
SELECT selected_data.id,
cluster.ML_ANOMALY_DETECT(model_name, TO_JSON(selected_data.*)) AS prediction
FROM (SELECT * FROM table LIMIT 100) AS selected_data;

Train a New ML Model

To train a new ML model, follow these steps:

  1. Navigate to AI > Models.

  2. Select the ML Models tab and then select Train New ML Model.

  3. In the Select Function dialog, select one of the following ML functions:

    • ML_CLASSIFY

    • ML_ANOMALY_DETECT

    Select Next to configure the model.

Configure Model

Model Name

Enter the name of the ML model.

Training Description

Enter the training description.

Workspace

Select the SingleStore deployment (workspace) the notebook connects to.

Specifying a workspace allows the notebook to connect natively to the SingleStore databases it references.

Compute Size

Select one of the following compute sizes:

  • Small

  • Medium

  • GPU-T4

Run as

Run the notebook for training a model with or without personal credentials. Select one of the following:

  • Run as <username>: Runs the notebook using the permissions and access of the current user account.

  • Run as a Service Account: Runs the notebook independently of personal credentials, using a service account.

    Note

    Service accounts can only be created by Admin.

Select Next.

Select Training Data

Database

Select the database that contains the training data.

Table

Select the table from the selected database to train the machine learning model.

Target Column

Select the column that represents the prediction target for the model.

Feature Selection Mode

Specify how feature columns are selected.

Feature Column

Select one or more columns to be used as input features for training the model.

Preview the data and select Next.

Review the Summary and generated Fusion SQL syntax in the Generated SQL Script. The generated script performs the following:

  • Creates and trains an ML model

  • Uses data from the selected table in the selected database

  • Predicts values of the selected target column

  • Runs on the selected compute instance

  • Uses all available features by default

The syntax of the Fusion SQL script is as follows:

%s2ml train <machine_learning_algorithm>
--model <model_name>
--db <database_name>
--input_table <table_name>
--target_column <target_column>
--description <training_description>
--runtime <compute_instance>
--selected_features { \"mode\": <feature_selection_mode>, \"features\": <feature_column> }

Select Start Training to train the ML model.

Manage an Existing ML Model

Existing ML models can be managed by performing the following actions:

  • View details

  • Run prediction

  • Share

  • Delete

View Details of an Existing ML Model

To view details of an existing ML model, select the ellipsis under the Actions column of the trained ML model, and select View Details. Alternatively, select the ML model in the Name column. Select the Details tab to view training status, training configuration, training logs, and details about how to use the ML model.

Run Prediction on an Existing ML Model

Run batch prediction on the existing ML model.

Run a Batch Prediction

To run a batch prediction on the existing ML model, select the ellipsis under the Actions column of the trained ML model, and select Run Prediction.

Select Prediction Data

Database

Select the database.

Target Table

Select the target table on which the prediction will be run.

Target Column

Select the target column on which the prediction will focus.

Timestamp Column

Select the column that contains timestamp data. Available for ML_ANOMALY_DETECT only.

Preview the data and select Next.

Configure Destination

Prediction Interval Width

Select the interval width of prediction. Available for ML_ANOMALY_DETECT only.

Destination Table Name

Select the destination table in which the prediction results will be stored.

Destination Column

Select the destination column in which the prediction data will be saved.

Run as

Run the prediction notebook with or without personal credentials. Select one of the following:

  • Run as <username>: Runs the notebook using the permissions and access of the current user account.

  • Run as a Service Account: Runs the notebook independently of personal credentials, using a service account.

    Note

    Service accounts can only be created by Admin.

Review the Summary and generated Fusion SQL syntax in the Generated SQL Script. Select Start Prediction to run batch prediction on the trained ML model.

View Predictions of an Existing ML Model

To view the predictions of the trained ML model, select the ML model in the Name column. Select the Predictions tab to view prediction metadata and status.

Share an Existing ML Model

To share an existing ML model, select the ellipsis under the Actions column of the trained ML model, and select Share.

Delete an Existing ML Model

To delete an existing ML model, select the ellipsis under the Actions column of the trained ML model, and select Delete.

Status of ML Models

Status            Description
Pre-processing    The system is preparing data for ML model training (e.g., data cleaning, feature extraction).
Training          The ML model is currently being trained but results are not yet available.
Done              The ML model has been successfully trained and is ready for use.
Error             The ML model training or processing failed due to an error.

Last modified: November 21, 2025
