MySQL CUME_DIST Function

Summary: in this tutorial, you will learn how to use the MySQL CUME_DIST() function to calculate the cumulative distribution of a value.

Overview of MySQL `CUME_DIST()` Function

CUME_DIST() is a window function that returns the cumulative distribution of a value within a set of values. For each row, it returns the number of rows with values less than or equal to that row’s value, divided by the total number of rows.

The value returned by the CUME_DIST() function is greater than zero and less than or equal to one (0 < CUME_DIST() <= 1). Rows with the same column value receive the same CUME_DIST() value.

The following shows the syntax of the CUME_DIST() function:

CUME_DIST() OVER (
    [PARTITION BY expr, ...]
    ORDER BY expr [ASC | DESC], ...
)


In this syntax, the PARTITION BY clause divides the result set returned by the FROM clause into partitions to which the CUME_DIST() function applies.

The ORDER BY clause specifies the logical order of the rows in each partition, or in the whole result set if the PARTITION BY clause is omitted. The CUME_DIST() function calculates the cumulative distribution value of each row based on its position in that order.
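As a minimal sketch of how PARTITION BY changes the calculation, the following statements use a hypothetical class_scores table. The table, its class_name column, and its rows are assumptions made up for this illustration and are not part of the tutorial's sample data.

```sql
-- Hypothetical table, used only for this illustration.
CREATE TABLE class_scores (
    class_name VARCHAR(10) NOT NULL,
    name       VARCHAR(20) NOT NULL,
    score      INT NOT NULL,
    PRIMARY KEY (class_name, name)
);

INSERT INTO class_scores (class_name, name, score)
VALUES
    ('A', 'Smith', 81),
    ('A', 'Jones', 55),
    ('A', 'Taylor', 62),
    ('A', 'Brown', 62),
    ('B', 'Davies', 84),
    ('B', 'Evans', 87);

-- The distribution is computed separately inside each class.
SELECT
    class_name,
    name,
    score,
    CUME_DIST() OVER (
        PARTITION BY class_name
        ORDER BY score
    ) AS cume_dist_val
FROM
    class_scores
ORDER BY
    class_name, score, name;
```

Because each partition is evaluated on its own, the divisor is the number of rows in the class, not in the whole table. In class A, which has four rows, the two rows with score 62 both get 3/4 = 0.75. In class B, which has two rows, the score 84 gets 1/2 = 0.5.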

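When all values in the ORDER BY column are distinct, the result of the CUME_DIST() function matches the following formula. For rows with tied values, the numerator is the row number of the last row in the tie, so the formula is only an approximation: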

ROW_NUMBER() / total_rows

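To see where the row-number ratio and CUME_DIST() diverge, here is a small sketch that compares the two on four inline values, two of which are tied. The derived table t, its column v, and the alias approx_val are assumptions chosen for the example.

```sql
-- Compare the ROW_NUMBER() / total_rows ratio with CUME_DIST()
-- on a tiny inline data set that contains a tie (20 appears twice).
SELECT
    v,
    ROW_NUMBER() OVER (ORDER BY v) / COUNT(*) OVER () AS approx_val,
    CUME_DIST() OVER (ORDER BY v) AS cume_dist_val
FROM (
    SELECT 10 AS v
    UNION ALL SELECT 20
    UNION ALL SELECT 20
    UNION ALL SELECT 30
) AS t
ORDER BY
    v;
```

For the values 10 and 30 the two columns agree (1/4 and 4/4). For the two rows with the value 20, the ratio gives 2/4 and 3/4, while CUME_DIST() returns 3/4 = 0.75 for both, because three rows have a value less than or equal to 20.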

MySQL `CUME_DIST()` function example

Let’s create a table named scores with some sample data for the demonstration:

CREATE TABLE scores (
    name VARCHAR(20) PRIMARY KEY,
    score INT NOT NULL
);

INSERT INTO
    scores(name, score)
VALUES
    ('Smith', 81),
    ('Jones', 55),
    ('Williams', 55),
    ('Taylor', 62),
    ('Brown', 62),
    ('Davies', 84),
    ('Evans', 87),
    ('Wilson', 72),
    ('Thomas', 72),
    ('Johnson', 100);


The following statement calculates the cumulative distribution of each score in the result set:

SELECT
    name,
    score,
    ROW_NUMBER() OVER (ORDER BY score) AS row_num,
    CUME_DIST() OVER (ORDER BY score) AS cume_dist_val
FROM
    scores;


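The output contains one row per name. The cume_dist_val column is 0.2 for both rows with score 55, 0.4 for score 62, 0.6 for score 72, 0.7 for score 81, 0.8 for score 84, 0.9 for score 87, and 1 for score 100.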

In this example, the rows are sorted by score in ascending order, from 55 to 100. Note that the ROW_NUMBER() function was added for reference.

So how does the CUME_DIST() function perform the calculation?

For the first row, the function counts the rows in the result set whose score is less than or equal to 55. The count is 2. The CUME_DIST() function then divides 2 by the total number of rows, which is 10: 2/10. The result is 0.2, or 20%. The same logic applies to the second row.

For the third row, the function counts the rows with scores less than or equal to 62. There are four such rows, so the result of the CUME_DIST() function is 4/10 = 0.4, or 40%.

The same calculation logic is applied to the remaining rows.
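The counting described above can be checked by hand with a sketch like the following, which assumes the scores table created earlier in this tutorial. The manual_val column and the s1 and s2 aliases are names introduced only for this illustration.

```sql
-- Recompute the cumulative distribution with plain counts
-- and show it next to the value returned by CUME_DIST().
SELECT
    s1.name,
    s1.score,
    CUME_DIST() OVER (ORDER BY s1.score) AS cume_dist_val,
    -- rows with a score <= this row's score, divided by all rows
    (SELECT COUNT(*) FROM scores s2 WHERE s2.score <= s1.score)
        / (SELECT COUNT(*) FROM scores) AS manual_val
FROM
    scores s1
ORDER BY
    s1.score, s1.name;
```

The two columns should hold the same numeric value on every row, although MySQL may display them with a different number of decimal places because CUME_DIST() returns a floating-point value while the division of two counts returns a decimal.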

In this tutorial, you have learned how to use the MySQL CUME_DIST() function to calculate the cumulative distribution of a value in a set of values.
