MySQL – How to select a random row while taking weight into account?

I have a table that looks like this:

id: primary key
content: varchar
weight: int

I want to randomly select a row from the table, but take the weight into account. For example, if I have 3 rows:

id, content, weight
1, "some content", 60
2, "other content", 40
3, "something", 100

The first row has a 30% chance of being selected, the second row has a 20% chance of being selected, and the third row has a 50% chance of being selected.
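To make the arithmetic explicit: each row's chance is its weight divided by the sum of all weights (here 60 + 40 + 100 = 200). A minimal sketch, assuming the table is called `items` (a hypothetical name, since the question does not give one):

```sql
-- `items` is a hypothetical table name used only for this illustration.
CREATE TABLE items (
    id      INT UNSIGNED PRIMARY KEY,
    content VARCHAR(100),
    weight  INT NOT NULL
);

INSERT INTO items (id, content, weight) VALUES
    (1, 'some content', 60),
    (2, 'other content', 40),
    (3, 'something', 100);

-- Expected selection probability of each row: weight / total weight.
SELECT id,
       content,
       weight,
       weight / (SELECT SUM(weight) FROM items) AS probability
FROM items;
```

For the three sample rows this gives 60/200, 40/200 and 100/200, that is 30%, 20% and 50%.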

Is there a way to do this? If I have to execute 2 or 3 queries, this is not a problem.

I have tried van’s solution; although it works, it is not fast.

My solution

The way I solve this problem is to maintain a separate link table for the weights. The basic table structure is similar to this:

CREATE TABLE `table1` (
`id` int(11) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
`name` varchar(100),
`weight` tinyint(4) NOT NULL DEFAULT '1'
);

CREATE TABLE `table1_weight` (
`id` bigint(20) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
`table1_id` int(11) NOT NULL
);

If a record in table1 has a weight of 3, then I create 3 records in table1_weight, each linked to table1 through the table1_id field. Whatever the weight value in table1 is, that is the number of linked records I create in table1_weight.
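One way the link table could be filled from the weights is sketched below. It is not necessarily how the table was populated for the test that follows. It assumes MySQL 8.0 or later (for the recursive CTE), an empty table1_weight, and weights of at least 1; the helper name `seq` is made up for this example.

```sql
-- Assumption: table1_weight is empty before this runs.
INSERT INTO table1_weight (table1_id)
WITH RECURSIVE seq (n) AS (
    -- Generate the numbers 1..127; 127 is the largest value a signed TINYINT can hold.
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM seq WHERE n < 127
)
-- Each table1 row matches exactly `weight` numbers, so it is inserted `weight` times.
SELECT t.id
FROM table1 t
INNER JOIN seq s ON s.n <= t.weight;
```

Whenever a weight in table1 changes, the matching rows in table1_weight have to be added or removed as well; that upkeep is the maintenance cost of this approach.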

Test

In a data set with 976 records in table1, the total weight is 2031, so there are 2031 records in table1_weight. I ran the following two queries:

> A version of van’s solution

SELECT t.*
FROM table1 t
INNER JOIN
    (SELECT t.id,
            SUM(tt.weight) AS cum_weight
     FROM table1 t
     INNER JOIN table1 tt ON tt.id <= t.id
     GROUP BY t.id) tc ON tc.id = t.id,
    (SELECT SUM(weight) AS total_weight
     FROM table1) tt,
    (SELECT RAND() AS rnd) r
WHERE r.rnd * tt.total_weight <= tc.cum_weight
ORDER BY t.id ASC
LIMIT 1
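The self-join in this query pairs every row with every row that has a smaller or equal id, which is a plausible reason for its cost. As a sketch only (it is not one of the queries timed in this test, and it assumes MySQL 8.0 or later for window functions), the same cumulative-weight idea can be written with a running sum:

```sql
-- Draw the random number once so every row is compared against the same value.
SET @rnd = RAND();

SELECT tc.id, tc.name, tc.weight
FROM (
    SELECT id,
           name,
           weight,
           SUM(weight) OVER (ORDER BY id) AS cum_weight,   -- running total up to this row
           SUM(weight) OVER ()            AS total_weight  -- total over all rows
    FROM table1
) tc
-- The first row whose running total exceeds the random threshold is the pick.
WHERE @rnd * tc.total_weight < tc.cum_weight
ORDER BY tc.id
LIMIT 1;
```

The strict `<` comparison keeps a row with weight 0 from being picked when the random value happens to be 0.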

> Join with the auxiliary weight table

SELECT t.*
FROM table1 t
INNER JOIN table1_weight w
ON w.table1_id = t.id
ORDER BY RAND()
LIMIT 1

SQL 1 consistently takes 0.4 seconds.

SQL 2 takes 0.01 to 0.02 seconds.

Conclusion

If the speed of selecting a random weighted record is not a problem, the single-table SQL suggested by van is good, and there is no overhead of maintaining a separate table.

If, as in my case, a short selection time is critical, then I would suggest the two-table method.

