I have a table that looks like this:
id: primary key
content: varchar
weight: int
What I want to do is randomly select a row from the table, taking the weight into account. For example, if I have 3 rows:
id, content, weight
1, "some content", 60
2, "other content", 40
3, "something", 100
The first row has a 30% chance of being selected, the second row has a 20% chance, and the third row has a 50% chance.
Is there a way to do this? If I have to execute 2 or 3 queries, it is not a problem.
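To make the arithmetic behind those percentages explicit: each row's chance is its weight divided by the sum of all weights (60 + 40 + 100 = 200). The snippet below is only an illustrative sketch in Python, not part of the database solution; the names `rows` and `selection_probabilities` are made up for the example.

```python
from fractions import Fraction

# Example rows from the question: (id, content, weight)
rows = [
    (1, "some content", 60),
    (2, "other content", 40),
    (3, "something", 100),
]

def selection_probabilities(rows):
    """Map each row id to weight / total_weight."""
    total = sum(weight for _, _, weight in rows)
    return {row_id: Fraction(weight, total) for row_id, _, weight in rows}

probabilities = selection_probabilities(rows)
for row_id, p in probabilities.items():
    print(f"row {row_id}: {float(p):.0%}")
# row 1: 30%
# row 2: 20%
# row 3: 50%
```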
I have tried van’s solution and although it works, it is not fast.
My solution
My solution to this problem is to maintain a separate link table for the weights. The basic table structure is similar to this:
CREATE TABLE `table1` (
`id` int(11) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
`name` varchar(100),
`weight` tinyint(4) NOT NULL DEFAULT '1'
);
CREATE TABLE `table1_weight` (
`id` bigint(20) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
`table1_id` int(11) NOT NULL
);
If I have a record with a weight of 3 in table1, I create 3 records in table1_weight, each linked to table1 through the table1_id field. Whatever the weight value is in table1, that is the number of linked records I create in table1_weight.
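One possible way to keep the link table in step with the weights is sketched below. It is an assumption-laden illustration, not the code actually used: it runs against an in-memory SQLite database through Python's `sqlite3` module rather than MySQL, the column types are simplified accordingly, and the helper `rebuild_weight_rows` and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified SQLite versions of the two tables described above.
conn.executescript("""
    CREATE TABLE table1 (
        id INTEGER PRIMARY KEY,
        name TEXT,
        weight INTEGER NOT NULL DEFAULT 1
    );
    CREATE TABLE table1_weight (
        id INTEGER PRIMARY KEY,
        table1_id INTEGER NOT NULL
    );
""")
# Hypothetical sample data: (id, name, weight)
conn.executemany(
    "INSERT INTO table1 (id, name, weight) VALUES (?, ?, ?)",
    [(1, "a", 3), (2, "b", 1), (3, "c", 2)],
)

def rebuild_weight_rows(conn, table1_id):
    """Make table1_weight hold exactly `weight` rows for one table1 record."""
    (weight,) = conn.execute(
        "SELECT weight FROM table1 WHERE id = ?", (table1_id,)
    ).fetchone()
    # Drop the old link rows, then insert one row per unit of weight.
    conn.execute("DELETE FROM table1_weight WHERE table1_id = ?", (table1_id,))
    conn.executemany(
        "INSERT INTO table1_weight (table1_id) VALUES (?)",
        [(table1_id,)] * weight,
    )

for (row_id,) in conn.execute("SELECT id FROM table1").fetchall():
    rebuild_weight_rows(conn, row_id)

counts = dict(conn.execute(
    "SELECT table1_id, COUNT(*) FROM table1_weight GROUP BY table1_id"
))
print(counts)  # one link row per unit of weight for each table1 id
```

The helper would need to be called again whenever a weight changes or a record is added; deleting a record would also require deleting its link rows. That upkeep is the maintenance overhead of the two-table approach.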
Test
In a data set with 976 records in table1, the total weight is 2031, so there are 2031 records in table1_weight. I ran the following two SQL statements:
> A version of van’s solution
SELECT t.*
FROM table1 t
INNER JOIN
(SELECT t.id,
SUM(tt.weight) AS cum_weight
FROM table1 t
INNER JOIN table1 tt ON tt.id <= t.id
GROUP BY t.id) tc ON tc.id = t.id,
(SELECT SUM(weight) AS total_weight
FROM table1) tt,
(SELECT RAND() AS rnd) r
WHERE r.rnd * tt.total_weight <= tc.cum_weight
ORDER BY t.id ASC
LIMIT 1
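The logic of this query can be sketched outside the database. Each row gets a running total of the weights of all rows up to and including it, and the first row (by id) whose running total reaches rnd * total_weight is returned. The Python below is only an illustration of that idea; `pick_by_cumulative_weight` is a hypothetical name, and passing `rnd` explicitly is just a convenience for showing which row a given random value lands on. Note that the self-join on `tt.id <= t.id` pairs every row with all rows before it, roughly n²/2 pairs for n rows, which is a plausible reason for the slower timing.

```python
import random
from itertools import accumulate

def pick_by_cumulative_weight(rows, rnd=None):
    """Pick an id from rows, a list of (id, weight) tuples sorted by id.

    rnd is a value in [0, 1); a fresh random value is drawn when omitted.
    """
    if rnd is None:
        rnd = random.random()
    total_weight = sum(weight for _, weight in rows)
    cum_weights = accumulate(weight for _, weight in rows)
    threshold = rnd * total_weight
    # Same condition as the WHERE clause, scanning in id order (LIMIT 1).
    for (row_id, _), cum_weight in zip(rows, cum_weights):
        if threshold <= cum_weight:
            return row_id
    return None  # only reached for empty input or rnd >= 1

rows = [(1, 60), (2, 40), (3, 100)]  # cumulative weights: 60, 100, 200
print(pick_by_cumulative_weight(rows, rnd=0.10))  # 1  (20 <= 60)
print(pick_by_cumulative_weight(rows, rnd=0.40))  # 2  (80 <= 100)
print(pick_by_cumulative_weight(rows, rnd=0.75))  # 3  (150 <= 200)
```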
> Join with the auxiliary table for weighting
SELECT t.*
FROM table1 t
INNER JOIN table1_weight w
ON w.table1_id = t.id
ORDER BY RAND()
LIMIT 1
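Why this gives weighted results: the join produces one row per link record, so a table1 row with weight w appears w times, and picking one joined row uniformly at random selects it with probability w / total_weight. A rough Python sketch of that reasoning follows; the weights, the fixed seed, and the names `table1`, `table1_weight`, and `pick_via_link_table` are assumptions for the example, and the list stands in for the joined rows rather than for a real query.

```python
import random
from collections import Counter

# Hypothetical weights: table1 id -> weight
table1 = {1: 3, 2: 1, 3: 2}

# One entry per link row, mirroring the contents of table1_weight.
table1_weight = [
    row_id for row_id, weight in table1.items() for _ in range(weight)
]

def pick_via_link_table(rng):
    """Uniform pick over link rows, the effect of ORDER BY RAND() LIMIT 1."""
    return rng.choice(table1_weight)

rng = random.Random(42)  # fixed seed so the run is repeatable
draws = Counter(pick_via_link_table(rng) for _ in range(60_000))
total_weight = sum(table1.values())
for row_id, weight in table1.items():
    observed = draws[row_id] / 60_000
    print(f"id {row_id}: expected {weight / total_weight:.3f}, observed {observed:.3f}")
```

The observed frequencies should land close to weight / total_weight for each id.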
SQL 1 consistently takes 0.4 seconds.
SQL 2 takes 0.01 to 0.02 seconds.
Conclusion
If the speed of selecting randomly weighted records is not a problem, the single-table SQL suggested by van is good, and there is no overhead of maintaining a separate table.
If, as in my case, a short selection time is critical, then I would suggest the two-table method.