Performance – Combinations (n choose k) Parallelization and Efficiency

Recently I have been working with word combinations to make “phrases” in different languages, and I have noticed a few things that would benefit from more expert input.

Define some constants for this:

The average depth (n) is 6-7.

The length of the input set is about 160 unique words.

> Memory – Generating n permutations of 160 words takes up a lot of space. I can abuse a database by writing them to disk, but then I take a constant performance hit waiting on IO. The other trick is to generate the combinations on the fly, like a generator object (see the sketch below).
> Time – If I am not wrong, n choose k gets big fast, something like the formula factorial(n)/(factorial(depth)*(factorial(n-depth))), which means that the input set gets huge very quickly.
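
For a rough sense of scale, here is a minimal Python sketch (the word list is just a placeholder): `math.comb(160, 6)` is on the order of 2 × 10^10, which is why materializing everything up front is impractical, while `itertools.combinations` is exactly the lazy, generator-style enumeration mentioned above.

    import math
    from itertools import combinations

    n, depth = 160, 6
    print(math.comb(n, depth))                # ~2.1e10 combinations at depth 6

    words = ["w%d" % i for i in range(n)]     # placeholder for the real word list
    for combo in combinations(words, depth):  # lazy: one tuple at a time, O(depth) memory
        break                                 # feed each combo to f(x) here instead of breaking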

My question is this.

Consider that I have a function f(x) that takes a combination and applies a calculation that has a cost, for example:

    func f(x) {
        if query_mysql("text search query").value > 15 {
            return true
        }
        return false
    }

How can I efficiently process and execute this function over a huge number of combinations?

Bonus question: can the combinations themselves be generated concurrently?

Update: I already know how to generate the combinations conventionally; the question is more about doing it efficiently.

One approach is to first calculate how much parallelism you can get, based on the number of threads you have. Let the number of threads be T and split the work as follows:

> Sort the elements according to some total ordering.
> Find the smallest number d such that choose(n, d) >= T.
> Find all combinations of depth (exactly) d (d is usually much lower than the full depth, so this can be computed on one core).
> Now spread the work across your T cores: each core gets a set of 'prefixes' (each prefix c is a combination of size d), and for each prefix it finds all the suffixes whose smallest element is larger than max(c) according to the total ordering (a sketch of this split follows the list).
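
As a concrete illustration of steps 2-4, here is a minimal Python sketch, assuming a list of words and a thread count T (the helper names `find_prefix_depth` and `work_units` are mine, not part of the answer):

    import math
    from itertools import combinations

    def find_prefix_depth(n, T):
        # Smallest d such that C(n, d) >= T, so there are at least T prefixes.
        d = 1
        while math.comb(n, d) < T:
            d += 1
        return d

    def work_units(words, T):
        # Yield (prefix, remaining_words) pairs; each pair is an independent task,
        # and every element of remaining_words comes after max(prefix) in the ordering.
        words = sorted(words)
        d = find_prefix_depth(len(words), T)
        for idx in combinations(range(len(words)), d):
            prefix = tuple(words[i] for i in idx)
            yield prefix, words[idx[-1] + 1:]

A worker that receives one (prefix, remaining) pair then only has to enumerate combinations(remaining, k - d), concatenate each suffix onto the prefix, and run f on the result.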

This approach also translates nicely to the map-reduce paradigm.

    map(words): // one mapper
        sort(words) // by some total ordering function
        generate all combinations of depth `d` exactly // NOT k!!!
        for each combination c produced:
            idx <- index in words of max(c)
            emit(c, words[idx+1:end])
    reduce(c1, words): // T reducers
        combinations <- generate all combinations of size k-d from words
        for each c2 in combinations:
            c <- concat(c1, c2)
            emit(c, f(c))
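
To make the pseudocode concrete, here is a minimal Python sketch of the same split using `multiprocessing` (the stub `f`, the word list, and the values of T and k are placeholders of mine, not from the post):

    import math
    from itertools import combinations
    from multiprocessing import Pool

    def f(combo):
        # Stand-in for the costly check (the real one would run the MySQL text-search query).
        return len("".join(combo)) > 15

    def mapper(words, T, k):
        # Mirror of map(): emit one (prefix, remaining words, suffix length) task per work unit.
        words = sorted(words)
        d = 1
        while math.comb(len(words), d) < T:   # smallest d with C(n, d) >= T
            d += 1
        for idx in combinations(range(len(words)), d):
            prefix = tuple(words[i] for i in idx)
            yield prefix, words[idx[-1] + 1:], k - d

    def reducer(task):
        # Mirror of reduce(): extend one prefix with every (k - d)-sized suffix and apply f.
        prefix, remaining, suffix_len = task
        return [(prefix + suffix, f(prefix + suffix))
                for suffix in combinations(remaining, suffix_len)]

    if __name__ == "__main__":
        T, k = 4, 3
        words = ["w%02d" % i for i in range(20)]   # small placeholder set for demonstration
        with Pool(T) as pool:
            for pairs in pool.imap_unordered(reducer, mapper(words, T, k)):
                pass   # each item is a list of (combination, f(combination)) results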
