[Data Structure] Buron Filter

1. The introduction of Bloom filter:
We know that finding a number in a large number of integers is done by using a bitmap Yes; if you want to find out whether a string is in a large number of strings, the bitmap cannot be resolved, so the Bloom filter is introduced.


2. Overview of Bloom filter:
Bloom filter is proposed by Bloom, it is composed of a series of binary vectors and Some mapping functions are implemented. It is mainly used to retrieve whether a string is in a set, and the space efficiency and query time are far beyond the general algorithm. It can also represent the complete set. Other data structures can’t~ but it is also There are certain shortcomings: a certain degree of misjudgment.
Why is there misjudgment?
When two strings are projected to the same position through a certain mapping function, we will set the corresponding position to 1. When querying, this bit is 1, which does not guarantee that both strings are Exist~ Only one can be guaranteed to exist~ When this bit is 0, the string must not exist, so ~~**The absence of existence is accurate, and the existence is inaccurate**~
To reduce misjudgment, we take a measure: eliminate different hash functions to map a string to different bits, and only when these bits are 1 at the same time, can it be guaranteed that the string exists (when the projected bit The more the number, the lower the false positive rate)
According to the “existence is inaccurate, non-existence is accurate” of the bloom filter, the bloom filter is generally used
When logging in to some websites or application software, the system will first search in the Bloom filter. If this information does not exist, it will prompt an error exit~ Otherwise, check whether the user name entered by the user is in the database (store user information), and if it exists, then Allow login.
Bloom filters generally do not allow deletion, because a certain bit may be projected from multiple characters~ we can’t just set this position to 0. If you want to delete operation, you must introduce a reference count. The reference count cannot be stored in one bit or one byte (a byte can represent 0~255, 256 numbers at most), and an integer number must be used, so the use of space, It has become more all of a sudden~


3. The implementation code of Bloom filter

#pragma once#include"BitMap.hpp"#includetemplate <typename K>struct __HashFunc1{ size_t BKDRHash(const char* str) {register  size_t hash = 0; size_t ch = 0; while(ch = (size_t)*str++) {hash = hash * 131 + ch;}  return hash;} size_t operator()(const string& key) {return< /span> BKDRHash(key.c_str()); }};template<typename K>struct __HashFunc2{ size_t SDBMHash(const char* str ) {register size_t hash = 0; size_t ch = 0 ; while(ch = (size_t)*str++) {hash = hash * 65599 + ch;} return hash;} size_t operator()( const string& key) {return SDBMHas h(key.c_str()); }};template<typename K>struct __HashFunc3{ size_t RSHash(const char* str) {< span class="hljs-keyword">register size_t hash = 0; size_t magic = 63689; size_t ch = 0; while(ch = (size_t)*str++) {hash = hash * magic + ch; magic *= 378551;} return hash;} size_t operator()(const string& key) {return RSHash(key.c_str()); }};template<typename K>struct __HashFunc4{ size_t RSHash(const char* str) {register size_t hash = 0< /span>; size_t magic = 63689; size_t ch = 0; for(long i = 0; ch = (size_t)*str ++ ; ++i) {if((i & 1) == 0) hash ^= ((hash << 7) ^ ch ^(hash >>  3)); else hash ^= ((hash << 11) ^ ch ^ (hash >> 5 ));} return hash;} size_t operator()(const string& key) {return RSHash(key.c_str());} };template<typename K>struct __HashFunc5{ size_t RSHash(const char* str) {if (!*str) return 0;  register size_t hash = 1315423911; size_t magic = 63689; size_t ch = 0; while(ch = (size_t)*str++) {ha sh ^= ((hash << 5) + ch + (hash >> 2));} return hash;} size_t operator()(const string& key) {return RSHash(key.c_str()); }};template<typename K = string,typename HashFunc1 = __HashFunc1,typename HashFunc2 = __HashFunc2,typename HashFunc3 = __HashFunc3,typename HashFunc4 = __HashFunc4,typename HashFunc5 = __HashFunc5 >class BloomFilter{public: BloomFilter(size_t num ) :_bitMap(num * 5) ,_range(num * 5) {} void Set(const K& key) {size_t hash1 = HashFunc1()(key)% _range; size_t hash2 = HashFunc2()( key)% _range; size_t hash3 = HashFunc3()(key)% _range; size_t hash4 = HashFunc4()(key)% _range; size_t hash5 = HashFunc5()(key)% _range; _bitMap.Set(hash1); cout << hash1 <cout << hash2 <cout << hash3 <cout << hash4 <cout << hash5 <bool Test(const K& key) {size_t hash1 = HashFunc1()(key)% _range; if(_bitMap.Test(hash1) == 0) return false; size_t hash2 = HashFunc2()( key)% _range; if(_bitMap.Test(hash2) == 0) return false; size_t hash3 = HashFunc3()(key)% _range; if< /span>(_bitMap.Test(hash3) == 0) return false; size_t hash4 = HashFunc4()(key)% _range; if(_bitMap.Test(hash4) == 0) return false span>; size_t hash5 = HashFunc5()(key)% _range; if(_bitMap.Test(hash5) == 0 ) return false; return true; }private: BitMap _bitMap; size_t _range;};void TestBloomFilter(){ BloomFilter<> bf(5); bf.Set("_bitMap. Test(hash2) == 0"); cout << endl; bf.Set("_bitMap. Test(hash3) == 0"); cout << endl; bf.Set("_bitMap. Test(hash4) == 0"); cout << endl; cout<< bf.Test("_bitMap.T est(hash2) == 0")<cout<"ั๎ อฌัง")<cout<"_bitMap.Test(hash4 ) == 0")<

4. Bloom filter with delete operation:

template <typename K>struct __HashFunc1{ size_t BKDRHash(const char* str) {register size_t hash = 0; size_t ch = 0; while (ch = (size_t)*str++) {hash = hash * 131 + ch;} return hash; } size_t operator()(const string& key) {return BKDRHash(key.c_str( )); }};template<typename K>struct  __HashFunc2{ size_t SDBMHash(const char* str) {register size_t hash = 0; size_t ch = 0; while(ch = (size_t)*str++) {hash = hash * 65599 + ch;} return hash;} size_t operator()(const string& key) {return SDBMHash(key.c_str()); }};template<typename K>struct __HashFunc3{ size_t RSHash(const char* str) { register size_t hash = 0; size_t magic = 63689; size_t ch = 0; while(ch = (size_t)*str++) {hash = hash * magic + ch; magic *= 378551;} return hash;} size_t operator( )(const string& key) {return RSHash(key.c_str()); }};template<typename K>struct __HashFunc4{ size_t RSHash(const char* str) {register size_t hash = 0; size_t magic = 63689; size_t ch = 0; for(long i = 0; ch = (size_t)*str ++; ++i) { if((i & 1) == 0) hash ^= ((hash << 7) ^ ch ^(hash >> 3) ); else hash ^= ((hash << 11) ^ ch ^(hash >> 5));} return hash;} size_t operator()(const string& key) {return RSHash(key.c_str()); }};template<typename K>struct __HashFunc5{ size_t RSHash (const char* str) {if (!*str) return 0; register size_t hash = 1315423911; size_t magic = 63689; size_t ch = 0; while(ch = (size_t)*str++) {hash ^= ((hash << 5) + ch + (hash >> 2));} return hash;} size_t operator()(const string& key) {return RSHash(key.c_str()); }};template <typename K = string,typename< /span> HashFunc1 = __HashFunc1,typename HashFunc2 = __HashFunc2,typename HashFunc3 = __HashFunc3 ,typename HashFunc4 = __HashFunc4,typename HashFunc5 = __HashFunc5 >class BloomFilter{public: BloomFilter(size_t num) {_refB itMap.resize(num * 5); _range = num * 5;} void Set(const K& key) {size_t hash1 = HashFunc1()(key)% _range; size_t hash2 = HashFunc2()(key )% _range; size_t hash3 = HashFunc3()(key)% _range; size_t hash4 = HashFunc4()(key)% _range; size_t hash5 = HashFunc5()(key)% _range;  //_bitMap.Set(hash1); _refBitMap[hash1]++; cout << hash1 <cout << hash2 <cout << hash3 <cout << hash4 <cout << hash5 <bool Test(const K& key) {size_t hash1 = HashFunc1()(key)% _range; if(_refBitMap[hash1] == 0) return false; size_t hash2 = HashFunc2()(key)% _range; if(_refBitMap[hash2] = = 0) return false; size_t hash3 = HashFunc3()(key)% _range; if(_refBitMap[hash3] == 0) return false; size_t hash4 = HashFunc4()(key)% _range; if(_refBitMap[hash4] == 0) return false; size_t hash5 = HashFunc5()(key)% _range;  if(_refBitMap[hash5] == 0) return false; return true;} bool ReSet(const string& key) {size_t hash1 = HashFunc1()(key )% _range; size_t hash2 = HashFunc2()(key)% _range; size_t hash3 = HashFunc3()(key)% _range; size_t hash4 = HashFunc4()(key)% _range; size_t hash5 = HashFunc5()(key)% _range; if (_refBitMap[hash1] == 0 || _refBitMap[hash2] == 0 || _refBitMap[hash3] == < span class="hljs-number">0 || _refBitMap[hash4] == 0 || _refBitMap[hash5] == 0) {return false;} _refBitMap[hash1]-- ; _refBitMap[hash2]--; _refBitMap[hash3]--; _refBitMap[hash4]--; _refBitMap[hash5]--; }private: vector _refBitMap; size_t _range;};void TestRefBloomFilter(){ BloomFilter<> bf(5); bf.Set("_bitMap.Test(hash2) == 0 "); cout << endl; bf.Set("_bitMap.Test(hash3) == 0 "); cout << endl; bf.Set(" _bitMap.Test(hash4) == 0"); cout << endl; bf.ReSet(" _bitMap.Test(hash4) == 0"); cout<"_bitMap. Test(hash2) == 0")<cout<"Yang Classmate")<cout<"_bitMap.Test(hash4) = = 0")<

1. Introduction of Bloom filter:
We know that searching for a number in a large number of integers is done by using a bitmap; if you want to find out whether a string is in a large number of strings, the bitmap cannot be resolved, so the Bloom filter is introduced.


2. Overview of Bloom filter:
Bloom filter is proposed by Bloom, it is composed of a series of binary vectors and Some mapping functions are implemented. It is mainly used to retrieve whether a string is in a set, and the space efficiency and query time are far beyond the general algorithm. It can also represent the complete set. Other data structures can't~ but it is also There are certain shortcomings: a certain degree of misjudgment.
Why is there misjudgment?
When two strings are projected to the same position through a certain mapping function, we will set the corresponding position to 1. When querying, this bit is 1, which does not guarantee that both strings are Exist~ Only one can be guaranteed to exist~ When this bit is 0, the string must not exist, so ~~**The absence of existence is accurate, and the existence is inaccurate**~
To reduce misjudgment, we take a measure: eliminate different hash functions to map a string to different bits, and only when these bits are 1 at the same time, can it be guaranteed that the string exists (when the projected bit The more the number, the lower the false positive rate)
According to the "existence is inaccurate, non-existence is accurate" of the bloom filter, the bloom filter is generally used
When logging in to some websites or application software, the system will first search in the Bloom filter. If this information does not exist, it will prompt an error exit~ Otherwise, check whether the user name entered by the user is in the database (store user information), and if it exists, then Allow login.
Bloom filters generally do not allow deletion, because a certain bit may be projected from multiple characters~ we can't just set this position to 0. If you want to delete operation, you must introduce a reference count. The reference count cannot be stored in one bit or one byte (a byte can represent 0~255, 256 numbers at most), and an integer number must be used, so the use of space, It has become more all of a sudden~


3. The implementation code of Bloom filter

#pragma once#include"BitMap.hpp"#includetemplate <typename K>struct __HashFunc1{ size_t BKDRHash(const char* str) {register  size_t hash = 0; size_t ch = 0; while(ch = (size_t)*str++) {hash = hash * 131 + ch;}  return hash;} size_t operator() (const string& key) {return BKDRHash(key.c_str()); }};template<typename K>struct __HashFunc2{ size_t SDBMHash(const char* str) {< span class="hljs-keyword">register size_t hash = 0; size_t ch = 0; while(ch = (size_t)*str++) {hash = hash * 65599 + ch;} return hash;} size_t operator()(const string& key) {return SDBMHash(k ey.c_str()); }};template<typename K>struct __HashFunc3{ size_t RSHash(const char* str) {register size_t hash = 0; size_t magic = 63689; size_t ch = 0; while(ch = (size_t)*str++) {hash = hash * magic + ch; magic *= 378551;} return hash;} size_t operator()(const string& key) {return RSHash(key.c_str()); }};template<typename K>struct __HashFunc4{ size_t RSHash(const char* str) {register size_t hash = 0; size_t magic = 63689; size_t ch = 0; for(long i = 0; ch = (size_t)*str ++; ++i) {if((i & 1) == 0) hash ^= ((hash << 7) ^ ch ^(hash >> 3 )); else hash ^= ((hash << 11) ^ ch ^( hash >> 5)); } return hash;} size_t operator()(const< /span> string& key) {return RSHash(key.c_str()); }};< span class="hljs-keyword">template<typename K>struct __HashFunc5{ size_t RSHash(const char* str) {if(!*str) return 0; register span> size_t hash = 1315423911; size_t magic = 63689; size_t ch = 0; while(ch = (size_t)*str++) {hash ^= ((hash << 5) + ch + (hash >> 2));} < span class="hljs-keyword">return hash;} size_t operator()(const string& key) {return RSHash(key.c_str()); }};template<typename K = string,typename HashFunc1 = __HashFunc1,typename HashFunc2 = __HashFunc2, typename HashFunc3 = __HashFunc3,typename HashFunc4 = __HashFunc4,typename HashFunc5 = __HashFunc5 >class BloomFilter{public: BloomFilter(size_t num) :_bitMap(num * 5) ,_range(num * 5) {} void Set(const K& key) {size_t hash1 = HashFunc1()(key)% _range; size_t hash2 = HashFunc2()(key )% _range; size_t hash3 = HashFunc3()(key)% _range; size_t hash4 = HashFunc4()(key)% _range; size_t hash5 = HashFunc5()(key)% _range; _bitMap.Set(hash1); cout << hash1 <cout << hash2 <cout << hash3 <cout < cout << hash5 <bool Test(co nst K& key)    {        size_t hash1 = HashFunc1()(key) % _range;        if(_bitMap.Test(hash1) == 0)            return false;        size_t hash2 = HashFunc2()(key) % _range;        if(_bitMap.Test(hash2) == 0)            return false;        size_t hash3 = HashFunc3()(key) % _range;        if(_bitMap.Test(hash3) == 0)            return false;        size_t hash4 = HashFunc4()(key) % _range;        if(_bitMap.Test(hash4) == 0)            return false;        size_t hash5 = HashFunc5()(key) % _range;        if(_bitMap.Test(hash5) == 0)            return false;        return true;    }private:    BitMap _bitMap;    size_t _range;};void TestBloomFilter(){    BloomFilter<> bf(5);    bf.Set("_bitMap.Test(hash2) == 0");    cout << endl;    bf.Set("_bitMap.Test(hash3) == 0");    cout << endl;    bf.Set("_bitMap.Test(hash4) == 0");    cout << endl;    cout<"_bitMap.Test (hash2) == 0")<cout<"ั๎อฌัง")<cout<"_bitMap.Test(hash4) == 0")<

4.带删除操作的布隆过滤器:

template <typename K>struct __HashFunc1{    size_t BKDRHash(const char* str)    {        register size_t hash = 0;        size_t ch = 0;        while(ch = (size_t)*str++)        {            hash = hash * 131 + ch;        }        return hash;    }    size_t operator()(const  string& key)    {        return BKDRHash(key.c_str());    }};template<typename K>struct __HashFunc2{    size_t SDBMHash(const char* str)    {        register size_t hash = 0;        size_t ch = 0;        while(ch = (size_t)*str++)        {            hash = hash * 65599 + ch;        }        return hash;    }    size_t operator()(const  string& key)    {        return SDBMHash(key.c_str());    }};template<typename K>struct __HashFunc3{    size_t RSHash(const char* str)    {        register size_t hash = 0;        size_t magic = 63689;        size_t ch = 0;        while(ch = (size_t)*str++)        {            hash = hash * magic + ch;            magic *= 378551;        }        return hash;    }    size_t operator()(const  string& key)    {        return RSHash(key.c_str());    }};template<typename K>struct __HashFunc4{    size_t RSHash(const char* str)    {        register size_t hash = 0;        size_t magic = 63689;        size_t ch = 0;        for(long i = 0; ch = (size_t)*str ++; ++i)        {            if((i & 1) == 0)                hash ^= ((hash << 7) ^ ch ^(hash >> 3));            else                hash ^= ((hash << 11) ^ ch ^(hash >> 5));        }        return hash;    }    size_t operator()(const  string& key)    {        return RSHash(key.c_str());    }};template<typename K>struct __HashFunc5{    size_t RSHash(const char* str)    {        if(!*str)            return 0;        register size_t hash = 1315423911;        size_t magic = 63689;        size_t ch = 0;        while(ch = (size_t)*str++)        {            hash ^= ((hash << 5) + ch + (hash >> 2));        }        return hash;    }    size_t operator()(const  string& key)    {        return RSHash(key.c_str());    }};template<typename K = string,typename HashFunc1 = __HashFunc1,typename HashFunc2 = __HashFunc2,typename HashFunc3 = __HashFunc3,typename HashFunc4 = __HashFunc4,typename HashFunc5 = __HashFunc5 >class BloomFilter{public:    BloomFilter(size_t num)    {        _refBitMap .resize(num * 5);        _range = num * 5;    }    void Set(const K& key)    {        size_t hash1 = HashFunc1()(key) % _range;        size_t hash2 = HashFunc2()(key) % _range;        size_t hash3 = HashFunc3()(key) % _range;        size_t hash4 = HashFunc4()(key) % _range;        size_t hash5 = HashFunc5()(key) % _range;        //_bitMap.Set(hash1);        _refBitMap[hash1]++;        cout << hash1 <cout << hash2 <cout << hash3 <cout << hash4 <cout << hash5 <bool Test(const K& key)    {        size_t hash1 = HashFunc1()(key) % _range;        if(_refBitMap[hash1] == 0)            return false;        size_t hash2 = HashFunc2()(key) % _range;        if(_refBitMap[hash2] == 0)            return false;        size_t hash3 = HashFunc3()(key) % _range;        if(_refBitMap[hash3] == 0)            return false;        size_t hash4 = HashFunc4()(key) % _range;        if(_refBitMap[hash4] == 0)            return false;        size_t hash5 = HashFunc5()(key) % _range;        if(_refBitMap[hash5] == 0)            return false;        return true;    }    bool ReSet(const string& key)    {        size_t hash1 = HashFunc1()(key) % _range;        size_t hash2 = HashFunc2()(key) % _range;        size_t hash3 = HashFunc3()(key) % _range;        size_t hash4 = HashFunc4()(key) % _range;        size_t hash5 = HashFunc5()(key) % _range;        if (_refBitMap[hash1] == 0            || _refBitMap[hash2] == 0            || _refBitMap[hash3] == 0            || _refBitMap[hash4] == 0            || _refBitMap[hash5] == 0)        {            return false;        }        _refBitMap[hash1]--;        _refBitMap[hash2]--;        _refBitMap[hash3]--;        _refBitMap[hash4]--;        _refBitMap[hash5]--;    }private:    vector _refBitMap;    size_t _range;};void TestRefBloomFilter(){    BloomFilter<> bf(5);    bf.Set("_bitMap.Test(hash2) == 0");    cout << endl;    bf.Set("_bitMap.Test(hash3) == 0");    cout << endl;    bf.Set("_bitM ap.Test(hash4) == 0");    cout << endl;    bf.ReSet("_bitMap.Test(hash4) == 0");    cout<"_bitMap.Test(hash2) == 0")<cout<"杨同学")<cout<"_bitMap.Test(hash4) == 0")<

Leave a Comment

Your email address will not be published.