【Sumeng is committed to becoming the most outstanding data science community, focusing on big data, analysis and mining, and data visualization. Business scope: offline activities, online courses, headhunting services, and project matching.】
Translator: Juliashine
Let us now look at each use case in turn and consider which kind of system suits it.
What’s your opinion?
First, we need to look at the various data models. The classification of these models comes from Emil Eifrem and NoSQL Databases.
Document database
-
Origin: Inspired by Lotus Notes.
-
Data model: a collection of documents, each containing key-value pairs
-
Examples: CouchDB, MongoDB
-
Advantages: Natural data model, programmer friendly, rapid development, web friendly, CRUD.
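As a rough illustration of the document model described above, the following Python sketch treats a document as a nested set of key-value pairs. The `posts` collection, its field names, and the `find` and `find_by_tag` helpers are invented for illustration and do not correspond to the API of CouchDB, MongoDB, or any other product.

```python
# A "collection" is modeled as a plain list of documents. Each document is a
# nested structure of key-value pairs. All names here are hypothetical.
posts = [
    {
        "_id": 1,
        "title": "Choosing a database",
        "tags": ["nosql", "sql"],
        # Related data is embedded in the document instead of joined in.
        "comments": [{"author": "ann", "text": "Nice overview"}],
    },
    {
        "_id": 2,
        "title": "Graph basics",
        "tags": ["graph"],
        # Documents in the same collection need not share the same fields.
    },
]


def find(collection, **criteria):
    """Return documents whose top-level fields equal all given criteria."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


def find_by_tag(collection, tag):
    """Return documents whose 'tags' list contains the given tag."""
    return [doc for doc in collection if tag in doc.get("tags", [])]
```

Because the comments live inside the post document, reading a post together with its comments is a single lookup, which is the sense in which the model is CRUD friendly.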
Graph database
-
Origin: Euler and graph theory.
-
Data model: nodes and relationships, and can also handle key-value pairs.
-
Examples: AllegroGraph, InfoGrid, Neo4j
-
Advantages: Solves complex graph problems.
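To make the node-and-relationship model more concrete, here is a minimal Python sketch, assuming a tiny in-memory graph. The node names, the `follows` relationship, and the `within_hops` helper are hypothetical; a real graph database would provide its own query language for this kind of traversal.

```python
from collections import deque

# Nodes carry key-value properties (hypothetical sample data).
nodes = {
    "alice": {"age": 31},
    "bob": {"age": 27},
    "carol": {"age": 35},
    "dave": {"age": 40},
}

# Relationships are stored as adjacency sets: who follows whom.
follows = {
    "alice": {"bob"},
    "bob": {"carol"},
    "carol": {"dave"},
    "dave": set(),
}


def within_hops(graph, start, max_hops):
    """Breadth-first search: nodes reachable from start in at most max_hops steps."""
    seen = {start}
    reachable = set()
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                reachable.add(neighbor)
                queue.append((neighbor, depth + 1))
    return reachable
```

A question such as "who is within two hops of alice" is a short traversal here, whereas in a relational schema it would typically require one self-join per hop.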
Relational database
-
Origin: E. F. Codd's paper A Relational Model of Data for Large Shared Data Banks.
-
Data model: a set of relations
-
Examples: VoltDB, Clustrix, MySQL
-
Advantages: High-performance, scalable OLTP; SQL support; materialized views; transaction support; programmer friendly.
Object database
-
Origin: Graph database research.
-
Data model: objects
-
Examples: Objectivity, Gemstone
-
Advantages: Complex object models, fast key-value access, key-function access, and the strengths of graph databases.
Key-Value database
-
Origin: Amazon's Dynamo paper and distributed hash tables.
-
Data model: key-value pairs
-
Examples: Membase, Riak
-
Advantages: Handles large amounts of data and serves a large number of read and write requests quickly. Programmer friendly.
BigTable type database
-
Origin: Google's Bigtable paper.
-
Data model: column families; in theory, each row can have its own set of columns
-
Examples: HBase, Hypertable, Cassandra
-
Advantages: Handles large amounts of data, copes with extremely high write loads, high availability, support for multiple data centers, MapReduce.
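The column-family model can be pictured as a map from row key to column family to column name to value. The sketch below is a toy in-memory version in Python under that assumption; the `WideRowTable` class and its methods are hypothetical and are not the API of HBase, Hypertable, or Cassandra.

```python
from collections import defaultdict


class WideRowTable:
    """Toy table: row key -> column family -> column name -> value."""

    def __init__(self, families):
        # Column families are fixed up front; columns inside them are not.
        self._families = set(families)
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        if family not in self._families:
            raise KeyError(f"unknown column family: {family}")
        self._rows[row_key][family][column] = value

    def get_row(self, row_key):
        """Return a plain-dict copy of the row, or {} if it does not exist."""
        row = self._rows.get(row_key, {})
        return {family: dict(columns) for family, columns in row.items()}

    def scan(self, start_key, stop_key):
        """Row keys in sorted order with start_key <= key < stop_key."""
        return [key for key in sorted(self._rows) if start_key <= key < stop_key]
```

Two rows in the same table can hold entirely different columns, which is what "each row can have its own set of columns" means in practice.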
Data structure server
-
Origin: ?
-
Data model: dictionary operations, lists, sets, and string values
-
Example: Redis
-
Advantages: Unlike any earlier kind of database.
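To give a feel for what "lists, sets, and string values" as a service means, here is a small in-process Python imitation. The method names are borrowed from the Redis commands SADD, SINTER, LPUSH, and LRANGE, but the `MiniStructureStore` class itself is a hypothetical stand-in that covers only a sliver of the real semantics; it is not a Redis client.

```python
from collections import defaultdict


class MiniStructureStore:
    """In-process imitation of a few set and list operations."""

    def __init__(self):
        self._sets = defaultdict(set)
        self._lists = defaultdict(list)

    def sadd(self, key, *members):
        """Add members to the set stored at key."""
        self._sets[key].update(members)

    def sinter(self, *keys):
        """Return the intersection of the sets stored at the given keys."""
        sets = [self._sets.get(key, set()) for key in keys]
        return set.intersection(*sets) if sets else set()

    def lpush(self, key, *values):
        """Push values onto the head of the list, one after another."""
        for value in values:
            self._lists[key].insert(0, value)

    def lrange(self, key, start, stop):
        """Items from start to stop, both inclusive (non-negative indexes only)."""
        return self._lists.get(key, [])[start:stop + 1]
```

Finding mutual friends, for example, is a single set intersection, and a "latest events" feed is a push onto the head of a list followed by a range read.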
Grid database
-
Origin: Data grid and tuple space research.
-
Data model: Space-based architecture
-
Examples: GigaSpaces, Coherence
-
Advantages: High performance and high scalability for transaction processing
What should you use for your application?
-
The key is to recognize that different applications call for different data models and products. Choose the right data model and the right product.
-
To understand what kind of data model your application requires, see What The Heck Are You Actually Using NoSQL For? In that article I summarized a number of features and unconventional usage scenarios.
-
Match the product to your requirements and use cases, and you will find the one that best suits your architecture. Whether it is NoSQL or SQL does not matter.
-
Consider the data model, the product's characteristics, and your use cases together. Different products offer different features, and you cannot decide which one to choose based on the data model alone.
-
The best product is the one that has the features you need most.
If your application has the following requirements:
-
Complex transactions: if you cannot afford the risk of data loss, or you want a simple transaction programming model, choose a relational database or a grid database.
-
Example: An inventory system needs full ACID guarantees. I would be very unhappy to be told an item was sold out after I had bought it. I do not want a compensating transaction, I just want what I bought.
-
For scalability, either NoSQL or SQL can work. The product you choose should support scale-out, partitioning, adding and removing hardware online, load balancing, automatic sharding and rebalancing, and fault tolerance.
-
For high availability, consider Bigtable-type databases, which support eventual consistency.
-
If you need to handle a sustained stream of fast reads and writes, look at document databases, key-value databases, or in-memory databases. SSDs are also worth considering.
-
To implement a social network, the first choice should be a graph database. The second is a database such as Riak, which supports relationships between records. An in-memory relational database that supports simple SQL joins can handle cases where the amount of data is small. Redis's set and list operations could also work.
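The inventory example in the list above can be sketched with SQLite, chosen here only because it ships with Python. The table layout and the `make_inventory` and `buy` helpers are assumptions made for illustration. The point is that recording the order and decrementing the stock either both happen or neither does.

```python
import sqlite3


def make_inventory(stock):
    """Create an in-memory database with the given {item: quantity} stock."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items ("
        " name TEXT PRIMARY KEY,"
        " qty INTEGER NOT NULL CHECK (qty >= 0))"  # stock can never go negative
    )
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " item TEXT NOT NULL,"
        " buyer TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO items VALUES (?, ?)", stock.items())
    conn.commit()
    return conn


def buy(conn, item, buyer):
    """Record an order and decrement stock atomically; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO orders (item, buyer) VALUES (?, ?)", (item, buyer)
            )
            cursor = conn.execute(
                "UPDATE items SET qty = qty - 1 WHERE name = ?", (item,)
            )
            if cursor.rowcount != 1:
                # No such item: abort so the order row is rolled back too.
                raise sqlite3.IntegrityError("unknown item")
        return True
    except sqlite3.IntegrityError:
        # Raised by the CHECK constraint when the item is sold out,
        # or by the explicit check above.
        return False
```

With one unit in stock, the first call to `buy` succeeds and the second returns `False` without leaving an order row behind, so a customer is never told after the fact that the item was sold out.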
If your application has the following requirements:
-
If you need a variety of access methods and data types, look at document databases. They are very flexible in this regard.
-
For offline analysis of large data volumes, consider Hadoop first, followed by other products that support MapReduce. Of course, supporting MapReduce is not the same as being good at it.
-
If you need to span multiple data centers, choose a product based on the Bigtable model, or another distributed product that can cope with latency and is partition tolerant.
-
For CRUD-style applications, consider a document database, so that complex data structures can be accessed without joins.
-
For search, consider Riak.
-
If you need data structures such as lists, sets, queues, and publish-subscribe, consider Redis. Its distributed locking and other features are also very useful.
-
Programmer friendliness: if you want to work with JSON, HTTP, REST, JavaScript, and other formats and interfaces that programmers like, document databases and key-value databases are the first choice.
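To make the MapReduce item in the list above concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases in Python. It is not Hadoop code; the function names and the word-count job are chosen purely for illustration, and a real framework would distribute each phase across many machines.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and yield its (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    """Emit (word, 1) for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1


def count_reducer(key, values):
    """Sum the counts collected for one word."""
    return sum(values)


def word_count(lines):
    """Run the three phases for a word-count job."""
    return reduce_phase(shuffle(map_phase(lines, word_mapper)), count_reducer)
```

The mapper and reducer contain all the job-specific logic, which is why a product can "support MapReduce" while still differing widely in how well it schedules and executes the surrounding phases.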
If your application has the following requirements:
-
For transaction processing combined with materialized views over real-time data, consider VoltDB, which is well suited to processing a large number of transactions quickly.
-
For enterprise-level support and service-level agreements, look for products that use these as a selling point, such as Membase.
-
To record a large, continuous stream of data without strict consistency requirements, look at Bigtable-type databases, because they run on a distributed file system and can sustain a high volume of writes.
-
If it needs to be as simple as possible to operate, consider a PaaS offering; with one, you hardly need to do anything yourself.
-
If you want to sell your product to enterprise customers, consider a relational database, because that is what they are used to.
-
To build relationships between objects dynamically, with object attributes that can be added and removed on the fly, consider a graph database: it does not require a schema and can be modeled on demand in code.
-
To support large audio and video files, look at storage services such as S3. NoSQL systems tend not to handle large BLOBs well, although MongoDB does provide a file service.
If your application has the following requirements:
-
To upload large amounts of data quickly in bulk, you need to find a product that supports this scenario. Most products do not support bulk operations.
-
For ease of change, choose a document database or key-value database that supports dynamic schemas. These support optional fields, and fields can be added or removed without modifying a schema.
-
To enforce integrity constraints, choose a database that supports SQL DDL, or implement the constraints in stored procedures or in application code.
-
For deep joins, consider a graph database, which supports fast navigation between entities.
-
To move computation close to the data and reduce the overhead of sending data over the network, consider stored procedures. Relational databases, grid databases, document databases, and key-value databases all offer stored procedures.
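As an illustration of the integrity-constraint item above, the following sketch declares constraints in SQL DDL and lets the database reject bad rows. It uses SQLite through Python's standard `sqlite3` module; the schema and the `open_db` and `try_insert` helpers are made up for the example.

```python
import sqlite3


def open_db():
    """Create an in-memory database whose constraints live in the DDL."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE customers (
            id    INTEGER PRIMARY KEY,
            email TEXT NOT NULL UNIQUE
        );
        CREATE TABLE orders (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            amount      INTEGER NOT NULL CHECK (amount > 0)
        );
        """
    )
    return conn


def try_insert(conn, sql, params):
    """Run one INSERT; return False if a declared constraint rejects it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

A duplicate email, an order for a nonexistent customer, and a non-positive amount are all refused by the database itself, so no application code has to repeat those checks.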
If your application has the following requirements:
-
To cache or store BLOB data, choose a key-value database. It can hold fragments of web pages or complex objects that would be expensive to assemble with joins in a relational database, which also reduces latency.
-
For a proven track record, choose a mature product, and fall back on the common workarounds when you run into scalability problems (scaling up, tuning, caching, sharding, denormalization, etc.).
-
For variable data types, irregular data, a varying number of columns, complex data structures, and so on, consider document databases, key-value databases, and Bigtable-type databases. Their data types are more flexible.
-
If you need fast relational queries but do not want to implement them yourself, choose a database that supports SQL.
-
To operate in the cloud and automatically take full advantage of its features and benefits: no such product exists yet.
If your application has the following requirements:
-
For secondary indexes, so that you can look up data by different keys, consider relational databases and Cassandra; the latter has added support for secondary indexes.
-
For data that keeps growing (a true big data scenario) but is accessed infrequently, consider a Bigtable-type database; because its data is stored on a distributed file system, it is easy to expand.
-
To integrate with other services, check whether the database provides some kind of write-behind synchronization, so that database changes can be captured and passed to other systems to keep them consistent.
-
For fault tolerance, check whether write operations still succeed during power outages, partitions, and other failure scenarios.
-
If you want to push the technology in a direction where no existing product seems to meet the goal, you will have to build something new yourself. This is not easy.
-
CouchDB and Mobile Couchbase can be used on mobile platforms.
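The secondary-index item in the list above can be sketched as follows: alongside the primary key-to-row map, the store keeps a second map from a field value to the set of primary keys that have it. The `IndexedStore` class is a hypothetical in-memory illustration in Python and does not reflect how Cassandra or any relational engine implements its indexes.

```python
from collections import defaultdict


class IndexedStore:
    """Key-value store with one secondary index on a chosen field."""

    def __init__(self, indexed_field):
        self._rows = {}                    # primary key -> row
        self._field = indexed_field
        self._index = defaultdict(set)     # field value -> set of primary keys

    def put(self, key, row):
        """Insert or overwrite a row, keeping the index in step."""
        old = self._rows.get(key)
        if old is not None and self._field in old:
            # Drop the stale index entry before the row is overwritten.
            self._index[old[self._field]].discard(key)
        self._rows[key] = row
        if self._field in row:
            self._index[row[self._field]].add(key)

    def get(self, key):
        """Look up a row by primary key."""
        return self._rows.get(key)

    def find_by(self, value):
        """Return the sorted primary keys of rows whose indexed field equals value."""
        return sorted(self._index.get(value, ()))
```

The index turns a lookup by field value into a dictionary access instead of a scan over every row, at the price of extra work on each write.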
Which is better?
-
Benchmark results are specific to the scenarios that were tested and may not apply to your situation.
-
If your company has just been founded, has no finished product yet, and is willing to try something new, then the choice between SQL and NoSQL deserves more thought (translator's note: the implication seems to be that a blank sheet is easy to draw on, so you can experiment freely without the burden of an existing system).
-
The performance gap is not obvious when the data set is small, but what happens when it grows large?
-
Nothing is perfect. Amazon's forums are full of complaints about the performance and service of various products, and the same goes for GAE. Every product has problems; can you solve the problems of the product you choose?
Migrating to NoSQL for a 25% performance improvement is not worthwhile.
This article is translated from 35+ Use Cases For Choosing Your Next NoSQL Database
Source of the article: http://www.36dsj.com/archives/22258
——————— ——————————
Dataunion website: www.dataunion.org
Dataunion Weibo: @数盟社区
Sumeng WeChat: DataScientistUnion
Sumeng [big data group] 272089418
Sumeng[Data Visualization Group] 179287077
Sumeng[Data Analysis Group] 110875722
—————————————————