For structured data on a single machine, does NoSQL offer any real advantage over an RDBMS?

So I have been trying to figure out whether NoSQL really brings much value beyond auto-sharding and handling UNSTRUCTURED data.

Assuming I can fit my STRUCTURED data on a single machine, or have an effective “auto-sharding” solution for SQL, what advantages do the NoSQL options offer? I have determined the following:

> Document-based (MongoDB, Couchbase, etc.) – Beyond the “auto-sharding”, I have a hard time seeing the advantage. Linked objects look a lot like SQL joins, while embedded objects significantly bloat document size and pose a duplication challenge, since a comment may belong to both a post and a user, so the data ends up redundant (see the sketch after this list). Also, the loss of ACID guarantees and transactions is a big drawback.
> Key-value based (Redis, Memcached, etc.) – Serve a different use case: great for caching, but not for complex queries.
> Columnar (Cassandra, HBase, etc.) – The biggest advantage here seems to be how the data is stored on disk; most useful for aggregations rather than general use.
> Graph (Neo4j, OrientDB, etc.) – The most intriguing: the use of edges and nodes makes for an interesting value proposition, but it is mostly useful for highly complex relational data rather than general use.
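To make the linked-versus-embedded distinction above concrete, here is a minimal sketch in Python using plain dictionaries. The blog-style schema (posts, users, comments) is a hypothetical illustration, not taken from any real system:

```python
# Hypothetical blog schema: the same comment modeled two ways.

# Embedded: comments live inside the post document. One read fetches
# everything, but if the comment must also appear under the user,
# the data is duplicated and every copy must be kept in sync.
post_embedded = {
    "_id": "post1",
    "title": "Hello world",
    "comments": [
        {"user_id": "user1", "text": "Nice post"},
    ],
}

# Linked: comments are their own collection and reference the post
# and the user by id -- structurally much like SQL foreign keys,
# so reassembling a page requires multiple queries.
comment_linked = {
    "_id": "comment1",
    "post_id": "post1",
    "user_id": "user1",
    "text": "Nice post",
}
```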

I can see the advantages of Key-value, Columnar, and Graph DBs for specific use cases (caching, social-network relationship mapping, aggregation), but I don’t see any reason to use something like MongoDB for STRUCTURED data beyond its “auto-sharding” capability.

If SQL gained similar “auto-sharding” capabilities, would NoSQL be pointless for structured data? It seems that way to me, but I would like the community’s opinion.

Note: This is about typical CRUD applications such as social networks, e-commerce sites, CMSs, etc.

Many of NoSQL’s advantages only come into play once you run on more than one server. The biggest advantages of the most popular NoSQL stores are high availability and less downtime. Relaxing to eventual consistency can also bring performance improvements. It really depends on your needs.

> Document-based – If your data fits well into a relatively small number of small collections, a document-oriented database works well. For example, on a classified-ads site we have Users, Accounts, and Listings as the core data, and most search and display operations run against Listings alone. With the legacy database we had to perform nearly 40 join operations to get the data for a single listing; with NoSQL it is a single query (see the pymongo sketch after this list). With NoSQL we can also create indexes against nested data and query those results without joins. In this case we are actually mirroring data from SQL to MongoDB for search and display (among other reasons), and we are now working on a longer-term migration strategy. ElasticSearch, RethinkDB, and others are great databases as well: RethinkDB takes a very conservative approach to data, and ElasticSearch’s out-of-the-box indexing is second to none.
> Key-value stores – Caching is an exceptional use case. When you run a medium-to-high-volume website where most operations are reads, a good caching strategy alone can get you 4-5 times the users handled by a single server.
> Columnar – Cassandra in particular is great for distributing significant amounts of load, even for single-value lookups. Cassandra’s scaling is very linear in the number of servers in use. It is great for heavy read and write scenarios. I find it less valuable for live searches, but very good when you have a VERY high load and need to distribute it. It takes more planning, though, and may well not fit your needs. You can tweak settings to suit your CAP needs, and it even handles distribution across multiple data centers in the box. NOTE: most applications emphatically do NOT need this level of use; in most cases where you would consider HBase/Hadoop or Cassandra, ElasticSearch may be a better fit.
> Graph – I am not as familiar with graph databases, so I can’t comment on them here.
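As an illustration of the single-query listing read described above, here is a minimal pymongo sketch. The `listings` collection, field names, and connection string are hypothetical assumptions:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["classifieds"]

# Index a nested field inside the embedded sub-document.
db.listings.create_index("location.city")

# One query returns the complete listing: seller, photos, and
# attributes are all embedded, so no joins are needed at read time.
listing = db.listings.find_one({"location.city": "Austin", "active": True})
```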

Given that you subsequently called out MongoDB vs. SQL specifically… even assuming auto-sharding for both: PostgreSQL in particular has made great strides in making unstructured data usable (JSON/JSONB types), not to mention features like PLV8. It is probably the best suited to handling the kinds of loads you might throw at a document store, with the advantages of NoSQL. Where it falls down is that replication, sharding, and failover are bolted-on solutions rather than in the box.
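For that PostgreSQL point, a short sketch of the JSONB approach using psycopg2; the table layout, index, and query here are illustrative assumptions rather than anything from the original:

```python
import psycopg2

# Connection parameters are placeholders.
conn = psycopg2.connect("dbname=app user=app")
cur = conn.cursor()

# A relational table whose flexible part lives in a JSONB column.
cur.execute("""
    CREATE TABLE IF NOT EXISTS listings (
        id    serial PRIMARY KEY,
        title text NOT NULL,
        doc   jsonb NOT NULL
    )
""")

# A GIN index makes containment (@>) queries on the document fast.
cur.execute(
    "CREATE INDEX IF NOT EXISTS listings_doc_idx ON listings USING gin (doc)"
)

# Document-store-style reads, with the RDBMS's transactions,
# tooling, and durability underneath.
cur.execute(
    "SELECT title FROM listings WHERE doc @> %s::jsonb",
    ('{"city": "Austin"}',),
)
rows = cur.fetchall()
conn.commit()
```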

Sharding really isn’t the best approach for small to moderate loads anyway. Most scenarios are read-heavy, so running a replica set, where the extra servers act as additional read nodes, is usually better when you have 3-5 servers. MongoDB is great in this scenario: the primary node is elected automatically and failover is quite fast. The only weirdness I have seen was when Azure went down in late 2014: only one of the servers came back up at first, and the other two took almost 40 minutes. Even so, with replication any given read request can be handled in its entirety by a single server. Your data structures become simpler, and your chances of data loss are reduced.
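A sketch of that replica-set read pattern with pymongo; the host names, replica-set name, and database are hypothetical:

```python
from pymongo import MongoClient, ReadPreference

# Three-node replica set; the driver discovers the elected primary.
client = MongoClient(
    "mongodb://db1,db2,db3/?replicaSet=rs0",
    read_preference=ReadPreference.SECONDARY_PREFERRED,
)
db = client["classifieds"]

# Reads are served by secondaries when available; writes still go to
# the primary, and the driver follows an automatic failover.
doc = db.listings.find_one({"active": True})
```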

In my own example above, for a mid-sized classified-ads site, the vast majority of the data belongs to a single collection, which is searched and displayed from that collection. For this use case a document store works much better than structured/normalized data. The way objects are stored is much closer to how they are represented in the application. There is less cognitive disconnect, and it simply works.

The fact is that SQL JOIN operations kill performance, especially when aggregating data across those joins. For a single query for a single user, even a dozen or more joins are possible. When you get to dozens of joins with thousands of simultaneous users, things start to fall apart. At that point you have several options…

> Caching – Caching is always a great approach, and the less often the data changes, the better it works. This can be anything from a set of memcache/redis instances to using something like MongoDB, RethinkDB, or ElasticSearch to hold combined records. The challenge here is updating or invalidating your cached data (see the cache-aside sketch after this list).
> Migration – Migrating your data to a data store that better represents your needs is also a good idea. If you need to handle massive writes, or very massive read scenarios, no SQL database can keep up; you could NEVER handle the likes of Facebook or Twitter on SQL.
> Something in between – What you are doing and where your pain points lie when you need to scale will determine the best solution for a given situation. Many developers and administrators fear having data broken up into multiple places, but it is often the best answer. Does your analytics data really need to be in the same place as your core business data? Do your logins need to be that tightly coupled, for that matter? Are you doing a lot of correlated queries? It really depends.
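As mentioned in the caching option above, here is a minimal cache-aside sketch with redis-py. The key naming, TTL, and `load_from_db` callback are illustrative assumptions:

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)

def get_listing(listing_id, load_from_db):
    key = f"listing:{listing_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    record = load_from_db(listing_id)      # the expensive joined query
    r.setex(key, 300, json.dumps(record))  # cache for 5 minutes
    return record

def invalidate_listing(listing_id):
    # The hard part the text mentions: this must be called from
    # every write path that touches the listing.
    r.delete(f"listing:{listing_id}")
```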

Personal opinion ahead:

For me, I like the safety net that SQL provides. As the central store for core data it is my first choice. I tend to treat RDBMSs as dumb storage; I don’t like being tied to a given platform. I feel that many people try to over-normalize their data. Often I will add an XML or JSON field to a table so that additional pieces of data can be stored without bloating the schema, specifically for anything that is unlikely to ever be queried, and I will then have properties on my objects in application code that read from those fields. A good example might be payments: if you are using one system or multiple systems (one for CC plus PayPal, Google, Amazon, etc.), the details of the transaction really don’t affect your core records, so why create five-plus tables just to store that detailed data?
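A sketch of that “one table plus a details field” pattern for payments in PostgreSQL via psycopg2; the `payments` table and its columns are hypothetical:

```python
import json
import psycopg2

conn = psycopg2.connect("dbname=app user=app")
cur = conn.cursor()

# Core fields you query stay as typed columns; processor-specific
# details go into a single JSONB field instead of one table per
# payment processor.
cur.execute("""
    CREATE TABLE IF NOT EXISTS payments (
        id        serial PRIMARY KEY,
        user_id   integer NOT NULL,
        amount    numeric(10, 2) NOT NULL,
        processor text NOT NULL,
        details   jsonb
    )
""")
cur.execute(
    "INSERT INTO payments (user_id, amount, processor, details)"
    " VALUES (%s, %s, %s, %s)",
    (42, 19.99, "paypal",
     json.dumps({"txn_id": "abc123", "payer_email": "x@example.com"})),
)
conn.commit()
```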

When data is a natural fit for document storage, I say use it: if the vast majority of your queries apply to a single record or collection, denormalize away. Using a document store as a mirror of your primary data is great.

For write-heavy data you want multiple systems in play… it depends heavily on your needs. Do you need fast hot-query performance? Go with ElasticSearch. Do you need absolutely massive horizontal scale? HBase or Cassandra.
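For the hot-query case, a tiny elasticsearch-py sketch (8.x client style); the index name and fields are assumptions:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Full-text match against the mirrored listing documents.
result = es.search(
    index="listings",
    query={"match": {"title": "mountain bike"}},
)
for hit in result["hits"]["hits"]:
    print(hit["_source"]["title"])
```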

The key takeaway here is not to be afraid to mix things up… there really is no one-size-fits-all solution. As an aside, I feel that if PostgreSQL shipped a good in-the-box solution (for the open-source version) for even just replication and automated failover, it would be in a much better position than most at that point.

I didn’t really get into it above, but I should mention that there are a number of SaaS solutions and other providers that offer hybrid SQL systems: you can develop against MySQL/MariaDB locally and deploy to a system with SQL on top of a distributed storage cluster. I still feel that HBase or ElasticSearch are better for logging and analytics data, but the SQL-on-top solutions are also compelling.

More: http://www.mongodb.com/nosql-explained
