[Reprint] [reference] NoSQL database conversation by writing [Author: Yan On]


A Written Discussion of NoSQL Databases

Yan Open
v0.2
2010.2

  1. Preface
  2. Ideas
    1. CAP
    2. Eventual consistency
      1. Variants
    3. BASE
    4. Other
      1. The five-minute rule for I/O
      2. Do not delete data
      3. RAM is the disk, disk is the tape
      4. Amdahl's law and Gustafson's law
      5. Gigabit Ethernet
  3. Techniques
    1. Consistent hashing
        1. Amazon's current status
        2. Algorithm selection
    2. Quorum NRW
    3. Vector clock
    4. Virtual node
    5. gossip
      1. Gossip (State Transfer Model)
      2. Gossip (Operation Transfer Model)
    6. Merkle tree
    7. Paxos
      1. Background
    8. DHT
    9. Map Reduce Execution
    10. Handling Deletes
    11. Storage implementation
    12. Node changes
    13. Keep out
      1. Description
      2. Features
  4. Software
    1. Sub-database
      1. MemCached
        1. Features
        2. Memory allocation
        3. Cache policy
        4. Caching database query
        5. Data redundancy and fault prevention
        6. Memcached Client (mc)
        7. Cache-style Web application framework
        8. Performance Testing
      2. dbcached
        1. Are Memcached and dbcached functionally the same?
    2. Column stores
      1. Hadoop's HBase
      2. Yale University's HadoopDB
      3. GreenPlum
      4. Facebook's Cassandra
        1. Cassandra Features
        2. Keyspace
        3. Column family (CF)
        4. Key
        5. Column
        6. Super column
        7. Sorting
        8. Storage
        9. API
      5. Google's BigTable
      6. Yahoo's PNUTS
        1. Features
        2. PNUTS implementation
          1. Record-level mastering
          2. PNUTS structure
          3. Tablet addressing and splitting
          4. Write call diagram
        3. Reflections on PNUTS
      7. Microsoft SQL Data Services
    3. Non-cloud service competitors
    4. Document Storage
      1. CouchDB
        1. Features
      2. Riak
      3. MongoDB
      4. Terrastore
      5. ThruDB
    5. Key Value / Tuple Storage
      1. Amazon SimpleDB
      2. Chordless
      3. Redis
      4. Scalaris
      5. Tokyo Cabinet / Tyrant
      6. CT.M
      7. Scalien
      8. Berkeley DB
      9. MemcacheDB
      10. Mnesia
      11. LightCloud
      12. HamsterDB
      13. Flare
    6. Eventually consistent key-value stores
      1. Amazon's Dynamo
        1. Features
        2. Architecture Features
      2. BeansDB
        1. Introduction
        2. Update
        3. Features
        4. Performance
      3. Nuclear
        1. Two design tips
      4. Voldemort
      5. Dynomite
      6. Kai
    7. Uncategorized
      1. Skynet
      2. Drizzle
    8. Comparison
      1. Scalability
      2. Data and query model
      3. Persistence design
  5. Applications
    1. eBay architecture experience
    2. Taobao architecture experience
    3. Flickr architecture experience
    4. Twitter's operations experience
      1. Operations experience
        1. Metrics
        2. Configuration Management
        3. Darkmode
        4. Process Management
        5. Hardware
      2. Code collaboration experience
        1. Review System
        2. Deployment Management
        3. Team Communication
      3. Cache
    5. Cloud Computing Architecture
    6. Anti-patterns
      1. Single point of failure
      2. Synchronous calls
      3. No ability to roll back
      4. No logging
      5. No database partitioning
      6. No application partitioning
      7. Relying on third-party vendors for scalability
    7. OLAP
      1. Where is the greatest difficulty in OLAP reporting products?
    8. Common principles behind NoSQL systems
      1. Assume failure is inevitable
      2. Partition the data
      3. Keep multiple copies of the same data
      4. Dynamic scaling
      5. Query support
      6. Use Map/Reduce for cluster processing
      7. Disk-based and in-memory implementations
      8. Just hype?
  6. Appendix
    1. Thanks
    2. Version Notes
    3. References


Preface

There is not yet a reasonably complete body of material on NoSQL databases in this country. Many pioneers have compiled a great deal, but not systematically. Here I try to pull the various materials together and add some views of my own.
This book describes some of the main techniques, algorithms, and ideas behind current NoSQL systems, and cites a large number of existing databases as examples. After reading it in full, I believe readers will have a general understanding of NoSQL databases.
In addition, I am preparing to develop an open-source in-memory database, galaxydb. This book also provides some architectural material for that database.

Ideas


CAP, BASE, and eventual consistency are the three cornerstones on which NoSQL databases rest. The five-minute rule is the theoretical basis for keeping data in memory. These are the source of everything else.

CAP



  • C: Consistency
  • A: Availability (meaning data can be accessed quickly)
  • P: Tolerance of network Partition (partition tolerance, i.e. the system is distributed)


Ten years ago, Professor Eric Brewer proposed the famous CAP theorem, and Seth Gilbert and Nancy Lynch later proved it correct. The CAP theorem tells us that a distributed system cannot satisfy consistency, availability, and partition tolerance all at once; it can satisfy at most two of them.

You cannot have your cake and eat it too. If you care about consistency, you have to handle write failures caused by the system being unavailable. If you care about availability, you have to accept that a read may not return the value of the most recent write. Systems with different priorities therefore adopt different strategies, and only by truly understanding a system's requirements can you make good use of the CAP theorem.

As an architect, there are generally two ways to apply the CAP theorem:

  1. Key-value stores such as Amazon Dynamo: choose among database products with different leanings according to the three CAP properties.
  2. Domain model + distributed cache + storage (Qi4j and the NoSQL movement): design a custom, flexible distributed solution for your own project according to the three CAP properties. This is difficult.
I am preparing to offer a third option: a database whose CAP behavior is configurable, so that the CAP trade-off can be adjusted dynamically.

  • CA: traditional relational database
  • AP: key-value database


For large websites, availability and partition tolerance take priority over data consistency. Designs generally lean toward A and P, and then use other means to provide the consistency the business needs. Architects should not waste effort trying to design a perfect distributed system that satisfies all three properties; they should make trade-offs.

Different data has different consistency requirements. User comments, for example, are not sensitive to inconsistency and can tolerate it for a relatively long time without affecting transactions or the user experience. Price data is very sensitive and usually cannot tolerate more than 10 seconds of inconsistency.

Proof of the CAP theorem: Brewer's CAP Theorem

Eventual consistency


In short: loose in process, strict in result. The final result must be consistent.

To better describe client-side consistency, we use the following scenario, which has three components:
  • Storage system
The storage system can be treated as a black box that guarantees availability and durability.
  • Process A
Process A writes to and reads from the storage system.
  • Process B and Process C
Processes B and C are independent of A and of each other. They also write to and read from the storage system.

The degrees of consistency are described below in terms of this scenario:

  • Strong consistency
Strong consistency (immediate consistency): if A first writes a value to the storage system, the storage system guarantees that subsequent reads by A, B, and C all return the latest value.
  • Weak consistency
If A first writes a value to the storage system, the storage system cannot guarantee that subsequent reads by A, B, and C return the latest value. This introduces the concept of an "inconsistency window": the period from A's write until subsequent reads by A, B, and C are able to return the latest value.
  • Eventual consistency
Eventual consistency is a special case of weak consistency. If A first writes a value to the storage system, the storage system guarantees that, provided no other write updates the same value before the subsequent reads by A, B, and C, all reads will eventually return the value A wrote. If no failure occurs, the size of the "inconsistency window" depends on communication delay, system load, and the number of replicas used in replication (in master/slave mode, the number of slaves). The best-known eventually consistent system is DNS: after the IP address of a domain name is updated, the change propagates according to the configured policy and the various cache-control policies, and eventually all clients see the latest value.
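The inconsistency window can be made concrete with a small simulation. This is only a sketch under stated assumptions: the class ReplicatedStore, its methods, and the explicit replicate() step are all hypothetical names invented for illustration, and real systems replicate in the background rather than on demand.

```python
from collections import deque


class ReplicatedStore:
    """Toy primary/replica store with asynchronous replication (hypothetical)."""

    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        # Updates accepted by the primary but not yet applied to the replicas.
        self.pending = deque()

    def write(self, key, value):
        # The write is acknowledged as soon as the primary has it.
        self.primary[key] = value
        self.pending.append((key, value))

    def read(self, key, replica=None):
        # replica=None reads the primary; otherwise read the given replica.
        source = self.primary if replica is None else self.replicas[replica]
        return source.get(key)

    def replicate(self):
        # Apply queued updates in order; this closes the inconsistency window.
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica[key] = value


store = ReplicatedStore()
store.write("ip", "10.0.0.1")
store.replicate()
store.write("ip", "10.0.0.2")         # Process A writes a new value
stale = store.read("ip", replica=0)   # Process B reads inside the window
store.replicate()
fresh = store.read("ip", replica=0)   # Process B reads after the window
print(stale, fresh)                   # 10.0.0.1 10.0.0.2
```

Reading from the primary would behave like strong consistency in this toy model; reading from a replica before replicate() runs shows the weak case, and the read after it shows the eventual one.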

Variants

  • Causal consistency
If Process A notifies Process B that it has updated the data, Process B's subsequent reads return the latest value A wrote. Process C, which has no causal relationship with A, gets only eventual consistency.
  • Read-your-writes consistency
If Process A writes a new value, Process A's subsequent reads return that latest value. Other users may have to wait a while before they see it.
  • Session consistency
This consistency requires read-your-writes consistency to be guaranteed for the whole session in which the client interacts with the storage system. Hibernate's session provides this kind of consistency.
  • Monotonic read consistency
If Process A has read a certain value of an object, no subsequent read by Process A returns an earlier value.
  • Monotonic write consistency
This consistency guarantees that the system executes all writes from the same process in order.
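One common way to get read-your-writes and monotonic reads within a session is for the client to remember the newest version it has written or seen, and to refuse answers from replicas that are older. The sketch below assumes a single writer per key and uses hypothetical names (VersionedReplica, Session); it is not taken from any particular product.

```python
class VersionedReplica:
    """A replica that stores (version, value) per key (hypothetical)."""

    def __init__(self):
        self.data = {}

    def apply(self, key, version, value):
        # Keep only the newest version of each key.
        if version > self.data.get(key, (0, None))[0]:
            self.data[key] = (version, value)

    def get(self, key):
        return self.data.get(key, (0, None))


class Session:
    """Client session enforcing read-your-writes and monotonic reads."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.min_version = {}  # key -> oldest version this session accepts

    def write(self, key, value, replica_index):
        # Assumption: this session is the only writer of the key.
        version = self.min_version.get(key, 0) + 1
        self.replicas[replica_index].apply(key, version, value)
        self.min_version[key] = version

    def read(self, key):
        floor = self.min_version.get(key, 0)
        for replica in self.replicas:
            version, value = replica.get(key)
            if version >= floor:
                # Raise the floor so later reads never go backwards.
                self.min_version[key] = version
                return value
        raise LookupError("no replica is fresh enough for this session")


replicas = [VersionedReplica(), VersionedReplica()]
session = Session(replicas)
session.write("profile", "v1", replica_index=1)  # only replica 1 has the write
own_read = session.read("profile")               # skips the stale replica 0
other_read = Session(replicas).read("profile")   # a new session has no floor
print(own_read, other_read)                      # v1 None
```

The second session illustrates the remark above that other users may have to wait: with no floor of its own, it accepts the stale replica.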

BASE

Interestingly, BASE means "base" (alkali) in English, while ACID means acid. The two really are incompatible.

  • Basically Available - basically available
  • Soft state - soft state / flexible service
"Soft state" can be understood as "connectionless", while "hard state" is "connection-oriented".
  • Eventual consistency - eventually consistent
Eventual consistency is also the ultimate goal of ACID.

The BASE model is the opposite of the ACID model. It is a completely different model that sacrifices strong consistency to gain availability or reliability. Basically Available: the system is basically available and tolerates partial failure (for example, of one shard of a sharded database). Soft state: state may be out of sync for a period of time, and updates are asynchronous. Eventually consistent: it is enough for the data to be consistent in the end, rather than consistent at every moment.

The main ways to implement BASE:
1. Split the database by function
2. Sharding
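The two approaches can be combined: first pick a database group by function, then pick a shard inside that group by key. The following is a minimal sketch; the shard names, the SHARDS_BY_FUNCTION table, and the route function are hypothetical, and hash-modulo placement is just one simple choice.

```python
import hashlib

# Hypothetical layout: each functional area has its own group of shards.
SHARDS_BY_FUNCTION = {
    "users": ["users-db-0", "users-db-1"],
    "orders": ["orders-db-0", "orders-db-1", "orders-db-2"],
}


def route(function, key):
    """Return the shard that holds `key` within the given functional area."""
    shards = SHARDS_BY_FUNCTION[function]
    # Use a stable hash; Python's built-in hash() of a str varies between runs.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


print(route("users", "alice"))
print(route("orders", "order-1001"))
```

With modulo placement, changing the number of shards moves most keys to a different shard, which is why the number of shards in a group is hard to change later.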

BASE mainly emphasizes basic availability. If you need high availability, that is, pure high performance, you have to sacrifice consistency or fault tolerance. BASE-style solutions still have performance potential to be tapped.

Other


The five-minute rule for I/O


In 1987, Jim Gray and Gianfranco Putzolu published the "five-minute rule". In short, if a record is accessed frequently it should be kept in memory; otherwise it should stay on disk and be read when needed. The break-even point is an access interval of about five minutes. It looks like a rule of thumb, but the five-minute figure is actually derived from cost: at the hardware prices of the day, keeping 1KB of data in memory cost about the same as accessing it on disk once every 400 seconds (close to five minutes). The rule was reviewed around 1997 and confirmed to be still valid (neither disks nor memory had made a qualitative leap). The present review looks at the impact that SSDs, this "new old hardware", may have.
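The break-even calculation can be sketched as follows. The prices are assumptions chosen to reproduce the 400-second figure in the text (roughly $2000 per disk access per second and $5 per KB of memory); the function name is hypothetical.

```python
def break_even_seconds(cost_per_access_per_second, cost_to_keep_record_in_ram):
    """Access interval at which memory and disk cost the same for one record.

    Reading a record once every T seconds uses 1/T accesses per second of
    disk capacity, which costs cost_per_access_per_second / T. Keeping the
    record in memory costs cost_to_keep_record_in_ram. Equating the two
    and solving for T gives the break-even interval.
    """
    return cost_per_access_per_second / cost_to_keep_record_in_ram


# Assumed 1987-era prices: $2000 per access/second of disk, $5 per KB of RAM.
interval = break_even_seconds(2000.0, 5.0)
print(interval)  # 400.0
```

A record accessed more often than once per interval is cheaper to keep in memory; one accessed less often is cheaper to leave on disk.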



In the flash era, the five-minute rule splits in two, depending on whether the SSD is used as slower memory (an extended buffer pool) or as a faster disk (an extended disk). Small pages move between memory and flash, while large pages move between flash and disk. Twenty years after the rule was first proposed, the five-minute rule still holds in the flash era, but for larger pages (64KB pages; the change in page size reflects how hardware, bandwidth, and latency have developed).

Do not delete data



Oren Eini (aka Ayende Rahien) advises developers to avoid soft deletes in the database, which may lead readers to think that hard deletes are the reasonable choice. In response to Ayende's article, Udi Dahan strongly recommends avoiding deleting data altogether.

Advocates of the so-called soft delete add an IsDeleted column to the table to preserve data integrity. If a row has its IsDeleted flag set, the row is considered deleted. Ayende finds this approach "simple, easy to understand, quick to implement, and easy to communicate," but "often wrong." The problem is:

Deleting a row or an entity is almost never a simple event. It affects not only the data in the model but also the shape of the model. That is why we have foreign keys, to make sure no "order line" ends up without a parent "order". And this is only the simplest case. ...

With soft deletes, whether we like it or not, the data is easily corrupted. For example, a small optimization that nobody thought about can leave a customer's "latest order" pointing to an order that has been soft-deleted.

If a developer is asked to remove data from the database and soft deletes are not recommended, only a hard delete remains. To keep the data consistent, the developer must delete not only the rows directly concerned but also cascade the delete to related data. Udi Dahan, however, reminds readers that the real world does not cascade:

Suppose the marketing department decides to remove an item from the catalog. Does that mean all old orders containing that item must disappear? And, cascading further, should all the invoices for those orders be deleted too? Going on step by step, would we have to redo the company's profit and loss statements?

Heaven forbid.

The problem seems to lie in how the word "delete" is interpreted. Dahan gives this example:

When I say "delete", what I actually mean is that the product is "discontinued". We will no longer sell this product, and once the stock is cleared we will not order any more. Customers will no longer see it when they search for products or browse the catalog, but the people running the warehouse still have to manage these items for the time being. Saying "delete" is just convenient shorthand.

He then gives some correct interpretations from the user's point of view:

Orders are not deleted, they are "cancelled". If an order is cancelled too late, there may be fees to pay.

Employees are not deleted, they are "fired" (or possibly retired). There is also compensation to be handled.

Jobs are not deleted, they are "filled" (or the requisition is withdrawn).

In these examples, our focus should be on the task the user wants to accomplish, not on the technical action performed on some entity. In almost all cases, more than one entity needs to be considered.

In place of the IsDeleted flag, Dahan suggests a field that represents the state of the data: active, discontinued, cancelled, deprecated, and so on. Users can use such a status field to look back at past data as a basis for decisions.

Besides breaking data consistency, deleting data has other negative consequences. Dahan recommends keeping all data in the database: "Don't delete. Just don't."
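A status field of this kind might look like the sketch below. The Product class, the ProductStatus values, and the helper functions are hypothetical names for illustration only; they are not Dahan's code.

```python
from dataclasses import dataclass
from enum import Enum


class ProductStatus(Enum):
    ACTIVE = "active"
    DISCONTINUED = "discontinued"


@dataclass
class Product:
    sku: str
    name: str
    status: ProductStatus = ProductStatus.ACTIVE


def discontinue(product):
    # The business action is "stop selling", not "erase the row".
    product.status = ProductStatus.DISCONTINUED


def catalog(products):
    # Customers see only active products; the warehouse still sees every row.
    return [p for p in products if p.status is ProductStatus.ACTIVE]


products = [Product("A1", "Kettle"), Product("B2", "Toaster")]
discontinue(products[1])
print([p.sku for p in catalog(products)])  # ['A1']
print(len(products))                       # 2
```

Old orders that refer to the discontinued product still resolve, because the row is never removed.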
