[Reprint] [Reference] Notes on NoSQL Databases [Author: Yan Kai]

Notes on NoSQL Databases

Yan Kai
v0.2
2010.2

  1. Preface
  2. Concepts
    1. CAP
    2. Eventual consistency
      1. Variants
    3. BASE
    4. Other
      1. The five-minute rule for I/O
      2. Do not delete data
      3. RAM is the hard disk, hard disk is a tape
      4. Amdahl's law and Gustafson's law
      5. Gigabit Ethernet
  3. Techniques
    1. Consistent hashing
        1. The status of the Amazon
        2. Algorithm selection
    2. Quorum NRW
    3. Vector clock
    4. Virtual node
    5. gossip
      1. Gossip (State Transfer Model)
      2. Gossip (Operation Transfer Model)
    6. Merkle tree
    7. Paxos
      1. Background
    8. DHT
    9. Map Reduce Execution
    10. Handling Deletes
    11. Storage implementation
    12. Node changes
    13. Keep out
      1. Description
      2. Features
  4. Software articles
    1. Sub-database
      1. MemCached
        1. Features
        2. Memory allocation
        3. Cache policy
        4. Caching database query
        5. Data redundancy and fault prevention
        6. Memcached Client (mc)
        7. Cache-style Web application framework
        8. Performance Testing
      2. dbcached
        1. Memcached and dbcached are functionally the same?
    2. Column storage Series
      1. The Hbase Hadoop
      2. HadoopDB Yale University
      3. GreenPlum
      4. The Cassandra FaceBook
        1. Cassandra Features
        2. Keyspace
        3. Column family (CF)
        4. Key
        5. Column
        6. Super column
        7. Sorting
        8. Storage
        9. API
      5. The BigTable Google
      6. The PNUTS Yahoo
        1. Features
        2. PNUTS achieve
          1. Record-level mastering the master record level
          2. PNUTS structure
          3. Addressing and separation Tablets
          4. Write call diagram
        3. PNUTS sentiment
      7. The Microsoft SQL Data Services
    3. Non-cloud service competitors
    4. Document Storage
      1. CouchDB
        1. Features
      2. Riak
      3. MongoDB
      4. Terrastore
      5. ThruDB
    5. Key Value / Tuple Storage
      1. The Amazon SimpleDB
      2. Chordless
      3. Redis
      4. Scalaris
      5. Tokyo cabinet / Tyrant
      6. CT.M
      7. Scalien
      8. Berkley DB
      9. MemcacheDB
      10. Mnesia
      11. LightCloud
      12. HamsterDB
      13. Flare
    6. Key Value store eventual consistency
      1. The Dynamo Amazon
        1. Features
        2. Architecture Features
      2. BeansDB
        1. Introduction
        2. Update
        3. Features
        4. Performance
      3. Nuclear
        1. Tips on two design
      4. Voldemort
      5. Dynomite
      6. Kai
    7. Uncategorized
      1. Skynet
      2. Drizzle
    8. Comparison
      1. Scalability
      2. Data and query model
      3. Sustainable design
  5. Application articles
    1. eBay architecture experience
    2. Taobao architecture experience
    3. Flickr architecture experience
    4. Operation and maintenance experience of Twitter
      1. Operation and maintenance experience
        1. Metrics
        2. Configuration Management
        3. Darkmode
        4. Process Management
        5. Hardware
      2. Code Collaborative experience
        1. Review System
        2. Deployment Management
        3. Team Communication
      3. Cache
    5. Cloud Computing Architecture
    6. Anti-patterns
      1. Single point of failure
      2. Synchronous calls
      3. No ability to roll back
      4. No logging
      5. No database partitioning
      6. No application partitioning
      7. Relying on third-party vendors for scalability
    7. OLAP
      1. Where is the greatest difficulty in OLAP reporting products?
    8. Underlying principles of NoSQL
      1. Assume failure is inevitable
      2. Partition the data
      3. Keep multiple copies of the same data
      4. Dynamic scaling
      5. Query support
      6. Use Map/Reduce to process data across the cluster
      7. Disk-based and in-memory implementations
      8. Just hype?
  6. Attached
    1. Thanks
    2. Version Notes
    3. Quote

Preface

There is no reasonably complete body of material on NoSQL databases available in China yet. Many pioneers have collected and published a great deal, but it is not very systematic. Here I try to pull those materials together and add some views of my own.
This book describes some of the main technologies, algorithms, and ideas behind today's NoSQL systems, and cites a large number of existing databases as examples. After reading the whole text, I believe readers will have a general understanding of NoSQL databases.
In addition, I am preparing to develop an open-source in-memory database, galaxydb. This book also serves as architectural reference material for that database.

Concepts

CAP, BASE, and eventual consistency are the three cornerstones on which NoSQL databases rest. The five-minute rule is the theoretical basis for keeping data in memory. Everything else follows from these.

CAP

  • C: Consistency
  • A: Availability (fast access to data)
  • P: Tolerance of network Partition (partition tolerance, in a distributed system)

Ten years ago, Professor Eric Brewer put forward the famous CAP theorem, and Seth Gilbert and Nancy Lynch later proved it correct. The CAP theorem tells us that a distributed system cannot satisfy consistency, availability, and partition tolerance all at once; it can satisfy at most two of the three.

You cannot have your cake and eat it too. If consistency is what you care about, you have to handle the write failures that occur when the system is unavailable. If availability is what you care about, you have to accept that a read may not return the value of the most recent write. Systems with different priorities therefore need different strategies, and only by truly understanding the system's requirements can you make good use of the CAP theorem.

As an architect, there are generally two ways to make use of the CAP theorem:

  1. Key-value stores such as Amazon Dynamo: choose, among database products with different leanings, the one that fits your CAP priorities.
  2. Domain model + distributed cache + storage (Qi4j and the NoSQL movement): design a custom distributed solution around your own project according to the CAP trade-offs. This is difficult.

I plan to offer a third option: a database whose CAP trade-off is configurable, so that it can be adjusted dynamically.

  • CA: traditional relational databases
  • AP: key-value databases

For large websites, availability and partition tolerance take priority over data consistency. Designs generally lean toward A and P, and then use other means to provide the consistency the business requires. Architects should not waste effort trying to design a perfect distributed system that satisfies all three; they should make trade-offs.

Consistency requirements also differ by type of data. User comments, for example, are not sensitive to inconsistency and can tolerate it for a relatively long time without affecting transactions or the user experience. Product prices, on the other hand, are very sensitive: an inconsistent price usually cannot be tolerated for more than 10 seconds.
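The idea that different kinds of data tolerate different amounts of inconsistency can be sketched as a per-type staleness budget. This is only an illustration: the names `MAX_STALENESS_SECONDS`, `CachedValue`, and `is_acceptable` are hypothetical, the 10-second limit for prices comes from the text above, and the 300-second limit for comments is an assumed value.

```python
import time
from dataclasses import dataclass

# Hypothetical staleness budgets in seconds, per kind of data.
# The 10-second figure for prices comes from the text; the budget for
# comments is an assumed value chosen only for illustration.
MAX_STALENESS_SECONDS = {
    "price": 10.0,
    "comment": 300.0,
}


@dataclass
class CachedValue:
    kind: str          # "price" or "comment"
    value: object
    written_at: float  # time this copy was last refreshed from the source of truth


def is_acceptable(entry: CachedValue, now: float) -> bool:
    """Return True if this copy is still within the staleness budget for its kind."""
    age = now - entry.written_at
    return age <= MAX_STALENESS_SECONDS[entry.kind]


if __name__ == "__main__":
    now = time.time()
    price = CachedValue("price", 19.99, written_at=now - 30)
    comment = CachedValue("comment", "Nice!", written_at=now - 30)
    print(is_acceptable(price, now))    # a 30-second-old price exceeds its budget
    print(is_acceptable(comment, now))  # a 30-second-old comment is still within budget
```

A read path could use such a check to decide whether a replica's copy is good enough or whether it has to go back to the authoritative store.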

For a proof of the CAP theorem, see: Brewer's CAP Theorem

Eventual consistency

In short: loose in the process, strict in the result; the final outcome must be consistent.

To describe consistency from the client's side, consider the following scenario, which has three components:

  • Storage system

The storage system can be treated as a black box that guarantees availability and durability.

  • Process A

Process A writes to and reads from the storage system.

  • Process B and Process C

Processes B and C are independent of A and of each other. They also write to and read from the storage system.

The following describes the different degrees of consistency using this scenario:

  • Strong consistency

If A writes a value to the storage system, strong consistency (immediate consistency) means the storage system guarantees that subsequent reads by A, B, and C all return the latest value.

  • Weak consistency

If A writes a value to the storage system, the system does not guarantee that subsequent reads by A, B, and C return the latest value. This introduces the notion of an "inconsistency window": the period between A's write and the moment when subsequent reads by A, B, and C return the latest value.

  • Eventual consistency

Eventual consistency is a special case of weak consistency. If A writes a value to the storage system, the system guarantees that, provided no other write updates the same value before the subsequent reads by A, B, and C, all reads will eventually return the latest value written by A. If no failure occurs, the size of the inconsistency window depends on communication delay, system load, and the number of replicas involved in replication (in master/slave mode, the number of slaves). The best-known eventually consistent system is DNS: after the IP address for a domain name is updated, the change propagates according to the configured policy and cache-control settings, and eventually every client sees the latest value.
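The inconsistency window can be made concrete with a toy master/slave store in which replication is applied lazily. This is a minimal sketch under simplifying assumptions (a single process, no failures, a plain in-memory replication queue); `ReplicatedStore` and its methods are hypothetical names, not the API of any real database.

```python
class ReplicatedStore:
    """Toy master/slave store: writes go to the master, reads hit a chosen replica."""

    def __init__(self, replica_count: int):
        self.master = {}
        self.replicas = [{} for _ in range(replica_count)]
        # Updates not yet applied to a replica: (replica_index, key, value).
        self.pending = []

    def write(self, key, value):
        self.master[key] = value
        # Queue the update for every replica instead of applying it immediately.
        for index in range(len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key, replica_index: int):
        return self.replicas[replica_index].get(key)

    def replicate_one(self) -> bool:
        """Apply the oldest pending update; return False when nothing is left."""
        if not self.pending:
            return False
        index, key, value = self.pending.pop(0)
        self.replicas[index][key] = value
        return True

    def replicate_all(self):
        while self.replicate_one():
            pass


if __name__ == "__main__":
    store = ReplicatedStore(replica_count=2)
    store.write("x", 1)
    print(store.read("x", 0), store.read("x", 1))  # both replicas are still stale
    store.replicate_one()
    print(store.read("x", 0), store.read("x", 1))  # replica 0 caught up, replica 1 has not
    store.replicate_all()
    print(store.read("x", 0), store.read("x", 1))  # the window has closed
```

The time between `write` and the last `replicate_one` call plays the role of the inconsistency window; more replicas or a slower replication loop make it longer.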

Variants

  • Causal consistency

If Process A notifies Process B that it has updated a data item, subsequent reads by Process B return the latest value written by A. Process C, which has no causal relationship with A, gets only eventual consistency.

  • Read-your-writes consistency

If Process A writes a new value, subsequent reads by Process A return that latest value. Other users, however, may only see it after a while.

  • Session consistency

This requires read-your-writes consistency to hold for the whole session during which the client interacts with the storage system. Hibernate's session provides this kind of consistency guarantee.

  • Monotonic read consistency

If Process A has read a certain value of an object, its subsequent reads will never return an earlier value.

  • Monotonic write consistency

The system guarantees that all writes from the same process are executed in order (serialized).
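One common way to obtain read-your-writes and monotonic reads within a session is for the client to remember the highest version it has seen per key and to refuse older answers. The sketch below assumes a single shared version counter (`_versions`) as a stand-in for a real version source such as timestamps or vector clocks; `Replica` and `Session` are hypothetical names.

```python
import itertools

# Stand-in for a real version source (assumption made for this sketch only).
_versions = itertools.count(1)


class Replica:
    """Holds key -> (version, value); a higher version replaces a lower one."""

    def __init__(self):
        self.data = {}

    def apply(self, key, version, value):
        current_version, _ = self.data.get(key, (0, None))
        if version > current_version:
            self.data[key] = (version, value)

    def get(self, key):
        return self.data.get(key, (0, None))


class Session:
    """Client session that enforces read-your-writes and monotonic reads."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.seen = {}  # key -> highest version this session has written or read

    def write(self, key, value, replica_index):
        version = next(_versions)
        self.replicas[replica_index].apply(key, version, value)
        self.seen[key] = version  # read-your-writes: remember our own write

    def read(self, key):
        """Return the value from the first replica that is not older than what we saw."""
        floor = self.seen.get(key, 0)
        for replica in self.replicas:
            version, value = replica.get(key)
            if version >= floor:
                self.seen[key] = version  # monotonic reads: never go backwards
                return value
        raise LookupError(f"no replica has caught up to version {floor} for {key!r}")


if __name__ == "__main__":
    replicas = [Replica(), Replica()]
    writer = Session(replicas)
    writer.write("x", "a", replica_index=1)  # only replica 1 has the write so far
    print(writer.read("x"))                  # skips stale replica 0, reads own write
    print(Session(replicas).read("x"))       # a different session may still see the old state
```

The second session has no history for the key, so it accepts the stale replica's answer; that is the "other users may only see it after a while" case described above.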

BASE

Interestingly, BASE means "base" in the chemical sense, while ACID means "acid". The two really are opposites.

  • Basically Available: basic availability
  • Soft state: soft state / flexible state

"Soft state" can be understood as "connectionless", whereas "hard state" is "connection-oriented".

  • Eventual consistency: consistency in the end

Eventual consistency is also the ultimate goal of ACID.

The BASE model is the opposite of the ACID model: it gives up strong consistency in exchange for availability or reliability.

  • Basically Available: the system stays basically available and tolerates partition failures (for example, a sharded database).
  • Soft state: state may be out of sync for a period of time; updates are asynchronous.
  • Eventually consistent: the data only needs to be consistent in the end, not at every moment.

BASE is mainly implemented in two ways:
1. Partitioning the database by function
2. Sharding

BASE mainly emphasizes basic availability. If you need high availability, that is, raw performance, you have to sacrifice consistency or fault tolerance. In terms of performance, BASE-style designs still have untapped potential.
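The two implementation approaches named above, functional partitioning and sharding, can be combined in a small routing function. This is a sketch only: the shard names in `SHARDS_BY_FUNCTION` and the function `pick_shard` are hypothetical, and hashing the key modulo the shard count is just one simple placement scheme.

```python
import hashlib

# Hypothetical layout: each functional area has its own group of database shards.
SHARDS_BY_FUNCTION = {
    "users": ["users-db-0", "users-db-1"],
    "orders": ["orders-db-0", "orders-db-1", "orders-db-2"],
}


def pick_shard(function: str, key: str) -> str:
    """Route a record first by function, then by a stable hash of its key."""
    shards = SHARDS_BY_FUNCTION[function]
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


if __name__ == "__main__":
    print(pick_shard("users", "alice"))
    print(pick_shard("orders", "order-42"))
```

A stable hash is used instead of Python's built-in `hash` so that the same key maps to the same shard across processes. Plain modulo placement moves most keys when the number of shards changes, which is a limitation of this simple scheme.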

Other

The five-minute rule for I/O

In 1987, Jim Gray and Gianfranco Putzolu published the "five-minute rule". In short: if a record is accessed frequently, it should be kept in memory; otherwise it should stay on disk and be read when needed. The break-even point is five minutes. It looks like a rule of thumb, but the five-minute threshold is in fact derived from cost: at the hardware prices of the time, keeping 1KB of data in memory cost about as much as reading it from disk once every 400 seconds (close to five minutes). The rule was reviewed around 1997, which confirmed that it still held (neither disks nor memory had made a qualitative leap). The most recent review looks at the impact that SSDs, this "new old hardware", may have.

With the arrival of the flash era, the five-minute rule splits in two, depending on whether the SSD is used as slower memory (an extended buffer pool) or as a faster disk (an extended disk). Small pages move between memory and flash, while large pages move between flash and disk. Twenty years after it was first proposed, the five-minute rule still holds in the flash era, but only for larger pages (64KB pages; the change in page size reflects how hardware bandwidth and latency have developed).
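The cost reasoning behind the five-minute rule is usually written as a break-even interval: a technology ratio (pages per MB of RAM divided by disk accesses per second) times an economic ratio (price of a disk drive divided by price of a MB of RAM). The sketch below computes it; the function name `break_even_seconds` is hypothetical, and the sample numbers are assumptions roughly in line with late-1990s hardware, not figures taken from the text.

```python
def break_even_seconds(pages_per_mb_ram: float,
                       accesses_per_sec_per_disk: float,
                       price_per_disk: float,
                       price_per_mb_ram: float) -> float:
    """Access interval at which caching a page in RAM costs the same as re-reading it from disk."""
    technology_ratio = pages_per_mb_ram / accesses_per_sec_per_disk
    economic_ratio = price_per_disk / price_per_mb_ram
    return technology_ratio * economic_ratio


if __name__ == "__main__":
    # Assumed sample values: 8KB pages (128 per MB), 64 random accesses per
    # second per disk, a $2000 disk drive, and RAM at $15 per MB.
    seconds = break_even_seconds(128, 64, 2000, 15)
    print(f"{seconds:.0f} seconds, about {seconds / 60:.1f} minutes")
```

A page that is accessed more often than once per break-even interval is cheaper to keep in memory; a page accessed less often is cheaper to leave on disk. Larger pages lower the pages-per-MB figure, which is one way the result shifts as page size changes.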

Do not delete data

Oren Eini (aka Ayende Rahien) advises developers to avoid soft deletes in the database, which might lead readers to conclude that a hard delete is the reasonable choice. In response to Ayende's article, Udi Dahan strongly recommends avoiding deletion of data altogether.

Proponents of soft delete add an IsDeleted column to the table to preserve data integrity. If a row's IsDeleted flag is set, the row is considered deleted. Ayende finds this approach "simple, easy to understand, quick to implement and easy to explain", but "often wrong". The problem is:

Deleting a row or an entity is almost never a simple event. It affects not only the data in the model but also the shape of the model. That is why we have foreign keys: to make sure there is never an "Order Line" without a parent "Order". And that is only the simplest case. ...

With soft deletes, whether we like it or not, data can easily become corrupted. For example, a small change that nobody pays attention to could make a "Customer's" "latest order" point to an order that has been soft-deleted.

If a developer is asked to remove data from the database and soft delete is not recommended, the only option left is a hard delete. To keep the data consistent, the developer has to delete not only the row directly concerned but also cascade the delete to related data. Udi Dahan, however, reminds readers that the real world does not cascade:

Suppose marketing decides to remove an item from the catalog. Does that mean all old orders containing that item should disappear too? And, cascading further, should all invoices for those orders be deleted as well? Going on step by step like this, would we have to redo the company's profit and loss statements?

Heaven forbid.

The problem seems to lie in how the word "delete" is interpreted. Dahan gives this example:

What I mean by "delete" is really that the product is "discontinued". We will no longer sell this product, and once the stock is cleared we will not order any more. Customers searching for products or browsing the catalog will no longer see it, but the people running the warehouse still have to manage these items for the time being. Saying "delete" is just shorthand.

He then gives some correct interpretations from the user's point of view:

Orders are not deleted; they are "cancelled". If the order is cancelled too late, there may be a fee.

Employees are not deleted; they are "fired" (or perhaps retired). There is compensation to be handled.

Job positions are not deleted; they are "filled" (or the requisition is withdrawn).

In all these examples, our focus should be on the task the user wants to accomplish, not on the technical action performed on some entity. In almost all cases, there is more than one entity to consider.

Instead of an IsDeleted flag, Dahan suggests a field that represents the state of the data: active, discontinued, cancelled, deprecated, and so on. Users can use such a status field to look back at past data as a basis for decisions.

Besides breaking data consistency, deleting data has other negative consequences. Dahan recommends keeping all data in the database: "Don't delete. Just don't."
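Dahan's suggestion of a status field in place of an IsDeleted flag can be sketched as follows. The types and functions here (`ProductStatus`, `Product`, `discontinue`, `catalog_view`, `warehouse_view`) are hypothetical names chosen for illustration and follow the discontinued-product example above.

```python
from dataclasses import dataclass
from enum import Enum


class ProductStatus(Enum):
    ACTIVE = "active"
    DISCONTINUED = "discontinued"


@dataclass
class Product:
    sku: str
    name: str
    status: ProductStatus = ProductStatus.ACTIVE


def discontinue(product: Product) -> None:
    """Model the business action instead of deleting the row."""
    product.status = ProductStatus.DISCONTINUED


def catalog_view(products):
    """Customers only see products that are still on sale."""
    return [p for p in products if p.status is ProductStatus.ACTIVE]


def warehouse_view(products):
    """Warehouse staff still see everything, including discontinued stock."""
    return list(products)


if __name__ == "__main__":
    products = [Product("sku-1", "Kettle"), Product("sku-2", "Toaster")]
    discontinue(products[0])
    print([p.name for p in catalog_view(products)])    # only the product still on sale
    print([p.name for p in warehouse_view(products)])  # both products remain manageable
```

Because the row is never removed, old orders and invoices that reference the product stay valid, and each audience gets the view that matches what "delete" meant for it.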
