Reprint: http://blog.csdn.net/hedylin/archive/2007/04/03/1550088.aspx

Speaking of databases, I think we cannot avoid talking about data structures first. In 1996, when I entered university to study computer programming, our teacher told us: computer program = data structures + algorithms. Although program development has since shifted gradually from being mainly procedural to being mainly object-oriented, I still deeply agree with the formula our teacher gave us 8 years ago. In object-oriented development, the first thing to do is to analyze the data the whole program has to handle and abstract it into templates (classes), then gradually add the functions that process the data (that is, the algorithms), and finally assign access rights to the data members and functions of each class to achieve encapsulation.

It is said that the earliest prototype of the database came from the account book of an American dairy farm (paper-based, which shows that a database is not necessarily data stored in a computer ^_^). The book recorded the farm's income and expenditure, and the programmers were inspired while organizing it and entering it into the computer. When data collected according to a prescribed data structure grows beyond a certain volume, programmers, for the sake of execution efficiency, split functions such as searching, updating, and maintenance into a separately callable module. This module slowly evolved into the database management system (DBMS) we work with today, an important branch of software development.

Now to the subject. First, based on the programmers I have personally worked with, I sort them into several categories by how solid their database design foundation is:
1. Programmers who have never studied data structures systematically. Their work is often just an improvised toy. They tend to design only a handful of tables and cram all the data for a given feature into one table, with almost no relationships between tables. A lot of free management software online is like this. While the program has few features and little data, it runs without problems, but using it to manage important data carries great risk.
2. Programmers who have studied data structures systematically but have not yet developed management software with high efficiency requirements. Most of them graduated recently. When designing table structures they follow the textbook strictly and stick rigidly to ER diagrams and 3NF (don't lose heart, every database design expert started at this step). Their work is good enough for ordinary lightweight, Access-style management software. But once the system needs new features, the original tables almost always need a complete overhaul.
3. Second-category programmers who, after struggling through several rounds of efficiency improvements and feature upgrades, have been promoted to database veterans, and are experts in the eyes of the first category. They can develop medium-sized commercial data management systems with more than 20 tables. They know when to keep a certain amount of redundant data to improve efficiency, and the databases they design are easier to extend: when users need new features, the original tables need only minor changes.
4. Third-category programmers who, after repeating the design of some ten similar database management programs, persevered instead of changing careers, kept looking for a "lazy" shortcut, and gradually reached the insight that turns quantity into quality. The table structures they design show foresight: they anticipate the data that future upgrades will need and leave room for it in advance. Most of these programmers are now senior developers in areas such as data mining.
5. Third- or fourth-category programmers who, after studying the principles and development of existing database management systems, either do secondary development on top of one of them or develop a general-purpose database management system of their own.

I am personally at the tail end of the third category, so the design techniques listed below are suitable only for second-category and some third-category database designers. Also, since I have rarely met peers interested in digging deep into this area, errors and omissions in this article are inevitable. I state this up front: corrections are welcome, so please don't hold anything back 8)

First, table design for tree-structured relationships
Many programmers run into tree-structured data when designing a database. A common example is a category table: a top-level category has several subcategories, and some subcategories have subcategories of their own. When the categories are not fixed in advance, users want to add a new subcategory under any category or delete a category together with all its subcategories, and the number of categories is expected to grow gradually, we would consider storing this data in a single table. Following the textbook, a second-category programmer would probably design a table like this:

Category table 1 (Type_table_1)
Name         Type      Constraints    Description
type_id      int       no duplicates  Category ID, primary key
type_name    char(50)  not null       Category name, duplicates not allowed
type_father  int       not null       ID of the parent category; for the top node, set to a fixed unique value

This design is short and neat, fully satisfies 3NF, and meets all the user's requirements. So is that good enough? The answer is NO! Why?

Let's estimate how the user wants the data in this table to be listed. Naturally, the user wants all categories listed at once according to the hierarchy he has set up, like this:
Total categories
  Category 1
    Category 1.1
      Category 1.1.1
    Category 1.2
  Category 2
    Category 2.1
  Category 3
    Category 3.1
    Category 3.2
... ...

How many queries against the table above does it take to display this list (a pre-order traversal of the tree)? Note that Category 1.1.1 may well have been added after Category 3.2. The answer is N, one per category. With a small amount of data this inefficiency does not matter, but once the categories grow to dozens or even hundreds of records, listing them once means querying the table dozens of times, and the efficiency of the whole program suffers. A second-category programmer might say: then I will build a temporary array or temporary table to hold the result of the pre-order traversal, so that the dozens of queries run only the first time, and later listings are read directly from the temporary array or table. In fact there is no need to allocate new storage for this. Extend the table a little and put a limit on the number of categories that can be added, and the listing above takes just one query. Here is the extended table structure:

Category table 2 (Type_table_2)
Name         Type      Constraints                                  Description
type_id      int       no duplicates                                Category ID, primary key
type_name    char(50)  not null                                     Category name, duplicates not allowed
type_father  int       not null                                     ID of the parent category; for the top node, set to a fixed unique value
type_layer   char(6)   limited to 3 levels, initial value 000000    Pre-order traversal position of the category, mainly to reduce the number of database queries

With this table structure, the example above is recorded in the table as follows:

type_id  type_name         type_father  type_layer
1        Total categories  0            000000
2        Category 1        1            010000
3        Category 1.1      2            010100
4        Category 1.2      2            010200
5        Category 2        1            020000
6        Category 2.1      5            020100
7        Category 3        1            030000
8        Category 3.1      7            030100
9        Category 3.2      7            030200
10       Category 1.1.1    3            010101
... ...

Now query the table ordered by type_layer: SELECT * FROM Type_table_2 ORDER BY type_layer

The resulting record set is listed as follows:

type_id  type_name         type_father  type_layer
1        Total categories  0            000000
2        Category 1        1            010000
3        Category 1.1      2            010100
10       Category 1.1.1    3            010101
4        Category 1.2      2            010200
5        Category 2        1            020000
6        Category 2.1      5            020100
7        Category 3        1            030000
8        Category 3.1      7            030100
9        Category 3.2      7            030200
... ...

The records now come out in exactly pre-order traversal order. To show the hierarchy when displaying the categories, just examine the digits of type_layer in groups of two: for each group greater than 0, indent two more spaces to the right. Of course, in this example I set the limits to at most 3 levels, with at most 99 subcategories per level. By changing the length and digit grouping of type_layer according to the user's needs, you can change the limits on the number of levels and subcategories. In fact, this design is not used only for category tables; most forum programs that can display posts as a threaded tree use a similar design.
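The difference between the two listing strategies can be tried out with a small sketch. It uses Python's sqlite3 module purely for illustration (the article itself targets T-SQL); the helper names build_db, depth, list_by_layer, and list_by_father are made up for this example, and the data is the sample table from this section.

```python
import sqlite3

ROWS = [
    (1, "Total categories", 0, "000000"),
    (2, "Category 1", 1, "010000"),
    (3, "Category 1.1", 2, "010100"),
    (4, "Category 1.2", 2, "010200"),
    (5, "Category 2", 1, "020000"),
    (6, "Category 2.1", 5, "020100"),
    (7, "Category 3", 1, "030000"),
    (8, "Category 3.1", 7, "030100"),
    (9, "Category 3.2", 7, "030200"),
    (10, "Category 1.1.1", 3, "010101"),
]


def build_db():
    """Create an in-memory Type_table_2 holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Type_table_2 ("
        " type_id INTEGER PRIMARY KEY,"
        " type_name TEXT NOT NULL UNIQUE,"
        " type_father INTEGER NOT NULL,"
        " type_layer TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO Type_table_2 VALUES (?, ?, ?, ?)", ROWS)
    return conn


def depth(type_layer):
    """Count the two-digit groups of type_layer that are greater than 0."""
    groups = [type_layer[i:i + 2] for i in range(0, len(type_layer), 2)]
    return sum(1 for g in groups if int(g) > 0)


def list_by_layer(conn):
    """One query: ORDER BY type_layer yields the pre-order listing."""
    cur = conn.execute(
        "SELECT type_name, type_layer FROM Type_table_2 ORDER BY type_layer"
    )
    return ["  " * depth(layer) + name for name, layer in cur.fetchall()]


def list_by_father(conn, parent=0, level=0, counter=None):
    """Type_table_1 style: follow type_father, one query per node visited."""
    counter = counter if counter is not None else [0]
    counter[0] += 1  # count every query sent to the database
    rows = conn.execute(
        "SELECT type_id, type_name FROM Type_table_2"
        " WHERE type_father = ? ORDER BY type_id",
        (parent,),
    ).fetchall()
    lines = []
    for type_id, name in rows:
        lines.append("  " * level + name)
        sub_lines, _ = list_by_father(conn, type_id, level + 1, counter)
        lines.extend(sub_lines)
    return lines, counter[0]


if __name__ == "__main__":
    conn = build_db()
    print("\n".join(list_by_layer(conn)))
    _, queries = list_by_father(conn)
    print("queries needed via type_father:", queries)
```

Both functions return the same indented listing; the type_father version simply needs roughly one query per category to get there, while the type_layer version needs one in total.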

Some may think the type_father field in Type_table_2 is redundant and can be removed. But without it, inserting or deleting a category requires cumbersome comparisons of type_layer contents, so I kept type_father. This is in line with the database design principle of keeping an appropriate amount of redundant data to reduce program complexity. Later I will present a case where data redundancy is added deliberately.
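As a rough illustration of why type_father keeps insertion simple, the sketch below computes the type_layer code for a new subcategory by looking up its siblings through type_father. It assumes the 3-level, two-digits-per-level layout used above; the function names (make_sample, next_layer, add_category) and the rule "take the highest sibling number plus one" are assumptions of this example, not something the design prescribes.

```python
import sqlite3


def make_sample():
    """In-memory Type_table_2 with a few of the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Type_table_2 ("
        " type_id INTEGER PRIMARY KEY,"
        " type_name TEXT NOT NULL UNIQUE,"
        " type_father INTEGER NOT NULL,"
        " type_layer TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO Type_table_2 VALUES (?, ?, ?, ?)",
        [
            (1, "Total categories", 0, "000000"),
            (2, "Category 1", 1, "010000"),
            (3, "Category 1.1", 2, "010100"),
            (4, "Category 1.2", 2, "010200"),
            (10, "Category 1.1.1", 3, "010101"),
        ],
    )
    return conn


def next_layer(conn, father_id):
    """Return the type_layer code for a new child of father_id."""
    row = conn.execute(
        "SELECT type_layer FROM Type_table_2 WHERE type_id = ?", (father_id,)
    ).fetchone()
    if row is None:
        raise ValueError("unknown parent category")
    groups = [row[0][i:i + 2] for i in range(0, len(row[0]), 2)]
    if "00" not in groups:
        raise ValueError("parent is already at the deepest level")
    slot = groups.index("00")  # first unused level below the parent
    # Siblings are found with a plain equality test on type_father.
    used = [
        int(layer[2 * slot:2 * slot + 2])
        for (layer,) in conn.execute(
            "SELECT type_layer FROM Type_table_2 WHERE type_father = ?",
            (father_id,),
        )
    ]
    number = max(used, default=0) + 1
    if number > 99:
        raise ValueError("no room left under this parent")
    groups[slot] = f"{number:02d}"
    return "".join(groups)


def add_category(conn, name, father_id):
    """Insert a new category and return (type_id, type_layer)."""
    layer = next_layer(conn, father_id)
    cur = conn.execute(
        "INSERT INTO Type_table_2 (type_name, type_father, type_layer)"
        " VALUES (?, ?, ?)",
        (name, father_id, layer),
    )
    return cur.lastrowid, layer
```

Without type_father, the sibling lookup would have to be expressed as a pattern match on type_layer prefixes, which is the kind of cumbersome comparison the paragraph above refers to.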

Second, designing the product information tables
Suppose you are a developer in the computer department of a department store. One day the boss asks you to build an online e-commerce platform for the company. The store sells several thousand kinds of goods, but for now only a few dozen that are easy to ship will be sold online; of course, more goods may be added to the platform later. You now start designing the product information tables for the platform. Every product sold has the same attributes: product ID, product name, product category, related information, supplier, number of items per pack, stock, purchase price, selling price, and discount price. You quickly design three tables: the product category table (Wares_type), the supplier table (Wares_provider), and the product information table (Wares_info):

Product category table (Wares_type)
Name         Type      Constraints                                  Description
type_id      int       no duplicates                                Category ID, primary key
type_name    char(50)  not null                                     Category name, duplicates not allowed
type_father  int       not null                                     ID of the parent category; for the top node, set to a fixed unique value
type_layer   char(6)   limited to 3 levels, initial value 000000    Pre-order traversal position of the category, mainly to reduce the number of database queries

Supplier table (Wares_provider)
Name           Type       Constraints    Description
provider_id    int        no duplicates  Supplier ID, primary key
provider_name  char(100)  not null       Supplier name

Product information table (Wares_info)
Name        Type       Constraints      Description
wares_id    int        no duplicates    Product ID, primary key
wares_name  char(100)  not null         Product name
wares_type  int        not null         Product category ID, references Wares_type.type_id
wares_info  char(200)  null allowed     Related information
provider    int        not null         Supplier ID, references Wares_provider.provider_id
setnum      int        initial value 1  Number of items per pack, default 1
stock       int        initial value 0  Stock, default 0
buy_price   money      not null         Purchase price
sell_price  money      not null         Selling price
discount    money      not null         Discount price

You take these three tables to the boss for review. The boss wants to add a product picture field, but only some of the products have pictures. OK, you add a BOOL field haspic to the product information table (Wares_info) and create a new table, the product picture table (Wares_pic):

Product picture table (Wares_pic)
Name         Type       Constraints    Description
pic_id       int        no duplicates  Product picture ID, primary key
wares_id     int        not null       ID of the product it belongs to, references Wares_info.wares_id
pic_address  char(200)  not null       Storage path of the picture

Development is completed, the platform fully meets the boss's current requirements, and it officially goes live. After a while, the boss plans to launch a new batch of products on the platform, and one category of them needs a "length" attribute. The first round of trouble begins... Of course, you follow the same approach you used for pictures: you add a BOOL field haslength to the product information table (Wares_info) and create a new table, the product length table (Wares_length):

Product length table (Wares_length)
Name       Type      Constraints    Description
length_id  int       no duplicates  Product length ID, primary key
wares_id   int       not null       ID of the product it belongs to, references Wares_info.wares_id
length     char(20)  not null       Description of the product's length

Not long after that change, the boss plans yet another batch of new products, and one category of them needs a "width" attribute. You grit your teeth, follow the same recipe, and add a product width table (Wares_width). Some time later, the boss's new products need a "height" attribute. Are you starting to feel that if the database keeps growing this way, it will soon turn into a maze? So is there a way to curb this unpredictable yet repetitive kind of database expansion? While reading "Agile Software Development: Principles, Patterns, and Practices", I found that the author gives a similar example: 7.3 The "Copy" Program. I very much agree with this view of agile software development: do almost no up-front design at first, but once the requirements change, a programmer who pursues excellence should review the whole architecture from the ground up and design, in that revision, an architecture that can accommodate similar changes in the future. Below is the modification I would propose at the point where the "length" attribute has to be added:

Remove the haspic field from the product information table (Wares_info), and add two tables, the extra product attribute table (Wares_ex_property) and the extra product information table (Wares_ex_info), to support adding new attributes.

Extra product attribute table (Wares_ex_property)
Name    Type      Constraints    Description
ex_pid  int       no duplicates  Extra attribute ID, primary key
p_name  char(20)  not null       Extra attribute name

Extra product information table (Wares_ex_info)
Name            Type       Constraints    Description
ex_iid          int        no duplicates  Extra information ID, primary key
wares_id        int        not null       ID of the product it belongs to, references Wares_info.wares_id
property_id     int        not null       Extra attribute ID, references Wares_ex_property.ex_pid
property_value  char(200)  not null       Value of the extra attribute for this product

Add two records to the extra product attribute table (Wares_ex_property):
ex_pid  p_name
1       product picture
2       product length

Then add an extra-attribute management function to the back-end administration of the e-commerce platform. From then on, when new products bring a new attribute, you just use that function to add one record to the extra product attribute table (Wares_ex_property). Don't be afraid of change. Being hit by the first bullet is not a bad thing; what is bad is being hit by a second and a third bullet flying along the same trajectory. The earlier the first bullet arrives, the more serious the wound, and the stronger the resistance afterwards 8)
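A minimal sketch of how these two extra tables behave in practice, using Python's sqlite3 module for illustration. Wares_info is cut down to two columns, the products and values are invented sample data, and the helper names (build_shop, add_property, extra_properties) are hypothetical.

```python
import sqlite3


def build_shop():
    """Create the three tables in memory and load some invented sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Wares_info (
            wares_id   INTEGER PRIMARY KEY,
            wares_name TEXT NOT NULL
        );
        CREATE TABLE Wares_ex_property (
            ex_pid INTEGER PRIMARY KEY,
            p_name TEXT NOT NULL
        );
        CREATE TABLE Wares_ex_info (
            ex_iid         INTEGER PRIMARY KEY,
            wares_id       INTEGER NOT NULL REFERENCES Wares_info(wares_id),
            property_id    INTEGER NOT NULL REFERENCES Wares_ex_property(ex_pid),
            property_value TEXT NOT NULL
        );
        INSERT INTO Wares_ex_property VALUES (1, 'product picture');
        INSERT INTO Wares_ex_property VALUES (2, 'product length');
        INSERT INTO Wares_info VALUES (1, 'rope');
        INSERT INTO Wares_info VALUES (2, 'mug');
        INSERT INTO Wares_ex_info (wares_id, property_id, property_value)
            VALUES (1, 1, '/pic/rope.jpg');
        INSERT INTO Wares_ex_info (wares_id, property_id, property_value)
            VALUES (1, 2, '10 m');
        INSERT INTO Wares_ex_info (wares_id, property_id, property_value)
            VALUES (2, 1, '/pic/mug.jpg');
    """)
    return conn


def add_property(conn, name):
    """A new kind of attribute is one new row, not a new table."""
    cur = conn.execute(
        "INSERT INTO Wares_ex_property (p_name) VALUES (?)", (name,)
    )
    return cur.lastrowid


def set_property(conn, wares_id, property_id, value):
    """Attach an extra attribute value to one product."""
    conn.execute(
        "INSERT INTO Wares_ex_info (wares_id, property_id, property_value)"
        " VALUES (?, ?, ?)",
        (wares_id, property_id, value),
    )


def extra_properties(conn, wares_id):
    """Return {attribute name: value} for one product."""
    cur = conn.execute(
        "SELECT p.p_name, i.property_value"
        " FROM Wares_ex_info AS i"
        " JOIN Wares_ex_property AS p ON p.ex_pid = i.property_id"
        " WHERE i.wares_id = ? ORDER BY p.ex_pid",
        (wares_id,),
    )
    return dict(cur.fetchall())
```

When the boss asks for "width", the change is add_property(conn, 'product width') followed by set_property calls for the affected products; the schema itself stays untouched.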

Third, multi-user and permission management design
When developing database management software, you cannot avoid dealing with multiple users and user permission settings. Although the large and medium-sized back-end database systems on the market all provide multi-user support and permission settings that go down to individual tables, I personally suggest that a mature database management program should still design its own user management, for two reasons:
1. The multi-user and permission settings provided by those back-end database systems are designed around the common properties of databases, and may not fully meet the needs of particular cases;
2. Do not rely too heavily on special features of the back-end database system, because the various large and medium-sized back-end database systems are not fully compatible with one another. Otherwise, once you need to switch database platforms or upgrade the back-end database system, the earlier architecture may not be reusable.

Now let's see how to design a reasonably flexible multi-user management module, in which the system administrator of the database management software can add new users, modify the permissions of existing users, and delete existing users. First, analyze the user requirements and list all the functions the software needs to implement. Then classify these functions by how they relate to each other, grouping the functions that a given type of user needs into one class. Finally, start building the tables:

Function table (Function_table)
Name    Type      Constraints    Description
f_id    int       no duplicates  Function ID, primary key
f_name  char(20)  not null       Function name, duplicates not allowed
f_desc  char(50)  null allowed   Function description

User group table (User_group)
Name         Type       Constraints    Description
group_id     int        no duplicates  User group ID, primary key
group_name   char(20)   not null       User group name
group_power  char(100)  not null       Permission list of the user group: a collection of Function_table f_id values

User table (User_table)
Name       Type      Constraints    Description
user_id    int       no duplicates  User ID, primary key
user_name  char(20)  no duplicates  User name
user_pwd   char(20)  not null       User password
user_type  int       not null       ID of the user group the user belongs to, references User_group.group_id

With this user group architecture, adding a new user only requires specifying which user group the user belongs to. When the system later gains new functions or the permissions for old functions change, you only need to edit the records in the function table and the user group table, and the functions available to existing users change accordingly. Of course, this architecture moves the decision about which functions a user may use into the database, which makes development relatively more complex. However, when there are many users (10 or more), or the software is likely to be upgraded in the future, the price is worth it.
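A small sketch of a permission check on top of these three tables, using Python's sqlite3 module. The table description does not say how group_power encodes its set of f_id values, so a comma-separated list is assumed here; the sample functions, groups, users, and the helper names (build_users, allowed_functions, can_use) are likewise invented for illustration.

```python
import sqlite3


def build_users():
    """Create the three tables in memory with invented sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Function_table (
            f_id   INTEGER PRIMARY KEY,
            f_name TEXT NOT NULL UNIQUE,
            f_desc TEXT
        );
        CREATE TABLE User_group (
            group_id    INTEGER PRIMARY KEY,
            group_name  TEXT NOT NULL,
            group_power TEXT NOT NULL      -- assumed format: '1,2,3'
        );
        CREATE TABLE User_table (
            user_id   INTEGER PRIMARY KEY,
            user_name TEXT UNIQUE,
            user_pwd  TEXT NOT NULL,       -- placeholder values only
            user_type INTEGER NOT NULL REFERENCES User_group(group_id)
        );
        INSERT INTO Function_table VALUES (1, 'view stock', NULL);
        INSERT INTO Function_table VALUES (2, 'edit stock', NULL);
        INSERT INTO Function_table VALUES (3, 'manage users', NULL);
        INSERT INTO User_group VALUES (1, 'clerks', '1,2');
        INSERT INTO User_group VALUES (2, 'admins', '1,2,3');
        INSERT INTO User_table VALUES (1, 'alice', 'x', 2);
        INSERT INTO User_table VALUES (2, 'bob', 'x', 1);
    """)
    return conn


def allowed_functions(conn, user_name):
    """Return the set of f_id values granted to the user's group."""
    row = conn.execute(
        "SELECT g.group_power FROM User_table AS u"
        " JOIN User_group AS g ON g.group_id = u.user_type"
        " WHERE u.user_name = ?",
        (user_name,),
    ).fetchone()
    if row is None:
        return set()
    return {int(part) for part in row[0].split(",") if part.strip()}


def can_use(conn, user_name, f_name):
    """True if the user's group lists the function named f_name."""
    row = conn.execute(
        "SELECT f_id FROM Function_table WHERE f_name = ?", (f_name,)
    ).fetchone()
    return row is not None and row[0] in allowed_functions(conn, user_name)
```

Granting a function to every clerk is then a single UPDATE of the group's group_power; no row in User_table has to change.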

Fourth, a concise design for batch m:n relationships
An m:n relationship is usually modeled with three tables: one for m, one for n, and one for the m:n relation. But sometimes an m:n relation is processed in batches. A library, for example, generally lets a user borrow n books at one time. If borrowing records must be queryable by batch, that is, listing all the books a user borrowed in one particular loan, how should this be designed? Let's first build the three necessary tables:

Book table (Book_table)
Name       Type       Constraints    Description
book_id    int        no duplicates  Book ID, primary key
book_no    char(20)   no duplicates  Book code
book_name  char(100)  not null       Book title
... ...

Borrower table (Renter_table)
Name         Type      Constraints    Description
renter_id    int       no duplicates  User ID, primary key
renter_name  char(20)  not null       User name
... ...

Borrowing record table (Rent_log)
Name       Type      Constraints    Description
rent_id    int       no duplicates  Borrowing record ID, primary key
r_id       int       not null       User ID, references Renter_table.renter_id
b_id       int       not null       Book ID, references Book_table.book_id
rent_date  datetime  not null       Time of borrowing
... ...

To support querying borrowing records by batch, we could build another table to hold the batch information, for example:

Batch borrowing table (Batch_rent)
Name        Type      Constraints    Description
batch_id    int       no duplicates  Batch borrowing ID, primary key
batch_no    int       not null       Batch number; records of the same loan share the same batch_no
rent_id     int       not null       Borrowing record ID, references Rent_log.rent_id
batch_date  datetime  not null       Time of the batch borrowing

Is this design good? Let's look at the queries needed to list all the books a user borrowed at one time. First, query the batch borrowing table (Batch_rent) to get the rent_id values of all matching records, then use those values as the condition for a query on the borrowing record table (Rent_log). So is there a better way? Below is a concise batch design that needs no new table, only a small change to the borrowing record table (Rent_log). The revised table is as follows:

Borrowing record table (Rent_log)
Name       Type      Constraints    Description
rent_id    int       no duplicates  Borrowing record ID, primary key
r_id       int       not null       User ID, references Renter_table.renter_id
b_id       int       not null       Book ID, references Book_table.book_id
batch_no   int       not null       Batch number; records of the same loan share the same batch_no
rent_date  datetime  not null       Time of borrowing
... ...

Here, the batch_no shared by all records of one loan is the rent_id of the first record stored for that loan. For example, suppose the current largest rent_id is 64 and a user then borrows 3 books at once: the 3 inserted records all have batch_no 65. If another user then borrows a single book, the inserted record has rent_id 68. With this design, querying batch borrowing information takes only one standard T-SQL nested query. Of course, this design does not conform to 3NF, but compared with the 3NF design above, which is better? I don't think I need to answer that for you.
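The numbering rule and the nested query can be sketched as follows, again with Python's sqlite3 module standing in for the T-SQL back end. The helper names (build_library, borrow, books_in_latest_batch), the seed row with rent_id 64, and the dates are invented for the example, and computing MAX(rent_id) + 1 in application code assumes a single writer at a time.

```python
import sqlite3


def build_library():
    """Revised Rent_log with one existing record whose rent_id is 64."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Rent_log ("
        " rent_id   INTEGER PRIMARY KEY,"
        " r_id      INTEGER NOT NULL,"
        " b_id      INTEGER NOT NULL,"
        " batch_no  INTEGER NOT NULL,"
        " rent_date TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO Rent_log VALUES (64, 9, 900, 64, '2007-03-30')")
    return conn


def borrow(conn, renter_id, book_ids, rent_date):
    """Insert one row per book; every row gets the first row's rent_id
    as its batch_no. Returns that batch_no."""
    with conn:  # commit on success, roll back on error
        (max_id,) = conn.execute(
            "SELECT COALESCE(MAX(rent_id), 0) FROM Rent_log"
        ).fetchone()
        batch_no = max_id + 1
        for offset, book_id in enumerate(book_ids):
            conn.execute(
                "INSERT INTO Rent_log VALUES (?, ?, ?, ?, ?)",
                (batch_no + offset, renter_id, book_id, batch_no, rent_date),
            )
    return batch_no


def books_in_latest_batch(conn, renter_id):
    """Nested query: all books of the user's most recent loan."""
    cur = conn.execute(
        "SELECT b_id FROM Rent_log"
        " WHERE batch_no = (SELECT MAX(batch_no) FROM Rent_log WHERE r_id = ?)"
        " ORDER BY rent_id",
        (renter_id,),
    )
    return [b_id for (b_id,) in cur.fetchall()]
```

Starting from a largest rent_id of 64, borrowing three books produces records 65, 66, and 67 that all carry batch_no 65, and the next single-book loan gets rent_id 68, matching the example in the text.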

Fifth, choosing data redundancy
The first part, on tree-structured data, kept a redundant field. Here is a further example: adding a redundant table. I once worked at a company that, to solve the problem of staff meals, made an arrangement with a small restaurant nearby: meals were put on account every day, the cost of each meal was shared equally among those who ate, the company settled in cash at the end of the month, and each person's meal costs for the month were deducted from their salary. Of course, the number of people eating varied from day to day, and since different dishes were ordered for each meal, the cost of each meal varied too. For example, on Monday lunch for 5 people cost 40 yuan and dinner for 2 cost 20 yuan; on Tuesday lunch for 6 cost 36 yuan and dinner for 3 cost 18 yuan. To make it easy to calculate each person's monthly meal costs, I wrote a very simple meal accounting program whose database has three tables:

Staff table (Clerk_table)
Name        Type      Constraints    Description
clerk_id    int       no duplicates  Staff ID, primary key
clerk_name  char(10)  not null       Staff name

Meal master table (Eatdata1)
Name         Type       Constraints    Description
totle_id     int        no duplicates  Meal master ID, primary key
persons      char(100)  not null       Collection of the IDs of the staff who ate
eat_date     datetime   not null       Date of the meal
eat_type     char(1)    not null       Meal type, used to distinguish lunch from dinner
totle_price  money      not null       Total cost of the meal
persons_num  int        not null       Number of people who ate

Meal billing detail table (Eatdata2)
Name   Type   Constraints    Description
id     int    no duplicates  Meal billing detail ID, primary key
t_id   int    not null       Meal master ID, references Eatdata1.totle_id
c_id   int    not null       Staff ID, references Clerk_table.clerk_id
price  money  not null       Cost per person for the meal

Here, each record of the meal master table (Eatdata1) is spread out flat in the meal billing detail table (Eatdata2), one row per person who ate, so Eatdata2 is a redundant table in the full sense of the word. Alternatively, some fields of Eatdata1 could be merged into Eatdata2, which would make Eatdata1 the redundant table, but a detail table designed that way would contain much more duplicated data, so the scheme above is still better. And it is precisely the redundant table Eatdata2 that greatly simplifies the programming when computing each person's monthly meal statistics: a single query like the following returns each person's number of meals and total meal cost for the month:

SELECT clerk_name AS personname, COUNT(c_id) AS eattimes, SUM(price) AS ptprice FROM Eatdata2 JOIN Clerk_table ON (c_id = clerk_id) JOIN Eatdata1 ON (totle_id = t_id) WHERE eat_date >= CONVERT(datetime, '" & the_date & "') AND eat_date < DATEADD(month, 1, CONVERT(datetime, '" & the_date & "')) GROUP BY c_id, clerk_name
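To see the statistics query at work, here is a sketch that loads the Monday and Tuesday meals from the example into the three tables and runs an equivalent query. It uses Python's sqlite3 module, so SQLite's date() function replaces the T-SQL CONVERT and DATEADD calls. The staff names, the concrete dates, the 'L'/'D' meal type codes, the comma-separated persons format, and the helper names (build_meals, record_meal, monthly_totals) are assumptions of this example.

```python
import sqlite3


def build_meals():
    """Create the three tables in memory with six invented staff members."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Clerk_table (
            clerk_id   INTEGER PRIMARY KEY,
            clerk_name TEXT NOT NULL
        );
        CREATE TABLE Eatdata1 (
            totle_id    INTEGER PRIMARY KEY,
            persons     TEXT NOT NULL,
            eat_date    TEXT NOT NULL,     -- ISO date, e.g. 2007-04-02
            eat_type    TEXT NOT NULL,     -- assumed codes: L lunch, D dinner
            totle_price REAL NOT NULL,
            persons_num INTEGER NOT NULL
        );
        CREATE TABLE Eatdata2 (
            id    INTEGER PRIMARY KEY,
            t_id  INTEGER NOT NULL REFERENCES Eatdata1(totle_id),
            c_id  INTEGER NOT NULL REFERENCES Clerk_table(clerk_id),
            price REAL NOT NULL
        );
    """)
    conn.executemany(
        "INSERT INTO Clerk_table VALUES (?, ?)",
        [(i, name) for i, name in enumerate("ABCDEF", start=1)],
    )
    return conn


def record_meal(conn, eat_date, eat_type, total_price, clerk_ids):
    """Store one meal in Eatdata1 and spread it flat into Eatdata2."""
    share = total_price / len(clerk_ids)
    cur = conn.execute(
        "INSERT INTO Eatdata1"
        " (persons, eat_date, eat_type, totle_price, persons_num)"
        " VALUES (?, ?, ?, ?, ?)",
        (",".join(map(str, clerk_ids)), eat_date, eat_type,
         total_price, len(clerk_ids)),
    )
    conn.executemany(
        "INSERT INTO Eatdata2 (t_id, c_id, price) VALUES (?, ?, ?)",
        [(cur.lastrowid, clerk_id, share) for clerk_id in clerk_ids],
    )


def monthly_totals(conn, month_start):
    """Meals eaten and total cost per person for the month starting at
    month_start (YYYY-MM-DD)."""
    cur = conn.execute(
        "SELECT clerk_name AS personname, COUNT(c_id) AS eattimes,"
        "       SUM(price) AS ptprice"
        " FROM Eatdata2"
        " JOIN Clerk_table ON c_id = clerk_id"
        " JOIN Eatdata1 ON totle_id = t_id"
        " WHERE eat_date >= ? AND eat_date < date(?, '+1 month')"
        " GROUP BY c_id, clerk_name ORDER BY c_id",
        (month_start, month_start),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = build_meals()
    record_meal(conn, "2007-04-02", "L", 40, [1, 2, 3, 4, 5])
    record_meal(conn, "2007-04-02", "D", 20, [1, 2])
    record_meal(conn, "2007-04-03", "L", 36, [1, 2, 3, 4, 5, 6])
    record_meal(conn, "2007-04-03", "D", 18, [1, 2, 3])
    for row in monthly_totals(conn, "2007-04-01"):
        print(row)
```

Passing the date as a bound parameter, rather than concatenating it into the SQL string, is a choice made for this sketch.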

Imagine how much more trouble the monthly per-person statistics would be without this redundant table, and how painfully inefficient the program would become. So when is it appropriate to add a certain amount of redundant data? I think there are two principles:

1. The user's overall needs. When what the user cares about most is a list of data produced by processing the database records with a prescribed algorithm, and that algorithm can be carried out directly with the built-in functions of the back-end database system, you can add redundant fields, or even redundant tables, to hold the processed data. Keep in mind that for queries, updates, or deletes on large volumes of data, the back-end database system is far more efficient than code we write ourselves.
2. Simplifying development. In modern software development there are many ways to implement the same function. Programmers are not required to master most development tools and platforms, but they still need to know which approach, combined with which tools, makes a program simpler and more efficient. The essence of redundant data is trading space for time, and since hardware is currently advancing much faster than software, an appropriate amount of redundancy is acceptable. Finally, though, I want to stress once more: do not rely too heavily on the features of platforms and development tools to simplify development. If you misjudge the degree, you will take a big fall later, during maintenance and upgrades.