Designing efficient and reasonable queries on MySQL

Release Time: 2006.11.23 04:36 Source: Microsoft technical communities Author: yuanye


The database system is the core of a management information system, and database-based online transaction processing (OLTP) and online analytical processing (OLAP) are among the most important computer applications in banks, enterprises, government, and other organizations. In most systems, queries make up the largest share of all database operations, and the SELECT statement on which queries are based is the most expensive of the SQL statements. For example, once the data has accumulated to a certain size, such as a bank's account table growing to millions or even tens of millions of records, a single full table scan often takes tens of minutes or even several hours. A query strategy better than a full table scan can often cut the query time to a few minutes, which shows how important query optimization is.

While implementing application projects, the author found that many programmers who build database applications with front-end development tools (such as PowerBuilder, Delphi, etc.) focus only on an attractive user interface and pay little attention to the efficiency of their queries. The resulting applications are inefficient and waste significant resources. Designing efficient and reasonable queries is therefore very important. Based on application examples and combined with database theory, this article introduces the use of query optimization techniques in real systems.

Analysis

Many programmers think that query optimization is the job of the DBMS (database management system) and has little to do with the SQL statements the programmer writes. This is wrong. A good query plan can often improve performance many times over. In this article, the query plan is the set of SQL statements submitted by the user, and the execution plan is the set of statements produced after the optimizer has processed them. The DBMS handles a query plan as follows: after the lexical and syntax checks, the statement is submitted to the DBMS query optimizer; once the optimizer has performed algebraic optimization and access path optimization, the precompilation module processes the statement and generates the execution plan, which is submitted to the system for execution at a suitable time; finally, the results are returned to the user. Recent versions of actual database products (such as Oracle, Sybase, etc.) all use cost-based optimization, which estimates the cost of different execution plans from information in the system dictionary tables and then selects a better plan. Although database products keep getting better at query optimization, the SQL statements submitted by the user are the basis on which the system optimizes, and it is hard to imagine a poorly written query plan becoming efficient after system optimization. The quality of the statements the user writes is therefore crucial. The optimization performed by the system is not discussed here; the following focuses on solutions for improving the user's query plan.

Solving the problem

The following uses the Informix relational database system as an example to describe methods for improving the user's query plan.

1. Use indexes appropriately

An index is an important data structure in a database, and its fundamental purpose is to improve query efficiency. Most database products today use the ISAM index structure first proposed by IBM. Indexes should be used where they fit, according to the following principles:

● Create an index on columns that are joined frequently but are not designated as foreign keys; for columns that are joined infrequently, leave it to the optimizer to generate an index automatically.

● Create an index on columns that are frequently sorted or grouped (that is, used in group by or order by operations).

● Create an index on columns that appear often in conditional expressions and have many distinct values; do not create an index on columns with few distinct values. For example, the "gender" column of an employee table has only two distinct values, "male" and "female", so there is no need to index it. Such an index would not improve query efficiency and would seriously slow down updates.

● If several columns are to be sorted, create a composite index (compound index) on those columns.

● Use system tools. For example, the Informix database has a tbcheck tool that can check a suspect index. On some database servers, an index may become invalid or become slow to read because of frequent operations. If a query that uses an index slows down for no apparent reason, try the tbcheck tool to check the integrity of the index and repair it if necessary. In addition, after a large amount of data in a table has been updated, dropping and rebuilding the index can improve query speed.
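The effect of indexing a selective column can be observed directly in a query plan. The following is a minimal sketch, assuming SQLite (through Python's standard sqlite3 module) as a stand-in for Informix; the employee table, its columns, and the index name are hypothetical, and the exact plan text differs between SQLite versions and will differ again on Informix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "emp_id INTEGER PRIMARY KEY, dept_id INTEGER, name TEXT, gender TEXT)"
)
conn.executemany(
    "INSERT INTO employee (dept_id, name, gender) VALUES (?, ?, ?)",
    [(i % 50, f"emp{i:04d}", "male" if i % 2 else "female") for i in range(1000)],
)


def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


query = "SELECT name FROM employee WHERE dept_id = 7"

# Without an index the only option is to read every row.
plan_before = plan(query)

# dept_id has many distinct values and appears in the condition,
# so it is a reasonable index candidate; gender (two values) is not.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept_id)")
plan_after = plan(query)

print("before:", plan_before)
print("after: ", plan_after)
```

In this sketch the first plan is a full scan of the table and the second is an index search on dept_id.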

2. Avoid or simplify sorting

Repeated sorting of large tables should be simplified or avoided. When an index can be used to produce the output automatically in the appropriate order, the optimizer skips the sort step. The following factors prevent this:

● The index does not include one or more of the columns to be sorted;

● The order of the columns in the group by or order by clause differs from the order of the index;

● The columns to be sorted come from different tables.

To avoid unnecessary sorting, build indexes correctly and merge database tables where reasonable (although this may sometimes affect normalization of the tables, the gain in efficiency is worth it). If sorting is unavoidable, try to simplify it, for example by narrowing the range of columns to be sorted.
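Whether the optimizer needs a separate sort step can also be checked in the query plan. This is a sketch under the assumption that SQLite (via Python's sqlite3 module) behaves similarly enough to Informix for illustration; the orders table layout and the index name are hypothetical. SQLite reports an explicit sort as a "TEMP B-TREE" step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_num INTEGER PRIMARY KEY, customer_num INTEGER, order_date TEXT)"
)
conn.executemany(
    "INSERT INTO orders (customer_num, order_date) VALUES (?, ?)",
    [(100 + i % 20, f"2006-01-{1 + i % 28:02d}") for i in range(500)],
)


def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


by_cust_date = (
    "SELECT customer_num, order_date FROM orders "
    "ORDER BY customer_num, order_date"
)
by_date_cust = (
    "SELECT customer_num, order_date FROM orders "
    "ORDER BY order_date, customer_num"
)

# No suitable index yet: the rows have to be sorted explicitly.
plan_unindexed = plan(by_cust_date)

# Compound index whose column order matches the ORDER BY clause.
conn.execute(
    "CREATE INDEX idx_orders_cust_date ON orders (customer_num, order_date)"
)
plan_indexed = plan(by_cust_date)

# Same columns, but listed in a different order than the index.
plan_wrong_order = plan(by_date_cust)

for label, text in [("unindexed", plan_unindexed),
                    ("indexed", plan_indexed),
                    ("wrong order", plan_wrong_order)]:
    print(f"{label}: {text}")
```

The third plan illustrates the second factor in the list above: the index exists, but because the order by clause lists the columns in a different order, the sort step comes back.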


 

3. Eliminate sequential access to rows of large tables

In nested queries, sequential access to tables can have a fatal impact on query efficiency. For example, with a sequential access strategy, a query nested 3 levels deep that examines 1000 rows at each level examines one billion rows in total. The main way to avoid this is to index the joined columns. For example, take two tables: a student table (student ID, name, age, ...) and a course selection table (student ID, course number, grade). If the two tables are to be joined, an index should be created on the join column "student ID".
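The student example can be sketched as follows, assuming SQLite through Python's sqlite3 module in place of Informix. The table names, column names, index names, and sample rows are all hypothetical. The last line simply reproduces the arithmetic of the three-level sequential case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER, name TEXT, age INTEGER)")
conn.execute(
    "CREATE TABLE course_selection ("
    "student_id INTEGER, course_num INTEGER, grade INTEGER)"
)
# Index the join column on both tables.
conn.execute("CREATE INDEX idx_student_id ON student (student_id)")
conn.execute(
    "CREATE INDEX idx_selection_student ON course_selection (student_id)"
)

conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Li", 20), (2, "Wang", 21)],
)
conn.executemany(
    "INSERT INTO course_selection VALUES (?, ?, ?)",
    [(1, 10, 85), (1, 11, 90), (2, 10, 70)],
)

join_sql = (
    "SELECT s.name, c.course_num, c.grade "
    "FROM student AS s "
    "JOIN course_selection AS c ON s.student_id = c.student_id "
    "ORDER BY s.name, c.course_num"
)
rows = conn.execute(join_sql).fetchall()

# With the join column indexed, the inner table is probed through an
# index search instead of being scanned once per outer row.
join_plan = " | ".join(
    row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + join_sql)
)
print(join_plan)

# Three nested levels, 1000 rows examined sequentially at each level.
rows_examined_sequential = 1000 ** 3
print(rows_examined_sequential)
```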

A union can also be used to avoid sequential access. Even when every column being checked is indexed, some forms of the where clause force the optimizer to use sequential access. The following query forces a sequential scan of the orders table:

SELECT * FROM orders WHERE (customer_num = 104 AND order_num > 1001) OR order_num = 1008

Although indexes exist on customer_num and order_num, the optimizer still uses a sequential access path to scan the entire table for the statement above. Because this statement retrieves separate sets of rows, it should be replaced by the following statement:


SELECT * FROM orders WHERE customer_num = 104 AND order_num > 1001
UNION
SELECT * FROM orders WHERE order_num = 1008

The index paths can then be used to process the query.
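A rewrite like this is only safe if both forms return the same rows. The sketch below checks that on a small data set, assuming SQLite (Python's sqlite3 module) as a stand-in for Informix; the sample rows and the index name are invented for illustration. It does not measure speed, and SQLite's own optimizer may handle the OR form differently from the behavior described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_num INTEGER PRIMARY KEY, customer_num INTEGER)"
)
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_num)")
# order_num 1000..1019, customers 101..104 in rotation.
conn.executemany(
    "INSERT INTO orders (order_num, customer_num) VALUES (?, ?)",
    [(1000 + i, 101 + i % 4) for i in range(20)],
)

# Original form with OR.
or_rows = conn.execute(
    "SELECT * FROM orders "
    "WHERE (customer_num = 104 AND order_num > 1001) OR order_num = 1008 "
    "ORDER BY order_num"
).fetchall()

# Rewritten form: each branch can be answered from its own index.
union_rows = conn.execute(
    "SELECT * FROM orders WHERE customer_num = 104 AND order_num > 1001 "
    "UNION "
    "SELECT * FROM orders WHERE order_num = 1008 "
    "ORDER BY order_num"
).fetchall()

print(or_rows == union_rows)
print(union_rows)
```

UNION removes duplicate rows, which is what keeps a row that satisfies both branches from appearing twice; UNION ALL would not be an equivalent rewrite in that case.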

4. Avoid correlated subqueries

When a column label appears both in the main query and in the query inside its where clause, the subquery most likely has to be re-executed each time the column value in the main query changes. The more levels of nesting a query has, the lower its efficiency, so subqueries should be avoided as far as possible. If a subquery is unavoidable, filter out as many rows as possible inside it.
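One common way to remove a correlated subquery is to compute the inner result once and join against it. The following sketch is an illustration rather than the author's own example: it assumes SQLite through Python's sqlite3 module, and the orders table with its amount column and sample rows is hypothetical. It finds orders whose amount is above the average for the same customer, first with a correlated subquery and then with a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_num INTEGER PRIMARY KEY, customer_num INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 101, 50), (2, 101, 150), (3, 102, 80), (4, 102, 20), (5, 103, 70)],
)

# Correlated form: the inner query refers to o.customer_num, so it is
# logically re-evaluated for every row of the outer query.
correlated = conn.execute(
    "SELECT o.order_num FROM orders AS o "
    "WHERE o.amount > (SELECT AVG(i.amount) FROM orders AS i "
    "                  WHERE i.customer_num = o.customer_num) "
    "ORDER BY o.order_num"
).fetchall()

# Join form: the per-customer averages are computed once, in a derived
# table that does not depend on the outer row.
joined = conn.execute(
    "SELECT o.order_num FROM orders AS o "
    "JOIN (SELECT customer_num, AVG(amount) AS avg_amount "
    "      FROM orders GROUP BY customer_num) AS a "
    "  ON a.customer_num = o.customer_num "
    "WHERE o.amount > a.avg_amount "
    "ORDER BY o.order_num"
).fetchall()

print(correlated)
print(joined)
```

Both queries return the same order numbers here; whether the join form is actually faster depends on the optimizer and the data.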


 

5. Avoid difficult regular expressions

The MATCHES and LIKE keywords support wildcard matching, technically known as regular expressions. This kind of matching is particularly time-consuming. For example: SELECT * FROM customer WHERE zipcode LIKE "98___"

Even if an index has been created on the zipcode field, a sequential scan is still used in this case. If the statement is changed to SELECT * FROM customer WHERE zipcode > "98000", the query uses the index for the lookup, which is obviously much faster. Note that this range is not an exact equivalent of the LIKE pattern: to return the same rows it must be bounded at both ends, for example zipcode >= "98000" AND zipcode < "99000".

In addition, avoid substrings that do not start at the beginning of the column. A statement such as SELECT * FROM customer WHERE zipcode[2,3] > "80" uses a non-initial substring in its where clause, so it will not use the index either.
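The difference between the pattern and the range forms can be checked on a few rows. This is a minimal sketch assuming SQLite via Python's sqlite3 module instead of Informix, with hypothetical sample zip codes and a hypothetical index name. It compares the LIKE pattern with a range bounded at both ends and with the open-ended range, and prints the plan chosen for the bounded range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_num INTEGER PRIMARY KEY, zipcode TEXT)"
)
conn.execute("CREATE INDEX idx_customer_zip ON customer (zipcode)")
conn.executemany(
    "INSERT INTO customer (zipcode) VALUES (?)",
    [("97999",), ("98000",), ("98052",), ("98999",), ("99000",), ("10001",)],
)

# Wildcard form: five characters starting with 98.
like_rows = conn.execute(
    "SELECT zipcode FROM customer WHERE zipcode LIKE '98___' ORDER BY zipcode"
).fetchall()

# Range bounded at both ends: returns the same rows as the pattern.
bounded_sql = (
    "SELECT zipcode FROM customer "
    "WHERE zipcode >= '98000' AND zipcode < '99000' ORDER BY zipcode"
)
bounded_rows = conn.execute(bounded_sql).fetchall()

# Open-ended range: drops 98000 itself and lets 99000 through.
open_rows = conn.execute(
    "SELECT zipcode FROM customer WHERE zipcode > '98000' ORDER BY zipcode"
).fetchall()

bounded_plan = " | ".join(
    row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + bounded_sql)
)

print(like_rows)
print(bounded_rows)
print(open_rows)
print(bounded_plan)
```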

6. Use temporary tables to speed up queries

Sorting a subset of a table and storing it in a temporary table can sometimes speed up queries. It helps avoid multiple sort operations and also simplifies the optimizer's work in other respects. For example:


SELECT cust.name, rcvbles.balance, ... ... other columns
FROM cust, rcvbles
WHERE cust.customer_id = rcvbles.customer_id
AND rcvbles.balance > 0
AND cust.postcode > "98000"
ORDER BY cust.name



If this query is to be executed many times rather than just once, all customers with unpaid balances can be selected into a temporary file, sorted by customer name:


SELECT cust.name, rcvbles.balance, ... ... other columns
FROM cust, rcvbles
WHERE cust.customer_id = rcvbles.customer_id
AND rcvbles.balance > 0
ORDER BY cust.name
INTO TEMP cust_with_balance



The temporary table is then queried as follows:

SELECT * FROM cust_with_balance
WHERE postcode > "98000"



The temporary table has fewer rows than the main table, and its physical order is the required order, which reduces disk I/O, so the query workload can drop substantially.

Note: once created, a temporary table does not reflect later changes to the main table. When data in the main table is modified frequently, take care not to lose data.
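Both the technique and the caveat in the note can be reproduced in a few lines. The sketch below assumes SQLite through Python's sqlite3 module, where Informix's SELECT ... INTO TEMP is written as CREATE TEMP TABLE ... AS SELECT; the column layout of cust and rcvbles and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cust (customer_id INTEGER PRIMARY KEY, name TEXT, postcode TEXT)"
)
conn.execute("CREATE TABLE rcvbles (customer_id INTEGER, balance INTEGER)")
conn.executemany(
    "INSERT INTO cust VALUES (?, ?, ?)",
    [(1, "Baker", "98052"), (2, "Adams", "98101"),
     (3, "Clark", "10001"), (4, "Davis", "98004")],
)
conn.executemany(
    "INSERT INTO rcvbles VALUES (?, ?)",
    [(1, 250), (2, 0), (3, 75), (4, 10)],
)

# Build the sorted subset once (SQLite's spelling of INTO TEMP).
conn.execute(
    "CREATE TEMP TABLE cust_with_balance AS "
    "SELECT cust.name, cust.postcode, rcvbles.balance "
    "FROM cust JOIN rcvbles ON cust.customer_id = rcvbles.customer_id "
    "WHERE rcvbles.balance > 0 "
    "ORDER BY cust.name"
)


def query_temp():
    """Run the repeated query against the temporary table only."""
    return conn.execute(
        "SELECT name, balance FROM cust_with_balance "
        "WHERE postcode > '98000' ORDER BY name"
    ).fetchall()


before = query_temp()

# Baker pays in full in the main table ...
conn.execute("UPDATE rcvbles SET balance = 0 WHERE customer_id = 1")
# ... but the temporary table is a snapshot and still lists Baker.
after = query_temp()

print(before)
print(after)
```

The unchanged second result is the stale-data risk described in the note: the temporary table has to be rebuilt whenever the main tables change in a way that matters to the query.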

7. Use sorting to replace non-sequential access

Non-sequential disk access is the slowest operation, because the disk access arm has to move back and forth. SQL statements hide this, which makes it easy when writing applications to write queries that access a large number of non-sequential pages. In some cases, using the database's sorting ability in place of non-sequential access can improve a query.