Using the MATCH() ... AGAINST() function in MySQL queries

Category: Database 2011-09-13 Source: CodeWeblog.com Popularity: 190

MATCH (col1, col2, ...) AGAINST (expr [IN BOOLEAN MODE | WITH QUERY EXPANSION])

MySQL supports full-text indexing and searching. A full-text index in MySQL is an index of type FULLTEXT. In the MySQL versions described here, FULLTEXT indexes can be used only with MyISAM tables (InnoDB gained FULLTEXT support in MySQL 5.6). They can be created on CHAR, VARCHAR, or TEXT columns as part of a CREATE TABLE statement, or added later with ALTER TABLE or CREATE INDEX. For large data sets, it is much faster to load the data into a table that has no FULLTEXT index and create the index afterwards than to load the data into a table that already has a FULLTEXT index.

Full-text searching is performed with the MATCH() function.

mysql> CREATE TABLE articles (
-> id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
-> title VARCHAR(200),
-> body TEXT,
-> FULLTEXT (title, body)
-> );
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO articles (title, body) VALUES
-> ('MySQL Tutorial', 'DBMS stands for DataBase ...'),
-> ('How To Use MySQL Well', 'After you went through a ...'),
-> ('Optimizing MySQL', 'In this tutorial we will show ...'),
-> ('1001 MySQL Tricks', '1. Never run mysqld as root. 2. ...'),
-> ('MySQL vs. YourSQL', 'In the following database comparison ...'),
-> ('MySQL Security', 'When configured properly, MySQL ...');
Query OK, 6 rows affected (0.00 sec)
Records: 6 Duplicates: 0 Warnings: 0

mysql> SELECT * FROM articles
-> WHERE MATCH (title, body) AGAINST ('database');
+----+-------------------+------------------------------------------+
| id | title             | body                                     |
+----+-------------------+------------------------------------------+
|  5 | MySQL vs. YourSQL | In the following database comparison ... |
|  1 | MySQL Tutorial    | DBMS stands for DataBase ...             |
+----+-------------------+------------------------------------------+
2 rows in set (0.00 sec)
The MATCH() function performs a natural language search for a string against a text collection. A collection is a set of one or more columns included in a FULLTEXT index. The search string is given as the argument to AGAINST(). For each row in the table, MATCH() returns a relevance value, that is, a similarity measure between the search string and the text in that row in the columns named in the MATCH() list.

By default, the search is case-insensitive. However, you can perform a case-sensitive full-text search by using a binary collation for the indexed columns. For example, a column that uses the latin1 character set can be assigned the latin1_bin collation to make full-text searches on it case-sensitive.

When MATCH() is used in a WHERE clause, as in the example shown earlier, the rows returned are automatically sorted with the highest relevance first. Relevance values are non-negative floating-point numbers. Zero relevance means no similarity. Relevance is computed from the number of words in the row, the number of unique words in that row, the total number of words in the collection, and the number of documents (rows) that contain a particular word.

For natural language full-text searches, the columns named in the MATCH() function must be the same columns included in some FULLTEXT index on your table. In the preceding query, note that the columns named in the MATCH() function (title and body) are the same as those named in the FULLTEXT index of the articles table. To search title or body separately, you would need to create a separate FULLTEXT index for each column.

The preceding example is a basic illustration of how to use the MATCH() function so that rows are returned in order of decreasing relevance. The next example shows how to retrieve the relevance values explicitly. The order of the returned rows is not defined, because the SELECT statement contains neither a WHERE nor an ORDER BY clause:

mysql> SELECT id, MATCH (title, body) AGAINST ('Tutorial')
-> FROM articles;
+----+------------------------------------------+
| id | MATCH (title, body) AGAINST ('Tutorial') |
+----+------------------------------------------+
|  1 |                         0.65545833110809 |
|  2 |                                        0 |
|  3 |                         0.66266459226608 |
|  4 |                                        0 |
|  5 |                                        0 |
|  6 |                                        0 |
+----+------------------------------------------+
6 rows in set (0.00 sec)
The following example is more complex. The query returns the relevance values and also sorts the rows in order of decreasing relevance. To achieve this result, specify MATCH() twice: once in the SELECT list and once in the WHERE clause. This causes no additional overhead, because the MySQL optimizer notices that the two MATCH() calls are identical and invokes the full-text search code only once.

mysql> SELECT id, body, MATCH (title, body) AGAINST
-> ('Security implications of running MySQL as root') AS score
-> FROM articles WHERE MATCH (title, body) AGAINST
-> ('Security implications of running MySQL as root');
+----+-------------------------------------+-----------------+
| id | body                                | score           |
+----+-------------------------------------+-----------------+
|  4 | 1. Never run mysqld as root. 2. ... | 1.5219271183014 |
|  6 | When configured properly, MySQL ... | 1.3114095926285 |
+----+-------------------------------------+-----------------+
2 rows in set (0.00 sec)

The MySQL FULLTEXT implementation regards any sequence of true word characters (letters, digits, and underscores) as a word. That sequence may also contain apostrophes ('), but not more than one in a row. This means that aaa'bbb is regarded as one word, but aaa''bbb is regarded as two words. Apostrophes at the beginning or the end of a word are stripped by the FULLTEXT parser; 'aaa'bbb' is parsed as aaa'bbb.
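
The apostrophe rule can be made concrete with a small Python sketch. It is only a regular-expression approximation of the behaviour described above, not MySQL's parser, and the name split_words is made up for this example.

```python
import re

# Approximation of the rule described above: a word is a run of word
# characters (letters, digits, underscore) that may contain single
# apostrophes, but never two in a row. This regex is an assumption made
# for illustration; it is not taken from the MySQL source.
WORD_RE = re.compile(r"\w+(?:'\w+)*")


def split_words(text):
    """Return the words that this simplified rule finds in text."""
    return WORD_RE.findall(text)


print(split_words("aaa'bbb"))    # one word, the apostrophe is kept
print(split_words("aaa''bbb"))   # two words, split at the double apostrophe
print(split_words("'aaa'bbb'"))  # leading and trailing apostrophes dropped
```

The three calls mirror the three cases in the paragraph: a single inner apostrophe stays part of the word, a doubled apostrophe acts as a separator, and apostrophes at either end are discarded.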

The FULLTEXT parser determines where words start and end by looking for certain delimiter characters, for example " " (space), "," (comma), and "." (period). If words are not separated by delimiters (as in Chinese, for example), the FULLTEXT parser cannot determine where a word begins or ends. To be able to add words or other indexed terms in such languages to a FULLTEXT index, you must preprocess them so that they are separated by some arbitrary delimiter, such as the double-quote character (").

Some words are ignored in full-text searches:

Any word that is too short is ignored. The default minimum length of words that full-text searches can find is four characters.
Words in the stopword list are ignored. A stopword is a word such as "the" or "some" that is so common that it is considered to have zero semantic value. There is a built-in stopword list, but it can be overridden by a user-defined list.
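
The two filters can be sketched in Python as follows. This is a toy model under stated assumptions: the minimum length of 4 comes from the text above, while SAMPLE_STOPWORDS is a tiny made-up set (not MySQL's built-in list) and indexable_words is a hypothetical helper name.

```python
import re

# Assumptions for illustration: 4 is the default minimum word length
# mentioned above; the stopword set is a tiny made-up sample, not
# MySQL's built-in stopword list.
MIN_WORD_LEN = 4
SAMPLE_STOPWORDS = frozenset({"the", "some"})


def indexable_words(text, min_len=MIN_WORD_LEN, stopwords=SAMPLE_STOPWORDS):
    """Return the lowercased words that survive both filters."""
    words = re.findall(r"\w+", text.lower())
    return [w for w in words if len(w) >= min_len and w not in stopwords]


print(indexable_words("Optimizing the MySQL database for some fun"))
```

With this input only optimizing, mysql, and database remain: "the", "for", and "fun" are shorter than four characters, and "some" is long enough but is in the sample stopword set.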

Every correct word in the collection and in the query is weighted according to its significance in the collection or query. Consequently, a word that is present in many documents has a lower weight (and may even have a zero weight), because it has lower semantic value in this particular collection. Conversely, if the word is rare, it receives a higher weight. The weights of the words are combined to compute the relevance of the row.
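
To get a feel for this weighting, here is a minimal Python sketch. The formula log((N - n) / n), with N rows in total and n rows containing the word, is an assumption chosen for illustration because it shows the described trend; it is not presented as MySQL's actual relevance computation, and toy_weight is a hypothetical name.

```python
import math


def toy_weight(total_rows, rows_with_word):
    """Toy weight log((N - n) / n): rarer words score higher.

    The formula is an assumption chosen for illustration; it is not
    presented as MySQL's actual relevance computation.
    """
    if not 0 < rows_with_word < total_rows:
        raise ValueError("rows_with_word must be between 1 and total_rows - 1")
    return math.log((total_rows - rows_with_word) / rows_with_word)


# Weights for a word found in 1, 2, or 3 rows of a 6-row table.
for n in (1, 2, 3):
    print(n, round(toy_weight(6, n), 3))
```

In this toy model the weight shrinks as the word appears in more rows and reaches zero when the word is present in exactly half of them.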

This technique works best with large collections (in fact, it was carefully tuned that way). For very small tables, word distribution does not adequately reflect semantic value, and the model may sometimes produce bizarre results. For example, although the word "MySQL" is present in every row of the articles table, a search for the word produces no results:

mysql> SELECT * FROM articles
-> WHERE MATCH (title, body) AGAINST ('MySQL');
Empty set (0.00 sec)

The search result is empty because the word "MySQL" is present in at least 50% of the rows. As such, it is effectively treated as a stopword. For large data sets, this is the most desirable behavior: a natural language query should not return every second row from a 1GB table. For small data sets, it may be less desirable.

A word that matches half of the rows in a table is less likely to locate relevant documents. In fact, it most likely finds plenty of irrelevant ones. We all know this happens far too often when we try to find something on the Internet with a search engine. By this reasoning, rows containing such a word are assigned a low semantic value for the particular data set in which they occur. A given word may reach the 50% threshold in one data set but not in another.

The 50% threshold has an important practical implication when you first try full-text searching to see how it works: if you create a table and insert only one or two rows of text into it, every word in the text occurs in at least 50% of the rows. As a result, no search returns any results. Be sure to insert at least three rows, and preferably many more. Users who need to bypass the 50% limitation can use the boolean search mode.
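
The effect of the 50% threshold on the sample data can be checked with a short Python sketch. It is a toy model, not MySQL code: it splits text on non-word characters, ignores the minimum word length and the stopword list, and uses a made-up helper name, too_common.

```python
import re

# Title and body text of the six sample rows used in this article.
ARTICLES = [
    "MySQL Tutorial DBMS stands for DataBase ...",
    "How To Use MySQL Well After you went through a ...",
    "Optimizing MySQL In this tutorial we will show ...",
    "1001 MySQL Tricks 1. Never run mysqld as root. 2. ...",
    "MySQL vs. YourSQL In the following database comparison ...",
    "MySQL Security When configured properly, MySQL ...",
]


def too_common(rows, threshold=0.5):
    """Return the words that occur in at least `threshold` of the rows."""
    counts = {}
    for row in rows:
        # Count each word once per row (document frequency).
        for word in set(re.findall(r"\w+", row.lower())):
            counts[word] = counts.get(word, 0) + 1
    return {w for w, n in counts.items() if n / len(rows) >= threshold}


print(too_common(ARTICLES))               # all six rows
print(sorted(too_common(ARTICLES[:2])))   # only the first two rows
```

With all six rows, "mysql" is the only word that reaches the threshold, which matches the empty result shown earlier. With only two rows, every word occurs in at least one of the two rows and therefore reaches 50%, so nothing would be searchable.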
