Encodings commonly used in HTML and JavaScript, explained
Inside a computer, information is stored as binary codes. Encoding is the conversion between those stored binary codes and the English letters, Chinese characters and other symbols that we recognize on screen.
Two basic concepts need to be kept apart, charset and character encoding:
charset (character set): a table that maps symbols to numbers. It determines, for example, that 107 is the 'k' in koubei and 21475 is the "口" in 口碑 (koubei). Different tables define different mappings, such as ASCII, GB2312 and Unicode. With such a mapping table we can convert between binary numbers and characters.
character encoding: how a number from the character set is written down. For example, given the same "口" with the number 21475, do we write it as \u53e3, or as %E5%8F%A3? That is determined by the character encoding.
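To make the distinction concrete, here is a minimal sketch that takes one character, shows its single number in the Unicode character set, and then serializes that number under two different encodings. It assumes a runtime with the standard `TextEncoder` global (modern browsers, recent Node.js); `utf16BigEndianBytes` and `toHex` are hypothetical helpers written only for this illustration.

```javascript
// One character has one number in the Unicode character set...
const ch = '\u53e3'; // the character 口
const codePoint = ch.codePointAt(0);

// ...but its bytes depend on the character encoding that is chosen.
const utf8Bytes = Array.from(new TextEncoder().encode(ch));

// Hypothetical helper: UTF-16 big-endian bytes, two per UTF-16 code unit.
function utf16BigEndianBytes(text) {
  const bytes = [];
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    bytes.push(unit >> 8, unit & 0xff);
  }
  return bytes;
}

// Hypothetical helper: format bytes as two-digit hex values.
const toHex = (bytes) =>
  bytes.map((b) => b.toString(16).padStart(2, '0')).join(' ');

console.log(codePoint);                      // 21475
console.log(toHex(utf8Bytes));               // e5 8f a3
console.log(toHex(utf16BigEndianBytes(ch))); // 53 e3
```

The charset fixes the number 21475; the encoding decides whether it is stored as three bytes or two.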
Take the string 'koubei.com'. These are the everyday characters of Americans, who devised a character set for them called ASCII, the American Standard Code for Information Interchange. It uses the 128 numbers from 0 to 127 (2 to the 7th power, 0x00-0x7f) to represent 128 commonly used characters such as 123abc. That takes 7 bits; add one leading bit (often described as a sign bit, though in practice it was left unused or served as a parity bit) and you have 8 bits, which form one byte. The Americans were a little stingy here: had the byte been designed as 16 or 32 bits from the very beginning, the world would have far fewer problems. At the time, though, they presumably felt 8 bits was plenty, since it could already represent 128 different characters!
Computers were invented by the Americans, so they stored their own symbols with this code and were perfectly happy with it. But once computers began to go international, problems arose. Take China: there are tens of thousands of Chinese characters. What to do?
The existing system, built on 8-bit bytes, could not be torn up; switching to something like 16-bit bytes would have changed too much. The only option was another route: use several ASCII-sized bytes to represent one other character, that is, MBCS (Multi-Byte Character System).
With the MBCS concept we can represent more characters. If we use two bytes, we have 16 bits, which in theory gives 2 to the 16th power, or 65,536, characters. But how are these codes assigned to characters? For example, who decided that the Unicode code of "口" is 21475? The character set, the charset introduced above. ASCII is the most basic character set; on top of it there are MBCS character sets such as GB2312 and Big5 for Simplified and Traditional Chinese. Eventually an organization called the Unicode Consortium decided to create a character set that includes all characters (UCS, Universal Character Set) together with the corresponding encoding standards, namely Unicode. It has published the Unicode international standard since the first edition in 1991 (ISBN 0-321-18578-1), and the International Organization for Standardization (ISO) takes part as well, through ISO/IEC 10646: the Universal Character Set. In short, Unicode is a character standard that covers essentially every symbol in use on Earth, and it is being used more and more widely. The ECMA standard also specifies that the JavaScript language uses Unicode characters internally (which means that JavaScript variable names and function names may be written in Chinese!).
Developers in China most often run into the problem of converting between GBK, GB2312 and UTF-8. Strictly speaking, putting it that way is imprecise: GBK and GB2312 are character sets (charsets), while UTF-8 is a character encoding, one of the encodings of the UCS character set in the Unicode standard. Because web pages that use the Unicode character set mostly use the UTF-8 encoding, the names are often listed side by side, but that is not accurate.
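As a small illustration of the last point, the following sketch declares a variable and a function with Chinese names. The names themselves are made up for this example; the source file is assumed to be saved in an encoding the JavaScript engine reads correctly (for instance UTF-8).

```javascript
// Identifiers may contain non-ASCII letters, so these names are legal JavaScript.
var 口碑 = 'koubei.com';

function 问候(名字) {
  return 'hello, ' + 名字;
}

console.log(问候(口碑)); // hello, koubei.com
```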
With Unicode we have a master key, at least until human civilization meets aliens, so use it. The most widely used Unicode encoding today is UTF-8 (8-bit UCS/Unicode Transformation Format), which has a couple of particularly nice properties:
1. It encodes the UCS character set and is used worldwide.
2. It is a variable-length character encoding and is compatible with ASCII.
The second point is a major advantage: systems that previously used pure ASCII keep working and need no extra storage (with a fixed-length encoding of two bytes per character, ASCII text would take twice the space).
UTF-8 is easiest to explain with a table. (The table shows the original definition with sequences of up to 6 bytes; RFC 3629 later restricted UTF-8 to at most 4 bytes, covering code points up to U+10FFFF.)
U-00000000 - U-0000007F: 0xxxxxxx
U-00000080 - U-000007FF: 110xxxxx 10xxxxxx
U-00000800 - U-0000FFFF: 1110xxxx 10xxxxxx 10xxxxxx
U-00010000 - U-001FFFFF: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
U-00200000 - U-03FFFFFF: 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
U-04000000 - U-7FFFFFFF: 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
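The table can be read as a lookup from code point to byte count. A minimal sketch follows, with `utf8LengthFromTable` as a hypothetical name; it follows the six-row table as printed here, whereas current UTF-8 (RFC 3629) stops at 4 bytes.

```javascript
// Number of bytes the table above assigns to a code point.
// The 5- and 6-byte rows exist only in the original UTF-8 definition.
function utf8LengthFromTable(codePoint) {
  if (codePoint < 0 || codePoint > 0x7fffffff) {
    throw new RangeError('code point outside the range covered by the table');
  }
  if (codePoint <= 0x7f) return 1;
  if (codePoint <= 0x7ff) return 2;
  if (codePoint <= 0xffff) return 3;
  if (codePoint <= 0x1fffff) return 4;
  if (codePoint <= 0x3ffffff) return 5;
  return 6;
}

console.log(utf8LengthFromTable('k'.codePointAt(0))); // 1
console.log(utf8LengthFromTable(0xe9));               // 2
console.log(utf8LengthFromTable(0x53e3));             // 3
console.log(utf8LengthFromTable(0x1f600));            // 4
```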
To understand this table, it is enough to look at the first two rows.
U-00000000 - U-0000007F: 0xxxxxxx
This is the first row. It means that if a byte in UTF-8 encoded data has the binary form 0xxxxxxx, that is, it starts with 0 and lies between 0 and 127 in decimal, then that single byte represents one character, with exactly the same meaning as in ASCII. Every other byte in UTF-8 starts with 1 (1xxxxxxx, greater than 127), and such characters need at least 2 bytes each. So the first bit of a byte is a switch that says whether or not this is an ASCII code. This is the compatibility mentioned earlier; the English definition states it as two properties of UTF-8:
UCS characters U+0000 to U+007F (ASCII) are encoded simply as bytes 0x00 to 0x7F (ASCII compatibility). This means that files and strings which contain only 7-bit ASCII characters have the same encoding under both ASCII and UTF-8.
All UCS characters > U+007F are encoded as a sequence of several bytes, each of which has the most significant bit set. Therefore, no ASCII byte (0x00-0x7F) can appear as part of any other character.
Now let us look at the second row:
U-00000080 - U-000007FF: 110xxxxx 10xxxxxx
Look at the first byte, 110xxxxx. It says: I am not an ASCII code (my first bit is not 0); I am the first byte of a multi-byte character (my second bit is 1); the character I belong to consists of two bytes (my third bit is 0); from the fourth bit on, I store the character's code information.
Look at the second byte, 10xxxxxx. It says: I am not an ASCII code (my first bit is not 0); I am not the first byte of a multi-byte character (my second bit is 0); from the third bit on, I store the character's code information.
From this example we can summarize. In a long run of UTF-8 bytes a symbol may take 2 to 6 bytes, so compared with one-byte ASCII we need room for two extra pieces of information. First, where the symbol starts: a "starter" position, like the AUG initiation codon that marks where protein translation begins, to borrow from biology. Second, how many bytes the symbol uses (strictly, if every symbol has a starter the length is redundant, but stating it adds fault tolerance when bytes are lost). The solution is this: the second bit says whether the byte is the first byte of a character (the first bit is already taken: 0 means ASCII, 1 means non-ASCII), so the first byte of a multi-byte symbol always has the form 11xxxxxx, a number between 192 and 255. The length information starts at the third bit: a 0 there means the symbol is 2 bytes long, and each additional 1 from the third bit on adds one byte to the character. UTF-8 as shown here defines characters of up to 6 bytes, four more than the 2 bytes announced by the starter 110xxxxx, so that starter has four more 1s: 1111110x, as in the table above.
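The rule in this summary amounts to counting the leading 1 bits of a byte. A minimal sketch, with `sequenceLengthFromFirstByte` as a hypothetical name and the six-byte limit taken from the table used in this article:

```javascript
// Classify a byte by counting its leading 1 bits:
//   0 leading ones -> single-byte ASCII character (length 1)
//   1 leading one  -> continuation byte, not a starter (returns 0)
//   n leading ones -> starter of an n-byte character (n = 2..6)
function sequenceLengthFromFirstByte(byte) {
  let leadingOnes = 0;
  for (let mask = 0x80; mask !== 0 && (byte & mask) !== 0; mask >>= 1) {
    leadingOnes++;
  }
  if (leadingOnes === 0) return 1;
  if (leadingOnes === 1) return 0;
  if (leadingOnes > 6) {
    throw new RangeError('0xFE and 0xFF never appear in UTF-8');
  }
  return leadingOnes;
}

console.log(sequenceLengthFromFirstByte(0x6b)); // 1 (ASCII 'k')
console.log(sequenceLengthFromFirstByte(0xe5)); // 3 (starter 1110xxxx)
console.log(sequenceLengthFromFirstByte(0x8f)); // 0 (continuation 10xxxxxx)
console.log(sequenceLengthFromFirstByte(0xfc)); // 6 (starter 1111110x)
```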
Here is the standard definition in English, which says the same thing:
The first byte of a multibyte sequence that represents a non-ASCII character is always in the range 0xC0 to 0xFD and it indicates how many bytes follow for this character. All further bytes in a multibyte sequence are in the range 0x80 to 0xBF. This allows easy resynchronization and makes the encoding stateless and robust against missing bytes.
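The "easy resynchronization" mentioned in the quotation can be sketched as follows: from any position in a byte stream, step backwards over bytes in the range 0x80 to 0xBF until a byte that can start a character is found. The function names and the sample array are hypothetical.

```javascript
// Continuation bytes are exactly those in 0x80..0xBF (bit pattern 10xxxxxx).
function isContinuationByte(byte) {
  return (byte & 0xc0) === 0x80;
}

// From an arbitrary index, step back to the first byte of the
// character that contains that index.
function findCharacterStart(bytes, index) {
  let start = index;
  while (start > 0 && isContinuationByte(bytes[start])) {
    start--;
  }
  return start;
}

// The UTF-8 bytes of "k口": 0x6B, then 0xE5 0x8F 0xA3.
const sampleBytes = [0x6b, 0xe5, 0x8f, 0xa3];
console.log(findCharacterStart(sampleBytes, 3)); // 1
console.log(findCharacterStart(sampleBytes, 0)); // 0
```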
The real information bits (the character's number in the charset) are written directly in binary, in order, into the 'x' positions of the table above. The characters that Chinese programmers deal with most are Chinese characters, whose codes lie in the range U-00000800 - U-0000FFFF. The table shows that UTF-8 uses three bytes in this range (which is why UTF-8 text takes more storage for Chinese than the EUC-CN encoding of the GB2312 character set, which uses 2 bytes per character). Take the "口" of 口碑 as an example. Its Unicode code is:
口: 21475 == 0x53e3 == binary 101001111100011
In JavaScript, run the following code (in Firebug's console, or in an HTML file between a pair of script tags):
Js code
alert('\u53e3'); // gets '口'
alert(escape('口')); // gets '%u53E3'
alert(String.fromCharCode(21475)); // gets '口'
alert('口'.charCodeAt(0)); // gets 21475
alert(encodeURI('口')); // gets '%E5%8F%A3'
As you can see, a string literal can write "口" directly as \u followed by the hexadecimal Unicode code, and the fromCharCode method accepts the decimal Unicode code and returns the character "口".
The second alert yields '%u53E3'. This is a non-standard Unicode encoding, a variant of URI percent-encoding; the W3C has formally rejected it and no RFC standardizes it. ECMA-262 does specify this behavior for escape, presumably as a stopgap.
More interesting is the '%E5%8F%A3' from the fifth alert. What is it, and where does it come from?
This is percent-encoding, which is used heavily in URIs and is specified by RFC 3986.
RFC 3986 defines the following unreserved characters for percent-encoding:
Unreserved characters, per RFC 3986 (January 2005)
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z
0 1 2 3 4 5 6 7 8 9 - _ . ~
In other words, when these characters appear in a URI they do not need to be encoded: they play no part in the URI's syntax and simply stand for themselves.
In addition, the reserved characters are as follows:
Reserved characters, per RFC 3986 (January 2005)
! * ' ( ) ; : @ & = + $ , / ? % # [ ]
These characters have a special meaning in a URI. When one of them appears without being intended in that special sense, it must be encoded as shown below. (Strictly, RFC 3986 does not list % among the reserved characters, but because it introduces every escape sequence, a literal % must always be written as %25.)
Reserved characters after percent-encoding
!   *   '   (   )   ;   :   @   &   =   +   $   ,   /   ?   %   #   [   ]
%21 %2A %27 %28 %29 %3B %3A %40 %26 %3D %2B %24 %2C %2F %3F %25 %23 %5B %5D
The two digits after each % form a hexadecimal number: it is the byte value of the character's UTF-8 encoding, written in another form.
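For a single ASCII character this rule is easy to sketch: take the character's byte value and write it as two hexadecimal digits after a percent sign. `percentEncodeAscii` is a hypothetical helper, not a built-in function.

```javascript
// Hypothetical helper: percent-encode one ASCII character as '%'
// followed by its two-digit uppercase hexadecimal byte value.
function percentEncodeAscii(character) {
  const code = character.charCodeAt(0);
  if (character.length !== 1 || code > 0x7f) {
    throw new RangeError('expected a single ASCII character');
  }
  return '%' + code.toString(16).toUpperCase().padStart(2, '0');
}

console.log(percentEncodeAscii('&')); // %26
console.log(percentEncodeAscii('/')); // %2F
console.log(percentEncodeAscii('%')); // %25
```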
Let us work out in detail why "口" becomes '%E5%8F%A3'.
As we saw, the Unicode code of "口" is 21475, which in binary is:
101001111100011
As we also saw, the UTF-8 form for a Chinese character is:
U-00000800 - U-0000FFFF: 1110xxxx 10xxxxxx 10xxxxxx
Now fill in the blanks: cut the binary code of "口" into pieces, starting from the right, and put them in place of the x's:
code point bits:      101   001111   100011
UTF-8 template:  1110xxxx 10xxxxxx 10xxxxxx
The first piece is one bit short, so pad it with a 0 on the left, which gives:
11100101 10001111 10100011
Now convert these three binary numbers to hexadecimal and put a percent sign in front of each by running the following JavaScript code:
Js code
alert(
  '%' + parseInt('11100101', 2).toString(16) +
  '%' + parseInt('10001111', 2).toString(16) +
  '%' + parseInt('10100011', 2).toString(16)
); // gets '%e5%8f%a3'
And that is how we arrive at %e5%8f%a3.
JavaScript's built-in functions encodeURI, decodeURI, encodeURIComponent and decodeURIComponent all perform percent-encoding; they differ only in how they treat special characters such as : / ; ?.
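The same fill-in-the-blanks procedure can be written with bit operations instead of strings. This is a minimal sketch for the three-byte row only; `encodeThreeByteUtf8` and `toPercentEncoding` are hypothetical names, and surrogate code points are not treated specially.

```javascript
// Encode a code point from U+0800..U+FFFF as three UTF-8 bytes,
// following the template 1110xxxx 10xxxxxx 10xxxxxx.
function encodeThreeByteUtf8(codePoint) {
  if (codePoint < 0x800 || codePoint > 0xffff) {
    throw new RangeError('this sketch only handles U+0800..U+FFFF');
  }
  return [
    0xe0 | (codePoint >> 12),         // 1110 + top 4 bits
    0x80 | ((codePoint >> 6) & 0x3f), // 10 + middle 6 bits
    0x80 | (codePoint & 0x3f),        // 10 + low 6 bits
  ];
}

// Every byte here is >= 0x80, so each one prints as two hex digits.
function toPercentEncoding(bytes) {
  return bytes.map((b) => '%' + b.toString(16).toUpperCase()).join('');
}

console.log(toPercentEncoding(encodeThreeByteUtf8(0x53e3))); // %E5%8F%A3
console.log(encodeURI('\u53e3'));                            // %E5%8F%A3
```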
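The difference shows up as soon as a string contains URI delimiters. In this sketch the URL, including its path and query parameter names, is made up for illustration.

```javascript
// A made-up URL containing both URI delimiters and a non-ASCII character.
const searchUrl = 'http://koubei.com/search?q=\u53e3&page=1';

// encodeURI leaves the delimiters alone and only encodes the Chinese character.
console.log(encodeURI(searchUrl));
// http://koubei.com/search?q=%E5%8F%A3&page=1

// encodeURIComponent also encodes : / ? = & so the whole string
// can be embedded as a single query-string value.
console.log(encodeURIComponent(searchUrl));
// http%3A%2F%2Fkoubei.com%2Fsearch%3Fq%3D%E5%8F%A3%26page%3D1

// Both are reversed by their decode counterparts.
console.log(decodeURIComponent(encodeURIComponent(searchUrl)) === searchUrl); // true
```

As a rule of thumb, encodeURI suits a complete URI and encodeURIComponent suits one piece of it, such as a parameter value.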
Next, let us look at the numeric character reference (NCR) in HTML.
As everyone knows, special characters in HTML must be encoded: & must be written as &amp;, and there are named references such as &reg; for the ® sign. In fact, we can use numeric references to display any Unicode character. Edit an HTML document as follows:
Html code
<html>
<body>
口
&#21475;
&#x53e3;
</body>
</html>
The result is three "口" characters.
Another commonly used encoding is base64. It was designed for sending binary data through transports that are not 8-bit clean, such as email, so that binary attachments can be carried in a message. It uses the 64 characters a-z, A-Z, 0-9, + and / (plus = for padding) to represent the original data, turning every three bytes into four characters, so the length grows by 33%.
This encoding is common in more advanced JavaScript applications. In the JavaScript Super Mario game, for example, the music is written inside the JavaScript file; likewise, in examples that draw graphics with canvas, the image data is written into the JavaScript source. This relies on the data URI scheme defined in RFC 2397, which Firefox supports and which IE also supports starting with IE8. With base64 encoding and data URIs we can create rich effects without any external music, image or other multimedia files.
That covers what I wanted to introduce about the encodings commonly used in JavaScript and HTML and how they work. One last remark: many hacking techniques rely on encoding as well, using an encoded form of the code to slip past simple filters, as in the following JS code:
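Generating such references from JavaScript takes only a few lines. The helper below is a hypothetical sketch: it rewrites non-ASCII characters as decimal or hexadecimal references and deliberately leaves ASCII untouched, so it does not escape &, < or >.

```javascript
// Hypothetical helper: replace every non-ASCII character with a
// numeric character reference (decimal by default, hexadecimal on request).
function toNumericCharacterReferences(text, useHex = false) {
  let result = '';
  for (const character of text) { // iterates by code point
    const codePoint = character.codePointAt(0);
    if (codePoint <= 0x7f) {
      result += character;
    } else if (useHex) {
      result += '&#x' + codePoint.toString(16) + ';';
    } else {
      result += '&#' + codePoint + ';';
    }
  }
  return result;
}

console.log(toNumericCharacterReferences('\u53e3'));       // &#21475;
console.log(toNumericCharacterReferences('\u53e3', true)); // &#x53e3;
console.log(toNumericCharacterReferences('koubei'));       // koubei
```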
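The three-bytes-into-four-characters step can be sketched by hand. Browsers provide `btoa` for this job; the function below is a hypothetical illustration of the mechanics rather than a replacement for it.

```javascript
const BASE64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Encode an array of byte values: every 3 input bytes (24 bits) become
// 4 output characters (6 bits each); '=' pads a short final group.
function base64FromBytes(bytes) {
  let output = '';
  for (let i = 0; i < bytes.length; i += 3) {
    const remaining = bytes.length - i;
    const b0 = bytes[i];
    const b1 = remaining > 1 ? bytes[i + 1] : 0;
    const b2 = remaining > 2 ? bytes[i + 2] : 0;
    output += BASE64_ALPHABET[b0 >> 2];
    output += BASE64_ALPHABET[((b0 & 0x03) << 4) | (b1 >> 4)];
    output += remaining > 1 ? BASE64_ALPHABET[((b1 & 0x0f) << 2) | (b2 >> 6)] : '=';
    output += remaining > 2 ? BASE64_ALPHABET[b2 & 0x3f] : '=';
  }
  return output;
}

console.log(base64FromBytes([0x4d, 0x61, 0x6e])); // TWFu (the ASCII bytes of "Man")
console.log(base64FromBytes([0xe5, 0x8f, 0xa3])); // 5Y+j (the UTF-8 bytes of 口)
console.log(base64FromBytes([0x4d]));             // TQ==
```

Each group of 3 bytes yields 4 characters, which is where the 33% growth comes from.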
Js code
var a = '口碑';
\u0061 = 'koubei.com';
alert(a); // gets 'koubei.com'
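Why this defeats a simple filter can be sketched as follows. The filter function and the identifier `secret` are made up for the illustration; the point is only that a plain text search does not see a name that is spelled with a Unicode escape, while the JavaScript parser does.

```javascript
// A naive filter that searches source text for a forbidden identifier.
function naiveFilterAllows(source, forbiddenName) {
  return !source.includes(forbiddenName);
}

// The source text spells the identifier with an escape: s\u0065cret
const disguisedSource = 'return s\\u0065cret;';

// The text search finds nothing to object to...
console.log(naiveFilterAllows(disguisedSource, 'secret')); // true

// ...but the parser resolves s\u0065cret to the identifier `secret`,
// which here is the function's parameter.
const run = new Function('secret', disguisedSource);
console.log(run('koubei.com')); // koubei.com
```

A filter therefore has to work on parsed or normalized code rather than on the raw text.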