I have always admired Sun's rigorous and elegant approach to technology (bless Sun). The source code of the Java libraries in the Sun JDK is clear and disciplined, right down to the comments, and the javadoc tags are used meticulously, so it is comfortable to read. As a result, I often read the Java library source code in my daily work and study, and enjoy it. When you run into a strange problem, the source code is even more helpful.
But enough chatter; back to the topic. For the past few days I have been tangled up in a Java "memory leak" problem. The memory used by a Java application kept rising steadily and eventually exceeded the monitoring threshold. Time for Holmes to step in!
When it comes to Java, the definition of a memory leak is actually not that clear-cut. First, if the JVM has no bugs, then in theory there is no such thing as "heap space that cannot be reclaimed"; that is, the kind of memory leak found in C/C++ does not exist in Java. Second, if a Java program keeps holding a reference to an object that, from the point of view of the program logic, will never be used again, then we can consider that object leaked. If there are many such objects, then clearly a lot of memory has been leaked ("wasted" is more accurate).
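As a minimal sketch of the second case, assume a hypothetical SessionRegistry class (the name and its methods are illustrative, not taken from the application discussed here) that keeps finished request buffers in a static list. Nothing ever reads the buffers again, yet the garbage collector cannot reclaim them because they are still reachable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: the class and method names are illustrative only.
class SessionRegistry {
    // A static collection lives as long as the class is loaded,
    // so everything it references stays reachable.
    private static final List<byte[]> FINISHED = new ArrayList<byte[]>();

    // Called when a request ends. The buffer is never used again,
    // but it is never removed from the list either.
    static void onRequestFinished(byte[] buffer) {
        FINISHED.add(buffer);
    }

    // Total bytes the GC cannot reclaim because the list still references them.
    static long retainedBytes() {
        long total = 0;
        for (byte[] b : FINISHED) {
            total += b.length;
        }
        return total;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            onRequestFinished(new byte[1024]);
        }
        System.out.println(retainedBytes() + " bytes still reachable");
    }
}
```

The JVM is behaving correctly here; the waste comes entirely from the program logic holding on to references it no longer needs.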
However, the memory leak this article describes does not fall under either of the cases above, which is why it is written in quotation marks. The actual cause was quite unexpected. The details are explained below.
General steps for analyzing a memory leak
If a Java application's memory consumption shows signs of a leak, we generally analyze it with the following steps:
- Dump the heap used by the Java application.
- Use a Java heap analysis tool to find suspect objects whose memory footprint is larger than expected (usually because there are too many of them).
- If necessary, analyze the reference relationships between the suspect objects and other objects.
- Read the program's source code to find out why there are so many suspect objects.
If a Java application is leaking memory, do not rush to kill it; preserve the scene instead. If it is an Internet application, you can switch its traffic to other servers. The point of preserving the scene is to be able to dump the heap of the running JVM.
The JDK ships with the jmap tool, which can do this. It is invoked as follows:
jmap -dump:format=b,file=heap.bin <pid>
format=b means the dump is written in binary format.
file=heap.bin means the dump file is named heap.bin.
<pid> is the process ID of the JVM.
(On Linux) first run ps aux | grep java to find the JVM's pid, then run jmap -dump:format=b,file=heap.bin <pid> to get the heap dump file.
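jmap attaches to another process from the outside. As a complementary sketch, and a different technique from the one used in this article, a HotSpot JVM can also write a dump of its own heap through the com.sun.management.HotSpotDiagnosticMXBean interface. The HeapDumper class and the file naming below are assumptions made for illustration, and this only works for the JVM that runs the code.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;

// Hypothetical helper: dumps the heap of the JVM it runs in (HotSpot only).
class HeapDumper {
    // Writes a binary heap dump to the given path.
    // live = true dumps only objects that are still reachable.
    static File dump(String path, boolean live) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            bean.dumpHeap(path, live);
        } catch (IOException e) {
            // Thrown, for example, when the target file already exists.
            throw new UncheckedIOException(e);
        }
        return new File(path);
    }

    public static void main(String[] args) {
        // A fresh file name each run; recent JDKs also expect a .hprof suffix.
        File out = dump("heap-" + System.currentTimeMillis() + ".hprof", true);
        System.out.println("Wrote " + out.getAbsolutePath() + " (" + out.length() + " bytes)");
    }
}
```

The resulting file can be opened with the same heap analysis tools as a jmap dump.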
Turning the binary heap dump into human-readable information naturally requires a specialized tool. The one recommended here is Memory Analyzer.
Memory Analyzer, MAT for short, is an open source project of the Eclipse Foundation, contributed by SAP and IBM. Software produced by giants like these is impressive: MAT can analyze heaps containing hundreds of millions of objects, quickly compute the memory occupied by each object and the reference relationships between objects, and automatically detect memory leak suspects. It is powerful, and its interface is friendly and easy to use.
MAT's interface is built on Eclipse and is released in two forms: an Eclipse plug-in and an Eclipse RCP application. MAT presents its analysis results as charts and reports that can be taken in at a glance. In short, I personally like this tool a lot. Here are two official screenshots to start with:
Back to the subject. When I opened heap.bin with MAT, it was easy to see that there were far more char[] objects than expected, occupying more than 90% of memory. In general, char[] does take up a lot of memory in a JVM, and the count is usually large, because String objects use char[] as their internal storage. But these char[] were far too greedy: on closer inspection, there were tens of thousands of char[] objects, each occupying several hundred KB of memory. This indicates that the Java program was holding tens of thousands of large String objects. Given the program logic, this should not happen, so something had to be wrong somewhere.
Picking any one of the suspicious char[] objects and using the Path To GC Roots feature to find its reference path showed that the String object was referenced by a HashMap. This was also expected: most Java memory leaks are caused by objects left behind in a global HashMap and never released. However, this HashMap is used as a cache with a threshold on the number of entries, and once the threshold is reached, entries are evicted automatically. By that logic there should be no memory leak. Although the cache held tens of thousands of String objects, it still had not reached the preset threshold (the threshold was set fairly high, because the String objects were expected to be fairly small).
However, another question caught my attention: why were the cached String objects so huge? Their internal char[] arrays were hundreds of K long. Although the number of String objects in the cache had not reached the threshold, their size far exceeded our expectations, so a large amount of memory was consumed, producing the signs of a memory leak (more accurately, excessive memory consumption).
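The point that an entry-count threshold does not bound memory can be sketched as follows. The application's actual cache code is not shown in this article, so the CountBoundedCache class below is an assumption: a common way to build such a cache on top of LinkedHashMap, with least-recently-used eviction chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical cache: the class name, limit and eviction policy are illustrative.
class CountBoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    CountBoundedCache(int maxEntries) {
        // accessOrder = true makes the least recently used entry the eldest one.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Eviction looks at the entry count only; the size of each value is never considered.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        CountBoundedCache<String, char[]> cache = new CountBoundedCache<String, char[]>(3);
        for (int i = 0; i < 5; i++) {
            cache.put("key" + i, new char[100 * 1024]); // each value holds 100K chars
        }
        System.out.println(cache.size() + " entries, keys " + cache.keySet());
    }
}
```

With a cap like this, the worst-case footprint is the entry limit multiplied by the largest value, so an underestimate of the value size translates directly into unexpected memory use.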
I kept digging into this question to see how the large String objects ended up in the HashMap. Reading the program's source code, I found that large String objects did exist, but they were not put into the HashMap themselves. Instead, each large String was split (by calling String.split), and the small String objects produced by the split were put into the HashMap.
This is strange. What goes into the HashMap is clearly the small String objects produced by the split, so how can they occupy so much space? Could there be a problem with the split method of the String class?
With that question in mind, I looked at the String class in Sun JDK 6, mainly the implementation of the split method.
It shows that String.split calls Pattern.split, so I read the code of Pattern.split next.
The line to note there is: String match = input.subSequence(index, m.start()).toString();
Here match is one of the small String objects produced by the split, and it is actually the result of calling subSequence on the large String object. So on to the code of String.subSequence.
String.subSequence in turn calls String.substring, so I kept reading.
The return statement of substring is where things finally become clear: if the requested substring is the entire original string, the original String object is returned; otherwise a new String object is created, but this new object appears to use the char[] of the original String object. The String constructor it calls confirms this: it simply stores the char[] it is given, together with an offset and a count.
To avoid a memory copy and gain speed, the Sun JDK directly reuses the char[] of the original String object and uses offset and length to mark the contents of the different strings. In other words, a small String object produced by substring still points to the char[] of the original large String object, and the same is true of split. This explains why the char[] of the String objects in the HashMap are so large.
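To make the sharing concrete, here is a simplified model of the layout described above. SharedString is a hypothetical stand-in written for this illustration, not the real java.lang.String; the field names value, offset and count follow the description in the text, and compactCopy is an assumed helper that copies only the characters in use.

```java
// Simplified model of the JDK 6 String layout; not the real java.lang.String.
final class SharedString {
    final char[] value; // backing array, possibly shared with other instances
    final int offset;   // index of the first character inside value
    final int count;    // number of characters in this string

    SharedString(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Models the sharing substring: no copy, the result points into the same array.
    SharedString substring(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new StringIndexOutOfBoundsException();
        }
        return (begin == 0 && end == count)
                ? this
                : new SharedString(value, offset + begin, end - begin);
    }

    // Copies only the characters actually in use into a new, right-sized array.
    SharedString compactCopy() {
        char[] copy = new char[count];
        System.arraycopy(value, offset, copy, 0, count);
        return new SharedString(copy, 0, count);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        char[] big = new char[200000];
        java.util.Arrays.fill(big, 'x');
        SharedString large = new SharedString(new String(big));
        SharedString small = large.substring(0, 5);
        System.out.println(small + " keeps " + small.value.length + " chars alive");
        System.out.println("copy keeps " + small.compactCopy().value.length + " chars alive");
    }
}
```

In this model a five-character substring keeps the whole backing array reachable, which is the effect observed in the heap dump.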
Explanation of the cause
The previous section has in fact already worked out the cause; this section just tidies it up:
- For each request, the program obtains a large String object whose internal char[] is hundreds of K long.
- The program splits the large String object and puts the resulting small String objects into a HashMap that is used as a cache.
- Sun JDK 6 optimizes String.split: the String objects it produces directly reuse the char[] of the original String object.
- Each String object in the HashMap therefore actually points to a huge char[].
- The HashMap is capped at 10,000 entries, so the total size of the cached String objects is about 10,000 * 100 K, which is on the order of gigabytes.
- The cache occupies gigabytes of memory, most of it wasted, which produces the signs of a memory leak.
Once the cause is found, the solution follows. split can still be used, but the String objects it produces should not be put into the HashMap directly. Instead, call the String copy constructor String(String original) on each of them first. This constructor is safe: when the original string uses only part of its char[], the constructor copies just that part into a new array.
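A minimal sketch of this fix, assuming a hypothetical CompactSplit helper and a comma-separated payload (neither comes from the original application):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: splits a string and returns compact copies of the pieces.
class CompactSplit {
    static String[] split(String input, String regex) {
        String[] parts = input.split(regex);
        for (int i = 0; i < parts.length; i++) {
            // On Sun JDK 6 the copy constructor gives each piece its own right-sized char[],
            // so the piece no longer references the large original array.
            parts[i] = new String(parts[i]);
        }
        return parts;
    }

    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<String, String>();
        String largePayload = "alpha,beta,gamma"; // stands in for a string hundreds of K long
        for (String part : split(largePayload, ",")) {
            cache.put(part, part);
        }
        System.out.println(cache.keySet());
    }
}
```

On JDK 7u6 and later, substring copies the characters itself, so there the extra copy is redundant but harmless.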
That said, code like new String(string) looks rather awkward. Perhaps substring and split should offer an option that lets the programmer control whether the char[] of the original String object is reused.
Although the implementation of substring and split caused the problem in this case, does it count as a bug in the String class? Personally, I find that hard to say. The optimization is quite reasonable, since the result of substring and split is always a contiguous subsequence of the original string. All one can say is that String is not just a core class; to the JVM it is as important as a primitive type.
It is understandable that the JDK implementation applies every possible optimization to String. But optimization also brings pitfalls, and we programmers need to understand these classes well enough to use them properly.