When optimization becomes a liability: the "memory leak" caused by String.split

I have always admired Sun's rigorous and elegant approach to technology (bless Sun). The source code of the Java libraries in the Sun JDK is clear and consistent down to its comments, and the javadoc is meticulous, so it is comfortable to read. In my daily work and study I therefore often read the Java library source and enjoy doing so. When I run into a strange problem, the source code is even more helpful.

Enough chit-chat; back to the topic. For the past few days I have been wrestling with a Java "memory leak". The memory used by a Java application kept rising steadily and eventually exceeded the monitoring threshold. Time for Sherlock Holmes to step in!

The definition of a memory leak in Java is actually not that clear-cut. First, if the JVM has no bugs, then in theory there is no heap space that cannot be reclaimed; in other words, the C/C++ kind of memory leak does not exist in Java. Second, if a Java program keeps holding a reference to an object that, judging from the program logic, will never be used again, we can consider that object leaked. If there are many such objects, a lot of memory has clearly been leaked (or, more accurately, wasted).

However, the memory leak discussed in this article does not fall into either category, which is why it is in quotation marks. The actual cause was quite unexpected; the details follow.

General steps for analyzing a memory leak

If the memory consumption of a Java application shows signs of a leak, we generally analyze it with the following steps:

  1. Dump the heap of the Java application.
  2. Use a Java heap analysis tool to find suspect objects whose memory footprint is larger than expected (usually because there are too many of them).
  3. If necessary, analyze the reference relationships between the suspect objects and other objects.
  4. Read the program's source code to find out why there are so many suspect objects.

dump heap

If a Java application has a memory leak, do not rush to kill it; preserve the scene instead. For an Internet application, you can switch its traffic to other servers. The point of preserving the scene is to be able to dump the heap of the running JVM.

The JDK ships with the jmap tool, which does exactly this. It is run as follows:

jmap -dump:format=b,file=heap.bin <pid>

format=b means the dump is written in binary format.

file=heap.bin means the dump file is named heap.bin.

<pid> is the JVM's process ID.

(On Linux) first run ps aux | grep java to find the JVM's pid, then run jmap -dump:format=b,file=heap.bin <pid> to get the heap dump file.
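
When attaching jmap to the process is inconvenient, a HotSpot JVM can also write the same kind of dump from inside the application. The following is only a minimal sketch, assuming a HotSpot-based JDK that provides com.sun.management.HotSpotDiagnosticMXBean; the class name HeapDumper and the temporary file location are made up for illustration and are not part of the original post.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not from the original post): dumps this JVM's own heap.
class HeapDumper {

    // Writes a binary heap dump to the given path, which must not exist yet.
    static void dump(Path target) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            // 'true' restricts the dump to live (reachable) objects.
            bean.dumpHeap(target.toString(), true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo only: dumps into a fresh temporary directory, returns the dump
    // size in bytes, and cleans up afterwards.
    static long dumpSizeForDemo() {
        try {
            Path dir = Files.createTempDirectory("heapdump");
            Path file = dir.resolve("heap.hprof"); // some JDK versions insist on the .hprof suffix
            dump(file);
            long size = Files.size(file);
            Files.delete(file);
            Files.delete(dir);
            return size;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("heap dump bytes: " + dumpSizeForDemo());
    }
}
```

In a real investigation you would call dump with a path you intend to keep and then open that file in the analysis tool, just like the file produced by jmap.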

analyze heap

Turning the binary heap dump into human-readable information naturally requires a specialized tool. I recommend Memory Analyzer.

Memory Analyzer, MAT for short, is an open source project of the Eclipse Foundation, contributed by SAP and IBM. Software from the big vendors is still very usable: MAT can analyze heaps containing hundreds of millions of objects, quickly compute the memory occupied by each object, show the reference relationships between objects, and automatically detect memory leak suspects. It is powerful, and its interface is friendly and easy to use.

MAT's interface is built on Eclipse, and it is released in two forms: an Eclipse plug-in and an Eclipse RCP application. It presents analysis results as charts and reports that are clear at a glance. In short, I personally like this tool a lot. The original post showed two official screenshots here (images not preserved).

Back to the problem. I opened heap.bin in MAT, and it was easy to see that there were far more char[] instances than expected, taking up more than 90% of memory. In general, char[] does take up a lot of memory in a JVM and there are a great many instances, because String objects use char[] as their internal storage. But these char[] were too greedy: on closer inspection there were tens of thousands of char[] instances, each occupying several hundred KB of memory. This showed that the Java program was holding tens of thousands of large String objects. Given the program logic, that should not happen, so something was definitely wrong somewhere.

Following the trail

I picked one of the suspicious char[] at random and used the Path To GC Roots feature to find its reference path. The String object turned out to be referenced by a HashMap. That was expected: most Java memory leaks come from objects left in a global HashMap and never released. However, this HashMap is used as a cache with a threshold on the number of entries, and once the threshold is reached, entries are evicted automatically. By that logic there should be no memory leak. Although the cache held tens of thousands of String objects, it still had not reached the preset threshold (the threshold was set fairly high because the String objects were estimated to be small).

But another question caught my attention: why were the cached String objects so huge? Their internal char[] were hundreds of K long. Although the number of String objects in the cache had not reached the threshold, their size far exceeded our expectations, so a large amount of memory was consumed and the application showed signs of a memory leak (more accurately, excessive memory consumption).
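
The original cache code is not shown, so the following is only a sketch of the kind of cache described above: a map that evicts its oldest entry once a count threshold is exceeded. It relies on LinkedHashMap.removeEldestEntry; the class name CountBoundedCache and the tiny threshold are assumptions for illustration. The point is that such a threshold limits the number of entries, not the number of bytes they keep reachable.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the cache in the post: bounded by entry count only.
class CountBoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    CountBoundedCache(int maxEntries) {
        super(16, 0.75f, false); // insertion order: the oldest entry is evicted first
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put; returning true drops the eldest entry.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        CountBoundedCache<String, String> cache = new CountBoundedCache<>(3);
        for (int i = 1; i <= 5; i++) {
            cache.put("key" + i, "value" + i);
        }
        // Only the three most recently inserted keys remain.
        System.out.println(cache.keySet());
    }
}
```

Nothing in this kind of cache looks at how large each value is, so a few thousand unexpectedly large entries can exhaust the heap long before the entry limit is reached.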

I dug deeper to see how large String objects ended up in the HashMap. Reading the program's source code, I found that there were indeed large String objects, but they were not put into the HashMap directly. Instead, each large String was split (by calling String.split), and the small String objects produced by the split were put into the HashMap.

That is strange: what goes into the HashMap is the small String objects produced by the split, so how can they take up so much space? Is there a problem with the split method of the String class?

Reading the code

With these questions in mind, I looked up the String class in the Sun JDK 6 source, mainly the implementation of the split method:

As you can see, String.split calls Pattern.split. Next, the code of Pattern.split:

Note line 9: String match = input.subSequence(index, m.start()).toString();

The match here is a small String produced by the split, and it is actually the result of calling subSequence on the large String. Next, the code of String.subSequence:

String.subSequence in turn calls String.substring. Reading on:

Look at lines 11 and 12, and the answer finally comes into view: if the requested substring is the entire original string, the original String object is returned; otherwise a new String object is created, but it appears to use the char[] of the original String object. The String constructor confirms this:

To avoid copying memory and to gain speed, the Sun JDK directly reuses the char[] of the original String object and uses offset and length to mark the contents of the different strings. In other words, a small String produced by substring still points to the char[] of the original large String, and the same goes for split. This explains why the char[] of the String objects in the HashMap are so large.
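
The JDK listings are not reproduced here, so the following is a simplified model of the behavior just described, not the JDK source. SharedString is a made-up class that stores a char[] together with an offset and a count, the way the text says the JDK 6 String does, and its substring shares the backing array instead of copying it. Note that java.lang.String in later JDKs (7u6 onward) copies the characters in substring, so the real class no longer behaves like this model.

```java
// Hypothetical model of a string that shares its backing array (not the JDK source).
final class SharedString {
    private final char[] value; // backing array, possibly shared with other instances
    private final int offset;   // index of the first character used in 'value'
    private final int count;    // number of characters used

    SharedString(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Returns a view on the same array: no characters are copied.
    SharedString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        if (beginIndex == 0 && endIndex == count) {
            return this; // the whole string was requested
        }
        return new SharedString(value, offset + beginIndex, endIndex - beginIndex);
    }

    // Returns an independent copy whose array holds only the used characters.
    SharedString trimmedCopy() {
        char[] copy = new char[count];
        System.arraycopy(value, offset, copy, 0, count);
        return new SharedString(copy, 0, count);
    }

    int length() { return count; }

    // Length of the array this instance keeps reachable.
    int retainedChars() { return value.length; }

    @Override
    public String toString() { return new String(value, offset, count); }

    public static void main(String[] args) {
        StringBuilder payload = new StringBuilder();
        for (int i = 0; i < 100_000; i++) payload.append('x');
        SharedString big = new SharedString("id=42;" + payload);
        SharedString small = big.substring(3, 5);
        System.out.println(small + " length=" + small.length()
                + " retains=" + small.retainedChars());
        System.out.println("after copy retains=" + small.trimmedCopy().retainedChars());
    }
}
```

In this model a two-character substring keeps the whole backing array of the large string reachable, while the trimmed copy keeps only two characters.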

Explaining the cause

The previous section already analyzed the cause; this section summarizes it:

  1. For each request, the program obtains a large String object whose internal char[] is hundreds of K long.
  2. The program splits the large String and puts the resulting small String objects into a HashMap that serves as a cache.
  3. Sun JDK 6 optimizes String.split so that each String produced by the split directly uses the char[] of the original String.
  4. Every String object in the HashMap therefore actually points to a huge char[].
  5. The HashMap is capped at 10,000 entries, so the total size of the cached String objects is 10,000 * 100 K, which is on the order of gigabytes.
  6. Gigabytes of memory are occupied by the cache, most of it wasted, producing the signs of a memory leak.
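
A quick check of the arithmetic in step 5, assuming for illustration that each cached String keeps roughly 100 KB reachable (the post gives the sizes only as "hundreds of K", so the real figure may be higher). The class and method names are made up.

```java
// Back-of-the-envelope estimate of the memory kept alive by the cache.
class CacheFootprint {

    // Total bytes kept reachable when every entry pins its own large array.
    static long retainedBytes(long entries, long bytesPerEntry) {
        return entries * bytesPerEntry;
    }

    public static void main(String[] args) {
        long entries = 10_000;            // cache limit from the post
        long bytesPerEntry = 100L * 1024; // assumed: about 100 KB retained per entry
        long total = retainedBytes(entries, bytesPerEntry);
        System.out.printf("%d bytes = %.2f GiB%n", total, total / (1024.0 * 1024 * 1024));
    }
}
```

With these assumed numbers the total is a little under 1 GiB; at several hundred KB per entry it grows to multiple gigabytes.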


Once the cause is found, the solution follows. split can still be used, but the String objects it produces should not be put into the HashMap directly. Instead, call the String copy constructor String(String original) on each of them first. This constructor is safe in this respect; see its code for details:

It is just that code like new String(string) looks very odd, 囧. Perhaps substring and split should offer an option that lets the programmer control whether the char[] of the original String is reused.
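
A minimal sketch of the workaround described above: copy each piece with new String(...) before caching it. The class and method names, the separator and the sample input are assumptions for illustration. On JDK 6 the copy gets a char[] trimmed to its own length; on later JDKs, where substring already copies, the extra copy is harmless but no longer needed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example of caching split results without pinning the large input.
class SplitCacheExample {

    // Splits 'input' on ';' and stores a detached copy of every piece.
    static void cachePieces(String input, Map<String, String> cache) {
        for (String piece : input.split(";")) {
            // On JDK 6 the copy constructor gives the piece its own storage,
            // so the cache does not keep the large input's char[] alive.
            String detached = new String(piece);
            cache.put(detached, detached);
        }
    }

    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>();
        cachePieces("alpha;beta;gamma", cache);
        System.out.println(cache.keySet());
    }
}
```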

Is it a bug?

Although the problem was caused by the implementation of substring and split, does it count as a bug in the String class? Personally, I find it hard to say. The optimization is quite reasonable, since the result of substring or split is always a contiguous subsequence of the original string. All one can say is that String is not just another core class; to the JVM it is as important as the primitive types.

It is understandable that the JDK applies every possible optimization to String. But optimization brings its own worries, and we programmers need to understand these optimizations well enough to use them properly.