java.lang.OutOfMemoryError when transforming XML in a huge directory

I want to use XSLT 2.0 to transform XML files in a huge directory tree with many levels. There are more than 1 million files, each 4 to 10 kB. After a while I always get java.lang.OutOfMemoryError: Java heap space.

My command is:
java -Xmx3072M -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=512M ……

Adding more memory to -Xmx is not a good solution.

This is my code:

// Walk the directory tree: recurse into subdirectories, index plain files.
for (File file : dir.listFiles()) {
    if (file.isDirectory()) {
        pushDocuments(file);
    } else {
        indexFiles.index(file);
    }
}

public void index(File file) {
    // Receives the serialized result of transforming one file.
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();

    try {
        xslTransformer.xslTransform(outputStream, file);
        outputStream.flush();
        outputStream.close();
    } catch (IOException e) {
        System.err.println(e.toString());
    }
}

The XSLT transformation is performed with net.sf.saxon.s9api:

public void xslTransform(ByteArrayOutputStream outputStream, File xmlFile) {
    try {
        // Parse the input file into a Saxon document node.
        XdmNode source = proc.newDocumentBuilder().build(new StreamSource(xmlFile));
        Serializer out = proc.newSerializer();
        out.setOutputStream(outputStream);
        // The same transformer instance is reused for every file.
        transformer.setInitialContextNode(source);
        transformer.setDestination(out);
        transformer.transform();

        out.close();
    } catch (SaxonApiException e) {
        System.err.println(e.toString());
    }
}

My usual recommendation with the Saxon s9api interface is to reuse the XsltExecutable object but create a new XsltTransformer for each transformation. The XsltTransformer caches the documents you have read, in case you need them again. With a million input files, that is not what you want: every document stays reachable from the transformer, so the heap eventually fills up.

As an alternative, you can call xsltTransformer.getUnderlyingController().clearDocumentPool() after each transformation.
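The first recommendation above (compile the stylesheet once, create a fresh transformer per document) can be sketched as follows. This is only an illustration, and it deliberately swaps in the JDK's built-in JAXP API (javax.xml.transform) so that it runs without Saxon on the classpath: Templates plays the role of s9api's XsltExecutable, and templates.newTransformer() plays the role of creating a new XsltTransformer. The class name PerFileTransformer and the tiny XSLT 1.0 stylesheet are hypothetical and not from the original question.

```java
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

class PerFileTransformer {
    // Hypothetical stylesheet for illustration: copies the root element's text into <out>.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'><out><xsl:value-of select='/*'/></out></xsl:template>"
      + "</xsl:stylesheet>";

    // Compiled stylesheet: built once and reused for every document.
    private final Templates templates;

    PerFileTransformer(String stylesheet) {
        try {
            templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(stylesheet)));
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Stylesheet did not compile", e);
        }
    }

    String transform(String xml) {
        try {
            // A fresh transformer per document, so no per-run state
            // is carried over from one input to the next.
            Transformer transformer = templates.newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        PerFileTransformer perFile = new PerFileTransformer(XSL);
        String[] documents = { "<a>one</a>", "<a>two</a>" };
        for (String doc : documents) {
            System.out.println(perFile.transform(doc));
        }
    }
}
```

In the code from the question, the equivalent change would presumably be to keep the compiled XsltExecutable as a field and obtain a new XsltTransformer from it inside xslTransform, instead of holding a single long-lived transformer.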

(Please note that you can ask Saxon questions at saxonica.plan.io, where we [Saxonica] are very likely to notice and answer them. You can also ask them here and tag them "saxon", which means we will probably answer the question at some point, though not always immediately. If you ask on StackOverflow without any product-specific tag, it is entirely hit and miss whether anyone will notice the question.)

