Requirement: because of a system migration, the src attribute of every img tag in the web page content stored in the database has to be patched. For example:
Content="<p><img title=\"122444234\" src=\"/files/post/122444234.jpg\"/><p>Other characters";
After the replacement it should read:
Content="<p><img title=\"122444234\" src=\"http://xxx.xxx.com/files/post/122444234_500.jpg\" /><p>Other characters";
This can be solved with a regular expression. The code is as follows (a static method in ApiUtil.java):
// Imports needed at the top of ApiUtil.java:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Patches the src attribute of every img tag in the content.
 * @param content the HTML content
 * @param replaceHttp the domain prefix to prepend to a relative src
 * @param size the size suffix; "_" + size is inserted before the file extension in src
 * @return the content with the patched src values
 */
public static String repairContent(String content, String replaceHttp, int size) {
    String patternStr = "<img\\s*([^>]*)\\s*src=\\\"(.*?)\\\"\\s*([^>]*)>";
    Pattern pattern = Pattern.compile(patternStr, Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(content);
    String result = content;
    while (matcher.find()) {
        String src = matcher.group(2);
        Logger.debug("pattern string:" + src);
        // Start from the original src so a value without an extension is kept unchanged.
        String replaceSrc = src;
        if (src.lastIndexOf(".") > 0) {
            replaceSrc = src.substring(0, src.lastIndexOf(".")) + "_" + size + src.substring(src.lastIndexOf("."));
        }
        if (!src.startsWith("http://") && !src.startsWith("https://")) {
            replaceSrc = replaceHttp + replaceSrc;
        }
        // String.replace treats src as literal text; replaceAll would interpret it as a regex.
        result = result.replace(src, replaceSrc);
    }
    Logger.debug(" content == " + content);
    Logger.debug(" result == " + result);
    return result;
}
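Replacing by string search can also touch text outside the img tag that happens to equal a src value. As a hedged alternative, the following minimal sketch rewrites each src in place with Matcher.appendReplacement. The class name SrcRewriter, the method name rewrite, and the slightly different pattern are assumptions made for illustration and are not part of ApiUtil.java. The sketch also assumes that src values are double-quoted and that there is no whitespace around the equals sign.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SrcRewriter {
    // Group 1: the tag up to and including the opening quote of src.
    // Group 2: the src value.
    // Group 3: the closing quote and the rest of the tag.
    private static final Pattern IMG_SRC = Pattern.compile(
            "(<img\\b[^>]*?\\ssrc=\")([^\"]*)(\"[^>]*>)",
            Pattern.CASE_INSENSITIVE);

    public static String rewrite(String content, String prefix, int size) {
        Matcher m = IMG_SRC.matcher(content);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String src = m.group(2);
            int dot = src.lastIndexOf('.');
            // Insert "_size" before the extension; leave the value alone if there is none.
            String newSrc = dot > 0
                    ? src.substring(0, dot) + "_" + size + src.substring(dot)
                    : src;
            if (!src.startsWith("http://") && !src.startsWith("https://")) {
                newSrc = prefix + newSrc;
            }
            String replacement = m.group(1) + newSrc + m.group(3);
            // quoteReplacement keeps '$' and '\' in the new tag from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<p><img title=\"122444234\" src=\"/files/post/122444234.jpg\"/><p>Other characters";
        System.out.println(rewrite(html, "http://xxx.xxx.com", 500));
    }
}
```

Because only the matched tag is rebuilt, each src is rewritten exactly once, even when the same value occurs in several tags or elsewhere in the text.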
Test code:
public static void main(String[] args) {
    String content = "<p><img title=\"10010001\" src=\"/files/post/10010001.gif\" width=\"200\" height=\"300\" />" +
            "</p><p><img title=\"10010002\" src=\"/files/post/10010002.gif\" width=\"500\" height=\"300\" /><p> </p>" +
            "</p><p><img title=\"10010003\" src=\"/files/post/10010003.gif\" width=\"600\" height=\"300\" /><p> </p>";
    String replaceHttp = "http://www.baidu.com";
    int size = 500;
    String result = ApiUtil.repairContent(content, replaceHttp, size);
    System.out.println(result);
}
The key is the regular expression:
<img\\s*([^>]*)\\s*src=\\\"(.*?)\\\"\\s*([^>]*)>
In particular, [^>]* cannot be replaced with .*: because .* is greedy and also matches ">", a single match would then run from the first <img to the last ">" in the string. If every src is different, only the last src would be replaced.
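The difference can be checked with a small experiment. The following sketch is illustrative only: the class name GreedyDemo, the helper srcs, and the sample HTML are assumptions, not part of the original code. It collects the src value captured by each match, once with [^>]* and once with .* in its place.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    // Collects group 2 (the src value) of every match of the given pattern.
    static List<String> srcs(String regex, String html) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(html);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<p><img title=\"a\" src=\"/files/a.gif\" width=\"1\" /></p>"
                + "<p><img title=\"b\" src=\"/files/b.gif\" width=\"2\" /></p>";
        String withNegatedClass = "<img\\s*([^>]*)\\s*src=\\\"(.*?)\\\"\\s*([^>]*)>";
        String withDot = "<img\\s*(.*)\\s*src=\\\"(.*?)\\\"\\s*(.*)>";
        // One match per tag: prints [/files/a.gif, /files/b.gif]
        System.out.println(srcs(withNegatedClass, html));
        // One match spanning both tags: prints [/files/b.gif]
        System.out.println(srcs(withDot, html));
    }
}
```

With .* the first group swallows everything up to the last src=" in the string, so the whole content becomes a single match and only the last src is captured.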