Notes on generating fancy PDFs with freemarker + wkhtmltopdf

2023-01-02 00:00:00 notes, generation, fancy

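Background
- The business requirement was to produce a PDF as the final deliverable. It is an elaborate document that can run to twenty or thirty pages, and the table of contents, headers and footers, table data, and images all have to change with the underlying data. When I received this requirement I was close to despair; it looked far too hard.
Looking for a solution
- A colleague had previously generated Word documents through an API, but those had simple styling: the only variable parts were the number of table rows and some basic information, and there were no requirements for headers, footers, or fonts. The free tier of that API only supports Word or PDF files of a few pages, which is nowhere near enough for the current requirement. The paid tier was the obvious next step, but at a price of over 20,000 I immediately started looking for another approach. (The main reason, really, is that the styling in this requirement is very complex. Mixing all of it into the Java business logic would couple the two far too tightly, make style adjustments very difficult, and slow development to a crawl.)
- Because the styling is complex, I thought of generating HTML with freemarker and then converting the HTML to PDF. This is workable: the PDF's styling is easy to control through HTML, and freemarker works from Java.
- The next step was to find an HTML-to-PDF tool. A front-end colleague first found itext, but after researching and trying it I found it painful to work with. You still have to drive itext from Java, which requires knowing itext in depth. It also demands strictly well-formed HTML: every tag must be closed or it throws an error, and deeply nested divs produce errors as well. Font support was poor too, which matters because we have to generate the PDF with licensed fonts to avoid infringement.
- The itext output did not look good either, which left me stumped. I kept searching online without finding anything better, then turned to GitHub, and sure enough wkhtmltopdf turned up. From its description it looked very simple. My own understanding of how it works: it is much like pressing command+p and saving the print preview as a PDF. The description found online: it is a command-line tool that renders with the Qt WebKit engine and converts HTML documents into PDF documents or images.
- wkhtmltopdf is convenient to use because it is just a command-line tool. A minimal example: wkhtmltopdf in.html out.pdf. It is that simple, and far more comfortable than itext.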
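
The freemarker half of the pipeline described above (fill a template with data, get HTML back) might look roughly like this. It is a minimal sketch and not the author's code: the class name ReportHtmlRenderer and the model keys used in the usage note are made up for illustration, and it assumes FreeMarker 2.3.23 or later is on the classpath.

```java
import freemarker.template.Configuration;
import freemarker.template.Template;
import freemarker.template.TemplateException;

import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;

// Hypothetical helper: renders a FreeMarker template string into an HTML string.
final class ReportHtmlRenderer {

    private ReportHtmlRenderer() {
    }

    static String render(String templateSource, Map<String, Object> model)
            throws IOException, TemplateException {
        Configuration cfg = new Configuration(Configuration.VERSION_2_3_23);
        cfg.setDefaultEncoding("UTF-8");
        // The template is built from a string here; a real project would more likely load .ftl files.
        Template template = new Template("report", new StringReader(templateSource), cfg);
        StringWriter out = new StringWriter();
        // Merge the data model into the template.
        template.process(model, out);
        return out.toString();
    }
}
```

With a template such as `<h1>${title}</h1><#list rows as row><p>${row.name}</p></#list>` and a model holding a title string and a list of maps under rows, the returned string is plain HTML that can be written to a file and passed to the converter as its input path.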
wkhtmltopdf
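- wkhtmltopdf's engine is based on WebKit, the web browser engine used by Safari, Mail, the App Store, and many other applications on macOS, iOS, and Linux. Google Chrome also used WebKit in its early years, before moving to Blink (a WebKit fork) in 2013.
- wkhtmltopdf is open source and hosted on GitHub, where it currently has more than 10,000 stars.
- Official site: https://wkhtmltopdf.org/
- GitHub: https://github.com/wkhtmltopdf/wkhtmltopdf
- An introduction in Chinese: https://www.jianshu.com/p/4d65857ffe5e
- Command reference (original): https://wkhtmltopdf.org/usage/wkhtmltopdf.txt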

Calling wkhtmltopdf from Java

// Converts the HTML file at srcPath to a PDF at destPath and returns the PDF wrapped in a FileItem.
// Needs java.io.*, java.util.* and Apache Commons FileUpload on the classpath.
public FileItem wkhtmltopdfConvert(String srcPath, String destPath, String fileName) throws Exception {
    // Pass the arguments as a list so that paths containing spaces are not split.
    List<String> cmd = new ArrayList<>();
    cmd.add(findWkHtmlToPdfExecutable().trim());
    cmd.addAll(Arrays.asList("--margin-top", "0", "--margin-bottom", "0",
            "--margin-left", "0", "--margin-right", "0"));
    cmd.add("--enable-local-file-access");
    cmd.add("--disable-smart-shrinking");
    cmd.add(srcPath);
    cmd.add(destPath);
    log.debug("wkhtmltopdf command executed:{}", cmd);
    try {
        Process proc = new ProcessBuilder(cmd).start();
        // Drain stderr and stdout on their own threads so the child cannot block on a full pipe.
        HtmlToPdfInterceptor error = new HtmlToPdfInterceptor(proc.getErrorStream());
        HtmlToPdfInterceptor output = new HtmlToPdfInterceptor(proc.getInputStream());
        error.start();
        output.start();
        int exitCode = proc.waitFor();
        if (exitCode != 0) {
            log.warn("wkhtmltopdf exited with code {}", exitCode);
        }
    } catch (Exception e) {
        log.error("wkhtmltopdf conversion failed", e);
        return null;
    }
    // Treat a missing output file as a failure instead of throwing FileNotFoundException.
    if (!new File(destPath).exists()) {
        return null;
    }
    // Close the PDF stream once it has been copied into the FileItem.
    try (InputStream inputStream = new FileInputStream(destPath)) {
        return storeTo(inputStream, APPLICATION_PDF, fileName);
    }
}
// Copies the stream into a Commons FileUpload FileItem (held in memory up to 10 MB, then on disk).
public FileItem storeTo(InputStream data, String streamValue, String fileName) throws IOException {
    FileItemFactory fileItemFactory = new DiskFileItemFactory(1024 * 1024 * 10, null);
    // "file" is the field name and streamValue the content type.
    FileItem fileItem = fileItemFactory.createItem("file", streamValue, true,
            fileName);
    // try-with-resources flushes and closes the stream even if the copy fails.
    try (OutputStream outputStream = fileItem.getOutputStream()) {
        int read;
        byte[] bytes = new byte[1024];
        while ((read = data.read(bytes)) != -1) {
            outputStream.write(bytes, 0, read);
        }
    }
    return fileItem;
}
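// Looks up the wkhtmltopdf executable on the PATH; falls back to the bare command name.
public String findWkHtmlToPdfExecutable() {
    try {
        String osName = System.getProperty("os.name").toLowerCase();
        String cmd = osName.contains("windows") ? "where wkhtmltopdf" : "which wkhtmltopdf";
        Process process = Runtime.getRuntime().exec(cmd);
        HtmlToPdfInterceptor error = new HtmlToPdfInterceptor(process.getErrorStream());
        error.start();
        process.waitFor();
        // The lookup output ends with a newline, so trim it before use.
        String found = IOUtils.toString(process.getInputStream(), Charset.defaultCharset()).trim();
        if (!found.isEmpty()) {
            // "where" on Windows can print several matches, one per line; use the first.
            return found.split("\\R")[0].trim();
        }
    } catch (Exception e) {
        log.warn("no wkhtmltopdf found!", e);
    }
    return "wkhtmltopdf";
}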
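
The listing above relies on an HtmlToPdfInterceptor class that the article never shows. Judging from how it is used (constructed around a process stream, then started), it is presumably a thread that drains the stream so the child process does not stall. A minimal stand-in under that assumption might look like this; the getLines accessor and the choice of UTF-8 are guesses added for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical reconstruction: reads a process stream to the end on its own thread.
class HtmlToPdfInterceptor extends Thread {

    private final InputStream stream;
    private final List<String> lines = Collections.synchronizedList(new ArrayList<>());

    HtmlToPdfInterceptor(InputStream stream) {
        this.stream = stream;
        // A daemon thread does not keep the JVM alive if the stream never closes.
        setDaemon(true);
    }

    @Override
    public void run() {
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            String line;
            // Keep reading until the process closes its end of the pipe.
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            lines.add("stream read failed: " + e.getMessage());
        }
    }

    // Returns a snapshot of everything read so far, e.g. for logging after waitFor().
    List<String> getLines() {
        return new ArrayList<>(lines);
    }
}
```

Draining both stdout and stderr matters because wkhtmltopdf writes progress messages while it runs; if nobody reads them and the pipe buffer fills up, waitFor() can hang.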

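Pitfalls
- The generated PDF does not fill the whole page. Part of the cause is the HTML design; switching everything to relative positioning might fix it. My approach, though, was to lay out all the HTML at A4 size (842*595px), use absolute positioning wherever possible, and then enlarge the output with a wkhtmltopdf option such as --zoom 1.2. Note that the body's default 8px margin can also be the cause!
- The generated PDF contains an extra blank page. I remove the trailing blank page with a third-party tool, but that is not a good method, so it is only a stopgap. As for why the blank page appears, my analysis is that it comes from pagination: the last page of my PDF is an image that fills the page completely, which makes the tool break to a new page. If the height of your last page is below 835px, the blank page should not appear. Java code for deleting the last page: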
```
pom:
<dependency>
   <groupId>org.apache.pdfbox</groupId>
   <artifactId>pdfbox-app</artifactId>
   <version>1.8.10</version>
</dependency>

// Copies the PDF at path to newPath without its last page (PDFBox 1.8.x API).
public void cutPdf(String path, String newPath) throws Exception {
   File file = new File(path);
   if (!file.exists()) {
       return;
   }
   PDDocument document = PDDocument.load(file);
   try {
       int noOfPages = document.getNumberOfPages();
       // Remove the last page, but never empty a single-page document.
       if (noOfPages > 1) {
           document.removePage(noOfPages - 1);
       }
       document.save(newPath);
   } finally {
       // Always release the document, even if removing or saving fails.
       document.close();
   }
}
```
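- Rendering differs between operating systems, so once the output looks roughly right locally, do the style tuning in the test environment.
- Some images come out lighter than they should; see https://github.com/wkhtmltopdf/wkhtmltopdf/issues/2221. In my case it was the page header that looked faded. The HTML was <div style="position: absolute;width: 595px;height:44px;background-image:url('yemei.jpg');background-size: cover;z-index: 1000;">, and the fix, also taken from that GitHub issue, was z-index: 1000;position: relative;
- Unsupported fonts: see https://blog.csdn.net/nandao158/article/details/105812976, although that approach had no effect when I tried it. What worked for me was converting the TTF font to an SVG font and referencing that instead. It took me a full day, and I only got there by blind trial and error, which was a painful process. @font-face { font-family: 'ziti'; src: url("ziti.svg") format("svg"); }
- Style problems: some CSS has no effect, mainly flex-related rules. For more pitfalls see https://www.jianshu.com/p/57c897cfaa27
- Pagination: wrap each page in a div and add page-break-inside: avoid; to it in the HTML.
- Deploying to docker: just install the tool into the docker base image. This part is not difficult.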
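
Several of the pitfalls above (the default body margin, one div per page with page-break-inside: avoid, the --zoom option) can be gathered in one small helper. The following is a sketch under assumptions rather than the author's code: the class PdfPageLayout, its method names, and the page class in the CSS are hypothetical. It uses the 595px page width from the article's header example and leaves the page height to the caller, since the article suggests keeping the last page under 835px.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper collecting the layout-related workarounds in one place.
final class PdfPageLayout {

    // Page width in px, matching the 595px header div in the article.
    static final int PAGE_WIDTH_PX = 595;

    private PdfPageLayout() {
    }

    // Wraps each page fragment in a fixed-size div that should not be split across pages.
    static String wrapPages(List<String> pageFragments, int pageHeightPx) {
        StringBuilder html = new StringBuilder();
        html.append("<!DOCTYPE html><html><head><meta charset=\"UTF-8\"><style>")
                // Reset the default 8px body margin mentioned in the pitfalls.
                .append("body{margin:0;}")
                // position:relative lets absolutely positioned children anchor to their page.
                .append(".page{position:relative;width:").append(PAGE_WIDTH_PX)
                .append("px;height:").append(pageHeightPx)
                .append("px;page-break-inside:avoid;}")
                .append("</style></head><body>");
        for (String fragment : pageFragments) {
            html.append("<div class=\"page\">").append(fragment).append("</div>");
        }
        return html.append("</body></html>").toString();
    }

    // Builds the page-size and zoom arguments to place before the input and output paths.
    static List<String> layoutArgs(double zoom) {
        List<String> args = new ArrayList<>();
        args.add("--page-size");
        args.add("A4");
        args.add("--zoom");
        // Locale.ROOT keeps the decimal point a dot regardless of the server locale.
        args.add(String.format(Locale.ROOT, "%.1f", zoom));
        return args;
    }
}
```

The list returned by layoutArgs(1.2) would be added to the command ahead of the source and destination paths. Whether a given zoom value actually fills the page still has to be checked against the real rendering on the target operating system.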
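Summary
- I am a back-end developer; this task came to me because the front-end team had no capacity. The front-end techniques were new to me, so development was painful, and adapting the HTML to the PDF tool was an ordeal too. Still, I picked up some front-end knowledge, which was a real gain. These notes come out of many pitfalls, and if you have a better solution, please share it. Generating PDFs really is hard!!!
- This is also the first time this approach has been used in my department, and it is a real step forward compared with before. It saved the company the 20,000-plus cost of buying the Word-generation API. Compared with the earlier approach of generating PDFs purely through Java API calls, it saves 30% of the person-day cost, and the range of PDF styling it supports is considerably better.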

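    Original author: 这题我咋不会？
    Original link: https://blog.csdn.net/lilian9215/article/details/124966637
    This article is reposted from the web for the purpose of sharing knowledge. If it infringes your rights, please contact the blogger to have it removed.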
