How to efficiently manage files on a filesystem in Java?

2022-01-24 · tags: file, filesystems, save, web-services, java

I am creating a few JAX-WS endpoints, for which I want to save the received and sent messages for later inspection. To do this, I am planning to save the messages (XML files) to the filesystem, in some sensible hierarchy. There will be hundreds, even thousands, of files per day. I also need to store metadata for each file.

I am considering putting the metadata (just a couple of fields) into a database table, but the XML content itself into files on the filesystem, so as not to bloat the database with content data that is seldom read.

Is there a simple library that helps with saving, loading, deleting, etc. the files? It's not that tricky to implement myself, but I wonder whether there are existing solutions: just a simple library that already provides easy access to the filesystem (preferably across different operating systems).

Or do I even need that, should I just go with raw/custom Java?

Recommended answer

Java API

Well, if what you need to do is really simple, you should be able to achieve your goal with java.io.File (delete, check existence, read, write, etc.) and a few stream manipulations with FileInputStream and FileOutputStream.
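
For illustration, here is a minimal sketch of save/load/delete using only java.io.File and the two stream classes. The class name SimpleFileStore and the choice of UTF-8 are assumptions, not anything the question prescribes.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: save/load/delete with plain java.io (sketch only). */
public class SimpleFileStore {

    /** Writes the message as UTF-8, creating parent directories if needed. */
    public static void save(File file, String xml) throws IOException {
        File parent = file.getParentFile();
        // mkdirs() returns false if the directory already exists, so re-check.
        if (parent != null && !parent.mkdirs() && !parent.isDirectory()) {
            throw new IOException("Could not create directory " + parent);
        }
        try (OutputStream out = new FileOutputStream(file)) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Reads the whole file back into a String, 8 KiB at a time. */
    public static String load(File file) throws IOException {
        try (InputStream in = new FileInputStream(file);
             ByteArrayOutputStream buf = new ByteArrayOutputStream()) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    /** Returns true if the file no longer exists afterwards. */
    public static boolean delete(File file) {
        return !file.exists() || file.delete();
    }

    public static void main(String[] args) throws IOException {
        File f = new File(System.getProperty("java.io.tmpdir"), "demo-msg.xml");
        save(f, "<msg>hello</msg>");
        System.out.println(load(f));   // prints <msg>hello</msg>
        System.out.println(delete(f)); // prints true
    }
}
```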

You can also throw in Apache commons-io and its handy FileUtils for a few more utility functions.

Java code is largely independent of the OS. You just need to make sure you use File.separator (the name separator, "/" or "\"; not File.pathSeparator, which is the ":" or ";" used in PATH-like lists), or use the constructor File(File parent, String child) so that you never need to mention the separator explicitly.
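
As a sketch of the "sensible hierarchy" from the question, the hypothetical helper below nests one directory per day using only File(File parent, String child), so no separator is ever typed by hand. The year/month/day layout and the names MessagePaths and pathFor are assumptions for illustration.

```java
import java.io.File;
import java.time.LocalDate;
import java.util.Locale;

/** Hypothetical helper: builds base/yyyy/MM/dd/fileName portably. */
public class MessagePaths {

    public static File pathFor(File base, LocalDate day, String fileName) {
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        File year = new File(base, String.format(Locale.ROOT, "%04d", day.getYear()));
        File month = new File(year, String.format(Locale.ROOT, "%02d", day.getMonthValue()));
        File dayDir = new File(month, String.format(Locale.ROOT, "%02d", day.getDayOfMonth()));
        return new File(dayDir, fileName);
    }

    public static void main(String[] args) {
        File f = pathFor(new File("messages"), LocalDate.of(2022, 1, 24), "request-42.xml");
        // '/' on Unix-like systems, '\' on Windows; the code does not care.
        System.out.println(f.getPath());
        // Name separator vs. separator for PATH-like lists.
        System.out.println(File.separator + " vs " + File.pathSeparator);
    }
}
```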

The Java file API is relatively high-level and abstracts away the differences between operating systems. Most of the time it's sufficient. It falls short only if you need a relatively OS-specific feature that is not in the API, e.g. the physical size of a file on disk (as opposed to its logical size), security permissions on *nix, or the free space/quota of a drive. (Newer JDKs have closed part of this gap: File.getUsableSpace() since Java 6, and POSIX permissions and per-volume space through java.nio.file since Java 7.)
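
A sketch of querying two of these OS-level details through java.nio.file (Java 7+): free space on the volume via FileStore, and POSIX permissions via PosixFileAttributeView, which is simply unavailable on non-POSIX systems such as Windows. The class and method names are hypothetical. As far as I know there is still no portable call for the physical (allocated) size on disk; Files.size() returns the logical size.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFileAttributeView;
import java.nio.file.attribute.PosixFilePermissions;

/** Hypothetical helpers for a few OS-level file details. */
public class FsInfo {

    /** Bytes available to this JVM on the volume holding the path. */
    public static long usableSpace(Path p) throws IOException {
        FileStore store = Files.getFileStore(p);
        return store.getUsableSpace();
    }

    /** Permissions like "rw-r--r--", or null if POSIX attributes are unsupported. */
    public static String posixPermissions(Path p) throws IOException {
        PosixFileAttributeView view = Files.getFileAttributeView(p, PosixFileAttributeView.class);
        if (view == null) {
            return null; // e.g. on Windows/NTFS
        }
        return PosixFilePermissions.toString(view.readAttributes().permissions());
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("fsinfo", ".xml");
        System.out.println("logical size: " + Files.size(tmp));
        System.out.println("usable space: " + usableSpace(tmp));
        System.out.println("permissions:  " + posixPermissions(tmp));
        Files.delete(tmp);
    }
}
```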

Most operating systems have an internal buffer for file writes and reads. Calling FileOutputStream.write (plus flush, if you wrap it in a BufferedOutputStream) ensures the data has been handed to the OS, but not necessarily written to disk. For systems such as databases that need that guarantee, the Java API also supports this low-level integration, e.g. FileDescriptor.sync() or FileChannel.force().
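
A minimal sketch of forcing a write through the OS buffer, in both the java.io and the NIO flavour. DurableWrite is a hypothetical name. Note that sync()/force() only guarantee what the OS can guarantee; a drive's own write cache is outside Java's control.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helpers: write a file and ask the OS to persist it. */
public class DurableWrite {

    /** java.io variant: write, then sync the underlying file descriptor. */
    public static void writeAndSync(Path file, String content) throws IOException {
        try (FileOutputStream out = new FileOutputStream(file.toFile())) {
            out.write(content.getBytes(StandardCharsets.UTF_8));
            // FileOutputStream keeps no Java-side buffer, so flush() would add nothing;
            // sync() returns once the OS reports the data written to the device.
            out.getFD().sync();
        }
    }

    /** NIO variant: force(true) also persists file metadata such as the size. */
    public static void writeAndForce(Path file, String content) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(content.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf); // a single write() may be partial
            }
            ch.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("durable", ".xml");
        writeAndSync(tmp, "<msg/>");
        writeAndForce(tmp, "<msg>v2</msg>");
        System.out.println(new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8)); // prints <msg>v2</msg>
        Files.delete(tmp);
    }
}
```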

Also, both files and directories are represented by File, so you need to check with isDirectory(). This can be confusing, for instance if you have one file x and one directory /x (I don't remember exactly how to handle this issue, but there is a way).
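
A small sketch of telling the cases apart explicitly, since a File object by itself says nothing about what exists on disk. FileKinds and describe are hypothetical names; "other" covers entries that exist but are neither regular files nor directories.

```java
import java.io.File;
import java.io.IOException;

/** Hypothetical helper: classify what a File currently denotes. */
public class FileKinds {

    public static String describe(File f) {
        if (f.isDirectory()) {
            return "directory";
        } else if (f.isFile()) {
            return "file";
        } else if (f.exists()) {
            return "other"; // e.g. a device or named pipe
        } else {
            return "missing";
        }
    }

    public static void main(String[] args) throws IOException {
        File base = new File(System.getProperty("java.io.tmpdir"), "filekinds-demo");
        File dir = new File(base, "x");
        File file = new File(base, "x.xml");
        dir.mkdirs();
        file.createNewFile();
        System.out.println(describe(dir));                 // prints directory
        System.out.println(describe(file));                // prints file
        System.out.println(describe(new File(base, "y"))); // prints missing
        file.delete();
        dir.delete();
        base.delete();
    }
}
```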

Web services

The web service can use either xs:base64Binary to pass the data, or use MTOM (Message Transmission Optimization Mechanism) if files are large.

Transactions

Note that the database is transactional and the file system is not. So you might have to add a few checks in case operations fail and are retried.

You could go with a complicated design involving some form of distributed transaction (see this answer), or try to go with a simpler design that provides the level of robustness that you need. A possible design could be:

  • Update. If the user wants to overwrite a file, you actually create a new one. The level of indirection between the logical file name and the physical file is stored in the database. This way you never overwrite a physical file once it is written, which keeps rollback consistent.
  • Create. Same story when the user wants to create a file.
  • Delete. If the user wants to delete a file, you first do it only in the database. A periodic job polls the file system to identify files that are not listed in the database, and removes them. This two-phase delete ensures that the delete operation can be rolled back.
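
The design above can be sketched as follows, under loudly stated assumptions: an in-memory Map stands in for the database table (logical name to physical file name), the physical names are random UUIDs, and IndirectStore, put, get, delete and sweep are hypothetical names. In a real system the map updates would be rows written inside a DB transaction.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.Duration;
import java.time.Instant;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch: never overwrite a physical file; delete in two phases. */
public class IndirectStore {
    private final Path root;
    // Stand-in for the DB table: logical name -> physical file name.
    private final Map<String, String> table = new ConcurrentHashMap<>();

    public IndirectStore(Path root) {
        this.root = root;
    }

    /** Create and update alike: write a brand-new file, then repoint the row. */
    public void put(String logicalName, String content) throws IOException {
        String physical = UUID.randomUUID() + ".xml";
        Files.write(root.resolve(physical), content.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE_NEW); // fails rather than overwrite
        table.put(logicalName, physical); // the "commit"; the old file stays untouched
    }

    public String get(String logicalName) throws IOException {
        String physical = table.get(logicalName);
        if (physical == null) {
            return null;
        }
        return new String(Files.readAllBytes(root.resolve(physical)), StandardCharsets.UTF_8);
    }

    /** Phase one of a delete: only the row goes away. */
    public void delete(String logicalName) {
        table.remove(logicalName);
    }

    /**
     * Phase two, run periodically: remove files no row references.
     * minAge is a grace period so a put() whose row is not yet committed is spared.
     */
    public int sweep(Duration minAge) throws IOException {
        Set<String> referenced = new HashSet<>(table.values());
        Instant cutoff = Instant.now().minus(minAge);
        int removed = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(root, "*.xml")) {
            for (Path p : files) {
                boolean orphan = !referenced.contains(p.getFileName().toString());
                boolean oldEnough = !Files.getLastModifiedTime(p).toInstant().isAfter(cutoff);
                if (orphan && oldEnough) {
                    Files.delete(p);
                    removed++;
                }
            }
        }
        return removed;
    }

    public static void main(String[] args) throws IOException {
        IndirectStore store = new IndirectStore(Files.createTempDirectory("indirect"));
        store.put("order-1", "<v1/>");
        store.put("order-1", "<v2/>");                  // "update": v1's file is kept
        System.out.println(store.get("order-1"));       // prints <v2/>
        store.delete("order-1");                        // phase one
        System.out.println(store.sweep(Duration.ZERO)); // prints 2
    }
}
```

Rolling back an update or delete here just means restoring the previous row, because the old physical file is still on disk until a sweep runs.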

This is not as robust as writing a BLOB to a real transactional database, but it provides some robustness. Otherwise, you could have a look at commons-transaction, but I feel like the project is dead (2007).
