How to parse/unzip/unpack Maven repository indexes generated by Nexus
I have downloaded the index from http://mirrors.ibiblio.org/pub/mirrors/maven2/dot-index/nexus-maven-repository-index.gz
I would like to list the artifact information (for example groupId, artifactId, version) contained in these index files. I have read that there is a high-level API for that, and it seems that I have to use the following Maven dependency. However, I don't know what the entry point is (which class?) or how to use it to access those files:
<dependency>
    <groupId>org.sonatype.nexus</groupId>
    <artifactId>nexus-indexer</artifactId>
    <version>3.0.4</version>
</dependency>
Recommended answer
Have a look at the https://github.com/cstamas/maven-indexer-examples project.
In short: you don't need to download the GZ/ZIP (new/legacy format) manually; the indexer takes care of that for you (moreover, it will also handle incremental updates for you, if possible).
GZ is the "new" format: it is independent of the Lucene index format (and hence of the Lucene version) and contains data only. The ZIP is the "old" format, which is actually a plain Lucene 2.4.x index, zipped up. Currently the data content of the two is the same, but changes are planned for the future.
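Letting the indexer fetch and unpack the index is the recommended route. If you only want to peek into the data-only GZ file without pulling in Lucene, here is a minimal, dependency-free Java sketch. The byte layout it assumes (a version byte, a timestamp, then length-prefixed documents of name/value fields) and the `u` field holding `groupId|artifactId|version|...` are based on my reading of the indexer's data writer, not on a documented contract, and `NexusIndexReader`, `forEachDocument` and `gav` are hypothetical names. Check it against the maven-indexer sources before relying on it.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.zip.GZIPInputStream;

// Hypothetical helper: reads the "new" GZ index format. ASSUMED layout:
//   byte version (expected 1), long timestamp, then until EOF documents of:
//     int fieldCount, then fieldCount fields of:
//       byte flags, name (DataOutput.writeUTF), int length, value bytes
public class NexusIndexReader {

    // Streams each document to the sink as a field-name -> value map, so the
    // whole index never has to be held in memory. Closes the given stream.
    public static void forEachDocument(InputStream gz, Consumer<Map<String, String>> sink)
            throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new GZIPInputStream(gz)))) {
            int version = in.readUnsignedByte();
            if (version != 1) {
                throw new IOException("Unsupported index data version: " + version);
            }
            in.readLong(); // index timestamp, unused here
            while (true) {
                int fieldCount;
                try {
                    fieldCount = in.readInt();
                } catch (EOFException endOfData) {
                    return; // no more documents
                }
                Map<String, String> doc = new LinkedHashMap<>();
                for (int i = 0; i < fieldCount; i++) {
                    in.readByte(); // flags (indexed/tokenized/stored), ignored here
                    String name = in.readUTF();
                    byte[] value = new byte[in.readInt()];
                    in.readFully(value);
                    // Assumption: values are (modified) UTF-8; decoding as standard
                    // UTF-8 gives the same result for ordinary coordinates and text.
                    doc.put(name, new String(value, StandardCharsets.UTF_8));
                }
                sink.accept(doc);
            }
        }
    }

    // Assumption: artifact documents carry a "u" field shaped like
    // groupId|artifactId|version|classifier[|extension]. Returns {g, a, v},
    // or null for documents without it (e.g. descriptor or group-list documents).
    public static String[] gav(Map<String, String> doc) {
        String u = doc.get("u");
        if (u == null) {
            return null;
        }
        String[] parts = u.split("\\|", -1);
        return parts.length >= 3 ? new String[] {parts[0], parts[1], parts[2]} : null;
    }

    // Usage: java NexusIndexReader nexus-maven-repository-index.gz
    public static void main(String[] args) throws IOException {
        forEachDocument(new FileInputStream(args[0]), doc -> {
            String[] gav = gav(doc);
            if (gav != null) {
                System.out.println(String.join(":", gav));
            }
        });
    }
}
```

Passing each document to a `Consumer` instead of collecting them into a list keeps memory use flat, which matters because the Central index is very large.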
As I said, there is no difference in data content between the two, but some fields (as you noticed) are indexed but not stored in the index. Hence, if you consume the ZIP format, those fields will be searchable but not retrievable.