Best way to implement a tag-based search system with JCR (e.g., ModeShape)

2022-01-18 00:00:00 tags java jcr modeshape

I need a tag-based search system in a JCR repository such as ModeShape. I want to search for nodes by tag. The question is: what is the best way to implement it?

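  1. Add a new node type and mixin for tags? If so, where can I define the tag names that users will see?
  2. Implement a hierarchy of tags and reference them from my nodes? If so, how do I reference them?
  3. Some other way?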

Recommended answer

There are several ways to implement tags in JCR. Which option you pick will depend upon the needs of your own application(s). Here are four options I know of.

Option 1: Use mixins

Define for each tag a mixin node type definition that is a marker mixin (it has no property definitions or child node definitions), registering them dynamically using the NodeTypeManager. Then when you want to "tag" a node, simply add to that node the mixin that represents the tag. Any node could have multiple tags, and you could query for all the nodes that have a particular tag.

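(In the rest of this answer, "acme" is used as a generic namespace. You should replace it with a namespace appropriate for your own application and organization.)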

For example, given a tag "acme:tag1", you could find all nodes that have this tag with the simple query:

SELECT * FROM [acme:tag1]

The disadvantage of this approach is that maintaining tags is cumbersome. Creating new tags requires registering new node types. You cannot easily rename tags, but instead would have to create the mixin for the tag with the new name; find all nodes that have the mixin representing the old tag, remove the old mixin, and add the new one; and finally remove the node type definition for the old tag (after it is no longer used anywhere). Removing old tags is done in a similar manner. Another disadvantage is that it is not easy to associate additional metadata (e.g., display name) with a tag, since extra properties aren't allowed on node type definitions.

This approach should perform quite well.
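To make the cost of the rename procedure described above concrete, here is a plain-Java, in-memory model of its three steps. It does not touch a repository: `MixinTagModel` and its methods are hypothetical names, and the sets merely stand in for the registered mixin types and the mixins on each node. In a real repository the steps would roughly correspond to `NodeTypeManager.registerNodeType(...)`, then `Node.removeMixin(...)` and `Node.addMixin(...)` on every node returned by `SELECT * FROM [acme:tag1]`, then `NodeTypeManager.unregisterNodeType(...)`.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/**
 * In-memory model of Option 1 (not JCR code): each tag is a marker mixin,
 * and a node is "tagged" when its set of mixins contains that name.
 */
class MixinTagModel {
    // Stands in for the mixin types registered with the NodeTypeManager.
    final Set<String> registeredMixins = new LinkedHashSet<>();
    // Node path -> names of the mixins applied to that node.
    final Map<String, Set<String>> nodeMixins = new HashMap<>();

    void registerTag(String mixin) {
        registeredMixins.add(mixin);
    }

    // Only registered mixins can be added, as in a real repository.
    void tag(String path, String mixin) {
        if (!registeredMixins.contains(mixin)) {
            throw new IllegalArgumentException("mixin not registered: " + mixin);
        }
        nodeMixins.computeIfAbsent(path, p -> new LinkedHashSet<>()).add(mixin);
    }

    /** Renames a tag the only way Option 1 allows; returns how many nodes were touched. */
    int renameTag(String oldMixin, String newMixin) {
        registerTag(newMixin);                  // 1. register the mixin for the new name
        int touched = 0;
        for (Set<String> mixins : nodeMixins.values()) {
            if (mixins.remove(oldMixin)) {      // 2. swap old for new on every tagged node
                mixins.add(newMixin);
                touched++;
            }
        }
        registeredMixins.remove(oldMixin);      // 3. old mixin is now unused, so drop it
        return touched;
    }
}
```

The work in step 2 grows with the number of tagged nodes, which is exactly why renaming is cumbersome with this option.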

Option 2: Use a taxonomy and strong references

In this approach, you would create a simple node structure in an area of the repository into which you can create a node for each tag (e.g., a taxonomy). On this node you could set properties that describe the tag (e.g., display name); these properties can be changed at any time (e.g., to rename the tag).

Then to "apply" the tag to a node, you simply have to create some sort of relationship to the tag. One way is to define a mixin node type that contains an "acme:tags" multi-valued property of type REFERENCE. When you want to apply one or more tags to a node, simply add the mixin to the node and set the "acme:tags" property to reference the tag node(s).

To find all nodes of a particular tag, you can call "getReferences()" on a tag node to find all of the nodes that contain a reference to the tag node.
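As a rough illustration of how Option 2 behaves, the following plain-Java model (not JCR code; `StrongRefTagModel` and its members are hypothetical) keeps tag nodes in a taxonomy map and stores tag identifiers in each node's "acme:tags" set. `referrers(...)` plays the role of `getReferences()`, and `removeTag(...)` mirrors the JCR rule that a node still targeted by a REFERENCE property cannot be removed (the repository signals this with `javax.jcr.ReferentialIntegrityException`). The model simply scans every node; it says nothing about how ModeShape tracks references internally. With the real API, each reference value would be created with `session.getValueFactory().createValue(tagNode)`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * In-memory model of Option 2 (not JCR code): tag nodes live in a taxonomy,
 * and tagged nodes hold a multi-valued "acme:tags" set of strong references.
 */
class StrongRefTagModel {
    // Tag identifier -> display name (metadata that can change freely, e.g. to rename a tag).
    final Map<String, String> taxonomy = new HashMap<>();
    // Tagged node path -> tag identifiers stored in its "acme:tags" property.
    final Map<String, Set<String>> acmeTags = new HashMap<>();

    // A reference can only point at a tag node that exists.
    void applyTag(String path, String tagId) {
        if (!taxonomy.containsKey(tagId)) {
            throw new IllegalArgumentException("no such tag node: " + tagId);
        }
        acmeTags.computeIfAbsent(path, p -> new LinkedHashSet<>()).add(tagId);
    }

    /** Analogue of getReferences(): every node whose "acme:tags" points at tagId. */
    List<String> referrers(String tagId) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : acmeTags.entrySet()) {
            if (e.getValue().contains(tagId)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** Strong references block removal of a tag that is still in use. */
    void removeTag(String tagId) {
        if (!referrers(tagId).isEmpty()) {
            throw new IllegalStateException("tag still referenced: " + tagId);
        }
        taxonomy.remove(tagId);
    }
}
```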

This approach has the benefit that all tags have to be controlled/managed within one or more taxonomies (perhaps including user-specific taxonomies). However, there are some disadvantages, too. First and foremost, the performance of REFERENCE properties might not be great. Some JCR implementations discourage the use of REFERENCE properties altogether. ModeShape does not, but ModeShape's REFERENCE performance may start to degrade when lots of nodes contain references to the same node (e.g., lots of nodes with a single tag).

Option 3: Use a taxonomy and weak references

This option is a hybrid similar to Option 2 above, except that the "acme:tags" property would be of type WEAKREFERENCE instead of REFERENCE. You would still define and manage one or more taxonomies. To find nodes with a particular tag, you can't use the "getReferences()" method on the tag node (it only returns REFERENCE properties, not WEAKREFERENCE ones), but you can easily do this with a query:

SELECT * FROM [acme:taggable] AS taggable
JOIN [acme:tag] AS tag ON taggable.[acme:tags] = tag.[jcr:uuid]
WHERE LOCALNAME(tag) = 'tag1'

This approach still uses one or more taxonomies, which makes it a bit easier to control the tags, since they must exist in a taxonomy before they can be used. Renaming and removing tags is also easier. Performance-wise, this is better than the REFERENCE approach, since WEAKREFERENCE properties perform better with large numbers of references, regardless of whether they all point to one node or to many.

The disadvantage is that you can remove a tag even while it is still in use, and the WEAKREFERENCE values on nodes that pointed to the removed tag will then no longer resolve to anything. This can be remedied with some conventions in your application, or simply by using metadata on the taxonomy to mark a particular tag as "deprecated" so that it isn't used anymore. (IMO, the latter is actually a benefit of this approach.)
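The contrast with strong references can be sketched with a similar plain-Java model (hypothetical names, not the JCR API): removing a tag is allowed, values that pointed at it are left dangling and skipped when resolving, and a `deprecated` set stands in for the "deprecated" metadata on the taxonomy. With the real API, a weak value would be created with `ValueFactory.createValue(tagNode, true)`, where `true` requests a WEAKREFERENCE.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** In-memory model of Option 3 (not JCR code): weak references to taxonomy tags. */
class WeakRefTagModel {
    final Map<String, String> taxonomy = new HashMap<>();      // tag id -> display name
    final Set<String> deprecated = new HashSet<>();            // tags that should no longer be applied
    final Map<String, Set<String>> acmeTags = new HashMap<>(); // node path -> weak tag ids

    // Application convention: only existing, non-deprecated tags may be applied.
    void applyTag(String path, String tagId) {
        if (!taxonomy.containsKey(tagId) || deprecated.contains(tagId)) {
            throw new IllegalArgumentException("tag unknown or deprecated: " + tagId);
        }
        acmeTags.computeIfAbsent(path, p -> new LinkedHashSet<>()).add(tagId);
    }

    /** Weak references do not block removal; referring values are left dangling. */
    void removeTag(String tagId) {
        taxonomy.remove(tagId);
    }

    /** Resolves a node's tags to display names, skipping values whose target is gone. */
    List<String> resolvedTags(String path) {
        List<String> names = new ArrayList<>();
        for (String id : acmeTags.getOrDefault(path, Collections.emptySet())) {
            String name = taxonomy.get(id);
            if (name != null) {
                names.add(name);
            }
        }
        return names;
    }

    /** Counts stored values that point at tags which no longer exist. */
    int danglingCount() {
        int count = 0;
        for (Set<String> ids : acmeTags.values()) {
            for (String id : ids) {
                if (!taxonomy.containsKey(id)) {
                    count++;
                }
            }
        }
        return count;
    }
}
```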

This option will generally perform and scale much better than Option 2.

Option 4: Use string properties

Yet another approach is simply to use a STRING property to tag each node with the name(s) of the tag(s) to be applied. For example, you could define a mixin (e.g., "acme:taggable") that defines a multi-valued STRING property named "acme:tags". When you want to tag a node, add the mixin (if it isn't already present) and add the tag's name as a value of the "acme:tags" property (again, only if it isn't already one of the values).
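The "add only if not already present" step is easy to get subtly wrong, so here is a small plain-Java helper for it. It assumes the current values have already been read from the property (for example via `Property.getValues()` and `Value.getString()`) and that the result is written back with `Node.setProperty(String, String[])`. `StringTags` is a hypothetical name, not part of any JCR library.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helpers for Option 4's multi-valued STRING "acme:tags" property. */
final class StringTags {
    private StringTags() {}

    /**
     * Returns the values with tag appended unless it is already present.
     * Existing order is kept; any duplicate values already stored are collapsed.
     */
    static String[] withTag(String[] current, String tag) {
        Set<String> values = new LinkedHashSet<>(Arrays.asList(current));
        values.add(tag);
        return values.toArray(new String[0]);
    }

    /** Returns the values with every occurrence of tag removed. */
    static String[] withoutTag(String[] current, String tag) {
        return Arrays.stream(current)
                .filter(v -> !v.equals(tag))
                .toArray(String[]::new);
    }
}
```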

The primary advantage of this approach is that it is very simple: you're simply using string values on the node that is to be tagged. To find all nodes that are tagged with a particular tag (e.g., "tag1"), simply issue a query:

SELECT * 
FROM [acme:taggable] AS taggable 
WHERE taggable.[acme:tags] = 'tag1'

Management of the tags is easy: there is no management. If a tag is to be renamed, update that value on every node that uses it. If a tag is to be deleted (and removed from the nodes that are tagged with it), remove the value from the "acme:tags" properties (perhaps in a background job).
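A rename pass over those values might look like the following plain-Java sketch (hypothetical `TagRenamer`, no JCR calls). It assumes the caller has already loaded the "acme:tags" values of the affected nodes, for instance the results of the query above run with the old tag name, into a map keyed by node path, and it returns the paths whose values must be written back and saved. Deleting a tag is the same loop with matching values dropped instead of replaced.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical rename pass over Option 4 tag values (plain Java, no JCR calls). */
final class TagRenamer {
    private TagRenamer() {}

    /**
     * Replaces oldTag with newTag in every node's values, updating the map in place.
     * A node that already had newTag ends up with it only once (duplicates collapse).
     * Returns the paths whose values changed, i.e. the nodes to update and save.
     */
    static List<String> rename(Map<String, String[]> tagsByPath, String oldTag, String newTag) {
        List<String> changed = new ArrayList<>();
        for (Map.Entry<String, String[]> e : tagsByPath.entrySet()) {
            Set<String> values = new LinkedHashSet<>();
            boolean hit = false;
            for (String v : e.getValue()) {
                if (v.equals(oldTag)) {
                    values.add(newTag);
                    hit = true;
                } else {
                    values.add(v);
                }
            }
            if (hit) {
                e.setValue(values.toArray(new String[0]));  // not a structural change, safe mid-iteration
                changed.add(e.getKey());
            }
        }
        return changed;
    }
}
```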

Note that this allows any tag name to be used, and thus works best for cases where the tag names are not controlled at all. If you want to control the list of strings used as tag values, simply create a taxonomy in the repository (as described in Options 2 and 3 above) and have your application limit the values to those in the taxonomy. You can even have multiple taxonomies, some of which are perhaps user-specific. But this approach doesn't have quite the same control as Options 2 or 3.
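With this option, restricting the values to a taxonomy is purely an application-side check. A minimal sketch, assuming the allowed names have already been loaded from the taxonomy node(s) into sets (the loading is omitted, and `TagPolicy` is a hypothetical name): a tag is accepted if it is in the shared taxonomy or in the current user's own taxonomy.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical application-side check limiting Option 4 tag strings to taxonomies. */
final class TagPolicy {
    private final Set<String> shared;                                  // names from the shared taxonomy
    private final Map<String, Set<String>> perUser = new HashMap<>();  // user -> personal taxonomy

    TagPolicy(Set<String> shared) {
        this.shared = new HashSet<>(shared);
    }

    void addUserTag(String user, String tag) {
        perUser.computeIfAbsent(user, u -> new HashSet<>()).add(tag);
    }

    /** True if the tag may be written to "acme:tags" on behalf of this user. */
    boolean isAllowed(String user, String tag) {
        return shared.contains(tag)
                || perUser.getOrDefault(user, Collections.emptySet()).contains(tag);
    }
}
```

Because the check lives only in application code, another client writing directly to "acme:tags" could bypass it, which is the sense in which this option offers less control than Options 2 or 3.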

This option will perform a bit better than Option 3 (since the queries are simpler), and will scale just as well.