JAXB: converting non-ASCII characters to ASCII characters

2022-01-09 00:00:00 locale xsd java jaxb

I have some XSD schemas whose element names contain non-ASCII characters. When I generate Java classes with the Generate JAXB Classes command in Eclipse Kepler, the generated classes and their variables contain non-ASCII characters. I want to transform these non-ASCII characters into ASCII characters.

I have already set the locale in JAVA_TOOL_OPTIONS:

-Duser.country=GB -Duser.language=en

For example

İ -> I
Ç -> C
Ş -> S
Ö -> O
Ğ -> G
Ü -> U
ı -> i
ö -> o
ü -> u
ç -> c
ğ -> g
ş -> s
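If only this fixed set of letters matters, the mapping above can be applied with a plain lookup table. The following is a minimal sketch that is not part of the question or the answer. The class name TurkishToAscii is hypothetical. The letters are written as Unicode escapes so the file compiles regardless of the platform's default source encoding.

```java
/**
 * Hypothetical helper: replaces each Turkish letter from the list above with
 * its ASCII counterpart and leaves every other character unchanged.
 */
final class TurkishToAscii {

    // FROM holds the listed letters as Unicode escapes, in this order: dotted
    // capital I, dotless i, C/c cedilla, S/s cedilla, O/o umlaut, G/g breve,
    // U/u umlaut. Each maps to the character at the same index in TO.
    private static final String FROM =
            "\u0130\u0131\u00C7\u00E7\u015E\u015F\u00D6\u00F6\u011E\u011F\u00DC\u00FC";
    private static final String TO = "IiCcSsOoGgUu";

    static String transliterate(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            int idx = FROM.indexOf(c);
            // Replace mapped letters; copy everything else as-is
            sb.append(idx >= 0 ? TO.charAt(idx) : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(transliterate("\u015Eh\u0130pto"));  // ShIpto
        System.out.println(transliterate("\u00C7o\u00DCntry")); // CoUntry
    }
}
```

Unlike a Unicode-normalization approach, a table like this only covers the letters it lists, so any other non-ASCII character passes through unchanged.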

Solution

EDIT: Since the requirement is for a generic solution that does not use external binding files, I have offered two options below:

Option 1 - A Generic Solution - Create a Custom XJC plugin to normalize

The generic solution is essentially to:

  1. Extend the com.sun.tools.xjc.Plugin abstract class and override the methods JAXB uses to name the artifacts (basically, create a plugin)
  2. Pack this implementation in a jar, declaring the implementation's name in the services directory of the jar's META-INF folder
  3. Deploy this newly created jar along with the JAXB libs and run it through Ant (build.xml provided below, read on)

For your purpose, I have created the plugin: you can download the jar from here and the Ant script (build.xml) from here. Put the jar on your build path in Eclipse, edit the Ant file to provide the locations of your JAXB libs, the target package of the generated classes, the project name and the schema location, and run it. That's it!

Explanation:

I created a custom XJC plugin with an extra command-line option, -normalize, that replaces the accented characters in the generated Java classes, methods, variables, properties and interfaces with their ASCII equivalents.

XJC supports custom plugins that control the names, annotations and other attributes of the generated classes, variables and so on. This blog post, though old, can get you started with the basics of such plugin implementations.

Long story short, I created a class extending the abstract com.sun.tools.xjc.Plugin class and overrode its methods, the most important one being onActivated.

In this method, I pass a custom class to com.sun.tools.xjc.Options#setNameConverter; that class overrides the methods used to derive the names of classes, methods, etc. I have also committed the source to my git repo here. Below is the detailed usage:

package com.popofibo.plugins.jaxb;

import java.text.Normalizer;

import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;

import com.sun.tools.xjc.BadCommandLineException;
import com.sun.tools.xjc.Options;
import com.sun.tools.xjc.Plugin;
import com.sun.tools.xjc.outline.Outline;
import com.sun.xml.bind.api.impl.NameConverter;

/**
 * {@link Plugin} that normalizes the names of JAXB-generated artifacts
 * 
 * @author popofibo
 */
public class NormalizeElements extends Plugin {

    /**
     * Set the command line option
     */
    @Override
    public String getOptionName() {
        return "normalize";
    }

    /**
     * Usage content of the option
     */
    @Override
    public String getUsage() {
        return "  -normalize    :  normalize the classes and method names generated by removing the accented characters";
    }

    /**
     * Set the name converter option to a custom subclass of
     * NameConverter.Standard
     */
    @Override
    public void onActivated(Options opts) throws BadCommandLineException {
        opts.setNameConverter(new NonAsciiConverter(), this);
    }

    /**
     * Always return true
     */
    @Override
    public boolean run(Outline model, Options opt, ErrorHandler errorHandler)
            throws SAXException {
        return true;
    }

}

/**
 * 
 * @author popofibo
 * 
 */
class NonAsciiConverter extends NameConverter.Standard {

    /**
     * Override the generated class name
     */
    @Override
    public String toClassName(String s) {
        String origStr = super.toClassName(s);
        return normalize(origStr);
    }

    /**
     * Override the generated property name
     */
    @Override
    public String toPropertyName(String s) {
        String origStr = super.toPropertyName(s);
        return normalize(origStr);
    }

    /**
     * Override the generated variable name
     */
    @Override
    public String toVariableName(String s) {
        String origStr = super.toVariableName(s);
        return normalize(origStr);
    }

    /**
     * Override the generated interface name
     */
    @Override
    public String toInterfaceName(String s) {
        String origStr = super.toInterfaceName(s);
        return normalize(origStr);
    }

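    /**
     * Decompose the accented characters with the Normalizer's canonical
     * decomposition (NFD), then strip everything outside the ASCII range
     * (the separated combining marks) using the negated POSIX character
     * class \p{ASCII}
     * 
     * @param accented the name to normalize
     * @return normalized String
     */
    private String normalize(String accented) {
        // Dotless i has no canonical decomposition, so NFD alone would
        // delete it; map it to a plain i first
        String normalized = accented.replace('\u0131', 'i');
        normalized = Normalizer.normalize(normalized, Normalizer.Form.NFD);
        // Backslash must be escaped in the Java string literal
        normalized = normalized.replaceAll("[^\\p{ASCII}]", "");
        return normalized;
    }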
}
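Because NonAsciiConverter depends on XJC classes, it can help to check the normalization on its own first. The sketch below is a hypothetical NormalizeCheck class that needs only the JDK. It applies the same NFD-and-strip logic to the element names from the sample schema used later in this answer, and it checks that each result is a valid Java identifier. One caveat is my own addition: dotless ı (U+0131) has no canonical decomposition, so pure NFD stripping would delete it instead of turning it into i. For that reason, the sketch maps it explicitly first.

```java
import java.text.Normalizer;

/**
 * Hypothetical standalone check (no XJC needed) of the NFD-and-strip
 * normalization used by NonAsciiConverter.
 */
final class NormalizeCheck {

    static String normalize(String accented) {
        // Dotless i has no canonical decomposition, so map it explicitly;
        // otherwise the ASCII filter below would simply delete it
        String s = accented.replace('\u0131', 'i');
        // NFD splits letters such as S-cedilla into S plus a combining cedilla
        s = Normalizer.normalize(s, Normalizer.Form.NFD);
        // The combining marks are non-ASCII, so this removes them
        return s.replaceAll("[^\\p{ASCII}]", "");
    }

    /** True if the string is non-empty and a syntactically valid Java identifier. */
    static boolean isJavaIdentifier(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!Character.isJavaIdentifierPart(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Element names from the sample schema, written as Unicode escapes
        String[] names = {
            "\u00D6rderperson", "\u015Eh\u0130pto", "\u00C7ity",
            "\u00C7o\u00DCntry", "\u0130tem", "q\u00DCantity"
        };
        for (String name : names) {
            String ascii = normalize(name);
            System.out.println(ascii + " valid=" + isJavaIdentifier(ascii));
        }
    }
}
```

Running main prints Orderperson, ShIpto, City, CoUntry, Item and qUantity, each followed by valid=true. In the real plugin, the names first pass through NameConverter.Standard, which adjusts their case before normalize runs.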

To enable this plugin during normal XJC class generation, pack these classes in a jar, add a /META-INF/services/com.sun.tools.xjc.Plugin file inside the jar, and put the jar on your build path.

The /META-INF/services/com.sun.tools.xjc.Plugin file inside the jar contains the fully qualified name of the plugin class:

com.popofibo.plugins.jaxb.NormalizeElements
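The "Provider ... not a subtype" message described next comes from java.util.ServiceLoader. It is raised when the class named in a services file does not extend the service type as loaded by the current class loader. A likely cause here, though this is my assumption and not stated in the answer, is that more than one copy of com.sun.tools.xjc.Plugin is visible, so the plugin extends a different copy from the one XJC looks up. The hypothetical diagnostic below lists every jar that registers XJC plugins and every copy of the Plugin base class. Run it with the same classpath XJC uses, such as the Ant taskdef classpath.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical diagnostic: shows which XJC plugin registrations and which
 * copies of the Plugin base class are visible on the classpath.
 */
final class PluginRegistrations {

    static final String SERVICE_FILE = "META-INF/services/com.sun.tools.xjc.Plugin";
    static final String PLUGIN_CLASS = "com/sun/tools/xjc/Plugin.class";

    /** Parses service-file content: one class name per line, '#' starts a comment. */
    static List<String> parseProviders(String content) {
        List<String> names = new ArrayList<>();
        for (String line : content.split("\\r?\\n")) {
            int hash = line.indexOf('#');
            String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    /** Reads a whole resource as UTF-8 text, the encoding service files use. */
    static String read(URL url) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        ClassLoader loader = PluginRegistrations.class.getClassLoader();
        // Every jar that registers XJC plugins, with the class names it declares
        for (URL url : Collections.list(loader.getResources(SERVICE_FILE))) {
            System.out.println(url + " -> " + parseProviders(read(url)));
        }
        // Every copy of the Plugin base class; more than one is suspicious
        for (URL url : Collections.list(loader.getResources(PLUGIN_CLASS))) {
            System.out.println("Plugin class found at " + url);
        }
    }
}
```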

As mentioned before, I packed it in a jar and deployed it on my Eclipse build path. When I ran it with Eclipse Kepler on JDK 1.7, I got this exception (message):

com.sun.tools.xjc.plugin Provider <my class> not a subtype

Hence, it is better to generate the classes using Ant. The following build.xml does the job:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<project name="SomeProject" default="createClasses">

    <taskdef name="xjc" classname="com.sun.tools.xjc.XJC2Task">
        <classpath>
            <pathelement
                path="C:/Workspace/jaxb-ri-2.2.7/jaxb-ri-2.2.7/lib/jaxb-xjc.jar" />
            <pathelement
                path="C:/Workspace/jaxb-ri-2.2.7/jaxb-ri-2.2.7/lib/jaxb-impl.jar" />
            <pathelement
                path="C:/Workspace/jaxb-ri-2.2.7/jaxb-ri-2.2.7/lib/jaxb2-value-constructor.jar" />
            <pathelement path="C:/Workspace/normalizeplugin_xjc_v0.4.jar" />
        </classpath>
    </taskdef>

    <target name="clean">
        <delete dir="src/com/popofibo/jaxb" />
    </target>

    <target name="createClasses" depends="clean">
        <xjc schema="res/some.xsd" destdir="src" package="com.popofibo.jaxb"
            encoding="UTF-8">
            <arg value="-normalize" />
        </xjc>
    </target>
</project>

The schema I chose to showcase this normalization process was:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name="shiporder">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Örderperson" type="xs:string"/>
      <xs:element name="Şhİpto">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="name" type="xs:string"/>
            <xs:element name="address" type="xs:string"/>
            <xs:element name="Çity" type="xs:string"/>
            <xs:element name="ÇoÜntry" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="İtem" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="title" type="xs:string"/>
            <xs:element name="note" type="xs:string" minOccurs="0"/>
            <xs:element name="qÜantity" type="xs:positiveInteger"/>
            <xs:element name="price" type="xs:decimal"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
    <xs:attribute name="orderid" type="xs:string" use="required"/>
  </xs:complexType>
</xs:element>

</xs:schema> 

As you can see, I have set the -normalize argument and the package where I want my classes generated, and voilà: the generated artifacts have ASCII names for their classes, methods and variables. The only gap I see is in the XML annotations, which does not affect the goal and is easy to overcome.

The above screenshot shows that the names were normalized and replaced by their ASCII counterparts (to see how they look without the replacement, refer to the screenshots in Option 2).

Option 2 - Using External binding file

To remove accented characters, you can create a custom binding file and use it to bind your class and property names while generating your classes. Refer to: Creating an External Binding Declarations File Using JAXB Binding Declarations

I took the XSD already mentioned in Option 1, whose element names contain "accented" (non-ASCII) characters:

If I generate the classes without specifying the external binding, I get the following outputs:


Now, to generate class names and variables of my choice, I write my binding.xml as:

<jxb:bindings xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:jxb="http://java.sun.com/xml/ns/jaxb" version="2.1">
    <jxb:globalBindings localScoping="toplevel" />

    <jxb:bindings schemaLocation="some.xsd">
        <jxb:bindings node="//xs:element[@name='Şhİpto']">
            <jxb:class name="ShipTo" />
        </jxb:bindings>
        <jxb:bindings node="//xs:element[@name='Örderperson']">
            <jxb:property name="OrderPerson" />
        </jxb:bindings>
        <jxb:bindings node="//xs:element[@name='Şhİpto']//xs:complexType">
            <jxb:class name="ShipToo" />
        </jxb:bindings>
    </jxb:bindings>

</jxb:bindings>
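With many accented names, writing these entries by hand gets tedious. The sketch below is a hypothetical helper, not part of the original answer. It generates a jxb:property entry for each simple-typed element and derives the ASCII name with the same NFD-and-strip idea as in Option 1. Elements that become classes, such as Şhİpto and İtem, still need jxb:class customizations like the ones above. Capitalizing only the first letter is a guess, so adjust the names to taste.

```java
import java.text.Normalizer;

/**
 * Hypothetical helper: emits one jxb:bindings property customization per
 * accented element name.
 */
final class BindingEntries {

    static String toAscii(String name) {
        // Map dotless i explicitly, then drop the combining marks NFD separates out
        String s = Normalizer.normalize(name.replace('\u0131', 'i'), Normalizer.Form.NFD);
        return s.replaceAll("[^\\p{ASCII}]", "");
    }

    /** Builds an entry in the same shape as the OrderPerson binding above. */
    static String propertyBinding(String elementName) {
        String ascii = toAscii(elementName);
        if (ascii.isEmpty()) {
            throw new IllegalArgumentException("No ASCII characters left in " + elementName);
        }
        String propertyName = Character.toUpperCase(ascii.charAt(0)) + ascii.substring(1);
        return "<jxb:bindings node=\"//xs:element[@name='" + elementName + "']\">\n"
             + "    <jxb:property name=\"" + propertyName + "\" />\n"
             + "</jxb:bindings>";
    }

    public static void main(String[] args) {
        // Simple-typed elements from the sample schema, as Unicode escapes
        String[] names = { "\u00C7ity", "\u00C7o\u00DCntry", "q\u00DCantity" };
        for (String name : names) {
            System.out.println(propertyBinding(name));
        }
    }
}
```

Paste the printed entries inside the `<jxb:bindings schemaLocation="some.xsd">` element. Write the file as UTF-8 so that the XPath expressions still match the accented names.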

Now I generate my classes through Eclipse, specifying the binding file:

In the next steps, I choose the package and the binding file, and I get:

Note: If you are not using Eclipse to generate your classes, you might want to check out the xjc binding compiler to use your external binding file.
