Oracle Text 不适用于 NVARCHAR2.还有什么可能不可用?

我们将迁移应用程序以使其支持 Unicode,并且必须在整个数据库的 unicode 字符集或存储在 N[VAR]CHAR2 中的 unicode 列之间进行选择.

We are going to migrate an application to have it support Unicode and have to choose between unicode character set for the whole database, or unicode columns stored in N[VAR]CHAR2.

我们知道,如果我们选择 NVARCHAR2,我们将不再有可能使用 Oracle Text 对列内容进行索引,因为 Oracle Text 只能基于 CHAR 类型对列进行索引.

We know that we will no more have the possibility of indexing column contents with Oracle Text if we choose NVARCHAR2, because Oracle Text can only index columns based on the CHAR type.

除此之外,从 Oracle 的可能性中获取数据时是否可能出现其他主要差异?

Apart that, is it likely that other major differences arise when harvesting from Oracle possibilities?

另外,是否有可能在较新版本的 Oracle 中添加了一些新功能,但仅支持 CHAR 列或 NCHAR 列而不支持两者?

Also, is it likely that some new features are added in newer versions of Oracle, but only supporting either CHAR columns or NCHAR columns but not both?

感谢您的回答.

感谢您的回答.我将讨论您的观点,适用于我们的案例:

Thank you for your answer. I will discuss your points, applied to our case:

我们的应用程序通常单独存在于 Oracle 数据库中,负责处理数据本身.其他连接数据库的软件仅限于Toad,Tora 或 SQL 开发人员.

Our application is usually alone on the Oracle database and takes care of the data itself. Other software that connect to the database are limited to Toad, Tora or SQL developer.

我们还使用 SQL*Loader 和 SQL*Plus 与数据库进行基本通信声明或在产品的版本之间升级.我们有没有听说有关 NVARCHAR2 的所有这些软件有任何具体问题.

We also use SQL*Loader and SQL*Plus to communicate with the database for basic statements or to upgrade between versions of the product. We have not heard of any specific problem with all those software regarding NVARCHAR2.

我们也不知道我们客户中的数据库管理员会喜欢在不支持数据的数据库上使用其他工具NVARCHAR2,我们并不真正担心他们的工具是否会破坏,毕竟他们在自己的工作中很熟练,必要时可能会找到其他工具.

We are also not aware that database administrators among our customers would like to use other tools on the database that could not support data on NVARCHAR2 and we are not really concerned whether their tools might disrupt, after all they are skilled in their job and may find other tools if necessary.

您的最后两点对我们的案例更有见地.我们不使用很多来自 Oracle 的内置包,但它仍然发生.我们将探讨问题.

Your last two points are more insightful for our case. We do not use many built-in packages from Oracle but it still happens. We will explore that problem.

如果我们的应用程序(在 Visual C++ 下编译)使用 wchar_t存储 UTF-16,必须对所有处理过的数据进行编码转换?

Could we also expect performance breakage if our application (that is compiled under Visual C++), that uses wchar_t to store UTF-16, has to perform encoding conversions on all processed data?

推荐答案

如果您有任何选择,请为整个数据库使用 Unicode 字符集.一般来说,生活就是这样简单得多.

If you have anything close to a choice, use a Unicode character set for the entire database. Life in general is just blindingly easier that way.

  • 有很多第三方实用程序和库根本不支持 NCHAR/NVARCHAR2 列,或者无法让使用 NCHAR/NVARCHAR2 列变得愉快.例如,当您闪亮的新报告工具无法报告您的 NVARCHAR2 数据时,这非常烦人.
  • 对于自定义应用程序,使用 NCHAR/NVARCHAR2 列需要跳过一些使用 CHAR/VARCHAR2 Unicode 编码列不需要的问题.例如,在 JDBC 代码中,您会不断调用 Statement.setFormOfUse 方法.其他语言和框架会有其他问题;有些会相对完整地记录在案,而其他次要的会相对模糊.
  • 许多内置包只接受(或返回)一个 VARCHAR2 而不是 NVARCHAR2.由于隐式转换,您仍然可以调用它们,但最终可能会遇到字符集转换问题.
  • 一般来说,能够避免数据库中的字符集转换问题并将这些问题归咎于数据库实际从客户端发送或接收数据的边缘,可以使开发应用程序的工作变得更加容易.调试由网络传输导致的字符集转换问题就足够了——当存储过程将来自 VARCHAR2 和 NVARCHAR2 的数据连接起来并将结果存储在 VARCHAR2 中,然后再通过网络发送时,可以确定某些数据已损坏太难受了.

Oracle 设计了 ​​NCHAR/NVARCHAR2 数据类型,适用于您尝试在与使用 Unicode 的新应用程序相同的数据库中支持不支持 Unicode 的遗留应用程序的情况,以及存储一些 Unicode 数据有益的情况使用不同的编码(即您有大量的日语数据,您更愿意使用 NVARCHAR2 中的 UTF-16 编码而不是 UTF-8 编码来存储这些数据).如果您不是这两种情况之一,而且听起来也不像,我会不惜一切代价避免使用 NCHAR/NVARCHAR2.

Oracle designed the NCHAR/ NVARCHAR2 data types for cases where you are trying to support legacy applications that don't support Unicode in the same database as new applications that are using Unicode and for cases where it is beneficial to store some Unicode data with a different encoding (i.e. you have a large amount of Japanese data that you would prefer to store using the UTF-16 encoding in a NVARCHAR2 rather than the UTF-8 encoding). If you are not in one of those two situations, and it doesn't sound like you are, I would avoid NCHAR/ NVARCHAR2 at all costs.

回复您的跟进

我们的应用程序通常是单独在Oracle 数据库并负责数据本身.其他软件连接到数据库仅限于Toad、Tora 或 SQL 开发人员.

Our application is usually alone on the Oracle database and takes care of the data itself. Other software that connect to the database are limited to Toad, Tora or SQL developer.

自己处理数据"是什么意思?我希望您不是说您已将应用程序配置为绕过 Oracle 的字符集转换例程,而是您自己完成所有字符集转换.

What do you mean "takes care of the data itself"? I'm hoping you're not saying that you've configured your application to bypass Oracle's character set conversion routines and that you do all the character set conversion yourself.

我还假设您正在使用某种 API/库来访问数据库,即使它是 OCI.您是否研究过需要对应用程序进行哪些更改以支持 NCHAR/NVARCHAR2 以及您使用的 API 是否支持 NCHAR/NVARCHAR2?您在 C++ 中获取 Unicode 数据这一事实实际上并不表示您不需要进行(可能很重要的)更改来支持 NCHAR/NVARCHAR2 列.

I'm also assuming that you are using some sort of API/ library to access the database even if that is OCI. Have you looked into what changes you'll need to make to your application to support NCHAR/ NVARCHAR2 and whether the API you're using supports NCHAR/ NVARCHAR2? The fact that you're getting Unicode data in C++ doesn't actually indicate that you won't need to make (potentially significant) changes to support NCHAR/ NVARCHAR2 columns.

我们还使用 SQL*Loader 和 SQL*Plus与数据库通信基本报表或升级之间产品的版本.我们还没有听说过所有人的任何具体问题那些关于 NVARCHAR2 的软件.

We also use SQL*Loader and SQL*Plus to communicate with the database for basic statements or to upgrade between versions of the product. We have not heard of any specific problem with all those software regarding NVARCHAR2.

那些应用程序都使用 NCHAR/NVARCHAR2.NCHAR/NVARCHAR2 向脚本中引入了一些额外的复杂性,尤其是当您尝试对无法在数据库字符集中表示的字符串常量进行编码时.不过,您当然可以解决这些问题.

Those applications all work with NCHAR/ NVARCHAR2. NCHAR/ NVARCHAR2 introduce some additional complexities into scripts particularly if you are trying to encode string constants that are not representable in the database character set. You can certainly work around the issues, though.

我们也不知道数据库我们客户中的管理员想使用其他工具无法支持数据的数据库在 NVARCHAR2 上,我们并不是真的担心他们的工具是否可能扰乱,毕竟他们擅长他们的工作,并且可能会找到其他工具,如果必要的.

We are also not aware that database administrators among our customers would like to use other tools on the database that could not support data on NVARCHAR2 and we are not really concerned whether their tools might disrupt, after all they are skilled in their job and may find other tools if necessary.

虽然我确信您的客户可以找到其他方法来处理您的数据,但如果您的应用程序不能很好地与他们的企业报告工具或企业 ETL 工具或他们碰巧使用过的任何桌面工具配合使用,客户很可能会责怪您的应用程序而不是他们的工具.它可能不会成为节目的终结者,但让客户不必要地悲伤也没有任何好处.这可能不会促使他们使用竞争对手的产品,但不会让他们渴望接受您的产品.

While I'm sure that your customers can find alternate ways of working with your data, if your application doesn't play nicely with their enterprise reporting tool or their enterprise ETL tool or whatever desktop tools they happen to be experienced with, it's very likely that the customer will blame your application rather than their tools. It probably won't be a show stopper, but there is also no benefit to causing customers grief unnecessarily. That may not drive them to use a competitor's product, but it won't make them eager to embrace your product.

我们还能期待性能吗如果我们的应用程序(即在 Visual C++ 下编译),使用wchar_t 存储 UTF-16,必须对所有执行编码转换处理过的数据?

Could we also expect performance breakage if our application (that is compiled under Visual C++), that uses wchar_t to store UTF-16, has to perform encoding conversions on all processed data?

我不确定您说的是什么转化".这可能会回到我最初的问题,即您是否在说明您正在绕过 Oracle 的 NLS 层自行进行字符集转换.

I'm not sure what "conversions" you're talking about. This may get back to my initial question about whether you're stating that you are bypassing Oracle's NLS layer to do character set conversion on your own.

不过,我的底线是,鉴于您所描述的内容,我认为使用 NCHAR/NVARCHAR2 没有任何优势.使用它们有很多潜在的缺点.即使您可以消除 99% 的与您的特定需求无关的缺点,但是,您仍然面临这样一种情况,即充其量只能在两种方法之间进行清洗.鉴于此,我更愿意采用最大限度提高灵活性的方法,即将整个数据库转换为 Unicode(大概是 AL32UTF8)并直接使用.

My bottom line, though, is that I don't see any advantages to using NCHAR/ NVARCHAR2 given what you're describing. There are plenty of potential downsides to using them. Even if you can eliminate 99% of the downsides as irrelevant to your particular needs, however, you're still facing a situation where at best it's a wash between the two approaches. Given that, I'd much rather go with the approach that maximizes flexibility going forward, and that's converting the entire database to Unicode (AL32UTF8 presumably) and just using that.

相关文章