Parsing currency into numbers in Python

2022-01-17 00:00:00 python numbers currency-formatting

Problem description

I just learned from "Format numbers as currency in Python" that the Python module babel provides babel.numbers.format_currency to format numbers as currency. For instance:

    from babel.numbers import format_currency

    s = format_currency(123456.789, 'USD', locale='en_US')  # u'$123,456.79'
    s = format_currency(123456.789, 'EUR', locale='fr_FR')  # u'123\xa0456,79\xa0\u20ac'

How about the reverse, from currency to numbers, such as $123,456,789.00 --> 123456789? babel provides babel.numbers.parse_number to parse local numbers, but I didn't find anything like parse_currency. So, what is the ideal way to parse local currency into numbers?

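I went through "Python: removing characters except digits from string":

    # Way 1 (Python 2 only: string.maketrans and two-argument str.translate)
    import string
    all = string.maketrans('', '')
    nodigs = all.translate(all, string.digits)
    s = '$123,456.79'
    n = s.translate(all, nodigs)    # 12345679, lost `.`

    # Way 2
    import re
    n = re.sub(r"\D", "", s)        # 12345679

Neither approach keeps the decimal separator '.'.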

I also tried "Remove all non-numeric characters, except for '.', from a string":

    import re

    # Way 1:
    s = '$123,456.79'
    n = re.sub(r"[^0-9.]", "", s)   # 123456.79

    # Way 2:
    non_decimal = re.compile(r'[^\d.]+')
    s = '$123,456.79'
    n = non_decimal.sub('', s)      # 123456.79

This does handle the decimal separator '.'.

But the above solutions don't work for input such as:

    from babel.numbers import format_currency

    s = format_currency(123456.789, 'EUR', locale='fr_FR')  # u'123\xa0456,79\xa0\u20ac'
    new_s = s.encode('utf-8')  # 123 456,79 €
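To make the failure concrete, here is a small sketch that applies the second regular expression from above to the French-formatted string. The string is written out literally (an assumption standing in for babel's output), so the snippet runs without babel installed.

```python
import re

# Literal stand-in for format_currency(123456.789, 'EUR', locale='fr_FR');
# \xa0 is a no-break space and \u20ac is the euro sign.
s = '123\xa0456,79\xa0\u20ac'

# Keep only digits and '.', as in the earlier attempt.
non_decimal = re.compile(r'[^\d.]+')
n = non_decimal.sub('', s)

print(n)  # 12345679
```

The decimal comma is stripped along with everything else, so the result is off by a factor of 100.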

As you can see, the format of currency varies. What is the ideal way to parse currency into numbers in a general way?

Solution

Using babel

The babel documentation notes that number parsing is not fully implemented yet, but a lot of work has gone into getting currency information into the library. You can use get_currency_name() and get_currency_symbol() to get currency details, and the other get_... functions to get the normal number details (decimal point, minus sign, etc.).

Using that information, you can strip the currency details (name, symbol) and the grouping separators (e.g. ',' in the US) from a currency string. Then you change the remaining number details into the ones used by the C locale ('-' for minus and '.' for the decimal point).

This results in the following code (I added an object to keep some of the data, which may come in handy in further processing):

    import re, os
    from babel import numbers as n
    from babel.core import default_locale

    class AmountInfo(object):
        def __init__(self, name, symbol, value):
            self.name = name
            self.symbol = symbol
            self.value = value

    def parse_currency(value, cur):
        decp = n.get_decimal_symbol()
        plus = n.get_plus_sign_symbol()
        minus = n.get_minus_sign_symbol()
        group = n.get_group_symbol()
        name = n.get_currency_name(cur)
        symbol = n.get_currency_symbol(cur)
        remove = [plus, name, symbol, group]
        for token in remove:
            # remove the pieces of information that shall be obvious
            value = re.sub(re.escape(token), '', value)
        # change the minus sign to a LOCALE=C minus
        value = re.sub(re.escape(minus), '-', value)
        # and change the decimal mark to a LOCALE=C decimal point
        value = re.sub(re.escape(decp), '.', value)
        # just in case remove extraneous spaces
        value = re.sub(r'\s+', '', value)
        return AmountInfo(name, symbol, value)

    #cur_loc = os.environ['LC_ALL']
    cur_loc = default_locale()
    print('locale:', cur_loc)
    test = [
        (n.format_currency(123456.789, 'USD', locale=cur_loc), 'USD'),
        (n.format_currency(-123456.78, 'PLN', locale=cur_loc), 'PLN'),
        (n.format_currency(123456.789, 'PLN', locale=cur_loc), 'PLN'),
        (n.format_currency(123456.789, 'IDR', locale=cur_loc), 'IDR'),
        (n.format_currency(123456.789, 'JPY', locale=cur_loc), 'JPY'),
        (n.format_currency(-123456.78, 'JPY', locale=cur_loc), 'JPY'),
        (n.format_currency(123456.789, 'CNY', locale=cur_loc), 'CNY'),
        (n.format_currency(-123456.78, 'CNY', locale=cur_loc), 'CNY'),
    ]

    for v, c in test:
        print('As currency :', c, ':', v.encode('utf-8'))
        info = parse_currency(v, c)
        print('As value :', c, ':', info.value)
        print('Extra info :', info.name.encode('utf-8'),
              info.symbol.encode('utf-8'))

The output looks promising (in the US locale):

    $ export LC_ALL=en_US
    $ ./cur.py
    locale: en_US
    As currency : USD : b'$123,456.79'
    As value : USD : 123456.79
    Extra info : b'US Dollar' b'$'
    As currency : PLN : b'-z\xc5\x82123,456.78'
    As value : PLN : -123456.78
    Extra info : b'Polish Zloty' b'z\xc5\x82'
    As currency : PLN : b'z\xc5\x82123,456.79'
    As value : PLN : 123456.79
    Extra info : b'Polish Zloty' b'z\xc5\x82'
    As currency : IDR : b'Rp123,457'
    As value : IDR : 123457
    Extra info : b'Indonesian Rupiah' b'Rp'
    As currency : JPY : b'\xc2\xa5123,457'
    As value : JPY : 123457
    Extra info : b'Japanese Yen' b'\xc2\xa5'
    As currency : JPY : b'-\xc2\xa5123,457'
    As value : JPY : -123457
    Extra info : b'Japanese Yen' b'\xc2\xa5'
    As currency : CNY : b'CN\xc2\xa5123,456.79'
    As value : CNY : 123456.79
    Extra info : b'Chinese Yuan' b'CN\xc2\xa5'
    As currency : CNY : b'-CN\xc2\xa5123,456.78'
    As value : CNY : -123456.78
    Extra info : b'Chinese Yuan' b'CN\xc2\xa5'

It still works in other locales (Brazil is notable for using the comma as the decimal mark):

    $ export LC_ALL=pt_BR
    $ ./cur.py
    locale: pt_BR
    As currency : USD : b'US$123.456,79'
    As value : USD : 123456.79
    Extra info : b'D\xc3\xb3lar americano' b'US$'
    As currency : PLN : b'-PLN123.456,78'
    As value : PLN : -123456.78
    Extra info : b'Zloti polon\xc3\xaas' b'PLN'
    As currency : PLN : b'PLN123.456,79'
    As value : PLN : 123456.79
    Extra info : b'Zloti polon\xc3\xaas' b'PLN'
    As currency : IDR : b'IDR123.457'
    As value : IDR : 123457
    Extra info : b'Rupia indon\xc3\xa9sia' b'IDR'
    As currency : JPY : b'JP\xc2\xa5123.457'
    As value : JPY : 123457
    Extra info : b'Iene japon\xc3\xaas' b'JP\xc2\xa5'
    As currency : JPY : b'-JP\xc2\xa5123.457'
    As value : JPY : -123457
    Extra info : b'Iene japon\xc3\xaas' b'JP\xc2\xa5'
    As currency : CNY : b'CN\xc2\xa5123.456,79'
    As value : CNY : 123456.79
    Extra info : b'Yuan chin\xc3\xaas' b'CN\xc2\xa5'
    As currency : CNY : b'-CN\xc2\xa5123.456,78'
    As value : CNY : -123456.78
    Extra info : b'Yuan chin\xc3\xaas' b'CN\xc2\xa5'
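Note that info.value is still a string, only normalized to C-locale notation. One possible last step, sketched here under the assumption that exact decimal arithmetic is wanted, is to hand it to decimal.Decimal from the standard library. The helper name to_number is hypothetical and is not part of babel or of the code above.

```python
from decimal import Decimal, InvalidOperation

def to_number(value):
    """Convert a C-locale numeric string such as '-123456.78' to a Decimal.

    Hypothetical helper: returns None when the string is not a valid number,
    for example when a currency symbol was not fully stripped.
    """
    try:
        return Decimal(value)
    except InvalidOperation:
        return None

print(to_number('123456.79'))    # 123456.79
print(to_number('-123456.78'))   # -123456.78
print(to_number('US$123456'))    # None
```

Returning None rather than raising is only one design choice; it makes leftover symbols easy to detect when a string was formatted for a different locale than the one used for parsing.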

It is worth pointing out that babel has some encoding problems. That is because the locale files (in locale-data) themselves use different encodings. If you're working with currencies you're familiar with, that should not be a problem. But if you try unfamiliar currencies you might run into problems (I just learned that Poland uses iso-8859-2, not iso-8859-1).
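The byte strings printed in the output above are the UTF-8 encodings of the currency symbols. As a small illustration using only the standard library, this sketch decodes two of them and shows how the same Polish symbol is encoded in iso-8859-2:

```python
# Byte strings as they appear in the en_US output above.
zloty = b'z\xc5\x82'
yen = b'\xc2\xa5'

print(zloty.decode('utf-8'))  # zł
print(yen.decode('utf-8'))    # ¥

# The same symbol has a different byte value in ISO-8859-2,
# and cannot be represented in ISO-8859-1 at all.
print('z\u0142'.encode('iso-8859-2'))  # b'z\xb3'
```

Keeping everything as Unicode strings until the final output step, and encoding only there, sidesteps most of these mismatches.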
