How to read a file containing \uxxxx in VC++
I have a txt file whose contents are:
\u041f\u0435\u0440\u0432\u044b\u0439_\u0438\u043d\u0442\u0435\u0440\u0430\u043a\u0442\u0438\u0432\u043d\u044b\u0439_\u0438\u043d\u0442\u0435\u0440\u043d\u0435\u0442_\u043a\u0430\u043d\u0430\u043b
How can I read such a file to get a result like this:
"Первый_интерактивный_интернет_канал"
If I write:
string str = _T("\u041f\u0435\u0440\u0432\u044b\u0439_\u0438\u043d\u0442\u0435\u0440\u0430\u043a\u0442\u0438\u0432\u043d\u044b\u0439_\u0438\u043d\u0442\u0435\u0440\u043d\u0435\u0442_\u043a\u0430\u043d\u0430\u043b");
then the result in str
is correct, but if I read the same text from the file, str contains exactly what is in the file. I guess it is because '\u' becomes '\\u'.
Is there a simple way to convert \uxxxx notation to the corresponding characters in C++?
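The guess in the question is on the right track: \uXXXX is decoded by the compiler, and only inside literals in source code. Characters read from a file at run time are never escape-processed, so the file really holds a backslash, a 'u' and four hex digits. A minimal sketch of the difference (the names from_literal, from_file and check_escape_difference are hypothetical, for illustration only):

```cpp
#include <cassert>
#include <string>

// In source code the compiler decodes \u041f, so this literal holds ONE
// character, U+041F (П).
const std::wstring from_literal = L"\u041f";

// Text read from a file is not escape-processed. This literal spells out
// what such a file holds: the SIX characters \ u 0 4 1 f
// (in C++ source, "\\" is a single backslash).
const std::string from_file = "\\u041f";

// Hypothetical check of the claims above; call it from main() to run it.
void check_escape_difference() {
    assert(from_literal.size() == 1 && from_literal[0] == 0x041F);
    assert(from_file.size() == 6 && from_file[0] == '\\' && from_file[1] == 'u');
}
```

So the conversion has to be done by your own code at run time, after the file has been read.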
Recommended answer
Here is an example of MSalters's suggestion:
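#include <iostream>
#include <string>
#include <fstream>
#include <algorithm>
#include <sstream>
#include <locale>
#include <boost/scoped_array.hpp>
#include <boost/regex.hpp>
#include <boost/numeric/conversion/cast.hpp>

// Replaces every \uXXXX escape in source with the corresponding wchar_t.
// Other bytes are widened to wchar_t unchanged (no decoding of non-ASCII input).
std::wstring convert_unicode_escape_sequences(const std::string& source) {
    // The regex must be \\u (escaped backslash, then 'u') to match a literal
    // backslash, and each of those backslashes is doubled again in C++ source.
    const boost::regex regex("\\\\u([0-9A-Fa-f]{4})"); // NB: no support for non-BMP characters
    // Each 6-character escape becomes one wchar_t, so the output is never longer than the input.
    boost::scoped_array<wchar_t> buffer(new wchar_t[source.size()]);
    wchar_t* const output_begin = buffer.get();
    wchar_t* output_iter = output_begin;
    std::string::const_iterator last_match = source.begin();
    for (boost::sregex_iterator input_iter(source.begin(), source.end(), regex), input_end;
         input_iter != input_end; ++input_iter) {
        const boost::smatch& match = *input_iter;
        // Copy the text between the previous escape and this one unchanged.
        output_iter = std::copy(match.prefix().first, match.prefix().second, output_iter);
        // Parse the four hex digits captured by the regex.
        std::istringstream stream(match[1].str());
        unsigned int value = 0;
        stream >> std::hex >> value;
        *output_iter++ = boost::numeric_cast<wchar_t>(value);
        last_match = match[0].second;
    }
    // Copy whatever follows the last escape.
    output_iter = std::copy(last_match, source.end(), output_iter);
    return std::wstring(output_begin, output_iter);
}

int wmain() {
    std::locale::global(std::locale(""));
    // std::wcout already existed before the call above, so imbue it explicitly.
    std::wcout.imbue(std::locale());
    const std::wstring filename = L"test.txt";
    // Opening std::ifstream with a wide filename is a Microsoft extension.
    std::ifstream stream(filename.c_str(), std::ios::in | std::ios::binary);
    if (!stream) {
        std::wcerr << L"cannot open " << filename << std::endl;
        return 1;
    }
    // Read the whole file into a std::string.
    stream.seekg(0, std::ios::end);
    const std::streamsize size = static_cast<std::streamsize>(stream.tellg());
    stream.seekg(0);
    boost::scoped_array<char> buffer(new char[size]);
    stream.read(buffer.get(), size);
    const std::string source(buffer.get(), size);
    const std::wstring result = convert_unicode_escape_sequences(source);
    std::wcout << result << std::endl;
    return 0;
}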
I'm always surprised how complicated seemingly simple things like this are in C++.
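If Boost is not available, the same conversion can be written with the standard library alone. This is a sketch, not part of the original answer: the function names (decode_unicode_escapes, hex_value, parse_hex4, append_utf8) are hypothetical, and it assumes UTF-8 output in a std::string is acceptable instead of a std::wstring. Unlike the regex version, it also combines a surrogate pair such as \uD83D\uDE00 into one non-BMP code point; an unpaired surrogate becomes U+FFFD.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if c is not one.
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: parses exactly four hex digits starting at s[pos].
// Returns -1 if fewer than four characters remain or one is not hex.
static long parse_hex4(const std::string& s, std::size_t pos) {
    if (pos + 4 > s.size()) return -1;
    long value = 0;
    for (std::size_t i = pos; i < pos + 4; ++i) {
        const int digit = hex_value(s[i]);
        if (digit < 0) return -1;
        value = value * 16 + digit;
    }
    return value;
}

// Hypothetical helper: appends the UTF-8 encoding of code point cp to out.
static void append_utf8(std::string& out, unsigned long cp) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Replaces each \uXXXX escape in source with its UTF-8 encoding.
// Text that is not a complete escape is copied unchanged.
std::string decode_unicode_escapes(const std::string& source) {
    std::string out;
    out.reserve(source.size());  // the output is never longer than the input
    std::size_t i = 0;
    while (i < source.size()) {
        long unit = -1;
        if (source[i] == '\\' && i + 1 < source.size() && source[i + 1] == 'u')
            unit = parse_hex4(source, i + 2);
        if (unit < 0) {  // not an escape: copy this byte as-is
            out += source[i];
            ++i;
            continue;
        }
        i += 6;  // skip the backslash, 'u' and four hex digits
        unsigned long cp = static_cast<unsigned long>(unit);
        // A high surrogate followed by a low-surrogate escape forms one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < source.size() &&
            source[i] == '\\' && source[i + 1] == 'u') {
            const long low = parse_hex4(source, i + 2);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) +
                     static_cast<unsigned long>(low - 0xDC00);
                i += 6;
            }
        }
        if (cp >= 0xD800 && cp <= 0xDFFF)
            cp = 0xFFFD;  // unpaired surrogate: use the replacement character
        append_utf8(out, cp);
    }
    return out;
}
```

To use it, read the file into a std::string the same way wmain() does in the answer and pass that string to decode_unicode_escapes(). If a std::wstring is needed for Windows wide-character APIs, the UTF-8 result can be converted with MultiByteToWideChar(CP_UTF8, ...).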