How to parse CSV using boost::spirit

2021-12-09 00:00:00 csv c++ boost boost-spirit boost-spirit-qi

I have this CSV line:

std::string s = R"(1997,Ford,E350,"ac, abs, moon","some "rusty" parts",3000.00)";

I can parse it using boost::tokenizer:

typedef boost::tokenizer< boost::escaped_list_separator<char> , std::string::const_iterator, std::string> Tokenizer;
boost::escaped_list_separator<char> seps('\\', ',', '"');
Tokenizer tok(s, seps);
for (auto i : tok)
{
    std::cout << i << std::endl;
}

It gets it right, except that the token "rusty" should keep its double quotes, and they are being stripped.

Here is my attempt to use boost::spirit:

boost::spirit::classic::rule<> list_csv_item = !(boost::spirit::classic::confix_p('"', *boost::spirit::classic::c_escape_ch_p, '"') | boost::spirit::classic::longest_d[boost::spirit::classic::real_p | boost::spirit::classic::int_p]);
std::vector<std::string> vec_item;
std::vector<std::string>  vec_list;
boost::spirit::classic::rule<> list_csv = boost::spirit::classic::list_p(list_csv_item[boost::spirit::classic::push_back_a(vec_item)],',')[boost::spirit::classic::push_back_a(vec_list)];
boost::spirit::classic::parse_info<> result = parse(s.c_str(), list_csv);
if (result.hit)
{
  for (auto i : vec_item)
  {
    cout << i << endl;
   }
}

Problems:

  1. It does not work; it prints only the first token.
  2. Why boost::spirit::classic? I can't find examples using Spirit V2.
  3. The setup is brutal, but I can live with this.

** I really want to use boost::spirit because it tends to be pretty fast.

Expected output:

1997
Ford
E350
ac, abs, moon
some "rusty" parts
3000.00

Recommended answer

Sehe's post looks a fair bit cleaner than mine, but I had been putting this together for a while, so here it is anyway:

#include <boost/tokenizer.hpp>
#include <boost/spirit/include/qi.hpp>

#include <cassert>
#include <iostream>
#include <string>
#include <vector>

namespace qi = boost::spirit::qi;

int main() {
    const std::string s = R"(1997,Ford,E350,"ac, abs, moon",""rusty"",3000.00)";

    // Tokenizer: backslash is the escape character, comma the separator,
    // double quote the quoting character.
    typedef boost::tokenizer< boost::escaped_list_separator<char> , std::string::const_iterator, std::string> Tokenizer;
    boost::escaped_list_separator<char> seps('\\', ',', '"');
    Tokenizer tok(s, seps);
    for (auto i : tok)
        std::cout << i << "\n";
    std::cout << "\n";

    // Boost Spirit Qi
    // A quoted string keeps everything between a pair of double quotes.
    qi::rule<std::string::const_iterator, std::string()> quoted_string = '"' >> *(qi::char_ - '"') >> '"';
    // Outside quotes, any character except a quote or a comma is allowed.
    qi::rule<std::string::const_iterator, std::string()> valid_characters = qi::char_ - '"' - ',';
    // An item is any mix of quoted strings and plain characters.
    qi::rule<std::string::const_iterator, std::string()> item = *(quoted_string | valid_characters );
    // A line is a comma-separated list of items.
    qi::rule<std::string::const_iterator, std::vector<std::string>()> csv_parser = item % ',';

    std::string::const_iterator s_begin = s.begin();
    std::string::const_iterator s_end = s.end();
    std::vector<std::string> result;

    bool r = boost::spirit::qi::parse(s_begin, s_end, csv_parser, result);
    assert(r == true);
    assert(s_begin == s_end);  // the whole input was consumed

    for (auto i : result)
        std::cout << i << std::endl;
    std::cout << "\n";
}

This outputs:

1997
Ford
E350
ac, abs, moon
rusty
3000.00

1997
Ford
E350
ac, abs, moon
rusty
3000.00

Something worth noting: this does not implement a full CSV parser. You would also want to handle escape characters and whatever else your implementation requires.

Also: if you are looking through the documentation, note that in Qi 'a' is equivalent to boost::spirit::qi::lit('a') and "abc" is equivalent to boost::spirit::qi::lit("abc").

On double quotes: as Sehe notes in a comment above, it is not clear what a "" in the input text is supposed to mean. If you want every "" that is not inside a quoted string to be converted to a single ", something like the following would work:

qi::rule<std::string::const_iterator, std::string()> double_quote_char = "\"\"" >> qi::attr('"');
qi::rule<std::string::const_iterator, std::string()> item = *(double_quote_char | quoted_string | valid_characters );
