How can I make my split work only on one real line and skip the quoted parts of a string?

2021-12-12 00:00:00 string split parsing c++ boost

So we have a simple split:

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <iterator>
using namespace std;

vector<string> split(const string& s, const string& delim, const bool keep_empty = true) {
    vector<string> result;
    if (delim.empty()) {
        result.push_back(s);
        return result;
    }
    string::const_iterator substart = s.begin(), subend;
    while (true) {
        subend = search(substart, s.end(), delim.begin(), delim.end());
        string temp(substart, subend);
        if (keep_empty || !temp.empty()) {
            result.push_back(temp);
        }
        if (subend == s.end()) {
            break;
        }
        substart = subend + delim.size();
    }
    return result;
}

or boost split. And we have a simple main like:

int main() {
    const vector<string> words = split("close no \"\n matter\" how \n far", " ");
    copy(words.begin(), words.end(), ostream_iterator<string>(cout, "\n"));
}

How can we make it produce output like this:

close
no
"\n matter"
how
end symbol found.

We want to introduce into split both structures that shall be kept unsplit and characters that shall end the parsing process. How can this be done?

Recommended answer

The following code:

#include <cassert>  // for assert; the other headers are those from the question

vector<string>::const_iterator matchSymbol(const string & s, string::const_iterator i, const vector<string> & symbols)
{
    vector<string>::const_iterator testSymbol;
    for (testSymbol=symbols.begin();testSymbol!=symbols.end();++testSymbol) {
        if (!testSymbol->empty()) {
            if (0==testSymbol->compare(0,testSymbol->size(),&(*i),testSymbol->size())) {
                return testSymbol;
            }
        }
    }

    assert(testSymbol==symbols.end());
    return testSymbol;
}

vector<string> split(const string& s, const vector<string> & delims, const vector<string> & terms, const bool keep_empty = true)
{
    vector<string> result;
    if (delims.empty()) {
        result.push_back(s);
        return result;
    }

    bool checkForDelim=true;

    string temp;
    string::const_iterator i=s.begin();
    while (i!=s.end()) {
        vector<string>::const_iterator testTerm=terms.end();
        vector<string>::const_iterator testDelim=delims.end();

        if (checkForDelim) {
            testTerm=matchSymbol(s,i,terms);
            testDelim=matchSymbol(s,i,delims);
        }

        if (testTerm!=terms.end()) {
            i=s.end();
        } else if (testDelim!=delims.end()) {
            if (!temp.empty() || keep_empty) {
                result.push_back(temp);
                temp.clear();
            }
            string::const_iterator j=testDelim->begin();
            while (i!=s.end() && j!=testDelim->end()) {
                ++i;
                ++j;
            }
        } else if ('"'==*i) {
            if (checkForDelim) {
                string::const_iterator j=i;
                do {
                    ++j;
                } while (j!=s.end() && '"'!=*j);
                checkForDelim=(j==s.end());
                // flush the pending token only when this quote opens a matched pair
                if (!checkForDelim && (!temp.empty() || keep_empty)) {
                    result.push_back(temp);
                    temp.clear();
                }
                temp.push_back('"');
                ++i;
            } else {
                //matched end quote
                checkForDelim=true;
                temp.push_back('"');
                ++i;
                result.push_back(temp);
                temp.clear();
            }
        } else if ('\n'==*i) {
            temp+="\\n";
            ++i;
        } else {
            temp.push_back(*i);
            ++i;
        }
    }

    if (!temp.empty() || keep_empty) {
        result.push_back(temp);
    }
    return result;
}

int runTest()
{
    vector<string> delims;
    delims.push_back(" ");
    delims.push_back("\t");
    delims.push_back("\n");
    delims.push_back("split_here");

    vector<string> terms;
    terms.push_back(">");
    terms.push_back("end_here");

    const vector<string> words = split("close no \"\n end_here matter\" how \n far testsplit_heretest\"another split_here test\"with some\"mo>re", delims, terms, false);

    copy(words.begin(), words.end(), ostream_iterator<string>(cout, "\n"));
    return 0;  // the function is declared int, so it must return a value
}

produces:

close
no
"\n end_here matter"
how
far
test
test
"another split_here test"
with
some"mo

Based on the examples you gave, you seem to want newlines to count as delimiters when they appear outside of quotes and to be represented by the literal \n when inside quotes, so that is what this does. It also adds support for multiple delimiters, such as the split_here used in the test.

I wasn't sure whether you want unmatched quotes to be split the way matched quotes are, since the example you gave has the unmatched quote separated by spaces. This code treats an unmatched quote as any other character, but it should be easy to modify if that is not the behavior you want.
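
To make the matched-versus-unmatched decision easier to change, the lookahead could be pulled into a small helper. The sketch below is only an illustration and not part of the code above: the name hasClosingQuote is hypothetical, and it assumes, as the answer does, that a quote counts as matched whenever any later quote exists in the string.

```cpp
#include <string>

// Hypothetical helper (not in the answer's code): returns true when the
// character at index `pos` is a double quote and another double quote
// appears somewhere after it in `s`.
bool hasClosingQuote(const std::string& s, std::string::size_type pos)
{
    if (pos >= s.size() || s[pos] != '"') {
        return false;  // out of range, or not a quote at all
    }
    // find() returns npos when no further quote exists, including when
    // pos + 1 is already the end of the string.
    return s.find('"', pos + 1) != std::string::npos;
}
```

A caller that wants unmatched quotes to split like matched ones could skip this check and flush the pending token at every quote; whether that is desirable depends on the input format.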

The line:

if (0==testSymbol->compare(0,testSymbol->size(),&(*i),testSymbol->size())) {

will work on most, if not all, implementations of the STL, but it is not guaranteed to work, because near the end of s it asks compare to read more characters from &(*i) than remain in the string. It can be replaced with this safer, but slower, version:

if (*testSymbol==s.substr(i-s.begin(),testSymbol->size())) {
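
A possible middle ground, sketched here under stated assumptions: the std::string::compare(pos, len, str) overload clamps len to the characters remaining in the string, so it stays in bounds without building the temporary string that substr creates. The helper name matchesAt is hypothetical and does not appear in the answer's code.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not in the answer's code): reports whether `symbol`
// occurs in `s` starting at the position the iterator `i` points to.
bool matchesAt(const std::string& s, std::string::const_iterator i,
               const std::string& symbol)
{
    const std::string::size_type pos = i - s.begin();
    assert(pos <= s.size());  // compare() throws std::out_of_range beyond this
    // compare(pos, len, str) looks at no more than the characters left in s,
    // so a symbol longer than the remaining text simply fails to match.
    return !symbol.empty() && s.compare(pos, symbol.size(), symbol) == 0;
}
```

Inside matchSymbol the test could then read if (matchesAt(s, i, *testSymbol)), which also covers the empty-symbol check.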
