Counting the occurrences of each word in a text file
Given a large text file containing many strings, what is the most efficient way in C++ to read the file and count how many times each word occurs? The size of the text file is unknown, so I cannot just use a simple array. There is another catch: each line of the text file starts with a category keyword, and the words that follow are the features of that category. I need to be able to count how many times each word occurs within its category.
For example:
colors red blue green yellow orange purple
sky blue high clouds air empty vast big
ocean wet water aquatic blue
colors brown black blue white blue blue
In this example, I need to count 4 occurrences of "blue" within the "colors" category, even though "blue" occurs 6 times in the file as a whole.
Recommended answer
I would use a stream to read and separate the words (stream extraction splits words on whitespace) and save them to a dictionary (the standard C++ way is to use std::map).
Here is documented C++ code:
#include <cstdlib>  // EXIT_SUCCESS and EXIT_FAILURE.
#include <fstream>  // Will be used to read from a file.
#include <iostream>
#include <map>      // A map will be used to count the words.
#include <string>   // The map's key type.
using namespace std;

// Will be used to print the map later.
template <class KTy, class Ty>
void PrintMap(const map<KTy, Ty>& m)
{
    typedef typename map<KTy, Ty>::const_iterator iterator;
    for (iterator p = m.begin(); p != m.end(); ++p)
        cout << p->first << ": " << p->second << endl;
}

int main(void)
{
    // Note the escaped backslash in the path.
    static const char* fileName = "C:\\MyFile.txt";

    // Will store the word and count.
    map<string, unsigned int> wordsCount;

    // Begin reading from file:
    ifstream fileStream(fileName);

    // Check if we've opened the file (as we should have).
    if (!fileStream.is_open())
    {
        // We couldn't open the file. Report the error in the error stream.
        cerr << "Couldn't open the file." << endl;
        return EXIT_FAILURE;
    }

    // Extract one whitespace-separated word at a time.
    // The loop ends when extraction fails (end of file), so no empty word is counted.
    string word;
    while (fileStream >> word)
        ++wordsCount[word]; // operator[] inserts a new word with count 0, so the first increment yields 1.

    // Print the words map.
    PrintMap(wordsCount);

    return EXIT_SUCCESS;
}
Output:
air: 1
aquatic: 1
big: 1
black: 1
blue: 6
brown: 1
clouds: 1
colors: 2
empty: 1
green: 1
high: 1
ocean: 1
orange: 1
purple: 1
red: 1
sky: 1
vast: 1
water: 1
wet: 1
white: 1
yellow: 1
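The program above counts totals across the whole file. The question also asks for counts per category, and one possible way to get them is to read the file line by line, treat the first word of each line as the category, and keep a nested map. The following is only a minimal sketch under that assumption; the names `CategoryCounts` and `CountByCategory` are hypothetical and do not come from the original answer.

```cpp
#include <iostream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical alias: category -> (word -> count).
using CategoryCounts =
    std::map<std::string, std::map<std::string, unsigned int>>;

// Reads lines of the form "<category> <word> <word> ..." and counts
// how often each word appears under each category.
CategoryCounts CountByCategory(std::istream& input)
{
    CategoryCounts counts;
    std::string line;
    while (std::getline(input, line))
    {
        std::istringstream words(line);

        // The first word on the line is assumed to be the category keyword.
        std::string category;
        if (!(words >> category))
            continue; // Skip blank lines.

        // Every remaining word on the line is a feature of that category.
        std::string word;
        while (words >> word)
            ++counts[category][word]; // New entries start at 0.
    }
    return counts;
}
```

Passing an `std::ifstream` opened on the file to `CountByCategory` and then reading `counts["colors"]["blue"]` should give 4 for the sample input in the question, since lines sharing a category keyword accumulate into the same inner map.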