Counting word frequencies in a text string with Python
"""
Author: 皮蛋编程 (https://www.pidancode.com)
Created: 2022/3/27
Description: count how often each word occurs in a text string with Python
"""
str1 = """Man who run in front of car, get tired. Man who run behind car, get exhausted."""

print("Original string:")
print(str1)

# create a list of words separated at whitespace
wordList1 = str1.split(None)

# strip trailing punctuation marks and build the modified word list
# start with an empty list
wordList2 = []
for word1 in wordList1:
    # last character of each word
    lastchar = word1[-1:]
    # use a list of punctuation marks
    if lastchar in [",", ".", "!", "?", ";"]:
        word2 = word1.rstrip(lastchar)
    else:
        word2 = word1
    # build a word list of lower-case modified words
    wordList2.append(word2.lower())

print("Word list created from modified string:")
print(wordList2)

# create a word frequency dictionary
# start with an empty dictionary
freqD2 = {}
for word2 in wordList2:
    freqD2[word2] = freqD2.get(word2, 0) + 1

# create a list of keys and sort the list
# all words are lower case already
keyList = freqD2.keys()
keyList = sorted(keyList)

print("Frequency of each word in the word list (sorted):")
for key2 in keyList:
    print("%-10s %d" % (key2, freqD2[key2]))
Output:
Original string:
Man who run in front of car, get tired. Man who run behind car, get exhausted.
Word list created from modified string:
['man', 'who', 'run', 'in', 'front', 'of', 'car', 'get', 'tired', 'man', 'who', 'run', 'behind', 'car', 'get', 'exhausted']
Frequency of each word in the word list (sorted):
behind     1
car        2
exhausted  1
front      1
get        2
in         1
man        2
of         1
run        2
tired      1
who        2
The code above was tested under Python 3.9.
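As a more compact alternative to the manual dictionary loop, the same count can be sketched with `collections.Counter` and `string.punctuation` from the standard library. This is not part of the original snippet: the helper name `word_frequencies` is hypothetical, and it strips punctuation from both ends of each word, which differs slightly from the original rule of removing only a trailing `,`, `.`, `!`, `?` or `;`.

```python
from collections import Counter
import string


def word_frequencies(text):
    """Return a Counter mapping each lower-cased word to its count.

    Hypothetical helper for illustration, not from the original snippet.
    """
    words = []
    for raw in text.split():
        # strip punctuation from both ends, then normalize case
        word = raw.strip(string.punctuation).lower()
        if word:  # skip tokens that consisted of punctuation only
            words.append(word)
    return Counter(words)


str1 = "Man who run in front of car, get tired. Man who run behind car, get exhausted."
freq = word_frequencies(str1)

# print the words alphabetically, in the same format as the original
for word in sorted(freq):
    print("%-10s %d" % (word, freq[word]))
```

Because `Counter` is a `dict` subclass, lookups such as `freq["car"]` work as with `freqD2`, and `freq.most_common(3)` returns the three most frequent words if a ranking by count is preferred over alphabetical order.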