How to check whether a string looks random, or is human-generated and pronounceable?

2022-01-02 00:00:00 nlp algorithm mysql spam phonetics

For the purpose of identifying [possible] bot-generated usernames.

Suppose you have a username like "bilbomoothof". It may be nonsense, but it still contains pronounceable sounds and so appears human-generated.

I accept that it could have been randomly generated from a dictionary of syllables or word parts, but let's assume for a moment that the bot in question is a bit rubbish.

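  1. Suppose you have a username like "sdfgbhm342r3f". To a human this is clearly a random string, but can that be identified programmatically?
  2. Are there any algorithms available (similar to Soundex, etc.) that can identify pronounceable sounds within a string like this?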

Solutions applicable in PHP/MySQL most appreciated.
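To make the question concrete, here is a minimal sketch of the kind of check being asked about. It is written in Python rather than PHP, and the function names, the vowel set (which counts "y" as a vowel), and the two thresholds are all assumptions chosen for illustration, not a known or validated algorithm. It flags a name when it has too few vowels or too long a run of consonants.

```python
# Assumption: ASCII letters only, and "y" is counted as a vowel.
VOWELS = set("aeiouy")


def longest_consonant_run(name: str) -> int:
    """Length of the longest run of consecutive consonant letters."""
    longest = current = 0
    for ch in name.lower():
        if ch.isalpha() and ch not in VOWELS:
            current += 1
            longest = max(longest, current)
        else:
            # Vowels, digits and punctuation all break the run.
            current = 0
    return longest


def looks_random(name: str, max_run: int = 4, min_vowel_ratio: float = 0.2) -> bool:
    """Heuristic guess: True if the name is unlikely to be pronounceable.

    max_run and min_vowel_ratio are illustrative thresholds, not tuned values.
    """
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return True  # nothing to pronounce at all
    vowel_ratio = sum(ch in VOWELS for ch in letters) / len(letters)
    return longest_consonant_run(name) > max_run or vowel_ratio < min_vowel_ratio


for candidate in ("bilbomoothof", "sdfgbhm342r3f"):
    print(candidate, looks_random(candidate))
```

With these thresholds "bilbomoothof" passes and "sdfgbhm342r3f" is flagged. The same logic should port to PHP with ordinary string functions, but the thresholds would need tuning against real usernames.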

Recommended answer

I guess you could come up with something like that if you restricted yourself to sounds that are pronounceable in English. For me (I am French), words like szczepan or wawrzyniec are unpronounceable and certainly look somewhat random.

But they are actually Polish first names (meaning Steven and Lawrence)...
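The point can be illustrated with a small sketch. The rule below is a hypothetical, deliberately strict English-style check (more than three consonants in a row means "unpronounceable", with "y" counted as a consonant). Both the function name and the threshold are assumptions made up for this example.

```python
import re

# Assumption: ASCII consonants only, with "y" treated as a consonant.
CONSONANT_RUN = re.compile(r"[bcdfghjklmnpqrstvwxyz]+")


def max_consonant_run(name: str) -> int:
    """Length of the longest unbroken consonant cluster in the name."""
    runs = CONSONANT_RUN.findall(name.lower())
    return max((len(run) for run in runs), default=0)


def flagged_as_unpronounceable(name: str, limit: int = 3) -> bool:
    """Hypothetical English-tuned rule: too many consonants in a row."""
    return max_consonant_run(name) > limit


for name in ("bilbomoothof", "szczepan", "wawrzyniec"):
    print(name, max_consonant_run(name), flagged_as_unpronounceable(name))
```

Under this rule the made-up "bilbomoothof" passes, while the genuine names "szczepan" and "wawrzyniec" are both flagged, so any such check is only as good as its assumptions about the users' language.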
