SQL:从全名字段中解析名字、中间名和姓氏
如何使用 SQL 从全名字段中解析名字、中间名和姓氏?
How do I parse the first, middle, and last name out of a fullname field with SQL?
我需要尝试匹配与全名不直接匹配的名称.我希望能够获取全名字段并将其分解为名字、中间名和姓氏.
I need to try to match up on names that are not a direct match on full name. I'd like to be able to take the full name field and break it up into first, middle and last name.
数据不包括任何前缀或后缀.中间名是可选的.数据格式为First Middle Last".
The data does not include any prefixes or suffixes. The middle name is optional. The data is formatted 'First Middle Last'.
我对一些实用的解决方案很感兴趣,可以让我完成 90% 的工作.如前所述,这是一个复杂的问题,所以我会单独处理特殊情况.
I'm interested in some practical solutions to get me 90% of the way there. As it has been stated, this is a complex problem, so I'll handle special cases individually.
推荐答案
这是一个自包含的示例,其中包含易于操作的测试数据.
Here is a self-contained example, with easily manipulated test data.
在此示例中,如果您的名称包含三个以上的部分,则所有额外"内容都将放入 LAST_NAME 字段中.标识为titles"的特定字符串除外,例如DR"、MRS"和MR".
With this example, if you have a name with more than three parts, then all the "extra" stuff will get put in the LAST_NAME field. An exception is made for specific strings that are identified as "titles", such as "DR", "MRS", and "MR".
如果缺少中间名,那么您只会得到 FIRST_NAME 和 LAST_NAME(MIDDLE_NAME 将为 NULL).
If the middle name is missing, then you just get FIRST_NAME and LAST_NAME (MIDDLE_NAME will be NULL).
你可以将它粉碎成一个巨大的嵌套的 SUBSTRING 块,但可读性已经够难了,因为它在 SQL 中这样做.
You could smash it into a giant nested blob of SUBSTRINGs, but readability is hard enough as it is when you do this in SQL.
编辑——处理以下特殊情况:
1 - NAME 字段为 NULL
2 - NAME 字段包含前导/尾随空格
3 - NAME 字段在名称中有 > 1 个连续空格
4 - NAME 字段只包含名字
5 - 将原始全名作为单独的列包含在最终输出中,以提高可读性
6 - 将特定的前缀列表作为单独的标题"列处理
SELECT
FIRST_NAME.ORIGINAL_INPUT_DATA
,FIRST_NAME.TITLE
,FIRST_NAME.FIRST_NAME
,CASE WHEN 0 = CHARINDEX(' ',FIRST_NAME.REST_OF_NAME)
THEN NULL --no more spaces? assume rest is the last name
ELSE SUBSTRING(
FIRST_NAME.REST_OF_NAME
,1
,CHARINDEX(' ',FIRST_NAME.REST_OF_NAME)-1
)
END AS MIDDLE_NAME
,SUBSTRING(
FIRST_NAME.REST_OF_NAME
,1 + CHARINDEX(' ',FIRST_NAME.REST_OF_NAME)
,LEN(FIRST_NAME.REST_OF_NAME)
) AS LAST_NAME
FROM
(
SELECT
TITLE.TITLE
,CASE WHEN 0 = CHARINDEX(' ',TITLE.REST_OF_NAME)
THEN TITLE.REST_OF_NAME --No space? return the whole thing
ELSE SUBSTRING(
TITLE.REST_OF_NAME
,1
,CHARINDEX(' ',TITLE.REST_OF_NAME)-1
)
END AS FIRST_NAME
,CASE WHEN 0 = CHARINDEX(' ',TITLE.REST_OF_NAME)
THEN NULL --no spaces @ all? then 1st name is all we have
ELSE SUBSTRING(
TITLE.REST_OF_NAME
,CHARINDEX(' ',TITLE.REST_OF_NAME)+1
,LEN(TITLE.REST_OF_NAME)
)
END AS REST_OF_NAME
,TITLE.ORIGINAL_INPUT_DATA
FROM
(
SELECT
--if the first three characters are in this list,
--then pull it as a "title". otherwise return NULL for title.
CASE WHEN SUBSTRING(TEST_DATA.FULL_NAME,1,3) IN ('MR ','MS ','DR ','MRS')
THEN LTRIM(RTRIM(SUBSTRING(TEST_DATA.FULL_NAME,1,3)))
ELSE NULL
END AS TITLE
--if you change the list, don't forget to change it here, too.
--so much for the DRY prinicple...
,CASE WHEN SUBSTRING(TEST_DATA.FULL_NAME,1,3) IN ('MR ','MS ','DR ','MRS')
THEN LTRIM(RTRIM(SUBSTRING(TEST_DATA.FULL_NAME,4,LEN(TEST_DATA.FULL_NAME))))
ELSE LTRIM(RTRIM(TEST_DATA.FULL_NAME))
END AS REST_OF_NAME
,TEST_DATA.ORIGINAL_INPUT_DATA
FROM
(
SELECT
--trim leading & trailing spaces before trying to process
--disallow extra spaces *within* the name
REPLACE(REPLACE(LTRIM(RTRIM(FULL_NAME)),' ',' '),' ',' ') AS FULL_NAME
,FULL_NAME AS ORIGINAL_INPUT_DATA
FROM
(
--if you use this, then replace the following
--block with your actual table
SELECT 'GEORGE W BUSH' AS FULL_NAME
UNION SELECT 'SUSAN B ANTHONY' AS FULL_NAME
UNION SELECT 'ALEXANDER HAMILTON' AS FULL_NAME
UNION SELECT 'OSAMA BIN LADEN JR' AS FULL_NAME
UNION SELECT 'MARTIN J VAN BUREN SENIOR III' AS FULL_NAME
UNION SELECT 'TOMMY' AS FULL_NAME
UNION SELECT 'BILLY' AS FULL_NAME
UNION SELECT NULL AS FULL_NAME
UNION SELECT ' ' AS FULL_NAME
UNION SELECT ' JOHN JACOB SMITH' AS FULL_NAME
UNION SELECT ' DR SANJAY GUPTA' AS FULL_NAME
UNION SELECT 'DR JOHN S HOPKINS' AS FULL_NAME
UNION SELECT ' MRS SUSAN ADAMS' AS FULL_NAME
UNION SELECT ' MS AUGUSTA ADA KING ' AS FULL_NAME
) RAW_DATA
) TEST_DATA
) TITLE
) FIRST_NAME
相关文章