SQL: CROSS APPLY to split a name into first name, last name, and MI

2021-09-10 00:00:00 sql tsql sql-server

I have a table that has user names like this.

Name
-----
Smith-Bay, Michael R.
Abbott, David Jr.
Actor, Cody
Agular, Stephen V.

I need the names to look like this:

Last         First    MI
-------------------------
Smith-Bay    Michael  R
Abbott       David    Jr
Actor        Cody
Agular       Stephen  V 

I have the following SQL that splits the name into first and last:

select vl.lastname, vf.firstname
from users as t
cross apply (values (left(t.name, charindex(', ', t.name)),
                     stuff(t.name, 1, charindex(', ', t.name) + 1, ''))) vl(lastname, rest)
cross apply (values (left(vl.rest, charindex(' ', vl.rest + ' ')))) vf(firstname)
order by vl.lastname

How can I add another cross apply to extract everything after the first name, minus the period at the end?

Recommended answer

I've had to do this many times. I do ETL work regularly and often need to extract items from within strings, either because the data was stored badly or because I have to pull it from reports. The data isn't always nicely packaged in separate columns, and I find myself parsing it for all sorts of reasons. Hopefully the data you are parsing is consistent; inconsistent data makes this much more difficult, sometimes impossible. If you can rely on your names being exactly in the format you showed, the method below will work. I've used it on many occasions.

I've used the method below in many different environments, including MS Access, SQL Server (through SSMS) and C#. My example is written for Oracle.

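The basic idea is:

Find the character positions that separate the First_Name, Last_Name and Middle_Initial parts of the string.

Use those character positions to extract each part into a new column.

The code is as follows: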

WITH character_pos AS
(
/* First get the positions of the comma, the 2nd space and the period */
SELECT name
  /* Position of the ', ' that ends the last name */
  , instr(name, ', ') AS comma_pos
  /* Position of the 2nd space, which ends the first name (0 if there is none) */
  , instr(name, ' ', 1, 2) AS second_space_pos
  /* Length of the first name, so substr knows how many characters to extract */
  , instr(name, ' ', 1, 2) - (instr(name, ', ') + 2) AS first_name_length
  /* Position of the period that ends the middle initial (0 if there is none) */
  , instr(name, '.')  AS period_pos
  /* Length of the middle initial: the characters between the 2nd space and the period */
  , (instr(name, '.') - 1) - instr(name, ' ', 1, 2) AS middle_initial_length
FROM parse_name
) /* END character_pos CTE */

SELECT name
  /* Everything before the comma */
  , substr(name, 1, comma_pos - 1) AS last_name
  /* No period means no middle initial, so the first name runs to the end of the string */
  , CASE WHEN period_pos = 0 THEN substr(name, comma_pos + 2)
    ELSE substr(name, comma_pos + 2, first_name_length)
    END AS first_name
  /* NULL when there is no middle initial, because the length is then less than 1 */
  , substr(name, second_space_pos + 1, middle_initial_length) AS middle_initial
  /* Positions and lengths, shown for reference */
  , comma_pos, second_space_pos, first_name_length
  , period_pos, middle_initial_length
FROM character_pos
;

I used a CTE only to keep the character positions separate from the actual extraction; all of this could be done in a single SQL statement.
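As a rough illustration of that last point, here is a sketch of the same extraction with the position expressions written inline instead of in a CTE. It assumes the same parse_name table and name column, Oracle syntax, and names that follow the "Last, First MI." format exactly.

```sql
-- Sketch only: single-statement form of the extraction (Oracle syntax).
-- Assumes a parse_name table with a name column, as in the example above.
SELECT name
  -- Everything before the comma
  , substr(name, 1, instr(name, ', ') - 1) AS last_name
  -- No period means no middle initial, so the first name runs to the end
  , CASE WHEN instr(name, '.') = 0
         THEN substr(name, instr(name, ', ') + 2)
         ELSE substr(name, instr(name, ', ') + 2,
                     instr(name, ' ', 1, 2) - (instr(name, ', ') + 2))
    END AS first_name
  -- Characters between the 2nd space and the period; NULL when the
  -- computed length is less than 1 (no middle initial)
  , substr(name, instr(name, ' ', 1, 2) + 1,
           (instr(name, '.') - 1) - instr(name, ' ', 1, 2)) AS middle_initial
FROM parse_name;
```

The trade-off is readability: each instr call is repeated wherever its position is needed, which is what the CTE avoids.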

This shows that you need nothing beyond a few simple string parsing functions. INSTR and SUBSTR, or their equivalents, are available in most languages. No stored procedures, temp tables or outside code are needed, unless factors outside the scope of the original question make it necessary to use something other than plain SQL.
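Since the question is about SQL Server, here is a hedged T-SQL sketch of the same find-the-positions-then-extract idea, written as a chain of CROSS APPLY steps. It assumes the users table and name column from the question, and that every name contains ", ". The aliases c, s and p and their column names are made up for illustration. Unlike the Oracle example, it decides whether a middle part exists by looking for a space after the first name rather than for a period.

```sql
-- Sketch only (T-SQL). Assumes users(name) from the question and that
-- every name contains ', '. Aliases c, s and p are illustrative.
SELECT p.last_name, p.first_name, p.middle_initial
FROM users AS t
-- Step 1: position of the ', ' that ends the last name
CROSS APPLY (VALUES (CHARINDEX(', ', t.name))) AS c(comma_pos)
-- Step 2: position of the next space after the first name (0 if none)
CROSS APPLY (VALUES (CHARINDEX(' ', t.name, c.comma_pos + 2))) AS s(space_pos)
-- Step 3: cut the string at those positions
CROSS APPLY (VALUES (
      LEFT(t.name, c.comma_pos - 1)
    , CASE WHEN s.space_pos = 0
           THEN SUBSTRING(t.name, c.comma_pos + 2, LEN(t.name))
           ELSE SUBSTRING(t.name, c.comma_pos + 2, s.space_pos - c.comma_pos - 2)
      END
      -- Everything after the first name, with periods removed
    , CASE WHEN s.space_pos = 0 THEN ''
           ELSE REPLACE(SUBSTRING(t.name, s.space_pos + 1, LEN(t.name)), '.', '')
      END
)) AS p(last_name, first_name, middle_initial)
ORDER BY p.last_name;
```

REPLACE strips every period from the trailing part, which suits values like "R." and "Jr.". A row without ", " would make LEFT fail with an invalid length error, so filter or guard such rows first.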
