在搜索之间加速

2021-09-10 00:00:00 tsql sql-server

我正在使用 MaxMind 免费数据库进行 IP 查找.我将数据转换成下表:

I am using MaxMind free databases to do IP lookups. I convert the data to the following table:

CREATE TABLE [dbo].[GeoBlocks](
    [StartIPNum] [varchar](50) NULL,
    [EndIPNumb] [varchar](50) NULL,
    [LocationNum] [varchar](50) NULL,
    [PostalCode] [varchar](50) NULL,
    [Latitude] [varchar](50) NULL,
    [Longitude] [varchar](50) NULL)

这个查找表中大约有 350 万条记录.

There are about 3.5M records in this lookup table.

我的目标是通过查找 IP 在 StartIPNum 和 EndIPNum 之间的记录来确定 IP(十进制形式)的 LocationNum

My goal is to determine the LocationNum for an IP(decimal form) by finding the record where the IP is between StartIPNum and EndIPNum

我的存储过程如下所示:参数:@DecimalIP bigint

My stored procedure looks like this: Parameter: @DecimalIP bigint

select GeoBlocks.StartIPNum ,@DecimalIP as DecimalIp 
     ,GeoBlocks.Postalcode ,GeoBlocks.Latitude as Latitude
     ,GeoBlocks.Longitude as Longitude
     from GeoBlocks
    where @DecimalIP between GeoBlocks.StartIPNum  and GeoBlocks.EndIPNumb

我在 StartIPNum 和 EndIPNum 上创建了唯一索引.

I have created unique indexes on StartIPNum and EndIPNum.

但是,当我运行它时,SQL Server 会对查询的 Where 部分进行表扫描.此查询需要 650-750 毫秒.(我服务器上的大多数查询需要 0-2 毫秒)

However, when I run this, SQL server does a table scan for the Where portion of the query. This query takes 650-750ms. (Most queries on my server take 0-2ms)

如何加快查询速度?

添加示例数据:

StartIPNum  EndIPNumb   LocationNum PostalCode  Latitude    Longitude
1350218632  1350218639  2782113                  48.2000    16.3667
1350218640  1350218655  2782113                  48.2000    16.3667
1350218656  1350218687  2782113                  48.2000    16.3667
1350218688  1350218751  2782113                  48.2000    16.3667
1350218752  1350218783  2782113                  48.2000    16.3667

推荐答案

更新:

总结分散在各种评论中的信息:

To summarize information scattered among various comments:

  1. IP 地址列是 VarChar(50) 字符串,包含没有左填充的十进制值.这些列上的索引将按字母顺序而不是数字顺序对它们进行排序,即10"<2".(使用左填充,排序在数字上也是正确的:10">02".)

  1. The IP address columns are VarChar(50) strings containing decimal values without left padding. An index on those columns will sort them alphabetically, not numerically, i.e. "10" < "2". (With left padding the sort will be correct numerically as well: "10" > "02".)

WHERE 子句( where @DecimalIP 在 GeoBlocks.StartIPNum 和 GeoBlocks.EndIPNumb 之间)使用混合数据类型.@DecimalIP 是一个 BIGINT 而两列是 VarChar(50).SQL 通过实现数据类型优先级方案来处理混合数据类型之间的操作.(Ref.)这会导致每行中的 IP 地址被转换从字符串到 BIGINT 值,因此比较以数字方式完成,并且以相当大的成本返回预期"结果.在这种情况下,索引(几乎)毫无用处.

The WHERE clause (where @DecimalIP between GeoBlocks.StartIPNum and GeoBlocks.EndIPNumb) uses mixed datatypes. @DecimalIP is a BIGINT while the two columns are VarChar(50). SQL handles operations among mixed datatypes by implementing a data type precedence scheme. (Ref.) This causes the IP addresses in each row to be converted from strings to BIGINT values, hence the comparison is done numerically and the "expected" results are returned at a considerable cost. The indexes are (all but) useless in this case.

将列更改为 BIGINT 将允许使用索引来提高性能并确保比较按数字而不是按字母顺序进行.包含 StartIPNumEndIPNumb 列的单个索引将大大提高性能.请注意,如果不允许重叠地址范围,则索引在 StartIPNum 上将有效地唯一,并且可以用 StartIPNum 上的索引替换为 EndIPNumb作为包含列的性能.

Changing the columns to BIGINT will allow the use of an index to improve performance and ensure that comparisons are done numerically rather than alphabetically. An single index containing both the StartIPNum and EndIPNumb columns will greatly improve performance. Note that if overlapping address ranges are not allowed then the index will effectively be unique on StartIPNum and could be replaced with an index on StartIPNum with EndIPNumb as an included column for performance.

原答案:

如果您使用点号表示的 IPV4 地址,例如192.168.0.42",您可以使用此 UDF 将字符串转换为 BIGINT 值:

If you are using IPV4 addresses in dotted notation, e.g. "192.168.0.42", you can convert the strings into BIGINT values with this UDF:

create function [dbo].[IntegerIPV4Address]( @IPV4Address VarChar(16) )
  returns BigInt
  with SchemaBinding
  begin
  declare @Dot1 as Int = CharIndex( '.', @IPV4Address );
  declare @Dot2 as Int = CharIndex( '.', @IPV4Address, @Dot1 + 1 );
  declare @Dot3 as Int = CharIndex( '.', @IPV4Address, @Dot2 + 1 );
  return Cast( Substring( @IPV4Address, 0, @Dot1 ) as BigInt ) * 0x1000000 +
    Cast( Substring( @IPV4Address, @Dot1 + 1, @Dot2 - @Dot1 - 1 ) as BigInt ) * 0x10000 +  
    Cast( Substring( @IPV4Address, @Dot2 + 1, @Dot3 - @Dot2 - 1 ) as BigInt ) * 0x100 +
    Cast( Substring( @IPV4Address, @Dot3 + 1, Len( @IPV4Address ) * 1 ) as BigInt );
  end

您可以根据函数结果存储整数值或在计算列上创建索引.请注意,您需要更改查询以引用 WHERE 子句中的整数列.

You can either store the integer values or create an index on a computed column based on the functions result. Note that you need to change your query to reference the integer column in the WHERE clause.

如果您将值存储为整数,以下函数会将它们转换回规范化字符串,其中地址的每个部分都是三位数.这些值可用于比较,因为它们将按字母顺序和数字顺序排序.

If you store the values as integers the following function will convert them back to normalized strings where each part of the address is three digits. These values can be used in comparisons since they will sort the same way both alphabetically and numerically.

create function [dbo].[NormalizedIPV4Address]( @IntegerIPV4Address as BigInt )
  returns VarChar(16)
  with SchemaBinding -- Deterministic function.
  begin
  declare @BinaryAddress as VarBinary(4) = Cast( @IntegerIPV4Address as VarBinary(4) );
  return Right( '00' + Cast( Cast( Substring( @BinaryAddress, 1, 1 ) as Int ) as VarChar(3) ), 3 ) +
    '.' + Right( '00' + Cast( Cast( Substring( @BinaryAddress, 2, 1 ) as Int ) as VarChar(3) ), 3 ) +
    '.' + Right( '00' + Cast( Cast( Substring( @BinaryAddress, 3, 1 ) as Int ) as VarChar(3) ), 3 ) +
    '.' + Right( '00' + Cast( Cast( Substring( @BinaryAddress, 4, 1 ) as Int ) as VarChar(3) ), 3 )
  end

您可以对表中的字符串值进行往返,将它们全部转换为规范化"形式,以便使用这两个函数对它们进行正确排序.不是一个理想的解决方案,因为它需要对所有未来的插入和更新进行规范化,但目前可能会有所帮助.

You could round-trip the string values in your table to get them all into "normalized" form so that they sort correctly by using both functions. Not an ideal solution since it requires that all future inserts and updates be normalized, but it may help for the moment.

相关文章