Unserializing PHP serialized data in T-SQL

2021-12-19 00:00:00 serialization php magento tsql

I am trying to extract a gift card code from a Magento order. Some other code uses the Magento API to retrieve the order info as XML and inserts that XML into an MS SQL Server record. Using T-SQL I can use the XML functions to parse the XML retrieved from the Magento API and get almost everything I need, but the only place the actual gift card code is stored is the gift_cards field, which happens to be a PHP serialized string.
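As a rough sketch of that XML step: assuming the stored order XML looks something like the fragment below (the element names `order` and `increment_id` and the nesting are illustrative guesses, not Magento's actual API schema), the serialized string can be pulled out with the `value()` method of the xml type.

```sql
-- Hypothetical order XML: the element names and nesting are assumptions
-- made for illustration, not the real Magento API response layout.
declare @order_xml xml = N'<order>
  <increment_id>100000123</increment_id>
  <gift_cards>a:1:{i:0;a:5:{s:1:"i";s:1:"1";s:1:"c";s:12:"00XCY8S3ZXCU";s:1:"a";d:119;s:2:"ba";d:119;s:10:"authorized";d:119;}}</gift_cards>
</order>';

-- value() needs a singleton path expression, hence the [1]
select
    @order_xml.value('(/order/increment_id)[1]', 'varchar(50)')  as increment_id,
    @order_xml.value('(/order/gift_cards)[1]',   'varchar(max)') as gift_cards_serialized;
```

This only gets as far as the raw serialized string; the gift card code is still buried inside it.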

Examples:

a:1:{i:0;a:5:{s:1:"i";s:1:"1";s:1:"c";s:12:"00XCY8S3ZXCU";s:1:"a";d:119;s:2:"ba";d:119;s:10:"authorized";d:119;}}
a:3:{i:0;a:5:{s:1:"i";s:2:"10";s:1:"c";s:12:"045EMJJWRCF1";s:1:"a";d:100;s:2:"ba";d:100;s:10:"authorized";d:100;}i:1;a:5:{s:1:"i";s:2:"11";s:1:"c";s:12:"06DUQ7Z5GVT7";s:1:"a";d:101;s:2:"ba";d:101;s:10:"authorized";d:101;}i:2;a:5:{s:1:"i";s:2:"12";s:1:"c";s:12:"07A6MRYW511J";s:1:"a";d:102;s:2:"ba";d:102;s:10:"authorized";d:102;}}

The gift card code is the value stored under the key "c" in each inner array, for example: 00XCY8S3ZXCU, 045EMJJWRCF1, 06DUQ7Z5GVT7, 07A6MRYW511J
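For a single card, plain string functions can get at that value. The sketch below is a naive illustration only: it assumes the code is always 12 characters long and always appears as `s:1:"c";s:12:"..."`, and it finds just the first card, so it does not handle the multi-card example.

```sql
declare @s varchar(max) =
    'a:1:{i:0;a:5:{s:1:"i";s:1:"1";s:1:"c";s:12:"00XCY8S3ZXCU";s:1:"a";d:119;s:2:"ba";d:119;s:10:"authorized";d:119;}}';

-- the key "c" followed by the type/length prefix of a 12-character string value
declare @marker varchar(20) = 's:1:"c";s:12:"';
declare @pos int = charindex(@marker, @s);

-- take the 12 characters right after the marker; NULL if the marker is missing
select case when @pos > 0
            then substring(@s, @pos + len(@marker), 12)
       end as first_gift_card_code;
-- returns 00XCY8S3ZXCU
```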

I'm currently trying to tackle this by parsing the value with a T-SQL function, which is like trying to drive a nail with a screwdriver. Apparently this has been asked here before, and the only suggestion was to build a parser from scratch in T-SQL, with the note that using PHP to unserialize it was the better option.

It would be nice if Magento didn't store PHP serialized data in its database and then serve it out, still serialized, through its web services, but that's what I have to work with. I would consider using C# to convert it and store it as a separate field in the database, but it would be a lot more convenient to be able to parse the data in T-SQL. If I were to use C# to parse and unserialize the PHP object, I'd probably store it back in the database as XML, because that's a much better format for exchanging the data.

Recommended answer

This is what I was able to come up with myself. I was encouraged by a post about parsing JSON and decided to figure it out for serialized PHP objects. I took a completely different approach, though.

An updated code sample is now available on GitHub.

The serialized PHP string:

a:3:{
  i:0;
  a:5:{
    s:1:"i";
    s:2:"10";

    s:1:"c";
    s:12:"045EMJJWRCF1";

    s:1:"a";
    d:100;

    s:2:"ba";
    d:100;

    s:10:"authorized";
    d:100;
  }

  i:1;
  a:5:{
    s:1:"i";
    s:2:"11";

    s:1:"c";
    s:12:"06DUQ7Z5GVT7";

    s:1:"a";
    d:101;

    s:2:"ba";
    d:101;

    s:10:"authorized";
    d:101;
  }

  i:2;
  a:5:{
    s:1:"i";
    s:2:"12";

    s:1:"c";
    s:12:"07A6MRYW511J";

    s:1:"a";
    d:102;

    s:2:"ba";
    d:102;

    s:10:"authorized";
    d:102;
  }
}

My query to get the results:

select *
from parsePhpSerializedString('a:3:{i:0;a:5:{s:1:"i";s:2:"10";s:1:"c";s:12:"045EMJJWRCF1";s:1:"a";d:100;s:2:"ba";d:100;s:10:"authorized";d:100;}i:1;a:5:{s:1:"i";s:2:"11";s:1:"c";s:12:"06DUQ7Z5GVT7";s:1:"a";d:101;s:2:"ba";d:101;s:10:"authorized";d:101;}i:2;a:5:{s:1:"i";s:2:"12";s:1:"c";s:12:"07A6MRYW511J";s:1:"a";d:102;s:2:"ba";d:102;s:10:"authorized";d:102;}}')

Query results:

element_id  parent_id   var_name                                           var_type                                           var_length  value_int   value_string                                                                                                                                                                                                                                                     value_decimal
----------- ----------- -------------------------------------------------- -------------------------------------------------- ----------- ----------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---------------------------------------
1           0           NULL                                               a                                                  3           NULL        NULL                                                                                                                                                                                                                                                             NULL
2           1           0                                                  a                                                  5           NULL        NULL                                                                                                                                                                                                                                                             NULL
3           1           1                                                  a                                                  5           NULL        NULL                                                                                                                                                                                                                                                             NULL
4           1           2                                                  a                                                  5           NULL        NULL                                                                                                                                                                                                                                                             NULL
5           2           i                                                  s                                                  2           NULL        10                                                                                                                                                                                                                                                               NULL
6           2           c                                                  s                                                  12          NULL        045EMJJWRCF1                                                                                                                                                                                                                                                     NULL
7           2           a                                                  d                                                  NULL        NULL        NULL                                                                                                                                                                                                                                                             100
8           2           ba                                                 d                                                  NULL        NULL        NULL                                                                                                                                                                                                                                                             100
9           2           authorized                                         d                                                  NULL        NULL        NULL                                                                                                                                                                                                                                                             100
10          3           i                                                  s                                                  2           NULL        11                                                                                                                                                                                                                                                               NULL
11          3           c                                                  s                                                  12          NULL        06DUQ7Z5GVT7                                                                                                                                                                                                                                                     NULL
12          3           a                                                  d                                                  NULL        NULL        NULL                                                                                                                                                                                                                                                             101
13          3           ba                                                 d                                                  NULL        NULL        NULL                                                                                                                                                                                                                                                             101
14          3           authorized                                         d                                                  NULL        NULL        NULL                                                                                                                                                                                                                                                             101
15          4           i                                                  s                                                  2           NULL        12                                                                                                                                                                                                                                                               NULL
16          4           c                                                  s                                                  12          NULL        07A6MRYW511J                                                                                                                                                                                                                                                     NULL
17          4           a                                                  d                                                  NULL        NULL        NULL                                                                                                                                                                                                                                                             102
18          4           ba                                                 d                                                  NULL        NULL        NULL                                                                                                                                                                                                                                                             102
19          4           authorized                                         d                                                  NULL        NULL        NULL                                                                                                                                                                                                                                                             102
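Because every leaf row carries the element_id of its containing array in parent_id, the flat result can be folded back into one row per gift card. This is a sketch built on the output above, not part of the original function; the column aliases simply reuse the serialized key names.

```sql
declare @s varchar(max) =
    'a:3:{i:0;a:5:{s:1:"i";s:2:"10";s:1:"c";s:12:"045EMJJWRCF1";s:1:"a";d:100;s:2:"ba";d:100;s:10:"authorized";d:100;}i:1;a:5:{s:1:"i";s:2:"11";s:1:"c";s:12:"06DUQ7Z5GVT7";s:1:"a";d:101;s:2:"ba";d:101;s:10:"authorized";d:101;}i:2;a:5:{s:1:"i";s:2:"12";s:1:"c";s:12:"07A6MRYW511J";s:1:"a";d:102;s:2:"ba";d:102;s:10:"authorized";d:102;}}';

-- one output row per inner array: group the leaf rows by their parent
select  parent_id as card_element_id,
        max(case when var_name = 'i'          then value_string  end) as [i],
        max(case when var_name = 'c'          then value_string  end) as [c],
        max(case when var_name = 'a'          then value_decimal end) as [a],
        max(case when var_name = 'ba'         then value_decimal end) as [ba],
        max(case when var_name = 'authorized' then value_decimal end) as [authorized]
from dbo.parsePhpSerializedString(@s)
where var_type <> 'a'   -- skip the rows that represent the arrays themselves
group by parent_id
order by parent_id;
```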

If I just wanted the gift card codes, I could write a query like this:

select value_string
from parsePhpSerializedString('a:3:{i:0;a:5:{s:1:"i";s:2:"10";s:1:"c";s:12:"045EMJJWRCF1";s:1:"a";d:100;s:2:"ba";d:100;s:10:"authorized";d:100;}i:1;a:5:{s:1:"i";s:2:"11";s:1:"c";s:12:"06DUQ7Z5GVT7";s:1:"a";d:101;s:2:"ba";d:101;s:10:"authorized";d:101;}i:2;a:5:{s:1:"i";s:2:"12";s:1:"c";s:12:"07A6MRYW511J";s:1:"a";d:102;s:2:"ba";d:102;s:10:"authorized";d:102;}}')
where   parent_id != 0 and
        var_name = 'c'

Results:

value_string
-------------
045EMJJWRCF1
06DUQ7Z5GVT7
07A6MRYW511J
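To run the same filter against stored rows rather than a literal, the function can be joined with CROSS APPLY. In this sketch the table variable @orders and its columns are hypothetical stand-ins for wherever the gift_cards string ends up after it is extracted from the order XML.

```sql
-- hypothetical staging table: one row per order with its serialized gift_cards value
declare @orders table (order_id int primary key, gift_cards varchar(max));

insert into @orders (order_id, gift_cards) values
    (1, 'a:1:{i:0;a:5:{s:1:"i";s:1:"1";s:1:"c";s:12:"00XCY8S3ZXCU";s:1:"a";d:119;s:2:"ba";d:119;s:10:"authorized";d:119;}}'),
    (2, 'a:3:{i:0;a:5:{s:1:"i";s:2:"10";s:1:"c";s:12:"045EMJJWRCF1";s:1:"a";d:100;s:2:"ba";d:100;s:10:"authorized";d:100;}i:1;a:5:{s:1:"i";s:2:"11";s:1:"c";s:12:"06DUQ7Z5GVT7";s:1:"a";d:101;s:2:"ba";d:101;s:10:"authorized";d:101;}i:2;a:5:{s:1:"i";s:2:"12";s:1:"c";s:12:"07A6MRYW511J";s:1:"a";d:102;s:2:"ba";d:102;s:10:"authorized";d:102;}}');

-- the function is evaluated once per order row
select  o.order_id,
        p.value_string as gift_card_code
from @orders as o
cross apply dbo.parsePhpSerializedString(o.gift_cards) as p
where   p.parent_id <> 0 and
        p.var_name = 'c'
order by o.order_id, p.element_id;
```

Since this is a multi-statement table-valued function built around loops, calling it per row is likely to be slow on large tables, so for bulk work it may be worth parsing once and storing the codes in their own column.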

Here is the T-SQL function for parsing the serialized PHP string:

IF OBJECT_ID (N'dbo.parsePhpSerializedString') IS NOT NULL
   DROP FUNCTION dbo.parsePhpSerializedString
GO
CREATE FUNCTION dbo.parsePhpSerializedString( @phpSerialized VARCHAR(MAX))
RETURNS @results table 
    (
        element_id int identity(1,1) not null, /* internal surrogate primary key gives the order of parsing and the list order */
        parent_id int, /* if the element has a parent then it is in this column. */
        var_name varchar(50), /* the name or key of the element in a key/value array list */
        var_type varchar(50),
        var_length int,
        value_int int,
        value_string varchar(max),
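        value_decimal numeric /* numeric with no precision/scale means numeric(18,0), so fractional digits of d values are not kept */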
    )
AS
BEGIN

    /*
    Built by Matt Johnson (matt@evdat.com) 2012-08-14
    */

    -- we use this table later for collecting auto generated
    -- identity values when inserting records into @results
    declare @insertedIds table (
        element_id int
    )

    -- define variables
    declare @element_start int
    declare @var_type_end int
    declare @var_type varchar(50)
    declare @element_end int
    declare @chunk varchar(max)
    declare @var_length_start int
    declare @var_length_end int
    declare @var_length_string varchar(max)
    declare @var_length int
    declare @value_start int
    declare @value_end int
    declare @value_string varchar(max)
    declare @value_int int
    declare @value_decimal numeric
    declare @array_level int
    declare @value_string_position int
    declare @next_open int
    declare @next_close int
    declare @parent_id int
    declare @element_id int
    declare @key_element_id int
    declare @inserted_element_id int
    declare @var_name varchar(50)

    --initialize variables
    set @parent_id = 0


    --loop through the supplied @phpSerialized string until it's empty
    while 1=1 begin
        set @element_start = null
        set @var_type_end = null
        set @var_type = null
        set @element_end = null
        set @chunk = null
        set @var_length_start = null
        set @var_length_end = null
        set @var_length_string = null
        set @var_length = null
        set @value_start = null
        set @value_end = null
        set @value_string = null
        set @value_int = null
        set @value_decimal = null
        set @array_level = null
        set @value_string_position = null
        set @next_open = null
        set @next_close = null
        set @var_name = null

        --confirm that there is an element to parse and define its starting point
        --patindex will return a value of 1 if the pattern is found and this pattern
        --will only match if the element starting point is the first character in the
        --supplied string. If it is encapsulated in quotes or anything else it will not match
        set @element_start = patindex('[asid]:%[;}]', @phpSerialized)

        if @element_start <= 0 begin
            --if the supplied string is now empty check the existing results table
            --for any nested elements in any array elements

            --reset the value of @element_id to be safe
            set @element_id = null

            --only retrieve the first element found containing sub elements to parse
            select  top 1 
                    @phpSerialized = value_string,
                    @element_id = element_id    
            from @results 
            where   var_type = 'a' and 
                    value_string is not null
            order by element_id --always pick the earliest unparsed array so the parse order is deterministic

            --set the parent_id to the array's element_id
            set @parent_id = @element_id

            --if there were no results found then that means there either
            --were no arrays to parse, or all arrays have already been parsed
            --so break the continuous loop because we are completely done now
            if @element_id is null break

            --set the @element_start again now that we 
            --have a new string to parse for elements
            set @element_start = patindex('[asid]:%[;}]', @phpSerialized)
        end

        --find the end of the type of the element then extract the variable type from the string
        set @var_type_end = patindex('%:%', @phpSerialized)
        set @var_type = substring(@phpSerialized, @element_start, @var_type_end-@element_start)

        --generate an error if a variable type is supplied that hasn't been coded for.
        if @var_type not like '[asid]' begin
            /*
            print @var_type
            RAISERROR (N'Error parsing php serialized string. Variable type found that has not been defined to parse for.', -- Message text.
                       16, -- Severity,
                       1 -- State
                       )
            */

            --apparently errors can't be raised within a function, so stop parsing
            --here and return whatever has been collected so far
            break
        end

        --array elements contain sub elements so we use different methods for parsing
        --sub elements than we do for parsing individual elements.
        if @var_type != 'a' begin
            --element has no sub elements

            --determine the end of this individual element and then extract 
            --only this individual element from the string
            set @element_end = patindex('%;%', @phpSerialized)+1
            set @chunk = substring(@phpSerialized, @element_start, @element_end-@element_start)

            --strings are serialized differently than numeric elements
            if @var_type = 's' begin
                --element has var length

                --find the starting and ending positions for the var_length and then extract the length
                set @var_length_start = @var_type_end+1
                set @var_length_end = patindex('%:%', substring(@chunk, @var_length_start, len(@chunk))) + @var_length_start - 1
                set @var_length_string = substring(@chunk, @var_length_start, @var_length_end-@var_length_start)
                if @var_length_string not like '%[^0-9]%' begin
                    --it's nice to verify this is actually a number before casting it as such
                    --(the %...% wildcards are needed so that every character is checked)
                    set @var_length = cast(@var_length_string as int)
                end

                --find the starting and ending positions for the value and then extract the value
                set @value_start = @var_length_end+1
                set @value_end = patindex('%;%', @chunk)
                --a string value is quoted so remove quotes in start and end of substring for value
                --we set the substring starting position +1 just past the start of the quote and then
                --set the length of the extracted string -2 to account for both the starting quote and 
                --ending quote to be removed from the extracted string.
                set @value_string = substring(@chunk, @value_start+1, @value_end-@value_start-2)

            end else begin
                --element does not have a var length

                --find the starting and ending positions for the value and then extract the value as a string
                set @value_start = @var_type_end+1
                set @value_end = patindex('%;%', @chunk)
                set @value_string = substring(@chunk, @value_start, @value_end-@value_start)

                --determine what value type the string should be converted to
                if @var_type = 'i' begin
                    if @value_string not like '%[^0-9-]%' begin
                        set @value_int = cast(@value_string as int)
                        --clear the value_string because the element's value has been converted to its appropriate type
                        set @value_string = null
                    end
                end else if @var_type = 'd' begin
                    if @value_string not like '%[^0-9.-]%' begin
                        set @value_decimal = cast(@value_string as numeric)
                        --clear the value_string because the element's value has been converted to its appropriate type
                        set @value_string = null
                    end
                end

            end


        end else begin
            --element is array and has sub elements

            --we are going to chop up the string to try and determine its end so we'll
            --first set the string to a variable we can destroy in this process
            set @chunk = @phpSerialized

            --find the starting and ending positions for the var_length and then extract the length
            --arrays use this to state how many elements this array contains
            set @var_length_start = @var_type_end+1
            set @var_length_end = patindex('%:%', substring(@chunk, @var_length_start, len(@chunk))) + @var_length_start - 1
            set @var_length_string = substring(@chunk, @var_length_start, @var_length_end-@var_length_start)
            if @var_length_string not like '%[^0-9]%' begin
                set @var_length = cast(@var_length_string as int)
            end

            --find the value starting position
            --later we will find the true end of the value
            set @value_start = @var_length_end+1

            -- to determine the ending position we have to dig through the sub elements and track the
            -- nested level to identify the ending brace for this level
            set @array_level = 0
            --we start the string position at 1 for the beginning of the serialized string
            set @value_string_position = 1

            -- loop through the value chopping up the chunk while trying to find the ending brace for this array
            while 1=1 begin

                --find the next open and close braces in the chunk
                set @next_open = patindex('%{%', @chunk)
                set @next_close = patindex('%}%', @chunk)

                --check to see which brace is the next in the chunk
                if @next_open > 0 and @next_open < @next_close begin
                    --found an opening brace

                    --since this is an opening brace we need to increment the level and strip off
                    --everything from the chunk before the brace so that we can search for additional braces
                    --we also note the position in the string for use in finding the end of the value later
                    --we track the previous position and add to it because we keep chopping off the beginning of
                    --the chunk as we parse through the string, and later we will need to reference the position
                    --relative to the entire serialized string.
                    set @value_string_position = @value_string_position + patindex('%{%', @chunk)-1
                    set @chunk = substring(@chunk, patindex('%{%', @chunk)+1, len(@chunk))
                    set @array_level = @array_level + 1

                end else if @next_close > 0 begin
                    --found a closing brace
                    --print 'found close at level: ' + cast(@array_level as varchar(10)) + '(' + cast(patindex('%}%', @chunk) as varchar(10)) + ')'

                    --since this is a closing brace we need to decrement the level and strip off
                    --everything from the chunk before the brace so that we can search for additional braces
                    --we also note the position in the string for use in finding the end of the value later
                    --we track the previous position and add to it because we keep chopping off the beginning of
                    --the chunk as we parse through the string, and later we will need to reference the position
                    --relative to the entire serialized string.
                    set @value_string_position = @value_string_position + patindex('%}%', @chunk)+1
                    set @chunk = substring(@chunk, patindex('%}%', @chunk)+1, len(@chunk))
                    set @array_level = @array_level - 1

                end else break

                --once we get back to level 0 we know we've found the end of this array element
                --so break the continuous loop now that we have the ending position
                if @array_level <= 0 break
            end

            --set the ending position of the element and the value since the value is the last part of the element
            set @element_end = @value_string_position
            set @value_end = @element_end
            --an array value is surrounded by braces so remove the braces in start and end of the substring value
            --we set the substring starting position +1 just past the start of the opening brace and then
            --set the length of the extracted string -2 to account for both the opening brace and 
            --closing brace to be removed from the extracted string.
            set @value_string = substring(@phpSerialized, @value_start+1, @value_end-@value_start-2)
            set @chunk = substring(@phpSerialized, @element_start, @element_end-@element_start)

            -- if the array is empty just set it to null so that
            -- we don't try and parse the contents of the array value later.
            if @value_string = '' set @value_string = null
        end

        --we populate the results table differently depending on the element that is being parsed. 
        --Any element contained in an array has a key element and a value element. Though we parse
        --all key elements in an array all we do with them in the results table is set their value
        --as the var_name (key) for the element, and store the key_element_id for setting the value
        --in the next pass since key/value pairs are listed sequentially in a serialized string.
        if @parent_id > 0 and @key_element_id > 0 begin
            --parent_id > 0 indicates this is a sub element inside an array
            --and the array contents currently being parsed contain key elements and value
            --elements sequentially in the serialized string. Because key_element_id is not 0
            --we know this must be the value part of the element contained in the array as the
            --key was just previously defined.

            --update the existing keyed element with its type, length and value
            update @results
            set var_type = @var_type, 
                var_length = @var_length, 
                value_string = @value_string, 
                value_int = @value_int, 
                value_decimal = @value_decimal
            where element_id = @key_element_id      

            set @key_element_id = null
        end else if @parent_id > 0 begin
            --this element is a part of the array which contains key/value pairs and since
            --the @key_element_id is 0 or not defined we can assume this is the key of the pair

            --determine what the key element type is and cast it as a string to the var_name
            if @var_type = 'i' set @var_name = cast(@value_int as varchar(50))
            if @var_type = 'd' set @var_name = cast(@value_decimal as varchar(50))
            if @var_type = 's' set @var_name = @value_string

            --here we are assuming that all keys should be of type s, d, or i... so if we get anything
            --other than that throw an error since this code is not setup to handle that.
            if @var_type not like '[sid]' begin
                /*
                print @var_type
                RAISERROR (N'Error parsing php serialized string. Variable type found that should be a key but is not of type s, d, or i.', -- Message text.
                           16, -- Severity,
                           1 -- State
                           )
                */

                --apparently errors can't be raised within a function, so stop parsing
                --here and return whatever has been collected so far
                break
            end

            --insert a new record into the results table defining the parent_id and var_name
            insert @results 
            (
                parent_id,
                var_name
            ) 
            output inserted.element_id into @insertedIds 
            values (
                @parent_id,
                @var_name
            )

            --since we stored the identity value in the output set that value to @inserted_element_id
            select top 1 @inserted_element_id = element_id from @insertedIds
            delete from @insertedIds

            --set the key_element_id so that the next pass catches the value and assigns it to this result record
            set @key_element_id = @inserted_element_id

        end else begin
            --this will be executed for parsed strings that are not part of an array
            --in which case the elements do not contain value_name keys

            --insert the entire element details into the results table
            insert into @results 
            (
                parent_id, 
                var_type, 
                var_length, 
                value_string, 
                value_int, 
                value_decimal
            ) 
            output inserted.element_id into @insertedIds 
            values (
                @parent_id,
                @var_type,
                @var_length,
                @value_string,
                @value_int,
                @value_decimal
            )

            --here we capture the identity value for the inserted record
            select top 1 @inserted_element_id = element_id from @insertedIds
            delete from @insertedIds
        end

        -- if the current php serialized string was an array then it would have
        -- been parsed and elements added to the results, so we should remove
        -- the value_string from the array element in the results table so that
        -- it doesn't get parsed again in the loop. 
        if @element_id is not null begin
            --the only strings that get parsed where the @element_id has
            --a value is from an array element
            update @results
            set value_string = null
            where element_id = @element_id
        end

        --since we have parsed this element from the serialized string chop off this element
        --from the string and run the rest of it through the loop again to ensure all
        --elements have been parsed from the supplied serialized string.
        set @phpSerialized = substring(@phpSerialized, @element_end, len(@phpSerialized))

    end

    return
end

Though it doesn't account for all serialized types (only a, s, i and d are handled), it was enough for my purpose here, and the included comments should help guide anyone who needs to adapt it for their own use or extend its functionality.
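Since the question mentions storing the data back as XML, here is one possible way to do that with the function's output and FOR XML PATH. This is only a sketch: the element and attribute names are made up for illustration, and the result is a flat list in which each element points at its parent by id rather than a nested document.

```sql
declare @s varchar(max) =
    'a:1:{i:0;a:5:{s:1:"i";s:1:"1";s:1:"c";s:12:"00XCY8S3ZXCU";s:1:"a";d:119;s:2:"ba";d:119;s:10:"authorized";d:119;}}';

-- aliases starting with @ become attributes; text() becomes the element content
select  element_id as '@id',
        parent_id  as '@parent',
        var_name   as '@name',
        var_type   as '@type',
        coalesce(value_string,
                 cast(value_int as varchar(50)),
                 cast(value_decimal as varchar(50))) as 'text()'
from dbo.parsePhpSerializedString(@s)
order by element_id
for xml path('element'), root('php_serialized');
```

Rows that represent arrays come out as empty elements, because the function clears their value_string once their contents have been parsed.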
