MariaDB/MySQL UPDATE statement with multiple joins, including a range join

2022-01-15 00:00:00 sql-update sql mariadb mysql

I have the following tables.

Login history

create table login_history
(
    id         int auto_increment primary key,
    ip         int unsigned,
    created    datetime(6)  not null,
    uid        int unsigned not null
);

IP-to-location table

create table ip2location
(
    ip_from      int unsigned not null primary key,
    ip_to        int unsigned null,
    country_code char(2)      null
);

Account table

create table account
(
    uid               int unsigned not null primary key
);

Orders table

create table `order`
(
    id             int auto_increment primary key,
    item_id        varchar(20)       not null,
    price          int               not null,
    timestamp      datetime(6)       not null,
    country_code   char(2)           null,
    uid            int unsigned      null
);

All tables have suitable indexes for this problem.

My goal is to fill in the country code of each order with the country from the ip2location table. I have a login history, and because I do not want to make the problem more complicated than necessary, I am fine with using the most recent IP a user had in the given time range. I assume that switching countries and buying something within that time range is a negligible case. Also, because the login history is only kept for a couple of days, I want to fill in old orders whose country_code is null as well, using the country obtained for the user.

My approach is as follows.

I am trying to join the tables with the following "on" expressions.

update `order`
left join account using (uid)
left join (
    select uid,
           LAST_VALUE(ip) over (PARTITION BY uid) as `ip_int`
    from login_history
    where created >= '{{ current_date }}'
    and created < '{{ next_date }}'
    group by uid
    ) as lh
on account.uid = lh.uid
left join ip2location as ip on
    (ip.ip_from < lh.ip_int and ip.ip_to >= lh.ip_int)
    or
    (ip.ip_from = lh.ip_int)
set
    `order`.country_code = ip.country_code
where `order`.country_code is null;

It works but is very slow, probably also because of the size of the tables:

  • login_history: >15 million entries (the where clause reduces this to 500K entries)
  • account: >7 million entries
  • ip2location: ~200K entries
  • order: >1 million entries

Maybe this is a use case where MariaDB can't provide a solution. The target is to finish this query in under 30 seconds. To avoid locking the table for too long, faster would of course be better.

I see some potential for improvement in the following expression. To find the right entry in the ip2location table, I have to use a range, and I also have to consider entries where only one IP is given and the ip_to field is null.

left join ip2location as ip on
    (ip.ip_from <= lh.ip_int and ip.ip_to >= lh.ip_int)
    or
    (ip.ip_from = lh.ip_int)
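
For illustration, the range-or-single-IP match described above can also be written without the "or". This is only a sketch under an assumption the question does not state: that the ranges in ip2location never overlap. In that case the only candidate row for an address is the one with the largest ip_from not exceeding it, which the primary key on ip_from can locate directly; coalesce then treats a null ip_to as a single-address entry. The variable @ip and the sample address are made up for the example.

```sql
-- Example address only; inet_aton('192.168.1.1') yields 3232235777.
set @ip := inet_aton('192.168.1.1');

select c.country_code
from (
    -- Candidate row: the greatest ip_from that is <= the address.
    -- Assumes non-overlapping ranges, so at most one row can match.
    select ip_from, ip_to, country_code
    from ip2location
    where ip_from <= @ip
    order by ip_from desc
    limit 1
) as c
-- A null ip_to means a single-address entry, so the upper bound
-- falls back to ip_from itself.
where @ip <= coalesce(c.ip_to, c.ip_from);
```

If the ranges can overlap, this shortcut may return the wrong row or none at all, and the original two-sided condition is needed.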

Also, the following select looks somewhat time-intensive:

select uid,
       LAST_VALUE(ip) over (PARTITION BY uid) as `ip_int`
from login_history
where created >= '{{ current_date }}'
and created < '{{ next_date }}'
group by uid
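
As a side note, a window without an ORDER BY does not define which row counts as the "last" one, so the select above may not reliably pick the most recent IP. One possible way to express "latest IP per user in the time range" is sketched below with row_number(), which needs MariaDB 10.2+ or MySQL 8.0+. The literal dates stand in for the template variables and are example values only, and the alias names ranked and rn are made up for the sketch.

```sql
select uid, ip as ip_int
from (
    select uid,
           ip,
           -- Number each user's logins from newest to oldest.
           row_number() over (partition by uid order by created desc) as rn
    from login_history
    where created >= '2022-01-14'   -- stands in for '{{ current_date }}'
      and created <  '2022-01-15'   -- stands in for '{{ next_date }}'
) as ranked
-- Keep only the newest login per user.
where rn = 1;
```

Whether this is faster than the original select depends on the available indexes; it is meant to show the intent, not a measured improvement.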

I thought about splitting this up by first running a select and then an update statement, but in the end this could take more time and would also use more CPU time because of the script that organizes this task.

Can you help me find a better query, or do you have some good advice on how to tackle this problem efficiently?

Thanks in advance and have a nice day!

Recommended answer

I think the following approach, based on a correlated subquery, does what you ask for:

update `order` o
set country_code = (
    select il.country_code
    from login_history lh
    inner join ip2location il on lh.ip >= il.ip_from and lh.ip <= il.ip_to
    where lh.created <= o.timestamp and lh.uid = o.uid
    order by lh.created desc limit 1
)
where o.country_code is null;

This searches for the latest login_history row for the same user whose date is prior or equal to the order timestamp, and retrieves the corresponding country.
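
Before running an update of this kind on a large table, it may help to preview what the subquery would resolve for a handful of orders. The select below is a hedged sketch of such a dry run, using the table and column names from the question's schema. The coalesce on ip_to is an addition not present in the answer: it assumes that a null ip_to marks a single-address entry, as the question describes. The alias resolved_country and the limit of 20 are arbitrary choices for the example.

```sql
select o.id,
       o.uid,
       o.`timestamp`,
       (
           select il.country_code
           from login_history lh
           inner join ip2location il
                   on lh.ip >= il.ip_from
                  -- Assumption: null ip_to means the entry covers only ip_from.
                  and lh.ip <= coalesce(il.ip_to, il.ip_from)
           where lh.uid = o.uid
             and lh.created <= o.`timestamp`
           order by lh.created desc
           limit 1
       ) as resolved_country
from `order` o
where o.country_code is null
limit 20;
```

Rows where resolved_country comes back null are orders for which no matching login or IP range was found; the update would leave those unchanged.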
