MySQL - 正确的方法事件计数

2022-01-23 00:00:00 join subquery mysql self-join

我想列出具有特定事件计数的用户,但我对采用哪种方法感到困惑.

I want to list users which have a particular event count but I'm confused on which approach to take.

这是数据库表:

CREATE TABLE `event` (
  `event_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `visitor_id` int(11) DEFAULT NULL,
  `key` varchar(200) DEFAULT NULL,
  `value` text,
  `label` varchar(200) DEFAULT '',
  `datetime` datetime DEFAULT NULL,
  PRIMARY KEY (`event_id`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8;


INSERT INTO `event` (`event_id`, `visitor_id`, `key`, `value`, `label`, `datetime`)
VALUES
    (1, 1, 'LOGIN', NULL, '', NULL),
    (2, 2, 'LOGIN', NULL, '', NULL),
    (3, 1, 'VIEW_PAGE', 'HOTEL', '', NULL),
    (4, 2, 'VIEW_PAGE', 'HOTEL', '', NULL),
    (5, 1, 'PURCHASE_HOTEL', NULL, '', NULL);

CREATE TABLE `visitor` (
  `visitor_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `datetime` datetime DEFAULT NULL,
  PRIMARY KEY (`visitor_id`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8;

INSERT INTO `visitor` (`visitor_id`, `datetime`)
VALUES
    (1, NULL),
    (2, NULL);

这是我的方法:

SELECT DISTINCT
    t1.`visitor_id`
FROM
    `visitor` t1

JOIN `event` t2 on t1.visitor_id = t2.visitor_id AND t2.`key` = 'LOGIN'
JOIN `event` t3 on t1.visitor_id = t3.visitor_id AND t3.`key` = 'VIEW_PAGE' AND t3.`value` = 'HOTEL'
WHERE ( SELECT COUNT(*) FROM `event` WHERE `event`.`key` = 'PURCHASE_HOTEL' ) > 0

这应该只列出访客 1,但它实际上也列出了没有 PURCHASE_HOTEL 事件的访客 2.

this should only list visitor 1 but it does actually list visitor 2 too which does not have the PURCHASE_HOTEL event.

您可以想象,每个特定案例都会有更多的规则",例如所有 JOIN 事件.我们能以某种方式纠正和改进这个问题吗?

As you can imagine, there will be more "rules" like all the JOIN events for each particular case. Can we correct and improve this somehow?

奖励:这种方法叫什么名字?

BONUS: What is the name of this approach?

推荐答案

我认为这是一个set-within-sets"查询.对于这种类型的查询,我喜欢使用带有 having 子句的聚合.以下检查您正在寻找的三个条件:

I think this is a "set-within-sets" query. I like using aggregation with a having clause for this type of query. The following checks the three conditions you are looking for:

select visitor_id
from event e
group by visitor_id
having sum(e.key = 'LOGIN') > 0 and
       sum(e.key = 'VIEW_PAGE' and e.value = 'HOTEL') > 0 and
       sum(e.key = 'PURCHASE_HOTEL') > 0;

having 子句中的第一个条件计算LOGIN 记录的数量,当至少找到一条时为真.(如果您想要一个,请将 >0 更改为 = 0.)

The first condition in the having clause counts the number of LOGIN records and is true when at least one is found. (If you want exactly one, change > 0 to = 0.)

第二个条件检查酒店页面的查看情况.

The second condition checks the viewing of the hotel page.

第三个统计酒店购买次数.

The third counts the number of hotel purchases.

相关文章