如何获取最频繁的值SQL

2021-12-30 00:00:00 sql oracle11g oracle

我有一个表 Orders(id_trip, id_order)、表 Trip(id_hotel, id_bus, id_type_of_trip) 和表 Hotel(id_hotel, name).

I have a table Orders(id_trip, id_order), table Trip(id_hotel, id_bus, id_type_of_trip) and table Hotel(id_hotel, name).

我想获得订单表中最常出现的酒店的名称.

I would like to get name of the most frequent hotel in table Orders.

SELECT hotel.name from Orders
 JOIN Trip
 on Orders.id_trip = Trip.id_hotel
 JOIN hotel
 on trip.id_hotel = hotel.id_hotel
  FROM (SELECT hotel.name, rank() over (order by cnt desc) rnk
          FROM (SELECT hotel.name, count(*) cnt
                  FROM Orders
                 GROUP BY hotel.name))
 WHERE rnk = 1;

推荐答案

分布中最常出现的值"是统计学中的一个独特概念,具有技术名称.它被称为分布的模式.Oracle 有 STATS_MODE() 函数.https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions154.htm

The "most frequently occurring value" in a distribution is a distinct concept in statistics, with a technical name. It's called the MODE of the distribution. And Oracle has the STATS_MODE() function for it. https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions154.htm

例如,使用标准SCOTT模式中的EMP表,select stats_mode(deptno) from scott.emp将返回30 - 员工最多的部门数量.(30 是部门名称"或编号,不是该部门的员工人数!)

For example, using the EMP table in the standard SCOTT schema, select stats_mode(deptno) from scott.emp will return 30 - the number of the department with the most employees. (30 is the department "name" or number, it is NOT the number of employees in that department!)

就你而言:

select stats_mode(h.name) from (the rest of your query)

注意:如果两个或多个酒店并列为最频繁",则 STATS_MODE() 将返回其中之一(非确定性).如果您需要所有绑定值,您将需要一个不同的解决方案 - 一个很好的例子在文档中(上面链接).这是 Oracle 对统计概念的理解和实施中的一个记录缺陷.

Note: if two or more hotels are tied for "most frequent", then STATS_MODE() will return one of them (non-deterministic). If you need all the tied values, you will need a different solution - a good example is in the documentation (linked above). This is a documented flaw in Oracle's understanding and implementation of the statistical concept.

相关文章