哪种方法可以更快地使用 PHP/Laravel 从 MySQL/MariaDB 获取所有 POI
如果我错了,请纠正我.
Correct me if I'm wrong.
用户在我的网站上创建了三种方法来获取最近的房屋:
There are three approaches to get the nearest homes, users have created in my website:
- 要创建一个包含两列(纬度、经度)的表格,它们都是浮动的,然后说:
这里是:
$latitude = 50;
$longitude = 60;
SELECT * FROM my_table
WHERE (latitude <= $latitude+10 AND latitude >= $latitude-10)
AND (longitude <= $longitude+10 AND longitude >= $longitude-10)
这里的 10 表示例如 1 公里.
that 10 here means 1km for example.
在这种方法中,我们也可以使用 harvesine 公式.
In this approach we can also use harvesine formula.
将这些列(纬度,经度)合并到一列名为点的点为 POINT 类型,然后再次逐行搜索.
To merge those columns(latitude, longitude) to one column named point as POINT type and again search each row one by one.
要将多个点(用户创建的房屋坐标)分类为一个国家(即城市)的一个部分的类别,如果查询带有 $latitude 和 $longitude 以查看最近的房屋,我会检查它们存储在哪个类别中,以免搜索所有行,而仅搜索此查询(坐标)所属的部分.
To categorize multiple points(the coordinates of homes users have created) as a category for one section of a country i.e. city and if a query comes with $latitude and $longitude to see the nearest homes, I will check in which category they are stored IN ORDER NOT TO search all rows but search only the section this query(coordinate) belongs to.
我猜方法 1 很慢,因为表格的每一行的条件,如果我使用 harvesine 公式,也会很慢.
As I guess approach number 1 is slow because of the conditions for each row of table and again slow if I use harvesine formula.
如果我使用 ST_Distance,它似乎又变慢了,因为它只是有很多计算.
If I use ST_Distance it seems again it's slow because again it just has lots of calculations.
但是,如果我使用方法 3,检查每个部分的特定点用户似乎比检查所有行更快.我知道如何为每个家设置点,但我不知道如何在另一个表中创建多个家位置作为一个部分.
But if I use approach number 3 it seems it is faster to check each section for an specific point user is than check all rows. I know how to set point for each home however I don't know how to create multiple home positions as a section maybe in another table.
顺便说一句,新版本的 MySQL 和 MariaDB 空间索引在 InnoDB 中受支持.
BTW in new versions of MySQL and MariaDB Spatial Indexes are supported in InnoDB.
我的问题:
第 1 种方法真的很慢吗,或者其他 ST_* 函数与这种方法相同,可以用这些公式一一检查所有行?哪个更快?
Is approach number 1 really slow or other ST_* functions are the same as this approach to check all rows with those formulas mentioned there one by one? Which one is faster?
除了简单的条件之外,方法 2 是否可以使其更快?我的意思是当使用 POINT 类型而不是 float 并使用 ST_* 函数而不是自己做时,它是否会做出任何改变?我想知道算法是否不同.
Does approach number 2 do something other than simple conditions to make it faster? I mean does it make any changes when using type of POINT instead of float and using ST_* functions instead of doing it myself? I want to know whether the algorithm is different.
如果方法 3 是这三种方法中最快的,我如何对点进行分类,以免搜索表中的所有行?
If approach number 3 is the fastest in these three approaches, how can I categorize points in order not to search all rows in a table?
如何使用空间索引使其尽可能快?
How can I use Spatial Indexes to make it as fast as possible?
如果存在任何其他方法并且我没有提及,您能否告诉我如何仅通过 PHP/Laravel 中的 MySQL/MariaDB 中的坐标获得最近的房屋?
If any other approaches exist and I didn't mention, could you please tell me how can I get the nearest homes just by having coordinates in MySQL/MariaDB in PHP/Laravel?
谢谢大家
推荐答案
你使用哪个公式来计算距离并不重要.更重要的是您必须阅读、处理和排序的行数.在最好的情况下,您可以在 WHERE 子句中使用条件索引来限制处理的行数.您可以尝试对您的位置进行分类 - 但这取决于您的数据的性质,如果这会运作良好.您还需要找出要使用的类别".更通用的解决方案是使用 SPATIAL INDEX 和 ST_Within() 函数.
Which formula you use for the distance doesn't matter much. What matters much more is the number of rows which you have to read, process and sort. In best case you can use an index for a condition in the WHERE clause to limit the number of processed rows. You can try to categorize your locations - But it depends on the nature of your data, if that is going to work well. You would also need to find out which "category" to use. A more general solution would be to use a SPATIAL INDEX and the ST_Within() function.
现在让我们运行一些测试..
Now let's run some tests..
在我的数据库(MySQL 5.7.18)中,我有下表:
In my DB (MySQL 5.7.18) I have the following table:
CREATE TABLE `cities` (
`cityId` MEDIUMINT(9) UNSIGNED NOT NULL AUTO_INCREMENT,
`country` CHAR(2) NOT NULL COLLATE 'utf8mb4_unicode_ci',
`city` VARCHAR(100) NOT NULL COLLATE 'utf8mb4_unicode_ci',
`accentCity` VARCHAR(100) NOT NULL COLLATE 'utf8mb4_unicode_ci',
`region` CHAR(2) NULL DEFAULT NULL COLLATE 'utf8mb4_unicode_ci',
`population` INT(10) UNSIGNED NULL DEFAULT NULL,
`latitude` DECIMAL(10,7) NOT NULL,
`longitude` DECIMAL(10,7) NOT NULL,
`geoPoint` POINT NOT NULL,
PRIMARY KEY (`cityId`),
SPATIAL INDEX `geoPoint` (`geoPoint`)
) COLLATE='utf8mb4_unicode_ci' ENGINE=InnoDB
数据来自自由世界城市数据库,包含 3173958(3.1M) 行.
The data comes from Free World Cities Database and contains 3173958 (3.1M) rows.
注意geoPoint
是多余的,等于POINT(longitude, latitude)
.
考虑到用户位于伦敦的某个地方
Concider the user is located somewhere in London
set @lon = 0.0;
set @lat = 51.5;
并且您想从 cities
表中找到最近的位置.
and you want to find the nearest location from the cities
table.
一个微不足道"的查询是
A "trivial" query would be
select c.cityId, c.accentCity, st_distance_sphere(c.geoPoint, point(@lon, @lat)) as dist
from cities c
order by dist
limit 1
结果是
988204 Blackwall 1085.8212159861014
执行时间:~ 4.970 秒
Execution time: ~ 4.970 sec
如果您使用不太复杂的函数 ST_Distance()
,您会得到相同的结果,执行时间约为 4.580 秒 - 差别不大.
If you use the less complex function ST_Distance()
, you get the same result with an execution time of ~ 4.580 sec - which is not so much difference.
请注意,您不需要在表格中存储地理点.您也可以使用 (point(c.longitude, c.latitude)
而不是 c.geoPoint
.令我惊讶的是它甚至更快(ST_Distance
和 ST_Distance_Sphere
约 4.0 秒).如果我根本没有 geoPoint
列可能会更快.但这仍然没有没关系,因为您不希望用户等待,所以如果您可以做得更好,请登录以获得响应.
Note that you don't need to store a geo point in the table. You can as good use (point(c.longitude, c.latitude)
instead of c.geoPoint
. To my surprise it is even faster (~3.6 sec for ST_Distance
and ~4.0 sec for ST_Distance_Sphere
). It might be even faster if I didn't have a geoPoint
column at all. But that still doesn't matter much, since you don't want the user to wait so log for a respose, if you can do better.
现在让我们看看如何将 SPATIAL INDEX 与 ST_Within()
一起使用.
Now let's look how we can use the SPATIAL INDEX with ST_Within()
.
您需要定义一个包含最近位置的多边形.一个简单的方法是使用 ST_Buffer(),它会生成一个有 32 个点的多边形,几乎是一个圆*.
You need to define a polygon which will contain the nearest location. A simple way is to use ST_Buffer() which will generate a polygon with 32 points and is nearly a circle*.
set @point = point(@lon, @lat);
set @radius = 0.1;
set @polygon = ST_Buffer(@point, @radius);
select c.cityId, c.accentCity, st_distance_sphere(c.geoPoint, point(@lon, @lat)) as dist
from cities c
where st_within(c.geoPoint, @polygon)
order by dist
limit 1
结果是一样的.执行时间约为 0.000 秒(这是我的客户 (HeidiSQL) 所说的).
The result is the same. The execution time is ~ 0.000 sec (that's what my client (HeidiSQL) says).
* 请注意,@radius
以度数表示,因此多边形将更像椭圆而不是圆形.但是在我的测试中,我总是得到与简单而缓慢的解决方案相同的结果.在我在生产代码中使用它之前,我会调查更多的边缘情况.
* Note that the @radius
is notated in degrees and thus the polygon will be more like an ellipse rather than a circle. But in my tests I always got the same result as with the simple and slow solution. I would though investigate more edge cases, before I use it in my production code.
现在您需要为您的应用程序/数据找到最佳半径.如果它太小 - 你可能得不到任何结果,或者错过最近的点.如果它太大 - 您可能需要处理太多行.
Now you need to find the optimal radius for your application/data. If it's too small - you might get no results, or miss the nearest point. If it's too big - you might need to process too many rows.
这里是给定测试用例的一些数字:
Here some numbers for the given test case:
- @radius = 0.001:无结果
- @radius = 0.01:恰好一个位置(有点幸运) - 执行时间 ~ 0.000 秒
- @radius = 0.1:55 个位置 - 执行时间 ~ 0.000 秒
- @radius = 1.0:2183 个位置 - 执行时间 ~ 0.030 秒
相关文章