MySQL复杂子查询公式

2022-01-23 00:00:00 subquery mysql inner-join query-optimization mysql-variables

我有两张桌子 - books 和 images.books 具有 id、name、releasedate、purchasecount 等列.images 有 bookid (和books里面的id一样，基本上一本书可以有多张图片.虽然我没有设置任何外文key constraint)，bucketid，poster(每条记录指向某个bucket中的一个图片文件，对应某个bookid).

表架构:

poster 在images 中是唯一的，因此它是一个主键.
书籍封面索引:(name, id, releasedate)
图片覆盖索引:(bookid,poster,bucketid)

我的查询是，给定一个名称，从 books 表中找到名称与该名称匹配的前十本书(按 purchasecount 的数量排序)，对于该书，返回任何(最好是首先)从 images 表中记录(bucketid 和 poster).

显然，这可以通过运行第一个查询并使用其结果查询 images 表来解决，但这会很慢，所以我想使用join"和子查询一次性完成.但是，我正在尝试并没有给我正确的结果:

select books.id,books.name,year(releasedate),purchasecount,bucketid,poster from books内连接(从图片中选择 bucketid、bookid、poster) t ont.bookid = books.id where name like "%foo%" order by purchasecount desc limit 2;

任何人都可以在这里根据需要建议一个最佳查询来获取结果集(包括更改表架构以缩短搜索时间的任何建议)吗?

更新的小提琴:

最终查询

选择 b.id,b.purchasecount,xDerivedImages2.poster,xDerivedImages2.bucketid来自书籍 b左连接( 选择 i.bookid,i.poster,i.bucketid,i.upvotes,@rn := if(@lastbookid = i.bookid, @rn + 1, 1) 作为 rownum，@lastbookid := i.bookid 作为虚拟从( 选择 bookid,max(upvotes) 作为 maxup来自图像按书名分组) xDerivedImages加入图像我关于 i.bookid=xDerivedImages.bookid 和 i.upvotes=xDerivedImages.maxup交叉连接(选择@rn:=0,@lastbookid:=-1)参数按 i.bookid 订购) xDerivedImages2在 xDerivedImages2.bookid=b.id 和 xDerivedImages2.rownum=1按 b.purchasecount desc 排序限制 10

结果

+----+--------------+---------------------+----------+|编号 |购买次数 |海报 |桶ID |+----+----------------+---------+----------+|4 |第678章空 |空 ||6 |500 |空 |空 ||5 |第459章swt |11 ||1 |第456章blah_blah_tie_break |111 ||3 |77 |qwqq |14 ||2 |11 |z |81 |+----+----------------+---------+----------+

cross join的意义仅仅是引入和设置2个变量的起始值.就是这样.

结果是按 purchasecount 降序排列的前 10 本书以及来自 images 的信息(如果存在)(否则为 NULL)最受好评的图像.所选择的图像遵循 tie-break 规则，如上面在可视化部分中提到的第一个规则，带有 rownum.

最后的想法

我把它留给 OP 在最后插入适当的 where 子句，因为给出的示例数据没有有用的书名可供搜索.那部分是微不足道的.哦，对大宽度主键的模式做一些事情.但目前这是题外话.

I have two tables - books and images. books has columns like id, name, releasedate, purchasecount. images has bookid (which is same as the id in books, basically one book can have multiple images. Although I haven't set any foreign key constraint), bucketid, poster (each record points to an image file in a certain bucket, for a certain bookid).

Table schema:

poster is unique in images, hence it is a primary key.
Covering index on books: (name, id, releasedate)
Covering index on images: (bookid,poster,bucketid)

My query is, given a name, find the top ten books (sorted by number of purchasecount) from the books table whose name matches that name, and for that book, return any (preferably the first) record (bucketid and poster) from the images table.

Obviously this can be solved by two queries by running the first, and using its results to query the images table, but that will be slow, so I want to use 'join' and subquery to do it in one go. However, what I am trying is not giving me correct results:

select books.id,books.name,year(releasedate),purchasecount,bucketid,poster from books 
inner join (select bucketid,bookid, poster from images) t on 
t.bookid  = books.id  where name like "%foo%" order by purchasecount desc limit 2;

Can anybody suggest an optimal query to fetch the result set as desired here (including any suggestion to change the table schema to improve search time) ?

Updated fiddle: http://sqlfiddle.com/#!9/17c5a8/1.

The example query should return two results - fooe and fool, and one (any of the multiple posters corresponding to each book) poster for each result. However I am not getting correct results. Expected:

fooe - 1973 - 459 - 11 - swt (or fooe - 1973 - 459 - 11 - pqr)

fool - 1963 - 456 - 12 - xxx (or fool - 1963 - 456 - 111 - qwe)

解决方案

I agree with Strawberry about the schema. We can discuss ideas for better performance and all that. But here is my take on how to solve this after a few chats and changes to the question.

Note below the data changes to deal with various boundary conditions which include books with no images in that table, and tie-breaks. Tie-breaks meaning using the max(upvotes). The OP changed the question a few times and added a new column in the images table.

Modified quetion became return 1 row make per book. Scratch that, always 1 row per book even if there are no images. The image info to return would be the one with max upvotes.

Books table

create table books 
(   id int primary key, 
    name varchar(1000), 
    releasedate date, 
    purchasecount int
) ENGINE=InnoDB;

insert into books values(1,"fool","1963-12-18",456);
insert into books values(2,"foo","1933-12-18",11);
insert into books values(3,"fooherty","1943-12-18",77);
insert into books values(4,"eoo","1953-12-18",678);
insert into books values(5,"fooe","1973-12-18",459);
insert into books values(6,"qoo","1983-12-18",500);

Data Changes from original question.

Mainly the new upvotes column.

The below includes a tie-break row added.

create table images 
(   bookid int, 
    poster varchar(150) primary key, 
    bucketid int, 
    upvotes int -- a new column introduced by OP
) ENGINE=InnoDB;

insert into images values (1,"xxx",12,27);
insert into images values (5,"pqr",11,0);
insert into images values (5,"swt",11,100);
insert into images values (2,"yyy",77,65);
insert into images values (1,"qwe",111,69);
insert into images values (1,"blah_blah_tie_break",111,69);
insert into images values (3,"qwqqe",14,81);
insert into images values (1,"qqawe",8,45);
insert into images values (2,"z",81,79);

Visualization of a Derived Table

This is just to assist in visualizing an inner piece of the final query. It demonstrates the gotcha for tie-break situations, thus the rownum variable. That variable is reset to 1 each time the bookid changes otherwise it increments. In the end (our final query) we only want rownum=1 rows so that max 1 row is returned per book (if any).

Final Query

select b.id,b.purchasecount,xDerivedImages2.poster,xDerivedImages2.bucketid
from books b
left join
(   select i.bookid,i.poster,i.bucketid,i.upvotes,
    @rn := if(@lastbookid = i.bookid, @rn + 1, 1) as rownum,
    @lastbookid := i.bookid as dummy
    from 
    (   select bookid,max(upvotes) as maxup
        from images
        group by bookid
    ) xDerivedImages
    join images i
    on i.bookid=xDerivedImages.bookid and i.upvotes=xDerivedImages.maxup
    cross join (select @rn:=0,@lastbookid:=-1) params
    order by i.bookid
) xDerivedImages2
on xDerivedImages2.bookid=b.id and xDerivedImages2.rownum=1
order by b.purchasecount desc
limit 10

Results

+----+---------------+---------------------+----------+
| id | purchasecount | poster              | bucketid |
+----+---------------+---------------------+----------+
|  4 |           678 | NULL                |     NULL |
|  6 |           500 | NULL                |     NULL |
|  5 |           459 | swt                 |       11 |
|  1 |           456 | blah_blah_tie_break |      111 |
|  3 |            77 | qwqqe               |       14 |
|  2 |            11 | z                   |       81 |
+----+---------------+---------------------+----------+

The significance of the cross join is merely to introduce and set starting values for 2 variables. That is all.

The results are the top ten books in descending order of purchasecount with the info from images if it exists (otherwise NULL) for the most upvoted image. The image selected honors tie-break rules picking the first one as mentioned above in the Visualization section with rownum.

Final Thoughts

I leave it to the OP to wedge in the appropriate where clause at the end as the sample data given had no useful book name to search on. That part is trivial. Oh, and do something about the schema for the large width of your primary keys. But that is off-topic at the moment.

相关文章