How to limit the number of records per group when using MySQL GROUP BY and INNER JOIN on multiple tables

Peter • 4 years ago • 1088 views

The following query:

SELECT
year, id, rate
FROM h
WHERE year BETWEEN 2000 AND 2009
AND id IN (SELECT rid FROM table2)
GROUP BY id, year
ORDER BY id, rate DESC

yields:

year    id  rate
2006    p01 8
2003    p01 7.4
2008    p01 6.8
2001    p01 5.9
2007    p01 5.3
2009    p01 4.4
2002    p01 3.9
2004    p01 3.5
2005    p01 2.1
2000    p01 0.8
2001    p02 12.5
2004    p02 12.4
2002    p02 12.2
2003    p02 10.3
2000    p02 8.7
2006    p02 4.6
2007    p02 3.3

What I want is only the top 5 results for each id:

2006    p01 8
2003    p01 7.4
2008    p01 6.8
2001    p01 5.9
2007    p01 5.3
2001    p02 12.5
2004    p02 12.4
2002    p02 12.2
2003    p02 10.3
2000    p02 8.7

Is there a way to do this with some kind of LIMIT-like modifier that works within GROUP BY?

Article URL: http://www.python88.com/topic/40087
Replies [ 13 ]  |  Latest reply 4 years ago

Himanshu Patel • #1 • 7 years ago

Please try the stored procedure below. I have verified it and got the correct result, but without using GROUP BY.

CREATE DEFINER=`ks_root`@`%` PROCEDURE `first_five_record_per_id`()
BEGIN
    DECLARE done INT DEFAULT 0;
    DECLARE tenants VARCHAR(50);
    DECLARE cur1 CURSOR FOR SELECT rid FROM demo1;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

    -- Build one "(SELECT ... LIMIT 5)" branch per id, joined with UNION ALL.
    SET @query_string = '';

    OPEN cur1;
    read_loop: LOOP
        FETCH cur1 INTO tenants;

        IF done THEN
            LEAVE read_loop;
        END IF;

        SET @datasource1 = tenants;
        SET @query_string = CONCAT(@query_string, '(SELECT * FROM demo WHERE `id` = ''', @datasource1, ''' ORDER BY rate DESC LIMIT 5) UNION ALL ');
    END LOOP;
    CLOSE cur1;

    -- Drop the trailing UNION ALL, then run the generated statement.
    SET @query_string = TRIM(TRAILING 'UNION ALL' FROM TRIM(@query_string));
    SELECT @query_string;  -- shows the generated SQL (debug output)
    PREPARE stmt FROM @query_string;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
END

MLF • #2 • 8 years ago

Try this:

-- Note: MySQL does not guarantee the evaluation order of user variables.
SET @num := 0, @type := '';
SELECT `year`, `id`, `rate`,
    @num := IF(@type = `id`, @num + 1, 1) AS `row_number`,
    @type := `id` AS `dummy`
FROM (
    SELECT *
    FROM `h`
    WHERE (
        `year` BETWEEN 2000 AND 2009
        AND `id` IN (SELECT `rid` FROM `table2`)
    )
    -- rows must arrive sorted by rate within each id for the counter to rank them
    ORDER BY `id`, `rate` DESC
) AS `temph`
GROUP BY `year`, `id`, `rate`
HAVING `row_number` <= 5
ORDER BY `id`, `rate` DESC;
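
Dev-Ria • #3 • 11 years ago

For anyone who, like me, ran into problems with this kind of query: I wrote the following procedure to apply a LIMIT (or anything else) per specific group.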

DELIMITER $$
CREATE PROCEDURE count_limit200()
BEGIN
    DECLARE a INT DEFAULT 0;
    DECLARE stop_loop INT DEFAULT 0;
    DECLARE domain_val VARCHAR(250);
    DECLARE domain_list CURSOR FOR SELECT DISTINCT domain FROM db.one;

    OPEN domain_list;

    -- number of distinct domains = number of loop iterations
    SELECT COUNT(DISTINCT domain) INTO stop_loop
    FROM db.one;

    loop_thru_domains: LOOP
        FETCH domain_list INTO domain_val;
        SET a = a + 1;

        -- insert at most 200 rows for the current domain
        INSERT INTO db.two(book,artist,title,title_count,last_updated)
        SELECT * FROM
        (
            SELECT book,artist,title,COUNT(ObjectKey) AS titleCount, NOW()
            FROM db.one
            WHERE book = domain_val
            GROUP BY artist,title
            ORDER BY book,titleCount DESC
            LIMIT 200
        ) t ON DUPLICATE KEY UPDATE title_count = titleCount, last_updated = NOW();

        IF a = stop_loop THEN
            LEAVE loop_thru_domains;
        END IF;
    END LOOP loop_thru_domains;

    CLOSE domain_list;
END $$
DELIMITER ;

It loops through a list of domains and inserts at most 200 rows for each one.

Ricky Moreno • #4 • 11 years ago

SELECT year, id, rate
FROM (SELECT
  year, id, rate,
  row_number() over (partition by id order by rate DESC) AS rn
  FROM h
  WHERE year BETWEEN 2000 AND 2009
  AND id IN (SELECT rid FROM table2)
  GROUP BY id, year
  ORDER BY id, rate DESC) as subquery
WHERE rn <= 5

The subquery is almost identical to your query. The only change is adding the following column (window functions need MySQL 8.0 or later):

row_number() over (partition by id order by rate DESC) AS rn
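
Shlomi Noach • #5 • 11 years ago

The following post: sql: selcting top N record per group describes a complicated way of achieving this without subqueries.

It improves on the other solutions offered here by:

  • Doing everything in a single query
  • Being able to properly utilize indexes
  • Avoiding subqueries, which are notoriously known to produce bad execution plans in MySQL

It is, however, not pretty. A good solution would be achievable if window functions (also known as analytic functions) were available in MySQL, but they are not. The trick used in that post relies on GROUP_CONCAT, which is sometimes described as the "poor man's window functions for MySQL".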

John • #6 • 7 years ago

It took some work, but I think my solution is worth sharing because it seems both elegant and quite fast.

SELECT h.year, h.id, h.rate 
  FROM (
    SELECT id, 
      SUBSTRING_INDEX(GROUP_CONCAT(CONCAT(id, '-', year) ORDER BY rate DESC), ',' , 5) AS l
      FROM h
      WHERE year BETWEEN 2000 AND 2009
      GROUP BY id
      ORDER BY id
  ) AS h_temp
    LEFT JOIN h ON h.id = h_temp.id 
      AND SUBSTRING_INDEX(h_temp.l, CONCAT(h.id, '-', h.year), 1) != h_temp.l

Note that this example is tailored to the question and can easily be modified for other similar purposes.

Patrick Tucci Wang Wen'an • #7 • 7 years ago

Build a virtual column (like ROWID in Oracle).

Table:

CREATE TABLE `stack` 
(`year` int(11) DEFAULT NULL,
`id` varchar(10) DEFAULT NULL,
`rate` float DEFAULT NULL) 
ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

Data:

insert into stack values(2006,'p01',8);
insert into stack values(2001,'p01',5.9);
insert into stack values(2007,'p01',5.3);
insert into stack values(2009,'p01',4.4);
insert into stack values(2001,'p02',12.5);
insert into stack values(2004,'p02',12.4);
insert into stack values(2005,'p01',2.1);
insert into stack values(2000,'p01',0.8);
insert into stack values(2002,'p02',12.2);
insert into stack values(2002,'p01',3.9);
insert into stack values(2004,'p01',3.5);
insert into stack values(2003,'p02',10.3);
insert into stack values(2000,'p02',8.7);
insert into stack values(2006,'p02',4.6);
insert into stack values(2007,'p02',3.3);
insert into stack values(2003,'p01',7.4);
insert into stack values(2008,'p01',6.8);

SQL like this:

select t3.year,t3.id,t3.rate 
from (select t1.*, (select count(*) from stack t2 where t1.rate<=t2.rate and t1.id=t2.id) as rownum from stack t1) t3 
where rownum <=3 order by id,rate DESC;

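If you remove the WHERE clause on t3, the result looks like this:

[image not reproduced]

To get the top N records, add "rownum <= 3" to the WHERE clause (the WHERE clause of t3).

To restrict the years, add "BETWEEN 2000 AND 2009" to the WHERE clause (the WHERE clause of t3).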
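
The two hints above can be combined into one statement. The following is only a sketch against the `stack` table defined in this answer. It assumes the year filter should apply before ranking, so the condition is placed both on `t1` and inside the correlated subquery; that placement is an assumption, not something the answer spells out.

```sql
-- Sketch: top 5 rows per id, ranking only rows from 2000-2009.
-- Assumes the `stack` table and sample data from this answer.
SELECT t3.year, t3.id, t3.rate
FROM (
    SELECT t1.*,
           -- rows with the same id (and in the year range) whose rate is >= this row's rate
           (SELECT COUNT(*)
            FROM stack t2
            WHERE t2.id = t1.id
              AND t2.rate >= t1.rate
              AND t2.year BETWEEN 2000 AND 2009) AS rownum
    FROM stack t1
    WHERE t1.year BETWEEN 2000 AND 2009
) t3
WHERE t3.rownum <= 5
ORDER BY t3.id, t3.rate DESC;
```

Because tied rates all receive the count of the last row in the tie, a group with ties around the cut-off can return fewer than five rows with this approach.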

Saharsh Shah • #8 • 9 years ago

Try this:

SELECT h.year, h.id, h.rate 
FROM (SELECT h.year, h.id, h.rate, IF(@lastid = (@lastid:=h.id), @index:=@index+1, @index:=0) indx 
      FROM (SELECT h.year, h.id, h.rate 
            FROM h
            WHERE h.year BETWEEN 2000 AND 2009 AND id IN (SELECT rid FROM table2)
            GROUP BY id, h.year
            ORDER BY id, rate DESC
            ) h, (SELECT @lastid:='', @index:=0) AS a
    ) h 
-- indx starts at 0 for each id, so "< 5" keeps five rows per id
WHERE h.indx < 5;

Brian L Cartwright • #9 • 11 years ago

This requires a series of subqueries to rank the values, limit them, and then perform the sum while grouping.

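SET @Rnk := 0, @last_id := NULL, @N := 2;
SELECT
  c.id,
  SUM(c.val)
FROM (
  SELECT
    b.id,
    b.val
  FROM (
    SELECT
      -- restart the counter at 1 whenever the id changes
      @Rnk := IF(@last_id = a.id, @Rnk + 1, 1) AS Rnk,
      a.id,
      a.val,
      @last_id := a.id AS last_id
    FROM (
      SELECT
        id,
        val
      FROM list
      ORDER BY id, val DESC) AS a) AS b
  -- Rnk starts at 1, so "<" keeps @N - 1 rows per id; use "<=" to keep @N rows
  WHERE b.Rnk < @N) AS c
GROUP BY c.id;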
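
Community Sarfraz • #10 • 6 years ago

No, you can't LIMIT subqueries arbitrarily (newer MySQL versions allow it to a limited extent, but not for 5 results per group).

This is a groupwise-maximum type of query, which is not trivial to do in SQL. There are various ways to tackle it that can be more efficient in some cases, but for top-n in general you will want to look at Bill's answer to a similar earlier question.

As with most solutions to this problem, it can return more than five rows if several rows share the same rate value, so you may still need some post-processing to check for that.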

user1691427 Vishal Kumar • #11 • 7 years ago

For me, something like

SUBSTRING_INDEX(GROUP_CONCAT(col_name ORDER BY desired_col_order_name), ',', N)

works perfectly. No complicated query is needed.

For example, to get the top 1 for each group:

SELECT 
    *
FROM
    yourtable
WHERE
    id IN (SELECT 
            SUBSTRING_INDEX(GROUP_CONCAT(id
                            ORDER BY rate DESC),
                        ',',
                        1) id
        FROM
            yourtable
        GROUP BY year)
ORDER BY rate DESC;
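
Salman A Jan Hančič • #12 • 4 years ago

The original query used user variables and ORDER BY on derived tables; the behavior of both quirks is not guaranteed. A revised answer follows.

In MySQL 5.x you can use a poor man's rank over partition to get the desired result. Just outer join the table with itself and, for each row, count the number of rows that are lesser than it. In the case above, a lesser row is one with a higher rate: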

SELECT t.id, t.rate, t.year, COUNT(l.rate) AS rank
FROM t
LEFT JOIN t AS l ON t.id = l.id AND t.rate < l.rate
GROUP BY t.id, t.rate, t.year
HAVING COUNT(l.rate) < 5
ORDER BY t.id, t.rate DESC, t.year

Demo and result:

| id  | rate | year | rank |
|-----|------|------|------|
| p01 |  8.0 | 2006 | 0    |
| p01 |  7.4 | 2003 | 1    |
| p01 |  6.8 | 2008 | 2    |
| p01 |  5.9 | 2001 | 3    |
| p01 |  5.3 | 2007 | 4    |
| p02 | 12.5 | 2001 | 0    |
| p02 | 12.4 | 2004 | 1    |
| p02 | 12.2 | 2002 | 2    |
| p02 | 10.3 | 2003 | 3    |
| p02 |  8.7 | 2000 | 4    |

Note that if the rates have ties, for example:

100, 90, 90, 80, 80, 80, 70, 60, 50, 40, ...

the query above will return 6 rows:

100, 90, 90, 80, 80, 80

Change to HAVING COUNT(DISTINCT l.rate) < 5 to get 8 rows:

100, 90, 90, 80, 80, 80, 70, 60

Or change to ON t.id = l.id AND (t.rate < l.rate OR (t.rate = l.rate AND t.pri_key > l.pri_key)) to get 5 rows:

100, 90, 90, 80, 80

In MySQL 8 or later, just use the RANK, DENSE_RANK or ROW_NUMBER functions:

SELECT *
FROM (
    SELECT *, RANK() OVER (PARTITION BY id ORDER BY rate DESC) AS rnk
    FROM t
) AS x
WHERE rnk <= 5
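
To see how the three functions differ on the tied rates listed above, here is a small self-contained sketch for MySQL 8.0 or later. The table name `demo_rates` and the aliases `row_num`, `rnk` and `dense_rnk` are made up for this illustration only.

```sql
-- Hypothetical table holding the tied rates from the example above.
CREATE TABLE demo_rates (id VARCHAR(10), rate INT);
INSERT INTO demo_rates VALUES
    ('p01', 100), ('p01', 90), ('p01', 90), ('p01', 80), ('p01', 80),
    ('p01', 80), ('p01', 70), ('p01', 60), ('p01', 50), ('p01', 40);

SELECT id, rate,
       ROW_NUMBER() OVER w AS row_num,    -- 1, 2, 3, ...: ties are ordered arbitrarily
       RANK()       OVER w AS rnk,        -- ties share a rank, gaps follow a tie
       DENSE_RANK() OVER w AS dense_rnk   -- ties share a rank, no gaps
FROM demo_rates
WINDOW w AS (PARTITION BY id ORDER BY rate DESC)
ORDER BY id, rate DESC;
```

Wrapping this in a derived table and filtering on `row_num <= 5`, `rnk <= 5` or `dense_rnk <= 5` should keep 5, 6 and 8 rows respectively for this data, which matches the three variants of the self-join query described earlier in this answer.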

fthiella • #13 • 8 years ago

You can use the GROUP_CONCAT aggregate function to collect all years into a single column, grouped by id and ordered by rate:

SELECT   id, GROUP_CONCAT(year ORDER BY rate DESC) grouped_year
FROM     yourtable
GROUP BY id

Result:

-----------------------------------------------------------
|  ID | GROUPED_YEAR                                      |
-----------------------------------------------------------
| p01 | 2006,2003,2008,2001,2007,2009,2002,2004,2005,2000 |
| p02 | 2001,2004,2002,2003,2000,2006,2007                |
-----------------------------------------------------------

You can then use FIND_IN_SET, which returns the position of the first argument inside the second one, e.g.:

SELECT FIND_IN_SET('2006', '2006,2003,2008,2001,2007,2009,2002,2004,2005,2000');
1

SELECT FIND_IN_SET('2009', '2006,2003,2008,2001,2007,2009,2002,2004,2005,2000');
6

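Combining GROUP_CONCAT and FIND_IN_SET, and filtering by the position that FIND_IN_SET returns, you can then use this query, which returns only the first 5 years for every id: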

SELECT
  yourtable.*
FROM
  yourtable INNER JOIN (
    SELECT
      id,
      GROUP_CONCAT(year ORDER BY rate DESC) grouped_year
    FROM
      yourtable
    GROUP BY id) group_max
  ON yourtable.id = group_max.id
     AND FIND_IN_SET(year, grouped_year) BETWEEN 1 AND 5
ORDER BY
  yourtable.id, yourtable.year DESC;

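Please see the fiddle here.

Note that if more than one row can have the same rate, you should consider using GROUP_CONCAT(DISTINCT rate ORDER BY rate) on the rate column instead of the year column.

The maximum length of the string returned by GROUP_CONCAT is limited, so this works well if you only need to select a few records for each group.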
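
The limit mentioned above is controlled by the `group_concat_max_len` system variable, which defaults to 1024 bytes; longer results are truncated. If the groups are large, a sketch like the following can be used to inspect and raise it. The value 1000000 is an arbitrary example, not a recommendation.

```sql
-- Check the current limit (in bytes) for GROUP_CONCAT results.
SHOW VARIABLES LIKE 'group_concat_max_len';

-- Raise it for the current session only; 1000000 is an arbitrary example value.
SET SESSION group_concat_max_len = 1000000;
```

Setting it with SESSION scope only affects the current connection, so other clients keep the server default.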