This is equivalent to provisioning a MySQL RDS instance and then spinning up a DuckDB instance that replicates its data through the binlog, acting as a companion to that MySQL RDS. MySQL and DuckDB remain two independent databases here, each with its own connection endpoint.
Let's run a test to see how far apart MySQL and DuckDB are when executing complex SQL.
The table definitions and the data-loading script are shown below so that readers can run a similar test themselves.
DROP TABLE IF EXISTS orders;
DROP TABLE IF EXISTS products;
DROP TABLE IF EXISTS users;

CREATE TABLE users (
    user_id     BIGINT AUTO_INCREMENT PRIMARY KEY,
    user_name   VARCHAR(100),
    email       VARCHAR(200),
    country     VARCHAR(50),
    create_time DATETIME DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE products (
    product_id   BIGINT AUTO_INCREMENT PRIMARY KEY,
    product_name VARCHAR(200),
    category     VARCHAR(50),
    price        DECIMAL(10,2),
    create_time  DATETIME DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE orders (
    order_id   BIGINT AUTO_INCREMENT PRIMARY KEY,
    user_id    BIGINT,
    product_id BIGINT,
    quantity   INT,
    order_date DATETIME DEFAULT CURRENT_TIMESTAMP,
    status     VARCHAR(20)
);
DELIMITER $$
CREATE PROCEDURE load_test_data()
BEGIN
    DECLARE i INT DEFAULT 0;

    -- Insert users (100k rows)
    SET i = 1;
    WHILE i <= 100000 DO
        INSERT INTO users (user_name, email, country)
        VALUES (
            CONCAT('user_', i),
            CONCAT('user_', i, '@test.com'),
            ELT(FLOOR(1 + RAND()*5), 'China','USA','UK','Germany','India')
        );
        SET i = i + 1;
    END WHILE;

    -- Insert products (10k rows)
    SET i = 1;
    WHILE i <= 10000 DO
        INSERT INTO products (product_name, category, price)
        VALUES (
            CONCAT('product_', i),
            ELT(FLOOR(1 + RAND()*5), 'Electronics','Clothes','Food','Books','Sports'),
            ROUND(RAND()*1000, 2)
        );
        SET i = i + 1;
    END WHILE;

    -- Insert orders (1M rows)
    SET i = 1;
    WHILE i <= 1000000 DO
        INSERT INTO orders (user_id, product_id, quantity, order_date, status)
        VALUES (
            FLOOR(1 + RAND()*100000),  -- valid user_id
            FLOOR(1 + RAND()*10000),   -- valid product_id
            FLOOR(1 + RAND()*5),       -- quantity 1~5
            NOW() - INTERVAL FLOOR(RAND()*365) DAY,
            ELT(FLOOR(1 + RAND()*3), 'PAID','SHIPPED','CANCELLED')
        );
        SET i = i + 1;
    END WHILE;
END $$
DELIMITER ;
CALL load_test_data();
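As a side note, the same data set can also be generated client-side, which helps when the procedure's row-by-row INSERT loop is too slow. The sketch below mirrors the generator's logic in Python; it loads a scaled-down volume into an in-memory SQLite database purely for illustration (the sqlite3 target, the reduced row counts, and the omission of the order_date column are assumptions made here for brevity, not part of the original setup).

```python
import random
import sqlite3

# A sketch of the stored procedure's generator logic in Python.
# sqlite3 in-memory is only a stand-in target; row counts are scaled down.
random.seed(42)

COUNTRIES  = ['China', 'USA', 'UK', 'Germany', 'India']
CATEGORIES = ['Electronics', 'Clothes', 'Food', 'Books', 'Sports']
STATUSES   = ['PAID', 'SHIPPED', 'CANCELLED']

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT, email TEXT, country TEXT)")
conn.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT, category TEXT, price REAL)")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, product_id INTEGER, quantity INTEGER, status TEXT)")

N_USERS, N_PRODUCTS, N_ORDERS = 1000, 100, 10000  # scaled down from 100k / 10k / 1M

# Batched inserts replace the procedure's one-row-at-a-time WHILE loops.
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [(i, f'user_{i}', f'user_{i}@test.com', random.choice(COUNTRIES))
     for i in range(1, N_USERS + 1)])
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [(i, f'product_{i}', random.choice(CATEGORIES), round(random.random() * 1000, 2))
     for i in range(1, N_PRODUCTS + 1)])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [(i, random.randint(1, N_USERS), random.randint(1, N_PRODUCTS),
      random.randint(1, 5), random.choice(STATUSES))
     for i in range(1, N_ORDERS + 1)])
conn.commit()
```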
Run the script above on MySQL; the data is then replicated straight into DuckDB through the binlog.
Next we connect to MySQL and to DuckDB and run the same SQL against each to compare them.
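The comparison itself is just "run the identical SQL on two connections and time each". A minimal harness for that is sketched below; two in-memory SQLite connections stand in for the real MySQL and DuckDB endpoints, whose hostnames and client drivers are deployment-specific and therefore not shown.

```python
import sqlite3
import time

def time_query(conn, sql):
    """Run one SQL statement and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, time.perf_counter() - start

# Stand-ins for the MySQL endpoint and the DuckDB endpoint.
conn_a = sqlite3.connect(':memory:')
conn_b = sqlite3.connect(':memory:')
for c in (conn_a, conn_b):
    c.execute("CREATE TABLE t (x INTEGER)")
    c.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])

# The same statement goes to both connections; only the timings differ.
sql = "SELECT COUNT(*), SUM(x) FROM t"
rows_a, secs_a = time_query(conn_a, sql)
rows_b, secs_b = time_query(conn_b, sql)
```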
Goal: for the last 90 days, compute revenue, order count, and average order value per country and category, plus each category's rank and revenue share within its country. Features: a WITH CTE chain, a three-table JOIN, the RANK() window function, per-partition share, and conditional filtering.
WITH o90 AS (
    SELECT o.order_id, o.user_id, o.product_id, o.quantity, o.order_date, o.status
    FROM orders o
    WHERE o.status = 'PAID'
      AND o.order_date >= NOW() - INTERVAL 90 DAY
),
base AS (
    SELECT u.country, p.category, o.order_id,
           (o.quantity * p.price) AS amount
    FROM o90 o
    JOIN users u    ON u.user_id = o.user_id
    JOIN products p ON p.product_id = o.product_id
),
agg AS (
    SELECT country, category,
           COUNT(DISTINCT order_id) AS order_cnt,
           SUM(amount) AS revenue,
           AVG(amount) AS avg_order_value
    FROM base
    GROUP BY country, category
)
SELECT a.country,
       a.category,
       a.order_cnt,
       ROUND(a.revenue, 2) AS revenue,
       ROUND(a.avg_order_value, 2) AS avg_order_value,
       RANK() OVER (PARTITION BY a.country ORDER BY a.revenue DESC) AS rev_rank_in_country,
       ROUND(a.revenue / NULLIF(SUM(a.revenue) OVER (PARTITION BY a.country), 0) * 100, 2)
           AS revenue_share_in_country_pct
FROM agg a
ORDER BY a.country, rev_rank_in_country, a.revenue DESC;
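To see what the ranking and share-of-partition logic in the final SELECT does in isolation, here is a toy reproduction of those two window expressions on a hand-made four-row `agg` table. SQLite is used only because it runs anywhere and supports window functions; the numbers are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE agg (country TEXT, category TEXT, revenue REAL)")
conn.executemany("INSERT INTO agg VALUES (?, ?, ?)", [
    ('China', 'Books', 300.0),
    ('China', 'Food',  700.0),
    ('USA',   'Books', 500.0),
    ('USA',   'Food',  500.0),
])

# RANK() orders categories by revenue within each country (ties share a rank);
# SUM(...) OVER the same partition turns each revenue into a share of its country.
rows = conn.execute("""
    SELECT country, category,
           RANK() OVER (PARTITION BY country ORDER BY revenue DESC) AS rev_rank,
           ROUND(revenue * 100.0 / SUM(revenue) OVER (PARTITION BY country), 2) AS share_pct
    FROM agg
    ORDER BY country, rev_rank, category
""").fetchall()
for r in rows:
    print(r)
```

Note how the two USA rows tie at rank 1, which is exactly the RANK() (as opposed to ROW_NUMBER()) behavior the main query relies on.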
Goal: within each country, bucket users into deciles (NTILE(10)) by their total spend over the last 180 days, treat the top decile (Top 10%) as high-value users, and compare their contribution against the remaining users, aggregated by country and category.
WITH o180 AS (
    SELECT o.order_id, o.user_id, o.product_id, o.quantity, o.order_date
    FROM orders o
    WHERE o.status = 'PAID'
      AND o.order_date >= NOW() - INTERVAL 180 DAY
),
user_rev AS (
    -- each user's total spend over the 180 days (used for segmentation)
    SELECT u.user_id, u.country,
           SUM(o.quantity * p.price) AS user_revenue_180d
    FROM o180 o
    JOIN users u    ON u.user_id = o.user_id
    JOIN products p ON p.product_id = o.product_id
    GROUP BY u.user_id, u.country
),
user_seg AS (
    -- decile by total spend within each country, 1 = highest
    SELECT user_id, country, user_revenue_180d,
           NTILE(10) OVER (PARTITION BY country ORDER BY user_revenue_180d DESC) AS decile_in_country
    FROM user_rev
),
base AS (
    -- back to order lines, tagged with each user's segment
    SELECT us.country, us.decile_in_country, p.category,
           (o.quantity * p.price) AS amount
    FROM o180 o
    JOIN user_seg us ON us.user_id = o.user_id
    JOIN products p  ON p.product_id = o.product_id
)
SELECT country,
       category,
       -- high-value users (Top 10%, decile = 1)
       ROUND(SUM(CASE WHEN decile_in_country = 1 THEN amount END), 2) AS top10_revenue,
       COUNT(CASE WHEN decile_in_country = 1 THEN 1 END) AS top10_lines,
       -- remaining users
       ROUND(SUM(CASE WHEN decile_in_country <> 1 THEN amount END), 2) AS others_revenue,
       COUNT(CASE WHEN decile_in_country <> 1 THEN 1 END) AS others_lines,
       -- Top 10% share of this country × category
       ROUND(SUM(CASE WHEN decile_in_country = 1 THEN amount END) / NULLIF(SUM(amount), 0) * 100, 2)
           AS top10_share_pct
FROM base
GROUP BY country, category
ORDER BY country, top10_share_pct DESC, category;
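The key step in this query is the NTILE(10) segmentation. The toy example below shows the mechanic on 20 invented users in one country: NTILE(10) splits them into ten equal buckets by descending spend, so bucket 1 is exactly the top 10%. Again SQLite merely stands in for the real engines.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE user_rev (user_id INTEGER, country TEXT, rev REAL)")
# 20 users in one country with spend 10, 20, ..., 200
conn.executemany("INSERT INTO user_rev VALUES (?, ?, ?)",
                 [(i, 'China', i * 10.0) for i in range(1, 21)])

# NTILE(10) over 20 rows yields ten buckets of 2 users each;
# decile 1 holds the two highest spenders.
rows = conn.execute("""
    SELECT user_id,
           NTILE(10) OVER (PARTITION BY country ORDER BY rev DESC) AS decile
    FROM user_rev
""").fetchall()
top10 = sorted(u for u, d in rows if d == 1)
print(top10)
```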
Looking at the results again: DuckDB runs this statement in 0.125 seconds, while MySQL needs 3.094 seconds.
Goal: build monthly metrics (revenue, order count, average order value) per country, category, and month, compute year-over-year (YoY) change against the same month last year, and report each category's revenue share within its country for the month.
WITH o12 AS (
    SELECT o.order_id, o.user_id, o.product_id, o.quantity, o.order_date
    FROM orders o
    WHERE o.status = 'PAID'
      AND o.order_date >= DATE_FORMAT(DATE_SUB(CURDATE(), INTERVAL 12 MONTH), '%Y-%m-01')
),
mth AS (
    -- base monthly aggregation: country × category × month
    SELECT u.country,
           p.category,
           DATE_FORMAT(o.order_date, '%Y-%m-01') AS month_start,
           COUNT(DISTINCT o.order_id) AS order_cnt,
           SUM(o.quantity * p.price) AS revenue,
           AVG(o.quantity * p.price) AS avg_order_value
    FROM o12 o
    JOIN users u    ON u.user_id = o.user_id
    JOIN products p ON p.product_id = o.product_id
    GROUP BY u.country, p.category, DATE_FORMAT(o.order_date, '%Y-%m-01')
),
mth_y AS (
    -- last year's figures, shifted forward 12 months for the YoY join
    SELECT country,
           category,
           DATE_FORMAT(DATE_ADD(STR_TO_DATE(month_start, '%Y-%m-%d'), INTERVAL 12 MONTH), '%Y-%m-01')
               AS month_start,  -- mapped onto the same month this year
           revenue AS revenue_last_year
    FROM mth
)
SELECT x.country,
       x.category,
       x.month_start,
       ROUND(x.revenue, 2) AS revenue,
       x.order_cnt,
       ROUND(x.avg_order_value, 2) AS avg_order_value,
       -- YoY (versus the same month last year)
       ROUND((x.revenue - y.revenue_last_year) / NULLIF(y.revenue_last_year, 0) * 100, 2) AS yoy_revenue_pct,
       -- share within the country for that month
       ROUND(x.revenue / NULLIF(SUM(x.revenue) OVER (PARTITION BY x.country, x.month_start), 0) * 100, 2)
           AS share_in_country_month_pct
FROM mth x
LEFT JOIN mth_y y
       ON y.country = x.country
      AND y.category = x.category
      AND y.month_start = x.month_start
ORDER BY x.country, x.month_start, x.revenue DESC;
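The YoY trick in this query is worth isolating: instead of a window function, last year's rows are shifted forward 12 months and joined back onto this year's rows. The sketch below reproduces that self-join on two invented monthly rows; SQLite's `date(..., '+12 months')` stands in for MySQL's DATE_ADD/DATE_FORMAT pair.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE mth (country TEXT, category TEXT, month_start TEXT, revenue REAL)")
conn.executemany("INSERT INTO mth VALUES (?, ?, ?, ?)", [
    ('China', 'Books', '2023-06-01', 100.0),  # last year's month
    ('China', 'Books', '2024-06-01', 150.0),  # same month this year
])

# Shift last year's month forward 12 months in the join condition,
# so each row meets its own "same month last year" counterpart.
row = conn.execute("""
    SELECT x.month_start,
           ROUND((x.revenue - y.revenue) * 100.0 / y.revenue, 2) AS yoy_pct
    FROM mth x
    LEFT JOIN mth y
           ON y.country = x.country
          AND y.category = x.category
          AND date(y.month_start, '+12 months') = x.month_start
    WHERE x.month_start = '2024-06-01'
""").fetchone()
print(row)
```

A LEFT JOIN is used (as in the main query) so that months with no prior-year counterpart still appear, just with a NULL YoY value.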