MySQL: Fill a table within a stored procedure efficiently

I am testing performance on a MySQL server by filling a table with more than 200 million records. The stored procedure is very slow at generating the big SQL string. Any help or comments are really welcome.

System Info:

  • Database: MySQL 5.6.10, InnoDB (test database).
  • Processor: AMD Phenom II 1090T X6 (six cores), 3910 MHz per core.
  • RAM: 16 GB DDR3 1600 MHz CL8.
  • HD: Windows 7 64-bit SP1 on an SSD, MySQL installed on the SSD, logs written to a mechanical hard disk.

The stored procedure builds a single INSERT SQL statement containing all the values to be inserted into the table.

DELIMITER $$
USE `test`$$

DROP PROCEDURE IF EXISTS `inputRowsNoRandom`$$

CREATE DEFINER=`root`@`localhost` PROCEDURE `inputRowsNoRandom`(IN NumRows BIGINT)
BEGIN
    /* BUILD INSERT SENTENCE WITH A LOT OF ROWS TO INSERT */
    DECLARE i BIGINT;
    DECLARE nMax BIGINT;
    DECLARE squery LONGTEXT;
    DECLARE svalues LONGTEXT;

    SET i = 1;
    SET nMax = NumRows + 1;
    SET squery = 'INSERT INTO `entity_versionable` (fk_entity, str1, str2, bool1, double1, DATE) VALUES ';
    SET svalues = '("1", "a1", 100, 1, 500000, "2013-06-14 12:40:45"),';

    WHILE i < nMax DO
        SET squery = CONCAT(squery, svalues);
        SET i = i + 1;
    END WHILE;

    /*SELECT squery;*/
    SET squery = LEFT(squery, CHAR_LENGTH(squery) - 1);
    SET squery = CONCAT(squery, ";");
    SELECT squery;

    /* EXECUTE INSERT SENTENCE */
    /*START TRANSACTION;*/
    /*PREPARE stmt FROM squery;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
    */

    /*COMMIT;*/
END$$
DELIMITER ;
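A small Python model of what inputRowsNoRandom assembles can help with the NULL question. It rebuilds the same string and compares its size with 4 MiB. That value is assumed to be the server's max_allowed_packet because it is the MySQL 5.6 default; the actual server setting was not reported. MySQL string functions such as CONCAT() return NULL when the result would be longer than max_allowed_packet. CONCAT() also returns NULL when any argument is NULL. So once one iteration crosses the limit, squery would stay NULL for the rest of the loop. The helper names are hypothetical.

```python
# Hypothetical model of the string built by inputRowsNoRandom.
PREFIX = ("INSERT INTO `entity_versionable` "
          "(fk_entity, str1, str2, bool1, double1, DATE) VALUES ")
ROW = '("1", "a1", 100, 1, 500000, "2013-06-14 12:40:45"),'

# Assumption: the server uses the MySQL 5.6 default max_allowed_packet (4 MiB).
DEFAULT_MAX_ALLOWED_PACKET = 4 * 1024 * 1024


def built_query(num_rows: int) -> str:
    """Return the string the procedure holds in squery after the loop and trim."""
    squery = PREFIX + ROW * num_rows
    # LEFT(squery, CHAR_LENGTH(squery) - 1) drops the trailing comma
    squery = squery[:-1]
    # CONCAT(squery, ";") appends the terminator
    return squery + ";"


def exceeds_packet_limit(num_rows: int, limit: int = DEFAULT_MAX_ALLOWED_PACKET) -> bool:
    """True if the final statement would be longer than `limit` bytes."""
    return len(built_query(num_rows).encode("utf-8")) > limit


if __name__ == "__main__":
    for n in (20000, 100000):
        size = len(built_query(n))
        print(f"{n:>7} rows -> {size:,} bytes, over 4 MiB: {exceeds_packet_limit(n)}")
```

If this is the cause, raising max_allowed_packet or building several smaller statements should avoid the NULL. Running SHOW WARNINGS right after the CALL may also report a warning that mentions max_allowed_packet.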


Results:

  1. Concatenating 20000 strings takes about 45 seconds:

CALL test.inputRowsNoRandom(20000);

  2. Concatenating 100000 strings takes roughly 5 to 12 minutes O_O:

CALL test.inputRowsNoRandom(100000);

Result (ordered by duration):

State          Duration (summed, s)  Percentage
freeing items  0.00005               50.00000
starting       0.00002               20.00000
executing      0.00001               10.00000
init           0.00001               10.00000
cleaning up    0.00001               10.00000
Total          0.00010               100.00000

Change of status variables due to execution of the query:

Variable        Value  Description
Bytes_received  21     Bytes sent from the client to the server
Bytes_sent      97     Bytes sent from the server to the client
Com_select      1      Number of SELECT statements that have been executed
Questions       1      Number of statements executed by the server
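The slowdown itself can be estimated with simple arithmetic. This rests on one assumption about the implementation, not something the profile shows: that each SET squery = CONCAT(squery, svalues) produces a new string of the full current length. Under that assumption the loop writes about n*P + R*n*(n+1)/2 bytes in total for n rows. Here P is the prefix length and R = 51 is the length of one row tuple. That total grows with n squared, so 5 times as many rows means roughly 25 times as much copying. The sketch checks the summed form against the closed form.

```python
# Hypothetical cost model for the WHILE loop in inputRowsNoRandom.
# Assumption: every CONCAT(squery, svalues) writes a new string of the full
# current length, so the bytes written across iterations add up.
PREFIX_LEN = len("INSERT INTO `entity_versionable` "
                 "(fk_entity, str1, str2, bool1, double1, DATE) VALUES ")
ROW_LEN = len('("1", "a1", 100, 1, 500000, "2013-06-14 12:40:45"),')


def bytes_written_by_loop(num_rows: int) -> int:
    """Sum of the string lengths produced by num_rows successive CONCATs."""
    # After iteration i the string is PREFIX_LEN + i * ROW_LEN bytes long.
    return sum(PREFIX_LEN + i * ROW_LEN for i in range(1, num_rows + 1))


def closed_form(num_rows: int) -> int:
    """Same total as an arithmetic series: n*P + R*n*(n+1)/2."""
    return num_rows * PREFIX_LEN + ROW_LEN * num_rows * (num_rows + 1) // 2


if __name__ == "__main__":
    small, large = bytes_written_by_loop(20000), bytes_written_by_loop(100000)
    print(f"20000 rows:  ~{small / 1e9:.1f} GB written")
    print(f"100000 rows: ~{large / 1e9:.1f} GB written ({large / small:.1f}x)")
```

The measured timings also grow much faster than the row count. Other costs are involved, so this model explains the direction of the growth, not the exact numbers.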

Tests:
I have already tested different MySQL configurations: from 12 to 64 threads, with the cache on and off, and with the logs moved to another physical disk.
I also tested using TEXT, INT, etc.

Additional Information:

  • Performance links: general & multiple cores, configuration, optimizing I/O, Debian cores, best configuration, config for 48 GB RAM.
  • Profiling a SQL query: How to profile a query, Check for possible bottleneck in a query


Questions:

  • Is something wrong in the code? If I send 100000 strings to build the final SQL string, the result of SELECT squery; is a NULL string. What's happening? (There must be an error, but I don't see it.)
  • Can I improve the code in any way to speed it up?
  • I have read that some operations in stored procedures can be really slow. Should I generate the file in C/Java/PHP... and send it to MySQL?

    mysql -u mysqluser -p databasename < numbers.sql

  • MySQL seems to use only one core for a single SQL query. Would nginx or another database system (multithreaded DBs, Cassandra, Redis, MongoDB...) achieve better performance with stored procedures and use more than one CPU for one query? (My single query uses only 20% of the total CPU, with about 150 threads.)
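On the question of generating the file outside MySQL, here is a sketch in Python; the same idea applies in C, Java or PHP. It writes numbers.sql as many multi-row INSERT statements of bounded size rather than one giant statement, so each statement stays far below max_allowed_packet. The function name write_insert_file and the rows_per_insert default of 1000 are illustrative assumptions, not tuned values.

```python
import os
import tempfile

# Hypothetical generator for a numbers.sql file loadable with:
#   mysql -u mysqluser -p databasename < numbers.sql
COLUMNS = "(fk_entity, str1, str2, bool1, double1, date)"
ROW = "(1, 'a1', 100, 1, 500000, '2013-06-14 12:40:45')"


def write_insert_file(path: str, num_rows: int, rows_per_insert: int = 1000) -> int:
    """Write num_rows rows as multi-row INSERTs; return how many statements were written."""
    statements = 0
    remaining = num_rows
    with open(path, "w", encoding="utf-8") as out:
        while remaining > 0:
            batch = min(rows_per_insert, remaining)
            # One statement per line, each holding at most rows_per_insert tuples
            out.write(f"INSERT INTO `entity_versionable` {COLUMNS} VALUES ")
            out.write(",".join([ROW] * batch))
            out.write(";\n")
            remaining -= batch
            statements += 1
    return statements


if __name__ == "__main__":
    # Small demo size; scale num_rows up to what the test actually needs.
    target = os.path.join(tempfile.gettempdir(), "numbers.sql")
    count = write_insert_file(target, 100_000)
    print(f"wrote {count} INSERT statements to {target}")
```

With 1000 rows per statement, each INSERT is roughly 50 KB, which leaves plenty of headroom under a 4 MiB packet limit.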

UPDATE:

  • Efficient way of filling the table: check peterm's answer below.
  • Performance of Stored Procedure, modern RDBMS or inline queries.

Solution

Don't use loops, especially at that scale, in an RDBMS.

Instead, try filling your table with 1M rows using a single query:

INSERT INTO `entity_versionable` (fk_entity, str1, str2, bool1, double1, date)
SELECT 1, 'a1', 100, 1, 500000, '2013-06-14 12:40:45'
  FROM
(
select a.N + b.N * 10 + c.N * 100 + d.N * 1000 + e.N * 10000 + f.N * 100000 + 1 N
from (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) a
      , (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) b
      , (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) c
      , (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) d
      , (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) e
      , (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) f
) t
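Each digit table contributes one decimal place, so k cross-joined tables yield N = 1 .. 10^k with no repeats. The sketch below builds the same cross join for any number of digit tables and runs it against an in-memory SQLite database. SQLite is a stand-in chosen only so the example runs without a MySQL server. The schema is a hypothetical minimal one, and SQLite timings say nothing about MySQL.

```python
import sqlite3

# Stand-in demo: SQLite (Python stdlib) instead of MySQL; the schema below
# is a hypothetical minimal version of entity_versionable.
DIGITS = " UNION ALL ".join(["SELECT 0 AS N"] + [f"SELECT {d}" for d in range(1, 10)])


def numbers_query(num_digits: int) -> str:
    """SELECT producing N = 1 .. 10**num_digits from cross-joined digit tables."""
    aliases = [f"d{k}" for k in range(num_digits)]
    # d0 is the ones digit, d1 the tens digit, and so on
    expr = " + ".join(f"{a}.N * {10 ** k}" for k, a in enumerate(aliases)) + " + 1"
    tables = ", ".join(f"({DIGITS}) {a}" for a in aliases)
    return f"SELECT {expr} AS N FROM {tables}"


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entity_versionable (fk_entity INTEGER, str1 TEXT, "
        "str2 TEXT, bool1 INTEGER, double1 REAL, date TEXT)"
    )
    return conn


def fill(conn: sqlite3.Connection, num_digits: int) -> None:
    """Insert 10**num_digits identical rows with one INSERT ... SELECT."""
    conn.execute(
        "INSERT INTO entity_versionable (fk_entity, str1, str2, bool1, double1, date) "
        "SELECT 1, 'a1', 100, 1, 500000, '2013-06-14 12:40:45' "
        f"FROM ({numbers_query(num_digits)}) t"
    )
    conn.commit()


if __name__ == "__main__":
    db = make_demo_db()
    fill(db, 4)  # 10,000 rows; the answer's query uses 6 digit tables for 1M
    print(db.execute("SELECT COUNT(*) FROM entity_versionable").fetchone()[0])
```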

On my box (MacBook Pro, 16 GB RAM, 2.6 GHz Intel Core i7) it took ~8 sec to complete:

Query OK, 1000000 rows affected (7.63 sec)
Records: 1000000  Duplicates: 0  Warnings: 0

UPDATE 1: Now a version of the stored procedure that uses a prepared statement:

DELIMITER $$
CREATE PROCEDURE `inputRowsNoRandom`(IN NumRows INT)
BEGIN
    DECLARE i INT DEFAULT 0;

    PREPARE stmt 
       FROM 'INSERT INTO `entity_versionable` (fk_entity, str1, str2, bool1, double1, date)
             VALUES(?, ?, ?, ?, ?, ?)';
    SET @v1 = 1, @v2 = 'a1', @v3 = 100, @v4 = 1, @v5 = 500000, @v6 = '2013-06-14 12:40:45';

    WHILE i < NumRows DO
        EXECUTE stmt USING @v1, @v2, @v3, @v4, @v5, @v6;
        SET i = i + 1;
    END WHILE;

    DEALLOCATE PREPARE stmt;
END$$
DELIMITER ;
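The UPDATE 1 procedure follows a prepare-once, execute-many pattern. Below is a sketch of the same pattern with Python's sqlite3 module standing in for MySQL. Passing isolation_level=None puts the module in autocommit mode, so each INSERT commits on its own, similar to InnoDB with autocommit enabled. An in-memory database does no disk flushes, so this shows the structure of the approach, not its cost.

```python
import sqlite3

# Stand-in for UPDATE 1 using SQLite (stdlib) instead of MySQL.
INSERT_SQL = (
    "INSERT INTO entity_versionable "
    "(fk_entity, str1, str2, bool1, double1, date) VALUES (?, ?, ?, ?, ?, ?)"
)
ROW_VALUES = (1, "a1", 100, 1, 500000, "2013-06-14 12:40:45")


def make_autocommit_db() -> sqlite3.Connection:
    """In-memory DB in autocommit mode: one transaction per statement."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE entity_versionable (fk_entity INTEGER, str1 TEXT, "
        "str2 TEXT, bool1 INTEGER, double1 REAL, date TEXT)"
    )
    return conn


def insert_row_by_row(conn: sqlite3.Connection, num_rows: int) -> None:
    """Run the same parameterized INSERT once per row (like EXECUTE stmt USING ...)."""
    # sqlite3 caches compiled statements, so reusing identical SQL text avoids
    # recompiling it, loosely like PREPARE once / EXECUTE many.
    for _ in range(num_rows):
        conn.execute(INSERT_SQL, ROW_VALUES)


if __name__ == "__main__":
    db = make_autocommit_db()
    insert_row_by_row(db, 1000)
    print(db.execute("SELECT COUNT(*) FROM entity_versionable").fetchone()[0])
```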

Completed in ~3 min:

mysql> CALL inputRowsNoRandom(1000000);
Query OK, 0 rows affected (2 min 51.57 sec)

Feel the difference: 8 sec vs. 3 min.

UPDATE 2: To speed things up, we can use transactions explicitly and commit the insertions in batches. Here is an improved version of the SP:

DELIMITER $$
CREATE PROCEDURE inputRowsNoRandom1(IN NumRows BIGINT, IN BatchSize INT)
BEGIN
    DECLARE i BIGINT DEFAULT 0;

    PREPARE stmt 
       FROM 'INSERT INTO `entity_versionable` (fk_entity, str1, str2, bool1, double1, date)
             VALUES(?, ?, ?, ?, ?, ?)';
    SET @v1 = 1, @v2 = 'a1', @v3 = 100, @v4 = 1, @v5 = 500000, @v6 = '2013-06-14 12:40:45';

    START TRANSACTION;
    WHILE i < NumRows DO
        EXECUTE stmt USING @v1, @v2, @v3, @v4, @v5, @v6;
        SET i = i + 1;
        IF i % BatchSize = 0 THEN 
            COMMIT;
            START TRANSACTION;
        END IF;
    END WHILE;
    COMMIT;
    DEALLOCATE PREPARE stmt;
END$$
DELIMITER ;
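The batching logic of inputRowsNoRandom1 can be mirrored the same way, again with SQLite standing in for MySQL. The hypothetical function issues a COMMIT every batch_size rows plus one final COMMIT, as the procedure does. It returns how many COMMITs ran, which is floor(NumRows / BatchSize) + 1.

```python
import sqlite3

# Stand-in for UPDATE 2 (inputRowsNoRandom1) using SQLite instead of MySQL.
INSERT_SQL = (
    "INSERT INTO entity_versionable "
    "(fk_entity, str1, str2, bool1, double1, date) VALUES (?, ?, ?, ?, ?, ?)"
)
ROW_VALUES = (1, "a1", 100, 1, 500000, "2013-06-14 12:40:45")


def make_db() -> sqlite3.Connection:
    # isolation_level=None: the module issues no implicit BEGIN, so the
    # explicit BEGIN/COMMIT statements control the transactions.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE entity_versionable (fk_entity INTEGER, str1 TEXT, "
        "str2 TEXT, bool1 INTEGER, double1 REAL, date TEXT)"
    )
    return conn


def insert_in_batches(conn: sqlite3.Connection, num_rows: int, batch_size: int) -> int:
    """Commit every batch_size rows, then once more at the end; return the COMMIT count."""
    commits = 0
    conn.execute("BEGIN")  # START TRANSACTION
    for i in range(1, num_rows + 1):
        conn.execute(INSERT_SQL, ROW_VALUES)
        if i % batch_size == 0:
            conn.execute("COMMIT")
            commits += 1
            conn.execute("BEGIN")
    conn.execute("COMMIT")  # final COMMIT, possibly of an empty transaction
    return commits + 1


if __name__ == "__main__":
    db = make_db()
    print(insert_in_batches(db, 10_000, 1_000), "commits")
```

Larger batches mean fewer commits. The reported timings barely change between 1000 and 100000 rows per batch, so once commits are rare the per-row statement cost dominates.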

Results with different batch sizes:

mysql> CALL inputRowsNoRandom1(1000000,1000);
Query OK, 0 rows affected (27.25 sec)

mysql> CALL inputRowsNoRandom1(1000000,10000);
Query OK, 0 rows affected (26.76 sec)

mysql> CALL inputRowsNoRandom1(1000000,100000);
Query OK, 0 rows affected (26.43 sec)

You can see the difference yourself. It is still more than 3 times slower than the cross join.
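The reported timings can be turned into throughput figures and slowdown ratios with a little arithmetic. The extrapolation to the 200 million rows from the question assumes insert speed stays constant as the table grows, which may not hold at that size.

```python
# Timings reported in the answer, all for 1,000,000 rows (seconds).
ROWS = 1_000_000
TIMINGS = {
    "cross join INSERT ... SELECT": 7.63,
    "prepared statement, autocommit": 2 * 60 + 51.57,
    "prepared statement, batches of 1000": 27.25,
    "prepared statement, batches of 10000": 26.76,
    "prepared statement, batches of 100000": 26.43,
}
BASELINE = "cross join INSERT ... SELECT"


def rows_per_second(name: str) -> float:
    return ROWS / TIMINGS[name]


def slowdown(name: str) -> float:
    """How many times slower than the cross-join query."""
    return TIMINGS[name] / TIMINGS[BASELINE]


def extrapolated_minutes(name: str, target_rows: int) -> float:
    """Linear extrapolation; assumes throughput stays constant (it may not)."""
    return target_rows / rows_per_second(name) / 60


if __name__ == "__main__":
    for name in TIMINGS:
        print(f"{name:40} {rows_per_second(name):>9,.0f} rows/s  "
              f"x{slowdown(name):5.1f}  "
              f"~{extrapolated_minutes(name, 200_000_000):,.0f} min for 200M rows")
```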
