Bulk Inserts & Load Data Infile

from the Artful MySQL Tips List


First, some basic benchmarks. Multi-row Inserts are much quicker than single-row Insert syntax; Load Data Infile is much quicker than either. On a modern machine properly configured, you might insert 200-byte InnoDB rows at a rate of 2k rows/sec with single committed Inserts, six times that rate with multi-row Insert syntax and one Commit, three times that rate again with Load Data Infile.

Import performance also depends on source and target characteristics, for example with 30-300MB dump files of extended inserts into InnoDB tables, 350k-4 million row data rows in 800-4,000 statements, we see these speed ranges on modest 2020 hardware ...

+------------+-----------+----------+--------+
| Source     | Target	 | Rows/sec | MB/sec |
+------------+-----------+----------+--------+
| local file | local SSD |  25-40k  |   3-4  |
+------------+-----------+----------+--------+
| local file | local HDD |   20k    |    2   |
+------------+-----------+---------+---------+
| local file | LAN HDD   |   10k    |    1   |
+------------+-----------+----------+--------+
MyISAM has no transaction logic so it's 20-25 times faster than InnoDB loading huge input files into a table.

If bulk inserts into InnoDB tables can't be avoided, there's wide agreement that performance is improved by disabling unique_checks and foreign_key_checks for the duration of the bulk insert. This and other basic bulk insert optimisations are summarised at https://dev.mysql.com/doc/refman/5.7/en/innodb-auto-increment-handling.html.

For huge Inserts on a system with lots of RAM and multiple cores, in my.cnf try ...

[mysqld]
# leave core optimisation to mysql
innodb_thread_concurrency = 0
innodb_read_io_threads = 64
innodb_write_io_threads = 64
innodb_io_capacity = 5000

Turning off log_bin for the duration of the Load Data process cuts execution time by 20-25%. Is that safe? All transactions involved are already documented in the Load Data input file, so if the server isn't a replication master, you can probably live without the redundancy of duplicating every command to the binary log, and just save the input file as an external record of the operation. To do it before MySQL version 5.6, you had to unset log_bin in my.cnf/ini and restart the server, and of course reverse that with another server reset afterwards. Since 5.6, it's a dynamic variable:

set session sql_log_bin=0;

See here for Bill Karwin's excellent slideshow on optimising InnoDB bulk inserts. For my.cnf settings that improve Insert performance, see https://nbsoftsolutions.com/blog/optimizing-innodb-bulk-insert

It's often best to leave autocommit on, and with InnoDB disable keys does not help, see http://dba.stackexchange.com/questions/31752/mysql-load-data-infile-slows-by-80-after-a-few-gigs-of-input-with-innodb-engine

But be sure to read Kristian Köhntopp's excellent little note on keeping a disposable MySQL instance for speedy non-ACID bulk operations.

On partitioning and InnoDB and Load Data, see http://www.ajaydivakaran.com/mysql-innodb-are-inserts-slowing-down, and on the benefits of chunking, see Baron Schwarz's pt-fifo_split.

Last updated 2 Sep 2022


Return to the Artful MySQL Tips page