[MySQL] [Original] A Comparison of Data Import Methods in MySQL and PostgreSQL

Published 2021-07-04 09:52

  I benchmarked the various LOAD FILE approaches for MySQL and PostgreSQL on a virtual machine and recorded their times. Because the tests ran inside a VM, the timings are only rough references; don't take them too seriously.
  MySQL tools:
  1. The bundled mysqlimport utility.
  2. The LOAD DATA INFILE ... statement on the command line.
  3. A Python script written with the mysql-connector-python driver.
  PostgreSQL tools:
  1. The third-party pgloader tool.
  2. The COPY ... FROM ... statement on the command line.
  3. A Python script written with psycopg2.
  Test table structure:
mysql> desc t1;
+----------+-----------+------+-----+-------------------+-------+
| Field    | Type      | Null | Key | Default           | Extra |
+----------+-----------+------+-----+-------------------+-------+
| id       | int(11)   | NO   | PRI | NULL              |       |
| rank     | int(11)   | NO   |     | NULL              |       |
| log_time | timestamp | YES  |     | CURRENT_TIMESTAMP |       |
+----------+-----------+------+-----+-------------------+-------+
3 rows in set (0.00 sec)
mysql> select count(*) from t1;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (6.80 sec)
  Test CSV file:
  t1.csv
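  For reference, a CSV with this shape (matching t1's id, rank and log_time columns) could be generated with a short script like the one below; the value ranges and timestamp format here are assumptions, not the actual test data.
# gen_t1_csv.py -- a minimal sketch for generating a t1.csv-like file.
# Only the column layout (id, rank, log_time) comes from the table above;
# the value ranges and timestamp format are assumed.
import csv
import random
from datetime import datetime, timedelta

ROWS = 1000000
start = datetime(2021, 7, 1)

with open('/tmp/t1.csv', 'w', newline='') as f:
    writer = csv.writer(f)            # default line terminator is '\r\n'
    for i in range(1, ROWS + 1):
        log_time = start + timedelta(seconds=i % 86400)
        writer.writerow([i, random.randint(1, 100),
                         log_time.strftime('%Y-%m-%d %H:%M:%S')])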
  

  MySQL's own loader, LOAD DATA INFILE: (24 seconds)
  
mysql> load data infile '/tmp/t1.csv' into table t1 fields terminated by ',' enclosed by '"' lines terminated by '\r\n';
Query OK, 1000000 rows affected (24.21 sec)
Records: 1000000  Deleted: 0  Skipped: 0  Warnings: 0
  MySQL Python script: (23 seconds)
  >>>
  Running 23.289 Seconds
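  The Python script itself isn't shown here; below is a minimal sketch of what it could look like with mysql-connector-python. The credentials, batch size and the plain executemany() strategy are assumptions, not the author's actual script.
# load_mysql.py -- a hedged sketch of a CSV loader using mysql-connector-python.
# Credentials, batch size and the executemany() strategy are assumptions.
import csv
import time
import mysql.connector

conn = mysql.connector.connect(user='root', password='root', database='t_girl')
cur = conn.cursor()
sql = "insert into t1 (id, `rank`, log_time) values (%s, %s, %s)"

start = time.time()
batch = []
with open('/tmp/t1.csv', newline='') as f:
    for row in csv.reader(f):
        batch.append(row)
        if len(batch) == 10000:       # flush every 10k rows
            cur.executemany(sql, batch)
            batch = []
if batch:
    cur.executemany(sql, batch)
conn.commit()
cur.close()
conn.close()
print("Running %.3f Seconds" % (time.time() - start))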
  

  MySQL's bundled mysqlimport: (23 seconds)
[root@mysql56-master ~]# time mysqlimport t_girl '/tmp/t1.csv' --fields-terminated-by=',' --fields-enclosed-by='"' --lines-terminated-by='\r\n' --use-threads=2 -uroot -proot
t_girl.t1: Records: 1000000  Deleted: 0  Skipped: 0  Warnings: 0
real  0m23.664s
user  0m0.016s
sys   0m0.037s
  

  PostgreSQL's own COPY: (7 seconds)
t_girl=# copy t1 from '/tmp/t1.csv' with delimiter ',';
COPY 1000000
Time: 7700.332 ms
  Psycopg2 driver's copy_from method: (6 seconds)
[root@postgresql-instance scripts]# python load_data.py
Running 5.969 Seconds.
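  The load_data.py script isn't shown either; below is a minimal sketch using psycopg2's copy_from(). The connection parameters are assumptions borrowed from the commands.load file further down.
# load_data.py -- a hedged sketch of a CSV loader using psycopg2's copy_from().
# Connection parameters are assumptions (taken from commands.load below).
import time
import psycopg2

conn = psycopg2.connect(host='127.0.0.1', port=5432, dbname='t_girl',
                        user='t_girl', password='t_girl')
cur = conn.cursor()

start = time.time()
with open('/tmp/t1.csv') as f:
    # copy_from streams the file into t1 over the COPY protocol
    cur.copy_from(f, 't1', sep=',')
conn.commit()
cur.close()
conn.close()
print("Running %.3f Seconds." % (time.time() - start))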
  

  Pgloader importing the CSV: (33 seconds)
[root@postgresql-instance ytt]# pgloader commands.load
                    table name       read   imported     errors            time
                        ytt.t1    1000000    1000000          0         33.514s
------------------------------  ---------  ---------  ---------  --------------
------------------------------  ---------  ---------  ---------  --------------
             Total import time    1000000    1000000          0         33.514s
  Pgloader pulling data directly from MySQL: (51 seconds)
[root@postgresql-instance ytt]# pgloader commands.mysql
                    table name       read   imported     errors            time
               fetch meta data          2          2          0          0.138s
------------------------------  ---------  ---------  ---------  --------------
                            t1    1000000    1000000          0         51.136s
------------------------------  ---------  ---------  ---------  --------------
------------------------------  ---------  ---------  ---------  --------------
------------------------------  ---------  ---------  ---------  --------------
             Total import time    1000000    1000000          0         51.274s
  

  The commands.load and commands.mysql files are listed below.
commands.load:
LOAD CSV 
   FROM '/tmp/ytt.csv' WITH ENCODING UTF-8
    ( 
       id, rank, log_time 
    ) 
   INTO postgresql://t_girl:t_girl@127.0.0.1:5432/t_girl?ytt.t1
   WITH skip header = 0, 
    fields optionally enclosed by '"', 
    fields escaped by backslash-quote, 
    fields terminated by ',' 
  SET work_mem to '32 MB', maintenance_work_mem to '64 MB';
commands.mysql:
LOAD DATABASE 
   FROM mysql://python_user:python_user@192.168.1.131:3306/t_girl?t1
   INTO postgresql://t_girl:t_girl@127.0.0.1:5432/t_girl?ytt.t1
 with data only
  SET maintenance_work_mem to '64MB', 
    work_mem to '3MB', 
    search_path to 'ytt';
  The pgloader manual:
  http://pgloader.io/howto/pgloader.1.html
  


  