A query failed with the error below. Database version: 9.0.4.
--1 Error when running a long query
skytf=> SELECT count(id) from skytf.tbl_info;
ERROR: canceling statement due to conflict with recovery
DETAIL: User query might have needed to see row versions that must be removed.
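Note: the table "skytf.tbl_info" is large; the data alone is 12 GB. This count query normally takes
about 2 minutes to complete, but every run fails partway through with the error above. Judging from the
error message, my first guess was that a query running on the standby had conflicted with the primary.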
--2 A Google search turned up the following
Long running queries on the standby are a bit tricky, because they
might need to see row versions that are already removed on the master.
Note: in other words, long-running SQL on a standby node is tricky, because the standby
may need to read row versions that have already been removed on the primary.
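To make the conflict concrete, here is a minimal Python sketch of the visibility rule, under simplifying assumptions: transaction IDs are plain integers with no wraparound, and each cleanup record from the primary carries the newest xid whose deleted row versions it removes (PostgreSQL calls this latestRemovedXid). A standby query whose snapshot xmin is less than or equal to that xid may still treat the deleting transaction as in progress, so it could still need the removed rows. The function name and the xid values are hypothetical, not PostgreSQL internals.

```python
from typing import List


def conflicting_snapshots(latest_removed_xid: int,
                          standby_xmins: List[int]) -> List[int]:
    """Return the standby snapshot xmins that conflict with a cleanup record.

    Simplified model: xids are plain integers (no wraparound). A snapshot
    with xmin <= latest_removed_xid may still treat the deleting
    transaction as in progress, so the removed rows could be visible to it.
    """
    return [xmin for xmin in standby_xmins if xmin <= latest_removed_xid]


if __name__ == "__main__":
    # Hypothetical xids: the long count(id) took its snapshot at xmin 950,
    # a fresh short query at xmin 1010. VACUUM on the primary removed
    # row versions deleted by transactions up to xid 1000.
    print(conflicting_snapshots(1000, [950, 1010]))  # [950]
```

Only the long-running snapshot conflicts; replay on the standby then has to either wait for that query or cancel it, and max_standby_streaming_delay decides which.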
--3 Solution: change a parameter
Set the parameter to the following value in the standby's postgresql.conf: max_standby_streaming_delay = 300s
max_standby_streaming_delay (integer)
When Hot Standby is active, this parameter determines how long the standby server should wait before canceling standby queries that conflict with about-to-be-applied WAL entries, as described in Section 25.5.2. max_standby_streaming_delay applies when WAL data is being received via streaming replication. The default is 30 seconds. Units are milliseconds if not specified. A value of -1 allows the standby to wait forever for conflicting queries to complete. This parameter can only be set in the postgresql.conf file or on the server command line.
Note that max_standby_streaming_delay is not the same as the maximum length of time a query can run before cancellation; rather it is the maximum total time allowed to apply WAL data once it has been received from the primary server. Thus, if one query has resulted in significant delay, subsequent conflicting queries will have much less grace time until the standby server has caught up again.
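Note: the explanation above is easy to follow. When the standby is serving queries and a query on the standby
conflicts with WAL received from the primary, this parameter determines how long the standby waits for that
query before cancelling it (strictly speaking, it caps how far WAL replay may fall behind, as the second manual
paragraph says). The default is 30 s, so no wonder the count query, which needs about two minutes, was cancelled
by the standby. The parameter can also be set to -1, meaning the standby waits for the query forever. That is
clearly risky: if the query never finishes, replay on the standby stays blocked, and instead of applying the
primary's changes it keeps waiting for its own query to complete. Here the parameter is set to 300s, a value
agreed on after discussing it with the developers.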
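The manual's point that the limit is a total budget for applying WAL, not a per-query timeout, can be illustrated with a simplified Python model. It assumes the wait deadline is counted from when a WAL record was received from the primary and that replay is blocked while it waits; the names are hypothetical and this is not PostgreSQL's actual implementation. The 131 s runtime matches the count query; the other timings are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    cancelled: bool    # True if the standby cancels the conflicting query
    applied_at: float  # time (s) at which replay of the WAL record proceeds


def replay_conflicting_record(received_at: float, reached_at: float,
                              query_ends_at: float,
                              max_delay: float) -> Outcome:
    """Simplified model of one conflicting WAL record on a hot standby.

    received_at:   when the record arrived from the primary
    reached_at:    when replay on the standby gets to the record
    query_ends_at: when the conflicting standby query would finish
    max_delay:     max_standby_streaming_delay in seconds (-1 = wait forever)
    """
    if query_ends_at <= reached_at:
        # The query has already finished, so there is nothing to wait for.
        return Outcome(cancelled=False, applied_at=reached_at)
    if max_delay < 0:
        # -1: replay stays blocked for as long as the query runs.
        return Outcome(cancelled=False, applied_at=query_ends_at)
    # The budget is counted from receipt, so time already spent behind
    # the primary is subtracted from the grace period.
    deadline = max(reached_at, received_at + max_delay)
    if query_ends_at <= deadline:
        return Outcome(cancelled=False, applied_at=query_ends_at)
    return Outcome(cancelled=True, applied_at=deadline)


if __name__ == "__main__":
    # Count query needing ~131 s; replay reaches the record immediately.
    print(replay_conflicting_record(0, 0, 131, 30))   # cancelled at t=30
    print(replay_conflicting_record(0, 0, 131, 300))  # finishes, replay at t=131
    # A second record received at t=5 is only reached at t=131, so a second
    # conflicting query gets 5 + 300 - 131 = 174 s of grace, not 300 s.
    print(replay_conflicting_record(5, 131, 331, 300))  # cancelled at t=305
```

The last case shows why one slow query shrinks the grace time of the queries that follow until the standby catches up.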
--4 Run the count query again
skytf=> select count(*) from tbl_info;
count
----------
88123735
(1 row)
Time: 131068.569 ms
Note: this time it finally completed. The query took just over two minutes (about 131 s), under the 5-minute limit.
--5 Other suggestions
Another option is to increase vacuum_defer_cleanup_age on the primary server, so that dead rows will not be cleaned up as quickly as they normally would be. This will allow more time for queries to execute before they are cancelled on the standby, without having to set a high max_standby_streaming_delay. However it is difficult to guarantee any specific execution-time window with this approach, since vacuum_defer_cleanup_age is measured in transactions executed on the primary server.
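Note: the passage above comes from the manual and is another suggested way to handle conflicts between the
standby and the primary: set vacuum_defer_cleanup_age. Because this parameter is measured in transactions,
it is hard to tune in practice, so I did not use it.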
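The reason for skipping vacuum_defer_cleanup_age can be made concrete with a rough conversion between transactions and seconds. This minimal Python sketch assumes a steady transaction rate on the primary, which real workloads rarely have; the function names and TPS figures are hypothetical, not measurements from this system.

```python
import math


def defer_age_for_window(window_seconds: float, peak_tps: float) -> int:
    """Transactions needed to hold back cleanup for window_seconds,
    assuming the primary runs at a steady peak_tps."""
    return math.ceil(window_seconds * peak_tps)


def protected_window_seconds(defer_age: int, actual_tps: float) -> float:
    """Seconds of primary activity that a given defer age actually covers."""
    if actual_tps <= 0:
        raise ValueError("actual_tps must be positive")
    return defer_age / actual_tps


if __name__ == "__main__":
    # Size the setting for a 300 s window at a hypothetical 500 TPS.
    age = defer_age_for_window(300, 500)
    print(age)  # 150000
    # The same setting covers very different windows as the load changes.
    for tps in (100, 500, 2000):
        print(tps, protected_window_seconds(age, tps))
```

At a quiet 100 TPS the setting protects about 1500 s, but at a busy 2000 TPS only 75 s, which is why a time-based limit such as max_standby_streaming_delay is easier to reason about.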
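--6 Summary
PostgreSQL's Hot Standby is a great feature, but be careful when using the standby: used carelessly, it may cancel your queries instead of serving them.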