标签归档:charset

MySQL DATE_FORMATE函数内置字符集的坑

今天帮同事处理一个SQL(简化过后的)执行报错:

mysql> select date_format('2013-11-19','Y-m-d') > timediff('2013-11-19', '2013-11-20');                                         

ERROR 1267 (HY000): Illegal mix of collations (utf8_general_ci,COERCIBLE) and (latin1_swedish_ci,NUMERIC) for operation '>'

乍一看挺莫名其妙的,查了下手册,发现有这么一段:

The language used for day and month names and abbreviations is controlled by the value of the lc_time_names system variable (Section 9.7, “MySQL Server Locale Support”).

The DATE_FORMAT() returns a string with a character set and collation given by character_set_connection and collation_connection so that it can return month and weekday names containing non-ASCII characters.

也就是说,DATE_FORMATE() 函数返回的结果是带有字符集/校验集属性的,而 TIMEDIFF() 函数则没有字符集/校验集属性,我们来验证一下:

mysql> set names utf8;
mysql> select charset(date_format('2013-11-19','Y-m-d')), charset(timediff('2013-11-19', '2013-11-20'));
+--------------------------------------------+-----------------------------------------------+
| charset(date_format('2013-11-19','Y-m-d')) | charset(timediff('2013-11-19', '2013-11-20')) |
+--------------------------------------------+-----------------------------------------------+
| utf8                                       | binary                                        |
+--------------------------------------------+-----------------------------------------------+

mysql> set names gb2312;
mysql> select charset(date_format('2013-11-19','Y-m-d')), charset(timediff('2013-11-19', '2013-11-20'));
+--------------------------------------------+-----------------------------------------------+
| charset(date_format('2013-11-19','Y-m-d')) | charset(timediff('2013-11-19', '2013-11-20')) |
+--------------------------------------------+-----------------------------------------------+
| gb2312                                     | binary                                        |
+--------------------------------------------+-----------------------------------------------+

可以看到,随着通过 SET NAMES 修改 character_set_connection、collation_connection  值,DATE_FORMAT() 函数返回结果的字符集也跟着不一样。在这种情况下,想要正常工作,就需要将结果进行一次字符集转换,例如:

mysql> select date_format('2013-11-19','Y-m-d') > convert(timediff('2013-11-19', '2013-11-20') using utf8);
+----------------------------------------------------------------------------------------------+
| date_format('2013-11-19','Y-m-d') > convert(timediff('2013-11-19', '2013-11-20') using utf8) |
+----------------------------------------------------------------------------------------------+
|                                                                                            1 |
+----------------------------------------------------------------------------------------------+

就可以了 :)

P.S,MySQL的版本:5.5.20-55-log Percona Server (GPL), Release rel24.1, Revision 217

MySQL字符集的一个坑

今天帮同事处理一个棘手的事情,问题是这样的:

无论在客户机用哪个版本的mysql客户端连接服务器,发现只要服务器端设置了

character-set-server = utf8

之后,

character_set_client、 character_set_connection、character_set_results

就始终都是和服务器端保持一致了,即便在mysql客户端加上选项

--default-character-set=utf8

也不行,除非连接进去后,再手工执行命令

set names latin1

,才会将client、connection、results的字符集改过来。

经过仔细对比,最终发现让我踩坑的地方是,服务器端设置了另一个选项:

skip-character-set-client-handshake

文档上关于这个选项的解释是这样的:

--character-set-client-handshake

Don't ignore character set information sent by the client. To ignore client information and use the default server character set, use --skip-character-set-client-handshake; this makes MySQL behave like MySQL 4.0

这么看来,其实也是有好处的。比如启用 skip-character-set-client-handshake 选项后,就可以避免客户端程序误操作,使用其他字符集连接进来并写入数据,从而引发乱码问题。