Most of us are lucky that software bugs are a small part of our lives. The ‘things just work’ attitude works for most situations, but occasionally something appears that may seem trivial at first glance may be catastrophic.
A bug appeared in MySQL’s recently released 8.0.32 that really caught my attention. This is a prime example of how a minor bug could have a significant impact on your life and a detrimental impact on your company or project.
The title Wrong result for AVG() OVER(ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) is the first thing that caught my eye. A lot of developers heavily use the AVG() function. My first thought was, ‘ut-oh, AVG() being broken is a terrible thing,’ but then I realized it was thankfully not that drastic.
The OVER() clause indicates this is a problem in a Window Function. And odds are, if you are using OVER() for calculations, you are also using AVG() a great deal. I love Window Functions with AVG() a lot, so I was happy that the problem was constrained a little, but I was sitting squarely in the middle of those constraints.
I am curious. And also wondering if this could affect my code.
If you find yourself in a similar position, the first step is to read the entire bug report from top to bottom, paying attention to the details. At this point, it is still okay to wonder if this is an edge case that may not bother you, and you will be able to go on about your day happily.
Most MySQL bug reports come with an example of the problem. The author of this bug, Markus Winand, is well-known in the database world and wrote a beautiful, clear example of reporting a bug. He describes the problem, identifies the error, and provides a test script.
The problem, he states, is that there is a wrong answer when using AVG() with the window-frame ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING. He then details the issue: the result is wrong for the second row (it seems to return the SUM, not divided by the count). And then comes an indication that he really has tested this issue.
Problem does not exist in 8.0.31 or earlier and also not in 8.0.32 when using RANGE instead of ROWS.
This may not be a big issue as MySQL 8.0.32 is very new and not widely used in production yet. So there may be a collective sigh of relief in the MySQL metaverse that this is not a problem that has been silently plaguing us for an unknown period of time. And those who have already upgraded are scrambling to check their code for ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING.
I would encourage you to test your non-8.0.32 versions by issuing the following just in case and as practice. The following code is from the bug report.
CREATE TABLE t ( n INTEGER ); INSERT INTO t VALUES (1), (2), (3), (4); SELECT n , SUM(n) OVER(ORDER BY n ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) AS "sum(n)" , COUNT(*) OVER(ORDER BY n ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) AS "count(n)" , AVG(n) OVER(ORDER BY n ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) AS "avg(n)" , SUM(n) OVER(ORDER BY n ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) / COUNT(*) OVER(ORDER BY n ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) AS "sum(n)/count(n)" FROM t ORDER BY n; +------+--------+----------+--------+-----------------+ | n | sum(n) | count(n) | avg(n) | sum(n)/count(n) | +------+--------+----------+--------+-----------------+ | 1 | 9 | 3 | 3.0000 | 3.0000 | | 2 | 7 | 2 | 7.0000 | 3.5000 | <-- Column "avg(n)" is wrong. | 3 | 4 | 1 | 4.0000 | 4.0000 | | 4 | NULL | 0 | NULL | NULL | +------+--------+----------+--------+-----------------+
My 8.0.32 instance, not in production, reported the wrong number. My primary system reported the proper 3.5 in the avg(n) column. Whew! Hopefully, this will be an easy fix for 8.0.33.
You need to peruse the list of bugs regularly, say once a week, just to keep apprised of potential problems on your horizon. You can search for bugs on https://bugs.mysql.com for a selected time and filter on versions and periods.
If the bug is a worry, try the test case. Yes, there are occasions when you are looking at an edge-case situation that may not bother you. Still, part of being a Database Administrator (DBA) is a healthy dose of paranoia that something will try and hurt your data. Test to make sure either way.
Lastly, it pays to check your critical calculations from time to time. Imagine you are an actuarial for an insurance company, and this bug skipped past you! The numbers you rely on are untrustworthy, and your company could be in ‘less than optimal circumstances.’
And a big thank you and ‘good catch’ to Marcus Winand for finding this bug.
Percona Monitoring and Management is a best-of-breed open source database monitoring solution. It helps you reduce complexity, optimize performance, and improve the security of your business-critical database environments, no matter where they are located or deployed.