News
Planet MySQL
Planet MySQL - https://planet.mysql.com

  • Scaling IO-Bound Workloads for MySQL in the Cloud – part 2
    This post is a followup to my previous article https://www.percona.com/blog/2018/08/29/scaling-io-bound-workloads-mysql-cloud/ In this instance, I want to show the data in different dimensions, primarily to answer questions around how throughput scales with increasing IOPS. A recap: for the test I use Amazon instances and Amazon gp2 and io1 volumes. In addition to the original post, I also tested two gpl2 volumes combined in software RAID0. I did this for the following reason: Amazon cap the single gp2 volume throughput to 160MB/sec, and as we will see from the charts, this limits InnoDB performance. Also, a reminder from the previous post: we can increase gp2 IOPS by increasing volume size (to the top limit 10000 IOPS), and for io1 we can increase IOPS by paying per additional IOPS. Scaling with InnoDB So for the first result, let’s see how InnoDB scales with increasing IOPS. There are a few interesting observations here: InnoDB scales linearly with additional IOPS, but it faces a throughput limit that Amazon applies to volumes. So besides considering IOPS, we should take into account the maximal throughout of volumes. In the second chart we compare InnoDB performance vs the cost of volumes: It’s interesting to see here the slope for gp2 volumes is steeper than for io1 volumes. This means we can get a bigger increase in InnoDB performance per dollar using gp2 volumes, but only until we reach the IOPS and throughput limits that are applied to gp2 volumes. Scaling with MyRocks And here’s the similar chart but for MyRocks: Here we can also see that MyRocks scales linearly, showing identical results on gp2 and io1 volumes. This means that running on gp2 will be cheaper. Also, there is no plateau in throughput, as we saw for InnoDB, which means that MyRocks uses less IO throughput. And the chart for the cost of running MyRocks: This charts also shows that it is cheaper to run on gp2 volume but only while it provides enough IOPS. I assume that using two gp2 volumes would allow me to double the throughput. (I did not run the test for MyRocks using two volumes) Conclusions Both MyRocks and InnoDB can scale (linearly) with additional IOPS on gp2 and io1 Amazon volumes. Take into account that IOPS is not the only factor to consider. There is also throughput limit, which affects InnoDB results, so for further scaling you might need to use multiple volumes. The post Scaling IO-Bound Workloads for MySQL in the Cloud – part 2 appeared first on Percona Database Performance Blog.

  • What is the “(scanning)” variant of a loose index scan?
    A query plan uses loose index scan if “Using index for group-by” appears in the “Extra”  column of the EXPLAIN output. In some plans though, “Using index for group-by (scanning)” appears. What does “(scanning)” mean and how is it different from the regular loose index scan?…

  • Driving Innovation with MySQL 8.0 and Docker
    When I worked at Borland on InterBase (one of the first MVCC relational databases), the saxophone-playing founder of Borland, Philippe Kahn, would talk about the ‘forehead install’ at nearly every meeting that I attended. Installs should be easy, he would say. As easy as hitting your forehead to the space bar. In fact, Kahn claimed that installing software with another product “could be equivalent to a heart transplant”. Although MySQL installs are not complex, there is one tool that can make installs and test driving new software more palatable – Docker. There are two advantages of testing MySQL 8.0 with Docker: (1) installing and starting MySQL 8.0 is even simplier with Docker and (2) changing my.cnf values with the SET PERSIST option can also help you navigate the Docker environment. Installing Docker, Starting MySQL, and Connecting to MySQL First, you grab Docker: https://docs.docker.com/install . Then, you pull and run MySQL 8.0 (Linux) by running the following: $ docker run –name mysql8 -e MYSQL_ROOT_PASSWORD=Mypasswd* -d mysql/mysql-server Unable to find image ‘mysql/mysql-server:latest’ locally … 03ba86c1f15c: Pull complete   That’s it! The above command pulls and invokes Docker to run the latest MySQL release as a daemon. The MySQL image is < 700 MB – a small footprint!  It should take < a minute to pull and start your container. Let’s make sure that your container is running before you connect to your server. Run the following command in bold to get the status of your MySQL container. Look for the word ‘healthy’ to know that it is running: $ docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 9e9916ed9d6e mysql/mysql-server “/entrypoint.sh mysq…” 48 seconds ago Up 47 seconds (healthy)   Now you login to your containerized server: $ docker exec -it mysql8 mysql -uroot -p Enter password: .. Server version: 8.0.12 MySQL Community ServerType ‘help;’ or ‘\h’ for help. Type ‘\c’ to clear the current input statement. mysql>   Changing my.cnf Values with a MySQL 8.0 Container Remember that a Docker container is a process that talks to the Linux OS. A container sits on top of the OS and the file system. You really have to sleuth around beneath the container to find the data directory and the my.cnf file (more about that in a future blog). However, to help you test MySQL 8.0, you can persist many my.cnf options from within MySQL. This is a handy feature when working with containers. For example, let’s say you want to change the maximum number of connections: mysql> SET PERSIST max_connections=300; mysql>  SELECT @@global.max_connections; @@global.max_connections 300 mysql> \q   You see that the maximum number of connections has changed. But you want to make sure that the change has persisted to the my.cnf file. You could just issue a ‘docker exec’ command to re-reconnect to your server. But let’s get more familiar with Docker. You restart your MySQL container with: $ docker restart mysql8   And now, the test. Did the SET PERSIST command write to the my.cnf file? $ docker exec -it mysql8 mysql -uroot –p Enter password: mysql>  SELECT @@global.max_connections;  @@global.max_connections                       300    You are now ready to try some of the top MySQL 8.0 features such as window functions, CTE’s, invisible indexes, ALTER TABLE… ADD COLUMN INSTANT, user roles, and other new features along with your personalized SET OPTIONs to begin sanity testing before you upgrade. In conclusion, this blog is just an hors d’oeuvre to whet your appetite for both Docker and MySQL 8.0. Bon appétit! Photo above from https://aboveaverage.com/oed-word-of-the-year-whatever-you-spell-when-you-smash-your-head-on-keyboard/ “The statements and opinions expressed here are my own and do not necessarily represent those of the Oracle Corporation.” -Kathy Forte, Oracle MySQL Solutions Architect

  • MariaDB 10.2.18 and MariaDB Connector/Node.js 2.0.0 now available
    The MariaDB Foundation is pleased to announce the availability of MariaDB 10.2.18, the latest stable release in the MariaDB 10.2 series, as well as MariaDB Connector/Node.js 2.0.0, the second alpha release of the new 100% JavaScript non-blocking MariaDB client for Node.js, compatible with Node.js 6+. See the release notes and changelogs for details. Download MariaDB […] The post MariaDB 10.2.18 and MariaDB Connector/Node.js 2.0.0 now available appeared first on MariaDB.org.

  • Why Optimization derived_merge can Break Your Queries
    Lately, I worked on several queries which started returning wrong results after upgrading MySQL Server to version 5.7 The reason for the failure was derived merge optimization which is one of the default optimizer_switch  options. Issues were solved, though at the price of performance, when we turned it OFF . But, more importantly, we could not predict if any other query would start returning incorrect data, to allow us to fix the application before it was too late. Therefore I tried to find reasons why derived_merge  can fail. Analyzing the problem In the first run, we turned SQL Mode ONLY_FULL_GROUP_BY on, and this removed most of the problematic queries. That said, few of the queries that were successfully working with ONLY_FULL_GROUP_BY  were affected. A quick search in the MySQL bugs database gave me a not-so-short list of open bugs: Bug #84812 Bug #84445 Bug #85117 Bug #85192 Bug #90877 Bug #91418 Bug #91878 At first glance, the reported queries do not follow any pattern, and we still cannot quickly identify which would break and which would not. Then I took a second look by running all of the provided test cases in my environment and found that for four bugs, the optimizer rewrote the query. For three of the bugs, it rewrote in both 5.7 and 8.0, and one case it rewrote in 8.0 only. The remaining three buggy queries (Bug #85117, Bug #91418, Bug #91878) have things in common. Let’s first look at them: Bug #85117select temp.sel from table1 t1 left join ( select *,1 as sel from table1_use t1u where t1u.`table1id`=1 ) temp on temp.table1id = t1.id order by t1.value Bug #91418select TST.UID ,TST.BID ,TST.THING_NAME ,TST.OTHER_IFO ,vw2.DIST_UID from TEST_SUB_PROBLEM TST join ( select uuid() as DIST_UID, vw.* from ( select DISTINCT BID, THING_NAME from TEST_SUB_PROBLEM ) vw ) vw2 on vw2.BID = TST.BID; Bug #91878SELECT Virtual_Table.T_FP AS T_FP, ( SELECT COUNT(Virtual_Table.T_FP) FROM t1 t WHERE t.f1 = Virtual_Table.T_FP AND Virtual_Table.T_FP = 731834939448428685 ) AS Test_Value FROM ( SELECT t.f1 AS T_FP, tv.f1 AS TV_FP FROM t1 AS t JOIN t2 AS tv ON t.f1 = tv.t1_f1 ) AS Virtual_Table GROUP BY Virtual_Table.TV_FP HAVING Test_Value > 0; Two of the queries use DISTINCT  or GROUP BY , one uses ORDER BY  clause. The cases do not have not the same clause in common—which is what I’d expect to see—and so, surprisingly, these are not the cause of the failure. However, all three queries use generated values: a constant in the first one; UUID()  and COUNT()  functions in the second and third respectively. This similarity is something we need to investigate. To find out why derived_merge  might work incorrectly for these queries we need to understand how this optimization works and why it was introduced. The intent behind derived_merge First I recommend checking the official MySQL User Reference Manual and MariaDB knowledge base. It is correct to use both manuals: even if low-level implementations vary, the high-level architecture and the purpose of this optimization are the same. In short: derived_merge  is used for queries that have subqueries in the FROM  clause,  also called “derived tables” and practically converts them into JOIN queries. This optimization allows avoiding unnecessary materialization (creating internal temporary tables to hold results). Virtually this is the same thing as a manual rewrite of a query with a subquery into a query that has JOIN clause(s) only. The only difference is that when we rewrite queries manually, we can compare the expected and actual result, then adjust the resulting query if needed. The MySQL optimizer has to do a correct rewrite at the first attempt. And sometimes this effort fails. Let’s check why this happens for these particular queries, reported in the MySQL Bugs Database. Case Study 1: a Query from Bug #85117 Original query select temp.sel from table1 t1 left join ( select *,1 as sel from table1_use t1u where t1u.`table1id`=1 ) temp on temp.table1id = t1.id order by t1.value was rewritten to: Note (Code 1003): /* select#1 */ select 1 AS `sel` from `test`.`table1` `t1` left join (`test`.`table1_use` `t1u`) on(((`test`.`t1`.`id` = 1) and (`test`.`t1u`.`table1id` = 1))) where 1 order by `test`.`t1`.`value`; You can always find a query that the optimizer converts the original one to in the SHOW WARNINGS output following EXPLAIN [EXTENDED] for the query. In this case, the original query asks to return all rows from the table table1 , but selects only the generated field from the subquery. The subquery selects the only row with table1id=1 . Avoiding derived merge optimization is practically the same as joining table table1 with a table with one row. You can see how it works in this code snippet:mysql> create temporary table temp as select *,1 as sel from table1_use t1u where t1u.`table1id`=1; Query OK, 1 row affected (0.00 sec) Records: 1 Duplicates: 0 Warnings: 0 mysql> select * from temp; +----+----------+------+-----+ | id | table1id | uid | sel | +----+----------+------+-----+ | 1 | 1 | 99 | 1 | +----+----------+------+-----+ 1 row in set (0.00 sec) mysql> select temp.sel from table1 t1 left join temp on temp.table1id = t1.id order by t1.value; +------+ | sel | +------+ | 1 | | NULL | | NULL | +------+ 3 rows in set (0.00 sec) However, when the optimizer uses derived-merge optimization, it completely ignores the fact that the resulting table has one row, and that the calculated value would be either NULL  or 1 depending if a row corresponding to table1  exists in the table. That it prints select 1 AS `sel`  in the EXPLAIN  output while uses select NULL AS `sel`  does not change anything: both are wrong. The correct query without a subquery should look like:mysql> select if(`test`.`t1u`.`table1id`, 1, NULL) AS `sel` -> from `test`.`table1` `t1` -> left join (`test`.`table1_use` `t1u`) -> on(((`test`.`t1`.`id` = 1) and (`test`.`t1u`.`table1id` = 1))) -> where 1 -> order by `test`.`t1`.`value`; +------+ | sel | +------+ | 1 | | NULL | | NULL | +------+ 3 rows in set (0.00 sec) This report is the easiest of the bugs we will discuss in this post, and is also fixed in MariaDB. Case Study 2: a Query from Bug #91418 mysql> select * from TEST_SUB_PROBLEM; +-----+--------+------------+---------------------+ | UID | BID | THING_NAME | OTHER_IFO | +-----+--------+------------+---------------------+ | 1 | thing1 | name1 | look a chicken | | 2 | thing1 | name1 | look an airplane | | 3 | thing2 | name2 | look a mouse | | 4 | thing3 | name3 | look a taperecorder | | 5 | thing3 | name3 | look an explosion | | 6 | thing4 | name4 | look at the stars | +-----+--------+------------+---------------------+ 6 rows in set (0.00 sec) mysql> select -> TST.UID ,TST.BID ,TST.THING_NAME ,TST.OTHER_IFO ,vw2.DIST_UID -> from -> TEST_SUB_PROBLEM TST -> join ( -> select uuid() as DIST_UID, vw.* -> from ( -> select DISTINCT BID, THING_NAME -> from TEST_SUB_PROBLEM -> ) vw -> ) vw2 -> on vw2.BID = TST.BID; +-----+--------+------------+---------------------+--------------------------------------+ | UID | BID | THING_NAME | OTHER_IFO | DIST_UID | +-----+--------+------------+---------------------+--------------------------------------+ | 1 | thing1 | name1 | look a chicken | e4c288fd-b29c-11e8-b0d7-0242673a86b2 | | 2 | thing1 | name1 | look an airplane | e4c28aef-b29c-11e8-b0d7-0242673a86b2 | | 3 | thing2 | name2 | look a mouse | e4c28c47-b29c-11e8-b0d7-0242673a86b2 | | 4 | thing3 | name3 | look a taperecorder | e4c28d92-b29c-11e8-b0d7-0242673a86b2 | | 5 | thing3 | name3 | look an explosion | e4c28ed9-b29c-11e8-b0d7-0242673a86b2 | | 6 | thing4 | name4 | look at the stars | e4c29031-b29c-11e8-b0d7-0242673a86b2 | +-----+--------+------------+---------------------+--------------------------------------+ 6 rows in set (0.00 sec) This query should create a unique DIST_UID  for each unique BID name. But, instead, it generates a unique ID  for each row. First, let’s split the query into a couple of queries using temporary tables, to confirm our assumption that it was written correctly in the first place: mysql> create temporary table vw as select DISTINCT BID, THING_NAME from TEST_SUB_PROBLEM; Query OK, 4 rows affected (0.01 sec) Records: 4 Duplicates: 0 Warnings: 0 mysql> select * from vw; +--------+------------+ | BID | THING_NAME | +--------+------------+ | thing1 | name1 | | thing2 | name2 | | thing3 | name3 | | thing4 | name4 | +--------+------------+ 4 rows in set (0.00 sec) mysql> create temporary table vw2 as select uuid() as DIST_UID, vw.* from vw; Query OK, 4 rows affected (0.01 sec) Records: 4 Duplicates: 0 Warnings: 0 mysql> select * from vw2; +--------------------------------------+--------+------------+ | DIST_UID | BID | THING_NAME | +--------------------------------------+--------+------------+ | eb155f0e-b29d-11e8-b0d7-0242673a86b2 | thing1 | name1 | | eb158c05-b29d-11e8-b0d7-0242673a86b2 | thing2 | name2 | | eb159b28-b29d-11e8-b0d7-0242673a86b2 | thing3 | name3 | | eb15a916-b29d-11e8-b0d7-0242673a86b2 | thing4 | name4 | +--------------------------------------+--------+------------+ 4 rows in set (0.00 sec) mysql> select -> TST.UID ,TST.BID ,TST.THING_NAME ,TST.OTHER_IFO ,vw2.DIST_UID -> from TEST_SUB_PROBLEM TST -> join vw2 -> on vw2.BID = TST.BID; +-----+--------+------------+---------------------+--------------------------------------+ | UID | BID | THING_NAME | OTHER_IFO | DIST_UID | +-----+--------+------------+---------------------+--------------------------------------+ | 1 | thing1 | name1 | look a chicken | eb155f0e-b29d-11e8-b0d7-0242673a86b2 | | 2 | thing1 | name1 | look an airplane | eb155f0e-b29d-11e8-b0d7-0242673a86b2 | | 3 | thing2 | name2 | look a mouse | eb158c05-b29d-11e8-b0d7-0242673a86b2 | | 4 | thing3 | name3 | look a taperecorder | eb159b28-b29d-11e8-b0d7-0242673a86b2 | | 5 | thing3 | name3 | look an explosion | eb159b28-b29d-11e8-b0d7-0242673a86b2 | | 6 | thing4 | name4 | look at the stars | eb15a916-b29d-11e8-b0d7-0242673a86b2 | +-----+--------+------------+---------------------+--------------------------------------+ 6 rows in set (0.01 sec) mysql> select distinct DIST_UID -> from ( -> select -> TST.UID ,TST.BID ,TST.THING_NAME ,TST.OTHER_IFO ,vw2.DIST_UID -> from TEST_SUB_PROBLEM TST -> join vw2 -> on vw2.BID = TST.BID -> ) t; +--------------------------------------+ | DIST_UID | +--------------------------------------+ | eb155f0e-b29d-11e8-b0d7-0242673a86b2 | | eb158c05-b29d-11e8-b0d7-0242673a86b2 | | eb159b28-b29d-11e8-b0d7-0242673a86b2 | | eb15a916-b29d-11e8-b0d7-0242673a86b2 | +--------------------------------------+ 4 rows in set (0.00 sec) With temporary tables, we have precisely four unique DIST_UID  values unlike the six values that our original query returned. Let’s check how the original query was rewritten: Note (Code 1003): /* select#1 */ select `test`.`TST`.`UID` AS `UID`, `test`.`TST`.`BID` AS `BID`, `test`.`TST`.`THING_NAME` AS `THING_NAME`, `test`.`TST`.`OTHER_IFO` AS `OTHER_IFO`, uuid() AS `DIST_UID` from `test`.`TEST_SUB_PROBLEM` `TST` join (/* select#3 */ select distinct `test`.`TEST_SUB_PROBLEM`.`BID` AS `BID`, `test`.`TEST_SUB_PROBLEM`.`THING_NAME` AS `THING_NAME` from `test`.`TEST_SUB_PROBLEM`) `vw` where (`vw`.`BID` = `test`.`TST`.`BID`) You can see that the optimizer did not wholly remove the subquery here. Let’s run this modified query, and run a test with a temporary table one more time: mysql> select -> `test`.`TST`.`UID` AS `UID`, -> `test`.`TST`.`BID` AS `BID`, -> `test`.`TST`.`THING_NAME` AS `THING_NAME`, -> `test`.`TST`.`OTHER_IFO` AS `OTHER_IFO`, -> uuid() AS `DIST_UID` -> from -> `test`.`TEST_SUB_PROBLEM` `TST` -> join -> (/* select#3 */ -> select -> distinct `test`.`TEST_SUB_PROBLEM`.`BID` AS `BID`, -> `test`.`TEST_SUB_PROBLEM`.`THING_NAME` AS `THING_NAME` -> from -> `test`.`TEST_SUB_PROBLEM` -> ) `vw` -> where (`vw`.`BID` = `test`.`TST`.`BID`) -> ; +-----+--------+------------+---------------------+--------------------------------------+ | UID | BID | THING_NAME | OTHER_IFO | DIST_UID | +-----+--------+------------+---------------------+--------------------------------------+ | 1 | thing1 | name1 | look a chicken | 12c5f554-b29f-11e8-b0d7-0242673a86b2 | | 2 | thing1 | name1 | look an airplane | 12c5f73a-b29f-11e8-b0d7-0242673a86b2 | | 3 | thing2 | name2 | look a mouse | 12c5f894-b29f-11e8-b0d7-0242673a86b2 | | 4 | thing3 | name3 | look a taperecorder | 12c5f9de-b29f-11e8-b0d7-0242673a86b2 | | 5 | thing3 | name3 | look an explosion | 12c5fb20-b29f-11e8-b0d7-0242673a86b2 | | 6 | thing4 | name4 | look at the stars | 12c5fc7d-b29f-11e8-b0d7-0242673a86b2 | +-----+--------+------------+---------------------+--------------------------------------+ 6 rows in set (0.01 sec) This time the changed query result is no different to the one we received from the original one. Let’s manually replace the subquery with temporary tables, and check if it affects the result again. mysql> create temporary table vw -> select -> distinct `test`.`TEST_SUB_PROBLEM`.`BID` AS `BID`, -> `test`.`TEST_SUB_PROBLEM`.`THING_NAME` AS `THING_NAME` -> from `test`.`TEST_SUB_PROBLEM`; Query OK, 4 rows affected (0.01 sec)<br>Records: 4 Duplicates: 0 Warnings: 0 mysql> select * from vw; +--------+------------+ | BID | THING_NAME | +--------+------------+ | thing1 | name1 | | thing2 | name2 | | thing3 | name3 | | thing4 | name4 | +--------+------------+ 4 rows in set (0.00 sec) mysql> select -> `test`.`TST`.`UID` AS `UID`, -> `test`.`TST`.`BID` AS `BID`, -> `test`.`TST`.`THING_NAME` AS `THING_NAME`, -> `test`.`TST`.`OTHER_IFO` AS `OTHER_IFO`, -> uuid() AS `DIST_UID` -> from `test`.`TEST_SUB_PROBLEM` `TST` -> join vw where (`vw`.`BID` = `test`.`TST`.`BID`) ; +-----+--------+------------+---------------------+--------------------------------------+ | UID | BID | THING_NAME | OTHER_IFO | DIST_UID | +-----+--------+------------+---------------------+--------------------------------------+ | 1 | thing1 | name1 | look a chicken | e11dbe61-b2a0-11e8-b0d7-0242673a86b2 | | 2 | thing1 | name1 | look an airplane | e11dc050-b2a0-11e8-b0d7-0242673a86b2 | | 3 | thing2 | name2 | look a mouse | e11dc1af-b2a0-11e8-b0d7-0242673a86b2 | | 4 | thing3 | name3 | look a taperecorder | e11dc2be-b2a0-11e8-b0d7-0242673a86b2 | | 5 | thing3 | name3 | look an explosion | e11dc3a8-b2a0-11e8-b0d7-0242673a86b2 | | 6 | thing4 | name4 | look at the stars | e11dc4e9-b2a0-11e8-b0d7-0242673a86b2 | +-----+--------+------------+---------------------+--------------------------------------+ 6 rows in set (0.00 sec) In this case, the temporary table contains the correct number of rows: 4, but the outer query calculates a  UUID  value for all rows in the table TEST_SUB_PROBLEM . It does not take into account that the user initially asks for a unique UUID  for each unique BID  and not each unique UID . Instead, it just moves a call of UUID()  function into the outer query, which creates a unique value for each row in the table TEST_SUB_PROBLEM . It does not take into account that the temporary table contains only four rows. In this case, it would not be easy to build an effective query that generates distinct UUID  values for rows with different BID ‘s and the same UUID  values for rows with the same BID . Case Study 3: a Query from Bug #91878 This query is supposed to calculate a number of rows based on complex conditions: SELECT Virtual_Table.T_FP AS T_FP, (SELECT COUNT(Virtual_Table.T_FP) FROM t1 t WHERE t.f1 = Virtual_Table.T_FP AND Virtual_Table.T_FP = 731834939448428685) AS Test_Value FROM (SELECT t.f1 AS T_FP, tv.f1 AS TV_FP FROM t1 AS t JOIN t2 AS tv ON t.f1 = tv.t1_f1) AS Virtual_Table GROUP BY Virtual_Table.TV_FP HAVING Test_Value > 0; However, it returns no rows when it should return 22 (check the bug report for the full test case). mysql> SELECT Virtual_Table.T_FP AS T_FP, -> ( -> SELECT -> COUNT(Virtual_Table.T_FP) -> FROM t1 t -> WHERE -> t.f1 = Virtual_Table.T_FP -> AND -> Virtual_Table.T_FP = 731834939448428685 -> ) AS Test_Value -> FROM ( -> SELECT -> t.f1 AS T_FP, tv.f1 AS TV_FP -> FROM t1 AS t -> JOIN t2 AS tv -> ON t.f1 = tv.t1_f1 -> ) AS Virtual_Table -> GROUP BY Virtual_Table.TV_FP -> HAVING Test_Value > 0; Empty set (1.28 sec) To find out why this happens let’s perform a temporary table check first. mysql> create temporary table Virtual_Table SELECT t.f1 AS T_FP, tv.f1 AS TV_FP FROM t1 AS t JOIN t2 AS tv ON t.f1 = tv.t1_f1; Query OK, 18722 rows affected (2.12 sec) Records: 18722 Duplicates: 0 Warnings: 0 mysql> SELECT Virtual_Table.T_FP AS T_FP, -> (SELECT COUNT(Virtual_Table.T_FP) FROM t1 t -> WHERE t.f1 = Virtual_Table.T_FP AND Virtual_Table.T_FP = 731834939448428685) AS Test_Value -> FROM Virtual_Table GROUP BY Virtual_Table.TV_FP HAVING Test_Value > 0; +--------------------+------------+ | T_FP | Test_Value | +--------------------+------------+ | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | +--------------------+------------+ 22 rows in set (1.62 sec) The rewritten query returned the correct result, as we expected. To identify why the original query fails, let’s check how the optimizer rewrote it: Note (Code 1003): /* select#1 */ select `test`.`t`.`f1` AS `T_FP`, (/* select#2 */ select count(`test`.`t`.`f1`) from `test`.`t1` `t` where (('731834939448428685' = 731834939448428685) and (`test`.`t`.`f1` = 731834939448428685)) ) AS `Test_Value` from `test`.`t1` `t` join `test`.`t2` `tv` where (`test`.`tv`.`t1_f1` = `test`.`t`.`f1`) group by `test`.`tv`.`f1` having (`Test_Value` > 0) Interestingly, when I run this query on the original tables it returned all 18722 rows that exist in table t2 . This output means that we cannot entirely rely on the  EXPLAIN  output. But still we can see the same symptoms: Subquery uses a function to generate a value Subquery in the FROM  clause is converted into a JOIN, and its values are accessible by an outer subquery We also see that the query has GROUP BY  and HAVING  clauses, thus adding a complication. The query is almost correct, but in this case, the optimizer mixed aliases: it uses the same alias in the internal query as in the external one. If you change the alias from t  to t2  in the subquery, the rewritten query starts returning correct results:mysql> select -> `test`.`t`.`f1` AS `T_FP`, -> (/* select#2 */ -> select -> count(`test`.`t`.`f1`) -> from -> `test`.`t1` `t` -> where ( -> ('731834939448428685' = 731834939448428685) -> and -> (`test`.`t`.`f1` = 731834939448428685) -> ) -> ) AS `Test_Value` -> from -> `test`.`t1` `t` -> join -> `test`.`t2` `tv` -> where -> (`test`.`tv`.`t1_f1` = `test`.`t`.`f1`) -> group by `test`.`tv`.`f1` -> having (`Test_Value` > 0); ... | 731834939454553991 | 1 | | 731834939453739998 | 1 | +--------------------+------------+ 18722 rows in set (0.49 sec) mysql> select -> `test`.`t`.`f1` AS `T_FP`, -> (/* select#2 */ -> select -> count(`test`.`t`.`f1`) -> from -> `test`.`t1` `t2` -> where ( -> (t2.f1=t.f1) -> and -> (`test`.`t`.`f1` = 731834939448428685) -> ) -> ) AS `Test_Value` -> from -> `test`.`t1` `t` -> join -> `test`.`t2` `tv` -> where -> (`test`.`tv`.`t1_f1` = `test`.`t`.`f1`) -> group by `test`.`tv`.`f1` -> having (`Test_Value` > 0); +--------------------+------------+ | T_FP | Test_Value | +--------------------+------------+ | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | | 731834939448428685 | 1 | +--------------------+------------+ 22 rows in set (1.82 sec) While the calculated value is not the reason why this query returns incorrect results, it is similar to the previous examples because the optimizer does not take in account that the value of `test`.`t`.`f1`  in the outer query is not necessarily equal to 731834939448428685. Is also interesting that neither Oracle nor PostgreSQL accept such a query, and instead complain of improper use of the  GROUP BY clause. Meanwhile, MySQL accepts this query even with SQL Mode set to ONLY_FULL_GROUP_BY . Reported as bug #92020. Conclusion and recommendations While derived_merge  is a very effective optimization, it can rewrite queries destructively. Safety measures when using this optimization are: Make sure that you use the latest version of MySQL/Percona/MariaDB servers which include all of the new bug fixes. Generated values for the subquery results either constant or returned values of functions is the red flag. Relaxing SQL Mode ONLY_FULL_GROUP_BY  is always dangerous and should not be used together with derived_merge . As a last resort, you can consider rewriting queries to JOIN  manually or turning derived_merge  optimization OFF .   The post Why Optimization derived_merge can Break Your Queries appeared first on Percona Database Performance Blog.

Current News