<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Weez.com &#187; using</title>
	<atom:link href="http://www.weez.com/tag/using/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.weez.com</link>
	<description>Solving everyday practical LAMP problems... one at a time</description>
	<lastBuildDate>Fri, 10 Feb 2012 23:07:33 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Percona Replication Manager, a solution for MySQL high availability with replication using Pacemaker</title>
		<link>http://www.weez.com/2011/11/percona-replication-manager-a-solution-for-mysql-high-availability-with-replication-using-pacemaker/</link>
		<comments>http://www.weez.com/2011/11/percona-replication-manager-a-solution-for-mysql-high-availability-with-replication-using-pacemaker/#comments</comments>
		<pubDate>Tue, 29 Nov 2011 23:37:49 +0000</pubDate>
		<dc:creator>Abidoon</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Availability]]></category>
		<category><![CDATA[High]]></category>
		<category><![CDATA[manager]]></category>
		<category><![CDATA[Pacemaker]]></category>
		<category><![CDATA[Percona]]></category>
		<category><![CDATA[Replication]]></category>
		<category><![CDATA[Solution]]></category>
		<category><![CDATA[using]]></category>

		<guid isPermaLink="false">http://www.weez.com/2011/11/percona-replication-manager-a-solution-for-mysql-high-availability-with-replication-using-pacemaker/</guid>
		<description><![CDATA[Over the last year, the frustration of many of us at Percona regarding issues with MMM has grown to a level where we started looking at other ways of achieving higher availability using MySQL replication. One of the weaknesses of MMM is its communication layer, so instead of reinventing a flat tire, we decided, Baron [...]]]></description>
			<content:encoded><![CDATA[<p>Over the last year, the frustration of many of us at Percona regarding issues with MMM has grown to a level where we started looking at other ways of achieving higher availability using MySQL replication. One of the weaknesses of MMM is its communication layer, so instead of reinventing a flat tire, Baron Schwartz and I decided to develop a solution using <a href="http://www.clusterlabs.org">Pacemaker</a>, a well-known and established cluster manager with a bulletproof communication layer.  One of the great things about Pacemaker is its flexibility, but flexibility may result in complexity.  With the help of people from the Pacemaker community, namely Florian Haas and Raoul Bhatia, I have been able to modify the existing MySQL Pacemaker resource agent so that it survived our replication tests and offered a behavior pretty similar to MMM regarding the management of virtual IP addresses (VIPs). We decided to call this solution PRM, for Percona Replication Manager.  All the parts are open source and available under the GPL license. </p>
<p>Keep in mind this solution is hot off the press; consider it alpha.  Like I said above, it survived testing in a very controlled environment, but it is young and many issues/bugs are likely to be found.  Also, it is different from Yoshinori Matsunobu&#8217;s MHA solution and in fact it is quite a complement to it. One of my near-term goals is to integrate with MHA for master promotion.</p>
<p>The solution is basically made of 3 pieces:</p>
<ul>
<li>The Pacemaker cluster manager</li>
<li>A Pacemaker configuration</li>
<li>A MySQL resource agent</li>
</ul>
<p>Here I will not cover the Pacemaker installation since this is fairly straightforward and covered <a href="http://www.clusterlabs.org">elsewhere</a>.  I&#8217;ll discuss the MySQL resource agent and the supporting configuration while assuming basic knowledge of Pacemaker.  </p>
<p>But, before we start, what does this solution offer?</p>
<ul>
<li>Reader and writer VIP behavior similar to MMM</li>
<li>If the master fails, a new master is promoted from the slaves; no master-to-master setup is needed. Selection of the master is based on scores published by the slaves: the more up-to-date slaves have higher scores for promotion</li>
<li>Some nodes can be dedicated to be only slaves or less likely to become master</li>
<li>A node can be the preferred master</li>
<li>If replication on a slave breaks or lags beyond a defined threshold, the reader VIP(s) is removed.  MySQL is not restarted.</li>
<li>If no slaves are ok, all VIPs, readers and writer, will be located on the master</li>
<li>During a master switch, connections are killed on the demoted master to avoid replication conflicts</li>
<li>All slaves are in read_only mode</li>
<li>Simple administrative commands can remove master role from a node</li>
<li>Pacemaker stonith devices are supported</li>
<li>No logical limit in terms of the number of nodes</li>
<li>Easy to add nodes</li>
</ul>
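<p>To make the score-based promotion in the list above concrete, here is a minimal Python sketch. It is not the resource agent&#8217;s code: the function name, the score values and the <em>slave_only</em> argument are all assumptions made up for illustration. It only shows the idea of promoting the eligible node with the highest published score.</p>

```python
# Hypothetical illustration only: the real promotion logic lives in
# Pacemaker and the resource agent. Names and score values are invented.

def pick_new_master(scores, slave_only=()):
    """Return the eligible node with the highest promotion score, or None."""
    candidates = {node: score for node, score in scores.items()
                  if node not in slave_only}
    if not candidates:
        return None
    # Highest score wins; on a tie the first node encountered is kept.
    return max(candidates, key=candidates.get)


# Assumed scores: a more up-to-date slave publishes a higher score.
scores = {"testvirtbox1": 100, "testvirtbox2": 250, "testvirtbox3": 300}

print(pick_new_master(scores))
print(pick_new_master(scores, slave_only={"testvirtbox3"}))
```

<p>In this sketch, dedicating a node to the slave role simply removes it from the candidate list.</p>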
<p>In order to set up the solution you&#8217;ll need my version of the MySQL resource agent; it is not yet pushed to the main Pacemaker resource agents branch.  More testing and cleaning will be needed before that happens. You can get the resource agent from here:</p>
<p><a href="https://github.com/y-trudeau/resource-agents/raw/master/heartbeat/mysql" title="https://github.com/y-trudeau/resource-agents/raw/master/heartbeat/mysql">https://github.com/y-trudeau/resource-agents/raw/master/heartbeat/mysql</a></p>
<p>You can also get the whole branch from here:</p>
<p><a href="https://github.com/y-trudeau/resource-agents/zipball/master" title="https://github.com/y-trudeau/resource-agents/zipball/master">https://github.com/y-trudeau/resource-agents/zipball/master</a></p>
<p>On my Ubuntu Lucid VM, this file goes in the /usr/lib/ocf/resource.d/heartbeat/ directory.</p>
<p>To use this agent, you&#8217;ll need a Pacemaker configuration.  As a starting point, I&#8217;ll discuss the configuration I use during my tests.</p>
<pre>
node testvirtbox1 \
        attributes IP="10.2.2.160"
node testvirtbox2 \
        attributes IP="10.2.2.161"
node testvirtbox3 \
        attributes IP="10.2.2.162"
primitive p_mysql ocf:heartbeat:mysql \
        params config="/etc/mysql/my.cnf" pid="/var/run/mysqld/mysqld.pid" \
               socket="/var/run/mysqld/mysqld.sock" replication_user="root" \
               replication_passwd="rootpass" max_slave_lag="15" evict_outdated_slaves="false" \
               binary="/usr/bin/mysqld_safe" test_user="root" \
               test_passwd="rootpass" \
        op monitor interval="5s" role="Master" OCF_CHECK_LEVEL="1" \
        op monitor interval="2s" role="Slave" OCF_CHECK_LEVEL="1"
primitive reader_vip_1 ocf:heartbeat:IPaddr2 \
        params ip="10.2.2.171" nic="eth0"
primitive reader_vip_2 ocf:heartbeat:IPaddr2 \
        params ip="10.2.2.172" nic="eth0"
primitive reader_vip_3 ocf:heartbeat:IPaddr2 \
        params ip="10.2.2.173" nic="eth0"
primitive writer_vip ocf:heartbeat:IPaddr2 \
        params ip="10.2.2.170" nic="eth0" \
        meta target-role="Started"
ms ms_MySQL p_mysql \
        meta master-max="1" master-node-max="1" clone-max="3" clone-node-max="1" notify="true" globally-unique="false" target-role="Master" is-managed="true"
location No-reader-vip-1-loc reader_vip_1 \
        rule $id="No-reader-vip-1-rule" -inf: readerOK eq 0
location No-reader-vip-2-loc reader_vip_2 \
        rule $id="No-reader-vip-2-rule" -inf: readerOK eq 0
location No-reader-vip-3-loc reader_vip_3 \
        rule $id="No-reader-vip-3-rule" -inf: readerOK eq 0
location No-writer-vip-loc writer_vip \
        rule $id="No-writer-vip-rule" -inf: writerOK eq 0
colocation reader_vip_1_dislike_reader_vip_2 -200: reader_vip_1 reader_vip_2
colocation reader_vip_1_dislike_reader_vip_3 -200: reader_vip_1 reader_vip_3
colocation reader_vip_2_dislike_reader_vip_3 -200: reader_vip_2 reader_vip_3
property $id="cib-bootstrap-options" \
        dc-version="1.0.11-a15ead49e20f047e129882619ed075a65c1ebdfe" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="3" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1322236006"
property $id="mysql_replication" \
        replication_info="10.2.2.162|mysql-bin.000090|106"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"
</pre>
<p>Let&#8217;s review the configuration.  It begins with 3 <em>node</em> entries defining the 3 nodes I have in my cluster.  One attribute is required for each node: the IP address that will be used for replication.  This is a real IP address, not a reader or writer VIP.  This attribute allows the use of a private network for replication if needed. </p>
<p>Next is the mysql primitive resource declaration.  This primitive defines the mysql resource on each node and has many parameters; here are the ones I had to define:</p>
<ul>
<li><em>config</em>:  The path of the my.cnf file.  Remember that Pacemaker will start MySQL, not the regular init.d script</li>
<li><em>pid</em>: The pid file.  This is used by Pacemaker to know if MySQL is already running. It should match the my.cnf pid_file setting.</li>
<li><em>socket</em>: The MySQL unix socket file</li>
<li><em>replication_user</em>: The user to use when setting up replication.  It is also currently used for the &#8216;CHANGE MASTER TO&#8217; command, something that should/will change in the future</li>
<li><em>replication_passwd</em>: The password for the above user</li>
<li><em>max_slave_lag</em>: The maximum allowed slave lag in seconds, if a slave lags by more than that value, it will lose its reader VIP(s)</li>
<li><em>evict_outdated_slaves</em>: It is mandatory to set this to false, otherwise Pacemaker will stop MySQL on a slave that lags behind.  This will absolutely not help its recovery.</li>
<li><em>test_user and test_passwd</em>: The credentials to test MySQL.  The default is to run select count(*) on the mysql.user table, so the user given should at least have select on that table.</li>
<li><em>op monitor</em>: An entry is needed for each role, Master and Slave. Intervals must not be the same.</li>
</ul>
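<p>As a rough illustration of what <em>max_slave_lag</em> means for a reader VIP, the following Python sketch decides whether a slave should keep its reader VIP(s) from a few SHOW SLAVE STATUS fields. This is not the resource agent&#8217;s implementation; the function name and the dictionary input are assumptions for the example.</p>

```python
# Sketch under stated assumptions: slave_status is a dict holding fields
# from SHOW SLAVE STATUS. The function name reader_ok is hypothetical.

def reader_ok(slave_status, max_slave_lag=15):
    """Return True if the slave may keep its reader VIP(s)."""
    lag = slave_status.get("Seconds_Behind_Master")
    threads_running = (slave_status.get("Slave_IO_Running") == "Yes"
                       and slave_status.get("Slave_SQL_Running") == "Yes")
    # Broken replication reports no lag value (NULL), so treat it as not OK.
    if not threads_running or lag is None:
        return False
    # Only a lag of MORE than max_slave_lag seconds costs the slave its VIP.
    return not lag > max_slave_lag


healthy = {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes",
           "Seconds_Behind_Master": 3}
lagging = dict(healthy, Seconds_Behind_Master=42)
broken = {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "No",
          "Seconds_Behind_Master": None}

print(reader_ok(healthy), reader_ok(lagging), reader_ok(broken))
```

<p>Note that in every case the sketch only withdraws the VIP; nothing in it stops or restarts MySQL, which matches the reason for setting <em>evict_outdated_slaves</em> to false.</p>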
<p>Following the mysql primitive declaration, the primitives for 3 reader VIPs and one writer VIP are defined.  Those are straightforward so I&#8217;ll skip a detailed description. The next interesting element is the master-slave &#8220;ms&#8221; declaration.  This is how Pacemaker defines an asymmetrical resource having a master and slaves.  The only thing that may change here is <em>clone-max="3"</em>, which should match the number of database nodes you have.  </p>
<p>The handling of the VIPs is the truly new thing in the resource agent.  I am grateful to Florian Haas, who told me to use node attributes to keep Pacemaker from overreacting.  The availability of a reader or writer VIP on a node is controlled by the attributes readerOK and writerOK and the location rules.  An infinite negative weight is given when a VIP should not be on a host. I also added a few colocation rules to help spread the reader VIPs across all the nodes.  </p>
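<p>The interplay between the -inf location rules and the -200 colocation rules can be pictured with a toy model. The Python sketch below is a deliberate simplification and not how Pacemaker&#8217;s policy engine actually computes placement (it ignores resource stickiness, for instance); the function name and the greedy strategy are assumptions for illustration only.</p>

```python
# Toy model only: a greedy allocator that mimics the effect of the rules
# in the configuration above. Pacemaker's real scoring is more involved.

NEG_INF = float("-inf")


def place_reader_vips(vips, reader_ok, dislike=-200):
    """Assign each reader VIP to the best-scoring node, or None."""
    placement = {}
    load = {node: 0 for node in reader_ok}  # reader VIPs already on a node
    for vip in vips:
        best_node, best_score = None, NEG_INF
        for node, ok in reader_ok.items():
            # Location rule: -inf when readerOK is 0.
            # Colocation rule: -200 per reader VIP already on the node.
            score = NEG_INF if not ok else dislike * load[node]
            if score > best_score:
                best_node, best_score = node, score
        placement[vip] = best_node
        if best_node is not None:
            load[best_node] += 1
    return placement


vips = ["reader_vip_1", "reader_vip_2", "reader_vip_3"]
all_ok = {"testvirtbox1": 1, "testvirtbox2": 1, "testvirtbox3": 1}
one_down = {"testvirtbox1": 1, "testvirtbox2": 1, "testvirtbox3": 0}

print(place_reader_vips(vips, all_ok))    # one VIP per node
print(place_reader_vips(vips, one_down))  # nothing lands on testvirtbox3
```

<p>With all three nodes healthy the VIPs spread out one per node; when a node has readerOK set to 0, the negative infinity score keeps every reader VIP away from it while the remaining nodes share them.</p>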
<p>As a final thought on the Pacemaker configuration, remember that in order for a Pacemaker cluster to run correctly with only 2 nodes, you should set the quorum policy to ignore.  Also, this example configuration has no stonith devices defined, so stonith is disabled.  At the end of the configuration, you&#8217;ll notice the <em>replication_info</em> cluster attribute.  You don&#8217;t have to define this; the mysql RA will add it automatically when the first node is promoted to master.</p>
<p>There are not many requirements regarding the MySQL configuration; Pacemaker will automatically add &#8220;skip-slave-start&#8221; for a saner behavior.  One of the important settings is &#8220;log_slave_updates = OFF&#8221; (the default value).  In some cases, if slaves are logging replication updates, it may cause failover issues.  Also, the solution relies on the <em>read_only</em> setting on the slaves, so make sure the application database user doesn&#8217;t have the <em>SUPER</em> privilege, which overrides read_only.</p>
<p>Like I mentioned above, this project is young.  In the future, I&#8217;d like to integrate MHA to benefit from its capacity to bring all the nodes to a consistent level.  Also, the security around the solution should be improved, a fairly easy task I believe.   Of course, I&#8217;ll work with the maintainers of the Pacemaker resource agents to include it in the main branch once it has matured a bit.</p>
<p>Finally, if you are interested in this solution but have problems setting it up, just contact us at Percona; we&#8217;ll be pleased to help.</p>
<p>View full post on <a href="http://www.mysqlperformanceblog.com/2011/11/29/percona-replication-manager-a-solution-for-mysql-high-availability-with-replication-using-pacemaker/">MySQL Performance Blog</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.weez.com/2011/11/percona-replication-manager-a-solution-for-mysql-high-availability-with-replication-using-pacemaker/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Patrick Crews: Drizzle’s Jenkins system using dbqp for randgen and crashme testing</title>
		<link>http://www.weez.com/2011/06/patrick-crews-drizzle%e2%80%99s-jenkins-system-using-dbqp-for-randgen-and-crashme-testing/</link>
		<comments>http://www.weez.com/2011/06/patrick-crews-drizzle%e2%80%99s-jenkins-system-using-dbqp-for-randgen-and-crashme-testing/#comments</comments>
		<pubDate>Wed, 22 Jun 2011 20:06:49 +0000</pubDate>
		<dc:creator>Abidoon</dc:creator>
				<category><![CDATA[Drizzle]]></category>
		<category><![CDATA[crashme]]></category>
		<category><![CDATA[Crews]]></category>
		<category><![CDATA[dbqp]]></category>
		<category><![CDATA[Drizzle’s]]></category>
		<category><![CDATA[Jenkins]]></category>
		<category><![CDATA[Patrick]]></category>
		<category><![CDATA[randgen.]]></category>
		<category><![CDATA[system]]></category>
		<category><![CDATA[Testing]]></category>
		<category><![CDATA[using]]></category>

		<guid isPermaLink="false">http://www.weez.com/2011/06/patrick-crews-drizzle%e2%80%99s-jenkins-system-using-dbqp-for-randgen-and-crashme-testing/</guid>
		<description><![CDATA[Well, that&#8217;s pretty much it, thanks for stopping by ; ) In all seriousness, it&#8217;s kind of neat that we&#8217;re using dbqp to run some of our staging tests and we gain a few neat things: Speed Here are the trend charts for randgen and crashme.  While it doesn&#8217;t look like randgen is showing much [...]]]></description>
			<content:encoded><![CDATA[<p>Well, that&#8217;s pretty much it, thanks for stopping by ; )</p>
<p>In all seriousness, it&#8217;s kind of neat that we&#8217;re using <a href="http://docs.drizzle.org/testing/dbqp.html">dbqp</a> to run some of our staging tests and we gain a few neat things:</p>
<h2>Speed</h2>
<p>Here are the trend charts for <a href="http://jenkins.drizzle.org/view/Drizzle-staging/job/drizzle-staging-randgen/buildTimeTrend">randgen</a> and <a href="http://jenkins.drizzle.org/view/Drizzle-staging/job/drizzle-staging-crash-me/buildTimeTrend">crashme</a>.  While it doesn&#8217;t look like randgen is showing much of an improvement, it is worth mentioning that this job now runs both the standard <strong>and</strong> the transaction log tests in a single run &gt;: )  Previously, we had a separate drizzle-automation job for the transaction log.  Just the trx_log tests took ~30 minutes to run (plus build time).  Long story short, we&#8217;re saving about 30-40 minutes on randgen testing per staging run and only needing to build once!</p>
<h2>Maintainability</h2>
<p>The jobs we run are in the tree and anyone can easily repeat them.  While <a href="https://launchpad.net/drizzle-automation">Drizzle-automation</a> kicks major butt (and I have taken many ideas from it), it is a separate piece of software that requires <a href="http://wiki.drizzle.org/Automation_Documentation#Installing_drizzle-automation">setup and maintenance</a>.  Basing things around an in-tree setup means that you only need the code and any required bits and pieces.  Now if we need to set up a new <a href="http://docs.drizzle.org/testing/randgen.html">randgen</a> machine, we only need the <a href="https://launchpad.net/randgen">randgen</a> and dbd::drizzle installed (and we plan on including randgen in-tree soon, so you won&#8217;t even need that!).  If we need to set up a new crash-me machine, we only need dbd::drizzle &#8211; and everyone should have dbd::drizzle installed! ; )</p>
<h2>Ease of use</h2>
<p>Pretty much all tests provide the same standard output:</p>
<h3>dtr mode</h3>
<p>From the command:<br />
<code><br />
./dbqp<br />
</code><br />
Our default mode is dtr (aka using drizzletest.cc to execute standard .test files).  To run all available tests, use the make target &#8211; make test-dbqp<br />
<code><br />
20110621-081404  trigger_dictionary.loaded                  [ pass ]       43<br />
20110621-081408  logging_stats.cumulative                   [ pass ]     1045<br />
20110621-081412  errmsg_stderr.stderr                       [ pass ]       36<br />
20110621-081412  ===============================================================<br />
20110621-081412 INFO Test execution complete in 496 seconds<br />
20110621-081412 INFO Summary report:<br />
20110621-081412 INFO Executed 566/566 test cases, 100.00 percent<br />
20110621-081412 INFO STATUS: PASS, 566/566 test cases, 100.00 percent executed<br />
20110621-081412 INFO Spent 254 / 496 seconds on: TEST(s)<br />
20110621-081412 INFO Test execution complete<br />
20110621-081412 INFO Stopping all running servers...<br />
</code></p>
<h3>randgen mode</h3>
<p>From the command:<br />
<code><br />
./dbqp --mode=randgen --randgen-path=/path/to/your/randgen<br />
</code><br />
<code><br />
20110621-170141  main.subquery                              [ pass ]     3780<br />
20110621-170148  main.subquery_semijoin                     [ pass ]     3016<br />
20110621-170156  main.subquery_semijoin_nested              [ pass ]     3750<br />
20110621-170202  main.varchar                               [ pass ]     2658<br />
20110621-170202  ===============================================================<br />
20110621-170202 INFO Test execution complete in 147 seconds<br />
20110621-170202 INFO Summary report:<br />
20110621-170202 INFO Executed 19/19 test cases, 100.00 percent<br />
20110621-170202 INFO STATUS: PASS, 19/19 test cases, 100.00 percent executed<br />
20110621-170202 INFO Spent 77 / 147 seconds on: TEST(s)<br />
20110621-170202 INFO Test execution complete<br />
20110621-170202 INFO Stopping all running servers...<br />
</code></p>
<h3>crashme mode</h3>
<p>From the command:<br />
<code><br />
./dbqp --mode=crashme<br />
</code><br />
<code><br />
20110621-181515  main.crashme                               [ fail ]   149840<br />
20110621-181515  func_extra_to_days=error        # Function TO_DAYS<br />
20110621-181515  ###<br />
20110621-181515  ###&lt;select to_days('1996-01-01') from crash_me_d<br />
20110621-181515  ###&gt;2450084<br />
20110621-181515  ###We expected '729024' but got '2450084'<br />
20110621-181515  func_odbc_timestampadd=error        # Function TIMESTAMPADD<br />
20110621-181515  ###<br />
20110621-181515  ###&lt;select timestampadd(SQL_TSI_SECOND,1,'1997-01-01 00:00:00')<br />
20110621-181515  ###&gt;1997-01-01 00:00:01.000000<br />
20110621-181515  ###We expected '1997-01-01 00:00:01' but got '1997-01-01 00:00:01.000000'<br />
20110621-181515  ###<br />
20110621-181515  ###&lt;select {fn timestampadd(SQL_TSI_SECOND,1,{ts '1997-01-01 00:00:00'}) }<br />
20110621-181515  ###&gt;1997-01-01 00:00:01.000000<br />
20110621-181515  ###We expected '1997-01-01 00:00:01' but got '1997-01-01 00:00:01.000000'<br />
20110621-181515<br />
20110621-181515 ERROR Failed test.  Use --force to execute beyond the first test failure<br />
20110621-181515  ===============================================================<br />
20110621-181515 INFO Test execution complete in 153 seconds<br />
20110621-181515 INFO Summary report:<br />
20110621-181515 INFO Executed 1/1 test cases, 100.00 percent<br />
20110621-181515 INFO STATUS: FAIL, 1/1 test cases, 100.00 percent executed<br />
20110621-181515 INFO FAIL tests: main.crashme<br />
20110621-181515 INFO Spent 149 / 153 seconds on: TEST(s)<br />
20110621-181515 INFO Test execution complete<br />
20110621-181515 INFO Stopping all running servers...<br />
</code></p>
<p>While this isn&#8217;t a huge feature, it is nice to have a standardized report for knowing if something failed, what failed and how (we always dump test tool output on test failures).  Why is this nice?  Well, the world is a busy place and only needing to know one way of reading test output simplifies things just a teensy little bit.  This small improvement becomes a huge benefit over time if you happen to spend good chunks of your day looking at test output like me : )</p>
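<p>Because every mode ends with the same summary lines, a single small parser can read them all. The Python sketch below is an illustration only and is not part of dbqp; the function name and the regular expression are assumptions based on the sample output shown above.</p>

```python
import re

# Assumed pattern, derived from the STATUS lines in the sample output above.
STATUS_RE = re.compile(r"INFO STATUS: (PASS|FAIL), (\d+)/(\d+) test cases")


def parse_status(report_lines):
    """Return (status, executed, total) from the first STATUS line, or None."""
    for line in report_lines:
        match = STATUS_RE.search(line)
        if match:
            status, executed, total = match.groups()
            return status, int(executed), int(total)
    return None


# Lines copied from the crashme run shown earlier.
sample = [
    "20110621-181515 INFO Executed 1/1 test cases, 100.00 percent",
    "20110621-181515 INFO STATUS: FAIL, 1/1 test cases, 100.00 percent executed",
    "20110621-181515 INFO FAIL tests: main.crashme",
]

print(parse_status(sample))  # ('FAIL', 1, 1)
```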
<p>Other than that, I&#8217;m still working on teaching dbqp interesting new tricks that will help me in testing <a href="http://www.skysql.com/en/index">SkySQL</a>&#8217;s <a href="http://blogs.skysql.com/2011/05/new-features-added-to-skysql.html">Reference Architecture</a> &#8211; expect to hear more about that next month!</p>
<p>View full post on <a href="http://www.wc220.com/?p=276">Planet Drizzle</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.weez.com/2011/06/patrick-crews-drizzle%e2%80%99s-jenkins-system-using-dbqp-for-randgen-and-crashme-testing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using sets to solve difficult decision problems – the subset sum problem</title>
		<link>http://www.weez.com/2011/05/using-sets-to-solve-difficult-decision-problems-%e2%80%93-the-subset-sum-problem/</link>
		<comments>http://www.weez.com/2011/05/using-sets-to-solve-difficult-decision-problems-%e2%80%93-the-subset-sum-problem/#comments</comments>
		<pubDate>Wed, 18 May 2011 05:09:32 +0000</pubDate>
		<dc:creator>Abidoon</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Decision]]></category>
		<category><![CDATA[Difficult]]></category>
		<category><![CDATA[Problem]]></category>
		<category><![CDATA[Problems]]></category>
		<category><![CDATA[sets]]></category>
		<category><![CDATA[Solve]]></category>
		<category><![CDATA[subset]]></category>
		<category><![CDATA[using]]></category>

		<guid isPermaLink="false">http://www.weez.com/2011/05/using-sets-to-solve-difficult-decision-problems-%e2%80%93-the-subset-sum-problem/</guid>
		<description><![CDATA[In my previous post, I described how to use natural database compression on sets to reduce their footprint in the database. I also showed an easy way to detect if any subset of numbers in one set exists in another set in a very efficient way. This is a special case of the P vs [...]]]></description>
			<content:encoded><![CDATA[<p>In my previous post, I described how to use natural database compression on sets to reduce their footprint in the database.  I also showed an easy way to detect if any subset of numbers in one set exists in another set in a very efficient way.  This is a special case of the <a href="http://en.wikipedia.org/wiki/Subset_sum">P vs NP problem</a>.  That special case was a foothold on a way to efficiently test a set of any size.</p>
<p>Here is a generic method for determining if any subset of numbers in a set adds up to any other number.   By naturally compressing the data in a computationally friendly way, we can change the way we work with sets of numbers.<br />
<span id="more-6909"></span></p>
<p>Let&#8217;s find out if any of the numbers in our list add up to any other number.  First, the set is contained in a naturally compressed (RLE-encoded) table:</p>
<pre>
CREATE TABLE `ex2` (
  `val` bigint(20) NOT NULL DEFAULT '0',
  `cnt` bigint(21) NOT NULL DEFAULT '0'
);

mysql> select * from test.ex2;
+-----+----------+
| val | cnt      |
+-----+----------+
| -10 |        1 |
|  -3 |        1 |
|  -2 |        1 |
|   0 |   250000 |
|   4 |  8000000 |
|   7 | 14680064 |
|  14 | 14680064 |
|  15 | 35417088 |
|  16 |        1 |
+-----+----------+
9 rows in set (0.00 sec)

mysql> select * from sum_to_check;
+------+------------------+
| id   | sum_to_check_for |
+------+------------------+
|    1 |                0 |
|    2 |                1 |
|    3 |                2 |
|    4 |                5 |
|    5 |               -5 |
+------+------------------+
5 rows in set (0.00 sec)
</pre>
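<p>For readers who want to see the &#8220;natural compression&#8221; idea outside the database, here is a small Python sketch. It is an illustration under my own assumptions (the function names and the tiny sample list are made up): it collapses a list of values into (val, cnt) rows like those in the ex2 table, and expands them back.</p>

```python
from collections import Counter


def compress(values):
    """Collapse raw values into sorted (val, cnt) rows, like the ex2 table."""
    return sorted(Counter(values).items())


def expand(rows):
    """Rebuild the sorted list of raw values from (val, cnt) rows."""
    return [val for val, cnt in rows for _ in range(cnt)]


# A made-up miniature data set; the table above holds millions of items.
raw = [4, 7, 4, -2, 7, 4]
rows = compress(raw)

print(rows)          # [(-2, 1), (4, 3), (7, 2)]
print(expand(rows))  # [-2, 4, 4, 4, 7, 7]
```

<p>The compressed form keeps one row per distinct value, so its size depends on the number of distinct values rather than on the number of items.</p>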
<p>Here, we can test for any N items to see if they add up:</p>
<pre>
mysql> select sum_to_check_for,
group_concat(a.val),
group_concat(sum_to_check_for),
sum(a.val) = sum_to_check_for matched
from ex2 a
join sum_to_check
where val in(-2,-3)
group by 1
having sum(a.val) = sum_to_check_for ;
+------------------+---------------------+--------------------------------+---------+
| sum_to_check_for | group_concat(a.val) | group_concat(sum_to_check_for) | matched |
+------------------+---------------------+--------------------------------+---------+
|               -5 | -3,-2               | -5,-5                          |       1 |
+------------------+---------------------+--------------------------------+---------+
1 row in set (0.00 sec)
</pre>
<p>Now we can check the entire set in a reasonable amount of time, even though it is millions of entries long:</p>
<pre>
mysql> select sum_to_check_for, group_concat(a.val), group_concat(sum_to_check_for), sum(a.val) = sum_to_check_for matched from ex2 a join sum_to_check where val in(-10) group by 1 having sum(a.val) = sum_to_check_for ;
Empty set (0.00 sec)

mysql> select sum_to_check_for, group_concat(a.val), group_concat(sum_to_check_for), sum(a.val) = sum_to_check_for matched from ex2 a join sum_to_check where val in(-10,-3) group by 1 having sum(a.val) = sum_to_check_for ;
Empty set (0.00 sec)

mysql> select sum_to_check_for, group_concat(a.val), group_concat(sum_to_check_for), sum(a.val) = sum_to_check_for matched from ex2 a join sum_to_check where val in(-10,-3,-2) group by 1 having sum(a.val) = sum_to_check_for ;
Empty set (0.00 sec)

...
</pre>
<p>Note that here we search the entire set of millions of items 5 times over (one for each sum_to_check_for):</p>
<pre>
mysql> select sum_to_check_for, group_concat(a.val), group_concat(sum_to_check_for), sum(a.val) = sum_to_check_for matched from ex2 a join sum_to_check where val in(-10,-3,-2,0,4,7,14,15,16) group by 1 having sum(a.val) = sum_to_check_for ;
Empty set (0.00 sec)
</pre>
<p>Then move along:</p>
<pre>
mysql> select sum_to_check_for, group_concat(a.val), group_concat(sum_to_check_for), sum(a.val) = sum_to_check_for matched from ex2 a join sum_to_check where val in(-3) group by 1 having sum(a.val) = sum_to_check_for ;
Empty set (0.00 sec)

mysql> select sum_to_check_for, group_concat(a.val), group_concat(sum_to_check_for), sum(a.val) = sum_to_check_for matched from ex2 a join sum_to_check where val in(-3,-2) group by 1 having sum(a.val) = sum_to_check_for ;
+------------------+---------------------+--------------------------------+---------+
| sum_to_check_for | group_concat(a.val) | group_concat(sum_to_check_for) | matched |
+------------------+---------------------+--------------------------------+---------+
|               -5 | -3,-2               | -5,-5                          |       1 |
+------------------+---------------------+--------------------------------+---------+
1 row in set (0.00 sec)
</pre>
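<p>The sequence of queries above walks through subsets of the distinct values one at a time. As a hedged sketch of the same enumeration outside SQL (my own illustration, with assumed names, not the method used to produce the output above), the following Python tries every subset of the nine distinct values against the sums in sum_to_check:</p>

```python
from itertools import combinations

# Distinct values from the ex2 table and the sums from sum_to_check.
values = [-10, -3, -2, 0, 4, 7, 14, 15, 16]
targets = [0, 1, 2, 5, -5]


def matching_subsets(values, targets):
    """Return (sum, subset) pairs for every subset whose sum is a target."""
    wanted = set(targets)
    matches = []
    for size in range(1, len(values) + 1):
        for combo in combinations(values, size):
            total = sum(combo)
            if total in wanted:
                matches.append((total, combo))
    return matches


matches = matching_subsets(values, targets)
print((-5, (-3, -2)) in matches)  # True, the pair found by the last query
```

<p>Only the nine distinct values are enumerated here, which gives 511 non-empty subsets regardless of how many millions of items the counts represent.</p>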
<p>Keep in mind that there are a lot of items in my list:</p>
<pre>
mysql> select sum(cnt) from ex2;
+----------+
| sum(cnt) |
+----------+
| 73027220 |
+----------+
1 row in set (0.00 sec)

mysql> select * from ex2;
+-----+----------+
| val | cnt      |
+-----+----------+
| -10 |        1 |
|  -3 |        1 |
|  -2 |        1 |
|   0 |   250000 |
|   4 |  8000000 |
|   7 | 14680064 |
|  14 | 14680064 |
|  15 | 35417088 |
|  16 |        1 |
+-----+----------+
9 rows in set (0.00 sec)
</pre>
<p>This works for two reasons.  First, I know the contents of the set and its distribution characteristics from the structure of the compressed version of the table itself.   The data is stored in a histogram. This allows me to collapse the set of items to search to only the items in the set, and then do a linear search.  You can do all of the searches in parallel in a parallel database.  </p>
<p>If we keep the counts with the items (maintained with changes to the set) then this allows us to create a finite and easily computable and easily checkable result.  We know not to check to see if 17 adds up to something because we know it is not in our set.  </p>
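<p>A minimal Python sketch of that last point, using the values and counts from the table above (the dictionary layout and function name are my own assumptions): membership is a lookup in the histogram, and the size of the full set is just the sum of the counts.</p>

```python
# val -> cnt, copied from the ex2 table shown above.
histogram = {
    -10: 1, -3: 1, -2: 1, 0: 250000, 4: 8000000,
    7: 14680064, 14: 14680064, 15: 35417088, 16: 1,
}


def worth_checking(value):
    """A value can only take part in a sum if it is present in the set."""
    return value in histogram


total_items = sum(histogram.values())

print(total_items)         # 73027220, matching select sum(cnt) from ex2
print(worth_checking(17))  # False: 17 is not in the set
```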
<p>View full post on <a href="http://www.mysqlperformanceblog.com/2011/05/17/using-sets-to-solve-difficult-decision-problems-the-subset-sum-problem/">MySQL Performance Blog</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.weez.com/2011/05/using-sets-to-solve-difficult-decision-problems-%e2%80%93-the-subset-sum-problem/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Using any general purpose computer as a special purpose SIMD computer</title>
		<link>http://www.weez.com/2011/05/using-any-general-purpose-computer-as-a-special-purpose-simd-computer/</link>
		<comments>http://www.weez.com/2011/05/using-any-general-purpose-computer-as-a-special-purpose-simd-computer/#comments</comments>
		<pubDate>Mon, 16 May 2011 13:46:31 +0000</pubDate>
		<dc:creator>Abidoon</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[computer]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[purpose]]></category>
		<category><![CDATA[SIMD]]></category>
		<category><![CDATA[special]]></category>
		<category><![CDATA[using]]></category>

		<guid isPermaLink="false">http://www.weez.com/2011/05/using-any-general-purpose-computer-as-a-special-purpose-simd-computer/</guid>
		<description><![CDATA[Often times, from a computing perspective, one must run a function on a large amount of input. Often times, the same function must be run on many pieces of input, and this is a very expensive process unless the work can be done in parallel. Shard-Query introduces set based processing, which on the surface appears [...]]]></description>
			<content:encoded><![CDATA[<p>Often times, from a computing perspective, one must run a function on a large amount of input.  Often times, the same function must be run on many pieces of input, and this is a very expensive process unless the work can be done in parallel.</p>
<p>Shard-Query introduces set based processing, which on the surface appears to be similar to other technologies on the market today.  However, the scaling features of Shard-Query are just a side effect of the fact that it operates on sets in parallel.  Any set can be operated on to any arbitrary degree of parallelism up to, and including, the cardinality of the set.<br />
This is because:</p>
<ol>
<li>It is often possible to transform one type of expression into a different but compatible type for computational purposes, as long as the conversion is bidirectional.</li>
<li>A range operation over a set of integers or dates can be transformed into one or more discrete sub-ranges.</li>
<li>Any operation on an entire set is the same as running that operation on each item individually.</li>
<li>After expanding a set operation into N discrete dimensions, it can always be collapsed back into a one-dimensional set.</li>
<li>Arrays are sets.</li>
</ol>
<p>Treating a general purpose computer as a SIMD computer is possible because in a set, you can perform operations on all of the items independently.  The SIMD processor simply needs to wait for all parallel operations on its input to complete.  The work is embarrassingly parallel, and the maximum degree of parallelism is easily enforced with a queue.</p>
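<p>As a rough illustration of the idea (this is not Shard-Query code; the function names are hypothetical and a thread pool stands in for the cluster), a range can be split into sub-ranges, processed with a capped degree of parallelism, and collapsed back into a single result:</p>

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(low, high, parts):
    """Split the inclusive integer range [low, high] into at most `parts` sub-ranges."""
    size = high - low + 1
    parts = max(1, min(parts, size))
    step, extra = divmod(size, parts)
    ranges = []
    start = low
    for i in range(parts):
        # The first `extra` sub-ranges take one additional item each.
        end = start + step - 1 + (1 if extra > i else 0)
        ranges.append((start, end))
        start = end + 1
    return ranges


def parallel_apply(func, low, high, degree):
    """Apply func to every integer in [low, high], running at most `degree` sub-ranges at once."""
    def work(bounds):
        return [func(x) for x in range(bounds[0], bounds[1] + 1)]

    with ThreadPoolExecutor(max_workers=degree) as pool:
        # The pool queues pending sub-ranges, which caps the degree of parallelism.
        # pool.map returns results in the order the sub-ranges were submitted.
        chunks = pool.map(work, split_range(low, high, degree))
        # Collapse the per-range results back into a one-dimensional list.
        return [item for chunk in chunks for item in chunk]
```

<p>For example, parallel_apply(lambda x: x * x, 1, 6, 3) squares the integers 1 through 6 using three sub-ranges.  Threads are used only to keep the sketch short; a real deployment would hand each sub-range to a separate worker process or machine.</p>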
<p>Today I am going to show you how to take almost any function and treat a cluster of Turing computers of any size as a special purpose SIMD computer with respect to your function.  The SQL interface to Shard-Query imposes a wait for all the workers to complete, but you can register a callback function to handle the output of each input asynchronously, if you like.</p>
<p>Right now I believe this only works on finite sets.  I&#8217;ve decided to show how to count the number of unique words, and how many times those words appear, in a document.  Set-based processing of course works on sets.  A document is a set of words.</p>
<p>Before you read further, I want to tell you why I&#8217;ve decided to use the words of the Constitution of the United States of America as an example.   It is my favourite document in the world.  It speaks of honesty, and integrity, truth and openness.  I believe in all of these things.  I believe, that with cheap computation, our world can become an amazing place.  Please use this technology constructively and for peaceful purposes.  Love one another and let&#8217;s solve all the complex problems in the world together in peace and harmony.</p>
<p>The following is a somewhat naive example, since grammar will not be taken into account.  I start by splitting the document into a list of words:<br />
cat /tmp/constitution.txt | sed -e 's/ /\n/g' > /tmp/words</p>
<p>I am going to perform the following operations on every word in the constitution:<br />
1) compute the md5 of every item<br />
2) compute the md5 on the reverse of every item<br />
3) count the total number of words<br />
4) count the frequency of words<br />
5) order by the frequency of words, then by the md5 of the word, then by the md5 of the reverse of the word.<br />
6) determine the number of unique words.  This is not projected, but you can infer it from the number of items in the output set. </p>
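<p>For readers who prefer to see the operations outside of SQL, here is a minimal serial sketch of the same six steps.  The function names are made up for illustration, and Python&#8217;s hashlib stands in for MySQL&#8217;s md5():</p>

```python
import hashlib
from collections import Counter


def md5_hex(text):
    """Return the hex md5 digest of a string, like MySQL's md5()."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def word_stats(words):
    """Return (word, md5(word), md5(reverse(word)), frequency) rows."""
    counts = Counter(words)              # steps 3 and 4: total and per-word counts
    rows = [
        (word, md5_hex(word), md5_hex(word[::-1]), freq)   # steps 1 and 2
        for word, freq in counts.items()
    ]
    # Step 5: order by frequency, then md5 of the word, then md5 of the reverse.
    rows.sort(key=lambda row: (row[3], row[1], row[2]))
    return rows


if __name__ == "__main__":
    rows = word_stats("we the people of the united states".split())
    for row in rows:
        print(row)
    # Step 6: the number of unique words is the size of the output set.
    print(len(rows), "unique words")
```

<p>This is the work that the rest of the post spreads across several workers.</p>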
<p>The US Constitution is not very large.  I inflated the document size significantly to over 3 million &#8220;words&#8221; by duplicating the entire set multiple times.<br />
mysql> load data infile '/tmp/words' into table words (chars);<br />
Query OK, 6033 rows affected (0.01 sec)<br />
Records: 6033  Deleted: 0  Skipped: 0  Warnings: 0</p>
<p>I blow up the size of the words table and I create words2.  This is the data upon which we will operate:</p>
<pre>
create table words2 partition by hash(bucket) partitions 12 as select (id % 6) + 1 bucket, chars word from words;
Query OK, 3088896 rows affected (2.41 sec)
Records: 3088896  Duplicates: 0  Warnings: 0
</pre>
<p>Here is the serial version as run by the native database interface (MySQL):</p>
<pre>
mysql> select word, md5(word), md5(reverse(word)), count(*)
from words2 group by 1,2,3 order by 3,1 desc;
...
| Legislature      | 4380f755e4150b1c11f0ae9ca1910bcb | fecd2758f3c64c8176ce60c4ff7c1cf3 |     3072 |
| consent          | 9d721d9a89406a2a6861efaae44a785f | fede6baff4c3716c37a3c60bf4051b3f |      512 |
| admit,           | d803450bb41af1f7372af6ddc8e42d14 | fee1e6f166edfccd849fe4438eb1924f |      512 |
| Affirmation.     | 9568b7e19ee3da70d3e486134add2743 | fee5d3a27ec5be41941b5689f70c5587 |      512 |
| may,             | 289cf5ceddb80bab96c92de0a918e122 | fee80b247ce32faca9de1a031119533c |     1024 |
| legislatures     | 0640c734a3d25eed18126c7db6a39523 | ff238c73fea4086c10cda4a46aeb9d9a |     1024 |
| Time             | a76d4ef5f3f6a672bbfab2865563e530 | ff38a346616fc8a4df42c7f6c95bf1cc |     2048 |
| Congress:        | 873c419d2c2139bc8bbc3cbaffcc3473 | ff592a4dac2aa93c8a0589898885fe48 |      512 |
| Charles          | 399423ff652ebb6a6701be7ec3202fc6 | ffac637b74c0f062904ab466d9bf9e01 |     1024 |
| impairing        | 1c718d732bc6f6805835f8be6ef6e43e | ffc86c559e06009a743d891ce1e4fc4f |      512 |
+------------------+----------------------------------+----------------------------------+----------+
1427 rows in set (4.94 sec)
</pre>
<p>1427 rows in set (5.00 sec)<br />
1427 rows in set (5.03 sec)<br />
1427 rows in set (5.00 sec)</p>
<p>Since the data fits in memory, speed is nearly constant, and the single-threaded operation burns one CPU.</p>
<p>To help completely demonstrate how Shard-Query makes parallel set operations work, I&#8217;ll operate in only one dimension for the first example, just like the MySQL client.  This will be a linear operation because Shard-Query has no idea how to add parallelism in this case.  It is data set agnostic, operating only on sets, not relations.  If it were smarter it would ask the data dictionary about partitioning.</p>
<pre>
Array
(
    [word] => Congress:
    [md5(word)] => 873c419d2c2139bc8bbc3cbaffcc3473
    [md5(reverse(word))] => ff592a4dac2aa93c8a0589898885fe48
    [count(*)] => 512
)
Array
(
    [word] => Charles
    [md5(word)] => 399423ff652ebb6a6701be7ec3202fc6
    [md5(reverse(word))] => ffac637b74c0f062904ab466d9bf9e01
    [count(*)] => 1024
)
Array
(
    [word] => impairing
    [md5(word)] => 1c718d732bc6f6805835f8be6ef6e43e
    [md5(reverse(word))] => ffc86c559e06009a743d891ce1e4fc4f
    [count(*)] => 512
)
1427 rows returned (5.0057470798492s, 4.9994130134583s, 0.0063340663909912s)
</pre>
<p>The three numbers are wall clock time (as calculated by microtime()), SQL execution time, and parse time, respectively. </p>
<p>Performance is actually a little worse.  This is not unexpected.  Since Shard-Query must add a small amount of overhead, a single-threaded operation may be slower than the same operation on the native database. </p>
<p>That doesn&#8217;t matter, because Shard-Query is a smart database proxy that can add parallelism.  In this mode it adds six degrees of parallelism to the query:</p>
<pre>
Array
(
    [word] => Congress:
    [md5(word)] => 873c419d2c2139bc8bbc3cbaffcc3473
    [md5(reverse(word))] => ff592a4dac2aa93c8a0589898885fe48
    [count(*)] => 342
)
Array
(
    [word] => Charles
    [md5(word)] => 399423ff652ebb6a6701be7ec3202fc6
    [md5(reverse(word))] => ffac637b74c0f062904ab466d9bf9e01
    [count(*)] => 598
)
Array
(
    [word] => impairing
    [md5(word)] => 1c718d732bc6f6805835f8be6ef6e43e
    [md5(reverse(word))] => ffc86c559e06009a743d891ce1e4fc4f
    [count(*)] => 512
)
1427 rows returned (0.87930011749268s, 0.87229418754578s, 0.0070059299468994s)
</pre>
<p>Why six degrees of parallelism?  Because that is how many physical cores are connected to my bus, and because I chose to create six hash &#8220;buckets&#8221; in the table.   This allows MySQL to set up a sequential scan over the items in each bucket, particularly since we are examining all the items.  We operate on all the buckets and then use intelligent expression substitution to put the results back together, when necessary.  When sorting or grouping is used, a final pass over the combined result may be necessary, and this may add a small amount of serialization at the end.</p>
<p>How does this work?  </p>
<p>Here is the most important part of the explain plan in the mode without parallelism.  Notice that there is only one query.  If your database system cannot provide native parallelism, then performance will be poor.</p>
<pre>
-- SQL TO SEND TO SHARDS:
Array
(
    [0] => SELECT word AS `word`,md5(word) AS `md5(word)`,md5(reverse(word)) AS `md5(reverse(word))`,COUNT(*) AS `count(*)`
FROM words2 AS `words2` GROUP BY 1,2,3 ORDER BY NULL
)
</pre>
<p>The other important optimization combines results from multiple queries together.  This query is single threaded, and thus this serves no purpose.   It will be much more important in a moment.</p>
<pre>
-- AGGREGATION SQL:
SELECT `word`,`md5(word)`,SUM(`count(*)`) AS `count(*)`
FROM `aggregation_tmp_39323566`  GROUP BY 1,2 ORDER BY 1 ASC
ON DUPLICATE KEY UPDATE
`word`=VALUES(`word`),
`md5(word)`=VALUES(`md5(word)`),
`count(*)`=`count(*)` +  VALUES(`count(*)`)
</pre>
<p>Now, consider the query with bucket BETWEEN 1 AND 6 added to the WHERE clause.  This creates boundary conditions for our query.  Any set of integers can be broken up into as many items as the set contains, and thus it is possible to convert the BETWEEN expression into a partition elimination expression, with one equality condition per bucket.</p>
<p>Here is the output from the parallel version:</p>
<pre>

-- SQL TO SEND TO SHARDS:
Array
(
    [0] => SELECT word AS `word`,md5(word) AS `md5(word)`,md5(reverse(word)) AS `md5(reverse(word))`,COUNT(*) AS `count(*)`
FROM words2 AS `words2` WHERE bucket  = 1   GROUP BY 1,2,3 ORDER BY NULL
    [1] => SELECT word AS `word`,md5(word) AS `md5(word)`,md5(reverse(word)) AS `md5(reverse(word))`,COUNT(*) AS `count(*)`
FROM words2 AS `words2` WHERE bucket  = 2   GROUP BY 1,2,3 ORDER BY NULL
    [2] => SELECT word AS `word`,md5(word) AS `md5(word)`,md5(reverse(word)) AS `md5(reverse(word))`,COUNT(*) AS `count(*)`
FROM words2 AS `words2` WHERE bucket  = 3   GROUP BY 1,2,3 ORDER BY NULL
    [3] => SELECT word AS `word`,md5(word) AS `md5(word)`,md5(reverse(word)) AS `md5(reverse(word))`,COUNT(*) AS `count(*)`
FROM words2 AS `words2` WHERE bucket  = 4   GROUP BY 1,2,3 ORDER BY NULL
    [4] => SELECT word AS `word`,md5(word) AS `md5(word)`,md5(reverse(word)) AS `md5(reverse(word))`,COUNT(*) AS `count(*)`
FROM words2 AS `words2` WHERE bucket  = 5   GROUP BY 1,2,3 ORDER BY NULL
    [5] => SELECT word AS `word`,md5(word) AS `md5(word)`,md5(reverse(word)) AS `md5(reverse(word))`,COUNT(*) AS `count(*)`
FROM words2 AS `words2` WHERE bucket  = 6   GROUP BY 1,2,3 ORDER BY NULL
)
</pre>
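<p>The rewrite shown above can be approximated in a few lines.  This is only a sketch of the idea, not how Shard-Query parses SQL: the function name is hypothetical, and a regular expression stands in for a real SQL parser, so it handles only a simple integer BETWEEN on one column:</p>

```python
import re


def expand_between(sql, column):
    """Rewrite `column BETWEEN a AND b` into one query per integer value."""
    pattern = re.compile(
        r"\b" + re.escape(column) + r"\s+BETWEEN\s+(\d+)\s+AND\s+(\d+)",
        re.IGNORECASE,
    )
    match = pattern.search(sql)
    if match is None:
        # No range to split on: the query stays a single serial query.
        return [sql]
    low, high = int(match.group(1)), int(match.group(2))
    # Substitute an equality condition for each value in the range.
    return [
        sql[:match.start()] + f"{column} = {value}" + sql[match.end():]
        for value in range(low, high + 1)
    ]


if __name__ == "__main__":
    query = ("SELECT word, COUNT(*) FROM words2 "
             "WHERE bucket BETWEEN 1 AND 6 GROUP BY 1")
    for shard_sql in expand_between(query, "bucket"):
        print(shard_sql)
```

<p>Each generated query touches exactly one bucket, so the six queries can be sent to six workers at once.</p>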
<p>The aggregation SQL powers the UPSERT.  The results from the six branches are combined with it:</p>
<pre>
-- AGGREGATION SQL:
SELECT `word`,`md5(word)`,SUM(`count(*)`) AS `count(*)`
FROM `aggregation_tmp_27656998`  GROUP BY 1,2 ORDER BY 1 ASC
ON DUPLICATE KEY UPDATE
`word`=VALUES(`word`),
`md5(word)`=VALUES(`md5(word)`),
`count(*)`=`count(*)` +  VALUES(`count(*)`)
</pre>
<p>All six branches compute fully in parallel.  </p>
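<p>The combining step amounts to summing the partial counts for each key, which is what SUM(`count(*)`) and the ON DUPLICATE KEY UPDATE clause do in the aggregation SQL.  A minimal sketch, with a hypothetical function name and plain dictionaries standing in for the per-bucket result sets:</p>

```python
from collections import Counter


def merge_partials(partials):
    """Sum per-bucket {word: count} results into one ordered result."""
    total = Counter()
    for partial in partials:
        # update() adds to the counts of words that are already present,
        # much like `count(*)` = `count(*)` + VALUES(`count(*)`).
        total.update(partial)
    # The final serial pass: ORDER BY word ASC.
    return sorted(total.items())


if __name__ == "__main__":
    buckets = [{"Congress": 2, "may": 1}, {"Congress": 3}, {"Time": 5}]
    print(merge_partials(buckets))
```

<p>Because addition is associative, the order in which the branches finish does not change the merged result.</p>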
<p>Set processing also allows us to capture and effectively aggregate and manage every single change in any database.  The entire collection of human intelligence can be stored and efficiently reused over and over again, conserving resources now lost through wasted re-computation.  I will be sharing more information about this in the future.</p>
<p>Finally, I would like to note that this set has a cardinality of 3088896.  This is the maximum theoretical degree of parallelism that this data set can achieve with my method.  Current network technologies likely cannot support such a degree.</p>
<pre>
mysql> select min(id),max(id) from words;
+---------+---------+
| min(id) | max(id) |
+---------+---------+
|       1 | 3088896 |
+---------+---------+
1 row in set (0.00 sec)
</pre>
<p>View full post on <a href="http://www.mysqlperformanceblog.com/2011/05/16/using-any-general-purpose-computer-as-a-special-purpose-simd-computer/">MySQL Performance Blog</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.weez.com/2011/05/using-any-general-purpose-computer-as-a-special-purpose-simd-computer/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Patrick Crews: New dbqp feature – using pre-created datadirs for tests</title>
		<link>http://www.weez.com/2011/05/patrick-crews-new-dbqp-feature-%e2%80%93-using-pre-created-datadirs-for-tests/</link>
		<comments>http://www.weez.com/2011/05/patrick-crews-new-dbqp-feature-%e2%80%93-using-pre-created-datadirs-for-tests/#comments</comments>
		<pubDate>Sat, 14 May 2011 20:07:57 +0000</pubDate>
		<dc:creator>Abidoon</dc:creator>
				<category><![CDATA[Drizzle]]></category>
		<category><![CDATA[Crews]]></category>
		<category><![CDATA[datadirs]]></category>
		<category><![CDATA[dbqp]]></category>
		<category><![CDATA[Feature]]></category>
		<category><![CDATA[Patrick]]></category>
		<category><![CDATA[precreated]]></category>
		<category><![CDATA[tests]]></category>
		<category><![CDATA[using]]></category>

		<guid isPermaLink="false">http://www.weez.com/2011/05/patrick-crews-new-dbqp-feature-%e2%80%93-using-pre-created-datadirs-for-tests/</guid>
		<description><![CDATA[Why would one want to do this, you may ask?  Well, for starters, it makes a great &#8216;canary-in-the-coal-mine&#8216; in regards to backwards compatibility! For Drizzle, we&#8217;ve created some tables (via the randgen&#8217;s data generator if you are curious), saved a copy of the datadir, and then created a test case that uses said datadir for [...]]]></description>
			<content:encoded><![CDATA[<p>Why would one want to do this, you may ask?  Well, for starters, it makes a great &#8216;<a href="http://en.wikipedia.org/wiki/Animal_sentinels#Canaries_in_coal_mines">canary-in-the-coal-mine</a>&#8217; with regard to <a href="https://blueprints.launchpad.net/drizzle/+spec/test-dump-restore-req">backwards compatibility</a>!</p>
<p>For Drizzle, we&#8217;ve created some tables (via the <a href="http://forge.mysql.com/wiki/RandomDataGenerator">randgen&#8217;s data generator</a> if you are curious), saved a copy of the datadir, and then created a test case that uses said datadir for the test server.  The test executes some simple SQL queries to make sure we can read the tables properly.  This way, if we ever do something to either the server or .<strong>dfe</strong> format (<strong>d</strong>ata <strong>f</strong>ormat <strong>e</strong>xchange &#8211; had a most enlightening conversation with the team about this format&#8217;s history at the MySQL UC), we&#8217;ll have a broken test that cries about it.  From there, we&#8217;ll know we have to take some action.  The always-amazing <a href="http://www.flamingspork.com/blog/">Stewart Smith</a> has also created some <a href="https://code.launchpad.net/~stewart/drizzle/fkey-binary-backwards-compat">foreign key backwards compatibility tests</a>, which I <strong>believe</strong> marks further progress towards the magical goodness that is <a href="http://www.flamingspork.com/blog/2011/03/23/multi-tenancy-drizzle/">catalogs</a>!</p>
<p>We signal that we want to do this by using a .cnf file:</p>
<p><code><br />
[test_servers]<br />
servers = [[]]</code></p>
<p><code> </code></p>
<p><code>[s0]<br />
load-datadir=backwards_compat_data<br />
</code></p>
<p>Each server is named s0,s1,sN.  If a server name is contained in the .cnf file, the test-runner will do the appropriate magic to use the specified datadir for that server.  The argument to load-datadir is the name of the directory that is intended for use in the test.  All datadirs are expected to live in drizzle/tests/std_data.  Tests that do use a .cnf file, like main.backwards_compatibility and slave.basic are skipped by test-run.pl automatically (you *can&#8217;t* run them via test-run.pl).</p>
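<p>To make the convention concrete, here is a small sketch of how such a .cnf file could be read.  This is not dbqp&#8217;s actual code: the function name is hypothetical, and Python&#8217;s standard configparser is assumed to be close enough to the file format shown above:</p>

```python
import configparser
import os

# Datadirs are expected to live here, per the description above.
STD_DATA = os.path.join("drizzle", "tests", "std_data")


def datadirs_from_cnf(text):
    """Map each server section (s0, s1, ...) to the datadir it should load."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    result = {}
    for section in parser.sections():
        # Only servers that name a load-datadir get a pre-created datadir.
        if parser.has_option(section, "load-datadir"):
            name = parser.get(section, "load-datadir")
            result[section] = os.path.join(STD_DATA, name)
    return result


if __name__ == "__main__":
    cnf = "[test_servers]\nservers = [[]]\n\n[s0]\nload-datadir=backwards_compat_data\n"
    print(datadirs_from_cnf(cnf))
```

<p>Servers without a load-datadir entry are simply absent from the result and would start with a fresh datadir.</p>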
<p>This is something that I don&#8217;t believe could be accomplished with the old test runner, or at least not *easily* (see <a href="http://en.wikipedia.org/wiki/Rube_Goldberg_machine">Rube Goldberg</a>) ; ).  At some point, we will switch over to dbqp entirely and remove test-run.pl.  Seeing comments like <a href="http://blog.drizzle.org/2011/05/10/1115/comment-page-1/#comment-11614">this</a> makes me happy and makes me think things are on track.</p>
<p><a href="http://docs.drizzle.org/testing/dbqp.html">dbqp</a> was created with the idea that it should be easy to express complex testing setups (multiple servers, using a preloaded datadir, etc, etc) and it looks like the incubation is starting to pay some benefits.  In addition to allowing this voodoo to happen, the code I&#8217;ve added to the test runner will allow us to start doing proper tests of the super Mr. Shrewsbury&#8217;s <a href="http://dshrewsbury.blogspot.com/2011/03/multi-master-support-in-drizzle.html">multi-master replication</a>.  Joe Daly has also been doing some very promising work for <a href="http://www.8bitsofbytes.com/?p=28">hierarchical replication</a> based on Dave&#8217;s tree.  I&#8217;ll be creating some example tests for these badass features soon.  The moral of the story is that by rethinking our test-runner, one tiny bit of code helps us move the ball forward on testing replication, backwards compatibility, <span><strong>and</strong></span> catalogs.</p>
<p>It&#8217;s honestly one of the best parts of working on the Drizzle project &#8211; being encouraged to experiment and rethink problems has enabled all sorts of <a href="http://jenkins.drizzle.org/">innovation</a> (but one example of <a href="http://inaugust.com/">Monty Taylor&#8217;s</a> computing wizardry!) and <a href="http://www.flamingspork.com/blog/2011/05/13/drizzle-json-interface-merged/">cool features</a>.  Thanks to this freedom to experiment, we now have even more ways of making sure we are producing quality code.</p>
<p>My view of QA is that we do help test, but that we also help other people answer their own questions about quality (via tools, documentation, examples, etc).  Ultimately, a test is a question &#8211; &#8220;Do you return the right answer for this query?&#8221;, &#8220;Can you survive a beating from the <a href="http://www.wc220.com/wp-content/uploads/2011/04/randgen1.png">randgen</a>?&#8221;, etc &#8211; and asking questions should be easy and informative.  QA shouldn&#8217;t be the sole province of some obscure priest class, but everyone&#8217;s playground.  When I see developers like Stewart <a href="http://bazaar.launchpad.net/~stewart/drizzle/fkey-binary-backwards-compat/revision/2279">writing interesting test cases</a> and even <a href="https://code.launchpad.net/~stewart/drizzle/better-dbqp-alignment/+merge/58415">contributing to the test tool itself</a>, I&#8217;m even happier than when I find a bug (and finding bugs is quite awesome!).</p>
<p>Anyway, the code is proposed for a merge to trunk and documentation is available (under testing/writing test cases).  I hope that this makes trying to break things even more fun for people &gt;: )</p>
<p>View full post on <a href="http://www.wc220.com/?p=247">Planet Drizzle</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.weez.com/2011/05/patrick-crews-new-dbqp-feature-%e2%80%93-using-pre-created-datadirs-for-tests/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Robert Hodges: Parallel Replication Using Shards Is the Only Workable Approach for SQL</title>
		<link>http://www.weez.com/2011/04/robert-hodges-parallel-replication-using-shards-is-the-only-workable-approach-for-sql/</link>
		<comments>http://www.weez.com/2011/04/robert-hodges-parallel-replication-using-shards-is-the-only-workable-approach-for-sql/#comments</comments>
		<pubDate>Sat, 30 Apr 2011 20:07:26 +0000</pubDate>
		<dc:creator>Abidoon</dc:creator>
				<category><![CDATA[Drizzle]]></category>
		<category><![CDATA[Approach]]></category>
		<category><![CDATA[Hodges]]></category>
		<category><![CDATA[only]]></category>
		<category><![CDATA[parallel]]></category>
		<category><![CDATA[Replication]]></category>
		<category><![CDATA[Robert]]></category>
		<category><![CDATA[Shards]]></category>
		<category><![CDATA[using]]></category>
		<category><![CDATA[Workable]]></category>

		<guid isPermaLink="false">http://www.weez.com/2011/04/robert-hodges-parallel-replication-using-shards-is-the-only-workable-approach-for-sql/</guid>
		<description><![CDATA[There have been a couple of recent blog articles (here and here) asking for parallel replication based on something other than schemas. &#160;These articles both focus on the problem of parallelizing updates within a single MySQL schema.&#160;&#160;I read these with great interest, not least because they both mentioned&#160;Tungsten&#160;(thanks!) and also found that our&#160;schema-based parallelization approach [...]]]></description>
			<content:encoded><![CDATA[<p>There have been a couple of recent blog articles (<a href="http://www.tusacentral.net/joomla/index.php/mysql-blogs/96-a-dream-on-mysql-parallel-replication">here</a> and <a href="http://openlife.cc/blogs/2011/march/parallelizing-mysql-replication-slave">here</a>) asking for parallel replication based on something other than schemas. &nbsp;These articles both focus on the problem of parallelizing updates within a single MySQL schema.&nbsp;&nbsp;I read these with great interest, not least because they both mentioned&nbsp;<a href="http://code.google.com/p/tungsten-replicator/">Tungsten</a>&nbsp;(thanks!) and also found that our&nbsp;schema-based parallelization approach is too limited. &nbsp;It is therefore worth a short article explaining exactly what the Tungsten approach is and why we chose it. </p>
<p>First of all, Tungsten does not exactly use schema-based parallel replication. &nbsp;Tungsten is actually based on what I call the <i>serialized shard</i> model of replication. &nbsp;We assign global transaction IDs to all transactions, which means that for any particular set of transactions we can always figure out the correct serialization and apply in the right order. &nbsp;This is true even if the transactions travel across independent replication paths or if we have master failover. </p>
<p>Second, we assign a shard ID to all transactions. &nbsp;Shards are independent streams of transactions that execute correctly when applied by themselves in serial order. &nbsp;Shards are typically independent, which means transactions in different shards can execute in parallel without deadlocking or corrupting data. &nbsp;This is the case when each shard contains data for a single customer in a multi-tenant application. &nbsp;We also have a notion of &#8220;critical shards,&#8221; which are shards that contain global data, such as shared currency rate tables. &nbsp;Updates in critical shards cause full serialization across all shards. &nbsp;</p>
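<p>A toy sketch of the serialized shard idea follows.  It is not Tungsten&#8217;s implementation: the function name and the (seqno, shard) tuples are assumptions made for illustration, and a critical shard is modeled as a simple barrier between batches:</p>

```python
def plan_batches(transactions, critical_shards):
    """Group (seqno, shard_id) pairs into batches whose shards can apply in parallel.

    Inside a batch, each shard keeps its transactions in global seqno order and
    the shards are independent of each other. A transaction in a critical shard
    gets a batch of its own, so everything before it is applied first.
    """
    batches = []
    current = {}
    for seqno, shard in sorted(transactions):
        if shard in critical_shards:
            if current:
                batches.append(current)   # flush the parallel work so far
                current = {}
            batches.append({shard: [seqno]})  # full serialization point
        else:
            current.setdefault(shard, []).append(seqno)
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    txns = [(1, "cust_a"), (2, "cust_b"), (3, "cust_a"),
            (4, "rates"), (5, "cust_b"), (6, "cust_a")]
    for batch in plan_batches(txns, {"rates"}):
        print(batch)
```

<p>In the example, cust_a and cust_b can be applied side by side until the update to the shared rates shard, which forces everything before it to finish first.</p>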
<p>You can define shards in a variety of ways, but as a practical matter identifying durable shards inside individual MySQL schemas is hard for most applications, especially if there are constraints between tables or you have large transactions. &nbsp; Many SQL applications tend to make most of their updates to a small number of very large tables, which makes finding stable dividing lines even more difficult. &nbsp;Schemas are therefore a natural unit of sharding, and Tungsten uses these by default. </p>
<p>Schema-based sharding seems pretty limiting, but for current SQL databases it is really the only approach that works. &nbsp;Here are some important reasons that give you a flavor of the issues.</p>
<p>* <b>Restart</b>. &nbsp;To handle failures you need to mark the exact restart point on each replication apply thread or you will either repeat or miss transactions. &nbsp;This requires precise and repeatable serialization on each thread, which you get with the serialized shard model.</p>
<p>* <b>Deadlocks</b>. &nbsp;If there are conflicts between updates you will quickly hit deadlocks. &nbsp;This is especially true because one of the biggest single thread replication optimizations is block commit, where you commit dozens of successive transactions at once&#8211;it can raise throughput by 100% in some cases. &nbsp;Deadlocks on the other hand can reduce effective throughput to zero in pathological cases. &nbsp; Shard-based execution avoids deadlocks.</p>
<p>* <b>Ordering</b>. &nbsp;SQL gives you a lot of ways to shoot yourself in the foot through bad transaction ordering. &nbsp;You can&#8217;t write to a table before creating it. &nbsp;You can&#8217;t delete a row before it is inserted. &nbsp;Violating these rules does not just lead to invalid data but also causes errors that stop replication. &nbsp;The workarounds are either unreliable and slow (conflict resolution) or impractical for most applications (make everything an insert). &nbsp;To avoid this you need to observe serialization very carefully.</p>
<p>* <b>Throughput</b>. &nbsp;SQL transactions in real systems vary tremendously in duration, which tends to result in individual long transactions blocking simpler parallelization schemes that use in-memory distribution of updates. &nbsp;In the Tungsten model we can solve this by letting shard progress vary (by hours potentially), something that is only possible with a well-defined serialization model that deals with dependencies between parallel update streams. &nbsp;I don&#8217;t know of another approach that deals with this problem. </p>
<p>If you mess up the solution to any of the foregoing problems, chances are good you will irreparably corrupt data, which leads to replication going completely off the rails. &nbsp;Then you reprovision your slave(s). &nbsp;The databases that most need parallel replication are very large, so this is a multi-hour or even multi-day process. &nbsp;It makes for unpleasant calls with customers when you tell them they need to do this. </p>
<p>I don&#8217;t spend a lot of time worrying that Tungsten parallel replication is not well suited to the single big schema problem. &nbsp;So far, the only ways I can think of making it work scalably require major changes to the DBMS or the applications that use it. &nbsp;In many cases your least costly alternative may be to use SSDs to boost slave I/O performance. </p>
<p>My concerns about Tungsten&#8217;s model lie in a different area. &nbsp;The serialized shard model is theoretically sound&#8211;it has essentially the same semantics as causally dependent messaging in distributed systems. &nbsp;However, if we fail to identify shards correctly (and don&#8217;t know we failed) we will have crashes and corrupt data. &nbsp;I want Tungsten either to work properly or tell users it won&#8217;t work and degrade gracefully to full serialization. &nbsp;If we can&#8217;t do one of these two for every conceivable sequence of transactions that&#8217;s a serious problem. </p>
<p>So, to get back to my original point, serialized shards are the best model for parallel replication in SQL databases as we find them today. &nbsp;I suspect if you look at some of the other incipient designs for parallel replication on MySQL you will find that they follow this model in the end if not at first. &nbsp;I would think in fact that the next step is to add MySQL features that make sharded replication more effective. &nbsp;The Drizzle team seems to be thinking along these lines already.</p>
<p>View full post on <a href="http://scale-out-blog.blogspot.com/2011/03/parallel-replication-using-shards-is.html">Planet Drizzle</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.weez.com/2011/04/robert-hodges-parallel-replication-using-shards-is-the-only-workable-approach-for-sql/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Marcus Eriksson: Visualizing data in real time using drizzle and rabbitmq, DEMO!</title>
		<link>http://www.weez.com/2011/04/marcus-eriksson-visualizing-data-in-real-time-using-drizzle-and-rabbitmq-demo/</link>
		<comments>http://www.weez.com/2011/04/marcus-eriksson-visualizing-data-in-real-time-using-drizzle-and-rabbitmq-demo/#comments</comments>
		<pubDate>Thu, 07 Apr 2011 20:06:44 +0000</pubDate>
		<dc:creator>Abidoon</dc:creator>
				<category><![CDATA[Drizzle]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[demo]]></category>
		<category><![CDATA[Eriksson]]></category>
		<category><![CDATA[Marcus]]></category>
		<category><![CDATA[RabbitMQ]]></category>
		<category><![CDATA[Real]]></category>
		<category><![CDATA[Time]]></category>
		<category><![CDATA[using]]></category>
		<category><![CDATA[Visualizing]]></category>

		<guid isPermaLink="false">http://www.weez.com/2011/04/marcus-eriksson-visualizing-data-in-real-time-using-drizzle-and-rabbitmq-demo/</guid>
		<description><![CDATA[A couple of weeks ago, when Mozilla released firefox 4, i watched glow.mozilla.org &#8211; a site made to visualize the firefox downloads in real time. I figured this could be quite a neat use case for my old drizzle to web socket replicator. So, the compontents involved: Drizzle with the rabbitmq plugin, see this blog [...]]]></description>
			<content:encoded><![CDATA[<p>A couple of weeks ago, when Mozilla released Firefox 4, I watched <a href="http://glow.mozilla.org">glow.mozilla.org</a> &#8211; a site made to visualize the Firefox downloads in real time. I figured this could be quite a neat use case for my old Drizzle to web socket replicator.</p>
<p>So, the components involved:
<ul>
<li><a href="http://www.drizzle.org">Drizzle</a> with the rabbitmq plugin, see <a href="http://developian.blogspot.com/2011/03/publish-transactions-to-rabbitmq-in.html">this blog post</a> for more information on how to get it up and running.</li>
<li><a href="http://launchpad.net/drizzle-jdbc">Drizzle-JDBC</a> to generate the data in the database</li>
<li><a href="http://www.rabbitreplication.org">RabbitReplication</a></li>
<li><a href="http://www.rabbitmq.org">RabbitMQ</a> to transport the transactions from drizzle to the applier</li>
<li>Your browser, both <a href="http://www.google.com/chrome">Chrome</a> and <a href="http://www.getfirefox.net">Firefox</a> support web sockets, but for firefox you need to <a href="http://techdows.com/2010/12/turn-on-websockets-in-firefox-4.html">enable them first</a></li>
</ul>
<p>The basic flow is this:
<ol>
<li>10 inserts per second are executed in the drizzle database using drizzle-jdbc, these inserts contain a random IP address</li>
<li>Drizzle does its magic and packages up each insert as a google protobuf transaction message</li>
<li>The rabbitmq plugin picks up the message and publishes it to a RabbitMQ server</li>
<li>RabbitReplication consumes the message from RabbitMQ and maps it onto an <a href="http://bazaar.launchpad.net/~krummas/rabbitreplication/trunk/view/head:/src/org/drizzle/managedclasses/ExampleRepl.java">annotated java object</a>. This gives a lot of flexibility to the rabbitreplication user since you can manipulate the information any way you want; in this example I extract a longitude and latitude value from the IP address using <a href="http://www.maxmind.com/">MaxMind GeoLite City</a>. Have a look at the &#8220;setSomething()&#8221; method in the ExampleRepl.java file. This file is something a rabbitreplication end user would implement and provide</li>
<li>The java object is serialized to json</li>
<li>The json string is published to all connected clients</li>
<li>A javascript on the page takes the lat/lng-values and maps these onto a google map</li>
<li>The same javascript appends some of the raw data to a table</li>
</ol>
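<p>Steps 4 to 6 can be pictured with a few lines of code.  The real applier is the annotated Java class linked above; this sketch only mirrors the idea, with a hypothetical function name and a tiny in-memory table standing in for the MaxMind lookup:</p>

```python
import json

# Hypothetical stand-in for a GeoIP database: IP address -> (lat, lng).
FAKE_GEO = {"192.0.2.1": (59.33, 18.06)}


def to_client_message(row, geo=FAKE_GEO):
    """Turn one replicated insert into the JSON string sent to the browsers."""
    # Enrich the row with coordinates; unknown addresses map to None.
    lat, lng = geo.get(row["ip"], (None, None))
    return json.dumps({"ip": row["ip"], "lat": lat, "lng": lng}, sort_keys=True)


if __name__ == "__main__":
    print(to_client_message({"ip": "192.0.2.1"}))
```

<p>The browser-side script then only has to parse the JSON and plot the lat/lng pair.</p>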
<p>So, I&#8217;m hoping the little demo I&#8217;m linking below shows some of the extreme flexibility of the Drizzle replication. This, for me, is the unique selling point for Drizzle. Also, if anyone has an actual production use case for this, I&#8217;d be extremely happy to help implement it!</p>
<p><center><br />
<h1><a href="http://demo.rabbitreplication.org" target="_blank">http://demo.rabbitreplication.org</a></h1>
<p>(Thanks to <a href="http://www.rackspace.com">Rackspace</a> for hosting!)</center></p>
<div class="separator"><a href="http://4.bp.blogspot.com/-OPZlFbdYh5k/TZ2Ld-itO2I/AAAAAAAAAUY/OJ6olXdjMMs/s1600/rabbitrepl-websockets.png"><img border="0" height="400" width="278" src="http://4.bp.blogspot.com/-OPZlFbdYh5k/TZ2Ld-itO2I/AAAAAAAAAUY/OJ6olXdjMMs/s400/rabbitrepl-websockets.png" /></a></div>
<p>View full post on <a href="http://developian.blogspot.com/2011/04/visualizing-data-in-real-time-using.html">Planet Drizzle</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.weez.com/2011/04/marcus-eriksson-visualizing-data-in-real-time-using-drizzle-and-rabbitmq-demo/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Flexviews – part 3 – improving query performance using materialized views</title>
		<link>http://www.weez.com/2011/04/flexviews-%e2%80%93-part-3-%e2%80%93-improving-query-performance-using-materialized-views/</link>
		<comments>http://www.weez.com/2011/04/flexviews-%e2%80%93-part-3-%e2%80%93-improving-query-performance-using-materialized-views/#comments</comments>
		<pubDate>Mon, 04 Apr 2011 21:50:17 +0000</pubDate>
		<dc:creator>Abidoon</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Flexviews]]></category>
		<category><![CDATA[Improving]]></category>
		<category><![CDATA[materialized]]></category>
		<category><![CDATA[Part]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[query]]></category>
		<category><![CDATA[using]]></category>
		<category><![CDATA[views]]></category>

		<guid isPermaLink="false">http://www.weez.com/2011/04/flexviews-%e2%80%93-part-3-%e2%80%93-improving-query-performance-using-materialized-views/</guid>
		<description><![CDATA[Combating &#8220;data drift&#8221; In my first post in this series, I described materialized views (MVs). An MV is essentially a cached result set at one point in time. The contents of the MV will become incorrect (out of sync) when the underlying data changes. This loss of synchronization is sometimes called drift. This is conceptually [...]]]></description>
			<content:encoded><![CDATA[<h1>Combating &#8220;data drift&#8221;</h1>
<p>In my <a href="http://www.mysqlperformanceblog.com/2011/03/23/using-flexviews-part-one-introduction-to-materialized-views/">first post</a> in this series, I described materialized views (MVs).  An MV is essentially a cached result set at one point in time.   The contents of the MV will become incorrect (out of sync) when the underlying data changes. This loss of synchronization is sometimes called <em>drift</em>.  This is conceptually similar to a replication slave that is behind.  Until it catches up, the view of the data on the slave is &#8220;behind&#8221; the changes on the master.  An important difference is that each MV could have drifted by a different length of time.</p>
<p><strong>A view which has drifted out of sync must be <em>refreshed</em></strong>.  Since an MV drifts over time from the &#8220;base tables&#8221; (those tables on which the view was built), there must be a process to bring it up to date.  Flexviews includes two different refresh methods.  Each method is named after the way in which the contents of the view are updated.  The two methods are <em>complete</em> refresh and <em>incremental</em> refresh.   I use the terms &#8216;view type&#8217; and &#8216;refresh method&#8217; interchangeably, since a refresh method is selected at the time of view creation and may not be changed.  If I talk about a &#8216;complete MV&#8217;, I mean an &#8216;MV for which the complete refresh method was selected at creation&#8217;.  Also, when I say &#8216;view&#8217; I mean MV.<span id="more-5537"></span></p>
<p>I will fully describe each method below, but the important thing to know right now is that there are two significant differences between the methods.</p>
<ul>
<li>A view which uses the <em>complete</em> refresh method must be rebuilt <em>from scratch</em> each time the view is refreshed.  This is similar to storing the results of a query in Memcache.  When the cache &#8220;expires&#8221;, the contents must be fully recalculated.</li>
<li>An <em>incrementally refreshable</em> view (that is, one which uses the <em>incremental</em> method) can be updated much more efficiently, by examining a history of the rows which have changed since the view was either created or last refreshed.</li>
</ul>
<p>The incremental method is conceptually similar to using MySQL binary logs for point-in-time recovery after restoring a backup.  Since the backup is essentially a &#8216;snapshot&#8217; of the data at the time it was taken, the restored data may be &#8216;out of date&#8217;, because the database may have changed since then.  Replaying the binary logs brings the database up to date by replaying the changes.  </p>
<p>An MV is also like a snapshot.   The incremental refresh method uses the changelogs collected by FlexCDC to update the snapshot to reflect the changes that have happened in the database, instead of replaying binary logs directly.</p>
<h2>Selecting a refresh method</h2>
<p>The refresh method you select is determined by the following two major factors:</p>
<ol>
<li>The cost (in terms of execution time) of the query on which the view is based.</li>
<li>The SQL features used in the query on which the view is based.</li>
</ol>
<h3>The cost of the query</h3>
<p>The <em>complete</em> method completely rebuilds an MV from scratch at each refresh.  This means that the minimum amount of drift for this type of MV is the amount of time it takes to execute the query on which the view is based.  An incrementally refreshable view, on the other hand, can usually be refreshed very quickly, depending on the amount of data which has changed since the last refresh.  This is because the incremental refresh method must examine only those rows related to the ones which changed since the last time the view was refreshed.</p>
<p>The incremental refresh method can be many orders of magnitude faster than the original query execution time.  It may be possible, therefore, to refresh an MV based on a query that takes 45 minutes to execute with a greater frequency than 45 minutes, even as frequently as every minute, perhaps.</p>
<p>The refresh method you select effectively controls the minimum amount of time it takes to refresh the view, and therefore the minimum amount of drift it will encounter.   Keep in mind that you can build complete refresh views on top of incrementally refreshable views, which gives you a lot of flexibility to get the results you want, and to get them more quickly.  More on this later.</p>
<h3>SQL features used in the query</h3>
<p>Incremental refresh doesn&#8217;t support all of the SQL features available to SELECT statements.</p>
<table border="1">
<tbody>
<tr>
<th>Refresh type</th>
<th>Aggregation</th>
<th>Outer join</th>
<th>All SQL functions</th>
<th>Built using SQL</th>
<th>Requires FlexCDC</th>
</tr>
<tr>
<td>COMPLETE</td>
<td align="center">Y</td>
<td align="center">Y</td>
<td align="center">Y</td>
<td align="center">Y</td>
<td align="center">N</td>
</tr>
<tr>
<td>INCREMENTAL</td>
<td align="center">Y*</td>
<td align="center">N</td>
<td align="center">N**</td>
<td align="center">N***</td>
<td align="center">Y</td>
</tr>
</tbody>
</table>
<p><strong>*</strong>All aggregate functions supported, except GROUP_CONCAT and AVG(distinct).<br />
<strong>**</strong>non-deterministic functions like RAND(),NOW(), etc, are not supported.<br />
<strong>***</strong>There is a script to convert SQL to the Flexviews API (see convert.php below)</p>
<h1>Refresh methods</h1>
<p>
<h2>The incremental refresh method</h2>
<p>The incremental refresh method uses <a href="http://www.mysqlperformanceblog.com/2011/03/25/using-flexviews-part-two-change-data-capture/"> table changelogs</a> which are created by FlexCDC.  The refresh algorithm computes changes in multiple transactions.  After each transaction it must wait for FlexCDC to process changes from that transaction.  In practice, this means that it takes a minimum of a few seconds to incrementally refresh a view, even when the number of changes is very small.</p>
<h3>The Flexviews SQL_API</h3>
<p>Unlike views which use the complete refresh method,  incrementally refreshable views are not built directly from SQL.  Instead, Flexviews includes a MySQL stored procedure API, called the <a href="http://flexviews.googlecode.com/svn/trunk/manual.html">SQL_API</a> which is used to define the view.   These stored procedures maintain the Flexviews data dictionary. The data dictionary is used by the incremental refresh algorithm to build and maintain the view.  You should always use the stored procedures to modify the dictionary.  Do not modify it directly.  </p>
<p><b>convert.php</b><br />
You are probably unfamiliar with the SQL_API.  To make working with Flexviews easier, it includes a script called &#8216;convert.php&#8217;.  This script makes it easy to create incrementally refreshable views from SQL statements.   It reads one or more &#8220;CREATE TABLE <em>db.schema</em> &#8230; AS SELECT&#8221; and/or &#8220;INSERT INTO <em>db.schema</em> &#8230; AS SELECT&#8221; statements from standard input, and outputs the SQL_API statements representing the original SQL statements.  Each MV name will be taken from the table name specified in each statement.  You should test each statement for correctness before you attempt to convert it.  You can easily check that it parses by adding LIMIT 0 to the SELECT portion of the query and running it.</p>
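<p>As a minimal sketch of that check, assuming the <em>demo</em> schema and the sales query used in the example that follows: MySQL parses the statement and resolves the table and column names, but LIMIT 0 makes it return an empty result set instead of running the full aggregation.</p>
<pre>-- Parse check only: LIMIT 0 returns an empty result set.
USE demo;
SELECT customer_id,
       customer_name,
       DATE_FORMAT(order_date,'%Y%m') AS sale_when,
       SUM(quantity) AS total_items,
       SUM(price * quantity) AS total_price,
       COUNT(*) AS total_lines
  FROM orders o
  JOIN customers c USING (customer_id)
  JOIN order_lines ol USING (order_id)
 GROUP BY customer_id, customer_name, sale_when
 LIMIT 0;</pre>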
<p>The script takes one argument: the default database.  Notice that in the following example the schema is not specified, so the MV will be placed into whatever schema is given as the first argument on the command line:</p>
<pre>$ cat sales.sql
create table dashboard_customer_sales AS
select customer_id,
customer_name,
DATE_FORMAT(order_date,'%Y%m') as sale_when,
sum(quantity) as total_items,
sum(price * quantity) as total_price,
count(*) as total_lines
from orders o
join customers c using(customer_id)
join order_lines ol using(order_id)
group by customer_id,
customer_name, sale_when;

$ php convert.php demo &lt; sales.sql
CALL flexviews.create('demo', 'dashboard_customer_sales', 'INCREMENTAL');
SET @mvid := LAST_INSERT_ID();
CALL flexviews.add_expr(@mvid,'GROUP','customer_id','customer_id');
CALL flexviews.add_expr(@mvid,'GROUP','customer_name','customer_name');
CALL flexviews.add_expr(@mvid,'GROUP','DATE_FORMAT(order_date,"%Y%m")','sale_when');
CALL flexviews.add_expr(@mvid,'SUM','quantity','total_items');
CALL flexviews.add_expr(@mvid,'SUM','price * quantity','total_price');
CALL flexviews.add_expr(@mvid,'COUNT','*','total_lines');

CALL flexviews.add_table(@mvid,'demo','orders','o',NULL);
CALL flexviews.add_table(@mvid,'demo','customers','c','USING (customer_id) ');
CALL flexviews.add_table(@mvid,'demo','order_lines','ol','USING (order_id) ');
CALL flexviews.enable(@mvid);</pre>
<p>The script (convert.php) supports basic queries which use select/group by/join/where.   You may not use sub-queries or any non-deterministic functions like NOW() or RAND().  HAVING clauses, ORDER BY clauses, etc, are not supported.  You can work around many of these limitations.  I will discuss one of the workarounds below, in the &#8216;complete refresh&#8217; section.  I&#8217;ll discuss others in future blog posts.</p>
<h3>Enable the view to use it</h3>
<p>The SQL_API call flexviews.enable() is used to actually build the contents of the view, making it available for querying:</p>
<pre>
mysql&gt; call flexviews.enable(
-&gt; flexviews.get_id('demo','dashboard_customer_sales'));
Query OK, 0 rows affected (41 min 52.04 sec)</pre>
<h3>Data dictionary</h3>
<p>Here is a quick example of the list of tables used by the above view, as stored in the data dictionary:</p>
<pre>mysql&gt; select *
from flexviews.mview_table
where mview_id=
flexviews.get_id('demo','dashboard_customer_sales')\G
*************************** 1. row ***************************
      mview_table_id: 27
            mview_id: 21
    mview_table_name: orders
  mview_table_schema: demo
   mview_table_alias: o
mview_join_condition: NULL
    mview_join_order: 999
*************************** 2. row ***************************
      mview_table_id: 28
            mview_id: 21
    mview_table_name: customers
  mview_table_schema: demo
   mview_table_alias: c
mview_join_condition: USING (customer_id)
    mview_join_order: 999
*************************** 3. row ***************************
      mview_table_id: 29
            mview_id: 21
    mview_table_name: order_lines
  mview_table_schema: demo
   mview_table_alias: ol
mview_join_condition: USING (order_id)
    mview_join_order: 999
3 rows in set (0.00 sec)</pre>
<h3>Using the dictionary</h3>
<p>The following SQL is dynamically generated by the flexviews.get_sql() function call.  This function reads the meta-data stored in the Flexviews data dictionary.  It returns the SQL which represents the data stored in the view.  This is much more convenient than trying to read from the dictionary directly to determine the contents of the view.</p>
<pre>mysql&gt; select flexviews.get_sql(
-&gt; flexviews.get_id('demo','dashboard_customer_sales'))
-&gt; as 'SQL' \G
*************************** 1. row ***************************
SQL:
SELECT NULL as mview$pk,
(customer_id) as `customer_id`,
(customer_name) as `customer_name`,
(DATE_FORMAT(order_date,'%Y%m')) as `sale_when`,
SUM(price * quantity) as `total_price`,
COUNT(*) as `total_lines`
FROM  demo.orders as o
JOIN demo.customers as c USING (customer_id)
JOIN demo.order_lines as ol USING (order_id)
GROUP BY (customer_id),
 (customer_name),
(DATE_FORMAT(order_date,'%Y%m'))
1 row in set, 1 warning (0.00 sec)</pre>
<p>
<h2>The complete refresh method</h2>
<p>This method is actually quite simple in its operation.  It will completely rebuild a view based on the SQL which defines it, each time the view is refreshed.</p>
<p>This is not much different from building a table with CREATE TABLE .. AS SELECT.  In fact, this is actually part of what the COMPLETE refresh method does during its operation.  It also takes care of atomically* replacing the old contents of the view with the new contents (using RENAME TABLE), which means that the view remains available for querying, even during refresh.</p>
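<p>The build-and-swap pattern can be sketched in plain SQL as follows.  This only illustrates the technique and is not the code Flexviews actually runs.  The table names <em>mv_example</em>, <em>mv_example_new</em> and <em>mv_example_old</em> are hypothetical, and the sketch assumes demo.mv_example already exists from a previous build:</p>
<pre>-- 1. Build the new contents next to the live table.
CREATE TABLE demo.mv_example_new AS
SELECT customer_id, SUM(total_price) AS total_price
  FROM demo.dashboard_customer_sales
 GROUP BY customer_id;

-- 2. Swap both names in a single RENAME TABLE statement.
--    MySQL performs the renames as one atomic operation, so readers
--    see either the old table or the new one, never a missing table.
RENAME TABLE demo.mv_example     TO demo.mv_example_old,
             demo.mv_example_new TO demo.mv_example;

-- 3. Discard the previous contents.
DROP TABLE demo.mv_example_old;</pre>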
<p>A complete refresh type view can use non-deterministic functions like NOW().   In fact any valid SELECT statement can be materialized.  The definition of the MV is stored internally, and represents the actual SQL that defines the view.  A simple API call is made to associate the SQL definition with the view, but the definition itself is a SQL statement, similar to a regular view.  If you choose to use only the complete refresh method, then FlexCDC is not required to use Flexviews.</p>
<p>In the following example, the total sales amounts are calculated (for all time) from a monthly summary.  The monthly summary is actually another MV. This saves a significant amount of time.  If this MV did not access the other MV, then the SQL necessary to compute this list would take over 40 <strong>minutes</strong> to execute, because that is how long it takes to create the monthly summary from scratch.</p>
<p>Here is an example of a complete refresh MV, dashboard_top_customers, which computes the total sales for all customers.  It computes this from the dashboard_customer_sales MV, ordering the customers by total sales in descending order:</p>
<pre>mysql&gt; call flexviews.create(
-&gt; 'demo','dashboard_top_customers','COMPLETE');
Query OK, 1 row affected (0.00 sec)

mysql&gt; call flexviews.set_definition(
-&gt;flexviews.get_id('demo','dashboard_top_customers'),
    -&gt; 'select customer_id,
    '&gt; sum(total_price) total_price,
    '&gt; sum(total_lines) total_lines
    '&gt; from demo.dashboard_customer_sales dsc
    '&gt; group by customer_id
    '&gt; order by total_price desc;
    '&gt; ');
Query OK, 1 row affected (0.00 sec)
</pre>
<p>When the view is &#8216;enabled&#8217;, the contents are created:</p>
<pre>
mysql&gt; call flexviews.enable(
-&gt; flexviews.get_id('demo','dashboard_top_customers'));
Query OK, 0 rows affected (5.73 sec)</pre>
<p>It only took about six seconds to calculate the total sales for all customers from the dashboard_customer_sales table.  If you were paying attention to the incremental example,  the dashboard_customer_sales table just happens to be the view we created above.  </p>
<p>I picked this example, because it is important to understand that you can build complete refresh views on top of incremental ones.  This will allow you to create complete refresh views that refresh much more quickly.</p>
<p>If this had to actually access the orders, order_lines and customers tables directly (the tables on which dashboard_customer_sales is built), then the query would take significantly longer (40+ minutes): </p>
<pre>
mysql> SELECT
-> customer_id as `customer_id`,
-> SUM(price * quantity) as `total_price`,
-> COUNT(*) as `total_lines`
-> FROM  demo.orders as o
-> JOIN demo.customers as c USING (customer_id)
-> JOIN demo.order_lines as ol USING (order_id)
-> GROUP BY (customer_id)
-> ORDER BY total_price desc
-> LIMIT 10;
+-------------+-------------+-------------+
| customer_id | total_price | total_lines |
+-------------+-------------+-------------+
|         689 |      770793 |        3811 |
|        6543 |      754138 |        3740 |
|        5337 |      742034 |        3674 |
|        5825 |      738420 |        3593 |
|        5803 |      733495 |        3670 |
|        1579 |      732507 |        3666 |
|        9316 |      731091 |        3610 |
|        2046 |      722631 |        3531 |
|        6319 |      720100 |        3572 |
|        6019 |      718031 |        3475 |
+-------------+-------------+-------------+

10 rows in set (43m 10.11 sec)</pre>
<p>As an added benefit, if you build more than one complete refresh view from an incrementally refreshable one, then you can keep the complete views transactionally consistent with each other, as long as you refresh all of the complete views after the incremental one.</p>
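<p>A sketch of that refresh order, using the two views from this post and the flexviews.refresh() call shown in the performance comparison below.  The 'BOTH' mode and the NULL transaction id are taken from those examples, and the sketch assumes nothing else refreshes the incremental view between the two calls:</p>
<pre>-- Refresh the incrementally refreshable view first...
CALL flexviews.refresh(
  flexviews.get_id('demo','dashboard_customer_sales'),'BOTH',NULL);

-- ...then rebuild each complete view that reads from it.
CALL flexviews.refresh(
  flexviews.get_id('demo','dashboard_top_customers'),'BOTH',NULL);</pre>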
<p>Once the new MV (dashboard_top_customers) is built, we can use it to list the top 10 customers nearly instantly.  Note that every MV gets a special column `mview$pk` which is an auto_increment BIGINT surrogate key for the table.  For complete refresh views which are ordered with ORDER BY, such as this one, this creates a ranking function automatically.  In cases where this is not useful, simply ignore this column.  It is used to prevent wide InnoDB primary keys on the MV.</p>
<pre>mysql&gt; select mview$pk as rank,
 customer_id,
 total_price,
 total_lines
 from demo.dashboard_top_customers
limit 10;
+------+-------------+-------------+-------------+
| rank | customer_id | total_price | total_lines |
+------+-------------+-------------+-------------+
|    1 |         689 |      770793 |        3811 |
|    2 |        6543 |      754138 |        3740 |
|    3 |        5337 |      742034 |        3674 |
|    4 |        5825 |      738420 |        3593 |
|    5 |        5803 |      733495 |        3670 |
|    6 |        1579 |      732507 |        3666 |
|    7 |        9316 |      731091 |        3610 |
|    8 |        2046 |      722631 |        3531 |
|    9 |        6319 |      720100 |        3572 |
|   10 |        6019 |      718031 |        3475 |
+------+-------------+-------------+-------------+
10 rows in set (0.00 sec)</pre>
<p>You can also use this MV to display the total sales amount for any particular customer.</p>
<p>Notice how you can use this summary to calculate the number of lines in the order_lines table, three orders of magnitude more quickly than COUNT(*).</p>
<pre>mysql&gt; select count(*) cnt from order_lines\G
*************************** 1. row ***************************
cnt: 155187034
1 row in set (32.03 sec)

mysql&gt; select sum(total_lines) cnt from dashboard_top_customers\G
*************************** 1. row ***************************
cnt: 155187034
1 row in set (0.03 sec)</pre>
<p>Notice the difference in response time.</p>
<h2>Refresh method performance comparison</h2>
<p>For demonstration purposes, I did the following:</p>
<ol>
<li>Created one view of each type.  The view is the same as the one specified above.</li>
<li>Both views took about 40 minutes to build.</li>
<li>Deleted the order lines for orders 1 through 100 (484 rows).</li>
<li>Refreshed both views.</li>
<li>Compared the refresh performance and query results.</li>
</ol>
<p>The view type does not usually* affect the time it takes to build the MVs the first time.   Both of these views build in about the same amount of time:</p>
<pre>mysql&gt; call flexviews.enable(
-&gt; flexviews.get_id('demo','complete_example2'));
Query OK, 0 rows affected (42 min 42.14 sec)

mysql&gt; call flexviews.enable(
-&gt; flexviews.get_id('demo','dashboard_customer_sales'));
Query OK, 0 rows affected (41 min 52.04 sec)</pre>
<p><b>*</b>If you use MIN/MAX/COUNT_DISTINCT,  a secondary view will be built in the flexviews schema, transparently, to manage the distinct values for each group in the view.  This will increase the time it takes to build the view, and the refresh will be more expensive.  This is an optimization which is required to efficiently refresh MVs which use those aggregate functions. </p>
<p>Now I delete some line items from orders:</p>
<pre>mysql&gt; delete
-&gt; from order_lines
-&gt; where order_id
-&gt; between 1 and 100
-&gt; limit 500;
Query OK, 484 rows affected (0.27 sec)</pre>
<p>Since the base table has changed, the views now exhibit drift.  There should be 484 fewer lines reflected in the count in our MVs, but they are out of date and must be refreshed:</p>
<pre>mysql&gt; select sum(total_lines) from complete_example2;
+------------------+
| sum(total_lines) |
+------------------+
|        155187034 | &lt;-- too high
+------------------+
1 row in set (0.68 sec)

mysql&gt; select sum(total_lines) from  dashboard_customer_sales ;
+------------------+
| sum(total_lines) |
+------------------+
|        155187034 | &lt;-- too high
+------------------+
1 row in set (0.61 sec)</pre>
<h3>Refreshing the MVs</h3>
<p> The view which uses the complete refresh method takes a very long time to refresh: </p>
<pre>mysql&gt; call flexviews.refresh(
-&gt; flexviews.get_id('demo','complete_example2'),'BOTH',NULL);
Query OK, 0 rows affected (42 min 42.14 sec)</pre>
<p>But the incrementally refreshable version does not take long to refresh: </p>
<pre>mysql&gt; call flexviews.refresh(
-&gt; flexviews.get_id('demo',
-&gt; 'dashboard_customer_sales'),'BOTH',NULL);
Query OK, 0 rows affected (7.01 sec)</pre>
<p>The second one is 365x faster because it examines only the rows that changed. This exceptional ability to look at only what changed, even for MVs with aggregation and joins, is the value proposition for Flexviews. </p>
<h3>The flexviews.refresh() stored procedure</h3>
<p> The first parameter to flexviews.refresh() is the materialized view id. Each view has an identifier which can be obtained with flexviews.get_id(&#8216;schema&#8217;,&#8217;table&#8217;). </p>
<p>The second parameter reflects the refresh type. This parameter can take the options: &#8216;BOTH&#8217;,&#8217;COMPUTE&#8217;,&#8217;APPLY&#8217;, or &#8216;COMPLETE&#8217;. I specified &#8216;BOTH&#8217; for the second parameter in both examples. &#8216;BOTH&#8217; means compute the changes to the view, and also apply them. This is a combination of the &#8216;COMPUTE&#8217; and &#8216;APPLY&#8217; options. When you specify &#8216;COMPUTE&#8217;, Flexviews will compute the row changes for the view, but not apply them. If there are changes which have not been applied, then you can apply those changes by passing the &#8216;APPLY&#8217; option. </p>
<p>In both examples, I pass NULL as the last parameter.   Materialized views which are based on the complete refresh method will always take NULL for this parameter.   For incrementally refreshable views, the last parameter is a transaction id number, which is usually obtained with the flexviews.get_uow_id_from_datetime() function. If you pass a NULL value, then it refreshes the view up to the latest changes which have been collected.  You can use this to refresh multiple incremental refresh views to the same transactional point in time. </p>
<p>Also note that you can &#8216;COMPUTE&#8217; or &#8216;APPLY&#8217; changes to a particular transaction id. Logically, you can not apply changes past the transaction id to which you have computed them. </p>
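<p>For illustration, a two-step refresh to a fixed transaction id might look like the following sketch.  The argument passed to flexviews.get_uow_id_from_datetime() is assumed here to be a DATETIME value; check the manual for the exact signature before relying on it:</p>
<pre>SET @mvid := flexviews.get_id('demo','dashboard_customer_sales');

-- Assumption: the function accepts a DATETIME value and returns
-- the matching transaction id.
SET @uow := flexviews.get_uow_id_from_datetime(NOW());

-- Compute the row changes up to that transaction id...
CALL flexviews.refresh(@mvid,'COMPUTE',@uow);

-- ...and apply them, now or at a later time.
CALL flexviews.refresh(@mvid,'APPLY',@uow);

-- Passing the same @uow when refreshing other incremental views
-- brings them all to the same transactional point in time.</pre>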
<h3>And then confirm they contain the same results</h3>
<pre>mysql&gt; select sum(total_lines) from dashboard_customer_sales ;
+------------------+
| sum(total_lines) |
+------------------+
| 155186550        |
+------------------+
1 row in set (0.64 sec) 

mysql&gt; select sum(total_lines) from complete_example2 ;
+------------------+
| sum(total_lines) |
+------------------+
| 155186550        |
+------------------+
1 row in set (0.68 sec)</pre>
<h2>Conclusion</h2>
<p>Flexviews supports two refresh methods, the complete method and the incremental method.  I think you will agree that the incremental method has significant advantages over the complete method.  In this example the incremental method was over 350x faster than the complete method.</p>
<p>I also showed how you can combine both types of views.  The complete method examples show how to create a complete refresh view which reads from an incrementally refreshable one.  This allows the use of SQL features not available with the incremental method, like ORDER BY or NOW(), but it still provides improved performance during refresh by accessing summarized data.</p>
<p>I hope this helps you understand how Flexviews can help you ensure fast response times in your application by making access to summary data efficient.</p>
<p>View full post on <a href="http://www.mysqlperformanceblog.com/2011/04/04/flexviews-part-3-improving-query-performance-using-materialized-views/">MySQL Performance Blog</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.weez.com/2011/04/flexviews-%e2%80%93-part-3-%e2%80%93-improving-query-performance-using-materialized-views/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using Flexviews – part two, change data capture</title>
		<link>http://www.weez.com/2011/03/using-flexviews-%e2%80%93-part-two-change-data-capture/</link>
		<comments>http://www.weez.com/2011/03/using-flexviews-%e2%80%93-part-two-change-data-capture/#comments</comments>
		<pubDate>Fri, 25 Mar 2011 23:50:13 +0000</pubDate>
		<dc:creator>Abidoon</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[capture]]></category>
		<category><![CDATA[change]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[Flexviews]]></category>
		<category><![CDATA[Part]]></category>
		<category><![CDATA[using]]></category>

		<guid isPermaLink="false">http://www.weez.com/2011/03/using-flexviews-%e2%80%93-part-two-change-data-capture/</guid>
		<description><![CDATA[In my previous post I introduced materialized view concepts. As a reminder, the first post covered the following topics: What is a materialized view(MV)? It explained that an MV can pre-compute joins and may aggregate and summarize data. Using the aggregated data can significantly improve query response times compared to accessing the non-aggregated data. Keeping [...]]]></description>
			<content:encoded><![CDATA[<p>In my <a href="http://www.mysqlperformanceblog.com/2011/03/23/using-flexviews-part-one-introduction-to-materialized-views/">previous post</a> I introduced materialized view concepts.   </p>
<p>As a reminder, the first post covered the following topics:</p>
<ol>
<li>What is a materialized view (MV)?</li>
<li>It explained that an MV can pre-compute joins and may aggregate and summarize data.</li>
<li>Using the aggregated data can significantly improve query response times compared to accessing the non-aggregated data.</li>
<li>Keeping MVs up-to-date (refreshing) is usually expensive.</li>
<li>A change data capture tool can be used to implement an efficient way of refreshing them.</li>
</ol>
<p>This post begins with an introduction to change data capture technology and describes some of the ways in which it can be leveraged for your benefit.  This is followed by a description of FlexCDC, the change data capture tool included with Flexviews.  It continues with an overview of how to install and run FlexCDC, and concludes with a demonstration of the utility.</p>
<p>
<h1>What is Change Data Capture (CDC)?</h1>
<p>As the name implies, CDC software captures the changes made to database rows and makes those changes available in a convenient form which can be accessed by other programs.  CDC applications exist for many commercial databases but until recently this type of software was not available for MySQL.</p>
<p>Change Data Capture can be used to:</p>
<ol>
<li>Monitor a database table, or tables for changes.</li>
<li>Improve ETL performance by identifying the data which has changed.</li>
<li>Maintain materialized views with Flexviews (the primary purpose of FlexCDC).</li>
<li>Feed search engines like Sphinx or Solr only the rows that change.</li>
<li>Feed third party replication systems.</li>
<li>Provide data to &#8220;external triggers&#8221; such as Gearman jobs.</li>
</ol>
<p><span id="more-5293"></span></p>
<p>CDC  tools usually operate in one of the following ways:</p>
<ol>
<li>Timestamps (usually more than one) to identify rows that have changed</li>
<li>Triggers to capture changes synchronously</li>
<li>Database log reading to capture the changes asynchronously</li>
</ol>
<p>The first method has serious drawbacks: it can&#8217;t identify deleted rows, and MySQL timestamps may not be flexible enough.</p>
<p>The trigger method has a lot of problems.  Triggers add significant overhead.  When the structure of a table is changed, the triggers must be changed. The work in the trigger is immediate and affects every transaction.  Finally, MySQL has limited trigger support, which is the cause of some of the aforementioned problems.  The biggest problem, at least from the standpoint of how Flexviews works, is that triggers cannot, under normal conditions, detect the commit order of transactions.  This above all makes triggers an unacceptable CDC method.</p>
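<p>To make the first drawback concrete, here is a sketch of timestamp-based polling.  The <em>last_modified</em> column and the @last_poll variable are hypothetical and not part of any schema used in this series:</p>
<pre>-- Assumed column, kept current by MySQL on every INSERT and UPDATE.
ALTER TABLE demo.orders
  ADD COLUMN last_modified TIMESTAMP NOT NULL
      DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;

-- Poll for rows that were inserted or updated since the last check.
SET @last_poll := NOW() - INTERVAL 1 MINUTE;
SELECT * FROM demo.orders WHERE last_modified &gt; @last_poll;

-- A deleted row no longer exists, so no poll will ever report it.</pre>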
<p>This leaves the third method, log based capture as the best option because it imposes less overhead than triggers and change data capture may be done remotely and asynchronously.</p>
<p>
<h1>Binary log based CDC</h1>
<p>The CDC tool included with Flexviews is called FlexCDC.  It seemed like an appropriate name.  The <a href="http://dev.mysql.com/doc/refman/5.5/en/binary-log.html">Binary Log</a> is the MySQL log which records changes to tables in the database.  FlexCDC reads the binary log to determine what rows have changed.  Because of this, FlexCDC requires that you use row-based binary logs (RBR).  If you don&#8217;t have MySQL 5.1 or aren&#8217;t using RBR, then it is possible to set up a dedicated MySQL slave which has log_slave_updates=1 and binlog_format=row to process SBR changes from a MySQL master.  I&#8217;ll talk more about that in another blog post.</p>
<p>FlexCDC does not implement a full binary log parser.  Instead, it invokes the &#8216;mysqlbinlog&#8217; utility and it processes the predictable output of this program.  mysqlbinlog will always be able to read the binary logs of the version of mysql it ships with (and previous versions) so there is no worry about binary log format changes.   FlexCDC is written in PHP so it is portable.</p>
<p>
<h1>Setting up FlexCDC</h1>
<p><strong>FlexCDC has some basic requirements:</strong></p>
<ul>
<li>MySQL 5.1+</li>
<li>row based logging (binlog_format=ROW)</li>
<li>unique server_id in the my.cnf</li>
<li>log_slave_updates=1, if this is a MySQL slave</li>
<li>transaction_isolation=READ-COMMITTED</li>
</ul>
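<p>One way to check these requirements before going further is to ask the server directly.  This is only a sketch: in MySQL 5.1 and 5.5 the isolation level shows up as the tx_isolation variable, while newer servers call it transaction_isolation, so both names are listed:</p>
<pre>-- Show the settings FlexCDC depends on
SHOW GLOBAL VARIABLES
 WHERE Variable_name IN
       ('log_bin', 'binlog_format', 'server_id',
        'log_slave_updates', 'tx_isolation', 'transaction_isolation');</pre>
<p>You would expect log_bin to be ON, binlog_format to be ROW and server_id to be a non-zero value that no other server in the replication topology uses.</p>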
<p>You can get FlexCDC directly out of the Flexviews SVN.  I suggest that you just grab all of Flexviews:</p>
<pre>$ svn checkout http://flexviews.googlecode.com/svn/trunk/ flexviews</pre>
<p>Next you have to customize the example ini file.  FlexCDC is located in the flexviews/consumer/ subdirectory.</p>
<p><strong>Create the settings file:</strong></p>
<p>Change to the flexviews/consumer directory and copy the consumer.ini.example file to consumer.ini and edit it, making appropriate changes.  The file is well commented.  The example settings file should work for most MySQL installations which allow connections as root with no password from localhost.  It is possible to read from and/or write to remote servers, but this example uses the local machine which is the usual configuration for Flexviews since it requires local access to the tables and the changelogs in order to maintain materialized views.  Most database servers have some spare CPU for binary log parsing.</p>
<p><strong>Run the setup script:</strong></p>
<p>This will create the metadata tables and capture the initial binary log position.</p>
<pre>$ php ./setup_flexcdc.php --ini=consumer.ini
setup starting
setup completed
</pre>
<p>If the setup detects any problems (such as binary logging not being enabled) it will exit with an error.  Otherwise it prints &#8220;setup completed&#8221; and exits.</p>
<p>
<h1>Verify installation</h1>
<p><strong>FlexCDC stores its progress in a metadata table:</strong></p>
<pre>$ mysql -e 'select * from flexviews.binlog_consumer_status\G' -uroot
*************************** 1. row ***************************
server_id: 999
master_log_file: binary_log.000001
master_log_size: 214652
exec_master_log_pos: 214652
</pre>
<p>If you select from that table you won&#8217;t see anything changing, even if you are writing into your database.  This isn&#8217;t anything to worry about, since the background process isn&#8217;t running yet.</p>
<p><strong>Starting the background process:</strong></p>
<p>FlexCDC includes a consumer_safe.sh script that will start up a copy of FlexCDC and restart it if it exits with an error.  You can shut down FlexCDC by sending it a HUP signal.  The script will drop a .pid file so you know which process to HUP.</p>
<pre>$ ./consumer_safe.sh --ini=consumer.ini &amp;
[1] 959</pre>
<pre>$ ps
PID TTY          TIME CMD
959 pts/1    00:00:00 consumer_safe.sh
960 pts/1    00:00:00 php
967 pts/1    00:00:00 ps
6248 pts/1    00:00:00 bash</pre>
<pre>$ cat flexcdc.pid
960</pre>
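<p>With the consumer running, the position in the metadata table should start to move.  A simple way to watch it, sketched here on the assumption that exec_master_log_pos tracks the same byte offsets that the server reports, is to compare the two:</p>
<pre>-- Current write position of the server's binary log
SHOW MASTER STATUS;

-- Position FlexCDC has processed so far (columns as shown above)
SELECT master_log_file, exec_master_log_pos
  FROM flexviews.binlog_consumer_status;</pre>
<p>If the assumption holds, the two positions should converge whenever the consumer has caught up with the write load.</p>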
<p>
<h1>Adding a changelog to a table</h1>
<p>FlexCDC copies the contents of database rows which change into special tables called changelogs.  Each changelog is located in the flexviews database and is named $SCHEMA_$TABLE where $SCHEMA is the schema in which the source table is located and $TABLE the name of the source table.  If that is confusing it should be clear in a moment.</p>
<p>Let&#8217;s create a table and insert a row.  After that we will add a changelog, insert some more rows and then delete one:</p>
<pre>mysql&gt; create table
    -&gt; test.demo (
    -&gt; c1 int auto_increment primary key,
    -&gt; c2 int
    -&gt; )
    -&gt; engine=innodb;
Query OK, 0 rows affected (0.00 sec)

mysql&gt; insert into test.demo values (1,1);
Query OK, 1 row affected (0.00 sec)</pre>
<p>Even though FlexCDC is running in the background, it didn&#8217;t capture any changes from that insert.  We need to add the table to the list of tables to changelog.  There is a utility included with FlexCDC called &#8216;add_table.php&#8217;.  This script automates the process of adding a table to the list of tables to changelog.  It does this by adding an entry to the `flexviews`.`mvlogs` metadata table, and it creates the changelog table itself.</p>
<pre>$ php add_table.php --schema=test --table=demo
success</pre>
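<p>To see what the script did, you can look at the two objects mentioned above.  The exact columns of the metadata table are not described here, so this sketch simply selects everything:</p>
<pre>-- The metadata table that add_table.php adds an entry to
SELECT * FROM flexviews.mvlogs\G

-- The new changelog table, named $SCHEMA_$TABLE
SHOW CREATE TABLE flexviews.test_demo\G</pre>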
<p>Note that you can enable auto_changelog=true in the config file to automatically record changes for any table, starting from the first time a change is seen for that table.  This is generally only useful if you have a small number of tables, and you want to track changes on all of them.  </p>
<p>You may have also noted that I did not include --ini=consumer.ini.  This is because consumer.ini is the default config filename to search for.  I included it in the earlier examples for illustration purposes.</p>
<p>
<h1>Examine the changes</h1>
<p><strong>Now that the changelog has been added, any changes to `test`.`demo` will automatically be captured.</strong></p>
<p>Insert data in one transaction (two rows):</p>
<pre>mysql&gt; insert into test.demo values (NULL,2),(NULL,3);
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0</pre>
<p>And delete data in a second transaction:</p>
<pre>mysql&gt; delete from test.demo where c1=1;
Query OK, 1 row affected (0.00 sec)</pre>
<p><strong>The changelog is flexviews.test_demo.  This is because the source table is `test`.`demo`.</strong></p>
<pre>mysql&gt; select * from flexviews.test_demo\G
*************************** 1. row ***************************
dml_type: 1
uow_id: 10
fv$server_id: 999
c1: 2
c2: 2
*************************** 2. row ***************************
dml_type: 1
uow_id: 10
fv$server_id: 999
c1: 3
c2: 3
*************************** 3. row ***************************
dml_type: -1
uow_id: 11
fv$server_id: 999
c1: 1
c2: 1
3 rows in set (0.00 sec)</pre>
<p>As you can see, there are three rows in the changelog, each representing one of the changes we made.</p>
<p>You will notice that the source table only has two columns, but the changelog contains five.  All change logs contain three additional metadata columns: dml_type, uow_id and fv$server_id.  These columns represent the type of change (insert is 1, delete -1), the transaction order and the source of the changes, respectively.</p>
<p>Finally, note that the two insertions happened inside of the same transaction, and that the insertions happened before the deletion.  Though none are shown here, updates would be represented by a deletion followed by an insertion.</p>
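<p>Because dml_type is a signed value, the changelog can be summarized with ordinary aggregate queries.  The following is an illustrative sketch against the three changelog rows shown above, not part of FlexCDC itself:</p>
<pre>-- Summarize the changelog per transaction, in transaction order
SELECT uow_id,
       SUM(dml_type = 1)  AS rows_inserted,
       SUM(dml_type = -1) AS rows_deleted
  FROM flexviews.test_demo
 GROUP BY uow_id
 ORDER BY uow_id;

-- Net effect of the captured changes on COUNT(*) and SUM(c2)
SELECT SUM(dml_type)      AS delta_count,
       SUM(dml_type * c2) AS delta_sum_c2
  FROM flexviews.test_demo;</pre>
<p>For the rows above, the first query reports two inserts for uow_id 10 and one delete for uow_id 11.  The second query gives a delta_count of 1 and a delta_sum_c2 of 4, which matches the source table going from one row with c2=1 to two rows with c2 values 2 and 3.</p>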
<p>View full post on <a href="http://www.mysqlperformanceblog.com/2011/03/25/using-flexviews-part-two-change-data-capture/">MySQL Performance Blog</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.weez.com/2011/03/using-flexviews-%e2%80%93-part-two-change-data-capture/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using Flexviews – part one, introduction to materialized views</title>
		<link>http://www.weez.com/2011/03/using-flexviews-%e2%80%93-part-one-introduction-to-materialized-views/</link>
		<comments>http://www.weez.com/2011/03/using-flexviews-%e2%80%93-part-one-introduction-to-materialized-views/#comments</comments>
		<pubDate>Thu, 24 Mar 2011 05:02:19 +0000</pubDate>
		<dc:creator>Abidoon</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Flexviews]]></category>
		<category><![CDATA[Introduction]]></category>
		<category><![CDATA[materialized]]></category>
		<category><![CDATA[Part]]></category>
		<category><![CDATA[using]]></category>
		<category><![CDATA[views]]></category>

		<guid isPermaLink="false">http://www.weez.com/2011/03/using-flexviews-%e2%80%93-part-one-introduction-to-materialized-views/</guid>
		<description><![CDATA[If you know me, then you probably have heard of Flexviews. If not, then it might not be familiar to you. I&#8217;m giving a talk on it at the MySQL 2011 CE, and I figured I should blog about it before then. For those unfamiliar, Flexviews enables you to create and maintain incrementally refreshable materialized [...]]]></description>
			<content:encoded><![CDATA[<p>If you know me, then you probably have heard of <a href="http://Flexvie.ws">Flexviews</a>.  If not, then it might not be familiar to you.  I&#8217;m giving a <a href="http://en.oreilly.com/mysql2011/public/schedule/detail/17146">talk on it at the MySQL 2011 CE</a>, and I figured I should blog about it before then.  For those unfamiliar, Flexviews enables you to create and maintain incrementally refreshable materialized views.</p>
<p>You might be asking yourself &#8220;what is an incrementally refreshable materialized view?&#8221;.  If so, then keep reading.  This is the first in a multi-part series describing Flexviews.</p>
<p><span id="more-5284"></span><br />
<strong>The output of any SQL query is logically an in-memory table</strong><br />
The output of a SQL query, which we normally think of as a <em>result set</em>, is really a virtual table.  It has columns and rows, just like a database table, but it is temporary in nature, usually existing only in memory for a short time.  This concept comes from relational algebra, upon which SQL is built.  All SQL can be broken down into relational algebra, which is convenient because that means there are lots of transformations that can be done to it, all without changing the meaning or the output.  The output of every relational algebra expression is a table, so conceptually the output of a SQL statement is a table too.</p>
<p>MySQL even includes a SQL statement that makes this perfectly clear: CREATE TABLE .. AS SELECT (CTAS).  The results of the SELECT portion of the statement are stored in a table.  Storing the results of a SQL statement into a table (even a temporary table)  is called <em>materializing</em> the results.</p>
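<p>For example, a CTAS statement might look like the sketch below.  The source table test.sales, with columns customer_id and amount, is hypothetical:</p>
<pre>-- Materialize the result of an aggregate query into a real table
CREATE TABLE test.sales_by_customer AS
SELECT customer_id,
       COUNT(*)    AS order_count,
       SUM(amount) AS total_amount
  FROM test.sales
 GROUP BY customer_id;</pre>
<p>The new table holds the result as it was when the statement ran.  Later changes to test.sales are not reflected in it.</p>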
<p><strong><em>Views </em>are similar to regular SQL statements. </strong><br />
A <em>view </em>is a SQL statement which acts like a table. When you execute a query on a view, the result set is ephemeral, generated at run-time for consumption and then immediately discarded.</p>
<p><em>Views </em>are not generally considered to be a performance optimization because:</p>
<ul>
<li>The contents of the result set are computed <em>each time the view is accessed</em>.</li>
<li>If multiple statements access the same view repeatedly or concurrently, then this computation is likely to be expensive</li>
<li>If the view accesses large quantities of data, then the computation is likely expensive.</li>
<li>Views containing grouping, aggregation, sorting, distinct or other conditions must be fully computed and stored in a temporary table before they can be accessed, which is very expensive.</li>
</ul>
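<p>As a small sketch of the last point, consider a view over a hypothetical table test.sales with columns customer_id and amount:</p>
<pre>-- The view stores only the query text, not its result
CREATE VIEW test.v_sales_by_customer AS
SELECT customer_id,
       COUNT(*)    AS order_count,
       SUM(amount) AS total_amount
  FROM test.sales
 GROUP BY customer_id;

-- Each access re-runs the aggregation over test.sales,
-- even though only one customer is requested
SELECT * FROM test.v_sales_by_customer WHERE customer_id = 42;</pre>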
<p><strong>What is a materialized view (MV)?</strong><br />
A <em>materialized view</em> is similar to a regular <em>view</em>, in that it represents the result set of a query, but the contents are stored (<em>materialized!</em>) as a real table.   So a MV is similar to a table created with the CTAS command described above.  This similarity is fairly superficial though.  While both store results in a table,  the MV represents the results of a SQL statement at a specific point in time, essentially a <em>consistent snapshot </em>of the query result.   It is not possible to create multiple different tables via CTAS and have them all be transactionally consistent with one another, unless you stop all database write activity.</p>
<p><em>Materialized views </em>can be used to enhance performance by acting as a cache.  Further, the cost of a cache miss is lower because incremental refresh is faster than recomputing the contents from scratch.</p>
<ul>
<li>The contents of the result set are updated periodically, not each time the view is accessed.</li>
<li>If multiple statements access the same view repeatedly or concurrently it is not likely to be very expensive.</li>
<li>If the view is large, accessing the table will be considerably cheaper</li>
<li>You can add indexes to the MV.</li>
<li>Since the data is already joined together and pre-aggregated, CPU and memory usage may be reduced compared to computing the results.</li>
<li>You can often generate multiple different blocks of content from the summarized data in one or more views</li>
</ul>
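<p>Since the MV is a real table, indexing it works like indexing any other table.  A sketch, assuming a hypothetical materialized table test.sales_by_customer that holds one row per customer_id:</p>
<pre>-- Index the stored result so that lookups no longer scan or aggregate
ALTER TABLE test.sales_by_customer ADD PRIMARY KEY (customer_id);

-- A single-row lookup against precomputed totals
SELECT order_count, total_amount
  FROM test.sales_by_customer
 WHERE customer_id = 42;</pre>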
<p><strong>Materialized views must be <em>refreshed</em></strong>.<br />
Refreshing a MV brings it up to date to reflect the changes in the database since the view was either first created or last refreshed, whichever is more recent.  More importantly, a MV can be refreshed <em>to a specific point in time</em>, not just &#8220;now&#8221;.  This means that you can maintain multiple MVs and keep them synced to the same point in time.   There are two different methods by which a MV can be refreshed.</p>
<p>The first is the <strong><em>complete</em></strong> refresh method, which rebuilds the entire contents of the view from scratch.  This is the less desirable method:</p>
<ul>
<li>During a complete refresh, the view contents are completely recalculated, which could be very expensive</li>
<li>On some databases, the contents of the view are not available during complete refresh (not true of Flexviews)</li>
<li>During refresh the view may occupy twice as much space (similar to ALTER TABLE)</li>
<li>Supports all SQL syntax (like outer join) but can&#8217;t be refreshed to a specific point in time.</li>
<li>Performs no better than CTAS, but gives a convenient method of creating and refreshing the materialized results</li>
</ul>
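<p>A complete refresh can be approximated by hand with a rebuild followed by a swap.  This is a generic sketch using hypothetical tables test.sales and test.sales_by_customer, and it is not necessarily how Flexviews implements the feature:</p>
<pre>-- Rebuild the full result next to the existing copy (uses extra space)
CREATE TABLE test.sales_by_customer_new AS
SELECT customer_id,
       COUNT(*)    AS order_count,
       SUM(amount) AS total_amount
  FROM test.sales
 GROUP BY customer_id;

-- Swap the tables in one atomic step, so readers always see a complete copy
RENAME TABLE test.sales_by_customer     TO test.sales_by_customer_old,
             test.sales_by_customer_new TO test.sales_by_customer;

DROP TABLE test.sales_by_customer_old;</pre>
<p>Note that CTAS does not copy indexes, so any indexes on the old table would have to be added to the new one before the swap.</p>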
<p>The second refresh method is <strong><em>incremental </em></strong>refresh.  This method updates the view rather than rebuilding it.  It usually examines only the rows which have changed since the view was last refreshed.</p>
<p><em>Incremental </em>refresh has obvious benefits which include:</p>
<ul>
<li>Refreshing large views is orders of magnitude faster than complete refresh.</li>
<li>When updating the view, only a subset of the database rows must be examined</li>
<li>The rows examined are related to the rows which have changed since the last refresh</li>
<li>The view can be refreshed forward to a specific transactional point in time</li>
<li>Multiple views can be rolled forward to the exact same consistent point in time, with respect to committed transactions</li>
<li>The processing can be done on a slave dedicated to such work.</li>
</ul>
<p>And some drawbacks:</p>
<ul>
<li>Not all SQL syntax is supported (no outer join, no non-deterministic functions, etc.).</li>
<li>There is overhead for change-data capture.</li>
<li>Some extra storage is used for the changes and deltas.</li>
<li>Creating an MV is not as simple as I&#8217;d like.</li>
</ul>
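<p>To give a feel for the idea, here is a hand-written sketch of an incremental refresh for a simple aggregate.  It is not Flexviews&#8217; actual implementation.  It assumes a hypothetical changelog table flexviews.test_sales in which each captured row carries a signed dml_type (1 for insert, -1 for delete) and a uow_id that increases in transaction order, plus a hypothetical summary table test.sales_by_customer with a primary key on customer_id.  The @last_uow and @target_uow bookmarks are placeholders:</p>
<pre>-- Last transaction already applied, and the point in time to refresh to
SET @last_uow   = 10;
SET @target_uow = 20;

-- Fold the signed changes from that window into the stored aggregates
INSERT INTO test.sales_by_customer (customer_id, order_count, total_amount)
SELECT customer_id,
       SUM(dml_type),
       SUM(dml_type * amount)
  FROM flexviews.test_sales
 WHERE uow_id &gt; @last_uow
   AND uow_id &lt;= @target_uow
 GROUP BY customer_id
ON DUPLICATE KEY UPDATE
       order_count  = order_count  + VALUES(order_count),
       total_amount = total_amount + VALUES(total_amount);

-- Customers whose last order was deleted no longer belong in the result
DELETE FROM test.sales_by_customer WHERE order_count = 0;</pre>
<p>Only the changelog rows in the chosen window are read, which is why the cost depends on the amount of change rather than on the size of the base table.  Using the same @target_uow for several views is what would keep them consistent with one another.</p>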
<p><strong>Incremental refresh capabilities imply that Flexviews has some way to capture the changes that happen in the database and then apply those changes to the views. </strong></p>
<p>These capabilities break down into three main categories which will be blogged about in subsequent posts:</p>
<ul>
<li>Change Data Capture &#8211; How Flexviews figures out what changed in the database</li>
<li>Delta computation &#8211; How it uses those changes to compute the differences between the old results and the new results</li>
<li>Delta application &#8211; How Flexviews uses the deltas to actually make the changes</li>
</ul>
<p>Finally, the last blog post will describe how to use Flexviews for online schema change or as an ELT tool, and it will cover how to create the materialized views which Flexviews manages.</p>
<p>View full post on <a href="http://www.mysqlperformanceblog.com/2011/03/23/using-flexviews-part-one-introduction-to-materialized-views/">MySQL Performance Blog</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.weez.com/2011/03/using-flexviews-%e2%80%93-part-one-introduction-to-materialized-views/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

