David Shrewsbury: Drizzle Transaction Message Limit
Posted on 06 September 2010 by Abidoon
Some changes I made were recently pushed to Drizzle trunk that affect the size of the Transaction protobuf message that any replication stream will see (e.g., the transaction log). They were necessary to fix bug 600795.
Without a Transaction message size limit, any bulk operation, such as LOAD DATA, would have produced a Transaction message that could contain a very large Statement message holding all of the INSERT data for the bulk load. The Drizzle kernel, when it can, keeps appending the values to INSERT onto the same record, so the Statement could eat up a large amount of memory if we let it grow without bounds.
To circumvent this, we now allow multiple Transaction records for a single database transaction. All Transaction GPB messages representing a single database transaction have the same transaction ID, and only the last Transaction message has the Statement’s end_segment attribute set to true.
Here is an example of this change that you might now see in the transaction log:
transaction_context {
  server_id: 1
  transaction_id: 3
  start_timestamp: 1283118092815781
  end_timestamp: 1283118092815869
}
statement {
  type: INSERT
  start_timestamp: 1283118092815782
  end_timestamp: 1283118092815868
  insert_header {
    table_metadata {
      schema_name: "test"
      table_name: "t"
    }
    field_metadata {
      type: INTEGER
      name: "id"
    }
    field_metadata {
      type: VARCHAR
      name: "a"
    }
  }
  insert_data {
    segment_id: 1
    end_segment: false
    record {
      insert_value: "2"
      insert_value: "abc"
      is_null: false
      is_null: false
    }
    record {
      insert_value: "3"
      insert_value: "def"
      is_null: false
      is_null: false
    }
  }
}
This example is a bit contrived as there is no need to split up such a small transaction, but you can see the basic changes here. We have two Transaction messages, both with the same transaction ID. You can see that the Statement’s end_segment is set to false in the first message, while the Statement within the second Transaction message has end_segment set to true.
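To make the splitting idea concrete, here is a rough Python sketch of how a stream of INSERT records could be cut into size-limited segments. It is not Drizzle's actual code: the `Segment` class, the `split_into_segments` function, the `MAX_SEGMENT_BYTES` constant, the byte-counting heuristic, and the assumption that `segment_id` increases by one per message are all hypothetical, chosen only to mirror the fields visible in the log above.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List

# Hypothetical constant mirroring a 1M default; not a real Drizzle setting.
MAX_SEGMENT_BYTES = 1024 * 1024


@dataclass
class Segment:
    """Stand-in for one Transaction message carrying part of a bulk INSERT."""
    transaction_id: int
    segment_id: int
    end_segment: bool
    records: List[List[str]] = field(default_factory=list)


def split_into_segments(
    transaction_id: int,
    records: Iterable[List[str]],
    max_bytes: int = MAX_SEGMENT_BYTES,
) -> Iterator[Segment]:
    """Yield Segments whose approximate payload size stays within max_bytes."""
    current: List[List[str]] = []
    current_size = 0
    segment_id = 1  # assumed to count up from 1, one per message
    for record in records:
        # Approximate a record's size as the UTF-8 length of its values.
        record_size = sum(len(value.encode("utf-8")) for value in record)
        # Close the current segment when this record would push it over the limit.
        if current and current_size + record_size > max_bytes:
            yield Segment(transaction_id, segment_id, False, current)
            segment_id += 1
            current, current_size = [], 0
        current.append(record)
        current_size += record_size
    # Only the final segment is flagged with end_segment=True.
    yield Segment(transaction_id, segment_id, True, current)


rows = [["2", "abc"], ["3", "def"], ["4", "ghi"]]
segments = list(split_into_segments(3, rows, max_bytes=8))
```

With the tiny 8-byte limit used here, the three rows end up in two segments that share transaction ID 3, and only the second one has `end_segment` set.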
So, in case it isn’t obvious, there are now two ways to determine when you should commit if you are a replication stream TransactionApplier, or if you are reading from the transaction log:
- If the transaction ID changes, COMMIT.
- Or, if the current Transaction has all Statement messages with end_segment set to true, COMMIT.
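The two rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: `Transaction`, `Statement`, and `commit_points` are hypothetical stand-ins for the real protobuf messages and for whatever a TransactionApplier actually does, and they carry only the two fields the rules depend on.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class Statement:
    """Hypothetical stand-in: only the end_segment flag matters here."""
    end_segment: bool


@dataclass
class Transaction:
    """Hypothetical stand-in for one Transaction message in the stream."""
    transaction_id: int
    statements: List[Statement]


def commit_points(messages: Iterable[Transaction]) -> Iterator[int]:
    """Yield a transaction ID each time that transaction should be committed."""
    open_id: Optional[int] = None  # transaction seen but not yet committed
    for message in messages:
        # Rule 1: the transaction ID changed, so the previous one is complete.
        if open_id is not None and message.transaction_id != open_id:
            yield open_id
        open_id = message.transaction_id
        # Rule 2: every Statement in this message has end_segment set to true.
        if all(s.end_segment for s in message.statements):
            yield open_id
            open_id = None
    # A transaction still open when the stream ends is left uncommitted here.


stream = [
    Transaction(3, [Statement(False)]),
    Transaction(3, [Statement(True)]),
    Transaction(4, [Statement(False)]),
    Transaction(5, [Statement(True)]),
]
commits = list(commit_points(stream))
```

Resetting `open_id` after the second rule fires keeps the first rule from committing the same transaction again when the next message arrives with a new ID.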
As for the size of the limit, the Protocol Buffers documentation notes that Protocol Buffers are not designed to handle large messages. As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy.
So 1M seemed to be a reasonable default. I’ll change this in the near future to be a configurable value once we get some changes to our sys var stuff merged.
