4 Nov 2009

Does ActiveMQ 5.3 broker get hung if there are millions of messages on a queue and direct JDBC is used?


That was an interesting ActiveMQ issue I had to work on last week.
We used ActiveMQ 5.3.0.3-fuse with direct JDBC persistence to a MySQL database.

And noticed that when we had 2 million persistent messages on queue A, we could not even process any messages on an empty queue B. It seemed the broker got hung but then it did react to any commands from the web or JMX console.

When looking into the cause of this more deeply we found this:

In ActiveMQ there is a cleanup task thread that periodically checks for expired or acknowledged messages. In case of using a direct JDBC persistence, this task runs the following SQL command against the JDBC database table where the JMS messages are stored:


DELETE FROM ACTIVEMQ_MSGS WHERE ( EXPIRATION<>0 AND EXPIRATIONOR ID <=
( SELECT min(ACTIVEMQ_ACKS.LAST_ACKED_ID) FROM ACTIVEMQ_ACKS WHERE
ACTIVEMQ_ACKS.CONTAINER=ACTIVEMQ_MSGS.CONTAINER)


Depending on the JDBC database used, that SQL statement might lock the entire database table during the run of that statement. This is certainly the case when using MySQL but not when using Apache Derby. Other JDBC databases might lock the entire table as well.

During that time no other persistent message (no matter for what queue) can be processed, as all JMS messages for all queues are stored in the same database table called activemq_msgs. For a database table with millions of messages, this might take several mins or more (I had to wait around 15 mins for that statement to complete).
Also, the cleanup task is scheduled to run at a fixed rate (every 2 mins or so), so once it completed it is already late on schedule and therefore gets kicked off again straight away, not leaving much time for other threads to insert new messages into the db. So the threads managing the other queues starve and might perhaps never get the processing time to process their messages.

This behavior can be confirmed by adding the following logging configuration


log4j.logger.org.apache.activemq.store.jdbc=DEBUG


It will log the above SQL statement being executed for a long time and being called repeatedly as soon as it has finished.
Also when attaching jconsole to the broker and capturing a stack trace of the thread that is to process new messages, it should be waiting for a long time on the JDBC database driver to complete the SQL insert statement.


Name: QueueThread:queue://JMSLoadTest.queue.0
State: RUNNABLE
Total blocked: 1 Total waited: 0

Stack trace:
java.net.SocketInputStream.socketRead0(Native Method)
java.net.SocketInputStream.read(SocketInputStream.java:129)
com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:113)
com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:160)
com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:188)
com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2428)
com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2882)
com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2871)
com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3414)
com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1936)
com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2060)
com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2542)
com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1734)
com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1876)
org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:91)
org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:91)
org.apache.activemq.store.jdbc.adapter.DefaultJDBCAdapter.doRecoverNextMessages(DefaultJDBCAdapter.java:709)
org.apache.activemq.store.jdbc.JDBCMessageStore.recoverNextMessages(JDBCMessageStore.java:230)
org.apache.activemq.store.ProxyMessageStore.recoverNextMessages(ProxyMessageStore.java:83)
org.apache.activemq.broker.region.cursors.QueueStorePrefetch.doFillBatch(QueueStorePrefetch.java:75)
org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:210)
org.apache.activemq.broker.region.cursors.AbstractStoreCursor.hasNext(AbstractStoreCursor.java:119)
org.apache.activemq.broker.region.cursors.StoreQueueCursor.hasNext(StoreQueueCursor.java:131)
org.apache.activemq.broker.region.Queue.doPageIn(Queue.java:1243)
org.apache.activemq.broker.region.Queue.pageInMessages(Queue.java:1378)
org.apache.activemq.broker.region.Queue.iterate(Queue.java:1086)
org.apache.activemq.thread.DeterministicTaskRunner.runTask(DeterministicTaskRunner.java:84)
org.apache.activemq.thread.DeterministicTaskRunner$1.run(DeterministicTaskRunner.java:41)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
java.lang.Thread.run(Thread.java:595)


The broker itself is not hung but the cleanup task locks the entire JDBC database. If the broker is left running for a longer time, it will process new messages, but only a few within a longer time frame.
I raised MB-558 (mirrored in AMQ-2470) for this problem and Gary Tully fixed it by changing the clean up task from a running at a fixed rate to running at a fixed delay.

2 comments:

vhold said...

Interesting, are you using InnoDB or MyISAM for that particular table?

Torsten Mielke said...

MySQL was run using InnoDB.