I recently investigated an issue in our PO system and learnt few useful things. I thought it will be useful for others to share it here:
In our PO system, every morning, we will open NetWeaver Administrator and go to “Message Monitor”. There we could see messages in Error and Successful status. When clicking on Successful messages, it will display the “Message List” with all the messages. When clicking on one of the messages, it was supposed open the message immediately, but it took 3 to 4 mins to open. In some cases, it took more than 5mins.
The initial checks of the system didn’t reveal any obvious issues. All the flags were green, and system was running fine in every other aspect. There has been no delays or failures anywhere else in the system. So, we logged an Incident with SAP.
- Firstly, we were asked to check if tables like BC_MSG, BC_MSG_LOG, BC_MSG_VERSION, BC_MSG_VERSION is taking up a lot of space. We checked the count(*) of these tables and they were not big. Some tables were even empty.
- Next, I was asked to collect Thread Dumps, which I did by going to NWA > Troubleshooting > Advanced > Thread Dump Analysis. While simulating the issue, I clicked on “Generate Thread Dump”. According to the dumps, the slowness was happening in getting message audit logs. So we were asked to check the count(*) on BC_MSG_AUDIT table. It had 0 rows.
- Next, I was asked to use the HangMan Tool as mentioned in the Note 948633. I ran it as instructed while simulating the issue and collected logs. These logs didn’t reveal the cause of the issue.
- Next, SAP assigned the incident to the Database Expert, who looked at the database latency values. The Read Latency should be a maximum of 20 msecs but we had 140 msecs. The Write Latency should be a maximum of 10 msecs but we had 17 msecs. We found that increasing the Filesystem storage size (where database files are stored) would improve the latency. Hence, we did the same and the latency figures reduced to less than 20 and 10.
- Unfortunately, that didn’t resolve the issue. Message monitor was still taking a long time to open the messages.
SAP database engineer did various checks on our SQL Server database and finally noticed that the table “BC_MSG_AUDIT” was taking up a lot of space even when it had 0 rows. This was evident when we ran this query.
exec sp_spaceused '[<your DB name here>].[BC_MSG_AUDIT]'
The Data Size was around 20GB when the table had 0 rows. This was an indication that the table is corrupt.
We also noticed the table didn’t contain a Clustered Index, which was evident by running this query.
exec sp_helpindex '[<your DB name here>].[BC_MSG_AUDIT]'
We decided that dropping and recreating the index (as a clustered index) would fix the corrupt table. We used these queries to do the same:
-- drop indexes DROP INDEX I_BC_MSG_AUDIT_ID ON <your DB name here>.BC_MSG_AUDIT; DROP INDEX I_BC_MSG_AUDIT ON <your DB name here>.BC_MSG_AUDIT; -- create the clustered index first CREATE CLUSTERED INDEX I_BC_MSG_AUDIT_ID ON <your DB name here>.BC_MSG_AUDIT (MSG_ID) WITH (DATA_COMPRESSION=PAGE) -- and then the non-clustered one CREATE NONCLUSTERED INDEX I_BC_MSG_AUDIT ON <your DB name here>.BC_MSG_AUDIT (MSG_ID, MSG_EXISTS, TIME_STAMP) WITH (DATA_COMPRESSION=PAGE)
After doing the above, the row size came down to 8kb. This resolved the issue. The message monitor opened messages immediately, which proves the delay was happening due to this corrupt table.
PLEASE NOTE: The above scenario may not be exactly same in your system. Please use above queries with caution.