This hanging issue can occur only in EagleML versions from Sep 2017 to Oct 2018 (fixed in the Nov 2018 version). The EagleML version can be found in the first line of \tpe\dynamic\msgcenter\eagle_ml-2-0_cm\w_config.inc.

The "18000" Issue

How to tell whether you are seeing this issue:

  1. There are long-running tasks (more than 20 minutes) for the parallel_exec stream. These are visible in the Tasks screen of MCC with default parameters, or in the File Statistics screen.
    AND
  2. There was an MC restart or crash after the time shown as "start processing" for these files.

You can tell when a restart happened by looking at the log files: MC starts a new set of logs after a restart. If you see more than 4 MC log files for the same hour, that is a red flag. Check the timestamps inside the logs to find the exact restart time.

Where does 18000 come from?

  • The value is the hard-coded timeout that the parallel_exec stream uses when checking whether a file has been loaded. It is set in the EagleML include file tpe\dynamic\msgcenter\eagle_ml-2-0_cm\waiting_processor.inc.
  • It can be overridden with the W_REQ_CNT_LOOP_MAX parameter in w_config_custom, but that does not help files that are already processing.

Solution

How to fix the issue (how to recover files that are stuck waiting out those 18000 seconds, i.e. 5 hours)

No MC restart is required.

In the Message Center console you can see which files are stuck.


The plan is as follows.

1. Assign this file to the stream where the running process is (erroneously) looking for it.

We will need the file name and the stream name of the line that stays in progress (marked with a green circular arrow).
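To look up those names, or to confirm what the update will touch before running it, a read-only query along these lines may help. It is a sketch that joins only the two tables and columns already used in the update statements on this page; AK_PE_108 is the same example file name. Note that _ is a single-character wildcard in LIKE, so the pattern can match slightly more than the literal name, and the query shows exactly which records match. It can be re-run after each update to check the result.

```sql
-- Read-only check: list every Message Center record for the file
-- together with the title of the stream it is currently assigned to.
-- 'AK_PE_108' is an example; substitute your own file name fragment.
select s.file_name,
       st.msg_stream_title
from msgcenter_dbo.msg_message_stat s
join msgcenter_dbo.msg_streams st
  on st.msg_stream_id = s.stream_id
where s.file_name like '%AK_PE_108%'
order by st.msg_stream_title;
```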

In the statement below, AK_PE_108 (file name) and eagle_default_in_csv_all (current stream name) are example values; replace them with the ones from your console:

```sql
update msgcenter_dbo.msg_message_stat
set stream_id = (select msg_stream_id from msgcenter_dbo.msg_streams
                 where msg_stream_title = 'eagle_ml-2-0_default_cm_acquire_data')
where file_name like '%AK_PE_108%'
  and stream_id = (select msg_stream_id from msgcenter_dbo.msg_streams
                   where msg_stream_title = 'eagle_default_in_csv_all');
```

This statement moves the topmost (failed) record to the eagle_ml-2-0_default_cm_acquire_data stream. The *parallel_exec stream stays in progress only because it does not “see” that completed record.

...

After the update, the console shows that the stream name of the failed record has changed, and the *parallel_exec stream can now finish its polling.


2. Assign the file back to the correct stream.

This is the same statement with the stream names swapped:

```sql
update msgcenter_dbo.msg_message_stat
set stream_id = (select msg_stream_id from msgcenter_dbo.msg_streams
                 where msg_stream_title = 'eagle_default_in_csv_all')
where file_name like '%AK_PE_108%'
  and stream_id = (select msg_stream_id from msgcenter_dbo.msg_streams
                   where msg_stream_title = 'eagle_ml-2-0_default_cm_acquire_data');
```

The result is now the same as it would have been without the “18000 issue”.


Repeat these steps for every stuck file.

It is OK to run the first update statement for all the stuck files, wait for all of them to complete, and then recover them all with the second update statement.
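When several files are stuck and all their records currently sit in the same stream, the first statement can be extended to cover them in one pass, for example with OR-ed LIKE conditions. This is a sketch only: FILE_B and FILE_C are hypothetical placeholders, and the assumption that every stuck record is in eagle_default_in_csv_all must be checked first (records in other streams need their own statement). The two updates must be run separately, with the wait for the *parallel_exec tasks in between.

```sql
-- Step 1 for several stuck files at once.
-- Assumption: every stuck record is currently in 'eagle_default_in_csv_all'.
-- FILE_B and FILE_C are hypothetical placeholders for your own file names.
update msgcenter_dbo.msg_message_stat
set stream_id = (select msg_stream_id from msgcenter_dbo.msg_streams
                 where msg_stream_title = 'eagle_ml-2-0_default_cm_acquire_data')
where stream_id = (select msg_stream_id from msgcenter_dbo.msg_streams
                   where msg_stream_title = 'eagle_default_in_csv_all')
  and (file_name like '%AK_PE_108%'
       or file_name like '%FILE_B%'
       or file_name like '%FILE_C%');

-- Step 2: run ONLY after all *parallel_exec tasks for these files have
-- completed. Moves the same records back (stream names swapped).
update msgcenter_dbo.msg_message_stat
set stream_id = (select msg_stream_id from msgcenter_dbo.msg_streams
                 where msg_stream_title = 'eagle_default_in_csv_all')
where stream_id = (select msg_stream_id from msgcenter_dbo.msg_streams
                   where msg_stream_title = 'eagle_ml-2-0_default_cm_acquire_data')
  and (file_name like '%AK_PE_108%'
       or file_name like '%FILE_B%'
       or file_name like '%FILE_C%');
```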
