I recently encountered an interesting issue in our Oracle 11gR2 databases (it occurs in 12c also) where the number of scheduler jobs increases without check. If you're running DBMS_SCHEDULER jobs this could be impacting you. I'm a little embarrassed to admit that I didn't discover the root cause at first, instead I realized that queries on DBA_OBJECTS were performing poorly. Scheduler jobs are objects, and, well, after thousands were populated in DBA_OBJECTS we hit a threshold where the execution plan changed on these queries. Digging backwards I realized that there were a ton of rows with object_type = 'JOB'.
The issue is described in Metalink document 2001165.1 and caused by multiple bugs. I ended up running a slightly different solution than recommended. This is partly because the described solution isn't quite foolproof and partly that PL/SQL seems, to me, a better, faster, and more elegant solution than the recommended method of writing dynamic SQL to a file and then executing the file.
I solved it with:
alter system set job_queue_processes=0 scope=memory sid='*';begin
for c1 in (
select job_name, state
from dba_scheduler_jobs
where job_name like 'AQ$_PLSQL_NTFN_%')
loop
begin
if c1.state = 'RUNNING'
then dbms_scheduler.stop_job(c1.job_name);
end if;
dbms_scheduler.drop_job(c1.job_name);
exception
when others then null;
end;
end loop;
end;
/
Feel free to put something more meaningful than a "when others then null" exception handler here. I was lazy!
Then apply the recommended patch and bounce the instance. In a RAC environment the patch can be applied and instance restarts rolled across the nodes, but ideally don't revert job_queue_processes on a node until it has been restarted, else you'll see a new batch of jobs accumulating to be purged.
NOTE: I did try applying the patch in one environment prior to setting the job_queue_process parameter and purging the orphaned records, but the issue persisted. It appears to be sensitive to the order:
- Set job_queue_processes=0
- Purge the records
- Apply the patch
- Restart the instance and restore the value of job_queue_processes (automatic if you use scope=memory)
When complete, I'd recommend collecting dictionary statistics, since the large number of jobs deleted could easily skew optimizer operations (and that's how I originally discovered we were victims of the bug).
No comments:
Post a Comment