Long running Operations in
Oracle (entries in v$session_longops)
By Gints Plivna
What are long running operations and
how to monitor them?
Built in types of long operations
10 000 database block threshold
1 000 database block threshold
Even 20 block hash join may show up
in v$session_longops
Hash joins can either fly or crawl
Linear prediction may be wrong
References and Further reading
What
are long running operations and how to monitor them?
There is a
dynamic performance view v$session_longops that is populated for many long
running operations in Oracle. The primary criterion for any operation to appear
in v$session_longops is to run more than 6 seconds. Although this isn’t the
only criterion as well as not all operations that take more than 6 seconds are
shown in this view. For example one can find hash joins in v$session_longops,
but you won’t find there nested loop joins even if they are longer than 6
seconds and are joining very big data sets.
Some tools
are showing long running operations graphically, so that one doesn’t need to
query v$session_longops directly but tool is doing it for you.

Figure 1 Long operation by Oracle Enterprise
Manager 9.2.0

Figure 2 Long operations by Oracle
Application Express 2.1.
Of course
there is always possibility to get necessary information straight from the view
v$session_longops. In the examples given for this paper I’ve mostly used
following query to get the most recent info from v$session_longops:
select * from (
select opname,
target, sofar, totalwork,
units,
elapsed_seconds, message
from
v$session_longops order by start_time desc)
where rownum <=1;
On a busy server one would like to add filter on sid and serial# in the inner
query to get the info only about the necessary session and the query is as follows:
select * from (
select opname,
target, sofar, totalwork,
units,
elapsed_seconds, message
from
v$session_longops
where sid = <sid>
and serial# = <serial#>
order by
start_time desc)
where rownum <=1;
One of the
most important things when one is monitoring long running operations is to
remember that Oracle considers all of them to be linear i.e. estimated time remaining (column TIME_REMAINING) is:
- inversely proportional to work
already done (column SOFAR) and
- directly proportional to
already spent seconds for work (column ELAPSED_SECONDS) and
- directly proportional to
remaining work i.e. difference between all work (column TOTALWORK) and work
already done (column SOFAR).
So the
formula is ![]()
After calculation TIME_REMAINING is rounded to the whole number. Of course
in the real life not all things are linear because of various workloads and
various works to be done and therefore expectations may be sometimes too optimistic
as well as pessimistic. See example in chapter Linear prediction may be wrong.
Built
in types of long operations
Each new
version of Oracle adds several new types of built in long operations that are
shown in v$session_longops. Some of them are:
- Table scan;
- Index Fast Full Scan;
- Hash join;
- Sort/Merge;
- Sort Output;
- Rollback;
- Gather Table's Index Statistics.
In next
chapters I’ll look deeper into first three of the above listed types.
Table
scan
Table scan
is one of the most common long operations. It is shown only when done via FULL
SCAN and is measured in database blocks occupied by the table.
Work
behind the scenes
Actually
Table scan may hide many other operations behind the scenes, therefore don’t be
surprised if it takes very long time according to v$session_longops. Some of
the examples are as follows (Example for each of the following bullet can be
found here VariousTableScans.txt):
- Simple query scanning all the
rows in the table;
- Query with a filter;
- Creation of an index and table
scan to gather necessary info for the index;
- Alter table column from NULL to
NOT NULL;
- Query with join where outer
table is accessed via FULL SCAN and inner table is joined using NESTED
LOOPS join, so table scan actually includes a hidden join;
- A cursor in PL/SQL that does a
full scan on the table but inside the cursor the possibilities to do
something are infinite.
All above statements
according to v$session_longops are doing the same amount of work i.e. scanning
94577 blocks of the table BIG, but the time taken as well as the total work
done is rather different.
10 000
database block threshold
Besides
criterion that Table scan has to run more than 6 seconds there is one another
criterion that is not so widely known – table has to occupy at least 10000
database blocks. Table scan has to satisfy BOTH criteria to show up in
v$session_longops i.e. if the table has less than 10000 blocks but table scan
runs more than 6 seconds that’s not enough.
Following
example (full listing in 10000BlocksLongops.txt)
illustrates this requirement.
1) Table
BIG is created from dba_source in tablespace with segment space management set
to manual to avoid any automatic block assignment for table.
2) To boost
up the time of full scan index on two columns owner, name is created and table
BIG is joined to itself using nested loops join.
3) To
quickly fill up necessary blocks only one row into table is inserted and then
issued command minimize rows_per_block.
4) Insert
of 19987 rows fills up 9995 blocks.
5) Now
following query is executed:
select /*+ full(a) use_nl(a, b) */ count(distinct
a.line) from big a, big b
where a.owner = b.owner
and a.name =
b.name;
It takes
41.68 seconds to run but there isn't any corresponding entry in
v$session_longops.
6) Now two
more rows are inserted and gathering table statistics approves that table now
takes up 10 000 blocks (On XE I needed to insert only one additional row,
therefore I tried to insert row after row).
7) The same
query as in paragraph 5) is executed again.
Now it
takes 41.48 seconds and one can see corresponding entry in v$session_longops.
Index
Fast Full Scan
Index Fast
Full Scan is one of the index access paths used when all the information can be
gathered from index instead of scanning the entire table.
1 000
database block threshold
Index FFS
as well as Table Scans have another criterion for showing up in v$session_longops.
Index has to occupy at least 1000 blocks i.e. 10 times less than threshold for
Table Scans.
Following
example (full listing in 1000IndexFFSLongops.txt)
illustrates this requirement.
1) Table
BIG1 is created from dba_source in tablespace with segment space management set
to manual to avoid any automatic block assignment for table. To make easier
calculation index organized table is used.
2) To boost
up the time of index fast full scan table BIG1 is joined to itself.
3) There
isn’t possibility to use clause minimize rows_per_block for indexes, so an
approximate number of rows should be found to fill up necessary blocks.
4) Insert
of 82150 rows fills up 993 leaf blocks and index has bevel 2.
5) Now
following query is executed:
select count(distinct a.line) from big1 a, big1 b
where a.owner
= b.owner
and a.name =
b.name
/
It takes 2
minutes and 28.68 seconds to run but there isn't any corresponding entry in
v$session_longops.
6) Table is
dropped and 50 rows more are inserted than previously. It fills up one more
leaf block i.e. 994 leaf blocks.
7) The same
query as in 5) is executed again.
8) Now it
takes 2 minutes 26.37 seconds and one can see corresponding entry in
v$session_longops showing Index Fast Full Scan over 1000 blocks taking 147
seconds.
Hash
join
Hash joins
are usually used joining large data sets. The performance of hash joins
strongly depends on available memory either allocated by pga_aggregate_target
(if using workarea_size_policy = auto) or hash_area_size (if using
workarea_size_policy = manual). Block is the unit to measure hash join work in
v$session_longops.
Even
20 block hash join may show up in v$session_longops
It seems
that the main criterion to show up hash join in v$session_longops is the time
to perform it i.e. is it longer than 6 seconds. That’s because one can see even
hash join of 20 blocks as a long running operation. The test case of course is
artificially slowed down to make hash join very slow. And it is as follows
(full listing in 20HashJoinLongops.txt):
1) Table
ParodyOfBig is created from dba_source with just 1300 rows.
2)
workarea_size_policy is set to manual and hash_area_size to 50000.
3)
Anonymous pl/sql block is created. It contains cursor with select joining
ParodyOfBig to itself. To force hash join appropriate hint is used.
4) To slow
down hash join after fetching each row we just wait for 0.03 seconds.
5)
Anonymous block is executed and it takes 39.05 seconds to run. It matches up
with calculated necessary time for executing this script i.e. 1300 rows * 0.03
seconds = 39 seconds.
6) One can
see a row in v$session_longops that indicates that there were 20 blocks for
hash join and the work took 15 seconds. I cannot explain why only 15 seconds
are shown in elapsed_seconds column in v$session_longops.
So if you
see a badly performing hash join with small number of blocks (small can be even
in thousands) make sure that hash join actually isn’t doing something else
behind the scenes. It probably won’t be dbms_lock.sleep as in this particular
case, but possibilities to slow down execution flow between two cursor fetches
are infinite.
Hash
joins can either fly or crawl
Hash joins are
very much affected by available memory and number of blocks to join is also
dependant on available memory. In following example a table of 50000 rows was
created from dba_source and joined to itself (full listing in HashJoinsFlyOrCrawl.txt).
The same query was executed with different hash_area_size and as you can see it
very much can affect both physical reads and execution time (which mostly is
affected by reads and writes to temporary tablespace because hash area is too
small). Depending on hash area size execution time can be as small as 0.01
second or as big as 8 and a half minutes.
Following
table is a summary of test queries.
|
Hash area size |
Query execution time |
Hash blocks |
Physical reads according to autotrace |
|
16000000 |
0 00.01 |
? |
0 |
|
8000000 |
0 00.03 |
? |
936 |
|
4000000 |
0 01.03 |
? |
1472 |
|
1000000 |
0 04.00 |
? |
1718 |
|
500000 |
0 04.01 |
? |
1786 |
|
300000 |
0 03.04 |
? |
2722 |
|
150000 |
0 16.05 |
9288 |
9288 |
|
120000 |
0 19.05 |
13376 |
13376 |
|
110000 |
0 16.03 |
15262 |
15262 |
|
100000 |
0 54.00
(second time) |
33666 |
33666
(second time) |
|
90000 |
1 05.06 |
40052 |
40052 |
|
80000 |
1 22.02 |
49732 |
49732 |
|
70000 |
1 33.05 |
65798 |
65798 |
|
60000 |
8 38.05 |
388112 |
388112 |
|
50000 |
5 56.00 |
388112 |
388112 |
Table 1 Query execution time depending on
hash area size
Of course
if one uses workarea_size_policy = auto then hash_area_size doesn’t matter,
instead pga_aggregate_target is used and a portion of it is allocated to hash
join.
Linear prediction
may be wrong
As already
said Oracle considers long running operation to be linear. To show how Oracle
mistakenly predicts remaining time I’ve created following test case (full
listing in LinearPredictionMain.txt
and LinearPredictionSecondSession.txt).
1) Table
BIG with 50K rows is created.
2)
Anonymous pl/sql block is created. It contains cursor with select joining BIG
to itself. To force hash join appropriate hint is used.
3) To slow
down hash join a small hash_area_size is allocated
4) Cursor
just fetches row by row from the 1st till 50k except for rows between 15K and
18K where it just waits 0.03 seconds.
5) In
another session a script just inserts last row from v$session_longops into a
table to register the trend of events.
5)
Following chart created from the trend of events table shows the situation.
Starting from second 3 since the hash join appears in v$session_longops remaining
time increases rather fast while work done (number of blocks/100) increases
very slowly. Then in the 94th second remaining time reaches its spike and is
178 seconds although as we’ll later see all the work would complete in just 126
seconds. Starting from the 94th second remaining time goes down very fast and
work done as fast goes up.
So there
was a moment in time when hash join was already 94 seconds in v$session_longops
and oracle predicted it to run 178 seconds more however it completed just in 24
seconds. On the other hand at the very start Oracle predicted that hash join
would be over after 24 seconds although it actually needed 115 seconds to
complete. Of course this is just an artificially created test case but many
times the real world is quite the same because of various amount work to do in
different phases of long operation (for example doing some calculations only
for some values), different server workload at execution time, different net
workload accepting results and many other factors that might influence the
performance of particular long operation.

Table 2 Linear prediction of long running
operation
Summary
There are
several facts one has to remember about long running operations:
1) Not all
operations that are running longer than 6 seconds are considered as long
operations. Built-in long operation types are strongly defined although increasing with
every new Oracle version.
2) There
are some other criteria to show the particular operation type in
v$session_longops.
3) Long
operations of particular kind can be very different i.e. they can actually do
many more things behind the scenes and one cannot find out that just looking at
long operation.
4) Oracle
considers long operations as linear operations (each unit of work takes the
same amount of time) therefore estimated completion time can be inaccurate in
case the real process is not linear.
References
and Further reading
[1]
Expert
Oracle, Signature Edition By Thomas Kyte ISBN: 1-59059-525-4, Appendix A,
DBMS_APPLICATION_INFO;
[2]
Thread
in Ask Tom, V$SESSION_LONGOPS, http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1099233454171;
[3]
Oracle®
Database Reference, 10g Release 2 (10.2), V$SESSION_LONGOPS, http://download-uk.oracle.com/docs/cd/B19306_01/server.102/b14237/dynviews_2092.htm;
[4]
What
is the difference between an 'index full scan' and an 'index fast full scan',
by Jonathan Lewis, http://www.jlcomp.demon.co.uk/faq/index_scans.html;
[5]
Oracle®
Database Performance Tuning Guide, 10g Release 2 (10.2), 13 The Query
Optimizer, 13.5.3.7 Fast Full Index Scans, http://download-uk.oracle.com/docs/cd/B19306_01/server.102/b14211/optimops.htm#i52044;
[6]
Thread
in Ask Tom, Hash Join, http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:65506569258230;
[7]
SQL
Memory Management in Oracle9i, by Benoît Dageville, Mohamed Zait, http://www.cs.ust.hk/vldb2002/VLDB2002-proceedings/papers/S29P03.pdf.
Includes information how Hash Joins are implemented in Oracle.
[8]
Working
with Automatic PGA, by Christo Kutrovsky, http://www.pythian.com/documents/Working_with_Automatic_PGA.ppt.
Presentation about automatic PGA including info about Hash Joins.
[9]
If only more developers used the DBMS_APPLICATION_INFO package
, by Andy Campbell, http://oracleandy.blogspot.com/2006/09/if-only.html.
Article about DBMS_APPLICATION_INFO.SET_SESSION_LONGOPS procedure that allows user to set his own types of long running operations.
About the
author
Gints
Plivna gints.plivna@gmail.com is
system analyst in Rix Technologies Ltd. (www.rixtech.lv). He has experience in
working with Oracle since 1997 and his interests mostly have been connected
with analyzing system requirements, design, development and SQL tuning.
Contacts:
e-mail - gints.plivna@gmail.com
website - http://www.gplivna.eu/
Licenses
This work
is licensed under the Creative Commons Attribution-ShareAlike 2.5 License. To
view a copy of this license, visit http://creativecommons.org/licenses/by-sa/2.5/
or send a letter to
First edition: 2007-02-01
Thanks to Thomas Mercadante for corrections (IFFS of course is scanning index instead of scanning table): 2007-04-01
Added link [9]: 2007-05-01
To be continued…