The curse of gapless
sequences
By Gints Plivna
The cure (unfortunately just more
like a phantom than real one)
Enhancing the solution with
RETURNING clause of UPDATE
Small tables (1 row!!!) do not need
indexes? Wrong!
The
curse
Gapless
sequences usually are one of the best examples of bad
requirements gathering process. Gapless sequences means:
1)
single
serialization point for inserts (in one table usually) and
2)
thus
seriously reducing scalability and
3)
thus
resulting in bad performance for multi-user multi-insert environments.
Every other
user has to wait for the current user while he either COMMITs his insert and so
uses next gapless sequence number or ROLLBACKs the insert so allowing another
user choose the same next number.
This topic
recently had quite much attention in semi closed (archives available only to
subscribers) ODTUG-SQLPLUS-L
list.
The
road to the curse
As I've
described in my blog
post roots of the problems related to gapless sequences must be sought much
before coding and tuning. Usually the main problem is bad requirements
gathering process not thinking about how the requirement will be implemented
and how it will affect application. So as soon as one encounters such
requirement the very first reaction should be NOOOOOOO!!! Explain customer
above mentioned problems. Try to enforce customer not to ask for such requirement.
Explain the alternatives:
-
if
he needs it just for reports, generate the numbers in query time,
-
if
he needs it just because the old system does so, explain that this is the best
moment to change old rules,
-
if
he needs it just because it seems more convenient for him, explain that
weaknesses will be much bigger than strengths,
-
if
he needs it just because to make some counting, offer him appropriate report,
-
if
he needs it just because anything, tell him something that disproves that.
However
there is one case when usually no one can do anything - legislation. Usually it
is too hard to change the law (-s) enforcing such a stupid requirement. This is
the time when you have to do all the best to reduce the damage and of course
don't forget to document your fight
so that you can sleep well after your decision.
The
cure (unfortunately just more like a phantom than real one)
So what are
the possibilities to implement gapless sequences in Oracle?
Oracle
sequences
Forget
about sequences - I mean oracle supplied objects - sequences. They aren't
gapless! Never! Neither parameter combination ensures continuous numbers.
NOCACHE won't ensure that, ORDER won't ensure that. NONE. Why?
Because as
soon as one gets the sequence number it is lost forever. The most common
reasons are simple rollback and closed session because of "alter system
kill session" or simply lost contact between client and server which
ultimately results in rollback. As soon as you are using SQL Merge statement and sequences together for When Not Matched Then Insert clause you are destined to sequence gaps as I've shown in my blog entry.
To tell the
truth Oracle never guaranteed gapless sequences, Oracle guaranteed only unique
values out of sequences.
So Oracle
objects sequences are not suitable for gapless sequences.
Separate
ID table
Most common
scenario is creating separate ID table containing one row for sequence value.
When you need new id:
1)
select
new id value from ID table and lock it;
2)
do
your work ;
3)
update
ID table with next sequence value;
4)
commit
or rollback all your work along with actions with ID table in one transaction.
Example
follows.
CREATE TABLE id (id_row NUMBER NOT NULL);
INSERT INTO id VALUES (1);
DECLARE
v_new_id
NUMBER;
CURSOR c IS
SELECT id_row FROM id FOR UPDATE;
BEGIN
OPEN c;
FETCH c INTO v_new_id;
-- do your
work;
UPDATE id SET
id_row = id_row + 1 WHERE CURRENT OF c;
CLOSE c;
COMMIT;
END;
/
So what
other people will get while you are doing your work i.e. thinking whether to
commit or rollback?
All other
sessions will simply wait forever while you make your decision. And here comes
the most critical point - you have to make your decision very fast to at least a bit relief other people's pain.
Enhancing
the solution with RETURNING clause of UPDATE
OK. Now
let's assume you have done everything to minimize the think time of people as
much as you can. Can we do something with our procedure to make it also a bit
faster?
The answer
is yes.
Firstly
let's remember the wonderful RETURNING clause. In the script above we are
touching table two times once with SELECT and second time with UPDATE. We can
do all that at once!
Let’s look
at example:
DECLARE
v_new_id NUMBER;
BEGIN
UPDATE id SET id_row = id_row + 1
RETURNING id_row INTO v_new_id;
-- do your work;
COMMIT;
END;
/
As soon as
your row will be updated it will be locked and unlocked only after you'll
commit or rollback.
I've done
comparisons of 10K rows updated with both methods and compared the timings and
latches using Tom Kyte's
runstats.
Comparison
and full result list you can find here
but the summary is as follows:
SELECT FOR
UPDATE + UPDATE needed 183 centiseconds but
UPDATE…RETURNING
just 94 centiseconds – almost two times less.
Sum of
latches was 434 K for SELECT FOR UPDATE + UPDATE but 303 K for
UPDATE…RETURNING, so the gain was ~30%.
Small
tables (1 row!!!) do not need indexes? Wrong!
So we have
one row in our table. Does really adding index will do something good for this
1 row table?
We’ll
create another table with a key holding value for our gapless sequence value
and adding everywhere where clauses to select just one appropriate row with key
= 1.
CREATE TABLE id1 (
key NUMBER NOT
NULL,
id_row NUMBER
NOT NULL,
CONSTRAINT
id1_pk PRIMARY KEY (key));
INSERT INTO id1 VALUES (1, 1);
DECLARE
v_new_id
NUMBER;
CURSOR c IS
SELECT id_row
FROM id1 WHERE
key = 1 FOR UPDATE;
BEGIN
OPEN c;
FETCH c INTO
v_new_id;
-- do your
work;
UPDATE id1 SET
id_row = id_row + 1 WHERE CURRENT OF c;
CLOSE c;
COMMIT;
END;
/
And
enhanced variant with UPDATE…RETURNING is as follows:
DECLARE
v_new_id NUMBER;
BEGIN
UPDATE id1 SET id_row = id_row + 1
WHERE key = 1
RETURNING id_row INTO v_new_id;
-- do your work;
COMMIT;
END;
/
So what
runstats have shown us this time?
Comparison
and full result list you can find here
but the summary is as follows:
SELECT FOR
UPDATE + UPDATE needed 125 centiseconds but
UPDATE…RETURNING
just 58 centiseconds – more than two times less.
Sum of
latches was 196 K for SELECT FOR UPDATE + UPDATE but 122 K for
UPDATE…RETURNING, so the gain was ~38%.
But the
most beautiful thing is that all these numbers are much smaller compared to the
table without index.
Reducing
serialization
Assume your
thousand user multi-insert application runs on this terrible requirement – one
continuous gapless sequence. Usually this is some kind of old paperwork habit –
giving out paper sheets with continuous numbers. However even in the old times
it was not so bad for a real thousand multi-user company running its business. I
cannot imagine a process when thousand people in old days came to one
distributor and he gave the next paper sheet to thousands of people especially
if these people were located in different cities. So why do we have to do that
today? Why do we have to use only one distributor i.e. one row? We have already
created a structure that can hold more than one value and what’s more important
to get exactly that value. So we can insert more than one row into our table of
IDs with just different key values. For example give a sequence window of
allowable values for each of our five branches all around the country.
INSERT INTO id1 VALUES (2, 100);
INSERT INTO id1 VALUES (3, 200);
INSERT INTO id1 VALUES (4, 300);
INSERT INTO id1 VALUES (5, 400);
Of course
we need to enforce that in case the window is exhausted values do not overlap
with the next window. Check constraints may be used for that:
ALTER TABLE id1
ADD CONSTRAINT id1_win1_ck
CHECK (key = 1
AND id_row BETWEEN 1 AND 99 OR key <> 1);
ALTER TABLE id1
ADD CONSTRAINT id1_win2_ck
CHECK (key = 2
AND id_row BETWEEN 100 AND 199 OR key <> 2);
ALTER TABLE id1
ADD CONSTRAINT id1_win3_ck
CHECK (key = 3
AND id_row BETWEEN 200 AND 299 OR key <> 3);
ALTER TABLE id1
ADD CONSTRAINT id1_win4_ck
CHECK (key = 4
AND id_row BETWEEN 300 AND 399 OR key <> 4);
ALTER TABLE id1
ADD CONSTRAINT id1_win5_ck
CHECK (key = 5
AND id_row BETWEEN 400 AND 499 OR key <> 5);
Probably
that’s not the best case if you need to issue a new window each day, but for
relatively rare cases this may be used.
In case you
have contention in database block level i.e. updating different rows located in
the same database block, you can try to use two techniques:
1)
either
increasing INITRANS
parameter for table or
2)
use
old trick with minimize records_per_block. I.e. creating table, putting in only
one row, then issue ALTER TABLE <table_name> MINIMIZE RECORDS_PER_BLOCK,
and then add all other rows. With such technique one can get 2 rows per block
(not one BTW!).
However
both non-default INITRANS and especially MINIMIZE RECORDS_PER_BLOCK should be
used with caution and tested before you try to use them in production.
Summary
Don’t be
fooled by the quite long “The cure” section! All methods explained in it aren’t
worth a single penny as soon as user starts to think whether to commit or
rollback. Even if he thinks one second each time, and using techniques above
you decreased all other components to almost zero (remember also to add the
time for business insert!) the maximum inserted record count per single gapless
sequence won’t be greater than 60*60 = 3600 per hour. With normal Oracle
sequences you can achieve much much better results and at least sequences won’t
be your bottleneck. So think about that in requirements gathering process!
If you have any comments about this article you can discuss them in my blog here.
About
the author
Gints
Plivna gints.plivna@gmail.com is
system analyst in Rix Technologies Ltd. (www.rixtech.lv). He has experience in
working with Oracle since 1997 and his interests mostly have been connected
with analyzing system requirements, design, development and SQL tuning.
Contacts:
e-mail - gints.plivna@gmail.com
website - http://www.gplivna.eu/
Licenses
This work
is licensed under the Creative Commons Attribution-ShareAlike 2.5 License. To
view a copy of this license, visit http://creativecommons.org/licenses/by-sa/2.5/
or send a letter to
First edition:
2007-04-11
Added link to Merge and sequence blog entry:
2008-05-22