Home » SQL & PL/SQL » SQL & PL/SQL » Sequence generation for Parallel Processing (ORACLE 11g, Windows)
Sequence generation for Parallel Processing [message #662782] Fri, 12 May 2017 03:07 Go to next message
lokimisc
Messages: 101
Registered: February 2008
Senior Member
Hi,
We have etl process where our system will read txt files, transform the data and generates output files which are in text format.
Entire process is automated which is triggered by shell script ( creates separate sessions every time when called) through schedule job.
Format of output file is abc_mmddyy_seq where seq is for a particular day it can be from 000 to 999 and next day it will start from 001.
Eg: abc_010117_001, abc_010117_002; abc_020117_001

We have table called "sequence_data" which will just hold the latest sequence for that day using which we are getting next sequence (autonomous procedure)

Everything was working perfectly fine but recently while processing multiple files i.e calling simultaneously more than 10 input files, 2 of the output files got same sequence number reason because 1 file had huge data and while it was taking time for transformation and was about generate output file other smaller file completed transformation process, read same sequence number and generated output file.

So could you please advise what is the best approach so that we don't get same output file.

Thanks
Lokesh
Re: Sequence generation for Parallel Processing [message #662788 is a reply to message #662782] Fri, 12 May 2017 03:49 Go to previous messageGo to next message
John Watson
Messages: 8929
Registered: January 2010
Location: Global Village
Senior Member
I don't know how you are selecting and updating your sequence number, but clearly your transaction structure is inappropriate. Perhaps (just an example) you need to use SELECT...FOR UPDATE to lock the row.
Re: Sequence generation for Parallel Processing [message #662866 is a reply to message #662788] Sun, 14 May 2017 05:56 Go to previous messageGo to next message
lokimisc
Messages: 101
Registered: February 2008
Senior Member
Thanks Watson for your response.
If I use SELECT FOR UPDATE then while parallel processing if another process try to read same row and if it is locked by first session then it will throw exception which I need to avoid.

Regards,
Lokesh
Re: Sequence generation for Parallel Processing [message #662869 is a reply to message #662866] Sun, 14 May 2017 06:21 Go to previous messageGo to next message
John Watson
Messages: 8929
Registered: January 2010
Location: Global Village
Senior Member
Well, yes. You have to serialize the access to the sequence number or you will get duplicates. That's the whole point. If you don't want the exception, use a WAIT clause.
Re: Sequence generation for Parallel Processing [message #662872 is a reply to message #662866] Sun, 14 May 2017 08:33 Go to previous messageGo to next message
Solomon Yakobson
Messages: 3273
Registered: January 2010
Location: Connecticut, USA
Senior Member
lokimisc wrote on Sun, 14 May 2017 06:56
If I use SELECT FOR UPDATE then while parallel processing if another process try to read same row and if it is locked by first session then it will throw exception which I need to avoid.
Not sure what exception are you talking about. My understanding is you have autonomous procedure that:

a) Selects for update date & number columns.
b) if date = trunc(sysdate) then update number = number + 1, return that number + 1 and commit
c) if date != trunc(sysdate) then update date = trunc(sysdate), number = 1, return 1 and commit

Therefore, if multiple sessions call procedure for sequence number they will wait (select for update will wait) since WAIT is default for FOR UPDATE. What you want is optimize that autonomous procedure to work as fast as possible to avoid waits caused by FOR UPDATE locks.
In fact, you don't need SELECT FOR UPDATE. All you need is:

UPDATE SEQUENCE_TABLE
   SET SEQ_DT = TRUNC(SYSDATE),
       SEQ_VAL = CASE SEQ_DT
                   WHEN TRUNC(SYSDATE) THEN SEQ_VAL + 1
                   ELSE 1
                 END
  RETURNING SEQ_VAL
    INTO V_SEQ_VAL;

SY.
Re: Sequence generation for Parallel Processing [message #662904 is a reply to message #662782] Mon, 15 May 2017 07:18 Go to previous messageGo to next message
EdStevens
Messages: 1376
Registered: September 2013
Senior Member
Aside from your sequence numbers and transaction control, that "date" portion of your text file is troubling on two counts.

First, if you are going to store a date in a text file (actually, a character string representation of a date) then why wouldn't you put it in year-month-day format instead of month-day-year? That way, it could be properly sorted and/or compared when (not if) the need arises.

Secondly, I find it troubling that 17 years after Y2K, we have a whole new generation of developers who ignore those lessons and continue with 2-digit years.
Re: Sequence generation for Parallel Processing [message #662905 is a reply to message #662904] Mon, 15 May 2017 07:27 Go to previous messageGo to next message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
You can lead some folks to knowledge, but you can't make them think.
Re: Sequence generation for Parallel Processing [message #662906 is a reply to message #662904] Mon, 15 May 2017 07:30 Go to previous messageGo to next message
Solomon Yakobson
Messages: 3273
Registered: January 2010
Location: Connecticut, USA
Senior Member
EdStevens wrote on Mon, 15 May 2017 08:18

Secondly, I find it troubling that 17 years after Y2K, we have a whole new generation of developers who ignore those lessons and continue with 2-digit years.
1. Unfortunately Y3K will not be my headache Laughing .
2. My ego isn't that big to assume my code will still be used in Y3K Laughing .

SY.
Re: Sequence generation for Parallel Processing [message #663094 is a reply to message #662906] Mon, 22 May 2017 06:22 Go to previous messageGo to next message
Bill B
Messages: 1971
Registered: December 2004
Senior Member
personally I would use the following

create sequence my_table_seq start with 0 increment by 1;

then every night at midnight use the scheduler to run the following code

declare
l_val number;
begin
    execute immediate
    'select my_table_seq.nextval from dual' INTO l_val;

    execute immediate
    'alter sequence my_table_seq increment by -' || l_val || 
                                                          ' minvalue 0';

    execute immediate
    'select my_table_seq.nextval from dual' INTO l_val;

    execute immediate
    'alter sequence my_table_seq increment by 1 minvalue 0';
end;
/

all done. Guaranteed that the sequence number at the end of your file will always be unique because your using a sequence.

I also agree with the previous comment. If you name your files

abc_20170101_001

then you can easily find a specific date range of tables by simply sorting the table names.

[Updated on: Mon, 22 May 2017 06:25]

Report message to a moderator

Re: Sequence generation for Parallel Processing [message #663253 is a reply to message #663094] Sat, 27 May 2017 02:18 Go to previous messageGo to next message
lokimisc
Messages: 101
Registered: February 2008
Senior Member
Thank You all for your inputs.

Here is the procedure written to handle parallel processing sequence.

create or replace PROCEDURE prc_get_nxt_seq(po_seq OUT NUMBER)
IS
  PRAGMA autonomous_transaction;
  l_lock_name           CONSTANT VARCHAR2(20) := 'EMP_SEQ';
  l_sleep_sec           CONSTANT NUMBER       := 1;
  l_lock_handle         VARCHAR2(100);
  l_lock_id             NUMBER;
  l_lock_release        NUMBER;
  l_seq                 NUMBER;
  l_record_count        NUMBER;
  
BEGIN
  --dbms_output.put_line( sysdate||': StartSerialProcess()' );
  --// create a lock handle
  dbms_lock.allocate_unique( lockname        => l_lock_name, 
                             lockHandle      => l_lock_handle, 
                             expiration_secs => 0 
                           );
  --// attempt to lock in order to start process, else
  --// loop and wait for the lock to become available
  --// for us to use.
  LOOP
    --// attempt a lock
    l_lock_id := dbms_lock.request( lockhandle => l_lock_handle, 
                                    timeout    => 1 
                                  );
  EXIT
  WHEN l_lock_id != 1; --// lock was successful
    --// locking failed - another process is busy, we
    --// need to wait 
    --// retrying the lock)
    dbms_lock.sleep( l_sleep_sec );
  END LOOP;
  --// we have made the lock and can now continue to process...
  
  SELECT COUNT(1) 
  INTO l_record_count 
  FROM  system_data 
  WHERE NAME ='EMP_SEQ' ;

  IF l_record_count < 1 
  THEN  
  
    INSERT INTO SYSTEM_DATA
       (
        name,
        value,
        mod_date
        )
      SELECT 'EMP_SEQ', 0 ,TRUNC(SYSDATE) FROM dual;  
      
  END IF;     
              
  UPDATE system_data
  SET mod_date = TRUNC(SYSDATE),
      mod_user = USER ,
      value    = CASE  TRUNC(mod_date)
                   WHEN TRUNC(SYSDATE) THEN value + 1
                   ELSE 1
                 END
  WHERE name = 'EMP_SEQ' 
  RETURNING value
    INTO l_seq;
  commit;  
  po_seq := l_seq;
  --dbms_lock.sleep( 5 ); --// pretend processsing for 5 secs for testing purpose;
  --// when processing is completed, we need to release the
  --// the lock handle
  l_lock_release := dbms_lock.release( l_lock_handle );
  
EXCEPTION WHEN OTHERS 
THEN
l_lock_release := dbms_lock.release( l_lock_handle );
END prc_get_nxt_seq; 

Thanks and Regards,
Lokesh
Re: Sequence generation for Parallel Processing [message #663269 is a reply to message #663253] Sat, 27 May 2017 05:50 Go to previous messageGo to next message
Solomon Yakobson
Messages: 3273
Registered: January 2010
Location: Connecticut, USA
Senior Member
Why do you need DBMS_LOCK on top of oracle built-in locking. Here it causes nothing but performance degradation. You want that autonomous transaction to complete as fast as possible, so code must assume row with NAME ='EMP_SEQ' is there already. And TRUNC on MOD_DATE shouldn't be needed - we must assume initial row had MOD_DATE with time of 00:00:00. Also, you should add p_name parameter instead of hardcoding 'EMP_SEQ'. Then your will be able to use same code for all sequences. And why procedure and not function? Function would allow you setting sequence value right in INSERT statement.

SY.
Re: Sequence generation for Parallel Processing [message #663270 is a reply to message #663269] Sat, 27 May 2017 05:59 Go to previous messageGo to next message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
EXCEPTION handler is bad bug waiting to bite you

http://www.orafaq.com/wiki/WHEN_OTHERS
Re: Sequence generation for Parallel Processing [message #663280 is a reply to message #663270] Sun, 28 May 2017 02:32 Go to previous messageGo to next message
lokimisc
Messages: 101
Registered: February 2008
Senior Member
Thanks Solomon for your response.
Randomly same sequence was used by two processes when time differences between them was very less (actually it is not reproducible) so to avoid such scenario dbms_lock is used.
Also I will incorporate your inputs.

Thanks all for helping to solve the problem.

Regards,
Lokesh
Re: Sequence generation for Parallel Processing [message #663281 is a reply to message #663280] Sun, 28 May 2017 05:45 Go to previous messageGo to next message
Solomon Yakobson
Messages: 3273
Registered: January 2010
Location: Connecticut, USA
Senior Member
What scenario DBMS_LOCK will avoid that standard locking wouldn't? If multiple session request sequence value at the same/about the same time only one will update - all others will wait (they will be queued up). So the goal is to minimize sequence value generation transaction time while DBMS_LOCK simply prolongs it.

SY.
Re: Sequence generation for Parallel Processing [message #663282 is a reply to message #663270] Sun, 28 May 2017 05:50 Go to previous messageGo to next message
Solomon Yakobson
Messages: 3273
Registered: January 2010
Location: Connecticut, USA
Senior Member
BlackSwan wrote on Sat, 27 May 2017 06:59
EXCEPTION handler is bad bug waiting to bite you

http://www.orafaq.com/wiki/WHEN_OTHERS
No, this is one of the few cases when WHEN OTHERS is, pretty-much, a must. Otherwise all sessions needing sequence value will be locked out if exception is raised after DBMS_LOCK lock is placed. We have to release the lock regardless of exception. However, you are correct in a sense we shouldn't be masking exceptions so exception handler shold issue RAISE after releasing the lock.

SY.
Re: Sequence generation for Parallel Processing [message #663433 is a reply to message #663282] Fri, 02 June 2017 13:51 Go to previous messageGo to next message
Bill B
Messages: 1971
Registered: December 2004
Senior Member
If you really don't want to use an oracle sequence then do the following

create table file_locks (current_date date,the_seq number);


CREATE OR REPLACE FUNCTION Bld_file_name (P_file_prefix IN VARCHAR2)
    RETURN VARCHAR2
IS
    PRAGMA AUTONOMOUS_TRANSACTION;
    V_the_seq        NUMBER;
    V_current_date   DATE;
BEGIN
    SELECT CURRENT_DATE, The_seq
      INTO V_current_date, V_the_seq
      FROM File_locks
    FOR UPDATE;

    -- check to see if we need to restart the sequence
    IF V_current_date < TRUNC (SYSDATE)
    THEN
        UPDATE File_locks
           SET CURRENT_DATE = TRUNC (SYSDATE), The_seq = 0;

        V_the_seq := 0;
    END IF;

    -- increment the sequence
    UPDATE File_locks
       SET The_seq = The_seq + 1;

    COMMIT;
    RETURN    P_file_prefix
           || '_'
           || TO_CHAR (SYSDATE, 'YYYYMMDD')
           || '_'
           || TO_CHAR (V_the_seq + 1, 'fm999999000');
END Bld_file_name;

Every invocation of the code will return a unique file name and the function does it's own locking of the table.

[Updated on: Fri, 02 June 2017 14:03]

Report message to a moderator

Re: Sequence generation for Parallel Processing [message #663439 is a reply to message #663433] Fri, 02 June 2017 14:28 Go to previous messageGo to next message
Solomon Yakobson
Messages: 3273
Registered: January 2010
Location: Connecticut, USA
Senior Member
The goal is to make procedure as fast as possible so instead of doing select + if + update + possible second update we can do single update. But main issue is PL/SQL is procedural, not transactional therefore SYSDATE will be calculated each time it is called. So you could end up with wrong results if update was before midnight and return after midnight. You have to make it transactional and not just to itself but also to code it might be called from. Also TO_CHAR (V_the_seq + 1, '000') will produce leading space, so format should be 'fm000'. Anyway, something like:

CREATE OR REPLACE
  FUNCTION Bld_file_name(
                         P_FILE_PREFIX IN VARCHAR2,
                         P_CURRENT_DATE IN DATE
                        )
    RETURN VARCHAR2
    IS
        V_THE_SEQ NUMBER:
        PRAGMA AUTONOMOUS_TRANSACTION;
    BEGIN
        UPDATE File_locks
           SET CURRENT_DATE = TRUNC(P_CURRENT_DATE),
               THE_SEQ = CASE CURRENT_DATE
                           WHEN TRUNC(P_CURRENT_DATE) THEN THE_SEQ + 1
                           ELSE 1
                         END
           RETURNING THE_SEQ
             INTO V_THE_SEQ;
    COMMIT;
    RETURN    P_FILE_PREFIX ||
              TO_CHAR(P_CURRENT_DATE,'_YYYYMMDD_') ||
              TO_CHAR(V_THE_SEQ,'FM000');
END Bld_file_name;
/

SY.

[Updated on: Fri, 02 June 2017 14:35]

Report message to a moderator

Re: Sequence generation for Parallel Processing [message #663440 is a reply to message #663439] Fri, 02 June 2017 14:34 Go to previous messageGo to next message
Bill B
Messages: 1971
Registered: December 2004
Senior Member
Yes except for one change.

CREATE OR REPLACE
  FUNCTION Bld_file_name(
                         P_FILE_PREFIX IN VARCHAR2,
                         P_CURRENT_DATE IN DATE
                        )
    RETURN VARCHAR2
    IS
        V_THE_SEQ NUMBER:
        PRAGMA AUTONOMOUS_TRANSACTION;
    BEGIN
        UPDATE File_locks
           SET CURRENT_DATE = TRUNC(P_CURRENT_DATE),
               THE_SEQ = CASE CURRENT_DATE
                           WHEN TRUNC(CURRENT_DATE) THEN THE_SEQ + 1
                           ELSE 1
                         END
           RETURNING THE_SEQ
             INTO V_THE_SEQ;
    COMMIT;
    RETURN    P_FILE_PREFIX ||
              TO_CHAR(P_CURRENT_DATE,'_YYYYMMDD_') ||
              TO_CHAR(V_THE_SEQ,'FM999999000');
END Bld_file_name;

By increasing the format mask for v_the_seq to FM999999000, any sequence number under 1000 will have left padded with zero's if somehow the sequence gets bigger then 999 it won't return ###, it will expand out the sequence to the necessary length. Probably will never be necessary but always code for the edge conditions.

Also shouldn't it be

UPDATE File_locks
SET CURRENT_DATE = TRUNC(P_CURRENT_DATE),
THE_SEQ = CASE CURRENT_DATE
WHEN TRUNC(P_CURRENT_DATE) THEN THE_SEQ + 1
ELSE 1
END
RETURNING THE_SEQ
INTO V_THE_SEQ;

because the set will not use the modified current_date in the same update .

[Updated on: Fri, 02 June 2017 14:41]

Report message to a moderator

Re: Sequence generation for Parallel Processing [message #663442 is a reply to message #663440] Fri, 02 June 2017 14:45 Go to previous messageGo to next message
Solomon Yakobson
Messages: 3273
Registered: January 2010
Location: Connecticut, USA
Senior Member
Well, it could be you are masking one issue with another. If daily sequence number count must not exceed 999 I'd rather change V_THE_SEQ to NUMBER(3) to get exception. If daily count isn't limited then I'd rely on implicit conversion:

    RETURN    P_FILE_PREFIX ||
              TO_CHAR(P_CURRENT_DATE,'_YYYYMMDD_') ||
              V_THE_SEQ;

SY.
Re: Sequence generation for Parallel Processing [message #663444 is a reply to message #663442] Fri, 02 June 2017 14:56 Go to previous messageGo to next message
Bill B
Messages: 1971
Registered: December 2004
Senior Member
I just noticed a problem when I was checking out Solomon's code. I used a reserve word in my table and code CURRENT_DATE is reserved and the column name needs to be changed or you can do. Sorry about that
UPDATE File_locks
           SET CURRENT_DATE = TRUNC(P_CURRENT_DATE),
               THE_SEQ = CASE "CURRENT_DATE"
                           WHEN TRUNC(P_CURRENT_DATE) THEN THE_SEQ + 1
                           ELSE 1
                         END
           RETURNING THE_SEQ
             INTO V_THE_SEQ;

[Updated on: Fri, 02 June 2017 14:57]

Report message to a moderator

Re: Sequence generation for Parallel Processing [message #663445 is a reply to message #663444] Fri, 02 June 2017 15:05 Go to previous messageGo to next message
Solomon Yakobson
Messages: 3273
Registered: January 2010
Location: Connecticut, USA
Senior Member
I agree it isn't good choice for column name, although keyword CURRENT_DATE is not a reserved in SQL:

select * from V$RESERVED_WORDS where keyword = 'CURRENT_DATE'
/
KEYWORD     	LENGTH	RESERVED	RES_TYPE	RES_ATTR	RES_SEMI	DUPLICATE
------------	------	--------	--------	--------	--------	---------
CURRENT_DATE	12    	N       	N       	N       	N       	N        

SY.
Re: Sequence generation for Parallel Processing [message #663446 is a reply to message #663445] Fri, 02 June 2017 15:08 Go to previous message
Bill B
Messages: 1971
Registered: December 2004
Senior Member
true, however at least in 12.2 if your type

select current_date from dual;

you get the current timestamp. lol
Previous Topic: Package to include a sleep inside a procedure
Next Topic: oracle external table issue
Goto Forum:
  


Current Time: Thu Apr 18 18:49:19 CDT 2024