Yann Neuhaus

dbi services technical blog

Drilling down the pgSentinel Active Session History

Sun, 2018-07-15 13:35

In pgSentinel: the sampling approach for PostgreSQL I mentioned that one of the advantages of the ASH approach is the ability to drill down from an overview of the database activity, down to the details where we can do some tuning. The idea is to always focus on the components which are relevant to our tuning goal:

  • Filter/Group by the relevant dimension to focus on what you want to tune (a program, session, query, time window,…)
  • Sort by the most active samples, to spend time only where you know you can improve significantly

The idea is to start at a high level. Here is a GROUP BY BACKEND_TYPE to show the activity of the ‘client backend’ and the ‘autovacuum worker':

select count(*), backend_type
from pg_active_session_history
where ash_time>=current_timestamp - interval '5 minutes'
group by backend_type
order by 1 desc
;
count | backend_type
-------+-------------------
1183 | client backend
89 | autovacuum worker

I selected only the last 5 minutes (the total retention is defined by pgsentinel_ash.max_entries and the sampling frequency by pgsentinel_ash.pull_frequency).

I ordered by the number of samples for each one, which gives a good idea of the proportions: most of the activity here is from the ‘client backend’. It may be more interesting to show percentages, such as 93% of the activity coming from the clients and 7% from the vacuum. However, that would hide an interesting measure of the overall activity. The fact that we have 1183 samples within 5 minutes is an indication of the total load. 5 minutes is 300 seconds, which means that each session can account for up to 300 samples when it is 100% active in the database during that time. 1183 samples during 5 minutes means that we have on average 1183/300 = 4 sessions active. This measure, calculated as the number of samples divided by the number of seconds, and known as Average Active Sessions (AAS), gives two different pieces of information:

  • The overall activity in the database, similar to the load average at OS level
  • The relative activity of an aggregate (per session, program, event, time…)
AAS (Average Active Sessions)

In the previous post I counted the number of samples with count(distinct ash_time) because I knew that I had several sessions active during the whole time. But if there are periods of inactivity during those 5 minutes, there are no samples at all for them. And when drilling down to more detail, a specific group may have no activity for some of the samples. Here I calculate the number of seconds covered by the samples, using a window function:

with ash as (
select *,ceil(extract(epoch from max(ash_time)over()-min(ash_time)over()))::numeric samples
from pg_active_session_history where ash_time>=current_timestamp - interval '5 minutes'
) select round(count(*)::numeric/samples,2) as "AAS",
backend_type
from ash
group by samples,
backend_type
order by 1 desc fetch first 20 rows only
;
AAS | backend_type
-------+-------------------
3.95 | client backend
0.29 | autovacuum worker
(2 rows)

From this output, I know that I have about 4 client sessions running. This is what I want to tune.
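Any other column of the view can be used as the dimension. For example, and this is only a sketch following the same pattern (not part of the original output), the same AAS calculation can be grouped by database name, keeping only the client backends:

with ash as (
select *,ceil(extract(epoch from max(ash_time)over()-min(ash_time)over()))::numeric samples
from pg_active_session_history where ash_time>=current_timestamp - interval '5 minutes'
) select round(count(*)::numeric/samples,2) as "AAS",
datname
from ash
where backend_type='client backend'
group by samples,
datname
order by 1 desc
;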

Drill down on wait events

Adding the WAIT_EVENT_TYPE to the GROUP BY, I can have more detail about the resources used by those sessions:

with ash as (
select *,ceil(extract(epoch from max(ash_time)over()-min(ash_time)over()))::numeric samples
from pg_active_session_history where ash_time>=current_timestamp - interval '5 minutes'
) select round(count(*)::numeric/samples,2) as "AAS",
backend_type,wait_event_type
from ash
group by samples,
backend_type,wait_event_type
order by 1 desc fetch first 20 rows only
;
AAS | backend_type | wait_event_type
-------+-------------------+-----------------
2.57 | client backend | IO
0.94 | client backend | CPU
0.45 | client backend | LWLock
0.16 | autovacuum worker | CPU
0.12 | autovacuum worker | IO
0.00 | autovacuum worker | LWLock
(6 rows)

This gives a better idea about which system component may be tuned to reduce the response time or the throughput. IO is the major component here with 2.57 AAS being on an I/O call. Let’s get more information about which kind of I/O.

Drilling down to the wait event:

with ash as (
select *,ceil(extract(epoch from max(ash_time)over()-min(ash_time)over()))::numeric samples
from pg_active_session_history where ash_time>=current_timestamp - interval '5 minutes'
) select round(count(*)::numeric/samples,2) as "AAS",
backend_type,wait_event_type,wait_event
from ash
group by samples,
backend_type,wait_event_type,wait_event
order by 1 desc fetch first 20 rows only
;
AAS | backend_type | wait_event_type | wait_event
-------+-------------------+-----------------+------------------
1.52 | client backend | IO | DataFileWrite
0.94 | client backend | CPU | CPU
0.46 | client backend | IO | DataFileExtend
0.41 | client backend | IO | DataFileRead
0.33 | client backend | LWLock | WALWriteLock
0.15 | autovacuum worker | CPU | CPU
0.12 | client backend | LWLock | buffer_mapping
0.10 | autovacuum worker | IO | DataFileRead
0.08 | client backend | IO | WALInitWrite
0.08 | client backend | IO | BufFileWrite
0.02 | client backend | IO | WALWrite
0.01 | autovacuum worker | IO | DataFileWrite
0.01 | client backend | IO | DataFilePrefetch
0.00 | client backend | LWLock | buffer_content
0.00 | autovacuum worker | LWLock | buffer_mapping
(15 rows)

This gives more information. The average 2.57 sessions active on IO are actually writing for 1.52 of them, waiting for a data file to be extended for 0.46 of them, and reading for 0.41 of them. That helps to focus on the areas where we might improve the performance, without wasting time on the events which are only a small part of the session activity.
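We can also combine dimensions. As a sketch following the same pattern (with ‘DataFileWrite’ taken from the output above), here is how to see which queries are responsible for the top wait event:

with ash as (
select *,ceil(extract(epoch from max(ash_time)over()-min(ash_time)over()))::numeric samples
from pg_active_session_history where ash_time>=current_timestamp - interval '5 minutes'
) select round(count(*)::numeric/samples,2) as "AAS",
queryid
from ash
where wait_event='DataFileWrite'
group by samples,
queryid
order by 1 desc fetch first 10 rows only
;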

Drill-down on queries

This was a drill-down on the system axis (wait events are system call instrumentation). This is useful when we think something is wrong on the system or the storage. But performance tuning must also drive the investigation on the application axis. The highest level is the user call, the TOP_LEVEL_QUERY:

with ash as (
select *,ceil(extract(epoch from max(ash_time)over()-min(ash_time)over()))::numeric samples
from pg_active_session_history where ash_time>=current_timestamp - interval '5 minutes'
) select round(count(*)::numeric/samples,2) as "AAS",
backend_type,top_level_query
from ash
group by samples,
backend_type,top_level_query
order by 1 desc fetch first 20 rows only
;
AAS | backend_type | top_level_query
-------+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0.95 | client backend | SELECT * FROM mypgio('pgio3', 50, 3000, 131072, 255, 8);
0.95 | client backend | SELECT * FROM mypgio('pgio2', 50, 3000, 131072, 255, 8);
0.95 | client backend | SELECT * FROM mypgio('pgio4', 50, 3000, 131072, 255, 8);
0.95 | client backend | SELECT * FROM mypgio('pgio1', 50, 3000, 131072, 255, 8);
0.25 | autovacuum worker | autovacuum: VACUUM ANALYZE public.pgio2
0.02 | client backend | commit;
0.01 | client backend | select * from pg_active_session_history where pid=21837 order by ash_time desc fetch first 1 rows only;
0.01 | client backend | with ash as ( +
| | select *,ceil(extract(epoch from max(ash_time)over()-min(ash_time)over()))::numeric samples +
| | from pg_active_session_history where ash_time>=current_timestamp - interval '5 minutes' +
...

Here I see 4 user calls responsible for most of the 4 active sessions related to the ‘client backend’, each one with AAS=0.95, and this is actually what is running: the PGIO benchmark (see https://kevinclosson.net/) with 4 sessions calling the mypgio function.

The function we see in TOP_LEVEL_QUERY is itself running some queries, and the big advantage of the pgSentinel extension, over pg_stat_activity, is the capture of the actual statement running, with the actual values of the parameters:

with ash as (
select *,ceil(extract(epoch from max(ash_time)over()-min(ash_time)over()))::numeric samples
from pg_active_session_history where ash_time>=current_timestamp - interval '5 minutes'
) select round(count(*)::numeric/samples,2) as "AAS",
backend_type,substr(query,1,100)
from ash
group by samples,
backend_type,substr(query,1,100)
order by 1 desc fetch first 20 rows only
;
AAS | backend_type | substr
-------+-------------------+----------------------------------------------------------------------------------------
0.26 | autovacuum worker |
0.02 | client backend | commit
0.02 | client backend | SELECT sum(scratch) FROM pgio1 WHERE mykey BETWEEN 3567 AND 3822
0.01 | client backend | SELECT sum(scratch) FROM pgio4 WHERE mykey BETWEEN 5729 AND 5984
0.01 | client backend | SELECT sum(scratch) FROM pgio4 WHERE mykey BETWEEN 5245 AND 5500
0.01 | client backend | truncate table l_ash.ps
0.01 | client backend | SELECT sum(scratch) FROM pgio1 WHERE mykey BETWEEN 3249 AND 3504
0.01 | client backend | SELECT sum(scratch) FROM pgio1 WHERE mykey BETWEEN 57 AND 312
0.01 | client backend | UPDATE pgio4 SET scratch = scratch + 1 WHERE mykey BETWEEN 3712 AND 3720
0.01 | client backend | SELECT sum(scratch) FROM pgio2 WHERE mykey BETWEEN 1267 AND 1522
0.01 | client backend | SELECT sum(scratch) FROM pgio1 WHERE mykey BETWEEN 703 AND 958
0.01 | client backend | SELECT sum(scratch) FROM pgio2 WHERE mykey BETWEEN 2025 AND 2280
0.01 | client backend | insert into l_ash.ps_diff +
| | select ps1.pid,ps1.uname,ps1.pr,ps1.ni,ps1.virt,ps1.res,ps1.shr,ps1.s,ps1.
0.01 | client backend | UPDATE pgio4 SET scratch = scratch + 1 WHERE mykey BETWEEN 2690 AND 2698
0.01 | client backend | SELECT sum(scratch) FROM pgio3 WHERE mykey BETWEEN 5463 AND 5718
0.01 | client backend | SELECT sum(scratch) FROM pgio4 WHERE mykey BETWEEN 1467 AND 1722
0.01 | client backend | SELECT sum(scratch) FROM pgio1 WHERE mykey BETWEEN 4653 AND 4908
(20 rows)

Here, nothing stands out at the top: we have only a few samples for each execution. This is because each execution is different (different values for the parameters) and they have balanced execution times. If one query were longer with a specific set of parameter values, it would show up at the top here.

Finally, we can also aggregate at a higher level than QUERY, with QUERYID, which is per prepared statement and does not change when executing with different parameter values. If we want to get the text, we can join with PG_STAT_STATEMENTS:

with ash as (
select *,datid dbid,ceil(extract(epoch from max(ash_time)over()-min(ash_time)over()))::numeric samples
from pg_active_session_history where ash_time>=current_timestamp - interval '5 minutes'
) select round(count(*)::numeric/samples,2) as "AAS",dbid,
backend_type,queryid,pg_stat_statements.query
from ash left outer join pg_stat_statements using(dbid,queryid)
group by samples,dbid,
backend_type,queryid,pg_stat_statements.query
order by 1 desc fetch first 15 rows only
;
AAS | dbid | backend_type | queryid | query
-------+-------+----------------+------------+------------------------------------------------------------------------------------------------------
0.89 | 17487 | client backend | 837728477 | SELECT sum(scratch) FROM pgio2 WHERE mykey BETWEEN 100926 AND 101181
0.70 | 17487 | client backend | 3411884874 | SELECT sum(scratch) FROM pgio4 WHERE mykey BETWEEN $1 AND $2
0.68 | 17487 | client backend | 1046864277 | SELECT sum(scratch) FROM pgio3 WHERE mykey BETWEEN 1591 AND 1846
0.67 | 17487 | client backend | 2994234299 | SELECT sum(scratch) FROM pgio1 WHERE mykey BETWEEN $1 AND $2
0.33 | 17487 | client backend | 1648177216 | UPDATE pgio1 SET scratch = scratch + 1 WHERE mykey BETWEEN 2582 AND 2590
0.32 | 17487 | client backend | 3381000939 | UPDATE pgio3 SET scratch = scratch + $1 WHERE mykey BETWEEN $2 AND $3
0.30 | 17487 | client backend | 1109524376 | UPDATE pgio4 SET scratch = scratch + 1 WHERE mykey BETWEEN 5462 AND 5470
0.11 | 17487 | client backend | 3355133240 | UPDATE pgio2 SET scratch = scratch + $1 WHERE mykey BETWEEN $2 AND $3
0.05 | 17547 | client backend | 2771355107 | update l_ash.parameters set value=now(),timestamp=now() where name=$1
0.05 | 17547 | client backend | 1235869898 | update l_ash.parameters set value=$1,timestamp=now() where name=$2
0.02 | 13806 | client backend | 935474258 | select * from pg_active_session_history where pid=$1 order by ash_time desc fetch first $2 rows only
0.01 | 13806 | client backend | 164740364 | with ash as ( +

This shows the main queries running: SELECT and UPDATE on PGIO1, PGIO2, PGIO3 and PGIO4. They run with different parameter values but have the same QUERYID. It seems that PG_STAT_STATEMENTS is not very consistent when capturing the query text: some entries show the parameter placeholders, others show actual values. But you must keep in mind that those are the prepared statements. We do not have 0.89 average sessions running 'SELECT sum(scratch) FROM pgio2 WHERE mykey BETWEEN 100926 AND 101181'. This is 'SELECT sum(scratch) FROM pgio2' running with different parameter values and, for whatever reason, the PG_STAT_STATEMENTS extension displays one set of values rather than 'BETWEEN $1 AND $2'.

Time dimension

Of course we can also query all samples and drill down with a graphical tool. For the time axis, this is a better visualization. Here is a quick Excel PivotChart from those 5 minutes of samples:
[Excel PivotChart: pg_active_session_history samples over the 5-minute window, active sessions per second broken down by wait event]
I always have 4 sessions running, as we have seen in the average, but the wait event detail is not uniform over the timeline. This is where you will drill down on the time axis. It can be helpful to investigate a short-duration issue, or to try to understand non-uniform response times. For example, coming from Oracle, I'm not used to this pattern where, from one second to the next, the wait profile is completely different. This is probably because of all the background activity such as vacuum, WAL, syncing buffers to disk, garbage collection,… The workload here, PGIO, the SLOB method for PostgreSQL, consists of short uniform queries. It would be interesting to have some statistics about the response time variation.
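If you prefer to build this time series in SQL before charting it, here is a minimal sketch that counts the active sessions per second and per wait event class over the same window:

select date_trunc('second',ash_time) as sample_time,
wait_event_type,
count(*) as active_sessions
from pg_active_session_history
where ash_time>=current_timestamp - interval '5 minutes'
group by 1,2
order by 1,2
;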

Note that in this database cluster, in addition to the PGIO workload, I have a small application running and committing very small changes occasionally, and this is why you see the peaks with 1 session on WALWrite and 4 sessions waiting on WALWriteLock. This adds to the chaos of waits.

This extension providing active session sampling is only the first component of pgSentinel, so do not spend too much time building queries, reports and graphs on this: let's see what will come next with pgSentinel:

pgSentinel is in progress….@postgresql @amplifypostgres @PostgreSQLFR @BertrandDrouvot @ckikof pic.twitter.com/Pwq8vB69MI

— pgSentinel (@Pg_Sentinel) July 11, 2018

 


PGDay Amsterdam – follow up 2 – Where do null values go to in a hash partitioned table?

Fri, 2018-07-13 23:17

This is the second follow-up, which covers this question: when you hash partition a table in PostgreSQL 11, where do null values for the partitioned column go? Let's go…

In the demo I used this little table:

postgres=# select version();
                                                            version                                                          
-----------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 11beta1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28), 64-bit
(1 row)
postgres=# create table part2 ( a int, list varchar(10) ) partition by hash (a);
CREATE TABLE
postgres=# create table part2_1 partition of part2 FOR VALUES WITH (MODULUS 3, REMAINDER 0);
CREATE TABLE
postgres=# create table part2_2 partition of part2 FOR VALUES WITH (MODULUS 3, REMAINDER 1);
CREATE TABLE
postgres=# create table part2_3 partition of part2 FOR VALUES WITH (MODULUS 3, REMAINDER 2);
CREATE TABLE
postgres=# \d+ part2
                                          Table "public.part2"
 Column |         Type          | Collation | Nullable | Default | Storage  | Stats target | Description 
--------+-----------------------+-----------+----------+---------+----------+--------------+-------------
 a      | integer               |           |          |         | plain    |              | 
 list   | character varying(10) |           |          |         | extended |              | 
Partition key: HASH (a)
Partitions: part2_1 FOR VALUES WITH (modulus 3, remainder 0),
            part2_2 FOR VALUES WITH (modulus 3, remainder 1),
            part2_3 FOR VALUES WITH (modulus 3, remainder 2)

The data we played with was this:

postgres=# insert into part2 (a,list) values (1,'beer');
INSERT 0 1
postgres=# insert into part2 (a,list) values (2,'whine');
INSERT 0 1
postgres=# insert into part2 (a,list) values (3,'schnaps');
INSERT 0 1
postgres=# select * from only part2_1;
 a | list  
---+-------
 2 | whine
(1 row)

postgres=# select * from only part2_2;
 a |  list   
---+---------
 3 | schnaps
(1 row)

postgres=# select * from only part2_3;
 a | list 
---+------
 1 | beer
(1 row)

We have the data evenly distributed over the three partitions. When we insert a row which contains a NULL value for the column we partitioned on:

postgres=# insert into part2 (a,list) values (null,'cocktail');
INSERT 0 1

… where does that row go?

postgres=# select * from only part2_1;
 a |   list   
---+----------
 2 | whine
   | cocktail
(2 rows)

postgres=# select * from only part2_2;
 a |  list   
---+---------
 3 | schnaps
(1 row)

postgres=# select * from only part2_3;
 a | list 
---+------
 1 | beer
(1 row)

It goes to the first partition, and every following row with a NULL partition key goes there as well:

postgres=# insert into part2 (a,list) values (null,'rum');
INSERT 0 1
postgres=# select * from only part2_1;
 a |   list   
---+----------
 2 | whine
   | cocktail
   | rum
(3 rows)

I couldn’t find anything in the documentation about that, so I sent a mail to the general mailing list and here is the answer: “The calculated hash value for the null value will be zero, therefore, it will fall to the partition having remainder zero.”
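To double-check where each row ended up without querying every partition one by one, the tableoid pseudo-column can be used. This is just a quick sketch against the demo table (output not shown here):

select tableoid::regclass as partition, a, list
from part2
order by 1, 2;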

 


Database Vault : Rules, Rule Sets and Command Rules

Fri, 2018-07-13 14:31

In a previous blog I talked about protecting data using Realms. With Database Vault we can also protect our database against some SQL statements. These statements can include SELECT, ALTER SYSTEM, data definition language (DDL), and data manipulation language (DML) statements.
We can do this with Command Rules. In this blog I am demonstrating how we can use a Command Rule to prevent SYS from creating a new pluggable database in a multitenant environment.

Before starting the demonstration, we can see that there are some predefined Command Rules which apply to all users.

SQL> show user
USER is "C##DBV_OWNER_ROOT"
SQL> show con_name

CON_NAME
------------------------------
PDB1
SQL> SELECT COMMAND, RULE_SET_NAME FROM DVSYS.DBA_DV_COMMAND_RULE;

COMMAND              RULE_SET_NAME
-------------------- --------------------------------------------------
ALTER PROFILE        Can Maintain Accounts/Profiles
ALTER SYSTEM         Allow Fine Grained Control of System Parameters
ALTER USER           Can Maintain Own Account
CHANGE PASSWORD      Can Maintain Own Account
CREATE PROFILE       Can Maintain Accounts/Profiles
CREATE USER          Can Maintain Accounts/Profiles
DROP PROFILE         Can Maintain Accounts/Profiles
DROP USER            Can Maintain Accounts/Profiles

8 rows selected.
SQL>

Because of these default Command Rules, for example, user sys cannot create a user once Database Vault is enabled.

SQL> conn sys/root@pdb1 as sysdba
Connected.
SQL> create user myuser identified by test;
create user myuser identified by test
                                 *
ERROR at line 1:
ORA-01031: insufficient privileges

To grant a user the ability to use these commands, you can grant the user the role that the rule set checks.

SQL> SELECT PRIVILEGE FROM DBA_SYS_PRIVS WHERE GRANTEE = 'DV_ACCTMGR';

PRIVILEGE
----------------------------------------
DROP PROFILE
ALTER PROFILE
ALTER USER
CREATE PROFILE
CREATE USER
CREATE SESSION
DROP USER

7 rows selected.

SQL>

To allow sys to create a user we can grant the DV_ACCTMGR role to SYS

SQL> show user
USER is "C##DBV_ACCTMGR_ROOT"

SQL> show con_name

CON_NAME
------------------------------
PDB1
SQL>

SQL> grant  DV_ACCTMGR to sys;

Grant succeeded.

And now SYS can create a user

SQL> conn sys/root@pdb1 as sysdba
Connected.
SQL> create user myuser identified by test;

User created.

SQL>

Before starting the demonstration let’s verify that user SYS, by default, can create a pluggable database

SQL> conn sys as sysdba
Enter password:
Connected.
SQL> show con_name

CON_NAME
------------------------------
CDB$ROOT
SQL> create pluggable database PDB2 ADMIN USER pdb2adm IDENTIFIED BY root create_file_dest='/u01/app/oracle/oradata/DBSEC/PDB2';

Pluggable database created.

SQL>

To prevent sys from creating a pluggable database, we are first going to create a RULE. This rule will determine when the command rule will be fired.

SQL> exec DVSYS.DBMS_MACADM.CREATE_RULE(rule_name => 'MY_PDB_RULE', 
                                        rule_expr => 'SYS_CONTEXT(''USERENV'', ''SESSION_USER'') != ''SYS''');

PL/SQL procedure successfully completed.

SQL>

Next, we have to create a RULE SET, which is a collection of one or more rules. We can associate a rule set with a realm authorization, factor assignment, command rule, or secure application role.

SQL> exec DVSYS.DBMS_MACADM.CREATE_RULE_SET(rule_set_name => 'MY_PDB_RULESET', 
                                            description => ' About managing Pdbs', 
                                            enabled => DBMS_MACUTL.G_YES, eval_options => DBMS_MACUTL.G_RULESET_EVAL_ANY,
                                            audit_options => DBMS_MACUTL.G_RULESET_AUDIT_FAIL + DBMS_MACUTL.G_RULESET_AUDIT_SUCCESS, 
                                            fail_options => DBMS_MACUTL.G_RULESET_FAIL_SILENT, fail_message => '', 
                                            fail_code => '', 
                                            handler_options => DBMS_MACUTL.G_RULESET_HANDLER_OFF, 
                                            handler => '',
                                            is_static => FALSE);

PL/SQL procedure successfully completed.
SQL>

We then add the RULE to the RULE SET

BEGIN
DVSYS.DBMS_MACADM.ADD_RULE_TO_RULE_SET(
                                       rule_set_name => 'MY_PDB_RULESET',
                                       rule_name => 'MY_PDB_RULE');
END;
   /

PL/SQL procedure successfully completed.

And finally we create a COMMAND RULE which will prevent SYS from executing a CREATE PLUGGABLE DATABASE statement:

SQL> exec DVSYS.DBMS_MACADM.CREATE_COMMAND_RULE(command=> 'CREATE PLUGGABLE DATABASE', 
                                                rule_set_name => 'MY_PDB_RULESET', 
                                                object_owner => DBMS_ASSERT.ENQUOTE_NAME('%',FALSE), 
                                                object_name => '%',
                                                enabled => 'Y');

PL/SQL procedure successfully completed.

SQL>
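To double-check that the command rule is registered, we can query the same view we used at the beginning of this post (a quick verification, using only the columns shown above):

SELECT COMMAND, RULE_SET_NAME
FROM DVSYS.DBA_DV_COMMAND_RULE
WHERE COMMAND = 'CREATE PLUGGABLE DATABASE';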

And now if we try to create a PDB with SYS:

SQL> show user
USER is "SYS"
SQL> show con_name

CON_NAME
------------------------------
CDB$ROOT
SQL>  CREATE PLUGGABLE DATABASE PDB3 ADMIN USER pdb3adm IDENTIFIED BY root create_file_dest='/u01/app/oracle/oradata/DBSEC/PDB3';
 CREATE PLUGGABLE DATABASE PDB3 ADMIN USER pdb3adm IDENTIFIED BY root create_file_dest='/u01/app/oracle/oradata/DBSEC/PDB3'
*
ERROR at line 1:
ORA-47400: Command Rule violation for CREATE PLUGGABLE DATABASE on PDB3

SQL>
 


PGDay Amsterdam – follow up 1 – Adding columns with a default value and changing the default value right after

Fri, 2018-07-13 01:32

As always, there were questions during my talk about the PostgreSQL 11 new features in Amsterdam that I could not immediately answer. The first one was this: suppose we add a column with a default value in PostgreSQL 11, what happens when we change that default afterwards? Does the table get rewritten? Do we end up with more than one distinct default value for that column? Here we go …

The sample table:

postgres=# select version();
                                                            version                                                            
-------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 11beta1 build on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28), 64-bit
(1 row)
postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 (a,b) 
           select a.*, md5(a::text) 
             from generate_series(1,1000) a;
INSERT 0 1000

Let's add a new column with a default value:

postgres=# alter table t1 add column c text default 'aa';;
ALTER TABLE

This populates the two columns in pg_attribute as described in a previous post:

postgres=# select atthasmissing,attmissingval 
             from pg_attribute 
            where attrelid = 't1'::regclass and attname = 'c';
 atthasmissing | attmissingval 
---------------+---------------
 t             | {aa}
(1 row)

When we check for the distinct values in column “c” we should only see one result (which is “aa”):

postgres=# select c, count(*) from t1 group by c;
 c  | count 
----+-------
 aa |  1000
(1 row)

If I understood the question correctly, the concern was: when we change the default now, do we see two results when we ask for the distinct values in column “c”? Of course not, and the table is not rewritten:

postgres=# alter table t1 alter column c set default 'bb';
ALTER TABLE
postgres=# select c, count(*) from t1 group by c;
 c  | count 
----+-------
 aa |  1000
(1 row)

postgres=# select atthasmissing,attmissingval from pg_attribute where attrelid = 't1'::regclass and attname = 'c';
 atthasmissing | attmissingval 
---------------+---------------
 t             | {aa}
(1 row)
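A simple way to confirm that no table rewrite happened (this check is a sketch, not part of the original demo) is to look at the relfilenode of the table before and after the ALTER: a rewrite would assign a new one, while here it stays the same:

select relname, relfilenode from pg_class where relname = 't1';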

What does that mean? For the existing rows the value is still “aa”, as that was the default when the column was added. For new rows we will get “bb”:

postgres=# \d t1
                  Table "public.t1"
 Column |  Type   | Collation | Nullable |  Default   
--------+---------+-----------+----------+------------
 a      | integer |           |          | 
 b      | text    |           |          | 
 c      | text    |           |          | 'bb'::text

postgres=# insert into t1 (a,b) values (1001,'aa');
INSERT 0 1
postgres=# select c, count(*) from t1 group by c;
 c  | count 
----+-------
 bb |     1
 aa |  1000
(2 rows)

I hope that answers the question. If not, please leave a comment.

 


What a great event – PGDay Amsterdam

Fri, 2018-07-13 00:14

PostgreSQL conferences are always cool and this time it was in Amsterdam: PGDay Amsterdam. Besides meeting all the great people, which was fun again, the location was really outstanding: the TOBACCO Theater:

[photo]

Some impressions:

Here you can see Devrim preparing the opening of the event with the introduction session:
[photo]

… and then it happened: We finally started:

[photo]

Jan kicked off the sessions with his talk about the reasons he finally arrived in the PostgreSQL community after working for years in another one:

[photo]

Oleksi took over to speak about ACID, transactions and much more, a great talk:

[photo]
I had the pleasure to speak about PostgreSQL 11 to close the first sessions before the coffee break:

[photo]

Stefanie followed with foreign data wrappers and data integration with PostgreSQL (another great one):

[photo]

And then there was something special: you might know that Devrim has a real PostgreSQL tattoo, and that was taken as an opportunity to offer temporary tattoos to everyone, which looked like this:

[photo]

[photo]

Hans rocked the stage right after:
[photo]

Devrim right after his talk about WAL:
[photo]

As in Rapperswil two weeks ago Bruce closed the sessions with his talk: Will PostgreSQL live forever:
[photo]

There have been other sessions not mentioned here, which were also great, but I didn't ask whether it was fine to publish the pictures. I could not attend the party after the event but I am sure that was great as well. See you next year. And never forget: PostgreSQL rocks :).

 


pgSentinel: the sampling approach for PostgreSQL

Thu, 2018-07-12 17:12

Here is the first test I did with the beta of pgSentinel. This Active Session History sampling is a new approach to Postgres tuning. For people coming from Oracle, this is something that has made our life a lot easier when optimizing database applications. Here is a quick example showing how it links together some information that is missing without this extension.

The installation of the extension is really easy (more details in Daniel's post):

cp pgsentinel.control /usr/pgsql-10/share/extension
cp pgsentinel--1.0.sql /usr/pgsql-10/share/extension
cp pgsentinel.so /usr/pgsql-10/lib

and declare it in postgresql.conf

grep -i pgSentinel $PGDATA/postgresql.conf
 
shared_preload_libraries = 'pg_stat_statements,pgsentinel'
#pgsentinel_ash.pull_frequency = 1
#pgsentinel_ash.max_entries = 1000000

and restart:

/usr/pgsql-10/bin/pg_ctl restart

Then create the extension in psql:

CREATE EXTENSION pgsentinel;

I was running PGIO (the SLOB method for PostgreSQL from Kevin Closson https://kevinclosson.net/)

Without the extension, here is what I can see about the current activity from the OS point of view, with ‘top -c':

top - 21:57:23 up 1 day, 11:22, 4 users, load average: 4.35, 4.24, 4.16
Tasks: 201 total, 2 running, 199 sleeping, 0 stopped, 0 zombie
%Cpu(s): 27.6 us, 19.0 sy, 0.0 ni, 31.0 id, 19.0 wa, 0.0 hi, 3.4 si, 0.0 st
KiB Mem : 4044424 total, 54240 free, 282220 used, 3707964 buff/cache
KiB Swap: 421884 total, 386844 free, 35040 used. 3625000 avail Mem
 
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
9766 postgres 20 0 440280 160036 150328 D 50.0 4.0 10:56.63 postgres: postgres pgio [local] SELECT
9762 postgres 20 0 439940 160140 150412 D 43.8 4.0 10:55.95 postgres: postgres pgio [local] SELECT
9761 postgres 20 0 440392 160088 150312 D 37.5 4.0 10:52.29 postgres: postgres pgio [local] SELECT
9763 postgres 20 0 440280 160080 150432 R 37.5 4.0 10:41.94 postgres: postgres pgio [local] SELECT
9538 postgres 20 0 424860 144464 142956 D 6.2 3.6 0:30.79 postgres: writer process

As I described in a previous post, PostgreSQL changes the title of the process to display the current operation. This looks interesting, but it is not very detailed (only ‘SELECT’ here) and even misleading because here I'm running PGIO with 50% updates. The ‘SELECT’ here is the user call, not the actual SQL statement running.

We have more information from PG_STAT_ACTIVITY, but again only the top-level call is displayed, as I mentioned in a previous post:

select * from pg_stat_activity where pid=9766;
-[ RECORD 1 ]----+---------------------------------------------------------
datid | 17487
datname | pgio
pid | 9766
usesysid | 10
usename | postgres
application_name | psql
client_addr |
client_hostname |
client_port | -1
backend_start | 2018-07-12 21:28:46.539052+02
xact_start | 2018-07-12 21:28:46.542203+02
query_start | 2018-07-12 21:28:46.542203+02
state_change | 2018-07-12 21:28:46.542209+02
wait_event_type | IO
wait_event | DataFileWrite
state | active
backend_xid | 37554
backend_xmin | 37553
query | SELECT * FROM mypgio('pgio4', 50, 3000, 131072, 255, 8);
backend_type | client backend

Here, I know what the user is doing: a call to mypgio() started at 21:28:46. And I know which resources are involved on the system: DataFileWrite. But again the most important piece is missing: the link between the user call and the system resources. And you can only guess it here because you know that a SELECT does not write to datafiles. There's something hidden in the middle, which is actually an UPDATE. Of course, we can see this UPDATE in PG_STAT_STATEMENTS. But there, it will not be linked with the current activity, the mypgio() call, nor the DataFileWrite wait event. And we also need some timing information to be able to see the database load over time.

Here is where the pgSentinel extension fills the gap, providing:

  • The actual query running, with the queryid which links to PG_STAT_STATEMENTS, but also the full text with all parameter values
  • multiple samples of the activity, with their timestamp information


select ash_time,pid,wait_event_type,wait_event,state,queryid,backend_type,top_level_query,query from pg_active_session_history order by ash_time desc,pid fetch first 10 rows only;
 
ash_time | pid | wait_event_type | wait_event | state | queryid | backend_type | top_level_query | query
-------------------------------+------+-----------------+---------------+--------+------------+----------------+----------------------------------------------------------+--------------------------------------------------------------------------
2018-07-12 21:57:22.991558+02 | 9761 | IO | DataFileWrite | active | 837728477 | client backend | SELECT * FROM mypgio('pgio2', 50, 3000, 131072, 255, 8); | SELECT sum(scratch) FROM pgio2 WHERE mykey BETWEEN 1065 AND 1320
2018-07-12 21:57:22.991558+02 | 9762 | IO | DataFileWrite | active | 1046864277 | client backend | SELECT * FROM mypgio('pgio3', 50, 3000, 131072, 255, 8); | SELECT sum(scratch) FROM pgio3 WHERE mykey BETWEEN 267 AND 522
2018-07-12 21:57:22.991558+02 | 9763 | IO | DataFileRead | active | 1648177216 | client backend | SELECT * FROM mypgio('pgio1', 50, 3000, 131072, 255, 8); | UPDATE pgio1 SET scratch = scratch + 1 WHERE mykey BETWEEN 1586 AND 1594
2018-07-12 21:57:22.991558+02 | 9766 | IO | DataFileWrite | active | 3411884874 | client backend | SELECT * FROM mypgio('pgio4', 50, 3000, 131072, 255, 8); | SELECT sum(scratch) FROM pgio4 WHERE mykey BETWEEN 3870 AND 4125
2018-07-12 21:57:21.990178+02 | 9761 | CPU | CPU | active | 837728477 | client backend | SELECT * FROM mypgio('pgio2', 50, 3000, 131072, 255, 8); | SELECT sum(scratch) FROM pgio2 WHERE mykey BETWEEN 13733 AND 13988
2018-07-12 21:57:21.990178+02 | 9762 | IO | DataFileRead | active | 1046864277 | client backend | SELECT * FROM mypgio('pgio3', 50, 3000, 131072, 255, 8); | SELECT sum(scratch) FROM pgio3 WHERE mykey BETWEEN 4135 AND 4390
2018-07-12 21:57:21.990178+02 | 9763 | IO | DataFileWrite | active | 2994234299 | client backend | SELECT * FROM mypgio('pgio1', 50, 3000, 131072, 255, 8); | SELECT sum(scratch) FROM pgio1 WHERE mykey BETWEEN 4347 AND 4602
2018-07-12 21:57:21.990178+02 | 9766 | CPU | CPU | active | 3411884874 | client backend | SELECT * FROM mypgio('pgio4', 50, 3000, 131072, 255, 8); | SELECT sum(scratch) FROM pgio4 WHERE mykey BETWEEN 14423 AND 14678
2018-07-12 21:57:20.985253+02 | 9761 | IO | DataFileWrite | active | 837728477 | client backend | SELECT * FROM mypgio('pgio2', 50, 3000, 131072, 255, 8); | SELECT sum(scratch) FROM pgio2 WHERE mykey BETWEEN 129 AND 384
2018-07-12 21:57:20.985253+02 | 9762 | IO | DataFileWrite | active | 1046864277 | client backend | SELECT * FROM mypgio('pgio3', 50, 3000, 131072, 255, 8); | SELECT sum(scratch) FROM pgio3 WHERE mykey BETWEEN 3313 AND 3568
(10 rows)

Everything is there. The timeline where each sample links together the user call (top_level_query), the running query (queryid and query – which is the text with parameter values), and the wait event (wait_event_type and wait_event).

Here is, on one sample, what is currently available in the beta version:

select * from pg_active_session_history where pid=9766 order by ash_time desc fetch first 1 rows only;
-[ RECORD 1 ]----+-----------------------------------------------------------------------
ash_time | 2018-07-12 21:57:23.992798+02
datid | 17487
datname | pgio
pid | 9766
usesysid | 10
usename | postgres
application_name | psql
client_addr |
client_hostname |
client_port | -1
backend_start | 2018-07-12 21:28:46.539052+02
xact_start | 2018-07-12 21:28:46.542203+02
query_start | 2018-07-12 21:28:46.542203+02
state_change | 2018-07-12 21:28:46.542209+02
wait_event_type | IO
wait_event | DataFileExtend
state | active
backend_xid | 37554
backend_xmin | 37553
top_level_query | SELECT * FROM mypgio('pgio4', 50, 3000, 131072, 255, 8);
query | UPDATE pgio4 SET scratch = scratch + 1 WHERE mykey BETWEEN 700 AND 708
queryid | 1109524376
backend_type | client backend

Then, what do we do with this? This is a fact table with many dimensions. And we can drill down on the database activity.

A quick overview of the load shows that I have, on average, 4 foreground sessions running for my user calls, and very low vacuuming activity:

postgres=# select backend_type
postgres-# ,count(*)/(select count(distinct ash_time)::float from pg_active_session_history) as load
postgres-# from pg_active_session_history
postgres-# group by backend_type
postgres-# ;
backend_type | load
-------------------+--------------------
client backend | 4.09720483938256
autovacuum worker | 0.07467667918231
(2 rows)

I’ll show in a future post how to query this view to drill down into the details. For the moment, here is a short explanation about the reason to go to a sampling approach.

Here is an abstract sequence diagram showing some typical user calls to the database. Several components are involved: CPU for the backend process or for background processes, the OS, the storage… Our tuning goal is to reduce the user call duration, and then to reduce or optimize the work done in the different layers. With the current statistics available in PostgreSQL, like PG_STAT_ACTIVITY or PG_STAT_STATEMENTS, or available from the OS (strace to measure system call duration), we have a vertical approach on the load. We can look at each component individually:
[Diagram: the vertical approach — cumulated statistics per component (backend CPU, background processes, OS, storage)]

This is basically what we did on Oracle before ASH (Active Session History) was introduced in 10g, 12 years ago. The activity sampling approach takes an orthogonal point of view. Rather than cumulating statistics for each component, it looks at what happens on the system at specific points in time, across all components. We don't have all measures (such as how many executions of a query) but only samples. However, each sample gives a complete view from the user call down to the system calls. And 1-second samples are sufficient to address any relevant activity, without taking too much space for a short retention. For each sample, we cover all layers end-to-end:
[Diagram: the horizontal approach — each sample covers all layers end-to-end]
This horizontal approach makes the link between the user calls (the user perception of the database performance) and the system resources where we can analyze and optimize. With this, we can ensure that our tuning activity always focuses on the problem (the user response time) by addressing the root cause on the right component.

 


Active session history in PostgreSQL: Say hello to pgSentinel

Thu, 2018-07-12 00:17

There is a new project, currently in beta, which aims to bring active session history (and probably more) to PostgreSQL: pgSentinel. Because PostgreSQL is highly extensible, such projects are possible and usually come as an extension. pgSentinel is no exception to that, so let's see how it can be installed. If you want to try the beta for yourself, please connect with the project on Twitter.

This is what I got:

postgres@pgbox:/home/postgres/beta/ [pg103] ll
total 120
-rw-r--r--. 1 postgres postgres   1108 Jul  8 22:13 pgsentinel--1.0.sql
-rw-r--r--. 1 postgres postgres    117 Jul  5 22:15 pgsentinel.control
-rwxr-xr-x. 1 postgres postgres 108000 Jul  9 11:12 pgsentinel.so
-rw-r--r--. 1 postgres postgres    623 Jul  9 11:22 readme.txt

You can already see from here that we will probably need to load a library because of the pgsentinel.so file. Let's copy the files to the correct locations, in my case:

postgres@pgbox:/home/postgres/beta/ [pg103] cp pgsentinel--1.0.sql pgsentinel.control /u01/app/postgres/product/10/db_3/share/extension/
postgres@pgbox:/home/postgres/beta/ [pg103] cp pgsentinel.so /u01/app/postgres/product/10/db_3/lib/

As I plan to run pgbench later to get some load onto the system I’ve created a separate database for installing the extension:

postgres@pgbox:/home/postgres/ [PG10] psql -c "create database bench" postgres
CREATE DATABASE
postgres@pgbox:/home/postgres/ [PG10] pgbench -i -s 10 bench

When we need to load a library we need to tell PostgreSQL about that by setting the shared_preload_libraries parameter. As pgsentinel depends on pg_stat_statements this needs to be installed as well.

postgres=# alter system set shared_preload_libraries='pg_stat_statements,pgsentinel';
ALTER SYSTEM

So once we have that set and the instance is restarted:

postgres@pgbox:/home/postgres/beta/ [PG10] pg_ctl -D $PGDATA restart -m fast

… we should see the new extension in the pg_available_extensions view:

postgres=# select * from pg_available_extensions where name = 'pgsentinel';
    name    | default_version | installed_version |        comment         
------------+-----------------+-------------------+------------------------
 pgsentinel | 1.0             |                   | active session history
(1 row)

Ready to install the extensions:

postgres=# create extension pg_stat_statements;
CREATE EXTENSION
postgres=# create extension pgsentinel ;
CREATE EXTENSION
postgres=# \dx
                                     List of installed extensions
        Name        | Version |   Schema   |                        Description                        
--------------------+---------+------------+-----------------------------------------------------------
 pg_stat_statements | 1.5     | public     | track execution statistics of all SQL statements executed
 pgsentinel         | 1.0     | public     | active session history
 plpgsql            | 1.0     | pg_catalog | PL/pgSQL procedural language
(3 rows)

So what did we get? One solution is to look at the sql file:

cat pgsentinel--1.0.sql
-- complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION pgsentinel" to load this file. \quit

CREATE FUNCTION pg_active_session_history(
    OUT ash_time timestamptz,
    OUT datid Oid,
    OUT datname text,
    OUT pid integer,
    OUT usesysid Oid,
    OUT usename text,
    OUT application_name text,
    OUT client_addr text,
    OUT client_hostname text,
    OUT client_port integer,
    OUT backend_start timestamptz,
    OUT xact_start timestamptz,
    OUT query_start timestamptz,
    OUT state_change timestamptz,
    OUT wait_event_type text,
    OUT wait_event text,
    OUT state text,
    OUT backend_xid xid,
    OUT backend_xmin xid,
    OUT top_level_query text,
    OUT query text,
    OUT queryid bigint,
    OUT backend_type text
   
)
RETURNS SETOF record
AS 'MODULE_PATHNAME', 'pg_active_session_history'
LANGUAGE C STRICT VOLATILE PARALLEL SAFE;

-- Register a view on the function for ease of use.
CREATE VIEW pg_active_session_history AS
  SELECT * FROM pg_active_session_history();

GRANT SELECT ON pg_active_session_history TO PUBLIC;

The other solution is to ask PostgreSQL directly:

bench=# \dx+ pgsentinel 
  Objects in extension "pgsentinel"
          Object description          
--------------------------------------
 function pg_active_session_history()
 view pg_active_session_history
(2 rows)

Basically we got a function and a view over that function. Let's have a look at the view then:

postgres=# \d pg_active_session_history
                   View "public.pg_active_session_history"
      Column      |           Type           | Collation | Nullable | Default 
------------------+--------------------------+-----------+----------+---------
 ash_time         | timestamp with time zone |           |          | 
 datid            | oid                      |           |          | 
 datname          | text                     |           |          | 
 pid              | integer                  |           |          | 
 usesysid         | oid                      |           |          | 
 usename          | text                     |           |          | 
 application_name | text                     |           |          | 
 client_addr      | text                     |           |          | 
 client_hostname  | text                     |           |          | 
 client_port      | integer                  |           |          | 
 backend_start    | timestamp with time zone |           |          | 
 xact_start       | timestamp with time zone |           |          | 
 query_start      | timestamp with time zone |           |          | 
 state_change     | timestamp with time zone |           |          | 
 wait_event_type  | text                     |           |          | 
 wait_event       | text                     |           |          | 
 state            | text                     |           |          | 
 backend_xid      | xid                      |           |          | 
 backend_xmin     | xid                      |           |          | 
 top_level_query  | text                     |           |          | 
 query            | text                     |           |          | 
 queryid          | bigint                   |           |          | 
 backend_type     | text                     |           |          | 

Most of the columns are already in pg_stat_activity but there is more. Before going further, let's generate some load:

postgres@pgbox:/home/postgres/ [PG10] pgbench -c 5 -j 4 -T 60 bench 
starting vacuum...end.
transaction type: 
scaling factor: 10
query mode: simple
number of clients: 5
number of threads: 4
duration: 60 s
number of transactions actually processed: 151397
latency average = 1.982 ms
tps = 2522.898859 (including connections establishing)
tps = 2523.280694 (excluding connections establishing)

Now we should see sampled data in the pg_active_session_history view:

bench=# select ash_time,top_level_query,query,queryid,wait_event_type,wait_event from pg_active_session_history limit 10;
           ash_time            |                               top_level_query                               |                                   query                                    |  queryid   | wait_event_type |  wait_event   
-------------------------------+-----------------------------------------------------------------------------+----------------------------------------------------------------------------+------------+-----------------+---------------
 2018-07-09 14:51:48.883599+02 | create database bench                                                       | create database bench                                                      | 3222771996 | CPU             | CPU
 2018-07-09 14:52:37.291115+02 | copy pgbench_accounts from stdin                                            | copy pgbench_accounts from stdin                                           | 4164808321 | CPU             | CPU
 2018-07-09 14:52:38.292674+02 | alter table pgbench_accounts add primary key (aid)                          | alter table pgbench_accounts add primary key (aid)                         | 4164808321 | CPU             | CPU
 2018-07-09 14:55:51.111621+02 | UPDATE pgbench_branches SET bbalance = bbalance + 2228 WHERE bid = 4;       | UPDATE pgbench_branches SET bbalance = bbalance + 2228 WHERE bid = 4       |  553956422 | Lock            | transactionid
 2018-07-09 14:55:51.111621+02 | END;                                                                        | END                                                                        | 3376944276 | CPU             | CPU
 2018-07-09 14:55:51.111621+02 | UPDATE pgbench_accounts SET abalance = abalance + -2408 WHERE aid = 973208; | UPDATE pgbench_accounts SET abalance = abalance + -2408 WHERE aid = 973208 | 2992934481 | CPU             | CPU
 2018-07-09 14:55:52.112507+02 | UPDATE pgbench_tellers SET tbalance = tbalance + -4957 WHERE tid = 87;      | UPDATE pgbench_tellers SET tbalance = tbalance + -4957 WHERE tid = 87      | 3459630226 | Client          | ClientRead
 2018-07-09 14:55:52.112507+02 | END;                                                                        | END                                                                        | 3376944276 | LWLock          | WALWriteLock
 2018-07-09 14:55:52.112507+02 | END;                                                                        | END                                                                        | 3376944276 | CPU             | CPU
 2018-07-09 14:55:52.112507+02 | UPDATE pgbench_branches SET bbalance = bbalance + -3832 WHERE bid = 8;      | UPDATE pgbench_branches SET bbalance = bbalance + -3832 WHERE bid = 8      |  553956422 | Lock            | transactionid
(10 rows)

The important point here is that we have the queryid which we can use to map that back to pg_stat_statements. So if we want to know what the shared_blks_* statistics for the update statement with query id 553956422 are, we can do that (or write a join over the two views, of course):

bench=# select shared_blks_hit,shared_blks_read,shared_blks_dirtied,shared_blks_written from pg_stat_statements where queryid = 553956422;
 shared_blks_hit | shared_blks_read | shared_blks_dirtied | shared_blks_written 
-----------------+------------------+---------------------+---------------------
          453201 |               29 |                  37 |                   0
(1 row)
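The join mentioned above could look like this — a minimal sketch that aggregates the samples per queryid and picks up the statement text and one of the block statistics from pg_stat_statements:

select ash.queryid, s.query, count(*) as samples, min(s.shared_blks_read) as shared_blks_read
from pg_active_session_history ash
left join pg_stat_statements s on s.queryid = ash.queryid
group by ash.queryid, s.query
order by samples desc
fetch first 10 rows only;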

Really looks promising, automatic session sampling in PostgreSQL. More tests to come …

 


Documentum xPlore: Failed to load content file + file system full

Wed, 2018-07-11 08:33

We had issues with our indexing server when sending batches of index requests to the queue on the xPlore server. Indexing was taking so much time that almost all queue items fell into the warning/error state.

In addition, our file system was growing too fast: 100 GB for only 30,000 docs. It looked like the temp folders weren't cleared properly. On the other hand, the queue was reporting that the temp files couldn't be loaded because they were cleared too early…

Thoughts

I observed the behavior of the temp folders on the index agent server and noticed that the files were only 10 minutes old, even 30 minutes after sending the batch of index requests. Hence, I deduced the clearing was still running, which could explain the index warnings saying that the file couldn't be found.

That means the processing of the indexes takes too long while the clearing thread runs anyway… But I noticed that the file system was still growing way too much.

Solution

If you didn't know: by default, the CPS (which parses the downloaded files) only has 1 thread. It means that if parsing takes too long (50 MB files in my case), the thread will be busy and other files will not be indexed during this time. But the documents will still be downloaded during this time, and the clearing process will still continue to harvest our beloved files.

The point here is to add more CPS threads to parallelize the process and thus avoid missing files. You can also increase the time between two clearing phases, but that is not efficient, while increasing the number of threads will improve your overall performance.

To do so, edit the following config file:

$DSEARCH_HOME/dsearch/cps/cps_daemon/PrimaryDsearch_local_configuration.xml

Change the following line from 1 to 4:

<daemon_count>1</daemon_count>

A restart will be needed. You can change the value from 1 to 6 maximum. Please note that xPlore uses two other threads for clearing and other processes, and it allows only 8 threads to run at the same time, so 6 is the maximum number of CPS daemons to run; otherwise you'll have issues with the clearing thread and you will end up with a full file system.

 


Creating and Using a Parcel Repository for Cloudera Manager

Wed, 2018-07-11 07:59

This blog post describes how to create a hosted Cloudera repository and use it in your Cloudera Manager deployment.

The first step is to install a web server, which will host the parcel files and their metadata. The common way is to use an Apache web server.

Installing Apache HTTPD service
[cdhtest@edge ]$ sudo yum install httpd -y

 

Starting Apache HTTPD service
[cdhtest@edge ]$ sudo systemctl start httpd

Verify that the service has been started properly.

[cdhtest@master html]$ sudo systemctl status httpd
* httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; disabled; vendor preset: disabled)
   Active: active (running) since Wed 2018-07-11 09:16:45 UTC; 1h 26min ago
     Docs: man:httpd(8)
           man:apachectl(8)
 Main PID: 53284 (httpd)
   Status: "Total requests: 40; Current requests/sec: 0; Current traffic:   0 B/sec"
   CGroup: /system.slice/httpd.service
           |-53284 /usr/sbin/httpd -DFOREGROUND
           |-53285 /usr/sbin/httpd -DFOREGROUND
           |-53286 /usr/sbin/httpd -DFOREGROUND
           |-53287 /usr/sbin/httpd -DFOREGROUND
           |-53288 /usr/sbin/httpd -DFOREGROUND
           |-53289 /usr/sbin/httpd -DFOREGROUND
           |-53386 /usr/sbin/httpd -DFOREGROUND
           |-53387 /usr/sbin/httpd -DFOREGROUND
           |-53388 /usr/sbin/httpd -DFOREGROUND
           `-58024 /usr/sbin/httpd -DFOREGROUND

Jul 11 09:16:45 master systemd[1]: Starting The Apache HTTP Server...
Jul 11 09:16:45 master httpd[53284]: AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 192.168.2.6. Set the 'ServerName' directive globally to suppress this message
Jul 11 09:16:45 master systemd[1]: Started The Apache HTTP Server.

 

Downloading Parcels for CDH5 and Publishing files

Download parcels according to your OS distribution for CDH5 (latest release) from the below link:

https://archive.cloudera.com/cdh5/parcels/latest/

Download the 2 files:

  • the .parcel file
  • manifest.json

Before downloading the files, create the CDH parcel directory tree in your web server.

[cdhtest@master html]$ cd /var/www/html/
[cdhtest@master html]$ sudo mkdir -p cdh5.15/
[cdhtest@master html]$ sudo chmod -R ugo+rX /var/www/html/cdh5.15/
[cdhtest@master html]$ cd /var/www/html/cdh5.15/
[cdhtest@master cdh5.15]$ sudo wget https://archive.cloudera.com/cdh5/parcels/latest/CDH-5.15.0-1.cdh5.15.0.p0.21-el5.parcel https://archive.cloudera.com/cdh5/parcels/latest/manifest.json
--2018-07-11 12:16:04--  https://archive.cloudera.com/cdh5/parcels/latest/CDH-5.15.0-1.cdh5.15.0.p0.21-el5.parcel
Resolving archive.cloudera.com (archive.cloudera.com)... 151.101.32.167
Connecting to archive.cloudera.com (archive.cloudera.com)|151.101.32.167|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1675168741 (1.6G) [binary/octet-stream]
Saving to: 'CDH-5.15.0-1.cdh5.15.0.p0.21-el5.parcel'

100%[==================================================================================================================================================================================================================================================================================================>] 1,675,168,741 53.2MB/s   in 29s

2018-07-11 12:16:32 (56.0 MB/s) - 'CDH-5.15.0-1.cdh5.15.0.p0.21-el5.parcel' saved [1675168741/1675168741]

--2018-07-11 12:16:32--  https://archive.cloudera.com/cdh5/parcels/latest/manifest.json
Reusing existing connection to archive.cloudera.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 74072 (72K) [application/json]
Saving to: 'manifest.json'

100%[====================================================================================================================================================================================================================================================================================================>] 74,072      --.-K/s   in 0s

2018-07-11 12:16:32 (225 MB/s) - 'manifest.json' saved [74072/74072]

FINISHED --2018-07-11 12:16:32--
Total wall clock time: 29s
Downloaded: 2 files, 1.6G in 29s (56.0 MB/s)
[cdhtest@master cdh5.15]$
[cdhtest@master cdh5.15]$ ll
total 1635984
-rw-r--r-- 1 root root 1675168741 Jun 14 18:06 CDH-5.15.0-1.cdh5.15.0.p0.21-el5.parcel
-rw-r--r-- 1 root root      74072 Jun 14 18:08 manifest.json

 

Your Parcels remote repository is now available

CM_Parcels8
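
A quick way to double-check that the repository is actually served over HTTP is to request the two files from another cluster host (a simple sketch; the hostname master and the cdh5.15 path are the ones used above):

curl -I http://master/cdh5.15/manifest.json
curl -I http://master/cdh5.15/CDH-5.15.0-1.cdh5.15.0.p0.21-el5.parcel

Both requests should return "HTTP/1.1 200 OK".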

 

Configuring the Cloudera Manager Server to Use the Parcel URL for Hosted Repositories

1. In the Cluster Installation – Select Repository step

Click on More Options.

CM_Parcels3

Add your Remote Parcel Repository URL.

CM_Parcels4

Then Cloudera Manager will download, distribute, unpack and activate parcels for all cluster hosts.

CM_Parcels5

 

2. You can also configure your local parcels repository in the Cloudera Manager Configuration Menu.

Click on Administration Menu > Settings

Click on Parcels category > Add your Remote Parcel Repository URL here

CM_Parcels6

Click Save Changes to commit the changes.

 

Activate Parcels

In the Cloudera Manager Parcels page,

Click on Check for New Parcels

Click on the Download, Distribute and Activate buttons for the parcels found.

CM_Parcels7

 

Cet article Creating and Using a Parcel Repository for Cloudera Manager est apparu en premier sur Blog dbi services.

Oracle Database Vault: Realm in a Pluggable Database

Tue, 2018-07-10 11:03

Database Vault can also be used in a multitenant environment. In a multitenant environment we must register Oracle Database Vault in the root first, and then in the PDBs.
In this blog we will see how we can use realms to protect data in a 12.1 pluggable database.

In CDB$ROOT we have to create the common accounts that will be used for the Database Vault Owner (DV_OWNER role) and Database Vault Account Manager (DV_ACCTMGR role). It is also recommended to create a backup account for each of these users.

SQL> conn sys as sysdba
Enter password:
Connected.
SQL> show con_name

CON_NAME
------------------------------
CDB$ROOT
SQL>

SQL> GRANT CREATE SESSION, SET CONTAINER TO c##dbv_owner_root IDENTIFIED BY root CONTAINER = ALL;

SQL> GRANT CREATE SESSION, SET CONTAINER TO c##dbv_acctmgr_root IDENTIFIED BY root CONTAINER = ALL;

SQL> grant select any dictionary to C##DBV_OWNER_ROOT;

Grant succeeded.

SQL> grant select any dictionary to C##DBV_ACCTMGR_ROOT;

Grant succeeded.

SQL>

The next step is to configure the Database Vault user accounts in CDB$ROOT:

BEGIN
 DVSYS.CONFIGURE_DV (
   dvowner_uname         => 'c##dbv_owner_root',
   dvacctmgr_uname       => 'c##dbv_acctmgr_root');
 END;
  6  /

PL/SQL procedure successfully completed.

SQL> @?/rdbms/admin/utlrp.sql

We can then enable Oracle Database Vault with the user c##dbv_owner_root in CDB$ROOT:

SQL> conn c##dbv_owner_root/root
Connected.
SQL> show user
USER is "C##DBV_OWNER_ROOT"
SQL> show con_name

CON_NAME
------------------------------
CDB$ROOT
SQL>

SQL> EXEC DBMS_MACADM.ENABLE_DV;

PL/SQL procedure successfully completed.

SQL>

After restarting CDB$ROOT, we can verify the status. These queries should return TRUE.

SQL> SELECT VALUE FROM V$OPTION WHERE PARAMETER = 'Oracle Database Vault';

VALUE
----------------------------------------------------------------
TRUE

SQL> SELECT VALUE FROM V$OPTION WHERE PARAMETER = 'Oracle Label Security';

VALUE
----------------------------------------------------------------
TRUE

SQL>  SELECT * FROM DVSYS.DBA_DV_STATUS;

NAME                STATUS
------------------- ----------------------------------------------------------------
DV_CONFIGURE_STATUS TRUE
DV_ENABLE_STATUS    TRUE

SQL>

At PDB level, we must register the common users we created earlier. In this example I am using a pluggable database named PDB1.

SQL> show user
USER is "SYS"
SQL> show con_name

CON_NAME
------------------------------
PDB1

SQL> GRANT CREATE SESSION, SET CONTAINER TO c##dbv_owner_root CONTAINER = CURRENT;

Grant succeeded.

SQL> GRANT CREATE SESSION, SET CONTAINER TO c##dbv_acctmgr_root CONTAINER = CURRENT;

Grant succeeded.

SQL>

SQL> grant select any dictionary to C##DBV_OWNER_ROOT;

Grant succeeded.

SQL> grant select any dictionary to C##DBV_ACCTMGR_ROOT;

Grant succeeded.

SQL>

As in CDB$ROOT, we also have to configure the Database Vault users in PDB1:

SQL> show user
USER is "SYS"
SQL> show con_name

CON_NAME
------------------------------
PDB1


SQL> BEGIN
 DVSYS.CONFIGURE_DV (
   dvowner_uname         => 'c##dbv_owner_root',
   dvacctmgr_uname       => 'c##dbv_acctmgr_root');
 END;
  6  /

PL/SQL procedure successfully completed.

SQL>

SQL> @?/rdbms/admin/utlrp.sql

And now let’s enable Oracle Database Vault on PDB1

SQL> show user
USER is "C##DBV_OWNER_ROOT"
SQL> show con_name

CON_NAME
------------------------------
PDB1
SQL> EXEC DBMS_MACADM.ENABLE_DV;

PL/SQL procedure successfully completed.

SQL>

With SYS let’s restart PDB1

SQL> show user
USER is "SYS"
SQL> show con_name

CON_NAME
------------------------------
PDB1
SQL> alter pluggable database pdb1 close immediate;

Pluggable database altered.

SQL> alter pluggable database pdb1 open;

Pluggable database altered.

As in CDB$ROOT we can verify

SQL> show con_name

CON_NAME
------------------------------
PDB1
SQL> SELECT VALUE FROM V$OPTION WHERE PARAMETER = 'Oracle Database Vault';

VALUE
----------------------------------------------------------------
TRUE

SQL> SELECT VALUE FROM V$OPTION WHERE PARAMETER = 'Oracle Label Security';

VALUE
----------------------------------------------------------------
TRUE

SQL> SELECT * FROM DVSYS.DBA_DV_STATUS;

NAME                STATUS
------------------- ----------------------------------------------------------------
DV_CONFIGURE_STATUS TRUE
DV_ENABLE_STATUS    TRUE

SQL>

Now that Database Vault is configured, we can create a realm to protect our data. In this example we are protecting the data of the SCOTT.EMP table. We are using EM 12c to create the realm.
From Database Home select Security and then Database Vault
dbvault1
In the Database Vault page, log in with any user having the appropriate privileges: the DV_OWNER or DV_ADMIN role, and SELECT ANY DICTIONARY
dbvault2
Before creating the realm, we can verify that the user SYSTEM has access to the table SCOTT.EMP:

SQL> show user
USER is "SYSTEM"
SQL> show con_name

CON_NAME
------------------------------
PDB1
SQL> select * from scott.emp;

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7369 SMITH      CLERK           7902 17-DEC-80        800                    20
      7499 ALLEN      SALESMAN        7698 20-FEB-81       1600        300         30
      7521 WARD       SALESMAN        7698 22-FEB-81       1250        500         30
      7566 JONES      MANAGER         7839 02-APR-81       2975                    20
      7654 MARTIN     SALESMAN        7698 28-SEP-81       1250       1400         30
      7698 BLAKE      MANAGER         7839 01-MAY-81       2850                    30
      7782 CLARK      MANAGER         7839 09-JUN-81       2450                    10
      7788 SCOTT      ANALYST         7566 19-APR-87       3000                    20
      7839 KING       PRESIDENT            17-NOV-81       5000                    10
      7844 TURNER     SALESMAN        7698 08-SEP-81       1500          0         30
      7876 ADAMS      CLERK           7788 23-MAY-87       1100                    20
      7900 JAMES      CLERK           7698 03-DEC-81        950                    30

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7902 FORD       ANALYST         7566 03-DEC-81       3000                    20
      7934 MILLER     CLERK           7782 23-JAN-82       1300                    10

14 rows selected.

SQL>

Under the Administration tab, select Realms
dbvault3
And Click on Create
Give a name and a description for the realm
dbvault4
Click on Next
On the Realm Secured Objects page, click on Add
dbvault5
Click on OK
dbvault6
Click on Next
On the Realm Authorization page, select Add
dbvault7
Click on OK
dbvault8
Click Next
On the Review page Click Finish
dbvault9
At the end we should have:
dbvault10
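
For reference, the same realm can also be created from the command line with the DBMS_MACADM package, without EM. The sketch below assumes a connection as the Database Vault owner created earlier, an arbitrary realm name, and EDGE as the authorized user (as in the test further down):

sqlplus c##dbv_owner_root/root@pdb1 <<'EOF'
BEGIN
  -- create the realm (name and description are assumptions for this sketch)
  DBMS_MACADM.CREATE_REALM(
    realm_name    => 'SCOTT EMP Realm',
    description   => 'Protects SCOTT.EMP',
    enabled       => DBMS_MACUTL.G_YES,
    audit_options => DBMS_MACUTL.G_REALM_AUDIT_FAIL);
  -- put the SCOTT.EMP table under the protection of the realm
  DBMS_MACADM.ADD_OBJECT_TO_REALM(
    realm_name   => 'SCOTT EMP Realm',
    object_owner => 'SCOTT',
    object_name  => 'EMP',
    object_type  => 'TABLE');
  -- authorize the user that must keep access to the protected objects
  DBMS_MACADM.ADD_AUTH_TO_REALM(
    realm_name   => 'SCOTT EMP Realm',
    grantee      => 'EDGE',
    auth_options => DBMS_MACUTL.G_REALM_AUTH_PARTICIPANT);
END;
/
EOF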
And that’s all. We can verify now that SYSTEM is no longer allowed to query SCOTT.EMP

SQL> conn system/root@pdb1
Connected.
SQL> show user
USER is "SYSTEM"
SQL> show con_name

CON_NAME
------------------------------
PDB1
SQL> select * from scott.emp;
select * from scott.emp
                    *
ERROR at line 1:
ORA-01031: insufficient privileges
SQL>

And that the user EDGE is still allowed to query SCOTT.EMP:

SQL> show user
USER is "EDGE"
SQL> show con_name

CON_NAME
------------------------------
PDB1
SQL> select * from scott.emp;

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7369 SMITH      CLERK           7902 17-DEC-80        800                    20
      7499 ALLEN      SALESMAN        7698 20-FEB-81       1600        300         30
      7521 WARD       SALESMAN        7698 22-FEB-81       1250        500         30
      7566 JONES      MANAGER         7839 02-APR-81       2975                    20
      7654 MARTIN     SALESMAN        7698 28-SEP-81       1250       1400         30
      7698 BLAKE      MANAGER         7839 01-MAY-81       2850                    30
      7782 CLARK      MANAGER         7839 09-JUN-81       2450                    10
      7788 SCOTT      ANALYST         7566 19-APR-87       3000                    20
      7839 KING       PRESIDENT            17-NOV-81       5000                    10
      7844 TURNER     SALESMAN        7698 08-SEP-81       1500          0         30
      7876 ADAMS      CLERK           7788 23-MAY-87       1100                    20
      7900 JAMES      CLERK           7698 03-DEC-81        950                    30

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7902 FORD       ANALYST         7566 03-DEC-81       3000                    20
      7934 MILLER     CLERK           7782 23-JAN-82       1300                    10

14 rows selected.

SQL>

 

Cet article Oracle Database Vault: Realm in a Pluggable Database est apparu en premier sur Blog dbi services.

Create an HDFS user’s home directory

Tue, 2018-07-10 07:07

Let’s assume we need to create an HDFS home directory for a user named “dbitest”.

We first need to verify whether the user exists on the local filesystem. It's important to understand that HDFS maps users from the local filesystem.

[cdhtest@master ~]$ cat /etc/passwd | grep dbitest

Create a user on the local file system

If the user does not exist yet, we can easily create it with its associated group.

[cdhtest@master ~]$ sudo groupadd dbitest

[cdhtest@master ~]$ sudo useradd -g dbitest -d/home/dbitest dbitest

[cdhtest@master ~]$ cat /etc/passwd | grep dbitest
dbitest:x:1002:1002::/home/dbitest:/bin/bash
[cdhtest@master ~]$

Note that the user dbitest should be created on all cluster hosts.

Create a directory in HDFS for a new user

Then we can create the directory under /user in HDFS for the new user dbitest. This directory needs to be created as the hdfs user, since hdfs is the superuser for administrative commands.

[cdhtest@master ~]$ sudo -u hdfs hdfs dfs -mkdir /user/dbitest

 

Verify the owner for our new directory
[cdhtest@master ~]$ sudo -u hdfs hdfs dfs -ls /user
Found 5 items
drwxr-xr-x   - hdfs   supergroup          0 2018-07-10 10:10 /user/dbitest
drwxrwxrwx   - mapred hadoop              0 2018-07-10 07:54 /user/history
drwxrwxr-t   - hive   hive                0 2018-07-10 07:55 /user/hive
drwxrwxr-x   - hue    hue                 0 2018-07-10 07:55 /user/hue
drwxrwxr-x   - oozie  oozie               0 2018-07-10 07:56 /user/oozie

The new home directory has been created, but it's owned by the hdfs user.

Change owner for /user/dbitest directory

Use the below command to change the owner of the new user home directory created.

[cdhtest@master ~]$ sudo -u hdfs hdfs dfs -chown dbitest:dbitest /user/dbitest

Let’s see if the owner has changed.

[cdhtest@master ~]$ sudo -u hdfs hdfs dfs -ls /user
Found 5 items
drwxr-xr-x   - dbitest dbitest          0 2018-07-10 10:10 /user/dbitest
drwxrwxrwx   - mapred  hadoop           0 2018-07-10 07:54 /user/history
drwxrwxr-t   - hive    hive             0 2018-07-10 07:55 /user/hive
drwxrwxr-x   - hue     hue              0 2018-07-10 07:55 /user/hue
drwxrwxr-x   - oozie   oozie            0 2018-07-10 07:56 /user/oozie

Change permissions

Change the permissions of the newly created home directory so that no user other than the owner has read, write or execute permissions.

[cdhtest@master ~]$ sudo -u hdfs hdfs dfs -chmod 700 /user/dbitest
[cdhtest@master ~]$ sudo -u hdfs hdfs dfs -ls /user
Found 6 items
drwxr-xr-x   - admins  cdhtest          0 2018-07-10 08:56 /user/cdhtest
drwx------   - dbitest dbitest          0 2018-07-10 10:10 /user/dbitest
drwxrwxrwx   - mapred  hadoop           0 2018-07-10 07:54 /user/history
drwxrwxr-t   - hive    hive             0 2018-07-10 07:55 /user/hive
drwxrwxr-x   - hue     hue              0 2018-07-10 07:55 /user/hue
drwxrwxr-x   - oozie   oozie            0 2018-07-10 07:56 /user/oozie

 

Test the user dbitest home directory

We can now test the user home directory creation by uploading data into it without specifying the destination directory. The file will be automatically uploaded to the user’s home directory if no destination is specified.

[cdhtest@master ~]$ sudo su dbitest
[dbitest@master ~]$ hdfs dfs -ls /user/dbitest
[dbitest@master ~]$ hdfs dfs -put HelloWorld.txt
[dbitest@master ~]$ hdfs dfs -ls /user/dbitest
Found 1 items
-rw-r--r--   3 dbitest dbitest         39 2018-07-10 10:30 /user/dbitest/HelloWorld.txt

 

Your user home directory has been created successfully.
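
When you have several users to onboard, the whole procedure can be scripted. Here is a minimal sketch based on the commands above (the username is passed as a parameter, and the script is meant to be run by a sudoer on a host with HDFS client access):

#!/bin/bash
# create a local user/group and its HDFS home directory
USERNAME=${1:?usage: $0 username}
sudo groupadd "${USERNAME}"
sudo useradd -g "${USERNAME}" -d "/home/${USERNAME}" "${USERNAME}"
sudo -u hdfs hdfs dfs -mkdir "/user/${USERNAME}"
sudo -u hdfs hdfs dfs -chown "${USERNAME}:${USERNAME}" "/user/${USERNAME}"
sudo -u hdfs hdfs dfs -chmod 700 "/user/${USERNAME}"

Remember that the local user still has to be created on all cluster hosts, as noted above.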

 

Cet article Create an HDFS user’s home directory est apparu en premier sur Blog dbi services.

Deploy a Cloudera cluster with Terraform and Ansible in Azure – part 3

Tue, 2018-07-10 03:42

After the deployment step with Terraform and the configuration/installation with Ansible, we will continue the installation of our Cloudera cluster with Cloudera Manager.

By following the steps below you will see how to install CDH on our hosts using Cloudera Manager.

Connection

First, log in to the Cloudera Manager URL.

Cloudera-Manager

When you connect to C.M for the first time, you need to accept the Cloudera Terms and Conditions.

Cloudera-Manager-Conditions

Then choose your desired edition of Cloudera. For this blog post, we will use the Data Hub trial edition (60 days trial).

C.M-Edition

 

Hosts discovery

In this step, provide the IP or hostname of all the cluster machines you want to use. To complete this step, check that the /etc/hosts file of each cluster host is properly defined.
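
Here is a sketch of what such an /etc/hosts file could look like (only the master entry comes from the setup above; the worker IPs and names are placeholders to adapt to your own cluster):

192.168.2.6   master   master.localdomain
192.168.2.7   worker1  worker1.localdomain
192.168.2.8   worker2  worker2.localdomain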

Cloudera-Manager-Hosts

Once all hosts are reachable by the Cloudera Manager server, continue to the next step.

CDH Installation

This step is about the version of CDH to install in your cluster.

C.M-Installation_Methods

Use the parcels installation method.

By default, the parcel directory is /opt/cloudera/parcels. A best practice is to have a separate filesystem for /opt (at least 15GB), in order to separate the Cloudera installation from the root (/) filesystem.

If you don’t have a specific filesystem for /opt, you will have some performance impact on your server.

Java automatic installation

Since we already installed the Java JDK automatically with Ansible, we don't need to check the box for Java installation. Skip this step.

CM-Java_install

Account details

In this step, we provide the user account information that Cloudera Manager will use to install all the needed components on all cluster hosts.

It's not recommended to give root access to Cloudera Manager; use a dedicated user with sudo access instead. For our example we will use the user created during the installation part: dbi, with its associated password.

CM-Users1

Cloudera Installation – Install Agents

In this step, Cloudera Manager will install and configure the cloudera-scm-agent on all cluster hosts.

CM-Install-Agents

 

Cloudera Installation – Parcels installation

After cloudera-scm-agent installation and configuration, Cloudera Manager will install the CDH Parcel and additional parcels on all cluster hosts.

CM-Parcels2

Cloudera Installation – Hosts Inspector

In this step, the host’s inspector will check all cluster hosts requirement and notify you if you have any problem.

Note that, you can go through all validation section to understand all pre-requisites for Cloudera installation. You will get to know the complete checklists items Cloudera use to validate your host’s cluster.

CM-Hosts-Inspector

You can ignore the warnings for now and resolve them after the installation. Click on the Finish button and go to the next step.

Cluster Setup – Select Services

In this step, choose the services to install. For our sandbox environment we will only install Core Hadoop first.

CM-Services

 

Cluster Setup – Customize Role Assignments

Assign roles to hosts and click on Continue.

CM-Roles

 

Cluster Setup – Setup databases

In this step, set up the remote databases for the Hive metastore, Hue, the Cloudera Reports Manager and the Oozie server.

CM-Databases

Test the connection and click on Continue.

Cluster Setup – Review changes

Ensure that you use the /data directory previously created with Terraform and Ansible.

CM-ReviewsUsers

 

Cluster Setup – Start services

CM-StartServices

 

Congratulations, your Cloudera cluster is now installed and configured!

CM-End

CM-Dashboard

 

 

Cet article Deploy a Cloudera cluster with Terraform and Ansible in Azure – part 3 est apparu en premier sur Blog dbi services.

Data Guard: always set db_create_file_dest on the standby

Mon, 2018-07-09 12:21

The file name convert parameters are not dynamic and require a restart of the instance. An enhancement request was filed in 2011. I mentioned recently on Twitter that it can be annoying with Active Data Guard when a file on the primary server is created on a path that has no file name conversion. However, Ian Baugaard mentioned that there is a workaround for this specific case because db_create_file_dest is dynamic:

I recall seeing a MOS note indicating the order of precedence when it comes to db_file_name_convert and db_create_file_dest. The latter wins and makes config much easier especially when using OMF and ASM

— Ian Baugaard (@IanBaugaard) July 5, 2018

I’ve quickly created a 18c Data Guard configuration on the Oracle Cloud DBaaS to test it and here it is.

In the primary database and the standby database, here are the datafiles:

RMAN> report schema;
 
Report of database schema for database with db_unique_name ORCL_01
 
List of Permanent Datafiles
===========================
File Size(MB) Tablespace RB segs Datafile Name
---- -------- -------------------- ------- ------------------------
1 830 SYSTEM YES /u02/app/oracle/oradata/ORCL/system01.dbf
3 510 SYSAUX NO /u02/app/oracle/oradata/ORCL/sysaux01.dbf
4 60 UNDOTBS1 YES /u02/app/oracle/oradata/ORCL/undotbs01.dbf
5 340 PDB$SEED:SYSTEM NO /u02/app/oracle/oradata/ORCL/pdbseed/system01.dbf
6 620 PDB$SEED:SYSAUX NO /u02/app/oracle/oradata/ORCL/pdbseed/sysaux01.dbf
7 5 USERS NO /u02/app/oracle/oradata/ORCL/users01.dbf
8 200 PDB$SEED:UNDOTBS1 NO /u02/app/oracle/oradata/ORCL/pdbseed/undotbs01.dbf
12 340 PDB1:SYSTEM YES /u02/app/oracle/oradata/ORCL/PDB1/system01.dbf
13 620 PDB1:SYSAUX NO /u02/app/oracle/oradata/ORCL/PDB1/sysaux01.dbf
14 200 PDB1:UNDOTBS1 YES /u02/app/oracle/oradata/ORCL/PDB1/undotbs01.dbf
15 50 PDB1:USERS NO /u02/app/oracle/oradata/ORCL/PDB1/PDB1_users01.dbf
 
List of Temporary Files
=======================
File Size(MB) Tablespace Maxsize(MB) Tempfile Name
---- -------- -------------------- ----------- --------------------
1 33 TEMP 32767 /u04/app/oracle/oradata/temp/temp01.dbf
2 62 PDB$SEED:TEMP 32767 /u04/app/oracle/oradata/temp/pdbseed_temp012018-02-08_13-49-27-256-PM.dbf
4 62 PDB1:TEMP 32767 /u04/app/oracle/oradata/temp/temp012018-02-08_13-49-27-256-PM.dbf

The properties of the standby database define no DbFileNameConvert because the directory structure is supposed to be the same:

DGMGRL> show configuration
 
Configuration - fsc
 
Protection Mode: MaxPerformance
Members:
ORCL_01 - Primary database
ORCL_02 - Physical standby database
 
Fast-Start Failover: DISABLED
 
Configuration Status:
SUCCESS (status updated 45 seconds ago)
 
 
DGMGRL> show database verbose 'ORCL_02';
 
Database - ORCL_02
 
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds (computed 0 seconds ago)
Apply Lag: 0 seconds (computed 0 seconds ago)
Average Apply Rate: 15.00 KByte/s
Active Apply Rate: 532.00 KByte/s
Maximum Apply Rate: 535.00 KByte/s
Real Time Query: ON
Instance(s):
ORCL
 
Properties:
DGConnectIdentifier = 'ORCL_02'
...
DbFileNameConvert = ''
LogFileNameConvert = 'dummy, dummy'
...
 
Log file locations:
Alert log : /u01/app/oracle/diag/rdbms/orcl_02/ORCL/trace/alert_ORCL.log
Data Guard Broker log : /u01/app/oracle/diag/rdbms/orcl_02/ORCL/trace/drcORCL.log
 
Database Status:
SUCCESS

You can see that Oracle defines a dummy log file name convert. This is a good idea to avoid some RMAN duplicate issues.

On the standby server, I have no db_create_file_dest defined:

SQL> show parameter create%dest
 
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_create_file_dest string
db_create_online_log_dest_1 string .
db_create_online_log_dest_2 string
db_create_online_log_dest_3 string
db_create_online_log_dest_4 string
db_create_online_log_dest_5 string

Note that the Oracle Cloud DBaaS defines it. I’ve reset it for the purpose of this demo.

New filesystem on Primary server only

I create a new filesystem on the primary server:

[root@DG-dg01 opc]# mkdir /DATA ; chown oracle:dba /DATA

I create a datafile on this new filesystem:

SQL> alter session set container=PDB1;
Session altered.
 
SQL> create tablespace FRANCK datafile '/DATA/franck.dbf' size 100M;
Tablespace created.

The apply is stuck:

DGMGRL> show database 'ORCL_02';
 
Database - ORCL_02
 
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds (computed 2 seconds ago)
Apply Lag: 11 seconds (computed 2 seconds ago)
Average Apply Rate: 16.00 KByte/s
Real Time Query: OFF
Instance(s):
ORCL
 
Database Error(s):
ORA-16766: Redo Apply is stopped
 
Database Status:
ERROR

The standby alert.log shows the error about the impossibility to create the datafile:

2018-07-06T08:04:59.077730+00:00
Errors in file /u01/app/oracle/diag/rdbms/orcl_02/ORCL/trace/ORCL_pr00_29393.trc:
ORA-01274: cannot add data file that was originally created as '/DATA/franck.dbf'
2018-07-06T08:04:59.111881+00:00
Background Media Recovery process shutdown (ORCL)

db_file_name_convert

The first idea is to set a db_file_name_convert, however, this requires an instance restart, which means downtime when you have sessions on the Active Data Guard standby:

DGMGRL> edit database 'ORCL_02' set property DbFileNameConvert='/DATA,/u02/app/oracle/oradata/ORCL';
Warning: ORA-16675: database instance restart required for property value modification to take effect
 
Property "dbfilenameconvert" updated
 
DGMGRL> show database 'ORCL_02';
 
Database - ORCL_02
 
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds (computed 2 seconds ago)
Apply Lag: 3 minutes 32 seconds (computed 2 seconds ago)
Average Apply Rate: 16.00 KByte/s
Real Time Query: OFF
Instance(s):
ORCL
Warning: ORA-16675: database instance restart required for property value modification to
take effect
Warning: ORA-16714: the value of property DbFileNameConvert is inconsistent with the member setting
 
Database Error(s):
ORA-16766: Redo Apply is stopped
 
Database Warning(s):
ORA-16853: apply lag has exceeded specified threshold
 
Database Status:
ERROR

db_create_file_dest

The solution is to set db_create_file_dest which, on the standby, has a higher priority than the convert parameters:

SQL> alter system set db_create_file_dest='/u02/app/oracle/oradata';
System altered.

I restart the apply:

DGMGRL> edit database 'ORCL_02' set state=apply-on;
Succeeded.

No need to restart, and future datafiles will be created there. However, it is too late for this datafile, as it has already been created as UNNAMED in the controlfile:

ORA-01186: file 18 failed verification tests
ORA-01157: cannot identify/lock data file 18 - see DBWR trace file
ORA-01111: name for data file 18 is unknown - rename to correct file
ORA-01110: data file 18: '/u01/app/oracle/product/18.0.0/dbhome_1/dbs/UNNAMED00018'

Manual CREATE DATAFILE

Then I must manually create it, but I cannot do that while I am in standby_file_management=auto:

SQL> alter session set container=PDB1;
Session altered.
 
SQL> alter database create datafile '/u01/app/oracle/product/18.0.0/dbhome_1/dbs/UNNAMED00018' as '/u02/app/oracle/oradata/ORCL/franck.dbf';
alter database create datafile '/u01/app/oracle/product/18.0.0/dbhome_1/dbs/UNNAMED00018' as '/u02/app/oracle/oradata/ORCL/franck.dbf'
*
ERROR at line 1:
ORA-01275: Operation CREATE DATAFILE is not allowed if standby file management
is automatic.

This can be changed dynamically:

DGMGRL> edit database 'ORCL_02' set property StandbyFileManagement=manual;
Property "standbyfilemanagement" updated

And then the creation is possible:

SQL> alter session set container=PDB1;
Session altered.
 
SQL> alter database create datafile '/u01/app/oracle/product/18.0.0/dbhome_1/dbs/UNNAMED00018' as new;
Database altered.

You can see that, because I have defined db_create_file_dest, I don't need to name the datafile and can create it as OMF with the 'new' keyword.

Now I can start the apply and it will resolve the gap:

DGMGRL> edit database 'ORCL_02' set state=apply-on;
Succeeded.
 
DGMGRL> show database 'ORCL_02';
 
Database - ORCL_02
 
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds (computed 3 seconds ago)
Apply Lag: 0 seconds (computed 3 seconds ago)
Average Apply Rate: 22.00 KByte/s
Real Time Query: ON
Instance(s):
ORCL
Warning: ORA-16675: database instance restart required for property value modification to take effect
Warning: ORA-16714: the value of property DbFileNameConvert is inconsistent with the member setting
 
Database Status:
WARNING

Do not forget to put standby_file_management back to auto:

DGMGRL> edit database 'ORCL_02' set property StandbyFileManagement=auto;
Property "standbyfilemanagement" updated

So, now that db_create_file_dest is set, new datafiles will be created automatically as OMF (Oracle Managed Files), without caring about file name conversion:

SQL> alter session set container=PDB1;
Session altered.
 
SQL> alter tablespace FRANCK add datafile '/DATA/franck2.dbf' size 100M;
Tablespace altered.

This is confirmed from the standby alert.log:

(4):Datafile 19 added to flashback set
(4):Successfully added datafile 19 to media recovery
(4):Datafile #19: '/u02/app/oracle/oradata/ORCL_02/7050211FE75F26FAE05392781D0AADAA/datafile/o1_mf_franck_fmybw332_.dbf'

Conclusion

Always define db_create_file_dest in the standby database so that datafiles can always be created. Better to have them in the wrong place than to stop the apply. And anyway, if you don't like the OMF names, and you are at least in 12c Enterprise Edition, you can change their names later with an online move:

SQL> alter session set container=PDB1;
Session altered.
 
SQL> alter database move datafile '/u02/app/oracle/oradata/ORCL_02/7050211FE75F26FAE05392781D0AADAA/datafile/o1_mf_franck_fmybw332_.dbf' to '/u02/app/oracle/oradata/ORCL/franck2.dbf';
Database altered.

 

Cet article Data Guard: always set db_create_file_dest on the standby est apparu en premier sur Blog dbi services.

My tmux scripts to script demos

Fri, 2018-07-06 10:27

When I did my first demo using tmux, I wrote the following blog post to explain it: https://blog.dbi-services.com/using-tmux-for-semi-interactive-demos/. Since then I've done all my demos with this technique. My scripts and configuration files have evolved, and I have now uploaded them on GitHub to share them: https://github.com/FranckPachot/scripts/tree/master/tmux-demo

The README.md explains the pre-requisites (a recent version of Tmux), how to download the scripts quickly (I use this to get everything ready on a new Cloud host), and how to run them. The 'demo' alias starts tmux attached to the same session (in order to see it on the laptop and the beamer). The .vimrc defines the macros to run lines from the script file as tmux-send commands.
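
If you just want the idea behind the 'demo' alias without cloning the repository, it is essentially the following (a sketch of the principle, not the exact content of the repo):

# attach to the 'demo' session if it already exists, create it otherwise,
# so that the laptop and the beamer show the same tmux session
alias demo='tmux new-session -A -s demo'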

If you have questions, the best is to ask them on Twitter:

Here are the scripts and configuration files I use for my demos with tmux: https://t.co/OuuhHClmt3

— Franck Pachot (@FranckPachot) July 6, 2018

 

Cet article My tmux scripts to script demos est apparu en premier sur Blog dbi services.

18c: Order by in WITH clause is not preserved

Thu, 2018-07-05 14:29

For a previous post, I ran on 18c a script of mine that gets the V$MYSTAT delta values between two queries. This script (new version available on GitHub) generates the queries to store some values and subtract them on the next execution. But I had to fix it for 18c because I relied on an ORDER BY in a CTE, which is lost in 18c.
The idea was to get the statistic names in a Common Table Expression (CTE):

with stats as (
select rownum n,stat_id,name from (select stat_id,name from v$statname where name in (&names) order by stat_id)
)

and query it from different parts of the UNION ALL which generates the script:

select 'select ' from dual
union all
select ' '||decode(n,1,' ',',')||'"CUR'||stat_id||'" - '||'&'||'LAG'||stat_id||' "DIF'||stat_id||'"' from stats
union all
select ' '||',nvl("CUR'||stat_id||'",0) "CUR'||stat_id||'"' from stats
union all

The rownum in the WITH clause is used later to add the comma for all rows except the first one – using decode(n,1,' ',',')

But this relied on two assumptions:

  1. the WITH keeps the rows ordered
  2. the UNION ALL keeps the rows ordered

Of course, it would be better to add a number on each part and an ORDER BY at the end of the query. But for better readability, I didn’t.
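
For reference, here is a sketch of what that more robust version could look like, on the small example used later in this post (just the principle, not the actual script; the connection string is an assumption):

sqlplus -s demo/demo@//localhost/PDB1 <<'EOF'
-- tag each branch and order explicitly instead of relying on the CTE order
with cte as (select id from DEMO)
select part, row_number() over (partition by part order by id) rn, id
from (
  select 1 part, id from cte
  union all
  select 2 part, id from cte
)
order by part, id;
EOF

With the explicit part number and the final ORDER BY, the result no longer depends on how the CTE is materialized.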

However, the first assumption is wrong since 12.2 where the optimizer introduced In-Memory Cursor Duration Temp Tables. In the query above, the CTE is materialized because it is used multiple times. And this optimization keeps it in memory. But it seems that this structure does not keep the rows ordered.

Here is an example where I have a 4 rows table:

SQL> create table DEMO as select rownum id from xmltable('1 to 4');
Table DEMO created.
 
SQL> select * from DEMO order by id;
 
ID
----------
1
2
3
4

When I put the same query in a WITH clause, with its ORDER BY, and query it from 2 union all statements, the rows are not ordered anymore:

SQL> with cte as ( select * from DEMO order by id)
select rownum,id from cte
union all
select rownum,id from cte ;
 
ROWNUM ID
---------- ----------
1 4
2 3
3 2
4 1
1 4
2 3
3 2
4 1

The execution plan shows the materialization of the CTE result, and mentions that it is an In-Memory Cursor Duration Temp Table (CURSOR DURATION MEMORY)

SQL> select * from dbms_xplan.display_cursor(format=>'+projection');
 
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 9uwc06ana6trn, child number 0
-------------------------------------
with cte as ( select * from DEMO order by id) select rownum,id from
cte union all select rownum,id from cte
 
Plan hash value: 4025392480
 
----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 4 (100)| |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6640_23D74B | | | | |
| 3 | SORT ORDER BY | | 4 | 12 | 4 (25)| 00:00:01 |
| 4 | TABLE ACCESS FULL | DEMO | 4 | 12 | 3 (0)| 00:00:01 |
| 5 | UNION-ALL | | | | | |
| 6 | COUNT | | | | | |
| 7 | VIEW | | 4 | 52 | 2 (0)| 00:00:01 |
| 8 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6640_23D74B | 4 | 12 | 2 (0)| 00:00:01 |
| 9 | COUNT | | | | | |
| 10 | VIEW | | 4 | 52 | 2 (0)| 00:00:01 |
| 11 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6640_23D74B | 4 | 12 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------------------
 
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - STRDEF[22], STRDEF[22]
2 - SYSDEF[4], SYSDEF[0], SYSDEF[1], SYSDEF[120], SYSDEF[0]
3 - (#keys=1) "DEMO"."ID"[NUMBER,22]
4 - (rowset=256) "DEMO"."ID"[NUMBER,22]
5 - STRDEF[22], STRDEF[22]
6 - "ID"[NUMBER,22], ROWNUM[8]
7 - "ID"[NUMBER,22]
8 - "C0"[NUMBER,22]
9 - "ID"[NUMBER,22], ROWNUM[8]
10 - "ID"[NUMBER,22]
11 - "C0"[NUMBER,22]

/*+ inline */

If the CTE is not materialized, the rows are ordered, but the table is read two times:


SQL> with cte as ( select /*+ inline */ * from DEMO order by id)
2 select rownum,id from cte
3 union all
4 select rownum,id from cte ;
 
ROWNUM ID
---------- ----------
1 1
2 2
3 3
4 4
1 1
2 2
3 3
4 4
 
8 rows selected.
 
SQL>
SQL> select * from dbms_xplan.display_cursor(format=>'+projection');
 
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 0dad0vwg3k0th, child number 0
-------------------------------------
with cte as ( select /*+ inline */ * from DEMO order by id) select
rownum,id from cte union all select rownum,id from cte
 
Plan hash value: 2913170750
 
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 8 (100)| |
| 1 | UNION-ALL | | | | | |
| 2 | COUNT | | | | | |
| 3 | VIEW | | 4 | 52 | 4 (25)| 00:00:01 |
| 4 | SORT ORDER BY | | 4 | 12 | 4 (25)| 00:00:01 |
| 5 | TABLE ACCESS FULL| DEMO | 4 | 12 | 3 (0)| 00:00:01 |
| 6 | COUNT | | | | | |
| 7 | VIEW | | 4 | 52 | 4 (25)| 00:00:01 |
| 8 | SORT ORDER BY | | 4 | 12 | 4 (25)| 00:00:01 |
| 9 | TABLE ACCESS FULL| DEMO | 4 | 12 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------
 
Column Projection Information (identified by operation id):
-----------------------------------------------------------
 
1 - STRDEF[22], STRDEF[22]
2 - "ID"[NUMBER,22], ROWNUM[8]
3 - "ID"[NUMBER,22]
4 - (#keys=1) "DEMO"."ID"[NUMBER,22]
5 - (rowset=256) "DEMO"."ID"[NUMBER,22]
6 - "ID"[NUMBER,22], ROWNUM[8]
7 - "ID"[NUMBER,22]
8 - (#keys=1) "DEMO"."ID"[NUMBER,22]
9 - (rowset=256) "DEMO"."ID"[NUMBER,22]

“_in_memory_cdt”=off

If the CTE is materialized, but the new feature to keep it in memory is disabled, the rows are ordered (but probably by chance – there is no guarantee):


SQL> alter session set "_in_memory_cdt"=off;
 
Session altered.
 
SQL>
SQL> with cte as ( select /*+ materialize */ * from DEMO order by id)
2 select rownum,id from cte
3 union all
4 select rownum,id from cte ;
 
ROWNUM ID
---------- ----------
1 1
2 2
3 3
4 4
1 1
2 2
3 3
4 4
 
8 rows selected.
 
SQL> select * from dbms_xplan.display_cursor(format=>'+projection');
 
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 9uwc06ana6trn, child number 1
-------------------------------------
with cte as ( select * from DEMO order by id) select rownum,id from
cte union all select rownum,id from cte
 
Plan hash value: 4025392480
 
--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 4 (100)| |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT | SYS_TEMP_0FD9D664D_23D74B | | | | |
| 3 | SORT ORDER BY | | 4 | 12 | 4 (25)| 00:00:01 |
| 4 | TABLE ACCESS FULL | DEMO | 4 | 12 | 3 (0)| 00:00:01 |
| 5 | UNION-ALL | | | | | |
| 6 | COUNT | | | | | |
| 7 | VIEW | | 4 | 52 | 2 (0)| 00:00:01 |
| 8 | TABLE ACCESS FULL | SYS_TEMP_0FD9D664D_23D74B | 4 | 12 | 2 (0)| 00:00:01 |
| 9 | COUNT | | | | | |
| 10 | VIEW | | 4 | 52 | 2 (0)| 00:00:01 |
| 11 | TABLE ACCESS FULL | SYS_TEMP_0FD9D664D_23D74B | 4 | 12 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------
 
Column Projection Information (identified by operation id):
-----------------------------------------------------------
 
1 - STRDEF[22], STRDEF[22]
2 - SYSDEF[4], SYSDEF[0], SYSDEF[1], SYSDEF[120], SYSDEF[0]
3 - (#keys=1) "DEMO"."ID"[NUMBER,22]
4 - (rowset=256) "DEMO"."ID"[NUMBER,22]
5 - STRDEF[22], STRDEF[22]
6 - "ID"[NUMBER,22], ROWNUM[8]
7 - "ID"[NUMBER,22]
8 - "C0"[NUMBER,22]
9 - "ID"[NUMBER,22], ROWNUM[8]
10 - "ID"[NUMBER,22]
11 - "C0"[NUMBER,22]

So what?

We should never rely on the preservation of the row order except when explicitly documented (like rownum over a subquery with order by). And this In-Memory Cursor Duration Temporary table also works in parallel and RAC, so we can understand that the rows are not read in the same order as they were stored. This is always a problem when we migrate: when something works by chance in one version and not in a newer one, people usually blame the migration rather than recognizing an old bug in the application, just because it was working before.

 

Cet article 18c: Order by in WITH clause is not preserved est apparu en premier sur Blog dbi services.

ODA fiber interface link detection issue

Thu, 2018-07-05 11:55

Configuring fiber interfaces during an ODA installation can sometimes be challenging. The configuration can depend on the kind of gigabit interface converter (GBIC) and switch being used. In this blog, I would like to show a real case faced during one of our customer deployments, and the solution we came up with.

Problem

We were facing a link detection issue when using the GBIC converter delivered by ORACLE, while using a CISCO GBIC converter made the link available.
The em2 interface was used with a CISCO GBIC.
The em3 interface was used with an ORACLE GBIC.

[root@TEST1 tmp]# ethtool em2
Settings for em2:
        Supported ports: [ FIBRE ]
        Supported link modes:   1000baseT/Full
                                10000baseT/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Advertised link modes:  1000baseT/Full
                                10000baseT/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Speed: 10000Mb/s
        Duplex: Full
        Port: FIBRE
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000000 (0)

        Link detected: yes
        
[root@TEST1 tmp]# ethtool em3
Settings for em3:
        Supported ports: [ FIBRE ]
        Supported link modes:   1000baseT/Full
                                10000baseT/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Advertised link modes:  1000baseT/Full
                                10000baseT/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Speed: Unknown!
        Duplex: Unknown! (255)
        Port: FIBRE
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000000 (0)

        Link detected: no

 

Solution

Forcing the interface to 10Gb and disabling auto-negotiation solved the link detection issue with the ORACLE GBIC converter.
This was done by updating both ifcfg-em2 and ifcfg-em3 (stored in /etc/sysconfig/network-scripts) with:

ETHTOOL_OPTS="speed 10000 duplex full autoneg off"
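
The ETHTOOL_OPTS setting is applied when the interface is brought up. To apply the same settings immediately on the running interfaces, without waiting for a network restart, something like the following should also work (to be validated on your own hardware):

ethtool -s em2 speed 10000 duplex full autoneg off
ethtool -s em3 speed 10000 duplex full autoneg off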

 

Both links could then be detected successfully.

[root@TEST1 ~]# ethtool em2
Settings for em2:
        Supported ports: [ FIBRE ]
        Supported link modes:   1000baseT/Full
                                10000baseT/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Advertised link modes:  Not reported
        Advertised pause frame use: No
        Advertised auto-negotiation: No
        Speed: 10000Mb/s
        Duplex: Full
        Port: FIBRE
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: off
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000000 (0)

        Link detected: yes
        
[root@TEST1 ~]# ethtool em3
Settings for em3:
        Supported ports: [ FIBRE ]
        Supported link modes:   1000baseT/Full
                                10000baseT/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Advertised link modes:  Not reported
        Advertised pause frame use: No
        Advertised auto-negotiation: No
        Speed: 10000Mb/s
        Duplex: Full
        Port: FIBRE
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: off
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000000 (0)

        Link detected: yes

 

SFP28 Ethernet Controller firmware upgrade

The firmware version we were running was 20.06.04.06.

[root@TEST1 ~]# ethtool -i em3
driver: bnxt_en
version: 1.8.0
firmware-version: 20.6.141/1.8.1 pkg 20.06.04.06
bus-info: 0000:18:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: no
supports-priv-flags: no

In case of further trouble, a firmware upgrade to version 20.08.01.14 can help, as described in the following MOS note:
Using the onboard SFP28 ports on an ODA X7-2 server node (Doc ID 2373070.1)

 

Cet article ODA fiber interface link detection issue est apparu en premier sur Blog dbi services.

18c: some optimization about redo size

Tue, 2018-07-03 15:19

Some years ago, at the time of the 12.1 release, I published in the SOUG newsletter some tests to show the amount of redo generated by different operations on a 10000 rows table. I had run them on 12.2 without seeing any difference, and now I have run them on 18.1.
I get the statistics from mystat using a script that displays them as columns, with the value being the difference from the previous run. I’ve run the same as in the article, and most of the statistics were in the same ballpark.
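
The real script is on GitHub, but the idea behind it can be summarized with this minimal sketch: take a snapshot of the session statistics before and after the tested statement and subtract the two (the connection string is an assumption, and the real script generates the diff automatically):

sqlplus -s demo/demo@//localhost/PDB1 <<'EOF'
select n.name, s.value
from   v$mystat s join v$statname n on n.statistic# = s.statistic#
where  n.name in ('db block changes','redo size','undo change vector size','redo entries')
order  by n.name;
EOF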

12.2

In 12.2 I get the same numbers. I was surprised about that because there is an optimization on 12.2 when updating a column to the same value. But this optimization may not occur for all situations. This reduction of redo generation has been analyzed by Maris Elsins with redo log dumps and by Sayan Malakshinov on triggers. And then the origin of this optimization has been exposed by Bryn Llewellyn. All info and links from the following Twitter conversation:

I hadn't blogged in a long time, but there was a very good reason for a new post: A redo optimization in 12cR2. https://t.co/M9LRNy3PuS @Pythian

— Maris Elsins (@MarisDBA) June 26, 2018

Here are the numbers I have in 18c, which are very similar to those from 12.1


SQL> --- update with different values (the original ones were all lowercase)
SQL>
SQL> update TEST_TABLE set a=upper(a),b=upper(b),c=upper(c),d=upper(d);
10000 rows updated.
 
SQL> commit;
Commit complete.
 
SQL> @ _mystat_diff.sql
SQL> set termout off verify off
SQL> /
 
db block changes redo size undo change vector size redo entries
---------------- ---------------- ----------------------- ----------------
190,485 32,226,368 13,288,940 95,258
 
1 row selected.
 
SQL>
SQL> --- update with same valus
SQL>
SQL> update TEST_TABLE set a=upper(a),b=upper(b),c=upper(c),d=upper(d);
10000 rows updated.
 
SQL> commit;
Commit complete.
 
SQL> @ _mystat_diff.sql
SQL> set termout off verify off
SQL> /
 
db block changes redo size undo change vector size redo entries
---------------- ---------------- ----------------------- ----------------
20,346 4,594,528 1,844,012 10,085
 
1 row selected.

The second update has updated the 4 columns with the same values. For about 1MB of data (10000 rows with avg_row_len=100), we have 1.8MB of undo and 4.4MB of redo (which covers the new data and the undo). I have 4 indexes there, but Oracle does not update index entries when the old and new values are the same.

The first update changes all values, and then, in addition to the changes in the table block, the indexes must be updated.

So, here, on my test, it seems that the 12.2 optimization referenced in the tweet above did not occur, because the redo generated for the table blocks is still full logging when the old and new values are the same. I can check from a redo log dump that I have the same value in undo and redo:

REDO RECORD - Thread:1 RBA: 0x000008.00002444.0010 LEN: 0x01c8 VLD: 0x01 CON_UID: 1008806272
SCN: 0x00000000002cb8a4 SUBSCN:3047 07/03/2018 12:23:22
CHANGE #1 CON_ID:4 TYP:0 CLS:36 AFN:14 DBA:0x02405a20 OBJ:4294967295 SCN:0x00000000002cb8a4 SEQ:34 OP:5.1 ENC:0 RBL:0 FLG:0x0000
ktudb redo: siz: 184 spc: 2020 flg: 0x0022 seq: 0x0147 rec: 0x22
xid: 0x000a.009.000002bd
ktubu redo: slt: 9 rci: 33 opc: 11.1 objn: 77968 objd: 77978 tsn: 0
Undo type: Regular undo Undo type: Last buffer split: No
Tablespace Undo: No
0x00000000
KDO undo record:
KTB Redo
op: 0x02 ver: 0x01
compat bit: 4 (post-11) padding: 1
op: C uba: 0x02405a20.0147.21
KDO Op code: URP row dependencies Disabled
xtype: XAxtype KDO_KDOM2 flags: 0x00000080 bdba: 0x0040a994 hdba: 0x0040a7d8
itli: 3 ispac: 0 maxfr: 4863
tabn: 0 slot: 3(0x3) flag: 0x2c lock: 0 ckix: 0
ncol: 4 nnew: 4 size: 0
Vector content:
col 0: [24] 31 30 30 30 30 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
col 1: [24] 31 30 30 30 30 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
col 2: [24] 31 30 30 30 30 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
col 3: [24] 31 30 30 30 30 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
CHANGE #2 CON_ID:4 TYP:0 CLS:1 AFN:12 DBA:0x0040a994 OBJ:77978 SCN:0x00000000002cb8a4 SEQ:3 OP:11.5 ENC:0 RBL:0 FLG:0x0000
KTB Redo
op: 0x02 ver: 0x01
compat bit: 4 (post-11) padding: 1
op: C uba: 0x02405a20.0147.22
KDO Op code: URP row dependencies Disabled
xtype: XAxtype KDO_KDOM2 flags: 0x00000080 bdba: 0x0040a994 hdba: 0x0040a7d8
itli: 3 ispac: 0 maxfr: 4863
tabn: 0 slot: 3(0x3) flag: 0x2c lock: 3 ckix: 0
ncol: 4 nnew: 4 size: 0
Vector content:
col 0: [24] 31 30 30 30 30 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
col 1: [24] 31 30 30 30 30 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
col 2: [24] 31 30 30 30 30 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
col 3: [24] 31 30 30 30 30 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58

The redo record has the old and new values even when they are the same. I hope that we will have more documentation about the 12.2 redo optimization so that it can be predictable and tunable.

18.1

So, I didn’t see the 12.2 optimizations I expected here. However, but it seems that we have one with deletes on 18c.

In 12.2 the delete of all 10000 rows without index generates 2MB of undo and 3.5MB of redo:

SQL> --- delete all rows
SQL>
SQL> delete from TEST_TABLE;
10000 rows deleted.
 
SQL> commit;
Commit complete.
 
SQL> @ _mystat_diff.sql
SQL> set termout off verify off
SQL> /
 
db block changes redo size undo change vector size redo entries
---------------- ---------------- ----------------------- ----------------
20,690 3,670,476 2,053,292 10,145
 
1 row selected.

The same in 18.1 generates only 1MB of undo and redo:

SQL> --- delete all rows
SQL>
SQL> delete from TEST_TABLE;
10000 rows deleted.
 
SQL> commit;
Commit complete.
 
SQL> @ _mystat_diff.sql
SQL> set termout off verify off
SQL> /
 
db block changes redo size undo change vector size redo entries
---------------- ---------------- ----------------------- ----------------
872 1,187,120 1,116,812 293
 
1 row selected.

So in 12.2 I had one block change, and one redo record per row deleted. In 18.1 it seems that I have one redo record per block where all rows are deleted. Still in the same Twitter conversation, Tanel Poder had the idea to do a sparse delete leaving one row in each block:

SQL> delete from TEST_TABLE where dbms_rowid.rowid_row_number(rowid)!=42;
9849 rows deleted.
 
SQL> commit;
Commit complete.
 
SQL> @ _mystat_diff.sql
SQL> set termout off verify off
SQL> /
 
db block changes redo size undo change vector size redo entries
---------------- ---------------- ----------------------- ----------------
20,431 3,660,204 2,102,584 10,011

Now, as there are no blocks that are fully emptied, I'm back to one redo entry per row deleted.

 

Cet article 18c: some optimization about redo size est apparu en premier sur Blog dbi services.

Event Sourcing: CQN is not a replacement for CDC

Mon, 2018-07-02 16:02

We are in an era where software architects want to stream the transactions out of the database and distribute them, as events, to multiple microservices. Don’t ask why, but that’s the trend: store inconsistent eventually consistent copies of data in different physical components, rather than simply using logical views in the same database, where the data is ACIDely stored, processed and protected. Because it was decided that this segregation, in CQRS (Command Query Responsibility Segregation), will be physical, on different systems, the need for logical replication and change data capture is raising, with a new name: Event Sourcing.

When we want to replicate the changes without adding an overhead to the database, the solution is Change Data Capture from the redo stream. The redo contains all the physical changes and, with dictionary information and a little supplemental logging, we can mine it to extract the logical changes. Currently there are commercial products (Oracle GoldenGate, Attunity, Dbvisit replicate) and some open source ones based on LogMiner (StreamSets, Debezium). LogMiner is available on all Oracle Database editions without any option. In Enterprise Edition, a more efficient solution was possible with Streams, but now you have to pay for GoldenGate to use Streams. Unfortunately, sometimes you pay for software updates only to get features removed and sold back in additional products.
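
To give an idea of the LogMiner approach, here is a minimal sketch (just the principle: the log file name and the owner/table mined are assumptions, and minimal supplemental logging must be enabled to get usable SQL_REDO):

sqlplus / as sysdba <<'EOF'
-- register one redo or archived log file and mine it with the online dictionary
exec dbms_logmnr.add_logfile(logfilename=>'/u03/app/oracle/redo01.log', options=>dbms_logmnr.new);
exec dbms_logmnr.start_logmnr(options=>dbms_logmnr.dict_from_online_catalog + dbms_logmnr.committed_data_only);
select scn, operation, seg_owner, table_name, sql_redo
from   v$logmnr_contents
where  seg_owner='DEMO' and table_name='DEMO';
exec dbms_logmnr.end_logmnr;
EOF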

Oracle has another feature that can help to replicate changes: Database Change Notification, now known as Continuous Query Notification (CQN) or Object Change Notification (OCN). This feature has been implemented to refresh caches: you have a query that loads the cache and you want to be notified when some changes occurred, so that you can update/refresh the cache. Then, in theory, this can be used to stream out the changes. However, CQN was not built for frequent changes but rather for nearly static, or slowly changing data. But sometimes we have to test for ourselves, and here is my test using CQN with a lot of changes on the underlying table, just to show how it increases the load on the database and slows down the changes.

I create a DEMO table with one million rows:

17:21:56 SQL> whenever sqlerror exit failure;
17:21:56 SQL> create table DEMO (ID constraint DEMO_ID primary key) as select rownum from xmltable('1 to 1000000');
 
Table DEMO created.
 

And a table to hold notifications. As always when I want to start with an example, I start to get it from oracle-base:

17:21:58 SQL> -- from Tim Hall https://oracle-base.com/articles/10g/dbms_change_notification_10gR2
17:21:58 SQL> CREATE TABLE notifications (
2 id NUMBER,
3 message VARCHAR2(4000),
4 notification_date DATE
5 );
 
Table NOTIFICATIONS created.
 
17:21:58 SQL> CREATE SEQUENCE notifications_seq;
 
Sequence NOTIFICATIONS_SEQ created.

The callback function:

17:21:58 SQL> CREATE OR REPLACE PROCEDURE callback (ntfnds IN SYS.chnf$_desc) IS
2 l_regid NUMBER;
3 l_table_name VARCHAR2(60);
4 l_event_type NUMBER;
5 l_numtables NUMBER;
6 l_operation_type NUMBER;
7 l_numrows NUMBER;
8 l_row_id VARCHAR2(20);
9 l_operation VARCHAR2(20);
10 l_message VARCHAR2(4000) := NULL;
11 BEGIN
12 l_regid := ntfnds.registration_id;
13 l_numtables := ntfnds.numtables;
14 l_event_type := ntfnds.event_type;
15 IF l_event_type = DBMS_CHANGE_NOTIFICATION.EVENT_OBJCHANGE THEN
16 FOR i IN 1 .. l_numtables LOOP
17 l_table_name := ntfnds.table_desc_array(i).table_name;
18 l_operation_type := ntfnds.table_desc_array(i).Opflags;
19 IF (BITAND(l_operation_type, DBMS_CHANGE_NOTIFICATION.ALL_ROWS) = 0) THEN
20 l_numrows := ntfnds.table_desc_array(i).numrows;
21 ELSE
22 l_numrows :=0; /* ROWID INFO NOT AVAILABLE */
23 END IF;
24 CASE
25 WHEN BITAND(l_operation_type, DBMS_CHANGE_NOTIFICATION.INSERTOP) != 0 THEN
26 l_operation := 'Records Inserted';
27 WHEN BITAND(l_operation_type, DBMS_CHANGE_NOTIFICATION.UPDATEOP) != 0 THEN
28 l_operation := 'Records Updated';
29 WHEN BITAND(l_operation_type, DBMS_CHANGE_NOTIFICATION.DELETEOP) != 0 THEN
30 l_operation := 'Records Deleted';
31 WHEN BITAND(l_operation_type, DBMS_CHANGE_NOTIFICATION.ALTEROP) != 0 THEN
32 l_operation := 'Table Altered';
33 WHEN BITAND(l_operation_type, DBMS_CHANGE_NOTIFICATION.DROPOP) != 0 THEN
34 l_operation := 'Table Dropped';
35 WHEN BITAND(l_operation_type, DBMS_CHANGE_NOTIFICATION.UNKNOWNOP) != 0 THEN
36 l_operation := 'Unknown Operation';
37 ELSE
38 l_operation := '?';
39 END CASE;
40 l_message := 'Table (' || l_table_name || ') - ' || l_operation || '. Rows=' || l_numrows;
41 INSERT INTO notifications (id, message, notification_date)
42 VALUES (notifications_seq.NEXTVAL, l_message, SYSDATE);
43 COMMIT;
44 END LOOP;
45 END IF;
46 END;
47 /
 
Procedure CALLBACK compiled
 
17:21:58 SQL> -- thanks Tim

and the CQN registration:

17:21:58 SQL> -- register on DEMO;
17:21:58 SQL>
17:21:58 SQL> DECLARE
2 reginfo CQ_NOTIFICATION$_REG_INFO;
3 v_cursor SYS_REFCURSOR;
4 regid NUMBER;
5 BEGIN
6 reginfo := cq_notification$_reg_info ( 'callback', DBMS_CHANGE_NOTIFICATION.QOS_ROWIDS, 0, 0, 0);
7 regid := sys.DBMS_CHANGE_NOTIFICATION.new_reg_start(reginfo);
8 OPEN v_cursor FOR
9 SELECT dbms_cq_notification.CQ_NOTIFICATION_QUERYID, demo.* from DEMO;
10 CLOSE v_cursor;
11 sys.DBMS_CHANGE_NOTIFICATION.reg_end;
12 END;
13 /
 
PL/SQL procedure successfully completed.

Now I delete 1 million rows and commit:


17:21:58 SQL> exec dbms_workload_repository.create_snapshot;
 
PL/SQL procedure successfully completed.
 
17:22:02 SQL>
17:22:02 SQL> -- 1000000 deletes
17:22:02 SQL>
17:22:02 SQL> exec for i in 1..1000000 loop delete from DEMO WHERE id=i; commit; end loop;
 
PL/SQL procedure successfully completed.
 
17:39:23 SQL>
17:39:23 SQL> exec dbms_workload_repository.create_snapshot;

Here are the notifications captured:

17:39:41 SQL> select count(*) from notifications;
COUNT(*)
--------
942741
 
17:39:54 SQL> select * from notifications fetch first 10 rows only;
 
ID MESSAGE NOTIFICATION_DATE
--- ------------------------------------------- -----------------
135 Table (DEMO.DEMO) - Records Deleted. Rows=1 09-MAY-18
138 Table (DEMO.DEMO) - Records Deleted. Rows=1 09-MAY-18
140 Table (DEMO.DEMO) - Records Deleted. Rows=1 09-MAY-18
142 Table (DEMO.DEMO) - Records Deleted. Rows=1 09-MAY-18
145 Table (DEMO.DEMO) - Records Deleted. Rows=1 09-MAY-18
147 Table (DEMO.DEMO) - Records Deleted. Rows=1 09-MAY-18
149 Table (DEMO.DEMO) - Records Deleted. Rows=1 09-MAY-18
152 Table (DEMO.DEMO) - Records Deleted. Rows=1 09-MAY-18
154 Table (DEMO.DEMO) - Records Deleted. Rows=1 09-MAY-18
156 Table (DEMO.DEMO) - Records Deleted. Rows=1 09-MAY-18

The DML took a long time, and SQL Monitoring shows that 64% of the time was spent waiting on 'Wait for EMON to process ntfns', which is the notification process:
CaptureCQN

The execution of the delete itself (cdq5w65zk18r1 DELETE FROM DEMO WHERE ID=:B1) is only a small part of the database time. And we have additional load on the database:
(screenshot: CaptureCQN01)

The following is the activity related to Continuous Query Notification message queuing (the part that slows down the modifications) during the delete (from 17:22 to 17:38):

59p1yadp2g6mb call DBMS_AQADM_SYS.REGISTER_DRIVER ( )
gzf71xphapf1b select /*+ INDEX(TAB AQ$_AQ_SRVNTFN_TABLE_1_I) */ tab.rowid, tab.msgid, tab.corrid, tab.priority, tab.delay, tab.expiration ,tab.retry_count, tab.exception_qschema, tab.exception_queue, tab.chain_no, tab.local_order_no, tab.enq_time, tab.time_manager_info, tab.state, tab.enq_tid, tab.step_no, tab.sender_name, tab.sender_address, tab.sender_protocol, tab.dequeue_msgid, tab.user_prop, tab.user_data from "SYS"."AQ_SRVNTFN_TABLE_1" tab where q_name = :1 and (state = :2 ) order by q_name, state, enq_time, step_no, chain_no, local_order_no for update skip locked
61cgh171qq5m6 delete /*+ CACHE_CB("AQ_SRVNTFN_TABLE_1") */ from "SYS"."AQ_SRVNTFN_TABLE_1" where rowid = :1
ccrv58ajb7pxg begin callback(ntfnds => :1); end;
cdq5w65zk18r1 DELETE FROM DEMO WHERE ID=:B1

And at the end (17:38), when the modifications are committed, my callback function is running to process the messages:
(screenshot: CaptureCQN02)
The main query is the insert from the callback function:
8z4m5tw9uh02d INSERT INTO NOTIFICATIONS (ID, MESSAGE, NOTIFICATION_DATE) VALUES (NOTIFICATIONS_SEQ.NEXTVAL, :B1 , SYSDATE)
The callback function could send the changes to another system rather than inserting them here, but then you have to question the availability of that system and, in any case, this would still have a high overhead in context switches and network round trips.

In summary, for 1 million rows deleted, here are the queries that have each been executed about 1 million times:

Executions   Rows Processed   Rows per Exec   Elapsed Time (s)   %CPU    %IO   SQL Id
------------ ---------------- --------------- ------------------ ------- ----- -------------
1,000,000 1,000,000 1.0 123.4 55.2 3.2 cdq5w65zk18r1 Module: java@VM188 (TNS V1-V3) DELETE FROM DEMO WHERE ID=:B1
999,753 999,753 1.0 261.5 88.6 .7 dw9yv631knnqd insert into "SYS"."AQ_SRVNTFN_TABLE_1" (q_name, msgid, corrid, priority, state, delay, expiration, time_manager_info, local_order_no, chain_no, enq_time, step_no, enq_uid, enq_tid, retry_count, exception_qschema, exception_queue, recipient_key, dequeue_msgid, user_data, sender_name, sender_address, sender_protoc
978,351 978,351 1.0 212.5 64.3 0 61cgh171qq5m6 Module: DBMS_SCHEDULER delete /*+ CACHE_CB("AQ_SRVNTFN_TABLE_1") */ from "SYS"."AQ_SRVNTFN_TABLE_1" where rowid = :1
978,248 942,657 1.0 971.6 20 .7 8z4m5tw9uh02d Module: DBMS_SCHEDULER INSERT INTO NOTIFICATIONS (ID, MESSAGE, NOTIFICATION_DATE) VALUES (NOTIFICATIONS_SEQ.NEXTVAL, :B1 , SYSDATE)
978,167 942,559 1.0 1,178.7 33.1 .5 ccrv58ajb7pxg Module: DBMS_SCHEDULER begin callback(ntfnds => :1); end;
977,984 977,809 1.0 73.9 96.5 0 brq600g3299zp Module: DBMS_SCHEDULER SELECT INSTANCE_NUMBER FROM SYS.V$INSTANCE
933,845 978,350 1.0 446.9 51.4 .7 gzf71xphapf1b Module: DBMS_SCHEDULER select /*+ INDEX(TAB AQ$_AQ_SRVNTFN_TABLE_1_I) */ tab.rowid, tab.msgid, tab.corrid, tab.priority, tab.delay, tab.expiration ,tab.retry_count, tab.exception_qschema, tab.exception_queue, tab.chain_no, tab.local_order_no, tab.enq_time, tab.time_manager_info, tab.state, tab.enq_tid, tab.step_no, tab.sender_name

This is a huge overhead. And all of this has generated 8 million redo entries.

In summary, just forget about CQN to stream changes. This feature is aimed at cache refresh for rarely changing data. What we call ‘event sourcing’ today has existed for a long time in the database, in the form of the redo logs. When a user executes some DML, Oracle generates the redo records first, stores them, and applies them to update the current version of the table rows. And the redo log preserves the atomicity of transactions (the ‘A’ in ACID). So it is better to rely on the redo stream when the changes need to be propagated to other systems.
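As an illustration of reading the changes from the redo stream, here is a minimal LogMiner sketch run from the shell. This is not what I ran for this test: the redo log file name is hypothetical (pick a real member from v$logfile), it uses the online catalog as the LogMiner dictionary, and a real change-propagation solution would rather rely on a dedicated replication tool built on the same principle:

sqlplus -s / as sysdba <<'SQL'
-- hypothetical redo log member, pick a real one from v$logfile
exec dbms_logmnr.add_logfile(logfilename => '/u01/oradata/DB1/redo01.log', options => dbms_logmnr.new);
-- resolve object names with the online dictionary
exec dbms_logmnr.start_logmnr(options => dbms_logmnr.dict_from_online_catalog);
-- the changes recorded for DEMO.DEMO, with the SQL to replay them elsewhere
select operation, sql_redo
  from v$logmnr_contents
 where seg_owner = 'DEMO' and table_name = 'DEMO' and rownum <= 10;
exec dbms_logmnr.end_logmnr;
SQL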

 

This article Event Sourcing: CQN is not a replacement for CDC appeared first on Blog dbi services.

Documentum – How to really configure the D2 JMS logs starting with JBoss/WildFly

Sun, 2018-07-01 15:00

If you have been working with Documentum for quite some time, you are probably familiar with the logback.xml file that can be used to configure the D2 logs. In this blog, I will only talk about the Content Server side of this configuration. As you probably know, Documentum upgraded the JMS to JBoss 7.1.1 “recently” (several years ago already) and even to WildFly 9.0.1 with CS 7.3+. In this blog, I will only write “JBoss” but it refers to both the JBoss and WildFly versions. With these recent versions, the logback.xml file stopped working on Linux environments (I’m not sure about Windows, I only work on Linux). Therefore you will face an issue: by default, the D2 JMS logs cannot really be configured properly. Of course, you will still be able to configure the JBoss and JMS logs properly because that is done through the logging.properties file (for the boot.log), through the standalone.xml file (for the server.log) and through the log4j.properties files of each JMS application (ServerApps, ACS, BPM, and so on). But if you are using D2, then all the D2 JMS logs (previously stored in D2-JMS.log) will also be added to the server.log as well as to the console output.

Unfortunately for us, the D2 JMS logs use DEBUG by default for everything, so they might produce some big files at the end of the day as soon as more than XXX concurrent users are working. Worse than that, the D2 JMS logs, which are in DEBUG, are considered as INFO from the JBoss point of view and therefore, if you are running JBoss with the INFO log level, it will print all the DEBUG information from the D2 JMS logs. Of course, you could set the JBoss level to WARN to remove all the DEBUG, but in this case you will also miss the INFO messages from both the JBoss and D2 JMS sides, which might include some pretty important information, like for example the confirmation that the D2.Lockbox can be read properly (no problems with the passwords and/or fingerprint).

So what can we do about it? Well, there is a JVM parameter that can be used to force the JBoss server to read and use a specific logback.xml file. For that, simply update the startMethodServer.sh script as shown below. I will use the logback.xml file that is present by default right under ServerApps.ear and customize it to get the best out of it.

First of all, I update the content of the default file to add custom patterns, a console output, a configurable log level, and so on. Here is a template for this file:

[dmadmin@content_server_01 ~]$ cd $DOCUMENTUM_SHARED/wildfly9.0.1/server/
[dmadmin@content_server_01 server]$ logback_file="$DOCUMENTUM_SHARED/wildfly9.0.1/server/DctmServer_MethodServer/deployments/ServerApps.ear/logback.xml"
[dmadmin@content_server_01 server]$ 
[dmadmin@content_server_01 server]$ # Here I'm updating the content of the default file to add custom patterns, log level, console output, aso...
[dmadmin@content_server_01 server]$ vi ${logback_file}
[dmadmin@content_server_01 server]$ 
[dmadmin@content_server_01 server]$ cat ${logback_file}
<?xml version="1.0" encoding="UTF-8"?>

<configuration scan="true" scanPeriod="60 seconds">

  <appender class="ch.qos.logback.core.rolling.RollingFileAppender" name="RootFileAppender">
    <file>/tmp/D2-JMS.log</file>
    <append>true</append>
    <filter class="ch.qos.logback.classic.filter.ThresholdFilter">
      <level>debug</level>
    </filter>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
      <fileNamePattern>/tmp/D2-JMS-%d{yyyy-MM-dd}.log.zip</fileNamePattern>
      <MaxHistory>5</MaxHistory>
    </rollingPolicy>
    <layout class="ch.qos.logback.classic.PatternLayout">
      <pattern>%d{"yyyy-MM-dd HH:mm:ss,SSS z"} [%-5p] \(%t\) - %-45(%C{44}) : %m%n</pattern>
    </layout>
  </appender>

  <appender class="ch.qos.logback.core.ConsoleAppender" name="RootConsoleAppender">
    <layout>
      <pattern>[%-5p] - %-45(%C{44}) : %m%n</pattern>
    </layout>
  </appender>

  <root>
    <level value="${logLevel:-info}"/>
    <appender-ref ref="RootFileAppender"/>
    <appender-ref ref="RootConsoleAppender"/>
  </root>

</configuration>
[dmadmin@content_server_01 server]$

 

Then, once you have your template logback.xml file, you need to force JBoss to load and use it; otherwise it will just be ignored. As mentioned above, here is the JVM parameter to be added:

[dmadmin@content_server_01 server]$ 
[dmadmin@content_server_01 server]$ grep "JAVA_OPTS=" startMethodServer.sh
JAVA_OPTS="$USER_MEM_ARGS -Djboss.server.base.dir=$JBOSS_BASE_DIR -Dorg.apache.coyote.http11.Http11Protocol.SERVER=MethodServer"
[dmadmin@content_server_01 server]$ 
[dmadmin@content_server_01 server]$ sed -i 's,^JAVA_OPTS="[^"]*,& -Dlogback.configurationFile=$JBOSS_BASE_DIR/deployments/ServerApps.ear/logback.xml,' startMethodServer.sh
[dmadmin@content_server_01 server]$ 
[dmadmin@content_server_01 server]$ grep "JAVA_OPTS=" startMethodServer.sh
JAVA_OPTS="$USER_MEM_ARGS -Djboss.server.base.dir=$JBOSS_BASE_DIR -Dorg.apache.coyote.http11.Http11Protocol.SERVER=MethodServer -Dlogback.configurationFile=$JBOSS_BASE_DIR/deployments/ServerApps.ear/logback.xml"
[dmadmin@content_server_01 server]$

 

Once done, you can customize some values like the path and name of the log file, the number of files to keep, the log level you want to use, and so on. Here are some commands to do just that:

[dmadmin@content_server_01 server]$ 
[dmadmin@content_server_01 server]$ d2_log="$DOCUMENTUM_SHARED/wildfly9.0.1/server/DctmServer_MethodServer/logs/D2-JMS.log"
[dmadmin@content_server_01 server]$ 
[dmadmin@content_server_01 server]$ # Commands to update some values on this pattern file using the ${d2_log} variable
[dmadmin@content_server_01 server]$ sed -i "s,<file>.*</file>,<file>${d2_log}</file>," ${logback_file}
[dmadmin@content_server_01 server]$ sed -i "s,<fileNamePattern>.*</fileNamePattern>,<fileNamePattern>${d2_log}-%d{yyyy-MM-dd}.zip</fileNamePattern>," ${logback_file}
[dmadmin@content_server_01 server]$ sed -i "s,<MaxHistory>.*</MaxHistory>,<MaxHistory>180</MaxHistory>," ${logback_file}
[dmadmin@content_server_01 server]$ sed -i "s,<level>.*</level>,<level>info</level>," ${logback_file}
[dmadmin@content_server_01 server]$

 

With the above done, you can simply restart the JMS; afterwards, a new D2-JMS.log file will be created at the specified location and with the specified log level.

[dmadmin@content_server_01 server]$ $JMS_HOME/server/stopMethodServer.sh
{"outcome" => "success"}
[dmadmin@content_server_01 server]$
[dmadmin@content_server_01 server]$
[dmadmin@content_server_01 server]$
[dmadmin@content_server_01 server]$ $JMS_HOME/server/startJMS.sh
Starting the JMS...
The JMS process has been started.
[dmadmin@content_server_01 server]$ sleep 30
[dmadmin@content_server_01 server]$ 
[dmadmin@content_server_01 server]$ cat ${d2_log}
2018-06-16 17:16:48,652 UTC [INFO ] (default task-6) - com.emc.d2.api.methods.D2Method               : D2Method Main method com.emc.d2.api.methods.D2SubscriptionMethod arguments: {-user_name=dmadmin, -method_trace_level=0, -docbase_name=Repo1.Repo1, -class_name=com.emc.d2.api.methods.D2SubscriptionMethod, -job_id=080f123450001612}
2018-06-16 17:16:49,668 UTC [INFO ] (default task-6) - com.emc.d2.api.methods.D2Method               : ==== START ======================================================================
2018-06-16 17:16:49,670 UTC [INFO ] (default task-6) - com.emc.d2.api.methods.D2Method               : D2-API v4.7.0070 build 186
2018-06-16 17:16:49,674 UTC [INFO ] (default task-6) - com.emc.d2.api.methods.D2Method               : DFC version : 7.3.0040.0025
2018-06-16 17:16:49,675 UTC [INFO ] (default task-6) - com.emc.d2.api.methods.D2Method               : file.encoding : ANSI_X3.4-1968
2018-06-16 17:16:49,676 UTC [INFO ] (default task-6) - com.emc.d2.api.methods.D2Method               : Arguments : {-user_name=dmadmin, -method_trace_level=0, -docbase_name=Repo1.Repo1, -class_name=com.emc.d2.api.methods.D2SubscriptionMethod, -job_id=080f123450001612}
[dmadmin@content_server_01 server]$

 

Here you have a working D2-JMS.log file with INFO-level information only and no DEBUG.
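If you want to double-check that the running JMS java process really picked up the new JVM parameter, a quick look at its command line is enough. The grep patterns below are just assumptions for this environment:

[dmadmin@content_server_01 server]$ # Print the -Dlogback.configurationFile argument of the running JMS java process
[dmadmin@content_server_01 server]$ ps -ef | grep "[M]ethodServer" | tr ' ' '\n' | grep "^-Dlogback"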

 

Some tips regarding the logback.xml configuration (I put an example of each in the template configuration file above):

  • If you want to display the full date, time (with milliseconds) and timezone in the logs, you need to add quotes like this: <pattern>%d{“yyyy-MM-dd HH:mm:ss,SSS z”} …</pattern>. This is because the comma (,) is normally used to separate the time format from the timezone in which the logs should be displayed (e.g.: %d{HH:mm:ss.SSS, UTC}), but in that case the timezone itself is not printed in the logs. So if you want the seconds to be separated from the milliseconds with a comma, you need to quote the whole string. If you want the current timezone to be displayed in the logs, you can usually do it with the “z” (on not too old Java versions)
  • By default, you cannot use parentheses in the pattern to enclose parameters (like “%-5p”, “%t”, and so on). This is because parentheses are used to group parameters together in order to apply formatting to them as a group. If you really want parentheses in the output, you have to escape them
  • You can define the minimum length of a specific pattern parameter using “%-X” where X is the number of characters. With that, you can align the logs as you want (e.g.: “%-5p” for the log level in 5 chars => “DEBUG”, “INFO “, “WARN “, “ERROR”)
  • You can also shorten a specific pattern parameter using {X} where X is again the number of characters you want the output string to be reduced to. It is not an exact value, but the logger will do its best to reduce the length to what you asked for
  • You can use different appenders to redirect the logs to different outputs. Usually you will want a file appender to store everything in a file, but you can also add a console appender so the output also goes to your default console (be it your shell, a nohup file or the server.log). If you do not want the console appender, so the logs only go to the D2-JMS.log file, you can simply comment out the line ‘<appender-ref ref=”RootConsoleAppender”/>’, as shown in the sketch right after this list
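For that last point, a sed command in the same spirit as the ones used earlier can do it. This is just a sketch, assuming the same ${logback_file} variable as before:

[dmadmin@content_server_01 server]$ # Wrap the console appender reference into an XML comment to disable it (run it only once)
[dmadmin@content_server_01 server]$ sed -i 's,<appender-ref ref="RootConsoleAppender"/>,<!-- <appender-ref ref="RootConsoleAppender"/> -->,' ${logback_file}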

 

You might be wondering why this JVM parameter is not added by default by the Documentum installer since it is a valid solution for this issue, right? Well, I would simply reply that it’s Documentum. ;)

 

 

This article Documentum – How to really configure the D2 JMS logs starting with JBoss/WildFly appeared first on Blog dbi services.

Documentum – Unable to restart a JMS, failed deployment of bundle.jar

Sun, 2018-07-01 14:21

While working on a very big Documentum project with several other teams, some people were complaining that the JMS wouldn't start anymore on one of the DEV environments. It is quite rare to face an issue with the JMS itself (JBoss usually works pretty well), so I was interested in checking this. It was a Documentum 7.2 environment, but I'm sure it would apply to older as well as newer versions (7.x, 16.4, …). This was the content of the JMS console output:

[dmadmin@content_server_01 ~]$ cat $JMS_HOME/server/nohup-JMS.out
...
2018-06-12 07:15:54,765 UTC INFO  [org.jboss.modules] JBoss Modules version 1.1.1.GA
2018-06-12 07:15:54,968 UTC INFO  [org.jboss.msc] JBoss MSC version 1.0.2.GA
2018-06-12 07:15:55,019 UTC INFO  [org.jboss.as] JBAS015899: JBoss AS 7.1.1.Final "Brontes" starting
2018-06-12 07:15:56,415 UTC INFO  [org.jboss.as.server] JBAS015888: Creating http management service using socket-binding (management-http)
2018-06-12 07:15:56,417 UTC INFO  [org.xnio] XNIO Version 3.0.3.GA
2018-06-12 07:15:56,428 UTC INFO  [org.xnio.nio] XNIO NIO Implementation Version 3.0.3.GA
2018-06-12 07:15:56,437 UTC INFO  [org.jboss.remoting] JBoss Remoting version 3.2.3.GA
2018-06-12 07:15:56,448 UTC INFO  [org.jboss.as.logging] JBAS011502: Removing bootstrap log handlers
2018-06-12 07:15:56,478 UTC INFO  [org.jboss.as.configadmin] (ServerService Thread Pool -- 27) JBAS016200: Activating ConfigAdmin Subsystem
2018-06-12 07:15:56,486 UTC INFO  [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 32) JBAS010280: Activating Infinispan subsystem.
2018-06-12 07:15:56,510 UTC INFO  [org.jboss.as.osgi] (ServerService Thread Pool -- 40) JBAS011940: Activating OSGi Subsystem
2018-06-12 07:15:56,518 UTC INFO  [org.jboss.as.naming] (ServerService Thread Pool -- 39) JBAS011800: Activating Naming Subsystem
2018-06-12 07:15:56,527 UTC INFO  [org.jboss.as.security] (ServerService Thread Pool -- 45) JBAS013101: Activating Security Subsystem
2018-06-12 07:15:56,533 UTC INFO  [org.jboss.as.security] (MSC service thread 1-9) JBAS013100: Current PicketBox version=4.0.7.Final
2018-06-12 07:15:56,535 UTC INFO  [org.jboss.as.connector] (MSC service thread 1-4) JBAS010408: Starting JCA Subsystem (JBoss IronJacamar 1.0.9.Final)
2018-06-12 07:15:56,548 UTC INFO  [org.jboss.as.webservices] (ServerService Thread Pool -- 49) JBAS015537: Activating WebServices Extension
2018-06-12 07:15:56,585 UTC INFO  [org.jboss.as.naming] (MSC service thread 1-1) JBAS011802: Starting Naming Service
2018-06-12 07:15:56,594 UTC INFO  [org.jboss.as.mail.extension] (MSC service thread 1-12) JBAS015400: Bound mail session 
2018-06-12 07:15:56,688 UTC INFO  [org.jboss.ws.common.management.AbstractServerConfig] (MSC service thread 1-8) JBoss Web Services - Stack CXF Server 4.0.2.GA
2018-06-12 07:15:56,727 UTC INFO  [org.apache.coyote.http11.Http11Protocol] (MSC service thread 1-10) Starting Coyote HTTP/1.1 on http--0.0.0.0-9080
2018-06-12 07:15:56,821 UTC INFO  [org.hornetq.core.server.impl.HornetQServerImpl] (MSC service thread 1-13) live server is starting with configuration HornetQ Configuration (clustered=false,backup=false,sharedStore=true,journalDirectory=$JMS_HOME/server/DctmServer_MethodServer/data/messagingjournal,bindingsDirectory=$JMS_HOME/server/DctmServer_MethodServer/data/messagingbindings,largeMessagesDirectory=$JMS_HOME/server/DctmServer_MethodServer/data/messaginglargemessages,pagingDirectory=$JMS_HOME/server/DctmServer_MethodServer/data/messagingpaging)
2018-06-12 07:15:56,827 UTC INFO  [org.hornetq.core.server.impl.HornetQServerImpl] (MSC service thread 1-13) Waiting to obtain live lock
2018-06-12 07:15:56,852 UTC INFO  [org.hornetq.core.persistence.impl.journal.JournalStorageManager] (MSC service thread 1-13) Using AIO Journal
2018-06-12 07:15:56,923 UTC INFO  [org.hornetq.core.server.impl.AIOFileLockNodeManager] (MSC service thread 1-13) Waiting to obtain live lock
2018-06-12 07:15:56,924 UTC INFO  [org.hornetq.core.server.impl.AIOFileLockNodeManager] (MSC service thread 1-13) Live Server Obtained live lock
2018-06-12 07:15:56,981 UTC INFO  [org.jboss.as.remoting] (MSC service thread 1-5) JBAS017100: Listening on 0.0.0.0/0.0.0.0:9092
2018-06-12 07:15:56,982 UTC INFO  [org.jboss.as.remoting] (MSC service thread 1-14) JBAS017100: Listening on /127.0.0.1:9084
2018-06-12 07:15:56,996 UTC INFO  [org.jboss.as.server.deployment.scanner] (MSC service thread 1-10) JBAS015012: Started FileSystemDeploymentService for directory $JMS_HOME/server/DctmServer_MethodServer/deployments
2018-06-12 07:15:57,019 UTC INFO  [org.jboss.as.server.deployment.scanner] (DeploymentScanner-threads - 1) JBAS015014: Re-attempting failed deployment bundle.jar
2018-06-12 07:15:57,020 UTC INFO  [org.jboss.as.server.deployment.scanner] (DeploymentScanner-threads - 1) JBAS015014: Re-attempting failed deployment bundle.jar
2018-06-12 07:15:57,021 UTC INFO  [org.jboss.as.server.deployment.scanner] (DeploymentScanner-threads - 1) JBAS015014: Re-attempting failed deployment bundle.jar
2018-06-12 07:15:57,023 UTC INFO  [org.jboss.as.server.deployment.scanner] (DeploymentScanner-threads - 1) JBAS015014: Re-attempting failed deployment bundle.jar
2018-06-12 07:15:57,024 UTC INFO  [org.jboss.as.server.deployment.scanner] (DeploymentScanner-threads - 1) JBAS015014: Re-attempting failed deployment bundle.jar
2018-06-12 07:15:57,025 UTC INFO  [org.jboss.as.server.deployment.scanner] (DeploymentScanner-threads - 1) JBAS015014: Re-attempting failed deployment bundle.jar
2018-06-12 07:15:57,026 UTC INFO  [org.jboss.as.server.deployment.scanner] (DeploymentScanner-threads - 1) JBAS015003: Found ServerApps.ear in deployment directory. To trigger deployment create a file called ServerApps.ear.dodeploy
2018-06-12 07:15:57,027 UTC INFO  [org.jboss.as.server.deployment.scanner] (DeploymentScanner-threads - 1) JBAS015003: Found XhiveConnector.ear in deployment directory. To trigger deployment create a file called XhiveConnector.ear.dodeploy
2018-06-12 07:15:57,028 UTC INFO  [org.jboss.as.server.deployment.scanner] (DeploymentScanner-threads - 1) JBAS015003: Found bpm.ear in deployment directory. To trigger deployment create a file called bpm.ear.dodeploy
2018-06-12 07:15:57,029 UTC INFO  [org.jboss.as.server.deployment.scanner] (DeploymentScanner-threads - 1) JBAS015003: Found acs.ear in deployment directory. To trigger deployment create a file called acs.ear.dodeploy
2018-06-12 07:15:57,090 UTC INFO  [org.apache.coyote.http11.Http11Protocol] (MSC service thread 1-2) Starting Coyote HTTP/1.1 on http-0.0.0.0-0.0.0.0-9082
2018-06-12 07:15:57,235 UTC INFO  [org.hornetq.core.remoting.impl.netty.NettyAcceptor] (MSC service thread 1-13) Started Netty Acceptor version 3.2.5.Final-a96d88c 0.0.0.0:9090 for CORE protocol
2018-06-12 07:15:57,237 UTC INFO  [org.hornetq.core.remoting.impl.netty.NettyAcceptor] (MSC service thread 1-13) Started Netty Acceptor version 3.2.5.Final-a96d88c 0.0.0.0:9091 for CORE protocol
2018-06-12 07:15:57,239 UTC INFO  [org.hornetq.core.server.impl.HornetQServerImpl] (MSC service thread 1-13) Server is now live
2018-06-12 07:15:57,239 UTC INFO  [org.hornetq.core.server.impl.HornetQServerImpl] (MSC service thread 1-13) HornetQ Server version 2.2.13.Final (HQ_2_2_13_FINAL_AS7, 122) [b774a781-a4da-11e6-a1e8-005056082847]) started
2018-06-12 07:15:57,256 UTC INFO  [org.jboss.as.messaging] (MSC service thread 1-14) JBAS011601: Bound messaging object to jndi name java:jboss/exported/jms/RemoteConnectionFactory
2018-06-12 07:15:57,257 UTC INFO  [org.jboss.as.messaging] (MSC service thread 1-14) JBAS011601: Bound messaging object to jndi name java:/RemoteConnectionFactory
2018-06-12 07:15:57,258 UTC INFO  [org.jboss.as.messaging] (MSC service thread 1-4) JBAS011601: Bound messaging object to jndi name java:/ConnectionFactory
2018-06-12 07:15:57,261 UTC INFO  [org.hornetq.core.server.impl.HornetQServerImpl] (MSC service thread 1-9) trying to deploy queue jms.queue.touchRpcQueue
2018-06-12 07:15:57,267 UTC INFO  [org.jboss.as.messaging] (MSC service thread 1-9) JBAS011601: Bound messaging object to jndi name java:jboss/exported/jms/queue/touchRpcQueue
2018-06-12 07:15:57,268 UTC INFO  [org.jboss.as.messaging] (MSC service thread 1-9) JBAS011601: Bound messaging object to jndi name java:/queue/touchRpcQueue
2018-06-12 07:15:57,269 UTC INFO  [org.jboss.as.messaging] (MSC service thread 1-3) JBAS011601: Bound messaging object to jndi name java:/TouchRpcQueueConnectionFactory
2018-06-12 07:15:57,270 UTC INFO  [org.jboss.as.messaging] (MSC service thread 1-3) JBAS011601: Bound messaging object to jndi name java:jboss/exported/jms/TouchRpcQueueConnectionFactory
2018-06-12 07:15:57,301 UTC INFO  [org.jboss.as.deployment.connector] (MSC service thread 1-8) JBAS010406: Registered connection factory java:/JmsXA
2018-06-12 07:15:57,311 UTC INFO  [org.hornetq.ra.HornetQResourceAdapter] (MSC service thread 1-8) HornetQ resource adaptor started
2018-06-12 07:15:57,312 UTC INFO  [org.jboss.as.connector.services.ResourceAdapterActivatorService$ResourceAdapterActivator] (MSC service thread 1-8) IJ020002: Deployed: file://RaActivatorhornetq-ra
2018-06-12 07:15:57,316 UTC INFO  [org.jboss.as.deployment.connector] (MSC service thread 1-4) JBAS010401: Bound JCA ConnectionFactory 
2018-06-12 07:15:57,343 UTC ERROR [org.jboss.as.controller.management-operation] (DeploymentScanner-threads - 2) Operation ("add") failed - address: ([("deployment" => "bundle.jar")]) - failure description: "JBAS014803: Duplicate resource [(\"deployment\" => \"bundle.jar\")]"
2018-06-12 07:15:57,348 UTC ERROR [org.jboss.as.server.deployment.scanner] (DeploymentScanner-threads - 1) JBAS014654: Composite operation was rolled back
2018-06-12 07:15:57,349 UTC INFO  [org.jboss.as] (Controller Boot Thread) JBAS015951: Admin console listening on http://127.0.0.1:9085
2018-06-12 07:15:57,350 UTC ERROR [org.jboss.as.server.deployment.scanner] (DeploymentScanner-threads - 1) JBAS014654: Composite operation was rolled back
2018-06-12 07:15:57,350 UTC INFO  [org.jboss.as] (Controller Boot Thread) JBAS015874: JBoss AS 7.1.1.Final "Brontes" started in 2885ms - Started 148 of 226 services (78 services are passive or on-demand)
2018-06-12 07:15:57,351 UTC ERROR [org.jboss.as.server.deployment.scanner] (DeploymentScanner-threads - 1) JBAS014654: Composite operation was rolled back
2018-06-12 07:15:57,353 UTC ERROR [org.jboss.as.server.deployment.scanner] (DeploymentScanner-threads - 1) JBAS014654: Composite operation was rolled back
2018-06-12 07:15:57,354 UTC ERROR [org.jboss.as.server.deployment.scanner] (DeploymentScanner-threads - 1) JBAS014654: Composite operation was rolled back
2018-06-12 07:15:57,355 UTC ERROR [org.jboss.as.server.deployment.scanner] (DeploymentScanner-threads - 1) {"JBAS014653: Composite operation failed and was rolled back. Steps that failed:" => {"Operation step-1" => "JBAS014803: Duplicate resource [(\"deployment\" => \"bundle.jar\")]"}}
2018-06-12 07:15:57,357 UTC ERROR [org.jboss.as.server.deployment.scanner] (DeploymentScanner-threads - 1) undefined
2018-06-12 07:15:57,358 UTC ERROR [org.jboss.as.server.deployment.scanner] (DeploymentScanner-threads - 1) undefined
2018-06-12 07:15:57,359 UTC ERROR [org.jboss.as.server.deployment.scanner] (DeploymentScanner-threads - 1) undefined
2018-06-12 07:15:57,359 UTC ERROR [org.jboss.as.server.deployment.scanner] (DeploymentScanner-threads - 1) undefined

 

This was linked to deployments, so as a first step I tried to force a redeploy of all Documentum applications. It can be done as shown below; in those cases, it is better to stop the JMS first, otherwise it will try to redeploy the applications on the fly, which might work, but you cannot be sure they will start again on the next startup… So here are the steps:

[dmadmin@content_server_01 ~]$ cd $JMS_HOME/server/DctmServer_MethodServer/deployments
[dmadmin@content_server_01 deployments]$ 
[dmadmin@content_server_01 deployments]$ find . -maxdepth 1 -name "*.failed"
./acs.ear.failed
./ServerApps.ear.failed
./error.war.failed
./bpm.ear.failed
[dmadmin@content_server_01 deployments]$ 
[dmadmin@content_server_01 deployments]$ 
[dmadmin@content_server_01 deployments]$ for i in `find . -maxdepth 1 -name "*.failed"`; do name=`echo ${i} | sed 's,.failed,.dodeploy,'`; mv ${i} ${name}; done
[dmadmin@content_server_01 deployments]$

 

Then you can just start again the JMS:

[dmadmin@content_server_01 deployments]$ $JMS_HOME/server/startJMS.sh
Starting the JMS...
The JMS process has been started.
[dmadmin@content_server_01 deployments]$

 

Unfortunately, in this case the JMS logs were similar and the Documentum applications were still in a failed state. Looking closer at these logs, the issue seemed to be related to a bundle.jar file:

[dmadmin@content_server_01 deployments]$ find . -name bundle.jar
./felix-cache/bundle3/version0.0/bundle.jar
./felix-cache/bundle5/version0.0/bundle.jar
./felix-cache/bundle1/version0.0/bundle.jar
./felix-cache/bundle4/version0.0/bundle.jar
./felix-cache/bundle6/version0.0/bundle.jar
./felix-cache/bundle2/version0.0/bundle.jar
[dmadmin@content_server_01 deployments]$
[dmadmin@content_server_01 deployments]$ ls felix-cache/*
felix-cache/bundle0:
bundle.id

felix-cache/bundle1:
bundle.id  bundle.lastmodified  bundle.location  bundle.startlevel  bundle.state  version0.0

felix-cache/bundle2:
bundle.id  bundle.lastmodified  bundle.location  bundle.startlevel  bundle.state  version0.0

felix-cache/bundle3:
bundle.id  bundle.lastmodified  bundle.location  bundle.startlevel  bundle.state  version0.0

felix-cache/bundle4:
bundle.id  bundle.lastmodified  bundle.location  bundle.startlevel  bundle.state  version0.0

felix-cache/bundle5:
bundle.id  bundle.lastmodified  bundle.location  bundle.startlevel  bundle.state  version0.0

felix-cache/bundle6:
bundle.id  bundle.lastmodified  bundle.location  bundle.startlevel  bundle.state  version0.0
[dmadmin@content_server_01 deployments]$
[dmadmin@content_server_01 deployments]$
[dmadmin@content_server_01 deployments]$ for i in `ls felix-cache/`; do echo "felix-cache/${i}:"; cat felix-cache/${i}/bundle.location; echo; echo; done
felix-cache/bundle0:
cat: felix-cache/bundle0/bundle.location: No such file or directory


felix-cache/bundle1:
file:$JMS_HOME/server/DctmServer_MethodServer/deployments/acs.ear/lib/HttpService.jar

felix-cache/bundle2:
file:$JMS_HOME/server/DctmServer_MethodServer/deployments/acs.ear/lib/HttpServiceImpl.jar

felix-cache/bundle3:
file:$JMS_HOME/server/DctmServer_MethodServer/deployments/acs.ear/lib/Common.jar

felix-cache/bundle4:
file:$JMS_HOME/server/DctmServer_MethodServer/deployments/acs.ear/lib/Web.jar

felix-cache/bundle5:
file:$JMS_HOME/server/DctmServer_MethodServer/deployments/acs.ear/lib/Jmx.jar

felix-cache/bundle6:
file:$JMS_HOME/server/DctmServer_MethodServer/deployments/acs.ear/lib/org.apache.felix.bundlerepository-1.2.1.jar

[dmadmin@content_server_01 deployments]$
[dmadmin@content_server_01 deployments]$ diff felix-cache/bundle1/version0.0/bundle.jar $JMS_HOME/server/DctmServer_MethodServer/deployments/acs.ear/lib/HttpService.jar
[dmadmin@content_server_01 deployments]$
[dmadmin@content_server_01 deployments]$ diff felix-cache/bundle3/version0.0/bundle.jar $JMS_HOME/server/DctmServer_MethodServer/deployments/acs.ear/lib/Common.jar
[dmadmin@content_server_01 deployments]$

 

This felix-cache is normally created by the ACS when it starts, and it is placed in the folder from which you started the JMS. In this case, it was under the deployments folder, probably because someone started the JMS using the startMethodServer.sh script directly instead of our custom start script (startJMS.sh, as you can see above and below), which takes care of that and switches to the correct folder first (with the proper nohup, and so on). Because the felix-cache was created under the deployments folder, the JMS was trying to deploy it like any other application, and that failed. As you can see above, the file “bundle.jar” (in the different folders) is actually a copy of some of the ACS library files, the ones mentioned in bundle.location, which were copied into the felix-cache by the ACS.
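A quick way to check where a running JMS will create its felix-cache is to look at the current working directory of the JBoss java process. This is just a sketch for this environment (the MethodServer pattern is the same one used in the grep commands above):

[dmadmin@content_server_01 deployments]$ # PID of the JMS java process
[dmadmin@content_server_01 deployments]$ jms_pid=`ps -ef | grep "[M]ethodServer" | grep java | awk '{print $2}' | head -1`
[dmadmin@content_server_01 deployments]$ # Working directory of that process: this is where the felix-cache folder will be created
[dmadmin@content_server_01 deployments]$ readlink /proc/${jms_pid}/cwd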

As you know, with a newly installed JMS for Documentum, there is no felix-cache folder under the deployments. What you can find there are ServerApps, ACS or BPM, for example, but that's pretty much it. Therefore, to solve this issue, simply remove the felix-cache folder, force a new deployment of all Documentum applications, restart the JMS properly, and it should be good:

[dmadmin@content_server_01 deployments]$ $JMS_HOME/server/stopMethodServer.sh; sleep 5; ps -ef | grep MethodServer
[dmadmin@content_server_01 deployments]$
[dmadmin@content_server_01 deployments]$ rm -rf felix-cache/
[dmadmin@content_server_01 deployments]$
[dmadmin@content_server_01 deployments]$ find . -maxdepth 1 -name "*.failed"
./acs.ear.failed
./ServerApps.ear.failed
./error.war.failed
./bpm.ear.failed
[dmadmin@content_server_01 deployments]$
[dmadmin@content_server_01 deployments]$ for i in `find . -maxdepth 1 -name "*.failed"`; do name=`echo ${i} | sed 's,.failed,.dodeploy,'`; mv ${i} ${name}; done
[dmadmin@content_server_01 deployments]$
[dmadmin@content_server_01 deployments]$ $JMS_HOME/server/startJMS.sh
Starting the JMS...
The JMS process has been started.
[dmadmin@content_server_01 deployments]$

 

Then checking if the issue is gone and if all the applications are now properly deployed:

[dmadmin@content_server_01 deployments]$ grep -E " deployment | Deployed " $JMS_HOME/server/nohup-JMS.out
2018-06-12 07:30:15,142 UTC INFO  [org.jboss.as.server.deployment.scanner] (DeploymentScanner-threads - 1) JBAS015003: Found ServerApps.ear in deployment directory. To trigger deployment create a file called ServerApps.ear.dodeploy
2018-06-12 07:30:15,144 UTC INFO  [org.jboss.as.server.deployment.scanner] (DeploymentScanner-threads - 1) JBAS015003: Found XhiveConnector.ear in deployment directory. To trigger deployment create a file called XhiveConnector.ear.dodeploy
2018-06-12 07:30:15,145 UTC INFO  [org.jboss.as.server.deployment.scanner] (DeploymentScanner-threads - 1) JBAS015003: Found bpm.ear in deployment directory. To trigger deployment create a file called bpm.ear.dodeploy
2018-06-12 07:30:15,146 UTC INFO  [org.jboss.as.server.deployment.scanner] (DeploymentScanner-threads - 1) JBAS015003: Found acs.ear in deployment directory. To trigger deployment create a file called acs.ear.dodeploy
2018-06-12 07:30:15,503 UTC INFO  [org.jboss.as.connector.services.ResourceAdapterActivatorService$ResourceAdapterActivator] (MSC service thread 1-1) IJ020002: Deployed: file://RaActivatorhornetq-ra
2018-06-12 07:30:15,535 UTC INFO  [org.jboss.as.server.deployment] (MSC service thread 1-5) JBAS015876: Starting deployment of "bpm.ear"
2018-06-12 07:30:15,535 UTC INFO  [org.jboss.as.server.deployment] (MSC service thread 1-13) JBAS015876: Starting deployment of "ServerApps.ear"
2018-06-12 07:30:15,535 UTC INFO  [org.jboss.as.server.deployment] (MSC service thread 1-7) JBAS015876: Starting deployment of "error.war"
2018-06-12 07:30:15,535 UTC INFO  [org.jboss.as.server.deployment] (MSC service thread 1-11) JBAS015876: Starting deployment of "acs.ear"
2018-06-12 07:30:17,824 UTC INFO  [org.jboss.as.server.deployment] (MSC service thread 1-2) JBAS015876: Starting deployment of "documentum-bocs-ws.war"
2018-06-12 07:30:17,825 UTC INFO  [org.jboss.as.server.deployment] (MSC service thread 1-15) JBAS015876: Starting deployment of "bocs.war"
2018-06-12 07:30:18,093 UTC INFO  [org.apache.catalina.core.ContainerBase.[jboss.web].[default-host].[/ACS]] (MSC service thread 1-16) Initializing CORS filter as per the deployment descriptor
2018-06-12 07:30:18,608 UTC INFO  [org.jboss.as.server.deployment] (MSC service thread 1-11) JBAS015876: Starting deployment of "DmMethods.war"
2018-06-12 07:30:18,608 UTC INFO  [org.jboss.as.server.deployment] (MSC service thread 1-9) JBAS015876: Starting deployment of "DmMail.war"
2018-06-12 07:30:18,904 UTC INFO  [org.jboss.as.server.deployment] (MSC service thread 1-11) JBAS015876: Starting deployment of "bpm.war"
2018-06-12 07:30:32,017 UTC INFO  [org.jboss.as.server] (DeploymentScanner-threads - 2) JBAS018559: Deployed "bpm.ear"
2018-06-12 07:30:32,019 UTC INFO  [org.jboss.as.server] (DeploymentScanner-threads - 2) JBAS018559: Deployed "error.war"
2018-06-12 07:30:32,020 UTC INFO  [org.jboss.as.server] (DeploymentScanner-threads - 2) JBAS018559: Deployed "acs.ear"
2018-06-12 07:30:32,021 UTC INFO  [org.jboss.as.server] (DeploymentScanner-threads - 2) JBAS018559: Deployed "ServerApps.ear"
[dmadmin@content_server_01 deployments]$

 

So take care when you start the JMS: if you do not have a custom start script (it's pretty much mandatory for the JMS!) or if you do not change the working directory before executing the startMethodServer.sh script, you might have some surprises.
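If you do not have such a custom script yet, a minimal sketch could look like the following. This is not the startJMS.sh used above, which does more than that (checks, proper logging, and so on); the paths and the nohup file name are assumptions based on this environment, and $JMS_HOME is expected to be set as in the commands above:

#!/bin/bash
# Minimal startJMS.sh sketch: switch to a safe working directory first so that
# the ACS felix-cache is NOT created under the deployments folder
cd ${JMS_HOME}/server || exit 1
nohup ./startMethodServer.sh >> ./nohup-JMS.out 2>&1 &
echo "The JMS process has been started."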

 

Edit: After writing this blog, I searched the OTX website for something related to this and found that there is a KB (KB7795388) about it, but I think this blog still makes sense because it provides more information and some explanation. That's a bad habit of mine: I usually try fixing things myself before checking the OTX website, because I don't like it and the less I use it, the better.

 

 

This article Documentum – Unable to restart a JMS, failed deployment of bundle.jar appeared first on Blog dbi services.
