Yann Neuhaus

dbi services technical blog

Welcome to M|17, part 2

Mon, 2017-04-17 03:57

Welcome to the second day of MariaDB’s first user conference.
On the 12th at 09:00, the first-ever experimental MariaDB Associate certification exam started, and I was glad to be among the first to take it.
This exam was offered free of charge to all registered attendees.
As I wrote above, it was really experimental, because all candidates faced many problems.
Certification
First, as this exam was proctored, the authentication process was very, very slow, essentially due to the overloaded network.
Once that was done, we were all expecting multiple-choice questions, as in almost all other certifications; instead, we had to perform real-world database administration tasks on a remote Linux box where a MariaDB server was installed.
The following skills were tested:
Server configuration
Security
Users and Roles
Schema Operations
Query Performance
Backup and Restore
The exam lasted 90 minutes, which is really short when you are facing network breaks and slowness.
To pass the exam you need 48 points out of 60, i.e. 80%.
One thing you must not forget when you are finished is to restart the MariaDB server; otherwise all your server configuration answers are lost.
They kindly warned us before we started, but at the end there was no alert and communication was abruptly cut off.
This certification will definitely be available online in one or two months.
After lunch, which, like the day before, was a big buffet but a more exotic one, I decided to attend the session of Ashraf Sharif from Severalnines:
Step-By-Step: Clustering with Galera and Docker Swarm
I was really happy to see him, as we have often collaborated on ClusterControl support cases. He was happy too.
Unfortunately for him, he had to speed up, because 45 minutes was not enough for such a vast topic.
It was quite a challenge, as he had more than 140 slides and a demo.
Several keynotes then closed this 2-day event in the conference center.
Again the air conditioning was too cold, and this time I got sick.
Gunnar Hellekson, Director of Product Management for Red Hat Enterprise Linux, started with “Open Source in a dangerous world”.
He mainly discussed how we can leverage the amazing innovation coming out of open source communities while still plotting a journey with secure, stable and supported open source platforms, illustrating this with examples of customers and organizations that use open source not just to innovate but also to gain competitive advantage.
The last keynote was given by Michael Widenius himself: “Everything Old is New: the return of relational”.
As the database landscape is changing and evolving very fast and is no longer the property of a few vendors such as Oracle, IBM or Microsoft, he is convinced that, even though NoSQL may work for a subset of use cases, open source relational databases are delivering more and more capabilities for NoSQL use cases at a rapid pace.

As a conclusion for MariaDB’s first user conference, my overall impression is positive: it was well organized, all the staff were enthusiastic and open, and we could meet and talk with a lot of different people.
So take a sweet, juicy, well-dosed workshop, some high-level sessions to bring sweetness and acidity into perfect harmony, and 3 or 4 spicy keynotes to enhance the taste of the event spirit; put all the ingredients into a cocktail shaker, shake, and you obtain the delicious and unforgettable M|17 cocktail.

 

The post Welcome to M|17, part 2 first appeared on Blog dbi services.

Listing the extensions available in PostgreSQL

Mon, 2017-04-17 02:52

When you follow this blog regularly, you probably already know that PostgreSQL is highly extensible. Quite a few extensions ship by default and are ready to use. How can you know what is there? The most obvious way is to check the documentation. But did you know there are other ways of getting this information?

To list the available extensions, you can check the files on disk in the location where you installed PostgreSQL, in my case:

postgres@pgbox:/u01/app/postgres/product/96/db_2/share/extension/ [PG962] pwd
/u01/app/postgres/product/96/db_2/share/extension
postgres@pgbox:/u01/app/postgres/product/96/db_2/share/extension/ [PG962] ls
adminpack--1.0.sql                  hstore--1.3--1.4.sql                  pageinspect.control                      plperlu--unpackaged--1.0.sql
adminpack.control                   hstore--1.4.sql                       pageinspect--unpackaged--1.0.sql         plpgsql--1.0.sql
autoinc--1.0.sql                    hstore.control                        pg_buffercache--1.0--1.1.sql             plpgsql.control
autoinc.control                     hstore_plperl--1.0.sql                pg_buffercache--1.1--1.2.sql             plpgsql--unpackaged--1.0.sql
...
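If you want to go one step further than ls without connecting to a database: as far as I know, each extension ships a <name>.control file whose entries include a comment and a default_version. The following is a minimal Python sketch that reads them, assuming the simple key = 'value' layout of those files; list_extensions is a name made up for this example, and escaped quotes inside values are not handled.

```python
import re
from pathlib import Path

# A "key = value" line of a .control file; the value may be wrapped in single quotes.
_SETTING = re.compile(r"^\s*(\w+)\s*=\s*'?(.*?)'?\s*$")


def list_extensions(extension_dir):
    """Return {name: {"default_version": ..., "comment": ...}} for every *.control file."""
    extensions = {}
    for control in sorted(Path(extension_dir).glob("*.control")):
        settings = {}
        for line in control.read_text().splitlines():
            if line.lstrip().startswith("#"):
                continue  # comment line
            match = _SETTING.match(line)
            if match:
                settings[match.group(1)] = match.group(2)
        extensions[control.stem] = {
            "default_version": settings.get("default_version"),
            "comment": settings.get("comment"),
        }
    return extensions


if __name__ == "__main__":
    # Path taken from the listing above; adjust it to your installation.
    ext_dir = "/u01/app/postgres/product/96/db_2/share/extension"
    for name, info in list_extensions(ext_dir).items():
        print(f"{name:20} {info['default_version'] or '':8} {info['comment'] or ''}")
```

This only tells you what is on disk; the catalog views shown next remain the authoritative source.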

The issue with this approach is that chances are high that you have no clue what the extensions are about. Better to ask the database by querying pg_available_extensions:

postgres=# select * from pg_available_extensions;
        name        | default_version | installed_version |                               comment                                
--------------------+-----------------+-------------------+----------------------------------------------------------------------
 plpgsql            | 1.0             | 1.0               | PL/pgSQL procedural language
 plperl             | 1.0             |                   | PL/Perl procedural language
 plperlu            | 1.0             |                   | PL/PerlU untrusted procedural language
 plpython2u         | 1.0             |                   | PL/Python2U untrusted procedural language
 plpythonu          | 1.0             |                   | PL/PythonU untrusted procedural language
 pltcl              | 1.0             |                   | PL/Tcl procedural language
 pltclu             | 1.0             |                   | PL/TclU untrusted procedural language
 adminpack          | 1.0             |                   | administrative functions for PostgreSQL
 bloom              | 1.0             |                   | bloom access method - signature file based index
 btree_gin          | 1.0             |                   | support for indexing common datatypes in GIN
 btree_gist         | 1.2             |                   | support for indexing common datatypes in GiST
 chkpass            | 1.0             |                   | data type for auto-encrypted passwords
...

Here you can check the “comment” column which explains what an extension is about.

Another catalog view, pg_available_extension_versions, gives you even more information, e.g. the dependencies between extensions:

postgres=# select * from pg_available_extension_versions where requires is not null;
       name        | version | installed | superuser | relocatable | schema |      requires       |                           comment                            
-------------------+---------+-----------+-----------+-------------+--------+---------------------+--------------------------------------------------------------
 earthdistance     | 1.1     | f         | t         | t           |        | {cube}              | calculate great-circle distances on the surface of the Earth
 hstore_plperl     | 1.0     | f         | t         | t           |        | {hstore,plperl}     | transform between hstore and plperl
 hstore_plperlu    | 1.0     | f         | t         | t           |        | {hstore,plperlu}    | transform between hstore and plperlu
 hstore_plpythonu  | 1.0     | f         | t         | t           |        | {hstore,plpythonu}  | transform between hstore and plpythonu
 hstore_plpython2u | 1.0     | f         | t         | t           |        | {hstore,plpython2u} | transform between hstore and plpython2u
 hstore_plpython3u | 1.0     | f         | t         | t           |        | {hstore,plpython3u} | transform between hstore and plpython3u
 ltree_plpythonu   | 1.0     | f         | t         | t           |        | {ltree,plpythonu}   | transform between ltree and plpythonu
 ltree_plpython2u  | 1.0     | f         | t         | t           |        | {ltree,plpython2u}  | transform between ltree and plpython2u
 ltree_plpython3u  | 1.0     | f         | t         | t           |        | {ltree,plpython3u}  | transform between ltree and plpython3u
(9 rows)
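The requires column is enough to work out an installation order. Here is a small sketch that uses a subset of the output above as hard-coded sample data (in practice you would fetch it from pg_available_extension_versions). install_order is a hypothetical helper, and graphlib needs Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Sample data copied from the "requires" column above (subset).
REQUIRES = {
    "earthdistance": ["cube"],
    "hstore_plperl": ["hstore", "plperl"],
    "hstore_plpython3u": ["hstore", "plpython3u"],
    "ltree_plpython3u": ["ltree", "plpython3u"],
}


def install_order(wanted, requires):
    """Return the wanted extensions plus their prerequisites, prerequisites first."""
    graph = {}
    todo = list(wanted)
    while todo:
        name = todo.pop()
        if name in graph:
            continue
        graph[name] = requires.get(name, [])  # extension -> extensions it requires
        todo.extend(graph[name])
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(install_order(["earthdistance"], REQUIRES))  # prints ['cube', 'earthdistance']
```

Note that since PostgreSQL 9.6, CREATE EXTENSION ... CASCADE installs the required extensions for you, so this is mainly a way to visualize the dependency graph.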

Once you have installed an extension, you have two options for listing it. Either you use the psql shortcut:

postgres=# create extension hstore;
CREATE EXTENSION
postgres=# \dx
                           List of installed extensions
  Name   | Version |   Schema   |                   Description                    
---------+---------+------------+--------------------------------------------------
 hstore  | 1.4     | public     | data type for storing sets of (key, value) pairs
 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language
(2 rows)

… or you ask pg_extension:

postgres=# select * from pg_extension ;
 extname | extowner | extnamespace | extrelocatable | extversion | extconfig | extcondition 
---------+----------+--------------+----------------+------------+-----------+--------------
 plpgsql |       10 |           11 | f              | 1.0        |           | 
 hstore  |       10 |         2200 | t              | 1.4        |           | 

Btw: Did you know that you can tell psql to show you the actual statement that gets executed when you use a shortcut?

postgres=# \set ECHO_HIDDEN on
postgres=# \dx
********* QUERY **********
SELECT e.extname AS "Name", e.extversion AS "Version", n.nspname AS "Schema", c.description AS "Description"
FROM pg_catalog.pg_extension e LEFT JOIN pg_catalog.pg_namespace n ON n.oid = e.extnamespace LEFT JOIN pg_catalog.pg_description c ON c.objoid = e.oid AND c.classoid = 'pg_catalog.pg_extension'::pg_catalog.regclass
ORDER BY 1;
**************************

                           List of installed extensions
  Name   | Version |   Schema   |                   Description                    
---------+---------+------------+--------------------------------------------------
 hstore  | 1.4     | public     | data type for storing sets of (key, value) pairs
 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language

Happy extending …

 

The post Listing the extensions available in PostgreSQL first appeared on Blog dbi services.

SQLcl on Bash on Ubuntu on Windows

Sun, 2017-04-16 11:54

I’m running my laptop on Windows, which may sound weird, but Linux is unfortunately not an option when you exchange Microsoft Word documents, manage your e-mails and calendar with Outlook and present with Powerpoint using dual screen (I want to share on the beamer only the slides or demo screen, not my whole desktop). However, I have 3 ways to enjoy GNU/Linux: Cygwin to operate on my laptop, VirtualBox to run Linux hosts, and Cloud services when free trials are available.

Now that Windows 10 has a Linux subsystem, I’ll try it to see if I still need Cygwin.
In summary: I’ll still use Cygwin, but I may prefer this Linux subsystem to run SQLcl, the SQL Developer command line, from my laptop.

Bash on Ubuntu on Windows

In this post I’ll detail what I had to set up to get the following:
[Screenshot]
Bash on Windows 10 has been available for several months, but without any interaction with the Windows system except access to the filesystems. I didn’t try that. This month, Microsoft released a new update, called ‘Creators Update’ for whatever reason.

Creators Update

You will probably have no choice but to update to the ‘Creators Update’ soon, but for the moment you have to download Windows10Upgrade9252.exe from https://www.microsoft.com/en-us/software-download/windows10

Windows Subsystem for Linux

You enable the feature from Control Panel -> Programs and Features -> Turn Windows features on and off:
[Screenshot]

This requires a reboot. Windows is not yet an OS where you can install or enable features without closing everything. But at least in Windows 10 the reboot is very fast.

Developer mode

This is a beta feature and requires enabling developer mode:
[Screenshot]

You do that in Settings -> Update & Security -> For developers:

[Screenshot]

Bash

Now, when you run it (type Bash in the start menu) it installs a subset of Ubuntu (downloaded from the web):
[Screenshot]
It asks for a user and password. You will need the password to sudo to root.
You are in Windows/System32 here, which is ugly, so better exit and run ‘Bash on Ubuntu on Windows’ again.

HOME

All my customization (.bash_profile .bashrc .vimrc .tmux.conf .ssh/config … ) is in my Cygwin environment, and I want to share it for as long as I run both Cygwin and Bash on Ubuntu on Windows. For this, I sudo and change my entry in /etc/passwd so that my home directory is my Cygwin home:
fpa:x:1000:1000:"",,,:/mnt/d/Dropbox/cygwin-home/:/bin/bash
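An /etc/passwd entry has seven colon-separated fields (name, password, UID, GID, GECOS, home directory, shell), so the change above only touches the sixth one. A small sketch of that edit, in case you script the setup; set_home is a hypothetical name, and the "before" entry in the example is an assumed default, not taken from my system. On a regular Linux system, usermod -d is the usual tool for this.

```python
def set_home(passwd_entry, new_home):
    """Return a copy of an /etc/passwd entry with the home directory (6th field) replaced."""
    fields = passwd_entry.rstrip("\n").split(":")
    if len(fields) != 7:
        raise ValueError("an /etc/passwd entry has exactly 7 colon-separated fields")
    fields[5] = new_home
    return ":".join(fields)


if __name__ == "__main__":
    # Hypothetical default entry created by the Ubuntu setup.
    before = 'fpa:x:1000:1000:"",,,:/home/fpa:/bin/bash'
    print(set_home(before, "/mnt/d/Dropbox/cygwin-home/"))
```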

Mount

Here are the mount points I have on Cygwin:
$ mount
C:/cygwin64/bin on /usr/bin type ntfs (binary,auto)
C:/cygwin64/lib on /usr/lib type ntfs (binary,auto)
C:/cygwin64 on / type ntfs (binary,auto)
C: on /cygdrive/c type ntfs (binary,posix=0,user,noumount,auto)
D: on /cygdrive/d type ntfs (binary,posix=0,user,noumount,auto)

My C: and D: Windows drives are mounted in /cygdrive.

Here are the mounts I have on the Windows Subsystem for Linux:
root@dell-fpa:/mnt# mount
rootfs on / type lxfs (rw,noatime)
data on /data type lxfs (rw,noatime)
cache on /cache type lxfs (rw,noatime)
mnt on /mnt type lxfs (rw,noatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,noatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,noatime)
none on /dev type tmpfs (rw,noatime,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,noatime)
none on /run type tmpfs (rw,nosuid,noexec,noatime,mode=755)
none on /run/lock type tmpfs (rw,nosuid,nodev,noexec,noatime)
none on /run/shm type tmpfs (rw,nosuid,nodev,noatime)
none on /run/user type tmpfs (rw,nosuid,nodev,noexec,noatime,mode=755)
C: on /mnt/c type drvfs (rw,noatime)
D: on /mnt/d type drvfs (rw,noatime)
root on /root type lxfs (rw,noatime)
home on /home type lxfs (rw,noatime)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,noatime)

Because I have scripts and configuration files that mention /cygdrive, I’ve created symbolic links for them:

fpa@dell-fpa:/mnt$ sudo su
[sudo] password for fpa:
root@dell-fpa:/mnt# mkdir /cygdrive
root@dell-fpa:/# ln -s /mnt/c /cygdrive/c
root@dell-fpa:/# ln -s /mnt/d /cygdrive/D
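The symbolic links make /cygdrive/<drive> resolve to the drvfs mount points. If you would rather rewrite the paths in your scripts, the mapping is simple. Here is a sketch (cygdrive_to_wsl is a hypothetical helper) that lower-cases the drive letter, because the Linux subsystem mounts the drives as /mnt/c and /mnt/d:

```python
import re

# "/cygdrive/<letter>" at the start of a path, followed by "/" or the end of the string.
_CYGDRIVE = re.compile(r"^/cygdrive/([A-Za-z])(?=/|$)")


def cygdrive_to_wsl(path):
    """Translate a Cygwin /cygdrive/<x>/... path to the WSL /mnt/<x>/... equivalent."""
    return _CYGDRIVE.sub(lambda m: "/mnt/" + m.group(1).lower(), path)


if __name__ == "__main__":
    print(cygdrive_to_wsl("/cygdrive/D/Soft/sqlcl/bin/sql"))  # prints /mnt/d/Soft/sqlcl/bin/sql
```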

chmod

The first thing I do from my bash shell is to ssh to other hosts:


fpa@dell-fpa:/mnt/c/Users/fpa$ ssh 192.168.78.104
Bad owner or permissions on /mnt/d/Dropbox/cygwin-home//.ssh/config

OK, the permissions of .ssh were set from Cygwin; let’s set them from Bash on Ubuntu on Windows:

fpa@dell-fpa:/mnt/c/Users/fpa$ chmod 644 /mnt/d/Dropbox/cygwin-home//.ssh/config
fpa@dell-fpa:/mnt/c/Users/fpa$ ls -ld /mnt/d/Dropbox/cygwin-home//.ssh/config
-rw-rw-rw- 1 root root 5181 Mar 5 16:56 /mnt/d/Dropbox/cygwin-home//.ssh/config

This is not what I want. With 644 I expect -rw-r--r--.

Let’s try 444:

fpa@dell-fpa:/mnt/c/Users/fpa$ chmod 444 /mnt/d/Dropbox/cygwin-home//.ssh/config
fpa@dell-fpa:/mnt/c/Users/fpa$ ls -ld /mnt/d/Dropbox/cygwin-home//.ssh/config
-r--r--r-- 1 root root 5181 Mar 5 16:56 /mnt/d/Dropbox/cygwin-home//.ssh/config
fpa@dell-fpa:/mnt/c/Users/fpa$ ssh 192.168.78.104
Last login: Sun Apr 16 15:18:07 2017 from 192.168.78.1
...

OK, this works, but there’s a problem: it seems that Bash on Ubuntu on Windows doesn’t allow setting permissions differently for user, group and others.
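The chmod results above can be summarized in a few lines of Python. stat.filemode shows what 644 and 444 should look like, and a small, purely hypothetical model reproduces the two observations (644 shown as -rw-rw-rw-, 444 as -r--r--r--). Combined with the rule that, as far as I know, ssh rejects a config file that is writable by group or others, this explains why only 444 let ssh start. It is a model fitted to two data points, not a description of how the Windows Subsystem for Linux actually maps permissions.

```python
import stat


def expected_mode(perms):
    """The ls -l string a regular file with these permission bits should show."""
    return stat.filemode(stat.S_IFREG | perms)


def observed_drvfs_mode(perms):
    """Hypothetical model of what was observed above on /mnt/d: only the owner write bit
    seems to count, and it is reflected for user, group and others alike."""
    return "-rw-rw-rw-" if perms & stat.S_IWUSR else "-r--r--r--"


def ssh_accepts(mode_string):
    """ssh refuses a config file writable by group or others (ownership check omitted)."""
    return mode_string[5] != "w" and mode_string[8] != "w"


if __name__ == "__main__":
    for perms in (0o644, 0o444):
        seen = observed_drvfs_mode(perms)
        verdict = "ssh ok" if ssh_accepts(seen) else "ssh refuses"
        print(oct(perms), expected_mode(perms), seen, verdict)
```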

SQLcl

The second thing I do from bash on my laptop is to connect to databases with SQLcl. For Cygwin I had an alias that runs the sql.bat script, because Cygwin can run .bat files. When I run SQLcl from Cygwin, I run the Windows JDK. This doesn’t work in Bash on Ubuntu on Windows because we are in a Linux subsystem. But we don’t need it, because SQLcl can be run directly from the sql bash script, calling the Linux JDK from the Linux subsystem. There’s only one thing to do: download the Linux JDK and set JAVA_HOME to its directory.

In my .bashrc I have the following to set the ‘sql’ alias depending on which environment I am in:


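if [[ $(uname -a) =~ CYGWIN ]]; then
  # Cygwin: run the Windows batch script (Windows JDK)
  alias sql='/cygdrive/D/Soft/sqlcl/bin/sql.bat'
else
  # Bash on Ubuntu on Windows: run the bash script with the Linux JDK
  alias sql='JAVA_HOME=/mnt/d/Soft/jdk1.8.0-Linux /cygdrive/D/Soft/sqlcl/bin/sql'
fi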
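If you also launch SQLcl from Python scripts, the same environment switch can be written as follows. It is a sketch that simply mirrors the alias above (same paths, same test for CYGWIN in the uname output); sql_command is a hypothetical helper, and you would pass the returned variables on to subprocess.run, e.g. env={**os.environ, **extra_env}.

```python
import platform

# Paths copied from the .bashrc alias above; adjust them to your own layout.
SQLCL_BAT = "/cygdrive/D/Soft/sqlcl/bin/sql.bat"
SQLCL_SCRIPT = "/cygdrive/D/Soft/sqlcl/bin/sql"
LINUX_JDK = "/mnt/d/Soft/jdk1.8.0-Linux"


def sql_command(uname_output=None):
    """Return (extra environment variables, command) to start SQLcl in this environment."""
    if uname_output is None:
        uname_output = " ".join(platform.uname())  # roughly what uname -a prints
    if "CYGWIN" in uname_output:
        return {}, [SQLCL_BAT]  # Cygwin: Windows batch script, Windows JDK
    return {"JAVA_HOME": LINUX_JDK}, [SQLCL_SCRIPT]  # Linux subsystem: Linux JDK
```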

What I observe here is that it is much faster (or less slow…) to start the JVM from the Linux subsystem.
Here it takes about 4 seconds to start SQLcl, connect and exit:

fpa@dell-fpa:/tmp$ time sql sys/oracle@//192.168.78.104/pdb1 as sysdba <<
 
real 0m4.684s
user 0m3.750s
sys 0m2.484s
 
fpa@dell-fpa:/tmp$ uname -a
Linux dell-fpa 4.4.0-43-Microsoft #1-Microsoft Wed Dec 31 14:42:53 PST 2014 x86_64 x86_64 x86_64 GNU/Linux

Here is the same from Windows (timed from Cygwin, but SQLcl itself runs on Windows):

$ time sql sys/oracle@//192.168.78.104/pdb1 as sysdba <<
Disconnected from Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
 
real 0m16.261s
user 0m0.000s
sys 0m0.015s
 
fpa@dell-fpa ~
$ uname -a
CYGWIN_NT-10.0 dell-fpa 2.7.0(0.306/5/3) 2017-02-12 13:18 x86_64 Cygwin
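To put a number on "much faster": both runs report a real line from the bash time keyword, so a small sketch can extract it and compute the ratio. real_seconds is a hypothetical helper, the sample strings are the two outputs above, and two single runs are of course not a benchmark.

```python
import re

# The "real" (wall-clock) line printed by the bash time keyword, e.g. "real 0m4.684s".
_REAL = re.compile(r"^real\s+(\d+)m([\d.]+)s\s*$", re.MULTILINE)


def real_seconds(time_output):
    """Return the wall-clock time in seconds from the output of bash's time keyword."""
    match = _REAL.search(time_output)
    if match is None:
        raise ValueError("no 'real' line found")
    return int(match.group(1)) * 60 + float(match.group(2))


if __name__ == "__main__":
    wsl = real_seconds("real 0m4.684s\nuser 0m3.750s\nsys 0m2.484s")
    windows = real_seconds("real 0m16.261s\nuser 0m0.000s\nsys 0m0.015s")
    # prints: SQLcl on Windows took 3.47x as long as on the Linux subsystem
    print(f"SQLcl on Windows took {windows / wsl:.2f}x as long as on the Linux subsystem")
```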

So what?

The Linux subsystem on Windows is not ready yet. The only thing I proved here is that it is faster to start a Java application from Linux, but for this I always have a VirtualBox VM running on my laptop, and this is where it is faster to run it, on a real Linux system.

 

The post SQLcl on Bash on Ubuntu on Windows first appeared on Blog dbi services.

In-core logical replication will hit PostgreSQL 10

Thu, 2017-04-13 09:03

Finally, in PostgreSQL 10 (expected to be released this September) a long-awaited feature will probably appear: in-core logical replication. PostgreSQL has supported physical replication since version 9.0, and now the next step has happened with the implementation of logical replication. This will be a major help in upgrading PostgreSQL instances from one version to another with no (or almost no) downtime. In addition, it can be used to consolidate data from various instances into one instance for reporting purposes, or to distribute only a subset of your data to selected users on other instances. In contrast to physical replication, logical replication works at the table level, so you can replicate changes in one or more tables, one database or all databases in a PostgreSQL instance, which is quite flexible.

In PostgreSQL, logical replication is implemented using a publisher and subscriber model. This means the publisher is the one who sends the data and the subscriber is the one who receives and applies the changes. A subscriber can be a publisher as well, so you can build cascading logical replication. Here is an overview of a possible setup:

[Figure: overview of a possible logical replication setup]

For setting up logical replication when you do not start with an empty database, you need to initially load the database you want to replicate to. How can you do that? I have two PostgreSQL 10 instances (built from the git sources) running on the same host:

Role         Port
Publisher    6666
Subscriber   6667

Let's assume we have this sample setup on the publisher instance:

drop table if exists t1;
create table t1 ( a int primary key
                , b varchar(100)
                );
with generator as 
 ( select a.*
     from generate_series ( 1, 5000000 ) a
    order by random()
 )
insert into t1 ( a,b ) 
     select a
          , md5(a::varchar)
       from generator;
select * from pg_size_pretty ( pg_relation_size ('t1' ));

On the subscriber instance there is the same table, but empty:

create table t1 ( a int primary key
                , b varchar(100)
                );

Before we start with the initial load, let's take a look at the process list:

postgres@pgbox:/home/postgres/ [PUBLISHER] ps -ef | egrep "PUBLISHER|SUBSCRIBER"
postgres 17311     1  0 11:33 pts/0    00:00:00 /u01/app/postgres/product/dev/db_01/bin/postgres -D /u02/pgdata/PUBLISHER
postgres 17313 17311  0 11:33 ?        00:00:00 postgres: PUBLISHER: checkpointer process   
postgres 17314 17311  0 11:33 ?        00:00:00 postgres: PUBLISHER: writer process   
postgres 17315 17311  0 11:33 ?        00:00:00 postgres: PUBLISHER: wal writer process   
postgres 17316 17311  0 11:33 ?        00:00:00 postgres: PUBLISHER: autovacuum launcher process   
postgres 17317 17311  0 11:33 ?        00:00:00 postgres: PUBLISHER: stats collector process   
postgres 17318 17311  0 11:33 ?        00:00:00 postgres: PUBLISHER: bgworker: logical replication launcher   
postgres 17321     1  0 11:33 pts/1    00:00:00 /u01/app/postgres/product/dev/db_01/bin/postgres -D /u02/pgdata/SUBSCRIBER
postgres 17323 17321  0 11:33 ?        00:00:00 postgres: SUBSCRIBER: checkpointer process   
postgres 17324 17321  0 11:33 ?        00:00:00 postgres: SUBSCRIBER: writer process   
postgres 17325 17321  0 11:33 ?        00:00:00 postgres: SUBSCRIBER: wal writer process   
postgres 17326 17321  0 11:33 ?        00:00:00 postgres: SUBSCRIBER: autovacuum launcher process   
postgres 17327 17321  0 11:33 ?        00:00:00 postgres: SUBSCRIBER: stats collector process   
postgres 17328 17321  0 11:33 ?        00:00:00 postgres: SUBSCRIBER: bgworker: logical replication launcher   

You’ll notice that there is a new background process called “bgworker: logical replication launcher”. We’ll come back to that later.

Time to create our first publication on the publisher with the create publication command:

postgres@pgbox:/u02/pgdata/PUBLISHER/ [PUBLISHER] psql -X postgres
psql (10devel)
Type "help" for help.

postgres=# create publication my_first_publication for table t1;
CREATE PUBLICATION

On the subscriber we need to create a subscription by using the create subscription command:

postgres@pgbox:/u02/pgdata/SUBSCRIBER/ [SUBSCRIBER] psql -X postgres
psql (10devel)
Type "help" for help.

postgres=# create subscription my_first_subscription connection 'host=localhost port=6666 dbname=postgres user=postgres' publication my_first_publication;
ERROR:  could not create replication slot "my_first_subscription": ERROR:  logical decoding requires wal_level >= logical

OK, good hint: wal_level needs to be set to logical, and changing it requires a restart of the instance. After changing that on both instances:

postgres@pgbox:/home/postgres/ [SUBSCRIBER] psql -X postgres
psql (10devel)
Type "help" for help.

postgres=# create subscription my_first_subscription connection 'host=localhost port=6666 dbname=postgres user=postgres' publication my_first_publication;
CREATE SUBSCRIPTION
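You can avoid that error by checking the setting beforehand. The most reliable way is SHOW wal_level in psql on the publisher. If you only have the configuration file at hand, here is a rough Python sketch, assuming that the last uncommented assignment wins and that the default is replica (which I believe is the PostgreSQL 10 default). It ignores include directives and ALTER SYSTEM settings in postgresql.auto.conf, and both function names are made up for this example.

```python
import re

# "wal_level = value", "wal_level value" or "wal_level = 'value'" (the equal sign is optional).
_WAL_LEVEL = re.compile(r"^\s*wal_level\b\s*=?\s*'?(\w+)'?", re.IGNORECASE)


def effective_wal_level(conf_text, default="replica"):
    """Return the wal_level a postgresql.conf text sets; the last assignment wins."""
    level = default
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        match = _WAL_LEVEL.match(line)
        if match:
            level = match.group(1).lower()
    return level


def supports_logical_decoding(conf_text):
    """Logical decoding (and therefore CREATE SUBSCRIPTION) needs wal_level = logical."""
    return effective_wal_level(conf_text) == "logical"
```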

If you are not on super fast hardware and you check the process list again, you’ll see something like this:

postgres 19465 19079 19 11:58 ?        00:00:04 postgres: SUBSCRIBER: bgworker: logical replication worker for subscription 16390 sync 16384  

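On the subscriber, the “logical replication launcher” background process launched a worker process which syncs the table automatically (in this development build the initial copy can be avoided with the “NOCOPY DATA” option; released PostgreSQL 10 uses WITH (copy_data = false) instead):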

postgres=# show port;
 port 
------
 6667
(1 row)

postgres=# select count(*) from t1;
  count  
---------
 5000000
(1 row)

Wow, that was really easy. You can find more details in the logfile of the subscriber instance:

2017-04-13 11:58:15.099 CEST - 1 - 19087 -  - @ LOG:  starting logical replication worker for subscription "my_first_subscription"
2017-04-13 11:58:15.101 CEST - 1 - 19463 -  - @ LOG:  logical replication apply for subscription my_first_subscription started
2017-04-13 11:58:15.104 CEST - 2 - 19463 -  - @ LOG:  starting logical replication worker for subscription "my_first_subscription"
2017-04-13 11:58:15.105 CEST - 1 - 19465 -  - @ LOG:  logical replication sync for subscription my_first_subscription, table t1 started
2017-04-13 11:59:03.373 CEST - 1 - 19082 -  - @ LOG:  checkpoint starting: xlog
2017-04-13 11:59:37.985 CEST - 2 - 19082 -  - @ LOG:  checkpoint complete: wrote 14062 buffers (85.8%); 1 transaction log file(s) added, 0 removed, 0 recycled; write=26.959 s, sync=2.291 s, total=34.740 s; sync files=13, longest=1.437 s, average=0.171 s; distance=405829 kB, estimate=405829 kB
2017-04-13 12:02:23.728 CEST - 2 - 19465 -  - @ LOG:  logical replication synchronization worker finished processing
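The log also tells you how long the initial synchronization took, from "table t1 started" to "synchronization worker finished processing". Here is a quick sketch of that calculation, assuming, as in this excerpt, that each line starts with the date and time (both lines share the same time zone, which is ignored). log_time is a hypothetical helper, and the rows-per-second figure is only a rough indication, since a checkpoint ran during the copy.

```python
from datetime import datetime


def log_time(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.mmm' timestamp of a log line (time zone ignored)."""
    day, clock = line.split()[:2]
    return datetime.strptime(f"{day} {clock}", "%Y-%m-%d %H:%M:%S.%f")


if __name__ == "__main__":
    start = log_time("2017-04-13 11:58:15.105 CEST - 1 - 19465 -  - @ LOG:  logical replication sync "
                     "for subscription my_first_subscription, table t1 started")
    end = log_time("2017-04-13 12:02:23.728 CEST - 2 - 19465 -  - @ LOG:  logical replication "
                   "synchronization worker finished processing")
    elapsed = (end - start).total_seconds()
    # prints: 248.623 s, roughly 20111 rows/s
    print(f"{elapsed:.3f} s, roughly {5_000_000 / elapsed:.0f} rows/s")
```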

On the publisher instance you get another process for sending the changes to the subscriber:

postgres 19464 18318  0 11:58 ?        00:00:00 postgres: PUBLISHER: wal sender process postgres ::1(41768) idle

Changes to the table on the publisher should now get replicated to the subscriber node:

postgres=# show port;
 port 
------
 6666
(1 row)
postgres=# insert into t1 (a,b) values (-1,'aaaaa');
INSERT 0 1
postgres=# update t1 set b='bbbbb' where a=-1;
UPDATE 1

On the subscriber node:

postgres=# show port;
 port 
------
 6667
(1 row)

postgres=# select * from t1 where a = -1;
 a  |   b   
----+-------
 -1 | aaaaa
(1 row)

postgres=# select * from t1 where a = -1;
 a  |   b   
----+-------
 -1 | bbbbb
(1 row)
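Conceptually, row changes identified by the primary key were shipped from the publisher and applied on the subscriber. Here is a deliberately naive in-memory model of that publisher/subscriber idea, including cascading (a subscriber forwarding changes to its own subscribers). It is not how PostgreSQL implements logical replication (there is no WAL decoding, no replication slot, no initial copy and no conflict handling here), and Node, Change and apply_change are names made up for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    """A simplified row change: operation, primary key and new column values."""
    op: str                     # "insert", "update" or "delete"
    key: int                    # primary key value identifying the row
    row: Optional[dict] = None  # new column values (insert/update only)


def apply_change(table, change):
    """Apply one change to a table modelled as {primary key: row dict}."""
    if change.op == "insert":
        if change.key in table:
            raise KeyError(f"duplicate key {change.key}")
        table[change.key] = dict(change.row)
    elif change.op == "update":
        table[change.key].update(change.row)
    elif change.op == "delete":
        del table[change.key]
    else:
        raise ValueError(f"unknown operation {change.op!r}")


class Node:
    """An instance holding one table; it applies changes and forwards them to its subscribers."""

    def __init__(self, name):
        self.name = name
        self.table = {}
        self.subscribers = []

    def subscribe_to(self, publisher):
        publisher.subscribers.append(self)

    def apply(self, change):
        apply_change(self.table, change)
        for subscriber in self.subscribers:  # no loop protection: keep the graph acyclic
            subscriber.apply(change)


if __name__ == "__main__":
    publisher, subscriber = Node("port 6666"), Node("port 6667")
    subscriber.subscribe_to(publisher)
    publisher.apply(Change("insert", -1, {"b": "aaaaa"}))
    publisher.apply(Change("update", -1, {"b": "bbbbb"}))
    print(subscriber.table)  # prints {-1: {'b': 'bbbbb'}}
```

Because the toy model has no loop protection, it should only be wired up as in this post, with a different table in each direction.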

As mentioned initially you can make the subscriber a publisher and the publisher a subscriber at the same time. So when we create this table on both instances:

create table t2 ( a int primary key );

Then create a publication on the subscriber node:

postgres=# create table t2 ( a int primary key );
CREATE TABLE
postgres=# show port;
 port 
------
 6667
(1 row)

postgres=# create publication my_second_publication for table t2;
CREATE PUBLICATION
postgres=# 

Then create the subscription to that on the publisher node:

postgres=# show port;
 port 
------
 6666
(1 row)

postgres=# create subscription my_second_subscription connection 'host=localhost port=6667 dbname=postgres user=postgres' publication my_second_publication;
CREATE SUBSCRIPTION

… we have a second logical replication the other way around:

postgres=# show port;
 port 
------
 6667
(1 row)
postgres=# insert into t2 values ( 1 );
INSERT 0 1
postgres=# insert into t2 values ( 2 );
INSERT 0 1
postgres=# 

On the other instance:

postgres=# show port;
 port 
------
 6666
(1 row)

postgres=# select * from t2;
 a 
---
 1
 2
(2 rows)

There are two new system catalogs which give you information about subscriptions and publications:

postgres=# select * from pg_subscription;
 subdbid |        subname         | subowner | subenabled |                      subconninfo                       |      subslotname       |     subpublications     
---------+------------------------+----------+------------+--------------------------------------------------------+------------------------+-------------------------
   13216 | my_second_subscription |       10 | t          | host=localhost port=6667 dbname=postgres user=postgres | my_second_subscription | {my_second_publication}
(1 row)

postgres=# select * from pg_publication;
       pubname        | pubowner | puballtables | pubinsert | pubupdate | pubdelete 
----------------------+----------+--------------+-----------+-----------+-----------
 my_first_publication |       10 | f            | t         | t         | t
(1 row)

What a cool feature and so easy to use. Thanks to all who brought that into PostgreSQL 10, great work.

 

The post In-core logical replication will hit PostgreSQL 10 first appeared on Blog dbi services.

8 + 1 = 9, yes, true, but …

Thu, 2017-04-13 04:14

[Screenshot: DBCA (12.1.0.2)]

Btw: if you really did that (the screenshot is from 12.1.0.2):

SQL> alter system set sga_target=210m scope=spfile;

System altered.

SQL> alter system set sga_max_size=210m scope=spfile;

System altered.

SQL> alter system set pga_aggregate_target=16m scope=spfile;

System altered.

SQL> select banner from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
PL/SQL Release 12.1.0.2.0 - Production
CORE	12.1.0.2.0	Production
TNS for Linux: Version 12.1.0.2.0 - Production
NLSRTL Version 12.1.0.2.0 - Production

SQL> startup force
ORA-00821: Specified value of sga_target 212M is too small, needs to be at least 320M
SQL> 

The same for 12.2.0.1:

SQL> alter system set sga_target=210m scope=spfile;

System altered.

SQL> alter system set sga_max_size=210m scope=spfile;

System altered.

SQL> alter system set pga_aggregate_target=16m scope=spfile;

System altered.

SQL> select banner from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
PL/SQL Release 12.2.0.1.0 - Production
CORE	12.2.0.1.0	Production
TNS for Linux: Version 12.2.0.1.0 - Production
NLSRTL Version 12.2.0.1.0 - Production

SQL> startup force
ORA-00821: Specified value of sga_target 212M is too small, needs to be at least 468M
ORA-01078: failure in processing system parameters
SQL> 
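Why does the error say 212M when 210m was set? A plausible explanation, which I have not verified on these systems, is that the SGA is allocated in granules, and for an SGA of up to 1 GB the granule size is typically 4 MB, so 210 MB gets rounded up to the next multiple of 4. Here is a tiny Python sketch of that arithmetic, with the 4 MB granule as the stated assumption (round_up_to_granule is a made-up name):

```python
import math


def round_up_to_granule(size_mb, granule_mb=4):
    """Round a size up to a whole number of SGA granules (the 4 MB default is an assumption)."""
    return math.ceil(size_mb / granule_mb) * granule_mb


if __name__ == "__main__":
    print(round_up_to_granule(210))  # prints 212, the value reported by ORA-00821
    # Minimum sga_target reported by ORA-00821 in each release (MB), from the output above.
    minimum = {"12.1.0.2": 320, "12.2.0.1": 468}
    print(minimum["12.2.0.1"] - minimum["12.1.0.2"])  # prints 148
```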

To close this post here is another one that caught my eye yesterday:
[Screenshot]

Seems I totally missed that there was an x64 version of Solaris 8 and 9 :)

 

The post 8 + 1 = 9, yes, true, but … first appeared on Blog dbi services.

Oracle 12c – Why you shouldn’t do a crosscheck archivelog all in your regular RMAN backup scripts

Thu, 2017-04-13 02:28

Crosschecking in RMAN is quite cool stuff. With the RMAN crosscheck you can update an outdated RMAN repository about backups or archivelogs whose repository records do not match their physical status.

For example, if a user removes archived logs from disk with an operating system command, the repository (RMAN controlfile or RMAN catalog) still indicates that the logs are on disk, when in fact they are not. It is important to know that the RMAN CROSSCHECK command never deletes any operating system files or removes any repository records; it just updates the repository with the correct information. In case you really want to delete something, you must use the DELETE command for these operations.

Manually removing archived logs or anything else out of the fast recovery area is something you should never do, however, in reality it still happens.

But when it happens, you want to know which files are not in their physical location. So why not run a crosscheck archivelog all regularly in your backup scripts? Isn't that a good idea?

From my point of view, it is not, for two reasons:

  • Your backup script runs slower because you do an extra step
  • But first and foremost, you will not notice if an archived log is missing
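The second point is what matters for automation: a scheduled backup should fail loudly. Besides checking the exit status of the rman command, a wrapper can scan the captured RMAN output for message codes. Here is a minimal sketch, assuming the output is written to a log file; rman_errors is a hypothetical helper. Be aware that this simple approach also flags warnings (for example RMAN-08137 about archived logs not being deleted), so a real script would probably need an allow-list.

```python
import re
import sys

# Message codes such as RMAN-06059 or ORA-19625 in RMAN output.
_CODE = re.compile(r"\b(?:RMAN|ORA)-\d{5}\b")


def rman_errors(log_text):
    """Return the distinct RMAN-/ORA- codes in log_text, in order of first appearance."""
    return list(dict.fromkeys(_CODE.findall(log_text)))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_rman_log.py /path/to/rman_backup.log
    with open(sys.argv[1]) as log:
        codes = rman_errors(log.read())
    if codes:
        print("backup FAILED:", ", ".join(codes))
        sys.exit(1)
```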

Let’s run a little test case. I simply move one archived log away and run the backup archivelog all command afterwards.

oracle@dbidg03:/u03/fast_recovery_area/CDB/archivelog/2017_03_30/ [CDB (CDB$ROOT)] mv o1_mf_1_61_dfso8r7p_.arc o1_mf_1_61_dfso8r7p_.arc.20170413a

RMAN> backup archivelog all;

Starting backup at 13-APR-2017 08:03:14
current log archived
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=281 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=44 device type=DISK
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup command at 04/13/2017 08:03:17
RMAN-06059: expected archived log not found, loss of archived log compromises recoverability
ORA-19625: error identifying file /u03/fast_recovery_area/CDB/archivelog/2017_03_30/o1_mf_1_61_dfso8r7p_.arc
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 7

This is exactly what I expected. I want a clear error message in case an archived log is missing. I don’t want Oracle to skip over it and just continue as if nothing had happened. But what happens if I run a crosscheck archivelog all before running my backup command?

RMAN> crosscheck archivelog all;

released channel: ORA_DISK_1
released channel: ORA_DISK_2
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=281 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=44 device type=DISK
validation succeeded for archived log
archived log file name=/u03/fast_recovery_area/CDB/archivelog/2017_03_28/o1_mf_1_56_dfmzywt1_.arc RECID=73 STAMP=939802622
validation succeeded for archived log
archived log file name=/u03/fast_recovery_area/CDB/archivelog/2017_03_28/o1_mf_1_57_dfo40o1g_.arc RECID=74 STAMP=939839542
validation succeeded for archived log
archived log file name=/u03/fast_recovery_area/CDB/archivelog/2017_03_29/o1_mf_1_58_dfovy7cj_.arc RECID=75 STAMP=939864041
validation succeeded for archived log
archived log file name=/u03/fast_recovery_area/CDB/archivelog/2017_03_29/o1_mf_1_59_dfq7pcwz_.arc RECID=76 STAMP=939908847
validation succeeded for archived log
archived log file name=/u03/fast_recovery_area/CDB/archivelog/2017_03_30/o1_mf_1_60_dfrg8f8o_.arc RECID=77 STAMP=939948334
validation succeeded for archived log
archived log file name=/u03/fast_recovery_area/CDB/archivelog/2017_03_31/o1_mf_1_62_dfv0kybr_.arc RECID=79 STAMP=940032607
validation succeeded for archived log
archived log file name=/u03/fast_recovery_area/CDB/archivelog/2017_03_31/o1_mf_1_63_dfw5s2l8_.arc RECID=80 STAMP=940070724
validation succeeded for archived log
archived log file name=/u03/fast_recovery_area/CDB/archivelog/2017_04_12/o1_mf_1_64_dgw5mgsl_.arc RECID=81 STAMP=941119119
validation succeeded for archived log
archived log file name=/u03/fast_recovery_area/CDB/archivelog/2017_04_13/o1_mf_1_65_dgy552z0_.arc RECID=82 STAMP=941184196
Crosschecked 9 objects

validation failed for archived log
archived log file name=/u03/fast_recovery_area/CDB/archivelog/2017_03_30/o1_mf_1_61_dfso8r7p_.arc RECID=78 STAMP=939988281
Crosschecked 1 objects
RMAN>

The crosscheck validation failed for the archived log that I moved beforehand. Perfect, the crosscheck has found the issue.

RMAN> list expired backup;

specification does not match any backup in the repository

RMAN> list expired archivelog all;

List of Archived Log Copies for database with db_unique_name CDB
=====================================================================

Key     Thrd Seq     S Low Time
------- ---- ------- - --------------------
78      1    61      X 30-MAR-2017 00:45:33
        Name: /u03/fast_recovery_area/CDB/archivelog/2017_03_30/o1_mf_1_61_dfso8r7p_.arc

However, if I run backup archivelog all afterwards, RMAN continues as if nothing had ever happened, and unless you are monitoring expired archived logs or backups, you will never notice it.

RMAN> backup archivelog all;

Starting backup at 13-APR-2017 08:05:01
current log archived
using channel ORA_DISK_1
using channel ORA_DISK_2
channel ORA_DISK_1: starting compressed archived log backup set
channel ORA_DISK_1: specifying archived log(s) in backup set
input archived log thread=1 sequence=56 RECID=73 STAMP=939802622
input archived log thread=1 sequence=57 RECID=74 STAMP=939839542
input archived log thread=1 sequence=58 RECID=75 STAMP=939864041
input archived log thread=1 sequence=59 RECID=76 STAMP=939908847
input archived log thread=1 sequence=60 RECID=77 STAMP=939948334
channel ORA_DISK_1: starting piece 1 at 13-APR-2017 08:05:01
channel ORA_DISK_2: starting compressed archived log backup set
channel ORA_DISK_2: specifying archived log(s) in backup set
input archived log thread=1 sequence=62 RECID=79 STAMP=940032607
input archived log thread=1 sequence=63 RECID=80 STAMP=940070724
input archived log thread=1 sequence=64 RECID=81 STAMP=941119119
input archived log thread=1 sequence=65 RECID=82 STAMP=941184196
input archived log thread=1 sequence=66 RECID=83 STAMP=941184301
channel ORA_DISK_2: starting piece 1 at 13-APR-2017 08:05:01
channel ORA_DISK_2: finished piece 1 at 13-APR-2017 08:05:47
piece handle=/u03/fast_recovery_area/CDB/backupset/2017_04_13/o1_mf_annnn_TAG20170413T080501_dgy58fz7_.bkp tag=TAG20170413T080501 comment=NONE
channel ORA_DISK_2: backup set complete, elapsed time: 00:00:46
channel ORA_DISK_1: finished piece 1 at 13-APR-2017 08:06:07
piece handle=/u03/fast_recovery_area/CDB/backupset/2017_04_13/o1_mf_annnn_TAG20170413T080501_dgy58fy4_.bkp tag=TAG20170413T080501 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:01:06
Finished backup at 13-APR-2017 08:06:07

Starting Control File and SPFILE Autobackup at 13-APR-2017 08:06:07
piece handle=/u03/fast_recovery_area/CDB/autobackup/2017_04_13/o1_mf_s_941184367_dgy5bh7w_.bkp comment=NONE
Finished Control File and SPFILE Autobackup at 13-APR-2017 08:06:08

RMAN>

But is this really what I want? Probably not. Whenever an archived log is missing, RMAN should stop right away and throw an error message. This gives me the chance to check what went wrong and to correct it.

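One way to get that behaviour in a scheduled job is simply to leave the crosscheck out and let the job fail loudly when RMAN fails. The following is a minimal sketch, assuming RMAN is called from a bash script with OS authentication (`target /`) and that RMAN returns a non-zero exit status when a command fails; `run_archivelog_backup` and the `RMAN_BIN` override are hypothetical names, not part of any Oracle tooling.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run_archivelog_backup LOGFILE
# Runs "backup archivelog all" WITHOUT a preceding crosscheck, so that a
# missing archived log makes RMAN fail (RMAN-06059) instead of being skipped.
# RMAN_BIN is an illustrative override; it defaults to "rman" from the PATH.
run_archivelog_backup() {
  local log_file="$1" rc
  "${RMAN_BIN:-rman}" target / log="$log_file" <<'EOF'
backup archivelog all;
exit;
EOF
  rc=$?
  if [ "$rc" -ne 0 ]; then
    # Stop here and report, so that the scheduler/monitoring sees the failure.
    echo "ERROR: archived log backup failed (rc=$rc), see $log_file" >&2
  fi
  return "$rc"
}

# Example call from a backup script (not executed here):
#   run_archivelog_backup "/tmp/rman_arch_$(date +%Y%m%d%H%M).log" || exit 1
```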
Conclusion

I don't recommend running crosscheck archivelog all in your regular RMAN backup scripts. This command should be run manually when needed. Running it regularly makes your backup slower (OK, not by much, but still), and, more importantly, you will probably never notice when an archived log is missing, which can leave you with a database that can only be recovered to the point before the missing archived log.

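When you do run crosscheck archivelog all by hand, it can help to capture its output in a file and pull out the archived logs whose validation failed. This is a minimal sketch that assumes the output format shown above (a "validation failed for archived log" line followed by an "archived log file name=... RECID=..." line); `failed_archivelogs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: failed_archivelogs LOGFILE
# Prints the file name of every archived log whose crosscheck validation
# failed, and returns 1 if at least one was found (0 otherwise).
failed_archivelogs() {
  awk '
    /^validation failed for archived log/ { failed = 1; next }
    failed && /^archived log file name=/ {
      name = $0
      sub(/^archived log file name=/, "", name)   # drop the prefix
      sub(/ RECID=.*/, "", name)                  # drop RECID/STAMP
      print name
      found = 1
      failed = 0
    }
    END { exit (found ? 1 : 0) }
  ' "$1"
}

# Example (not executed here):
#   rman target / log=/tmp/crosscheck.log <<< "crosscheck archivelog all;"
#   failed_archivelogs /tmp/crosscheck.log || echo "missing archived logs found"
```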
 

The post Oracle 12c – Why you shouldn’t do a crosscheck archivelog all in your regular RMAN backup scripts appeared first on Blog dbi services.

Using WebLogic 12C RESTful interface to query a WebLogic Domain configuration

Thu, 2017-04-13 00:12

WebLogic 12.2.1 provides a new REST management interface with full access to all WebLogic Server resources.
This new interface provides an alternative to WLST scripting or JMX development for managing and monitoring WebLogic Domains.
This blog explains how the RESTful interface can be used to determine a WebLogic Domain configuration and display its principal attributes.

For this purpose, a Search RESTful call will be used.
The RESTful URL to point to the search is: http://vm01.dbi-workshop.com:7001/management/weblogic/latest/edit/search
This search RESTful URL points to the root of the WebLogic Domain configuration managed beans tree.

The search call is an HTTP POST and requires a JSON structure that defines the resources we are looking for.

{
    links: [],
    fields: [ 'name', 'configurationVersion' ],
    children: {
        servers: {
            links: [],
            fields: [ 'name','listenAddress','listenPort','machine','cluster' ],
            children: {
                SSL: {
                    fields: [ 'enabled','listenPort' ], links: []
                }
            }
        }
    }
}

The JSON structure above defines the search attributes provided in the HTTP POST.
This request returns the WebLogic Domain name and configuration version.
It then walks the servers children list and, for each server, returns the name, listen address, listen port, machine and, if the server is a member of a cluster, the cluster name. Within each server's children, it looks for the SSL entry and returns whether SSL is enabled and the SSL listen port.

To execute this REST call from the Unix command line, we will use the curl command:

curl -g --user monitor:******** -H X-Requested-By:MyClient -H Accept:application/json -H Content-Type:application/json -d "{ links: [], fields: [ 'name', 'configurationVersion' ], children: { servers: { links: [], fields: [ 'name', 'listenAddress', 'listenPort', 'machine', 'cluster' ], children: { SSL: { fields: [ 'enabled', 'listenPort' ], links: [] }} } } }" -X POST http://vm01.dbi-workshop.com:7001/management/weblogic/latest/edit/search

Below is a sample of the output returned by this command:

{
    "configurationVersion": "12.2.1.0.0",
    "name": "base_domain",
    "servers": {"items": [
        {
            "listenAddress": "vm01.dbi-workshop.com",
            "name": "AdminServer",
            "listenPort": 7001,
            "cluster": null,
            "machine": [
                "machines",
                "machine1"
            ],
            "SSL": {
                "enabled": true,
                "listenPort": 7002
            }
        },
        {
            "listenAddress": "vm01.dbi-workshop.com",
            "name": "server1",
            "listenPort": 7003,
            "cluster": [
                "clusters",
                "cluster1"
            ],
            "machine": [
                "machines",
                "machine1"
            ],
            "SSL": {
                "enabled": false,
                "listenPort": 7013
            }
        },
        {
            "listenAddress": "vm01.dbi-workshop.com",
            "name": "server2",
            "listenPort": 7004,
            "cluster": [
                "clusters",
                "cluster1"
            ],
            "machine": [
                "machines",
                "machine1"
            ],
            "SSL": {
                "enabled": false,
                "listenPort": 7014
            }
        },
        {
            "listenAddress": "vm01.dbi-workshop.com",
            "name": "server3",
            "listenPort": 7005,
            "cluster": null,
            "machine": [
                "machines",
                "machine1"
            ],
            "SSL": {
                "enabled": false,
                "listenPort": 7015
            }
        }
    ]}
}

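For repeated use, the JSON request can be kept in a file instead of being inlined on the command line. This is a minimal sketch, not an official client: `write_search_payload`, `wls_search` and the `WLS_URL`, `WLS_USER`, `WLS_PASS` and `CURL` variables are hypothetical names, and the payload is the one defined at the beginning of this post. `curl -d @file` reads the request body from the file (stripping newlines), and setting `CURL=echo` prints the command instead of sending it.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: WLS_URL, WLS_USER and WLS_PASS are placeholders, and
# CURL can be overridden (e.g. CURL=echo) to inspect the request without sending it.

# Writes the search request shown above to a file, one level per line.
write_search_payload() {
  cat > "$1" <<'JSON'
{
  links: [],
  fields: [ 'name', 'configurationVersion' ],
  children: {
    servers: {
      links: [],
      fields: [ 'name', 'listenAddress', 'listenPort', 'machine', 'cluster' ],
      children: {
        SSL: { links: [], fields: [ 'enabled', 'listenPort' ] }
      }
    }
  }
}
JSON
}

# POSTs the payload file to the search resource of the edit tree.
wls_search() {
  local payload_file="$1"
  "${CURL:-curl}" -sS -g \
    --user "${WLS_USER}:${WLS_PASS}" \
    -H "X-Requested-By: MyClient" \
    -H "Accept: application/json" \
    -H "Content-Type: application/json" \
    -d "@${payload_file}" \
    -X POST "${WLS_URL}/management/weblogic/latest/edit/search"
}

# Example (not executed here):
#   write_search_payload /tmp/search.json
#   WLS_URL=http://vm01.dbi-workshop.com:7001 WLS_USER=monitor WLS_PASS=... \
#     wls_search /tmp/search.json
```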
The post Using WebLogic 12C RESTful interface to query a WebLogic Domain configuration appeared first on Blog dbi services.

Welcome to M|17

Wed, 2017-04-12 20:00

Welcome to MariaDB’s first user conference.

On the 11th at 09:00, this big event started at the Conrad Hotel in New York, close to One World Trade Center.
After the short registration process, where we received a bag full of goodies (mobile phone lens, Jolt charger, cap, notepad,…),
we could choose between 3 workshops:
– Scaling and Securing MariaDB for High Availability
– MariaDB ColumnStore for High Performance Analytics
– Building Modern Applications with MariaDB

I decided to go to the first one presented by Michael de Groot, technical consultant at MariaDB.
After a theoretical and detailed introduction to the MariaDB cluster technology and mechanisms (around 40 slides), we had to build a 4-node MariaDB cluster from scratch, and I have to admit that this exercise was well prepared, as we just had to follow the instructions displayed on the screen.
By the end, at 12:30, almost everybody had deployed the MariaDB cluster and was able to use and manage it.

Afterwards, it was time to get lunch. A big buffet of salads and sandwiches was waiting for us.
It was really nice because we could meet people such as Peter Zaitsev (Percona’s CEO) in a cool and relaxed atmosphere.

After lunch, MariaDB CEO Michael Howard delivered a keynote in the biggest conference room of the hotel, where around 400 people were present.
He mainly talked about the strategic direction of MariaDB in the open source world for the coming years.
Unfortunately, the air conditioning was too cold and a lot of people, including me, started sneezing; I had to keep my jacket on the whole time.

Then a special guest speaker, Joan Tay Kim Choo, Executive Director of Technology Operations at DBS Bank, talked about their success story:
how they migrated all their databases from Oracle Enterprise and DB2 to MariaDB.

Roger Bodamer, MariaDB Chief Product Officer, then gave his keynote.
It was really interesting because he discussed how MariaDB will exploit the fundamental architectural changes in the cloud and how MariaDB will enable both OLTP and analytical use cases for enterprises at any scale.

Finally, at five o’clock, the Welcome Reception and Technology Pavilion started, in other words a small party.
Good music, good red wines (Cabernet was really good), good atmosphere.
We could meet all the speakers, and I had the chance to meet Michael Widenius, alias “Monty”, founder of MySQL, a great moment for me.
He graciously agreed to take pictures with me, several times, because the light was really bad.

Around 18:30, the party was almost over; I was still there, one of the last guests, finishing my glass of Cabernet and thinking of tomorrow, the second day of this event, and all the sessions I planned to see.

The post Welcome to M|17 appeared first on Blog dbi services.

Failed to set logmining server parameter MAX_SGA_SIZE to value XX

Wed, 2017-04-12 08:44

If you see something like this in your GoldenGate error log when you try to start an extract:

2017-04-12 14:51:38  ERROR   OGG-02039  Oracle GoldenGate Capture for Oracle, extxxx.prm:  Failed to set logmining server parameter MAX_SGA_SIZE to value 24.
2017-04-12 14:51:38  ERROR   OGG-02042  Oracle GoldenGate Capture for Oracle, extxxx.prm:  OCI Error 23605.
2017-04-12 14:51:38  ERROR   OGG-01668  Oracle GoldenGate Capture for Oracle, extxxx.prm:  PROCESS ABENDING.

… then you should increase streams_pool_size (you may need to increase the SGA parameters as well):

SQL> show parameter streams

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
streams_pool_size                    big integer 23M

Set it to at least 1 GB and you should be fine.

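The value itself can be raised with, for example, `ALTER SYSTEM SET streams_pool_size=1G SCOPE=BOTH;` when the instance uses an spfile. If you want a monitoring script to flag instances that are still below that size, a minimal bash sketch could look like this; `to_mb` and `streams_pool_ok` are hypothetical helper names, and they assume the value is passed exactly as printed in the VALUE column of SHOW PARAMETER (e.g. `23M`).

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a monitoring script; the 1 GB threshold is the
# recommendation above, not an Oracle-documented minimum.

# Converts a size as printed by SHOW PARAMETER (e.g. 23M, 1G, 512K, or a plain
# byte count) into whole megabytes. Integer values only.
to_mb() {
  local v="${1^^}"   # upper-case so that 23m and 23M are handled alike
  case "$v" in
    *G) echo $(( ${v%G} * 1024 )) ;;
    *M) echo $(( ${v%M} )) ;;
    *K) echo $(( ${v%K} / 1024 )) ;;
    *)  echo $(( v / 1024 / 1024 )) ;;
  esac
}

# Succeeds when the given streams_pool_size is at least 1 GB.
# Note: a value of 0 (e.g. left to automatic SGA management) is reported as too small.
streams_pool_ok() {
  [ "$(to_mb "$1")" -ge 1024 ]
}
```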
 

The post Failed to set logmining server parameter MAX_SGA_SIZE to value XX appeared first on Blog dbi services.

PostgreSQL 10 is just around the corner, time to dare the switch?

Wed, 2017-04-12 00:54

Some days ago, Robert Haas published a great blog post about the features you can expect in the upcoming PostgreSQL 10 (probably in September this year). Beyond what Robert describes in his blog: do you still build your database infrastructure on proprietary software? The time to move forward is now; let me explain why:

What you always hear when it comes to replacing proprietary products with open source solutions is: it does not cost anything. Well, this is not entirely true. The software itself is free, and at least when it comes to PostgreSQL, you are free to do whatever you want with it. But this does not mean that you do not need to spend money when using open source software. You will still need either to hire people to operate what you need or to pay someone else to operate it for you (in the cloud or not, that does not matter). The big differences are:

  • You won’t need to purchase licenses. Fact.
  • Internal or external: when you compare the effort to operate a proprietary database with the time required to operate an open source database, you’ll save money for sure, as you’ll usually reduce complexity. The database is there to do its work, not to generate huge amounts of administration effort.
  • When you need specific features that are not there yet, you’ll need to get in touch with the community and try to convince them to implement them, implement them yourself, or pay someone to implement them (all choices will cost some money).

So much for the money aspects. The real benefit you get when choosing PostgreSQL is that you do not lock yourself in. Of course, once you start using PostgreSQL, your data is in PostgreSQL and you cannot just take it as it is and put it into another database. And of course, once you start implementing business logic inside the database, you might feel that this locks you in again, but this is true for every product you use. Once you start using a product, you use it the way it works, and other products usually work in another way. The key point is that you are free to do whatever you want with it, and PostgreSQL tries to be as compliant with the SQL standard as possible. This is a complete change in thinking when you are used to working with the products of the big companies. PostgreSQL is developed by people around the globe who in turn work for various companies around the globe. But there is no company called PostgreSQL; nobody can “buy” PostgreSQL. It is a pure open source project comparable to the Linux kernel development. Nobody can “buy” the Linux kernel, but everybody can build a business around it, as the various commercial Linux distributions do. The very same is true for PostgreSQL. The PostgreSQL product itself will always be free; check the PostgreSQL license.

What you do not get from PostgreSQL are the tools you need around it, e.g. for monitoring, backup/restore management, or automating failover and failback. These tools exist, of course, both as open source products and as commercial tools. The commercial ones usually require some kind of subscription (e.g. EnterpriseDB).

Another important point is that PostgreSQL is supported on many platforms; check the build farm to see what is currently tested and works. You are free to choose whatever platform you want to use: your company mainly uses Windows? Go and install PostgreSQL on Windows. Your main platform is FreeBSD? Go, install and use PostgreSQL on it.

But we need professional support! I know, you are used to working with the support organizations of the big companies and believe that only paid support is good support. If you want to (or are forced to), have a look here or contact us. There are plenty of companies that offer commercial support. In fact, the official mailing lists provide outstanding support as well. Post your question to the right mailing list and it will be answered pretty fast, trust me. If you can’t believe it, test it (but search the archives before asking; maybe the answer is already there).

There are no conferences for PostgreSQL! Really? Have a look here. The next one in Switzerland is here.

I will not go into a feature discussion here. If you want to learn more about the features of PostgreSQL, search this blog or check the official documentation. There are tons of slides on SlideShare as well, and many, many videos. If you really want to know what is currently going on in PostgreSQL development, check the PostgreSQL commit fest, which is currently in progress. This is where patches are tracked. Everything is transparent, and for every single patch you can see how the whole discussion started on the hackers mailing list, e.g. for declarative partitioning.

Think about it …

The post PostgreSQL 10 is just around the corner, time to dare the switch? appeared first on Blog dbi services.

Documentum – Deactivation of a docbase without uninstallation

Sun, 2017-04-09 03:19

At some of our customers, we often install new docbases for development purposes, which are used only for a short time, to avoid cross-team interactions and interference. Creating new docbases is quite easy with Documentum, but it still takes some time (unless you use silent installations or Docker components). Therefore, installing and removing docbases over and over can be a pain. For this reason, we often install new docbases but, instead of uninstalling them, we simply “deactivate” them. By deactivate, I mean updating configuration files and scripts so that everything acts as if the docbase had never been created in the first place. As said above, some docbases are only needed temporarily, but we might need them again in the near future, so we don’t want to remove them completely.

In this blog, I will show you which files should be updated to simulate a “deactivation”, so that the Documentum components act as if the docbase wasn’t there. I will describe the steps for the different components: the Content Server (including the JMS and the Thumbnail Server), the Web Application Server (D2/DA for example), the Full Text Server and the ADTS.

In this blog, I will use a Documentum 7.2 environment on Linux, of course (except for the ADTS…), which therefore uses JBoss 7.1.1 (for the JMS and xPlore 1.5). In all our environments, we also have a custom script that can be used to stop or start all components installed on the host. Therefore, in this blog I will assume that you have a similar script (let’s say it is named “startstop”) which includes a variable named “DOCBASES=” that contains the list of docbases/repositories installed on the local Content Server (DOCBASES="DOCBASE1 DOCBASE2 DOCBASE3"). For the Full Text Server, this variable will be “INDEXAGENTS=” and it will contain the names of the Index Agents installed on the local FT (INDEXAGENTS="Indexagent_DOCBASE1 Indexagent_DOCBASE2 Indexagent_DOCBASE3"). If you don’t have such a script, or if it is set up differently, just adapt the steps below. I will put this custom startstop script at the following locations: $DOCUMENTUM/scripts/startstop on the Content Server and $XPLORE_HOME/scripts/startstop on the Full Text Server.

In the steps below, I will also assume that the docbase that needs to be deactivated is “DOCBASE1” and that we have two additional docbases installed in our environment (“DOCBASE2” and “DOCBASE3”) that need to stay up and running. If you have a High Availability environment, the steps below apply to the Primary Content Server; for Remote Content Servers, you will need to adapt the names of the docbase start and shutdown scripts placed under $DOCUMENTUM/dba: on Remote CSs, the shutdown script for example is named $DOCUMENTUM/dba/dm_shutdown_DOCBASE1_<ServiceName@RemoteCSs>.

1. Content Server

Ok so let’s start with the deactivation of the docbase on the Content Server. Obviously the first thing to do is to stop the docbase if it is running:

ps -ef | grep "docbase_name DOCBASE1 " | grep -v grep
$DOCUMENTUM/dba/dm_shutdown_DOCBASE1

Once done, and since we don’t want the docbase to be inadvertently restarted, we need to update the custom script mentioned above. In addition, we should also rename the docbase start script so that an installer won’t start the docbase either.

mv $DOCUMENTUM/dba/dm_start_DOCBASE1 $DOCUMENTUM/dba/dm_start_DOCBASE1_deactivated
vi $DOCUMENTUM/scripts/startstop
    ==> Duplicate the line starting with "DOCBASES=..."
    ==> Comment one of the two lines and remove the docbase DOCBASE1 from the list that isn't commented
    ==> In the end, you should have something like:
        DOCBASES="DOCBASE2 DOCBASE3"
        #DOCBASES="DOCBASE1 DOCBASE2 DOCBASE3"

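The duplicate-and-comment edit above can also be done non-interactively, which is handy when several servers have to be updated. This is a minimal sketch assuming GNU sed and a startstop script where the variable is defined on a single line as a double-quoted, space-separated list; `deactivate_in_list` is a hypothetical helper name. The same function covers the INDEXAGENTS variable on the Full Text Server.

```shell
#!/usr/bin/env bash
# Hypothetical helper: deactivate_in_list FILE VARIABLE ITEM
# Turns   DOCBASES="DOCBASE1 DOCBASE2 DOCBASE3"
# into    DOCBASES="DOCBASE2 DOCBASE3"
#         #DOCBASES="DOCBASE1 DOCBASE2 DOCBASE3"
# h keeps the original line, the s commands remove ITEM and tidy the spaces,
# p prints the active line, then x + s/^/#/ emit the commented original.
deactivate_in_list() {
  local file="$1" var="$2" item="$3"
  sed -i -E "/^${var}=\"/{
h
s/([\" ])${item}([ \"])/\\1\\2/
s/=\" +/=\"/
s/ +\"/\"/
s/  +/ /g
p
x
s/^/#/
}" "$file"
}

# Examples (not executed here):
#   deactivate_in_list "$DOCUMENTUM/scripts/startstop" DOCBASES DOCBASE1
#   deactivate_in_list "$XPLORE_HOME/scripts/startstop" INDEXAGENTS Indexagent_DOCBASE1
```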
 

OK, so now the docbase has been stopped and can’t be started anymore, so let’s check all the clients that were able to connect to it. If you have monitoring running on the Content Server (using the crontab for example), don’t forget to disable it too, since the docbase isn’t running anymore. In the crontab, you can simply comment the relevant lines (using “crontab -e”). On the Java MethodServer (JMS) side, there are at least two applications you should look at (ServerApps and the ACS). To deactivate the docbase DOCBASE1 for these two applications, apply the following steps:

cd $DOCUMENTUM_SHARED/jboss7.1.1/server/DctmServer_MethodServer/deployments
vi ServerApps.ear/DmMethods.war/WEB-INF/web.xml
    ==> Comment the 4 lines related to DOCBASE1 as follow:
        <!--init-param>
            <param-name>docbase-DOCBASE1</param-name>
            <param-value>DOCBASE1</param-value>
        </init-param-->

vi acs.ear/lib/configs.jar/config/acs.properties
    ==> Reorder the “repository.name.X=” properties for DOCBASE1 to have the biggest number (X is a number which goes from 1 to 3 in this case since I have 3 docbases)
    ==> Reorder the “repository.acsconfig.X=” properties for DOCBASE1 to have the biggest number (X is a number which goes from 1 to 3 in this case since I have 3 docbases)
    ==> Comment the “repository.name.Y=” property with the biggest number (Y is the number for DOCBASE1 so should be 3 now)
    ==> Comment the “repository.acsconfig.Y=” property with the biggest number (Y is the number for DOCBASE1 so should be 3 now)
    ==> Comment the “repository.login.Y=” property with the biggest number (Y is the number for DOCBASE1 so should be 3 now)
    ==> Comment the “repository.password.Y=” property with the biggest number (Y is the number for DOCBASE1 so should be 3 now)

So what has been done above? The web.xml file references all docbases configured for the application, so commenting these lines simply prevents the JMS from trying to contact DOCBASE1, which isn’t running anymore. For the ACS, updating acs.properties is a little more complex. What I usually do in this file is reorder the properties so that the docbases that aren’t running have the highest index. Since DOCBASE1 was the first of DOCBASE1, DOCBASE2 and DOCBASE3 to be installed, it has index 1 by default in acs.properties (e.g.: repository.name.1=DOCBASE1.DOCBASE1 // repository.name.2=DOCBASE2.DOCBASE2 // …). Reordering the properties allows you to simply comment the highest number (3 in this case) for all properties while keeping numbers 1 and 2 enabled.

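Once the properties have been reordered so that DOCBASE1 has the highest index, commenting them out can be scripted. This is a minimal sketch assuming GNU sed and an exploded configs.jar folder, as in the path above; `comment_acs_index` is a hypothetical helper name, and the reordering itself is still done by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: comment_acs_index ACS_PROPERTIES INDEX
# Comments the repository.name/acsconfig/login/password properties that use
# INDEX (e.g. 3 once DOCBASE1 has been moved to the highest number).
# The anchored ".INDEX=" pattern leaves e.g. index 13 untouched when INDEX is 3.
comment_acs_index() {
  local file="$1" idx="$2"
  sed -i -E "s/^repository\.(name|acsconfig|login|password)\.${idx}=/#&/" "$file"
}

# Example (not executed here):
#   cd $DOCUMENTUM_SHARED/jboss7.1.1/server/DctmServer_MethodServer/deployments
#   comment_acs_index acs.ear/lib/configs.jar/config/acs.properties 3
```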
In addition to the above, you might also have a BPM (xCP) installed, in which case you also need to apply the following step:

vi bpm.ear/bpm.war/WEB-INF/web.xml
    ==> Comment the 4 lines related to DOCBASE1 as follow:
        <!--init-param>
            <param-name>docbase-DOCBASE1</param-name>
            <param-value>DOCBASE1</param-value>
        </init-param-->

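Both web.xml edits (ServerApps and, if installed, BPM) follow the same pattern, so they can be scripted as well. This is a minimal sketch assuming GNU sed and the four-line init-param layout shown above (opening tag, param-name, param-value and closing tag on separate lines); `comment_init_param` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: comment_init_param WEB_XML DOCBASE
# Wraps the <init-param> block whose <param-name> is docbase-DOCBASE in an
# XML comment (<!--init-param> ... </init-param-->), as done by hand above.
# Other <init-param> blocks are printed unchanged.
comment_init_param() {
  local file="$1" docbase="$2"
  sed -i '/<init-param>/{
N
/<param-name>docbase-'"${docbase}"'<\/param-name>/{
s/<init-param>/<!--init-param>/
N
N
s/<\/init-param>/<\/init-param-->/
}
}' "$file"
}

# Examples (not executed here):
#   cd $DOCUMENTUM_SHARED/jboss7.1.1/server/DctmServer_MethodServer/deployments
#   comment_init_param ServerApps.ear/DmMethods.war/WEB-INF/web.xml DOCBASE1
#   comment_init_param bpm.ear/bpm.war/WEB-INF/web.xml DOCBASE1
```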
 

Once the steps have been applied, you can restart the JMS using your preferred method. This is an example:

$DOCUMENTUM_SHARED/jboss7.1.1/server/stopMethodServer.sh
ps -ef | grep "MethodServer" | grep -v grep
nohup $DOCUMENTUM_SHARED/jboss7.1.1/server/startMethodServer.sh >> $DOCUMENTUM_SHARED/jboss7.1.1/server/nohup-JMS.out 2>&1 &

After the restart, the JMS logs won’t contain any more errors related to connection problems with DOCBASE1. For example, if you don’t update the ACS file (acs.properties), the ACS will still try to project itself to all docbases and will therefore fail for DOCBASE1.

The next component I wanted to describe isn’t a component that is installed by default on all Content Servers but you might have it if you need document previews: the Thumbnail Server. To deactivate the docbase DOCBASE1 inside the Thumbnail Server, it’s pretty easy too:

vi $DM_HOME/thumbsrv/conf/user.dat
    ==> Comment the 5 lines related to DOCBASE1:
        #[DOCBASE1]
        #user=dmadmin
        #local_folder=thumbnails
        #repository_folder=/System/ThumbnailServer
        #pfile.txt=/app/dctm/server/product/7.2/thumbsrv/conf/DOCBASE1/pfile.txt

sh -c "$DM_HOME/thumbsrv/container/bin/shutdown.sh"
ps -ef | grep "thumbsrv" | grep -v grep
sh -c "$DM_HOME/thumbsrv/container/bin/startup.sh"

If you don’t do that, the Thumbnail Server will try to contact all docbases configured in the “user.dat” file and, because of a bug in certain versions of the Thumbnail Server (see this blog for more information), it might even fail to start. Commenting the lines related to DOCBASE1 in this file is therefore quite important.

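The user.dat change can be scripted too. This is a minimal sketch assuming that each docbase section in user.dat starts with a [DOCBASE] header line and runs until the next [...] header, as in the excerpt above; `comment_ini_section` is a hypothetical helper name. Blank lines and lines that are already commented are left unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: comment_ini_section FILE SECTION
# Prefixes with "#" every non-empty, not-yet-commented line from the
# "[SECTION]" header up to (not including) the next "[...]" header.
comment_ini_section() {
  local file="$1" section="$2" tmp rc
  tmp="$(mktemp)" || return 1
  awk -v s="[${section}]" '
    /^\[/ { in_section = ($0 == s) }          # entering/leaving a section
    in_section && NF && !/^#/ { $0 = "#" $0 } # comment the section lines
    { print }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"  # cat keeps the file owner/permissions
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Example (not executed here):
#   comment_ini_section "$DM_HOME/thumbsrv/conf/user.dat" DOCBASE1
```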
 

2. Web Application Server

For the Web Application Server hosting your Documentum Administrator and D2/D2-Config clients, the steps are pretty simple: usually nothing or almost nothing has to be done. If you really want to be clean, there might be a few things to do, depending on what you configured… In this part, I will consider that you are using non-exploded applications (i.e. war files), which I will put under $WS_HOME/applications/. If your applications are exploded (meaning your D2 is a folder and not a war file), you don’t have to extract the files (no need to execute the jar commands). If you are using a Tomcat Application Server, the applications will usually be exploded (folders) and placed under $TOMCAT_HOME/webapps/.

 – D2:

If you defined the LoadOnStartup properties for DOCBASE1, you might need to execute the following commands to extract the file, comment the lines for DOCBASE1 inside it and put the file back into the war file:

jar -xvf $WS_HOME/applications/D2.war WEB-INF/classes/D2FS.properties
sed -i 's,^LoadOnStartup.DOCBASE1.\(username\|domain\)=.*,#&,' WEB-INF/classes/D2FS.properties
jar -uvf $WS_HOME/applications/D2.war WEB-INF/classes/D2FS.properties

Also, if you defined a default docbase in D2 and that docbase is DOCBASE1, you need to change the default docbase to DOCBASE2 or DOCBASE3. In my case, I will use DOCBASE2 as the new default docbase:

jar -xvf $WS_HOME/applications/D2.war WEB-INF/classes/config.properties
sed -i 's,^defaultRepository=.*,defaultRepository=DOCBASE2,' WEB-INF/classes/config.properties
jar -uvf $WS_HOME/applications/D2.war WEB-INF/classes/config.properties

Finally, if you are using Single Sign-On, you will have an SSO user. With recent versions of D2, it is defined in the d2fs-trust.properties file, while it was defined in the shiro.ini file before. Since I’m using D2 4.5, the commands would be:

jar -xvf $WS_HOME/applications/D2.war WEB-INF/classes/d2fs-trust.properties
sed -i 's,^DOCBASE1.user=.*,#&,' WEB-INF/classes/d2fs-trust.properties
jar -uvf $WS_HOME/applications/D2.war WEB-INF/classes/d2fs-trust.properties

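The three D2 changes above all follow the same extract/edit/update pattern. The sketch below wraps it in one function that also handles exploded applications (for example on Tomcat) by editing the file in place; `d2_sed` is a hypothetical helper name, and the war branch assumes a JDK `jar` binary in the PATH. Extracting into a scratch directory avoids leaving a stray WEB-INF folder in the current directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: d2_sed TARGET ENTRY SED_EXPRESSION
# TARGET: an exploded application folder (e.g. $TOMCAT_HOME/webapps/D2)
#         or a packaged war file (e.g. $WS_HOME/applications/D2.war).
# ENTRY:  path inside the application, e.g. WEB-INF/classes/D2FS.properties.
d2_sed() {
  local target="$1" entry="$2" expr="$3" war work rc
  if [ -d "$target" ]; then
    # Exploded application: edit the file in place.
    sed -i "$expr" "$target/$entry"
    return
  fi
  # Packaged war: extract, edit and update from a scratch directory.
  war="$(cd "$(dirname "$target")" && pwd)/$(basename "$target")"
  work="$(mktemp -d)" || return 1
  (
    cd "$work" &&
      jar -xf "$war" "$entry" &&
      sed -i "$expr" "$entry" &&
      jar -uf "$war" "$entry"
  )
  rc=$?
  rm -rf "$work"
  return "$rc"
}

# Examples (not executed here):
#   d2_sed "$WS_HOME/applications/D2.war" WEB-INF/classes/D2FS.properties \
#     's,^LoadOnStartup.DOCBASE1.\(username\|domain\)=.*,#&,'
#   d2_sed "$WS_HOME/applications/D2.war" WEB-INF/classes/config.properties \
#     's,^defaultRepository=.*,defaultRepository=DOCBASE2,'
```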
 

 – D2-Config:

Usually nothing is needed. Only running docbases will be available through D2-Config.

 – DA:

Usually nothing is needed, unless you have specific customization for DA, in which case you probably need to take a look at the files under the “custom” folder.

3. Full Text Server

For the Full Text Server, the steps are also relatively easy. The only thing that needs to be done is to stop the Index Agent related to the docbase DOCBASE1 and prevent it from starting again. In our environments, since we sometimes have several docbases installed on the same Content Server and several Index Agents installed on the same Full Text Server, we need to differentiate the names of the Index Agents. We usually just add the name of the docbase at the end: Indexagent_DOCBASE1. So let’s start by stopping the Index Agent:

ps -ef | grep "Indexagent_DOCBASE1" | grep -v grep
$XPLORE_HOME/jboss7.1.1/server/stopIndexagent_DOCBASE1.sh

Once done, and since I use the global startstop script mentioned earlier in this blog, the only remaining step is to prevent the Index Agent from starting again, which can be done as follows:

mv $XPLORE_HOME/jboss7.1.1/server/startIndexagent_DOCBASE1.sh $XPLORE_HOME/jboss7.1.1/server/startIndexagent_DOCBASE1.sh_deactivated
vi $XPLORE_HOME/scripts/startstop
    ==> Duplicate the line starting with "INDEXAGENTS=..."
    ==> Comment one of the two lines and remove the Index Agent related to DOCBASE1 from the list that isn't commented
    ==> In the end, you should have something like:
        INDEXAGENTS="Indexagent_DOCBASE2 Indexagent_DOCBASE3"
        #INDEXAGENTS="Indexagent_DOCBASE1 Indexagent_DOCBASE2 Indexagent_DOCBASE3"

If you have monitoring running on the Full Text Server for this Index Agent, don’t forget to disable it.

4. ADTS

The last section of this blog covers the ADTS (Advanced Document Transformation Services), also called the Rendition Server. The ADTS is fairly similar to the other Documentum components: first you install the binaries and then you configure a docbase to use/be supported by the ADTS. When you do that, the ADTS updates some configuration files, which therefore need to be updated again if you want to deactivate a docbase. As you know, the ADTS runs on Windows Server, so I won’t show you commands to execute in this section; I will just point you to the configuration files that need to be edited and what to update in them. In this section, I will use %ADTS_HOME% as the folder under which the ADTS has been installed. It’s usually a good idea to install the ADTS on a dedicated drive (not the OS drive), like D:\CTS\.

So the first thing to do is to prevent the different profiles of the docbase from being loaded:

Open the file "%ADTS_HOME%\CTS\config\CTSProfileService.xml"
    ==> Comment the whole "ProfileManagerContext" XML tag related to DOCBASE1
    ==> In the end, you should have something like:
        <!--ProfileManagerContext DocbaseName="DOCBASE1" ProcessExternally="false">
            <CTSServerProfile CTSProfileValue="%ADTS_HOME%\CTS\\docbases\\DOCBASE1\\config\\profiles\\lightWeightProfiles" CTSProfileName="lightWeightProfile"/>
            <CTSServerProfile CTSProfileValue="%ADTS_HOME%\CTS\\docbases\\DOCBASE1\\config\\profiles\\lightWeightSystemProfiles" CTSProfileName="lightWeightSystemProfile"/>
            <CTSServerProfile CTSProfileValue="%ADTS_HOME%\CTS\\docbases\\DOCBASE1\\config\\profiles\\heavyWeightProfiles" CTSProfileName="heavyWeightProfile"/>
            <CTSServerProfile CTSProfileValue="/System/Media Server/Profiles" CTSProfileName="lightWeightProfileFolder"/>
            <CTSServerProfile CTSProfileValue="/System/Media Server/System Profiles" CTSProfileName="lightWeightSystemProfileFolder"/>
            <CTSServerProfile CTSProfileValue="/System/Media Server/Command Line Files" CTSProfileName="heavyWeightProfileFolder"/>
            <CTSServerProfile CTSProfileValue="%ADTS_HOME%\CTS\docbases\DOCBASE1\config\temp_profiles" CTSProfileName="tempFileDir"/>
            <CTSServerProfile CTSProfileValue="ProfileSchema.dtd" CTSProfileName="lwProfileDTD"/>
            <CTSServerProfile CTSProfileValue="MP_PROPERTIES.dtd" CTSProfileName="hwProfileDTD"/>
            <ForClients>XCP</ForClients>
        </ProfileManagerContext-->

 

Once that is done, the queue processors need to be disabled too:

Open the file "%ADTS_HOME%\CTS\config\CTSServerService.xml"
    ==> Comment the two "QueueProcessorContext" XML tags related to DOCBASE1
    ==> In the end, you should have something like (I'm not displaying the whole XML tags since they are quite long...):
        <!--QueueProcessorContext DocbaseName="DOCBASE1">
            <CTSServer AttributeName="queueItemName" AttributeValue="dm_mediaserver"/>
            <CTSServer AttributeName="queueInterval" AttributeValue="10"/>
            <CTSServer AttributeName="maxThreads" AttributeValue="10"/>
            ...
            <CTSServer AttributeName="processOnlyParked" AttributeValue=""/>
            <CTSServer AttributeName="parkingServerName" AttributeValue=""/>
            <CTSServer AttributeName="notifyFailureMessageAdmin" AttributeValue="No"/>
        </QueueProcessorContext-->
        <!--QueueProcessorContext DocbaseName="DOCBASE1">
            <CTSServer AttributeName="queueItemName" AttributeValue="dm_autorender_win31"/>
            <CTSServer AttributeName="queueInterval" AttributeValue="10"/>
            <CTSServer AttributeName="maxThreads" AttributeValue="10"/>
            ...
            <CTSServer AttributeName="processOnlyParked" AttributeValue=""/>
            <CTSServer AttributeName="parkingServerName" AttributeValue=""/>
            <CTSServer AttributeName="notifyFailureMessageAdmin" AttributeValue="No"/>
        </QueueProcessorContext-->
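Before restarting the services, a quick sanity check can confirm that no active QueueProcessorContext still references DOCBASE1 (there were two before the edit). A minimal Python sketch, assuming the files use plain <!-- --> comments as shown above; active_docbases is a hypothetical helper name:

```python
import re
from collections import Counter


def active_docbases(xml_text: str, tag: str) -> Counter:
    """Count the uncommented <tag DocbaseName="..."> elements per docbase."""
    # Drop XML comments first, so commented-out contexts are ignored
    uncommented = re.sub(r"<!--.*?-->", "", xml_text, flags=re.DOTALL)
    names = re.findall(rf'<{tag}\b[^>]*\bDocbaseName="([^"]*)"', uncommented)
    return Counter(names)
```

After the edit, active_docbases(text, "QueueProcessorContext")["DOCBASE1"] should be 0, while the other docbases keep their entries.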

 

After that, only one configuration file remains to be updated: the session manager. It is responsible for the errors printed during the startup of the ADTS, because it defines which docbases the ADTS should try to contact, with which user/password, and how many retries should be performed:

Open the file "%ADTS_HOME%\CTS\config\SessionService.xml"
    ==> Comment the whole "LoginContext" XML tag related to DOCBASE1
    ==> In the end, you should have something like:
        <!--LoginContext DocbaseName="DOCBASE1" Domain="" isPerformanceLogRepository="false">
            <CTSServer AttributeName="userName" AttributeValue="adtsuser"/>
            <CTSServer AttributeName="passwordFile" AttributeValue="%ADTS_HOME%\CTS\docbases\DOCBASE1\config\pfile\mspassword.txt"/>
            <CTSServer AttributeName="maxConnectionRetries" AttributeValue="10"/>
        </LoginContext-->

 

Once the configuration files have been updated, simply restart the ADTS services for the changes to be applied.

 

And here we go: you should have a clean environment with one docbase less configured, without having to remove it from all servers. As a final note, if you ever want to reactivate the docbase, simply uncomment everything that was commented above, restore the default line in the custom “startstop” scripts and rename the Documentum start scripts back to their original names (without the “_deactivated” suffix) on the Content Server and Full Text Server.
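The reactivation can be scripted as well. A minimal Python sketch, assuming the comments were written exactly in the <!--Tag ...>...</Tag--> form shown above and that the start scripts were renamed with a trailing "_deactivated"; both function names are hypothetical:

```python
import re


def uncomment_context(xml_text: str, tag: str, docbase: str) -> str:
    """Turn <!--tag DocbaseName="docbase" ...>...</tag--> back into a live element."""
    pattern = re.compile(
        rf'<!--({tag}\b[^>]*\bDocbaseName="{re.escape(docbase)}"[^>]*>.*?</{tag})-->',
        re.DOTALL,  # the commented element spans several lines
    )
    return pattern.sub(r"<\1>", xml_text)


def original_script_name(name: str, suffix: str = "_deactivated") -> str:
    """startIndexagent_DOCBASE1.sh_deactivated -> startIndexagent_DOCBASE1.sh"""
    return name[: -len(suffix)] if name.endswith(suffix) else name
```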

 

 

The article Documentum – Deactivation of a docbase without uninstallation first appeared on Blog dbi services.

Service “696c6f76656d756c746974656e616e74” has 1 instance(s).

Sat, 2017-04-08 02:53

Weird title, isn’t it? That was my reaction when I ran my first ‘lsnrctl status’ in 12.2: weird service name… If you have installed 12.2 multitenant, then you have probably seen this strange service name registered in your listener, one per PDB. It is not a bug. It is an internal service used to connect to the remote PDB for features like Proxy PDB. This name is the GUID of the PDB, which makes this service independent of the name or the physical location of the PDB. You can use it to connect to the PDB, but you should not: it is an internal service name. But in a lab, let’s play with it.

CDB

I have two Container Databases on my system:

18:01:33 SQL> connect sys/oracle@//localhost/CDB2 as sysdba
Connected.
18:01:33 SQL> show pdbs
 
CON_ID CON_NAME OPEN MODE RESTRICTED
------ -------- ---- ---- ----------
2 PDB$SEED READ ONLY NO

CDB2 has been created without any pluggable databases (except PDB$SEED of course).

18:01:33 SQL> connect sys/oracle@//localhost/CDB1 as sysdba
Connected.
18:01:33 SQL> show pdbs
 
CON_ID CON_NAME OPEN MODE RESTRICTED
------ -------- ---- ---- ----------
2 PDB$SEED READ ONLY NO
4 PDB1 READ WRITE NO

CDB1 has one pluggable database PDB1.

PDB1 has its system files in /u01/oradata/CDB1/PDB1/ and I have a user tablespace datafile elsewhere:

18:01:33 SQL> select con_id,file_name from cdb_data_files;
CON_ID FILE_NAME
------ -------------------------------------
1 /u01/oradata/CDB1/users01.dbf
1 /u01/oradata/CDB1/undotbs01.dbf
1 /u01/oradata/CDB1/system01.dbf
1 /u01/oradata/CDB1/sysaux01.dbf
4 /u01/oradata/CDB1/PDB1/undotbs01.dbf
4 /u01/oradata/CDB1/PDB1/sysaux01.dbf
4 /u01/oradata/CDB1/PDB1/system01.dbf
4 /u01/oradata/CDB1/PDB1/USERS.dbf
4 /var/tmp/PDB1USERS2.dbf

Both are registered with the same local listener:

SQL> host lsnrctl status
 
LSNRCTL for Linux: Version 12.2.0.1.0 - Production on 07-APR-2017 18:01:33
 
Copyright (c) 1991, 2016, Oracle. All rights reserved.
 
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for Linux: Version 12.2.0.1.0 - Production
Start Date 07-APR-2017 07:53:06
Uptime 0 days 10 hr. 8 min. 27 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Log File /u01/app/oracle/diag/tnslsnr/VM104/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=VM104)(PORT=1521)))
Services Summary...
Service "4aa269fa927779f0e053684ea8c0c27f" has 1 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Service "CDB1" has 1 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Service "CDB1XDB" has 1 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Service "CDB2" has 1 instance(s).
Instance "CDB2", status READY, has 1 handler(s) for this service...
Service "CDB2XDB" has 1 instance(s).
Instance "CDB2", status READY, has 1 handler(s) for this service...
Service "pdb1" has 1 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
The command completed successfully

Each container database declares its db_unique_name as a service (CDB1 and CDB2), with an XDB service for each (CDB1XDB and CDB2XDB), and each pluggable database also has its own service (PDB1 here). This is what we had in 12.1, but in 12.2 there is one more service with a strange hexadecimal name: 4aa269fa927779f0e053684ea8c0c27f
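To spot these internal services in a long listener status, a simple heuristic is to look for service names made of exactly 32 lowercase hexadecimal characters. A minimal Python sketch, assuming the lsnrctl status format shown above (guid_services is a hypothetical name); note that a user-defined service could in theory match the same pattern:

```python
import re

# 'Service "<name>" has <n> instance(s).' lines from lsnrctl status
SERVICE_RE = re.compile(r'^\s*Service "([^"]+)" has (\d+) instance\(s\)\.', re.MULTILINE)
# Internal PDB services look like a GUID: 32 lowercase hexadecimal digits
GUID_RE = re.compile(r"^[0-9a-f]{32}$")


def guid_services(lsnrctl_output: str) -> list[str]:
    """Return the registered service names that look like a PDB GUID."""
    return [name for name, _ in SERVICE_RE.findall(lsnrctl_output) if GUID_RE.match(name)]
```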

Connect to PDB without a service name?

Want to know more about it? Let’s try to connect to it:

SQL> connect sys/oracle@(DESCRIPTION=(CONNECT_DATA=(SERVICE_NAME=4aa269fa927779f0e053684ea8c0c27f))(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.78.104)(PORT=1521))) as sysdba
Connected.
SQL> select sys_context('userenv','cdb_name'), sys_context('userenv','con_name'), sys_context('userenv','service_name') from dual;
 
SYS_CONTEXT('USERENV','CDB_NAME') SYS_CONTEXT('USERENV','CON_NAME') SYS_CONTEXT('USERENV','SERVICE_NAME')
--------------------------------- --------------------------------- -------------------------------------
CDB1 PDB1 SYS$USERS

With this service, I can connect to PDB1, but the service name I used in the connection string is not a real service:

SQL> select name from v$services;
 
NAME
----------------------------------------------------------------
pdb1
 
SQL> show parameter service
 
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
service_names string CDB1

The documentation says that SYS$USERS is the default database service for user sessions that are not associated with any service, so here I’m connected to a PDB without a service.

GUID

The internal service name is the GUID of the PDB, which identifies the container even after unplug/plug.

SQL> select pdb_id,pdb_name,con_uid,guid from dba_pdbs;
 
PDB_ID PDB_NAME CON_UID GUID
------ -------- ------- ----
4 PDB1 2763763322 4AA269FA927779F0E053684EA8C0C27F
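Comparing with the listener output above, the internal service name is the GUID from DBA_PDBS written in lowercase. This is an observation on this 12.2 system rather than a documented contract; a small Python sketch of the mapping (guid_to_service_name is a hypothetical name):

```python
def guid_to_service_name(guid: str) -> str:
    """Map a DBA_PDBS.GUID value to the internal service name seen in lsnrctl."""
    raw = bytes.fromhex(guid)  # raises ValueError if guid is not hexadecimal
    if len(raw) != 16:
        raise ValueError(f"expected a 16-byte GUID, got {len(raw)} bytes")
    return raw.hex()  # bytes.hex() always returns lowercase digits
```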

Proxy PDB

This internal service was introduced in 12cR2 for the Proxy PDB feature: access to a PDB through another one, so that you don’t have to change the connection string when you migrate the PDB to another server.

I’ll create a Proxy PDB in CDB2 to connect to PDB1, which is in CDB1. This is simple: create a database link and use it to create the Proxy PDB, which I call PDB1PX1:

18:01:33 SQL> connect sys/oracle@//localhost/CDB2 as sysdba
Connected.
18:01:33 SQL> show pdbs
 
CON_ID CON_NAME OPEN MODE RESTRICTED
------ -------- ---- ---- ----------
2 PDB$SEED READ ONLY NO
 
18:01:33 SQL> create database link CDB1 connect to system identified by oracle using '//localhost/CDB1';
Database link CDB1 created.
 
18:01:38 SQL> create pluggable database PDB1PX1 as proxy from PDB1@CDB1
file_name_convert=('/u01/oradata/CDB1/PDB1','/u01/oradata/CDB1/PDB1PX1');
 
Pluggable database PDB1PX1 created.
 
18:02:14 SQL> drop database link CDB1;
Database link CDB1 dropped.

The Proxy PDB clones the system tablespaces, which is why I had to give a file_name_convert. Note that the user tablespace datafile is not cloned, so I don’t need to convert ‘/var/tmp/PDB1USERS2.dbf’. The database link is not needed anymore once the Proxy PDB is created, as it is only used to clone the system tablespaces. The PDB is currently mounted.

18:02:14 SQL> connect sys/oracle@//localhost/CDB2 as sysdba
Connected.
18:02:14 SQL> show pdbs
 
CON_ID CON_NAME OPEN MODE RESTRICTED
------ -------- ---- ---- ----------
2 PDB$SEED READ ONLY NO
3 PDB1PX1 MOUNTED

The system tablespaces are there (I’m in 12.2 with local undo, which is required for the Proxy PDB feature):

18:02:14 SQL> select con_id,file_name from cdb_data_files;
 
CON_ID FILE_NAME
------ ---------
1 /u01/oradata/CDB2/system01.dbf
1 /u01/oradata/CDB2/sysaux01.dbf
1 /u01/oradata/CDB2/users01.dbf
1 /u01/oradata/CDB2/undotbs01.dbf

I open the PDB:

18:02:19 SQL> alter pluggable database PDB1PX1 open;
Pluggable database PDB1PX1 altered.

connect

I now have three ways to connect to PDB1: with the PDB1 service, with the internal service, and through the Proxy PDB service.
I’ve tested all three:


18:02:45 SQL> connect demo/demo@//localhost/PDB1
18:02:56 SQL> connect demo/demo@//localhost/PDB1PX1
18:03:06 SQL> connect demo/demo@//localhost/4aa269fa927779f0e053684ea8c0c27f

Each time, I inserted information about my connection into a DEMO table:
SQL> insert into DEMO select '&_connect_identifier' "connect identifier", current_timestamp "timestamp", sys_context('userenv','cdb_name') "CDB name", sys_context('userenv','con_name') "con name" from dual;

Here is the result:

connect identifier                           timestamp                       CDB name container name
-------------------------------------------- ------------------------------- -------- --------------
//localhost/PDB1                             07-APR-17 06.02.50.977839000 PM CDB1     PDB1
//localhost/PDB1PX1                          07-APR-17 06.03.01.492946000 PM CDB1     PDB1
//localhost/4aa269fa927779f0e053684ea8c0c27f 07-APR-17 06.03.11.814039000 PM CDB1     PDB1

In all three cases, I am connected to the same database: PDB1 in CDB1. As I’m on the same server with the same listener for this test, I can check what is logged in the listener log.

Here are the $ORACLE_BASE/diag/tnslsnr/$(hostname)/listener/alert/log.xml entries related to my connections.

//localhost/PDB1

When connecting directly to PDB1 the connection is simple:


<msg time='2017-04-07T18:02:45.644+02:00' org_id='oracle' comp_id='tnslsnr'
type='UNKNOWN' level='16' host_id='VM104'
host_addr='192.168.78.104' pid='1194'>
<txt>07-APR-2017 18:02:45 * (CONNECT_DATA=(SERVICE_NAME=PDB1)(CID=(PROGRAM=java)(HOST=VM104)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=27523)) * establish * PDB1 * 0
</txt>
</msg>

I am connecting with SQLcl, which is Java, hence (PROGRAM=java).

//localhost/PDB1PX1

When connecting through the Proxy PDB, I see the connection to the Proxy PDB PDB1PX1:


<msg time='2017-04-07T18:02:56.058+02:00' org_id='oracle' comp_id='tnslsnr'
type='UNKNOWN' level='16' host_id='VM104'
host_addr='192.168.78.104' pid='1194'>
<txt>07-APR-2017 18:02:56 * (CONNECT_DATA=(SERVICE_NAME=PDB1PX1)(CID=(PROGRAM=java)(HOST=VM104)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=27524)) * establish * PDB1PX1 * 0
</txt>
</msg>

This is the Java connection. But I can also see the connection from the Proxy PDB to the remote PDB1:


<msg time='2017-04-07T18:03:01.375+02:00' org_id='oracle' comp_id='tnslsnr'
type='UNKNOWN' level='16' host_id='VM104'
host_addr='192.168.78.104' pid='1194'>
<txt>07-APR-2017 18:03:01 * (CONNECT_DATA=(SERVICE_NAME=4aa269fa927779f0e053684ea8c0c27f)(CID=(PROGRAM=oracle)(HOST=VM104)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.78.104)(PORT=16787)) * establish * 4aa269fa927779f0e053684ea8c0c27f * 0
</txt>
</msg>

Here the program is (PROGRAM=oracle): this is a CDB2 instance process connecting to the remote CDB1 through the internal service.

//localhost/4aa269fa927779f0e053684ea8c0c27f

When I connect to the internal service directly, I see the same connection to PDB1’s GUID, but from (PROGRAM=java):


<msg time='2017-04-07T18:03:06.671+02:00' org_id='oracle' comp_id='tnslsnr'
type='UNKNOWN' level='16' host_id='VM104'
host_addr='192.168.78.104' pid='1194'>
<txt>07-APR-2017 18:03:06 * (CONNECT_DATA=(SERVICE_NAME=4aa269fa927779f0e053684ea8c0c27f)(CID=(PROGRAM=java)(HOST=VM104)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=27526)) * establish * 4aa269fa927779f0e053684ea8c0c27f * 0
</txt>
</msg>
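When many connections go through the listener, the entries above can be extracted programmatically to separate client connections (PROGRAM=java here) from the instance-to-instance connections made by the Proxy PDB (PROGRAM=oracle). A minimal Python sketch, assuming log.xml is a sequence of <msg> elements without a single root element (hence the wrapping) and that the entries are well-formed like the excerpts above; the function name is hypothetical:

```python
import re
import xml.etree.ElementTree as ET


def connections(log_xml: str):
    """Yield (time, service_name, program) for each connection entry in listener log.xml."""
    # log.xml has no single root element, so wrap the <msg> elements in one
    root = ET.fromstring("<log>" + log_xml + "</log>")
    for msg in root.iter("msg"):
        txt = msg.findtext("txt") or ""
        service = re.search(r"SERVICE_NAME=([^)]+)", txt)
        program = re.search(r"PROGRAM=([^)]+)", txt)
        if service and program:
            yield msg.get("time"), service.group(1), program.group(1)
```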

One more…

So each user PDB, in addition to the PDB name and any additional services you have defined, registers an internal service, whether the PDB is open or closed. And the fun part is that the Proxy PDB also registers such a service. Here is my listener status:


Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=VM104)(PORT=1521)))
Services Summary...
Service "4aa269fa927779f0e053684ea8c0c27f" has 1 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Service "4c96bda23b8e41fae053684ea8c0918b" has 1 instance(s).
Instance "CDB2", status READY, has 1 handler(s) for this service...
Service "CDB1" has 1 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Service "CDB1XDB" has 1 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Service "CDB2" has 1 instance(s).
Instance "CDB2", status READY, has 1 handler(s) for this service...
Service "CDB2XDB" has 1 instance(s).
Instance "CDB2", status READY, has 1 handler(s) for this service...
Service "pdb1" has 1 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Service "pdb1px1" has 1 instance(s).
Instance "CDB2", status READY, has 1 handler(s) for this service...
The command completed successfully

This “4c96bda23b8e41fae053684ea8c0918b” is the GUID of the Proxy PDB. Connecting through this service, I land in PDB1:

SQL> select sys_context('userenv','cdb_name'), sys_context('userenv','con_name'), sys_context('userenv','service_name') from dual;
 
SYS_CONTEXT('USERENV','CDB_NAME')
--------------------------------------------------------------------------------
SYS_CONTEXT('USERENV','CON_NAME')
--------------------------------------------------------------------------------
SYS_CONTEXT('USERENV','SERVICE_NAME')
--------------------------------------------------------------------------------
CDB1
PDB1
SYS$USERS

So that’s a fourth way to connect to PDB1: through the internal service of the Proxy PDB.

Then you can immediately imagine what I tried…

ORA-65280

Because the internal service name is used to connect through the Proxy PDB, can I create a proxy for the proxy?

18:03:32 SQL> create pluggable database PDB1PX2 as proxy from PDB1PX1@CDB2
2 file_name_convert=('/u01/oradata/CDB1/PDB1/PX1','/u01/oradata/CDB1/PDB1PX2');
 
Error starting at line : 76 File @ /media/sf_share/122/blogs/proxypdb.sql
In command -
create pluggable database PDB1PX2 as proxy from PDB1PX1@CDB2
file_name_convert=('/u01/oradata/CDB1/PDB1/PX1','/u01/oradata/CDB1/PDB1PX2')
Error report -
ORA-65280: The referenced pluggable database is a proxy pluggable database.

The answer is no: you cannot nest Proxy PDBs.

So what?

Don’t panic when you see these services registered in the listener. Those hexadecimal service names are expected in 12.2, one per user PDB. You see them, but you have no reason to use them directly. You use them indirectly when you create a Proxy PDB, which makes the location where users connect independent from the physical location of the PDB. This is very interesting for migrations, because the client configuration is not affected by the migration (think hybrid cloud). You can use this feature even without the Multitenant option. Want to see all the multitenant architecture features available without the option? Look at the ITOUG Tech Day agenda.

 

The article Service “696c6f76656d756c746974656e616e74” has 1 instance(s). first appeared on Blog dbi services.

Trace files segmented in multiple parts as a workaround for bug 23300142

Fri, 2017-04-07 12:27

Today I visited a customer who had deleted a Data Guard configuration (i.e. a temporary Data Guard setup created through the broker had been removed). LOG_ARCHIVE_DEST_STATE_2 on the primary database had been temporarily set to DEFER. As a result, the trace files named *tt*.trc became huge (gigabytes after a couple of days). Analysis showed that this was caused by bug 23300142 in 12.1.0.2. See My Oracle Support Note

Bug 23300142 - TT background process trace file message: async ignored current log: kcclenal clear thread open (Doc ID 23300142.8)

for details.
Unfortunately the bug does not have a workaround.
Because the affected development databases (now plain single instances without Data Guard) could not be restarted, I looked for a temporary workaround to stop the trace files from growing further. Limiting the trace file size on the database with

alter system set max_dump_file_size='100M';

did not always limit the file size. Here is an example of a huge trace file (over 5GB):


$ find . -name "*tt*.trc" -ls | tr -s " " | cut -d " " -f7-11 | sort -n
...
5437814195 Apr 7 10:46 ./xxxxxx_site1/XXXXXX/trace/XXXXXX_tt00_28304.trc

However, what came in handy was the UTS trace segmentation feature of 12c. See Jonathan Lewis’ blog here:

https://jonathanlewis.wordpress.com/2016/01/26/trace-file-size

That is, I left max_dump_file_size=unlimited on all DBs and set:


SQL> alter system set "_uts_first_segment_size" = 52428800 scope=memory;
SQL> alter system set "_uts_trace_segment_size" = 52428800 scope=memory;

Unfortunately, setting the limit for the TT background process alone does not work:


SQL> exec dbms_system.set_int_param_in_session(sid => 199, serial# => 44511, parnam => '_uts_trace_segment_size', intval => 52428800);
BEGIN dbms_system.set_int_param_in_session(sid => 199, serial# => 44511, parnam => '_uts_trace_segment_size', intval => 52428800); END;
 
*
ERROR at line 1:
ORA-44737: Parameter _uts_trace_segment_size did not exist.
ORA-06512: at "SYS.DBMS_SYSTEM", line 117
ORA-06512: at line 1

With the default setting of “_uts_trace_segments” (maximum number of trace segments) = 5, I could limit the maximum size of the trace to 250MB per DB (50MB * 5). Only 4 files are visible below, because of 2 earlier tests that had already split the trace file:


$ ls -ltr *_tt00_28304*.trc
-rw-r----- 1 oracle dba 52428964 Apr 7 14:14 XXXXXX_tt00_28304_3.trc
-rw-r----- 1 oracle dba 52428925 Apr 7 16:07 XXXXXX_tt00_28304_4.trc
-rw-r----- 1 oracle dba 52428968 Apr 7 17:12 XXXXXX_tt00_28304_5.trc
-rw-r----- 1 oracle dba 43887950 Apr 7 18:50 XXXXXX_tt00_28304.trc
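The arithmetic behind the 250MB limit, checked against the listing above, as a small Python sketch. The helper name is hypothetical and the parsing assumes the default ls -l column layout. Each segment ends up slightly above 52428800 bytes, so the limit is approximate rather than exact:

```python
SEGMENT_SIZE = 52428800  # "_uts_trace_segment_size" set above: 50 * 1024 * 1024 bytes
MAX_SEGMENTS = 5         # default "_uts_trace_segments"

# Expected upper bound for the trace of one process: 5 * 50MB = 250MB
MAX_TRACE_BYTES = SEGMENT_SIZE * MAX_SEGMENTS


def total_trace_bytes(ls_output: str) -> int:
    """Sum the size column (5th field) of the regular-file lines of `ls -l` output."""
    total = 0
    for line in ls_output.splitlines():
        fields = line.split()
        # Regular files start with "-" in the mode column; this skips the prompt line
        if len(fields) >= 9 and fields[0].startswith("-"):
            total += int(fields[4])
    return total
```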

Segmented trace files can help a lot in situations like bug 23300142.

REMARK: Do not use underscore parameters in production environments without agreement from Oracle Support.

 

The article Trace files segmented in multiple parts as a workaround for bug 23300142 first appeared on Blog dbi services.

12cR2 DML monitoring and Statistics Advisor

Thu, 2017-04-06 15:40

Monitoring DML to get an idea of the activity on our tables is not new. The number of inserts, updates, deletes and truncates since the last statistics gathering is tracked automatically. The statistics gathering job uses it to list and prioritize the tables that need fresh statistics. This is for slow changes on tables. In 12.2 we have the Statistics Advisor, which goes further with a rule that detects volatile tables:

SQL> select * from V$STATS_ADVISOR_RULES where rule_id=14;
 
RULE_ID NAME RULE_TYPE DESCRIPTION CON_ID
------- ---- --------- ----------- ------
14 LockVolatileTable OBJECT Statistics for objects with volatile data should be locked 0

But to detect volatile tables, you need to track DML frequency at a finer grain. Let’s investigate what is new here in 12.2.

Statistics Advisor tracing

DBMS_STATS tracing is enabled through the TRACE global preference. It is not documented, but the values are powers of two. 12.1.0.2 introduced 262144 to trace system statistics gathering, so let’s try the next one: 524288

SQL> exec dbms_stats.set_global_prefs('TRACE',0+524288)
PL/SQL procedure successfully completed.
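Since the TRACE values are powers of two, several traces can be combined by adding (equivalently, OR-ing) them, and a value can be decomposed back into its bits. A small Python sketch; only 262144 and 524288 come from the text above, and the meaning of other bits is not covered here:

```python
def trace_bits(value: int) -> list[int]:
    """Decompose a DBMS_STATS TRACE preference value into its power-of-two parts."""
    return [1 << bit for bit in range(value.bit_length()) if (value >> bit) & 1]


SYSTEM_STATS_TRACE = 262144  # 2**18, introduced in 12.1.0.2 for system statistics
NEXT_TRACE = 524288          # 2**19, the value tried above

# Distinct powers of two: adding them gives the same result as OR-ing them
COMBINED = SYSTEM_STATS_TRACE | NEXT_TRACE
```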

After a while, I grepped my trace directory for DBMS_STATS and found the MMON slave trace (ORCLA_m001_30694.trc here):

*** 2017-04-06T14:10:11.979283+02:00
*** SESSION ID:(81.2340) 2017-04-06T14:10:11.979302+02:00
*** CLIENT ID:() 2017-04-06T14:10:11.979306+02:00
*** SERVICE NAME:(SYS$BACKGROUND) 2017-04-06T14:10:11.979309+02:00
*** MODULE NAME:(MMON_SLAVE) 2017-04-06T14:10:11.979313+02:00
*** ACTION NAME:(Flush KSXM hash table action) 2017-04-06T14:10:11.979317+02:00
*** CLIENT DRIVER:() 2017-04-06T14:10:11.979320+02:00
 
...
 
DBMS_STATS: compute_volatile_flag: objn=74843, flag=0, new_flag=0, inserts_new=619, updates_new=0, deletes_new=0, inserts_old=619, updates_old=0, deletes_old=0, rowcnt=, rowcnt_loc=, stale_pcnt=10, gather=NO_GATHER, flag_result=0
DBMS_STATS: compute_volatile_flag: objn=74862, flag=0, new_flag=0, inserts_new=4393, updates_new=0, deletes_new=0, inserts_old=4393, updates_old=0, deletes_old=0, rowcnt=, rowcnt_loc=, stale_pcnt=10, gather=NO_GATHER, flag_result=0
DBMS_STATS: compute_volatile_flag: objn=74867, flag=1, new_flag=0, inserts_new=4861477, updates_new=584000, deletes_new=13475192, inserts_old=3681477, updates_old=466000, deletes_old=12885192, rowcnt=, rowcnt_loc=, stale_pcnt=10, gather=NO_GATHER, flag_result=1

Those entries appear every hour. They look at tables (identified by object_id) and compute a new flag from the existing flag and from statistics about new and old DML (inserts, updates, deletes). Row count and stale percentage are also mentioned. Apparently, the volatility of tables is computed every hour (gather=NO_GATHER) or when we gather statistics (gather=GATHER). This goes beyond the DML monitoring of previous releases, but is probably based on it.
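These trace lines are easy to parse. A minimal Python sketch, assuming the 'key=value, key=value' format shown above (the function names are hypothetical), that computes the DML executed between the previous values (_old) and the current ones (_new):

```python
import re

# key=value pairs separated by commas; values may be empty (rowcnt=, rowcnt_loc=)
FIELD_RE = re.compile(r"(\w+)=([^,]*)")


def parse_volatile_line(line: str) -> dict:
    """Parse a 'DBMS_STATS: compute_volatile_flag: k=v, k=v, ...' trace line."""
    _, _, payload = line.partition("compute_volatile_flag:")
    return {key: value.strip() for key, value in FIELD_RE.findall(payload)}


def dml_delta(fields: dict) -> dict:
    """DML executed between the previous snapshot (_old) and now (_new)."""
    return {
        op: int(fields[f"{op}_new"]) - int(fields[f"{op}_old"])
        for op in ("inserts", "updates", "deletes")
    }
```

For objn=74867 in the third line above, this gives 1180000 inserts, 118000 updates and 590000 deletes in one hour.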

Testing some DML

SQL> delete from DEMO;
10000 rows deleted.
 
SQL> insert into DEMO select rownum from xmltable('1 to 10000');
10000 rows created.
 
SQL> commit;
Commit complete.
 
SQL> select count(*) numrows from DEMO;
NUMROWS
----------
10000
 
SQL> update demo set n=n+1 where rownum <= 2000;
 
2000 rows updated.
 
SQL> insert into DEMO select rownum from xmltable('1 to 10000');
 
10000 rows created.

I deleted 10000 rows and inserted 10000, with a commit at the end. Then I updated 2000 rows and inserted 10000 again, without committing.

x$ksxmme

DML monitoring is done in memory: in order to see the changes in DBA_TAB_MODIFICATIONS, we need to flush it. But this in-memory information is visible in the X$KSXMME fixed table:

SQL> select * from X$KSXMME where objn=&object_id;
old 1: select * from X$KSXMME where objn=&object_id
new 1: select * from X$KSXMME where objn= 74867
 
ADDR                   INDX    INST_ID     CON_ID     CHUNKN      SLOTN       OBJN        INS        UPD        DEL    DROPSEG    CURROWS    PAROBJN   LASTUSED      FLAGS
---------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
00007F526E0B81F0          0          1          0         64        256      74867      20000       2000      10000          0    2350000          0 1491467123        128

Here are my 10000 deletes, 10000 + 10000 inserts and 2000 updates. The uncommitted changes are counted too, because DML tracking does not keep the numbers per transaction in order to adjust them later depending on commit or rollback.

The proof is that when I roll back, the numbers do not change:

SQL> rollback;
Rollback complete.
 
SQL> select * from X$KSXMME where objn=&object_id;
old 1: select * from X$KSXMME where objn=&object_id
new 1: select * from X$KSXMME where objn= 74867
 
ADDR                   INDX    INST_ID     CON_ID     CHUNKN      SLOTN       OBJN        INS        UPD        DEL    DROPSEG    CURROWS    PAROBJN   LASTUSED      FLAGS
---------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
00007F526DDF47F8          0          1          0         64        256      74867      20000       2000      10000          0    2350000          0 1491467123        128

And yes, there is a real-time estimate of the current number of rows here (CURROWS). It is used to compare the changes with the total number of rows, but you can also use it to follow the progress of a big transaction, as it includes uncommitted changes.

sys.mon_mods_all$

The table sys.mon_mods_all$ is what is behind DBA_TAB_MODIFICATIONS (not exactly, but that will be for another blog post) and you have to flush what’s in memory to see the latest changes there:

SQL> exec dbms_stats.flush_database_monitoring_info;
PL/SQL procedure successfully completed.
 
SQL> select * from sys.mon_mods_all$ where obj#=&object_id;
old 1: select * from sys.mon_mods_all$ where obj#=&object_id
new 1: select * from sys.mon_mods_all$ where obj#= 74867
 
      OBJ#    INSERTS    UPDATES    DELETES TIMESTAMP               FLAGS DROP_SEGMENTS
---------- ---------- ---------- ---------- ------------------ ---------- -------------
     74867    5581477     656000   13835192 06-APR 15:10:53             1             0

The flag 1 means that the table has been truncated since the latest stats gathering.

This is what we already know from previous releases. It has nothing to do with the trace we see every hour in the MMON slave.

sys.optstat_snapshot$

What happens every hour is that a snapshot of sys.mon_mods_all$ is stored in sys.optstat_snapshot$:

SQL> select * from sys.optstat_snapshot$ where obj#=&object_id order by timestamp;
old 1: select * from sys.optstat_snapshot$ where obj#=&object_id order by timestamp
new 1: select * from sys.optstat_snapshot$ where obj#= 74867 order by timestamp
 
      OBJ#    INSERTS    UPDATES    DELETES      FLAGS TIMESTAMP
---------- ---------- ---------- ---------- ---------- ------------------
     74867        999          0          0         32 05-APR-17 17:27:01
     74867       1997          0          0         32 05-APR-17 17:33:25
     74867       1997          0          0         32 05-APR-17 17:33:31
     74867       1997          0          0         32 05-APR-17 17:33:32
     74867      80878          0        160          0 05-APR-17 18:59:37
     74867      90863          0        210          0 05-APR-17 20:53:07
     74867   10597135          0        410          0 05-APR-17 21:53:13
     74867   10598134          0        410         32 05-APR-17 22:02:38
     74867      38861          0   10603745          1 06-APR-17 08:17:58
     74867      38861          0   10603745          1 06-APR-17 09:18:04
     74867     581477     124000   11175192          1 06-APR-17 10:11:27
     74867    1321477     230000   11705192          1 06-APR-17 11:09:50
     74867    2481477     346000   12285192          1 06-APR-17 12:09:56
     74867    3681477     466000   12885192          1 06-APR-17 01:10:04
     74867    4861477     584000   13475192          1 06-APR-17 02:10:11
     74867    5561477     654000   13825192          1 06-APR-17 03:10:19

You see snapshots every hour, the latest being 03:10, 02:10, 01:10, 12:09, 11:09, …
You see additional snapshots at each statistics gathering: the day before, I ran dbms_stats.gather_table_stats at 17:27 and several times at 17:33. Those snapshots are flagged 32.
Statistics were gathered again at 22:02 (the auto job), and I truncated the table after that, which is why the flag is 1.

dbms_stats_advisor.compute_volatile_flag

My guess is that there should be a flag for volatile tables here, because I’ve seen a trace for compute_volatile_flag in the MMON trace. So I enabled sql_trace for the MMON slave, and here is the query that takes the snapshot:

insert /* KSXM:TAKE_SNPSHOT */ into sys.optstat_snapshot$ (obj#, inserts, updates, deletes, timestamp, flags)
 (select m.obj#, m.inserts, m.updates, m.deletes, systimestamp,
         dbms_stats_advisor.compute_volatile_flag(
           m.obj#, m.flags, :flags,
           m.inserts, m.updates, m.deletes,
           s.inserts, s.updates, s.deletes,
           null, nvl(to_number(p.valchar), :global_stale_pcnt), s.gather) flags
  from sys.mon_mods_all$ m,
       (select si.obj#, max(si.inserts) inserts, max(si.updates) updates, max(si.deletes) deletes,
               decode(bitand(max(si.flags), :gather_flag), 0, 'NO_GATHER', 'GATHER') gather,
               max(si.timestamp) timestamp
        from sys.optstat_snapshot$ si,
             (select obj#, max(timestamp) ts from sys.optstat_snapshot$ group by obj#) sm
        where si.obj# = sm.obj# and si.timestamp = sm.ts
        group by si.obj#) s,
       sys.optstat_user_prefs$ p
  where m.obj# = s.obj#(+)
    and m.obj# = p.obj#(+)
    and pname(+) = 'STALE_PERCENT'
    and dbms_stats_advisor.check_mmon_policy_violation(rownum, 6, 2) = 0)

It reads the current values (from sys.mon_mods_all$) and the last values (from sys.optstat_snapshot$), reads the stale percentage preference, and calls the dbms_stats_advisor.compute_volatile_flag function, which updates the flag with the one passed as :flags, probably adding the value 64 (see below) when the table is volatile (probably when the sum of DML is over the row count + stale percentage). The function probably behaves differently when the snapshot comes from statistics gathering (‘GATHER’) or from DML monitoring (‘NO_GATHER’), because the number of rows is then either absolute or relative to the previous one.
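The inline view “s” in the query above can be read as: for each obj#, take the most recent snapshot and derive GATHER/NO_GATHER from its flags. A Python sketch of that logic only (not of compute_volatile_flag itself), assuming :gather_flag is 32, the flag carried by the snapshots taken at statistics gathering in the sys.optstat_snapshot$ output; latest_snapshot is a hypothetical name:

```python
from collections import defaultdict
from datetime import datetime

# Assumption: value of the :gather_flag bind (gathering snapshots are flagged 32 above)
GATHER_FLAG = 32


def latest_snapshot(rows):
    """Mimic the inline view "s": per obj#, keep the snapshot(s) with the latest
    timestamp, take max() of the counters and derive GATHER/NO_GATHER.

    rows: iterable of (obj#, inserts, updates, deletes, flags, timestamp) tuples.
    """
    by_obj = defaultdict(list)
    for row in rows:
        by_obj[row[0]].append(row)

    result = {}
    for obj, obj_rows in by_obj.items():
        last_ts = max(r[5] for r in obj_rows)            # max(timestamp) per obj#
        last = [r for r in obj_rows if r[5] == last_ts]  # si.timestamp = sm.ts
        flags = max(r[4] for r in last)                  # max(si.flags)
        result[obj] = {
            "inserts": max(r[1] for r in last),
            "updates": max(r[2] for r in last),
            "deletes": max(r[3] for r in last),
            # decode(bitand(max(si.flags), :gather_flag), 0, 'NO_GATHER', 'GATHER')
            "gather": "NO_GATHER" if (flags & GATHER_FLAG) == 0 else "GATHER",
        }
    return result


if __name__ == "__main__":
    snapshots = [
        (74867, 10598134, 0, 410, 32, datetime(2017, 4, 5, 22, 2, 38)),
        (74867, 38861, 0, 10603745, 1, datetime(2017, 4, 6, 8, 17, 58)),
    ]
    print(latest_snapshot(snapshots))
```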

From the trace of bind variables, or simply from the dbms_stats trace, I can see all values:
DBMS_STATS: compute_volatile_flag: objn=74867, flag=1, new_flag=0, inserts_new=5701477, updates_new=668000, deletes_new=13895192, inserts_old=5701477, updates_old=668000, deletes_old=13895192, rowcnt=, rowcnt_loc=, stale_pcnt=10, gather=NO_GATHER, flag_result=1
DBMS_STATS: compute_volatile_flag: objn=74867, flag=1, new_flag=0, inserts_new=4861477, updates_new=584000, deletes_new=13475192, inserts_old=3681477, updates_old=466000, deletes_old=12885192, rowcnt=, rowcnt_loc=, stale_pcnt=10, gather=NO_GATHER, flag_result=1
DBMS_STATS: compute_volatile_flag: objn=74867, flag=1, new_flag=0, inserts_new=5561477, updates_new=654000, deletes_new=13825192, inserts_old=4861477, updates_old=584000, deletes_old=13475192, rowcnt=, rowcnt_loc=, stale_pcnt=10, gather=NO_GATHER, flag_result=1

The input flag is 1 and the output flag is 1. I think this is because new_flag=0, whatever the amount of DML.

This explains why I was not able to get snapshots flagged as volatile, even when changing a lot of rows. Then how can the Statistics Advisor detect my volatile table?

Statistics Advisor

I’ve traced the Statistics Advisor:

set long 100000 longc 10000
variable t varchar2(30)
variable e varchar2(30)
variable r clob
exec :t:= DBMS_STATS.CREATE_ADVISOR_TASK('my_task');
exec :e:= DBMS_STATS.EXECUTE_ADVISOR_TASK('my_task');
exec :r:= DBMS_STATS.REPORT_ADVISOR_TASK('my_task');
print r

No ‘LockVolatileTable’ rule has raised a recommendation, but I’ve seen a call to the DBMS_STATS_INTERNAL.CHECK_VOLATILE function with an object_id as parameter.

dbms_stats_internal.check_volatile

To understand the criteria, I’ve run the function on my table (with sql_trace enabled):

SQL> select dbms_stats_internal.check_volatile(&object_id) from dual;
old 1: select dbms_stats_internal.check_volatile(&object_id) from dual
new 1: select dbms_stats_internal.check_volatile( 74867) from dual
 
DBMS_STATS_INTERNAL.CHECK_VOLATILE(74867)
------------------------------------------
F

I suppose ‘F’ is false, which explains why my table was not considered as volatile.

Here is the trace with binds:

PARSING IN CURSOR #140478915921360 len=191 dep=1 uid=0 oct=3 lid=0 tim=99947151021 hv=976524548 ad='739cb468' sqlid='1r3ujfwx39584'
SELECT SUM(CASE WHEN ISVOLATILE > 0 THEN 1 ELSE 0 END) FROM (SELECT OBJ#, BITAND(FLAGS, :B2 ) ISVOLATILE FROM OPTSTAT_SNAPSHOT$ WHERE OBJ# = :B1 ORDER BY TIMESTAMP DESC) O WHERE ROWNUM < :B3
END OF STMT
...
BINDS #140478915921360:
 
Bind#0
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=03 fl2=1206001 frm=00 csi=00 siz=72 off=0
kxsbbbfp=7fc3cbe1c158 bln=22 avl=02 flg=05
value=64
Bind#1
oacdty=02 mxl=22(21) mxlc=00 mal=00 scl=00 pre=00
oacflg=03 fl2=1206001 frm=00 csi=00 siz=0 off=24
kxsbbbfp=7fc3cbe1c170 bln=22 avl=04 flg=01
value=74867
Bind#2
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=03 fl2=1206001 frm=00 csi=00 siz=0 off=48
kxsbbbfp=7fc3cbe1c188 bln=22 avl=02 flg=01
value=24

So, here is what the algorithm looks like:

  1. sys.optstat_snapshot$ is read for the latest snapshots of the object, newest first, limited by ROWNUM < 24 (so at most 23 rows; remember that we have snapshots every hour plus one at each statistics gathering).
  2. ‘ISVOLATILE’ is BITAND(FLAGS, 64), so it is non-zero when the snapshot flags have bit 64 set, and the CASE expression then counts it as 1. This is how I guessed that snapshots should be flagged with 64 by compute_volatile_flag.
  3. Finally, these 1s are summed, giving the number of volatile snapshots.
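
The three steps can be mirrored in Python to make the counting explicit. This is a sketch that only reproduces the traced query with its binds (64 and 24); SQL's SUM over zero rows would return NULL where this returns 0. The decision threshold THRESHOLD = 13 is an assumption taken from the experiment that follows, not something visible in the traced SQL.

```python
from datetime import datetime, timedelta

VOLATILE_BIT = 64   # bind :B2 in the traced query
ROWNUM_LIMIT = 24   # bind :B3; "ROWNUM < 24" keeps at most 23 snapshots
THRESHOLD = 13      # ASSUMPTION: inferred from the 12 -> F / 13 -> T test, not seen in the SQL


def count_volatile(snapshots):
    """Mirror of the traced query for one obj#.

    snapshots: iterable of (timestamp, flags) tuples.
    SELECT SUM(CASE WHEN BITAND(flags, 64) > 0 THEN 1 ELSE 0 END)
      FROM (... ORDER BY timestamp DESC) WHERE ROWNUM < 24
    """
    newest_first = sorted(snapshots, key=lambda s: s[0], reverse=True)
    window = newest_first[: ROWNUM_LIMIT - 1]
    return sum(1 for _, flags in window if (flags & VOLATILE_BIT) > 0)


def check_volatile(snapshots):
    """'T' or 'F', like dbms_stats_internal.check_volatile, under the THRESHOLD assumption."""
    return "T" if count_volatile(snapshots) >= THRESHOLD else "F"


if __name__ == "__main__":
    now = datetime(2017, 4, 1, 12, 0)
    # 24 hourly snapshots without the volatile bit (flags=1, as in the trace above)
    history = [(now - timedelta(hours=h), 1) for h in range(1, 25)]
    for flagged in (12, 13):
        snaps = history + [(now, VOLATILE_BIT)] * flagged
        print(flagged, check_volatile(snaps))
```

Because the inserted rows carry the newest timestamp, they all fall inside the 23-row window, which is why the count grows exactly with the number of rows inserted.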

So, it seems that the Statistics Advisor will raise a recommendation when the table has been flagged as volatile multiple times over roughly the last 24 hours. How many? Let’s test:

SQL> insert into sys.optstat_snapshot$ select &object_id,0,0,0,64,sysdate from xmltable('1 to 12');
old 1: insert into sys.optstat_snapshot$ select &object_id,0,0,0,64,sysdate from xmltable('1 to 12')
new 1: insert into sys.optstat_snapshot$ select 74867,0,0,0,64,sysdate from xmltable('1 to 12')
 
12 rows created.
 
SQL> select dbms_stats_internal.check_volatile(&object_id) from dual;
old 1: select dbms_stats_internal.check_volatile(&object_id) from dual
new 1: select dbms_stats_internal.check_volatile( 74867) from dual
 
DBMS_STATS_INTERNAL.CHECK_VOLATILE(74867)
-----------------------------------------
F
 
SQL> rollback;
 
Rollback complete.

I’ve called the function after inserting various numbers of rows with flag=64 into sys.optstat_snapshot$: with up to 12 such snapshots, the table is still not considered volatile.
Please remember that this is a lab; we are not expected to update the internal dictionary tables ourselves.

Now inserting one more:

SQL> insert into sys.optstat_snapshot$ select &object_id,0,0,0,64,sysdate from xmltable('1 to 13');
old 1: insert into sys.optstat_snapshot$ select &object_id,0,0,0,64,sysdate from xmltable('1 to 13')
new 1: insert into sys.optstat_snapshot$ select 74867,0,0,0,64,sysdate from xmltable('1 to 13')
 
13 rows created.
 
SQL> select dbms_stats_internal.check_volatile(&object_id) from dual;
old 1: select dbms_stats_internal.check_volatile(&object_id) from dual
new 1: select dbms_stats_internal.check_volatile( 74867) from dual
 
DBMS_STATS_INTERNAL.CHECK_VOLATILE(74867)
-----------------------------------------
T
 
SQL> rollback;
 
Rollback complete.

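Good, I have a ‘T’ here, for ‘true’. I conclude that the Statistics Advisor recommends locking the statistics of a table when at least 13 of its most recent snapshots (more than half of a day of hourly snapshots) have been flagged as volatile, which presumably means that they encountered more than STALE_PERCENT modifications.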

So what?

My table was not considered volatile. None of the snapshots have been flagged as volatile. I’m quite sure that the number of DML operations is sufficient, so I suppose that this feature is disabled by default, and I don’t know how to enable it. What I want to see is compute_volatile_flag called with new_flag=64, so that snapshots are flagged when a large percentage of rows have been modified, and so that enough snapshots get flagged to be picked up by the check_volatile function.
Even if it is enabled, I think that there are more cases where tables should have their statistics locked. Even if a table is empty for only 5 minutes per day, we must be sure that the statistics are not gathered at that time. And given the Statistics Advisor thresholds, this case is far from being detected.
Final thought here: do you realize that you buy expensive software to detect the changes happening on your tables, guess how the tables are updated, and recommend (and even implement) a general best practice? Does it mean that, today, we put applications into production without knowing what they do? Aren’t we supposed to design the application, document which tables are volatile and when they are loaded in bulk, and decide when to gather stats and lock them?

 

The post 12cR2 DML monitoring and Statistics Advisor appeared first on Blog dbi services.

OUD – Oracle Unified Directory 11.1.2.3, Oracle generates more and more LDAP lookups with every release

Thu, 2017-04-06 04:50

After installing OUD some time ago, I ran some tests to see how it performs, and as long as I run LDAP searches on the command line, it looks very good. I am running Oracle Unified Directory 11.1.2.3.170117 (latest PSU), just for the record, and I use OUD only for TNS name resolution and nothing else. However, Oracle clients do not connect with “ldapsearch”; they use “sqlplus”, and the TNS name is resolved automatically in the background.

I have the following ldap.ora and sqlnet.ora. Very simple, nothing special.

oracle@dbidg03:/u01/app/oracle/network/admin/ [DBIT122] cat ldap.ora
DIRECTORY_SERVERS= (dbidg01:1389)
DEFAULT_ADMIN_CONTEXT = "dc=dbi,dc=com"
DIRECTORY_SERVER_TYPE = OID

oracle@dbidg03:/u01/app/oracle/network/admin/ [DBIT122] cat sqlnet.ora
NAMES.DIRECTORY_PATH = (TNSNAMES,LDAP,EZCONNECT)

Here is a little quiz: how many LDAP search requests do you expect when you connect to a 12.2 database with the following command?

sqlplus system/manager@dbit122_ldap

Only one, right? Oracle looks up the TNS name dbit122_ldap in the OUD and retrieves the connect string. As soon as Oracle has the connect details, OUD does not play any role anymore. If you run ldapsearch from the 12.2 Oracle Home, this is exactly what happens.
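
As a rough illustration of what the client derives from the two files above, the following Python sketch reads NAMES.DIRECTORY_PATH to get the order in which the naming methods are tried (here tnsnames.ora first, then LDAP), and builds the LDAP search base for an alias from DEFAULT_ADMIN_CONTEXT, in the same form as the base used by the ldapsearch command below. The helpers parse_directory_path and ldap_search_base are hypothetical and assume the quoted value format of the ldap.ora above; this is not Oracle Net's actual code.

```python
import re


def parse_directory_path(sqlnet_ora):
    """Return the naming methods listed in NAMES.DIRECTORY_PATH, in lookup order."""
    m = re.search(r"NAMES\.DIRECTORY_PATH\s*=\s*\(([^)]*)\)", sqlnet_ora, re.IGNORECASE)
    if m is None:
        return []
    return [method.strip().upper() for method in m.group(1).split(",") if method.strip()]


def ldap_search_base(alias, ldap_ora):
    """Build cn=<alias>,cn=OracleContext,<DEFAULT_ADMIN_CONTEXT> (quoted value assumed)."""
    m = re.search(r'DEFAULT_ADMIN_CONTEXT\s*=\s*"([^"]*)"', ldap_ora)
    if m is None:
        raise ValueError("DEFAULT_ADMIN_CONTEXT not found")
    return f"cn={alias},cn=OracleContext,{m.group(1)}"


if __name__ == "__main__":
    sqlnet = "NAMES.DIRECTORY_PATH = (TNSNAMES,LDAP,EZCONNECT)"
    ldap = ('DIRECTORY_SERVERS= (dbidg01:1389)\n'
            'DEFAULT_ADMIN_CONTEXT = "dc=dbi,dc=com"\n'
            'DIRECTORY_SERVER_TYPE = OID\n')
    print(parse_directory_path(sqlnet))
    print(ldap_search_base("DBIT122_LDAP", ldap))
```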

oracle@dbidg03:/u01/app/oracle/network/admin/ [DBIT122] which ldapsearch
/u01/app/oracle/product/12.2.0/dbhome_1/bin/ldapsearch
oracle@dbidg03:/u01/app/oracle/network/admin/ [DBIT122] ldapsearch -v -h dbidg01 -p 1389 -b "cn=DBIT122_LDAP,cn=OracleContext,dc=dbi,dc=com" -s base "(objectclass=*)" "objectclass,orclNetDescString,orclNetDescName,orclVersion"
ldap_open( dbidg01, 1389 )
filter pattern: (objectclass=*)
returning: objectclass,orclNetDescString,orclNetDescName,orclVersion
filter is: ((objectclass=*))
cn=dbit122_ldap,cn=OracleContext,dc=dbi,dc=com
1 matches

In the OUD access log, you can see it clearly: one connect, the bind, the search request and finally the disconnect. Exactly how it should be, and the etime is 1 millisecond. That is the elapsed time to process the search request, which is very fast.

[dbafmw@dbidg01 logs]$ tail -40f /u01/app/oracle/product/Middleware/11.1.2.3/asinst_1/OUD/logs/access
...
[06/Apr/2017:10:46:49 +0200] CONNECT conn=877 from=192.168.56.203:21971 to=192.168.56.201:1389 protocol=LDAP
[06/Apr/2017:10:46:49 +0200] BIND REQ conn=877 op=0 msgID=1 type=SIMPLE dn="" version=3
[06/Apr/2017:10:46:49 +0200] BIND RES conn=877 op=0 msgID=1 result=0 authDN="" etime=0
[06/Apr/2017:10:46:49 +0200] SEARCH REQ conn=877 op=1 msgID=2 base="cn=DBIT122_LDAP,cn=OracleContext,dc=dbi,dc=com" scope=base filter="(objectclass=*)" attrs="objectclass,orclNetDescString,orclNetDescName,orclVersion"
[06/Apr/2017:10:46:49 +0200] SEARCH RES conn=877 op=1 msgID=2 result=0 nentries=1 etime=1
[06/Apr/2017:10:46:49 +0200] UNBIND REQ conn=877 op=2 msgID=3
[06/Apr/2017:10:46:49 +0200] DISCONNECT conn=877 reason="Client Disconnect"
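
Counting these lines by hand gets tedious once several clients are involved. A minimal Python sketch that summarizes an access-log excerpt per connection (number of SEARCH requests and total search etime) could look like this; the regular expression assumes exactly the line format shown in the logs of this post and ignores any other operations.

```python
import re
from collections import defaultdict

# [timestamp] OPERATION conn=N ... optionally followed by etime=N (milliseconds)
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] "
    r"(?P<op>CONNECT|BIND REQ|BIND RES|SEARCH REQ|SEARCH RES|UNBIND REQ|DISCONNECT) "
    r"conn=(?P<conn>\d+)(?:.*\betime=(?P<etime>\d+))?"
)


def summarize(log_text):
    """Return {conn: {"searches": n, "search_etime_ms": total}} for an access-log excerpt."""
    stats = defaultdict(lambda: {"searches": 0, "search_etime_ms": 0})
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skips "..." and any operation not listed in LINE_RE
        conn = int(m["conn"])
        if m["op"] == "SEARCH REQ":
            stats[conn]["searches"] += 1
        elif m["op"] == "SEARCH RES" and m["etime"] is not None:
            stats[conn]["search_etime_ms"] += int(m["etime"])
    return dict(stats)


if __name__ == "__main__":
    excerpt = """\
[06/Apr/2017:10:46:49 +0200] CONNECT conn=877 from=192.168.56.203:21971 to=192.168.56.201:1389 protocol=LDAP
[06/Apr/2017:10:46:49 +0200] BIND REQ conn=877 op=0 msgID=1 type=SIMPLE dn="" version=3
[06/Apr/2017:10:46:49 +0200] BIND RES conn=877 op=0 msgID=1 result=0 authDN="" etime=0
[06/Apr/2017:10:46:49 +0200] SEARCH REQ conn=877 op=1 msgID=2 base="cn=DBIT122_LDAP,cn=OracleContext,dc=dbi,dc=com" scope=base filter="(objectclass=*)" attrs="objectclass,orclNetDescString,orclNetDescName,orclVersion"
[06/Apr/2017:10:46:49 +0200] SEARCH RES conn=877 op=1 msgID=2 result=0 nentries=1 etime=1
[06/Apr/2017:10:46:49 +0200] UNBIND REQ conn=877 op=2 msgID=3
[06/Apr/2017:10:46:49 +0200] DISCONNECT conn=877 reason="Client Disconnect"
"""
    per_conn = summarize(excerpt)
    print(per_conn)
    print("total searches:", sum(s["searches"] for s in per_conn.values()))
```

Fed with the log of one sqlplus login, the total number of searches is the figure compared across client versions in the rest of this post.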

OK. Let’s do the first test with Oracle 10.2.0.5. I know it is not supported; however, regarding LDAP searches, it is a version where everything is fine. My test is very simple: just a sqlplus connection and then an exit. Nothing else.

oracle@dbidg03:/u01/app/oracle/network/admin/ [DBIT102] sqlplus -V

SQL*Plus: Release 10.2.0.5.0 - Production

oracle@dbidg03:/u01/app/oracle/network/admin/ [DBIT102] sqlplus system/manager@dbit122_ldap

SQL*Plus: Release 10.2.0.5.0 - Production on Thu Apr 6 11:00:02 2017

Copyright (c) 1982, 2010, Oracle.  All Rights Reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL> exit
Disconnected from Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

In the OUD access log I see, as expected, only one search request.

[dbafmw@dbidg01 logs]$ tail -40f /u01/app/oracle/product/Middleware/11.1.2.3/asinst_1/OUD/logs/access
...
[06/Apr/2017:11:01:18 +0200] CONNECT conn=879 from=192.168.56.203:21974 to=192.168.56.201:1389 protocol=LDAP
[06/Apr/2017:11:01:18 +0200] BIND REQ conn=879 op=0 msgID=1 type=SIMPLE dn="" version=3
[06/Apr/2017:11:01:18 +0200] BIND RES conn=879 op=0 msgID=1 result=0 authDN="" etime=0
[06/Apr/2017:11:01:18 +0200] SEARCH REQ conn=879 op=1 msgID=2 base="cn=dbit122_ldap,cn=OracleContext,dc=dbi,dc=com" scope=base filter="(objectclass=*)" attrs="objectclass,orclNetDescString,orclNetDescName,orclVersion"
[06/Apr/2017:11:01:18 +0200] SEARCH RES conn=879 op=1 msgID=2 result=0 nentries=1 etime=2
[06/Apr/2017:11:01:18 +0200] UNBIND REQ conn=879 op=2 msgID=3
[06/Apr/2017:11:01:18 +0200] DISCONNECT conn=879 reason="Client Disconnect"

Let’s do the same now with 11.2.0.4, this time with a fully supported version. Yes. It still is. :-)

oracle@dbidg03:/u01/app/oracle/network/admin/ [DBIT112] sqlplus -V

SQL*Plus: Release 11.2.0.4.0 Production

oracle@dbidg03:/u01/app/oracle/network/admin/ [DBIT112] sqlplus system/manager@dbit122_ldap

SQL*Plus: Release 11.2.0.4.0 Production on Thu Apr 6 11:03:17 2017

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL> exit
Disconnected from Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

Wow... now I already see two search requests on the OUD. To be honest, I hadn’t expected that. One should be sufficient from my point of view.

[dbafmw@dbidg01 logs]$ tail -40f /u01/app/oracle/product/Middleware/11.1.2.3/asinst_1/OUD/logs/access
...
[06/Apr/2017:11:03:43 +0200] CONNECT conn=882 from=192.168.56.203:21979 to=192.168.56.201:1389 protocol=LDAP
[06/Apr/2017:11:03:43 +0200] BIND REQ conn=882 op=0 msgID=1 type=SIMPLE dn="" version=3
[06/Apr/2017:11:03:43 +0200] BIND RES conn=882 op=0 msgID=1 result=0 authDN="" etime=0
[06/Apr/2017:11:03:43 +0200] SEARCH REQ conn=882 op=1 msgID=2 base="cn=dbit122_ldap,cn=OracleContext,dc=dbi,dc=com" scope=base filter="(objectclass=*)" attrs="objectclass,orclNetDescString,orclNetDescName,orclVersion"
[06/Apr/2017:11:03:43 +0200] SEARCH RES conn=882 op=1 msgID=2 result=0 nentries=1 etime=1
[06/Apr/2017:11:03:43 +0200] UNBIND REQ conn=882 op=2 msgID=3
[06/Apr/2017:11:03:43 +0200] DISCONNECT conn=882 reason="Client Disconnect"
[06/Apr/2017:11:03:43 +0200] CONNECT conn=883 from=192.168.56.203:21980 to=192.168.56.201:1389 protocol=LDAP
[06/Apr/2017:11:03:43 +0200] BIND REQ conn=883 op=0 msgID=1 type=SIMPLE dn="" version=3
[06/Apr/2017:11:03:43 +0200] BIND RES conn=883 op=0 msgID=1 result=0 authDN="" etime=1
[06/Apr/2017:11:03:43 +0200] SEARCH REQ conn=883 op=1 msgID=2 base="cn=dbit122_ldap,cn=OracleContext,dc=dbi,dc=com" scope=base filter="(objectclass=*)" attrs="objectclass,orclNetDescString,orclNetDescName,orclVersion"
[06/Apr/2017:11:03:43 +0200] SEARCH RES conn=883 op=1 msgID=2 result=0 nentries=1 etime=2
[06/Apr/2017:11:03:43 +0200] UNBIND REQ conn=883 op=2 msgID=3
[06/Apr/2017:11:03:43 +0200] DISCONNECT conn=883 reason="Client Disconnect"

But just when you think it can’t get worse, do the same simple test with a 12.1.0.2 Oracle client.

oracle@dbidg03:/u01/app/oracle/network/admin/ [DBIT121] sqlplus -V

SQL*Plus: Release 12.1.0.2.0 Production

oracle@dbidg03:/u01/app/oracle/network/admin/ [DBIT121] sqlplus system/manager@dbit122_ldap

SQL*Plus: Release 12.1.0.2.0 Production on Thu Apr 6 11:06:18 2017

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Thu Apr 06 2017 11:03:43 +02:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL> exit
Disconnected from Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

Incredible: it issues three LDAP search requests against the OUD for a simple sqlplus connection.

[dbafmw@dbidg01 logs]$ tail -40f /u01/app/oracle/product/Middleware/11.1.2.3/asinst_1/OUD/logs/access
...
[06/Apr/2017:11:06:41 +0200] CONNECT conn=887 from=192.168.56.203:21986 to=192.168.56.201:1389 protocol=LDAP
[06/Apr/2017:11:06:41 +0200] BIND REQ conn=887 op=0 msgID=1 type=SIMPLE dn="" version=3
[06/Apr/2017:11:06:41 +0200] BIND RES conn=887 op=0 msgID=1 result=0 authDN="" etime=0
[06/Apr/2017:11:06:41 +0200] SEARCH REQ conn=887 op=1 msgID=2 base="cn=dbit122_ldap,cn=OracleContext,dc=dbi,dc=com" scope=base filter="(objectclass=*)" attrs="objectclass,orclNetDescString,orclNetDescName,orclVersion"
[06/Apr/2017:11:06:41 +0200] SEARCH RES conn=887 op=1 msgID=2 result=0 nentries=1 etime=1
[06/Apr/2017:11:06:41 +0200] UNBIND REQ conn=887 op=2 msgID=3
[06/Apr/2017:11:06:41 +0200] DISCONNECT conn=887 reason="Client Disconnect"
[06/Apr/2017:11:06:41 +0200] CONNECT conn=888 from=192.168.56.203:21987 to=192.168.56.201:1389 protocol=LDAP
[06/Apr/2017:11:06:41 +0200] BIND REQ conn=888 op=0 msgID=1 type=SIMPLE dn="" version=3
[06/Apr/2017:11:06:41 +0200] BIND RES conn=888 op=0 msgID=1 result=0 authDN="" etime=0
[06/Apr/2017:11:06:41 +0200] SEARCH REQ conn=888 op=1 msgID=2 base="cn=dbit122_ldap,cn=OracleContext,dc=dbi,dc=com" scope=base filter="(objectclass=*)" attrs="objectclass,orclNetDescString,orclNetDescName,orclVersion"
[06/Apr/2017:11:06:41 +0200] SEARCH RES conn=888 op=1 msgID=2 result=0 nentries=1 etime=2
[06/Apr/2017:11:06:41 +0200] UNBIND REQ conn=888 op=2 msgID=3
[06/Apr/2017:11:06:41 +0200] DISCONNECT conn=888 reason="Client Disconnect"
[06/Apr/2017:11:06:41 +0200] CONNECT conn=889 from=192.168.56.203:21988 to=192.168.56.201:1389 protocol=LDAP
[06/Apr/2017:11:06:41 +0200] BIND REQ conn=889 op=0 msgID=1 type=SIMPLE dn="" version=3
[06/Apr/2017:11:06:41 +0200] BIND RES conn=889 op=0 msgID=1 result=0 authDN="" etime=1
[06/Apr/2017:11:06:41 +0200] SEARCH REQ conn=889 op=1 msgID=2 base="cn=dbit122_ldap,cn=OracleContext,dc=dbi,dc=com" scope=base filter="(objectclass=*)" attrs="objectclass,orclNetDescString,orclNetDescName,orclVersion"
[06/Apr/2017:11:06:41 +0200] SEARCH RES conn=889 op=1 msgID=2 result=0 nentries=1 etime=2
[06/Apr/2017:11:06:41 +0200] UNBIND REQ conn=889 op=2 msgID=3
[06/Apr/2017:11:06:41 +0200] DISCONNECT conn=889 reason="Client Disconnect"

The last test is now with a 12cR2 client. Will it increase now to 4?

oracle@dbidg03:/u01/app/oracle/network/admin/ [DBIT122] sqlplus system/manager@dbit122_ldap

SQL*Plus: Release 12.2.0.1.0 Production on Thu Apr 6 11:09:08 2017

Copyright (c) 1982, 2016, Oracle.  All rights reserved.

Last Successful login time: Thu Apr 06 2017 11:06:41 +02:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL> exit
Disconnected from Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

No, it did not increase to 4. But with 12cR2, just like with 12cR1, you will see 3 search requests against the OUD.

[dbafmw@dbidg01 logs]$ tail -40f /u01/app/oracle/product/Middleware/11.1.2.3/asinst_1/OUD/logs/access
...
[06/Apr/2017:11:09:07 +0200] CONNECT conn=890 from=192.168.56.203:21990 to=192.168.56.201:1389 protocol=LDAP
[06/Apr/2017:11:09:07 +0200] BIND REQ conn=890 op=0 msgID=1 type=SIMPLE dn="" version=3
[06/Apr/2017:11:09:07 +0200] BIND RES conn=890 op=0 msgID=1 result=0 authDN="" etime=1
[06/Apr/2017:11:09:07 +0200] SEARCH REQ conn=890 op=1 msgID=2 base="cn=dbit122_ldap,cn=OracleContext,dc=dbi,dc=com" scope=base filter="(objectclass=*)" attrs="objectclass,orclNetDescString,orclNetDescName,orclVersion"
[06/Apr/2017:11:09:07 +0200] SEARCH RES conn=890 op=1 msgID=2 result=0 nentries=1 etime=2
[06/Apr/2017:11:09:07 +0200] UNBIND REQ conn=890 op=2 msgID=3
[06/Apr/2017:11:09:07 +0200] DISCONNECT conn=890 reason="Client Disconnect"
[06/Apr/2017:11:09:07 +0200] CONNECT conn=891 from=192.168.56.203:21991 to=192.168.56.201:1389 protocol=LDAP
[06/Apr/2017:11:09:07 +0200] BIND REQ conn=891 op=0 msgID=1 type=SIMPLE dn="" version=3
[06/Apr/2017:11:09:07 +0200] BIND RES conn=891 op=0 msgID=1 result=0 authDN="" etime=0
[06/Apr/2017:11:09:07 +0200] SEARCH REQ conn=891 op=1 msgID=2 base="cn=dbit122_ldap,cn=OracleContext,dc=dbi,dc=com" scope=base filter="(objectclass=*)" attrs="objectclass,orclNetDescString,orclNetDescName,orclVersion"
[06/Apr/2017:11:09:07 +0200] SEARCH RES conn=891 op=1 msgID=2 result=0 nentries=1 etime=1
[06/Apr/2017:11:09:07 +0200] UNBIND REQ conn=891 op=2 msgID=3
[06/Apr/2017:11:09:07 +0200] DISCONNECT conn=891 reason="Client Disconnect"
[06/Apr/2017:11:09:07 +0200] CONNECT conn=892 from=192.168.56.203:21992 to=192.168.56.201:1389 protocol=LDAP
[06/Apr/2017:11:09:07 +0200] BIND REQ conn=892 op=0 msgID=1 type=SIMPLE dn="" version=3
[06/Apr/2017:11:09:07 +0200] BIND RES conn=892 op=0 msgID=1 result=0 authDN="" etime=0
[06/Apr/2017:11:09:07 +0200] SEARCH REQ conn=892 op=1 msgID=2 base="cn=dbit122_ldap,cn=OracleContext,dc=dbi,dc=com" scope=base filter="(objectclass=*)" attrs="objectclass,orclNetDescString,orclNetDescName,orclVersion"
[06/Apr/2017:11:09:07 +0200] SEARCH RES conn=892 op=1 msgID=2 result=0 nentries=1 etime=1
[06/Apr/2017:11:09:07 +0200] UNBIND REQ conn=892 op=2 msgID=3
[06/Apr/2017:11:09:07 +0200] DISCONNECT conn=892 reason="Client Disconnect"

So what is the reason for this large increase in LDAP searches? Instead of 1, the client does 3 with 12cR1 and 12cR2, and 2 with 11gR2. For 12c that is three times as many as with Oracle 10gR2 clients (200% more), and two out of three searches (about 66%) are redundant. That’s enormous from my point of view: quite a large extra load on your OUD server when you upgrade your Oracle clients.
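
Put as numbers: compared with one search per login for 10gR2, a client issuing three searches per login generates 200% more LDAP traffic, and two thirds of its searches are not needed to resolve the alias. A tiny Python sketch of that arithmetic, using only the per-login counts observed in the access logs above:

```python
# LDAP search requests per sqlplus login, as counted in the OUD access logs above.
SEARCHES_PER_LOGIN = {"10.2.0.5": 1, "11.2.0.4": 2, "12.1.0.2": 3, "12.2.0.1": 3}
NEEDED = 1  # one lookup is enough to resolve the alias (the 10gR2 behaviour)


def increase_pct(searches, baseline=NEEDED):
    """Relative increase of LDAP searches compared with the baseline, in percent."""
    return (searches - baseline) / baseline * 100


def redundant_share(searches, needed=NEEDED):
    """Fraction of the searches that are not needed to resolve the alias."""
    return (searches - needed) / searches


if __name__ == "__main__":
    for version, n in SEARCHES_PER_LOGIN.items():
        print(f"{version}: {n} search(es), +{increase_pct(n):.0f}%, "
              f"{redundant_share(n):.0%} redundant")
```

The load on the directory scales with the number of logins, so the same factor applies to any login rate you plug in.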

To make it short, I have no answer. It might be related to the old Oracle Names code, which still seems to be there. I have found errors in the client trace file regarding an A.SMD query. The A.SMD call comes from the old Oracle Names server, where you could do things like “NAMESCTL> QUERY DB920.oracle.com A.SMD”. But this was a really long time ago. The last Oracle Names server I saw was in 2002.

oracle@dbidg02:/u01/app/oracle/network/trc/ [DBIT122] cat 12.2_client.trc | grep A.SMD
(4144394624) [04-APR-2017 14:38:18:633] nnfttran: Error querying DBIT122_LDAP of attribute A.SMD errcode 408
(4144394624) [04-APR-2017 14:38:18:642] nnfttran: Error querying DBIT122_LDAP of attribute A.SMD errcode 408
(4144394624) [04-APR-2017 14:38:18:646] nnfttran: Error querying DBIT122_LDAP of attribute A.SMD errcode 408
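
To run the same check as the grep above over many client trace files, a minimal Python sketch can extract each failed A.SMD lookup with its timestamp. The regular expression assumes the exact nnfttran message format shown above. In this trace it finds three errors, the same number as the search requests per login seen with the 12cR2 client, although that alone does not prove they are related.

```python
import re

# Format of the nnfttran error lines shown in the 12.2 client trace above.
NNFTTRAN_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\] nnfttran: Error querying (?P<alias>\S+) "
    r"of attribute (?P<attr>\S+) errcode (?P<err>\d+)"
)


def smd_errors(trace_text):
    """Return (timestamp, alias, errcode) for every failed A.SMD lookup in a trace."""
    return [
        (m["ts"], m["alias"], int(m["err"]))
        for m in NNFTTRAN_RE.finditer(trace_text)
        if m["attr"] == "A.SMD"
    ]


if __name__ == "__main__":
    trace = """\
(4144394624) [04-APR-2017 14:38:18:633] nnfttran: Error querying DBIT122_LDAP of attribute A.SMD errcode 408
(4144394624) [04-APR-2017 14:38:18:642] nnfttran: Error querying DBIT122_LDAP of attribute A.SMD errcode 408
(4144394624) [04-APR-2017 14:38:18:646] nnfttran: Error querying DBIT122_LDAP of attribute A.SMD errcode 408
"""
    errors = smd_errors(trace)
    print(len(errors), "failed A.SMD lookups")
    for ts, alias, err in errors:
        print(ts, alias, err)
```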

If I take a look at the adapters of my 12cR2 home, no Oracle Names adapter is compiled in. I don’t know whether that is possible at all with 12c.

oracle@dbidg03:/u01/app/oracle/network/admin/ [DBIT122] adapters | egrep -A 5 "Installed Oracle Net naming methods"
Installed Oracle Net naming methods are:

    Local Naming (tnsnames.ora)
    Oracle Directory Naming
    Oracle Host Naming
Conclusion

OK, what can I say... take care when you upgrade your clients to more recent versions if you use OUD to resolve your names. It might generate extra load on your OUD servers, more and more with every release since 10gR2. By the way... I have opened an SR with Oracle, because this looks like a bug to me. I was very surprised to be the first one facing this issue. I will keep you posted as soon as I have results. ;-)

 

The post OUD – Oracle Unified Directory 11.1.2.3, Oracle generates more and more LDAP lookups with every release appeared first on Blog dbi services.

Sharding with Oracle 12c R2 Part II : Scalability and Connections

Wed, 2017-04-05 12:12

In the previous blog, we talked about system-managed sharding. We saw how to create shard databases with Oracle 12c R2. As a reminder, here is the configuration we used:
VM sharddemo1: catalog
VM sharddemo2: shard
VM sharddemo3: shard
One of the characteristics of sharding is scalability, and in this blog we are going to add a new shard on a new server. The new configuration will look like this:
VM sharddemo1: catalog
VM sharddemo2: shard
VM sharddemo3: shard
VM sharddemo4: shard (newly added)
We assume that you have already read the first part.

After adding the new shard, we will see how we connect in a shard environment.

First let’s confirm that we have only two shards running now (one on server sharddemo2 and one on server sharddemo3):

GDSCTL>config shard
Name  Shard Group  Status  State     Region   Availability
----  -----------  ------  -----     ------   ------------
sh1   shgrp1       Ok      Deployed  region1  ONLINE
sh21  shgrp1       Ok      Deployed  region1  ONLINE

To add the new server sharddemo4 to the sharding environment, we first have to register the remote scheduler agent on it:

[oracle@sharddemo4 ~]$ which schagent
/u01/app/oracle/product/12.2.0.1/dbhome_1/bin/schagent


[oracle@sharddemo4 ~]$ echo welcome | schagent -registerdatabase sharddemo1 8080
Agent Registration Password ?
Oracle Scheduler Agent Registration for 12.2.0.1.2 Agent
Agent Registration Successful!
[oracle@sharddemo4 ~]$

After registration, let’s start the agent

[oracle@sharddemo4 ~]$ schagent -start
Scheduler agent started using port 16267

We must also create the corresponding directories for the database

[oracle@sharddemo4 ~]$ mkdir /u01/app/oracle/oradata
[oracle@sharddemo4 ~]$ mkdir /u01/app/oracle/fast_recovery_area

We can now launch gdsctl on sharddemo1 and connect as an administrator:

[oracle@sharddemo1]$ gdsctl
GDSCTL: Version 12.2.0.1.0 - Production on Tue Mar 07 11:43:44 CET 2017
Copyright (c) 2011, 2016, Oracle. All rights reserved.
Welcome to GDSCTL, type "help" for information.
Current GSM is set to REGION1_DIRECTOR
GDSCTL>


GDSCTL>connect mygdsadmin/root
Catalog connection is established
GDSCTL>

As for the existing shards, the new server sharddemo4 must be invited:

GDSCTL>add invitednode sharddemo4

We also have to create the shard on sharddemo4. The shardgroup, destination and the credentials should be specified.

GDSCTL>create shard -shardgroup shgrp1 -destination sharddemo4 -credential oracle_cred
The operation completed successfully
DB Unique Name: sh41
GDSCTL>

Now we can deploy:

GDSCTL>deploy
deploy: examining configuration...
deploy: deploying primary shard 'sh41' ...
deploy: network listener configuration successful at destination 'sharddemo4'
deploy: starting DBCA at destination 'sharddemo4' to create primary shard 'sh41' ...
deploy: waiting for 1 DBCA primary creation job(s) to complete...
deploy: waiting for 1 DBCA primary creation job(s) to complete...
deploy: waiting for 1 DBCA primary creation job(s) to complete...
deploy: waiting for 1 DBCA primary creation job(s) to complete...
deploy: waiting for 1 DBCA primary creation job(s) to complete...
deploy: waiting for 1 DBCA primary creation job(s) to complete...
deploy: waiting for 1 DBCA primary creation job(s) to complete...
deploy: waiting for 1 DBCA primary creation job(s) to complete...
deploy: waiting for 1 DBCA primary creation job(s) to complete...
deploy: waiting for 1 DBCA primary creation job(s) to complete...
deploy: DBCA primary creation job succeeded at destination 'sharddemo4' for shard 'sh41'
deploy: requesting Data Guard configuration on shards via GSM
deploy: shards configured; background operations in progress
The operation completed successfully
GDSCTL>

Running the config shard command, we can verify that the new shard database sh41 was created:

GDSCTL>config shard
Name  Shard Group  Status  State     Region   Availability
----  -----------  ------  -----     ------   ------------
sh1   shgrp1       Ok      Deployed  region1  ONLINE
sh21  shgrp1       Ok      Deployed  region1  ONLINE
sh41  shgrp1       Ok      Deployed  region1  ONLINE

Now, querying the 3 shards, we can see that the data of the sharded tables is automatically distributed across SH1, SH21 and SH41.
For example, we have a total of 15 rows in the catalog ORCLCAT for the sharded table CUSTOMERS, which should be distributed across the 3 shards:

SQL> select name from v$database;
NAME
---------
ORCLCAT

SQL> select count(*) from customers;
COUNT(*)
----------
15

We have 3 rows of the customers table in SH1:

SQL> select name from v$database;
NAME
---------
SH1

SQL> select count(*) from customers;
COUNT(*)
----------
3

We have 3 rows of the customers table in SH21:

SQL> select name from v$database;
NAME
---------
SH21

SQL> select count(*) from customers;
COUNT(*)
----------
3

And 9 rows of the customers table in SH41:

SQL> select name from v$database;
NAME
---------
SH41

SQL> select count(*) from customers;
COUNT(*)
----------
9
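
To picture why adding sh41 changes where some rows live without rehashing everything, here is a simplified Python model of chunk-based placement. It only illustrates the idea behind system-managed sharding: each key hashes to a fixed chunk, and it is whole chunks that move between shards. NUM_CHUNKS, the MD5-based hash and the rebalancing rule are assumptions for this sketch, not Oracle's internal algorithm.

```python
import hashlib
from collections import Counter

NUM_CHUNKS = 12  # ASSUMPTION: tiny value for illustration; real deployments use many more chunks


def chunk_of(key):
    """Map a sharding key to a fixed chunk (MD5 here, NOT Oracle's hash function)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_CHUNKS


def initial_placement(shards):
    """Spread the chunks round-robin over the existing shards."""
    return {chunk: shards[chunk % len(shards)] for chunk in range(NUM_CHUNKS)}


def add_shard(placement, new_shard):
    """Move just enough chunks to new_shard to even out the chunk counts.

    Returns the new placement and the list of moved chunks; every other
    chunk, and therefore every row whose key hashes to it, stays where it was.
    """
    placement = dict(placement)
    old_shards = sorted(set(placement.values()))
    fair_share = NUM_CHUNKS // (len(old_shards) + 1)
    moved = []
    for shard in old_shards:
        owned = sorted(c for c, s in placement.items() if s == shard)
        for chunk in owned[fair_share:]:  # chunks above the fair share go to the new shard
            placement[chunk] = new_shard
            moved.append(chunk)
    return placement, moved


if __name__ == "__main__":
    before = initial_placement(["sh1", "sh21"])
    after, moved = add_shard(before, "sh41")
    print(Counter(before.values()), Counter(after.values()), sorted(moved))
    key = "Seane.Tuger@localdomain"
    print(key, "-> chunk", chunk_of(key), "-> shard", after[chunk_of(key)])
```

Note that equal chunk counts do not imply equal row counts per shard: with only a handful of rows, each shard holds whatever rows happen to hash into its chunks.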

Now that we have our sharded databases, how do we connect? There are two ways:
1- Connect to a shard by specifying a sharding_key
2- Connect to the shardcatalog via GDS$CATALOG service

For single-shard queries, we can connect to a shard with a given sharding_key using the shard director. For example, let’s display the data of table customers in the SH41 database:

SQL> select firstname,lastname,custid from customers;
FIRSTNAME LASTNAME CUSTID
--------------- --------------- ----------------------------
Seane Tuger Seane.Tuger@localdomain
Seaneis Tugeris Seaneis.Tugeris@localdomain
Seanae Tugera Seanea.Tugera@localdomain
Sophiea Moralesa Sophiea.Moralesa@localdomain
Mourada Habiba Mourada.Habiba@localdomain
Michel Robert Michel.Robert@localdomain
Sophie Morales Sophie.Morales@localdomain
Joe Dalton Joe.Dalton@localdomain
Mourad Habib Mourad.Habib@localdomain
9 rows selected.

If we want to retrieve information about customer Seane Tuger, we can connect directly to shard SH41.
For this, we first need to create a global service that runs on all shard databases. This global service will be used with a sharding key:

GDSCTL>add service -service my_service_shard_srvc
The operation completed successfully
GDSCTL>

Let’s start the service

GDSCTL>start service -service my_service_shard_srvc
The operation completed successfully


GDSCTL>config service
Name                   Network name                               Pool      Started  Preferred all
----                   ------------                               ----      -------  -------------
my_service_shard_srvc  my_service_shard_srvc.cust_sdb.oradbcloud  cust_sdb  Yes      Yes
GDSCTL>

Now that the service is started, we can use the following connect string, which routes us directly to SH41 because of the SHARDING_KEY we specified.


[oracle@sharddemo2 admin]$ sqlplus user_shard/root@'(description=(address=(protocol=tcp)
(host=sharddemo1)(port=1571))(connect_data=(service_name=my_service_shard_srvc.cust_sdb.oradbcloud)
(region=region1)(SHARDING_KEY=Seane.Tuger@localdomain)))'
SQL*Plus: Release 12.2.0.1.0 Production on Tue Mar 7 14:34:48 2017
Copyright (c) 1982, 2016, Oracle. All rights reserved.
Last Successful login time: Tue Mar 07 2017 13:41:56 +01:00
Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL> select name from v$database;
NAME
---------
SH41
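
Writing such descriptors by hand is error-prone (note the three closing parentheses at the end). A small hypothetical Python helper, shard_connect_descriptor, that assembles the same descriptor from its parts could be handy for scripted tests with different sharding keys; it only formats the string used above and does not contact the shard director.

```python
def shard_connect_descriptor(host, port, service, region, sharding_key):
    """Build a connect descriptor that lets the shard director route on SHARDING_KEY."""
    return (
        f"(description=(address=(protocol=tcp)(host={host})(port={port}))"
        f"(connect_data=(service_name={service})(region={region})"
        f"(SHARDING_KEY={sharding_key})))"
    )


if __name__ == "__main__":
    print(shard_connect_descriptor(
        "sharddemo1", 1571,
        "my_service_shard_srvc.cust_sdb.oradbcloud",
        "region1", "Seane.Tuger@localdomain"))
```

The printed string can be passed to sqlplus in single quotes, as in the example above.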

To perform cross-shard queries, we have to connect to the shardcatalog (coordinator database) using the GDS$CATALOG service (from any shard). GDS$CATALOG is a service automatically deployed in the shardcatalog.

[oracle@sharddemo1 ~]$ lsnrctl services
LSNRCTL for Linux: Version 12.2.0.1.0 - Production on 07-MAR-2017 14:45:14
Copyright (c) 1991, 2016, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=sharddemo1.localdomain)(PORT=1521)))
Services Summary...
Service "GDS$CATALOG.oradbcloud" has 1 instance(s).
Instance "ORCLCAT", status READY, has 1 handler(s) for this service...
Handler(s):
"DEDICATED" established:604 refused:0 state:ready
LOCAL SERVER

With sqlplus we can connect like this:

[oracle@sharddemo2 admin]$ sqlplus user_shard/root@sharddemo1:1521/GDS\$CATALOG.oradbcloud
SQL*Plus: Release 12.2.0.1.0 Production on Tue Mar 7 14:48:05 2017
Copyright (c) 1982, 2016, Oracle. All rights reserved.
Last Successful login time: Tue Mar 07 2017 14:47:58 +01:00
Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL> select name from v$database;
NAME
---------
ORCLCAT

Now that we have seen how to connect, we can go further and see how Oracle manages queries in a sharded environment.
Let’s connect to the ORCLCAT database, run a SELECT that accesses multiple shards of a sharded table, and look at the execution plan. Such queries are called cross-shard queries (CSQ).

SQL> SELECT FirstName,LastName, geo, class FROM Customers WHERE class like '%free%';

Execution Plan
----------------------------------------------------------
Plan hash value: 2953441084

--------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)| Inst |IN-OUT|
--------------------------------------------------------------
| 0 | SELECT STATEMENT | | 0 (0)| | |
| 1 | SHARD ITERATOR | | | | |
| 2 | REMOTE | | | ORA_S~ | R->S |
--------------------------------------------------------------

Remote SQL Information (identified by operation id):
----------------------------------------------------

2 - EXPLAIN PLAN SET STATEMENT_ID='PLUS470045' INTO PLAN_TABLE@! FOR
SELECT "A1"."FIRSTNAME","A1"."LASTNAME","A1"."GEO","A1"."CLASS" FROM
"CUSTOMERS" "A1" WHERE "A1"."CLASS" LIKE '%free%' /*
coord_sql_id=bkpy0tbjnqu3k */ (accessing
'ORA_SHARD_POOL@ORA_MULTI_TARGET' )

We can see above that the query uses the database link ORA_SHARD_POOL@ORA_MULTI_TARGET to perform the CSQ on sharded tables. This database link is created automatically when the GSM is configured to handle cross-shard queries.


SQL> select owner,DB_LINK,username,host from dba_db_links where owner='USER_SHARD';
OWNER DB_LINK USERNAME HOST
--------------- ----------------------------------- --------------- ---------------
USER_SHARD ORA_SHARD_POOL@ORA_MULTI_TARGET USER_SHARD GDS$CATALOG

Now let’s connect to SH1, query the duplicated table products, and look at the execution plan.

SQL> select * from products where productid>2;

Execution Plan
----------------------------------------------------------
Plan hash value: 1639127380

------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 158 | 0 (0)| 00:00:01 |
| 1 | MAT_VIEW ACCESS BY INDEX ROWID BATCHED| PRODUCTS | 1 | 158 | 0 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | SYS_C007361 | 1 | | 0 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------

We can see that a materialized view is being used: for duplicated tables, Oracle uses materialized views to handle queries.

Conclusion

In this article we talked about sharding scalability and how to connect to shards, using system-managed sharding. In coming articles, we will talk about other sharding methods such as composite and user-defined sharding. We will also talk about how sharding can be combined with Data Guard.

 

The article Sharding with Oracle 12c R2 Part II : Scalability and Connections first appeared on Blog dbi services.

OUD – Oracle Unified Directory 11.1.2.3 Tuning, It is not always the servers fault

Tue, 2017-04-04 03:41

The default configuration shipped with OUD is not meant for enterprise usage. The default settings are targeted at evaluators and developers who run equipment with limited resources, so if you don't change anything before going into production, you are quite likely to run into performance issues. OUD performance depends on a lot of things, such as:

  • Network configuration/routing/firewalls/bonding
  • OUD version and configuration (Replication, TLS)
  • Java version and Java runtime memory configuration
  • DNS Lookup times
  • Name Service Cache Daemon
  • And many more …

However, it is not always the server's fault. Sometimes the client is causing the issue. But how do I know whether it is the client or the server? In the following example, it takes about 10 seconds to resolve the connect string DBIT122_LDAP. That is enormous, and far from acceptable. Where is tnsping spending so much time?

oracle@dbidg02:/u01/app/oracle/network/admin/ [DBIT122] time tnsping DBIT122_LDAP

TNS Ping Utility for Linux: Version 12.2.0.1.0 - Production on 04-APR-2017 08:43:06

Copyright (c) 1997, 2016, Oracle.  All rights reserved.

Used parameter files:
/u01/app/oracle/network/admin/sqlnet.ora

Used LDAP adapter to resolve the alias
Attempting to contact (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=dbidg01)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=DBIT122)))
OK (10 msec)

real    0m10.177s
user    0m0.017s
sys     0m0.018s
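The two numbers above do not add up: tnsping reports about 10 ms for contacting the listener, while `time` reports about 10.18 s of wall-clock time. A minimal Python sketch that extracts both values from captured output and subtracts them, assuming the output format shown above (an `OK (N msec)` line from tnsping and the bash `real XmY.YYYs` line); the function names are hypothetical.

```python
import re


def parse_real_seconds(output: str) -> float:
    """Return the bash 'real' time (e.g. 'real    0m10.177s') in seconds."""
    m = re.search(r"^real\s+(\d+)m([\d.]+)s", output, re.MULTILINE)
    if m is None:
        raise ValueError("no 'real' line found")
    return int(m.group(1)) * 60 + float(m.group(2))


def parse_ping_ms(output: str) -> int:
    """Return the listener round-trip reported by tnsping, e.g. 'OK (10 msec)'."""
    m = re.search(r"OK \((\d+) msec\)", output)
    if m is None:
        raise ValueError("no 'OK (N msec)' line found")
    return int(m.group(1))


def seconds_outside_ping(output: str) -> float:
    """Wall-clock seconds not spent contacting the listener (alias resolution, startup, ...)."""
    return parse_real_seconds(output) - parse_ping_ms(output) / 1000.0


captured = """Used LDAP adapter to resolve the alias
OK (10 msec)

real    0m10.177s
user    0m0.017s
sys     0m0.018s
"""

print(f"{seconds_outside_ping(captured):.3f} s outside the listener round-trip")
```

With user and sys times of only a few milliseconds, almost all of the remaining ~10.17 s is spent waiting before the listener is even contacted, most likely while resolving the alias.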

To rule out the server, just check the OUD access log, where you can see every LDAP request made against OUD.

[dbafmw@dbidg01 logs]$ tail -50f /u01/app/oracle/product/Middleware/11.1.2.3/asinst_1/OUD/logs/access
...
...
[04/Apr/2017:08:43:39 +0200] CONNECT conn=5 from=192.168.56.202:30826 to=192.168.56.201:1389 protocol=LDAP
[04/Apr/2017:08:43:39 +0200] BIND REQ conn=5 op=0 msgID=1 type=SIMPLE dn="" version=3
[04/Apr/2017:08:43:39 +0200] BIND RES conn=5 op=0 msgID=1 result=0 authDN="" etime=0
[04/Apr/2017:08:43:39 +0200] SEARCH REQ conn=5 op=1 msgID=2 base="cn=DBIT122_LDAP,cn=OracleContext,dc=dbi,dc=com" scope=base filter="(objectclass=*)" attrs="objectclass,orclNetDescString,orclNetDescName,orclVersion"
[04/Apr/2017:08:43:39 +0200] SEARCH RES conn=5 op=1 msgID=2 result=0 nentries=1 etime=2
[04/Apr/2017:08:43:39 +0200] UNBIND REQ conn=5 op=2 msgID=3
[04/Apr/2017:08:43:39 +0200] DISCONNECT conn=5 reason="Client Disconnect"
...
...

The important entry to look for is the etime after the search request. The etime field is the elapsed time in milliseconds that the server spent processing the request. In the above case it is 2 milliseconds, so quite fast. If you saw large elapsed times here, that would be a good indicator of issues on the server side.
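Scanning the access log by eye works for one request; on a busy directory server a small filter helps. A minimal Python sketch, assuming the access-log format shown above (`SEARCH RES ... etime=N`, with etime in milliseconds); the 100 ms threshold and the function name are arbitrary illustrative choices, not OUD settings.

```python
import re

# Result line of a search in the OUD access log, e.g.
# [04/Apr/2017:08:43:39 +0200] SEARCH RES conn=5 op=1 msgID=2 result=0 nentries=1 etime=2
SEARCH_RES = re.compile(r"SEARCH RES conn=(\d+) op=(\d+)\b.*\betime=(\d+)")


def slow_searches(lines, threshold_ms=100):
    """Return (conn, op, etime_ms) for every SEARCH RES whose etime exceeds threshold_ms."""
    hits = []
    for line in lines:
        m = SEARCH_RES.search(line)
        if m and int(m.group(3)) > threshold_ms:
            hits.append((int(m.group(1)), int(m.group(2)), int(m.group(3))))
    return hits


sample = [
    '[04/Apr/2017:08:43:39 +0200] SEARCH REQ conn=5 op=1 msgID=2 base="cn=DBIT122_LDAP,cn=OracleContext,dc=dbi,dc=com" scope=base',
    "[04/Apr/2017:08:43:39 +0200] SEARCH RES conn=5 op=1 msgID=2 result=0 nentries=1 etime=2",
]
print(slow_searches(sample))  # [] : 2 ms is well below the threshold

# On a real system, feed it the log file directly, e.g.:
# with open("/u01/app/oracle/product/Middleware/11.1.2.3/asinst_1/OUD/logs/access") as f:
#     print(slow_searches(f))
```

Only result lines are checked, because the request lines carry no elapsed time.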

Now that we know the server is OK, let's move to the client side. The first thing I try is to see how fast ldapsearch is. I am using the ldapsearch that comes with 12cR2, with the same search criteria that tnsping uses to look up the connect string. The ldapsearch syntax of the OUD binaries differs slightly from the syntax of the ldapsearch shipped with 12cR2. Why should Oracle make them the same? It would be too easy. ;-) OK, let's check ldapsearch.

oracle@dbidg02:/u01/app/oracle/network/admin/ [DBIT122] time ldapsearch -v -h dbidg01 -p 1389 -b "cn=DBIT122_LDAP,cn=OracleContext,dc=dbi,dc=com" \
-s base "(objectclass=*)" "objectclass,orclNetDescString,orclNetDescName,orclVersion"

ldap_open( dbidg01, 1389 )
filter pattern: (objectclass=*)
returning: objectclass,orclNetDescString,orclNetDescName,orclVersion
filter is: ((objectclass=*))
cn=dbit122_ldap,cn=OracleContext,dc=dbi,dc=com
1 matches

real    0m0.020s
user    0m0.005s
sys     0m0.004s

I don't see any issues here. My ldapsearch came back in the blink of an eye. So... where are the other 10 seconds? We need more information. We can either use strace or activate tracing on the client side. Something less well known in the Oracle world is that tnsping tracing can be activated too. My tnsping is slow, so I want only tnsping to be traced and nothing else. To do so, we need to specify two parameters in the sqlnet.ora file: TNSPING.TRACE_DIRECTORY and TNSPING.TRACE_LEVEL. Like SQL*Net tracing, the tnsping trace level accepts 4 different values:

  • 0 or OFF – No Trace output
  • 4 or USER – User trace information
  • 10 or ADMIN – Administration trace information
  • 16 or SUPPORT – Worldwide Customer Support trace information

Because I want to have the full trace output, I go for level 16 which is the support tracing.

oracle@dbidg02:/u01/app/oracle/network/admin/ [DBIT122] cat sqlnet.ora | grep TNSPING

TNSPING.TRACE_DIRECTORY = /u01/app/oracle/network/trc
TNSPING.TRACE_LEVEL = SUPPORT

Ok. Let’s do it again and see the outcome.

oracle@dbidg02:/u01/app/oracle/network/admin/ [DBIT122] time tnsping DBIT122_LDAP

TNS Ping Utility for Linux: Version 12.2.0.1.0 - Production on 04-APR-2017 09:44:44

Copyright (c) 1997, 2016, Oracle.  All rights reserved.

Used parameter files:
/u01/app/oracle/network/admin/sqlnet.ora

Used LDAP adapter to resolve the alias
Attempting to contact (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=dbidg01)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=DBIT122)))
OK (10 msec)

real    0m10.191s
user    0m0.013s
sys     0m0.016s
oracle@dbidg02:/u01/app/oracle/network/admin/ [DBIT122]

If we look at the trace file, we see that Oracle found 3 entries in the directory path, in the following order: TNSNAMES, EZCONNECT and LDAP.

[04-APR-2017 09:44:44:569] nnfgsrsp: Obtaining path parameter from names.directory_path or native_names.directory_path
[04-APR-2017 09:44:44:569] nnfgsrdp: entry
[04-APR-2017 09:44:44:569] nnfgsrdp: Setting path:
[04-APR-2017 09:44:44:569] nnfgsrdp: checking element TNSNAMES
[04-APR-2017 09:44:44:569] nnfgsrdp: checking element EZCONNECT
[04-APR-2017 09:44:44:569] nnfgsrdp: checking element LDAP

Switching to the TNSNAMES adapter is very fast, and Oracle immediately sees that the query is unsuccessful, so it switches to the next adapter.

[04-APR-2017 09:44:44:569] nnfgrne: Switching to TNSNAMES adapter
[04-APR-2017 09:44:44:569] nnftboot: entry
[04-APR-2017 09:44:44:569] nlpaxini: entry
[04-APR-2017 09:44:44:569] nlpaxini: exit
[04-APR-2017 09:44:44:569] nnftmlf_make_local_addrfile: entry
[04-APR-2017 09:44:44:569] nnftmlf_make_local_addrfile: construction of local names file failed
[04-APR-2017 09:44:44:569] nnftmlf_make_local_addrfile: exit
[04-APR-2017 09:44:44:569] nlpaxini: entry
[04-APR-2017 09:44:44:569] nlpaxini: exit
[04-APR-2017 09:44:44:569] nnftmlf_make_system_addrfile: entry
[04-APR-2017 09:44:44:569] nnftmlf_make_system_addrfile: system names file is /u01/app/oracle/network/admin/tnsnames.ora
[04-APR-2017 09:44:44:569] nnftmlf_make_system_addrfile: exit
[04-APR-2017 09:44:44:569] nnftboot: exit
[04-APR-2017 09:44:44:569] nnftrne: entry
[04-APR-2017 09:44:44:569] nnftrne: Original name: DBIT122_LDAP
[04-APR-2017 09:44:44:569] nnfttran: entry
[04-APR-2017 09:44:44:569] nnfttran: Error querying DBIT122_LDAP of attribute A.SMD errcode 408
[04-APR-2017 09:44:44:569] nnfgrne: Query unsuccessful, skipping to next adapter

Now, Oracle is switching to the EZCONNECT adapter.

[04-APR-2017 09:44:44:569] nnfgrne: Switching to EZCONNECT adapter
[04-APR-2017 09:44:44:569] nnfhboot: entry
[04-APR-2017 09:44:44:569] nnfhboot: exit
[04-APR-2017 09:44:44:569] snlinGetAddrInfo: entry
[04-APR-2017 09:44:54:664] snlinGetAddrInfo: getaddrinfo() failed with error -2
[04-APR-2017 09:44:54:664] snlinGetAddrInfo: exit
[04-APR-2017 09:44:54:665] snlinGetAddrInfo: entry
[04-APR-2017 09:44:54:727] snlinGetAddrInfo: getaddrinfo() failed with error -2
[04-APR-2017 09:44:54:727] snlinGetAddrInfo: exit
[04-APR-2017 09:44:54:727] nnfhrne: Error forming address for DBIT122_LDAP, errcode 406
[04-APR-2017 09:44:54:727] nnfgrne: Query unsuccessful, skipping to next adapter

OK. Here we go. Between “snlinGetAddrInfo: entry” and “snlinGetAddrInfo: getaddrinfo() failed with error -2”, 10 seconds have passed. Oracle treats DBIT122_LDAP as an Easy Connect string and tries to resolve it as a host name, which fails (error -2 is EAI_NONAME on Linux, meaning the name could not be resolved).
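Finding the slow step in a long trace file boils down to looking for the largest jump between consecutive timestamps. A minimal Python sketch, assuming the `[DD-MON-YYYY HH:MM:SS:mmm]` prefix shown above and an English (C) locale for the month abbreviation; the function names are hypothetical.

```python
import re
from datetime import datetime

# Trace lines start with e.g. [04-APR-2017 09:44:44:569]; the milliseconds
# follow a colon, so they are parsed separately from the rest.
TS = re.compile(r"^\[(\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2}):(\d{3})\]")


def parse_ts(line):
    """Return the timestamp of a trace line as a datetime, or None if there is none."""
    m = TS.match(line)
    if m is None:
        return None
    # %b expects English month abbreviations (C/English locale assumed).
    base = datetime.strptime(m.group(1), "%d-%b-%Y %H:%M:%S")
    return base.replace(microsecond=int(m.group(2)) * 1000)


def largest_gap(lines):
    """Return (seconds, line_before, line_after) for the biggest jump between consecutive timestamped lines."""
    best = (0.0, None, None)
    prev_ts = prev_line = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if gap > best[0]:
                best = (gap, prev_line, line)
        prev_ts, prev_line = ts, line
    return best


trace = [
    "[04-APR-2017 09:44:44:569] nnfhboot: exit",
    "[04-APR-2017 09:44:44:569] snlinGetAddrInfo: entry",
    "[04-APR-2017 09:44:54:664] snlinGetAddrInfo: getaddrinfo() failed with error -2",
    "[04-APR-2017 09:44:54:664] snlinGetAddrInfo: exit",
]
gap, before, after = largest_gap(trace)
print(f"{gap:.3f} s between:\n  {before}\n  {after}")
```

On a full trace file, pass the open file object instead of the list; lines without a timestamp are skipped.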

So I need to change the order of the entries in the directory path in the sqlnet.ora file to NAMES.DIRECTORY_PATH=(TNSNAMES,LDAP,EZCONNECT). After doing that, tnsping comes back successfully and very fast.

oracle@dbidg02:/u01/app/oracle/network/admin/ [DBIT122] time tnsping DBIT122_LDAP

TNS Ping Utility for Linux: Version 12.2.0.1.0 - Production on 04-APR-2017 10:25:39

Copyright (c) 1997, 2016, Oracle.  All rights reserved.

Used parameter files:
/u01/app/oracle/network/admin/sqlnet.ora

Used LDAP adapter to resolve the alias
Attempting to contact (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=dbidg01)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=DBIT122)))
OK (0 msec)

real    0m0.018s
user    0m0.007s
sys     0m0.006s
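Because every adapter listed before LDAP is tried first, it is worth checking the order in sqlnet.ora wherever LDAP aliases are used. A minimal Python sketch, assuming a single `NAMES.DIRECTORY_PATH=(...)` entry as in this example; the helper names are hypothetical, this is not an Oracle tool, and the "old" value below is reconstructed from the order reported in the trace.

```python
import re


def directory_path(sqlnet_text):
    """Return the naming adapters of NAMES.DIRECTORY_PATH in order (upper-case)."""
    m = re.search(r"^\s*NAMES\.DIRECTORY_PATH\s*=\s*\(([^)]*)\)",
                  sqlnet_text, re.IGNORECASE | re.MULTILINE)
    if m is None:
        return []
    return [a.strip().upper() for a in m.group(1).split(",") if a.strip()]


def ezconnect_before_ldap(sqlnet_text):
    """True if EZCONNECT is tried before LDAP, so LDAP aliases first go through
    a (possibly slow) failing host-name lookup, as in the trace above."""
    path = directory_path(sqlnet_text)
    return ("EZCONNECT" in path and "LDAP" in path
            and path.index("EZCONNECT") < path.index("LDAP"))


old_cfg = "NAMES.DIRECTORY_PATH=(TNSNAMES,EZCONNECT,LDAP)\n"
new_cfg = "NAMES.DIRECTORY_PATH= (TNSNAMES,LDAP,EZCONNECT)\n"
print(directory_path(old_cfg), ezconnect_before_ldap(old_cfg))
print(directory_path(new_cfg), ezconnect_before_ldap(new_cfg))
```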
Conclusion

It is not always the OUD server's fault when you hit performance issues. The problem might be on the client side as well, and it can have a severe impact.

 

The article OUD – Oracle Unified Directory 11.1.2.3 Tuning, It is not always the servers fault first appeared on Blog dbi services.

Can I do it with PostgreSQL? – 14 – optimizer hints

Tue, 2017-04-04 01:23

This is a question that comes up quite often: how can I use optimizer hints in PostgreSQL as I can in Oracle? Well, you can't, and these are the reasons:

  • Poor application code maintainability: hints in queries require massive refactoring.
  • Interference with upgrades: today’s helpful hints become anti-performance after an upgrade.
  • Encouraging bad DBA habits: slap a hint on instead of figuring out the real issue.
  • Does not scale with data size: the hint that’s right when a table is small is likely to be wrong when it gets larger.
  • Failure to actually improve query performance: most of the time, the optimizer is actually right.
  • Interfering with improving the query planner: people who use hints seldom report the query problem to the project.


But this does not mean that you can't influence the optimizer (or “planner” in PostgreSQL wording); it just does not work in the same way. Let's have a look.

One of the reasons the planner may not choose an index over a sequential scan is that the parameter effective_cache_size is not set properly. To understand what it does, you have to know that PostgreSQL works very well together with the operating system file cache/disk cache. Unlike in Oracle, it is not required to give most of the available memory of the server to the database. Usually you start with 25% of the total available memory and give that to PostgreSQL by setting the parameter shared_buffers to that value. When pages fall out of that region, it is still likely that they are available in the disk cache and can be retrieved from there without going down to disk. And this is what effective_cache_size is about: setting this parameter does not consume more memory, but tells PostgreSQL how big the total cache of the system really is, so shared_buffers plus disk cache. The planner takes this into consideration. A good starting point is 50 to 75% of the available memory. Let's do a quick test to show how this behaves. Let's generate some data:

postgres=# \! cat a.sql
drop table if exists t1;
create table t1 ( a int );
with generator as 
 ( select a.*
     from generate_series ( 1, 5000000 ) a
    order by random()
 )
insert into t1 ( a ) 
     select a
       from generator;
create index i1 on t1(a);
analyze verbose t1;
select * from pg_size_pretty ( pg_relation_size ('t1' ));
select * from pg_size_pretty ( pg_total_relation_size('t1'));
postgres=# \i a.sql
DROP TABLE
CREATE TABLE
INSERT 0 5000000
CREATE INDEX
psql:a.sql:12: INFO:  analyzing "public.t1"
psql:a.sql:12: INFO:  "t1": scanned 22124 of 22124 pages, containing 5000000 live rows and 0 dead rows; 30000 rows in sample, 5000000 estimated total rows
ANALYZE
 pg_size_pretty 
----------------
 173 MB
(1 row)
 pg_size_pretty 
----------------
 280 MB
(1 row)
postgres=# show shared_buffers ;
 shared_buffers 
----------------
 128MB
(1 row)

The table alone (173MB) is too big to fit into shared_buffers (128MB), and including the index (280MB) it is of course even bigger. When we set effective_cache_size to a very low value (almost no disk cache), we get a cost of 40.55 for the statement below:

postgres=# SET effective_cache_size TO '1 MB';
SET
postgres=# explain SELECT * FROM t1 ORDER BY  a limit 10;
                                     QUERY PLAN                                      
-------------------------------------------------------------------------------------
 Limit  (cost=0.43..40.55 rows=10 width=4)
   ->  Index Only Scan using i1 on t1  (cost=0.43..20057243.41 rows=5000000 width=4)
(2 rows)

Setting this to a more realistic value decreases the costs because it is expected to find the index in the disk cache:

postgres=# SET effective_cache_size TO '5 GB';
SET
postgres=# explain SELECT * FROM t1 ORDER BY  a limit 10;
                                    QUERY PLAN                                     
-----------------------------------------------------------------------------------
 Limit  (cost=0.43..0.87 rows=10 width=4)
   ->  Index Only Scan using i1 on t1  (cost=0.43..218347.46 rows=5000000 width=4)
(2 rows)
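The rules of thumb mentioned earlier (shared_buffers at about 25% of RAM, effective_cache_size at 50 to 75%) are easy to turn into starting values. A minimal Python sketch; the 8 GB server is a hypothetical example, and the percentages are only the starting points from the text, not tuned values.

```python
def postgres_memory_starting_points(total_ram_mb):
    """Starting values in MB from the rules of thumb in the text:
    shared_buffers ~25% of RAM, effective_cache_size 50-75% of RAM."""
    return {
        "shared_buffers": total_ram_mb * 25 // 100,
        "effective_cache_size_min": total_ram_mb * 50 // 100,
        "effective_cache_size_max": total_ram_mb * 75 // 100,
    }


# Hypothetical server with 8 GB of RAM.
s = postgres_memory_starting_points(8192)
print(f"shared_buffers = '{s['shared_buffers']}MB'")
print(f"effective_cache_size = '{s['effective_cache_size_max']}MB'"
      f"  # anywhere from {s['effective_cache_size_min']}MB up")
```

Keep in mind the asymmetry described above: shared_buffers really allocates memory (and needs a restart to change), while effective_cache_size is only an estimate given to the planner and can even be changed per session, as in the test.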

This is the first “hint” you can set to influence the optimizer/planner. But there are many others. What PostgreSQL allows you to do is to enable or disable features of the planner:

postgres=# select name from pg_settings where name like 'enable%';
         name         
----------------------
 enable_bitmapscan
 enable_hashagg
 enable_hashjoin
 enable_indexonlyscan
 enable_indexscan
 enable_material
 enable_mergejoin
 enable_nestloop
 enable_seqscan
 enable_sort
 enable_tidscan

Using the same data from above we could disable the index only scan:

postgres=# set enable_indexonlyscan=false;
SET
postgres=# explain (analyze,buffers) SELECT * FROM t1 ORDER BY  a limit 10;
                                                       QUERY PLAN                                                        
-------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..0.87 rows=10 width=4) (actual time=0.019..0.058 rows=10 loops=1)
   Buffers: shared hit=13
   ->  Index Scan using i1 on t1  (cost=0.43..218347.46 rows=5000000 width=4) (actual time=0.017..0.036 rows=10 loops=1)
         Buffers: shared hit=13
 Planning time: 0.057 ms
 Execution time: 0.084 ms
(6 rows)

postgres=# set enable_indexonlyscan=true;
SET
postgres=# explain (analyze,buffers) SELECT * FROM t1 ORDER BY  a limit 10;
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..0.87 rows=10 width=4) (actual time=0.025..0.072 rows=10 loops=1)
   Buffers: shared hit=13
   ->  Index Only Scan using i1 on t1  (cost=0.43..218347.46 rows=5000000 width=4) (actual time=0.023..0.048 rows=10 loops=1)
         Heap Fetches: 10
         Buffers: shared hit=13
 Planning time: 0.068 ms
 Execution time: 0.105 ms
(7 rows)

But the documentation clearly states: “If the default plan chosen by the optimizer for a particular query is not optimal, a temporary solution is to use one of these configuration parameters to force the optimizer to choose a different plan”. For testing and troubleshooting this can be handy.

Another way to influence the optimizer/planner is to set the planner cost constants:

postgres=# select name from pg_settings where name like '%cost%' and name not like '%vacuum%';
         name         
----------------------
 cpu_index_tuple_cost
 cpu_operator_cost
 cpu_tuple_cost
 parallel_setup_cost
 parallel_tuple_cost
 random_page_cost
 seq_page_cost
(7 rows)

What they mean is pretty well documented and how you need to set them (if you need to change them at all) depends on your hardware and application. There are others as well, such as the *collapse_limit* parameters and the parameters for the Genetic Query Optimizer.
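To see how the cost constants enter the arithmetic: the PostgreSQL documentation gives the cost of a plain sequential scan without a WHERE clause as (pages read × seq_page_cost) + (rows scanned × cpu_tuple_cost), with defaults of 1.0 and 0.01. A minimal Python sketch applying that formula to t1 from the test above (22124 pages and 5000000 rows, as reported by ANALYZE VERBOSE); it ignores cpu_operator_cost, which is added per evaluated WHERE-clause operator, and the doubled seq_page_cost only illustrates sensitivity, it is not a recommendation.

```python
# Default PostgreSQL planner cost constants.
SEQ_PAGE_COST = 1.0
CPU_TUPLE_COST = 0.01


def seq_scan_cost(pages, rows, seq_page_cost=SEQ_PAGE_COST, cpu_tuple_cost=CPU_TUPLE_COST):
    """Estimated total cost of a sequential scan without a WHERE clause."""
    return pages * seq_page_cost + rows * cpu_tuple_cost


# t1 from the test above: 22124 pages, 5000000 rows.
print(seq_scan_cost(22124, 5_000_000))                     # 72124.0
print(seq_scan_cost(22124, 5_000_000, seq_page_cost=2.0))  # 94248.0
```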

Conclusion: there are several ways to influence the optimizer/planner in PostgreSQL; it is just not done with hints.

 

The article Can I do it with PostgreSQL? – 14 – optimizer hints first appeared on Blog dbi services.

12cR2 DBCA, Automatic Memory Management, and -databaseType

Mon, 2017-04-03 15:52

This post explains the following error encountered when creating a 12.2 database with DBCA:
[DBT-11211] The Automatic Memory Management option is not allowed when the total physical memory is greater than 4GB.
or when creating the database directly with the installer:
[INS-35178]The Automatic Memory Management option is not allowed when the total physical memory is greater than 4GB.
If you used Automatic Memory Management (AMM) you will have to think differently and size the SGA and PGA separately.

ASMM

Automatic Shared Memory Management, or ASMM, is what you get when setting SGA_TARGET and not setting MEMORY_TARGET. Basically, you define the size of the SGA you want to allocate at startup, and that will be available for the instance, most of it being buffer cache and shared pool. I won't go into the details of SGA_TARGET and SGA_MAX_SIZE because, on the most common platforms, all of it is allocated at instance startup. Then, in addition to this shared area used by all instance processes, each process can allocate private memory, and you control this with PGA_AGGREGATE_TARGET.
The total size of SGA and PGA for all instances in a system must reside in physical memory for the simple reason that they are mostly used to avoid I/O (a large buffer cache avoids physical reads and optimizes physical writes, a large PGA avoids reads and writes to tempfiles).

AMM

Because you don't always know how much to allocate to each (SGA and PGA), Oracle came up with a feature where you define the whole MEMORY_TARGET, and part of it is dynamically allocated to the SGA or the PGA. This is called Automatic Memory Management (AMM). It's a good idea on paper: it is automatic, which means that you don't have to think about it, and it is dynamic, which means that you don't waste physical memory because of bad sizing.

But it is actually a bad idea when going to implementation, at least on the most common platforms.
SGA and PGA are different beasts that should not be put in the same cage:

  • SGA is big, static, shared, allocated once at startup
  • PGA is small chunks constantly allocated and deallocated, private to processes

First, it is not so easy, because you have to size /dev/shm correctly or you will get the following at startup:
ORA-00845: MEMORY_TARGET not supported on this system
In addition to that, because the whole memory is prepared to contain the whole SGA you see misleading numbers in ‘show sga’.

Second, there are a lot of bugs, resizing overhead, etc.

And finally, you cannot use large pages with AMM, and on modern systems (lots of RAM, lots of processes), having all processes map the SGA with small 4k pages is a big overhead.

So, as long as you have more than a few GB on a system, you should avoid AMM and set SGA_TARGET and PGA_AGGREGATE_TARGET independently. Forget MEMORY_TARGET. Forget /dev/shm. Forget also the documentation at http://docs.oracle.com/database/122/ADMIN/managing-memory.htm#ADMIN00207, which says that Oracle recommends enabling the method known as automatic memory management.
Actually, AMM is not recommended for systems with more than a few GB of physical memory, and most systems have more than a few GB of physical memory. If you try to use AMM on a system with more than 4GB, you get a warning in 12cR1, and it is an error in 12cR2.
I got this when trying to create a database with AMM on a system with more than 4GB of physical memory.

This does not depend on the size of MEMORY_TARGET you choose, or on the size of /dev/shm, but only on the size of the available physical memory:
[oracle@VM104 ~]$ df -h /dev/shm
Filesystem Size Used Avail Use% Mounted on
tmpfs 3.9G 0 3.9G 0% /dev/shm
 
[oracle@VM104 ~]$ free -h
total used free shared buff/cache available
Mem: 7.8G 755M 5.0G 776M 2.1G 6.2G
Swap: 411M 0B 411M
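Since the check depends only on physical memory, you can see in advance on which side of the limit a Linux server falls by reading MemTotal from /proc/meminfo. A minimal Python sketch of that rule; it assumes that 4GB means 4 GiB, the function names are hypothetical, and DBCA's actual implementation is not shown in this post, so treat this as an illustration only.

```python
def mem_total_kb(meminfo_text):
    """Return MemTotal (in kB) from the content of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def amm_allowed(meminfo_text, limit_gib=4):
    """Illustration of the rule above: AMM is refused when physical memory is
    greater than the limit (4GB assumed to mean 4 GiB, i.e. 4 * 1024 * 1024 kB)."""
    return mem_total_kb(meminfo_text) <= limit_gib * 1024 * 1024


# On Linux, check the real values:
# with open("/proc/meminfo") as f:
#     print(amm_allowed(f.read()))
```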

No choice: it is a hard stop

If you are not convinced, then please have a look at MOS Doc ID 2244817.1 which explains this decision:

  • It is not something new: DBCA used to give similar warning message but in 12.2.0.1 it is an error message
  • Reason behind: Because database creation fails some times and in some cases database wont be functional after some times

So, do you want to create a database which may not be functional after some times?

So, what size for SGA and PGA?

Then, if you thought AMM was cool, your next question now is: what size should I allocate to the SGA and PGA?

Don’t panic.

You are in this situation because you have several GB of RAM. Current servers have a lot of memory. You don't have to size it to the nearest 100MB. Start with some values and run with them. Look at the performance and the memory advisors. Are you doing too much physical I/O on tables where you expect data to be in cache? Then increase the SGA, and maybe set a minimum for the buffer cache. Do you see a lot of hard parses because your application runs a lot of statements and procedures? Then increase the SGA and maybe set a minimum for the shared pool. Do you run a lot of analytic queries that full scan tables and have to hash and sort huge amounts of data? Then decrease the SGA and increase PGA_AGGREGATE_TARGET.

Where to start?

If you don’t know where to start, look at the DBCA database types:

#-----------------------------------------------------------------------------
# Name : databaseType
# Datatype : String
# Description : used for memory distribution when memoryPercentage specified
# Valid values : MULTIPURPOSE|DATA_WAREHOUSING|OLTP
# Default value : MULTIPURPOSE
# Mandatory : NO
#-----------------------------------------------------------------------------

Those types define the ratio between SGA and PGA. Then why not start with what is recommended by Oracle?

I’ve created the 3 types of instances with the following:
dbca -silent -totalMemory 10000 -databaseType MULTIPURPOSE -generateScripts -scriptDest /tmp/MULT ...
dbca -silent -totalMemory 10000 -databaseType DATA_WAREHOUSING -generateScripts -scriptDest /tmp/DWHG ...
dbca -silent -totalMemory 10000 -databaseType OLTP -generateScripts -scriptDest /tmp/OLTP ...

And here are the settings generated by DBCA
$ grep target /tmp/*/init.ora
DWHG/init.ora:sga_target=6000m
DWHG/init.ora:pga_aggregate_target=4000m
MULT/init.ora:sga_target=7500m
MULT/init.ora:pga_aggregate_target=2500m
OLTP/init.ora:sga_target=8000m
OLTP/init.ora:pga_aggregate_target=2000m

Here is the summary:

                   SGA   PGA
OLTP               80%   20%
Multi-Purpose      75%   25%
Data Warehousing   60%   40%

(Here, the percentages are relative to each other. Don't use 100% of the physical memory for the Oracle instances, because the system needs some memory as well.)
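The ratios can be turned into starting values for your own memory budget. A minimal Python sketch that reproduces the split DBCA generated above for -totalMemory 10000; the dictionary simply encodes the observed ratios and is not DBCA's code. Pass only the memory you intend to give to the instance, not all of the physical memory.

```python
# SGA/PGA percentages observed in the init.ora files generated by DBCA above.
RATIOS = {
    "OLTP": (80, 20),
    "MULTIPURPOSE": (75, 25),
    "DATA_WAREHOUSING": (60, 40),
}


def split_memory(total_mb, database_type="MULTIPURPOSE"):
    """Return (sga_target_mb, pga_aggregate_target_mb) for the memory given to the instance."""
    sga_pct, pga_pct = RATIOS[database_type]
    return total_mb * sga_pct // 100, total_mb * pga_pct // 100


for db_type in RATIOS:
    sga, pga = split_memory(10000, db_type)
    print(f"{db_type}: sga_target={sga}m pga_aggregate_target={pga}m")
```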

This gives an idea of where to start. Servers have a lot of memory, but you don't have to use all of it. If in doubt, leave some memory free for the filesystem cache. Usually, we recommend using direct I/O (filesystemio_options=setall) to avoid the filesystem overhead. But when you start and want to lower the risk of undersizing the SGA or PGA, you may prefer to keep that second level of cache (filesystemio_options=async), which uses all the physical memory available. This may improve the reads from tempfiles in case your PGA is too small. This is just an idea, not a recommendation.

So what?

If you have a server with more than a few GB, then set the SGA and PGA separately. Start with the ratios above, and then monitor performance and advisors. Physical servers today have at least 32GB. Even with a small VM with 1GB for my labs, I prefer to set them separately, because in that case I want to be sure to have a minimum size for the buffer cache and shared pool. You may have a lot of small VMs with 3GB and think about setting MEMORY_TARGET. But using large pages is recommended here, because the hypervisor will have a lot of memory to map, so ASMM is still the recommendation.

Once you know the total size of all SGAs, look at Hugepagesize in /proc/meminfo, set the number of huge pages (vm.nr_hugepages) in /etc/sysctl.conf, run sysctl -p, and your instances will use the available large pages for the SGA.
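The number of huge pages is the total SGA size divided by Hugepagesize, rounded up. A minimal Python sketch, assuming Hugepagesize is reported in kB in /proc/meminfo (2048 kB on typical x86_64 systems) and two hypothetical instances with sga_target=7500m and 8000m; in practice you may want to add a small margin on top.

```python
def hugepage_size_kb(meminfo_text):
    """Return Hugepagesize (in kB) from the content of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("Hugepagesize:"):
            return int(line.split()[1])
    raise ValueError("Hugepagesize not found")


def nr_hugepages(sga_sizes_mb, page_kb):
    """Huge pages needed to hold all SGAs, rounded up to a whole page."""
    total_kb = sum(sga_sizes_mb) * 1024
    return (total_kb + page_kb - 1) // page_kb


meminfo = "MemTotal:       32780512 kB\nHugepagesize:       2048 kB\n"  # hypothetical excerpt
pages = nr_hugepages([7500, 8000], hugepage_size_kb(meminfo))
print(f"vm.nr_hugepages = {pages}")  # line for /etc/sysctl.conf
```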

 

The article 12cR2 DBCA, Automatic Memory Management, and -databaseType first appeared on Blog dbi services.

When automatic reoptimization plan is less efficient

Sun, 2017-04-02 05:05

11gR2 started to have the optimizer react at execution time when a misestimate is encountered. The next executions are then re-optimized with more accurate estimations, derived from the execution statistics. This was called cardinality feedback. Unfortunately, in rare cases we had a fast execution plan with bad estimations, and better estimations led to a worse execution plan. This is rare, but even when 9999 queries are faster, the one that takes too long gives a bad perception of this optimizer feature.
This feature was improved in 12cR1 under new names: auto-reoptimization and statistics feedback. I'm showing an example here in 12.1.0.2 without adaptive statistics (the 12.2 backport), and I've also disabled adaptive plans because they show the wrong numbers (similar to what I described in this post). I'll show that at some point, the re-optimization can go back to the initial plan if it was the best in execution time.

V$SQL

Basically, here is what happened: the first execution was fast, but with actual numbers of rows far from the estimated ones. Auto-reoptimization kicks in for the next execution and gets a new plan, but with a longer execution time. The third execution is another re-optimization, leading to the same bad plan. Finally, starting at the 4th execution, the time is back to reasonable, and we see that the same plan as the first one is used:

SQL> select sql_id,child_number,plan_hash_value,is_reoptimizable,is_resolved_adaptive_plan,parse_calls,executions,elapsed_time/1e6
from v$sql where sql_id='b4rhzfw7d6vdp';
 
SQL_ID CHILD_NUMBER PLAN_HASH_VALUE I I PARSE_CALLS EXECUTIONS ELAPSED_TIME/1E6
------------- ------------ --------------- - - ----------- ---------- ----------------
b4rhzfw7d6vdp 0 1894156093 Y 1 1 .346571
b4rhzfw7d6vdp 1 955499861 Y 1 1 5.173733
b4rhzfw7d6vdp 2 955499861 Y 1 1 4.772258
b4rhzfw7d6vdp 3 1894156093 N 7 7 .5008
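ELAPSED_TIME in V$SQL is cumulative over all executions of a child cursor (the query above divides it by 1e6 to convert microseconds to seconds), so comparing the children requires dividing by EXECUTIONS. A minimal Python sketch using the numbers from the output above.

```python
# (child_number, plan_hash_value, executions, elapsed seconds) from the V$SQL output above.
CHILDREN = [
    (0, 1894156093, 1, 0.346571),
    (1, 955499861, 1, 5.173733),
    (2, 955499861, 1, 4.772258),
    (3, 1894156093, 7, 0.5008),
]


def per_execution(children):
    """Map child_number -> average elapsed seconds per execution."""
    return {child: elapsed / execs for child, _plan, execs, elapsed in children}


for child, avg in per_execution(CHILDREN).items():
    print(f"child {child}: {avg:.3f} s per execution")
```

Child 3, with the same plan hash value as child 0, averages about 0.07 s per execution, versus roughly 5 s for the plan used by children 1 and 2.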

The goal of statistics feedback is not to get an optimal execution from the first execution. That requires accurate statistics, static or dynamic, and SQL Plan Directives are an attempt to get that. The goal of statistics feedback is to try to get a better plan rather than re-use one that is based on misestimates. But sometimes the better is the enemy of the good, and we have an example here in child cursors 1 and 2. The good thing is that we finally get back to an acceptable execution time, with a final plan that can be re-used without re-optimization.

What surprised me here is that the final plan has the same hash value as the initial one. Is it a coincidence that different estimations give the same plan? Or did the optimizer finally give up trying to find a better one?

V$SQL_REOPTIMIZATION_HINTS

In 12c, the statistics feedback is exposed in V$SQL_REOPTIMIZATION_HINTS.

SQL> select sql_id,child_number,hint_text,client_id,reparse from v$sql_reoptimization_hints where sql_id='b4rhzfw7d6vdp';
 
SQL_ID CHILD_NUMBER HINT_TEXT CLIENT_ID REPARSE
------------- ------------ ---------------------------------------------------------------------------------------------------- ---------- ----------
b4rhzfw7d6vdp 0 OPT_ESTIMATE (@"SEL$1" INDEX_FILTER "DM_FOLDER_R1"@"SEL$1" "D_1F0049A880000016" ROWS=1517.000000 ) 1 1
b4rhzfw7d6vdp 0 OPT_ESTIMATE (@"SEL$1" INDEX_SCAN "DM_FOLDER_R1"@"SEL$1" "D_1F0049A880000016" MIN=1517.000000 ) 1 1
b4rhzfw7d6vdp 0 OPT_ESTIMATE (@"SEL$1" TABLE "DM_FOLDER_R1"@"SEL$1" ROWS=1517.000000 ) 1 1
b4rhzfw7d6vdp 0 OPT_ESTIMATE (@"SEL$1" INDEX_FILTER "DM_SYSOBJECT_R2"@"SEL$1" "D_1F0049A880000010" MIN=3.000000 ) 1 0
b4rhzfw7d6vdp 0 OPT_ESTIMATE (@"SEL$1" INDEX_SCAN "DM_SYSOBJECT_R2"@"SEL$1" "D_1F0049A880000010" MIN=3.000000 ) 1 0
b4rhzfw7d6vdp 0 OPT_ESTIMATE (@"SEL$1" TABLE "DM_SYSOBJECT_R2"@"SEL$1" MIN=3.000000 ) 1 0
b4rhzfw7d6vdp 1 OPT_ESTIMATE (@"SEL$1" INDEX_FILTER "DM_FOLDER_R1"@"SEL$1" "D_1F0049A880000016" ROWS=1517.000000 ) 1 0
b4rhzfw7d6vdp 1 OPT_ESTIMATE (@"SEL$1" INDEX_SCAN "DM_FOLDER_R1"@"SEL$1" "D_1F0049A880000016" MIN=1517.000000 ) 1 0
b4rhzfw7d6vdp 1 OPT_ESTIMATE (@"SEL$1" TABLE "DM_FOLDER_R1"@"SEL$1" ROWS=1517.000000 ) 1 0
b4rhzfw7d6vdp 1 OPT_ESTIMATE (@"SEL$1" INDEX_FILTER "DM_SYSOBJECT_R2"@"SEL$1" "D_1F0049A880000010" MIN=3.000000 ) 1 0
b4rhzfw7d6vdp 1 OPT_ESTIMATE (@"SEL$1" INDEX_SCAN "DM_SYSOBJECT_R2"@"SEL$1" "D_1F0049A880000010" MIN=3.000000 ) 1 0
b4rhzfw7d6vdp 1 OPT_ESTIMATE (@"SEL$1" TABLE "DM_SYSOBJECT_R2"@"SEL$1" MIN=3.000000 ) 1 0
b4rhzfw7d6vdp 1 OPT_ESTIMATE (@"SEL$582FA660" QUERY_BLOCK ROWS=1491.000000 ) 1 1

Child cursor 0 was re-optimized into cursor 1 with different numbers of rows for “DM_FOLDER_R1” and “DM_SYSOBJECT_R2”.
Child cursor 1 has the same values, plus an additional row-count correction for a query block.

But we don’t see anything about cursor 2. It was re-optimizable, and was actually re-optimized into cursor 3 but no statistics corrections are displayed here.

Trace

As this is a reproducible case, I've run the same statement while tracing events 10046, 10053 and 10507 (level 512) to get all information about SQL execution, optimizer compilation and statistics feedback. For each child cursor, I'll show the execution plan with estimated and actual numbers of rows (E-Rows and A-Rows), and then some interesting lines from the trace, mainly those returned by:
grep -E "KKSMEC|^atom_hint|^@"

Child cursor 0 – plan 1894156093 – 0.34 seconds

Plan hash value: 1894156093
----------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 171 (100)| 1 |00:00:00.04 | 17679 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:00.04 | 17679 |
| 2 | NESTED LOOPS | | 1 | 1 | 116 (0)| 1491 |00:00:00.04 | 17679 |
| 3 | NESTED LOOPS | | 1 | 1 | 115 (0)| 1491 |00:00:00.04 | 17456 |
| 4 | NESTED LOOPS | | 1 | 49 | 17 (0)| 5648 |00:00:00.01 | 537 |
|* 5 | INDEX RANGE SCAN | D_1F0049A880000016 | 1 | 3 | 3 (0)| 1517 |00:00:00.01 | 13 |
| 6 | TABLE ACCESS BY INDEX ROWID BATCHED | DM_SYSOBJECT_R | 1517 | 16 | 10 (0)| 5648 |00:00:00.01 | 524 |
|* 7 | INDEX RANGE SCAN | D_1F0049A880000010 | 1517 | 71 | 2 (0)| 5648 |00:00:00.01 | 249 |
|* 8 | TABLE ACCESS BY INDEX ROWID | DM_SYSOBJECT_S | 5648 | 1 | 2 (0)| 1491 |00:00:00.03 | 16919 |
|* 9 | INDEX UNIQUE SCAN | D_1F0049A880000108 | 5648 | 1 | 1 (0)| 1491 |00:00:00.03 | 15428 |
| 10 | NESTED LOOPS SEMI | | 5648 | 2 | 25 (0)| 1491 |00:00:00.02 | 14828 |
| 11 | NESTED LOOPS | | 5648 | 7 | 18 (0)| 2981 |00:00:00.02 | 12869 |
| 12 | TABLE ACCESS BY INDEX ROWID BATCHED| DM_SYSOBJECT_R | 5648 | 71 | 4 (0)| 2981 |00:00:00.01 | 7747 |
|* 13 | INDEX RANGE SCAN | D_1F0049A880000010 | 5648 | 16 | 3 (0)| 2981 |00:00:00.01 | 6145 |
|* 14 | TABLE ACCESS BY INDEX ROWID | DM_SYSOBJECT_S | 2981 | 1 | 2 (0)| 2981 |00:00:00.01 | 5122 |
|* 15 | INDEX UNIQUE SCAN | D_1F0049A880000108 | 2981 | 1 | 1 (0)| 2981 |00:00:00.01 | 2140 |
|* 16 | INDEX UNIQUE SCAN | D_1F0049A880000145 | 2981 | 52759 | 1 (0)| 1491 |00:00:00.01 | 1959 |
|* 17 | INDEX UNIQUE SCAN | D_1F0049A880000142 | 1491 | 1 | 1 (0)| 1491 |00:00:00.01 | 223 |
----------------------------------------------------------------------------------------------------------------------------------------

Because of the low cardinality estimation on the first index range scan (E-Rows=3 where 1517 rows are actually returned), the optimizer chooses nested loops. This plan has a good execution time here because all blocks are in the buffer cache: reading 17679 blocks from the buffer cache takes only 0.04 seconds of A-Time. It would have taken much longer if those were physical I/Os.

This is a case where the optimizer detects a misestimate at execution time. Here is what is recorded in the trace:

Reparsing due to card est...
@=0x63a56820 type=3 nodeid=5 monitor=Y halias="DM_FOLDER_R1" loc="SEL$1" oname="SEL$F5BB74E1" act=1517 min=0 est=3 next=(nil)
Reparsing due to card est...
@=0x638fe2b0 type=5 nodeid=4 monitor=Y halias="" loc="SEL$F5BB74E1" onames="SEL$07BDC5B4"@"SEL$5" "SEL$2"@"SEL$5" act=5648 min=0 est=49 next=0x638fe250
Reparsing due to card est...
@=0x638fe4c0 type=5 nodeid=3 monitor=Y halias="" loc="SEL$F5BB74E1" onames="SEL$07BDC5B4"@"SEL$5" "SEL$2"@"SEL$5" "SEL$3"@"SEL$1" act=1491 min=0 est=1 next=0x638fe460
Reparsing due to card est...
@=0x638fe688 type=5 nodeid=2 monitor=Y halias="" loc="SEL$F5BB74E1" onames="SEL$07BDC5B4"@"SEL$5" "SEL$2"@"SEL$5" "SEL$3"@"SEL$1" "R_OBJECT_ID"@"SEL$1" act=1491 min=0 est=1 next=0x638fe5f8
kkocfbCheckCardEst [sql_id=b4rhzfw7d6vdp] reparse=y ecs=n efb=n ost=n fbs=n

These are the misestimates that trigger re-optimization.
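To put numbers on those misestimates, you can extract and compare the act= (actual rows) and est= (estimated rows) fields of the @= lines. This sketch relies only on the line layout shown here. The regular expression and the act/est ratio are my own reading of the trace, not a documented format, and I don't know which threshold Oracle uses to decide on reparsing:

```python
import re

# nodeid=<plan line id> ... act=<actual rows> min=<n> est=<estimated rows>
NODE_RE = re.compile(r"\bnodeid=(\d+)\b.*?\bact=(\d+) min=\d+ est=(\d+)")


def misestimates(lines):
    """Return (nodeid, actual, estimated, actual/estimated) for each '@=' line."""
    result = []
    for line in lines:
        m = NODE_RE.search(line)
        if m:
            node, act, est = (int(g) for g in m.groups())
            ratio = act / est if est else float("inf")
            result.append((node, act, est, ratio))
    return result


# Reparse lines from the trace above, abbreviated (oname/onames fields dropped)
reparse_lines = [
    '@=0x63a56820 type=3 nodeid=5 monitor=Y halias="DM_FOLDER_R1" loc="SEL$1" act=1517 min=0 est=3 next=(nil)',
    '@=0x638fe2b0 type=5 nodeid=4 monitor=Y halias="" loc="SEL$F5BB74E1" act=5648 min=0 est=49 next=0x638fe250',
    '@=0x638fe4c0 type=5 nodeid=3 monitor=Y halias="" loc="SEL$F5BB74E1" act=1491 min=0 est=1 next=0x638fe460',
]
```

For plan line 5, 1517 actual rows against 3 estimated is a factor of about 500. For plan line 4, 5648 against 49 is about 115.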

And here is all the statistics feedback:

*********** Begin Dump Context (kkocfbCheckCardEst) [sql_id=b4rhzfw7d6vdp cpcnt=0] ***********
@=0x638fe688 type=5 nodeid=2 monitor=Y halias="" loc="SEL$F5BB74E1" onames="DM_FOLDER_R1"@"SEL$1" "DM_SYSOBJECT_R2"@"SEL$1" "TE_"@"SEL$2" "LJ_"@"SEL$2" act=1491 min=0 est=1 next=0x638fe5f8
@=0x638fe5f8 type=3 nodeid=17 monitor=Y halias="LJ_" loc="SEL$2" oname="D_1F0049A880000142" act=0 min=1 est=1 next=0x638fe4c0
@=0x638fe4c0 type=5 nodeid=3 monitor=Y halias="" loc="SEL$F5BB74E1" onames="DM_FOLDER_R1"@"SEL$1" "DM_SYSOBJECT_R2"@"SEL$1" "TE_"@"SEL$2" act=1491 min=0 est=1 next=0x638fe460
@=0x638fe460 type=1 nodeid=8 monitor=Y halias="TE_" loc="SEL$2" act=0 min=1 est=1 next=0x638fe3d0
@=0x638fe3d0 type=3 nodeid=9 monitor=Y halias="TE_" loc="SEL$2" oname="D_1F0049A880000108" act=0 min=1 est=1 next=0x638fe2b0
@=0x638fe2b0 type=5 nodeid=4 monitor=Y halias="" loc="SEL$F5BB74E1" onames="DM_FOLDER_R1"@"SEL$1" "DM_SYSOBJECT_R2"@"SEL$1" act=5648 min=0 est=49 next=0x638fe250
@=0x638fe250 type=1 nodeid=6 monitor=Y halias="DM_SYSOBJECT_R2" loc="SEL$1" act=3 min=1 est=16 next=0x638fe1c0
@=0x638fe1c0 type=3 nodeid=7 monitor=Y halias="DM_SYSOBJECT_R2" loc="SEL$1" oname="D_1F0049A880000010" act=3 min=1 est=71 next=0x63a56820
@=0x63a56820 type=3 nodeid=5 monitor=Y halias="DM_FOLDER_R1" loc="SEL$1" oname="D_1F0049A880000016" act=1517 min=0 est=3 next=(nil)
*********** End Dump Context ***********

We also see some information about execution performance:

kkoarCopyCtx: [sql_id=b4rhzfw7d6vdp] origin=CFB old=0x63a565d0 new=0x7fe74e2153f0 copyCnt=1 copyClient=y
**************************************************************
kkocfbCopyBestEst: Best Stats
Exec count: 1
CR gets: 17679
CU gets: 0
Disk Reads: 0
Disk Writes: 0
IO Read Requests: 0
IO Write Requests: 0
Bytes Read: 0
Bytes Written: 0
Bytes Exchanged with Storage: 0
Bytes Exchanged with Disk: 0
Bytes Simulated Read: 0
Bytes Simulated Returned: 0
Elapsed Time: 51 (ms)
CPU Time: 51 (ms)
User I/O Time: 15 (us)
*********** Begin Dump Context (kkocfbCopyBestEst) **********
*********** End Dump Context ***********

They are labeled as ‘Best Stats’ because we had only one execution at that time.
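These statistics blocks have a simple "Name: value" layout, with the unit in parentheses for times. Here is a minimal parser. It assumes the same layout is used in the kkocfbCopyBestEst and kkocfbCompareExecStats dumps, which is only what I see in this trace:

```python
import re

# "Name: value" or "Name: value (unit)", e.g. "Elapsed Time: 51 (ms)"
STAT_RE = re.compile(r"^([A-Za-z][A-Za-z /]*?):\s+(\d+)(?:\s+\((\w+)\))?$")


def parse_exec_stats(lines):
    """Map each statistic name to (value, unit); unit is None when not shown."""
    stats = {}
    for line in lines:
        m = STAT_RE.match(line.strip())
        if m:
            name, value, unit = m.groups()
            stats[name] = (int(value), unit)
    return stats
```

Keeping the unit matters here: "User I/O Time" is reported in microseconds for the first execution but in milliseconds for the later ones.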

Finally, the hints are dumped:

******** Begin CFB Hints [sql_id=b4rhzfw7d6vdp] xsc=0x7fe74e215748 ********
Dumping Hints
=============
atom_hint=(@=0x7fe74e21ebf0 err=0 resol=0 used=0 token=1018 org=6 lvl=3 txt=OPT_ESTIMATE (@"SEL$1" TABLE "DM_SYSOBJECT_R2"@"SEL$1" MIN=3.000000 ) )
atom_hint=(@=0x7fe74e21e758 err=0 resol=0 used=0 token=1018 org=6 lvl=3 txt=OPT_ESTIMATE (@"SEL$1" INDEX_FILTER "DM_SYSOBJECT_R2"@"SEL$1" "D_1F0049A880000010" MIN=3.000000 ) )
atom_hint=(@=0x7fe74e21e3f0 err=0 resol=0 used=0 token=1018 org=6 lvl=3 txt=OPT_ESTIMATE (@"SEL$1" INDEX_SCAN "DM_SYSOBJECT_R2"@"SEL$1" "D_1F0049A880000010" MIN=3.000000 ) )
atom_hint=(@=0x7fe74e21dfd0 err=0 resol=0 used=0 token=1018 org=6 lvl=3 txt=OPT_ESTIMATE (@"SEL$1" INDEX_FILTER "DM_FOLDER_R1"@"SEL$1" "D_1F0049A880000016" ROWS=1517.000000 ) )
atom_hint=(@=0x7fe74e21dc68 err=0 resol=0 used=0 token=1018 org=6 lvl=3 txt=OPT_ESTIMATE (@"SEL$1" INDEX_SCAN "DM_FOLDER_R1"@"SEL$1" "D_1F0049A880000016" MIN=1517.000000 ) )
atom_hint=(@=0x7fe74e21d8c8 err=0 resol=0 used=0 token=1018 org=6 lvl=3 txt=OPT_ESTIMATE (@"SEL$1" TABLE "DM_FOLDER_R1"@"SEL$1" ROWS=1517.000000 ) )
********** End CFB Hints **********

These are exactly the hints we see in V$SQL_REOPTIMIZATION_HINTS.
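To compare the dumped hints from one cursor to the next, you can split the txt= payload into query block, estimate kind, target and value. This sketch only handles the ROWS= and MIN= forms present in this trace. OPT_ESTIMATE is undocumented, so the field names below are my own labels:

```python
import re

# OPT_ESTIMATE (@"<query block>" <KIND> <target...> ROWS|MIN=<value> )
HINT_RE = re.compile(
    r'OPT_ESTIMATE \(@"(?P<qb>[^"]+)" (?P<kind>\w+) (?P<target>.*?)\s*'
    r'(?P<adj>ROWS|MIN)=(?P<value>[\d.]+)'
)


def parse_opt_estimates(lines):
    """Return one dict per OPT_ESTIMATE found in 'atom_hint' trace lines."""
    hints = []
    for line in lines:
        m = HINT_RE.search(line)
        if m:
            hints.append({
                "query_block": m["qb"],
                "kind": m["kind"],        # TABLE, INDEX_SCAN, INDEX_FILTER, QUERY_BLOCK
                "target": m["target"],    # alias@qb (+ index name), empty for QUERY_BLOCK
                "adjustment": m["adj"],   # ROWS (exact) or MIN (lower bound)
                "value": float(m["value"]),
            })
    return hints
```

Applied to the hints of cursor 0 and cursor 1, this makes the new `QUERY_BLOCK ROWS=1491` entry stand out.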

That is all we see for this first execution. The next execution starts with:

KKSMEC: Invalidating old cursor 0 with hash val = 1894156093
KKSMEC: Produced New cursor 1 with hash val = 955499861

Because child cursor 0 was marked as reoptimizable, the next execution invalidates it and creates a new child cursor 1.

Child cursor 1 – new plan 955499861 – 5.17 seconds

Here is the new plan we see after that second execution:

Plan hash value: 955499861
------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 30996 (100)| 1 |00:00:04.58 | 102K| 101K| | | |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:04.58 | 102K| 101K| | | |
| 2 | VIEW | VM_NWVW_2 | 1 | 12039 | 30996 (1)| 1491 |00:00:04.58 | 102K| 101K| | | |
| 3 | HASH UNIQUE | | 1 | 12039 | 30996 (1)| 1491 |00:00:04.58 | 102K| 101K| 941K| 941K| 2597K (0)|
|* 4 | HASH JOIN RIGHT SEMI | | 1 | 12039 | 30490 (1)| 4132 |00:00:04.57 | 102K| 101K| 12M| 3867K| 14M (0)|
| 5 | TABLE ACCESS FULL | DM_DOCUMENT_S | 1 | 213K| 210 (1)| 213K|00:00:00.01 | 741 | 0 | | | |
|* 6 | HASH JOIN | | 1 | 36463 | 29665 (1)| 5622 |00:00:04.51 | 101K| 101K| 1405K| 1183K| 2026K (0)|
|* 7 | HASH JOIN | | 1 | 36463 | 18397 (1)| 5622 |00:00:02.23 | 65103 | 65050 | 940K| 940K| 1339K (0)|
|* 8 | HASH JOIN | | 1 | 2222 | 14489 (1)| 1499 |00:00:01.58 | 51413 | 51369 | 992K| 992K| 1377K (0)|
|* 9 | HASH JOIN | | 1 | 2222 | 14120 (1)| 1499 |00:00:01.46 | 50088 | 50057 | 3494K| 1598K| 4145K (0)|
|* 10 | TABLE ACCESS FULL | DM_SYSOBJECT_S | 1 | 39235 | 10003 (1)| 39235 |00:00:00.83 | 36385 | 36376 | | | |
|* 11 | HASH JOIN | | 1 | 24899 | 3920 (1)| 5648 |00:00:00.62 | 13703 | 13681 | 1199K| 1199K| 1344K (0)|
|* 12 | INDEX RANGE SCAN | D_1F0049A880000016 | 1 | 1517 | 12 (0)| 1517 |00:00:00.01 | 13 | 0 | | | |
|* 13 | TABLE ACCESS FULL| DM_SYSOBJECT_R | 1 | 646K| 3906 (1)| 646K|00:00:00.50 | 13690 | 13681 | | | |
| 14 | TABLE ACCESS FULL | DM_FOLDER_S | 1 | 431K| 367 (1)| 431K|00:00:00.04 | 1325 | 1312 | | | |
|* 15 | TABLE ACCESS FULL | DM_SYSOBJECT_R | 1 | 646K| 3906 (1)| 646K|00:00:00.51 | 13690 | 13681 | | | |
|* 16 | TABLE ACCESS FULL | DM_SYSOBJECT_S | 1 | 646K| 10000 (1)| 646K|00:00:02.14 | 36385 | 36376 | | | |
------------------------------------------------------------------------------------------------------------------------------------------------------------
Note
-----
- statistics feedback used for this statement

The note makes it clear that the estimations come from the previous run (statistics feedback). We see that E-Rows = A-Rows for all the table and index accesses, although the join estimates are still off. With those estimations, a new plan has been chosen, with complex view merging: VM_NWVW_2. You can find clues about those internal view names on Jonathan Lewis’ blog. Here, probably because the estimated number of rows is high, the subquery has been unnested: it is an ‘EXISTS’ subquery, which is transformed into a semi join and merged, with a distinct applied at the end.
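You can see why a distinct is needed with some toy data. An EXISTS keeps each outer row at most once, whereas a plain join returns an outer row once per matching inner row. This is a generic Python illustration with made-up rows, not a reconstruction of the transformation Oracle applied to this query:

```python
# Made-up rows: outer rows are (id, key), inner rows only carry a key
outer = [(1, "a"), (2, "b"), (3, "c")]
inner = ["a", "a", "b", "z"]


def exists_filter(outer, inner):
    """EXISTS semantics: keep an outer row once if at least one inner row matches."""
    return [row for row in outer if any(row[1] == key for key in inner)]


def join_rows(outer, inner):
    """Join semantics: one result row per matching (outer, inner) pair."""
    return [row for row in outer for key in inner if row[1] == key]


def join_then_distinct(outer, inner):
    """Join, then remove duplicates (similar in role to a HASH UNIQUE step)."""
    return sorted(set(join_rows(outer, inner)))
```

The join returns row 1 twice because two inner rows match it, and the distinct brings the result back to what the EXISTS returns. This is similar to the 4132 rows reduced to 1491 by the HASH UNIQUE in the plan above.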

So we have a different plan, which is supposed to be better because it has been costed with more accurate cardinalities. The goal of this post is not to detail why the execution time is longer with a ‘better’ plan. If you look at the ‘Reads’ column, you can see that the first execution read all blocks from the buffer cache, while the second one had to do physical I/O for nearly all of them. With nothing in the buffer cache, reading 101K blocks with multiblock reads may be faster than reading 17679 blocks with single-block reads, so the optimizer decision was not bad. I’ll have to check whether most of the blocks are expected to be in the buffer cache in real production life, as the behavior in UAT is different. Some people will stop here, say that cardinality feedback is bad, disable it, or even set optimizer_index_cost_adj to get the nested loops back, but things are more complex than that.
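The ‘Current’ statistics dumped in the trace for this second execution (kkocfbCompareExecStats) give enough figures to check the multiblock reads. The arithmetic below uses only those numbers. The 8 KB block size is derived from them, not read from the database:

```python
# Figures from the 'kkocfbCompareExecStats : Current' dump of child cursor 1
disk_reads = 101426        # blocks read from disk
io_read_requests = 1633    # number of read calls
bytes_read = 830881792     # total bytes read

block_size = bytes_read / disk_reads                    # bytes per block
blocks_per_request = disk_reads / io_read_requests      # average blocks per read call
kb_per_request = bytes_read / io_read_requests / 1024   # average read call size in KB

print(f"block size: {block_size:.0f} bytes")
print(f"{blocks_per_request:.1f} blocks per read call, {kb_per_request:.0f} KB on average")
```

This prints an 8192-byte block size and about 62 blocks (around 500 KB) per read call. So the 101K blocks were read in only 1633 I/O calls.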

The important thing is that the optimizer doesn’t stop there: it compares the new execution statistics with the best previous ones.

**************************************************************
kkocfbCompareExecStats : Current
Exec count: 1
CR gets: 102226
CU gets: 3
Disk Reads: 101426
Disk Writes: 0
IO Read Requests: 1633
IO Write Requests: 0
Bytes Read: 830881792
Bytes Written: 0
Bytes Exchanged with Storage: 830881792
Bytes Exchanged with Disk: 830881792
Bytes Simulated Read: 0
Bytes Simulated Returned: 0
Elapsed Time: 4586 (ms)
CPU Time: 1305 (ms)
User I/O Time: 3040 (ms)
**************************************************************
kkocfbCompareExecStats : Best
Exec count: 1
CR gets: 17679
CU gets: 0
Disk Reads: 0
Disk Writes: 0
IO Read Requests: 0
IO Write Requests: 0
Bytes Read: 0
Bytes Written: 0
Bytes Exchanged with Storage: 0
Bytes Exchanged with Disk: 0
Bytes Simulated Read: 0
Bytes Simulated Returned: 0
Elapsed Time: 51 (ms)
CPU Time: 51 (ms)
User I/O Time: 15 (us)
kkocfbCompareExecStats: improvement BG: 0.172935 CPU: 0.039555

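The first execution, with ‘bad’ statistics, is still the best one. This new execution has an ‘improvement’ of 0.17 on BG (presumably buffer gets). That means it did almost 6 times more logical reads than the best one (about 102K vs. 17679), and its elapsed time was about 90 times longer (4586 ms vs. 51 ms).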
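You can reproduce the BG value from the dumped figures if you take BG as the buffer gets (CR plus CU) of the best execution divided by those of the current one. That reading is my assumption, but it matches both values found in the trace to six decimals. The CPU ratio does not match the rounded millisecond values, so it is probably computed on more precise internal figures:

```python
def bg_improvement(best, current):
    """Assumed formula: buffer gets (CR + CU) of the best execution / current execution."""
    best_gets = best["CR gets"] + best["CU gets"]
    current_gets = current["CR gets"] + current["CU gets"]
    return best_gets / current_gets


best = {"CR gets": 17679, "CU gets": 0}        # child cursor 0
child1 = {"CR gets": 102226, "CU gets": 3}     # second execution
child2 = {"CR gets": 102224, "CU gets": 3}     # third execution

for name, current in (("child 1", child1), ("child 2", child2)):
    ratio = bg_improvement(best, current)
    print(f"{name}: improvement {ratio:.6f}, {1 / ratio:.1f}x more buffer gets")
```

This prints 0.172935 for child 1 and 0.172939 for child 2, the two values reported by kkocfbCompareExecStats.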

Then in the trace we see again that re-optimisation (reparsing) is considered:

Reparsing due to card est...
@=0x6a368338 type=5 nodeid=11 monitor=Y halias="" loc="SEL$582FA660" onames="SEL$608EC1F7"@"SEL$582FA660" "SEL$04458B50"@"SEL$582FA660" act=5648 min=0 est=24899 next=0x6a3682d8
Reparsing due to card est...
@=0x6a3687b0 type=5 nodeid=7 monitor=Y halias="" loc="SEL$582FA660" onames="SEL$608EC1F7"@"SEL$582FA660" "SEL$04458B50"@"SEL$582FA660" "SEL$FB0FE72C"@"SEL$33802F1B" "SEL$5"@"SEL$33802F1B" "SEL$07BDC5B4"@"SEL$636B5685" act=5622 min=0 est=36463 next=0x6a368750
Reparsing due to card est...
@=0x6a368990 type=5 nodeid=6 monitor=Y halias="" loc="SEL$582FA660" onames="SEL$608EC1F7"@"SEL$582FA660" "SEL$04458B50"@"SEL$582FA660" "SEL$FB0FE72C"@"SEL$33802F1B" "SEL$5"@"SEL$33802F1B" "SEL$07BDC5B4"@"SEL$636B5685" "SEL$FB0FE72C"@"SEL$4" act=5622 min=0 est=36463 next=0x6a368930
Reparsing due to card est...
@=0x6a368b90 type=5 nodeid=4 monitor=Y halias="" loc="SEL$582FA660" onames="SEL$608EC1F7"@"SEL$582FA660" "SEL$04458B50"@"SEL$582FA660" "SEL$FB0FE72C"@"SEL$33802F1B" "SEL$5"@"SEL$33802F1B" "SEL$07BDC5B4"@"SEL$636B5685" "SEL$FB0FE72C"@"SEL$4" "SEL$F5BB74E1"@"SEL$4" act=4132 min=0 est=12039 next=0x6a368b30
Reparsing due to card est...
@=0x6a368d60 type=4 nodeid=3 monitor=Y halias="" loc="SEL$582FA660" act=1491 min=0 est=12039 next=0x6a368b90

An additional OPT_ESTIMATE hint is generated for the complex view merging query block (QUERY_BLOCK ROWS=1491):

atom_hint=(@=0x7fe74e21eb90 err=0 resol=0 used=0 token=1018 org=6 lvl=3 txt=OPT_ESTIMATE (@"SEL$1" INDEX_FILTER "DM_FOLDER_R1"@"SEL$1" "D_1F0049A880000016" ROWS=1517.000000 ) )
atom_hint=(@=0x7fe74e21e7b0 err=0 resol=0 used=0 token=1018 org=6 lvl=3 txt=OPT_ESTIMATE (@"SEL$1" INDEX_SCAN "DM_FOLDER_R1"@"SEL$1" "D_1F0049A880000016" MIN=1517.000000 ) )
atom_hint=(@=0x7fe74e21e470 err=0 resol=0 used=0 token=1018 org=6 lvl=3 txt=OPT_ESTIMATE (@"SEL$1" TABLE "DM_FOLDER_R1"@"SEL$1" ROWS=1517.000000 ) )
atom_hint=(@=0x7fe74e21e050 err=0 resol=0 used=0 token=1018 org=6 lvl=3 txt=OPT_ESTIMATE (@"SEL$1" INDEX_FILTER "DM_SYSOBJECT_R2"@"SEL$1" "D_1F0049A880000010" MIN=3.000000 ) )
atom_hint=(@=0x7fe74e21dce8 err=0 resol=0 used=0 token=1018 org=6 lvl=3 txt=OPT_ESTIMATE (@"SEL$1" INDEX_SCAN "DM_SYSOBJECT_R2"@"SEL$1" "D_1F0049A880000010" MIN=3.000000 ) )
atom_hint=(@=0x7fe74e21da38 err=0 resol=0 used=0 token=1018 org=6 lvl=2 txt=OPT_ESTIMATE (@"SEL$582FA660" QUERY_BLOCK ROWS=1491.000000 ) )
atom_hint=(@=0x7fe74e21d600 err=0 resol=0 used=0 token=1018 org=6 lvl=3 txt=OPT_ESTIMATE (@"SEL$1" TABLE "DM_SYSOBJECT_R2"@"SEL$1" MIN=3.000000 ) )

With this new cardinality estimation, the next execution tries to find a better plan, but the optimizer’s choice does not change, and the new child cursor gets the same execution plan:
KKSMEC: Invalidating old cursor 1 with hash val = 955499861
KKSMEC: Produced New cursor 2 with hash val = 955499861

Child cursor 2 – plan 955499861 again – 4.77 seconds

This is the third execution:

Plan hash value: 955499861
------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 30996 (100)| 1 |00:00:04.19 | 102K| 101K| | | |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:04.19 | 102K| 101K| | | |
| 2 | VIEW | VM_NWVW_2 | 1 | 1491 | 30996 (1)| 1491 |00:00:04.19 | 102K| 101K| | | |
| 3 | HASH UNIQUE | | 1 | 1491 | 30996 (1)| 1491 |00:00:04.19 | 102K| 101K| 941K| 941K| 1355K (0)|
|* 4 | HASH JOIN RIGHT SEMI | | 1 | 12039 | 30490 (1)| 4132 |00:00:04.19 | 102K| 101K| 12M| 3867K| 14M (0)|
| 5 | TABLE ACCESS FULL | DM_DOCUMENT_S | 1 | 213K| 210 (1)| 213K|00:00:00.01 | 740 | 0 | | | |
|* 6 | HASH JOIN | | 1 | 36463 | 29665 (1)| 5622 |00:00:04.12 | 101K| 101K| 1405K| 1183K| 2021K (0)|
|* 7 | HASH JOIN | | 1 | 36463 | 18397 (1)| 5622 |00:00:03.39 | 65102 | 65050 | 940K| 940K| 1359K (0)|
|* 8 | HASH JOIN | | 1 | 2222 | 14489 (1)| 1499 |00:00:02.94 | 51412 | 51369 | 992K| 992K| 1331K (0)|
|* 9 | HASH JOIN | | 1 | 2222 | 14120 (1)| 1499 |00:00:01.04 | 50088 | 50057 | 3494K| 1598K| 4145K (0)|
|* 10 | TABLE ACCESS FULL | DM_SYSOBJECT_S | 1 | 39235 | 10003 (1)| 39235 |00:00:00.47 | 36385 | 36376 | | | |
|* 11 | HASH JOIN | | 1 | 24899 | 3920 (1)| 5648 |00:00:00.55 | 13703 | 13681 | 1199K| 1199K| 1344K (0)|
|* 12 | INDEX RANGE SCAN | D_1F0049A880000016 | 1 | 1517 | 12 (0)| 1517 |00:00:00.01 | 13 | 0 | | | |
|* 13 | TABLE ACCESS FULL| DM_SYSOBJECT_R | 1 | 646K| 3906 (1)| 646K|00:00:00.43 | 13690 | 13681 | | | |
| 14 | TABLE ACCESS FULL | DM_FOLDER_S | 1 | 431K| 367 (1)| 431K|00:00:01.82 | 1324 | 1312 | | | |
|* 15 | TABLE ACCESS FULL | DM_SYSOBJECT_R | 1 | 646K| 3906 (1)| 646K|00:00:00.33 | 13690 | 13681 | | | |
|* 16 | TABLE ACCESS FULL | DM_SYSOBJECT_S | 1 | 646K| 10000 (1)| 646K|00:00:00.60 | 36385 | 36376 | | | |
------------------------------------------------------------------------------------------------------------------------------------------------------------
Note
-----
- statistics feedback used for this statement

Same plan and about the same execution time here: the tables are large and the SGA is small, so the blocks are read from disk again.

*********** Begin Dump Context: best estimates ***********
 
**************************************************************
kkocfbCompareExecStats : Current
Exec count: 1
CR gets: 102224
CU gets: 3
Disk Reads: 101426
Disk Writes: 0
IO Read Requests: 1633
IO Write Requests: 0
Bytes Read: 830881792
Bytes Written: 0
Bytes Exchanged with Storage: 830881792
Bytes Exchanged with Disk: 830881792
Bytes Simulated Read: 0
Bytes Simulated Returned: 0
Elapsed Time: 4206 (ms)
CPU Time: 1279 (ms)
User I/O Time: 3084 (ms)
**************************************************************
kkocfbCompareExecStats : Best
Exec count: 1
CR gets: 17679
CU gets: 0
Disk Reads: 0
Disk Writes: 0
IO Read Requests: 0
IO Write Requests: 0
Bytes Read: 0
Bytes Written: 0
Bytes Exchanged with Storage: 0
Bytes Exchanged with Disk: 0
Bytes Simulated Read: 0
Bytes Simulated Returned: 0
Elapsed Time: 51 (ms)
CPU Time: 51 (ms)
User I/O Time: 15 (us)
kkocfbCompareExecStats: improvement BG: 0.172939 CPU: 0.040363

So where are we here? We had one execution based on bad estimations, then two tries based on good estimations. Because of different buffer cache behavior, both tries ended up much slower: almost 6 times more buffer gets, and 80 to 90 times longer in elapsed time. There is nothing else to try.

The good thing is that the optimizer admits it cannot do better and falls back to the execution with the best time, now considered as the best estimate:

kkocfbCheckCardEst: reparse using best estimates
...
kkocfbCopyCardCtx: No best stats found

We see no OPT_ESTIMATE hints here, which is why there was nothing in V$SQL_REOPTIMIZATION_HINTS for cursor 2. But this cursor is still marked as re-optimizable, and the next execution invalidates it:

KKSMEC: Invalidating old cursor 2 with hash val = 955499861
KKSMEC: Produced New cursor 3 with hash val = 1894156093

We are back to the original plan, which is expected because the static statistics have not changed and there is no statistics feedback this time.
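Following the child cursors through the KKSMEC lines gives the full history of this statement. Here is a small sketch that collects (child number, plan hash value) pairs from those lines. It assumes they keep the exact wording seen in this trace:

```python
import re

KKSMEC_RE = re.compile(
    r"KKSMEC: (?:Invalidating old|Produced New) cursor (\d+) with hash val = (\d+)"
)


def cursor_history(lines):
    """Return [(child_number, plan_hash_value)] in order of first appearance."""
    history = []
    for line in lines:
        m = KKSMEC_RE.search(line)
        if m:
            entry = (int(m.group(1)), int(m.group(2)))
            if entry not in history:
                history.append(entry)
    return history
```

On the six KKSMEC lines of this trace, the result is child 0 and child 3 on plan 1894156093, with children 1 and 2 on plan 955499861 in between.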

Child cursor 3 – back to plan 1894156093 – 0.5 seconds

This is the plan that is used for all subsequent executions now.

Plan hash value: 1894156093
----------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 171 (100)| 1 |00:00:00.04 | 17677 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:00.04 | 17677 |
| 2 | NESTED LOOPS | | 1 | 1 | 116 (0)| 1491 |00:00:00.04 | 17677 |
| 3 | NESTED LOOPS | | 1 | 1 | 115 (0)| 1491 |00:00:00.04 | 17454 |
| 4 | NESTED LOOPS | | 1 | 49 | 17 (0)| 5648 |00:00:00.01 | 536 |
|* 5 | INDEX RANGE SCAN | D_1F0049A880000016 | 1 | 3 | 3 (0)| 1517 |00:00:00.01 | 13 |
| 6 | TABLE ACCESS BY INDEX ROWID BATCHED | DM_SYSOBJECT_R | 1517 | 16 | 10 (0)| 5648 |00:00:00.01 | 523 |
|* 7 | INDEX RANGE SCAN | D_1F0049A880000010 | 1517 | 71 | 2 (0)| 5648 |00:00:00.01 | 249 |
|* 8 | TABLE ACCESS BY INDEX ROWID | DM_SYSOBJECT_S | 5648 | 1 | 2 (0)| 1491 |00:00:00.03 | 16918 |
|* 9 | INDEX UNIQUE SCAN | D_1F0049A880000108 | 5648 | 1 | 1 (0)| 1491 |00:00:00.03 | 15427 |
| 10 | NESTED LOOPS SEMI | | 5648 | 2 | 25 (0)| 1491 |00:00:00.02 | 14827 |
| 11 | NESTED LOOPS | | 5648 | 7 | 18 (0)| 2981 |00:00:00.02 | 12868 |
| 12 | TABLE ACCESS BY INDEX ROWID BATCHED| DM_SYSOBJECT_R | 5648 | 71 | 4 (0)| 2981 |00:00:00.01 | 7747 |
|* 13 | INDEX RANGE SCAN | D_1F0049A880000010 | 5648 | 16 | 3 (0)| 2981 |00:00:00.01 | 6145 |
|* 14 | TABLE ACCESS BY INDEX ROWID | DM_SYSOBJECT_S | 2981 | 1 | 2 (0)| 2981 |00:00:00.01 | 5121 |
|* 15 | INDEX UNIQUE SCAN | D_1F0049A880000108 | 2981 | 1 | 1 (0)| 2981 |00:00:00.01 | 2140 |
|* 16 | INDEX UNIQUE SCAN | D_1F0049A880000145 | 2981 | 52759 | 1 (0)| 1491 |00:00:00.01 | 1959 |
|* 17 | INDEX UNIQUE SCAN | D_1F0049A880000142 | 1491 | 1 | 1 (0)| 1491 |00:00:00.01 | 223 |
----------------------------------------------------------------------------------------------------------------------------------------

After a few tries to get a better plan, the optimizer finally switched back to the first one because it was the best in terms of response time. I don’t know exactly which execution statistics are used for this decision; elapsed time is just my guess here.
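For this particular case, the guess does not matter much: child cursor 0 wins whether the comparison is made on elapsed time or on buffer gets. Here is a toy check with the figures dumped in the trace. The selection function is hypothetical, not Oracle’s algorithm:

```python
# (plan_hash_value, elapsed_ms, buffer_gets = CR + CU) per execution, from the kkocfb* dumps
executions = [
    (1894156093, 51, 17679),      # child 0
    (955499861, 4586, 102229),    # child 1
    (955499861, 4206, 102227),    # child 2
]


def best_plan(executions, metric):
    """Hypothetical selection: plan of the execution with the lowest value of metric."""
    index = {"elapsed": 1, "buffer_gets": 2}[metric]
    return min(executions, key=lambda e: e[index])[0]
```

Both `best_plan(executions, "elapsed")` and `best_plan(executions, "buffer_gets")` return 1894156093, the plan of child cursor 3.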

The interesting point here is to understand that you can see a reoptimized cursor without statistics feedback:

  • No rows for the previous cursor in V$SQL_REOPTIMIZATION_HINTS
  • No ‘statistics feedback’ note in the new cursor plan
  • Difference between E-Rows and A-Rows in the new plan

So what?

SQL optimization is a complex task, and there is no such thing as an execution being simply ‘fast’ or ‘slow’, an execution plan being ‘good’ or ‘bad’, or an optimizer decision being ‘right’ or ‘wrong’. What is fast after several similar executions can be slow on a busy system because fewer blocks remain in cache. What is slow at a time when the storage is busy may be fast at another time of the day. What is fast with one single user may raise more concurrency contention on a busy system. Cardinality feedback is a reactive attempt to improve an execution plan. On average, things go better with it, but it is not abnormal that a few cases go wrong for a few executions. You can’t blame the optimizer for that, and hasty conclusions or optimizer parameter tweaking are not sustainable solutions. And don’t forget that if your data model is well designed, the critical queries should have one clear optimal access path, which will not depend on a small difference in the estimated number of rows.

The only thing I can always conclude when I see cardinality feedback going wrong is that there is something to fix in the design of the data model, the statistics gathering and/or the query design. When statistics feedback gives a worse execution plan, it is the consequence of a combination of:

  • mis-estimation of cardinalities: bad, insufficient, or stale statistics
  • mis-estimation of response time: bad system statistics, untypical memory sizing, unrepresentative execution context
  • no clear optimal access path: sub-optimal indexing, lack of partitioning,…

It is a good thing to have the auto-reoptimization coming back to the initial plan when nothing better has been observed. I would love to have more control over it: for example, a hint that sets an execution time threshold below which the optimizer should not try to find a better plan. I filed this idea at https://community.oracle.com/ideas/17514 and you can vote for it.

Update 2-APR-2017

I was not clear in this post, but this is the first time I have observed this behavior (multiple reoptimizations and then a return to the original plan), so I’m not sure about the reasons and the different conditions required. This was on 12.1.0.2 with the JAN17 PSU and the two Adaptive Statistics backports from 12cR2, with adaptive plans set to false and no bind variables.

 

The article When automatic reoptimization plan is less efficient first appeared on Blog dbi services.
