First NYC Oracle Meetup

Thanks to all who came out, and thanks to the D.B.A. bar for hosting our meetup. Tuesday evenings start out nice and quiet, but by 9:30 they are in full swing, so I’ll consider looking into a quieter, lighter space. But with free wifi and the perfect bar name, we’ll see. Things we covered:

OOW announcement of the 12.2 release on the cloud
12c upgrade demo (was finishing an upgrade on a VM of my standby node)
rlwrap and pre-loading a wordfile for faster typing 🙂
R vs Oracle R

As mentioned at the Meetup, I somehow ended up managing two Oracle groups. I’m going to send out a notification, but I will be closing one to make things simpler.

Thanks again, and I’ll see you next month!

Upgrade Stability (Part 4 The Final Countdown)

In the last three posts we covered creating SQL Tuning Sets, diagnosing regressed SQL, creating SQL Plan Baselines, and verifying that the baselines have taken effect. Now it’s just a matter of upgrading your production database.

But what happens if a baseline doesn’t work? Baselines are essentially just hints, and the CBO can be temperamental at times. 12c includes many changes to the CBO, so anything can happen. You can troubleshoot by capturing a 10053 trace and grep’ing it for SPM. Note that baselines may no longer work for internal SYS SQL, so keep that in mind.
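A minimal sketch of capturing a 10053 trace for one statement, using the classic event syntax (the statement must be hard parsed while the event is set):

ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';
-- hard parse the statement of interest here (e.g. run it with a slightly changed comment)
ALTER SESSION SET EVENTS '10053 trace name context off';
-- find the trace file, then grep it for "SPM:" from the shell
SELECT value FROM v$diag_info WHERE name = 'Default Trace File';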

Also in 12c, SPM now stores the actual execution plan right in the SQL Management Base, whereas before the SQL had to be compiled to generate the plan. Check to make sure the plan in the baseline is the plan that you are expecting.

SELECT * FROM TABLE(DBMS_XPLAN.display_sql_plan_baseline('&&sql_handle.', '&&plan_name.', 'ADVANCED'));

Other things of note:

Licensing. SPM is included with Enterprise Edition and has been enabled by default since 11.1. That means you can use DBMS_SPM to create SQL Plan Baselines, but as soon as you start using DBMS_SQLTUNE you’ve ventured into the realm of a TUNING PACK license.
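For example, loading a baseline for a single statement straight from the cursor cache needs nothing beyond DBMS_SPM. A minimal sketch (the substitution variable is just illustrative):

DECLARE
  plans PLS_INTEGER;
BEGIN
  -- DBMS_SPM only; no Tuning Pack objects involved
  plans := DBMS_SPM.load_plans_from_cursor_cache(sql_id => '&&sql_id.');
  DBMS_OUTPUT.PUT_LINE(plans || ' plan(s) loaded');
END;
/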

What if you are using Standard Edition, or are worried about execution plans on your Active Data Guard database? Remember that at the end of the day SQL Plan Baselines are effectively just stored hints. Execution plans have an outline section that includes all the hints Oracle would need to reproduce that plan. Nothing is stopping you from using them by other means.
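For instance, you can pull the outline hints for a cursor with DBMS_XPLAN and embed them in the SQL yourself; a sketch, assuming the cursor is still in the cache:

-- the "Outline Data" section of the output lists the full hint set for the plan
SELECT * FROM TABLE(DBMS_XPLAN.display_cursor('&&sql_id.', NULL, 'OUTLINE'));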

Don’t forget Bugs!

Master Note: Plan Stability Features (Including SQL Plan Management (SPM)) (Doc ID 1359841.1)

Document 1948958.1 Patches to Consider for 11.2.0.3 to Avoid Problems with SQL Plan Management (SPM)
Document 2034706.1 Patches to Consider for 11.2.0.4 to Avoid Problems with SQL Plan Management (SPM)
Document 2035898.1 Patches to Consider for 12.1.0.2 to Avoid Problems with SQL Plan Management (SPM)

Things to Consider When Upgrading From 12.1.0.1 to Avoid Problems with SQL Plan Management (SPM) (Doc ID 2035897.1)

Finally, there is always a danger in using SPM. Baselines are addictive and are really still a band-aid. You still need to determine why the plans changed, and whether it is because the CBO just doesn’t have the right data it needs to make better decisions. That being said, this approach doesn’t necessarily have to be only for upgrades. Because it is so simple, you can use it to address issues with application releases, other system changes, or Y3K bugs.

Thanks for playing along.

UPDATE: I’m going to be presenting this approach at the NYOUG Fall Meeting on Wednesday 9/14. Hopefully, I’ll catch you there.

Upgrade Stability (Part 3 Now We Test)

In the previous post I showed how to create SQL Tuning Sets (STS) by capturing SQL statements from AWR and the cursor cache. Once the STS exist, they can be used to create SQL Plan Baselines. Ideally, you would export these and load them into the database where you are going to be testing your upgrades. Again, my scripts are based on the thorough work that Carlos has provided.

The first thing to do is to stage and pack your SQL Tuning Sets. By “tagging” their names with something common (see sqlset_tag in params.sql), the scripts will stage everything using the same tag.

Run pack_sql_tuning_sets.sql as a user with the ADMINISTER ANY SQL TUNING SET privilege. This script will stage your STS to a table called SPM_STGTAB_STS. Export this table with exp (or expdp) and scp it to your test database.
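Under the hood, that boils down to creating a staging table and packing each tagged STS into it. A minimal sketch of the two calls (names match the script above):

BEGIN
  -- create the staging table that will be exported
  DBMS_SQLTUNE.create_stgtab_sqlset(
    table_name  => 'SPM_STGTAB_STS',
    schema_name => 'STS_OWNER');
  -- copy every STS whose name matches the tag into the staging table
  DBMS_SQLTUNE.pack_stgtab_sqlset(
    sqlset_name          => 'STS_11G%',
    sqlset_owner         => 'STS_OWNER',
    staging_table_name   => 'SPM_STGTAB_STS',
    staging_schema_owner => 'STS_OWNER');
END;
/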

Now, run unpack_sql_tuning_sets.sql on your test database. I put a PAUSE in the script after it creates the staging table; at that point you can import your dump file and then hit Enter to continue. As I type this, it occurs to me that the staging table step may not really be required, but we had to do it this way for our operations DBAs. Feel free to modify 🙂
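The unpack side is the mirror image; again, just a sketch of the underlying call:

BEGIN
  -- pull every tagged STS out of the imported staging table
  DBMS_SQLTUNE.unpack_stgtab_sqlset(
    sqlset_name          => 'STS_11G%',
    sqlset_owner         => 'STS_OWNER',
    replace              => TRUE,
    staging_table_name   => 'SPM_STGTAB_STS',
    staging_schema_owner => 'STS_OWNER');
END;
/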

Once the STS are unpacked they are available for your use. Run spm.sql to verify that your STS match production.  After that the next step is to upgrade your test database and start running some load!

So, now what? How do you know if SQLs have regressed? Carlos has a script that mines AWR and shows SQL that has either improved or gotten worse. That is useful if you have enough AWR history to span the period from before the upgrade. I did something similar, which compares the elapsed time from the STS with what is currently in the cursor cache; the core idea is sketched below, followed by a run of my script.
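A sketch of the comparison idea: average elapsed time per execution, STS (pre-upgrade) versus cursor cache (post-upgrade). The column aliases are just illustrative:

SELECT c.sql_id,
       c.plan_hash_value                                    new_phv,
       ROUND(c.elapsed_time/NULLIF(c.executions,0)/1e6, 3)  new_avg_etime,
       s.plan_hash_value                                    old_phv,
       ROUND(s.elapsed_time/NULLIF(s.executions,0)/1e6, 3)  old_avg_etime
  FROM gv$sql c, dba_sqlset_statements s
 WHERE s.sql_id = c.sql_id
   AND s.sqlset_name LIKE 'STS_11G%'
 ORDER BY new_avg_etime DESC;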

SQL> @sqlset_regress_cur.sql

SQL_ID          V12C_PHV EXECUTIONS    AVG_ETIME   V11G_PHV EXECUTIONS    AVG_ETIME D DIFF_ETIME PERFORMAN
------------- ---------- ---------- ------------ ---------- ---------- ------------ - ---------- ---------
6zjxqxdb9t6vd  890145704          3      318.897 2409362170     634767        1.630 * -317.26637 Regressed
376fjswqu4cnt 1920954746          3      140.638 2259879957       3218         .328 * -140.31039 Regressed
0wf3hdbcmfn9d 1566315087          3      161.397 1566315087        174       65.093   -96.304033 Regressed
f61rxd361b8g2 1566315087          3      113.604 1566315087        163       69.922   -43.682398 Regressed
gp1p36b8yskbv 3302961321          7       27.064 1218582728       8981         .682 * -26.381721 Regressed
b2kxd8xwd3ymy  659280062          1       26.402  659280062        101        7.523   -18.879445 Regressed
ghfdrs7hrw252 2155093733         33       23.465 2603635391      85319        5.554 * -17.911551 Regressed
0znx18d043bg3 2743575134          1       15.805 3971049314        107         .004 * -15.801179 Regressed
d29prd53mzuyc 2371823249          8       14.128 3772892946      17002         .084 * -14.044119 Regressed
a6yadgjwu1km1 4258360765         17        9.351 1959399513      81643         .015 * -9.3363534 Regressed

Kerry Osborne’s unstable_plans.sql is another useful script; it identifies SQL that is changing PHV and shows SQLs with a wide spread between elapsed times. You can also use AWR’s Top SQL reports or ashtop to identify SQL_IDs that weren’t present before, but are now showing up.

Let’s say you’ve identified a SQL that you suspect is taking longer; let’s prove it. I use awr_plan_change_all.sql, which is based on another one of Kerry’s scripts, but mine includes some additional columns showing where the time is being spent (I/O, CPU, cluster waits, etc.).

SQL> @awr_plan_change_all 2
Enter value for sql_id: f3dmvyhb80a3c

   SNAP_ID   NODE BEGIN_INTERVAL_TIME            SQL_ID        PLAN_HASH_VALUE        EXECS    AVG_ETIME  AVG_CPUTIME   AVG_IOTIME AVG_CLUSTERTIME        AVG_LIO AVG_PLSQL_TIME
---------- ------ ------------------------------ ------------- --------------- ------------ ------------ ------------ ------------ --------------- -------------- --------------
     37365      1 10-AUG-16 10.00.42.154 AM      f3dmvyhb80a3c      3596954981          558        6.360        5.989         .001            .061      412,116.3              0
     37366      1 10-AUG-16 11.00.02.498 AM      f3dmvyhb80a3c                          597        6.038        5.738         .001            .044      414,712.6              0
     37367      1 10-AUG-16 12.00.15.310 PM      f3dmvyhb80a3c                          601        5.962        5.713         .000            .029      416,235.8              0
     37368      1 10-AUG-16 01.00.03.469 PM      f3dmvyhb80a3c                          596        6.030        5.776         .000            .017      417,754.2              0
     37369      1 10-AUG-16 02.00.09.605 PM      f3dmvyhb80a3c                          577        6.240        5.954         .000            .018      419,133.1              0
     37370      1 10-AUG-16 03.00.17.792 PM      f3dmvyhb80a3c                          577        6.209        5.921         .000            .014      419,306.3              0
     37371      1 10-AUG-16 04.00.08.722 PM      f3dmvyhb80a3c                          578        6.238        5.958         .000            .010      420,520.5              0
     37372      1 10-AUG-16 05.00.20.405 PM      f3dmvyhb80a3c                          568        6.348        6.062         .000            .011      421,907.1              0
     37373      1 10-AUG-16 06.00.32.517 PM      f3dmvyhb80a3c                          543        6.629        6.331         .000            .012      423,088.8              0
     37379      1 11-AUG-16 01.00.01.148 AM      f3dmvyhb80a3c      2260335173            1    2,215.397    2,131.329        3.398            .988  178,247,532.0              0
     37379      1 11-AUG-16 01.00.01.148 AM      f3dmvyhb80a3c      3249808128            1    1,387.308    1,338.448         .978            .476  186,764,546.0              0
     37383      1 11-AUG-16 05.00.04.139 AM      f3dmvyhb80a3c      2260335173            1    3,620.944    3,510.651        4.073           1.250  483,276,428.0              0

12 rows selected.

The SQL above ran in about 6 seconds, but in 12c it switched to two different plans that each took longer than 1.3K seconds. Next, check whether the good plan is available in the STS (see find_sts_sql.sql):

SQL> @find_sts_sql
Enter SQL_ID: f3dmvyhb80a3c

SQLSET_NAME                    SQLSET_OWNER                   PLAN_HASH_VALUE
------------------------------ ------------------------------ ---------------
STS_11G_AWR                    STS_OWNER                           3596954981
STS_11G_CURCACHE_1             STS_OWNER                           3596954981
STS_11G_CURCACHE_2             STS_OWNER                           3596954981

Now it’s just a matter of creating the baseline. See create_spb_from_sts.sql and baselines.sql:

SQL> @create_spb_from_sts.sql 

Enter SQL_ID: f3dmvyhb80a3c
Enter SQL Set Name: STS_11G_CURCACHE_1
Enter SQL Set Owner: STS_OWNER
Enter optional Plan Hash Value: 3596954981
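Under the covers, create_spb_from_sts.sql boils down to a single DBMS_SPM call. A sketch using the values entered above (I’m assuming fixed => 'YES' since the resulting baseline shows FIX = YES):

DECLARE
  plans PLS_INTEGER;
BEGIN
  plans := DBMS_SPM.load_plans_from_sqlset(
    sqlset_name  => 'STS_11G_CURCACHE_1',
    sqlset_owner => 'STS_OWNER',
    basic_filter => q'[sql_id = 'f3dmvyhb80a3c' and plan_hash_value = 3596954981]',
    fixed        => 'YES');
  DBMS_OUTPUT.PUT_LINE(plans || ' plan(s) loaded');
END;
/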

SQL> @baselines

SQL_HANDLE                                       SIGNATURE PLAN_NAME                      CREATED                        LAST_EXECUTED                  ACC FIX ENA
------------------------------ --------------------------- ------------------------------ ------------------------------ ------------------------------ --- --- ---
...
SQL_c8415cc67a8e1de7                  14429916688601193959 SQL_PLAN_chhawstx8w7g7c24dff08 11-AUG-16 09.27.54.000000 AM   11-AUG-16 09.34.43.000000 AM   YES YES YES
...

16 rows selected.

Another extra step that we have been doing is flushing the cursor(s) to force a new hard parse (and hopefully have it immediately start using the baseline). See flush_cursor.sql and sq.sql.

SQL> @flush_cursor
Enter value for sql_id: f3dmvyhb80a3c

'EXECDBMS_SHARED_POOL.PURGE('''||ADDRESS||','||HASH_VALUE||''',''C'');--RUNFROMINSTANCE'||INST_ID
-------------------------------------------------------------------------------------------------------------------------------------------------------------
exec DBMS_SHARED_POOL.PURGE ('0000013F89F95BB8, 377497708', 'C'); -- run from instance1


SQL> @sq f3dmvyhb80a3c

   INST_ID CHILD_NUMBER EXECUTIONS PLAN_HASH_VALUE SQL_PLAN_BASELINE              SQL_PROFILE                                                      LAST_ACTIVE_TIME  AVG_ELAPSED
---------- ------------ ---------- --------------- ------------------------------ ---------------------------------------------------------------- ----------------- -----------
         1            0         24      3596954981 SQL_PLAN_chhawstx8w7g7c24dff08                                                                  20160811 09:43:56  6.39465699

There you go. Now it’s just a matter of repeating those steps until database performance stabilizes, while continuing to test and noting the baselines that might be required in production. Then you are set for production.

Upgrade Stability (Part 2 Electric Boogaloo)

In the previous post, I described the method we used to mitigate the performance risks associated with upgrading from 11.2.0.3 to 12.1. The plan was to capture SQL cursors in AWR and the cursor caches of each of our RAC instances in SQL Tuning Sets.

So, what to capture? It first depends on how large your shared pool is and how often your SQL gets aged out. In our case, this wasn’t a problem since our shared pool is huge (225G). Second, what is too much? As stated before, we have over 200K cursors in GV$SQL. Do we really need every one? This is how I determined what to grab:

-- Totals for everything, by APP SCHEMA (renamed schemas to protect the innocent) 
SQL>  select parsing_schema_name,count(distinct sql_id) from gv$sql where parsing_schema_name in ('APP1','APP2','APP3','APP4') and last_active_time > sysdate-5 group by parsing_schema_name order by 2,1;

PARSING_SCHEMA_NAME            COUNT(DISTINCTSQL_ID)
------------------------------ ---------------------
APP1                                             105
APP2                                            2715
APP3                                           79425
APP4                                          226309

-- Totals for everything run in the last day
SQL>  select parsing_schema_name,count(distinct sql_id) from gv$sql where parsing_schema_name in ('APP1','APP2','APP3','APP4') and last_active_time > sysdate-1 group by parsing_schema_name order by 2,1;

PARSING_SCHEMA_NAME            COUNT(DISTINCTSQL_ID)
------------------------------ ---------------------
APP1                                             100
APP2                                             913
APP3                                            4446
APP4                                           65223

-- Totals for everything executed over 50 times
SQL> select parsing_schema_name, count(distinct sql_id) from gv$sql where parsing_schema_name in ('APP1','APP2','APP3','APP4') and executions > 50 group by parsing_schema_name;

PARSING_SCHEMA_NAME            COUNT(DISTINCTSQL_ID)
------------------------------ ---------------------
APP1                                              14
APP2                                              33
APP3                                             207
APP4                                            2063

SQL>  select parsing_schema_name,count(distinct sql_id) from gv$sql where parsing_schema_name in ('APP1','APP2','APP3','APP4') and executions > 1000 group by parsing_schema_name order by 2,1;

PARSING_SCHEMA_NAME            COUNT(DISTINCTSQL_ID)
------------------------------ ---------------------
APP1                                               1
APP2                                               2
APP3                                              46
APP4                                             591

SQL>  select parsing_schema_name,count(distinct sql_id) from dba_hist_sqlstat where parsing_schema_name in ('APP1','APP2','APP3','APP4') and loaded_versions > 0 group by parsing_schema_name order by 2,1;

PARSING_SCHEMA_NAME            COUNT(DISTINCTSQL_ID)
------------------------------ ---------------------
APP1                                               1
APP2                                              38
APP4                                            1342


We opted to capture anything that was executed more than 50 times. Note that these numbers were captured prior to our decision to load a separate STS for each instance. And we captured everything that was loaded into AWR.

The two scripts that were used to load all of the above were:
create_sql_tuning_set_from_cursor_cache.sql
create_sql_tuning_set_from_awr.sql
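At their core, both scripts wrap the same pattern: select the statements of interest into a SQL set cursor and load them into the STS. A minimal sketch of the cursor cache flavor, using the filter values from above:

DECLARE
  cur DBMS_SQLTUNE.sqlset_cursor;
BEGIN
  OPEN cur FOR
    SELECT VALUE(p)
      FROM TABLE(DBMS_SQLTUNE.select_cursor_cache(
             basic_filter   => q'[parsing_schema_name IN ('APP1','APP2','APP3','APP4') AND executions > 50]',
             attribute_list => 'ALL')) p;
  DBMS_SQLTUNE.load_sqlset(
    sqlset_name     => 'STS_11G_CURCACHE_1',
    populate_cursor => cur,
    sqlset_owner    => 'STS_OWNER');
END;
/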

If you want to diverge from the choices that I made (and you probably will), you’ll have to modify these scripts accordingly. I did create params.sql, which is used to define:

SQLSET_OWNER <<< who owns your STS (needs to have the ADMINISTER ANY SQL TUNING SET privilege)
SQLSET_TAG <<< used as the prefix of your STS. I used STS_11G for our upgrades
PARSING_SCHEMAS <<< comma-delimited list of the schemas you care about. For example, APP1,APP2,APP3,APP4

A couple of other things that you could do differently:

1) Use DBMS_SQLTUNE.CAPTURE_CURSOR_CACHE_SQLSET, which periodically polls the shared pool to find statements to add. This would be helpful if SQL ages out before you can load it (see the sketch after this list).

2) Use a different LOAD_OPTION, since you can either INSERT (the default), UPDATE, or MERGE statements that are discovered. In our case, once our application has been running for a week there will be no new SQLs introduced, so a one-time capture was sufficient. Also, rather than dealing with Plan Hash Value (PHV) differences between instances, we opted to just store one STS for each instance. This was okay for us since, on our largest database, all the statements captured collectively across our six nodes required only 100 MB to store. Less headache, and the duplicates could be considered a backup if one STS wasn’t able to create the baseline. This last tidbit ended up proving to be useful.
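A sketch of the polling approach from 1), with illustrative values (the STS must already exist):

BEGIN
  -- poll the shared pool every 5 minutes for an hour, merging what it finds
  DBMS_SQLTUNE.capture_cursor_cache_sqlset(
    sqlset_name     => 'STS_11G_CURCACHE_1',
    time_limit      => 3600,
    repeat_interval => 300,
    basic_filter    => q'[parsing_schema_name IN ('APP1','APP2','APP3','APP4')]',
    sqlset_owner    => 'STS_OWNER');
END;
/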

After running the create_tuning_set* scripts, mine looked like this (from spm.sql):

SQL> select SQLSET_NAME, SQLSET_OWNER, count(*) from dba_sqlset_statements group by SQLSET_NAME, SQLSET_OWNER order by 2,1;
SQLSET_NAME                    SQLSET_OWNER                     COUNT(*)
------------------------------ ------------------------------ ----------
STS_11G_AWR                    STS_OWNER                            1430
STS_11G_CURCACHE_1             STS_OWNER                            2522
STS_11G_CURCACHE_2             STS_OWNER                            2457
STS_11G_CURCACHE_3             STS_OWNER                            2344
STS_11G_CURCACHE_4             STS_OWNER                            2025
STS_11G_CURCACHE_5             STS_OWNER                            2020
STS_11G_CURCACHE_6             STS_OWNER                            1772

Note that the latest version on GitHub tacks on a unique identifier to the SQLSET_NAME, so the same scripts can be run more than once without a name collision.

Now, if you do this on your production database you should be all set for your upgrade. Well, that’s only if you trust me. And you shouldn’t.  In the next post we will cover staging these STS and exporting them so you can set them up in your test database.

Upgrade Stability (Part 1)

We recently upgraded some databases to 12c. They were big, RAC’d, and due to their exposure to the world, extremely performance sensitive. I’ve found that no matter how much you test and test and test and test, there will always be some SQL for which the CBO will just decide to do its own thing. This is especially true when you move to any new release. So, being able to quickly diagnose and react to regressed SQL is critical to a successful upgrade strategy.

I’d read some whitepapers and presentations that discussed using Real Application Testing and SQL Performance Analyzer (SPA) for 12c upgrades, but this didn’t really address the problem I was trying to solve.  What do you do when something happens that wasn’t caught in testing?

Enter SQL Plan Management. Nigel Bayliss’s post “Upgrade to Oracle Database 12c and Avoid Query Regression” was more what I was looking for. The idea is that you capture SQL Plan Baselines for all the SQL that you care about and hold dear. I’ve used SQL Plan Baselines several times in the past, but always to address a single SQL that wasn’t able to be tuned (someone else’s application code). I’ve also health-checked databases where optimizer_capture_sql_plan_baselines had been set to true, and some had hundreds of thousands of available baselines. But this approach is something completely different.

I spoke with Mauro Pagano and Carlos Sierra about doing this for our upgrades, and they were both reluctant to just create all these 11g SQL Plan Baselines. First, these databases have over 200K cursors, so that is a lot of baselines that would need to be created in an otherwise happy database. Second, 11g SQL Plan Baselines aren’t necessarily guaranteed to work in 12c. Third, there are bugs that would need to be patched. Before I’d lost all hope, they suggested creating SQL Tuning Sets instead. These are safer to create, easy to export, and with them you can create 12c SQL Plan Baselines with the same execution plans from 11g.

My only problem was that I’d not really used SQL Tuning Sets outside of SPA before. And that was ages ago. I knew they were sometimes created when running Advisors (I’m not really an advisor kind of guy) or when running SQLT. But never intentionally!

Enter Carlos’s SPM scripts. They are a treasure trove of scripts for creating SQL Tuning Sets (STS) for cursors that are either in AWR or in the cursor cache, packing them up, exporting them, importing, unpacking, and creating SQL Plan Baselines from an STS. Thank you Carlos!

Over the next few posts I’ll cover the method we used to capture our statements of interest, determine when a SQL Plan Baseline was required, create the baselines, and verify that they are being used.

DBA Disaster: A One Act Play

[Somewhere in a cubicle in corporate America]

DBA: OMG! My database has stopped responding and nothing is working!

CYLBM: Have no fear! It is I, your friendly Check Your Logs Blog Man. I am here to help. First off, have you checked your logs?

DBA: Of course I…wait…hmmm. Hold on.

[mad typing sounds emit from the cubicle]

DBA: Yes! We are saved! Thank you Check Your Logs Blog Man!

CYLBM: You are welcome, mid-level DBA!

And remember folks, when disaster strikes, you too can follow my simple one step plan: Check your LOOOOOOOGGGGGGGS!

[And scene.]

GC CR Block Lost vs 12c

We had a panic after a recent upgrade when we noticed that gc cr block lost had increased 600x. Fortunately, Ram Cheruvattath had run across this previously and directed me to:

False increase of ‘Global Cache Blocks Lost’ or ‘gc blocks lost’ after upgrade to 12c (Doc ID 2096299.1)

SQL> ed
Wrote file afiedit.sql
1 SELECT sysdate snap_time,a.inst_ID "INSTANCE", A.VALUE "GC BLOCKS LOST",
2 B.VALUE "GC CUR BLOCKS SERVED",
3 C.VALUE "GC CR BLOCKS SERVED",
4 A.VALUE/(B.VALUE+C.VALUE) RATIO
5 FROM GV$SYSSTAT A, GV$SYSSTAT B, GV$SYSSTAT C
6 WHERE A.NAME='gc blocks lost' AND
7 B.NAME='gc current blocks served' AND
8 C.NAME='gc cr blocks served' and
9 B.INST_ID=a.inst_id AND
10 C.INST_ID = a.inst_id
11* order by 2
SQL> /

SNAP_TIME           INSTANCE GC BLOCKS LOST GC CUR BLOCKS SERVED GC CR BLOCKS SERVED      RATIO
----------------- ---------- -------------- -------------------- ------------------- ----------
20160810 14:01:08          1          19373              6732648             7025417  .00140812
20160810 14:01:08          2          16649              7331907             5754179 .001272267
20160810 14:01:08          3          85182             10767159             8645896 .004387872
20160810 14:01:08          4          37808              8047542            14586125 .001670432
20160810 14:01:08          5          37987              8310080            15634197 .001586475
20160810 14:01:08          6           1877              2385405              713257 .000605745

6 rows selected.

SQL> /

SNAP_TIME           INSTANCE GC BLOCKS LOST GC CUR BLOCKS SERVED GC CR BLOCKS SERVED      RATIO
----------------- ---------- -------------- -------------------- ------------------- ----------
20160810 14:01:26          1          19446              6744274             7031562 .001411602
20160810 14:01:26          2          16716              7344979             5759818 .001275563
20160810 14:01:26          3          85187             10771512             8650641 .004386074
20160810 14:01:26          4          37917              8074631            14623146 .001670516
20160810 14:01:26          5          38083              8336128            15713424 .001583522
20160810 14:01:26          6           1877              2385555              713323 .000605703

6 rows selected.

This upgrade included a migration to brand-new hardware, so I suspected that was the cause, but having that many bad cables would have been even more suspicious. A reporting database that didn’t include new hardware exhibited the same behavior, so: bug it is.

Now, that being said, on a completely different database, one of the instances on an Active Data Guard standby was reporting 30K lost blocks a minute. So, something else was going on there. That server was showing many packet reassembly failures:

bzzkf0@eagnmnmep1f60:/home/bzzkf0> cat packets.sh
while true;
do
  x=`netstat -s | grep "packet reassembles failed" | awk '{print $1}'`
  # ${t:-0} guards the first pass, when t is still unset
  echo `date` ": " `expr $x - ${t:-0}`
  t=$x
  sleep 5;
done;

...
Wed Aug 17 16:08:01 CDT 2016 :  265
Wed Aug 17 16:08:06 CDT 2016 :  138
Wed Aug 17 16:08:11 CDT 2016 :  211
Wed Aug 17 16:08:16 CDT 2016 :  1105
Wed Aug 17 16:08:21 CDT 2016 :  1649
Wed Aug 17 16:08:26 CDT 2016 :  1449
Wed Aug 17 16:08:31 CDT 2016 :  1896
Wed Aug 17 16:08:36 CDT 2016 :  2175
Wed Aug 17 16:08:41 CDT 2016 :  3152
Wed Aug 17 16:08:46 CDT 2016 :  3205
Wed Aug 17 16:08:51 CDT 2016 :  2801
Wed Aug 17 16:08:56 CDT 2016 :  2675
Wed Aug 17 16:09:01 CDT 2016 :  1328
Wed Aug 17 16:09:06 CDT 2016 :  451
Wed Aug 17 16:09:11 CDT 2016 :  89
...

This issue was resolved by increasing the buffer size from 256 to 1M. I still need to figure out why there was such an increase in traffic for this to even manifest. But, stable for now.

Baselines vs SYS

I’ve been working a bit more with 12c lately and ran across something that was a bummer. We’ve been troubleshooting partition maintenance running longer than it had in 11.2.0.3, and found that 32K seconds were being spent on SQL_ID 2c3tb462qkzr8:

SQL> @ashtop event,sql_id "top_level_sql_id='04fqpz7dx017m'" sysdate-14/24 sysdate

    Total
  Seconds     AAS %This   EVENT                                    SQL_ID        FIRST_SEEN          LAST_SEEN           DIST_SQLEXEC_SEEN
--------- ------- ------- ---------------------------------------- ------------- ------------------- ------------------- -----------------
    32934      .7   61% |                                          2c3tb462qkzr8 2016-07-20 00:00:37 2016-07-20 06:21:35             22606
      882      .0    2% |                                          crds9mkr8v07x 2016-07-20 00:02:21 2016-07-20 06:21:32               882
      600      .0    1% | db file sequential read                  0kcgdb5dmj1km 2016-07-20 00:00:20 2016-07-20 06:21:28               597
      597      .0    1% | db file sequential read                  7sahkyk9d20sq 2016-07-20 00:00:27 2016-07-20 06:21:38               589
      564      .0    1% | row cache lock                           2j4xjxw846717 2016-07-20 00:01:36 2016-07-20 06:19:47               564
      534      .0    1% | row cache lock                           18uzvkt8ka4gx 2016-07-20 00:01:44 2016-07-20 06:20:04               534
      361      .0    1% |                                          g7mt7ptq286u7 2016-07-20 00:00:41 2016-07-20 06:15:49               346
...
20 rows selected.

SQL> @stext
Enter value for sql_id: 2c3tb462qkzr8

(SELECT TP.OBJ# C_KEYOBJ# , BO.SPARE2 C_OBJV#, 2 C_LVLCNT , TP.OBJ# C_LVL1OBJ# , NULL C_LVL2OBJ# , PO.TYPE# C_LVL1TYPE# , NULL C_LVL2TYPE# , PO.SUBNAME C_LVL1NAME , NULL C_LVL2NAME , 0 C_ASSOC# , TP.TS# C_TS# , TS.NAME C_TSNAME FROM SYS.OBJ$ PO, SYS.OBJ$ BO, SYS.TABPART$ TP, SYS.TAB$ T, SYS.TS$ TS WHERE TP.TS# = TS.TS# AND BO.OBJ# = T.OBJ# AND BO.OBJ# = TP.BO# AND PO.OBJ# = TP.OBJ# AND BO.OBJ# = :B1 UNION SELECT CP.OBJ# C_KEYOBJ# , BO.SPARE2 C_OBJV#, 2 C_LVLCNT , CO.OBJ# C_LVL1OBJ# , NULL C_LVL2OBJ# , CO.TYPE# C_LVL1TYPE# , NULL C_LVL2TYPE# , CO.SUBNAME C_LVL1NAME , NULL C_LVL2NAME , 0 C_ASSOC# , NULL C_TS# , NULL C_TSNAME FROM SYS.OBJ$ CO, SYS.OBJ$ BO, SYS.TABCOMPART$ CP, SYS.USER$ U, SYS.TAB$ T WHERE BO.OBJ# = T.OBJ# AND BO.OBJ# = CP.BO# AND CO.OBJ# = CP.OBJ# AND BO.OBJ# = :B1 UNION SELECT SP.OBJ# C_KEYOBJ# , BO.SPARE2 C_OBJV#, 3 C_LVLCNT , CO.OBJ# C_LVL1OBJ# , SO.OBJ# C_LVL2OBJ# , CO.TYPE# C_LVL1TYPE# , SO.TYPE# C_LVL2TYPE# , CO.SUBNAME C_LVL1NAME , SO.SUBNAME C_LVL2NAME , 0 C_ASSO
C

Looks pretty Oracle-ish. In AWR I could see that there had previously been a better execution plan, so we were directed to create a SQL Plan Baseline for it. That seemed like a good idea, since I’d done similar things in 11g when I was in a pinch. We created the baseline, only to see that it was still not using the better plan. I ran a 10053 trace, and after grep’ing for SPM, lo and behold:

[andy.klock.MW7RHEPV318NVP] ➤ grep -r "SPM:" /cygdrive/c/partition_maint_12c/*.*
/cygdrive/c/partition_maint_12c/2c3tb462qkzr8_10053.trc:SPM: statement found in SMB
/cygdrive/c/partition_maint_12c/2c3tb462qkzr8_10053.trc:SPM: finding a match for the generated plan, planId = 1189830030
/cygdrive/c/partition_maint_12c/2c3tb462qkzr8_10053.trc:SPM: fixed planId's of plan baseline are: 1756995255
/cygdrive/c/partition_maint_12c/2c3tb462qkzr8_10053.trc:SPM: using qksan to reproduce, cost and select accepted plan, sig = 11918263451633923973
/cygdrive/c/partition_maint_12c/2c3tb462qkzr8_10053.trc:SPM: plan reproducibility round 1 (plan outline + session OFE)
/cygdrive/c/partition_maint_12c/2c3tb462qkzr8_10053.trc:SPM: using qksan to reproduce accepted plan, planId = 1756995255
/cygdrive/c/partition_maint_12c/2c3tb462qkzr8_10053.trc:SPM: planId in plan baseline = 1756995255, planId of reproduced plan = 1756995255
/cygdrive/c/partition_maint_12c/2c3tb462qkzr8_10053.trc:SPM: best cost so far = 12019.439501, current accepted plan cost = 12019.439501
/cygdrive/c/partition_maint_12c/2c3tb462qkzr8_10053.trc:SPM: re-parse to use selected accepted plan, planId = 1756995255
/cygdrive/c/partition_maint_12c/2c3tb462qkzr8_10053.trc:SPM: statement found in SMB
/cygdrive/c/partition_maint_12c/2c3tb462qkzr8_10053.trc:SPM: re-parsing to generate selected accepted plan,  planId = 1756995255
/cygdrive/c/partition_maint_12c/2c3tb462qkzr8_10053.trc:SPM: finding a match for the generated plan, planId = 1756995255
/cygdrive/c/partition_maint_12c/2c3tb462qkzr8_10053.trc:SPM: re-parsed to use selected accepted plan, planId = 1756995255
/cygdrive/c/partition_maint_12c/2c3tb462qkzr8_10053_i1_c0_extract.trc:SPM: disallowed: SQL with a bootstrap object run by SYS

SPM: disallowed: SQL with a bootstrap object run by SYS

That was new to me, and I was informed that this was introduced via Bug 14029891. Something to consider if you need to change the execution plan for internal SQL. Fortunately for us, in this case we were able to disable the feature that was calling this SQL in the first place.

As Cary Millsap says, the fastest way to do something is to not do it at all. Cary also has something to say about mouse balls but that doesn’t necessarily apply here.

Recovering TDE Encryption Keys

There was a good discussion with some co-workers on database security, which really came down to locking down the OS accounts. Meaning, you can use technologies like Oracle TDE or Oracle Vault or Database Vault or Whatever Vault, but if the operating system isn’t secure then nothing really is. Frits Hoogland had mentioned that with tools like Linux’s shmcat you can attach to the OS memory and scrape whatever data happens to be available. If an encrypted database block is decrypted in the SGA, then it is readable. Now, in cases where security is taken seriously, the OS is typically locked down; however, this got me thinking about what else is available in memory.

There was a financial institution that had lost its Oracle SSO wallet password and, due to compliance issues, was forced to do a complete export and import with a new encryption key. Is it possible that the TDE encryption key is available in memory, and if it is, is it possible to add just the key to a new wallet for TDE to use? Let’s see.

First off, environment:

[oracle@sandbox ~]$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.5 (Tikanga)

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

My sqlnet.ora:

# sqlnet.ora Network Configuration File: /u01/app/oracle/product/11.2.0/dbhome_1/network/admin/sqlnet.ora
# Generated by Oracle configuration tools.

NAMES.DIRECTORY_PATH= (TNSNAMES)
ADR_BASE = /u01/app/oracle

#ENCRYPTION_WALLET_LOCATION=
#  (SOURCE=(METHOD=FILE)(METHOD_DATA=
#    (DIRECTORY=/u01/app/oracle/admin/ora11gr2/tde_wallet/)))

WALLET_LOCATION =
   (SOURCE =
     (METHOD = FILE)
     (METHOD_DATA =
       (DIRECTORY = /u01/app/oracle/admin/ora11gr2/tde_wallet)
     )
   )

#SQLNET.WALLET_OVERRIDE = TRUE
SSL_CLIENT_AUTHENTICATION = FALSE
SSL_VERSION = 0

Let’s set up an encryption key and create an encrypted tablespace.

SQL> !ls -l /u01/app/oracle/admin/ora11gr2/tde_wallet
total 0

SQL> alter system set encryption key identified by "abcd1234";

System altered.

SQL> !ls -l /u01/app/oracle/admin/ora11gr2/tde_wallet
total 4
-rw-r--r-- 1 oracle oinstall 1309 Sep  9 21:21 ewallet.p12

SQL> select * from v$encryption_wallet;

WRL_TYPE
--------------------
WRL_PARAMETER
--------------------------------------------------------------------------------
STATUS
------------------
file
/u01/app/oracle/admin/ora11gr2/tde_wallet
OPEN

SQL> alter system set encryption wallet close identified by "abcd1234";

System altered.

SQL> alter system set encryption wallet open identified by "abcd1234";

System altered.

SQL> create tablespace tde_tbs datafile '+DATA' size 1M encryption using 'AES256' default storage (encrypt);

Tablespace created.

SQL> create table t (id number) tablespace tde_tbs;

Table created.

SQL> insert into t (id) values (1);

1 row created.

SQL> commit;

Commit complete.

Ok, now let’s check out the contents of our wallet:

[oracle@sandbox ora11gr2]$ mkstore -wrl /u01/app/oracle/admin/ora11gr2/tde_wallet -list
Oracle Secret Store Tool : Version 11.2.0.2.0 - Production
Copyright (c) 2004, 2010, Oracle and/or its affiliates. All rights reserved.
 
Enter wallet password:
Oracle Secret Store entries:
ORACLE.SECURITY.DB.ENCRYPTION.AZXbXU8xmE8ivy0/MkYbHf8AAAAAAAAAAAAAAAAAAAAAAAAAAAAA
ORACLE.SECURITY.DB.ENCRYPTION.MASTERKEY 
 
[oracle@sandbox ~]$ mkstore -wrl /u01/app/oracle/admin/ora11gr2/tde_wallet -viewEntry ORACLE.SECURITY.DB.ENCRYPTION.AZXbXU8xmE8ivy0/MkYbHf8AAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Oracle Secret Store Tool : Version 11.2.0.2.0 - Production
Copyright (c) 2004, 2010, Oracle and/or its affiliates. All rights reserved.
 
Enter wallet password:
ORACLE.SECURITY.DB.ENCRYPTION.AZXbXU8xmE8ivy0/MkYbHf8AAAAAAAAAAAAAAAAAAAAAAAAAAAAA = AEMAASAAlQCipDUWe29wSWLzdQWbcCczxpeMYyL3isvGSFlOxO4DEABzWudMB81vL2m4qi51N/3kBQcAeHQDFw84NA==

Ok, everything looks good above, but let’s see what’s in memory.

[oracle@bbh ~]$ shmcat -M 0x74b5ac88 |grep --text ORACLE.SECURITY.DB.ENCRYPTION
…
                                                                                                                                                 1▒@2nq▒▒H▒y䜰   JC=US,O="VeriSign, Inc.",OU=Class 2 Public Primary Certification AuthorityQ▒▒▒▒y䜰      0▒▒yIh▒y▒▒y9▒▒▒y䜰      ▒f~NE▒^Wo<▒^▒▒a▒▒▒y䜰   90▒40▒▒▒f0_1▒^Wo<*▒H▒▒
   0    UUS1 0U
dQ▒▒▒P/▒y䜰▒▒蕼▒▒▒▒P/▒y▒y @▒yAEMAASAAlQCipDUWe29wSWLzdQWbcCczxpeMYyL3isvGSFlOxO4DEABzWudMB81vL2m4qi51N/3kBQcAeHQDFw84NA==y▒▒▒h7▒y▒6▒▒6▒ORACLE.SECURITY.DB.ENCRYPTION.AZXbXU8xmE8ivy0/MkYbHf8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAQ▒▒▒h7▒y䜰      0R\p9▒yY▒▒▒h7▒y䜰       5AZXbXU8xmE8ivy0/MkYbHf8AAAAAAAAAAAAAAAAAAAAAAAAA0_1AI▒▒▒*▒H▒▒6▒▒▒yORACLE.SECURITY.DB.ENCRYPTION.MASTERKEYQ▒▒▒h7▒y䜰     0'4▒8▒y▒▒▒▒h7▒y▒6▒ ▒yp▒▒▒)4▒8▒{̺▒i▒▒▒▒9▒y▒6▒▒▒y0▒<0▒▒p▒▒▒)4▒8▒{̺▒0
   0    UUS10U

Hmmm. There is our key, in memory and in clear text. So, the next question, and sort of the point of this exercise: can we remove this wallet (pretending to lose the wallet or password) and add this encryption key to a newly created wallet?

[oracle@sandbox ora11gr2]$ mv tde_wallet tde_wallet_orig  
[oracle@sandbox ora11gr2]$ orapki wallet create -wallet /u01/app/oracle/admin/ora11gr2/tde_wallet -auto_login_local -pwd andyklock123 
Oracle PKI Tool : Version 11.2.0.2.0 - Production 
Copyright (c) 2004, 2010, Oracle and/or its affiliates. All rights reserved. 

[oracle@sandbox ora11gr2]$ mkstore -wrl /u01/app/oracle/admin/ora11gr2/tde_wallet -createEntry ORACLE.SECURITY.DB.ENCRYPTION.MASTERKEY AZXbXU8xmE8ivy0/MkYbHf8AAAAAAAAAAAAAAAAAAAAAAAAAAAAA 
Oracle Secret Store Tool : Version 11.2.0.2.0 - Production 
Copyright (c) 2004, 2010, Oracle and/or its affiliates. All rights reserved. 

Enter wallet password: 

[oracle@sandbox ora11gr2]$ mkstore -wrl /u01/app/oracle/admin/ora11gr2/tde_wallet -createEntry ORACLE.SECURITY.DB.ENCRYPTION.AZXbXU8xmE8ivy0/MkYbHf8AAAAAAAAAAAAAAAAAAAAAAAAAAAAA AEMAASAAlQCipDUWe29wSWLzdQWbcCczxpeMYyL3isvGSFlOxO4DEABzWudMB81vL2m4qi51N/3kBQcAeHQDFw84NA== 
Oracle Secret Store Tool : Version 11.2.0.2.0 - Production 
Copyright (c) 2004, 2010, Oracle and/or its affiliates. All rights reserved. 

[oracle@sandbox ora11gr2]$ sqlplus / as sysdba 
SQL*Plus: Release 11.2.0.2.0 Production on Thu Mar 24 11:00:43 2016 Copyright (c) 1982, 2010, Oracle. All rights reserved. 
Connected to an idle instance. 
SQL> startup;
ORACLE instance started.

Total System Global Area  534462464 bytes
Fixed Size                  2228200 bytes
Variable Size             327155736 bytes
Database Buffers          197132288 bytes
Redo Buffers                7946240 bytes
Database mounted.
Database opened.

SQL> select * from v$encryption_wallet;
 
WRL_TYPE
--------------------
WRL_PARAMETER
--------------------------------------------------------------------------------
STATUS
------------------
file
/u01/app/oracle/admin/ora11gr2/tde_wallet
OPEN
 
 
SQL> select * from andy.tde_t;
 
        ID
----------
         1

That’s actually kind of messed up. My instance didn’t even know about my andyklock123 password. The beauty of SSO, I guess… Not that it was needed, since I know the password of the wallet I just created…

For a sanity check, let’s make sure the wallet I hand-cranked is actually being used:

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL>
SQL>
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options
[oracle@bbh ora11gr2]$ ls
adump  dpdump  pfile  tde_wallet
[oracle@bbh ora11gr2]$ mv tde_wallet tde_wallet_hold
[oracle@bbh ora11gr2]$ sqlplus
 
SQL*Plus: Release 11.2.0.2.0 Production on Thu Mar 24 11:15:17 2016
 
Copyright (c) 1982, 2010, Oracle.  All rights reserved.
 
Enter user-name: / as sysdba
Connected to an idle instance.
 
SQL> startup;
ORACLE instance started.
 
Total System Global Area  534462464 bytes
Fixed Size                  2228200 bytes
Variable Size             327155736 bytes
Database Buffers          197132288 bytes
Redo Buffers                7946240 bytes
Database mounted.
Database opened.
SQL> select * from v$encryption_wallet;
 
WRL_TYPE
--------------------
WRL_PARAMETER
--------------------------------------------------------------------------------
STATUS
------------------
file
/u01/app/oracle/admin/ora11gr2/tde_wallet
CLOSED
 
 
SQL> select * from andy.tde_t;
select * from andy.tde_t
                   *
ERROR at line 1:
ORA-28365: wallet is not open

So, there you go. Another way to restore an encryption key :). Would you run shmcat on a production system? Probably not. Would you use it (or some other C program that scans memory) if you had to export/import a 15 TB tablespace? Maybe?

Either way, no matter what security features you’ve implemented, if you don’t lock down your keys and your operating systems, you really aren’t as secure as you may think.