Friday, December 7, 2012

JRockit Flight Recorder: Analysis of an OAM 11g environment


In my first JRockit article I analyzed an OIM environment, and we could see that the Java method com.thortech.xl.scheduler.core.quartz.QuartzSchedulerImpl.scheduleJob was not working properly.
Today I am doing some troubleshooting in an OAM environment using the JRockit Mission Control tool. So, let's go step by step.
First step: I saw an issue in the OAM environment that was reporting stuck threads in the WLS AdminServer and OAM logs during the day.
Second step: Run the jrcmd command to collect all the information required for a Java analysis of the OAM process. Here is one example command:
./jrcmd 17294 start_flightrecording name=OAM-17294 settings=default duration=7200s filename=/tmp/OAM-17294.jfr.gz compress=true
Details:
2.1-17294=> OAM process number (PID).
2.2-start_flightrecording=> the diagnostic command that jrcmd sends to the JVM to start a recording.
2.3-name=> name used to identify the recording.
2.4-settings=> the settings template to use for the recording.
2.4.1-Options:
code=>Additional settings for enabling more verbose compiler logging.
default=>Default settings tuned for a very low performance overhead and recommended for always-on production use.
freemem=>Additional settings for debugging out-of-memory and fragmentation problems.
full=>Enables collection of all available events for all subsystems. Warning: This has a very high performance overhead.
io=>Additional settings for enabling more verbose Java I/O logging.
leak=>Additional settings for debugging memory leaks.
locks=>Additional settings for enabling more verbose synchronization logging.
memory=>Additional settings for enabling more verbose GC/memory management logging.
off=>Disables all events for all subsystems.
profile=>Recommended settings for creating a profiling recording. They provide a good balance between the amount of information available and the performance overhead introduced.
sample=>Additional settings for enabling hotspot sampling of code.
semirefs=>Additional settings for debugging problems with java.lang.ref.Reference objects and its subclasses.
2.5-compress=true=> whether or not to compress the recording file.
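The parameters described above can be put together with a small wrapper. This is only a sketch: build_jfr_cmd is a hypothetical helper of mine (not part of JRockit), the /tmp output path simply follows the example command, and it prints the command instead of running it so you can review it first.

```shell
#!/bin/sh
# Hypothetical helper (not part of JRockit): prints the jrcmd command line
# for a flight recording instead of executing it.
# Arguments: PID, recording name, settings template, duration in seconds.
build_jfr_cmd() {
    pid=$1
    name=$2
    template=$3
    seconds=$4
    # Accept only the template names listed in this post.
    case $template in
        code|default|freemem|full|io|leak|locks|memory|off|profile|sample|semirefs) ;;
        *) echo "unknown settings template: $template" >&2; return 1 ;;
    esac
    # Assumption: the recording is written to /tmp, as in the example command.
    echo "./jrcmd $pid start_flightrecording name=$name settings=$template duration=${seconds}s filename=/tmp/$name.jfr.gz compress=true"
}
```

For example, build_jfr_cmd 17294 OAM-17294 default 7200 prints the same command shown in the second step.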
3-Third Step: Analysis based on your knowledge of Java and of the application:
3.1- During the collection I could see that the main problem in OAM was garbage collection: it was not working properly. The GC was actually running more often, but heap usage stayed high. This pointed to the kind of problem the 'freemem' template targets, so in my case the commands were:
3.1.1- ps -ef |grep oamserver_1 <== get process number of OAM.
3.1.2- ./jrcmd processnumber start_flightrecording name=OAM settings=freemem.jfs duration=7200s filename=/tmp/OAM.jfr.gz compress=true <== start recording to the JFR file.
Another helpful command is:
3.1.3-jrcmd processnumber check_flightrecording <== Check status of Flight recording
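If you want to script the PID lookup from 3.1.1, here is a minimal sketch. pid_from_ps is a hypothetical helper of mine; it assumes the usual `ps -ef` layout where the PID is the second column, and it passes the pattern through an environment variable so the awk process does not match itself in the process list.

```shell
#!/bin/sh
# Hypothetical helper: reads `ps -ef` output on stdin and prints the PID
# (second column) of the first line containing the given pattern.
pid_from_ps() {
    PAT="$1" awk 'index($0, ENVIRON["PAT"]) { print $2; exit }'
}

# Usage (needs a live JRockit JVM, so it is left commented out here):
# pid=$(ps -ef | pid_from_ps oamserver_1)
# ./jrcmd "$pid" start_flightrecording name=OAM settings=freemem.jfs duration=7200s filename=/tmp/OAM.jfr.gz compress=true
# ./jrcmd "$pid" check_flightrecording
```

If more than one process matches the pattern, only the first one is returned, so check the PID before starting a two-hour recording.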
Opening this file in the Mission Control client, I could see the 3 important pictures below:

Fig1: As you can see, during the day some objects kept growing until they reached the maximum heap memory allocated. If this were only a memory problem and we just needed to add more, that would be fine. But that was not the case here. As you can see, GC is not working properly: you cannot see the blue line moving (rising and falling as objects are cleaned up). So, moving forward… Check Fig2 and Fig3.

Fig2: Then, on another collection day, I could see garbage collection getting stuck because of some objects and then trying to keep working. This behavior means that the JVM garbage collector was trying to do its work, but some Java code was not developed properly.

Fig3: From this main picture we could see exactly which Java code in the OAM application was not being cleaned up by GC and was putting the whole server in a bad situation. So, using this analysis we could work on fixing the related code and apply a new patch to this environment.
So, basically the problem was related to some SQL statements using PreparedStatement, and to tangosol (responsible for high-performance transactions) I/O. Using this tool I could go directly to the root cause.
Download JRMC: http://www.oracle.com/technetwork/middleware/jrockit/downloads/index.html
Docs and Reference: 1.0- http://docs.oracle.com/cd/E15289_01/index.htm
2.0- http://docs.oracle.com/cd/E15289_01/doc.40/e15070/usingjfr.htm
I hope this helps,
Thiago Leoncio.
