Diagnosing Tough Performance Problems

Although many of the symptoms of performance problems (e.g., poor response time) are similar throughout the application life cycle, the underlying causes and the techniques used to diagnose them become more complex in later stages as the load increases and the configuration grows more elaborate. In this article, we discuss the tools and techniques that are useful in diagnosing tough performance problems that occur under realistic high loads. We also illustrate why tools used in development or under limited load conditions are not suitable for finding such tough performance problems.

Reproducing Performance Problems Under Load
There are two principal kinds of performance problems: persistent problems that affect performance at all times, and transient problems that occur intermittently and for a limited time. The former are typically easy to reproduce because a high enough workload triggers them. The latter are harder to reproduce because they require the right configuration, state, and workload. To diagnose tough performance problems under high load, we need a reliable way of reproducing the problem and the ability to examine the internals of the application under load.

Persistent performance problems (e.g., the Empty-Cart transaction always has poor response time) are often reproduced using load-testing tools. In contrast, current techniques for reproducing transient problems (e.g., response time of the entire system quadrupled for 15 minutes at 2:00 p.m. yesterday) involve guesswork and ad hoc testing in order to approximate the original configuration and load in the production environment. Using current approaches, it may take anywhere from weeks to months to reproduce and diagnose transient problems. An alternative is to use real workload technology, which records the production workload (including all transaction requests and responses) when the transient problem occurs, and plays it back in the test environment in order to reproduce the problem. This reduces the time to reproduce and ultimately solve these problems from weeks and months to hours and days.
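
The record-and-playback idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the real workload technology described above: the `RecordedRequest` type, the `replay` function, and the `send` callback are hypothetical names, and the replay is sequential, whereas a production workload would arrive concurrently.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class RecordedRequest:
    """One captured production request (hypothetical record layout)."""
    offset_s: float  # seconds since the start of the recording
    method: str      # e.g. "GET" or "POST"
    path: str        # request path as seen in production
    body: str = ""


def replay(
    requests: Iterable[RecordedRequest],
    send: Callable[[RecordedRequest], str],
    sleep: Callable[[float], None] = time.sleep,
    speedup: float = 1.0,
) -> List[str]:
    """Re-issue requests in recorded order, preserving the gaps between them.

    `send` delivers one request to the test environment and returns its
    response. A `speedup` greater than 1 compresses the recorded timeline.
    """
    responses = []
    previous_offset = 0.0
    for request in sorted(requests, key=lambda r: r.offset_s):
        gap = (request.offset_s - previous_offset) / speedup
        if gap > 0:
            sleep(gap)  # wait as long as production did between requests
        responses.append(send(request))
        previous_offset = request.offset_s
    return responses
```

Passing `sleep` and `send` as parameters keeps the sketch testable: a test can substitute stubs and compare the returned responses against the ones recorded in production.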

Diagnosing the Root Cause
Once the performance problem is reproduced, diagnosing the root cause requires drill-down analysis that correlates external symptoms with potential root causes. This involves:

  • Finding the bottleneck or the transactions, beans, servlets, and methods that consume the most time
  • Breaking down the aggregate time spent in a type of transaction across the servlets, beans, and methods it uses
  • Correlating individual transactions from an HTTP request, through servlet and bean calls, and down to JDBC calls and SQL statements to find the chains of invocations that produce slow response time
  • Examining individual thread execution profiles to see which threads were the bottleneck and where the time was spent
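
The first two steps of this drill-down can be sketched as follows. The record layout, the sample values, and the function names `hotspots` and `breakdown` are assumptions made for illustration. The sketch also assumes each record carries "self time" (time in the method excluding its callees), so that nested calls are not counted twice.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Illustrative call records; the values are made up for this sketch.
# "self_ms" is time spent in the method itself, excluding callees.
CALLS = [
    {"transaction": "EmptyCart", "component": "CartServlet", "method": "doPost", "self_ms": 12.0},
    {"transaction": "EmptyCart", "component": "CartBean", "method": "removeAll", "self_ms": 48.0},
    {"transaction": "EmptyCart", "component": "JDBC", "method": "executeUpdate", "self_ms": 240.0},
    {"transaction": "ViewCart", "component": "CartServlet", "method": "doGet", "self_ms": 10.0},
    {"transaction": "ViewCart", "component": "JDBC", "method": "executeQuery", "self_ms": 30.0},
]


def hotspots(calls: List[dict], top: int = 3) -> List[Tuple[str, float]]:
    """Return the `top` methods ranked by total self time, slowest first."""
    totals: Dict[str, float] = defaultdict(float)
    for call in calls:
        totals[f'{call["component"]}.{call["method"]}'] += call["self_ms"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:top]


def breakdown(calls: List[dict], transaction: str) -> Dict[str, float]:
    """Return each component's share (0..1) of one transaction type's time."""
    per_component: Dict[str, float] = defaultdict(float)
    for call in calls:
        if call["transaction"] == transaction:
            per_component[call["component"]] += call["self_ms"]
    total = sum(per_component.values())
    if total == 0:
        return {}  # no calls recorded for this transaction type
    return {name: ms / total for name, ms in per_component.items()}
```

With the sample data, `hotspots(CALLS, top=1)` points at the JDBC update, and `breakdown(CALLS, "EmptyCart")` shows that the same call accounts for most of the Empty-Cart transaction time.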

    Profiling tools used by developers often provide aggregate information (e.g., statistical summaries of calls to methods) that helps in such analysis. However, they don't isolate individual calls or objects, nor do they gather information about arguments and results. Additionally, profiling tools that are based on the JVM profiling interface (JVMPI) suffer from significant overhead (between 200 and 1,000%). This overhead makes it nearly impossible to run the application under load and significantly skews results. Profiling tools based on JVM sampling have lower overhead (around 100 to 200%) but provide even less information. Such tools can be used under higher load but are less helpful in diagnosing tough problems. Byte code instrumentation technology can provide an arbitrary level of detail, and its overhead can be managed by limiting the scope of instrumentation without sacrificing the level of detail. When properly implemented, gathering data using byte code instrumentation can have overhead as low as 5 to 50%.
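
To put these percentages in perspective, the arithmetic below rests on two simplifying assumptions that the article does not state: overhead is quoted relative to uninstrumented execution time, and the workload is CPU-bound, so throughput falls in inverse proportion to execution time. The function names are hypothetical.

```python
def instrumented_time_ms(base_ms: float, overhead_pct: float) -> float:
    """Execution time once a tool adds `overhead_pct` percent overhead."""
    return base_ms * (1 + overhead_pct / 100)


def relative_throughput(overhead_pct: float) -> float:
    """Fraction of the original throughput left on a CPU-bound server."""
    return 1 / (1 + overhead_pct / 100)


# Overhead ranges quoted in the text, as (low, high) percentages.
OVERHEAD_RANGES = {
    "JVMPI-based profiling": (200, 1000),
    "JVM sampling": (100, 200),
    "byte code instrumentation": (5, 50),
}
```

Under these assumptions, a transaction that normally takes 100 ms takes 300 to 1,100 ms under JVMPI-based profiling, 200 to 300 ms under sampling, and 105 to 150 ms under well-scoped byte code instrumentation. At 200% overhead the server sustains only about a third of its original throughput, which is why realistic load cannot be applied.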

    Production monitoring tools instrument a small subset of Java methods and report summary information on beans and methods that take the most time. These tools discard much of the detailed information required for root-cause diagnosis because they sample method calls and aggregate results in order to reduce overhead. While such monitoring highlights application bottlenecks in production, it does not go the proverbial "last mile." Developers still invest significant time and effort to reproduce the bottleneck in the test lab to diagnose and fix the underlying problem.

    Detailed diagnostic tools are used in test environments to capture the details of each significant method call. Such tools fall into two categories: tools that aggregate and summarize the captured information online for presentation, and tools that save detailed information to disk for offline analysis. While both types of tools help find bottlenecks and provide a breakdown of aggregate transaction time, the second category also supports correlation of individual transactions through different software layers and examination of the execution profiles of individual threads. This analysis of individual transactions and threads is critical to diagnosing complex performance problems.
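
Offline correlation of individual transactions and threads can be sketched as follows. The event layout, the sample values, and the function names are assumptions made for illustration. In particular, the sketch assumes the diagnostic tool tags every saved event with a transaction ID and a thread name, which no specific product is claimed to do in this form.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical trace events, as a diagnostic tool might save them to disk.
EVENTS = [
    {"txn": "t1", "thread": "exec-1", "layer": "http", "name": "POST /cart/empty", "start_ms": 0, "end_ms": 310},
    {"txn": "t1", "thread": "exec-1", "layer": "servlet", "name": "CartServlet.doPost", "start_ms": 2, "end_ms": 308},
    {"txn": "t1", "thread": "exec-1", "layer": "bean", "name": "CartBean.removeAll", "start_ms": 10, "end_ms": 300},
    {"txn": "t1", "thread": "exec-1", "layer": "jdbc", "name": "DELETE FROM cart_items WHERE cart_id = ?", "start_ms": 50, "end_ms": 295},
    {"txn": "t2", "thread": "exec-2", "layer": "http", "name": "GET /cart", "start_ms": 5, "end_ms": 45},
    {"txn": "t2", "thread": "exec-2", "layer": "servlet", "name": "CartServlet.doGet", "start_ms": 6, "end_ms": 44},
]


def group_by(events: List[dict], key: str) -> Dict[str, List[dict]]:
    """Group events by `key` and order each group by start time."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for event in events:
        groups[event[key]].append(event)
    for group in groups.values():
        group.sort(key=lambda e: e["start_ms"])
    return dict(groups)


def slowest_chain(events: List[dict]) -> Tuple[str, List[Tuple[str, str, int]]]:
    """Return the slowest transaction and its chain of (layer, name, elapsed_ms).

    `events` must be non-empty.
    """
    chains = group_by(events, "txn")

    def elapsed(chain: List[dict]) -> int:
        return max(e["end_ms"] for e in chain) - min(e["start_ms"] for e in chain)

    txn = max(chains, key=lambda t: elapsed(chains[t]))
    return txn, [(e["layer"], e["name"], e["end_ms"] - e["start_ms"]) for e in chains[txn]]


def busy_ms_per_thread(events: List[dict]) -> Dict[str, int]:
    """Sum the time each thread spent serving requests.

    Only top-level "http" events are counted, so nested calls are not
    counted twice.
    """
    return {
        thread: sum(e["end_ms"] - e["start_ms"] for e in group if e["layer"] == "http")
        for thread, group in group_by(events, "thread").items()
    }
```

For the sample data, `slowest_chain(EVENTS)` follows transaction `t1` from the HTTP request down to the SQL statement that dominates its response time. An online tool that keeps only aggregates could not reconstruct that chain.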

    In summary, diagnosing tough performance problems in J2EE applications requires a reliable way of reproducing the problem and a low-overhead technique to examine the internal details of the application under load. The capture and playback of real workload from production provides the best option for reproducing difficult performance problems. Performance diagnostic tools based on byte code instrumentation can capture and correlate individual method calls and thread invocations with low overhead and provide the best option for drill-down analysis.

    More Stories By Ashutosh Tiwary

    Ashutosh Tiwary has 12 years of software development and performance consulting experience at Boeing, Hewlett-Packard, and Teknekron Communications Systems. He is a PhD candidate in computer science at the University of Washington, where his dissertation work forms the basis for Performant's technology.

    More Stories By Przemyslaw Pardyak

    Przemyslaw Pardyak codeveloped Performant's core technology and has eight years of research and development experience including performance management. He is a PhD candidate in computer science at the University of Washington.