Tuesday 31 March 2015

Create and Provision an Endeca Application with the Deployment Template


Here are the steps to create an Endeca application from scratch and provision it using the deployment template.

Application Creation 
 
      1. Open a command prompt, change directory to ..\...\ToolsAndFrameworks\3.1.2\deployment_template\bin
          and invoke the deploy script.

           For example: C:\Endeca\ToolsAndFrameworks\3.1.2\deployment_template\bin\deploy.bat
 
      2. Provide the configuration values as the script prompts for them. It will first ask you to
          confirm the IAP version, and then prompt for the following information:
  • Application Name.
  • Application Deployment directory.
  • EAC port.
  • Workbench port.
  • Live Dgraph port.
  • Authoring Dgraph port.
  • Log Server port.
Screen captures for application creation.

For the port values you can use any available port on your machine, or accept the defaults. At the end you will get a message that the application was deployed successfully.
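For illustration, a run on Windows might look like the following (the application name, deployment directory, and port values below are example values of my own; the exact prompt wording can vary slightly between versions):

    cd C:\Endeca\ToolsAndFrameworks\3.1.2\deployment_template\bin
    deploy.bat

    REM Example answers to the prompts (illustrative):
    REM   Application name ........ MyStore
    REM   Deployment directory .... C:\Endeca\apps
    REM   EAC port ................ 8888
    REM   Workbench port .......... 8006
    REM   Live Dgraph port ........ 15000
    REM   Authoring Dgraph port ... 15002
    REM   Log Server port ......... 15010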

Application creation is now done; it is time to provision it. Once the application is provisioned, it will be available for configuration in Endeca Workbench.

To provision the application:

Go to the \control directory of the application you created in the steps above.
Invoke the initialize_services.bat (or initialize_services.sh) script. This script does not prompt for any values.
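Continuing with the illustrative application name and deployment directory from the example above, provisioning would look like this:

    cd C:\Endeca\apps\MyStore\control
    initialize_services.bat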

Here is the screen capture of application provisioning.
 
 

Sunday 15 March 2015

Speed Up ATG Application Using Caching

Here I am going to introduce the repository-level caching mechanism in ATG. Caching is critical to application performance.


The rule of thumb for caching is:

You should design the application so that it requires minimal access to the database while still ensuring data integrity.

Each item descriptor in a SQL repository has two caches of its own:
  1. Item cache: holds the property values of repository items, indexed by repository item ID.
  2. Query cache: holds the repository IDs of the items that match particular queries.

Advantages of having an item cache and a query cache at each item-descriptor level:
  • You can set the cache size for each item type separately.
  • You can flush the cache for each item type separately (selective cache invalidation).
 
How does it work?

Repository queries are performed in two passes, using two separate SELECT statements.

1. Repository ID fetching (query cache): The first statement gathers the IDs of the repository items that match the query. The repository first checks the query cache to see whether the same query has already been cached. If it has, the matching IDs are returned from the query cache and no query is fired against the database. If the query is not cached, the repository fires the query against the database, creates a query cache entry for it, and returns the results. This is very useful when repeated queries are common.

2. Repository Item fetching (Item cache) : The SQL repository then examines the result set from the first SELECT statement and finds any items that already exist in the item cache. A second SELECT statement retrieves from the database any items that are not in the item cache. 

An interesting fact about the query cache: when the application executes a query using an ID parameter, for example id="1234", ATG does not create an entry in the query cache, because the query parameter and the ID returned by the query are the same. It bypasses the first query and works directly with the second query and the item cache.
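As a minimal sketch of the two lookup styles using the standard atg.repository API (the repository component, item descriptor, and property names here are illustrative):

    import atg.repository.Repository;
    import atg.repository.RepositoryItem;
    import atg.repository.RepositoryView;
    import atg.repository.rql.RqlStatement;

    public class CacheLookupSketch {

        // "repository" would normally be resolved through Nucleus,
        // e.g. /atg/commerce/catalog/ProductCatalog (illustrative path).
        public void lookups(Repository repository) throws Exception {

            // Lookup by ID: no query cache entry is created; the item is served
            // from the sku item cache (or loaded into it on a miss).
            RepositoryItem skuById = repository.getItem("1234", "sku");

            // RQL query: the first pass resolves the matching IDs (query cache),
            // the second pass loads any items missing from the item cache.
            RepositoryView view = repository.getView("sku");
            RqlStatement rql = RqlStatement.parseRqlStatement("displayName = ?0");
            RepositoryItem[] matches = rql.executeQuery(view, new Object[] { "Camera" });
        }
    }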

Cache Tuning:

1. Query cache tuning: It is generally safe to set the size of the query cache to 1000 or higher. Query caches contain only the query parameters and the string IDs of the result-set items, so large query cache sizes can usually be handled comfortably without running out of memory. A query whose parameters change infrequently is a good candidate for caching.

2. Item cache tuning: The item cache size should be large enough to accommodate the number of items in the repository. For example, if the repository contains 50,000 SKUs, then the sku item cache size should be 50,000. The item cache size needs to be set carefully, as it requires more memory than the query cache. Test application performance with different item cache sizes based on the available hardware.
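Both cache sizes are set per item descriptor in the repository definition file. A minimal sketch using the sizes discussed above (the file name and descriptor name are illustrative; item-cache-size and query-cache-size are the standard GSA repository attributes):

    <!-- customCatalog.xml: per-item-descriptor cache sizing -->
    <gsa-template>
      <item-descriptor name="sku"
                       item-cache-size="50000"
                       query-cache-size="1000"/>
    </gsa-template>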

Good Luck..!!

Saturday 7 March 2015

Resolving Performance Puzzle

It is challenging for beginners to start performance tuning an application. I was in the same situation when I got my first performance tuning assignment. Before starting with performance tuning, we need to understand its different aspects.


In simple words, performance tuning is the improvement of system performance. Most systems respond to increased load with some degree of decreasing performance. At the core of performance tuning is performance testing.


Here is the list of steps to execute systematic tuning.
  1. Assess the issue and get the numbers to baseline the performance against (these are mostly defined in the Non-Functional Requirements [NFR] document).
  2. Measure the performance of the system.
  3. Identify the bottlenecks (the parts of the system that are critical for performance).
  4. Modify the system to remove or work around the bottleneck.
  5. Measure the system performance again after the modification.
  6. If the modification improves performance, adopt it; otherwise revert it.
  7. Repeat steps 2 to 6 in cycles until the performance requirement is met.
These are high-level steps. The most confusing and challenging question that comes to mind is:

Where (in which part of the application) should performance testing start?

Most web applications are divided into three major parts:
  1. Database.
  2. Back-end code (Including third party calls). 
  3. Front-end.


It is always better to start with the database and end at the front end.

Sunday 1 March 2015

ATG Application Performance Tuning Toolkit




Here is the list of tools I use for application performance analysis and tuning.

 1.   JMeter : The Apache JMeter™ desktop application is open source software, a 100% pure Java application designed to load test functional behavior and measure performance. Apache JMeter may be used to test performance on both static and dynamic resources (files, Web dynamic languages such as PHP, Java, and ASP.NET, Java objects, databases and queries, FTP servers, and more). It can be used to simulate a heavy load on a server, a group of servers, a network, or an object, to test its strength or to analyze overall performance under different load types. You can use it to make a graphical analysis of performance or to test your server/script/object behavior under heavy concurrent load.
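For actual load runs I drive JMeter in non-GUI mode, for example (the test plan and result file names are illustrative):

    # -n = non-GUI mode, -t = test plan to run, -l = file to log sample results to
    jmeter -n -t atg_load_test.jmx -l results.jtl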

2.   JvisualVM (Java VisualVM) :  Java VisualVM can be used by Java application developers to troubleshoot applications and to monitor and improve the applications' performance. Java VisualVM can allow developers to generate and analyse heap dumps, track down memory leaks, browse the platform's MBeans and perform operations on those MBeans, perform and monitor garbage collection, and perform lightweight memory and CPU profiling.
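To let VisualVM attach to a remote ATG JVM, the instance can be started with JMX remoting enabled, roughly like this (the port is arbitrary, and disabling authentication/SSL is acceptable only in a closed test environment):

    # add to the JVM options of the ATG instance you want to monitor
    JAVA_OPTS="$JAVA_OPTS -Dcom.sun.management.jmxremote \
      -Dcom.sun.management.jmxremote.port=9010 \
      -Dcom.sun.management.jmxremote.authenticate=false \
      -Dcom.sun.management.jmxremote.ssl=false"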

3.   TDA - Thread Dump Analyzer : The TDA Thread Dump Analyzer for Java is a small Swing GUI for analyzing Thread Dumps and Heap Information generated by the Sun Java VM.  
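Thread dumps for TDA can be captured from a running JVM with jstack, for example:

    # <pid> is the process id of the JVM; take several dumps a few seconds apart
    jstack -l <pid> > thread_dump_1.txt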
 
4.   MAT - Eclipse Memory Analyzer : The Eclipse Memory Analyzer is a fast and feature-rich Java heap analyzer that helps you find memory leaks and reduce memory consumption. Use the Memory Analyzer to analyze productive heap dumps with hundreds of millions of objects, quickly calculate the retained sizes of objects, see who is preventing the Garbage Collector from collecting objects, run a report to automatically extract leak suspects.
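A heap dump for MAT can be taken with jmap (the output file name is arbitrary):

    # "live" triggers a full GC first so that only reachable objects are dumped
    jmap -dump:live,format=b,file=heap.hprof <pid>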

5.  Automatic Workload Repository (AWR) in Oracle Database : The Automatic Workload Repository (AWR) collects, processes, and maintains performance statistics for problem detection and self-tuning purposes. This data is both in memory and stored in the database.

The statistics collected and processed by AWR include :
  • Object statistics that determine both access and usage statistics of database segments.
  • Time model statistics based on time usage for activities, displayed in the V$SYS_TIME_MODEL and V$SESS_TIME_MODEL views.
  • Some of the system and session statistics collected in the V$SYSSTAT and V$SESSTAT views.
  • SQL statements that are producing the highest load on the system, based on criteria such as elapsed time and CPU time.
  • Active Session History (ASH) statistics, representing the history of recent session activity.
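An AWR report covering a pair of snapshots can be generated from SQL*Plus with the standard awrrpt.sql script (this requires appropriate privileges and an Oracle Diagnostics Pack license):

    -- run as a privileged user; the script prompts for the report type,
    -- the snapshot range, and the report file name
    @?/rdbms/admin/awrrpt.sql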
I will update more on performance tuning.....