Tuesday, 31 March 2015

Create And Provision Endeca Application with the Deployment Template


Here are the steps to create an Endeca application from scratch and provision it using the deployment template.

Application Creation

      1. Open a command line, change directory to ..\...\ToolsAndFrameworks\3.1.2
          \deployment_template\bin, and invoke the deploy script.

           For example: C:\Endeca\ToolsAndFrameworks\3.1.2\deployment_template\bin\deploy.bat
 
      2. Provide configuration parameters/values as the script prompts you. First it will confirm
          the IAP version. Then it will prompt for the following information:
  • Application name.
  • Application deployment directory.
  • EAC port.
  • Workbench port.
  • Live Dgraph port.
  • Authoring Dgraph port.
  • Log Server port.
Screen captures for application creation:

For port values you can use any available port on your machine, or use the defaults. At the end you will get a message that the application was deployed successfully.
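On a Windows machine, the interaction might look like the sketch below. The application name, deployment directory, and port values are examples only, and the prompt labels are paraphrased; accept the defaults where they suit you.

```bat
cd C:\Endeca\ToolsAndFrameworks\3.1.2\deployment_template\bin
deploy.bat
REM The script confirms the IAP version, then prompts for values such as:
REM   Application name        > MyApp
REM   Deployment directory    > C:\Endeca\apps
REM   EAC port                > 8888
REM   Workbench port          > 8006
REM   Live Dgraph port        > 15000
REM   Authoring Dgraph port   > 15002
REM   Log Server port         > 15010
```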

The application is now created; it is time to provision it. Once the application is provisioned, it will be available for configuration in Endeca Workbench.

To provision an application:

Go to the \control directory of the application you created in the steps above.
Invoke the initialize_services.bat (or initialize_services.sh) script. This script does not prompt for any values.
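Assuming the application was deployed to C:\Endeca\apps under the name MyApp (both are example values from the creation step above), provisioning looks like this:

```bat
cd C:\Endeca\apps\MyApp\control
initialize_services.bat
```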

Here is the screen capture of application provisioning.
 
 

Sunday, 15 March 2015

Speed Up ATG Application Using Caching

Here I am going to introduce the repository-level caching mechanism in ATG. Caching is critical to application performance.


The rule of thumb for caching is:

Design the application so that it requires minimal access to the database while still ensuring data integrity.

Each item descriptor in a SQL repository has its own two types of cache:
  1. Item cache
  2. Query cache

1. Item cache: The item cache holds property values for repository items. It is indexed by repository item ID.

2. Query cache: The query cache holds the repository IDs of items that match particular queries.

Advantages of having an item cache and a query cache at each item descriptor level:
  • The cache size can be set for each item type separately.
  • The cache can be flushed for each item type separately (selective cache invalidation).
 
How does it work?

Repository queries are performed in two passes, using two separate SELECT statements.

1. Repository ID fetching (query cache): The first statement gathers the IDs of the repository items that match the query. The repository first checks the query cache to see whether the same query is already cached. If it is, the matching IDs are returned from the query cache and no query is fired against the database. If it is not, the repository fires the query against the database, creates an entry in the query cache for it, and returns the results. This is very useful when repeated queries are common.

2. Repository item fetching (item cache): The SQL repository then examines the result set from the first SELECT statement and finds any items that already exist in the item cache. A second SELECT statement retrieves from the database only the items that are not in the item cache.
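As a sketch, the two passes correspond to two SELECT statements like the following. The table and column names here are illustrative, not actual ATG schema:

```sql
-- Pass 1: gather the matching item IDs
-- (answered from the query cache when the same query was run before)
SELECT sku_id FROM sku_table WHERE display_name LIKE 'Shirt%';

-- Pass 2: load property values only for the IDs missing from the item cache
SELECT * FROM sku_table WHERE sku_id IN ('sku0001', 'sku0003');
```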

An interesting fact about the query cache: when the application executes a query on the ID property itself, for example id="1234", ATG does not create an entry in the query cache, because the query parameter and the ID returned by the query are the same. It bypasses the first query and works directly with the second query and the item cache.

Cache Tuning :

1. Query cache tuning: It is generally safe to set the size of the query cache to 1000 or higher. Query caches contain only the query parameters and the string IDs of the result-set items, so large query cache sizes can usually be handled comfortably without running out of memory. A query whose parameters change infrequently is a good candidate for caching.

2. Item cache tuning: The item cache size should be large enough to accommodate the number of items in the repository. For example, if a repository contains 50,000 SKUs, the sku item cache size should be 50,000. Item cache size needs to be set carefully, as it requires more memory than the query cache. Test application performance with different item cache sizes based on the available hardware.
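In a SQL repository, both sizes are set as attributes on the item descriptor in the repository definition XML. A minimal sketch (the surrounding template contents are omitted; only the two cache attributes matter here):

```xml
<!-- e.g. in the repository definition file for the product catalog -->
<gsa-template>
  <item-descriptor name="sku"
                   item-cache-size="50000"
                   query-cache-size="1000">
    <!-- table and property definitions omitted -->
  </item-descriptor>
</gsa-template>
```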

Good Luck..!!

Saturday, 7 March 2015

Resolving Performance Puzzle

It is challenging for beginners to start performance tuning an application. I was in the same situation when I got my first performance tuning assignment. Before starting with performance tuning, we need to understand its aspects.


In simple words, performance tuning is the improvement of system performance. Most systems respond to increased load with some degree of decreasing performance. The core of performance tuning is performance testing.


Here is the list of steps to execute systematic tuning.
  1. Assess the issue and get the numbers to baseline the performance (mostly available/defined in the Non-Functional Requirements [NFR] document).
  2. Measure the performance of the system.
  3. Identify the bottlenecks (the parts of the system that are critical for performance).
  4. Modify the system to remove or relieve a bottleneck.
  5. Measure the system's performance after the modification.
  6. If the modification improves performance, adopt it; otherwise revert it.
  7. Repeat steps 2 to 6 in cycles until the performance requirement is met.
These are high-level steps. The most challenging question that comes to mind is:

Where (in which part of the application) should performance testing start?

Most web applications are modularized/divided into three major parts:
  1. Database.
  2. Back-end code (including third-party calls).
  3. Front-end.

It is always better to start at the database and end at the front-end.

Sunday, 1 March 2015

ATG Application Performance Tuning Toolkit




Here is the list of tools I use for application performance analysis and tuning.

 1.   JMeter : The Apache JMeter™ desktop application is open source software, a 100% pure Java application designed to load test functional behavior and measure performance. Apache JMeter may be used to test performance on both static and dynamic resources (files, dynamic web languages such as PHP, Java, and ASP.NET, Java objects, databases and queries, FTP servers, and more). It can be used to simulate a heavy load on a server, group of servers, network, or object to test its strength or to analyze overall performance under different load types. You can use it to make a graphical analysis of performance or to test your server/script/object behavior under heavy concurrent load.
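For sustained load runs, JMeter is usually driven in non-GUI mode from the command line; the test plan and output file names below are examples:

```shell
# Run a test plan headless and record results for later analysis/graphing
jmeter -n -t loadtest.jmx -l results.jtl
#   -n  non-GUI mode
#   -t  test plan to run
#   -l  file in which to log sample results
```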

2.   JvisualVM (Java VisualVM) :  Java VisualVM can be used by Java application developers to troubleshoot applications and to monitor and improve the applications' performance. Java VisualVM can allow developers to generate and analyse heap dumps, track down memory leaks, browse the platform's MBeans and perform operations on those MBeans, perform and monitor garbage collection, and perform lightweight memory and CPU profiling.

3.   TDA - Thread Dump Analyzer : The TDA Thread Dump Analyzer for Java is a small Swing GUI for analyzing Thread Dumps and Heap Information generated by the Sun Java VM.  
 
4.   MAT - Eclipse Memory Analyzer : The Eclipse Memory Analyzer is a fast and feature-rich Java heap analyzer that helps you find memory leaks and reduce memory consumption. Use the Memory Analyzer to analyze productive heap dumps with hundreds of millions of objects, quickly calculate the retained sizes of objects, see who is preventing the Garbage Collector from collecting objects, run a report to automatically extract leak suspects.
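Both TDA and MAT work on dumps captured from a running JVM. With the standard JDK tools, the captures look like this (the PID 12345 is a placeholder for the process ID of your JVM):

```shell
# Capture a thread dump for analysis in TDA
jstack 12345 > threads.dump

# Capture a binary heap dump of live objects for analysis in MAT
jmap -dump:live,format=b,file=heap.hprof 12345
```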

5.  Automatic Workload Repository (AWR) in Oracle Database : The Automatic Workload Repository (AWR) collects, processes, and maintains performance statistics for problem detection and self-tuning purposes. This data is both in memory and stored in the database.

The statistics collected and processed by AWR include :
  • Object statistics that determine both access and usage statistics of database segments.
  • Time model statistics based on time usage for activities, displayed in the V$SYS_TIME_MODEL and V$SESS_TIME_MODEL views.
  • Some of the system and session statistics collected in the V$SYSSTAT and V$SESSTAT views.
  • SQL statements that are producing the highest load on the system, based on criteria such as elapsed time and CPU time.
  • ASH statistics, representing the history of recent session activity.
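Assuming the Diagnostics Pack is licensed and you have DBA privileges, an AWR report for a snapshot interval can be generated from SQL*Plus:

```sql
-- Optionally force snapshots at the start/end of the test window
EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT;

-- Generate the report; the script prompts for the report format
-- and the begin/end snapshot IDs
@?/rdbms/admin/awrrpt.sql
```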
I will post more on performance tuning.

Friday, 27 February 2015

ATG Datasource Debugging / Print SQL statements for repository queries

Here are the steps to debug datasource problems.

1.    Datasource Debugging on WebLogic or WebSphere.

To add datasource debugging, first rename the datasource used by your applications (usually JTDataSource.properties). Then create a new JTDataSource.properties file with the following contents:
$class=atg.service.jdbc.WatcherDataSource
dataSource=/atg/dynamo/service/jdbc/DirectJTDataSource
showOpenConnectionsInAdmin=false
logDebugStacktrace=false
loggingDebug=false
monitored=false
loggingSQLError=true
loggingSQLWarning=false
loggingSQLInfo=false
loggingSQLDebug=false 
Second, create a DirectJTDataSource.properties file with the following contents:
$class=atg.nucleus.JNDIReference
JNDIName=java:/ATGSolidDS 
Where ATGSolidDS is replaced by the JNDI name of your application server data source.
Place both properties files in your localconfig directory. To enable data source debugging, set the monitored property and the loggingSQLInfo property in the JTDataSource.properties file to true.

Note: Due to the potential performance impact, this feature should be used only in a development environment. Do not enable SQL debugging in a production site.

2.    Datasource Debugging on JBoss.

The default JTDataSource for JBoss allows you to monitor and log data source information for debugging purposes. It does this using the WatcherDataSource class. A WatcherDataSource “wraps” another data source, allowing debugging of the wrapped data source. For example:
/atg/dynamo/service/jdbc/JTDataSource.properties
$class=atg.service.jdbc.WatcherDataSource
# The actual underlying DataSource.
dataSource=/atg/dynamo/service/jdbc/DirectJTDataSource
Note: Due to the potential performance impact, the features described here should be used only for debugging in a development environment. Do not use datasource logging in a production environment unless absolutely necessary.
WatcherDataSource Configuration
The default WatcherDataSource configuration is:
showOpenConnectionsInAdmin=false
logDebugStacktrace=false
loggingDebug=false
monitored=false
loggingSQLError=true
loggingSQLWarning=false
loggingSQLInfo=false
loggingSQLDebug=false 
This default configuration logs the following information:
  • currentNumConnectionsOpen
  • maxConnectionsOpen
  • numGetCalls
  • averageGetTime
  • maxGetTime
  • numCloseCalls
  • averageCloseTime
  • maxCloseTime
  • averageOpenTime
  • maxOpenTime
     
    For additional debugging information, you can set the following properties to true:
    • showOpenConnectionsInAdmin—Lists currently open connections, along with the amount of time they have been held open and the thread that is holding them open. This information is useful for identifying Connection leaks. If logDebugStacktrace is also true, then stacktraces are displayed as well.
      Note: This momentarily prevents connections from being obtained or returned from the DataSource, so severely affects performance.
    • loggingDebug—Logs debug messages on every getConnection() and close() call. These messages include interesting information such as sub-call time, number of open connections, and the calling thread. If logDebugStacktrace is also true then a stacktrace is logged as well.
    • logDebugStacktrace—Creates stacktraces on each getConnection() call. This allows the calling code to be easily identified, which can be useful when trying to find Connection leaks, code that is holding Connections open for too long, or code that is grabbing too many Connections at a time.
      Note: This is done by generating an exception, which affects performance.
    • monitored—Gathers additional connection statistics and SQL logging.



Saturday, 14 February 2015

MDEX Engine 6.4 language-specific, dictionary-based linguistic analysis features

Here is the list of new language-specific features of MDEX 6.4.

Segmentation: The process of breaking up text in non-whitespace-delimited languages into meaningful units.
Tokenization: The process of breaking a stream of text up into words, phrases, symbols, or other meaningful elements.
Orthographic normalization: The creation of a standard indexed form for diacritic marks.
Decompounding: The decomposition of compound word forms into their base terms.
Dynamic stemming: The process of determining the base form of a word, based on dictionary entries and language-specific rules.
Stop words: A list of words to be ignored by the Endeca MDEX Engine. Sample stop word lists are now provided for each supported language.

Thursday, 5 February 2015

Update JSP files in compressed war file

Here are the steps to update a jsp file in a compressed war.
  1. Extract the war and update the file as needed.
  2. Open the original war file using the 7-Zip tool.
  3. Navigate to the file's location inside the war (in the 7-Zip explorer).
  4. Drag the modified file into 7-Zip.
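If a JDK is on the path, the same update can be done from the command line with the jar tool instead of 7-Zip. The war and file names below are examples; the path you pass must match the entry's path inside the archive:

```shell
# Run from the directory that contains the WEB-INF folder
# holding the modified JSP
jar uf myapp.war WEB-INF/jsp/home.jsp
```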