I used a CSV file for indexing. Sometimes I faced data corruption or indexing failures. Below are the possible reasons.
Indexing fails for any one of the reasons below.
1. The CSV data contains an extra column because the delimiter character is present in the source data.
2. A property's length exceeds the configured value.
Solution: Encode any delimiter character found in the source data before writing it to the CSV file. Control the property length during CSV file extraction.
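
The solution above can be sketched roughly as follows. This is only an illustration, not the extraction code I actually used: the function name clean_value, the "|" delimiter, the "%7C" encoding and the 255 character limit are all assumptions, so replace them with the values from your own pipeline configuration.

```python
def clean_value(value, delimiter="|", encoded="%7C", max_length=255):
    """Encode the delimiter inside a value and cap the value's length.

    The delimiter, its encoded form and max_length are assumed values
    chosen for illustration only.
    """
    if value is None:
        return ""
    pieces = []
    length = 0
    for ch in str(value):
        # Swap the raw delimiter for its encoded form.
        piece = encoded if ch == delimiter else ch
        # Stop before the limit so an encoded delimiter is never cut in half.
        if length + len(piece) > max_length:
            break
        pieces.append(piece)
        length += len(piece)
    return "".join(pieces)
```

Calling clean_value on every source value before it is written means a stray delimiter can no longer create an extra column, and no value grows past the configured length.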
Data corruption happens due to improper handling of empty columns in the CSV file. When a column is missing, the subsequent column values are assigned to the missing column's header property. For example, suppose you have three columns A, B and C in the CSV file. If some record r has no value for column A, then the column B value is assigned to the column A header and the column C value is assigned to the column B header source property. This results in data corruption. If Endeca is not behaving as expected, verify the generated CSV file.
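
One way to verify the generated file is to compare the column count of every line against the header line. The sketch below assumes a "|" delimiter and a file without quoted fields; the function name find_bad_lines is made up for this example.

```python
def find_bad_lines(lines, delimiter="|"):
    """Return (line_number, column_count) for every row whose column
    count differs from the header line. Line numbers start at 1.

    Assumes a plain delimited file with no quoting; the delimiter is
    an assumed value.
    """
    rows = [line.rstrip("\r\n") for line in lines]
    if not rows:
        return []
    expected = len(rows[0].split(delimiter))
    bad = []
    for number, row in enumerate(rows[1:], start=2):
        count = len(row.split(delimiter))
        if count != expected:
            bad.append((number, count))
    return bad
```

Any iterable of lines works here, so an open file object can be passed directly. For the lines "A|B|C", "1|2|3", "2|3" it reports line 3 with 2 columns, which is exactly the record whose values would shift under the wrong headers.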
Note: When you use a CSV file, the number of columns in each line of the file must match the number of columns in the header (source property) line. For every empty or missing value, write an empty column with the proper delimiter. The CSV file generation (extraction) code must be robust enough to handle these scenarios.
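
A minimal sketch of this rule, assuming each record is available as a dictionary keyed by header name: build every line from the header list so that a missing value still produces an empty column. The function name format_row and the "|" delimiter are assumptions for illustration.

```python
def format_row(record, header, delimiter="|"):
    """Build one CSV line with a column for every header name.

    A missing or None value becomes an empty column, so the values
    never shift under the wrong header. The delimiter is an assumed
    value.
    """
    values = []
    for name in header:
        value = record.get(name)
        values.append("" if value is None else str(value))
    return delimiter.join(values)
```

With the header A, B, C and a record that has no value for A, the line starts with an empty column, so B and C stay under their own headers.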