Wednesday, September 14, 2011

Rebuilding the FAST Search Server 2010 index (FS4SP)

Topic: SharePoint 2010, FS4SP, FAST Search Server 2010, FIXML
Subject: Index corruption and rebuilding the FS4SP index from FIXML
Problem: My FS4SP index has become corrupt.  It originally took a week to perform our full crawls.  Is there anything we can do?
Response:  Though this situation tends to be rare, there may come a time when you need to rebuild an FS4SP index.  When FS4SP indexes items, it not only stores the physical index itself in <FASTInstallDrive>\FASTSearch\data\data_index, it also stores FIXML for each indexed item in <FASTInstallDrive>\FASTSearch\data\data_fixml.  The FIXML contains all the information that becomes that item within the index.  Search results are retrieved from the index, so why keep the FIXML around?  One reason is this exact problem: rebuilding an index.  There are two ways of fixing a corrupt index, and I will use the Solution\Example section to cover both.  (Rebuilding an index is also very important when promoting a backup column to a primary column.)  In this example I am using a three-server, single-column FS4SP farm.
FS4SP Farm: 1 Admin, 1 Primary Column, and 1 Backup Column.

Note: “fast3.mydomain.local” is the Primary Column Server.
Starting FS4SP Farm Deployment

<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14">     
   <instanceid>FAST Search Server POC</instanceid>
   <connector-databaseconnectionstring></connector-databaseconnectionstring>
 
   <!-- Admin Node -->
  <host name="fast1.mydomain.local">
               <admin />
               <webanalyzer server="true" max-targets="2" link-processing="true" lookup-db="true" redundant-lookup="true" />   
  </host>
 
  <!-- Backup Column (row 1, secondary) -->
  <host name="fast2.mydomain.local">
               <content-distributor id="0" />
               <searchengine row="1" column="0" />
               <indexing-dispatcher />
               <query />
  </host>
 
  <!-- Primary Column (row 0, primary) -->
  <host name="fast3.mydomain.local">
               <document-processor processes="4"/>
               <content-distributor id="1" />
               <indexing-dispatcher />
               <searchengine row="0" column="0" />
               <query />
  </host>
 
  <searchcluster>
<row id="0" index="primary" search="true" />
<row id="1" index="secondary" search="true" />
  </searchcluster>

</deployment>
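
To double-check which host actually holds the primary index row, the row roles in the searchcluster section can be matched against each host's searchengine element. The following is a minimal Python sketch of that lookup, not an FS4SP tool: the function name hosts_by_index_role is made up for illustration, the sample is a trimmed copy of the deployment file above, and it assumes the file declares no default XML namespace (as in the listing here).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the deployment file shown in this post (search hosts only).
SAMPLE = """<deployment version="14">
  <host name="fast2.mydomain.local">
    <searchengine row="1" column="0" />
  </host>
  <host name="fast3.mydomain.local">
    <searchengine row="0" column="0" />
  </host>
  <searchcluster>
    <row id="0" index="primary" search="true" />
    <row id="1" index="secondary" search="true" />
  </searchcluster>
</deployment>"""


def hosts_by_index_role(deployment_xml):
    """Group (host name, column) pairs by the index role of their row."""
    root = ET.fromstring(deployment_xml)
    # Row id -> "primary" / "secondary", taken from <searchcluster>.
    role_by_row = {row.get("id"): row.get("index") for row in root.iter("row")}
    roles = {}
    for host in root.iter("host"):
        for engine in host.iter("searchengine"):
            role = role_by_row.get(engine.get("row"))
            roles.setdefault(role, []).append((host.get("name"), engine.get("column")))
    return roles


print(hosts_by_index_role(SAMPLE))
```

For this farm the primary role resolves to fast3.mydomain.local, which matches the note above.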

Solution\Example:
Repairing a corrupt index using FIXML
1.      Optional (To allow for ease of observation)
a.      Clear Event Viewer entries on the Primary Column Server

b.      Clear FS4SP Logs
                                                    i.     Open a FAST Command Shell as Administrator

                                                   ii.     Execute: nctrl stop

                                                  iii.     Execute: net stop fastsearchmonitoring

                                                  iv.     Delete all files under <FASTInstallDrive>\FASTSearch\var\log

                                                   v.     Execute: nctrl start

2.      Resetting the Index

a.      An index reset can be performed from any server in the FS4SP farm and against any column or row by specifying the column and row in the reset command.  For this example, I will run the command from the server which hosts the Primary Column and will specify the row and column even though the FS4SP farm has only a single column.

b.      An index reset does not rebuild the Column from scratch.  The indexer will validate each item within the FS4SP column against the original FIXML.  Any item which is deemed not in sync or corrupt will be updated in the FS4SP column/index.  This will take less time than rebuilding the index column from scratch.

3.      Open the FAST Command Shell as Administrator on the FS4SP Admin Server

a.      In this example: fast1.mydomain.local

b.      Make sure all crawls are stopped or paused and the FS4SP Column is idle
                                                    i.     Execute: indexerinfo --row=0 --column=0 status

                                                   ii.     Look at the status of each partition
<indexer hostname="fast3.mydomain.local" port="13050" cluster="webcluster" column="0" row="0"
  <documents size="2022136316.000000" total="7046" indexed="7046" not_indexed="0"/>
  <column_role state="Master" backups="1"/>
  <index_frequence min="0.000000" max="0.000000"/>
  <partition id="0" index_id="1315841940082287000" status="idle" type="dynamic"
    <documents active="7046" total="7046"/>
  </partition>
  <partition id="1" index_id="1315328545925813000" status="idle" type="dynamic"
    <documents active="0" total="0"/>
  </partition>
  <partition id="2" index_id="1315328542666333000" status="idle" type="dynamic"
    <documents active="0" total="0"/>
  </partition>
  <partition id="3" index_id="1315328539447778000" status="idle" type="dynamic"
    <documents active="0" total="0"/>
  </partition>
  <partition id="4" index_id="1315328535885114000" status="idle" type="dynamic"
    <documents active="0" total="0"/>
  </partition>
  <document_api number_of_elements="0" last_sequence="99610"
    <queue_size current="0"/>
    <operations_processed api="0"/>
  </document_api>
</indexer>

c.      Stop the Web Analyzer and Relevancy Admin

                                                    i.     The WebAnalyzer runs on a schedule.  To keep scheduled processing from changing the index while we work on it, we will suspend both services.

                                                   ii.     Logon to the FS4SP server which hosts the WebAnalyzer.

1.      In this example: fast1.mydomain.local

                                                  iii.     Open FAST Command Shell As Administrator

                                                  iv.     Execute: waadmin showstatus

1.      The Overall Status needs to be running before we can suspend it.

                                                   v.     If the Status is paused

a.      Execute: waadmin enqueueview

b.      Repeat step iv.

                                                  vi.     Execute: waadmin AbortProcessing

                                                vii.     Execute: spreladmin AbortProcessing

4.      Issue an Index reset
a.      Logon to the Primary Index Column Server

                                                    i.     In this example: fast3.mydomain.local

b.      From the FAST Command Shell as Administrator

c.      Execute: indexeradmin --row=0  --column=0 resetindex

d.      Execute: indexerinfo --row=0 --column=0 status

                                                    i.     You may have to issue the command several times to see the work being performed.

                                                   ii.     The index reset will work through each partition of the FS4SP column.  If you can, keep executing the command to watch the progress.  Note: all the items are originally in partition 0, and they will not end up there (at least in my example, because of the number of items I have in the index).

                                                  iii.     When the index reset finishes all the items end up in partition 4. (This will vary depending on the partition spread. Just an interesting observation as to how the reset is performed under the covers)
       <indexer hostname="fast3.mydomain.local" port="13050" cluster="webcluster" column="0" row="0"
  <documents size="2022136316.000000" total="7046" indexed="7046" not_indexed="0"/>
  <column_role state="Master" backups="1"/>
  <index_frequence min="0.000000" max="0.000000"/>
  <partition id="0" index_id="1315841940082287000" status="idle" type="dynamic"
    <documents active="7046" total="7046"/>
  </partition>
  <partition id="1" index_id="1315328545925813000" status="idle" type="dynamic"
    <documents active="0" total="0"/>
  </partition>
  <partition id="2" index_id="1315328542666333000" status="idle" type="dynamic"
    <documents active="0" total="0"/>
  </partition>
  <partition id="3" index_id="1315328539447778000" status="idle" type="dynamic"
    <documents active="0" total="0"/>
  </partition>
  <partition id="4" index_id="1315328535885114000" status="indexing (6%)" type="dynamic"
    <documents active="0" total="0"/>
  </partition>
  <document_api number_of_elements="0" last_sequence="99610"
    <queue_size current="0"/>
    <operations_processed api="0"/>
  </document_api>
</indexer>

e.      Alternative ways to watch the process.  (Especially if you experiment with very little data in the system as the index reset will work fast enough that you may not see the status changing)

                                                    i.     From Event Viewer

1.      Open the Windows Event Viewer

2.      Expand the Applications and Services Logs Node

3.      Open the FAST Search Logs

4.      You will find several entries similar to the following:

    indexer_admin_servant: Reset index requested.
    state::runtime: Indexing suspended
    state::runtime: Indexing resumed
    work_order 4_1315848480044895000: Index State Reset - Not using incremental indexing
    master_indexing_thread (p:4,j:1): Completed Index State Reset. Normal indexing enabled.

f.       From FAST Logs

                                                    i.     Using Windows Explorer Navigate to <FASTInstallDrive>\FASTSearch\var\log\indexer folder

                                                   ii.     Open indexer.txt

                                                  iii.     Search for “Reset index requested” and you will see something similar to the following:
INFO       indexer indexer_admin_servant: Reset index requested.
VERBOSE    indexer rts::indexing::util: Suspending indexing
INFO       indexer state::runtime: Indexing suspended
INFO       indexer state::runtime: Indexing resumed
VERBOSE    indexer percentage_file_distributor: Partition 4 should have 100% of the docs. Num docs 7046 target : 6000000. New range: 1-714
INFO       indexer work_order 4_1315848480044895000: Index State Reset - Not using incremental indexing
VERBOSE    indexer index_producer (4): Index 1315848480044895000 completed. OK docs: 7046, failed docs: 0, errors: 0, range: 1-714, 0 exclusionlisted
VERBOSE    indexer percentage_file_distributor: Partition 3 has empty range.
INFO       indexer work_order 3_1315848737366889000: Index State Reset - Not using incremental indexing
VERBOSE    indexer index_producer (3): Index 1315848737366889000 completed. OK docs: 0, failed docs: 0, errors: 0, range: 0-0, 0 exclusionlisted
VERBOSE    indexer percentage_file_distributor: Partition 2 has empty range.
INFO       indexer work_order 2_1315848740067419000: Index State Reset - Not using incremental indexing
VERBOSE    indexer index_producer (2): Index 1315848740067419000 completed. OK docs: 0, failed docs: 0, errors: 0, range: 0-0, 0 exclusionlisted
VERBOSE    indexer percentage_file_distributor: Partition 1 has empty range.
INFO       indexer work_order 1_1315848743251859000: Index State Reset - Not using incremental indexing
VERBOSE    indexer index_producer (1): Index 1315848743251859000 completed. OK docs: 0, failed docs: 0, errors: 0, range: 0-0, 0 exclusionlisted
VERBOSE    indexer percentage_file_distributor: Partition 0 has empty range.
INFO       indexer work_order 0_1315848746358249000: Index State Reset - Not using incremental indexing
VERBOSE    indexer index_producer (0): Index 1315848746358249000 completed. OK docs: 0, failed docs: 0, errors: 0, range: 0-0, 0 exclusionlisted
VERBOSE    indexer search_controller_holder: Activating index set '0_1315848746358249000,1_1315848743251859000,2_1315848740067419000,3_1315848737366889000,4_1315848480044895000', 7046 active docs, 0 exclusionlisted.
INFO       indexer master_indexing_thread (p:4,j:1): Completed Index State Reset. Normal indexing enabled.
VERBOSE    dictionary_producer Loading rc-file: 'C:\FASTSE~1\var/etc/findexrc'
VERBOSE    dictionary_producer Reading index configuration from C:\FASTSearch\data\data_index\fast3.mydomain.local.normalized.temp\index.cf
VERBOSE    indexer dictionary_builder: New dictionaries ready (fast3.mydomain.local.normalized.1315848819)

g.      Resume WebAnalyzer and Relevancy Admin once the index reset completes.

                                                    i.     Open FAST Command Shell As Administrator on the FS4SP Admin node or the Server with the WebAnalyzer service enabled.

                                                   ii.     Execute: waadmin EnqueueView

                                                  iii.     Execute: spreladmin Enqueue


h.      Test your Search Center
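
The idle check before the reset and the "keep re-running the status command" step can be scripted. The following is a rough Python sketch that reads the raw text printed by indexerinfo --row=0 --column=0 status and the text of indexer.txt. It uses plain regular expressions rather than an XML parser because the output pasted in this post is truncated. The function names are my own, and the two log phrases are simply the ones that appear in the excerpts above.

```python
import re

# Matches lines such as: <partition id="4" index_id="..." status="indexing (6%)" ...
PARTITION_RE = re.compile(r'<partition id="(\d+)"[^\n]*?status="([^"]*)"')


def partition_statuses(status_text):
    """Map partition id -> status string from raw indexerinfo status output."""
    return {int(pid): status for pid, status in PARTITION_RE.findall(status_text)}


def column_is_idle(status_text):
    """True only if at least one partition was found and all report 'idle'."""
    statuses = partition_statuses(status_text)
    return bool(statuses) and all(s == "idle" for s in statuses.values())


def reset_completed(log_text):
    """True once the latest reset request is followed by the completion message."""
    start = log_text.rfind("Reset index requested")
    return start != -1 and "Completed Index State Reset" in log_text[start:]


# Two partitions copied from the status output captured during the reset.
SAMPLE = '''  <partition id="3" index_id="1315328539447778000" status="idle" type="dynamic"
    <documents active="0" total="0"/>
  </partition>
  <partition id="4" index_id="1315328535885114000" status="indexing (6%)" type="dynamic"
    <documents active="0" total="0"/>
  </partition>'''

print(partition_statuses(SAMPLE))
print(column_is_idle(SAMPLE))
```

A wrapper could capture the command output (for example with subprocess) and poll until column_is_idle returns True before resuming the WebAnalyzer and Relevancy Admin. That part is left out here because it depends on how the FAST Command Shell is launched in your environment.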

Rebuilding an index from FIXML
1.      Open the FAST Command Shell as Administrator on the FS4SP Admin Server

a.      In this example: fast1.mydomain.local

b.      Make sure all crawls are stopped or paused and the FS4SP Column is idle
                                                    i.     Execute: indexerinfo --row=0 --column=0 status

                                                   ii.     Look at the status of each partition
<indexer hostname="fast3.mydomain.local" port="13050" cluster="webcluster" column="0" row="0"
  <documents size="2022136316.000000" total="7046" indexed="7046" not_indexed="0"/>
  <column_role state="Master" backups="1"/>
  <index_frequence min="0.000000" max="0.000000"/>
  <partition id="0" index_id="1315841940082287000" status="idle" type="dynamic"
    <documents active="0" total="0"/>
  </partition>
  <partition id="1" index_id="1315328545925813000" status="idle" type="dynamic"
    <documents active="0" total="0"/>
  </partition>
  <partition id="2" index_id="1315328542666333000" status="idle" type="dynamic"
    <documents active="0" total="0"/>
  </partition>
  <partition id="3" index_id="1315328539447778000" status="idle" type="dynamic"
    <documents active="0" total="0"/>
  </partition>
  <partition id="4" index_id="1315328535885114000" status="idle" type="dynamic"
    <documents active="7046" total="7046"/>
  </partition>
  <document_api number_of_elements="0" last_sequence="99610"
    <queue_size current="0"/>
    <operations_processed api="0"/>
  </document_api>
</indexer>

c.      Stop the Web Analyzer and Relevancy Admin

                                                    i.     The WebAnalyzer runs on a schedule.  To keep scheduled processing from changing the index while we work on it, we will suspend both services.

                                                   ii.     Logon to the FS4SP server which hosts the WebAnalyzer.


                                                  iii.     Open FAST Command Shell As Administrator

                                                  iv.     Execute: waadmin showstatus

1.      The Overall Status needs to be running before we can suspend it.

                                                   v.     If the Status is paused

a.      Execute: waadmin enqueueview

b.      Repeat step iv.

                                                  vi.     Execute: waadmin AbortProcessing

                                                vii.     Execute: spreladmin AbortProcessing


2.      Rebuild the Primary Column Index
a.      Logon to the Primary Index Column Server

b.      In this example: fast3.mydomain.local

c.      From the FAST Command Shell as Administrator

d.      Execute: nctrl stop

e.      Using Windows Explorer, navigate to <FASTInstallDrive>\FASTSearch\data\

f.       Delete the data_index folder

g.      Execute: nctrl start

h.      Execute: indexerinfo --row=0 --column=0 status

                                                    i.     Notice total="7046", indexed="0" and not_indexed="7046".  The total is still known because these numbers are generated from the FIXML.

                                                   ii.     Much like the index reset, you will see the status of the partitions change.

                                                  iii.     Keep re-issuing the indexerinfo --row=0 --column=0 status to watch the progress

                                                  iv.     Note the difference between the index reset and rebuilding the index.  The index reset moved the items from one partition to another while keeping the index populated.  The rebuild from scratch started with an indexed count of zero: indexed="0".
<indexer hostname="fast3.mydomain.local" port="13050" cluster="webcluster" column="0" row="0"
ndex="0">
  <documents size="2022136316.000000" total="7046" indexed="0" not_indexed="7046"/>
  <column_role state="Master" backups="0"/>
  <index_frequence min="0.000000" max="0.000000"/>
  <partition id="0" index_id="0" status="idle" type="dynamic"
    <documents active="0" total="0"/>
  </partition>
  <partition id="1" index_id="0" status="idle" type="dynamic"
    <documents active="0" total="0"/>
  </partition>
  <partition id="2" index_id="0" status="idle" type="dynamic"
    <documents active="0" total="0"/>
  </partition>
  <partition id="3" index_id="0" status="idle" type="dynamic"
    <documents active="0" total="0"/>
  </partition>
  <partition id="4" index_id="0" status="indexing (8%)"
    <documents active="0" total="0"/>
  </partition>
  <document_api number_of_elements="0" last_sequence="99610" frequence="0.000000"
    <queue_size current="0"/>
    <operations_processed api="0"/>
  </document_api>
</indexer>

i.       Much like the index reset above the alternative ways to watch the progress are through the Event Viewer and the indexer.txt log.  The messages will differ but the same end result will occur. 

j.       Resume WebAnalyzer and Relevancy Admin once the index rebuild completes.

                                                    i.     Open FAST Command Shell As Administrator on the FS4SP Admin node or the Server with the WebAnalyzer service enabled.

                                                   ii.     Execute: waadmin EnqueueView

                                                  iii.     Execute: spreladmin Enqueue


k.      Test your Search Center
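
While the rebuild runs, the column-level documents line in the status output is the quickest progress indicator. The following is a small Python sketch that pulls total, indexed and not_indexed out of the raw status text and turns them into a fraction. The function name rebuild_progress is hypothetical, and exactly when the indexer updates these counters during a rebuild is something I have only observed, not something documented here.

```python
import re

# Only the column-level line carries size=...; the per-partition
# <documents active=... total=.../> lines are deliberately not matched.
DOCS_RE = re.compile(
    r'<documents size="[^"]*" total="(\d+)" indexed="(\d+)" not_indexed="(\d+)"'
)


def rebuild_progress(status_text):
    """Return document counters and the indexed fraction, or None if not found."""
    match = DOCS_RE.search(status_text)
    if match is None:
        return None
    total, indexed, not_indexed = (int(value) for value in match.groups())
    # An empty column is treated as fully indexed to avoid dividing by zero.
    fraction = indexed / total if total else 1.0
    return {
        "total": total,
        "indexed": indexed,
        "not_indexed": not_indexed,
        "fraction": fraction,
    }


# The documents line captured right after deleting data_index and restarting.
JUST_STARTED = '<documents size="2022136316.000000" total="7046" indexed="0" not_indexed="7046"/>'

print(rebuild_progress(JUST_STARTED))
```

When not_indexed reaches 0 and every partition reports idle, the rebuild is done and the WebAnalyzer and Relevancy Admin can be resumed.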

Conclusion:
For each item indexed into the FS4SP index, a FIXML file is created representing the item within the index.  The FIXML stored on the FS4SP servers does take up additional storage, but it serves several useful functions: validating security, debugging crawled/managed properties and, as in this example, fixing a corrupted index or even rebuilding an index.
Special Note: Rebuilding the index from scratch requires a lot more free disk space than an index reset.  Temporary files are created and released, and you may consume twice as much disk space as the final index.  If you need to rebuild from scratch, watch the amount of free disk space.  If free space runs out (the minimum is about 2GB), the process will fail and immediately start again.  Until the lack of free storage space is addressed, the process will continue to cycle.
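
A quick pre-flight check before deleting data_index can save a rebuild from cycling. The following Python sketch is a rule of thumb only: the factor of two and the roughly 2GB floor come from the note above, but adding them together into one threshold is my own conservative assumption, and the helper names are made up for illustration.

```python
import shutil

GIB = 1024 ** 3


def required_free_bytes(expected_index_bytes, peak_factor=2.0, min_free_bytes=2 * GIB):
    """Assumed headroom: peak temp usage (factor x index size) plus a free-space floor."""
    return int(expected_index_bytes * peak_factor) + min_free_bytes


def has_rebuild_headroom(free_bytes, expected_index_bytes):
    """True if the drive has at least the assumed headroom for a rebuild."""
    return free_bytes >= required_free_bytes(expected_index_bytes)


def free_bytes_on(path):
    """Free bytes on the drive that holds `path` (e.g. the FASTSearch data folder)."""
    return shutil.disk_usage(path).free


# Example: an index expected to end up around 2 GiB.
print(required_free_bytes(2 * GIB) // GIB, "GiB of free space suggested")
```

The size of the old data_index folder, measured before it is deleted, is a reasonable stand-in for expected_index_bytes.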