Administrating DS 4


Indexing

Indexing/URLs

This page lets you assign or register URLs to be indexed, URLs not to be indexed, external filters, and file extensions to be indexed.

  • Indexing URLs
  • URLs to be added/modified

    Type in new or modified URLs of sites you wish to index in the box. Then choose from the following options.

    Options

    - Bypass web robot rules (1): Check this box if you want DeepSearch to index URLs even if web robot rules (e.g., robots.txt) apply to them.
      Example: 1http://www.namo.com
    - Index CGI pages (2): Check this box if you want DeepSearch to follow and index CGI pages or files (e.g., bulletin boards, guest books).
      Example: 2http://www.namo.com
    - Do not Index (4): Check this box if you do not wish to index the site.
      Example: 4http://www.namo.com

    The values of the checked options are added together and prefixed to the URL. If you check all the boxes and then click the add or modify button, the new or modified URL is assigned the value 7 (1+2+4), e.g., 7http://www.namo.com.
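
The way option values combine into a single prefix can be sketched in Python. This is only an illustration of the arithmetic described above, not DeepSearch code: the class and function names are hypothetical, and treating a prefix of 0 as "no options checked" is an assumption.

```python
from enum import IntFlag
import re


class IndexOption(IntFlag):
    """Option values from the list above; the member names are illustrative."""
    BYPASS_ROBOT_RULES = 1
    INDEX_CGI_PAGES = 2
    DO_NOT_INDEX = 4


def encode_entry(url: str, options: IndexOption) -> str:
    """Prefix the URL with the sum of the selected option values."""
    return f"{int(options)}{url}"


def decode_entry(entry: str) -> tuple[IndexOption, str]:
    """Split a prefixed entry back into its option flags and URL."""
    match = re.fullmatch(r"([0-7])(.+)", entry)
    if match is None:
        raise ValueError(f"no option prefix found in {entry!r}")
    return IndexOption(int(match.group(1))), match.group(2)
```

Because each option is a distinct power of two, any sum from 0 to 7 identifies exactly one combination of checked boxes.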

    Specify CGI Handler

    If you wish to index non-HTML files in the database, you need to specify the URL of a CGI program that connects DeepSearch and the database to be indexed. In most cases, the program will be in the /cgi-bin/ directory and the URL will be http://www.namo.com/cgi-bin/. If you are using DeepSearch with a web hosting service, your CGI URL may be in the form of http://xxx.xxx.xxx/~userid/cgi-bin/.
    Refer to the CGI-Handler link at http://www.namo.com/deepsearch/ for more information on what CGI handlers are and how to create them.

  • Use Exclusion Rules

    Type in the URLs, directories, or files you do not wish to index, using the appropriate variables and operators. For example, to exclude http://www.namo.com/download/index.html from the URLs to be indexed, type the following exclusion rule in the box:
    $SITE=www.namo.com&$PATH=/download/&$FILE=index.html
    Rules are not case-sensitive: DeepSearch treats uppercase and lowercase letters as the same. To express the same rule with the $URL variable alone, type the following in the box:
    $URL=www.namo.com/download/index.html
    To exclude two or more sites, directories, or files, combine conditions with operators such as & (AND), | (OR), * (wildcard), and ^ (substring).
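
To make the rule format concrete, here is a minimal Python sketch of how such a rule could be checked against a URL. It is not DeepSearch's matcher. It assumes that $PATH means the directory part of the URL and $FILE the final path segment, handles only the & operator and the * wildcard, and leaves | and ^ out. The function names are hypothetical.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def url_parts(url: str) -> dict[str, str]:
    """Split a URL into the pieces the rule variables refer to (lower-cased)."""
    parts = urlsplit(url.lower())
    directory, _, filename = parts.path.rpartition("/")
    return {
        "$SITE": parts.netloc,
        "$PATH": directory + "/",
        "$FILE": filename,
        "$URL": parts.netloc + parts.path,
    }


def is_excluded(url: str, rule: str) -> bool:
    """Return True when every '&'-joined condition in the rule matches the URL."""
    parts = url_parts(url)
    for condition in rule.split("&"):
        variable, _, pattern = condition.partition("=")
        value = parts.get(variable.strip().upper())
        if value is None:
            raise ValueError(f"unknown variable in {condition!r}")
        # Both sides are lower-cased, so matching ignores letter case.
        if not fnmatchcase(value, pattern.strip().lower()):
            return False
    return True
```

With these assumptions, both example rules above exclude http://www.namo.com/download/index.html, and a rule such as $PATH=/down* would exclude everything under directories whose path starts with /down.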

  • External Filters

    By assigning external filters, you can view non-HTML files on web browsers as well. DeepSearch already assigns filter programs for common document formats such as DOC, XLS, PPT, and PDF files. If there is a file format you wish to add to the list, type in its filename extension in the box and assign a filter program from the drop-down menu. If there is a specific document that needs a specific filter program, use “File Manager” to upload it.

  • File Extensions to be Indexed

    In this section, you can add, delete, or modify file extensions to be indexed.

 

Indexing/Options

Indexing/Options is composed of Basic, Advanced, Level Check, and Automatic Indexing options. Setting these options appropriately lets DeepSearch perform its search operations more accurately and quickly.

    Basic Options

     

    Indexer

    Choose the type of indexer. You can either choose to use the default or the advanced indexer (if installed).

    Web Robot Speed

    Choose the speed at which the robot will retrieve the pages from the target website for indexing. If the target website and DeepSearch are on the same computer (server), choose 0 for the fastest speed. If the target website is on a remote server, choose 2 or greater. (0=fastest, 9=slowest)

    Indexing Method

    Full indexing takes a long time. You can instead choose Incremental to have DeepSearch index only added or modified pages, or Smart to have DeepSearch use artificial intelligence to index only the pages that are visited most often.

    Robot Threads

    Set the number of robots (threads) to use during indexing. Increasing the number of threads increases the indexing speed, but it also increases the traffic on the target web sites.

    HTTP Header Information (name:value)

    Assign a header, in name:value form, that gives DeepSearch permission to index sites that require authentication. Administrators can grant this permission by having the site's authentication logic check for the same header value that is set here.
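
On the web site's side, the check could look like the following Python sketch. It is an illustration under stated assumptions, not part of DeepSearch: the header name X-Indexer-Token and both function names are made up for this example, and a real site would do this check in whatever language its authentication code uses.

```python
def parse_header_setting(setting: str) -> tuple[str, str]:
    """Split a 'name:value' setting into a header name and value."""
    name, separator, value = setting.partition(":")
    if not separator or not name.strip():
        raise ValueError(f"expected 'name:value', got {setting!r}")
    return name.strip(), value.strip()


def is_trusted_indexer(request_headers: dict[str, str], setting: str) -> bool:
    """Return True when the request carries the header configured for the indexer."""
    name, expected = parse_header_setting(setting)
    # HTTP header names are case-insensitive, so compare them lower-cased.
    received = {key.lower(): value for key, value in request_headers.items()}
    return received.get(name.lower()) == expected
```

A request whose headers include the configured name and value is treated as coming from the indexer; any other request goes through the normal login path.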

    CGI Handler Page

    Set the amount of time to allow and the number of attempts to make when following CGI pages.

    Advanced Options

     

    Index Numbers

    Check this box if you wish to index numerical characters as well.

    Follow Commented Links

    Check this box if you wish to follow and index links inside HTML comments, such as <!-- <a href="aa.html">xxxxx</a> -->.

    Index Words in Stop Lists

    Check this box if you want DeepSearch to index very common words such as is, and, or, etc.

    Follow Javascript/Flash Links

    Check this box if you wish to follow and index links found in JavaScript and Flash.

    Number of DB files

    DeepSearch saves indexed files in the form of a database. Here you assign the number of database files to use. In most cases one DB file is enough, but if there are a lot of pages to be indexed, increase the number of DB files. If you choose two or more DB files when there aren't many pages to be indexed, indexing and searching can become slower.

    Detect Image Sizes

    In order to display the original size (width and height) of an image file on the result page, you may choose to record the image size while indexing, to detect it while searching, or not to record it at all.

    Size of Indexing Buffer

    Assign the number of words you wish to temporarily save in buffer memory (RAM) during indexing. If you assign a large number, the indexing processes will be faster but use more memory space. If you assign a small number, the indexing process will be slower but use less memory space as well.

    Directory Depth Limit

    DeepSearch indexes by following trails of links starting from the target URL. However, the indexing operation will never stop if there are circular links or if a CGI program keeps generating new links without end. To prevent this, you may limit the depth to which DeepSearch will index. The default value of 10 means that DeepSearch stops after following links 10 times from the target URL.
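
The effect of a depth limit can be sketched with a small breadth-first crawl in Python. This is a generic illustration, not DeepSearch's robot: the function names and the /cal.cgi URL are hypothetical, and the limit is interpreted as a number of link hops, as in the description above.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(start: str,
          get_links: Callable[[str], Iterable[str]],
          depth_limit: int = 10) -> list[str]:
    """Breadth-first walk that stops following links after depth_limit hops."""
    visited = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth >= depth_limit:
            continue  # limit reached: index the page but do not follow its links
        for target in get_links(page):
            if target not in visited:  # the visited set handles circular links
                visited.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order


def endless_calendar(page: str) -> list[str]:
    """Stand-in for a CGI page that always links to a new 'next' page."""
    number = int(page.rsplit("=", 1)[1])
    return [f"/cal.cgi?page={number + 1}"]
```

A set of visited pages is enough to stop circular links, but a page generator like endless_calendar never repeats a URL, so only the depth limit ends the crawl.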

    Number of Batch Files

    Assign how many documents DeepSearch should process at once during indexing. A smaller value slows processing but uses less hard disk space. If your web server is short of disk space, decreasing this value is highly recommended.

    Maximum File Size

    Here you can limit the size of the files to index. Web documents are usually fairly small. A very large HTML file (10 megabytes, for example) is probably a non-HTML file, such as a multimedia file, that has been given the .html extension. If DeepSearch tries to index such a file, an error will occur.

    ** Performing only incremental indexing can lead to inaccurate search results. To prevent this, perform a full indexing after every X incremental indexing runs.
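
The alternation described in the note can be written out as a short Python sketch. The function name and the choice of X = 3 in the usage note are illustrative only. The source does not give a recommended value for X.

```python
def plan_runs(total_runs: int, incrementals_per_full: int) -> list[str]:
    """Plan indexing runs so a full run follows every X incremental runs."""
    plan = []
    incrementals_done = 0
    for _ in range(total_runs):
        if incrementals_done == incrementals_per_full:
            plan.append("full")
            incrementals_done = 0  # start counting incremental runs again
        else:
            plan.append("incremental")
            incrementals_done += 1
    return plan
```

With X = 3, seven consecutive runs would be planned as three incremental runs, one full run, and three more incremental runs.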

    Level Check

    Some sites restrict access by requiring authorization or a login. There can be many levels of authorization, and DeepSearch supports 16 of them, from level 0 to level 15. To index such pages, you must use the CGI handler (*.php, *.asp, *.jsp, etc.) to find out the cookie level value and save this information into the DeepSearch indexing database.
    - Cookie Name: Type in the name of the CGI handler variable that holds the cookie level information.
    For example, the following CGI-handler, login_proc.php, will have a cookie name cookie_level:


    - Program Name: Type in the file name (without the extension) of a program that decodes cookie level values, if the values are encoded.
    An example source of a decoding program:


    Automatic Indexing

    To keep search results up to date, it is highly recommended to index as often as possible. Because starting every indexing run by hand is inconvenient, Namo DeepSearch can start indexing automatically.
    - Specify Time of Day: Specify the desired day of the week and the time of day.
    - Specify Interval: Specify the desired interval in minutes between indexing jobs.
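
The two scheduling modes differ only in how the next start time is computed. The following Python sketch shows one way to compute it. It is not DeepSearch's scheduler, the function names are hypothetical, and weekdays are numbered with Monday as 0 as in Python's datetime module.

```python
from datetime import datetime, time, timedelta


def next_weekly_run(now: datetime, weekday: int, at: time) -> datetime:
    """Next occurrence of the given weekday (Monday=0) and time, strictly after now."""
    days_ahead = (weekday - now.weekday()) % 7
    candidate = datetime.combine(now.date(), at) + timedelta(days=days_ahead)
    if candidate <= now:
        candidate += timedelta(days=7)  # today's slot has passed; use next week
    return candidate


def next_interval_run(last_run: datetime, interval_minutes: int) -> datetime:
    """Next run when indexing repeats at a fixed interval in minutes."""
    return last_run + timedelta(minutes=interval_minutes)
```

A fixed time of day suits sites that should be indexed during quiet hours, while an interval suits sites whose content changes throughout the day.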
     


Indexing/Categories

This option lets you structure the contents of your website into categories. By doing so, users can better understand the organization of your web sites.


    Category ID

    Type in the desired category name in the box and click the add button.

    ** The number of _ characters indicates the depth. For example, the Namo home page is _/1.Namo, the Namo/Products page is __/2.Namo/Products, and Namo/Products/DeepSearch 4 is ___/3.Namo/Products/DeepSearch4.

    Category Rules

    Here you assign the URL, site, path, or file location that belongs to each Category ID. Select the appropriate Category ID from the drop-down menu and assign a rule to it. Refer to the following example, which uses these categories:
    Namo/Products/DeepSearch4.2
    Namo/Products/WebEditor4
    Namo/Company
    Namo/Downloads

    1) Namo/Products/DeepSearch4.2 =$PATH=/products/DeepSearch4.2/*
    2) Namo/Products/WebEditor4 =$PATH=/products/WebEditor4/*
    3) Namo/Products =$PATH=/products/*
    4) Namo/Company =$PATH=/company/*
    5) Namo/Downloads =$PATH=/download/*
    6) Namo =$SITE=www.namo.com

     

     

    ** Input order is important. Letter case, empty characters, and spaces in the variable values are ignored. If a file name contains special characters, such as the # in download#1.html, enclose the value in double quotes.

    ** After organizing your site into categories, you need to index it again.
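
One way to see why input order matters is to model the rules as an ordered list in which the first matching rule decides the category. That first-match behavior is an assumption made for this Python sketch, not something the manual states, and the function name is hypothetical. For simplicity, $PATH is compared here against the full URL path.

```python
from fnmatch import fnmatchcase
from typing import Optional
from urllib.parse import urlsplit

# (category, variable, pattern) in input order, taken from the example above.
RULES = [
    ("Namo/Products/DeepSearch4.2", "$PATH", "/products/DeepSearch4.2/*"),
    ("Namo/Products/WebEditor4", "$PATH", "/products/WebEditor4/*"),
    ("Namo/Products", "$PATH", "/products/*"),
    ("Namo/Company", "$PATH", "/company/*"),
    ("Namo/Downloads", "$PATH", "/download/*"),
    ("Namo", "$SITE", "www.namo.com"),
]


def categorize(url: str, rules=RULES) -> Optional[str]:
    """Return the category of the first rule that matches, or None."""
    parts = urlsplit(url.lower())
    values = {"$SITE": parts.netloc, "$PATH": parts.path}
    for category, variable, pattern in rules:
        # Lower-casing both sides makes the comparison ignore letter case.
        if fnmatchcase(values[variable], pattern.lower()):
            return category
    return None
```

Under this assumption, listing the most specific rules first sends a DeepSearch 4.2 page to Namo/Products/DeepSearch4.2. If the general $SITE rule came first, every page on www.namo.com would land in the top-level Namo category.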

 



Copyright © 1997-2004 SJ Namo, Inc. All rights reserved.