Spider and Search System for Learning Objects (SASSLO)

The latest version of the SASSLO source code is 2.8.

SASSLO Spider and Download Engine - User Manual

1) Steps for using the engine to search for and download learning objects from web pages.
    1.0   Login as SASSLO administrator.
    1.1   Input the web address.
    1.2   Select the file types or file extensions that you wish to download.
    1.3   Input the limit parameters.
    1.4   Add the scan rules.
    1.5   Press the "download" button.
2) Check and select the downloaded objects
    2.1   Discard any unwanted objects.
    2.2   Input descriptions.

1.0 Login as SASSLO administrator.

Figure 1a. Login GUI
You must log in as the SASSLO administrator before you can use the spider.

1.1 Input the web address.

Figure 1b. Input the Web Address GUI
Enter the target web address from which the engine starts its search.

1.2 Select the file types or file extensions that you wish to download.

Figure 2a. Select File Types GUI

You can select a particular file type (e.g. "Video Files") by selecting the checkbox next to the file type.
You can also select all file types by clicking the "select all available file extensions" button, or press the "Load Default File Extensions" button to load the default file extensions (.swf, .class, .zip and .jar).

Figure 2b. Select File Extensions GUI

Each file type contains an array of file extensions. By clicking the "show details" button, you can select individual file extensions from the list. You can also add a new file extension to the system by entering the extension name and a file description, and then pressing the "Add" button.
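The relationship between file types and file extensions can be pictured with a small sketch. It is an illustration only, not SASSLO's own code: the function names and the extensions listed under "Video Files" are assumptions, and only the four default extensions come from this manual.

```python
# Default extensions as listed in this manual.
DEFAULT_EXTENSIONS = [".swf", ".class", ".zip", ".jar"]

# Each file type maps to its extensions and their descriptions.
# The entries below are illustrative assumptions, not SASSLO's real lists.
file_types = {
    "Video Files": {".avi": "AVI video", ".mpg": "MPEG video"},
}


def normalize(ext):
    """Return the extension in lower case with a leading dot."""
    ext = ext.strip().lower()
    return ext if ext.startswith(".") else "." + ext


def add_extension(types, file_type, ext, description):
    """Mimic the "Add" button: register a new extension under a file type."""
    types.setdefault(file_type, {})[normalize(ext)] = description


def select_type(types, file_type):
    """Mimic ticking a file type checkbox: return all of its extensions."""
    return sorted(types.get(file_type, {}))


def select_all(types):
    """Mimic "select all available file extensions"."""
    return sorted({ext for exts in types.values() for ext in exts})
```

For example, add_extension(file_types, "Video Files", "MOV", "QuickTime video") makes ".mov" part of what select_type(file_types, "Video Files") returns.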

1.3 Input the limit parameters.

Figure 3. Input the Limit Parameters GUI

Maximum mirroring depth:
This defines how many levels of links the engine follows. A depth of "2" means that the engine fetches the web page you entered (see 1.1 above), plus everything that can be reached through any link on that page.
The default value is "1".
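The meaning of the depth value can be sketched as a breadth-first walk in which the start page counts as depth 1. This is an illustration under assumptions, not the engine's actual algorithm: the LINKS table is a made-up stand-in for real web pages, and the function name is hypothetical.

```python
from collections import deque

# Hypothetical link graph: each page maps to the links found on it.
LINKS = {
    "start": ["a", "b"],
    "a": ["c"],
    "b": ["start"],
    "c": [],
}


def pages_within_depth(start, max_depth, links=LINKS):
    """Return the pages visited when the start page counts as depth 1."""
    seen = {start}
    order = [start]
    queue = deque([(start, 1)])
    while queue:
        page, depth = queue.popleft()
        if depth >= max_depth:
            continue  # pages at the limit are fetched, but their links are not followed
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order
```

With this sketch, a depth of 1 visits only "start", and a depth of 2 visits "start" plus the pages "a" and "b" that it links to.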

Timeout period:
Defines how long the engine waits for a response from the remote server before giving up.

Timeout period for each file download:
Defines the maximum time allowed for each file download connection.
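The three limit parameters can be thought of as one settings record. The sketch below is hypothetical: the field names, the use of seconds as the unit, the two timeout defaults and the validation rules are all assumptions. Only the default depth of 1 comes from this manual.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LimitParameters:
    max_depth: int = 1                 # default stated in this manual
    response_timeout_s: float = 30.0   # assumed default, in seconds
    download_timeout_s: float = 300.0  # assumed default, in seconds

    def __post_init__(self):
        # Assumed sanity checks; the real GUI may validate differently.
        if self.max_depth < 1:
            raise ValueError("max_depth must be at least 1")
        if self.response_timeout_s <= 0 or self.download_timeout_s <= 0:
            raise ValueError("timeouts must be positive")
```

Calling LimitParameters() with no arguments gives the documented default depth, while LimitParameters(max_depth=2) widens the search by one level of links.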

1.4 Add the scan rules.

Figure 4a. Add Scan Rules GUI
You can add filters to exclude a file extension, a host, or web pages that contain certain keywords.

1. Choose the criterion of the search link that the system will check.

Figure 4b. Select Criterion GUI
2. Input the keywords or file extensions, which are combined with the chosen criterion to form a condition that the search link must satisfy.
(e.g. if your inputs are:

  Select Criterion: "File name with extension:"
  Select Input String: "doc"

  the rule "-*.doc" will be generated, which means that all .doc files are excluded.)

   Figure 4c. Rule Generation
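
The rule generation in the example above can be sketched as follows. This is a minimal illustration, not the engine's implementation: the function names are hypothetical, the wildcard matching relies on Python's fnmatch module as an assumed stand-in for SASSLO's own matcher, and only the "File name with extension:" criterion is covered because it is the only rule format shown in this manual.

```python
import posixpath
from fnmatch import fnmatch
from urllib.parse import urlparse


def make_extension_rule(extension):
    """Build an exclusion rule such as "-*.doc" from an input string such as "doc"."""
    return "-*." + extension.strip().lstrip(".").lower()


def is_excluded(url, rules):
    """Return True if the file name in the URL matches any exclusion rule."""
    filename = posixpath.basename(urlparse(url).path).lower()
    return any(
        # A leading "-" marks an exclusion; the rest is a wildcard pattern.
        rule.startswith("-") and fnmatch(filename, rule[1:])
        for rule in rules
    )
```

For instance, make_extension_rule("doc") yields the rule "-*.doc", and is_excluded then reports any link whose file name ends in .doc as excluded. Treating file names as case-insensitive is a choice made in this sketch.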

1.5 Press the "download" button.

Figure 5. "Download" Button GUI
Press the "download" button to start the download.

2.1 Discard any unwanted objects

Figure 6. Downloaded Object GUI
After the engine has finished downloading the objects, information about the downloaded objects is displayed for checking. This information includes the link to the original web page that contains the objects, the original download link, the filename, the downloaded file size and the date added. If a downloaded object is an image, it is displayed for instant checking.

You can discard any object that you do not find useful by de-selecting its "Pick this downloaded object?" checkbox.
You can also discard all the objects downloaded from a particular web page by de-selecting the "Pick all downloaded objects with selected file extensions in this page?" checkbox.
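
The checking step can be pictured as a list of records with a "picked" flag. The sketch below is an assumption-laden illustration, not SASSLO's data model: the class, field and function names are hypothetical, and they simply mirror the information items and the two checkboxes described above.

```python
from dataclasses import dataclass


@dataclass
class DownloadedObject:
    source_page: str    # link to the web page that contains the object
    download_link: str  # original download link
    filename: str
    size_bytes: int     # downloaded file size
    date_added: str
    picked: bool = True  # every object starts out picked


def discard_page(objects, source_page):
    """Mimic de-selecting the per-page checkbox: unpick every object from one page."""
    for obj in objects:
        if obj.source_page == source_page:
            obj.picked = False


def picked_objects(objects):
    """Return only the objects that are still picked."""
    return [obj for obj in objects if obj.picked]
```

De-selecting a single object corresponds to setting its picked field to False, while discard_page unpicks every object that came from the same web page.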

2.2 Input descriptions

By default, all the checkboxes are selected, which means that all the objects are picked.

Figure 7. Input Descriptions GUI
By clicking the "Input Descriptions" button, you can input more details for the downloaded objects, such as "subject category", "author", "title", etc.
After annotating the objects, click the "update" button at the bottom of the page to store the information in the learning objects database.