SELECT Functional Specifications

Document History Table

Version	written/changed by	change date	reviewed by	major changes	send revisions to
0.1	Alton-Scheidl	1999-04-14	Wheeler		Alton-Scheidl
0.2	Alton-Scheidl	1999-04-15	Wheeler		Alton-Scheidl
0.6	Alton-Scheidl	1999-04-25	Palme		Alton-Scheidl
0.7	Alton-Scheidl	1999-05-06	Wheeler, Kovacs, Micsik, Messnarz	marked red	Alton-Scheidl
0.8	Alton-Scheidl	1999-05-31		focus on server functionality, other major changes marked green	Alton-Scheidl
0.9	Alton-Scheidl	1999-06-23	Palme	use privateURIs instead of ObjectIDs, summary DB gets instant ratings database, user DB gets rate db, TrustTag added, categories may hold value/text pairs	Alton-Scheidl
1.0	Alton-Scheidl	1999-08-04	Wheeler	default rating value is 0 to 9 with decimals allowed; Rater DB/record renamed to Profile DB/record and [memberOfGroup] added;	Alton-Scheidl
1.1	Alton-Scheidl	1999-08-27	Edinburgh Meeting	allow n instant rating values with statistical distribution frequency; Age replaced by Birthyear; separate implicit and given keywords in profiles, atomic and instant rating DBs; user interface specific parameters for profile added	Alton-Scheidl
1.2	Alton-Scheidl	1999-09-13	Procter	Review comments integrated: references to user requirements; indications of basic/advanced features	Alton-Scheidl

A. Overview & Summary

Within the SELECT project we will implement a basic and an advanced SELECT service architecture as the first and second demonstrators, respectively.

A basic SELECT server will do the following:

store incoming ratings
output sets of rated documents (with or without full document content) when queried according to given rules
list preconfigured rating categories

An advanced SELECT server will additionally:

register users and manage their profiles
manage user groups
allow rating categories to be registered
allow rating data to be exchanged between networked servers

IP-based protocols allow clients to connect to a SELECT server in order to submit ratings, query documents, sort documents by a given rating criterion, or manage a user's account and profile.

A SELECT client may be thin (pure HTML commands, a Java applet, a browser plug-in) or a more complex application such as a groupware server or a search engine. If the client is a multi-user service that already handles user accounts (and offers unique user IDs) or is able to store user profiles, then user administration should be handled on the client side, so that as little personal and private data as possible is held on a SELECT server.

A SELECT service will handle the ratings of URIs, which should be URLs but may also follow an additional naming or numbering scheme. Thus, URIs must be stored together with a rating. If a client (e.g. a groupware or search service) uses its own resource IDs, these can be stored as a private URL or URN scheme, so that query results can be submitted and retrieved with such private IDs. In principle, a resource shall be accessible with a URL (preferred) and/or a private URI. For example, a private pointer addressing a document at www.myService.net://Forums/tropical_flowers/145.html could look like: myService://12353679298.
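As a minimal sketch of the URL and private URI pairing described above, a rated resource could be referenced as follows. The `ResourceRef` record and the `scheme_of` helper are hypothetical names introduced only for illustration; the specification does not define them.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class ResourceRef:
    """Hypothetical reference to a rated resource: a URL, a private URI, or both."""
    url: Optional[str] = None
    private_uri: Optional[str] = None

    def __post_init__(self):
        # A resource must be reachable through at least one of the two.
        if self.url is None and self.private_uri is None:
            raise ValueError("a resource needs a URL and/or a private URI")

    def preferred(self) -> str:
        # The URL is preferred when present; otherwise use the private URI.
        return self.url if self.url is not None else self.private_uri


def scheme_of(uri: str) -> str:
    """Return the (lower-cased) scheme of a URI, used to tell private schemes apart."""
    return urlsplit(uri).scheme
```

With the private pointer from the example, `ResourceRef(private_uri="myService://12353679298").preferred()` returns the private URI, since no URL is stored for it.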

Everyone is allowed to rate, but when ratings are used for filtering, not all ratings need to have the same value. For example, anonymous ratings may be valued less and expert ratings valued more, or different weights may be put on different ratings in different categories of the same document in some other way. In the query, the user may choose a particular weighting scheme, such as one where expert ratings are given the weight 1 and all other ratings the weight 0. A SELECT server will have built-in weighting schemes, which are handled more efficiently than other weighting schemes. In particular, we will start with one fixed rating scheme, and the ratings summary database is based on one particular set of weights, in which an EXPERT's rating carries more weight and an ANONYMOUS rating considerably less.
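The weighting idea can be sketched as a weighted mean over (rater class, value) pairs. The numeric weights in `DEFAULT_SCHEME` and the `REGISTERED` class are assumptions chosen for illustration, since the specification fixes no values; only the expert-only scheme (weight 1 for experts, 0 for everyone else) is taken from the text.

```python
from typing import Dict, Iterable, Optional, Tuple

# Illustrative weights only; "REGISTERED" is a hypothetical rater class.
DEFAULT_SCHEME: Dict[str, float] = {"EXPERT": 2.0, "REGISTERED": 1.0, "ANONYMOUS": 0.25}
# The scheme mentioned in the text: expert ratings count, all others do not.
EXPERT_ONLY_SCHEME: Dict[str, float] = {"EXPERT": 1.0}


def weighted_rating(ratings: Iterable[Tuple[str, float]],
                    scheme: Dict[str, float]) -> Optional[float]:
    """Weighted mean of (rater_class, value) pairs; None if no rating carries weight."""
    total = 0.0
    weight_sum = 0.0
    for rater_class, value in ratings:
        weight = scheme.get(rater_class, 0.0)  # classes missing from the scheme get weight 0
        total += weight * value
        weight_sum += weight
    return total / weight_sum if weight_sum > 0 else None
```

Passing the scheme as a parameter mirrors the idea that the user chooses the weighting scheme in the query, while a server could pre-compute results for its built-in schemes.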

We have prepared a functional category for the user interface to handle rewards for users whose documents have been rated highly. This is not standard functionality, as it is up to the implementors of a client to calculate and distribute rewards. One possible use of a reward scheme in a client could be that a user is not allowed to benefit from rated documents in his/her own filtering and searching unless this user also provides ratings and gives away rewards to other users. Weighting will also be used in the filtering module.

This document focuses on the specification of a multi-user and multi-sourced ratings database service. However, it can also be used for a single user's collection of ratings.

The schema of the ratings format will be PICS-compliant and may be embedded in XML. We may restrict ourselves to only a subset of all kinds of categories which PICS allows.

B. User Requirements Conclusions

A detailed study of initial user requirements has been performed in the User Requirements Document, utilising questionnaires of user requirements directed at expert consortium partners, usage scenarios created to highlight the most commonly anticipated use cases, a questionnaire targeted at end users, and an overview of the field of internet resource user requirements.

Preliminary recommendations arising from this initial user requirements review include:

a server running the SELECT protocol on top of, and complementing, an existing database; communicating with Usenet and the WWW using HTML and standard news-reading protocols; and storing ratings in a standard format

a user with SELECT-enabled software resident on their client machine, which is compatible with existing web and news reader software and which stores a developing user profile in a standard format

a client-server relationship using a well-defined and flexible communication protocol supporting both the collection and retrieval of resource ratings and active and passive user roles; and whose query language includes the use of logically-marked keywords

C. Data Structures

Following the above-mentioned User Requirements Conclusions, we have worked out a general architecture in which a SELECT server maintains four databases, which we describe here in detail:

a profiles database (storing user profiles and accounts) and possibly storing user groups
an atomic ratings database (storing each rating provided by a user, by user observation or by a machine)
an instant ratings database (storing 'quick' or pre-computed ratings for instant queries)
a rating categories database (where rating categories can be queried or registered).

In order to develop harmonized protocols, this chapter specifies the most important data sets to be used in the SELECT service architecture. The servers hold both atomic and instant ratings. Atomic ratings are used for advanced filtering calculations (social filtering, etc.) and non-trivial queries. Every rating is stored separately as an atomic entity. Instant ratings are made accessible for fast queries by any SELECT client, but could also be stored or cached in a SELECT client (e.g. a groupware server). The SELECT protocol allows a client to query a SELECT server's config file to find out which pre-derived ratings are stored in the instant ratings DB (default is upper quartile quality).
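The relationship between atomic and instant ratings can be sketched as a pre-computation step. This is a sketch under stated assumptions: the `AtomicRating` fields are hypothetical, and "upper quartile" is read here as the 75th percentile of the atomic rating values per resource and category, which the specification does not spell out.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import quantiles
from typing import Dict, List, Tuple


@dataclass
class AtomicRating:
    """Hypothetical atomic rating record: one rating by one rater."""
    uri: str
    rater_id: str
    category: str
    value: float


def build_instant_ratings(atomic: List[AtomicRating]) -> Dict[Tuple[str, str], float]:
    """Pre-compute one value per (uri, category) from the atomic ratings."""
    grouped = defaultdict(list)
    for rating in atomic:
        grouped[(rating.uri, rating.category)].append(rating.value)

    instant = {}
    for key, values in grouped.items():
        if len(values) == 1:
            # A single rating is its own summary.
            instant[key] = values[0]
        else:
            # The third quartile cut point is the 75th percentile.
            instant[key] = quantiles(values, n=4, method="inclusive")[2]
    return instant
```

A client issuing an instant query would then read from the returned mapping rather than scanning every atomic rating.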

SELECT Data Model

[MetatagKeywords] are meant as XML or HTML style keywords specified by the author of the resource. Additionally, the rater or a third party can add keywords which describe the document. Each rater's keywords (if any) are stored with their rating entry in the atomic ratings database [GivenKeywords], and they are added to the instant ratings keywords [ImplicitKeywords]. They are stored to enable maximum flexibility for trying out different filtering techniques. For example, once an algorithm has determined that user X is similar in some way to user Y, keyword searches from user X could consult the list of user Y's keywords first, and then pre-sort them by rating. A user can also add GivenKeywords to his/her profile, and keywords can be derived implicitly from the user's surfing and rating behaviour.

Language is the natural language of the resource (if any), and like keywords, is optional. Author is the original author of the resource if known (not the user doing the rating) and expiry tag is a 'decay' factor either already in the document or assigned by the SELECT system on the fly according to the domain and application area.

Expired entries will be purged from the instant ratings database on their expiration date.
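A purge step of this kind could look roughly as follows. The dictionary layout of the instant ratings database and the `expires` field name are assumptions made for the sketch; entries without an expiry date are kept.

```python
from datetime import date
from typing import Dict


def purge_expired(instant_db: Dict[str, dict], today: date) -> int:
    """Remove entries whose 'expires' date has been reached; return how many were removed."""
    expired = [uri for uri, entry in instant_db.items()
               if entry.get("expires") is not None and entry["expires"] <= today]
    # Delete in a second pass so the dictionary is not modified while iterating.
    for uri in expired:
        del instant_db[uri]
    return len(expired)
```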

The service will support anonymous, pseudonymous and signed ratings, and every ratings service that uses our software should specify which of these alternatives to use. For signed ratings, a signature is stored together with a user record in order to check the user's identity.

D. Server Functionality

Here we describe the functionality of a SELECT rating server.
If we set up a distributed network of SELECT servers, the functionality shall remain transparent to the interface.
The user may be known to the SELECT server or remain anonymous (with restricted functionality).

Rate

A user may submit a rating to a SELECT server from a generic or a customised interface or from within another application.

Add a rating

Instant Query

[n]

Instant Set Query

Query by Example

Complex Query

Filter Query

Alternative: it might be possible to create a simple tag system such as (KEYWORD SEARCH: [keys]) (NEWS SEARCH: [logical formulation]), where the default, keyword search, would pass the user's query formulation (here, keywords) along with their UID and preferences (such as extent of filtering, or whether to filter at all). Having a different filtering module, NEWS would naturally have a complementary language which it would expect to find in the query formulae.

Category Query

Administration

Register Profile
Secure, non-verified registration process. The server provides a UID, which may be stored as a cookie.
Change Profile
A user who provides the correct password, or the correct answer to a reminder question, may change his/her user record, including the password, or may change the keyword or interest list.
Delete Profile
The user or the administrator of the server may remove a user's profile record.
Register Category
A service provider may register a new rating category. The rating category may have numerical or verbal values. Registration should be access controlled.
Import Atomic Ratings
Imports atomic ratings in a predefined format (for SELECT server-server communication).
Export Atomic Ratings
Exports atomic ratings in a predefined format (for SELECT server-server communication).
Add Group
Add a name for a user group.
Change Group
Add members to or remove members from the group.
Delete Group
Remove user group.
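
The profile and group operations listed above could be sketched with an in-memory stand-in for the profiles database. The class name `AdminStore`, the method names, and the UID format are all assumptions for illustration; authentication and access control are left out, and removing a deleted profile from its groups is a design choice of this sketch rather than something the specification requires.

```python
import uuid
from typing import Dict, Iterable, Set


class AdminStore:
    """Hypothetical in-memory stand-in for the profiles database."""

    def __init__(self) -> None:
        self.profiles: Dict[str, dict] = {}
        self.groups: Dict[str, Set[str]] = {}

    def register_profile(self, **fields) -> str:
        uid = uuid.uuid4().hex  # server-provided UID; the format is an assumption
        self.profiles[uid] = dict(fields)
        return uid

    def delete_profile(self, uid: str) -> None:
        self.profiles.pop(uid, None)
        # Assumption: a deleted profile is also dropped from every group.
        for members in self.groups.values():
            members.discard(uid)

    def add_group(self, name: str) -> None:
        self.groups.setdefault(name, set())

    def change_group(self, name: str, add: Iterable[str] = (),
                     remove: Iterable[str] = ()) -> None:
        members = self.groups[name]  # raises KeyError if the group does not exist
        members.update(add)
        members.difference_update(remove)

    def delete_group(self, name: str) -> None:
        self.groups.pop(name, None)
```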

E. User Interface Functionality

We have removed any detailed description of the user interface from this document, as we have noticed that there will be many variants of it and that user interfaces are very specific to application domains. The development and selection of user interfaces will instead be based on implementation trials, experimenting with different web techniques, such as:

OS- and browser-independent plug-ins or applets
proxy injection mechanisms
extended usage of JavaScript
frame-based ratings
server-side solutions (CGI, servlets)

Some general recommendations for the user interface are:

Rating interaction on a web page should neither cause a page reload nor disrupt the browsing process.
Depending on the application, users who rate should be offered the option to register with the rating service in order to use enhanced rating features such as collaborative rating queries.
For default rating categories which can hold a value from 0-99, it is recommended to use a scale such as 0-9 and multiply by 11.
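
The last recommendation amounts to a simple mapping in which the ends of the 0-9 scale land exactly on 0 and 99. A minimal sketch follows, with `to_category_value` as a hypothetical helper name and integer input assumed:

```python
def to_category_value(scale_value: int) -> int:
    """Map a 0-9 interface rating onto the 0-99 category range by multiplying by 11."""
    if not 0 <= scale_value <= 9:
        raise ValueError("scale value must be between 0 and 9")
    return scale_value * 11
```

For instance, a rating of 5 on the interface scale is stored as 55, and the top rating of 9 is stored as 99.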

Please find some pointers to user interface examples below:

F. References

SELECT Draft Protocol Specifications Report
SELECT Draft Module Specifications Report
SELECT Interface Draft Functional Specification
SELECT User Requirements Report