SELECT

Telematics Application Programme

RE4008

Deliverable 2.1 Draft

Functional Specifications Report

Written by:
Roland Alton-Scheidl
Richard Wheeler

Contractual Date of Delivery: 31 April 1999
Actual Date of Delivery:
Deliverable Type: RP*
Nature of the Deliverable: SP**

Workpackage WP2
Task S1
Responsibility: TUV

*Type: PU-public, LI-limited, RP-restricted
**Nature: PR-Prototype, RE-Report, SP-Specification, TO-Tool, OT-Other


Document History Table

Version | Written/changed by | Change date | Reviewed by | Major changes | Send revisions to
0.1 | Alton-Scheidl | 1999-04-14 | Wheeler | - | Alton-Scheidl
0.2 | Alton-Scheidl | 1999-04-15 | Wheeler | - | Alton-Scheidl
0.6 | Alton-Scheidl | 1999-04-25 | Palme | - | Alton-Scheidl
0.7 | Alton-Scheidl | 1999-05-06 | Wheeler, Kovacs, Micsik, Messnarz | marked red | Alton-Scheidl
0.8 | Alton-Scheidl | 1999-05-31 | - | focus on server functionality, other major changes marked green | Alton-Scheidl
0.9 | Alton-Scheidl | 1999-06-23 | Palme | use privateURIs instead of ObjectIDs, summary DB gets instant ratings database, user DB gets rate db, TrustTag added, categories may hold value/text pairs | Alton-Scheidl
1.0 | Alton-Scheidl | 1999-08-04 | Wheeler | default rating value is 0 to 9 with decimals allowed; Rater DB/record renamed to Profile DB/record and [memberOfGroup] added | Alton-Scheidl
1.1 | Alton-Scheidl | 1999-08-27 | Edinburgh Meeting | allow n instant rating values with statistical distribution frequency; Age replaced by Birthyear; separate implicit and given keywords in profiles, atomic and instant rating DBs; user interface specific parameters for profile added | Alton-Scheidl
1.2 | Alton-Scheidl | 1999-09-13 | Procter | Review comments integrated: references to user requirements; indications of basic/advanced features | Alton-Scheidl

A. Overview & Summary

Within the SELECT project we will implement a basic and an advanced SELECT service architecture as the first and second demonstrators, respectively.

A basic SELECT server will do the following

An advanced SELECT server will additionally

IP-based protocols allow clients to connect to a SELECT server in order to submit ratings, query documents, sort documents by a given rating criterion, or manage a user's account and profile.

A SELECT client may be thin (pure HTML commands, a Java applet, a browser plug-in) or a more complex application such as a groupware server or a search engine. If the client is a multi-user service that already handles user accounts (and offers unique user IDs) or is able to store user profiles, then user administration should be handled on the client side, in order to keep as little personal and private data on a SELECT server as possible.

A SELECT service will handle the ratings of URIs, which should be URLs but may also follow an additional naming or numbering scheme. Thus, URIs must be stored together with a rating. If a client (e.g. a groupware or search service) uses its own resource IDs, these can be stored under a private URL or URN scheme, so that the client can submit ratings and retrieve query results using such private IDs. In principle, a resource shall be accessible with a URL (preferred) and/or a private URI. An example of addressing a document at www.myService.net://Forums/tropical_flowers/145.html with a private pointer could look like: myService://12353679298.

Everyone is allowed to rate, but when ratings are used for filtering, not all ratings need to have the same value. For example, anonymous ratings may be valued less and expert ratings valued more, or different weights may be given in some other way to different ratings in different categories of the same document. In the query, the user may choose to use a particular weighting scheme, such as a scheme where expert ratings are given the weight 1 and all other ratings the weight 0. A SELECT server will have built-in weighting schemes, which are handled more efficiently than other weighting schemes. In particular, we will start with one fixed rating scheme, and the ratings summary database is based on one particular set of weights, in which an EXPERT's rating carries more weight and an ANONYMOUS rating considerably less.
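The weighting idea can be illustrated with a minimal Python sketch. The function name, the scheme dictionaries and all weights other than the expert-only scheme (weight 1 for experts, 0 for everyone else) are assumptions made for illustration; the actual built-in weights of a SELECT server are not fixed in this document.

```python
from typing import Iterable, Mapping, Optional, Tuple

# Expert-only scheme from the text: experts weigh 1, all other raters weigh 0.
EXPERT_ONLY = {"Expert": 1.0}

# Hypothetical built-in scheme; the numbers are illustrative assumptions only.
DEFAULT_SCHEME = {"Author": 1.0, "Expert": 2.0, "User": 1.0,
                  "Anonymous": 0.25, "Machine": 0.5}


def weighted_rating(ratings: Iterable[Tuple[str, float]],
                    scheme: Mapping[str, float]) -> Optional[float]:
    """Weighted mean of (rater_tag, value) pairs; None if no rating carries weight."""
    total_weight = 0.0
    weighted_sum = 0.0
    for rater_tag, value in ratings:
        weight = scheme.get(rater_tag, 0.0)  # tags missing from the scheme weigh 0
        total_weight += weight
        weighted_sum += weight * value
    if total_weight == 0.0:
        return None
    return weighted_sum / total_weight


sample = [("Expert", 8.0), ("Anonymous", 2.0), ("User", 6.0)]
print(weighted_rating(sample, EXPERT_ONLY))
print(weighted_rating(sample, DEFAULT_SCHEME))
```

With the expert-only scheme, only the expert's value contributes; with the hypothetical default scheme, the anonymous rating still counts but pulls the result down only slightly.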

We have prepared a functional category for the user interface to handle rewards for users whose documents have been rated highly. This is not standard functionality, as it is up to the implementors of a client to calculate and distribute rewards. One possible use of a reward scheme in a client could be that a user is not allowed to benefit from rated documents in his/her own filtering and searching unless this user also provides ratings and gives away rewards to other users. Weighting will be used in the filtering module as well.

This document focuses on the specification of a multi-user, multi-source ratings database service. However, it can also be used for a single user's collection of ratings.

The schema of the ratings format will be PICS-compliant and may be embedded in XML. We may restrict ourselves to only a subset of all kinds of categories which PICS allows.

B. User Requirements Conclusions

A detailed study of initial user requirements has been performed in the User Requirements Document, utilising questionnaires of user requirements directed at expert consortium partners, usage scenarios created to highlight the most commonly anticipated use cases, a questionnaire targeted at end users, and an overview of the field of internet resource user requirements.

Preliminary recommendations arising from this initial user requirements review include:

  • a server running the SELECT protocol on top of and complementing an existing database, communicating with Usenet and the WWW using HTML and standard news-reading protocols, and storing ratings in a standard format
  • a user with SELECT-enabled software resident on their client machine, which is compliant with existing web and news reader software and which stores a developing user profile in a standard format
  • a client-server relationship using a well-defined and flexible communication protocol supporting both the collection and retrieval of resource ratings and active and passive user roles; and whose query language includes the use of logically-marked keywords

    C. Data Structures

    Following the above-mentioned User Requirements Conclusions, we have worked out a general architecture in which a SELECT server maintains four databases, which we describe here in detail:
    1. a profiles database (storing user profiles and accounts, and possibly user groups)
    2. an atomic ratings database (storing each rating provided by a user, by user observation or by a machine)
    3. an instant ratings database (storing 'quick' or pre-computed ratings for instant queries)
    4. a rating categories database (where rating categories can be queried or registered).

    Diagram 1: SELECT Functional Architecture

    In order to develop harmonized protocols, we specify in this chapter the most important data sets used in the SELECT service architecture. The servers hold both atomic and instant ratings. Atomic ratings are used for advanced filtering calculations (social filtering, etc.) and non-trivial queries. Every rating is stored separately as an atomic entity. Instant ratings are made accessible for fast queries by any SELECT client, but could also be stored or cached in a SELECT client (e.g. a groupware server). The SELECT protocol allows a client to ask, via a SELECT server's config file, which pre-derived ratings are stored in the instant ratings DB (the default is upper-quartile quality).

    SELECT Data Model

    Notation: 
    :: defines
    [is an optional parameter]
    /* comment */
    primary indexed parameter
    
    /* Atomic Rating Records */
    /* Each individual rating is stored here */
    Rating::
    ResourceLocation,
    RaterIdentifier,   /* pointer to record in profiles database, may be a user or a machine */
    IPaddress,         /* store rater's IP address to be able to check for anonymous spam-ratings */
    Cookie,             /* store also rater's cookie in order to check for spam ratings */
    RatingValue,        /* <0 to 9 (decimals allowed)> | <any value for non-predefined rating categories, as registered > */
    RatingCategory,     /* predefined: Quality|Relevance|Language|Service */
    RatingDate,
    Adult::boolean,
    RaterTag,
    TrustTag,
    [Keywords],         /* keywords assigned by the rater */
    [RatingContext]
    /* Any resource must have URL or URN, but may be also addressed 
    using a resource ID specific to a document or groupware system */
    ResourceLocation::
    URI,             /* usually URL of resource; URL is subset of URI, see http://www.w3.org/Addressing/ */
    [privateURI],    /* a private URI scheme may be used by e.g. groupware clients to address its messages directly
                     example: w4goid:msg-d88c24f3ae-6120e08e-62761b826964 */
    /* Rating context reflects the intention of the user when rating */
    RatingContext::General|Business|Leisure|Shopping|Research|Politics
    
    /* Rater tag and trust are assigned by user interface or client */
    RaterTag::Author|Expert|User|Anonymous|Machine
    TrustTag::Signed|Registered|Pseudonymous|Anonymous
    /* RaterIdentifier is null if anonymous */
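For illustration, the atomic rating record could be mirrored in Python roughly as follows. This is a sketch under assumptions: the class name, the snake_case field names, the flattening of ResourceLocation into two fields and the validation rules are ours, and the IPaddress and Cookie fields are left out for brevity.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

RATER_TAGS = {"Author", "Expert", "User", "Anonymous", "Machine"}
TRUST_TAGS = {"Signed", "Registered", "Pseudonymous", "Anonymous"}
PREDEFINED_CATEGORIES = {"Quality", "Relevance", "Language", "Service"}


@dataclass
class AtomicRating:
    """One individual rating, following the Rating:: record of the data model."""
    uri: str
    rating_value: float
    rating_date: date
    rater_tag: str
    trust_tag: str
    rating_category: str = "Quality"
    rater_identifier: Optional[str] = None  # None (null) if the rater is anonymous
    private_uri: Optional[str] = None
    adult: bool = False
    keywords: List[str] = field(default_factory=list)
    rating_context: Optional[str] = None

    def __post_init__(self) -> None:
        if self.rater_tag not in RATER_TAGS:
            raise ValueError(f"unknown RaterTag: {self.rater_tag}")
        if self.trust_tag not in TRUST_TAGS:
            raise ValueError(f"unknown TrustTag: {self.trust_tag}")
        # Predefined categories use the range 0 to 9, decimals allowed.
        if (self.rating_category in PREDEFINED_CATEGORIES
                and not 0 <= self.rating_value <= 9):
            raise ValueError("rating value must lie between 0 and 9")


example = AtomicRating(
    uri="http://www.w3.org/Addressing/",
    rating_value=7.5,
    rating_date=date(1999, 9, 13),
    rater_tag="Anonymous",
    trust_tag="Anonymous",
)
print(example.rating_category, example.rater_identifier)
```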
    /* Instant Rating Records */
    /* Each resource (document, web page, message, etc.), which got at least one rating, 
    is stored here. Its instant rating value(s) is/are re-calculated if a new rating is done. 
    The default value (InstantRating[0]) is the upper-quartile of quality. Any other pre-derived ratings are subject to further research. */ 
    
    Resource:: 
    ResourceLocation,
    InstantRating [n] :: Value,       /* e.g. average upper quartile quality ratings values, profile-free, range is 0-9 */
                         Confidence,  /* number of ratings used for InstantRatingValue */
                         FrequencyDistribution [0-9],  /* counts distribution of ratings for each value */
    Adult::boolean,     /* set, if significant ratings indicate adult content */
    EntryDate,          /* date, when record has been added to SELECT server */
    [ResourceDate],     /* date of last change of document (as e.g. retrieved in http protocol) */
    [Size],             /* document size */
    [Title],            /* derived from TITLE or the subject if a message */
    [MetatagKeywords],  /* derived from METATAG */
    [ImplicitKeywords], /* keywords assigned by users or NLP */
    [Summary],          /* from meta tag or, if there is no contents meta tag, first three lines of BODY */
    [LanguageList],     /* language(s) in document, use RFC 1766 language codes */
    [Author],           /* document author */
    [ExpirationDate],   /* estimated or known expiration date of document. Ratings are automatically purged at this time. */
    [Content],          /* may contain a full copy of the resource e.g. the HTML source or MHTML to include also inline objects.
                           If anything and how much is stored will be depending on the SELECT service and its capabilities */
    [RatingContext]     /* rating context */
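The default instant rating, InstantRating[0], can be sketched as below. The sketch assumes that the upper quartile is taken with linear interpolation between data points (Python's "inclusive" quantile method) and that decimal ratings are counted in the frequency bin of their integer part; neither choice is fixed by this specification, and the names are illustrative.

```python
from dataclasses import dataclass
from statistics import quantiles
from typing import List, Sequence


@dataclass
class InstantRating:
    value: float                       # upper quartile of the quality ratings
    confidence: int                    # number of ratings used for the value
    frequency_distribution: List[int]  # counts per rating value 0..9


def compute_instant_rating(quality_values: Sequence[float]) -> InstantRating:
    """Derive the default instant rating from all quality ratings of one resource."""
    if not quality_values:
        raise ValueError("at least one rating is required")
    if len(quality_values) == 1:
        upper_quartile = float(quality_values[0])
    else:
        # quantiles(..., n=4) returns the three quartile cut points; index 2 is Q3.
        upper_quartile = quantiles(quality_values, n=4, method="inclusive")[2]
    distribution = [0] * 10
    for value in quality_values:
        distribution[min(int(value), 9)] += 1  # assumption: bin by integer part
    return InstantRating(upper_quartile, len(quality_values), distribution)


result = compute_instant_rating([2.0, 4.0, 6.0, 8.0])
print(result.value, result.confidence, result.frequency_distribution)
```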
    
    
    
    /* Profiles Records */
    /* user profile may be stored at SELECT client or at server or may be mirrored at server */
    Profile::
    RaterIdentifier,
    uniqueName,
    Password,
    [Cookie],
    [CookieLifetime],
    [Signature],        /* signature key to verify user for signed ratings */
    [RememberPhraseQuestion],  /* a question, which the user shall be able to answer, in case s/he forgets the password */
    [RememberPhraseAnswer],   
    [Birthyear], /* 4 digit number, year a.d., e.g. 1964 */
    [Gender],
    [GivenKeywords],    /* keywords of interest added manually by user */
    [ImplicitKeywords], /* keywords gained from discussion fora, in which user participated */
    [InterestProfile],  /* syntax to be specified, allows to define user preferences more exactly than just using keywords */
    [QueryHistory],     /* will be used as an input for some predictive methods */
    [languagesRead],    /* to be listed in a preference order, use RFC 1766 language codes */
    [rewardAccount],
    [LastRatingDate],   /* next parameters give some control on rating activity to trigger warning about spammed ratings */
    [NoOfDocsRated],
    [AverageRatingsGiven],
    [memberOfGroup]
    DisplayRatingsInfo: Boolean (default: true)
       /* This will be used to determine whether or not people want to see other 
       people's ratings as a reward for rating the page themselves. This might be 
       stored in the UID and accessed through an HTML page linked to the server for 
       updating the user's UID.  As such, it could influence what the server sends back, 
       rather than filtering it on the client side.*/
    UseRatingsFrom: Entire Web | Peer Group | Friends
       /* This will determine what group of people you wish to use as a basis for recommendations: 
       Entire Web = Summarised Database Ratings (default)
       Peer Group = Profile Matched Ratings
       Friends = User nominated group of SELECT users */
    AutomaticEnhancement: Boolean (default: true)
       /* This will determine whether or not the SELECT system will automatically 
       enhance content with ratings based information  (e.g. colouring hyperlinks, 
       inserting "thumbs up" images next to hyperlinks etc.) */ 
    KeywordQualityInfo: 0...9 (real number, default = 5)
       /* This will determine the balance between keyword matching and quality to 
       use when retrieving search results from the server. */
    ReadingSpeed: Numeric, (default = 250 words per minute)
       /* A measurement of the user's average words/minute reading speed (used to determine 
       how much of a document the user has read). */
    IncludeProfileSearch: Boolean (default: false)
       /* Determine whether or not to use profile matching when performing a search. */
    ML: Boolean (default = true)
       /* Determine whether or not to use ML algorithms and generate automatic ratings. */
    Data: Profile Data
       /* A generic data field will allow the addition / deletion of
       elements without the need to add/remove bits from the profile. Much like a
       cookie. Should have the ability to expand in size. */
    
    /* Rating Categories */
    /* 4 Rating categories are pre-registered with values
    0 to 9, any other value is allowed for other categories. */
    RatingCategory::Quality /* default! */ |Relevance|Language|Service|[...]
    /* [...]: any other category, not necessarily handled by other servers */
    /* value pair example: [1,2,3,4,5;Awful,Poor,Average,Good,Great!] */

    Service provider functions required are:
    - register a rating category with numerical values or numerical/textual value pairs
    - query possible rating categories and rating values
    - remove a rating category
    Adding and changing rating categories will be strongly access controlled.
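The three service provider functions on rating categories can be sketched with a small in-memory registry. The class and method names are assumptions, and access control is deliberately omitted here although it is required for a real server.

```python
from typing import Dict, List, Optional, Sequence


class CategoryRegistry:
    """In-memory sketch of the rating categories database (no access control)."""

    def __init__(self) -> None:
        self._categories: Dict[str, Dict[float, Optional[str]]] = {}
        # The four pre-registered categories use the values 0 to 9.
        for name in ("Quality", "Relevance", "Language", "Service"):
            self.register(name, list(range(10)))

    def register(self, name: str, values: Sequence[float],
                 labels: Optional[Sequence[str]] = None) -> None:
        """Register a category with numerical values or numerical/textual pairs."""
        if name in self._categories:
            raise ValueError(f"category already registered: {name}")
        if labels is not None and len(labels) != len(values):
            raise ValueError("values and labels must have the same length")
        if labels is None:
            labels = [None] * len(values)
        self._categories[name] = dict(zip(values, labels))

    def categories(self) -> List[str]:
        """Query which rating categories are available."""
        return sorted(self._categories)

    def values(self, name: str) -> Dict[float, Optional[str]]:
        """Query the values (and textual labels, if any) of one category."""
        return dict(self._categories[name])

    def remove(self, name: str) -> None:
        """Remove a rating category."""
        del self._categories[name]


registry = CategoryRegistry()
registry.register("Mood", [1, 2, 3, 4, 5],
                  ["Awful", "Poor", "Average", "Good", "Great!"])
print(registry.categories())
print(registry.values("Mood"))
```

The "Mood" category is a made-up name that reuses the value pair example from the data model.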
     
    [MetatagKeywords] are meant as XML or HTML style keywords specified by the author of the resource. Additionally, the rater or a third party can add keywords which describe the document. Each rater's keywords (if any) are stored with their rating entry in the atomic ratings database [GivenKeywords], and they are added to the instant ratings keywords [ImplicitKeywords]. They are stored to enable maximum flexibility for trying out different filtering techniques. For example, once an algorithm has determined that user X is similar in some way to user Y, keyword searches from user X could consult the list of user Y's keywords first and then pre-sort them by rating. A user can also add GivenKeywords to his/her profile, and keywords can be derived implicitly from the user's surfing and rating behaviour.

    Language is the natural language of the resource (if any) and, like keywords, is optional. Author is the original author of the resource, if known (not the user doing the rating). The expiry tag is a 'decay' factor, either already present in the document or assigned by the SELECT system on the fly according to the domain and application area.

    Expired entries will be purged from the instant ratings database on their expiration date.
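A purge pass over the instant ratings database might look like the following sketch. It assumes that records without an ExpirationDate never expire and that a record is removed on the expiration date itself; the mapping from URI to expiration date stands in for the full resource records.

```python
from datetime import date
from typing import Dict, Optional


def purge_expired(expiration_dates: Dict[str, Optional[date]],
                  today: date) -> Dict[str, Optional[date]]:
    """Keep only entries with no ExpirationDate or one that lies after today."""
    return {uri: expires for uri, expires in expiration_dates.items()
            if expires is None or expires > today}


records = {
    "http://example.org/old": date(1999, 6, 1),
    "http://example.org/current": date(2000, 1, 1),
    "http://example.org/undated": None,
}
remaining = purge_expired(records, today=date(1999, 9, 13))
print(sorted(remaining))
```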

    The service will support anonymous, pseudonymous and signed ratings, and every ratings service that uses our software should specify which of these alternatives it uses. For signed ratings, a signature is stored together with the user record in order to check the user's identity.

    D. Server Functionality

    Here we describe the functionality of a SELECT rating server.
    In case we set up a distributed network of SELECT servers, the functionality shall remain transparent to the interface.
    The user may be known to the SELECT server or remain anonymous (with restricted functionality).

    Rate

    A user may submit a rating to a SELECT server from a generic or a customised interface or from within another application.

    Add a rating

        The SELECT server receives a rating on a specified resource and creates an atomic rating record.
        If the resource has been rated previously, the resource record is updated: InstantRatingValue is recalculated if the rating was on quality, keywords are added if the rater has provided new keywords, and the corresponding RatingContext is reassigned if a context has been assigned by the rater. Instead of updating the resource record immediately, this could be done in a periodic background process on all resources that have been rated in the last period.
        If the resource has not been rated before, the SELECT server grabs the title, size, date and possibly keywords, language and creator (from the meta tags) from the resource and creates a resource object in the rating database.
        The SELECT server returns the new InstantRatingValue, or the old one if it is not recalculated immediately.
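The "Add a rating" steps can be sketched as follows, with the immediate (non-background) update variant. The class name and dictionary layout are assumptions, fetching the title, size and other metadata from the resource is left out, and the upper quartile is computed with Python's "inclusive" quantile method as an assumed choice.

```python
from statistics import quantiles
from typing import Dict, Iterable, List, Optional


class RatingStore:
    """In-memory stand-in for the atomic and instant ratings databases."""

    def __init__(self) -> None:
        self.atomic: List[dict] = []
        self.resources: Dict[str, dict] = {}

    def add_rating(self, uri: str, value: float, category: str = "Quality",
                   keywords: Iterable[str] = (),
                   context: Optional[str] = None) -> Optional[float]:
        keywords = list(keywords)
        # Step 1: every rating is stored separately as an atomic record.
        self.atomic.append({"uri": uri, "value": value, "category": category,
                            "keywords": keywords, "context": context})
        # Step 2: create the resource record on the first rating. A real server
        # would also fetch title, size, date etc. from the resource here.
        resource = self.resources.setdefault(
            uri, {"instant_rating": None, "keywords": set(), "context": None})
        # Step 3: only quality ratings change the default instant rating.
        if category == "Quality":
            values = [r["value"] for r in self.atomic
                      if r["uri"] == uri and r["category"] == "Quality"]
            if len(values) == 1:
                resource["instant_rating"] = values[0]
            else:
                resource["instant_rating"] = quantiles(
                    values, n=4, method="inclusive")[2]
        resource["keywords"].update(keywords)
        if context is not None:
            resource["context"] = context
        # Step 4: return the (new) instant rating value.
        return resource["instant_rating"]


store = RatingStore()
print(store.add_rating("http://example.org/a", 6.0, keywords=["flowers"]))
print(store.add_rating("http://example.org/a", 8.0, context="Leisure"))
```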

      Instant Query

      For the Instant Set Query, the Query by Example or the Filter Query, the interface may specify, which parameters from the resource records should be returned.
       

        Instant Query

        SELECT server returns InstantRatingValue[n] on a given resource URI.

        Instant Set Query

        SELECT server returns the given resource URIs in ranked order. The interface (a groupware server, a document management system) sends a list of URIs to the SELECT server and gets back a list of these URIs, sorted by an instant rating value and with rating information added. If rated URIs are sparse, an instant set query may not output any result. This query should therefore be used only on SELECT servers where a high percentage of documents is rated (possibly by machine).
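A minimal sketch of the instant set query follows. It assumes that unrated URIs are simply left out of the result and that the list is sorted with the best rating first; the function name and the plain dictionary of instant ratings are illustrative.

```python
from typing import Dict, Iterable, List, Tuple


def instant_set_query(uris: Iterable[str],
                      instant_ratings: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return the rated URIs from the request, best instant rating first."""
    rated = [(uri, instant_ratings[uri]) for uri in uris if uri in instant_ratings]
    return sorted(rated, key=lambda pair: pair[1], reverse=True)


known_ratings = {"http://example.org/a": 6.5, "http://example.org/b": 8.0}
request = ["http://example.org/a", "http://example.org/b", "http://example.org/c"]
ranked = instant_set_query(request, known_ratings)
print(ranked)
```

If none of the submitted URIs has been rated, the result is an empty list, which is the sparse case mentioned above.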

        Query by Example

        SELECT server returns resources, that match a "Query by Example".
        This is simple to implement when using SQL statements.

      Complex Query

        Filter Query

        SELECT server returns resources that match a query defined as a Boolean expression or an SQL statement.
        The parameters for the query statements may include record parameters such as Keywords, InterestProfile or Resource.
        The parameters may include any from the instant rating records, any from the atomic rating records, and a user's age, gender, keywords and rewardAccount.

        Example: all resources matching keyword "cat" but not from a resource ending
        with ".com" and rated by women older than 35 
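The example above could be expressed in SQL roughly as in the following sketch, run here against an in-memory SQLite database. The table and column names, the sample data and the gender code 'F' are all assumptions, since this document does not define a relational schema. Age is approximated from Birthyear and a reference year passed as a parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE resource (uri TEXT PRIMARY KEY, host TEXT);
CREATE TABLE resource_keyword (uri TEXT, keyword TEXT);
CREATE TABLE profile (rater_id TEXT PRIMARY KEY, gender TEXT, birthyear INTEGER);
CREATE TABLE rating (uri TEXT, rater_id TEXT, value REAL);
""")
conn.executemany("INSERT INTO resource VALUES (?, ?)", [
    ("http://cats.example.org/a", "cats.example.org"),
    ("http://cats.example.com/b", "cats.example.com"),
    ("http://pets.example.net/c", "pets.example.net"),
    ("http://felines.example.org/d", "felines.example.org"),
])
conn.executemany("INSERT INTO resource_keyword VALUES (?, ?)", [
    ("http://cats.example.org/a", "cat"),
    ("http://cats.example.com/b", "cat"),
    ("http://pets.example.net/c", "dog"),
    ("http://felines.example.org/d", "cat"),
])
conn.executemany("INSERT INTO profile VALUES (?, ?, ?)", [
    ("u1", "F", 1950), ("u2", "M", 1950), ("u3", "F", 1980),
])
conn.executemany("INSERT INTO rating VALUES (?, ?, ?)", [
    ("http://cats.example.org/a", "u1", 8.0),
    ("http://cats.example.com/b", "u1", 7.0),
    ("http://pets.example.net/c", "u1", 5.0),
    ("http://felines.example.org/d", "u2", 9.0),
    ("http://felines.example.org/d", "u3", 6.0),
])

# Keyword "cat", host not ending in ".com", rated by women older than 35.
FILTER_QUERY = """
SELECT DISTINCT r.uri
FROM resource AS r
JOIN resource_keyword AS k ON k.uri = r.uri
JOIN rating AS a ON a.uri = r.uri
JOIN profile AS p ON p.rater_id = a.rater_id
WHERE k.keyword = ?
  AND r.host NOT LIKE '%.com'
  AND p.gender = 'F'
  AND (? - p.birthyear) > 35
ORDER BY r.uri
"""
rows = conn.execute(FILTER_QUERY, ("cat", 1999)).fetchall()
matching_uris = [row[0] for row in rows]
print(matching_uris)
```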

        In future, filter queries may be enriched with collaborative filtering expressions.      

        Example: Return five URLs with keyword "book" and highly ranked by users
        who have rated resources' quality similar to me.

        Alternative: it might be possible to create a simple tag system such as (KEYWORD SEARCH: [keys]) (NEWS SEARCH: [logical formulation]), where the default, keyword search, would pass the user's query formulation (here, keywords) along with their UID and preferences (such as extent of filtering, or whether to filter at all). Having a different filtering module, NEWS would naturally have a complementary language which it would expect to find in the query formulae.

        Category Query

        An interface may ask a SELECT server which rating categories are available, or whether a certain rating category is available, and which rating range or values have been registered for a category.

      Administration

      A SELECT server may handle user accounts and may assign users to groups. Group handling is offered to simplify queries that involve many users. User group management is access controlled. A user may always change his/her own settings.

      Register Profile

      Secure, non-verified registration process. Server provides UID, may be stored as cookie.

      Change Profile

      A user who provides the correct password, or the correct answer to the remember-phrase question, may change his/her user record, including the password, the keywords and the interest list.

      Delete Profile

      User or administrator of server may remove a user's profile record.

      Register  Category

      Service provider may register a new rating category. The rating category may have numerical or verbal values. Should be access controlled.

      Import Atomic Ratings

      Imports atomic ratings in a predefined format (for SELECT server-server communication).

      Export Atomic Ratings

      Exports atomic ratings in a predefined format (for SELECT server-server communication).

      Add Group

      Add a name for a user group.

      Change Group

      Add/remove members to the group.

      Delete Group

      Remove user group.

    E. User Interface Functionality

    We have removed any detailed description of the user interface from this document, as we have noticed that there will be many variants of it and that user interfaces are very specific to application domains. The development and selection of user interfaces will instead be based on implementation trials, playing with different web techniques, such as:

    Some general recommendations for the user interface are:

    Please find some pointers to user interface examples below:

    F. References