Date: Tue, 16 Mar 1993 10:16:57 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: "T.C. CRAVEN" <35007_567@uwovax.uwo.ca>
Subject: Re: Spreading Activation

----------------------------Original message----------------------------

In reply to the request of Alicia Abrahamson (March 10), I am making use of a spreading activation model with a thesaurus as part of a computer-aided abstracting toolkit that I am developing. I've submitted a paper on this topic for the 1993 ASIS Annual Meeting.

The basic idea is to get a set of weighted thesaurus terms from an initial set of keywords. The terms can then be displayed to the abstractor as suggestions for use in the abstract, or they can be used in weighting full-text sentences to determine the most interesting for a particular purpose. A similar approach could no doubt be used for indexing assistance.

An article worth examining might be: Paice, C.D. (1991). "A thesaural model of information retrieval." Information Processing and Management 27(5): 433-447.

Tim Craven, U of Western Ontario
35007_567@UWOVAX.UWO.CA
=========================================================================
Date: Tue, 16 Mar 1993 10:18:44 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: David Lewis
Subject: Query: Automated aids to controlled vocabulary indexing

----------------------------Original message----------------------------

In a recent discussion on PACS-L, several people expressed the belief that artificial intelligence software was more likely to be of use as an aid to human indexing than as a replacement. I know of a couple of systems of this sort that have been fielded by Carnegie Group. I would be interested in getting pointers to other such software that currently exists, or to published research in this area.
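[The spreading-activation idea Craven describes above can be illustrated with a minimal sketch. The toy thesaurus fragment, the decay factor, and the iteration count below are illustrative assumptions, not details of his actual toolkit.]

```python
# Minimal sketch of spreading activation over a thesaurus: seed keywords
# start with weight 1.0, and each pass sends a decayed share of every
# term's activation along its thesaurus links (BT/NT/RT relationships).

def spread_activation(thesaurus, seed_terms, decay=0.5, iterations=2):
    """Return a dict mapping each reached term to an activation weight."""
    weights = {term: 1.0 for term in seed_terms}
    for _ in range(iterations):
        pulses = {}
        for term, weight in weights.items():
            for neighbour in thesaurus.get(term, []):
                # Each neighbour receives a decayed share of the activation.
                pulses[neighbour] = pulses.get(neighbour, 0.0) + weight * decay
        for term, pulse in pulses.items():
            weights[term] = weights.get(term, 0.0) + pulse
    return weights

# Hypothetical thesaurus fragment (purely illustrative):
thesaurus = {
    "indexing": ["abstracting", "thesauri"],
    "abstracting": ["indexing", "summarization"],
    "thesauri": ["indexing"],
}
weights = spread_activation(thesaurus, ["indexing"])
# Terms linked to the seed keyword now carry nonzero weight and could be
# shown to the abstractor as ranked term suggestions.
```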
I'm particularly interested in systems that suggest controlled vocabulary categories to be assigned, with the human indexer having the opportunity to confirm or override the machine's suggestion. If you reply to me I will post a summary of replies to this list, and will include your name along with your comment unless otherwise requested.

thanks, dave
=========================================================================
Date: Tue, 16 Mar 1993 10:19:58 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: Edie Rasmussen
Subject: Re: Seeking guidelines for abstracting

----------------------------Original message----------------------------

There doesn't seem to have been too much written in recent years on abstracting. Certainly the "Indexing & Abstracting" books written in the last few years have only 1 or at most 2 chapters on abstracting, with the rest on indexing (and I must admit I structure my I&A course about the same way).

There was some early work by Borko, and he co-authored a book on abstracting: H. Borko and C.L. Bernier (1975). Abstracting Concepts and Methods. New York: Academic. There's also a standard (Z39.14) for writing abstracts, which has some examples of informative and indicative abstracts. And going way back, the article on abstracts and abstracting in the Encyclopedia of Library and Information Science (vol. 1!) gives some good examples of what not to do... verbosity, redundancy, etc.

More recently, Raya Fidel did some work on evaluating the potential of abstract writing for enhancing retrieval performance in an online environment (Information Processing & Management 22:309 (1986) and Journal of Documentation 42:11 (1986)). As Kate McCain suggests, if you can get hold of in-house guidelines from database producers, they often provide useful examples.
Cheers,
Edie Rasmussen
School of Library and Information Science
University of Pittsburgh
(emr1@vms.cis.pitt.edu)
=========================================================================
Date: Wed, 17 Mar 1993 09:31:26 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: David Lewis
Subject: response to my query on query length

----------------------------Original message----------------------------

Here's a summary of responses to my query on studies about the length of queries people make to online retrieval systems. Many thanks to all of you who responded!

Best wishes,
Dave
****************************************************************************
From: Allyson Carlyle

Are you interested in online catalog searches? I just finished reading a dissertation that analyzed subject searches in an online catalog, and I believe she looked at how many words the searches consisted of: Marilyn A. Lester. Coincidence of User Vocabulary and Library of Congress Subject Headings: Experiments to Improve Subject Access in Academic Library Online Catalogs. Dissertation, Univ. of Illinois at Urbana-Champaign, 1989.
****************************************************************************
From: SALLY WINTHROP

Hello David - I don't know if this will help you at all, but there was a presentation at Computers in Libraries last week where two people, Janice Newkirk and Trudi Jacobson from SUNY Albany, presented findings from a CD-ROM search strategy analysis project that addressed some of your questions. I'm sorry I don't know their phone numbers or email addresses, but they were from the main library.
****************************************************************************
From: "Stephen J. Cavrak, Jr."

No research to back this up, but as a searcher, I've found that I can do a better job sifting through 100 entries than the computer can.
One of the reasons is that the "broader" search gives me information that I would not have expected to find - basically a serendipity effect.
****************************************************************************
From: Dietmar Wolfram

As part of an IR system modelling and simulation study, I looked at the distribution of query lengths for several database environments (OPAC, bibliographic, and full-text). The data for the two latter types were collected based on recorded search strategies encompassing a number of databases. I found that the data could be modelled using a negative binomial distribution and, as you suspect, that shorter queries are more numerous. A description of this aspect of the study can be found in: Wolfram, D. (1992). Applying informetric characteristics of databases to IR system file design, part 1: informetric models. Information Processing and Management 28(1), 121-133.
****************************************************************************
From: Lee Jaffe

You might look into the writings of Christine Borgman:

Borgman, Christine L. "Why Are Online Catalogs Hard to Use? Lessons Learned from Information-Retrieval Studies." Journal of the American Society for Information Science, v37 n6 p387-400, Nov 1986. (EJ345851)

Borgman, Christine L. "All Users of Information Retrieval Systems Are Not Created Equal: An Exploration into Individual Differences." Information Processing and Management, v25 n3 p237-51, 1989. (EJ399422)

Borgman, Christine L. "Mental Models: Ways of Looking at a System." Bulletin of the American Society for Information Science, v9 n2 p38-39, Dec 1982. (EJ276742)
****************************************************************************
From: Jim Olivetti

Hi, David. I have also had need for such information. OCLC, in developing their FirstSearch end-user interface, had some data (I don't know if it's public). One thing they claimed was that "OR" is hardly ever used by end-users or professional searchers.
I have forgotten the figures, but it was under 20% of searches monitored, I think. I'd very much like to hear what else you learn!

Regards,
Jim Olivetti, jolivett@capcon.net
****************************************************************************
From: Dag Vaula

I found a reference in Current Awareness Abstracts (Aslib) that may be relevant: User practices in keyword and Boolean searching on an online public access catalog. Ensor, P. (Indiana State Univ.). Information Technology & Libraries, 11(3), Sept. 1992, p. 210-219. This article has 17 references which may lead you further. You could also consider searching the online database LISA, which is carried by DIALOG and probably also by other hosts.
****************************************************************************
From: Hugo Besemer, PUDOC-DLO

You asked for references on the complexity (or lack of complexity) of Boolean search statements as they are submitted by end-users. The following may be useful:

The best possible online search. / by: Weaver, Maggie. In: Online Information 92: 16th International Online Information Meeting Proceedings (London, 8-10 December 1992), p. 375-388. Learned Information, Oxford, 1992.

Measures to discriminate among online searchers with different training and experience. / by: Howard, H. In: Online Review, 6(4), p. 315-327. 1982.

End-users, mediated searches, and front-end assistance programs on Dialog: a comparison of learning performance and satisfaction. / by: Sullivan, M.V. In: Journal of the American Society for Information Science, 41. [I like this one; it does not confirm the assumptions of the information professional. Hugo]

A comparative evaluation and analysis of end users' performance in an academic environment. / by: Walker, M.G.P. PhD thesis, Syracuse University, 1988. [I could not get hold of this one. Hugo]

We made some observations with CD-ROM users in our libraries, but the report is in Dutch. Is it of any use to you?
My impression is that end users tend to make very simple search requests. But when we started giving courses for CD-ROM searchers we were completely swamped, and we are still overbooked. So perhaps end-users, not unlike information professionals, just have to be taught to make more complex searches.
****************************************************************************
From: "Clifford Lynch"

David -- there is some info on this in the MELVYL statistics screens that we post weekly, I think. I did some analysis back in about 1987 in more depth as part of my PhD thesis. Most queries are indeed very short.

Clifford Lynch
****************************************************************************
From: Glee Cady

Try mailing a query directly to Walt Crawford, Research Libraries Group. His email address is br.wcc%rlg@forsythe.stanford.edu. Walt has/had a program which analyzed the searches and other commands that were given in the RLIN system. He published an article about them long ago in _Journal of Library Automation_, I think. Anyway, he'd be delighted to tell you about it, I'm sure.
****************************************************************************
From: JN829%ALBNYVMS.bitnet@UACSC2.ALBANY.EDU (Janice Newkirk)

To David Lewis -- Here at the University at Albany (SUNY), a colleague (Trudi Jacobson) and I have done a pilot study analyzing CD-ROM search transactions (in preparation for a more complete study looking at how our library instruction affects how end-users search). We have preliminary data on what types of search statements make up the search strategies of over 500 end-users, who are mostly students. We were looking mostly at how Boolean operators were used, but at several other variables as well. Actually, if entire search strategies are taken into account, very few were one word long. In fact, one of the major problems we've observed is that end-user searchers tend to input entire phrases in natural language as a search statement.
If you are interested in any of our data, let me know. Also, we have a pretty extensive potential bibliography of research and commentary on search transaction analysis. Also, during the last six months, someone posted to this list a query about the subject; we contacted him, and we received a copy of his bibliography on the subject. I have lost the name, but if you wanted to follow up you might query the list about who was doing this.

and

Dave -- The bibliography I wrote to you about (I have just dug it out of my files) is by Thomas A. Peters of Mankato State University, who can be contacted at TAPETERS@MSUS1.BITNET or TAPETERS@MSUS.EDU. I'm sure he will send you a copy. The copy I have is paper, not disc, and it would be easier for you to get it from him because it is quite long (and annotated). I will upload our bibliography and send it to you today, although I think Peters has almost everything we had. If I told you about anything else that I'm not sending, let me know.

Janice
****************************************************************************
From: LINDA HILL

What follows is an interesting query posted to the INDEX-L list. I don't have the articles he is seeking - do you? I do think that the observed behavior is based on novices, of which there are a preponderance, and that as people gain experience they welcome - need - the control over searching/retrieval that our systems provide. Novices are often content with getting "something" rather than getting the best and most complete set of relevant documents/citations/data sets/etc. In interpreting these statistics, it is also important to keep in mind that novices turn to intermediaries (if available) to perform more complete and/or precise searches. The number of searches that are passed on to professionals is influenced by many factors - knowledge of the option, availability, convenience, and cost, to name a few.
Another factor that needs to be kept in mind is the influence of training (people with no training will use a simplistic approach) and the degree of intuitiveness of the interface - both the screen displays and prompts and the logic of the search/retrieval system.

and

From: ROBERT JACK
Date: 3/10/93 11:56AM
To: LINDA HILL
Subject: Query length

David - I am forwarding a reply from our on-staff searcher. Your question set him off on the comments you see here. He rambles a bit, but it is interesting nevertheless. We - NASA - do not collect the statistics you asked for from the RECON system. We also have an end-user Network Access Mechanism (NAM) product in prototype. It CAN capture search statements and derive the length of queries; however, it is not doing so now. If you end up following this line of inquiry, I'd be interested in knowing what the utility of the statistics will be, so that we could decide if it is something that we should be capturing.

+++++ forwarded message +++++

Off the top of my head, I can't remember seeing anything specific about the query length of online search strategies. If indeed it is true that one of the "big online database companies" holds that "90%" of queries are just one word long, I would submit the following observations (I am presuming that the correspondent means "hosts" rather than database producers, who usually do not operate their own publicly accessible services):

(1) The 90% figure is an exaggeration.

(2) Most of the "big online database companies" are accessible via idiot-level interfaces (either mainframe, such as Telebase/easyNet/I-Quest/etc., or micro-based). Many of these help the user format a batch-mode form of inquiry and then log off, to save online money. These are often very simple command searches.

(3) Rather than "one-word" searches, the correspondent may mean a single command (other than, of course, a final print/type/display command).
Most of the more sophisticated hosts permit users to pretty much assemble the whole search in a single command:

    s (cad + (computer()(aided + assisted)()design)) and (airplane? ? + aircraft) and (wingtip? ? + wing()tip? ?)

(4) Every online service which has any kind of "meter-ticking" charge structure (per hour, per minute, etc.), and/or charges for "search modifications" (e.g., LEXIS/NEXIS), punishes the user who stays online too long formulating the search strategy. With per-hour charges pushing or exceeding $2 per minute for many commercial databases, it is often cheaper to take many less-than-full-bibliographic-citation prints than to bother with complicated strategies. Let's face it, some people are Boolean-impaired.

(5) Full-text searching, contrary to popular belief, often does not require complicated strategies -- sometimes I want everything on a specific individual, company, product, material, event, phenomenon, etc., that can be simply described, especially if it is a very unusual word. Moreover, a "limit-all" kind of command preceding a search statement usually does not show up as a "search command," because of the way LIMITs operate. So a fair amount of searching is, I suspect, "I want everything about 'x' since the last time I searched for it." I bet more DIALOG commands come from stored strategies and SDIs than from actual online searches.

(6) If bibliographic files are seen as better indexed than before (and some are indexed outstandingly well -- MEDLINE), searches can be made pretty simple, especially if some of the indexing is pre-coordinate (like MEDLINE's).

(7) New-user instruction is usually not oriented to take into account that many of the big databases correspond to printed A&I services, most of which are pretty obscure to non-librarians and non-specialists (when was the first time YOU heard of Chemical Abstracts?). DIALOG training, in particular, is "Anybody who can type can search" and "If you can type, you can find out EVERYTHING."
Experienced users know the costs; they may also have access to CD-ROM versions of the databases they use most frequently, to avoid the high online costs; or they may be shifting to a more tolerant pricing system, such as ESA/IRS's. Anyway, online training is usually pretty low-level, and you can't teach strategy at a generalist level without examples (or to a whole bunch of strangers who don't share a detailed understanding of a single technical jargon -- if I'm searching, I know that I need to look for either Poecilia reticulata OR Lebistes reticulata; but I'm not sure very many other people know what these are, or why I'd consider them synonymous for the purposes of my search).

(8) Some online hosts (actual search-command languages) make the use of sophisticated strategies a pain in the ASCII. The Aquarius version of STAIRS comes to mind, as does QL (used both for QL in Canada and for the late Vu/TEXT here in the States); NewsNet; LEXIS and NEXIS; and so on. QL, for example, does not just take your first find command and create a manipulable set; it starts showing you its retrieval. You have to "save" it as a set number, enter your next "find," save IT, and then combine -- a real hair-puller.

(9) I was almost ready to quit when I thought of one more: DIALOG's OneSearch. If you "one-searched" 8 databases with an 8-command strategy, the average would be one command per database; but this might be pulling at some Sequoia-sized straws.

Just my two cents.
****************************************************************************
From: MADEWELL@ctrvax.Vanderbilt.Edu (Ramona Steffey Madewell)

A recent question from you on PACS-L inspired me to do some analysis of keyword searches on our NOTIS system.
I examined an extraction of 4452 keyword searches done over a six-day period (March 4-9, 1993), and about 50% of them used at least one of the available techniques for refining a search: either a Boolean or positional operator (such as and, or, not, adj, same, with, or near), a field qualifier, truncation, or nesting. About 11% of the searches used more than one technique. On our system, we have a default operator (near) which applies if more than one term is entered without another connector. I did not count any of the searches which used the default in coming up with the numbers above, so there were in fact many more searches that used at least one operator. I find this type of study fascinating and wish I had time to do more with it.
****************************************************************************
David D. Lewis
AT&T Bell Laboratories                 email: lewis@research.att.com
600 Mountain Ave.; Room 2C-408         ph. 908-582-3976
Murray Hill, NJ 07974; USA             dept. fax 908-582-7550
=========================================================================
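[The kind of transaction-log tally Madewell describes above - counting what fraction of keyword searches use any explicit refinement technique - can be sketched roughly as follows. The operator list is taken from her message, but the truncation and field-qualifier syntax conventions here are illustrative assumptions, not NOTIS's actual query grammar.]

```python
# Rough sketch of a search-log analysis: for each logged query, count how
# many refinement techniques it uses (Boolean/positional operators,
# nesting, truncation, field qualifiers), then report the refined fraction.
import re

# Operators listed in Madewell's message.
OPERATORS = {"and", "or", "not", "adj", "same", "with", "near"}

def uses_refinement(query):
    """Return the number of distinct refinement techniques in a query."""
    tokens = query.lower().split()
    techniques = 0
    if any(tok in OPERATORS for tok in tokens):
        techniques += 1                    # Boolean or positional operator
    if "(" in query or ")" in query:
        techniques += 1                    # nesting
    if re.search(r"\w[?$]", query):
        techniques += 1                    # truncation (assumed ? or $ marker)
    if re.search(r"\w+\.\w+\.", query):
        techniques += 1                    # field qualifier (assumed .xx. form)
    return techniques

# Hypothetical log sample:
log = ["dogs and cats", "guppy breeding", "comput? and librar?"]
refined = [q for q in log if uses_refinement(q) >= 1]
fraction = len(refined) / len(log)
```

A real study would also have to exclude queries that merely trigger the system's default operator, as Madewell did, which this sketch does not attempt.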