Date: Tue, 16 Mar 1993 10:16:57 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: "T.C. CRAVEN" <35007_567@uwovax.uwo.ca>
Subject: Re: Spreading Activation

----------------------------Original message----------------------------

In reply to the request of Alicia Abrahamson (March 10), I am making use of a spreading activation model with a thesaurus as part of a computer-aided abstracting toolkit that I am developing. I've submitted a paper on this topic for the 1993 ASIS Annual Meeting.

The basic idea is to get a set of weighted thesaurus terms from an initial set of keywords. The terms can then be displayed to the abstractor as suggestions for use in the abstract, or they can be used in weighting full-text sentences to determine the most interesting for a particular purpose. A similar approach could no doubt be used for indexing assistance.

An article worth examining might be: Paice, C.D. (1991). "A thesaural model of information retrieval." Information Processing and Management 27(5): 433-447.

Tim Craven, U of Western Ontario
35007_567@UWOVAX.UWO.CA
=========================================================================
Date: Tue, 16 Mar 1993 10:18:44 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: David Lewis
Subject: Query: Automated aids to controlled vocabulary indexing

----------------------------Original message----------------------------

In a recent discussion on PACS-L, several people expressed the belief that artificial intelligence software was more likely to be of use as an aid to human indexing than as a replacement. I know of a couple of systems of this sort that have been fielded by Carnegie Group. I would be interested in getting pointers to other such software that currently exists, or to published research in this area.
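[The spreading-activation idea Craven describes above can be illustrated with a minimal sketch. The toy thesaurus fragment, the decay factor, and the iteration count below are illustrative assumptions, not details of his actual toolkit.]

```python
# Minimal sketch of spreading activation over a thesaurus: seed keywords
# start with weight 1.0, and each pass sends a decayed share of every
# term's activation along its thesaurus links (BT/NT/RT relationships).

def spread_activation(thesaurus, seed_terms, decay=0.5, iterations=2):
    """Return a dict mapping each reached term to an activation weight."""
    weights = {term: 1.0 for term in seed_terms}
    for _ in range(iterations):
        pulses = {}
        for term, weight in weights.items():
            for neighbour in thesaurus.get(term, []):
                # Each neighbour receives a decayed share of the activation.
                pulses[neighbour] = pulses.get(neighbour, 0.0) + weight * decay
        for term, pulse in pulses.items():
            weights[term] = weights.get(term, 0.0) + pulse
    return weights

# Hypothetical thesaurus fragment (purely illustrative):
thesaurus = {
    "indexing": ["abstracting", "thesauri"],
    "abstracting": ["indexing", "summarization"],
    "thesauri": ["indexing"],
}
weights = spread_activation(thesaurus, ["indexing"])
# Terms linked to the seed keyword now carry nonzero weight and could be
# shown to the abstractor as ranked term suggestions.
```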
I'm particularly interested in systems that suggest controlled vocabulary categories to be assigned, with the human indexer having the opportunity to confirm or override the machine's suggestion. If you reply to me I will post a summary of replies to this list, and will include your name along with your comment unless otherwise requested.

thanks, dave
=========================================================================
Date: Tue, 16 Mar 1993 10:19:58 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: Edie Rasmussen
Subject: Re: Seeking guidelines for abstracting

----------------------------Original message----------------------------

There doesn't seem to have been too much written in recent years on abstracting. Certainly the "Indexing & Abstracting" books written in the last few years have only 1 or at most 2 chapters on abstracting, with the rest on indexing (and I must admit I structure my I&A course about the same way).

There was some early work by Borko, and he co-authored a book on abstracting: H. Borko and C.L. Bernier (1975). Abstracting Concepts and Methods. New York: Academic. There's also a standard (Z39.14) for writing abstracts, which has some examples of informative and indicative abstracts. And going way back, the article on abstracts and abstracting in the Encyclopedia of Library and Information Science (vol. 1!) gives some good examples of what not to do... verbosity, redundancy, etc.

More recently, Raya Fidel did some work on evaluating the potential of abstract writing for enhancing retrieval performance in an online environment (Information Processing & Management 22:309 (1986) and Journal of Documentation 42:11 (1986)). As Kate McCain suggests, if you can get hold of in-house guidelines from database producers, they often provide useful examples.
Cheers,
Edie Rasmussen
School of Library and Information Science
University of Pittsburgh
(emr1@vms.cis.pitt.edu)
=========================================================================
Date: Wed, 17 Mar 1993 09:31:26 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: David Lewis
Subject: response to my query on query length

----------------------------Original message----------------------------

Here's a summary of responses to my query on studies about the length of queries people make to online retrieval systems. Many thanks to all of you who responded!

Best wishes,
Dave
****************************************************************************
From: Allyson Carlyle

Are you interested in online catalog searches? I just finished reading a dissertation that analyzed subject searches in an online catalog, and I believe she looked at how many words the searches consisted of: Marilyn A. Lester. Coincidence of User Vocabulary and Library of Congress Subject Headings: Experiments to Improve Subject Access in Academic Library Online Catalogs. Dissertation, Univ. of Illinois at Urbana-Champaign, 1989.
****************************************************************************
From: SALLY WINTHROP

Hello David - I don't know if this will help you at all, but there was a presentation at Computers in Libraries last week where two people, Janice Newkirk and Trudi Jacobson from SUNY Albany, presented findings from a CD-ROM search strategy analysis project that addressed some of your questions. I'm sorry I don't know their phone numbers or email addresses, but they were from the main library.
****************************************************************************
From: "Stephen J. Cavrak, Jr."

No research to back this up, but as a searcher, I've found that I can do a better job sifting through 100 entries than the computer can.
One of the reasons is that the "broader" search gives me information that I would not have expected to find - basically a serendipity effect.
****************************************************************************
From: Dietmar Wolfram

As part of an IR system modelling and simulation study, I looked at the distribution of query lengths for several database environments (OPAC, bibliographic, and full-text). The data for the two latter types were collected based on recorded search strategies encompassing a number of databases. I found that the data could be modelled using a negative binomial distribution and, as you suspect, that shorter queries are more numerous. A description of this aspect of the study can be found in: Wolfram, D. (1992). Applying informetric characteristics of databases to IR system file design, part 1: informetric models. Information Processing and Management 28(1), 121-133.
****************************************************************************
From: Lee Jaffe

You might look into the writings of Christine Borgman:

Borgman, Christine L. "Why Are Online Catalogs Hard to Use? Lessons Learned from Information-Retrieval Studies." Journal of the American Society for Information Science, v37 n6 p387-400, Nov 1986. (EJ345851)

Borgman, Christine L. "All Users of Information Retrieval Systems Are Not Created Equal: An Exploration into Individual Differences." Information Processing and Management, v25 n3 p237-51, 1989. (EJ399422)

Borgman, Christine L. "Mental Models: Ways of Looking at a System." Bulletin of the American Society for Information Science, v9 n2 p38-39, Dec 1982. (EJ276742)
****************************************************************************
From: Jim Olivetti

Hi, David. I have also had need for such information. OCLC, in developing their FirstSearch end-user interface, had some data (I don't know if it's public). One thing they claimed was that "OR" is hardly ever used by end-users or professional searchers.
I have forgotten the figures, but it was under 20% of searches monitored, I think. I'd very much like to hear what else you learn!

Regards,
Jim Olivetti, jolivett@capcon.net
****************************************************************************
From: Dag Vaula

I found a reference in Current Awareness Abstracts (Aslib) that may be relevant: User practices in keyword and Boolean searching on an online public access catalog. Ensor, P. (Indiana State Univ.). Information Technology & Libraries, 11(3), Sept. 1992, p. 210-219. This article has 17 references which may lead you further. You could also consider searching the online database LISA, which is carried by DIALOG and probably also by other hosts.
****************************************************************************
From: Hugo Besemer, PUDOC-DLO

You asked for references on the complexity (or lack of complexity) of Boolean search statements as they are submitted by end-users. The following may be useful:

The best possible online search. / by: Weaver, Maggie. In: Online Information 92: 16th International Online Information Meeting Proceedings (London, 8-10 December 1992), p. 375-388. Learned Information, Oxford, 1992.

Measures to discriminate among online searchers with different training and experience. / by: Howard, H. In: Online Review, 6(4), p. 315-327. 1982.

End-users, mediated searches, and front-end assistance programs on Dialog: a comparison of learning performance and satisfaction. / by: Sullivan, M.V. In: Journal of the American Society for Information Science, 41. [I like this one; it does not confirm the assumptions of the information professional. Hugo]

A comparative evaluation and analysis of end users' performance in an academic environment. / by: Walker, M.G.P. PhD thesis, Syracuse University, 1988. [I could not get hold of this one. Hugo]

We made some observations with CD-ROM users in our libraries, but the report is in Dutch. Is it of any use to you?
My impression is that end users tend to make very simple search requests. But when we started giving courses for CD-ROM searchers we were completely swamped, and we are still overbooked. So perhaps end-users, not unlike information professionals, just have to be taught to make more complex searches.
****************************************************************************
From: "Clifford Lynch"

David -- there is some info on this in the MELVYL statistics screens that we post weekly, I think. I did some analysis back in about 1987 in more depth as part of my PhD thesis. Most queries are indeed very short.

Clifford Lynch
****************************************************************************
From: Glee Cady

Try mailing a query directly to Walt Crawford, Research Libraries Group. His email address is br.wcc%rlg@forsythe.stanford.edu. Walt has/had a program which analyzed the searches and other commands that were given in the RLIN system. He published an article about them long ago in _Journal of Library Automation_, I think. Anyway, he'd be delighted to tell you about it, I'm sure.
****************************************************************************
From: JN829%ALBNYVMS.bitnet@UACSC2.ALBANY.EDU (Janice Newkirk)

To David Lewis -- Here at the University at Albany (SUNY), a colleague (Trudi Jacobson) and I have done a pilot study analyzing CD-ROM search transactions (in preparation for a more complete study looking at how our library instruction affects how end-users search). We have preliminary data on what types of search statements make up the search strategies of over 500 end-users, who are mostly students. We were looking mostly at how Boolean operators were used, but at several other variables as well. Actually, if entire search strategies are taken into account, very few were one word long. In fact, one of the major problems we've observed is that end-user searchers tend to input entire phrases in natural language as a search statement.
If you are interested in any of our data, let me know. Also, we have a pretty extensive potential bibliography of research and commentary on search transaction analysis. Also, during the last six months, someone posted to this list a query about the subject; we contacted him, and we received a copy of his bibliography on the subject. I have lost the name, but if you wanted to follow up you might query the list about who was doing this.

and

Dave -- The bibliography I wrote to you about (I have just dug it out of my files) is by Thomas A. Peters of Mankato State University, who can be contacted at TAPETERS@MSUS1.BITNET or TAPETERS@MSUS.EDU. I'm sure he will send you a copy. The copy I have is paper, not disc, and it would be easier for you to get it from him because it is quite long (and annotated). I will upload our bibliography and send it to you today, although I think Peters has almost everything we had. If I told you about anything else that I'm not sending, let me know.

Janice
****************************************************************************
From: LINDA HILL

What follows is an interesting query posted to the INDEX-L list. I don't have the articles he is seeking - do you? I do think that the observed behavior is based on novices, of which there are a preponderance, and that as people gain experience they welcome - need - the control over searching/retrieval that our systems provide. Novices are often content with getting "something" rather than getting the best and most complete set of relevant documents/citations/data sets/etc. In interpreting these statistics, it is also important to keep in mind that novices turn to intermediaries (if available) to perform more complete and/or precise searches. The number of searches that are passed on to professionals is influenced by many factors - knowledge of the option, availability, convenience, and cost, to name a few.
Another factor that needs to be kept in mind is the influence of training (people with no training will use a simplistic approach) and the degree of intuitiveness of the interface - both the screen displays and prompts and the logic of the search/retrieval system.

and

From: ROBERT JACK
Date: 3/10/93 11:56AM
To: LINDA HILL
Subject: Query length

David - I am forwarding a reply from our on-staff searcher. Your question set him off on the comments you see here. He rambles a bit, but it is interesting nevertheless. We - NASA - do not collect the statistics you asked for from the RECON system. We also have an end-user Network Access Mechanism (NAM) product in prototype. It CAN capture search statements and derive the length of queries; however, it is not doing so now. If you end up following this line of inquiry, I'd be interested in knowing what the utility of the statistics will be, so that we could decide if it is something that we should be capturing.

+++++ forwarded message +++++

Off the top of my head, I can't remember seeing anything specific about the query length of online search strategies. If indeed it is true that one of the "big online database companies" holds that "90%" of queries are just one word long, I would submit the following observations (I am presuming that the correspondent means "hosts" rather than database producers, who usually do not operate their own publicly accessible services):

(1) The 90% figure is an exaggeration.

(2) Most of the "big online database companies" are accessible via idiot-level interfaces (either mainframe, such as Telebase/easyNet/I-Quest/etc., or micro-based). Many of these help the user format a batch-mode form of inquiry and then log off, to save online money. These are often very simple command searches.

(3) Rather than "one-word" searches, the correspondent may mean a single command (other than, of course, a final print/type/display command).
Most of the more sophisticated hosts permit users to pretty much assemble the whole search in a single command:

    s (cad + (computer()(aided + assisted)()design)) and (airplane? ? + aircraft) and (wingtip? ? + wing()tip? ?)

(4) Every online service which has any kind of "meter-ticking" charge structure (per hour, per minute, etc.), and/or charges for "search modifications" (e.g., LEXIS/NEXIS), punishes the user who stays online too long formulating the search strategy. With per-hour charges pushing or exceeding $2 per minute for many commercial databases, it is often cheaper to take many less-than-full-bibliographic-citation prints than to bother with complicated strategies. Let's face it, some people are Boolean-impaired.

(5) Full-text searching, contrary to popular belief, often does not require complicated strategies -- sometimes I want everything on a specific individual, company, product, material, event, phenomenon, etc., that can be simply described, especially if it is a very unusual word. Moreover, a "limit-all" kind of command preceding a search statement usually does not show up as a "search command," because of the way LIMITs operate. So a fair amount of searching is, I suspect, "I want everything about 'x' since the last time I searched for it." I bet more DIALOG commands come from stored strategies and SDIs than from actual online searches.

(6) If bibliographic files are seen as better indexed than before (and some are indexed outstandingly well -- MEDLINE), searches can be made pretty simple, especially if some of the indexing is pre-coordinate (like MEDLINE's).

(7) New-user instruction is usually not oriented to take into account that many of the big databases correspond to printed A&I services, most of which are pretty obscure to non-librarians and non-specialists (when was the first time YOU heard of Chemical Abstracts?). DIALOG training, in particular, is "Anybody who can type can search" and "If you can type, you can find out EVERYTHING."
Experienced users know the costs; they may also have access to CD-ROM versions of the databases they use most frequently, to avoid the high online costs; or they may be shifting to a more tolerant pricing system, such as ESA/IRS's. Anyway, online training is usually pretty low-level, and you can't teach strategy at a generalist level without examples (or to a whole bunch of strangers who don't share a detailed understanding of a single technical jargon -- if I'm searching, I know that I need to look for either Poecilia reticulata OR Lebistes reticulata; but I'm not sure very many other people know what these are, or why I'd consider them synonymous for the purposes of my search).

(8) Some online hosts (actual search-command languages) make the use of sophisticated strategies a pain in the ASCII. The Aquarius version of STAIRS comes to mind, as does QL (used both for QL in Canada and for the late Vu/TEXT here in the States); NewsNet; LEXIS and NEXIS; and so on. QL, for example, does not just take your first find command and create a manipulable set; it starts showing you its retrieval. You have to "save" it as a set number, enter your next "find," save IT, and then combine -- a real hair-puller.

(9) I was almost ready to quit when I thought of one more: DIALOG's OneSearch. If you "one-searched" 8 databases with an 8-command strategy, the average would be one command per database; but this might be pulling at some Sequoia-sized straws.

Just my two cents.
****************************************************************************
From: MADEWELL@ctrvax.Vanderbilt.Edu (Ramona Steffey Madewell)

A recent question from you on PACS-L inspired me to do some analysis of keyword searches on our NOTIS system.
I examined an extraction of 4452 keyword searches done over a six-day period (March 4-9, 1993), and about 50% of them used at least one of the available techniques for refining a search: either a Boolean or positional operator (such as and, or, not, adj, same, with, or near), a field qualifier, truncation, or nesting. About 11% of the searches used more than one technique. On our system, we have a default operator (near) which applies if more than one term is entered without another connector. I did not count any of the searches which used the default in coming up with the numbers above, so there were in fact many more searches that used at least one operator. I find this type of study fascinating and wish I had time to do more with it.
****************************************************************************
David D. Lewis
AT&T Bell Laboratories                 email: lewis@research.att.com
600 Mountain Ave.; Room 2C-408         ph. 908-582-3976
Murray Hill, NJ 07974; USA             dept. fax 908-582-7550
=========================================================================
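[The kind of transaction-log tally Madewell describes above - counting what fraction of keyword searches use any explicit refinement technique - can be sketched roughly as follows. The operator list is taken from her message, but the truncation and field-qualifier syntax conventions here are illustrative assumptions, not NOTIS's actual query grammar.]

```python
# Rough sketch of a search-log analysis: for each logged query, count how
# many refinement techniques it uses (Boolean/positional operators,
# nesting, truncation, field qualifiers), then report the refined fraction.
import re

# Operators listed in Madewell's message.
OPERATORS = {"and", "or", "not", "adj", "same", "with", "near"}

def uses_refinement(query):
    """Return the number of distinct refinement techniques in a query."""
    tokens = query.lower().split()
    techniques = 0
    if any(tok in OPERATORS for tok in tokens):
        techniques += 1                    # Boolean or positional operator
    if "(" in query or ")" in query:
        techniques += 1                    # nesting
    if re.search(r"\w[?$]", query):
        techniques += 1                    # truncation (assumed ? or $ marker)
    if re.search(r"\w+\.\w+\.", query):
        techniques += 1                    # field qualifier (assumed .xx. form)
    return techniques

# Hypothetical log sample:
log = ["dogs and cats", "guppy breeding", "comput? and librar?"]
refined = [q for q in log if uses_refinement(q) >= 1]
fraction = len(refined) / len(log)
```

A real study would also have to exclude queries that merely trigger the system's default operator, as Madewell did, which this sketch does not attempt.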