=========================================================================
Date: Mon, 3 Aug 1992 10:48:52 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: Robert Jacobson
Subject: Re: Thoughts on Indexing
----------------------------Original message----------------------------
As a non-indexer -- I have trouble keeping my notecards straight -- I would like to hear something about the thought processes (in system lingo, the algorithms) that indexers use (1) to conceptualize the universe of terms they are going to organize, (2) to choose the terms they think significant for organizing, and (3) to organize. No need for scientific jargon (unless you think it's necessary for clarity): I'd just like to know what goes through your head.

I'll tell you why after the responses come in.

Bob Jacobson
=========================================================================
Date: Mon, 3 Aug 1992 10:50:37 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: "Nancy C. Mulvany"
Subject: Authored Indexes, the Market for...
----------------------------Original message----------------------------
RE> "Most indexing is now done automatically by computers....
    Is it true?

No, it is not true! For the past eight years (at least), ALL of my clients have had their documents online. They could easily (and cheaply) opt for automatic "indexing." Most of my clients are involved in the computer industry. They have access to whatever computer tools they desire. However, they pay me to index their material from final (PRINTED) page proofs. They want an index, not a list generated from some automatic tool.

RE> ... indexers, are you all overwhelmed with work, or
    is the pool drying up?

At least on the West Coast, we are overwhelmed with work! I'm pleased to say that the marketplace here has become more sophisticated and desirous of good indexes.
After teaching book indexing to hundreds of people for the past 5+ years, I can say that good indexers are few and far between. I have no worries about being replaced by a machine anytime in the near future, unless, of course, the desire for quality indexing work declines.

-Nancy
nmulvany@well.sf.ca.us
=========================================================================
Date: Mon, 3 Aug 1992 13:25:55 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: CGOODSON@UGA.BITNET
Subject: Re: Authored Indexes, the Market for...
In-Reply-To: Message of Mon, 3 Aug 1992 10:50:37 ECT from
----------------------------Original message----------------------------
On Mon, 3 Aug 1992 10:50:37 ECT Nancy C. Mulvany said:
> RE> "Most indexing is now done automatically by computers....
>     Is it true?
>
>No, it is not true!
>
> RE> ... indexers, are you all overwhelmed with work, or
>     is the pool drying up?
>
>At least on the West Coast, we are overwhelmed with work!

***thanks so much for this encouraging response! I am definitely going to pursue this...I loved it in library school....and was encouraged to go in that direction (even offered a job with the NYT Index) but didn't do it... and have regretted it ever since....

............................................................
| Carol Goodson, Coordinator/Off-Campus Library Services   |
| Ingram Library, WEST GEORGIA COLLEGE                     |
| Carrollton GA 30118                                      |
| Phone: (404) 836-6502   FAX: (404) 836-6626              |
| Bitnet: cgoodson@uga    Internet: cgoodson@uga.cc.uga.edu|
............................................................
"You only live once: but once is enough if you play it right"
                                   --Woody Allen (Interiors)
............................................................
=========================================================================
Date: Mon, 3 Aug 1992 16:37:50 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: KROVETZ@cs.umass.EDU
Subject: automatic indexing
----------------------------Original message----------------------------
RE> "Most indexing is now done automatically by computers...
    Is it true?

I don't know how much back-of-the-book indexing is done automatically, but a great deal of the indexing for online information retrieval *is* done automatically. Some of the major databases also have manual indexing (e.g., Medline and Chemical Abstracts). Westlaw used to use only manual indexing by their key-numbers, but they lost market share to Lexis (which only used free-text indexing), so they were forced to include free-text indexing as well.

I posted a message about a week ago regarding automatic indexing, but I didn't receive any response. I've heard about automatic indexing vs. manual indexing from the information retrieval community, but not from the indexer community. Did that message make it to the list? Does anyone have any comments about the subject?

Bob
krovetz@cs.umass.edu
=========================================================================
Date: Mon, 3 Aug 1992 16:40:15 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: KROVETZ@cs.umass.EDU
Subject: Automatic Indexing

For those of you who may have missed it, here is Bob's first message about automatic indexing.

Charlotte
----------------------------Original message----------------------------
Automatic Indexing vs. Manual Indexing is an old topic in information retrieval. The existing tests seem to indicate that they have about the same level of performance. I've heard the arguments about this from the information retrieval community, but never from the viewpoint of professional indexers.
When I first heard about it, it didn't make any sense; manual indexing involves intellectual effort - how could the performance be the same as the simple methods used in automatic indexing? I think a plausible explanation is that human indexers are inconsistent, and therefore there is a problem with representation on the indexing side. For automatic indexing, the problem is the wide variety of vocabulary usage, and the user can't think of all the different ways in which a concept can be expressed.

Systems like Susanne Humphrey's MedIndEX aim to improve the consistency of manual indexing, so perhaps that will shift the balance, but there's also a lot of room for improvements with automatic methods. My bias is towards the automatic methods, and towards improving them with natural language processing (I'm a computational linguist).

I'd welcome any discussion on the topic.

Bob
krovetz@cs.umass.edu
=========================================================================
Date: Tue, 4 Aug 1992 10:54:39 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: "Kent S. Larsen"
Subject: Thoughts on Indexing
----------------------------Original message----------------------------
While I am not a professional indexer by any means, I do work in book publishing and I have some experience dealing with indexers.

At least in trade book publishing, very few books are in electronic format when they are delivered to the editors, and even fewer are indexed with computer assistance. Editors seem to be very wary of computer assisted indexing.

Also, in reply to an earlier query, I see very few indexing firms being used. Generally, trade publishers use individual indexers that they have had good experiences with.

From my point of view, computer generated indexes have a long way to go before they are widely accepted as a way of producing adequate indexes. This has nothing to do with the actual adequacy of the index, merely the perception in the industry.
=========================================================================
Date: Tue, 4 Aug 1992 15:14:07 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: RHADDEN@USGSRESV.BITNET
Subject: Re: Thoughts on Indexing
----------------------------Original message----------------------------
No, there is still a market out there, especially in smaller, subject-area books and journals. Find one w/o an index, and offer to do one for the publisher.

Good luck!

lee hadden
usgs library
(and former freelance indexer)
=========================================================================
Date: Tue, 4 Aug 1992 15:14:43 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: RHADDEN@USGSRESV.BITNET
Subject: RE: an indexer's rights
----------------------------Original message----------------------------
The indexer who works for an hourly wage loses his/her rights to the employer. If you are interested in more rights, try dealing with the publisher as a contractor (for the whole, completed index) or try to sell it as a separate publication. This will at least show you what your work is/is not worth on the open market.

Basically, your index is dependent on the page numbers of a printed item (copyrighted by the publisher), and thus is secondary to the copyright holder. Compilations of information are not subject to copyright, either. Interestingly enough, if you put in an erroneous citation, you can copyright an index as a work of fiction, and thus confuse the issue to everyone's satisfaction when you are fired.

As for credit, everyone would like it - the typesetter, the proofreader, the editors, the binders, etc. Get what you can get away with.
Lee Hadden
=========================================================================
Date: Tue, 4 Aug 1992 16:49:12 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: Carol Roberts
Subject: Getting started
----------------------------Original message----------------------------
Getting started

I am intrigued and encouraged. I work full-time days as a copyeditor (for Cornell) and also do freelance (specializing in scholarly philosophy texts-old grad students never die, they just...). I have done one index for Cornell, for a student handbook, and have thought I'd like to do some more, combining it with my freelance copyediting (how many copyeditor/indexers specializing in philosophy do you know?). Now that the summer crunch is almost over and my wee one is sleeping through the night, I'm ready for anything. So my question is, would anyone be interested in doing some mentoring? :^)

Carol Roberts
PUBS, Cornell University
=========================================================================
Date: Wed, 5 Aug 1992 12:07:26 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: JANDERSON@ZODIAC.BITNET
Subject: NISO standard committee press release
----------------------------Original message----------------------------
NISO Committee on Standards for Indexing Prepares Second Draft; Will Hold Open Session at ASIS Annual Meeting

The National Information Standards Organization (NISO) committee on standards for indexes has commissioned its chairperson, James D. Anderson of Rutgers University, to prepare a second draft of "Standard Guidelines for Indexing in Information Retrieval." The second draft will be summarized and discussed at an open forum at the annual meeting of the American Society for Information Science in Pittsburgh, October 27-31, 1992. Panelists include chairperson James D. Anderson, Bella Hass Weinberg of St. John's University, Raya Fidel of the University of Washington, and Marcia Bates of the University of California at Los Angeles.

The revised draft is expected to include a lengthy glossary of definitions, a comprehensive classification of types of index, and a detailed description of index features and design options. The emerging draft is more complicated than previous standards for indexes because it attempts to address every kind of index used for information retrieval regardless of documentary units and media, method of index compilation (e.g., by computer algorithm or human intellectual analysis), kind of index language, indexing display media, and search options or procedures. The committee intends to describe principles of index syntax and vocabulary linking that can and should be applied to every kind of index. Since there is no national standard for alphanumeric arrangement, the committee expects to suggest standards for indexes displayed in this way as well.

The committee held its second meeting in New Brunswick, New Jersey on May 4, 1992. Prior to this meeting, members had analyzed the committee's initial draft, which was based on discussions at its first meeting on October 25, 1991. The original draft had been modeled on the draft "Guidelines for the Content, Organization and Presentation of Indexes" prepared for the International Standards Organization (ISO). However, the committee has concluded that the ISO draft standard is still too narrowly focused on print indexes and human indexing.

The committee's third meeting is scheduled for November 6, 1992 in Washington, D.C.

James D. Anderson, Chairperson, School of Communication, Information, and Library Studies, Rutgers The State University of New Jersey, 4 Huntington St., New Brunswick, NJ 08903, 908/932-7501, Internet: IN%"janderson@zodiac.rutgers.edu"
=========================================================================
Date: Wed, 5 Aug 1992 12:20:26 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: KROVETZ@cs.umass.EDU
Subject: automatic indexing
----------------------------Original message----------------------------
What are some of the major problems caused by automatic indexing (as applied to generating an index for a book)? Do indexers find them useful as a way of preparing an initial set of index terms?

There is research going on in Computational Linguistics on identifying proper nouns. This might be helpful for preparing name indices, and in reducing the number of words considered for the general content index. I think these systems should be generally available in another year or two.

Bob
krovetz@cs.umass.edu
=========================================================================
Date: Wed, 5 Aug 1992 13:01:00 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: Candy Schwartz
Subject: Automatic Indexing
----------------------------Original message----------------------------
I had replied to Bob's thought re automatic indexing a few weeks ago, but all our outgoing e-mail went into limbo without warning, so I will try again.

First of all, let's make a distinction between automatic indexing and free text searching. The former is the identification of subject (or other) content (or likelihood of relevance, or presence in a class, or whatever) based on statistical characteristics of words (or other pieces of item representations) and their distribution in a file. The latter is searching based on words which appear in other than, or in addition to, controlled vocabulary fields.
Only in the broadest sense could you call a simple inverted file index an automatic index.

As to the question of which is better in representing subject: human indexers or automatic indexing. This has been a subject of debate for a while, and the general consensus is that they perform about the same in most cases, although for some types of information seeking intellectual indexing is more useful, and for other types automatic indexing is. There actually was a debate some years ago between David Batty (thesaurus constructor) and Bruce Croft (associative retrieval expert) on this very topic, at an ASIS conference. The outcome was as stated above.

There is not as much research on automatic indexing in back of the book settings (using a text flagging software utility is not automatic indexing), but there is a bit of research on this, and more to come, I'll bet, as more "e-books" (yuck) find their ways around the networks. Most of the research in this field looks at bibliographic or large text databases.

It is a bit misleading to imply that most large indexing and abstracting services use automatic indexing. Most of them actually use human indexers and controlled vocabularies, or they just make full text available for searching. MedLine is a shining exception, as is the research they do on automatic indexing.

Candy Schwartz (hi Bob)
cschwartz@vmsvax.simmons.edu
=========================================================================
Date: Thu, 6 Aug 1992 09:06:45 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: "R.S. Etheredge"
Subject: Re: Getting started
----------------------------Original message----------------------------
Howdee,

If there is gonna be some mentoring goin' on, I wanna be included, please. I'm dummer'n dirt, and readin' some of these dry index texts don't seem to help a heap...I'm encouraged by the prospect of networkin' this mentoring idea.

Have a happy day...
Rusty Etheredge
=========================================================================
Date: Thu, 6 Aug 1992 09:08:00 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: "R.S. Etheredge"
Subject: Re: Automatic Indexing
----------------------------Original message----------------------------
Howdee Candy,

You said, "MedLine is a shining exception, as is the research they do on automatic indexing."

Can you elucidate further on this statement, for me, please? I am a neophyte to indexing, and find it to be more and more a subject both of interest and devilment. That is, there seems to be too much data, and not enough indexing. I keep stumbling over my own amateurish attempts at indexing my own initiatives. I would like to know more about the research you mentioned above.

Have a happy day...

Rusty Etheredge
=========================================================================
Date: Thu, 6 Aug 1992 13:20:29 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: Candy Schwartz
Subject: Research at NLM
----------------------------Original message----------------------------
The people who come to mind as authors of reports on NLM indexing and retrieval research (and presenters at conferences) are Susanne Humphrey and Tom Doszkocs. Susanne has presented papers at ASIS meetings and has published several articles on automatic indexing. She will no doubt be present at the ASIS Special Interest Group on Classification Research (SIG CR) Classification Research Workshop to be held in Pittsburgh on October 25. That's a great place to meet the people involved in this field. Doszkocs has published a bunch of articles in the library and information science literature over the past decade.

Candy Schwartz
cschwartz@vmsvax.simmons.edu
=========================================================================
Date: Thu, 6 Aug 1992 13:21:42 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: "Nancy C. Mulvany"
Subject: Automatic indexing
----------------------------Original message----------------------------
Re> Automatic Indexing

I'd like to respond to the comments of both Candy Schwartz and Bob Krovetz about automatic indexing.

Candy's description of the distinction between automatic indexing and free text searching is well taken. And, yes, it is of course true that most database publishers use human indexers to select terms from some type of controlled vocabulary, nothing automatic about that!

I tend to distinguish automatic methods from "human/manual" methods based upon the type of analysis of text that is used. Automatic methods employ algorithmic analysis, while human methods use intellectual analysis of text. Automatic methods can run the gamut from simple-minded KWIC, KWOC, and KWAC lists to the type of statistical analysis discussed by Salton and others.

Bob asks: What are some of the major problems caused by automatic indexing (as applied to generating an index for a book)? Do indexers find them useful as a way of preparing an initial set of index terms?

First of all, in deference to those who are working to improve the sophistication of algorithmic analysis of text, most automatic indexing techniques that I have seen applied to a single book are not sophisticated, they are quite simple; i.e., the concordance generators, KWIC/KWOC/KWAC lists, etc. I can only discuss Bob's questions in relation to these simple tools.

What are the major problems? They can only list terms that appear in the text. They do not pick up on concepts not stated verbatim. They do not indicate relationships between similar topics. They do not distinguish between passing mention of a topic and substantive discussion of a topic. They do not take into consideration the language of the audience.

Are they useful for preparing an initial set of index terms? In my opinion, no.
It would not be unusual in book-length material for a concordance generator to provide 50+ references for some of the major terms that appear in the text. The indexer would have to ferret through all these references in order to structure an index entry for the term. This takes a great deal of time. Now imagine that the entire concordance has to be checked in the same way. Then when that's done, the indexer should really go back through the text and create entries for concepts not stated verbatim that do not appear in the list.

I think it's important to keep in mind what a book indexer wishes to achieve. It is not our job to be 100% exhaustive and indicate every mention of a term. The index is a filter for users. It is a bridge between the book author and the audience. Ideally we try to anticipate the language and needs of the audience and guide users to specific portions of the text.

I would love to see some linguistic analysis tools made available for book indexers. I know that some are working on tools that will analyze a portion of text and suggest index terms. When I need a tool like that, it will be an indication that I am indexing material beyond my expertise.

No, I need tools that will help insure the integrity of my index structure. For example, proper names pose problems for indexers. I would like a tool that would call my attention to two entries that may be the same, such as,

   Eastman Kodak Co.
   Kodak Co.

I would like a tool that would analyze my cross-reference structure and when appropriate, suggest that an entry be double-posted instead of cross-referenced when there are only a few reference locators involved. For example, if I had these entries,

   dogs, 56-59
   hounds. See dogs

the program would suggest that I double-post "hounds" rather than using the "See" cross-reference. If I agree, it would edit my index in the following way:

   dogs, 56-59
   hounds, 56-59

In the realm of vocabulary control there is opportunity to develop some very useful tools.
For example, I have a set of cross-references that I use whenever I index a document processing software manual. These cross-references have developed from years of indexing this type of material. After I have set up the structure for these cross-references, a linguistic analysis program could be running in the background while I write the index. When I use one of the terms as an entry, the program could suggest appropriate cross-references, and with my approval, add them to the index file.

I think this is just the tip of the iceberg in relation to the type of tools that could be developed to help those of us who design information access structures. There are tens of thousands of people writing indexes every day (I'm including the technical writing community, not only the contract indexing community!). However, we have been quite neglected! One need only look at the silly embedded indexing tools (i.e., the indexing modules included in word processors) to see how low our needs are on the totem pole of software development.

Nancy Mulvany
nmulvany@well.sf.ca.us
=========================================================================
Date: Thu, 6 Aug 1992 13:22:51 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: "Nancy C. Mulvany"
Subject: Mentoring
----------------------------Original message----------------------------
Re > Mentoring....

If you are interested in learning about book indexing and a bit about the freelance indexing world, there is a correspondence course you may be interested in. I designed the course, so I can tell you a bit about it!

There are ten lessons plus a final exam. There are three index-writing assignments; the other lessons essentially involve answering questions. The primary text is Chapter 18 of The Chicago Manual of Style. Webster's Style Guide and ASI's Generic Markup of Index Manuscripts are also used.

Many students have gone through this course and have become successful freelance indexers.
Others find out that they can't stand indexing!!

If you're interested, request the Correspondence Study Catalog from:

   Correspondence Study Program
   Graduate School, USDA
   14th & Independence Ave., SW
   Room 1114
   Washington, DC 20250
   phone: (202) 720-7123

Nancy Mulvany
nmulvany@well.sf.ca.us
=========================================================================
Date: Fri, 7 Aug 1992 09:04:00 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: Kody Janney
Subject: RE: Automatic Indexing

I am new to the field. I have a bent towards the automatic indexing side. Is there a list of auto indexing programs or any information on how these systems work available? I do have (excerpts from) Lancaster's book. But I am interested in very current information. And in actual systems in use.

Thanks for any information you can give me on these elementary questions.

-Kody Janney
Interactive Home Systems
=========================================================================
Date: Fri, 7 Aug 1992 09:06:14 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: KROVETZ@cs.umass.EDU
Subject: Re: automatic indexing
----------------------------Original message----------------------------
Candy mentions (hi Candy!):

   First of all, let's make a distinction between automatic indexing and free text searching. The former is the identification of subject (or other) content (or likelihood of relevance, or presence in a class, or whatever) based on statistical characteristics of words (or other pieces of item representations) and their distribution in a file. The latter is searching based on words which appear in other than, or in addition to, controlled vocabulary fields. Only in the broadest sense could you call a simple inverted file index an automatic index.

Rather than making a contrast between automatic indexing and free text searching, I think we should look at it from the viewpoint of representation. Automatic indexing can be used to select a set of words from the document and use those words as a representation, or it can be used to infer some controlled vocabulary terms as the representation.
As Candy points out, manual assignment of controlled vocabulary terms seems to give the same level of performance as a representation based on the words in the document (appropriately weighted). My question is: why is this the case? I think it might be due to consistency problems in manual indexing, and the variability of expressing a concept in free-text search, but I haven't seen much discussion about this in the literature.

Nancy mentions:

   I tend to distinguish automatic methods from "human/manual" methods based upon the type of analysis of text that is used. Automatic methods employ algorithmic analysis, while human methods use intellectual analysis of text.

I don't see algorithmic analysis as being in contrast to an intellectual analysis. I would make a distinction between the simple methods for automatic indexing (based on the statistical properties of the words contained in the documents) and the more intelligent approaches (such as those based on linguistics).

   First of all, in deference to those who are working to improve the sophistication of algorithmic analysis of text, most automatic indexing techniques that I have seen applied to a single book are not sophisticated, they are quite simple; i.e., the concordance generators, KWIC/KWOC/KWAC lists, etc.

I'm not surprised that these aren't a lot of help. They provide you with too many words that are not appropriate as representations (or at least suggestive of an appropriate concept). I'm not a fan of simple statistical approaches (combining linguistics with statistics is another story), but they should at least be helpful in pruning down the list of suggested words. One thing that could be done is to compare the frequencies of the words with their normal frequency in English. If the words exhibit a large variance from that, then those words could be proposed as indices.

   What are the major problems? They can only list terms that appear in the text. They do not pick up on concepts not stated verbatim.
   They do not indicate relationships between similar topics. They do not distinguish between passing mention of a topic and substantive discussion of a topic. They do not take into consideration the language of the audience.

Inferring concepts (as opposed to exact terms) is certainly a hard problem, but I don't see it as something impossible. The automatic indexing systems that try to infer controlled vocabulary terms are one effort in that direction. There is an interesting article in the latest issue of Communications of the ACM about the assignment of categories in a classification system for the U.S. Census; it doesn't perform as well as a human, but it does fairly well, and it has the potential to approach human performance if it is trained on more data.

Distinguishing between a passing use and a central topic is also a hard problem. There's some discussion about that in the linguistics literature, but it's mostly philosophical and doesn't give concrete suggestions for how to implement the notion algorithmically.

I'm not sure how indexers take the language of the audience into consideration when they assign index terms. Can you elaborate?

   I would love to see some linguistic analysis tools made available for book indexers. I know that some are working on tools that will analyze a portion of text and suggest index terms. When I need a tool like that, it will be an indication that I am indexing material beyond my expertise.

I know that book indices aren't intended to be exhaustive, but isn't it possible that a system could recommend index terms that you hadn't thought of?

   No, I need tools that will help insure the integrity of my index structure. For example, proper names pose problems for indexers. I would like a tool that would call my attention to two entries that may be the same, such as,

      Eastman Kodak Co.
      Kodak Co.

There are several projects going on involving proper noun recognition, and I think they would be a useful tool to the indexer.
This would certainly be better than examining an entire concordance! There would be difficulty with distinguishing different individuals with the same last name, and isolating the appropriate references is beyond the state of the art. But at least the system could indicate which proper nouns occurred.

In terms of analyzing cross-references, how does an indexer decide whether something should be a cross-reference vs. double-posted? The suggestion about adding cross-references reminded me of Susanne Humphrey's work with MedIndEX. I'm not sure why a linguistic analysis is needed though.

Bob

P.S. I've heard about KWIC and KWOC indices, but what does KWAC refer to?
=========================================================================
Date: Fri, 7 Aug 1992 14:18:40 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: "Bill Drew-Serials/Reference Librar. SUNY Morrisville"
----------------------------Original message----------------------------
I am in the process of writing my first book and I will be indexing it myself. I am writing it using WordPerfect 5.1. The subject of the book is electronic resources in agriculture. It will be part of a series for Meckler.

My question is where do I start? Should I be using a controlled vocabulary (such as the National Agricultural Library's thesaurus for Agricola)? Should I wait until the final draft is returned to me? Most of the information in this book will be presented in the form of structured entries with a unique number for each entry. Should an index refer to entries or pages?

I am organizing the information into a database program (DataPerfect from WordPerfect) where it will be exported out as a WP file to be put into subdocuments. I will then use WP's master document feature to assemble the final document.

Any suggestions would be appreciated. I really have very few ideas as to how to approach the indexing.
Wilfred Drew (Call me "Bill")
Serials/Reference/Computers Librarian
State University of New York College of Agriculture and Technology
P.O. Box 902; Morrisville, NY 13408-0902
BITNET: DREWWE@SNYMORVA  Internet: DREWWE@SNYMORVA.CS.SNYMOR.EDU
Phone: (315)684-6055 or 684-6060  Fax: (315)684-6115
Any opinions expressed here are mine and are subject to change without notice.
=========================================================================
Date: Fri, 7 Aug 1992 14:19:26 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: Candy Schwartz
Subject: Books on Automatic Indexing
----------------------------Original message----------------------------
Although they can be heavy going, I would suggest Salton and McGill's Introduction to Modern Information Retrieval, and Salton's Automatic Text Processing. I actually think the most cogent brief intro is still Van Rijsbergen's Information Retrieval, but it is quite dated now.

Candy Schwartz cschwartz@vmsvax.simmons.edu
=========================================================================
Date: Fri, 7 Aug 1992 14:19:59 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: Candy Schwartz
Subject: Indexer Consistency
----------------------------Original message----------------------------
There is a huge body of literature on inter- and intra-indexer consistency; it just hasn't been studied much in the recent past. I did a state of the art for my very first official publication back in 1974 or 75 or so (for an ASIS meeting), and there were some publications in the next five years, but not much since. Human indexing is not very consistent, and there are many reasons why.
Candy Schwartz cschwartz@vmsvax.simmons.edu
=========================================================================
Date: Fri, 7 Aug 1992 16:33:00 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: KROVETZ@cs.umass.EDU
Subject: automatic indexing references
----------------------------Original message----------------------------
I also like Van Rijsbergen's book, and I agree it is pretty dated. Salton's book is more recent, but I don't think it adds very much to the information given in Van Rijsbergen. I would recommend "Information Retrieval Experiment", by Karen Sparck Jones (Butterworths). It's an excellent collection of papers about the work that's been done, and the problems involved in doing IR experiments.

Candy - Can you post the reference for your review on indexing consistency (and maybe one or two of the papers since that review)?

In terms of book indexing, this is a list of papers that I've found on the subject:

"Back-of-the-book Indexing: A Case for the Application of Artificial Intelligence", C. Bell and K. Jones, Informatics 5, ASLIB Pub., pp. 155-161, 1979

"Performance Testing of a Book and its Index as an Information Retrieval System", B. Bennion, JASIS, pp. 264-270, July 1980

"Experiments in Book Indexing by Computer", H. Borko, Information Storage and Retrieval, Vol. 6, pp. 5-16, 1970

"Fully Automatic Book Indexing", M. Dillon and L. McDonald, J. of Documentation, Vol. 39(1), pp. 135-154, 1983

"Thesaurus-Based Automatic Book Indexing", M. Dillon, Information Processing and Management, Vol. 18(4), pp. 167-178, 1982

"Syntactic Approaches to Automatic Book Indexing", G. Salton, Proceedings of the 26th ACL, pp. 204-210, 1988

Bob
=========================================================================