Date: Wed, 9 Sep 1992 11:17:18 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: ANDRE DETIENNE
Subject: Passim, again
----------------------------Original message----------------------------
Thanks to those of you who shared their thoughts about the use of "passim." This word is not unanimously rejected, though we might say it is generally distrusted. I think it is worth investigating the matter a little more deeply, for it touches what we might call the very philosophy of indexing. In what follows I will make a number of remarks based on the objections raised by Nancy and Jeff. Warning: this message is not short.

The primary problem, Nancy says, is the lack of precision. How far this is a real shortcoming is worth studying, since it is mainly a matter of interpretation. If "passim" is affixed to a big group of pages, like "25-60 passim," very little help is indeed given to the reader, no matter the meaning of the term. If, however, "passim" is affixed to a small cluster of pages, like "25-30 passim," the word becomes useful, because it gives to those numbers a different tone, and thereby increases the information given to the reader. We may well imagine, for instance, a case where an author talks about objective idealism. If the entry reads "idealism, objective, 25-30," the reader will know that those pages address the topic somewhat fully. If it reads "25-30 passim," the reader will immediately know that those pages talk intermittently about objective idealism, or make implicit references to it, in a manner that may well be worth a look (at least a diagonal reading). We may even imagine that the phrase "objective idealism" occurs on each of those six pages in a way that is more than a mere "mention," and less than a focused discussion. This may occur when objective idealism serves, not as a central topic of discussion, but as an important context of discussion, or as a secondary topic, or as a regular point of reference.
Nancy adds that "passim" means different things to different people. Well, this is a truism. Any word can be understood and used differently by different people, be they index-makers or index-users. The point is that no matter how varying the meaning is, it will retain a common core, "here and there throughout," or some such. It may lack precision, but in a way that's exactly what the word is about: "the entry is worth looking up on pp. 25-30, but please know, dear reader, that it is not the central topic of discussion -- its various occurrences lack definite precision." Adding "passim" to hyphenated numbers, in this regard, amounts thus to an increase of information, since it tells the reader that pages 25-30 are worth looking at, though they don't deserve the same degree of expectation as regular page numbers.

Nancy suggests the use of the word "mentioned" as a good, more precise, substitute for "passim." I disagree. Those two words do not serve the same purpose. In a certain way, "mentioned" is a restriction of the use of "passim" to single discrete pages; it means: on this page and that one, the entry makes a fast appearance. I don't think this is more precise. There are many ways in which a given entry can be merely "mentioned": as a "by the way," an example, an "other case," a certain standpoint, a comparison, a context, a remark, a premise for an argument, a sudden recollection, a parenthesis, a provisional conclusion, etc. Look at the following layouts: "mentioned: 25, 26, 27, 28, 29, 30" and "25-30 passim." In my view, those two ways of putting it convey different meanings, though, in this special case, where all the numbers are consecutive, they are admittedly very close. The trouble with "mentioned" is that there are many cases where a concept is mentioned so rapidly that it is not even worth indexing, so that the question becomes, once again, where to draw the line.
This is a matter where the indexer's "concrete reasonableness" (as Peirce would have said) is to be trusted.

Nancy says that "an index is supposed to quickly and efficiently direct readers to a discrete unit of text." This is good common sense, but is rather restrictive. It is one thing to index a software manual (where you've got to be very precise and directive), and quite another to index a philosophical treatise. Many a time, it is just not possible to direct the reader's attention to a specific unit of text without betraying the author's thought (because of the way an argument is developed). A discrete unit of text is always an abstraction, some sort of discontinuity introduced in the text flow. Of course, the very notion of discrete unit is subject to debate. Ordinarily, the smallest unit of text in an index is an entire page (except in the case of a footnoted page number, where the reader is directed to a specific portion of the page). Ultimate precision (providing the line numbers) is not achievable, nor is it necessary or even desirable. The point is that indexers have to acknowledge that there are limits to precision. Even numbers like "25-30" without "passim" will be used differently by different indexers in similar circumstances. Some might have wanted to break it down to "25, 26, 27-29, 30" and others would have done it differently. The use of a hyphen is probably as dangerously convenient as the use of "passim."

Jeff says that if we can't be specific, we are hardly competent indexers. I suppose this is not utterly false. But indexers can't be more specific than the text they index. Sometimes an author would discuss complex matters through circumvolutions of various sorts; why should the index not try to reflect that as well? The use of subentries, carefully considered, may help in certain cases, but they are not a cure. Consider the following example: "Peirce's conception of history, 155-165 passim."
The author of the journal article, devoted to the topic of history in general, happens in those pages to refer to Peirce's conception at every turn, though not in a systematical way; his main purpose is elsewhere, and he constantly intersperses his argument with references to Peirce that are not mere "mentions." In such a case, the word "mentioned" followed by eleven page numbers will not do, and "passim" seems to be the best alternative. I should add that "155-165 passim" does not remove, and is not removed by, the possibility of adding subentries to that main entry within that same range of pages. Theoretically, I suppose, "passim" could be used at any level in an entry (main, sub, or sub-sub), which goes to show that specific subentries can't replace passims: they just don't play the same role. Jeff also argues that "too many passims look sloppy and are bad for business," and "it is better to find some other rubric to cover the references." This is a very weak argument, rather cosmetic. It is very unlikely that an index would ever count too many passims. Too many would undoubtedly be a sign of either a sloppy index or indeed a very sloppy text. But that would hardly ever happen, would it? As to finding some rubric (such as a subentry, I imagine) to cover potential "passim" references, well this is a trick that can't work all the time. Often enough it would be simply more honest to use passims than trying to cover them up. Whether passims look nice or not should hardly be a consideration: what indexers want is to be true to the text. The main fear, I gather, comes from the temptation of overusing passims. The temptation is real, but so what? It is all a matter of the indexer's personal integrity, and that cannot be controlled, passims permitted or not. And as with many other things, I am not sure that banning passims altogether to guard ourselves against the temptation is a solution. Can't we trust the indexer's maturity in this regard? 
What we need is probably a precise technical definition of the textual circumstances in which the use of passim would be acceptable or not. Nancy's reference to the Stanford University Press guide might be a start, though it is rather unsatisfactory. A cluster of references within three pages of one another might mean this: references are found on pages 1, 3, 5, 8, 11, 13, 15, 18, 19, 21, 24, 26, so that we could get "1-26 passim," and the use of the word "mentioned" is more effective here. "Passims" require a greater continuity, mixed with imprecision.

Enough said about this subject. Does anyone care to respond?

Andre De Tienne
ADETIENN@INDYCMS.BITNET
=========================================================================
Date: Wed, 9 Sep 1992 11:17:46 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: Gordon Joly
Subject: Stemmer?
In-Reply-To: Your message of "Fri, 31 Jul 92 10:34:19 +0700."
----------------------------Original message----------------------------
>> Well I was wondering if anybody can help me in finding source code for
>> a stemmer since I find it pretty hard to do over here. A stemmer is
>> part of the automatic indexing process of document databases in the
>> Information retrieval field, its basic function is to stem the
>> free-text format documents, ie
>>
>>    stemmed word    matching
>>
>>    ANALY*          Analysis, Analyst, Analyzer etc...
>> ____

Gordon Joly
Phone +44 71 387 7050 ext 3703
FAX +44 71 387 1397
Internet: G.Joly@cs.ucl.ac.uk
UUCP: ...!{uunet,uknet}!ucl-cs!G.Joly
Computer Science, University College London, Gower Street, LONDON WC1E 6BT
=========================================================================
Date: Wed, 9 Sep 1992 14:35:19 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: Kody Janney
Subject: RE: Passim, again
----------------------------Original message----------------------------
Although I do have a strong interest in automatic indexing, I am not an indexer.
I am certainly an index user. I would like to comment on passim from the user's point of view. Most people don't know what the word means. I had always assumed that it meant "in passing". I suppose I could have reached for a dictionary to look it up, but who uses a dictionary while they are using an index? And my Latin, like most Americans', is nonexistent.

As a user, I find that I most appreciate those indexes that give me an indication of the degree of importance of an index term in any particular page. When I see an index that uses passim or mentioned I can use it happily knowing that the indexer has weighted the terms for me and I can cast my net for a broad discussion of a concept or for a very pointed one. So I guess that this is a user vote for mentioned over passim, but better passim than no indication of weight at all.

Kody Janney
Interactive Home Systems
kodyj@microsoft.com
=========================================================================
Date: Thu, 10 Sep 1992 14:33:13 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: Charlotte Skuster
Subject: NISO standards draft

James Anderson, chairperson of the NISO committee on standards for indexes, has agreed to share the second draft of the standards with INDEX-L. This is a huge document, so I have divided it into three sections. Even divided, each of the parts is larger than normal for sending through listservs. My advisor at our computer center assures me that sending three large documents to you folks will work just fine. I am not totally convinced, but being a neophyte to network matters, I will follow the advice of the expert. My apologies ahead of time if the document is too large for some systems to handle. I will send the first chunk today, the next tomorrow and the third on Monday.
Charlotte
=========================================================================
Date: Thu, 10 Sep 1992 14:47:11 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: Charlotte Skuster
Subject: Standards Draft Part 1

..ANSI/NISO Standard Guidelines for Indexes, Draft #2.1, page

Standards for Libraries * Information Sciences * Publishing
National Information Standards Organization (Z39)

ANSI/NISO Z39.4-199X
Proposed American National Standard Guidelines for Indexes in Information Retrieval

Draft #2.1, prepared by James D. Anderson, Chairperson, based on committee recommendations and discussion as of May 4, 1992, plus suggestions by Nancy Mulvany, Deborah Swain, Hans Wellisch, and Ralph Earle.

DISTRIBUTION -- This draft is available for comment to all members of the indexing community. A paper copy may be obtained from Rutgers University for the cost of copying, postage and handling ($10.00). Make checks payable to Rutgers The State University. See the addresses in the COMMENTS section below.

COMMENTS -- Please send comments regarding this draft to James D. Anderson, School of Communication, Information, and Library Studies, Rutgers the State University of NJ, 4 Huntington St., New Brunswick, NJ 08903, 908/932-7501, FAX 932-6916, internet janderson@zodiac.rutgers.edu

NOTE -- @ has been used for accent codes; @@ has been used to mark italics; @@@ has been used to mark boldface type. After editing, these codes will be replaced with the appropriate accents or type-faces.

Committee Members

James D. Anderson, Chairperson
School of Communication, Information, and Library Studies
Rutgers the State University of NJ
New Brunswick, NJ 08903

Barbara Anderson
DIALOG Information Services
Palo Alto, California

Catherine Grissom
Department of Energy
Office of Sci & Tech Information
Oak Ridge, Tennessee

Nancy Mulvany
Bayside Indexing Service
Kensington, California

Barbara Preschel
Public Affairs Information Service (PAIS)
New York, New York

Deborah Swain
IBM
Research Triangle Park, North Carolina

Hans Wellisch
College Park, Maryland

Abstract

This standard provides guidelines for the content, organization, and presentation of indexes used for the retrieval of documents and parts of documents. It deals with the principles of indexing, regardless of the type of material indexed, the indexing method used (intellectual analysis, machine algorithm, or both), or the medium of the index and its presentation. It includes definitions of indexes and of their parts, attributes and aspects; a uniform vocabulary; treatment of the nature and variety of indexes; and recommendations regarding the design, organization, and presentation of indexes. It does not attempt to set standards for every detail or technique of indexing. These can be determined for each index on the basis of factors covered in the standard, including the type of material indexed, the medium of the index, and the type of user for whom the index is designed.

Table of Contents

Committee Members
Abstract
Table of Contents
Guide to the Standard
Normative References
Bibliography
0. Proposed title change
1. Scope of the standard
  1.1. General statement
  1.2. Types of documents
  1.3. Presentation of indexes
  1.4. Choice of terms
  1.5. Method of preparation
2. Definitions
3. Functions of an index
4. Types of index
  4.1. Indexes by type of referent
  4.2. Indexes by type or extent of indexable matter used to produce the index
  4.3. Indexes by arrangement of entries
  4.4. Indexes by method of term coordination for searching
  4.5. Indexes by type, format, or genre of document or media indexed
  4.6. Indexes by medium of index
  4.7. Indexes by periodicity of the index
  4.8. Indexes by authorship
5. Features and attributes of indexes
  5.1. Subject scope
    5.1.1. Multiple versus unified subject indexes
  5.2. Documentary scope
    5.2.1. Multiple versus unified document type indexes
  5.3. Sources
  5.4. Codes and symbols
  5.5. Display media
  5.6. Documentary units
  5.7. Indexable matter
  5.8. Analysis method
  5.9. Exhaustivity
  5.10. Specificity
  5.11. Syntax
  5.12. Vocabulary management
  5.13. Text surrogation; locators
  5.14. Entry display
  5.15. File display and arrangement
6. Vocabulary
  6.1. Sources of vocabulary
  6.2. Forms of terms
    6.2.1. Parts of speech
    6.2.2. Spelling
    6.2.3. Capitalization
    6.2.4. Singular and plural forms
    6.2.5. Articles
    6.2.6. Bound terms
    6.2.7. Antonyms and associated terms
    6.2.8. Terms consisting of more than one word
    6.2.9. Proper names and titles of documents
      6.2.9.1. Personal names
      6.2.9.2. Corporate body names
      6.2.9.3. Geographical names
      6.2.9.4. Titles of documents
      6.2.9.5. First lines
    6.2.10. Romanization
  6.3. Homographs
  6.4. Synonymous and equivalent terms
  6.5. Hierarchical relationships among terms
  6.6. Other relationships
  6.7. Changes in terminology
  6.8. Display of vocabulary in indexes
    6.8.1. Integrated display of vocabulary
    6.8.2. Vocabulary in file displays
      6.8.2.1. Scope and history notes
      6.8.2.2. Cross-references
    6.8.3. Vocabulary management for implicit indexes
7. Entries, headings, and search statements
  7.1. Heading or search syntax
  7.2. Post-coordination syntax
    7.2.1. Boolean syntax
    7.2.2. Weighted term combinations
    7.2.3. Proximity operators, stemming, and truncation
  7.3. Pre-coordination syntax
    7.3.1. Ad hoc syntax
    7.3.2. Natural language syntax
      7.3.2.1. KWIC indexes
      7.3.2.2. KWOC indexes
      7.3.2.3. KWAC indexes
    7.3.3. Subject heading lists
    7.3.4. Permuted indexes
    7.3.5. String indexing
      7.3.5.1. Rotated terms
      7.3.5.2. Faceted indexing
      7.3.5.3. Ad hoc coding
      7.3.5.4. Chain indexing
  7.4. Locators
    7.4.1. Locators for printed documents
    7.4.2. Locators for documents in other media
    7.4.3. Multiple locators in indexes to single documents
    7.4.4. Methods of emphasizing locators
    7.4.5. Presentation of locators
8. File display
  8.1. File display in electronic media
  8.2. File display in visual indexes
    8.2.1. File arrangement
      8.2.1.1. Classified or relational file displays
      8.2.1.2. Alphanumeric file displays
    8.2.2. Recurring elements
    8.2.3. Vertical spacing
    8.2.4. Entry layout
    8.2.5. Introductory note
    8.2.6. Running headlines
    8.2.7. Continuation lines
    8.2.8. Typography
    8.2.9. Columns
    8.2.10. Electronic manuscripts
9. Alphanumeric order
  9.1. Standards
  9.2. Basic order
  9.3. Initial articles
  9.4. Subheadings
  9.5. Headings with the same initial term
  9.6. Cross-references
  9.7. Word by word versus letter by letter arrangement
  9.8. Numerals
  9.9. Comprehensive example

Guide to the Standard

This standard consists of 9 sections. They are briefly summarized here:

1. Scope of the standard: describes aspects of index preparation and presentation addressed by the standard. Encompassed are principles, rather than detailed procedures, for the presentation of indexes for retrieval of all types of documents.

2. Definitions: more than 90 terms are defined in order to clarify their use in this standard. Terms are arranged in alphabetical order.

3. Functions of an index: an expanded definition of "index" in the context of information retrieval, in terms of the minimum functions an index ought to perform.

4. Types of index: a continuation of the expanded definition of "index" in terms of the enormous variety of types of index.

5. Features and attributes of indexes: 15 key features and attributes common to all kinds of indexes are briefly described.
For the most part, the standard does not specify how these key features and attributes should be implemented, but states that decisions on options should be based primarily on needs, habits, and preferences of users; publishers and index producers should discuss and agree on feature and attribute options prior to the production of an index; and that all special or unusual features should be made clear in an introductory statement or in documentation.

6. Vocabulary: recommendations concerning sources and form of vocabulary used in indexes. The standard emphasizes the importance of linking alternative terms for concepts. It recommends linking of terms for related concepts as well. The display of vocabulary information should be integrated into the index itself.

7. Entries, headings, and search statements: recommendations concerning the combination of terms to create index headings, entries and search statements. The principal recommendation states that such combination is absolutely essential. A wide variety of index syntax is described.

8. File display: options and recommendations for the display of visual indexes, including arrays of retrieved entries or records from implicit indexes (electronic indexes which are not displayed to the user).

9. Alphanumeric order: recommendations for the arrangement of alphanumeric indexes.

Normative References

The following American National Standards Institute/National Information Standards Organization (ANSI/NISO) standards contain provisions that, through reference in this text, constitute provisions of this NISO standard. At the time of publication, the editions indicated were valid. All standards are subject to revision, and parties to agreements based on this standard are encouraged to investigate the possibility of applying the most recent editions of the standards indicated below.

ANSI/NISO Z39.19 -- 199x.
@@American National Standard Guidelines for the Construction, Format and Management of Monolingual Thesauri.@@ ANSI Z39.29 -- 1977. @@American National Standard for Bibliographic References.@@ (Currently under revision.) ANSI/NISO Z39.59 -- 1988. @@American National Standard for Electronic Manuscript Preparation and Markup.@@ The following ISO standards and drafts are cited: ISO 690: 1987 (E) -- @@International Standard, Documentation -- Bibliographic references -- Content, Form and Structure.@@ ISO 9115: 1987 (E) -- @@International Standard, Documentation -- Bibliographic Identification (biblid) of Contributions in Serials and Books.@@ ISO/CD 999.4 Information and Documentation -- @@Guidelines for the Content, Organization and Presentation of Indexes.@@ NOTE -- There are no ANSI/NISO standards for the ordering of alphanumeric characters or other signs and symbols. The following filing rules from the American Library Association and the Library of Congress function as de facto standards in the United States, but they are incompatible with each other. Recommendations in this standard are closest to the ALA Filing Rules. American Library Association, Filing Committee. @@ALA Filing Rules.@@ Chicago: American Library Association; 1980. ix, 50 p. Library of Congress, Processing Services. @@Library of Congress Filing Rules,@@ prepared by John C. Rather and Susan C. Biebel. Washington: Library of Congress; 1980. vii, 111 p. The de facto standard for the formulation of name headings, both personal and corporate, is the following: @@Anglo-American Cataloguing Rules,@@ 2d edition, 1988 revision. Prepared by the Joint Steering Committee for Revision of AACR; ed. by Michael Gorman and Paul W. Winkler. Chicago: American Library Association; 1988. Bibliography This standard assumes basic understanding of indexing and indexes. Here are books which can be helpful in providing background information. They are arranged in inverse chronological order. Bell, Hazel K. 
@@Indexing biographies and other stories of human lives@@. London: Society of Indexers, 1992.

Fetters, Linda K. @@A guide to indexing software@@. 4th ed. Port Aransas, TX: American Society of Indexers, 1992.

@@Index evaluation checklist: a guide for authors, editors, publishers, reviews, librarians@@. Port Aransas, TX: American Society of Indexers, 1991.

Lancaster, F. W. @@Indexing and abstracting in theory and practice@@. Champaign, IL: Graduate School of Library and Information Service, University of Illinois, 1991.

Wellisch, Hans H. @@Indexing from A to Z@@. New York: H. W. Wilson, 1991.

@@Indexing: the state of our knowledge and the state of our ignorance@@. Medford, NJ: Learned Information, 1989.

Salton, Gerard. @@Automatic text processing: the transformation, analysis and retrieval of information by computer@@. Reading, MA: Addison-Wesley, 1989.

Rowley, Jennifer E. @@Abstracting and indexing@@. 2nd ed. London: Bingley, 1988.

Craven, Timothy C. @@String indexing@@. Orlando, FL: Academic Press, 1986.

Soergel, Dagobert. @@Organizing information: principles of data base and retrieval systems.@@ Orlando: Academic Press, 1985.

Milstead, Jessica L. @@Subject access systems: alternative in design@@. New York: Academic Press, 1984.

Knight, G. N. @@Indexing, the art of@@. London: Allen & Unwin, 1979.

Borko, Harold; Bernier, Charles L. @@Indexing concepts and methods@@. New York: Academic Press, 1978.

@@UNISIST: indexing principles@@. Paris: Unesco, 1975.

0. Proposed title change: from "Basic Criteria for Indexes" to "Guidelines for Indexes in Information Retrieval."

1. Scope of the standard.

1.1. General statement. This standard provides guidelines for the content, organization, and presentation of indexes used for the retrieval of documents and parts of documents.
It deals with the principles of indexing, regardless of the type of material indexed, the indexing method used (intellectual analysis, machine algorithm, or both), or the medium of the index and its presentation. It includes definitions of indexes and of their parts, attributes and aspects; a uniform vocabulary; treatment of the nature and variety of indexes; and recommendations regarding the design, organization, and presentation of indexes. It does not attempt to set standards for every detail or technique of indexing. These can be determined for each index on the basis of factors covered in the standard, including the type of material indexed, the medium of the index, and the type of user for whom the index is designed. NOTE -- In other contexts, the term "index" can be used to indicate other phenomena, for example, a consumer price index indicates the rise and fall of prices. The construction and display of such indexes that do not refer to documents is not covered by this standard. 1.2. Types of documents. This standard applies to indexes for single documents and for collections of documents. "Document" is used in the broadest possible sense (see 2. Definitions). The standard therefore applies to every kind of message recorded on any kind of medium -- print-on-paper media (books, pamphlets, periodicals, reports, fliers, maps, pictures, photographs, etc.); electronic media (online, CD-ROMs, optical disks, magnetic disks or tapes, etc.); microforms; video media; film media; audio records; and realia -- encompassing every kind of format and genre, including treatises, literary works, patents, technical reports, charts, diagrams, tables, illustrations, music, performances, artistic works, and multimedia texts. The term "document" as used throughout this standard implies also parts, sections, even paragraphs or sentences within documents and to collections of documents, depending on the documentary unit to which index entries refer. 1.3. Presentation of indexes. 
This standard is concerned with basic indexing principles and practice as they affect the presentation of an index. It does not cover the detailed procedures of indexing. Emphasis is on presentation of the index, rather than on the way it is prepared or the way it is structured or stored electronically. The internal representation of computer-readable indexes (inverted files, for example) designed for electronic rather than human scanning is not directly addressed. Nevertheless, all kinds of indexes for human use are considered, regardless of the medium on which the index is displayed (e.g., print on paper, microforms, electronic media, etc.). Examples are illustrative, not prescriptive. 1.4. Choice of terms. This standard covers criteria for the choice of, form of, and access to terms and headings used in index entries once the topics or features to be indexed have been determined. (For the compilation of thesauri that may assist in the selection of index terms, see NISO Z39.19 @@Guidelines for the Construction, Format, and Management of Monolingual Thesauri.@@) 1.5. Method of preparation. This standard is relevant to the preparation of all types of indexes for information retrieval, regardless of whether they are produced "manually" on the basis of human intellectual analysis or by automatic or computer-assisted methods, and whether compiled by one indexer or by teams of indexers. 2. Definitions. These definitions describe terms as they are used in the context of this standard. Within definitions, terms that have their own definitions are spelled in capital letters. Defined terms are listed in the singular noun form; however, within other definitions, corresponding terms may appear as plural nouns, adjectives, or other forms. [NOTE -- Many of these terms are also defined in Z39.19 American National Standard Guidelines for Thesaurus Construction, Structure and Use. 
As soon as the final version of that standard is available, we will compare definitions that occur in both standards, and adjust ours to conform to theirs to the extent possible. -- JDA] array. A displayed FILE of HEADINGS or ENTRIES. assignment indexing. An INDEXING method by which TERMS are selected by a human or computer to represent the TOPICS or FEATURES of a DOCUMENT. Assigned terms may or may not occur in the documents. (SEE ALSO derivative indexing.) associative relationship. A non-hierarchical relationship among TERMS that are conceptually or semantically linked, for example, "cars" and "accidents". authority file. A set of records of established TERMS or HEADINGS and the CROSS-REFERENCES to be made to and from them, often citing the authority for the preferred form or variants. Types of authority files include name authority files, subject authority files, and THESAURI. (SEE ALSO descriptor.) Boolean operators. The logical operators "and", "or", and "not", which can be used to combine TERMS for searching in post-coordinate INFORMATION RETRIEVAL SYSTEMS. (SEE ALSO post-coordination.) bound term. SEE compound term. broader term. A term to which other terms are subordinate in a HIERARCHY. chain indexing. The creation of HEADINGS that consist of "chains" of TERMS extracted from a CLASSIFICATION scheme, arranged in an order opposite the CITATION ORDER of FACETS in the classification scheme itself. citation order. The order in which FACETS are arranged (cited) in a classified ARRAY; also the order in which TERMS from facets are placed in an index HEADING. class. A set whose members share an attribute, characteristic, property, quality or trait. classification. The operation of grouping CONCEPTS or ENTITIES into classes and establishing relations among them. TERMS representing classes are usually arranged in ARRAYS that illustrate relations among CLASSES, creating a classified INDEX, as opposed to an alphanumeric index. compound term. 
A TERM consisting of two or more words but representing a single CONCEPT; also a multi-word term representing multiple concepts that are so often considered together that representing them with separate terms is awkward. concept. A unit of thought. INDEXES represent concepts only by means of TERMS or other graphic symbols. Concepts exist only in the mind and are abstract entities independent of terms used to express them. controlled vocabulary. A subset of the lexicon of a NATURAL LANGUAGE. A list of TERMS that may be used for INDEXING, produced by the operation of VOCABULARY CONTROL. Controlled vocabularies are usually recorded in SUBJECT HEADING lists or THESAURI. cross-reference. A link or reference from one TERM or HEADING to another (SEE ALSO see reference; see also reference; equivalent term; broader term; narrower term; related term). Cross-references are of two types: (a) prescriptive, often beginning with the words "see" or "use" (or equivalent), which lead to one or more terms or headings that are to be used instead of the one from which the cross-reference is made; (b) suggestive, often beginning with the words "see also" (or equivalent). Suggestive references include "related term (RT)" references, which lead from one term or heading to other terms or headings that are related to or associated with it in the context of an INDEX, and HIERARCHICAL references, including "broader term (BT)" and "narrower term (NT)" references, which lead from one term or heading to other terms or headings which are broader or narrower than the LEAD TERM or heading in the context of an index. derivative indexing. An INDEXING method by which words occurring in the title or TEXT of a DOCUMENT are extracted by a human or computer to serve as indexing TERMS. Also called extractive indexing. (SEE ALSO assignment indexing.) descriptor. A TERM chosen as the preferred representation for a CONCEPT or FEATURE. difference. SEE modifier. document.
A MEDIUM on or in which a MESSAGE is encoded; thus, the combination of message and medium. The term applies not only to written and printed materials on paper or microforms (e.g., books, journals, maps, diagrams), but also to nonprint media (e.g., machine-readable records, transparencies, audiorecordings, videorecordings and film) and, by extension, to three-dimensional objects or realia (e.g., museum objects and specimens). The term "document" as used throughout this standard applies also to parts, sections, even paragraphs or sentences within documents and to collections of documents, depending on the DOCUMENTARY UNIT. documentary unit. The DOCUMENT, document segment, or collection of documents to which index ENTRIES refer and on which index entries are based. Examples of verbal documentary units include sentences, paragraphs, pages, articles, book-length monographs, complete serial-runs, or complete library collections. The documentary unit determines the relative size of document that an INDEX will point to or an INFORMATION RETRIEVAL SYSTEM will retrieve. entity. Something that has an existence, real or imaginary, concrete or abstract; a thing. entity-oriented indexing. INDEXING based entirely or primarily on the TOPICS and FEATURES of DOCUMENTS rather than on the anticipated needs and requests of users. (SEE ALSO request-oriented indexing.) entry. The representation of a DOCUMENT in an INDEX. Consists of a HEADING and a LOCATOR; the heading may be followed by one or more SUBHEADINGS and SUB-SUBHEADINGS. entry term. Also called lead-in term. A TERM or HEADING in an INDEX or THESAURUS to which direct access is provided. If an entry term is not a PREFERRED TERM, a CROSS-REFERENCE will lead to a preferred term that is used in its place. entry vocabulary. All TERMS by which access may be gained to the INDEX, including both those which lead to DOCUMENTS and those from which CROSS-REFERENCES lead to other terms that are used in their place. equivalent term.
A SYNONYMOUS term, or a TERM which is equivalent to, or used for, another term in the context of an INDEX. exhaustivity. The detail with which a DOCUMENT is indexed. The number of TERMS assigned, on the average, to a document in a particular INDEX or retrieval system. extractive indexing. SEE derivative indexing. facet. The total of subclasses resulting from the application of a single characteristic to a CLASS. Facets represent the fundamental or most important aspects of a TOPIC. In literature, for example, facets may represent language, nationality, genre, period, theme, writer, etc. Facets form one of the bases of CLASSIFICATION schemes; their CITATION ORDER determines the order in which the facets are arranged. faceted indexing. The assignment of terms to FACET categories and the ordering of terms within index HEADINGS in accordance with a CITATION ORDER of facets. false drop. An irrelevant reference retrieved when TERMS are POST-COORDINATED, for example, "library" and "school" will retrieve both "library school" and "school library". feature. An aspect of a DOCUMENT other than CONCEPTS or TOPICS treated. Features include such aspects as authorship, style, methodology, quality, usefulness, level of complexity, language, format, publication date, etc. file. A sequence of 2 or more ENTRIES or records. Distinguish from the common data-processing and word-processing usage in which "file" is often used for a text document or a collection of data. (SEE ALSO array.) focus. In a COMPOUND TERM, the noun component that identifies the class of CONCEPTS to which the TERM as a whole refers. (SEE ALSO modifier.) free text term. Antonym of CONTROLLED VOCABULARY term. NATURAL LANGUAGE TERMS appearing in DOCUMENTS or their descriptions, which may replace or complement DESCRIPTORS in an INFORMATION STORAGE AND RETRIEVAL SYSTEM. (SEE ALSO keyword.) generic posting. 
(1) In THESAURI, the use of BROADER TERMS in place of NARROWER TERMS, for example, beds @@use@@ furniture; sofas @@use@@ furniture. (2) In INDEXING and subject cataloging, the assignment of a generic term in addition to or instead of a specific TERM, for example, using "furniture" to index a DOCUMENT on sofas. (SEE ALSO up-posting.) gloss. SEE qualifier. heading. One or more TERMS representing a TOPIC or FEATURE of a DOCUMENT; the first element of an index ENTRY. It may be followed by one or more SUBHEADINGS. hidden index. SEE implicit index. hierarchy. A system of terms ranked by inclusiveness, so that the meaning of any lower term is always included in the meaning of the next higher term. Hierarchical ARRAYS display NARROWER TERMS under BROADER TERMS. homograph. TERMS that have the same spelling, but different meanings, such as race (anthropology), race (sports). Homographs must be distinguished by QUALIFIERS. identifier. A proper name (or its abbreviation) of an institution, person, place, object, operation or process, optionally treated as a type of TERM distinct from DESCRIPTOR. Identifiers may be held in a separate FILE (SEE ALSO authority file), and their form may be controlled (e.g., the name of an international organization having different names in various languages, only one of which is selected as an authorized term or descriptor). implicit index. An INDEX designed to be searched through electronic matching; it is not displayed for visual inspection or searching. Principles of arrangement of entries, while essential for VISUAL INDEXES, do not apply to implicit indexes. index. A systematic guide designed to indicate TOPICS or FEATURES of DOCUMENTS and used to make possible and facilitate retrieval. In other contexts, the term "index" is also used to indicate other phenomena, for example, a consumer price index indicates the rise and fall of prices. 
The construction and display of such indexes that do not refer to documents is not covered by this standard. indexing. The operation of creating an INDEX for information retrieval; the selection and assignment of TERMS to, or the extraction of terms from, a DOCUMENT in order to indicate TOPICS or FEATURES represented or possible uses of the document and for its retrieval in an INFORMATION STORAGE AND RETRIEVAL SYSTEM. indexing language. In a broad sense, any vocabulary used for INDEXING and the rules for its application. In a narrower sense, a CONTROLLED VOCABULARY or CLASSIFICATION system and the rules for its application. information. Refers both to an ENTITY (e.g., a MESSAGE recorded in a TEXT and represented in a DOCUMENT) and to the process of informing or becoming informed. What constitutes an informative message and successful information (as process) is subjective. Preferably, terms like "message", "text", or "document" should be used when referring to entities that may be informative. information retrieval system. A set of operations and the associated equipment, procedures, algorithms, and documentation by which DOCUMENTS are indexed and the resulting records are stored and displayed, so that selected records (and/or the documents they represent) can be retrieved. Also known as "information storage and retrieval system" (ISAR). keyword. A word occurring in the NATURAL LANGUAGE of a DOCUMENT or its description that is considered significant for INDEXING and retrieval. Any word not on a stop list contained in a verbal segment of a document or assigned to a document, such as, title, abstract, SUBJECT HEADINGS. Used as LEAD TERMS in keyword indexes such as keyword-in-context (KWIC), keyword-out-of-context (KWOC), and keyword-along-side-context (KWAC) indexes. KWAC (Key Word Along-side Context) index.
An INDEX in which each significant word in a string of TEXT serves as LEAD TERM or access point, followed by the portion of the string that follows the word, then by the portion of the string that precedes the word. KWIC (Key Word In Context) index. An INDEX in which each significant word in a string of TEXT serves as LEAD TERM or access point, by being graphically emphasized and surrounded by the rest of the string. The lead terms or access points are arranged in a column in the middle of the ENTRIES rather than at the left. KWOC (Key Word Out of Context) index. An INDEX in which each significant word in a string of TEXT serves as LEAD TERM or access point, followed by the complete string. Multi-word terms which include the lead term are not preserved, since the lead term is always followed by the first word of the string. lead-in term. SEE entry term. lead term. The first TERM in a HEADING. Distinguished from "lead-in term." literary warrant. Justification for the representation of a CONCEPT in an INDEXING LANGUAGE or for the selection of a PREFERRED TERM because of its frequent occurrence in the literature. locator. The part of an ENTRY that represents the DOCUMENT to which the entry refers. Locators range from brief notation, such as page numbers, to full bibliographic citations. main heading. SEE heading. medium. The physical ENTITY on or in which a MESSAGE is recorded. A medium and a message recorded in or on it constitute a DOCUMENT. message. The INFORMATION conveyed by a DOCUMENT. modifier. In a COMPOUND TERM, one or more components which serve to narrow the extension of a FOCUS and specify one of its subclasses. Also known as 'difference.' (SEE ALSO subheading.) monographic index. 1. An INDEX compiled for a single DOCUMENT. 2. A one-time, closed-end index. (SEE ALSO serial index.) narrower term. A TERM that is subordinate to another term in a HIERARCHY. natural language. A language used by human beings for verbal communication. 
Words extracted from natural language TEXTS for INDEXING purposes are often called KEYWORDS. near-synonym. SEE quasi-synonym. nonpreferred term. A TERM not used as a DESCRIPTOR, but linked to its equivalent descriptor by a CROSS-REFERENCE in a CONTROLLED VOCABULARY. permuted index. The representation of TERMS in HEADINGS in every possible combination or permutation. Because of the rapid expansion of possible term combinations as the number of terms increases, most permuted indexes display only two terms at a time. post-coordination. The combination of TERMS at the time of a search for a compound CONCEPT, for example, "cataloging" + "periodicals" for the concept "cataloging of periodicals". (SEE ALSO pre-coordination.) postings. The number of DOCUMENTS to which a TERM or HEADING is assigned. pre-coordination. The formulation of a multi-term SUBJECT HEADING or the linking of a HEADING with a SUBHEADING prior to or at the time of INDEXING to express a compound CONCEPT, for example, "cataloging of periodicals" or "cataloging -- periodicals." Pre-coordination differs from the establishment of bound or COMPOUND TERMS as DESCRIPTORS, for example, "birth control" (a bound term) vs. "birth control -- education -- United States" (pre-coordinated terms). preferred term. SEE descriptor. probabilistic indexing. The assignment of weights to TERMS, either through computer algorithm or human estimation, to reflect the relative importance of terms in the description of TOPICS or FEATURES of a DOCUMENT or the representation of a search request. Weights are intended to reflect the probability that a document described by a particular term will be considered useful. The use of weighted terms permits the ranking of retrieved documents on the basis of expected usefulness. proximity operator. 
A search operator which specifies that two or more search TERMS must be within the stated proximity (e.g., contiguous, not separated by more than 2 or more words, within the same sentence or paragraph or record, etc.). qualifier. A word or phrase added to a TERM used to distinguish among HOMOGRAPHS or to clarify the meaning of a term, for example, "races (ethnology)", "races (sports)". A qualifier is often placed in parentheses and is considered to be part of the term. (SEE ALSO modifier.) quasi-synonym. A TERM whose meaning is not exactly SYNONYMOUS with that of another term, yet which may nevertheless be treated as its equivalent in a particular INDEX. (SEE ALSO equivalent term.) related term. A TERM that is semantically but not HIERARCHICALLY linked to another term by means of a CROSS-REFERENCE. relationship indicator. A word, phrase, abbreviation or symbol identifying a semantic relationship between TERMS. Examples of relationship indicators used in CROSS-REFERENCES include: "USE" and "USED FOR" (for EQUIVALENT TERMS), BT (for BROADER TERMS), NT (for NARROWER TERMS), and RT (for RELATED TERMS). Examples of relationship indicators used in index ENTRIES include: compared to, influence of, applied to, application of, etc. (SEE ALSO role indicator.) request-oriented indexing. INDEXING which is based primarily on analysis of potential requests or searches and only secondarily on the content or FEATURES of DOCUMENTS. (SEE ALSO entity-oriented indexing.) role. A type of action by which the TOPIC represented by a TERM operates on a topic represented by another term in an index ENTRY, for example, application, comparison, influence, operation, process, etc. A role does not indicate either a HIERARCHICAL or an ASSOCIATIVE RELATIONSHIP. role indicator. A word, phrase, abbreviation or symbol identifying the ROLE of a TOPIC represented by a TERM. (SEE ALSO relationship indicator.) Romanization. 
The conversion of a non-Roman script by means of TRANSCRIPTION or TRANSLITERATION or a combination of the two methods. rotated index. The rotation of TERMS assigned to a DOCUMENT so that each one, in turn, becomes a LEAD TERM and all others constitute a SUBHEADING; non-lead terms may be listed in alphanumeric order or their original order may be maintained, as in a KWAC INDEX. scope note. An explanation, definition, or clarification of a TERM. A scope note is not considered part of a term. (SEE ALSO qualifier.) "see also" reference. A link between two or more TERMS or HEADINGS, for the purpose of suggesting additional BROADER, NARROWER, or other RELATED TERMS or headings. "see" reference. A link between an unused or NONPREFERRED TERM or HEADING and the SYNONYMOUS or equivalent DESCRIPTOR or heading to be used in its place; in electronic INDEXES, synonymous and EQUIVALENT TERMS may be linked so that all may be included in a search, rather than designating one of the linked terms as a "preferred" term and the others as "unused terms." serial index. 1. An INDEX compiled for one or more serials (periodicals, yearbooks, etc.). 2. Any continuing, open-end index. (SEE ALSO monographic index.) specificity. The closeness of fit between an indexing TERM and the TOPIC or FEATURE represented in a DOCUMENT to which it refers. "Specific" does not mean "narrow." A specific term may be broad or narrow depending on the topic or feature to which it refers and its relationship to BROADER or NARROWER TERMS. string indexing. The creation of multi-term index HEADINGS, or "strings" of TERMS, from individual index terms by computer algorithm. Index terms may be coded, sometimes by FACET or role. A string indexing algorithm puts each important term in the lead position and arranges other terms as a SUBHEADING. sub-subheading. A HEADING subordinated to or modifying a SUBHEADING. subentry. SEE subheading. subheading. 
A HEADING subordinated to a heading in an ENTRY in order to modify or delimit the heading. (SEE ALSO pre-coordination.) subject. SEE concept; feature; topic. subject heading. A TERM or combination of terms used to indicate the summarized overall TOPIC of a DOCUMENT. PRE-COORDINATION of terms representing multiple and related TOPICS or FEATURES is a characteristic of subject headings that distinguishes them from DESCRIPTORS, which tend to represent individual CONCEPTS or features. Subject headings are generally used in printed indexes and library catalogs, whereas descriptors are designed for POST-COORDINATION in electronic INFORMATION RETRIEVAL SYSTEMS. synonym. A TERM having a different form, but exactly or very nearly the same meaning as another term. syntax. The combination of TERMS to form HEADINGS and SUBHEADINGS in index ENTRIES or to form search statements for electronic indexes; also, the rules for such combination. term. A word or phrase used to represent a TOPIC or FEATURE of a DOCUMENT. text. Any organized pattern of symbols. A text may be verbal (a representation of speech, as in writing systems); visual, as in the visual arts; musical, as represented in musical notation; performance, as represented in choreography notation; aural, as in sound recordings; etc. Many disciplines, such as chemistry and mathematics, have special symbol sets to represent texts. A text is manifested in a DOCUMENT. (SEE ALSO message.) thesaurus. A collection of vocabulary with links among SYNONYMOUS, EQUIVALENT, BROADER, NARROWER, and other RELATED TERMS. From the Greek for treasure. topic. An ENTITY, process, operation, place or time period treated in a DOCUMENT. (SEE ALSO feature.) transcription. The process of recording the phonological and/or morphological elements of a language in terms of a writing system. transliteration. The process of recording the graphic symbols of one writing system in terms of corresponding graphic symbols of another writing system. 
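The thesaurus and cross-reference relationships defined in this glossary (USE, BT, NT, RT) can be pictured as a small linked data structure. The Python sketch below is illustrative only and is not part of the draft standard. The dictionary layout, the function names `preferred` and `broader_terms`, and the terms "couches" and "upholstery" are assumptions made for the example; "furniture", "beds", and "sofas" echo the draft's own generic-posting example.

```python
# Hypothetical toy thesaurus. Keys are terms; each record holds the
# relationship indicators named in the glossary:
#   USE = "see" reference from a nonpreferred term to its descriptor
#   BT  = broader terms, NT = narrower terms, RT = related terms
THESAURUS = {
    "furniture": {"NT": ["beds", "sofas"]},
    "beds": {"BT": ["furniture"]},
    "sofas": {"BT": ["furniture"], "RT": ["upholstery"]},
    "couches": {"USE": "sofas"},  # nonpreferred entry term (invented)
    "upholstery": {"RT": ["sofas"]},
}


def preferred(term):
    """Follow a USE reference, if any, to the preferred term (descriptor)."""
    return THESAURUS.get(term, {}).get("USE", term)


def broader_terms(term):
    """Collect every broader term above `term` in the hierarchy, nearest first."""
    found = []
    todo = [preferred(term)]
    seen = set(todo)
    while todo:
        current = todo.pop(0)
        for bt in THESAURUS.get(current, {}).get("BT", []):
            if bt not in seen:  # guard against accidental cycles
                seen.add(bt)
                found.append(bt)
                todo.append(bt)
    return found


if __name__ == "__main__":
    print(preferred("couches"))       # sofas
    print(broader_terms("couches"))   # ['furniture']
```

Under these assumptions, a searcher entering the nonpreferred term "couches" is led to the descriptor "sofas", and the list returned by `broader_terms` is what an indexing system could assign in addition to the specific term if it chose to post documents to broader terms as well.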
uncontrolled vocabulary. TERMS derived by extraction of significant words or phrases, usually from full text, titles, or abstracts. May also refer to search terms freely chosen by a searcher. (SEE ALSO free text term; keyword; term.) uniterm. A single-word term representing a single elemental concept. (SEE ALSO compound term.) unit of analysis. SEE documentary unit. up-posting. The automatic assignment of BROADER TERMS in addition to the specific TERM by which a DOCUMENT is INDEXED. (SEE ALSO generic posting.) used-for term. SEE equivalent term. vector. A weighted INDEX or search TERM. Weights may be assigned by computer algorithm based on term frequency or distribution in DOCUMENTS or by human estimation of importance. Vector searching permits the ranking of retrieved documents based on the expectation of usefulness. visual index. An INDEX designed and displayed for visual examination and searching. (SEE ALSO implicit index.) vocabulary control. The process of organizing a list of TERMS: (1) to indicate which of two or more EQUIVALENT TERMS are authorized for use; and (2) to indicate HIERARCHICAL and ASSOCIATIVE RELATIONSHIPS among terms in the context of a THESAURUS or SUBJECT HEADING LIST. (SEE ALSO vocabulary tracking and management.) vocabulary tracking and management. The process of tracking, mapping, organizing and displaying a vocabulary to facilitate INDEXING and/or searching. The results are often displayed in a "search" or "end-user" THESAURUS and/or integrated with the display of an INDEX. Vocabulary tracking and management is similar to VOCABULARY CONTROL, except that instead of limiting or controlling the use of vocabulary, it describes and displays vocabulary that has been or may be used. 3. Functions of an index. The function of an index is to provide users with an efficient and systematic means for locating documents, portions of documents, or descriptions of documents that may address information needs or requests. An index should therefore: a.
Identify documents that treat particular topics or possess particular features. b. Discriminate between major and minor treatments of particular topics or manifestations of particular features. c. Provide access to topics or features by means of the terminology of users. d. Link terms representing equivalent concepts and indicate relationships among terms representing related concepts. e. Provide for the combination of terms to facilitate the identification of particular types or aspects of topics or features and to eliminate unwanted types or aspects. 4. Types of index. Indexes may be categorized by type of referent to which headings refer, by type of indexable matter or analysis base used to produce the index, by method of arranging entries, by method of term coordination for searching, by type of document or media indexed, by medium of the index, and by whether the index is a one-time (closed-end, monographic) index or a continuing (open-end, serial) index. The following examples are illustrative of common types of indexes; they are by no means exhaustive. 4.1. Indexes by type of referent: a. author indexes: often include all types of document creators, such as composers, illustrators, translators, editors, choreographers, artists, sculptors, painters, inventors, etc. b. topic or subject indexes: refer to topics treated in documents. Such indexes often identify documents possessing particular features as well. c. name indexes: limited to proper names, such as names of persons, places, corporate bodies, etc. d. number or notation indexes: provide access through numerical or coded designations such as patent number, ISBN, date, classification notation, etc. 4.2. Indexes by type or extent of indexable matter used to produce the index: a. title indexes: based solely on titles. b. first line indexes: based on first lines of, for example, poetry, hymns. c. citation indexes: based solely on reference citations in documents. d.
full-text indexes: based on the full text of documents, with the possible exclusion of non-substantive words on a stop-list. 4.3. Indexes by arrangement of entries: a. alphabetical or alphanumeric indexes: Headings are arranged according to the commonly accepted order of letters and numerals. b. classified indexes: Headings are arranged on the basis of relations among headings, for example, hierarchy, inclusion, chronology, or association. Classified indexes are often based on pre-existing classification schemes, such as the Dewey Decimal Classification. c. alphabetico-classed indexes: Broad headings are arranged alphabetically. Narrower headings are grouped under broad headings, under which they may be arranged alphanumerically or relationally, for example, on the basis of hierarchy, inclusion, chronology, or association. NOTE -- Electronic indexes often have no arrangement that is apparent to the user. However, indexes designed for human scanning, browsing and examination must have some arrangement, regardless of medium. ========================================================================= Date: Fri, 11 Sep 1992 10:09:33 ECT Reply-To: Indexer's Discussion Group Sender: Indexer's Discussion Group From: Edie Rasmussen Subject: Re: Stemmer? ----------------------------Original message---------------------------- There's a chapter on Stemming Algorithms, by Bill Frakes, in "Information Retrieval: Data Structures and Algorithms" (W.B. Frakes and R. Baeza-Yates, eds., Prentice Hall, 1992). It summarizes studies on the effectiveness of a variety of stemming algorithms, and includes source code in C for Porter's algorithm. The code should also be on the disk which is included with the book (though I think early copies were shipped without it). If you have trouble getting the disk and don't want to enter the code, you could try contacting Bill Frakes at frakes@sarvis.cs.vt.edu and asking him for it. 
Edie Rasmussen University of Pittsburgh ========================================================================= Date: Fri, 11 Sep 1992 10:10:27 ECT Reply-To: Indexer's Discussion Group Sender: Indexer's Discussion Group From: Larry Bonura Subject: What do you read? ----------------------------Original message---------------------------- I am doing some research on what indexers read. In my random and unscientific sampling of responses I want to see just what types of business publications, trade journals, newspapers, consumer magazines, and books you, the indexer, read to keep abreast of your profession or to keep on top of your job. Let me know your favorite publications and why. If you don't read any publications for information on indexing or to keep up with your business, I'd like to know that, too. Also, do you have a system to make maximum use of your reading time and have you devised a file system to store articles for future reference? If so, please share them with me. ========================================================================= Date: Fri, 11 Sep 1992 10:12:43 ECT Reply-To: Indexer's Discussion Group Sender: Indexer's Discussion Group From: "Nancy C. Mulvany" Subject: Passim, again and again ----------------------------Original message---------------------------- Re> Passim, again and again Andre brings up some very interesting points in regard to our passim discussion; indeed several issues do get to the heart of the philosophy of indexing. First, I should like to address passim usage as it might appear in my own work. I primarily work with technical material. I have worked on material that would warrant the use of passim; one case that comes immediately to mind was a user's manual for a Macintosh software product. The fact that I was tempted to use passim in this type of index is testimony to sloppy writing and extremely poor editing. Passim was not used in this index; instead, many entries had long strings of undifferentiated page numbers.
Because of the nature of the text discussion it was impossible to break these strings of page numbers into subentries. However, the citations were still important enough to list. From my limited experience indexing philosophical texts, I will quickly add that the desire to use a passim reference is not always indicative of convoluted, sloppy writing/editing. On the other hand, so what? Who cares whether "indexable passing mentions" are due to sloppy writing or the development of an intricate argument? As Andre points out: "But indexers can't be more specific than the text they index. Sometimes an author would discuss complex matters through circumvolutions of various sorts; why should the index not try to reflect that as well?" I think Andre brings up a very valuable point, i.e., indexers can't be more specific than the text they index. Very true! As an indexer I have a strong desire to be precise. I want to send readers to an exact place in the text. I will now admit that the "exact place in the text" may indeed be a range of pages where a topic is discussed in passing. So, an entry like: objective idealism, 25-30 passim may indeed be the most precise way to cite this particular discussion. The more I think about this, the judicious use of passim may not fail my precision test at all! In fact, if we can agree about what passim means, it may prove to be the most precise reference locator format available for the type of discussion cited. Now, what about "mentioned"? Mentioned implies, in my mind, that the topic appears verbatim in the text and is not discussed in detail. We may not be able to use mentioned in the "objective idealism" example above if objective idealism is not stated verbatim on the pages cited. So, again I agree with Andre, mentioned is not a substitute for passim, it is a different beast. I can see mentioned used as a "CYA device" (cover-your-ass) in an index -- cite every verbatim occurrence of the topic, leave nothing out.
Whereas passim is not restricted to verbatim mentions; passim can be used to cite a discussion of a conceptual topic that is not necessarily stated verbatim on the pages. We are getting into an area of indexing where few tread! There are those indexers who believe that we only index what appears verbatim in the text. Then there are those who believe that we index what appears verbatim and "read in between the lines"; we also index (direct users to) concepts implied in the text. My description of passim allows the indexer to use the device to indicate implied discussion; whereas mentioned can only be used for verbatim discussion. Now we move into the question of how interpretive indexes should be. If "objective idealism" is discussed on pages 25-30 passim as a thread of thought that weaves its way through an argument on these pages but is not always discussed explicitly, is it the job of the indexer to make this type of discussion explicit by citing "25-30 passim?" I think I'm getting off the point. To summarize, we must agree on a definition for passim, or at least state how it is being used in a particular index in the headnote. I think it is correct to distinguish mentioned from passim. Lastly, if my index for a technical manual includes passim entries, that should immediately indicate that the manual should not be printed, it should go back to the editor and be properly edited! :-} -nancy nmulvany@well.sf.ca.us ========================================================================= Date: Fri, 11 Sep 1992 10:13:07 ECT Reply-To: Indexer's Discussion Group Sender: Indexer's Discussion Group From: "Richard L. Goerwitz" Subject: Re: NISO standards draft ----------------------------Original message---------------------------- I think the general interest of the indexing standards warrants sending them out. Not a one of us, probably, will delete it. Despite the load, therefore, I think you're right to send it out.
-Richard ========================================================================= Date: Fri, 11 Sep 1992 15:14:01 ECT Reply-To: Indexer's Discussion Group Sender: Indexer's Discussion Group From: Charlotte Skuster Subject: NISO standards draft part 2 4.4. Indexes by method of term coordination for searching: a. pre-coordinate indexes: See section 2 for definition of "pre-coordination". b. subject heading indexes: See section 2 for definition of "subject heading". c. string indexes: See section 2 for definition of "string indexing". d. chain indexes: See section 2 for definition of "chain indexing". e. post-coordinate indexes: See section 2 for definition of "post-coordination". f. rotated indexes: See section 2 for definition of "rotated indexes". g. uniterm indexes: See section 2 for definition of "uniterm". h. Boolean indexes (implicit indexes in which terms are combined using Boolean operators): See section 2 for definition of "implicit index" and "Boolean operators". i. vector or probabilistic indexes (implicit indexes in which retrieval is ranked according to occurrence of weighted terms): See section 2 for definition of "probabilistic indexing" and "vector". j. keyword indexes, including KWIC, KWOC, KWAC indexes: See section 2 for definition of "keyword", "KWIC", "KWOC", "KWAC". k. permuted indexes: See section 2 for definition of permuted indexes. 4.5. Indexes by type, format, genre, or medium of document(s) indexed: a. book indexes: NOTE -- An index to a single book is often referred to as a "back-of-the-book index," although such an index may be placed at the beginning or elsewhere in a book. b. periodical and serial indexes: but see also "serial" indexes in 4.7. c. monographic indexes: but see also "monographic" indexes in 4.7. d. poetry indexes. e. short story and fiction indexes. f. film indexes. g. illustration, picture, and painting indexes. h. artifact indexes. i. software indexes. j. computer-readable text indexes. 4.6.
Indexes by medium of index: a. print (print-on-paper) indexes, including card indexes (card catalogs), and book indexes. b. microform indexes, including COM (computer-output microform) indexes. c. electronic indexes, including online indexes, optical and magnetic media searched electronically, and CD-ROM indexes. 4.7. Indexes by periodicity of the index: a. monographic (one-time, closed-end) indexes: but see also "monographic" indexes in 4.5. b. serial (continuing, open-end) indexes: but see also "periodical and serial indexes" in 4.5. 4.8. Indexes by authorship: Indexes may be created by computer algorithm, by human intellectual analysis and creativity, and by combinations of computer and human operations. Indexes created by human activity often take the form of separate authored documents, distinct from the document or documents that are indexed. An authored index is created independently through human intellectual analysis of text. This type of index document is distinguished from those that are created solely through algorithmic analysis of text carried out electronically. Authored documents may be expressed in "words, numbers, or other verbal or numerical symbols or indicia . . ." (17 USCS 101) and are eligible for copyright protection as "literary works" (17 USCS 102(a)(1)). 5. Features and attributes of indexes. The following features and attributes are shared by most, if not all, indexes, regardless of purpose, type, medium or method of creation. Careful consideration of options with regard to each feature or attribute will contribute to a better index, since each feature or attribute will influence overall quality and performance. Decisions on options should be based primarily on needs, habits, and preferences of users. Publishers and index producers should discuss and agree on feature and attribute options prior to the production of an index.
All special or unusual features should be made clear to users in an introductory statement or in documentation for electronic indexes (see 8.2.5. Introductory note.) 5.1. Subject scope. The subject scope of an index is not necessarily the same as the subject scope of the documents which it indexes. When this is the case, the subject scope, in terms of the kinds of topics or features indexed, should be clearly stated. Subject scope can be stated in terms of facets or categories covered within the overall scope of the document or documents indexed. Examples of facets or categories include: entities concrete entities persons individuals groups institutions artifacts natural objects abstract entities belief systems theories imaginary entities attributes and properties materials operations, processes, events, conditions places times historical periods Appropriate facets are frequently unique to a particular field. For example, facets that may (or may not) be indexed in a document related to literature may include: specific literatures (by nationality or culture) performance media languages periods individuals (e.g., authors) groups/movements genres works features literary techniques themes/motifs/figures/characters influences (recipients) sources processes types of scholarship methodological approaches theories devices/tools disciplines scholars document types A different field may include some of these same facets, but many different ones as well. 5.1.1. Multiple versus unified subject indexes. Unified subject indexes are generally preferred, but separate indexes are justified when particular aspects are especially important such as persons, animal species, products, ingredients. Separate subject indexes may also be desirable when it is awkward to assimilate verbal terms with non-verbal terms such as chemical formulae. Separate indexes for subject facets are often desirable in electronic indexes to facilitate targeted searches. 
Such separate indexes should also allow for global searches across all subject indexes. See also 5.2.1. for separate indexes for different documentary types. 5.2. Documentary scope. Indexes are also defined by the categories of documents they index. It is especially important for indexes to collections of documents, including most textual databases, to state explicitly the kinds of documents included within the documentary scope of the index, with respect to such criteria as: medium format periodicity (monographic, serial) audience or level language nationality (place of publication) time (date of publication, date of receipt) specific titles (when the scope of an index is limited to a stated list of documents) 5.2.1. Multiple versus unified document type indexes. Although unified indexes are generally preferred, special interest in particular documentary types may justify separate indexes, for example, for reviews, maps, illustrations, advertisements, etc. Electronic indexes may maintain separate indexes for documentary types to facilitate targeted searches; such separate indexes should also allow for global searches across all indexes. 5.3. Sources. As a general rule, indexes should be based on primary sources, that is, the actual documents being described. When indexes are compiled on the basis of secondary sources (descriptions of documents such as reviews, other indexes or databases or catalogs) rather than the documents themselves, this practice should be clearly stated and the sources of data described. Any "territorial" or locational limits affecting sources should also be explicitly stated. For example, "index is limited to documents found in New York City libraries." 5.4. Codes and symbols. Most indexes within the scope of these guidelines will use the standard Roman alphabet, punctuation symbols, and Arabic and Roman numerals in accordance with standards of English language usage. 
Whenever any other symbols are used, for example, for music, choreography, chemistry, mathematics, or non-Roman writing systems, these symbols and the codes that govern their use should be described. The method for arranging non-alphanumeric symbols in displays should also be described. 5.5. Display media. Indexes may be displayed in a wide range of media, including print on paper, cards, microforms, or electronic media such as online databases, CD-ROMs, and optical disks. Each medium has particular advantages and disadvantages, which need to be considered in relation to the needs, habits, and preferences of users. The medium will influence most other options regarding access to and display of the index. 5.6. Documentary units. The size and type of documentary units described by an index will directly determine what can be retrieved. For indexes to verbal documents, examples can range from lines, statements, paragraphs, pages, sections, articles, chapters, monographs, serials or series, to entire collections. Analogous units apply to non-verbal documents. The smaller the documentary unit, the more direct the referral to a particular topic or feature will be. Inherent documentary units should be preferred over physical medium units. For example, paragraphs or sections of a printed verbal text should be preferred to pages, since paragraphs or sections are more likely to constitute conceptual units. There usually is no particular or enduring relationship between a physical page in a book and a particular part of a text. Indexes that refer to inherent documentary units may be used without change when a text appears in a variety of formats. (See also 7.4. Locators.) 5.7. Indexable matter. Indexable matter constitutes the portions of documents that are actually analyzed and indexed. Not all portions are equally important. 
For example, introductory matter, appendixes, bibliographies, glossaries, illustrations, tables, advertisements, letters, and reviews may or may not be indexed. Or they may be indexed at different levels of exhaustivity or specificity. Indexing also may be limited to specific portions of text (e.g., abstracts, first and/or last paragraph, or captions). Decisions on appropriate indexable matter should be based on perceived importance to users of documents and portions of documents and should be explicitly stated. 5.8. Analysis method. Documents may be indexed through human intellectual analysis, algorithmic machine analysis, or combinations of human and machine analysis. The method of analysis used to produce an index should be stated. When indexes are created by particular individuals, it is appropriate to acknowledge their contribution (see also 4.8. Indexes by authorship). 5.9. Exhaustivity. Exhaustivity of indexing is the detail with which topics or features of a document are analyzed and described. Exhaustivity may be described as the number of unique terms assigned to or extracted, on average, from a documentary unit. It can range from summary indexing in which one or two terms are assigned per documentary unit, to highly exhaustive indexing in which hundreds of terms may be assigned or extracted. Summary indexing tends to favor high-precision retrieval. Only documents that are closely and centrally related to a particular index term or heading are retrieved. On the other hand, highly exhaustive or detailed indexing tends to favor high-recall retrieval. More documents in which a topic or feature is not central are retrieved. The best level of exhaustivity clearly depends on the needs of users. Multiple levels of exhaustivity are advantageous and can be implemented by marking terms as primary or secondary. In electronic indexes, weights may be assigned to terms so that exhaustivity may be adjusted to the particular needs of a user. 
Exhaustivity, along with term specificity, heading syntax, and method of vocabulary management, is a primary determinant of the size of an index. 5.10. Specificity. Specificity refers to the closeness of fit between index terms and the topics or features they represent. "Specific" does not necessarily mean "narrow," since a specific term may be broad or narrow depending on the topic or feature to which it refers. Nevertheless, specific indexing will provide specific terms for all or most topics and features and will result in a larger indexing vocabulary than more generic indexing. Greater specificity tends to improve the precision of searches, but it tends to make high-recall searches more difficult, since categories of topics or features may be indexed by several specific terms, rather than subsumed under a single more generic term. Links between narrower and broader terms can facilitate more comprehensive, high-recall searches (see also 5.12. Vocabulary management). One option is to use high specificity in core areas of the subject scope of the index and to use more generic indexing in peripheral areas. 5.11. Syntax. Syntax is the process of combining terms to create headings or search statements. Rules or patterns of syntax determine whether and how index terms are or may be combined to facilitate more accurate searching. One goal of an index is eliminability -- the ability to eliminate entries with a relevant lead term as irrelevant to a particular need. Single term headings often do not provide a sufficient basis for selection of potentially relevant entries or the elimination of potentially irrelevant entries. Unless a term is very specific and the context of the index is narrowly defined, some context is required within the entry or search statement itself. Consequently, every index should provide for the combination of terms in headings or search statements. 
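The point about eliminability can be illustrated with a small sketch. It is not part of the draft: the entries, the locators, and the function name `select` are invented for illustration, and matching is reduced to a plain test that every wanted term appears in the heading.

```python
# Hypothetical entries: (terms of the heading, locator). Invented for illustration.
ENTRIES = [
    (("libraries", "automation"), 12),
    (("libraries", "funding"), 47),
    (("libraries", "architecture"), 88),
]


def select(entries, *wanted):
    """Return the locators of entries whose heading contains every wanted term."""
    return [loc for terms, loc in entries if all(w in terms for w in wanted)]


# The lead term alone gives no basis for choosing among the entries.
print(select(ENTRIES, "libraries"))
# A second term supplies the context needed to eliminate the irrelevant ones.
print(select(ENTRIES, "libraries", "funding"))
```

With only the lead term, all three entries remain candidates; adding a second term leaves the single entry on funding.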
For pre-coordinated index headings, syntax also governs the number and style of direct access points provided for multi-term headings. Access to all substantive terms in pre-coordinated headings is essential. This access can be achieved either through rotated arrangement of terms within the heading so that each term becomes a lead term or through cross-references. When pre-coordinated headings are searched electronically, individual terms may be accessed in the same way as in post-coordinate indexes (see next paragraph). For post-coordinated indexes, usually accessed through electronic media, rules of syntax are usually based on Boolean operators (AND, OR, NOT), proximity operators, or the combination of weighted terms or vectors. The advantage of the latter approach is that retrieved records may be ranked according to predicted probability of relevance to a particular search statement. Details of syntax are treated more fully in section 6. Vocabulary, entries, and search statements. 5.12. Vocabulary management. The vocabulary of an index should match the vocabulary of users. This is a difficult objective, since research and experience have shown that the vocabulary of users is extremely diverse and subject to constant change. Therefore, the larger the lead-in or entry vocabulary, the better, with links between synonymous and equivalent terms. An index should also assist users in adjusting the level of specificity of their requests to that of the index by providing links between broader and narrower terms. An index can also suggest other avenues of search by linking related or associated terms. Details of vocabulary management are treated more fully in section 6. Vocabulary, entries, and search statements. 5.13. Text surrogation; locators. Unless index terms or headings are attached to or embedded in the full text of a verbal document, indexes must include methods for representing and locating the documents or document descriptions to which they refer. 
Locators are part of entries. Details are treated more fully in section 6. Vocabulary, entries, and search statements. 5.14. Entry display. Entry display refers to the form and content of individual entries representing individual documents. Issues include unified displays in which all data relating to a particular document are displayed together in one or more entries versus multiple partial displays in which only partial data are displayed in each entry. Details of entry display are treated more fully in section 6. Vocabulary, entries, and search statements. 5.15. File display and arrangement. File display refers to the display of multiple entries in alphanumeric, classified, or relational arrays. For print media, file display determines the means of access to particular entries in the index. For electronic media, file displays may be used for visual browsing or scanning of entries, either as an alternative to or a substitute for the searching of hidden or implicit indexes that are not displayed directly to the user. Sections 8. File display, and 9. Alphanumeric arrangement, are devoted to these issues. 6. Vocabulary. 6.1. Sources of vocabulary. The vocabulary for indexes may come from indexed documents, index users, human indexers, or compilations of vocabulary, such as thesauri, dictionaries, handbooks, and textbooks. The best source is often the text of indexed documents. Users of indexes are another valuable source, but it is often difficult or impossible to access their vocabulary directly. When it is possible to collect search terms employed by users, their terms should be incorporated into the index vocabulary. To the extent possible, indexes should link the vocabulary of users to the vocabulary of documents. Expert indexers may be aware of user vocabulary that is not present in indexed documents; their vocabulary expertise should be used to the fullest extent possible. Previous compilations of vocabulary can also be useful. 
However, restricting vocabulary to such a collection of terms is usually not advisable, since it may lead to unnecessary constraints on access. 6.2. Forms of terms. Conventions and customs for the form of index terms have developed for English language indexes, as well as for all other natural languages used for indexing. These conventions should be observed for the convenience of users, unless there are overriding conventions in a particular discipline, field, or application. In this standard, only U.S. English language conventions and customs are cited. 6.2.1. Parts of speech. Nouns, including verbal nouns (gerunds) and noun phrases, are the preferred parts of speech for terms. 6.2.2. Spelling. For U.S. indexes, standard U.S. spelling should be used. If there is more than one standard spelling (e.g., groundwater, ground-water, ground water), the one used in the indexed document(s) should be preferred if used consistently. Otherwise, one spelling should be chosen and employed consistently. Alternative spellings should be linked to the standard term. This is especially important in implicit electronic indexes, since even minor variations in spelling (e.g., aluminum, aluminium) may lead to the loss of access. Contractions, abbreviations, and acronyms should be used as terms or linked to terms when known and used by searchers. Their spelling should conform to common usage (see also 6.4. Synonymous and equivalent terms). 6.2.3. Capitalization. All terms should be written with lower-case letters with the exception of proper nouns and acronyms. In proper nouns, the first letter of the first word and the first letter of each succeeding word, other than conjunctions, prepositions, and articles, should be capitalized. Acronyms of names of organizations should follow usage of the organization (e.g., NATO, Unesco). Other acronyms should follow conventional spelling (e.g., radar, COBOL). 6.2.4. Singular and plural forms. 
In English, the convention and custom in most indexing situations is to use the plural form for terms denoting discrete objects (countables) and the singular form for non-countables (mass words). The plural is used when the question as to quantity asks "How many?" The singular is used when the question as to quantity asks "How much?". If the singular and plural forms have different meanings, both forms should be used: memories memory building buildings 6.2.5. Articles. The use of articles, especially initial articles, should be avoided in terms for topics and features. Articles that are part of a name, however, should not be deleted or transposed (see 6.2.9.2.-6.2.9.5.). 6.2.6. Bound terms. As a general rule, a single term (as opposed to a heading or entry, which is discussed in 6.2) should represent a single concept. What constitutes a single concept will vary from situation to situation. Frequently two or more concepts become "bound" together and are commonly expressed by a "bound term," such as "information science", "birth control", or "form of government". When such bound terms become established, they should be preferred to the alternative of forcing the combination of two separate terms, for example, "science" and "information" or "control" and "conception" at the time of searching or at the time of combining terms into headings and entries. Use of bound terms also helps to avoid "false drops," such as the retrieval of documents on "library schools" when "school libraries" is intended. Similarly, terms like "information" and "science" can occur in many contexts where "information science" is not discussed. 6.2.7. Antonyms and associated terms. When antonyms and other closely associated terms (e.g., honors and awards) are used as combined terms, the constituent terms should be linked to the combined term: awards : honors and awards evil : good and evil NOTE -- The space-colon-space indicates a link. 
The form of link will depend on index medium and form of display, as discussed in 6.8.2.2. Cross references and 6.8.3. Vocabulary management for implicit indexes. 6.2.8. Terms consisting of more than one word. Terms consisting of more than one word, including bound terms, should be used in natural language order, without inversion. 6.2.9. Proper names and titles of documents. Names of persons, corporate bodies and places should be established in accordance with standards used in library practice, since it is advantageous for users to experience a measure of uniformity across information retrieval systems. @@The Anglo-American Cataloguing Rules@@, 2d edition (AACR2), provides detailed guidance for the establishment of names. NOTE -- This standard provides only a cursory summary of provisions of AACR2. 6.2.9.1. Personal names. Personal names should be provided in the form most commonly used, and in as full a form as possible when there is more than one common form. Limiting forenames to initials, unless that is the preferred form (e.g., D. H. Lawrence), invites confusion. When more than one name or form of name is in use, they should be linked as synonymous terms. Where surnames are in common use, names should be entered under surname, followed by a comma and any given names or initials: Lee, Kuan Yew Wheatley, Henry B. Persons identified only by a given name or forename should be entered under that name, qualified, if necessary, by a title of office or other distinguishing epithet: Boudicca, Queen of the Iceni Leonardo da Vinci Ethelred the Unready Persons normally identified by a title of honor or nobility should be entered under that title, expanded if necessary by their family name: Dalai Lama Marlborough, John Churchill, first Duke Compound and multiple surnames, whether hyphenated or not, should be entered under the first part, except where usage favors another practice, as in the case of Portuguese names, which are customarily entered under the last part. 
Links should be established among all possible forms of entry: Layzell Ward, Patricia [with link from: Ward, Patricia Layzell] Pérez de Cuéllar, Javier [with link from: De Cuéllar, Javier Pérez; and from: Cuéllar, Javier Pérez de] When two or more persons have the same name, their names constitute homographs and should be distinguished with qualifiers such as fuller form of name or dates where available; otherwise use occupation, title, or nationality: Lawrence, D. H. (David Henry) Lawrence, D. H. (Derek Herbert) Butler, Samuel (1612-1680) Butler, Samuel (1835-1902) Rickert, Heinrich (philosopher) Rickert, Heinrich (politician) 6.2.9.2. Corporate body names. Names of corporate bodies should be entered without transposition in the form most commonly used. If more than one form is common, the fuller form should be used. If an abbreviation or acronym is the commonly used form, that form (not the full form) should be used: J. Whitaker & Sons [not: Whitaker, J., & Sons] H. W. Wilson Company [not: Wilson, H. W., Company] Unesco [not: United Nations Educational, Scientific, and Cultural Organization] TRON Project [not: The Realtime Operating System Nucleus Project] Do not omit or transpose initial articles that are part of the commonly used name: The Club (London) [not: Club (London)] The Library Association (United Kingdom) [not: Library Association (United Kingdom)] NOTE -- See 9.3. regarding the filing of headings with initial articles. Enter corporate bodies that are parts of larger bodies under their own names unless the name is indistinct or implies subordination. If the name needs the name of a higher body, use the lowest level body that can be entered directly under its own name: Public Library Association. Audiovisual Committee [not: American Library Association. Public Library Association. 
Audiovisual Committee] When there are several hierarchical levels, as many intervening bodies as necessary should be included in the name to avoid confusion: American Library Association. Resources and Technical Services Division. Board of Directors. Identical names for different bodies constitute homographs and should be distinguished with qualifiers: Metropolitan Museum of Art (New York, NY) Metropolitan Museum of Art (Cleveland, OH) Links should be established among different names for the same body and among all possible forms of entry, including inverted forms: Medicine, National Library of : National Library of Medicine Whitaker, J., & Sons : J. Whitaker & Sons Wilson, H. W., Company : H. W. Wilson Company United Nations Educational, Scientific, and Cultural Organization : Unesco The Realtime Operating System Nucleus Project : TRON Project NOTE -- The space-colon-space indicates a link. The form of link will depend on index medium and form of display, as discussed in 6.8.2.2. Cross references and 6.8.3. Vocabulary management for implicit indexes. 6.2.9.3. Geographical names. Geographical names should be as full as necessary for clarity, with qualifiers to avoid confusion between otherwise identical names: Middletown (CT) Middletown (OH) Middletown (Powys, Wales) Abbreviations should not be used unless there is a commonly accepted standard, such as U.S. Postal Service abbreviations for states of the United States. Prefer the English form if there is one in general use. Otherwise use the form in the official language of the country: Buenos Aires An article or preposition should be retained in a geographical name of which it forms an integral part: Des Moines Las Vegas Los Angeles The Dalles The Hague NOTE -- See 9.3. regarding the filing of headings with initial articles. 6.2.9.4. Titles of documents. 
To the extent possible within typographic constraints, titles of documents should not be changed or altered; for example, a chemical name should not be substituted for a chemical symbol, nor a numeral replaced with its name. Titles with numerals should be linked with equivalent titles with names of numerals: 1066 and all that : Ten sixty-six and all that 1984 : Nineteen eighty-four 2001 : Two thousand and one If necessary to avoid confusion, qualify the title of a document with a term that will indicate that it is a document: Charlemagne (play) Genesis (Anglo-Saxon poem) If necessary for identification, names of creators, places of publication, dates or other qualifiers may be used: Ave Maria (Gounod) Ave Maria (Schubert) Ave Maria (Verdi) Natura (Amsterdam) Natura (Milan) An initial article should be neither omitted nor transposed to the end of the title: Das Kapital (Marx) [not: Kapital (Marx); Kapital, Das (Marx)] The Tempest [not: Tempest; Tempest, The] See section 9.3. regarding the filing of headings with initial articles. Prepositions at the beginning of a title should be retained: An die Musik To the lighthouse 6.2.9.5. First lines. In first line indexes, initial articles should be retained in natural order, not transposed. See section 9.3. regarding the filing of headings with initial articles. 6.2.10. Romanization. Names and words rendered into Roman script from another writing system should be based on standard Romanization tables, such as those adopted by the Library of Congress, unless a well-established English language form exists: Alexander the Great [not: Alexandros ho Megas] Confucius [not: Kung Fu Tzu] Omar Khayyam [not: 'Umar Khayyam] Links should be established among alternative forms of Romanized names and other terms. 6.3. Homographs. Identical terms which represent different concepts or features can cause confusion and should be differentiated in one of the following ways. a. Addition of a qualifier: races (ethnology) races (sport) b. 
Use of bound terms: swimming pools car pools betting pools 6.4. Synonymous and equivalent terms. Research and practice indicate that searchers tend to have little agreement on terms for particular concepts or features. Therefore, it is essential that indexes provide for the largest possible number of alternative terms, including abbreviations and acronyms. All terms that may be used for the same concept or feature within the context of an index should be linked so that any such term will lead searchers to the same documents. Terms including numerals should be linked with equivalent terms having the names of numerals: 19th century : nineteenth century 5 year plans : five year plans Small variations in terms which have little or no impact on filing position cause few problems in displayed indexes, but such variations can cause terms to be completely lost in hidden or implicit indexes that are searched electronically. Therefore, terms with even small variations in spelling or endings (e.g., aluminum, aluminium) should be linked in electronic indexes. All such terms with separated filing positions should be linked in displayed indexes. What constitutes equivalent terms will depend on the level of specificity used in an index. Unused narrower terms need to be linked to broader or related terms that are used. 6.5. Hierarchical relationships among terms. Links between narrower and broader terms are important to guide the narrowing of a search to particular members of a larger set of terms (e.g., from "computers" to particular types of computers) or the broadening of a search to all members of a larger set (e.g., from "Labrador retrievers" to all breeds of dog). Examples of hierarchical relationships include: a. genus/species: furniture : chairs behavior : aggression bears : polar bears b. discipline/constituent studies: geology : petrology c. class/individual members: bridges : Golden Gate Bridge standardizing bodies : NISO d. 
entity/parts or kinds: buildings : rooms United Nations : Unesco population : immigrants chemical industry : petrochemical industry e. larger and smaller geographic units: Europe : France New Jersey : Middlesex County : New Brunswick 6.6. Other relationships. Links between terms having relationships other than hierarchical are important to provide searchers with additional options for improving their searches. Examples of other relationships include: a. discipline/objects studied: botany : plants physical chemistry : molecules b. theoretical study/application or technology: dynamics : mechanical engineering state ownership : nationalized industries c. activity/agent: photography : cameras, photographers singing : voice; singers d. activity/thing acted upon: angling : fish dentistry : teeth e. activity/product: aggression : violence cartography : maps f. closely related topics not generally differentiated in common parlance but differentiated in a particular index: boats : ships pottery : porcelain g. related topics separated in a particular index when related nouns and adjectives take different forms: law : legal aid mouth : oral hygiene 6.7. Changes in terminology. In continuing indexes, care should be taken to link older and newer terms which are synonymous, equivalent or closely related. The date of the change should be indicated. Examples of changing terminology include: a. the introduction of a new term as a substitute for an older term: wireless : radio Negroes : Blacks : African Americans b. name changes: Sri Lanka : Ceylon Harris, Jessica : Milstead, Jessica School of Communication, Information, and Library Studies : Graduate School of Library Service. c. the use of additional terms to express narrower topics previously embraced by a broader term: computers : microcomputers, minicomputers 6.8. Display of vocabulary in indexes. 6.8.1. Integrated display of vocabulary. 
Information about vocabulary and relations among terms or headings (e.g., synonymy, equivalence, homography, hierarchy, association) should be presented as an integral part of an index. Searchers should not have to consult separate, unconnected files for vocabulary information. How integration of vocabulary information and index terms or headings is accomplished depends largely on the medium of the index and whether or not indexes are displayed for searching. Indexes to electronic databases are frequently not displayed. Instead, searchers rely on electronic matching of terms "behind the scenes." Such indexes may be called "hidden or implicit" indexes. For such indexes, vocabulary information should be displayed as part of the search interface. 6.8.2. Vocabulary in file displays. In print media, indexes must be displayed as arrays or files of entries, since it is through such displays that searchers enter the index. Such displays are becoming more common for electronic indexes as well, especially those designed for end-user searches, such as online public access catalogs (OPACs) in libraries. In index displays, whether in print or electronic media, vocabulary information should be integrated into the same sequence of entries that describe documents, using a variety of notes and cross-references. 6.8.2.1. Scope and history notes A scope note may help in clarifying the scope or application of a term. It should be set off, by means of type or layout, from the term itself. A history note explains changes in usage over time in a continuing index: "Radio" replaced "wireless" in 1950. 
This information may also be presented in the form of a cross-reference: radio @@in earlier volumes see@@ wireless wireless @@see@@ radio When both old and new entries are present in the same index, "see also" references must be used: radio @@see also@@ wireless @@for references before 1950@@ wireless @@see also@@ radio @@for references from 1950 onward@@ When cross-references refer to newer terms that formerly were subsumed under the lead term in the cross-reference, dates should be attached to terms so that users know when such terms were introduced: computers @@see also@@ microcomputers [1977]; minicomputers [1972] microcomputers @@see also@@ computers for entries before 1977 6.8.2.2. Cross-references. a. Terms and headings that are not used to index documents should be linked to terms that are used (synonymous, equivalent, or combined terms) by using "see," "use," or "search under" references: Bonaparte, Napoleon @@see@@ Napoleon I, Emperor of France da Vinci, Leonardo @@see@@ Leonardo da Vinci aesthetics @@see@@ esthetics storage @@see@@ cold storage; warehouses Vinci, Leonardo da @@see@@ Leonardo da Vinci In monographic (one time, closed-end) indexes, as opposed to serial (open-end, continuing) indexes, a "see" reference should be replaced by a duplicate entry under the alternative heading if the duplicate entry does not occupy more space than the cross-reference would. Some indexes make a distinction between synonymous or equivalent terms on the one hand and broader terms that are used in place of unused narrower terms, using the reference "see" or "use" for the former case and "see under" for the latter case: cars @@see@@ automobiles convertibles (automobiles) @@see under@@ automobiles b. Related terms and headings (synonymous, equivalent, combined, broader, narrower, and other related terms or headings) that are or could be used to index documents should be linked by means of "see also," "use also," or "search also under" references. 
The nature of the relationships can be indicated. When the cross-reference refers to multiple terms or headings, these should be listed by category, if the nature of the relationship is indicated (e.g., synonymous, equivalent, broader, narrower, related), and within category in alphabetical order, separated by semi-colons. If the nature of relationships is not indicated, the referenced terms or headings should be in a single alphabetical sequence. sexuality @@see also narrower terms@@ bisexuality; chastity; heterosexuality; homosexuality; incest; necrophilia; sublimation. @@see also related terms@@ gender; sex; sexual identity; sexual problems. ========================================================================= Date: Mon, 14 Sep 1992 09:12:07 ECT Reply-To: Indexer's Discussion Group Sender: Indexer's Discussion Group From: Robert A Amsler Subject: Re: What do you read? ----------------------------Original message---------------------------- ASIS publications, ARIST volumes. ========================================================================= Date: Mon, 14 Sep 1992 09:12:25 ECT Reply-To: Indexer's Discussion Group Sender: Indexer's Discussion Group From: Ann KELSEY Subject: Specialized subject headings/thesauri ----------------------------Original message---------------------------- Cross-posted to AUTOCAT, INDEX-L, SLA-TECH Does anyone know of a thesaurus or specialized subject headings that would be appropriate for accessing program related grants in the social sciences? This was posted to SLA-TECH with no success, so I am cross-posting in hopes of getting some information. Please respond directly to me, not the list, as I am not a subscriber to Autocat, or Index-L, but am posting the question to these lists in desperation. Thank you for any help you can provide. 
Ann Kelsey kelsey@pilot.njin.net ========================================================================= Date: Mon, 14 Sep 1992 09:14:05 ECT Reply-To: Indexer's Discussion Group Sender: Indexer's Discussion Group From: Charlotte Skuster Subject: NISO standards draft part 3 "See also" cross-references should normally follow any locators related to the term or heading from which they refer: bears 100, 217, 923 @@see also@@ badgers; koala bears; raccoons However, since their purpose is not only to suggest additional terms or headings that may be useful but also to suggest alternative terms or headings that may be more appropriate, "see also" cross-references should precede both locators and subheadings (if any) in those types of index where they may be overlooked or found only after perusing unwanted references. Examples of such indexes include an index on cards or on a visual display unit, or in a very detailed printed index. In these cases, "see also" cross references should be clearly distinguished from the remainder of the entry. 
For example, they can be displayed on indented lines following the heading or displayed with parentheses: economics 144, 195, 229, 363, 399, 502 @@see also@@ assets; banking; business; commerce; firms; transport; wealth bibliographies 208 mathematical models 160 statistics 155 or economics (@@see also@@ assets; banking; business; commerce; firms; transport; wealth) 144, 195, 229, 363, 399, 502 bibliographies 208 mathematical models 160 statistics 155 Cross references should be attached to the heading or the subheading from which they refer: economics statistics 155 @@see also@@ econometrics When a cross-reference leads from a subheading under one main heading to the same subheading under another main heading, the reference should include both main and subheadings referred to: economics statistics 155 @@see also@@ economic policy -- statistics An alternative which is less clear makes use of a "see also under" reference: economics statistics 155 @@see also under@@ economic policy 6.8.3. Vocabulary management for implicit indexes. To be useful, vocabulary information, including relations among terms, should be displayed to users in conjunction with and in the same medium that is used to search the index. Users should not have to consult a completely separate vocabulary file, copy down terms, and then re-enter them in order to use them for a search. Users should have the option of automatic or selective addition or replacement of synonymous and equivalent terms. In the automatic mode, when index terms are limited to preferred terms, the preferred term should replace any synonymous or equivalent terms. When all such terms may be used, then all should be added to the search. The user should be notified of any modification to the search. In the selective mode, preferred or synonymous or equivalent terms must be displayed so that they may be selected for replacement or addition. 
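The automatic mode described in 6.8.3 might be sketched as follows. This is only an illustration, not part of the draft: the vocabulary table, the function name expand_query, and the wording of the notices are all assumptions, with the term pairs borrowed from the cross-reference examples in 6.8.2.

```python
# Hypothetical vocabulary file: non-preferred term -> preferred term.
# The pairs come from the draft's "see" examples; the structure is assumed.
PREFERRED = {
    "cars": "automobiles",
    "aesthetics": "esthetics",
    "wireless": "radio",
}


def expand_query(terms, preferred=PREFERRED, preferred_only=True):
    """Return (search_terms, notices) for the automatic mode.

    preferred_only=True  -> the index uses preferred terms only, so each
                            synonym is replaced by its preferred term.
    preferred_only=False -> all equivalent terms may be used, so the
                            preferred term and its synonyms are all added.
    Every modification is reported back so the user can be notified.
    """
    search_terms, notices = [], []
    for term in terms:
        target = preferred.get(term, term)
        if preferred_only:
            group = [target]
        else:
            synonyms = sorted(t for t, p in preferred.items() if p == target)
            group = [target] + synonyms
        for candidate in group:
            if candidate not in search_terms:
                search_terms.append(candidate)
        if group != [term]:
            notices.append(f"'{term}' searched as: {', '.join(group)}")
    return search_terms, notices


if __name__ == "__main__":
    print(expand_query(["cars", "safety"]))
    print(expand_query(["radio"], preferred_only=False))
```

A selective mode would display the same groups to the user for selection instead of applying them directly.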
Users should have the option of seeing displays of other vocabulary information and selecting broader, narrower, or other related terms for use in their search, either in addition to terms already selected, or in place of earlier terms. Methods for effectively displaying vocabulary information, especially in conjunction with implicit indexes, are not yet well established, but should be encouraged, since vocabulary information is essential for efficient use of indexes of all types. 7. Entries, headings, and search statements. Index entries consist of a heading (and sometimes subheadings) made up of one or more terms and a locator that identifies or points to the document that the heading describes. Entries also include cross-references to other headings (see 6.8.2.2.). In implicit electronic indexes in which the index itself is not displayed, the search statement is the equivalent of the heading. Like a heading, a search statement may consist of one or more terms. Locators are the representations of retrieved documents. In full text databases relying on implicit indexes, terms may be attached to or embedded in the text itself. In such cases, the text represents itself without the use of separate surrogates, citations, or locators. 7.1. Heading or search syntax. Syntax is the process of combining terms to create headings or search statements. In all indexes, whether presented in electronic or print media, whether displayed or implicit, syntax is absolutely essential. Without syntax, it is often impossible to describe or define the context or aspect of a topic or feature of interest. In most such cases, entirely too much irrelevant material will be retrieved. The combination of terms is usually essential to provide the user a basis on which to eliminate unwanted references without having to consult each one. 
In displayed indexes, headings with more than 5 undifferentiated locators should be avoided by adding further terms or subheadings for context or aspect. However, the combination of terms, or the use of subheadings, is desirable even when a single-term heading has only one locator. A single-term heading forces the user to consult the reference to determine whether it might be useful; the addition of a term indicating context or aspect can often relieve users of useless pursuits. Indexes use an enormous variety of syntactic methods. The following examples display major current methods. NOTE -- The standard does not advocate any particular method; it simply states that some method of syntax should be used. 7.2. Post-coordination syntax. Post-coordination syntax is used for the combination of terms after the construction of an index, hence the term "post". It is almost always used in searches of implicit indexes in electronic media when the searcher, not the indexer, creates a search statement combining index terms. There are two basic approaches in post-coordinate indexing: Boolean and weighted combination. 7.2.1. Boolean syntax. Boolean syntax combines terms using the operators AND, OR, and NOT. It has become a de facto standard in electronic databases, but it has two major drawbacks: First, the meaning of AND and OR does not correspond to the usual senses of these words. Second, the Boolean search divides a database into two distinct sets, retrieved and not retrieved. Retrieved documents are not ranked in any way on the basis of possible or probable interest. Documents that meet most but not all requirements of a search are not retrieved. In Boolean searches, retrieval is "either/or" -- no "maybes" are considered. 7.2.2. Weighted term combinations. Searching by weighted term combination, also called "vector" or "probabilistic" searching, combines terms and retrieves all documents that are represented by one or more of the search terms. 
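The contrast between Boolean retrieval (7.2.1) and weighted term combination can be illustrated with a small sketch. The document set, the term assignments, and the scoring rule (a simple sum of term weights, defaulting to 1.0) are assumptions made for illustration; real systems use more elaborate scoring.

```python
# Hypothetical database: document identifier -> set of assigned index terms.
DOCS = {
    "d1": {"indexes", "thesauri", "standards"},
    "d2": {"indexes", "thesauri"},
    "d3": {"indexes"},
    "d4": {"typography"},
}


def boolean_and(query, docs=DOCS):
    """Boolean AND: a document is retrieved only if it carries every term.

    The result is an unranked set, returned sorted by identifier only so
    that the output is stable.
    """
    return sorted(doc for doc, terms in docs.items() if query <= terms)


def weighted_rank(query, docs=DOCS, weights=None):
    """Weighted combination: score every document matching at least one term.

    Each matching term contributes its weight (1.0 if none is given), and
    documents are listed from highest to lowest score.
    """
    weights = weights or {}
    scores = {
        doc: sum(weights.get(term, 1.0) for term in query & terms)
        for doc, terms in docs.items()
    }
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [(doc, score) for doc, score in ranked if score > 0]


if __name__ == "__main__":
    query = {"indexes", "thesauri", "standards"}
    print(boolean_and(query))    # only the document with all three terms
    print(weighted_rank(query))  # partial matches too, best match first
```

With this query the Boolean search returns d1 alone, while the weighted search also returns d2 and d3, the "maybes" that Boolean retrieval discards.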
Retrieved documents are then ranked, with those having the highest number of terms coming first. Both index terms and search terms may be weighted to reflect importance or interest. Such weights can further influence the calculation of "retrieval scores" for the purposes of ranking documents. Instead of dividing a database into two distinct sets (retrieved and not retrieved), weighted combination searching rearranges the entire database in the order of estimated interest based on the search statement. 7.2.3. Proximity operators, stemming, and truncation. Both Boolean and weighted term combination syntax may be combined with a wide variety of methods for broadening or limiting the scope of a search statement. These methods include the use of: proximity operators, which specify that two or more terms must be in certain proximity; stemming, which removes certain suffixes and/or prefixes; and truncation, which permits the use of parts of words, including word-roots. 7.3. Pre-coordination syntax. Pre-coordination syntax is used when combination of terms must take place prior to the presentation of the index. It must be used for all displays of complete indexes. The following sections describe examples of a wide variety of pre-coordination syntax. 7.3.1. Ad hoc syntax. Syntax is often applied on a case-by-case, heading-by-heading basis in monographic (one-time, closed-end) indexes. Book indexes are a prominent example. Individual headings are created as appropriate for the nature of the indexed document and for the needs of prospective users. While the suggestions in this section apply especially to such ad hoc applications, they will also apply, to varying degrees, to other types of syntax. 
Prepositions should be avoided in subheadings unless needed for clarity: computers for management computers management of food rationing [not: food -- rationing of] land use [not: land -- use of] management of computers management use of computers rationing food use land NOTE -- The above examples are shown as they would appear in a printed index, but other styles of linking lead terms and subheadings are also possible, depending on index format and medium. 7.3.2. Natural language syntax. Some indexes take advantage of the syntax of pre-existing segments of text to provide syntax for index headings. The most common examples are keyword indexes based on document titles, but the text segment could come from other parts of documents as well. 7.3.2.1. KWIC indexes Key Word in Context (KWIC) indexes use computer algorithms in order to highlight by position or typography (or both) title words or other words in brief text segments usually not exceeding one line of print. The natural word order of the title or text segment is preserved on both sides of a keyword. Headings are arranged on the basis of the keyword and words following the keyword. Text that extends beyond the right margin may be moved to the left-hand portion of the column if there is room, or may be eliminated: zation and presentation of INDEXES. Guidelines for the cont al Standard Guidelines for INDEXES in Information Retrieval Guidelines for Indexes in INFORMATION Retrieval American N dexes in Inf American NATIONAL Standard Guidelines for In+ Note that word pairs and phrases are preserved (e.g., Information Retrieval; National Standard). 7.3.2.2. KWOC indexes Key Word Out of Context (KWOC) indexes were developed to resemble the traditional format of indexes, with lead or main term on the left followed by a subheading, usually indented. 
With indention, the lead or main term is usually not repeated when it is the same for subsequent entries: Indexes American National Standard Guidelines for Indexes in Information Retrieval Guidelines for the content, organization and presentation of indexes Information American National Standard Guidelines for Indexes in Information Retrieval National American National Standard Guidelines for Indexes in Information Retrieval Note that in this format, word pairs and phrases are not preserved. It is not possible to look up directly phrases such as "information retrieval," "national standard," etc., unless the second word happens to be the first word of the subheading formed from the title or text segment. The loss of direct access to word pairs and phrases is a disadvantage. 7.3.2.3. KWAC indexes Keyword Alongside Context (KWAC) indexes were developed to preserve word pairs and phrases within the traditional format with the lead or main term on the left: Indexes Guidelines for the content, organization and presentation of in Information Retrieval. American National Standard Guidelines for Information Retrieval. American National Standard Guidelines for Indexes in National Standard Guidelines for Indexes in Information Retrieval. American 7.3.3. Subject heading lists. Syntax may be provided by using pre-established lists of subject headings that generally include headings consisting of pre-combined terms or provisions for combining terms at the time of indexing in accordance with rules or patterns. Precombination may be achieved in two ways: a. linking terms to each other by long dashes: animals -- diseases -- chemotherapy libraries -- New Jersey -- New Brunswick b. modifying the lead term by other terms so that the natural word order is inverted: students, foreign Both methods may be combined, to create headings such as "students, foreign -- statistics". 7.3.4. Permuted indexes. 
Permuted indexes display every possible combination of words from a segment of text or a set of terms. Since the number of such combinations increases exponentially as the number of words in each heading increases, permuted indexes are usually restricted to headings consisting of no more than two words. The titles used for the previous examples of keyword indexes would produce the following permuted index headings: INDEXES AMERICAN CONTENT GUIDELINES INFORMATION NATIONAL ORGANIZATION PRESENTATION RETRIEVAL STANDARD INFORMATION AMERICAN GUIDELINES INDEXES NATIONAL RETRIEVAL STANDARD NATIONAL AMERICAN GUIDELINES INDEXES INFORMATION RETRIEVAL STANDARD 7.3.5. String indexing. String indexing uses computer algorithms to combine multiple terms assigned to a document into multiple headings, each of which has a different term as its lead or main term. The set of terms is treated like a "string" that is rearranged under each lead term. 7.3.5.1. Rotated terms. The simplest form of string indexing places each term in the lead position and follows it by all other terms in alphanumeric order. In the following examples, numerals are filed after letters: American Mercury (periodical). Editors and Editing. Ku Klux klan. Mencken, H. L. Methodist Episcopal Church (South). Temperance Movements. 1910-33. Editors and Editing. American Mercury (periodical). Ku Klux klan. Mencken, H. L. Methodist Episcopal Church (South). Temperance Movements. 1910-33. Ku Klux klan. American Mercury (periodical). Editors and Editing. Mencken, H. L. Methodist Episcopal Church (South). Temperance Movements. 1910-33. 7.3.5.2. Faceted indexing Faceted indexing arranges terms in entry strings according to facet relationships. Terms are placed into facets, or tagged with facet indicators, by indexers. 
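Before turning to facets in detail, the rotated-term method of 7.3.5.1 can be sketched in a few lines. The function names and the filing rule below are assumptions: the rule ignores case and files terms beginning with a numeral after those beginning with a letter, which is enough to reproduce the Mencken example but is not a full filing standard.

```python
def filing_key(term):
    """Assumed filing rule: letters before numerals, case ignored."""
    return (term[0].isdigit(), term.lower())


def rotate(terms):
    """Return one (lead, remaining_terms) heading per assigned term.

    Each term takes the lead position in turn; the other terms follow in
    filing order. The headings themselves are also returned in filing order.
    """
    headings = []
    for lead in terms:
        rest = sorted((t for t in terms if t != lead), key=filing_key)
        headings.append((lead, rest))
    return sorted(headings, key=lambda heading: filing_key(heading[0]))


def format_heading(lead, rest):
    """Join terms with full stops, avoiding a doubled stop after 'H. L.'."""
    return ". ".join(term.rstrip(".") for term in [lead] + rest) + "."


if __name__ == "__main__":
    assigned = [
        "Mencken, H. L.",
        "1910-33",
        "Ku Klux klan",
        "American Mercury (periodical)",
        "Editors and Editing",
        "Methodist Episcopal Church (South)",
        "Temperance Movements",
    ]
    for lead, rest in rotate(assigned):
        print(format_heading(lead, rest))
```

Seven assigned terms yield seven headings, which is why rotated indexes grow quickly as more terms are assigned to each document.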
Faceted indexing that is designed to accommodate broad subject areas, such as PRECIS (Preserved Context Indexing System), uses generic term categories like location, key system or entity, action or effect of action, agent or instrument, viewpoint or aspect, particular instance, document form, and target user. These primary categories are sometimes modified by secondary categories such as part, property, role definer, modifiers, dates, and various connectives. The following coded terms will produce the following headings: [location] West Germany [key entity] cars [modifier] Japanese [action] sales [role definer] effects of/on [agent, instrument] advertising West Germany Japanese cars. Sales. Effects of advertising Cars. West Germany Japanese cars. Sales. Effects of advertising Japanese cars. West Germany Sales. Effects of Advertising Sales. Japanese cars. West Germany Effects of Advertising Advertising. Japanese cars. West Germany Effects on sales When faceted indexing is applied to a narrow subject area, facets tend to be tailored to aspects of particular interest in that subject area. In literature, for example, terms may be placed into facets such as specific literatures, performance media, language, periods, individuals (real), groups/movements, genres, works, literary techniques, themes/motifs/figures/characters, influences, sources, processes, methodological approaches, theories, devices/tools, and disciplines. The designated citation order for these facets then determines the order of terms in subheadings: HOMOSEXUALITY English literature. Short story. 1900-1999. Forster, E. M. "Dr. Woolacott." Symbolism. Treatment of salvation; HOMOSEXUALITY. SALVATION English literature. Short story. 1900-1999. Forster, E. M. "Dr. Woolacott." Symbolism. Treatment of SALVATION; homosexuality. SYMBOLISM English literature. Short story. 1900-1999. Forster, E. M. "Dr. Woolacott." SYMBOLISM. Treatment of salvation; homosexuality. 7.3.5.3. Ad hoc coding. 
Some forms of string indexing require indexers to encode a natural language statement, which may be created to describe a document or may be an already existing text segment, such as a title. In this example, based on NEPHIS (Nested Phrase Indexing System developed by Timothy Craven) brackets < > are used to enclose meaningful phrases; the question mark is used to introduce connectives, usually prepositions; and the '@' is used to turn off otherwise automatically generated headings. The following coded statement will result in the following headings: @effects? of ? on >? in >> advertising effects on sales of Japanese cars in West Germany cars Japanese -. sales in West Germany. effects of advertising Germany West -. sales of Japanese cars. effects of advertising Japanese cars sales in West Germany. effects of advertising sales of Japanese cars in West Germany. effects of advertising West Germany sales of Japanese cars. effects of advertising 7.3.5.4. Chain indexing. Chain indexing is based on the terms and the citation order of facets or aspects in a classification scheme. The chain index produces headings that complement the classification scheme by creating a "chain" of terms from the classification heading but reversing the order in which facets or aspects are cited. The following headings from the Dewey Decimal Classification would produce the following chain index entries: 100 philosophy 170 ethics 172-179 applied ethics 178 ethics of consumption 178.1 in use of alcoholic beverages alcoholic beverages: consumption: applied ethics 178.1 applied ethics 172-178 beverages: alcoholic: consumption: applied ethics 178.1 consumption: applied ethics 178 ethics 170 philosophy 100 7.4. Locators. The purpose of a locator is to lead the user to the document, the description of the document, or the specific location within the document to which an index entry or search request refers. 
The nature of the locator will depend on the medium and type of index and on the type of documents to which the index refers. In electronic indexes, index terms or headings may be linked to documents or to their surrogates without visible locators. Locators should refer, as directly and succinctly as possible, to the documentary units to which index headings refer. Documentary units, in verbal texts, range from lines, sentences or statements, paragraphs, sections, chapters, articles, periodical issues, volumes, or series to entire collections. The variety of possible units in non-verbal texts is even more diverse. To the extent possible, documentary units should be inherent units of text rather than units of the medium in or on which the text is presented. For example, paragraphs are preferable to pages in verbal texts, since paragraphs are inherent units of verbal texts whereas pages are not; pagination may vary from one printed manifestation to another of the same verbal text. 7.4.1. Locators for printed documents. Printed books, pamphlets, periodicals, and similar documents normally consist of numbered pages bound into one or more units. Pages are the traditional units for indexes to printed books, pamphlets, and similar documents because pages are numbered while inherent units, such as paragraphs, are usually not numbered. If documents have numbered paragraphs, these paragraph numbers should be used as locators. If pages are divided in some way, such as into columns, such smaller units may be used instead of or in addition to pagination. With certain classes of printed material, inherent textual units are often numbered and therefore may be used as locators. For example, parts of plays may be referred to by act, scene and line number(s), and parts of books of the Bible by chapter and verse number(s). When a document consists of a series of uniquely numbered discrete units, such as abstracts, quotations, or case reports, these units are preferable to pages. 
When there is more than one numbered sequence, they must be distinguished typographically: Livingstone, Ken 1/3, 1/97, 3/94 or Livingstone, Ken 1:3, 1:97, 3:94 When indexing several issues or volumes of a periodical or serial publication, locators should be based on the numbering of the issues at the time of publication. When documentary units are documents within a collection, for example, articles in a periodical, chapters in a monograph, or letters in an archive, sufficient information must be given to identify the document. For periodical articles, each locator normally consists of: author(s); title of article; title of periodical; volume, issue number, inclusive pagination, and date. The content, format, punctuation, and order of elements should conform to relevant standards, such as @@American National Standard for Bibliographic References@@ (ANSI Z39.29), which is now under revision; @@International Standard, Documentation -- Bibliographic references -- Content, Form and Structure@@ (ISO 690: 1987 (E)); @@International Standard, Documentation -- Bibliographic Identification (biblid) of Contributions in Serials and Books@@ (ISO 9115: 1987 (E)). Some indexing services add information indicating the presence of photographs, tables, and other illustrations or features. These indications are, strictly speaking, not part of the locator; rather, like index headings and subheadings, they assist users in deciding whether documents are likely to be of value to them. 7.4.2. Locators for documents in other media. Documents in other media may, for indexing purposes, be divided into three types: a. Those consisting of elements that form one or more sequences that are, or may be, continuously numbered and so accessed by the user. Such materials may be treated broadly as in 7.4.1. Examples are a collection of slides, a filmstrip, an audiodisc, or a machine-readable database. 
Locators would be slide numbers, frame numbers, side and band numbers, and record identifiers respectively. b. Those consisting of one or more sequences of elements that cannot be distinguished numerically or so accessed by the user. Examples are serially accessed materials such as motion picture film, audio and videotape. In these cases, relative locators must be devised, such as playing time from a particular point. c. Those not consisting of sequences, such as maps, plans, charts, pictures, sculptures, realia. In some cases specific conventions exist, such as for maps, either grid references or coordinates. In other cases, locators must be devised ad hoc. Most machine-readable text files fall into either category (a) or category (b). Locators for such files may also take the form of file pointers or embedded terms or tokens. 7.4.3. Multiple locators in indexes to single documents. If a document or document segment treats a subject in a consecutively numbered sequence, reference should be made to the first and last numbered elements only (e.g., 3-11). The first and last element should be given in full, not eliding any digits, in order to avoid ambiguity (e.g., 20-25, 103-112, 1014-1027, not 20-5, 103-12, 1014-27). Expressions such as "3ff" or "3 et seq." are not recommended, because they are confusing to most users and may give incomplete information unless defined for a particular index. 7.4.4. Methods of emphasizing locators. If an entry includes several locators, the reference leading to the fullest or most significant treatment may be emphasized typographically. Locators which relate to special matter, such as tables and illustrations, may also be emphasized or marked. Locators to illustrations, for example, may be italicized, enclosed in brackets, or prefixed or suffixed with an 'i' or asterisk. Where more than one type of material is indicated, it is preferable to use the same system for all (e.g., 't' for tables, 'i' for illustrations, 'm' for maps). 
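The rule in 7.4.3 that both ends of a range be given in full lends itself to a short sketch. The function names below are hypothetical, and the code assumes simple arabic page numbers joined by a single hyphen; roman numerals, volume prefixes, and similar locator forms are outside its scope.

```python
def expand_elided(locator):
    """Restore digits elided from the end of a range, e.g. '103-12' -> '103-112'.

    Single locators and ranges already given in full are returned unchanged.
    """
    if "-" not in locator:
        return locator
    first, last = locator.split("-", 1)
    if len(last) < len(first):
        # Borrow the missing leading digits from the first element.
        last = first[: len(first) - len(last)] + last
    return f"{first}-{last}"


def format_locators(spans):
    """Format (first, last) page spans as unelided locators.

    A span whose first and last pages coincide is written as one number.
    """
    parts = []
    for first, last in spans:
        parts.append(str(first) if first == last else f"{first}-{last}")
    return ", ".join(parts)


if __name__ == "__main__":
    for elided in ("20-5", "103-12", "1014-27", "3-11", "195"):
        print(elided, "->", expand_elided(elided))
    print(format_locators([(23, 23), (26, 26), (101, 104), (279, 279)]))
```

The three elided forms that the draft warns against come back as 20-25, 103-112, and 1014-1027, the full forms it recommends.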
7.4.5. Presentation of locators. Locators should be clearly separated from headings by spacing, punctuation, or both, for example, by two spaces or by a comma or colon plus one space. The method used should depend on the nature of headings and the kind of punctuation used within headings. For example, headings that may end with commas and dates or other numerals should not use a comma plus space to introduce locators: Paris, 1989 : 1934, 2045 [not Paris, 1989, 1934, 2045] The method for presenting locators should be consistent throughout an index. 8. File display. Individual index entries may be displayed in ordered arrays or files. In print media indexes, ordered files provide the means of access to particular headings and entries. Therefore, the method of ordering entries in files is absolutely crucial. In electronic media indexes, entries may be sought via electronic matching without regard to file order. However, index file displays in electronic media can suggest options for searching and permit browsing and scanning. Such electronic visual files also need to be displayed in helpful order. Entries retrieved via implicit electronic indexes are displayed in files after retrieval. These files, too, should be ordered according to useful criteria. 8.1. File display in electronic media. When electronic indexes provide options for the display of files, both with respect to the fullness of entries and their arrangement, these options should be clearly described. Since the viewing area (screen) in electronic media is usually small and constrained as compared to print media, it is usually helpful to display entries in stages for scanning and browsing. For example, when entries consist of main headings and subheadings, only main headings may be displayed initially. When main headings are selected, for example, by highlighting, then their subheadings can be displayed. 
When subheadings are selected, sub-subheadings, locators, or document citations/surrogates can be displayed. In the display of retrieved entries or references, users should have options for the fullness of entries or records ranging from brief (e.g., title and author only) to full (e.g., title, author, full citation, abstract). Users should also have options for the arrangement of retrieved items, such as ranked according to potential relevance, classified by facets or a classification scheme, or ordered alphanumerically by index headings or by citation elements such as authors, titles, publishers, or dates. 8.2. File display in visual indexes. Procedures for displaying indexes in print media are well-established, while appropriate means for visual displays in electronic media are still very much in the development stage. The following sections relate primarily to indexes in print media, but some of the principles discussed are also applicable to visual files in electronic media. 8.2.1. File arrangement. Options for arranging index entries in visual files vary enormously with respect to underlying structure and criteria. Structured files can be helpful in breaking up large files into smaller, useful segments. Groupings can be created on the basis of relations among concepts or the meaning or type of concepts represented. But structured arrangements can be detrimental to searching when the basis for such arrangements is hidden from and therefore unknown by users. For most index displays, the direct and straightforward arrangement on the basis of commonly accepted ordering of alphanumeric characters is preferred, since most users cannot be expected to know less obvious principles for arrangement. When structured, relational, or classified files are used, they should be accompanied by alternative alphanumeric displays. 8.2.1.1. Classified or relational file displays. 
In classified or relational file displays, entries are arranged on the basis of relations among concepts represented by headings, such as superordination and subordination, class inclusion, chronology, and various types of roles and associations (e.g., discipline, action, object or agent of action, material, method, tools, and property). To the extent possible, the basis of arrangement should be made clear. Summaries or outlines of a classified file displayed at the head of the index are useful for this purpose. In almost all cases, a classified file should be accompanied by an alphabetic or alphanumeric index, unless the classified file is very short and can be quickly scanned. 8.2.1.2. Alphanumeric file displays. Alphanumeric displays are based on the commonly accepted filing values of alphabetic letters and numerals. However, there are a large number of options on how alphanumeric filing should actually be implemented. Some of these options, with recommendations, are considered in 9. Alphanumeric filing. 8.2.2. Recurring elements. It is appropriate to use indention to avoid the repetition of recurring terms in subsequent headings: labor distribution theory earnings monopolistic markets oligopolistic markets perfect competition rather than labor : distribution theory labor : earnings labor : oligopolistic markets labor : perfect competition 8.2.3. Vertical spacing. At least one blank line should separate major sections of the index, such as sections beginning with different letters in alphabetical indexes. In alphanumeric indexes, a blank line should also separate the non-alphabetical headings (e.g., headings beginning with numerals) from the alphabetical sequence. 8.2.4. Entry layout. Entry layout will depend on a variety of factors, such as type of syntax used, length of entries, medium of display, and space available (see 5.11. Syntax and 7.1. Heading or search syntax). 
When sub- and sub-sub-headings are used, they may be presented in a "set-out" (also called "line-by-line" or "entry-a-line") layout, a "run-on" (also called "paragraph style") layout, or a hybrid of the two styles. In the set-out layout, each subheading and sub-subheading begins on a new line, progressively indented. The run-on layout should be limited to two levels of heading (e.g., main heading and subheading). If three or more levels are used, the set-out layout of subheadings under the main heading should be retained, with the run-on layout being used only for sub-subheadings and further levels of subdivision, as in the hybrid example below. Set-out subheadings are preferable to run-on subheadings because users can scan them more quickly and can therefore understand them more easily. However, where economy dictates space-saving measures, run-on subheadings are preferable to shortening the index. In all layout styles, all items on the same level of heading should be indented by the same amount (in the set-out layout) or delineated by the same punctuation mark, such as a semi-colon (in the run-on layout). Parentheses may be used to indicate a third level of subheading in the run-on layout. In the run-on layout, when there are no locators between headings at two different levels, the two levels should be separated by a colon. (See "origins of tragedy" in the hybrid example below.) In all layout styles, whenever a line "turns over" to the next line, all lines after the first part of the turnover line should be indented more deeply than the deepest subheading indention employed in the index. (See "literary criticism" in the hybrid example below.) 
Set-out layout:

    Aristotle
      debt to Plato 23, 26
      literary criticism in 35, 74, 89-93, 101-197
        on Aeschylus 101-104, 279
        on Aristophanes 195
        on Euripides 104-126, 187, 265-266
        on Homer 103, 190-194, 206
        on Sophocles 127-183, 275-277, 306, 309-310
          @@Antigone@@ 155
          @@Oedipus Tyrannus@@ 140-149
      origins of tragedy
        in epic 196
        in revelry 197

Run-on layout (limited to 2 levels):

    Aristotle 20-22; debt to Plato 23, 26; literary criticism in 35, 74,
            89-93, 101-197; origins of tragedy 196, 197

Hybrid set-out/run-on layout:

    Aristotle
      debt to Plato 23, 26
      literary criticism in 35, 74, 89-93, 101-197; on Aeschylus
            101-104, 279; on Aristophanes 195; on Euripides 104-126,
            187, 265-266; on Homer 103, 190-194, 206; on Sophocles
            127-183, 275-277, 306, 309-310 (@@Antigone@@ 155;
            @@Oedipus Tyrannus@@ 140-149)
      origins of tragedy: in epic 196; in revelry 197

8.2.5. Introductory note. If an index is not straightforward or its conventions are not self-explanatory, an explanatory introduction should precede the index. Any abbreviations, symbols, or typographical conventions requiring explanation should be included in this introduction. In the case of separately published indexes, the introduction should include sufficient bibliographic information (e.g., author, title, publisher, place and date of publication or periodical volumes/issues) to completely identify the documents indexed. (See also 5. Features and attributes of indexes.)

8.2.6. Running headlines. Pages on which an index is printed should bear a running headline. In the case of multiple indexes, there should be running headlines on each page bearing an appropriate title for each index. In the case of separately issued indexes, the words "Index to [title of work]" should be used. Scope headlines should be used to indicate the scope of a page spread, reproducing all or part of the first and last heading.
The running headline should be centered on the page, with the scope headlines positioned at the left margin on a verso or left-hand page and at the right margin on a recto or right-hand page.

8.2.7. Continuation lines. In setting an index into pages or columns, some entries will be continued from the bottom of one column or page to the top of the next column or page. The continuation of very short parts of entries or of sequences of entries from one column or page to the next should be avoided. Examples are one or two locators at the end of an entry or the final line of an alphanumeric section of the index. Similarly, the initial line of a new alphanumeric sequence should not fall at the bottom of a column or page. When an index entry runs on to a new column or page, the index heading and any subheading and sub-subheading applicable to the run-on entries should be repeated, followed by "(continued)" or the abbreviation "(cont.)".

On bottom of column or page:

    thesauri
      adaptation 182
      construction 353, 364
        software 387

On top of next column or page:

    thesauri (cont.)
      construction (cont.)
        standards 374
      defined 381

8.2.8. Typography. Typography should contribute to clarity and rapid legibility. Size of letter and width of column should be in proportion to each other. One line should be able to accommodate an index entry of average length, including at least two locators. When an index entry occupies more than one line, individual numerical locators should never be divided. Different typefaces (e.g., bold, italics, or small capitals) may be used to distinguish entries for different types of documents, such as illustrations or titles of works. When an index consists of few main headings and many subheadings, the presentation of main headings in a different typeface or style from subheadings may be useful. Such conventions, when adopted, should be explained in an introduction. Too much variety, however, may confuse the user.

8.2.9. Columns.
A printed index is normally printed in two columns per page. In large-size documents (e.g., coffee-table books), it may be set in three or four columns. Certain types of index, especially where entries are long, such as an index of first lines or a table of cases in legal works, are better set to full page width. On a page of normal width (5-1/2 -- 6 inches), it is not recommended to use three columns because this may result in many turnover lines in subheadings, making the index more difficult to scan, while not saving space. In a long index, each group of headings beginning with a new initial letter should begin on a new column or on a new page. If more than one index is provided for the same document or collection of documents and each index occupies more than a page or two, each index should begin at the top of a page or column. The title of each index, shortened if necessary, should be repeated at the top of each page as the running headline. 8.2.10. Electronic manuscripts. When indexes designed for print publication are transmitted via electronic media, typographic coding should conform to ANSI/NISO Z39.59 -- 1988, @@American National Standard for Electronic Manuscript Preparation and Markup@@. 9. Alphanumeric order. 9.1. Standards There is no ANSI/NISO standard for alphabetical or alphanumeric arrangement of indexes. Two de facto standards widely used in libraries and databases in the United States are the American Library Association (ALA) and the Library of Congress (LC) filing rules. @@The Chicago Manual of Style@@ (13th ed., University of Chicago Press) is used by many publishers. Of these, the ALA rules are preferred for indexes because the LC rules include classified arrangements based on the nature or type of headings rather than the characters (letters or numerals) in the headings themselves. 
For example, when multiple headings begin with the same word, the LC rules state that headings are not to be arranged on the basis of subsequent characters, but in an order based on the nature of the concept named or represented, for example, persons, places, things, and titles. Among headings for persons beginning with the same word, forename headings are filed before surname headings, regardless of the filing value of characters following the first word. Among headings for things beginning with the same word, headings are filed according to the type of subheading, if any, rather than alphanumerically. Such classified structure can be highly confusing to the non-expert, and few persons are expert in filing rules.

[Note: Mulvany suggests that we delete the following paragraph for now. "Much has changed in the draft for the 14th edition of the Manual. Perhaps [we] should request a copy of the draft from the editor." -- JDA]

@@The Chicago Manual of Style@@ prefers a type of "letter by letter" arrangement (see 9.7. below) and recommends that numerals be filed as spelled out -- practices abandoned by U.S. library filing rules and the international standard for indexes (ISO/CD 999.4 Information and Documentation -- Guidelines for the Content, Organization and Presentation of Indexes). Like the LC rules, the @@Chicago Manual@@ calls for the filing of headings that begin with the same word but that represent different types of referents (e.g., persons, places, things) in a classified rather than alphabetical order. It assigns filing value to marks of punctuation, such as the comma, colon, or period, while it ignores articles, prepositions and conjunctions at the beginning of subheadings -- also practices not sanctioned by U.S. library filing rules.

9.2. Basic order. The basic order of characters is:

a. spaces, dashes, hyphens, diagonal slashes, periods: All characters in this group have equal filing value and file before any numeral or alphabetic character.
The order of these characters is determined by subsequent characters. All are treated as if they were a space. Multiple consecutive spaces and their equivalents are to be considered equal to a single space.

b. numerals: 0 through 9. All headings beginning with numerals are arranged before headings beginning with letters. Numbers are filed in numerical order.

c. alphabetic letters: A through Z. Lowercase and uppercase letters have equal filing value. Modified letters are treated like their plain equivalents in the English alphabet.

All other punctuation, signs and symbols are ignored, if possible. If non-alphanumeric signs and symbols are prominently featured and must be filed, a system must be devised and explained. No standards exist for the filing of non-alphanumeric symbols. As an exception, the ampersand (&) may be filed as its spelled-out language equivalent.

9.3. Initial articles. Initial articles that form an integral part of place name and personal name headings (including nicknames, sobriquets, and phrases characterizing persons) are regarded for filing purposes at the beginning of a heading:

    El Paso
    Los Angeles
    The Dalles

Cross references should link forms filed under the part following the initial article:

    Angeles, Los @@see@@ Los Angeles
    Dalles, The @@see@@ The Dalles
    Paso, El @@see@@ El Paso

Disregard initial articles at the beginning of corporate name headings other than those beginning with personal and place names:

    The Club (London)
    clubs
    Dalles (The) Public Library @@see@@ The Dalles Public Library
    El Paso. Police Department
    The Extended Simulation Support System @@see@@ TESS (computer system)
    libraries
    The Library Association (United Kingdom)
    TESS (computer system)
    The Dalles Public Library

Initial articles in the nominative case are ignored at the beginning of titles and topical subject terms:

    Das Capital
    The Movement (English poetry)
    The Nutcracker (ballet)
    poetry
    songs

9.4. Subheadings. Subheadings are normally arranged in the same way as headings. However, the arrangement of subheadings may be modified by chronological or some other systematic arrangement if such an arrangement is considered helpful to most users and can be clearly understood by them.

9.5. Headings with the same initial term. Headings beginning with the same term should be arranged in the following sequence:

a. term alone, with or without subheadings.

b. term with qualifier or the term as the first element of a longer term. Terms with qualifiers and longer terms beginning with the same initial term should be interfiled according to the filing value of the characters following the initial term:

    songs
      bibliography
      history and criticism
      texts
    songs, American @@see@@ American songs
    songs and poems [title]
    songs, Cajun @@see@@ Cajun songs
    songs (high voice) with piano
    songs (low voice)
    songs (medium voice) with guitar
    songs, Zionist @@see@@ Zionist songs
    songsters (songs) @@see@@ songs -- texts
    songwriters @@use@@ composers; lyricists

9.6. Cross-references. A reference introduced by "see" or "see also" or analogous linking terms is not part of a heading and does not affect the position of the heading in an alphabetical sequence.

9.7. Word by word versus letter by letter arrangement. Index headings consisting of more than one word should be filed by the word-by-word method, in which a space files before a letter:

    New Brunswick
    new journalism
    new moon
    New York
    Newark
    Newfoundland
    news agencies
    news-letters
    news photography
    newspapers

An alternative arrangement, letter-by-letter, disregards the space and characters that have the same filing value as a space, such as a hyphen or slash.
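As a rough illustration of the two arrangements in 9.7, the sketch below builds a sort key for each and applies it to the New/news headings used as the example in this section. It is a simplified reading of the rules, not part of the draft: the function names are invented here, only the space-equivalent characters listed in 9.2(a) are handled, and numerals, modified letters, initial articles, and other punctuation are left out.

```python
import re

# Characters that 9.2(a) files as a space: spaces, dashes, hyphens,
# diagonal slashes, and periods. A run of them counts as one space.
SPACE_EQUIVALENTS = re.compile(r"[\s\-\u2013\u2014/.]+")

# The example headings from 9.7, deliberately out of order.
HEADINGS = [
    "newspapers", "New York", "news-letters", "Newark", "new moon",
    "news photography", "New Brunswick", "Newfoundland",
    "news agencies", "new journalism",
]


def word_by_word_key(heading):
    """Hypothetical helper: keep one space between words.

    Uppercase and lowercase letters file alike, so the heading is
    lowercased. A space compares lower than any letter, so "new moon"
    files before "newark".
    """
    return SPACE_EQUIVALENTS.sub(" ", heading.lower()).strip()


def letter_by_letter_key(heading):
    """Hypothetical helper: disregard spaces and their equivalents."""
    return SPACE_EQUIVALENTS.sub("", heading.lower())


word_by_word = sorted(HEADINGS, key=word_by_word_key)
letter_by_letter = sorted(HEADINGS, key=letter_by_letter_key)

if __name__ == "__main__":
    print("Word by word:")
    for heading in word_by_word:
        print("   ", heading)
    print("Letter by letter:")
    for heading in letter_by_letter:
        print("   ", heading)
```

Sorting with the first key reproduces the word-by-word sequence shown above. The second key gives the letter-by-letter sequence, in which New York falls after news photography.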
The letter-by-letter arrangement may be required for the continuation of an existing index, but is not recommended, since it complicates the apparent order, separating headings that begin with the same word:

    Newark
    New Brunswick
    Newfoundland
    new journalism
    new moon
    news agencies
    news-letters
    newspapers
    news photography
    New York

9.8. Numerals. Headings beginning with Arabic or Roman numerals should be interfiled, arranged in numerical order, and placed before the alphabetical sequence:

    3/4 for 3
    1:00 a.m.
    1.3 acres
    2 1/2 minute talk treasury
    3 and 30 watchbirds
    $6.41 per hen per year
    007. James Bond: a report
    10% review
    XX century cyclopedia and atlas
    21-8-1968: anno humanitatis
    49th parallel
    1001 nights
    1066 and all that
    1984

In indexes where few headings begin with numerals, they may be arranged as if spelled out in words. Exceptionally, numerals as prefixes or infixes in names of chemical compounds in biological and chemical texts may be disregarded, unless needed to distinguish homographs:

    3-ethyl-4-picoline
    4-ethyl-@@alpha@@-picoline

In all other cases where numerals occur within headings or subheadings, they should be filed numerically:

    Club 18-30
    Club 21
    Club 147 fashions
    Club one holidays

9.9. Comprehensive example. The following file is designed to illustrate all of the filing situations described in the previous sections:

    3/4 for 3
    1:00 a.m.
    1.3 acres
    2 1/2 minute talk treasury
    3 and 30 watchbirds
    $6.41 per hen per year
    007. James Bond: a report
    10% review
    XX century cyclopedia and atlas
    21-8-1968: anno humanitatis
    49th parallel
    1001 nights
    1066 and all that
    1984
    American songs
    Angeles, Los @@see@@ Los Angeles
    Cajun songs
    Das Capital
    Charles
    Charles I, King of England
    Charles II, Emperor of Germany
    Charles II, King of France
    Charles III, King of England
    Charles III, King of England [title]
    Charles (airplane)
    Charles, (AL). Police Department.
    Charles, Allen [surname entry]
    Charles and the wise men [title]
    Charles, Duke of York
    Charles, Prince of Wales
    Charles, Saint
    Charles, (VA). Municipal Court.
    Charles, Virginia [surname entry]
    Charles (yacht)
    Club 18-30
    Club 21
    Club 147 fashions
    The Club (London)
    Club one holidays
    clubs
    composers
    Dalles, The @@see@@ The Dalles
    Dalles (The) Public Library @@see@@ The Dalles Public Library
    El Paso
    El Paso. Police Department
    The Extended Simulation Support System @@see@@ TESS (computer system)
    libraries
    The Library Association (United Kingdom)
    Los Angeles
    lyricists
    The Movement (English poetry)
    Music
      Africa
      biography
      cataloging
      dictionaries
      history and criticism
        to 400
        medieval, 400-1500
        20th century
      methods
      outlines, syllabi, etc.
      Peru
      United States
    Music about the house [title]
    Music, African
    Music and architecture
    Music, Baroque
    Music (MS). Park Department
    Music, Roman
    Music, Valerie [surname entry]
    New Brunswick
    new journalism
    new moon
    New York
    Newark
    Newfoundland
    news agencies
    news-letters
    news photography
    newspapers
    The Nutcracker (ballet)
    Paso, El @@see@@ El Paso
    poetry
    songs
      bibliography
      history and criticism
      texts
    songs, American @@see@@ American songs
    songs and poems [title]
    songs, Cajun @@see@@ Cajun songs
    songs (high voice) with piano
    songs (low voice)
    songs (medium voice) with guitar
    songs -- texts
    songs, Zionist @@see@@ Zionist songs
    songsters (songs) @@see@@ songs -- texts
    songwriters @@use@@ composers; lyricists
    TESS (computer system)
    The Dalles Public Library
    Zionist songs

=========================================================================
Date: Mon, 14 Sep 1992 10:52:18 ECT
Reply-To: Indexer's Discussion Group
Sender: Indexer's Discussion Group
From: "Kate McCain"
Subject: Re: What do you read?
In-Reply-To: Message of Mon, 14 Sep 1992 09:12:07 ECT from
----------------------------Original message----------------------------
I read: JASIS, IPM.
JDoc, J InfoSci, Scientometrics, CRL, LQ, The Indexer Boulding's _The Image_ (one of the most important books of the 20th century) Gentle cognitive science, neural networks, etc. I have my A&I students read: Three _very_ important papers by Jim Anderson (Rutgers), Lancaster, Milstead, Soergel (optional), Marcia Bates, Austin (on Precis), Dykstra, Tenopir (book) (and individual interesting papers too numerous to mention). BTW I sense that this list focuses primarily on B-O-B indexing rather than indexing for online retrieval, thesaurus construction, etc. This is actually very helpful for me, since my background & research deals almost exclusively with the latter. I plan to get my students connected to the list as soon as the term starts -- but (having seen the problems on one of the library-related lists) will not give them an onlist assignment. Kate McCain "Bibliometrics R Us" College of Information Studies Drexel University BITNET: mccainkw@duvm Internet: mccainkw@duvm.ocs.drexel.edu =========================================================================