Wednesday, September 11, 2013

19-Sep Optional Alternative Reading


  1. Research on Relevance in Information Science: A Historical Perspective- Tefko Saracevic


    This article focuses on the landmark examinations of the notion of relevance in the field of Information Science, starting with its first definition and extending to the present day. One major flashpoint in the history of this field is the first publications of American Documentation in 1950, which provided scholars and professionals with a platform to discuss and define the essential ideas of information retrieval. Comparisons between the notions of “aboutness” and “relevance” are a central concern in this article, which defines the former term as relating to the organization of information (and falling under the jurisdiction of librarians), and the latter as relating to retrieval (falling under the jurisdiction of information science professionals). The author summarizes many of the same studies as he did in the other article we were assigned by him for this week. He claims that there is still a lot of inconsistency in both algorithm-based and human-based relevance tests, and that many studies are being done to address this (he goes on to summarize a few of these). He concludes by claiming that “relevance research did not contribute a lot, if anything, to pragmatic knowledge: to practical issues of improving or innovating IR systems and procedures” (p. 12), but that it did broaden understandings of IR system designs.


    1. Both articles mention that WWII acted as a catalyst, sparking the formation of IR as a professional and academic field. Another mention of military advancements is made during the discussion of relevance tests by the Armed Services Technical Information Agency (p. 5). Was this a chronological coincidence, or is there a connection between massive military endeavors and leaps in technology? Have more recent military occupations and/or wars led to improvements in IR that are now a part of the public, consumer experience? Or has this shifted since the military has outsourced much of its technological development to vendors such as GE and Boeing? I would be interested to hear more about the relationship between search engine advancements, information retrieval, and military efforts.

    2. The first definition of relevance, by Calvin M. Mooers, includes the notion that “'a prospective user is able to convert his need for information into an actual list of citations to documents in storage containing information useful to him'” (p. 4). Would the importance of a “user” and their “needs” have been considered worthwhile if the idea of the consumer, and of supplying the goods he or she demands, were not so ingrained in our culture? What does it mean that IR was pioneered by an industrialized, western society?

    3. The author, in both articles, makes a point of consistently acknowledging points that are outside the scope of the research included in the article at hand, and redefining the limits of the specific work. For example, when mentioning the more philosophical examinations of the rise of information science, the author states that “[w]hile illuminating, arguments in such treatises are not elaborated here due to restrictions on length. Instead, concentration is on studies containing data or observations”(p. 5). Is this a useful strategy when writing one's own literature review? Is it useful to allude to related arguments in order to assure readers that you have not overlooked them, but that they are not relevant or specific enough to be included?

  2. Marti Hearst. Search user interfaces. Cambridge University Press, 2009. Ch. 2: Evaluation of Search User Interfaces.
    [Subbed for the Smucker article]

    The chapter begins with a simple question: what should be measured when assessing a search interface? It then goes on to explore the three main aspects which, in an ideal world, define the evaluation of search interfaces: effectiveness, efficiency, and satisfaction. Hearst delves into each of these, explaining the methods employed by each type of study used to evaluate usability. By doing this, Hearst presents a sort of template one could use to evaluate search interfaces. She presents methods that range from short-term testing, like paper prototyping, to longitudinal studies that track behavior over the long term, and even discusses the intricacies of dealing with participants, from finding them to handling contextual differences between them, in an effort to show the process of evaluation. She concludes the chapter with a straightforward path to follow for both designing an interface and choosing the tools to use in evaluating it.

    In relation to Saracevic’s article and how costly user-related studies can become: how effective would a sort of discount usability testing system be for determining relevance and other focuses of user-centered studies?

    Most individuals would no doubt be influenced, at least on some level, by the presence of someone hovering over their use of a low-fi/paper prototype of an interface. Does the presence of several evaluators during low-fi prototype evaluations influence the participants' decisions to a noticeable degree?

    Hearst mentions that Cooper proposed, in 2008, a search engine run by researchers. How feasible would this actually be? Would the cost simply be too high for a search engine like that to exist, and how accurate would the information actually be, given that participants would know they’re using an engine that will be used for research purposes?

  3. [subbed for the Saracevic article]
    In the paper “Relevance Judges' Understanding of Topical Relevance Types...”, Huang shows the lack of understanding of topical relevance. In the study, four graduate students produced relevance judgments for 28 topics using the MALACH test collection, which “aims at improving access to oral history archives through automatic speech recognition (ASR) and subsequent information retrieval assisted by techniques from natural language processing (NLP).” Two months later, the graduate students were interviewed, and Huang reports that the participants perceived five different types of topical relevance: direct relevance, indirect relevance, contextual relevance, relevance by comparison, and relevance by pointer.

    1) Huang mentions that logical relevance and situational relevance have built the foundation of this concept. Moreover, situational relevance is built upon the ideas of logical relevance. However, the explanation of situational relevance is brief and not very clear. Can you elaborate more about it?

    2) When describing the similarities and differences between direct relevance and pointer relevance, Huang quotes one of the participants, who states that at some point a pointer relevance might be a direct relevance. However, one of the characteristics of pointer relevance is that such documents contain little to no relevant information. Doesn't this characteristic contradict the participant's quote? If so, what is Huang's goal?

    3) I am curious about the benefits of waiting 2-3 months before interviewing the participants. What were some of the drawbacks that were avoided by doing this? I would say that interviewing the participants immediately afterwards would yield more accurate and distinctive results.

  4. Relevance Judges’ Understanding of Topical Relevance Types: An Explication of an Enriched Concept of Topical Relevance


    The authors, Huang and Soergel, focus on the importance of topical relevance in evaluating an IR system and present the views and understanding of relevance judges. The paper describes the various complexities of topical relevance and illustrates how difficult it is to identify the associated variables for making judgments. In this study, four graduate students performed search-guided relevance judgments for 28 topics and shared their views and opinions on which factors contribute to topical relevance and on the roles and characteristics of different kinds of relevance. In a nutshell, the paper argues that topical relevance is often misunderstood and given less importance than user relevance, and concludes that topical relevance and its different categories form an integral part of IR system evaluation.

    1. Green pointed out that the topical relevance relationship is a hierarchical and structural relationship, and that it is not just a matching relationship. How true is this statement? When a document is judged based on its topical relevance, how useful can the structural relationship be? More than the syntactic relevance of the documents, would the user not be more inclined towards the semantic relevance?

    2. Relevance is categorized into multiple types, ranging from direct and indirect to pointer and overall relevance. The Google Search Quality Guidelines reading (which deals with a practical, working IR system) discussed different types of queries, the “DO-KNOW-GO” queries, which were bucketed into different user intent sections. When classifying queries under the various user intent segments, where do queries like “news” fit in? How can one correlate user intent with the categories of topical relevance while judging documents?

    3. In the section “Different types of topical relevance as perceived by different participants”, a participant is quoted as saying that “A good direct has a wealth of details while a bad direct had much less information on the given topic”. I disagree with this statement because the richness of detail on a topic is related to quality and not to quantity. Quality vs. quantity: which is the better measure?

  5. (I posted the same questions by 4.30 pm on September 17. No idea how they disappeared)

    "Relevance Judges’ Understanding of Topical Relevance Types: An Explication of an Enriched Concept of Topical Relevance" submitted instead of "Tefko Saracevic. 2007. Relevance: A Review of the Literature and a Framework for Thinking on the Notion in Information Science. Part III: Behavior and Effects of Relevance. JASIS&T 58.13 (2007): 2126-2144."

    I chose this optional reading as it answered one of my questions from the earlier readings.
    Summary:
    Topical relevance: the relationship between the overall topic of a relevant document and the overall topic of the user need.
    The authors attempt to overturn the assumption of a single type of topical relationship, stating that there are additional relevance factors such as indirect relevance, context relevance, pointer relevance and relevance by comparison. The authors back their claim with interviews of relevance judges conducted 2-3 months after they judged the MALACH test collection using the above-mentioned types of topical relevance. The judges claimed to have had a change in perspective and also asserted that direct topical relevance alone was insufficient, adding that other types of topical relevance, and the overlap among them, also contained or led to relevant documents.

    The interesting points of discussion that I found in this paper were:

    1. My first question is regarding "relevance as pointer". From Figure 3, it can be observed that this type of relevance had low relatedness to the topic and less relevant information. P1, P2 and P3 state that pointer relevance was very straightforward, contained a 'small amount of information' and just had a probability of pointing to a relevant topic. Isn't the function of pointer relevance already covered by context relevance? What is the need for a separate pointer relevance? In their subsequent work, "" Huang and Soergel have ignored pointer relevance. Is the above the reason behind it?

    2. The paper provides a qualitative study and reports improvement in perception of the judges' understanding without the backing of a quantitative study. Although the need for topical relevance types in addition to just the direct relevance seems very logical and acceptable, the authors have not backed their claim with a quantitative or observable increase in the performance of relevance assessments. What could be the reasoning behind it? Is it acceptable to follow and believe in this theory without actual quantitative analysis?

    3. How does context relevance hold up over time? The term context depends on many variables such as time, place and language. What might be relevant for one particular set of variables might not hold for another. How did the judges assess the performance based on context when it provides a subjective topic definition? If a particular document provides general background which is relevant to the user topic, will it still be considered contextually relevant even if the document by itself might not be relevant to the user topic? What will the user gain from such a merely contextually relevant document?

  6. 1. The author mentions that TREC evaluations do not require the users to
    interact with the system. The chapter mentions an interactive track from 1997
    to 2000, but doesn't discuss anything currently being evaluated. Is it still
    the case that TREC doesn't involve any HCI evaluations of the systems? This
    seems like a huge omission -- is this a problem?

    2. Previous readings have discussed how raters' levels of expertise in an area
    can drastically affect the relevance judgements, and tend to produce results
    with higher correlation to each other. I'm curious about the extent to which
    this is reflected in the HCI aspect of search evaluation. For example, if a
    user is very adept with technology and used to dealing with different UIs, it
    seems like their evaluations will reflect their previous experience, so search
    UIs that are similar to existing UIs will perform better, not necessarily
    because they're inherently more intuitive, but because they align more with
    the rater's experience. Is there any sense as to how much a rater's experience
    influences their preferences? Is there a way to account for and minimize this
    bias?

    3. The author talks about how aesthetics have a significant impact on
    questionnaire-based evaluations, but not on financially incentivized
    assessments. What is the implication of this for search UIs? It seems like
    this calls into question whether or not aesthetics matter in a search UI, but
    the author did not explore this.