Thursday, October 31, 2013

11-7 Moshfeghi, Yashar, et al. Understanding relevance: an fMRI study


  1. 1. The authors present a lot of disclaimers and limitations about the study design (p. 17). What effect did the limitations have on the outcome? Is the study still useful? How could you build upon the results in further studies?

    2. The study concludes that there are three distinct regions that light up more when viewing a relevant image versus an irrelevant image. Would there be enough of a distinction to determine relevance for a partially relevant image? If it is a graded difference, would using fMRI be more helpful than traditional relevance judgments?

    3. The authors describe the three regions of the brain that are more active when viewing a relevant image and explain how this shows a connection between encoding instructions, image processing, and higher-level decision making. However, brain science is always changing: studies have shown the brain to be more interconnected than originally thought, and new advances keep changing our understanding of how it works. How applicable will these results remain as our understanding of the brain changes? Were the sample size and tests large enough to provide enough data to build on in future experiments?

  2. 1. This direction of research aims to investigate the human brain directly in the context of relevance in searching. In the conclusion, they found that certain brain regions were activated while assessing the relevance of image documents. This is an inspiring result, but to make use of it in practice, we still need to know more about the human brain. We need to know how information is related from a human perspective. For example, are two pieces of information highly related, partially related, or potentially related?

    2. My second question is about the future direction of research in this field from an ethical perspective. This kind of research must first be permitted; as noted in this paper, ethical permission for the study was obtained from the Ethics Committee of the College of Science and Engineering, University of Glasgow. Suppose that in the future we have mature technologies that could detect relevance and users’ interest directly from their brains: how to make people accept them remains problematic. What are other side-effects of this kind of research?

    3. The experiments in this paper mainly focus on image documents. Images and pure text may stimulate different regions of the human brain (just my guess). I am wondering what the difference would be if the documents were changed to pure text. I think there would also be other differences. When people look at images, they can quickly get a basic idea of them. However, when people read text, their understanding of it changes as they read more.

  3. 1. This paper presents an interesting study that tries to connect the cognitive state of relevance to brain activity. My first question is about the measurement technique. As mentioned, fMRI can only measure changes in brain activity via the BOLD (blood-oxygen-level-dependent) signal. Are there any other metrics we can use to measure brain activity? And since fMRI can only measure changes in brain activity every two seconds, while cognitive state changes happen in tens to hundreds of milliseconds, does the measurement capture brain activity changes in real time?

    2. Since brain activity was the metric measured in this study, and the brain controls many kinds of human reactions, how could the authors eliminate all this background activity to get the correct signal? 18 participants were used; what were their backgrounds in terms of age and physical and mental state? Intuitively, even the fact that participants are very hungry or angry will have a significant effect on their brain activity. Is it also possible that the environmental setup, such as the fMRI scanner itself, will bias brain activity? How can the authors eliminate these factors?

    3. The authors try to combine different tasks (text, image) of different difficulty to remove biases. Intuitively, these different tasks are probably processed in different regions of the brain, so would it be better to separate them and study the brain activity for each separately? According to image recognition theory, different neurons will be activated simply by different image patterns; how many samples do we need to include to remove this bias?

  4. 1. The authors explicitly mention the physical limitations of the proposed study, such as the participant lying down, the head being kept still, and the lighting conditions. But they also mention the presence of experimenters in contact with the participants. How can the presence of an experimenter minimize the effects of physical limitations? Moreover, with such physical limitations, this study cannot be performed with a large number of users for better generalization. Also, are the study and the experiments repeatable?

    2. Cognitive events like thinking, perceiving, recognizing and reasoning are all short-timescale events. They take place in the brain quickly and almost intuitively. Since these underlie the act of judgment, their fMRI signals tend to overlap, which makes it harder to distinguish the individual events. How can one map each of these events to the signals? Also, in which cases would these cognitive processes be slow?

    3. As a part of the procedure, a fixation image, containing a fixation cross indicating to participants that they needed to fixate on the cross and prepare for the next stimuli, was provided after every image block. What is the need for a fixation image? Did it help the participants in any particular way? Was the study tested without the fixation images? If not, would the results be different? What is the significance of having an 8-second interval with just the fixation image?

  5. 1. The researchers indicate that they are trying to further recent studies that use aspects of human behavior as implicit relevance judgments, and say that such methods overcome cognitive burden since they are indirect and unobtrusive (p. 15). I’m confused as to what they consider their methodology, because even though it does measure an aspect of human cognition, it certainly isn’t unobtrusive. Won’t the MRI scanner and associated equipment increase cognitive load?

    2. The authors say the topics for the images were chosen based on the general knowledge of the participants (p. 18). How did they measure / evaluate this, or did they make some assumptions?

    3. I’m wondering why the researchers chose to project each image for 4 seconds rather than allowing participants to indicate when they had finished making a judgment. While I agree that in an experimental setting a participant cannot be allotted infinite time, I’m curious because this is such a specific and short interval.

  6. 1. The authors claim that eye tracking and skin temperature measurements have been proven to be ineffective, but they do not cite any studies which support that claim (p. 15). They explain that brain activity provides a better understanding of how documents are determined relevant. Can we see brain activity as a cause, and changes in eye movement or skin temperature as an effect or product of that activity?

    2. For such an expensive experiment, I am surprised they decided to proceed with this version despite all of the limitations. What could have been done to simplify the experiment, make the participant more comfortable or feel more natural, or make the results easier to interpret? It doesn't seem like they took many measures to account for the limitations; they just acknowledged them.

    3. In the results section, the authors discuss the ways in which the brain reacts to seeing images and associating memories with them. How would the experiment and results have differed if the participants were presented with text-based documents instead of images (or with a combination of both)?

  7. This paper's approach of studying relevance judgment via fMRI is very interesting. I have several questions about this research. First, this study focuses on image judgment, so what about text document judgment? In cognitive science, reading text and viewing images may involve different brain processes. For instance, in working memory, reading text may also involve phonological coding.

    Second, the topics in this research are very simple. If the topics were more complicated, would the results be different?

    Third, like EEG, fMRI can only provide physiological clues, which are of limited use for understanding relevance judgment comprehensively. So, do we need to incorporate other methods into this research?

  8. 1. The experiment is so simplified that it is difficult to comprehend the meaning of relevance and the observed activations. Since deciding that an image is relevant or non-relevant requires similar processing, there is no intuitive distinction between the two.

    2. In the conclusion, the authors claim that this study is extendable to relevance decisions on text, but the extension does not seem as straightforward. Often much of the text within a document is not relevant, which would produce mixed signals. There was no discussion or indication of an activation that reflects satisfaction or accomplishment. Finding the relevant section is a task; perhaps signals that indicate satisfaction and accomplishment would be better to monitor?

    3. It is not clear how understanding brain activation will help current evaluation methodologies. We already adopt simplifying assumptions, and having a clear definition of relevance might not affect the evaluation process.

  9. 1. fMRI is known to measure blood flow in the brain. This does not imply that it can capture the cognitive function of neurons, which is critical to mental function. So, even if we do make use of fMRI data, wouldn't the facts that fMRI scans are tedious to interpret and that the user is required to stay completely still impede the validity of the experiments?

    2. While the analysis of the fMRI data aims to show which parts of the assessor's brain are active when judging relevance, the paper does not elaborate much on the assessor's profile. We have seen how prior knowledge of the topic, current emotional state, and interpretation of the query would make a difference in whether an assessor considers a retrieved image relevant. Doesn't neglecting these factors make the investigation incomplete?

    3. We have seen that there is a gap between the visual and semantic interpretation of an image, which does affect whether the searcher considers a particular image relevant or irrelevant to the search query. How could we account for this semantic interpretation of the image in fMRI data when we cannot truly estimate the mindset of the searcher?

  10. The authors state in the Introduction that “Despite the robustness of explicit feedback in improving retrieval effectiveness, it is not always applicable or reliable due to the cognitive burden that it places on users”. I do not agree with this statement, as I believe explicit feedback is more a problem of time and resources than of cognitive burden.

    In section 2, when the authors talk about the Physical Limitations, they admit the experiment setup might cause “discomfit and fatigue” in participants. I think this is a big problem, as the setup will introduce non-trivial confounding variables into the cognitive performance of the participants. I would feel extremely uncomfortable lying inside an MRI scanner, thinking about possible radiation and other negative impacts.

    My last question is: what is the point of the paper, and how practical is it to measure users' brain activity in real life? Without significant technical advances in the measurement of brain activity (e.g., making the equipment completely non-intrusive and much smaller), the contribution of the paper is purely theoretical.

  11. 1. The authors state that they selected images they believed to be clearly relevant or not relevant. However, in class, we have focused multiple times on the subjective nature of relevance. We have read about multiple studies that revealed that judges do not agree completely with each other or even themselves over time. The authors of this paper never entertain any of these ideas. Do these issues with relevance translate to images? Or are they strictly issues related to textual documents? Due to the experimental design, the user is still saying what he finds relevant or not, so this issue may not affect the analysis in the end unless the experimenters take their relevance judgments as a ground truth in their evaluations.

    2. As stated in my first question, the authors make a point to emphasize that the relevance of the images is clearly defined. However, when the authors go on to further outline their experimental design, the authors state how they introduced easy and difficult tasks. How can images which are clearly relevant or not relevant be difficult to classify? I understand trying to validate the process by trying to produce difficult situations such that you aren’t handling some trivial cases. However, I do not see how their image selections would enable these ‘difficult’ tasks to exist.

    3. The authors described the training process that each user had to undergo prior to starting the study. Unfortunately, it sounds like the training covered just the relevance task itself. I got the impression the user was not placed in the fMRI machine to perform the task. In contrast to the previous relevance judgment experiments we’ve read, the authors’ study introduces a completely new environment. The user has to keep his head stationary and look at a reflection of a projection of an image. I understand the authors’ motive for providing initial training; however, I think the users should be trained in the experiment’s environment, which is likely to be unfamiliar to the volunteers.

  12. The authors propose that the measurement of the brain region activity changes during explicit relevance judgment could be used as a direct way of measuring relevance. However, will these changes not differ from user to user? If these changes are user dependent, wouldn’t we be dealing with the same (current) problem of relevance but in a different form?

    The use of an MRI scanner forced the authors into the experimental setup they resorted to (participants lying supine). The results found by this method can only be used to model ‘ideal’ users. This is because most cognitive processes are short-term, with fleeting attention spans, so any study that tries to analyze them should be quick in its evaluation, which fMRI clearly is not. Moreover, this method of relevance determination seems impractical to scale with current technology.

    The authors mention that their analysis shows that three regions show greater activations for relevant stimuli than for non-relevant stimuli. However, they fail to define what ‘greater’ means. Was it a conspicuous difference or an insignificant change? What would have happened if the documents presented to them were not images?

    The authors mention that from the pilot study with 4 participants a number of changes were incorporated into the experimental setup thereby resulting in the one that they described in the paper. These changes, if mentioned, would have provided the reader some insights about the challenges the authors faced while creating this new experiment.

  13. 1. The authors mention the fMRI setup that they constructed for the study. Given the physical limitations of this study, how useful are the results that they obtained given the restrictions that were placed on the participants and the unnatural search environment they were in?

    2. The authors mention that the participants all stated they were familiar with search engine use. Given the nature of this experiment and the construction of the task, how relevant is search experience to this type of study?

    3. The authors mention that they made a number of changes from their feedback of the pilot study, what were these changes and what about the pilot study made these changes necessary?

  14. 1) When discussing the technical difficulties, Moshfeghi et al. mention that short time scale events can lead to signal overlap. How can this happen and what is its semantic significance?

    2) What kind of relevance do they use in their experiment? Also, wouldn't the results be better if they used three degrees of relevance, since binary relevance is hard for participants to judge, especially in the “gray” areas?

    3) When discussing the experimental setup, they state that the control remote was configured randomly from person to person. Wouldn't this introduce errors into the classification, since most users are used to the top button being “on” and the other being “off”?

  15. 1. This paper only considered the case of activated brain regions during the explicit relevance judgement. Why did they only consider the explicit one? Why did they not include the implicit case?
    2. The four search tasks were selected from ImageCLEF 2009 following the criterion of “general knowledge of the population in mind”. However, judging by the topics in Table 1, some titles are still hard for people to grasp, which means that “general” depends on a particular population with sufficient knowledge of these topics. Does such dependence introduce any bias into the work?
    3. In this paper, the difficulty of recognizing an image was reflected by the relative assessment processing time. However, this processing time depends on participants’ knowledge. If they were familiar with the topic, the time would be shorter. In such a case, how could difficulty be accurately defined?

  16. 1. It’s mentioned on page 17 that fMRI is an indirect approach. What is the direct approach? Are direct approaches more effective for understanding relevance? What are the difficulties in applying such direct approaches?
    2. The selected topics in the experiment covered various image recognition tasks. Why did the authors adopt such a strategy? Would it be better for the experiment and the analysis of results if only one recognition task were involved?
    3. In section 4, the authors mentioned that their work was consistent with the conclusions of other related work. If so, why did the authors not focus on more specific issues in the conclusion instead of “redoing” the work?

  17. On page 17, the researchers mention keeping in contact with the participants while they are in the MRI scanner. They mention specifically keeping in contact with the participants to “make sure they were meeting the requirements to proceed with the experiment” (17). What exactly were the requirements for proceeding, beyond what I assume were being awake and responding to the researchers’ prompts?

    Given that for the task the researchers provided their own gold standard and training for the participants, how much of the activity they witnessed was attributed to actual relevance decisions from a user as they would experience when searching queries vs. assigning relevance based upon the training they received? Does it matter or does the brain activity follow a process set up by the manner in which the experiment was conducted?

    This experiment was done using binary relevance assessments, which is probably best for determining specific neural activity in the brain when doing these assessments. How might using graded relevance, as we’ve seen many studies use, help researchers in determining the underlying regional brain activity when dealing with relevance?

  18. 1. The authors mentioned their MRI scanner experimental setup and it seemed pretty intimidating to me. For example, participants performed the experiment while they were 'lying supine in the bore of the scanner'. What are the potential effects of judging relevance in such conditions? Shouldn't the experimenters have calibrated results from these settings to results obtained under normal settings and shown that this intimidation factor did not play a major role for their purposes?
    2. The experiments in this paper seem to apply primarily to visual input. As we know, visual inputs tend to excite different brain regions than text or even other multimedia. Would a similar experiment therefore have to be conducted for each category of relevance?
    3. The authors mentioned that fMRI has poor temporal resolution, and that it can take more than a minute to switch between results. Hence, did each participant have to wait for minutes between tasks? What are the effects of this on relevance judgment? It seems like many factors could have introduced noise into the MRI readings, and it seems strange to me that these effects were not addressed in this paper.

  19. 1. The experiment described in the study looks at relevance through brain region activation from images. The authors mention that one of the activation regions (parietal lobe) might not be as active in text-based IR scenarios. Aside from the sensory specific aspects of the data, might we expect other regions to be implicated in other study mediums?

    2. Even after reading the article, I’m uncertain whether relevance is actually being described. To answer whether relevance is especially implicated in the activity, wouldn’t it be necessary to show that it is something unique from other cognitive processes? How might the cognitive description of relevance here differ from one seen in pattern matching or problem solving from an fMRI perspective?

    3. It seems very odd to me to describe relevance in terms of brain region activation. How can this methodology work in a scenario where one document is clearly relevant and another document is somewhat/unclearly relevant? It seems like more cognitive effort would be expended in the case of near-relevance than clear-relevance, but the model here would predict the somewhat-relevant item to be more relevant because it caused greater activation.

  20. Have we discussed affective feedback? What is that? Is it something besides clicks, saves, and prints?

    If fMRI interpretation is very subjective and surrounded by a 'vast controversy', will there be consensus on this technique?

    Topics of the images were supposed to be general enough to be recognized by the general population, but the first item, the 'fortis logo', was something I've never heard of. I wonder why they didn't use the most recognized logos, such as Coke or Disney.

  21. Having had several MRIs over the course of my life, I find MRI machines extremely distracting, even unnerving. They are large, often very noisy, extremely magnetic (obviously, but to the point that it is dangerous to have metal near the machine), and their images result from scans taken over time. Furthermore, you must lie down for a lengthy period during the procedure, and you must remain completely still during image capture. The paper states that the time required to collect an image is 2 seconds; in my experience it often seems to take far longer, but that is neither here nor there. The point is, how does this setting impact users' abilities to respond normally to relevant or irrelevant documents? It surely has an impact. Can they collect the right information in a 2-second period, or do they need a shorter timeframe? Can they trust the users' brains to respond normally in the laboratory setting?

    Similar to a question I had about Vo and Gedeon's paper: suppose that we do identify the regions in the brain that activate during relevance/irrelevance decisions. Then what? The authors state that the work contributes to our theoretical understanding of relevance as well as potentially to the development of new methods of relevance feedback. Do these findings actually contribute to our understanding of relevance? How? If I were to guess where relevance judgment might spur activity in my brain, I would guess it would spur some in the judging part and some in the thinking part. That is essentially what this paper stated. Beyond that, the results lack the "power to distinguish between... alternative explanations." (p. 23) As for the second claim, how do we develop new methods of relevance feedback based on fMRI research? There might be very interesting potential here, but I cannot think of anything.

    Does log analysis effectively do what it is purported to do in the paper? The authors state that it "is an indication of how well they performed each task and therefore how trustworthy the captured brain activity is across each relevance block." However, their method seems to rather simply give the subjects' accuracy in classifying the images. How does this relate to brain activity and whether or not the imaging data are "trustworthy" (reliable?) as reflections of the users' relevance selections? It seems rather that any time a user believes an image is relevant, then the scan of the user for that image could be used, no? The actual relevance or irrelevance does not seem to be important here.

  22. Q. Getting inside an MRI machine itself disrupts the study being done by the authors. Most people are nowhere near an MRI machine when they are using IR systems. The test subjects would also have had an elevated emotional state because of the tests they were being asked to perform. Given these facts, can the results of this user study be given any practical importance?

    Q. The authors mention that some normalisation functions were applied to the scan results, but they do not explain why these normalisations were done, what they achieved, or how they avoided obscuring or distorting the test results.

    Q. This type of relevance judgment cannot practically be used for every test. These results can’t even be extended to graded relevance if needed. This testing was performed using images only, and given that images and text are processed by different sections of the brain, these results cannot be extended to text evaluation. So what is the scope of application of these results?

  23. 1- The regions of the brain that were identified showed ‘greater activations for relevant stimuli than for non-relevant stimuli’, but the authors did not discuss which regions showed greater activations for non-relevant stimuli. Is it that the same regions always activate when observing images and just activate more if a user finds them relevant? If so, how can relevant and non-relevant activity be distinguished without the user interpreting them? Also, does that present generalization issues for other types of documents?

    2- One problem outlined is the limitations of the MRI: it can’t capture real-time brain activity, so decisions made very fast may be missed. Would a solution be to offer more time-consuming tasks, like reading or watching film clips?

    3- Looking into the future, if we correctly identify which parts of the brain interpret relevance, I wonder whether there would be a way to artificially stimulate these areas and create a false relevance judgement. Assuming the intent was not malicious, would a false sense of relevance be just as good as a real one? Of course, if mind manipulation advances that far, our biggest concern will hardly be relevance judgements. But maybe we won’t notice.

  24. 1) Since most of the studies we have seen in this class focus on text-based relevance, is it possible that their results would have been different with text instead of images?

    2) It was interesting that the authors noted, as one of their technical limitations, that the temporal resolution of the measurements is significantly coarser than the timescale of the underlying neural events. Would this limitation prevent the evaluation of textual documents, where the evaluation process is more complex and might include some overlap? With images, it seems the researchers could explicitly show each image separately, allowing enough time for a haemodynamic response to each neural event.

    3) I was a bit confused about why each user needed to signal whether an image was relevant. Wouldn't the brain signals speak for themselves? If anything, it seems the user should have just pressed a button to move on to the next image, or something of that nature.

  25. 1. In this article the authors used fMRI to find the areas of the brain involved in judging whether something is relevant. They did this by showing images related to topics from the 2009 CLEF Photo Retrieval Task. In their results, the majority of the brain areas active during this task were those related to visual processing. The authors then hypothesize that for text-based relevance judging there would be activity in other areas on the left side of the brain. Does this mean their results are weak because they only show image processing and not an actual relevance judgment? Do you think the human brain has different processes for judging relevance in text and in images?
    2. In their experimental design, the authors state that they split the relevant and non-relevant documents for each topic into separate blocks that were shown to the subject in random order. However, they point out that this method could introduce noise, as the subject could use cognitive strategies to determine which block is relevant and which is not. What were the benefits of their method as opposed to the less noisy random method, and do you think they made the right choice?
    3. In this article the authors identify three areas of the brain that are active when making relevance judgments on images. These areas are the superior frontal gyrus, the inferior parietal lobe, and the inferior temporal gyrus. They also give the generally accepted function of each of these areas. Given their regular functions, which of these areas do you think is most useful to the act of relevance judgment? Do you think that they are all equally responsible for relevance judging, and why?

  26. 1. How is it ethical to perform an fMRI on a person just to study the way the brain reacts to relevance? Even if it is ethical, the person would be under immense stress from the noise (MRIs are scarily noisy) and from lying in an uncomfortable position. Does this not affect or bias the study?

    2. Why did the researchers not extend this study to text relevance? Will it not help in producing a better understanding of the whole study? How will the system formulation hold in the case of graded relevance?

    3. The article does not explain in detail how they chose the eighteen participants. The other reading also had only 19 participants. Does this have anything to do with the cost of the experiment, or is this number sufficient to establish statistical significance? I also believe the results could have been discussed much more clearly, explaining the observations from at least a few more participants. The authors state that the participants were given a questionnaire at the end of the study, but they do not discuss it further in the paper. Don't you believe it would have added more validity to the results if they had performed an analysis of every individual participant, given that there were only 18 participants?