Submitted to ACM Computing Reviews October 2006
A Review of the Paper:
Smyth, B. and Balfe, E. 2006. Anonymous personalization in collaborative web search. Inf. Retr. 9, 2 (Mar. 2006), 165-190. DOI= http://dx.doi.org/10.1007/s10791-006-7148-z
I remember that around 1998 meta-search engines such as MetaCrawler were very popular among students and the research community, until Google arrived and revolutionized searching on the WWW. Meta-search engines have since lost most of their popularity to Google, Yahoo, AlltheWeb and Windows Live. From a technological perspective, not much has happened in the search engine arena since Google’s PageRank algorithm. Some interesting alternatives have appeared, such as Lexxe, a search engine based upon advanced natural language processing; domain-specific search engines such as the Kosmix Health search engine; Clusty, a clustering search engine; and BrainBoost, a question-answering search engine with some degree of intelligence.
This paper describes in detail an innovative proof-of-concept meta-search engine called “I-Spy”. We may call it somewhat revolutionary in the sense that it is based upon hit matrices: query (qi) x document (dj) matrices in which each cell (Hi,j) stores the number of hits, reminiscent of the term x document matrices typical of information retrieval algorithms. Hit matrices represent the search histories of previous community users. I-Spy uses case-based reasoning (CBR) technology to rank the results displayed for a given query. Here the basic philosophy of CBR is to reuse successful previous searches to answer future queries that are sufficiently similar. For each submitted query, I-Spy retrieves a preset maximum number of results per search engine and merges them into a single list ordered by normalized overall score (Rm).
Next, I-Spy re-ranks this ordered list based upon the selection history of previous searches (Rm is converted into RT). Results that are relevant to the current query are promoted (re-ranked with a higher score) in the list. Relevant previous results that appear in the hit matrix (Hi,j) but not in RT are also added to RT and promoted.
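To make the promotion step concrete, here is a minimal Python sketch of a hit matrix and a re-ranking function. It is an illustration under stated assumptions, not the authors' implementation: the names HitMatrix, record_selection, relevance and rerank are hypothetical, relevance is assumed to be a document's share of the selections recorded for a query, and only exact query matches are reused, whereas the paper also reuses similar queries.

```python
from collections import defaultdict


class HitMatrix:
    """Toy query x document hit matrix, stored sparsely (hypothetical names)."""

    def __init__(self):
        # hits[query][doc] = number of times doc was selected for query
        self.hits = defaultdict(lambda: defaultdict(int))

    def record_selection(self, query, doc):
        """Increment the cell H[query][doc] when a user selects a result."""
        self.hits[query][doc] += 1

    def relevance(self, query, doc):
        """Assumed relevance: share of this query's selections that went to doc."""
        row = self.hits.get(query, {})
        total = sum(row.values())
        return row.get(doc, 0) / total if total else 0.0


def rerank(query, merged_results, matrix):
    """Promote documents with selection history; keep the rest in their original order.

    Documents found in the hit matrix but missing from merged_results are
    still included and promoted, as the review describes.
    """
    row = matrix.hits.get(query, {})
    promoted = sorted(row, key=lambda d: matrix.relevance(query, d), reverse=True)
    rest = [d for d in merged_results if d not in row]
    return promoted + rest
```

With this sketch, a document that the community selected often for a query moves to the top of the list even if the underlying search engines did not return it this time.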
I-Spy maintains a separate hit matrix for each community, which limits their growth compared to term x document matrices. Intuitively, we can foresee matrices with a potentially large number of rows (queries) but a limited number of columns (documents), given that users normally select only the most relevant results (documents) for a query (e.g. users select documents among the first 25 results).
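The growth argument can be illustrated with a small sketch of per-community storage. This is only an assumption about how such matrices might be kept (a nested sparse dictionary); community_hits, record_selection and matrix_shape are hypothetical names, not taken from the paper.

```python
from collections import defaultdict

# community -> query -> document -> hit count (sparse: only selected cells exist)
community_hits = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))


def record_selection(community, query, doc):
    """Record one selection in the hit matrix of the given community only."""
    community_hits[community][query][doc] += 1


def matrix_shape(community):
    """Return (number of distinct queries, number of distinct documents)."""
    rows = community_hits.get(community, {})
    docs = {d for row in rows.values() for d in row}
    return len(rows), len(docs)
```

Because a cell is created only when a user actually selects a document, each community's matrix grows with its own selection behaviour rather than with the size of the whole document collection.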
We might criticize the implementation for not taking advantage of parallel processing when users submit queries. While waiting for results from the search engines, the system could determine a priori not only the relevant hit-matrix entries but also the top-k most relevant community hit matrices. This would enable naïve users to submit generic queries to all communities and let them choose which community is appropriate for their interests (i.e. community discovery).
We conclude this review by stating that European universities, compared to American universities, are not really competent at transferring technology to the industry sector, and lose clear market opportunities as a result. The European Community should promote not only Networks of Excellence (NoEs), Specific Targeted Research Projects (STREPs) and Integrated Projects (IPs) but also smaller, innovative entrepreneurial incubator projects. We believe there is a huge pool of proof-of-concept projects that could be transferred from academia to the entrepreneurial sector, potentially generating alternative funds for universities in the form of royalties, licenses and company shares.
Companies such as Eurekster are trying to fill the niche market of community-based search engines. Their beta offering, Swicki, learns the search behaviour of communities by letting community users promote or exclude web sites and pages and by focusing search on domain-specific sub-webs.
NOTE: Formula (5) on page 174 contains a small erratum: Relevance(pj,pi) should be Relevance(pj,qT).
TAGS: [case-based reasoning, hit matrices, i-spy, meta search engine, ranking algorithm, search engine ranking, WWW]
Posted in WWW by VirgoBrain

















