Sunday, April 20, 2014

Week 12 - Search Systems and Search Engine Optimization

Information retrieval through searches and search engines is well-established, but it is also very challenging and expensive. When search becomes a necessity, some sites or intranets incorporate search systems from sites that allow you to search the entire web. There are three different ways of searching the web:
·        A search within your site or its sub-sites, e.g. a search within www.dice.com and its sub-sites
·        Search indexes of web pages, e.g. those of www.bing.com
·        Metasearch, which involves searching across multiple sites, e.g. www.clusty.com and www.dogpile.com

The website http://searchenginewatch.com/ is a great resource for the latest information on web searching. The IA has to decide whether their site needs to be searchable or not. They should be very careful not to make the typical assumption that a search engine alone will satisfy all users’ information needs. There are browsers who forgo the search utility and prefer to peruse the site to get a feel for things. Before the IA decides to add search functionality to their site, they should carefully answer the following questions:
·        Is there sufficient content in your site?
·        Does the company have sufficient resources to invest in this effort? Is the investment going to divert resources from more useful navigation systems?
·        Are the time and the technical know-how available to invest in optimizing your search system?
·        Are there better alternatives to search?
·        Will your site’s users actually bother to use its search system?

Planning the capacity of your site or intranet can sometimes be very tricky, and it can determine whether or not to include a search system. When sites become very popular, they grow organically, and more and more functional features get piled on haphazardly, leading to a navigation nightmare. Certain issues can actually help the IA decide whether or not their site has reached the point of needing a search system:
·        Your site has too much information to browse
·        If the site has become fragmented, it can definitely use some help from a search system
·        Search can actually become a learning tool to help improve the site through the analysis of the search logs
·        Nowadays, search actually needs to be there because it has become a user expectation; most users typically expect to find a search window on every single web site they visit
·        If your site has highly dynamic content, you should definitely include a search system in it.
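
As a rough illustration of the search-log point above, the sketch below counts the most frequent queries and lists the queries that found nothing. The log format (a list of query and result-count pairs) and the function name summarize_search_log are assumptions made for this example, not the log format of any particular search product.

```python
from collections import Counter


def summarize_search_log(entries, top_n=3):
    """Return the most common queries and the queries that found nothing.

    `entries` is a hypothetical log: a list of (query, result_count) pairs.
    """
    # Normalize case and surrounding whitespace so variants count together.
    normalized = [(query.strip().lower(), count) for query, count in entries]
    top_queries = Counter(query for query, _ in normalized).most_common(top_n)
    zero_hits = sorted({query for query, count in normalized if count == 0})
    return top_queries, zero_hits


log = [
    ("Vacation policy", 12),
    ("vacation policy", 12),
    ("expense form", 0),
    ("holiday calendar", 3),
    ("expense form", 0),
    ("vacation policy ", 12),
]

top, missing = summarize_search_log(log)
print(top)      # most frequent queries with their counts
print(missing)  # queries that returned no results
```

Frequent queries hint at content that deserves a more prominent place in the navigation, while queries with no results point to missing content or to vocabulary the site does not use.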

The IA should base the decision to include search on the end users of the site, so they need to know their site’s users. How well the IA knows those users greatly influences whether search belongs on the intranet or website. This decision should be made with the users in mind, rather than the available technology. The search system interfaces directly with the site’s users, hence the user should be King in influencing this decision.

The search system usually works as a three-part configuration. At the center is the search engine, which holds the indexes built from the indexed documents and processes the queries that searchers submit through the search interface. Index entries that match a query are returned to the searcher as results. Documents, which usually include web pages and web sites, serve as the input into the search system.

Indexing can be manual or automatic. Traditional manual systems for compiling indexes of documents use cards, such as library catalogue cards, although nowadays a good computerized Personal Reference System is preferable. For each document acquired, the bibliographic identification elements are written or typed on a card. For a journal article, the structure is: author's surname and forenames; article title; periodical title; volume number; part number; date of publication; pages. Keywords or descriptors of the contents should be written up. Alternatively, a short abstract or summary can be included (you can often make use of abstracts written by the author). The use of a standardized reference format style is recommended. In automatic indexing, spiders and robots crawl websites and index pages according to their own rules. As a result, they build large databases containing the indexes.
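
To make the relationship between documents, index and queries more concrete, here is a minimal sketch of an automatic index, assuming a tiny in-memory collection. The file names, the sample text and the function names (tokenize, build_index, search) are invented for illustration; real search engines add ranking, stop words and much more.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


# Hypothetical documents standing in for crawled pages.
documents = {
    "about.html": "About our company and its history",
    "jobs.html": "Open jobs at our company",
    "contact.html": "Contact the support team",
}

index = build_index(documents)
print(search(index, "our company"))  # pages containing both words
print(search(index, "support"))
```

The index is built once from the documents, and each query is then answered from the index alone, which is why the search engine sits at the center of the configuration.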

Determining what to search can also be tricky. During search system design, the IA must decide whether to search the entire site or just specific pages or documents, whether to create search zones, and whether to index the entire site or just specific pages, documents or zones within it. Sometimes it becomes necessary to define search zones so that a search covers only part of the site or intranet. It might also be necessary to create a mini search site within the website itself. This search site can be based on either a sub-site or a document type. Some sites might need to incorporate web search as well. This involves searching through multimedia and heterogeneous sites with diverse content. Search can also cover the full text of the information being requested or just the metadata about it. The IA also has to decide what to index for each document: either all content words or just the important words, such as those found in the metadata fields. Indexing can also be organized by audience, topic, recency of content, reading level, date of update, user task, etc.
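
The difference between searching a zone versus the whole site, and metadata versus full text, can be sketched as follows. Everything here is hypothetical: the Page fields, the zone labels, the URLs and the search_pages function are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    zone: str        # hypothetical zone label, e.g. "hr" or "products"
    keywords: tuple  # metadata terms assigned to the page
    body: str        # full text of the page


def search_pages(pages, term, zone=None, metadata_only=False):
    """Return URLs of pages matching `term`, optionally limited to one zone."""
    term = term.lower()
    hits = []
    for page in pages:
        if zone is not None and page.zone != zone:
            continue  # skip pages outside the requested search zone
        in_metadata = term in (keyword.lower() for keyword in page.keywords)
        in_body = term in page.body.lower()
        if in_metadata or (in_body and not metadata_only):
            hits.append(page.url)
    return hits


pages = [
    Page("/hr/leave", "hr", ("vacation", "leave"), "How to request vacation days."),
    Page("/hr/payroll", "hr", ("salary",), "Payroll runs before each vacation period."),
    Page("/products/tours", "products", ("travel",), "Book a vacation tour."),
]

print(search_pages(pages, "vacation"))                                 # whole site, full text
print(search_pages(pages, "vacation", zone="hr"))                      # one zone only
print(search_pages(pages, "vacation", zone="hr", metadata_only=True))  # zone plus metadata only
```

Each restriction narrows the result list, which is the trade-off the IA is weighing: fewer, more focused results against the risk of hiding something the user wanted.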

Search algorithms find items with specified properties among a collection of items. The items may be stored individually as records in a database; or may be elements of a search space defined by a mathematical formula or procedure, such as the roots of an equation with integer variables; or a combination of the two, such as the Hamiltonian circuits of a graph. There are about 40 different retrieval algorithms which retrieve information in different ways. Most of these algorithms employ pattern matching, and their results are commonly evaluated in terms of recall and precision.
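
Recall and precision are easier to grasp with numbers. Precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. The sketch below computes both for a made-up query; the document ids and the function name precision_recall are assumptions for illustration.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_hits = retrieved & relevant
    precision = len(true_hits) / len(retrieved) if retrieved else 0.0
    recall = len(true_hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: the engine returned four documents, three are truly relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the four retrieved documents are relevant, and two of the three relevant documents were found. Tuning a search system usually means trading one measure against the other.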

Query builders are tools that soup up a query’s performance and so affect the outcome of a search. They are usually invisible to users; common examples include:
·        Spell checks
·        Phonetic tools (the best-known of which is “Soundex”)
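·        Stemming tools that allow users to enter a term and retrieve documents containing variants that share the same stem (e.g. “lodge” also retrieving “lodging”)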
·        Natural language processing tools
·        Controlled vocabularies and thesauri
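
As an example of a phonetic tool, here is a minimal sketch of the classic American Soundex coding, which maps names that sound alike to the same four-character code. It is written from the commonly published rules and is not taken from any specific search product; the function and variable names are my own.

```python
# Consonant groups that share a Soundex digit.
SOUNDEX_GROUPS = (
    ("bfpv", "1"),
    ("cgjkqsxz", "2"),
    ("dt", "3"),
    ("l", "4"),
    ("mn", "5"),
    ("r", "6"),
)
CODES = {letter: digit for letters, digit in SOUNDEX_GROUPS for letter in letters}


def soundex(name):
    """Return the four-character Soundex code for a name ("" if no letters)."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    previous = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so a repeated code counts again
    return (result + "000")[:4]  # pad with zeros, keep four characters


print(soundex("Smith"), soundex("Smyth"))    # same code for both spellings
print(soundex("Robert"), soundex("Rupert"))  # same code for both names
```

A search engine can index this code alongside each name, so that a query for one spelling also finds the other.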

The IA will also need to decide beforehand how the search engine’s results are to be presented. Here, there are two main issues to consider:
·        Which content components to display for each retrieved document, and how many documents to display: show less information per item to users who know what they’re looking for, and more to users who aren’t sure what they want.
·        How to list or group the search results: by category, alphabetically, chronologically, ranked by relevance, ranked by popularity, by users’ or experts’ ratings, or by pay-for-placement (different sites bid for the right to be ranked high, or higher, on users’ result lists).
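
The listing and grouping options above can be sketched in a few lines. The Result fields, the sample data, the relevance scores and the function names are all invented for this example; a real engine would supply its own scores.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Result:
    title: str
    updated: date
    relevance: float  # hypothetical score supplied by the search engine
    category: str


def order_results(results, by="relevance"):
    """Return the results sorted by one of a few common orderings."""
    if by == "relevance":
        return sorted(results, key=lambda r: r.relevance, reverse=True)
    if by == "alphabetical":
        return sorted(results, key=lambda r: r.title.lower())
    if by == "chronological":
        return sorted(results, key=lambda r: r.updated, reverse=True)  # newest first
    raise ValueError(f"unknown ordering: {by}")


def group_by_category(results):
    """Group result titles under their category, keeping the input order."""
    groups = {}
    for result in results:
        groups.setdefault(result.category, []).append(result.title)
    return groups


results = [
    Result("Travel policy", date(2014, 1, 15), 0.62, "HR"),
    Result("Expense form", date(2014, 3, 2), 0.91, "Finance"),
    Result("Benefits overview", date(2013, 11, 30), 0.47, "HR"),
]

print([r.title for r in order_results(results, "relevance")])
print([r.title for r in order_results(results, "alphabetical")])
print([r.title for r in order_results(results, "chronological")])
print(group_by_category(results))
```

The same result set reads very differently under each ordering, which is why this choice should follow from what the site’s users are trying to do.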

Designing the search interface means putting together what to search, what to retrieve, and how to present the results in a single interface. With such a varied user community and range of search-technology functions, there are also many different types of search interfaces. Designing the search interface will involve considering the following variables:
·        Level of searching expertise and motivation
·        Type of information need
·        Type of information being searched
·        Amount of information being searched
