Introducing the Walhello Search Engine

Walhello is a spider-based search engine for the whole web. Its index is one of the largest in the world and forms the basis for the Walhello search service. In addition to the basic search functionality, which matches keywords with the most relevant web pages, Walhello provides the following integrated functionality:

- News search in news resources
- Picture search
- A categorised web directory
- Product search of on-line shops
- An integrated reference with knowledge, articles and discussion boards related to keywords
- An answering engine that answers specific questions

The size of the index is continuously growing, and the quality of the services is improved by research on mathematical ranking algorithms and knowledge technology.

The beginning

The World Wide Web contains billions of documents with publicly accessible, useful information and knowledge. The problem with the World Wide Web, however, is that data on web pages is not well structured, which makes it difficult to find relevant information and to use the information on the Internet effectively. Walhello.com started developing the Walhello (Valhalla + Web + Hello) website in March 2000 as a research and development project. The objective of this project was to structure the Internet and to provide services to Internet users by granting access to this "structured Internet".

Downloading data (Appie spider)

As a first step, the Appie spider was developed. It automatically downloads data from the World Wide Web by extracting links from web pages and subsequently downloading the pages corresponding to these extracted URLs. Currently, millions of web pages are downloaded and indexed on a daily basis, including PDF and Word documents.
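The link-following loop described for the Appie spider can be sketched as follows. This is only an illustration of the general technique (extract links, then download the pages they point to), not Walhello's actual code, which is written in C/C++. The names `LinkExtractor`, `extract_links`, `crawl`, the `fetch` callback and the `example.com` pages are all assumptions made for this sketch.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags found in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(base_url, html):
    """Return absolute URLs (fragments removed) linked from the page."""
    parser = LinkExtractor()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href)).url for href in parser.hrefs]


def crawl(seed_url, fetch, max_pages=100):
    """Breadth-first crawl; `fetch(url)` returns HTML text or None."""
    queue = deque([seed_url])
    seen = {seed_url}          # URLs already queued, to avoid re-downloading
    pages = {}                 # url -> downloaded HTML
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


# Hypothetical in-memory "web" so the sketch runs without network access.
fake_web = {
    "http://example.com/": '<a href="/a">A</a> <a href="b.html#top">B</a>',
    "http://example.com/a": '<a href="/">home</a>',
    "http://example.com/b.html": "no links here",
}
downloaded = crawl("http://example.com/", fake_web.get)
print(sorted(downloaded))
```

Passing the download step in as `fetch` keeps the traversal logic separate from network access. A real spider would additionally need politeness delays, robots.txt handling and document-type detection, none of which are shown here.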
Building World Wide Web Index (Classical Search Engine)

To structure the downloaded data, software was developed that parses the downloaded pages and extracts the following information:

- Words and the locations of words on web pages
- Languages (about 40 languages are supported)
- Links between web pages

This information is stored in a huge database. In June 2000 this classical search engine was introduced on the Internet to help users find web sites matching a search query. Mathematically advanced ranking algorithms ensure that the most relevant results are shown first. At present the index contains about 2 billion web pages and is continually growing.

The database runs on a very efficient architecture consisting of a cluster of cheap Linux servers. This architecture enables short response times through parallelized processing, and it provides scalability and fault tolerance. At present all application software is developed by Walhello in C/C++. Many years of research in compiler optimisation have resulted in very efficient, high-performing and reliable software on cheap hardware.

Ranking based on clustering and distance (Advanced Ranking Algorithms)

Based on research, we concluded that information about a certain topic is clustered on adjacent web pages. Research also showed that these clusters are unique for each search query. Walhello developed technology that dynamically identifies clusters, and subsequently the size and relevance of each cluster, for every search query performed. The ranking of a web site is based (in addition to characteristics of the page itself) on the distance of the page to clusters and the relevance of these clusters. Walhello is researching computational challenges to improve the ranking of web pages based on this clustering technology.
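The first extraction step listed under the classical search engine, words and their locations on web pages, is commonly stored as a positional inverted index. The sketch below shows that general data structure under simple assumptions; it is not Walhello's database design. The function names, the sample pages and the hit-count ordering are hypothetical, and the ordering is only a placeholder that does not attempt the cluster-and-distance ranking described above.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Yield (word, character_offset) pairs for each word in the text."""
    for match in re.finditer(r"\w+", text.lower()):
        yield match.group(), match.start()


def build_index(pages):
    """Map each word to {url: [character offsets where the word occurs]}."""
    index = defaultdict(lambda: defaultdict(list))
    for url, text in pages.items():
        for word, offset in tokenize(text):
            index[word][url].append(offset)
    return index


def search(index, query):
    """Return URLs containing every query word, most total hits first."""
    words = [word for word, _ in tokenize(query)]
    if not words:
        return []
    postings = [index.get(word, {}) for word in words]
    matching = set(postings[0]).intersection(*postings[1:])

    def hits(url):
        return sum(len(posting[url]) for posting in postings)

    # Placeholder ordering: total occurrences, then URL for stable output.
    return sorted(matching, key=lambda url: (-hits(url), url))


# Hypothetical sample data.
pages = {
    "http://example.com/1": "Search engines index the web",
    "http://example.com/2": "The web is large and the web keeps growing",
}
index = build_index(pages)
print(search(index, "web"))
print(search(index, "index web"))
```

Storing offsets rather than just document identifiers is what makes location-aware features possible later, at the cost of a larger index.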
Integrating additional Search Services

To extend the search services, Walhello has integrated the DMOZ Open Directory and products sold by several leading on-line shops, including Amazon.com and Allposters.com, within the Walhello search engine. There are plans to integrate other external information sources as well.

Knowledge Engineering

Current search engines rely mainly on mathematical algorithms to determine relevancy. Humans, however, use knowledge to determine the relevancy of web pages. Walhello therefore started building an object-oriented knowledge base containing knowledge that can be used to better understand the semantic meaning of the content of web pages. Combined with natural language syntactic and semantic parsing technology, this knowledge base can be used to retrieve new knowledge and to detect potential data inconsistencies. As a new service, some knowledge objects of the Walhello knowledge base are integrated within the Walhello search engine and made available on the Internet.

Advanced Search Option

Walhello has added proximity search as an advanced search option. Proximity search allows users to define the maximum distance (number of characters) between the search terms and is a mixture of standard keyword search and string search.

Maintaining Reference Information for search queries

To improve the search experience, Walhello has started to build and maintain reference information related to a search query, consisting of:

- References to news articles that match the search query
- References to products that match the search query
- Knowledge, articles and feedback on search queries obtained from users

More information

If you want to know more about the Walhello services, you can send an e-mail to walhello@walhello.com.
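The proximity search described under Advanced Search Option, with a maximum distance in characters between the search terms, can be illustrated with a small sketch. How Walhello measures the distance is not stated, so the definition used here (characters between the end of the earlier term and the start of the later one) is an assumption, as are the function names and the sample sentence.

```python
import re


def word_offsets(text, word):
    """Character offsets of whole-word, case-insensitive matches of `word`."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return [m.start() for m in re.finditer(pattern, text, re.IGNORECASE)]


def within_distance(text, first, second, max_chars):
    """True if an occurrence of `first` and an occurrence of `second` are
    separated by at most `max_chars` characters, in either order.

    Assumed definition: the gap runs from the end of the earlier term to
    the start of the later term.
    """
    first_offsets = word_offsets(text, first)
    second_offsets = word_offsets(text, second)
    for a in first_offsets:
        for b in second_offsets:
            if a <= b:
                gap = b - (a + len(first))
            else:
                gap = a - (b + len(second))
            # A negative gap means the matches overlap; skip those.
            if 0 <= gap <= max_chars:
                return True
    return False


# Hypothetical sample text.
text = "The spider downloads pages and the index stores words."
print(within_distance(text, "spider", "pages", 11))
print(within_distance(text, "spider", "pages", 10))
```

With a large `max_chars` this behaves like ordinary keyword matching of both terms, and with a very small value it approaches matching a fixed string, which is the mixture the text describes.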
Copyright (c) 2000-2005 Walhello.com. All rights reserved.