Crawling Out of the Web
Similar resources
Crawling the Infinite Web
A large fraction of publicly available Web pages are generated dynamically upon request and contain links to other dynamically generated pages. Many Web sites built with dynamic pages can create arbitrarily many pages. This poses a problem for the crawlers of Web search engines, as the network and storage resources required for indexing Web pages are neither infinite nor free. In thi...
Crawling the Web
The large size and the dynamic nature of the Web highlight the need for continuous support and updating of Web based information retrieval systems. Crawlers facilitate the process by following the hyperlinks in Web pages to automatically download a partial snapshot of the Web. While some systems rely on crawlers that exhaustively crawl the Web, others incorporate “focus” within their crawlers t...
Crawling the Hidden Web
Current-day crawlers retrieve content from the publicly indexable Web, i.e., the set of web pages reachable purely by following hypertext links, ignoring search forms and pages that require authorization or prior registration. In particular, they ignore the tremendous amount of high quality content “hidden” behind search forms, in large searchable electronic databases. Our work provides a frame...
Crawling Out of the RNA World
Comparison of phylogenetically diverse ribonucleoprotein (RNP) enzymes and information about their biochemistry have stimulated hypotheses about their evolution. Instead of the canonical view, in which catalysis proceeds from ribozyme to RNP enzyme to protein enzyme, RNP enzymes and proteins are seen to share contemporary catalysis. Furthermore, the RNA components of RNP enzymes show no evidenc...
Crawling the Hidden Web (Extended
Current-day crawlers retrieve content from the publicly indexable Web, i.e., the set of web pages reachable purely by following hypertext links, ignoring search forms and pages that require authorization or prior registration. In particular, they ignore the tremendous amount of high quality content “hidden” behind search forms, in large searchable electronic databases. Our work provides a frame...
Journal
Journal title: The Serials Librarian
Year: 2007
ISSN: 0361-526X, 1541-1095
DOI: 10.1300/j123v52n03_02