Crawling Out of the Web


Related articles

Crawling the Infinite Web

A large number of publicly available Web pages are generated dynamically upon request and contain links to other dynamically generated pages. Many Web sites that are built with dynamic pages can create arbitrarily many pages. This poses a problem for the crawlers of Web search engines, as the network and storage resources required for indexing Web pages are neither infinite nor free. In thi...

Crawling the Web

The large size and the dynamic nature of the Web highlight the need for continuous support and updating of Web-based information retrieval systems. Crawlers facilitate the process by following the hyperlinks in Web pages to automatically download a partial snapshot of the Web. While some systems rely on crawlers that exhaustively crawl the Web, others incorporate “focus” within their crawlers t...

Crawling the Hidden Web

Current-day crawlers retrieve content from the publicly indexable Web, i.e., the set of web pages reachable purely by following hypertext links, ignoring search forms and pages that require authorization or prior registration. In particular, they ignore the tremendous amount of high quality content “hidden” behind search forms, in large searchable electronic databases. Our work provides a frame...

Crawling Out of the RNA World

Comparison of phylogenetically diverse ribonucleoprotein (RNP) enzymes and information about their biochemistry have stimulated hypotheses about their evolution. Instead of the canonical view, in which catalysis proceeds from ribozyme to RNP enzyme to protein enzyme, RNP enzymes and proteins are seen to share contemporary catalysis. Furthermore, the RNA components of RNP enzymes show no evidenc...

Crawling the Hidden Web ( Extended

Current-day crawlers retrieve content from the publicly indexable Web, i.e., the set of web pages reachable purely by following hypertext links, ignoring search forms and pages that require authorization or prior registration. In particular, they ignore the tremendous amount of high quality content “hidden” behind search forms, in large searchable electronic databases. Our work provides a frame...

Journal

Journal title: The Serials Librarian

Year: 2007

ISSN: 0361-526X,1541-1095

DOI: 10.1300/j123v52n03_02