Search results for: url
Number of results: 13770
This paper illustrates the utility of URL information in unsupervised learning. We outline up front the motivation for using URL information, and present two techniques for unsupervised learning from URL corpora. First, we devise a similarity measure for URL pairs, setting out the intuitions behind it, and verify its quality by using it for clustering. Further, we outline a metho...
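The snippet does not say how the URL-pair similarity measure is defined. Purely as an illustrative stand-in (not the paper's measure), the sketch below tokenizes each URL on common delimiters and takes the Jaccard similarity of the two token sets. The tokenizer and both function names are hypothetical.

```python
import re


def url_tokens(url: str) -> set[str]:
    """Hypothetical tokenizer: lowercase, drop the scheme, split on URL delimiters."""
    url = re.sub(r"^[a-z][a-z0-9+.-]*://", "", url.lower())
    return {t for t in re.split(r"[/.?=&_\-:#]+", url) if t}


def url_similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two URLs' token sets, in [0.0, 1.0]."""
    ta, tb = url_tokens(a), url_tokens(b)
    if not ta and not tb:
        return 1.0  # two empty URLs are treated as identical
    return len(ta & tb) / len(ta | tb)
```

For `http://www.example.com/sports/football` and `http://www.example.com/sports/tennis`, 4 of the 6 distinct tokens are shared, giving a similarity of about 0.67. A pairwise score like this could then feed any standard clustering method.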
This report presents a study of URL and content persistence across 51 million pages from a national web, harvested 8 times over almost 3 years. This study differs from previous ones in that it describes the evolution of a large set of web pages over several years, studying in depth the characteristics of persistent data. We found that the persistence of URLs and contents follows a logarithmic dist...
In this paper we study in what order a crawler should visit the URLs it has seen, in order to obtain more "important" pages first. Obtaining important pages rapidly can be very useful when a crawler cannot visit the entire Web in a reasonable amount of time. We define several importance metrics, ordering schemes, and performance evaluation measures for this problem. We also experimentally evalu...
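The snippet mentions importance metrics and ordering schemes without naming them. As a minimal sketch, assuming backlink count as the importance metric (an assumption here), a crawler can always visit next the frontier URL that has the most backlinks among the pages crawled so far. The in-memory `graph` dictionary is a hypothetical stand-in for fetching pages and extracting their links.

```python
from collections import defaultdict


def crawl_order(graph: dict[str, list[str]], seeds: list[str], limit: int) -> list[str]:
    """Return up to `limit` URLs in visit order, greedily picking the frontier
    URL with the most backlinks observed so far (assumed importance metric).

    `graph` maps a URL to its outlinks; it stands in for real page fetches.
    """
    backlinks: defaultdict[str, int] = defaultdict(int)
    frontier = list(dict.fromkeys(seeds))  # deduplicate seeds, keep their order
    seen = set(frontier)
    visited: list[str] = []
    while frontier and len(visited) < limit:
        # max() returns the first maximal element, so ties favor earlier discovery.
        best = max(frontier, key=lambda url: backlinks[url])
        frontier.remove(best)
        visited.append(best)
        # "Fetch" the page and credit each outlink target with one backlink.
        for target in graph.get(best, []):
            backlinks[target] += 1
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return visited
```

With `graph = {"s": ["x", "y", "z"], "x": ["z"], "y": ["z"], "z": []}` and seed `"s"`, the order is `s, x, z, y`: once `x` is crawled, `z` has two backlinks and overtakes `y`, whereas plain breadth-first order would be `s, x, y, z`.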
The polysyllabic shortening hypothesis holds that the duration of a primary stressed syllable is inversely proportional to the number of additional syllables within the word. We examine the evidence for this process in British English speech by measuring the duration of primary stressed syllables in monosyllabic, disyllabic and trisyllabic words, both right-headed series – e.g. mend, commend, r...
This study examined how the duration of an unstressed final syllable in English is affected by conditions in the following word: stress (trochaic/iambic), accent (accented/unaccented), and initial stop voicing (voiced/voiceless). Results showed that the unstressed final syllable was shorter before an unstressed syllable, presumably due to polysyllabic shortening, i.e., the following unstressed sylla...
Typosquatting refers to the practice of registering domain names that are typo variations of the domain names of popular websites. We propose a new approach, called Strider Typo-Patrol, to discover large-scale, systematic typosquatters. We show that a large number of typosquatting domains are active and a large percentage of them are parked with a handful of major domain parking services, which serve syndicated a...
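To make "typo variations" concrete, the sketch below generates a few common kinds of single-edit typos of a domain's first label: character omission, character duplication, and adjacent-character transposition. It is a generic illustration, not the typo model used by Strider Typo-Patrol; the function name and the choice of typo types are assumptions.

```python
def typo_variants(domain: str) -> set[str]:
    """Generate simple single-edit typo variants of a domain such as "example.com".

    Only the label before the first dot is mutated; the remainder is reattached
    unchanged. Typo types (an assumed, non-exhaustive set): omission,
    duplication, and adjacent transposition.
    """
    name, sep, suffix = domain.partition(".")
    labels: set[str] = set()
    for i in range(len(name)):
        # Character omission: "example" -> "exmple"
        labels.add(name[:i] + name[i + 1:])
        # Character duplication: "example" -> "exaample"
        labels.add(name[:i] + name[i] + name[i:])
        # Adjacent transposition: "example" -> "xeample"
        if i + 1 < len(name):
            labels.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    # Reattach the suffix, skipping empty labels.
    variants = {label + sep + suffix for label in labels if label}
    # Transposing two identical letters reproduces the original domain; drop it.
    variants.discard(domain)
    return variants
```

For example, `typo_variants("abc.com")` yields 8 candidates, including `bc.com` (omission), `aabc.com` (duplication), and `bac.com` (transposition). Candidates like these could then be checked for registration and parking status.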