Archiving website‐based references in academic papers: Problems caused by reference rot, potential solutions and limitations

Authors

Abstract

The reference list of an academic paper, depending on the topic and breadth discussion, can involve citation websites, that is, web-based references. Academics might also use a wide range tools to assist them with analyses or interpretations, so increased reliance internet as work tool source information becomes inextricably linked journal content (Davis, 2016). As example, it was previously suggested be used platform for creation management online real-time lab notebooks (Mascarelli, 2014), although this proposal does not appear have become mainstream. Software essential part methodology, especially in biomedical, computer science artificial assistance (AI)-based research, such language editing like ChatGPT (van Dis et al., 2023), statistical software (Coenen 2021), may cited within text (i.e., informally), where link another publication website company's found, more formally list. Howison Bullard (2016) were unable access described 21% 90 papers across three strata ranked journals biology, they did describe percentage could due errors links web addresses websites. In their papers, scholars cite blogs, scholarly wikis, presentations add videos debates—that grey literature—all which would specific address (Jones Grey literature encompasses is published peer reviewed, indexed databases (Farrah & Mierzwinski-Urban, 2019; Schöpfel Prost, 2021). citations are integral data integrators facts statements sources (Hyland Jiang, 2019). Over time, unlike references collective stable ‘marker’ persistent identifier (PID) digital object (DOI), allowing identified traced even if journal's uniform resource locator (URL)1 changes, URLs dysfunctional broken), URL disappear altogether shuts down ceases exist, phenomenon known rot (Burnhill 2015). Kunze al. (2017) establishment lexicon PIDs allow user, academic, evaluate level durability identifier, including provider, before deciding whether it. Entities issue enhance reliability PID by following several principles (McMurry 2017). 
much same way DOIs represent localizers (Neumann Brase, assigned digitally preserved. cases, onus lies owners managers assign DOI URL, thus cover costs2 associated assignment. We note here there registration agencies world3 mainly specialize scientific content, but exclusively limited However, two issues proposal: (1) registered owner, realistic register each every because too expensive, many financially unsustainable plan; (2) owner must update record, notify agency change points correct webpage, do this, then will simply work, link. Moreover, closed any reason, either lead content-less site. Where locators metadata, defies first principle findable, accessible, interoperable reusable (FAIR) principles, having ‘eternally identifier’ (Wilkinson 2016), making unreliable impossible (Philipson, Elliott (2020) found average 43% four biodiversity aggregator provider networks (BHL, DataONE, GBIF, iDigBio) at some point unreliable. Broken antithetic transparent discourse (Gertler Bullock, focus references, narrow subset broader theme archive knowledge preservation (Costa ‘404 found’ page cannot error message observed, technical terms, HyperText Transfer Protocol (HTTP) response. Other messages (with respective codes) indicate restricted error: 400, bad request; 401, unauthorized; 403, forbidden; 404, found; 405, method allowed; 410, gone; 416, requested satisfiable; 500, internal server error; 502, gateway (Zittrain 2014). Such types take place when changes migrates; drift, older evidently being greater risk (Klein websites delisted, deindexed discontinued (Laakso 2021); A soft 404 error, incorrect HTTP status code 200 (OK) 3xx (redirect), masks code, requires other approaches overcome (Ayala 2022). Even academics carefully select time paper's (Nicholas 2015), post-publication real, has practical consequences integrity. suffers from login payment result outcome, no lack access. 
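As an illustration of the HTTP response codes listed above, the following minimal Python sketch sorts a status code into a coarse category. The function names `classify_status` and `describe` and the category labels are hypothetical, introduced only for this example; they are not taken from the article. A 200 or 3xx response is flagged separately because, as noted above, it can mask a soft 404.

```python
from http import HTTPStatus

# Status codes the text lists as signs of a restricted or broken reference.
ROT_CODES = {400, 401, 403, 404, 405, 410, 416, 500, 502}


def classify_status(code: int) -> str:
    """Return a coarse, illustrative label for an HTTP status code."""
    if code in ROT_CODES:
        return "error"
    if 300 <= code < 400:
        # A redirect can hide a soft 404, so it still needs a manual look.
        return "redirect"
    if code == 200:
        # 200 (OK) does not prove the cited content is still there.
        return "ok-or-soft-404"
    return "other"


def describe(code: int) -> str:
    """Combine the standard reason phrase with the illustrative label."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Not a status code known to the standard library.
        phrase = "Unknown"
    return f"{code} {phrase}: {classify_status(code)}"


if __name__ == "__main__":
    for status in sorted(ROT_CODES | {200, 301}):
        print(describe(status))
```

A check of this kind only reads the status code; detecting a soft 404 would still require comparing the returned page with the content that was originally cited.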
attempting login-restricted website, subscription-based journal, landing displays prompt pay-to-access prompt, very different rot, appears, found’. This paper discuss because, though access-related issue, error-related issue. all cases temporary irretrievable loss leads discontinuity compromises reference-related integrity (Teixeira da Silva, It necessary libraries, institutes preserve (Kirchhoff, 2008; Lynch, Some measures counter preserved Internet Archive's Wayback Machine (https://archive.org/index.php), hereafter referred Archive simplicity, Webrecorder (https://webrecorder.net/; now Conifer: https://conifer.rhizome.org/), Portico (www.portico.org/), LOCKSS (www.lockss.org/), CLOCKSS (https://clockss.org/) Perma (https://perma.cc/), free case, open second paid latter cases. There likely services. Perma.cc archiving service developed Harvard Library Innovation Lab initial funding Institute Museum Services (Dulin Ziegler, Riss (2015) noted how public documents underlying legal case studies court prevented deliberating Pearce Charlton (2009) identify plagiarized sources. Nwala (2021) proposed scientists historians qualitatively assess important social about recent events, store web-archived collections. implications eliminates ability check validity articles' content. One solution authors. With background mind, objectives. First, provides examples attempted quantify characterize phenomenon. Second, we provide short ‘manual’ editors manually Archive. Third, suggestions improving landscape while taking into account human technological limitations. section chronological appreciation recorded quantified literature. Lawrence (2001), analysing 67,577 100,826 ResearchIndex CiteSeer) between 1993 1999, initially 18%–53% (depending year) invalid. after careful searches correction syntax addresses, only amounted 3%. 
Wren (2004) observed absolute absolutely inaccessible) 19% 1630 Medline records spanning 1966 2003, value basically remained unchanged follow-up study 6154 unique Medline-indexed 2004, 2005 2007 (Wren, 2008). An analysis (Journal Computer-Mediated Communication, First Monday, Journal Interactive Media Education) revealed approximately half broken, values varied widely sampling years (Ho, 2005). top-ranked dermatology over 1999–2004 period 14.8%, 18.6% 22.1% Archives Dermatology, American Academy Investigative respectively (Wren 2006). Carnevale Aronsky (2007) documented 21.9% 1999 1049 among five (Artificial Intelligence Medicine, International Medical Informatics, Informatics Association: JAMIA, Methods Information Biomedical Informatics), increasing 43.2% 2004. Those authors provided useful 2001 journals. Thorp Brown volume lists Annals Emergency Medicine4 1% 5% 2000 2005, showing 78%, 56% 45% incidence 2000, 2003 respectively. That suggested, likelihood succumb rot. Ducut (2008) 16% 10,208 MEDLINE-indexed 1994 2006. Falagas 2003–2006 window, number accounted 2.5% 3.9% New England Medicine Lancet, respectively, 14.6% 17.9% those few later. Given queried should papers. argued avoidance impossible, given importance. Wagner 49.3% 2011 2002 2004 (Health Affairs, Health Care Management Review, Research, Healthcare Management, Research Review). body 1.06 million URL-containing appearing 3.5 Klein (2014) 34%, 66% 80% 1997 arXiv (preprint), Elsevier science, technology medicine (STM) articles PubMed Commons-archived preprints/articles, 13%, 22% 14% 2012. words, higher When similar group revisited 30% had memento (an archived webpage; Jones 70% suffered just history, mementos, once processing excluded, almost 83% 956,351 shown suffer Zittrain 31.2%, 26.1% Human Rights Journal, Law Technology 24% US Supreme Court opinions (SCOTUS Status Codes). 
Burnhill 52% 46,000 6400 e-theses 2010 universities archived, 18% irretrievably lost, pertinently noting ‘reference already begun article been published’ (p. 58). Gertler Bullock 2009 Political Science Review broken. Massicotte Botter 23% PhD dissertations during 2011–2015 institutional repository exhibited (link) rapid ‘degeneration’ biomedical (2007), who 2 days 4699 becoming public, 11.9% 840 Sife Lwoga (2017), assessing 574 822 2001–2015 East African medical journals, 44.1% 6.3% retrieved O'Connor (2018) 34.0% 2013 2017 Irish dysfunctional. Vinay Kumar Sampath (2019) ~23% 3912 586 Hi-Tech 2006 2017. Society Laparoscopic Robotic Surgeons, Ott (2022) 2019 2020, 14.7% unavailable bad, broken non-existent links. Yayla 16%–59% Turkish Librarianship 2020 (n = 2619), displaying 20.8% Niveditha calculated, 102,718 2008 2017, 23.1% 10 library (LIS) 15.7% communication media sample reveals ample information-degrading publishing. Our rationale suggesting practically threefold. fairly simple process URL—a detail next—but time-consuming, URL-rich (for now), financial burdens authors, publishers. its long since 1996 (Goldstone, seems lasting tool. service, comes caveats. 501 (c) (3) organization, charitable funds dry up, existence continuity may—ironically, topic—be disappearing. real. For cognizant Eysenbach Trudel (2005) (2006) recommended archival WebCite® (https://webcitation.org/) ceased operations is—also ironically—no longer itself Archive.5 Thus, limitation needs considered publishers decide formalize integration Archive, publishing stream, mandatory requirement instructions (IFAs), debated last paper. filled automatically Machine, crawls Internet, webpages (Internet Help Center, 2023a). reason automatic scanning, crawlers certain protected scanned addition, crawl continuously, intervals. frequency contain version webpage user needs, themselves. unless regular automated archive, gaps, regularly updated. 
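Because the Wayback Machine crawls at intervals and may not hold the version of a webpage that an author needs, it can help to check whether a memento exists before citing a URL. The sketch below assumes the Internet Archive's public availability endpoint (`https://archive.org/wayback/available`) and its JSON layout (`archived_snapshots` then `closest`); both are assumptions drawn from general knowledge of that service, not from the article, and should be verified against current Internet Archive documentation. The function names and the sample response are hypothetical and purely illustrative.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed public endpoint of the Wayback Machine availability service.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(url: str, timestamp: Optional[str] = None) -> str:
    """Build a query asking whether an archived copy of `url` exists.

    `timestamp` (YYYYMMDD or longer) optionally asks for the copy
    closest to that date.
    """
    params = {"url": url}
    if timestamp is not None:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(response: dict) -> Optional[str]:
    """Return the archived URL from a decoded JSON response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


if __name__ == "__main__":
    print(build_availability_url("https://example.org/page", "20230101"))

    # Made-up response in the assumed layout, for illustration only.
    sample = {
        "archived_snapshots": {
            "closest": {
                "available": True,
                "url": "http://web.archive.org/web/20230101000000/https://example.org/page",
                "timestamp": "20230101000000",
                "status": "200",
            }
        }
    }
    print(closest_snapshot(sample))
    print(closest_snapshot({"archived_snapshots": {}}))
```

The sketch deliberately separates building the query from interpreting the response, so the network request itself (and any rate limits or failures of the archive) stays outside the example.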
Additionally, options non-public, password-protected pages dynamic forms, JavaScript Despite these limitations, appears popular website. caveat emptor using Archive,6 namely guaranteed. Shankar 25% rate acquisition extraction mementos. screenshot referenced. original well long, latter, problems setting, example columns lists, leading copyeditors truncate otherwise modify introducing tiny (Cabanac, 2015; e.g., https://tinyurl.com/app https://shortdoi.org/) reduce caused excessively URLs. one shortDOI® Anyaoku (2019), our list, https://doi.org/ghx875. private companies, dissolve without obligations, encourage libraries state-funded institutions invest allows develop own software. Similar services currently created allow, rule, shortened domains institutions, usually promote institutions' networks, media, (e.g., goUNG Shortener,7 UBC Shortener Service8). often authorized members create discussing. browser extensions add-ons available simplify procedure 2023b). convenient large Ultimately, primary responsibility website's owners, curators Unreliable biases poor management, traditional centralization approach, decentralization fix (Robinson 2018). providers (unfortunately) interested long-term (Rockembach, Consequently, third-party users others, direct information. Rather, secondary tier, users, whose untrustworthy, containing misinformation. longevity security author editor, nor aspects controlled users. supporting readers information, tool, classically support claim ways, form ‘user responsibility’ ensure valid, least use, archiving, suggest future, Journals' IFAs make clear ultimately reviewers verify functionality review, informing replace them, proof development occurred intervening period. advocating model shared responsibility. 
back, overview Hiberlink project, aimed pre-submission, submission stages, high sampled STM project led conglomeration organizations, University Edinburgh, Los Alamos National Laboratory Library, Memento, EDINA, Language Group, funded Andrew W. Mellon Foundation,9 functional, gleaned project,10 Archive.11 (2015), (Zotero, EndNote Mendeley) combat integrated Open System (OJS). independently plugins software, able EndNote, Mendeley OJS, discover functional Zotero,12 test plugin not. Except web-crawlers it, feasible humanly possible basis daily weekly; Davis, tend express personal interest wish—for whatever reason—to porous record others not, mementos damaged, (Brunelle Automation verification process13 resolve both accuracy bibliometric metadata citations, inaccuracies reducing potentially achieve step manuscript via system (Salem which, prior press ‘Submit journal’ button, file run through alone enough need transmitted exercise checks passed ensuring button unlocked. From understanding perception, gap (in terms solutions understanding) still exists creators, preservation, limitations include (Cramer 2023): storage (el-Showk, 2018); span preserving organizations bodies; running environmental costs (Pendergrass 2019); highly specialized IT skills; appropriate certification; sufficient advocacy insufficient engagement. Laakso though, encompassed 26,000 titles, 64 (0.02%) updated slightly higher, 0.04% 33,803 138 titles). Cost biggest journals' publishers' investment participation (Wittenberg Digital sufficiently prioritized (Dressler, 2017), ineffective implementation objective library-managed scholarship (Zhou These set-backs difficult implement industry-wide solution, players small entities, for-profit fleet, society-based publish explain why still, decades calls action (Dellavalle 2003), yet proposed. 
face serious debate swivels junction (and societies) and/or initiative specifically, focus, ‘commons’ knowledge, permanence require voluntary actions appreciate importance good community (Heywood pro-active problem. caveats transfer responsibilities editors. overburdened adding additional quality what seemingly never-ending set requests much. argument put forward editors, overwhelmed volumes submissions under-staffed editorial boards, ratio submitted handling high, requesting authenticity excessive work-related request, voluntary. recognize sustainability-related involves volunteer work. different, necessarily revolve around sustainability, optimization skills effectiveness overall towards Silva Dobránszki, 2018) portfolio screen approve publication, accepted stage, (or copyeditors) could/should correcting—where possible—such incidences new links, corrigenda corrective mechanisms. Here, too, multinational publisher completely resources than society-run former latter. possibility task reviewers, voluntary, equivalent trying over-squeeze lemon Katavić, find balance over-exploiting existing existed decades, alleviate respon

Journal

Journal title: Learned Publishing

Year: 2023

ISSN: 0953-1513, 1741-4857

DOI: https://doi.org/10.1002/leap.1560