==Web archiving==
{{Further|Web archiving}}

===Wayback Machine===
{{main|Wayback Machine}}

[[File:Wayback Machine logo 2010.svg|right|thumb|Wayback Machine logo, used since 2001]]The [[Wayback Machine]] is a service that allows archives of the World Wide Web to be searched and accessed.<ref>{{cite news| url = http://www.businessweek.com/technology/content/feb2002/tc20020228_1080.htm| url-status = dead| archive-url = https://web.archive.org/web/20020601134105/http://www.businessweek.com/technology/content/feb2002/tc20020228_1080.htm| archive-date = June 1, 2002| title = A Library as Big as the World| last = Green | first = Heather | date = February 28, 2002| publisher = [[Business Week]] Online}}</ref> It can be used to view previous versions of web sites or to visit web sites that no longer exist. The Wayback Machine was created as a joint effort between [[Alexa Internet]] (owned by [[Amazon.com]]) and the Internet Archive.<ref name=":1"/> Hundreds of billions of web pages and their associated data (images, source code, documents, etc.) are saved in a database. {{As of|2024|9|5|df=US}}, the Internet Archive held over 866&nbsp;billion web pages in its Wayback Machine, along with more than 42.5&nbsp;million print materials, 13&nbsp;million videos, 3&nbsp;million TV news reports, 1.2&nbsp;million software programs, 14&nbsp;million audio files, 5&nbsp;million images, and 272,660 concerts.<ref name="Archive2024" />[[File:Internet Archive servers 5034 - Jason Scott.jpg|thumb|[[Server (computing)|Servers]] at the Internet Archive headquarters in San Francisco]]
[[File:Incoming additional storage at Internet Archive.jpg|thumb|right|A [[Disk enclosure#Hard drive shucking|purchase of additional storage]] at the Internet Archive]]

===Archive-It{{anchor|Archive-It}}===
Created in early 2006, Archive-It<ref>{{cite web|url=http://www.archive-it.org/ |title=archive-it.org |publisher=archive-it.org |access-date=April 13, 2013 |url-status=live |archive-url=https://web.archive.org/web/20130414092416/http://archive-it.org/ |archive-date=April 14, 2013 }}</ref> is a web archiving subscription service that allows institutions and individuals to build and preserve collections of digital content and create digital archives. Archive-It lets users customize which web content is captured or excluded when preserving material for cultural heritage purposes. Through a web application, Archive-It partners can harvest, catalog, manage, browse, search, and view their archived collections.<ref>{{Cite book |url = http://nrs.harvard.edu/urn-3:HUL.InstRepos:25658314 |last = Truman |first = Gail |date = January 2016 |title = Web Archiving Environmental Scan |series = Harvard Library Report |access-date = October 3, 2017 |archive-date = December 8, 2019 |archive-url = https://web.archive.org/web/20191208122749/http://nrs.harvard.edu/urn-3:HUL.InstRepos:25658314 |url-status = live }}</ref>

Archived web sites become full-text searchable within seven days of capture.<ref>{{Cite web |last=Bragg |first=Molly |date=July 28, 2014 |title=What is the Difference between the General Archive (sometimes called the Wayback Machine) and Archive-It? |url=https://webarchive.jira.com/wiki/display/ARIH/Archive-It+How-to+FAQ#Archive-ItHow-toFAQ-differencebetween |archive-url=https://web.archive.org/web/20161004224713/https://webarchive.jira.com/wiki/display/ARIH/Archive-It+How-to+FAQ#Archive-ItHow-toFAQ-differencebetween |archive-date=October 4, 2016 |publisher=Archive-It |via=Jira.com}}</ref> Content collected through Archive-It is captured and stored as a [[WARC (file format)|WARC file]]. A primary copy and a backup copy are stored at the Internet Archive's data centers. Subscribing partner institutions can receive a copy of the WARC file for geo-redundant preservation and storage according to their own best-practice standards.<ref>{{cite web|url=http://www.archive-it.org/learn-more |title=About Archive-It |publisher=Archive-It. |access-date=March 3, 2014 |url-status=live |archive-url=https://web.archive.org/web/20140221160344/https://archive-it.org/learn-more |archive-date=February 21, 2014 }}</ref> Periodically, the data captured through Archive-It is indexed into the Internet Archive's general archive.

{{As of|2014|03}}, Archive-It had more than 275 partner institutions in 46 U.S. states and 16 countries that had captured more than 7.4&nbsp;billion URLs for more than 2,444 public collections.{{citation needed|date=October 2024}} Archive-It partners are university and college libraries, state archives, federal institutions, museums, law libraries, and cultural organizations, including the [[Electronic Literature Organization]], North Carolina State Archives and Library, [[Stanford University]], [[Columbia University]], [[The American University in Cairo|American University in Cairo]], Georgetown Law Library, and many others.{{citation needed|date=October 2024}}

===Internet Archive Scholar===
{{Main|Internet Archive Scholar}}
In September 2020, the Internet Archive announced a new initiative to archive and preserve [[open access]] academic journals, called [[Internet Archive Scholar]].<ref>{{Cite web|title=The Internet Archive Will Digitize & Preserve Millions of Academic Articles with Its New Database, 'Internet Archive Scholar'|url=http://www.openculture.com/2020/09/internet-archive-scholar.html |date=September 22, 2020 |access-date=2020-09-23 |website=Open Culture|language=en-US|archive-date=September 22, 2020|archive-url=https://web.archive.org/web/20200922161701/http://www.openculture.com/2020/09/internet-archive-scholar.html |url-status=live}}</ref><ref>{{Cite web|last=Newbold|first=Bryan|date=2021-03-09|title=Search Scholarly Materials Preserved in the Internet Archive|url=https://blog.archive.org/2021/03/09/search-scholarly-materials-preserved-in-the-internet-archive/}}</ref><ref>{{Cite web |title=Internet Archive Scholar [homepage&#93; |url=https://scholar.archive.org/ |publisher=Internet Archive |access-date=24 March 2022}}</ref> Its full-text search index includes over 25&nbsp;million research articles and other scholarly documents preserved in the Internet Archive. The collection ranges from digitized copies of eighteenth-century journals to the latest open access conference proceedings and preprints crawled from the World Wide Web.{{citation needed|date=October 2024}}

===General Index===
In 2021, the Internet Archive announced the initial version of the [[General Index (academia)|General Index]], a publicly available [[Index (publishing)|index]] to a collection of 107&nbsp;million academic [[Article (publishing)|journal article]]s.<ref name=":02">{{Cite journal|last=Else|first=Holly|date=2021-10-26|title=Giant, free index to world's research papers released online|url=https://www.nature.com/articles/d41586-021-02895-8|journal=Nature|language=en|doi=10.1038/d41586-021-02895-8|pmid=34703019|s2cid=240000069|access-date=November 12, 2021|archive-date=November 13, 2021|archive-url=https://web.archive.org/web/20211113162341/https://www.nature.com/articles/d41586-021-02895-8|url-status=live}}</ref><ref>{{Cite web|title="The General Index": New tool allows you to search 107 million research papers for free|url=https://bigthink.com/the-present/general-index-open-access/|access-date=2021-11-12|website=Big Think|date=November 5, 2021 |language=en-US|archive-date=November 12, 2021|archive-url=https://web.archive.org/web/20211112214225/https://bigthink.com/the-present/general-index-open-access/|url-status=live}}</ref>