Encyclopedia:Search engine test
==Interpreting results==
===General===
{{shortcut|WP:HITS}}
A raw hit count should never be relied upon to prove notability. Attention should instead be paid to what is found (the books, news articles, scholarly articles, and web pages), and to whether those results actually {{em|do}} demonstrate notability or non-notability, case by case. Hit counts have always been, and very likely always will remain, an extremely unreliable tool for measuring notability, and should not be considered either definitive or conclusive. A manageable sample of the results found should be opened individually and read, to verify their relevance.

In the case of Google (and other search engines such as Bing and Yahoo!), the hit count at the top of the page is unreliable and should usually not be reported. The hit count reported on the penultimate (second-to-last) page of results may be slightly more accurate. For searches with few reported hits (fewer than 1000), the actual count of hits needed to reach the bottom of the last page of results may be more accurate, but even this is not a sure thing.
Google returns different search results depending on factors such as your previous search history and which Google server you happen to hit.<ref>{{cite web |title=Reliability Verification of Search Engines' Hit Counts |url=http://gplsi.dlsi.ua.es/congresos/qwe10/fitxers/QWE10_Funahashi.pdf |work=Proceedings of the 10th international conference on Current trends in web engineering |publisher=Computer Science and Engineering Division, Waseda University |date=2010 |last1=Funahashi |first1=Takuya |last2=Yamana |first2=Hayato |access-date=5 May 2015}}</ref><ref>{{cite web |title=Why Google Can't Count Results Properly |url=http://searchengineland.com/why-google-cant-count-results-properly-53559 |website=SearchEngineLand.com |date=21 October 2010 |last1=Sullivan |first1=Danny |access-date=5 May 2015}}</ref>

Other useful considerations in interpreting results are:
* Article scope: If narrow, fewer references are required. Try to categorize the point of view, whether it is NPOV or otherwise; e.g., notice the difference between [[Ontology]] and [[Ontology (computer science)]].
* Article subject: If it is about some historical person, one or two mentions in reliable texts might be enough; if it is some Internet [[neologism]] or a [[Pop music|pop song]], it may appear on 700 pages and still not be considered 'existing' enough to show any notability, for Wikipedia's purposes.

===Biases to be aware of===
In most cases, search results should be reviewed with awareness and careful skepticism before relying upon them. Common biases include:

====General biases====
'''General (the Internet or people as a whole):'''
* ''Personal bias'' – Tendency to be more receptive to beliefs that one is familiar with, agrees with, or that are common in one's daily culture, and to discount beliefs and views that contradict one's preferred views.
* ''Cultural and computer-usage bias'' – Results are biased towards information from Internet-using developed countries and affluent parts of society (those with Internet access). Countries where computer use is less common will often have lower rates of reference to equally notable material, which may therefore appear (mistakenly) non-notable.
* ''Undue weight'' – Results may disproportionately represent some matters, especially those related to [[popular culture]] (some matters may be given far more space, and others far less, than fairly represents their standing): ''popularity is not notability''.
* ''Sources not readily accessible'' – Some sources are accessible to all, but many are available only for payment, or are not reported online. This may, for example, affect the search results you get for a historical topic that achieved its peak media prominence 50 or 100 years ago; valid sources may very well exist, but would be found on microfilms or subscription news archiving sites like [[ProQuest]] or [[Newspapers.com]] rather than in a general Google search.

'''General web search engines (Google, Bing web search etc.):'''
* ''Deep web'' – Search engines exclude a vast number of pages, and this may include systematic bias so that some matters are excluded disproportionately (for example, because they commonly appear on sites that do not allow Google indexing, or because the content cannot be indexed for technical reasons, as with [[Adobe Flash|Flash]]- or image-based websites).
* ''Search engines as promotion tool'' – An [[Search engine optimization|industry exists]] seeking to influence site position, popularity, and ratings in such searches, or to sell advertising space related to searches and search positions. Some subjects, such as [[pornographic actors]], are so dominated by these that searches cannot be reliably used to establish popularity.
* ''Review process'' – This varies; some sites accept any information, while others have some form of review or checking system in place.
* ''Self-mirroring'' – Sometimes other sites clone Wikipedia content, which is then passed around the Internet, with more pages built upon it (often without citation). In reality, the source of many of the search engine's findings is then just copies of Wikipedia's own previous text, not genuine sources.
* ''Popular usage bias'' – Popular usage and [[urban legend]]s are often reported in preference to correct information.
**Examples:
**#A search for the incorrect [[Charles Windsor]] gives 10 times more results than the correct [[Charles Mountbatten-Windsor]].
**#A search for the most common spelling of [[El Niño]] will often report it spelt "El Nino", without the [[diacritic]].
**#Urban legends are often reported widely; for example, [http://www.google.com/search?hl=en&lr=&rls=GGLG%2CGGLG%3A2005-43%2CGGLG%3Aen&q=%22uss++constitution%22+1779+%22set+sail+from+boston%22 hundreds of sites] report that the ''[[USS Constitution]]'' set sail in 1779, although the correct date is 1797.
* ''Popular views and perceptions'' are likely to be reported more often. For example, there may be many pages that refer to [[acupuncture]], and many that confirm that people are often [[allergy|allergic]] to animal [[fur]], but it may only be with careful research that it is revealed that there are medical peer-reviewed assessments of the former, and that people are usually not allergic to fur, but to the sticky skin and saliva particles ([[dander]]) {{em|within}} the fur.
* ''Language selection bias'' – For example, an Arabic speaker searching for information on [[homosexuality]] in Arabic will likely find pages which reflect a different bias than an English speaker searching in English on the same subject, since popular and media views and beliefs about homosexuality can differ widely between English-speaking countries (US, UK, Australia, etc.), which tend to include a higher proportion of homosexuality-accepting groups, and Arabic-speaking countries (Middle East), which tend to include a lower proportion.
'''Other:'''
* Note that other Google searches, particularly [http://books.google.com Google Book Search], have a different systemic bias from Google Web searches and give an interesting cross-check and a somewhat independent view.

===Foreign languages, non-Latin scripts, and old names===
Often for items of non-English origin, or in non-Latin scripts, a considerably larger number of hits results from searching in the correct script or for various transcriptions; be sure to check "''Languages for Displaying (Search) Results''" in "''Search Settings''".<ref name=search /> An [[Arabic language|Arabic]] name, for instance, needs to be searched for in the original script, which is easily done with Google (provided one knows what to search for), but problems may arise if, for example, English, French and German webpages transcribe the name using different conventions. Even for English-only webpages there may be many variants of the same Arabic or [[Russian language|Russian]] name.

Personal names in other languages (Russian, [[Anglo-Saxon]]) may have to be searched for both including and excluding the [[patronymic]]. Searches for names and other words in strongly [[inflection|inflected]] languages should take into account that arriving at the total number of hits may require searching for forms with varying [[Declension|case]]-endings or other grammatical variations that are not obvious to someone who does not know the language. Names from many cultures are traditionally given together with titles that are considered part of the name, but may also be omitted (as in [[Mustafa Kemal Pasha|''Gazi'' Mustafa Kemal ''Pasha'']]). Even in [[Old English]], the spelling and rendering of older names may allow dozens of variations for the same person. A simplistic search for one particular variant may underrepresent the web presence by an order of magnitude.
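As a rough illustration of how quickly such variants multiply, the following Python sketch builds one quoted search phrase per combination of transliteration, optional patronymic, and optional title. The function names, the example spellings, and the use of an <code>OR</code> operator to join phrases are assumptions made for illustration only; they are not part of this guideline, and real transliteration variants must come from someone who knows the language.

```python
from itertools import product


def name_variants(given, family, patronymic=None, titles=()):
    """Return quoted search phrases for every combination of the inputs.

    given, family: a string or a list of alternative spellings.
    patronymic:    optional; searched both with and without it.
    titles:        optional prefixes; searched both with and without them.
    All names here are hypothetical helpers, not a search-engine API.
    """
    givens = [given] if isinstance(given, str) else list(given)
    families = [family] if isinstance(family, str) else list(family)
    middles = [None] if patronymic is None else [None, patronymic]
    prefixes = [None] + list(titles)

    variants = []
    for prefix, g, m, f in product(prefixes, givens, middles, families):
        parts = [p for p in (prefix, g, m, f) if p]
        phrase = '"' + " ".join(parts) + '"'
        if phrase not in variants:  # keep first occurrence, preserve order
            variants.append(phrase)
    return variants


def or_query(variants):
    """Join phrases into one query string (assumes the engine accepts OR)."""
    return " OR ".join(variants)


if __name__ == "__main__":
    # Illustrative spellings only; two given names x two family names
    # x with/without patronymic.
    phrases = name_variants(
        ["Pyotr", "Peter"], ["Tchaikovsky", "Chaikovsky"], patronymic="Ilyich"
    )
    for phrase in phrases:
        print(phrase)
    print(or_query(phrases))
```

Even this small example produces several distinct phrases, and each would still need to be checked by reading the results rather than by comparing hit counts.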
A search like this requires a certain linguistic competence which not every individual Wikipedian possesses, but the Wikipedia community as a whole includes many bilingual and multilingual people. It is important for nominators and voters on AfD at least to {{em|be aware of their own limitations}} and not make untoward assumptions when language or transcription bias may be a factor.

===Google distinct page count issues===
Note also that the number of search string matches reported by search engines is only an estimate. For example, Google will only calculate the actual number of matches once the user navigates through all result pages, to the last one, and even then it places restrictions on the figure. At times, the "match" count estimate can differ significantly (by one or more [[order of magnitude|orders of magnitude]]) from the total count of results shown on the last results page.

A site-specific search may help determine if most of the matches are coming from the same web site; a single web site can account for hundreds of thousands of hits.

For search terms that return many results, Google uses a process that eliminates results which are "very similar" to other results listed, both by disregarding pages with substantially similar content and by limiting the number of pages that can be returned from any given domain. For example, a search on "Taco Bell" will give only a couple of pages from tacobell.com even though many in that domain will certainly match. Further, Google's list of distinct results is constructed by first selecting the top 1000 results and then eliminating duplicates without replacements. Hence the list of distinct results will always contain at most 1000 results, and usually fewer, regardless of how many webpages actually matched the search terms.
For example, {{as of|2010|12|14|lc=y}}, out of about 742 million pages related to "Microsoft", Google was returning 572 "distinct" results.<ref>[http://www.google.com/search?q=Microsoft&num=100&hl=en&lr=&safe=off&start=600&sa=N Google search for "Microsoft"]</ref> Caution must be used in judging the relative importance of websites yielding well over 1000 search results.
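The "select a fixed shortlist, then filter without refilling" process described in this section can be sketched in Python as follows. This is only a simulation under stated assumptions: the function name, the per-domain cap of 2, and the use of an exact content fingerprint as the similarity test are all hypothetical, since the actual cutoffs and similarity measures used by Google or any other search engine are not public.

```python
from urllib.parse import urlparse


def distinct_results(ranked, top_n=1000, per_domain=2):
    """Simulate 'take the top N, then drop near-duplicates without refilling'.

    ranked: list of (url, content_fingerprint) pairs, best match first.
    top_n, per_domain: illustrative values, not documented engine settings.
    """
    shortlist = ranked[:top_n]          # step 1: fix the candidate pool
    seen_content = set()
    domain_counts = {}
    kept = []
    for url, fingerprint in shortlist:  # step 2: filter; dropped items are
        domain = urlparse(url).netloc   # never replaced from beyond top_n
        if fingerprint in seen_content:
            continue                    # substantially similar content
        if domain_counts.get(domain, 0) >= per_domain:
            continue                    # too many pages from one domain
        seen_content.add(fingerprint)
        domain_counts[domain] = domain_counts.get(domain, 0) + 1
        kept.append(url)
    return kept


if __name__ == "__main__":
    # Synthetic data: 5000 matching pages spread over 300 made-up domains.
    ranked = [
        (f"https://site{i % 300}.example/page{i}", f"fp{i % 2500}")
        for i in range(5000)
    ]
    print(len(ranked), "pages matched;",
          len(distinct_results(ranked)), "distinct results shown")
```

Because filtering happens only after the shortlist is fixed, the number of results shown cannot exceed the size of the shortlist however many pages matched, which is why the count on the last results page says little about the true number of matching pages.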