Business intelligence
==Unstructured data==
Business operations can generate a very large amount of [[data]] in the form of e-mails, memos, notes from call-centers, news, user groups, chats, reports, web-pages, presentations, image-files, video-files, and marketing material. According to [[Merrill Lynch]], more than 85% of all business information exists in these forms; a company might only use such a document a single time.<ref name="rao">{{cite journal|last1=Rao|first1=R.|year=2003|title=From unstructured data to actionable intelligence|url=http://www.ramanarao.com/papers/rao-itpro-2003-11.pdf|journal=IT Professional|volume=5|issue=6|pages=29–35|doi=10.1109/MITP.2003.1254966}}</ref> Because of the way it is produced and stored, this information is either [[Unstructured data|unstructured]] or [[semi-structured data|semi-structured]].

The management of semi-structured data is an unsolved problem in the information technology industry.<ref name="blumberg">{{cite journal|author1=Blumberg, R.|author2=S. Atre|name-list-style=amp|year=2003|title=The Problem with Unstructured Data|url=http://soquelgroup.com/Articles/dmreview_0203_problem.pdf|url-status=dead|journal=DM Review|pages=42–46|archive-url=https://web.archive.org/web/20110125033210/http://soquelgroup.com/Articles/dmreview_0203_problem.pdf|archive-date=25 January 2011}}</ref> According to projections from Gartner (2003), white-collar workers spend 30–40% of their time searching, finding, and assessing unstructured data. BI uses both structured and unstructured data.
The former is easy to search, and the latter contains a large quantity of the information needed for analysis and decision-making.<ref name = blumberg /><ref name="negash">{{cite journal|author=Negash, S|year=2004|title=Business Intelligence|journal=Communications of the Association for Information Systems|volume=13|pages=177–195|doi=10.17705/1CAIS.01315|doi-access=free}}</ref> Because of the difficulty of properly searching, finding, and assessing unstructured or semi-structured data, organizations may not draw upon these vast reservoirs of information, which could influence a particular decision, task, or project. This can ultimately lead to poorly informed decision-making.<ref name = rao />

Therefore, when designing a business intelligence/data warehouse (DW) solution, the specific problems associated with semi-structured and unstructured data must be accommodated alongside those of structured data.

===Limitations of semi-structured and unstructured data===
{{update|part=section|reason=It's dubious that searchability and semantic analysis are still limitations at the current stage of NLP and AI development|date=December 2023}}
There are several challenges to developing BI with semi-structured data. According to Inmon & Nesavich,<ref name = inmon>Inmon, B. & A. Nesavich, "Unstructured Textual Data in the Organization" from "Managing Unstructured data in the organization", Prentice Hall 2008, pp. 1–13</ref> some of those are:
* Physically accessing unstructured textual data – unstructured data is stored in a huge variety of formats.
* [[Terminology]] – Among researchers and analysts, there is a need to develop standardized terminology.
* Volume of data – As stated earlier, up to 85% of all data exists as semi-structured data. This volume is compounded by the need for word-to-word and semantic analysis.
* Searchability of unstructured textual data – A simple search on some data, e.g. apple, results in links where there is a reference to that precise search term. Inmon & Nesavich<ref name = inmon /> give an example: "a search is made on the term felony. In a simple search, the term felony is used, and everywhere there is a reference to felony, a hit to an unstructured document is made. But a simple search is crude. It does not find references to crime, arson, murder, embezzlement, vehicular homicide, and such, even though these crimes are types of felonies".

===Metadata===
To solve problems with searchability and assessment of data, it is necessary to know something about the content. This can be done by adding context through the use of [[metadata]].<ref name = rao />{{Needs independent confirmation|reason=The article is written by a founder of a company that made automatic categorization software. Not sufficient to establish that using automatically generated metadata is a mainstream approach of applying BI to unstructured data.|date=December 2023}} Many systems already capture some metadata (e.g. filename, author, size), but metadata about the actual content, such as summaries, topics, people, or companies mentioned, would be more useful. Two technologies designed for generating metadata about content are [[Multiclass classification|automatic categorization]] and [[information extraction]].
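
The difference between a simple search and a search over content metadata can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the sample documents, the <code>TAXONOMY</code> mapping, and all function names are hypothetical, and the rule-based <code>categorize</code> function is a toy stand-in for automatic categorization rather than a description of any real product or of the method in the cited sources. Multi-word terms such as "vehicular homicide" are left out because the sketch matches single word tokens only.

```python
import re

# Hypothetical taxonomy mapping a category to single-word terms that imply it.
# Illustrative only; not taken from the cited sources.
TAXONOMY = {
    "felony": {"felony", "arson", "murder", "embezzlement"},
}

# Made-up sample documents standing in for unstructured text.
DOCUMENTS = {
    "memo-1": "The suspect was charged with a felony.",
    "memo-2": "Investigators suspect arson at the warehouse.",
    "memo-3": "The audit uncovered embezzlement in accounting.",
    "memo-4": "Quarterly sales figures exceeded expectations.",
}


def tokenize(text):
    """Lower-case the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def simple_search(term, documents):
    """Return sorted ids of documents that contain the exact search term."""
    return sorted(
        doc_id for doc_id, text in documents.items()
        if term.lower() in tokenize(text)
    )


def categorize(text, taxonomy):
    """Return the categories whose terms occur in the text (toy categorizer)."""
    tokens = tokenize(text)
    return {category for category, terms in taxonomy.items() if tokens & terms}


def build_metadata(documents, taxonomy):
    """Attach content metadata (a set of categories) to every document."""
    return {
        doc_id: categorize(text, taxonomy)
        for doc_id, text in documents.items()
    }


def metadata_search(category, metadata):
    """Return sorted ids of documents tagged with the given category."""
    return sorted(
        doc_id for doc_id, categories in metadata.items()
        if category in categories
    )


if __name__ == "__main__":
    metadata = build_metadata(DOCUMENTS, TAXONOMY)
    print(simple_search("felony", DOCUMENTS))   # exact-term matches only
    print(metadata_search("felony", metadata))  # matches via category metadata
```

In this sketch the simple search for "felony" matches only the memo that uses the word itself, whereas the metadata search also returns the memos mentioning arson and embezzlement, because the categorizer tagged them with the "felony" category beforehand. The quality of such a search depends entirely on the taxonomy, which here is hand-written for the example.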