
===1970s, relational DBMS===
[[Edgar F. Codd]] worked at IBM in [[San Jose, California]], in one of its offshoot offices that was primarily involved in the development of [[hard disk]] systems. He was unhappy with the navigational model of the CODASYL approach, notably the lack of a "search" facility. In 1970, he wrote a number of papers that outlined a new approach to database construction, culminating in the groundbreaking ''A Relational Model of Data for Large Shared Data Banks''.{{sfn|Codd|1970}}

In this paper, he described a new system for storing and working with large databases. Instead of records being stored in some sort of [[linked list]] of free-form records as in CODASYL, Codd's idea was to organize the data as a number of "[[Table (database)|tables]]", each table being used for a different type of entity. Each table would contain a fixed number of columns holding the attributes of the entity. One or more columns of each table were designated as a [[primary key]] by which the rows of the table could be uniquely identified; cross-references between tables always used these primary keys, rather than disk addresses, and queries would join tables based on these key relationships, using a set of operations based on the mathematical system of [[relational calculus]]. (The model itself takes its name from the mathematical notion of a ''relation''.) Splitting the data into a set of normalized tables (or ''relations'') aimed to ensure that each "fact" was only stored once, thus simplifying update operations. Virtual tables called ''views'' could present the data in different ways for different users, but views could not be directly updated.

Codd used mathematical terms to define the model: relations, tuples, and domains rather than tables, rows, and columns. The terminology that is now familiar came from early implementations. Codd would later criticize the tendency for practical implementations to depart from the mathematical foundations on which the model was based.

[[File:Relational key SVG.svg|thumb|In the [[relational model]], records are "linked" using virtual keys not stored in the database but defined as needed between the data contained in the records.]]
 
The use of primary keys (user-oriented identifiers) to represent cross-table relationships, rather than disk addresses, had two primary motivations. From an engineering perspective, it enabled tables to be relocated and resized without expensive database reorganization. But Codd was more interested in the difference in semantics: the use of explicit identifiers made it easier to define update operations with clean mathematical definitions, and it also enabled query operations to be defined in terms of the established discipline of [[first-order predicate calculus]]. Because these operations have clean mathematical properties, queries can be rewritten in ways that are provably correct, which is the basis of query optimization. There is no loss of expressiveness compared with the hierarchic or network models, though the connections between tables are no longer so explicit.
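One standard example of such a provably correct rewrite, offered here as an illustration rather than as something taken from Codd's paper, is the relational-algebra rule that a selection can be pushed below a join. If the predicate <math>p</math> refers only to attributes of the relation <math>R</math>, then

<math display="block">\sigma_{p}(R \bowtie S) = \sigma_{p}(R) \bowtie S</math>

so an optimizer may filter <math>R</math> before joining it with <math>S</math>, which typically shrinks the intermediate result without changing the answer.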

In the hierarchic and network models, records were allowed to have a complex internal structure. For example, the salary history of an employee might be represented as a "repeating group" within the employee record. In the relational model, the process of normalization led to such internal structures being replaced by data held in multiple tables, connected only by logical keys.

For instance, a common use of a database system is to track information about users, such as their names, login information, addresses and phone numbers. In the navigational approach, all of this data would be placed in a single variable-length record. In the relational approach, the data would be ''normalized'' into (for instance) a user table, an address table and a phone number table. Records would be created in the address and phone tables only if the user actually provided an address or phone number.
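The user example can be sketched as a normalized schema. The following is a minimal illustration using Python's built-in <code>sqlite3</code> module (a modern SQL engine, not a 1970s system); the table names <code>users</code>, <code>addresses</code> and <code>phones</code>, the helper functions <code>make_db</code> and <code>contact_counts</code>, and the sample rows are all hypothetical. Each address or phone row refers to its owner by the <code>user_id</code> key rather than by a disk address, and a user who supplied no phone number simply has no rows in <code>phones</code>.

```python
import sqlite3

# Hypothetical normalized schema: one table per entity type. Each address and
# phone row identifies its owner by the user_id key, not by a disk address.
SCHEMA = """
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    login   TEXT NOT NULL UNIQUE
);
CREATE TABLE addresses (
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    address TEXT NOT NULL
);
CREATE TABLE phones (
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    number  TEXT NOT NULL
);
"""


def make_db() -> sqlite3.Connection:
    """Build an in-memory database with invented sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                     [(1, "Ada", "ada"), (2, "Grace", "grace")])
    # Ada supplied two addresses and one phone number...
    conn.executemany("INSERT INTO addresses VALUES (?, ?)",
                     [(1, "12 Main St"), (1, "9 Elm Rd")])
    conn.execute("INSERT INTO phones VALUES (?, ?)", (1, "555-0100"))
    # ...Grace supplied neither, so no rows are created for her in those tables.
    conn.commit()
    return conn


def contact_counts(conn: sqlite3.Connection) -> list[tuple[str, int, int]]:
    """Count each user's addresses and phones by matching on the user_id key."""
    return conn.execute("""
        SELECT u.name,
               (SELECT COUNT(*) FROM addresses a WHERE a.user_id = u.user_id),
               (SELECT COUNT(*) FROM phones p WHERE p.user_id = u.user_id)
        FROM users u
        ORDER BY u.user_id
    """).fetchall()
```

Calling <code>contact_counts(make_db())</code> should give one row per user, with counts of zero for the user who supplied no address or phone number; the foreign-key clauses reject any phone or address row whose <code>user_id</code> does not match an existing user.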

As well as identifying rows/records using logical identifiers rather than disk addresses, Codd changed the way in which applications assembled data from multiple records. Rather than requiring applications to gather data one record at a time by navigating the links, they would use a declarative query language that expressed what data was required, rather than the access path by which it should be found. Finding an efficient access path to the data became the responsibility of the database management system, rather than the application programmer. This process, called query optimization, depended on the fact that queries were expressed in terms of mathematical logic.
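The contrast between navigational and declarative access can be sketched in Python. In this hypothetical example (the <code>Employee</code> and <code>Department</code> classes, the table layouts and the sample data are invented for illustration, and the linked objects only loosely imitate a CODASYL set), the navigational function walks a chain of records one at a time along a path fixed by the programmer, while the declarative function hands a single SQL query to <code>sqlite3</code> and leaves the choice of access path to the database engine.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


# Navigational style (hypothetical, loosely CODASYL-like): records carry explicit
# links, and the program itself decides how to walk from record to record.
@dataclass
class Employee:
    name: str
    salary: int
    next: Optional["Employee"] = None  # link to the next record in the chain


@dataclass
class Department:
    name: str
    first: Optional[Employee] = None  # link to the first employee record


def high_earners_navigational(dept: Department, threshold: int) -> list[str]:
    """Visit one record at a time by following links; the access path is hard-coded."""
    result = []
    emp = dept.first
    while emp is not None:
        if emp.salary > threshold:
            result.append(emp.name)
        emp = emp.next
    return result


# Declarative style: the query states what is wanted; the engine picks the path.
def high_earners_declarative(conn: sqlite3.Connection, dept_name: str,
                             threshold: int) -> list[str]:
    rows = conn.execute(
        "SELECT e.name FROM employee e "
        "JOIN department d ON e.dept_id = d.dept_id "
        "WHERE d.name = ? AND e.salary > ? "
        "ORDER BY e.name",
        (dept_name, threshold),
    )
    return [name for (name,) in rows]


def sample_data() -> tuple[Department, sqlite3.Connection]:
    """The same invented data in both forms (chain kept in name order)."""
    grace = Employee("Grace", 130)
    alan = Employee("Alan", 90, grace)
    ada = Employee("Ada", 120, alan)
    dept = Department("Research", ada)

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE employee (name TEXT, salary INTEGER,
                               dept_id INTEGER REFERENCES department(dept_id));
        INSERT INTO department VALUES (1, 'Research');
        INSERT INTO employee VALUES ('Ada', 120, 1), ('Alan', 90, 1), ('Grace', 130, 1);
    """)
    return dept, conn
```

For the sample data both functions return the same names. SQLite's <code>EXPLAIN QUERY PLAN</code> statement can be prefixed to the declarative query to inspect which access path the engine actually chose, a decision the navigational version makes in application code instead.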

Codd's paper was picked up by two researchers at the University of California, Berkeley, Eugene Wong and [[Michael Stonebraker]]. They started a project known as [[INGRES]], drawing on funding that had already been allocated to a geographical database project and on student programmers to write the code. INGRES delivered its first test products in 1973, and these were generally ready for widespread use by 1979. INGRES was similar to [[IBM System R|System R]] in a number of ways, including the use of a "language" for [[data access]], known as [[QUEL query languages|QUEL]]. Over time, INGRES moved to the emerging SQL standard.

IBM itself did one test implementation of the relational model, [[PRTV]], and a production one, [[IBM Business System 12|Business System 12]], both now discontinued. [[Honeywell]] wrote [[Multics Relational Data Store|MRDS]] for [[Multics]], and two newer implementations exist: [[Dataphor|Alphora Dataphor]] and Rel. Most other DBMS implementations usually called ''relational'' are actually SQL DBMSs.

In 1970, the University of Michigan began development of the [[MICRO Relational Database Management System|MICRO Information Management System]]{{sfn|Hershey|Easthope|1972}} based on [[David L. Childs|D.L. Childs]]' Set-Theoretic Data model.{{sfn|North|2010}}{{sfn|Childs|1968a}}{{sfn|Childs|1968b}} MICRO was used to manage very large data sets by the [[US Department of Labor]], the [[U.S. Environmental Protection Agency]], and researchers from the [[University of Alberta]], the [[University of Michigan]], and [[Wayne State University]]. It ran on IBM mainframe computers using the [[Michigan Terminal System]].<ref name=MICROManual1977>{{cite book |author1=M.A. Kahn |author2=D.L. Rumelhart |author3=B.L. Bronson |date=October 1977 |url=https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B4t_NX-QeWDYZGMwOTRmOTItZTg2Zi00YmJkLTg4MTktN2E4MWU0YmZlMjE3 |title=MICRO Information Management System (Version 5.0) Reference Manual |publisher=Institute of Labor and Industrial Relations (ILIR), University of Michigan and Wayne State University}}</ref> The system remained in production until 1998.