This is the answer for both the main question and for the question of …

What would be the point of having multiple auto-generating columns? The code below has a composite primary key:

`id` mediumint(9) NOT NULL AUTO_INCREMENT,

InnoDB will generate an error: "ERROR 1075 (42000): Incorrect table definition; there can be only one auto column and it must be defined as a key".

"Primary Key" is a very unfortunate term, because of the connotation of "Primary" and the consequent subconscious association with the Logical Model. Instead, I refer to the Surrogate Key of the Physical Model and the Natural Key(s) of the Logical Model.

It is important that the Logical Model for every Entity have at least one set of "business attributes" which comprise a Key for the entity. Boyce, Codd, Date et al. refer to these in the Relational Model as Candidate Keys. When we then build tables for these Entities, their Candidate Keys become Natural Keys in those tables. It is only through those Natural Keys that users are able to uniquely identify rows in the tables, as surrogate keys should always be hidden from users. This is because Surrogate Keys have no business meaning.

However, the Physical Model for our tables will in many instances be inefficient without a Surrogate Key. Recall that non-covered columns for a non-clustered index can only be found (in general) through a Key Lookup into the clustered index (ignore tables implemented as heaps for a moment). When our available Natural Key(s) are wide, this:

(1) widens our non-clustered leaf nodes, increasing storage requirements and read accesses for seeks and scans of that non-clustered index;
(2) reduces fan-out from our clustered index, increasing index height and index size, again increasing reads and storage requirements for our clustered indexes; and
(3) increases cache requirements for our clustered indexes, chasing other indexes and data out of cache.

This is where a small Surrogate Key, designated to the RDBMS as "the Primary Key", proves beneficial. When it is set as the clustering key, so that it is used for key lookups into the clustered index from non-clustered indexes and for foreign key lookups from related tables, all these disadvantages disappear. Our clustered index fan-out increases again. This reduces clustered index height and size and reduces cache load for our clustered indexes. It decreases reads when accessing data through any mechanism (index scan, index seek, non-clustered key lookup or foreign key lookup). It also decreases storage requirements for both the clustered and the non-clustered indexes of our tables.

Note that these benefits only occur when the surrogate key is both small and the clustering key. If a GUID is used as the clustering key, the situation will often be worse than if the smallest available Natural Key had been used. If the table is organized as a heap, then the 8-byte (heap) RowID will be used for key lookups. This is better than a 16-byte GUID but less performant than a 4-byte integer. If a GUID must be used due to business constraints, then the search for a better clustering key is worthwhile. If, for example, a small site identifier plus a 4-byte "site-sequence-number" is feasible, then that design might give better performance than a GUID as the Surrogate Key. If the consequences of a heap (a hash join, perhaps) make that the preferred storage, then the costs of a wider clustering key need to be balanced into the trade-off analysis.

Consider this example:

ALTER TABLE Persons ADD CONSTRAINT pk_PersonID PRIMARY KEY (P_Id,LastName)
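The sketch below reproduces the InnoDB ERROR 1075 quoted earlier and shows one way around it. It assumes MySQL 5.7 or later. The table and the columns `grp` and `name` are hypothetical and are modelled on the `id mediumint(9) NOT NULL AUTO_INCREMENT` fragment. InnoDB requires the AUTO_INCREMENT column to be the first column of *some* index, but not necessarily of the primary key. Unlike MyISAM, where this composite key would give each `grp` its own sequence, InnoDB keeps a single table-wide sequence.

```sql
-- Fails on InnoDB with ERROR 1075: `id` is AUTO_INCREMENT but is not the
-- leading column of any index (it is only the second column of the PK).
CREATE TABLE animals_bad (
  grp  ENUM('fish','mammal','bird') NOT NULL,
  `id` MEDIUMINT NOT NULL AUTO_INCREMENT,
  name CHAR(30) NOT NULL,
  PRIMARY KEY (grp, `id`)
) ENGINE=InnoDB;

-- Works: the composite primary key is kept, and a secondary index that
-- starts with `id` satisfies InnoDB's AUTO_INCREMENT requirement.
CREATE TABLE animals (
  grp  ENUM('fish','mammal','bird') NOT NULL,
  `id` MEDIUMINT NOT NULL AUTO_INCREMENT,
  name CHAR(30) NOT NULL,
  PRIMARY KEY (grp, `id`),
  KEY idx_animals_id (`id`)
) ENGINE=InnoDB;
```

Declaring a second AUTO_INCREMENT column in either table would also trigger ERROR 1075, because only one auto column is allowed per table.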
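The fan-out argument about key width can be made concrete with some rough arithmetic. The figures below are illustrative assumptions and not measurements of any particular engine: 8 KB pages with about 8,000 usable bytes, roughly 10 bytes of per-entry overhead, and N = 10 million rows. Here k is the key width in bytes, f the entries per non-leaf page, h the number of levels needed to address N keys, and N·k the extra bytes each non-clustered index carries for the clustering key (or RowID):

$$
f(k) \approx \left\lfloor \frac{8000}{k + 10} \right\rfloor,
\qquad
h(k) = \left\lceil \log_{f(k)} N \right\rceil = \left\lceil \frac{\log_{10} N}{\log_{10} f(k)} \right\rceil,
\qquad N = 10^{7}
$$

$$
\begin{aligned}
k &= 4 \text{ (int)}: & f &= \lfloor 8000/14 \rfloor = 571, & h &= \lceil 7/2.757 \rceil = \lceil 2.54 \rceil = 3, & Nk &= 40\ \text{MB}\\
k &= 8 \text{ (heap RowID)}: & f &= \lfloor 8000/18 \rfloor = 444, & h &= \lceil 7/2.647 \rceil = \lceil 2.64 \rceil = 3, & Nk &= 80\ \text{MB}\\
k &= 16 \text{ (GUID)}: & f &= \lfloor 8000/26 \rfloor = 307, & h &= \lceil 7/2.487 \rceil = \lceil 2.81 \rceil = 3, & Nk &= 160\ \text{MB}\\
k &= 100 \text{ (wide natural key)}: & f &= \lfloor 8000/110 \rfloor = 72, & h &= \lceil 7/1.857 \rceil = \lceil 3.77 \rceil = 4, & Nk &= 1\ \text{GB}
\end{aligned}
$$

Under these assumptions, the 4-, 8- and 16-byte keys all fit in three levels. For them, the cost of width shows up mainly as index size and cache pressure. Only the wide natural key adds a level.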
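Here is a minimal MySQL/InnoDB sketch of the layout recommended above, with a small surrogate as the clustering key and the Natural Key still enforced. InnoDB always clusters a table on its primary key and stores the primary-key value in every secondary-index entry. A 4-byte surrogate therefore keeps those entries narrow, while a UNIQUE constraint keeps the business key honest. Only `Persons`, `P_Id` and `LastName` come from the text. The other columns, the `Orders` table and the choice of natural key are hypothetical.

```sql
-- Surrogate key: small, meaningless to users, and (in InnoDB) the clustering key.
CREATE TABLE Persons (
  P_Id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
  LastName  VARCHAR(50)  NOT NULL,
  FirstName VARCHAR(50)  NOT NULL,
  BirthDate DATE         NOT NULL,
  CONSTRAINT pk_PersonID PRIMARY KEY (P_Id),
  -- Hypothetical Natural Key: still unique, but held in a secondary index
  CONSTRAINT uq_Persons_Natural UNIQUE (LastName, FirstName, BirthDate)
) ENGINE=InnoDB;

-- Related tables reference the 4-byte surrogate, not the wide natural key,
-- so foreign key lookups go straight into the clustered index of Persons.
CREATE TABLE Orders (
  OrderId INT UNSIGNED NOT NULL AUTO_INCREMENT,
  P_Id    INT UNSIGNED NOT NULL,
  PRIMARY KEY (OrderId),
  CONSTRAINT fk_Orders_Persons FOREIGN KEY (P_Id) REFERENCES Persons (P_Id)
) ENGINE=InnoDB;
```

Users still search by the natural key through `uq_Persons_Natural`. The surrogate `P_Id` never needs to be shown to them.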
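The site-identifier alternative to a GUID might look like the following, again for MySQL/InnoDB. A 1-byte site identifier plus a 4-byte per-site sequence gives a 5-byte clustering key, compared with 16 bytes for a GUID stored as BINARY(16). The table and column names are hypothetical. The sketch assumes that each site's application allocates its own `SiteSeq` values; MySQL does not do this for you.

```sql
-- 5-byte composite clustering key: the issuing site plus that site's own
-- sequence number (allocated by the application, by assumption).
CREATE TABLE Events (
  SiteId  TINYINT UNSIGNED NOT NULL,
  SiteSeq INT UNSIGNED     NOT NULL,
  Payload VARCHAR(200)     NOT NULL,
  PRIMARY KEY (SiteId, SiteSeq)
) ENGINE=InnoDB;

-- 16-byte GUID clustering key, for comparison.
CREATE TABLE Events_Guid (
  EventId BINARY(16)   NOT NULL,
  Payload VARCHAR(200) NOT NULL,
  PRIMARY KEY (EventId)
) ENGINE=InnoDB;

-- UUID_TO_BIN() is available from MySQL 8.0 onward.
INSERT INTO Events_Guid (EventId, Payload) VALUES (UUID_TO_BIN(UUID()), 'example');
```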