What is referential integrity?

por | 21 septiembre, 2007

 

Simply put, referential integrity means that when a record in a table refers to a corresponding record in another table, that corresponding record will exist. Look at the following:

customer

customer_id

name

1

Nhlanhla

2

Anton

customer_sales

transaction_id

amount

customer_id

1

23

1

2

39

3

3

81

2

There are 2 customers in the customer table, but 3 customer_id’s in the customer sales table. Assuming the two tables are linked with the customer_id field, you can tell that Nhlanhla has an amount of 23, and Anton 81. However, there is no corresponding name for customer_id 3. Foreign key relationships are described as parent/child relationships (customer being the parent, and customer_sales the child), and the record is said to be orphaned when its parent is no longer in existence.

A database in this sort of condition is referred to as having poor referential integrity (there are other kinds of integrity problems too). This is not necessarily a serious problem – one of the primary systems I work uses MyISAM tables, and has loads of orphans: article blurbs and article bodies not linked to any articles, but these don’t do much harm besides prickle my aesthetic sensibility, and we’ve never needed to fix this. However, it is not good design, and can sometimes lead to problems, so you should avoid a situation like this where possible.

In the past, the MySQL DBMS could not enforce this, and the responsibility passed to the code to do so. But this wasn’t good enough for serious systems, and one of the most frequently requested features in later versions of MySQL was that of foreign keys, enabling MySQL data to maintain referential integrity. A foreign key is simply a field in one table that corresponds to a primary key in another table. In the example above, customer_id would be the primary key in the customer table, uniquely identifying each record, and transaction_id would be the same in the customer_sales table. In the customer_sales table, the customer_id field could be an example of a foreign key, referring to its namesake in the customer table. A transaction should not exist without an associated customer. The code that generated these tables is clearly buggy!

Defining Foreign Keys in MySQL

Strictly speaking, for a field to be a foreign key, it needs to be defined as such in the database definition. You can ‘define’ a foreign key in any MySQL table type (including the default MyISAM table type), but they do not actually do anything – they are only used to enforce referential integrity in InnoDB tables.

In order to create a foreign key, you need the following:

  • Both tables need to be InnoDB tables.
  • To use the syntax FOREIGN KEY(fk_fieldname) REFERENCES table_name (fieldname)
  • The field being declared a foreign key needs to be declared as an index in the table definition

Here is how you would define the two tables above with a foreign key:

CREATE TABLE customer 
(
    customer_id INT NOT NULL,
    name        VARCHAR(30),
    PRIMARY KEY (customer_id)
) TYPE = INNODB;

CREATE TABLE customer_sales 
(
    transaction_id INT NOT NULL,
    amount 	   INT,
    customer_id    INT NOT NULL,
    PRIMARY KEY(transaction_id),
    INDEX (customer_id),
    FOREIGN KEY (customer_id) REFERENCES customer (customer_id) 
) TYPE = INNODB;

If you get the rather unhelpful error message:

ERROR 1005: Can't create table './test/customer_sales.frm' (errno: 150)

then check your foreign key definitions carefully – something is wrong with the definition. Common causes are a table not being of type InnoDB, a missing index on the same field (customer_id), or attempting to set a field to NULL when it cannot be (see the ON DELETE SET NULL clause below).

Referential integrity can be compromised in three situations: when creating a new record, deleting a record or updating a record. The FOREIGN KEY (transaction_id) REFERENCES customer (customer_id) clause ensures that when a new record is created in the customer_sales table, it must have a corresponding record in the customer table. After creating the above tables, insert the following data, which we will use to demonstrate some of the concepts:

mysql> INSERT INTO customer VALUES(1,'Nhlanhla'),(2,'Anton');
Query OK, 2 rows affected (0.00 sec)
mysql> INSERT INTO customer_sales VALUES(1,23,1),(3,81,2);
Query OK, 2 rows affected (0.00 sec)

Now insert the third record, referring to the non-existent customer 3:

mysql> INSERT INTO customer_sales VALUES(2,39,3);
ERROR 1216: Cannot add or update a child row: a foreign key constraint fails

You cannot add the record, as customer_id 3 does not exist. The constraint has ensured your data keeps its integrity! However, what happens when we delete a record? Let’s add a customer 3, then add the customer_sales record again, after which we delete the 3rd customer:

mysql> INSERT INTO customer VALUES(3,'Malvin');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO customer_sales VALUES(2,39,3);
Query OK, 1 row affected (0.01 sec)
mysql> DELETE FROM customer WHERE customer_id=3;
ERROR 1217: Cannot delete or update a parent row: a foreign key constraint fails

So the constraint holds, and we would need to first delete the record from the customer_sales table. There is a way we could have allowed the delete to go ahead, which we will look at shortly, but first we will need to drop and recreate the index.

Dropping a foreign key

You cannot just drop a foreign key constraint, as you would an ordinary index. See what happens when you try.

mysql> ALTER TABLE customer_sales DROP FOREIGN KEY;
ERROR 1005: Can't create table './test/#sql-86_6b87c.frm' (errno: 150)

To drop the index, you will need to specify the internally generated foreign key id, which you can see by running

mysql> SHOW CREATE TABLE customer_sales;
| Table          | Create Table                                         
| customer_sales | CREATE TABLE 'customer_sales' (
  'transaction_id' int(11) NOT NULL default '0',
  'amount' int(11) default NULL,
  'customer_id' int(11) NOT NULL default '0',
  PRIMARY KEY  ('transaction_id'),
  KEY 'customer_id' ('customer_id'),
  CONSTRAINT '0_22' FOREIGN KEY ('customer_id') REFERENCES 'customer' 
  ('customer_id')
) TYPE=InnoDB |

1 row in set (0.00 sec)

In my case the constraint was 0_22 – it will probably be different for you, so use the constraint specific to you.

mysql> ALTER TABLE customer_sales DROP FOREIGN KEY 0_22;
Query OK, 2 rows affected (0.00 sec)

This works with MySQL 4.0.13 and later.

Deleting foreign keys

You can delete a record from customer and at the same time delete the record from the customer_sales table, using only one delete statement. This is called a cascading delete, where all associated records are deleted, ‘cascading’ through the tables according to the foreign key relationships. An alternative is not to delete the related record, but to set the foreign key value to NULL (assuming the field can be NULL – it is not in our example as we have defined customer_id as NOT NULL). The standard options when deleting a foreign key are listed below.

  • ON DELETE CASCADE
  • ON DELETE SET NULL
  • ON DELETE RESTRICT
  • ON DELETE NO ACTION
  • ON DELETE SET DEFAULT

ON DELETE RESTRICT is the default, and disallows a delete if an associated record still exists, as we saw above. ON DELETE NO ACTION does the same thing. ON DELETE SET DEFAULT does not currently work in MySQL – it is supposed to set the deleted foreign key value to whatever was defined as the default value. Here is an example of a cascading delete:

mysql> ALTER TABLE customer_sales ADD FOREIGN KEY(customer_id) 
REFERENCES customer (customer_id) ON DELETE CASCADE;
Query OK, 3 rows affected (0.00 sec)

Let’s look at what we have before we attempt to delete:

mysql> SELECT * FROM customer;
+-------------+----------+
| customer_id | name     |
+-------------+----------+
|           1 | Nhlanhla |
|           2 | Anton    |
|           3 | Malvin   |
+-------------+----------+
3 rows in set (0.00 sec)

mysql> SELECT * FROM customer_sales;
+----------------+--------+-------------+
| transaction_id | amount | customer_id |
+----------------+--------+-------------+
|              1 |     23 |           1 |
|              2 |     39 |           3 |
|              3 |     81 |           2 |
+----------------+--------+-------------+
3 rows in set (0.00 sec)

Now we delete Malvin from the database:

mysql> DELETE FROM customer WHERE customer_id=3;
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM customer_sales;
+----------------+--------+-------------+
| transaction_id | amount | customer_id |
+----------------+--------+-------------+
|              1 |     23 |           1 |
|              3 |     81 |           2 |
+----------------+--------+-------------+
2 rows in set (0.00 sec)

mysql> SELECT * FROM customer;
+-------------+----------+
| customer_id | name     |
+-------------+----------+
|           1 | Nhlanhla |
|           2 | Anton    |
+-------------+----------+
2 rows in set (0.00 sec)

He has also been removed from the customer_sales table. Use cascading deletes with care!

From MySQL 4.0.8, similar options can be applied to ON UPDATE, i.e:

  • ON UPDATE CASCADE
  • ON UPDATE SET NULL
  • ON UPDATE RESTRICT
  • ON UPDATE NO ACTION
  • ON UPDATE SET DEFAULT

Let’s look at a quick example: First, drop the index again (using the constraint specific to your table)

mysql> ALTER TABLE customer_sales DROP FOREIGN KEY 0_42;
Query OK, 2 rows affected (0.00 sec)

mysql> ALTER TABLE customer_sales ADD FOREIGN KEY(customer_id) 
REFERENCES customer (customer_id) ON DELETE RESTRICT ON UPDATE CASCADE;
Query OK, 2 rows affected (0.00 sec)

mysql>UPDATE customer SET customer_id=customer_id+10;
Query OK, 2 rows affected (0.00 sec)

mysql> SELECT * FROM customer;
+-------------+----------+
| customer_id | name     |
+-------------+----------+
|          11 | Nhlanhla |
|          12 | Anton    |
+-------------+----------+
3 rows in set (0.00 sec)

mysql> SELECT * FROM customer_sales;
+----------------+--------+-------------+
| transaction_id | amount | customer_id |
+----------------+--------+-------------+
|              1 |     23 |          11 |
|              3 |     81 |          12 |
+----------------+--------+-------------+
3 rows in set (0.00 sec)

The updates have cascaded through to the customer_sales table as well.