What Does Candidate Key Mean?
A candidate key is a column, or a minimal set of columns, in a relational database table that can identify each record uniquely without reference to any other data.
Experts describe a candidate key as having "no redundant attributes" and being a "minimal representation of a tuple" in a relational database table.
One of the most common questions around the use of a candidate key is how the candidate key relates to super keys and primary keys in database design.
Techopedia Explains Candidate Key
Candidate Keys and Primary Keys
The relationship between candidate keys and the primary key is the easier one to describe. A table can have more than one candidate key, but only one of them is chosen as the primary key, the key mainly used to identify each record uniquely.
The best way to define candidate keys is with an example: A bank’s database is being designed. To uniquely define each customer’s account, a combination of the customer’s birthdate and a sequential number for each of his or her accounts can be used.
So, Mr. Andrew Smith’s checking account can be numbered 120344-1, and his savings account 120344-2. A candidate key has just been created.
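This scheme can raise problems, however.
What if more than one person with the same birth date wants to open an account with the bank? Two different accounts would then receive the same number, and the combination would no longer identify each record uniquely.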
Because of such potential pitfalls, a frequently used option is to generate a dedicated identifier that is unique by construction. In this case, the bank’s database can issue account numbers that are guaranteed to be unique, which prevents the problem just highlighted. For good measure, these account numbers can have some built-in logic.
For example, checking accounts can begin with a "C," followed by the year and month of creation and then a sequential number within that month. So, Andrew Smith’s checking account can now be C-200805-22.
Even without referring elsewhere, a teller can identify that this was the 22nd checking account created in May 2008. Savings accounts follow the same logic, but with an "S" instead of "C."
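The numbering scheme described above can be sketched in a few lines of Python. This is only an illustration: the helper names make_account_number and parse_account_number are hypothetical, and the exact format (type letter, year and month, sequence number, separated by hyphens) is an assumption taken from the C-200805-22 example.

```python
import re
from datetime import date

# Assumed format, based on the "C-200805-22" example in the text:
# type letter, hyphen, YYYYMM, hyphen, sequence number within that month.
ACCOUNT_PATTERN = re.compile(r"^([CS])-(\d{4})(\d{2})-(\d+)$")
ACCOUNT_KINDS = {"C": "checking", "S": "savings"}


def make_account_number(kind: str, opened: date, sequence: int) -> str:
    """Build an identifier such as 'C-200805-22' (hypothetical helper)."""
    if kind not in ACCOUNT_KINDS:
        raise ValueError("kind must be 'C' (checking) or 'S' (savings)")
    if sequence < 1:
        raise ValueError("sequence must be a positive integer")
    return f"{kind}-{opened.year:04d}{opened.month:02d}-{sequence}"


def parse_account_number(number: str) -> dict:
    """Recover the built-in logic a teller could read off the number."""
    match = ACCOUNT_PATTERN.match(number)
    if match is None:
        raise ValueError(f"not a valid account number: {number!r}")
    kind, year, month, sequence = match.groups()
    return {
        "kind": ACCOUNT_KINDS[kind],
        "year": int(year),
        "month": int(month),
        "sequence": int(sequence),
    }


number = make_account_number("C", date(2008, 5, 14), 22)
details = parse_account_number(number)
```

Uniqueness here depends on the sequence counter never repeating within a given account type and month; the format alone does not guarantee it.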
So if a database table has two or more unique identifiers, the database administrator or engineer will choose one of those to be the primary key.
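As a rough sketch of that choice, the following Python snippet uses the standard sqlite3 module to declare one candidate key as the primary key and a second one as UNIQUE. The table layout, the iban column, and the sample values are assumptions made for illustration; they are not part of the bank example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE account (
        account_number TEXT NOT NULL PRIMARY KEY,  -- candidate key chosen as primary key
        iban           TEXT NOT NULL UNIQUE,       -- hypothetical second candidate key
        holder         TEXT NOT NULL               -- not a key: one holder, many accounts
    )
    """
)
conn.execute(
    "INSERT INTO account VALUES (?, ?, ?)",
    ("C-200805-22", "XX00-0001", "Andrew Smith"),
)
conn.execute(
    "INSERT INTO account VALUES (?, ?, ?)",
    ("S-200805-7", "XX00-0002", "Andrew Smith"),
)


def try_insert(row) -> bool:
    """Return True if the row was accepted, False if a key constraint rejected it."""
    try:
        conn.execute("INSERT INTO account VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


# Reusing the primary key value is rejected.
duplicate_primary = try_insert(("C-200805-22", "XX00-0003", "Jane Doe"))
# Reusing the other candidate key value is rejected as well.
duplicate_candidate = try_insert(("C-200806-1", "XX00-0001", "Jane Doe"))
row_count = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
```

Both columns are enforced as unique; declaring one of them PRIMARY KEY only records which candidate key the designer chose as the main identifier.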
Super Keys and Candidate Keys
The relationship between super keys and candidate keys is more complicated.
Essentially, a super key is a set of one or more attributes (in practice, table columns) that together identify each record uniquely.
A candidate key also identifies each record uniquely, but it contains no redundant attributes: if any attribute is removed, the remaining ones no longer guarantee uniqueness. A candidate key can therefore be a single column or a combination of columns.
You will often see the candidate key referred to as a “minimal” super key.
For example, consider a manufacturing facility with two terminals, where only one product can be built at a given terminal at a particular time.
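As the database logs the creation of products in a table, recording time and terminal, the combination of time and terminal will be unique for each individual record, so it is a super key. The individual columns time and terminal will each contain repeated values, so neither is a key on its own. That makes the combination minimal, and therefore a candidate key as well.
If the database engineer adds a column of arbitrary alphanumeric identifiers that are unique to each record, that column is a second candidate key: it does the same thing that the time and terminal combination does, with just one column. Combining it with any other column, such as time, still identifies each record and so forms a super key, but not a candidate key, because the extra column is redundant.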
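The distinction can be checked mechanically on sample data. The sketch below is illustrative only: the function names, the build_id column, and the sample rows are assumptions, and a check against sample rows only shows that a set of columns is unique for that data, whereas a real key is a rule that must hold for every row the table could ever contain.

```python
from itertools import combinations


def is_super_key(rows, columns) -> bool:
    """True if no two rows share the same values for `columns`."""
    seen = set()
    for row in rows:
        value = tuple(row[column] for column in columns)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_candidate_key(rows, columns) -> bool:
    """True if `columns` is a super key and no smaller non-empty subset is one."""
    if not is_super_key(rows, columns):
        return False
    return not any(
        is_super_key(rows, subset)
        for size in range(1, len(columns))
        for subset in combinations(columns, size)
    )


# Hypothetical build log: two terminals, one product per terminal per time slot.
build_log = [
    {"build_id": "A1", "time": "08:00", "terminal": 1},
    {"build_id": "A2", "time": "08:00", "terminal": 2},
    {"build_id": "A3", "time": "09:00", "terminal": 1},
    {"build_id": "A4", "time": "09:00", "terminal": 2},
]

time_alone = is_super_key(build_log, ("time",))                        # repeated values
time_and_terminal = is_candidate_key(build_log, ("time", "terminal"))  # unique and minimal
build_id_alone = is_candidate_key(build_log, ("build_id",))            # unique and minimal
id_and_time_super = is_super_key(build_log, ("build_id", "time"))      # unique
id_and_time_candidate = is_candidate_key(build_log, ("build_id", "time"))  # not minimal
```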
With all of this in mind, the use of a candidate key can determine how records are accessed and how the subjects of those records are identified.
Like other kinds of database contents, candidate key values can be hashed in order to improve security. Hashing is not encryption: it maps an original string to a fixed-length digest that is designed to be impractical to reverse, so the stored value hides the original from view and limits what unauthorized parties can learn.
Engineers have to evaluate the possibility of hash collisions and other limiting factors when deciding how to hash this data, that is, how to substitute a digest for a given value in order to make the contents of the database table less accessible to attackers.
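A minimal sketch of this idea, using Python's standard hashlib and hmac modules, is shown below. It swaps in a keyed hash (HMAC-SHA-256) as one possible technique; the secret value, the function names, and the sample account numbers are assumptions for illustration, and nothing here is meant as a complete security design.

```python
import hashlib
import hmac

# Assumption: a secret kept outside the database. A keyed hash makes it harder
# to guess short, predictable key values by hashing every possibility.
SECRET = b"example-secret-do-not-reuse"


def hash_key(key: str, secret: bytes = SECRET) -> str:
    """Return a fixed-length hex digest (64 characters) for a key value."""
    return hmac.new(secret, key.encode("utf-8"), hashlib.sha256).hexdigest()


def has_collisions(keys, digest_chars: int) -> bool:
    """Check whether truncating digests to `digest_chars` makes two keys collide."""
    seen = {}
    for key in keys:
        short = hash_key(key)[:digest_chars]
        if short in seen and seen[short] != key:
            return True
        seen[short] = key
    return False


account_numbers = [f"C-200805-{n}" for n in range(1, 101)]

digest = hash_key("C-200805-22")
full_digest_collides = has_collisions(account_numbers, 64)
# 100 distinct keys cannot fit into the 16 possible one-character prefixes.
one_char_collides = has_collisions(account_numbers, 1)
```

The shorter the stored digest, the more likely two different key values map to the same string, at which point the hashed column no longer identifies each record uniquely.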
The idea of hashing can be effective in these traditional relational database designs. Meanwhile, the candidate key acts as a unique identifier and a main focus of SQL and other queries.