Removing Duplicate Records in Microsoft Access

Introduction

(This article was originally written for Access 2002. While the interface in Access has changed somewhat, the principles are still essentially the same for later versions.)

Occasionally, you might have a table in Access that contains duplicate records. Maybe it was imported from one or more sources or perhaps it was not setup with the necessary key to ensure unique records. Whatever the reason, now you want to make sure that the records in the table are unique. Fortunately, Access offers a couple of options that you can use to remove the duplicates from your table quickly and easily. This article will explain two methods for removing duplicate information from tables.

Please remember that you should always backup your database before making changes to database objects or deleting large amounts of data.

A sample database with the tables and queries shown here is available for download.

Download sample for Access 97

Download sample for Access 2000 and 2002

Throughout this article, I use links to various screenshots which will be displayed in a separate window. An index of these screenshots is available at the bottom of this page.

As the example, I’m using a small table of fictional names, addresses and telephone
numbers. Figure 1 shows a screenshot of the original table which contains duplicate records as shown by the highlighting. For the purpose of this example, I’ve only included thirteen records and it would be simple to pick out the duplicates manually but if this table contained hundreds or even thousands of records, that would probably not be an option.

Figure 1 – Original Contacts table with obvious duplications highlighted

The first way to remove duplicates from a table is to use the Find Duplicates Query Wizard which is available by clicking on the New button in the database window or by selecting “Insert” and “Query” from the menu bar. (Figure 2)

Figure 2 – Accessing the Find Duplicates Query Wizard

After selecting a table or query in which to find duplicate records, the wizard will prompt you to select a field in which the data should be unique for each record (Figure 3). In this case, the home phone number field is the best available. While you could use the address
field to narrow the table by household, the phone number is shorter and is stored in a standard format. It is much less likely to contain errors or abbreviations that will prevent the wizard from finding a duplicate. You can also select a combination of fields if there is no one field that will ensure a unique record.

Figure 3 – Query Wizard: Selecting unique field

The next screen (Figure 4) will enable you to select any other fields you want to include in your query. In this illustration, all of the fields in the table are selected but you can select as many or as few as necessary. These fields will not affect the identification of duplicates but they will make the query results more understandable. If you do not select any additional fields, the wizard will add a field of its own that will count the number of duplications for each value it finds in the field that you selected earlier.

Figure 4 – Query Wizard: Selecting accompanying fields

The next screen will enable you to enter a name for the query to be saved under and will give you the option of viewing the data or changing the design of the query. When you run the query, the results will look something like Figure 5.

Figure 5 – Query Wizard: Find Duplicate Query results

You will notice in this example that the query has actually found three sets of duplicates. In addition to the two duplications which were highlighted in Figure 1, the table also contains the same phone number for Harrison and Geneva Raines. As I mentioned earlier, the wizard will allow the use of more than one field while searching for duplicates. In this case, using the first and last names in addition to the phone number would have eliminated the two highlighted records. With a database like this, you may or may not want to allow more than one contact per household, depending on your use of the data. Either way, the query has found the duplicated information in the table according to the specified settings.

Depending on the size of the table, this method may be all that you need. If the query finds no more than a few dozen duplications or if you want to remove both the original and duplicate record, it’s easy to delete the necessary records from the query. The records are automatically removed from the table on which the query was based. On the other hand, for especially large tables where you only want to remove duplications and leave the original record in the table, this may not be enough.

Using an Append Query

The other method of removing duplicate records is a two-step process that involves creating a copy of the table and once again selecting a field or combination of fields to determine if each record is unique.

The first step is to create a copy of the table structure. To do this, select the table in the database window and then select “Edit” and “Copy” from the menu bar or right-click the table and select “Copy” from the pop-up menu that appears. Then paste the table back into the database window, once again using either the menu bar or the pop-up menu. You will see the dialog box displayed in Figure 6. After entering a new name for the table, select “Structure Only” from the paste options shown. This will create a copy of the table structure but will not transfer the data.

Figure 6 – Copying the table structure

To select the unique field, open the new table in design view and select the appropriate field from the list shown. Right-click the field and select “Primary Key” from the pop-up menu or use the same option on the menu or toolbar (Figure 7). A table’s primary key is used to uniquely identify each record. Access will not allow duplicate information to be entered in a field marked as the primary key. In this example, I’m still using the phone number as the unique field but a combination of fields can also be set by holding down the SHIFT or CTRL key, selecting more than one field and setting it as the primary key using either the pop-up menu, menu or toolbar. Save your changes to the table structure and then close the table to return to the database window.

Figure 7 – Choosing the new primary key

The next step is to create an append query that will transfer the data from the original table to the new table structure. An append query must follow the same rules set by the primary key as you would if entering data manually. This means that it will not be able to copy duplicate records to the new table. Only one copy of each record will be transferred.

To create the append query, create a select query from the original table, including all records in the table as shown in Figure 8 below. Right-click in the top half of the query design area and select “Query Type” from the pop-up menu and then select “Append Query”. This will change the query type while retaining the table and fields that you’ve inserted.

Figure 8 – Creating the append query

You will be asked for the name of the table to append the data to (Figure 9). Select the name of the empty table that you just created and click OK.

Figure 9 – Selecting the table for the append

Notice that the rows displayed in the bottom half of the query design grid have changed. (Figure 10) There is now an “Append To:” row that lists the names of fields in the new table as they correspond to the fields in the table from which the data is being transferred. If the new table had different field names, you would have to select them manually but because the structure is exactly the same, the names are already filled in. Even so, it is a good idea to double-check these field names before running the query.

Figure 10 – Completed append query

When you run the query, Access will display the warning shown in Figure 11. Not all of the records from the Contacts table can be appended to the new table because of key violations. Again, you can double-check the reasons stated in the message and then click “Yes” to proceed with the query.

Figure 11 – Key violations message

An append query will not display a datasheet view as a select query will. It will just perform the append. After the query is run, you can save it if you might want to use it again later or close it without saving it. When you open the new table, it should contain all of the information from the old table, minus the duplicate records (Figure 12).

Figure 12 – Results of append query

Notice that Geneva Raines is still listed in the table but Harrison Raines has been removed. This is because only the phone number was specified in the primary key and Geneva Raines’ record was listed first according to the table’s sort order.

After transferring the data between the tables, you can rename the tables and change the primary key of the new table as necessary. You may find that it’s best to retain the old table for a period of time until you are absolutely sure that you no longer need it. This is the basic procedure for using an append query. If you want, you can also add extra criteria to the append query to determine which records will be transferred.

Before you leave, be sure to check out some of the other articles on Microsoft Access
right here on ComeauSoftware.com.

For more information on append queries, data management or other operations with Microsoft Access, you can consult Microsoft’s searchable online support section at http://support.microsoft.com. If you’re comfortable with Usenet, the newsgroup comp.databases.ms-access is also an excellent resource. The complete newsgroup archive is available online at http://groups.google.com.