Database Cleansing - They, What, Why, and How

By: Heather Maloney
Submitted: 2007-01-17 16:09:53
Print this article | Tell a friend | For publisher | Social Bookmarking
Rating:
 

Firstly the "What"

Cleaning a database is done to:

* Remove duplicate records

* Ensure your data is consistently formatted

* Correct data that is obviously wrong e.g. wrong postcode for a known suburb

* Find other records that are likely to be the same (more on this later)

So "Why" would you want to do that?

To explain why, I am going to use the example of a customer database, but the principles apply to other types of data also.

Have you ever received a marketing message / catalogue in the mail twice or more times? I receive multiple copies of such communications regularly, and I don't always get around to telling the sender of their mistake. This can:

* be interpreted as sloppiness on the part of the organisation

* undo your efforts to target / personalise - any attempt on the organisation's part to "personalise" and "target" the message is wasted, because the recipient knows immediately that it was a mindless distribution of information using a database.

* waste $$$! Everytime you send a communication twice to the one person or household, you have most likely just wasted some of your hard-earned funds.

In addition, cleaning your data, will help you to analyse your data more accurately. For instance, you will know the real number of contacts and perhaps how they are geographically distributed, rather than the distorted figures that can be derived from analysing a corrupted database.

It's not a crime! In fact it is very easy for your data to get in a state that requires cleaning. For example, when a client changes their address, your staff might update the suburb but forget to put in the new postcode. Or, an existing client returns to your organisation several years later, without informing new staff that they are an existing client, and if you don't have the appropriate keys on your database preventing duplicates, the client could be set up again as another customer with the same or similar details.

Having documented processes that your staff can use as a checklist, and appropriate unique keys on your database fields, will go some way to ensuring that your data is kept clean, but incorrect data will never be prevented.

"How" then, do you efficiently clean your database?

Fixing incorrect information such as the postcode matching the suburb is usually done by comparing each record to the correct values in another table. For example, to correct all the postcodes in your data, assuming that the suburb entered is correct, you would write SQL code that would compare the postcode of your record against a table of postcode + suburb + state that you may have obtained from Australia Post. Such a process would likely generate a list of records where the suburb was not found, requiring you to manually investigate and correct the data.

Correcting the formatting of your data, is usually done using some pretty simple SQL perhaps combined with logic programming. You need to decide the format you wish to apply to your data, for example, whether you would like the suburb in title case or all capitals. While this is much less important than getting the data actually right, it can help to make your communications look more professional.

Finding duplicates is a fairly easy task for someone who knows a little about the SQL database language. It is more difficult to find similar records that really are the same person, but are not listed in exactly the same way in your database. For instance the following two records may actually be the same person:

 ID	Firstname	Surname	Address1	Suburb	   Postcode	State
 3442	John	        Citizen PO Box 33	Frankston   3199	VIC
 682	Jonathon	Citien 14 Beach Road	F'STON	    3199        VIC
 

Finding records such as the above calls for what is usually called "Fuzzy" Matching. Software is available to find such records, and much more experienced SQL programmers could write software to find such possible duplicates.

Because you can't confidently use logic to determine whether or not two records are the same in the case given above, usually fuzzy matching would leave the data as is, but produce an exception report, highlighting likely duplicate records.

Even when you can determine confidently that two records are the same, you may wish to manually process the data cleanup to ensure that only the correct data is kept, and that all associated pieces of information are transferred across to the valid record e.g. customer payment history. It is possible however, to set up your de-duplication process to remove all the duplicates and clean up all the records automatically.

Cleaning your database can take some time, and some manual effort on the part of your staff. If you are just starting out with a new database, it is very worthwhile to:

1. Agree and document the data structure, and what information will be stored in what field (which isn't always obvious despite the names you might give fields)

2. Agree the format of the data entered into each field

3. Agree a process to handle the case where a record needs to be entered that won't fit into the current structure

If you need help cleaning your database, Contact Point (www.contactpoint.com.au) can help you. We provide a quick and efficient service to deal with all the database issues discussed above, and can tailor our service to meet your particular needs. Submit a request now for an obligation free quote.

Heather Maloney is the Managing Director of Contact Point IT Services Pty Ltd (http://www.contactpoint.com.au) - a company providing IT Solutions to small to medium sized businesses, that deliver measurable results. Contact Point is focused on helping businesses to interact better with their clients, customers, suppliers and other 3rd parties.

Article source: Expert Articles

Most Recent Articles in Personal Tech category

  • 10 Reasons Why Your Business Needs A Server - By: Joel Ray
    Any reliable computer technician will tell you that backing up data stored on individual computers is not only impractical, it's unreliable. Storing all your data on a server and then backing up all of the data on that server is more likely to work properly without interruption and it is also easier to manage.
  • Setting Up a Service Level Agreement - By: Rob Seolas
    You can scarcely work with an IT company anymore without hearing the term SLA. Service Level Agreements are prevalent in most companies and help to define a contract between 2 parties. Whether general or specific, the agreement will help provide specific guidelines you can rely on through your working relationship.
  • Have You Look Into Online Computer Classes? - By: Arvinder Jeet
    We are getting to the age in our history where you need to learn and know how to use some kind of electronics or you are going to be without a job. The day that we are living in today is getting to be more and more about electronics.
  • Computer Support - By: James Larry
    With online computer support services at your side, you can easily say goodbye to expensive technicians who charge per incident or per hour. People who have experienced these services found them extremely helpful to troubleshoot software problems.
  • Advantages of Registery Fix - By: Rimmy Gill
    The need to fix a corrupt registry is something which any personal computer owner may face at any time. Perhaps in order to address whether it is important to fix a corrupt registry or not, let us first define the Windows registry. The Windows registry is a database of information about the configuration of a computer, including software, hardware as well as files and other preferences that you have set on your computer. When faced with a corrupt Windows Registry, many people wonder what causes this problem.
  • iYogi - Online Computer Repair Services - By: Addie Swadie
    Online computer support is offered through the Internet. The technician will connect to your PC through a remote desktop connection and will fix your computer errors right away while you watch. The process can be availed from any part of the world with just the Internet being the top priority.
  • Technical Net Jargon - By: Sandra Prior
    Bots crawl along the backbone, swinging from server to server and round a few routers. If this makes no sense at all, you need to get acquainted with some technical Net terms.
  • What is Netiquette? - By: Sandra Prior
    Netiquette is an attempt to foist table manners on the Net user who might care - a sort of book of right and proper manners for all young and fair societies. Or generally, a set of guidelines Net users should follow to be good Net citizens.
  • Online Technical Support - By: Andrew Lehren
    Computer support experts are gaining in popularity because of their ability to solve the problem effectively over Internet without having you to leave your premises. They are not the amateurs but certified technicians who resolve your PC issue right in front of your eyes as if it's some magic.
  • 10 Tips on How to Get Best Customer Service Support for your Dell Laptop and Printer - By: James Kara Murat
    Dell is one of the leading companies that sell some of the best computer products like laptops, printers, desktops and other PC accessories to suit the varying needs of different customers across the globe.