The Importance of Clean Data

With the addition of powerful import tools within common ecommerce platforms, more and more customers are able to import their products on the fly from third-party systems such as QuickBooks, CSV/Excel files, or point-of-sale (POS) systems. Depending on the system you are importing from, your data may arrive in varying states of cleanliness. One common issue is that some systems store text in character encodings other than UTF-8. For example, Microsoft Word and other Microsoft Office applications make heavy use of special symbols such as curly quotes. When this data is brought into a website (either through a CSV import or copy/paste), the browser cannot interpret these characters and displays a placeholder instead. These are easy to spot and generally look like boxes or diamonds with a question mark inside (�). The proper fix is to store data as cleanly as possible at the root source (generally a CSV/Excel file, POS system, etc.). If you're using MS Excel or Apple Numbers, you can save or export the file as CSV, and both applications offer options for the text encoding. Choose UTF-8 at this point.
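If the source system cannot export UTF-8 directly, the file can be re-encoded before it is imported. The Python sketch below is only an illustration: the function names are hypothetical, and it assumes the export was saved as Windows-1252 ("cp1252"), which is common for Windows desktop applications but should be confirmed for your own system.

```python
from pathlib import Path


def convert_to_utf8(src: str, dst: str, source_encoding: str = "cp1252") -> None:
    """Re-encode a text export (for example a CSV) as UTF-8.

    source_encoding is an assumption: check which encoding your
    system actually writes before relying on the default.
    """
    raw = Path(src).read_bytes()
    # Decode strictly so unexpected bytes raise an error instead of
    # silently turning into placeholder characters.
    text = raw.decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))


def has_replacement_chars(path: str) -> bool:
    """Return True if the file, read as UTF-8, contains the placeholder character."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return "\ufffd" in text
```

Running has_replacement_chars on a file before importing it is a quick way to catch the question-mark placeholders before they reach your product pages.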

Another opportunity to collect clean data is through your online forms. Most websites have at least a contact form to collect information. If you ask for a phone number and an email address, both should be validated against the common patterns for that type of information. If you do not validate users' email addresses, you will almost certainly end up with entries like “jim@domain.com.com”. Statistics show that 25% of data quality problems come from user error. Applying simple validation to your email text field would prompt the user to correct the address before submitting the form. After all, what good is collecting their information if it's no good?
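As a rough illustration of what simple validation might look like on the server side, here is a short Python sketch. The pattern is deliberately loose and only checks the general shape of an address, and the check for a doubled ending such as ".com.com" is a heuristic of our own rather than any standard rule. The function name and messages are hypothetical.

```python
import re
from typing import Optional

# Loose shape check: something@something.tld, with no spaces.
# It cannot prove that the mailbox actually exists.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s.]+(\.[^@\s.]+)+$")


def email_problem(address: str) -> Optional[str]:
    """Return a message describing the problem, or None if the address looks fine."""
    address = address.strip()
    if not EMAIL_PATTERN.match(address):
        return "This does not look like an email address."
    labels = address.rsplit("@", 1)[1].lower().split(".")
    # Heuristic (an assumption, not a standard rule): a repeated final
    # label such as ".com.com" is almost always a typo.
    if len(labels) >= 3 and labels[-1] == labels[-2]:
        return "The ending '." + labels[-1] + "' appears twice. Is that a typo?"
    return None
```

A form handler could call email_problem on the submitted value and show the returned message next to the field, so the visitor can fix the address before the entry is saved.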

 
