data cleansing


Posted by ian on July 09, 2001 8:58 AM

hi all

can anyone help, i have a csv file of 25,000 records with a number of dups in, what is the quickest easiest way to Cleanse these dups in Excel, I have no experiance of this so simple instructions would be nice.

thanks Ian

Posted by Ben O. on July 09, 2001 9:02 AM

The easiest way would be to import the file into Access and doing a simple query, grouping by the field that contains duplicates. A pivot table in Excel can also give you a list of unique entries, but it's harder to extract data from a pivot table, since they're mainly for reports. So, if you have access, import your data, run the query, and then export it back to Excel.

-Ben

Posted by ian on July 09, 2001 9:17 AM

what it is, is whole lines repeated, i'll see if i can work it out, never really used access, so any pointer would be nice,

cheers ian

Posted by Barrie Davidson on July 09, 2001 9:17 AM

Hi Ian, an easy way to clean out the duplicates is to first sort your data and then perform an advanced filter. To do the advanced filter, select Data|Filter|Advanced Filter - a pop-up box will appear. Select "Copy to a another location", select your list range, select your location to copy the filtered data (in the "Copy to:" box), and check the "Unique Records Only" box.

If you need any further help let me know.

Regards,
Barrie



Posted by Aladin Akyurek on July 09, 2001 9:51 AM

The freedownload ASAP Utilities might also help if you're on Windows. You can find the URL thru a search with google.

Aladin