The Assure stage of the data lifecycle is about data cleaning and quality assurance.
Data must be clean to be useful. This means deduplicating, normalizing, and correcting your data before you use it.
Deduplication: Removing redundant data
Normalization: Making data uniform across cells (e.g., all phone numbers are recorded as 000-000-0000 rather than 0000000000, and addresses are written out in full: Avenue rather than Ave, Road rather than Rd)
Correction: Removing corrupted, inaccurate, or irrelevant records from a dataset
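The three cleaning steps above can be sketched in a few lines of Python. This is only an illustration, not a prescribed method: the field names (name, phone, address), the sample records, the abbreviation map, and the rule that a phone number must have exactly 10 digits are all assumptions made for the example.

```python
import re

# Hypothetical abbreviation map (an assumption); extend it for your own data.
STREET_ABBREVIATIONS = {"Ave": "Avenue", "Rd": "Road"}


def normalize_phone(raw):
    """Return a phone number as 000-000-0000, or None if it lacks 10 digits."""
    digits = re.sub(r"\D", "", raw)  # strip everything except digits
    if len(digits) != 10:
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def normalize_address(raw):
    """Collapse extra whitespace and expand abbreviations (Ave -> Avenue)."""
    words = raw.split()
    expanded = [STREET_ABBREVIATIONS.get(w.rstrip("."), w) for w in words]
    return " ".join(expanded)


def clean(records):
    """Normalize each record, drop unusable ones, then drop duplicates."""
    cleaned = []
    seen = set()
    for rec in records:
        phone = normalize_phone(rec["phone"])
        if phone is None:
            continue  # Correction: drop records whose phone number is unusable
        row = {
            "name": rec["name"].strip(),
            "phone": phone,
            "address": normalize_address(rec["address"]),
        }
        # Deduplication: compare records only after they have been normalized
        key = (row["name"].lower(), row["phone"], row["address"].lower())
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


# Made-up sample records for illustration only.
records = [
    {"name": "Ana Diaz", "phone": "9145550100", "address": "12 Main Rd"},
    {"name": "Ana Diaz ", "phone": "914-555-0100", "address": "12 Main Road"},
    {"name": "Lee Park", "phone": "555-01", "address": "15 Dana Ave."},
]
cleaned = clean(records)
print(cleaned)
```

Normalizing before deduplicating is a deliberate choice in this sketch: the first two sample records only turn out to be duplicates once their phone numbers and addresses are written the same way. Whether to drop an unusable record or flag it for manual review depends on your project.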
During collection, make sure that all team members follow the same procedures and standards when they record data; consistent recording up front avoids many cleaning tasks later. Secondary datasets may also require some cleaning.
A free tool like OpenRefine is helpful for large cleaning tasks.