6.4 part 1
Given a set of data, prepare the data set by removing errors, validating the data, and standardizing the data. Download a data set from the Federal Aviation Administration (http://av-info.faa.gov/dd_sublevel.asp?Folder=\AID) or from one of the data repositories listed here (http://www.datasciencecentral.com/profiles/blogs/20-big-data-repositories-you-should-check-out-1).
Use either Excel or OpenRefine (http://openrefine.org/) to clean up the data set. Make sure that all of the text within each column is consistent. Save the data set in .xml, .csv, and .xls formats, and submit all three versions of the cleansed data.
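For anyone who wants to cross-check the results from Excel or OpenRefine, the same kind of cleanup can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a required part of the activity: the column names and sample rows are hypothetical (they are not taken from the FAA files), upper-casing is just one possible convention for making text consistent, and the column names are assumed to be valid XML element names. Only the .csv and .xml outputs are shown, because writing .xls needs a third-party library or the spreadsheet tool itself.

```python
import csv
import io
import xml.etree.ElementTree as ET


def clean_cell(value):
    """Trim the ends, collapse inner runs of whitespace, and upper-case the text."""
    return " ".join(value.split()).upper()


def clean_rows(rows):
    """Standardize every cell, then drop blank rows and exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        normalized = {key.strip(): clean_cell(val or "") for key, val in row.items()}
        fingerprint = tuple(normalized.items())
        if not any(normalized.values()) or fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(normalized)
    return cleaned


def to_csv(rows):
    """Serialize cleaned rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_xml(rows):
    """Serialize cleaned rows as XML text, one <record> element per row."""
    root = ET.Element("records")
    for row in rows:
        record = ET.SubElement(root, "record")
        for key, val in row.items():
            ET.SubElement(record, key).text = val
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample data, not taken from the FAA files.
RAW = """state,aircraft_make
 tx ,Cessna
TX,cessna
,
ca,  Piper   Aircraft
"""

rows = clean_rows(csv.DictReader(io.StringIO(RAW)))
csv_text = to_csv(rows)
xml_text = to_xml(rows)
```

In this sketch the second sample row is dropped as a duplicate only after standardization, which is why the text in each column is normalized before duplicates are checked. The blank row is removed in the same pass.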
6.5 part 2
Document the data cleansing process and provide step-by-step instructions.
Prepare a 3-5 page guide in which you document and describe how you cleaned the data set in Activity 3. Provide examples and screenshots to support each step in the data cleaning process. At the end of your guide, explain the importance of data integrity, data validation, data governance, and documentation. What were some of the challenges associated with the data cleansing process? Also recommend how you would change the organization and design of the data set, based on your interpretation of what the data represents.