Week 3 - Making Data Decisions

For this assignment, due to the nature of the dataset chosen, the only data management that was necessary was coding out missing data. In fact, as I discovered, the software had already disregarded the missing data, so this was not a strictly necessary step. However, it was a learning moment and allowed me to verify what I had assumed in last week's assignment, i.e. that the total number of observations (countries) was 213, based on the length, or number of rows, of the dataset. Below I discuss how I determined whether this assumption was correct.

Furthermore, I had already, without realizing it, grouped the variables into bins with pd.cut in last week's assignment so that I could produce and use frequency tables.

Replacing the Blanks with NaN

This was done as in the video with the following code (program variable for co2 used as the example):

c1 = c1.replace(r'^\s*$', np.nan, regex=True)

where c1 was the program variable assigned to the frequency distribution of the bins created.
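To make the replacement step easier to follow, here is a minimal sketch on a made-up four-row table. The column name co2emissions and the values are assumptions for illustration only, not the actual dataset. The same regular expression turns empty or whitespace-only entries into NaN, after which the missing entries can be counted directly.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for the real dataset (values are invented)
data = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "co2emissions": ["1200.5", " ", "", "340.0"],
})

# Replace empty or whitespace-only strings with NaN
data["co2emissions"] = data["co2emissions"].replace(r"^\s*$", np.nan, regex=True)

# Convert the remaining text values to numbers; NaN entries stay NaN
data["co2emissions"] = pd.to_numeric(data["co2emissions"])

# Count how many entries are missing
missing_count = int(data["co2emissions"].isna().sum())
print(missing_count)
```

Counting with isna() after the replacement is one way to see how many blanks a column contains without relying on the frequency table.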

The Program



Results

The resulting tables for the 3 variables - CO2 emissions, residential electricity use per person, and urban rate - are shown below.

CO2 emissions


Residential electricity use per person

Urban rate

Summary

As stated before, the software had automatically excluded the blank entries when generating the frequency tables. However, replacing the blanks with NaN was useful for seeing how many blanks existed in the dataset. For CO2 emissions, residential electricity use and urban rate, the number of countries with missing data was 13, 77 and 10, respectively.

As mentioned previously, I had assumed that the total number of countries observed was 213, based on the number of rows in the dataset obtained in Week 2. For all 3 variables, the sum of the frequency distribution, including the count of NaN values, was 213, indicating that the assumption was correct.
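The check described above can be sketched as follows. This is an illustration under stated assumptions: the values and the bin edges are invented, and the variable names are hypothetical. It shows that value_counts leaves NaN out by default, and that passing dropna=False includes it, so the total matches the number of rows.

```python
import numpy as np
import pandas as pd

# Hypothetical urban rate values; NaN marks countries with no data
urban = pd.Series([10.0, 35.5, np.nan, 62.0, 88.1, np.nan, 47.3])

# Group the values into bins (bin edges are illustrative only)
binned = pd.cut(urban, bins=[0, 25, 50, 75, 100])

# By default, missing values are left out of the frequency table
counts_without_nan = binned.value_counts(sort=False)

# With dropna=False, the NaN count appears as its own row
counts_with_nan = binned.value_counts(sort=False, dropna=False)

total_rows = len(urban)
print(counts_with_nan)
print(counts_without_nan.sum(), counts_with_nan.sum(), total_rows)
```

If the total including NaN equals the number of rows, every observation is accounted for, which is the same reasoning used for the 213 countries.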

The frequency distributions remained the same as in Week 2 because I had already created the bins.

