
We can use is.na() to see how many missing values we have in a dataset. We will use the built-in airquality dataset.
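As a minimal sketch of that count: is.na() returns TRUE for every missing cell, so summing the result gives the total. The variable name total_missing is my own choice for illustration.

```r
# airquality ships with base R (datasets package), so nothing needs to be installed
total_missing <- sum(is.na(airquality))  # TRUE counts as 1, FALSE as 0
total_missing                            # 44
```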

Or we can break the count down by column for greater detail.
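One way to get that detail, shown here as a sketch rather than the only option, is to sum the is.na() matrix per column. The name missing_per_column is just illustrative.

```r
# Count missing values in each column of airquality
missing_per_column <- colSums(is.na(airquality))
missing_per_column

# summary() also reports an "NA's" entry for every column that has missing values
summary(airquality)
```

Only Ozone and Solar.R should show non-zero counts.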

Yeah, finding them is simple enough. But dealing with them is another story.

Omission
For sure, we can just exclude the rows that contain missing values.
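A short sketch of omission with base R's na.omit(), which drops every row that has at least one NA. The name complete_rows is my own.

```r
# Keep only the rows with no missing values
complete_rows <- na.omit(airquality)

nrow(airquality)     # 153 rows before
nrow(complete_rows)  # 111 rows after
```

The catch is that we throw away 42 rows, including the values in them that were perfectly fine.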

Imputation
Or we can replace each missing value with the mean of its column.
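Here is a minimal sketch of mean imputation, assuming we want to fill every column the same way. It works on a copy (imputed, a name I picked) so the original data stays untouched.

```r
# Replace each NA with the mean of the non-missing values in the same column
imputed <- airquality
for (col in names(imputed)) {
  missing <- is.na(imputed[[col]])
  imputed[[col]][missing] <- mean(imputed[[col]], na.rm = TRUE)
}

sum(is.na(imputed))  # 0
```

Note that this leaves the column means unchanged but shrinks the variance, since every filled-in value sits exactly at the mean.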

That’s simple enough.

But in some cases, simplicity may not be the best answer. We can use the mice library to perform much more advanced imputation.

mice() offers a whole range of customization (link).

With m=5, the function will generate five imputed values for each missing entry, and we may average them to find the value that we will put in the data. The object returned by the function also holds a lot of other interesting results.
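The sketch below shows one way this could look. It assumes the mice package is installed and uses its default imputation method; the seed value and the variable names (imp, ozone_draws, ozone_avg, filled) are my own choices, not something from the original post.

```r
# Assumes mice is installed: install.packages("mice")
library(mice)

# m = 5 imputations; seed only makes the run reproducible, printFlag silences the log
imp <- mice(airquality, m = 5, seed = 123, printFlag = FALSE)

# imp$imp holds the imputed values: one row per missing Ozone entry, one column per imputation
ozone_draws <- imp$imp$Ozone
head(ozone_draws)

# Average the five candidates and write them back, matching on row names
ozone_avg <- rowMeans(ozone_draws)
filled <- airquality
filled[names(ozone_avg), "Ozone"] <- ozone_avg

# Alternatively, complete() returns a full dataset built from a single imputation
first_completed <- complete(imp, 1)
```

Averaging is the approach described above; inspecting imp (for example imp$method) shows what else the function returns.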

In addition to NA, I’d count Inf and NaN as other types of problem values. Although they only occur in numerical predictors, I think it’s worthwhile to check for them before creating a model.

We can use is.nan()  and is.infinite()  to check.
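Both functions operate on vectors, so this sketch applies them column by column. The toy data frame is made up purely for illustration, since airquality has no such values to find.

```r
# Hypothetical toy data: dividing by zero produces Inf, NaN and -Inf
toy <- data.frame(a = c(1, 0, -1) / 0, b = c(2.5, NA, 4))

# Count NaN and infinite values per column
nan_counts <- sapply(toy, function(col) sum(is.nan(col)))
inf_counts <- sapply(toy, function(col) sum(is.infinite(col)))

nan_counts  # a = 1, b = 0 (a plain NA is not NaN)
inf_counts  # a = 2, b = 0 (both Inf and -Inf count)
```

Keep in mind that is.na() returns TRUE for NaN as well, so an NA count already includes any NaN values.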

md.pattern(), also from mice, is an excellent function. It will show the NA pattern.
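A minimal call looks like the sketch below, again assuming mice is installed. I turn the plot off here so only the table is printed; the name pattern is illustrative.

```r
# Assumes mice is installed: install.packages("mice")
library(mice)

# Each row is one missingness pattern: 1 = observed, 0 = missing
# Row names give the number of observations showing that pattern
pattern <- md.pattern(airquality, plot = FALSE)
pattern
```

The last row of the table totals the missing values per variable, and the last column counts how many variables are missing in each pattern.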

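The first column represents the number of observations with each pattern. So there are 111 observations with 0 NA. 35 observations are missing only Ozone, and 5 are missing only Solar.R. Lastly, two observations have missing values in both Solar.R and Ozone, which brings the totals to 37 missing values for Ozone and 7 for Solar.R.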