Home Forums Main Forums Big Data How to count the number of NAs in a dataset by SAS/SQL/R/Python?

• # How to count the number of NAs in a dataset by SAS/SQL/R/Python?

3 Members · 4 Posts
• ### Datura

Member
January 22, 2021 at 10:55 am

A student had a job interview with a bank yesterday. They asked her an interesting but challenging question:

How do you count the total number of NAs (missing values) in a dataset with a lot of variables by SAS/SQL/R/Python?

This question is not easy to answer, especially for SAS and SQL. Let’s discuss about it.

• ### Justin

January 22, 2021 at 11:24 am

It is very easy to count the total number of missing values in a data frame with R and Python. Below is the code. Suppose we have a data frame named df.

R code

`sum( is.na(df) )  df[ is.na(df) ]<-  -999  ### we can replace all the missing values in this way.`

Python code: we can use either isnull() or is.na() function. Please note: isna()/notna() are same as isnull()/notnull() in Python.

`### Below code counts the number of missing values of each column.df.isnull().sum() df.isna().sum() ### Below code counts the number of missing values of the whole data frame.df.isnull().sum().sum() df.isna().sum().sum()    `

For SAS programming, it is not an easy job. Some people may think of using macro to do it. SAS macro will work it out certainly, but it is too much work. My suggestion is to use macro only when you have to. Actually, we have a much easier and better solution if we use the CMISS() function in SAS. Below gives the SAS code.

`/*cmiss() function: This function counts the number of missing arguments.*//* argument*: specifies a constant, variable, or expression. Argument can be either a character value or a numeric value.*/data AAA;set P1.New_Wireless  End=EOF ;miss=CMISS(of acctno -- sales);Total_Miss + Miss;if EOF=1 then output;run;************* The correct answer is 371,468 for the above input data. ********;`

The most challenging task is to use SQL to do it. It is very complicated and we may have to use the “information schema” to do it. Does any one have a better solution?

• ### Wong

Member
January 22, 2021 at 7:33 pm

For SQL, we can extract all column names from information_schema. Then concatenate the column names with ‘IS NULL’ in where clause to get the number.

`SET @query = CONCAT(        'SELECT COUNT(*) FROM sakila.staff WHERE CONCAT(',  (SELECT GROUP_CONCAT(COLUMN_NAME)	FROM information_schema.COLUMNS	WHERE TABLE_SCHEMA = 'sakila' AND  `TABLE_NAME = 'staff'),`        ') IS NULL');PREPARE CHECK_NULL from @query;EXECUTE CHECK_NULL;  `
• ### Datura

Member
January 31, 2021 at 11:17 am

Great job! Thanks!