Understanding Missingness in Clinical Trials of Psoriasis

PSI Wonderful Wednesday

March 2021

Recap:
Missing data are present in almost any (clinical) data set.

PSI Wonderful Wednesday created a simulated data set, based on a clinical phase III trial on Psoriasis.

The simulated outcome variable is Pain, which was collected on a visual analogue scale (range: 0-100). Greater values mean worse pain.

A dichotomized version of pain is also included in the data set: Pain reduction from baseline of at least 30.

Covariates include age, gender, and BMI.

In this hypothetical study, the main interest lies in the comparison of an active treatment arm and a placebo arm.

Data were collected at baseline and at ten follow-up time points, but the Pain endpoint has some level of missing data.

Assessing Missingness

Looking at the distribution of missing data shows increasing missingness over time
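A minimal sketch of how the share of missing Pain values per visit could be tallied. The toy values below are made up for illustration (they are not the simulated trial data), and Python is just an assumed choice of tool here.

```python
# Toy data: one entry per patient, one Pain value per visit (None = missing).
# Values are invented for illustration only.
pain = {
    "P1": [60, 55, 50, None, None],
    "P2": [70, None, 65, 60, 58],
    "P3": [40, 38, None, None, None],
    "P4": [80, 75, 70, 66, None],
}


def missing_rate_by_visit(data):
    """Return the share of patients with a missing value at each visit."""
    rows = list(data.values())
    n_visits = len(rows[0])
    return [
        sum(row[v] is None for row in rows) / len(rows)
        for v in range(n_visits)
    ]


rates = missing_rate_by_visit(pain)
for visit, rate in enumerate(rates, start=1):
    print(f"Visit {visit}: {rate:.0%} missing")
```

Plotting these rates against visit number gives the "missingness over time" view, but it says nothing yet about which patients are missing at which visits.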

But - are there within-subject patterns which might be important?

So SAD

An UpSet plot shows that the most common pattern of within-individual missingness is monotone missingness from visit 6 onwards
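The counts behind a plot like this can be sketched by labelling each patient with the set of visits they are missing and counting how often each label occurs. This is a rough illustration on invented toy data with only four visits; the helper name `missing_pattern` is hypothetical.

```python
from collections import Counter

# Toy data (invented): Pain at four visits per patient, None = missing.
pain = {
    "P1": [60, 55, None, None],
    "P2": [70, None, 65, 60],
    "P3": [40, 38, None, None],
    "P4": [80, 75, 70, 66],
    "P5": [50, 45, None, None],
}


def missing_pattern(row):
    """Return the visit numbers (1-based) at which this patient is missing."""
    return tuple(i + 1 for i, value in enumerate(row) if value is None)


patterns = Counter(missing_pattern(row) for row in pain.values())
top_pattern, top_count = patterns.most_common(1)[0]
print(f"Most common pattern: missing at visits {top_pattern} ({top_count} patients)")
```

In this toy example the most common pattern is "missing at visits 3 and 4", which is the small-scale analogue of monotone missingness from a given visit onwards.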

Maybe this needs further investigation...

It's about time...

It looks like missingness increases over time,
But this time series is hella hard to read, yo!

That's Better

By ordering patients we can see that monotone missingness increases throughout the study, with around a third of patients having monotone missingness covering at least the last two timepoints - not great if your primary endpoint is Visit 10!

This was achieved by:

  • Ordering patients based on a "monotone missingness flag", then
  • Ordering patients based on missing data at Visit 10, then
  • Ordering patients based on missing data at Visit 9 and so on
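The ordering steps above can be sketched as a single sort key. This is only an illustration on invented toy data: the definition of the "monotone missingness flag" used here (at least one missing value, and every visit after the first missing one is also missing) is an assumption, as is the helper name `is_monotone_missing`.

```python
# Toy data (invented): Pain at four visits per patient, None = missing.
pain = {
    "P1": [60, 55, None, None],
    "P2": [70, None, 65, 60],
    "P3": [40, 38, 35, None],
    "P4": [80, 75, 70, 66],
    "P5": [50, None, None, None],
}


def is_monotone_missing(row):
    """Assumed flag: once a value is missing, all later visits are missing too."""
    flags = [value is None for value in row]
    if not any(flags):
        return False
    first_missing = flags.index(True)
    return all(flags[first_missing:])


def sort_key(item):
    """Monotone flag first, then missingness at the last visit, the one before, ..."""
    _, row = item
    flags = [value is None for value in row]
    return (is_monotone_missing(row), *reversed(flags))


# reverse=True puts flagged / missing patients at the top of the plot.
ordered = sorted(pain.items(), key=sort_key, reverse=True)
for patient_id, row in ordered:
    print(patient_id, "".join("." if v is None else "X" for v in row))
```

Because the key is a tuple, ties on the monotone flag are broken by the last visit, then the second to last, and so on, which is what produces the staircase shape in the ordered plot.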

Missing Data can take many forms...

We often assume missing at random


This just means that whether a value is missing is related in some way to the data we do have, and not to the unseen values themselves

Assessing MAR

If we plot the demographic data split into two groups:
1) those with data (BLUE), and
2) those with missing data (GREY)


We can see whether some demographics are more related to missingness than others



*NOTE:
- timepoints included here are 6-10
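A rough sketch of the grouping behind this kind of plot, using age as the example covariate. The records are invented toy data, and treating a patient as "missing" if any of visits 6-10 is missing is an assumption made for this illustration.

```python
from statistics import mean

# Toy records (invented): age plus Pain at follow-up visits 1-10, None = missing.
patients = [
    {"id": "P1", "age": 30, "pain": [50] * 10},
    {"id": "P2", "age": 40, "pain": [45] * 10},
    {"id": "P3", "age": 60, "pain": [55] * 7 + [None] * 3},
    {"id": "P4", "age": 70, "pain": [60] * 5 + [None] * 5},
]


def has_missing(patient, first_visit=6, last_visit=10):
    """Assumed rule: True if any visit in the window has a missing Pain value."""
    window = patient["pain"][first_visit - 1:last_visit]
    return any(value is None for value in window)


# Split ages into the two groups that would be coloured blue and grey.
groups = {"observed": [], "missing": []}
for patient in patients:
    label = "missing" if has_missing(patient) else "observed"
    groups[label].append(patient["age"])

mean_age = {label: mean(ages) for label, ages in groups.items()}
print(mean_age)
```

The same split can be repeated for gender and BMI; a covariate whose distribution differs clearly between the two groups is a candidate for explaining the missingness.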


Time hasn't been kind...

It looks like the older participants are more likely to have missing data.

But that's OK! Kind of...

By knowing that missingness is related to age, we can use this variable to help us predict what these patients may have scored if their data had not been missing!

Making stuff up

We can use multiple imputation to "fill in" the gaps in a dataset based on the other variables.

When we know which variables are most related to the missing data, we can use them to inform our imputation

"Multiple" because we make many datasets and then combine them
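To make the "many datasets, then combine" idea concrete, here is a deliberately simplified sketch on invented toy data. It is not the method used for the graphs in this deck: it fills gaps with a straight-line prediction from age plus random noise (stochastic regression imputation), and a proper multiple imputation would also draw the regression parameters themselves. The combining step uses Rubin's rules. All function names are hypothetical.

```python
import random
from statistics import mean, variance

# Toy data (invented): Pain at one visit, missing for two older patients.
ages = [30, 35, 40, 50, 60, 65, 70]
pains = [40, 42, 45, 50, None, 58, None]


def fit_line(xs, ys):
    """Least-squares line y = intercept + slope * x, plus residual SD."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in residuals) / (len(xs) - 2)) ** 0.5
    return intercept, slope, sd


def impute_once(ages, pains, rng):
    """Return one completed dataset: observed values kept, gaps predicted + noise."""
    observed = [(a, p) for a, p in zip(ages, pains) if p is not None]
    intercept, slope, sd = fit_line(
        [a for a, _ in observed], [p for _, p in observed]
    )
    return [
        p if p is not None else intercept + slope * a + rng.gauss(0, sd)
        for a, p in zip(ages, pains)
    ]


def pool(estimates, variances):
    """Rubin's rules: pooled estimate and total variance across imputations."""
    m = len(estimates)
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 divisor)
    return mean(estimates), within + (1 + 1 / m) * between


rng = random.Random(2021)  # fixed seed so the sketch is reproducible
m = 20
completed = [impute_once(ages, pains, rng) for _ in range(m)]

# Analyse each completed dataset (here: mean Pain and its variance), then pool.
estimates = [mean(d) for d in completed]
variances = [variance(d) / len(d) for d in completed]
pooled_mean, total_var = pool(estimates, variances)
print(f"Pooled mean Pain: {pooled_mean:.1f} (SE {total_var ** 0.5:.1f})")
```

The pooled variance is never smaller than the average within-dataset variance, which is how the extra uncertainty from not knowing the missing values is carried into the final result.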

No Real Difference

The graph based on the imputed data shows no real difference

This is good, really, as we can have some confidence that the age-related missingness was not biasing the results

Imputation Station