How to perform a multivariable analysis when you have too few observations

It can be surprising to find that a multivariable analysis cannot be carried out because there are too few subjects, even though the file contains several hundred observations (patients, subjects).

Linear regressions

For linear regressions, i.e. multivariable analyses in which the outcome variable is numerical, you need at least 10 observations per covariate.
One refinement: a categorical covariate with N classes counts as N-1 variables. For example, take the categorical variable “satisfaction” with the following 5 classes:

  • Not at all satisfied
  • Somewhat dissatisfied
  • Moderately satisfied
  • Somewhat satisfied
  • Very satisfied

When this variable is used in a statistical model, it is automatically coded into 4 dummy variables, each of which is 0 or 1.

Satisfaction            Very satisfied   Somewhat satisfied   Moderately satisfied   Somewhat dissatisfied
Very satisfied          1                0                    0                      0
Somewhat satisfied      0                1                    0                      0
Moderately satisfied    0                0                    1                      0
Somewhat dissatisfied   0                0                    0                      1
Not at all satisfied    0                0                    0                      0

Tip: if you do not have enough subjects, start by grouping the classes of the categorical variables.
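
The counting rule for linear regressions can be sketched in a few lines of Python. This is only an illustration, not how pvalue.io works internally: the function names (dummy_code, model_terms, enough_for_linear), the "age" covariate and the sample size of 40 are all hypothetical. The only things taken from the text are the threshold of 10 observations per variable and the N-1 rule for categorical covariates.

```python
def dummy_code(value, levels, reference):
    """Return the 0/1 indicators for `value`, one per non-reference level."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return [1 if value == level else 0 for level in levels if level != reference]


def model_terms(covariates):
    """Count model variables: 1 per numerical covariate, N-1 per categorical one.

    `covariates` maps a name to its number of classes, or to None if numerical.
    """
    return sum(1 if n is None else n - 1 for n in covariates.values())


def enough_for_linear(n_observations, covariates, per_variable=10):
    """Check the rule of at least `per_variable` observations per variable."""
    return n_observations >= per_variable * model_terms(covariates)


# Levels in the same order as the columns of the table above.
levels = [
    "Very satisfied",
    "Somewhat satisfied",
    "Moderately satisfied",
    "Somewhat dissatisfied",
    "Not at all satisfied",
]
print(dummy_code("Very satisfied", levels, reference="Not at all satisfied"))
# [1, 0, 0, 0]

# Hypothetical model: one numerical covariate plus "satisfaction".
five_classes = {"age": None, "satisfaction": 5}
three_classes = {"age": None, "satisfaction": 3}  # after grouping classes

print(model_terms(five_classes))             # 5
print(enough_for_linear(40, five_classes))   # False: 40 < 10 * 5
print(enough_for_linear(40, three_classes))  # True: 40 >= 10 * 3
```

With 40 observations in this made-up example, the 5-class version of “satisfaction” is too costly, while grouping it into 3 classes brings the model back under the limit, which is the point of the tip above.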

Logistic regressions and survival analyses

For logistic regressions and survival analyses, i.e. when the outcome variable is binary, the rule is slightly more complex. There must still be at least 10 observations per variable, but be careful: the count is not based on the total number of subjects. It is based on the number of subjects for whom the outcome variable is 0 and on the number for whom it is 1, so the smaller of the two groups sets the limit.

Thus, if there are 179 subjects, of whom 29 have Y = 0 and 150 have Y = 1, the smaller group (29) sets the limit: 29 / 10 = 2.9, so the model can include at most 2 covariates.
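
The same arithmetic as a short Python sketch. The function name max_covariates is hypothetical and this is not pvalue.io's actual code; it only applies the rule of 10 observations per variable to the smaller outcome group.

```python
def max_covariates(n_outcome_0, n_outcome_1, per_variable=10):
    """Largest number of variables allowed by the smaller outcome group."""
    limiting_group = min(n_outcome_0, n_outcome_1)
    return limiting_group // per_variable  # round down


print(max_covariates(29, 150))  # 2
```

As in the linear case, a categorical covariate with N classes uses up N-1 of these variables.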

6 Comments
  • HHShah
    Posted at 18:44h, 02 December

    We want to evaluate three different components, each having three or more variables, to analyze the outcome variable (four outcome variables), and we need to infer which components are determinants of the outcome variable.

    • Kevin
      Posted at 13:08h, 03 December

      I’m not sure I understand perfectly. Do you mean that you want to perform an explanatory analysis of a categorical outcome variable having more than 2 categories? If that’s the case, you need to perform a so-called multinomial logistic regression, which is currently not possible with pvalue.io.

  • alaa roushdy
    Posted at 00:44h, 14 June

    I want to do a multivariate analysis with a binary outcome variable and include as explanatory variables all the variables that were significant in the univariate analysis,
    but your site gives me a pop-up message saying that too many explanatory variables were chosen.
    I used to do this with MedCalc software without any problem.
    Why are you putting a limit on the number of variables that can be used in a multivariate model?

    • Kevin
      Posted at 06:24h, 14 June

      The difference between pvalue.io and other statistical software is that it is aimed at people who are not professionals in statistical analysis. In particular, there are a certain number of conditions to be met in the statistical models, including a limited number of covariates. Typically these are not checked in other software. This is why pvalue.io allows you to make correct statistical analyses even without advanced knowledge in statistics.

  • Teshome Gensa GETA
    Posted at 14:12h, 11 February

    Is it appropriate to run a logistic regression when there are extremely few observations in one category of the outcome variable? For example, with a total sample of five thousand participants, 99% have outcome yes (Y=0) whereas 1% have outcome no (Y=1). Can anyone help me with the explanation behind it?
