03 Jul How to perform a multivariable analysis when you have too few observations
It can be surprising to find that a multivariable analysis cannot be carried out because there are too few subjects, even though the file contains several hundred observations (patients, subjects).
Linear regressions
For linear regressions, i.e. multivariable analyses in which the outcome variable is numerical, you need at least 10 observations per covariate.
One refinement: a categorical covariate with N classes counts as N-1 variables. For example, take the categorical variable “satisfaction”, which has the following 5 classes:
- Not at all satisfied
- Somewhat dissatisfied
- Moderately satisfied
- Somewhat satisfied
- Very satisfied
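When this variable is used in a statistical model, it is automatically coded into 4 dummy variables, each of which is 0 or 1. The remaining class (here “Not at all satisfied”) serves as the reference and is coded 0 on all four dummy variables.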
Satisfaction | Very satisfied | Somewhat satisfied | Moderately satisfied | Somewhat dissatisfied |
Very satisfied | 1 | 0 | 0 | 0 |
Somewhat satisfied | 0 | 1 | 0 | 0 |
Moderately satisfied | 0 | 0 | 1 | 0 |
Somewhat dissatisfied | 0 | 0 | 0 | 1 |
Not at all satisfied | 0 | 0 | 0 | 0 |
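The coding in the table can be reproduced by hand. The following is a minimal sketch in plain Python, not a description of how pvalue.io or any particular statistical package implements it. The function names `dummy_code`, `count_model_variables` and `required_observations` are hypothetical, the numerical covariate "age" is invented for the illustration, and the choice of “Not at all satisfied” as the reference class simply follows the table.

```python
# The 5 classes of the "satisfaction" variable from the article.
LEVELS = [
    "Not at all satisfied",
    "Somewhat dissatisfied",
    "Moderately satisfied",
    "Somewhat satisfied",
    "Very satisfied",
]
# Assumption: the reference class is the one coded 0 everywhere in the table.
REFERENCE = "Not at all satisfied"


def dummy_code(value, levels=LEVELS, reference=REFERENCE):
    """Return the N-1 dummy variables (each 0 or 1) for one observation."""
    if value not in levels:
        raise ValueError(f"unknown class: {value!r}")
    return {level: int(value == level) for level in levels if level != reference}


def count_model_variables(covariates):
    """Count variables as the model sees them.

    `covariates` maps a covariate name to its number of classes,
    or to None when the covariate is numerical.
    """
    total = 0
    for n_classes in covariates.values():
        total += 1 if n_classes is None else n_classes - 1
    return total


def required_observations(covariates, per_variable=10):
    """Minimum number of observations under the 10-per-variable rule."""
    return per_variable * count_model_variables(covariates)


print(dummy_code("Somewhat satisfied"))
# "age" is a hypothetical numerical covariate: 1 + (5 - 1) = 5 variables.
print(required_observations({"age": None, "satisfaction": 5}))
```

With one numerical covariate and the 5-class satisfaction variable, the model contains 5 variables, so the rule asks for at least 50 observations in a linear regression.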
Logistic regressions and survival analyses
For logistic regressions and survival analyses, i.e. when the outcome variable is binary, the rule is slightly more complex. There must still be at least 10 observations per variable, but this is not calculated on the total number of subjects. It is calculated separately on the number of subjects for whom the outcome variable is 0 and on the number for whom it is 1, so the smaller of the two groups sets the limit.
Thus, if there are 179 subjects, 29 patients with Y = 0 and 150 with Y = 1, the maximum number of covariates will be 2, because the smaller group (29 patients) allows only 29 / 10 = 2.9, rounded down.
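The arithmetic behind this example can be written as a short sketch. The function name `max_covariates` and its parameters are hypothetical, and the threshold of 10 per variable is the rule of thumb stated in this article, not a setting taken from any software.

```python
def max_covariates(n_outcome_0, n_outcome_1, per_variable=10):
    """Largest number of variables allowed for a binary outcome.

    The smaller of the two outcome groups is the limiting one,
    and the result is rounded down to a whole number of variables.
    """
    limiting_group = min(n_outcome_0, n_outcome_1)
    return limiting_group // per_variable


# The example from the article: 29 patients with Y = 0, 150 with Y = 1.
print(max_covariates(29, 150))  # prints 2
```

Remember that a categorical covariate with N classes uses up N-1 of these variables.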
HHShah
Posted at 18:44h, 02 December
We want to evaluate the three different components each having three or more variable, to analyze the outcome variable (four outcome variable) and need to do inference that which component are determinant of outcome variable?
Kevin
Posted at 13:08h, 03 December
I’m not sure I understand perfectly. Do you mean that you want to perform an explanatory analysis of a categorical outcome variable having more than 2 categories? If that’s the case, you need to perform a so-called multinomial logistic regression, which is currently not possible with pvalue.io.
alaa roushdy
Posted at 00:44h, 14 June
I want to do a multivariate analysis with a binary outcome variable and want to include in the explanatory variable all the variables which were significant in the univariate analysis
but your site give me a pop up message that the explanatory variables chosen are too much
i used to do this with medcalc software without any problem
why are you putting a limit to the number of variables that can be used in a multivariate model
Kevin
Posted at 06:24h, 14 June
The difference between pvalue.io and other statistical software is that it is aimed at people who are not professionals in statistical analysis. In particular, there are a certain number of conditions to be met in the statistical models, including a limited number of covariates. Typically these are not checked in other software. This is why pvalue.io allows you to make correct statistical analyses even without advanced knowledge in statistics.
Teshome Gensa GETA
Posted at 14:12h, 11 February
Is it proper to run logistic regression analysis with extremely few observations on outcome variable? For example, with total sample of five thousand participants, 99% having outcome yes (Y=0) whereas 1% having outcome no (Y=1). Any one can help me with explanation behind it.
Kevin
Posted at 09:09h, 26 July
Yes, please visit: How to perform a multivariable analysis when you have too few observations, based on Peduzzi, P., Concato, J., Kemper, E., Holford, T. R. & Feinstein, A. R. A simulation study of the number of events per variable in logistic regression analysis. J Clin Epidemiol 49, 1373–1379 (1996).