Home on James Uanhoro
https://www.jamesuanhoro.com/
Recent content in Home on James Uanhoro
Mon, 25 Nov 2019 11:43:14 -0500
Problems with using odds ratios as effect sizes in binary logistic regression and alternative approaches
https://www.jamesuanhoro.com/post/2019/11/25/problems-with-using-odds-ratios-as-effect-sizes-in-binary-logistic-regression-and-alternative-approaches/
Mon, 25 Nov 2019 11:43:14 -0500
This is my first (first author) journal article. We started writing it in Summer 2018 and first submitted it by November 2018, so my thinking has changed somewhat since then. There is a non-paywalled version below, but the version of record is available at the Journal of Experimental Education, http://www.tandfonline.com/10.1080/00220973.2019.1693328. The best part of the review process was communicating with the editor, Professor Brian French - he was very kind.
Here’s the PDF and GitHub repository with simulation and figure reproduction code in R (very few comments).
How do you communicate the evidence for the practical significance of an intervention using Bayesian methods?
https://www.jamesuanhoro.com/post/2019/09/26/how-do-you-communicate-the-evidence-for-the-practical-significance-of-an-intervention-using-bayesian-methods/
Thu, 26 Sep 2019 00:00:00 +0000
I was listening to Frank Harrell on the Plenary Sessions podcast talk about Bayesian methods applied to clinical trials. I’d recommend that anyone interested in, or considering applying, Bayesian methods listen to the episode.
One thing I liked was their discussion on communicating the evidence for practical significance in a transparent way. They were talking about how a paper they had read had a very good example of this. The episode doesn’t have notes so I did not see the paper.
When data are not so informative, it pays to choose the sum score over the factor score
https://www.jamesuanhoro.com/post/2019/08/02/when-data-are-not-so-informative-it-pays-to-choose-the-sum-score-over-the-factor-score/
Fri, 02 Aug 2019 00:00:00 +0000
When I first came across the recent preprint by McNeish and Gordon Wolf on Twitter on how sum scores are factor scores from a heavily constrained model, my first reaction was: don’t we all know this already? Sacha Epskamp asked the same question and there’s a discussion that follows about how people who study these topics know this, but applied researchers may not.
I skimmed the paper and the authors show how sum scores are factor scores when all item error variances are the same and all loadings are the same across items.
Multidimensional CFA with RStan
https://www.jamesuanhoro.com/post/2018/11/28/multidimensional-cfa-with-rstan/
Wed, 28 Nov 2018 00:00:00 +0000
For starters, here’s the Stan code. And here’s the R script: Stan code. If you are already familiar with RStan, the basic concepts you need to combine are standard multilevel models with correlated random slopes and heteroskedastic errors.
I will embed R code into the demonstration. The required packages are lavaan, lme4 and RStan.
I like to understand most statistical methods as regression models. This way, it’s easy to understand the claims underlying a large number of techniques.
Possibility of heteroskedasticity is a good reason not to dichotomize a continuous variable for use as outcome in logistic regression.
https://www.jamesuanhoro.com/post/2018/10/20/possibility-of-heteroskedasticity-is-a-good-reason-not-to-dichotomize-a-continuous-variable-for-use-as-outcome-in-logistic-regression./
Sat, 20 Oct 2018 00:00:00 +0000
Continuing on whether it’s a good idea to dichotomize continuous variables prior to analysis for substantive reasons, I think I have settled on the side of "bad idea". The major reason is potential heteroskedasticity of the error term in the linear regression model for the original continuous variable.
This is an interesting issue, but one that I do not want to devote time to write about. So I decided to write a brief methods note.
Should you perform logistic regression on a dichotomized continuous variable when you have access to the continuous variable? I'm not sure.
https://www.jamesuanhoro.com/post/2018/10/07/should-you-perform-logistic-regression-on-a-dichotomized-continuous-variable-when-you-have-access-to-the-continuous-variable-im-not-sure./
Sun, 07 Oct 2018 00:00:00 +0000
See here for an update; I recommend not dichotomizing, but it pays to read this first.
A standard situation in education or medicine is that we have a continuous measure with cut-points that are clinically or practically significant. An example is BMI, where I hear 30 matters. You may have an achievement test with 70 as the pass score. When this happens, researchers may sometimes be interested in modeling BMI over 30 or pass/fail as the outcome of interest.
Two group mean and variance comparisons
https://www.jamesuanhoro.com/post/2018/10/06/two-group-mean-and-variance-comparisons/
Sat, 06 Oct 2018 00:00:00 +0000
Someone asked an interesting question on Cross Validated recently about comparing the means and variances of two groups. They had a substantive interest in exploring variance and mean differences between two groups. They were thinking of using the Shapiro-Wilk test to check the data for normality, Levene’s test or the F-test for variance comparisons depending on the results of Shapiro-Wilk, and Mann-Whitney or Welch’s test for mean comparisons, again depending on the Shapiro-Wilk result.
Modeling the error variance to account for heteroskedasticity
https://www.jamesuanhoro.com/post/2018/05/07/modeling-the-error-variance-to-account-for-heteroskedasticity/
Mon, 07 May 2018 00:00:00 +0000
One of the assumptions that comes with applying OLS estimation for regression models in the social sciences is homoskedasticity; I prefer the term constant error variance (it also goes by spherical disturbances). It implies that there is no systematic pattern to the error variance, meaning the model is equally poor at all levels of prediction.
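To make constant versus non-constant error variance concrete, here is a small simulation sketch in Python, used purely for illustration since the post itself works in R. The coefficients, sample size, seed, and helper names (fit_ols, residual_sd_ratio) are all assumptions made for this example. It fits a simple regression and compares the spread of the residuals in the upper half of the predictor with the spread in the lower half.

```python
import random
import statistics


def fit_ols(x, y):
    """Return (intercept, slope) of a simple least-squares line."""
    mean_x = statistics.mean(x)
    mean_y = statistics.mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residual_sd_ratio(x, y):
    """Residual SD for x above its median divided by residual SD below it."""
    intercept, slope = fit_ols(x, y)
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    cut = statistics.median(x)
    low = [r for xi, r in zip(x, resid) if xi <= cut]
    high = [r for xi, r in zip(x, resid) if xi > cut]
    return statistics.pstdev(high) / statistics.pstdev(low)


rng = random.Random(2018)  # fixed seed so the sketch is reproducible
n = 2000
x = [rng.uniform(1, 10) for _ in range(n)]

# Homoskedastic: the error SD is 1 at every value of x.
y_const = [2 + 0.5 * xi + rng.gauss(0, 1) for xi in x]
# Heteroskedastic: the error SD grows in proportion to x.
y_fan = [2 + 0.5 * xi + rng.gauss(0, 0.3 * xi) for xi in x]

ratio_const = residual_sd_ratio(x, y_const)
ratio_fan = residual_sd_ratio(x, y_fan)
print(f"constant variance ratio: {ratio_const:.2f}")
print(f"fanning variance ratio:  {ratio_fan:.2f}")
```

Under constant error variance the ratio should sit close to 1; when the error SD grows with the predictor, the ratio is clearly above 1, which is the systematic pattern the assumption rules out.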
This assumption is important for OLS to be the best linear unbiased estimator (BLUE). Heteroskedasticity, the complement of homoskedasticity, does not bias OLS; however, it causes OLS to be inefficient, losing the “best” property in BLUE.
Simulating data from regression models
https://www.jamesuanhoro.com/post/2018/05/07/simulating-data-from-regression-models/
Mon, 07 May 2018 00:00:00 +0000
My preferred approach to validating regression models is to simulate data from them, and see if the simulated data capture relevant features of the original data. A basic feature of interest would be the mean. I like this approach because it is extendable to the family of generalized linear models (logistic, Poisson, gamma, …) and other regression models, say t-regression. It’s something Gelman and Hill cover in their regression text.
Using binary regression software to model ordinal data as a multivariate GLM
https://www.jamesuanhoro.com/post/2018/02/12/using-binary-regression-software-to-model-ordinal-data-as-a-multivariate-glm/
Mon, 12 Feb 2018 00:00:00 +0000
I have read that the most common model for analyzing ordinal data is the cumulative link logistic model, coupled with the proportional odds assumption. Essentially, you treat the outcome as if it were the categorical manifestation of a continuous latent variable. The predictor variables of this outcome influence it in one way only, so you get a single regression coefficient for each predictor. But the model has several intercepts representing the points at which the variable was cut to create the observed categorical manifestation.
Using glmer() to perform Rasch analysis
https://www.jamesuanhoro.com/post/2018/01/02/using-glmer-to-perform-rasch-analysis/
Tue, 02 Jan 2018 00:00:00 +0000
I’ve been interested in the relationship between ordinal regression and item response theory (IRT) for a few months now. There are several helpful papers on the topic, here are some randomly picked ones 1 2 3 4 5, and a book.6 In this post, I focus on Rasch analysis. To do any of these analyses as a regression, your data need to be in long format - single column identifying items (regression predictor), single column with item response categories (regression outcome), and column holding the person ID.
A Chi-Square test of close fit in covariance-based SEM
https://www.jamesuanhoro.com/post/2017/11/16/a-chi-square-test-of-close-fit-in-covariance-based-sem/
Thu, 16 Nov 2017 00:00:00 +0000
TLDR: If you can assume close fit for the RMSEA, there is no reason why you cannot for a Chi-Square test in SEMs. The method to do this is relatively simple, and may cause SEM practitioners to reconsider the Chi-Square test.
When assessing the fit of structural equation models, it is common for applied researchers to dismiss the $\chi^2$ test because it will almost always detect a statistically significant discrepancy between your model and the data, given a large enough sample size.
Misspecification and fit indices in covariance-based SEM
https://www.jamesuanhoro.com/post/2017/10/28/misspecification-and-fit-indices-in-covariance-based-sem/
Sat, 28 Oct 2017 00:00:00 +0000
TLDR: If you have good measurement quality, conventional benchmarks for fit indices may lead to bad decisions. Additionally, global fit indices are not informative for investigating misspecification.
I am working with one of my professors, Dr. Jessica Logan, on a checklist for the developmental progress of young children. We intend to take this down the IRT route (or ordinal logistic regression), but currently, this is all part of a factor analysis course project.
Little's MCAR test at different sample sizes
https://www.jamesuanhoro.com/post/2017/09/21/littles-mcar-test-at-different-sample-sizes/
Thu, 21 Sep 2017 00:00:00 +0000
TLDR: Little’s MCAR test is unable to tell data that are MCAR from data that are MAR in small samples, but maintains the nominal error rate when the null is true across a wide range of sample sizes.
I just found out about the R simglm package and decided to do a small simulation to test Little’s MCAR test1 under different sample sizes. I could have investigated heteroskedasticity in linear regression instead, and I probably will in the future.
Theil-Sen regression in R
https://www.jamesuanhoro.com/post/2017/09/21/theil-sen-regression-in-r/
Thu, 21 Sep 2017 00:00:00 +0000
TLDR: When performing a simple linear regression, if you have any concern about outliers or heteroskedasticity, consider the Theil-Sen estimator.
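To make the estimator named in the TLDR concrete, here is a minimal sketch in Python (the post itself works in R, so this is an illustration and not the post's code). The slope is the median of the slopes between every pair of points. The function names, the toy data, and the intercept rule (the median of y - slope * x, one common convention) are assumptions made for this example.

```python
from itertools import combinations
from statistics import mean, median


def theil_sen(x, y):
    """Return (intercept, slope) using the median of all pairwise slopes."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(x, y), 2)
        if x2 != x1  # skip pairs with identical x, whose slope is undefined
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct x values")
    slope = median(slopes)
    # Assumed convention for the intercept: median of the slope-adjusted y.
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return intercept, slope


def ols_slope(x, y):
    """Ordinary least-squares slope, for comparison only."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Toy data: y = 2x + 1 exactly, except for one gross outlier at the end.
x = [1, 2, 3, 4, 5, 6, 7]
y = [3, 5, 7, 9, 11, 13, 100]

ts_intercept, ts_slope = theil_sen(x, y)
print("Theil-Sen:", ts_intercept, ts_slope)
print("OLS slope:", ols_slope(x, y))
```

In this toy example the single outlier pulls the least-squares slope far above 2, while the median of the pairwise slopes stays at 2, which illustrates the robustness the post refers to.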
A simple linear regression estimator that is not commonly used or taught in the social sciences is the Theil-Sen estimator. This is a shame given that this estimator is very intuitive, once you know what a slope means. Three steps:
1. Plot a line between every pair of points in your data
2. Calculate the slope of each line
3. The median slope is your regression slope
Calculating the slope this way happens to be quite robust.
Linear regression with violation of heteroskedasticity with small samples
https://www.jamesuanhoro.com/post/2017/09/19/linear-regression-with-violation-of-heteroskedasticity-with-small-samples/
Tue, 19 Sep 2017 00:00:00 +0000
TLDR: In small samples, the wild bootstrap implemented in the R hcci package is a good bet when heteroskedasticity is a concern.
Today while teaching the multiple regression lab, I showed the class the standardized residuals versus standardized predictor plot SPSS lets you produce. It is the plot we typically use to assess homoskedasticity. The sample size for the analysis was 44. I mentioned how the regression slopes are fine under heteroskedasticity, but inference ($t$, $SE$, $p$-value) may be problematic.
On the interpretation of regression coefficients
https://www.jamesuanhoro.com/post/2017/08/11/on-the-interpretation-of-regression-coefficients/
Fri, 11 Aug 2017 00:00:00 +0000
TLDR: We should interpret regression coefficients for continuous variables as we would descriptive dummy variables, unless we intend to make causal claims.
I am going to be teaching regression labs in the Fall, and somehow, I stumbled onto Gelman and Hill’s Data analysis using regression and multilevel/hierarchical models.1 So I started reading it and it’s a good book.
A useful piece of advice they give is to interpret regression coefficients in a predictive manner (p.