Home on James Uanhoro

https://www.jamesuanhoro.com/
Recent content in Home on James Uanhoro
Fri, 08 Nov 2024 00:00:00 +0000

The limitations of small variance priors lie in their implementation
https://www.jamesuanhoro.com/post/2024/11/08/the-limitations-of-small-variance-priors-lie-in-their-implementation/
Fri, 08 Nov 2024 00:00:00 +0000
Thanks to Muthén & Asparouhov (2012), hereafter MA2012, small variance priors are a common feature of Bayesian SEMs. The idea is: when we are not sure that a parameter is zero, we can place a prior on it that attempts to constrain the parameter to zero. A non-zero parameter will hopefully escape the constraint, while zero parameters will have most of their posterior concentrated around zero.
Estimation of parameters constrained to zero
In latent variable models, we assume relations between observed items are captured or mediated via the latent variables.

Hierarchical ordinal regression for analysis of single subject data OR Bayesian estimation of overlap and other effect sizes
https://www.jamesuanhoro.com/post/2024/04/14/hierarchical-ordinal-regression-for-analysis-of-single-subject-data-or-bayesian-estimation-of-overlap-and-other-effect-sizes/
Sun, 14 Apr 2024 00:00:00 +0000
Background
I’ve been interested in analysis of single case designs (SCD) using hierarchical models: each case is unique but we can improve estimation for each case by sharing data across cases. Given that data from SCD are often atypical, I’ve thought such data are a good candidate for ordinal regression. Broadly, we can model unidimensional continuous, count, and ordinal data as ordinal, since these data are also ordinal.
If we knew the exact distribution for the data, we could perform more efficient analyses, but we often don’t.

AERA 2023 Presentation: Modeling misspecification as a parameter in Bayesian SEMs
https://www.jamesuanhoro.com/post/2023/05/04/aera-2023-presentation-modeling-misspecification-as-a-parameter-in-bayesian-sems/
Thu, 04 May 2023 00:00:00 +0000
Presentation
Please find a copy of the AERA 2023 SEM SIG presentation here:
Manuscript
And a version of the manuscript here:
The Open Science Framework repository for code in the paper is: https://osf.io/29ydb/
R package
The following links all have installation instructions:
package website: https://jamesuanhoro.github.io/minorbsem/
r-universe: https://jamesuanhoro.r-universe.dev/minorbsem
GitHub: https://github.com/jamesuanhoro/minorbsem/
Please reach out to me if you run into any problems during installation or package use.
Older blog post about the package: The minorbsem package in R

The minorbsem package in R
https://www.jamesuanhoro.com/post/2023/04/02/the-minorbsem-package-in-r/
Sun, 02 Apr 2023 08:43:23 -0500
I’ve been working on the minorbsem package in R since the start of February. It’s my first proper attempt at an R package. The package allows you to fit Bayesian SEMs while accounting for: the possibility that minor factors influence the correlation between items (Uanhoro, 2023), or adventitious error (Wu & Browne, 2015). One can also fit meta-analytic SEMs under similar assumptions (Uanhoro, 2022). The package is hosted on both GitHub and r-universe; r-universe is really great.

Hierarchical Covariance Estimation Approach to Meta-Analytic Structural Equation Modeling
https://www.jamesuanhoro.com/post/2022/12/14/hierarchical-covariance-estimation-approach-to-meta-analytic-structural-equation-modeling/
Wed, 14 Dec 2022 00:00:00 +0000
Contents: Manuscript; Summary twitter thread; Things I would change
Manuscript
There is a non-pay-walled version below, but the version of record is available at Structural Equation Modeling, http://www.tandfonline.com/10.1080/10705511.2022.2142128. Here’s the PDF and OSF repository with simulation and data analysis code in R. Also, some of the methods in this paper are now implemented in the minorbsem R package: website, r-universe. Some points about the paper from a Twitter thread I just typed up, and some things I would change at the bottom:

Traditional SEM overly focuses on LISREL equations to the detriment of dealing with measurement error
https://www.jamesuanhoro.com/post/2021/01/18/traditional-sem-overly-focuses-on-lisrel-equations-to-the-detriment-of-dealing-with-measurement-error/
Mon, 18 Jan 2021 00:00:00 +0000
So I don’t listen to the Quantitude podcast. It’s really difficult for me to get into it when my playlist includes podcasts with episode titles like: Interstellar Jihad. However, I saw several tweets inspired by the SEM vs. regression episode. And I have too many thoughts, opinions, and feelings to stay quiet. I lay them out here.
SEM is regression
A case of cultural differences
The actual difference is LISREL
But too much LISREL, not enough measurement error
LISREL works well for normal-data normal-model no manifest covariates models
Covariance matrix \(\to\) LISREL likely fails with discrete indicators
LISREL likely fails for SEMs with observed covariates (e.

Problems with using odds ratios as effect sizes in binary logistic regression and alternative approaches
https://www.jamesuanhoro.com/post/2019/11/25/problems-with-using-odds-ratios-as-effect-sizes-in-binary-logistic-regression-and-alternative-approaches/
Mon, 25 Nov 2019 11:43:14 -0500
This is my first (first author) journal article. We started writing it in Summer 2018, with first submission by November 2018. So my thinking has changed somewhat since then. There is a non-pay-walled version below, but the version of record is available at the Journal of Experimental Education, http://www.tandfonline.com/10.1080/00220973.2019.1693328. The best part of the review process was communicating with the editor, Professor Brian French - he was very kind. Here’s the PDF and GitHub repository with simulation and figure reproduction code in R (very few comments).

How do you communicate the evidence for the practical significance of an intervention using Bayesian methods?
https://www.jamesuanhoro.com/post/2019/09/26/how-do-you-communicate-the-evidence-for-the-practical-significance-of-an-intervention-using-bayesian-methods/
Thu, 26 Sep 2019 00:00:00 +0000
I was listening to Frank Harrell on the Plenary Sessions podcast talk about Bayesian methods applied to clinical trials.
I’d recommend anyone interested in or considering applying Bayesian methods listen to the episode. One thing I liked was their discussion on communicating the evidence for practical significance in a transparent way. They were talking about how a paper they had read had a very good example of this. The episode doesn’t have notes so I did not see the paper.

When data are not so informative, it pays to choose the sum score over the factor score
https://www.jamesuanhoro.com/post/2019/08/02/when-data-are-not-so-informative-it-pays-to-choose-the-sum-score-over-the-factor-score/
Fri, 02 Aug 2019 00:00:00 +0000
When I first came across the recent preprint by McNeish and Gordon Wolf on Twitter on how sum scores are factor scores from a heavily constrained model, my first reaction was: don’t we all know this already? Sacha Epskamp asked the same question and there’s a discussion that follows about how people who study these topics know this, but applied researchers may not. I skimmed the paper and the authors show how sum scores are factor scores when all item error variances are the same and all loadings are the same across items.

Multidimensional CFA with RStan
https://www.jamesuanhoro.com/post/2018/11/28/multidimensional-cfa-with-rstan/
Wed, 28 Nov 2018 00:00:00 +0000
For starters, here’s the Stan code. And here’s the R script: Stan code. If you are already familiar with RStan, the basic concepts you need to combine are standard multilevel models with correlated random slopes and heteroskedastic errors. I will embed R code into the demonstration. The required packages are lavaan, lme4 and RStan. I like to understand most statistical methods as regression models. This way, it’s easy to understand the claims underlying a large number of techniques.

Possibility of heteroskedasticity is a good reason not to dichotomize a continuous variable for use as outcome in logistic regression.
https://www.jamesuanhoro.com/post/2018/10/20/possibility-of-heteroskedasticity-is-a-good-reason-not-to-dichotomize-a-continuous-variable-for-use-as-outcome-in-logistic-regression./
Sat, 20 Oct 2018 00:00:00 +0000
Continuing on whether it’s a good idea to dichotomize continuous variables prior to analysis for substantive reasons, I think I settle on the side of it being a bad idea. The major reason is potential heteroskedasticity of the error term in the linear regression model for the original continuous variable. This is an interesting issue, but one that I do not want to devote time to write about. So I decided to write a brief methods note.

Should you perform logistic regression on a dichotomized continuous variable when you have access to the continuous variable? I'm not sure.
https://www.jamesuanhoro.com/post/2018/10/07/should-you-perform-logistic-regression-on-a-dichotomized-continuous-variable-when-you-have-access-to-the-continuous-variable-im-not-sure./
Sun, 07 Oct 2018 00:00:00 +0000
See here for an update: I recommend not dichotomizing, but it pays to read this first. A standard situation in education or medicine is that we have a continuous measure, but there are cut-points on that measure that are clinically or practically significant. An example is BMI, where I hear 30 matters. You may have an achievement test with 70 as the pass score. When this happens, researchers may sometimes be interested in modeling BMI over 30 or pass/fail as the outcome of interest.

Two group mean and variance comparisons
https://www.jamesuanhoro.com/post/2018/10/06/two-group-mean-and-variance-comparisons/
Sat, 06 Oct 2018 00:00:00 +0000
Someone asked an interesting question on Cross Validated recently about comparing the means and variances of two groups. They had a substantive interest in exploring variance and mean differences between two groups. They were thinking about using the Shapiro-Wilk test to test the data for normality, Levene’s test or the F-test for variance comparisons depending on the results of Shapiro-Wilk, and Mann-Whitney or Welch’s test for means comparisons, again depending on the Shapiro-Wilk result. I gave a somewhat detailed answer and liked it, so I am pasting it here verbatim.

Modeling the error variance to account for heteroskedasticity
https://www.jamesuanhoro.com/post/2018/05/07/modeling-the-error-variance-to-account-for-heteroskedasticity/
Mon, 07 May 2018 00:00:00 +0000
One of the assumptions that comes with applying OLS estimation for regression models in the social sciences is homoskedasticity; I prefer the term constant error variance (it also goes by spherical disturbances). It implies that there is no systematic pattern to the error variance, meaning the model is equally poor at all levels of prediction. This assumption is important for OLS to be the best linear unbiased estimator (BLUE). Heteroskedasticity, the complement of homoskedasticity, does not bias OLS; however, it makes OLS inefficient, so it loses the “best” property in BLUE.

Simulating data from regression models
https://www.jamesuanhoro.com/post/2018/05/07/simulating-data-from-regression-models/
Mon, 07 May 2018 00:00:00 +0000
My preferred approach to validating regression models is to simulate data from them, and see if the simulated data capture relevant features of the original data. A basic feature of interest would be the mean. I like this approach because it is extendable to the family of generalized linear models (logistic, Poisson, gamma, …) and other regression models, say t-regression. It’s something Gelman and Hill cover in their regression text. Sadly, the default method of simulating data from regression models in R misses what one might consider an important source of model uncertainty - variance in estimated regression coefficients.

Using binary regression software to model ordinal data as a multivariate GLM
https://www.jamesuanhoro.com/post/2018/02/12/using-binary-regression-software-to-model-ordinal-data-as-a-multivariate-glm/
Mon, 12 Feb 2018 00:00:00 +0000
I have read that the most common model for analyzing ordinal data is the cumulative link logistic model, coupled with the proportional odds assumption. Essentially, you treat the outcome as if it were the categorical manifestation of a continuous latent variable. The predictor variables of this outcome influence it in one way only, so you get a single regression coefficient for each predictor. But the model has several intercepts representing the points at which the variable was cut to create the observed categorical manifestation.
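
To make the "single coefficient, several intercepts" idea concrete, here is a minimal Python sketch of how category probabilities come out of a cumulative logit model. It is not code from the post: the cutpoints, the coefficient `beta`, and the function name `cumulative_logit_probs` are hypothetical, and the sign convention P(Y <= k) = logistic(c_k - eta) is one common parametrization that I am assuming here.

```python
import math


def cumulative_logit_probs(eta, cutpoints):
    """Category probabilities under a proportional odds model.

    Assumes P(Y <= k) = logistic(c_k - eta), with increasing cutpoints c_k.
    """
    # Cumulative probabilities at each cutpoint; the last category closes at 1.
    cdf = [1.0 / (1.0 + math.exp(-(c - eta))) for c in cutpoints] + [1.0]
    probs = []
    previous = 0.0
    for p in cdf:
        # Each category's probability is the difference of adjacent cumulatives.
        probs.append(p - previous)
        previous = p
    return probs


# Hypothetical values: three cutpoints imply four ordered categories.
cutpoints = [-1.0, 0.0, 1.5]
beta = 0.8  # one slope shared across all cutpoints (proportional odds)

probs_x0 = cumulative_logit_probs(beta * 0, cutpoints)
probs_x1 = cumulative_logit_probs(beta * 1, cutpoints)
print(probs_x0)
print(probs_x1)
```

With a positive `beta`, moving the predictor from 0 to 1 shifts probability away from the lowest category and toward the highest one, while the cutpoints stay fixed.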

Using glmer() to perform Rasch analysis
https://www.jamesuanhoro.com/post/2018/01/02/using-glmer-to-perform-rasch-analysis/
Tue, 02 Jan 2018 00:00:00 +0000
I’ve been interested in the relationship between ordinal regression and item response theory (IRT) for a few months now. There are several helpful papers on the topic (here are some randomly picked ones) and a book. In this post, I focus on Rasch analysis. To do any of these analyses as a regression, your data need to be in long format: a single column identifying items (regression predictor), a single column with item response categories (regression outcome), and a column holding the person ID.

A Chi-Square test of close fit in covariance-based SEM
https://www.jamesuanhoro.com/post/2017/11/16/a-chi-square-test-of-close-fit-in-covariance-based-sem/
Thu, 16 Nov 2017 00:00:00 +0000
TLDR: If you can assume close fit for the RMSEA, there is no reason why you cannot do the same for the Chi-Square test in SEMs. The method to do this is relatively simple, and may cause SEM practitioners to reconsider the Chi-Square test. When assessing the fit of structural equation models, it is common for applied researchers to dismiss the \(\chi^2\) test because it will almost always detect a statistically significant discrepancy between your model and the data, given a large enough sample size.

Misspecification and fit indices in covariance-based SEM
https://www.jamesuanhoro.com/post/2017/10/28/misspecification-and-fit-indices-in-covariance-based-sem/
Sat, 28 Oct 2017 00:00:00 +0000
TLDR: If you have good measurement quality, conventional benchmarks for fit indices may lead to bad decisions.
Additionally, global fit indices are not informative for investigating misspecification. I am working with one of my professors, Dr. Jessica Logan, on a checklist for the developmental progress of young children. We intend to take this down the IRT route (or ordinal logistic regression), but currently, this is all part of a factor analysis course project.

Little's MCAR test at different sample sizes
https://www.jamesuanhoro.com/post/2017/09/21/littles-mcar-test-at-different-sample-sizes/
Thu, 21 Sep 2017 00:00:00 +0000
TLDR: Little’s MCAR test is unable to tell data that are MCAR from data that are MAR in small samples, but maintains the nominal error rate when the null is true across a wide range of sample sizes. I just found out about the R simglm package and decided to do a small simulation to test Little’s MCAR test under different sample sizes. I could have investigated heteroskedasticity in linear regression instead, and I probably will in the future.

Theil-Sen regression in R
https://www.jamesuanhoro.com/post/2017/09/21/theil-sen-regression-in-r/
Thu, 21 Sep 2017 00:00:00 +0000
TLDR: When performing a simple linear regression, if you have any concern about outliers or heteroskedasticity, consider the Theil-Sen estimator. A simple linear regression estimator that is not commonly used or taught in the social sciences is the Theil-Sen estimator. This is a shame given that this estimator is very intuitive, once you know what a slope means. Three steps:
1. Plot a line between every pair of points in your data
2. Calculate the slope for each line
3. The median slope is your regression slope
Calculating the slope this way happens to be quite robust.
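
The three steps translate almost directly into code. Below is a minimal Python sketch rather than the R code from the post: the function name `theil_sen` and the toy data are hypothetical, and the intercept rule (median of `y - slope * x`) is one common convention that I am assuming, since the steps above only describe the slope.

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Return (slope, intercept) using the median of pairwise slopes."""
    # Steps 1 and 2: one slope per pair of points, skipping pairs with equal x.
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct x values")
    # Step 3: the median slope is the regression slope.
    slope = median(slopes)
    # Assumed convention: intercept is the median of y - slope * x.
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


# Toy data: y = 2x exactly, except for one gross outlier at the end.
x = [1, 2, 3, 4, 5]
y = [2.0, 4.0, 6.0, 8.0, 100.0]

slope, intercept = theil_sen(x, y)
print(slope, intercept)  # 2.0 0.0
```

Six of the ten pairwise slopes do not involve the outlier and equal 2, so the median ignores the four inflated slopes. Note that the number of pairs grows quadratically with sample size, so this brute-force version is only sensible for small data.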

Linear regression with violation of heteroskedasticity with small samples
https://www.jamesuanhoro.com/post/2017/09/19/linear-regression-with-violation-of-heteroskedasticity-with-small-samples/
Tue, 19 Sep 2017 00:00:00 +0000
TLDR: In small samples, the wild bootstrap implemented in the R hcci package is a good bet when heteroskedasticity is a concern. Today while teaching the multiple regression lab, I showed the class the standardized residuals versus standardized predictor plot SPSS lets you produce. It is the plot we typically use to assess homoskedasticity. The sample size for the analysis was 44. I mentioned how the regression slopes are fine under heteroskedasticity, but inference (\(t\), \(SE\), \(p\)-value) may be problematic.

On the interpretation of regression coefficients
https://www.jamesuanhoro.com/post/2017/08/11/on-the-interpretation-of-regression-coefficients/
Fri, 11 Aug 2017 00:00:00 +0000
TLDR: We should interpret regression coefficients for continuous variables as we would descriptive dummy variables, unless we intend to make causal claims. I am going to be teaching regression labs in the Fall, and somehow, I stumbled onto Gelman and Hill’s Data analysis using regression and multilevel/hierarchical models. So I started reading it and it’s a good book. A useful piece of advice they give is to interpret regression coefficients in a predictive manner (p.