| Title: | Factor Model Estimation with Latent Variables |
|---|---|
| Description: | A flexible framework for estimating factor models with multiple latent variables. Supports linear, probit, ordered probit, and multinomial logit model components. Features include multi-stage estimation, automatic parameter initialization, analytical gradients and Hessians, and parallel estimation. Methods are described in Heckman, Humphries, and Veramendi (2016) <doi:10.1016/j.jeconom.2015.12.001>, Heckman, Humphries, and Veramendi (2018) <doi:10.1086/698760>, and Humphries, Joensen, and Veramendi (2024) <doi:10.1257/pandp.20241026>. |
| Authors: | Greg Veramendi [aut, cre], Jess Xiong [aut] |
| Maintainer: | Greg Veramendi <[email protected]> |
| License: | GPL-3 |
| Version: | 1.7.1 |
| Built: | 2026-06-10 12:16:51 UTC |
| Source: | https://github.com/cran/factorana |
Generates R bootstrap samples, fits every replicate in a local loop (each
fit is restartable via stage_dir/sample_<id>.rds), and collects the
results. For a single estimation stage on one machine. For multi-stage,
distributed, or restart-heavy workflows, use the primitives
(generate_bootstrap_samples(), bootstrap_fit_sample(),
collect_bootstrap()) directly so each stage and sample is an independent,
resumable job.
bootstrap_factorana( model_system, data, R, cluster = NULL, stage_dir, control = NULL, seed = NULL, conf_level = 0.95, ... )bootstrap_factorana( model_system, data, R, cluster = NULL, stage_dir, control = NULL, seed = NULL, conf_level = 0.95, ... )
model_system |
A |
data |
Data frame used for estimation. |
R |
Number of bootstrap replicates. |
cluster |
Optional cluster id: a column name in |
stage_dir |
Directory for per-sample result files (and the samples object). |
control |
Estimation control. |
seed |
Optional integer seed for reproducibility. |
conf_level |
Confidence level for percentile intervals. |
... |
Passed to |
A factorana_bootstrap object (see collect_bootstrap()).
generate_bootstrap_samples(), bootstrap_fit_sample(), collect_bootstrap()
Runs a multi-stage bootstrap on one machine: for each replicate, fits each
stage in turn, chaining every stage on that replicate's own earlier-stage
fits. Each (stage, sample) fit is written to dir/stage_<k>/sample_<id>.rds
and skipped if it already exists, so an interrupted run resumes where it
stopped. If a stage fails to converge for a replicate, its later stages are
skipped (so a bad earlier fit is not chained forward).
bootstrap_factorana_multistage( stage_builders, data, R, cluster = NULL, dir, control = NULL, seed = NULL, conf_level = 0.95, ... )bootstrap_factorana_multistage( stage_builders, data, R, cluster = NULL, dir, control = NULL, seed = NULL, conf_level = 0.95, ... )
stage_builders |
A list of functions, one per stage, each with the
signature |
data |
Data frame. |
R |
Number of bootstrap replicates. |
cluster |
Optional cluster id (column name in |
dir |
Directory holding the samples object and the per-stage
subdirectories ( |
control |
Estimation control. |
seed |
Optional seed for sample generation. |
conf_level |
Confidence level for percentile intervals. |
... |
Passed to |
For distributed (multi-node) runs, drive the same per-stage, per-sample work
with bootstrap_fit_sample() directly (one array-job task per sample), which
is what this driver does internally.
A factorana_bootstrap_multistage list with a per-stage
collect_bootstrap() summary in stages and final pointing at the last
stage.
bootstrap_factorana(), bootstrap_fit_sample(), collect_bootstrap()
Fits model_system to the data reweighted by bootstrap sample sample_id
and writes the result to stage_dir/sample_<id>.rds. If that file already
exists it returns immediately, so an interrupted run (or a rebooted node)
simply re-runs the missing samples. This is the unit of work to scatter
across a compute cluster (one array-job task per sample_id).
bootstrap_fit_sample( model_system, data, samples, sample_id, stage_dir, control = NULL, ..., weight_col = ".bootw", overwrite = FALSE )bootstrap_fit_sample( model_system, data, samples, sample_id, stage_dir, control = NULL, ..., weight_col = ".bootw", overwrite = FALSE )
model_system |
A |
data |
The full data frame used to generate the samples. |
samples |
A |
sample_id |
Integer replicate index (1..R). |
stage_dir |
Directory for this stage's per-sample result files. |
control |
Estimation control (see |
... |
Passed to |
weight_col |
Name of the temporary weight column added to the data. |
overwrite |
If |
Rows whose bootstrap weight is zero are dropped, and the remaining rows are
given their integer frequency weight via model_system$weights. For a
multi-stage workflow, set model_system$previous_stage (built from this
sample's previous-stage fit) before calling, so each replicate chains on its
own earlier stage.
Invisibly, the path to the written result file.
generate_bootstrap_samples(), collect_bootstrap()
Constructs a dummy previous_stage result that plugs the
anchor-period measurement intercepts into every factor slot, pairs
them with the (tied) shared loadings and residual sigmas / thresholds,
and is ready to pass as previous_stage to a Stage 2
SE_linear or SE_quadratic model via
define_model_system.
build_dynamic_previous_stage(dyn, stage1_result, data, anchor_period = 1L)build_dynamic_previous_stage(dyn, stage1_result, data, anchor_period = 1L)
dyn |
A |
stage1_result |
The result object from
|
data |
The same data frame passed to
|
anchor_period |
Integer index into |
A list suitable as previous_stage in
define_model_system: has fields model_system,
estimates, std_errors, convergence,
loglik. Every per-component measurement parameter is the
corresponding Stage 1 estimate, with the intercepts overwritten to
the anchor-period values for all periods.
When using parallel estimation with Ctrl-C interrupts, worker processes may continue running after the main process exits. This function finds and terminates any orphaned R worker processes started by the parallel package.
cleanup_parallel_workers(signal = "TERM", verbose = TRUE, list_only = FALSE)cleanup_parallel_workers(signal = "TERM", verbose = TRUE, list_only = FALSE)
signal |
Signal to send (default "TERM" for graceful shutdown, use "KILL" for force) |
verbose |
Whether to print messages (default TRUE) |
list_only |
If TRUE, only list potential workers without killing them (default FALSE) |
Invisible NULL (or vector of PIDs if list_only = TRUE)
# Safe to run: list potential orphaned parallel workers without killing them. cleanup_parallel_workers(list_only = TRUE, verbose = FALSE) # After interrupting a parallel job with Ctrl-C, terminate orphaned workers: cleanup_parallel_workers() # Force kill if graceful shutdown doesn't work: cleanup_parallel_workers(signal = "KILL")# Safe to run: list potential orphaned parallel workers without killing them. cleanup_parallel_workers(list_only = TRUE, verbose = FALSE) # After interrupting a parallel job with Ctrl-C, terminate orphaned workers: cleanup_parallel_workers() # Force kill if graceful shutdown doesn't work: cleanup_parallel_workers(signal = "KILL")
Reads every sample_<id>.rds in stage_dir, stacks the estimates, and
returns the bootstrap covariance, bootstrap standard errors, and percentile
confidence intervals. Non-converged replicates are dropped by default.
collect_bootstrap(stage_dir, conf_level = 0.95, require_convergence = TRUE)collect_bootstrap(stage_dir, conf_level = 0.95, require_convergence = TRUE)
stage_dir |
Directory of per-sample result files. |
conf_level |
Confidence level for percentile intervals. |
require_convergence |
Drop replicates with |
A factorana_bootstrap list with boot_se, boot_cov, ci
(a data frame of percentile intervals and bootstrap SEs), the stacked
estimates, and counts (n_total, n_converged, n_used).
generate_bootstrap_samples(), bootstrap_fit_sample()
Creates a table with model components as columns and parameter types as rows. This is similar to how SEM/CFA software displays results.
components_table(result, digits = 3, show_se = TRUE, stars = TRUE)components_table(result, digits = 3, show_se = TRUE, stars = TRUE)
result |
A factorana_result object from estimate_model_rcpp() |
digits |
Number of decimal places (default 3) |
show_se |
Show standard errors in parentheses (default TRUE) |
stars |
Add significance stars (default TRUE) |
A data frame with the formatted table
# Estimate a small model and display the components table set.seed(1); n <- 200 f <- rnorm(n) dat <- data.frame(intercept = 1, y1 = 1.0 * f + rnorm(n, 0, 0.5), y2 = 0.8 * f + rnorm(n, 0, 0.5)) fm <- define_factor_model(n_factors = 1) mc1 <- define_model_component("m1", dat, "y1", fm, covariates = "intercept", model_type = "linear", loading_normalization = 1) mc2 <- define_model_component("m2", dat, "y2", fm, covariates = "intercept", model_type = "linear", loading_normalization = NA_real_) ms <- define_model_system(components = list(mc1, mc2), factor = fm) ctrl <- define_estimation_control(n_quad_points = 8, num_cores = 1) fit <- estimate_model_rcpp(ms, dat, control = ctrl, optimizer = "nlminb", parallel = FALSE, verbose = FALSE) components_table(fit)# Estimate a small model and display the components table set.seed(1); n <- 200 f <- rnorm(n) dat <- data.frame(intercept = 1, y1 = 1.0 * f + rnorm(n, 0, 0.5), y2 = 0.8 * f + rnorm(n, 0, 0.5)) fm <- define_factor_model(n_factors = 1) mc1 <- define_model_component("m1", dat, "y1", fm, covariates = "intercept", model_type = "linear", loading_normalization = 1) mc2 <- define_model_component("m2", dat, "y2", fm, covariates = "intercept", model_type = "linear", loading_normalization = NA_real_) ms <- define_model_system(components = list(mc1, mc2), factor = fm) ctrl <- define_estimation_control(n_quad_points = 8, num_cores = 1) fit <- estimate_model_rcpp(ms, dat, control = ctrl, optimizer = "nlminb", parallel = FALSE, verbose = FALSE) components_table(fit)
Creates a LaTeX table from a factorana_result with components as columns.
components_to_latex( result, digits = 3, file = NULL, caption = NULL, label = NULL, stars = TRUE, note = NULL, booktabs = TRUE )components_to_latex( result, digits = 3, file = NULL, caption = NULL, label = NULL, stars = TRUE, note = NULL, booktabs = TRUE )
result |
A factorana_result object from estimate_model_rcpp() |
digits |
Number of decimal places (default 3) |
file |
Optional file path to write the LaTeX table |
caption |
Table caption (optional) |
label |
Table label for cross-referencing (optional) |
stars |
Add significance stars (default TRUE) |
note |
Footnote text (optional, default includes significance codes) |
booktabs |
Use booktabs package formatting (default TRUE) |
A character string containing the LaTeX table code
Builds a Stage 1 measurement model for a single latent construct
observed at two or more time points. Measurement invariance is imposed
on loadings and residual sigmas (tied across periods via
equality_constraints), while measurement intercepts are left
period-specific. This is the recommended Stage 1 setup for an SE_linear
or SE_quadratic structural model in the second stage.
define_dynamic_measurement( data, items, period_prefixes, model_type = "linear", n_categories = NULL, covariates = "intercept", evaluation_indicator = NULL )define_dynamic_measurement( data, items, period_prefixes, model_type = "linear", n_categories = NULL, covariates = "intercept", evaluation_indicator = NULL )
data |
Wide-format data frame. Must contain a column named
|
items |
Character vector of item names (e.g.,
|
period_prefixes |
Character vector of column prefixes, one per
period. Column names are assembled as |
model_type |
One of |
n_categories |
Required for |
covariates |
Character vector of covariate column names; default
|
evaluation_indicator |
Name of a column with 0/1 values
indicating which observations contribute to each component's
likelihood; |
Why period-specific intercepts? Under the usual factor-model
identification convention E[f_k] = 0 for every factor k,
pooling measurement intercepts across periods biases them by
when the period-mean drifts between
waves, an artefact that propagates into an under-estimate of
se_intercept in Stage 2. Leaving the intercepts period-specific
and carrying the wave-1 intercepts into Stage 2 (via
build_dynamic_previous_stage) sidesteps the bias.
An object of class "dynamic_measurement": a list with
model_system: a model_system object ready to
pass to estimate_model_rcpp.
factor_model: the underlying factor_model.
items, period_prefixes, model_type,
n_categories, covariates,
evaluation_indicator: the inputs, kept for use by
build_dynamic_previous_stage.
build_dynamic_previous_stage constructs the
Stage 2 previous_stage object from the Stage 1 estimation
result.
# Simulate a simple dynamic single-factor model set.seed(1); n <- 500 f1 <- rnorm(n); eps <- rnorm(n, 0, sqrt(0.5)) f2 <- 0.4 + 0.6 * f1 + eps dat <- data.frame( intercept = 1, eval = 1L, Y_t1_m1 = 1.5 + f1 + rnorm(n, 0, 0.7), Y_t1_m2 = 1.0 + 0.9 * f1 + rnorm(n, 0, 0.75), Y_t1_m3 = 0.8 + 1.1 * f1 + rnorm(n, 0, 0.65), Y_t2_m1 = 1.5 + f2 + rnorm(n, 0, 0.7), Y_t2_m2 = 1.0 + 0.9 * f2 + rnorm(n, 0, 0.75), Y_t2_m3 = 0.8 + 1.1 * f2 + rnorm(n, 0, 0.65) ) dyn <- define_dynamic_measurement( data = dat, items = c("m1", "m2", "m3"), period_prefixes = c("Y_t1_", "Y_t2_"), model_type = "linear", evaluation_indicator = "eval" ) ctrl <- define_estimation_control(n_quad_points = 8, num_cores = 1) s1 <- estimate_model_rcpp(dyn$model_system, dat, control = ctrl, optimizer = "nlminb", parallel = FALSE, verbose = FALSE) prev <- build_dynamic_previous_stage(dyn, s1, dat) # Stage 2 input# Simulate a simple dynamic single-factor model set.seed(1); n <- 500 f1 <- rnorm(n); eps <- rnorm(n, 0, sqrt(0.5)) f2 <- 0.4 + 0.6 * f1 + eps dat <- data.frame( intercept = 1, eval = 1L, Y_t1_m1 = 1.5 + f1 + rnorm(n, 0, 0.7), Y_t1_m2 = 1.0 + 0.9 * f1 + rnorm(n, 0, 0.75), Y_t1_m3 = 0.8 + 1.1 * f1 + rnorm(n, 0, 0.65), Y_t2_m1 = 1.5 + f2 + rnorm(n, 0, 0.7), Y_t2_m2 = 1.0 + 0.9 * f2 + rnorm(n, 0, 0.75), Y_t2_m3 = 0.8 + 1.1 * f2 + rnorm(n, 0, 0.65) ) dyn <- define_dynamic_measurement( data = dat, items = c("m1", "m2", "m3"), period_prefixes = c("Y_t1_", "Y_t2_"), model_type = "linear", evaluation_indicator = "eval" ) ctrl <- define_estimation_control(n_quad_points = 8, num_cores = 1) s1 <- estimate_model_rcpp(dyn$model_system, dat, control = ctrl, optimizer = "nlminb", parallel = FALSE, verbose = FALSE) prev <- build_dynamic_previous_stage(dyn, s1, dat) # Stage 2 input
Define estimation control settings
define_estimation_control( n_quad_points = 16, num_cores = 1, cluster_type = "auto", adaptive_integration = FALSE, adapt_int_thresh = 0.5 )define_estimation_control( n_quad_points = 16, num_cores = 1, cluster_type = "auto", adaptive_integration = FALSE, adapt_int_thresh = 0.5 )
n_quad_points |
Integer. Number of Gauss-Hermite quadrature points for numerical integration (default = 16) |
num_cores |
Integer. Number of processes to use for parallel estimation (default = 1) |
cluster_type |
Character. Type of parallel cluster to use: "auto" (default), "FORK", or "PSOCK". "auto" uses FORK on Unix (faster, shared memory) and PSOCK on Windows. "FORK" forces fork-based parallelism (Unix only, faster due to shared memory). "PSOCK" forces socket-based parallelism (works on all platforms, more overhead). |
adaptive_integration |
Logical. Whether to use adaptive integration in second-stage estimation (default = FALSE). When TRUE, the number of quadrature points per observation is determined based on the standard error of factor scores from a previous estimation stage. |
adapt_int_thresh |
Numeric. Threshold for adaptive integration (default = 0.5).
Smaller values use more integration points. The formula is:
|
Adaptive integration is useful in two-stage estimation where factor scores have been estimated in a first stage. The key insight is that observations with well-identified factor scores (small SE) need fewer integration points, while observations with poorly-identified factors (large SE) need full quadrature.
To use adaptive integration:
Estimate Stage 1 model to get factor scores via estimate_factorscores_rcpp()
Initialize FactorModel for Stage 2 via initialize_factor_model_cpp()
Call set_adaptive_quadrature_cpp() with factor scores, SEs, and variances
Run estimation - the adaptive settings are applied automatically
The diagnostic output from set_adaptive_quadrature_cpp(verbose=TRUE) shows:
Distribution of integration points across observations
Average integration points vs standard (shows computational savings)
Computational reduction percentage
An object of class estimation_control containing control settings
Creates an object of class "factor_model" that specifies the structure of the
unobserved latent factors. This includes the number of factors and mixture components.
Loading constraints are specified at the component level via define_model_component().
Numerical integration settings (quadrature points) are specified in define_estimation_control().
define_factor_model( n_factors, n_types = 1, factor_structure = "independent", n_mixtures = 1, factor_covariates = NULL, se_covariates = NULL )define_factor_model( n_factors, n_types = 1, factor_structure = "independent", n_mixtures = 1, factor_covariates = NULL, se_covariates = NULL )
n_factors |
Integer. Number of latent factors (>=0). Use 0 for models without latent factors. |
n_types |
Integer. Number of types (>=1) |
factor_structure |
Character. Structure of factor dependencies. Options:
|
n_mixtures |
Integer. Number of discrete mixtures (default = 1, allowed: 1-3) |
factor_covariates |
Character vector. Names of covariates that shift factor means.
When specified, the factor distribution becomes f_i ~ N((X_i - mean(X)) * gamma, Sigma)
where X_i is the covariate vector for observation i and gamma is a matrix of coefficients
(one column per factor). This allows factor means to vary by observed characteristics.
The covariates must be present in the data passed to |
se_covariates |
Character vector. Names of covariates that directly affect the outcome factor in SE models. When specified, the structural equation becomes f_k = intercept + sum(alpha_j * f_j) + sum(beta_m * X_m) + epsilon. Valid for any SE structural-equation factor structure (SE_linear, SE_quadratic, SE_interactions, SE_full). All covariates are automatically demeaned internally for identification. |
An object of class "factor_model"
# Single factor model fm <- define_factor_model(n_factors = 1) # Two-factor structural equation model fm_se <- define_factor_model(n_factors = 2, factor_structure = "SE_linear")# Single factor model fm <- define_factor_model(n_factors = 1) # Two-factor structural equation model fm_se <- define_factor_model(n_factors = 2, factor_structure = "SE_linear")
Define a model component
define_model_component( name, data, outcome, factor, evaluation_indicator = NULL, covariates = NULL, model_type = c("linear", "logit", "probit", "oprobit"), intercept = TRUE, num_choices = 2, nrank = NULL, exclude_chosen = TRUE, rankshare_var = NULL, loading_normalization = NULL, factor_index = NULL, factor_spec = c("linear", "quadratic", "interactions", "full"), use_types = FALSE, skip_collinearity_check = FALSE )define_model_component( name, data, outcome, factor, evaluation_indicator = NULL, covariates = NULL, model_type = c("linear", "logit", "probit", "oprobit"), intercept = TRUE, num_choices = 2, nrank = NULL, exclude_chosen = TRUE, rankshare_var = NULL, loading_normalization = NULL, factor_index = NULL, factor_spec = c("linear", "quadratic", "interactions", "full"), use_types = FALSE, skip_collinearity_check = FALSE )
name |
name of model component ####Long name and s name (like in the C++) |
data |
data.frame for validation |
outcome |
Character. Name of the outcome variable. |
factor |
model. Object of Class factor_model. |
evaluation_indicator |
Character (optional). Variable used for evaluation subsample. |
covariates |
Character vector. Names of covariate columns in |
model_type |
Character. Type of model (e.g., "linear", "logit", "probit"). |
intercept |
Logical. Whether to include an intercept (default = TRUE). |
num_choices |
Integer. Number of choices (for multinomial models). |
nrank |
Integer (optional). Rank for exploded multinomial logit. |
exclude_chosen |
Logical. For exploded logit, whether to exclude already-chosen alternatives from later ranks (default TRUE). Set to FALSE for exploded nested logit where the same nest can be chosen multiple times. |
rankshare_var |
Character (optional). Column name (or prefix) for rank-share correction variables. These provide rank-and-choice-specific adjustments to the linear predictor. Data layout: (num_choices-1) * nrank columns, accessed as rankshare_var + (num_choices-1)*irank + icat for irank=0..nrank-1, icat=0..num_choices-2. |
loading_normalization |
Numeric vector of length
|
factor_index |
Integer (optional). Convenience for assigning an item to a
single factor: the item loads only on factor |
factor_spec |
Character. Specification for factor terms in linear predictor.
|
use_types |
Logical. Whether this component uses type-specific intercepts when n_types > 1 in the factor model. Default FALSE. When TRUE and n_types > 1, the component will have (n_types - 1) type-specific intercept parameters that shift the linear predictor for each non-reference type. This allows types to affect outcome models while keeping measurement models type-invariant. |
skip_collinearity_check |
Logical. If TRUE, skip the multicollinearity check on the design matrix. Useful when many coefficients will be fixed via fix_coefficient() after component creation, which resolves the collinearity. Default FALSE. |
An object of class "model_component". A list representing the model component
dat <- data.frame(y = rnorm(50), intercept = 1) fm <- define_factor_model(n_factors = 1) mc <- define_model_component("Y", dat, "y", fm, covariates = "intercept", model_type = "linear", loading_normalization = 1) # Multi-factor: assign an item to a factor with the length-n_factors vector. dat2 <- data.frame(y1 = rnorm(50), y2 = rnorm(50)) fm2 <- define_factor_model(n_factors = 2) # item loads on factor 1 only, as its anchor (loading fixed at 1): a1 <- define_model_component("a1", dat2, "y1", fm2, loading_normalization = c(1, 0)) # item loads freely on factor 2 only -- same thing via factor_index: a2 <- define_model_component("a2", dat2, "y2", fm2, factor_index = 2, loading_normalization = NA_real_)dat <- data.frame(y = rnorm(50), intercept = 1) fm <- define_factor_model(n_factors = 1) mc <- define_model_component("Y", dat, "y", fm, covariates = "intercept", model_type = "linear", loading_normalization = 1) # Multi-factor: assign an item to a factor with the length-n_factors vector. dat2 <- data.frame(y1 = rnorm(50), y2 = rnorm(50)) fm2 <- define_factor_model(n_factors = 2) # item loads on factor 1 only, as its anchor (loading fixed at 1): a1 <- define_model_component("a1", dat2, "y1", fm2, loading_normalization = c(1, 0)) # item loads freely on factor 2 only -- same thing via factor_index: a2 <- define_model_component("a2", dat2, "y2", fm2, factor_index = 2, loading_normalization = NA_real_)
Define a model system
define_model_system( components, factor, previous_stage = NULL, weights = NULL, equality_constraints = NULL, free_params = NULL )define_model_system( components, factor, previous_stage = NULL, weights = NULL, equality_constraints = NULL, free_params = NULL )
components |
A named list of model_component objects |
factor |
A factor_model object |
previous_stage |
Optional result from a previous estimation stage. If provided, the previous stage components and parameters will be fixed and prepended to the new components. This enables sequential/multi-stage estimation where early stages are held fixed while later stages are optimized. |
weights |
Optional. Name of a variable in the data containing observation weights.
When specified, each observation's contribution to the log-likelihood is multiplied
by its weight. Useful for survey weights, importance sampling, or giving different
observations different influence on the estimation. Weights should be positive.
The variable is extracted from the data passed to |
equality_constraints |
Optional. A list of character vectors, where each vector
specifies parameter names that should be constrained to be equal during estimation.
The first parameter in each group is the "primary" (freely estimated), and all
other parameters in the group are set equal to the primary.
Example: |
free_params |
Optional. A character vector of parameter names from previous_stage
that should remain FREE (not fixed) in the current stage. This allows selectively
freeing specific parameters while keeping others fixed. Commonly used to free
factor variances while keeping measurement loadings/thresholds fixed.
Example: |
An object of class "model_system". A list of model_component objects and one factor_model object.
dat <- data.frame(y1 = rnorm(50), y2 = rnorm(50), intercept = 1) fm <- define_factor_model(n_factors = 1) mc1 <- define_model_component("m1", dat, "y1", fm, covariates = "intercept", model_type = "linear", loading_normalization = 1) mc2 <- define_model_component("m2", dat, "y2", fm, covariates = "intercept", model_type = "linear", loading_normalization = NA_real_) ms <- define_model_system(components = list(mc1, mc2), factor = fm)dat <- data.frame(y1 = rnorm(50), y2 = rnorm(50), intercept = 1) fm <- define_factor_model(n_factors = 1) mc1 <- define_model_component("m1", dat, "y1", fm, covariates = "intercept", model_type = "linear", loading_normalization = 1) mc2 <- define_model_component("m2", dat, "y2", fm, covariates = "intercept", model_type = "linear", loading_normalization = NA_real_) ms <- define_model_system(components = list(mc1, mc2), factor = fm)
Reverts to standard (non-adaptive) quadrature integration.
disable_adaptive_quadrature_cpp(fm_ptr)disable_adaptive_quadrature_cpp(fm_ptr)
fm_ptr |
External pointer to FactorModel object |
No return value. Called for its side effect of disabling adaptive
quadrature and clearing any observation weights on the FactorModel
pointed to by fm_ptr.
Calls estimate_model(), then writes the four expected output files:
model_config.csv
meas_par.csv
system_inits_long.csv
simulated_data.csv
estimate_and_write( model_system, factor_model, control, data = NULL, results_dir )estimate_and_write( model_system, factor_model, control, data = NULL, results_dir )
model_system |
model_system object |
factor_model |
factor_model object |
control |
estimation_control object |
data |
Optional data frame (to save as simulated_data.csv) |
results_dir |
Directory for writing output files. This argument is
required; the function writes nothing without an explicit path. Use
|
Invisibly returns a list with two components: results (a
data frame of initial parameter values per component, as produced by
estimate_model()) and packed (a list with values and
ses giving the packed parameter vector and standard errors written
to meas_par.csv). Called primarily for the side effect of writing
model_config.csv, meas_par.csv, system_inits_long.csv,
and optionally simulated_data.csv into results_dir.
Estimates factor scores for each observation after model estimation. The model parameters are held fixed at their estimated values, and factor scores are computed as the posterior mode for each observation.
estimate_factorscores_rcpp( result, data, control = NULL, parallel = FALSE, verbose = TRUE, include_prior = FALSE, id_var = NULL )estimate_factorscores_rcpp( result, data, control = NULL, parallel = FALSE, verbose = TRUE, include_prior = FALSE, id_var = NULL )
result |
A factorana_result object from estimate_model_rcpp() |
data |
Data frame containing all variables (same as used in estimation) |
control |
Optional estimation control object. If NULL, uses default. |
parallel |
Whether to use parallel computation (default FALSE). When TRUE, uses the num_cores setting from control. |
verbose |
Whether to print progress (default TRUE) |
include_prior |
Whether to include the factor prior in likelihood/SE computation (default FALSE). When FALSE, matches legacy C++ behavior where SEs are based only on observation likelihood. When TRUE, includes the prior contribution to the Hessian, resulting in smaller SEs. |
id_var |
Optional character string specifying the name of an ID variable in data. If provided, this variable is included in the output data frame for easier merging with other datasets. The ID values are taken from the original data in the order observations were processed. Note: The ID column must be numeric (not character) since data is converted to a numeric matrix. For character IDs, use the obs_id column to merge with your original data. |
Factor scores are estimated by maximizing the posterior density:
where:
is the vector of factor values for observation i
is the observed data for observation i
are the fixed model parameters (from previous estimation)
is the normal prior on factors
Standard errors are computed from the diagonal of the inverse Hessian of the log-posterior at the mode. By default (include_prior=FALSE), the SE is based only on the observation likelihood Hessian, matching the legacy C++ implementation. Set include_prior=TRUE to include the prior's Hessian contribution (-1/sigma^2) which produces smaller SEs.
When parallel=TRUE, observations are distributed across cores using doParallel/foreach, with each worker processing a subset of observations.
A data frame with columns:
obs_id - Observation index (1-based)
<id_var> - ID variable values (if id_var was specified)
factor_1, factor_2, ... - Estimated factor scores
se_factor_1, se_factor_2, ... - Standard errors
converged - Whether optimization converged for this observation
log_posterior - Log-posterior value at the mode
# Estimate a small one-factor model, then recover factor scores set.seed(1); n <- 200 f <- rnorm(n) dat <- data.frame(intercept = 1, y1 = 1.0 * f + rnorm(n, 0, 0.5), y2 = 0.8 * f + rnorm(n, 0, 0.5)) fm <- define_factor_model(n_factors = 1) mc1 <- define_model_component("m1", dat, "y1", fm, covariates = "intercept", model_type = "linear", loading_normalization = 1) mc2 <- define_model_component("m2", dat, "y2", fm, covariates = "intercept", model_type = "linear", loading_normalization = NA_real_) ms <- define_model_system(components = list(mc1, mc2), factor = fm) ctrl <- define_estimation_control(n_quad_points = 8, num_cores = 1) fit <- estimate_model_rcpp(ms, dat, control = ctrl, optimizer = "nlminb", parallel = FALSE, verbose = FALSE) fscores <- estimate_factorscores_rcpp(fit, dat, control = ctrl, parallel = FALSE, verbose = FALSE) head(fscores)# Estimate a small one-factor model, then recover factor scores set.seed(1); n <- 200 f <- rnorm(n) dat <- data.frame(intercept = 1, y1 = 1.0 * f + rnorm(n, 0, 0.5), y2 = 0.8 * f + rnorm(n, 0, 0.5)) fm <- define_factor_model(n_factors = 1) mc1 <- define_model_component("m1", dat, "y1", fm, covariates = "intercept", model_type = "linear", loading_normalization = 1) mc2 <- define_model_component("m2", dat, "y2", fm, covariates = "intercept", model_type = "linear", loading_normalization = NA_real_) ms <- define_model_system(components = list(mc1, mc2), factor = fm) ctrl <- define_estimation_control(n_quad_points = 8, num_cores = 1) fit <- estimate_model_rcpp(ms, dat, control = ctrl, optimizer = "nlminb", parallel = FALSE, verbose = FALSE) fscores <- estimate_factorscores_rcpp(fit, dat, control = ctrl, parallel = FALSE, verbose = FALSE) head(fscores)
Estimate model
estimate_model(ms, control)estimate_model(ms, control)
ms |
an object of class model_system |
control |
an object of class estimation_control |
description
This function estimates a factor model by optimizing the likelihood using C++ for fast evaluation and R for optimization and parallelization.
estimate_model_rcpp( model_system, data, init_params = NULL, control = NULL, optimizer = "nlminb", parallel = TRUE, verbose = TRUE, max_restarts = 5, factor_scores = NULL, factor_ses = NULL, factor_vars = NULL, init_factor_scores = NULL, checkpoint_file = NULL )estimate_model_rcpp( model_system, data, init_params = NULL, control = NULL, optimizer = "nlminb", parallel = TRUE, verbose = TRUE, max_restarts = 5, factor_scores = NULL, factor_ses = NULL, factor_vars = NULL, init_factor_scores = NULL, checkpoint_file = NULL )
model_system |
A model_system object from define_model_system() |
data |
Data frame containing all variables |
init_params |
Initial parameter values (optional) |
control |
Estimation control object from define_estimation_control() |
optimizer |
Optimizer to use (default "nloptr"):
|
parallel |
Whether to use parallel computation (default TRUE) |
verbose |
Whether to print progress (default TRUE) |
max_restarts |
Maximum number of eigenvector-based restarts for escaping saddle points (default 5). Set to 0 to disable. |
factor_scores |
Matrix (n_obs x n_factors) of factor score estimates from a previous stage. Used for adaptive quadrature. (default NULL) |
factor_ses |
Matrix (n_obs x n_factors) of factor score standard errors from a previous stage. Used for adaptive quadrature. (default NULL) |
factor_vars |
Named numeric vector of factor variances from a previous stage. Used for adaptive quadrature. (default NULL) |
init_factor_scores |
Matrix (n_obs x n_factors) of factor scores to use for initializing factor loadings. When provided, loadings are estimated by treating factor scores as regressors, giving better starting values than the default (0.5). Useful for two-stage estimation where Stage 1 factor scores can improve Stage 2 initialization. (default NULL) |
checkpoint_file |
Path to file for saving checkpoint parameters during optimization. When specified, parameters are saved each time the Hessian is evaluated at a point with improved likelihood. Useful for long-running estimations that may need to be restarted. The file contains parameter names and values as CSV with metadata headers. (default NULL = no checkpointing) |
For maximum efficiency, use optimizer = "nlminb" or optimizer = "trust"
which exploit the analytical Hessian computed in C++. The default L-BFGS methods
only use the gradient and approximate the Hessian from gradient history.
When optimization fails to converge (possibly at a saddle point), the function
will attempt to escape by moving in the direction of negative Hessian eigenvalues.
This is controlled by the max_restarts parameter.
List with parameter estimates, standard errors, log-likelihood, etc.
vcov_factorana() and robust_se() for robust and cluster-robust
standard errors from a fitted model.
# Simulate a simple one-factor model with two linear indicators set.seed(1); n <- 100 f <- rnorm(n) dat <- data.frame(intercept = 1, y1 = 1.0 * f + rnorm(n, 0, 0.5), y2 = 0.8 * f + rnorm(n, 0, 0.5)) fm <- define_factor_model(n_factors = 1) mc1 <- define_model_component("m1", dat, "y1", fm, covariates = "intercept", model_type = "linear", loading_normalization = 1) mc2 <- define_model_component("m2", dat, "y2", fm, covariates = "intercept", model_type = "linear", loading_normalization = NA_real_) ms <- define_model_system(components = list(mc1, mc2), factor = fm) ctrl <- define_estimation_control(n_quad_points = 8, num_cores = 1) result <- estimate_model_rcpp(ms, dat, control = ctrl, optimizer = "nlminb", parallel = FALSE, verbose = FALSE) result$estimates # Robust (Huber-White) standard errors from the same fit: robust_se(result, dat, type = "robust")# Simulate a simple one-factor model with two linear indicators set.seed(1); n <- 100 f <- rnorm(n) dat <- data.frame(intercept = 1, y1 = 1.0 * f + rnorm(n, 0, 0.5), y2 = 0.8 * f + rnorm(n, 0, 0.5)) fm <- define_factor_model(n_factors = 1) mc1 <- define_model_component("m1", dat, "y1", fm, covariates = "intercept", model_type = "linear", loading_normalization = 1) mc2 <- define_model_component("m2", dat, "y2", fm, covariates = "intercept", model_type = "linear", loading_normalization = NA_real_) ms <- define_model_system(components = list(mc1, mc2), factor = fm) ctrl <- define_estimation_control(n_quad_points = 8, num_cores = 1) result <- estimate_model_rcpp(ms, dat, control = ctrl, optimizer = "nlminb", parallel = FALSE, verbose = FALSE) result$estimates # Robust (Huber-White) standard errors from the same fit: robust_se(result, dat, type = "robust")
Used for factor score estimation. The model parameters are held fixed, and the factor values are treated as the parameters to optimize.
evaluate_factorscore_likelihood_cpp( fm_ptr, iobs, factor_values, model_params, compute_gradient = FALSE, compute_hessian = FALSE, include_prior = TRUE )evaluate_factorscore_likelihood_cpp( fm_ptr, iobs, factor_values, model_params, compute_gradient = FALSE, compute_hessian = FALSE, include_prior = TRUE )
fm_ptr |
External pointer to FactorModel object |
iobs |
Observation index (0-based) |
factor_values |
Vector of factor values (size n_factors) |
model_params |
Vector of ALL model parameters (from previous estimation) |
compute_gradient |
Whether to compute gradient (default FALSE) |
compute_hessian |
Whether to compute Hessian (default FALSE) |
include_prior |
Whether to include factor prior in likelihood (default TRUE). Set to FALSE to match legacy C++ behavior (observation likelihood only). |
List with log-likelihood, gradient (if requested), and Hessian (if requested)
Evaluate log-likelihood for given parameters
evaluate_likelihood_cpp( fm_ptr, params, compute_gradient = FALSE, compute_hessian = FALSE )evaluate_likelihood_cpp( fm_ptr, params, compute_gradient = FALSE, compute_hessian = FALSE )
fm_ptr |
External pointer to FactorModel object |
params |
Vector of parameters |
compute_gradient |
Whether to compute gradient (default FALSE) |
compute_hessian |
Whether to compute Hessian (default FALSE) |
List with:
logLikelihood: scalar log-likelihood value
gradient: vector of length n_param_free (if requested)
hessian: vector of length n_param_free*(n_param_free+1)/2 stored as
upper-triangular in row-major order (if requested).
To expand to full symmetric matrix in R:
idx <- 1; for(i in 1:n) for(j in i:n) { H[i,j] <- H[j,i] <- hess[idx]; idx <- idx + 1 }
Evaluate log-likelihood only (for optimization)
evaluate_loglik_only_cpp(fm_ptr, params)evaluate_loglik_only_cpp(fm_ptr, params)
fm_ptr |
External pointer to FactorModel object |
params |
Vector of parameters |
Log-likelihood value
Evaluates the model at the supplied FREE parameters and returns the
per-observation score matrix d(log L_i)/d(theta_free), i.e. the gradient of
each observation's marginal (factor-integrated) log-likelihood with respect
to the free parameters. Rows are observations (in the FactorModel's data
order), columns are free parameters. Summing the rows reproduces the total
gradient returned by evaluate_likelihood_cpp. This is the "meat"
ingredient of a sandwich covariance estimator.
evaluate_obs_scores_cpp(fm_ptr, params)evaluate_obs_scores_cpp(fm_ptr, params)
fm_ptr |
External pointer to FactorModel object |
params |
Vector of FREE parameters (typically the MLE) |
Numeric matrix (nobs x nparam_free) of per-observation scores
Given a full parameter vector (including fixed parameters), extract only the free parameters based on the model's fixed parameter mask.
extract_free_params_cpp(fm_ptr, full_params)extract_free_params_cpp(fm_ptr, full_params)
fm_ptr |
External pointer to FactorModel object |
full_params |
Full parameter vector (size n_param) |
Vector of free parameters only (size n_param_free)
Constrains a regression coefficient (beta) to a fixed value during estimation. The coefficient will not be optimized - it remains at the specified value.
fix_coefficient(component, covariate, value, choice = NULL)fix_coefficient(component, covariate, value, choice = NULL)
component |
A model_component object from define_model_component() |
covariate |
Character. Name of the covariate whose coefficient to fix. Must be one of the covariates specified when creating the component. |
value |
Numeric. The value to fix the coefficient to. |
choice |
Integer (optional). For multinomial logit models with num_choices > 2, specifies which choice's coefficient to fix. Must be between 1 and (num_choices - 1). Choice 0 (reference category) has no parameters. For other model types, this should be NULL (default). |
Fixed coefficients are stored in the component and used during:
Parameter initialization: fixed values are used directly (no estimation)
Optimization: fixed parameters are excluded from the optimization
Likelihood evaluation: fixed values are inserted into the parameter vector
Note: This function only fixes regression coefficients (betas), not:
Factor loadings (use loading_normalization in define_model_component)
Sigma (for linear models)
Thresholds (for ordered probit)
The modified model_component object with the fixed coefficient constraint added.
# Build a minimal model component and fix a coefficient dat <- data.frame(intercept = 1, x1 = rnorm(20), y = rnorm(20)) fm <- define_factor_model(n_factors = 1) mc <- define_model_component( name = "y", data = dat, outcome = "y", factor = fm, covariates = c("intercept", "x1"), model_type = "linear", loading_normalization = 1 ) # Fix intercept to 0 and x1 coefficient to 0.1 mc <- fix_coefficient(mc, covariate = "intercept", value = 0.0) mc <- fix_coefficient(mc, covariate = "x1", value = 0.1) length(mc$fixed_coefficients) # 2 constraints stored on the component# Build a minimal model component and fix a coefficient dat <- data.frame(intercept = 1, x1 = rnorm(20), y = rnorm(20)) fm <- define_factor_model(n_factors = 1) mc <- define_model_component( name = "y", data = dat, outcome = "y", factor = fm, covariates = c("intercept", "x1"), model_type = "linear", loading_normalization = 1 ) # Fix intercept to 0 and x1 coefficient to 0.1 mc <- fix_coefficient(mc, covariate = "intercept", value = 0.0) mc <- fix_coefficient(mc, covariate = "x1", value = 0.1) length(mc$fixed_coefficients) # 2 constraints stored on the component
Constrains a parameter in the latent-factor distribution (factor variance,
SE-equation slope / intercept / residual variance, type-probability
intercept, type-probability loading, factor-mean covariate coefficient,
or SE covariate coefficient) to a fixed value. The parameter will be held
at that value during estimation, excluded from the optimizer's free vector,
and reported in result$estimates with result$std_errors == 0 and
result$param_table$fixed == TRUE.
fix_factor_param(factor_model, name, value = NULL)fix_factor_param(factor_model, name, value = NULL)
factor_model |
A |
name |
Either a single character string naming the factor-distribution
parameter to fix, or a character vector of names paired element-wise with
|
value |
Numeric. The value to fix the parameter to. Pass |
Use this for substantive restrictions known at model-definition time, e.g.,
setting a type_<t>_loading_<k> to 0 when factor k is known not to enter
the type-probability model. Estimating such a loading free can leak
identification, push the optimizer along a flat ridge, and inflate the
standard errors of the remaining type-model parameters.
The modified factor_model object. The fix is stored in
factor_model$fixed_params as a named numeric vector.
previous_stage / free_params
When define_model_system(previous_stage = ..., free_params = ...) is
used, fix_factor_param() always wins. The user-fixed value is the
binding constraint; any value for the same name in previous_stage$ estimates is ignored, and inclusion of the same name in free_params
is silently honoured (the parameter stays fixed regardless). A warning
is emitted once per conflict so accidental misuse surfaces during
debugging without spamming routine workflows.
For SE structures, type_<t>_loading_<n_factors> (the type loading on
the outcome factor) is auto-fixed at 0 because the type probability
model must depend only on input factors. Calling
fix_factor_param(fm, "type_<t>_loading_<n_factors>", 0) is a no-op.
Calling it with any non-zero value is an error.
When a factor variance is non-identified (no measurement component
fixes a non-zero loading on that factor), it is normally auto-fixed at
its initial value of 1. An explicit
fix_factor_param(fm, "factor_var_<k>", v) overrides the auto-fix and
uses the user-supplied value.
fm <- define_factor_model(n_factors = 3, n_types = 2, factor_structure = "SE_linear") # Single fix fm <- fix_factor_param(fm, "type_2_loading_2", 0.0) # Batch via named numeric fm <- fix_factor_param(fm, c(type_2_loading_2 = 0.0, type_2_loading_3 = 0.0)) # Unfix fm <- fix_factor_param(fm, "type_2_loading_2", NA_real_)fm <- define_factor_model(n_factors = 3, n_types = 2, factor_structure = "SE_linear") # Single fix fm <- fix_factor_param(fm, "type_2_loading_2", 0.0) # Batch via named numeric fm <- fix_factor_param(fm, c(type_2_loading_2 = 0.0, type_2_loading_3 = 0.0)) # Unfix fm <- fix_factor_param(fm, "type_2_loading_2", NA_real_)
For models with n_types > 1, each component has type-specific intercepts that shift the linear predictor for each non-reference type. This function constrains those intercepts to zero, effectively removing the type-specific shift for this component.
fix_type_intercepts(component, types = NULL, choice = NULL)fix_type_intercepts(component, types = NULL, choice = NULL)
component |
A model_component object from define_model_component() |
types |
Integer vector (optional). Which types to fix. Must be between 2 and n_types (type 1 is the reference with no intercept). Default is NULL, meaning all non-reference types. |
choice |
Integer (optional). For multinomial logit models with num_choices > 2, specifies which choice's type intercepts to fix. Must be between 1 and (num_choices - 1). Default is NULL, meaning all choices. |
When n_types > 1, the model includes a latent type structure where different types can have different intercepts in each measurement equation. By default, these intercepts are freely estimated. Fixing them to zero constrains the outcome equation to have no type-specific shifts (though the type model loadings at the factor level may still differ).
This is useful when you want the type model to affect outcomes only through factor loadings, not through direct type-specific intercepts.
Fixed type intercepts are stored in the component and used during:
Parameter initialization: fixed values are used directly (no estimation)
Optimization: fixed parameters are excluded from the optimization
Likelihood evaluation: fixed values are inserted into the parameter vector
The modified model_component object with the fixed type intercept constraints added.
# Build a component with n_types = 2 and fix the type-2 intercept dat <- data.frame(intercept = 1, y = rnorm(20)) fm <- define_factor_model(n_factors = 1, n_types = 2) mc <- define_model_component( name = "y", data = dat, outcome = "y", factor = fm, covariates = "intercept", model_type = "linear", loading_normalization = 1, use_types = TRUE ) mc <- fix_type_intercepts(mc) length(mc$fixed_type_intercepts)# Build a component with n_types = 2 and fix the type-2 intercept dat <- data.frame(intercept = 1, y = rnorm(20)) fm <- define_factor_model(n_factors = 1, n_types = 2) mc <- define_model_component( name = "y", data = dat, outcome = "y", factor = fm, covariates = "intercept", model_type = "linear", loading_normalization = 1, use_types = TRUE ) mc <- fix_type_intercepts(mc) length(mc$fixed_type_intercepts)
Compute Gauss-Hermite quadrature nodes and weights
gauss_hermite_quadrature(n)gauss_hermite_quadrature(n)
n |
Number of quadrature points |
List with nodes and weights
Draws R bootstrap replicates by resampling clusters (or rows) with
replacement and records, for each replicate, the integer frequency weight of
every row (the number of times its cluster was drawn). The weights are saved
once so that the same replicate is reproducible across nodes and across
restarts.
generate_bootstrap_samples(data, R, cluster = NULL, dir = NULL, seed = NULL)generate_bootstrap_samples(data, R, cluster = NULL, dir = NULL, seed = NULL)
data |
Data frame used for estimation. |
R |
Number of bootstrap replicates. |
cluster |
Optional cluster id: a column name in |
dir |
Optional directory to write the samples object to (as
|
seed |
Optional integer seed for reproducibility. |
A factorana_boot_samples list with weights (an integer
nrow(data) x R matrix), and metadata (R, nobs, n_clusters).
Invisibly returns the same object when dir is supplied.
bootstrap_fit_sample(), collect_bootstrap(), bootstrap_factorana()
Get parameter counts from FactorModel
get_parameter_info_cpp(fm_ptr)get_parameter_info_cpp(fm_ptr)
fm_ptr |
External pointer to FactorModel object |
List with parameter count information
Initialize a FactorModel C++ object from R model system
initialize_factor_model_cpp( model_system, data, n_quad = 8L, init_params = NULL )initialize_factor_model_cpp( model_system, data, n_quad = 8L, init_params = NULL )
model_system |
R model_system object |
data |
Data frame or matrix with all variables |
n_quad |
Number of quadrature points |
init_params |
Optional initial parameter vector (used to set fixed parameter values) |
External pointer to FactorModel object
Estimates each model component separately in R (ignoring factors) to obtain good starting values. Also checks factor identification.
initialize_parameters(model_system, data, factor_scores = NULL, verbose = TRUE)initialize_parameters(model_system, data, factor_scores = NULL, verbose = TRUE)
model_system |
A model_system object from define_model_system() |
data |
Data frame containing all variables |
factor_scores |
Optional matrix of factor scores (nobs x n_factors). When provided, factor loadings are estimated by including factor scores as regressors in the initialization regressions. This is useful for two-stage estimation where factor scores from Stage 1 can be used to initialize loadings in Stage 2 outcomes. |
verbose |
Whether to print progress (default TRUE) |
Factor identification: For each factor, if NO component has a non-zero fixed loading, then the factor variance is not identified and must be fixed to 1.0.
When factor_scores is provided, the initialization will estimate loadings by
treating factor scores as additional regressors. For example, for a linear model:
Y ~ X + factor_scores and the coefficients on factor_scores become the loading
estimates. This typically provides better starting values than the default (0.5).
List with:
init_params - Initial parameter values
factor_variance_fixed - Logical vector indicating which factor variances must be fixed
Print method for components_table
## S3 method for class 'components_table' print(x, ...)## S3 method for class 'components_table' print(x, ...)
x |
A components_table object |
... |
Additional arguments (ignored) |
Invisibly returns x. Called for its side effect of printing
a components-as-columns view of model estimates (one column per model
component, factor parameters in a separate section) to the console.
Print method for factorana_factorscores
## S3 method for class 'factorana_factorscores' print(x, n = 10, ...)## S3 method for class 'factorana_factorscores' print(x, n = 10, ...)
x |
A factorana_factorscores object |
n |
Number of rows to print (default 10) |
... |
Additional arguments (ignored) |
Invisibly returns x. Called for its side effect of printing
a summary of the factor score estimates (convergence rate, per-factor
summary statistics, and the first n rows) to the console.
Functions for displaying and exporting factor model estimation results in formatted tables, including LaTeX output. Print method for factorana_result objects
## S3 method for class 'factorana_result' print(x, digits = 4, ...)## S3 method for class 'factorana_result' print(x, digits = 4, ...)
x |
A factorana_result object from estimate_model_rcpp() |
digits |
Number of decimal places (default 4) |
... |
Additional arguments (ignored) |
Invisibly returns x. Called for its side effect of printing
a formatted summary (convergence status, log-likelihood, and a parameter
table with estimates and standard errors) to the console.
Print method for factorana_table
## S3 method for class 'factorana_table' print(x, ...)## S3 method for class 'factorana_table' print(x, ...)
x |
A factorana_table object |
... |
Additional arguments (ignored) |
Invisibly returns x. Called for its side effect of printing
a side-by-side model-comparison table (parameters grouped by component,
with estimates and standard errors) to the console.
Print method for model_component objects
## S3 method for class 'model_component' print(x, ...)## S3 method for class 'model_component' print(x, ...)
x |
An object of class "model_component". |
... |
Not used. |
Invisibly returns x. Called for its side effect of printing
a human-readable summary of the component to the console.
Print method for summary.factorana_result
## S3 method for class 'summary.factorana_result' print(x, digits = 4, ...)## S3 method for class 'summary.factorana_result' print(x, digits = 4, ...)
x |
A summary.factorana_result object |
digits |
Number of decimal places (default 4) |
... |
Additional arguments (ignored) |
Invisibly returns x. Called for its side effect of printing
a detailed estimation summary (convergence, log-likelihood, and a
coefficient table with z-values and significance stars) to the console.
Creates a table comparing estimates across multiple models, with standard errors in parentheses below each estimate.
results_table( ..., model_names = NULL, digits = 3, se_format = c("parentheses", "brackets", "below"), include_loglik = TRUE, include_n = TRUE, stars = TRUE )results_table( ..., model_names = NULL, digits = 3, se_format = c("parentheses", "brackets", "below"), include_loglik = TRUE, include_n = TRUE, stars = TRUE )
... |
One or more factorana_result objects, or a list of them |
model_names |
Optional character vector of model names for column headers |
digits |
Number of decimal places (default 3) |
se_format |
How to display standard errors: "parentheses" (default), "brackets", or "below" |
include_loglik |
Include log-likelihood row (default TRUE) |
include_n |
Include number of observations (default TRUE) |
stars |
Add significance stars (default TRUE) |
A data frame with the formatted table
# Estimate a small one-factor model and format the result as a table set.seed(1); n <- 200 f <- rnorm(n) dat <- data.frame(intercept = 1, y1 = 1.0 * f + rnorm(n, 0, 0.5), y2 = 0.8 * f + rnorm(n, 0, 0.5)) fm <- define_factor_model(n_factors = 1) mc1 <- define_model_component("m1", dat, "y1", fm, covariates = "intercept", model_type = "linear", loading_normalization = 1) mc2 <- define_model_component("m2", dat, "y2", fm, covariates = "intercept", model_type = "linear", loading_normalization = NA_real_) ms <- define_model_system(components = list(mc1, mc2), factor = fm) ctrl <- define_estimation_control(n_quad_points = 8, num_cores = 1) fit <- estimate_model_rcpp(ms, dat, control = ctrl, optimizer = "nlminb", parallel = FALSE, verbose = FALSE) results_table(fit, model_names = "Baseline")# Estimate a small one-factor model and format the result as a table set.seed(1); n <- 200 f <- rnorm(n) dat <- data.frame(intercept = 1, y1 = 1.0 * f + rnorm(n, 0, 0.5), y2 = 0.8 * f + rnorm(n, 0, 0.5)) fm <- define_factor_model(n_factors = 1) mc1 <- define_model_component("m1", dat, "y1", fm, covariates = "intercept", model_type = "linear", loading_normalization = 1) mc2 <- define_model_component("m2", dat, "y2", fm, covariates = "intercept", model_type = "linear", loading_normalization = NA_real_) ms <- define_model_system(components = list(mc1, mc2), factor = fm) ctrl <- define_estimation_control(n_quad_points = 8, num_cores = 1) fit <- estimate_model_rcpp(ms, dat, control = ctrl, optimizer = "nlminb", parallel = FALSE, verbose = FALSE) results_table(fit, model_names = "Baseline")
Creates a LaTeX table from factorana results, suitable for academic papers.
results_to_latex( ..., model_names = NULL, digits = 3, file = NULL, caption = NULL, label = NULL, stars = TRUE, note = NULL, booktabs = TRUE, include_loglik = TRUE, param_labels = NULL )results_to_latex( ..., model_names = NULL, digits = 3, file = NULL, caption = NULL, label = NULL, stars = TRUE, note = NULL, booktabs = TRUE, include_loglik = TRUE, param_labels = NULL )
... |
One or more factorana_result objects, or a list of them |
model_names |
Optional character vector of model names for column headers |
digits |
Number of decimal places (default 3) |
file |
Optional file path to write the LaTeX table |
caption |
Table caption (optional) |
label |
Table label for cross-referencing (optional) |
stars |
Add significance stars (default TRUE) |
note |
Footnote text (optional, default includes significance codes) |
booktabs |
Use booktabs package formatting (default TRUE) |
include_loglik |
Include log-likelihood row (default TRUE) |
param_labels |
Optional named list mapping parameter names to display labels |
A character string containing the LaTeX table code
# Estimate a small model and export the result as a LaTeX table set.seed(1); n <- 200 f <- rnorm(n) dat <- data.frame(intercept = 1, y1 = 1.0 * f + rnorm(n, 0, 0.5), y2 = 0.8 * f + rnorm(n, 0, 0.5)) fm <- define_factor_model(n_factors = 1) mc1 <- define_model_component("m1", dat, "y1", fm, covariates = "intercept", model_type = "linear", loading_normalization = 1) mc2 <- define_model_component("m2", dat, "y2", fm, covariates = "intercept", model_type = "linear", loading_normalization = NA_real_) ms <- define_model_system(components = list(mc1, mc2), factor = fm) ctrl <- define_estimation_control(n_quad_points = 8, num_cores = 1) fit <- estimate_model_rcpp(ms, dat, control = ctrl, optimizer = "nlminb", parallel = FALSE, verbose = FALSE) # Return LaTeX as a character string tex <- results_to_latex(fit, model_names = "Baseline") # Or write to a file inside tempdir() results_to_latex(fit, model_names = "Baseline", file = file.path(tempdir(), "results_table.tex"))# Estimate a small model and export the result as a LaTeX table set.seed(1); n <- 200 f <- rnorm(n) dat <- data.frame(intercept = 1, y1 = 1.0 * f + rnorm(n, 0, 0.5), y2 = 0.8 * f + rnorm(n, 0, 0.5)) fm <- define_factor_model(n_factors = 1) mc1 <- define_model_component("m1", dat, "y1", fm, covariates = "intercept", model_type = "linear", loading_normalization = 1) mc2 <- define_model_component("m2", dat, "y2", fm, covariates = "intercept", model_type = "linear", loading_normalization = NA_real_) ms <- define_model_system(components = list(mc1, mc2), factor = fm) ctrl <- define_estimation_control(n_quad_points = 8, num_cores = 1) fit <- estimate_model_rcpp(ms, dat, control = ctrl, optimizer = "nlminb", parallel = FALSE, verbose = FALSE) # Return LaTeX as a character string tex <- results_to_latex(fit, model_names = "Baseline") # Or write to a file inside tempdir() results_to_latex(fit, model_names = "Baseline", file = file.path(tempdir(), "results_table.tex"))
Convenience wrapper returning sqrt(diag(vcov_factorana(...))) as a named
vector over the full parameter vector. See vcov_factorana() for details
and caveats.
robust_se( object, data, type = c("robust", "cluster"), cluster = NULL, n_quad = NULL, finite_sample = TRUE )robust_se( object, data, type = c("robust", "cluster"), cluster = NULL, n_quad = NULL, finite_sample = TRUE )
object |
A |
data |
The data frame used to fit |
type |
Covariance type: |
cluster |
For |
n_quad |
Number of quadrature points for recomputing the per-observation
scores. Defaults to the value used at estimation (stored on |
finite_sample |
Logical; apply the usual finite-sample scaling
( |
Named numeric vector of standard errors (fixed/tied parameters are 0).
Enables adaptive integration where the number of quadrature points varies by observation based on the precision of factor score estimates. When factor scores are well-determined (small SE), fewer integration points are used. Importance sampling weights are computed automatically.
set_adaptive_quadrature_cpp( fm_ptr, factor_scores, factor_ses, factor_vars, threshold = 0.5, max_quad = 16L, verbose = TRUE )set_adaptive_quadrature_cpp( fm_ptr, factor_scores, factor_ses, factor_vars, threshold = 0.5, max_quad = 16L, verbose = TRUE )
fm_ptr |
External pointer to FactorModel object |
factor_scores |
Matrix (n_obs x n_factors) of factor score estimates |
factor_ses |
Matrix (n_obs x n_factors) of standard errors |
factor_vars |
Vector (n_factors) of factor variances from previous stage |
threshold |
Threshold for determining quadrature points (default 0.5, matching legacy) |
max_quad |
Maximum quadrature points per factor (default 16) |
verbose |
Whether to print summary of adaptive quadrature setup (default TRUE) |
No return value. Called for its side effect of enabling adaptive
quadrature on the FactorModel pointed to by fm_ptr and (when
verbose = TRUE) printing a summary of the per-observation
integration-point distribution.
Sets per-observation weights for the likelihood calculation. When weights are set, each observation's contribution to the log-likelihood is multiplied by its weight. This is used for importance sampling in adaptive integration.
set_observation_weights_cpp(fm_ptr, weights)set_observation_weights_cpp(fm_ptr, weights)
fm_ptr |
External pointer to FactorModel object |
weights |
Numeric vector of observation weights (length = n_obs) |
No return value. Called for its side effect of setting the
per-observation likelihood weights on the FactorModel pointed to by
fm_ptr.
Generates a simulated data set from a fitted or fully specified factorana
model, following the legacy simulation algorithm: for each simulated unit a
base row is drawn from data (with weights) to supply the covariates, the
latent factor(s) are drawn from the model's unconditional distribution, and
each component's outcome is drawn from its model (linear, probit, logit, or
ordered probit) given the covariates and factors.
simulate_factor_model( object, data, n = NULL, params = NULL, weights = NULL, seed = NULL, detail = TRUE )simulate_factor_model( object, data, n = NULL, params = NULL, weights = NULL, seed = NULL, detail = TRUE )
object |
A |
data |
Data frame whose rows are resampled to supply the covariates. |
n |
Number of units to simulate (default: |
params |
Named parameter vector. Defaults to |
weights |
Optional resampling weights: a column name in |
seed |
Optional integer seed. |
detail |
If |
Supports all four model types (linear, probit, logit including multinomial,
ordered probit), latent types, mixtures of normals, and the independent,
correlated, and structural-equation (SE_*) factor structures. The simulated
outcome column is named after each component's outcome variable, so the
returned data frame can be re-estimated with the same model system.
A data frame with one row per simulated unit: an xid, the resampled
covariates, the drawn latent factor(s) (factor_<k>), and per component the
simulated outcome (named after the component) plus, when detail = TRUE,
the diagnostic columns.
Summary method for factorana_result objects
## S3 method for class 'factorana_result' summary(object, ...)## S3 method for class 'factorana_result' summary(object, ...)
object |
A factorana_result object from estimate_model_rcpp() |
... |
Additional arguments (ignored) |
A summary object with formatted tables
Computes a sandwich (Huber-White) or cluster-robust covariance matrix for a
model fitted with estimate_model_rcpp(), as a post-hoc alternative to the
default inverse-Hessian (model-based) standard errors. The estimator is
where the "bread" is the inverse observed information already
computed at fit time and the "meat" is built from the
per-observation scores of the marginal (factor-integrated) log-likelihood:
(robust) or with
(cluster).
vcov_factorana( object, data, type = c("hessian", "robust", "cluster"), cluster = NULL, n_quad = NULL, finite_sample = TRUE )vcov_factorana( object, data, type = c("hessian", "robust", "cluster"), cluster = NULL, n_quad = NULL, finite_sample = TRUE )
object |
A |
data |
The data frame used to fit |
type |
Covariance type: |
cluster |
For |
n_quad |
Number of quadrature points for recomputing the per-observation
scores. Defaults to the value used at estimation (stored on |
finite_sample |
Logical; apply the usual finite-sample scaling
( |
One factorana observation is one row of data, with its measurements
integrated jointly over the latent factors. The latent factor therefore
already models the within-row dependence. Clustering is meaningful only when
a cluster groups several rows (for example several individuals in the same
household or school, or several stacked person-wave rows for one person in a
long-format second stage). With one row per cluster, the cluster estimator
reduces to the ordinary robust estimator.
For two-stage estimates these standard errors treat the first-stage (measurement) parameters and any plug-in factor scores as known, so they understate uncertainty (the generated-regressor problem). Use a bootstrap that re-runs both stages for honest two-stage inference.
A nparam x nparam covariance matrix over the full parameter vector,
with row/column names equal to the parameter names. Fixed and
equality-tied parameters have zero rows and columns. Standard errors are
sqrt(diag(.)).
Write a single CSV with all configuration rows
write_model_config_csv(model_system, factor_model, estimation_control, file)write_model_config_csv(model_system, factor_model, estimation_control, file)
model_system |
a model_system object |
factor_model |
a factor_model object |
estimation_control |
an estimation_control object |
file |
Path to CSV to write. Required; pass an explicit path (use
|
Invisibly returns the data frame of configuration rows that was
written to file (columns section, component,
key, value, dtype). Called primarily for its side
effect of writing a CSV.