Title: | Compare Heterogeneous Social Groups |
---|---|
Description: | The Inductive Subgroup Comparison Approach ('ISCA') offers a way to compare groups that are internally differentiated and heterogeneous. It starts by identifying the social structure of a reference group against which a minority or another group is to be compared, yielding empirical subgroups to which minority members are then matched based on how similar they are. Specific outcomes are then modelled within the subgroups in which majority and minority members are matched. ISCA is characterized by its data-driven, probabilistic, and iterative approach and combines fuzzy clustering, Monte Carlo simulation, and regression analysis. ISCA_random_assignments() assigns subjects probabilistically to subgroups. ISCA_clustertable() provides summary statistics of each cluster across iterations. ISCA_modeling() provides OLS regression results for each cluster across iterations. |
Authors: | Lucas Drouhot [aut, cre], Marion Späth [aut] |
Maintainer: | Lucas Drouhot <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.1.0 |
Built: | 2025-03-02 04:50:10 UTC |
Source: | https://github.com/ldrouhot/isca |
Function to create a cluster or descriptive table across iterations.
ISCA_clustertable(data, cluster_vars, draws = 500)
data |
The dataset including all relevant variables and the random assignments produced in the first step by ISCA_random_assignments(). |
cluster_vars |
A vector specifying the variables of interest. |
draws |
Specification of the number of probabilistic draws. The number of draws should be equal to the number of draws specified in the first step. If not specified, the default is 500. |
The output is a table containing the grand mean, grand standard deviation, and cluster error for each variable and cluster. No cluster error is calculated for dichotomous variables.
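The idea of a summary "across iterations" can be illustrated with a minimal base R sketch. It does not call the package and does not claim to reproduce its internals: the toy data, the assignment column names (`draw_1`, ...) and the reading of "grand mean" and "grand standard deviation" as averages of per-draw cluster statistics are all assumptions made for illustration. The cluster error is left out because its definition is not given here.

```r
set.seed(1)
n <- 200
toy <- data.frame(age = round(runif(n, 18, 80)))  # toy variable, not sim_data

# Hypothetical assignment columns, one per draw (names are assumptions).
draw_cols <- paste0("draw_", 1:5)
for (col in draw_cols) toy[[col]] <- sample(1:4, n, replace = TRUE)

# Mean and SD of `age` per cluster within each draw
# (clusters in rows, draws in columns).
by_draw <- function(fun) {
  sapply(draw_cols, function(col) {
    tapply(toy$age, factor(toy[[col]], levels = 1:4), fun)
  })
}
means_by_draw <- by_draw(mean)
sds_by_draw   <- by_draw(sd)

# Average the per-draw statistics across draws (assumed aggregation).
summary_table <- data.frame(
  cluster    = rownames(means_by_draw),
  grand_mean = rowMeans(means_by_draw),
  grand_sd   = rowMeans(sds_by_draw)
)
summary_table
```

Because each draw reshuffles cluster membership, the per-draw means differ slightly; averaging them gives one row per cluster.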
data(sim_data)
ISCA_step1 <- ISCA_random_assignments(data = sim_data, filter = native,
  majority_group = 1, minority_group = c(0), fuzzifier = 1.5,
  n_clusters = 4, draws = 5,
  cluster_vars = c("female", "age", "education", "income"))
result_ISCA_clustertable <- ISCA_clustertable(data = ISCA_step1,
  cluster_vars = c("native", "education", "age", "female",
                   "discrimination", "religiosity"),
  draws = 5)
Function to compute an OLS regression across all clusters and iterations.
ISCA_modeling(data, model_spec, weights = NULL, n_clusters, draws = 500)
data |
The dataset including all relevant variables and the random assignments produced in the first step by ISCA_random_assignments(). |
model_spec |
A model specification similar to the lm()-function. |
weights |
A vector specifying the variable in which the weights are stored. The default is NULL (no weights). |
n_clusters |
Specification of the number of clusters. This value should be equal to the number of clusters specified in the first step. |
draws |
Specification of the number of probabilistic draws. The number of draws should be equal to the number of draws specified in the first and second step. If not specified, the default is 500. |
The output is a table containing the regression coefficient, standard error, and p-value for each regression term and cluster across all iterations. It also contains the regression coefficient, standard error, and p-value for a pooled model, that is, a model with all clusters combined.
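The kind of table described above can be sketched in base R with lm(). This is only an illustration under stated assumptions: the toy data, the assignment column names (`draw_1`, ...) and the long output layout are hypothetical, and the sketch does not show how the package itself summarises results across iterations.

```r
set.seed(2)
n <- 400
toy <- data.frame(
  native = rbinom(n, 1, 0.7),
  age    = round(runif(n, 18, 80))
)
toy$religiosity <- 5 - 0.5 * toy$native + 0.01 * toy$age + rnorm(n)

# Hypothetical assignment columns, one per draw (names are assumptions).
draw_cols <- paste0("draw_", 1:3)
for (col in draw_cols) toy[[col]] <- sample(1:2, n, replace = TRUE)

model_spec <- religiosity ~ native + age

# One lm() fit per draw and cluster; keep estimate, SE and p-value per term.
fits <- do.call(rbind, lapply(draw_cols, function(col) {
  do.call(rbind, lapply(sort(unique(toy[[col]])), function(k) {
    est <- coef(summary(lm(model_spec, data = toy[toy[[col]] == k, ])))
    data.frame(draw = col, cluster = k, term = rownames(est),
               estimate  = est[, "Estimate"],
               std_error = est[, "Std. Error"],
               p_value   = est[, "Pr(>|t|)"],
               row.names = NULL)
  }))
}))

# Pooled model: all clusters combined.
pooled <- coef(summary(lm(model_spec, data = toy)))
head(fits)
```

Each row of `fits` is one regression term in one cluster for one draw; `pooled` holds the same statistics for the model fitted to all observations.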
data(sim_data)
ISCA_step1 <- ISCA_random_assignments(data = sim_data, filter = native,
  majority_group = 1, minority_group = c(0), fuzzifier = 1.5,
  n_clusters = 4, draws = 5,
  cluster_vars = c("female", "age", "education", "income"))
ISCA_modeling_res <- ISCA_modeling(data = ISCA_step1,
  model_spec = "religiosity ~ native + female + age + education + discrimination",
  draws = 5, n_clusters = 4)
Function that calculates membership scores for each subgroup and assigns a cluster for a number of random draws.
ISCA_random_assignments( data, filter, majority_group, minority_group, cluster_vars, fuzzifier = 1.5, n_clusters, draws = 500 )
data |
A dataset containing all relevant variables |
filter |
Specification of the variable name that contains information on majority / minority status. |
majority_group |
Specification of the value within the variable specified in the previous filter-argument indicating majority status. This could be either a numeric value or a character string. |
minority_group |
Specification of the value(s) indicating minority status in the filter variable. Each value can be either numeric or a character string. This can be a single minority group or a vector of several minority groups. |
cluster_vars |
Vector specifying the variables that should be used to create the clusters. |
fuzzifier |
The fuzzifier is a value larger than 1 determining the extent of overlap between clusters. The closer the value is to 1, the more closely fuzzy c-means resembles hard k-means; larger values produce more overlap. The default is 1.5. |
n_clusters |
Specification of the number of clusters to be created. |
draws |
Specification of the number of probabilistic draws. If not specified, the default is 500. |
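The effect of the fuzzifier argument can be illustrated with the textbook fuzzy c-means membership formula, u_k = 1 / sum_j (d_k / d_j)^(2 / (m - 1)), where d_k is the distance to cluster centre k and m is the fuzzifier. The sketch below is base R only; the function name `fcm_membership` and the distances are hypothetical, and it is not taken from the package's own code.

```r
# Membership of one observation in each cluster, given its distances to the
# cluster centres and a fuzzifier m > 1 (standard fuzzy c-means formula).
fcm_membership <- function(dist_to_centres, m) {
  stopifnot(m > 1, all(dist_to_centres > 0))
  expo <- 2 / (m - 1)
  sapply(dist_to_centres, function(d_k) 1 / sum((d_k / dist_to_centres)^expo))
}

d <- c(1, 2, 4)  # hypothetical distances to three cluster centres

u_default <- fcm_membership(d, m = 1.5)   # the documented default fuzzifier
u_crisp   <- fcm_membership(d, m = 1.05)  # close to 1: nearly hard assignment
u_fuzzy   <- fcm_membership(d, m = 3)     # larger value: more overlap

round(rbind(u_crisp, u_default, u_fuzzy), 3)
```

Memberships sum to 1 in every case; as m shrinks towards 1, almost all weight goes to the nearest centre, and as m grows the scores spread more evenly.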
The output is a dataframe with all original variables and a new column for every draw, each containing one random assignment. This dataframe is the foundation of the subsequent functions in the ISCA package.
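What "one random assignment per draw" means can be sketched in base R: each subject's membership scores are treated as probabilities, and one cluster is sampled per subject in every draw. The membership matrix and the column names (`draw_1`, ...) below are hypothetical; the names of the columns the package actually adds are not stated here.

```r
set.seed(42)

# Hypothetical membership scores: 3 subjects (rows) x 4 clusters (columns).
membership <- matrix(
  c(0.70, 0.10, 0.10, 0.10,
    0.25, 0.25, 0.25, 0.25,
    0.00, 0.00, 1.00, 0.00),
  nrow = 3, byrow = TRUE
)
n_draws <- 5

# For each draw, sample one cluster per subject with probability equal to
# that subject's membership scores.
assignments <- sapply(seq_len(n_draws), function(draw) {
  apply(membership, 1, function(p) sample(seq_along(p), size = 1, prob = p))
})
colnames(assignments) <- paste0("draw_", seq_len(n_draws))  # assumed names
assignments
```

A subject with one dominant membership score lands in the same cluster in most draws, while a subject with evenly spread scores moves between clusters; this variation is what the later steps summarise across iterations.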
data(sim_data)
ISCA_step1 <- ISCA_random_assignments(data = sim_data, filter = native,
  majority_group = 1, minority_group = c(0), fuzzifier = 1.5,
  n_clusters = 4, draws = 5,
  cluster_vars = c("female", "age", "education", "income"))
Small, artificially created dataset in a cross-sectional format. Provides information on 1000 individuals to illustrate the use of the package.
data(sim_data)
A data frame with 1,000 rows and 7 columns:
female |
Dichotomous variable (0/1) indicating a person's sex |
age |
Value indicating a person's age, range 18-80 |
education |
Value indicating a person's level of education, range 1-9 |
income |
Value indicating a person's income |
religiosity |
Value indicating a person's level of religiosity, range 1-10 |
discrimination |
Value indicating a person's level of experienced discrimination, range 0-8 |
native |
Dichotomous variable (0/1) indicating whether a person is a native (1) or an immigrant (0) |
The data was artificially created for the ISCA package.
data(sim_data)
head(sim_data)