Analysis Functions

Analytic Functions

analyze_cohort_persistence()

Tracks retention and persistence of a single student cohort from its entry grade and year onward, including students who join in later years (new joiners).
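
The core idea can be sketched in base R. This is a minimal illustration, not the package's implementation. It assumes long-format data with id, grade, and year columns and a one-grade-per-year progression, and the helper track_cohort() is hypothetical. For each year it counts how many original cohort members appear at the expected grade, and how many other students (new joiners) appear there.

```r
# Hypothetical helper, not part of the package: tracks one cohort in
# long-format data with columns id, grade, and year.
track_cohort <- function(df, start_grade, start_year) {
  # Students enrolled at the entry grade in the start year form the cohort
  cohort_ids <- unique(df$id[df$grade == start_grade & df$year == start_year])
  years <- sort(unique(df$year[df$year >= start_year]))
  out <- data.frame(
    year        = years,
    grade       = start_grade + (years - start_year),  # assumes one grade per year
    retained    = NA_integer_,
    new_joiners = NA_integer_
  )
  for (i in seq_along(years)) {
    # Students observed at the expected grade in this year
    ids_here <- unique(df$id[df$year == years[i] & df$grade == out$grade[i]])
    out$retained[i]    <- sum(ids_here %in% cohort_ids)
    out$new_joiners[i] <- sum(!(ids_here %in% cohort_ids))
  }
  out$pct_retained <- 100 * out$retained / length(cohort_ids)
  out
}

toy <- data.frame(
  id    = c(1, 2, 3, 1, 2, 4, 1, 4),
  grade = c(3, 3, 3, 4, 4, 4, 5, 5),
  year  = c(2020, 2020, 2020, 2021, 2021, 2021, 2022, 2022)
)
res <- track_cohort(toy, start_grade = 3, start_year = 2020)
```

With this toy data, res$retained is 3, 2, 1 and res$new_joiners is 0, 1, 1: student 4 is counted as a new joiner in every year after joining.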

Arguments:

  • dataset: A data frame containing at least id, grade, and year columns.
  • start_grade: The cohort’s entry grade (numeric or character).
  • start_year: The academic year in which the cohort begins (numeric or character).
  • end_grade: Optional; the final grade to track. Defaults to the highest grade in the dataset.
  • subset_expr: Optional logical expression used to subset the dataset before analysis.

Returns: A named list (invisibly) containing:

  • Summary: Long-format data frame showing cohort size and retention at each step.
  • Persistence_Data: List of filtered data frames for students retained each year.
  • Entry_Data: List of new joiners and their persistence outcomes.
  • Entry_Trajectories: List of retention trajectories for new joiners.
  • Statistics: Summary statistics (initial and final counts, and overall and average annual retention rates).
  • Heatmap_Plot, Line_Plot, Retention_Plot: Visualizations of cohort retention.
  • Enrollment_Plot, Enrollment_BarPlot: Enrollment trends by cohort and year.
  • Retention_Table: Wide-format table of percent retained by cohort and grade.
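
A hedged sketch of how the Statistics and Retention_Table outputs might be computed, using made-up toy counts. It assumes the overall rate is the final count divided by the initial count and the average annual rate is the arithmetic mean of year-over-year rates; the package may define either differently (a geometric mean is another common choice).

```r
# Toy cohort sizes by year (made-up numbers for illustration only)
counts <- c(`2020` = 120, `2021` = 102, `2022` = 90)

# Overall retention: final cohort size relative to initial size
overall_rate <- counts[[length(counts)]] / counts[[1]]

# Year-over-year rates and their arithmetic mean (an assumed definition)
annual_rates <- counts[-1] / counts[-length(counts)]
avg_annual_rate <- mean(annual_rates)

# Wide retention table: one row per cohort, one column per grade
long <- data.frame(
  cohort = c("2020", "2020", "2021", "2021"),
  grade  = c(3, 4, 3, 4),
  pct    = c(100, 85, 100, 90)
)
retention_table <- reshape(long, idvar = "cohort", timevar = "grade",
                           direction = "wide")
```

With these toy counts the overall rate is 0.75 and the average annual rate is about 0.866; reshape() names the wide columns pct.3 and pct.4.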

Example:

persistence <- analyze_cohort_persistence(my_data, start_grade = 3, start_year = 2020)

analyze_entry_cohorts()

Summarizes the characteristics and duration-in-cohort of student entry cohorts, including demographic distributions and retention histograms.

⚠️ Warning: Run analyze_student_cohorts() before using this function, which takes that function’s $Trajectories output as input.

Arguments:

  • trajectories: The $Trajectories data frame returned by analyze_student_cohorts().
  • demographic_variables: Optional character vector of demographic variables to summarize. If NULL, a default set of common variables (e.g., GENDER, ETHNICITY, ELL_STATUS) is used.
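
As an illustration of the kind of per-cohort demographic summary described here, the sketch below computes row percentages of each variable within each entry group. The toy columns entry_year, entry_grade, and entry_group are hypothetical names, not necessarily those in the $Trajectories output.

```r
# Toy trajectory-like data; entry_year, entry_grade, entry_group are hypothetical
traj <- data.frame(
  id          = 1:6,
  entry_year  = c(2020, 2020, 2020, 2021, 2021, 2021),
  entry_grade = c(3, 3, 3, 4, 4, 4),
  GENDER      = c("F", "M", "F", "M", "M", "F"),
  ELL_STATUS  = c("Y", "N", "N", "N", "Y", "N")
)
traj$entry_group <- paste(traj$entry_year, traj$entry_grade, sep = "_G")

demographic_variables <- c("GENDER", "ELL_STATUS")

# Percent of each category within each entry group (rows sum to 100)
group_summaries <- lapply(demographic_variables, function(v) {
  100 * prop.table(table(traj$entry_group, traj[[v]]), margin = 1)
})
names(group_summaries) <- demographic_variables
```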

Returns: A named list (invisibly) containing:

  • GroupSummaries: A list of data frames summarizing demographic or school characteristics by cohort.
  • DurationTable: A data frame with the mean, median, and maximum years in cohort per entry group.
  • DurationHistograms: A list of ggplot2 histograms showing the distribution of cohort retention by entry year and grade.
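
A minimal sketch of a DurationTable-style summary. It assumes that a student's years in cohort is the number of distinct years in which the student is observed, and that each row already carries a hypothetical entry_group label; the package's own definition may differ.

```r
# Toy data: one row per student-year, with a hypothetical entry_group label
traj <- data.frame(
  id          = c(1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 5),
  year        = c(2020, 2021, 2022, 2020, 2021, 2020,
                  2021, 2022, 2023, 2024, 2021, 2022),
  entry_group = c(rep("2020_G3", 6), rep("2021_G4", 6))
)

# Years in cohort per student = number of distinct years observed (assumed)
per_student <- aggregate(year ~ id + entry_group, data = traj,
                         FUN = function(y) length(unique(y)))
names(per_student)[names(per_student) == "year"] <- "years_in_cohort"

# Mean, median, and maximum years in cohort per entry group
groups <- sort(unique(per_student$entry_group))
duration_table <- do.call(rbind, lapply(groups, function(g) {
  x <- per_student$years_in_cohort[per_student$entry_group == g]
  data.frame(entry_group = g, mean = mean(x), median = median(x), max = max(x))
}))
```

Here the 2020_G3 group has students with 3, 2, and 1 years (mean 2, max 3), and the 2021_G4 group has 4 and 2 years (mean 3, max 4).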

Example:

# Step 1: Use analyze_student_cohorts to get the Trajectories
cohorts <- analyze_student_cohorts(dataset = math)

# Step 2: Use the Trajectories to analyze entry cohorts
entry <- analyze_entry_cohorts(trajectories = cohorts$Trajectories)

analyze_student_cohorts()

Analyzes multiple student cohorts’ trajectories and retention patterns across grades and years.
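
One way to assign cohorts and screen for valid progression is sketched below. The rules are assumptions, since the exact definition of “valid grade progression” is not documented here: a student's cohort is the year and grade of first appearance, and a trajectory counts as valid only if each later record is exactly one year and one grade after the previous one.

```r
# Toy long-format records
df <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3, 3),
  year  = c(2020, 2021, 2022, 2021, 2022, 2020, 2021),
  grade = c(3, 4, 5, 4, 5, 3, 3)
)
df <- df[order(df$id, df$year), ]

# Cohort = year and grade of each student's first appearance
first_row <- !duplicated(df$id)
joins <- data.frame(id         = df$id[first_row],
                    join_year  = df$year[first_row],
                    join_grade = df$grade[first_row])

# Assumed validity rule: every step is +1 year and +1 grade
valid <- vapply(split(df, df$id), function(s) {
  all(diff(s$year) == 1 & diff(s$grade) == 1)
}, logical(1))

# Keep valid students and attach their cohort labels
trajectories <- merge(df[as.character(df$id) %in% names(valid)[valid], ],
                      joins, by = "id")
```

Student 3 repeats grade 3, so under this rule that student is dropped and the cleaned data keeps five rows.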

Arguments:

  • dataset: A data frame with at least id, grade, and year columns.
  • details: Logical; reserved for future use. When TRUE, it is intended to include intermediate data objects in the output.
  • extra_variables: Optional character vector of additional student-level variables to merge (e.g., "GENDER", "ETHNICITY").

Returns: A named list (invisibly) containing:

  • Trajectories: Cleaned long-format data tracking valid grade progression over time.
  • Data: Nested list of student IDs by cohort and year.
  • Summary: Table of student counts and retention percentages by cohort, year, and grade.
  • Table: Wide-format table of student counts by cohort, year, and grade.
  • Heatmaps: A list of ggplot heatmaps, one per cohort join year.
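
A sketch of how Summary- and Table-style outputs could be derived from cleaned trajectories. It assumes a hypothetical join_year column, that each join-year cohort has a single entry grade, and that retention is measured against the cohort's size in its join year.

```r
# Toy cleaned trajectories; join_year is a hypothetical cohort label
traj <- data.frame(
  id        = c(1, 1, 2, 3, 4),
  join_year = c(2020, 2020, 2020, 2021, 2021),
  year      = c(2020, 2021, 2020, 2021, 2021),
  grade     = c(3, 4, 3, 4, 4)
)

# Long summary: distinct students per cohort, year, and grade
summary_long <- aggregate(id ~ join_year + year + grade, data = traj,
                          FUN = function(x) length(unique(x)))
names(summary_long)[names(summary_long) == "id"] <- "n"

# Retention relative to cohort size in the join year
# (assumes one entry grade per join-year cohort)
base <- summary_long[summary_long$year == summary_long$join_year, c("join_year", "n")]
summary_long$pct_retained <- 100 * summary_long$n /
  base$n[match(summary_long$join_year, base$join_year)]

# Wide table: cohorts as rows, years as columns
table_wide <- xtabs(n ~ join_year + year, data = summary_long)
```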

Example:

cohorts <- analyze_student_cohorts(dataset = my_data,
                                   extra_variables = c("GENDER", "ETHNICITY"))

Comparative Functions

compare_achievement_mobility()

Compares student achievement levels by mobility status across two consecutive years.

Arguments:

  • dataset: A data frame containing student-level records with achievement and mobility information.
  • current_year: The focal academic year (numeric or character).
  • current_grade: The grade level for the current year (numeric).
  • achievement_levels: Ordered character vector specifying achievement level categories.

Returns: A named list (invisibly) containing:

  • Previous_Table, Current_Table: Cross-tabulated percent distributions of achievement by mobility status for the two years.
  • Data: Combined data frame of achievement percentages by group and year.
  • Group_Sizes: Counts of students by mobility group for each year.
  • Most_Common_Level: Most frequent achievement level per mobility group and year.
  • Achievement_Change_Summary: Changes in achievement percentages between years.
  • Comparison_Plot: Side-by-side bar plots of achievement distributions for the previous and current years.
  • Stay_Change_Plot, Stay_Change_Summary: Visualization and summary of achievement changes for the stable (“Stay”) group.
  • Caption: Descriptive text summarizing the analysis.
  • Note: Additional context about mobility groups.
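
The core comparison can be sketched with base R tables. This is an illustration under stated assumptions, not the function's code: the toy records, the “Move” group label (only “Stay” is named above), and the column names year, mobility, and level are hypothetical.

```r
levels_order <- c("Advanced", "Proficient", "Partially Proficient", "Unsatisfactory")
current_year <- 2023
previous_year <- current_year - 1

# Toy records; "Move" is a hypothetical label for the non-stable group
scores <- data.frame(
  year     = rep(c(previous_year, current_year), each = 6),
  mobility = rep(c("Stay", "Stay", "Stay", "Move", "Move", "Move"), times = 2),
  level    = factor(c("Advanced", "Proficient", "Proficient",
                      "Unsatisfactory", "Proficient", "Unsatisfactory",
                      "Advanced", "Advanced", "Proficient",
                      "Partially Proficient", "Proficient", "Unsatisfactory"),
                    levels = levels_order)
)

# Percent of each achievement level within each mobility group
pct_table <- function(d) 100 * prop.table(table(d$mobility, d$level), margin = 1)
previous_table <- pct_table(scores[scores$year == previous_year, ])
current_table  <- pct_table(scores[scores$year == current_year, ])

# Percentage-point change between years, and the most frequent current level
achievement_change <- current_table - previous_table
most_common <- apply(current_table, 1, function(r) names(r)[which.max(r)])
group_sizes <- table(scores$year, scores$mobility)
```

Using a factor with explicit levels keeps every achievement category as a column, even when a group has no students at that level.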

Example:

mobility <- compare_achievement_mobility(
  dataset = my_data,
  current_year = 2023,
  current_grade = 5,
  achievement_levels = c("Advanced", "Proficient", "Partially Proficient", "Unsatisfactory")
)