Korean National Assembly data for political science education.
Documentation: https://kyusik-yang.github.io/assemblykor/
Overview
The goal of assemblykor is to provide a curated collection of Korean National Assembly datasets for teaching quantitative methods in political science. Think of it as a Korean politics counterpart to palmerpenguins.
The package includes seven built-in datasets covering legislators, bills, asset declarations, policy seminars, committee speeches, plenary votes, and roll call records, all drawn from public data of the Korean National Assembly (2000-2026).
Why tidyverse?
This package and all its tutorials follow tidyverse conventions. We use dplyr, ggplot2, tidyr, and the pipe operator (%>%) throughout.
Why tidyverse-first for teaching?
- Readability: tidyverse syntax reads like a sequence of data operations, reducing cognitive load for beginners (Cetinkaya-Rundel et al., 2022).
- Consistency: Packages share a common grammar and data structure, so learning one makes the next easier (Wickham et al., 2019).
- Doing things quickly: Students can perform meaningful data analysis from day one, rather than spending weeks on base R syntax (Robinson, 2017).
- Community standard: Most modern R textbooks, including R for Data Science, adopt tidyverse as the default, making it easier for students to find help and resources.
Note: This package is designed for educational purposes only. The datasets have been processed and curated for classroom use and may not reflect the most up-to-date or complete records. For research or policy analysis, please verify against the original data sources listed in the Data sources section below.
Meet the data
|
20-22대 국회의원 메타데이터 (이름, 정당, 선거구, 성별, 선수, 발의 건수) |
법안 메타데이터 (제목, 위원회, 발의일, 처리 결과, 대표발의자) |
|
의원 재산신고 패널 (순자산, 부동산, 예금, 주식, 13개 시점) |
정책세미나 활동 패널 (개최 건수, 초당적 비율, 여당 여부) |
|
22대 과학기술정보방송통신위원회 상임위 회의록 전수 (발언 전문, 화자 역할 포함) |
본회의 표결 결과 (20-22대, 찬성/반대/기권 수, 의결 결과) |
|
22대 개별 의원 표결 기록 (의원별 찬성/반대/기권/불참, 1,233개 법안) |
|
Usage
library(assemblykor)
data(legislators)
data(bills)
data(wealth)
data(seminars)
data(speeches)
data(votes)
data(roll_calls)Korean font setup
ggplot2 plots with Korean text (axis labels, titles) may show broken characters. Run this once per session to fix it:
set_ko_font()
#> Korean font set: Apple SD Gothic NeoThis auto-detects a Korean font on your system (macOS, Windows, Linux). The interactive tutorials (run_tutorial()) handle this automatically.
Downloadable datasets
Larger datasets are available via download functions (requires the arrow package):
# Bill propose-reason texts (60,925 texts, ~40 MB download)
texts <- get_bill_texts()
# Co-sponsorship records (769,773 rows, ~25 MB download)
proposers <- get_proposers()Tutorials (한국어 수업 자료)
The package includes nine Korean-language tutorials designed for classroom use in political science methods courses:
| # | Tutorial | Topic | Level |
|---|---|---|---|
| 1 | R 기초와 tidyverse |
dplyr 핵심 함수, 파이프, 데이터 결합 |
Beginner |
| 2 | ggplot2 시각화 | 막대, 히스토그램, 산점도, 박스플롯, facet | Beginner |
| 3 | 회귀분석 | OLS, 다중회귀, 로그 변환, 상호작용, 계수 시각화 | Intermediate |
| 4 | 패널 데이터와 고정효과 | 합동 OLS vs FE, 양방향 FE, DiD, 군집 표준오차 | Intermediate |
| 5 | 텍스트 분석 입문 | 키워드 빈도, TF-IDF, 발언 분석, 위원회별 비교 | Intermediate |
| 6 | 네트워크 분석 | 공동발의 네트워크, 중심성, 커뮤니티 탐지, 초당적 분석 | Advanced |
| 7 | 기명투표 분석 | Rice Index, 이탈투표, 투표 히트맵, 정당 응집력 | Advanced |
| 8 | 법안 가결 요인 | 이항 변수, 로지스틱 회귀, 승산비, 예측 확률 | Intermediate |
| 9 | 발언 패턴 분석 | 발언 빈도/길이, 발언 순서, 의제 키워드, 다양성 | Advanced |
Each tutorial is available in two formats:
Option A: Interactive browser (recommended for self-study)
# Launch an interactive tutorial with exercises in the browser
run_tutorial(1) # or run_tutorial("01-tidyverse-basics")Requires the learnr package (install.packages("learnr")). Students can type and run code directly in the browser with hints and solutions.
Option B: Plain R Markdown (for editing in RStudio)
# List available tutorials
list_tutorials()
# Copy an Rmd file to your working directory
open_tutorial(1) # or open_tutorial("01-tidyverse-basics")Students can edit and knit the Rmd file in RStudio at their own pace.
Joining datasets
All datasets share the member_id and/or assembly columns for easy joining:
# Merge legislators with wealth data
leg_wealth <- legislators %>%
inner_join(wealth, by = "member_id", relationship = "many-to-many")CSV access
CSV versions of the smaller datasets are available for teaching file I/O:
# Find the file path
path_to_file("legislators.csv")
# Read directly
legislators_csv <- read.csv(path_to_file("legislators.csv"), fileEncoding = "UTF-8")Data sources
All data in this package are derived from publicly available sources.
| Dataset | Source | License |
|---|---|---|
| Legislators, bills, proposers, votes, roll calls | Open National Assembly Information API (open.assembly.go.kr) | Public domain (Korean government open data) |
| Speeches (committee minutes) | Open National Assembly Information API (open.assembly.go.kr) | Public domain (Korean government open data) |
| Asset declarations | OpenWatch | CC BY-SA 4.0 |
| Policy seminars | National Assembly Seminar Database | Public data |
Inspired by: palmerpenguins (Horst, Hill & Gorman, 2020), which demonstrates how domain-specific datasets can make methods teaching more engaging.
Disclaimer
This package is intended for educational use only (teaching quantitative methods in political science courses). The datasets have been processed, filtered, and in some cases sampled for classroom convenience. They should not be treated as authoritative records.
For research or policy analysis, please consult the original data sources listed above and verify the data independently.
Citation
citation("assemblykor")Yang, Kyusik (2026). assemblykor: Korean National Assembly Data for
Political Science Education. R package version 0.1.1.
https://github.com/kyusik-yang/assemblykor
