3 Basic example – paneldesc package guides

3.1 Set up

Import the package.

library(paneldesc)

3.2 Data import

Import the built-in dataset with simulated unbalanced panel data.

data(production)

3.3 Panel data structure analysis

The first group of functions is designed to analyze the structure of the panel.

To begin with, you can look at the basic characteristics of the panel: the number of entities, the number of periods, the number of rows, and the number of variables.

describe_dimensions(production, index = c("firm", "year"))

rows	entities	periods	variables
180	30	6	5

To begin analyzing the balance of the panel, you can look at the representation of entities by period.

describe_periods(production, index = c("firm", "year"))

year	count	share
1	25	0.833
2	28	0.933
3	30	1.000
4	29	0.967
5	26	0.867
6	19	0.633

For a more detailed analysis of the panel balance, you can look at the distribution of the number of entities by periods and the number of periods by entities.

describe_balance(production, index = c("firm", "year"))

dimension	mean	std	min	max
entities	26.167	3.971	19	30
periods	5.233	0.935	3	6

You can also visualize the distribution of the number of periods across entities.

plot_periods(production, index = c("firm", "year"))

In addition, the main patterns of entities participation in the panel can be displayed in a table or graph.

describe_patterns(production, index = c("firm", "year"))

pattern	1	2	3	4	5	6	count	share
1	1	1	1	1	1	1	16	0.533
2	1	1	1	1	1	0	5	0.167
3	1	1	1	1	0	0	3	0.100
4	0	0	1	1	1	1	2	0.067
5	0	1	1	1	1	0	2	0.067
6	0	1	1	1	1	1	1	0.033
7	1	1	1	0	0	0	1	0.033

plot_patterns(production, index = c("firm", "year"))

3.4 Missing values analysis

The second group of functions is aimed at analyzing missing values, taking into account the nature of panel data.

You can create a heatmap showing the number of missing values for each variable across all time periods.

plot_missing(production, index = c("firm", "year"))

Analysing all variables: sales, capital, labor, industry, ownership

Also, you can output the table with brief summary statistics on missing values.

summarize_missing(production, index = c("firm", "year"))

Analyzing all variables: sales, capital, labor, industry, ownership

variable	na_count	na_share	entities	periods
sales	26	0.144	15	5
capital	26	0.144	17	5
labor	26	0.144	16	6
industry	23	0.128	14	5
ownership	23	0.128	14	5

In addition, you can compare missing values statistics across various entities.

describe_incomplete(production, index = "firm")

firm	na_count	variables
23	15	5
21	11	5
1	10	5
2	10	5
6	10	5
7	10	5
12	10	5
26	10	5
25	6	5
30	6	5
4	5	5
13	5	5
17	5	5
29	5	5
8	2	2
3	1	1
10	1	1
14	1	1
24	1	1

3.5 Numeric variables analysis

The third group of functions is aimed at analyzing numeric variables, taking into account the nature of panel data.

If you need to, you can output a table with simple descriptive statistics.

summarize_numeric(production)

Analyzing all numeric variables: firm, year, sales, capital, labor

variable	count	mean	std	min	max
firm	180	15.500	8.680	1.000	30.000
year	180	3.500	1.713	1.000	6.000
sales	154	69.756	46.804	8.321	336.853
capital	154	32.490	31.053	0.968	194.719
labor	154	79.329	73.687	4.097	419.848

Descriptive statistics can be grouped by some variable, which does not necessarily have to be a panel identifier.

summarize_numeric(production, group = "year")

Analyzing all numeric variables: firm, sales, capital, labor

year	variable	count	mean	std	min	max
1	firm	30	15.500	8.803	1.000	30.000
1	sales	25	58.491	44.590	8.321	190.100
1	capital	24	24.862	16.273	0.968	65.950
1	labor	25	68.871	66.941	4.097	246.852
2	firm	30	15.500	8.803	1.000	30.000
2	sales	28	56.099	37.944	17.803	186.349
2	capital	27	28.790	31.053	3.150	151.464
2	labor	27	60.463	48.484	11.692	222.761
3	firm	30	15.500	8.803	1.000	30.000
3	sales	30	76.660	47.574	20.580	219.513
3	capital	30	35.464	39.174	4.729	194.719
3	labor	29	90.437	82.628	9.284	414.844
4	firm	30	15.500	8.803	1.000	30.000
4	sales	28	73.104	33.238	19.455	135.118
4	capital	29	44.522	35.375	5.080	132.898
4	labor	29	73.967	54.005	16.327	240.726
5	firm	30	15.500	8.803	1.000	30.000
5	sales	24	75.398	43.091	20.161	211.092
5	capital	25	28.351	23.127	5.339	86.078
5	labor	26	90.604	85.026	21.063	413.784
6	firm	30	15.500	8.803	1.000	30.000
6	sales	19	81.744	73.320	20.352	336.853
6	capital	19	29.767	30.908	2.288	108.787
6	labor	18	96.609	103.777	20.507	419.848

Heterogeneity between different groups can also be visualized.

plot_heterogeneity(production, select = "sales", group = "year")

For more detailed heterogeneity analysis, numeric variables can be decomposed into between and within components.

decompose_numeric(production, index = "firm")

Analyzing all numeric variables: year, sales, capital, labor

variable	dimension	mean	std	min	max	count
year	overall	3.500	1.713	1.000	6.000	180.000
year	between	NA	0.000	3.500	3.500	30.000
year	within	NA	1.713	1.000	6.000	6.000
sales	overall	69.756	46.804	8.321	336.853	154.000
sales	between	NA	29.776	25.772	159.197	30.000
sales	within	NA	35.862	-28.397	247.412	5.133
capital	overall	32.490	31.053	0.968	194.719	154.000
capital	between	NA	13.969	8.671	75.083	30.000
capital	within	NA	27.701	-22.444	152.126	5.133
labor	overall	79.329	73.687	4.097	419.848	154.000
labor	between	NA	44.023	24.606	175.731	30.000
labor	within	NA	59.561	-77.709	323.445	5.133

3.6 Factor variables analysis

The last group of functions is aimed at analyzing factor (categorical) variables, taking into account the nature of panel data.

Factor variables can be decomposed into between and within components.

decompose_factor(production, index = "firm")

Analyzing all factor variables: industry, ownership

variable	category	count_overall	share_overall	count_between	share_between	share_within
industry	Industry 1	63	0.401	13	0.433	0.918
industry	Industry 2	45	0.287	11	0.367	0.809
industry	Industry 3	49	0.312	10	0.333	0.917
ownership	private	76	0.484	16	0.533	0.898
ownership	public	55	0.350	13	0.433	0.813
ownership	mixed	26	0.166	7	0.233	0.724

One can also summarize transitions between states of a categorical (factor) variable.

summarize_transition(production, select = "industry", index = c("firm", "year"))

23 rows with NA values in 'industry' removed.

from_to	Industry 1	Industry 2	Industry 3
Industry 1	1.000	0.000	0.000
Industry 2	0.054	0.919	0.027
Industry 3	0.000	0.025	0.975