This document discusses challenges and opportunities in combining data from the European Company Survey (ECS) and the European Working Conditions Survey (EWCS). While statistical matching is not possible due to different target populations, the paper explores integrating aggregate-level estimates from one survey into a micro-dataset from the other. Key variables like sector, size, and country allow combined analysis after ensuring sufficiently populated cells. An example analysis finds relationships between employee representation and earnings at establishment, sector and country levels. Further research is suggested to better integrate survey design and estimation techniques.
1. Combining data from the European Company Survey and
the European Working Conditions Survey
Challenges and opportunities
Beyond 4.0 – Expert Workshop
29 March 2021
2. Eurofound work on statistical matching
• Various projects looking at individual level matching
– EQLS with EU-SILC
– EWCS with EQLS
• Mixed results
– Many common variables between EQLS and EWCS, but small sample size –
not many donors to choose from
– Less overlap between EQLS and EU-SILC, but much larger sample size
making EU-SILC a more attractive donor file
– Generally, less is more: matching fewer variables is more feasible
3. A special case: aggregate level matching of the ECS and EWCS
• 2017 paper: “Combining data from different surveys in analysis:
Compatibility of the 2013 European Company Survey and the 2015
European Working Conditions Survey”
• EWCS and ECS target a different population, so statistical matching
is not possible
• Instead, the paper explores the potential for analysing a micro-
dataset into which aggregate level estimates from another survey
have been integrated
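The idea of integrating aggregate-level estimates into a micro-dataset can be sketched as follows. This is only an illustration under stated assumptions: the records, field names (`country`, `sector`, `rep_share_cell`) and the unweighted mean are hypothetical, and a real analysis would apply the survey weights.

```python
from collections import defaultdict

# Hypothetical ECS establishment records:
# (country, sector, employee representation present?)
ecs_records = [
    ("IE", "C", True),
    ("IE", "C", False),
    ("IE", "G", False),
    ("NL", "C", True),
    ("NL", "C", True),
]

# Hypothetical EWCS worker records (micro-dataset)
ewcs_records = [
    {"worker_id": 1, "country": "IE", "sector": "C"},
    {"worker_id": 2, "country": "NL", "sector": "C"},
    {"worker_id": 3, "country": "NL", "sector": "G"},  # no matching ECS cell
]


def cell_proportions(records):
    """Unweighted share of establishments with representation per cell."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [n with representation, n total]
    for country, sector, has_rep in records:
        counts[(country, sector)][0] += int(has_rep)
        counts[(country, sector)][1] += 1
    return {cell: with_rep / total for cell, (with_rep, total) in counts.items()}


def attach_aggregates(workers, proportions):
    """Return copies of the worker records with the cell estimate added.

    Workers in a cell without an estimate get None.
    """
    return [
        {**w, "rep_share_cell": proportions.get((w["country"], w["sector"]))}
        for w in workers
    ]


proportions = cell_proportions(ecs_records)
combined = attach_aggregates(ewcs_records, proportions)
```

Each worker record keeps its individual-level variables and gains one contextual variable, which is the structure a multilevel analysis needs.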
4. Avoiding the ecological fallacy
• “Ecological fallacy” occurs when associations between group level
characteristics are assumed to also hold for the individuals that
make up those groups (Robinson, 1950; Freedman, 1999)
• The ecological fallacy is an issue of validity, but ecological variables
can be very useful, as long as they do not serve as substitutes for
individual level variables (Schwartz, 1999)
• Combined analysis of ECS and EWCS is therefore only useful if
there are explicit expectations about contextual effects
5. Research questions
1. To what extent does the structure of the ECS and the EWCS allow
for combined analysis of the data from the two surveys?
a. What aggregate level units can be used?
b. How do the data need to be manipulated to enable this?
2. How can the topics that are most suited for combined analysis be
identified?
3. What could such a combined analysis look like?
6. Enabling comparison: survey design
• Two very different surveys
– Different target populations
– Collected at different points in time
– Different mode of administration
• Practical issue:
– ECS only covers establishments with at least 10 employees in NACE sectors
B-N, R, S
– Analyses therefore carried out on a sub-sample of the EWCS
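Restricting the EWCS to the ECS coverage could look roughly like the sketch below. The field names `nace_section` and `size_code` are hypothetical; the size codes follow the EWCS 2015 workplace size categories listed under "Key variables (1)" (3 = 10-249, 4 = 250+).

```python
# NACE Rev. 2 sections covered by the ECS: B to N, plus R and S
ECS_SECTIONS = set("BCDEFGHIJKLMN") | {"R", "S"}

# EWCS workplace size codes with at least 10 employees (3 = 10-249, 4 = 250+)
ECS_SIZE_CODES = {3, 4}


def in_ecs_scope(worker):
    """True if the worker's workplace falls inside the ECS target population."""
    return (
        worker["nace_section"] in ECS_SECTIONS
        and worker["size_code"] in ECS_SIZE_CODES
    )


# Hypothetical EWCS respondents
ewcs_sample = [
    {"worker_id": 1, "nace_section": "C", "size_code": 3},  # in scope
    {"worker_id": 2, "nace_section": "C", "size_code": 2},  # fewer than 10 employees
    {"worker_id": 3, "nace_section": "P", "size_code": 4},  # sector not covered
    {"worker_id": 4, "nace_section": "R", "size_code": 4},  # in scope
]

ewcs_subsample = [w for w in ewcs_sample if in_ecs_scope(w)]
```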
7. Sample distribution: ensuring sufficiently populated cells
• When calculating aggregate level estimates, it needs to be ensured
that:
– the number of cells for which an aggregate level estimate is calculated is
sufficiently large to reliably include the variable in a multilevel analysis of
micro-level data
– the number of cases in each cell is sufficiently large to reliably calculate an
aggregate level estimate
• Aim is that each category of the key variables contains at least 100
cases
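A check against the 100-case target might be sketched like this. The function name and the toy data are assumptions for illustration only.

```python
from collections import Counter

MIN_CASES = 100  # target from the slide: at least 100 cases per category


def sparse_categories(values, min_cases=MIN_CASES):
    """Return the categories with fewer than min_cases cases, with their counts."""
    counts = Counter(values)
    return {category: n for category, n in counts.items() if n < min_cases}


# Hypothetical sector codes of respondents
sector_of_respondent = ["C"] * 150 + ["G"] * 120 + ["R"] * 54

too_small = sparse_categories(sector_of_respondent)
```

Categories returned by `sparse_categories` are the candidates for being combined with a neighbouring category or dropped.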
8. Key variables (1)
| Shared demographic variables | ECS 2013 | EWCS 2015 | Correspondence |
|---|---|---|---|
| Sector of activity (NACE Rev 2) | 4-digits | 4-digits | Up to 529 classes |
| Workplace size | 1 = 0-9 (screened out); 2 = 10-19; 3 = 20-49; 4 = 50-249; 5 = 250-499; 6 = 500+ | 1 = 1 (interviewee works alone); 2 = 2-9; 3 = 10-249; 4 = 250+ | Two matching categories |
| Geographic location | Country | Country | 28 EU Member States, MK, ME, TR (only EU28 in analysis) |
Source: ECS 2013 and EWCS 2015, calculations by author
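The two matching workplace size categories can be derived by collapsing the survey-specific codes, as in this sketch. The labels "10-249" and "250+" and the function name are illustrative choices, not taken from the datasets.

```python
# ECS 2013 codes: 2 = 10-19, 3 = 20-49, 4 = 50-249, 5 = 250-499, 6 = 500+
ECS_SIZE_TO_SHARED = {2: "10-249", 3: "10-249", 4: "10-249", 5: "250+", 6: "250+"}

# EWCS 2015 codes: 3 = 10-249, 4 = 250+
EWCS_SIZE_TO_SHARED = {3: "10-249", 4: "250+"}


def shared_size_class(code, survey):
    """Map a survey-specific size code to the shared two-category scheme.

    Returns None for codes outside the common coverage
    (fewer than 10 employees).
    """
    if survey == "ECS":
        return ECS_SIZE_TO_SHARED.get(code)
    if survey == "EWCS":
        return EWCS_SIZE_TO_SHARED.get(code)
    raise ValueError(f"unknown survey: {survey!r}")
```

For example, `shared_size_class(4, "ECS")` and `shared_size_class(3, "EWCS")` both give `"10-249"`.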
9. Key variables (2)
• Workplace size only distinguishes two size classes
• Net sample in each country is comfortably large
• Two-digit level NACE classification of sector of activity used as a
starting point
– 88 divisions → 77 in both surveys → 22 categories with n <100 in ECS and
24 with n <100 in EWCS → either combined or dropped → 63 categories (5
with n <100 in ECS and 9 with n <100 in EWCS (min n = 54))
10. Decomposing variance: the relevance of different levels of
aggregation
• Empirical way to identify variables for which it is worthwhile to think
about the higher level characteristics that might affect them
• Assessing the extent to which variance in core variables in the ECS
and EWCS can be assigned to the country level and to the sector
level
• Applying cross-classified multilevel models (Hox, 2002)
– Each micro-level unit is nested in a “sector in a country” and each country-
sector combination is nested in both a country and a sector
– Only estimating the intercept and allocating variance to the individual,
country, sector and country*sector level
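One way to write down such an intercept-only cross-classified model is given below. The notation is an assumption made for illustration (i indexes micro-level units, j countries, k sectors); the slides do not give a formula.

```latex
y_{i(jk)} = \beta_0 + u_j + v_k + w_{jk} + e_{i(jk)}, \qquad
u_j \sim N(0, \sigma_u^2),\;
v_k \sim N(0, \sigma_v^2),\;
w_{jk} \sim N(0, \sigma_w^2),\;
e_{i(jk)} \sim N(0, \sigma_e^2)

\text{share of variance at the country level}
  = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_v^2 + \sigma_w^2 + \sigma_e^2}
```

The shares for the sector, country*sector and individual levels follow by placing the corresponding variance component in the numerator.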
11. Variance decomposition of core variables in the ECS 2013
Source: ECS 2013 and EWCS 2015, calculations by author
12. Variance decomposition of core variables in the EWCS
Source: ECS 2013 and EWCS 2015, calculations by author
13. Example: effects of employee representation on earnings
• Example chosen because the conditions are favourable:
– Many characteristics of social dialogue do not change quickly
– Substantial amount of variance in earnings assigned to the
sectoral and country level
– Information on presence of employee representation available in
both ECS and EWCS
14. Example: hypothetical hypotheses
• The presence of a body for employee representation at a company
has a positive effect on earnings
• The higher the proportion of establishments with an employee
representation in a sector, the higher the earnings in that sector
• The higher the proportion of establishments with an employee
representation in a country, the higher the earnings in that country
15. Cross-classified multilevel model of employee representation presence on
monthly earnings (log)
| | Estimate | SE | p | Residual variance |
|---|---|---|---|---|
| Fixed effects | | | | |
| Intercept | 6.618 | 0.129 | 0.000 | |
| Employee representation present at the establishment | 0.215 | 0.009 | 0.000 | |
| Proportion of establishments with employee representation in the sector | 0.282 | 0.234 | 0.232 | |
| Proportion of establishments with employee representation in the country | 0.679 | 0.227 | 0.006 | |
| Covariance parameters | | | | |
| Residual | 0.256 | 0.003 | 0.000 | 70% |
| Country | 0.052 | 0.015 | 0.000 | 14% |
| Sector | 0.034 | 0.007 | 0.000 | 9% |
| Country*Sector | 0.026 | 0.002 | 0.000 | 7% |
Source: ECS 2013 and EWCS 2015, calculations by author
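The percentages in the last column can be reproduced from the covariance parameters, and the establishment-level coefficient can be read as an approximate relative difference in earnings because the outcome is on the log scale. The short calculation below uses only the figures reported in the table; the variable names are illustrative.

```python
import math

# Covariance parameter estimates from the table above
variance_components = {
    "Residual": 0.256,
    "Country": 0.052,
    "Sector": 0.034,
    "Country*Sector": 0.026,
}

total_variance = sum(variance_components.values())

# Share of the total variance at each level, in whole percent
variance_shares = {
    level: round(100 * value / total_variance)
    for level, value in variance_components.items()
}

# Coefficient for employee representation at the establishment (log earnings)
coef_establishment = 0.215

# Relative difference in monthly earnings implied by the log-scale coefficient
relative_difference = math.exp(coef_establishment) - 1
```

Under this reading, the presence of employee representation at the establishment goes together with monthly earnings that are roughly 24% higher, holding the sector and country proportions constant.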
16. Conclusions
• There is value in exploring and exploiting opportunities for combined
analysis of the ECS and the EWCS
• Meaningful matches can be made using the available key variables
• ECS → EWCS seems to be more promising than EWCS → ECS
• Example suggests that combined analysis can generate meaningful
results
17. Discussion
• Variance decomposition can indicate what level is relevant to
include in research efforts and this relevance might extend to policy
efforts as well
• The example is just a case in point
– ‘Monthly earnings’ are not captured particularly reliably in the EWCS
– It is rare to have information at both the individual and higher levels, while still
having a clear rationale for including the higher level information in analysis
• Other sources might be better suited for estimating aggregate level
information
18. Suggestions for further research
• A simple approach to generating estimates was taken (calculating
means), putting fairly limiting restrictions on cell size.
– By using multilevel models to compute aggregate level estimates, sufficiently
reliable estimates may be achieved with smaller subsamples (Skrondal and
Rabe-Hesketh, 2009)
• The impact of the fact that the higher level characteristics are based
on estimates was not assessed
– A simulation study could be carried out assessing the robustness of the
multilevel models when estimates are allowed to vary across the range as
defined by their reliability interval
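The suggested simulation could be prototyped along the following lines. This is a deliberately simplified sketch and not the multilevel model used in the paper: it swaps in an ordinary least squares slope across cells, draws each aggregate estimate from a normal distribution around its point estimate, and uses invented cell values and standard errors.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def perturb(estimate, std_error, rng):
    """Draw a plausible value for a proportion, clipped to [0, 1]."""
    return min(1.0, max(0.0, rng.gauss(estimate, std_error)))


def simulate_slopes(cells, n_replicates=200, seed=0):
    """Re-estimate the slope with the aggregate estimates redrawn each time."""
    rng = random.Random(seed)
    ys = [c["mean_log_earnings"] for c in cells]
    slopes = []
    for _ in range(n_replicates):
        xs = [perturb(c["rep_share"], c["std_error"], rng) for c in cells]
        slopes.append(ols_slope(xs, ys))
    return slopes


# Hypothetical cells: estimated share of establishments with employee
# representation, its standard error, and mean log earnings in the cell
cells = [
    {"rep_share": 0.20, "std_error": 0.03, "mean_log_earnings": 6.90},
    {"rep_share": 0.35, "std_error": 0.03, "mean_log_earnings": 7.00},
    {"rep_share": 0.50, "std_error": 0.03, "mean_log_earnings": 7.10},
    {"rep_share": 0.65, "std_error": 0.03, "mean_log_earnings": 7.25},
    {"rep_share": 0.80, "std_error": 0.03, "mean_log_earnings": 7.30},
]

slopes = simulate_slopes(cells)
slope_spread = statistics.stdev(slopes)
```

The spread of `slopes` gives a rough indication of how sensitive the estimated relationship is to uncertainty in the aggregate estimates; a full study would refit the cross-classified multilevel model in each replicate.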
19. Implications
• There is room for improvement…
– Potential for integrating data from, or in, other surveys should be considered
at the design stage
– Harmonisation of survey questions is important even if surveys target
different populations
• …but combined analysis cannot substitute for a true linking of worker
and company data
– To answer most research questions an explicit link between the worker and
the workplace is required
20. Useful links
• Working paper
https://www.eurofound.europa.eu/sites/default/files/wpef17036.pdf
• EWCS datasets (UK Data Service)
https://beta.ukdataservice.ac.uk/datacatalogue/series/series?id=200014#!/access-data
• ECS datasets (UK Data Service)
https://beta.ukdataservice.ac.uk/datacatalogue/series/series?id=200012#!/access-data