1. Use of the TCI questionnaire in
General Practice: response rates
and reliability
Evan Kontopantelis
Stephen Campbell
David Reeves
NPCRDC
2. Team Climate Inventory
• A 65-item measure with six subscales that
attempts to quantify the working climate
within a practice (Anderson & West, 1994)
• All items on a scale of 1 to 5
• The six subscales (factors) are:
Participation, Support for innovation,
Objectives, Task style, Reviewing
processes and Working
3. Data collection
• The questionnaire was distributed to all
clinical, nursing and administration staff
working in a sample of 60 practices in 1998
and 42 of the same practices in 2003
• Response rates varied greatly by practice

                                          1998    2003
  number of practices                       60      42
  average respondents per practice         9.5    12.2
  average response rate per practice     63.1%   65.1%
4. The question
• What level of response is required
to obtain a reliable/accurate TCI
subscore for a practice?
5. Data structure
• Three levels: items, respondents & practices
• But which exact form applies?
– When each respondent in a practice evaluates a
different set of items (e.g. “I generally prefer to
work as part of a team”) I:R:P
– When each respondent in a practice evaluates the
same set of items (i.e. “My team has a lot of team
spirit”) (I×R):P
• Unfortunately the questionnaire is a mixture
of both
6. Aggregate-level variables and
reliability
• Methods are based on concepts from
generalizability theory
• The universe score is the score that an
object of measurement – e.g. a practice
– would receive on a characteristic – e.g.
participation – if its score were based on
the mean of all relevant predefined
conditions of measurement – e.g. all
possible respondents and questions
(O’Brien 1990)
7. Defining reliability
• Reliability is defined as the ratio of the
universe score’s variance to the
expected observed score variance:
        ρ = σ²_True / σ²_Measure
• It is an indicator of how different the
observed score would have been, if
another random set of respondents
and/or questions had been selected
8. Variance components
  graph           variability                      symbol
  solid circle    expected observed score          (total of the three below)
  outer ring      true score (practices)           σ²_p
  grey ring       error (respondents)              σ²_{r,pr}
  centre circle   error (random error + items)     σ²_{i,pi,ri,pri,e}
(Shavelson & Webb 1991)
9. Estimating Reliability
• For practice j, with n_j respondents and k
items, under the I:R:P design
(Marsden et al. 2006):
• Variance components need to be
calculated
        ρ_j = σ²_p / ( σ²_p + σ²_{r,pr}/n_j + σ²_{i,pi,ri,pri,e}/(n_j·k) )

        ρ̂_j = σ̂²_p / ( σ̂²_p + σ̂²_{r,pr}/n_j + σ̂²_{i,pi,ri,pri,e}/(n_j·k) )
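As a numerical sketch of the reliability formula above; the function name and variance values are illustrative placeholders, not estimates from the TCI data:

```python
def reliability(var_p, var_r_pr, var_res, n_j, k):
    """rho_j = s2_p / (s2_p + s2_{r,pr}/n_j + s2_{i,pi,ri,pri,e}/(n_j*k))."""
    return var_p / (var_p + var_r_pr / n_j + var_res / (n_j * k))

# e.g. 10 respondents answering a 13-item subscale (made-up variances)
rho = reliability(var_p=0.05, var_r_pr=0.30, var_res=0.60, n_j=10, k=13)
```

Note how the item-related error term is divided by n_j·k, so with many items per subscale the respondent-level error term σ²_{r,pr}/n_j dominates the denominator.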
10. Accuracy, defined
• Accuracy is the likely amount of error in the
observed score relative to the true score
• Using the central limit theorem we estimate a
95% CI for the TCI score and the subscores:
• We defined a score as accurate if the 95% CI
for it was:
• That is, 0.5 points on the scale of 1 to 5
        CI_95% = μ̂ ± 1.96 · √( σ̂²_{r,pr}/n_j + σ̂²_{i,pi,ri,pri,e}/(n_j·K) )

        [μ̂ − 0.5, μ̂ + 0.5]
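A sketch of the accuracy criterion, again with illustrative variance components rather than values estimated from the study:

```python
import math

def ci_halfwidth(var_r_pr, var_res, n_j, k):
    """1.96 x the standard error of a practice's (sub)score, I:R:P design."""
    return 1.96 * math.sqrt(var_r_pr / n_j + var_res / (n_j * k))

def is_accurate(var_r_pr, var_res, n_j, k, tol=0.5):
    # accurate if the 95% CI fits inside [mean - 0.5, mean + 0.5],
    # i.e. the half-width is at most 0.5 points on the 1-to-5 scale
    return ci_halfwidth(var_r_pr, var_res, n_j, k) <= tol
```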
11. Model & estimation parameters
• Only the 3-level random-effects model was
described, but more were estimated:
– Mixed-effects models in which the items
factor was treated as fixed
– 2-level models that use the aggregate score of the
items (R:P)
• Variances are estimated within Stata, using
ANOVA and MLGLM (StataCorp, 2005)
• The GLLAMM command is used to estimate
the parameters of the multilevel linear
models we employed (Rabe-Hesketh et al. 2004)
15. Why are accuracy and reliability
scores so different?
• Reliability coefficients are
affected by the low
variation in practice scores
• Practice mean scores did
not vary by more than 2
points on the scale of 1 to 5
• TCI may not be particularly
good at detecting practice
differences in climate
Descriptives, practice mean scores
                  1998                 2003
             μ    min   max       μ    min   max
  part      3.6   2.7   4.5      3.7   2.7   4.6
  supinv    3.4   2.5   4.3      3.5   2.6   4.3
  reflex    3.4   2.2   4.1      3.4   2.9   4.0
  work      3.6   2.8   4.4      3.7   3.1   4.2
  obj       3.7   2.7   4.4      3.7   3.0   4.4
  task      3.4   2.6   4.4      3.5   2.7   4.1
  TCI       3.5   2.8   4.3      3.6   2.9   4.2
16. Summary
• TCI is a measure that attempts to quantify
the working climate within a practice
• Assuming that the I:R:P structure best
describes our data we calculate:
– the variances in the design
– reliability & accuracy measures
• Small between-practice variances lower the
reliability score but do not affect the accuracy
17. Future work
• Use of a finite population
correction for practices
• Examine the (I×R):P structure and
compare results to the I:R:P one
18. References
• Anderson, N. & West, M. A. 1994, Team Climate Inventory: Manual
and User's Guide. Windsor: ASE.
• O'Brien, R. M. 1990, "Estimating the reliability of aggregate-level
variables based on individual-level characteristics", Sociological
Methods and Research, vol. 18, pp. 473-504.
• Shavelson, R. J. & Webb, N. M. 1991, Generalizability Theory: A
Primer. Newbury Park; London: Sage Publications.
• Marsden, P. V., Landon, B. E., Wilson, I. B., McInnes, K., Hirschhorn,
L. R., Ding, L., & Cleary, P. D. 2006, "The reliability of survey
assessments of characteristics of medical clinics", Health Services
Research, vol. 41, no. 1, pp. 265-283.
• StataCorp 2005, Stata Statistical Software: Release 9.2. College
Station, TX.
• Rabe-Hesketh, S., Skrondal, A., & Pickles, A. 2004, GLLAMM Manual.
U.C. Berkeley.
Editor's notes
We are writing a paper on the results
To what extent do individuals participate in the team
Support for new ideas: attitudes towards change in your team
Aspects of the objectives set by a team
Task style: how the team monitors and appraises the work it does
Assessment of work done (discussion & consideration of methods / actual changes taking place)
Working in the team (team evaluation / personal eval in relation to the team / interdependence):
6 subscale scores: each being the average of the respective items
The overall TCI score is the average of the 6 subscales
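The scoring rule above (each subscale is the mean of its items; the overall TCI score is the mean of the six subscale scores) can be sketched as follows. The item-to-subscale mapping here is hypothetical — the real TCI assigns its 65 items differently:

```python
# Hypothetical mapping of item indices to subscales; the real TCI
# assigns its 65 items across the six subscales differently.
subscales = {
    "part":   [0, 1, 2],
    "supinv": [3, 4],
    "obj":    [5, 6],
    "task":   [7, 8],
    "reflex": [9, 10],
    "work":   [11, 12],
}

def tci_scores(responses):
    """responses: one respondent's item ratings on the 1-to-5 scale."""
    # each subscale score is the mean of its items
    sub = {name: sum(responses[i] for i in idx) / len(idx)
           for name, idx in subscales.items()}
    # the overall TCI score is the mean of the six subscale scores
    sub["TCI"] = sum(sub.values()) / len(sub)
    return sub
```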
________________________________________________________
Part: We interact frequently (agree/disagree)
Supinv: This team is open and responsive to change
Obj: How worthwhile do you think the team objectives are to you?
Task: Are team members prepared to question the basis of what the team is doing?
Reflex: Team strategies are rarely changed
Work: I generally prefer to work as part of the team
3 practices closed down (GPs retired) or merged, and the rest refused to participate again
Reliability is the extent to which a set of test items can be treated as measuring a single latent variable
We need to apply methods which assess the reliability of aggregate-level variables to the dataset. (Usually Cronbach's alpha is used to assess reliability, but it does not take into account nested data structures)
We calculate a reliability/accuracy score for each of the six subscales and for the overall TCI score
In this analysis we use the I:R:P structure
____________________________________
Crossed / nested
The concept is the same as for Cronbach's alpha: the ratio of true-score to total-score variance
Universe score variance also called true score variance
True score variance = expected observed + error
Here the questions factor is described as random (i.e. each collection of questions is a sample of a larger population of questions that can accurately measure whatever we set out to measure). However it can be treated as fixed as well (if all the items that can measure the latent variable are included in the questionnaire).
Grey ring: variance among respondents and variance in the respondent-practice interaction (indistinguishable, since we have different respondents for each practice)
Centre circle: variance among items, variance in item-practice, item-respondent, item-respondent-practice interactions, variance of random error of measurement
Naturally we only have among items variability if items is considered to be a random factor. If it is treated as fixed then we only have random error of measurement in the centre circle and no items or items interaction variances.
Solving for n_j after setting the reliability to 0.7, we can estimate the number of respondents needed to achieve that level of reliability
Therefore, the higher the between-practice variability relative to the total variability, the fewer respondents are needed to achieve a high reliability score.
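Solving the reliability formula for n_j can be sketched as below; the variance components passed in are illustrative, not estimates from the TCI data:

```python
import math

def min_respondents_for_reliability(var_p, var_r_pr, var_res, k, rho=0.7):
    # rho = s2_p / (s2_p + (s2_rpr + s2_res/k) / n_j)
    # => n_j >= rho * (s2_rpr + s2_res/k) / (s2_p * (1 - rho))
    a = var_r_pr + var_res / k
    return math.ceil(rho * a / (var_p * (1 - rho)))
```

Consistent with the note above, a larger between-practice variance var_p shrinks the required n_j.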
This formula corresponds to the random model. For a mixed model, where items is a fixed factor, we only have error variance
The expression is the SE of the score/subscore
Since we defined an accurate score that way, we want this expression to be at most 0.5
Solving for n_j, we can calculate the minimum number of respondents per practice needed to obtain an accurate subscore or overall score
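The same rearrangement for the accuracy criterion, again with the caveat that the inputs are illustrative placeholders:

```python
import math

def min_respondents_for_accuracy(var_r_pr, var_res, k, halfwidth=0.5):
    # 1.96 * sqrt((s2_rpr + s2_res/k) / n_j) <= halfwidth
    # => n_j >= (1.96 / halfwidth)**2 * (s2_rpr + s2_res/k)
    a = var_r_pr + var_res / k
    return math.ceil((1.96 / halfwidth) ** 2 * a)
```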
One value for each respondent in the 2-level model.
Analysis of Variance and Multilevel Generalized Linear Models (Hierarchical Linear Models).
Generalized Linear Latent and Mixed Models (GLLAMM)
Pooled vs Unpooled ANOVA. The standard ANOVA procedure "pools" the R:P variance terms for each practice together and calculates the Mean Squares on the assumption that each practice contains the same (or almost the same) number of respondents. However, this is not the case in this survey, since the number of respondents per practice varies greatly. To take these variations into account, we calculate the Mean Squares using the unpooled variance terms for each practice (O'Brien 1990).
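To illustrate why unequal group sizes matter, here is the standard method-of-moments estimator for an unbalanced one-way (R:P) layout, which replaces the common group size with an effective size n0. This is a sketch with toy data; O'Brien's unpooled procedure differs in its details:

```python
def anova_variance_components(groups):
    """groups: list of lists, one list of respondent scores per practice."""
    p = len(groups)
    N = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]

    ms_between = sum(len(g) * (m - grand) ** 2
                     for g, m in zip(groups, means)) / (p - 1)
    ms_within = sum((y - m) ** 2
                    for g, m in zip(groups, means) for y in g) / (N - p)
    # effective group size for unbalanced data
    n0 = (N - sum(len(g) ** 2 for g in groups) / N) / (p - 1)

    var_e = ms_within                                   # residual variance
    var_p = max((ms_between - ms_within) / n0, 0.0)     # between-practice variance
    return var_p, var_e
```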
__________________________________________
Only a two level model has been calculated for the overall score (one value for each respondent, the average of the six subscores). But we can treat the subscores as items and have a 3-level model.
Reliability for the average number of respondents per practice (12 for 2003)
Number of respondents needed to achieve a reliability of 0.7
Number of respondents needed to have a +/-0.5 95% CI for accuracy
_______________________________________
ANOVA – GLLAMM difference: gllamm maximises the marginal log-likelihood using Stata's version of the Newton-Raphson algorithm, while ANOVA uses least squares.
3-level random model, whose variances are estimated using unpooled ANOVA
Big change in supinv: although the error variances were estimated to be the same, the practice variance (true variance) more than doubled, increasing reliability. The same holds for participation
Accuracy is not affected by the change in between-practice variation; it is only affected by the sizes of the error terms
_______________________________________________
Unpooled ANOVA encountered in the literature that’s why it is selected to be displayed
Variances in the design: various methods are used and compared
FPC: corrects for small practices. The problem is that small practices will be excluded straight off, even if their response rate is high. We need to take that into account