4B 4C General Conclusions
A questionnaire on students’ satisfaction with the design was administered to the students twice, with slight differences: first after Christmas and then just before the end of the school year. The idea was to determine to what extent the assessment programme had changed their beliefs, and whether any differences could be spotted between the group that had been observed and the one that had not.
Students were asked to grade the different aspects of instruction from one to ten. The questionnaire was divided into 13 dimensions, following the design of the learning diary. The questions referred to:
1. EC - English Course – The course in general
2. LD - Learning Diary – The learning diary in general
3. ICT - ICT – The use of ICT
4. WG - Wiki Global – The use of the wiki in general
5. WO - Wiki Others – The fact that classmates could see the productions of the student on the wiki
6. WI - Wiki I – The fact that the student could see the productions of their classmates on the wiki
7. G - Grammar – The grammar section in the learning diary
8. V - Vocabulary – The vocabulary section in the learning diary
9. FB - Feedback – The error correction techniques used by the teacher on the wiki
10. R - Resources – Resources students use for learning
11. OR - Own Research – The section of the learning diary where students were expected to write freely about their own choices
12. TC - Teacher Comments – The quality of the teachers’ comments
13. GS - Gold Stars – The role of the students in correcting their own errors in writing
The questionnaires included 53 questions, but one of them was not considered, leaving 52 in total. The last one was an open-ended question. The June questionnaire also included a small diagram with a number of continuum lines running from positive to negative feelings, on which students had to represent the way instruction had made them feel.
The results from the first questionnaire were compared statistically using a two-tailed t-test and an F-test to determine whether both groups represented the same or different populations. Secondly, the dimensions and items that reached the threshold were analysed. Then, the differences between the results in February and June were studied globally. Finally, the most relevant differences between dimensions and items were also reviewed. The significance level used to decide whether the null hypothesis can be rejected is 5%.
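As an illustration of the between-group comparison described above, the following is a minimal sketch of a two-tailed test on two sets of scores at the 5% level. It makes several assumptions that are not in the original: the scores are invented, the groups are treated as independent samples, Welch's form of the t statistic is used, and the p-value comes from a normal approximation, which is rough for class-sized samples. A real analysis would read the p-value from the t distribution in a statistics package.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

ALPHA = 0.05  # the 5% significance level stated in the text


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def two_tailed_p_normal(t):
    """Two-tailed p-value from a normal approximation (an assumption;
    the t distribution should be used for small samples)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Hypothetical scores on the 1-10 scale; NOT data from the study.
group_b = [3, 5, 4, 6, 2, 5, 7, 4, 5, 3]
group_c = [6, 7, 5, 8, 6, 4, 7, 9, 6, 5]

t, df = welch_t(group_b, group_c)
p = two_tailed_p_normal(t)
reject_null = p < ALPHA

print(f"t = {t:.2f}, df = {df:.1f}, approximate p = {p:.4f}")
print("Groups differ significantly" if reject_null else "No significant difference")
```

A negative t here only means that the first group's mean is lower than the second's; the two-tailed test looks at the size of the difference, not its direction.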
Questionnaires are controllable tools. However, the relevance of the data they provide might be limited or affected by what Skehan (1989) refers to as the Approval Motive, defined as "A danger for any sort of questionnaire or self-report data. The respondent may answer an item not with his true beliefs, attitudes, etc., but rather with the answers which he thinks will reflect well on him, i.e. the respondent works out what the 'good' or 'right' answer is and gives it" (1989:62). For this reason, these questionnaires will have to be triangulated with an analysis of students’ productions on the wiki, interviews, the teachers’ perceptions, general marks and the results of performance and placement exams, in order to make changes in the design for the second phase of this DBR research.
4B 4C Reflection
Characteristics of the two groups: similarities and differences between 4B and 4C in February
Key finding: The groups were significantly different.
4B: The average in Questionnaire 1 was 4,89 and the standard deviation was 2,62.
4C: The average in Questionnaire 1 was 5,74 and the standard deviation was 2,89.
Reflection: In Questionnaire 1, average perceptions were higher in 4C and variance was smaller in 4B. A t-test comparing groups B and C shows that the perceptions of the two groups were significantly different in February. An F-test comparing variances shows no significant differences.

Significance of the design in the two classes
Key finding: Significant positive effect in 4C (both in perception and variance).
4B: The average in Questionnaire 1 was 4,89, and in Questionnaire 2 it was 4,66. The standard deviation was 2,62 in February, and in June it was 2,76. Statistical analysis shows that there were no significant differences in either perception or variance between the first and the second questionnaire.
4C: The average in the first questionnaire was 5,74, and in the second it was 6,24. The standard deviation was 2,89 in February and 2,44 in June. Statistical analysis confirms that there was a significant improvement and a significantly smaller variance in the perceptions of the students of this group after the design was applied.
Reflection: In 4B, the score for the whole design was below the threshold in June, unlike 4C. In Questionnaire 2, perceptions and variance improved significantly in 4C, while in 4B both got worse, but not significantly. We can conclude that the design had a positive effect on the perceptions and variance of 4C students and no effect in 4B.
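The variance comparisons reported for the two classes can be made concrete by computing the F statistic (the ratio of the two variances) from the standard deviations quoted in the text. This is only a sketch: the helper name is hypothetical, and because the number of responses behind each figure is not given here, the ratios alone cannot show which differences are significant.

```python
def variance_ratio(sd_a, sd_b):
    """F statistic from two standard deviations: larger variance over smaller."""
    var_a, var_b = sd_a ** 2, sd_b ** 2
    return max(var_a, var_b) / min(var_a, var_b)


# Standard deviations as reported in the text
# (the text's decimal commas are written as points here).
SD = {
    ("4B", "February"): 2.62,
    ("4B", "June"): 2.76,
    ("4C", "February"): 2.89,
    ("4C", "June"): 2.44,
}

f_4b = variance_ratio(SD[("4B", "February")], SD[("4B", "June")])
f_4c = variance_ratio(SD[("4C", "February")], SD[("4C", "June")])
f_feb = variance_ratio(SD[("4B", "February")], SD[("4C", "February")])

print(f"4B, February vs June: F = {f_4b:.2f}")
print(f"4C, February vs June: F = {f_4c:.2f}")
print(f"4B vs 4C in February: F = {f_feb:.2f}")
```

The largest ratio is the February-to-June change in 4C, which is also the only variance change the text reports as significant.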

Threshold general dimensions
Key finding: The dimensions perceived most negatively in both groups were WI, WO and LD. The dimensions below threshold increased in 4B and disappeared in 4C.
4B: The dimensions where average results did not reach the threshold in the first questionnaire were WO, WI, LD and WG, in that order, while in the second questionnaire EC, FB and GS were added to these, making a total of 7 dimensions below threshold. This indicates that students did not like the design.
4C: The dimensions that did not reach the threshold in the first questionnaire were WO, LD, GS and WI, in that order, while all the dimensions reached the threshold in the second questionnaire. The students of 4C were more positive about the dimensions than the students of 4B at the beginning.
Reflection: Whether dimensions and items reached the threshold or not was the measure used to determine whether students were happy with the design. The maximum score was 10, and the threshold was set at 5, following the norm for tests in that school. Averages were calculated to determine whether each dimension reached the threshold. All the dimensions and items that scored below 5 were marked in red. In 4B, the general score was below the threshold in both February and June, unlike 4C. The high number of dimensions below threshold indicates that the design was not popular, particularly in 4B. In February, the dimensions below threshold in both groups were WO, WI and LD. These dimensions were no longer below threshold in 4C in June. They remained below the threshold in 4B, while others were added to the list.
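The threshold rule described in this row (average each dimension, then flag anything below 5 out of 10) can be sketched as follows. The function name and the sample answers are hypothetical and are not taken from the study; only the threshold of 5 comes from the text.

```python
from statistics import mean

THRESHOLD = 5  # pass mark on the 1-10 scale, as stated in the text


def dimensions_below_threshold(scores_by_dimension, threshold=THRESHOLD):
    """Return (dimension, average) pairs whose average is below the
    threshold, ordered from the lowest average upwards."""
    averages = {dim: mean(scores) for dim, scores in scores_by_dimension.items()}
    flagged = [(dim, avg) for dim, avg in averages.items() if avg < threshold]
    return sorted(flagged, key=lambda pair: pair[1])


# Hypothetical answers for three of the dimensions; NOT data from the study.
sample = {
    "WO": [2, 3, 4, 3],
    "LD": [4, 5, 5, 4],
    "OR": [8, 9, 7, 8],
}

flagged = dimensions_below_threshold(sample)
for dim, avg in flagged:
    print(f"{dim}: average {avg} is below the threshold of {THRESHOLD}")
```

Sorting from the lowest average upwards mirrors the "in that order" listings used in the 4B and 4C columns.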

Lowest scoring dimension
Key finding: WO was the most disliked dimension in both questionnaires and in both groups. Significant improvement in the perceptions of 4C.
4B: The dimension that scored lowest in Questionnaires 1 and 2 was WO (Wiki Others), where none of the items reached the threshold in either questionnaire. This dimension referred to the students’ opinion about other students seeing what they produced on the wiki. The differences between the average scores for the first and the second questionnaire are not significant. What this seems to indicate is that this class did not see any reason to change this perception after the design was applied.
4C: The dimension that scored lowest in Questionnaires 1 and 2 was WO (Wiki Others). This dimension was related to the public nature of the wiki and asked students for their opinion about the fact that others could see their work. None of the items in this dimension reached the threshold in the first questionnaire. In June, the dimension was slightly above average, and only WO4, which asked them if they liked the fact that their productions were public, was below the threshold (M'agrada que els meus companys puguin veure el que faig al wiki). The improvement in WO in the second questionnaire was statistically significant. The implication seems to be that they did not like the fact that other students could see their productions, but somehow accepted that it had a purpose.
Reflection: What students in both 4B and 4C disliked most about the design was that other students could see what they did. WO was the most unpopular of the dimensions. However, students in 4C improved their perception of this dimension significantly, even if it remained the lowest scoring dimension in June. Ways to make the design less demanding in this respect need to be found.

Less popular dimensions
Key finding: WI, WG and LD would need design changes.
4B: The next least liked dimensions in June were WI, WG and LD. This is very similar to the results in February.
4C: The other low scoring dimensions in June were LD, WG and WI. GS had dropped off the list, and WI had moved to fourth position. WG, which did not appear in February, was now on the list.
Reflection: In 4C, students seemed to dislike the design in general, but were coming to terms with it. WI, WG and LD remained very unpopular dimensions in both February and June in both groups. This calls for some design changes.

Highest scoring dimension
Key finding: OR was the most popular dimension.
4B: The dimension that scored highest in the first and second questionnaires was a one-item dimension, OR. OR1 asked whether students learned English better when instruction addressed their own interests (Aprenc millor l'anglès quan puc escriure sobre coses que m'importen i/o m'agraden).
4C: The dimension that scored highest in the first and second questionnaires was a one-item dimension, OR. In OR1 students were asked if they learn better when they are able to write about things they like or care about (Aprenc millor l'anglès quan puc escriure sobre coses que m'importen i/o m'agraden).
Reflection: It is clear that students felt at ease with the OR dimension in both 4B and 4C. Other dimensions of the learning diary should probably adopt a more communicative and less restrictive approach.

Dimension progress
Key finding: Stagnation and visceral dislike in 4B. A pass in 4C. Dimensions below threshold in Questionnaire 1 were now significantly better. Significantly improved cohesion in 4C in four dimensions. On the right track for GS and FB in 4C.
4B: The students’ perceptions did not undergo any significant change for any dimension. This suggests that the students’ perception of the dimensions was no better when the course finished.
4C: The students’ perceptions of the four dimensions below threshold in Questionnaire 1 (WI, WO, GS and LD) showed statistically significant improvements, and in the case of GS, variance also improved significantly. FB was another dimension where both perceptions and variance improved significantly. What progress in the GS and FB dimensions seems to indicate is that the interaction with the students was working well and students trusted the design. There were two dimensions where scores were very similar in the two questionnaires, but where variance improved significantly. These were EC and G. All of this seems to indicate that a class culture was being built.
Reflection: The perceptions of students from 4B about the dimensions did not progress and were, somehow, visceral. In contrast, the opinions of students from 4C improved significantly in June precisely in the dimensions where scores were below the threshold in February (WI, WO, GS and LD). Variance in 4C was also significantly better in four dimensions (GS, FB, EC and G) in June. This suggests that a cohesive classroom culture was being created. In GS and FB, improvements were significant for both scores and variance, suggesting that in these two dimensions the design was on the right track.

Dimension regression
Key finding: Higher variance in WI in June may be a sign of evolution in 4B. V would probably need some readjustment.
4B: There were no significant changes in the perception of any dimension in June in 4B, although scores were, generally speaking, lower. Perceptions were slightly less uniform in June: the standard deviation was higher in 10 dimensions. The only dimension where this difference was significant was WI, where the students’ perceptions differed to a significantly greater extent in June.
4C: One dimension that underwent a significant decline in perception was V. This indicates that some changes in this section are advisable.
Reflection: The disagreement in the students’ perceptions of WI (which asked about the relevance of seeing what other students did on the wiki) may mean that some students saw a purpose behind it, while others did not. In this sense, it might indicate that they were evolving in the direction in which 4C seems to have evolved. A more communicative design for the V dimension is advisable.

Threshold general items
Key finding: The number of items below threshold increased by 6 percentage points in 4B and decreased by 16 percentage points in 4C. The dimension with the most items below threshold in 4C, in both questionnaires, was LD.
4B: In 4B there were 26 items that did not reach the threshold in the first questionnaire (50%) and 29 in Questionnaire 2 (56%). The high number of items below threshold indicates that the students in this class questioned the design from beginning to end.
4C: In 4C, there were 17 items that did not reach the threshold in the first questionnaire (33%), and 9 in Questionnaire 2 (17%). The items below the threshold indicate that the students in this class were not enthusiastic about the design, but were not openly hostile to it. In spite of the improvement in threshold attainment, some items were below threshold in both questionnaires. These were WI1, WO4, R3, R5, LD2, LD4, LD9, LD11 and LD12.
Reflection: The number of items below threshold increased by 6 percentage points in 4B and decreased by 16 percentage points in 4C. The dimension with the most items below threshold in 4C, in both questionnaires, was LD.
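The percentages in this row follow directly from the item counts and the 52 items analysed, as the short check below shows. The variable and function names are illustrative only; the counts and the total come from the text.

```python
TOTAL_ITEMS = 52  # items analysed, per the text

# Items below threshold in Questionnaire 1 and Questionnaire 2, as reported.
below_threshold = {"4B": (26, 29), "4C": (17, 9)}


def share(count, total=TOTAL_ITEMS):
    """Percentage of items below threshold, rounded to a whole number."""
    return round(100 * count / total)


changes = {}
for group, (q1, q2) in below_threshold.items():
    changes[group] = share(q2) - share(q1)
    print(f"{group}: {share(q1)}% -> {share(q2)}% "
          f"({changes[group]:+d} percentage points)")
```

The changes are differences between two percentages, so they are expressed in percentage points rather than as relative percentages.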

Item with the lowest score in Questionnaires 1 and 2
Key finding: Self-esteem problems in 4B. In 4C, the fact that WI and WO were not problematic in the end can explain why their dislike was not visceral.
4B: The item with the lowest score in Questionnaire 1 was WI1 (Veure el que fan els meus companys al wiki és divertit, "Seeing what my classmates do on the wiki is fun"), while the lowest score in Questionnaire 2 was WO2 (Veure el que faig al wiki pot ser interessant pels meus companys, "Seeing what I do on the wiki may be interesting for my classmates"). This last result seems to indicate self-esteem problems on the part of the students.
4C: The item with the lowest score in Questionnaire 1 was LD11 (Fer el Learning Diary em motiva a aprendre anglès, "Doing the Learning Diary motivates me to learn English"). The lowest scoring item in Questionnaire 2 was LD9, which asked if the Learning Diary was fun (Fer el LD és divertit). The implication is that students did not enjoy the LD.
Reflection: The lowest scoring items in 4B are in line with the two least popular dimensions for them, which were WO and WI. A possible way to avoid the problems WI1 and WO2 caused in 4B would be to make students work in small groups, with their learning diaries becoming public at the end of the term. In 4C, students are consistent in their dislike of the learning diary. The low appreciation of WO2 in 4B hints at self-esteem or insecurity problems, which would explain the visceral dislike of the design on the part of 4B students. If that were the case, then the fact that WI and WO were not problematic in 4C in the end may explain why their dislike of the design was not visceral.

Items with the highest score in Questionnaires 1 and 2
Key finding: A blended design is not seen as problematic. Learning a foreign language by talking about the things they are interested in is important to students.
4B: The item with the highest score in Questionnaire 1 was ICT1 (Connectar-se a Internet és fàcil, "Connecting to the Internet is easy"). The item with the highest score in Questionnaire 2 was OR1 (Aprenc millor l'anglès quan puc escriure sobre coses que m'importen i/o m'agraden).
4C: The item with the highest score in Questionnaires 1 and 2 was ICT1. This item asked whether connecting to the Internet is easy or not (Connectar-se a Internet és fàcil). Other items that scored high in Questionnaire 2 were OR1 and LD5. LD5 asked students if they had followed their own rhythm (He seguit el meu propi ritme).
Reflection: If we look at the items in the questionnaire that stand out, we see that students in both groups were used to working online, so a blended design was an appropriate choice. Learning a foreign language by talking about the things they were interested in and working at their own rhythm was also something they appreciated.

Item progress
Key finding: WI4 showed significant improvement in 4B. 4C students found the design more fun in June. Gold stars were a source of motivation in 4C.
4B: 19 items showed progress, but this improvement was only significant for item WI4 (M'agrada poder veure el que fan els meus companys al wiki, "I like being able to see what my classmates do on the wiki"). This seems to indicate an interesting contradiction in the students’ perception: while they still disliked that their classmates could see what they were doing, they had learnt to appreciate looking at what others did, which was one of the assumptions of this design. It is reasonable to think that if the design were changed to allow students to work in small groups, where only the members of the group could see what the other members were doing, students might feel more at ease.
4C: While it is true that LD9 obtained a very low score because students did not consider the learning diary “fun,” three of the items that showed significant improvement shared the use of the word “fun,” although “learning diary” was not part of the statement. These were WI1 (Veure el que fan els meus companys al wiki és divertit, "Seeing what my classmates do on the wiki is fun"), WO1 (Veure el que faig al wiki pot ser divertit per als meus companys, "Seeing what I do on the wiki may be fun for my classmates") and GS2 (Aconseguir Gold Stars al Learning Diary és divertit, "Getting Gold Stars in the Learning Diary is fun"). GS2 showed significant improvement, and so did GS1 (Entenc el sistema de valoració de Gold Stars al Learning Diary, "I understand the Gold Stars rating system in the Learning Diary") and GS3 (Aconseguir Gold Stars al Learning Diary és interessant, "Getting Gold Stars in the Learning Diary is interesting"). GS was the category with the most items showing significant improvement. The last item to show significant improvement was LD11 (Fer el Learning Diary em motiva a aprendre anglès, "Doing the Learning Diary motivates me to learn English"), even though it had been the lowest scoring item in Questionnaire 1 and was still below the threshold in Questionnaire 2.
Reflection: The significant improvement in WI4 in 4B seems to indicate that students were beginning to understand that this dimension served to guide their learning. The students from 4C seemed to find the design more fun in June. The word “fun” was responsible for the low score of LD9, but it was also used in WI1, WO1 and GS2, and these items achieved significant improvement. The dimension where the most items achieved significant improvement was GS, indicating that gold stars became a source of motivation.

Item regression
4B: 33 items scored lower, but a significantly worse perception can only be observed in the case of FB5, where students evaluated how much they liked the teacher suggesting web pages for them to visit. This was probably because they expected web recommendations to be more focused on English for communicative purposes than on web pages to help them understand the nature of their errors.
4C: Significant regression occurred for V2 (Escriure frases d'exemple (sample sentences) m'ajuda a aprendre vocabulari, "Writing sample sentences helps me learn vocabulary").
Reflection: Sample sentences should be redesigned to encourage more creative output.
Final reflection
Why clear improvements occurred in one class and not in the other might be explained as a result of the trust-building process, which worked with one group and helped to build a common culture, but not with the other.
The dimensions that seemed to be causing the most trouble in 4B were WO and WI. Both relate to exposure, and in this implementation it is possible that students felt threatened by them. Both were also problematic in February in 4C, but not in June, when things worked better in that class. In fact, these dimensions showed significant improvement in 4C. The fact that there was significantly higher variance in WI in 4B in June, and that the only item that improved significantly in 4B was WI4, seems to indicate that these students felt exposed by the design and would have needed to feel safer, while they were beginning to show some interest in the possible guidance offered by other students’ productions.
The statistically significant improvements in GS and FB in 4C are probably key elements in explaining the better results of the design in 4C. Both are related to interaction and dialogue, and probably made instruction easier.