Neil Brown, Research Associate at the University of Kent, presented his talk at Digibury on November 13, 2013. In it he explored how people learn to programme, what they find difficult and what problems slow them down.
6. What We Make
BlueJ: 2.5 million users annually
Greenfoot: 0.4 million users annually
Neil Brown, University of Kent, @twistedsq
7. What We Make
What Are They All Doing?
8. Some Small-Scale Studies
An Exploration of Novice Compilation Behaviour in BlueJ, Matt Jadud, 2007
“Many students write significant amounts of code (10+ lines) at a time, and then attempt to eliminate all the syntactic errors that exist in the code”
Study Size: 62 students
10. BIG DATA
Add recording to all BlueJ instances
(With explicit opt-in)
11. MEDIUM
BIG DATA
13. How Much Data?
20,000 users per day ✓
≈ 25% opt-in? ✗ actually 40%
≈ 100KB data per user per day ✓
≈ 0.5GB per day ✗ actually ≈ 1GB
≈ 200GB per year ✗ actually 300-400GB
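The arithmetic behind these figures can be checked in a few lines of Java. This is a minimal sketch, assuming decimal units (1 GB = 1,000,000 KB) and the 100KB-per-user figure from the slide; the class and method names are hypothetical.

```java
public class DataVolumeEstimate {
    // Daily volume in GB for a given number of daily users, opt-in rate and
    // per-user data in KB. Assumes decimal units: 1 GB = 1,000,000 KB.
    static double gbPerDay(int usersPerDay, double optInRate, double kbPerUser) {
        return usersPerDay * optInRate * kbPerUser / 1_000_000.0;
    }

    public static void main(String[] args) {
        double guessed = gbPerDay(20_000, 0.25, 100);  // original 25% opt-in guess
        double observed = gbPerDay(20_000, 0.40, 100); // observed 40% opt-in
        System.out.printf("25%% opt-in: %.1f GB/day, %.0f GB/year%n", guessed, guessed * 365);
        System.out.printf("40%% opt-in: %.1f GB/day, %.0f GB/year%n", observed, observed * 365);
    }
}
```

With the observed 40% opt-in, the same formula gives about 0.8GB per day and just under 300GB per year, close to the measured ≈ 1GB per day and 300-400GB per year.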
14. Headline statistics so far
(5 months in)
140,000 opted-in users
600,000 projects
5,100,000 successful compilations
4,700,000 unsuccessful compilations
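Two ratios follow directly from these totals: roughly half of all compilation attempts failed, and the opted-in users averaged about 70 compilations each over the five months. A small sketch of that arithmetic, assuming every recorded compilation belongs to one of the 140,000 opted-in users (the class and method names are hypothetical):

```java
public class HeadlineRatios {
    // Fraction of compilation attempts that succeeded.
    static double successRate(long successful, long unsuccessful) {
        return (double) successful / (successful + unsuccessful);
    }

    // Mean number of compilations per opted-in user.
    static double compilationsPerUser(long totalCompilations, long users) {
        return (double) totalCompilations / users;
    }

    public static void main(String[] args) {
        long successful = 5_100_000L;
        long unsuccessful = 4_700_000L;
        long users = 140_000L;
        System.out.printf("Successful: %.1f%%%n", 100 * successRate(successful, unsuccessful));
        System.out.printf("Compilations per user: %.0f%n",
                compilationsPerUser(successful + unsuccessful, users));
    }
}
```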
15. Hardware Specs
2 machines (1 for recording, 1 for analysis)
24-core 2.5GHz Xeon, 32GB RAM, 5TB RAID
16. Most common compile errors
Unknown variable: 17%
Semi-colon expected: 10%
Unknown method: 7%
Bracket expected: 7%
Unknown class: 5%
Illegal start of expression: 4%
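The talk does not describe its counting pipeline. As a minimal sketch, assuming each failed compilation has already been reduced to one normalised message string (the input and labels below are hypothetical), ranking messages by their share of all errors could look like this:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ErrorFrequencies {
    // Ranks distinct error messages by frequency, pairing each message with its
    // share of all messages as a percentage, most frequent first.
    static List<Map.Entry<String, Double>> rank(List<String> messages) {
        Map<String, Long> counts = messages.stream()
                .collect(Collectors.groupingBy(m -> m, Collectors.counting()));
        double total = messages.size();
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .map(e -> Map.entry(e.getKey(), 100.0 * e.getValue() / total))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical input: one error message per failed compilation.
        List<String> log = List.of(
                "Unknown variable", "Semi-colon expected", "Unknown variable",
                "Unknown class", "Unknown variable", "Semi-colon expected");
        for (Map.Entry<String, Double> e : rank(log)) {
            System.out.printf("%-22s %5.1f%%%n", e.getKey(), e.getValue());
        }
    }
}
```

Counting whole strings only works once messages are normalised; raw javac output embeds identifiers and line numbers, so grouping by message template rather than full text would be needed on real data.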
17. Most common compile errors
Do they change during the term?
19. Rarer compile errors
65th most common compilation error:
非法的表达式开始 (Chinese for “illegal start of expression”)
21. Problematic if statements
What does this code do?
if (x >= 6 && x <= 9)
{
    x = 0;
}
22. Problematic if statements
What does this code do?
if (x*x >= 36 && x*x <= 81);
{
    x = 0;
}
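In the second snippet x is always set to 0: in Java a semicolon directly after the condition is an empty statement, so the if controls nothing and the braces that follow form an ordinary block that runs unconditionally. The hypothetical harness below wraps both snippets in methods to show the difference. Note also that x*x >= 36 && x*x <= 81 holds for -9 to -6 as well, so even without the semicolon it is not the same test as the first snippet.

```java
public class StraySemicolon {
    // First snippet: sets x to 0 only when 6 <= x <= 9.
    static int intended(int x) {
        if (x >= 6 && x <= 9)
        {
            x = 0;
        }
        return x;
    }

    // Second snippet: the ';' ends the if with an empty statement, so the
    // block below is not governed by the condition and always runs.
    static int withStraySemicolon(int x) {
        if (x*x >= 36 && x*x <= 81);
        {
            x = 0;
        }
        return x;
    }

    public static void main(String[] args) {
        for (int x : new int[] {3, 7}) {
            System.out.println(x + ": " + intended(x) + " vs " + withStraySemicolon(x));
        }
    }
}
```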
23. Problematic if statements
How prevalent is this mistake?
Appeared in 0.15% of source files
How long does it take before people fix it?
Later fixed in half of them...
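The talk does not say how the 0.15% figure was measured. One rough way to flag this mistake is to look for a semicolon immediately after the parenthesis that closes an if condition. The sketch below is a hypothetical, purely textual heuristic (it does not skip comments or string literals), not the analysis used in the study:

```java
import java.util.List;

public class EmptyIfDetector {
    // Heuristic check for "if (condition);" anywhere in a source file.
    // Assumption: comments and string literals are not excluded, so matching
    // text inside them is also flagged; a real tool would parse the code.
    static boolean hasEmptyIf(String src) {
        for (int i = src.indexOf("if"); i >= 0; i = src.indexOf("if", i + 1)) {
            // Skip "if" that is part of a longer identifier, e.g. "diff".
            if (i > 0 && Character.isJavaIdentifierPart(src.charAt(i - 1))) continue;
            int j = skipWhitespace(src, i + 2);
            if (j >= src.length() || src.charAt(j) != '(') continue;
            // Walk forward to the parenthesis that closes the condition.
            int depth = 0;
            for (; j < src.length(); j++) {
                char c = src.charAt(j);
                if (c == '(') depth++;
                else if (c == ')' && --depth == 0) break;
            }
            j = skipWhitespace(src, j + 1);
            if (j < src.length() && src.charAt(j) == ';') return true;
        }
        return false;
    }

    static int skipWhitespace(String s, int k) {
        while (k < s.length() && Character.isWhitespace(s.charAt(k))) k++;
        return k;
    }

    // Percentage of files containing at least one empty if.
    static double percentFlagged(List<String> files) {
        long flagged = files.stream().filter(EmptyIfDetector::hasEmptyIf).count();
        return 100.0 * flagged / files.size();
    }

    public static void main(String[] args) {
        List<String> files = List.of(
                "if (x >= 6 && x <= 9)\n{\n    x = 0;\n}",
                "if (x*x >= 36 && x*x <= 81);\n{\n    x = 0;\n}");
        System.out.println(percentFlagged(files) + "% of files flagged");
    }
}
```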
25. Challenges
A lot of data, and a lot of methodological questions, e.g.:
- How do you measure error difficulty?
- What is a frequent error? (What is worth caring about?)
- How much can you get from this kind of dataset?
Scaling the analysis (already maxing out 24 cores)
Questions?