An efficient educational data mining approach to support e-learning

Padmaja Appalla1 • Venu Madhav Kuthadi2 • Tshilidzi Marwala1
Springer Science+Business Media New York 2016
Abstract E-learning is a recent development in the educational system, driven by the growth of information technology. Common challenges in an e-learning platform include the collection and annotation of learning materials, the organization of knowledge in a useful way, the retrieval and discovery of useful learning materials from the knowledge space, and the delivery of adaptive and personalized learning materials. To handle these challenges, the proposed system is developed in five steps: knowledge input, annotation of the learning materials, creation of the knowledge space, indexing of the learning materials using multi-dimensional knowledge and an XML structure to generate a knowledge grid, and retrieval of the learning materials by matching the user query with the indexed database and the ontology. The process is carried out in two modules, the server module and the client module. The approach is evaluated using precision, recall and F-measure, with results obtained by varying the keywords, the number of documents and the K-size. For K = 2, the proposed approach achieves an average precision of 0.81, an average recall of 1 and an average F-measure of 0.86.
Keywords E-learning • Data mining • Knowledge organization • XML • Ontology
1 Introduction
E-learning has become a prominent part of distance and continuing education and is now present almost everywhere [1]. The new generation of the web, known as the semantic web, has emerged as a promising technology for implementing and enhancing e-learning. It has attracted the attention of business leaders, industry, academics and researchers, who continue to explore its utility and applications. It can organize data in machine-processable form and allows the current web, computers and human experts to work together. Semantic web technology can be applied in diverse domains, and e-learning is one of the areas likely to gain many advantages from it [2]. In recent years, research on the semantic web has introduced new concepts for structuring web content in a way that is valuable to modern systems [3]. As a result, various techniques have been devised for building and expanding the semantic web. Notable among them are the Resource Description Framework (RDF) [4] and its extensions such as OWL [5], which are used to define metadata schemas, domain ontologies and resource descriptions.
Padmaja Appalla
padmaja1074@gmail.com
1 Faculty of Engineering, University of Johannesburg, Johannesburg, South Africa
2 Department of AIS, University of Johannesburg, Johannesburg, South Africa
123
Wireless Netw
DOI 10.1007/s11276-015-1173-z

Since an e-learning environment must supply sufficient data and learning materials in diverse forms, the need for semantic description of the e-learning material, easy organization of the e-learning plan, and personalized delivery of the e-learning material cannot be overemphasized [6, 7]. The vision of the semantic web is to express World Wide Web data in a common, standardized language that can be interpreted by intelligent agents, allowing them to find, share and integrate data automatically on behalf of the human user. It provides a new structure for dynamic, distributed and extensible structured information (ontology) founded on formal logic. An ontology is an explicit specification of a conceptualization that uses an agreed vocabulary and provides a rich set of constructs for building a more meaningful level of knowledge. Ontologies and their associated domains are the cornerstones of the semantic web effort [8]. The complexity associated with the number, varieties and uses of computer-based artifacts requires a system design that lets intelligence disappear into the infrastructure of active spaces (such as buildings, shopping malls, theatres and homes) [9]. The challenges of analysing very large volumes of data did not appear suddenly; they have emerged gradually over many years, because generating data is easy whereas locating useful facts within the data is very difficult. Web services technology holds considerable potential for implementing service-oriented architecture and its strategic objectives. In classifier applications, feature selection chooses a subset of the most relevant features and discards irrelevant and redundant ones, which improves precision and shortens the model training time of the classifier. Viewed in terms of functionality, data mining is well suited to extracting interesting information from the large quantities of data stored in databases, data warehouses or other data repositories [10–13].
Education is one of the most significant applications of multimedia. In the semantic e-learning scenario, multimedia [14] plays a key role similar to that of the traditional textbook as a source of information. Building context-aware multimedia services in heterogeneous networks is still complex and time consuming because of the heterogeneity of context-aware media contents and network conditions [15]. Nevertheless, the ability to manipulate the text itself on an electronic device opens new ways for students to interact with the media, analogous to conventional note taking. Multimedia technology can engage learners as much as a typical textbook does, and many techniques exist for presenting learning material to students in multimedia form. Videos and images are important resources in the educational data mining field, offering students ways of accessing information that are more intuitive and efficient than text-based learning materials.
This paper presents an efficient educational data mining approach to support e-learning. The approach consists of knowledge input, annotation of the learning materials, creation of a multi-dimensional knowledge representation, and retrieval of learning materials by matching the user query with the indexed database and the ontology. The proposed approach consists of two modules, the server module and the client module. In the server module, the documents are read from the database and the corresponding knowledge representation is built. This module comprises several steps: the initial process, selection of unique words, K-means clustering and structure formation. In the client module, the information is retrieved based on the user. The input query can be of two types: user details based and user interest based. Each class of method for developing student models has its own advantages and disadvantages. In existing works, the time consumption is found to be very high, and information retrieval with existing ontology-based methods has performed poorly.
The rest of the paper is organized as follows: a brief review of research related to the proposed technique is presented in Sect. 2. The proposed approach is described in detail in Sect. 3, and the experimental results and discussion are given in Sect. 4. The conclusion is presented in Sect. 5.
2 Review of related works
Many researchers have developed approaches for the e-learning environment. A handful of significant works are presented in this section. Lau et al. [16] proposed a concept map generation technique based on a context-sensitive text mining approach and a fuzzy domain
ontology extraction algorithm. The system was able to build concept maps automatically from the messages posted to online chat rooms. By accessing the concept maps, the instructor could quickly review the progress of the students and adjust the pedagogical sequence on the fly.
Francesco and De Santo [17] proposed a method for building ontologies. They began with a preliminary discussion of the role of ontologies in the context of e-learning. Their technique provides an ontological foundation for exploring learning resources in order to customize learning. An experimental assessment of the method was carried out using real student records. In particular, the technique was incorporated in a tool for the evaluation of students during a learning phase. The evaluation, based on a Bayesian method, supported a more advanced assessment of student knowledge.
Tankeleviciene and Damasevicius [18] proposed a framework characterized by (semi-)autonomous data acquisition, learning and/or analysis, with a view to providing better services to students as a system quality feature. The framework extends a traditional e-learning system with intelligent capabilities. They designed an intelligent module of the e-learning system that can augment its local domain ontology with concepts and relationships gathered by querying a remote database.
Ferreira-Satler et al. [19] formulated an algorithm for the automatic generation of the ontology structure. The method was incorporated into a management tool for learning objects, AGORA, in which every user profile is constructed from the learning objects contributed by the user. Users of the tool confirmed that the automatically generated ontologies were broadly in line with their expectations.
In addition, Sankey et al. [20] reported the findings of an investigation into the effect of multiple representations of content on learning outcomes, along with the quality of student learning and engagement. The study found that multiple representations did not lead to a significant improvement in learning performance, apart from a marginal gain. However, the students responded favourably to the multimodal learning modules, reporting that they benefited from a deeper understanding and better retention of the learning material.
Brut et al. [21] proposed a solution that extends the IEEE LOM standard with ontology-based semantic annotations so that learning objects can be used effectively outside learning management systems. The data model corresponding to this approach was presented first. An extended indexing technique was then introduced with the aim of obtaining better annotations of the learning materials. The technique developed and combined two alternative techniques for the structure-based indexing of textual resources: the mathematical method of latent semantic indexing and the linguistically oriented WordNet-based text processing. This led to a better understanding of the reasons for the good results produced by the former method, by means of the linguistically controlled choices suggested by the latter. The results are significant not only for the adoption of semantic web technologies in the e-learning field, but also as progress toward ontology-based indexing of textual materials.
Deng et al. [22] developed a multimedia data technique for e-learning. Their method required adaptable and reusable support for structuring multimedia content models and enabled interactive transmission of streams of multimedia information such as audio, video, text and annotations through system services. They assessed existing standards and applications for multimedia document models, such as HTML, MHEG, SMIL, HyTime, RealPlay and MS Windows Media, and found that these did not provide a sufficient basis for advanced reuse and adaptation. They therefore introduced a technique for structuring reusable and adaptable multimedia data. In addition, they devised a comprehensive approach to advanced multimedia content creation, including support for recording the presentation, retrieving the content, summarizing the presentation, weaving the presentation and adapting the presentation. Their technique improved multimedia presentation authoring in terms of both methodology and commercial features.
Rodicio and Sánchez [23] designed a study to investigate whether the benefits of human tutoring persist when the problems of earlier investigations are avoided. In one experiment, the participants studied geology from a multimedia module that included one of three kinds of support: human tutoring, pre-recorded support or no support. After studying the module, they completed retention and transfer tests. The results showed that the participants in the human tutoring condition outscored those in the other two conditions, which did not differ from each other.
Moreover, Lau et al. [24] developed an e-learning-specific multimedia method that gives students more control over their learning schedule and pace. The technique also provides students with different versions of the media tailored to their learning behaviour, which improves their learning efficiency.
In 2013, Fernando et al. [25] conducted an extensive survey of mobile cloud computing research, highlighting the specific concerns in mobile cloud computing. They presented a taxonomy based on the key issues in this area and discussed the different approaches employed to tackle these issues.
3 Proposed educational data mining approach
to support e-learning
In this paper, an educational data mining technique to support e-learning is presented. E-learning has recently attracted considerable attention. The problems faced in e-learning relate to the acquisition and annotation of learning materials, the organization of the acquired materials and the retrieval of useful learning materials. The approach consists of the following steps: knowledge input (where learning materials such as text documents, videos and images are collected), annotation of the learning materials with metadata, creation of a multi-dimensional knowledge representation (using the tree structure, indexing, XML and the ontology), and retrieval of the learning materials by matching the user query with the indexed database and the ontology. The proposed approach consists of two modules, the server module and the client module. The block diagram of the proposed approach is given in Fig. 1.
3.1 Server module
In this module, the documents are read from the database
and the corresponding knowledge representation is made.
The process includes many steps which are detailed below:
3.1.1 Initial process
Initially, all the documents are collected and stored in the database. The documents typically include text documents, images and video files. Let the text documents be represented by $X = \{x_1, x_2, \ldots, x_{N_x}\}$, the images by $Y = \{y_1, y_2, \ldots, y_{N_y}\}$ and the videos by $V = \{v_1, v_2, \ldots, v_{N_v}\}$. Here $N_x$ is the number of text documents, $N_y$ the number of images and $N_v$ the number of videos under consideration. The complete set of documents (the database) can be represented as $D = \{X, Y, V\}$, or in unified form as:

$$D = \{d_1, d_2, \ldots, d_{N_d}\}, \quad \text{where } d_i \in X \cup Y \cup V \text{ and } N_d = N_x + N_y + N_v \tag{1}$$

All the documents ($d_i$, where $0 < i \le N_d$) are read and selected for further processing. The block diagram of the initial processing is given in Fig. 2.
3.1.2 Selection of unique words
Each text document consists of words, which are processed. For image and video documents, the words in the title are processed. Let document $z$ consist of the words represented by:

$$W_z = \{w_{z,1}, w_{z,2}, \ldots, w_{z,N_{wz}}\} \tag{2}$$

Here, $N_{wz}$ is the total number of words in document $z$. In each document, the frequency of every word is computed, where the frequency of a word is the number of times it appears in that document. The ten most frequent words of each document are then selected. Let the selected top ten words of a document be represented as:

$$Sw_z = \{sw_{z,1}, sw_{z,2}, \ldots, sw_{z,10}\} \tag{3}$$

where $z$ denotes the document under consideration. In this way, the selected words are obtained from each document. The block diagram for the selection of the unique words is given in Fig. 3.
After the top ten words of each document are found, the unique words that are common to the top-ten lists of all the documents are identified. Let the common unique words be represented by:

$$CUW = \{cuw_1, cuw_2, \ldots, cuw_{N_{cw}}\}, \quad cuw_i \in \{Sw_1 \cap Sw_2 \cap \cdots \cap Sw_d\} \tag{4}$$

$N_{cw}$ is the number of common unique words present in all the documents under consideration. Subsequently, the frequency of each common unique word is computed. The frequency of the common unique word $cuw_z$, denoted by $\mathrm{fre}(cuw_z)$, is defined by:

$$\mathrm{fre}(cuw_z) = \mathrm{fre}\left(Sw_1^{cuw_z}\right) + \mathrm{fre}\left(Sw_2^{cuw_z}\right) + \cdots + \mathrm{fre}\left(Sw_d^{cuw_z}\right) \tag{5}$$

where $\mathrm{fre}\left(Sw_i^{j}\right)$ represents the frequency with which the word $j$ appears in the top ten words of document $i$.
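The word-selection steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`top_words`, `common_unique_words`, `cuw_frequency`), the simple regular-expression tokenizer and the sample documents are all hypothetical, and Eq. 5 is read here as the sum of a word's in-document counts over the documents' top-ten lists.

```python
from collections import Counter
import re

def top_words(text, n=10):
    """Map each of the n most frequent words in `text` to its count (Eq. 3)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return dict(Counter(tokens).most_common(n))

def common_unique_words(tops):
    """Words that occur in every document's top-n list (one reading of Eq. 4)."""
    if not tops:
        return set()
    common = set(tops[0])
    for top in tops[1:]:
        common &= set(top)
    return common

def cuw_frequency(word, tops):
    """Assumed reading of Eq. 5: sum the word's counts over all top-n lists."""
    return sum(top.get(word, 0) for top in tops)

# Hypothetical sample documents, used only for illustration.
docs = [
    "data mining finds patterns in data",
    "a database stores data for mining",
    "mining educational data supports learning",
]
tops = [top_words(d) for d in docs]
cuw = common_unique_words(tops)
freqs = {w: cuw_frequency(w, tops) for w in sorted(cuw)}
```

In this small example the common unique words are "data" and "mining"; the resulting frequencies are the kind of values that would be passed on to the clustering step.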
3.1.3 K-means clustering and structure formation
K-means is a commonly used clustering algorithm in which the input data are grouped into K clusters. The grouping of the data points into clusters depends on the centroid values. In the proposed technique, the frequencies of the common unique words form the input to K-means, which clusters all the documents into two clusters based on these frequencies (as K is taken as 2).
Let there be $G$ data points denoted by $DP = \{dp_1, dp_2, \ldots, dp_G\}$, and let the centroids be represented by $cen_i$, where $0 < i \le k$. The minimization function of the algorithm is given by:
Fig. 1 The block diagram of
the proposed approach
Fig. 2 The block diagram of initial processing
$$\frac{1}{G} \sum_{j=1}^{G} \min_{i} \, dis^2(dp_j, cen_i) \tag{6}$$
where $dis(dp_j, cen_i)$ is the Euclidean distance between data point $dp_j$ and centroid $cen_i$. The objective is therefore to locate $k$ cluster centroids such that the average squared Euclidean distance between a data point and its nearest cluster centroid is minimized. The steps of the K-means algorithm are:
1. Initialize $k$ centroids, so that there is one centroid for each cluster
2. Calculate the distance $dis(dp_j, cen_i)$ of every data point $dp_j$ in Db from each of the $k$ centroids
3. Allocate data point $dp_j$ to the cluster $Cu_i$ whose centroid is at the least distance compared to the other clusters
4. Update the centroid values based on the membership of the new clusters
5. Repeat Steps 2 to 4 until there is no movement of the data points among the clusters
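The five steps listed above can be sketched as a short Python function. This is an illustrative sketch rather than the authors' Java code: the paper does not state how the centroids are initialized, so taking the first `k` points is an assumption made here for reproducibility, and `max_iter` is a hypothetical safety cap.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k=2, max_iter=100):
    """Cluster `points` (tuples of numbers) into k groups; return (labels, centroids)."""
    # Step 1: one centroid per cluster (assumption: the first k points).
    centroids = [tuple(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Steps 2-3: assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        # Step 5: stop when no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: recompute each centroid as the mean of its members.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Hypothetical frequency vectors for four documents.
points = [(1.0, 0.0), (5.0, 4.0), (1.0, 1.0), (6.0, 5.0)]
labels, centroids = k_means(points, k=2)
```

With K = 2, as in the paper, the four sample vectors separate into a low-frequency group and a high-frequency group.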
After clustering based on the common unique words, two clusters of documents are obtained (as $k$ is taken as two). In each cluster, the word frequencies are computed, and the five most frequent words of each cluster form the topic of that cluster. If the clusters are represented by $Cu_i$, the topic $Top_i$ of cluster $Cu_i$ is represented as:

$$Top_i = \{x_{i,1}, x_{i,2}, \ldots, x_{i,5}\} \tag{7}$$

where $x_{i,j}$ is the $j$th most frequent word in the $i$th cluster. After the topics of all the clusters are found, the selection of unique words and K-means clustering are repeated on the largest cluster, that is, the cluster containing the most documents at this stage. Hence, after two iterations, three clusters are obtained in total; these three clusters are compared to find the largest, and the process is carried out on it. The flow diagram of the process is given in Fig. 4.
The iteration is carried out an arbitrary number of times and finally results in a tree structure. The tree structure includes the document clusters, topics and levels. The results of the initial iteration constitute the top portion of the tree, and the subsequently formed clusters and topics form the sub-trees. The number of sub-trees or levels depends on the number of iterations performed. The topic found at a sub-tree forms a sub-topic, which also consists of the top five words of the cluster based on the frequency count. A sample tree is shown in Fig. 5.
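The repeated splitting of the largest cluster can be sketched as below. This is a hedged illustration, not the authors' code: documents are assumed to be lists of tokens, the node dictionary layout is hypothetical, and `halve` is only a stand-in for the real split (selection of unique words followed by K-means with K = 2).

```python
from collections import Counter

def topic(cluster, n=5):
    """Return the n most frequent words over all documents in a cluster (Eq. 7)."""
    counts = Counter(word for doc in cluster for word in doc)
    return [word for word, _ in counts.most_common(n)]

def build_tree(docs, split, iterations):
    """Split the largest leaf cluster `iterations` times; return (root, leaves)."""
    root = {"docs": docs, "level": 0, "topic": topic(docs), "children": []}
    leaves = [root]
    for _ in range(iterations):
        largest = max(leaves, key=lambda node: len(node["docs"]))
        if len(largest["docs"]) < 2:
            break  # nothing left to split
        for part in split(largest["docs"]):
            largest["children"].append({
                "docs": part,
                "level": largest["level"] + 1,
                "topic": topic(part),  # sub-topic of the new cluster
                "children": [],
            })
        leaves = [n for n in leaves if n is not largest] + largest["children"]
    return root, leaves

def halve(docs):
    """Stand-in for the clustering step: split a list into two halves."""
    mid = len(docs) // 2
    return [docs[:mid], docs[mid:]]

# Hypothetical tokenized documents.
docs = [["data", "mining"], ["data", "base"], ["data", "grid"],
        ["xml", "tree"], ["xml", "index"], ["xml", "query"]]
root, leaves = build_tree(docs, halve, iterations=2)
```

After two iterations the sketch holds three leaf clusters, matching the description in the text, and each node carries its level and its topic or sub-topic.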
Fig. 3 Block diagram for the
selection of unique words
The tree structure in Fig. 5 is formed for a set of sample documents. Here it is assumed that cluster A has more documents than cluster B, and that cluster B has more documents than clusters C and D. For each of the clusters, the respective topic or sub-topic is found. The tree structure is then processed to XML
Fig. 4 The flow diagram of the
process
Fig. 5 Sample tree generation
format with the use of indexing. The XML file is gen-
erated with the attributes such as the topic, sub-topic,
level and document name. The structure of the XML file
is shown below.
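As a rough illustration of how a file with these four attributes could be produced, the sketch below uses Python's standard `xml.etree.ElementTree` module. The element names (`knowledge`, `cluster`, `document`), the attribute spelling `subtopic` and the sample data are assumptions made for this example; the authors' actual schema may differ.

```python
import xml.etree.ElementTree as ET

def clusters_to_xml(clusters):
    """Serialize clusters (topic, sub-topic, level, document names) to XML text."""
    root = ET.Element("knowledge")
    for c in clusters:
        node = ET.SubElement(root, "cluster", {
            "topic": " ".join(c["topic"]),
            "subtopic": " ".join(c["subtopic"]),
            "level": str(c["level"]),
        })
        for name in c["documents"]:
            ET.SubElement(node, "document", {"name": name})
    return ET.tostring(root, encoding="unicode")

# Hypothetical cluster taken from a generated tree.
clusters = [{
    "topic": ["data", "mining"],
    "subtopic": ["cluster", "kmeans"],
    "level": 2,
    "documents": ["doc1.txt", "lecture3.mp4"],
}]
xml_text = clusters_to_xml(clusters)
parsed = ET.fromstring(xml_text)
```

Parsing the generated text back with `ET.fromstring` is a simple way to check that the file is well-formed before it is indexed.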
The XML file is processed together with a benchmark university ontology. An ontology characterizes information as a set of concepts within a domain, using shared terms to represent the types, properties and interrelationships of those concepts; ontologies are structural frameworks for organizing information. Here, the ontology consists of information and data taken from the users. An ontology is created for each user from attributes of the user such as the major subject, course, preferred subjects, languages known, computer knowledge and so on.
3.2 Client module
In this module, the information is retrieved based on the user. Initially, the XML data for the text, images and videos and the benchmark university ontology are read and stored for the user. After the data are obtained, the user is asked for the input query. Let the user be represented by Usr, the corresponding ontology by Or, and the corresponding text, image and video documents by Xr, Yr and Vr. The retrieval of learning materials is done adaptively based on the user query and the ontology, which contains the personalized information of all the users. The input query can be of two types: user details based and user interest based.
3.2.1 User details based
Here, the information about the particular user is collected from the ontology, and the topics and sub-topics of the user are retrieved. These topics and sub-topics are then searched in the indexed database to find the matching documents (text, image or video), which are displayed to the user. That is, from the ontology Or for the user Usr, the topics are first found and the corresponding text, image and video documents (Xr, Yr and Vr) are retrieved.
3.2.2 User interest based
In this case, the user interest is given as input. Based on this input, the topics and sub-topics are found to form the ontology. These topics and sub-topics are searched in the indexed database to find the matching documents (text, image or video), and the corresponding text, image and video documents are displayed based on the ontology. Let the user interest be represented as Uir. That is, based on the user interest Uir, the corresponding text, image and video documents (Xr, Yr and Vr) are retrieved from the ontology Or for the user Usr. The flow diagram of the client module is given in Fig. 6.
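The two query types of the client module can be sketched as follows. This is a simplified illustration under assumptions, not the authors' retrieval code: the index is modelled as a list of dictionaries, the ontology as a dictionary keyed by user, matching is plain case-insensitive word overlap, and all names and sample entries are hypothetical.

```python
def retrieve(topics, index):
    """Return names of indexed documents whose topic shares a word with `topics`."""
    wanted = {t.lower() for t in topics}
    found = {"text": [], "image": [], "video": []}
    for entry in index:
        if wanted & {w.lower() for w in entry["topic"]}:
            found[entry["type"]].append(entry["name"])
    return found

def retrieve_by_user_details(user, ontology, index):
    """User details based: take the topics stored for the user in the ontology."""
    return retrieve(ontology[user]["topics"], index)

def retrieve_by_user_interest(interest, index):
    """User interest based: take the topics from the interest text itself."""
    return retrieve(interest.split(), index)

# Hypothetical index entries and user ontology.
index = [
    {"name": "intro_mining.pdf", "type": "text", "topic": ["data", "mining"]},
    {"name": "er_diagram.png", "type": "image", "topic": ["database", "schema"]},
    {"name": "clustering.mp4", "type": "video", "topic": ["mining", "cluster"]},
]
ontology = {"user1": {"topics": ["database"]}}
by_details = retrieve_by_user_details("user1", ontology, index)
by_interest = retrieve_by_user_interest("data mining", index)
```

Both query types end in the same lookup; they differ only in where the topics come from, which mirrors the two branches of the client module.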
4 Results and discussion
In this section, the results obtained for the proposed
approach are given and analysed. In Sect. 4.1, the imple-
mentation details and evaluation metric employed are
Fig. 6 Flow diagram of the client module
offered. In Sect. 4.2, the implementation screenshots are presented, and in Sect. 4.3, the evaluation metric values obtained for the proposed approach are given.
4.1 Implementation details and evaluation metric
employed
The proposed technique is implemented in Java on a system with 6 GB RAM and a 2.9 GHz Intel i7 processor. Recall, precision and F-measure are used as the evaluation metrics. Intuitively, recall measures how well the approach locates all the relevant data for a query, and precision measures how well it rejects non-relevant data.
The definition of these parameters assumes that, for a given query, there are two distinct sets of data: the retrieved data and the non-retrieved data (the latter representing the rest of the data). This obviously applies to the results of a Boolean search, but the same definition can also be used with a ranked search, as explained later. If, in addition, relevance is assumed to be binary, then the results for a query can be summarized as in Table 1. There, $P$ represents the set of data relevant to the query, $\bar{P}$ the non-relevant set, $Q$ the set of retrieved data, and $\bar{Q}$ the set of non-retrieved data. The operator $\cap$ gives the intersection of two sets. For example, $P \cap Q$ represents the set of data that are both relevant and retrieved.
The three parameters of particular interest are given below.

$$\text{Recall } (R) = \frac{|P \cap Q|}{|P|}$$

$$\text{Precision } (E) = \frac{|P \cap Q|}{|Q|}$$

where $|\cdot|$ gives the size of the set under consideration. In
other words, the recall represents the proportion of the
relevant data that are retrieved, and the precision charac-
terizes the proportion of the retrieved data that are relevant.
The F-measure parameter is an efficiency parameter based
on the recall and precision which is used for evaluating the
classification performance and also for certain search
applications. It has the advantage of summarizing effec-
tiveness in a single number and is defined as the harmonic
mean of the recall and precision which is represented as
follows.
$$\text{F-measure } (F) = \frac{1}{\frac{1}{2}\left(\frac{1}{R} + \frac{1}{E}\right)} = \frac{2RE}{R + E}$$
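The three definitions translate directly into set operations. The sketch below is illustrative only; the function name `evaluate` and the sample document identifiers are assumptions, with `relevant` standing for $P$ and `retrieved` for $Q$.

```python
def evaluate(relevant, retrieved):
    """Return (recall, precision, f_measure) for two sets of document ids."""
    hits = len(relevant & retrieved)  # |P ∩ Q|
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    total = recall + precision
    # Harmonic mean of recall and precision: 2RE / (R + E).
    f_measure = 2 * recall * precision / total if total else 0.0
    return recall, precision, f_measure

# Hypothetical query result: all relevant items found, plus one extra.
relevant = {"d1", "d2", "d3"}
retrieved = {"d1", "d2", "d3", "d4"}
recall, precision, f_measure = evaluate(relevant, retrieved)
```

In this example every relevant item is retrieved, so recall is 1, while the one extra item lowers precision to 0.75 and the F-measure to about 0.857.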
4.2 Implementation screen shots
In this section, the implementation screenshots of the proposed approach are given. The screenshots of the various stages include the first page, tree view, XML view, ontology, the two query types (user interest and user view), training result, authentication, retrieved documents, retrieved images and retrieved videos. The screenshots are given in Figs. 7, 8 and 9.
4.3 Results and analysis
In this section, the evaluation metric values obtained for
the proposed technique are given and discussed. The
Table 1 Relevant and retrieved documents
| | Relevant | Non-relevant |
|---|---|---|
| Retrieved | $P \cap Q$ | $\bar{P} \cap Q$ |
| Not retrieved | $P \cap \bar{Q}$ | $\bar{P} \cap \bar{Q}$ |
Fig. 7 First page of the
implementation
analysis is carried out in three phases, based on the keywords, the number of documents and the K-value.
4.3.1 Analysis based on keywords
Inferences from Tables 2, 3, 4 and 5:
• Tables 2, 3 and 4 give the results values obtained for
the keywords data, database and the mining
respectively.
• The results are taken for the case K = 2.
• The results include the evaluation metric values of precision, recall and F-measure.
• From the results, we can infer that the proposed
approach has attained good results by achieving high
evaluation metric values.
• Table 5 shows the average values obtained for various
keywords.
• It is seen that among the keywords, the novel approach
has worked best for the keyword data, achieving the
average precision of 0.84, average recall of 1 and the
average F-measure of 0.91.
Fig. 8 Overview of domain
ontology
Fig. 9 Training results
• Among all the values, the highest precision attained is
roughly 0.95, and the highest F-measure achieved is
approximately 0.97.
4.3.2 Analysis based on documents
In this section, analysis is carried out by finding the eval-
uation metrics based on the number of documents given as
input. Various document sizes taken for evaluation are 20,
40, 60, 80 and 100.
Inferences from Figs. 10, 11, 12 and 13:
• The analysis is carried out by finding the evaluation
metrics based on the number of documents given as
input. Various document sizes taken for evaluation are
20, 40, 60, 80 and 100.
Table 2 Results obtained for keyword: data
| Document files | Image files | Video files | Documents relevant | Documents retrieved | Images relevant | Images retrieved | Video relevant | Video retrieved | Precision | Recall | F-measure |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | 5 | 5 | 15 | 6 | 4 | 3 | 2 | 2 | 1 | 0.52381 | 0.6875 |
| 40 | 10 | 10 | 32 | 31 | 5 | 4 | 5 | 5 | 1 | 0.952381 | 0.9756098 |
| 60 | 15 | 15 | 49 | 44 | 6 | 6 | 8 | 7 | 1 | 0.904762 | 0.95 |
| 80 | 20 | 20 | 73 | 69 | 13 | 10 | 10 | 9 | 1 | 0.916667 | 0.9565217 |
| 100 | 25 | 25 | 89 | 85 | 19 | 16 | 11 | 10 | 1 | 0.932773 | 0.9652174 |
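Table 3 Results obtained for keyword: database
| Document files | Image files | Video files | Documents relevant | Documents retrieved | Images relevant | Images retrieved | Video relevant | Video retrieved | Precision | Recall | F-measure |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | 5 | 5 | 6 | 3 | 1 | 1 | 1 | 1 | 1 | 0.625 | 0.7692308 |
| 40 | 10 | 10 | 20 | 12 | 4 | 4 | 1 | 1 | 1 | 0.68 | 0.8095238 |
| 60 | 15 | 15 | 37 | 32 | 4 | 4 | 1 | 1 | 1 | 0.880952 | 0.9367089 |
| 80 | 20 | 20 | 54 | 49 | 8 | 7 | 3 | 3 | 1 | 0.907692 | 0.9516129 |
| 100 | 25 | 25 | 75 | 68 | 12 | 10 | 7 | 7 | 1 | 0.904255 | 0.9497207 |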
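Table 4 Results obtained for keyword: mining
| Document files | Image files | Video files | Documents relevant | Documents retrieved | Images relevant | Images retrieved | Video relevant | Video retrieved | Precision | Recall | F-measure |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | 5 | 5 | 3 | 2 | 2 | 1 | 1 | 1 | 1 | 0.666667 | 0.8 |
| 40 | 10 | 10 | 3 | 2 | 2 | 1 | 4 | 4 | 1 | 0.777778 | 0.875 |
| 60 | 15 | 15 | 9 | 7 | 2 | 1 | 6 | 6 | 1 | 0.823529 | 0.9032258 |
| 80 | 20 | 20 | 15 | 13 | 6 | 5 | 10 | 9 | 1 | 0.870968 | 0.9310345 |
| 100 | 25 | 25 | 21 | 18 | 7 | 6 | 13 | 12 | 1 | 0.878049 | 0.9350649 |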
Table 5 Average values obtained for various keywords
Keyword   Average precision  Average recall  Average F-measure
Data      0.84               1               0.91
Database  0.79               1               0.88
Mining    0.80               1               0.80
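Each entry in Table 5 is an average over the five document sizes. As a minimal sketch, assuming a plain arithmetic mean over the per-size precision values of Table 2 (the averaging method is not stated in this section), the mean for the keyword data can be computed as follows. The variable names are hypothetical.

```python
from statistics import mean

# Precision values for the keyword "data", copied from Table 2
# (document sizes 20, 40, 60, 80 and 100).
precision_data = [0.52381, 0.952381, 0.904762, 0.916667, 0.932773]

# Assumption: Table 5 uses an unweighted arithmetic mean.
avg_precision = mean(precision_data)
print(f"{avg_precision:.3f}")
```

The mean comes to about 0.846, in line with the average precision of 0.84 reported for the keyword data.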
• The results are taken for the case K = 2.
• Figures 10, 11 and 12 give the evaluation metric values
obtained for the keywords data, database and mining,
respectively.
• The proposed approach achieves high evaluation metric
values in all cases, irrespective of the number of
documents.
• Figure 13 shows the average values obtained for the
various keywords.
• From the figure, we can infer that the proposed
approach performs better as the number of documents
increases. The best results have been achieved for the
document size of 100 in our case.
4.3.3 Analysis based on K-value
Inferences from Table 6 and Fig. 14:
• Table 6 and Fig. 14 give the average evaluation metric
values obtained by varying the K-size.
• The K-sizes taken into consideration are 2, 3 and 4.
• The results show that the approach works well in all
cases and that the best results are obtained for K = 2.
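The comparison across K-sizes amounts to picking the row of Table 6 with the highest average F-measure. A small illustrative sketch, with the values copied from Table 6 and hypothetical variable names:

```python
# Average metrics per K-size, copied from Table 6:
# K -> (average precision, average recall, average F-measure)
table6 = {
    2: (0.81, 1.0, 0.86),
    3: (0.78, 1.0, 0.84),
    4: (0.74, 1.0, 0.79),
}

# Choose the K-size whose average F-measure is largest.
best_k = max(table6, key=lambda k: table6[k][2])
print(best_k)
```

Since recall is 1 for every K-size, ranking by average precision instead gives the same choice.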
Fig. 10 Evaluation metric chart for varying number of documents for
keyword: data
Fig. 11 Evaluation metric chart for varying number of documents for
keyword: database
Fig. 12 Evaluation metric chart for varying number of documents for
keyword: mining
Fig. 13 Average evaluation metric chart for varying number of
documents
Table 6 Average evaluation metric values obtained for varying
K-value
K value  Average precision  Average recall  Average F-measure
2        0.81               1               0.86
3        0.78               1               0.84
4        0.74               1               0.79
Fig. 14 Chart of average evaluation metric values obtained for
varying K-value
5 Conclusion
The paper presents an efficient educational data mining
approach to support e-learning. The proposed approach
consists of two modules: the server module and the client
module. In the server module, the documents are read from
the database and the corresponding knowledge
representation is built. In the client module, the
information is retrieved based on the user requirements. The
proposed approach is evaluated using precision, recall and
F-measure. Comprehensive results are obtained by varying
the keywords, the number of documents and the K-size. The
proposed approach achieves high evaluation metric values,
with an average precision of 0.81, an average recall of 1
and an average F-measure of 0.86 for K = 2.
References
1. Ghaleb, F. F. M., Daoud, S. S., Hasna, A. M., Jaam, J. M., El-
Sofany, H. F. (2006). A web-based e-learning system using semantic
web framework. Journal of Computer Science, 2(8), 619–626.
2. Dutta, B. (2006). Semantic web based e-learning. In DRTC
Conference on ICT for Digital Learning Environment, 11th–13th
January, 2006.
3. Berners-Lee, T., Hendler, J., Lassila, O. (2001). The semantic
web. Scientific American, 285(5), 34–44.
4. RDF. (2001). W3C. Semantic web activity: Resource description
framework.
5. OWL. (2003). Web ontology language.
6. Šimić, G., Gasević, D., Devedzić, V. (2004). Semantic web
and intelligent learning management systems. In Workshop on
Applications of Semantic Web Technologies for E-Learning.
7. Thyagharajan, K. K., Nayak, R. (2007). Adaptive content
creation for personalized e-learning using web services. Journal
of Applied Sciences Research, 3(9), 828–836.
8. Alesso, H. P., Smith, C. F. (2006). Thinking on the web:
Berners-Lee, Gödel and Turing. London: Wiley.
9. Acampora, G., Gaeta, M., Loia, V., Vasilakos, A. (2010).
Interoperable and adaptive fuzzy services for ambient
intelligence applications. ACM Transactions on Autonomous
and Adaptive Systems (TAAS), 5(2), 8.
10. Tsai, C.-W., Lai, C.-F., Chao, H.-C., Vasilakos, A. (2015). Big
data analytics: A survey. Journal of Big Data, 2(21), 1–32.
11. Sheng, Q., Qiao, X., Vasilakos, A., Szabo, C., Bourne, S., Xu,
X. (2014). Web services composition: A decade’s overview. 280,
218–238.
12. Fong, S., Wong, R., Vasilakos, A. (2014). Accelerated PSO
swarm search feature selection for data stream mining big data.
IEEE Transactions on Services Computing.
doi:10.1109/TSC.2015.2439695.
13. Chen, F., Deng, P., Wan, J., Zhang, D., Vasilakos, A., Rong, X.
(2015). Data mining for the internet of things: Literature review
and challenges. 50, 11–14.
14. Ando, M., Ueno, M. (2008). Cognitive load reduction on
multimedia e-learning materials. In Proceedings of IEEE International
Conference on Advanced Learning Technologies, pp. 268–272.
15. Zhou, L., Naixue, X., Lei, S., Vasilakos, A., Yeo, S.-S. (2010).
Context-aware middleware for multimedia. Journal of Services in
Heterogeneous Networks, 25(2), 40–47.
16. Lau, R. Y. K., Song, D., Li, Y., Cheung, T. C. H., Hao, J.-X.
(2009). Toward a fuzzy domain ontology extraction method for
adaptive e-learning. IEEE Transactions on Knowledge and Data
Engineering, 21(6).
17. Francesco, C., De Santo, M. (2010). Ontology for e-learning: A
Bayesian approach. IEEE Transactions on Education, 53(2),
223–233.
18. Tankeleviciene, L., Damasevicius, R. (2010). Towards the
development of genuine intelligent ontology-based e-learning
systems. In 5th IEEE International Conference Intelligent
Systems (IS), pp. 79–84.
19. Ferreira-Satler, M., Romero, F. P., Menendez, V. H., Zapata, A.,
Prieto, M. E. (2010). A fuzzy ontology approach to represent
user profiles in e-learning environments. In IEEE International
Conference on Fuzzy Systems (FUZZ), pp. 1–8.
20. Sankey, M. D., Birch, D., Gardiner, M. W. (2011). The impact
of multiple representations of content using multimedia on
learning outcomes across learning styles and modal preferences.
International Journal of Education and Development Using
Information and Communication Technology (IJEDICT), 7(3),
18–35.
21. Brut, M. M., Sedes, F., Dumitrescu, S. D. (2011). A semantic-
oriented approach for organizing and developing annotation for
e-learning. IEEE Transactions on Learning Technologies, 4(3).
22. Deng, L. Y., Liu, Y.-J., Lee, D.-L., Chen, Y.-H. (2013).
Ontology-based multimedia adaptive learning system for
u-learning. In Information Technology Convergence Lecture
Notes in Electrical Engineering, Vol. 253, pp 669–676.
23. Rodicio, H. G., Sáncheza, E. (2013). Aids to computer-based
multimedia learning: A comparison of human tutoring and
computer support. Interactive Learning Environments, 20(5).
24. Lau, R. W. H., Yen, N. Y., Li, F., Wah, B. (2014). Recent
development in multimedia e-learning technologies. World Wide
Web, 17(2), 189–198.
25. Fernando, N., Loke, S., Rahayu, W. (2013). Mobile cloud
computing: A survey. Journal of Future Generation Computer
Systems, 29, 84–106.
Padmaja Appalla obtained her Master's Degree in Education from
the Open University, UK, and her Master's Degree in Business
Administration from Andhra University, India. She also completed
her Bachelor's Degree in Sciences at St. Joseph's College in India.
She holds various professional certifications, including PGDSM,
MCSD, OCA and Java certification, to cite a few. She is currently
pursuing her D.Phil. in e-learning at the University of
Johannesburg. She has over 19 years of experience in the field of
Information Technology and Education and is currently working as
Deputy Pro Vice Chancellor, Education, at Botho University,
Botswana.
Dr Venu Madhav Kuthadi is currently working with the University
of Johannesburg. He obtained his Ph.D. Degree in Computer Science
from MU, India, and his Master's Degree in Computer Science from
JNTU, India. He holds a B.Tech. in CSE from ANU, India. He has
14 years of experience in research and in teaching undergraduate
and postgraduate students of Engineering. He has published a good
number of articles in international journals and conference
proceedings. Dr Kuthadi is an Editor for the international journal
IJAEGT.
Professor Tshilidzi Marwala is a Deputy Vice Chancellor at the
University of Johannesburg. He was previously the Executive Dean
of the Faculty of Engineering and the Built Environment at the
University of Johannesburg, the Head of Control and Systems Group
and the Carl and Emily Fuchs Professor of Electrical Engineering
at the University of the Witwatersrand, Executive Assistant to
the Technical Director at the South African Breweries, Chair of
the (Telkom) Local Loop Unbundling Committee, Deputy Chair of
Limpopo Business Support Agency, director of the State Information
Technology Agency Pty (Ltd), member of council of Statistics South
Africa and member of council of the National Advisory Council on
Innovation. He has been on the boards of City Power Johannesburg
Pty (Ltd) and EOH Pty (Ltd). He holds a Bachelor of Science in
Mechanical Engineering (Magna Cum Laude) from Case Western Reserve
University, a Master of Engineering from the University of Pretoria
and a Ph.D. in Computational Intelligence from the University of
Cambridge, and was a post-doctoral research associate at the
University of London's Imperial College of Science, Technology and
Medicine. He has received over 40 awards, including the Order of
Mapungubwe; has published over 150 articles in refereed
international journals, conference proceedings and book chapters;
and has successfully supervised over 33 master's and Ph.D. students.