Prediction of Student learning interests using text analytics
Prethiviraj Elango, Mithun Rajkumar Antony and Krishna Ramanathan
Faculty of Engineering and IT
University of Technology Sydney
Sydney, Australia
{Prethiviraj.Elango, Mithun.RajkumarAntony, Krishna.Ramanathan}@student.uts.edu.au
Roberto Martinez-Maldonado
Connected Intelligence Centre
University of Technology Sydney
Sydney, Australia
Roberto.Martinez-Maldonado@uts.edu.au
Abstract – Online collaborative learning is popular because of
its advantages over traditional classroom learning. The platform
offers clear benefits, provided the approach is of good quality.
However, there are limitations in using the vast amount of
student data that is available, and there is little evidence of
this data being put to use for any purpose. The existing
literature reports various advantages of using text analytics to
enhance learning; drawing on that literature, this paper proposes
a process model to collect and analyze student data from an
online learning environment. The proposal uses the data analytics
tool RapidMiner for text processing to indicate students’
interest in various areas of study based on their available data.
The report is based on a proof of concept of a project that is
simple enough to target the University of Technology Sydney (UTS)
and other educational stakeholders.
Keywords – Text analytics, online learning, prediction accuracy
I. INTRODUCTION
Online learning environments are a significant change in
present-day education. The University of Technology Sydney (UTS)
gives students the opportunity to engage online through the UTS
Online software, which supports collaboration between students
and professors. UTS Online has some limitations: students can
participate only in the discussion boards of their enrolled
subjects, and professors can only post subject updates, publish
student marks and upload subject materials. Professors cannot
monitor student activities, interests, intentions and so on,
because UTS Online provides no facility to do so. These
limitations can be overcome by a project called CIC Around, which
is currently in progress and is run by the UTS Connected
Intelligence Centre (CIC).
CIC operates under UTS and handles multiple projects for the
university, of which CIC Around is one. CIC works at the
intersection of human sense-making and computational analysis.
Its research covers domains such as education, learning
analytics, human-centred research, research analytics and
transdisciplinary projects, with the aim of answering open
questions in these domains.
In the CIC Around project, UTS CIC is running a participatory
design process to build an online WordPress multisite
environment that supports student learning, online collaboration
with peers, collaboration with industry partners and community
building among students. Students can create groups for
different study purposes, and professors can monitor students’
progress. The project addresses the existing limitations of UTS
Online because it is a participatory process that helps students
take a more active part in the online learning environment. It
will be particularly helpful to students taking block-mode
subjects. Understanding the interests of the wider UTS student
community in the online learning environment will help in
analyzing student data for the future enhancement of CIC Around.
A proof of concept of UTS CIC Around has been built with the
WordPress plugins BuddyPress and bbPress. Following the proof of
concept, we propose a process model for predicting students’
interests in this online learning environment. The prediction is
based on the level of interest students show in areas of
learning other than their own. The proposal will help university
authorities refine particular courses based on students’ level
of interest. The data analytics tool RapidMiner is used, as
explained in detail in the following sections.
The rest of the paper is organized as follows: Section 2
Motivation, Section 3 Methodology, Section 4 Related work,
Section 5 Existing process, Section 6 Proposed process,
Section 7 Conclusion.
II. MOTIVATION FOR THIS RESEARCH
The main aim of this research is to give educational
stakeholders a clear insight into using text analytics
effectively and efficiently. The objective of this paper is to
improve the efficiency of online learning, which connects
students across distances, by taking their interests into
account. As technology advances, educational stakeholders need
to use it to enhance existing patterns of learning. Universities
have their own processes and norms for doing so. In many cases,
however, universities cannot predict students’ interests or know
what students actually think about their selected subjects.
A student has a very large number of subjects to choose from
within a selected course. As a simple example, the University of
Technology Sydney has refined its Information Technology course
into four majors (Business Information Systems, Data Analytics,
Networking and Software Development) based on student
participation in its Subject Feedback Survey (SFS). UTS may also
do further internal work to enhance its courses. In the survey,
however, students give feedback only on their enrolled subjects.
It is a direct approach without any technical means, and it does
not reveal students’ actual interests beyond the subjects they
are enrolled in. Surveying students about their interests is
also tedious: university authorities cannot manually collect
survey data on the scale needed to refine their courses. This is
the starting point for research in this area, which will be
useful for various educational stakeholders.
This research follows a similar idea but collects the student
data from the online learning environment instead. It helps
overcome the problem of not knowing a student’s interests. A
data analytics tool is used to make the process involved clear.
Overall, the motivation of this research is to predict student
interests in various domain areas accurately and to use these
predictions to refine the subjects in their courses. The
research will also provide a better understanding of student
data, which will help in analyzing further patterns in future.
III. METHODOLOGY
In this research we reviewed many articles related to text
analytics. Based on the findings, we propose an enhancement to
existing approaches for managing student data and for making use
of the data generated by students’ online participation. The
proposed idea was first shaped around the student data that is
available, taking various factors into account.
The research rests on two processes. The first is collecting
student data from the online learning environment in order to
enhance student learning as a whole. The data is retrieved from
the back end through reporting, according to the specifications
set by the stakeholders. After examining various criteria, we
decided to use text analytics, which helps in collecting,
measuring and analyzing the data and in finding similar patterns
among students’ data.
The second is the data analytics tool RapidMiner, in which the
bulk student data is processed using the keyword search option
it provides. The aim of applying text analytics to student data
is to derive high-quality information from the text students
enter. Similar patterns of text are structured in this way,
which supports interpretation of the output. These two processes
are useful for enhancing student learning, so we adopted both
and demonstrate them. They are described in detail in the
proposed process section.
IV. RELATED WORK
Early applications of text mining in higher education were less
effective than later ones because the tools were not user
friendly and were very expensive. Text mining has several
applications, and users tend to work with a mining tool in their
own way depending on their knowledge. In higher education, text
mining lets teachers analyze learners’ activities and help
learners efficiently. It is also used as a major tool for
refining the curriculum of a course at a university or other
educational institution.
In “Qualitative Text Mining in Student’s Service Learning
Diary”, the author analyzes students’ service-learning
activities in order to assess their outcomes from e-learning and
to give students feedback based on their interaction with
e-learning tools such as online discussion boards and online
exams. The author also notes that a course curriculum can be
updated using text mining technologies, which makes the course
more refined than a large syllabus with unrelated content. The
work also introduces the concepts described below (Hsu, 2012).
Instructional design
Instructional design provides a blueprint for teaching and a way
to examine each teacher’s teaching standards. It is used to
identify learners who are at high risk of dropping out of a
subject. Once such a learner is identified, a tailored approach
and strategies are used to make teaching effective. The authors
narrow down the concept of instructional design in “Designing
instructional feedback for different learning outcomes”. They
state that an instructional event in which a particular student
is selected for motivation has to follow a pre-test, practice
and a post-test (Smith et al., 1993).
Text mining prediction
In their book on text mining predictive methods for analyzing
unstructured information, the authors indicate that data mining
technology is used to find patterns in structured databases but
not in semi-structured ones. Hearst identified that data mining
alone would not satisfy human needs for learning and teaching
information. However, text mining, applied with appropriate
linguistic and statistical techniques to analyze text data,
helps us obtain new information (Weiss et al., 2005).
The professor describes the research method of this study as
follows: “Initially apply the instructional design model
followed by text mining procedures. The model has to combine 3
aspects of view: professor in action research, student teacher
in curriculum and instructional development and design students
in motivational learning evaluation”, as shown in Figure 1.
Figure 1: Research models in three points of view
In “The Application of Data Mining Technology in Distance
Learning Evaluation”, Ai et al. (2006) list the kinds of
knowledge obtained through text mining:
A. Generalized knowledge
A general description of the characteristics of any text that
the mining tool can generate (in our case, the mining tool is
RapidMiner). It typically reflects the common nature of similar
things, refines abstract data and so on.
B. Related knowledge
Knowledge gathered when one item of data depends on, or is
associated with, other similar data.
C. Category knowledge
Similar to related knowledge, except that the gathered texts are
categorized by their different characteristics. The most widely
used way of presenting such a classification is a tree view.
D. Predictive knowledge
Also called future knowledge: it is predicted from past and
current data. Common predictive methods are statistical methods,
neural networks and machine learning.
E. Bias-based knowledge
Knowledge about exceptions, gathered as a description of the
differences between the characteristics of attributes.
They also describe the use of E-Portfolio together with text
mining as an application for evaluating students’ learning
behavior. Used by itself, E-Portfolio is an inefficient
evaluation technique because the teacher evaluates it manually,
and it does not scale to large numbers of students. Figure 2
shows that text mining used with E-Portfolio helps the teacher
gather knowledge about the learning objectives associated with
the analysis. From the recorded set of mined data, the teacher
can readily understand the regulatory standards and analyze the
results of students’ learning behaviors, which further increases
the efficiency of learning evaluation (Ai et al., 2006).
Figure 2: Application of data mining technology in
E-Portfolio
The MCMS (Mining Course Management Systems) project at Thames
Valley University recommends building a knowledge management
system based on data mining. Data mining techniques are applied
to track individual student performance and to refine the
curriculum according to student activity. Text mining is used to
present the data mined by MCMS in a human-understandable way for
better decision making (Oussena, 2008).
MCMS applies model-driven data integration to bring data from
different systems into a single data warehouse for analysis (Kim
et al., 2009). Data in the warehouse should always be
pre-processed and transformed before any mining technique is
applied; prepared data makes the data mining process more
efficient. The university uses the knowledge gained from data
mining to take a more advanced approach to predicting individual
behavior and instructing students. Text mining is applied here
to narrow down students’ interaction with the online learning
(e-learning) tool. When a knowledge management system and a text
mining process are used together, a university achieves a high
level of data efficiency, which helps it choose the most
advanced approach to understanding its students’ needs.
Figure 3: Workflow of MCMS
Gabrilson determines students’ test scores using a data mining
prediction technique with an effective factor, which is later
adjusted according to the students’ performance in the following
year (Gabrilson, 2003). Luan groups students into two
categories: those who can deal with their courses easily and
those who take longer to complete a course (Luan, 2002). Such
groupings help universities make better decisions about refining
their curriculum, teaching time and so on.
To understand the factors that determine student retention,
universities usually collect data about a student’s academic
history, behavior and perceptions. For instance, Superby et al.
used different classifiers to predict student characteristics,
which resulted in very low accuracy (Superby et al., 2006).
In “Use Data Mining to Improve Student Retention in Higher
Education”, the authors describe student retention as the
biggest challenge for universities, because it determines the
quality of academic programs and university revenue (Oussena et
al., 2010). Seidman developed a simple formula for student
retention (Seidman, 1996):
Retention = Early Identification + (Early + Intensive
+ Continuous) Intervention
The formula expresses that detecting at-risk students early and
maintaining regular interaction with them is the most advisable
way to increase student retention.
Tinto provides five strategies for increasing student retention:
• Understanding the expectations of the student.
• Conducting counselling sessions to help students choose their
courses.
• Providing academic and social support, especially before the
start of the first semester.
• Motivating students by explaining their capabilities.
• Active interaction with the available learning sources.
Dhanalakshmi et al. introduce the idea of applying opinion
mining to student feedback data. Because stakeholders’ opinions
are a major factor in individual decision making, the authors
use this technique to understand their students better and to
refine the curriculum. The result of opinion mining depends on
how well the data is pre-processed, that is, on the stages the
data passes through before classification (Dhanalakshmi et al.,
2016).
Oussena used a linear regression classifier to identify the
variables associated with academic performance and found that
previous academic performance was the most important variable
(Oussena, 2008).
V. EXISTING PROCESS
In general, existing text analytics systems convert unstructured
information into structured form and extract meaningful
information from the entered text, which is then used by various
data mining algorithms. Information is extracted by counting the
words in each document. The word counts can then be analyzed to
find similarities and relationships between them. The most
common method in text analytics is to convert text into numbers
for use in clustering and predictive data mining projects; the
same numeric representation is useful in various other analyses.
Text mining also includes sentiment analysis, document
summarization, entity relation modelling, text clustering and
text categorization. Figure 4 gives an overview of the text
analytics process:
Figure 4: Text analytics process
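The conversion of text to numbers described above can be sketched in a few lines of Python. This is only an illustration of the general word-counting idea, not part of the RapidMiner workflow used in this paper; the function name build_term_matrix and the two sample posts are hypothetical.

```python
import re
from collections import Counter


def build_term_matrix(documents):
    """Turn raw texts into a vocabulary and one count vector per document."""
    # Lower-case each text and keep runs of letters as tokens.
    tokenized = [re.findall(r"[a-z]+", text.lower()) for text in documents]
    # A sorted vocabulary gives every word a fixed column position.
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # One number per vocabulary word: how often it occurs in this text.
        rows.append([counts[word] for word in vocabulary])
    return vocabulary, rows


# Hypothetical sample posts, invented for this illustration.
posts = [
    "Marketing case study on social media marketing",
    "Network routing lab and network security",
]
vocabulary, matrix = build_term_matrix(posts)
```

Each row of matrix is a structured, numeric view of one unstructured text, which is the form that clustering and predictive algorithms expect.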
VI. PROPOSED PROCESS
This proposal illustrates the use of text analytics on student
data, following the existing text analytics process. Because we
are dealing with student data from the online learning
environment, the first step is to collect student information
such as their posts, their comments, their participation in
discussions, and micro-level information such as the pages they
visit, the pages they like and the topics they are most
interested in. The text we collect from the relational databases
is unstructured and is retrieved as documents. To give it
structure we can use a vector representation: each document is
converted into a vector, so that all documents share a common
structured format.
Collecting this structured data is important because we want to
find similar patterns and relationships in it. The main purpose
is to identify an individual student’s interest in subjects that
are not part of their course.
Example: Suppose a student is enrolled in the Information
Technology course but is more interested in marketing-related
topics. If that student participates in many marketing-related
activities, we can conclude that this Information Technology
student is also interested in marketing subjects. Many other
Information Technology students might share this interest. If
so, it becomes clear that a considerable number of Information
Technology students are interested in marketing. By identifying
such similarities and patterns, universities have the
opportunity to refine the Information Technology course by
including marketing subjects. Likewise, many students in any one
course will have interests in other areas. With the help of text
analytics, the course can therefore be refined periodically
according to current trends and student behavior.
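The reasoning in this example can be expressed as a small Python sketch that measures, per course, the share of students who mention a topic at least once. It is an illustration under stated assumptions: the (course, student, text) record layout, the function name interest_share and the sample posts are all hypothetical, and a real system would need more robust matching than whole-word lookup.

```python
from collections import defaultdict


def interest_share(posts, keyword):
    """Fraction of students per course with at least one post mentioning keyword."""
    students = defaultdict(set)    # course -> every student seen
    interested = defaultdict(set)  # course -> students who mention the keyword
    for course, student, text in posts:
        students[course].add(student)
        # Whole-word match on lower-cased, whitespace-separated tokens.
        if keyword in text.lower().split():
            interested[course].add(student)
    return {c: len(interested[c]) / len(students[c]) for c in students}


# Hypothetical dummy records: (course, student id, posted text).
posts = [
    ("IT", "s1", "Joined the marketing analytics group"),
    ("IT", "s2", "Question about database indexes"),
    ("IT", "s3", "Digital marketing webinar was great"),
    ("Telecom", "s4", "Antenna design assignment"),
]
shares = interest_share(posts, "marketing")
```

A high share for a topic outside a course, such as marketing among Information Technology students here, is the kind of signal that could prompt a review of that course.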
Demonstration: To predict students’ interest in different areas
in the online learning environment, we use the RapidMiner
software platform. It is open-source software for machine
learning, business analytics, text analysis, predictive analysis
and data mining. On this platform we demonstrate how text mining
can be applied to data from the online learning environment.
Once RapidMiner is installed, we load the student information
extracted from the online learning environment into it. The
extraction can be done with any business intelligence tool, such
as online analytical processing or data warehousing. Before
loading the extracted files into RapidMiner, we look for the
extensions needed for text processing by clicking the Extensions
icon, as shown in Figure 5:
Figure 5: RapidMiner Extensions
After clicking the Extensions icon, we install the text
processing package. Once it is installed, the next step is to
drag the Process Documents from Files operator from the text
processing package and drop it onto the work area, as shown in
Figure 6:
Figure 6: Dropping Process Documents from Files
to the RapidMiner workspace
Next, we select the parameters of the Process Documents from
Files operator. This selection is shown in Figure 7:
Figure 7: Parameter selection
In the text directories parameter shown in Figure 7, we provide
the file path on the local computer. Here we compare two files
of student data extracted from the online learning environment.
The data is dummy data for demonstration purposes. One file
holds student data from the Information Technology department,
and the other holds student data from the Telecommunication
department. The extracts contain student information, online
participation, intentions, the topics students are most
interested in, the pages they like and so on. Both files are
loaded into RapidMiner as shown in Figure 8:
Figure 8: Loading dummy student data
Once the dummy student data has been loaded, we select the
vector creation option. Figure 4 shows the vector creation step.
Once a file is loaded, we specify which kind of vector to
create. Documents are represented by vectors: the processed text
starts as an unstructured, ordered list of pairs, which the
document vector model converts into structured form by counting
the words in each document. There are four options for counting
words:
Binary Term Occurrences: The simplest option. It records only
whether a word is present in the document or not.
Term Occurrences: Related to binary term occurrences, but it
counts how often a word occurs in a document.
Term Frequency: The number of occurrences of a term expressed as
a fraction of the document length.
TF-IDF: The most advanced option in RapidMiner; it stands for
term frequency-inverse document frequency. Term frequency is as
explained above. Inverse document frequency is based on the
document frequency, which is the number of documents a word
occurs in, and it indicates how characteristic a word is of a
document. In our demonstration we selected this option, which
combines the two measures.
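To make the four options concrete, the following Python sketch computes each weighting for two short tokenized documents. It uses the common textbook definitions (term frequency as count divided by document length, inverse document frequency as the natural logarithm of the number of documents over the document frequency); this is an assumption, and RapidMiner's exact normalization may differ. The function name vectorize and the sample tokens are hypothetical.

```python
import math
from collections import Counter


def vectorize(docs, scheme):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in docs for term in set(tokens))
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        vec = {}
        for term, count in counts.items():
            tf = count / len(tokens)  # fraction of the document length
            if scheme == "binary":
                vec[term] = 1  # present or not
            elif scheme == "occurrences":
                vec[term] = count  # raw count
            elif scheme == "tf":
                vec[term] = tf
            elif scheme == "tfidf":
                # Terms found in every document get weight 0 (log 1 = 0).
                vec[term] = tf * math.log(n_docs / df[term])
            else:
                raise ValueError(f"unknown scheme: {scheme}")
        vectors.append(vec)
    return vectors


# Hypothetical tokenized documents, invented for this illustration.
docs = [
    ["marketing", "plan", "marketing", "budget"],
    ["network", "plan", "router", "switch"],
]
binary = vectorize(docs, "binary")
occurrences = vectorize(docs, "occurrences")
tf = vectorize(docs, "tf")
tfidf = vectorize(docs, "tfidf")
```

In this toy data the word "plan" appears in both documents, so its TF-IDF weight is zero, while "marketing" appears in only one and keeps a positive weight. This is why TF-IDF is useful for finding words that characterize one group of documents.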
The next step is to specify which process runs inside the loop.
We selected Tokenization, whose purpose is to cut the text into
individual terms or words. Different separators can be used, as
highlighted in Figure 9:
Figure 9: Selection of a separator
RapidMiner offers a number of separators. The first is non
letters, which splits on white space, punctuation, symbols and
so on. The next is specify characters, in which we choose the
separator characters ourselves. Apart from these two, we can
also use separators such as regular expression, linguistic
sentences and linguistic tokens. In our demonstration we
selected the non letters separator.
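The effect of splitting on non-letters can be illustrated with a short Python sketch. It assumes ASCII letters only, which is a simplification; the function name tokenize_non_letters and the sample sentence are hypothetical, and RapidMiner's own handling of accented characters may differ.

```python
import re


def tokenize_non_letters(text):
    """Split text wherever one or more non-letter characters occur."""
    # Anything that is not A-Z or a-z acts as a separator; empty pieces
    # produced at the start or end of the text are dropped.
    return [piece for piece in re.split(r"[^A-Za-z]+", text) if piece]


tokens = tokenize_non_letters("Hi, I'm new: e-learning 101!")
```

Note that this mode also breaks words at apostrophes and hyphens and discards digits, so "e-learning" becomes two tokens.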
We can also perform further operations during text processing.
For instance, we selected the filter called Filter Stopwords
(English), which removes articles, conjunctions, pronouns and so
on. Because we perform multiple operations on the text in
RapidMiner, we set the breakpoint after option on our second and
third operations, Tokenization and Filter Stopwords
respectively.
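Stop word filtering can be sketched in Python as a simple set lookup. The stop list below is a small hypothetical sample chosen for illustration; it is not the list that RapidMiner ships, which is considerably longer.

```python
# A small, hypothetical English stop list (articles, conjunctions, pronouns,
# a few prepositions and auxiliaries); real lists are much longer.
STOP_WORDS = frozenset({
    "a", "an", "the", "and", "or", "but",
    "i", "you", "he", "she", "it", "we", "they",
    "is", "are", "in", "on", "of", "to",
})


def filter_stop_words(tokens):
    """Keep only the tokens that are not in the stop list (case-insensitive)."""
    return [token for token in tokens if token.lower() not in STOP_WORDS]


filtered = filter_stop_words(["The", "students", "and", "I", "like", "marketing"])
```

Removing these words before counting keeps very common function words from dominating the word vectors.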
The next step is to run the selected operations in RapidMiner.
Once we run the process, we can see the original text alongside
the processed text, as shown in Figure 10:
Figure 10: Outcome of text analytics
The color changes from word to word because the tokenizer
separates the text into individual words and terms. The same
procedure can be repeated for every document. In our
demonstration we used two files containing student data from the
Information Technology and Telecommunication departments. The
output also includes the example set, which consists of one row
for each document and one column for each word. In addition,
some meta information is provided, such as the file information,
file date, file path and the group or class the document belongs
to, given by the label attribute.
If we want to generate a classification model, we can use one of
the classification models available in RapidMiner. In our
demonstration we used the Naïve Bayes classification model,
which is available in the modelling package of RapidMiner. Once
the classification model is selected, the RapidMiner work area
looks as shown in Figure 11:
Figure 11: Selecting a classification model
After adding the classification model, we can view the output in
different ways, such as table view, plot view and distribution
table. We selected the plot view, which compares word counts
between the Information Technology and Telecommunication student
data. From the overall extraction of student data, we compared
only these two departments to gauge their interest in the
marketing area. Selecting the word “marketing”, we can conclude
from the graph that more Information Technology students are
interested in marketing. Knowing this, university authorities
can refine the Information Technology course by adding some
marketing subjects to the curriculum.
Figure 12: Plot view of processed text data
Thus, text processing makes it easier to identify students’
interests in the online learning environment. Similarly, we can
compare various patterns among students according to the
university’s specification.
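For readers who want to see what a Naïve Bayes text classifier does with labeled word lists, the following self-contained Python sketch implements the multinomial variant with add-one (Laplace) smoothing. It is an illustrative stand-in, not RapidMiner's implementation; the class name NaiveBayesText and the dummy training documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_docs = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)  # class -> word -> count
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for token in tokens:
                if token in self.vocabulary:  # words never seen are ignored
                    score += math.log((counts[token] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical dummy training data: tokenized posts labeled by department.
train_docs = [
    ["marketing", "campaign", "brand"],
    ["software", "database", "marketing"],
    ["antenna", "signal", "router"],
    ["signal", "network", "router"],
]
train_labels = ["IT", "IT", "Telecom", "Telecom"]
model = NaiveBayesText().fit(train_docs, train_labels)
prediction_a = model.predict(["marketing", "brand"])
prediction_b = model.predict(["router", "signal"])
```

With this toy data, posts about marketing are assigned to the Information Technology label only because the dummy Information Technology documents contain that word; results on real student data would depend entirely on what students actually write.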
VII. CONCLUSION & FUTURE IMPLICATIONS
The proposed research gives a clear insight into using the
available student data effectively and efficiently. The
attributes discussed will help educational stakeholders focus on
students’ academic needs based on their predicted interests. The
prediction depends largely on students’ online participation,
which will also yield valuable outcomes if similar research is
carried out in other areas. Future work could include evaluating
the performance of individual students, so that lecturers can
give the assistance a particular student needs; improving study
materials; and, in some cases, evaluating the performance of the
lecturer. For these purposes, learning analytics tools that
focus solely on individual learning enhancement can be used.
VIII. REFERENCES
Ai Yubing, Zhang Jianping, 2010. ‘The
Application of Data Mining Technology in
Distance Learning Evaluation’,
International Forum on Information
Technology in Distance Learning
Evaluation.
Cristianini, N., Shawe-Taylor, J., 2000. ‘An
Introduction to Support Vector Machines and
other kernel-based learning methods’.
Cambridge University Press.
Dhanalakshmi, V., Dhivya Bino, 2016.
‘Opinion mining from student feedback data
using supervised learning algorithms’, 3rd
MEC International Conference on Big Data
and Smart City.
Gabrilson, S., Fabro, D. D. M., Valduriez, P.,
2008. ‘Towards the efficient development of
model transformations using model weaving
and matching transformations’, Office of
information technology, Georgia Department
of Education.
Hsu Chia-Ling., 2012. ‘Qualitative Text
Mining in Student’s Service Learning Diary’.
Third International Conference on
Innovations in Bio-Inspired Computing and
Applications
Kim, H., Zhang, Y., Oussena, S., and Clark,
T., 2009. A Case Study on Model Driven
Data Integration for Data Centric Software
Development, In Proceedings of ACM First
International Workshop on Data-intensive
Software Management and Mining
Luan, J., 2002. ‘Data mining and knowledge
management in higher education –
potential applications’. In Proceedings of
AIR Forum, Toronto, Canada.
Mazon, J. N., Trujillo, J., Serrano, M.,
Piattini, M., 2005. ‘Applying MDA to the
development of data warehouses’. DOLAP
2005
Oussena, S., 2008. ‘Mining Courses
Management Systems’. Thames Valley
University.
Smith, P. L., Ragan, T. J., 1993. ‘Instructional
design’. Macmillan, New York.
Pathros Ibarra García, E. 2011, ‘Model
Prediction of Academic Performance for
First Year Students’, Mexican International
Conference.
Weiss, S. M., Indurkhya, N., Zhang, T.,
Damerau, F., 2005. ‘Text mining: predictive
methods for analyzing unstructured
information’. Springer Science+Business
Media, Inc., New York.
Schönbrunn, K., Hilbert, A., 2006. ‘Data
Mining in Higher Education’. In Advances in
Data Analysis, Proceedings of the 30th Annual
Conference of the Gesellschaft für
Klassifikation e.V., Berlin. Studies in
Classification, Data Analysis, and
Knowledge Organization.
Seidman, A., 1996. ‘Retention Revisited:
RET = E Id + (E + I + C)Iv’. College and
University, Vol. 71 No. 4, Spring, pp. 18-20.
National Audit Office, 2007. ‘Staying the
course: the retention of students in higher
education’.
Superby, J.F., Vandamme, J-P., Meskens,
N., 2006. ‘Determination of factors
influencing the achievement of the
first-year university students using data
mining methods’. Workshop on Educational
Data Mining.
Tinto, V., 2000. ‘Taking student retention
seriously: rethinking the first year of
college’, NACADA Journal, Vol. 19 No. 2,
pp. 5-10.
Thomas, L., 2002. ‘Student retention in
higher education: the role of institutional
habitus’, Journal of Education Policy, Vol.
17 No. 4, August, pp. 423-442.
Yorke, M., Longden, B., 2004. ‘Retention
and student success in higher education’,
Society for Research in Higher Education.

  • 1. Prediction of Student learning interests using text analytics Prethiviraj Elango1 , Mithun Rajkumar Antony2 and Krishna Ramanathan3 Faculty of Engineering and IT University of Technology Sydney Sydney, Australia {1 Prethiviraj.Elango, 2 Mithun.RajkumarAntony 3 Krishna.Ramanathan}@student.uts.edu.au Abstract – The collaboration of student learning in online is popular because of its novel advantages over the traditional class room learning. There are certain benefits can be accomplished in using this platform of learning, if the quality of approach is unique. However, there ae some limitations in using the vast amount of available student data. There is no proper evident in in using the student data for various purposes. In the existing literatures, there has been various advantages in using the text analytics for the enhancement of the educational pattern of learning; on going through these literatures, this paper proposes a process model to collect and analyze the student data on their online learning environment. This proposed thesis uses data analytic tool called RapidMiner for text processing to indicate the students’ interest in various area of study based on their available data. Furthermore, this report is based on the proof of concept of a project which is simple enough to target University of Technology Sydney (UTS) and other educational stakeholders. Keywords – Text analytics, online learning, Prediction accuracy I. INTRODUCTION The student online learning environment is a significant change in the present day scenario. University of Technology, Sydney (UTS) providing student an opportunity for this engagement of students in online. They are using UTS online software for making the collaboration of student and professors. There are some limitations in the UTS online in which the students can participate only in the discussion board of their enrolled subjects. 
Professors can only provide some updates regarding subjects, can publish student marks and can include the subject materials in UTS online. Professors cannot monitor the student activities, interests, intentions and so on, as UTS online does not provide any opportunities to do so.The mentioned limitations can be overcome by the implementation of project called CIC Around. This project is currently under Roberto Martinez-Maldonado Connected Intelligence Centre University of Technology Sydney Sydney, Australia Roberto.Martinez-Maldonado@uts.edu.au process, handled by UTS Connected Intelligence Center (CIC). CIC is operating under UTS who handles multiple project for UTS in which CIC Around is one among them. The activities involved in CIC is to find the happenings on intersecting human sense making and computational analysis. CIC’s research is focused on various domain projects like education, learning analytics, human centered, research analytics and transdisciplinary. Their main aim is to conduct research to answer the unanswered questions on these domains. In the CIC Around project, UTS CIC is designing a participatory design process to build an online WordPress multisite environment which will be useful for student learning, their online collaboration with their peers, provide a students an opportunity to collaborate with the industry partners and for building a community among the students. Students can create groups for the various purpose of studying. Professors are also provided with the opportunity to monitor the students’ progress. The implementation of this project will overcome the existing limitations of UTS online in which this project is more of a participatory process that help the student to participate more on this online learning environment. This project will be more helpful to the students who are studying blocked mode subjects. 
The understanding of wider UTS community student’s interest in the online learning environment will helps in the analysis of student data for the future enhancement of CIC Around. The proof of concept on the UTS CIC Around with the WordPress plugins, BuddyPress and BBpress has been performed. Following the proof of concept, the proposal of a process model for predicting the students’ interest on this online learning environment has been done. The prediction accuracy is based on the rate of interests on the students over their other areas of learning. This proposal will be helpful for the University authorities to refine the particular courses based on the interest level of students. The data analytic tool called RapidMiner
  • 2. has been used in which the detailed explanations are given as follows. The rest of paper is organized as follows: Section 2 Motivation, Section 3 Methodology, Section 4 Related work, Section 5 Existing process, Section 6 Proposed process, Section 7 Conclusion. II. MOTIVATION FOR THIS RESEARCH The main aim of this research is to provide the educational stakeholders a clear insight about using text analytics in an effective and efficient way. The objective trying to achieve in this paper, is to improve the efficiency of the online learning based on their interests that binds the students from various distance. As technology is enhancing according to recent trends, it is necessary for educational stakeholders to use that technologies to enhance the existing pattern of learning. Certain Universities will be having their own process and norms in enhancing their student’s existing pattern of learning. However, in many cases, student’s interests cannot be predicted by the Universities to know their exact thinking on their selected subjects. There will be vast and vast subjects available for a particular student to study based on their selected course. To explain with the simple example, University of Technology Sydney have refined their Information Technology course on the four majors like Business Information systems, Data Analytics, Networking and Software Development based on the student participation in their Subject Feedback Survey (SFS). Other than this, there will be more and more internal works might be done by UTS to enhance their course. Also, in the survey students will provide the feedback only about their enrolled subjects. This is more than a direct approach without any technical means which does not will provide the information regarding student actual interests with their feedback on their enrolled subjects. 
Conducting surveys for knowing the student interests is a tedious process in which University authorities cannot be able to collect the survey data manually to know their interests to make some refinements in their course. This is the starting point to perform the research in this area which will be useful for various educational stakeholders. The research is also based on the similar technique explained above but in an alternate way of collecting the student data from their online learning environment. This research will also help in overcoming the flaw of not knowing the interest of a student. Here, data analytic tool is used for the clear understanding of the process involved in this research. On the whole, the ultimate motivation of this research is to accomplish the accuracy of predicting the student interests in various domain areas and incorporating this prediction accuracy to refine their subjects involved in their course. The selected research will also provide a better understanding of student data which will be helpful in analyzing various patterns in future. III. METHODOLOGY In this research, review of many articles related to text analytics has been done. And then based on the findings, proposal has been done for an enhancement related to the existing approaches in managing the student data and what can be done with the student data considering on their online participation. Initially based on the available student data, proposed idea has been sorted considering various factors. The research is mainly focused on the two basis. The first one is collecting the student data from online learning environment for the enhancement purpose of the student learning on the whole. This should be done after retrieving the data from the online learning environment. The data should be retrieved on the back end by reporting and also according to the specifications mentioned by the stakeholder’s purposes. 
So, on researching various criteria’s, finally decided to use the text analytics in which it will helpful in collecting, measuring, analyzing and finding the similar pattern among the students’ data. The second one is focused on the data analytic tool called RapidMiner in which the bulk student data will be processed according to the keyword search option available in that. The main focus on student data in aiming the text analytics is to derive the high quality information from the student entered text. By using this, similar pattern of text will be structured which will be supportive in interpreting the output. The above mentioned two process will be useful in enhancing the student learning. So, decided to use those two process and then proceeded with the ideas with some demonstration. The clear and detailed description of this two process is clearly explained on the proposed process section. IV. RELATED WORK The initial application of text mining in the field of higher education was not that effective when compared to the later one as they were not user friendly and was very expensive. There are several application of text mining and a unique method is preferred by every user to work with the mining tool depending on their category of knowledge. Text mining also has a great
  • 3. effect in the field of higher education where the teachers can analyze the activities of the learner and help the learner in an efficient manner. Text mining is also used as a major tool to refine the curriculum of any course in a university or any education standards. The author in his book Qualitative Text Mining in Student’s Service Learning Diary has analyzed the services in learning activities of the student’s in any education sector in a way to analyze the outcomes of the students from e-learning and also to provide a reflection to the students based on their interaction with eLearning tools like online discussion board, online exams, etc., He also quotes that the curriculum of a course can be updated by using some text mining technologies, which makes the course even more refine, rather than putting a huge syllabus with unrelated contents for the students. He also introduced some computer technology like (Hsu, 2012). Instructional design This is to provide a blueprint and to examine the teaching standards of every teacher. Instructional design is used to identify a particular learner who is holding a high rate of dropping out of the subject. Once such a learner is identified, a unique approach, and strategies are used to make an efficient teaching practice. The authors narrowed down the concept of instructional design in their book of “Designing instructional feedback for different learning outcomes”. The book clearly states that the instructional event, where a particular student is picked up for motivation has to follow a pretest, practice and a post-test (Smith et al., 1993) Text mining prediction The authors in their book of text mining predictive methods for analyzing unstructured information indicated that any data mining technology will be used to find out the structured data base but not in the semi- structured database. Hearst has identified that data mining would not satisfy the human needs of learning and teaching information. 
However, when text mining is applied with appropriate language and statistics to analyze text data helps us to attain new data (Weiss et al., 1989). The professor followed a research method of this study. He says: “Initially apply the instructional design model followed by text mining procedures. The model has to combine 3 aspects of view: professor in action research, student teacher in curriculum and instructional development and design students in motivational learning evaluation” which is explained on the below figure. Figure 1: Research models in three points of view The author (Ai et al., 2006) in his paper “The Application of Data Mining Technology in Distance Learning Evaluation has listed out the knowledge that we gather because of text mining, they are: A. Generalized knowledge A very general description of the characteristics of any text the mining tools could generate (in our case, the mining tool is a rapid miner). This generally contains the reflection of common nature of similar things, refining the abstract data and so on. B. Related knowledge This data is gathered when one data is dependent of other similar data or associated knowledge. C. Category knowledge This is similar to the related knowledge but it differs where the gathered texts are categorized based on the different characteristics of knowledge. The most widely used type of classification of data is a tree view. D. Predictive knowledge This can also be said as future knowledge, which is predicted according to the past data and the current data. The trending predictive methods are statistical method, neural networks and machine learning. E. Bias-based knowledge This is nothing but an exceptional knowledge that’s gathered as a description of the
differences in characteristics between attributes.

They also describe the use of E-Portfolio with text mining as an application to evaluate students’ learning behavior. Used by itself, E-Portfolio proves to be an inefficient technique for evaluating learning behavior, as it is evaluated manually by the teacher. It is also limited in handling large numbers of students. The figure below shows that text mining, when used with E-Portfolio, helps the teacher gather knowledge about the learning objectives associated with the analysis. Through the recorded set of mined data, the teacher can easily understand the regulatory standards and analyze the results of students’ learning behaviors, which further increases the efficiency of learning evaluation (Ai et al., 2006).

Figure 2: Application of data mining technology in E-Portfolio

The MCMS (Mining Course Management Systems) project at Thames Valley University recommends building a knowledge management system based on data mining. Data mining techniques are applied to track individual student performance and to refine the curriculum according to students’ activities. Text mining is used as a tool to present the data mined by MCMS in a human-understandable way for better decision making (Oussena, 2008). Model-driven data integration is applied in MCMS to bring data from different systems into a single data warehouse for analysis (Kim et al., 2009). The data in the warehouse should always be pre-processed and transformed before it undergoes any mining technique; having the data ready increases the efficiency of the data mining process. The knowledge gathered from the data mining process is then used by the university to take a more advanced approach to predicting individual behavior and instructing students. Text mining is applied here to narrow down the student’s interaction with the online learning (e-learning) tool.
When a knowledge management system and a text mining process are used together, a university attains a high level of data efficiency, which in turn helps it choose the most advanced approach to understanding its students’ needs.

Figure 3: Workflow of MCMS

Gabrilson determines students’ test scores with a data mining prediction technique based on an effective factor, which is later adjusted according to the students’ performance in the succeeding year (Gabrilson, 2003). Luan groups students into two categories: those who can easily deal with their courses, and those who take longer to complete a course (Luan, 2002). Such groupings help universities make better decisions on refining their curriculum, teaching time and so on. To understand the factors that determine student retention, universities usually collect data about a student’s academic history, behavior and perceptions. For instance, Superby et al. used different classifiers to predict student characteristics, but these yielded very low accuracy (Superby et al., 2006). In their paper “Use Data Mining To Improve Student Retention In Higher Education”, the authors state that student retention is the biggest challenge, as it determines better academic programs and better revenue for universities (Oussena et al., 2010). A simple formula for maintaining the student retention rate was developed by Seidman (Seidman, 1996):

Retention = Early Identification + (Early + Intensive + Continuous) Intervention

This formula indicates that detecting at-risk students early and maintaining regular interaction with them is the most advisable way to increase student retention.

Tinto has provided 5 strategies to increase student retention:

• Understanding the expectations of the student.
• Conducting counselling sessions to help students choose their courses.
• Providing academic and social support, especially before the start of the first semester.
• Motivating students by explaining their capabilities.
• Active interaction with the available learning sources.

Dhanalakshmi et al. introduce the idea of applying opinion mining to student feedback data. As the opinions of stakeholders are a major factor in an individual’s decision making, the authors use this technique to understand their students better and to refine the curriculum. The result of opinion mining depends on how well the data is preprocessed, that is, on the stages the data has gone through in preparation for classification (Dhanalakshmi et al., 2016). In another work, the authors used a linear regression classifier to identify the variable associated with academic performance. This led them to conclude that previous academic performance was the most important variable (Oussena, 2008).

V. EXISTING PROCESS

Text analytics in general is used to turn unstructured information into structured information and to extract meaningful information from the entered text, which can then be used by various data mining algorithms. Information is extracted by summarizing the number of words in the document. The summarized words can then be analyzed to find similarities and relationships between them. The most common method in text analytics is to convert text to numbers for use in clustering and predictive data mining projects. This method is also helpful in various other analyses. Text mining also includes sentiment analysis, document summarization, entity relation modelling, text clustering and text categorization. The figure below shows the overall text analytics process:

Figure 4: Text analytics process

VI. PROPOSED PROCESS

This proposal illustrates the use of text analytics on student data, building on the existing text analytics process. As we are dealing with student data from the online learning environment, the first step is to collect student information such as posts, comments, participation in discussions, and micro-level information like the pages they visit, the pages they like and the topics they are most interested in. All the data we collect from the relational databases will be in an unstructured format, retrieved as documents. To bring it into a structured format we can use the vector representation feature: the documents are brought into a common database and then converted into structured form. Collecting this structured data is important because we want to find similar patterns and relationships in it. The main purpose is to identify a student’s interest in subjects that are not part of their own course.

Example: Suppose a student belongs to the Information Technology course but has more interest in marketing-related topics. If that student participates in more and more marketing-related activities, we can conclude that this Information Technology student is equally interested in marketing subjects. Many other Information Technology students might share this interest. If so, it becomes clear that a considerable number of Information Technology students are interested in marketing. By identifying such similarities and patterns, universities have the opportunity to refine the Information Technology course by including marketing subjects.
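The idea in the example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the RapidMiner workflow: the `posts` records, the student identifiers and the `interest_share` helper are all hypothetical, and "interest" is reduced to a student mentioning a keyword at least once.

```python
import re
from collections import defaultdict

# Hypothetical dummy records: (student id, department, post text).
posts = [
    ("s1", "Information Technology", "Enjoyed the marketing webinar on branding"),
    ("s2", "Information Technology", "Need help with the database assignment"),
    ("s3", "Information Technology", "Digital marketing analytics looks interesting"),
    ("s4", "Telecommunication", "Antenna design lab was difficult"),
    ("s5", "Telecommunication", "Any marketing electives this semester"),
    ("s6", "Telecommunication", "Signal processing tutorial notes"),
]

def interest_share(posts, keyword):
    """Return, per department, the fraction of students who mention `keyword`."""
    students = defaultdict(set)    # department -> all students seen
    interested = defaultdict(set)  # department -> students mentioning the keyword
    for student, department, text in posts:
        students[department].add(student)
        words = re.findall(r"[a-z]+", text.lower())
        if keyword in words:
            interested[department].add(student)
    return {dept: len(interested[dept]) / len(students[dept]) for dept in students}

share = interest_share(posts, "marketing")
for department, fraction in share.items():
    print(f"{department}: {fraction:.0%} of students mention marketing")
```

A department whose share is clearly higher than the others would be a candidate for the kind of curriculum refinement described in the example.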
Likewise, many students who come under one particular course will have equal interest in other areas as well. So with the help of text analytics, the course can be refined periodically according to present trends, scenarios and student behavior.

Demonstration: To predict students’ interests in different areas in the online learning environment, we use the RapidMiner software platform. It is open source software that is useful for machine learning, business analysis, text analysis, predictive analysis and data mining. On this platform we demonstrate how the text mining process can be effective on data from the online learning environment. Once RapidMiner is installed, we load the student information extracted from the online learning environment into RapidMiner. The extraction can be done from any Business Intelligence tool, such as online analytical processing, data warehousing and so on. Before loading the extracted file into RapidMiner, we look for the required text processing extension by clicking the Extensions icon, as in the screenshot below:

Figure 5: RapidMiner Extensions

After clicking the Extensions icon, we install the text processing package. Once it is installed, the next step is to drag the Process Documents from Files operator from the text processing package and drop it into the work area, as shown below:

Figure 6: Dropping Process Documents from Files to the RapidMiner workspace

After this, we select the parameters for the Process Documents from Files operator. This selection is shown in the screenshot below:

Figure 7: Parameter selection

In the above screenshot, under text directories we provide the file path on the local computer. Here, we compare two extracted files of student data from the online learning environment.
The data used here is dummy data created for demonstration purposes. One file holds the student data that belongs to the
Information Technology department; the other holds the student data that belongs to the Telecommunication department. The extraction is based on student information such as online participation, intentions, the topics they are most interested in, the pages they like and so on. Both sets of student data are loaded into the RapidMiner tool as in the screenshot below:

Figure 8: Loading dummy student data

Once the dummy student data has been loaded, we need to select the vector creation option. Figure 4 shows the vector creation. Once the file is loaded, we need to specify which vector creation is to be done. Documents are represented by vectors. When the texts are processed, each one is an unstructured, ordered list of pairs, which is then converted into structured form with the help of the document vector model. This conversion is done by counting the words in the documents. There are four options for counting words:

Binary Term Occurrences: This is the simplest option; it records whether the selected word is present in the document or not.

Term Occurrences: This option is related to binary term occurrences; it records how often a word occurs in a document.

Term Frequency: This records the occurrences of a particular term as a fraction of the document length.

TF-IDF: This is the most advanced option in the RapidMiner tool and stands for term frequency-inverse document frequency. Term frequency is as explained above. Inverse document frequency is based on the document frequency, which is the number of documents that a word occurs in. It is used to determine how characteristic a word is. In our demonstration we have selected this option, which combines the two measures.

The next step is to choose which process should happen inside the loop. The process we have selected is Tokenization.
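The four vector-creation options described above can be illustrated with a small Python sketch. It follows the definitions given in the text and uses a plain logarithmic inverse document frequency; RapidMiner's own normalization may differ, so the numbers are not expected to match the tool. The two dummy documents and the `vectorize` helper are assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical dummy documents, one per department.
docs = {
    "it": "marketing campaign marketing database network",
    "telecom": "network signal antenna network marketing",
}

def vectorize(docs):
    """Compute the four word-counting schemes for every document and word."""
    tokenized = {name: text.split() for name, text in docs.items()}
    vocabulary = sorted({w for words in tokenized.values() for w in words})
    n_docs = len(tokenized)
    # Document frequency: number of documents a word occurs in.
    doc_freq = {
        w: sum(1 for words in tokenized.values() if w in words)
        for w in vocabulary
    }
    vectors = {}
    for name, words in tokenized.items():
        counts = Counter(words)
        length = len(words)
        vectors[name] = {
            w: {
                "binary": 1 if counts[w] > 0 else 0,   # present or not
                "occurrences": counts[w],              # raw count
                "tf": counts[w] / length,              # fraction of document length
                # Assumed IDF form: log(N / df). A word found in every
                # document gets weight 0 under this form.
                "tfidf": (counts[w] / length) * math.log(n_docs / doc_freq[w]),
            }
            for w in vocabulary
        }
    return vectors

vectors = vectorize(docs)
for word in ("marketing", "database"):
    print(word, vectors["it"][word])
```

Note that with only two documents, a word that appears in both (such as "marketing" here) receives a TF-IDF weight of zero under this formula, which is why the raw term occurrences are often the more useful view when comparing just two departments.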
The main purpose of this process is to cut the text into individual terms or words. Different separators can be used, as highlighted in the screenshot below:

Figure 9: Selection of a separator

A number of separators are available in the RapidMiner tool. The first is non letters, which splits on white spaces, punctuation, symbols and so on. The next is specify characters, in which we choose the separator characters ourselves. Apart from these two, we can also use separators such as regular expression, linguistic sentences and linguistic tokens. In our demonstration we have selected the non letters separator. We can also perform further operations under text processing. For instance, we have selected the filtering option called Filter Stop words (English), which helps to remove articles, conjunctions, pronouns and so on. As we are going to perform multiple text processing operations in the RapidMiner tool, we have to set the break after option on our second and third operations, Tokenization and Filter Stop words respectively. The next step is to run the selected operations in the RapidMiner tool. Once we run them, we can see the original text alongside the processed text, as in the accompanying screenshot.
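The tokenization and stop-word steps described above can be approximated outside RapidMiner with a short Python sketch. The regular expression mimics a non-letters separator, and `STOP_WORDS` is a deliberately tiny, hypothetical list; RapidMiner's English stop-word list is far larger and its exact contents are not reproduced here.

```python
import re

# Small illustrative stop-word list (an assumption, not RapidMiner's list).
STOP_WORDS = {"a", "an", "the", "and", "or", "but", "he", "she", "we", "they"}

def tokenize_non_letters(text):
    """Split text on every run of non-letter characters."""
    return [token for token in re.split(r"[^A-Za-z]+", text) if token]

def filter_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop-word list (case-insensitive)."""
    return [token for token in tokens if token.lower() not in stop_words]

original = "The students like marketing, and the e-learning labs!"
tokens = tokenize_non_letters(original)
processed = filter_stop_words(tokens)
print("original: ", original)
print("tokens:   ", tokens)
print("processed:", processed)
```

Because the separator is any non-letter character, a hyphenated term such as "e-learning" is split into two tokens, which is worth keeping in mind when choosing the keywords to compare.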
table view, plot view and distribution table. The view we have selected here is the plot view, which compares the number of words in the Information Technology student data and the Telecommunication student data. From the overall extraction of student data, we have compared only two departments’ data to gauge their interest in the Marketing area. On selecting the word marketing, we can conclude that more Information Technology students are interested in the marketing area, as the graph shows. Knowing this, university authorities can refine the Information Technology subjects by adding some Marketing subjects to the curriculum.

Figure 10: Outcome of text analytics

The color changes between words because we have used the tokenizer option, which separates the text into individual words and terms. The same procedure can be repeated for every document. In our demonstration we have used two files containing student data from the Information Technology and Telecommunication departments. The output also includes the example set, which consists of one line for each document and one column for each word. In addition, some meta information is provided, such as file information, file date, extension path and the group or class each document belongs to, given by the label attribute.

If we want to generate a classification model, this is possible with the classification models available within the RapidMiner tool. In our demonstration we have used the Naïve Bayes classification model, which is available in the modelling package of the RapidMiner tool. Once we select our classification model, it appears in the RapidMiner working area as shown in Figure 11.
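To make the classification step more concrete, the following is a minimal multinomial Naïve Bayes sketch in plain Python with add-one smoothing. It is not RapidMiner's implementation; the training documents, the department labels as classes, and the `train_naive_bayes` and `predict` helpers are assumptions for illustration only.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labelled_docs):
    """Count documents per class and words per class from (label, tokens) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for label, tokens in labelled_docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary

def predict(model, tokens):
    """Return the class with the highest log-posterior for the given tokens."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for token in tokens:
            if token not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical tokenized dummy documents labelled by department.
training = [
    ("Information Technology", ["database", "marketing", "software"]),
    ("Information Technology", ["marketing", "programming", "campaign"]),
    ("Telecommunication", ["network", "signal", "antenna"]),
    ("Telecommunication", ["signal", "router", "network"]),
]
model = train_naive_bayes(training)
print(predict(model, ["marketing", "software"]))
print(predict(model, ["signal", "antenna"]))
```

In the demonstration's terms, the label attribute plays the role of the class, and each row of the example set supplies the word counts for one document.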
Figure 11: Selecting a classification model

After adding the classification model, we can perform different operations on the required output, such as the plot view below.

Figure 12: Plot view of processed text data

Thus, with the help of text processing it is easier to identify students’ interests in the online learning environment. Similarly, we can compare various patterns among the students according to the university’s specification.

VII. CONCLUSION & FUTURE IMPLICATIONS

The proposed research gives clear insight into using the available student data in an effective and efficient way. The attributes discussed in this research will benefit educational stakeholders by letting them focus more on students’ academics based on the students’ predicted interests. The prediction of students’ interests largely depends on their online participation, which will also be helpful in producing valuable outcomes if further research is done in similar areas. Future implications include evaluating the performance of students individually, so that lecturers can provide the assistance a particular student needs; improving study materials; and finally
in some cases, providing an opportunity to evaluate the performance of the lecturer. For this purpose, learning analytics tools focused solely on the individual enhancement of learning can be used.

VIII. REFERENCES

Ai Yubing, Zhang Jianping, 2010. ‘The Application of Data Mining Technology in Distance Learning Evaluation’, International Forum on Information Technology in Distance Learning Evaluation.

Cristianini, N., Shawe-Taylor, J., 2000. ‘An Introduction to Support Vector Machines and other kernel-based learning methods’. Cambridge University Press.

Dhanalakshmi, V., Dhivya Bino, 2016. ‘Opinion mining from student feedback data using supervised learning algorithms’, 3rd MEC International Conference on Big Data and Smart City.

Gabrilson, S., Fabro, D. D. M., Valduriez, P., 2008. ‘Towards the efficient development of model transformations using model weaving and matching transformations’, Office of Information Technology, Georgia Department of Education.

Hsu Chia-Ling, 2012. ‘Qualitative Text Mining in Student’s Service Learning Diary’. Third International Conference on Innovations in Bio-Inspired Computing and Applications.

Kim, H., Zhang, Y., Oussena, S., Clark, T., 2009. ‘A Case Study on Model Driven Data Integration for Data Centric Software Development’. In Proceedings of ACM First International Workshop on Data-intensive Software Management and Mining.

Luan, J., 2002. ‘Data mining and knowledge management in higher education – potential applications’. In Proceedings of AIR Forum, Toronto, Canada.

Mazon, J. N., Trujillo, J., Serrano, M., Piattini, M., 2005. ‘Applying MDA to the development of data warehouses’. DOLAP 2005.

Oussena, S., 2008. ‘Mining Courses Management Systems’. Thames Valley University.

P. L. Smith and T. J. Ragan, ‘Instructional design’, Macmillan, New York, 1993.

Pathros Ibarra García, E.
2011, ‘Model Prediction of Academic Performance for First Year Students’, Mexican International Conference.

S. M. Weiss, N. Indurkhya, T. Zhang, and F. Damerau, ‘Text mining predictive methods for analyzing unstructured information’, Springer Science+Business Media, Inc., New York, 2005.

M. Young, The Technical Writer’s Handbook. Mill Valley, CA: University Science, 1989.

Schönbrunn, K., Hilbert, A., 2006. ‘Data Mining in Higher Education’. Studies in Classification, Data Analysis, and Knowledge Organization. Advances in Data, Proceedings of the 30th Annual Conference of the Gesellschaft für Klassifikation e.V., Berlin.

Seidman, A., 1996. Spring Retention Revisited: RET = E Id + (E + I + C)Iv. College and University, 71(4), 18-20.

National Audit Office, 2007. Staying the course: the retention of students in higher education.

Superby, J.F., Vandamme, J-P., Meskens, N., 2006. ‘Determination of factors influencing the achievement of the first-year university students using data mining methods’. Workshop on Educational Data Mining.

Tinto, V., 2000. ‘Taking student retention seriously: rethinking the first year of college’, NACADA Journal, Vol. 19 No. 2, pp. 5-10.

Thomas, L., 2002. ‘Student retention in higher education: the role of institutional habitus’, Journal of Education Policy, Vol. 17 No. 4, August, pp. 423-442.

Yorke, M., Longden, B., 2004. ‘Retention and student success in higher education’, Society for Research in Higher Education.