SlideShare una empresa de Scribd logo
1 de 4
Fourth AIMS International Conference on Management                                              December 28-31,
2006

                               Web Data Mining: A Case Study
                           [14 Points Times New Roman, Centred]
                             Samia Massoud, Al Ghurair University
                                   smassoud99@yahooo.com
                       Omprakash K Gupta, Prairie View A&M University
                                     om_gupta@pvamu.edu
                         [12 Points Times New Roman, bold, Centred]

   With an enormous amount of data stored in databases and data warehouses, it is increasingly important to
develop powerful tools for analysis of such data and mining interesting knowledge from it. Data mining is a
process of inferring knowledge from such huge data. The main problem related to the retrieval of information
from the World Wide Web is the enormous number of unstructured documents and resources, i.e., the difficulty
of locating and tracking appropriate sources. In this paper, we survey the research in the area of web mining
and suggest web mining categories and techniques. Furthermore, we present a web mining environment
generator that allows naive users to generate a web mining environment specific to a given domain by
providing a set of specifications.

Keywords: Data Mining, Case Study, Database, Data Warehouse

                 1. Introduction [12 Points Times New Roman, bold, Centred]
In business today, companies are working fast to gain a valuable competitive advantage over other businesses.
A fast-growing and popular technology, which can help to gain this advantage, is data mining. Data mining
technology allows a company to use the mass quantities of data that it has compiled, and develop correlations
and relationships among this data to help businesses improve efficiency, learn more about its customers, make
better decisions, and help in planning. Data Mining has three major components Clustering or Classification,
Association Rules and Sequence Analysis. This technology can develop these analyses on its own, using a blend
of statistics, artificial intelligence, machine learning algorithms, and data stores.

1.1 Data Mining [10 Points Times New Roman, bold, left justified, first letter in CAPITAL]
Data mining is a tool that can extract predictive information from large quantities of data, and is data driven. It
uses mathematical and statistical calculations to uncover trends and correlations among the large quantities of
data stored in a database. It is a blend of artificial intelligence technology, statistics, data warehousing, and
machine learning.
   Data mining started with statistics. Statistical functions such as standard deviation, regression analysis, and
variance are all valuable tools that allow people to study the reliability and relationships between data. Much of
what data mining does is rooted in statistics, making it one of the cornerstones of data mining technology.
   In the 1970’s data was stored using large mainframe systems and COBOL programming techniques. These
simplistic beginnings gave way to very large databases called “data warehouses”, which store data in one
standard format. The dictionary definition of a data warehouse is “a generic term for storing, retrieving, and
managing large amounts of data.” (dictionary.com).These data warehouses “can now store and query terabytes
and megabytes of data in sophisticated database management systems.” (Carbone,2000) These data stores are an
essential part of data mining, because a cornerstone of the technology is that it needs very large amounts of
organized data to manipulate.
   In addition to basic statistics and large data warehouses, a major part of data mining technology is artificial
intelligence (AI). Artificial intelligence started in the 1980’s with a set of algorithms that were designed to
teach a computer how to “learn” by itself. As they developed, these algorithms became valuable data
manipulation tools and were applied to large sets of data. Instead of entering a set of pre-defined hypothesis, the
data mining software, combined with AI technology was able to generate its own relationships between the data.
It was even able to analyze data and discover correlations between the data on its own, and develop models to
help the developers interpret the relationships that were found.
   AI gave way to machine learning. Machine learning is defined as “the ability of a machine to improve its
performance based on previous results.” (dictionary.com) Machine learning is the next step in artificial
intelligence technology because it blends trial and error learning by the system with statistical analysis. This
lets the software learn on its own and allows it to make decisions regarding the data it is trying to analyze.

                                                        1
Fourth AIMS International Conference on Management                                                 December 28-31,
2006

   Later in the 1990’s data mining became wildly popular. Many companies began to use the data mining
technology and found that it was much easier than having actual people work with such large amounts of data
and attributes. This technology allows the systems to “think” for themselves and run analysis that would
provide trend and correlation information for the data in the tables. In 2001, the use of data warehouses grew by
over a third to 77%. (Hardison, 2002).
  Data mining is a very important tool for business and as time goes on, business is becoming more and more
competitive and everyone is scrambling for a competitive edge (Hardison, 2002). Businesses need to gain a
competitive edge, and can get it from the increased awareness they can get from data mining software that is
available on the market right now (Montana, 2001) .

                                        Data Mining Evolutionary Chart
  Evolutionary Step        Business Question            Enabling              Product            Characteristics
                                                        Technologies          Providers

  Data Collection          "What was my total           Computers, tapes,     IBM, CDC           Retrospective,
                           revenue in the last five     disks                                    static       data
  (1960s)                  years?"                                                               delivery

  Data Access              "What were unit sales in     Relational            Oracle,            Retrospective,
                           New     England     last     databases             Sybase,            dynamic       data
  (1980s)                  March?"                      (RDBMS),              Informix,          delivery at record
                                                        Structured Query      IBM,               level
                                                        Language (SQL),       Microsoft
                                                        ODBC

  Data Warehousing &       "What were unit sales in     On-line analytic      Pilot,             Retrospective,
                           New      England    last     processing            Comshare,          dynamic       data
  Decision Support         March? Drill down to         (OLAP),               Arbor,             delivery        at
                           Boston."                     multidimensional      Cognos,            multiple levels
  (1990s)                                               databases,   data     Microstrategy
                                                        warehouses

  Data Mining              "What’s    likely   to       Advanced              Pilot,             Prospective,
                           happen to Boston unit        algorithms,           Lockheed,          proactive
  (Emerging Today)         sales   next    month?       multiprocessor        IBM,      SGI,     information
                           Why?"                        computers,            numerous           delivery
                                                        massive databases     startups
                                                                              (nascent
                                                                              industry)
*This chart is from: http://www.thearling.com/text/dmwhite/dmwhite.htm

                                                 2. Web Mining
Web Mining rapidly collect and integrate information from multiple Web sites. This approach relies on
computing power, rather than programmer brain power, to handle many of the complex issues that arise when
gathering information from disparate sources. One solution is the power by advanced artificial intelligence
algorithms. For instance, machine learning techniques would be used for extracting information from Web sites.
Based on a few examples, the software automatically determines how to extract different types of information
from data sources.

Association Rule Algorithms
An association rule is a rule which implies certain association relationships among a set of objects (such as
``occur together'' or ``one implies the other'') in a database. Given a set of transactions, where each transaction is
a set of literals (called items), an association rule is an expression of the form X Y , where X and Y are sets of
items. The intuitive meaning of such a rule is that transactions of the database which contain X tend to contain
Y.

                                                          2
Fourth AIMS International Conference on Management                                               December 28-31,
2006

Classification Algorithms
In Data classification one develops a description or model for each class in a database, based on the features
present in a set of class-labeled training data. There have been many data classification methods studied,
including decision-tree methods, such as statistical methods, neural networks, rough sets, database-oriented
methods etc.

Sequential Analysis
Here we are looking for a Sequential Patterns, called data-sequences. Each data sequence is an ordered list of
transactions (or item sets), where each transaction is a sets of items (literals). Typically there is a transaction-
time associated with each transaction. A sequential pattern also consists of a list of sets of items. The problem is
to find all sequential patterns with a user-specified minimum support, where the support of a sequential pattern
is the percentage of data sequences that contain the pattern.
   This paper explains how to build a mining model and to solve a few concern problems to best teach online
class using the web and get to know the students to serve them better.

                                                 3. The Study
This paper is organized in two parts. The first part of the paper provides a brief presentation of the clustering
algorithm, and explains a few basic data mining terms.
   The second part of the paper focuses on the performance study of two classes that are using online course
over the web and the web generating a mining algorithm using a variety of parameters. A number of
experiments were conducted, and their results are presented in this part. The experiments were based on
different parameter of interest. The parameters varied form the number of input attributes, the sample size of the
class, and so on. The results of these experiments prove that the data mining algorithm are very efficient and
scalable.

The Clustering Algorithm Used
The Clustering algorithm is based on the Expectation and Maximization algorithm ( Microsoft, 2003). This
algorithm iterates between two steps. In the first step, called the "expectation" step, the cluster membership of
each case is calculated. In the second step, called the "Maximization" step, the parameters of the models are re-
estimated using these cluster memberships, which has the following major steps:
     1. Assign initial means
     2. Assign cases to each mean using some distance measure
     3. Compute new means based on members of each cluster
     4. Cycle until convergence.
  A case is assigned to each cluster with a certain probability and the means of each cluster is shifted based on
that iteration. The following table shows a set of data that could be used to predict best achievement. In this
study, information was generated on users that included the following list of measurements:
     1. Most requested pages
     2. Least requested pages
     3. Top exit pages
     4. Most accessed directories
     5. Most downloaded files
     6. New versus returning visitors
     7. Summary of activity for exam period
     8. Summary of activity by time increment
     9. Number of views per each page
     10. Page not found
The relevant measurements are viewed which implied the following:


                                      Table 1 Statistics on Web Site Visits
          Statistics Report
          Hits                    Entire Site ( Successful)                               2390
                                  Average per day                                         72
                                  Home page                                               256
          Page Views              Page views                                              213
                                  Average per day                                         97

                                                         3
Fourth AIMS International Conference on Management                                                December 28-31,
2006

                                   Document views                                          368
          Visitor Sessions         Visitor sessions                                        823
                                   Average per day                                         65
                                   Average visitor session length                          00:31:24
          Visitors                 Visitors who visited once                               32
                                   Visitors who visited more than once                     75

                                                    4. Results
This brief case study gives a look at what statistics are commonly measured on web sites. The results of these
statistics can be used to alter the web site, thereby altering the next user’s experience. Table 1 displays some
basic statistics that relate to frequency, length and kind of visitor. In Table 1 additional insights are gained with
a breakdown of visitors by each day. This behavior might reflect variation in activity related to exams time or
other issues. Monitoring and understanding visitor behavior is the first step in evaluating and improving the web
site. Another relevant measurement is how many pages are viewed. This can reflect content as well as
navigability. If a majority of visitors viewed only one page, it may imply that they did not find it easy to
determine how to take the next step.

                                                 5. Conclusion
The web offers prospecting and user relationship management opportunities that are limited only by the
imagination. Data mining is a tool that can extract predictive information from large quantities of data, and is
data driven. It uses mathematical and statistical calculations to uncover trends and correlations among the large
quantities of data stored in a database. It is a blend of artificial intelligence technology, statistics, data
warehousing, and machine learning. This data mining technology is becoming more and more popular, and is
one of the fastest growing technologies in information systems today. The future of data mining is wide open,
and it will be exciting to see how far this technology will go.

                                                 6. References

1.    Carbone,     Patricia.,   August,    2000.    "What    is    the   Origin   of   Data    Mining?"
     <http://www.mitre.org/pubs/edge/august_00/carbone.htm>.
2.    Evolutionary Chart http://www.thearling.com/text/dmwhite/dmwhite.htm
3.    “Data Mining Overview.” http//:www.data-mine.com/
4.    “Data Mining” Def. www. Dictionary.Com
5.    “Data Warehouse” Def. www.Dictionary.com
6.   Hardison, David. “Data Mining: The New Gold Rush.” Pharmaceutical Executive. March, 2002. p. 26, 28,
     30.
7.   Microsoft “Performance Study of Microsoft Data Mining Algorithms”
     March 25, 2002
8.    Montana, John. “ Data Mining: A Slippery Slope.” Information Management Journal. October, 2001. p.
     50-54.




                                                         4

Más contenido relacionado

La actualidad más candente

Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial IntelligenceEnes Bolfidan
 
Data science and visualization lab presentation
Data science and visualization lab presentationData science and visualization lab presentation
Data science and visualization lab presentationiHub Research
 
Mining on Relationships in Big Data era using Improve Apriori Algorithm with ...
Mining on Relationships in Big Data era using Improve Apriori Algorithm with ...Mining on Relationships in Big Data era using Improve Apriori Algorithm with ...
Mining on Relationships in Big Data era using Improve Apriori Algorithm with ...KamleshKumar394
 
Career in Data Science
Career in Data ScienceCareer in Data Science
Career in Data ScienceActonRoy
 
Data Science Lecture: Overview and Information Collateral
Data Science Lecture: Overview and Information CollateralData Science Lecture: Overview and Information Collateral
Data Science Lecture: Overview and Information CollateralFrank Kienle
 
Big data deep learning: applications and challenges
Big data deep learning: applications and challengesBig data deep learning: applications and challenges
Big data deep learning: applications and challengesfazail amin
 
Big Data and Computer Science Education
Big Data and Computer Science EducationBig Data and Computer Science Education
Big Data and Computer Science EducationJames Hendler
 
Introduction to data science club
Introduction to data science clubIntroduction to data science club
Introduction to data science clubData Science Club
 
Data Science Innovations : Democratisation of Data and Data Science
Data Science Innovations : Democratisation of Data and Data Science  Data Science Innovations : Democratisation of Data and Data Science
Data Science Innovations : Democratisation of Data and Data Science suresh sood
 
Data science applications and usecases
Data science applications and usecasesData science applications and usecases
Data science applications and usecasesSreenatha Reddy K R
 
RESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEWRESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEWieijjournal
 

La actualidad más candente (20)

Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial Intelligence
 
Data science and visualization lab presentation
Data science and visualization lab presentationData science and visualization lab presentation
Data science and visualization lab presentation
 
Overview of Big Data
Overview of Big DataOverview of Big Data
Overview of Big Data
 
Mining on Relationships in Big Data era using Improve Apriori Algorithm with ...
Mining on Relationships in Big Data era using Improve Apriori Algorithm with ...Mining on Relationships in Big Data era using Improve Apriori Algorithm with ...
Mining on Relationships in Big Data era using Improve Apriori Algorithm with ...
 
Prd 1099
Prd 1099Prd 1099
Prd 1099
 
Data science
Data scienceData science
Data science
 
Data science
Data scienceData science
Data science
 
Career in Data Science
Career in Data ScienceCareer in Data Science
Career in Data Science
 
Data Science Lecture: Overview and Information Collateral
Data Science Lecture: Overview and Information CollateralData Science Lecture: Overview and Information Collateral
Data Science Lecture: Overview and Information Collateral
 
Lecture #02
Lecture #02 Lecture #02
Lecture #02
 
Lecture #03
Lecture #03Lecture #03
Lecture #03
 
Data science
Data scienceData science
Data science
 
Data science Big Data
Data science Big DataData science Big Data
Data science Big Data
 
Big data deep learning: applications and challenges
Big data deep learning: applications and challengesBig data deep learning: applications and challenges
Big data deep learning: applications and challenges
 
Big Data and Computer Science Education
Big Data and Computer Science EducationBig Data and Computer Science Education
Big Data and Computer Science Education
 
Introduction to data science club
Introduction to data science clubIntroduction to data science club
Introduction to data science club
 
Data Analytics Career Paths
Data Analytics Career PathsData Analytics Career Paths
Data Analytics Career Paths
 
Data Science Innovations : Democratisation of Data and Data Science
Data Science Innovations : Democratisation of Data and Data Science  Data Science Innovations : Democratisation of Data and Data Science
Data Science Innovations : Democratisation of Data and Data Science
 
Data science applications and usecases
Data science applications and usecasesData science applications and usecases
Data science applications and usecases
 
RESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEWRESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEW
 

Destacado

PowerPoint Presentation
PowerPoint PresentationPowerPoint Presentation
PowerPoint Presentationbutest
 
Andrew Shitov Rakudo Jonathan
Andrew Shitov Rakudo JonathanAndrew Shitov Rakudo Jonathan
Andrew Shitov Rakudo Jonathanguest092df8
 
Vecer Na Temu25marec2010
Vecer Na Temu25marec2010Vecer Na Temu25marec2010
Vecer Na Temu25marec2010guestc27e91
 
C:\Fakepath\солянка мясная
C:\Fakepath\солянка мяснаяC:\Fakepath\солянка мясная
C:\Fakepath\солянка мяснаяgmc999
 
Carnivores: Inspection under Philosophy of Action
Carnivores: Inspection under Philosophy of ActionCarnivores: Inspection under Philosophy of Action
Carnivores: Inspection under Philosophy of ActionYoav Francis
 
Help The Needy In Africa
Help The Needy In AfricaHelp The Needy In Africa
Help The Needy In Africahavillacc24
 

Destacado (11)

20161014092229851
2016101409222985120161014092229851
20161014092229851
 
PowerPoint Presentation
PowerPoint PresentationPowerPoint Presentation
PowerPoint Presentation
 
Tugas Pwer Poin
Tugas Pwer PoinTugas Pwer Poin
Tugas Pwer Poin
 
Andrew Shitov Rakudo Jonathan
Andrew Shitov Rakudo JonathanAndrew Shitov Rakudo Jonathan
Andrew Shitov Rakudo Jonathan
 
Vecer Na Temu25marec2010
Vecer Na Temu25marec2010Vecer Na Temu25marec2010
Vecer Na Temu25marec2010
 
C:\Fakepath\солянка мясная
C:\Fakepath\солянка мяснаяC:\Fakepath\солянка мясная
C:\Fakepath\солянка мясная
 
Carnivores: Inspection under Philosophy of Action
Carnivores: Inspection under Philosophy of ActionCarnivores: Inspection under Philosophy of Action
Carnivores: Inspection under Philosophy of Action
 
Preetam CV Experience
Preetam CV ExperiencePreetam CV Experience
Preetam CV Experience
 
20161010092022395
2016101009202239520161010092022395
20161010092022395
 
內湖花市
內湖花市內湖花市
內湖花市
 
Help The Needy In Africa
Help The Needy In AfricaHelp The Needy In Africa
Help The Needy In Africa
 

Similar a Sample Paper.doc.doc

Data Mining: Future Trends and Applications
Data Mining: Future Trends and ApplicationsData Mining: Future Trends and Applications
Data Mining: Future Trends and ApplicationsIJMER
 
Big Data = Big Decisions
Big Data = Big DecisionsBig Data = Big Decisions
Big Data = Big DecisionsInnoTech
 
Review of big data analytics (bda) architecture trends and analysis
Review of big data analytics (bda) architecture   trends and analysis Review of big data analytics (bda) architecture   trends and analysis
Review of big data analytics (bda) architecture trends and analysis Conference Papers
 
An introduction to Data Mining
An introduction to Data MiningAn introduction to Data Mining
An introduction to Data MiningShobhita Dayal
 
An introduction to Data Mining by Kurt Thearling
An introduction to Data Mining by Kurt ThearlingAn introduction to Data Mining by Kurt Thearling
An introduction to Data Mining by Kurt ThearlingPim Piepers
 
Big Data and Implications on Platform Architecture
Big Data and Implications on Platform ArchitectureBig Data and Implications on Platform Architecture
Big Data and Implications on Platform ArchitectureOdinot Stanislas
 
A Survey on Data Mining
A Survey on Data MiningA Survey on Data Mining
A Survey on Data MiningIOSR Journals
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptxElsonPaul2
 
INF2190_W1_2016_public
INF2190_W1_2016_publicINF2190_W1_2016_public
INF2190_W1_2016_publicAttila Barta
 
Upstate CSCI 525 Data Mining Chapter 1
Upstate CSCI 525 Data Mining Chapter 1Upstate CSCI 525 Data Mining Chapter 1
Upstate CSCI 525 Data Mining Chapter 1DanWooster1
 
Data analysis trend 2015 2016 v071
Data analysis trend 2015 2016 v071Data analysis trend 2015 2016 v071
Data analysis trend 2015 2016 v071Chun Myung Kyu
 
01Introduction to data mining chapter 1.ppt
01Introduction to data mining chapter 1.ppt01Introduction to data mining chapter 1.ppt
01Introduction to data mining chapter 1.pptadmsoyadm4
 

Similar a Sample Paper.doc.doc (20)

Data Mining: Future Trends and Applications
Data Mining: Future Trends and ApplicationsData Mining: Future Trends and Applications
Data Mining: Future Trends and Applications
 
Data mining
Data miningData mining
Data mining
 
Big Data = Big Decisions
Big Data = Big DecisionsBig Data = Big Decisions
Big Data = Big Decisions
 
Review of big data analytics (bda) architecture trends and analysis
Review of big data analytics (bda) architecture   trends and analysis Review of big data analytics (bda) architecture   trends and analysis
Review of big data analytics (bda) architecture trends and analysis
 
An introduction to Data Mining
An introduction to Data MiningAn introduction to Data Mining
An introduction to Data Mining
 
An introduction to Data Mining by Kurt Thearling
An introduction to Data Mining by Kurt ThearlingAn introduction to Data Mining by Kurt Thearling
An introduction to Data Mining by Kurt Thearling
 
Information_Systems
Information_SystemsInformation_Systems
Information_Systems
 
Big Data and Implications on Platform Architecture
Big Data and Implications on Platform ArchitectureBig Data and Implications on Platform Architecture
Big Data and Implications on Platform Architecture
 
A Survey on Data Mining
A Survey on Data MiningA Survey on Data Mining
A Survey on Data Mining
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
 
INF2190_W1_2016_public
INF2190_W1_2016_publicINF2190_W1_2016_public
INF2190_W1_2016_public
 
Upstate CSCI 525 Data Mining Chapter 1
Upstate CSCI 525 Data Mining Chapter 1Upstate CSCI 525 Data Mining Chapter 1
Upstate CSCI 525 Data Mining Chapter 1
 
An introduction to data mining
An introduction to data miningAn introduction to data mining
An introduction to data mining
 
Data analysis trend 2015 2016 v071
Data analysis trend 2015 2016 v071Data analysis trend 2015 2016 v071
Data analysis trend 2015 2016 v071
 
Data Mining Intro
Data Mining IntroData Mining Intro
Data Mining Intro
 
data mining
data miningdata mining
data mining
 
01Intro.ppt
01Intro.ppt01Intro.ppt
01Intro.ppt
 
01Introduction to data mining chapter 1.ppt
01Introduction to data mining chapter 1.ppt01Introduction to data mining chapter 1.ppt
01Introduction to data mining chapter 1.ppt
 
01Intro.ppt
01Intro.ppt01Intro.ppt
01Intro.ppt
 
A Big Data Concept
A Big Data ConceptA Big Data Concept
A Big Data Concept
 

Más de butest

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEbutest
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jacksonbutest
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer IIbutest
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazzbutest
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.docbutest
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1butest
 
Facebook
Facebook Facebook
Facebook butest
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...butest
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...butest
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTbutest
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docbutest
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docbutest
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.docbutest
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!butest
 

Más de butest (20)

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
 
PPT
PPTPPT
PPT
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
 
Facebook
Facebook Facebook
Facebook
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
 
hier
hierhier
hier
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
 

Sample Paper.doc.doc

  • 1. Fourth AIMS International Conference on Management December 28-31, 2006 Web Data Mining: A Case Study [14 Points Times New Roman, Centred] Samia Massoud, Al Ghurair University smassoud99@yahooo.com Omprakash K Gupta, Prairie View A&M University om_gupta@pvamu.edu [12 Points Times New Roman, bold, Centred] With an enormous amount of data stored in databases and data warehouses, it is increasingly important to develop powerful tools for analysis of such data and mining interesting knowledge from it. Data mining is a process of inferring knowledge from such huge data. The main problem related to the retrieval of information from the World Wide Web is the enormous number of unstructured documents and resources, i.e., the difficulty of locating and tracking appropriate sources. In this paper, we survey the research in the area of web mining and suggest web mining categories and techniques. Furthermore, we present a web mining environment generator that allows naive users to generate a web mining environment specific to a given domain by providing a set of specifications. Keywords: Data Mining, Case Study, Database, Data Warehouse 1. Introduction [12 Points Times New Roman, bold, Centred] In business today, companies are working fast to gain a valuable competitive advantage over other businesses. A fast-growing and popular technology, which can help to gain this advantage, is data mining. Data mining technology allows a company to use the mass quantities of data that it has compiled, and develop correlations and relationships among this data to help businesses improve efficiency, learn more about its customers, make better decisions, and help in planning. Data Mining has three major components Clustering or Classification, Association Rules and Sequence Analysis. This technology can develop these analyses on its own, using a blend of statistics, artificial intelligence, machine learning algorithms, and data stores. 1.1 Data Mining [10 Points Times New Roman, bold, left justified, first letter in CAPITAL] Data mining is a tool that can extract predictive information from large quantities of data, and is data driven. It uses mathematical and statistical calculations to uncover trends and correlations among the large quantities of data stored in a database. It is a blend of artificial intelligence technology, statistics, data warehousing, and machine learning. Data mining started with statistics. Statistical functions such as standard deviation, regression analysis, and variance are all valuable tools that allow people to study the reliability and relationships between data. Much of what data mining does is rooted in statistics, making it one of the cornerstones of data mining technology. In the 1970’s data was stored using large mainframe systems and COBOL programming techniques. These simplistic beginnings gave way to very large databases called “data warehouses”, which store data in one standard format. The dictionary definition of a data warehouse is “a generic term for storing, retrieving, and managing large amounts of data.” (dictionary.com).These data warehouses “can now store and query terabytes and megabytes of data in sophisticated database management systems.” (Carbone,2000) These data stores are an essential part of data mining, because a cornerstone of the technology is that it needs very large amounts of organized data to manipulate. In addition to basic statistics and large data warehouses, a major part of data mining technology is artificial intelligence (AI). Artificial intelligence started in the 1980’s with a set of algorithms that were designed to teach a computer how to “learn” by itself. As they developed, these algorithms became valuable data manipulation tools and were applied to large sets of data. Instead of entering a set of pre-defined hypothesis, the data mining software, combined with AI technology was able to generate its own relationships between the data. It was even able to analyze data and discover correlations between the data on its own, and develop models to help the developers interpret the relationships that were found. AI gave way to machine learning. Machine learning is defined as “the ability of a machine to improve its performance based on previous results.” (dictionary.com) Machine learning is the next step in artificial intelligence technology because it blends trial and error learning by the system with statistical analysis. This lets the software learn on its own and allows it to make decisions regarding the data it is trying to analyze. 1
  • 2. Fourth AIMS International Conference on Management December 28-31, 2006 Later in the 1990’s data mining became wildly popular. Many companies began to use the data mining technology and found that it was much easier than having actual people work with such large amounts of data and attributes. This technology allows the systems to “think” for themselves and run analysis that would provide trend and correlation information for the data in the tables. In 2001, the use of data warehouses grew by over a third to 77%. (Hardison, 2002). Data mining is a very important tool for business and as time goes on, business is becoming more and more competitive and everyone is scrambling for a competitive edge (Hardison, 2002). Businesses need to gain a competitive edge, and can get it from the increased awareness they can get from data mining software that is available on the market right now (Montana, 2001) . Data Mining Evolutionary Chart Evolutionary Step Business Question Enabling Product Characteristics Technologies Providers Data Collection "What was my total Computers, tapes, IBM, CDC Retrospective, revenue in the last five disks static data (1960s) years?" delivery Data Access "What were unit sales in Relational Oracle, Retrospective, New England last databases Sybase, dynamic data (1980s) March?" (RDBMS), Informix, delivery at record Structured Query IBM, level Language (SQL), Microsoft ODBC Data Warehousing & "What were unit sales in On-line analytic Pilot, Retrospective, New England last processing Comshare, dynamic data Decision Support March? Drill down to (OLAP), Arbor, delivery at Boston." multidimensional Cognos, multiple levels (1990s) databases, data Microstrategy warehouses Data Mining "What’s likely to Advanced Pilot, Prospective, happen to Boston unit algorithms, Lockheed, proactive (Emerging Today) sales next month? multiprocessor IBM, SGI, information Why?" computers, numerous delivery massive databases startups (nascent industry) *This chart is from: http://www.thearling.com/text/dmwhite/dmwhite.htm 2. Web Mining Web Mining rapidly collect and integrate information from multiple Web sites. This approach relies on computing power, rather than programmer brain power, to handle many of the complex issues that arise when gathering information from disparate sources. One solution is the power by advanced artificial intelligence algorithms. For instance, machine learning techniques would be used for extracting information from Web sites. Based on a few examples, the software automatically determines how to extract different types of information from data sources. Association Rule Algorithms An association rule is a rule which implies certain association relationships among a set of objects (such as ``occur together'' or ``one implies the other'') in a database. Given a set of transactions, where each transaction is a set of literals (called items), an association rule is an expression of the form X Y , where X and Y are sets of items. The intuitive meaning of such a rule is that transactions of the database which contain X tend to contain Y. 2
  • 3. Fourth AIMS International Conference on Management December 28-31, 2006 Classification Algorithms In Data classification one develops a description or model for each class in a database, based on the features present in a set of class-labeled training data. There have been many data classification methods studied, including decision-tree methods, such as statistical methods, neural networks, rough sets, database-oriented methods etc. Sequential Analysis Here we are looking for a Sequential Patterns, called data-sequences. Each data sequence is an ordered list of transactions (or item sets), where each transaction is a sets of items (literals). Typically there is a transaction- time associated with each transaction. A sequential pattern also consists of a list of sets of items. The problem is to find all sequential patterns with a user-specified minimum support, where the support of a sequential pattern is the percentage of data sequences that contain the pattern. This paper explains how to build a mining model and to solve a few concern problems to best teach online class using the web and get to know the students to serve them better. 3. The Study This paper is organized in two parts. The first part of the paper provides a brief presentation of the clustering algorithm, and explains a few basic data mining terms. The second part of the paper focuses on the performance study of two classes that are using online course over the web and the web generating a mining algorithm using a variety of parameters. A number of experiments were conducted, and their results are presented in this part. The experiments were based on different parameter of interest. The parameters varied form the number of input attributes, the sample size of the class, and so on. The results of these experiments prove that the data mining algorithm are very efficient and scalable. The Clustering Algorithm Used The Clustering algorithm is based on the Expectation and Maximization algorithm ( Microsoft, 2003). This algorithm iterates between two steps. In the first step, called the "expectation" step, the cluster membership of each case is calculated. In the second step, called the "Maximization" step, the parameters of the models are re- estimated using these cluster memberships, which has the following major steps: 1. Assign initial means 2. Assign cases to each mean using some distance measure 3. Compute new means based on members of each cluster 4. Cycle until convergence. A case is assigned to each cluster with a certain probability and the means of each cluster is shifted based on that iteration. The following table shows a set of data that could be used to predict best achievement. In this study, information was generated on users that included the following list of measurements: 1. Most requested pages 2. Least requested pages 3. Top exit pages 4. Most accessed directories 5. Most downloaded files 6. New versus returning visitors 7. Summary of activity for exam period 8. Summary of activity by time increment 9. Number of views per each page 10. Page not found The relevant measurements are viewed which implied the following: Table 1 Statistics on Web Site Visits Statistics Report Hits Entire Site ( Successful) 2390 Average per day 72 Home page 256 Page Views Page views 213 Average per day 97 3
  • 4. Fourth AIMS International Conference on Management December 28-31, 2006 Document views 368 Visitor Sessions Visitor sessions 823 Average per day 65 Average visitor session length 00:31:24 Visitors Visitors who visited once 32 Visitors who visited more than once 75 4. Results This brief case study gives a look at what statistics are commonly measured on web sites. The results of these statistics can be used to alter the web site, thereby altering the next user’s experience. Table 1 displays some basic statistics that relate to frequency, length and kind of visitor. In Table 1 additional insights are gained with a breakdown of visitors by each day. This behavior might reflect variation in activity related to exams time or other issues. Monitoring and understanding visitor behavior is the first step in evaluating and improving the web site. Another relevant measurement is how many pages are viewed. This can reflect content as well as navigability. If a majority of visitors viewed only one page, it may imply that they did not find it easy to determine how to take the next step. 5. Conclusion The web offers prospecting and user relationship management opportunities that are limited only by the imagination. Data mining is a tool that can extract predictive information from large quantities of data, and is data driven. It uses mathematical and statistical calculations to uncover trends and correlations among the large quantities of data stored in a database. It is a blend of artificial intelligence technology, statistics, data warehousing, and machine learning. This data mining technology is becoming more and more popular, and is one of the fastest growing technologies in information systems today. The future of data mining is wide open, and it will be exciting to see how far this technology will go. 6. References 1. Carbone, Patricia., August, 2000. "What is the Origin of Data Mining?" <http://www.mitre.org/pubs/edge/august_00/carbone.htm>. 2. Evolutionary Chart http://www.thearling.com/text/dmwhite/dmwhite.htm 3. “Data Mining Overview.” http//:www.data-mine.com/ 4. “Data Mining” Def. www. Dictionary.Com 5. “Data Warehouse” Def. www.Dictionary.com 6. Hardison, David. “Data Mining: The New Gold Rush.” Pharmaceutical Executive. March, 2002. p. 26, 28, 30. 7. Microsoft “Performance Study of Microsoft Data Mining Algorithms” March 25, 2002 8. Montana, John. “ Data Mining: A Slippery Slope.” Information Management Journal. October, 2001. p. 50-54. 4