Lecture 01-1-IIS.pptx

  1. CS-651 Intelligent Information Systems, Khalid Saleem, Spring 2022
  2. “Computers are useless. They only give answers.” (Pablo Picasso)
  3. So we need to devise the right questions for computers to answer, and come up with systems that can help us in this activity…
  4. Any system that gives us knowledge…
  5. PSU definition of IIS research
      • The goal of Intelligent Information Systems research is to explore and support all levels of research that will improve and enhance our ability to generate, manage, search, and mine information and knowledge.
      • Current research covers Internet database design and analysis, mobile Web computing, Web mining and navigation, Web agents, novel and intelligent Web tools, multimedia retrieval, Web and Internet models, Web usage, automatic content analysis and digital libraries, Web search, niche search engines, the Semantic Web, scientific databases, data mining, and information retrieval.
      • It is all about the Web…
  6. Knowledge, Information, Data
      • Data: simple things; easily captured, structured, and transferred; compressible and quantifiable.
      • Information: relevant and related data having some purpose; requires consensus on meaning, and human mediation is necessary.
      • Knowledge: valuable information from the human mind; contextual; hard to capture electronically and to structure; mostly tacit.
      Source: adapted from Thomas H. Davenport, Information Ecology.
  7. Figure: Transaction Systems, Information Systems, Data Mining, OLAP, Decision Support Systems
  8. Definition: DM and KDD
      • Data mining (DM) is a step in the knowledge discovery process consisting of particular data mining algorithms that, under some acceptable computational efficiency limitations, find patterns or models in data.
      • Knowledge discovery in databases (KDD) (Fayyad et al., 1996) is the process of identifying valid, novel, potentially useful, and ultimately understandable patterns or models in data.
  9. Data items
      • Data items refer to an elementary description of things, events, activities, and transactions that are recorded, classified, and stored but are not organized to convey any specific meaning.
      • Data items can be numbers, letters, figures, sounds, or images. Examples of data items are a student’s grade in a class and the number of hours an employee worked in a certain week.
  10. Information
      • Information refers to data that have been organized so that they have meaning and value to the recipient.
      • For example, a grade point average (GPA) by itself is data, but a student’s name coupled with his or her GPA is information.
      • The recipient interprets the meaning and draws conclusions and implications from the information.
  11. Knowledge
      • Knowledge consists of data and/or information that have been organized and processed to convey understanding, experience, accumulated learning, and expertise as they apply to a current business problem.
      • For example, a company recruiting at your university has found over time that students with grade point averages over 3.0 have had the most success in its management program. Based on its experience, that company may …
  12. Information and Information Systems
  13. Types of Information Systems (figure: key system applications in the organization)
  14. Major Types of Systems (figure: major types of systems in organizations)
      • Executive Support Systems (ESS)
      • Decision Support Systems (DSS)
      • Management Information Systems (MIS)
      • Knowledge Work Systems (KWS)
      • Office Systems
      • Transaction Processing Systems (TPS)
  16. Task: LOSE WEIGHT!
      • UK survey: relevant respondents were asked to name the top reasons why using dieting apps on their smartphones had helped them lose weight. The five highest-ranked answers were:
        1. Easier to track calories and food intake at the push of a button (47%)
        2. Can check the calorie content of items before deciding to eat them (36%)
        3. Helpful for planning healthy and nutritious meals in advance (32%)
        4. Helps keep me motivated (24%)
        5. Cheaper and easier alternative to diet books and magazines (18%)
  17. 17. Information System  Executive Support System  Management Information System  Decision Support System  Knowledge Work System  Office System  Transaction Processing System 5 year sales plan Inventory Control Pricing/Profit Analysis Engineering/Graphics Word Processing/Agenda Order Processing 6 month diet plan What should be in refrigerator Best food within budget Which food to eat with which food Keeping agenda of eating timings
  18. 18. Big Data …. Open Data
  19. Magnitude of Data
      • Human brain capacity: 2.5 petabytes
      • Total digital data created: 422 exabytes (2008)
      • Web size: 98 petabytes (2010)
      • Total genome sequences of all people on Earth: 4.75 exabytes
      • Web users: 2 billion+ (2011)
      • World’s digital storage capacity: 1 zettabyte (2011)
      • Digital data created: 1.8 zettabytes (2011), 2.7 zettabytes (2012)
      • Digital data to be produced: 35 zettabytes (by 2020)
      • Drastic reduction in the price of storage per gigabyte
        => All data is being, or is going to be, preserved!
        => Huge data centres
      • Storage density: 1 bit on 12 atoms vs. 1 bit on 1,000,000 atoms
      • Prefixes (powers of 10): giga 10^9, tera 10^12, peta 10^15, exa 10^18, zetta 10^21, yotta 10^24
      Sources: IDC Digital Universe 2010; Popular Science, Nov 2011; IBM
  20. Some systems we will come across…
      • Decision Support Systems
      • Strategic Information Systems
      • OLAP
      • Executive Information Systems
      • Enterprise Information Systems
      • …
      • Green Information Systems
      Based on: Data Warehousing, Data Mining
  21. Data Equity
      • UK report: the annual economic benefit of big data analytics to the UK could exceed £40 billion in 2017, from the government and enterprise points of view.
      • The US government has dedicated $200 million to research on handling big data, including a $10 million data project at the University of California, Berkeley, support for a geosciences data effort called EarthCube, and more.
  22. Some Scenarios…
  23. Obama’s secret weapon in re-election
      • With a sluggish economy, unemployment teetering at around the eight per cent mark, and growing anti-Obama sentiment in some parts of the country, a second term seemed an uphill task for Obama, and it was going to take an extraordinary campaign to make it happen.
      • From the get-go, David Axelrod, the brain behind the Obama campaign, recognised the role that data and information could play in the election. The process had been initiated in 2008, but databases were scattered, and it was not until the 2010 midterm elections that the Democratic Party, despite heavy losses, was able to streamline the data to forecast results accurately and meaningfully.
  24. Obama’s secret weapon in re-election
      • Pakistani scientist Rayid Ghani
      • Ghani’s job was to make sense of huge amounts of information: “The core of the work I was doing was looking at a large amount of data and making sense of it to help other people make better decisions.”
      • If the 2008 campaign was about charisma and hope, the 2012 campaign was about science and data.
      • How data helps you is that it makes you more efficient and helps you spend your money carefully and in the right way.
  25. Collaborative Social IS
      • Data content published by individuals on the Web, especially in collaborative environments such as social forums, blogs, and games. Data can be analysed or created by thousands of users (crowdsourcing initiatives).
      • Right after the earthquake in Haiti in 2010, the company holding the license for the geo-imagery opened up Haiti’s map data to the general public. Web users all over the world examined the imagery and, within a couple of days, populated the map with information such as refugee camps, hospitals, and damaged buildings. Relief workers in the area used the map to organise the relief work.
  27. Other scenarios…
      • Online business transaction systems: millions of transactions over the Internet, e.g. online game servers, business transaction portals, or other business-to-business services
      • The New York Stock Exchange produces about one terabyte of data per day
      • Walmart data warehouse intelligence
      • Querying these transactions in column databases!
      • NoSQL databases
      • Graph databases
  28. Evaluation criteria
      • Sessionals (2): 25; Assignments: 20; Quizzes: 5; Terminal: 50
      Recommended readings:
      • B1. R. Kelly Rainer and Casey G. Cegielski, Introduction to Information Systems, 4th ed., International Student Version, April 2012.
      • B2. Ramez Elmasri and Shamkant B. Navathe, Fundamentals of Database Systems.
      • B3. Thomas Connolly and Carolyn Begg, Database Systems.
      • B4. Bing Liu, Web Data Mining.
      • B5. Jiawei Han, Micheline Kamber, and Jian Pei, Data Mining: Concepts and Techniques, 3rd ed., 2012.
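The progression from data items to information to knowledge in slides 9 to 11 can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the student IDs, names, grades, and hiring history are invented, and `to_information` and `success_rates` are hypothetical helpers, not part of any library. Grouping raw grade records into name/GPA pairs corresponds to information; comparing success rates above and below a 3.0 GPA is a crude stand-in for the pattern a recruiter might turn into knowledge.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data items: (student_id, course, grade_points) records that are
# recorded and stored but not yet organized to convey any specific meaning.
grade_records = [
    ("s1", "CS-651", 3.7), ("s1", "CS-610", 3.3),
    ("s2", "CS-651", 2.3), ("s2", "CS-610", 2.7),
    ("s3", "CS-651", 4.0), ("s3", "CS-610", 3.0),
]
student_names = {"s1": "Ayesha", "s2": "Bilal", "s3": "Sara"}

# Hypothetical hiring history: (GPA at graduation, succeeded in the program?).
past_hires = [(3.4, True), (3.8, True), (3.1, False),
              (2.9, False), (2.6, True), (3.0, False)]


def to_information(records, names):
    """Organize data into information: each student's name paired with a GPA."""
    points = defaultdict(list)
    for student_id, _course, grade_points in records:
        points[student_id].append(grade_points)
    return {names[sid]: round(mean(gps), 2) for sid, gps in points.items()}


def success_rates(history, threshold=3.0):
    """Toy pattern search: success rate of hires above vs. at/below a GPA threshold."""
    above = [ok for gpa, ok in history if gpa > threshold]
    below = [ok for gpa, ok in history if gpa <= threshold]
    if not above or not below:
        raise ValueError("both groups need at least one hire")
    return sum(above) / len(above), sum(below) / len(below)


if __name__ == "__main__":
    print(to_information(grade_records, student_names))
    above, below = success_rates(past_hires)
    # Deciding whether to act on this pattern is still a human judgement (knowledge).
    print(f"GPA > 3.0: {above:.0%} succeeded; GPA <= 3.0: {below:.0%} succeeded")
```

With these sample records the script prints `{'Ayesha': 3.5, 'Bilal': 2.5, 'Sara': 3.5}` and then reports 67% vs. 33% success. Whether such a pattern deserves to be acted on remains a contextual, human judgement, which is the point slide 6 makes about knowledge.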

Speaker notes

  • Knowledge is both an individual attribute and a collective attribute of the firm.
    Knowledge is generally believed to have a location, either in the minds of humans or in specific business processes.
    Knowledge is “sticky” and not universally applicable or easily moved.
    Knowledge is thought to be situational and contextual. For example, you must know when to perform a procedure as well as how to perform it.
    Tacit knowledge: knowledge residing in the minds of employees that has not been documented, such as knowing how to ride a bicycle.
    That is, knowledge is a cognitive, even a physiological, event that takes place inside people’s heads.
    Explicit knowledge: knowledge that has been documented.
    Knowledge is also stored in libraries and records, shared in lectures, and stored by firms in the form of business processes and employee know-how. Knowledge can reside in e-mail, voice mail, graphics, and unstructured documents as well as structured documents.
  • Information system is a term used for systems that provide information.
    Information systems are categorized by how the information is used…

    ESS – 6-month diet plan!
    MIS – What should be in my refrigerator and what should not be!
    DSS – Which is the best food for me within my budget?
    KWS – Which food should be eaten with which other food for maximum energy and minimum …
    OS – Keeping an agenda of eating times

    TPS – A transaction processing system simply takes input data and converts it into another form.

    What we see from the example: technically speaking, a simple mobile app fits all five of these categories of information systems.
    So mobile information systems are here to help individuals, enterprises, and governments.
  • eBay: 6.5 PB (2009); Google: 1 PB of new data every 3 days (2009). It is said that Google is today the largest manufacturer of computer hardware:
    to handle its data, it designs and produces its own hardware. Perhaps security is part of the reason. At a security workshop, an expert once shared a case in which a security-sensitive company asked proprietary vendors to submit a court-registered affidavit stating that there was no backdoor chip in their servers. Not a single vendor went ahead with the sale, so the company built its servers from off-the-shelf motherboards and processors!
    The Guardian, Tuesday 29 May 2012: “Cyber-attack concerns raised over Boeing 787 chip’s ‘back door’”
    Sergei Skorobogatov and Christopher Woods (University of Cambridge, UK), “Breakthrough silicon scanning discovers backdoor in military chip,” Cryptographic Hardware and Embedded Systems Workshop (CHES 2012), 9-12 September 2012, LNCS 7428, Springer, ISBN 978-3-642-33026-1, pp. 23-40.
    Researchers claim a chip used in military systems and civilian aircraft has a built-in function that could let in hackers.

    Talk of yottabytes of data is already on the table…
    At ECAI 2012, a researcher from IBM Haifa said in a presentation that from 2020 onward we should start thinking about how to handle yottabytes!
    In binary terms, giga through yotta correspond to 2^30, 2^40, 2^50, 2^60, 2^70, and 2^80 (strictly, the IEC binary prefixes gibi through yobi).

    IBM stored 1 bit on 12 atoms, whereas it currently takes about a million atoms to store one bit on a modern hard disk. Below 12 atoms, the researchers found that bits randomly lost information owing to quantum effects. This is more than an 80,000-fold increase in data density (1,000,000 / 12 ≈ 83,000) over current disks.

    An example of sensor and machine data is found at the Large Hadron Collider at CERN, the European Organization for Nuclear Research. CERN scientists can generate 40 terabytes of data every second during experiments.
    Similarly, Boeing jet engines can produce 10 terabytes of operational information for every 30 minutes they turn. A four-engine jumbo jet can create 640 terabytes of data on just one Atlantic crossing; multiply that by the more than 25,000 flights flown each day, and you get an understanding of the impact that sensor and machine-produced data can have on a business intelligence (BI) environment.
  • Governments are funding research on handling open big data.
  • The amount of UNPROTECTED yet sensitive data is growing even faster
    Games and social networks produce millions of transactions per second; the challenge is to cater for them while producing statistical reports on them at the same time.
    ACID properties are not fully complied with in many of these systems (e.g., NoSQL databases).
    There are, however, graph databases that do comply with ACID properties.
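The storage figures on slide 19 and in the speaker notes can be checked with simple arithmetic. The sketch below assumes decimal (SI) prefixes for the slide's powers of ten and the notes' powers of two for the binary counterparts. The 8-hour crossing is an assumption chosen because it reproduces the quoted 640 TB (4 engines × 10 TB per 30 minutes × 16 half-hours), and the daily total treats all 25,000 flights as such jets, so it is only a rough upper bound. All function names are hypothetical.

```python
# Decimal (SI) prefixes as on slide 19; each step up multiplies by 10**3.
PREFIXES = ["giga", "tera", "peta", "exa", "zetta", "yotta"]


def decimal_bytes(prefix: str) -> int:
    """Bytes in one decimal unit, e.g. 'peta' -> 10**15."""
    return 10 ** (9 + 3 * PREFIXES.index(prefix))


def binary_bytes(prefix: str) -> int:
    """Bytes in the binary counterpart from the notes, e.g. 'peta' -> 2**50."""
    return 2 ** (30 + 10 * PREFIXES.index(prefix))


def density_gain(atoms_per_bit_now: int = 1_000_000, atoms_per_bit_ibm: int = 12) -> float:
    """How many times denser 12-atom storage is than ~1,000,000 atoms per bit."""
    return atoms_per_bit_now / atoms_per_bit_ibm


def crossing_data_tb(engines: int = 4, tb_per_30_min: int = 10, hours: int = 8) -> int:
    """Engine data for one flight; the 8-hour duration is an assumption, not a quoted figure."""
    half_hours = hours * 2
    return engines * tb_per_30_min * half_hours


if __name__ == "__main__":
    for p in PREFIXES:
        ratio = binary_bytes(p) / decimal_bytes(p)
        print(f"{p:>5}: binary unit is {ratio:.3f}x the decimal unit")
    print(f"density gain: about {density_gain():,.0f}x")
    per_flight = crossing_data_tb()
    per_day_tb = per_flight * 25_000  # every daily flight treated like this jet (upper bound)
    per_day_eb = per_day_tb * decimal_bytes("tera") / decimal_bytes("exa")
    print(f"{per_flight} TB per crossing; about {per_day_eb:.0f} EB per day")
```

Running it shows the binary units growing from about 1.074 times (giga) to about 1.209 times (yotta) their decimal counterparts, a density gain of about 83,333 times, and roughly 16 EB per day for the flight extrapolation.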
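The note on ACID compliance can be made concrete with atomicity, the "A" in ACID. The sketch below uses Python's built-in `sqlite3` module, a relational engine chosen only because it ships with Python (it is not a NoSQL or graph database); the `accounts` table and the `make_bank`, `transfer`, and `balances` helpers are hypothetical. Using the connection as a context manager commits the transaction if the block finishes and rolls it back if an exception is raised, so a failed transfer leaves no partial debit behind.

```python
import sqlite3


def make_bank() -> sqlite3.Connection:
    """In-memory database with a hypothetical accounts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Debit src and credit dst as one transaction: both updates persist or neither does."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
    except ValueError:
        return False
    return True


def balances(conn: sqlite3.Connection) -> dict:
    """Current balance of every account, keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


if __name__ == "__main__":
    bank = make_bank()
    print(transfer(bank, "alice", "bob", 30), balances(bank))
    print(transfer(bank, "alice", "bob", 500), balances(bank))  # rolled back
```

The second call debits Alice's account, sees a negative balance, raises, and is rolled back, so the balances remain `{'alice': 70, 'bob': 80}`. A store that does not guarantee atomicity could leave the debit applied without the matching credit.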