Se ha denunciado esta presentación.

Various statistical software's in data analysis.



Cargando en…3
1 de 88
1 de 88

Más Contenido Relacionado

Various statistical software's in data analysis.

  1. 1. Role of Various Statistical Software in Data Analysis 1 Dr. S. Selva Mani First year post graduate student Department of public health dentistry Ragas dental college & hospital
  2. 2. CONTENTS • Introduction. • History. • Various software's types. • Common statistical software and their application • Conclusion. • References. 2
  3. 3. INTRODUCTION  The emergence of statistical software has undoubtedly contributed enormously to the development in research studies in this 21st century.  The high premium placed on ICT by human beings, researchers and organizations has undoubtedly made it a major drive of every nation.  Statistical Software (SS) is a vital tool for research analysis, data validation and findings 3
  4. 4. HISTORY 4 ‘Over the course of history, different forms of data analysis methods have been in existence. Initially, it was paper and pen and later the advent of which computer has helped invention of punching machines and later upgraded to simple calculator and complex scientific calculator.  Nevertheless, pundits have revealed that statistical software is a software program that makes the calculation and presentation of statistics relatively easy. Statistical software allows researchers to avoid routine mathematical mistakes and produce accurate figures in their research if they input all data correctly.  Development of statistical software allows academic researchers to conduct more quantitative studies easily.
  5. 5. HISTORY 5 Many researchers, professionals, scientists and business managers also can clearly present accurately prediction of the future using statistical software. Many proprietary and freeware statistical software packages are available that are suitable for different statistical analysis, depending on the user's needs: Some of the Proprietary software are SPSS, STATA, EVIEW, MINITAB, BMDP etc., and Few among Open or free Statistical software are R, EPI-INFO, CS-PRO, to mention but few.
  6. 6. FEATURES OF STATISTICAL SOFTWARE 6 Statistical software has some common characteristics that make it reliable and suitable for data analysis: 1. Data editor is in rows and columns which make it very easy to enter numeric data. 2. There is availability of menu bar comprises drop-down menu, quick analysis as well as brief user manual. 3. Statistical level of measurement is put into consideration in data entry 4. They follow the initial steps in research project
  7. 7. FEATURES OF STATISTICAL SOFTWARE7 All data should be numeric, although it may not be all variables it is not desirable to use letter or word (String variable) as data. This can be achieved by recoding the letter or word (string data) into desirable numeric and labeled appropriately. Data exploration can be done to check for errors and other accuracy. (a) Getting your data ready to enter into the software. (b) Defining and labeling variable (c) Entering data appropriately with each row containing each case and each column as variable. (d) Data checking and cleaning is possible
  10. 10. COMMON QUALITATIVE DATA ANALYSIS STATISTICAL SOFTWARE AND THEIR APPLICATION 10 Atlas ti 6.0( Hyper RESEARCH 2.8( Max QDA ( ) The Ethnograph 5.08 QSRN6( QSR Nvivo ( Weft QDA( Open code 3.4 (
  11. 11. MS EXCEL 11 MS-EXCEL -Microsoft Excel 2010 is one of the most popular software applications worldwide and is part of the Microsoft Office 2010 productivity suite. You can use Excel to analyze data, for example, in accounts, budgets, billing and many other areas. Excel allows you to explore the menu bar and the different tasks that can be done with it. You can work on sample spreadsheets doing basic math, adding and deleting columns and rows, and preparing the worksheet for printing.
  12. 12. MS EXCEL 12 You can run your data visually to show trends, patterns and comparisons between the data in a chart, table or other template, and Excel performs most general statistical analyses but weak in regression, logistic regression, survival analysis, analysis of variance, factor analysis, multivariate analysis.
  13. 13. G POWER 13 G*Power is a free-to use software used to calculate statistical power. The program offers the ability to calculate power for a wide variety of statistical tests including t-tests, F-tests, and chi-square-tests, among others. Additionally, the user must determine which of the many contexts this test is being used, such as a one-way ANOVA versus a multi-way ANOVA. G*Power has a built-in tool for determining effect size if it cannot be estimated from prior literature or is not easily calculable.
  15. 15. SAMPLE SIZE- T TEST 15
  16. 16. Calculating sample size to estimate sensitivity and specificity of a diagnostic tool with precision 16 CAGE is a screening tool for alcoholism validated for adult (age more than 18years )population .A researcher wishes to estimate the sensitivity of the tool in teenage population of age 15-19 years. The sensitivity and specificity done in the same age group from the other population was found to be 75%and 85% respectively .How much samples should be collected to estimate the parameters With 95%confidence interval at 10%of margin error. Sensitivity =75%(0.75) precision d=10%(0.10) Specificity =85%(0.85) z value for C.I( Z=1.96) Sensitivity =(z1-α)2*se*(1-se)/(d)2 Specificity =(z1-α)2*sp*(1-sp)/(d)2
  17. 17. N master 17nMaster is a software that incorporates sample size calculation(STATA, EpiInfo, nQuery, etc.), in terms of contents, each of use and the cost. There is a growing recognition on the importance of sample size calculation and power analysis in research. When planning a study design, it is very imperative to consider how many participants are needed, therefore, sample size must be premeditated carefully to ensure that the research time, patient effort and support costs invested are not wasted. There are many standard statistical software and stand alone packages available in public domain (internet) for performing sample size/power calculations.
  18. 18. EPI INFO 18  Epi Info is public domain statistical software for epidemiology developed by Centers for Disease Control and Prevention (CDC) in Atlanta, Georgia (USA).  Epi Info has been in existence for over 20 years and is currently available for Microsoft windows.  The program allows for electronic survey creation, data entry, and analysis.
  19. 19. EPI INFO 19 Within the analysis module, analytic routines include t-tests, ANOVA, nonparametric statistics, cross tabulations and stratification with estimates of odds ratios, risk ratios, and risk differences, logistic regression (conditional and unconditional), survival analysis (Kaplan Meier and Cox proportional hazard), and analysis of complex survey data. The software is in the public domain, free, and can be downloaded from
  20. 20. SPSS (STATISTICAL PACKAGE FOR THE SOCIAL SCIENCES ) 20 SPSS is most widely used in social science disciplines and courses. SPSS is the oldest software programs developed and made available in 1960s and has been redeveloped over the years, the latest version is SPSS 20.0 which was produced in August 2013. Many sociologists, psychologists and social workers use this program to enter their research data and formulate results. Although social science uses SPSS more widely than other fields, many find it easy to navigate with SPSS because it is a package that many beginners enjoy due to its very easy to use nature.
  21. 21. SPSS 21SPSS has a "point and click" interface that allows you to use pull down menus to select commands that you wish to perform.  Odusina (2011) disclosed that working with SPSS demand some background knowledge of statistics. There are slight variations in the difference version of SPSS e.g. version 10, 11, 12, 13, 14, 15, 16, etc. SPSS assists the user in describing data, testing hypotheses and looking for a correlation or relationship between one or more variables. SPSS is very suitable for most regression analysis and different kinds of ANOVA (regression, logistic regression, survival analysis, analysis of variance, factor analysis, multivariate analysis but not suitable for time series analysis and multilevel regression analysis).
  22. 22. SPSS: USAGE • Data can be typed in, copied and pasted from a text or Excel file, imported through menu-driven prompts, or read in from a ASCII file using Syntax editor • New variables can be added to worksheet or created using formulas • Datasets contain raw data and only one dataset can be active at a time • Can create and save macros and/or commands • Output window displays output, including graphs • Output can be copied and pasted into other documents • Dataset and Outputs are saved separately • Optional syntax window can read and run commands and can also be saved separately 22
  23. 23. EXPLORING SPSS 23
  24. 24. SPSS: EXAMPLES • Cody data from text file into SPSS spreadsheet 24
  25. 25. SPSS: EXAMPLES • Edit variable names and information 25
  26. 26. SPSS: EXAMPLES • Edit variable names and information 26
  27. 27. SPSS: EXAMPLES • Create a contingency table 27
  28. 28. SPSS: EXAMPLES • Create a contingency table 28
  29. 29. SPSS: PROS AND CONS  Pros  Commonly used in industry, especially those that utilize survey data  Easy-to-use menu-driven software  Output and graphics are clear and well-organized  Separate “Data” and “Variable” tabs in data worksheet make it easy to switch from raw data to variable information (labels, codes, variable type, etc CONS Limited options for analyses Can only analyze one data set at a time Not as much help available as some other packages  Not as useful for: Simulations Data mining Survival analysis Sample size calculation/power analysis Advanced or complex modeling Design of experiments 29
  30. 30. PSPP 30 The PSPP project (originally called "Fiasco") was born at the end of the 1990s as a free software replacement for SPSS, which was a data management and analysis tool, at the time produced by SPSS Inc. The nature of SPSS's proprietary licensing and the presence of digital restrictions management motivated the author to write an alternative which later became functionally identical, but with permission for everyone to copy, modify and share.  The latest version was released in April, 2014.
  31. 31. PSPP 31  This software provides a basic set of capabilities: frequencies, cross- tabs comparison of means (t-test and one way ANOVA); linear regression, logistic regression, and re-ordering data, non-parametric tests, chi-square analysis and more. A range of statistical graphs can be produced, such as histograms, pie chart, Screen plots and np-chart. PSPP can import Numeric and Open Document spreadsheets, posture databases, comma-separated values .It can export files in the SPSS 'portable' and 'system' file formats.
  32. 32. EXAMPLES 32
  33. 33. STATA 33 STATA is a powerful statistical package with smart data- management facilities, a wide array of up-to-date statistical techniques, and an excellent system for producing publication-quality graphs. Stata latest version was produced June 24, 2013 which is a fast and easy to use data management package. Stata is available for Windows, Unix, and Mac computers. The standard version is called Stata/IC (or Intercooled Stata) and can handle up to 2,047 variables.
  34. 34. STATA 34 There is a special edition called Stata/SE that can handle up to 32,766 variables (and also allows longer string variables and larger matrices), and a version for multicore/multiprocessor computers called Stata/MP, which has the same limits but is substantially faster.  STATA performs most general statistical analyses (regression, logistic regression, survival analysis, analysis of variance, factor analysis, multivariate analysis and time series analysis).
  35. 35. STATA: USAGE • Data can be typed in, read in through code, copied and pasted from a text or Excel file, or imported and converted from other files (such as a .txt, etc.) • Command window is used to write and run commands • Review window displays previous analysis, which can be selected to run again • Project window displays all input and output, including graphs • Store and edit data in the Data Editor, which can be saved on its own • Log will copy and automatically save the project for you (must start and close log before and after the analyses you want to save) 35
  36. 36. STATA: EXAMPLES • Copy data from a text file into STATA 36
  37. 37. STATA: EXAMPLES • Recode variable 37
  38. 38. STATA: EXAMPLES • Create a frequency table using commands 38
  39. 39. STATA: EXAMPLES • Run a Wilcoxon Rank-Sum test using menu options 39
  40. 40. STATA: EXAMPLES • Run a Wilcoxon Rank-Sum test using menu options 40
  41. 41. STATA: PROS AND CONS Pros:  Somewhat common in both industry and academia  Somewhat flexible and customizable  Contains up-to-date advanced methods  Quality graphics Cons:  Scripting programming language  Can only analyze one data set at a time  Does not work as well with large data sets  Not as useful for:  Quality assessment and improvement  Design of experiments  Optimization 41
  42. 42. STATA: OVERVIEW • Utilizes both menu-driven selections and scripting commands • Multiple versions available depending on needs (commercial, educational, etc.) • Extensive help documentation and technical support • Contains both basic and advanced statistical methods • Not case-sensitive language • Common fields:  Economics  Sociology  Political science 42
  43. 43. MINITAB 43 MINITAB is statistical software used by educators, students, scientists, business associates and researchers to provide statistical software in a multitude of areas.  MINITAB, developed around 1990, and remains one of the oldest statistical software programs available.  MINITAB has compatibility with PC, Macintosh, Linux and all other major platforms. As one of the easiest statistical software programs to use, MINITAB remains a popular choice with those new statistical software.
  44. 44. MINITAB 44 With drop-down menus and dialog boxes describing how and what to do next, MINITAB persists as a popular choice for teaching students about statistics and data analysis. MINITAB primarily has a user base of educators using the program to show students research methods and analysis in college and graduate-level courses. MINITAB performs most general statistical analyses (regression, logistic regression, survival analysis, analysis of variance, factor analysis, but has its weaknesses in general linear model (GLM) and Multilevel regression).
  45. 45. MINITAB: USAGE • Data can be typed in, copied and pasted from a text or Excel file, or imported through menu-driven prompts • New variables can be added to worksheet or created using formulas • Worksheets contain raw data and only one worksheet can be active at a time • Can create and save macros and/or commands • Session window displays output • Graphs and other visual charts are shown in individual windows • Project manager contains outline that helps you to jump to particular output • Worksheet can be saved separately, but saving whole project will save both worksheet and output 45
  46. 46. MINITAB: EXAMPLES • Copy data into Minitab from a text file 46
  47. 47. MINITAB: EXAMPLES • Create a new variable using formula 47
  48. 48. MINITAB: EXAMPLES • Use Assistant to do a graphical analysis 48
  49. 49. MINITAB: EXAMPLES • Use Assistant to do a graphical analysis 49
  50. 50. MINITAB: EXAMPLES • Use Assistant to do a graphical analysis 50
  51. 51. MINITAB: PROS AND CONS  Pros:  Commonly used in industry and some academic settings  Easy-to-use menu-driven software  Clear output and graphics with some interactive features  Has an “Assistant” feature that includes flow-charts and takes users step-by-step to analyze data properly  Used in most undergraduate statistics courses; there are example data sets included in software  Cons:  Limited options for analyses  Can only analyze one data set at a time  Does not work as well with large data sets  Not as much help available as some other packages  Not as useful for:  Simulations  Data warehousing  Multivariate analysis  Nonparametric methods 51
  52. 52. Minitab: Overview • Menu-driven statistical software, but does have scripting language available for typing commands or creating macros • Used in most Six Sigma courses and workshops • Help documentation located in software as well as online • Used by many analysts to quantitatively make decisions • Common fields:  Social science  Marketing  Education  Sociology  Manufacturing 52
  53. 53. Statistical Analysis System (SAS) 53 SAS - which has its latest version produced in December 2011 is a package that many "power users" like because of its power and programmability. SAS is one of the packages that are difficult to learn. To use SAS, you must write SAS programs that manipulate your data and perform your data analyses. If you make a mistake in a SAS program, it can be hard to see where the errors occurred or how to correct it. However, it can take a long time to learn and understand data management in SAS than many other packages like SPSS or STATA with simpler commands line.
  54. 54. SAS: USAGE • Data can be read in through a command or imported through menu- driven prompts • Variables and functions can be created and renamed • Multiple data sets can be handled at once and are stored in various workspaces (“libraries”) • Four types of commands: DATA step (read & edit data); Procedure steps (run built-in functions); macros (create and run own function); ODS statements (set output settings, styles, etc.) • Editor window is used to write and save commands • Log window reads commands and displays any errors or comments • Output window displays some output created by commands • Results viewer window displays most output, including graphs • Can save only commands, only data, or whole project 54
  55. 55. SAS: EXAMPLES • Import data from a text file 55
  56. 56. SAS: EXAMPLES • Import data from a text file 56
  57. 57. SAS: EXAMPLES • Display data set 57
  58. 58. SAS: EXAMPLES • Create new data set and add a variable 58
  59. 59. SAS: EXAMPLES • Run a regression with diagnostic plots 59
  61. 61. SAS: PROS AND CONS  Pros:  Widely used in both industry and academia  High-performance architecture that supports computationally-intensive algorithms  Flexible and customizable analyses and graphics Cons:  Scripting programming language Expensive Some versions are not 100% compatible  Not as useful for:  Simple analysis and manipulation 61
  62. 62. SAS: Overview • Major statistical software in many industries • Multiple add-ons and extensions available, including integration of SQL programming language and integration with JMP • Extensive online help manuals and forums • Used by many statisticians and computer scientists for data mining, data analysis, and development of statistical methodology • Offers various certifications, which many employers value highly • Common fields:  Statistical science Agriculture Computer science  Quantitative finance Engineering  Sociology  Manufacturing  Pharmaceutical science 62
  63. 63. R & MATLAB 63 R & MATLAB – Stanford (2014) identified R and MATLAB as the richest statistical systems by far. They contain an impressive amount of libraries, which is growing each day. Even if a much desired specific model is not part of the standard functionality, you can implement it yourself, because R and Matlab are really programming languages with relatively simple syntaxes. As "languages" they allow you to express any idea. The question is whether you are a good writer or not.
  64. 64. R & MATLAB 64 On the flip side, Matlab has much better graphics, which you will not be ashamed to put in a paper or a presentation. MATLAB and R perform most general statistical analyses (regression, logistic regression, survival analysis, analysis of variance, factor analysis, multivariate analysis).  The greatest strengths of both are probably in its ANOVA, mixed model analysis and users creative freedom in analysis In terms of modern applied statistics tools, R libraries are somewhat richer than those of Matlab. Also R is free software.
  65. 65. R: USAGE • Data can be read in through code or created • Variables and functions can be created and renamed • Multiple data sets can be handled at once • Editor window is used to write and save commands • Console window reads commands and displays output, which is best saved by copying and pasting into a word processing document • Graphs are outputted in separate window, which is overwritten for each new graph unless otherwise indicated in commands • Workspaces can be saved, meaning data sets and variables do not need to be recreated (especially useful if data creation and manipulation take a long time to run) 65
  66. 66. R: EXAMPLES • Read in data set from a text file • Create a variable • Find online help • Run a t-test • Create a histogram 66
  67. 67. R: EXAMPLES • Read in data set from a text file 67
  68. 68. R: EXAMPLES • Create a variable 68
  69. 69. R: EXAMPLES • Find online help 69
  70. 70. R: EXAMPLES • Run a t-test 70
  71. 71. R: EXAMPLES • Create a histogram 71
  72. 72. R: PROS AND CONS Pros:  Widely used in both industry and academia  Flexible and customizable analyses and graphics  Great for:  Data manipulation, editing, and coding  Data mining  Simulations  Survival analysis  Linear and nonlinear model Cons:  Scripting programming language  Mediocre graphics  Not as useful for:  Graphical analysis  Data summary  Exploratory analysis  Quality assessment and improvement  Design of experiments 72
  73. 73. R: Overview • Free, open-source software; similar to S-plus • Multiple add-ons and extensions available, including integration with LaTeX ( a word processor) via R Studio, and Excel via RExcel • Extensive online help manuals and forums • Used by many statisticians and computer scientists for data mining, data analysis, and development of statistical methodology • Case-sensitive language • Common fields:  Statistical science  Computational biology  Computer science 73
  74. 74. Econometric Views (EViews) 74 EViews is a statistical package for window, used mainly for time-series oriented econometrics analysis.  It was developed by Quantitative Micro Software (QMS) and now a part of IHS. Version 1.0 was released in March 1994, and has replaced MicroTSP. The current version of EViews is 8.0, released in March 2013.  EViews can be used for general statistical analysis and econometric analyses, such as cross-section and panel data analysis and time series estimation and forecasting.
  75. 75. Econometric Views (EViews) 75 EViews relies heavily on a proprietary and undocumented file format for data storage.  However, for input and output it supports numerous formats, including databank format, Ms- Excel format, SPSS/PSPP, DAP/SAS, STATA, RATS, and TSP. EViews can access ODBC databases .
  76. 76. Showing 10 Statistical Packages and means of Accessing Statistical Package 76
  77. 77. 77
  78. 78. 78
  79. 79.  Coding can be done manually.  Manual method of coding in qualitative data analysis is rightly considered as labour - intensive, time-consuming and outdated.  Coding can also be done using Qualitative data analysis software such as NVivo, Atlas ti 6.0, HyperRESEARCH 2.8, Max QDA(Qualitative Data Analysis) software and others.  When choosing software for qualitative data analysis you need to consider a wide range of factors such as the type and amount of data you need to analyse, time required to master the software and cost considerations. 79SOFTWARES
  80. 80. ATLAS TI 6.0 80 The purpose of ATLAS.ti is to help researchers uncover and systematically analyze complex phenomena hidden in unstructured data(text, multimedia, geospatial). The program provides tools that let the user locate, code, and annotate findings in primary data material, to weigh and evaluate their importance, and to visualize the often complex relations between them. ATLAS.ti is used by researchers and practitioners in a wide variety of fields including anthropology, arts, architecture, communication, criminology, economics, educational sciences, engineering, ethnological studies, management studies, market research, quality management, psychology and sociology.
  81. 81.  Software can support analyses BUT does not “do” analyses for you.  The Ethnograph 5.08  QSR N6 (  Weft QDA (  Open code 3.4 ( 81 Ritchie J, Lewis J. Qualitative research practice. London: Sage Publications
  82. 82. HYPER RESEARCH 82 HyperRESEARCH is used by qualitative researchers in areas such as health care, legal, sociology, anthropology, music, geography, geology, education, theology, philosophy, history, market research, focus group analysis and most other fields using qualitative research approaches. HyperRESEARCH enables coding and retrieval of source material, theory building, and analyses of your data. With its multimedia capabilities, HyperRESEARCH allows you to work with text, graphics, audio, and video sources. HyperRESEARCH has been in use by qualitative researchers since it was first introduced in 1991.
  83. 83. FEATURES 83Easy-to-use interface: offers the same intuitive case-based interface on all supported platforms.  It provides a turnkey solution by employing a built-in help system and several tutorials with step-by-step instructions. With flexible organization of codes from any source to any case, support for code frequencies and other code statistics, HyperRESEARCH is well suited for mix-method approaches to qualitative research. The HyperRESEARCH study file format and all of the various media types supported will work across operating systems. Multi-Media capabilities: allows you to work with text, graphics, audio, and video source material in many popular formats.
  84. 84. MAXQDA 84 MAXQDA is a software program designed for computer- assisted qualitative and mixed methods data, text and multimedia analysis in academic, scientific, and business institutions. It is being developed and distributed by VERBI Software based in Berlin, Germany. MAXQDA is designed for the use in qualitative, quantitative and mixed methods research. The emphasis on going beyond qualitative research can be observed in the extensive attributes function (called variables in the programme itself) and the ability of the programme to deal relatively quickly with larger numbers of interviews.
  85. 85. FEATURES 85Import of text documents, tables, audio, video, images, twitter tweets, surveys Data is stored in project file '(*.mx18) Reading, editing and coding data Settings links from one part of a document to another Visualization options (number of codes in different documents etc.) Group Comparison Analyse code combinations Import and export demographic information (variables) from and to SPSS and Excel Import of web pages with the free browser add-on MAXQDA Analyse of responses to survey questions
  86. 86. CONCLUSION 86 It is the opinion of the researcher, after analyzing the field data to a very large extent, that statistical software has contributed immensely to social research especially in the area of demographic and data analysis. This was achieved by using a scientific approach to solve fundamental problems in research which is data analysis.  Although some other factors contributed to the quality of research work such as literature review, methodology and findings, it is quite clear that the impact of statistical software packages on analysis and findings of research cannot be over estimated.
  87. 87. REFERENCES 87 Karen (2013) Choosing a Statistical Software Package or Two Source: VISITED ON 7TH April, 2014 McGann, C. (2009). Role of Statistical Software . eHow Contribution accessed by 3rd April, 2014 McCullough, B.D. (1999). "Econometric software reliability: EViews, LIMDEP, SHAZAM and TSP". JournalofAppliedEconometrics 14 (2):191–202. HYPERLINK  "" o "Digital object identifier" doi : Odusina E.K. (2011) Stanford P. (2013) Statistical & Financial Consulting visited 6th April, 2014 STATA PRESS site (2014) Stata history and usage.visited 28th April, 2014. Ann Lewins and Christina Silver: Using Software in Qualitative Research: A Step-by-Step Guide, 2nd edition, 2014, Sage Publications, Los Angeles, London, New Delhi, Singapore
  88. 88. Evaluation : 88 1. Which year did IBM take over SPSS?? 2. Expand BMDP &CS-PRO 3. Which software is most popular ?? 4. What is the latest version of MINITAB?? 5. Is PSPP has any acronym ??