1. Szilard Pafka – Los Angeles area R users group meeting – November 17, 2010
Software tools for data analysis: (size related to surveyed usage)
C, C++, Fortran, Java + libraries...
Perl, Python, Ruby, Unix shell
Lisp, Clojure
R, Matlab, Octave, Maple, Mathematica
SPSS, Stata, Statistica, SAS, JMP
Excel
SAS EM, SPSS Clementine, RapidMiner, Weka, Mahout
MySQL, SQL Server, NoSQL stores
Hadoop, CUDA
support: editors, code versioning, cloud computing
Possible talks:
December:
1. C, interfaces with R (both ways) / something else?
2. SAS: performance, R interface ready?
3. RExcel
January:
1. Python & R – a comparison
2. NumPy, SciPy
3. Python vs Unix shell / NLTK / NetworkX
Other talks (March onward):
1. data storage (SQL and some NoSQL), access from R
2. data mining platforms
3. Hadoop
4. GPU
5. Java
6. Clojure
...
2. Criteria for talks:
usefulness (for data analysis!) and comparison with R
paradigm/philosophy, main usage domain, performance, ease of learning, speed of programming, libraries
break down by:
- part of the data analysis process (pre-processing, exploration (e.g. visualization), modeling etc.)
- nature of data (e.g. numeric, categorical, unstructured text, networks/links etc.)
- size of data
stuff that increases functionality: libraries, 3rd party extensions...
does tool X have an R-to-X and/or X-to-R interface?
how these tools can be combined to support the whole process of data analysis