Data science is not only about numbers and how to crunch them; it is also about how to communicate project results to various audiences. Scientific journals and conferences are excellent venues for reaching a wider audience and gathering valuable comments. The talk will answer the questions: How should a scientific paper in data science be structured? Which venues are relevant for showcasing your work and gaining the most relevant reach? To demystify the process of scientific writing, a case study will be presented: Messy process: the story of the birth of one data science paper.
[DSC Croatia 22] Writing scientific papers about data science projects - Mirjana Pejic Bach
1. Writing scientific papers about data science projects
Prof.Dr.Sc. Mirjana Pejić Bach
University of Zagreb, Faculty of Economics and Business
2. Education Roadmap
Faculty of Economics & Business – Zagreb: Cybernetics & Finance (1993)
Faculty of Economics & Business – Zagreb: MBA (1996)
Faculty of Economics & Business – Zagreb: PhD (2003)
MIT Sloan School of Management – System Dynamics (1996)
3. Teaching Roadmap
Data mining / Data science (2003 - today)
System dynamics (2010 - today)
Simulation games (2007 - today)
Statistics (1993 - 1996)
4. Scientific Research Areas
Data mining / data science
Technology acceptance / Digital divide
Statistical modelling
Editorial work / Reviewer
13. Reasons for writing scientific articles
Increase knowledge - your own and others'
◦ Publication of results that are valuable
◦ Advancing science
Intellectual property protection - formal and informal
Expert reputation – establishing your position in a specific field
Scientific reputation - recognizability and citation
◦ Contact with a larger potential number of readers
◦ Scientific progress
◦ Doctoral study
◦ Scientific titles
◦ Teaching titles
14. How to write a scientific paper in data science?
15. Pejić Bach, M. (2015). How to write and publish a scientific paper: A closer look to eastern European economics, business
and management journals. Business Systems Research: International journal of the Society for Advancing Innovation and
Research in Economy, 6(1), 93-103.
21. Search queries
Query | Results
"data science" (Topic) | 7,906 research papers
( "data science" OR "data mining" OR "big data" OR "artificial intelligence" OR "AI" OR "statistical learning" OR "cluster analysis" OR "decision trees" OR "artificial neural networks" OR "ANN" OR "association rules" ) (Topic) | 462,448 research papers
( "data science" OR "data mining" OR "big data" OR "artificial intelligence" OR "AI" OR "statistical learning" OR "cluster analysis" OR "decision trees" OR "artificial neural networks" OR "ANN" OR "association rules" ) (Topic) and Highly Cited Papers and 2021 (Publication Years) | 734 research papers
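A long OR-query like the one above is easier to maintain when it is generated from a list of terms. The sketch below is only an illustration: it assumes the query is pasted into the Web of Science advanced search, where `TS=` is the field tag for Topic, and the helper name `build_topic_query` is hypothetical. The Highly Cited Papers and publication-year restrictions are assumed to be applied afterwards as filters on the result list.

```python
# Search terms copied from the slide; the helper below is a hypothetical
# convenience, not part of any Web of Science client library.
TERMS = [
    "data science", "data mining", "big data", "artificial intelligence",
    "AI", "statistical learning", "cluster analysis", "decision trees",
    "artificial neural networks", "ANN", "association rules",
]


def build_topic_query(terms, field_tag="TS"):
    """Quote each term, join the terms with OR, and wrap them in a field-tagged clause."""
    quoted = " OR ".join(f'"{term}"' for term in terms)
    return f"{field_tag}=({quoted})"


query = build_topic_query(TERMS)
print(query)
```

Keeping the terms in a list makes it simple to add or drop a keyword and rerun the search with exactly the same syntax.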
24. Top 10 cited papers
1. Deep learning in neural networks: An overview
2. Mastering the game of Go with deep neural networks and tree search
3. Diagnostic Criteria for Multiple Sclerosis: 2010 Revisions to the McDonald Criteria
4. Representation Learning: A Review and New Perspectives
5. Dermatologist-level classification of skin cancer with deep neural networks
6. GEPIA: a web server for cancer and normal gene expression profiling and interactive analyses
7. The Materials Project: A materials genome approach to accelerating materials innovation
8. Internet of Things: A Survey on Enabling Technologies, Protocols, and Applications
9. Google Earth Engine: Planetary-scale geospatial analysis for everyone
10. Identification of human triple-negative breast cancer subtypes and preclinical models for
selection of targeted therapies
27. [Bar chart: number of publications per source title; axis "# of publications", 0 to 8,000]
REMOTE SENSING
JOURNAL OF INTELLIGENT FUZZY SYSTEMS
INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND…
INFORMATION SCIENCES
APPLIED SOFT COMPUTING
ENERGIES
NEUROCOMPUTING
WEED TECHNOLOGY
IEEE INTERNATIONAL CONFERENCE ON BIG DATA
COMMUNICATIONS IN COMPUTER AND INFORMATION SCIENCE
NEURAL COMPUTING APPLICATIONS
SCIENTIFIC REPORTS
PROCEEDINGS OF SPIE
PROCEDIA COMPUTER SCIENCE
APPLIED SCIENCES BASEL
SENSORS
ANNALS OF NEUROLOGY
SUSTAINABILITY
PLOS ONE
ADVANCES IN INTELLIGENT SYSTEMS AND COMPUTING
EXPERT SYSTEMS WITH APPLICATIONS
LECTURE NOTES IN ARTIFICIAL INTELLIGENCE
ANNALS OF THORACIC SURGERY
IEEE ACCESS
LECTURE NOTES IN COMPUTER SCIENCE
29. Methodology
Text mining is the process of exploring and analyzing large
amounts of unstructured text aided by software that can
identify concepts, patterns, topics, keywords and other
attributes in the data.
1st step – word and phrase extraction – Where is the
concern?
2nd step – topic extraction using cluster analysis – What
issues are in the focus?
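The two steps can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the procedure or software used in the talk: the four documents are invented toy titles, the stopword list is ad hoc, and the clustering is a simple greedy grouping by Jaccard word overlap that stands in for a proper cluster analysis.

```python
import re
from collections import Counter

# Toy documents invented for illustration; they are not data from the talk.
DOCS = [
    "deep neural networks for image classification",
    "neural networks and deep learning for skin cancer images",
    "big data platforms for smart energy grids",
    "big data analytics in energy systems",
]
STOPWORDS = {"for", "and", "in", "the", "of"}


def tokenize(text):
    """Lowercase the text, keep alphabetic words, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def phrases(tokens, n=2):
    """Return all n-word phrases (n-grams) as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def jaccard(a, b):
    """Share of words two sets have in common (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(token_lists, threshold=0.2):
    """Greedy grouping: a document joins the first cluster that contains a
    sufficiently similar document, otherwise it starts a new cluster."""
    clusters = []  # each cluster is a list of document indices
    for i, tokens in enumerate(token_lists):
        for members in clusters:
            if any(jaccard(set(tokens), set(token_lists[j])) >= threshold
                   for j in members):
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Step 1: word and phrase extraction.
token_lists = [tokenize(doc) for doc in DOCS]
word_counts = Counter(w for tokens in token_lists for w in tokens)
phrase_counts = Counter(p for tokens in token_lists for p in phrases(tokens))

# Step 2: topic extraction by grouping documents with overlapping vocabulary.
topics = cluster(token_lists)

print(word_counts.most_common(5))
print(phrase_counts.most_common(5))
print(topics)
```

On real abstracts, the frequent words and phrases from step 1 point to where the concern is, and the groups from step 2 suggest which issues are in focus; a dedicated text-mining package would replace both toy functions.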
30. Topic ( "data science" OR "data mining" OR "big
data" OR "artificial intelligence" OR "AI" OR "statistical
learning" OR "cluster analysis" OR "decision
trees" OR "artificial neural
networks" OR "ANN" OR "association rules" )
(462,448 papers)
Highly cited papers
(7906 papers)
Highly cited papers in 2021
(734 papers)
Results from Web of Science Core Collection
32. Example paper: Contreras-Valdes, A., Amezquita-Sanchez, J. P.,
Granados-Lieberman, D., & Valtierra-Rodriguez, M. (2020).
Predictive data mining techniques for fault diagnosis of electric
equipment: A review. Applied Sciences, 10(3), 950.
33. Example paper: Bandara, E., Ng, W. K., De Zoysa, K., Fernando, N., Tharaka, S.,
Maurakirinathan, P., & Jayasuriya, N. (2018, December). Mystiko—blockchain meets big data.
In 2018 IEEE international conference on big data (big data) (pp. 3024-3032). IEEE.
34. Example paper: Andrea, I., Chrysostomou, C., & Hadjichristofi, G. (2015,
July). Internet of Things: Security vulnerabilities and challenges. In 2015
IEEE symposium on computers and communication (ISCC) (pp. 180-187).
IEEE.
35. Example paper: Mookiah, M. R. K., Acharya, U. R., & Ng, E. Y. K. (2012).
Data mining technique for breast cancer detection in thermograms using
hybrid feature extraction strategy. Quantitative InfraRed Thermography
Journal, 9(2), 151-165.
36. Example paper: Brock, J. K. U., & Von Wangenheim, F. (2019). Demystifying AI: What digital transformation leaders
can teach you about realistic artificial intelligence. California Management Review, 61(4), 110-134.
37. Example paper: Bahrammirzaee, A. (2010). A comparative survey of artificial
intelligence applications in finance: artificial neural networks, expert system and
hybrid intelligent systems. Neural Computing and Applications, 19(8), 1165-1195.
38. Example paper: Müller, M., Salathé, M., & Kummervold, P. E. (2020). Covid-twitter-bert:
A natural language processing model to analyse covid-19 content on twitter. arXiv
preprint arXiv:2005.07503.
40. Course: Targeting the right journal /
conference
Select 2 to 3 journals / conferences for publication
the paper matches the topic of the journal
experience of colleagues who have already published in the
journal
mission statements of the journals
members of the editorial board
journal quality
Minimize rejections - quality of the paper = quality of the journal
The best papers – The best journals
Other worthy journals
preliminary research
narrow-topic articles
quick publication
a last-resort option if the paper gets rejected in highly-cited
journals
41. Avoid predatory journals and
conferences
What is meant by a predatory journal?
Predatory journals take advantage of authors
by asking them to publish for a fee without
providing peer-review or editing services.
Because predatory publishers do not follow
the proper academic standards for publishing,
they usually offer a quick turnaround on
publishing a manuscript.
More info:
https://mdanderson.libanswers.com/faq/206446
https://beallslist.net/
42. Conferences in Croatia & Slovenia
Central European Conference on Information and Intelligent Systems
◦ https://ceciis.foi.hr/
MIPRO – indexed in Scopus
◦ http://www.mipro.hr/
International Symposium on Operations Research in Slovenia – indexed in Scopus
◦ https://sor.fov.um.si/
International Conference on Operational Research (KOI)
◦ https://hdoi.hr/koi-2022/
45. The most reputable publishers
PAID FOR SUBSCRIPTION PUBLISHERS
Springer
Palgrave Macmillan
Routledge
Cambridge University Press
Elsevier
Nova Science Publishers
Edward Elgar
Information Age Publishing
Princeton University Press
University of California Press
Emerald
OPEN ACCESS PUBLISHERS
PLOS
Sciendo
MDPI
Frontiers
Most paid-for-subscription publishers also publish open access journals and papers
47. Composition: IMRAD framework (1)
Different types of scientific papers
case studies
survey reports
theoretical papers
review papers
IMRAD framework
(1) Introduction (What problem was studied?)
(2) Methods (How was the problem studied?)
(3) Results (What are the results?)
(4) Discussion (What do the findings mean?)
48. Composition: IMRAD framework (2)
Title of the paper - understandable and informative, not too long
Abstract - background, purpose, methods, results and conclusion of the paper
Keywords - select carefully
Introduction section
1st paragraph - the current knowledge on the topic
2nd paragraph - direction toward the purpose of the paper
3rd paragraph - the purpose of the paper and a brief statement of the methodology used
4th paragraph - overview of the other sections of the paper
Convince the editor and the reader that the paper is worth publishing and reading.
49. Composition: IMRAD framework (3)
Literature review
elaborate the current knowledge
Methods section
the process the author followed to carry out the research: quantitative research, qualitative research, or a combination of both
Results section
present the facts revealed by the research, not their interpretation
Discussion section
hardest to write; its deficiencies are the most frequent reason for papers being rejected
Summarize the findings of the research – 1st paragraph
Compare the results with those expected from previous research or experience – 2nd paragraph
Propose practical implications of the results – 3rd paragraph
Explain key limitations of the research – 4th paragraph
Suggest paths for future research – 5th paragraph
52. Plagiarism Overview
Plagiarism is using someone else’s ideas or words without giving them proper credit. Plagiarism
can range from unintentional (forgetting to include a source in a bibliography) to intentional
(buying a paper online, using another writer’s ideas as your own to make your work sound
smarter). Beginning writers and expert writers alike can all plagiarize. Understand that
plagiarism is a serious charge in academia, but also in professional settings.
If you are...
a student — consequences can include failing grades on assignments or classes, academic
probation, and even expulsion.
a researcher — plagiarism can cause a loss of credibility, legal consequences, and other
professional consequences.
an employee in a corporate or similar setting — you can receive a reprimand or lose your job.
https://owl.purdue.edu/owl/avoiding_plagiarism/index.html
56. Writing scientific papers for beginners
Step 1:
◦ Find an article on a similar topic that serves as a prime example in terms of subject matter and structure
Step 2:
◦ Determine the working title of the article
Step 3: Create an article structure
Step 4: Write
◦ 1. introduction, 2. methods, 3. results, 4. discussion
Step 5: Write a summary and specify keywords
58. How to Write a Paper for Publication. Franklin L. Rosenfeldt, John T. Dowling,
Salvatore Pepe and Meryl J. Fullerton
60. • Read scientific papers
• Find example of paper on a similar topic in targeted
journal
• Examine how the paper is organized
• Make an outline for the content of the paper in the
form of a bullet list of the future paragraphs and even
sentences
• Start to write
61. Messy process of writing a paper for a scientific journal
PEJIC-BACH, M., BERTONCEL, T., MEŠKO, M., & KRSTIĆ, Ž. (2020). TEXT
MINING OF INDUSTRY 4.0 JOB ADVERTISEMENTS. INTERNATIONAL
JOURNAL OF INFORMATION MANAGEMENT, 50, 416-431.
62. Timelines for journal writing
Shorter conference papers
• 2-3 months
Longer conference papers for top conferences
• 6 months to 1 year
Local journals
• 6 months to 1 year
Top journals
• 1.5 to 3 years
63. Process (2.5 years)
COMPETENCE:
1st phase – 3 months
◦ Topic of the paper – jobs in industry 4.0
◦ Data source – job advertisements
◦ Methodology – text mining; software – Provalis
2nd phase – 3 months
◦ Data collection
◦ Initial analysis
COURSE:
Journal selection – published similar papers
◦ Parallel process
COMPOSITION
1st draft of the paper following IMRAD
• 3 months
CONTENT
2nd draft of the paper
• 2 months
Final 1st version of the paper
• 6 months
Review process 3 rounds
• 1 year
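As a quick check on the overall figure, the phase durations listed on this slide can be added up. The sketch below assumes that journal selection adds no time because it runs in parallel, and that the review process of 1 year counts as 12 months; the phase labels are shortened from the slide.

```python
# Phase durations in months, as listed on the slide.
# Assumption: journal selection runs in parallel and adds no time.
PHASES = [
    ("1st phase: topic, data source, methodology", 3),
    ("2nd phase: data collection, initial analysis", 3),
    ("1st draft of the paper following IMRAD", 3),
    ("2nd draft of the paper", 2),
    ("Final 1st version of the paper", 6),
    ("Review process, 3 rounds", 12),
]

total_months = sum(months for _, months in PHASES)
total_years = total_months / 12

for name, months in PHASES:
    print(f"{months:>3} months  {name}")
print(f"Total: {total_months} months (about {total_years:.1f} years)")
```

The sum comes to 29 months, which is in line with the roughly two and a half years stated in the slide title.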
64. The best way to learn about something
is to write about it!
Good luck!