This document provides an overview of big data analytics and understanding for research activity presented by Dr. Andry Alamsyah. It discusses key concepts related to big data including definitions, characteristics, related fields, and opportunities. It also covers machine learning fundamentals and methodologies including supervised learning, unsupervised learning, and reinforcement learning. Examples of applications in areas like predictive analytics, recommendation systems, and social media analytics are also mentioned. Finally, it discusses data preparation techniques and common data analytics tasks.
3. • Background and Motivation
• Big Data Definition and Related Fields
• Understanding Patterns
• Data Analytics / Machine Learning Fundamentals (Prediction and
Recommendation)
• Social Media Analytics (by Case Study)
• Conclusion
• Working on Your Computer (Machine Learning Practice)
Agenda
11. Cheap Changes Everything
efficient economy
new value proposition
• cutting through the BIG DATA hype
• cheap means everywhere
• cheap creates value
• from cheap to strategy
complex
human behaviour
market uncertainty
business
sustainability
disruptive economic
coopetitive, cooperative,
competitive
business ecosystem
/ platform
programmable economy
event driven
API economy
toward large-scale and massive
socio-economic impact
Industry 4.0
13. Big Data Definition
• a term => describes extremely large amounts of structured and
unstructured data
• the activity => capture / storage / processing / sharing / reporting of
data => beyond the ability of legacy software tools and hardware
infrastructure
• related to many "science" branches => data analytics, data science,
machine learning, artificial intelligence, IoT, and many more
• the application => in many fields => efficient, cost-effective, faster &
more accurate decision making
Gigabyte 10^9 = 1.000.000.000
Terabyte 10^12 = 1.000.000.000.000
Petabyte 10^15 = 1.000.000.000.000.000
Exabyte 10^18 = 1.000.000.000.000.000.000
Zettabyte 10^21 = 1.000.000.000.000.000.000.000
                 1990          2010
store            1400 MB       1 TB
transfer speed   4.5 MB/s      100 MB/s
read drive       ~ 5 minutes   ~ 3 hours
Hadoop: 100 drives working at the same time can read 1 TB of data in 2 minutes
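The read times on this slide follow from dividing capacity by transfer speed. The short sketch below reproduces that arithmetic; the function name is hypothetical, and decimal units (1 TB = 1,000,000 MB) and an even split of data across drives are assumptions not stated on the slide.

```python
# Reproduce the slide's drive-read arithmetic.
# Assumptions: decimal units (1 TB = 1,000,000 MB) and data split evenly across drives.

def read_time_seconds(capacity_mb: float, speed_mb_per_s: float, drives: int = 1) -> float:
    """Time to read capacity_mb when the data is spread evenly over `drives` drives."""
    return capacity_mb / (speed_mb_per_s * drives)

t_1990 = read_time_seconds(1_400, 4.5)                # one 1990 drive
t_2010 = read_time_seconds(1_000_000, 100)            # one 2010 drive
t_parallel = read_time_seconds(1_000_000, 100, 100)   # 100 drives read in parallel

print(f"1990 drive:  {t_1990 / 60:.1f} minutes")
print(f"2010 drive:  {t_2010 / 3600:.1f} hours")
print(f"100 drives:  {t_parallel / 60:.1f} minutes")
```

Storage capacity grew far faster than transfer speed, so reading a whole drive became slower; reading from many drives at once recovers the lost time, which is the motivation the slide gives for Hadoop.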
15. DATA ANALYTICS
- the discovery, interpretation, and communication of meaningful patterns in data (Wikipedia)
- the process of uncovering hidden patterns, unknown correlations, and other useful information that can help organisations make
more informed business decisions
SOURCE
review, opinion,
historical data,
conversation,
network friendship,
CCTV, Vlog,
location tagging,
etc
BIG DATA
large, fast, complex
the 5V’s data
DATA SCIENCE
the science to extract
knowledge / pattern from data
SOCIAL COMPUTING
quantification of human / social
behaviour
INSIGHT
market segmentation, risk analytics
information dissemination,
recommended investment, fraud
detection, personalised adv, customer
acquisition and retention, purchase
behaviour, early detection event,
brand awareness, etc
opportunity activity
methodology
benefit
application
Big Data Related Terms (Use Case)
16. Data Analytics
• The discovery, interpretation, and communication of meaningful patterns in data (Wikipedia)
• The process of uncovering hidden patterns, unknown correlations, and other useful information that
can help organisations make more informed business decisions
• Types of analytics: predictive, descriptive, diagnostic, prescriptive
17. Predictive Analytics
• study the past if you want to study the future (Confucius)
• Predictive analytics is the art of building and using models that make predictions
based on patterns extracted from historical data. Predictive analytics applications
include: price prediction, dosage prediction, risk assessment, propensity/likelihood
modelling, diagnosis, document classification
• Prediction is the assignment of a value to any unknown variable.
• A model is trained to make predictions based on a set of historical examples (we use
machine learning).
21. CRISP-DM
CRISP-DM (Cross-Industry Standard Process for Data Mining) is an open standard process model that
describes common approaches used by data mining experts. It is the most widely used analytics model.
22. Structure Data Type
Structured: a high degree of organization, such as a relational database
Column           Value
Patient          Andry Alamsyah
Date of Birth    12/07/1995
Date Admitted    02/03/2019

VS

Unstructured: information that is difficult to organise using traditional mechanisms
“The patient came in complaining of chest pain, shortness of breath,
and lingering headaches.. Smokes 2 packs a day.. Family history of
heart disease.. Has been experiencing similar symptoms for
the past 12 hours…”
32. How Can (Big) Data Analytics Help?
by describing the phenomenon,
by predicting the value,
by estimating the future outcome,
by optimising the resources and the
decision,
by simulating all the possible scenarios ..
34. • Machine learning is defined as an automated process that extracts
patterns from data to build the models used in predictive analytics
applications.
• A branch of artificial intelligence, concerned with the design and
development of algorithms that allow computers to evolve behaviours
based on empirical data.
Machine Learning
35. Machine Learning
Machine Learning is an idea to learn from
examples and experience, without being
explicitly programmed.
Instead of writing code, we feed data to the
generic algorithm, and it builds logic based
on the data given.
• Traditional Programming: Data + Program -> Computer -> Output
• Machine Learning: Data + Output -> Computer -> Program
36. Machine Learning
Machine learning (ML) is the science of
getting computers to act without being
explicitly programmed. ML has given us self-
driving cars, practical speech recognition,
effective web search, and a vastly improved
understanding of the human genome. ML is
pervasive today, we probably use it dozens of
times a day without knowing it. It is the best
way to make progress towards human-level
AI. (Stanford/Coursera)
ML is a type of artificial intelligence (AI)
that provides computers with the ability to
learn without being explicitly
programmed. ML focuses on the
development of computer programs that can
teach themselves to grow and change when
exposed to new data. (whatis.com)
41. •It is based on a labeled training set.
•The class of each piece of data in
training set is known.
•Class labels are pre-determined
and provided in the training phase.
Supervised Learning
[Figure: labelled training examples, each assigned to class A or class B]
“What is the class of this data point?”
Tasks performed: classification, pattern recognition
47. Unsupervised Learning
Problems :
• Clustering
• Association Rules
• Pattern Mining
• It is adopted as a more general term than frequent pattern mining or
association mining
• Outlier Detection
• It is the process of finding data whose behaviour differs greatly
from the expectation (outliers or anomalies)
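As one possible illustration of outlier detection, the sketch below flags values that lie far from the mean, measured in standard deviations (a z-score rule). This is a swapped-in technique chosen for brevity, not a method named on the slide; the function name, the threshold of 2.0, and the sample data are all hypothetical.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing deviates from the expectation
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical daily transaction counts with one unusual day.
daily_transactions = [10, 11, 9, 10, 12, 50]
print(find_outliers(daily_transactions))
```

A z-score rule assumes roughly bell-shaped data and is sensitive to the outliers themselves, which is one reason distance-based and density-based methods are often preferred in practice.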
50. Background :
• How to learn a new skill
• Learning and intelligence
• Interaction with environment
• Goal-oriented learning
• Agent – Environment interactions
• Activities
- What to do
- How to map situations to actions
- Process positive and negative rewards
Reinforcement Learning
Reinforcement learning (RL) is an area of machine learning concerned with
how software agents ought to take actions in an environment so as to maximise
some notion of cumulative reward.
Basic reinforcement learning is modeled as a Markov decision process, which is often stochastic.
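One common "notion of cumulative reward" is the discounted return, where later rewards count for less than earlier ones. The sketch below computes it for a fixed sequence of rewards; the discount factor `gamma` and the reward values are assumptions for illustration and do not come from the slide.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where the reward at step t is weighted by gamma**t."""
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future total.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

# Hypothetical episode: positive and negative rewards received by an agent.
episode_rewards = [1, 0, -1, 2]
print(discounted_return(episode_rewards, gamma=0.5))
print(discounted_return(episode_rewards, gamma=1.0))  # plain undiscounted sum
```

An agent that maximises this quantity trades off immediate against future rewards; a `gamma` close to 0 makes it short-sighted, while a `gamma` of 1 treats all rewards equally.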
53. Data Preparation (CRISP-DM)
Data Preprocessing
• Measures for data quality: A multidimensional view
• Accuracy: correct or wrong, accurate or not
• Completeness: not recorded, unavailable, …
• Consistency: some modified but some not, …
• Timeliness: timely update?
• Believability: how much can the data be trusted to be correct?
• Interpretability: how easily the data can be understood?
54. 1. Data Cleaning
a. Fill in missing values
b. Smooth noisy data
c. Identify or remove outliers
d. Resolve inconsistencies
2. Data Reduction
a. Dimensionality reduction
b. Numerosity reduction
c. Data compression
3. Data Transformation and Data Discretisation
a. Normalisation
b. Concept hierarchy generation
4. Data Integration
a. Integration of multiple databases or files
Data Preprocessing Tasks
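Two of the tasks listed above, filling in missing values and normalisation, can be sketched in a few lines. Mean imputation and min-max scaling are only one choice among several for each task; the function names and the sample column are hypothetical.

```python
def fill_missing(values):
    """Data cleaning: replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    column_mean = sum(observed) / len(observed)
    return [column_mean if v is None else v for v in values]

def min_max_normalise(values):
    """Data transformation: rescale values linearly into the range [0, 1]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]  # constant column: no spread to rescale
    return [(v - lowest) / (highest - lowest) for v in values]

# Hypothetical column with one missing entry.
ages = [20.0, None, 40.0]
cleaned = fill_missing(ages)
scaled = min_max_normalise(cleaned)
print(cleaned, scaled)
```

Cleaning is applied before normalisation here because the minimum and maximum cannot be computed on a column that still contains missing entries.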
55. Common Data Analytics Rules

Task: Classification
Description: Predict whether a data point belongs to one of the predefined classes. Prediction is based on learning from a known dataset.
Algorithms: Decision tree, neural network
Example: Bucketing new customers into one of the known customer groups

Task: Regression
Description: Predict the numeric target label of a data point. Prediction is based on learning from a known dataset.
Algorithms: Linear regression, logistic regression
Example: Estimating insurance premiums

Task: Clustering
Description: Identify natural clusters within the data set based on inherent properties of the data set.
Algorithms: K-Means, density-based clustering
Example: Finding customer segments in a company based on transaction and call data

Task: Association Rules
Description: Identify relationships within an item set based on transaction data.
Algorithms: FP-Growth, Apriori
Example: Finding cross-selling opportunities for a retailer based on transaction purchase history

Task: Anomaly Detection
Description: Predict whether a data point is an outlier compared to the other data points in the dataset.
Algorithms: Distance-based, density-based, Local Outlier Factor (LOF)
Example: Detecting fraudulent credit card transactions
56. Estimation
Pizza Delivery Time

Customer   Order   Number of Traffic Lights   Distance   Travel Time (label)
1          3       3                          3          16
2          1       7                          4          20
3          2       4                          6          18
4          4       6                          8          36
...
1000       2       4                          2          12

Learning: model built using an estimation method (linear regression)
Knowledge: Travel Time = 0.48 O + 0.23 TL + 0.5 D
(O = Order, TL = Number of Traffic Lights, D = Distance)
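Once the linear model has been learned, making an estimate is a weighted sum of the input columns. The sketch below applies the coefficients exactly as printed on the slide; the function and argument names are hypothetical, and no attempt is made to refit the coefficients to the sample rows or to guess units or an intercept, since the slide does not state them.

```python
def estimate_travel_time(orders, traffic_lights, distance):
    """Apply the slide's learned model: 0.48*O + 0.23*TL + 0.5*D."""
    return 0.48 * orders + 0.23 * traffic_lights + 0.5 * distance

# Hypothetical new delivery: 3 orders, 3 traffic lights, distance 3.
print(round(estimate_travel_time(3, 3, 3), 2))
```

The learned "knowledge" is just three numbers, which is what makes linear regression easy to inspect: each coefficient says how much the estimate changes when that input grows by one unit.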
58. Classification

Student ID   Gender   National Exam Score   School of Origin   GPA Sem. 1   GPA Sem. 2   GPA Sem. 3   GPA Sem. 4   ...   Graduated on Time (label)
10001        M        28                    SMAN 2             3.3          3.6          2.89         2.9                Yes
10002        F        27                    SMA DK             4.0          3.2          3.8          3.7                No
10003        F        24                    SMAN 1             2.7          3.4          4.0          3.5                No
10004        M        26.4                  SMAN 3             3.2          2.7          3.6          3.4                Yes
...
11000        M        23.4                  SMAN 5             3.3          2.8          3.1          3.2                Yes

Learning using the C4.5 classification method
59. input : golf playing recommendation
output (rules) :
If outlook = sunny and humidity = high then play = no
If outlook = rainy and windy = true then play = no
If outlook = overcast then play = yes
If humidity = normal then play = yes
If none of the above then play = yes
output (tree) : [Figure: decision tree for the golf data]
Classification
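The rule output on this slide can be read as an ordered list in which the first matching rule decides the class. The sketch below encodes the five rules in that order; the function name and argument types are hypothetical, and the first-match reading is an assumption, since the slide does not say how overlapping rules are resolved.

```python
def play_golf(outlook: str, humidity: str, windy: bool) -> str:
    """Apply the slide's rules top to bottom; the first rule that matches wins."""
    if outlook == "sunny" and humidity == "high":
        return "no"
    if outlook == "rainy" and windy:
        return "no"
    if outlook == "overcast":
        return "yes"
    if humidity == "normal":
        return "yes"
    return "yes"  # none of the above

print(play_golf("sunny", "high", False))
print(play_golf("overcast", "high", True))
```

Rules like these are what a decision tree learner produces from labelled examples: each path from the root of the tree to a leaf corresponds to one if-then rule.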
68. learning and evaluation process confusion matrix
                      PREDICTED CLASS
                      Class=Yes    Class=No
ACTUAL   Class=Yes    a            b
CLASS    Class=No     c            d

a: TP (true positive)
b: FN (false negative)
c: FP (false positive)
d: TN (true negative)

\[ \text{Accuracy} = \frac{a + d}{a + b + c + d} = \frac{TP + TN}{TP + TN + FP + FN} \]
\[ \text{Precision } (p) = \frac{a}{a + c} \]
\[ \text{Recall } (r) = \frac{a}{a + b} \]
\[ \text{F-measure } (F) = \frac{2rp}{r + p} = \frac{2a}{2a + b + c} \]
Model Evaluation
evaluation metric
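The evaluation metrics on this slide are simple ratios of the four confusion-matrix cells. The sketch below computes them using the slide's cell names a, b, c and d; the function name and the example counts are hypothetical.

```python
def evaluate(a: int, b: int, c: int, d: int) -> dict:
    """Metrics from confusion-matrix cells: a=TP, b=FN, c=FP, d=TN."""
    accuracy = (a + d) / (a + b + c + d)
    precision = a / (a + c)
    recall = a / (a + b)
    f_measure = 2 * recall * precision / (recall + precision)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }

# Hypothetical counts: 40 true positives, 10 false negatives,
# 5 false positives, 45 true negatives.
metrics = evaluate(a=40, b=10, c=5, d=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The sketch does not guard against empty rows or columns: if a model never predicts the positive class, a + c is zero and precision is undefined, so real code would need to handle that case explicitly.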
77. First Topic
Identified
Topic Modelling
•Topic modelling is a type of statistical modelling for discovering the abstract
“topics” that occur in a collection of documents..
•LDA (Latent Dirichlet Allocation) is the most popular (and typically most effective)
topic modelling technique
78. TOP BRAND ALTERNATIVE MEASUREMENT BASED ON
CONSUMER NETWORK ACTIVITY
Abstract:
In Business Intelligence effort, the legacy methodology to
measure product brand awareness use technique such as
surveys, interviews, and questionnaires. This methodology
requires expensive effort to collect data from respondent and
takes considerably time to accomplish. The availability of Big
Data in the form of social media interaction can benefit us.
The conversation and user generated content from social
media certainly can be used to measure brand awareness
through consumer activity. We use Social Network Analysis
methodology to measure the dynamic and evolution of brand
conversations in social media. By comparing the network
properties, we propose new alternative measurement
methods of product brand awareness. Our proposed
methodology is better adapted to large-scale conversational
data in social media. This measurement will also enhance the
current methodology by viewing consumer opinions as a
whole network and not as separated individuals. This study
was conducted via social networking conversations on Twitter using
two industry case studies: mobile operators and
mobile phone brands in Indonesia.
mobile phone rank
mobile operator rank
79. A COMPARISON OF INDONESIA E-COMMERCE SENTIMENT ANALYSIS FOR
MARKETING INTELLIGENCE EFFORT
CASE STUDY : BUKALAPAK, TOKOPEDIA, ELEVENIA
Abstract: The rapid growth of the e-commerce market in Indonesia has led various e-commerce companies
to appear, and competition among them is high. Marketing intelligence is an important activity to
measure competitive position. One element of marketing intelligence is to assess customer satisfaction.
Many Indonesian customers express their sense of satisfaction or dissatisfaction towards the company
through social media. Hence, social media data provides a new practical way to measure
marketing intelligence effort. This research performs sentiment analysis using the naive Bayes
classification method with TF-IDF weighting. We compare the sentiments towards the top-3 most-visited
e-commerce sites: Bukalapak, Tokopedia and Elevenia. We use Twitter data for sentiment
analysis because it's faster, cheaper and easier from both the customer and the researcher side. The
purpose of this research is to find out how to process the huge volume of customer sentiment on Twitter into
useful information for the e-commerce company, and which of those top-3 e-commerce companies has
the highest level of customer satisfaction. The experiment results show that the method can be used
to classify customer sentiments on Twitter automatically, and that Elevenia is the e-commerce company
with the highest customer satisfaction.
COMPARABLE RESULT
AMONG THREE CASE STUDY
80. NETWORK TEXT ANALYSIS TO SUMMARISE ONLINE CONVERSATIONS FOR
MARKETING INTELLIGENCE EFFORTS IN TELECOMMUNICATION
INDUSTRY
Abstract - Tight market competition puts pressure on companies to employ new and faster ways to support their
marketing intelligence effort. The need of marketing intelligence includes gathering and analysing data for confident
decision making about the market and its competition. Today, the abundant large-scale data from online social network
services has made it possible to extract valuable information such as user opinions and sentiment from the
conversations in the market. As the competition rises, new challenges emerge, which include faster data
summarisation. The common practice for summarising content is using a word cloud or a weighted list of word appearances.
This approach lacks sense and contextual relations between the words in question, because the words have no
connection with other words that might construct an important phrase. With the help of graph formulation, we
propose a methodology of network text analysis to summarise large conversations in online social network services.
The proposed methodology captures complex relations between words while still maintaining fast summarisation. In this
paper, we compare three major telecommunication providers in Indonesia: Telkomsel, XL and Indosat. The
conversations about those brands on the online social network service Twitter are collected, and a text network for each
brand is constructed and analysed.
81. NETWORK MARKET ANALYSIS USING LARGE SCALE SOCIAL
NETWORK CONVERSATION OF INDONESIA FAST FOOD INDUSTRY
Abstract - The high competitiveness of the Indonesia Fast Food market has forced the industry to find the new way to understand market behaviour. The new challenge
should include faster data collection and analytical process, preferably time delivery needed close to real-time. The common practice of gathering market data using
questionnaires and interviews are considered expensive and time-consuming process compared to mining online conversation with brand community respected. With the
availability of large-scale data from online social network services (oSNS), we can extract valuable information represent dynamic behaviour of the market. Many brands have
their presence in oSNS as a part of their customer relationship management (CRM) effort. The social interactions formed in oSNS can be modeled using Social Network
Analysis (SNA) methodology. In this paper, we compare two brand communities of head to head competitive product in the fast food industry, they are McDonald’s and Burger
King. The SNA model constructs large-scale network, its size, reaching close to a million of nodes and edges. The result will give us insight about what is important in
understanding the dynamic market beside the market size represented by the community conversations.
82. SOCIAL NETWORK AND SENTIMENT ANALYSIS FOR SOCIAL CUSTOMER
RELATIONSHIP MANAGEMENT IN INDONESIA BANKING SECTOR
SCRM Network
BCA BNI MANDIRI
Abstract - The increasing number of social media users affects both individual and corporation user. Banking sector, for example, use social media to support
their Social Customer Relationship Management activity. We investigate the dynamics and evolution of conversation network between bank customer using
Social Network Analysis methodology. Measurement is conducted by calculating its network properties to see the characteristic and how active the network is.
Customers talking about banks’ services can also express their opinion on social media. Therefore we perform sentiment analysis to classify customer’s opinion
into positive, negative and neutral class. This research was performed on Twitter’s conversation about Bank Mandiri, Bank Central Asia (BCA) and Bank
Negara Indonesia (BNI). The result of this research is beneficial for business intelligence purpose to support decision making.
83. MEASURING MARKETING COMMUNICATIONS MIX EFFORT USING
MAGNITUDE OF INFLUENCE AND INFLUENCE RANK METRIC
Abstract: In the context of modern marketing, Twitter is considered as a communication platform to spread information. Many companies create and acquire several Twitter
accounts to support and perform varieties of marketing mix activities. Initially, each accounts used to capture specific market profile. Together, the accounts create network of
information that provide consumer to the information they need depends on their contextual utilisation. From many accounts available, we have the fundamental question on
how to measure influence of each account in the market based not only their relations, but also the effects of their postings. Magnitude of Influence (MOI) metric is adapted
together with Influence Rank (IR) measurement of accounts in their social network neighbourhood. We use social network analysis approach to analyse 65 accounts in the social
network of an Indonesian mobile phone network operator, Telkomsel which involved in marketing communications mix activities through series of related tweets. Using social
network provide the idea of the activity in building and maintaining relationships with the target audience. This paper shows the results of the most potential accounts based on
the network structure and engagement. Based on this research, the more number of followers one account has, the more responsibility it has to generate the interaction from
their followers in order to achieve the expected effectiveness. The focus of this paper is to determine the most potential accounts in the application of marketing communications
mix in Twitter.
ratio of affection
magnitude of influence
LCRT function
influence rank (based on pagerank)
84. MAPPING ONLINE TRANSPORTATION SERVICE QUALITY AND MULTI-CLASS
CLASSIFICATION PROBLEM SOLVING PRIORITIES
CASE STUDY : GOJEK AND GRAB
Abstract. Online transportation service is known for its accessibility, transparency, and tariff affordability. These points make online transportation have
advantages over the existing conventional transportation service. Online transportation service is an example of disruptive technology that change the
relationship between customers and companies. In Indonesia, there are high competition among online transportation provider, hence the companies must
maintain and monitor their service level. To understand their position, we apply both sentiment analysis and multiclass classification to understand customer
opinions. From negative sentiments, we can identify problems and establish problem-solving priorities. As a case study, we use the most popular online
transportation provider in Indonesia: Gojek and Grab. Since many customers are actively give compliment and complain about company’s service level on
Twitter, therefore we collect 61,721 tweets in Bahasa during one month observations. We apply Naive Bayes and Support Vector Machine methods to see which
model perform best for our data. The result reveal Gojek has better service quality with 19.76% positive and 80.23% negative sentiments than Grab with 9.2%
positive and 90.8% negative. The Gojek highest problem-solving priority is regarding application problems, while Grab is about unusable promos. The overall
result shows general problems of both case study are related to accessibility dimension which indicate lack of capability to provide good digital access to the end
users.
85. HYBRID SENTIMENT AND NETWORK ANALYSIS OF SOCIAL
OPINION POLARIZATION
Abstract: The rapid growth of social media and user generated contents (UGC)
has provided a rich source of potentially relevant data. The problems arise on
how to summarise those data to understand and transforming it into
information. Twitter as one of the most popular social networking and
micro-blogging service can be analysed in terms of content produced with sentiment
analysis. On the other hand, some types of networks can also be constructed to
analyse the social network structure and network properties. This research
intended to combine those content and structural approaches into hybrid
approach for identifies social opinion polarisation, this is in the form of
conversation network. Sentiment analysis used to determine public sentiment,
and social network analysis used to analyse the structure of the network,
detecting communities and influential actors in the network. Using this hybrid
approach, we have comprehensive understanding about social opinion
polarisation. As case study, we present real social opinion polarisation about
reclamation issue in Indonesia.
86. DYNAMIC LARGE SCALE DATA ON TWITTER USING
SENTIMENT ANALYSIS AND TOPIC MODELLING
Case Study: Uber
Digital flows now exert a larger impact, the world is now more connected than
ever, the amount of cross-border bandwidth that used has grown 45 times larger
since 2005. With the massive amount of data spreading in the net, including
social media, speed is one most essential factor in business. companies can
take advantage of social media as a source to analyse and extract the
customer’s opinion, and therefore the company can have quick response
towards the condition.
The main purpose of this research is content analysis, to obtain the goal, we
need to extract the information as well as summarise the topic inside it.
However, in order to analyse the content quickly, there are varies choice of tools
with its specific output that creates challenges in the process. We use Naïve
Bayes Sentiment Analysis based on time-series, specifically on daily basis and
topic modeling based on Latent Dirichlet Allocation (LDA) to evaluate the
sentiment of the topic as well as the model of the topics discussed.
The purpose of this research is to help both companies and individuals to map
the public opinion towards certain topic by analyzing the sentiment of the text
and create a topic model. Therefore, a real-time information for determining the
consumer opinion become a crucial part. Twitter can serve the purpose as one
source of real-time information from user-generated content. We pick Uber as
the case study, viewed as one of the most favored transportation methods in
most part of the world. Data collection period is from 10th February 2017 until
28th February 2017 with 1.048.576 tweets collected.
87. ANALYSING EMPLOYEE VOICE USING REAL-TIME FEEDBACK
Abstract People nowadays tend to use social media as a platform to share their
reviews, emotions, and opinions, including about their jobs. Thus, a lot of data is
available on the web. Therefore, a rapid response is needed to analyse and interpret
the data. Unfortunately, many organisations still use annual surveys to assess
satisfaction, engagement, and culture in the workplace. Compared to other
conventional datasets such as company survey and questionnaire, decision-makers
could make decision effectively and efficiently by using the interpreted data. This
may be done with the help of sentiment analysis method.
In this research, we classify the feedback based on its category and sentiment.
Several classification algorithms are used in opinion mining, two of them are Naive
Bayes Classifier (NBC) and Support Vector Machine (SVM). This paper aims to
classify feedback based on sentiments using NBC and SVM.
*ICST, 2018
88. MONTE CARLO SIMULATION AND CLUSTERING FOR
CUSTOMER SEGMENTATION IN BUSINESS ORGANISATION
Abstract: Utilising data for segmentation analysis can bring a streamlined way to get potential insight as of decision making support in a business organisation. Using
appropriate data analytical technique help the organisations in profiling their customer segments accurately. The result brings an effective marketing strategy. However, there
are times in doing data analytic, the organisation needs another variable of data where the value is unavailable, for example: customer’s income data which mostly hard to
collect. By using Monte Carlo simulation, the value of customer’s income can be generated and then compared with customer spending to construct customer segmentation
model. An unsupervised learning for customer segmentation model using K-Means clustering enables us to see the grouping patterns of customer’s income towards their
spending. Clusters of the dataset might be interpreted as a group of customers that having a similar character. This paper shows us how to generate customer’s income data
and create data cluster to optimising customer potential by utilising data. Furthermore, the result brings us insight into which group of the customer might unserved properly
considering their average income with their spending behaviour.
89. MAPPING ORGANISATION KNOWLEDGE NETWORK AND
SOCIAL MEDIA BASED REPUTATION MANAGEMENT
Abstract—Knowledge management and reputation are important aspects in an
organization, especially in ICT industry. Controlling knowledge management and
modeling personal reputation through social media is essentials for the organization
because we can see how employee build their relationship around their peer
networks or clients virtually and how knowledge network can support organization
performance. The purpose of this research is to map knowledge network and
reputation formulation in order to fully understand how knowledge flow in an
organization and whether employee reputation have higher degree of influence in
organization knowledge network. We particularly develop formulas to measure
knowledge network and personal reputation based on their social media activities.
As case study, we pick an Indonesian ICT company which actively build their
business around their employee peer knowledge outside the company. For
knowledge network, we perform data collection by conducting interviews. For
reputation management, we crawl data from several popular social media. We base
our work on Social Network Analysis methodology. The result shows that employees
knowledge is directly proportional with their reputation, but there are different
reputations level on different social media observed in this research.
reputation formula for twitter, instagram and linkedin
90. PREDICTION MODELS BASED ON FLIGHT TICKETS AND HOTEL ROOMS DATA
SALES FOR RECOMMENDATION SYSTEM IN ONLINE TRAVEL AGENT BUSINESS
Abstract - Indonesia as one of the favorite vacation destinations of domestic and foreign travelers made the value of investment in the tourism industry continued to
grow significantly. This was created more Online Travel Agent business in recent years. However, it made a lot of business travel and Umrah travel in Indonesia is
threatened with bankruptcy, after the online travel business activity is rampant in conventional business market ticket sales and travel tours. The research case
study is different from the Online Travel Agent business in general, because it worked in real-time analytic using flight tickets and hotel rooms sales data to create
prediction or recommendation model. Data mining, extraction of hidden predictive information from large databases, was a powerful technique with great potential
to help companies focus on the most important information in their data warehouse. By using classification method in data mining, the objectives of this paper is to
create predictive models from flight tickets and hotel rooms sales data using the decision tree classification approach. The result of this paper is beneficial for
business that can be used as basic algorithm for programming in Online Travel Agent recommendation feature.
91. EFFECTIVE KNOWLEDGE MANAGEMENT USING
BIG DATA AND SOCIAL NETWORK ANALYSIS
Visualisation of hierarchical structure organization and knowledge
flow of informal organization
Abstract: Knowledge management consists of identifying, creating, representing, distributing, and
enabling adoption of insights and experiences in an organization. One approach of modeling knowledge
management is using network model. Big Data is one of important ICT technological roadmap, which
main function is modelling behaviour and helping organization decision support. Social Network
Analysis is a micro version of Big Data where we can model and establish social network quantification.
In this paper we will show how Social Network Analysis can help organization applying Knowledge
Management strategies and practices by experiment using real-world large dataset contains 360000+
email exchanges between 36000+ employees inside in an organization
business case resolved using SNA methodology
map of full network email exchange between employees in Enron
92. INDONESIA INFRASTRUCTURE AND CONSUMER STOCK PORTFOLIO
PREDICTION USING ARTIFICIAL NEURAL NETWORK BACKPROPAGATION
*ICOICT, 2017
Abstract: Artificial Neural Network (ANN) method is increasingly popular to build predictive
model that generated small error prediction. To have a good model, ANN needs large dataset as an
input. ANN backpropagation is a gradient descent method to minimize the output error squared.
Stock price movements are suitable with ANN requirement: it is a large data set because stock price
is recorded up to every seconds, usually called high frequency data. The implementation of stock
price prediction using ANN approach is quite new. The predictive model help investor in building
stock portfolio and their decision making process. Buying some stocks in portfolio decrease
diversified risk and increases the chance of higher return. In this paper, we show how to generate
prediction model using artificial neural network backpropagation of stock price and forming
portfolio with predicted price that bring prediction of the portfolio with the smallest error. The data
set we use is historical stock price data from ten different company stocks of infrastructure and
consumer sector Indonesia Stock Exchange. The results is for lower risk condition, ANN predictive
model gives higher expected return than the return from real condition, while for higher risk, the
return from the real condition is higher than the ANN predictive model.
93. THE DYNAMIC OF BANKING NETWORK TOPOLOGY
Case Study: Indonesian Presidential Election Event
ABSTRACT - Information and communication technologies have brought major changes in data storage and processing. Various types and high volume of
data has been digitalised and support mining-based data processing to provide knowledge in a modern and efficient way. Banking transaction data has been
stored digitally and suitable for the mining process especially in network science model. Understanding transaction system risk requires fundamental study on
payments flow and bank behaviour in various situations. Lehman Brother’s failure spread contagion impact in a short time indicates that financial markets
have interdependent properties and connected to each other in a large network. Thus, overall system network approach becomes more important than a single
bank. Political conditions greatly affect economic stability including the banking and financial sectors. Presidential election is a major political event for a
nation. This affected on community sentiment and financial market. However, the linkage between political events and topological changes is poorly
understood. This research presents an insight of the event driven dynamic network topology with banking transaction as a case study. We search for the
banking transaction network topology dynamic driven by 2014 Indonesian presidential election event. We discover that banks are more engaged to others in
larger value 3 days before the end of campaign period and less engaged to others in smaller value in the end of campaign period. Unique transaction activity
between banks remain stable with low declination in the end of campaign period. This scenario provides the possibility to learn the banking transaction
pattern and support the financial system stability supervision.
94. A COMPARATIVE STUDY OF EMPLOYEE CHURN PREDICTION
MODEL
Abstract - Churn phenomenon commonly occurs in customer loyalty towards
brand product or services. They becomes critical issue that any industry
would make best effort to avoid. Churn problem may arise within the
organisation, called employee churn. Employee churn creates myriad and
adverse effects to the organisation as it correlates with unfairly workload
distribution, great deal of money lost and also extra time needed to find a
replace, which may result in the rise of customer dissatisfaction rate. The
purpose of this study is to find the best model to predict employee churn. A
successful prediction model for employee churn is significantly needed in
order to avert various negative impacts for the organisation. There are three
popular classification models for prediction, namely naïve bayes, decision
tree, and random forest. This study compares performance of the
aforementioned models by using Human Resource Information System
(HRIS) from one of Indonesia’s renowned telecommunication company. The
data collected for the study spans for 2 years period, started from 2015 until
2017. The findings from the study suggest that the best classification model is
random forest due to its immense accuracy of 97.5%. The second-best
method is naïve bayes with 96.6%, and the lowest accuracy of classification
model is decision tree with 88.7%. The study concludes that the most reliable
and accurate classification model to predict employee churn is random forest
96. STATISTICS vs. DATA ANALYTICS
Statistics                   Data Analytics
Confirmatory                 Exploratory
Small Data Set               Large Data Set
Small Number of Variables    Large Number of Variables
Deductive (no predictions)   Inductive
Numeric Data                 Numeric and Non-Numeric Data
Clean Data                   Data Cleaning
Complementary Methods
97. Opportunities
• big data provides granular, micro-level data
• big data provides a relatively fast and cheap process
• research opportunities in data science methods, implementation and evaluation
maturity
• data scientists help big data initiatives towards future and sustainable economic
activities
• uncovering hidden truths and democratisation by data are the primary objectives of data
scientists
Challenges
• hard to find data scientist talent
• high cost to maintain data scientist talent
• big data are often population studies, so no sampling error => methods familiarity
• benefit > data quality + costs + security
• ML result credibility (different algorithm, different conclusion)