No 1: Ontology-Driven Text Mining for Digital Forensics
Supervisors: Dr Warren Jin and Dr Nianjun Liu
Period: Semesters 1 and 2, 2007
The use of digital devices such as computers, the Internet, personal digital assistants
(PDAs), cell phones and cameras as sources of evidence in terrorism, fraud,
white-collar crime and other criminal investigations has increased steadily in
recent years. Digital forensics involves understanding specific aspects of digital
evidence and the general forensic procedures used when analysing any form of digital
evidence. Digital evidence is any information of probative value that is either
stored or transmitted in binary form, such as emails, office documents, computer
system log files, and digital audio and video. It can be used to decide whether a
crime has been committed and can provide a link between a crime and its victim or
between a crime and its perpetrator.
This project aims to implement (or develop) effective text mining techniques for
analysing textual information, especially emails and computer system log files, by
incorporating a terminological ontology. An ontology supplies the necessary background
knowledge; for example, it can record that the terms “ball”, “football” and
“basketball” are semantically related to each other. Driven by the ontology, textual
information can then be indexed, summarised or analysed semantically. This enables the
text mining techniques to highlight the most interesting textual evidence
automatically and effectively during a digital forensic investigation.
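The ontology-driven indexing idea above can be sketched in a few lines of Python. This is a minimal illustration, not the project's method: the three-entry `ONTOLOGY` dictionary is a hypothetical stand-in for a real terminological ontology, and `related_terms` and `rank_documents` are illustrative helpers that expand a query term to its broader concept and everything beneath it, then rank documents by how many related terms they contain.

```python
import re
from collections import defaultdict

# Hypothetical toy ontology: each term maps to its broader ("is-a") concept.
ONTOLOGY = {
    "football": "ball",
    "basketball": "ball",
    "ball": "sports equipment",
}


def descendants(concept, ontology):
    """Return the concept together with every term beneath it in the hierarchy."""
    children = defaultdict(list)
    for child, parent in ontology.items():
        children[parent].append(child)
    found, stack = set(), [concept]
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(children[node])
    return found


def related_terms(term, ontology=ONTOLOGY):
    """Expand a term to its broader concept and everything under that concept."""
    parent = ontology.get(term)
    if parent is None:
        return descendants(term, ontology)
    return descendants(parent, ontology)


def rank_documents(query, documents, ontology=ONTOLOGY):
    """Score each document by how many of its tokens are semantically related to the query."""
    terms = related_terms(query, ontology)
    scored = []
    for doc_id, text in documents.items():
        tokens = re.findall(r"[a-z]+", text.lower())
        score = sum(1 for tok in tokens if tok in terms)
        scored.append((score, doc_id))
    # Highest score first; ties broken by document id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored


if __name__ == "__main__":
    emails = {
        "email-1": "Bring the basketball to the meeting.",
        "email-2": "Quarterly invoice attached.",
        "email-3": "Is the football in the car? The ball is new.",
    }
    print(rank_documents("football", emails))
    # prints [(2, 'email-3'), (1, 'email-1'), (0, 'email-2')]
```

A plain keyword search for “football” would miss `email-1`; the ontology surfaces it because “basketball” sits under the same concept. A real system would also weight terms (for example by TF-IDF) rather than simply counting them.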
The project involves various techniques such as text mining, temporal data mining,
information retrieval, machine learning and statistics. Applicants are expected to have
a major in information technology, computer engineering, computer science or
electrical engineering, preferably with excellent programming skills (C/C++, Java,
and/or Python). Applicants who are interested in research are also welcome,
preferably with a strong background in information retrieval, temporal data mining,
statistics and/or optimisation.
Contact Dr Warren Jin (Huidong.Jin@nicta.com.au)
No 2: Apply Data Mining Techniques for Cyber Intrusion Detection
Supervisors: Dr Warren Jin and Dr Nianjun Liu
Period: Semesters 1 and/or 2, 2007
Intrusion detection is the process of monitoring the events occurring in a computer
system or network and analysing them for signs of intrusions, which are defined as
attempts to bypass the security mechanisms of a computer or network (“compromise
the confidentiality, integrity, availability of information resources”). Due to the
proliferation of high-speed Internet access, more and more organisations are
becoming vulnerable to time-varying cyber attacks (intrusions). Most existing
intrusion detection systems rely on extensive knowledge, supplied by human experts,
of the patterns associated with known attacks. As a result, they are unable to detect
novel and unanticipated attacks.
This project aims to apply data mining techniques to learn real-time profiles that
represent the normal behaviour of users, hosts or networks, and then to detect attacks
as significant deviations from these profiles. The project will implement (or develop)
an efficient learning technique to build stochastic process models from large volumes
of network access data, such as source IP address and port, protocol type and access
timestamps. The dataset may be gigabytes in size. The stochastic process model may
take the form of temporal association rules, sequential patterns, dynamic Bayesian
networks or a mixture of Markov models. The technique will be evaluated on real-world
network intrusion data. Its performance, together with any detected signs of
intrusion, will be visualised to help non-experts understand the results.
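As a rough illustration of profile-based detection, the sketch below fits a single first-order Markov chain (the simplest member of the Markov-model family mentioned above, not a mixture) to sessions of normal traffic and flags new sessions whose average per-transition log-likelihood falls below a threshold. The event labels, the `MarkovProfile` class and the threshold rule are all hypothetical choices for this example.

```python
import math
from collections import Counter, defaultdict


class MarkovProfile:
    """First-order Markov chain over discrete network events, fitted to normal traffic only."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing          # add-k (Laplace) smoothing constant
        self.counts = defaultdict(Counter)  # counts[prev][nxt] = observed transitions
        self.vocab = set()

    def fit(self, sessions):
        for session in sessions:
            self.vocab.update(session)
            for prev, nxt in zip(session, session[1:]):
                self.counts[prev][nxt] += 1
        return self

    def log_prob(self, prev, nxt):
        row = self.counts.get(prev, Counter())
        # One extra outcome reserves probability mass for events never seen in training.
        outcomes = len(self.vocab) + 1
        numerator = row[nxt] + self.smoothing
        denominator = sum(row.values()) + self.smoothing * outcomes
        return math.log(numerator / denominator)

    def score(self, session):
        """Average log-likelihood per transition; lower means less like normal behaviour."""
        pairs = list(zip(session, session[1:]))
        if not pairs:
            return 0.0
        return sum(self.log_prob(p, n) for p, n in pairs) / len(pairs)


def detect(profile, sessions, threshold):
    """Indices of sessions whose score falls below the threshold (significant deviations)."""
    return [i for i, s in enumerate(sessions) if profile.score(s) < threshold]


if __name__ == "__main__":
    # Hypothetical event encoding: each session is a list of coarse event labels.
    normal = [["dns", "http", "http", "dns", "http"]] * 20 + [["dns", "ssh_ok", "http"]] * 5
    profile = MarkovProfile().fit(normal)
    # Heuristic threshold: the lowest score seen on the normal training sessions.
    threshold = min(profile.score(s) for s in normal)
    incoming = [["dns", "http", "http"], ["ssh_fail", "ssh_fail", "ssh_fail", "ssh_fail"]]
    print(detect(profile, incoming, threshold))  # prints [1]
```

Taking the minimum training score as the threshold is only a heuristic; in practice the threshold would be tuned on labelled intrusion data, and a mixture of Markov models would let several distinct user or host profiles coexist.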
The project involves various techniques, including temporal data mining, time series
analysis and computer network security. Applicants are expected to have a major in
information technology, computer science, computer engineering or electrical
engineering, preferably with excellent programming skills (MATLAB, C/C++, R and/or
Python) for implementation. Applicants who are interested in research are also
welcome, preferably with a strong background in data mining, statistical machine
learning, artificial intelligence and/or time series analysis.
Contact Dr Warren Jin (Huidong.Jin@nicta.com.au)
No 3: Apply Dynamical Bayesian Network to Query Digital Forensics
Period: Semesters 1 and 2, 2007
Supervisors: Dr Nianjun Liu and Dr Warren Jin
Digital forensics undertakes the post-mortem reconstruction of the causal sequence of
events arising from an intrusion perpetrated by one or more external agents, or as a
result of unauthorised activities generated by authorised users, in one or more digital
systems. The field of digital forensics covers a broad set of applications, uses a variety
of evidence and is supported by a number of techniques. Application areas include
forensic accounting, law enforcement, commodity flow analysis and threat analysis.
Forensic investigations often focus on unusual and interesting events that may not
have arisen previously. A major objective of a digital investigation is to extract these
interesting pieces of evidence and to identify the causal relationships among them.
This project aims to extend an existing Dynamical Bayesian Network model developed
for digital forensics by investigating three possible topics. The existing model uses
a combined Bayesian network and hidden Markov model (BN+HMM) structure to (i)
estimate typical digital crime scenario models from data and (ii) given such models,
infer the most likely criminal act from current observations and past criminal acts.
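Step (ii) can be illustrated, for the hidden Markov model part alone, with the forward (filtering) algorithm, which computes the probability of each hidden criminal act at time t given all evidence observed up to t. The states, observation types and probability tables below are invented for illustration; they do not reproduce the existing BN+HMM model, and its Bayesian network part is omitted here.

```python
# Hypothetical hidden states (criminal acts) and observable evidence types.
STATES = ["reconnaissance", "break_in", "exfiltration"]
START = {"reconnaissance": 0.8, "break_in": 0.15, "exfiltration": 0.05}
TRANSITION = {
    "reconnaissance": {"reconnaissance": 0.6, "break_in": 0.35, "exfiltration": 0.05},
    "break_in": {"reconnaissance": 0.1, "break_in": 0.5, "exfiltration": 0.4},
    "exfiltration": {"reconnaissance": 0.05, "break_in": 0.15, "exfiltration": 0.8},
}
EMISSION = {
    "reconnaissance": {"port_scan": 0.7, "failed_login": 0.2, "large_upload": 0.1},
    "break_in": {"port_scan": 0.1, "failed_login": 0.7, "large_upload": 0.2},
    "exfiltration": {"port_scan": 0.05, "failed_login": 0.15, "large_upload": 0.8},
}


def filter_states(observations):
    """Forward algorithm: P(act at time t | evidence up to t), normalised at each step."""
    belief = {s: START[s] * EMISSION[s][observations[0]] for s in STATES}
    history = []
    for t, obs in enumerate(observations):
        if t > 0:
            # Predict with the transition model, then weight by the new evidence.
            belief = {
                s: EMISSION[s][obs] * sum(belief[p] * TRANSITION[p][s] for p in STATES)
                for s in STATES
            }
        total = sum(belief.values())
        belief = {s: b / total for s, b in belief.items()}
        history.append(belief)
    return history


def most_likely_current_act(observations):
    """Most probable hidden act at the latest time step."""
    belief = filter_states(observations)[-1]
    return max(belief, key=belief.get)


if __name__ == "__main__":
    evidence = ["port_scan", "failed_login", "large_upload"]
    print(most_likely_current_act(evidence))  # prints exfiltration
```

Filtering answers "what is most likely happening now"; if the whole most likely sequence of past acts is wanted instead, Viterbi decoding over the same tables is the usual alternative.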
The first topic is the addition of multiple forward links to the BN+HMM network
structure, allowing direct links between BN nodes at consecutive time intervals. The
second is the design of an SQL-based query tool to explore the activities of criminals
and their interactions, explaining what happened in the past and predicting what will
happen in the near future. The third is the application of graphical models to data
mining of relational digital forensic databases, including the construction of a
relational pattern structure database for known types of digital crime portfolios
and their associated forensic Bayesian network models.
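A query tool of the kind described in the second topic could start from plain SQL over an event table. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `events` schema (actor, target, action, timestamp); the `INTERACTIONS` query finds pairs of actors who acted on the same target within a time window, and `TIMELINE` replays one actor's activity. It covers only the "explain what happened" side; prediction would need the probabilistic model.

```python
import sqlite3

# Hypothetical schema: one row per logged action linking an actor to a target.
SCHEMA = """
CREATE TABLE events (
    id INTEGER PRIMARY KEY,
    actor TEXT NOT NULL,
    target TEXT NOT NULL,
    action TEXT NOT NULL,
    ts INTEGER NOT NULL  -- seconds since the start of the investigation window
);
"""

# Pairs of actors who acted on the same target within :window seconds of each other.
INTERACTIONS = """
SELECT a.actor AS first_actor, b.actor AS second_actor, a.target, COUNT(*) AS n
FROM events AS a
JOIN events AS b
  ON a.target = b.target
 AND a.actor < b.actor
 AND ABS(a.ts - b.ts) <= :window
GROUP BY a.actor, b.actor, a.target
ORDER BY n DESC, first_actor, second_actor;
"""

# Chronological activity of one actor inside a time range.
TIMELINE = """
SELECT ts, action, target FROM events
WHERE actor = :actor AND ts BETWEEN :start AND :end
ORDER BY ts;
"""


def open_db(rows):
    """Load (actor, target, action, ts) tuples into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO events (actor, target, action, ts) VALUES (?, ?, ?, ?)", rows
    )
    return conn


if __name__ == "__main__":
    rows = [
        ("alice", "server7", "login", 100),
        ("bob", "server7", "copy", 130),
        ("alice", "server7", "delete_logs", 160),
        ("carol", "printer", "print", 500),
    ]
    conn = open_db(rows)
    print(conn.execute(INTERACTIONS, {"window": 60}).fetchall())
    # prints [('alice', 'bob', 'server7', 2)]
    print(conn.execute(TIMELINE, {"actor": "alice", "start": 0, "end": 200}).fetchall())
    # prints [(100, 'login', 'server7'), (160, 'delete_logs', 'server7')]
```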
Applicants must have a major in information technology, computer science or
electrical engineering, preferably with excellent programming abilities (MATLAB,
C/C++ and Java), OR a strong mathematical, machine learning, data mining or
statistics background.
Contact Dr Nianjun Liu (nianjun.liu@nicta.com.au)
No 4: Intelligent Environmental Query on Spatial Data
Period: Semesters 1 or 2, 2007
Supervisors: Dr Nianjun Liu and Dr Warren Jin
Analysis of spatial information is crucial to support decision making in natural
resource management. However, with the advent of various technologies for acquiring
the data, analysing multiple spatial datasets has become very challenging. These
technologies produce data with different accuracies and resolutions. Beyond these
multiple representations of spatial data, evidence about an area may come from
different times and from different observers' viewpoints, which makes combining the
evidence quite complicated. Combining spatial data or evidence is therefore not simply
a matter of merging evidence from different technologies; it also means combining
evidence across multiple criteria.
The Australian Bureau of Rural Sciences (BRS) has developed a system known as the
Multi-Criteria Analysis Shell for Spatial Decision Support (MCAS-S). This project
will add an option to use (Dynamical) Bayesian network approaches to model multiple
types of evidence for intelligent environmental queries and decision support.
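To show how a Bayesian approach can combine evidence of differing accuracy, the sketch below assumes that each data source independently reports a land-cover class for one grid cell with a known overall accuracy, and applies Bayes' rule under conditional independence. That is a naive Bayes network, the simplest Bayesian network. The classes, accuracies and priors are hypothetical, and this is not MCAS-S code.

```python
# Hypothetical land-cover classes for a single grid cell.
CLASSES = ["forest", "grassland", "water"]


def likelihood(reported, true_class, accuracy):
    """P(source reports `reported` | cell is `true_class`), assuming errors spread evenly."""
    if reported == true_class:
        return accuracy
    return (1.0 - accuracy) / (len(CLASSES) - 1)


def combine(prior, reports):
    """Posterior over classes given (reported_class, source_accuracy) pairs.

    Assumes sources are conditionally independent given the true class,
    i.e. a naive Bayes network with the true class as the single parent.
    """
    posterior = dict(prior)
    for reported, accuracy in reports:
        for c in CLASSES:
            posterior[c] *= likelihood(reported, c, accuracy)
    total = sum(posterior.values())
    return {c: p / total for c, p in posterior.items()}


if __name__ == "__main__":
    prior = {"forest": 0.5, "grassland": 0.4, "water": 0.1}
    # A coarse, older layer says grassland; a recent, more accurate survey says forest.
    reports = [("grassland", 0.7), ("forest", 0.9)]
    post = combine(prior, reports)
    print({c: round(p, 3) for c, p in post.items()})
    # prints {'forest': 0.821, 'grassland': 0.17, 'water': 0.009}
```

The more accurate survey dominates the posterior, which is the behaviour wanted when sources differ in resolution and date. A full (dynamical) Bayesian network would add dependencies between sources and links across time slices.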
The project involves various techniques: image processing to preprocess the spatial
GIS data, machine learning, pattern recognition and probability theory. Applicants
must have a major in information technology, computer science or electrical
engineering, preferably with excellent programming abilities (MATLAB, C/C++ and
Java), OR a strong mathematical, machine learning, data mining, statistics or GIS
background.
Contact Dr Nianjun Liu (nianjun.liu@nicta.com.au)
No 5: Intelligent Land Planning on Relational Spatial Data
Period: Semesters 1 or 2, 2007
Supervisors: Dr Nianjun Liu and Dr Warren Jin
Multi-criteria approaches to the analysis of complex issues in environmental decision
systems have found wide application across business, government and communities
around the world. Such approaches can readily be applied to land planning, which is a
prerequisite for the development of a city, town or suburb. Generally, planners collect
a range of information about an area, including its natural resources, topography,
demographics, political issues, economic characteristics and proximity to neighbouring
settlements and services, and combine this information to make planning decisions.
Computer-aided multi-criteria decision support tools allow alternatives or options to
be measured and analysed across a variety of qualitative and quantitative dimensions.
The project involves collaboration with the ACT Planning and Land Authority
(ACTPLA) to present a sample demonstration of an Intelligent Land Planning tool
using Bayesian Networks. Specifically, it will aim to develop a tool for the selection
of optimal sites for community services within existing areas of settlement in the
ACT, including recreational facilities, schools, childcare centres, aged care facilities
and community centres. The factors of interest include existing settlement patterns,
demographics (current and anticipated), future development, available land resources,
existing services and community need.
Bayesian networks are designed for situations where there is uncertainty about the
evidence and about how it should be combined in decision making. The proposed approach
is a supervised strategy in which experts provide known thematic layers of the land
cover GIS spatial database as examples of known successful decisions. The Bayesian
networks are then trained to optimise their predictions of such decisions, with the
aim of applying the optimal decision model to new situations or scenarios.
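The supervised training step can be sketched, under strong simplifying assumptions, with a fixed naive Bayes structure in which the expert decision (`True` for an approved site) is the parent of every discretised factor, and the conditional probability tables are estimated by counting with Laplace smoothing. The factor names, values and past decisions are hypothetical; the project itself would also learn the network structure rather than fixing it.

```python
from collections import Counter, defaultdict


class NaiveBayesSiteModel:
    """Decision node (approved or not) as the parent of every discretised factor."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing so unseen factor values keep non-zero probability

    def fit(self, sites, decisions):
        # Assumes every site record lists the same set of factors.
        self.labels = Counter(decisions)      # prior counts of approved/rejected sites
        self.counts = defaultdict(Counter)    # (factor, label) -> counts of each value
        self.values = defaultdict(set)        # factor -> values observed in training
        for site, label in zip(sites, decisions):
            for factor, value in site.items():
                self.counts[(factor, label)][value] += 1
                self.values[factor].add(value)
        return self

    def prob_suitable(self, site):
        """Posterior probability that `site` would be an expert-approved (True) decision."""
        n = sum(self.labels.values())
        joint = {}
        for label, count in self.labels.items():
            p = count / n
            for factor, value in site.items():
                table = self.counts.get((factor, label), Counter())
                outcomes = len(self.values.get(factor, ())) + 1  # +1 for unseen values
                p *= (table[value] + self.alpha) / (count + self.alpha * outcomes)
            joint[label] = p
        return joint.get(True, 0.0) / sum(joint.values())


if __name__ == "__main__":
    # Hypothetical past sites, approved (True) or rejected (False) by experts.
    sites = [
        {"density": "high", "nearest_school": "far", "land": "available"},
        {"density": "high", "nearest_school": "far", "land": "available"},
        {"density": "medium", "nearest_school": "far", "land": "available"},
        {"density": "low", "nearest_school": "near", "land": "available"},
        {"density": "high", "nearest_school": "near", "land": "constrained"},
        {"density": "low", "nearest_school": "near", "land": "constrained"},
    ]
    decisions = [True, True, True, False, False, False]
    model = NaiveBayesSiteModel().fit(sites, decisions)
    candidate = {"density": "high", "nearest_school": "far", "land": "available"}
    print(round(model.prob_suitable(candidate), 3))  # prints 0.923
```

Refitting the model as experts approve or reject further sites is one simple form of continuous parameter learning; structure learning is outside the scope of this sketch.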
After exploring the ACT spatial database and other data sources, the scholar will
identify the relevant decision factors with the aid of ACTPLA experts and then build
the Bayesian network model. Through iterative data mining on the relational database,
the model will be continuously refined, with its structure and parameters adjusted
accordingly. Finally, the scholar will create the new tool and test it in a real-world
context within ACTPLA.
The project involves various techniques: spatial relational databases, Structured
Query Language (SQL), machine learning, pattern recognition and time series analysis.
Applicants must have a major in information technology, computer science or electrical
engineering, preferably with excellent programming abilities (C/C++ and Java), OR a
strong mathematical, machine learning, data mining, statistics or demography background.
Contact Dr Nianjun Liu (nianjun.liu@nicta.com.au)