4. Text mining
Text mining developed tools and methods to help scientists
Focused mainly on the body of the article
Tables and figures are typically ignored
7. Challenge
Visually structured text
May be ungrammatical and ambiguous
Various layouts
Value representation types
◦ Numeric
◦ Text
◦ Ranges
◦ Formulas
◦ Complex
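The value types above can be illustrated with a small classifier. This is only a sketch under assumptions: the regular expressions and the function name `classify_value` are hypothetical, "Complex" is taken here to mean any cell that mixes digits with other content, and "Formulas" is left out because the slides do not say how formulas are recognised.

```python
import re

# Hypothetical patterns; the slides name the value types but not how to detect them.
NUMBER = r"[-+]?\d+(?:\.\d+)?"
NUMERIC_RE = re.compile(NUMBER)
RANGE_RE = re.compile(rf"{NUMBER}\s*(?:-|–|to)\s*{NUMBER}")


def classify_value(cell: str) -> str:
    """Assign a table cell to one of the value representation types."""
    text = cell.strip()
    if not text:
        return "empty"
    if NUMERIC_RE.fullmatch(text):
        return "numeric"
    if RANGE_RE.fullmatch(text):
        return "range"
    if not any(ch.isdigit() for ch in text):
        return "text"
    # Assumption: anything mixing digits with other content counts as complex,
    # e.g. a mean with a deviation or a count with a percentage.
    return "complex"


if __name__ == "__main__":
    for example in ["27.4", "18-65", "Placebo", "27.4 ± 3.1", "45 (23%)"]:
        print(example, "->", classify_value(example))
```

The order of the checks matters: a plain number is tested before a range so that a negative value such as `-5` is not mistaken for an incomplete range.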
10. Table decomposition
Aim: decompose the table into structures suitable for further processing
Cell structures that keep information about the navigational path (headers, stubs, etc.)
Heuristic-based approach
Cell structure, alignment, content, neighbourhood
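A cell structure that keeps its navigational path might look like the sketch below. It does not implement the heuristics mentioned above. Instead it assumes the simplest possible layout (first row holds the column headers, first column holds the stubs). The names `Cell` and `decompose` and the sample table are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    value: str
    row: int
    col: int
    header: str  # column header above the cell
    stub: str    # row label to the left of the cell


def decompose(table: List[List[str]]) -> List[Cell]:
    """Turn a grid of strings into cells that remember their header and stub.

    Assumption: row 0 contains headers and column 0 contains stubs. Real
    tables need heuristics to find these; this sketch skips that step.
    """
    headers = table[0]
    cells = []
    for r, row in enumerate(table[1:], start=1):
        stub = row[0].strip()
        for c, value in enumerate(row[1:], start=1):
            header = headers[c].strip() if c < len(headers) else ""
            cells.append(Cell(value.strip(), r, c, header, stub))
    return cells


if __name__ == "__main__":
    # Invented example data, not taken from any article.
    table = [
        ["Characteristic", "Treatment", "Placebo"],
        ["Number of patients", "120", "118"],
        ["BMI (kg/m2)", "27.4 ± 3.1", "26.9 ± 2.8"],
    ]
    for cell in decompose(table):
        print(cell)
```

Each data cell can then be processed on its own, because the header and stub needed to interpret it travel with the value.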
12. Information extraction
Performed a number of experiments
Extraction of number of patients, weight, BMI
Approaches:
◦ Rules
◦ MetaMap
◦ White and black lists
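The rule-based and list-based approaches could be combined roughly as follows. This is a sketch, not the experiments' actual rules: the cue phrases in `WHITE_LIST` and `BLACK_LIST`, the function names, and the sample rows are all hypothetical, and MetaMap is not used here.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical cue phrases; the real lists are not given in the slides.
WHITE_LIST: Dict[str, List[str]] = {
    "patients": ["number of patients", "no. of patients", "participants"],
    "weight": ["weight"],
    "bmi": ["bmi", "body mass index"],
}
BLACK_LIST: Dict[str, List[str]] = {
    "weight": ["weight loss", "birth weight"],
}
FIRST_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def classify_stub(stub: str) -> Optional[str]:
    """Return the information class a row label points to, if any."""
    label = stub.lower()
    for info_class, cues in WHITE_LIST.items():
        if any(cue in label for cue in cues):
            # A black-listed phrase vetoes the white-list match.
            if any(bad in label for bad in BLACK_LIST.get(info_class, [])):
                continue
            return info_class
    return None


def extract(rows: List[Tuple[str, str, str]]) -> List[dict]:
    """Rule: take the first number in the cell of any row with a known class."""
    results = []
    for stub, header, value in rows:
        info_class = classify_stub(stub)
        match = FIRST_NUMBER.search(value)
        if info_class and match:
            results.append(
                {"class": info_class, "group": header, "value": float(match.group())}
            )
    return results


if __name__ == "__main__":
    # Invented example rows: (stub, header, cell value).
    rows = [
        ("Number of patients", "Treatment", "120"),
        ("BMI (kg/m2)", "Treatment", "27.4 ± 3.1"),
        ("Weight loss (kg)", "Treatment", "3.2"),
        ("Weight (kg)", "Placebo", "81.5 (12.0)"),
    ]
    for item in extract(rows):
        print(item)
```

The white list decides which rows are candidates, and the black list removes look-alikes such as "weight loss" that share a cue word with a target class.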
13. Results
Achieved promising results
Some of the information classes are easier to extract than others
14. Conclusion & Future work
Information extraction from tables is feasible
Future work:
◦ Value and table type categorisation
◦ Development of a normalisation and extraction engine
◦ Extraction rules
◦ Data storage format (triple store, linked data)
◦ Data curation interface
◦ Data querying interface