
Introduction

In this study, computationally intelligent methods have been developed for profiling documents that contain textual data. The main goal has been to locate and identify groups of similar documents.

Problems - Data Complexity

The number of documents is usually large
The vocabulary may be very large
Textual data may contain several languages and spelling errors
Finnish adds at least compound words and case endings, and possibly several dialects

Solutions

Neural Data Analysis Environment
Self-Organizing Maps (TS-SOM)
Multi-stage analysis model
- Word similarity detection and grouping
- Contexts are formed according to the use of words in sentences
- Features representing whole documents are used to train a concluding SOM
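
The three stages above can be illustrated with a small sketch. It is not the implementation used in this study: it swaps the TS-SOM for a plain one-dimensional SOM, assumes that a word's context is its immediate left and right neighbours in a sentence, and assumes that a document feature is a normalised histogram of the word-map units its words fall on. All function names (train_som, bmu, context_vectors, document_features) and the toy documents are hypothetical.

```python
import math
import random


def bmu(units, x):
    """Return the index of the best-matching unit (smallest squared distance)."""
    return min(range(len(units)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(units[i], x)))


def train_som(data, n_units, epochs=50, seed=0):
    """Train a plain 1-D SOM (assumption: not the tree-structured TS-SOM)."""
    rng = random.Random(seed)
    dim = len(data[0])
    units = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs            # decays from 1 towards 0
        lr = 0.5 * frac                        # learning rate
        radius = max(0.5, (n_units / 2) * frac)  # neighbourhood width
        for x in data:
            b = bmu(units, x)
            for i, w in enumerate(units):
                # Gaussian neighbourhood around the winning unit
                h = math.exp(-((i - b) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return units


def context_vectors(sentences):
    """Stages 1-2 (assumed form): describe each word by its neighbouring words."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    counts = {w: [0.0] * len(vocab) for w in vocab}
    for s in sentences:
        for i, w in enumerate(s):
            for j in (i - 1, i + 1):
                if 0 <= j < len(s):
                    counts[w][index[s[j]]] += 1.0
    vecs = {}
    for w, v in counts.items():
        norm = math.sqrt(sum(c * c for c in v)) or 1.0
        vecs[w] = [c / norm for c in v]        # unit-length context vector
    return vecs


def document_features(doc, word_vecs, word_som):
    """Stage 3 (assumed form): histogram of word-map units hit by the document."""
    hist = [0.0] * len(word_som)
    for sentence in doc:
        for w in sentence:
            hist[bmu(word_som, word_vecs[w])] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]


# Toy corpus: each document is a list of tokenised sentences (made-up data).
docs = [
    [["neural", "maps", "cluster", "documents"]],
    [["maps", "cluster", "words"]],
]
sentences = [s for d in docs for s in d]
word_vecs = context_vectors(sentences)
word_som = train_som(list(word_vecs.values()), n_units=3)
features = [document_features(d, word_vecs, word_som) for d in docs]
doc_som = train_som(features, n_units=2)       # the concluding SOM
groups = [bmu(doc_som, f) for f in features]   # map unit assigned to each document
```

Documents that land on the same or neighbouring units of the concluding map are the candidates for a group of similar documents. Real data would also need tokenisation and handling of spelling variants before the first stage, which this sketch leaves out.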

Anssi Lensu
Tue Oct 27 15:29:16 EET 1998