
Methodological Problems

This section discusses a few typical questions and problems related to performing qualitative data analysis [3] in general. Usually there is some prior assumption about which groups of people exist. Sometimes these groups can be found in the data set and sometimes not. Even when the expected groups are found, other existing groups may slip through the analysis unnoticed. This is a common problem of analysis methods that do not use the actual data to reveal the existing groups.

Another problem arises from the high dimensionality of the data being investigated. People performing the analyses usually lose their ability to understand data with more than three dimensions. Therefore, the dimensionality of the data has to be reduced to two or three dimensions.

Figure 2: a) Problem of finding relevant groups. b) Dimensionality reduction is needed for visualizations.

In addition to neural networks, there exist several statistical dimensionality reduction methods, for example factor analysis, Sammon mapping and classical scaling. The last two try to preserve the original distances between data points when mapping from n-dimensional space to two dimensions. However, if the data is really complex or there are a dozen or more separate groups, these methods usually lose their ability to generalize properly.
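What "preserving the original distances" means can be made concrete with the stress measure that Sammon mapping minimizes. The following is a minimal sketch that only evaluates this measure for a given projection; it does not perform the optimization itself. The function name `sammon_stress` and the sample points are illustrative assumptions, not part of any particular software package.

```python
import math
from itertools import combinations


def sammon_stress(high, low):
    """Sammon's stress between original points and their low-dimensional images.

    0.0 means all pairwise distances are preserved exactly; larger values
    mean the projection distorts the original distances more.
    """
    total = 0.0   # sum of original pairwise distances (normalizer)
    error = 0.0   # weighted squared differences between distances
    for i, j in combinations(range(len(high)), 2):
        d_orig = math.dist(high[i], high[j])
        d_proj = math.dist(low[i], low[j])
        if d_orig == 0.0:
            continue  # coincident points carry no distance information
        total += d_orig
        error += (d_orig - d_proj) ** 2 / d_orig
    return error / total if total else 0.0


# Hypothetical data: four points in 3-D that all lie in the plane z = 0.
points_3d = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
flat = [(x, y) for x, y, _ in points_3d]        # drop z: distances unchanged
squashed = [(x, 0.0) for x, _, _ in points_3d]  # drop y as well: distorted

print(sammon_stress(points_3d, flat))
print(sammon_stress(points_3d, squashed))
```

The first projection keeps every pairwise distance, so its stress is zero. The second collapses points that were separate in the original space, which is the kind of distortion that can hide groups in a two-dimensional picture.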

The use of proper analysis methods can also reveal deficiencies in the original questionnaire. If a statement is formulated as "I get irritated once in a while", the respondent could disagree strongly for two different reasons: he or she either "never gets irritated" or "gets irritated all the time", and both lead to the same answer. This type of statement does not reveal how often the person gets irritated, which might have been its purpose.

None of the generally known analysis methods generates good statements or questions, but some methods can reveal which questions are irrelevant or poorly formulated. For instance, if a distinct group of people has given both positive and negative opinions on a statement, that statement should be checked by the person doing the analysis. The analysis method itself cannot know for sure whether this results from there being two separate groups or from a poorly chosen question.
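The check described above can be sketched as a simple screening step. The sketch assumes answers on a five-point scale (1 = strongly disagree, 5 = strongly agree); the thresholds, the function name `split_statements` and the sample answers are hypothetical. It only flags candidates for manual review and, as noted above, cannot decide why the opinions are split.

```python
def split_statements(responses, low=2, high=4, min_share=0.3):
    """Return statements on which one group is split between agree and disagree.

    responses maps a statement label to the list of answers (1-5) given by
    the members of a single group. A statement is flagged when at least
    min_share of the group disagrees (answer <= low) and at least
    min_share agrees (answer >= high).
    """
    flagged = []
    for statement, answers in responses.items():
        if not answers:
            continue  # nobody answered this statement
        disagree = sum(a <= low for a in answers) / len(answers)
        agree = sum(a >= high for a in answers) / len(answers)
        if disagree >= min_share and agree >= min_share:
            flagged.append(statement)
    return flagged


# Hypothetical answers from one group found by the analysis.
group = {
    "I get irritated once in a while": [1, 1, 5, 5, 2, 4],
    "I enjoy my work": [4, 5, 4, 5, 4, 4],
}
print(split_statements(group))
```

Here only the first statement is flagged, because half of the group disagrees and half agrees. The analyst still has to decide whether the group should be divided or the statement reworded.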

If the opinions on a statement vary from person to person and show no noticeable pattern, this could also result from the irrelevance of the statement. But this kind of behavior could also result from having too many input variables compared to the size of a neural network, for instance, or from poorly chosen input variable values. Some neural network methods do not behave consistently if some input variables have a large range and others only a small one.
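One common way to reduce the effect of differing ranges is to rescale every input variable to the same interval before the analysis. The sketch below uses min-max scaling to [0, 1]; this is one option among several, not a method prescribed by the text, and the function name `min_max_scale` and the sample rows are assumptions made for illustration.

```python
def min_max_scale(rows):
    """Rescale each column of rows linearly to the interval [0, 1].

    rows is a list of equally long lists, one per respondent. A column
    whose values are all equal is mapped to 0.0, since it has no range.
    """
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    scaled = []
    for row in rows:
        scaled.append([
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        ])
    return scaled


# Hypothetical inputs: an opinion on a 1-5 scale and a yearly income.
rows = [[1, 20000], [3, 50000], [5, 80000]]
scaled = min_max_scale(rows)
print(scaled)
```

After scaling, both variables span the same range, so the income column no longer dominates a distance computed between respondents merely because of its units.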

All of these problems suggest that an interactive, visual and iterative analysis process could be best for finding the irrelevant and poor questions.



Anssi Lensu
Tue Nov 3 12:42:42 EET 1998