
Coding via fuzzy memberships

In Gallup-style opinion polls, answers to questions are often multiple-choice values coded as integers, where the numeric magnitude has no importance for the analysis. The analysis must therefore use some other coding that treats all choices equally, while still preserving the ordering of neighbouring choices in the original coding.

In statistics, the usual solution is dummy coding (binary coding), where each choice of a question is taken as a new variable. The values of the dummy variables can be understood as truth values 1 and 0, depending on whether or not that particular choice was selected.
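As a minimal illustration of dummy coding, the following Python sketch turns one integer-coded answer into a vector of 0/1 truth values. The function name `dummy_code` and the 1-based numbering of choices are assumptions made for this example only.

```python
def dummy_code(answer, n_choices):
    """Return a 0/1 vector with a single 1 at the selected choice.

    Choices are assumed to be numbered 1..n_choices (hypothetical convention).
    """
    if not 1 <= answer <= n_choices:
        raise ValueError("answer must be within 1..n_choices")
    # One new binary variable per choice: 1 if selected, 0 otherwise.
    return [1 if k == answer else 0 for k in range(1, n_choices + 1)]


print(dummy_code(3, 6))  # [0, 0, 1, 0, 0, 0]
```

Note that the vectors for choices 3 and 4 are exactly as far apart as those for choices 1 and 6, so this coding carries no information about which choices are neighbours.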

The problem with dummy coding is that it ignores the ordering of the choices, as well as the fact that people can be uncertain about their answers. Our solution is to generalize the coding from binary truth values to real-valued memberships, adopted from fuzzy set theory [7, 8]. The basic idea is depicted in figure 1. An answer choice within the range 1..6 is coded as a vector of length 6 whose elements are truth values between 0 and 1.

Figure 1: Preprocessing of original multi-choice field values.
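
The membership coding can be sketched in Python as follows. The exact shape of the membership functions is not recoverable from the text here, so this example assumes a triangular membership centred on the selected choice; the function name `fuzzy_code` and the `width` parameter are hypothetical and chosen only for illustration.

```python
def fuzzy_code(answer, n_choices=6, width=2.0):
    """Code an answer in 1..n_choices as a vector of memberships in [0, 1].

    Assumption: triangular membership, equal to 1 at the selected choice and
    falling linearly to 0 at a distance of `width` choices.
    """
    if not 1 <= answer <= n_choices:
        raise ValueError("answer must be within 1..n_choices")
    if width <= 0:
        raise ValueError("width must be positive")
    # Membership decreases with the distance between choice k and the answer.
    return [max(0.0, 1.0 - abs(k - answer) / width)
            for k in range(1, n_choices + 1)]


print(fuzzy_code(3))             # [0.0, 0.5, 1.0, 0.5, 0.0, 0.0]
print(fuzzy_code(3, width=1.0))  # [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
```

Under this assumption, neighbouring choices receive partial membership, so nearby answers produce similar vectors, while `width=1.0` reduces the coding to ordinary binary truth values.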



Anssi Lensu
Tue Nov 3 12:18:16 EET 1998