
THE DEVELOPMENT OF A DONOR IMPUTATION SYSTEM

Heather Wagstaff and Nargis Rahman

Office for National Statistics, Room D212, Drummond Gate,
London, SW1V 2QQ.
Email1: heather.wagstaff@ons.gov.uk
Email2: nargis.rahman@ons.gov.uk

As part of the EUREDIT Project, the Office for National Statistics (ONS) has developed a prototype donor imputation system (DIS) for the imputation of item non-response. The DIS implements the joint imputation method proposed by Fellegi and Holt (1976). The basic principle underlying this method is that all missing items within a record are imputed from a single clean record. Donor imputation methods select values from a wholly valid record, the 'donor', and copy them to fill the missing items of another record, the 'recipient'. The ONS DIS provides a search algorithm that looks for candidate donor records across the whole data matrix using a set of primary matching variables. Once a pool of candidate donors has been found, the nearest neighbour, judged by statistical closeness, is selected to provide all missing items to the recipient. Thus the DIS searches for a single donor for all missing items in a record, although it also offers the option of selecting a different donor for each missing item. A choice of distance functions is available for both categorical and continuous matching variables. Evaluation to date indicates that the DIS achieves good results when a suitable set of matching variables is selected. However, it also indicates that a comprehensive statistical analysis, together with sound knowledge of the data set, is necessary to obtain a good set of predictors.
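
The single-donor, nearest-neighbour idea described above can be illustrated with a minimal Python sketch. It is not the ONS DIS code: the function names (distance, impute), the example variables (region, age, income, hours) and the particular distance (range-scaled absolute difference for continuous variables, 0/1 mismatch for categorical ones) are assumptions chosen purely for illustration, and the abstract does not say which distance functions the DIS actually offers.

```python
from typing import Any, Dict, List, Sequence

Record = Dict[str, Any]  # a value of None marks a missing item


def distance(recipient: Record, donor: Record,
             continuous: Sequence[str], categorical: Sequence[str],
             ranges: Dict[str, float]) -> float:
    """Illustrative mixed distance over the matching variables.

    Matching variables that are missing in the recipient are skipped.
    """
    total = 0.0
    for var in continuous:
        if recipient[var] is not None:
            # Scale by the donor-pool range so variables are comparable.
            total += abs(recipient[var] - donor[var]) / ranges[var]
    for var in categorical:
        if recipient[var] is not None and recipient[var] != donor[var]:
            total += 1.0  # simple mismatch penalty
    return total


def impute(records: List[Record], continuous: Sequence[str],
           categorical: Sequence[str]) -> List[Record]:
    """Fill every missing item of a recipient from its single nearest donor."""
    # Candidate donors are the wholly valid records.
    donors = [r for r in records if all(v is not None for v in r.values())]
    if not donors:
        raise ValueError("no wholly valid record available as donor")

    ranges = {}
    for var in continuous:
        values = [d[var] for d in donors]
        ranges[var] = (max(values) - min(values)) or 1.0  # avoid division by zero

    completed = []
    for rec in records:
        if all(v is not None for v in rec.values()):
            completed.append(dict(rec))
            continue
        nearest = min(
            donors,
            key=lambda d: distance(rec, d, continuous, categorical, ranges),
        )
        # Joint imputation: all missing items come from the same donor.
        completed.append(
            {k: (nearest[k] if v is None else v) for k, v in rec.items()}
        )
    return completed


# Hypothetical data: two clean records and two recipients.
records = [
    {"region": "N", "age": 30, "income": 20000, "hours": 35},
    {"region": "S", "age": 50, "income": 40000, "hours": 40},
    {"region": "N", "age": 33, "income": None, "hours": 35},
    {"region": "S", "age": 48, "income": None, "hours": None},
]
completed = impute(records, continuous=["age"], categorical=["region"])
```

Taking every missing item from one donor keeps the imputed values mutually consistent within the record. Selecting a different donor for each missing item, the alternative the abstract mentions, would instead mean repeating the nearest-neighbour search once per missing variable.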



Pasi Koikkalainen
Fri Oct 18 19:03:41 EET DST 2002