Constructing a housing price index involves several non-trivial choices. One is the choice among alternative estimation methods, such as repeat-sales regression and hedonic regression. There are numerous papers on this issue, both theoretical and empirical. Shimizu et al. (2010), for example, conduct a statistical comparison of several alternative estimation methods using Japanese data. However, there is another issue that has received little attention in the literature but is regarded as critically important from a practical viewpoint: the choice among different data sources for housing prices.
There are several types of datasets for housing prices: datasets collected by real estate agencies and associations; datasets provided by mortgage lenders; datasets provided by government departments or institutions; and datasets gathered and provided by newspapers, magazines, and websites. Different datasets contain different types of prices, including sellers’ asking prices, transaction prices, and valuation prices. With multiple datasets available, one may ask several questions. Are these prices different? If so, how do they differ from one another? Given the specific purpose of the housing price index one seeks to construct, which dataset is the most suitable? Alternatively, with only one dataset available in a particular country, one may ask whether it is suitable for the purpose of the index one seeks to construct. This paper is a first attempt to address some of these questions.