Patent data have been widely used in research to characterize firms' locations in technological or knowledge space, as well as the proximities among firms. Researchers have computed firms' technological or knowledge proximities from patent data in a variety of ways, including Euclidean distances (based on the technological classifications listed on patents) and overlap in cited patents. Research has often used only the first listed patent classification when computing proximities. We explore the effects of using the first listed patent class, as well as other methods, to measure proximities. We point out that proximity measures based on small numbers of patents are themselves imprecisely measured random variables. Measures computed on samples with few patents or a single patent class are both biased and imprecise. We discuss the implications for typical research questions that employ proximity measures, and explore how far larger sample sizes and coarser patent class breakdowns mitigate these problems. We suggest that, where possible, researchers increase their sample sizes by aggregating years or by using all of the patent classes listed on a patent, rather than just the first.
- Technological distance
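
The small-sample problem described in the abstract can be illustrated with a minimal simulation sketch. It is not the procedure used in the paper. It assumes two hypothetical firms that draw patents from the same underlying distribution over 20 equally likely patent classes, so their true Euclidean distance is zero. The function names, the number of classes, and the sample sizes are all illustrative assumptions.

```python
import numpy as np


def euclidean_distance(counts_a, counts_b):
    """Euclidean distance between two firms' patent-class share vectors."""
    shares_a = counts_a / counts_a.sum()
    shares_b = counts_b / counts_b.sum()
    return float(np.linalg.norm(shares_a - shares_b))


def mean_measured_distance(true_shares, n_patents, n_draws=2000, seed=0):
    """Mean and standard deviation of the measured distance between two
    firms whose patents are drawn from the SAME class distribution
    (so the true distance is zero)."""
    rng = np.random.default_rng(seed)
    # Each row is one simulated firm: patent counts per class.
    firm_a = rng.multinomial(n_patents, true_shares, size=n_draws)
    firm_b = rng.multinomial(n_patents, true_shares, size=n_draws)
    dists = np.linalg.norm(firm_a / n_patents - firm_b / n_patents, axis=1)
    return float(dists.mean()), float(dists.std())


# Hypothetical setup: 20 equally likely patent classes.
true_shares = np.full(20, 1 / 20)
results = {n: mean_measured_distance(true_shares, n) for n in (5, 50, 500)}
for n, (mean_d, sd_d) in results.items():
    print(f"patents per firm = {n:3d}: mean distance = {mean_d:.3f}, sd = {sd_d:.3f}")
```

Under these assumptions the average measured distance is strictly positive even though the true distance is zero, and it shrinks as the number of patents per firm grows. This is consistent with the abstract's suggestion to enlarge samples by aggregating years or by using all listed classes.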