Researchers are increasingly interested in the secondary use of electronic health record (EHR) data to detect specific outcomes and adverse conditions. Our aim is to develop valid, robust, and practical EHR-derived models for identifying postoperative surgical site infections (SSIs). SSIs are classified as superficial, deep, or organ/space; they are costly and cause significant morbidity. Compared with the administrative/claims data on which previous research has relied heavily, our use of EHR data has the potential to support more informative SSI detection models. Unfortunately, secondary use of EHR data is challenging because the data are often incomplete: certain tests are ordered for only a subset of patients (e.g., 52% of the surgical patients in our cohort have no white blood cell count data within 30 days after the operation). Most researchers handle this problem by excluding cases or individual variables with missing data, or by imputing missing values only for variables with a small amount of missing data. However, because of the high missingness rate in our data, simply discarding incomplete cases may lose important indicators of SSI. In our previous work, we used only the complete cases to detect SSIs within 30 days after surgery, with the gold-standard outcome taken from a validated national surgical registry, the National Surgical Quality Improvement Program (NSQIP). In the current study, we explore several popular treatments of missing data. The performance of the models built after applying each treatment is compared with that of the reference model based on the complete cases.