Over 17,000 protein-coding genes have been scored according to their predicted probability of exhibiting haploinsufficiency. These predictions are generated by a classification model trained on two datasets:

  1. known haploinsufficient genes and
  2. genes disrupted by unambiguous loss-of-function variants in at least two apparently healthy individuals.

As predictor variables, the model uses sequence conservation, expression patterns, and proximity within a gene network to known haploinsufficient genes. Missing predictor values are imputed from other gene properties before prediction. Percentages refer to genome-wide percentiles of genes ranked according to their haploinsufficiency score.
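
The percentile ranking described above can be illustrated with a minimal Python sketch. It is not the original scoring pipeline: the gene names and scores are hypothetical, the function name `genome_wide_percentiles` is made up for illustration, and the direction of the ranking (highest score receives the lowest percentile) is an assumption that the text does not state explicitly.

```python
def genome_wide_percentiles(scores):
    """Return each gene's percentile rank in (0, 100].

    Assumption: genes are ranked from highest to lowest score, so the
    gene with the highest predicted probability gets the lowest percentile.
    """
    # Sort by descending score; ties are broken by gene name so the
    # result is deterministic.
    ranked = sorted(scores, key=lambda gene: (-scores[gene], gene))
    n = len(ranked)
    # Percentile = rank position divided by the number of scored genes.
    return {gene: 100.0 * (i + 1) / n for i, gene in enumerate(ranked)}


# Hypothetical scores for four placeholder genes (not real data).
example_scores = {"GENE_A": 0.95, "GENE_B": 0.10, "GENE_C": 0.60, "GENE_D": 0.30}
percentiles = genome_wide_percentiles(example_scores)

for gene, pct in sorted(percentiles.items(), key=lambda item: item[1]):
    print(f"{gene}: {pct:.1f}%")
```

Under the stated assumption, a gene reported at 25% would rank in the top quarter of all scored genes. If the ranking runs in the opposite direction, only the sign in the sort key needs to change.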