Over 17,000 protein-coding genes have been scored according to their predicted probability of exhibiting haploinsufficiency. These predictions are generated using a classification model trained on two datasets:
- known haploinsufficient genes and
- genes disrupted by unambiguous loss-of-function variants in at least two apparently healthy individuals.

The model uses sequence conservation, expression patterns and proximity within a gene network to known haploinsufficient genes as predictor variables. Missing predictor variables are imputed using other gene properties before prediction. Percentages refer to genome-wide percentiles of genes ranked according to their haploinsufficiency score.
- High ranks (e.g. 0-10%) indicate that a gene is more likely to exhibit haploinsufficiency; low ranks (e.g. 90-100%) indicate that a gene is more likely not to exhibit haploinsufficiency.
- The manuscript describing the generation and validation of these haploinsufficiency predictions (Huang et al.) is published in PLoS Genetics. Updated predictions of haploinsufficiency can be downloaded from our data download page.
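
To illustrate how the genome-wide percentiles described above relate to the underlying scores, here is a minimal Python sketch. It is not the published pipeline: the function names `percentile_ranks` and `interpret`, the gene labels and the score values are all hypothetical, and the 10% and 90% cut-offs simply reuse the example ranges quoted above. It assumes that a higher predicted probability corresponds to a lower (better) percentile, and that tied scores are ordered arbitrarily.

```python
def percentile_ranks(scores):
    """Map each gene to a genome-wide percentile (low % = high score).

    `scores` is a dict of gene -> predicted probability of
    haploinsufficiency. Hypothetical helper, for illustration only.
    """
    # Order genes from the highest to the lowest predicted probability.
    ordered = sorted(scores, key=scores.get, reverse=True)
    n = len(ordered)
    # The gene at 1-based rank i out of n gets percentile 100 * i / n.
    return {gene: 100.0 * (i + 1) / n for i, gene in enumerate(ordered)}


def interpret(percentile):
    """Label a percentile using the illustrative 0-10% / 90-100% ranges."""
    if percentile <= 10:
        return "more likely to exhibit haploinsufficiency"
    if percentile >= 90:
        return "more likely not to exhibit haploinsufficiency"
    return "intermediate"


# Made-up scores for four placeholder genes (not real predictions).
example_scores = {"GENE_A": 0.92, "GENE_B": 0.40, "GENE_C": 0.05, "GENE_D": 0.71}
example_ranks = percentile_ranks(example_scores)

for gene, pct in sorted(example_ranks.items(), key=lambda item: item[1]):
    print(f"{gene}: {pct:.0f}% ({interpret(pct)})")
```

With only four genes the percentiles are coarse; across a set of roughly 17,000 scored genes, each gene would occupy a much narrower percentile band.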