Saturday, 3 December, 2016

Welcome to Orphadata

The mission of Orphadata is to provide the scientific community with a comprehensive, high-quality and freely-accessible dataset related to rare diseases and orphan drugs, in a reusable format.

For more information on the XML file format, see the user's guide.

See "How the data are produced"

Description of the freely-accessible dataset


The dataset is a partial extraction of the data stored in Orphanet, which is also accessible at www.orpha.net for consultation purposes only.
This freely-accessible dataset is available in seven languages (English, French, German, Italian, Portuguese, Spanish and Dutch). It includes:

  • An inventory of rare disorders indexed with OMIM, ICD-10, UMLS, MeSH and MedDRA.
    • Types of disorders: The database contains a heterogeneous typology of entities of decreasing extension: groups of disorders, disorders and sub-types. A “rare disorder” in the database can be a disease, a malformation syndrome, a clinical syndrome, a morphological or biological anomaly, or a particular clinical situation (in the course of a disorder).
    • Flags of disorders: A flag is a numerical indicator attached to an element of the database so that information can be retrieved (e.g. to find the list of deprecated entries in Orphanet).
    • Relations between disorders: We identify “Moved to” relationships between entities in the database when a deprecated disorder is part of another.
    • Characterisation of the alignments between disorders and external terminologies or resources: OMIM, ICD-10, MeSH, UMLS and MedDRA. These alignments are further characterised, specifying whether the terms are perfectly equivalent (exact mapping) or not (other kinds of relationships, such as from broader to narrower or from narrower to broader).
  • Linearisation of disorders
    • Disorders can be multi-classified in Orphanet classifications. For analysis purposes, each disorder is attributed to a preferred classification by linking it to the head of classification entity. As some decisions could be made somewhat arbitrarily, we have written a set of rules to make sure attributions are consistent. The methodology is described here.
  • Genes in Orphanet are cross-referenced with Orphanet diseases and indexed with HGNC, OMIM, GenAtlas, UniProtKB, Ensembl, IUPHAR-DB and Reactome. The relationship between a gene and a disease is qualified according to the role that the gene plays in the pathogenesis of the disease.
    • Information concerning genetic entities in the database:
      • Type of genetic entity: gene with protein product, locus or non-coding RNA
      • Their chromosomal location
      • New gene-disease relationships: gain-of-function and loss-of-function germline disease-causing mutations.
  • A classification of rare diseases established by Orphanet, based on published expert classifications
  • Phenotypes associated with rare disorders
  • The Orphanet inventory of rare disorders is now annotated with the Human Phenotype Ontology (HPO) terms, a standardized and controlled terminology covering phenotypic abnormalities in human diseases.
    This new product contains two different files. The first one contains rare disorders listed in Orphanet annotated with HPO phenotypes. The alignment is characterized by frequency (obligate, very frequent, frequent, occasional, very rare or excluded) and whether the annotated HPO term is a major diagnostic criterion or a pathognomonic sign of the rare disease.

    • Diagnostic criterion: A diagnostic criterion is a phenotypic abnormality used by consensus to establish the diagnosis of a disorder. Multiple sets of diagnostic criteria are necessary to reach the diagnosis. Orphanet lists only diagnostic criteria that are both accepted by consensus among experts in the medical domain and published in the medical literature. Depending on the medical consensus, they may be further qualified as minor, major, etc. This level of precision is not yet recorded in the Orphanet dataset.
    • Pathognomonic sign: A pathognomonic phenotype is a feature sufficient by itself to establish the diagnosis of the disease concerned definitively and beyond any doubt (e.g. heliotrope erythema for dermatomyositis).
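
As an informal illustration of the HPO annotation described above, the sketch below models one disorder with its annotated phenotypes and selects those most useful for diagnosis. The class names, field names and the selection rule are assumptions made for this example only; they do not reflect the element names used in the Orphadata XML files, which are documented in the user's guide. Only the six frequency categories and the two qualifiers (diagnostic criterion, pathognomonic sign) are taken from the description above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Frequency(Enum):
    """The six frequency categories listed in the dataset description."""
    OBLIGATE = "obligate"
    VERY_FREQUENT = "very frequent"
    FREQUENT = "frequent"
    OCCASIONAL = "occasional"
    VERY_RARE = "very rare"
    EXCLUDED = "excluded"


@dataclass
class HpoAnnotation:
    """One HPO term attached to a disorder (hypothetical field names)."""
    hpo_term: str
    frequency: Frequency
    diagnostic_criterion: bool = False
    pathognomonic_sign: bool = False


@dataclass
class Disorder:
    name: str
    annotations: List[HpoAnnotation]


def key_phenotypes(disorder: Disorder) -> List[str]:
    """Return terms that are obligate or very frequent, or that are
    flagged as a diagnostic criterion or a pathognomonic sign.
    This selection rule is an assumption, not an Orphanet rule."""
    wanted = {Frequency.OBLIGATE, Frequency.VERY_FREQUENT}
    return [
        a.hpo_term
        for a in disorder.annotations
        if a.frequency in wanted or a.diagnostic_criterion or a.pathognomonic_sign
    ]


# The frequencies below are illustrative values, not taken from Orphanet.
example = Disorder(
    name="Dermatomyositis",
    annotations=[
        HpoAnnotation("Heliotrope erythema", Frequency.FREQUENT, pathognomonic_sign=True),
        HpoAnnotation("Placeholder phenotype A", Frequency.OCCASIONAL),
        HpoAnnotation("Placeholder phenotype B", Frequency.VERY_FREQUENT),
        HpoAnnotation("Placeholder phenotype C", Frequency.EXCLUDED),
    ],
)

print(key_phenotypes(example))
# prints ['Heliotrope erythema', 'Placeholder phenotype B']
```

Keeping the frequency as an enumeration rather than free text makes it harder to confuse "excluded" (the phenotype argues against the disorder) with a merely rare phenotype when filtering.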

For more information, see the user’s guide.

Only non-nominative data are accessible in accordance with personal data protection laws.
The dataset is updated once a month. The date of the last release is indicated.

See "About Orphadata" for more information.

How to quote

When quoting Orphanet, please use the following format:

Orphanet: an online rare disease and orphan drug data base. © INSERM 1997.
Available on http://www.orpha.net. Accessed [date accessed].


When quoting Orphadata, please use the following format:

Orphadata: Free access data from Orphanet. © INSERM 1997.
Available on http://www.orphadata.org. Data version [e.g. XML data version].
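
For users who generate citations automatically, the two formats above could be filled in along the following lines. This is a minimal sketch: the function names are hypothetical, and the ISO date format used for the access date is an assumption, since the format above does not prescribe one.

```python
from datetime import date


def cite_orphanet(accessed: date) -> str:
    """Fill in the Orphanet citation; ISO date format is an assumption."""
    return (
        "Orphanet: an online rare disease and orphan drug data base. "
        "© INSERM 1997. Available on http://www.orpha.net. "
        f"Accessed {accessed.isoformat()}."
    )


def cite_orphadata(data_version: str) -> str:
    """Fill in the Orphadata citation with the data version that was used."""
    return (
        "Orphadata: Free access data from Orphanet. © INSERM 1997. "
        f"Available on http://www.orphadata.org. Data version {data_version}."
    )


print(cite_orphanet(date(2016, 12, 3)))
print(cite_orphadata("XML data version"))
```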


If you wish to use one of our logos, please make a request via the contact form.

How to access other types of Orphanet data


Orphadata provides access, on request, to other elements of the Orphanet database after signature of a Material Transfer Agreement.
Please find the


The products that can be licensed include:

  • An inventory of orphan drugs at all stages of development, from EMA (European Medicines Agency) orphan designation to European market authorisation, cross-linked with diseases.
  • Summary information on each rare disease in six languages (English, French, German, Italian, Spanish, Portuguese)
  • URLs of other websites providing information on specific rare diseases
  • A directory of specialised services, providing information on centers of expertise, medical laboratories, diagnostic tests, research projects, clinical trials, patient registries, mutation registries, biobanks and patient organisations in the field of rare diseases, in each of the countries in Orphanet’s network.

Request form to access these files