D.4.1. What is a Substance dataset?

The Substance dataset is the core of information in IUCLID. It contains all data related to a chemical substance, such as the chemical identity (including the substance composition), information on manufacture, use and exposure, information on classification and labelling, and all required and available endpoint study summaries. A substance handled in an IUCLID Substance dataset can be a chemical element, a molecule, or a combination of molecules derived from any manufacturing process or from a biological source, including any additive and any impurity deriving from the process used, but excluding any solvent which may be separated without affecting the stability of the substance or changing its composition (see also chapter B.4.2 Substance-related information).

A Substance dataset is the repository of data that serves as the basis for creating a registration or submission Dossier. When a Substance dataset is created for a given chemical substance, it is assigned to a Reference substance, which in turn is either based on the EC Inventory or, if the substance is not listed there, newly defined. The Reference substance is simply a link to the identity of the chemical. On the user interface, it appears as a data entry field; it is, however, edited in a separate inventory of Reference substances (see also chapter D.11 Reference substance (Create and update Reference substance related information)).

The difference between the (submission) substance after which a Substance dataset is named and the assigned Reference substance is illustrated by the following examples of (i) a mono-constituent substance and (ii) a multi-constituent substance:

(i) Diethyl peroxydicarbonate:
- Reference substance = Diethyl peroxydicarbonate as listed in the EC Inventory, with the following identifiers: EC 238-707-3, CAS 14666-78-5, C6H10O6
- Submission substance = Diethyl peroxydicarbonate (technical), i.e. named after the Reference substance as main constituent, but also containing isododecane as a stabilising agent (and hence an additive) together with impurities, all of which need to be specified in section 1.2 Composition. The typical concentration of diethyl peroxydicarbonate in this substance is 22% with an upper limit of 27%.

(ii) Mixture of 1,4-dimethylbenzene, 1,2-dimethylbenzene and 1,3-dimethylbenzene:
- Reference substance = Mixture of 1,4-dimethylbenzene, 1,2-dimethylbenzene and 1,3-dimethylbenzene, with the following identifiers: EC 215-535-7, CAS 1330-20-7, C8H10
- Submission substance = e.g. Mixture of 1,4-dimethylbenzene, 1,2-dimethylbenzene and 1,3-dimethylbenzene, i.e. named after the Reference substance as main constituents, but with identification of all these constituents, i.e. 1,4-dimethylbenzene (30-40%), 1,2-dimethylbenzene (25-35%), 1,3-dimethylbenzene (20-30%), and impurities (water, 5-12%).
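
The distinction between the two objects in example (ii) can be sketched in code. This is only an illustrative model, not IUCLID's actual data format: the class names `ReferenceSubstance` and `SubmissionSubstance` and their fields are hypothetical, and only the identifiers and concentration ranges are taken from the example above.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class ReferenceSubstance:
    """Hypothetical stand-in for a Reference substance: identity only."""
    name: str
    ec_number: str
    cas_number: str
    formula: str


@dataclass
class SubmissionSubstance:
    """Hypothetical stand-in for the substance a dataset is named after."""
    name: str
    reference: ReferenceSubstance
    # component name -> (lower %, upper %) concentration range
    composition: Dict[str, Tuple[float, float]] = field(default_factory=dict)


# Identifiers as given in example (ii).
xylene_ref = ReferenceSubstance(
    name="Mixture of 1,4-dimethylbenzene, 1,2-dimethylbenzene and 1,3-dimethylbenzene",
    ec_number="215-535-7",
    cas_number="1330-20-7",
    formula="C8H10",
)

# The submission substance carries the composition; the reference does not.
xylene = SubmissionSubstance(
    name=xylene_ref.name,
    reference=xylene_ref,
    composition={
        "1,4-dimethylbenzene": (30, 40),
        "1,2-dimethylbenzene": (25, 35),
        "1,3-dimethylbenzene": (20, 30),
        "water (impurity)": (5, 12),
    },
)
```

In this sketch the Reference substance holds only the identity, while the composition details belong to the submission substance, which mirrors the split described in the examples.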

The second example implies that a Substance dataset is also used as a repository for so-called unintentional mixtures. These include:

- Multi-constituent substances (as in the example above), which are mixtures consisting of several main constituents, each present within a certain concentration range (e.g. >= 10% and < 80% (w/w));
- Complex mixtures, including substances of Unknown or Variable composition, Complex reaction products or Biological materials (UVCB substances), or process streams.

These kinds of mixtures are handled like any discrete substance. That is, a Reference substance is defined for the mixture and assigned to the Substance dataset created for that mixture.
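
As a rough illustration of the concentration range quoted above for main constituents, the following sketch picks out the components whose concentration is at least 10% and below 80% (w/w). The function name `main_constituents` is hypothetical, the limits are the example values from the text rather than a rule enforced by IUCLID, and the typical concentrations are made-up values chosen to lie within the ranges of the xylene example.

```python
from typing import Dict, List


def main_constituents(
    typical: Dict[str, float], lower: float = 10.0, upper: float = 80.0
) -> List[str]:
    """Return the components whose concentration c satisfies lower <= c < upper."""
    return [name for name, c in typical.items() if lower <= c < upper]


# Illustrative typical concentrations in % (w/w); assumed values, not source data.
xylene_typical = {
    "1,4-dimethylbenzene": 37.0,
    "1,2-dimethylbenzene": 30.0,
    "1,3-dimethylbenzene": 25.0,
    "water": 8.0,
}

print(main_constituents(xylene_typical))
```

With these assumed values the three dimethylbenzene isomers fall inside the range, while water falls below it and is treated as an impurity.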

Important

The IUCLID feature Mixture is designed for handling preparations, which, according to the definition given in REACH, are intentional mixtures of substances obtained by blending two or more constituents. The components retain their own chemical identity and properties and do not react. The composition of a preparation can be fully characterised. See chapter D.7 Mixture (create and update mixture related information).

A Substance dataset is characterised by the following data structure:

- It is always associated with an owner, i.e. a Legal entity.
- It may be associated with a Legal entity site, which is the physical location where the substance is produced and/or used. This site must belong to the Legal entity with which the Substance dataset is associated.
- The chemical identity defined for a Substance dataset is established by a link (i.e. reference) to the corresponding Reference substance, which in turn is either linked to the EC Inventory or newly created.
- A Substance dataset includes sections 1 to 3 for documenting general information on the substance (including further information on identification, composition etc.), classification and labelling, and information on manufacture, use and exposure. These sections are physically managed as one record. No further records can be created.
- A Substance dataset also includes sections 4 to 13 for documenting endpoint-related data using multiple Endpoint study records or Endpoint summary records, which have to be created by the user.
- Optionally, a Substance dataset can be assigned to one or several Templates (for inherit), which makes the Endpoint study or Endpoint summary records of the Template(s) seamlessly available in the Substance dataset.
- Optionally, a Substance dataset can be populated by using Templates (for copy).
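
The structure listed above can be sketched as a small data model. This is a simplified illustration under stated assumptions, not the IUCLID schema: all class and field names are hypothetical, records are reduced to plain strings, and the check in `__post_init__` is just one possible way to express the rule that a site must belong to the owning Legal entity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LegalEntity:
    name: str


@dataclass
class Site:
    name: str
    legal_entity: LegalEntity  # every site belongs to one Legal entity


@dataclass
class SubstanceDataset:
    name: str
    owner: LegalEntity                         # mandatory owner
    reference_substance: Optional[str] = None  # link to the chemical identity
    site: Optional[Site] = None                # optional Legal entity site
    # Sections 4 to 13: any number of user-created records.
    endpoint_records: List[str] = field(default_factory=list)
    # Optional links to Templates (for inherit).
    inherited_templates: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The site, if given, must belong to the owning Legal entity.
        if self.site is not None and self.site.legal_entity != self.owner:
            raise ValueError("site must belong to the owning Legal entity")


owner = LegalEntity("Example Chemicals Ltd")  # made-up name
plant = Site("Main plant", owner)
dataset = SubstanceDataset(
    name="Diethyl peroxydicarbonate (technical)",
    owner=owner,
    reference_substance="Diethyl peroxydicarbonate",
    site=plant,
)
dataset.endpoint_records.append("Melting point study 1")
```

Attempting to attach a site that belongs to a different Legal entity raises `ValueError` in this sketch; sections 1 to 3 are omitted because they are managed as a single fixed record.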

The data structure of a Substance dataset is illustrated by the following figures:

The relations between the database objects shown in the figures above are as follows:

L01: Each "Substance" can be linked to one "Reference Substance".
L02: For each "Substance" no, one or more "Substance Composition(s)" can be defined. For each "Substance Composition" no, one or more "Constituent(s)" can be defined. Each "Constituent" can be linked to one "Reference Substance".
L03: For each "Substance" no, one or more "Substance composition(s)" can be defined. For each "Substance composition" no, one or more "Impurity(ies)" can be defined. Each "Impurity" can be linked to one "Reference Substance".
L04: For each "Substance" no, one or more "Substance composition(s)" can be defined. For each "Substance composition" no, one or more "Additive(s)" can be defined. Each "Additive" can be linked to one "Reference Substance".
L05: Each "Reference Substance" can be linked to one "EC Inventory".
L06: Each "Substance" must be linked to one "Legal Entity" (as the owner).
L07: Each "Substance" can be linked to one "Third Party".
L08: For each "Substance" no, one or more "Joint Submission(s)" can be defined. For each "Joint Submission" one "Lead" can be defined. One "Legal entity" can be linked as "Lead".
L09: For each "Substance" no, one or more "Joint Submission(s)" can be defined. For each "Joint Submission" no, one or more "Member(s)" can be defined. One "Legal entity" can be linked as "Member".
L10: For each "Substance" no, one or more "Legal Entities" can be linked as "Recipient(s)".
L11: For each "Substance" no, one or more "Legal Entities" can be linked as "Supplier(s)".
L12: For each "Substance" no, one or more "Site(s)" can be defined. For each "Site(s)" no, one or more "Site" can be defined. One "Site" can be linked as "Site".
L13: Each "Substance" can be linked to no, one or more "Template(s) (for inherit)".
L14: Each "Template (for inherit)" must be linked to one "Legal Entity" (as the owner).
L15: Each "Template (for copy)" must be linked to one "Legal Entity" (as the owner).
L16: Each "Site" must be linked to one "Legal entity".
R01: Each "Substance" can have no, one or more linked "Endpoint(s)". On deleting the "Substance" all linked "Endpoint(s)" will be deleted, too.
R02: Each "Substance" can have no, one or more linked "Endpoint Summary(ies)". On deleting the "Substance" all linked "Endpoint Summaries" will be deleted.
R03: Each "Template (for inherit)" can have no, one or more linked "Endpoint(s)". On deleting the "Template" all linked "Endpoint(s)" will be deleted.
R04: Each "Template (for inherit)" can have no, one or more linked "Endpoint Summary(ies)". On deleting the "Template" all linked "Endpoint Summary(ies)" will be deleted.
R05: Each "Template (for copy)" can have no, one or more linked "Endpoint(s)". On deleting the "Template" all linked "Endpoint(s)" will be deleted.
R06: Each "Template (for copy)" can have no, one or more linked "Endpoint Summary(ies)". On deleting the "Template" all linked "Endpoint Summary(ies)" will be deleted.
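
The deletion behaviour stated in R01 and R02 can be illustrated with a minimal in-memory sketch. The `Store` class and its methods are hypothetical and do not reflect how IUCLID stores data. The sketch also assumes that a linked Reference substance is left untouched when a Substance is deleted; the relations above do not state this explicitly.

```python
class Store:
    """Toy store illustrating cascade deletion of endpoint records."""

    def __init__(self):
        self.substances = {}               # substance name -> reference substance name
        self.endpoints = {}                # endpoint id -> owning substance name
        self.reference_substances = set()  # separate inventory

    def add_substance(self, name, reference):
        self.reference_substances.add(reference)
        self.substances[name] = reference

    def add_endpoint(self, endpoint_id, substance):
        if substance not in self.substances:
            raise KeyError(substance)
        self.endpoints[endpoint_id] = substance

    def delete_substance(self, name):
        # R01/R02: deleting a Substance deletes all of its linked records.
        del self.substances[name]
        self.endpoints = {
            eid: owner for eid, owner in self.endpoints.items() if owner != name
        }
        # Assumption: the Reference substance stays in its inventory.


store = Store()
store.add_substance("Xylene (submission)", "Xylene (reference)")
store.add_substance("Other substance", "Other reference")
store.add_endpoint("E1", "Xylene (submission)")
store.add_endpoint("E2", "Xylene (submission)")
store.add_endpoint("E3", "Other substance")
store.delete_substance("Xylene (submission)")
```

After the deletion only the record `E3` remains, while both Reference substances are still present in the inventory.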