database:drs

Differences

This shows you the differences between two versions of the page.

database:drs [2016/08/30 09:40]
jypeter Changed the sections' levels
database:drs [2017/06/09 08:32]
jypeter Added link to CV definition @ github
Line 9: Line 9:
  
 <WRAP center round tip 60%> <WRAP center round tip 60%>
-  * **attribute**: a //global attribute// (e.g. in a netCDF file) used to describe the data
+  * **attribute**: a //global attribute// (e.g. in a NetCDF file) used to describe the data
  
   * **CV**: sometimes the value of a given **attribute** has to be taken from a predefined set of values, known as a //Controlled Vocabulary// (**CV**)
 +    * [[https://github.com/WCRP-CMIP/CMIP6_CVs|CMIP6 allowed CV values]]
  
-  * **DRS** = //Data Reference Syntax//: the //DRS// is used to identify experiments, simulations, ensembles of experiments, atomic datasets and is used, for example, in file names, directory structures, the further_info_url, and in facets of some search tools
+  * **DRS** = //Data Reference Syntax//: the //DRS// is used to identify experiments, simulations, ensembles of experiments and atomic datasets, and is used, for example, in [[#pmip4-cmip6_directory_structure_and_file_names|file names, directory structures]], the further_info_url, and in //facets// of some search tools
 +  * **facet** = a category or attribute you can put a search constraint on, when doing a //faceted search//
  
-Example: the ''experiment_id'' **attribute** is used in the **DRS**, and its value has to be chosen from a **CV** ([//piControl//, //past1000//, //lgm//, ...])
+Example: the ''experiment_id'' **attribute** is used in the **DRS**, and its value has to be chosen from a **CV** ([//piControl//, //past1000//, //lgm//, ...]).\\ On the [[https://esgf-node.ipsl.upmc.fr/search/cmip5-ipsl/|IPSL CMIP5 search node]], you can put a search constraint on the //Experiment// **facet** by clicking on //Experiment//, selecting //lgm// and then clicking on //Search//
 </WRAP> </WRAP>
  
Line 27: Line 29:
    
   * **CMIP6** document: [[https://docs.google.com/document/d/1h0r8RZr_f3-8egBMMh7aqLwy3snpD6_MrDz1q8n5XUk/edit|CMIP6 Global Attributes, DRS, Filenames, Directory Structure, and CV’s (version 1.0)]]
 +    * Note: [[https://github.com/WCRP-CMIP/CMIP6_CVs|CMIP6 allowed CV values]]
  
   * Legacy **CMIP5** documents:
Line 41: Line 44:
  
 ^ Project ^ ''activity_id'' ^ ''mip_era'' ^ Note ^
-| CMIP6 | CMIP | CMIP6 | |
-| PMIP4-CMIP6 | CMIP\\ \\ "CMIP PMIP"**??** | CMIP6 | Should we use //CMIP// or "//CMIP PMIP//" for [[#pmip4-cmip6|PMIP4 experiments that are part of CMIP6]]?\\ This is confusing |
-| PMIP4 | PMIP | PMIP4**??**\\ \\ CMIP6**??**| Use this for non-CMIP6 experiments, or groups that are not part of CMIP6\\ Should we use //PMIP4// because it is the 4th phase of PMIP, or //CMIP6// because we will be using CMIP6 format specifications? |
+| [[#deck_and_historical_experiments|CMIP6]] | CMIP | CMIP6 | |
+| [[#pmip4-cmip6_experiments|PMIP4-CMIP6]] | CMIP\\ \\ "CMIP PMIP"**??** | CMIP6 | Should we use //CMIP// or "//CMIP PMIP//" for [[#pmip4-cmip6|PMIP4 experiments that are part of CMIP6]]?\\ This is confusing |
+| [[#proposed_pmip4_experiment_id_values|PMIP4]] | PMIP | PMIP4**??**\\ \\ CMIP6**??**| Use this for non-CMIP6 experiments, or groups that are not part of CMIP6\\ Should we use //PMIP4// because it is the 4th phase of PMIP, or //CMIP6// because we will be using CMIP6 format specifications? |
  
 ===== Experiment names =====
Line 59: Line 62:
  
 <WRAP center round important 60%> <WRAP center round important 60%>
-TODO: How do we specify that an //historical// experiment is the true continuation of a //past1000// experiment?
+FIXME How do we specify that an //historical// experiment is the true continuation of a //past1000// experiment?
 
 We probably need to use ''parent_experiment_id''=//past1000// in the files' metadata, as well as ''parent_activity_id'', ''parent_mip_era''=//CMIP6//, ''parent_source_id'', ''branch_time_in_parent'' and other related ''parent_*'' attributes. We can probably also agree on a specific ''variant_label'' that will appear in the file names.
 </WRAP> </WRAP>
  
 +Reminder: **DECK** = //Diagnostic, Evaluation and Characterization of Klima//. More information on the CMIP6 experiments is available in [[http://www.geosci-model-dev.net/9/1937/2016/|Eyring et al. 2016]].
  
 ^ ''experiment_id'' ^ ''experiment'' ^
Line 96: Line 100:
     * Facilitate representations of groups of experiments that are closely related (e.g., same forecast conditions but different start dates, or experiment with an “offline” model driven by output from various models)</code>     * Facilitate representations of groups of experiments that are closely related (e.g., same forecast conditions but different start dates, or experiment with an “offline” model driven by output from various models)</code>
  
-  * Planning for groups of related experiments
+  * Planning for groups of related simulations
     * <code>Often several simulations will be performed that satisfy the conditions specified for each experiment.  For example simulations of the historical period can branch from various points in a control run, and each of these will satisfy the conditions defining the experiment.  Together such simulations constitute a “conforming ensemble” with member all satisfying the same “root” experiment specifications.  There are also occasional cases where the experiment designers (MIP leaders) define a family of related simulations and choose to label these with a common “root” experiment name.  An example of this is the set of decadal prediction hindcasts that are all run similarly but started from different start dates (with each simulation identified by a different sub-experiment label).   Such “defined ensembles” of experiments will be labeled with a “root” experiment name, and a “sub-experiment_id” will be used to distinguish among members in the ensemble.</code>
-    * Ensemble of experiments usually share a common ''experiment_id'' and have different //ripf// [[#cmip6_variant_label|variant labels]].
+    * Ensembles of simulations usually share a common ''experiment_id'' and have different //ripf// [[#cmip6_variant_label|variant labels]].
  
  
Line 113: Line 117:
 | LD''v1''-transpin | [[exp_design:degla#transient_orbit_and_trace_gases_spinup_26-21_ka|Transient orbit and trace gases spinup (26-21 ka)]] | Work |
 | LD''v1'' | [[exp_design:degla#transient_deglaciation_21-0_ka|Transient deglaciation (21-0 ka)]] | Work |
 +|  FIXME  | Early Holocene (9.5 ka? 8.5 ka?) | Work |
 +|  lig116k ?  | Transition from the LIG to the glacial\\ (2 experiments) | Work |
 +|  FIXME  | MIS11\\ (2 experiments)  | Work |
 +|  FIXME  | Transient Holocene (6 ka to 0) | Work |
 +|  FIXME  | Transient LIG (130 ka to 125 ka) | Work |
 +|  FIXME  | [[http://www.deepmip.org/|DeepMIP]] | Work |
 +|  FIXME  | More experiments to come... ||
  
 ===== Handling groups of simulations =====
Line 204: Line 215:
 aka ''r<k>i<l>p<m>f<n>'' or //ripf//
  
-<code>
-    * variant_label: a label constructed from 4 indices stored as global attributes:
-              variant_label = r<k>i<l>p<m>f<n>
-                  where
-                     k = realization_index
-                     l = initialization_index
-                     m = physics_index
-                     n = forcing_index
-    * variant_info: brief descriptor of what is unique about this “ripf” variant.</code>
-
-  * ''forcing_index'' = index for variant of forcing (integer >0)
+  * ''realization_index'' = realization number (integer >0)
   * ''initialization_index'' = index for variant of initialization method (integer >0)
   * ''physics_index'' = index for model physics variant (integer >0)
-  * ''variant_info'' = brief description of what is unique about this //ripf// variant
+  * ''forcing_index'' = index for variant of forcing (integer >0)
 +    * Note: the information stored in the //forcing// attribute in CMIP5 may in CMIP6 appear in the ''variant_info'' attribute
   * ''variant_label'' = a label constructed from 4 indices stored as global attributes
     * <code>r<k>i<l>p<m>f<n>
Line 225: Line 227:
                      m = physics_index
                      n = forcing_index</code>
-    * Example:  if realization_index=2, initialization_index=1, physics_index=3, and forcing_index=233, then variant_label = “r2i1p3f233”.
+  * ''variant_info'' = brief description of what is unique about this //ripf// variant
 +    * Example: //"forcing: black carbon aerosol only"//, //"realization 1"//, //"realization 1; initialized using anomaly approach (method 2)"//
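The way the four indices combine into a ''variant_label'' can be sketched in a few lines of Python. This is only an illustration: the function names ''make_variant_label'' and ''parse_variant_label'' are hypothetical and are not part of CMOR or of any CMIP6 tool.

```python
import re

# r<k>i<l>p<m>f<n>, where every index is a strictly positive integer
_VARIANT_RE = re.compile(r"r([1-9][0-9]*)i([1-9][0-9]*)p([1-9][0-9]*)f([1-9][0-9]*)")

_INDEX_NAMES = (
    "realization_index",
    "initialization_index",
    "physics_index",
    "forcing_index",
)


def make_variant_label(realization_index, initialization_index,
                       physics_index, forcing_index):
    """Build the ripf label from the 4 indices (hypothetical helper)."""
    indices = (realization_index, initialization_index,
               physics_index, forcing_index)
    for name, value in zip(_INDEX_NAMES, indices):
        # bool is a subclass of int, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError("{} must be an integer > 0".format(name))
    return "r{}i{}p{}f{}".format(*indices)


def parse_variant_label(label):
    """Split a ripf label back into its 4 integer indices."""
    match = _VARIANT_RE.fullmatch(label)
    if match is None:
        raise ValueError("not a valid variant_label: {!r}".format(label))
    return dict(zip(_INDEX_NAMES, (int(group) for group in match.groups())))
```

With realization_index=2, initialization_index=1, physics_index=3 and forcing_index=233, ''make_variant_label'' returns ''r2i1p3f233'', and ''parse_variant_label'' recovers the same four integers from that string.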
  
  
-== variant_label values for PMIP4 experiments ==
+==== PMIP4 and variant_label notes ====
 
-For PMIP4, we have ''sub_experiment_id == none'' (because we don't use forecast and hindcast), and therefore ''member_id == variant_label'' (used in the file names)
+Reminder: each index in ''r<k>i<l>p<m>f<n>'' has to be a strictly positive integer
 + 
 +=== realization_index r<k> === 
 + 
 +The long PMIP4 simulations are going to require both a lot of processing power and a lot of storage. It is quite likely that there will be **only one realization** for a given set of //i<l>p<m>f<n>//, and that the variant label will always start with ''r1''
 + 
 +=== forcing_index f<n> === 
 + 
 +Depending on available resources, the PMIP4 groups may choose to perform several simulations for the same experiment, using different combinations of forcings. **The forcings used will have to be carefully described** in the documentation (and in the metadata inside each NetCDF file) and be //encoded// in the integer value of the forcing_index. 
 + 
 +There are several ways to proceed. The easiest way is to let each group choose its own way of numbering the forcing combinations (and document it!), but **all groups should try to use a common scheme** and associate the same combination of forcings with the same integer
 + 
 +== Sequential numbering scheme == 
 + 
 +The contact people for each experiment determine which forcing combinations are most likely to be used and associate each of them with a predefined number. If necessary, a group can later ask for a new forcing combination to be registered
 + 
 +^ Forcings ^  f''forcing_index''  ^
 +| Recommended default,\\ or most likely combination,\\ or mandatory simulation |  ''f1''  |
 +| forcing1='on', forcing2='off', etc |  ''f2''  |
 +| Some other combination |  ''fN''  |
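A sequential scheme amounts to a small shared registry that hands out the next free integer to each new forcing combination. The sketch below is only an illustration: the ''ForcingRegistry'' class and the forcing names are hypothetical, and a real registry would be maintained by the contact people for each experiment rather than built in code.

```python
class ForcingRegistry:
    """Hypothetical registry mapping forcing combinations to f<n> numbers."""

    def __init__(self):
        self._index_by_combination = {}

    def register(self, combination):
        """Return the forcing_index for a combination, assigning the next
        free integer (starting at 1) the first time it is seen."""
        # Sort the items so that the key does not depend on dict ordering
        key = tuple(sorted(combination.items()))
        if key not in self._index_by_combination:
            self._index_by_combination[key] = len(self._index_by_combination) + 1
        return self._index_by_combination[key]

    def label(self, combination):
        """Return the f<n> part of the variant_label."""
        return "f{}".format(self.register(combination))


registry = ForcingRegistry()
# Register the recommended default first so that it gets f1
default_label = registry.label({"forcing1": "on", "forcing2": "on"})
other_label = registry.label({"forcing1": "on", "forcing2": "off"})
```

Registering the recommended default first guarantees that it is numbered ''f1''. Asking again for an already registered combination returns the same number.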
 + 
 +== Hierarchical numbering scheme == 
 + 
 +The following scheme will create bigger integers, but the values will be more meaningful
 + 
 +If there are 10 or fewer options for each type of forcing, we can assign a power of 10 to each type, multiply it by the number of the chosen option and add everything up
 + 
 +Tentative example for the [[exp_design:lgm|lgm]] experiment: 
 + 
 +^  Power  ^  Forcing  ^ Options ^ 
 +|  2  |  Ice sheet  | ''1''=//ICE-6G-C//\\ ''2''=//GLAC-1D// |
 +|  1  |  Aerosols  | ''1''=//Hopcroft et al//\\ ''2''=//Albani et al// |
 +|  0  |  Vegetation  | ''1''=//interactive vegetation//\\ ''2''=//interactive carbon cycle//\\ ''3''=//prescribed// |
 + 
 +Example: //GLAC-1D// + //Hopcroft et al// + //interactive vegetation// = ''2 * 100 + 1 * 10 + 1'' => ''f211'' 
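The arithmetic of the hierarchical scheme can be sketched as follows. The powers come from the tentative //lgm// table above. The dictionary keys, the function names and the restriction of option numbers to 1..9 (so that each forcing type occupies exactly one decimal digit) are assumptions made for this illustration.

```python
# Power of 10 assigned to each forcing type (tentative lgm example).
# The key names are hypothetical.
LGM_FORCING_POWERS = {"ice_sheet": 2, "aerosols": 1, "vegetation": 0}


def encode_forcing_index(options, powers=LGM_FORCING_POWERS):
    """Combine one option number per forcing type into a forcing_index."""
    total = 0
    for name, power in powers.items():
        option = options[name]
        # Assumption: options are numbered 1..9, one decimal digit each
        if not 1 <= option <= 9:
            raise ValueError("option for {} must be in 1..9".format(name))
        total += option * 10 ** power
    return total


def decode_forcing_index(forcing_index, powers=LGM_FORCING_POWERS):
    """Recover the option number of each forcing type from a forcing_index."""
    return {name: (forcing_index // 10 ** power) % 10
            for name, power in powers.items()}


# GLAC-1D (2) + Hopcroft et al (1) + interactive vegetation (1)
lgm_index = encode_forcing_index({"ice_sheet": 2, "aerosols": 1, "vegetation": 1})
lgm_label = "f{}".format(lgm_index)
```

Here ''lgm_index'' is 2 * 100 + 1 * 10 + 1 = 211, so ''lgm_label'' is ''f211'', and ''decode_forcing_index(211)'' gives back the three option numbers. The benefit over sequential numbering is that the value can be decoded without looking it up in a registry.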
 + 
 +==== variant_label constraints for PMIP4 experiments ==== 
 + 
 +=== historical === 
 + 
 +''historical'' simulations that are the continuation of a [[exp_design:lm|past1000]] simulation should use ''1000'' for the ''initialization_index'' => ''i1000''
 + 
 +=== past1000 === 
 + 
 + 
 +FIXME 
 + 
 +=== mid-Holocene === 
 + 
 +FIXME 
 + 
 +There are at least [[exp_design:mh#sensitivity_experiments|2 sensitivity experiments]] 
 + 
 +=== lgm === 
 + 
 + 
 +FIXME 
 + 
 +=== lig127k === 
 + 
 +FIXME 
 + 
 +There are at least [[exp_design:lig127#sensitivity_experiments|3 sensitivity experiments]] 
 + 
 +=== midPliocene-eoi400 === 
 + 
 + 
 +FIXME 
 + 
 +=== LD-LGMspin === 
 + 
 + 
 +FIXME 
 + 
 +=== LD-transpin === 
 + 
 + 
 +FIXME 
 + 
 +=== LD === 
 + 
 + 
 +FIXME 
 + 
 +=== Other PMIP4 specific experiments === 
 + 
 +There may be other experiments listed in [[#proposed_pmip4_experiment_id_values|Proposed PMIP4 experiment_id values]] and [[exp_design:index#pmip4_experiments|PMIP4 experiments]] 
 + 
 +FIXME
  
 ===== PMIP4-CMIP6 directory structure and file names ===== ===== PMIP4-CMIP6 directory structure and file names =====
Line 236: Line 327:
 The //DRS// defines (among other things) how the different attributes will be combined to generate unambiguous directories and file names, in the ESGF distributed database
 
-<code>Directory structure = <mip_era>/<activity_id>/<institution_id>/<source_id>/<experiment_id>/<member_id>/<table_id>/<variable_id>/<grid_label>/<version>
+<code>Directory structure = <mip_era>/
 +                        <activity_id>/ 
 +                          <institution_id>/ 
 +                            <source_id>/ 
 +                              <experiment_id>/ 
 +                                <member_id>/         <== variant_label 
 +                                  <table_id>/ 
 +                                    <variable_id>/ 
 +                                      <grid_label>/ 
 +                                        <version>/
  
 file name = <variable_id>_<table_id>_<experiment_id>_<source_id>_<member_id>_<grid_label>[_<time_range>].nc
 </code> </code>
  
-^  Dir  ^  File  ^ Attribute ^ Value for PMIP4-CMIP6 ^
+For PMIP4, we have ''sub_experiment_id == none'' (because we don't use forecast and hindcast), and therefore ''member_id == variant_label''
 + 
 +^  Used in\\ dir?  ^  Used in\\ file?  ^ Attribute\\ name ^ Value for PMIP4-CMIP6 ^
 |  <wrap hi>Y</wrap>  |  N  | ''mip_era'' | CMIP6\\ PMIP4 ? |
 |  <wrap hi>Y</wrap>  |  N  | ''activity_id'' | CMIP\\ PMIP\\ Note: "CMIP PMIP" becomes CMIP (//If multiple activities are listed in the global attribute, the first one is used in the directory structure//) |
 |  <wrap hi>Y</wrap>  |  N  | ''institution_id'' | institution label (//IPSL//, ...) |
-|  <wrap hi>Y</wrap>  |  N  | ''version'' | ''vYYYYMMDD'' (e.g., ''v20160218''), indicating a representative date for the version\\ Note: the //version// is not stored in the netdcf files, because it is only specified when //publishing// (eg storing the data in ESGF) the netdcf files |
+|  <wrap hi>Y</wrap>  |  N  | ''version'' | ''vYYYYMMDD'' (e.g., ''v20160218''), indicating a representative date for the version\\ Note: the //version// is **not** stored in the NetCDF files and **not** used in the file names, because it is only specified when //publishing// the NetCDF files (e.g. storing the data in ESGF) |
 |  <wrap hi>Y</wrap>  |  <wrap hi>Y</wrap>  | ''source_id'' | source label (e.g. the model name/version using only authorized characters) |
 |  <wrap hi>Y</wrap>  |  <wrap hi>Y</wrap>  | ''experiment_id'' | See the [[#experiment_names|Experiment names]] section |
-|  <wrap hi>Y</wrap>  |  <wrap hi>Y</wrap>  | ''member_id'' | PMIP4 does not use ''sub_experiment_id'', so the value of ''member_id'' is equal to the variant_label: ''r<k>i<l>p<m>f<n>'' (see the [[#cmip6_variant_label|CMIP6 variant label]] section) |
+|  <wrap hi>Y</wrap>  |  <wrap hi>Y</wrap>  | ''member_id'' | PMIP4 does not use ''sub_experiment_id'', so the value of ''member_id'' is equal to the ''variant_label'': ''r<k>i<l>p<m>f<n>'' (see the [[#cmip6_variant_label|CMIP6 variant label]] section) |
 |  <wrap hi>Y</wrap>  |  <wrap hi>Y</wrap>  | ''table_id'' | [[http://cmor.llnl.gov/|CMOR]] table label (''Amon'', ...) |
 |  <wrap hi>Y</wrap>  |  <wrap hi>Y</wrap>  | ''variable_id'' | variable identifier (''tas'', ''pr'', ...) |
Line 254: Line 356:
 |  N  |  <wrap hi>Y</wrap>  | ''time_range'' | the last segment of the file name indicates the time-range spanned by the data in the file, and is omitted when inappropriate.  The format for this segment is [[http://cmip-pcmdi.llnl.gov/cmip5/docs/cmip5_data_reference_syntax.pdf|the same as in CMIP5]] |
  
 +Examples:
 +  * Directory = ''CMIP6/CMIP/NCAR/CCSM2-1/1pctCO2/r1i1p1f1/Amon/tas/gn/v20150320/''\\ File = ''tas_Amon_CCSM2-1_1pctCO2_r1i1p1f1_gn_202001-202912.nc''
 +
 +  * Directory = ''CMIP6/DCPP/NCAR/CCSM2-1/dcppA-hindcast/s1960-r1i2p1f1/Amon/tas/gr/v20150320/''\\ File = ''tas_Amon_CCSM2-1_dcppA-hindcast_s1960-r1i2p1f1_gr_198001-198412.nc''
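The directory part of the //DRS// can be assembled from the global attributes as sketched below. The helper names ''make_member_id'' and ''make_directory'' are hypothetical. Joining ''sub_experiment_id'' and ''variant_label'' with a hyphen is inferred from the ''s1960-r1i2p1f1'' example above, and the file name is deliberately left out of this sketch.

```python
# Order of the attributes in the directory structure, as listed above
DIRECTORY_ATTRIBUTES = (
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
)


def make_member_id(variant_label, sub_experiment_id="none"):
    """member_id is the variant_label, prefixed by the sub_experiment_id
    when there is one (PMIP4 always has sub_experiment_id == 'none')."""
    if sub_experiment_id == "none":
        return variant_label
    return "{}-{}".format(sub_experiment_id, variant_label)


def make_directory(attributes):
    """Build the DRS directory from a dict of attribute values."""
    values = dict(attributes)
    # If several activities are listed, only the first one is used
    values["activity_id"] = values["activity_id"].split()[0]
    return "/".join(values[name] for name in DIRECTORY_ATTRIBUTES) + "/"


example_directory = make_directory({
    "mip_era": "CMIP6",
    "activity_id": "CMIP",
    "institution_id": "NCAR",
    "source_id": "CCSM2-1",
    "experiment_id": "1pctCO2",
    "member_id": make_member_id("r1i1p1f1"),
    "table_id": "Amon",
    "variable_id": "tas",
    "grid_label": "gn",
    "version": "v20150320",
})
```

''example_directory'' reproduces the first directory example above. An ''activity_id'' of "CMIP PMIP" would be reduced to ''CMIP'' in the path.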
 +
 +===== CMIP6 data license and acknowledgement =====
 +
 +<WRAP center round important 60%>
 +PMIP data users have to add the following PMIP-specific sentence to their acknowledgement:\\ **//PMIP is endorsed by both WCRP/WGCM and Future Earth/PAGES//**
 +</WRAP>
 +
 +
 +==== Data users ====
 +
 +The data end users have to follow the Terms of Use and licensing information detailed on the [[https://pcmdi.llnl.gov/home/CMIP6/CitationRequirements6-0.html|CMIP6: Proper citation and acknowledgement]] page.
 +
 +==== Data providers ====
 +
 +The Terms of Use are also detailed in the ''license'' global attribute available in each data file created by the providers
 +
 +Note: you can get the [[https://github.com/WCRP-CMIP/CMIP6_CVs/blob/master/CMIP6_license.json|latest version of the license]] on github.
 +
 +<code>
 +The “license” attribute should record the following statement (with segments in square brackets optional, and with required, appropriate text entered in place of <*> ): 
 +
 +“CMIP6 model data produced by <Your Centre Name> is licensed under a Creative Commons Attribution-[NonCommercial-]ShareAlike 4.0 International License (https://creativecommons.org/licenses). Use of the data must be acknowledged following guidelines found at https://pcmdi.llnl.gov/home/CMIP6/CitationRequirements6-0.html. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file)[ and at <some URL maintained by modeling group>]. The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law.”
 +
 +The [*] indicates that institutions may choose to use the Non-commercial version of this license by inserting the words “NonCommercial” at this point, but this will significantly limit the use of the data in downstream climate mitigation and adaptation applications.  Please do not simply copy the statement above when writing data; Some text must be entered, some text is optional and the symbols “[*]” should not appear in the licensing text.
 +</code>
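Filling in the license statement can be automated so that the square brackets and ''<*>'' placeholders never end up in the files. The sketch below copies the wording of the statement quoted above. The function name ''make_license'' and its parameters are hypothetical, and the authoritative text remains the ''CMIP6_license.json'' file on github.

```python
LICENSE_TEMPLATE = (
    "CMIP6 model data produced by {centre} is licensed under a Creative "
    "Commons Attribution-{nc}ShareAlike 4.0 International License "
    "(https://creativecommons.org/licenses). Use of the data must be "
    "acknowledged following guidelines found at "
    "https://pcmdi.llnl.gov/home/CMIP6/CitationRequirements6-0.html. "
    "Further information about this data, including some limitations, can "
    "be found via the further_info_url (recorded as a global attribute in "
    "this file){extra}. The data producers and data providers make no "
    "warranty, either express or implied, including, but not limited to, "
    "warranties of merchantability and fitness for a particular purpose. "
    "All liabilities arising from the supply of the information (including "
    "any liability arising in negligence) are excluded to the fullest "
    "extent permitted by law."
)


def make_license(centre, non_commercial=False, group_url=None):
    """Return the text of the 'license' global attribute.

    centre         -- required name of the modeling centre
    non_commercial -- True selects the CC BY-NC-SA variant
    group_url      -- optional URL maintained by the modeling group
    """
    if not centre:
        raise ValueError("the centre name is required")
    nc = "NonCommercial-" if non_commercial else ""
    extra = " and at {}".format(group_url) if group_url else ""
    return LICENSE_TEMPLATE.format(centre=centre, nc=nc, extra=extra)


# IPSL is used here only as an example of a centre name
by_sa_text = make_license("IPSL")
by_nc_sa_text = make_license("IPSL", non_commercial=True)
```

The returned string can then be written to the ''license'' global attribute of each NetCDF file with whatever tool the group already uses.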
 +
 +==== Details about the Creative Commons licenses ====
 +
 +You can read //The Licenses// section of [[https://creativecommons.org/licenses/?lang=en|About The Licenses]] if you want to understand the **Creative Commons** copyright licenses. You can also read the [[https://wiki.creativecommons.org/images/6/6d/6licenses-flat.pdf|Six licenses for sharing your work]] summary pdf.
 +
 +More specifically, CMIP6 data will be distributed under one of the following two licenses (each institute has to choose one):
 +
 +^  Logo  ^ Abbreviation ^ Full name ^ Details ^
 +| {{:database:by-sa.png?nolink&200}} | CC BY-SA 4.0 | Attribution-ShareAlike 4.0 International | https://creativecommons.org/licenses/by-sa/4.0/ |
 +| {{:database:by-nc-sa.png?nolink&200}} | CC BY-NC-SA 4.0 | Attribution-NonCommercial-ShareAlike 4.0 International | https://creativecommons.org/licenses/by-nc-sa/4.0/ |
 +
 +Note: CC buttons and logos are available from the CC site [[https://creativecommons.org/about/downloads/|Downloads]] page.
  • database/drs.txt
  • Last modified: 2017/06/09 08:32
  • by jypeter