How to create a clean netcdf file using the built-in toolbox of Matlab
Post date: Feb 13, 2013 2:5:16 PM
A while ago I wrote a post on how to create a clean netcdf file from Matlab using the Unidata netcdf toolbox. Starting from release 2010a, Matlab is now delivered with a built-in netcdf toolbox. And the API is not quite similar. So, I thought it would be nice top update the how-to as well. Here it is…
This is a Matlab script showing how to create a clean netcdf (4) file, including groups.
Create dummy data to put into the netcdf file
% Define axis, for example: X = 0:2:360; % Longitude Y = -90:2:90; % Latitude Z = -5000:100:0; % Depth T = datenum(1958:2006,1,1,0,0,0); % Time % Create a random field to record in the cdf file dimensions are (Z,Y,X): C = rand(length(Z),length(Y),length(X));
File standard set-up
Open file and insert dimensions
Note I exclude the temporal axis for now. So let’s create a new netCDF4 file:
scope = netcdf.create('myfirstcdffile.nc','netcdf4');
I call the netcdf pointer a scope and not ncid or nc because in netCDF4, we can create groups with their own dimensions and variables. So we need to know where we are when defining dimensions and so on. It will become clear in the next section.
% Define useful constants: NC_GLOBAL = netcdf.getConstant('NC_GLOBAL'); fillValue = -9999; % Define dimensions: dimidZ = netcdf.defDim(scope,'depth',length(Z)); dimidY = netcdf.defDim(scope,'latitude',length(Y)); dimidX = netcdf.defDim(scope,'longitude',length(X)); % Define axis: varid = netcdf.defVar(scope,'depth','double',[dimidZ]); netcdf.putAtt(scope,varid,'standard_name','depth'); netcdf.putAtt(scope,varid,'long_name','Vertical distance below the surface'); netcdf.putAtt(scope,varid,'units','m'); netcdf.defVarFill(scope,varid,false,fillValue); netcdf.putVar(scope,varid,Z); varid = netcdf.defVar(scope,'latitude','double',[dimidY]); netcdf.putAtt(scope,varid,'standard_name','latitude'); netcdf.putAtt(scope,varid,'long_name','Station latitude'); netcdf.putAtt(scope,varid,'units','degrees_north'); netcdf.defVarFill(scope,varid,false,fillValue); netcdf.putVar(scope,varid,Y); varid = netcdf.defVar(scope,'longitude','double',[dimidX]); netcdf.putAtt(scope,varid,'standard_name','longitude'); netcdf.putAtt(scope,varid,'long_name','Station longitude'); netcdf.putAtt(scope,varid,'units','degrees_east'); netcdf.defVarFill(scope,varid,false,fillValue); netcdf.putVar(scope,varid,X);
Insert global attributes
These are just my stuff, you need to customize. But note attributes related to the convention. This is very useful for you netcdf file to be read by anybody else than you.
netcdf.putAtt(scope,NC_GLOBAL,'title','My first netcdf file') netcdf.putAtt(scope,NC_GLOBAL,'long_title','Self explanatory, isnt''it ?') netcdf.putAtt(scope,NC_GLOBAL,'comments','All variables are just noise !') netcdf.putAtt(scope,NC_GLOBAL,'institution','LPO/Ifremer') netcdf.putAtt(scope,NC_GLOBAL,'source','my_model') netcdf.putAtt(scope,NC_GLOBAL,'Conventions','CF-1.6') netcdf.putAtt(scope,NC_GLOBAL,'Conventions_help','http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.6/cf-conventions.html') netcdf.putAtt(scope,NC_GLOBAL,'CreationDate',datestr(now,'yyyy/mm/dd HH:MM:SS')) netcdf.putAtt(scope,NC_GLOBAL,'CreatedBy',getenv('LOGNAME')) %netcdf.putAtt(scope,NC_GLOBAL,'MatlabSource',which('your_matlab_script_generating_this_file'))
Check what we’ve just done
We can now close the netcdf file (optional):
netcdf.close(scope)
and go in a terminal to look at it with:
ncdump -h myfirstcdffile.nc
or simply from Matlab display a minimum of informations:
>> ncdisp('myfirstcdffile.nc') Source: /Users/gmaze/work/Production/How to : Tutorials/myfirstcdffile.nc Format: netcdf4 Variables: depth Size: 51x1 Dimensions: depth Datatype: double latitude Size: 91x1 Dimensions: latitude Datatype: double longitude Size: 181x1 Dimensions: longitude Datatype: double
or a maximum:
>> ncdisp('myfirstcdffile.nc','/','full') Source: /Users/gmaze/work/Production/How to : Tutorials/myfirstcdffile.nc Format: netcdf4 Global Attributes: title = 'My first netcdf file' long_title = 'Self explanatory, isnt'it ?' comments = 'All variables are just noise !' institution = 'LPO/Ifremer' source = 'my_model' Conventions = 'CF-1.6' Conventions_help = 'http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.6/cf-conventions.html' CreationDate = '2013/02/14 11:03:09' CreatedBy = 'gmaze' Dimensions: depth = 51 latitude = 91 longitude = 181 Variables: depth Size: 51x1 Dimensions: depth Datatype: double Attributes: standard_name = 'depth' long_name = 'Vertical distance below the surface' units = 'm' _FillValue = -1e+04 latitude Size: 91x1 Dimensions: latitude Datatype: double Attributes: standard_name = 'latitude' long_name = 'Station latitude' units = 'degrees_north' _FillValue = -1e+04 longitude Size: 181x1 Dimensions: longitude Datatype: double Attributes: standard_name = 'longitude' long_name = 'Station longitude' units = 'degrees_east' _FillValue = -1e+04
Note that the ncdisp function can be used only with the file name. Here I used the complete syntax to highlight the specificity of netCDF4, with the slash /, we ask for information at the top location of the file. We don’t have groups yet, but we could have also asked for informations about a specific group.
Insert a 3D field
If the file was closed, we need to reopen it:
scope = netcdf.open('myfirstcdffile.nc','WRITE') % Rq: here we use 'WRITE' because the file already exists
The variable we want to record is in the Matlab field C but we have to give it a name and define metadata for the netcdf file:
varname = 'my_first_field'; long_name = 'A long description of what this is, well simply noise !'; unit = 'm/s'; % for example of course
And now we can write it:
varid = netcdf.defVar(scope,varname,'double',[dimidZ, dimidY, dimidX]); netcdf.putAtt(scope,varid,'name',varname); netcdf.putAtt(scope,varid,'long_name',long_name); netcdf.putAtt(scope,varid,'units',unit); netcdf.defVarFill(scope,varid,false,fillValue); %netcdf.defVarDeflate(scope,varid,true,true,9); v = C; v(isnan(v)) = fillValue; netcdf.putVar(scope,varid,v);
We can now close the netcdf file (optional):
netcdf.close(scope)
and from Matlab:
ncdisp('myfirstcdffile.nc')
and see the new variable we just added !
Insert a 2D field
The procedure is similar to the 3D case, except that when defining the new variable the dimensions are different:
iz = 1; % Define the level varname = sprintf('%s_level%i',varname,iz); long_name = 'Still a long description of what this is, well simply noise !'; unit = 'm/s'; varid = netcdf.defVar(scope,varname,'double',[dimidY, dimidX]); netcdf.putAtt(scope,varid,'name',varname); netcdf.putAtt(scope,varid,'long_name',long_name); netcdf.putAtt(scope,varid,'units',unit); netcdf.defVarFill(scope,varid,false,fillValue); v = squeeze(C(iz,:,:)); v(isnan(v)) = fillValue; netcdf.putVar(scope,varid,v);
Handling the time axis
You probably use the native Matlab format for your date/time information: datenum. But this is not how conventional netcdf handles time.
Insert the new dimension
Usually, I define my netcdf time axis as the days (can be fractional) since 1900–01–01. So let’s reopen the file:
scope = netcdf.open('myfirstcdffile.nc','WRITE'); % Rq: here we use 'WRITE' because the file already exists
and add the new time axis. First, define the dimension:
dimidT = netcdf.defDim(scope,'time',length(T));
Then add the variable:
varid = netcdf.defVar(scope,'time','double',[dimidT]); netcdf.putAtt(scope,varid,'standard_name','time'); netcdf.putAtt(scope,varid,'long_name','Time of measurements'); netcdf.putAtt(scope,varid,'units','days since 1900-01-01 00:00:00'); netcdf.defVarFill(scope,varid,false,fillValue); netcdf.putVar(scope,varid,T-datenum(1900,1,1,0,0,0));
NOTE THAT THIS IS ONLY MY CHOICE, YOU NEED TO CUSTOMIZE WITH YOUR REQUIREMENTS
Insert a time varying field
Now we create a dummy variable, supposedly a timeserie of an horizontal 2D fields:
D = rand(length(T),length(Y),length(X));
And record it:
varname = 'my_time_serie'; long_name = 'This is a time serie of noise !'; unit = 'degC'; varid = netcdf.defVar(scope,varname,'double',[dimidT,dimdY,dimdX]); netcdf.putAtt(scope,varid,'long_name',long_name); netcdf.putAtt(scope,varid,'units',unit); netcdf.defVarFill(scope,varid,false,fillValue); v = D; v(isnan(v)) = fillValue; netcdf.putVar(scope,varid,v);
We can now close the netcdf file (optional):
netcdf.close(scope)
and from Matlab:
ncdisp('myfirstcdffile.nc')
Variant with groups
This is one of the main new possibility of netCDF4: you can put some structure into the netcdf file itself. I won’t go into the details, but this simply like sub-folders into the file. Let’s see with an example. Imagine that we want we the same netcdf file to contain model output (or observations) as well as some results of their analysis, like say a principal component timeserie from an EOF analysis. We could define the netCDF4 file structure like:
global_scope = netcdf.create('myfirstcdf4file.nc','netcdf4'); netcdf.putAtt(global_scope,NC_GLOBAL,'Comment','This file is not a mess ! It is organized by group') data_scope = netcdf.defGrp(global_scope,'Raw_data'); netcdf.putAtt(data_scope,NC_GLOBAL,'Comment','This group or part of the file contains raw data') another_scope = netcdf.defGrp(global_scope,'Analysis'); netcdf.putAtt(another_scope,NC_GLOBAL,'Comment','This second group or part of the file contains analysis data')
and try:
ncdisp('myfirstcdf4file.nc')
wait ! this throw an error:
Error using netcdflib The NetCDF library encountered an error during execution of 'open' function - 'Unknown file format (NC_ENOTNC)'. Error in netcdf.open (line 60) [varargout{:}] = netcdflib ( 'open', filename, varargin{1} ); Error in internal.matlab.imagesci.nc/openToRead (line 1258) this.ncRootid = netcdf.open(this.Filename,'NOWRITE'); Error in internal.matlab.imagesci.nc (line 122) this.openToRead(); Error in ncdisp (line 51) ncObj = internal.matlab.imagesci.nc(ncFile);
Ok, this was simply to show you that there is a difference between what Matlab knows or wants to do with the actual file state on disk. We need to synchronize both with a sync:
netcdf.sync(global_scope) ncdisp('myfirstcdf4file.nc')
which returns:
Source: /Users/gmaze/work/Production/How to : Tutorials/myfirstcdf4file.nc Format: netcdf4 Groups: /Raw_data/ /Analysis/
this is where the ncdisp parameter location is useful, we can display information for a single group, try, for instance:
ncdisp('myfirstcdf4file.nc','Raw_data','full')
Now, each group can contain its own dimensions and variables, and filling in the data, is as simple as with the previous case, we just need to specify the correct scope of where we want to put information. Consider the following script implementing our use case scenario:
%% Setup data for entire file: netcdf.putAtt(global_scope,NC_GLOBAL,'title','My first netCDF4 file') netcdf.putAtt(global_scope,NC_GLOBAL,'long_title','Self explanatory, isnt''it ?') netcdf.putAtt(global_scope,NC_GLOBAL,'comments','Groups are very useful !') netcdf.putAtt(global_scope,NC_GLOBAL,'institution','LPO/Ifremer') netcdf.putAtt(global_scope,NC_GLOBAL,'source','my_model') netcdf.putAtt(global_scope,NC_GLOBAL,'Conventions','CF-1.6') netcdf.putAtt(global_scope,NC_GLOBAL,'Conventions_help','http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.4/cf-conventions.html') netcdf.putAtt(global_scope,NC_GLOBAL,'CreationDate',datestr(now,'yyyy/mm/dd HH:MM:SS')) netcdf.putAtt(global_scope,NC_GLOBAL,'CreatedBy',getenv('LOGNAME')) %netcdf.putAtt(global_scope,NC_GLOBAL,'MatlabSource',which('your_matlab_script_generating_this_file')) %% Put data into the first group: % Define dimensions: dimidT = netcdf.defDim(data_scope,'time',length(T)); dimidY = netcdf.defDim(data_scope,'latitude',length(Y)); dimidX = netcdf.defDim(data_scope,'longitude',length(X)); % Define axis: varid = netcdf.defVar(data_scope,'time','double',[dimidT]); netcdf.putAtt(data_scope,varid,'standard_name','time'); netcdf.putAtt(data_scope,varid,'long_name','Time of measurements'); netcdf.putAtt(data_scope,varid,'units','days since 1900-01-01 00:00:00'); netcdf.defVarFill(data_scope,varid,false,fillValue); netcdf.putVar(data_scope,varid,T-datenum(1900,1,1,0,0,0)); varid = netcdf.defVar(data_scope,'latitude','double',[dimidY]); netcdf.putAtt(data_scope,varid,'standard_name','latitude'); netcdf.putAtt(data_scope,varid,'long_name','Station latitude'); netcdf.putAtt(data_scope,varid,'units','degrees_north'); netcdf.defVarFill(data_scope,varid,false,fillValue); netcdf.putVar(data_scope,varid,Y); varid = netcdf.defVar(data_scope,'longitude','double',[dimidX]); netcdf.putAtt(data_scope,varid,'standard_name','longitude'); netcdf.putAtt(data_scope,varid,'long_name','Station longitude'); netcdf.putAtt(data_scope,varid,'units','degrees_east'); netcdf.defVarFill(data_scope,varid,false,fillValue); netcdf.putVar(data_scope,varid,X); % Insert data: varid = netcdf.defVar(data_scope,'SST','double',[dimidT, dimidY, dimidX]); netcdf.putAtt(data_scope,varid,'name','Sea_Surface_Temperature'); netcdf.putAtt(data_scope,varid,'long_name','Temperature at the surface of the ocean'); netcdf.putAtt(data_scope,varid,'units','degC'); netcdf.defVarFill(data_scope,varid,false,fillValue); v = D; v(isnan(v)) = fillValue; netcdf.putVar(data_scope,varid,v); % Synchronize: netcdf.sync(data_scope) % Look at what we've just done: disp('------ ncdisp of file (after data insertion in the global scope and one group):') ncdisp('myfirstcdf4file.nc','/','full') disp('------')
Now, let’s put data into the second group. Image we performed an EOF analysis only over the first 40 snapshots of the variable named SST and recorded in the Raw_data group. We want to save the first principal component of the analysis:
%% Put data into the second group: % Define dimensions: T2 = T(1:40); dimidT = netcdf.defDim(another_scope,'time',length(T2)); % Define axis: varid = netcdf.defVar(another_scope,'time','double',[dimidT]); netcdf.putAtt(another_scope,varid,'standard_name','time'); netcdf.putAtt(another_scope,varid,'units','days since 1900-01-01 00:00:00'); netcdf.defVarFill(another_scope,varid,false,fillValue); netcdf.putVar(another_scope,varid,T2-datenum(1900,1,1,0,0,0)); % Insert data: varid = netcdf.defVar(another_scope,'PC','double',[dimidT]); netcdf.putAtt(another_scope,varid,'name','Principal_Component'); netcdf.putAtt(another_scope,varid,'units','none'); netcdf.defVarFill(another_scope,varid,false,fillValue); v = rand(length(T2),1); v(isnan(v)) = fillValue; netcdf.putVar(another_scope,varid,v); % Synchronize: netcdf.sync(another_scope)
Note how a dimension and a variable can be defined in 2 groups and have different properties. Before netCDF4 this would have forced us to create a dimension and a variable with a different name. So the final structure of our netCDF4 file is:
>> ncdisp('myfirstcdf4file.nc','/','full') Source: /Users/gmaze/work/Production/How to : Tutorials/myfirstcdf4file.nc Format: netcdf4 Global Attributes: Comment = 'This file is not a mess ! It is organized by group' title = 'My first netCDF4 file' long_title = 'Self explanatory, isnt'it ?' comments = 'Groups are very useful !' institution = 'LPO/Ifremer' source = 'my_model' Conventions = 'CF-1.6' Conventions_help = 'http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.4/cf-conventions.html' CreationDate = '2013/02/14 12:21:35' CreatedBy = 'gmaze' Groups: /Raw_data/ Attributes: Comment = 'This group or part of the file contains raw data' Dimensions: time = 49 latitude = 91 longitude = 181 Variables: time Size: 49x1 Dimensions: time Datatype: double Attributes: standard_name = 'time' long_name = 'Time of measurements' units = 'days since 1900-01-01 00:00:00' _FillValue = -1e+04 latitude Size: 91x1 Dimensions: latitude Datatype: double Attributes: standard_name = 'latitude' long_name = 'Station latitude' units = 'degrees_north' _FillValue = -1e+04 longitude Size: 181x1 Dimensions: longitude Datatype: double Attributes: standard_name = 'longitude' long_name = 'Station longitude' units = 'degrees_east' _FillValue = -1e+04 SST Size: 49x91x181 Dimensions: time,latitude,longitude Datatype: double Attributes: name = 'Sea_Surface_Temperature' long_name = 'Temperature at the surface of the ocean' units = 'degC' _FillValue = -1e+04 /Analysis/ Attributes: Comment = 'This second group or part of the file contains analysis data' Dimensions: time = 40 Variables: time Size: 40x1 Dimensions: time Datatype: double Attributes: standard_name = 'time' long_name = 'Time of measurements' units = 'days since 1900-01-01 00:00:00' _FillValue = -1e+04 PC Size: 40x1 Dimensions: time Datatype: double Attributes: name = 'Principal_Component' units = '' _FillValue = -1e+04