How to create a clean netcdf file using the built-in toolbox of Matlab

Post date: Feb 13, 2013 2:5:16 PM

A while ago I wrote a post on how to create a clean netcdf file from Matlab using the Unidata netcdf toolbox. Starting from release 2010a, Matlab is now delivered with a built-in netcdf toolbox. And the API is not quite similar. So, I thought it would be nice top update the how-to as well. Here it is…

This is a Matlab script showing how to create a clean netcdf (4) file, including groups.

Create dummy data to put into the netcdf file

% Define axis, for example: X = 0:2:360; % Longitude Y = -90:2:90; % Latitude Z = -5000:100:0; % Depth T = datenum(1958:2006,1,1,0,0,0); % Time  % Create a random field to record in the cdf file dimensions are (Z,Y,X): C = rand(length(Z),length(Y),length(X));

File standard set-up

Open file and insert dimensions

Note I exclude the temporal axis for now. So let’s create a new netCDF4 file:

scope = netcdf.create('myfirstcdffile.nc','netcdf4');

I call the netcdf pointer a scope and not ncid or nc because in netCDF4, we can create groups with their own dimensions and variables. So we need to know where we are when defining dimensions and so on. It will become clear in the next section.

% Define useful constants: NC_GLOBAL = netcdf.getConstant('NC_GLOBAL'); fillValue = -9999;  % Define dimensions: dimidZ = netcdf.defDim(scope,'depth',length(Z)); dimidY = netcdf.defDim(scope,'latitude',length(Y)); dimidX = netcdf.defDim(scope,'longitude',length(X));  % Define axis: varid = netcdf.defVar(scope,'depth','double',[dimidZ]);          netcdf.putAtt(scope,varid,'standard_name','depth'); netcdf.putAtt(scope,varid,'long_name','Vertical distance below the surface'); netcdf.putAtt(scope,varid,'units','m'); netcdf.defVarFill(scope,varid,false,fillValue); netcdf.putVar(scope,varid,Z);  varid = netcdf.defVar(scope,'latitude','double',[dimidY]); netcdf.putAtt(scope,varid,'standard_name','latitude'); netcdf.putAtt(scope,varid,'long_name','Station latitude'); netcdf.putAtt(scope,varid,'units','degrees_north'); netcdf.defVarFill(scope,varid,false,fillValue); netcdf.putVar(scope,varid,Y);  varid = netcdf.defVar(scope,'longitude','double',[dimidX]); netcdf.putAtt(scope,varid,'standard_name','longitude'); netcdf.putAtt(scope,varid,'long_name','Station longitude'); netcdf.putAtt(scope,varid,'units','degrees_east'); netcdf.defVarFill(scope,varid,false,fillValue); netcdf.putVar(scope,varid,X);

Insert global attributes

These are just my stuff, you need to customize. But note attributes related to the convention. This is very useful for you netcdf file to be read by anybody else than you.

netcdf.putAtt(scope,NC_GLOBAL,'title','My first netcdf file') netcdf.putAtt(scope,NC_GLOBAL,'long_title','Self explanatory, isnt''it ?') netcdf.putAtt(scope,NC_GLOBAL,'comments','All variables are just noise !') netcdf.putAtt(scope,NC_GLOBAL,'institution','LPO/Ifremer') netcdf.putAtt(scope,NC_GLOBAL,'source','my_model')  netcdf.putAtt(scope,NC_GLOBAL,'Conventions','CF-1.6')    netcdf.putAtt(scope,NC_GLOBAL,'Conventions_help','http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.6/cf-conventions.html')  netcdf.putAtt(scope,NC_GLOBAL,'CreationDate',datestr(now,'yyyy/mm/dd HH:MM:SS')) netcdf.putAtt(scope,NC_GLOBAL,'CreatedBy',getenv('LOGNAME')) %netcdf.putAtt(scope,NC_GLOBAL,'MatlabSource',which('your_matlab_script_generating_this_file'))

Check what we’ve just done

We can now close the netcdf file (optional):

netcdf.close(scope)

and go in a terminal to look at it with:

ncdump -h myfirstcdffile.nc

or simply from Matlab display a minimum of informations:

>> ncdisp('myfirstcdffile.nc') Source:            /Users/gmaze/work/Production/How to : Tutorials/myfirstcdffile.nc Format:            netcdf4 Variables:     depth                Size:       51x1            Dimensions: depth            Datatype:   double     latitude             Size:       91x1            Dimensions: latitude            Datatype:   double     longitude            Size:       181x1            Dimensions: longitude            Datatype:   double

or a maximum:

>> ncdisp('myfirstcdffile.nc','/','full') Source:            /Users/gmaze/work/Production/How to : Tutorials/myfirstcdffile.nc Format:            netcdf4 Global Attributes:            title            = 'My first netcdf file'            long_title       = 'Self explanatory, isnt'it ?'            comments         = 'All variables are just noise !'            institution      = 'LPO/Ifremer'            source           = 'my_model'            Conventions      = 'CF-1.6'            Conventions_help = 'http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.6/cf-conventions.html'            CreationDate     = '2013/02/14 11:03:09'            CreatedBy        = 'gmaze' Dimensions:            depth     = 51            latitude  = 91            longitude = 181 Variables:     depth                Size:       51x1            Dimensions: depth            Datatype:   double            Attributes:                        standard_name = 'depth'                        long_name     = 'Vertical distance below the surface'                        units         = 'm'                        _FillValue    = -1e+04     latitude             Size:       91x1            Dimensions: latitude            Datatype:   double            Attributes:                        standard_name = 'latitude'                        long_name     = 'Station latitude'                        units         = 'degrees_north'                        _FillValue    = -1e+04     longitude            Size:       181x1            Dimensions: longitude            Datatype:   double            Attributes:                        standard_name = 'longitude'                        long_name     = 'Station longitude'                        units         = 'degrees_east'                        _FillValue    = -1e+04

Note that the ncdisp function can be used only with the file name. Here I used the complete syntax to highlight the specificity of netCDF4, with the slash /, we ask for information at the top location of the file. We don’t have groups yet, but we could have also asked for informations about a specific group.

Insert a 3D field

If the file was closed, we need to reopen it:

scope = netcdf.open('myfirstcdffile.nc','WRITE')  % Rq: here we use 'WRITE' because the file already exists

The variable we want to record is in the Matlab field C but we have to give it a name and define metadata for the netcdf file:

varname = 'my_first_field'; long_name = 'A long description of what this is, well simply noise !'; unit = 'm/s'; % for example of course

And now we can write it:

varid = netcdf.defVar(scope,varname,'double',[dimidZ, dimidY, dimidX]); netcdf.putAtt(scope,varid,'name',varname); netcdf.putAtt(scope,varid,'long_name',long_name); netcdf.putAtt(scope,varid,'units',unit); netcdf.defVarFill(scope,varid,false,fillValue); %netcdf.defVarDeflate(scope,varid,true,true,9);  v = C;     v(isnan(v)) = fillValue;     netcdf.putVar(scope,varid,v);

We can now close the netcdf file (optional):

netcdf.close(scope)

and from Matlab:

ncdisp('myfirstcdffile.nc')

and see the new variable we just added !

Insert a 2D field

The procedure is similar to the 3D case, except that when defining the new variable the dimensions are different:

iz = 1; % Define the level varname = sprintf('%s_level%i',varname,iz); long_name = 'Still a long description of what this is, well simply noise !'; unit = 'm/s'; varid = netcdf.defVar(scope,varname,'double',[dimidY, dimidX]); netcdf.putAtt(scope,varid,'name',varname); netcdf.putAtt(scope,varid,'long_name',long_name); netcdf.putAtt(scope,varid,'units',unit); netcdf.defVarFill(scope,varid,false,fillValue); v = squeeze(C(iz,:,:));     v(isnan(v)) = fillValue;     netcdf.putVar(scope,varid,v);

Handling the time axis

You probably use the native Matlab format for your date/time information: datenum. But this is not how conventional netcdf handles time.

Insert the new dimension

Usually, I define my netcdf time axis as the days (can be fractional) since 1900–01–01. So let’s reopen the file:

scope = netcdf.open('myfirstcdffile.nc','WRITE');  % Rq: here we use 'WRITE' because the file already exists

and add the new time axis. First, define the dimension:

dimidT = netcdf.defDim(scope,'time',length(T));

Then add the variable:

varid = netcdf.defVar(scope,'time','double',[dimidT]);           netcdf.putAtt(scope,varid,'standard_name','time'); netcdf.putAtt(scope,varid,'long_name','Time of measurements'); netcdf.putAtt(scope,varid,'units','days since 1900-01-01 00:00:00'); netcdf.defVarFill(scope,varid,false,fillValue); netcdf.putVar(scope,varid,T-datenum(1900,1,1,0,0,0));

NOTE THAT THIS IS ONLY MY CHOICE, YOU NEED TO CUSTOMIZE WITH YOUR REQUIREMENTS

Insert a time varying field

Now we create a dummy variable, supposedly a timeserie of an horizontal 2D fields:

D = rand(length(T),length(Y),length(X));

And record it:

varname = 'my_time_serie'; long_name = 'This is a time serie of noise !'; unit = 'degC';  varid = netcdf.defVar(scope,varname,'double',[dimidT,dimdY,dimdX]);          netcdf.putAtt(scope,varid,'long_name',long_name); netcdf.putAtt(scope,varid,'units',unit); netcdf.defVarFill(scope,varid,false,fillValue); v = D;     v(isnan(v)) = fillValue;     netcdf.putVar(scope,varid,v);   

We can now close the netcdf file (optional):

netcdf.close(scope)

and from Matlab:

ncdisp('myfirstcdffile.nc')

Variant with groups

This is one of the main new possibility of netCDF4: you can put some structure into the netcdf file itself. I won’t go into the details, but this simply like sub-folders into the file. Let’s see with an example. Imagine that we want we the same netcdf file to contain model output (or observations) as well as some results of their analysis, like say a principal component timeserie from an EOF analysis. We could define the netCDF4 file structure like:

global_scope  = netcdf.create('myfirstcdf4file.nc','netcdf4'); netcdf.putAtt(global_scope,NC_GLOBAL,'Comment','This file is not a mess ! It is organized by group')  data_scope    = netcdf.defGrp(global_scope,'Raw_data'); netcdf.putAtt(data_scope,NC_GLOBAL,'Comment','This group or part of the file contains raw data')  another_scope = netcdf.defGrp(global_scope,'Analysis'); netcdf.putAtt(another_scope,NC_GLOBAL,'Comment','This second group or part of the file contains analysis data')

and try:

ncdisp('myfirstcdf4file.nc')

wait ! this throw an error:

Error using netcdflib The NetCDF library encountered an error during execution of 'open' function - 'Unknown file format (NC_ENOTNC)'.  Error in netcdf.open (line 60)         [varargout{:}] = netcdflib ( 'open', filename, varargin{1} );  Error in internal.matlab.imagesci.nc/openToRead (line 1258)             this.ncRootid = netcdf.open(this.Filename,'NOWRITE');  Error in internal.matlab.imagesci.nc (line 122)                     this.openToRead();  Error in ncdisp (line 51) ncObj   = internal.matlab.imagesci.nc(ncFile);

Ok, this was simply to show you that there is a difference between what Matlab knows or wants to do with the actual file state on disk. We need to synchronize both with a sync:

netcdf.sync(global_scope) ncdisp('myfirstcdf4file.nc')

which returns:

Source:            /Users/gmaze/work/Production/How to : Tutorials/myfirstcdf4file.nc Format:            netcdf4 Groups:     /Raw_data/      /Analysis/

this is where the ncdisp parameter location is useful, we can display information for a single group, try, for instance:

ncdisp('myfirstcdf4file.nc','Raw_data','full')

Now, each group can contain its own dimensions and variables, and filling in the data, is as simple as with the previous case, we just need to specify the correct scope of where we want to put information. Consider the following script implementing our use case scenario:

%% Setup data for entire file:  netcdf.putAtt(global_scope,NC_GLOBAL,'title','My first netCDF4 file') netcdf.putAtt(global_scope,NC_GLOBAL,'long_title','Self explanatory, isnt''it ?') netcdf.putAtt(global_scope,NC_GLOBAL,'comments','Groups are very useful !') netcdf.putAtt(global_scope,NC_GLOBAL,'institution','LPO/Ifremer') netcdf.putAtt(global_scope,NC_GLOBAL,'source','my_model') netcdf.putAtt(global_scope,NC_GLOBAL,'Conventions','CF-1.6')     netcdf.putAtt(global_scope,NC_GLOBAL,'Conventions_help','http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.4/cf-conventions.html') netcdf.putAtt(global_scope,NC_GLOBAL,'CreationDate',datestr(now,'yyyy/mm/dd HH:MM:SS')) netcdf.putAtt(global_scope,NC_GLOBAL,'CreatedBy',getenv('LOGNAME')) %netcdf.putAtt(global_scope,NC_GLOBAL,'MatlabSource',which('your_matlab_script_generating_this_file'))  %% Put data into the first group:  % Define dimensions: dimidT = netcdf.defDim(data_scope,'time',length(T));     dimidY = netcdf.defDim(data_scope,'latitude',length(Y)); dimidX = netcdf.defDim(data_scope,'longitude',length(X));  % Define axis: varid = netcdf.defVar(data_scope,'time','double',[dimidT]);          netcdf.putAtt(data_scope,varid,'standard_name','time'); netcdf.putAtt(data_scope,varid,'long_name','Time of measurements'); netcdf.putAtt(data_scope,varid,'units','days since 1900-01-01 00:00:00'); netcdf.defVarFill(data_scope,varid,false,fillValue); netcdf.putVar(data_scope,varid,T-datenum(1900,1,1,0,0,0));  varid = netcdf.defVar(data_scope,'latitude','double',[dimidY]); netcdf.putAtt(data_scope,varid,'standard_name','latitude'); netcdf.putAtt(data_scope,varid,'long_name','Station latitude'); netcdf.putAtt(data_scope,varid,'units','degrees_north'); netcdf.defVarFill(data_scope,varid,false,fillValue); netcdf.putVar(data_scope,varid,Y);  varid = netcdf.defVar(data_scope,'longitude','double',[dimidX]); netcdf.putAtt(data_scope,varid,'standard_name','longitude'); netcdf.putAtt(data_scope,varid,'long_name','Station longitude'); netcdf.putAtt(data_scope,varid,'units','degrees_east'); netcdf.defVarFill(data_scope,varid,false,fillValue); netcdf.putVar(data_scope,varid,X);  % Insert data: varid = netcdf.defVar(data_scope,'SST','double',[dimidT, dimidY, dimidX]); netcdf.putAtt(data_scope,varid,'name','Sea_Surface_Temperature'); netcdf.putAtt(data_scope,varid,'long_name','Temperature at the surface of the ocean'); netcdf.putAtt(data_scope,varid,'units','degC'); netcdf.defVarFill(data_scope,varid,false,fillValue); v = D;     v(isnan(v)) = fillValue;     netcdf.putVar(data_scope,varid,v);   % Synchronize: netcdf.sync(data_scope)  % Look at what we've just done: disp('------ ncdisp of file (after data insertion in the global scope and one group):') ncdisp('myfirstcdf4file.nc','/','full') disp('------')

Now, let’s put data into the second group. Image we performed an EOF analysis only over the first 40 snapshots of the variable named SST and recorded in the Raw_data group. We want to save the first principal component of the analysis:

%% Put data into the second group:  % Define dimensions: T2 = T(1:40); dimidT = netcdf.defDim(another_scope,'time',length(T2));      % Define axis: varid = netcdf.defVar(another_scope,'time','double',[dimidT]);           netcdf.putAtt(another_scope,varid,'standard_name','time'); netcdf.putAtt(another_scope,varid,'units','days since 1900-01-01 00:00:00'); netcdf.defVarFill(another_scope,varid,false,fillValue); netcdf.putVar(another_scope,varid,T2-datenum(1900,1,1,0,0,0));  % Insert data: varid = netcdf.defVar(another_scope,'PC','double',[dimidT]); netcdf.putAtt(another_scope,varid,'name','Principal_Component'); netcdf.putAtt(another_scope,varid,'units','none'); netcdf.defVarFill(another_scope,varid,false,fillValue); v = rand(length(T2),1);     v(isnan(v)) = fillValue;     netcdf.putVar(another_scope,varid,v);  % Synchronize: netcdf.sync(another_scope)

Note how a dimension and a variable can be defined in 2 groups and have different properties. Before netCDF4 this would have forced us to create a dimension and a variable with a different name. So the final structure of our netCDF4 file is:

>> ncdisp('myfirstcdf4file.nc','/','full')  Source:            /Users/gmaze/work/Production/How to : Tutorials/myfirstcdf4file.nc Format:            netcdf4 Global Attributes:            Comment          = 'This file is not a mess ! It is organized by group'            title            = 'My first netCDF4 file'            long_title       = 'Self explanatory, isnt'it ?'            comments         = 'Groups are very useful !'            institution      = 'LPO/Ifremer'            source           = 'my_model'            Conventions      = 'CF-1.6'            Conventions_help = 'http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.4/cf-conventions.html'            CreationDate     = '2013/02/14 12:21:35'            CreatedBy        = 'gmaze' Groups:     /Raw_data/         Attributes:                    Comment = 'This group or part of the file contains raw data'         Dimensions:                    time      = 49                    latitude  = 91                    longitude = 181         Variables:             time                         Size:       49x1                    Dimensions: time                    Datatype:   double                    Attributes:                                standard_name = 'time'                                long_name     = 'Time of measurements'                                units         = 'days since 1900-01-01 00:00:00'                                _FillValue    = -1e+04             latitude                     Size:       91x1                    Dimensions: latitude                    Datatype:   double                    Attributes:                                standard_name = 'latitude'                                long_name     = 'Station latitude'                                units         = 'degrees_north'                                _FillValue    = -1e+04             longitude                    Size:       181x1                    Dimensions: longitude                    Datatype:   double                    Attributes:                                standard_name = 'longitude'                                long_name     = 'Station longitude'                                units         = 'degrees_east'                                _FillValue    = -1e+04             SST                          Size:       49x91x181                    Dimensions: time,latitude,longitude                    Datatype:   double                    Attributes:                                name       = 'Sea_Surface_Temperature'                                long_name  = 'Temperature at the surface of the ocean'                                units      = 'degC'                                _FillValue = -1e+04      /Analysis/         Attributes:                    Comment = 'This second group or part of the file contains analysis data'         Dimensions:                    time = 40         Variables:             time                    Size:       40x1                    Dimensions: time                    Datatype:   double                    Attributes:                                standard_name = 'time'                                long_name     = 'Time of measurements'                                units         = 'days since 1900-01-01 00:00:00'                                _FillValue    = -1e+04             PC                      Size:       40x1                    Dimensions: time                    Datatype:   double                    Attributes:                                name       = 'Principal_Component'                                units      = ''                                _FillValue = -1e+04