How to organize your digital working environment: two principles and one example for beginners
Post date: Jul 08, 2013 7:28:21 PM
This post is for those starting an internship or a PhD. It summarizes some of the advice I give to my students.
2 principles:
To start with, here are two principles that should guide you:
"In 4 years, someone I don't know now should be able to easily find information in my stuff". This means that you should consider yourself part of a team. In practice, it means for example: comment your scripts extensively, add readme.txt files to your folders to explain what's inside and where it comes from, don't keep names like "toto", "tutu", "tata", "tmp", "old", "very_old" or "new" on files or folders for more than a week, name files and folders for what they are or contain, etc.
"Think of future tasks in order to organize present ones". I think the core principle here is modularity. It applies to folder organization as well as code structure. For instance, if you analyze a given dataset for a given project, the scripts for the analysis should be in a project folder and the data files in another. Maybe later you'll conduct the same analysis on a different dataset, or use the same dataset for a different analysis, and keeping the two apart lets you reuse either one without untangling it from the other. In code, this means that you should write functions and/or classes as soon as you have copy/pasted the same code snippet more than 3 times!
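To make the copy/paste rule concrete, here is a minimal sketch. It is written in Python only for illustration (the same idea applies to a Matlab function file), and the function name `anomaly` and the sample values are made up for this example. Instead of pasting the "subtract the mean" lines once per variable, the snippet lives in one function that every variable reuses:

```python
from statistics import mean


def anomaly(values):
    """Return each value minus the mean of the whole series."""
    series_mean = mean(values)
    return [v - series_mean for v in values]


# Hypothetical measurements, invented for this illustration.
temperature = [10.0, 12.0, 14.0]
salinity = [34.0, 35.0, 36.0]

# One function call per variable instead of one pasted snippet per variable.
temperature_anomaly = anomaly(temperature)
salinity_anomaly = anomaly(salinity)
```

If the computation ever needs to change, there is a single place to edit, and the function name documents what the lines do.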
1 example:
To set up a clean working environment, you first need a root folder. It could be your home directory (it's often the case), but it could also be any folder on your disk. Just select it and place EVERYTHING underneath. As side effects, backups become much easier to handle and, if you use a single computer, your work is cleanly split from your private life.

Now that you have your root folder, let's take an example to illustrate the two principles stated above. Let's say you start an internship in a lab. You're going to use Matlab for your analysis and visualization, and you're going to use a dataset called Argo, downloaded from an FTP server. Under the root folder, I suggest you have the following folders:
+ data
+ matlab
+ projects
Now, let's expand these folders:
- data
    - ARGO
        readme.txt
        + ftp
        + docs
        - matlab
            plot_one_float_trajectory.m
- matlab
    m_map -> m_map_1.4
    + m_map_1.2
    + m_map_1.4
    + copoda
    startup.m
- projects
    - my_first_analysis_of_Argo_data
        - analysis
            + figures
            load_coordinates.m
            map_profiles.m
        - reports
            + figures
            version0.doc
    + my_second_project_as_a_phd
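If you prefer not to create these folders by hand, the layout can be scripted. The following is only a sketch, in Python, under a few assumptions: the function name `create_skeleton`, the `FOLDERS` list and the placeholder text written to readme.txt are all hypothetical, and the toolbox folders and the m_map link are left out because they come from the toolboxes you install.

```python
from pathlib import Path

# Folders from the example tree, relative to the root folder.
FOLDERS = [
    "data/ARGO/ftp",
    "data/ARGO/docs",
    "data/ARGO/matlab",
    "matlab",
    "projects/my_first_analysis_of_Argo_data/analysis/figures",
    "projects/my_first_analysis_of_Argo_data/reports/figures",
]


def create_skeleton(root):
    """Create the example folders under root, plus a readme.txt to fill in."""
    root = Path(root)
    for folder in FOLDERS:
        # exist_ok=True makes the function safe to run more than once.
        (root / folder).mkdir(parents=True, exist_ok=True)
    readme = root / "data" / "ARGO" / "readme.txt"
    if not readme.exists():
        # Placeholder fields only (an assumption); never overwrite a real readme.
        readme.write_text("Source FTP server:\nDownload date:\n")
    return root
```

Calling `create_skeleton(Path.home())` would build the tree under your home directory; pass any other folder to use a different root.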
Under the data folder, we have a folder dedicated to the Argo dataset. It contains:
a readme.txt stating which FTP server you downloaded the dataset from, and when
an ftp folder holding the data files
a docs folder for documents related to this dataset, such as the file format description
a matlab folder for scripts that are not tied to a scientific analysis or a specific project but perform generic tasks on the dataset
Whatever your use or analysis of this dataset, this folder is a standalone source for it. You can then share this folder with your team or easily get back to it in 4 years.

Matlab has its own folder at the root level because this lets you handle your own toolboxes and customized environment easily. In this example I added several sub-folders. The "m_map" folder is in fact a link to the "m_map_1.4" one. This simple trick can be used to handle several versions of, and updates to, the same toolbox. Everywhere you need to refer to this toolbox, you point to the "m_map" folder (see the startup.m file below). Then, with the link, you point to the desired version. The "copoda" folder doesn't use this method because it's a toolbox managed in an svn repository, so its versions are tracked by the repository rather than by folder names.

Last, you have the "startup.m" file. Matlab runs it automatically when starting a new session. In this file you tell Matlab which toolboxes to load. It could look like:
    % Startup file for Matlab
    % Last update: 2013/07/01
    disp('Hello you !')
    % Load toolboxes:
    addpath('~/matlab/m_map')
    % End of startup
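Since startup.m only ever refers to "m_map", switching toolbox versions comes down to re-pointing that link. Here is one way it could be done, sketched in Python; the function name `point_link_to_version` is hypothetical, and the sketch assumes a system where you are allowed to create symbolic links (on Windows this may require extra privileges).

```python
import os
from pathlib import Path


def point_link_to_version(toolbox_dir, link_name, version_folder):
    """Make toolbox_dir/link_name a symbolic link to version_folder."""
    toolbox_dir = Path(toolbox_dir)
    link = toolbox_dir / link_name
    if not (toolbox_dir / version_folder).is_dir():
        raise FileNotFoundError(f"No such toolbox version: {version_folder}")
    if link.is_symlink():
        # Remove only the old link; the versioned folders are left untouched.
        link.unlink()
    elif link.exists():
        # Refuse to replace a real file or folder that happens to use this name.
        raise FileExistsError(f"{link} exists and is not a link")
    # A relative target keeps the link valid if the root folder is moved.
    os.symlink(version_folder, link, target_is_directory=True)
    return link
```

With the example tree, `point_link_to_version(Path.home() / "matlab", "m_map", "m_map_1.4")` would select version 1.4, and calling it again with "m_map_1.2" would roll back without touching startup.m.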
Last, the projects folder will contain all the real science work! If there were only one thing to back up here, it would be this folder; all the others could be regenerated. Note that I created two sub-folders for the first project: "analysis" and "reports". The "analysis" folder will contain all the scripts and figures produced during the research phase; the "reports" folder will contain all the reports and, hopefully, articles about your work.

Of course, all of this is only an example to help beginners organize their work at a minimum. It all depends on the tools and common practices in your lab, but the two principles stated above should be kept in mind whatever your situation.