This page provides basic tips on how to write well-structured, readable code.
A fundamental issue with many scientific software tools is that the code is hard to read unless you are an expert in the science behind the program. This often stems from names of variables, functions, files, etc., that are chosen to be as short as possible. Scientists who work with mathematical formulae in their everyday life are particularly prone to filling their scripts with cryptic combinations of one to five characters and Greek letters. This may be concise and physically sound, but users and potential programming partners will soon need a comprehensive manual to understand what’s happening.
    rho_s = 100
    rho_i = 917
    c = 0.9
    t = 0
    ts = 1
    rho_p = rho_s
    while rho_p <= rho_i:
        rho_p += c
        t += ts
The solution is simple: use descriptive names. For example, changing the variable name ‘rho_s’ to ‘density_fresh_snow’ makes sure everybody understands immediately what the variable refers to. This saves the programmer time writing comments. It also reduces the risk of ambiguity that may result in misunderstandings and bugs. Most modern code editors and IDEs have powerful tools to search and replace names throughout scripts.
    density_fresh_snow = 100
    density_ice = 917
    compaction = 0.9
    time = 0
    time_step = 1
    density_snowpack = density_fresh_snow
    while density_snowpack <= density_ice:
        density_snowpack += compaction
        time += time_step
Adding comments makes code much easier to understand. When used systematically, e.g. with prominent section headlines and subtle explanations for variables and functions, comments also help to visualize the structure of a program.
    # Variable declarations
    density_fresh_snow = 100   # Unit: kg/m³
    density_ice = 917          # Unit: kg/m³
    compaction = 0.9           # Unit: kg/m³
    time = 0                   # Initial timer value, unit: days
    time_step = 1              # Time increment per iteration, unit: days

    # Set initial density of snowpack to density of fresh snow
    density_snowpack = density_fresh_snow

    # Apply snowpack compaction until density of ice is reached ...
    while density_snowpack <= density_ice:
        density_snowpack += compaction
        # Measure the time
        time += time_step
Many scientific software projects begin as a small script. Often, these scripts grow longer and longer with time until even the lead programmer finds it hard to remember where in the code a specific functionality is located. Unfortunately, there is no one-size-fits-all solution to this challenge, but several strategies are generally useful.
Start each file with a header, i.e. a block comment that explains what the code does. Besides a general description, it is good practice to include information on input and output data, parameters, the name of the author and the date of the last change.
Defining variables, constants, functions, etc., at the point where they are first used in the script may seem straightforward. However, this scales poorly as the project grows. A simple and effective solution is to group the declarations of global assets in blocks in the first part of a program.
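A file header and grouped declarations might look like the following minimal sketch. The file name, the section labels and the function name are illustrative assumptions, and the author and date fields are placeholders to be filled in.

```python
"""
snow_compaction.py (hypothetical file name)

Description: Estimates how long fresh snow takes to reach the density of ice
             when the density increases by a constant amount per time step.
Input:       None (all parameters are defined in the blocks below).
Output:      Elapsed time in days, printed to the console.
Author:      <name of the author>
Last change: <date of the last change>
"""

# --- Constants --------------------------------------------------------------
DENSITY_FRESH_SNOW = 100  # Unit: kg/m³
DENSITY_ICE = 917         # Unit: kg/m³

# --- Parameters -------------------------------------------------------------
COMPACTION = 0.9          # Density increase per time step, unit: kg/m³
TIME_STEP = 1             # Time increment per iteration, unit: days


# --- Functions --------------------------------------------------------------
def time_to_reach_ice_density():
    """Return the time in days until the snowpack exceeds the density of ice."""
    density_snowpack = DENSITY_FRESH_SNOW
    time = 0
    while density_snowpack <= DENSITY_ICE:
        density_snowpack += COMPACTION
        time += TIME_STEP
    return time


# --- Main program -----------------------------------------------------------
if __name__ == "__main__":
    print(time_to_reach_ice_density(), "days")
```

With everything global declared in the first part of the file, a reader can find any constant or parameter without searching through the processing logic.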
Extract and include
Basically all programming languages offer commands to include code from other files at a given point of a program. This is a versatile tool to keep individual files short and clear. For instance, a main script that is called by the user may thus be reduced to the most fundamental logic. All the actual processing routines can be extracted into individual scripts, each with one particular purpose. The main script will then include the other scripts only when they are actually needed. Such ‘sliced’ programs are typically much more concise and readable.
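In Python, the include mechanism is the import statement. The sketch below is one possible illustration, not a recommended project layout: to stay self-contained it first writes the extracted routine to a temporary file, whereas in a real project ‘compaction.py’ (a hypothetical name, like the function in it) would simply sit next to the main script.

```python
import sys
import tempfile
from pathlib import Path

# Content of the extracted script. In a real project this would be a
# separate file, e.g. compaction.py (hypothetical name).
MODULE_SOURCE = '''
def days_until_ice(density_fresh_snow, density_ice, compaction, time_step=1):
    """Return the time in days until the snowpack exceeds the density of ice."""
    density_snowpack = density_fresh_snow
    time = 0
    while density_snowpack <= density_ice:
        density_snowpack += compaction
        time += time_step
    return time
'''


def main():
    """Main script: only the fundamental logic remains here."""
    with tempfile.TemporaryDirectory() as module_directory:
        # Only needed for this sketch: create the extracted script on disk.
        Path(module_directory, "compaction.py").write_text(MODULE_SOURCE)
        sys.path.insert(0, module_directory)
        try:
            # Include the processing routine only at the point where it is needed.
            import compaction
            return compaction.days_until_ice(100, 917, 0.9)
        finally:
            sys.path.remove(module_directory)


if __name__ == "__main__":
    print(main(), "days")
```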
In order to make code reusable and to foster collaboration on a software project, all variables that are specific to individual use cases should be kept out of the main program. There are various ways to achieve this, depending on the programming language. Many programming languages support adding parameters after the call of a script. This is particularly useful if only a few variables need to be set. For more extensive sets of user variables, the definitions may be written to a separate configuration file. This may be a text or source code file and will have a predefined structure, including comments explaining the individual variables, value ranges, units, etc. A good practice is to provide a template configuration with the software, so that users can get started quickly.
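A commented configuration file can be read with the configparser module from Python's standard library. The following is a minimal sketch under stated assumptions: the section and option names are invented for illustration, and the template is held in a string here, whereas it would normally be a separate file shipped with the software.

```python
import configparser

# Template configuration as it might be provided with the software.
# Normally this would be a separate text file; it is inlined here so that
# the sketch runs on its own.
TEMPLATE_CONFIG = """
[snowpack]
# Density of fresh snow, unit: kg/m³, must be greater than 0
density_fresh_snow = 100
# Density increase per time step, unit: kg/m³
compaction = 0.9

[time]
# Time increment per iteration, unit: days
time_step = 1
"""


def load_settings(config_text):
    """Parse the configuration text and return the settings with proper types."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return {
        "density_fresh_snow": parser.getfloat("snowpack", "density_fresh_snow"),
        "compaction": parser.getfloat("snowpack", "compaction"),
        "time_step": parser.getint("time", "time_step"),
    }


settings = load_settings(TEMPLATE_CONFIG)
print(settings)
```

The comments inside the template document each variable and its unit, so users can edit a copy of it without reading the source code.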
In some cases a variable will have one specific value in the majority of usage scenarios. Here it is advisable to set this value as a default value in the main program that is only changed when the user explicitly sets it to something else.
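Default values combine naturally with parameters passed after the call of a script. The sketch below uses argparse from Python's standard library; the option names are assumptions chosen for this example.

```python
import argparse


def parse_arguments(argument_list=None):
    """Parse command-line parameters; unset options fall back to defaults."""
    parser = argparse.ArgumentParser(description="Snowpack compaction sketch")
    parser.add_argument(
        "--density-fresh-snow", type=float, default=100.0,
        help="density of fresh snow in kg/m³ (default: 100)",
    )
    parser.add_argument(
        "--density-ice", type=float, default=917.0,
        help="density of ice in kg/m³ (default: 917)",
    )
    # argument_list=None makes argparse read the real command line.
    return parser.parse_args(argument_list)


# No parameters given: every value comes from the defaults.
default_settings = parse_arguments([])

# The user explicitly sets one value; the other keeps its default.
custom_settings = parse_arguments(["--density-fresh-snow", "150"])

print(default_settings.density_fresh_snow, custom_settings.density_fresh_snow)
```

Called from a shell, the second case would correspond to adding `--density-fresh-snow 150` after the script name.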
A step further from extracting and including parts of the code is making the code modular. This means that additional code can be plugged in at one or more defined points of the program. Such points are often referred to as ‘hooks’, the additional code as a ‘module’. In order to create a working hook and module concept, it is generally necessary to define clear input and output data formats, so that the main program and the module can communicate with each other.
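One simple way to sketch a hook in Python is a list of functions that the main program calls at a fixed point. Everything here is an illustrative assumption: the hook name, the registration function and the agreed interface, in which each module receives the current time and density and returns a density.

```python
# Registry of module functions attached to the (hypothetical) hook that
# runs after every time step.
after_time_step_hooks = []


def register_hook(function):
    """Attach a module function to the hook; can be used as a decorator."""
    after_time_step_hooks.append(function)
    return function


def run_simulation(density_fresh_snow=100, density_ice=917,
                   compaction=0.9, time_step=1):
    """Main program: compact the snowpack until the density of ice is exceeded."""
    density_snowpack = density_fresh_snow
    time = 0
    while density_snowpack <= density_ice:
        density_snowpack += compaction
        time += time_step
        # Hook: each module gets (time, density) and must return a density.
        for hook in after_time_step_hooks:
            density_snowpack = hook(time, density_snowpack)
    return time


# --- Example module ----------------------------------------------------------
density_log = []


@register_hook
def record_density(time, density_snowpack):
    """Store the density after each time step without modifying it."""
    density_log.append((time, density_snowpack))
    return density_snowpack


elapsed_days = run_simulation()
print(elapsed_days, "days,", len(density_log), "logged values")
```

Because the interface is fixed, further modules can be added or removed without touching the main program.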