Structuring a C project

Many developers face the same problem: they focus on getting things done rather than on writing clean code. Certainly there are great minds who work out every detail beforehand and, when they start programming, produce the most beautiful and clean code there is. I'm not one of them. I mostly start programming without a clue about what I'm going to do and tinker until I have something that seems to work somehow. But then, deep down, I know that the code I produced is neither clean nor understandable. It's time to clean up the mess, and I promise: I "like" doing that just as much as anyone else.

Code structure is a weird topic. There are many different approaches to a good one, but in the end everyone has personal preferences and ideas about it. I can't tell you what a perfect structure looks like either. What I can do, however, is provide some tools and tips that will help you get the structure you want.

Use include graphs to get a better overview. A great tool to generate them is the Perl script cinclude2dot. To get higher-quality output, add graph[dpi=<value>]; to the digraph options in the Perl script (line 187). I've never programmed in Perl, but it's easy to add a few lines and make the include graph fit your personal preferences. Using the script is quite simple: put it into your project's root source folder and run it like this:

~ perl cinclude2dot.pl --include "</path/to/include1,/path/to/include2,...>" --quotetypes quote --merge module > out.dot

It is important that you don't add any spaces to the comma-separated list of include directories. More options are available, and the script is easy to modify without any Perl skills, but you should know the dot language, since that is the output format. Just run

~ dot -Tpng out.dot -o out.png

in the same directory to generate an image. As a result you'll get an include graph that might look like this:

With the knowledge of what the code does and the include graph you can see what belongs together and recognize groups of code.

It is easy to identify four groups within that project. They are grouped together simply because they have strong internal dependencies. Another interesting thing is that the position of the marked modules makes sense: the includes run top to bottom, and the include graph shows the implementation of a library. The functionality can be summarized like this:

The first module implements the API.
The second and third modules cover connections and message (de-)serialisation.
The fourth module, the "leftover", is a set of helper functions and data structures we use within that library.

Based on that, you could implement modules or create a folder structure. Resolving include loops is a must. If you haven't done so yet, you could also gather the functions of several logically dependent source files in a single header. This alone will make your project far more understandable.
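One common way to resolve an include loop is a forward declaration. The following is only a sketch: the file names message.h and connection.h and all struct and function names are hypothetical, and the two headers are shown in a single file so the example compiles on its own. The idea is that message.h only needs a pointer to a connection, so it can declare the struct instead of including connection.h.

```c
#include <assert.h>  /* for assert */
#include <stddef.h>  /* for size_t, NULL */

/* ---- message.h (hypothetical) ---- */
#ifndef MESSAGE_H
#define MESSAGE_H

struct connection;  /* forward declaration instead of #include "connection.h" */

struct message {
    struct connection *origin;  /* a pointer works with an incomplete type */
    size_t length;
};

size_t message_length(const struct message *msg);

#endif /* MESSAGE_H */

/* ---- connection.h (hypothetical) ---- */
#ifndef CONNECTION_H
#define CONNECTION_H

/* In a real project: #include "message.h" (the one remaining direction). */

struct connection {
    struct message last;  /* embedded by value, so the full type is required */
    int fd;
};

int connection_fd(const struct connection *conn);

#endif /* CONNECTION_H */

/* ---- message.c / connection.c (hypothetical) ---- */
size_t message_length(const struct message *msg)
{
    assert(msg != NULL);
    return msg->length;
}

int connection_fd(const struct connection *conn)
{
    assert(conn != NULL);
    return conn->fd;
}
```

With this layout the include graph has a single edge from connection.h to message.h and no edge back, so the loop is gone.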

Keep in mind that the graph is just a tool to get a better overview. Avoid reading too much into it. Structuring a project is not a beauty contest for include graphs.

Include What You Use is an include policy that tells you to include the corresponding header file for every externally defined symbol. The developers of include-what-you-use argue as follows:

When every file includes what it uses, then it is possible to edit any file and remove unused headers, without fear of accidentally breaking the upwards dependencies of that file. It also becomes easy to automatically track and update dependencies in the source code.

iwyu can apply its rules to your files automatically. Doing so might cause problems, especially if you want to apply slightly different rules. I just run it on all files and then edit the includes by hand, to make sure I don't mess anything up.

~ include-what-you-use -iquote </path/to/include/root> <file name>

-iquote is a clang option that adds a directory to the search path for quote-style includes. (Use it multiple times if there is more than one.)

The output might look something like this:

somefile.c should add these lines:
#include <sodium/core.h> // for sodium_init
#include <uv-unix.h> // for uv_sem_t, uv_thread_t
#include "common.h" // for cstring_copy_string, string

somefile.c should remove these lines:
- #include <sodium.h> // lines 1-1

The full include-list for somefile.c:
#include <sodium/core.h> // for sodium_init
#include <uv-unix.h> // for uv_sem_t, uv_thread_t
#include <uv.h> // for uv_default_loop, uv_loop_close, uv_run, uv_...
#include "some.h" // for some_init, some_start
#include "common.h" // for cstring_copy_string, string

The output shows exactly what is used from each header, which headers to include, and which includes to remove. A personal tip: try to avoid transitive dependencies whenever practicable. They hide important information, especially when you look at include graphs.
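As a small illustration of the policy, here is a sketch of a source file that includes a header for every symbol it uses instead of relying on some other header to pull it in. The helper count_char is hypothetical and not part of the project shown above; the trailing comments mimic the "for ..." annotations from the iwyu output.

```c
#include <assert.h>  /* for assert */
#include <stddef.h>  /* for size_t, NULL */
#include <string.h>  /* for strlen */

/* Hypothetical helper: counts how often the character c occurs in s. */
size_t count_char(const char *s, char c)
{
    assert(s != NULL);

    size_t count = 0;
    size_t length = strlen(s);

    for (size_t i = 0; i < length; i++) {
        if (s[i] == c) {
            count++;
        }
    }
    return count;
}
```

On many systems string.h alone would make size_t and NULL available, but that is exactly the kind of transitive dependency the policy asks you not to rely on.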

Write tests at some point. When you get something done fast, you usually don't work them in, but clean and understandable code requires them. There are obvious reasons to implement tests, and I'm sure you've heard them often enough. One not-so-obvious advantage is that they help you restructure a project. How? Simply because they force you to use your functions out of context. If writing your tests forces you to implement dozens of helpers, wrappers and whatnot, something is not right. Tests should usually be easy to implement and as intelligible as possible.
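To show what "out of context" can look like, here is a minimal sketch using plain assert from the C standard library. The function parse_port and its test are hypothetical examples, not taken from any real project. The function depends only on its argument, so the test needs no setup, helpers, or wrappers.

```c
#include <assert.h>  /* for assert */
#include <stdlib.h>  /* for strtol, NULL */

/* Hypothetical function under test: parses a TCP port number.
 * Returns the port (1..65535) or -1 if the text is not a valid port. */
int parse_port(const char *text)
{
    if (text == NULL || *text == '\0') {
        return -1;
    }

    char *end = NULL;
    long value = strtol(text, &end, 10);

    /* Reject trailing garbage and values outside the valid port range. */
    if (*end != '\0' || value < 1 || value > 65535) {
        return -1;
    }
    return (int)value;
}

/* The test calls the function directly, without any fixtures. */
void test_parse_port(void)
{
    assert(parse_port("8080") == 8080);
    assert(parse_port("0") == -1);
    assert(parse_port("65536") == -1);
    assert(parse_port("80x") == -1);
    assert(parse_port(NULL) == -1);
}
```

If parse_port instead read its input from a global configuration object, the test would first have to build that object, which is the kind of warning sign described above.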

Documentation is the key to understandability and clean code. When documenting any function or data structure, it should be easy to explain what it does. If you struggle too much with the explanation, you might need to change something. It means either that you don't really understand what it does or that it implements too many things at once. If this is the case, you should consider breaking it apart or even consider another implementation!

You need to be strict as well as honest when it comes to code structure. Once you identify a problem, fix it as soon as possible. It will become harder the longer you wait, and you'll eventually end up with almost irresolvable issues. Keep the following things in mind when working out the structure for your project:

A clean structure will accelerate your future development and help you recognize and fix bugs, memory leaks and security issues.
Structuring a project, once done, is not a repetitive task. Usually it is easy to obey the rules once you specify them. Write them down and make sure your fellow developers agree with them.
As I suggested earlier, there is no perfect structure. Keep this in mind when discussing it. If you and others can't agree on a rule, just choose one. It doesn't matter too much as long as the rule actually helps make the code more understandable.
Focus on simplicity! Violating your rules might be inevitable, if they force you to introduce unnecessary or redundant code.

Restructuring will take some time, but it's definitely worth it. Most software problems are due to bad code structure and unintelligible code. The worst thing is that you will always have a bad feeling about adding new features, since you won't know when your code base is finally going to break apart... and who wants that?
