The post was written in Chinese and translated by LLM. Feel free to contact me if any phrasing seems unnatural.

25-08-22 update for Modules Wrapper，[One big thirdparty module], [suggested filename suffix], [mix include and import]

Build System
How much compile time can C++20 Modules save?
Are C++20 Modules and PCH equivalent? What’s the difference?
Can C++20 Modules reduce code size? Why?
Can we use C++20 Modules for programming now?
- But what’s the cost?
- When to use C++20 Modules
Modules Wrapper
- export-using style
- extern “C++” style
Choose a specific suffix for your wrapped third-party libraries to avoid conflicts
One big thirdparty module
Use .cppm and other special suffixes as the filename suffix for importable module units
How do C++20 Modules reduce compile time?
Recompilation issues from modifying interface files
Other uses for Module Implementation Partition Units
Non-Cascading Changes
Try to avoid placing the same declaration in different TUs
- Mixing import and #include
Impact of Modules on code size
Runtime issues encountered during module migration
Forward declaration issues in Modules
TODO?

C++20 Modules are beneficial for improving code modularity, enhancing program encapsulation, boosting compilation speed, and reducing the size of library code. Consequently, C++20 Modules have been highly anticipated since their inception. However, it is unsatisfactory that this feature, finalized in 2019, is still not widely adopted as of the latter half of 2025. This article shares findings and thoughts from our practical experience in developing and applying Modules, hoping to help friends interested in C++20 Modules.

I am more familiar with Clang’s implementation and work exclusively in a Linux environment. Unless otherwise specified, the environment described in this article should be assumed to be Linux + Clang. I have not verified information related to Windows environments and GCC, so it may not align with the latest facts due to memory errors or outdated information.

For basic knowledge about Modules, you can refer to background and terminology.

For ease of reading, this article will be described from a high-level to a low-level perspective.

Build System

The build system I use at work is a version of Bazel that we’ve modified downstream. We are trying to contribute this implementation to the Bazel community.

I have written some small examples with CMake with C++20 Modules, but I haven’t used it seriously. Therefore, this article will not cover CMake. However, in my memory, CMake seems to be a blocking issue for many people using C++20 Modules. The most memorable example is Boost’s Modules experiment, where one of the blocking issues mentioned was CMake.

Additionally, XMake and Build2 also provide C++20 Modules support, and both projects have heavy users of C++20 Modules, so I have a good impression of them.

There is also HMake, which claims to have a very good implementation for C++20 Modules and has ambitious plans, but I have limited time and understanding. Interested friends can take a look, as there might be new opportunities.

If the build system you’re using doesn’t provide support for C++20 modules, it may be helpful too to reach out and tell them you want it.

How much compile time can C++20 Modules save?

The data I have obtained from practice ranges from 25% to 45%, excluding the build time of third-party libraries, including the standard library.

Online, this number varies widely. The most exaggerated figure I recall is a 26x improvement in project compilation speed after a module-based refactoring. This was likely the result of a large-scale project restructure along with the module conversion. Furthermore, if a project uses extensive template metaprogramming and stores constexpr variable values in Modules, the compilation speed can easily increase by thousands of times, though we generally do not discuss such cases. Apart from these more extreme claims, most reports on C++20 Modules compilation speed improvements are between 10% and 50%.

There are also reports of significant decreases in compilation speed after migrating to Modules. This is likely due to 1. reduced compilation parallelism, and 2. numerous duplicate declarations among Module Units (see below). Of course, it could also be due to defects in the compiler implementation.

Are C++20 Modules and PCH equivalent? What’s the difference?

C++20 Modules and Pre-Compiled Headers (PCH) are not the same thing. Compared to PCH, C++20 Modules have their own semantics. A C++20 Module’s Interface file is a normal Translation Unit and can produce Object Files. This gives C++20 Modules a higher ceiling for compilation acceleration and the ability to generate more efficient code.

Can C++20 Modules reduce code size? Why?

C++20 Modules can reduce the size of object files (.o), static libraries (.a), and dynamic libraries (.so). However, we have not observed a significant difference in the size of the final executable file.

In practice, by calculating the total size of all dynamic library artifacts (.so) in the build directory, we found that the sum of their sizes decreased by 12% after modularization.

The reason is that a C++20 Module’s Interface file is itself a normal Translation Unit that can produce object files (.o), thus avoiding the problem of the same code being repeatedly generated in different Translation Units.

Can we use C++20 Modules for programming now?

Yes, we can. C++20 Modules are usable in a Linux + Clang environment. There are also examples showing that C++20 Modules are usable in a Windows environment with MSVC. I have not yet heard of GCC’s C++20 Modules being used in non-trivial projects.

But what’s the cost?

The most significant cost comes from refactoring existing code. Therefore, if you are starting a new project or an almost entirely new project, it is the most suitable opportunity to use C++20 Modules. It is important to note that once a project uses C++20 Modules, in most cases, its downstream dependents must also use C++20 Modules. This means that for the vast majority of libraries, if they wish for downstream users to continue using header files, these libraries can mostly only provide Modules rather than use Modules. This situation will be discussed further later.

The next cost comes from the compiler. Compiler crashes are always frustrating. As far as I know, the C++20 Modules implementations in Clang, GCC, and MSVC have all been widely criticized for persistent internal compiler errors. However, just for Clang, my impression is that the number of Module-related issue reports has recently fallen below that of Coroutines! (Although this is still shameful).

Next, the most frequently mentioned issue is code intelligence support. In my scenario, the current experimental modules support in clangd already works fine. The problems I previously encountered locally were almost all related to an incorrect compilation database. Also, because clangd’s C++20 Modules support is actually quite simple, with most of the logic in clang-tools-extra/clangd/ModulesBuilder.cpp being basic file management, I highly encourage friends with the need to try and implement it themselves.

Finally, I recall issues with inconsistent behavior between compilers/platforms. For example, another blocking issue in the aforementioned Boost Modules experiment was a behavioral difference between MSVC and Clang. There are also issues with clang-cl and CMake being unable to find the std module in different environments.

When to use C++20 Modules

The project has no downstream users (or all downstream users of this project are already using the Modules you provide!) (or you just don’t care about them anymore).
The project has been updated to the latest compiler and language standard (at least -std=c++23).
The project does not have strong requirements for cross-compiler and cross-platform compatibility (the current compatibility needs to be investigated; I have no personal experience).

At this point, the cost of using C++20 Modules is proportional to the project’s complexity. As a reference, for a project with 4,500 C++ files (including source and header files), totaling one million lines of code (ignoring comments and blank lines), I spent about two months converting it entirely to a version that uses Modules. During the conversion, I used a conversion tool I wrote to assist.

Modules Wrapper

If some downstream users of a project want to use C++20 Modules while others still wish to use header files, the project can provide optional Modules through a Modules Wrapper without requiring any changes from downstream users who are not interested. This is the approach taken by most libraries that currently support Modules. Providing modular “bindings” for libraries can be an interesting way to gain real-world experience from early adopters.

We can see many such libraries at arewemodulesyet. If you know of a library that provides Modules but is not listed on this site, please submit a PR to update it.

The so-called Modules Wrapper commonly comes in the following two forms:

export-using style

module;
#include "header_1.h"
#include "header_2.h"
...
#include "header_n.h"
export module your_library;
export namespace your_namespace {
  using decl_1;
  using decl_2;
  ...
  using decl_n;
}

extern “C++” style

module;
#include "third_party/A/headers.h"
#include "third_party/B/headers.h"
... // Important: **ALL** the 3rd party library headers including standard headers
#include "third_party/Z/headers.h"
export module your_library;
#define IN_MODULE_INTERFACE
extern "C++" {
  #include "header_1.h"
  #include "header_2.h"
  ...
  #include "header_n.h"
}

While defining a macro:

#ifdef IN_MODULE_INTERFACE
#define EXPORT export
#else
#define EXPORT
#endif

It is also recommended to selectively exclude third-party library headers from the files within extern "C++":

#ifndef IN_MODULE_INTERFACE
#include "third_party/A/headers.h"
#endif

#include "header_x.h"

...

This is very helpful for debugging.

Although it appears more complex, the extern C++ style offers better compile-time performance compared to the export using style.

At the same time, when all of your library’s dependencies provide modules, the extern C++ style can be easily converted to the following form:

export module your_library;
import third_party1;
import third_party2;
import third_party3;

...

#define IN_MODULE_INTERFACE
extern "C++" {
  #include "header_1.h"
  #include "header_2.h"
  ...
  #include "header_n.h"
}

After that, if you wish, it should also be relatively straightforward to use Modules in your project.

We hope that all libraries provide corresponding Modules. Ideally, the journey of modularization in the C++ world should be:

Top-down: each project provides Modules (starting from the std module).
Then bottom-up: each project uses Modules.

It is important to note that if you want to use modules but your third-party library does not provide Modules, you can mock one yourself in your own project, even for the std module.

以下为英文翻译（保留原有 markdown 格式）：

Choose a specific suffix for your wrapped third-party libraries to avoid conflicts

If your project has downstream source dependencies, append a specific suffix when wrapping third-party libraries to avoid conflicts for downstream users. For example, if your library is named aaa, then when you wrap the boost library, name its module boost.aaa.mock instead of boost.

One big thirdparty module

If your project includes multiple third-party libraries and most of them do not provide Modules, you can choose to wrap all third-party libraries into a single large thirdparty module. For example:

module;
#include "third_party/header_1.h"
#include "third_party/header_2.h"
...
#include "third_party/header_n.h"
export module third_party;
export namespace third_party_namespace {
  using third_party_decl_1;
  using third_party_decl_2;
  ...
  using third_party_decl_n;
}

Then all references to third-party libraries in your project can be brought in via import third_party;. This approach is crude but simple and effective.

Use .cppm and other special suffixes as the filename suffix for importable module units

Clang recommends using .cppm, .ccm, .cxxm, etc., as the filename suffix for importable module units. MSVC recommends .ixx. GCC has no special requirement.

Although users can in fact use any filename, I still recommend using .cppm as the filename suffix for importable module units. On the one hand, it is more friendly to small tools, such as tools for counting lines of code. On the other hand, this convention also helps when reading code. For example, when we see SourceManager.cppm and SourceManager.cpp, we naturally associate SourceManager.cppm with the interface and SourceManager.cpp with the implementation.

As for the choice between .cppm and .ixx, it comes down to personal preference. .ixx reads like “a better header” or “a C++ version of a header,” which I don’t think is the case. .cppm better matches my notion of “an importable C++ file.”

How do C++20 Modules reduce compile time?

Simply put, the compilation process can be divided into the frontend, mid-end, and backend. In the frontend, the compiler performs language-specific preprocessing, semantic analysis, and mid-end code generation. In the mid-end, the compiler performs language-agnostic, architecture-agnostic optimizations. In the backend, the compiler performs architecture-specific optimizations and code generation.

For projects with heavy template metaprogramming and compile-time computation, the frontend spends a lot of overhead on template expansion and compile-time calculations. For other less “C++-intensive” C++ projects, especially when optimizations are enabled, the mid-end and backend will take up the bulk of the time. Modules help by having the compiler read a pre-parsed file once, storing the information in an efficient format for later use. Each import only needs to read one file, regardless of how many files are included internally.

Frontend and Compile-Time Computation

For compile-time computation, C++20 Modules can serve as a very effective cache. For example:

export module Fibonacci.Cache;

export namespace Fibonacci
{
    ...

	template<unsigned long N>
	constexpr unsigned long Cache = Fibonacci<N>();

    template constexpr unsigned long Cache<30ul>;
}

Obviously, using Modules in this case can result in a very significant compilation speedup.

For template expansion, Modules also act as a cache within the compiler. For example, with this code:

// a.cpp
#include <vector>
#include <string>
#include <iostream>

int main() {
   std::vector<std::string> vec = {"hello", "world"};
   std::cout << vec[0] << " " << vec[1] <<"\n";
}

We compile with the following command:

$ time clang++ -std=c++23 a.cpp -ftime-trace=a.json -c

real	0m0.516s
user	0m0.490s
sys	0m0.023s

Then we mock a std module and instantiate std::vector<std::string> within it:

// a.cppm
module;
#include <vector>
#include <string>
#include <iostream>
export module a;

export namespace std {
   using std::vector;
   using std::string;
   using std::cout;
   using std::operator<<;
}

std::vector<std::string> unused = {"hello", "world"};

Then we use the same logic:

// a.cc
import a;

int main() {
    std::vector<std::string> vec = {"hello", "world"};
    std::cout << vec[0] << " " << vec[1] <<"\n";
}

Let’s compile:

$ clang++ -std=c++23 a.cppm --precompile -o a.pcm
$ time clang++ -std=c++23 a.cc -ftime-trace=a.imported.json -fmodule-file=a=a.pcm -c

real	0m0.077s
user	0m0.063s
sys	0m0.013s

The compilation speed has increased by 7.7 times. What about template instantiation time?

The header version:

$jq '.traceEvents[] | select(.name | IN("Total InstantiateClass", "Total InstantiateFunction"))' a.json
{
  "pid": 41756,
  "tid": 41763,
  "ph": "X",
  "ts": 0,
  "dur": 54859,
  "name": "Total InstantiateClass",
  "args": {
    "count": 246,
    "avg ms": 0
  }
}
{
  "pid": 41756,
  "tid": 41764,
  "ph": "X",
  "ts": 0,
  "dur": 50708,
  "name": "Total InstantiateFunction",
  "args": {
    "count": 109,
    "avg ms": 0
  }
}

(If you don’t have jq, you can load the data in chrome://tracing/)

And the Modules version:

$jq '.traceEvents[] | select(.name | IN("Total InstantiateClass", "Total InstantiateFunction"))' a.imported.json
{
  "pid": 41816,
  "tid": 41827,
  "ph": "X",
  "ts": 0,
  "dur": 2596,
  "name": "Total InstantiateClass",
  "args": {
    "count": 6,
    "avg ms": 0
  }
}
{
  "pid": 41816,
  "tid": 41830,
  "ph": "X",
  "ts": 0,
  "dur": 1510,
  "name": "Total InstantiateFunction",
  "args": {
    "count": 3,
    "avg ms": 0
  }
}

We can see that the instantiation time with Modules is more than 20 times lower than with headers!

Of course, the implementation of a.cppm here is a bit contrived. But in practice, similar phenomena will naturally occur with natural programming in Modules.

It’s important to note here that since Clang currently does not cache constexpr/consteval functions and Clang has a defect when constexpr/consteval are mixed, one should not assume that using Modules will significantly reduce this type of time. However, if Clang fixes these issues, the acceleration power of Modules in compile-time computation will surely be even greater.

Mid-end and Back-end

For functions with non-inline linkage (which can be simply thought of as inline functions plus implicitly instantiated functions), Modules can avoid the overhead of repeated optimization and compilation in the mid-end and back-end across different units.

For example:

// a.cppm
export module a;
export int a() { ... }

// a.cc
import a;
int aa() {
    return a();
}

When compiling a.cc, the implementation of the function a() is not involved at all. This can save a lot of time.

A related change on the C++ language side is that within a module purview, functions defined inside a class are no longer implicitly inline. For example:

// a.cppm
export module a;
export class A {
public:
    int a() { ... }
};

// a.cc
import a;
int aa() {
    A a;
    return a.a();
}

The implementation of A::a() in the code above is still not involved in the compilation of a.cc.

One impact of this behavior is that, compared to the previous model of one-to-one mapping with header files, the compiler loses the opportunity to inline the a() function in a.cc for optimization.

This point is controversial in the community. Many people oppose this behavior because it affects performance. In practice, we supplement this with thinLTO. With thinLTO enabled, we have not found any observable performance degradation. (We even found a few cases of slight performance improvement, the reason for which is unknown, possibly related to changes in code layout). In terms of behavioral specification, I discussed Should we import function bodies to get the better optimizations? in WG21, and the conclusion was that for a better ABI boundary, WG21 recommends the current behavior as the standard behavior.

However, considering the repeated opposition from multiple people, we should consider adding an option in clang to support this behavior in the future. But it needs to be noted that naively importing function bodies from other modules during frontend code generation is not a good implementation. This would lead to a compilation complexity in the mid-end of approximately O(N^2). We should consider embedding optimized LLVM IR in the BMI and avoiding repeated optimization of this optimized LLVM IR in the mid-end optimization passes. This LLVM IR would only be used for IPO (Inter-procedural optimization).

Negative impacts of Modules on compile speed

The introduction of Modules adds extra serialization and deserialization overhead to the compilation process. The serialization and deserialization overheads are expected to be relatively low, especially the deserialization overhead. If you find that the compiler’s deserialization or serialization overhead is too high, it might be a compiler defect.

Apart from compiler internal implementation details, the current design of Modules may introduce two negative impacts:

Scanning time.
Reduced compilation parallelism.

Scanning is a method of interaction between the build system (designed by CMake) and the compiler, which involves a preprocessing pass over the files before compilation begins to analyze the module units provided and required by that file. The overhead is approximately equal to one preprocessing pass. The necessity of this can be referenced in CMake’s articles and talks. Although I haven’t seen anyone in the community complain about this method, internally, perhaps especially due to bazel’s sandboxing mechanism, we can clearly see the overhead of Scanning, which is not what we want. To alleviate this problem, we have implemented a fast scanning mechanism internally. This mechanism does not perform preprocessing but instead performs simple string processing (like grep) on the source file to obtain information about the module units it provides and requires. This operation assumes that no import declaration or module declaration is located within an #include file. If an import declaration is within an #ifdef, fast scanning will add unnecessary dependencies but will not lose any. This mechanism is currently working very well, and if the community is interested, we will try to contribute it later.

The problem of reduced compilation parallelism can be understood as follows: suppose our project has only 32 source files and 100 header files, and each source file includes all header files, with the header files having a linear chain of dependencies. If we convert this project to modules using a 1:1 header:module interface unit mapping, we can expect the compilation speed after modularization to be slower than before on a machine with enough cores. This problem is not significant in projects with a large number of files, far exceeding the number of cores.

Recompilation issues from modifying interface files

When we first started the Modules conversion, we chose to replace header files with Module Interface Units and source files with Module Implementation Units, as the names suggest. However, as the conversion progressed, we found that Partitions are necessary to resolve forward declarations. After enabling Partitions, Module Implementation Units became very awkward. This is because the Primary Module Interface Unit will directly or indirectly import all module interface units in this module, and a module implementation unit will implicitly import the primary module interface unit. This causes any change to a module interface unit to trigger a recompilation of all module implementation units of that module. This is unacceptable.

To alleviate this problem, we found that it can be solved by using module implementation partition units. The syntax for a Module implementation partition unit is:

module module.name:partition_impl.name;

Other module units within the same module can import a Module implementation partition unit. But being able to import does not mean having to import. If we use Module implementation partition units as the former source files to implement the interfaces of this module, we can achieve very fine-grained dependency control, avoiding unnecessary recompilation.

However, it’s a bit unfortunate that in CMake, implementing this pattern still requires placing the Module implementation partition unit in CXX_MODULES. This will cause the user to pay an extra serialization cost. For details, see: [C++20 Modules] We should allow implementation partition unit to not be in CXX_MODULES FILES

Other uses for Module Implementation Partition Units

Besides being used as source files, we found another use for Module Implementation Partition Units is as interfaces used only within the Module, as their name implies. For example, header files in a test directory, or header files in the srcs directory of a project that separates include and srcs, should be replaced with Module Implementation Partition Units.

In this case, we should establish a rule that Module Implementation Partition Units cannot be imported in module interface units. This way, when we read the code, we can have a very clear perception of the interface versus the implementation details.

Non-Cascading Changes

Clang has implemented Non-Cascading Changes. The intention is to use the encapsulation of Modules to break the propagation of changes along the dependency chain. For example, for:

export module a;
export int a() { return 43; }

If we change 43 to 44 or another value, any user of module a can ideally safely avoid recompilation.

Similarly, for:

export module b;
import a;

No matter what changes we make to a, for all direct users of b (but not direct users of a), they can ideally safely avoid recompilation.

To achieve this, Clang has implemented Non-Cascading Changes. This means that for a BMI, its hash value can represent the signature of all its exposed interfaces. For a build system, this means it doesn’t need to consider a file’s non-direct import units. This change is very easy for a build system to implement. We have already implemented and stably run this feature in our downstream Bazel for a long time. Other build systems can consider adopting this change.

Try to avoid placing the same declaration in different TUs

Due to Clang’s implementation limitations, although Clang often accepts such code, its efficiency in handling the same declaration in different TUs is much lower. For example:

module;
#include <vector>
export module a;
std::vector<int> va;

module;
#include <vector>
export module b;
std::vector<int> vb;

// c.cppm
module;
#include <vector>
export module c;
import a;
import b;
...

The compilation speed of the above code should be slower than:

export module a;
import std;
std::vector<int> va;

export module b;
import std;
std::vector<int> vb;

// c.cppm
export module c;
import a;
import b;
import std;
...

This is because the previous code has duplicate code in module a, module b, and module c. The compiler is inefficient at handling this duplicate code, and the generated BMI and object files may also contain redundant content as a result. In clang, we provide the -Wdecls-in-multiple-modules option to check for this situation. This warning is off by default even under -Wall because this code does not actually violate the standard.

This is also why we strongly recommend that C++20 modules users must use the std module. Even if for various reasons you cannot use the standard std module, we recommend that users mock one themselves.

Mixing import and #include

In practice, for projects with complex dependencies, expecting all dependencies to provide modules is unrealistic (or will take a very long time). In such cases, we have to deal with mixing import and #include. The main problem with mixing import and #include is that the same declarations may be introduced across different TUs, which slows down compilation. Headers that contain only macros do not cause any issues.

My summarized practices are:

In non-wrapper module units, avoid including headers that contain anything other than macros.
In wrappers, when the compiler doesn’t error, import before #include performs better than only #include or #include before import.

For example:

module;
#include <vector>
export module my_module;
export std::vector<int> vi = {1, 2, 3, 4};

is not as good as

export module my_module;
import std;
export std::vector<int> vi = {1, 2, 3, 4};

This example is overly simple, but useful. Regardless of why the std module is unavailable in your environment, you should mock a std.mock module. Similarly, if your project needs Boost and the components you require currently don’t provide modules, you should mock a boost.mock module in your repository.

Also, in wrappers, importing before including gives the compiler more opportunities to optimize. For example, for:

//--- a.h
namespace a {
class A {
public:
    int aaaa;

    int get() {
        return aaaa;
    }
};

template <class T>
class B {
public:
    B(T t): t(t) {}
    T t;
};

using BI = B<int>;

inline int get(A a, BI b) {
    return a.get() + b.t;
}

}

//--- a.cppm
module;
#include "a.h"

export module a;

namespace a {
    export using ::a::A;
    export using ::a::get;
    export using ::a::BI;
}

//--- a.cpp
import a;
#include "a.h"

int test() {
    a::A aa;
    a::BI bb(43);
    return get(aa, bb);
}

Then run:

$ clang++ -std=c++20 a.cppm --precompile -o a.pcm
$ clang++ -std=c++20 a.cc -fmodule-file=a=a.pcm -Xclang -ast-dump -fsyntax-only
|-NamespaceDecl 0x8ebd718 prev 0x8ebd5f8 <./a.h:1:1, line:25:1> line:1:11 a
| |-original Namespace 0x8ebd688 'a'
| |-CXXRecordDecl 0x8ebda28 prev 0x8ebd790 <line:2:1, col:7> col:7 referenced class A
| |-ClassTemplateDecl 0x8ebe1c0 prev 0x8ebde18 <line:12:1, line:13:7> col:7 B
| | |-TemplateTypeParmDecl 0x8ebdd38 <line:12:11, col:17> col:17 class depth 0 index 0 T
| | |-CXXRecordDecl 0x8ebe128 prev 0x8ebdff8 <line:13:1, col:7> col:7 class B
| | `-ClassTemplateSpecialization 0x8eed998 'B'
| |-TypeAliasDecl 0x8eede48 prev 0x8eedbc0 <line:19:1, col:17> col:7 referenced BI 'B<int>':'a::B<int>'
| | `-TemplateSpecializationType 0x8eedb10 'B<int>' sugar
| |   |-name: 'B':'a::B' qualified
| |   | `-ClassTemplateDecl 0x8ebe1c0 prev 0x8ebde18 <line:12:1, line:13:7> col:7 B
| |   |-TemplateArgument type 'int'
| |   | `-BuiltinType 0x8e692f0 'int'
| |   `-RecordType 0x8eedaf0 'a::B<int>' imported canonical
| |     `-ClassTemplateSpecialization 0x8eed998 'B'
| `-FunctionDecl 0x8eee530 prev 0x8eee100 <line:21:1, col:25> col:12 used get 'int (A, BI)' inline
|   |-ParmVarDecl 0x8eedef0 <col:16, col:18> col:18 a 'A'
|   `-ParmVarDecl 0x8eedfa0 <col:21, col:24> col:24 b 'BI':'a::B<int>'
`-FunctionDecl

We can see that CXXRecordDecl … class A, ClassTemplateDecl … B, and FunctionDecl … get are all without definitions. Because the compiler imports a first in a.cc, when it later parses class A, it queries whether a definition of class A already exists; once found, the compiler considers class A already parsed and thus skips parsing class A in a.cc. In contrast, if we change a.cc to include first and then import:

#include "a.h"
import a;

int test() {
    a::A aa;
    a::BI bb(43);
    return get(aa, bb);
}

|-NamespaceDecl 0x9a105b0 <./a.h:1:1, line:25:1> line:1:11 a
| |-CXXRecordDecl 0x9a10640 <line:2:1, line:9:1> line:2:7 referenced class A definition
| | |-DefinitionData pass_in_registers aggregate standard_layout trivially_copyable pod trivial literal has_constexpr_non_copy_move_ctor
| | | |-DefaultConstructor exists trivial constexpr defaulted_is_constexpr
| | | |-CopyConstructor simple trivial has_const_param implicit_has_const_param
| | | |-MoveConstructor exists simple trivial
| | | |-CopyAssignment simple trivial has_const_param needs_implicit implicit_has_const_param
| | | |-MoveAssignment exists simple trivial needs_implicit
| | | `-Destructor simple irrelevant trivial constexpr
| | |-CXXRecordDecl 0x9a10758 <col:1, col:7> col:7 implicit class A
| | |-AccessSpecDecl 0x9a10808 <line:3:1, col:7> col:1 public
| | |-FieldDecl 0x9a10850 <line:4:5, col:9> col:9 referenced aaaa 'int'
| | |-CXXMethodDecl 0x9a10950 <line:6:5, line:8:5> line:6:9 used get 'int ()' implicit-inline
| | | `-CompoundStmt 0x9a10a90 <col:15, line:8:5>
| | |   `-ReturnStmt 0x9a10a80 <line:7:9, col:16>
| | |     `-ImplicitCastExpr 0x9a10a68 <col:16> 'int' <LValueToRValue>
| | |       `-MemberExpr 0x9a10a38 <col:16> 'int' lvalue ->aaaa 0x9a10850
| | |         `-CXXThisExpr 0x9a10a28 <col:16> 'a::A *' implicit this
| | |-CXXConstructorDecl 0x9a45c40 <line:2:7> col:7 implicit used constexpr A 'void () noexcept' inline default trivial
| | | `-CompoundStmt 0x9a46200 <col:7>
| | |-CXXConstructorDecl 0x9a45dc8 <col:7> col:7 implicit used constexpr A 'void (const A &) noexcept' inline default trivial
| | | |-ParmVarDecl 0x9a45f08 <col:7> col:7 used 'const A &'
| | | |-CXXCtorInitializer Field 0x9a10850 'aaaa' 'int'
| | | | `-ImplicitCastExpr 0x9a4b130 <col:7> 'int' <LValueToRValue>
| | | |   `-MemberExpr 0x9a4b100 <col:7> 'const int' lvalue .aaaa 0x9a10850
| | | |     `-DeclRefExpr 0x9a4b0c8 <col:7> 'const A' lvalue ParmVar 0x9a45f08 depth 0 index 0 'const A &'
| | | `-CompoundStmt 0x9a4b170 <col:7>
| | |-CXXConstructorDecl 0x9a45ff8 <col:7> col:7 implicit constexpr A 'void (A &&)' inline default trivial noexcept-unevaluated 0x9a45ff8
| | | `-ParmVarDecl 0x9a46138 <col:7> col:7 'A &&'
| | `-CXXDestructorDecl 0x9a4b1b8 <col:7> col:7 implicit referenced constexpr ~A 'void () noexcept' inline default trivial
| |-ClassTemplateDecl 0x9a10c38 <line:12:1, line:17:1> line:13:7 B
| | |-TemplateTypeParmDecl 0x9a10ab0 <line:12:11, col:17> col:17 referenced class depth 0 index 0 T
| | |-CXXRecordDecl 0x9a10b88 <line:13:1, line:17:1> line:13:7 class B definition
| | | |-DefinitionData standard_layout trivially_copyable has_user_declared_ctor can_const_default_init
| | | | |-DefaultConstructor
| | | | |-CopyConstructor simple trivial has_const_param needs_implicit implicit_has_const_param
| | | | |-MoveConstructor exists simple trivial needs_implicit
| | | | `-Destructor simple irrelevant trivial constexpr needs_implicit
| | | |-CXXRecordDecl 0x9a10d18 <col:1, col:7> col:7 implicit class B
| | | |-AccessSpecDecl 0x9a10dc8 <line:14:1, col:7> col:1 public
| | | |-CXXConstructorDecl 0x9a10fc8 <line:15:5, col:19> col:5 a::B<T> 'void (T)' implicit-inline
| | | | |-ParmVarDecl 0x9a10e58 <col:7, col:9> col:9 referenced t 'T'
| | | | |-CXXCtorInitializer Field 0x9a110b0 't' 'T'
| | | | | `-ParenListExpr 0x9a11138 <col:14, col:16> 'NULL TYPE'
| | | | |   `-DeclRefExpr 0x9a11118 <col:15> 'T' lvalue ParmVar 0x9a10e58 't' 'T'
| | | | `-CompoundStmt 0x9a11180 <col:18, col:19>
| | | `-FieldDecl 0x9a110b0 <line:16:5, col:7> col:7 t 'T'
| | |-ClassTemplateSpecializationDecl 0x9a46618 prev 0x9a11298 <line:12:1, line:17:1> line:13:7 imported in a.<global> hidden <undeserialized declarations> class B implicit_instantiation
| | | `-TemplateArgument type 'int'
| | |   `-BuiltinType 0x99bc2f0 'int'
| | `-ClassTemplateSpecializationDecl 0x9a11298 <line:12:1, line:17:1> line:13:7 class B definition implicit_instantiation
| |   |-DefinitionData pass_in_registers standard_layout trivially_copyable has_user_declared_ctor can_const_default_init
| |   | |-DefaultConstructor defaulted_is_constexpr
| |   | |-CopyConstructor simple trivial has_const_param implicit_has_const_param
| |   | |-MoveConstructor exists simple trivial
| |   | |-CopyAssignment simple trivial has_const_param needs_implicit implicit_has_const_param
| |   | `-Destructor simple irrelevant trivial constexpr
| |   |-TemplateArgument type 'int'
| |   | `-BuiltinType 0x99bc2f0 'int'
| |   |-CXXRecordDecl 0x9a34d20 <col:1, col:7> col:7 implicit class B
| |   |-AccessSpecDecl 0x9a34dd0 <line:14:1, col:7> col:1 public
| |   |-CXXConstructorDecl 0x9a34fd0 <line:15:5, col:19> col:5 used B 'void (int)' implicit_instantiation implicit-inline instantiated_from 0x9a10fc8
| |   | |-ParmVarDecl 0x9a34e78 <col:7, col:9> col:9 used t 'int'
| |   | |-CXXCtorInitializer Field 0x9a350b0 't' 'int'
| |   | | `-ImplicitCastExpr 0x9a4b6b8 <col:15> 'int' <LValueToRValue>
| |   | |   `-DeclRefExpr 0x9a4b5d8 <col:15> 'int' lvalue ParmVar 0x9a34e78 't' 'int'
| |   | `-CompoundStmt 0x9a11180 <col:18, col:19>
| |   |-FieldDecl 0x9a350b0 <line:16:5, col:7> col:7 referenced t 'int'
| |   |-CXXConstructorDecl 0x9a490c8 <line:13:7> col:7 implicit used constexpr B 'void (const B<int> &) noexcept' inline default trivial
| |   | |-ParmVarDecl 0x9a49208 <col:7> col:7 used 'const B<int> &'
| |   | |-CXXCtorInitializer Field 0x9a350b0 't' 'int'
| |   | | `-ImplicitCastExpr 0x9a4b410 <col:7> 'int' <LValueToRValue>
| |   | |   `-MemberExpr 0x9a4b3e0 <col:7> 'const int' lvalue .t 0x9a350b0
| |   | |     `-DeclRefExpr 0x9a4b3a8 <col:7> 'const B<int>' lvalue ParmVar 0x9a49208 depth 0 index 0 'const B<int> &'
| |   | `-CompoundStmt 0x9a4b450 <col:7>
| |   |-CXXConstructorDecl 0x9a496e8 <col:7> col:7 implicit constexpr B 'void (B<int> &&)' inline default trivial noexcept-unevaluated 0x9a496e8
| |   | `-ParmVarDecl 0x9a49828 <col:7> col:7 'B<int> &&'
| |   `-CXXDestructorDecl 0x9a4b498 <col:7> col:7 implicit referenced constexpr ~B 'void () noexcept' inline default trivial
| |-TypeAliasDecl 0x9a34930 <line:19:1, col:17> col:7 referenced BI 'B<int>':'a::B<int>'
| | `-TemplateSpecializationType 0x9a34880 'B<int>' sugar
| |   |-name: 'B':'a::B' qualified
| |   | `-ClassTemplateDecl 0x9a10c38 <line:12:1, line:17:1> line:13:7 B
| |   |-TemplateArgument type 'int'
| |   | `-BuiltinType 0x99bc2f0 'int'
| |   `-RecordType 0x9a11390 'a::B<int>' imported canonical
| |     `-ClassTemplateSpecialization 0x9a11298 'B'
| `-FunctionDecl 0x9a34be0 <line:21:1, line:23:1> line:21:12 used get 'int (A, BI)' inline
|   |-ParmVarDecl 0x9a349d0 <col:16, col:18> col:18 used a 'A'
|   |-ParmVarDecl 0x9a34a80 <col:21, col:24> col:24 used b 'BI':'a::B<int>'
|   `-CompoundStmt 0x9a35458 <col:27, line:23:1>
|     `-ReturnStmt 0x9a35448 <line:22:5, col:24>
|       `-BinaryOperator 0x9a35350 <col:12, col:24> 'int' '+'
|         |-CXXMemberCallExpr 0x9a351b0 <col:12, col:18> 'int'
|         | `-MemberExpr 0x9a35150 <col:12, col:14> '<bound member function type>' .get 0x9a10950
|         |   `-DeclRefExpr 0x9a35130 <col:12> 'A' lvalue ParmVar 0x9a349d0 'a' 'A'
|         `-ImplicitCastExpr 0x9a35338 <col:22, col:24> 'int' <LValueToRValue>
|           `-MemberExpr 0x9a35308 <col:22, col:24> 'int' lvalue .t 0x9a350b0
|             `-DeclRefExpr 0x9a352e8 <col:22> 'BI':'a::B<int>' lvalue ParmVar 0x9a34a80 'b' 'BI':'a::B<int>'
|-ImportDecl 0x9a35478 <a.cc:2:1, col:8> col:1 a
`-FunctionDecl 0x9a354e8 <line:4:1, line:8:1> line:4:5 test

We can clearly see the AST generated from a.cc increases substantially.

However, due to historical reasons in compiler development, import before #include poses significant challenges for compilers (as far as I know, including Clang, MSVC, and GCC), exposing various issues. That said, compilers are steadily improving here. Provided the target compiler doesn’t error, using import before #include to wrap third-party libraries is generally a good approach. For example, if our library needs Boost and we cannot modify Boost’s code, we can wrap it like this:

module;
import std;
#include "boost/..."
export module boost.mock;
export using boost::...

If we need to re-export a module that we import, we can export import it within the module purview:

module;
import std;
#include "boost/..."
export module boost.mock;
export import std;
export using boost::...

In the extern C++ style, if not all dependencies provide modules, we can do something similar:

module;
import std;
#include "third_party_not_providing_modules/..."
export module your_library;
#define IN_MODULE_INTERFACE
extern "C++" {
  #include "header_1.h"
  #include "header_2.h"
  ...
  #include "header_n.h"
}

When some of the dependencies do provide modules, we can import those and avoid including their corresponding headers.

module;
#include "third_party_not_providing_modules/..."
export module your_library;
import third_party_providing_modules;
#define IN_MODULE_INTERFACE
extern "C++" {
  #include "header_1.h"
  #include "header_2.h"
  ...
  #include "header_n.h"
}

Impact of Modules on code size

The fact that Modules avoid repeatedly generating function bodies with non-inline linkage, as mentioned above, is one of the main reasons they save code size. In addition, the design of C++20 Modules also helps to save debug information size. For example, using the same example as before:

// a.cpp
#include <vector>
#include <string>
#include <iostream>

int main() {
   std::vector<std::string> vec = {"hello", "world"};
   std::cout << vec[0] << " " << vec[1] <<"\n";
}

We compile with the following command:

$ clang++ -std=c++23 a.cpp -c -g -o a.o
$ du -sh a.o
164K	a.o
$ readelf -S a.o | grep '.debug'
  [278] .debug_abbrev     PROGBITS         0000000000000000  000018b8
  [279] .debug_info       PROGBITS         0000000000000000  00002053
  [280] .rela.debug_info  RELA             0000000000000000  00018810
  [281] .debug_rnglists   PROGBITS         0000000000000000  00007944
  [282] .debug_str_offset PROGBITS         0000000000000000  00007a82
  [283] .rela.debug_str_o RELA             0000000000000000  00018888
  [284] .debug_str        PROGBITS         0000000000000000  00008a86
  [285] .debug_addr       PROGBITS         0000000000000000  000130c5
  [286] .rela.debug_addr  RELA             0000000000000000  0001e870
  [294] .debug_line       PROGBITS         0000000000000000  00014170
  [295] .rela.debug_line  RELA             0000000000000000  0001fce0
  [296] .debug_line_str   PROGBITS         0000000000000000  0001545a

Then for the modules version:

// a.cppm
module;
#include <vector>
#include <string>
#include <iostream>
export module a;

export namespace std {
   using std::vector;
   using std::string;
   using std::cout;
   using std::operator<<;
}

std::vector<std::string> unused = {"hello", "world"};

// a.cc
import a;

int main() {
    std::vector<std::string> vec = {"hello", "world"};
    std::cout << vec[0] << " " << vec[1] <<"\n";
}

$ clang++ -std=c++23 a.cc -g -fmodule-file=a=a.pcm -c -o a.imported.o
$ du -sh a.imported.o
144K	a.imported.o
$readelf -S a.imported.o | grep '.debug'
  [278] .debug_abbrev     PROGBITS         0000000000000000  000018f8
  [279] .debug_info       PROGBITS         0000000000000000  00001f5e
  [280] .rela.debug_info  RELA             0000000000000000  00015cb8
  [281] .debug_rnglists   PROGBITS         0000000000000000  00005f98
  [282] .debug_str_offset PROGBITS         0000000000000000  000060d1
  [283] .rela.debug_str_o RELA             0000000000000000  00015d30
  [284] .debug_str        PROGBITS         0000000000000000  00006c6d
  [285] .debug_addr       PROGBITS         0000000000000000  00010870
  [286] .rela.debug_addr  RELA             0000000000000000  0001a2a8
  [294] .debug_line       PROGBITS         0000000000000000  000118f0
  [295] .rela.debug_line  RELA             0000000000000000  0001b6e8
  [296] .debug_line_str   PROGBITS         0000000000000000  00012b29

As you can see, a.imported.o is 12% smaller than a.o, and the sizes of various debug sections have also decreased.

Runtime issues encountered during module migration

The runtime issues I encountered during the Modules migration can be divided into ODR Violation issues and issues with internal linkage variables in original header files.

ODR stands for the One Definition Rule, which states that a declaration in C++ should have only one definition. This rule is often violated in practice due to duplicate names or a project (often indirectly) depending on different versions of the same library. A program that violates the ODR is considered to have undefined behavior. This often makes me wonder if there are any large-scale C++ programs that are truly free of UB. Many programs that violate the ODR can run normally, but they are very fragile and often expose very strange problems after a change in linking order. From this perspective, the problems exposed during this process may have given us an opportunity for self-examination.

The issue with internal linkage variables in header files is that the initialization functions of internal linkage variables in headers might be called many times, but after switching to modules, they are only called once. This change in the number of calls is reflected at runtime.

Forward declaration issues in Modules

To avoid ODR violations, Modules prohibit declaring and defining the same entity in different Modules. Therefore, the following code is UB:

export module a;
class B;
class A {
public:
    B* b;
};

export module b;
class B {
public:

};

The B declared in module a and the B defined in module b are not the same entity. To alleviate this problem, we can either place module a and module b in the same module and use partitions:

export module m:a;
class B;
class A {
public:
    B* b;
};

export module m:b;
class B {
public:

};

Or use extern "C++", which is considered to be in the Global Module:

export module a;
extern "C++" class B;
class A {
public:
    B* b;
};

export module b;
extern "C++" class B {
public:

};

Or we have to refactor the code.

TODO?

Frankly, the impact of Modules is indeed very large, leading to a high cost of adoption from the toolchain level to the user level, which is the main reason why the progress of Modules is currently very slow. But on the other hand, where there are problems, there are needs. The current state of Modules, being not so bad but not great either, is very suitable for friends interested in toolchains and the community ecosystem to participate in. There are many things that can be done and many low-hanging fruits. Here is a brief list, and discussions are welcome.

First, at the library level, we need more fundamental libraries to provide Modules. Currently, libc++, libstdc++, and MSSTL all provide the std module. The next most important is the Boost library; Boost providing Modules can be considered an important milestone. Additionally, creating wrappers for libraries that you actually need is also a great thing to do.

Then, at the ecosystem level, how to distribute projects that use C++20 Modules also seems to be a black box. My experience is partly in a closed world, and on the other hand, I use a modified downstream Bazel, so I have little knowledge about this in the open-ended world. I often see people asking similar questions, but I haven’t seen particularly good solutions. More practice may be needed.

At the code intelligence level, I think the difficulty of supporting C++20 Modules in clangd is not high, and the problem is very isolated, making it very suitable for getting started.

At the toolchain level, the most obvious issue is cross-platform problems. For example, [libc++] Fix C++23 standard modules when using with clang-cl on Windows. Such problems require a practical environment to solve. At the build system level, some features were mentioned earlier, but I feel the main blocker is in distribution.

Then there’s the tooling level. I have written a conversion tool myself, and I see many others have done similar things. There might still be work to be done in this area. Frankly, Modules are much simpler at the language level compared to other C++ language features. But on the other hand, the impact of Modules on the ecosystem is far greater than other features. This leads to a lot of repetitive work when doing a Modules conversion. If the community could have a unified, easy-to-use tool, I think it would be a great help. I think AI has great potential in this area.

Finally, at the compiler level, the barrier to entry is indeed higher due to its relatively closed internal data structures. The overall development of C++20 Modules in Clang is on a converging trend, as the main features have been implemented, and the remaining issues are mainly bug fixes and performance optimizations. Besides continuing to support new features like reflection and Contracts in serialization and deserialization, Clang C++20 Modules may have the following TODOs:

Add optimized IR to the BMI, use the optimized IR during frontend code generation, and have LLVM passes avoid doing too much work on this type of IR. This is expected to achieve a better balance between runtime performance and compile-time performance.
A standardized BMI format. Currently, the BMI format in clang has no specification to speak of and is versioned by commit hash. If we can standardize the BMI format, we can greatly enhance the compatibility of BMIs, and the so-called distribution of BMIs might no longer be a pipe dream. Even if different compiler options still cause incompatibility, there is a big difference between incompatibility and incompatibility. Microsoft has done https://github.com/microsoft/ifc-spec in this area. Clang can follow it or try to standardize the BMI format from Clang’s perspective.
Some extension attributes, such as [[headers_wrapper]], to mark that a named module is actually a wrapper for a series of header files, and then export other features. For example, if we mark the std module with [[headers_wrapper]], we could let the std module export its own header macros, so we would no longer have to worry about the multiple declaration issues in multiple TUs that the following situation might cause:

import std;
#include <vector> // Even if the compiler accepts it, it is not efficient.