I must say I am not an absolute master at this, but I think I have learned a fair few things that are worth sharing. Please chime in if you have something to add. I must also thank everyone who has responded so far, I have been using your input to revise and improve this Original Post (OP).
What is size?
Storage medium (plural: media) := A device for permanently storing bytes. Examples include hard disk drives and solid state drives, but also older media like optical media, floppy disks and so on.
Filesystem := The extra bits and bytes your computer writes onto your storage medium to keep track of where all the files are and what their names are. This obviously takes some space on its own, hence the standard disclaimer on storage media of 'available size may be less due to formatting'.
Apparent size := How many bytes actually make up the content of a file. This is usually the main figure a file browser application reports.
Block size := The minimum allocation size for parts of a file in a filesystem. Most desktop media with a typical filesystem allocate storage in blocks of 4096 bytes (4K). If a file is any smaller than 4K, it still takes up 4K in storage.
Disk usage := How much space all the files take up in your storage (solid state, spinning disk), which is always a multiple of block size.
Archive file := A file that packs multiple files together, often with metadata such as file permissions, and is often compressed. Even when uncompressed, archive files can often pack many small files together and result in lower disk usage than when all the files are sitting directly on the file system.
Git repository size := How much disk usage a git repository of the mod occupies (other version control systems are available).
If you are using version control like git, it has to store the information to reconstruct every version of your mod. Particularly bad are non-text files like models, textures and sounds, because they are never recorded as deltas (just the changes between versions) but instead as an entirely new copy of the file. Your Lua files only need the differences between versions recorded, which makes new versions take much less space. The same goes for your README, LICENSE, settingtypes.txt, mod.conf and other text files.
Git host := Sites that hold git repositories online, enable browsing via a web browser, and may include other features for collaboration, continuous integration et cetera. Examples include GitHub, GitLab, NotABug, Codeberg, sourcehut. Self-hosted options include GitLab self-hosting, Gitea, Gogs, sourcehut self-hosting and cgit.
Downloaded mod size := The disk usage for a copy of a mod downloaded without any version control files, such as you would get from ContentDB or a git host.
Media size := The size of the files that a client must download when connecting to a server. It is the sum of the disk usage of all the sent media.
Transfer size := The size of the representation of a media file as it is received by the client. It is usually the apparent size of the file, though it differs if the file is sent compressed over the network. There is also a small size overhead for UDP/IP.
Luanti sends media from the following directories of a mod: textures, sounds, media, models, locale. We will discuss most of these separately. Media size is smaller than downloaded mod size, though usually not by much unless the mod is almost entirely code with little to no assets.
Lossless compression := A file compression technique that does not result in any loss of information, detail, resolution, and so on. In most cases, a losslessly compressed file cannot be as small as a lossily compressed one.
Lossy compression := A file compression technique that can reduce the size of a file by sacrificing the ability to perfectly reconstruct it later on. The stored output is only an approximation, and in many cases when a file is lossy compressed more than once, it loses quality every time.
Please, if you are authoring media files, always keep uncompressed or lossless compressed files as the original, and re-export them to lossy compressed formats only as the last step once you are done revising the file.
Measuring Media size
It's a lot trickier to measure the actual media download size than the size of the mod on your hard drive.
But all of the old versions in git are irrelevant to the actual download size, as are the Lua files, your mod.conf, depends.txt and anything else that isn't in the textures/, models/, media/, sounds/ or locale/ directories of your mod. So you will probably get a good approximation by selecting just those directories and totalling their file sizes (avoid rounding while you do this, especially if it's a modpack).
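Here's a minimal shell sketch of that approximation, assuming a bash shell, GNU coreutils, and a mod at ~/.minetest/mods/mymod (an example path):

```
# Total apparent size of only the directories Luanti sends to clients.
# Drop --apparent-size to see disk usage (block-rounded) instead.
du -c --apparent-size --block-size=1K \
    ~/.minetest/mods/mymod/{textures,sounds,media,models,locale} \
    2>/dev/null | tail -n 1
```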
Another method is to measure it empirically. Run a portable Luanti installation as a client (the default on Windows, for instance), so its cache directory is easy to find and reset. Run a server with Minetest Game (or some other game) and no mods, connect a client which has the same game installed, and then measure the size of the client's cache/media directory. This gives you a baseline size for connecting to the server. Then add to the server any dependencies your mod has and measure the cache directory again, to establish a second baseline that excludes your mod; if your mod works without dependencies, skip that step. Finally, install your own mod into the server and measure the cache directory one more time. Subtracting the first and second baseline numbers tells you how much data (1) Minetest Game/your game without any mods takes, (2) your mod's dependencies take and (3) just your mod takes. Don't install any mods on the client, don't launch it in singleplayer and don't open the Content tab, as those may add files to the cache that you don't want.
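If you want to script that measurement, a one-liner like this works at each stage (the path assumes a default, non-portable Linux install; a portable install keeps its cache inside the installation directory):

```
# Run once per stage (bare game, + dependencies, + your mod) and
# subtract the totals from one another.
du -s --apparent-size --block-size=1K ~/.minetest/cache/media
```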
Once you have a good idea of what the actual downloaded size of your mod is, you can figure out if you want to try to decrease its size. You will do this by running different kinds of disk usage optimisation on your media files.
Disk Usage Optimisation
Textures
What's there to gain or lose?
- You will benefit most from optimising your textures if they have few colours or flat areas of colour, or, for photorealistic textures, if you save them as JPEG.
- You can lose too much detail from textures if you try too hard to compress them. You have to decide what your acceptable level of detail is.
- Textures will usually be easier to optimise than models.
My advice? Don't use BMP at all, and don't use TGA unless it's for a tiny, tiny file - even though a lot of files are going to be smaller than 4KiB. While BMP and TGA do support basic RLE compression when created by some programs, PNG's compression is going to be better in most cases. There's also no guarantee Luanti will handle the combination of palettes and RLE scheme used correctly.
TGA has had a complicated history with Luanti; however, in my opinion there are not many compelling reasons to use it. mcl_maps, part of Mineclone2/5, used TGA for its Mineclone5 maps because TGA is an easier format to encode from Lua than PNG. The Minetest (the name at the time) devs had almost forgotten/never knew anyone was using TGA, and ripped it out to reduce how much code they had to manage. Well, the easiest way to find out if somebody uses some feature is to remove it and see if any complaints come in. Long story short, a PNG encoder was written in C++ and added to Luanti so that the excuse of "it's easier to encode" no longer rang true for TGA, but TGA support was also added back. For more information on this complicated history, where nobody is obviously right or wrong, see erlehmann's post in this thread.
The small set of use cases for TGA is dynamically generated media where the pixel dimensions are quite small, less than 32 pixels in almost all cases (there are always complicated exceptions). xmaps uses TGA effectively because its maps have a very small palette and small pixel dimensions. The TGA data is also encoded directly into the item metadata of those maps, so every byte counts, and it works out smaller than PNG (especially PNG as produced by core.encode_png). Remember: reducing apparent size will reduce transfer size, but it will not reduce disk usage. This is why the smaller apparent size of TGA is usually moot, and the standard choice of PNG is almost always best. That is why I can only recommend TGA for dynamic media of low resolution; if that's not your use case, disregard the format.
For PNG textures, optipng, oxipng and similar programs are commonly used to reduce their size. pngcrush is an older program than those first two and is obsolete; don't use it. For maximum compression with optipng, you should also strip the metadata. An example invocation of optipng is optipng -O7 -strip all $FILENAME, which optimises as hard as possible but takes longer to run, and will remove all metadata. Removing metadata is at your discretion: maybe you want to keep the authorship information in the file. You may find the odd file with a large amount of metadata though.
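For batch processing, here's an example invocation over a whole textures directory, assuming a POSIX find and that you do want all metadata gone:

```
# Optimise every PNG under textures/ in place, as hard as optipng can.
# -O7 is slowest/strongest; -strip all removes all metadata, including
# authorship information, so only use it if that's what you want.
find textures -name '*.png' -exec optipng -O7 -strip all {} +
```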
For something more extreme than PNG optimiser programs, you can try opening the file in GIMP, and converting to indexed colour with a small palette size. A palette size of 8-32 colours is what I consider normal, depending on texture complexity; of course if your texture only has 2 colours use a 2-colour palette. Be aware that dithering will reduce the gains you can make from converting to indexed colour, because it will be harder to compress a file with dithering; you should probably disable dithering. Finally, export with minimal metadata (untick most/all of the export option boxes), or run the output through optipng to remove all metadata.
Another lossy option for PNGs is pngquant. That is a separate command-line program that can reduce the palette size of files. If you have a big set of files to process with the same basic operations then this would be a better option than doing every file manually in GIMP. Make sure you disable dithering or you may not gain much.
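A sketch of such a batch run, assuming pngquant is installed and that 16 colours suits your textures (adjust the palette size per texture set):

```
# Reduce every texture to a 16-colour palette, without dithering,
# overwriting the originals. Keep backups of your source files!
pngquant 16 --nofs --ext .png --force textures/*.png
```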
You can also put any images that share an indexed colour palette into the same file and use Luanti's tilesheet functionality (the [sheet:WxH:X,Y texture modifier). This will save disk usage whenever the apparent size of an individual file would be less than 4K. If you are authoring a game or an extensive mod and care a lot about file size, strongly consider designing your textures around a single colour palette, of your own or someone else's creation, and placing all of your node textures into a single tilesheet file. You will probably want to make an exception for mesh nodes and entities and not include those in your tilesheet: meshes require UV mapping, which would entangle all your models together and cause a massive headache with the amount of complexity that adds. As a hypothetical example of the power of tilesheets, Zughy's Soothing 32 is already small for a Minetest Game texture pack because it uses indexed colour, yet it could save a lot more disk space if all the Minetest Game textures were on one tilesheet. That, however, would involve using overrides and would need per-mod support, even for mods that just use the original filenames.
JPEG would be suitable in case you are using high-resolution textures and want to save space - you can tune the JPEG quality according to how you want your space/quality tradeoff to go. If you want to go beyond simply reducing the quality, you can enable chroma subsampling, which reduces the resolution of the colour information without reducing the resolution of the brightness information. In GIMP, when exporting a JPEG, under advanced options you can select subsampling 4:2:0 (chroma quartered). As always, test the result to make sure it's acceptable, but the file size savings can be quite drastic.
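If you'd rather batch this than use GIMP's export dialog, ImageMagick can do the same thing; a sketch, assuming ImageMagick is installed and quality 80 is acceptable to you:

```
# Convert a texture to JPEG at quality 80 with 4:2:0 chroma subsampling.
convert input.png -quality 80 -sampling-factor 4:2:0 output.jpg
```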
Sounds
What's there to gain or lose?
- As you decrease sound quality, you may not enjoy the music enough.
- A lot of sound effects don't need a lot of detail, so you may be able to downsample them without a noticeable loss of quality.
The space/quality tradeoff is similar to JPEG, because Ogg Vorbis is also a lossy format. Since we are dealing with audio, though, your options are not quite as simple as JPEG's. You can reduce the target bitrate (or quality setting), which reduces overall fidelity by running the lossy compression harder. You can also reduce the bit depth, the number of bits in a single sample, which is comparable to bits per pixel or bits per channel in image formats. Finally, the setting with no image-format equivalent is the sample rate, usually given in kHz, which is how many samples are played each second. You can read more about these concepts online.
As a start, you might try re-encoding with lower quality from an .ogg exporter like Audacity. If you need more control, ffmpeg will help, but be prepared for a steep learning curve.
My advice, though I am no expert in audio, is to reduce the bitrate/quality setting first if you want to reduce size - this is usually the primary way to shrink a file, and it's a single variable to tune your quality/space tradeoff on. Reduce bit depth and sample rate if you want to go further into tweaking. Reduced bit depth may work for a chiptune/retro sound, since older games often used small bit depths and 'bit crushing' is a technique often used for that retro sound. The sample rate can go down to 24 kHz and still be acceptable to human hearing, though it's not as ideal as 44.1/48 kHz. Lower than 24 kHz will start to sound like phone hold music, which is never nice.
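To put those knobs together, here's a hedged ffmpeg example (the quality level and sample rate are just starting points, adjust to taste):

```
# Re-encode to Ogg Vorbis at quality level 3 (range is -1..10) and
# resample to 24 kHz. Listen to the result before committing to it.
ffmpeg -i input.wav -c:a libvorbis -q:a 3 -ar 24000 output.ogg
```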
Music will usually be the biggest part of your sound data. Compared to music, sound effects take up minimal space, but again you can still reduce the file size by re-encoding.
Compared to PNG files, audio can take a surprising amount of space. This seems intuitive for music, but even short little hit sounds are way bigger than detailed 16px textures. For very short files, I have found in practice a minimum size of about 4K just to have an audio file at all - yes, apparent size, not just disk block usage. See "SFXR/Sound effect generator format direct support" for my notes on this.
3D Models
What's there to gain or lose?
- Choosing the right model format can net you a fair amount of savings.
- If you lossy compress your OBJs too much, you will get nasty z-fighting.
ExeVirus created a compressor for OBJ which, he claims, can produce smaller files than B3D for simple models: compress-obj. I would measure the size of your non-animated models in both B3D and OBJ compressed with compress-obj, and choose your format based on that. If you are still trying to squeeze out bytes, you can use compress-obj's lossy options.
LMD, on the other hand, claims that with a better exporter, B3D could be even smaller. Well, keep your eye out in case that eventuates, but even if it does, keep comparing OBJ and B3D size.
Locale files
What's there to gain or lose?
- Not a lot to gain here, you'd be lucky to gain more than 1-2 filesystem blocks.
- But if you try the hacky technique with numeric codes, there is definitely the possibility of ruining the experience for someone who joins the server with the target language.
You can remove any comment lines and any redundant newlines. Maybe modify the text for brevity. For somewhat sensible things, that's about it.
Now here's a hack for you: I think these days Luanti supports translating from some other language into English, instead of assuming English is the source language. You might be able to shorten the source strings by making them numeric identifiers, which would reduce the size of all your locale files, but I haven't experimented with it enough to be sure you wouldn't end up messing up at least one language. It would be a bit iffy, but technically you could choose something obscure like Kazakh (kk) as the source language, put plain numbers in for the source strings, and then just hope no Kazakh players with their language set to Kazakh join... Can any expert tell me if there's a not-hacky way to do this?
Thankfully, the client-side translation features in Luanti are also bandwidth-optimised: Only the translation files that a client wants will be downloaded (ref).
Not yet investigated: 5.10 introduced gettext support, which may produce larger or smaller files than the older .tr format. Using plural support from gettext would use more space than bodges like putting plurals in brackets, e.g. EN "hedge(s)" -> DE "Hecke(n)".
Closing notes on disk usage
Of course, a lot of these lossy methods I just mentioned can have consequences to the asset quality, so you also have to weigh the loss of quality against file size savings.
For any file that is already less than a filesystem block in size, it's pointless to try to reduce its size because you can't reduce its disk usage. So don't worry about such small files.
Git Repository Size Optimisation
This is left until last because it is usually not as important. You can expect only powerusers or mod contributors to download the full git repository. There are a few tricks to save bandwidth for both downloaders and git authors.
Taking care about sizes
Let's start with the obvious: reduce the number and size of your commits. That is not to say you should be afraid of changing your mod, but there are positive steps you can take to reduce the amount of churn in your repository. Sometimes the opposite is true and you should break your changes into logical steps with separate commits - for instance, one commit to fix that latest bug and another to update your README because it's gone out of date (but if you want to record the bug fix in your README, do that in the same commit as the bug fix - the key point is whether the changes are related).
Have a style guide and enforce it: tabs or spaces, when to indent, acceptable lua-isms, and so on. If the style guide is enforced properly there are no mixed tabs/spaces files or other things to annoy people, and they will not feel the need to change lines just to reformat them. This reduces 'diff noise', which is what happens when a commit introduces changes that aren't relevant to the actual code.
Consider squashing and rebasing before merge: Read about "rebase vs merge workflow" online. If you rebase your branch onto the target branch, there is no need for a merge commit, which reduces noise in the git log. If you use git rebase -i, you can squash commits together. This is very helpful because you can often make several 'work in progress' type commits and condense them down later when your changes have worked out to be good.
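For example (branch names are placeholders):

```
# Replay your work on top of the target branch, then interactively
# squash the 'work in progress' commits into meaningful ones.
git checkout my-feature
git rebase -i main
```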
Avoid changes to media files as much as possible, including squashing any WIP versions out of existence. Git tracks the entire content of binary files in each revision, rather than just the diffs as it effectively does for text files. Exporting to OBJ can help with this, because OBJ is a text format that git can diff easily, unlike most of the model formats used for Luanti, which are binary.
Media source files
You will usually have files that produce your final media files, but that you want to keep around as the sources for those files. You should still keep these version controlled, but they are often quite large. What should you do?
Exclude these media source files from the downloaded mod size - potential mod contributors should always get your files through git; everyone else will download your files as archives (usually .zip or .tar.gz). You can mark files as export-ignore with a .gitattributes file; read more about it with git help attributes and git help archive. ContentDB will follow your gitattributes when creating archives for your content, and git hosts like GitHub and GitLab should also obey them. You can also run git archive yourself to create such archives; internally, this is basically what ContentDB, git hosts and so on will be using.
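A sketch of how that looks in practice (the file patterns and the sources/ directory are examples; use whatever your source assets actually are):

```
# Mark source assets as export-ignore so git archive (and therefore
# ContentDB and git-host downloads) leaves them out.
cat >> .gitattributes <<'EOF'
sources/** export-ignore
*.xcf export-ignore
*.blend export-ignore
EOF

# Test locally what an archive would contain.
git archive --format=zip -o mymod.zip HEAD
```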
You might also want to sanitise your media files. For instance, a lot of Inkscape and Blender project files will include external file references, which often use absolute paths, which might tell people your real name or just too much information about what kind of files you keep. Try to use relative paths instead of absolute to avoid this. You can remove paths and other metadata you don't like when exporting or with external tools such as metadata strippers or hex editors.
Another great reason to switch your media source files to relative paths is so that anybody who downloads your .blend file for instance can still use external assets. If the external assets have absolute paths, chances are the next person to open your project won't be able to find the files and it will be a big mess. External assets will save space compared to including them in the .blend, plus they can be edited externally.
Check if your media files support compression from within the application you used to make them. For instance, GIMP's XCF image format can be compressed, as can Blender's .blend file format. Since these are binary files either way, git will be storing the full version of them each time, so the savings are definitely important. Fair warning though for the paranoid: you won't be able to search for and see most of the file contents you would want to redact in a hex editor if the file is compressed.
Git submodules
Submodules are a feature in any recent version of git that allows you to manage git repositories as dependencies of your own; run git help submodule to learn more. I usually recommend git submodules for modpacks to manage dependencies properly, like Pandorabox's modpack does, but there are space savings you can make with them as well. For instance, you could make optional submodules and not clone them every time, or, if you only need them present but don't need every version, you can shallow clone them.
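For instance, adding a dependency as a shallow submodule might look like this (the URL and path are examples):

```
# Add the dependency, then record in .gitmodules that it should be
# cloned shallowly (only the latest commit) by default.
git submodule add https://example.com/someone/somedep.git somedep
git config -f .gitmodules submodule.somedep.shallow true
```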
Using git lfs or similar
Take this advice with a grain of salt; I haven't actually used what I am recommending here, but I do like it in principle.
git lfs: This is a git plugin, short for "large file storage", that changes the way designated files are stored. It is provided with many installations of git, such as Git for Windows, but it is separately installable if you don't have it; on Linux, and via Homebrew on macOS, it may be a separate package.
Git LFS saves an LFS-tracked file as something like a pointer to the file, rather than the entire file. This can drastically reduce the download size of the git repository. When a commit is checked out, LFS fetches the full files from the LFS server.
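Setting it up might look like this (the file patterns are examples; track whatever binary types you actually have):

```
# Enable LFS for this repository and route big binary types through it.
git lfs install
git lfs track "*.png" "*.ogg" "*.b3d"
git add .gitattributes
```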
There is one big caveat to LFS though: The git host has to support LFS, and may even have restrictions on LFS. GitHub, GitLab and Gitea support LFS, although your Gitea host/instance may not have it installed.
An alternative is git-annex, which supports many other non-git-related sources for its files, such as cloud storage hosts, rather than just being bound to the LFS server.
Shallow cloning
By default when cloning a git repository, all of the history of the default branch is retrieved, going all the way back to the initial commit. This is not a problem for small, short-running and small-footprint projects. However, it can be a huge issue for projects that are very long-running, that have big binary files, and have very frequent commits. For instance, it would be inadvisable to download the entire git history of the Linux kernel, which takes up more than a gigabyte.
Here's where shallow cloning comes in. You can restrict the download of git objects to a certain number of commits back, or back to a certain date, and you can exclude branches or tags. Read more with git help clone at the terminal or git bash prompt. Relevant options are --depth, --shallow-since and --shallow-exclude.
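For example, to grab only the latest state of one branch (the URL and branch name are placeholders):

```
# Fetch just the most recent commit of a single branch.
git clone --depth 1 --single-branch --branch main \
    https://example.com/someone/bigmod.git
```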