Variable width encoding

For people working on the C++ code.
Post Reply
coloblorm
New member
Posts: 2
Joined: Mon Apr 22, 2024 16:01

Variable width encoding

by coloblorm » Post

My 'core' question is: How does the engine locate meta data for nodes? Is there a flag in each node? A chunk of additional storage per map block?

It seems like a huge issue when creating mods is everything must be packed into a fixed width, resulting in all sorts of weird compromises. I keep wanting to add metadata to nodes, but I am worried about the impact of doing so because I don't know how the engine stores and accesses additional data.

My instinct tells me the engine probably tries to isolate anything rendering-related so it can pump it out as quickly as possible, which is why nodes are limited to ID, light, and model/rotation data. I assume there is a parallel array of pointers to metadata, with a flag in the 'main' array value to signal its existence at a given coordinate..?

I love the elegance of UTF-8 character encoding: https://en.wikipedia.org/wiki/UTF-8#Encoding

Essentially, the first (most significant) bit of a byte communicates its relation to the following byte. If it is 0, the remaining 7 bits are the whole definition of the character. If it is 1, the next byte is part of the description.
As a sacrifice to help with locating the start of a character from any point in a stream of bytes, every byte which isn't the start of a character begins with 10, allowing the remaining 6 bits to store data. Thus, a 2-byte character automatically must begin with 110, but then the following byte can store 6 additional bits of data:

Code: Select all

Capacity  Bits
128       0xxxxxxx
2048      110xxxxx 10xxxxxx
65536     1110xxxx 10xxxxxx 10xxxxxx
2097152   11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
Does Minetest use a similar system, or else has there been any discussion about its implementation? This could be applied to 16-bit or 32-bit units relatively easily, though again with the caveat that you'd need to signal whether or not you're at the beginning of a unit.

Any insight on this would be greatly appreciated!

User avatar
LMD
Member
Posts: 1490
Joined: Sat Apr 08, 2017 08:16
GitHub: appgurueu
IRC: appguru[eu]
In-game: LMD
Location: Germany
Contact:

Re: Variable width encoding

by LMD » Post

Yes, metadata is stored "out of band", which is reasonable. See src/mapblock.h and src/nodemetadata.h for details. Specifically, metadata is stored in a sorted map (I'm not sure why it isn't a hash map) from node position to metadata, which makes some sense because often metadata is very sparse, so it would be rather wasteful to allocate an entire array for "null" entries.

UTF-8 encoding is elegant but has its downsides: Accessing a codepoint, given an index, is unacceptably slow (linear time). We need this operation (setting a node, getting a node) to be very fast. Hence a variable-length encoding format would not work well for us. We want constant-time or near constant time indexing, which arrays or maps can provide.
My stuff: Projects - Mods - Website

coloblorm
New member
Posts: 2
Joined: Mon Apr 22, 2024 16:01

Re: Variable width encoding

by coloblorm » Post

I definitely see what you're saying with variable length. The operations are too complex per iteration vs simply grabbing a value and then moving forward a fixed amount. I suppose also sacrificing bits to describe data length isn't all that space-efficient to begin with.

I'll study the files you mentioned along with some other very interesting-looking files in that list to make sure I properly understand what's going on behind the scenes. I'm relieved to see they're well-commented, because I'm much more familiar with C than C++!

Post Reply

Who is online

Users browsing this forum: No registered users and 1 guest