[no sq] Split blockpos into 3 different columns in sqlite3 map database #15768

sfan5 · 2025-02-08T16:44:46Z

This changes the map database schema for sqlite to be (x, y, z, data) instead of (pos, data).

Advantages:

better range queries enables possible optimizations inside Luanti and for external tools
moving away from the "block as integer" code that is apparently so cursed

Disadvantages:

If this SO answer is to be believed, this might actually be minimally slower

To do

This PR is a Work in Progress

add simple unit tests
update docs

How to test

create new world, inspect map.sqlite afterwards
join old world (it should work)
try migrating an old world to e.g. leveldb and back to sqlite (should change to new format)

appgurueu · 2025-02-08T18:19:53Z

arbitrary range queries enables possible optimizations inside Luanti and for external tools

Note that this index will only allow SQLite to speed up the following (parts of) a query:

x BETWEEN ? AND ?
x = ? AND y BETWEEN ? AND ?
x = ? AND y = ? AND z BETWEEN ? AND ?

That is, a range query can essentially only be done along a single axis, and all preceding axes must be equality comparisons (or SQLite will effectively have to loop over all possible values for each), in the order the keys appear in the index.

sfan5 · 2025-02-08T18:22:39Z

We can add more indexing if needed tho.

Edit: we can however influence the order of the default index. (X, Z, Y) sounds most useful so I'll change that.

lhofhansl · 2025-02-09T17:32:03Z

While you are at it, we could also split out a block's timestamp into a separate column.
This timestamp is a frequent cause for writing entire blocks back to the db, even when nothing else has changed.

lhofhansl · 2025-02-09T17:40:42Z

Note that this index will only allow SQLite to speed up the following (parts of) a query

If all our databases support z-order we can do approximate spatial indexing that way. (It's not hard to add it via plpgsql to Postgres for example, and then index on a function.)

On the whole I would expect this change to slow things, but that's just a gut feeling.

src/database/database-sqlite3.cpp

sorcerykid · 2025-02-10T12:14:43Z

Overall I'm in favour of this PR, as it will significantly improve performance of my map analysis scripts, in particularly RocketLib Toolkit which includes many functions for iterating mapblocks in certain areas.

https://bitbucket.org/sorcerykid/rocketlib/src/master/

Currently, I have to resort to generating a "Cache" database that just consists of a cross-reference table for indexing Block Position (X, Y, Z) to Block ID. This process can be extremely time-consuming, particularly on a map.sqlite that is several gigabytes. It would be so nice to finally eliminate that step entirely.

My map viewer application also makes extensive use of position queries for rendering and navigating a density map along the X, Y, or Z axis. Currently, I have to scan the entire database and manually total up the selected mapblocks. But with separate columns for Block Position, this could be streamlined.

sfan5 · 2025-02-10T15:47:51Z

While you are at it, we could also split out a block's timestamp into a separate column.
This timestamp is a frequent cause for writing entire blocks back to the db, even when nothing else has changed.

Would make sense to do at once, but extracting out the timestamp has a whole other bunch of different implications.

If all our databases support z-order we can do approximate spatial indexing that way.

I couldn't find much info on native support in databases for this but also I don't really see any use case.
Letting the database sort through the x,y,z columns should be fast enough if we need it.

sorcerykid · 2025-02-10T17:28:31Z

This timestamp is a frequent cause for writing entire blocks back to the db, even when nothing else has changed.

Can you point to where this happens? I'm genuinely curious, because I was under the impression mapblocks were only written back to the database if a flag was set for one or more changes (aside from just a new timestamp). After all, if a mapblock is already loaded in memory, then I would think the timestamp is also in memory. So It seems the solution is simply not to write the mapblock to the database if only the timestamp has changed.

sfan5 · 2025-02-10T18:31:59Z

Can you point to where this happens?

luanti/src/serverenvironment.cpp

Lines 1434 to 1448 in dd0070a

    
           		/* 
        
           			Handle removed blocks 
        
           		*/ 
        
           		// Convert active objects that are no more in active blocks to static 
        
           		deactivateFarObjects(false); 
        
           		for (const v3s16 &p: blocks_removed) { 
        
           			MapBlock *block = m_map->getBlockNoCreateNoEx(p); 
        
           			if (!block) 
        
           				continue; 
        
           			// Set current time as timestamp (and let it set ChangedFlag) 
        
           			block->setTimestamp(m_game_time); 
        
           		}

Note that this only marks the block to be written when unloaded, not immediately.
One reason I can think of why we need this is the LBM dtime_s parameter.

Someone could go and test how many byte of disk writes it would save if we could update the timestamp independently, for a typical SP or MP session.

src/database/database-sqlite3.cpp

lhofhansl · 2025-02-11T02:07:51Z

Created two worlds with same seed et al. Seen no measurable degradation in load speed. Good!
Also tried to access old db, works as expected.

lhofhansl · 2025-02-11T02:20:16Z

Someone could go and test how many byte of disk writes it would save if we could update the timestamp independently, for a typical SP or MP session.

Just rememberd f1349be :), before we marked every loaded block dirty. Now it's just unloaded active blocks.

lhofhansl · 2025-02-11T03:54:49Z

Now that I am looking a bit more, check this one: https://www.sqlite.org/lang_createtable.html#rowid, especially:

A PRIMARY KEY column only becomes an integer primary key if the declared type name is exactly "INTEGER". Other integer type names like "INT" or "BIGINT" or "SHORT INTEGER" or "UNSIGNED INTEGER" causes the primary key column to behave as an ordinary table column with integer affinity and a unique index, not as an alias for the rowid.

It seems all this time we had declared the sqlite table such that it had two INT keys. (Because we used INT instead of INTEGER to declared the PK).

And when I change that INTEGER and create a new world (with same seed) I see that pure DB load time of a block goes from 6-7us to about 5-6us. In the scheme of things it won't make a visible difference, though.

sfan5 · 2025-02-11T10:57:38Z

It seems all this time we had declared the sqlite table such that it had two INT keys. (Because we used INT instead of INTEGER to declared the PK).

Wow. I can't say I'm a fan of all the subtle behaviors SQLite has (often justified with backwards compatibility).

see: <https://www.sqlite.org/datatype3.html>

lhofhansl · 2025-02-12T17:43:41Z

More of the sqlite quirkiness... According to this https://www.sqlite.org/rowidtable.html

The PRIMARY KEY constraint for a rowid table (as long as it is not the true primary key or INTEGER PRIMARY KEY) is really the same thing as a UNIQUE constraint. Because it is not a true primary key, columns of the PRIMARY KEY are allowed to be NULL, in violation of all SQL standards.

So:

CREATE TABLE `blocks` (
    `x` INT, `y` INT, `z` INT,
    `data` BLOB NOT NULL,
    PRIMARY KEY (`x`, `z`, `y`)
);

Is the same as:

CREATE TABLE `blocks` (
    // the implicit rowid INTEGER PRIMARY KEY,
    `x` INT, `y` INT, `z` INT,
    `data` BLOB NOT NULL,
    UNIQUE  (`x`, `z`, `y`)
);

And if so, we can replace the implicit rowid column with our pos column and maintain as before:

CREATE TABLE `blocks` (
    'pos' INTEGER PRIMARY KEY,
    `x` INT, `y` INT, `z` INT,
    `data` BLOB NOT NULL,
    UNIQUE  (`x`, `z`, `y`)
);

This way we get the benefit of the x, y, z columns and backwards compatibility. Need to be tested of course!
(And well, of course, an old server writing would now mess up our database, since it won't write the x,y, and z columns.)

And according to this: https://www.sqlite.org/lang_createtable.html it's identical to:

CREATE TABLE `blocks` (
    'pos' INTEGER PRIMARY KEY,
    `x` INT, `y` INT, `z` INT,
    `data` BLOB NOT NULL
);
CREATE UNIQUE INDEX xyz ON blocks(x, y, z);

This also makes it clear that by adding the x, y, and z columns we essentially have just added an index.

sfan5 · 2025-02-12T19:35:22Z

I'm sure that could be considered an elegant solution (by some measure), but I really don't want to make the "is this a new format or old format map" decision on a row-basis. It just complicates things for everyone and also literally breaks range queries.

lhofhansl · 2025-02-13T05:33:41Z

Looked at some DB sizes. Sames seed, and camera position and direction. viewing_range = 1000, mostly an open scene with some ocean. Loads about 8000 blocks.

Master: 33.2MB
This PR: 33.2MB - no change in size interestingly and surprisingly
Master with INT replaced by INTEGER: 26.8MB - this saves the extra index, 20% space. This is actually what I'd prefer.
This PR (with WITHOUT ROWID) added: 33.2MB - so that doesn't help at all.

sqlite seems to do well with this change.

So while I'd prefer the old format (with the extra useless index removed for 20% space saving), I guess I'll be the only one. At least this one doesn't increase the size, and performance is on par as far as I can tell.

sfan5 added @ Server / Client / Env. Feature ✨ PRs that add or enhance a feature labels Feb 8, 2025

sfan5 force-pushed the sqlite_xyz branch from d377284 to 51ba93f Compare February 9, 2025 10:46

Zughy added the Roadmap: supported by core dev PR not adhering to the roadmap, yet some core dev decided to take care of it label Feb 9, 2025

lhofhansl reviewed Feb 9, 2025

View reviewed changes

src/database/database-sqlite3.cpp Show resolved Hide resolved

sfan5 force-pushed the sqlite_xyz branch from 51ba93f to 9cf4bfb Compare February 10, 2025 18:22

sfan5 force-pushed the sqlite_xyz branch from 9cf4bfb to bf773ea Compare February 10, 2025 18:55

sfan5 marked this pull request as ready for review February 10, 2025 19:03

lhofhansl reviewed Feb 11, 2025

View reviewed changes

src/database/database-sqlite3.cpp Outdated Show resolved Hide resolved

Some cleanups in Database_SQLite3

bfed21f

sfan5 force-pushed the sqlite_xyz branch from bf773ea to 98129bf Compare February 11, 2025 11:32

Split blockpos into three columns in sqlite3 map database

2203e6d

sfan5 force-pushed the sqlite_xyz branch from 98129bf to 91eb4f2 Compare February 11, 2025 11:48

sfan5 added 2 commits February 11, 2025 19:33

Add unit tests for MapDatabase implementations

2d1334b

Simplify SQLite3 schema types

1cc07d7

see: <https://www.sqlite.org/datatype3.html>

sfan5 force-pushed the sqlite_xyz branch from 91eb4f2 to 1cc07d7 Compare February 11, 2025 18:33

lhofhansl approved these changes Feb 13, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[no sq] Split blockpos into 3 different columns in sqlite3 map database #15768

[no sq] Split blockpos into 3 different columns in sqlite3 map database #15768

sfan5 commented Feb 8, 2025 •

edited

Loading

appgurueu commented Feb 8, 2025

sfan5 commented Feb 8, 2025 •

edited

Loading

lhofhansl commented Feb 9, 2025

lhofhansl commented Feb 9, 2025 •

edited

Loading

sorcerykid commented Feb 10, 2025

sfan5 commented Feb 10, 2025

sorcerykid commented Feb 10, 2025

sfan5 commented Feb 10, 2025 •

edited

Loading

lhofhansl commented Feb 11, 2025 •

edited

Loading

lhofhansl commented Feb 11, 2025

lhofhansl commented Feb 11, 2025 •

edited

Loading

sfan5 commented Feb 11, 2025

lhofhansl commented Feb 12, 2025 •

edited

Loading

sfan5 commented Feb 12, 2025

lhofhansl commented Feb 13, 2025 •

edited

Loading

[no sq] Split blockpos into 3 different columns in sqlite3 map database #15768

Are you sure you want to change the base?

[no sq] Split blockpos into 3 different columns in sqlite3 map database #15768

Conversation

sfan5 commented Feb 8, 2025 • edited Loading

To do

How to test

appgurueu commented Feb 8, 2025

sfan5 commented Feb 8, 2025 • edited Loading

lhofhansl commented Feb 9, 2025

lhofhansl commented Feb 9, 2025 • edited Loading

sorcerykid commented Feb 10, 2025

sfan5 commented Feb 10, 2025

sorcerykid commented Feb 10, 2025

sfan5 commented Feb 10, 2025 • edited Loading

lhofhansl commented Feb 11, 2025 • edited Loading

lhofhansl commented Feb 11, 2025

lhofhansl commented Feb 11, 2025 • edited Loading

sfan5 commented Feb 11, 2025

lhofhansl commented Feb 12, 2025 • edited Loading

sfan5 commented Feb 12, 2025

lhofhansl commented Feb 13, 2025 • edited Loading

sfan5 commented Feb 8, 2025 •

edited

Loading

sfan5 commented Feb 8, 2025 •

edited

Loading

lhofhansl commented Feb 9, 2025 •

edited

Loading

sfan5 commented Feb 10, 2025 •

edited

Loading

lhofhansl commented Feb 11, 2025 •

edited

Loading

lhofhansl commented Feb 11, 2025 •

edited

Loading

lhofhansl commented Feb 12, 2025 •

edited

Loading

lhofhansl commented Feb 13, 2025 •

edited

Loading