@vic0824 All good points, we should expand our docs with this information! I'll try to explain better in this thread, and perhaps update the documentation after that. About the bucket selection strategy: round-robin picks the next bucket from the list, and after the last one it starts again from the first. This is the default strategy, but it doesn't work well with concurrent multi-threaded insertion, where the best strategy is […] To summarize: […]
About the transaction isolation level: I double-checked the code, and the docs about the transaction batch are not correct. That piece came from OrientDB, which allowed some control over isolation. With ArcadeDB the level is always […]
-
When a Document Type is created, by default ArcadeDB creates as many buckets as there are CPU cores.
I have read that this speeds up parallel insertions, because two insert queries can be performed by two different threads that write to different buckets, without creating contention. This is easy to understand if the bucket selection strategy is `round-robin`, because we can be sure that two consecutive inserts will write to two different buckets. If the strategy is `partitioned`, it is less easy to understand: can we still be sure that two consecutive inserts will write to different buckets? If the strategy is `thread`, it's not easy to see how it works; I guess one needs to know the implementation details.

What I don't understand is whether something similar happens with select queries: when a select query is performed, does the engine scan all the buckets in parallel (assuming the type was created with the default of one bucket per core)? If the insertions are guaranteed to always be performed from the same thread, would select performance improve if the type were created with only 1 bucket (which means the engine has to open only 1 file instead of n)? Or is read performance always better when multiple buckets are used?
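To make the three strategy names concrete, here is a minimal Python sketch of how such selectors could behave. It is not ArcadeDB's implementation: the class names, the use of a CRC32 hash for `partitioned`, and the use of the thread identifier for `thread` are all assumptions made only for illustration.

```python
import itertools
import threading
import zlib


class RoundRobinSelector:
    """Hypothetical selector: hands out buckets 0, 1, ..., n-1, 0, 1, ..."""

    def __init__(self, bucket_count):
        self._bucket_count = bucket_count
        self._counter = itertools.count()
        self._lock = threading.Lock()  # the shared counter is a contention point

    def select(self, record):
        with self._lock:
            return next(self._counter) % self._bucket_count


class PartitionedSelector:
    """Hypothetical selector: the bucket depends only on the value of a key property."""

    def __init__(self, bucket_count, key):
        self._bucket_count = bucket_count
        self._key = key

    def select(self, record):
        # Assumption: a stable hash of the key value decides the bucket.
        digest = zlib.crc32(str(record[self._key]).encode("utf-8"))
        return digest % self._bucket_count


class ThreadSelector:
    """Hypothetical selector: the bucket depends only on the calling thread."""

    def __init__(self, bucket_count):
        self._bucket_count = bucket_count

    def select(self, record):
        # Assumption: the thread identifier is mapped onto the bucket range.
        return threading.get_ident() % self._bucket_count


round_robin = RoundRobinSelector(4)
print([round_robin.select({}) for _ in range(6)])  # -> [0, 1, 2, 3, 0, 1]
```

Under these assumptions, two consecutive inserts land in different buckets only with round-robin. With the partitioned sketch, two records with the same key value always share a bucket, and with the thread sketch, all inserts from one thread go to the same bucket.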
Finally, does it make sense to talk about isolation levels in ArcadeDB? Section 8.5 (Batch) of the manual specifies that a batch can be executed with the READ_COMMITTED or REPEATABLE_READ isolation level, but isolation levels are not discussed anywhere else in the document, so it is not clear what the default isolation level is for normal queries (select, insert, update, delete).
The examples in the Isolation section seem to suggest that the default isolation level is REPEATABLE_READ, but this is not explicitly stated anywhere in the document.
In my project, I have several clients that need to execute tasks stored in the database, and I have to make sure that each task is "booked" by only one client. With a relational database, I would execute the transaction (select the next available task and change its status to mark it as booked) at the SERIALIZABLE level, to prevent two different threads from booking the same task. With ArcadeDB I have simulated this by explicitly synchronizing the read and write threads on a common object. I wonder if there is a "native" way of doing this with ArcadeDB?
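For reference, the workaround described above can be sketched roughly as follows. This is an in-memory stand-in written in Python, not ArcadeDB code: `TaskQueue`, `book_next` and `run_clients` are hypothetical names, and a dictionary takes the place of the task type in the database. The point is only that the "select the next available task" step and the "mark it as booked" step happen under one shared lock.

```python
import threading


class TaskQueue:
    """In-memory stand-in for the task table (hypothetical, not an ArcadeDB API)."""

    def __init__(self, task_ids):
        self._status = {task_id: "available" for task_id in task_ids}
        self._owner = {}
        self._lock = threading.Lock()  # the "common object" all clients synchronize on

    def book_next(self, client):
        # The read and the write happen under the same lock, so no other
        # thread can see a task as available between the two steps.
        with self._lock:
            for task_id, status in self._status.items():
                if status == "available":
                    self._status[task_id] = "booked"
                    self._owner[task_id] = client
                    return task_id
            return None  # nothing left to book


def run_clients(task_count, client_count):
    """Start client threads that book tasks until none remain; return all booked ids."""
    queue = TaskQueue(range(task_count))
    results = [[] for _ in range(client_count)]  # one private list per client

    def worker(index):
        while (task_id := queue.book_next(f"client-{index}")) is not None:
            results[index].append(task_id)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(client_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return [task_id for per_client in results for task_id in per_client]


booked = run_clients(task_count=50, client_count=8)
print(len(booked), len(set(booked)))  # -> 50 50
```

A lock like this only protects clients running inside the same process, which is why a mechanism provided by the database itself would be preferable.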