Make sure Elasticsearch performance tests apply update/delete operations to existing documents and add operations to non-existing documents
[6:34 PM] Sanne Grinovero: @yoann one odd thing I noticed is in how the changeset is defined on the TransactionContextForTest tc = new TransactionContextForTest();
[6:35 PM] Sanne Grinovero: those streams of random ints might have duplicates
[6:35 PM] Yoann Rodière: Sure, we can improve later
[6:35 PM] Sanne Grinovero: so you might e.g. enqueue 10 "ADD"s then follow with 10 "DELETE"s, but you'll end up with only ~15 operations in the changeset as a result :)
[6:36 PM] Sanne Grinovero: as when IDs happen to clash, we optimise them as one operation - or even zero if they cancel each other out
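[Editor's note] A minimal sketch of the coalescing behaviour being described — class and method names here are made up for illustration, not the actual benchmark code. The changeset keeps at most one pending operation per document ID, so clashing random IDs shrink the number of executed operations:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the changeset: at most one pending operation per
// document ID; an ADD followed by a DELETE of the same ID cancels out.
public class ChangesetCoalescing {
    enum Op { ADD, DELETE }

    final Map<Integer, Op> pending = new LinkedHashMap<>();
    int enqueued = 0;

    void enqueue(Op op, int docId) {
        enqueued++;
        if (op == Op.DELETE && pending.get(docId) == Op.ADD) {
            pending.remove(docId); // ADD + DELETE on the same ID: no-op
        } else {
            pending.put(docId, op); // clashing IDs: last operation wins
        }
    }

    public static void main(String[] args) {
        ChangesetCoalescing changeset = new ChangesetCoalescing();
        // "random" IDs with one clash: document 7 is added, then deleted
        for (int id : new int[] { 3, 7, 12 }) changeset.enqueue(Op.ADD, id);
        for (int id : new int[] { 7, 20, 21 }) changeset.enqueue(Op.DELETE, id);
        System.out.println("enqueued=" + changeset.enqueued
                + " executed=" + changeset.pending.size());
        // prints: enqueued=6 executed=4
    }
}
```

With IDs drawn from a small random range, such clashes are frequent, which is why only ~37 of 40 enqueued operations were actually executed.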
[6:36 PM] Yoann Rodière: Right... unlikely, though
[6:36 PM] Sanne Grinovero: I measured .. it's happening all the time :)
[6:36 PM] Sanne Grinovero: out of 40 operations enqueued
[6:36 PM] Sanne Grinovero: on average only 37 are actually executed
[6:37 PM] Yoann Rodière: :x What if I say it was a feature? :p
[6:37 PM] Sanne Grinovero: yes it's no big issue
[6:37 PM] Sanne Grinovero: it's good to measure our "de-duplication performance" as well :p
[6:37 PM] Sanne Grinovero: just make sure you know about it, it might skew the interpretation of figures a bit
[6:38 PM] Yoann Rodière: I wonder how you'd want to fix that, though? Because if we go that way, we also have a problem with delete operations on non-existing documents
[6:38 PM] Yoann Rodière: To be sure each operation is actually executed, we'd have to keep track of the documents already in the index
[6:39 PM] Yoann Rodière: Or maybe we can safely assume a DELETE resulting in a 404 is as costly as a normal one?
[6:39 PM] Sanne Grinovero: how else are you testing that you DELETE documents which actually exist?
[6:40 PM] Sanne Grinovero: if you rely on them being added in the same changeset, that just generates No-Ops ..
[6:41 PM] Sanne Grinovero: @yoann no you can't rely on a DELETE returning 404 being the same.. an actual DELETE is way more expensive than any write.
[6:44 PM] Yoann Rodière: Ok, well... I'll have to make some changes.
1. Add an initialization method before each iteration, to ensure we already have documents to delete in the ID range assigned to the thread
2. Keep track of added/deleted documents in some thread-scoped context, maybe using a BitSet?
3. Ensure we only delete documents that existed before we opened the current transaction context
4. And while we're at it, do something similar for add and update
[6:44 PM] Sanne Grinovero: @yoann it's an interesting puzzle but I have a déjà-vu:
[6:44 PM] Sanne Grinovero: ok you're on the right track
[6:45 PM] Sanne Grinovero: this is what I've done in the past:
[6:45 PM] Sanne Grinovero: each thread has its own *strictly* isolated range of keys it can use
[6:45 PM] Sanne Grinovero: so you have a per-thread pool - but make sure there's no shared keys among them
[6:46 PM] Sanne Grinovero: then you generate sequences of independent work
[6:46 PM] Sanne Grinovero: so while it makes sense to batch things in transactions..
[6:46 PM] Sanne Grinovero: you don't add/delete the same keys in the same transaction.
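[Editor's note] The approach described above could be sketched like this — names and sizes are invented for illustration; tracking which documents actually exist is a separate concern. Each thread owns a strictly disjoint ID range, and within one transaction the ADD and DELETE key sets never overlap, so nothing gets coalesced away:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Hypothetical work generator: thread i owns IDs [i * IDS_PER_THREAD,
// (i + 1) * IDS_PER_THREAD), so threads never share keys.
public class IsolatedWorkGenerator {
    static final int IDS_PER_THREAD = 10_000;

    private final int rangeStart;
    private final Random random;

    IsolatedWorkGenerator(int threadIndex, long seed) {
        this.rangeStart = threadIndex * IDS_PER_THREAD; // strict isolation
        this.random = new Random(seed);
    }

    // Draw `count` distinct IDs from this thread's range, avoiding `excluded`.
    private List<Integer> distinctIds(int count, Set<Integer> excluded) {
        Set<Integer> picked = new LinkedHashSet<>();
        while (picked.size() < count) {
            int id = rangeStart + random.nextInt(IDS_PER_THREAD);
            if (!excluded.contains(id)) {
                picked.add(id);
            }
        }
        return new ArrayList<>(picked);
    }

    // One transaction's worth of independent work: the ADD and DELETE key
    // sets are disjoint, so no operation cancels another out.
    Map<String, List<Integer>> nextTransaction(int adds, int deletes) {
        List<Integer> addIds = distinctIds(adds, Set.of());
        List<Integer> deleteIds = distinctIds(deletes, new HashSet<>(addIds));
        Map<String, List<Integer>> tx = new LinkedHashMap<>();
        tx.put("ADD", addIds);
        tx.put("DELETE", deleteIds);
        return tx;
    }
}
```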
[6:47 PM] Sanne Grinovero: if these values also carry some per-thread identifiable label, you can then also assert on an expected count of them
[6:48 PM] Sanne Grinovero: @yoann either way, keep in mind this is just to make sure the figures aren't fooling us too much into wild theories.. Remember the goal is to make it perform a bit better, not to fix the benchmark ;)