Hash tables aggregation benchmark

Motivation

This benchmark was created to compare the performance of different hash tables with different hash functions in an in-memory aggregation scenario.

The benchmark is based on real, anonymized web analytics data from the Yandex.Metrica dataset. For each column of the dataset, it computes a mapping from each unique key to its count, similar to the SQL query SELECT column, count(column) FROM hits GROUP BY column.

Because each column in the benchmark has a different cardinality and distribution, the benchmark can show how different hash tables perform for in-memory aggregation on real-world data.

The performance of hash table operations depends on arithmetic operations such as hash function calculation, computing the slot location, element comparisons, and any other operations required by the specific hash table memory layout. When the hash table does not fit in CPU caches, however, the number of random memory accesses per operation becomes the most important factor for hash table performance.

Here are some general recommendations that can be applied to all hash table implementations:

You should avoid complex logic on the hot path of your hash table operations.
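The aggregation the benchmark measures, SELECT column, count(column) FROM hits GROUP BY column, reduces to one hash table lookup plus an insert or update per row. The following is a minimal Python sketch of that workload. It assumes the column arrives as an iterable of hashable values with no NULLs (SQL count(column) would skip NULLs). The name aggregate_count is hypothetical, and Python's built-in dict stands in for whichever hash table is under test.

```python
def aggregate_count(column_values):
    """Map each unique key in a column to its number of occurrences.

    Equivalent in spirit to:
        SELECT column, count(column) FROM hits GROUP BY column
    Assumes the column contains no NULL (None) values.
    """
    counts = {}
    for key in column_values:
        # Per row: look the key up, then insert it or update its count.
        # This is the hot path a hash table aggregation benchmark exercises.
        counts[key] = counts.get(key, 0) + 1
    return counts


# Example: three rows, two unique keys.
print(aggregate_count(["google.com", "yandex.ru", "google.com"]))
# → {'google.com': 2, 'yandex.ru': 1}
```

Per-column cardinality decides how large counts grows, and therefore whether the table still fits in CPU caches.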
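The cost factors named above (hash function calculation, computing the slot location, element comparisons) and the advice to keep the hot path simple can be made concrete with a toy open-addressing table. This is a minimal sketch, not the benchmark's implementation. It assumes linear probing, a power-of-two capacity, and a maximum load factor of 1/2. The class LinearProbingCounter and its probes counter are hypothetical names. The probe count is only a rough proxy for memory accesses: Python lists hold references rather than packed slots, so the sketch does not reproduce the cache behavior of a flat C or C++ array.

```python
class LinearProbingCounter:
    """Toy open-addressing hash table mapping keys to counts (hypothetical).

    - Capacity is a power of two, so the slot is found with a bit mask
      instead of a modulo operation.
    - Collisions are resolved by linear probing (try the next slot).
    - `probes` counts inspected slots, a rough proxy for memory accesses.
    """

    def __init__(self, capacity=16):
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        self._keys = [None] * capacity  # None marks an empty slot
        self._counts = [0] * capacity
        self._size = 0
        self.probes = 0

    def _slot(self, key):
        # Hash function calculation, then slot location via a cheap mask.
        return hash(key) & (len(self._keys) - 1)

    def increment(self, key):
        if key is None:
            raise ValueError("None is reserved as the empty-slot marker")
        # Keep the load factor at or below 1/2: probe sequences stay short,
        # and an empty slot always exists, so the loops below terminate.
        if (self._size + 1) * 2 > len(self._keys):
            self._grow()
        mask = len(self._keys) - 1
        i = self._slot(key)
        while True:
            self.probes += 1
            k = self._keys[i]
            if k is None:  # empty slot: insert the new key
                self._keys[i] = key
                self._counts[i] = 1
                self._size += 1
                return
            if k == key:  # element comparison: key already present
                self._counts[i] += 1
                return
            i = (i + 1) & mask  # linear probing: try the adjacent slot

    def get(self, key):
        mask = len(self._keys) - 1
        i = self._slot(key)
        while True:
            self.probes += 1
            k = self._keys[i]
            if k is None:
                return 0
            if k == key:
                return self._counts[i]
            i = (i + 1) & mask

    def _grow(self):
        # Double the capacity and reinsert every occupied slot.
        old = [(k, c) for k, c in zip(self._keys, self._counts) if k is not None]
        capacity = len(self._keys) * 2
        self._keys = [None] * capacity
        self._counts = [0] * capacity
        mask = capacity - 1
        for k, c in old:
            i = self._slot(k)
            while self._keys[i] is not None:
                i = (i + 1) & mask
            self._keys[i] = k
            self._counts[i] = c

    def items(self):
        return {k: c for k, c in zip(self._keys, self._counts) if k is not None}


table = LinearProbingCounter()
for key in ["google.com", "yandex.ru", "google.com"]:
    table.increment(key)
print(table.get("google.com"), table.get("missing.org"))  # → 2 0
```

Masking with capacity - 1 replaces a division with a single AND. In a flat-array implementation, probing adjacent slots tends to stay within the same or neighboring cache lines. Both choices keep the hot path simple.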