this post was submitted on 04 Apr 2025
19 points (100.0% liked)

Programming

19333 readers
51 users here now

Welcome to the main community in programming.dev! Feel free to post anything relating to programming here!

Cross posting is strongly encouraged in the instance. If you feel your post or another person's post makes sense in another community cross post into it.

Hope you enjoy the instance!

Rules

Rules

  • Follow the programming.dev instance rules
  • Keep content related to programming in some way
  • If you're posting long videos try to add in some form of tldr for those who don't want to watch videos

Wormhole

Follow the wormhole through a path of communities !webdev@programming.dev



founded 2 years ago
MODERATORS
 

I'm doing some Galois Field / Cyclic Redundancy Check research for fun and I've come across an intriguing pattern that I need a data structure for.

Across the 64-bit (or even 128-bit or larger) spaces, I've discovered an interesting pattern relating to hamming distances that I'd like a data structure to represent.

I'm going to need something on the order of ~billions of intervals each having somewhere between 1 item to ~1 billion per interval. And I'd like to quickly (O(1) or O(lg(n))) determine if other intervals intersect.


For 32-bit space I can simply make a 512MB Bitmask lol and then AND/OR the two Bitmask. Easy

But for 64-bit space I'm stuck and a bit ignorant to various data structures. I'm wondering if someone out there has a good data structure for me to use?

I've read over Interval Trees on Wikipedia. I'm also considering binary decision diagram over the 64-bits actually. Finally I'm thinking of some kind of 1-dimension octtree like datastructure (is that just a binary tree?? Lol. But BVH trees in 3d space seems similar to my problem it's just I need it optimized down to 1 dimension rather than 3.) Anyone else have any other ideas or cool data structures that might work?

top 4 comments
sorted by: hot top controversial new old
[–] skip0110@lemm.ee 5 points 1 day ago (1 children)

Ph-trees can do range and closest queries across N dimensions very quickly. I have not used it for 1 dimension, but I'd imagine it would work fine.

https://github.com/tzaeschke/phtree

[–] drspod@lemmy.ml 2 points 22 hours ago

I have not used it for 1 dimension

Querying for all intervals (x’, y’) that overlap with an interval (x, y) is a two dimensional range query that selects the upper left quadrant of the x-y plane that satisfies x < y’ && x’ < y

[–] aes@programming.dev 1 points 1 day ago (1 children)

For 32-bit space I can simply make a 512MB Bitmask lol and then AND/OR the two Bitmask. Easy

Uh, no, that's O(n), yeah?

[–] dragontamer@lemmy.world 1 points 22 hours ago* (last edited 22 hours ago)

And typical RAM speeds are 100GB/second for CPUs and 500GB/second on GPUs, meaning 512MB operations are literally on the order of 5 miliseconds for CPU and 1ms on GPU.

Below certain sizes, the 'billions of intervals' is larger than the damn Bitmask. Seriously, 8 bytes per interval (aka one pointer and 0 data) and that's 8GB for the data structure.

Instead of a billion 32-bit intervals to store (4GB of RAM at the minimum) it's obviously a better move to store 500-million byte Bitmasks. And modern GPUs can crush that in parallel with like 3 lines of CUDA anyway.