this post was submitted on 11 Feb 2024
19 points (100.0% liked)

Programming

17364 readers
158 users here now

Welcome to the main community in programming.dev! Feel free to post anything relating to programming here!

Cross posting is strongly encouraged in the instance. If you feel your post or another person's post makes sense in another community cross post into it.

Hope you enjoy the instance!

Rules

Rules

  • Follow the programming.dev instance rules
  • Keep content related to programming in some way
  • If you're posting long videos try to add in some form of tldr for those who don't want to watch videos

Wormhole

Follow the wormhole through a path of communities !webdev@programming.dev



founded 1 year ago
MODERATORS
 

I'm working on a query engine, essentially a tool to scan/filter/annotate by lookups/group by/aggregate a large dataset, tens-of-terabytes range. The compute part seems to be a bottleneck for me (I'll be doing around 80-300 GB/s of reads, and yes, I will have hardware capable of providing that kind of throughput). My hypothesis is that by encoding query in form of template arguments I can make the compiler generate code optimized for a specific type of query (like, the filtering or aggregation keys). But I do not know what queries will users send, so I need a way to instantiate templates at runtime.

Sounds simple: for a new type of query invoke a compiler at runtime to build a dynamic library with a new instantiation, then dynload it and off we go. Some prior work is here, though I'm pretty sure any JIT compiler also can counts here. But there's enough technical details to worry about, and at the same time this idea isn't novel, so I wonder—are there any packaged solutions for this kind of approach?

you are viewing a single comment's thread
view the rest of the comments
[–] solrize@lemmy.world 7 points 9 months ago* (last edited 9 months ago) (1 children)

If the result sets are that large, you have enough time to generate a program and compile it, though as you say a more streamlined approach is to use a JIT library. Postgres uses LLVM for that, or depending on the workload, maybe you could even benefit from using a GPU. None of this is new.

[–] foobaz@lemmy.world 2 points 9 months ago