cll7793

joined 2 years ago
cll7793@lemmy.world 2 points 2 weeks ago

Thanks! That's a good idea!

cll7793@lemmy.world 1 points 2 weeks ago

I'd highly recommend using ZIM to download the websites you want! (https://wiki.openzim.org/wiki/Build_your_ZIM_file)

Once they're downloaded, you can honestly probably get better results from a basic Notepad search than from Google/DuckDuckGo/Bing.

cll7793@lemmy.world 2 points 2 weeks ago

Super useful plugin! You can also subscribe to lists that block SEO/AI-generated websites. If only there were a whitelist plugin that ranks forums higher up.
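The block-and-boost idea is simple enough to sketch. This is a minimal sketch, assuming results arrive as (url, score) pairs from whatever engine you use; the domains in `BLOCKLIST` and `FORUM_BOOST` and the boost factors are made-up examples, not a real subscription list or any plugin's API.

```python
from urllib.parse import urlparse

# Hypothetical example lists; real ones would come from your own curation
# or a blocklist subscription.
BLOCKLIST = {"seo-farm.example", "ai-content.example"}
FORUM_BOOST = {"lemmy.world": 2.0, "stackexchange.com": 1.5}


def _matches(host, domain):
    # Match the domain itself or any subdomain of it.
    return host == domain or host.endswith("." + domain)


def rerank(results):
    """Drop blocklisted domains and boost preferred forum domains.

    `results` is a list of (url, base_score) pairs.
    Returns the surviving URLs sorted by boosted score, highest first.
    """
    kept = []
    for url, score in results:
        host = urlparse(url).hostname or ""
        if any(_matches(host, d) for d in BLOCKLIST):
            continue
        boost = next((b for d, b in FORUM_BOOST.items() if _matches(host, d)), 1.0)
        kept.append((score * boost, url))
    kept.sort(reverse=True)
    return [url for _, url in kept]
```

For example, with an SEO page at 0.9, a Lemmy post at 0.5 and a blog at 0.7, the SEO page is dropped and the Lemmy post moves to the top (0.5 × 2.0 = 1.0 beats 0.7).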

cll7793@lemmy.world 1 points 2 weeks ago

Someday this will be possible when an open source search engine comes around.

cll7793@lemmy.world 2 points 2 weeks ago

I noticed that some of the best resources from the past are unfindable on any search engine. For example, some science YouTube channels that offer amazing quality content seem to be unfindable. They are replaced by other channels that try to clickbait their way to the top. The same can be said of websites that do as much SEO as they can. The highest-quality resources are also often the scarcest. Quantity over quality is favored and amplified, and high-quality resources are sometimes even censored (e.g. Anna's Archive).

cll7793@lemmy.world 2 points 2 weeks ago

It's quite sad that we are now at a point where we are forced to build our own search engines from scratch. Search engines are hard! Google's original search algorithm (about two decades ago) was quite amazing. You could give vague search terms and still find the answer you wanted. The secret sauce was ranking based on relevance to the search query. I'm not aware of any guides/projects on building search engines, and I wish there were a good way to search for them. (The irony!) A great starting resource, though, is Wikipedia's series on networks. (https://en.wikipedia.org/wiki/Network_theory)
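On the network-theory side: Google's early ranking famously combined text matching with PageRank, a score computed purely from link structure (it ignores the query itself). Below is a minimal power-iteration sketch; the four-site graph is made up, and the damping factor of 0.85 is the commonly cited default rather than anything tuned.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over {site: [sites it links to]}."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    if not nodes:
        return {}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start from a uniform guess
    for _ in range(iterations):
        # Every page first gets the "random jump" share.
        new = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this page's rank evenly across its outgoing links.
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page (no outlinks): spread its rank over all pages.
                for t in nodes:
                    new[t] += damping * rank[node] / n
        rank = new
    return rank


# Hypothetical toy link graph.
LINKS = {
    "lemmy.world": ["wikipedia.org"],
    "wikipedia.org": ["lemmy.world"],
    "forum.example": ["wikipedia.org"],
    "blog.example": ["wikipedia.org", "forum.example"],
}

if __name__ == "__main__":
    for site, score in sorted(pagerank(LINKS).items(), key=lambda kv: -kv[1]):
        print(f"{site}: {score:.3f}")
```

On this toy graph `wikipedia.org` comes out on top, since three of the four sites link to it.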

Some random tips:

  • The main goal of any search engine should be to minimize the number of times a user returns to the results page to click on a new link. Big tech should be doing this anyway, but they have other goals.
  • The main metadata database needs to topologically connect you to any part of the internet. (https://en.wikipedia.org/wiki/Graph_theory) Think of it as a hub/portal that gives you general directions but doesn't tell you exactly where to go. The ideal solution would be to download the entire internet and check every page individually for relevance to a search query, but this is intractable. Instead, you have to group the internet into graphs and subgraphs: STEM, Social, Forums, E-commerce, etc. Hyperlinks offer an objective way to calculate connections between websites, for example Lemmy.world <-> Wikipedia.org. The weights of these connections give you a way to guide a traversal algorithm during search. Some form of semantic analysis lets you find better ways to draw connections, making your search more efficient.
  • The most powerful way to find connection/relevance to a search term is with transformers and their attention mechanism. For example, if the search query is "Open source search engine", the attention heatmap would concentrate on groups of website subjects like Forums, Q&A, Programming, Network Science, etc. There would also be a negative heatmap for topics like Cooking, Sports, Entertainment, etc. From there, you recursively load metadata for websites. For Lemmy, for example, that would be the titles of all posts (and maybe their top comments). Load as much of this as fits into the transformer's context and calculate the heatmap relative to the search query. Again, you are not using the transformer to generate answers; that is a bad idea. Instead, you are using it to rank search results by relevance/attention, which is what the transformer is fundamentally designed for.
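Here is a toy sketch of just the scoring step: scaled dot-product attention weights, softmax(q·k / √d), used as relevance scores for titles. It assumes you already have a query vector and one key vector per title. In practice those would come from a transformer, but here they are hand-made 3-dimensional toy vectors (loosely "search/programming", "forums", "cooking"), and a real model's learned projections, heads and layers are left out entirely.

```python
import math


def attention_rank(query_vec, candidates):
    """Rank titles by softmax(q . k / sqrt(d)) attention weight.

    `candidates` maps a title to its key vector (same length as query_vec).
    Returns (title, weight) pairs sorted from most to least attended.
    """
    d = len(query_vec)
    scores = {
        title: sum(q * k for q, k in zip(query_vec, key)) / math.sqrt(d)
        for title, key in candidates.items()
    }
    # Softmax turns raw scores into weights that sum to 1; subtracting the
    # max first keeps exp() numerically stable.
    m = max(scores.values())
    exps = {t: math.exp(s - m) for t, s in scores.items()}
    total = sum(exps.values())
    weights = {t: e / total for t, e in exps.items()}
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy vectors: [search/programming, forums, cooking].
QUERY = [1.0, 0.8, -1.0]  # "Open source search engine"
CANDIDATES = {
    "Building a self-hosted search index": [0.9, 0.3, 0.0],
    "Ask Lemmy: best forums": [0.2, 0.9, 0.0],
    "Easy sourdough recipe": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    for title, weight in attention_rank(QUERY, CANDIDATES):
        print(f"{weight:.2f}  {title}")
```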

As a side note, you can tune your model to your own search preferences with little data. You can also trade computation time for search quality! This is amazing. If computation is a concern, traditional traversal algorithms and basic relevance/ranking algorithms work too, but at the cost of more engineering.
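As an example of a traditional traversal, here is a minimal best-first traversal sketch over a weighted link graph, assuming a higher weight means a stronger connection (e.g. more hyperlinks between two sites). The graph and weights are invented for illustration.

```python
import heapq


def best_first_crawl(graph, start, budget):
    """Visit up to `budget` sites, always expanding the strongest known link.

    `graph` maps a site to {neighbor: weight}; higher weight = stronger link.
    Returns the sites in the order they were visited.
    """
    visited, order = set(), []
    # heapq is a min-heap, so weights are negated to pop the largest first.
    frontier = [(-1.0, start)]
    while frontier and len(order) < budget:
        _, site = heapq.heappop(frontier)
        if site in visited:
            continue  # already visited via an earlier, stronger entry
        visited.add(site)
        order.append(site)
        for neighbor, weight in graph.get(site, {}).items():
            if neighbor not in visited:
                heapq.heappush(frontier, (-weight, neighbor))
    return order


# Hypothetical link weights between sites.
EXAMPLE_GRAPH = {
    "lemmy.world": {"wikipedia.org": 0.9, "stackexchange.com": 0.6, "shop.example": 0.1},
    "wikipedia.org": {"arxiv.org": 0.8, "shop.example": 0.2},
    "stackexchange.com": {"arxiv.org": 0.3},
}
```

Starting from `lemmy.world` with a budget of 3, it visits `lemmy.world`, `wikipedia.org`, then `arxiv.org`. The `budget` parameter is where computation time gets traded for coverage.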

I hope this sorta helps! If you have any other questions, feel free to ask. The future of search will likely be self-hosted, as conflicts of interest within current search engine providers degrade quality to the point where they are unusable.

cll7793@lemmy.world 1 points 2 weeks ago

Finding the balance in what to keep in the index is hard! The attention mechanism in transformers should be pretty good at ranking results. The idea is to feed titles, top answers, etc. into the context in bulk along with a search query. The attention heatmap relative to the query gives you a rough rank for how good each result is. Ironically enough, this is probably the most powerful ranker, yet no big tech company uses it this way; instead, they have the model generate answers rather than rank results. The best part is that this system is tunable and can be adjusted to user preference with little data. The overall goal should be to minimize the number of results a user checks. (This should be what other engines are doing in the first place.)
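One way to make "minimize the number of results a user checks" measurable is sketched below. It assumes users open results top to bottom and stop at the first one that answers the query, and that you have logged which results count as answers; the queries and URLs are hypothetical.

```python
def mean_results_checked(rankings, relevant):
    """Average number of results opened per query before finding an answer.

    `rankings` maps a query to its ranked list of URLs.
    `relevant` maps a query to the set of URLs that answered it.
    A query with no relevant result counts as checking the whole list.
    """
    total = 0
    for query, ranked_urls in rankings.items():
        checked = len(ranked_urls)
        for position, url in enumerate(ranked_urls, start=1):
            if url in relevant.get(query, set()):
                checked = position
                break
        total += checked
    return total / len(rankings)


# Hypothetical logs for two rankers over the same two queries.
RELEVANT = {"open source search engine": {"forum/a"}, "zim files": {"wiki/zim"}}
RANKER_A = {"open source search engine": ["seo/1", "seo/2", "forum/a"], "zim files": ["wiki/zim", "seo/3"]}
RANKER_B = {"open source search engine": ["forum/a", "seo/1"], "zim files": ["seo/3", "wiki/zim"]}
```

Lower is better: ranker A averages 2.0 results checked per query, ranker B averages 1.5.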

 

Choices have slowly been running out when it comes to effective search engines. It seems inevitable that an open source search engine project independent of big tech will be needed.

Some of my own tricks are:

  • Use the blacklist plugin to block sites from search.
  • Search for forum sites and communities instead of specific queries. (Wikipedia has a list of forums that might be useful)
  • For technical questions, favor Q&A websites like Stack Exchange.
  • YouTube videos often offer better information than results from search engines. (Find them through search engines rather than YouTube's own search.)
  • Look for blogs and journals that specialize in the topic you're searching for.
  • Use boolean search when possible.
  • Self-host and customize your own metadata search engine. Create a graph network linking websites based on subject/topic. You may not be able to query specific questions, but you can discover sites that you otherwise can't find in traditional search. This is a great way to discover hidden gems! (Example: https://internet-map.net/)
  • (Difficult) Self-host and scrape sites across the web to create your own queryable database. This would be the most effective way to search the internet and would be completely independent of potential enshittification and censorship. However, the cost is quite high, both in terms of hardware and time. Kiwix offers a way to download websites for offline use (e.g. Wikipedia, Stack Exchange). This is a good starting point for building your own custom search engine.
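The "queryable database" part mostly comes down to an inverted index, which also gives you boolean search for free. This is a minimal sketch, assuming pages have already been extracted to plain text (for example from a Kiwix/ZIM download); it matches whole words with AND/OR/NOT and does no ranking. The page names and texts are made up.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page names containing it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def boolean_search(index, all_pages, must=(), any_of=(), exclude=()):
    """AND over `must`, OR over `any_of`, NOT over `exclude`; returns sorted names."""
    result = set(all_pages)
    for word in must:
        result &= index.get(word, set())
    if any_of:
        result &= set().union(*(index.get(w, set()) for w in any_of))
    for word in exclude:
        result -= index.get(word, set())
    return sorted(result)


# Hypothetical scraped pages.
PAGES = {
    "wiki/zim": "ZIM files store whole websites for offline reading with Kiwix",
    "forum/search": "Ask Lemmy: self-hosted search engine ideas",
    "shop/deal": "Buy the best search engine optimization course today",
}
```

For example, `boolean_search(build_index(PAGES), PAGES, must=["search", "engine"], exclude=["buy"])` keeps the forum thread and drops the SEO course.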

I would love to hear the tips and tricks you use! I hope this post helps others in more efficiently finding information on the internet!

 

I recently noticed that some Amazon items are 80% more expensive than on eBay. I made this post to collect tips and tricks for finding the best prices, deals, or services for general items.

Here are some off the top of my head.

  • Government auctions often have the best bang for your buck on some items. (Tech, tools, etc)
  • RetailMeNot.com seems to be the only coupon website with coupons that actually work. I don't know of any others.
  • Some mobile and internet providers will offer a discount if you try to cancel.
  • Insurance companies have been caught buying sensor data from your phone and using it to raise premiums if they detect stuff like sudden stops. To get the best deal, avoid their tracking and don't opt in to their driving performance tracking program.

Here are some potential topics:

  • Bank/credit union plans
  • Insurance plans
  • Food/Groceries
  • Hardware store supplies
  • Tools/House work
  • E-shopping
  • Couponing
  • Pitfalls
cll7793@lemmy.world 2 points 1 year ago

Glad you found it awesome! :)

cll7793@lemmy.world 5 points 1 year ago

Thank you! I didn't know about them!!

cll7793@lemmy.world 8 points 1 year ago

I know, right? Open source hardware has so many potential benefits over commercial hardware: significantly lower prices, privacy, good documentation, right to repair, no conflicts of interest, and potentially, one day, better performance. Imagine engineers from across the world improving a single computer chip design, generator design, or solar panel fabrication process, or maybe even an open source fusion reactor blueprint someday in the next 20 years (pun intended).

I'm seriously considering starting something like this myself: open source blueprints for power generation/energy storage (regular batteries, sand-based thermal batteries, hydro power generation), water filtration, machine tools for fabricating anything, CNC machines, plasma cutters, hand tools, etc. Basically everything you could need to live Open Source.

The problem as always is getting enough designers, engineers, and volunteers.

cll7793@lemmy.world 10 points 1 year ago

Oh? I didn't know that. They seemed like a good organization to me too. Open source hardware is quite lacking compared to the software side so I hope they succeed.

cll7793@lemmy.world 2 points 1 year ago

I found a nice list of resources for RSS. The internet really does feel like an echo chamber now! There is so much noise compared to signal. Hopefully this post reduces the echoing a tad. I would love it if you shared some tips for getting started with RSS, if that's alright!

 

The goal of this post is to provide a hub to discover some powerful internet resources out there.

For example here's one I wanted to share.

  • Open Source Ecology is a project for open source hardware that is significantly cheaper than retail equivalents. The equipment includes open source designs for CNC machines, windmills, tractors, plasma cutters, power supplies, motors, generators, and much more!

https://www.opensourceecology.org/

Additional Resources

 


47
submitted 1 year ago* (last edited 1 year ago) by cll7793@lemmy.world to c/asklemmy@lemmy.world
 

I hope this post offers a good way to discover new communities.

Here's a list of internet forums from Wikipedia

 

What free resources do you know on the internet that everyone should use?

Potential Category Ideas

  • FOSS Software
  • Quality of Life
  • Public Services
  • Other List of Lists
  • Personal Finance
  • GitHub Awesome Repositories
  • Firefox Addons
  • Free Research/Books
  • Piracy Sites & Lists
  • Real Life Resources

Links and Resources

 

What are your favorite mathematics channels/videos on YouTube?

There are tons of great videos on YT, but I'll list some resources from 3Blue1Brown's SoME3 contest if you want to discover more math explainers.

SoME3 Resources

This is a continuation of my original "What are the most mindblowing things in mathematics?" post.

Additional Resources

 

It is becoming nearly impossible to find relevant information through search engines. DuckDuckGo, SearXNG, Bing, Google, and many other mainstream engines have a high noise-to-signal ratio, and it is getting worse.

Here is a collection of the best search engines I know; please add more to the list.

If no other high-quality search engines exist, would it be possible to host your own?

EDIT: Some new discoveries. The add-on uBlacklist, combined with filter lists, can block heavily SEO-optimized sites from appearing in search.

0
submitted 2 years ago* (last edited 2 years ago) by cll7793@lemmy.world to c/asklemmy@lemmy.ml
 


 

What concepts or facts do you know from math that are mind-blowing, awesome, or simply fascinating?

Here are some I would like to share:

  • Gödel's incompleteness theorems: Any consistent formal system powerful enough to express basic arithmetic contains statements that can be neither proven nor disproven within that system, no matter how much time you put into it.
  • Halting problem: It is impossible to write a program that can determine, for every possible input program, whether it runs forever or eventually finishes. (Undecidability)

The Busy Beaver function

Now this is the mind-blowing one. What is the largest finite number you know? Graham's Number? TREE(3)? TREE(TREE(3))? This one will beat it easily.

  • The Busy Beaver function grows faster than any computable function. It is itself uncomputable: no program, however powerful the computer, can compute its value for every number of states.
  • In fact, merely being able to compute certain individual values would mean solving some of the hardest open problems in mathematics.
  • Σ(1) = 1
  • Σ(4) = 13
  • Σ(6) > 10↑↑15 (a power tower of fifteen 10s: 10^10^...^10)
  • Σ(17) > Graham's Number
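  • 27 states: knowing the 27-state Busy Beaver value (the maximum number of steps a halting machine can take) would settle the Goldbach conjecture, because there is a 27-state machine that halts if and only if the conjecture is false.
  • 744 states: likewise, knowing the 744-state value would settle the Riemann hypothesis, because there is a 744-state machine that halts if and only if the hypothesis is false.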
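You can watch the smallest Busy Beavers run with a tiny Turing machine simulator. This is a minimal sketch; the rule tables below are the well-known 1-state and 2-state champions, which write 1 and 4 ones respectively (Σ(1) = 1, Σ(2) = 4), the latter in 6 steps. Note the step cap: a simulator can confirm that a machine halts, but it cannot in general prove that one runs forever, which is the halting problem again.

```python
def run_turing_machine(rules, max_steps=10_000):
    """Simulate a 2-symbol Turing machine starting on a blank (all-0) tape.

    rules[(state, symbol)] = (write, move, next_state), where move is +1 (right)
    or -1 (left) and next_state "H" means halt. Starts in state "A".
    Returns (number_of_ones, steps), or None if it hasn't halted by max_steps.
    """
    tape = {}  # sparse tape: position -> symbol, missing cells are 0
    head, state, steps = 0, "A", 0
    while state != "H":
        if steps >= max_steps:
            return None  # we can only give up; halting is undecidable in general
        write, move, state = rules[(state, tape.get(head, 0))]
        tape[head] = write
        head += move
        steps += 1
    return sum(tape.values()), steps


# 1-state champion: write a 1 and halt.
ONE_STATE = {("A", 0): (1, +1, "H"), ("A", 1): (1, +1, "H")}

# 2-state champion.
TWO_STATE = {
    ("A", 0): (1, +1, "B"), ("A", 1): (1, -1, "B"),
    ("B", 0): (1, -1, "A"), ("B", 1): (1, +1, "H"),
}
```

`run_turing_machine(TWO_STATE)` returns `(4, 6)`: four 1s on the tape after six steps.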

Sources:

 

An Internet Portal is an information hub connecting you to a much wider portion of the internet.

For example:

What Internet Portals do you know of that you would like to share?
