this post was submitted on 24 Jan 2024
8 points (66.7% liked)


Hello,

I made a simple script to scrape threads.net using Python and Selenium. The script is just a few lines long and easy to understand.

So what does this script do?

First, it opens the Edge browser (which you can change to Firefox or Chrome). You then have to enter your credentials to log in. Your browsing data and credentials are stored in the user_data directory, which you can move around.
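For anyone who wants to switch browsers, here is a rough sketch of how that could look. `make_driver` and its parameters are names made up for this example, not part of the script below. It assumes Selenium 4, and it assumes Chrome accepts the same `--user-data-dir` flag as Edge since both are Chromium-based. Firefox handles profiles differently, so the sketch leaves it on a fresh temporary profile.

```python
def make_driver(browser="edge", user_data_dir="user_data"):
    """Hypothetical helper: start Edge, Chrome or Firefox through Selenium 4."""
    if browser not in ("edge", "chrome", "firefox"):
        raise ValueError(f"unsupported browser: {browser!r}")

    # Imported here so that a bad browser name is reported before Selenium loads.
    from selenium import webdriver

    if browser == "firefox":
        # Firefox does not use --user-data-dir; profile setup is left out of
        # this sketch, so you would need to log in again on every run.
        return webdriver.Firefox(options=webdriver.FirefoxOptions())

    if browser == "chrome":
        options = webdriver.ChromeOptions()
    else:
        options = webdriver.EdgeOptions()

    # Chromium-based browsers keep cookies and logins in this directory.
    options.add_argument(f"--user-data-dir={user_data_dir}")

    if browser == "chrome":
        return webdriver.Chrome(options=options)
    return webdriver.Edge(options=options)
```

With that in place, `driver = make_driver("chrome")` would take the place of the `Options()` and `webdriver.Edge(...)` lines.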

It scrolls through the Threads feed/hashtag/explore page and stores the src of every image it encounters, so at the end we have a links.txt file containing the links to all the images encountered.

Now that we have links.txt, we can use the following command to download all the images it lists:

wget -i links.txt
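If wget is not available, the same download step could be sketched in Python with only the standard library. The function names (`read_links`, `filename_for`, `download_all`) and the `images` output directory are assumptions made for this example. Image links may also expire or reject plain requests, so failed downloads are skipped rather than treated as fatal.

```python
import os
import urllib.request
from urllib.parse import urlparse


def read_links(path="links.txt"):
    """Return the non-empty lines of the links file, in order, without duplicates."""
    links = []
    with open(path) as f:
        for line in f:
            url = line.strip()
            if url and url not in links:
                links.append(url)
    return links


def filename_for(url, index):
    """Build a local file name from the URL path, prefixed with an index to avoid clashes."""
    name = os.path.basename(urlparse(url).path) or "image"
    return f"{index:04d}_{name}"


def download_all(links_path="links.txt", out_dir="images"):
    """Download every link in links_path into out_dir, skipping failures."""
    os.makedirs(out_dir, exist_ok=True)
    for index, url in enumerate(read_links(links_path)):
        target = os.path.join(out_dir, filename_for(url, index))
        try:
            urllib.request.urlretrieve(url, target)
        except (OSError, ValueError) as exc:
            # Network errors, HTTP errors and malformed URLs end up here.
            print("skipped", url, exc)
```

Calling `download_all()` from the directory that contains links.txt would then fill an `images` folder next to it.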

the script:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.edge.options import Options
import time

options = Options()
options.add_argument("--user-data-dir=user_data")

driver = webdriver.Edge(options=options)

driver.get('https://threads.net')

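s = set()  # a set keeps each image link only once

# Log in manually in the browser window, then press Enter here to start scraping.
input("Press Enter to continue...")
for i in range(30):
    try:
        elements = driver.find_elements(By.XPATH, "//img")
        for e in elements:
            src = e.get_attribute("src")
            if src:  # skip images that have no src attribute
                s.add(src)
        driver.execute_script("window.scrollBy(0, 1000);")
        time.sleep(0.2)
    except Exception as exc:
        # e.g. an element went stale while the page was re-rendering
        print("oopsie:", exc)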

with open("links.txt", 'w') as f:
    for link in s:
        if link:  # guard against a missing src
            f.write(link + "\n")

driver.quit()
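One possible variation on the loop above, as a sketch rather than a tested replacement: instead of always scrolling 30 times, stop once a few scrolls in a row turn up no new links. This version also swaps `find_elements` for a single JavaScript call that reads `document.images`. `collect_image_srcs` and the `patience` parameter are names invented for this example, and `driver` is assumed to be any object with Selenium's `execute_script` method.

```python
import time

# Runs in the browser and returns the src of every <img> currently in the page.
GET_SRCS_JS = "return Array.from(document.images).map(img => img.src);"


def collect_image_srcs(driver, max_scrolls=30, pause=0.2, patience=3):
    """Scroll and collect image links, stopping early when nothing new shows up."""
    found = set()
    idle_rounds = 0  # consecutive rounds that added no new link
    for _ in range(max_scrolls):
        before = len(found)
        srcs = driver.execute_script(GET_SRCS_JS) or []
        found.update(src for src in srcs if src)  # drop empty src values
        idle_rounds = idle_rounds + 1 if len(found) == before else 0
        if idle_rounds >= patience:
            break
        driver.execute_script("window.scrollBy(0, 1000);")
        time.sleep(pause)
    return found
```

The returned set could be written to links.txt in the same way as `s` above.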

I hope it was useful :D

Edit: here is a link to links.txt https://0x0.st/HGjx.txt

top 3 comments
[–] 1984@lemmy.today 4 points 9 months ago (1 child)

Ok so why do you want all images from threads.net?

[–] kionite231@lemmy.ca 2 points 9 months ago

Because this way I can download a lot of wallpapers and anime pictures :D

There are tons of anime pictures and wallpapers on instagram.com.

You can use this script to scrape Instagram too! Just change the URL in driver.get().

[–] conorm@feddit.uk 1 point 9 months ago

nice code, could you share the asm for all that?