this post was submitted on 29 Oct 2023

Hi Emacs community,

I'm an elisp noob, and I recently wrote a function to get the references from a Wikipedia page. I plan on using it with org-mode/org-roam so I can do research faster (even though there's probably already a package for that sort of thing). Unfortunately, it's probably not as robust as I'd like to think: some of the DOIs/ISBNs are missing from the results on some of the Wikipedia pages I've tested. Here it is for reference:

(defun get-wikipedia-references (subject)
  "Gets references for a wikipedia article"
  (let ((wikipedia-prefix-url "https://en.wikipedia.org/wiki/"))
    (with-current-buffer
	(url-retrieve-synchronously (concat wikipedia-prefix-url subject))
      (let* ((html-start (progn (goto-char (point-min))
				(re-search-forward "^$")))
	     (dom (libxml-parse-html-region (1+ (point)) (point-max)))
	     (result))
	(dolist (cite-tag (dom-by-tag dom 'cite) result)
	  (let ((cite-class (dom-attr cite-tag 'class)))
	    (cond ((string-search "journal" cite-class)
		   (let ((a-tag (dom-search cite-tag (lambda (tag) (string-prefix-p "https://doi.org" (dom-attr tag 'href))))))
		     (setq result (cons (cons (concat "doi:" (dom-text a-tag))
					      (let* ((cite-texts (dom-texts cite-tag))
						     (title-beg (1+ (string-search "\"" cite-texts)))
						     (title-end (string-search "\"" cite-texts (1+ title-beg))))
						(substring cite-texts title-beg title-end)
						))
					result))))
		  ((string-search "book" cite-class)
		   (let ((a-tag (dom-search cite-tag (lambda (tag) (string-prefix-p "/wiki/Special:BookSources" (dom-attr tag 'href))))))
		     (setq result (cons (cons (concat "isbn:" (dom-text (dom-child-by-tag a-tag 'bdi)))
					      (dom-text (dom-child-by-tag cite-tag 'i)))
					result))))
		  (t
		   (let ((a-tag (assoc 'a cite-tag)))
		     (setq result (cons (cons (dom-attr a-tag 'href) (dom-text a-tag)) result))))
		  ))
	  )))))

(get-wikipedia-references "Graph_traversal")
(("doi:10.1109/SFCS.1979.34" . "Random walks, universal traversal sequences, and the complexity of maze problems")
 ("doi:10.1016/j.tcs.2015.11.017" . "Lower and upper competitive bounds for online directed graph exploration")
 ("doi:10.1016/j.tcs.2020.06.007" . "Online graph exploration on a restricted graph class: Optimal solutions for tadpole graphs")
 ("doi:10.1587/transinf.E92.D.1620" . "The Online Graph Exploration Problem on Restricted Graphs")
 ("doi:10.1016/j.tcs.2021.04.003" . "An improved lower bound for competitive graph exploration")
 ("doi:10.1137/0206041" . "An Analysis of Several Heuristics for the Traveling Salesman Problem"))

And yes, I know that I could probably use a library like s, dash, seq, or cl, but I try to keep my elisp functions free of those kinds of things. I would appreciate any criticism from the Emacs community about my elisp!

top 4 comments
[–] github-alphapapa@alien.top 1 points 10 months ago (1 children)

My first suggestion would be to use plz for HTTP. Then I'd use cl-loop and pcase to simplify the rest of the code. Here's a partial rewrite with a TODO for further exercise. :)

(require 'cl-lib)
(require 'dom)
(require 'plz)  ; third-party HTTP library, installed separately

(defun wikipedia-article-references (subject)
  (let* ((url (format "https://en.wikipedia.org/wiki/%s" (url-hexify-string subject)))
         (dom (plz 'get url :as #'libxml-parse-html-region)))
    (cl-loop for cite-tag in (dom-by-tag dom 'cite)
             for cite-class = (dom-attr cite-tag 'class)
             collect (pcase cite-class
                       ((rx "journal")
                        (let ((a-tag (dom-search cite-tag
                                                 (lambda (tag)
                                                   (string-prefix-p "https://doi.org" (dom-attr tag 'href))))))
                          (cons (concat "doi:" (dom-text a-tag))
                                ;; TODO: Use `string-match' with `rx' and `match-string' here.
                                (let* ((cite-texts (dom-texts cite-tag))
                                       (title-beg (1+ (string-search "\"" cite-texts)))
                                       (title-end (string-search "\"" cite-texts (1+ title-beg))))
                                  (substring cite-texts title-beg title-end)))))
                       ((rx "book")
                        (let ((a-tag (dom-search cite-tag
                                                 (lambda (tag)
                                                   (string-prefix-p "/wiki/Special:BookSources" (dom-attr tag 'href))))))
                          (cons (concat "isbn:" (dom-text (dom-child-by-tag a-tag 'bdi)))
                                (dom-text (dom-child-by-tag cite-tag 'i)))))
                       (_ (let ((a-tag (assoc 'a cite-tag)))
                            (cons (dom-attr a-tag 'href) (dom-text a-tag))))))))

Regarding this:

And yes, I know that I could probably use a library like s, dash, seq, or cl, but I try to keep my elisp functions free of those kinds of things

First of all, cl (now cl-lib) and seq are built into Emacs and are used in core Emacs code. There's no reason not to use them. Second, dash and s are on ELPA and are widely used; it's largely a matter of style, but they are solid libraries, so again, no reason not to use them. They don't have cooties. ;)

[–] ElfOfPi@alien.top 1 points 10 months ago

I read a Reddit post saying that using cl-lib was kind of a bad thing, and I think I've always had a fear that using libraries in my config would just make it more bloated/slow Emacs down. But after all the comments here, I think I'll change my stance on that.

[–] nv-elisp@alien.top 1 points 10 months ago

You don't have anything to guard against a bad response from the server, e.g.:

(unless (equal url-http-response-status 200)
  (error "Server responded with status: %S" url-http-response-status))

To position point at the end of the headers:

(goto-char url-http-end-of-headers)
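Put together, the two snippets above might look something like this. It is only a sketch: the names my/parse-html-response and my/fetch-wikipedia-dom are hypothetical, and it assumes an Emacs built with libxml2 support.

```elisp
(require 'url)
(require 'url-http)

;; Buffer-local variables that url-http sets in the response buffer.
(defvar url-http-response-status)
(defvar url-http-end-of-headers)

(defun my/parse-html-response (buffer)
  "Parse the HTML body of the url.el response in BUFFER into a DOM.
Signal an error unless the HTTP status is 200.  Hypothetical helper name."
  (with-current-buffer buffer
    (unless (equal url-http-response-status 200)
      (error "Server responded with status: %S" url-http-response-status))
    ;; Skip the headers instead of searching for the first empty line.
    (goto-char url-http-end-of-headers)
    (libxml-parse-html-region (point) (point-max))))

(defun my/fetch-wikipedia-dom (subject)
  "Fetch the English Wikipedia article for SUBJECT and return its DOM.
Hypothetical helper name; performs a blocking network request."
  (let ((buffer (url-retrieve-synchronously
                 (concat "https://en.wikipedia.org/wiki/"
                         (url-hexify-string subject)))))
    ;; `url-retrieve-synchronously' can return nil when the request fails.
    (unless buffer
      (error "No response while fetching %s" subject))
    (unwind-protect
        (my/parse-html-response buffer)
      ;; Always clean up the response buffer, even on error.
      (kill-buffer buffer))))
```

Splitting the parsing from the fetching keeps the status check usable on any response buffer, and the unwind-protect avoids leaving stray response buffers behind.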

This:

(setq result (cons (cons ...) result))

Is more clearly expressed as:

(push (cons ...) result)

Better yet, you could map over the elements you're interested in and accumulate the results via mapcar or cl-loop. That would obviate the need for the "result" variable.
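To make the comparison concrete, here is a small sketch of the three accumulation styles side by side. The data and every my/ name are made up for illustration; none of them come from the original function.

```elisp
(require 'cl-lib)

(defvar my/sample-refs '(("10.1000/a" . "First") ("10.1000/b" . "Second"))
  "Made-up (ID . TITLE) pairs, used only for illustration.")

(defun my/label (pair)
  "Return PAIR with \"doi:\" prepended to its car."
  (cons (concat "doi:" (car pair)) (cdr pair)))

(defun my/refs-push (pairs)
  "Accumulate with an explicit variable and `push'."
  (let (result)
    ;; `push' builds the list backwards, so reverse it when done.
    (dolist (pair pairs (nreverse result))
      (push (my/label pair) result))))

(defun my/refs-mapcar (pairs)
  "Accumulate with `mapcar'; no accumulator variable needed."
  (mapcar #'my/label pairs))

(defun my/refs-loop (pairs)
  "Accumulate with `cl-loop'; again no accumulator variable."
  (cl-loop for pair in pairs
           collect (my/label pair)))
```

All three return the same list for the same input; the mapcar and cl-loop versions simply leave the bookkeeping to the construct itself.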

You could probably shorten things by using the dom-elements function to search directly for the hrefs you're interested in, in combination with dom-parent to get at the parent elements.
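A rough sketch of that idea on a hand-built DOM, so it runs without a network request. The sample DOM, its citations, and the name my/doi-links-with-parents are all made up; on a real Wikipedia page the link may be nested deeper inside the cite tag, so the direct parent is not guaranteed to be the cite element itself.

```elisp
(require 'dom)
(require 'rx)

(defvar my/sample-dom
  '(html nil
         (body nil
               (cite ((class . "citation journal"))
                     "Doe, J. \"A sample title\". "
                     (a ((href . "https://doi.org/10.1000/example"))
                        "10.1000/example"))
               (cite ((class . "citation web"))
                     (a ((href . "https://example.org/")) "Example site"))))
  "Hand-built DOM with made-up citations, for illustration only.")

(defun my/doi-links-with-parents (dom)
  "Return a list of (DOI . PARENT-CLASS) for each doi.org link in DOM.
Hypothetical helper name."
  (mapcar (lambda (a-tag)
            (cons (dom-text a-tag)
                  ;; `dom-parent' walks DOM to find the node containing A-TAG.
                  (dom-attr (dom-parent dom a-tag) 'class)))
          ;; `dom-elements' matches a regexp against the given attribute.
          (dom-elements dom 'href (rx bos "https://doi.org/"))))

(my/doi-links-with-parents my/sample-dom)
```

Searching from the links outward means citations without a DOI never enter the loop, which sidesteps the empty "doi:" entries the class-based approach can produce.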

Overall your function gets a 65 out of 130 ERU (elisp rating units).

[–] larrasket@alien.top 1 points 10 months ago

having

          ))
  )))))

is not very lispy
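For comparison, the conventional layout gathers all closing parentheses at the end of the last line of the form. A tiny illustration, with a made-up helper name (my/classify-cite) that is not part of the original function:

```elisp
(defun my/classify-cite (cite-class)
  "Return a symbol describing CITE-CLASS, a citation's class string or nil.
Hypothetical helper, shown only to illustrate paren placement."
  (cond ((null cite-class) 'unknown)
        ((string-match-p "journal" cite-class) 'journal)
        ((string-match-p "book" cite-class) 'book)
        (t 'other)))
```

Emacs can handle the indentation itself (for example via indent-region), so there is no need to keep closing parens on their own lines to track nesting.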