I spent all day again writing code that I won't run twice, but that's okay.
My Mastodon instance doesn't enable text search. Why? I don't know. Maybe because their hardware can't support the strain of arbitrary text search, maybe privacy. Alls I know is that when I come up with a gem of a post—I mean a real good one, a real humdinger—I'm stricken with panic that I've posted it already. I've posted it already, and they'll know. They'll know that I repost my statuses.
And for too long I've lived with that silly, extremely-low-stakes fear. Until now. Here's my Mastodon search script. It's an executable, written in Ruby, that will search your post history for any keyword or regular expression you give it, proving once and for all that you're still capable of having original thoughts.
Just set the env variable MASTO_SEARCH_URL to the URL of the Mastodon account you want to search (or set it per invocation with the --url option) and run it like
$ mastodon_search keyword
or
$ mastodon_search /regular expression/i
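So a one-shot run, with a completely made-up account URL, looks something like
$ MASTO_SEARCH_URL="https://example.social/@you" mastodon_search humdinger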
Hey, can we get some code snippets?
Heck yeah, we can get some snippets.
Look, you can never have too many command line options. Actually, you can. This is the trimmed-down set. (You know you've gone overboard for an afternoon's API scraping script when you add both --verbose and --silent options.)
require "getoptlong"

opts = GetoptLong.new(
  ["--all", "-a", GetoptLong::NO_ARGUMENT],
  ["--boosts", GetoptLong::NO_ARGUMENT],
  ["--case-insensitive", "-i", GetoptLong::NO_ARGUMENT],
  ["--help", "-h", GetoptLong::NO_ARGUMENT],
  ["--replies", GetoptLong::NO_ARGUMENT],
  ["--silent", "-s", GetoptLong::NO_ARGUMENT],
  ["--url", GetoptLong::REQUIRED_ARGUMENT],
  ["--verbose", "-v", GetoptLong::NO_ARGUMENT],
)
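For the record, here's roughly how those flags get consumed; the variable names below are my after-the-fact paraphrase, not necessarily what the script calls them.
# Paraphrased consumption of the parsed flags (names here are illustrative).
config = { url: ENV["MASTO_SEARCH_URL"] }
opts.each do |opt, arg|
  case opt
  when "--all" then config[:all] = true
  when "--boosts" then config[:boosts] = true
  when "--case-insensitive" then config[:case_insensitive] = true
  when "--replies" then config[:replies] = true
  when "--silent" then config[:silent] = true
  when "--url" then config[:url] = arg
  when "--verbose" then config[:verbose] = true
  when "--help"
    puts "usage: mastodon_search [options] <keyword or /regexp/>"
    exit
  end
end

query = ARGV.shift
abort("usage: mastodon_search [options] <keyword or /regexp/>") if query.nil?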
Now, I love myself some regular expressions (native browser support for regex search when?), but isn't it so unsatisfying when your command line tool accepts regexes only with a custom flag like -e, and then you have to type them out plain like an ordinary icky string? Ew, gross! Wouldn't you rather type them out like /a regexp literal/i that's contextually parsed?
Because the intended audience of this post is me, I know the answer to that is yes.
So how do we parse /a regexp literal/i while preserving those trailing options? Kinda like this:
pattern = if query.match(%r{\A/(?<source>.*)/(?<options>[[:alpha:]]*)\z})
  # Map each trailing flag character to its Regexp constant and OR them together.
  options = $~["options"].chars.map do |opt|
    case opt
    when "i"
      Regexp::IGNORECASE
    when "m"
      Regexp::MULTILINE
    when "x"
      Regexp::EXTENDED
    end
  end.compact.inject(&:|)
  Regexp.new($~["source"], options)
else
  # You're a string, my dude.
  query
end
The global variable $~ is of course the previous regexp match; and because I'm an overachiever when I do this for fun, of course we gotta have named captures.
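If named-capture access through $~ is rusty for you: after a successful match, $~ holds the MatchData, and groups come back out by name. (The sample string here is just for show.)
"/cats?/i".match(%r{\A/(?<source>.*)/(?<options>[[:alpha:]]*)\z})
$~["source"]   # => "cats?"
$~["options"]  # => "i"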
The eye roll emoji is not powerful enough to convey my distaste for overadherence to HATEOAS. Everyone wants to be clever and put their next/prev links in Link response headers, even when their response is in easily parseable, First Class Citizen Of The Web JSON, and Mastodon's statuses API is no exception.
Alright, let's fuckinnnnn parse a Link header I guess!
def find_next_url(link_header)
  return if link_header.nil?

  # Pull the URL out of the rel="next" entry of an RFC 8288 Link header.
  match = link_header.match(/<(.*)>;\s+rel="next"/)
  match[1] if match
end
Funny thing about Link response headers: they're supposedly the idiomatically correct way of conveying the direction of a stream of articles in hypermedia, and everyone keeps using them for that, and everyone still ends up hand-rolling the same parser out of the same regular expression or .split().map().find(), because nothing natively parses that header except a web browser.
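The page-walking around that parser is about what you'd expect. Here's a rough sketch, not the script verbatim, of chasing rel="next" until the API runs dry:
require "json"
require "net/http"

# Sketch of the pagination loop: fetch a page, yield its statuses,
# then follow the rel="next" link until there isn't one.
# (Turning the account URL into the actual statuses API URL is its own
# little dance, elided here.)
def each_status(start_url)
  url = start_url
  while url
    response = Net::HTTP.get_response(URI(url))
    JSON.parse(response.body).each { |status| yield status }
    url = find_next_url(response["link"])
  end
end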
That's it for now
There's nothing else really interesting going on under the hood. Just filtering JSON and accumulating results.
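If you're curious, "filtering and accumulating" is roughly this kind of thing, reusing the each_status sketch from above and paraphrasing freely:
# Paraphrased accumulation loop; start_url is wherever the statuses API URL
# ended up, and the real script's internals may well differ.
results = []
each_status(start_url) do |status|
  text = status["content"]
  hit = pattern.is_a?(Regexp) ? text.match?(pattern) : text.include?(pattern)
  results << status if hit
end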
See ya later kids, and remember to always trap your signals. Nobody wants a backtrace when they quit.
Signal.trap("SIGINT") { exit(1) }
Signal.trap("SIGTERM") { exit(1) }