Last year I signed up for a membership at . I had been thinking about doing this since in June 2004. Note that I did not sign up at that time. Why not? For many reasons, but not the least of which is that I wasn’t really into RSS yet (for those who don’t know, RSS is a technology which allows you to know when a site has been updated without having to go to it. Gross oversimplification, but the explanation that you need for my purposes if you don’t understand RSS). I just regularly pulled up the website and looked to see if there was anything new.
Nope, I didn’t actually join until the was announced. One might conclude this makes me either a cheap bastard or a sucker for a contest. I saw it as motivation to metaphorically get off my literal duff, support the writing of someone whose website I enjoy, get a t-shirt, and maybe win a prize. Actually, I signed up while sitting down, so I guess the duff thing was also metaphorical.
I actually paid $29 and got a t-shirt, which I have yet to wear. There’s probably something to be said about me psychologically when I didn’t pay $19 for something I was using and then paid an extra $10 for something I didn’t use, but let’s not dwell on that.
What does it get you?
The best quote about comes from DaringFireball.
But please, I implore you, do not think of this as paying $20 just to get a full-content RSS feed. Think of it as a small token of my gratitude for supporting my writing at this site. It’s like when you pledge $100 to PBS and they send you a tote bag; no one does it to get the tote bag.
The first rule of supporting DF is that you do not support DF for the t-shirt. Maybe that’s why I haven’t worn mine.
After all, I can read the site for free, I can use the for free. I could manually check the Linked List myself or write a shell script to scrape it and alert me when it finds new stuff. I don’t need to subscribe to DF. That was the first realization, and it was an important one.
It’s not about a tote bag, a t-shirt, or an RSS feed.
It’s about supporting someone whose writing and observations I enjoy. It was about giving a little back, as I was making a bit of money from AdSense at the time. People were paying to read my stuff, and I wanted to pay it forward, spread the wealth, insert your trite analogy here.
That was how I thought about signing up, but even more so, I felt that I was paying for what had already been written. As I said, I had been reading the site for a while, and I had enjoyed it. It was as much a “thank you” as anything else. After all, it could have been that the next day/week/month DF would have had a post announcing “Sorry it didn’t work out, I’m going to work for Microsoft.” Ok maybe not the last part, but he could have decided not to write anymore.
Why did I do it? I did it because of excellent articles and analysis like the which had originally gotten my attention. I’ve also written in-depth about a for-pay browser in a free-browser world, maybe that’s another thing we have in common.
In fact, I think that’s probably something worth looking at: someone who is willing to pay for something better than the free alternatives. If you use OmniWeb or if you used Opera before it was free, you were paying for a better alternative than what was available for free. Is paying to support content otherwise available for free really that much of a jump?
As I said, I looked at it in different terms. My $29 got me a t-shirt and a chance to win some other software, and was something of an expression of thanks for the site I had already enjoyed.
Now that the time to renew was approaching, it was time to ask again: What does my $19 get me? Or, to think in my terms, what did my $19 get me for the past year? So I decided to take a closer look at what my subscription supported last year:
- 27 Oct 2005
- 11 Nov 2005
- 16 Nov 2005
- 30 Nov 2005
- 16 Dec 2005
- 26 Dec 2005
- 10 Jan 2006
- 13 Jan 2006
- 18 Jan 2006
- 20 Jan 2006
- 30 Jan 2006
- 2 Feb 2006
- 13 Feb 2006
- 20 Feb 2006
- 22 Feb 2006
- 5 Mar 2006
- 8 Mar 2006
- 10 Mar 2006
- 10 Mar 2006
- 24 Mar 2006
- 27 Mar 2006
- 31 Mar 2006
- 3 Apr 2006
- 6 Apr 2006
- 10 Apr 2006
- 12 Apr 2006
- 13 Apr 2006
- 19 Apr 2006
- 20 Apr 2006
- Cringely’s Machinations 22 Apr 2006
- 26 Apr 2006
- 27 Apr 2006
- 28 Apr 2006
- 30 Apr 2006
- 2 May 2006
- 4 May 2006
- 5 May 2006
- 9 May 2006
- 11 May 2006
- 12 May 2006
- 12 May 2006
- 15 May 2006
- 26 May 2006
- 15 Jun 2006
- 19 Jun 2006
- 20 Jun 2006
- 26 Jun 2006
- 8 Jul 2006
- 15 Jul 2006
- 25 Jul 2006
- 3 Aug 2006
- 4 Aug 2006
- 6 Aug 2006
- 16 Aug 2006
- 17 Aug 2006
- 18 Aug 2006
- 21 Aug 2006
- 23 Aug 2006
- 30 Aug 2006
- 1 Sep 2006
- 5 Sep 2006
- 11 Sep 2006
- 12 Sep 2006
- 13 Sep 2006
- 15 Sep 2006
- 15 Sep 2006
- 18 Sep 2006
- 20 Sep 2006
- 21 Sep 2006
- 26 Sep 2006
- 28 Sep 2006
- 29 Sep 2006
- 2 Oct 2006
- 5 Oct 2006
- 8 Oct 2006
- 10 Oct 2006
- 13 Oct 2006
- 16 Oct 2006
- 17 Oct 2006
- 18 Oct 2006
- 19 Oct 2006
- 20 Oct 2006
- 25 Oct 2006
- 26 Oct 2006
- 27 Oct 2006
I made that into a nifty ordered list in HTML to save me all the counting.[1] Wow, 85 articles! At $19 (the price for those who didn’t order t-shirts that they didn’t wear) that works out to about $0.22/post. Less than a quarter per article.
OK I pretty much already knew that I was going to renew anyway, but 85 articles? You really can’t complain about that.
And yet, the Unix geek in me wouldn’t stop.
Because if you’re like me (and, as David Letterman says, “I pray to God that you’re not”) part of you wants to know, wants desperately to know, how many words were in those 85 articles? I mean, heck, anyone can write 85 articles if they’re 3 paragraphs long.
Go ahead, guess….. Guess… I can wait here all day. Ok, did you guess? Ok, but did you guess 109,398? No, probably not, and even if you did I wouldn’t believe you. Ok Rainman, how much is that per word? I’m not sure but I think it’s something like $0.00017368 per word.[2] I’m not great at “The Math” but whatever it is, it’s pretty small.
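For anyone who wants to check “The Math,” here is a minimal sketch that redoes both divisions with awk. The variable names are made up for illustration; the figures ($19, 85 articles, 109,398 words) are the ones quoted above.

```shell
#!/bin/sh
# Figures taken from the post: $19 subscription, 85 articles, 109,398 words.
# PRICE, ARTICLES and WORDS are illustrative names, not from the original.
PRICE=19
ARTICLES=85
WORDS=109398

# awk does the floating-point division that plain sh arithmetic cannot.
per_post=$(awk -v p="$PRICE" -v n="$ARTICLES" 'BEGIN { printf "%.2f", p / n }')
per_word=$(awk -v p="$PRICE" -v n="$WORDS" 'BEGIN { printf "%.8f", p / n }')

echo "Per post: \$$per_post"
echo "Per word: \$$per_word"
```

Swap in the $29 t-shirt price if you want to feel slightly worse about the same numbers.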
Was it good for you, you Inner Unix Geek you? I see that look in your eyes, you want more data, compiled via as many geeky Unix scripts as you can get. First I bet you want to know how many articles there were per month.
Get Your Geek On
Alright Sports Fans: here come the scripts.
The first thing to do was get a local copy of the articles, so I could run various tests on them without having to keep hitting the server whenever I wanted some arcane bits (bytes?) of information. But we might as well dump the stuff that we aren’t going to do anything with right away.
For example, what if I could isolate the text of the articles without the header/sidebar/footer information? Well it turns out I can do just that because I’m a Unix Geek. By harnessing the power of and and forking and redirecting.
Fortunately for us the site is fairly standardized in its display. The bits that we want to ignore are fairly static for any short term period. lynx has a -dump flag which will linearize the text of the page, giving you just the parts that you would normally see and a few extras. If you don’t have access to Lynx directly, let me describe the output: 15 or 16 lines of header, ending with a line that includes “Ads via The Deck”. Then the article, formatted in 80 characters, and then at the end there are lines linking to the “Previous” and “Next” articles. Of course you have no way of knowing how long the article will be.
Sed can do this. It can start at the first line and delete to the line that contains “Ads via The Deck” (and since that line is likely to be unique, it’s a good candidate for a match). It can also match the line “Previous:” but that’s a dangerous line because it’s not as likely to be unique, so we have to be more careful. Looking at the format of lynx -dump, that “Previous” line has 3 spaces from the beginning of the line, the word Previous, then a colon and another space. That translates to /^   Previous: /
So we’ll put that all together and it will look something like this:
for DF_URLS in http://url.one http://url.two http://url.three
do
    short=`basename $DF_URLS`
    lynx -dump $DF_URLS |\
    sed '1,/Ads via The Deck/d; /   Previous: /,$d' > $short
done
Of course replace the URLs with the actual URLs from DaringFireball. Do this in a clean empty directory so the only files in it will be the output files. Then run ‘wc -w *’ and it will give you the word count for each file and then the total.
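If you want to see the sed trimming work without hitting the server at all, here is a small sketch that runs it against a made-up page dump. The function names (trim_article, sample_dump) and the sample text are my own inventions for illustration, not anything from DF, and I have anchored the “Previous” match with ^ here.

```shell
#!/bin/sh
# trim_article: read "lynx -dump" style text on stdin and print only the
# article body. Deletes everything up to and including the "Ads via The Deck"
# line, then everything from the "   Previous: " line to the end.
trim_article() {
    sed '1,/Ads via The Deck/d; /^   Previous: /,$d'
}

# sample_dump: a made-up stand-in for a dumped page (not real DF content).
sample_dump() {
    cat <<'EOF'
   Daring Fireball
   Some navigation text
   Ads via The Deck
   Friday, 27 October 2006
   First line of the article.
   Second line of the article.
   Previous: An Older Post
   Next: A Newer Post
EOF
}

# Show the trimmed article, then its word count.
sample_dump | trim_article
sample_dump | trim_article | wc -w
```

Note that the date line survives the trim, which is why the word counts run a few words high per post.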
I bet you want to know: “What days of the week should I expect to see a DF post?”
for DAY in Sunday Monday Tuesday Wednesday Thursday Friday Saturday
do
    echo -n "$DAY: "
    egrep "^$DAY, .*200(5|6)$" * |wc -l
done
Sunday: 4
Monday: 14
Tuesday: 10
Wednesday: 15
Thursday: 16
Friday: 22
Saturday: 3
Well what about months?
for MONTH in November December January February \
March April May June July August September October
do
    echo -n "$MONTH: "
    egrep " $MONTH 200(5|6)$" * |wc -l
done
November: 3
December: 2
January: 5
February: 4
March: 7
April: 12
May: 9
June: 4
July: 3
August: 9
September: 13
October: 14
We need to tweak that output because one of those October posts was from 2005 and the rest were from 2006, so October 2006’s result is really 13.
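One way to skip the manual tweak is to tally by month and year together in a single pass. This is only a sketch: it assumes each saved article contains a date line shaped like “Friday, 27 October 2006” (the same shape the egrep patterns above rely on), and tally_by_month is a name I made up.

```shell
#!/bin/sh
# tally_by_month: print "count Month Year" for every date line found in the
# files given as arguments. Assumes date lines look like
# "Weekday, DD Month YYYY" and that the years of interest are 2005 and 2006.
tally_by_month() {
    grep -hE '^[A-Z][a-z]+day, [0-9]+ [A-Z][a-z]+ 200[56]$' "$@" |
        awk '{ print $3, $4 }' |
        sort | uniq -c
}

# Demo on throwaway files with made-up contents.
tmp=$(mktemp -d)
printf 'Thursday, 27 October 2005\nbody text\n' > "$tmp/a"
printf 'Friday, 27 October 2006\nbody text\n'   > "$tmp/b"
printf 'Thursday, 26 October 2006\nbody text\n' > "$tmp/c"
tally_by_month "$tmp"/*
rm -rf "$tmp"
```

The output comes back sorted alphabetically by month name rather than by calendar order, which is fine for eyeballing but not for charts.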
Except for a bit of a light summer (June/July), you can see that post frequency went up a great deal after he .
Ok, now you’re wondering: “How many of those posts included the word ‘jackass’?” Well just run a quick “fgrep -li jackass *”
- and_oranges
- gartner_jackasses
- jackass_kieren_mccarthy
- jackass_paul_thurrott
- jackass_rush_limbaugh
- jackass_stamp
- magic_8ball_zune
- mccarthy_still_a_jackass
- neal_mueller_washington_post
- rob_glaser_jackass
Man I’m tired but it’s a good, scripty tired.
Linked List Love
If you follow a lot of Mac sites, you read the same news over and over again. One site gets it, and 10 echo it. I actually have a folder of RSS feeds I call “Mac News” for sites which all pretty much cover the same thing. I’ve never bothered to check to see if any of them are better than the others or any of them just echo stuff I could hear elsewhere. It’s easy enough to just check that whole folder and scan the headlines.
The is something different. It not only covers Mac news, but also stuff around the ’net that the author finds interesting. I’ve realized over the past year that he and I share similar interests, from movies (The Departed was great) to basketball teams (Bird-era Celtics) to parenting to video games that can be played without controllers using 72 buttons and 3 joysticks. Yes there are rumors that he roots for a certain baseball team from New York and I’m from Boston, but these are things you do not talk about.
Where was I? Oh yes, the Linked List. There’s just a bunch of interesting stuff there, much of which I wouldn’t have found on my own. So that’s another reason that I like being a member of the site.
Then I started to wonder about the Linked List. Just how many posts have there been to it? So I loaded up each of the Linked List archives for the months in question.
Whoa. Just the size of the scrollbar alone told me “I’m not going to even try to count that.”
Clearly I’m going to have to script that.
for MONTH in november december january february \
march april may june july august september october
do
    if [ "$MONTH" = "november" -o "$MONTH" = "december" ]
    then
        YEAR=2005
    else
        YEAR=2006
    fi
    lynx -dump http://daringfireball.net/linked/$YEAR/$MONTH |\
    fgrep -i http:// |\
    fgrep -vi http://daringfireball.net |\
    awk '{print $2}' |\
    cat -n > df.ll.$YEAR.$MONTH.txt
done
For those of you who don’t speak shell script, let me translate: We are in a FOR loop which will execute one time for each of the months named there. Note that I begin with November and end with October as those were the months of my subscription, but they needn’t be listed in the order there. My original idea was to use a counter which would increment each time through the loop and after 2 loops it would go from 2005 to 2006, but I decided that the IF/ELSE was more elegant/fewer moving parts.
Inside the loop we check to see if the month is either november or december. If so then it must be 2005, otherwise it is 2006. Then I ran lynx -dump against the archive of the Linked List for each of those month/year combinations. I looked through the output for URLs (the first ‘fgrep’ line) and then excluded links to DF itself (the second fgrep line). I then picked out the 2nd item of each resulting line (which is the URL, not the number.. check the output of lynx -dump and you’ll see for yourself). Then I took that and numbered each line (cat -n) and stuck the output in a file. This last step was not necessary; I could have just run ‘wc -l’ (count lines) against each file. In fact I did such a thing:
wc -l *
141 df.ll.2005.december.txt
102 df.ll.2005.november.txt
170 df.ll.2006.april.txt
203 df.ll.2006.august.txt
114 df.ll.2006.february.txt
169 df.ll.2006.january.txt
203 df.ll.2006.july.txt
228 df.ll.2006.june.txt
176 df.ll.2006.march.txt
225 df.ll.2006.may.txt
298 df.ll.2006.october.txt
320 df.ll.2006.september.txt
2349 total
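Here is the filtering part of that loop pulled out on its own, as a sketch you can run offline. It assumes the reference list that lynx -dump appends looks like “   1. http://…”, one link per line; external_links and sample_refs are names I invented, and the URLs are placeholders rather than real Linked List items.

```shell
#!/bin/sh
# external_links: read "lynx -dump" output on stdin and print one external
# URL per line. Same three stages as the loop: keep lines containing http://,
# drop links back to daringfireball.net, keep only the 2nd field (the URL).
external_links() {
    grep -Fi 'http://' |
        grep -Fvi 'http://daringfireball.net' |
        awk '{ print $2 }'
}

# sample_refs: hypothetical reference list in the "  N. URL" shape.
sample_refs() {
    cat <<'EOF'
References

   1. http://daringfireball.net/
   2. http://example.com/story-one
   3. http://daringfireball.net/linked/
   4. http://example.org/story-two
EOF
}

# Print the surviving links, then count them.
sample_refs | external_links
sample_refs | external_links | wc -l
```

I used grep -F here, which does the same fixed-string matching as fgrep.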
You can see that the Linked List is hugely active, and as I said before, many of these stories are things that I had not seen elsewhere. This is not just the duplication of content, links, stories that you get at all the “Mac News” sites, in fact many of them are not Mac related at all, but still interesting.
I did wonder about duplication. How many of those 2,349 links were unique (i.e. how many times would you see the same link referred to on DF’s Linked List)? That too was easy to deduce using “awk '{print $2}' * | sort -u | wc -l” which translates to “Give me the 2nd column [which is the URL] from all the files (awk), sort them so that I get just the unique lines (sort -u), and count the resulting lines.” Answer: 2205. So 144 duplicates. Part of the problem was that I didn’t even try to filter out things like links to “The Deck,” the highly unobtrusive ads which run on the site. Still, 2200 links in a year, filtered by a real human, almost all of it stuff that I haven’t seen 18 other times and places. And each piece comes with a line or two telling you what it’s about, so you know if it is something that you are interested in.
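If you would rather see which links repeat instead of just how many, something like the following sketch would do it. It assumes files of “number URL” lines like the ones cat -n wrote above; count_unique and list_duplicates are hypothetical helper names, and the demo URLs are placeholders.

```shell
#!/bin/sh
# count_unique: how many distinct URLs (2nd column) appear across the files.
count_unique() {
    awk '{ print $2 }' "$@" | sort -u | wc -l | tr -d ' '
}

# list_duplicates: print "count URL" for each URL seen more than once.
list_duplicates() {
    awk '{ print $2 }' "$@" | sort | uniq -c | awk '$1 > 1 { print $1, $2 }'
}

# Demo on throwaway files with made-up links.
tmp=$(mktemp -d)
printf '1\thttp://example.com/a\n2\thttp://example.com/b\n' > "$tmp/one.txt"
printf '1\thttp://example.com/b\n2\thttp://example.com/c\n' > "$tmp/two.txt"
count_unique "$tmp"/*.txt
list_duplicates "$tmp"/*.txt
rm -rf "$tmp"
```

Keep in mind that total minus unique counts the extra appearances, so a link that shows up three times accounts for two of them.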
Whew.
Well if I wasn’t convinced already, here was certainly a mountain of evidence.
Think I’m a bit weird? Here’s the kicker: I had already sent in my renewal before I did all this. Why?
Because at the end of the day it still comes down to the fact that I like and enjoy the site enough to spend some money on it. The rest is just frosting.
Update 3 November 2006: So this post made the which explains how anyone else saw it :-)
The humble side of me doesn’t want to link to this, but it’s too good to pass up.
Ok, well I didn’t do it to get on the Linked List, but I probably would have been disappointed if it hadn’t made it.
TJ probably could have saved some time if he’d known that you can just add a “.text” extension to the permalink URL for any full article to get it in Markdown-formatted plain text.
Dude, what kind of lesson would that have been for padawan Unix Geeks?
Footnotes:
- BTW you might think that I just copied and pasted that list from but I totally didn’t. DF shows the list in descending order (newest first) whereas I have it in ascending order (oldest first). Or maybe it’s the other way around… it’s like the nearsighted/farsighted thing, I can never remember which is which. Oh, and I have mine in an ordered list, whereas DF has them in paragraphs, much to the chagrin of semantic web enthusiasts everywhere (all 6 of them).
I did, of course .↩
- How I counted words: I made a list of all the URLs during my subscription period (see above ordered list) and then ran this loop:
# URL_LIST (a placeholder name) holds the full URLs from the list above, separated by spaces
for DF_URLS in $URL_LIST
do
    short=`basename $DF_URLS`
    lynx -dump $DF_URLS |\
    sed '1,/Ads via The Deck/d; /   Previous: /,$d' > $short
    WORDS=`wc -w < $short`
    echo "$short ($WORDS)"
done
This gave me a local copy of each article so I didn’t have to keep hitting the DF site, and worked fine except for the fact that there were two posts named “Feed Me” (30 Apr and 18 Oct 2006). So I manually ran lynx(1) for that URL and saved it to a different filename (feed_me-2). I then ran wc(1), the Unix word count utility, with the -w flag to give me the number of words in each document. Note that this amount is slightly inflated since I did not bother to delete the words for the date/time stamp at the top of each post, which would count for approximately 4 words per post, or 340 words, which would bring the total to 109,058 words. Also note that this is not necessarily a 100% accurate count as I believe that wc(1) considers pretty much any character surrounded by whitespace as a “word” including things like footnote digits. So maybe we’re down to 109,000 words. Statistically insignificant.
Note that there are three spaces before the word “Previous: ”. This relates more to the format of the output of “lynx -dump” than DF’s authoring style itself. If I had been thinking more clearly I would have used an ^ to anchor the regex at the beginning of the line.↩