daypop weblog
Daypop Top Word Bursts
Shortly after putting up the Top 40, I realized that plenty of memes make the rounds without being accompanied by links. Either the meme was a topic of conversation that had no link, or it had no single, authoritative link. Catching heightened word usage is a natural extension to the Top 40.

I kept the idea on the back burner until very recently, when I realized Joe Millionaire was the buzz but there were no authoritative links to anchor the meme. There was the one link to the Joe Millionaire site, but few bloggers linked to it when writing about the show (in spite of this, that link still made the Top 40). I went to sleep that night thinking about implementing what everyone now calls Word Bursts. The next morning (can you believe it?) I got emails about the New Scientist article on Word Bursts. I also read about it on Slashdot, and it eventually made the Top 40.

Well, that got me working this weekend on this: Daypop Top Word Bursts. It catches topics that don't have authoritative links. It also catches those that do. There are sample posts from weblogs so that you can get a decent idea of what each word burst relates to.
Comments disabled.
That's a great idea. This page is just a first pass.
If I were to compile a stop list that included the word "contemplates" it seems like it'd be a huge stop list and something I'd have to maintain on a daily basis. I don't know what the solution is exactly.
You have beaten me to it.
Link 1, Link 2 and Link 3
Oh, one more thing. How about including the number of searches that were performed on each word in the last day or so?
ex:
1. arian (26 searches today) I don't know if Sami and Company are really in with IJ or...
Nice implementation. The only issue I see so far is with duplicates.
For example, the first two places now are 'arian' and 'sami', most of which come from the phrase 'Sami al-Arian'.
Also #3 and #5 are 'horsemen' and 'ablogalypse', which both come from the phrase 'Four Horsemen of the Ablogalypse'.
#4 and #9 are 'grammys' and 'critiquees' which are really the same meme.
So I think what it needs is to loop through each word burst and measure its overlap with each of the others. If two bursts have a high overlap, merge them into the same meme.
Actually, one more thing. #20 now is 'contemplates', which isn't very useful and you may want to add it to the list of stopwords (e.g. 'and', 'of', 'the').
All that being said, I think it's great already.
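The overlap-and-merge suggestion in the comment above could be sketched roughly as follows. This is only an illustration, not how Daypop works: it assumes each burst word comes with the set of posts that use it, measures overlap with the Jaccard index, and merges greedily at an arbitrary threshold of 0.5. The names `jaccard` and `merge_bursts` and the post IDs are all made up for the example.

```python
def jaccard(a, b):
    """Jaccard overlap between two sets of post IDs (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def merge_bursts(bursts, threshold=0.5):
    """Greedily group burst words whose post sets overlap heavily.

    bursts: dict mapping word -> set of post IDs that use the word
            (a hypothetical input shape, assumed for this sketch).
    Returns a list of groups, each a list of words, in input order.
    """
    groups = []  # each entry: (list of words, union of their post IDs)
    for word, posts in bursts.items():
        for words, group_posts in groups:
            if jaccard(posts, group_posts) >= threshold:
                words.append(word)
                group_posts |= posts  # in-place union keeps the group current
                break
        else:
            # No existing group overlapped enough, so start a new meme.
            groups.append(([word], set(posts)))
    return [words for words, _ in groups]


# Made-up post IDs, purely for illustration.
bursts = {
    "arian": {1, 2, 3, 4},
    "sami": {1, 2, 3, 5},
    "horsemen": {10, 11, 12},
    "ablogalypse": {10, 11, 12},
    "contemplates": {20, 21},
}
print(merge_bursts(bursts))
```

With this sample data, 'arian' and 'sami' end up in one group, 'horsemen' and 'ablogalypse' in another, and 'contemplates' stays on its own. The greedy pass depends on input order and on the chosen threshold, so a real list would probably need some tuning.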
