daypop weblog

Daypop Top Word Bursts

Shortly after putting up the Top 40, I realized that plenty of memes making the rounds weren't accompanied by links. Either the meme was a topic of conversation with no link at all, or it had no single, authoritative link. Catching heightened word usage is a natural extension to the Top 40.
 
I kept the idea on the back burner until very recently, when I realized Joe Millionaire was the buzz, but there were no authoritative links to anchor the meme. There was the one link to the Joe Millionaire site but few bloggers linked to it when writing about the show (in spite of this, that link still made the Top 40).  
 
I went to sleep that night thinking about implementing what everyone now calls Word Bursts.  
 
The next morning (can you believe it?) I got emails about the New Scientist article on Word Bursts. I also read about it on Slashdot, and it eventually made the Top 40.
 
Well, that got me working this weekend on this: 
 
Daypop Top Word Bursts 
 
It's catching topics that don't have authoritative links. It also catches those that do. There are sample posts from weblogs so that you can get a decent idea of what each word burst relates to.

Comments disabled.


Clustering

That's a great idea. This page is just a first pass.

If I were to compile a stop list that included the word "contemplates" it seems like it'd be a huge stop list and something I'd have to maintain on a daily basis. I don't know what the solution is exactly.

February 24, 2003 at 11:51
posted by anon:Dan Chan


Nice work

You have beaten me to it.

Link 1, Link 2 and Link 3

February 24, 2003 at 6:27
posted by anon:Andre Restivo


Number of Searches

Oh, one more thing. How about including the number of searches that were performed on each word in the last day or so?

ex:

1. arian (26 searches today)
I don't know if Sami and Company are really in with IJ or...

February 24, 2003 at 4:57
posted by anon:Michael Fagan


Duplicates

Nice implementation. The only issue I see so far is with duplicates.

For example, the first two places now are 'arian' and 'sami', most of which come from the phrase 'Sami al-Arian'.

Also #3 and #5 are 'horsemen' and 'ablogalypse', which both come from the phrase 'Four Horsemen of the Ablogalypse'.

#4 and #9 are 'grammys' and 'critiquees' which are really the same meme.

So I think what it needs is to loop through each word burst and measure its overlap with each of the others. If two bursts have a high overlap, merge them into the same meme.
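
The overlap-and-merge idea could be sketched roughly like this. It is only an illustration, not how Daypop works: it assumes each burst word comes with the set of IDs of the posts that mention it, measures overlap as shared posts divided by total posts (Jaccard similarity), and uses a made-up threshold of 0.5. The names `jaccard` and `merge_bursts` and the sample post IDs are hypothetical.

```python
def jaccard(a, b):
    """Overlap between two sets of post IDs: shared posts / all posts."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_bursts(bursts, threshold=0.5):
    """Greedily group burst words whose post sets overlap heavily.

    bursts: list of (word, set_of_post_ids), already in rank order.
    threshold: assumed cutoff; would need tuning on real data.
    Returns a list of (words, post_ids) groups, keeping rank order.
    """
    groups = []
    for word, posts in bursts:
        for group in groups:
            if jaccard(group["posts"], posts) >= threshold:
                # Same meme: fold this word into the existing group.
                group["words"].append(word)
                group["posts"] |= posts
                break
        else:
            # No existing group overlaps enough: start a new meme.
            groups.append({"words": [word], "posts": set(posts)})
    return [(g["words"], g["posts"]) for g in groups]


# Hypothetical post IDs, purely for illustration.
bursts = [
    ("arian", {1, 2, 3, 4}),
    ("sami", {1, 2, 3, 5}),
    ("horsemen", {10, 11, 12}),
    ("ablogalypse", {10, 11, 12, 13}),
    ("contemplates", {20, 21}),
]
for words, posts in merge_bursts(bursts):
    print(", ".join(words), "-", len(posts), "posts")
```

Because the merge is greedy and goes in rank order, the highest-ranked word of a meme ends up first in its group, which gives a natural label for the merged entry.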

Actually, one more thing. #20 now is 'contemplates', which isn't very useful and you may want to add it to the list of stopwords (e.g. 'and', 'of', 'the').
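
For what it's worth, the stopword filter itself is the easy part. A minimal sketch, assuming the ranked burst words are available as a plain list; the `STOPWORDS` set and the `drop_stopwords` name are hypothetical:

```python
# Hand-maintained stop list; 'contemplates' added as suggested above.
STOPWORDS = {"and", "of", "the", "contemplates"}


def drop_stopwords(ranked_words, stopwords=STOPWORDS):
    """Return the ranked words with stopwords removed, order preserved."""
    return [w for w in ranked_words if w.lower() not in stopwords]


print(drop_stopwords(["arian", "sami", "horsemen", "contemplates"]))
```

The hard part is deciding what belongs in the list, which this sketch does not address.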

All that being said, I think it's great already.

February 24, 2003 at 4:55
posted by anon:Michael Fagan