daypop weblog

Sunday, September 30, 2001

Need to fix Top 40

Looks like there's no easy way around it. I'll have to differentiate between new links from new weblogs and new links from old weblogs if I want the Top 40 list to be useful and keep adding weblogs to the index. The problem right now is that I've just added a whole bunch of weblogs to the index and the Top 40 is swamped with ringsurf and webring links.

Let me fix this...

Update 11:30: Hopefully this is fixed. Now I can add weblogs like crazy and not have to worry about the Top 40 list going out of whack.
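
A minimal sketch of the idea in Python, assuming the fix amounts to ignoring links from weblogs that only just joined the index. The function name, the seven-day threshold and the example hosts are all made up for illustration; this is not the actual Daypop code.

```python
from collections import Counter
from datetime import datetime, timedelta


def count_citations(citations, indexed_since, now, min_days_indexed=7):
    """Count links, skipping citations from weblogs added too recently.

    citations: iterable of (weblog_url, link_url) pairs seen as "new" links.
    indexed_since: dict mapping weblog_url -> datetime it joined the index.
    """
    cutoff = now - timedelta(days=min_days_indexed)
    counts = Counter()
    for weblog, link in citations:
        added = indexed_since.get(weblog)
        # A weblog that was just added makes every link on it look new,
        # so its links are left out of the ranking for now.
        if added is None or added > cutoff:
            continue
        counts[link] += 1
    return counts


# Hypothetical data: one long-indexed weblog and one added today.
now = datetime(2001, 9, 30, 11, 30)
indexed_since = {
    "old.example.com": datetime(2001, 8, 29),
    "new.example.com": datetime(2001, 9, 30),
}
citations = [
    ("old.example.com", "theonion.com"),
    ("new.example.com", "theonion.com"),
    ("new.example.com", "ringsurf.com"),
]
top = count_citations(citations, indexed_since, now)
```

With this data only the citation from the older weblog counts, so the webring link never reaches the list.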

posted at 10:38 - 0 comments

Saturday, September 29, 2001

Two processors

I installed a second processor into the server today. I was worried about being offline for too long while tooling around with the guts of the server so I replicated the index on an old 333MHz machine I have.

It was ridiculously easy to recompile the SMP (multiprocessor) kernel under FreeBSD for the main server.

The other machine was online for no more than 10 minutes while I opened up the main server, plugged in another PIII, a passive heat sink and a VRM. Walk in the park.

Of course, the processor was not the bottleneck before, so this second processor is for pure geek factor.

I'm wondering, is there a cheap load balancing device out there? It seems like a device that would automatically switch the HTTP requests to my old machine anytime the main server went down would be pretty useful. Something simple, not a full blown Cisco solution. Something like a Sonicwall SOHO of load balancers.

posted at 15:25 - 1 comments

Feedback Loop

OK. It's starting to happen. This site has a blogdex feed. While this doesn't create a direct feedback loop with Daypop, it does push up the score on the Blogdex Top 15 links.

These RSS feeds need to be differentiated from normal links.

Otherwise, the scores on Blogdex and Daypop are inaccurate.

Paul Nakada suggested using redirection to "break the chain". There are a couple ways to do this. The first creates links in the feeds that are in the form of

www.daypop.com/redirect/link_id

I could then take all the requests and redirect them to the proper URLs.

The general purpose solution for the feeds would involve links in the form of

www.daypop.com/redirect?url=blah.com/whatever.html

That way all sites that create feeds could use it (as long as I don't check that the URL is part of the Top 40). But this is open to abuse from sites that don't create feeds. Also, it makes other sites dependent on Daypop for their feeds to work.

There are other ways to differentiate links, like a begin ignore link meta tag and an end ignore link meta tag. Any link between those two tags is uh, ignored. But this is not enforceable by the feed creators. Users will have to place these tags around their feeds.

I'm leaning towards this solution:

www.daypop.com/redirect?url=blah.com/whatever.html

except I'll check that the specified URL is in the Top 40 database. This makes the URL human readable for when users mouseover the link and it displays in the status line.
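
Roughly, the check could look like the Python sketch below. It assumes the Top 40 database can be treated as a plain set of URLs; `resolve_redirect` and the example URLs are hypothetical, not an existing Daypop interface.

```python
from urllib.parse import urlsplit, parse_qs


def resolve_redirect(request_url, top40_urls):
    """Return the target URL for a redirect request, or None to refuse it."""
    query = parse_qs(urlsplit(request_url).query)
    targets = query.get("url")
    if not targets:
        return None
    target = targets[0]
    # Only redirect to URLs that are in the Top 40, so other sites
    # cannot use this as a general-purpose redirector.
    return target if target in top40_urls else None


top40 = {"blah.com/whatever.html"}
ok = resolve_redirect(
    "http://www.daypop.com/redirect?url=blah.com/whatever.html", top40
)
refused = resolve_redirect(
    "http://www.daypop.com/redirect?url=spam.example/x.html", top40
)
```

A request for a URL outside the set gets no redirect, which is what closes the door on abuse.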

Any other suggestions?

posted at 9:36 - 1 comments

Friday, September 28, 2001

Minus/Negation: What do you call it?

Now you can do searches like this:

lincoln -president

To find pages which mention Lincoln but DON'T have the word president. This is useful for narrowing your search.

It's another way of adding meaning to a word (adding context?).

Lincoln could refer to any number of things -- Abe, memorial, park, town car. The typical way to add context is to enter more search words to more fully describe what you're looking for.

lincoln park
(not to be confused with linkin park)

Using the minus prefix on terms you don't want strips away search results more conservatively. It retains more meanings of the word Lincoln.

Anything not dealing with President Lincoln including park, street, center, cars, etc...:

lincoln -president -abraham -memorial

These are not great examples, but you get the idea.
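
For the curious, here is a small Python sketch of how a minus prefix can be handled. It assumes simple whitespace tokenizing and whole-word matching, which is almost certainly cruder than what a real engine does; the function names are invented for the example.

```python
def parse_query(query):
    """Split a query into required terms and excluded (minus-prefixed) terms."""
    required, excluded = [], []
    for token in query.lower().split():
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            required.append(token)
    return required, excluded


def matches(text, query):
    """True if text has every required term and none of the excluded ones."""
    words = set(text.lower().split())
    required, excluded = parse_query(query)
    has_required = all(term in words for term in required)
    has_excluded = any(term in words for term in excluded)
    return has_required and not has_excluded


pages = [
    "President Lincoln gave a speech",
    "Lincoln Park is in Chicago",
    "a used Lincoln town car",
]
hits = [page for page in pages if matches(page, "lincoln -president")]
```

Here `hits` keeps the park and the town car and drops only the page that mentions the president.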

posted at 14:27 - 0 comments

The Onion

Even on my screen at 1920x1200, the bar graph for The Onion in the Top 40 list goes off the edge of a maximized window!

posted at 1:07 - 0 comments

Thursday, September 27, 2001

Changed the ranking function for the Top 40

Still experimenting. This just means that the max score can be a leftover from a different ranking function and that sometimes the green and light red that indicate increases and decreases in score will be wrong.

I'm thinking the massive interest in The Onion as a meme is a rare occurrence. Otherwise, if the bar indicating the score is frequently that long I'll have to scale down the score.

posted at 21:34 - 0 comments

People are going nuts for The Onion!

The bar graph on the Top 40 goes off the edge of the window for most people (not me, hehe, 1920x1200). And that's not counting the alternate URLs for the issue and the individual article links. I read the issue front to back (do we need new idioms for new mediums?) and loved it. They timed the release of the issue just right. Any earlier and it probably wouldn't have been as funny. "Holy Fucking Shit!" That tag line is classic because it's exactly what most of us were thinking. Well, at least that was what I was thinking...

posted at 11:38 - 0 comments

Wednesday, September 26, 2001

A problem with Top 40

You may notice a couple Top 40 links that aren't really currently popular memes, like Salon.com's home page. This is because I just added a bunch of weblogs to the index and I don't make a distinction between new links from new sites and new links from old sites. So to the ranking function, it looks as if a bunch of webloggers have suddenly expressed interest in Salon.com.

posted at 0:45 - 0 comments

Tuesday, September 25, 2001

Inbreeding

Strange title for a post but Paul Nakada's comment about "gene pools" got me thinking again about the problems of a central place for popular links e.g. Daypop Top 40. The feedback loop I described for an RSS feed exists even without RSS feeds. Webloggers who find cool links on the Top 40 list will post those links on their sites, causing the link to rise in ranking. If everyone becomes too reliant on these lists, then the variety you find in weblog links diminishes. Doesn't it?

On the other hand, I guess Paul Nakada's point was that the more ranking tools there are out there, the larger the "gene pool" of "ranked" links becomes, helping to keep variety from going the way of the Dodo bird.

posted at 20:49 - 7 comments

Advanced Search

I've done the work necessary to implement preferences saved in a cookie. This means you'll be able to specify the number of results to display per page (right now the default is 10) up to 50.

Also, you'll be able to limit the time range further by specifying a minimum age for the pages. For example, you'll be able to search for pages that are "at least two days old" "up to two weeks old".
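
As a sketch, the age window comes down to one comparison per page. The Python below assumes each page carries the time it was cached; `in_age_range` and the sample pages are illustrative only.

```python
from datetime import datetime, timedelta


def in_age_range(cached_at, now, min_age, max_age):
    """True if the page's age falls between min_age and max_age, inclusive."""
    age = now - cached_at
    return min_age <= age <= max_age


now = datetime(2001, 9, 25, 17, 0)
pages = {
    "fresh": datetime(2001, 9, 25, 9, 0),    # 8 hours old
    "recent": datetime(2001, 9, 20, 17, 0),  # 5 days old
    "stale": datetime(2001, 9, 1, 17, 0),    # 24 days old
}
# "at least two days old", "up to two weeks old"
selected = [
    name
    for name, cached_at in pages.items()
    if in_age_range(cached_at, now, timedelta(days=2), timedelta(weeks=2))
]
```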

Right now, I'm wondering if this time range is something that should be saved as a preference. For example, if preferences were set during the attacks on Sept. 11, you'd probably set them to a narrow range, say from zero hours to three hours, because 1) there is a huge amount of news related to the attacks, 2) you need to filter out only the most current news and 3) you are checking the news constantly. If you subsequently forget that you set this range, search results now would seem to turn up very few results. Daypop might look broken.

What to do?

Display the time range being searched? OR

Don't save the time range as a preference?

I suppose regardless of what I decide on, I'll need to modify the Citations link to return all pages from now to four weeks ago, and I'll also need to change the Top 40 weblog citations link to do the same.

I don't want to put preferences in until this gets resolved. Any suggestions?

posted at 17:12 - 0 comments

Monday, September 24, 2001

Daypop Top 40

I guess I might as well provide a link to it here.

When I first started working on Daypop, a Top 40 page was part of the plan -- a complement to the search engine. Instead of just putting something together quickly early on, I was waiting until the data structures stopped changing before computing popular links. I wasn't in any hurry.

Then, of course, Blogdex came out and I scrapped the Top 40 page. I launched Daypop without it.

But my friends who had heard about the Top 40 idea still wanted it. "Wouldn't it be redundant?" I thought. "Well, I guess my ranking algorithm will most likely differ from Blogdex." I knew it would only take a couple hours to put together, most of it creating the HTML, so I put up the page last week and told my friends about it.

I've been using it a lot this past week. Now, I've decided I might as well open it up to the world. The current ranking function has not been extensively tested (I just changed it right now) so who knows what'll happen in the next few days.

What are other webloggers talking about? The Daypop Top 40 is a list of the links that are popular in the weblogging community.

This is not meant as a replacement for Blogdex. Blogdex ranks all links. It's got an All-Time links section. It has a more comprehensive feature set. I only post the Top 40 links that are timely and I use a different ranking function.

The way links are ranked, they have a tendency to shoot to the top if they are popular and just as quickly, they can fall from grace. If a story breaks and the interest is not immediate, if it takes time to build, then the link is less likely to reach the top. I might still mess with the ranking function in the future.

How to read the bar graph:

The "EQ" looking bar under every link displays the relative score of each link. There is a lot of information that you can get from the bar. The is the current score. The farther to the right it is, the higher the score. Any bright green in the bar denotes the increase in score from the last update. Any light red in the bar denotes a decrease in score from the last update. Right now, the list is updated every six hours. The light, light gray shows the maximum score that the link has reached.
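
One way to read the description above, sketched in Python: the bar is a row of segments whose widths come from the current score, the score at the last update, and the maximum. The left-to-right layout and the segment labels are a guess at how the pieces fit together, not the page's real rendering code.

```python
def bar_segments(current, previous, maximum):
    """Split a score bar into (label, width) segments, left to right."""
    low, high = min(current, previous), max(current, previous)
    segments = [("score", low)]
    if current > previous:
        # Gained since the last update.
        segments.append(("green", current - previous))
    elif current < previous:
        # Lost since the last update.
        segments.append(("red", previous - current))
    if maximum > high:
        # Room between the bar and the highest score ever reached.
        segments.append(("gray", maximum - high))
    return segments


rising = bar_segments(current=80, previous=60, maximum=80)
falling = bar_segments(current=50, previous=70, maximum=90)
```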

There is also a Citations link. Click on this to see all the weblogs that cited the link.

Once again, here's the Daypop Top 40. Hope it comes in handy!

posted at 17:07 - 8 comments

Friday, September 21, 2001

Cached Date

One small change: the TITLE attribute of the Cached link now holds the exact date and time that the page was cached. So moving your mouse pointer over the Cached page link on most browsers will pop up this date and time.

posted at 12:09 - 0 comments

Wednesday, September 19, 2001

Cached Pages

The URL generated in the search results for a cached page includes a timestamp (time=YYYYMMDDHHMMSS). If you happened to use this URL (in a link on your page, for example) and the old version of the page was replaced by a newer one in the database, that URL became obsolete.

I've changed it so that if the time specified isn't in the database, or if no time is specified, Daypop returns the latest cached version of the page.

For example:
http://www.daypop.com/cached?url=http://www.danchan.com/

This brings up the latest version of my weblog. No need to even put a time=20010919102421 (or whatever) in the URL.
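
The fallback can be sketched in a few lines of Python, assuming the cached versions of one page are kept in a dict keyed by their YYYYMMDDHHMMSS timestamp. `find_cached` and the sample data are hypothetical.

```python
def find_cached(versions, time=None):
    """Return the cached copy for `time`, or the latest one as a fallback.

    versions: dict mapping "YYYYMMDDHHMMSS" timestamps to cached page content.
    """
    if not versions:
        return None
    if time in versions:
        return versions[time]
    # Fixed-width timestamps sort chronologically as plain strings.
    return versions[max(versions)]


versions = {
    "20010918090000": "older copy",
    "20010919102421": "latest copy",
}
exact = find_cached(versions, "20010918090000")
missing = find_cached(versions, "20010101000000")
no_time = find_cached(versions)
```

An exact timestamp still returns that version; anything else falls through to the newest copy.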

posted at 17:01 - 0 comments

Tuesday, September 18, 2001

German translation of Submit page

I got an email from a Daypop user who translated the Submit page text properly to German. Very nice of him to help me out.

Unfortunately, I'm still getting hundreds of irrelevant submissions... *sigh* I've made the change to the German text so hopefully that will deter new auto-submission programs from targeting Daypop.

I'd love to Internationalize Daypop... well, later. I get sidetracked sometimes...

I18N = short for Internationalization, "I" followed by 18 characters followed by "N".
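
The same abbreviation rule in Python, as a tiny illustration (`numeronym` is just a name picked for the example):

```python
def numeronym(word):
    """First letter, count of the letters in between, last letter."""
    if len(word) <= 2:
        return word
    return word[0] + str(len(word) - 2) + word[-1]


short = numeronym("Internationalization").upper()
```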

posted at 3:31 - 0 comments

Google and Recent News

There are a few webloggers out there that are wondering what's the point in Daypop if Google seems to be committed to refreshing its index more frequently.

Like this post from Sore Eyes:
Since Google have recently announced that they're going to be refreshing their index more frequently precisely so as to catch news stories, I'm not sure how long Daypop's advantage will hold...

The thing is, Daypop only returns current news. While a search on Google for "World Trade" will include the World Trade Organization website, a search on Daypop will return only news articles. I'm hoping there's still some value to be had from this filtering.

posted at 2:58 - 1 comments

Monday, September 17, 2001

About.com

There's an article covering Daypop on About.com. There's a bit of confusion regarding the Citations link and "link:" queries, which do work as you'd expect (they are not just text searches). Other than that it's an excellent review.

posted at 22:11 - 0 comments

Link search back online

I copied the link information from backup into the live server a couple hours ago and restarted the indexing, so link searches should be back on track again.

Just FYI: I'm most likely going to shut down the main server this Saturday in order to install a second CPU. During this time, I'll have my development machine taking care of performing search requests using the data snapshot from 4:00AM Saturday morning. This pretty much means that Saturday, Daypop will not be crawling and indexing, but this should resume on Sunday or late Saturday night.

posted at 18:10 - 0 comments

Bad bad bad

I decided to make a change to the live server without testing it and I forgot to remove this one line that said: DELETE ALL THE LINK INFORMATION!

HAHA! Hooo boy...

This means link searches won't be fully functional. Some sites may show up as new pages get indexed and new link information is added.

I've got a backup from 4:00AM this morning but it'll take some time to get it back in the database...

posted at 12:22 - 0 comments

Sunday, September 16, 2001

I added a new feature

When searching for links, the fragment of text displayed for each page will now include the link text, bolded. Some sites still need to be recrawled, then indexed for this to happen, but about half the weblogs have already been reindexed with the new information.

Try it out: search for link:daypop.com

If you find bugs, email me.

posted at 21:37 - 0 comments

Saturday, September 15, 2001

Anonymous Posting for we::blog

The code had been done for a while but I wasn't sure if it was a good idea. This weblog can be the testbed for it.

posted at 13:03 - 1 comments

Thursday, September 13, 2001

Limit your search to recently cached pages

I added another feature, which may or may not migrate over to a to-be-created "Advanced Search" page, that allows you to narrow your search to a more recent timeframe. Using the new pull-down menu, you can limit your search to pages cached in the last hour, the last three hours, etc. up to two weeks.

posted at 3:41 - 1 comments

Google's cache...

...and indexing seems to have rolled back to sometime in early August. A search for "daypop" on Google now turns up nothing. Looks like they've turned off their daily indexing of weblogs for now.

posted at 0:42 - 0 comments

Wednesday, September 12, 2001

Nostradamus meme

There's been a lot of interest in this meme.

"In the City of God there will be a great thunder, Two brothers torn apart by Chaos, while the fortress endures, the great leader will succumb" , "The third big war will begin when the big city is burning" - Nostradamus 1654

Kevin Fox of www.fury.com exposes the hoax and says:

It's just another urban legend, only operating on Internet time: Shaped, propagated, and publicly debunked, all on the very same day.

posted at 1:37 - 0 comments

Tuesday, September 11, 2001

World Trade Center

I was woken up way too early this morning by a friend of mine who called to say planes had crashed into the twin towers of the World Trade Center. I turned on the TV and got online to check the NY Times, LA Times, etc. but many of the sites were inaccessible, most likely due to extreme demand (the slashdot effect).

Searching on Daypop for "World Trade Center" turned up links that were still available. I put up a Breaking News message to help people out with their searches.

I've been glued to the TV since. Television remains the most current source of reliable, available news.

I'm still in shock over this. Some part of me still hasn't accepted it, the fact that it could happen, that it has happened.

posted at 12:51 - 0 comments

I just went through a review cycle...

...and not a single URL looked to be a news site or weblog! Instead my site submissions list for the last three hours looks like the yellow-pages for German porn sites. This is bad. Somehow one (or several) search engine placement companies must've found out about Daypop.

If an "inappropriate" site sneaks by the review process, please someone email me so I can remove it.

posted at 0:56 - 0 comments

Welcome to the shameless self-promotional segment of our show!

Here are several excerpts of Daypop coverage from webloggers:
____________________

Research Buzz
www.researchbuzz.com

Tuesday, September 4, 2001

Daypop Ba Du Ba Dop

Iff'n you can't get enough of that bloggy goodness, check out Daypop at http://www.daypop.com...

...For thoughtful fun run a query on the "News" pages, then try the "Weblogs" pages. It's amazing the different flavors you get...

Absolutely worth a look.
____________________

Ghost Rocket
web.pitas.com/astroman

Daypop
Tuesday, September 4, 2001

Could it be a metaweblogsearchthingy that seems to work and is up to date? It can't be! Linkwatcher (which seems to have stopped crawling my page; bummer.), the bloated and slow as molasses Weblogs.com (think you can connect in less than five minutes? good luck! BTW, thanks for all the spam, Dave!) and Blogdex (no search capability) don't fit my obsessive weblog searching needs.
____________________

enterfornone
www.enterfornone.net

Current events search engine
Posted by enterfornone on Wednesday, September 05 @ 22:51:19

Google is already scanning many news sites daily which makes it useful for looking for current events, however any search is still going to end up with the usual Google mismatches. Enter Daypop - a search engine designed specifically for searching for current events...
____________________

ToT :: Cock-a-hoop Over Coco Puffs
timothompson.com/journal.htm

stardate: September 5, 2001

...the ability to distinguish which type of content you want to receive results from (news or weblogs) is invaluable...
____________________

The Obvious?
www.blaven.demon.co.uk/weblog/blogger.html

Friday, September 07, 2001

Amazing!

Daypop is a very useful search of all the latest news, blogs etc on the web. It only searches sites that change regularily so if you want to read about something topical it's very useful.
____________________

Commentary
qbtm.tekscode.com/weblog

09^05 @01

DayPop is a cool idea. It's a search engine for current events and weblogs. So if you need to look up current events for school, this works a lot better than Google. Sweetness...
____________________

posted at 0:18 - 0 comments

Monday, September 10, 2001

Add a Daypop Search Box to your site

A couple people suggested I provide a service where sites could add a Daypop search box to their pages. This seemed like a great idea so I've added some instructions on this page for people who want to do so!

posted at 23:28 - 0 comments

Added German submission instructions

I translated using babelfish so most likely the German instructions are close to gibberish, but I'm hoping that some meaning can be gleaned and that fewer inappropriate sites get submitted.

Bitte...

posted at 17:24 - 0 comments

Swamped

I've been swamped with a deluge of several hundred German site submissions that are neither news sites nor weblog pages. At first as I reviewed each site in turn I would try and recognize the e-commerce or porn sites from the URLs and skip over them. Meaning, if the URLs wouldn't pass say, Google's SafeSearch or the Grandma test, you know these words, then I'd go on to the next URL without even reviewing the site.

After a while though, it became apparent that this strategy wasn't working, so I ended up running down the list trying to recognize URLs that might be news or weblog sites. This of course means I might've, probably definitely did, miss some relevant news sites and weblogs. Sorry, resubmit your site if it doesn't show up in the search results in 24 hours.

I'm going to have to reconsider this whole site submission thing...

posted at 13:45 - 0 comments

Submissions

I'm starting to receive Asian language page submissions. Unfortunately, Daypop can't index these pages yet but I'm compiling another list which will get indexed once this functionality exists.

Also, this page is great one-stop shopping for D.A. Blyler articles. I'm not including it in the index because I don't consider it timely, but if you've heard about "The Seven Vices of Highly Creative People" or "The Seven Habits of Sensitive, Celibate Men", this is the page where you can find Blyler's other articles.

posted at 2:21 - 0 comments

Redesigned The Story of Daypop

Not much of a redesign really. alex63's comment on "self referentiality" got me thinking...

...about making this weblog, which is about a search engine, resemble the search results from that search engine. Well, at least a little...

posted at 1:43 - 0 comments

Crawling Userland...

It looks like those forty weblogs are starting to get crawled and indexed. I noticed that a search for "please slow down" now only has 20 or so pages with the error message.

posted at 1:37 - 0 comments

Sunday, September 9, 2001

Thanks, Userland!

Those guys at Userland responded quickly! They're going to unblock their sites for Daypop!! Those forty or so weblogs that are affected should get indexed soon enough... This is good good news.

posted at 12:15 - 0 comments

Saturday, September 8, 2001

How does she do it?

Check out this page... Click the Remote Cam. How do you get the "window" to look like that? What is that? Is that really a window? Why does it move when moved instead of tracing an outline like a normal browser (IE/Win98) window? Hmmm...

OK, I'm still cruising around the weblog world adding stuff that I find to the index. It's addicting.

posted at 16:01 - 3 comments

A couple problems with weblogs.com and editthispage.com

I noticed one weblogger from editthispage.com had submitted her weblog several times. Since her weblog was one of the "originals" that I had in my index (I started with a list of about 100 weblogs in the testing stages) I checked out why it wasn't making it into the index.

Do this search: please slow down

You'll notice all the weblogs.com and editthispage.com pages have a message to the effect of:

Please Slow Down. Your crawler is hitting our servers too hard.

I checked my scan logs and requests are generally at least a minute apart and more likely even further apart because this only affects about 40 pages which get re-indexed every 24 hours.

Anyway the message is misleading because it's not really the case.

I remember way early on, I encountered this problem with Doc Searls' weblog and I emailed Userland to ask what was up. They said they didn't allow crawling of Userland sites at all. I promptly forgot about it and the implications. At the time, I figured OK, so I can't spider Doc's page. I didn't even think about all the other people who had their weblogs hosted by Userland.

I've emailed Userland again and asked them to help me out on this. Hopefully, I can get weblogs.com and editthispage.com webloggers indexed in the near future.

posted at 15:39 - 0 comments

Sampras/Agassi match

I saw the highlights last night and it was amazing, the intensity and concentration of those two. Some awesome aggressive play.

I just realized I could find out what others were thinking about the match...

So I did a search through the weblogs for "sampras agassi".

posted at 0:17 - 0 comments

Friday, September 7, 2001

Reaching critical mass

I think even though I only have about 2000 (at most) weblogs in the index, I've got most of the popular news linking kind, because when I compare the number of pages I get with a query like

link:www.theonionavclub.com/avclub3731/avfeature_3731.html

I get almost as many pages as Blogdex. I tried this with a couple of the top popular links on Blogdex and Daypop is pretty close on all of them.

posted at 20:44 - 0 comments

Thursday, September 6, 2001

What is the international language of the web?

It just occurred to me. It doesn't matter what language a site is in, a link URL is universal.

posted at 15:17 - 0 comments

Bug Fixed

SLIGHTLY TECHNICAL: I got an email today about a bug in Daypop where the cached page URL wasn't being correctly escaped so cached pages for phrase searches wouldn't show up. Make sense? Ah, forgehduboudit.

Short version: It's fixed.
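
Slightly longer version, as a Python illustration of this class of bug. The `q` parameter name and the way the two values are combined are assumptions made for the example; only the `cached?url=` part appears on the real site.

```python
from urllib.parse import urlencode

page_url = "http://www.danchan.com/"
phrase = '"world trade center"'

# Broken: quotes and spaces in the phrase are pasted into the URL unescaped.
broken = "http://www.daypop.com/cached?url=" + page_url + "&q=" + phrase

# Escaped: urlencode percent-encodes each value before joining them.
fixed = "http://www.daypop.com/cached?" + urlencode({"url": page_url, "q": phrase})
```

The broken form contains raw quotes and spaces, which is the kind of thing that makes a phrase-search link fall apart.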

posted at 12:30 - 0 comments

You can get a sense of the "web"

...of blogs by clicking around on Citations. The center, the core of this web is pretty obvious -- the number of pages returned gets close to 100 (I'm not including sites like Kuro5hin), but as you get farther out towards the edges this number decreases. I haven't found a "dead end" yet.

Not too long ago though, I wasn't even a part of the blog web! And most likely, I'll return from whence I came as soon as the links to danchan scroll off the bottoms of pages reporting news of Daypop ;-P

posted at 12:17 - 0 comments

Citations link

I've been holding off putting in a Citations link because the link searches were so slow. Well, I was looking through the link search code seeing if I was doing something wrong, something stupid, something bad, something I could fix so I wouldn't have to redo the entire link indexing system and there it was! Like a bright shining beacon of incompetence!

It was still there! Something which I thought I changed, I thought I had fixed! Did I dream that I had changed it? WTH is going on?

Anyway, link searches are much faster now. Of course, I want to make them faster still, but that's gotta wait for Version 2. For now, Citations links appear only for weblogs, so webloggers can click around for link love.

posted at 12:09 - 0 comments

Wednesday, September 5, 2001

Tired of all the submitted sites that are selling stuff

Way more than half of the sites being submitted are not weblogs or news sites. Not even close. I guess I need to make it real obvious on the submission page that only weblogs and news sites will be accepted.

One site was powered by Blogger but was a static page of information. It wasn't timely. I guess Blogger was used to make editing the text easy? In any case, using Blogger does not necessarily make a site a weblog.

posted at 19:51 - 0 comments

What about non-news sites?

There are plenty of online magazines that don't offer news or information that is recent or timely. Sites like Webmonkey which have tutorials but no news, per se. Should I index these so that new tutorials show up in search results? It seems like this is not what people will want to use Daypop for...

Then there are sites like dpreview.com which reviews digital cameras. Are new cameras (and their reviews) considered news? Will people use Daypop for this purpose? Should Daypop index sites like this?

posted at 17:24 - 1 comments

What to do about Meta?

I'm running across sites that offer headlines stripped from other sites, which in turn strip them from sources like say Yahoo, which in turn gets them from AP or Reuters. At what point does indexing become redundant? When I started I avoided adding these meta-meta-headline sites, but now some people are submitting them. I don't want the search results to become "bloated" and "redundant".

posted at 17:20 - 0 comments

link:searches

They're slow. Way too slow. Search should be (or seem) instantaneous but I used a different, simpler method for link searching that ended up moving like a slug. Ugh. Now, the question is, do I fix it now or wait until Daypop Version 2? I was supposed to put a feature/algorithm freeze on the production Daypop but I can't stand how slow the link searches are!

No bugs have cropped up. Yet. But when it rains, it pours, so should I be messing around with the entire way links get indexed?

posted at 12:24 - 0 comments

Done Surfing for Now

I think I'm all blogged out. I'm done surfing for weblog sites. There are well over a thousand weblogs now. I emailed Evan Williams of Blogger the day I launched Daypop and asked if he happened to have a list of active Blogger weblogs but I haven't received a reply yet. I'm sure he gets a TON of email and maybe a weblog list is not information that he would want to give out. So I've been reviewing sites and adding them in myself in addition to taking submissions. I've gotta say, I kinda liked doing it this way since I've gotten a chance to check out a lot of weblogs and I got to put off programming Daypop stuff.

One other thing I noticed while surfing sites: some sites take forever to load. If it took more than say 15 seconds (sometimes less if I got impatient) to load then I just moved on to the next weblog. I noticed most editthispage sites took forever to load. I check out submitted sites regardless of how long it takes to load (as long as it eventually does).

posted at 3:28 - 2 comments

Adding Sites

I've noticed a couple things while adding sites either from submissions or from my manual crawling of weblogs. First off, I've surfed a ton of weblogs. So many in fact that I figure I've covered a good fraction of the logs out there. I don't know, I'll throw out a number, 20%?

There are a lot of good weblogs out there.

In terms of design, in terms of writing, in terms of good links, in terms of good pictures/photos. Geez, all these talented people out there sharing their thoughts, ideas, talent with everyone else. I was pretty single minded in my pursuit of adding weblogs to Daypop's index, so unfortunately I didn't write down some of the better ones out there. You'll just have to trust me and surf around. Better yet, search around on Daypop for topics that interest you and click on weblogs that you don't recognize.

I had a hard time keeping focused because a lot of times the writing would start to suck me in or the link looked like it could be interesting, I could have spent days exploring. I've done it before.

One thing about submissions: some of them didn't make it in for one reason or another. Since Daypop is a current events search engine, the submitted site should be current and updated on a regular basis. In other words, company sites don't count. If you've got a question about your site, email me.

Another thing about submissions: check to see that your site isn't in the index yet. Actually I find it amazing that I can remember a lot of sites that I've seen and avoid duplication, but already I've seen one example where I've failed. Duplication is only a problem when the URLs differ.

I've tried hard to "normalize" URLs which means they follow certain rules. Like if www.yoururl.com exists I'll use the www version and not the yoururl.com version. Unless you've got something like myblog.blogspot.com. I'd keep that URL the way it is. www.myblog.blogspot.com is too long. Also, if the URL is something.something.com/index.htm, I'll check to see if something.something.com/ works and use that instead. In other words, I tried (or should have tried) to use the simplest www URL possible.

I get tired and some sites break these rules (my bad).
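
Those rules could be automated along these lines. This is a Python sketch under a few assumptions: URLs are given without the scheme, "plain domain" means exactly one dot in the host (which would misfire on hosts like example.co.uk), and `exists` is a stand-in for a real HTTP check.

```python
def normalize(url, exists):
    """Pick the simplest working form of a weblog URL.

    url: host plus optional path, without the scheme
         (e.g. "yoururl.com/index.htm").
    exists: callable that reports whether a candidate URL loads.
    """
    host, _, path = url.partition("/")
    # Prefer the www form for plain two-part domains, but leave hosted
    # subdomains such as myblog.blogspot.com alone.
    if not host.startswith("www.") and host.count(".") == 1:
        if exists("www." + host + "/"):
            host = "www." + host
    # Drop a trailing index.htm(l) when the bare directory works too.
    if path in ("index.htm", "index.html") and exists(host + "/"):
        path = ""
    return host + "/" + path


# Pretend these are the URLs that load; a real check would fetch them.
live = {"www.yoururl.com/", "myblog.blogspot.com/", "something.something.com/"}
exists = live.__contains__

a = normalize("yoururl.com", exists)
b = normalize("myblog.blogspot.com", exists)
c = normalize("something.something.com/index.htm", exists)
```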

I have checksums on all the pages but I currently don't check them for duplication. That comes in the next version of the engine.
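
A checksum-based duplicate check might look something like this Python sketch. SHA-256 and the in-memory dict are choices made for the example; the post does not say which checksum the engine stores or how.

```python
import hashlib
from collections import defaultdict


def find_duplicates(pages):
    """Group URLs whose page content hashes to the same checksum.

    pages: dict mapping URL -> page content (str).
    """
    by_checksum = defaultdict(list)
    for url, content in pages.items():
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        by_checksum[digest].append(url)
    return [sorted(urls) for urls in by_checksum.values() if len(urls) > 1]


pages = {
    "www.yoururl.com/": "hello",
    "yoururl.com/": "hello",
    "other.example/": "different",
}
duplicates = find_duplicates(pages)
```

This only catches byte-for-byte copies; two URLs serving the same page with a different ad or timestamp would still slip through.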

When manually surfing weblogs, I've found a ton of them are dead, or redirected multiple times to dead ends, or just abandoned. Ghost Blogs. Eerie to see writing that just stops sometime in '99. It's still there to view, it still exists, it's just unloved and abandoned. No explanation why. A lot of the times the last entry doesn't say something like "This is my last post." One dead end that I remember was a blank page with "goodfuckingbye".

My rules for adding sites that I encounter while surfing:

The site had an entry in the last couple weeks.
The site's last entry doesn't say something to the effect of "Wow, it's been such a long time since I last wrote!" You'd be surprised how many blogs have this.
The site has an entry at least once a week.
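
Written out as code, the three rules might look like the Python sketch below. The exact thresholds (14 days, 7 days) and the phrase match are loose interpretations of rules that are applied by eye, and `worth_adding` is a made-up name.

```python
from datetime import date, timedelta


def worth_adding(entry_dates, last_entry_text, today):
    """Apply the three rules to a weblog's entry dates and latest entry."""
    dates = sorted(entry_dates)
    if not dates or today - dates[-1] > timedelta(weeks=2):
        return False  # no entry in the last couple of weeks
    if "long time since" in last_entry_text.lower():
        return False  # the author admits to having been away
    # Require that no two consecutive entries are more than a week apart.
    gaps = [later - earlier for earlier, later in zip(dates, dates[1:])]
    return all(gap <= timedelta(weeks=1) for gap in gaps)


today = date(2001, 9, 5)
active = worth_adding(
    [date(2001, 8, 20), date(2001, 8, 27), date(2001, 9, 3)],
    "Checked out a new search engine today.",
    today,
)
ghost = worth_adding([date(1999, 5, 1)], "goodfuckingbye", today)
```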

Of course, if you submit a site, it can break these rules. I figure submitting a site means you care enough about it to want it in here. I'll pretty much add any submitted site that's a weblog.

Which brings me to my next topic: some sites are hard to categorize. What constitutes a weblog and news site? The line is being blurred. I love it when the answer is obvious.

An online journal >> weblog
news articles >> news

But what about headlines in a weblog format? With opinion thrown in? For example, what are Kuro5hin and Slashdot? And why have I categorized them differently?

I don't know why. It's a gut feeling generally, or maybe a mistake. But if you've got questions about the category your site falls under, email me.

Further complicating things are sites in languages other than English, which unfortunately is the only language I know. Some submitted sites I've categorized as a weblog merely because it looks like a weblog in say, Swedish. Maybe it's not. Whoops! Correct me if I'm wrong. If you speak other languages and find errors in categorization, help me out!

Another thing, frames. Ugh. What can I do about pages that use these? I can provide a link to the first page with frames and index it but what of pages that that page links to? There is no unique URL for those pages, is there?

My mission: There's a lot to be done still. If you guys keep using it, I'll keep making it better.

posted at 1:05 - 0 comments

Tuesday, September 4, 2001

The Beginning

I launched Daypop on Aug. 29, 2001. I think it was around dinner time, six o'clock or so. A short message went up on and I added a side bar permanent "ad" on www.danchan.com.

Later that night, this weblog

http://snotverjeppie.blogspot.com

linked to Daypop and I got my first referral traffic!

After 24 hours there wasn't much traffic coming to Daypop still. Dave Winer, who wrote the article on Google indexing weblogs daily (ACK!) that nearly gave me a heart attack, linked to Daypop. Thanks to Dave a lot more people were checking out Daypop.

At this point in time, I realized I might have made a mistake in launching Daypop since people were starting to use it and I was going away to Las Vegas for the weekend.

Since Daypop hadn't truly been tested, except by me, any number of things could go wrong while I was away.

Plus, my ISP told me they were doing some massive overhaul of the system and that there was a possibility it could go down. Of course, they said not to worry... Uh huh.

But that sure as hell wasn't going to keep me from going to Vegas! I figured this was a real world test. Let's see how this sucker holds up over the long weekend... Well, at least that's how I justified my trip in my mind.

Vegas was a blast!

Everything was fine when I got back. Since last night, I've added over 200 more sites, submitted by people over the weekend, and I've created an email account for Daypop feedback.

I'll need to link Daypop to this weblog at some point in the near future.

posted at 11:04 - 0 comments
