Google killer this ain't (yet)
Jul. 29th, 2008 09:27 am

I don't remember the first search engine I used, probably Yahoo or Lycos, but I remember the first one I switched to: HotBot. It didn't proclaim that it reached more of the web than the others, or that it was faster, but it let you use keywords like NOT or AND in your queries, so you could filter the results and get more relevant ones.
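For anyone who never used boolean queries, here's roughly what they buy you. This is a toy sketch in Python, my own illustration and nothing like HotBot's actual implementation: keep the pages that contain every AND term and drop any page containing a NOT term.

    def matches(page, required, excluded):
        # Keep a page only if it contains every AND term
        # and none of the NOT terms.
        words = set(page.lower().split())
        return (all(t in words for t in required)
                and not any(t in words for t in excluded))

    pages = [
        "buy art prints and posters from our online shop",
        "personal blog of an unrelated namesake",
    ]

    # The query: blog AND namesake NOT poster
    print([p for p in pages if matches(p, ["blog", "namesake"], ["poster"])])
    # -> ['personal blog of an unrelated namesake']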
This week there's a new kid on the search engine block: Cuil. It boasts the biggest index on the web, and makes this claim:

"Rather than rely on superficial popularity metrics, Cuil searches for and ranks pages based on their content and relevance."
This is such a '90s fallacy. No-one cares if you have the deepest search there is. Internet search engines are all about relevance: returning the best first page of results possible.
I've just tried out a not very scientific search on my name, and Cuil fails on several counts.
Not using popularity tests means that the first 5 pages consist of dozens of pages from an artist selling prints in many online shops. Yes, these pages use his name more than I do, but rewarding that just leads to spammers creating pages full of keywords. This isn't helped by the fact that there doesn't seem to be a way to exclude words; otherwise it would be easy to filter out results containing "poster" or "print", the way a Google query with -poster -print would.
I had to scroll through to page 6 to find anything that wasn't a shop page, and by then the spammers had started to creep into the results.
The policy of caching all websites seems to have had an interesting effect. Google never managed to cache all of my old Blogger site, so I had a look for that. Cuil doesn't seem to have it at all. But it does have a whole bunch of spammers who have copied the text from my site. Neat, I didn't know I was so popular.
I thought I'd try to see how many there were, filtering out the other sites that might accidentally match my query. So I searched on the page title (without any punctuation), which is also the first words on the page, and which was included in the results Cuil had already returned ... nothing.
What?
It's right there on the page. 90% of the results Cuil already showed me had that exact text. How can they not find any matches?
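Which makes no sense, because even the crudest lookup imaginable, a literal substring scan over every cached page, finds an exact title match. A toy sketch (the URLs and page text here are made up for illustration):

    cache = {
        "http://spam-blog.example.com/copy":
            "my old post title followed by the rest of my stolen text",
        "http://shop.example.com/prints":
            "art prints and posters for sale",
    }

    # An exact-phrase query is just a substring test against each page.
    query = "my old post title"
    print([url for url, text in cache.items() if query in text])
    # -> ['http://spam-blog.example.com/copy']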
It's not the size of your cache. It's what you do with it that counts.