September 9, 2005
Tagging and Unstructured Data
Posted at 3:54 PM
There's been quite a bit of hype around tagging over the past year, especially around putting structure around user-created data (notably at Flickr and Technorati).
At the SES show in New York, I ranted about tagging and the fact that little has been done to proactively deal with the obvious and inevitable problem of spam. Web pages back in the mid-nineties all had facilities to be tagged with metadata, and the first search engines attempted to use this functionality; thus began search engine spam. If I had a nickel for every starry-eyed idealist pointing to tagging as the thing that will save the world, I'd be able to fund my own blog search engine...
In fact, the founders here were four-fifths of the core team behind the Open Directory Project which, at the end of the day, was an attempt to create a system to categorize web pages in a scalable way. The political system behind the editors at the Open Directory was a big part of whatever success it has had, and the lack of a moderated system is the reason that many similar efforts have not gotten any major traction.
After talking to Ofer Ben Shachar on my webradio show about his company, Raw Sugar, I had some other thoughts around tagging. The big takeaway I got from talking to Ofer was that he saw a huge opportunity in providing value-added search around the tagging done by individuals on their own data: load in your bookmarks, del.icio.us, Flickr tags and whatever, and get better search results. And, next, if you can put some sort of ordering in, drawn from what people have explicitly ordered within their own tagging, you'll have built something of value.
Now, there's a lot of stuff those guys are going to add to their site at Raw Sugar (hopefully, at the very least, the ability to explicitly tag who your friends are within their service), and I'm not sure if they've cracked the code here. But I'm recalling the power of gathering unstructured data when I first started using Ryze (one of the original social software services), where you could put in anything and have it "group" with other people who entered the same thing. Ryze would also put some lightweight directory around these entries (education, place of work, etc.), and this worked rather well to create ad hoc communities.
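To make the "grouping" idea concrete, here is a minimal sketch in Python of how free-text entries might be matched up under a lightweight directory. This is not how Ryze actually worked under the hood (I have no idea what their implementation looked like); the function name `group_by_entry`, the sample profiles, and the normalization rule (lowercase, collapse whitespace) are all assumptions made for illustration.

```python
from collections import defaultdict


def group_by_entry(profiles):
    """Group users who typed the same free-text entry under the same field.

    `profiles` maps a user name to {field: [entries]}. Both the shape of
    this data and the normalization below are illustrative assumptions.
    """
    groups = defaultdict(set)
    for user, fields in profiles.items():
        for field, entries in fields.items():
            for entry in entries:
                # Normalize so "State  University" and "state university" match.
                normalized = " ".join(entry.lower().split())
                groups[(field, normalized)].add(user)
    # An ad hoc community needs at least two people who entered the same thing.
    return {key: sorted(users) for key, users in groups.items() if len(users) > 1}


# Hypothetical sample data.
profiles = {
    "alice": {"education": ["State University"], "place of work": ["Acme"]},
    "bob": {"education": ["state  university"], "place of work": ["Initech"]},
    "carol": {"place of work": ["acme"]},
}

communities = group_by_entry(profiles)
for (field, entry), members in sorted(communities.items()):
    print(field, "/", entry, "->", members)
```

Nobody had to agree on a taxonomy up front: the only structure is the field name, and the communities fall out of whatever people happened to type.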
So -- on one side, you're looking at some pretty powerful mojo in enabling people to self-categorize at least their own data and then leveraging that effort, as well as putting some structure around it. On the other, you are going to have some major problems unless you moderate or put a reputation system in place (which Ofer mentions in passing, as well).
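As a rough illustration of what a reputation system buys you, here is a small Python sketch that weights each tag by the reputation of whoever applied it. This is only one possible scheme, not anything Raw Sugar or anyone else has described; the names `weighted_tag_scores` and `default_reputation`, the reputation values, and the choice to give unknown users zero weight are all assumptions.

```python
from collections import defaultdict


def weighted_tag_scores(taggings, reputation, default_reputation=0.0):
    """Score each (item, tag) pair by the summed reputation of its taggers.

    `taggings` is a list of (user, item, tag) tuples. A user counts once per
    item and tag, so repeating the same tag does not raise its score.
    """
    scores = defaultdict(float)
    seen = set()
    for user, item, tag in taggings:
        key = (user, item, tag.lower())
        if key in seen:
            continue  # ignore duplicate votes from the same user
        seen.add(key)
        scores[(item, tag.lower())] += reputation.get(user, default_reputation)
    return dict(scores)


# Hypothetical sample data: two known users and one unknown account.
taggings = [
    ("alice", "photo1", "sunset"),
    ("bob", "photo1", "sunset"),
    ("spammer", "photo1", "cheap-pills"),
    ("spammer", "photo1", "cheap-pills"),
]
reputation = {"alice": 1.0, "bob": 0.5}

scores = weighted_tag_scores(taggings, reputation)
print(scores)
```

The hard part, of course, is not the arithmetic but deciding where the reputation numbers come from, which is exactly the moderation and community problem all over again.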
I'm usually a skeptic about leveraging communities (having run the ODP community for a couple of years, it's a lot harder than you might think), but at least people are beginning to think about some of the problems. Ofer's a fun guy to talk to as well. The interview is on the webmasterradio.fm site.
