October 2, 2004
RSS Penetration Among Online Publishers
Posted at 11:20 AM
Here at Gnomedex there's a tremendous amount of excitement and buzz about RSS. But in my conversations about Topix.net's news crawling, I'm finding some misconceptions about how widespread RSS syndication is among traditional online publishers.
Only 7% of the sources Topix.net crawls have XML feeds. I'd estimate that only a few hundred of the top 3,000 newspapers we crawl have RSS support. The rest we obtain with a news crawler that is good at finding articles on news sites while leaving behind the ads and navigation sidebars. It's low maintenance, so we don't have to change anything every time a site redesigns its HTML.
Even for sites which offer feeds, we'll generally continue to crawl the human-readable version. We've seen sites where the RSS broke but no one at the paper seemed to notice, or cases where the RSS was out of sync with the human-viewable web content. By crawling both we get full coverage of the content available.
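As a rough illustration of why crawling both sources helps, here is a minimal Python sketch that compares the article links in a feed against the URLs found by crawling the same site. This is not Topix.net's actual crawler: the function names, the sample feed, and the example.com URLs are all hypothetical, and it assumes a plain RSS 2.0 feed with no XML namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed for an imaginary newspaper site.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Example Gazette</title>
  <item><title>Council vote</title><link>http://example.com/a</link></item>
  <item><title>School budget</title><link>http://example.com/b</link></item>
</channel></rss>"""


def feed_links(rss_xml):
    """Return the set of <link> values from every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    links = set()
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            links.add(link.strip())
    return links


def compare_coverage(rss_xml, crawled_urls):
    """Merge feed and crawl results and report where they disagree."""
    try:
        in_feed = feed_links(rss_xml)
    except ET.ParseError:
        # A broken feed contributes nothing; the crawl still covers the site.
        in_feed = set()
    crawled = set(crawled_urls)
    return {
        "all": in_feed | crawled,                # full coverage from both sources
        "missing_from_feed": crawled - in_feed,  # feed is stale or incomplete
        "missing_from_crawl": in_feed - crawled, # crawler missed these pages
    }


if __name__ == "__main__":
    # URLs the (hypothetical) crawler found on the human-readable site.
    crawled = ["http://example.com/b", "http://example.com/c"]
    report = compare_coverage(SAMPLE_FEED, crawled)
    for key in ("all", "missing_from_feed", "missing_from_crawl"):
        print(key, sorted(report[key]))
```

A non-empty "missing_from_feed" set is the kind of signal that would flag a feed that has fallen out of sync with the site, and a feed that fails to parse simply falls back to crawl-only coverage.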
There are approximately 1,400 daily newspapers in the US, and over 2,600 weeklies. There are around 3,000 magazines, and thousands of radio and TV station websites. Not to mention the city government websites we crawl looking for local announcements.
Despite the enthusiasm around RSS, there is a long way to go before the bulk of this content will be available in feeds.
