~C4Chaos : (hyper)linker

Re: KB101: Be a Cunning Linguist

~C4Chaos said May 1, 2006, 11:41 AM:

 

here's another reason why it's cool to be a cunning linguist:

State of the Blogosphere, April 2006 Part 2: On Language and Tagging


Strong International Growth

Back in April 2005, Technorati started automatically tracking the primary language of each blog that we tracked. We did this so that we could easily allow people to filter out posts in languages other than their native language. This is available in a pull-down menu on every search results page. We also wanted to get some idea of where the worldwide growth of blogging was taking place, and what trends we could glean from the data.

There are three very important caveats in the data sets that I'm going to describe below. The first is that we are using automated language analysis software (based on languid), and it may have bugs, thus over- or undercounting a particular language or group of languages. We're going to be continually improving the capabilities of this software, but we are pretty confident in its ability to work reliably, especially over the large data sets that Technorati tracks (over 35 million blogs at this time, and over 1.2 million posts each day). Second, we believe that we are grossly undercounting the Korean blogosphere, mostly due to the fact that the largest Korean blog and hompy services (like Cyworld or Planet Weblog) are not being indexed by Technorati at this time. In addition, we believe that we're somewhat undercounting the French blogosphere, in particular because our indexing of skyblog is poor. We'd love to rectify this - if anyone at these (or other) blogging services is interested in being indexed, please drop me a line (dsifry@technorati.com). Last, Japanese bloggers appear to write shorter posts more often. This could be a result of blogging from mobile phones, and may be skewing the results, given that we are tracking the total number of posts in this analysis.

Another key point to remember is that language breakdown does not necessarily imply a particular country or regional breakdown. For example, Spanish and English are spoken in a large number of countries around the globe - and this analysis doesn't attempt to determine which country a blogger is writing from - only the primary language of her post.