I just finished another cut-n-paste-a-thon. In February, the focus was on scraping a couple of months’ worth of blog posts and pasting them into Many Eyes to create “real” tag clouds based on actual content instead of the tags the author assigned.

This time, I was scraping a month’s worth of Tweets. You can blame my natural curiosity.

I’ve said it before: someone needs to build an application that does this automatically. We need something that looks at our posts, tweet streams, and links and outputs things like tag clouds (based on what we’re writing), blog rolls (based on what we’re reading or linking to), and potentially pulls out things like resources.

How I did it

There was nothing glamorous about this. Ultimately, it was tons of cutting and pasting. I didn’t include timestamps or other people’s names. Yes, including other people’s names could have been interesting because it would show relationships, but since I wanted to compare tweets with blogs, I wanted to focus on the substance of the conversations, not the relationships. So I opened each person’s archive and pasted a month’s worth of tweets into a plain text file, which I then uploaded to Many Eyes. I must have forgotten that a month’s worth of tweets, especially for these people, is a hell of a lot of content.
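Part of the manual cleanup, dropping other people’s names before counting words, could be automated. Here is a minimal sketch in Python that produces the kind of word counts a tag cloud is built from. It assumes the tweets are already saved as plain text, one tweet per line, with timestamps removed. The stopword list, the regular expressions, and the `word_counts` function are illustrative assumptions, not how Many Eyes actually processes uploaded text.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stopword list (an assumption for illustration).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "for", "on", "rt"}

# @usernames are stripped so other people's names don't show up in the cloud.
MENTION = re.compile(r"@\w+")
# Links are stripped so URL fragments don't dominate the counts.
URL = re.compile(r"https?://\S+")
# A "word" here is any run of lowercase letters, digits, or apostrophes.
WORD = re.compile(r"[a-z0-9']+")


def word_counts(lines):
    """Count words across tweet texts, skipping mentions, links, and stopwords."""
    counts = Counter()
    for line in lines:
        text = URL.sub(" ", MENTION.sub(" ", line.lower()))
        for word in WORD.findall(text):
            word = word.strip("'")
            if word and word not in STOPWORDS:
                counts[word] += 1
    return counts


if __name__ == "__main__":
    tweets = [
        "Reading about tag clouds for blog posts http://example.com",
        "@someone tag clouds beat bookmarking tags",
    ]
    # Print the most frequent words, which would be the largest in a tag cloud.
    for word, n in word_counts(tweets).most_common(3):
        print(word, n)
```

To run it on a real archive, pass an open file instead of the sample list, e.g. `word_counts(open("tweets.txt"))`, where `tweets.txt` is a hypothetical filename.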

Who I chose

