I did this more as an academic exercise than anything else.
What if you wanted to easily download any user’s latest tweets? There are plenty of tools available for backing up your own tweets, but what if you found someone with lots of inspirational tweets you wanted to save?
I couldn’t find any software or tools that did what I wanted satisfactorily.
All I wanted was to save the text of their tweets. How hard could that be?
Here’s how I did it…
Go to
http://twitter.com/statuses/user_timeline/user.xml?count=100
and replace user with the Twitter username of the person in question. You can set count to whatever you want, but the feed doesn’t show more than the last week or so, so 100 is plenty.
This will display an XML file with all the data about the tweets. It’s about 125 lines of markup per tweet.
You should see each tweet, followed by </text> and a whole lot of code and text until the next tweet, which will start with <text>. So each tweet is formatted like this:
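If you’d rather build that address in a script than type it by hand, here is a minimal Python sketch. The endpoint pattern is copied straight from the URL above and may no longer be served by Twitter; the function name timeline_url and the username example_user are made up for illustration.

```python
from urllib.parse import quote, urlencode

# Endpoint pattern taken from the post; assumed, not verified to still be live.
BASE = "http://twitter.com/statuses/user_timeline"


def timeline_url(user: str, count: int = 100) -> str:
    """Return the timeline URL for `user`, asking for `count` tweets."""
    # quote() escapes anything in the username that is not URL-safe.
    return f"{BASE}/{quote(user, safe='')}.xml?{urlencode({'count': count})}"


print(timeline_url("example_user"))
# → http://twitter.com/statuses/user_timeline/example_user.xml?count=100
```

Paste the printed address into your browser and carry on with the steps that follow.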
<text>Text of the tweet</text>
There’s an easy way to strip out all that excess code, leaving just the tweets…
Use the Regular Expression (aka “RegEx”) feature in the find and replace dialog of Notepad++ (a free download if you don’t already have it).
Unfortunately, RegEx doesn’t work too well in Notepad++ when line breaks are involved. It took me quite a lot of time & frustration to work that one out!
So first use Extended match. Find
\n
and clear the replace box. Click Replace All, so you have all the text on one line (turn on View –> Word Wrap to make it easy to see everything).
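The same line-joining step can be sketched in Python for anyone who prefers a script to the find and replace dialog. The function name join_lines and the sample string are invented for illustration. It uses splitlines() rather than replacing \n alone, so Windows-style \r\n breaks are removed as well.

```python
def join_lines(text: str) -> str:
    """Remove every line break so the whole document sits on one line."""
    # splitlines() drops \n, \r\n and \r alike; join() glues the pieces back.
    return "".join(text.splitlines())


sample = "<text>Hello</text>\r\n<id>1</id>\n<text>World</text>"
print(join_lines(sample))
# → <text>Hello</text><id>1</id><text>World</text>
```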
Next, select Regular Expression match and put
</text>([-\w\d\s_<>/+:,?.&;’=]*)<text>
in the Find box. Hit Find Next to check it works. It should have selected everything between the closing </text> of one tweet and the opening <text> of the next tweet.
If it doesn’t, just remove the final <text> in the Find box and see which character is stopping the selection. Add that character between the = and ] and retry until the selection reaches the next tweet. Put the <text> back in and hit Find Next again. Does it work? Good.
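Hunting for the blocking character by eye gets tedious, so here is a rough Python sketch that reports the first character the bracket expression rejects. The character class is copied from the Find box above; the helper name first_blocking_char and the sample gap are hypothetical. One assumption to keep in mind: Python’s \w also matches accented letters, so it can be a little more forgiving than Notepad++.

```python
import re

# Character class copied from the pattern in the post.
ALLOWED = re.compile(r"[-\w\d\s_<>/+:,?.&;’=]")


def first_blocking_char(gap: str):
    """Return (index, char) of the first character the class rejects, or None."""
    for i, ch in enumerate(gap):
        if not ALLOWED.match(ch):
            return i, ch
    return None


# Made-up stretch of markup between two tweets.
gap = "</text><source>web (beta)</source><text>"
print(first_blocking_char(gap))
# → (19, '(')
```

In this sample the opening bracket is the culprit, so that is the character you would add between the = and ].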
Put
\n
in the replace box (so each tweet is displayed on a new line) and hit Replace All.
Voila! That should have cleared about 10,000 lines of code! You’ll have a few lines before and after the tweets, but otherwise you should have just the tweets, each on its own line.
If the Replace All leaves some code in between some of the tweets, just repeat the instructions above for when the Regular Expression doesn’t work. There’s just a random unaccounted-for character blocking the selection.
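For comparison, here is a hedged Python sketch of the same Replace All. The pattern is the one from the Find box with one deliberate change: the * becomes *? (non-greedy). Python’s * is greedy, so a tweet made up only of allowed characters could otherwise be swallowed into the match. The names GAP, sample and cleaned, and the three-tweet sample, are invented for illustration. Python’s \s matches line breaks, so joining the lines first is optional here.

```python
import re

# Everything between one tweet's </text> and the next tweet's <text>.
# *? stops at the nearest <text> instead of the furthest one.
GAP = re.compile(r"</text>([-\w\d\s_<>/+:,?.&;’=]*?)<text>")

# Made-up stand-in for the downloaded XML.
sample = (
    "<status><text>First tweet</text><id>1</id></status>\n"
    "<status><text>Second tweet</text><id>2</id></status>\n"
    "<status><text>Third tweet</text><id>3</id></status>"
)

# Replace each gap with a line break so every tweet gets its own line.
cleaned = GAP.sub("\n", sample)
print(cleaned)
# → <status><text>First tweet
# → Second tweet
# → Third tweet</text><id>3</id></status>
```

As in Notepad++, a little leftover markup remains before the first tweet and after the last one.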
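If chasing stray characters gets old, an XML parser avoids the problem altogether, since it reads the <text> elements directly instead of matching what lies between them. This is a different technique from the find and replace above, sketched with Python’s standard xml.etree.ElementTree. The sample document and its statuses/status layout are assumptions, not a copy of the real feed, and any other element that happens to be named text would be picked up too.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified stand-in for the downloaded file.
SAMPLE = """<statuses>
  <status>
    <id>1</id>
    <text>First tweet</text>
    <user><name>Someone</name></user>
  </status>
  <status>
    <id>2</id>
    <text>Second tweet &amp; a link</text>
    <user><name>Someone</name></user>
  </status>
</statuses>"""


def tweet_texts(xml_string: str) -> list:
    """Return the contents of every <text> element, in document order."""
    root = ET.fromstring(xml_string)
    return [el.text or "" for el in root.iter("text")]


for tweet in tweet_texts(SAMPLE):
    print(tweet)
# → First tweet
# → Second tweet & a link
```

A side benefit is that entities such as &amp; come out decoded.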
Hi Mike,
What are some of the reasons people would want to go to all the trouble of reading the history of a person’s tweets, besides a business checking into the history of a potential employee?
Syl, if I found someone who tweeted a whole lot of tips, links & recommendations for an area I’m interested in (like SEO), I might want to download them all and read them together, particularly if there were a lot of links that I wanted to save and read later. I find Twitter’s interface tough to concentrate on with everything that’s going on around it.
Another use would be for when people tweet-write a book. Yes, it’s been done several times. This would be perfect for that situation!
I didn’t even think from the point of view of a business manager trying to snoop, but there are many easier ways to do that (which I won’t share here lol!).
Or import the XML into Excel as a data source, and the code becomes helpful rather than something to discard.
If you’re a developer, here’s a way to bypass the 200-tweet max and get 3200 tweets: http://webapps.stackexchange.com/questions/13732/download-all-the-tweets-from-a-twitter-user
Or if you’re desperate, let me know and I’m sure I can make it happen.