I did this more as an academic exercise than anything else.
What if you wanted to easily download any user’s latest tweets? There are plenty of tools available for backing up your own tweets, but what if you found someone with lots of inspirational tweets you wanted to save?
I couldn’t find any software or tools that did what I wanted satisfactorily.
All I wanted was to save the text of their tweets. How hard could that be?
Here’s how I did it…
Replace user with the Twitter username of the person in question. You can set count to whatever you like, but the feed won’t return more than about the last week of tweets, so 100 is plenty.
This will display an XML file with all the data about the tweets, roughly 125 lines of code per tweet.
You should see each tweet, followed by </text> and a whole lot of code and text until the next tweet, which starts with <text>. So each tweet is formatted like this:
<text>Text of the tweet</text>
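If you’d rather script this than use an editor, an XML parser can read those <text> elements directly and also decodes entities such as &amp; back into plain characters. This is a minimal Python sketch that swaps in a parser for the find-and-replace approach below. It assumes the file is saved as well-formed XML and that the tweet bodies are the <text> elements. The function name and the two-tweet sample are hypothetical.

```python
import xml.etree.ElementTree as ET


def tweets_from_xml(xml: str) -> list[str]:
    """Return the content of every <text> element, in document order.

    Assumption: tweet bodies are stored in <text> elements, as described above.
    """
    root = ET.fromstring(xml)
    # iter("text") walks the whole tree, so nesting depth doesn't matter;
    # the parser has already turned entities like &amp; back into "&".
    return [el.text or "" for el in root.iter("text")]


# Hypothetical two-tweet sample in the shape described above.
sample = (
    "<statuses>"
    "<status><text>First tweet &amp; more</text><id>1</id></status>"
    "<status><text>Second tweet</text><id>2</id></status>"
    "</statuses>"
)
for tweet in tweets_from_xml(sample):
    print(tweet)
```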
There’s an easy way to strip out all that excess code, leaving just the tweets…
Use the Regular Expression (aka “RegEx”) search mode in the Find and Replace dialog of Notepad++ (it’s free to download if you don’t already have it).
Unfortunately, RegEx doesn’t work too well in Notepad++ when line breaks are involved. It took me quite a lot of time & frustration to work that one out!
So first use Extended match. Find
and clear the Replace box. Click Replace All, which puts all the text on one line (turn on View → Word Wrap so you can see everything).
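In script form, this step just deletes the line breaks. This is a sketch that assumes the Extended-mode search targets the line-break characters (\r\n on Windows, \n elsewhere). As in the editor, a tweet that itself contains a line break will have the words on either side joined.

```python
def join_lines(text: str) -> str:
    """Delete every line break so the whole file sits on one line.

    Assumption: the Extended-mode Find string matches line breaks.
    """
    # Stripping "\r" and "\n" separately covers both Windows ("\r\n")
    # and Unix ("\n") line endings.
    return text.replace("\r", "").replace("\n", "")


print(join_lines("<status>\r\n  <text>Hello</text>\n</status>"))
```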
Next, select Regular Expression match and put
in the Find box. Hit Find Next to check that it works. It should select everything between the closing </text> of one tweet and the opening <text> of the next tweet.
If it doesn’t, remove the final <text> from the Find box and see which character stops the selection. Add that character between the = and ] and retry until you get a long enough selection. Then put the <text> back in and hit Find Next again. Does it work? Good.
in the replace box (so each tweet is displayed on a new line) and hit Replace All.
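The same find and replace can be written with Python’s re module. This sketch swaps in a lazy .*? for whatever list of characters the Find box used. The lazy wildcard accepts any character and stops at the nearest following <text>, so no stray character can block it. The pattern and function name are illustrative, not the exact expression used in the steps above.

```python
import re

# Illustrative pattern, not the exact expression from the steps above.
# From a closing </text>, match lazily up to the nearest following <text>.
# ".*?" accepts any character, and DOTALL lets "." match line breaks too,
# so this also works without first joining the file onto one line.
BETWEEN_TWEETS = re.compile(r"</text>.*?<text>", re.DOTALL)


def one_tweet_per_line(xml: str) -> str:
    """Replace the markup between consecutive tweets with a newline."""
    return BETWEEN_TWEETS.sub("\n", xml)


one_line = (
    "<statuses><status><text>First tweet</text><id>1</id></status>"
    "<status><text>Second tweet</text><id>2</id></status></statuses>"
)
print(one_tweet_per_line(one_line))
```

The first and last lines of the output still carry leftover markup, just as they do in Notepad++.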
Voilà! That should have cleared about 10,000 lines of code! You’ll still have some leftover code before the first tweet and after the last one, but otherwise you should have just the tweets, each on its own line.
If Replace All leaves some code between some of the tweets, repeat the troubleshooting steps above for when the regular expression doesn’t match. A stray character you haven’t accounted for is blocking the selection.
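You can trim the leftover code at either end by cutting everything up to the first <text> and everything from the last </text> onward. This is a minimal sketch. The helper trim_edges is hypothetical, and it assumes the one-tweet-per-line text described above.

```python
def trim_edges(text: str) -> str:
    """Drop everything before the first <text> and after the last </text>.

    Hypothetical helper; assumes tweets are wrapped in <text>...</text>.
    """
    start = text.find("<text>")
    end = text.rfind("</text>")
    if start == -1 or end == -1 or end < start:
        return text  # Tags not found as expected; leave the text untouched.
    return text[start + len("<text>"):end]


leftovers = (
    "<statuses><status><text>First tweet\n"
    "Second tweet</text><id>2</id></status></statuses>"
)
print(trim_edges(leftovers))
```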
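So why does one stray character break the match? The instruction to add characters between the = and ] suggests that the Find pattern used a character class, which is a bracketed list of allowed characters. With a class like that, the match fails as soon as the gap between two tweets contains anything that isn’t on the list. The sketch below uses a hypothetical starting class (word characters, whitespace, and < > / =), not the exact pattern from the steps above.

```python
import re


def gap_pattern(extra_chars: str = ""):
    """Compile </text>[...]*<text>, where [...] lists the characters allowed between tweets."""
    # Hypothetical base class: word characters, whitespace and < > / =
    # re.escape keeps added characters such as ] or - literal inside the class.
    allowed = r"\w\s<>/=" + re.escape(extra_chars)
    return re.compile(r"</text>[" + allowed + r"]*<text>")


# The colon in this hypothetical timestamp is not in the base class.
gap = "</text><created_at>12:00</created_at><text>"
print(gap_pattern().search(gap))  # None: ":" blocks the match
print(gap_pattern(":").search(gap) is not None)  # True once ":" is allowed
```

Each time a new stray character turns up, you have to add it to the class. A pattern such as </text>.*?<text> avoids the problem because it accepts any character.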