Driven crazy by Python strings

2009-04-21 00:41:17 (Last Modified: 2009-04-21 00:41:17)

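I was inspired by a "Xiaonei auto-poster" snippet on 发芽网 (Fayaa). Recently I have wanted to modify that code so it can partly sync my Twitter status to Xiaonei. I say "partly" because I don't know how to make it work like Facebook's Twitter app. That app updates the Facebook status almost at the same moment (less than 10 seconds later) as a tweet is sent. I'm also embarrassed to admit that I'm not familiar with HTTP programming. I still haven't worked out what has to be submitted to change a Xiaonei status. So all I can do is adapt the original code and work with blog posts instead.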

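Even so, I ran into quite a few problems. My plan is to copy a WordPress plugin I used to use (I think it was called twitter-post). It collected a day's tweets into a single post, and I want to send that post to Xiaonei. I started with the python-twitter package, which makes fetching tweets very easy.

The documentation for its GetUserTimeline() method mentions a since parameter. If you pass it an "HTTP-formatted time", you get only the messages posted after that moment. I figured this would give me all of the previous day's tweets, after some processing to drop today's. But I had no idea what an "HTTP-formatted time" was. Yesterday afternoon I tried nearly every time format I could find online. Whatever I tried, it only returned the default 20 most recent messages.

The funny part is that python-twitter ships with runnable tests. The test case for the since parameter uses a Twitter account with fewer than 10 tweets in total. GetUserTimeline() therefore returns every tweet, and of course they all fall inside the time range the test expects. Out of options, I asked on the project's Google Group. Today I can't even find the post I submitted. Did the developers think I was trolling?!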
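For reference, an "HTTP-formatted time" normally means the date format used in HTTP headers (RFC 1123 style), for example `Tue, 21 Apr 2009 00:41:17 GMT`. Whether python-twitter's since parameter actually honoured it is another matter. Filtering the fetched tweets locally sidesteps the question. Below is a minimal Python 3 sketch that does both under some assumptions:

- Tweets arrive as `(created_at, text)` pairs.
- `created_at` uses Twitter's classic `Mon Apr 20 16:41:17 +0000 2009` layout.
- "Yesterday" is meant in UTC+8.

The helper names `http_date` and `tweets_from_day` are hypothetical.

```python
from datetime import date, datetime, timedelta, timezone
from email.utils import format_datetime

# Assumed layout of Twitter's created_at field, e.g. "Mon Apr 20 16:41:17 +0000 2009"
TWITTER_TIME = "%a %b %d %H:%M:%S %z %Y"


def http_date(moment):
    """Format a timezone-aware datetime as an RFC 1123 HTTP date in GMT."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


def tweets_from_day(tweets, day, tz):
    """Keep the (created_at, text) pairs whose date, seen in tz, equals day."""
    kept = []
    for created_at, text in tweets:
        moment = datetime.strptime(created_at, TWITTER_TIME).astimezone(tz)
        if moment.date() == day:
            kept.append((moment, text))
    return kept


if __name__ == "__main__":
    beijing = timezone(timedelta(hours=8))  # assumed local time zone (UTC+8)
    # Start of 20 April 2009 in Beijing, formatted as an HTTP date
    print(http_date(datetime(2009, 4, 20, tzinfo=beijing)))  # Sun, 19 Apr 2009 16:00:00 GMT
    sample = [  # made-up tweets
        ("Mon Apr 20 16:41:17 +0000 2009", "00:41 on 21 April in Beijing"),
        ("Mon Apr 20 03:00:00 +0000 2009", "11:00 on 20 April in Beijing"),
    ]
    for moment, text in tweets_from_day(sample, date(2009, 4, 20), beijing):
        print(moment.isoformat(), text)
```

Only the second sample is kept. The first one was posted at 16:41 UTC on 20 April, which is already 21 April in UTC+8.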

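Since that didn't work, I took a step back. First I would post the 20 most recent tweets I can get and see how it looks. Only after that would I put in the effort to parse all of the previous day's tweets myself.

I wrote the code quickly, but it failed when I tested it. Any text I type into the program gets posted fine, but text fetched from Twitter won't post, and Python gives no error at all. I tried the results from the python-twitter API. I also downloaded Twitter's XML file and parsed it myself, and those results wouldn't post either. I had both programs print the strings just before submitting and compared them by eye. They look essentially the same. I'm about to lose my mind.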
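One check that fits this symptom: `print` can make two strings look identical even when their contents or types differ, and `repr()` exposes the difference. `xml.dom.minidom` also has a relevant quirk. Calling `toxml()` on a text node serialises it again, so `&`, `<` and `>` come back as entities. The node's `.data` attribute holds the plain characters instead. The Python 3 sketch below shows both effects on a made-up tweet. The XML shape and the helper name `tweet_text_node` are assumptions, and this is a debugging aid, not a confirmed diagnosis of the failure described above.

```python
from urllib.parse import urlencode
from xml.dom import minidom

# A made-up tweet in roughly the shape of Twitter's XML timeline (assumed).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<statuses type="array">
  <status>
    <text>Tom &amp; Jerry 你好</text>
  </status>
</statuses>""".encode("utf-8")


def tweet_text_node(xml_bytes):
    """Return the text node inside the first <text> element."""
    doc = minidom.parseString(xml_bytes)
    return doc.getElementsByTagName("text")[0].firstChild


node = tweet_text_node(SAMPLE)
escaped = node.toxml()  # serialised XML: "&" comes back as "&amp;"
plain = node.data       # the character data itself

# repr() shows exactly what is inside each string.
print(repr(escaped))  # 'Tom &amp; Jerry 你好'
print(repr(plain))    # 'Tom & Jerry 你好'

# urlencode encodes str values as UTF-8 before percent-encoding them.
print(urlencode({"body": plain}))  # body=Tom+%26+Jerry+%E4%BD%A0%E5%A5%BD
```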

What should I do next? So far I can only think of these options:

Keep pestering the python-twitter people. After all, their test case is the one with the problem.

Humbly ask 半瓶墨水, who published the "Xiaonei auto-poster" snippet, for advice.

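Last night yegle showed interest in the code on Twitter. If I'm not mistaken, he is one of the moderators of the Linux board on some BBS. His Python skills are probably far better than those of a newbie like me, so maybe a single remark from him will set me straight...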

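Here is my current code, which still doesn't work. This is the version where I parse Twitter's XML myself. The python-twitter version has a somewhat shorter first half, but otherwise the two are much the same, and both have the same problem.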

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Python 2 code (cookielib, urllib2, print statements).

from xml.dom import minidom

# Run this first: wget http://twitter.com/statuses/user_timeline/liufeng.xml
tw_url = 'http://twitter.com/statuses/user_timeline/liufeng.xml'

xmldoc = minidom.parse('liufeng.xml')
status = xmldoc.firstChild  # root element holding the <status> elements
tweet = status.childNodes[1]

output = ""
count = 1
flag = 0

# In the indented XML file, whitespace text nodes alternate with <status>
# elements, so flag skips every other child node.
for tweet in status.childNodes:
    if flag == 1:
        created_at = tweet.childNodes[1].firstChild.toxml()
        text = tweet.childNodes[5].firstChild.toxml()
        output += str(count) + " " + text + " " + created_at + "\n"
        count += 1
        flag = 0
    else:
        flag = 1

import cookielib
import urllib2
import urllib
import time

# Log in to Xiaonei; the cookie jar keeps the session for the next request.
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
exheaders = [("User-Agent", "Mozilla/4.0 (compatible; MSIE 7.1; Windows NT 5.1; SV1)"),]
opener.addheaders = exheaders
url_login = 'http://xiaonei.com/Login.do'
body = (('email', '@gmail.com'), ('password', ''))
req1 = opener.open(url_login, urllib.urlencode(body))
print "Login should be successful.\n"

# Post a new blog entry.
body = {'relative_optype': 'publisher', 'blogControl': '1'}
url_post = 'http://blog.xiaonei.com/NewEntry.do'
# Title: "What I've (said/done/thought) lately" plus the current time
title = '最近我（说了/做了/想了）什么 %s' % time.asctime()
xt = text.encode('utf-8')
print output
output = output.encode('utf-8')
print " fucked\n\n" + output
body['title'] = title
body['body'] = output
req2 = opener.open(url_post, urllib.urlencode(body))
print urllib.urlencode(body)
```
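For comparison, here is a Python 3 sketch of the same two steps, written under stated assumptions. It looks up each tweet's fields by tag name among direct children, so it needs neither the flag trick nor fixed `childNodes` indices. It reads `.data` rather than `toxml()`, and it builds the blog-post request with every form field encoded as UTF-8.

A few things are assumptions rather than checked facts:

- The helper names `child_text`, `summarize_timeline`, `build_post` and `make_opener` are hypothetical.
- The form field names and URL are copied from the code above, not verified against Xiaonei.
- The XML layout (`<statuses>` containing `<status>` elements with `<created_at>` and `<text>` children) is assumed.

Nothing is sent over the network, and the sketch is not claimed to fix the failure described above.

```python
import urllib.request
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from xml.dom import minidom


def child_text(element, tag):
    """Text of the first direct child element named tag (assumed to exist)."""
    for child in element.childNodes:
        if child.nodeType == child.ELEMENT_NODE and child.tagName == tag:
            return child.firstChild.data if child.firstChild else ""
    raise KeyError(tag)


def summarize_timeline(xml_bytes):
    """Number each tweet as 'N text created_at', one per line, like the loop above."""
    root = minidom.parseString(xml_bytes).documentElement
    statuses = [n for n in root.childNodes
                if n.nodeType == n.ELEMENT_NODE and n.tagName == "status"]
    lines = ["%d %s %s" % (i, child_text(s, "text"), child_text(s, "created_at"))
             for i, s in enumerate(statuses, start=1)]
    return "".join(line + "\n" for line in lines)


def build_post(url, title, body):
    """Prepare (but do not send) a form POST whose fields are UTF-8 encoded."""
    fields = {"relative_optype": "publisher", "blogControl": "1",
              "title": title, "body": body}
    data = urlencode(fields, encoding="utf-8").encode("ascii")
    return urllib.request.Request(url, data=data)


def make_opener():
    """An opener that keeps cookies between requests, like the urllib2 version."""
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(CookieJar()))
```

After logging in with the same opener, `make_opener().open(build_post(url_post, title, summarize_timeline(xml_bytes)))` would send the post. Here `url_post` and `title` are the values from the Python 2 code and `xml_bytes` is the downloaded timeline file. This assumes the site still accepts the form as written.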