I've recently been doing some RSS parsing, so here is a note on how to parse XML documents that aren't UTF-8 encoded in Go. The key is the CharsetReader field on encoding/xml's Decoder. Here's the code:
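```go
package rss_test

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"testing"

	"github.com/yujiahaol68/rossy/rss"
	"golang.org/x/net/html/charset"
)

// notUTF8rss is a non-UTF-8 RSS fixture defined elsewhere in this test package (not shown).

func Test_notUTF8(t *testing.T) {
	r := rss.New()

	// Note: don't use xml.Unmarshal(). It gives you no way to set CharsetReader,
	// so any document whose XML declaration names a non-UTF-8 encoding fails.
	d := xml.NewDecoder(bytes.NewReader([]byte(notUTF8rss)))

	// CharsetReader is only called when the declared encoding is not UTF-8,
	// so UTF-8 feeds still decode normally. charset.NewReaderLabel takes an
	// encoding label such as "GB2312" and returns a reader that yields UTF-8.
	// (charset.NewReader expects a Content-Type value rather than a bare label.)
	d.CharsetReader = charset.NewReaderLabel

	if err := d.Decode(r); err != nil {
		t.Fatal(err)
	}

	for _, item := range r.ItemList {
		fmt.Printf("* %s\n%s\n", item.Title, item.Link)
	}
}
```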