Bashpodder and Poorly-Written XML Files…
I use Bashpodder to handle my auto-download-through-XML needs (that’s “podcasting” to most of you, and is different from auto-downloading-under-program-control, which is how most of my audio files come in), and discovered a major problem with badly formatted XML files published by WGBH Boston: the XML files have few or no returns in them, which causes Bashpodder to fail to find the <enclosure> containers. (Bashpodder as written can find only one enclosure per line, and since these files have everything on one line, it finds only one enclosure in the whole feed.)
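To see why one line means one enclosure, here is a minimal sketch of a line-oriented extractor. The greedy sed pattern is an assumption standing in for Bashpodder's own extraction, and the function name, feed and example.org URLs are made up. Fed a one-line feed with two enclosures, it prints a single URL (with this greedy pattern, the last one).

```shell
#!/bin/sh
# Hypothetical line-oriented extractor (an assumption, standing in for
# Bashpodder's own): the greedy leading .* means only the LAST url="..."
# on each line survives, so one line yields at most one URL.
extract_urls_linewise() {
    sed -n 's/.*url="\([^"]*\)".*/\1/p'
}

# Made-up feed with two enclosures and everything on one line.
feed='<rss><channel><item><enclosure url="http://example.org/a.mp3" type="audio/mpeg"/></item><item><enclosure url="http://example.org/b.mp3" type="audio/mpeg"/></item></channel></rss>'

# Prints only http://example.org/b.mp3; a.mp3 is never seen.
printf '%s\n' "$feed" | extract_urls_linewise
```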
To fix the problem, I added immediately after:
file=$(wget -q $podcast -O - |
…the following snippet:
sed 's/<enclosure/\n<enclosure/g' - |
…which puts a newline right before every <enclosure> container, so each one starts its own line. It appears to cause no ill effects on XML files that are written properly, and only affects the very few (in my case, one) that are structured so poorly. I’m sure those who are better at un*x can find a more efficient solution, but this one appears to work.
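A minimal sketch of the fix in action, using a hypothetical greedy extractor as a stand-in for Bashpodder's and a made-up one-line feed. The split step here is written with a backslash followed by a literal newline, which POSIX sed accepts. The \n form in the snippet above does the same job in GNU sed, but other seds may not read \n in a replacement as a newline.

```shell
#!/bin/sh
# Hypothetical greedy extractor (an assumption): at most one URL per line.
extract_urls_linewise() {
    sed -n 's/.*url="\([^"]*\)".*/\1/p'
}

# The fix: start a new line before every <enclosure. A backslash followed
# by a real newline is the POSIX way to put a newline in a sed replacement.
split_enclosures() {
    sed 's/<enclosure/\
<enclosure/g'
}

# Made-up feed with two enclosures and everything on one line.
feed='<rss><channel><item><enclosure url="http://example.org/a.mp3" type="audio/mpeg"/></item><item><enclosure url="http://example.org/b.mp3" type="audio/mpeg"/></item></channel></rss>'

# Prints both URLs, a.mp3 then b.mp3.
printf '%s\n' "$feed" | split_enclosures | extract_urls_linewise
```

On a properly formatted feed where <enclosure> already starts a line, the split step only adds an empty (or whitespace-only) line in front of it, which the extractor ignores.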
Unfortunately, since I neither send mail to Gmail nor accept mail from it, I can’t inform the developer (“linc dot fessenden at gmail dot com”), so if any of you know the guy, let him know, huh?

November 13th, 2005 at 6:14 pm
I have not seen the particular XML files of which you speak. However, I am something of an XML expert (I have written three XML courses, and I work with XML in several different contexts), and I suspect you are placing the blame in the wrong place. There is no need for carriage returns or linefeeds in “properly” written XML files: to an XML parser, the tags define the elements, not the carriage returns.
Yes, yes, the containers are what define the elements (the same is true in HTML and every other hypertext language I know of), and Bashpodder should process the stream instead of the lines. That said, it’s pretty stupid not to have returns (or line feeds; I’m catholic on line terminators) to separate the objects, if for no other reason than clarity. When the preponderance of files contain them and the rare file does not, it’s reasonable, if not technically correct, to suggest that the rarity is not “proper.” –cfs3
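Processing matches instead of lines, as suggested above, can be sketched in shell with grep -o, which prints every match on its own line no matter how many share one input line. This is a minimal sketch, not a real XML parser: it assumes a grep that supports -o (GNU and BSD greps do; POSIX does not require it), that each <enclosure> tag fits on one line, and that url uses double quotes. The function name and feed are hypothetical.

```shell
#!/bin/sh
# Hypothetical match-oriented extractor: grep -o emits each <enclosure ...>
# tag as its own output line, however many share one input line, and the
# sed step then pulls the double-quoted url out of each tag.
extract_urls_matchwise() {
    grep -o '<enclosure[^>]*>' | sed -n 's/.*url="\([^"]*\)".*/\1/p'
}

# Made-up feed with two enclosures and everything on one line.
feed='<rss><channel><item><enclosure url="http://example.org/a.mp3" type="audio/mpeg"/></item><item><enclosure url="http://example.org/b.mp3" type="audio/mpeg"/></item></channel></rss>'

# Prints a.mp3 then b.mp3, with or without returns in the feed.
printf '%s\n' "$feed" | extract_urls_matchwise
```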