Fix formatting bleed?

jackvinson
Posts: 101
Joined: Thu May 25, 2006 8:41 am

Postby jackvinson » Sun Oct 15, 2006 12:27 pm

I have noticed that unclosed formatting tags tend to bleed over into the following posts. If someone leaves a bold or italic tag open in their content, all the following posts on the page in GN get that formatting too. I think I have seen this in other readers; I'm fairly sure this is a universal problem.

Is there an easy fix? "Unformat between posts" or something?

Jack

Jack
Site Admin
Posts: 3472
Joined: Fri Feb 18, 2005 12:05 am

Postby Jack » Mon Oct 16, 2006 8:14 pm

I'm not too good at HTML. I have thought about this issue before. The only approach I found is to use a regular expression to remove all the <u>, <i>, <div>, etc. tags. I opted not to do so, in order to keep the formatting.

If someone knows an HTML trick to reset formatting from scratch, please let me know!
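A minimal sketch of the regex-stripping approach described here, written in Python (the thread contains no code, so the language, the `strip_formatting` name and the tag list are all assumptions). It deletes opening and closing tags for a chosen set of elements but keeps their text, which is exactly why it also throws away legitimate formatting.

```python
import re

# Hypothetical set of tags to strip; <div> and <span> are included because
# they were mentioned, but removing them also removes layout.
FORMAT_TAGS = ("b", "i", "u", "em", "strong", "font", "span", "div")

# Matches <b>, </b>, <b class="x">, <em/> ... for the tags above.
# The \b stops "b" from matching <br> or <blockquote>, "u" from matching <ul>, etc.
_TAG_RE = re.compile(r"</?(?:%s)\b[^>]*>" % "|".join(FORMAT_TAGS), re.IGNORECASE)

def strip_formatting(html: str) -> str:
    """Remove the listed tags but keep the text they wrapped."""
    return _TAG_RE.sub("", html)

print(strip_formatting("This is <b>bold</b> and <I>open italic"))
# prints: This is bold and open italic
```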

gtanuel
Posts: 21
Joined: Wed Sep 14, 2005 12:47 am

Postby gtanuel » Tue Oct 17, 2006 1:08 am

In my opinion, it's the content (RSS) publisher's fault, and the way to correct it is to report it directly to them. The sad fact is that, although RSS is supposed to be well-formed (it is XML after all), some blog software is happy to wrap everything in CDATA, so malformed HTML inside it is never checked.

But I understand that the most forgiving aggregator wins. If that "Unformat between posts" option existed, the easiest (and fastest) approach would be to insert closing tags for all inline elements between (or before) the DIVs, e.g.:

</div>               <!-- end of post -->
</b></u></i></a> ...
<div>                <!-- next post -->


It's surely ugly, but it could be the fastest way to eliminate the bleeding. You might also want to add an "aggressiveness" level, i.e. how many copies of each closing tag to emit (e.g. </b></b></b>), to handle posts that leave the same tag open several times.
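A minimal Python sketch of the "append closing tags" idea, assuming posts are joined into one page as strings. The tag list, the `aggressiveness` parameter and the `<div class="post">` wrapper are illustrative assumptions, not GreatNews code. HTML parsers generally ignore a closing inline tag with no matching open element, which is why the extra tags are harmless when a post is already well-formed.

```python
# Hypothetical list of inline elements whose open tags tend to bleed.
INLINE_TAGS = ("b", "i", "u", "em", "strong", "font", "a")

def closing_suffix(aggressiveness: int = 1) -> str:
    """One closing tag per inline element, repeated `aggressiveness` times."""
    one_round = "".join("</%s>" % tag for tag in INLINE_TAGS)
    return one_round * aggressiveness

def join_posts(posts, aggressiveness: int = 1) -> str:
    """Wrap each post in a div and close stray inline tags before the div ends."""
    suffix = closing_suffix(aggressiveness)
    return "\n".join('<div class="post">%s%s</div>' % (body, suffix) for body in posts)

page = join_posts(["<b>unclosed bold", "second post"], aggressiveness=2)
print(page)
```

Putting the suffix just before each post's `</div>` is one of the two positions suggested; placing it between the posts should also stop the bleed.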

The more elegant solution is, of course, to tidy the content before rendering, but I wonder how much worse the performance would be.
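As a lighter alternative to a full tidy pass, here is a sketch of an inline-tag balancer built on Python's standard `html.parser` module (the parser choice, `InlineBalancer` and `balance_inline` are assumptions for illustration). It tracks which inline tags are still open at the end of a post and appends only the missing closing tags. It does not repair block elements or bad nesting, so it is a partial tidy, not a replacement for HTML Tidy.

```python
from html.parser import HTMLParser

# Hypothetical set of inline tags we are willing to auto-close.
INLINE_TAGS = {"a", "b", "em", "font", "i", "s", "small", "span", "strong", "u"}

class InlineBalancer(HTMLParser):
    """Records which inline tags are still open when the post ends."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.open_tags = []  # stack of currently open inline tags

    def handle_starttag(self, tag, attrs):
        if tag in INLINE_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # A browser parsing HTML treats "<em />" as a plain start tag, so do the same.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, if there is one;
        # stray closing tags are ignored.
        if tag in self.open_tags:
            while self.open_tags:
                if self.open_tags.pop() == tag:
                    break

def balance_inline(html: str) -> str:
    """Append closing tags, innermost first, for inline tags left open."""
    parser = InlineBalancer()
    parser.feed(html)
    parser.close()
    return html + "".join("</%s>" % tag for tag in reversed(parser.open_tags))

print(balance_inline("This is <b>bold and <i>italic"))
# prints: This is <b>bold and <i>italic</i></b>
```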

I found this article while looking for a solution: Parsing RSS At All Costs. I don't know if it's useful.

jackvinson
Posts: 101
Joined: Thu May 25, 2006 8:41 am

Postby jackvinson » Wed Oct 18, 2006 12:04 am

I just noticed it again. Before complaining to the blog owner, I wanted to check her feed (http://feeds.feedburner.com/Communities ... evelopment). It turns out that she has an "em" tag that is completely empty near the top of the first entry:

<em />

It looks like GreatNews is misinterpreting this tag, somehow seeing only the opening part of it. When viewed through the FeedBurner link above, the formatting is correct.

What do you see in GreatNews?

Jack

gtanuel
Posts: 21
Joined: Wed Sep 14, 2005 12:47 am

Postby gtanuel » Wed Oct 18, 2006 6:12 am

While GreatNews can do something to fix it (tidying is one option), the result still mostly depends on the browser/rendering engine (the default is IE). The following example shows this in both IE and Mozilla:

<html>
<body>
This is the <em></em>first line.
<br />
The second line should be normal.
</body>
</html>

is rendered differently from:

<html>
<body>
This is the <em />first line.
<br />
The second line should be normal.
</body>
</html>
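The difference comes from how each parser reads `<em />`. An XML parser treats it as an empty element, but an HTML parser ignores the trailing slash on a non-void element such as `em`, so it becomes an unclosed `<em>` start tag and everything after it is emphasized. A minimal Python pre-processing sketch (the `expand_self_closing` name and the regex approach are assumptions; the void-element list follows the HTML specification) rewrites such tags as explicit open/close pairs before the content reaches the browser control. It does not handle `>` inside attribute values.

```python
import re

# Void elements per the HTML spec: "<br />", "<img ... />" are legitimately self-closing.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
}

# "<name attrs />": group 1 is the tag name, group 2 the optional attributes.
_SELF_CLOSING_RE = re.compile(r"<([A-Za-z][A-Za-z0-9]*)(\s[^<>]*?)?\s*/>")

def expand_self_closing(html: str) -> str:
    """Rewrite <em /> as <em></em> for non-void elements; leave void ones alone."""
    def repl(match):
        name = match.group(1)
        attrs = (match.group(2) or "").rstrip()
        if name.lower() in VOID_ELEMENTS:
            return match.group(0)
        return "<%s%s></%s>" % (name, attrs, name)
    return _SELF_CLOSING_RE.sub(repl, html)

print(expand_self_closing("This is the <em />first line."))
# prints: This is the <em></em>first line.
```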

FeedBurner seems to display it more faithfully, as XML with some styling. I tried importing the feed into Google Reader and it looks good (no bleeding). Unfortunately, Google Reader is client-side script/AJAX based, so I can't inspect the actual rendered source. My strong guess is that they do some pre-tidying as well.

I think it would be good to have an option (either application-wide or per feed) to pre-tidy the content. I'm not sure about using a statically or dynamically linked library, but I know SciTE uses a plain command-line tidy.exe and it's quite seamless.
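A sketch of the "external tidy over stdin/stdout" idea, in Python. HTML Tidy reads from stdin when no input file is given, writes to stdout, and exits with 1 for warnings and 2 for errors. The option set in `TIDY_CMD` uses standard Tidy options, but which ones GreatNews would actually want is an assumption, as is falling back to the untouched content on failure. The demo swaps in a small Python stand-in filter so the plumbing can be seen without Tidy installed.

```python
import subprocess
import sys

# Hypothetical default command line for HTML Tidy; adjust path and options as needed.
TIDY_CMD = ("tidy", "-q", "--show-body-only", "yes",
            "--show-warnings", "no", "--char-encoding", "utf8")

def run_filter(html: str, cmd=TIDY_CMD, timeout: float = 5.0) -> str:
    """Pipe `html` through an external filter via stdin/stdout.

    Returns the original text if the program is missing, times out,
    produces no output, or exits above 1 (Tidy: 1 = warnings, 2 = errors).
    """
    try:
        result = subprocess.run(
            list(cmd),
            input=html.encode("utf-8"),
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
            # On Windows, keep a console window from appearing; 0 elsewhere.
            creationflags=getattr(subprocess, "CREATE_NO_WINDOW", 0),
        )
    except (OSError, subprocess.TimeoutExpired):
        return html
    if result.returncode > 1 or not result.stdout:
        return html
    return result.stdout.decode("utf-8", errors="replace")

# Stand-in filter (not Tidy) that upper-cases its input, to show the plumbing.
DEMO_CMD = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]
print(run_filter("<b>hello", cmd=DEMO_CMD))
```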

Jack
Site Admin
Posts: 3472
Joined: Fri Feb 18, 2005 12:05 am

Postby Jack » Wed Oct 18, 2006 6:58 am

Thanks, gtanuel. The information is very helpful. For the time being I will add several dummy </div>, </span>, </i>, </u>, etc. tags after each post to fix simple issues. Missing </div> and </span> tags are particularly destructive, as they break the GreatNews layout.

gtanuel
Posts: 21
Joined: Wed Sep 14, 2005 12:47 am

Postby gtanuel » Wed Oct 18, 2006 10:44 am

On further thought, I'd suggest not "fixing" block elements such as div, p, blockquote, etc. I think extra closing tags for those will mess up your own layout (a stray </div> can close one of GreatNews' own containers) rather than help eliminate the bleeding. If it really needs to be that "forgiving", you might want to take another look at pre-tidying the content (selectively, as it will definitely affect performance).

As I mentioned previously, SciTE uses HTML Tidy to help format HTML. In my current installation, tidy.exe takes only 264 KB, and it can read from stdin and write to stdout, so there won't be any console window appearing. I think this would also help with complying with whatever copyleft license they have, since you wouldn't be using a statically linked library, i.e. GreatNews would not be a derivative work.

Jack
Site Admin
Posts: 3472
Joined: Fri Feb 18, 2005 12:05 am

Postby Jack » Thu Oct 19, 2006 8:28 am

I could just use a regular expression to scan the text (I only need to recognize block elements), but performance will probably be a big issue here.
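The regex scan can be done in a single pass per post with one precompiled pattern, so the cost should grow roughly linearly with the post's length. A Python sketch, assuming only a few layout-breaking elements need counting (the `TRACKED` list and `close_unbalanced` name are hypothetical): it counts unmatched opening tags and appends exactly that many closing tags, rather than a fixed batch of dummies.

```python
import re

# Hypothetical set of elements whose missing closers break the surrounding layout.
TRACKED = ("div", "span", "blockquote")

# One compiled pattern: group 1 is "/" for a closing tag, group 2 the tag name.
_TAG_RE = re.compile(r"<(/?)(%s)\b[^>]*>" % "|".join(TRACKED), re.IGNORECASE)

def close_unbalanced(html: str) -> str:
    """Append closing tags for tracked elements left open in `html`."""
    depth = {tag: 0 for tag in TRACKED}
    for match in _TAG_RE.finditer(html):
        name = match.group(2).lower()
        if match.group(1):  # closing tag; stray closers never go below zero
            depth[name] = max(0, depth[name] - 1)
        else:               # opening tag (browsers treat "<div/>" as open too)
            depth[name] += 1
    # Counts do not record nesting order; closing in reverse list order
    # assumes span/blockquote sit inside div, which is only an approximation.
    return html + "".join(("</%s>" % tag) * depth[tag] for tag in reversed(TRACKED))

print(close_unbalanced("<div><span>text"))
# prints: <div><span>text</span></div>
```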

