by 3D » Mon, 26 Jan 2009 15:18:22 GMT

 I'm using a SAXParser to parse an XML document and it's getting stuck
on certain symbols like the 'trademark' symbol and, I think, even double
quotes (").  I really don't need these characters, so it would be fine if
the parser just skipped over them.  Instead it throws an exception and
quits parsing the document.  What can I do?

by 3D » Wed, 28 Jan 2009 04:10:43 GMT

 Help please.


by 3D » Fri, 30 Jan 2009 04:59:18 GMT

 Thank you both for your responses!  I think I will try just removing
these characters.


by Brad Gies » Fri, 30 Jan 2009 12:01:14 GMT

 Just in case you want to escape/unescape them (it's a little better), here
is what I use (it's C#, but easy to convert to Java). Both functions are
simple. I found the original on the web, but don't remember the author to
give credit to:

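public String EscapeXML(String str)
{
    StringBuilder sb = new StringBuilder();
    foreach (Char c in str)
    {
        switch (c)
        {
            // Replace each XML special character with its entity.
            case '&': sb.Append("&amp;"); break;
            case '<': sb.Append("&lt;"); break;
            case '>': sb.Append("&gt;"); break;
            case '\'': sb.Append("&#39;"); break;
            case '"': sb.Append("&quot;"); break;
            default: sb.Append(c); break;
        }
    }
    return sb.ToString();
}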

This one could be made faster, but it's simple :).

public String Unescape(String str)
{
    // Unescape "&amp;" last so that "&amp;lt;" becomes "&lt;", not "<".
    str = str.Replace("&lt;", "<");
    str = str.Replace("&gt;", ">");
    str = str.Replace("&#039;", "'");
    str = str.Replace("&#39;", "'");
    str = str.Replace("&quot;", "\"");
    str = str.Replace("&amp;", "&");
    return str;
}
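
Since the original question is about Java, here is a rough Java port of the two functions above. This is only a sketch: the class name XmlEscape and the method names escapeXml/unescapeXml are made up for illustration, and it assumes you only care about the five predefined XML entities plus the "&#39;"/"&#039;" numeric forms.

```java
// Hypothetical helper class; the names are illustrative, not from any library.
final class XmlEscape {
    private XmlEscape() {}

    // Replace the five characters that are special in XML markup.
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '\'': sb.append("&#39;");  break;
                case '"':  sb.append("&quot;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Reverse of escapeXml. "&amp;" is handled last so that an escaped
    // entity such as "&amp;lt;" comes back as "&lt;" rather than "<".
    static String unescapeXml(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&#039;", "'")
                .replace("&#39;", "'")
                .replace("&quot;", "\"")
                .replace("&amp;", "&");
    }
}
```

Note that a SAX parser already unescapes these entities for you when reading, so the unescape half is mostly useful for text that never went through the parser.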


by Scott G » Sat, 07 Mar 2009 10:54:23 GMT


What I found out was that when the parser hit one of those characters
between element tags, the characters function would be called again.
So a value like

<tag>the "dog" runs</tag>
would result in 5 separate calls to characters.

My solution was to initialize a temporary string in the startElement
call, append to that string in the characters call, then assign it to the
proper variable in the endElement call.
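
The approach described above could look roughly like this in Java. It is a minimal sketch, not anyone's actual code: the class name TextCollector, the values list, and the parse helper are invented for illustration, and it assumes leaf elements only (the buffer is reset on every startElement, so text of a parent with child elements would be lost).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the full text of each element even when
// the parser delivers it in several characters() calls.
class TextCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    final List<String> values = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        buffer.setLength(0); // start a fresh temporary string
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length); // may be called many times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        values.add(buffer.toString()); // the text is complete only here
    }

    // Convenience wrapper for trying the handler on an in-memory string.
    static List<String> parse(String xml) {
        TextCollector handler = new TextCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Sketch only: wrap parser/IO failures in an unchecked exception.
            throw new IllegalStateException(e);
        }
        return handler.values;
    }
}
```

Calling TextCollector.parse("<tag>the \"dog\" runs</tag>") gives a list with the single value: the "dog" runs, no matter how many times the parser chose to call characters along the way.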


