How to parse html with saxparser (or other solution)

by tlegras » Sun, 03 Jan 2010 00:22:24 GMT


 Happy new year world :)

I want to parse an HTML page downloaded from a web server and am
having quite a bit of trouble with it.
I am trying SAXParser; is there a better solution?

With SAX I am trying to preprocess the page to make it XML compliant
(replacing <br> with <br />), but I still have trouble because of
errors in the page (a couple of mismatched tags, and "&" instead of
&amp; in attribute values).
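
For illustration, the preprocessing described above could be sketched roughly like this. The class and method names (HtmlPreprocessor, escapeBareAmpersands, closeBrTags) are made up for the example and are not from the original code. It only handles the two cases mentioned (bare "&" and unclosed <br>). It does nothing about mismatched tags, and named HTML entities such as &nbsp; are left alone even though a plain XML parser may still reject them.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not from the original post: a minimal regex-based cleanup.
final class HtmlPreprocessor {
    // An "&" that does not already start a named, decimal, or hex character reference.
    private static final Pattern BARE_AMP =
        Pattern.compile("&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)");

    // <br>, <BR>, <br   > ... but not an already self-closed <br />.
    private static final Pattern OPEN_BR =
        Pattern.compile("<br\\s*>", Pattern.CASE_INSENSITIVE);

    // Replace every bare "&" with "&amp;", leaving existing references untouched.
    static String escapeBareAmpersands(String html) {
        return BARE_AMP.matcher(html).replaceAll("&amp;");
    }

    // Rewrite unclosed <br> tags as <br />.
    static String closeBrTags(String html) {
        return OPEN_BR.matcher(html).replaceAll("<br />");
    }

    // Apply both fixes; the order does not matter because they touch different characters.
    static String preprocess(String html) {
        return closeBrTags(escapeBareAmpersands(html));
    }
}
```

A regex pass like this is fragile on real-world pages, so it is probably only worth it when the set of errors is small and known in advance.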

Is there any way to make the SAX parser ignore these errors and keep
parsing? I tried to use the ErrorHandler interface, but I could not
catch anything.
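
To make the ErrorHandler question concrete, here is a small sketch, assuming a standard JAXP SAX parser on a desktop JVM (the parser on Android may behave differently). The names SaxFatalErrorDemo and RecordingHandler are invented for the example. DefaultHandler already implements ErrorHandler, so overriding fatalError() is enough to be notified of a mismatched tag. In XML, though, a well-formedness error is fatal, so the parser is not expected to deliver the rest of the document afterwards.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records what the parser reports before it gives up.
final class SaxFatalErrorDemo {

    static final class RecordingHandler extends DefaultHandler {
        final List<String> elements = new ArrayList<>();
        final List<String> fatalErrors = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements.add(qName);
        }

        // Called for well-formedness errors such as a mismatched end tag.
        @Override
        public void fatalError(SAXParseException e) throws SAXException {
            fatalErrors.add(e.getMessage());
            throw e; // parsing cannot meaningfully continue past this point
        }
    }

    // Parses the markup and returns whatever was seen up to the first fatal error.
    static RecordingHandler parse(String markup) {
        RecordingHandler handler = new RecordingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(markup)), handler);
        } catch (SAXParseException e) {
            // Expected for malformed input; details are already in handler.fatalErrors.
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return handler;
    }
}
```

With input such as <root><p><b>text</p><i>late</i></root>, the handler should see the elements before the mismatched </p> and one fatal error, but not the <i> element that follows it. That is why an HTML-aware parser is usually the more practical route for broken pages.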

Any help would be welcome.
Thierry.

--




by Kumar Bibek » Sun, 03 Jan 2010 00:37:06 GMT


I guess you need to use a special HTML parser. Since HTML pages are
often not well-formed and not XML compliant, an XML parser will
not serve your purpose.

Search for a third-party library.

Thanks and Regards,
Kumar Bibek




--





by tlegras » Sun, 03 Jan 2010 02:37:01 GMT


OK thanks, I am trying NekoHTML and working on getting it to run, but
with the minimal sample code (so using only the provided
xercesMinimal.jar) I get this exception in my parse() function:

E/AndroidRuntime(  765): Uncaught handler: thread Thread-10 exiting due to uncaught exception
E/AndroidRuntime(  765): java.lang.ExceptionInInitializerError
E/AndroidRuntime(  765):        at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2043)
E/AndroidRuntime(  765):        at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:907)
E/AndroidRuntime(  765):        at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499)
E/AndroidRuntime(  765):        at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452)
E/AndroidRuntime(  765):        at com.tlegras.freeboxrec.AddRecordingThread.run(AddRecordingThread.java:231)
E/AndroidRuntime(  765): Caused by: java.lang.IllegalStateException: Failed to create XercesBridge instance
E/AndroidRuntime(  765):        at org.cyberneko.html.xercesbridge.XercesBridge.makeInstance(XercesBridge.java:59)
E/AndroidRuntime(  765):        at org.cyberneko.html.xercesbridge.XercesBridge.<clinit>(XercesBridge.java:32)

Still investigating; I will give feedback.

Thanks,
Thierry.






--




by tlegras » Sun, 03 Jan 2010 19:44:40 GMT


OK, I got it. It seems the problem is that their xercesMinimal.jar does
not work. I tried it in a non-Android Java project and had the same
problem. With the full Xerces jar I can parse my HTML page even though
it has several errors in it. Too bad the full Xerces jar is 1.2 MB :(
Seems like a bug in NekoHTML; I will report it on their mailing list.







--




by jwei512 » Mon, 04 Jan 2010 14:52:29 GMT


Another one you could try is HTML Cleaner
(http://htmlcleaner.sourceforge.net/).

I've already made a few applications that reference this library, and
it even supports XPath for querying the HTML source.

If you'd like to see some code snippets, let me know and I can
show you some.

- jwei

 http://thinkandroid.wordpress.com 







--




by tlegras » Mon, 04 Jan 2010 16:04:03 GMT


NekoHTML is now working very well for me, so I probably won't change :)
But thank you for the link, it is a goldmine :) I found that the
documentation lacks such snippets.




--


