Parse HTML and strip tags leaving only text?
This message was discovered on ASPFriends.com 'aspngfreeforall' list.
Responses highlighted in red are from people who are likely to contribute good, authoritative information to this discussion. They include Microsoft employees, MVPs and others who IMHO contribute well to these kinds of discussions.

Bryan Andrews
There are a few components that I remember (VB6) that would parse HTML and
strip out tags, and leave certain formatting...

All I'd like to do is strip all the tags, and maybe pop carriage returns in
where there is a </p> or a <br>.

Anyone know of any .NET components or existing functions that do this kind
of thing?

Thanks!

dave wanta (VIP)
First, I would replace your </p> and <br> tags with System.Environment.NewLine,

then use the regex found at http://www.123aspx.com/redir.aspx?res=13881 to strip the remaining tags.
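A minimal sketch of Dave's two steps, written in Python for illustration. The regex behind the 123aspx link is not shown in the thread, so both patterns below are assumptions, and `strip_tags` is a hypothetical helper name. The `newline` parameter stands in for System.Environment.NewLine (pass `os.linesep` for the platform's line ending). A pattern like `<[^>]*>` works for simple markup but is fooled by a `>` inside an attribute value, a comment, or a script body.

```python
import re

# Assumed patterns (not the regex from the linked article).
# Paragraph ends and line breaks in any case: </p>, </P >, <br>, <BR/>, <br class="x">
BREAK_TAGS = re.compile(r"</p\s*>|<br\b[^>]*>", re.IGNORECASE)
# Any remaining tag: '<', then anything except '>', then '>'
ANY_TAG = re.compile(r"<[^>]*>")


def strip_tags(html: str, newline: str = "\n") -> str:
    """Replace </p> and <br> with `newline`, then delete every other tag."""
    # Step 1: a function replacement inserts `newline` literally,
    # so no backslash-escape processing is applied to it.
    text = BREAK_TAGS.sub(lambda _match: newline, html)
    # Step 2: drop all remaining tags, keeping the text between them.
    return ANY_TAG.sub("", text)


if __name__ == "__main__":
    print(strip_tags("<p>Hello <b>world</b></p><p>Bye<br/>now</p>"))
```

Entities such as `&amp;` are left as-is by this sketch; Python's `html.unescape` can decode them afterwards.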

Cheers!
Dave
----- Original Message -----
From: "Bryan Andrews"
To: "aspngfreeforall"
Sent: Saturday, August 17, 2002 7:14 PM
Subject: [aspngfreeforall] Parse HTML and strip tags leaving only text?

Dan Lipsy
There is an HTML parser here:
http://blogs.msdn.com/smourier/archive/2003/06/04/8265.aspx

It even works on malformed HTML.
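For the parser-based route, here is a sketch using Python's standard-library `html.parser.HTMLParser` rather than the .NET parser at Dan's link, which this sketch does not reproduce. The standard-library parser is also lenient with unclosed tags. `TextExtractor` and `html_to_text` are hypothetical names; the newline at `</p>` and `<br>` follows Bryan's question, and skipping `<script>`/`<style>` content is an added assumption.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content, emitting a newline at </p> and <br>."""

    def __init__(self) -> None:
        # convert_charrefs=True decodes entities such as &amp; in the text
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []
        self._skip = 0  # nesting depth inside <script>/<style> (assumption)

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <BR> and <br> are treated alike.
        # <br/> also lands here via the default handle_startendtag.
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag == "p":
            self.parts.append("\n")

    def handle_data(self, data):
        # Keep text unless we are inside a script or style element.
        if not self._skip:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


if __name__ == "__main__":
    print(html_to_text("<p>Unclosed <b>bold"))
```

Unlike the regex approach, the parser tracks tag boundaries itself, so it tolerates unclosed tags and decodes entities along the way.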

Hope that helps,

Dan Lipsy
Student Ambassador to Microsoft
 Copyright © Matthew Baxter-Reynolds 2001-2008. '.NET 247 Software Development Services' is a trading style of MBR IT Solutions Ltd.