Replace problem
This message was discovered on ASPFriends.com 'aspngregexp' list.

Yannick Smits
Hi,
Why doesn't the following line remove all <link href=....etc> tags from my
inputString?
System.Text.RegularExpressions.Regex.Replace(inputString,@"<link (.*)>","");

Thanks,
Yannick Smits

Wayne King
It looks like this would remove *more* than all <link> tags, not less,
because the greedy .* runs on to the last > on the line.
That is, an inputString like this:
abc <link href="foo"> hello <b>bold text</b> goodbye
would become:
abc goodbye

To stop the engine from matching beyond the <link> tag, this is a simple
pattern attempt, which uses a lazy quantifier:
<link[^>]*?

However, this isn't quite right because an attribute value could contain
an embedded greater-than. For example, this inputString:
abc <link href=">" target="foo"> hello
would become:
abc " target="foo"> hello

This pattern will handle embedded greater-thans (spaces added for
clarity):
<link ( [^"'>]+ | "[^"]*" | '[^']*' )* >
It could be made more efficient (at the expense of readability), but
should work as-is nonetheless.

To have the engine ignore spaces, use the IgnorePatternWhitespace
option, as:
Regex.Replace(
inputString,
@"<link ( [^"'>]+ | "[^"]*" | '[^']*' )* >",
"",
RegexOptions.ExplicitCapture |
RegexOptions.IgnorePatternWhitespace |
RegexOptions.IgnoreCase);

-Wayne King
This posting is provided "AS IS" with no warranties, and confers no rights.
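
For anyone who wants to see the over-matching described above, here is a minimal sketch that runs the original pattern against the sample input. The class and method names (GreedyLinkDemo, RemoveGreedy) are made up for illustration and are not from the thread; only Regex.Replace and Regex.Match are real API.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyLinkDemo
{
    // The pattern from the original question. The .* is greedy, so it
    // first runs to the end of the line and then backs up only as far
    // as the LAST '>' it can find.
    public static string RemoveGreedy(string input)
    {
        return Regex.Replace(input, @"<link (.*)>", "");
    }

    public static void Main()
    {
        string input = "abc <link href=\"foo\"> hello <b>bold text</b> goodbye";

        // Shows exactly which part of the input the pattern swallows.
        Console.WriteLine(Regex.Match(input, @"<link (.*)>").Value);

        // Everything from "<link" up to "</b>" is gone, not just the tag.
        Console.WriteLine(RemoveGreedy(input));
    }
}
```

Note that the result keeps both spaces that surrounded the removed text, so "abc" and "goodbye" end up two spaces apart.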

-----Original Message-----
From: Yannick Smits
Sent: Wednesday, February 06, 2002 5:44 PM
To: aspngregexp
Subject: [aspngregexp] Replace problem

Ryan Trudelle-Schwarz (VIP)
Why not just use:

"<link(.|\n)*?>"

Won't the lazy quantifier stop it before it overtakes the first '>'
character?
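
A small sketch of what this pattern does in practice, under the assumption that the input may contain tags split across lines and attribute values with an embedded '>'. The names LazyAnyCharDemo and RemoveLazy are hypothetical. The lazy quantifier does stop at the first '>', which is exactly what goes wrong in the second case.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyAnyCharDemo
{
    // (.|\n)*? lazily matches any character, newlines included, and
    // stops at the first '>' that follows.
    public static string RemoveLazy(string input)
    {
        return Regex.Replace(input, @"<link(.|\n)*?>", "");
    }

    public static void Main()
    {
        // A tag that spans two lines is removed in full.
        Console.WriteLine(RemoveLazy("abc <link\n  href=\"foo\"> hello"));

        // An embedded '>' inside an attribute value ends the match early,
        // leaving the tail of the tag behind.
        Console.WriteLine(RemoveLazy("abc <link href=\">\" target=\"foo\"> hello"));
    }
}
```

So this works for ordinary tags, including multi-line ones, but it still trips over the quoted greater-than case.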

Yannick Smits
Hi,
I tried Wayne's and Ryan's solutions, but both fail. Is there perhaps
something wrong with the way I'm replacing the text? Should I perhaps use
the String.Replace method?
This is what I tried last:
string editString = Regex.Replace(inputString,@"<link[^>]*?","",
RegexOptions.ExplicitCapture | RegexOptions.IgnorePatternWhitespace |
RegexOptions.IgnoreCase);

I also tried @"<link ( [^"'>]+ | "[^"]*" | '[^']*' )* >", but this gave me a
compiler error.
Thanks,
Yannick Smits

Wayne King
My "lazy-quantifier" pattern has a bug -- I sent a follow-up reply, but
unfortunately it looks like that message vanished into the ether. Here's
the fixed pattern (note the trailing greater-than I originally omitted):
<link[^>]*?>
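
To make the difference concrete, here is a rough sketch comparing the two versions of the lazy pattern. The names LazyFixDemo, RemoveWithoutClose and RemoveWithClose are invented for this example. With nothing after it, a lazy quantifier is satisfied by matching zero characters, so the buggy pattern only ever matches the literal text "<link".

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyFixDemo
{
    // Buggy version: [^>]*? is lazy and nothing follows it, so it
    // matches as little as possible, which is nothing at all.
    public static string RemoveWithoutClose(string input)
    {
        return Regex.Replace(input, @"<link[^>]*?", "", RegexOptions.IgnoreCase);
    }

    // Fixed version: the trailing '>' forces the quantifier to extend
    // until the end of the tag.
    public static string RemoveWithClose(string input)
    {
        return Regex.Replace(input, @"<link[^>]*?>", "", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string input = "abc <link href=\"foo\"> hello";
        Console.WriteLine(RemoveWithoutClose(input)); // only "<link" is removed
        Console.WriteLine(RemoveWithClose(input));    // the whole tag is removed
    }
}
```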

In C# verbatim string literals (the @"..." form), the double-quote is the
only "special" character -- you still have to escape it, by doubling it.
The literal
@"<link ( [^"'>]+ | "[^"]*" | '[^']*' )* >"
doesn't compile because the embedded double-quotes aren't escaped. It
should be:
@"<link ( [^""'>]+ | ""[^""]*"" | '[^']*' )* >"
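
Putting the escaped literal into a full call might look like the sketch below. It assumes the same three RegexOptions suggested earlier in the thread; the names QuoteAwareLinkDemo and RemoveLinkTags are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteAwareLinkDemo
{
    // Each alternative consumes either a run of ordinary characters or a
    // complete quoted attribute value, so a '>' inside quotes cannot end
    // the match. Double quotes are doubled for the verbatim string.
    public static string RemoveLinkTags(string input)
    {
        return Regex.Replace(
            input,
            @"<link ( [^""'>]+ | ""[^""]*"" | '[^']*' )* >",
            "",
            RegexOptions.ExplicitCapture |
            RegexOptions.IgnorePatternWhitespace |
            RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // The embedded '>' no longer cuts the match short.
        Console.WriteLine(RemoveLinkTags("abc <link href=\">\" target=\"foo\"> hello"));

        // IgnoreCase and the single-quote alternative in action.
        Console.WriteLine(RemoveLinkTags("a<LINK rel='x>y'>b"));
    }
}
```

Because IgnorePatternWhitespace drops the space after "link", this pattern as written would also match any other tag whose name merely starts with "link"; whether that matters depends on the input.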

-----Original Message-----
From: Yannick Smits
Sent: Friday, February 08, 2002 11:02 AM
To: aspngregexp
Subject: [aspngregexp] Re: Replace problem
