Changing case
This message was discovered on ASPFriends.com 'aspngregexp' list.
Responses highlighted in red are from those people who are likely to be able to contribute good, authoritative information to this discussion. They include Microsoft employees, MVPs and others who IMHO contribute well to these kinds of discussions.

Ollie Cornes

I want to write a regexp that locates matches and then sets the matched
strings to lower case. It looks like on Unix boxes there's a method called
lc() to do this. What's the equivalent in the .NET world of regexp?

Ollie

Scott (VIP)
You can do this with a regexp, but you can also just call the
string.ToLower() method.

[Original message clipped]

Jason Salas
Hi Ollie,

You can use the following method:

yourStringName.ToLower()

HTH,
Jason

----- Original Message -----
From: "Ollie Cornes" <Click here to reveal e-mail address>
To: "aspngregexp" <Click here to reveal e-mail address>
Sent: Saturday, March 02, 2002 5:10 AM
Subject: [aspngregexp] Changing case

[Original message clipped]

Ollie Cornes

I know about the ToLower() method; the part that baffled me was how to use a
regexp to convert only the matching part of a string to lower case. I am a
little unfamiliar with the way .NET does regexps, and it's been a long time
since I wrote any. But I figured it out. Here's some code to convert
opening HTML tags to lower case:

string pattern = @"(<[A-Z,a-z,0-9]*[>,\s])";
string newHtml = Regex.Replace(inputHtml, pattern,
    new MatchEvaluator(CapText));

string CapText(Match m)
{
    return m.ToString().ToLower();
}

Ollie

----- Original Message -----
From: "Jason Salas" <Click here to reveal e-mail address>
To: "aspngregexp" <Click here to reveal e-mail address>
Sent: Friday, March 01, 2002 11:35 PM
Subject: [aspngregexp] Re: Changing case

[Original message clipped]

Wayne King
You probably don't want the commas within the character classes, and the
parentheses are superfluous:
string pattern = @"<[A-Za-z0-9]+[>\s]";

This pattern may also accomplish the same thing:
string pattern = @"<[^>\s]+";
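
A minimal, self-contained sketch of the approach discussed so far, using the second simplified pattern with Regex.Replace and a match evaluator. The class and method names (TagCase, LowerOpeningTags) are made up for illustration, and ToLowerInvariant() is swapped in for the ToLower() call used earlier in the thread so the result does not depend on the current culture.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagCase
{
    // "<" followed by everything up to the first ">" or whitespace character.
    private const string Pattern = @"<[^>\s]+";

    // Lower-cases only the text matched by the pattern; everything else
    // in the input is copied through unchanged.
    public static string LowerOpeningTags(string inputHtml)
    {
        return Regex.Replace(inputHtml, Pattern, m => m.Value.ToLowerInvariant());
    }

    public static void Main()
    {
        Console.WriteLine(
            LowerOpeningTags("<TABLE Border=1><TR><TD>Hi</TD></TR></TABLE>"));
    }
}
```

This prints <table Border=1><tr><td>Hi</td></tr></table>. Note that the pattern also matches the start of closing tags such as </TD, and that attribute names like Border are left alone because the match stops at the first whitespace character.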

-----Original Message-----
From: Ollie Cornes [mailto:Click here to reveal e-mail address]
Sent: Friday, March 01, 2002 5:00 PM
To: aspngregexp
Subject: [aspngregexp] Re: Changing case

[Original message clipped]

Ollie Cornes

I've spent a little more time with this regexp to match HTML tags and I
discovered that attribute values containing > characters cause my regexp to
fail, plus spaces in attribute values caused problems as well. Below is a
larger regexp that I am currently using to match entire HTML tags. It matches
a tag in more detail, looking for a tagname, zero or more attributes (with or
without values) and an optional trailing /.

<[a-zA-Z0-9:]+
(
[\s]+[a-zA-Z0-9:]+
(
(=[^\s"<>]+)|
(="[^"]*")|
(='[^']*')|
()
)
)*
[\s]*\/?\s*>

I have yet to see problems, but that's not to say there aren't any. If
anyone has any experiences that suggest problems with this approach, or if
you have seen a regexp that is proven to match HTML tags, I'd appreciate a
nod. I only need it to match the opening tag, so things like </b> are of no
interest, only <b>, <img src="image.gif" alt="> home">, etc.

Thanks for the help Wayne.

Ollie

PS For anyone who needs simpler HTML matching, this seems to work pretty
well <.*?>
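
To make the difference between the two patterns concrete, here is a rough sketch that runs both against the <img> example above. The class and member names (TagMatchDemo, FirstMatch, SimplePattern, FullPattern) are invented for this illustration, and RegexOptions.IgnorePatternWhitespace is assumed so that the larger pattern can be kept in the multi-line layout shown above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagMatchDemo
{
    // The simple pattern from the PS: lazily match up to the first ">".
    public const string SimplePattern = @"<.*?>";

    // The larger pattern from the message, laid out over several lines.
    // Inside a C# verbatim string a literal double quote is written as "".
    public const string FullPattern = @"
        <[a-zA-Z0-9:]+
        (
          [\s]+[a-zA-Z0-9:]+
          (
            (=[^\s""<>]+)|
            (=""[^""]*"")|
            (='[^']*')|
            ()
          )
        )*
        [\s]*\/?\s*>";

    // Returns the text of the first match, or an empty string if none.
    public static string FirstMatch(string input, string pattern)
    {
        return Regex.Match(input, pattern,
            RegexOptions.IgnorePatternWhitespace).Value;
    }

    public static void Main()
    {
        string input = "<img src=\"image.gif\" alt=\"> home\"> text";
        Console.WriteLine(FirstMatch(input, SimplePattern));
        Console.WriteLine(FirstMatch(input, FullPattern));
    }
}
```

The simple pattern stops at the > inside the alt value and yields <img src="image.gif" alt=">, while the larger pattern returns the whole tag, <img src="image.gif" alt="> home">.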

----- Original Message -----
From: "Wayne King" <Click here to reveal e-mail address>
To: "aspngregexp" <Click here to reveal e-mail address>
Sent: Saturday, March 02, 2002 3:42 AM
Subject: [aspngregexp] Re: Changing case

[Original message clipped]

Wayne King
Capturing groupings are a performance hit, so minimizing the number of parentheses could be useful for your pattern.

A couple things that could be missing with your pattern:

(1) It doesn't do the right thing when the attribute value is a single-quoted string with an embedded double-quote. Looks like a typo in this clause:
[^\s"<>]+
You probably meant:
[^\s"'<>]+

(2) It doesn't permit whitespace around the equal-sign that separates attribute from value.

Here's a pattern very similar to yours that uses less grouping, and explicitly captures the tag and each attribute and value:
<
(?'Tag'[a-zA-Z0-9:]+) \s*
(
(?'Attr'[a-zA-Z0-9:]+) \s*
(= \s*
(?'Value' [^\s"'>]+ | "[^"]*" | '[^']*')
)? \s* #match zero or one Value (per Attr)
)* #match zero or more Attrs
/?>
Use these options with it:
RegexOptions.ExplicitCapture
RegexOptions.IgnorePatternWhitespace
The 'Tag' and 'Attr' explicit-capturing can be removed if you aren't interested in extracting those substrings. The 'Value' grouping is required, but it doesn't need to be an explicit (named) capture.

One tricky part about extracting captured substrings is that, in the case of this pattern, the "Value" capture may not exist for every captured "Attr" (because of valueless attributes). Consider this input string:
<td nowrap colspan=2>

Then, these results are expected:
match.Groups["Attr"].Captures[0].Value ==> "nowrap"
match.Groups["Attr"].Captures[1].Value ==> "colspan"

But, this may be less expected:
match.Groups["Value"].Captures.Count ==> 1
match.Groups["Value"].Captures[0].Value ==> "2"

That is, the zeroth "Value" capture does not correspond to the zeroth "Attr" capture.
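
One possible way to line the captures up again is to compare their positions: a "Value" capture belongs to an "Attr" capture if it starts before the next "Attr" does. The sketch below is only an illustration of that idea, not something proposed in the thread; the names AttrPairs and Pair are hypothetical, and a valueless attribute is represented by null.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AttrPairs
{
    // The pattern from the message above (double quotes doubled for C#).
    private static readonly Regex TagRegex = new Regex(@"
        <
        (?'Tag'[a-zA-Z0-9:]+) \s*
        (
          (?'Attr'[a-zA-Z0-9:]+) \s*
          (= \s*
            (?'Value' [^\s""'>]+ | ""[^""]*"" | '[^']*')
          )? \s*   # match zero or one Value (per Attr)
        )*         # match zero or more Attrs
        /?>",
        RegexOptions.ExplicitCapture | RegexOptions.IgnorePatternWhitespace);

    // Pairs each attribute name with its value, or with null if it has none.
    public static List<KeyValuePair<string, string>> Pair(string tag)
    {
        var result = new List<KeyValuePair<string, string>>();
        Match match = TagRegex.Match(tag);
        if (!match.Success) return result;

        CaptureCollection attrs = match.Groups["Attr"].Captures;
        CaptureCollection values = match.Groups["Value"].Captures;
        int v = 0; // index of the next unused Value capture
        for (int a = 0; a < attrs.Count; a++)
        {
            // A value belongs to this attribute only if it starts before
            // the next attribute (or before the end of the whole match).
            int limit = (a + 1 < attrs.Count)
                ? attrs[a + 1].Index
                : match.Index + match.Length;
            string value = null;
            if (v < values.Count && values[v].Index < limit)
            {
                value = values[v].Value;
                v++;
            }
            result.Add(new KeyValuePair<string, string>(attrs[a].Value, value));
        }
        return result;
    }

    public static void Main()
    {
        foreach (var pair in Pair("<td nowrap colspan=2>"))
            Console.WriteLine(pair.Key + " = " + (pair.Value ?? "(no value)"));
    }
}
```

For the <td nowrap colspan=2> input this prints "nowrap = (no value)" followed by "colspan = 2". Quoted values keep their surrounding quotes, because the "Value" group captures them as written.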

If you aren't concerned with capturing the attributes and values, but still want to correctly deal with the "nested >" issue, this pattern should work:
< (?'Tag'[a-zA-Z0-9:]+)
(?: [^"'>]+ | "[^"]*" | '[^']*' )*
/?>
With these options:
RegexOptions.IgnorePatternWhitespace
RegexOptions.ExplicitCapture
The ExplicitCapture option isn't actually needed because the pattern uses (?:)
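
Tying this back to the original question, here is a hedged sketch that uses this last pattern to lower-case only the tag name of each opening tag while leaving attributes and their values untouched. The names TagNameCase and LowerTagNames are invented for the example, and ToLowerInvariant() is an assumption made here to avoid culture-specific casing.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagNameCase
{
    // The pattern from the message above (double quotes doubled for C#).
    private static readonly Regex OpeningTag = new Regex(@"
        < (?'Tag'[a-zA-Z0-9:]+)
        (?: [^""'>]+ | ""[^""]*"" | '[^']*' )*
        /?>",
        RegexOptions.IgnorePatternWhitespace);

    // Rebuilds each matched tag with only the 'Tag' capture lower-cased.
    public static string LowerTagNames(string html)
    {
        return OpeningTag.Replace(html, m =>
        {
            Group tag = m.Groups["Tag"];
            int offset = tag.Index - m.Index; // position of the name in the match
            return m.Value.Substring(0, offset)
                 + tag.Value.ToLowerInvariant()
                 + m.Value.Substring(offset + tag.Length);
        });
    }

    public static void Main()
    {
        Console.WriteLine(
            LowerTagNames("<IMG SRC=\"Image.gif\" ALT=\"> Home\"><B>Bold</B>"));
    }
}
```

This prints <img SRC="Image.gif" ALT="> Home"><b>Bold</B>. The closing </B> is left as it was, since the pattern requires a tag name directly after the <.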

-Wayne

-----Original Message-----
From: Ollie Cornes [mailto:Click here to reveal e-mail address]
Sent: Tuesday, March 05, 2002 5:05 AM
To: aspngregexp
Subject: [aspngregexp] Re: Changing case

[Original message clipped]

System.Text.RegularExpressions.MatchEvaluator
System.Text.RegularExpressions.Regex
System.Text.RegularExpressions.RegexOptions




 Copyright © Matthew Baxter-Reynolds 2001-2008. '.NET 247 Software Development Services' is a trading style of MBR IT Solutions Ltd.