<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Open Coder &#187; regular expressions</title>
	<atom:link href="http://www.opencoder.co.uk/tag/regular-expressions/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.opencoder.co.uk</link>
	<description>Helping the fellow geek</description>
	<lastBuildDate>Fri, 15 Apr 2011 12:25:41 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>URL regular expression</title>
		<link>http://www.opencoder.co.uk/2011/01/url-regular-expression/</link>
		<comments>http://www.opencoder.co.uk/2011/01/url-regular-expression/#comments</comments>
		<pubDate>Wed, 05 Jan 2011 13:30:26 +0000</pubDate>
		<dc:creator>Chris McDonald</dc:creator>
				<category><![CDATA[Flex]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[Web development]]></category>
		<category><![CDATA[development]]></category>
		<category><![CDATA[regular expressions]]></category>

		<guid isPermaLink="false">http://www.opencoder.co.uk/?p=461</guid>
		<description><![CDATA[This is just a quick post to share a regular expression for a URL I had to come up with when needing to validate a URL in a Flex app. The code below is for Flex, but would only require a few minor changes for another language, the double backslash before the ? appears to [...]]]></description>
			<content:encoded><![CDATA[<p>This is just a quick post to share a regular expression I came up with when I needed to validate a URL in a Flex app. The code below is for Flex, but it would only require a few minor changes for another language. The double backslash before the ? appears to be required for Flex (a single backslash does not work); read more about that in this <a title="Flex regular expression issues" href="http://www.opencoder.co.uk/2010/03/regexpvalidator-issues/" target="_blank">older post</a>. The expression also contains what would be capturing brackets in other languages. I could have used non-capturing brackets, but that would have made this already complicated example even more difficult to read.</p>
<div class="codesnip-container" >
<div class="actionscript codesnip" style="font-family:monospace;">linkValidator = <span class="kw2">new</span> RegExpValidator<span class="br0">&#40;</span><span class="br0">&#41;</span>;<br />
linkValidator.<span class="me1">expression</span> = <span class="st0">&quot;(http(s)?:<span class="es0">\/</span><span class="es0">\/</span>)?(([a-z]+[a-z0-9<span class="es0">\-</span>]*[.])?([a-z0-9]+[a-z0-9<span class="es0">\-</span>]*[.])+[a-z]{2,3}|localhost)(<span class="es0">\/</span>[a-z0-9_-]+[a-z0-9_ -]*)*<span class="es0">\/</span>?(<span class="es0">\\</span>?[a-z0-9_-]+=[a-z0-9 ',.-]*(&amp;amp;[a-z0-9_-]+=[a-z0-9 ',.-]*)*)?(#[a-z0-9/_-]*)?$&quot;</span>;<br />
linkValidator.<span class="me1">noMatchError</span> = resourceManager.<span class="me1">getString</span><span class="br0">&#40;</span><span class="st0">&quot;lang&quot;</span>, <span class="st0">&quot;invalidURL&quot;</span><span class="br0">&#41;</span>;<br />
linkValidator.<span class="me1">flags</span> = <span class="st0">&quot;i&quot;</span>;<br />
linkValidator.<span class="me1">source</span> = linkTextArea;<br />
linkValidator.<span class="me1">property</span> = <span class="st0">&quot;text&quot;</span>;<br />
linkValidator.<span class="me1">trigger</span> = linkTextArea;<br />
linkValidator.<span class="me1">triggerEvent</span> = Event.<span class="me1">CHANGE</span>;</div>
</div>
<p>I&#8217;ll break it down into its individual sections, with a brief explanation of each.</p>
<pre>
//protocol and subdomain
(http(s)?:\/\/)?(([a-z]+[a-z0-9\-]*[.])?
</pre>
<p>The first part matches the protocol (http:// or https://). I am only dealing with web (http) URLs here, and the protocol is optional in my app, hence the ? at the end of the first group. The rest matches an optional subdomain, which should start with one or more letters followed by zero or more letters/numbers/hyphens and a dot. This first subdomain and dot is also optional. So far this would match: <em>[empty string] http:// https://www. https://ww2.</em> etc.</p>
<pre>
//server hostname
([a-z0-9]+[a-z0-9\-]*[.])+[a-z]{2,3}|localhost)
</pre>
<p>This next part matches the rest of the web host. The first grouping (first enclosing brackets) specifies the start of the hostname or a further subdomain, which must start with a letter or number, followed by zero or more letters/numbers/hyphens and a dot (the dot as a character set is how to represent the dot in Flex, you might be able to just use <em>\.</em>). This can be repeated many times, but should then be followed by two or three letters. Alternatively the hostname localhost can be used instead. The extra closing bracket matches the additional opening one after the protocol. Together with the optional subdomain above, this section should match: <em>www.example.com localhost example.com example.co.uk co.uk</em> etc.</p>
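<p>As a rough check of the host part, here is a minimal sketch ported to Python rather than Flex (an assumption made purely for illustration; <em>HOST_PATTERN</em> and <em>matches_host</em> are made-up names). It includes the optional subdomain group from the previous section, since the extra closing bracket belongs to it, and uses <em>\.</em> instead of <em>[.]</em>:</p>

```python
import re

# Optional subdomain, one or more dotted labels, then a 2-3 letter ending,
# or the literal hostname "localhost". Case is ignored, like the "i" flag.
HOST_PATTERN = re.compile(
    r"(([a-z]+[a-z0-9-]*\.)?([a-z0-9]+[a-z0-9-]*\.)+[a-z]{2,3}|localhost)",
    re.IGNORECASE,
)


def matches_host(text):
    """Return True when the whole string is accepted by the host part."""
    return HOST_PATTERN.fullmatch(text) is not None


for candidate in ["www.example.com", "localhost", "co.uk", "example", "example.info"]:
    print(candidate, matches_host(candidate))
```

<p>The last two candidates are rejected: <em>example</em> has no dot, and <em>example.info</em> ends in four letters, so the <em>{2,3}</em> would need widening if longer endings should be allowed.</p>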
<pre>
//web path
(\/[a-z0-9_-]+[a-z0-9_ -]*)*\/?
</pre>
<p>This next part matches the optional path (directory from the web root). Each segment starts with a forward slash followed by one or more letters, numbers, underscores, spaces or hyphens, but cannot start with a space (you might need to backslash escape your hyphen in a different language). The trailing forward slash is optional, as is the entire path. This part should match: <em>[empty string] / /directory /a/b/</em> etc.</p>
<pre>
//query string
(\\?[a-z0-9_-]+=[a-z0-9 ',.-]*(&amp;amp;[a-z0-9_-]+=[a-z0-9 ',.-]*)*)?
</pre>
<p>This part matches the optional query string part of the URL. It starts with a ? (which may require only a single backslash in a different language), followed by the first parameter name made up of one or more letters/numbers/underscores/hyphens, then the equals sign, followed by an optional parameter value made up of letters/numbers/spaces/apostrophes/commas/dots/hyphens. This parameter=value part of the query string can be repeated several times after that, but each extra parameter should be preceded by an ampersand (you would normally use just the &#038; for this, but Flex requires the encoded version). This section could match: <em>[empty string] ?a= ?a=bc ?a=b&#038;c=d&#038;e=f</em> etc.</p>
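<p>To see what the query string part accepts and rejects on its own, here is a small sketch in Python (not Flex, so a single backslash before the ? and a plain ampersand are enough; <em>QUERY_PATTERN</em> and <em>matches_query</em> are names invented for this example):</p>

```python
import re

# "?name=value" followed by any number of "&name=value" pairs; the whole
# group is optional, so the empty string is accepted as well.
QUERY_PATTERN = re.compile(
    r"(\?[a-z0-9_-]+=[a-z0-9 ',.-]*(&[a-z0-9_-]+=[a-z0-9 ',.-]*)*)?",
    re.IGNORECASE,
)


def matches_query(text):
    """Return True when the whole string is accepted by the query part."""
    return QUERY_PATTERN.fullmatch(text) is not None


for candidate in ["", "?a=", "?a=b&c=d&e=f", "?a", "?a=b&", "?a=b%20c"]:
    print(repr(candidate), matches_query(candidate))
```

<p>The last three are rejected: a parameter without an equals sign, a dangling ampersand, and a percent-encoded value, since % is not in the value character set.</p>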
<pre>
//fragment
(#[a-z0-9/_-]*)?$
</pre>
<p>Finally, the last part of the expression matches the optional URL fragment (the part with a #). In my case I specified zero or more letters/numbers/forward slashes/underscores/hyphens (Flex does not require escaping the forward slash when it is included in a character set). The dollar sign then specifies that there should not be anything else after this. This could match: <em>[empty string] # #value #a/b/c</em> etc.</p>
<p>I hope this is useful to those struggling to create their own URL regular expression matchers. Flex devs remember to double escape the ? for the query string part of the URL.</p>
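<p>For anyone outside Flex, the whole expression might be ported along these lines. This is only a sketch in Python under a few assumptions: <em>URL_PATTERN</em> and <em>is_valid_url</em> are invented names, the slashes and dots are written without the Flex-specific escaping, and <em>fullmatch</em> is used in place of the trailing $, which is slightly stricter than the original because it also anchors the start of the string.</p>

```python
import re

URL_PATTERN = re.compile(
    # protocol (optional)
    r"(https?://)?"
    # optional subdomain + hostname, or localhost
    r"(([a-z]+[a-z0-9-]*\.)?([a-z0-9]+[a-z0-9-]*\.)+[a-z]{2,3}|localhost)"
    # web path with optional trailing slash
    r"(/[a-z0-9_-]+[a-z0-9_ -]*)*/?"
    # query string
    r"(\?[a-z0-9_-]+=[a-z0-9 ',.-]*(&[a-z0-9_-]+=[a-z0-9 ',.-]*)*)?"
    # fragment
    r"(#[a-z0-9/_-]*)?",
    re.IGNORECASE,  # same effect as flags = "i" on the validator
)


def is_valid_url(text):
    """Return True when the entire string matches the URL expression."""
    return URL_PATTERN.fullmatch(text) is not None


for candidate in [
    "http://www.example.com",
    "example.co.uk/a/b/?a=b&c=d#frag",
    "localhost",
    "ftp://example.com",
    "http://example",
]:
    print(candidate, is_valid_url(candidate))
```

<p>The expression is deliberately narrow: it has no support for ports, IP addresses, percent-encoded characters or endings longer than three letters, so treat it as a starting point rather than a general URL validator.</p>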
]]></content:encoded>
			<wfw:commentRss>http://www.opencoder.co.uk/2011/01/url-regular-expression/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>RegExpValidator issues</title>
		<link>http://www.opencoder.co.uk/2010/03/regexpvalidator-issues/</link>
		<comments>http://www.opencoder.co.uk/2010/03/regexpvalidator-issues/#comments</comments>
		<pubDate>Fri, 26 Mar 2010 13:32:46 +0000</pubDate>
		<dc:creator>Chris McDonald</dc:creator>
				<category><![CDATA[Flex]]></category>
		<category><![CDATA[General ramblings]]></category>
		<category><![CDATA[flash]]></category>
		<category><![CDATA[mxml]]></category>
		<category><![CDATA[regular expressions]]></category>

		<guid isPermaLink="false">http://www.opencoder.co.uk/?p=170</guid>
		<description><![CDATA[Today I came across an annoying problem involving the RegExpValidator class in Flex. In the Flex app for one of our projects, StickyWorld, we allow users to upload images, PDFs, 3D models and, most recently, to reference YouTube movies inside a virtual room where people can comment on them, adding sticky notes in context. In our upload [...]]]></description>
			<content:encoded><![CDATA[<p>Today I came across an annoying problem involving the RegExpValidator class in Flex. In the Flex app for one of our projects, <a title="StickyWorld online collaborative design review" href="http://www.stickyworld.com" target="_blank">StickyWorld</a>, we allow users to upload images, PDFs, 3D models and, most recently, to reference YouTube movies inside a virtual room where people can comment on them, adding sticky notes in context. In our upload window, users can choose what type of media they want to add to the room; in the case of a YouTube video only the video id is required. To make it a bit more flexible we decided to allow users to specify a URL to a video as well as just the video id. The app needed to handle URLs in several different formats, e.g.</p>
<ul>
<li>http://www.youtube.com/watch?v=[videoid]</li>
<li>http://www.youtube.com/v/[videoid]</li>
<li>http://www.youtube.com/[userchannel]/[somepath]/[videoid]</li>
</ul>
<p>Sounds like a job for a regular expression, right? Since we also wanted to do some validation on the field to make sure we were given a valid YouTube video URL or video id, it seemed that the RegExpValidator class was perfect for the job. However, after adding the RegExpValidator in MXML I hit a problem that took a good while to figure out.</p>
<p>Below is the snippet of code containing the regular expression string, its use in the validator and a validation handler function. Apologies for the wrapping of the regular expression; I have split it where the main formats are separated by pipes.</p>
<div class="codesnip-container" >
<div class="actionscript codesnip" style="font-family:monospace;"><span class="br0">&#91;</span>Bindable<span class="br0">&#93;</span><br />
<span class="kw3">private</span> <span class="kw2">var</span> youtubeRegExp:<span class="kw3">String</span> = <br />
<span class="st0">&quot;^(?:http:<span class="es0">\/</span><span class="es0">\/</span>(?:www<span class="es0">\.</span>)?youtube<span class="es0">\.</span>com<span class="es0">\/</span>watch(?:<span class="es0">\?</span>|#!)v=(.<span class="es0">\{</span>11<span class="es0">\}</span>)(?:&amp;.*)?<br />
|(.<span class="es0">\{</span>11<span class="es0">\}</span>)<br />
|http:<span class="es0">\/</span><span class="es0">\/</span>(?:www<span class="es0">\.</span>)?youtube<span class="es0">\.</span>com<span class="es0">\/</span>(?:v|[A-Za-z0-9#<span class="es0">\/</span>_<span class="es0">\-</span>]*)<span class="es0">\/</span>(.<span class="es0">\{</span>11<span class="es0">\}</span>))$&quot;</span>;</p>
<p><span class="kw3">private</span> <span class="kw2">function</span> youtubeValid<span class="br0">&#40;</span>ev:ValidationResultEvent<span class="br0">&#41;</span>:<span class="kw3">void</span><br />
<span class="br0">&#123;</span><br />
    <span class="kw1">if</span> <span class="br0">&#40;</span>ev.<span class="kw3">type</span> == ValidationResultEvent.<span class="me1">VALID</span><span class="br0">&#41;</span> <span class="br0">&#123;</span><br />
        <span class="kw1">for</span> <span class="br0">&#40;</span><span class="kw2">var</span> i:<span class="kw3">int</span> = <span class="nu0">0</span>; <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; i <span class="sy0">&lt;</span> ev.<span class="me1">results</span><span class="br0">&#91;</span>0<span class="br0">&#93;</span>.<span class="me1">matchedSubstrings</span>.<span class="kw3">length</span>; <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; i++<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#41;</span> <span class="br0">&#123;</span><br />
             <span class="kw1">if</span> <span class="br0">&#40;</span>ev.<span class="me1">results</span><span class="br0">&#91;</span>0<span class="br0">&#93;</span>.<span class="me1">matchedSubstrings</span><span class="br0">&#91;</span>i<span class="br0">&#93;</span> <span class="sy0">!</span>= <span class="kw2">null</span><span class="br0">&#41;</span><br />
                  youtubeVidId = ev.<span class="me1">results</span><span class="br0">&#91;</span>0<span class="br0">&#93;</span>.<span class="me1">matchedSubstrings</span><span class="br0">&#91;</span>i<span class="br0">&#93;</span>;<br />
&nbsp; &nbsp;     <span class="br0">&#125;</span><br />
        txtName.<span class="kw3">text</span> = youtubeVidId;<br />
        btnSubmit.<span class="kw3">enabled</span> = txtYouTube.<span class="kw3">text</span>.<span class="me1">length</span><span class="sy0">&gt;</span><span class="nu0">0</span>;<br />
    <span class="br0">&#125;</span> <span class="kw1">else</span> <span class="br0">&#123;</span><br />
         btnSubmit.<span class="kw3">enabled</span> = <span class="kw2">false</span>;<br />
    <span class="br0">&#125;</span><br />
<span class="br0">&#125;</span></p>
<p><span class="sy0">&lt;</span>mx :RegExpValidator id=<span class="st0">&quot;youtubeValidator&quot;</span> <br />
&nbsp; &nbsp; source=<span class="st0">&quot;{txtYouTube}&quot;</span> property=<span class="st0">&quot;text&quot;</span> <br />
&nbsp; &nbsp; expression=<span class="st0">&quot;{youtubeRegExp}&quot;</span> valid=<span class="st0">&quot;youtubeValid(event)&quot;</span><br />
&nbsp; &nbsp; invalid=<span class="st0">&quot;youtubeValid(event)&quot;</span> <br />
&nbsp; &nbsp; noMatchError=<span class="st0">&quot;YouTube video url/id invalid&quot;</span><br />
&nbsp; &nbsp; trigger=<span class="st0">&quot;{txtYouTube}&quot;</span><span class="sy0">/&gt;</span></div>
</div>
<p>The problem was that nothing was coming out as valid, not even the video id by itself. So I wrote a test that created a RegExp object from the expression and ran some URLs through the exec function, and it appeared to be working fine, so why not with the validator? In the end the problem turned out to be down to several characters needing to be escaped. The expression that worked is below:</p>
<div class="codesnip-container" >
<div class="actionscript codesnip" style="font-family:monospace;">youtubeRegExp = <br />
<span class="st0">&quot;^(?:http:<span class="es0">\/</span><span class="es0">\/</span>(?:www<span class="es0">\.</span>)?youtube<span class="es0">\.</span>com<span class="es0">\/</span>watch(?:<span class="es0">\\</span>?|#!)v=(.<span class="es0">\{</span>11<span class="es0">\}</span>)(?:&amp;;.*)?<br />
|(.<span class="es0">\{</span>11<span class="es0">\}</span>)<br />
|http:<span class="es0">\/</span><span class="es0">\/</span>(?:www<span class="es0">\.</span>)?youtube<span class="es0">\.</span>com<span class="es0">\/</span>(?:v|[A-Za-z0-9#<span class="es0">\/</span>_<span class="es0">\-</span>]*)<span class="es0">\/</span>(.<span class="es0">\{</span>11<span class="es0">\}</span>))$&quot;</span>;</div>
</div>
<p>Since the regular expression used in RegExpValidator needs to be a String, and the String is used as a bound property in MXML, the curly brackets need to be preceded by a backslash, because unescaped curly brackets in MXML mean a data binding. That didn&#8217;t make much sense to me, since I was binding a String variable which contained the expression, but ok, I can kind of see the problem. What made even less sense was that I needed to double escape the ? when I actually wanted to include a literal ? in the expression. I suppose that since it is a string and I want to include the backslash character before the question mark, I need to escape the backslash itself, leaving \\?. But then shouldn&#8217;t I have to do that for all the other times I need to include a backslash in the expression? Well, it turns out the answer is no. I do not really understand it and only figured this out after a lot of debugging.</p>
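<p>For comparison, here is roughly how the same three formats could be handled outside Flex. This is a sketch in Python, where raw strings avoid the double escaping altogether; <em>YOUTUBE_PATTERN</em> and <em>extract_video_id</em> are names made up for the example, and <em>abcdefghijk</em> is just a placeholder 11 character id.</p>

```python
import re

# Three alternatives, each capturing the 11 character video id:
# a watch URL, a bare id, or a /v/ or channel-style URL.
YOUTUBE_PATTERN = re.compile(
    r"^(?:http://(?:www\.)?youtube\.com/watch(?:\?|#!)v=(.{11})(?:&.*)?"
    r"|(.{11})"
    r"|http://(?:www\.)?youtube\.com/(?:v|[A-Za-z0-9#/_-]*)/(.{11}))$"
)


def extract_video_id(text):
    """Return the captured video id, or None when the input is not valid."""
    match = YOUTUBE_PATTERN.match(text)
    if match is None:
        return None
    # Only one of the three capturing groups takes part in any match,
    # so return whichever one is set (like the loop in the handler).
    for group in match.groups():
        if group is not None:
            return group
    return None


print(extract_video_id("http://www.youtube.com/watch?v=abcdefghijk"))
print(extract_video_id("http://www.youtube.com/v/abcdefghijk"))
print(extract_video_id("abcdefghijk"))
```

<p>Note that the middle alternative accepts any 11 characters at all, so this only checks the shape of the input, not that the video exists.</p>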
<p>Another pitfall to avoid: if you are binding on a condition in MXML and you need to use logical operators, you will need to encode them. The logical AND &amp;&amp; should be written as &amp;amp;&amp;amp;, the less than sign &lt; as &amp;lt; and the greater than sign &gt; as &amp;gt;, while the logical OR || works fine as it is. However, in those cases you should really be binding to a function which returns the boolean result you are looking for.</p>
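<p>The encoding is just standard XML escaping, since MXML is an XML document. As a small illustration (in Python, with the made-up helper name <em>encode_for_mxml_attribute</em>), the standard library can produce the encoded form of a condition:</p>

```python
from xml.sax.saxutils import escape


def encode_for_mxml_attribute(expression):
    """Encode &, < and > so the expression can sit inside an XML attribute."""
    return escape(expression)


# The ampersands and angle brackets are encoded, the pipes are left alone.
print(encode_for_mxml_attribute("a < b && c > d"))
print(encode_for_mxml_attribute("a || b"))
```

<p>This is only meant to show which characters change; in a real project the condition would be typed into the MXML by hand or, better, moved into a function.</p>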
<p>Working with MXML is cool because you can create data-bindable UI components quickly and easily, but it really isn&#8217;t when you have to worry about silly issues like wondering why your regular expressions are not working.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.opencoder.co.uk/2010/03/regexpvalidator-issues/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Regular expressions in mysql</title>
		<link>http://www.opencoder.co.uk/2009/10/regular-expressions-in-mysql/</link>
		<comments>http://www.opencoder.co.uk/2009/10/regular-expressions-in-mysql/#comments</comments>
		<pubDate>Tue, 20 Oct 2009 21:38:00 +0000</pubDate>
		<dc:creator>Chris McDonald</dc:creator>
				<category><![CDATA[Databases]]></category>
		<category><![CDATA[Servers]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[openfire]]></category>
		<category><![CDATA[regular expressions]]></category>

		<guid isPermaLink="false">http://www.opencoder.co.uk/?p=73</guid>
		<description><![CDATA[It has been a very long time since I posted anything so I thought I would share this small snippet. Recently I had to extract some data which was stored as xml in a field within a mysql database table. To explain briefly, this data was actually saved as part of an XMPP (jabber) chat [...]]]></description>
			<content:encoded><![CDATA[<p>It has been a very long time since I posted anything so I thought I would share this small snippet.</p>
<p>Recently I had to extract some data which was stored as XML in a field within a MySQL database table. To explain briefly, this data was actually saved as part of an XMPP (Jabber) chat message by an Openfire server. <a title="Openfire real time collaboration server" href="http://www.igniterealtime.org/projects/openfire/" target="_blank">Openfire</a> is a real time collaboration server which includes Jabber functionality. To store the chat history, Openfire records each chat message as an XML chunk in a field named <em>body</em> in the database table <em>ofMucConversationLog</em>. I needed to extract certain data that our system had communicated as a chat message through Openfire, and to do this I used a combination of regular expressions and string manipulation.</p>
<p>Ok, let&#8217;s say I have messages that are either in format A or B as below:</p>
<pre>&lt;messageformata&gt;
  &lt;child1&gt;some text&lt;/child1&gt;
  &lt;child2&gt;some more text&lt;/child2&gt;
&lt;/messageformata&gt;

&lt;messageformatb&gt;
  &lt;child1&gt;some text&lt;/child1&gt;
  &lt;child2&gt;some more text&lt;/child2&gt;
&lt;/messageformatb&gt;</pre>
<p>Each row in the database can contain xml in the format of A or B and there are many rows. If I need to extract only the contents of the first child of message format a, I could use the following SQL code to do it (assuming I&#8217;ve created a temporary table called <em>temptable</em> to store these values):</p>
<div class="codesnip-container" >
<div class="sql codesnip" style="font-family:monospace;"><span class="kw1">INSERT</span> <span class="kw1">INTO</span> temptable <span class="br0">&#40;</span>child1value<span class="br0">&#41;</span><br />
<span class="kw1">SELECT</span><br />
SUBSTRING<span class="br0">&#40;</span>body<span class="sy0">,</span> LOCATE<span class="br0">&#40;</span><span class="st0">'&lt;child1&gt;'</span><span class="sy0">,</span> body<span class="br0">&#41;</span><span class="sy0">+</span><span class="nu0">8</span><span class="sy0">,</span> <br />
&nbsp; LOCATE<span class="br0">&#40;</span><span class="st0">'&lt;/child1&gt;'</span><span class="sy0">,</span> body<span class="br0">&#41;</span> <span class="sy0">-</span> LOCATE<span class="br0">&#40;</span><span class="st0">'&lt;child1&gt;'</span><span class="sy0">,</span> body<span class="br0">&#41;</span> <span class="sy0">-</span> 8<span class="br0">&#41;</span><br />
<span class="kw1">FROM</span> ofMucConversationLog <span class="kw1">WHERE</span> body <span class="kw1">REGEXP</span> <span class="st0">'&lt;messageformata&gt;*'</span>;</div>
</div>
<p>Unfortunately the REGEXP operator can only be used for testing true or false, and so is probably only useful in a WHERE clause. There are no capturing options like in other languages (PHP, Perl, Java etc.); however, you can use it in combination with the SUBSTRING and LOCATE functions to get the job done, although it is admittedly tedious.</p>
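<p>To make the position arithmetic easier to follow, here is the same idea sketched outside the database in Python. The names <em>extract_child1</em>, <em>OPEN_TAG</em> and <em>CLOSE_TAG</em> are invented for this example, and the format check is a simplified stand-in for the REGEXP test in the WHERE clause.</p>

```python
import re

OPEN_TAG = "<child1>"
CLOSE_TAG = "</child1>"


def extract_child1(body):
    """Return the text inside <child1> for format A messages, else None."""
    # Stand-in for: WHERE body REGEXP ... (only format A rows qualify).
    if re.search("<messageformata>", body) is None:
        return None
    start = body.find(OPEN_TAG)
    end = body.find(CLOSE_TAG)
    if start == -1 or end == -1:
        return None
    # The "+ 8" and "- 8" in the SQL are len("<child1>"): skip past the
    # opening tag and stop just before the closing tag.
    return body[start + len(OPEN_TAG):end]


message_a = "<messageformata><child1>some text</child1></messageformata>"
message_b = "<messageformatb><child1>some text</child1></messageformatb>"
print(extract_child1(message_a))
print(extract_child1(message_b))
```

<p>The slice from <em>start + 8</em> up to <em>end</em> has the same length as the third SUBSTRING argument in the SQL, which is the position of the closing tag minus the position of the opening tag minus 8.</p>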
]]></content:encoded>
			<wfw:commentRss>http://www.opencoder.co.uk/2009/10/regular-expressions-in-mysql/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

