Today seems to be the day for posting useful stuff on the blog. Actually, it’s just so I can quickly find useful bits of code I might need again.

So I’ve got a web page that displays some text a user has entered. That text may contain URLs, and I’d like them to be displayed as actual hyperlinks.


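// Needs: using System.Text.RegularExpressions;
public string ConvertUrlsToLinks(string msg)
{
    string regex = @"((www\.|(http|https|ftp|news|file)+\:\/\/)[_.a-z0-9-]+\.[a-z0-9\/_:@=.+?,##%&~-]*[^.|\'|\# |!|\(|?|,| |>|<|;|\)])";
    Regex r = new Regex(regex, RegexOptions.IgnoreCase);
    // Wrap each match in an anchor, then give bare "www." links an explicit scheme.
    return r.Replace(msg, "<a href=\"$1\">$1</a>").Replace("href=\"www", "href=\"http://www");
}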



Not that this is going to compete with a full-blown keyword or tag service like the ones Google offers, but I needed a very quick way to suggest keywords and tags as users filled out a blog form. There’s no fancy thesaurus or library of data, but it gives surprisingly good results. At least, it gives the user a few hints and ideas about what they should put in as keywords.

// Needs: using System.Collections.Generic; using System.Linq; using System.Text.RegularExpressions;
public string GetKeywords(string text, int minCount, int minLength, string additionalWords)
{
    string keywords = "";
    Dictionary<string, int> dict = getKeywords(text, minCount, minLength);
    foreach (var entry in dict)
    {
        keywords += entry.Key + ",";
    }
    if (!string.IsNullOrEmpty(additionalWords))
    {
        keywords += additionalWords;
    }
    // Drop the trailing comma left behind when there are no additional words.
    return keywords.Trim(',');
}

private Dictionary<string, int> getKeywords(string text, int minCount, int minLength)
{
    // StripHTML and StripCrap are helper methods that are not shown in this post.
    text = StripHTML(text);
    text = StripCrap(text);
    text = text.ToLower();

    var stopWords = new string[] {
        "about","above","across","after","again","against","all","almost","alone","along","already","also","although","always","among","an","and","another","any","anybody","anyone","anything","anywhere","are","area","areas","around","as","ask","asked","asking","asks","at","away",
        "back","backed","backing","backs","be","became","because","become","becomes","been","before","began","behind","being","beings","best","better","between","big","both","but","by","came","can","cannot","case","cases","certain","certainly","clear","clearly","come","could",
        "did","differ","different","differently","do","does","done","down","downed","downing","downs","during","each","early","either","end","ended","ending","ends","enough","even","evenly","ever","every","everybody","everyone","everything","everywhere",
        "face","faces","fact","facts","far","felt","few","find","finds","first","for","four","from","full","fully","further","furthered","furthering","furthers","gave","general","generally","get","gets","give","given","gives","go","going","good","goods","got","great","greater","greatest","group","grouped","grouping","groups",
        "had","has","have","having","he","her","here","herself","high","higher","highest","him","himself","his","how","however","if","important","in","interest","interested","interesting","interests","into","is","it","its","itself",
        "just","keep","keeps","kind","knew","know","known","knows","large","largely","last","later","latest","least","less","let","lets","like","likely","long","longer","longest",
        "made","make","making","man","many","may","me","member","members","men","might","more","most","mostly","mr","mrs","much","must","my","myself",
        "necessary","need","needed","needing","needs","never","new","newer","newest","next","no","nobody","non","noone","not","nothing","now","nowhere","number","numbers",
        "of","off","often","old","older","oldest","on","once","one","only","open","opened","opening","opens","or","order","ordered","ordering","orders","other","others","our","out","over",
        "part","parted","parting","parts","per","perhaps","place","places","point","pointed","pointing","points","possible","present","presented","presenting","presents","problem","problems","put","puts","quite","rather","really","right","room","rooms",
        "said","same","saw","say","says","second","seconds","see","seem","seemed","seeming","seems","sees","several","shall","she","should","show","showed","showing","shows","side","sides","since","small","smaller","smallest","so","some","somebody","someone","something","somewhere","state","states","still","such","sure",
        "take","taken","than","that","the","their","them","then","there","therefore","these","they","thing","things","think","thinks","this","those","though","thought","thoughts","three","through","thus","to","today","together","too","took","toward","turn","turned","turning","turns","two",
        "under","until","up","upon","us","use","used","uses","very","want","wanted","wanting","wants","was","way","ways","we","well","wells","went","were","what","when","where","whether","which","while","who","whole","whose","why","will","with","within","without","work","worked","working","works","would",
        "year","years","yet","you","young","younger","youngest","your","yours"
    };

    // Strip basic punctuation, then split on spaces.
    var words = Regex.Replace(text, @"[,.?\/;:\(\)]", string.Empty).Split(' ');

    // Count how often each distinct non-stop word appears in the text.
    var occurrences = words.Distinct().Except(stopWords).Select(w =>
        new { Word = w, Count = words.Count(s => s == w) });

    return occurrences.Where(wo => wo.Count >= minCount && wo.Word.Length >= minLength)
        .ToDictionary(wo => wo.Word, wo => wo.Count);
}

minCount is the minimum number of times a word must occur before it is picked up
minLength is the minimum length of word to work on
additionalWords is a set of comma-separated words you want tagged(!) on to the end of the string. I just use this to force certain keywords and tags

So taking the content of this page (less the code snippet) like this:

string tags = GetKeywords(articletext, 2, 4, "dave");

results in:

“keywords,tags,minimum,word,dave” which isn’t a lot different from what WordPress generated for me automatically, so I used it for my post tags.

“Is a URL a URI?”
“Is a URI a URL?”
“What does URL stand for?”
“What does URI stand for?”
“What does URL mean?”

I’ve been asked so many times I’ve lost count, so for the sake of my sanity I’ve decided to explain their distinguishing marks.

URL – Uniform Resource Locator
URI – Uniform Resource Identifier

Sometimes these acronyms are used interchangeably, which is almost entirely wrong. Basically, a URL is a specific type of URI: one that says how to get at the resource, using schemes like http:, ftp: and mailto:. URI defines a larger, more generalized superset.

Effectively, URL is no longer a valid term in technical conversations or documents, and you should use URI regardless of how big a pedant you are. A URI describes both the way you access something and its designated location.

You may see some explanations where U is defined as Universal. That wording goes back to early documents such as RFC 1630; the current standard, RFC 3986, says Uniform.

So, let’s call the whole thing off.

Yesterday I was mainly deploying a new release of php code to a Linux server. Not that it’s a problem, but I normally develop php web applications on IIS running on Windows Server, just because I know where all the knobs and switches are. VS.PHP is a great add-in for Visual Studio and, with the Zend server-side debugger, it makes an awesome development environment. Sure, I could use Eclipse PDT in a similar way for free, but I’m more productive using Visual Studio; rightly or wrongly, the keyboard shortcuts are now part of my DNA.

I never quite feel comfortable using Linux and I’m not sure why. Clearly it’s an excellent OS, and I have several embedded boxes running Linux that I’ve never had to reboot. My WRT54G and NSLU2 always perform flawlessly. However, while on my journey home last night I realised what the problem might be. Linux makes my fingers hurt with all the typing I have to do!

Anyway, tip of the day for me was this little gem. I needed to copy the (massive) previous releases directory into my deployment working area and was happily using

cp -r blah blah

Then I started worrying about whether the permissions were being copied. (OK, so I’m happy to admit my perpetual virgin status when it comes to Linux.) It was then pointed out to me that buried in the directory structure were a raft of symbolic links, and that using cp would do a deep copy and fill up my server disc space pretty quickly. “Use tar to copy from the current directory to the new one like this”, they said.

tar cBf - . | (cd new_directory && tar xvBf -)

It worked a treat, I feel less like a Linux virgin, but my fingers still hurt.
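If you want to convince yourself before trying this on a real server, here is a minimal sketch of the same tar pipe run against a throwaway directory. The names used (releases, release-1, current, new_directory, all under a mktemp scratch area) are made up for illustration, and the B and v flags are left out to keep it short.

```shell
# Build a scratch area with a small "releases" tree containing a symbolic link.
work=$(mktemp -d)
mkdir -p "$work/releases/release-1" "$work/new_directory"
echo "hello" > "$work/releases/release-1/index.php"
ln -s release-1 "$work/releases/current"

# Stream the tree through tar: one tar writes an archive of "." to stdout,
# the other unpacks it from stdin inside the destination directory.
(cd "$work/releases" && tar cf - .) | (cd "$work/new_directory" && tar xf -)

# "current" should be listed as a link to release-1, not as a second copy of it.
ls -l "$work/new_directory"

# Remove the scratch area by hand when finished: rm -rf "$work"
```

The point to look for in the listing is that current is still a symbolic link in the copy, so the linked directory has not been duplicated on disc.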

A nuSOAP star

November 20, 2008

It’s not often I’m impressed with a new soap star – other than Hollyoaks and there’s only one reason to watch that. However, I recently had to integrate a web service client in the web site that displays a list of all the senior level consultants in the company. The website in question is written using php so a php SOAP consumer would be needed. And, as usual, I needed to get it done quickly.

There were a few options:

After some investigation and for various reasons I chose nuSOAP which is staggeringly easy to use. Here’s the code I needed to make it work – and there’s not much of it.

// nusoap.php must already be included; where it lives depends on your install.
defined('_JEXEC') OR defined('_VALID_MOS') OR die( "Direct Access Is Not Allowed" );
$proxyhost = isset($_POST['proxyhost']) ? $_POST['proxyhost'] : '';
$proxyport = isset($_POST['proxyport']) ? $_POST['proxyport'] : '';
$proxyusername = isset($_POST['proxyusername']) ? $_POST['proxyusername'] : '';
$proxypassword = isset($_POST['proxypassword']) ? $_POST['proxypassword'] : '';
$useCURL = isset($_POST['usecurl']) ? $_POST['usecurl'] : '0';
// The first argument is the WSDL URL, which is blank in this listing.
$client = new soapclient('', true, $proxyhost, $proxyport, $proxyusername, $proxypassword);
$result = $client->call('GetConsultants', array(), '', '', false, true);
if ($client->fault) {
   // The service returned a SOAP fault: dump it for inspection.
   echo '<h2>Fault</h2><pre>';
   print_r($result);
   echo '</pre>';
} else {
   $err = $client->getError();
   if ($err) {
      echo '<h2>Error</h2><pre>' . $err . '</pre>';
   } else {
      print('<table class="contentpaneopen">');
      for ($i=0; $i<sizeof($result["GetConsultantsResult"]["ConsultantDetail"]); $i++) {
         // The row markup for each consultant is missing from this listing.
      }
      print('</table>');
   }
}

nuSOAP also makes it particularly easy to debug and to display the SOAP request and response. Incidentally, this needed to be incorporated into a Joomla-based site, which normally means having to write a module or plugin. Yet again I was saved by the Jumi plugin, which meant I could just put the php directly into an Article. Couldn’t be simpler. You can see it working here