Monday, March 7, 2016

Resolve Relative URL with C#

Recently I wanted to scan a web page for links and then follow them with C#. Naturally, some of the links I found were relative, and it turned out that it is not easy to find a way on the Internet to resolve them with C#. At first I thought about writing my own function, but then I found a very simple method, which I want to share here.
Let us begin with a few words about links on the Internet: in HTML one refers to another page, that is, creates a link, as follows:

<a href="http://www.sudokusoftheday.blogspot.de/">Linktext</a>

The link above uses an absolute URL as its target, recognizable by the leading http:// followed by the host name.
But I can also refer to pages relative to the current page, for example:

<a href="../../p/youtube-channel.html">Linktext</a>

This link points to the page http://csharp-tricks-en.blogspot.de/p/youtube-channel.html . Since this post is located in the virtual folder "/2016/03/", the "../../" part moves two folders up to the root URL, and from there the link addresses the page /p/youtube-channel.html.
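The resolution described above can be checked with a small sketch. It uses the System.Uri class that this post introduces further down. The exact address of this post is not given here, so the file name example-post.html is a hypothetical placeholder, and the class name DotSegmentDemo and the helper Resolve are likewise made up for illustration.

```csharp
using System;

public static class DotSegmentDemo
{
    // Resolves an href found on a page against that page's own address.
    // Helper name and signature are illustrative, not part of .NET.
    public static string Resolve(string pageUrl, string href)
    {
        Uri page = new Uri(pageUrl);
        return new Uri(page, href).AbsoluteUri;
    }

    public static void Main()
    {
        // Hypothetical post address inside the virtual folder /2016/03/.
        string post = "http://csharp-tricks-en.blogspot.de/2016/03/example-post.html";

        // Each "../" removes one folder: /2016/03/ -> /2016/ -> /
        Console.WriteLine(Resolve(post, "../../p/youtube-channel.html"));
        // prints http://csharp-tricks-en.blogspot.de/p/youtube-channel.html
    }
}
```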

Absolute links are no problem: we can simply follow them. But if we are browsing a page and want to follow a relative link, things get a bit more complicated, especially since all the usual path expressions are allowed, such as the "../" seen above.

To avoid assembling these paths by hand, one can use the class System.Uri. One of its constructors takes a base URI and a relative path and returns the resolved absolute URI: System.Uri AbsoluteURL = new System.Uri(BaseUri, "relative Path");
As an example let us consider the Wikipedia article about the "Pronghorn". In this there is a relative link to the Wikipedia article about "Deer":

<a href="/wiki/Deer" title="Deer">deer</a>

To get the valid absolute link we run the following code:
System.Uri Base = new System.Uri("https://en.wikipedia.org/wiki/Pronghorn");
System.Uri ResolvedAbsoluteURL = new System.Uri(Base, "/wiki/Deer");
// ResolvedAbsoluteURL.AbsoluteUri is now "https://en.wikipedia.org/wiki/Deer"
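
When following every link on a page, absolute and relative targets come mixed together. The following is a minimal sketch of one way to handle both with the same call, assuming the href values have already been extracted from the HTML. The class name LinkResolver and the method TryResolve are hypothetical. Uri.TryCreate is used instead of the constructor so that a malformed href yields null rather than an exception.

```csharp
using System;

public static class LinkResolver
{
    // Returns the absolute address for an href found on pageUri,
    // or null if the href cannot be turned into a valid URI.
    // An href that is already absolute is returned unchanged.
    public static Uri TryResolve(Uri pageUri, string href)
    {
        Uri resolved;
        return Uri.TryCreate(pageUri, href, out resolved) ? resolved : null;
    }

    public static void Main()
    {
        Uri page = new Uri("https://en.wikipedia.org/wiki/Pronghorn");

        string[] hrefs =
        {
            "/wiki/Deer",                               // relative to the site root
            "Deer",                                     // relative to the current folder /wiki/
            "http://www.sudokusoftheday.blogspot.de/"   // already absolute
        };

        foreach (string href in hrefs)
        {
            Uri target = TryResolve(page, href);
            Console.WriteLine(href + " -> " + (target == null ? "(invalid)" : target.AbsoluteUri));
        }
    }
}
```

The first two entries both resolve to https://en.wikipedia.org/wiki/Deer, while the third is passed through as it is, so the calling code does not need to tell absolute and relative links apart itself.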
