Monday, May 11, 2009

How to Retrieve Content from Wikipedia using C#

A college friend is currently developing a university library system that is connected to the internet. The system has kiosks where students can search for books they want to borrow. For example, if a student types Manny Pacquiao into the search box, it should display all available books or magazines about Manny. Besides the book results, the system should also be able to show some information or a short biography about Manny. We thought that instead of populating our own information database, the system could simply retrieve content from a public encyclopedia such as Wikipedia.


Luckily for him, Wikipedia has a readily available API for this kind of web request. To show how we did it using C#, I created a sample project that can be downloaded from Mediafire.
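The kiosk will receive free-form search terms, so the request URL has to be built from whatever the student types. Here is a minimal sketch of that step, assuming the term already matches the article title. WikiExportUrl and ForTitle are hypothetical names that are not part of the sample project.

```csharp
using System;

public static class WikiExportUrl
{
    // Hypothetical helper, not part of the original sample project.
    // Turns a search term into a Special:Export URL for the English Wikipedia.
    public static string ForTitle(string searchTerm)
    {
        // MediaWiki page titles use underscores in place of spaces.
        string title = searchTerm.Trim().Replace(' ', '_');

        // Escape any remaining characters that are not safe inside a URL path.
        return "http://en.wikipedia.org/wiki/Special:Export/" + Uri.EscapeDataString(title);
    }
}
```

For example, WikiExportUrl.ForTitle("Manny Pacquiao") returns the same URL that is hardcoded in the click handler in step 4. The helper does not correct capitalization, so a term typed as "Manny pacquiao" may not resolve to the article itself.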

Just follow these steps to create your own from scratch:
1. Create a new Windows Forms project in your C# IDE.
2. On the form, add a button named btnRetrieve and a RichTextBox named rtWikiContent (a plain TextBox will also do).
3. Double-click the form to open the code window, then add the following namespaces to the using directives.

using System.Net;
using System.IO;
using System.Xml;
using System.Xml.XPath;


4. Now double-click the button to open the code window, then replace the generated Click event handler with the following code:

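private void btnRetrieve_Click(object sender, EventArgs e)
{
    HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create("http://en.wikipedia.org/wiki/Special:Export/Manny_Pacquiao");
    webRequest.Credentials = System.Net.CredentialCache.DefaultCredentials;
    webRequest.Accept = "text/xml";
    // Wikipedia may reject requests that carry no User-Agent header with
    // 403: Forbidden (see the comments below), so identify the client.
    webRequest.Timeout = 10000; // 10 seconds
    webRequest.UserAgent = "Code Sample Web Client";
    try
    {
        // The using blocks close the response and the reader even if parsing fails.
        using (HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse())
        using (Stream responseStream = webResponse.GetResponseStream())
        using (XmlReader xmlreader = new XmlTextReader(responseStream))
        {
            // Must match the xmlns declared on the root element of the export;
            // the version suffix changes when the export schema is revised.
            String NS = "http://www.mediawiki.org/xml/export-0.3/";
            XPathDocument xpathdoc = new XPathDocument(xmlreader);
            XPathNavigator myXPathNavigator = xpathdoc.CreateNavigator();
            // Each <text> element holds the wiki markup of one page revision.
            XPathNodeIterator nodesIt = myXPathNavigator.SelectDescendants("text", NS, false);

            while (nodesIt.MoveNext())
            {
                rtWikiContent.AppendText(nodesIt.Current.InnerXml);
            }
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show("Error while retrieving from Wikipedia. " + ex.ToString());
    }
}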
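The handler above only finds the text elements if the hardcoded export-0.3 namespace matches the one declared in the response. As a hedged alternative, the sketch below selects the elements by local name, so it does not depend on the schema version. WikiExportParser and ExtractText are hypothetical names, and this is a swapped-in XPath technique rather than the approach used in the sample project.

```csharp
using System.IO;
using System.Text;
using System.Xml.XPath;

public static class WikiExportParser
{
    // Hypothetical helper: returns the concatenated contents of every <text>
    // element in an export document, whatever namespace the document declares.
    public static string ExtractText(TextReader exportXml)
    {
        XPathDocument doc = new XPathDocument(exportXml);
        XPathNavigator nav = doc.CreateNavigator();

        // local-name() compares the element name without its namespace part.
        XPathNodeIterator it = nav.Select("//*[local-name()='text']");

        StringBuilder sb = new StringBuilder();
        while (it.MoveNext())
        {
            // Value is the element's text content with XML entities decoded.
            sb.Append(it.Current.Value);
        }
        return sb.ToString();
    }
}
```

Inside the click handler, this could be called as WikiExportParser.ExtractText(new StreamReader(responseStream)) and the result passed to rtWikiContent.AppendText. Unlike InnerXml, Value returns decoded text, so an entity such as &amp;lt; would appear as its literal character in the text box.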



5. Now let's test the application. Clicking the Retrieve button should fill the text box with the article's wiki markup, like this.

[Screenshot: Retrieve Content from Wikipedia]

If you want to try the Visual Studio 2005 project included with this post, download it here.



4 comments:

Anonymous said...

Your program returns the error 403: Forbidden. Do you know how we can solve this problem?

kausik said...

i am also getting same error could u please fix it

Murali said...

Hi all,

This is Murali. I am also getting the same error. Could some one please let me know fix for this issue as soon as possible.

Thanks
Murali

A. Sylvester Rajkumar said...

Hi All,

To remove 403: Forbidden error, add the following two lines in your code:

webRequest.Timeout = 10000; // 10 secs
webRequest.UserAgent = "Code Sample Web Client";

Add the above two lines after..
webRequest.Accept = "text/xml";

Complete code works FINE.

THANK YOU SO MUCH Fryan Valdez!!