Friday, April 08, 2011

XPath - fixed!

As you might have read in my previous post, I'm not too happy about the way XPath works with namespaces.

To make things better I have created some regular expressions that let me write XPaths the way I want to write them and still have them work with the real XPath engines out there. They make the XPath ignore namespaces that are not defined in my XPath query. Namespaces that are defined in my XPath query are handled as before.

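// Make relative paths explicit so every step starts with "/".
if (!xpath.StartsWith("/"))
{
    xpath = "./" + xpath;
}
// Steps with a predicate: /node[...] becomes /*[local-name()='node' and ...]
xpath = Regex.Replace(xpath, @"/(?<node>[\w]+)\[", "/*[local-name()='${node}' and ");
// Remaining plain steps; the lookahead skips prefixed names (ns:node) and function calls such as text().
xpath = Regex.Replace(xpath, @"/(?<node>[\w]+)(?![\w:(])", "/*[local-name()='${node}']");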
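To see what the rewrite produces, here is a minimal sketch that wraps the two replacements in a helper and runs the result against a small document. The class name NamespaceIgnorantXPath, the method name Rewrite, the sample XML and the urn:example:orders namespace are all made up for illustration. The lookahead in the second pattern is there so that prefixed steps and function calls are left untouched.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;

static class NamespaceIgnorantXPath
{
    // Hypothetical helper: rewrites unprefixed steps to local-name() tests.
    public static string Rewrite(string xpath)
    {
        if (!xpath.StartsWith("/"))
        {
            xpath = "./" + xpath;
        }
        // Steps that carry a predicate: merge the name test into the predicate.
        xpath = Regex.Replace(xpath, @"/(?<node>[\w]+)\[", "/*[local-name()='${node}' and ");
        // Remaining plain steps; skip prefixed names and function calls.
        xpath = Regex.Replace(xpath, @"/(?<node>[\w]+)(?![\w:(])", "/*[local-name()='${node}']");
        return xpath;
    }
}

static class Program
{
    static void Main()
    {
        // Sample document with a default namespace (illustrative only).
        var doc = new XmlDocument();
        doc.LoadXml("<order xmlns='urn:example:orders'>" +
                    "<address type='billing'>Main Street 1</address></order>");

        string rewritten = NamespaceIgnorantXPath.Rewrite("/order/address[@type='billing']");
        Console.WriteLine(rewritten);
        // prints /*[local-name()='order']/*[local-name()='address' and @type='billing']

        XmlNode node = doc.SelectSingleNode(rewritten);
        Console.WriteLine(node.InnerText); // prints Main Street 1
    }
}
```

Two caveats with this sketch: \w does not match hyphens or dots, so element names containing them are not rewritten correctly, and a positional predicate such as /address[1] loses its meaning because the number ends up inside an "and" expression.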

PS. I'll get back to the post and tidy the code up later

Saturday, April 02, 2011

namespace mismatch

Once in a while you come across technologies that are great on their own but painful when put together. Either you live with the pain, you drop one of the technologies, or you try to improve the way they interact. This is the case with namespaces and XML/XPath.

Let me just explain that I am a big fan of XML, XPath and XML Schema. They are related and work quite nicely together. The schema describes the data structure and can be used for validation. The XML is the data structure, and XPath is the query language for searching in it. Together these are very powerful.

Over the years I have heard a lot of people complain about XML, but often they are using it for the wrong purpose, they don't know how much XPath can help them, or they are drowning in namespaces and prefixes in their XML documents.

Namespaces in XML were introduced to allow cross-references between schemas and thereby the reuse of schemas. Every namespace other than the default namespace is given a prefix. For loosely structured documents this lets the writer of the XML document specify, on every single element, which namespace it is defined in. XSLT documents are a good example. Here prefixes help the writer of the XML validate only the elements that are part of the XSLT transformation logic. For this purpose namespaces and prefixes are great.

What I generally see with custom XML structures, however, is that the structure is more strictly defined and name clashes are less likely.

If I were to define the way XML and XSD work with namespaces, I would make the namespace prefixes optional on XML nodes where type and namespace can be derived from the node's placement in the structure. If the schema, e.g., only allows one type of "address" element at a given location, why should I specify what type of address element it is? Only where name and type clashes occur would I demand namespace prefixes.

Namespaces should be treated as metadata of elements. They are just for validation and identifying the type of an element where name clashes could occur. For everything else they are irrelevant.

That brings me on to XPath and namespaces. Namespace information is metadata as far as XPath is concerned, and you have to ask for the details if you want to use it in the query. It's a different story with prefixes. The prefix is actually part of the name you can search for. This makes absolutely no sense, as prefixes are only valid within a single document, and the next document you receive might have other prefixes but be just as valid. The impact is that you have to make your XPath queries quite verbose to ignore the prefix, and it really is a painful experience. If I could redesign XPath, I would make ignoring element prefixes the default behaviour. Actually I don't see why anyone would need the prefix in a search.
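To make the verbosity concrete, here is a small sketch using .NET's XmlDocument. The two sample documents, the urn:example:orders namespace and the PrefixDemo class are made up for illustration. Both documents use the same namespace but different prefixes. The short query finds nothing in either, while the verbose local-name() query finds the address in both.

```csharp
using System;
using System.Xml;

static class PrefixDemo
{
    // Hypothetical helper: counts the nodes an XPath query selects in a document.
    public static int Count(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.SelectNodes(xpath).Count;
    }

    static void Main()
    {
        // Same namespace URI, different prefixes (illustrative documents).
        string first = "<a:order xmlns:a='urn:example:orders'>" +
                       "<a:address>Main Street 1</a:address></a:order>";
        string second = "<b:order xmlns:b='urn:example:orders'>" +
                        "<b:address>Main Street 1</b:address></b:order>";

        // The query I would like to write, and the verbose one that ignores prefixes.
        string shortQuery = "/order/address";
        string verboseQuery = "/*[local-name()='order']/*[local-name()='address']";

        foreach (string xml in new[] { first, second })
        {
            // The unprefixed name test only matches elements in no namespace.
            Console.WriteLine(Count(xml, shortQuery) + " " + Count(xml, verboseQuery));
        }
        // prints "0 1" twice
    }
}
```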