Introducing Xerxes 2

Way back in 2004, I first set out to build an application that could serve as an improved user interface to Metalib, the federated search system developed and sold by Ex Libris.

Just a few years prior, Cal State had licensed Metalib. We liked its knowledgebase and search capabilities, but wanted a lot more control over how the end-user interface looked and behaved. So we worked with Ex Libris to flesh out an application programming interface for Metalib (called the X-Server) and, using that, built our own system that not only provided an improved user interface to Metalib but also served as a platform for adding features and functionality not available in Metalib itself.

I called the system Xerxes, and it immediately had a major impact on our usage numbers. As we replaced the standard Metalib interface with Xerxes, we saw a huge increase in searching and in the number of times users accessed the full text of journal articles, even more than we had expected.

In 2007, we released the system under an open source license, and today over 40 institutions (mostly universities) across the globe have implemented Xerxes, with some also contributing code back to the project.

Times have changed

In that time, a lot has changed, too. A new class of library search engines, often called discovery systems, has taken the academic library market by storm and has now all but replaced federated search systems like Metalib. We like the search capabilities of these systems as well, but we still want that extra level of control over the user interface. What can I say, we’re controlling like that.

The primary goal of Xerxes 2, then, is to provide a fully customizable and extendable interface to this new generation of commercial discovery systems, including Summon, Primo, and Ebsco Discovery.

Xerxes 2 can also serve as an interface to open source search engines like Solr and no-cost web services such as the Ebsco Integration Toolkit and the Worldcat API, which can augment (or even serve as an alternative to) commercial discovery systems. We’ll continue to use Metalib as well.

That’s a lot of systems to support. We won’t be using all of them here at Cal State, of course. But I’ve already written between 80 and 100 percent of the code needed to integrate with each of the systems mentioned above.

Frameworks

A number of new web development frameworks have also emerged and matured since I first started to build Xerxes. PHP frameworks were scarce back then, to say the least, so I decided to roll my own, adding to it as I went. When Jonathan Rochkind joined the project, he improved the framework considerably, but even now it has all the usual shortcomings of a home-grown framework.

Along the way, we also started to use Prototype for JavaScript development, which seemed brilliant at the time, but these days is decidedly out of style.

Seems like now is as good a time as any to move to more modern frameworks like Zend Framework for PHP and jQuery for JavaScript.

From experiment to production

In fact, I’ve been working on Xerxes 2 for quite some time now. We’ve been using Xerxes as an interface to Solr, Ebsco, and Worldcat in production here at Cal State for going on two years.

I originally wrote that code within the Xerxes 1.x code base and learned quite a few lessons along the way. Now I’m focused on reworking that earlier code into a production-ready general release using Zend Framework 2 and jQuery. You can follow my progress on GitHub.

More information to come

In the next few posts, I’d like to set out three high-level design goals for the new version. I’ll then return to the technical issues of architecture and frameworks. I also hope to offer a few posts here and there on my adventures with the new Zend Framework 2, which as of January 2011 is still in beta.

Posted in Xerxes 2 | Leave a comment

Using Metasearch to create a journal table of contents alerting service

In this screencast, I go over both the theory and the high-level technical requirements for using metasearch as the model for an RSS table of contents alerting service.

Bridge: WorldCat in Context

In this screencast, I go over the design of a system to create localized WorldCat pages for books and other items that you might link to from your catalog or link resolver. This was my entry for the 2010 OCLC Research Contest.

Impact of Xerxes at Cal State Fullerton

In this screencast, I examine the large increase in usage that came with the migration from Metalib to Xerxes at Cal State Fullerton.

Improving the SFX menu

In this screencast, I dissect some problems with the SFX menu and show a simpler interface developed at Cal State.

Update: Transitioning to the new Ex Libris simplified menus

If you previously downloaded the CSU SFX Templates, I now recommend switching to the SFX “simplified” templates using the new CSU Theme.

Here is a video detailing the transition from the CSU Templates to the Ex Libris Simplified Templates.

And here is the code on Ex Libris Commons.

Background: The design ideas behind the CSU templates

Here is the original video, from early 2007, detailing how we went about improving the SFX menu.

Dealing with double-escaping in the X-Server

The X-Server double-escapes some XML character entity references, which, left unresolved, will affect the display of certain letters and symbols in your results. This article describes a simple post-processing fix to solve this problem.

Character references in XML

In XML, the ampersand is a reserved character. Together with a semicolon, it delimits character references: decimal or hexadecimal codes used to represent some accented characters, diacritics, and other symbols. A character like é (e-acute), for example, will sometimes be represented as &#xe9;.

It is illegal in XML to have a “bare” ampersand, such as:

<title>Jack & Jill's Adventures</title>

In this case, we need to escape the ampersand by replacing it with the predefined ampersand entity reference:

<title>Jack &amp; Jill's Adventures</title>

The problem

To prevent bare ampersands from getting into the output, the X-Server converts them to the ampersand reference. That’s a good thing. But some of the databases that Metalib searches also include these XML character references in their data. The X-Server respects and preserves some of the basic entities, but escapes the leading ampersand of hexadecimal character references. A reference such as &#xe9;, for example, will be converted to &amp;#xe9;.

This is what we call double-escaping. This proves problematic since an XML parser won’t recognize these double-escaped character references. Rather, it sees them for what they have become: an ampersand (&amp;) followed by some extra letters and numbers (#xe9;). What your users will see in their browser, then, is “San Jos&#xe9;”.

To complicate matters, some databases (especially those that are screen-scraped) contain HTML character references, such as &eacute; for é. Although perfectly valid in HTML, these are illegal in XML without a supporting DTD definition. Oddly, then, it might be a good idea for the X-Server to double-escape these references.

Confused? Don’t worry.

The solution

Using XSLT, we can arrive at a very simple and convenient solution to this problem. All we need to do is transform the X-Server response (double-escaped characters and all) to HTML. Once we have the HTML as a string, we can do a quick find-and-replace to convert every ampersand reference (&amp;) back to a plain ampersand (&).

The Xerxes PHP code looks like this:

```php
// get the XML response from the X-Server
$xml = $metalib->retrieve($result_set, $start, $max);

// transform it to HTML
$html = $page->transform($xml, "xsl/results.xsl");

// undo the double-escaping
$html = str_replace("&amp;", "&", $html);
```

That will restore our hexadecimal references to what they should be. It will also leave some bare ampersands and HTML character references in the output, but since we’re now in HTML rather than XML, it doesn’t matter. We can hand all of those references to the browser, and it will display them just as you would expect.
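To see the whole round trip without a Metalib server, the sketch below parses a hand-written, double-escaped fragment with PHP’s DOM extension, uses htmlspecialchars() as a stand-in for the HTML serialization the XSLT step performs, and then applies the same str_replace() fix. The fragment and the undoDoubleEscaping() helper name are illustrative assumptions, not Xerxes code.

```php
<?php
// Hypothetical helper; Xerxes does this inline with str_replace().
function undoDoubleEscaping(string $html): string
{
    return str_replace('&amp;', '&', $html);
}

// A fragment as the X-Server might return it: the hex reference for é
// has had its leading ampersand escaped.
$xml = '<title>San Jos&amp;#xe9;</title>';

$doc = new DOMDocument();
$doc->loadXML($xml);

// The parser sees a literal ampersand followed by "#xe9;", not é.
$parsed = $doc->documentElement->textContent;

// Writing that text into HTML escapes the ampersand again
// (standing in for the XSLT output step).
$html = '<h2>' . htmlspecialchars($parsed) . '</h2>';

// Undo the double-escaping on the final HTML string.
$fixed = undoDoubleEscaping($html);

// A browser resolves the restored reference; html_entity_decode() mimics that.
echo html_entity_decode(strip_tags($fixed), ENT_QUOTES | ENT_HTML5, 'UTF-8'), "\n";
```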

Developing fast X-Server applications

This article offers some design considerations aimed at improving the speed of an application using the Metalib X-Server.

Performance

Web services are all the rage these days, but they sometimes come at the expense of bandwidth and performance. XML-based messages are bulky, firewalls and network latency can slow down transmissions, and XML programming objects and processors can quickly eat up computer resources.

The benefits of a web services approach can far outweigh these performance costs, especially in the case of Metalib, but design is critical. Here are some recommendations from the field:

1. Build locally whenever possible

One of the great benefits of the Metalib X-Server is that you don’t have to use it. That is, you can utilize only those functions that you need, and locally develop other functions to take advantage of improved flexibility or speed.

Login

The PDS authentication system can be painfully slow. It’s not unusual for users at Cal State, for example, to experience 30-second-plus wait times during login!

The X-Server includes functions for authenticating a user via PDS, but our Xerxes system can directly authenticate users against our local Active Directory server in sub-second time.

Categories and database selection screens

The X-Server includes several functions for retrieving information about subject categories and databases in the Metalib Knowledgebase. However, if you store all of this information in a local database, you can retrieve it much faster.

We keep all of our information about our subscription databases and subject categories in a couple of simple Oracle tables. All you need is a column to hold the Metalib ID number for each database so you can feed that to the X-Server in a metasearch query.
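To make that concrete, here is a minimal sketch of such a local lookup, with PDO and an in-memory SQLite database (the pdo_sqlite extension) standing in for the Oracle tables. The table and column names, the sample IDs, and the databaseIdsForCategory() helper are all hypothetical. The point is simply that a category page can get its list of Metalib IDs from one local query instead of an X-Server round trip.

```php
<?php
// In-memory SQLite stands in for the Oracle tables described above.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec('CREATE TABLE subscription_databases (
    metalib_id TEXT PRIMARY KEY,  -- the ID handed to the X-Server
    title      TEXT NOT NULL
)');
$db->exec('CREATE TABLE category_databases (
    category   TEXT NOT NULL,
    metalib_id TEXT NOT NULL REFERENCES subscription_databases(metalib_id)
)');

// Hypothetical sample rows; real IDs come from the Metalib Knowledgebase.
$db->exec("INSERT INTO subscription_databases VALUES
    ('DB00001', 'Example Database A'),
    ('DB00002', 'Example Database B'),
    ('DB00003', 'Example Database C')");
$db->exec("INSERT INTO category_databases VALUES
    ('History', 'DB00001'), ('History', 'DB00003'), ('Biology', 'DB00002')");

// Return the Metalib IDs for every database in a subject category.
function databaseIdsForCategory(PDO $db, string $category): array
{
    $stmt = $db->prepare(
        'SELECT d.metalib_id
           FROM subscription_databases d
           JOIN category_databases c ON c.metalib_id = d.metalib_id
          WHERE c.category = ?
          ORDER BY d.metalib_id'
    );
    $stmt->execute([$category]);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

print_r(databaseIdsForCategory($db, 'History'));
```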

Save and export options

Ex Libris has no immediate plans to include X-Server functions for saving records to a user’s e-shelf, so if you wish to give your users the ability to save and export records, you really have no choice but to build those options locally. Xerxes saves user-selected records to a local Oracle database, again providing significant performance advantages when retrieving those records later in a user’s “saved records” page.

2. Use the URL parameters

When I switched from using the XML POST method to using the URL parameters, I saw a noticeable improvement in response time.

This makes sense. It takes a web service more time to process XML than to read URL parameters, and you’ll see significant performance improvements with other popular web services, such as Amazon’s, when using REST rather than SOAP.

3. Only ask for what you need

Most of the initial communication between your application and the X-Server includes small messages regarding the status of your metasearch request. When you start pulling back search results, however, the responses get very large, very fast.

By default, Metalib sends back the entire record for each result. That’s not a problem when you’re requesting one record at a time. But when you initially present users with the results of their search, you’ll likely be showing them 10 records at a time.

The X-Server provides a very useful parameter here called view, which you can set to full, brief, or customize. Setting view to customize allows you to specify additional field parameters, instructing the X-Server to output those fields (and only those fields) in the response:

X?op=present_request&format=marc&view=customize
&field=020##&field=100##&field=245##

By requesting only those fields that you plan to display (or use in display logic), you can keep the response size down, allowing the X-Server to deliver it across the network faster.
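A minimal PHP sketch of building such a request is below. The op, format, view, and field parameters come from the example above; the host name, the presentRequestUrl() helper, and the omission of session and result-set parameters are assumptions, so check the X-Server documentation for what a complete present_request needs. Note that the # characters in the field values must be percent-encoded, or an HTTP client will treat them as the start of a URL fragment.

```php
<?php
/**
 * Build an X-Server present_request URL that asks only for selected fields.
 * $baseUrl and any session/result-set parameters are assumptions to fill in
 * from your own setup.
 */
function presentRequestUrl(string $baseUrl, array $params, array $fields): string
{
    $pairs = [];
    foreach ($params as $name => $value) {
        $pairs[] = rawurlencode($name) . '=' . rawurlencode((string) $value);
    }
    // http_build_query() would encode an array as bracketed names
    // (field[0]=...&field[1]=...), so repeat the field parameter by hand.
    foreach ($fields as $field) {
        // rawurlencode() turns '#' into %23 so it is not read as a fragment.
        $pairs[] = 'field=' . rawurlencode($field);
    }
    return $baseUrl . '?' . implode('&', $pairs);
}

echo presentRequestUrl(
    'http://metalib.example.edu/X',   // hypothetical host
    ['op' => 'present_request', 'format' => 'marc', 'view' => 'customize'],
    ['020##', '100##', '245##']
), "\n";
```

Sending the request as a plain GET URL like this also follows the earlier advice to prefer URL parameters over XML POST.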

4. Save OpenURL parsing until you need it

One of the fields you can ask for in a customize view is OPURL, which instructs the X-Server to return an OpenURL ContextObject with the record. Parsing out the information necessary to create the ContextObject is resource-intensive on Metalib’s end and will significantly slow down response times on your 10-record results page.

Xerxes doesn’t ask Metalib for an OpenURL when pulling 10 records at a time. Instead, it displays an SFX button that simply points to a redirect page with the record’s identifiers in the query string:

<a href="metasearch?action=sfx&resultSet=004120&startRecord=000002">

Only once the user clicks on the link does Xerxes turn back to Metalib and ask for an OpenURL, in turn redirecting to our SFX menu.
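The lazy approach can be sketched in PHP as two small pieces: one that writes the SFX link on the results page, and one that handles the redirect when the link is clicked. Both helper names are hypothetical, and $fetchOpenUrl is a placeholder for whatever code sends the single-record X-Server request and turns the returned OpenURL into an SFX URL; it is not a real Xerxes or Metalib function.

```php
<?php
// Results page: link to a local redirect action instead of asking Metalib
// for an OpenURL up front. Parameter names follow the link shown above.
function sfxLink(string $resultSet, string $startRecord): string
{
    $query = http_build_query([
        'action'      => 'sfx',
        'resultSet'   => $resultSet,
        'startRecord' => $startRecord,
    ]);
    // Escape for use inside an HTML attribute (& becomes &amp;).
    return '<a href="metasearch?' . htmlspecialchars($query) . '">';
}

// Redirect action: only now fetch the OpenURL for this one record.
// $fetchOpenUrl is a hypothetical callable wrapping the X-Server request.
function sfxRedirectTarget(array $query, callable $fetchOpenUrl): ?string
{
    $isDigits = fn($v): bool => is_string($v) && preg_match('/^\d+$/', $v) === 1;
    $resultSet   = $query['resultSet'] ?? null;
    $startRecord = $query['startRecord'] ?? null;
    if (!$isDigits($resultSet) || !$isDigits($startRecord)) {
        return null;  // malformed request; nothing to look up
    }
    return $fetchOpenUrl($resultSet, $startRecord);
}

// Usage with a stand-in fetcher (hypothetical SFX host):
$fakeFetch = fn(string $set, string $rec): string =>
    'https://sfx.example.edu/sfx?set=' . $set . '&record=' . $rec;
echo sfxLink('004120', '000002'), "\n";
$target = sfxRedirectTarget(['resultSet' => '004120', 'startRecord' => '000002'], $fakeFetch);
// In a real redirect action: if ($target !== null) { header('Location: ' . $target); }
```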

5. Keep transformations as simple as possible

One of the great benefits I see in the X-Server is the ability to customize results based on the source database. By looking at where a record came from and processing it accordingly, we can, for example, provide links to the native full text, show the format of each record, and display other fields unique to that database.

The more conditional logic you put into an XSLT transformation or an XML programming object, however, the longer it takes to display the results. We’re talking milliseconds here, so customization of the results based on the source database is certainly possible. You just need to strike a good balance.

6. Cap metasearch times

Although some vendors have greatly beefed up their Z39.50 and XML servers in recent years to help facilitate metasearching, others still run slower servers.

With Xerxes, we’ve set a local limit on the number of times it will check the status of a metasearch. If it hits that limit (35 seconds’ worth of checks) and not all databases have finished searching, it merges whatever results are available and moves on.
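A capped polling loop might look like the sketch below. The $checkStatus callable is a hypothetical stand-in for the X-Server status request, returning a map of database ID to a finished flag, and the numbers in the usage comment are illustrative only: 8 checks spaced 5 seconds apart means at most 7 waits, or 35 seconds. The sleep function is injectable so the loop can be tested without actually waiting.

```php
<?php
/**
 * Poll a metasearch until every database has finished or the check limit is hit.
 * $checkStatus (hypothetical) returns database ID => true (finished) / false.
 * Returns the IDs of the databases that finished, so the caller can merge them.
 */
function pollMetasearch(
    callable $checkStatus,
    int $maxChecks,
    int $intervalSeconds,
    ?callable $sleep = null
): array {
    $sleep = $sleep ?? 'sleep';   // injectable so tests need not really wait
    $status = [];
    for ($i = 0; $i < $maxChecks; $i++) {
        $status = $checkStatus();
        if (!in_array(false, $status, true)) {
            break;                // every database is done
        }
        if ($i < $maxChecks - 1) {
            $sleep($intervalSeconds);  // no point waiting after the last check
        }
    }
    // Keep only the databases that finished; slow ones are left out.
    return array_keys(array_filter($status));
}

// Usage (illustrative): 8 checks, 5 seconds apart, caps the wait at 35 seconds.
// $finished = pollMetasearch($checkStatus, 8, 5);
```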

7. Cache results

If necessary, an X-Server application can pull down search results asynchronously and cache the XML locally. For example, after your application displays the first page of results for a search, it can immediately kick off a request to Metalib for the next page of results, storing that on disk or in a database until the user clicks the next button, at which point it would retrieve the page locally.

We haven’t had a need to go this route yet, but caching and asynchronous pulling of records are useful methods to keep in your toolset for large web services projects.
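A minimal sketch of the cache-plus-prefetch idea follows. ResultPageCache and the $fetchPage callable are hypothetical, and a plain PHP array stands in for the disk or database store mentioned above (a PHP array lives only for one request, so a real cache would need persistent storage). The prefetch here is synchronous; to make it asynchronous you might, under PHP-FPM, call fastcgi_finish_request() to send page 1 to the browser before prefetching page 2, or hand the prefetch to a background job.

```php
<?php
/**
 * Cache-aside fetching of result pages, with a one-page prefetch.
 * $fetchPage is a hypothetical callable that asks Metalib for one page of
 * results and returns the XML as a string.
 */
final class ResultPageCache
{
    /** @var callable */
    private $fetchPage;
    private array $cache = [];   // stand-in for disk or database storage

    public function __construct(callable $fetchPage)
    {
        $this->fetchPage = $fetchPage;
    }

    public function get(string $resultSet, int $page): string
    {
        $key = $resultSet . ':' . $page;
        if (!isset($this->cache[$key])) {
            $this->cache[$key] = ($this->fetchPage)($resultSet, $page);
        }
        return $this->cache[$key];
    }

    // Warm the cache for the page the user is likely to request next.
    public function prefetch(string $resultSet, int $page): void
    {
        $this->get($resultSet, $page);
    }
}

// Usage with a stand-in fetcher that records which pages it was asked for.
$fetches = [];
$cache = new ResultPageCache(function (string $set, int $page) use (&$fetches): string {
    $fetches[] = $page;
    return "<results set=\"$set\" page=\"$page\"/>";
});
echo $cache->get('004120', 1), "\n";   // fetched from Metalib
$cache->prefetch('004120', 2);          // ideally after page 1 has been sent
echo $cache->get('004120', 2), "\n";   // served from the cache
```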

8. Design a usable interface

Ultimately, speed is a matter of perception.

Jared Spool, the popular web usability expert, has done a lot of research in this area. He reports that user perceptions of a website’s speed have little to do with how many seconds it takes for pages to load. Rather, if users find your website or application easy to navigate and use, they perceive it to be fast. If your application is difficult to use, however, they begin to complain about speed.

Devoting the lion’s share of your time and resources to designing a usable interface that provides clear labels and simple navigation, and gives users enough information to make decisions easily, can do more to improve your application than any of the technical recommendations above.

The ultimate speed concern here, then, is not page load times specifically, but rather how quickly a user can get into your application and find the full-text of the books and articles they are looking for. A lot can be said on this point, but I’ll leave that for another article.

Top ten reasons to use the Metalib X-Server

The X-Server is an optional, purchasable API to Metalib that allows you to design a custom metasearch application. But is it worth the price tag, the development time, and the resources? Here are ten reasons to give the X-Server some serious consideration.

10. Greater ownership and buy-in

Metasearch is a hot topic in libraries today, but still controversial. In some corners, it has been met with skepticism and even outright resistance. And not without reason: Metasearch systems are still immature and, frankly, rather clunky. Typically, vendors give libraries little or no means to improve the out-of-the-box interface.

The X-Server gives you complete control over the interface and even much of the functionality of your metasearch application. Knowing they can make real changes to the interface to address key concerns can give librarians and other stakeholders a greater sense of ownership. That’s crucial to the success of any system.

9. A growing user community

The X-Server lets you design a wholly customized interface from scratch. But you don’t have to start there. Xerxes is an open source project, with code available in .NET and PHP. You can start with that code, and then add your own or borrow from the other libraries building X-Server applications as you discover new and useful ideas.

8. Build completely unique applications

For example, together with SFX and MARCit, two other Ex Libris products, we’re using the X-Server to drive an RSS-based table of contents alerting service.

7. Integration with other library systems

Integration is vital to academic libraries; so much so that I’m going to mention it once here and again below.

The X-Server allows you to more easily integrate other library systems into your metasearch application, including tighter integration of online reference, library and union catalogs, and one of the most heavily used of library resources, reserves.

6. A simpler design model

In customizing the regular Metalib interface, you can spend months delving into hundreds of HTML fragment files or hacking around the system with JavaScript in order to make what ultimately amount to minor interface changes, changes that will almost surely be wiped out in a future upgrade (more on that later).

If you know a thing or two about XML and web programming, however, designing an X-Server application can be very simple and straightforward. Instead of wasting time trying to work around Metalib, you and your programmers can be making real, substantial improvements.

Change is inevitable on the Web, and organizations need to plan for it. Our Xerxes interface is simple, consisting of only five pages and some XSLT files. We will redesign it at some point. Better to only have to touch five files than 125.

5. Greater integration possibilities outside the library

Academic libraries are increasingly looking to push their content and services to where users are. Today, that may mean integration of your library systems with a campus portal, and one or more learning management systems. Tomorrow it may mean integration with a larger university repository, or the half-dozen other systems coming down the pipe.

The X-Server allows libraries to integrate metasearch services into a system of their own design. Having control over your own system gives you the flexibility to integrate that system with whatever and whomever you like. No need to wait for vendors to decide the market can bear it, or pay third parties to sell you temporary bridges.

4. Enhanced functionality

When most people think of an interface, they think about how an application looks. But the user interface encompasses the entire interaction experience between the user and the system. Using the X-Server, libraries can add or enhance the functionality of Metalib.

In Xerxes, for example, we’ve completely reworked the way users choose categories and select databases, offering a much more open and intuitive design. Xerxes is also able to pre-determine full-text and print availability for each record at the results level. We’ve also added different saving and export options, as well as a much, much faster login.

3. Completely extensible interface

Like any good web service, the X-Server simply sends back XML to your application. You can do with it as you please.

In Xerxes, for example, we offer a much more open and intuitive display of the result summary, sorting options, and paging navigation. With each result, Xerxes offers users a brief description of each record based on the abstract, the format (book, article, dissertation, etc.) of the work, key fields for specific databases, and other information that library users say they find useful.

2. Immunity to system upgrades

Each new release of Metalib, from version 1 in 2001 to the upcoming release of version 4 at the end of 2006, has brought substantial interface changes. Any customizations libraries applied in previous versions were subsequently wiped out, in many cases reducing months of work to naught. Given limited resources, academic libraries can ill afford to waste their time creating and re-creating interfaces from upgrade to upgrade.

The X-Server creates a layer of abstraction between the application layer (Metalib) and the presentation layer (your system). The upgrade to Metalib 4 this summer, for example, will have no impact on Xerxes at all. All of the hard work we’ve put into designing and customizing the interface will not have to be redone, ever. It’s time well spent, and allows us to invest our resources into additional improvements or integration with other systems.

1. Greater return on your investment

You’ve already spent a good amount of money on Metalib to give your library metasearch capabilities. But if you can’t get buy-in from your librarians, or your users find Metalib difficult to use, how much are you getting out of your investment?

Further, academic libraries spend hundreds of thousands of dollars on collections each year. Yet the disconnected nature of subscription databases leaves many of them virtually unused.

If done right, the X-Server allows you to design a more usable interface with enhanced functionality, allowing users to more easily discover and access underused library collections. It allows your developers to stop hacking and start designing, and allows your librarians to stop teaching mechanics and start focusing on improving research skills.

Sound utopian? We’re already doing it now at Cal State.
