
Thoughts and Status of the Directory of Directories for Higher Education

11-23-2000 
Michael R. Gettes, Georgetown University 
Project to date

By no means did this project start on this date. In February 2000, in connection with the sale of the Georgetown University Hospital, it was believed that we would need to distribute clients to the University community configured for the Enterprise Directory. We did not wish to reconfigure all these clients, a major support issue, should the buyers of the hospital wish to deploy their own LDAP enterprise directory and integrate it with ours. Some investigation demonstrated that "smart referrals" in the Netscape Directory Server (v. 4.1) would achieve this goal: the University community for whitepages searches (not just from a web page, but from email clients as well) could be defined at the server while the client configurations remained the same. Since all this is based on referrals, the clients become very sensitive to the reliability of every directory referenced by the enterprise directory. In essence, each "community" directory, in this case the hospital's, would become as critical as the enterprise directory itself, since the client would be referred to each of the community directories. The clients, web servers and email clients alike, all tend to search the referrals serially, based on the response from the initial directory query. Should any one directory be ill, the overall performance of the service would appear to suffer. This is not optimal. So, work proceeded to see how hard it would be to develop a web page employing a different search technique, which might influence future development of the email clients. All the above work was undertaken by Michael R Gettes, Lead Application Systems Programmer at Georgetown University, from February to April of 2000. The MACE group was kept informed about this work and its developments.

A Perl script was developed to potentially act as a search mechanism for a web service. Perl was chosen because I was lazy and didn't have a lot of time to spend developing the string manipulation abilities that Perl already has, so Perl was believed to be the right tool for a proof of concept. The PerLDAP module was employed to link the Perl language with the necessary LDAP manipulation. Most of the standard PerLDAP interfaces are based on synchronous calls to the directory: the Perl script awaits a reply (if there is to be one) and is blocked. It was deemed necessary to use the asynchronous LDAP API calls that PerLDAP exposes (these are the same API calls used by the OpenLDAP and Netscape SDK distributions). By making the async calls, we can perform parallel searches. It was then realized that the LDAP libraries were really designed for making multiple async requests to the same directory server, and not so much to many different LDAP servers. While some code exists in the guts of the LDAP libraries to handle overlapping requests to multiple servers, this code has clearly not been well exercised or, maybe, even tested, since this feature just doesn't work. On the other hand, overlapping requests to the same directory server are a technique heavily used by many vendors to great success. A small amount of effort was invested to see if the LDAP libraries could be fixed, and after a few days I decided to work around this level of the problem.

To avoid handling multiple threads in Perl (remember, this was just supposed to be a proof of concept), busy waits were used around the async LDAP calls, and the necessary work was performed to initiate N LDAP connections for N LDAP servers. I then put a request out to the Common Solutions Group membership for the names of their institutional LDAP servers and their respective search roots. I got responses from 9 schools. One of the first things I noticed was the variation in the search roots. As we discussed in MACE, there was a desire to get people to use standardized Distinguished Names, and there was no operational standard at this point. DomainComponent naming was the naming scheme preferred by MACE, and we realized we needed to get the word out. Additionally, as I began some individual searches of the 9 schools, I realized there was also no standard use of the "standard" LDAP schema in the person, organizationalPerson and inetOrgPerson objectclasses. A couple of schools simply selected the attributes they liked (CN, SN, MAIL and so on) and created a new local objectclass. This made it harder to understand the intended use of the attributes, and we realized, again, that we needed to get the word out. University of Colorado, Boulder was reported to have said "Well, just tell us how to configure the directory and we will do it". Based on that, Ken Klingenstein suggested that I write a kind of cookbook based on the directory deployments I had completed at Georgetown and Princeton. So, like a fool, I said "yes". The LDAP-Recipe is still an active document and attempts to stimulate discussion, ideas and methodologies for configuring and operating LDAP directories. While this recipe is intended for use by academia, it could also reasonably be employed in any corporate enterprise service deployment.
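The DomainComponent naming mentioned above derives a directory's search root from the institution's DNS domain. The following is a minimal sketch of that mapping in Python, not code from the project; the function name domain_to_dc_dn is hypothetical, and the sketch assumes plain DNS labels that need no DN escaping.

```python
def domain_to_dc_dn(domain: str) -> str:
    """Map a DNS domain to a domainComponent-style search root.

    Hypothetical helper for illustration: each DNS label becomes one
    dc= component, most specific label first.
    """
    labels = [label for label in domain.strip(".").lower().split(".") if label]
    if not labels:
        raise ValueError("empty domain")
    return ",".join(f"dc={label}" for label in labels)


if __name__ == "__main__":
    print(domain_to_dc_dn("georgetown.edu"))  # dc=georgetown,dc=edu
```

With a convention like this, a search client would only need a school's domain name to guess its search root, rather than a per-school table of locally chosen roots.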

Initial tests had shown that, by multiplying the 9 schools sufficiently and using that as a testbed, we could search a few hundred schools and get back several thousand responses in under 30 seconds. As I began to view this more and more as an interesting challenge, I learned how to handle threading in Perl and was able to increase the performance. All this work was done on my personal workstation, a Sun Ultra-10 (single processor). I had also changed from using referrals from the directory to simply performing one initial search on the directory to get back the list of schools to search. Should any of those schools return referrals, the LDAP libraries would automatically chase them down. Without this change, parallel searches on the initial set would not be possible, since the LDAP libs would chase the referrals and my code would not regain control until the referrals had been processed sequentially. This work was presented at the Spring 2000 Internet2 Members meeting during the Middleware 201 workshop. At the Spring meeting I was able to speak with Mark Smith of iPlanet (Netscape Servers). Mark is one of the original LDAP developers from the University of Michigan, along with Tim Howes and crew. I asked Mark about making some small modifications to the Netscape Directory Server Gateway (DSGW), which is just a web interface to the directory that comes with the Netscape DS product. The unmodified DSGW web interface is in active use at Georgetown for both the whitepages service and for handling privileged access and modifications. The changes we discussed would be little tweaks allowing DSGW to call an external program to handle searching. Mark agreed to do this work, and it was implemented a month or so later. This allowed me to widen the audience by presenting this as a service that others could see, and not just some output from a Unix program.
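The two-step approach described above (one initial search to obtain the list of schools, then parallel searches of that list) can be sketched as follows. This is an illustration in Python rather than the Perl prototype, and it makes no LDAP calls: fetch_school_list, search_school, the host names and the delays are all hypothetical stand-ins, with sleeps simulating network latency. A thread pool with a shared deadline is used here in place of PerLDAP's async calls.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

# Hypothetical hosts with simulated response times in seconds.
SIMULATED_DELAY = {
    "school-a.example": 0.01,
    "school-b.example": 0.02,
    "school-c.example": 0.50,  # an "ill" directory
}


def fetch_school_list():
    """Stand-in for the single initial search that returns the schools."""
    return list(SIMULATED_DELAY)


def search_school(host, query):
    """Stand-in for one LDAP search; sleeps instead of using the network."""
    time.sleep(SIMULATED_DELAY[host])
    return [f"cn={query},o={host}"]


def parallel_search(query, deadline):
    """Search every school at once; report hosts that miss the deadline."""
    hosts = fetch_school_list()
    pool = ThreadPoolExecutor(max_workers=len(hosts))
    futures = {pool.submit(search_school, h, query): h for h in hosts}
    done, not_done = wait(list(futures), timeout=deadline)
    entries = sorted(e for f in done for e in f.result())
    late = sorted(futures[f] for f in not_done)
    pool.shutdown(wait=False)  # do not block on the slow directory
    return entries, late


if __name__ == "__main__":
    entries, late = parallel_search("jdoe", deadline=0.2)
    print(entries)
    print("no answer in time from:", late)
```

Because every search starts at the same moment, the total wait is bounded by the deadline rather than by the sum of the individual response times, and a single slow directory delays only its own results.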
As more people saw this prototype service, some would get really excited at the prospect and others would get "freaky". I believe the "freakiness" came from the X.500 deployments of the early 1990s, when X.500 was trying to achieve the same goal as the DoDHE. But, back then, computers were far slower, networks were far slower and X.500 was considered a bit of a pig of a process. What the prototype seemed to show was that the world has changed significantly, and that revisiting the work of several years ago with a lighter protocol, LDAP, is a reasonable investigation. Some also believed that performing parallel searches against institutional directories is a waste of resources at the institutional level. Why should there be a search against University X for someone who may not be at University X? Bob Morgan, University of Washington, and Paul Hill, MIT, believed that a central deposit was a better way to go. After quite a bit of discussion, the current plan is to do both: handle parallel searches against institutional directories, provide a central deposit, and let each school decide how it wants to handle its data and how it should be searched. I believe that parallel searches of multiple central deposits will prove necessary, as will links to other communities, like the international sector and other communities of interest to higher education.

12-7-2000 
Michael R Gettes, Georgetown University 
European perspectives

Bob Morgan, University of Washington, pointed out some time ago (I think it was around April, 2000), that the DoDHE should consider using alternate (or LDAP independent) indices for the data to be searched in the central deposit.  I believe the reasoning was/is: 

  1. Don't re-invent the wheel.  Roland Hedberg has done previous work in this area; we should seek his expertise and understand its applicability.
  2. Include the European community in our efforts.
  3. Architecturally speaking, the central deposit should be directory independent, not built against a particular technology like LDAP.

The following text is email from Roland Hedberg giving his thoughts on this project. 

Let's start with some math. Internet2 consists of ~100 schools; 
assuming 20.000 staff+students per school, that's 2.000.000 persons. 
Further assume that these persons will use the directory for white pages 
lookups twice a week. That would give a total of 4 M queries per week, 
or ~7 queries/second. The queries are probably not evenly 
distributed over the day, so I'd guess that 90 % of the queries 
will appear during the normal working hours, hence during those hours 
you will get a mean of 12-13 queries/second. Surely there will be 
peaks that are a lot bigger. 

Now in your testbed it took 30 seconds to get back the answers for 
some query. 10 queries/second x 30 seconds/query amounts to 
300 simultaneous queries x 100 schools = 30.000 simultaneous 
open outgoing connections from one machine unless you do something 
very sophisticated like keeping a couple of connections open all the 
time to all the LDAP servers and just distribute the queries over the pipes. 
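Roland's arithmetic up to this point can be reproduced with a short Python sketch. The school count, head count, query frequency, 90 % share, 10 queries/second and 30 seconds are taken from the email; the length of "normal working hours" is not stated there, so BUSY_HOURS_PER_WEEK is an assumption, chosen here because 80 hours happens to yield the quoted 12-13 queries/second (a 40-hour week would give 25).

```python
SCHOOLS = 100
PEOPLE_PER_SCHOOL = 20_000
QUERIES_PER_PERSON_PER_WEEK = 2
SECONDS_PER_WEEK = 7 * 24 * 3600

queries_per_week = SCHOOLS * PEOPLE_PER_SCHOOL * QUERIES_PER_PERSON_PER_WEEK
mean_rate = queries_per_week / SECONDS_PER_WEEK  # queries/second, all week

# ASSUMPTION (not in the email): 80 busy hours per week.
BUSY_HOURS_PER_WEEK = 80
BUSY_SHARE = 0.9  # share of queries falling in the busy hours (from the email)
busy_rate = BUSY_SHARE * queries_per_week / (BUSY_HOURS_PER_WEEK * 3600)

# Queries in flight = arrival rate x time each query stays open.
ROUNDED_RATE = 10    # queries/second, the round figure used in the email
SEARCH_SECONDS = 30  # testbed response time
in_flight = ROUNDED_RATE * SEARCH_SECONDS
open_connections = in_flight * SCHOOLS  # one connection per school per query

if __name__ == "__main__":
    print(f"{queries_per_week} queries/week, mean {mean_rate:.1f}/s")
    print(f"busy-hour mean {busy_rate:.1f}/s")
    print(f"{in_flight} queries in flight, {open_connections} connections")
```

The connection count is linear in both the number of schools and the response time, which is why the estimate grows so quickly when every query is sent to every school.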

Still I think you would have to find a decent machine to be able to 
cope with that. And then you still have the machines on the other 
end, which have to deal with a continuous load of 10 q/s and peaks 
of up to 100 queries/second. 

This is the background for my thinking: I don't believe in letting 
every server deal with every query. It simply doesn't scale. 
It might just work for the present size of I2, but if I2 increases 
by a factor of ten ... 

Another belief I have is that users want to use the directory 
for whitepages queries when they are using a certain application, like 
a mail client. Since a lot of mail clients today include an LDAP client, 
users will probably find it reasonable that they should be able to use 
it for doing the query. This is a major pain because the client 
state of the art is so bad; still, you/we will have to deal with it. 
There is no way they will be satisfied with a web interface. So whatever 
solution I2 chooses, it has to work with common LDAP clients. 

So the system I'm imagining is a system based on distributed LDAP servers, 
loosely held together by the use of referrals. 
It's built on a server hierarchy. At the top you have a set of 
index servers that cooperate to guide the LDAP client to the 
right LDAP server to query. And below the index servers you have all 
the schools' LDAP servers, all of them containing superior references to 
one or more of the index servers. 
Whether some schools have more than one LDAP server (masters and slaves), or 
whether they use a central LDAP server as the slave or even as their master, is 
irrelevant for the design as such. That it has a great impact on 
accessibility and reliability is another matter entirely. 

This way an LDAP client connecting to one school's LDAP server can find 
all the other LDAP servers and using the indexes it will also 
find the subset that might have the information it is looking for. 
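The role Roland gives the index servers, narrowing a query to the subset of school servers that might hold a match, can be sketched in Python as below. The email does not say how the index is built or exchanged, so the token-based index, the IndexServer class and the example URLs are all assumptions made for illustration.

```python
from collections import defaultdict


class IndexServer:
    """Hypothetical index: maps name tokens to the servers holding them."""

    def __init__(self):
        self._index = defaultdict(set)

    def register(self, ldap_url, names):
        """Record which name tokens a school's server can answer for."""
        for name in names:
            for token in name.lower().split():
                self._index[token].add(ldap_url)

    def referrals(self, query):
        """Return only the servers that hold every token of the query."""
        tokens = query.lower().split()
        if not tokens:
            return []
        candidates = set.intersection(
            *(self._index.get(token, set()) for token in tokens)
        )
        return sorted(candidates)


if __name__ == "__main__":
    index = IndexServer()
    index.register("ldap://ldap.school-a.example", ["Leif Johansson"])
    index.register("ldap://ldap.school-b.example", ["Leif Eriksen"])
    index.register("ldap://ldap.school-c.example", ["Anna Johansson"])
    print(index.referrals("leif johansson"))
```

In this example only one of the three servers is returned for the query, so the other two never see it. That is the scaling property the email argues for.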

If alongside this one would like to use SRV records, SLP 
or other means of finding an LDAP server, that is absolutely OK. 

Granted, this is rather sketchy and needs to be worked out in detail. 

This does not preclude the usage of a web interface like the one 
you have done; I'd only like to see it use the index servers 
before going to the schools' LDAP servers. 

A guy here in Norway has done a first cut at such a web interface: 
http://www.katalog.uninett.no/ldap/finn4/

You won't be able to read the text as it is in Norwegian :-) 
but if you type "leif johansson" and do a search (hit the 
button after the input field) you will see from the email addresses 
that the information comes from 6 different universities. 
What this interface does is that it first queries the index server 
ldap://gids.catalogix.se:3891 and then follows the referrals. 
I'm not sure if he is serializing or doing it in parallel. 

We have an added complication here in Europe, which is that we still 
have some old LDAPv2 servers in use and they use character sets other 
than UTF-8, most commonly ISO-8859-1 and T.61. So we have to use 
proxies that do character set translations. 
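The translation step such a proxy performs on attribute values can be sketched in Python for the ISO-8859-1 case. The function name to_utf8 and the sample value are hypothetical; T.61 is left out because the Python standard library ships no codec for it, so a real proxy would need its own mapping table for that character set.

```python
def to_utf8(raw: bytes, source_charset: str = "iso-8859-1") -> bytes:
    """Re-encode an attribute value from a legacy charset to UTF-8.

    Hypothetical helper: decodes with the legacy server's charset, then
    encodes as UTF-8, which is what LDAPv3 clients expect.
    """
    return raw.decode(source_charset).encode("utf-8")


if __name__ == "__main__":
    legacy_value = "Bjørn Åse".encode("iso-8859-1")  # as an old server sends it
    print(to_utf8(legacy_value).decode("utf-8"))
```

Every byte sequence is valid ISO-8859-1, so this direction cannot fail. The reverse translation, from a UTF-8 query to the legacy charset, can fail for characters the legacy charset lacks, and a proxy would have to decide how to handle that case.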

In I2 you might have to use proxies to do chaining on behalf of 
LDAPv2 clients that don't know how to handle references. For instance, 
I think Eudora still contains an LDAPv2 client, and Netscape has an 
LDAPv2.5 client. If you have some clout at Netscape, please get them 
to do a proper LDAPv3 client implementation. 

-- end of Roland's email 

 1-11-2000 
Michael R Gettes, Georgetown University 

A PowerPoint presentation showing the current status and an architectural view of the DoDHE was posted to the web site at http://middleware.internet2.edu/dodhe.
 

