Comments to: email@example.com
Straw Man Architecture for Group Tools
This document has been produced expressly to facilitate the discussion at
the MACE-Dir-Groups working group meeting to be held on Monday, October 13,
2003, from 2:00-4:00pm GMT-0500 in Ballroom 3 of the Indianapolis Marriott
Downtown, in conjunction with the Fall Internet2 Member Meeting. I have not
tried to make this document self-contained! If a consensus on the high-level
architecture is reached at the working group meeting, then I intend to draft a
working group document capturing it.
A prerequisite to implementing this architecture is a functioning identity management infrastructure. The group tools being imagined here are intended to complement and not reproduce metadirectory functionality in which data about people is integrated from a variety of systems of record. There must already be a “Person Registry”. The imagined architecture will add a “Groups Registry” to the information assets used to provision applications with the information they need.
References to member objects are treated opaquely within the Groups Registry. It is assumed that distinct memberIDs refer to distinct real world objects. Multiple sources of information about member objects can potentially be accommodated, subject to this constraint.
The Stream Loader reads rules, logic, & exception lists and processes a stream of person records produced by one or more existing identity management systems. Rules and logic express group memberships in terms of the values of attributes in person records. They may also describe set theoretic combinations of other groups. To every rule there's an exception, hence the exception lists.
The Stream Loader calls the API to make changes to the Groups Registry.
Rule descriptions are to be chosen so that a single (selection rule, assignment rule) pair serves to allocate members to a set of groups. Example: the stream contains a memberID, a list of departments & roles, and a list of classes & roles. One rule says "if department & role exist, assign memberID to the group named departmentValue-roleValue". The idea is to have relatively few descriptors result in a relatively large number of groups.
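The department & role example above can be sketched as follows. This is only an illustration under assumed names: the record layout, the "affiliations" key, and the function names are hypothetical and not part of the proposed architecture.

```python
from collections import defaultdict

# Hypothetical person record: a memberID plus a list of (department, role) pairs.
record = {
    "memberID": "m1001",
    "affiliations": [("physics", "faculty"), ("math", "student")],
}

def select_department_roles(rec):
    """Selection rule: yield each (department, role) pair that is fully present."""
    for department, role in rec.get("affiliations", []):
        if department and role:
            yield department, role

def assign_department_role(department, role):
    """Assignment rule: name the target group departmentValue-roleValue."""
    return f"{department}-{role}"

def apply_rule(records, select, assign):
    """Run one (selection rule, assignment rule) pair over a stream of records."""
    groups = defaultdict(set)
    for rec in records:
        for values in select(rec):
            groups[assign(*values)].add(rec["memberID"])
    return dict(groups)

memberships = apply_rule([record], select_department_roles, assign_department_role)
```

One rule pair yields as many groups as there are distinct (department, role) combinations in the stream, which is the "few descriptors, many groups" effect described above.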
Possible stream format types: flat, sql query, ldif, specially defined xml type, lcup? Also, some streams may be “complete” in the sense that absence from the stream should imply absence from membership in some groups. Hence, certain rules & logic are designated to be invoked at “end of stream”. More generally, instances of rules, logic, and exception lists will need to be tied to instances of streams. Example: one set of rules & logic to process an extract from the student system, another to handle asynchronous updates from a metadirectory.
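A minimal sketch of what an “end of stream” step might do for a complete stream, assuming memberships are held as a mapping from group name to a set of memberIDs. The function name and the notion of a "managed_groups" set (the groups this stream is authoritative for) are assumptions made for illustration.

```python
def end_of_stream_removals(current, seen, managed_groups):
    """Return the (memberID, groupName) pairs to remove once a complete
    stream has ended: members currently in a managed group who were not
    assigned to that group by any record in the stream."""
    removals = set()
    for group in managed_groups:
        stale = current.get(group, set()) - seen.get(group, set())
        for member in stale:
            removals.add((member, group))
    return removals

# Membership before the stream ran, and membership derived from the stream.
current = {
    "course-101-enrollees": {"m1001", "m2002"},
    "adhoc-chess": {"m2002"},  # not tied to this stream, so left alone
}
seen = {"course-101-enrollees": {"m1001"}}

removals = end_of_stream_removals(current, seen, {"course-101-enrollees"})
```

Restricting the comparison to the groups tied to this stream is what keeps an incomplete or unrelated stream from emptying groups it knows nothing about.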
Exception lists are inclusive or exclusive tables of (memberID, groupName).
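As a rough illustration, exception lists could be applied to rule-derived memberships like this. The function name is hypothetical, and the choice to let exclusions win over inclusions is an assumption, not something decided above.

```python
def apply_exceptions(derived, include, exclude):
    """Apply inclusive and exclusive (memberID, groupName) tables to the
    memberships derived from rules. Exclusions are applied last, so they
    override inclusions (an assumed precedence)."""
    result = {group: set(members) for group, members in derived.items()}
    for member, group in include:
        result.setdefault(group, set()).add(member)
    for member, group in exclude:
        result.get(group, set()).discard(member)
    return result

derived = {"physics-faculty": {"m1001", "m2002"}}
include = [("m3003", "physics-faculty")]   # add despite the rules
exclude = [("m2002", "physics-faculty")]   # remove despite the rules

final = apply_exceptions(derived, include, exclude)
```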
Rules, logic, & lists are managed by a stream loader manager app.
In contrast to what's depicted above, perhaps the rules, logic, & exceptions database should be integrated with the Groups Registry and accessed via the API so that Groups Managers can also be used to express set theoretic combinations of other groups. Also, the exception lists might be conveniently represented as groups themselves.
We might develop one or more Groups Managers, each tailored to manage info specific to one type of group (i.e., associated with a groupTypeID - see below). Examples of group types: ad hoc "base" groups; course groups; departmental groups; role groups; exception lists. Groups managers are used by humans to manage groups. Programs should use a stream loader.
We probably should ensure that groups managed by a stream loader are disjoint from groups managed by groups manager apps, to keep uncoordinated information sources such as programs and humans from cooking in the same kitchen. This might be relaxed to the point that each manages disjoint sets of information about groups. Example: stream loader might populate enrollees, instructors, and basic course metadata for a course group but a Course Manager enables update of lists of TAs, course site developers, and other info not contained in the stream of identity info processed by the stream loader.
Following is a description of some of the tables in the Groups Registry depicted in the entity relationship diagram below. Caveat emptor: I’ve never actually operated an RDBMS, so this data model might be impractical!
The group table identifies and names each group, and also classifies it as being "elementary" or "compound". Compound groups are set theoretic combinations of elementary groups, and elementary groups explicitly list their members in the membership table.
Membership for all elementary groups, and for multiple membership fields if present in the implementation, is lumped together in a single membership table. The membership table's groupFieldID field enables some groups to have more than one "membership list" associated with them, for whatever purposes (examples: RBAC; security; course groups). The memberID field can contain a groupID to support subgroups. Other references to member objects in the memberID field are treated as opaque strings. Because all membership is kept in one table, there is no membership referential integrity issue to be managed, and common queries such as "list all groups to which X belongs" and "list all immediate members of group A" should be facilitated.
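The single membership table and the two common queries mentioned above might look like this in practice. This is a sketch using an in-memory SQLite database; the column types, the primary key, and the sample IDs are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE membership ("
    " groupID TEXT NOT NULL,"
    " groupFieldID TEXT NOT NULL,"
    " memberID TEXT NOT NULL,"
    " PRIMARY KEY (groupID, groupFieldID, memberID))"
)
rows = [
    ("g-course-101", "enrollees", "m1001"),
    ("g-course-101", "instructors", "m2002"),
    ("g-physics", "members", "m1001"),
    ("g-science", "members", "g-physics"),  # memberID holding a groupID (subgroup)
]
conn.executemany("INSERT INTO membership VALUES (?, ?, ?)", rows)

def groups_of(member_id):
    """List all groups to which member_id immediately belongs, in any field."""
    cur = conn.execute(
        "SELECT DISTINCT groupID FROM membership"
        " WHERE memberID = ? ORDER BY groupID",
        (member_id,),
    )
    return [row[0] for row in cur]

def immediate_members(group_id, field_id):
    """List all immediate members in one membership list of one group."""
    cur = conn.execute(
        "SELECT memberID FROM membership"
        " WHERE groupID = ? AND groupFieldID = ? ORDER BY memberID",
        (group_id, field_id),
    )
    return [row[0] for row in cur]
```

Both queries touch only the one table, which is the convenience the single-table design is after.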
The metadata table stores all non-membership data for all groups.
The schema table identifies the set of types to which each group belongs. It provides a way for a group to have one or more of several sets of fields associated with it, analogous to auxiliary objectclasses in X.500 databases. For example, a "base" type inherited by all groups might include fields for the group's owner, name, description, and one membership list. A "course" type might add membership fields for enrollees, instructors, and TAs, as well as a course ID and other course offering information.
The types table lists and names the set of types that groups may have. The typeDefs table lists the fields associated with each group type, and the fields table associates a displayable name and optional syntax with each field. These three tables would typically be read in by applications as part of their initialization prior to exercising the API.
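To make the relationship among the schema, types, typeDefs, and fields tables concrete, here is a small sketch of what an application might hold in memory after initialization. All IDs, display names, and syntax labels are invented for illustration.

```python
# types: groupTypeID -> type name
types = {"t-base": "base", "t-course": "course"}

# typeDefs: groupTypeID -> ordered list of groupFieldIDs
type_defs = {
    "t-base": ["owner", "name", "description", "members"],
    "t-course": ["enrollees", "instructors", "TAs", "courseID"],
}

# fields: groupFieldID -> (displayable name, optional syntax)
fields = {
    "owner": ("Owner", None),
    "name": ("Group name", "string"),
    "description": ("Description", "string"),
    "members": ("Members", None),
    "enrollees": ("Enrollees", None),
    "instructors": ("Instructors", None),
    "TAs": ("Teaching assistants", None),
    "courseID": ("Course ID", "string"),
}

# schema: groupID -> the set of types the group belongs to
schema = {"g-course-101": ["t-base", "t-course"], "g-chess": ["t-base"]}

def fields_for_group(group_id):
    """Collect the fields a group may carry, given all of its types."""
    collected = []
    for type_id in schema.get(group_id, []):
        for field_id in type_defs[type_id]:
            if field_id not in collected:
                collected.append(field_id)
    return collected
```

A group with both the "base" and "course" types gets the union of both field sets, much as an entry with an auxiliary objectclass gains that class's attributes.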
The aging table supports a periodic aging task (see below) to enable a graceful means of getting rid of stale groups.
The connectors that will be provided with changes to each group are listed in the presentation table. There is an optional indication of the type of presentation of the group in that connector. Example: a group might have both static and forward referenced presentations in the ldap connector and be present in a legacy connector with no specification of how it is represented.
The change table stores a representation of each "atomic" change to the Groups Registry, as determined by the API. The changeNum field is auto incrementing. The changeSource field contains the credentials of the source of the change for auditing purposes. The change field itself contains some type of representation of the change. That might be ldif-ish, xml-ish, or something else...
I haven't gotten around yet to thinking about how to model compound groups.
The API mediates all access to the Groups Registry. It is the integration point between stream loader and one or more ad hoc groups management apps that enable humans or other agents to source group info.
The API also serializes changes from all sources to the Groups Registry and assigns a change number to each “atomic change”. Change numbers are used by provisioning connectors to grab all changes of interest since change number thus-and-such. It is anticipated that a single changed record presented to the Stream Loader can result in adding the member object to or removing it from many different groups. Hence, to preserve transactional integrity (and possibly to optimize performance of provisioning connectors) the suite of changes to membership resulting from a single changed source record ought to be considered one atomic change. Other atomic changes might be the entire specification of a new group (including its membership) or updates to the metadata of a single group resulting from a single action within a Groups Manager application.
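One way to picture atomic changes: every membership operation that results from a single changed source record is stored under one change number. The class, the tuple layout, and the operation encoding below are assumptions for illustration only, since the representation of a change is still an open question.

```python
import itertools

class ChangeLog:
    """In-memory stand-in for the change table (illustrative only)."""

    def __init__(self):
        self._numbers = itertools.count(1)  # auto-incrementing changeNum
        self.entries = []  # (changeNum, changeSource, operations)

    def record(self, change_source, operations):
        """Store a batch of operations as ONE atomic change."""
        change_num = next(self._numbers)
        self.entries.append((change_num, change_source, tuple(operations)))
        return change_num

    def since(self, change_num):
        """Return all atomic changes numbered above change_num."""
        return [entry for entry in self.entries if entry[0] > change_num]

log = ChangeLog()

# One changed person record moves m1001 between several groups at once.
first = log.record("stream-loader", [
    ("add", "physics-faculty", "m1001"),
    ("remove", "math-faculty", "m1001"),
])
# A single action in a Groups Manager updates one group's metadata.
second = log.record("groups-manager", [("set", "g-chess", "description")])
```

A consumer that has already processed change 1 would call `log.since(1)` and receive only the second change, with its operations still grouped together.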
Group creation causes issuance of a groupID and binding to a unique group name.
Illustrative API calls (note: “*” indicates a reference to a data structure):
List_immediate_members (groupID, groupFieldID)
List_effective_members (groupID, groupFieldID)
Add_group_type (groupID, groupTypeID)
Remove_group_type (groupID, groupTypeID)
Create_group (groupID, *(groupFieldID->groupFieldValue) )
Add_group_data ( *(groupID->*(groupFieldID->groupFieldValue)) )
Remove_group_data ( *(groupID->*(groupFieldID->groupFieldValue)) )
List_changes_since (changeNum, connectorID)
List_aged_groups (aTime, agingStateFilter)
Change_aging_state (groupID, agingState)
Roll_change_table (changeNum, size)
Create_group, Delete_group, Add_group_data, and Remove_group_data increase changeNum. All changes resulting from one call form one atomic change.
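The difference between List_immediate_members and List_effective_members can be sketched as below, on the assumption that "effective" means immediate members with subgroups expanded recursively, using the same groupFieldID at every level. The in-memory data layout and the cycle guard are illustrative choices, not part of the proposal.

```python
# (groupID, groupFieldID) -> set of memberIDs; a memberID may be a groupID.
membership = {
    ("g-science", "members"): {"g-physics", "m3003"},
    ("g-physics", "members"): {"m1001", "m2002", "g-science"},  # deliberate cycle
}
group_ids = {"g-science", "g-physics"}

def list_immediate_members(group_id, field_id):
    """Members listed directly in one membership list of the group."""
    return set(membership.get((group_id, field_id), set()))

def list_effective_members(group_id, field_id, _seen=None):
    """Immediate members with subgroups expanded; groups already visited
    are skipped so that cyclic subgroup references terminate."""
    seen = _seen if _seen is not None else set()
    if group_id in seen:
        return set()
    seen.add(group_id)
    effective = set()
    for member in list_immediate_members(group_id, field_id):
        if member in group_ids:
            effective |= list_effective_members(member, field_id, seen)
        else:
            effective.add(member)
    return effective
```

Here `list_immediate_members("g-science", "members")` includes the subgroup reference g-physics itself, while the effective list contains only non-group member objects.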
· Aging related tasks
· Trim change table
· Kill old sessions
Each provisioning connector has a credential for accessing the Groups Registry. Each periodically requests all changes since changeNum and saves the changeNum returned by List_changes_since for the next polling cycle. List_changes_since filters information from changes by reference to the connectorID in the presentation table, so that only changes to groups being provisioned by a connector are presented to it.
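The polling cycle described above might be sketched like this. The table contents, the Connector class, and the decision to return the highest change number in the table (so that filtered-out changes are not rescanned) are assumptions for illustration. Credentials are omitted.

```python
# presentation table: (groupID, connectorID) pairs
presentation = {
    ("g-physics", "ldap"),
    ("g-course-101", "ldap"),
    ("g-course-101", "legacy"),
}

# change table, simplified to (changeNum, groupID, change)
change_table = [
    (1, "g-physics", "add m1001"),
    (2, "g-course-101", "add m2002"),
    (3, "g-chem", "add m3003"),
]

def list_changes_since(change_num, connector_id):
    """Return (latest changeNum, changes visible to this connector)."""
    visible = [
        (num, group, change)
        for num, group, change in change_table
        if num > change_num and (group, connector_id) in presentation
    ]
    latest = max([num for num, _, _ in change_table] + [change_num])
    return latest, visible

class Connector:
    """Hypothetical provisioning connector that polls for changes."""

    def __init__(self, connector_id):
        self.connector_id = connector_id
        self.last_change_num = 0
        self.applied = []

    def poll(self):
        latest, changes = list_changes_since(self.last_change_num, self.connector_id)
        self.applied.extend(changes)    # stand-in for provisioning the consumer
        self.last_change_num = latest   # saved for the next polling cycle
        return changes

legacy = Connector("legacy")
first_poll = legacy.poll()
second_poll = legacy.poll()
```

The legacy connector sees only the change to g-course-101, and its second poll returns nothing because it resumes from the saved change number.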
Connectors are responsible for enforcing any meaningful referential integrity in their consumers. They are also responsible for maintaining appropriately represented groups, including spatial location in the case of an ldap directory.
Hmm, basic question: should List_changes_since return a table (i.e., iterate through record structures passed in RAM), an xml doc, ldif, or ... ? This is related to the question above concerning the representation of changes in the change table.
TBD - role structure internal to the API & Groups Managers