
Middleware



Minutes from the 10/9/03 and 10/13/03 BoF



Date: 10/13/03
Agenda:
  • define a regular conference call schedule
  • review first draft of the initial charter
  • review first draft of year one expectations
  • discuss whether the core members are a correct fit and whether we should augment them
  • review year one activity roadmap
  • compile an agenda for the October BoF and decide whom we should solicit to attend
Presented At: Combined conference call and Internet2 Fall 2003 Member Conference (MW-E2ED BoF)
Given By: Chas DiFatta

Participants on 10/13/03 Conference Call

  • Chas DiFatta - CMU (chair)
  • Steven Carmody - Brown
  • Renee Frost - Michigan/Internet2
  • Scott Cantor - OSU
  • Mark Poepping - CMU
  • Von Welch - NCSA
  • Nate Klingenstein - Internet2 (scribe)

Discussion

The scribe apologizes for missing the first thirty minutes of the member meeting session due to scheduling conflicts.

Architectures & Bounds

The group first tried to establish bounds on the project by understanding which architectural models are in use on campuses, so that the diagnostics are well designed for standard architectures; this also sets the scope of which components will need to be measured. More importantly, the group evaluated what would be appropriate to measure. The project's eventual aim is a set of diagnostic tools that function across realms and administrative domains, which would allow testing of new inter-domain applications such as Shibboleth.

Mark is concerned that trying to measure performance details, such as response speed or server load, will immediately open up a huge set of ratholes, both in technical difficulty and in overlap with other projects trying to do similar things. Performance analysis is a different game: it implies a continuing effort to tune systems, which is outside the scope of monitoring and diagnosis.

Steven shared these concerns, preferring to build a set of baselines "and a crude alarm system"; in the short term, dealing with anomalies is a more critical and more applicable goal. The group warned that the term "performance" is overly broad and easily misinterpreted, and Chas offered to refine the charter and other documents accordingly.
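Steven's "baselines and a crude alarm system" could start as simply as the Python sketch below, which is an illustration rather than anything the group specified. It assumes the monitored quantity is a periodic count (for example, hypothetical hourly authentication-failure counts) and flags any new value more than `k` standard deviations from the historical mean; the default `k=3.0` is an arbitrary assumption.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Return (mean, sample standard deviation) of historical observations."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to build a baseline")
    return mean(samples), stdev(samples)


def crude_alarm(value, baseline, k=3.0):
    """Flag a value lying more than k standard deviations from the baseline mean.

    k=3.0 is an assumed, untuned threshold.
    """
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly flat history: any change at all is an anomaly.
        return value != mu
    return abs(value - mu) > k * sigma


if __name__ == "__main__":
    # Hypothetical hourly counts of failed logins.
    history = [4, 5, 6, 5, 4, 6, 5]
    baseline = build_baseline(history)
    print(crude_alarm(5, baseline), crude_alarm(12, baseline))
```

A rule like this says nothing about why a value is unusual; it only marks anomalies for a person to investigate, which is the modest short-term goal Steven described.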

Michael Gettes of Duke asked at the member meeting which campus components would be monitored, saying he would like information on individual applications, on how DNS is integrated, on the network itself, and on several other pieces. In response, Russ Hobby and Matt Zekauskas of Internet2 will join these calls as representatives of a similar measurement effort in the Internet2 End-to-End initiative.

Federated Diagnostics

DoDHE, the Directory of Directories for Higher Education, is a project facilitated by Internet2 that is now on indefinite hold. It aimed to build a centralized repository for LDAP queries about individuals at institutions, leveraging data that was already public and widely available; this data could be stored centrally or pulled dynamically in a distributed fashion. However, the project faced major hurdles in getting institutions involved, as people asked how public this information really should be.

There may be lessons here for the storage and accessibility of log files, which can contain sensitive information. These files can be sanitized, queried dynamically under access controls, or protected with various other techniques, but this will be a fundamental hurdle for the project as it moves into the federated world.
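One of the techniques mentioned, sanitizing logs before they leave a site, might look like the sketch below. It assumes a hypothetical `key=value` log format and a hypothetical set of sensitive field names, and replaces their values with a salted SHA-256 pseudonym so that events from the same user can still be correlated without revealing who the user is. This is pseudonymization, not anonymization: anyone holding the salt can test guesses, so the salt must stay private to the site.

```python
import hashlib
import re

# Hypothetical field names treated as sensitive; a real site would choose its own.
SENSITIVE_KEYS = {"user", "ip"}

# Matches key=value pairs, where the value runs up to the next whitespace.
PAIR = re.compile(r"(\w+)=(\S+)")


def pseudonymize(value, salt):
    """Map a value to a stable, salted, truncated SHA-256 hex digest."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def sanitize_line(line, salt="site-secret"):
    """Replace the values of sensitive keys; leave everything else untouched."""

    def replace(match):
        key, value = match.group(1), match.group(2)
        if key in SENSITIVE_KEYS:
            return f"{key}={pseudonymize(value, salt)}"
        return match.group(0)

    return PAIR.sub(replace, line)


if __name__ == "__main__":
    print(sanitize_line("2003-10-13T23:45:00 event=login_failure user=jdoe ip=10.0.0.5"))
```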

One approach to this problem that Chas suggested at the member meeting was to send events and data along different logging streams. One set of information would serve as generally accessible event information for federated diagnostics, possibly stored centrally or included as part of a web of diagnostic information, while another, more detailed set would be used for internal diagnostic work. Whether log files are stored centrally or in a distributed fashion is one of the central questions for this project, since there will be distributed systems and applications to query even in an intra-realm scenario.

This drew some comments from the audience questioning the need for separate streams of information. In response, it was noted that the separation is partly dictated by the nature of the data itself; for help-desk purposes, and for letting application users focus on the relevant logs, the distinction seemed useful. While the tools produced don't necessarily have to be used in an inter-realm fashion, inter-realm use is one of the primary reasons for the work.
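The two-stream idea maps naturally onto Python's standard `logging` module, where one logger can feed several handlers, each with its own filter. In this sketch, which assumes nothing about the group's eventual design, every record goes to a detailed internal stream, while only records explicitly marked with a hypothetical `federated=True` attribute reach the shareable stream. The logger name `mw_diag_demo` is likewise made up.

```python
import io
import logging


class FederatedFilter(logging.Filter):
    """Pass only records explicitly marked as safe to share across realms."""

    def filter(self, record):
        return getattr(record, "federated", False)


def build_logger(internal_stream, federated_stream):
    logger = logging.getLogger("mw_diag_demo")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.handlers.clear()  # make repeated setup idempotent

    fmt = logging.Formatter("%(levelname)s %(message)s")

    # Detailed stream for internal diagnostic work: everything, DEBUG and up.
    internal = logging.StreamHandler(internal_stream)
    internal.setLevel(logging.DEBUG)
    internal.setFormatter(fmt)

    # Shareable stream for federated diagnostics: INFO and up, opt-in only.
    federated = logging.StreamHandler(federated_stream)
    federated.setLevel(logging.INFO)
    federated.addFilter(FederatedFilter())
    federated.setFormatter(fmt)

    logger.addHandler(internal)
    logger.addHandler(federated)
    return logger


if __name__ == "__main__":
    internal_buf, federated_buf = io.StringIO(), io.StringIO()
    log = build_logger(internal_buf, federated_buf)
    log.debug("LDAP bind for uid=jdoe took 40 ms")
    log.warning("attribute release failed for remote origin", extra={"federated": True})
    print(federated_buf.getvalue())
```

Making the shareable stream opt-in means a record is withheld from other realms unless someone deliberately marks it, which errs on the side of the privacy concerns raised above.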

Active, Passive & Event-Driven

The biggest discussion at the member meeting was whether the monitoring tools should use active, passive, or event-driven techniques, listed here in order of increasing complexity. Active monitoring initiates its own actions and measures how the service responds to them; passive monitoring instead examines the service's logs or other output to watch for errors or similar problems.
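To make the distinction concrete, the sketch below contrasts the two simpler techniques. The active check opens its own TCP connection to a service and times it; the passive check only reads lines the service has already logged and picks out the ones containing error markers. The host, port, and marker strings are placeholders, not anything the group chose.

```python
import socket
import time


def active_probe(host, port, timeout=2.0):
    """Active: initiate our own TCP connection and time how the service responds.

    Returns (reachable, elapsed_seconds).
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False, time.monotonic() - start


def passive_scan(log_lines, markers=("ERROR", "FATAL")):
    """Passive: inspect what the service already logged; generate no traffic."""
    return [line for line in log_lines if any(m in line for m in markers)]


if __name__ == "__main__":
    print(passive_scan(["INFO start", "ERROR ldap timeout", "INFO ok"]))
```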

An event-driven system would be difficult: capturing all of the information related to an event that happened at some earlier point requires many shims on the back end. Chas said this isn't a "large hammer you need to throw into your infrastructure," but the necessary back-end threading must be investigated to determine whether the approach is feasible. Filtering events and tagging them in some way would allow more sophisticated forensics, reporting, and analysis, and could better support the needs of multiple applications without presenting an overload of information.
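Tagging could be as simple as attaching a shared correlation identifier and a few category tags to each event as it is emitted, so that records from different components can later be filtered down to a single transaction. The `Event` class, its fields, and the tag names below are hypothetical; the sketch only illustrates the filtering idea, not the back-end threading the group said still needs investigation.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Event:
    component: str        # which piece of infrastructure emitted the event
    message: str
    correlation_id: str   # hypothetical ID shared by all events of one transaction
    tags: set = field(default_factory=set)  # hypothetical category labels


def new_correlation_id():
    """Generate a random identifier to thread one transaction's events together."""
    return uuid.uuid4().hex


def filter_events(events, correlation_id=None, tag=None):
    """Select events matching a correlation ID and/or a tag (None means any)."""
    return [
        e for e in events
        if (correlation_id is None or e.correlation_id == correlation_id)
        and (tag is None or tag in e.tags)
    ]
```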

The decisions made here will hinge on two central questions: how much information should be logged, and whether most errors can be considered reproducible. Chas's categorical answer on the first was that the goal is not to gather the largest possible amount of information, but to gather the specific information most likely to be relevant to the diagnosis at hand.

Steven cited an anecdote in which a Shibboleth site once asked him, "What went wrong last night between 11:30 PM and 2:00 AM?" That sort of problem can't feasibly be replicated because of the sheer number of variables involved, and given the uptime expected of these relatively critical services, being able to answer such questions may be important for diagnosing and patching systems.
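Answering Steven's question after the fact amounts to a time-window query over retained logs. Because 11:30 PM to 2:00 AM spans midnight, the window has to be expressed with full dates rather than times of day. The sketch assumes a hypothetical log format in which each line begins with an ISO 8601 timestamp followed by a space; the sample messages are invented.

```python
from datetime import datetime


def parse_line(line):
    """Split a hypothetical 'TIMESTAMP message' line into (datetime, message)."""
    stamp, _, message = line.partition(" ")
    return datetime.fromisoformat(stamp), message


def events_in_window(records, start, end):
    """Return (timestamp, message) records with start <= timestamp < end."""
    return [(ts, msg) for ts, msg in records if start <= ts < end]


if __name__ == "__main__":
    lines = [
        "2003-10-12T23:15:00 INFO nightly start",
        "2003-10-12T23:45:00 ERROR origin timeout",
        "2003-10-13T01:30:00 ERROR attribute query failed",
        "2003-10-13T02:30:00 INFO recovered",
    ]
    records = [parse_line(l) for l in lines]
    # "Last night between 11:30 PM and 2:00 AM" crosses midnight.
    window = events_in_window(records, datetime(2003, 10, 12, 23, 30),
                              datetime(2003, 10, 13, 2, 0))
    for ts, msg in window:
        print(ts.isoformat(), msg)
```

A query like this only helps if the relevant logs were kept in the first place, which ties back to the questions above about what to log and where to store it.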


© 1996 - 2008 Internet2 - All rights reserved | Terms of Use | Privacy | Contact Us
1000 Oakbrook Drive, Suite 300, Ann Arbor MI 48104 | Phone: +1-734-913-4250