Internet2

Middleware



Minutes From The 11/18/04 Bimonthly meeting

 


Agenda
  • Reports about Supercomputing 04 (SC) Conference
  • E2ED Early Adopter Focus Group Identification Process
  • Post Pilot Plans
  • Whitepapers

Participants
  • George Brett - Internet2 (scribe)
  • Chas DiFatta - CMU (chair)
  • Mark Poepping - CMU
  • Russ Hobby - Internet2
  • Matt Zekauskas - Internet2

1. Reports about Supercomputing 04 (SC) Conference.

Matt reported that, diagnostically, he hadn't seen as much as he had at past Supercomputing conferences, since he hadn't asked to be notified. He did notice some issues with traffic to San Diego and some issues with the StorCloud getting out. Russ said there were some routing asymmetry issues. Matt said the reconfiguration for the bandwidth challenges created issues as well. There were more lines coming into the conference than before. Matt said there were something like 17 10 Gb/s lines into the show, with only 9 lines to the NOC for the meeting.

Chas talked with David Richards of the University of Washington, who was doing video between Washington State, Pittsburgh, and Australia and was seeing lots of dropped packets. Russ said the Research Channel was pumping 2.5 Gb/s in both directions.

Chas said this was his first SC. He came away from his discussions with two observations: 1. The days of big iron being very specialized are over; lots of commodity hardware is being used instead. 2. When something breaks, it is not clear how to fix it. Major vendors only have performance tools, and not even profiling tools so much as utilization tools.

Mark commented that the vendors' approach is more of a redundancy forward path than a diagnostic "what went wrong" approach. Chas agreed and added that they don't drill down into the problem, and their solutions don't scale. Russ said that back in mainframe times they'd take a system dump and analyze it. Mark pointed out that currently they just buy or add new units to replace the ones that are broken. Matt mentioned having programs watching programs, which is yet more redundancy. Mark pointed out that this is the difference between having alerts and probes and having the ability to diagnose the causes of problems. The discussion used the metaphor of people being unable to work and needing a diagnosis of their illness in order to make them well.

2. E2ED Early Adopter Focus Group Identification Process

Chas has worked on the wiki page to flesh out more of what we had talked about earlier. He added requirements to the columns of end-user applications. The question is "What data and tools do they use to do end-to-end diagnostics?"

The first thing to do is to identify which groups to engage, namely the vertical groups listed in the matrix. These are portals and web apps in general; peer to peer, which would be LionShare; authentication/authorization, which would be the Shibboleth deployment group at PSU; wireless, which would be SURFnet; video and audio conferencing; and bulk file transfer (which came from Matt Mathis). A good approach would be to pick one of these groups and draw people from each of three areas, such as an end user, someone on the help desk, an operator, a designer/developer, or a CIO. On the horizontal axis, include observers from each of the areas.

Chas then mapped out the process as it's described on the wiki page:

  • Identify the vertical application area for focus
  • Invite at least three members of one vertical application group
  • Invite one representative from each of the diagnostician practitioner areas as observers
  • Interview the vertical application group with the following questions
    • Identify a short list (10) of the top problems that occur with respect to the application that the group is using, supporting, managing and evolving.
    • Rank this short list from least to most critical in the following areas
      • Most time consuming to diagnose
      • Most expensive to have fail
    • For each problem,
      • Identify the reactive tools (if any) used to diagnose the problem.
      • How could these tools be better?
    • For each problem,
      • Identify the proactive tools (if any) used to diagnose the problem.
      • How could these tools be better?
      • What diagnostic data is available?
  • Pick the top three problems (keep to an hour to an hour and a half)
    • Describe how they are solved at this time
    • Describe how the problems could be solved with a new imaginary tool
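The ranking and selection steps in the process above could be tallied along the following lines. This is only a minimal sketch: the minutes do not say how the two rankings ("most time consuming to diagnose" and "most expensive to have fail") are combined, so summing them is an assumption, and the `Problem` type, the `top_problems` helper, and the sample problems are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    name: str
    diagnose_time: int  # rank in "most time consuming to diagnose" (higher = worse)
    failure_cost: int   # rank in "most expensive to have fail" (higher = worse)


def top_problems(problems, n=3):
    """Return the n most critical problems.

    Assumption: the two rankings are combined by a plain sum.
    Ties are broken alphabetically by name so the result is stable.
    """
    ranked = sorted(
        problems,
        key=lambda p: (-(p.diagnose_time + p.failure_cost), p.name),
    )
    return ranked[:n]


# Hypothetical interview results, for illustration only.
interview = [
    Problem("video drops packets", diagnose_time=5, failure_cost=3),
    Problem("login fails", diagnose_time=2, failure_cost=5),
    Problem("slow file transfer", diagnose_time=4, failure_cost=2),
    Problem("portal timeout", diagnose_time=1, failure_cost=1),
]

selected = [p.name for p in top_problems(interview)]
print(selected)
```

A weighted sum or a separate top three per ranking would fit the same structure if the group prefers a different way of combining the two lists.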

Russ said that this is a good way to look at the diagnostic system and to think of ways to improve it. Mark pointed out that it would be useful to pre-test with a closed group to validate whether this will work or not. Chas agreed this would be a good idea. After some discussion, web services were recommended. The Internet2 technical support group was suggested as the initial interviewee. This would provide both the vertical slice and the horizontal slice. George agreed to check with Mike LeHaye, the group manager, to see if we could interview one of his staff. Once he gets the OK, Chas will talk with the person to explain what we are trying to do, so that the person will know what to expect. There was brief discussion about how to get this rolling.

Action Items [AI]
[AI] George will check with Mike LeHaye about participation in the diagnostics interview.
[AI] Chas will talk with the person identified to be interviewed to explain the purpose and process of the interview.

Discussion then turned to looking at identifying representatives from other areas.

Then there was more discussion about the

3. Post Pilot Plans
3a. CER discussion

Chas spoke about post pilot plans. He had a really good discussion with the developers about version 0.6 of the Common Event Record (CER). The last two weeks have been very detailed about the why and the what. The direction now is toward a lightweight version of the CER: just enough to describe the event and throw it into the backplane. This would get the event into the system and let it be parsed later on.

Another thing they're focusing on is whether any correlation functionality can be added while still remaining flexible. If this turns out not to be the right way to do correlations, they want to be able to make changes later on. The plan is to use the header to do the routing. This information has been and will be captured on the web site.

Mark had only one comment on the direction of the CER: it's the minimally required bits, but they are designing in an extensible layer to do parsing and add richness later. Russ agreed that flexibility and extensibility are good things to have.
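The idea discussed above, a lightweight record whose header alone drives routing while the body is parsed later, might look roughly like the following. This is a sketch under stated assumptions, not the CER 0.6 design: the header fields (`event_type`, `source`, `timestamp`), the `Backplane` class, and the sample event are all hypothetical, since the minutes do not list the actual fields.

```python
import json
import time
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EventHeader:
    # Hypothetical minimal header; the real CER fields are not given in these minutes.
    event_type: str
    source: str
    timestamp: float


@dataclass(frozen=True)
class Event:
    header: EventHeader
    body: str  # opaque to the backplane; consumers parse it later


class Backplane:
    """Toy in-process backplane that routes on the header only."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event):
        # Routing looks only at the header; the body is never inspected here.
        handlers = self._subscribers[event.header.event_type]
        for handler in handlers:
            handler(event)
        return len(handlers)


received = []
backplane = Backplane()
# The consumer, not the backplane, decides how to parse the body.
backplane.subscribe("link.loss", lambda ev: received.append(json.loads(ev.body)))

event = Event(
    header=EventHeader("link.loss", "probe-1.example.org", time.time()),
    body=json.dumps({"dropped_packets": 42}),
)
delivered = backplane.publish(event)
print(delivered, received)
```

Because the backplane never touches the body, richer parsing or correlation logic can be added or changed in the consumers later without altering how events are routed.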

3b. Finalizing management and event transport specifications

Chas said that in the formalizing process they've been looking at BEEP, XML-RPC, and SOAP. For the data transport, they've decided that BEEP is not mature and that its development has stalled; there's more going on with XML-RPC and SOAP. Since the data doesn't need to be complex, the protocol can be lightweight. The management API to query the backplane will most likely be SOAP. Currently the development wiki is not viewable, but it will be made so soon. Mark added that the next version of EDDY will have the transport better documented.

[AI] Chas will make development wiki viewable.
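To illustrate why a lightweight protocol suffices for simple event data, the sketch below marshals an event as an XML-RPC call using Python's standard `xmlrpc.client` module, without opening a network connection. It is not the EDDY transport: the method name `backplane.submitEvent`, the helper functions, and the parameter layout are assumptions made for illustration.

```python
import xmlrpc.client


def encode_event(event_type, source, body):
    """Marshal an event as an XML-RPC method call.

    "backplane.submitEvent" is a hypothetical method name.
    """
    return xmlrpc.client.dumps(
        (event_type, source, body),
        methodname="backplane.submitEvent",
    )


def decode_event(wire):
    """Unmarshal the call back into (method name, parameters)."""
    params, method = xmlrpc.client.loads(wire)
    return method, params


wire = encode_event("link.loss", "probe-1.example.org", "dropped=42")
method, params = decode_event(wire)
print(method, params)
```

Three plain strings are all that goes on the wire here, which fits the point made in the meeting that the data doesn't need to be complex.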

Chas said it looks like development is moving away from Python on the edges and will stay with Java on both the core and the edges. Mark pointed out that this is a preference of the developers and shouldn't rule out any other languages later on.

4. Whitepapers
4a. Diagnostic Backplane Concept

Chas asked if anyone else had had a chance to read the white papers. Russ said yes and noted that there are similar problems with security: it's hard to get diagnostics into software if they don't improve the functionality of the software. Much as with security and other middleware, we need to be able to make a stronger case to get application developers to add diagnostics to their code. Chas agreed and said we need to be able to help them reduce their "pain" and improve economic benefits. Russ suggested another idea: make this part of the development process, so that the diagnostics code is produced automatically. Chas agreed. There was discussion about how to get diagnostics better integrated into protocols, and whether it's the protocols or the services we should be interested in.

Chas then asked Matt if he would check out the first document which Mark said will be revised soon. Matt said he would.

 