MW-E2ED Conference Call
November 1, 2006

*Participants*
Chas DiFatta, CMU (chair)
Paul Hill, MIT
Mark Poepping, CMU
Steve Olshansky, Internet2
Matt Zekauskas, Internet2
Dean Woodbeck, Internet2, scribe

EDDY Release: Chas reported another minor release of EDDY, called the Murphy release. The feature set of this release includes more internal error handling in the backplane and the addition of log4j, the Java logging facility. Murphy also includes more configuration parameters.

Plans for the next release include the CER factory and a new normalizer - one that grabs data from HTTP sites. For example, some researchers at Carnegie Mellon use "critters and fireflies," which are small embedded sensors that can read temperature, light, sound, and other environmental variables. Chas reported that they would like to enter those events into the EDDY framework to do some environmental diagnostics. While there is no definite date for this release, if things go well on the CER factory, it will probably appear sometime in February 2007. Future plans include developing features related to email.

Chas has been working with the people at the University of Washington and all are pleased that the new release makes the transport independent of any specific protocol. Right now the program uses SSL between agents, but ActiveMQ will be supported in the next release.

Email Diagnostic Application Progress: Chas reported success in configuring the Carnegie Mellon Cyrus mail infrastructure to pull events from syslog and load them into the EDDY database. They are logging a number of elements from email objects, including the first-level message ID, the from address, the time stamp, size, class, number of recipients, daemon used, body type, and the relay object. There is also a machine object that will be filled out for every machine the email touches. This will include more information from the mail log, including queue ID, server ID, queue time stamp, and any xdelay. Information about spam and viruses will also be collected.
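As an illustration only, the email and machine objects described above could be modeled roughly as follows. This is a minimal sketch, not the EDDY schema: the class names, field names, and types are all assumptions chosen to mirror the elements listed in these notes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MachineRecord:
    """One machine the message touched (hypothetical field names)."""
    server_id: str
    queue_id: str
    queue_timestamp: str
    xdelay: Optional[str] = None  # transaction delay as logged, if present


@dataclass
class EmailRecord:
    """Elements logged for one message (hypothetical field names)."""
    message_id: str
    from_address: str
    timestamp: str
    size: int
    msg_class: str
    recipient_count: int
    daemon: str
    body_type: str
    relay: str
    # One MachineRecord is appended for every machine the message passes through.
    machines: List[MachineRecord] = field(default_factory=list)

    def add_hop(self, machine: MachineRecord) -> None:
        """Record another machine that handled this message."""
        self.machines.append(machine)
```

Keeping the per-machine data in its own record means a message relayed through several servers carries one entry per hop, which is the shape a help desk query such as "where did this message stall?" would need.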

Chas has added test data and in the next couple of days will have a user interface that email administrators and help desk staff can use to run statistics and queries. The email administrator and help desk at CMU will help with troubleshooting and with tweaking the functionality. The goal is to have a demo for the BoF at the fall member meeting on December 4.
http://events.internet2.edu/2006/fall-mm/sessionDetails.cfm?session=3010&event=258

The next step will be to make CERs from objects and use a syslog normalizer to normalize the events into EDDY CERs, then send them to the backplane. The syslog normalizer is being written at the University of Washington.
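To make the normalization step concrete, here is a rough sketch of turning one sendmail-style syslog line into a flat record. It is not the University of Washington normalizer and does not reflect the real CER format, which these notes do not describe; the output dictionary layout, the function names, and the sample line are all assumptions for illustration.

```python
import re

# Classic BSD syslog prefix followed by a sendmail-style "queueid: key=value, ..." body.
LINE_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s[\d:]{8})\s"
    r"(?P<host>\S+)\s"
    r"(?P<program>[\w\-/]+)\[(?P<pid>\d+)\]:\s"
    r"(?P<queue_id>\w+):\s"
    r"(?P<body>.*)$"
)


def parse_fields(body):
    """Split 'key=value, key=value' pairs into a dict; parts without '=' are skipped."""
    fields = {}
    for part in body.split(", "):
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def normalize(line):
    """Return a CER-like dict (hypothetical layout), or None if the line does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return {
        "source": match.group("program"),
        "host": match.group("host"),
        "timestamp": match.group("timestamp"),
        "queue_id": match.group("queue_id"),
        "attributes": parse_fields(match.group("body")),
    }


# Made-up sample line in the usual sendmail log style.
SAMPLE = (
    "Nov  1 10:15:02 mx1 sendmail[12345]: kA1FF2xx012345: "
    "from=<alice@example.edu>, size=2048, class=0, nrcpts=1, "
    "msgid=<abc123@example.edu>, proto=ESMTP, daemon=MTA, "
    "relay=client.example.edu [192.0.2.10]"
)

if __name__ == "__main__":
    print(normalize(SAMPLE))
```

Lines that do not match the expected prefix return None rather than raising, so a caller can count or set aside unparsed lines instead of stopping on them.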

Mark said a challenge with the syslog data is to figure out what to do with it. The idea is to build a model using the known elements of a mail message, then do data mapping to determine what is useful information and what can be eliminated.

Matt commented that Microsoft's Windows Server 2003 included an application that took events from a number of Microsoft services and mapped them, so you could watch something hit one service and bounce to another. There is also a tool for consolidating event logs on a central server and providing the ability to do scripting and analysis. Mark said the point is to correlate events no matter where they come from. Microsoft MOM does a good job of that, but there is also a need to coordinate it with other things. The mail model that we worked on, for example, used sendmail as the MTA and IMAP for delivery.

There is a diagnostics effort within TERENA that might be similar to, or compatible with, what we are doing with EDDY. Mark reported that he had a conversation with Diego Lopez and Bob Morgan about their diagnostics effort, and there was a presentation about it at the TERENA TF-EMC2 Meeting in Malaga, Spain. It would be good to have a discussion with these people at the member meeting, if they are going to be there. There is a different style of coordination and money available in Europe to do this kind of work.

BoF Format for Member Meeting: Chas sought feedback on the BoF format for the Internet2 member meeting in December. After a general discussion, it was agreed to use a demo to try to generate interest in EDDY. Chas will continue to develop the demo and presentation.

The call ended with a general discussion about using Nagios as the standard for output on performance and diagnostic data. Paul reported that these discussions are underway at MIT but another group there is looking at Cacti instead. Mark reported that Nagios can tell you whether a service is up or not, but there is a need for getting more details and more performance log information.

Paul agreed and said that developing some standard reports on performance would be helpful to users, particularly if the data were presented in graphical form. He said he has seen a number of standard reports in Cacti; for example, the aggregate number of calendar users, CPU usage on a server, aggregate VPN users, aggregate email statistics, and aggregate Jabber statistics. He referred people to www.stanford.edu/services/itmetrics/ for one example of drilling down for additional information. The web site has static reports, but Stanford intends to move to real-time monitoring.

Next Call: The December and January calls have conflicts, so the next call will take place February 7, 2007.