MW-E2ED Conference Call
March 5, 2008
**Attending**
Chas DiFatta, Carnegie Mellon University (chair)
Mark Poepping, Carnegie Mellon University
Paul Hill, MIT
Michael Gettes, MIT
Steve Olshansky, Internet2
**Recent Releases**
The CER Factory has been released; two subsequent releases included small functionality changes. Development of the storage agent has been pushed back by about a month. User requirements have been documented and the document is available from Chas.
The data store is a private asset and each EDDY installation has its own data store. There are mechanisms built into EDDY that allow an institution to anonymize data and make the data store available to researchers, if desired.
The next release, due out in two weeks, will fix a couple of bugs that affect stability.
**Shibboleth Trial**
Michael reported that the Shib trial for Internet2 was fairly simple, as intended. He raised a concern about having each process create its own log file. In the long term, it would be preferable to have one process that could handle multiple files, load the logs into EDDY, and do some data mining.
The long-term strategy for EDDY is to develop one process that a system administrator could aim at many things. There would be one config file that would pull whatever data is requested.
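As a rough illustration of the single-process idea, the sketch below reads one configuration that lists several log files and pulls any new lines from each. The configuration layout, the `collect` function, and the record fields are hypothetical; they are not taken from EDDY.

```python
from pathlib import Path


def collect(config, offsets=None):
    """Return (records, offsets) for all new lines in the configured log files.

    config  -- hypothetical layout: {"sources": [{"name": str, "path": str}, ...]}
    offsets -- bytes already consumed per path, as returned by a previous call
    """
    offsets = dict(offsets or {})
    records = []
    for source in config["sources"]:
        path = Path(source["path"])
        if not path.is_file():
            continue  # skip sources that do not exist yet
        start = offsets.get(source["path"], 0)
        with path.open("rb") as handle:
            handle.seek(start)      # resume where the previous call stopped
            chunk = handle.read()
        offsets[source["path"]] = start + len(chunk)
        for line in chunk.decode("utf-8", errors="replace").splitlines():
            if line.strip():
                records.append({"source": source["name"], "line": line})
    return records, offsets
```

Passing the returned offsets back into the next call means each line is read once, so a single scheduled process could cover every file an administrator lists.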
**MIT Pilot**
Paul reported that the staging machines, which were to have been in place in October, are not yet available, but are supposed to be available by March 17. Once those machines are in place, MIT will move forward with the pilot involving Shib and EDDY.
**Next Steps**
CMU will have some student help available this summer to do some of the work on the packaging of EDDY, if that is deemed important.
There was a discussion about the possibility of using syslog-ng to consolidate logs and get data into EDDY. However, syslog does not gather all of the information an EDDY user might want. EDDY has a syslog normalizer available and could include a config file that allows a user to choose the log files to track. The main concern about syslog as a transport mechanism is that it is too restrictive at the top level.
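To make the normalizer idea concrete, here is a minimal sketch that turns a traditional BSD-style syslog line into a structured record. The regular expression and the field names are illustrative assumptions, not EDDY's actual normalizer, and real syslog output varies more than this pattern allows.

```python
import re

# Assumed line shape: "Mar  5 14:02:11 host tag[pid]: message"
SYSLOG_LINE = re.compile(
    r"^(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s"
    r"(?P<host>\S+)\s"
    r"(?P<tag>[^\s:\[]+)(?:\[(?P<pid>\d+)\])?:\s?"
    r"(?P<message>.*)$"
)


def normalize(line):
    """Return a dict of fields for a syslog line, or None if it does not match."""
    match = SYSLOG_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    record = match.groupdict()
    if record["pid"] is not None:
        record["pid"] = int(record["pid"])  # the pid is optional in syslog lines
    return record
```

The fixed set of fields hints at the concern raised on the call: anything an application wants to report beyond host, tag, and free-text message has to be packed into the message string.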
There was a general discussion about the types of users interested in EDDY. At CMU, for example, researchers would like the ability to access and massage the data. That means that data needs to be anonymized. On the other hand, the broad use of EDDY will likely come from its functionality as an operational tool, correlating logs to allow staff members to quickly pinpoint an error.
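The notes do not describe how EDDY anonymizes data. One common approach, shown here purely as an assumption, is to replace identifying fields with a keyed hash so that researchers can still group events by the same user without seeing who that user is. Strictly speaking this is pseudonymization, which is weaker than full anonymization. The function name, field names, and token length are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(record, secret, fields=("user", "client_ip")):
    """Return a copy of record with the named fields replaced by keyed hashes.

    secret -- bytes known only to the institution; the same secret always
              maps the same value to the same token
    """
    cleaned = dict(record)  # leave the caller's record untouched
    for field in fields:
        if cleaned.get(field) is not None:
            value = str(cleaned[field]).encode("utf-8")
            digest = hmac.new(secret, value, hashlib.sha256)
            cleaned[field] = digest.hexdigest()[:16]
    return cleaned
```

Keeping the secret inside the institution means a released data set cannot be reversed by simply hashing a list of known usernames.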
Paul used, as an example, an admissions system that receives various data feeds, interacts with an enterprise system and relies on several SOAP-based web services. It would be useful to gather the data from such disparate systems and present it in a way that allows for rapid analysis.
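A simple way to picture this kind of correlation is to pool records from all sources and pull out everything that happened close to the moment of a failure. The sketch below assumes each record is a dict with a `time` field holding a `datetime`; that layout and the `events_near` name are hypothetical and say nothing about how EDDY stores or queries events.

```python
from datetime import datetime, timedelta


def events_near(records, moment, window_seconds=30):
    """Return records within window_seconds of moment, oldest first."""
    window = timedelta(seconds=window_seconds)
    nearby = [r for r in records if abs(r["time"] - moment) <= window]
    return sorted(nearby, key=lambda r: r["time"])
```

For the admissions example, the pooled records might come from the data feeds, the enterprise system, and the SOAP services, with the failure time taken from whichever log reported the error first.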
The MIT Shibboleth pilot will provide a controlled environment in which to test components with multiple log files. This should help determine the direction for the EDDY project, in terms of the types of plug-ins and interfaces that will make EDDY easy to use. For the initial pilot start-up, however, interfaces are not that important; the ability to use grep would be sufficient. Chas believes that capability will be available within a month. Once the pilot is underway, the plan is to use the mailing list to facilitate discussion.
In the meantime, Chas will distribute the user requirements document to the list for comment. This is a Word document; please use the Track Changes feature to make changes and comments. Chas will also email instructions on the workflow for changed documents, as well as the conventions for renaming files.