MW-E2ED Conference Call
September 10, 2008
Chas DiFatta, Carnegie Mellon University (chair)
Michael Gettes, MIT
Paul Hill, MIT
Steve Olshansky, Internet2
Ann West, Internet2
Chas provided an update on the storage agent, eddy-grep and eddy-awk.
A simple storage agent is now running on the EDDY backplane at CMU, processing up to 15,000 events per second. Every five minutes, the agent stores information to a file. A process then moves the file to a directory hierarchy, named with the year, month, date and minute.
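As a rough illustration of the directory layout described above, the following sketch builds a destination path for a five-minute batch file. The function name `archive_path`, the root directory, the `.evt` suffix, and the choice to encode hour and minute in the file name are all assumptions made for the example; the minutes do not specify the actual naming scheme.

```python
from datetime import datetime
from pathlib import Path


def archive_path(root: Path, ts: datetime, suffix: str = ".evt") -> Path:
    """Return root/YYYY/MM/DD/HHMM<suffix> for the batch containing ts.

    Hypothetical layout: the real EDDY hierarchy may differ.
    """
    # Round the minute down to the start of its five-minute window.
    minute = ts.minute - (ts.minute % 5)
    filename = f"{ts:%H}{minute:02d}{suffix}"
    return root / f"{ts:%Y}" / f"{ts:%m}" / f"{ts:%d}" / filename


if __name__ == "__main__":
    print(archive_path(Path("/data/eddy"), datetime(2008, 9, 10, 14, 37)))
```

Rounding down to the window start keeps every event in a given five-minute batch under the same file name.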
The storage agent extracts just the information of interest and can reduce file sizes significantly by stripping out the XML. A 2,000 byte CER becomes just a 100-200 byte file. The storage agent also compresses the files.
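The size reduction described above can be sketched as follows: pull a few fields out of an XML record, emit them as a compact line, and write the lines to a compressed file. The element names (`timestamp`, `source`, `type`), the `name=value` line format, and both function names are hypothetical; they are not taken from the CER schema or from the storage agent itself.

```python
import gzip
import xml.etree.ElementTree as ET

# Hypothetical fields of interest; the real CER element names are not
# given in these minutes.
FIELDS = ("timestamp", "source", "type")


def compact_record(cer_xml: str, fields=FIELDS) -> str:
    """Reduce an XML record to a single line of name=value pairs."""
    root = ET.fromstring(cer_xml)
    pairs = []
    for name in fields:
        node = root.find(f".//{name}")
        if node is not None and node.text:
            pairs.append(f"{name}={node.text.strip()}")
    return " ".join(pairs)


def write_compressed(records, path) -> None:
    """Write one compact record per line to a gzip-compressed file."""
    with gzip.open(path, "wt", encoding="utf-8") as out:
        for record in records:
            out.write(record + "\n")
```

Dropping the markup and the fields that are not of interest is what accounts for most of the savings; compression then works on the already smaller text.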
Based on input from potential EDDY users, there is now an eddy-grep that parses files very quickly – up to about 2,000 events per second. Chas will distribute a manual page for eddy-grep.
Eddy-grep will accommodate multiple tags per line and can take compressed or uncompressed files. There are four output formats – a small storage agent format, a pure CER form in ASCII, a comma-delimited file, and a tag-value pair. This has been running for two months and has proven useful in looking at viruses and attacks coming into CMU.
Other features are being considered for eddy-grep. One is to produce an anonymized file with which a researcher or analyst could experiment. Other options might include a -c option to count certain parameters in events and a -f option to read a saved command file (rather than having to type in the command each time).
Chas has been using an open source tool for the search process, and the EDDY team is considering wrapping some extensions around the tool to automatically call eddy-awk. This would enable changing events to different types and allow for anonymization. One use would be detecting when events meet a certain criterion, say a certain number of events, so that a new CER is generated or email is sent to someone.
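The threshold use case mentioned above might look something like the sketch below, which counts events per key and invokes a callback the first time a count reaches the threshold. The function name `watch_threshold`, the dictionary-based event shape, and the callback are assumptions for illustration only; generating a CER or sending email would happen inside the callback and is not shown.

```python
from collections import Counter


def watch_threshold(events, key, threshold, on_trigger):
    """Count events by events[i][key]; fire on_trigger once per key
    when its count first reaches the threshold.

    Hypothetical helper, not part of eddy-awk.
    """
    counts = Counter()
    for event in events:
        value = event.get(key)
        if value is None:
            continue  # Skip events that lack the field being counted.
        counts[value] += 1
        if counts[value] == threshold:
            on_trigger(value, counts[value])
    return counts
```

Firing only when the count equals the threshold, rather than on every later event, avoids flooding the recipient with repeated notifications.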
There is also thought about an eddy-cat that would concatenate files.
Allowing for anonymization provides the opportunity to feed CERT, classes and researchers with real data that they can manipulate and test, as well as allow them to do presentations and write papers without releasing sensitive data. The anonymization, for example, would remove IP addresses and other identifying information.
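One possible approach to the IP address removal described above is to replace each address with a keyed pseudonym, so that the same address always maps to the same token and analysts can still correlate events. This is only a sketch under stated assumptions: the minutes do not say how anonymization would be done, the function name `anonymize_line` and the `ip-` token format are made up for the example, and only IPv4-style dotted addresses are handled.

```python
import hashlib
import hmac
import re

# Matches dotted-quad strings; does not validate octet ranges.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def anonymize_line(line: str, key: bytes) -> str:
    """Replace each IPv4-looking address with a stable keyed pseudonym."""

    def pseudonym(match: re.Match) -> str:
        digest = hmac.new(key, match.group(0).encode("ascii"), hashlib.sha256)
        return "ip-" + digest.hexdigest()[:12]

    return IPV4.sub(pseudonym, line)
```

The key must be kept secret, because anyone holding it could recompute the pseudonym for every possible IPv4 address and reverse the mapping. Other identifying fields would need their own handling.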
Chas also reported that the EDDY team is now building an agent that connects to a Jabber service using XMPP and incorporates events into the EDDY framework. Once the XMPP agent is official, the next push may be an eddy-awk that can pare down the amount of output provided. Another suggestion was to provide a wrapper for eddy-grep that would populate an SQL database.
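The suggested database wrapper could be as simple as the sketch below, which loads comma-delimited text into an SQLite table. The three-column layout (`timestamp`, `source`, `type`), the table name, and the function name `load_events` are assumptions; the actual columns of the eddy-grep comma-delimited output are not described in these minutes.

```python
import csv
import io
import sqlite3


def load_events(conn: sqlite3.Connection, csv_text: str) -> int:
    """Insert comma-delimited rows into an 'events' table; return row count.

    Hypothetical three-column layout: timestamp, source, type.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(timestamp TEXT, source TEXT, type TEXT)"
    )
    # Keep only rows with exactly the expected number of columns.
    rows = [tuple(r) for r in csv.reader(io.StringIO(csv_text)) if len(r) == 3]
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

In practice the wrapper would read the text from the eddy-grep output rather than from a string, and the schema would follow whatever fields that output actually carries.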
**Next Meeting: October 1, 2008, 5 p.m. (EDT)**