Automatic log file analysis using NoSQL?

Amos Shapira amos.shapira at gmail.com
Wed Feb 16 12:54:02 IST 2011


(Replying on the list to a private message, since Tom said in another
private reply that he meant it to go to the list.)

On 16 February 2011 21:39, Tom Goren <tom at tomgoren.com> wrote:

> Using Flume (part of the Hadoop stack) is a viable option for this task;
> however, it introduces significant complications of its own - set aside
> resources to tackle the learning curve - and also the related technologies
> (lots of Java and MapReduce buzzwords).
>
> There is a reason these commercial implementations cost a fortune -
> real-time analysis of large amounts of data requires lots of CPU and disk
> space...
>
> Just my two cents,
>

Thanks.

I can appreciate the large investment in learning NoSQL, MapReduce and
friends. We have a few people in the company with some experience with
NoSQL clusters in general and Hadoop in particular (they used such tools
elsewhere, and some of them are looking into Hadoop specifically as part of
our advanced research). I'm also somewhat familiar with Java (I earned my
bread coding Java for six years).
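
(To make the buzzwords a bit more concrete for the list: the MapReduce part
of such a job is conceptually tiny. Below is a toy, hypothetical
Hadoop-Streaming-style pair that counts HTTP status codes in access logs.
The field position and file names are my own illustration; nothing here is
Hadoop-specific beyond the stdin/stdout contract that Streaming uses.)

--- mapper.py ---
#!/usr/bin/env python
# Emit "<status>\t1" for each access-log line. Assumes Apache
# combined format, where the status code is the 9th field.
import sys

for line in sys.stdin:
    fields = line.split()
    if len(fields) > 8 and fields[8].isdigit():
        print("%s\t1" % fields[8])

--- reducer.py ---
#!/usr/bin/env python
# Sum the counts per status code; input arrives sorted by key.
import sys

current, total = None, 0
for line in sys.stdin:
    key, _, value = line.rstrip("\n").partition("\t")
    if key != current and current is not None:
        print("%s\t%d" % (current, total))
        total = 0
    current = key
    total += int(value or 0)
if current is not None:
    print("%s\t%d" % (current, total))

The same two scripts run locally ("zcat access.log.*.gz | ./mapper.py |
sort | ./reducer.py") or on a cluster via Hadoop Streaming - the cluster
mostly changes where the sorting and the CPU time happen.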

Anyway - as it usually happens - the next Google search after I sent this
question in half-despair brought up Chukwa
(http://wiki.apache.org/hadoop/Chukwa), which also mentions SALSA
(http://www.usenix.org/event/wasl08/tech/full_papers/tan/tan_html/). I
think this is what I heard about before. Now I have to see how mature it
is.
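
While I check, the "alert on activity outside a learned pattern" core that
all of these tools promise can at least be prototyped cheaply: count events
per time window and alert when a window deviates strongly from a rolling
baseline. A minimal sketch - the window size, threshold and timestamp
layout below are assumptions of mine, not anything Chukwa actually does:

#!/usr/bin/env python
# Count log lines per minute; alert when a minute deviates from the
# rolling mean by more than THRESHOLD_SIGMA standard deviations.
# Hypothetical sketch - tune the numbers for real traffic.
import math
import sys
from collections import deque

WINDOW_HISTORY = 60        # remember the last 60 one-minute counts
THRESHOLD_SIGMA = 3.0

history = deque(maxlen=WINDOW_HISTORY)

def check(minute, count):
    if len(history) >= 10:                 # need some baseline first
        mean = sum(history) / float(len(history))
        var = sum((c - mean) ** 2 for c in history) / len(history)
        std = math.sqrt(var) or 1.0        # avoid a zero threshold
        if abs(count - mean) > THRESHOLD_SIGMA * std:
            print("ALERT: %s saw %d events (baseline %.1f +/- %.1f)"
                  % (minute, count, mean, std))
    history.append(count)

current, count = None, 0
for line in sys.stdin:
    # Assumes an Apache-style "[16/Feb/2011:12:54:02 +0200]" timestamp;
    # the first 17 characters after "[" give minute resolution.
    try:
        minute = line.split("[", 1)[1][:17]
    except IndexError:
        continue
    if minute != current:
        if current is not None:
            check(current, count)
        current, count = minute, 0
    count += 1
if current is not None:
    check(current, count)

Replaying history is just "zcat access.log.*.gz | ./alert.py"; feeding it
from "tail -F" gets you the near-real-time flavour the commercial tools
advertise (minus the learning smarts, of course).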

Cheers,

--Amos


>
> Tom.
>
> 2011/2/16 Amos Shapira <amos.shapira at gmail.com>
>
>>  Hello,
>>
>> As part of the PCI-DSS compliance work I'm doing (ref:
>> http://en.wikipedia.org/wiki/Payment_Card_Industry_Data_Security_Standard),
>> we need to implement automatic log file analysis and alerting. (It's also
>> a Good Thing(TM) to have in place in general.)
>>
>> LogWatch is not enough: it can't handle the volume of logs our system
>> generates (~6GB of compressed HTTP daemon access logs every 24 hours
>> alone, not to mention many other log files, and more to come as we
>> progress with PCI compliance), and it still requires someone to go
>> through its reports manually.
>>
>> Instead, I see many ads for commercial systems that can analyse log
>> files in near real time and generate custom alerts about suspicious
>> activity falling outside a learned activity pattern. These systems cost
>> a fortune.
>>
>> On the other hand, I saw mentions of open-source systems which dump log
>> files into a NoSQL database and achieve the same functionality with free
>> tools.
>>
>> Alas - I lost the references for the latter.
>>
>> The closest thing I found is Flume (https://github.com/cloudera/flume).
>> Someone told me that it also does the actual analysis, but I don't see
>> this mentioned on its web site.
>>
>> Does anyone else here have an idea about such systems?
>>
>> Thanks,
>>
>> --Amos
>>
>> _______________________________________________
>> Linux-il mailing list
>> Linux-il at cs.huji.ac.il
>> http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il
>>
>>
>
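
P.S. For the archives: the "dump log files onto a NoSQL database" approach
I was asking about boils down to something like the sketch below. MongoDB
is an arbitrary choice of NoSQL store here, the regex is a rough
approximation of Apache's combined log format, and all the names are
illustrative (requires pymongo and a local mongod):

#!/usr/bin/env python
# Parse access-log lines and batch-insert them into MongoDB so they
# can be queried and aggregated later. Hypothetical sketch.
import re
import sys
from pymongo import MongoClient

LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)')

coll = MongoClient()["logs"]["access"]   # assumes mongod on localhost

batch = []
for line in sys.stdin:
    m = LOG_RE.match(line)
    if not m:
        continue
    batch.append(m.groupdict())
    if len(batch) >= 1000:               # batch inserts for throughput
        coll.insert_many(batch)
        batch = []
if batch:
    coll.insert_many(batch)

Once loaded, ad-hoc questions become one-liners in the mongo shell, e.g.
db.access.find({status: "500"}).count() - which is roughly the "same
functionality with free tools" idea, minus the alerting layer.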

