Threat Intelligence Feeds

Threat intelligence feeds are designed to provide real-time updates on hostile domains, IP addresses, and active malware on the internet. There are two kinds of data feeds: free and paid.

The idea with data feeds is that you use them to block IP addresses and IP address ranges, domains with certain registrar email addresses, etc. But doing only that will block legitimate traffic too, so you also need to train machine learning algorithms on legitimate sources of data. For example, you can get firewall logs from all over the world at DShield and build a list of IP addresses from them. (They will ask you to fill out a form, as hackers would like to get their hands on such a list as well.) DShield users are encouraged to contribute their own firewall logs to help build up the database.
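As a minimal sketch of that blocking idea, the snippet below checks an address against a block list of single IPs and CIDR ranges, with an allow list of known-good sources consulted first so that legitimate traffic is not dropped. The function names and all addresses are hypothetical (the addresses come from the reserved documentation ranges), and a real deployment would load both lists from feeds and logs rather than hard-coding them.

```python
import ipaddress

def build_networks(entries):
    """Parse single IPs and CIDR ranges into ip_network objects."""
    # A bare address such as "198.51.100.7" becomes a /32 network.
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]

def is_blocked(ip, blocked, allowed):
    """Return True if ip is in a blocked network and not explicitly allowed."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in allowed):
        return False  # known-good source wins over the block list
    return any(addr in net for net in blocked)

# Hypothetical lists; real ones would come from a feed and from your own logs.
blocked = build_networks(["203.0.113.0/24", "198.51.100.7"])
allowed = build_networks(["203.0.113.50"])

for ip in ["203.0.113.9", "203.0.113.50", "192.0.2.1"]:
    print(ip, "blocked" if is_blocked(ip, blocked, allowed) else "allowed")
```

Checking the allow list first is a deliberate choice here: a whole range can stay blocked while a single trusted host inside it is still let through.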

The SANS Internet Storm website publishes various feeds.

There are many feeds, including these:

SANS Internet Storm

MLSecProject

The MLSecProject is the brainchild of Alex Pinto, a pioneer in the field of applying machine learning to cybersecurity. The goal is to replace the rules-based SIEM approach (which does not work very well and sends analysts off tracking down noise) with ML, which Pinto says works up to 30 times better, meaning it is up to 30 times better at finding genuine hacking events than the rules-based approach.

Here is some sample data from the MLSecProject open source Python feed reader, a program that reads various open source data feeds. It is not clear from the documentation on their GitHub project whether these are hostile IPs, or why. But you get the idea by looking at this: you can load the data into Elasticsearch, Spark, or Hadoop and then pair it with your own traffic to see whether there is any correlation, or whether any of these IP addresses appear in your logs.

"entity","type","direction","source","notes","date","asnumber","asname","country","host","rhost"
"1.234.23.28","IPv4","outbound","alienvault","MLSec-Export","2014-04-03","9318","Hanaro Telecom Inc.","KR",,
"1.234.35.198","IPv4","outbound","alienvault","MLSec-Export","2014-04-03","9318","Hanaro Telecom Inc.","KR",,
"1.25.36.76","IPv4","outbound","alienvault","MLSec-Export","2014-04-03","4837","CNCGROUP China169 Backbone","CN",,
"1.93.1.162","IPv4","outbound","alienvault","MLSec-Export","2014-04-03","4808","CNCGROUP IP network China169 Beijing Province Network","CN",,
"1.93.44.147","IPv4","outbound","alienvault","MLSec-Export","2014-04-03","4808","CNCGROUP IP network China169 Beijing Province Network","CN",,
"100.42.218.250","IPv4","outbound","alienvault","MLSec-Export","2014-04-03","18450","WebNX, Inc.","US",,"100-42-218-250.static.webnx.com"
"100.42.55.2","IPv4","outbound","alienvault","MLSec-Export","2014-04-03","36351","SoftLayer Technologies Inc.","US",,"stats.wren.arvixe.com"
"100.42.55.220","IPv4","outbound","alienvault","MLSec-Export","2014-04-03","36351","SoftLayer Technologies Inc.","US",,"stats.warthog.arvixe.com"
"100.42.58.137","IPv4","outbound","alienvault","MLSec-Export","2014-04-03","36351","SoftLayer Technologies Inc.","US",,"100.42.58.137-static.reverse.mysitehosted.com"
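Before reaching for a big data platform, the pairing step can be tried on a small scale. The following is a minimal sketch, not the MLSecProject reader itself: it parses two rows from the sample above with Python's standard csv module and looks for feed entries among a list of IP addresses from your own logs. The function names and the log addresses are hypothetical.

```python
import csv
import io

# Two rows copied from the sample feed above, including the header line.
FEED_SAMPLE = '''"entity","type","direction","source","notes","date","asnumber","asname","country","host","rhost"
"1.234.23.28","IPv4","outbound","alienvault","MLSec-Export","2014-04-03","9318","Hanaro Telecom Inc.","KR",,
"100.42.55.2","IPv4","outbound","alienvault","MLSec-Export","2014-04-03","36351","SoftLayer Technologies Inc.","US",,"stats.wren.arvixe.com"
'''

def load_feed(handle):
    """Index feed rows by the 'entity' column (the IP address)."""
    return {row["entity"]: row for row in csv.DictReader(handle)}

def match_logs(feed, log_ips):
    """Return (ip, source, country) for every log IP that appears in the feed."""
    return [(ip, feed[ip]["source"], feed[ip]["country"])
            for ip in log_ips if ip in feed]

feed = load_feed(io.StringIO(FEED_SAMPLE))

# Hypothetical addresses pulled from your own firewall or proxy logs.
my_log_ips = ["10.0.0.5", "100.42.55.2", "192.0.2.44"]

print(match_logs(feed, my_log_ips))
# prints [('100.42.55.2', 'alienvault', 'US')]
```

With a real feed file, the `io.StringIO` wrapper would be replaced by an open file handle. A match only says the address appears in the feed; as noted above, the feed does not explain why an address is listed.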

Elasticsearch and Cybersecurity

Of course, not everyone has the skills to run their own machine learning algorithms. But there are plenty of simpler tools that do not use ML. For example, there is code from Elasticsearch to help you detect port scans on your network. You could also experiment with the Elasticsearch X-Pack plugin to create cybersecurity alerts and dashboards.
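To illustrate what a simple, non-ML port scan check can look like, here is a plain Python sketch. It is not the Elasticsearch code mentioned above. It uses one common heuristic, assumed here for illustration: a source that touches many distinct ports on the same destination is probably scanning. The function name, the threshold of 10 ports, and the sample events are all hypothetical.

```python
from collections import defaultdict

def find_port_scanners(events, min_ports=10):
    """Flag source IPs that hit at least min_ports distinct ports on one host.

    events is an iterable of (src_ip, dst_ip, dst_port) tuples, for example
    parsed from firewall logs. The threshold is an assumption to tune.
    """
    ports_by_pair = defaultdict(set)
    for src_ip, dst_ip, dst_port in events:
        ports_by_pair[(src_ip, dst_ip)].add(dst_port)
    scanners = {src for (src, _dst), ports in ports_by_pair.items()
                if len(ports) >= min_ports}
    return sorted(scanners)

# Hypothetical traffic: one source probes ports 20-39, another only uses 443.
events = [("198.51.100.9", "10.0.0.1", port) for port in range(20, 40)]
events += [("192.0.2.10", "10.0.0.1", 443)] * 50

print(find_port_scanners(events))
# prints ['198.51.100.9']
```

Counting distinct ports rather than total connections keeps a busy but legitimate client, like the second source above, from being flagged.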

As for using feeds in Elasticsearch, Spark, or Hadoop, you will need a big data programmer and a data scientist to help you with that. To get started, look on YouTube for conference presentations, including those by Alex Pinto.