io.kent Posted July 7, 2014 Report Posted July 7, 2014 The following Scala object calculates the Shannon Entropy for files or byte arrays.It uses the value of one byte as unit.The formula: pi is the probability of byte number i in the byte stream.The base for the logarithm is 256 as a byte can have 256 different values.The result is between 0.0 and 1.0. An entropy of 1.0 indicates high randomness (meaning it appears like random, it doesn't mean it actually IS random), 0.0 no randomness at all (e.g. the entire byte array consists of the same byte value). A high entropy is often the result of encryption or compression. I use this object to compute the entropy of certain sections in a PE file and using it as an indicator whether the original file was packed. Code: Scalaimport java.io.FileInputStreamimport java.io.Fileobject ShannonEntropy { private val chunkSize = 1024 private val byteSize = 256 /** * Calculates Shannon's Entropy for the byte array * * @return Shannon's Entropy for the file */ def entropy(file: File): Double = { val (byteCounts, total) = countBytes(file) entropy(byteCounts, total) } private def entropy(byteCounts: Array[Long], total: Long): Double = List.fromArray(byteCounts).foldRight(0.0) { (counter, entropy) => if (counter != 0) { val p: Double = 1.0 * counter / total entropy - p * (math.log(p) / math.log(byteSize)) } else entropy } private def countBytes(bytes: Array[Byte]): (Array[Long], Long) = { val byteCounts = Array.fill[Long](byteSize)(0L) var total: Long = 0L List.fromArray(bytes).foreach { byte => val index = (byte & 0xff) byteCounts(index) += 1L total += 1L } (byteCounts, total) } private def countBytes(file: File): (Array[Long], Long) = { using(new FileInputStream(file)) { fis => val bytes = Array.fill[Byte](chunkSize)(0) val byteCounts = Array.fill[Long](byteSize)(0L) var total: Long = 0L Iterator .continually(fis.read(bytes)) .takeWhile(-1 !=) .foreach { _ => List.fromArray(bytes).foreach { byte => byteCounts((byte & 0xff)) += 1 total += 1 } } (byteCounts, total) } } private def using[A <: { def close(): Unit }, B](param: A)(f: A => : B = try { f(param) } finally { param.close() }}Note: There are Javadocs, because it is part of a Java project that uses Scala. Quote