Jump to content
io.kent

[Scala] Shannon Entropy

Recommended Posts

Posted

The following Scala object calculates the Shannon Entropy for files or byte arrays.

It uses the value of one byte as unit.

The formula: cag7fkhz.jpg

pi is the probability of byte number i in the byte stream.

The base for the logarithm is 256 as a byte can have 256 different values.

The result is between 0.0 and 1.0.

An entropy of 1.0 indicates high randomness (meaning it appears like random, it doesn't mean it actually IS random), 0.0 no randomness at all (e.g. the entire byte array consists of the same byte value). A high entropy is often the result of encryption or compression.

I use this object to compute the entropy of certain sections in a PE file and using it as an indicator whether the original file was packed.

Code: Scala

import java.io.FileInputStream
import java.io.File

object ShannonEntropy {

private val chunkSize = 1024
private val byteSize = 256

/**
* Calculates Shannon's Entropy for the byte array
*
* @return Shannon's Entropy for the file
*/
def entropy(file: File): Double = {
val (byteCounts, total) = countBytes(file)
entropy(byteCounts, total)
}


private def entropy(byteCounts: Array[Long], total: Long): Double =
List.fromArray(byteCounts).foldRight(0.0) { (counter, entropy) =>
if (counter != 0) {
val p: Double = 1.0 * counter / total
entropy - p * (math.log(p) / math.log(byteSize))
} else entropy
}

private def countBytes(bytes: Array[Byte]): (Array[Long], Long) = {
val byteCounts = Array.fill[Long](byteSize)(0L)
var total: Long = 0L
List.fromArray(bytes).foreach { byte =>
val index = (byte & 0xff)
byteCounts(index) += 1L
total += 1L
}
(byteCounts, total)
}

private def countBytes(file: File): (Array[Long], Long) = {
using(new FileInputStream(file)) { fis =>
val bytes = Array.fill[Byte](chunkSize)(0)
val byteCounts = Array.fill[Long](byteSize)(0L)
var total: Long = 0L
Iterator
.continually(fis.read(bytes))
.takeWhile(-1 !=)
.foreach { _ =>
List.fromArray(bytes).foreach { byte =>
byteCounts((byte & 0xff)) += 1
total += 1
}
}
(byteCounts, total)
}
}

private def using[A <: { def close(): Unit }, B](param: A)(f: A => : B =
try { f(param) } finally { param.close() }

}

Note: There are Javadocs, because it is part of a Java project that uses Scala.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.



×
×
  • Create New...