
PHP Frequency Distribution

yooper edited this page Aug 16, 2016 · 3 revisions

Frequency Distributions with PHP Text Analysis

A frequency distribution is a great way to find out how frequently or infrequently specific words are used in a body of text. The FreqDist class expects tokens to be normalized (for example, lowercased) before the object is instantiated; otherwise variants such as "Time" and "time" are counted as different words.
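
To illustrate why normalization matters, here is a minimal plain-PHP sketch that does not use the library. The helper `normalizeTokens` is hypothetical: lowercasing and trimming a few punctuation characters is only one possible rule. PHP's built-in `array_count_values` stands in for the per-word counts a frequency distribution holds.

```php
<?php
// Plain-PHP sketch (not the library's API): normalize tokens before
// counting, so surface variants collapse into one entry.

/**
 * Hypothetical helper: lowercase each token and strip surrounding
 * punctuation. Real normalization rules depend on your application.
 */
function normalizeTokens(array $tokens): array
{
    $normalized = [];
    foreach ($tokens as $token) {
        $token = strtolower(trim($token, ".,;:!?\"'()"));
        if ($token !== '') {
            $normalized[] = $token;
        }
    }
    return $normalized;
}

// Split on runs of whitespace; punctuation stays attached to words here.
$raw = preg_split('/\s+/', "Time flies like an arrow, and an arrow flies like time.", -1, PREG_SPLIT_NO_EMPTY);

// Without normalization, "Time" vs "time." and "arrow," vs "arrow"
// would be counted as separate words.
$counts = array_count_values(normalizeTokens($raw));
print_r($counts);
```

After normalization, "time" and "arrow" each have a count of 2 and "and" has a count of 1.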

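use TextAnalysis\Analysis\FreqDist;
use TextAnalysis\Tokenizers\GeneralTokenizer;

require_once 'vendor/autoload.php'; // Composer autoloader for yooper/php-text-analysis

// The sample text is already lowercase and unpunctuated, so no extra normalization is needed
$tokenizer = new GeneralTokenizer();
$tokens = $tokenizer->tokenize("time flies like an arrow and an arrow flies like time");
$freqDist = new FreqDist($tokens);

/*
 * Get the hapaxes: all terms with a frequency count of 1
 * (in this sentence, only "and" occurs once)
 */
$hapaxes = $freqDist->getHapaxes();

/*
 * Get the corpus size: the total number of tokens, repeats included
 */
$totalTokens = $freqDist->getTotalTokens();

/*
 * Get the vocabulary size: the number of distinct tokens
 */
$totalUniqueTokens = $freqDist->getTotalUniqueTokens();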
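
To sanity-check what the three calls above should report, the following plain-PHP sketch recomputes the same statistics with built-in array functions. It does not use the library and is not FreqDist's actual implementation; it assumes `GeneralTokenizer` splits this sentence on spaces, and it does not model the exact return types of the FreqDist methods.

```php
<?php
// Plain-PHP sketch (an assumption, not FreqDist's internals) of the
// hapaxes, corpus size and vocabulary size for the sample sentence.

$tokens = explode(' ', 'time flies like an arrow and an arrow flies like time');

// Map each distinct token to its count, in first-seen order.
$counts = array_count_values($tokens);

// Hapaxes: tokens whose count is exactly 1 (array_filter keeps the keys).
$hapaxes = array_keys(array_filter($counts, function (int $n): bool {
    return $n === 1;
}));

// Corpus size: every token, repeats included.
$totalTokens = count($tokens);

// Vocabulary size: the number of distinct tokens.
$totalUniqueTokens = count($counts);

printf("hapaxes: %s\n", implode(', ', $hapaxes));   // prints "hapaxes: and"
printf("total tokens: %d\n", $totalTokens);         // prints "total tokens: 11"
printf("unique tokens: %d\n", $totalUniqueTokens);  // prints "unique tokens: 6"
```

The gap between the two totals (11 tokens versus 6 distinct words) reflects how often words repeat in this short sentence.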