The Python Book
 
frequency count
20160418

Use collections.Counter to count the frequency of words in a text.

import collections

ln='''
The electrical and thermal conductivities of metals originate from 
the fact that their outer electrons are delocalized. This situation 
can be visualized by seeing the atomic structure of a metal as a 
collection of atoms embedded in a sea of highly mobile electrons. The 
electrical conductivity, as well as the electrons' contribution to 
the heat capacity and heat conductivity of metals can be calculated 
from the free electron model, which does not take into account the 
detailed structure of the ion lattice.
When considering the electronic band structure and binding energy of 
a metal, it is necessary to take into account the positive potential 
caused by the specific arrangement of the ion cores - which is 
periodic in crystals. The most important consequence of the periodic 
potential is the formation of a small band gap at the boundary of the 
Brillouin zone. Mathematically, the potential of the ion cores can be 
treated by various models, the simplest being the nearly free 
electron model.'''

Split the text into words:

words=ln.lower().split()
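Note that split() only breaks on whitespace, so punctuation stays attached to the words ("metal," and "metal" are counted as different words). A minimal sketch of the difference, assuming a simple regular-expression tokenizer is good enough; the sample sentence, the pattern, and the names `sample`, `naive` and `tokens` are illustrative and not part of the original notes:

```python
import re

# Illustrative sample text (not the paragraph used above).
sample = "A metal, the metal. The electrons' sea - of metals"

# Plain whitespace split: punctuation stays glued to the words.
naive = sample.lower().split()

# Assumed tokenizer: a word is a run of letters, digits or apostrophes.
tokens = re.findall(r"[a-z0-9']+", sample.lower())

print(naive)
print(tokens)
```

With the regular expression, both occurrences of "metal" map to the same token and the lone "-" disappears.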

Create a Counter:

ctr=collections.Counter(words)

List the 10 most frequent words:

ctr.most_common(10)

[('the', 22),
 ('of', 12),
 ('a', 5),
 ('be', 3),
 ('by', 3),
 ('ion', 3),
 ('can', 3),
 ('and', 3),
 ('is', 3),
 ('as', 3)]
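A Counter behaves like a dictionary, so individual counts can be looked up directly, and words that never occurred count as 0 instead of raising a KeyError. The sketch below shows this on a small made-up sentence; the `stop_words` set is a hypothetical example, not a standard list:

```python
import collections

text = "the cat and the hat and the bat"
ctr = collections.Counter(text.split())

print(ctr["the"])          # count of a single word
print(ctr["dog"])          # unseen word: 0, no KeyError
print(ctr.most_common(2))  # the two most frequent words

# Hypothetical stop-word list, for illustration only.
stop_words = {"the", "and"}

# Keep only the words that are not stop words.
content = collections.Counter(
    {word: n for word, n in ctr.items() if word not in stop_words}
)
print(content.most_common())
```

Filtering out very common words like this is one way to keep "the" and "of" from dominating the top of the list.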

Alternative: use df['col'].value_counts() from pandas

import re
import pandas as pd

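def removePunctuation(line):
    # Drop everything except letters, digits and spaces, collapse runs of
    # whitespace, then trim and lowercase.
    return re.sub(r"\s+", " ", re.sub(r"[^a-zA-Z0-9 ]", "", line)).strip().lower()

# Note: a token made only of punctuation (such as '-') becomes an empty string.
df = pd.DataFrame([removePunctuation(word) for word in ln.split()], columns=['word'])
df['word'].value_counts()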

Result:

the             22
of              12
a                5
and              3
by               3
as               3
ion              3
..
..
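value_counts() is also available directly on a Series, and it can return relative frequencies instead of raw counts. A small sketch on a made-up word list, assuming pandas is installed; the names `words`, `counts` and `rel` are illustrative:

```python
import pandas as pd

words = "the cat and the hat and the bat".split()

# Absolute counts, sorted from most to least frequent.
counts = pd.Series(words).value_counts()
print(counts.head(2))

# Relative frequencies: each count divided by the total number of words.
rel = pd.Series(words).value_counts(normalize=True)
print(rel.head(2))
```

head(n) plays the same role here as most_common(n) does for a Counter.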
 
Notes by Willem Moors. Generated on momo:/home/willem/sync/20151223_datamungingninja/pythonbook at 2019-07-31 19:22