Most Frequently Used French Words

Status: ✅

I’m currently studying French, and if you’ve read any of this site, you’ll notice I’m a bit of a techie. Often several of my interests collide, which is what happened today. I was searching for the “most frequent french words,” and while I found some lists, nothing was exactly what I wanted.
My desire was to have a PDF of the top few thousand most used French words. With the English translation next to it. In order. I’ve found some great resources, which I’ll list now

I’ve also found 100s of sites with 50 words or so - not exactly what I wanted. This spawned a question for me: if I were to search a popular French newspaper website, what words would be the most frequent? I would want to learn those first. A few hours later, and I’ve compiled that list. I’ll write the details of how I did it at the end, but just know I collected over 16,000 unique words, and “read” over 80,000 words from a variety of articles. Below is a PDF of the most popular words, ranked in order, with (maybe incorrect) English translations.

The 625 Most Unofficially Frequent French Words

More to come…! I’m going to continue building the database to make sure the ranking is correct, and will make some pretty graphs when I have time. I will also likely modify things to include what type of word it is, and an example in a sentence.

Please feel free to use this list as you see fit in accordance with CC by 4.0

Integrating OSSEC with Cisco IOS

I rank OSSEC as one of my favorite pieces of open source software, and finally decided to play around with it more in my own free time. (Yup, I do this sort of stuff for fun). My goal was quite simple: send syslog packets from my Cisco to my “proxy” server, running OSSEC. I found that, although OSSEC supports Cisco IOS logging, it didn’t really work. In fact, I couldn’t find any examples or articles of anybody actually getting it to work.

I initially tried to get it to work “correctly,” and soon settled to “just getting it to work.” I implemented some rules in the local_rules.xml file, which worked, but I’m pretty stubborn, and wanted to do it “correctly.” With a couple pots of tea I became much, much more familiar with OSSEC. The key (and a lot of credit) goes to Jeremy Melanson for hinting at some of the updates to the decoder.xml file that need to take place.

The first step is to read the OSSEC + Cisco IOS wiki page. Everything on that page is pretty straight forward. I then added three explicit drop rules at the end of my Cisco’s ACL:

...

access-list 101 deny tcp any host 220.244.xxx.xxx log
access-list 101 deny ip any host 220.244.xxx.xxx log
access-list 101 deny udp any host 220.244.xxx.xxx log

(220.244.xxx.xxx is my WAN IP, and I’m sure you could figure out xxx.xxx pretty darn easily, but I’ll x them out anyways).

To reiterate, OSSEC needs to be told to listen for syslog traffic, which you should be set on the Cisco. If you haven’t done this, go re-read the wiki above.

<remote>
<connection>syslog</connection>
<allowed-ips>192.168.0.1</allowed-ips>
</remote>

On or around line 1550 in /var/ossec/etc/decoder.xml I needed to update the regex that was used to detect the syslog stream.

...

<decoder name="cisco-ios">
<!--<prematch>^%\w+-\d-\w+: </prematch>-->
<prematch>^%\w+-\d-\w+: |^: %\w+-\d-\w+:</prematch>
</decoder>
 
<decoder name="cisco-ios">
<program_name>
<!--<prematch>^%\w+-\d-\w+: </prematch>-->
<prematch>^%\w+-\d-\w+: |^: %\w+-\d-\w+: </prematch>
</program_name></decoder>
 
<decoder name="cisco-ios-acl">
<parent>cisco-ios</parent>
<type>firewall</type>
<prematch>^%SEC-6-IPACCESSLOGP: |^: %SEC-6-IPACCESSLOGP: </prematch>
<regex offset="after_prematch">^list \d+ (\w+) (\w+) </regex>
<regex>(\S+)\((\d+)\) -> (\S+)\((\d+)\),</regex>
<order>action, protocol, srcip, srcport, dstip, dstport</order>
</decoder>


...

In the general OSSEC configuration file, re-order the list of rules. I had to do this because syslog_rules.xml includes a search for “denied”, and that triggers an alarm.

...
<include>telnetd_rules.xml</include>
<include>cisco-ios_rules.xml</include>
<include>syslog_rules.xml</include>
<include>arpwatch_rules.xml</include>
...

Remember that these dropped events will go into /var/ossec/logs/firewall/firewall.log. Because this is my home connection, and I don’t have any active_responses configured (yet!), I tightened the firewall_rules.xml file (lowering the frequency, raising the timeframe).

And in the end, I get a pretty email when somebody tries to port scan me.

Pretty email

OSSEC HIDS Notification.
2008 Nov 15 23:19:36
 
Received From: proxy->xxx.xxx.xxx.xxx
Rule: 4151 fired (level 10) -> "Multiple Firewall drop events from same source."
Portion of the log(s):
 
: %SEC-6-IPACCESSLOGP: list 101 denied tcp 4.79.142.206(36183) -> 220.244.xxx.xxx(244), 1 packet
: %SEC-6-IPACCESSLOGP: list 101 denied tcp 4.79.142.206(36183) -> 220.244.xxx.xxx(253), 1 packet
: %SEC-6-IPACCESSLOGP: list 101 denied tcp 4.79.142.206(36183) -> 220.244.xxx.xxx(243), 1 packet
: %SEC-6-IPACCESSLOGP: list 101 denied tcp 4.79.142.206(36183) -> 220.244.xxx.xxx(254), 1 packet
 
 
 
--END OF NOTIFICATION

Event vs. DOM Driven Parsing of XML

I recently have been playing with parsing GPX files and spitting out the results into a special KML file. I initially wrote a parser using minidom, yet after running this the first time – and my Core2Duo laptop reaching 100% utilization for 10 seconds – I realized I needed to re-write it using something else.

I spent a little time reading the different parsers for XML and eventually read more about cElementTree. And it is included with Python2.5, sweet.

I quickly rewrote the code and did some tests. First, the two bits of code for parsing my GPX file:

minidom-speed.py

#!/usr/bin/python

from xml.dom import minidom
from genshi.template import TemplateLoader

def collect_info():
dom = minidom.parse('airport.gpx')
for node in dom.getElementsByTagName('trkpt'):
lat = node.getAttribute('lat')
lon = node.getAttribute('lon')
speed = node.getElementsByTagName('speed')[0].firstChild.data
speed = float(speed) * 10
coords = '%s,%s' % (lon, lat)
coords_speed = '%s,%s' % (coords, speed)
yield {
'coordinates': coords_speed
}

loader = TemplateLoader(['.'])
template = loader.load('template-speed.kml')
stream = template.generate(collection=collect_info())

f = open('minidom.kml', 'w')
f.write(stream.render())

cet-speed.py

#!/usr/bin/python

import sys,os
import xml.etree.cElementTree as ET
import string
from genshi.template import TemplateLoader

def collect_info():
mainNS=string.Template("{http://www.topografix.com/GPX/1/0}$tag")

wptTag=mainNS.substitute(tag="trkpt")
nameTag=mainNS.substitute(tag="speed")

et=ET.parse(open("airport.gpx"))
for wpt in et.findall("//"+wptTag):
wptinfo=[]
wptinfo.append(wpt.get("lon"))
wptinfo.append(wpt.get("lat"))
wptinfo.append(str(float(wpt.findtext(nameTag)) * 10))
coords_speed = ",".join(wptinfo)
yield {
'coordinates': coords_speed,
}

loader = TemplateLoader(['.'])
template = loader.load('template-speed.kml')
stream = template.generate(collection=collect_info())

f = open('cet.kml', 'w')
f.write(stream.render())

The speed difference is not just noticeable, but very noticeable.

minidom-speed.py

$ python -m cProfile minidom-speed.py
4405376 function calls (3787047 primitive calls) in 32.142 CPU seconds

cet-speed.py

$ python -m cProfile cet-speed.py
1082061 function calls (904167 primitive calls) in 6.736 CPU seconds

A quarter as many calls and almost 5x faster – at least that’s how I interpret the results. Much better!

Django Syndication with Colddirt II

Since I’ve already covered a really simple syndication example, I’ll move onto something a little more complex. Let’s say you want to offer syndication that is slightly more custom. The Django syndication docs give an example from Adrian’s Chicagocrime.org syndication of beats. I had to ponder a minute to get “custom” syndication to work, so here’s my example from start to finish.
First, as usual, feeds.py

feeds.py

class PerDirt(Feed):

    link = "/"
    copyright = 'Copyright (c) 2007, Blog Mozaic'
    
    def get_object(self, bits):
        from django.shortcuts import get_object_or_404
        if len(bits) != 1:
            raise ObjectDoesNotExist
        my_dirt = get_object_or_404(Dirt, slug__exact=bits[0])
        return my_dirt

    def title(self, obj):
        return obj.slug
    
    def description(self, obj):
        return obj.description
    
    def items(self, obj):
        from django.contrib.comments.models import FreeComment
        return FreeComment.objects.filter(object_id=obj.id).order_by('-submit_date')

You can see that this differs slightly from the simpler syndication example. I’ll not a few things. But first, I need to show urls.py:

urls.py

feeds = {
    'mydirt': PerDirt,
}

urlpatterns = patterns('',
    (r'^feeds/(?P<url>.*)/$', 'django.contrib.syndication.views.feed', {'feed_dict': feeds}),
)

Let’s pretend the dirt word (or if you were doing a blog, you could do this based on slug) is “nifty”. So, the process is like this: a request comes in as /feeds/mydirt/nifty/ – it is first looked up in the feed_dict (because of the mydirt part) and then sent to PerDict. Once in PerDict it hits the first def, get_object. One of the things that confused me at first is what the ‘bits’ part is. Simply put: it is the crap after you tell Django which feed to use. Similar to the beats example, I’m testing to make sure the length is only one – so if the word doesn’t exist or somebody just types in feeds/mydirt/nifty/yeehaaa/ – they will get an error. Next the object is looked up, in this case the dirt word (in your case, maybe a blog entry).
The title and description are self-explanatory. The items are a query from the FreeComment database, ordered by date. What we need next is the correct templates.

templates/feeds/mydirt_title.html

Comment by {{ obj.person_name }}

Once again, the filename is important (mydirt_title). obj.person_name is the name from the comment.

templates/feeds/mydirt_description.html

{{ obj.comment }}
Posted by: {{ obj.person_name }}
Published: {{ obj.submit_date }}

That’s it. Hopefully I’ve explained how to create somewhat custom syndication feeds, in case you needed another example.