Discussion:
How to sort a file by pure ASCII order?
Omer Zak
2014-09-14 13:59:32 UTC
I ran into counterintuitive behavior of 'sort' in recent Linux
releases.

I checked the sorting behavior of sort, as installed in Debian Jessie
and Ubuntu 14.04 LTS.
It turns out that the default behavior of sort (with locale
en_US.UTF-8) is not to sort by ASCII order, but as if letters and
digits matter more to the sort order than punctuation marks.

Attached please find a sort-test.txt file and the output of
sort < sort-test.txt (as the file actual.txt).

To show what the output would look like with a pure ASCII sort, I sorted
sort-test.txt using python-sort.py (attached) and got the result
reproduced in correct.txt (attached).
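
The attached python-sort.py is not reproduced in this archive. A minimal
sketch of what such a script might look like, assuming it simply compares
lines byte by byte (the function name ascii_sort is hypothetical, not taken
from the attachment):

```python
def ascii_sort(lines):
    """Return the lines sorted by raw byte value, ignoring any locale."""
    # Comparing the encoded bytes gives plain ASCII order for ASCII text:
    # punctuation such as '-' (0x2D) sorts before digits, digits before
    # 'A'-'Z', and 'A'-'Z' before '_' and 'a'-'z'.
    return sorted(lines, key=lambda line: line.encode("utf-8"))
```

Feeding it the lines of sort-test.txt (for example via
sys.stdin.readlines()) and writing the result back out should give output
of the kind described for correct.txt.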

The question, then, is which options would get GNU sort to sort like
python-sort.py?

Can anyone shed light on the matter?

--- Omer
--
No actual electrons, animals or children were harmed by writing this
E-mail message.
My own blog is at http://www.zak.co.il/tddpirate/

My opinions, as expressed in this E-mail message, are mine alone.
They do not represent the official policy of any organization with which
I may be affiliated in any way.
WARNING TO SPAMMERS: at http://www.zak.co.il/spamwarning.html
Jonathan Ben Avraham
2014-09-14 14:15:47 UTC
LC_ALL=C ?
--
9590 8E58 D30D 1660 C349 673D B205 4FC4 B8F5 B7F9 ~. .~ Tk Open Systems
=}-------- Jonathan Ben-Avraham ("yba") ----------ooO--U--Ooo------------{=
mailto:yba-***@public.gmane.org tel:+972.52.486.3386 http://tkos.co.il skype:benavrhm
Omer Zak
2014-09-14 20:35:07 UTC
Thanks, Jonathan, 'LC_ALL=C sort' did the trick.
Later I went back to 'man sort' and found the warning there about paying
attention to the locale when running sort. I had missed it before...
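
For anyone calling sort from a script rather than from the shell, the same
fix can be applied by overriding LC_ALL only for the child process. This is
a rough sketch, assuming a 'sort' executable is on PATH; the helper names
c_locale_env and sort_bytewise are made up for illustration:

```python
import os
import shutil
import subprocess


def c_locale_env(base_env=None):
    """Return a copy of the environment with LC_ALL forced to C."""
    env = dict(os.environ if base_env is None else base_env)
    env["LC_ALL"] = "C"  # takes precedence over LANG and the other LC_* variables
    return env


def sort_bytewise(text):
    """Pipe text through the system 'sort' under LC_ALL=C and return its output."""
    if shutil.which("sort") is None:
        raise RuntimeError("no 'sort' executable found on PATH")
    result = subprocess.run(
        ["sort"],
        input=text.encode("utf-8"),
        env=c_locale_env(),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("utf-8")
```

Copying the environment instead of modifying os.environ keeps the rest of
the program running under the user's normal locale.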
--
Palestinians did not firmly and vocally and strongly denounce the Hannover attack (http://www.huffingtonpost.com/2010/06/24/german-youths-attack-jewi_n_623922.html) but rather supported the attack, even though it is yet another proof why Jews need their own country in which they can live safely.