Monday, November 19, 2012

Editing large/big files on Linux / Cygwin

Have you ever needed to work with, and in particular edit, very big files? E.g. 400MB+ XML files. Try opening such a file in a normal editor and it will most likely complain. I use sed, which is part of the Unix toolchain, to edit such big files. It processes the file line by line as a stream instead of loading it all into memory.

How to view a particular line in a large/big file? 

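Say you have to view the contents of line 102,000 in a file named tmp.xml. You can do this using sed like this:
sed '102000q;d' tmp.xml
Here d deletes every line without printing it, and q quits at line 102000, printing that line first, so the rest of the file is never read.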
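
The same idea extends to printing a short range of lines. The following is a minimal sketch, not a definitive recipe: the sample file is generated on the fly with seq purely for illustration, and the variable names (sample, one_line, few_lines) are hypothetical.

```shell
# Build a throwaway sample file with one numbered line per row (illustrative data only).
sample=$(mktemp)
seq 1 200000 | sed 's/^/line /' > "$sample"

# Print only line 102000, then quit so the rest of the file is never read.
one_line=$(sed -n '102000{p;q;}' "$sample")
echo "$one_line"

# Print lines 102000 through 102002, quitting right after the last one.
few_lines=$(sed -n '102000,102002p;102002q' "$sample")
echo "$few_lines"

rm -f "$sample"
```

With -n, sed prints nothing by default, so only the lines selected by p are shown. The trailing q is what keeps this fast on a big file.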

How to change some of the contents of a particular line in a large/big file?

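Say you have a file tmp.xml that contains a bare & instead of &amp;, which is the correct way to encode & in XML. You want to fix this encoding on line 100. You can do this using sed like this:
sed -e '100s/&/\&amp;/' tmp.xml > tmp-modified.xml
The format is the line number followed by s/text to replace/replacement/ (remember the s after the line number as well as all three /). In the replacement, an unescaped & stands for the matched text, so the literal ampersand has to be written as \&. Only the first match on that line is replaced.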
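
To try the substitution safely before running it on a 400MB file, it can help to test it on a tiny file first. This is a small sketch under assumed data: the three XML-like lines and the variable names (sample, fixed, changed, untouched) are made up for illustration.

```shell
# Create a tiny stand-in for the big file (hypothetical content).
sample=$(mktemp)
fixed=$(mktemp)
printf '%s\n' '<a>Tom & Jerry</a>' '<b>Fish & Chips</b>' '<c>R & D</c>' > "$sample"

# Replace the bare & on line 2 only.
# The ampersand in the replacement is escaped as \& so sed treats it literally.
sed -e '2s/&/\&amp;/' "$sample" > "$fixed"

# Look at the edited line and at a line that should be unchanged.
changed=$(sed -n '2p' "$fixed")
untouched=$(sed -n '1p' "$fixed")
echo "$changed"
echo "$untouched"

rm -f "$sample" "$fixed"
```

Writing to a second file, as in the original command, leaves the input untouched, so a wrong expression costs nothing but a rerun.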