Git annotate all versions of a single file in one go

The LIBPF development team has recently moved from Subversion to the Git distributed version control system. The tool is very good and there is quite a lot to learn; today we are sharing a shell script we have written to report all changes to a file throughout the lifetime of the repository.

The git annotate command prints out the file, reporting for each line the commit which introduced or last modified it. With git annotate each line is shown only once, but we needed to query for all changes throughout all past commits.

Here is the script to achieve that:

#!/bin/bash

# annotateall.sh $1
# run git-annotate for all versions of $1 and eliminate duplicate lines

if [ $# -ne 1 ]; then echo "usage: $(basename "$0") FILE" >&2; exit 1; fi
file="$1"
# temporary file for caching the git-annotate output
tmp="$(basename "$0").$$.tmp"
# delete the temporary file (although it should not be there !)
rm -f "$tmp"
# get rid of the temporary file when the script exits, even on errors
trap 'rm -f "$tmp"' EXIT
# create a zero-sized temporary file
touch "$tmp"
# run annotate for all versions of $1 and cache the git-annotate output in the temporary file
# (asking git log for bare hashes avoids matching the word "commit" inside commit messages)
git log --format=%H -- "$file" | while read -r commit; do
  git --no-pager annotate "$file" "$commit" >> "$tmp"
done
# eliminate duplicate lines
sort "$tmp" | uniq | sort -k 3
Posted in Uncategorized

Programmatically create OpenDocument files

The OpenDocument file formats are great, interoperable file formats for exchanging data with office applications. Unfortunately the 728-page ISO/IEC 26300:2006 “Information technology – Open Document Format for Office Applications (OpenDocument) v1.0” is not exactly entertaining reading, and someone wishing to programmatically create OpenDocument files may have a hard time. I present here a tiny utility to programmatically generate OpenDocument files, in particular for the spreadsheet format ODS; all this goodness in just 2687 lines of ANSI C code, with no dependencies on external libraries!

How is this possible ?

Although OpenDocument files can consist of just a single XML document, more often they are ZIP-compressed archives containing a number of files and directories. The minimum number of files in a ZIP-compressed ODS is four:

  • content.xml
  • meta.xml
  • mimetype
  • META-INF/manifest.xml

Quite interestingly, the ZIP format supports non-deflated (stored) archives, in which case the files are just added as they are to the archive, without compression (it then behaves like a pure file archiver such as tar). The utility I propose creates an ODS file by archiving in a non-deflated ZIP file the files content.xml, meta.xml, mimetype and META-INF/manifest.xml it finds in the current folder.
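
The stored-archive idea can be tried out without compiling anything. What follows is a minimal sketch in Python, not the C utility described in this post, and it assumes only the standard zipfile module. The function name build_ods and the placeholder payloads are hypothetical; real content.xml, meta.xml and manifest.xml files would have to come from your own generator.

```python
import io
import zipfile

ODS_MIMETYPE = b"application/vnd.oasis.opendocument.spreadsheet"


def build_ods(parts):
    """Return the bytes of a stored (uncompressed) ZIP holding the given parts.

    parts maps archive names such as "content.xml" to their bytes.
    build_ods is a hypothetical helper name, not part of any library.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
        # the mimetype entry is written first and uncompressed,
        # as OpenDocument packages expect
        archive.writestr("mimetype", ODS_MIMETYPE)
        for name, data in parts.items():
            archive.writestr(name, data)
    return buffer.getvalue()


# placeholder payloads (assumption): not valid OpenDocument XML
parts = {
    "content.xml": b"<placeholder/>",
    "meta.xml": b"<placeholder/>",
    "META-INF/manifest.xml": b"<placeholder/>",
}
ods_bytes = build_ods(parts)
for info in zipfile.ZipFile(io.BytesIO(ods_bytes)).infolist():
    print(info.filename, info.compress_type == zipfile.ZIP_STORED)
```

Because nothing is deflated, the payload bytes appear verbatim inside the archive, which is what lets the C utility drop the deflate code altogether.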

The utility is based on a stripped-down version of minizip version 1.01h, plus the five files crc32.h, crc32.c, zutil.c, zutil.h and zlib.h from zlib version 1.2.5.

Compile and run genods; the resulting test.ods file should be compatible with the typical OpenDocument-compatible applications (tested with OpenOffice 3.2). When the file is run through the OpenDocument Fellowship ODF Validator, errors are reported in content.xml, and styles.xml and settings.xml are reported missing – nothing that you can not fix.

Now you might be asking yourself: how do I use this ? I suggest you create a reference OpenDocument file with your preferred office application, then open it with a ZIP archiver and inspect the files present in the archive. Do some reverse engineering, test it, and set up your program to generate just the core file, the content.xml. Then link in this utility and snap ! (you might need to add more writefile statements to add other files).

For the curious, these were the changes to minizip:

  • create genods.c starting from minizip.c
  • patch zip.c to remove the references to the functions deflateInit2_, deflate and deflateEnd.
Posted in C, Uncategorized

Modeling and gaming

People involved in modeling are serious guys, who deal with creating simplified abstract views of particular aspects of reality (planetary climate, combustion flames, financial systems, ...) to build virtual systems to play with.

There is another seemingly unrelated community, whose members also deal with creating virtual systems to play with, but with different purposes: I’m thinking about the game developers and digital effects wizards.

I recently stumbled on this video of a flame:

Could this be the perfect model of natural gas burners for the power industry ?

What about these colored smoke plumes ?

Could any of you imagine applications to contaminants propagation or mixing ?

Quite obviously the models used by the gaming / special effects guys do not aim at exactly reproducing reality based on its physical laws, but just try to mimic its appearance. Nevertheless the analogies are striking: just have a look at the Blender 3D manual section on Fluid Simulation (!). They talk about viscosity, gravity, {Noslip, Free-slip and Part-slip} boundary conditions, {Lattice Boltzmann, Navier-Stokes and Smoothed Particle Hydrodynamics} solvers … now that should sound familiar to you CFD guys !

And this is open source code – I wonder to what level of detail these techniques have been pushed by the big names in computer-animated films and video-games in their in-house tools. After all, the money involved in the entertainment industry is huge: Toy Story films 1+2+3 = 320 M$ – Avatar 2009 film: estimates vary between 200 and 500 M$ – Grand Theft Auto IV video-game estimated budget = 100 M$.

When I compare these figures with the budgets available for projects in scientific and industrial modeling, I get a feeling it’s a different world. But see how it could be good news for science:

For hardware, scientists nowadays enjoy easy and cheap access to powerful workstations, parallel CPUs and 64-bit architectures, mostly thanks to the push of the consumer market for the computational power required for multimedia and gaming !

For software, perhaps it is time for the scientific and engineering community to start reading the computer graphics literature and to study the optimizations used in its simulation engines: the next generation CFD code might well borrow from state-of-the-art computer graphics software.

As a side-effect, this contamination could cheer up their work a bit and provide awesome, animated and interactive presentations for their colleagues !

Posted in Philosophy

The sizes of the built-in C++ types: an experimental investigation

With the transition from the 32-bit x86 architecture to the 64-bit AMD64 and the increasing diffusion of mobile development platforms, the question “what is the size of an int” pops up frequently.

If you want to know what applies to your setup, compile and run the program test_sizeof.cc to find out. The program should detect your architecture / compiler / operating system combination and print the sizes in bytes of the built-in types void *, short int, int, long int, long long int, float, double, long double, char, wchar_t and bool (caution: the detection is based on compile-time macros, mostly from this collection).

Here are some results for a few combinations.

Architecture      Intel x86  AMD64      Intel x86  Intel x86             AMD64                 Intel x86
Compiler          GNU C/C++  GNU C/C++  GNU C/C++  Microsoft Visual C++  Microsoft Visual C++  GNU C/C++ MinGW32+
Operating system  Linux      Linux      MacOSX     Windows               Windows               Windows
void *            4          8          4          4                     8                     4
short int         2          2          2          2                     2                     2
int               4          4          4          4                     4                     4
long int          4          8          4          4                     4                     4
long long int     8          8          8          8                     8                     8
float             4          4          4          4                     4                     4
double            8          8          8          8                     8                     8
long double       12         16         16         8                     16                    12
char              1          1          1          1                     1                     1
wchar_t           4          4          4          2                     2                     2
bool              1          1          1          1                     1                     1
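
For a quick cross-check without a C++ compiler at hand, the same kind of listing can be approximated from Python. This is only a sketch and not test_sizeof.cc: it assumes the standard ctypes module, whose types report the sizes used by the C compiler that built the Python interpreter, which may differ from the compiler you actually target. Note also that ctypes.c_bool corresponds to the C99 _Bool type, used here as a stand-in for the C++ bool.

```python
import ctypes
import platform

# type names as used in the table above, mapped to their ctypes equivalents
C_TYPES = [
    ("void *", ctypes.c_void_p),
    ("short int", ctypes.c_short),
    ("int", ctypes.c_int),
    ("long int", ctypes.c_long),
    ("long long int", ctypes.c_longlong),
    ("float", ctypes.c_float),
    ("double", ctypes.c_double),
    ("long double", ctypes.c_longdouble),
    ("char", ctypes.c_char),
    ("wchar_t", ctypes.c_wchar),
    ("bool", ctypes.c_bool),  # C99 _Bool, stand-in for the C++ bool
]


def builtin_sizes():
    """Return a dict mapping each type name to its size in bytes."""
    return {name: ctypes.sizeof(ctype) for name, ctype in C_TYPES}


sizes = builtin_sizes()
# architecture and operating system as reported by the interpreter
print(platform.machine(), platform.system())
for name, size in sizes.items():
    print(f"{name:<14} {size}")
```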

And please send in the output from test_sizeof on your setup !

Posted in C++

Why spreadsheets suck at modeling

Every time I step onto an airplane I shiver at the thought that the engineers who designed it might have used Excel (or any other spreadsheeting tool for that matter) for their calculations.
It’s time for a good rant against spreadsheets, after having looked at all sorts of mistakes made by engineers and finance specialists while modeling anything from a boiling mixture to the cash flow for a project financing.

The modeling work typically starts with paper and pencil, then at some point a pocket calculator is picked up and finally someone sits in front of a computer and fires up her favorite spreadsheeting program. BTW, my favorite spreadsheeting program is Quantrix, a multi-dimensional spreadsheet.

The apparent simplicity of the spreadsheet metaphor drags the user into the illusion of unlimited flexibility, immediate and transparent results and zero pain.

In reality there are several fundamental weaknesses in using spreadsheets, which invariably result in very high costs for quality control and maintenance:

  • although it is possible to name cells, there is no support for naming vectors and accessing vector elements with an index;
  • there is no type checking: it is perfectly legal to sum integers with dates, with decimals and floating points !
  • there is no checking of units of measurement, so that it’s possible to make very silly mistakes (while this limitation is common to many other tools, today there are technologies available to avoid these troubles);
  • spreadsheets have no simple procedural commands such as for, while, repeat: you have to repeat the statement (!), with the risk of a mistake while copying or, much more likely, when somebody else at a later stage has to change that vector formula; loops can be implemented with macros, but that turns your model into the ultimate BASIC + spreadsheet spaghetti code;
  • it’s unnatural to establish automated tests; so nobody ever tests them !
  • spreadsheets are binary files, so forget about proper, meaningful versioning and tracking of changes;
  • implicit systems of equations cause circular references; the solution is attempted by the opaque, stupid built-in solver with disappointing results in most cases; often convergence failures are not properly reported, and debugging is a nightmare; even for direct substitution you need to write macro code – this is illustrated by the simplest possible case of the debt service affecting the period cash flow based on the average of beginning-of-period and end-of-period debt…
  • built-in function names are internationalized, so if you learn to use the function SCARTO on the Italian version, you’ll have to re-learn OFFSET on the English version, BEREICH.VERSCHIEBEN in German, DECALER in French (was that with or without accent ?), DESLOCAMENTO in Portuguese, DESREF in Spanish, PRZESUNIĘCIE in Polish, СМЕЩ in Russian and of course KAYDıR in Turkish;
  • not just the implementation, even the language is proprietary and not peer-reviewed, but only subject to the profit-driven strategy of the owner;
  • as a consequence of the last point, the tool gets a new release every two years, with the associated hassle for the users of fixing the weaknesses and discrepancies it exposes in the models; for comparison, the current C++ standard dates back to 1998 with some corrections in 2003, and the new one is due next year after more than 5 years of intense peer review from the industry;
  • there is no separation between data and programs, an axiom from the early days of information technology that should not be violated except for very good reasons (object orientation); countless viruses hidden in what should have been raw data tables ensue.
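
To make the circular-reference item concrete, here is a minimal sketch of direct substitution written as ordinary code. The one-period model is an assumption made up for illustration: interest is charged on the average of beginning and end-of-period debt, and all the cash left after interest repays debt. The function and parameter names are hypothetical too. Unlike the built-in solver, a failure to converge is reported loudly.

```python
def end_of_period_debt(debt_begin, cash_available, rate, tol=1e-12, max_iter=100):
    """Solve the circular debt / interest relation by direct substitution.

    Assumed model (illustrative only):
      interest  = rate * (debt_begin + debt_end) / 2
      repayment = cash_available - interest
      debt_end  = debt_begin - repayment
    Returns (debt_end, iterations used).
    """
    debt_end = debt_begin  # initial guess: no repayment at all
    for iteration in range(1, max_iter + 1):
        interest = rate * (debt_begin + debt_end) / 2.0
        repayment = cash_available - interest
        new_debt_end = debt_begin - repayment
        if abs(new_debt_end - debt_end) <= tol * max(1.0, abs(new_debt_end)):
            return new_debt_end, iteration
        debt_end = new_debt_end
    # fail explicitly instead of silently returning a wrong number
    raise RuntimeError("direct substitution did not converge")


debt_end, iterations = end_of_period_debt(1000.0, 200.0, 0.08)
# closed form for this model: (1000 * 1.04 - 200) / 0.96 = 875
print(round(debt_end, 6), iterations)
```

Being a plain function, it can be covered by an automated test against the closed-form solution and kept under version control as text, two of the things the list above says spreadsheets make hard.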

The bottom line is: never use a spreadsheet for your models if they are supposed to live more than a month or if they will end up in other people’s hands.

Posted in Rants

Integrate Plone and WordPress in 10 steps

Plone is the best Content Management System (CMS) around, and WordPress is the best blogging platform in the marketplace. What about integrating them ?

Now. If your time budget is in the weeks range, there’s a state-of-the-art technique you can apply, described here – a must-read.

But. If you’re in a hurry and just looking for a quick, simple integration, then this howto is right for you.

So here we go.

Problem statement: we wish to integrate WordPress 3.0.1 into an existing Plone 4.0 beta website (say www.example.com) proxied by Apache with virtual hosts. The blog should integrate under the same primary domain (say example.com) so that it can be reached from the navigation tab bar of the main site, and have a consistent style.

Step 1: create a dummy folder in Plone called “blog” and publish it;

Step 2: log out from Plone, then click on the now empty blog folder and save the website homepage from your favorite browser as plone_before.htm;

Step 3: install WordPress 3.0.1 and activate multi-site mode with sub-domain sites;

Step 4: create a WordPress blog with a subdomain such as: wp.example.com;

Step 5: log out from WordPress, then open the empty blog and save the blog homepage from your favorite browser as blog_before.htm;

Step 6: create a child theme of the WordPress 3 default theme twentyten, called say exampletheme;

Step 7: create a short style.css file to override the default styles to match the Plone appearance; something like:

/*
Theme Name: Twenty Ten Child - EXAMPLE blog theme
Theme URI: http://example.com/
Author: the WordPress team + me
Author URI: http://www.example.com/
Template: twentyten
Version: 0.1
*/

@import url("../twentyten/style.css");
#wrapper { margin-top: 10px; width: 100%; }
input[type="text"], textarea { padding: 0px; background: #FDFCFA; border: 2px inset; }
#main { width:75%; }
#footer { width:96%; }
#portal-footer { height:14px; }
h1 { font-size:1.5em; }
h3 { font-size:1em; }
.widget-title { font-size:1em; margin-top: 10px; }

Step 8: copy from twentyten into $(WPHOME)/wp-content/themes/exampletheme the files: footer.php, header.php; now compare the plone_before.htm and blog_before.htm files and identify the parts that fit in footer.php and header.php – edit these files and monitor the effects by simply refreshing the blog home (wp.example.com);

Step 9: copy your own logo.png into $(WPHOME)/wp-content/themes/exampletheme, then tweak in header.php the Plone-generated:

<a id="portal-logo" title="Home" accesskey="1" href="http://www.example.com">
<img src="http://www.example.com/logo.png" alt="" title="" height="111" width="600" /></a>

into something like:

<a id="portal-logo" title="Home" accesskey="1" href="http://www.example.com/">
<img src="wp-content/themes/exampletheme/logo.png" alt="" title="" height="111" width="600"></a>

Step 10: trick Apache into redirecting the requests for www.example.com/blog to wp.example.com with this rewrite rule:

RewriteRule ^/blog http://wp.example.com [L,R]

Done !

Posted in Howtos

Printing duplex on a simplex printer

These instructions are for those of you who are in a paper-saving mood, and want to print on the front and the back of each sheet of paper (duplex or “fronte-retro”), but own a cheap simplex (one-sided) printer. [This was tested on a Samsung CLP310N]

The sample is based on a fictional 8-page A4 document which only has the page numbers (from 1 to 8) displayed in a large font on each page. We want to print it on 4 A4 sheets.

Let’s assume the 4 A4 blank sheets we will use are pre-marked with blue arrows on the front page of each sheet (this is just to keep track of rotations):

  1. Place the 4 sheets in a stack with (what will be the front of the first page) upwards in the paper feed bin, with the blue arrow pointing towards the printer:
  2. Print odd pages in direct order; the printer will rotate the stack of sheets 180° w.r.t. a horizontal axis perpendicular to the blue arrow; the printed sheets will end up with the printed side (the one with the blue arrow) down; the top page on the stack will be the back side of (what will be) the last page, with the printed side (the last of the odd pages, number 7) downwards:
  3. Rotate the stack 180° w.r.t. a vertical axis perpendicular to the paper sheets, and place the 4 sheets in a stack with (what will be the back of the last page) upwards in the paper feed bin, with the blue arrow pointing towards the printer:
  4. Print even pages in reverse order; the printer will rotate the stack of sheets 180° w.r.t. a horizontal axis perpendicular to the blue arrow; the printed sheets will end up with the newly printed side (the one without the blue arrow) down, while the side with the blue arrow (which was printed first) will be up; the top page on the stack will be the front side of the first page, with the previously printed side (the first of the odd pages, number 1) upwards.
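
The page ranges to type into the print dialog for the two passes can be worked out with a few lines of Python. This is just a sketch of the ordering used in the steps above; the function name duplex_passes is hypothetical, and the handling of an odd page count (feeding one blank page, shown as None) is my own assumption, since the example only covers 8 pages.

```python
def duplex_passes(page_count):
    """Return (first_pass, second_pass) page lists for manual duplex printing.

    first_pass:  odd pages in direct order
    second_pass: even pages in reverse order
    None marks a blank page to feed when page_count is odd (assumption).
    """
    pages = list(range(1, page_count + 1))
    if page_count % 2:
        # pad so that the last sheet also gets a (blank) back side
        pages.append(None)
    first_pass = pages[0::2]
    second_pass = pages[1::2][::-1]
    return first_pass, second_pass


odd_pass, even_pass = duplex_passes(8)
print(odd_pass)   # [1, 3, 5, 7]
print(even_pass)  # [8, 6, 4, 2]
```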

Done !

Posted in Howtos

Start of blog

The LIBPF blog will offer some insights into the development and technology side of the C++ LIBrary for Process Flowsheeting.

Posted in Uncategorized