Archive for the ‘Programming’ Category

Script for Listing Etexts from Project Gutenberg

Monday, August 16th, 2010

local-etext-listing

I’m a big fan of Project Gutenberg and have downloaded many of their etexts over the years. However, their etexts have numeric file names, which aren’t very human-friendly. To keep track of the etexts that I have saved on my computer, I’ve written a little Perl script that extracts the author and title from each etext and generates an HTML file listing them.

The code’s released under the GPL, so feel free to tinker with the code and share alike.
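For readers who want the gist without opening the script, here is a minimal sketch of the same idea in Python. It is not the Perl script itself. It assumes that each etext carries "Title:" and "Author:" lines near the top of the file, and the function names (extract_metadata, build_listing, list_directory) are made up for this illustration.

```python
import html
import re
from pathlib import Path

# Header lines such as "Title: Moby Dick" near the top of an etext.
FIELD_RE = re.compile(r"^(Title|Author):[ \t]*(.+?)\s*$", re.MULTILINE)


def extract_metadata(text, max_chars=5000):
    """Return (author, title) from the start of an etext, or 'Unknown'."""
    fields = {}
    for name, value in FIELD_RE.findall(text[:max_chars]):
        fields.setdefault(name, value)  # keep the first occurrence only
    return fields.get("Author", "Unknown"), fields.get("Title", "Unknown")


def build_listing(entries):
    """Render (filename, author, title) tuples as a simple HTML list."""
    items = [
        '<li><a href="{0}">{1}</a> by {2}</li>'.format(
            html.escape(filename, quote=True),
            html.escape(title),
            html.escape(author),
        )
        for filename, author, title in sorted(entries, key=lambda e: (e[1], e[2]))
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>\n"


def list_directory(directory):
    """Scan a directory of .txt etexts and return the HTML listing."""
    entries = []
    for path in sorted(Path(directory).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        author, title = extract_metadata(text)
        entries.append((path.name, author, title))
    return build_listing(entries)
```

Calling list_directory on a folder of downloaded etexts returns an HTML fragment sorted by author and then title; files without a recognisable header are listed as "Unknown".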

Percidae

Saturday, March 6th, 2010

For the last few weeks I have been looking for a job as a programmer. One of the jobs that interests me involves writing VB.net. Many of my friends who are also programmers have howled with anguish at the mere mention of this technology.

In order to get a better idea of how the language works, I’ve written a small program for editing the PATH environment variable on Windows machines:

http://www.percidae.com/

I find myself editing this variable on various Windows machines pretty regularly. Every time, I am annoyed by the small text box in the dialog in the Control Panel.
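The underlying task is small: split the variable on its separator, show one entry per line, and join the entries back together afterwards. A rough sketch in Python rather than VB.net (this is not the program's actual code, and split_path and join_path are names invented for the example):

```python
import os


def split_path(value, sep=os.pathsep):
    """Split a PATH-style string into entries, dropping empties and duplicates."""
    seen = set()
    entries = []
    for entry in value.split(sep):
        entry = entry.strip()
        key = entry.lower()  # assumption: compare case-insensitively, as Windows does
        if entry and key not in seen:
            seen.add(key)
            entries.append(entry)
    return entries


def join_path(entries, sep=os.pathsep):
    """Join the edited entries back into a single PATH-style string."""
    return sep.join(entries)
```

With one entry per line, spotting a duplicate or a stale directory is much easier than scrolling through a single-line text box.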

So far I’ve enjoyed the experience of writing with Visual Basic and can’t see what all the fuss is about. Having written PHP for years, I’m used to ignoring the comments of language purists. I’m much more interested in getting something working than any imagined superiority of different languages.

Richard Feynman on Tuva

Thursday, July 16th, 2009

We can find Richard Feynman’s Messenger Lectures on physics at the intriguingly named Tuva site:

http://research.microsoft.com/apps/tools/tuva/#data=4%7C0%7C%7C%7C%7C

Dr. Feynman is an engaging lecturer; it is perhaps regrettable that all lectures are not so entertaining.

At one point Dr. Feynman says that “It is impossible, when picking one particular example of anything, to avoid picking one that is atypical in some sense.” Of course, this is true by definition. If we were to find an example that was typical in every sense, it would be atypical in that it was not atypical in some sense, and so it would be atypical in some sense. Oh, the joy of schoolboy pedantry!

The video is rendered with a Silverlight player, which is perhaps not available on all platforms. It also used 100% of my CPU’s clock cycles and caused the laptop to crash three times. I guess that Silverlight has a long way to go before it can seriously compete with Flash. On the one hand, it’s a good thing that Flash has some more competition (not that I am accusing the Adobe engineers of laziness, mind). On the other hand, the internet will not be as rich a place as it might be if a lot of content is only available to Microsoft’s customers. I thought that that war had been won a long time ago.

Is there an algorithm for Wikipedia?

Friday, June 5th, 2009

Google’s latest offering,

http://www.google.com/squared

is rather fun, but I’m not convinced that I will use it very often.

Compare search results like this:

http://www.google.com/squared/search?q=premiership+clubs

with

http://en.wikipedia.org/wiki/List_of_Premier_League_clubs

The page on Wikipedia is much more useful. It seems that, at this point, humans are better than computers at making tables of data from diverse sources of information. Will it always be this way?

Wikipedia has strict guidelines on how articles are written and how propositions should be backed by reliable sources. Could these guidelines be further formalised and pave the way for an algorithm that could write something like Wikipedia from scratch? Google seem to be attempting to build a system that can produce the pages on Wikipedia with names like “List_of_*”. For all I know, Google might have looked at all the articles on Wikipedia whose names match that pattern and used them to get their tables started.

Sport is a popular subject. It’s safe to say that there are a lot of people who are willing to give up their free time to collate data on the subject. If some joker changed the Wikipedia table to say that Manchester United were relegated at the end of the previous season, this error would be corrected quickly as there is no lack of people who care deeply about the matter.

During a presentation for Wolfram Alpha, Stephen Wolfram was asked whether he had taken data from Wikipedia. He denied it and said that the problem with Wikipedia was that one user might conscientiously add accurate data for 200 or so different chemical compounds in various articles. Over the course of a couple of years, every single article would get edited by different groups, and the data would diverge. He argued that these sorts of projects needed a director, such as himself. However, he said that his team had used Wikipedia to find out what people were interested in. If the article on carbon dioxide is thousands of characters long, is edited five times a day, has an extensive talk page, is available in dozens of languages, and has 40 references, it is safe to say that carbon dioxide is a chemical compound that people are interested in. This is true regardless of the accuracy of the article’s content. It would be pretty trivial for Google (or any Perl hacker with a couple of hours to spare and a few gigs of hard disk space) to rank all of the pages on Wikipedia according to public interest using the criteria that I just listed.
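To make the ranking idea concrete, here is a minimal sketch in Python. It assumes that the five signals listed above have already been extracted for each article. The ArticleStats fields, the equal weighting and the log damping are all arbitrary choices for illustration, not anything that Google or Wolfram Alpha is known to use.

```python
import math
from dataclasses import dataclass


@dataclass
class ArticleStats:
    """Hypothetical per-article figures, e.g. gathered from a Wikipedia dump."""
    title: str
    length_chars: int
    edits_per_day: float
    talk_page_chars: int
    languages: int
    references: int


def interest_score(article):
    """Combine the five signals; log1p damps very large values."""
    return (
        math.log1p(article.length_chars)
        + math.log1p(article.edits_per_day)
        + math.log1p(article.talk_page_chars)
        + math.log1p(article.languages)
        + math.log1p(article.references)
    )


def rank_by_interest(articles):
    """Return the articles ordered from most to least interesting."""
    return sorted(articles, key=interest_score, reverse=True)
```

Note that nothing in the score looks at whether the article is accurate, which is exactly the point: the figures measure attention, not truth.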

In many ways, an algorithmic encyclopaedia is to be preferred because of the notorious problems of vandalism and bias. However, tasks like condensing and summarising are not straightforward. The problem of deciding what to write about could be tackled by analysing Wikipedia, as described above, and by tracking visitor trends. Is there going to be a move to unseat Wikipedia in the coming years? How long before humans can be removed from the algorithm completely?

New TEFL Site

Sunday, February 8th, 2009

I’ve put together a few pages for a TEFL materials site:

http://tefl.impey.info/

My aim for this site is to be able to produce materials for TEFL lessons more quickly. The first page that I’ve put up generates materials for a missing information game:

http://tefl.impey.info/TEFL_FindTheWordsInCommonGameHTMLPage

I’ve been playing this game for a few weeks in the classroom, but I have grown tired of writing out the cards using MS Word.

As always, I’ve written the site using the Haddock CMS. It’s the first site to make use of the Sky theme plug-in:

http://haddock-cms.googlecode.com/svn/plug-ins/public-html-sky-theme/trunk/

The aim of theme plug-ins is to make styling a site simply a case of checking out a plug-in and then having the HTML page class extend a class in the theme plug-in directory.

It’s also the first site to make use of the new “Site Texts” plug-in:

http://haddock-cms.googlecode.com/svn/plug-ins/site-texts/trunk/

This separates all texts from the code of the project. The texts are saved in files in a separate folder to the project-specific code. At the moment they need to be created and edited by hand, but a web interface in the admin section may follow.

MySQL Connections Plug-in for Haddock CMS Projects

Saturday, January 31st, 2009

I’ve extracted some of the code for connecting to MySQL databases from the database module and put it in its own plug-in:

http://haddock-cms.googlecode.com/svn/plug-ins/mysql-connections/trunk/

I’ve not changed the code very much (just renamed a few items). There are not very many additions that I want to make to this plug-in. The intention is just to refine what is there and add some automated tests.

This little plug-in should make writing Haddock CMS programs that use a MySQL database simpler, especially for projects that run solely on the command line.

I’ve not removed anything from the old database plug-in, so do not think that you need to add this plug-in to existing projects.

Tagged db-pages and mailing-list plug-ins for Haddock CMS

Monday, January 5th, 2009

I’ve tagged two of the plug-ins from Haddock CMS that are going to be affected by the changes that I’m making to the core of Haddock CMS in

http://haddock-cms.googlecode.com/svn/core/branches/remove-modules-from-core/

If you have a project that uses the branch of Haddock CMS that has the admin, database, html-tags and public-html modules in the core, e.g.

http://haddock-cms.googlecode.com/svn/core/tags/2008-08-10/

or (currently)

http://haddock-cms.googlecode.com/svn/core/trunk/

then you might need to reference the following tagged directories in the svn:externals properties of your projects:

http://haddock-cms.googlecode.com/svn/plug-ins/db-pages/tags/2009-01-05/
http://haddock-cms.googlecode.com/svn/plug-ins/mailing-list/tags/2009-01-05/

The trunks of these two plug-ins are not being updated at this point, so there’s no rush to change the external references. While I’m working on the core of Haddock CMS (removing cruft mainly) in a branch (see above), it does not make sense to change the trunks of any of the plug-ins to work with that branch, except for the plug-ins that have been moved out of the core.

The point is that, although I try not to mess around with code in the trunks of the core or the plug-ins, code in a trunk cannot be relied upon not to change. Production systems should therefore point their external references at directories under the tags directories.

One-liner to make sure svn updates work

Sunday, January 4th, 2009

I’ve been having problems getting Subversion updates to complete reliably of late. I have an enormous working directory called ‘programming-projects’ that is basically a large list in svn:externals. It’s useful to be able to go to the root of that working directory and update everything at once. This is especially useful for checking the development versions of projects that are using the latest versions of Haddock CMS.

However, normally before all the directories have been updated, one of the external servers will return a 502 error (or similar) and the process will die. The best solution that I’ve found so far is to run the following command:

perl -e '$r = 1; while($r) { $r = system("svn up") } '

It simply keeps calling svn up until it is successful. It’s not very elegant, but it works. Is there an argument that you can give to Subversion that will achieve something similar?
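One weakness of the one-liner is that it loops forever if the failure is permanent (a conflict, say, rather than a flaky server). A slightly more cautious variant, sketched here in Python, caps the number of attempts and pauses between them. The function name, the limit of 20 attempts and the 5 second delay are arbitrary choices for illustration.

```python
import subprocess
import time


def run_until_success(command, max_attempts=20, delay_seconds=5.0, run=subprocess.call):
    """Run command until it exits with status 0; return the number of attempts.

    `run` is injectable so that the loop can be exercised without calling svn.
    """
    for attempt in range(1, max_attempts + 1):
        if run(command) == 0:
            return attempt
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # give the remote server a moment to recover
    raise RuntimeError(
        "{0} still failing after {1} attempts".format(" ".join(command), max_attempts)
    )


if __name__ == "__main__":
    attempts = run_until_success(["svn", "up"])
    print("svn up succeeded after {0} attempt(s)".format(attempts))
```

Like the Perl version, this relies only on the exit status of svn up, so it cannot tell a transient 502 from any other kind of failure.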

When to use the trunk directory in SVN?

Saturday, January 3rd, 2009

I’ve been tidying up some of the code in Haddock CMS recently. So that I don’t damage any sites that are using the previous version of Haddock CMS, I copied the trunk of the core to a new branch.

That is, previously people could use

http://haddock-cms.googlecode.com/svn/core/trunk/

but I’m working on the code at

http://haddock-cms.googlecode.com/svn/core/branches/remove-modules-from-core/

I realised quite quickly that the changes that I am making to the core would require that I make potentially damaging alterations to the plug-ins, so

http://haddock-cms.googlecode.com/svn/plug-ins/mailing-list/trunk/

got copied to

http://haddock-cms.googlecode.com/svn/plug-ins/mailing-list/branches/remove-public-html-from-core/

and

http://haddock-cms.googlecode.com/svn/plug-ins/db-pages/trunk/

got copied to

http://haddock-cms.googlecode.com/svn/plug-ins/db-pages/branches/remove-public-html-from-core/

I will probably branch a few more plug-ins before I’m done.

The question that I ask myself is when should one use the trunk directory for a project under SVN? When I’m satisfied that the new branches work correctly and they’ve been tested with several projects, they will be copied to the ‘tags’ directory.

For example, if I decided today that the db-pages plug-in was stable (unlikely), I would copy

http://haddock-cms.googlecode.com/svn/plug-ins/db-pages/branches/remove-public-html-from-core/

to somewhere like

http://haddock-cms.googlecode.com/svn/plug-ins/db-pages/tags/2008-01-03/

and promise to only make minor updates, if any, to that copy of the plug-in. That way people can point to that directory using an external SVN reference and not worry about updates.

The question that I ask is when would I ever go back to using the trunk again?

Dusting off old code

Friday, January 2nd, 2009

I’ve started working on a new web-based project using Haddock CMS (details to follow later).

This is the first web-based project that I have started since I began removing some of the modules (Admin, Database, HTML Tags, Public HTML) from the core of Haddock CMS. Using those core modules as plug-ins has required commenting out a lot of obsolete ‘require_once’ statements in some really old class definition files. When we started working on Haddock CMS, we decided that there should be one class per file and that each file should be named after the class, rather like in Java. For a long time, we had to put lots of ‘require_once’ statements at the top of all the class definition files. Eventually, we came up with a way of automatically generating an enormous __autoload function (basically a large switch statement) using a CLI script for each project. That made adding new ‘require_once’ statements unnecessary.
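The generation step can be sketched roughly as follows. This is an illustration in Python, not the Haddock CMS script: it assumes that class files are named "ClassName.inc.php" (a guess at the convention), and find_class_files and render_autoload are names invented for the example. It walks the project, maps each class name to its file, and prints the PHP function as text.

```python
import os


def find_class_files(root, suffix=".inc.php"):
    """Map class names to file paths relative to root (one class per file)."""
    classes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if filename.endswith(suffix):
                class_name = filename[: -len(suffix)]
                relative = os.path.relpath(os.path.join(dirpath, filename), root)
                classes[class_name] = relative.replace(os.sep, "/")
    return classes


def render_autoload(classes):
    """Emit the source of a PHP __autoload function as one big switch statement."""
    lines = ["<?php", "function __autoload($class_name) {", "    switch ($class_name) {"]
    for name in sorted(classes):
        lines.append("        case '{0}':".format(name))
        lines.append("            require_once PROJECT_ROOT . '/{0}';".format(classes[name]))
        lines.append("            break;")
    lines += ["    }", "}", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_autoload(find_class_files(".")))
```

Because the map is rebuilt from the file system each time the generator runs, moving a class file only requires regenerating the function, not editing every file that uses the class.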

However, we didn’t go through every class and delete those statements unless we got an error message. Because most of the classes have stayed in the same place since they were created, the old ‘require_once’ statements were not doing any harm and were forgotten about.

Moving the core modules to the plug-ins folder, however, caused a lot of error messages to be printed. There were a lot of ‘require_once’ statements starting ‘PROJECT_ROOT . “/haddock/…”’ that were asking for files that are now in the plug-ins folder. That meant that I had to go to about 30 class definition files and comment out the offending lines. Not very difficult or dangerous, but somewhat tedious.

The thing about all those files is that they were all for the bits of code that have not been touched for ages. All the code that has been worked on a great deal since we created the __autoload creation script had already been updated. I felt a certain amount of nostalgia and despair (is there a difference?) looking at these files. Some were clearly bad ideas that deserved to be consigned to the dustbin of history. Others were good ideas that were made obsolete by other bits of code. I found a few good bits of code that I might try to use again. Some of the code was trying to be too clever for its own good; perhaps that’s a risk that all object-oriented code runs. The emotions of looking at code that was started in 2006 and has still not been completed (of looking at ideas that sparked, shone and died) are a mixture of shame, fear, resignation and hope.

If you use the Haddock CMS for a project, the branch where some of the core modules have been moved to the plug-ins folder can be found at

https://haddock-cms.googlecode.com/svn/core/branches/remove-modules-from-core/

and the various plug-ins can be found in

https://haddock-cms.googlecode.com/svn/plug-ins/