Archive for July, 2004

An end to the current stupidity

July 6th, 2004

I’m going to make a serious attempt from now on to post only real content on the front page. What do I mean by content?

  • interesting photos
  • unique anecdotal experiences (based on fact)
  • links to intellectual publications
  • other unconventional perspectives
  • anything else stimulating to the brain and/or senses

This means that I might write an emotionally charged post every once in a while, but it shouldn’t appear on the front page. Sorry, but you’ll have to check the archives if you want to hear about my latest soiree with the ladies. I’ve already pulled one or two entries from the front page, but they should reappear once the list has scrolled down far enough.

Monica’s fan

July 6th, 2004

Photo

The Tyranny of Copyright?

July 5th, 2004

Thanks to Steve Kobes for unwittingly providing today’s piece of Interesting Reading, written by Robert S. Boynton in the New York Times.

http://www.google.com/groups?selm=20040124155806.27149.00000633%40mb-m11.aol.com

If the link breaks, tell me in a comment.

the empty venue

July 4th, 2004

Take a wild guess how many people were in this restaurant.

Photo

Things I did not do today

July 3rd, 2004

Today, I did not…

  • wake up in time for the Tanabata festival decoration party thing
  • go to see Spider-Man 2 with a bunch of other people
  • go to the top of the AER building with a cute girl (it was closed)
  • kill someone
  • play any video games

Research

July 1st, 2004

Process Name: JMDict_bipartite_analysis.exe
User Name: Jeff
CPU: 99%
Mem Usage: 163,572K (and rising)

My computer has slowed to a crawl as I’ve been testing different ways to improve the data set for my project.

You probably want to know exactly what I’m doing, so I’ll explain that first.

I’m making a graph of every word in several languages and the connections between them. I then feed this graph into an algorithm that finds communities, or clusters, within it: words and phrases related to “seed” words or phrases that I enter as input. At the moment, the graph includes Japanese, English, German, and some French and Russian. I only have dictionary files linking Japanese to the other four. If someone can find me public-domain dictionary files between any of the non-English languages, that would be wonderful.
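To make the idea concrete, here is a minimal Python sketch, assuming the dictionary has already been parsed into (Japanese word, translation) pairs. The names `build_graph` and `related_words` are hypothetical, and the scoring (counting two-step paths back to the seed words) is a simple stand-in for illustration, not the community-detection algorithm used in this project.

```python
from collections import Counter, defaultdict


def build_graph(pairs):
    """Build an undirected adjacency map from (word, translation) pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def related_words(graph, seeds, top_n=5):
    """Rank words by how many two-step paths connect them to the seeds.

    Stand-in scoring (an assumption, not the real algorithm): words that
    share many translations with the seed words score highest.
    """
    seed_set = set(seeds)
    scores = Counter()
    for seed in seeds:
        # One hop out: translations of the seed word.
        for neighbor in graph.get(seed, ()):
            # Second hop: other words sharing that translation.
            for candidate in graph[neighbor]:
                if candidate not in seed_set:
                    scores[candidate] += 1
    return [word for word, _ in scores.most_common(top_n)]


# Toy data (hypothetical): a few Japanese-English dictionary links.
pairs = [
    ("犬", "dog"), ("犬", "hound"),
    ("猟犬", "hound"), ("猟犬", "dog"),
    ("いぬ", "dog"),
    ("猫", "cat"),
]
graph = build_graph(pairs)
print(related_words(graph, ["犬"]))  # ['猟犬', 'いぬ']
```

With these toy pairs, 猟犬 shares two translations with the seed 犬 and ranks above いぬ, which shares only one; 猫 has no path to the seed and never appears.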

At present, the graph contains about 297 thousand nodes (entries), up from 262 thousand before I started expanding the data set yesterday. I also just finished expanding the number of edges (connections) in the graph from around 500 thousand to more than 1.2 million. Right now I’m debugging a procedure to automatically find and remove noisy entries, such as “to” and “and”, which have tens of thousands of links to other words but no real value for this type of graph.
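One simple way to catch entries like “to” and “and” is a degree cutoff: drop any node with more links than some threshold. The sketch below is an assumption, not the actual procedure being debugged here; `prune_hubs` and the `max_degree` value are hypothetical, and a real threshold would need tuning against the full graph.

```python
def prune_hubs(graph, max_degree):
    """Remove nodes whose degree exceeds max_degree, plus every edge touching them.

    graph maps each word to the set of words it links to (undirected, so each
    link appears in both directions). Returns (pruned_graph, removed_hubs).
    """
    # Degrees are measured on the original graph, before anything is removed.
    hubs = {node for node, neighbors in graph.items() if len(neighbors) > max_degree}
    pruned = {
        node: {n for n in neighbors if n not in hubs}
        for node, neighbors in graph.items()
        if node not in hubs
    }
    return pruned, hubs


# Toy graph (hypothetical): "to" links to every particle, so it is the hub.
graph = {
    "to": {"へ", "に", "まで"},
    "へ": {"to", "toward"},
    "に": {"to", "at"},
    "まで": {"to", "until"},
    "toward": {"へ"},
    "at": {"に"},
    "until": {"まで"},
}
pruned, hubs = prune_hubs(graph, max_degree=2)
print(sorted(hubs))  # ['to']
```

Nodes whose only link was a hub would be left isolated after pruning; whether to drop those as well is a separate design choice.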

This work by Jeff Hiner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.