From wkt at tuhs.org Wed Sep 10 08:02:30 2003
From: wkt at tuhs.org (Warren Toomey)
Date: Wed Sep 10 08:02:36 2003
Subject: [TUHS] Fwd: Helping in the battle against SCO
Message-ID: <20030909220230.GA70691@minnie.tuhs.org>

All,
This e-mail below was prompted by an interview I gave about the
SCO thing for an Australian paper:
http://www.theage.com.au/articles/2003/09/09/1062902037394.html

----- Forwarded message from Ulrik Petersen <emdros@yahoo.dk> -----

Date: Tue, 9 Sep 2003 19:44:51 +0200 (CEST)
From: Ulrik Petersen <emdros@yahoo.dk>
Subject: Helping in the battle against SCO

I saw a recent article in the Sydney Morning Herald in which a
Dr. Warren Toomey (presumably you?) was quoted as saying that the
TUHS has several members who have access to old copies of UNIX
source code.

Please ask these people to try out one of the three "shredders"
which can compare sourcecode from Linux with other sourcecode, and,
if possible, analyze and publish the results.

One of these shredders is written by Eric S. Raymond. Here is a
link to an article in which he calls for action by people with
access to UNIX sourcecode:

http://www.eweek.com/article2/0,4149,1257617,00.asp

The program itself can be found here:

http://www.catb.org/~esr/comparator/

Regards,

Ulrik Petersen, Denmark

----- End forwarded message -----

Anyway, I think it's a good idea, so I'd like to hear from people who
have access to recent AT&T code. My GPG and PGP keys are at
http://minnie.tuhs.org/warren.html and on most keyservers if you so
wish to use them.

Thanks,
Warren

From matt at aclaro.com Wed Sep 10 14:18:15 2003
From: matt at aclaro.com (Matthew Mastracci)
Date: Thu Sep 11 09:08:47 2003
Subject: [TUHS] Fwd: Helping in the battle against SCO
Message-ID: <3F5F8707.7070504@aclaro.com>

What about comparing SVR1/2 to 2.4.x? SCO seems to be picking on the
early 2.4.x codebase. This should also pick up the SGI code comments in
the malloc() function that were recently publicized, though I'm not sure
which version Linus removed the code from.

Matt.

From wkt at tuhs.org Thu Sep 11 09:17:40 2003
From: wkt at tuhs.org (Warren Toomey)
Date: Thu Sep 11 09:17:55 2003
Subject: [TUHS] Fwd: Helping in the battle against SCO
In-Reply-To: <3F5F8707.7070504@aclaro.com>
References: <3F5F8707.7070504@aclaro.com>
Message-ID: <20030910231740.GA82319@minnie.tuhs.org>

On Wed, Sep 10, 2003 at 02:18:15PM -0600, Matthew Mastracci wrote:
> What about comparing SVR1/2 to 2.4.x? SCO seems to be picking on the
> early 2.4.x codebase. This should also pick up the SGI code comments in
> the malloc() function that were recently publicized, though I'm not sure
> which version Linus removed the code from.
> Matt.

Yes we can do this. But I'm suspecting that SCO has found lots of BSD
code in both Linux and their codebase. SysVR2 didn't have any networking,
so we probably won't get much similarity.

Anyway, we can try!
Warren

From norman at nose.cs.utoronto.ca Wed Sep 10 19:59:07 2003
From: norman at nose.cs.utoronto.ca (Norman Wilson)
Date: Thu Sep 11 10:03:14 2003
Subject: [TUHS] Fwd: Helping in the battle against SCO
Message-ID: <20030911000259.F34D91E83@minnie.tuhs.org>

I don't see how any diffing we do will make any difference
`in the battle against SCO.'

If we find cases in which Linux has incorporated System V licensed
code, that will certainly be meaningful; but if, as seems likely,
we don't, SCO can just say their tools are better than ours.

And besides, it is SCO who have brought the complaint, so both
legally and ethically it's up to SCO to prove the case, not up to
others to disprove it, no matter what fearsome roars SCO emit.

Comparisons done by others are certainly interesting, and I don't
want to discourage anyone from doing them; just don't expect it to
make any difference to the lawyers. (Not that I'm one, of course.)

Norman Wilson
Toronto ON

From luvisi at andru.sonoma.edu Wed Sep 10 17:41:49 2003
From: luvisi at andru.sonoma.edu (Andru Luvisi)
Date: Thu Sep 11 10:28:57 2003
Subject: [TUHS] Fwd: Helping in the battle against SCO
In-Reply-To: <20030911000259.F34D91E83@minnie.tuhs.org>
Message-ID: <Pine.LNX.4.44.0309101738100.2092-100000@gladen>

On Wed, 10 Sep 2003, Norman Wilson wrote:
> I don't see how any diffing we do will make any difference
> `in the battle against SCO.'
[snip]

Some ways that I can see it being a good thing to do:

If SCO holds up a piece of common code and the good guys have no
response, that is bad.

If SCO holds up a piece of common code and the good guys already
know that it actually came from BSD, and are prepared to
demonstrate such, that is good.

If SCO holds up a piece of common code and the good guys already
know that it was contributed to Linux by SCO/Caldera themselves,
and are prepared to demonstrate such, that is good.

If there is infringing code, it should be taken out of Linux as
quickly as possible.

Andru
--
Andru Luvisi

Quote Of The Moment:
Heisenberg may have been here.

From grog at lemis.com Thu Sep 11 10:25:46 2003
From: grog at lemis.com (Greg Lehey)
Date: Fri Sep 12 03:29:13 2003
Subject: [TUHS] Fwd: Helping in the battle against SCO
In-Reply-To: <20030911000259.F34D91E83@minnie.tuhs.org>
References: <20030911000259.F34D91E83@minnie.tuhs.org>
Message-ID: <20030911172545.GC946@adelaide.lemis.com>

On Wednesday, 10 September 2003 at 19:59:07 -0400, Norman Wilson wrote:
> I don't see how any diffing we do will make any difference `in the
> battle against SCO.'

It could. There's a lot of confusion out there. The people on this
list have a much better understanding of the technical issues than
just about any other group of people I can think of.

> If we find cases in which Linux has incorporated System V licensed
> code, that will certainly be meaningful; but if, as seems likely, we
> don't, SCO can just say their tools are better than ours.

FWIW, the first example that SCO showed in Las Vegas on 18 August does
appear to be derived from System V.3 malloc(). See
http://www.lemis.com/grog/SCO/code-comparison.html for the details.
Also, if anybody else can confirm or deny my analysis based on code
inspection, I'd be *very* grateful.

Summary: the first example showed a slightly modified version of Third
Edition malloc() being used for a slightly different purpose in the
SGI ia64 port only. The slight modifications tracked those in System
V.3, suggesting that SGI derived their code from System V, and not
from an earlier version. On the other hand, the differences in System
V.3 were removed again, and in fact the Linux community had already
removed the entire code before SCO "revealed" it.

> And besides, it is SCO who have brought the complaint, so both
> legally and ethically it's up to SCO to prove the case, not up to
> others to disprove it, no matter what fearsome roars SCO emit.

No question.

Greg
--
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers

From grog at lemis.com Thu Sep 11 10:32:32 2003
From: grog at lemis.com (Greg Lehey)
Date: Fri Sep 12 03:35:55 2003
Subject: [TUHS] Fwd: Helping in the battle against SCO
In-Reply-To: <Pine.LNX.4.44.0309101738100.2092-100000@gladen>
References: <20030911000259.F34D91E83@minnie.tuhs.org> <Pine.LNX.4.44.0309101738100.2092-100000@gladen>
Message-ID: <20030911173232.GD946@adelaide.lemis.com>

On Wednesday, 10 September 2003 at 17:41:49 -0700, Andru Luvisi wrote:
> On Wed, 10 Sep 2003, Norman Wilson wrote:
>> I don't see how any diffing we do will make any difference
>> `in the battle against SCO.'
> [snip]
>
> Some ways that I can see it being a good thing to do:
>
> If SCO holds up a piece of common code and the good guys have no
> response, that is bad.

Agreed. That doesn't apply to either piece of code they've shown so
far. This is http://www.lemis.com/grog/SCO/code-comparison.html
again.

> If SCO holds up a piece of common code and the good guys already
> know that it actually came from BSD, and are prepared to
> demonstrate such, that is good.

That's the second example :-) The question I've asked SCO is: how
could you have missed the Berkeley license agreement at the beginning
of this file? SCO have backed off claiming that this is System V
code, and claim it's just an example of their code comparison
techniques. But on slide 15 of their presentation
(http://www.vangennip.nl/perens/SCOsource_Briefing_II.2.pdf), they
clearly claim that it's System V code. This suggests that SCO have
recognized their error, though they haven't yet had the decency to
apologize to the BSD community.

Greg
--
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers

From norman at nose.cs.utoronto.ca Mon Sep 15 16:39:31 2003
From: norman at nose.cs.utoronto.ca (Norman Wilson)
Date: Tue Sep 16 06:44:06 2003
Subject: [TUHS] Fwd: Helping in the battle against SCO
Message-ID: <20030915204355.EB9311E5D@minnie.tuhs.org>

Andru Luvisi:

  If SCO holds up a piece of common code and the good guys have no
  response, that is bad.

  If SCO holds up a piece of common code and the good guys already
  know that it actually came from BSD, and are prepared to
  demonstrate such, that is good.

  If SCO holds up a piece of common code and the good guys already
  know that it was contributed to Linux by SCO/Caldera themselves,
  and are prepared to demonstrate such, that is good.

  If there is infringing code, it should be taken out of Linux as
  quickly as possible.

======

I'll grant all those points, but if the idea is to defang SCO,
the effort still seems fruitless to me.

System V and Linux both contain appallingly large volumes of code.
(On a list that discusses the UNIX of the 1970s, perhaps I can say
that without creating undue ruckus.) The odds are that quite a lot
of the code is similar. Should we really spend months and months
tracking it all down and trying to declare where each line came from,
or should we wait until SCO declares a specific set of cases that
matter (as they must do sooner or later or abandon the court battle)?
When one is faced with an enormous set of possible computations, of
which only a handful are likely to be needed in the end, lazy
evaluation is usually the better choice.

It does seem sensible to me for the Linux community to do its best to
hunt down any infringing code, and to try to assess whether there's
a serious problem lurking that nobody had noticed. But that ought
to be a matter of basic ethics, having nothing to do with SCO.
I doubt it is likely to make much difference to the court battle
anyway: SCO's claim is that the infringing code is there now, that
it was put there deliberately at IBM's instigation to do harm to them,
and that the harm already exists; removing it now won't change any of
that. I think it's a good idea to remove any infringements that are
there now, even if they are trivial ones; but let's not fool ourselves
that it will pull SCO's fangs to do so.

Norman Wilson
Toronto ON

From wkt at tuhs.org Tue Sep 16 08:48:53 2003
From: wkt at tuhs.org (Warren Toomey)
Date: Tue Sep 16 08:49:12 2003
Subject: [TUHS] Lexical comparator, was Re: the battle against SCO
In-Reply-To: <20030915204355.EB9311E5D@minnie.tuhs.org>
References: <20030915204355.EB9311E5D@minnie.tuhs.org>
Message-ID: <20030915224853.GA27957@minnie.tuhs.org>

On Mon, Sep 15, 2003 at 04:39:31PM -0400, Norman Wilson wrote:
> It does seem sensible to me for the Linux community to do its best to
> hunt down any infringing code... But that ought to be a matter of basic
> ethics, having nothing to do with SCO. I doubt it is likely to make
> much difference to the court battle anyway... I think it's
> a good idea to remove any infringements that are there now, even if they
> are trivial ones; but let's not fool ourselves that it will pull SCO's
> fangs to do so.

For me it's not just a matter of defeating SCO, it's also one of
sheer indignation in the face of Saganesque FUD ("billions and billions
of lines of code"). I seriously want to know if there's even the
tiniest possibility that SCO is right, or if they're just
Smoking Crack Often.

While we're on the topic, I saw esr's code shredder/comparator that works
on lines of code. This isn't going to work if variables get renamed etc.
I'm writing a code comparator that works on a lexical basis, comparing
C tokens. It's only going to be proof of concept (i.e. slow), but I
should have it done by week's end and I'll pop a notice in here when it's
ready.

Cheers,
Warren

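The difference between comparing lines and comparing tokens can be
sketched in a few lines of Python. This is only an illustration of the
idea described in the message above; it is not esr's comparator and it
is not the tool announced later in the thread. The keyword list, the
function names (tokens, shared_lines, shared_runs) and the run length
of 8 tokens are all assumptions made for the example.

```python
import re

# Abbreviated C keyword list, chosen for this illustration only.
KEYWORDS = {"int", "char", "return", "if", "else", "for", "while",
            "struct", "register", "void"}

# Identifiers, integer literals, then any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokens(source):
    """Crude C tokenizer: non-keyword identifiers collapse to 'id'."""
    out = []
    for tok in TOKEN_RE.findall(source):
        if (tok[0].isalpha() or tok[0] == "_") and tok not in KEYWORDS:
            out.append("id")
        else:
            out.append(tok)
    return out


def shared_lines(a, b):
    """Line-based comparison: exact matches after trimming whitespace."""
    lines_a = {ln.strip() for ln in a.splitlines() if ln.strip()}
    lines_b = {ln.strip() for ln in b.splitlines() if ln.strip()}
    return lines_a & lines_b


def shared_runs(a, b, n=8):
    """Token-based comparison: runs of n tokens found in both inputs."""
    def runs(toks):
        return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}
    return runs(tokens(a)) & runs(tokens(b))


ORIGINAL = """
int sum(int *v, int n)
{
    int i, total = 0;
    for (i = 0; i < n; i++)
        total += v[i];
    return total;
}
"""

# Same function with every identifier renamed.
RENAMED = """
int add_up(int *vec, int count)
{
    int k, acc = 0;
    for (k = 0; k < count; k++)
        acc += vec[k];
    return acc;
}
"""

# Line comparison sees only the two brace lines in common.
print(sorted(shared_lines(ORIGINAL, RENAMED)))
# Token comparison still finds shared runs despite the renaming.
print(len(shared_runs(ORIGINAL, RENAMED)) > 0)
```

The run length matters: once identifiers are collapsed, short runs of
tokens turn up in a great deal of unrelated C code, so a small n
reports many matches that a person still has to inspect.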
From norman at nose.cs.utoronto.ca Mon Sep 15 20:02:52 2003
From: norman at nose.cs.utoronto.ca (Norman Wilson)
Date: Tue Sep 16 10:07:13 2003
Subject: [TUHS] Lexical comparator, was Re: the battle against SCO
Message-ID: <20030916000703.635F31E5D@minnie.tuhs.org>

Warren Toomey:

  For me it's not just a matter of defeating SCO, it's also one of
  sheer indignation in the face of Saganesque FUD ("billions and billions
  of lines of code"). I seriously want to know if there's even the
  tiniest possibility that SCO is right, or if they're just
  Smoking Crack Often.

That's fair enough. Just remember that no matter how much you
scan the code, you can't beat the FUD campaign by doing so. SCO
can just claim their tools are better than yours, and continue
to stonewall about showing their evidence.

And as I said last week, both legally and morally the onus is on
SCO to provide proof of their claims: the infringement, that it
was done maliciously, that it has caused them harm. The `evidence'
they have shown so far makes me doubt very much that they can prove
all three of those things, or possibly any but the least-significant
case of the first.

As I also said last week, I don't mean to discourage anyone from
doing code comparisons. Intellectually it's an interesting exercise.
Ethically it's the right thing to do if the Linux community thinks it's
possible that licensed code got into the system. Even legally it
might make some difference to have shown due diligence, though not
in the matter presently before the courts. If it makes someone feel
less frustrated, that's fine too. But scanning the Linux code won't
provide hard proof of anything, any more than you can claim to prove
there are no leaks in your roof solely by inspection. If proof is
possible, it will work the other way.

Norman Wilson
Toronto ON

From rweather at zip.com.au Tue Sep 16 10:46:49 2003
From: rweather at zip.com.au (Rhys Weatherley)
Date: Tue Sep 16 11:01:02 2003
Subject: [TUHS] Lexical comparator, was Re: the battle against SCO
In-Reply-To: <20030915224853.GA27957@minnie.tuhs.org>
References: <20030915204355.EB9311E5D@minnie.tuhs.org> <20030915224853.GA27957@minnie.tuhs.org>
Message-ID: <200309161046.49958.rweather@zip.com.au>

On Tuesday 16 September 2003 08:48 am, Warren Toomey wrote:

> While we're on the topic, I saw esr's code shredder/comparator that works
> on lines of code. This isn't going to work if variables get renamed etc.

I'd like to point out that the more steps that are taken to factor out
identifier names, whitespace conventions, etc, the closer you approach a
situation where the tool says "both programs are written in the same
programming language" or "both programs use binary searching somewhere in
their code". Which, while true, isn't terribly useful to know. A human
being still needs to wade through the results and inspect them manually.

Cheers,

Rhys Weatherley.

From robert at timetraveller.org Mon Sep 15 21:35:12 2003
From: robert at timetraveller.org (Robert Brockway)
Date: Tue Sep 16 11:39:56 2003
Subject: [TUHS] Lexical comparator, was Re: the battle against SCO
In-Reply-To: <20030916000703.635F31E5D@minnie.tuhs.org>
References: <20030916000703.635F31E5D@minnie.tuhs.org>
Message-ID: <Pine.LNX.4.56.0309152131290.24213@zen.canint.timetraveller.org>

On Mon, 15 Sep 2003, Norman Wilson wrote:

> possible that licensed code got into the system. Even legally it

Hi. Don't want to nitpick here but many of us think it is important to
get this point straight whenever we are talking about GPLed code. The
kernel is licenced (as I'm sure you know). What we are of course
concerned about is:

a) Code which is licenced in a manner incompatible with the GPL
b) Code that the copyright holder did not authorise going into the kernel.

I'm sure you were just speaking in shorthand but it is a subtle point
that many misinterpret. Many people outside the OSS community think that
"all that free code" is in the public domain, which it is most
definitely not.

> Norman Wilson
> Toronto ON

A fellow Torontonian, perhaps we may meet at TLUG sometime. I'm giving
the next talk.

Cheers,
Rob

--
Robert Brockway B.Sc. email: robert@timetraveller.org, zzbrock@uqconnect.net
Linux counter project ID #16440 (http://counter.li.org)
"The earth is but one country and mankind its citizens" -Baha'u'llah

From norman at nose.cs.utoronto.ca Mon Sep 15 22:11:41 2003
From: norman at nose.cs.utoronto.ca (Norman Wilson)
Date: Tue Sep 16 12:16:17 2003
Subject: [TUHS] Lexical comparator, was Re: the battle against SCO
Message-ID: <20030916021555.EFF121EB2@minnie.tuhs.org>

Robert Brockway:

  Hi. Don't want to nitpick here but many of us think it is important to
  get this point straight whenever we are talking about GPLed code. The
  kernel is licenced (as I'm sure you know). What we are of course
  concerned about is:

  a) Code which is licenced in a manner incompatible with the GPL
  b) Code that the copyright holder did not authorise going into the kernel.

  I'm sure you were just speaking in shorthand but it is a subtle point
  that many misinterpret. Many people outside the OSS community think that
  "all that free code" is in the public domain, which it is most
  definitely not.

====

Quite right. I wasn't speaking in shorthand, I was speaking in
clumsy; what I should have written is `possible that code restricted
by the System V license got into the system.'

Licenses come in all flavours, and whether there is any license
at all is not the issue here. I certainly didn't mean, for example,
to imply that all licenses are evil, reptilian kitten-eaters from
another planet.

Norman Wilson
Toronto ON

From imp at bsdimp.com Mon Sep 15 21:01:26 2003
From: imp at bsdimp.com (M. Warner Losh)
Date: Tue Sep 16 16:02:07 2003
Subject: [TUHS] Fwd: Helping in the battle against SCO
In-Reply-To: <20030915204355.EB9311E5D@minnie.tuhs.org>
References: <20030915204355.EB9311E5D@minnie.tuhs.org>
Message-ID: <20030915.210126.54187719.imp@bsdimp.com>

In message: <20030915204355.EB9311E5D@minnie.tuhs.org>
            norman@nose.cs.utoronto.ca (Norman Wilson) writes:
: tracking it all down and trying to declare where each line came from,

In BSD land, we can do that. We have cvs annotate. Looks like the
stubborn refusal to use source code control, and to have only a few
people putting things together makes it a lot harder to track things
down after the fact. Good call that.

Warner

From imp at bsdimp.com Mon Sep 15 21:10:11 2003
From: imp at bsdimp.com (M. Warner Losh)
Date: Tue Sep 16 16:02:08 2003
Subject: [TUHS] Lexical comparator, was Re: the battle against SCO
In-Reply-To: <Pine.LNX.4.56.0309152131290.24213@zen.canint.timetraveller.org>
References: <20030916000703.635F31E5D@minnie.tuhs.org> <Pine.LNX.4.56.0309152131290.24213@zen.canint.timetraveller.org>
Message-ID: <20030915.211011.51703000.imp@bsdimp.com>

In message: <Pine.LNX.4.56.0309152131290.24213@zen.canint.timetraveller.org>
            Robert Brockway <robert@timetraveller.org> writes:
: a) Code which is licenced in a manner incompatible with the GPL
: b) Code that the copyright holder did not authorise going into the kernel.

There's a lot of code that originated in the BSD world that had its
copyrights shorn off, a GPL splatted on and the mass hacking began.
Many of these are no longer recognizable from their original form, and
aren't a problem. Some have much more in common with the original.
Linux is vulnerable to the original author having a shit fit if they
ever find out. Most of the open source authors are amused when this
happens, so the odds are low a big deal would be made of it.

This practice was wide-spread in the early 1990s, although things have
improved a lot. However, without something like CVS and the legal
assignment of copyright (or formal acknowledgement of licensing under
the GPL, which is harder to defend), this will always be a problem
with Linux. The BSD projects are a little tighter about this, but
still would be vulnerable.

Warner

From wkt at tuhs.org Thu Sep 18 12:56:32 2003
From: wkt at tuhs.org (Warren Toomey)
Date: Thu Sep 18 12:56:39 2003
Subject: [TUHS] Lexical comparator
In-Reply-To: <20030915224853.GA27957@minnie.tuhs.org>
References: <20030915204355.EB9311E5D@minnie.tuhs.org> <20030915224853.GA27957@minnie.tuhs.org>
Message-ID: <20030918025632.GA50614@minnie.tuhs.org>

On Tue, Sep 16, 2003 at 08:48:53AM +1000, Warren Toomey wrote:
> While we're on the topic, I saw esr's code shredder/comparator that works
> on lines of code. This isn't going to work if variables get renamed etc.
> I'm writing a code comparator that works on a lexical basis, comparing
> C tokens. It's only going to be proof of concept (i.e. slow), but I
> should have it done by week's end and I'll pop a notice in here when it's
> ready.

Well, it's done. The software is now available at
http://minnie.tuhs.org/Programs/Ctcompare. I have also made available
some tokenised source trees so you can do some comparisons straight away.

If anybody has Unix kernel trees which they cannot divulge due to licensing
restrictions, I'd appreciate you creating tokenised files of the kernel
source and e-mailing them to me.

Thanks!
Warren

From wkt at tuhs.org Thu Sep 18 21:45:26 2003
From: wkt at tuhs.org (Warren Toomey)
Date: Thu Sep 18 21:45:32 2003
Subject: [TUHS] Lexical comparator
In-Reply-To: <200309181041.h8IAfAWe000686@skeeve.com>
References: <200309181041.h8IAfAWe000686@skeeve.com>
Message-ID: <20030918114526.GA54312@minnie.tuhs.org>

On Thu, Sep 18, 2003 at 01:41:10PM +0300, Aharon Robbins wrote:
> > If anybody has Unix kernel trees which they cannot divulge due to licensing
> > restrictions, I'd appreciate you creating tokenised files of the kernel
> > source and e-mailing them to me.
>
> Hmmm. Just between us chickens, given tokenized versions of an entire tree,
> how hard would it be to recreate a functional kernel?

Pretty damn hard. All identifiers (variable names) are reduced to a
single token. Actually, that's not true. The meaning of the names is
removed and replaced with numeric identifiers that are unique to each file.

Here's a tokenised portion of 32V (bio.c):

   56: struct id10 *
   57: id13 ( id14 , id15 )
   58: id16 id14 ;
   59: id17 id15 ;
   60: {
   61: register struct id10 * id18 ;
   62:
   63: id18 = id19 ( id14 , id15 ) ;
   64: if ( id18 ->id20 & id21 ) {
   65: #ifdef id1
   66: id9 . id5 ++ ;
   67: #endif
   68: return( id18 ) ;
   69: }
   70: id18 ->id20 |= id22 ;
   71: id18 ->id23 = id24 ;
   72: ( * id25 [ id26 ( id14 ) ] . id27 ) ( id18 ) ;
   73: #ifdef id1
   74: id9 . id3 ++ ;
   75: #endif
   76: id28 ( id18 ) ;
   77: return( id18 ) ;
   78: }

Now go and check the actual source and work out which function it is!
[ see http://minnie.tuhs.org/UnixTree/32VKern/usr/src/sys/sys/bio.c.html ]

Warren

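The per-file numbering described in the message above can be sketched
as follows. This is a minimal illustration in Python and not the code
of the tool itself: the keyword list, the function name tokenise_file,
the sample C fragment and the rule that numbers are handed out in
order of first appearance are all assumptions (the sample dump is
consistent with that rule, but the message does not state it).

```python
import re

# Abbreviated C keyword list, an assumption for this illustration.
KEYWORDS = {"struct", "register", "if", "return", "int", "char"}

# Identifiers, integer literals, a few two-character operators,
# then any other single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|->|\+\+|\|=|\S")


def tokenise_file(source):
    """Replace each non-keyword identifier with idN, N unique per file.

    Returns the tokenised text and the name table. The table is
    returned here only so the example can be checked; a tokenised file
    meant for sharing would presumably leave the names out.
    """
    ids = {}  # name -> number, rebuilt from scratch for every file
    out_lines = []
    for line in source.splitlines():
        out = []
        for tok in TOKEN_RE.findall(line):
            if re.match(r"[A-Za-z_]", tok) and tok not in KEYWORDS:
                # Assumed rule: number names in order of first appearance.
                number = ids.setdefault(tok, len(ids) + 1)
                out.append("id%d" % number)
            else:
                out.append(tok)
        out_lines.append(" ".join(out))
    return "\n".join(out_lines), ids


# Hypothetical K&R-style fragment, not taken from any Unix source tree.
SAMPLE = """\
struct node *
find(list, key)
{
    register struct node *p;
    p = first(list);
    if (p->flags & key)
        return(p);
    return(next(p));
}
"""

text, table = tokenise_file(SAMPLE)
print(text)

# The same name gets a different number in a different "file".
print(tokenise_file("a = b;")[1]["a"], tokenise_file("b = a;")[1]["a"])
```

Because the numbers are local to one file, the output keeps the shape
of the code (keywords, operators, which names recur where) while the
names themselves, and any link between the same name in two files,
are gone.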