Message ID: 273319
Posted By: ColonelZen
Posted On: 2005-06-13 01:43:00
Subject: ybsnarfz
Having gotten back from my trip yesterday, I looked at
this board, and decided NO.
Well, having said no, I decided to play a little.
While yahouevre does a better job and has lots more functionality, I always wanted
a simpler Perl tool to snarf the Yahoo boards.
It needs some polish and some
automation. But simple it is - a grand total of 537 lines in six files (one of which
is a configuration file and one of which is the table definition).
It's just playing,
not anything serious, but if you don't want to do the full warmcat thing and just
want to have the posts in a db for future reference or scanning, this'll do it.
I'm rerunning it against the board now, but with coming up on 300k posts it'll
take a few days. I'll catch and fix bugs as it runs.
I'll tarball and post
it later. For now it's available as cut-and-paste text (Perl) at http://www.ip-wars.net/?op=displaystory;sid=2005/6/13/1336/32119
-- TWZ
Message ID: 273323
Posted By: ColonelZen
Posted On: 2005-06-13 03:09:00
Subject: Re: ybsnarfz
As this is a work in progress, look at the comments;
I've found and fixed two bugs so far...
As said, I'll find time to clean it up and package
it sometime... it shouldn't take too long.
-- TWZ
Message ID: 273521
Posted By: ColonelZen
Posted On: 2005-06-13 18:51:00
Subject: ybsnarfz-0.0.2
See
http://www.ip-wars.net/comments/2005/6/13/1336/32119/6#6
for the README and where the tarball is.
It seems to all be running fairly
smoothly now.
-- TWZ
Message ID: 274092
Posted By: ColonelZen
Posted On: 2005-06-16 00:20:00
Subject: ybsnarfz-0.1.0
A bit of code cleanup, et al.
The entire scox
table, mysqldump'd, is 150 MB. It is a single, totally unnormalized table and contains
a lot of redundancy in its columns, but it's worth noting just how *small* all our work
here has been over the last two years ;-)
I may add some trivialities to
read the table in a fairly useful way now - others are invited to play as well if
they choose.
It is described, with the link to get the tarball, at:
http://www.ip-wars.net/comments/2005/6/13/1336/32119/7#7
-- TWZ
Message ID: 274096
Posted By: ColonelZen
Posted On: 2005-06-16 00:53:00
Subject: Re: ybsnarfz-0.1.0 missing posts
Speaking of which, there seem to
be about 33k posts missing between 145000 and 178100.
Does anybody have
any clues?
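For anyone wanting to look for holes like this in their own copy, here is a minimal sketch of a gap finder. It assumes you can pull the list of message numbers out of the table somehow; the function name `find_gaps` and the sample numbers are made up for illustration and are not part of ybsnarfz.

```python
def find_gaps(msg_ids, first, last):
    """Return inclusive (start, end) ranges in first..last that are absent from msg_ids."""
    present = set(msg_ids)
    gaps = []
    start = None  # first missing ID of the gap currently being tracked
    for n in range(first, last + 1):
        if n not in present:
            if start is None:
                start = n
        elif start is not None:
            gaps.append((start, n - 1))
            start = None
    if start is not None:
        # The range ended while still inside a gap.
        gaps.append((start, last))
    return gaps


# Toy data standing in for the message numbers stored in the table.
sample = [1, 2, 3, 7, 8, 12]
gaps = find_gaps(sample, 1, 12)
print(gaps)
```

Run against the real message numbers, a hole like the one above would show up as a single long range.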
-- TWZ
Message ID: 274560
Posted By: ColonelZen
Posted On: 2005-06-17 21:08:00
Subject: 0.1.1 of ybsnarfz is available
at
As described at
www.ip-wars.net/comments/2005/6/13/1336/32119/8#8
This is ybsnarfz, a rather
simplistic package to snarf the Yahoo financial boards for any given stock....
-- TWZ
Message ID: 274843
Posted By: ColonelZen
Posted On: 2005-06-20 02:47:00
Subject: ybsnarfz new version
ybsnarfz-0.1.2 is now available at
http://mysite.verizon.net/~vze4v38p/ybsnarfz-0.1.2.tar.gz
There are some
code fixes and a program, FixTime.pl, which you should run to fix bad times between midnight
and one a.m. There is also a ybsnarfz.php program to display the data in the table.
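The post doesn't say what actually went wrong with those times or what FixTime.pl changes. Purely as an assumption, one common way for the midnight hour to go bad is a 12-hour clock where "12:20 am" ends up stored as 12:20 instead of 00:20. A minimal sketch of a correct conversion (the function name `to_24h` is hypothetical and is not taken from FixTime.pl):

```python
def to_24h(hour12, minute, meridiem):
    """Convert a 12-hour clock reading to an HH:MM:SS string on a 24-hour clock."""
    hour = hour12 % 12          # 12 am -> 0, 12 pm -> 0 (fixed up below)
    if meridiem.lower() == "pm":
        hour += 12              # 12 pm -> 12, 1 pm -> 13, ...
    return f"{hour:02d}:{minute:02d}:00"


midnight_hour = to_24h(12, 20, "am")
noon_hour = to_24h(12, 20, "pm")
print(midnight_hour, noon_hour)
```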
Some sample output (at the same site):
ybs-config.html
ybs-msg-list.html
ybs-msg-thread.html
ybs-list.html
ybs-thread.html
It seems to all work,
but it could definitely use some style sheets!
The config file for the php
program is the same .properties file used by the perl programs, but it needs to
reside in the same directory as the php script (and be readable, which is a security
hole, but your db user should be restricted to localhost anyway).
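To show why one .properties file can serve both the perl and php sides, here is a minimal sketch of reading simple key=value lines. The keys shown (`db.host`, `db.user`, `board`) are invented for the example; the real file's keys aren't given in this post, and this is not the parser ybsnarfz itself uses.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comment lines."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # no '=' on the line; ignore it in this sketch
        props[key.strip()] = value.strip()
    return props


# Hypothetical file contents; the key names are assumptions.
sample = """
# database settings
db.host = localhost
db.user=ybs
board = SCOX
"""
cfg = parse_properties(sample)
print(cfg)
```

Any language that can split a line on "=" can read the same file, which is all the sharing amounts to.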
-- TWZ
Message ID: 275778
Posted By: ColonelZen
Posted On: 2005-06-23 00:02:00
Subject: OT RTP if using/thinking of ybsnarfz
Just want to get an idea of
how much interest there is.
It works for my purposes but I've gotten only one person
giving regular feedback and one passing mention of use by another correspondent.
Given past discussions of archiving I thought there would be more interest.
My question is: should I just put a final cleanup on what's there? That would be mostly
cosmetics and trivial usage. I've made, but not yet published, parts for keeping an archive
of the html of the posts, as per my correspondent's request; he also wants the html filenames
sortable [i.e. the number string always the same size]. There would also be a minor polish
to the php [enter a message number, and limit lists to a particular poster].
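On the sortable filenames: making the number string the same size just means zero-padding the message number. A minimal sketch, assuming a width of seven digits and a plain `<number>.html` naming scheme (both are assumptions; the post doesn't give the actual scheme):

```python
def html_name(msg_id, width=7):
    """Build a filename whose numeric part is zero-padded to a fixed width."""
    return f"{msg_id:0{width}d}.html"


ids = (273319, 9, 1500)
padded = sorted(html_name(n) for n in ids)
unpadded = sorted(f"{n}.html" for n in ids)
print(padded)    # plain string sort now matches numeric order
print(unpadded)  # without padding, string sort puts 1500 before 9
```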
Or is there enough
interest to push it a little further? The table as it stands is *totally* unnormalized.
Normalizing it would allow better calculation of thread and subthread rec totals
and the like, as well as better per-user calcs.
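As a rough illustration of what normalizing would buy, here is a minimal sketch that splits a flat table into posters and posts and then computes per-thread rec totals and per-user post counts. It uses SQLite in memory as a stand-in for MySQL, and every table and column name (`flat`, `posters`, `posts`, `recs`, `thread_id`) is an assumption; the real ybsnarfz table definition isn't shown in these posts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical flat table: the poster name is repeated on every row.
cur.execute(
    "CREATE TABLE flat (msgid INTEGER PRIMARY KEY, poster TEXT, "
    "thread_id INTEGER, recs INTEGER)"
)
cur.executemany(
    "INSERT INTO flat VALUES (?, ?, ?, ?)",
    [(1, "alice", 1, 3), (2, "bob", 1, 5), (3, "alice", 3, 2)],
)

# Normalized layout: each poster stored once, posts point at the poster by id.
cur.execute("CREATE TABLE posters (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
cur.execute(
    "CREATE TABLE posts (msgid INTEGER PRIMARY KEY, "
    "poster_id INTEGER REFERENCES posters(id), thread_id INTEGER, recs INTEGER)"
)
cur.execute("INSERT INTO posters (name) SELECT DISTINCT poster FROM flat ORDER BY poster")
cur.execute(
    "INSERT INTO posts SELECT f.msgid, p.id, f.thread_id, f.recs "
    "FROM flat f JOIN posters p ON p.name = f.poster"
)

# Rec totals per thread and post counts per user become simple GROUP BY queries.
thread_recs = dict(
    cur.execute("SELECT thread_id, SUM(recs) FROM posts GROUP BY thread_id").fetchall()
)
poster_counts = dict(
    cur.execute(
        "SELECT p.name, COUNT(*) FROM posts s "
        "JOIN posters p ON p.id = s.poster_id GROUP BY p.name"
    ).fetchall()
)
print(thread_recs, poster_counts)
conn.close()
```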
The cosmetics could happen
in the next week; my thought is to "put it to bed" after that unless there is more
interest.
-- TWZ
The texts of these Yahoo Message Board posts have been licensed for copying and distribution by the Yahoo Message Board user "ColonelZen" under the following license: License: CCL Attribution-NonCommercial-ShareAlike v2.0.
Copyright 2005 Yahoo! SCOX. Messages are owned by the individual posters.