John Vilandre on How Data Processing and Computing Developed in One Center of CVD Epidemiology
[ed. For 40 years John Vilandre was associated with the Laboratory of Physiological Hygiene’s (LPH) and then the Division of Epidemiology’s data processing and analysis operations. His depiction of Minnesota transitions reflects the technical developments common to many centers involved with epidemiological studies over the same period from the mid-1960s to the present.]
I arrived at the LPH in the fall of 1963. My first job was in data processing, working for the lab’s statistician. Most data came to us handwritten on forms from field workers or from laboratory workers. Small data sets were often processed by entering the variables, two at a time, directly into mechanical calculating machines. These calculators were able to accumulate sums and calculate sums of squares and cross products of the two variables. Those values were used to compute descriptive statistics such as means, standard deviations, and correlation coefficients. The Monroes and Fridens were first hand-operated but by the time I arrived they were electric.
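[ed. As an illustration of the arithmetic described above, here is a minimal sketch in modern Python of how running sums, sums of squares, and cross products of two variables yield means, standard deviations, and a correlation coefficient. The function name and the sample numbers are ours, not from the interview, and the use of the sample (n - 1) standard deviation is an assumption.]

```python
import math


def summary_from_sums(xs, ys):
    """Compute descriptive statistics for two variables from running totals.

    Hypothetical helper for illustration only: it mimics accumulating
    sums, sums of squares, and cross products one pair at a time.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 values")

    # Running totals, accumulated pair by pair.
    sum_x = sum_y = sum_xx = sum_yy = sum_xy = 0.0
    for x, y in zip(xs, ys):
        sum_x += x
        sum_y += y
        sum_xx += x * x
        sum_yy += y * y
        sum_xy += x * y

    # Corrected sums of squares and cross products.
    sxx = sum_xx - sum_x * sum_x / n
    syy = sum_yy - sum_y * sum_y / n
    sxy = sum_xy - sum_x * sum_y / n
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")

    return {
        "mean_x": sum_x / n,
        "mean_y": sum_y / n,
        # Assumption: sample standard deviation with an n - 1 divisor.
        "sd_x": math.sqrt(sxx / (n - 1)),
        "sd_y": math.sqrt(syy / (n - 1)),
        "r": sxy / math.sqrt(sxx * syy),
    }


# Made-up example data, not from any LPH study.
stats = summary_from_sums([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(stats)
```

[ed. Only five running totals and the count need to be kept, which is why two variables at a time was a natural unit of work on a desk calculator.]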
These mechanical machines placed practical limitations on the sizes of studies because someone had to sit there and add up all those data by hand and do the squares. And, of course, if you have a column of 100 or more numbers and punch it in twice, you are often going to come up with different answers. So you must do it a third time to get the thing rectified.
I think the calculators were limited, too, just because of the sheer time it would take to do a study on thousands of people, as we easily do now. Eventually a common technique for handling large samples was to use card-sorting machinery to order the cards on a particular variable, then determine quartile, quintile, decile or centile cut points to use as entry values for the calculating machines.
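[ed. The sort-then-cut idea described above can be sketched in a few lines of modern Python. The function name, the sample values, and the convention for choosing a cut point (the first value of each later group after splitting the sorted deck into equal parts) are our assumptions; the interview does not say which convention the Lab used.]

```python
def cut_points(values, groups):
    """Return the groups - 1 cut points that split sorted values into groups.

    Hypothetical helper for illustration: groups=4 gives quartile cut
    points, 5 quintiles, 10 deciles, 100 centiles.
    """
    if groups < 2:
        raise ValueError("need at least 2 groups")
    ordered = sorted(values)  # stands in for the mechanical card sort
    n = len(ordered)
    if n < groups:
        raise ValueError("need at least as many values as groups")
    # Assumed convention: the first value of each group after the first.
    return [ordered[(k * n) // groups] for k in range(1, groups)]


# Made-up example data, not from any LPH study.
print(cut_points([8, 3, 5, 1, 7, 2, 6, 4], 4))
```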
Rose Hilk can tell you the story of the time there was a huge card sort going on and we had a large bin on the wall that held the cards as they came out in order. Dr. Keys was eager to get this analysis done so he came in on a week-end and did some more sorting, but at one point the whole bin came crashing down off the wall. He had to start all over.
[An advantage with calculators was that] you had to understand the process. Now you put it in SAS and get a number out. Sometimes . . . I think people don’t even know if their chosen statistic is the appropriate one to use. Anyway, that’s why we have statisticians to check the work. Basically, still true today, we human beings are the single weakest point — because we make errors.
Having the data on punch cards allowed us to begin to utilize computers for data analysis. At first, in the early 1960s, cards were carried to the Mutual Service Insurance Co. in St. Paul for analysis on its Univac. As the University began to acquire central computing facilities, such as the Control Data 3300, we started to use those machines as well. Drawers full of punch cards were transported across Washington Avenue from the Stadium to the 3300 site beneath the VFW hospital.
Use of these machines also triggered the first of many data conversion projects; the University machines read only IBM-style, 80-column punch cards and LPH had been using Remington Rand, 90-column (round hole) cards (compatible with the Univac).
At the Mutual Service Insurance Company in St. Paul, where Bill Parlin was actuary and data manager, we got our first computing experience, but outside the Lab. No computers were yet available to us internally.
The LPH gets its own computer
Bill Parlin was largely instrumental in bringing the first computer ever to Physiological Hygiene, the Digital Equipment Corporation’s PDP 8. It . . . must have been around 1963-64. Everyone has a computer these days and talks about megabytes and gigabytes of memory. That machine had 4 kilobytes of memory! It had no storage device of any kind except paper tape which isn’t really internal. There was no disk drive, no tape drive, no CRT. Each time you ran a program you had to load it in from the paper tape and the numbers were punched out on a teletype machine of the kind used by the news media in those days . . . With its attached paper tape punch, it allowed programs and data to be stored for later use.
Bill Parlin became our pioneer computer programmer. The first thing that he did was to write a little piece of code called a “handler” (today we call them “drivers”) that would interface a card-reading machine with the computer. So now we could actually read the numbers from the punch cards into our computer. You couldn’t store anything. All the statistics were done by numbers stored in memory for that run only. When you turned the machine off you had to start all over.
So, although the PDP 8 allowed us to perform some data analysis in-house, its limitations (small memory, no internal data storage) meant that larger problems still needed to be taken outside. But that’s where I learned computers; sitting around in the evenings watching Parlin. He showed me how to do machine-language programming because, with such a limited amount of memory, if you tried to use Fortran (first of all the only Fortran compilers available were rudimentary) it would create code that was quite inefficient. So if you really wanted to get the maximum use of those 4,000 units of PDP memory you had to write using the computer’s basic instruction set. It was great fun, much like doing puzzles.
When the LPH moved into Moos Tower in 1973, the PDP 8 was replaced by a PDP 12. That was a nice little machine. It had an auxiliary hardware floating-point number processor. Today, such a processor is an integral part of any PC computer chip, but then, it was a six-foot tall cabinet full of cards with resistors and capacitors—just to do basic multiplication and division. This computer provided, also for the first time, magnetic disk (2.5 megabyte) and magnetic tape. Now we could exchange data with other places and make back-ups of the things we were doing. Although it permitted more sophisticated programming, it was still a single-user machine and the department was growing rapidly as new studies came in to the Lab.
So we had to work in shifts. Since I was a single person, I often worked the night shift. But soon, even 24-hour scheduling of computer time became insufficient to meet the demands.
We got our first multi-user system in, I think, 1978. I still have some of the old price quotes from when we purchased the PDP 11/34. Now we had larger disk drives that held 80 megabytes of data. Each disk drive was a separate, floor-standing unit, about the size of a small dishwasher. When you think of what’s in your little laptop now it seems crazy. But the nice thing about the PDP 11 was its capacity for multiple, concurrent users. I think we had initially four terminals and that expanded as we moved into several generations of PDP 11s. There was no graphics capability at all; it was just able to echo characters back from what you typed into the computer. But this allowed us to start doing direct key-to-disk data entry and to bypass punch cards.
The second massive data media conversion, this time from punch cards to magnetic tape, was performed during this period. We had literally millions of punched cards by the mid-1970s. Rose Hilk, over a period of months, fed the cards through a reader attached to a computer and then we stored the data on magnetic tape. We had big bins in the hallway to dump the cards into once they had been read. When the card stock was sold to a recycling company, we got enough money to buy a microwave for the kitchen.
The PDP-11/34 was replaced by a PDP-11/70 and a PDP-11/44, each of which was just a little faster and had more capacity. These machines formed the nucleus of the first computer network in the division. At one time we were probably supporting 30 or 40 users of these machines and people would notice a significant slowdown in peak times. When the Minnesota Heart Survey started around 1980, funds became available to buy a new kind of computer from the same company, called the VAX, very popular in the 1980s. We had a succession of them and still have some of them today.
I think Jim Huber was one of the first people to bring in one of the original Macintosh computers. He got it to work as a terminal on our VAX. Then a few people started experimenting with PCs and today we have a mix of PCs and Macintoshes. They are working better and better together as advances are made in operating systems and networking.
There was another set of [personal] computers before the IBM PC became so popular. Various computer manufacturers had taken a stab at creating small, personal, desktop units, including Digital, which made a computer called the Professional. That was a single-user system, but it was based on the same operating system that we were running on the PDP 11s, so it was easy for us to move software back and forth from one to the other, including the word processor and spreadsheet program we used then.
Until we got to the VAX era, our PDP 11s were too small for certain jobs of sorting tens of thousands of records, so we would put data onto magnetic tape or onto computer disks and take them over to the CDC 6600. That was a batch processing system and we took decks of cards sometimes also. You could punch your program onto cards and they would be run through a reader, compiled, and then applied to your data and you could store data there for a fee.
We did that for a long time to supplement the capabilities of the PDP-11s. Some of the jobs that we were running otherwise would have run for days and there on the CDC machine they could run in a matter of hours. Today they run on your own desktop in a few minutes. A Britton-Lee relational database machine was acquired also.
A growing staff of computer programmers and analysts began in the early 1980s to establish “data dictionaries,” moving the data for the Minnesota Heart Survey and Minnesota Heart Health Project onto the database machine. They wrote software for data analysts to read data from those databases into the computers. A primary statistical tool at that time was the BMDP analysis package.
About 1985, with computing demand increasing, the first VAX computer, a model 8600, was purchased. This new 32-bit virtual memory machine removed most limitations on program size that were inherent in the PDP series. The new processor had sufficient capacity to run a relational database in software, allowing the retirement of the Britton-Lee machine. It also supported the SAS system, which quickly supplanted the use of BMDP. The VMS operating system that ran on the 8600 has been updated frequently over the years. It is now supported on newer, high-speed, 64-bit RISC processors.
While the longevity of VMS has meant less retraining of programmers and other staff, today most users in the Division use personal computers for the bulk of their work. The VMS systems are service providers, or servers, to the desktop systems, providing data storage, database capabilities, e-mail service, etc. to the end users. (John Vilandre)
[ed. We in Minnesota are an increasingly big, happy, computer-capable, but demanding family of faculty and staff, incessantly requiring servers and services for both PCs and Macs. But I know of no academic center with a greater level of direct and intimate support services for computation and communication.]
John Vilandre, in an interview by Henry Blackburn. September 2005. History of Cardiovascular Epidemiology Collection, University of Minnesota.