Computers for Smart People by Robert S. Swiatek - HTML preview

5. File access

 

If you work with different computers you will hear about flat files, sequential files, indexed or keyed files and databases. That’s only the beginning. The first designation does not refer to a file that was run over by a steamroller but rather to a simple file that we can read one record after the other and can’t update. There are no keys in the records of the file, so we can’t read a specific record without first reading every record ahead of it. This is also what is referred to as a sequential file. These types of files can be used quite successfully to back up a file and restore it, and either can be done quickly. An equivalent music device is the cassette or eight-track, each of which forces you to listen to every song in order or fast forward to get to the song you wish to hear. I’m not sure where the term flat file originated, but why do we need the designation when the term sequential file suffices?
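To make the cost of sequential access concrete, here is a minimal Python sketch. The file contents, the comma-separated layout and the function name `find_sequential` are all assumptions made up for illustration; they are not part of the system described in this book.

```python
import io

# Hypothetical flat file: one record per line, no keys and no index.
# The names and most of the values are invented for this example.
FLAT_FILE = io.StringIO(
    "395123867,Pat,14225\n"
    "395123868,Chris,14221\n"
    "395123869,Lee,14150\n"
)

def find_sequential(file, account_number):
    """Read from the first record onward until the wanted one turns up."""
    file.seek(0)                      # always start at the beginning
    records_read = 0
    for line in file:
        records_read += 1
        fields = line.rstrip("\n").split(",")
        if fields[0] == account_number:
            return fields, records_read
    return None, records_read         # reached the end without a match

record, reads = find_sequential(FLAT_FILE, "395123869")
print(record, reads)
```

Finding the third record required reading all three, which is the cassette experience in miniature.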

The next type of file is an indexed file or keyed file, which has some sort of key in it. This enables us to get to a specific record without reading every other record in the file. This could save us some time since we could obtain the record we want quite quickly, but we have to know the key to the record or at least part of that key or some other significant field. If the key were the account number and we didn’t know it but we knew the customer’s name, the computer program could be intelligent enough to give us a list of accounts to choose from, and one of those could be the one we wanted. Many systems give you this option. An equivalent music device is the record or CD since either can get us to a specific song without much effort, unlike the cassette or obsolete eight-track.
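A rough Python sketch of keyed access follows, using a dictionary as a stand-in for an indexed file. The record contents and the function names `read_by_key` and `candidates_by_name` are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical account records, keyed by account number.
# A dictionary stands in for the indexed file; the data is invented.
accounts = {
    395123867: {"name": "Pat Smith", "zip": "14225"},
    395123868: {"name": "Chris Jones", "zip": "14221"},
    395123869: {"name": "Pat Smithson", "zip": "14150"},
}

def read_by_key(account_number):
    """Direct access: go straight to one record without reading the rest."""
    return accounts.get(account_number)

def candidates_by_name(partial_name):
    """When the key is unknown, list the account numbers whose name matches."""
    wanted = partial_name.lower()
    return sorted(
        number
        for number, rec in accounts.items()
        if wanted in rec["name"].lower()
    )

print(read_by_key(395123868))
print(candidates_by_name("smith"))
```

Note that the name search in this sketch still looks at every record. A real system would more likely keep a second index on the name field, but that is a design choice this example does not attempt to show.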

If you have a keyed file, the keys are usually unique, that is, you won’t have two records with the same key. Nonetheless you can have an indexed file with duplicate keys. There is a reason for this, which I won’t get into. Just be forewarned. There are all kinds of indexed files and the differences are due to the company that developed them or the time when they came out. If you know one indexed file method you can adapt to any other.

The last designation is a database, and as I mentioned earlier every file is a database, as each has data that is a base for our system. Some will argue that a database needs to have a key, which would equate it to an indexed file, but certainly a sequential file is a database, though one with limitations. Thus, every file is a database. The distinction between files and databases is a very fine point, which I won’t belabor.

If you work with other systems, you will note that the program using a file may have to open it, read it and finally close it. The language that uses this file may actually do a close of the file as the program ends just in case you somehow forgot to do it. This suggests to me that the close that seems to be required is not really necessary. In our sample report program earlier we had to neither open nor close the file because our system is quite intelligent, which is what all systems should be.
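Python is one language where both styles can be seen side by side, so it serves here as an illustration of the open, read and close sequence and of letting the language do the close. The temporary file and the variable names below are assumptions for this sketch only.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as out:
    out.write("395123867\n395123868\n")

# Explicit style: the program opens, reads and closes the file itself.
f = open(path)
explicit_lines = f.readlines()
f.close()

# Managed style: the with block closes the file when the block ends,
# so the program never calls close() on its own.
with open(path) as g:
    managed_lines = g.readlines()

closed_automatically = g.closed   # True once the with block has finished
os.remove(path)                   # tidy up the throwaway file
print(explicit_lines == managed_lines, closed_automatically)
```

Both styles read the same data. The second simply moves the responsibility for closing from the programmer to the language.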

For our system, all the files will be indexed files. They will all have unique keys and we can access records in the files by keys as well as read those files in a sequential manner. That is exactly what we did in our very first program to list the fields on the Account balance report. We will get into processing data by specific keys later. The file we used in the previous chapter was also processed sequentially. In our system, the field

            account number,

will always be generated by the system. If our report had fifty accounts, they would all be in ascending order with the lowest key first and the highest last. Recalling the restriction that the account number must be greater than 9, there is a very good chance that the first record would have an account number of 10, followed by 11 and 12. However, there could be gaps in the numbers, as we shall see later.
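Here is one way system-generated keys might behave, sketched in Python. The class `AccountFile` and its methods are hypothetical, and the assumption that a gap comes from deleting a record is mine; the text above only says that gaps can occur.

```python
class AccountFile:
    """Toy keyed file whose account numbers are assigned by the system."""

    FIRST_NUMBER = 10  # account numbers must be greater than 9

    def __init__(self):
        self.records = {}
        self.next_number = self.FIRST_NUMBER

    def add(self, name):
        """Store a new record under the next unused account number."""
        number = self.next_number
        self.next_number += 1          # numbers are never handed out twice
        self.records[number] = name
        return number

    def delete(self, number):
        """Remove a record; its number is not reused (an assumption)."""
        del self.records[number]

    def read_sequentially(self):
        """Return the records in ascending key order."""
        return [(n, self.records[n]) for n in sorted(self.records)]

account_file = AccountFile()
for customer in ["Pat", "Chris", "Lee"]:
    account_file.add(customer)         # assigned 10, 11 and 12
account_file.delete(11)                # leaves a gap at 11
account_file.add("Sam")                # assigned 13, not 11
listing = account_file.read_sequentially()
print(listing)
```

The same structure supports both kinds of access: `records[number]` reaches one record by key, while `read_sequentially` walks the whole file in key order.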

Some computer systems will lock you out of a file if someone else is updating it. Thus if someone was updating a record in our account file, we may not be able to read any record in the file. Our system will be a little more permissive, having been designed by liberals. If someone else is updating the record with account number 395123867, we won’t be able to update that specific record but we can read it and we can read or update any other record in the file. If two people are updating the file at the same time, most likely they won’t be on the same record but if they just happen to be, we need to take some precautions.

If two people want to update the record with account number 395123867 at the same time, one of the two people will get to it first. Let us say that Pat is that person and he changes the zip code from 14225 to 14229, but he hasn’t done the actual updating just yet. Just before Pat completes the update Chris accesses the same record and the zip code still has the value 14225. She changes the middle initial from L to P and Pat does his update, resulting in the new zip code in the record. But then Chris does her update and the middle initial now is P but the zip code has been returned to the value of 14225, not what Pat had intended. The changed value has been overlaid. We cannot allow this to happen and I will get to how this is handled when we work on an update program. I think you can see that locking the record temporarily should work, and not locking the entire file means that both can update different records at the same time. That would be the way to design the system.
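The Pat and Chris scenario can be replayed in a few lines of Python. The first part reproduces the overlaid update; the second shows one possible record-level lock. The class `LockingStore`, the exception `RecordLockedError` and the method names are hypothetical, and this is only a sketch of the idea, not the update program promised for a later chapter.

```python
# Part 1: the problem. Both users work from copies read before either update.
record_store = {395123867: {"middle_initial": "L", "zip": "14225"}}
pat_copy = dict(record_store[395123867])
chris_copy = dict(record_store[395123867])
pat_copy["zip"] = "14229"
chris_copy["middle_initial"] = "P"
record_store[395123867] = pat_copy     # Pat updates first
record_store[395123867] = chris_copy   # Chris overlays Pat's change
lost_update_result = dict(record_store[395123867])  # zip is 14225 again

# Part 2: one possible fix, a lock on the single record being updated.
class RecordLockedError(Exception):
    """Raised when another user holds the lock on a record."""

class LockingStore:
    def __init__(self, records):
        self.records = records
        self.locks = {}                # account number -> user holding it

    def read(self, key):
        return dict(self.records[key])  # plain reads are always allowed

    def read_for_update(self, key, user):
        holder = self.locks.get(key)
        if holder is not None and holder != user:
            raise RecordLockedError(f"record {key} is locked by {holder}")
        self.locks[key] = user          # only this one record is locked
        return dict(self.records[key])

    def update(self, key, user, new_record):
        if self.locks.get(key) != user:
            raise RecordLockedError(f"{user} does not hold record {key}")
        self.records[key] = new_record
        del self.locks[key]             # release the lock after updating

store = LockingStore({395123867: {"middle_initial": "L", "zip": "14225"}})
pat_record = store.read_for_update(395123867, "Pat")
try:
    store.read_for_update(395123867, "Chris")
    chris_was_refused = False
except RecordLockedError:
    chris_was_refused = True            # Chris must wait for Pat
pat_record["zip"] = "14229"
store.update(395123867, "Pat", pat_record)
chris_record = store.read_for_update(395123867, "Chris")  # sees 14229
chris_record["middle_initial"] = "P"
store.update(395123867, "Chris", chris_record)
final_record = store.read(395123867)
print(lost_update_result, final_record)
```

Because Chris cannot start her update until Pat has finished, she begins from the record that already carries the new zip code, and both changes survive. Other records stay unlocked throughout.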

Designing the files is done by a DBA, or database administrator, sometimes called a database analyst. He or she does this with input from other people for efficiency. After all, you don’t want a file designed that requires long waits by the users in getting at the data. You also don’t need redundant data, as a field that occurs twice in the file just uses precious space. You also don’t want to keep changing the file definition month after month. This means time needs to be spent on analysis and design. In our account file our key may actually be larger than we need, but that is something that needs research. I recall a system that I worked on that had three positions for a transaction code when two might have been sufficient, since there weren’t any transaction codes bigger than 99.

That whole consideration of trying to save two digits for dates by using only two positions for the year instead of four is what caused the Y2K fiasco. I won’t get into that but you can see where time spent planning can save a great deal of time later. There is much to be considered and if you’re working on a project where all the ideas and design of the system are not firmly in place, it will be impossible to come up with a database design that will suit everyone and keep management happy. The design of the files will have to wait.
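A small Python sketch shows why the two-digit year caused trouble. The function names and the sample years are assumptions picked for illustration.

```python
def years_elapsed_two_digit(start_yy, end_yy):
    """Subtract years stored in two positions, as many old files did."""
    return end_yy - start_yy

def years_elapsed_four_digit(start_year, end_year):
    """The same subtraction with the full four-position year."""
    return end_year - start_year

# An account opened in 1965, checked in 1999 and again in 2000.
in_1999 = years_elapsed_two_digit(65, 99)          # 34, which is correct
in_2000 = years_elapsed_two_digit(65, 0)           # -65, since 2000 is stored as 00
fixed_2000 = years_elapsed_four_digit(1965, 2000)  # 35, which is correct
print(in_1999, in_2000, fixed_2000)
```

The two positions saved per date looked like good design until the year rolled over to 00.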

These are just some of the problems involved in information technology systems and you will run into them no matter where you work. The better the system is thought out, the more pleasurable it will be to work there. By the same token, there may not be that much work for you because of that. The places that have plenty of opportunity for work will probably be the corporations that you would rather not set foot in. What a dilemma.