Data About Metadata – Indexing Large Format Documents
Metadata is “data that provides information about other data”. It describes the data but is not the content itself, such as the text of a message or the image.
Over the last 13 years, I have learned that there are no shortcuts to indexing large format drawings. I have heard stories about utility companies outsourcing imaging projects in which a large volume of drawings was scanned and indexed. Unfortunately, they didn’t know how to define their specification, set standards, or run a quality check on the indexed data. They uploaded the information and assumed it was correct. Only when they reviewed the data later did they realize it contained mistakes. No one had looked at it before it was uploaded. I want you to know about this painful lesson so you can avoid it.
This last week, our company was awarded a federal contract that includes indexing 18,000 drawings. We held a kickoff meeting to establish the key issues for the project. Having scanned and indexed 40,000 drawings for Hoover Dam, I brought many lessons learned. The first lesson as a service provider is “We do not provide psychic indexing”. People laugh when I make this statement, but the reality is that many people think you understand their data and can interpret what it’s supposed to mean.
We don’t. It’s your data, and you will be the one consuming it. Unless it’s on the document, we don’t know it. When you use a service bureau to capture your engineering drawings, my philosophy is this: we will capture an image as good as or better than the original. Indexing is the key to a successful project and the future use of your data. Use what works in finding the metadata.
Indexing large format drawings CANNOT be automated. There is no magic software and the reality is, in my opinion, there never will be. Don’t be fooled by the smoke and mirrors of a demo. Title blocks are usually on the bottom, but sometimes on the left or right side. The text can be rotated, and there can be multiple dates. Running OCR on drawing fonts or handwritten title blocks rarely produces results worth the effort. Some drawings list multiple vendors (which one do you choose?). Batch processing or not, I have found that the only way to index large quantities of drawings accurately is to manually look at each drawing, find the data and enter the information.
Establishing standards and abbreviations should begin before you hit the keyboard. Define whether you are using abbreviations, such as BLDG vs. building. We all know that database queries match the parameters you give them and do not translate that information: a search for BLDG will not find a record indexed as building. Document your standards and make them available to everyone so that your contributors know the rules and why they are there. Remember that rules are merely suggestions without enforcement.
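As a rough illustration of enforcing an abbreviation standard at data entry, here is a minimal Python sketch. The table contents and the names APPROVED_ABBREVIATIONS and standardize are assumptions made up for this example; your own documented standard would supply the real list.

```python
# Hypothetical standards table: every accepted variant maps to the one
# approved form. The entries are illustrative, not a real standard.
APPROVED_ABBREVIATIONS = {
    "BUILDING": "BLDG",
    "BLDG": "BLDG",
    "ELECTRICAL": "ELEC",
    "ELEC": "ELEC",
}


def standardize(value: str) -> str:
    """Upper-case a field value and swap each known word for its approved form."""
    words = value.upper().split()
    # Words that are not in the table pass through unchanged.
    return " ".join(APPROVED_ABBREVIATIONS.get(word, word) for word in words)


print(standardize("Building 5 electrical plan"))
print(standardize("bldg 5 elec plan"))
```

Both calls produce the same indexed value, so a single query for BLDG finds both drawings.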
Once the standards are defined regarding what information is to be entered, address the punctuation issue. I recommend you dump punctuation unless removing it changes the meaning of the information, as with a Project Number (you need the hyphens and periods to maintain the structure). So many people get comma- and apostrophe-happy that we create additional challenges in the database. Who knows when and where the punctuation will show up.
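A punctuation rule like this can be sketched in a few lines of Python. The field name project_number and the function clean_punctuation are hypothetical, chosen only for this example; which fields keep their punctuation is a decision for your own standard.

```python
import re

# Assumed rule: only these fields keep their hyphens and periods,
# because the punctuation is part of the identifier's structure.
STRUCTURED_FIELDS = {"project_number"}


def clean_punctuation(field: str, value: str) -> str:
    """Strip punctuation from free-text fields; leave structured fields intact."""
    if field in STRUCTURED_FIELDS:
        return value.strip()
    # Keep letters, digits, underscores and whitespace; drop everything else.
    no_punct = re.sub(r"[^\w\s]", "", value)
    # Collapse the gaps left behind by removed characters.
    return " ".join(no_punct.split())


print(clean_punctuation("title", "PUMP HOUSE, NO. 2 - OPERATOR'S PANEL"))
print(clean_punctuation("project_number", " 12-345.01 "))
```

Note that this sketch deletes a hyphen outright rather than replacing it with a space, so "A-B" becomes "AB" in a free-text field. Whichever behavior you pick, write it into the standard.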
Finally, be involved! Review a sample set of indexes and see what happens, establish a workflow to verify information throughout the project, and address the issues as they come up. There will always be a question of “what do we put here?” It is a team effort. A prompt response to your service provider about how to address content is the best way to get accurate data that is usable and retrievable. After all, isn’t that why we are digitizing and indexing to begin with?
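One simple way to pull a sample set for review is a random draw from each delivered batch. The sketch below assumes the index arrives as a list of records; the function name draw_review_sample, the record layout and the sample size are all illustrative assumptions, not a prescribed QC method.

```python
import random


def draw_review_sample(records, sample_size, seed=None):
    """Return a random subset of index records for manual review."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    if sample_size >= len(records):
        # Small batch: review everything.
        return list(records)
    return rng.sample(records, sample_size)


# Hypothetical batch of 100 indexed drawings.
batch = [{"drawing": f"D-{i:05d}"} for i in range(100)]
for record in draw_review_sample(batch, 5, seed=1):
    print(record["drawing"])
```

Checking each sampled record against the actual drawing, batch by batch, surfaces problems while they are still cheap to fix.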
If you want to discuss indexing of drawings, please feel free to contact me directly.