How-to Work with Large Data
This document will help you to deal with large data sets.
Introduction
This document describes how to work with large datasets in Qimera. It serves as a guideline to the considerations that apply at each step of processing a large dataset.
Step 1. Data Storage
It is recommended to save the project on a local hard drive of your PC (e.g. the C: drive). Especially with large datasets, saving the project on an external hard drive or working over a network can increase loading times significantly. Any changes to the data are saved back to the project, which is fastest when the project is local to Qimera.
It is also recommended to keep the raw source files on the local hard drive (e.g. the C: drive). Especially with large datasets, keeping the raw source files on an external hard drive or accessing them over a network can increase initial loading and processing times significantly.
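As an illustration of the recommendation above, the following is a minimal sketch of copying raw source files from a network or external location to a local folder before adding them to a project. It is not part of Qimera. The function name, the folder arguments, and the list of file extensions are assumptions chosen for this example.

```python
import shutil
from pathlib import Path


def stage_locally(source_dir, local_dir, suffixes=(".db", ".all")):
    """Copy raw source files from source_dir to local_dir.

    Files that already exist locally with the same size are skipped.
    Returns the list of local paths copied during this call.
    The suffix list is an assumption; adjust it to your raw file types.
    """
    source_dir, local_dir = Path(source_dir), Path(local_dir)
    local_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(source_dir.iterdir()):
        if not src.is_file() or src.suffix.lower() not in suffixes:
            continue
        dst = local_dir / src.name
        if dst.exists() and dst.stat().st_size == src.stat().st_size:
            continue  # already staged in an earlier run
        shutil.copy2(src, dst)  # copy2 also preserves file timestamps
        copied.append(dst)
    return copied
```

The function would be called with the remote folder and a local folder of your choice, for example a network share as source_dir and a folder on the C: drive as local_dir. Running it a second time copies only files that are new or whose size has changed.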
If the project is stored locally but the raw source files are not, the initial loading and processing will be much slower than if the raw source files were also local. After the files are processed, loading times are similar whether the raw source files are local or not, because processing creates a QPD file for each raw source file. These QPDs are saved to the project in the DtmData folder and are used in place of the raw source files. Any changes in the project are saved to the QPDs, not to the raw source files.
Figure 1. Qimera Project Folder Structure - DtmData
Figure 2. QPS .qpd files in DtmData Folder
If the project and raw source files must be accessed over a network, it is recommended to keep them on the same disk drive. Qimera then reads from and writes to one location instead of two, which can speed up processing somewhat compared to using two different disk drives.
If you are only cleaning data in the project, you only need access to the QPDs. If you do not need to reprocess the files, you do not need access to the raw source files at all, so they can be moved to a different location to clear disk space. It is always recommended to keep a copy of the raw source files somewhere in case they are needed in the future.
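Before moving raw source files away, it can be useful to confirm that each one has already been processed into a QPD. The sketch below lists raw files for which no QPD is found in the DtmData folder. It assumes that a QPD shares its base file name with the raw file it was created from; that naming convention, the function name, and the extension list are assumptions, so verify them against your own project first.

```python
from pathlib import Path


def raw_files_without_qpd(raw_dir, dtm_data_dir, raw_suffixes=(".db", ".all")):
    """Return raw files in raw_dir that have no same-named .qpd file.

    ASSUMPTION: a QPD has the same base name as its raw source file.
    dtm_data_dir is searched recursively for *.qpd files.
    """
    qpd_stems = {p.stem.lower() for p in Path(dtm_data_dir).rglob("*.qpd")}
    missing = []
    for raw in sorted(Path(raw_dir).iterdir()):
        if raw.is_file() and raw.suffix.lower() in raw_suffixes:
            if raw.stem.lower() not in qpd_stems:
                missing.append(raw)
    return missing
```

An empty result suggests that every raw file found has a matching QPD under the stated assumption. It does not replace keeping a backup copy of the raw source files.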
Step 2. Project Creation
Once the raw source files are processed and the QPDs are created, the QPDs can be loaded into a new project if desired. The QPDs load significantly faster than the raw source files.
In a test project with approximately 2500 raw source files (.db files), it took around 95 minutes to load the .db files into the project, versus only 20 minutes for the corresponding QPDs.
With very large projects, especially projects with data from multiple vessels and/or spanning multiple days, it is recommended to organize the project source files by vessel, day, etc. This can be done by right-clicking on the Raw Sonar Files section in the Project Sources → Organize → Regroup by ...
Figure 3. Raw Sonar Files - Organize - Regroup by Vessel and Day
If a project spans a very large area, it is recommended to organize the project by blocks or sections. This can be done by selecting a group of Raw Sonar Files, then right-clicking → Create Group From Selection... This groups the source files together, and the group can then be organized by vessel, day, etc., if preferred.
Figure 4. Raw Sonar Files - Create Group From Selection
If CUBE is not required, it is recommended not to include it when creating a large surface. Adding a CUBE layer increases the time needed for the surface to update, and this cost grows significantly as the surface gets larger, because CUBE has to be recalculated each time the surface is updated. If Qinsy is set up correctly during acquisition, CUBE should not be necessary, which will save some time.
When creating large dynamic surfaces in Qimera, keep an eye on the size of the surface. It is not recommended for a surface to exceed around 50 GB. If the surface will exceed 50 GB, it is recommended to perform cooperative cleaning. You can learn more about cooperative cleaning in Step 4. Filtering & Editing a Surface.
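One way to keep an eye on surface size outside Qimera is to measure it on disk. The sketch below sums the size of a file or folder and compares it with the 50 GB guideline. Where the dynamic surface is stored inside your project is not covered here, so the path is left to the caller. The function names are hypothetical, and the code assumes decimal gigabytes (10^9 bytes).

```python
from pathlib import Path


def size_on_disk_gb(path):
    """Return the size of a file, or of all files under a folder, in GB.

    ASSUMPTION: 1 GB is taken as 10**9 bytes.
    """
    path = Path(path)
    if path.is_file():
        total = path.stat().st_size
    else:
        total = sum(p.stat().st_size for p in path.rglob("*") if p.is_file())
    return total / 1e9


def exceeds_limit(path, limit_gb=50.0):
    """True if the file or folder is larger than limit_gb (default 50 GB)."""
    return size_on_disk_gb(path) > limit_gb
```

If exceeds_limit returns True for a surface, that is a prompt to consider cooperative cleaning as described in Step 4.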
Step 3. Computing Power & Preparing for Processing
It is recommended to check that the hardware specifications of the machine meet the requirements for Qimera. Qimera will run as long as the minimum hardware requirements are met; however, it performs best when the recommended hardware requirements are met, especially when dealing with large datasets.
For more information, click here: Qimera System Requirements
You can change the Ping Buffer Memory Size used during sound velocity correction in Qimera. During sound velocity correction, Qimera buffers the processed pings in memory and writes them all at once. The size of this buffer determines how many pings are stored before they are written to disk. The larger this buffer, the more efficient your processing will be.
The Ping Buffer Size can be assigned in the Preferences Dialog within Qimera. This dialog can be opened from the Project menu → Preferences.
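The effect of buffer size on the number of disk writes can be illustrated with a generic buffered writer. This is a conceptual sketch only and does not represent Qimera's implementation. The class and function names are hypothetical, and the buffer capacity is expressed as a number of pings rather than a memory size to keep the example simple.

```python
class PingBuffer:
    """Collects items in memory and hands them to `sink` in batches."""

    def __init__(self, capacity, sink):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.sink = sink          # callable that receives a list of pings
        self._pings = []
        self.write_count = 0      # number of batch writes performed

    def add(self, ping):
        self._pings.append(ping)
        if len(self._pings) >= self.capacity:
            self.flush()

    def flush(self):
        if self._pings:
            self.sink(self._pings)
            self.write_count += 1
            self._pings = []


def count_writes(n_pings, capacity):
    """Push n_pings through a buffer and return how many writes occurred."""
    written = []
    buf = PingBuffer(capacity, written.extend)
    for i in range(n_pings):
        buf.add(i)
    buf.flush()  # write whatever is left at the end
    return buf.write_count
```

With 10 pings, a capacity of 1 results in 10 writes, a capacity of 4 results in 3 writes, and a capacity of 100 results in a single write. Fewer, larger writes are generally more efficient, at the cost of holding more data in memory.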
You can also allow Qimera to use more of the PC's computing power than is allocated by default. The Maximum Number of Processing Threads can be assigned in the Preferences Dialog within Qimera. This dialog can be opened from the Project menu → Preferences.
Note that as you increase the number of threads being used, the PC may become less responsive, so it is recommended to do this only if the PC is dedicated to processing.
Figure 5. Maximum Processing Threads Settings
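When choosing a value for the Maximum Number of Processing Threads, it helps to know how many logical cores the PC has. The sketch below reads that number and applies a simple rule of thumb: use all cores on a dedicated processing machine, otherwise keep a couple free so the PC stays responsive. The rule and the function name are assumptions made for this example, not Qimera guidance, and the range of values accepted by the Preferences Dialog may differ.

```python
import os


def suggest_thread_count(dedicated, reserve=2, logical_cores=None):
    """Suggest a processing thread count.

    dedicated:     True if the PC is used only for processing.
    reserve:       cores to leave free on a shared PC (assumed value).
    logical_cores: override for testing; defaults to os.cpu_count().
    """
    cores = logical_cores or os.cpu_count() or 1
    if dedicated:
        return cores
    return max(1, cores - reserve)
```

On an 8-core PC this suggests 8 threads for a dedicated machine and 6 for a machine that is also used for other work.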
Step 4. Filtering & Editing a Surface
When working with large datasets and surfaces, it is important to know the different options for updating the surfaces. Especially if you are manually rejecting and accepting soundings in an editor, it is recommended to set the surface to update manually. This saves you from waiting for the surfaces to update each time an edit is saved. Once all of the desired changes are made, you can force the surfaces to update by selecting the surfaces → right-click → Update Dynamic Surface from Edits. If you can wait until the end of the day to update all of the surfaces, that is encouraged, because it reduces the downtime spent on updates. Depending on the size and number of the surfaces, these updates can take a long time to complete.
Figure 6. Update Surface Options
Figure 7. Update Dynamic Surface from Edits
Figure 8. Update Surface Options Details
For filtering on large surfaces, the Selection Movement Toolbar can be used to step automatically through the data so that you can inspect the selection area or manually apply a chosen filter to it. The step interval can be set to the number of seconds between steps as the selection box plays forward or in reverse.
Figure 9. Selection Movement Toolbar
If you want a filter to apply automatically as the selection box moves through the selection area, you can set this up in the Slice Editor. Choose the filter from the Filter Operation Toolbar → Open Slice Editor → Slice Editor Drop Down Menu → Automatically Apply Active Filter. Turning on this check box automatically applies whichever filter you have chosen to the area displayed within the Slice. If this method is followed, it is important to change the Save Edits option accordingly. If Qimera is set to walk through an area with limited interaction from the user, you should set Save Edits to automatic or instantly so that the processor does not need to save the changes manually.
Figure 10. Automatically Apply Active Filter in Slice Editor
The procedure you should follow to clean your surface will depend on the focus of the project. Most contracts stipulate Total Propagated Uncertainty (TPU) and Sounding Density requirements. These should be reviewed as early as possible to ensure the data is within specifications. Especially with very large surfaces, it is not always possible to manually inspect the full surface. Because of this, spot cleaning with the help of the Color Surface By: option and custom colormaps can be very useful for large surfaces.
Figure 11. Color By Uncertainty (95% c.l.)
Figure 12. User Defined Colormap Range
Figure 13. Color By Sounding Density
Figure 14. User Defined Colormap Range
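The idea behind a user-defined colormap range is to make out-of-specification cells stand out. The same check can be sketched in code: given per-cell uncertainty and sounding density values, list the cells that fall outside the contract limits. The Cell structure, the function name, and the threshold values used in the test data are hypothetical; Qimera does not necessarily export data in this form.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    row: int
    col: int
    uncertainty_m: float  # uncertainty at 95% c.l., in metres
    density: int          # number of soundings in the cell


def cells_to_inspect(cells, max_uncertainty_m, min_density):
    """Return cells whose uncertainty is too high or density too low.

    The limits come from your contract specification (assumed inputs).
    """
    return [
        c for c in cells
        if c.uncertainty_m > max_uncertainty_m or c.density < min_density
    ]
```

Cells returned by this check correspond to the areas that a suitably configured colormap would highlight for closer inspection.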
By using the Color by: option paired with customized colormaps, you can easily see the areas that need to be inspected more closely. This can save a lot of time for very large surfaces, and the coloring updates as the surface updates.
Cooperative Cleaning can be a very useful tool when working with very large projects and surfaces. It allows you to break up a "master" project into multiple smaller cleaning projects that are worked on individually. Once the manual cleaning is finished, the master project can be updated with the edits made in each of the cleaning projects. A single processor can also use it to break up a large project on their own computer purely to reduce its size, which prevents long surface updates.
More information on Cooperative Cleaning can be found here: Cooperative Cleaning in Qimera
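When planning how to divide a large survey area into smaller cleaning projects, it can help to work out the block extents beforehand. The sketch below splits a rectangular bounding box into a regular grid of blocks. It is a planning aid outside Qimera, the function name is hypothetical, and the coordinates are assumed to be in a projected system with consistent units.

```python
def split_into_blocks(xmin, ymin, xmax, ymax, nx, ny):
    """Split a bounding box into nx by ny equal blocks.

    Returns a list of (xmin, ymin, xmax, ymax) tuples, ordered row by row
    starting from the lower-left corner.
    """
    if nx < 1 or ny < 1:
        raise ValueError("nx and ny must be at least 1")
    if xmax <= xmin or ymax <= ymin:
        raise ValueError("bounding box must have positive width and height")
    dx = (xmax - xmin) / nx
    dy = (ymax - ymin) / ny
    blocks = []
    for j in range(ny):
        for i in range(nx):
            blocks.append((
                xmin + i * dx, ymin + j * dy,
                xmin + (i + 1) * dx, ymin + (j + 1) * dy,
            ))
    return blocks
```

Each block extent can then guide which source files or which part of the surface goes into a given cleaning project.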
Step 5. Reprocessing the Source Files
As mentioned above in Step 1. Data Storage, the raw source files must be accessible in order to reprocess the source files. Reprocessing is required for any files affected by changes made in the Processing Settings, Time Series Editor, SVP Editor, or Vessel Editor. It is recommended to make all of the required changes in these editors before reprocessing, and then to reprocess all of the Raw Sonar Files at once. For example, if you make changes in the Processing Settings, reprocess the files, and then make changes in the Time Series Editor, you will have to reprocess the files all over again. Reprocessing all of the files in a single run takes a significant amount of time, but it saves you from reprocessing them multiple times and so saves time overall. Most operations in Qimera are still available while files are reprocessing, although you will not be able to launch the Swath Editor or Slice Editor until the files have finished reprocessing.
To increase efficiency, it is recommended, where possible, to make all of the changes to the files and then leave Qimera to reprocess all of the files overnight. This is not always possible, especially when working shifts, but it can save a lot of downtime.
Figure 15. Reprocess All Raw Sonar Files
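The benefit of batching changes before reprocessing can be shown with simple arithmetic. The sketch below compares the total reprocessing time when every round of edits triggers a full reprocess against a single reprocess after all edits are made. The function name and the example durations are assumptions for illustration; actual reprocessing times depend on the dataset and hardware.

```python
def reprocessing_minutes(minutes_per_full_run, edit_rounds, batched):
    """Total minutes spent reprocessing.

    minutes_per_full_run: time to reprocess all files once (assumed input).
    edit_rounds:          number of separate rounds of edits (for example,
                          Processing Settings, then Time Series Editor).
    batched:              True if all edits are made before one reprocess.
    """
    runs = 1 if batched else edit_rounds
    return runs * minutes_per_full_run
```

For a hypothetical dataset that takes 120 minutes to reprocess and three rounds of edits, reprocessing after each round costs 360 minutes, while batching the edits costs 120 minutes.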
Step 6. Exports
If you are required to produce a single, very large dynamic surface, it is recommended to build the dynamic surface in Fledermaus from the cleaned QPD files of the project(s). From within Fledermaus, you can export the full combined surface. Currently, Fledermaus 8 can handle very large grids because it supports the latest grid versions, which makes it a useful tool when one large surface must be delivered. Fledermaus can also connect to multiple projects and load data from them.
Command Line Scripting was added in the Qimera 2.2.0 release. At the moment, only basic functionality is supported: project creation, import of .all files, conversion to QPD, creation of a new Dynamic Surface, appending to an existing Dynamic Surface, and application of a filter profile from the Filter Toolbar. More information about scripting in Qimera can be found here: Qimera "qimera-command"