UNDERSTANDING OF RELATIONSHIPS AMONG DATABASE OBJECTS

Database objects refer to various structures and components used to organize, store, and manipulate data. These objects play a crucial role in database management systems (DBMS).

KEY CONCEPTS AND TERMINOLOGY

  • Schema: structure or design of the database or database object (table, view, index, stored procedure, trigger) - defines the tables, fields in each table, relationships between fields - a schema will include information on which fields have domains and what those domains are.
  • Data dictionary: catalog or table containing information about the datasets stored in a database.
  • Domain: the range of values for a particular metadata element
  • Attribute domain: enforces data integrity, identify what values are allowed in a field in a feature class.
  • Coded value domain: attribute domain that defines a set of permissible values for an attribute in a geodatabase - it has a code and its equivalent value.
  • Range domain: type of attribute domain that defines the range of permissible values for a numeric attribute.
  • Spatial domain: allowable range for x, y coordinates and for m, z values.
  • Tables: collection of related data held in structured format within a database, contains fields and rows
  • Views: result set of a stored query on the data - users can query - virtual table computed dynamically from data when the view is accessed.
  • Sequences: ordered collection of objects in which repetitions are allowed (finite or infinite) number of elements is the length of the sequence.
  • Synonyms: Alias or alternate name for a table, view, sequence, or other object.
  • Indexes: data structure that improves the speed of data retrieval operations in a database table.
    • Causes more storage space and additional writes.
    • Quickly locate data in the database
    • Indexes can be on multiple columns.
  • Clusters: Can either be:
    • Multiple servers share one storage - this is typically used to handle user load balancing.
    • Databases distributed to different servers using replication - this is typically used if you have multiple users utilizing the same data in different physical locations. There is a master database that the replica databases sync between.
  • Database Links: data stored in a different database but accessible by to the database currently being accessed.
  • Snapshot: state of a system at a particular point in time - can be a backup.
  • Procedure: sometimes referred to as a “stored procedure”, is a subroutine available to applications that access a relational database system (data validation, access control mechanisms).
  • Trigger: procedural code automatically executed in response to certain events on a particular table or view in a database
  • Functions (subroutine): sequence of program instructions that perform a specific task.
  • Package: built from source with one of the available package management systems
  • Non-schema objects: users, roles, contexts, directory objects

SAMPLE QUESTION

Which of the following database objects is used to improve query performance by allowing faster data retrieval?

A) Tables

B) Indexes

C) Views

D) Sequences

Answer: B) Indexes

Explanation: Indexes are database objects that improve query performance by allowing faster data retrieval based on indexed columns. They provide efficient access to specific rows within a table.

UNDERSTANDING OF DATABASE DESIGN

The database design process is significant and involves creating a well-structured and efficient database to store and manage data. This is one of the most important roles of a senior GIS professional as all organizations depend on accurate, reliable, and well performing databases for managing assets, resources and supporting all manner of operations and activities. Generally, the process is outlined as followed:

  1. Determine the Purpose of Your Database
    • Understand the goals and requirements for your database.
    • Consider the users, data sources, and anticipated queries.
  2. Find and Organize the Information Required
    • Gather all the types of information you want to record in the database.
    • Identify the entities (objects or concepts) relevant to your domain.
    • Define the attributes (fields) for each entity.
  3. Divide the Information into Feature Classes and Tables
    • Each feature class or table represents an entity or a relationship between entities.
    • Tables should be normalized to minimize redundancy and improve data integrity.
  4. Specify Primary Keys and Analyze Relationships
    • Choose a primary key for each feature class and table (a unique identifier).
    • Define relationships (one-to-one, one-to-many, many-to-many) between tables.
  5. Design the Feature Classes, Tables and Fields
    • Determine what features need to be captured, stored and what type of geometry is required (points, lines, polygons, images etc.). Will features need to be modeled for different scales?
    • Determine the data types for each field (e.g., text, number, date).
    • Set any constraints (such as required fields or unique values).
    • Identify coded value domains.
  6. Create Views and Indexes
    • Create views to simplify complex queries or restrict access to sensitive data.
    • Define indexes to improve query performance.
  7. Implement Data Integrity and Security Measures
    • Set up constraints (such as foreign keys) to maintain data integrity.
    • Define security rules to control access to data.
    • Determine if versioning is required.
  8. Test and Refine the Design
    • Populate the feature classes, tables with sample data.
    • Test queries, views, and data manipulation operations.
    • Refine the design based on feedback and testing.
  9. Document the Database Design
    • Create a data dictionary describing each table, field, and relationship.
    • Document any business rules or assumptions.
  10. Implement the Database
    • Create the actual database using a Database Management System (DBMS).
    • Set up tables, relationships, views, and indexes.
    • Prepare for backups and disaster recovery.

KEY CONCEPTS AND TERMINOLOGY

  • Database design: process of producing a detailed data model of a database.
  • Design process:
    • Conceptual schema - Determine where relationships and dependency are within the data.
    • Logical Data Model - Arrange data in a logical structure that can be mapped into the storage objects supported by the database management system.
    • Physical database design
  • Field Types:the proper field type will secure data and make databases more efficient.
    • Short integer - between -32768 and 32768
    • Long integer - between -2,147,483,648 and 2147483647
    • Float - single-precision floating-point numbers
    • Double - double-precision floating-point numbers
    • Text - could be a coded value - assign to an integer through a domain.
    • Dates – a calendar date and sometimes a time is associated.
    • BLOBs - data stored as a long sequence of binary numbers - ArcGIS stores annotation and dimensions as BLOBs - images, multimedia, bits of code.
    • Object Identifiers - Unique IDs and FIDs
    • Global Identifiers - Global ID and GUID - data types store registry style strings consisting of 36 characters enclosed in curly brackets.
    • Raster field types - raster can be stored within the geodatabase.
    • Geometry - point, line, polygon, multipoint, multipatch.

SAMPLE QUESTION

What is the full form of DBMS?

A) Data of a Binary Management System

B) Database Management System

C) Database Management Service

D) Data Backup Management System

Answer: b) Database Management System

KNOWLEDGE OF DATABASE MANAGEMENT AND ADMINISTRATION

Database management and administration play crucial roles in ensuring efficient and secure handling of data within organizations, providing for:

  • Organization and Accessibility
  • Integrity and Quality
  • Security and Privacy
  • Performance Optimization
  • Backup and Recovery
  • Scalability and Growth
  • Business Intelligence and Analytics
  • Cost Efficiency

KEY CONCEPTS AND TERMINOLOGY

  • Basic tasks:
    • Backup and recovery of databases - Regularly creating and storing database backups in a separate location helps keep the system operational after natural disasters, cyber-attacks, or other issues.
    • Database security - prevent hackers, design, and employ security models, tasks - authentication, authorization, auditing (making sure the right people have the right access)
    • Storage and capacity planning - disk storage is needed and monitor disk space and watch growth trends.
    • Performance monitoring and tuning - identify bottlenecks, tuning (indexing, queries on speed of return, right monitoring tools, capacity of server hardware)
    • Troubleshooting - quickly ascertain problem, root causes, correct it and take measures to prevent a reoccurrence.
    • High availability - ensures that a system remains operational with minimal interruption to end users, even in the face of hardware or software failures, power outages, or other disruptions.
      • Data Backup and Recovery
      • Data Replication - Continuously copying data from one database to another ensures system operability even if one database fails.
      • Clustering - Multiple nodes collaborate to provide data access, ensuring system continuity even if one node fails.
      • Load Balancing - Distributing requests evenly across multiple database servers maintains operability even if one server fails.
      • Automated Failover - Automatically switching to a backup server when the primary server fails minimizes downtime.
      • High Availability Clusters - Also known as failover clusters, these groups of interconnected servers work together to keep applications or services available to users. Redundancy and failover mechanisms ensure that if one server fails or goes offline, another server seamlessly takes over its workload.
    • ETL functions - data extraction, transformation, and loading.
  • Archiving: involves selectively removing specific records from active databases and storing them in an archive often capturing, managing, and analyzing data changes.
    • Most often done with geodatabases
    • These archived records can be managed and retrieved if needed, even though they are no longer part of the active dataset.
    • Reasons for archiving:
      • Cost Reduction: By shifting data to low-cost storage repositories, organizations can reduce expenses associated with warm storage.
      • Regulatory Compliance: Retaining old data is essential for compliance with regulations.
      • Future Reference and Analysis: Some historical data may be needed for future research or analysis.
  • Retrieval: extracting data from a backup due to data loss or data corruption

SAMPLE QUESTION

Which of the following is not a critical task in database administration?

A) Database Backup and Recovery

B) Data Archiving

C) Database Indexing

D) Database Query Optimization

Answer: C) Database Indexing

Explanation: Indexes are database objects that improve query performance by allowing faster data retrieval based on indexed columns. While they provide efficient access to specific rows within a table, indexing itself is not a critical database administration task.

KNOWLEDGE OF DATA SECURITY

Database security is a multifaceted endeavor that balances usability with protection. It’s essential to safeguard not only the data but also the DBMS, associated applications, and the underlying infrastructure. Security provides controls for the confidentiality, integrity, and availability of data within databases.

KEY CONCEPTS AND TERMINOLOGY

  • Administrative Controls:
    • Installation, Change, and Configuration Management: Govern the setup, modifications, and configuration changes to the database system.
    • User Administration: Manage user accounts, profiles, password policies, privileges, and roles.
    • Security Architecture: Design and implement security measures at the architectural level.
    • Operating System Security Principles: Secure the underlying operating system to prevent unauthorized access.
    • Database Application Security Models: Define access controls and permissions for applications interacting with the database.
  • Preventative Controls:
    • Access Controls: Restrict access to authorized users and roles.
    • Encryption: Encrypt sensitive data to prevent unauthorized reading.
    • Tokenization and Masking: Tokenize or mask data to protect sensitive information.
    • Organized Data Structure: Configure the database management system (DBMS) for optimal security.
    • Permissions and Access Controls: Set fine-grained permissions for users and roles.
  • Detective Controls:
    • Database Activity Monitoring:
    • Data Loss Prevention Tools: Identify and prevent unauthorized data transfers or leaks.
  • Data Owner: user with administrative privileges who creates tables, feature classes own those datasets.
  • User Access Roles:
    • Administrator - Full control of the database, can read, create, update, delete features. Can create and delete feature classes, tables, and other database items as well as modify the database schema.
    • Editor - can read, update, create, and delete features in existing tables.
    • Reader – can only view data shared with them.
    • Creator - can create additional feature classes, tables, as well as read, update, create, and delete features.
  • Authentication: database checks the list of users to make sure a user is allowed to make a connection.
  • Operating System (OS) authentication - refers to the process by which an operating system verifies the identity of a user or program attempting to access its resources.
  • Database Authentication - is the process of confirming that a user attempting to log in to a database is authorized to do so and has the rights to perform specific activities within that database. Access controls are stored within the database.

SAMPLE QUESTION

Which of the following is not a critical approach to ensure data security?

A) Authentication

B) Access control

C) Encryption

D) Database queries

Answer: D) Database queries.