SQL (Structured Query Language) is designed to be a simple language that anyone can pick up. However, complications arise when it is used against data stores that hold a huge number of records. When working with mid-size to very large relational databases, it is essential to write high-performing SQL queries. This often causes trouble for programmers, and in this article we will discuss some big mistakes people tend to make in SQL programming on any platform, such as MySQL, Oracle, SQL Server, or other SQL databases.
1. Forgetting primary keys
All database tables need a primary key. A table without a primary key does not follow sound relational design, and query performance will suffer. In many database engines (SQL Server and MySQL's InnoDB, for example), the primary key is backed by a clustered index by default, which speeds up queries. Each key value must be unique, and it is ideal to use an auto-incrementing numeric value if no other column in the table meets that uniqueness requirement.
In relational databases, setting up primary keys is a prime concern. These keys are what the foreign keys in related tables link to. For example, if a table in a relational database holds the customer list, then “Customer Id” must be unique for every customer. This column acts as the primary key. The “Customer Id” values can then be stored in the Orders table to link the two tables. So, it is essential to define a primary key on every table you create, irrespective of its size.
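The Customer and Orders relationship described above can be sketched as follows. This is a minimal illustration rather than a reference schema: it uses Python's built-in sqlite3 module with an in-memory database so that it runs as-is, and the table names, column names, and sample values are assumptions made for the example.

```python
import sqlite3

# In-memory SQLite database; all names and values below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- auto-incrementing primary key
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id INTEGER NOT NULL REFERENCES customer (customer_id),  -- foreign key
    item        TEXT NOT NULL
);
""")

# Insert a customer and read back the generated key.
cur = conn.execute("INSERT INTO customer (name) VALUES (?)", ("Ada",))
ada_id = cur.lastrowid

# An order that points at an existing customer is accepted.
conn.execute("INSERT INTO orders (customer_id, item) VALUES (?, ?)", (ada_id, "keyboard"))

# An order that points at a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders (customer_id, item) VALUES (?, ?)", (999, "mouse"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(ada_id, rejected)
```

The syntax for auto-incrementing keys differs between engines (AUTO_INCREMENT in MySQL, IDENTITY in SQL Server, identity columns or sequences in Oracle), so treat the DDL above as SQLite-specific.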
2. Ill-managed redundancy
Data redundancy is advisable for backups, but not for data stored in tables. Each table should contain an exclusive data set, with no repetition of that data at another location. This is one of the most confusing ideas for a novice SQL programmer to abide by. Many forget this normalization rule and repeat the same data across different tables, which is a faulty table design that creates further problems.
For example, say you maintain a customer table that stores addresses. As each address relates to a unique customer, it is kept in the proper location. Next, you create an “Order” table and add the customer address to that table as well. This is a poor structure from a redundancy point of view. The ‘Order’ and ‘Customer’ tables can instead be linked through a relationship between the primary and foreign keys. When you fail to do this, you end up with two addresses and cannot tell which one is accurate. So, it is advisable to always keep the data in one location and establish relationships between the primary and foreign keys for proper querying.
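As a rough sketch of the single-location idea, the example below stores the address only in the customer table and reaches it from an order through the key relationship. It runs on Python's built-in sqlite3 module; the schema, names, and sample data are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    address     TEXT NOT NULL          -- the address lives here and nowhere else
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
    item        TEXT NOT NULL          -- note: no address column on the order
);
INSERT INTO customer VALUES (1, 'Ada', '1 Old Street');
INSERT INTO orders VALUES (10, 1, 'keyboard');
""")

# Look up the address for an order by following the foreign key.
ORDER_WITH_ADDRESS = """
SELECT o.order_id, c.address
FROM orders AS o
JOIN customer AS c ON c.customer_id = o.customer_id
WHERE o.order_id = ?
"""

before = conn.execute(ORDER_WITH_ADDRESS, (10,)).fetchone()

# The address changes in exactly one place...
conn.execute("UPDATE customer SET address = ? WHERE customer_id = ?", ("2 New Road", 1))

# ...and every order sees the new value, so there is no second copy to disagree with.
after = conn.execute(ORDER_WITH_ADDRESS, (10,)).fetchone()

print(before, after)
```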
3. Using NOT IN and IN instead of JOIN
‘NOT IN’ and ‘IN’ clauses with subqueries are often poorly optimized. They are convenient to write, but can usually be replaced by a JOIN.
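Let’s take a look at a classic example: a query that selects every customer whose ID is NOT IN the set of customer IDs returned by a subquery on the ‘Order’ table.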
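A minimal sketch of such a query is shown here, wrapped in Python's built-in sqlite3 module so that it runs end to end. The table names (customer, orders), the columns, and the three sample customers are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer (customer_id)
);
INSERT INTO customer VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);
""")

# Customers whose id does not appear anywhere in the orders table.
NOT_IN_QUERY = """
SELECT c.customer_id, c.name
FROM customer AS c
WHERE c.customer_id NOT IN (SELECT o.customer_id FROM orders AS o)
ORDER BY c.customer_id
"""

customers_without_orders = conn.execute(NOT_IN_QUERY).fetchall()
print(customers_without_orders)
```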
With the above statement, the data set we get is the customers who have not placed an order. When running it, the database may grab all the orders from the ‘Order’ table and then filter the outer Customer query against that record set. If there are millions of orders to process, this can be extremely slow. However, a faster-performing option, as suggested by RemoteDBA.com, is to rewrite the query as a JOIN between the two tables.
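One way such a JOIN rewrite might look is a LEFT JOIN that keeps only the customers with no matching order row. The exact form of the join is an assumption, as are the table names, columns, and sample data; the sketch uses Python's built-in sqlite3 module so it can be run directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer (customer_id)
);
INSERT INTO customer VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);
""")

# LEFT JOIN keeps every customer; customers with no order get NULLs in the
# orders columns, and the WHERE clause keeps only those rows.
JOIN_QUERY = """
SELECT c.customer_id, c.name
FROM customer AS c
LEFT JOIN orders AS o ON o.customer_id = c.customer_id
WHERE o.order_id IS NULL
ORDER BY c.customer_id
"""

customers_without_orders = conn.execute(JOIN_QUERY).fetchall()
print(customers_without_orders)
```

Whether this form is actually faster depends on the engine, its version, and the available indexes, so it is worth comparing the execution plans of both variants on your own data.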
This statement, on the other hand, returns the same data set as the previous one, but the query is optimized and runs faster. It joins the two tables on the primary key and the foreign key, which improves performance.
4. Using NULL values versus empty string values
This has long been a debate among DBAs. In SQL programming, you may use NULL when no value is present, or you may use an actual value such as a zero-length string or the integer 0. Whichever you choose should be applied consistently across all tables. Always keep in mind that NULL values are generally not the same as zero-length strings, so your queries must account for both.
So, once you decide which to use, ensure that your SQL queries actually account for those values. For example, if you allow NULL values for users' last names, then your queries must filter with NULL checks (IS NULL or IS NOT NULL) in their clauses.
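The difference between the two representations can be seen in a small sketch. It uses Python's built-in sqlite3 module, and the users table and its rows are hypothetical. The behaviour shown is SQLite's; engines differ here (Oracle, for instance, treats a zero-length string as NULL), so check your own platform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id    INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name  TEXT                 -- nullable on purpose for this example
);
INSERT INTO users VALUES
    (1, 'Ada',    'Lovelace'),
    (2, 'Cher',   NULL),            -- no last name, stored as NULL
    (3, 'Prince', '');              -- no last name, stored as an empty string
""")

def ids(sql):
    """Run a query and return the matching user ids in ascending order."""
    return [row[0] for row in conn.execute(sql + " ORDER BY user_id")]

# An equality test against '' finds only the empty-string row.
equals_empty = ids("SELECT user_id FROM users WHERE last_name = ''")

# An equality test against NULL matches nothing, because NULL = NULL is not true.
equals_null = ids("SELECT user_id FROM users WHERE last_name = NULL")

# IS NULL is the filter that finds the NULL row.
is_null = ids("SELECT user_id FROM users WHERE last_name IS NULL")

# A query that wants "no last name" must cover both representations.
missing = ids("SELECT user_id FROM users WHERE last_name IS NULL OR last_name = ''")

print(equals_empty, equals_null, is_null, missing)
```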
5. Looping structures with many cursors
Cursors need to be carefully planned to ensure good database performance. A cursor lets a query crawl through millions of records and run a statement against each record in turn. This may sound like a benefit; however, it can actually reduce database performance. Loops are common in general-purpose programming languages, but SQL is a set-based language, and row-by-row looping in it is a nuisance. Expert database administrators tend to reject SQL procedures that contain cursors.
So, it is ideal to write such a procedure in a different way that avoids anything likely to hurt database performance. Well-structured SQL statements can be used to replace cursors. If a cursor is unavoidable, it should be kept to scheduled tasks that run only at off-peak hours. Nowadays, cursors are mostly used in data transformation, reporting, and similar jobs. Even though they cannot be fully avoided, limit cursors as much as possible in production databases that need to serve plenty of queries.
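To illustrate replacing a cursor with a structured statement, the sketch below applies the same price increase twice: once row by row and once with a single set-based UPDATE. It uses Python's built-in sqlite3 module, so the row-by-row loop is a client-side stand-in for a stored-procedure cursor rather than a real one. The product table, its columns, and the 10% increase are assumptions made for the example.

```python
import sqlite3

def make_db():
    """Create a small in-memory product table (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product (product_id INTEGER PRIMARY KEY, price_cents INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO product VALUES (?, ?)", [(1, 1000), (2, 2500), (3, 400)])
    return conn

def raise_prices_row_by_row(conn):
    """Cursor-style approach: fetch every row, then issue one UPDATE per row."""
    rows = conn.execute("SELECT product_id, price_cents FROM product").fetchall()
    for product_id, price_cents in rows:
        conn.execute(
            "UPDATE product SET price_cents = ? WHERE product_id = ?",
            (price_cents * 110 // 100, product_id),
        )

def raise_prices_set_based(conn):
    """Set-based approach: one statement updates every row."""
    conn.execute("UPDATE product SET price_cents = price_cents * 110 / 100")

def prices(conn):
    return conn.execute(
        "SELECT product_id, price_cents FROM product ORDER BY product_id"
    ).fetchall()

looped = make_db()
raise_prices_row_by_row(looped)

set_based = make_db()
raise_prices_set_based(set_based)

# Both approaches end with the same data; the set-based one needs a single statement.
print(prices(looped) == prices(set_based))
```

With three rows the difference is invisible, but the loop issues one statement per row while the set-based version issues one statement in total, which is where the cost gap on large tables comes from.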
Relational databases (SQL databases) are ideal for most back-end operations, which is why RDBMSs remain strong even in the era of non-relational database management systems (NoSQL databases). You have to prepare apt SQL statements and optimize your queries and tables to ensure top performance. By avoiding the SQL pitfalls mentioned above, you can create an efficient and fast database for your small, mid-level, or large business.