What is Data Warehousing?
A data warehouse is a repository of integrated information, available for queries and analysis. Data and information are extracted from heterogeneous sources as they are generated. This makes it much easier and more efficient to run queries over data that originally came from different sources. Typical relational databases are designed for on-line transactional processing (OLTP) and do not meet the requirements for effective on-line analytical processing (OLAP). As a result, data warehouses are designed differently than traditional relational databases.
What is real time data-warehousing?
Explanation 1: Real-time data warehousing is a combination of two things: 1) real-time activity and 2) data warehousing. Real-time activity is activity that is happening right now. The activity could be anything, such as the sale of widgets. Once the activity is complete, there is data about it. Data warehousing captures business activity data. Real-time data warehousing captures business activity data as it occurs. As soon as the business activity is complete and there is data about it, the completed activity data flows into the data warehouse and becomes available instantly. In other words, real-time data warehousing is a framework for deriving information from data as the data becomes available.
Explanation 2: In real-time data warehousing, your warehouse contains completely up-to-date data and is synchronized with the source systems that provide the source data. In near-real-time data warehousing, there is a minimal delay between source data being generated and being available in the data warehouse. Therefore, if you want to achieve real-time or near-real-time updates to your data warehouse, you'll need to do three things:
Reduce or eliminate the time taken to get new and changed data out of your source systems.
Eliminate, or reduce as much as possible, the time required to cleanse, transform and load your data.
Reduce as much as possible the time required to update your aggregates.
Starting with version 9i, and continuing with the latest 10g release, Oracle has gradually introduced features into the database to support real-time, and near-real-time, data warehousing. These features include:
Change Data Capture
External tables, table functions, pipelining, and the MERGE command, and
Fast refresh materialized views
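As an illustration of how the MERGE command supports this kind of loading, the sketch below upserts freshly captured changes into a warehouse table in a single pass; the table and column names (warehouse_sales, staged_sales, sale_id, amount) are hypothetical:

-- Insert new rows and update changed rows in one statement
MERGE INTO warehouse_sales w
USING staged_sales s
ON (w.sale_id = s.sale_id)
WHEN MATCHED THEN
UPDATE SET w.amount = s.amount
WHEN NOT MATCHED THEN
INSERT (w.sale_id, w.amount)
VALUES (s.sale_id, s.amount);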
What does level of Granularity of a fact table signify?
Explanation 1: Level of granularity indicates the level of detail at which fact data is stored, and hence the extent of aggregation that can be performed on it. Finer granularity implies more aggregation potential, and vice versa.
Explanation 2: Granularity is the level at which measures and metrics are represented. The lowest level is called detailed data and the highest level is called summary data.
The level of granularity chosen for a fact table depends on the requirements of the project.
What are Fact Tables?
Each data warehouse or data mart includes one or more fact tables. Central to a star or snowflake schema, a fact table captures the data that measures the organization's business operations. A fact table might contain business sales events such as cash register transactions or the contributions and expenditures of a nonprofit organization. Fact tables usually contain large numbers of rows, sometimes in the hundreds of millions of records when they contain one or more years of history for a large organization. A key characteristic of a fact table is that it contains numerical data (facts) that can be summarized to provide information about the history of the operation of the organization. Each fact table also includes a multipart index that contains as foreign keys the primary keys of related dimension tables, which contain the attributes of the fact records. Fact tables should not contain descriptive information or any data other than the numerical measurement fields and the index fields that relate the facts to corresponding entries in the dimension tables.
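As a minimal sketch (all table and column names are hypothetical), a sales fact table carries only foreign keys to the dimension tables plus numeric measures:

CREATE TABLE sales_fact (
date_key INT NOT NULL, -- foreign key to the date dimension
product_key INT NOT NULL, -- foreign key to the product dimension
store_key INT NOT NULL, -- foreign key to the store dimension
sales_amount DECIMAL(12,2), -- numeric measure
units_sold INT, -- numeric measure
PRIMARY KEY (date_key, product_key, store_key) -- multipart index of dimension keys
);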
What are conformed dimensions?
A conformed dimension is a single, coherent view of the same piece of data throughout the organization. The same dimension is used in all subsequent star schemas defined. This enables reporting across the complete data warehouse in a simple format.
What is ODS?
ODS stands for Operational Data Store.
It is used to maintain and store current, up-to-date information and transactions regarding the source databases, taken from the OLTP system.
It is directly connected to the source database systems instead of to the staging area.
It is further connected to the data warehouse, and moreover can be treated as part of the data warehouse database.
SCD (Slowly Changing Dimension)?
Explanation 1:
In SCD Type 1, the attribute value is overwritten with the new value, obliterating the historical attribute values. For example, when the product roll-up changes for a given product, the roll-up attribute is merely updated with the current value.
In SCD Type 2, a new record with the new attributes is added to the dimension table. Historical fact table rows continue to reference the old dimension key with the old roll-up attribute; going forward, the fact table rows will reference the new surrogate key with the new roll-up, thereby perfectly partitioning history.
In SCD Type 3, attributes are added to the dimension table to support two simultaneous roll-ups - perhaps the current product roll-up as well as "current version minus one", or current version and original.
Explanation 2:
SCD: Dimensions whose values change only very rarely are called slowly changing dimensions.
There are mainly three types:
SCD1: The old values are overwritten by the new values.
SCD2: Additional records are created for each change.
SCD3: Only the previous and the most recent values are maintained.
Within SCD2 there are again three variants:
Versioning
Flag value
Effective date range
Versioning: The changed dimension rows are inserted into the target along with a version number, and new dimension rows are inserted into the target along with the primary key.
Flag value: The changed dimension rows are inserted into the target with flag value 0, and new dimension rows are inserted into the target with flag value 1.
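As a minimal sketch of the flag-value variant described above (table and column names are hypothetical):

-- Mark the existing row as no longer current, then insert the changed row
UPDATE customer_dim
SET current_flag = 0
WHERE customer_id = 101 AND current_flag = 1;

INSERT INTO customer_dim (customer_key, customer_id, city, current_flag)
VALUES (5002, 101, 'New City', 1);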
What is Normalization, First Normal Form, Second Normal Form, Third Normal Form?
Explanation 1:
Normalization can be defined as splitting a table into two or more tables so as to avoid duplication of values.
Normalization is a step-by-step process of removing redundancies and dependencies of attributes in a data structure.
The condition of the data at the completion of each step is described as a "normal form".
Need for normalization: improves database design.
Ensures minimum redundancy of data.
Reduces the need to reorganize data when the design is modified or enhanced.
Removes anomalies from database activities.
First normal form:
· A table is in first normal form when it contains no repeating groups.
· The repeating columns or fields in an unnormalized table are removed from the table and put into tables of their own.
· Such a table becomes dependent on the parent table from which it is derived.
· The key of this table is called a concatenated key, with the key of the parent table forming a part of it.
Second normal form:
· A table is in second normal form if all its non-key fields are fully dependent on the whole key.
· This means that each field in a table must depend on the entire key.
· Fields that do not depend upon the combination key are moved to another table on whose key they depend.
· Structures that do not contain combination keys are automatically in second normal form.
Third normal form:
· A table is said to be in third normal form if all the non-key fields of the table are independent of all other non-key fields of the same table.
Explanation 2:
1NF: The table should contain only scalar or atomic values.
2NF: The table should be in 1NF with no partial functional dependencies.
3NF: The table should be in 2NF with no transitive dependencies.
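As a small illustration of the first step, consider a hypothetical orders table holding a repeating group of item columns; moving the repeating group into a table of its own, keyed by a concatenated key, brings the design into first normal form:

-- Before 1NF: orders(order_id, customer, item1, item2, item3)
-- After 1NF: the repeating item columns get a table of their own
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer VARCHAR(50)
);

CREATE TABLE order_items (
order_id INT, -- key of the parent table, part of the concatenated key
line_no INT,
item VARCHAR(50),
PRIMARY KEY (order_id, line_no)
);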
What are non-additive facts?
A fact table typically has two types of columns: those that contain numeric facts (often called measurements), and those that are foreign keys to dimension tables.
A fact table contains either detail-level facts or facts that have been aggregated. Fact tables that contain aggregated facts are often called summary tables. A fact table usually contains facts with the same level of aggregation.
Though most facts are additive, they can also be semi-additive or non-additive. Additive facts can be aggregated by simple arithmetical addition; a common example of this is sales. Non-additive facts cannot be added at all; an example of this is averages. Semi-additive facts can be aggregated along some of the dimensions and not along others; an example of this is inventory levels, where you cannot tell what a level means simply by looking at it.
Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others. For example:
Current_Balance and Profit_Margin are the facts. Current_Balance is a semi-additive fact, as it makes sense to add it up for all accounts (what's the total current balance for all accounts in the bank?), but it does not make sense to add it up through time (adding up all current balances for a given account for each day of the month does not give us any useful information).
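A sketch of that distinction in SQL, assuming a hypothetical account_balance_fact table with date_key, account_key and current_balance columns:

-- Meaningful: sum the semi-additive fact across accounts for a single day
SELECT SUM(current_balance) AS bank_total
FROM account_balance_fact
WHERE date_key = 20240131;

-- Not meaningful to sum across time; an average (or the last value) is used instead
SELECT AVG(current_balance) AS avg_daily_balance
FROM account_balance_fact
WHERE account_key = 42;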
What is dimension table?
A fact table is an entity representing the numerical measurements of the business. A dimension table is the entity describing the textual representation of the business.
In a banking model, the account and balances are the measurements within the fact table, where customer name, social security, address, time period of the account, are the dimensions (customer/time).
What is the Difference between OLTP and OLAP?
OLTP:
Current data
Short database transactions
Online update/insert/delete
Normalization is promoted
High volume transactions
Transaction recovery is necessary
OLAP:
Current and historical data
Long database transactions
Batch update/insert/delete
Denormalization is promoted
Low volume transactions
Transaction recovery is not necessary
What is data mining?
Data mining is the process of extracting hidden trends from within a data warehouse. For example, an insurance data warehouse can be used to mine data for the most high-risk people to insure in a certain geographical area.
What is degenerate dimension table?
A dimension whose values are stored in the fact table is called a degenerate dimension. These dimensions do not have dimension tables of their own. In simple terms, it is a column in a fact table that neither maps to any dimension nor is a measure column. For example, Invoice_No and Invoice_Line_No in a fact table will be degenerate dimension columns, provided you do not have a dimension called Invoice.
Degenerate Dimensions:
If a table contains values which are neither dimensions nor measures, they are called degenerate dimensions. Ex: invoice_id, empno.
What are Data Marts?
A data mart is a small data warehouse. In general, a data warehouse is divided into small units according to the business requirements. For example, the data warehouse of an organization may be divided into individual data marts such as a Data Mart of Sales, Data Mart of Finance, Data Mart of Marketing, Data Mart of HR, etc. Data marts are used to improve performance during the retrieval of data. Data marts are a subset of the enterprise data warehouse that supports the business needs of middle management in the organization. Data marts are also called high-performance query structures.
What are the steps in Data Warehousing?
1. Requirement gathering for BI reports
2. Identify the source databases
3. Design source-to-target mappings
4. Design the DWH
5. Design the ETL jobs
6. Develop the ETL jobs
7. Load the data into the DWH
Types of Approaches in Data Warehousing?
TOP-DOWN
BOTTOM-UP
HYBRID
FEDERATED
What is data integration?
A data integration system combines the data residing at different sources (e.g. Oracle, DB2, Teradata) and provides a unified, reconciled view of the data for the ETL process.
Explain Bill Inmon's versus Ralph Kimball's Approach to Data Warehousing.?
In the data warehousing field, we often hear discussions about whether a person's or organization's philosophy falls into Bill Inmon's camp or into Ralph Kimball's camp. We describe below the difference between the two.
Bill Inmon's paradigm: Data warehouse is one part of the overall business intelligence system. An enterprise has one data warehouse, and data marts source their information from the data warehouse. In the data warehouse, information is stored in 3rd normal form.
Ralph Kimball's paradigm: Data warehouse is the conglomerate of all data marts within the enterprise. Information is always stored in the dimensional model. Kimball model also proposes the data warehouse bus architecture. This architecture is comprised of:
- A staging area (which can have an E/R or relationally designed 3NF design or flat file format), which cannot be accessed by an end-user of the data warehouse bus.
-The Data Warehouse Bus itself which includes several atomic data marts, several aggregated data marts and a personal data mart but no single or centralized data warehouse component.
The Data Warehouse Bus:
- Is dimensional;
- Contains transaction and summary data;
- Includes data marts, which have single subject or fact tables; and
- Can consist of multiple data marts in a single data base.
There is no right or wrong between these two ideas, as they represent different data warehousing philosophies. In reality, the data warehouses in most enterprises are closer to Ralph Kimball's idea. This is because most data warehouses started out as a departmental effort, and hence they originated as a data mart. Only when more data marts are built later do they evolve into a data warehouse.
Difference between Kimball and Inmon?
According to Kimball:
Kimball views data warehousing as a constituency of data marts. Data marts are focused on delivering business objectives for departments in the organization. And the data warehouse is a conformed dimension of the data marts. Hence a unified view of the enterprise can be obtained from the dimension modeling on a local departmental level.
According to Inmon: Inmon believes in creating a data warehouse on a subject-by-subject area basis. Hence the development of the data warehouse can start with data from the online store. Other subject areas can be added to the data warehouse as their needs arise. Point-of-sale (POS) data can be added later if management decides it is necessary. In short: Kimball builds the data marts first and combines them into a data warehouse; Inmon builds the data warehouse first and derives the data marts later.
Explain ODS and ODS types?
The ODS (Operational Data Store) is a subject-oriented, integrated, frequently augmented store of detailed data in the enterprise data warehouse. The ODS stores only recent data.
OLTP Systems : Type I + ER Modelling
ODS : Type II + ER Modelling
DWH : Type II + Dimensional modelling
People use an ODS as an alternative to a DWH; of course there is a lot of difference, and, like an OLTP system, it does not support analysis of your business well because of its ER modelling.
Explain the situations where snowflake is better than star schema?
A snowflake schema is a way to handle problems that do not fit within the star schema. It consists of outrigger tables which relate to dimensions rather than to the fact table.
The amount of space taken up by dimensions is so small compared to the space required for a fact table as to be insignificant. Therefore, tablespace or disk space is not considered a reason to create a snowflake schema.
The main reason for creating a snowflake is to make it simpler and faster for a report writer to create drop-down boxes. Rather than having to write a select distinct statement, they can simply select * from the code table.
Junk dimensions and mini dimensions are another reason to add outriggers. Junk dimensions contain data from a normal dimension that you wish to separate out, such as fields that change quickly. Updates are so slow that they can add hours to the load process; with a junk dimension, it is possible to drop and add records rather than update them.
Mini dimensions contain data that is so dissimilar between two or more source systems that it would cause a very sparse main dimension. The conformed data that can be obtained from all source systems is contained in the parent dimension, and the data from each source system that does not match is contained in the child dimension.
Finally, if you are unlucky enough to have end users actually adding or updating data to the data warehouse rather than just batch loads, it may be necessary to add these outriggers to maintain referential integrity in the data being loaded.
What is the purpose of "Factless Fact Table"? How it is involved in Many to many relationship?
A factless fact table contains only the key fields and any related dimensional bridge attributes. It does not contain actual measure fields that are used for aggregates or summaries. Factless fact tables can be used as bridge tables, or to store measures that are used directly without aggregation (e.g. max(), min() attributes).
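For instance, a hypothetical attendance fact resolving the many-to-many relationship between students and classes would hold nothing but dimension keys:

CREATE TABLE attendance_fact (
date_key INT NOT NULL,
student_key INT NOT NULL,
class_key INT NOT NULL,
PRIMARY KEY (date_key, student_key, class_key)
);

-- Even with no measure column, counting rows answers business questions
SELECT class_key, COUNT(*) AS attendance_count
FROM attendance_fact
GROUP BY class_key;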
What is meant by Aggregate Fact Table?
An aggregate fact table stores information that has been aggregated, or summarized, from a detail fact table. Aggregate fact tables are useful in improving query performance.
Often an aggregate fact table can be maintained through the use of materialized views, which, under certain databases, can automatically be substituted for the detailed fact table if appropriate in resolving a query.
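A minimal Oracle-style sketch of such a materialized view (the sales_fact table and its columns are hypothetical); ENABLE QUERY REWRITE is what lets the optimizer substitute the summary for the detail fact table:

CREATE MATERIALIZED VIEW sales_by_month
ENABLE QUERY REWRITE
AS
SELECT product_key, month_key, SUM(sales_amount) AS total_sales
FROM sales_fact
GROUP BY product_key, month_key;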
What is the main FUNCTIONAL difference between ROLAP,MOLAP,HOLAP?
The FUNCTIONAL difference between these is how the information is stored. In all cases, the users see the data as a cube of dimensions and facts.
ROLAP - detailed data is stored in a relational database in 3NF, star, or snowflake form. Queries must summarize data on the fly.
MOLAP - data is stored in multidimensional form - dimensions and facts stored together. You can think of this as a persistent cube. Level of detail is determined by the intersection of the dimension hierarchies.
HOLAP - data is stored using a combination of relational and multi-dimensional storage. Summary data might persist as a cube, while detail data is stored relationally, but transitioning between the two is invisible to the end-user.
Explain the flow of data starting with OLTP to OLAP including staging ,summary tables,Facts and dimensions?
OLTP(1)---->ODS(2)------>DWH(3)-------->OLAP(4)------------>Reports(5)------>decision(6)
1-2 (extraction)
2-3 (transformation; here the ODS itself is the staging area)
3-4-5 (use of a reporting tool to generate reports)
6 - the decision is taken, i.e. the purpose of the data warehouse is served
The flow of data from OLTP to OLAP is as follows:
1. Staging tables
2. Dimension tables
3. Fact tables or (Fact and Summary Tables)
4. Summary Tables
What is the role of surrogate keys in data warehousing?
Explanation 1:
Surrogate key is nothing but a primary key to identify records uniquely in dimension and fact tables. It is basically a running system-generated sequence number.
Explanation 2:
The surrogate key's role is to link the dimension and fact tables. It avoids smart keys and production keys.
Explanation 3:
A surrogate key is a simple primary key which maps one-to-one with a natural compound primary key. The reason for using them is to alleviate the need for the query writer to know the full compound key, and also to speed query processing by removing the need for the RDBMS to process the full compound key when considering a join.
For example, a shipment could have a natural key of ORDER + ITEM + SHIPMENT_SEQ. By giving it a unique SHIPMENT_ID, subordinate tables can access it with a single attribute, rather than three. However, it's important to create a unique index on the natural key as well.
A surrogate key is a substitution for the natural primary key. We tend to use our own primary keys (surrogate keys) rather than depend on the primary key that is available in the source system. When integrating the data, trying to work with the source system primary keys will be a little hard to handle. That's the reason why a surrogate key is useful even though it serves the same purpose as a primary key. Another important need for it is that the natural primary key (i.e. Customer Number in the Customer table) can change, and this makes updates more difficult.
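A quick T-SQL sketch of the shipment example above, with the surrogate key generated as an identity and a unique constraint kept on the natural compound key (all names are illustrative):

CREATE TABLE shipment (
shipment_id INT IDENTITY(1,1) PRIMARY KEY, -- surrogate key
order_no INT NOT NULL, -- natural key, part 1
item_no INT NOT NULL, -- natural key, part 2
shipment_seq INT NOT NULL, -- natural key, part 3
CONSTRAINT uq_shipment_natural UNIQUE (order_no, item_no, shipment_seq)
);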
Summarize the difference between OLTP, ODS and DATA WAREHOUSE?
OLTP - means online transaction processing; it is nothing but a database. Oracle, SQL Server and DB2 are OLTP databases.
OLTP databases, as the name implies, handle real time transactions which inherently have some special requirements.
ODS - stands for Operational Data Store. It is the final integration point in the ETL process; we load the data into the ODS before loading the values into the target.
Data Warehouse - a data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data which is used to take management decisions.
What is the difference between Datawarehouse and Datawarehousing?
Data warehousing encompasses a complete architecture and process, whereas a data warehouse is data stored in a database in the form of dimension tables, fact tables, lookup tables and aggregated fact tables.
Data warehousing is not just having a single data warehouse. Data warehousing is the transformation of data to information, thereby enabling the business to examine its operations and performance. This task is accomplished by the staging and transformation of data from data sources, enabling the business to access and analyze information. The data stores may be persistent (stored on disk) or transient (using disk or memory). In addition, the workflow usually involves multiple data stores to support the staging and transformation of data into information, such as operational data stores, data warehouses, data marts, online analytical processing cubes, files such as a flat file (a comma-separated values extract, for example), XML data and even spreadsheets.
What is the difference between View and Materialized View?
view - stores the SQL statement in the database and lets you use it as a table. Every time you access the view, the SQL statement executes.
materialized view - stores the results of the SQL in table form in the database. The SQL statement executes only once, and after that, every time you run the query, the stored result set is used. Pros include quick query results.
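A side-by-side sketch (the materialized view syntax shown is Oracle's; table and column names are hypothetical):

-- View: only the query is stored; it executes on every access
CREATE VIEW monthly_sales_v AS
SELECT month_key, SUM(sales_amount) AS total_sales
FROM sales_fact
GROUP BY month_key;

-- Materialized view: the result set itself is stored
CREATE MATERIALIZED VIEW monthly_sales_mv AS
SELECT month_key, SUM(sales_amount) AS total_sales
FROM sales_fact
GROUP BY month_key;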
What is junk dimension? What is the difference between junk dimension and degenerated dimension?
A junk dimension is a collection of random transactional codes, flags and text attributes that are unrelated to any particular dimension. The junk dimension is simply a structure that provides a convenient place to store these junk attributes.
What is the definition of normalized and denormalized view and what are the differences between them? OLTP uses the normalization process, while OLAP/DW uses the denormalization process to capture a greater level of detailed data (each and every transaction).
Why fact table is in normal form?
A fact table consists of measurements of business requirements and foreign keys to dimension tables, as per business rules.
Basically the fact table consists of the index keys of the dimension/lookup tables and the measures. So whenever a table holds only keys and measures, that itself implies that the table is in normal form.
What is incremental loading? 2. What is batch processing? 3. What is cross reference table? 4. What is aggregate fact table?
Let me explain this with an example. Suppose you are working on a universe that has a number of classes and objects (measure, dimension or detail objects, whatever it may have); say it currently has 10 objects. Suppose I create one new object, so now we have 11 objects in the universe. If I export only the newly created object, that is called incremental loading; if I export all 10+1 objects, that is the normal full export we always do.
2. What's the difference between a primary key and a unique key?
Both primary key and unique key enforce uniqueness of the column on which they are defined. But by default the primary key creates a clustered index on the column, whereas unique creates a nonclustered index by default. Another major difference is that a primary key doesn't allow NULLs, but a unique key allows one NULL only.
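A quick illustration (hypothetical table):

CREATE TABLE dept (
dept_id INT PRIMARY KEY, -- clustered index by default; NULLs not allowed
dept_code CHAR(4) UNIQUE -- nonclustered index by default; one NULL allowed
)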
3. What are user defined datatypes and when you should go for them?
User defined datatypes let you extend the base SQL Server datatypes by providing a descriptive name, and format to the database. Take for example, in your database, there is a column called Flight_Num which appears in many tables. In all these tables it should be varchar(8). In this case you could create a user defined datatype called Flight_num_type of varchar(8) and use it across all your tables.
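A sketch of the Flight_Num example using the SQL Server 2000-era sp_addtype procedure (later versions also offer CREATE TYPE ... FROM); the table below is hypothetical:

-- Define the user defined datatype once, then reuse it across tables
EXEC sp_addtype Flight_num_type, 'varchar(8)', 'NOT NULL'

CREATE TABLE flight_schedule (
flight_num Flight_num_type,
depart_time datetime
)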
4. What is bit datatype and what's the information that can be stored inside a bit column?
Bit datatype is used to store boolean information like 1 or 0 (true or false). Until SQL Server 6.5, the bit datatype could hold either a 1 or 0 and there was no support for NULL. But from SQL Server 7.0 onwards, the bit datatype can represent a third state, which is NULL.
5. Define candidate key, alternate key, composite key
A candidate key is one that can identify each row of a table uniquely. Generally a candidate key becomes the primary key of the table. If the table has more than one candidate key, one of them will become the primary key, and the rest are called alternate keys.
A key formed by combining at least two or more columns is called composite key.
6. What are defaults? Is there a column to which a default can't be bound?
A default is a value that will be used by a column if no value is supplied to that column while inserting data. IDENTITY columns and timestamp columns can't have defaults bound to them. See CREATE DEFAULT in books online.
7. What is a transaction and what are ACID properties?
A transaction is a logical unit of work in which, all the steps must be performed or none. ACID stands for Atomicity, Consistency, Isolation, Durability. These are the properties of a transaction. For more information and explanation of these properties, see SQL Server books online or any RDBMS fundamentals text book.
8. Explain different isolation levels
An isolation level determines the degree of isolation of data between concurrent transactions. The default SQL Server isolation level is Read Committed. Here are the other isolation levels (in the ascending order of isolation): Read Uncommitted, Read Committed, Repeatable Read, Serializable. See SQL Server books online for an explanation of the isolation levels. Be sure to read about SET TRANSACTION ISOLATION LEVEL, which lets you customize the isolation level at the connection level.
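For example, raising the isolation level for the current connection before a transaction (hypothetical table):

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRAN
-- Rows read here stay locked against modification until the transaction ends
SELECT balance FROM accounts WHERE account_id = 1
COMMIT TRAN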
9. CREATE INDEX myIndex ON myTable(myColumn)
What type of Index will get created after executing the above statement?
Non-clustered index. Important thing to note: By default a clustered index gets created on the primary key, unless specified otherwise.
10. What's the maximum size of a row?
8060 bytes. Don't be surprised with questions like 'what is the maximum number of columns per table'. Check out SQL Server books online for the page titled: "Maximum Capacity Specifications".
11. Explain Active/Active and Active/Passive cluster configurations
Hopefully you have experience setting up cluster servers. But if you don't, at least be familiar with the way clustering works and the two clustering configurations Active/Active and Active/Passive. SQL Server books online has enough information on this topic and there is a good white paper available on Microsoft site.
12. Explain the architecture of SQL Server
This is a very important question and you better be able to answer it if consider yourself a DBA. SQL Server books online is the best place to read about SQL Server architecture. Read up the chapter dedicated to SQL Server Architecture.
13. What is lock escalation?
Lock escalation is the process of converting a lot of low level locks (like row locks, page locks) into higher level locks (like table locks). Every lock is a memory structure; too many locks would mean more memory being occupied by locks. To prevent this from happening, SQL Server escalates the many fine-grain locks to fewer coarse-grain locks. The lock escalation threshold was definable in SQL Server 6.5, but from SQL Server 7.0 onwards it's dynamically managed by SQL Server.
14. What's the difference between the DELETE and TRUNCATE TABLE commands?
DELETE is a logged operation, so the deletion of each row gets logged in the transaction log, which makes it slow. TRUNCATE TABLE also deletes all the rows in a table, but it won't log the deletion of each row; instead it logs the deallocation of the data pages of the table, which makes it faster. Of course, TRUNCATE TABLE can be rolled back when issued inside a transaction.
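For example (hypothetical table):

-- Logged row by row; supports a WHERE clause
DELETE FROM order_items WHERE order_id = 10

-- Deallocates whole data pages; no WHERE clause; much faster on large tables
TRUNCATE TABLE order_items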
15. Explain the storage models of OLAP
Check out MOLAP, ROLAP and HOLAP in SQL Server books online for more information.
16. What are the new features introduced in SQL Server 2000 (or the latest release of SQL Server at the time of your interview)? What changed between the previous version of SQL Server and the current version?
This question is generally asked to see how current is your knowledge. Generally there is a section in the beginning of the books online titled "What's New", which has all such information. Of course, reading just that is not enough, you should have tried those things to better answer the questions. Also check out the section titled "Backward Compatibility" in books online which talks about the changes that have taken place in the new version.
1. What are constraints? Explain different types of constraints
Constraints enable the RDBMS to enforce the integrity of the database automatically, without needing you to create triggers, rules or defaults.
Types of constraints: NOT NULL, CHECK, UNIQUE, PRIMARY KEY, FOREIGN KEY
For an explanation of these constraints see books online for the pages titled: "Constraints" and "CREATE TABLE", "ALTER TABLE"
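A compact sketch exercising each constraint type (hypothetical schema; assumes a dept table with a dept_id primary key already exists):

CREATE TABLE employee (
emp_id INT PRIMARY KEY, -- PRIMARY KEY constraint
emp_name VARCHAR(40) NOT NULL, -- NOT NULL constraint
email VARCHAR(60) UNIQUE, -- UNIQUE constraint
salary DECIMAL(10,2) CHECK (salary > 0), -- CHECK constraint
dept_id INT REFERENCES dept(dept_id) -- FOREIGN KEY constraint
)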
2. What is an index? What are the types of indexes? How many clustered indexes can be created on a table? I create a separate index on each column of a table. what are the advantages and disadvantages of this approach?
Indexes in SQL Server are similar to the indexes in books. They help SQL Server retrieve the data quicker.
Indexes are of two types: clustered indexes and non-clustered indexes. When you create a clustered index on a table, all the rows in the table are stored in the order of the clustered index key. So, there can be only one clustered index per table. Non-clustered indexes have their own storage separate from the table data storage. Non-clustered indexes are stored as B-tree structures (so are clustered indexes), with the leaf level nodes having the index key and its row locator. The row locator could be the RID or the clustered index key, depending upon the absence or presence of a clustered index on the table.
If you create an index on each column of a table, it improves the query performance, as the query optimizer can choose from all the existing indexes to come up with an efficient execution plan. At the same time, data modification operations (such as INSERT, UPDATE, DELETE) will become slow, as every time data changes in the table, all the indexes need to be updated. Another disadvantage is that indexes need disk space; the more indexes you have, the more disk space is used.
3. What is RAID and what are different types of RAID configurations?
RAID stands for Redundant Array of Inexpensive Disks, used to provide fault tolerance to database servers. There are six RAID levels 0 through 5 offering different levels of performance, fault tolerance. MSDN has some information about RAID levels and for detailed information, check out the RAID advisory board's homepage.
4. What are the steps you will take to improve performance of a poor performing query?
This is a very open ended question and there could be a lot of reasons behind the poor performance of a query. But some general issues that you could talk about would be: No indexes, table scans, missing or out of date statistics, blocking, excess recompilations of stored procedures, procedures and triggers without SET NOCOUNT ON, poorly written query with unnecessarily complicated joins, too much normalization, excess usage of cursors and temporary tables.
Some of the tools/ways that help you troubleshooting performance problems are: SET SHOWPLAN_ALL ON, SET SHOWPLAN_TEXT ON, SET STATISTICS IO ON, SQL Server Profiler, Windows NT /2000 Performance monitor, Graphical execution plan in Query Analyzer.
Download the white paper on performance tuning SQL Server from Microsoft web site. Don't forget to check out sql-server-performance.com
5. What are the steps you will take, if you are tasked with securing an SQL Server?
Again this is another open ended question. Here are some things you could talk about: Preferring NT authentication, using server, database and application roles to control access to the data, securing the physical database files using NTFS permissions, using an unguessable SA password, restricting physical access to the SQL Server, renaming the Administrator account on the SQL Server computer, disabling the Guest account, enabling auditing, using multiprotocol encryption, setting up SSL, setting up firewalls, isolating SQL Server from the web server etc.
6. What is a deadlock and what is a live lock? How will you go about resolving deadlocks?
Deadlock is a situation when two processes, each having a lock on one piece of data, attempt to acquire a lock on the other's piece. Each process would wait indefinitely for the other to release the lock, unless one of the user processes is terminated. SQL Server detects deadlocks and terminates one user's process.
A livelock is one, where a request for an exclusive lock is repeatedly denied because a series of overlapping shared locks keeps interfering. SQL Server detects the situation after four denials and refuses further shared locks. A livelock also occurs when read transactions monopolize a table or page, forcing a write transaction to wait indefinitely.
7. What is blocking and how would you troubleshoot it?
Blocking happens when one connection from an application holds a lock and a second connection requires a conflicting lock type. This forces the second connection to wait, blocked on the first.
Read up the following topics in SQL Server books online: Understanding and avoiding blocking, Coding efficient transactions.
8. Explain CREATE DATABASE syntax
Many of us are used to creating databases from the Enterprise Manager or by just issuing the command: CREATE DATABASE MyDB. But what if you have to create a database with two filegroups, one on drive C and the other on drive D, with the log on drive E, with an initial size of 600 MB and with a growth factor of 15%? That's why, being a DBA, you should be familiar with the CREATE DATABASE syntax. Check out SQL Server books online for more information.
9. How to restart SQL Server in single user mode? How to start SQL Server in minimal configuration mode?
SQL Server can be started from the command line, using SQLSERVR.EXE. This EXE has some very important parameters with which a DBA should be familiar. -m is used for starting SQL Server in single user mode and -f is used to start SQL Server in minimal configuration mode. Check out SQL Server books online for more parameters and their explanations.
10. As a part of your job, what are the DBCC commands that you commonly use for database maintenance?
DBCC CHECKDB, DBCC CHECKTABLE, DBCC CHECKCATALOG, DBCC CHECKALLOC, DBCC SHOWCONTIG, DBCC SHRINKDATABASE, DBCC SHRINKFILE etc. But there are a whole load of DBCC commands which are very useful for DBAs. Check out SQL Server books online for more information.
11. What are statistics, under what circumstances they go out of date, how do you update them?
Statistics determine the selectivity of the indexes. If an indexed column has unique values then the selectivity of that index is high, as opposed to an index with non-unique values. The query optimizer uses these statistics in determining whether to choose an index or not while executing a query.
Some situations under which you should update statistics:
1) If there is significant change in the key values in the index
2) If a large amount of data in an indexed column has been added, changed, or removed (that is, if the distribution of key values has changed), or the table has been truncated using the TRUNCATE TABLE statement and then repopulated
3) Database is upgraded from a previous version
Look up SQL Server books online for the following commands: UPDATE STATISTICS, STATS_DATE, DBCC SHOW_STATISTICS, CREATE STATISTICS, DROP STATISTICS, sp_autostats, sp_createstats, sp_updatestats
12. What are the different ways of moving data/databases between servers and databases in SQL Server?
There are lots of options available; you have to choose your option depending upon your requirements. Some of the options you have are: BACKUP/RESTORE, detaching and attaching databases, replication, DTS, BCP, logshipping, INSERT...SELECT, SELECT...INTO, creating INSERT scripts to generate data.
13. Explain different types of BACKUPs available in SQL Server? Given a particular scenario, how would you go about choosing a backup plan?
Types of backups you can create in SQL Server 7.0+ are full database backup, differential database backup, transaction log backup, and filegroup backup. Check out the BACKUP and RESTORE commands in SQL Server books online. Be prepared to write the commands in your interview. Books online also has information on detailed backup/restore architecture and when one should go for a particular kind of backup.
14. What is database replication? What are the different types of replication you can set up in SQL Server?
Replication is the process of copying/moving data between databases on the same or different servers. SQL Server supports the following types of replication scenarios:
o Snapshot replication
o Transactional replication (with immediate updating subscribers, with queued updating subscribers)
o Merge replication
See SQL Server books online for in-depth coverage on replication. Be prepared to explain how the different replication agents function, what the main system tables used in replication are, etc.
15. How to determine the service pack currently installed on SQL Server?
The global variable @@VERSION stores the version and build number of sqlservr.exe, which is used to determine the service pack installed.
What are cursors? Explain different types of cursors. What are the disadvantages of cursors? How can you avoid cursors?
Cursors allow row-by-row processing of the resultsets.
Types of cursors: Static, Dynamic, Forward-only, Keyset-driven. See books online for more information.
Disadvantages of cursors: Each time you fetch a row from the cursor, it results in a network roundtrip, whereas a normal SELECT query makes only one roundtrip, however large the resultset is. Cursors are also costly because they require more resources and temporary storage (resulting in more IO operations). Furthermore, there are restrictions on the SELECT statements that can be used with some types of cursors.
Most of the times, set based operations can be used instead of cursors. Here is an example:
If you have to give a flat hike to your employees using the following criteria:
Salary between 30000 and 40000 -- 5000 hike
Salary between 40000 and 55000 -- 7000 hike
Salary between 55000 and 65000 -- 9000 hike
In this situation many developers tend to use a cursor, determine each employee's salary and update his salary according to the above formula. But the same can be achieved by multiple update statements or can be combined in a single UPDATE statement as shown below:
UPDATE tbl_emp SET salary =
CASE WHEN salary BETWEEN 30000 AND 40000 THEN salary + 5000
WHEN salary BETWEEN 40000 AND 55000 THEN salary + 7000
WHEN salary BETWEEN 55000 AND 65000 THEN salary + 9000
END
Another situation in which developers tend to use cursors: You need to call a stored procedure when a column in a particular row meets certain condition. You don't have to use cursors for this. This can be achieved using WHILE loop, as long as there is a unique key to identify each row.
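A minimal sketch of that WHILE-loop pattern, assuming tbl_emp has a unique emp_id column and some_proc is the (hypothetical) stored procedure to call:

DECLARE @id INT
SELECT @id = MIN(emp_id) FROM tbl_emp
WHILE @id IS NOT NULL
BEGIN
-- Call the procedure only for rows meeting the condition
IF EXISTS (SELECT * FROM tbl_emp WHERE emp_id = @id AND salary > 60000)
EXEC some_proc @id
-- Step to the next row via the unique key
SELECT @id = MIN(emp_id) FROM tbl_emp WHERE emp_id > @id
END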
17. Write down the general syntax for a SELECT statements covering all the options
Here's the basic syntax (also check out SELECT in books online for advanced syntax):
SELECT select_list
[INTO new_table]
FROM table_source
[WHERE search_condition]
[GROUP BY group_by_expression]
[HAVING search_condition]
[ORDER BY order_expression [ASC | DESC] ]
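And a concrete query exercising those clauses against the hypothetical tbl_emp table used earlier (assuming it also has dept_id and hire_date columns):

SELECT dept_id, AVG(salary) AS avg_salary
FROM tbl_emp
WHERE hire_date >= '2000-01-01'
GROUP BY dept_id
HAVING AVG(salary) > 30000
ORDER BY avg_salary DESC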
18. What is a join and explain different types of joins
Joins are used in queries to explain how different tables are related. Joins also let you select data from a table depending upon data from another table.
Types of joins: INNER JOINs, OUTER JOINs, CROSS JOINs. OUTER JOINs are further classified as LEFT OUTER JOINs, RIGHT OUTER JOINs and FULL OUTER JOINs.
19. Can you have a nested transaction?
Yes, very much. Check out BEGIN TRAN, COMMIT, ROLLBACK, SAVE TRAN and @@TRANCOUNT
20. What is an extended stored procedure? Can you instantiate a COM object by using T-SQL?
An extended stored procedure is a function within a DLL (written in a programming language like C or C++ using the Open Data Services (ODS) API) that can be called from T-SQL, just the way we call normal stored procedures using the EXEC statement. See books online to learn how to create extended stored procedures and how to add them to SQL Server.
Yes, you can instantiate a COM (written in languages like VB, VC++) object from T-SQL by using sp_OACreate stored procedure. Also see books online for sp_OAMethod, sp_OAGetProperty, sp_OASetProperty, sp_OADestroy.
21. What is the system function to get the current user's user id?
USER_ID(). Also check out other system functions like USER_NAME(), SYSTEM_USER, SESSION_USER, CURRENT_USER, USER, SUSER_SID(), HOST_NAME().
22. What are triggers? How many triggers you can have on a table? How to invoke a trigger on demand?
Triggers are special kind of stored procedures that get executed automatically when an INSERT, UPDATE or DELETE operation takes place on a table.
In SQL Server 6.5 you could define only 3 triggers per table, one for INSERT, one for UPDATE and one for DELETE. From SQL Server 7.0 onwards, this restriction is gone, and you can create multiple triggers for each action. But in 7.0 there's no way to control the order in which the triggers fire. In SQL Server 2000 you can specify which trigger fires first or fires last using sp_settriggerorder.
Triggers can't be invoked on demand. They get triggered only when an associated action (INSERT, UPDATE, DELETE) happens on the table on which they are defined.
Triggers are generally used to implement business rules, auditing. Triggers can also be used to extend the referential integrity checks, but wherever possible, use constraints for this purpose, instead of triggers, as constraints are much faster.
Till SQL Server 7.0, triggers fire only after the data modification operation happens. So in a way, they are called post triggers. But in SQL Server 2000 you could create pre triggers also. Search SQL Server 2000 books online for INSTEAD OF triggers.
23. There is a trigger defined for INSERT operations on a table, in an OLTP system. The trigger is written to instantiate a COM object and pass the newly inserted rows to it for some custom processing. What do you think of this implementation? Can this be implemented better?
Instantiating COM objects is a time consuming process and since you are doing it from within a trigger, it slows down the data insertion process. Same is the case with sending emails from triggers. This scenario can be better implemented by logging all the necessary data into a separate table, and have a job which periodically checks this table and does the needful.
24. What is a self join? Explain it with an example
A self join is just like any other join, except that two instances of the same table are joined in the query. Here is an example: an Employees table contains rows for normal employees as well as managers. So, to find out the managers of all the employees, you need a self join.
CREATE TABLE emp
(
empid int,
mgrid int,
empname char(10)
)
INSERT emp SELECT 1,2,'Vyas'
INSERT emp SELECT 2,3,'Mohan'
INSERT emp SELECT 3,NULL,'Shobha'
INSERT emp SELECT 4,2,'Shridhar'
INSERT emp SELECT 5,2,'Sourabh'
SELECT t1.empname [Employee], t2.empname [Manager]
FROM emp t1, emp t2
WHERE t1.mgrid = t2.empid
Here's an advanced query using a LEFT OUTER JOIN that even returns the employees without managers (super bosses)
SELECT t1.empname [Employee], COALESCE(t2.empname, 'No manager') [Manager]
FROM emp t1
LEFT OUTER JOIN
emp t2
ON
t1.mgrid = t2.empid
SQL:
2. What's the difference between a primary key and a unique key?
Both primary key and unique enforce uniqueness of the column on which they are defined. But by default primary key creates a clustered index on the column, where are unique creates a nonclustered index by default. Another major difference is that, primary key doesn't allow NULLs, but nique key allows one NULL only.
3. What are user defined datatypes and when you should go for them?
User defined datatypes let you extend the base SQL Server datatypes by providing a descriptive name, and format to the database. Take for example, in your database, there is a column called Flight_Num which appears in many tables. In all these tables it should be varchar(8). In this case you could create a user defined datatype called Flight_num_type of varchar(8) and use it across all your tables.
4. What is bit datatype and what's the information that can be stored inside a bit column?
Bit datatype is used to store boolean information like 1 or 0 (true or false). Until SQL Server 6.5 the bit datatype could hold either a 1 or 0 and there was no support for NULL. But from SQL Server 7.0 onwards, the bit datatype can represent a third state, which is NULL.
5. Define candidate key, alternate key, composite key
A candidate key is one that can identify each row of a table uniquely. Generally a candidate key becomes the primary key of the table. If the table has more than one candidate key, one of them will become the primary key, and the rest are called alternate keys.
A key formed by combining two or more columns is called a composite key.
6. What are defaults? Is there a column to which a default can't be bound?
A default is a value that will be used by a column if no value is supplied to that column while inserting data. IDENTITY columns and timestamp columns can't have defaults bound to them. See CREATE DEFAULT in books online.
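As a sketch, a default can be created and bound to a column like this (dflt_country and customers.country are hypothetical names):
CREATE DEFAULT dflt_country AS 'USA'
GO
-- bind the default to a column; it applies whenever no value is supplied on INSERT
EXEC sp_bindefault 'dflt_country', 'customers.country'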
7. What is a transaction and what are ACID properties?
A transaction is a logical unit of work in which, all the steps must be performed or none. ACID stands for Atomicity, Consistency, Isolation, Durability. These are the properties of a transaction. For more information and explanation of these properties, see SQL Server books online or any RDBMS fundamentals text book.
8. Explain different isolation levels
An isolation level determines the degree of isolation of data between concurrent transactions. The default SQL Server isolation level is Read Committed. Here are the other isolation levels (in the ascending order of isolation): Read Uncommitted, Read Committed, Repeatable Read, Serializable. See SQL Server books online for an explanation of the isolation levels. Be sure to read about SET TRANSACTION ISOLATION LEVEL, which lets you customize the isolation level at the connection level.
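For example, to raise the isolation level for the current connection (a sketch against a hypothetical accounts table):
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRAN
-- shared locks are now held until the transaction ends, so rows read
-- here cannot be changed by other connections before the COMMIT
SELECT balance FROM accounts WHERE account_id = 1
COMMIT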
9. CREATE INDEX myIndex ON myTable(myColumn)
What type of Index will get created after executing the above statement?
Non-clustered index. Important thing to note: By default a clustered index gets created on the primary key, unless specified otherwise.
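To create a clustered index explicitly, specify the CLUSTERED keyword:
-- nonclustered by default
CREATE INDEX myIndex ON myTable(myColumn)
-- explicitly clustered (only one allowed per table)
CREATE CLUSTERED INDEX myClusteredIndex ON myTable(myColumn)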
10. What's the maximum size of a row?
8060 bytes. Don't be surprised with questions like 'what is the maximum number of columns per table'. Check out SQL Server books online for the page titled: "Maximum Capacity Specifications".
11. Explain Active/Active and Active/Passive cluster configurations
Hopefully you have experience setting up cluster servers. But if you don't, at least be familiar with the way clustering works and the two clustering configurations Active/Active and Active/Passive. SQL Server books online has enough information on this topic and there is a good white paper available on Microsoft site.
12. Explain the architecture of SQL Server
This is a very important question and you had better be able to answer it if you consider yourself a DBA. SQL Server books online is the best place to read about SQL Server architecture. Read the chapter dedicated to SQL Server Architecture.
13. What is lock escalation?
Lock escalation is the process of converting many low level locks (like row locks, page locks) into higher level locks (like table locks). Every lock is a memory structure, so too many locks would mean more memory being occupied by locks. To prevent this from happening, SQL Server escalates the many fine-grain locks to fewer coarse-grain locks. The lock escalation threshold was definable in SQL Server 6.5, but from SQL Server 7.0 onwards it's dynamically managed by SQL Server.
14. What's the difference between DELETE TABLE and TRUNCATE TABLE commands?
DELETE TABLE is a logged operation, so the deletion of each row gets logged in the transaction log, which makes it slow. TRUNCATE TABLE also deletes all the rows in a table, but it won't log the deletion of each row, instead it logs the deallocation of the data pages of the table, which makes it faster. Of course, TRUNCATE TABLE can be rolled back.
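A quick comparison (mytable is hypothetical):
-- logged per row, supports a WHERE clause, fires DELETE triggers
DELETE FROM mytable WHERE status = 'closed'
-- deallocates pages, takes no WHERE clause, resets any IDENTITY seed
TRUNCATE TABLE mytable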
15. Explain the storage models of OLAP
Check out MOLAP, ROLAP and HOLAP in SQL Server books online for more information.
16. What are the new features introduced in SQL Server 2000 (or the latest release of SQL Server at the time of your interview)? What changed between the previous version of SQL Server and the current version?
This question is generally asked to see how current your knowledge is. Generally there is a section in the beginning of the books online titled "What's New", which has all such information. Of course, reading just that is not enough; you should have tried those things out to better answer the questions. Also check out the section titled "Backward Compatibility" in books online, which talks about the changes that have taken place in the new version.
1. What are constraints? Explain different types of constraints
Constraints enable the RDBMS to enforce the integrity of the database automatically, without needing you to create triggers, rules or defaults.
Types of constraints: NOT NULL, CHECK, UNIQUE, PRIMARY KEY, FOREIGN KEY
For an explanation of these constraints see books online for the pages titled: "Constraints" and "CREATE TABLE", "ALTER TABLE"
2. What is an index? What are the types of indexes? How many clustered indexes can be created on a table? I create a separate index on each column of a table. what are the advantages and disadvantages of this approach?
Indexes in SQL Server are similar to the indexes in books. They help SQL Server retrieve the data quicker.
Indexes are of two types: clustered indexes and non-clustered indexes. When you create a clustered index on a table, all the rows in the table are stored in the order of the clustered index key. So there can be only one clustered index per table. Non-clustered indexes have their own storage separate from the table data storage. Non-clustered indexes are stored as B-tree structures (as are clustered indexes), with the leaf level nodes holding the index key and its row locator. The row locator could be the RID or the clustered index key, depending upon the absence or presence of a clustered index on the table.
If you create an index on each column of a table, it improves the query performance, as the query optimizer can choose from all the existing indexes to come up with an efficient execution plan. At the same time, data modification operations (such as INSERT, UPDATE, DELETE) will become slow, as every time data changes in the table, all the indexes need to be updated. Another disadvantage is that indexes need disk space: the more indexes you have, the more disk space is used.
3. What is RAID and what are different types of RAID configurations?
RAID stands for Redundant Array of Inexpensive Disks, used to provide fault tolerance to database servers. There are six RAID levels, 0 through 5, offering different levels of performance and fault tolerance. MSDN has some information about RAID levels, and for detailed information check out the RAID advisory board's homepage.
4. What are the steps you will take to improve performance of a poor performing query?
This is a very open ended question and there could be a lot of reasons behind the poor performance of a query. But some general issues that you could talk about would be: No indexes, table scans, missing or out of date statistics, blocking, excess recompilations of stored procedures, procedures and triggers without SET NOCOUNT ON, poorly written query with unnecessarily complicated joins, too much normalization, excess usage of cursors and temporary tables.
Some of the tools/ways that help you troubleshooting performance problems are: SET SHOWPLAN_ALL ON, SET SHOWPLAN_TEXT ON, SET STATISTICS IO ON, SQL Server Profiler, Windows NT /2000 Performance monitor, Graphical execution plan in Query Analyzer.
Download the white paper on performance tuning SQL Server from Microsoft web site. Don't forget to check out sql-server-performance.com
5. What are the steps you will take, if you are tasked with securing an SQL Server?
Again this is another open ended question. Here are some things you could talk about: Preferring NT authentication, using server, database and application roles to control access to the data, securing the physical database files using NTFS permissions, using an unguessable SA password, restricting physical access to the SQL Server, renaming the Administrator account on the SQL Server computer, disabling the Guest account, enabling auditing, using multiprotocol encryption, setting up SSL, setting up firewalls, isolating SQL Server from the web server etc.
6. What is a deadlock and what is a live lock? How will you go about resolving deadlocks?
Deadlock is a situation when two processes, each having a lock on one piece of data, attempt to acquire a lock on the other's piece. Each process would wait indefinitely for the other to release the lock, unless one of the user processes is terminated. SQL Server detects deadlocks and terminates one user's process.
A livelock is one where a request for an exclusive lock is repeatedly denied because a series of overlapping shared locks keeps interfering. SQL Server detects the situation after four denials and refuses further shared locks. A livelock also occurs when read transactions monopolize a table or page, forcing a write transaction to wait indefinitely.
7. What is blocking and how would you troubleshoot it?
Blocking happens when one connection from an application holds a lock and a second connection requires a conflicting lock type. This forces the second connection to wait, blocked on the first.
Read up the following topics in SQL Server books online: Understanding and avoiding blocking, Coding efficient transactions.
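As a starting point, sp_who can reveal which connection is doing the blocking; the blk column in its output shows the spid of the blocker:
EXEC sp_who
-- or inspect the sysprocesses system table directly (SQL Server 2000):
SELECT spid, blocked, waittime FROM master..sysprocesses WHERE blocked <> 0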
8. Explain CREATE DATABASE syntax
Many of us are used to creating databases from the Enterprise Manager or by just issuing the command CREATE DATABASE MyDB. But what if you have to create a database with two filegroups, one on drive C and the other on drive D, with the log on drive E, with an initial size of 600 MB and a growth factor of 15%? That's why, as a DBA, you should be familiar with the CREATE DATABASE syntax. Check out SQL Server books online for more information.
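Here is a sketch of such a command; all file names, paths and sizes are hypothetical:
CREATE DATABASE MyDB
ON PRIMARY
( NAME = MyDB_data1, FILENAME = 'C:\data\MyDB_data1.mdf',
  SIZE = 300MB, FILEGROWTH = 15% ),
FILEGROUP FG1
( NAME = MyDB_data2, FILENAME = 'D:\data\MyDB_data2.ndf',
  SIZE = 300MB, FILEGROWTH = 15% )
LOG ON
( NAME = MyDB_log, FILENAME = 'E:\logs\MyDB_log.ldf',
  SIZE = 100MB, FILEGROWTH = 15% )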
9. How to restart SQL Server in single user mode? How to start SQL Server in minimal configuration mode?
SQL Server can be started from the command line, using SQLSERVR.EXE. This EXE has some very important parameters with which a DBA should be familiar. -m is used for starting SQL Server in single user mode, and -f is used to start SQL Server in minimal configuration mode. Check out SQL Server books online for more parameters and their explanations.
10. As a part of your job, what are the DBCC commands that you commonly use for database maintenance?
DBCC CHECKDB, DBCC CHECKTABLE, DBCC CHECKCATALOG, DBCC CHECKALLOC, DBCC SHOWCONTIG, DBCC SHRINKDATABASE, DBCC SHRINKFILE etc. But there are a whole load of DBCC commands which are very useful for DBAs. Check out SQL Server books online for more information.
11. What are statistics, under what circumstances they go out of date, how do you update them?
Statistics determine the selectivity of the indexes. If an indexed column has unique values then the selectivity of that index is high, as opposed to an index with non-unique values. The query optimizer uses these statistics in deciding whether or not to choose an index while executing a query.
Some situations under which you should update statistics:
1) If there is significant change in the key values in the index
2) If a large amount of data in an indexed column has been added, changed, or removed (that is, if the distribution of key values has changed), or the table has been truncated using the TRUNCATE TABLE statement and then repopulated
3) Database is upgraded from a previous version
Look up SQL Server books online for the following commands: UPDATE STATISTICS, STATS_DATE, DBCC SHOW_STATISTICS, CREATE STATISTICS, DROP STATISTICS, sp_autostats, sp_createstats, sp_updatestats
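For example (titles is a hypothetical table):
-- rebuild statistics by scanning all rows
UPDATE STATISTICS titles WITH FULLSCAN
-- check when the statistics on index id 1 were last updated
SELECT STATS_DATE(OBJECT_ID('titles'), 1)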
12. What are the different ways of moving data/databases between servers and databases in SQL Server?
There are lots of options available; you have to choose your option depending upon your requirements. Some of the options you have are: BACKUP/RESTORE, detaching and attaching databases, replication, DTS, BCP, log shipping, INSERT...SELECT, SELECT...INTO, creating INSERT scripts to generate data.
13. Explain different types of BACKUPs available in SQL Server? Given a particular scenario, how would you go about choosing a backup plan?
Types of backups you can create in SQL Server 7.0+ are full database backup, differential database backup, transaction log backup and filegroup backup. Check out the BACKUP and RESTORE commands in SQL Server books online. Be prepared to write the commands in your interview. Books online also has information on detailed backup/restore architecture and on when one should go for a particular kind of backup.
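The basic commands look like this (the database name and paths are hypothetical):
BACKUP DATABASE mydb TO DISK = 'D:\backup\mydb_full.bak'                   -- full backup
BACKUP DATABASE mydb TO DISK = 'D:\backup\mydb_diff.bak' WITH DIFFERENTIAL -- differential
BACKUP LOG mydb TO DISK = 'D:\backup\mydb_log.trn'                         -- transaction log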
14. What is database replication? What are the different types of replication you can set up in SQL Server?
Replication is the process of copying/moving data between databases on the same or different servers. SQL Server supports the following types of replication scenarios:
o Snapshot replication
o Transactional replication (with immediate updating subscribers, with queued updating subscribers)
o Merge replication
See SQL Server books online for in-depth coverage of replication. Be prepared to explain how the different replication agents function, what the main system tables used in replication are, etc.
15. How to determine the service pack currently installed on SQL Server?
The global variable @@VERSION stores the build number of sqlservr.exe, which is used to determine the service pack installed.
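For example:
SELECT @@VERSION -- full version string, including the build number
-- SQL Server 2000 also exposes the service pack level directly:
SELECT SERVERPROPERTY('ProductLevel')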
16. What are cursors? Explain different types of cursors. What are the disadvantages of cursors? How can you avoid cursors?
Cursors allow row-by-row processing of the resultsets.
Types of cursors: Static, Dynamic, Forward-only, Keyset-driven. See books online for more information.
Disadvantages of cursors: Each time you fetch a row from the cursor, it results in a network roundtrip, whereas a normal SELECT query makes only one roundtrip, however large the resultset is. Cursors are also costly because they require more resources and temporary storage (resulting in more IO operations). Further, there are restrictions on the SELECT statements that can be used with some types of cursors.
Most of the times, set based operations can be used instead of cursors. Here is an example:
If you have to give a flat hike to your employees using the following criteria:
Salary between 30000 and 40000 -- 5000 hike
Salary between 40000 and 55000 -- 7000 hike
Salary between 55000 and 65000 -- 9000 hike
In this situation many developers tend to use a cursor, determine each employee's salary and update his salary according to the above formula. But the same can be achieved by multiple update statements or can be combined in a single UPDATE statement as shown below:
UPDATE tbl_emp SET salary =
CASE WHEN salary BETWEEN 30000 AND 40000 THEN salary + 5000
WHEN salary BETWEEN 40000 AND 55000 THEN salary + 7000
WHEN salary BETWEEN 55000 AND 65000 THEN salary + 9000
ELSE salary -- without this, salaries outside the ranges would be set to NULL
END
Another situation in which developers tend to use cursors: You need to call a stored procedure when a column in a particular row meets certain condition. You don't have to use cursors for this. This can be achieved using WHILE loop, as long as there is a unique key to identify each row.
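A minimal sketch of this pattern, assuming a hypothetical table tbl_emp with a unique empid column and a hypothetical procedure usp_process_emp:
DECLARE @id int
SELECT @id = MIN(empid) FROM tbl_emp
WHILE @id IS NOT NULL
BEGIN
    -- act on the current row if it meets the condition
    IF EXISTS (SELECT * FROM tbl_emp WHERE empid = @id AND salary > 65000)
        EXEC usp_process_emp @id
    -- move on to the next key value
    SELECT @id = MIN(empid) FROM tbl_emp WHERE empid > @id
END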
17. Write down the general syntax for a SELECT statements covering all the options
Here's the basic syntax. (Also check out SELECT in books online for advanced syntax.)
SELECT select_list
[INTO new_table]
FROM table_source
[WHERE search_condition]
[GROUP BY group_by_expression]
[HAVING search_condition]
[ORDER BY order_expression [ASC | DESC] ]
18. What is a join and explain different types of joins
Joins are used in queries to explain how different tables are related. Joins also let you select data from a table depending upon data from another table.
Types of joins: INNER JOINs, OUTER JOINs, CROSS JOINs. OUTER JOINs are further classified as LEFT OUTER JOINs, RIGHT OUTER JOINs and FULL OUTER JOINs.
19. Can you have a nested transaction?
Yes, very much. Check out BEGIN TRAN, COMMIT, ROLLBACK, SAVE TRAN and @@TRANCOUNT.
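A quick demonstration of how @@TRANCOUNT behaves with nested transactions:
BEGIN TRAN
SELECT @@TRANCOUNT -- returns 1
BEGIN TRAN
SELECT @@TRANCOUNT -- returns 2
COMMIT             -- the inner COMMIT only decrements the counter
SELECT @@TRANCOUNT -- returns 1
COMMIT             -- the outermost COMMIT makes the work permanent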
20. What is an extended stored procedure? Can you instantiate a COM object by using T-SQL?
An extended stored procedure is a function within a DLL (written in a programming language like C or C++ using the Open Data Services (ODS) API) that can be called from T-SQL, just the way we call normal stored procedures using the EXEC statement. See books online to learn how to create extended stored procedures and how to add them to SQL Server.
Yes, you can instantiate a COM object (written in languages like VB or VC++) from T-SQL by using the sp_OACreate stored procedure. Also see books online for sp_OAMethod, sp_OAGetProperty, sp_OASetProperty, sp_OADestroy.
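A hedged sketch (the ProgID shown is just an example):
DECLARE @object int, @hr int
-- instantiate the COM object and get back an object token
EXEC @hr = sp_OACreate 'SQLDMO.SQLServer', @object OUT
IF @hr <> 0 PRINT 'sp_OACreate failed'
-- ... call sp_OAMethod / sp_OAGetProperty on @object here ...
EXEC sp_OADestroy @object -- always release the object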
21. What is the system function to get the current user's user id?
USER_ID(). Also check out other system functions like USER_NAME(), SYSTEM_USER, SESSION_USER, CURRENT_USER, USER, SUSER_SID(), HOST_NAME().
DWH Basics
1. Snowflake and Star Schema?
If the table space is not enough to maintain a star schema, go for a snowflake schema instead; i.e., the table should be split into multiple tables. For example, if you maintain time data (year, month, day) in one table in a star schema, you would split this data into separate year, month and day tables in a snowflake schema.
In a star schema, when we try to access many attributes from a single dimension table, the performance of the query falls. So we normalize this dimension table into two or more sub-dimensions, and the star schema is transformed into a snowflake schema.
By doing so, performance improves for such queries: a star schema is good for simple queries and logic, while a snowflake schema is better suited to complex queries and logic. The snowflake schema is nothing but an extension of the star schema, in which the dimension tables are further normalized to reduce redundancy.
4. What is ETL?
Data acquisition is now called ETL (Extraction, Transformation and Loading).
Extraction - the process of extracting the data from various sources. Sources can be file systems, databases, XML files, COBOL files, ERP systems, etc.
Transformation - transforming the data (in the staging area) as per the business logic or requirements.
Loading - loading the transformed or untransformed data into the target.
5. Explain ODS and ODS types?
An ODS (Operational Data Store) is designed to support operational monitoring. It is a subject-oriented, integrated database which holds current, detailed data. Data here is volatile.
6. Can you convert a snowflake schema into a star schema?
Yes, but the two models are designed for different purposes. With data modelling we can decide which model is best suited for our data warehouse, and if a need arises in the future we may change the entire model of the data warehouse. It is a painful task, but a possible one.
Star -----> Snowflake, and vice versa, are both possible.
In a star schema, when we try to access many attributes from a single dimension table, query performance falls, so we normalize the dimension table into sub-dimensions and the star schema becomes a snowflake schema. The reverse process is also possible.
OLTP:
Current data
Short database transactions
Online update/insert/delete
Normalization is promoted
High volume transactions
Transaction recovery is necessary
OLAP:
Current and historical data
Long database transactions
Batch update/insert/delete
Denormalization is promoted
Low volume transactions
Transaction recovery is not necessary
20. Various ER Modeling?
ER modeling is the first step for any Database project like Oracle, DB2.
1. Conceptual Modeling
2. Logical Modeling
3. Physical Modeling
21. Snowflake schema?
Snowflake schemas normalize dimensions to eliminate redundancy. That is, the dimension data has been grouped into multiple tables instead of one large table. For example, a product dimension table in a star schema might be normalized into a products table, a product_category table, and a product_manufacturer table in a snowflake schema. While this saves space, it increases the number of dimension tables and requires more foreign key joins. The result is more complex queries and reduced query performance.
The snowflake schema is an extension of the star schema, where each point of the star explodes into more points.
The advantage usually cited for the snowflake schema is minimized disk storage requirements and smaller lookup tables to join.
The disadvantage of the snowflake schema is the additional maintenance effort needed due to the increased number of lookup tables (and, as noted above, the extra joins can hurt query performance).
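As an illustration of that normalization, here is a hedged sketch of the product dimension in both forms (all names are hypothetical):
-- star schema: one denormalized product dimension
CREATE TABLE product_dim (
product_id int,
product_name varchar(50),
category_name varchar(50),
manufacturer_name varchar(50)
);
-- snowflake schema: category and manufacturer split into lookup tables
CREATE TABLE product_category (category_id int, category_name varchar(50));
CREATE TABLE product_manufacturer (manufacturer_id int, manufacturer_name varchar(50));
CREATE TABLE products (
product_id int,
product_name varchar(50),
category_id int,     -- foreign key to product_category
manufacturer_id int  -- foreign key to product_manufacturer
);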
23.What is Data Mart?
Data Mart: a data mart is a small data warehouse. In general, a data warehouse is divided into small units according to the business requirements; for example, the data warehouse of an organization may be divided into the following individual data marts. Data marts are used to improve performance during the retrieval of data.
e.g., Data Mart of Sales, Data Mart of Finance, Data Mart of Marketing, Data Mart of HR, etc.
A data mart is a subset of the data warehouse; it analyzes data for one particular department or from one particular point of view.
24. What is DWH?
A data warehouse is a repository containing a subject-oriented, integrated, time-variant and non-volatile collection of data, used for a company's decision support requirements.
25. Architectures of DWH?
There are two main design approaches:
1. Top down - (Bill Inmon)
2. Bottom up - (Ralph Kimball)
There are three types of architectures.
Data warehouse basic architecture:
In this architecture end users access data that is derived from several sources through the data warehouse.
Architecture: Source --> Warehouse --> End Users
Data warehouse with staging area architecture:
Whenever the data derived from the sources needs to be cleaned and processed before being put into the warehouse, a staging area is used.
Architecture: Source --> Staging Area -->Warehouse --> End Users
Data warehouse with staging area and data marts architecture:
When the warehouse architecture must be customized for different groups in the organization, data marts are added and used.
Architecture: Source --> Staging Area --> Warehouse --> Data Marts --> End Users
26. Life Cycle of DWH?
STRATEGY & PROJECT PLANNING
Definition of scope, goals, objectives & purpose, and expectations
Establishment of implementation strategy
Preliminary identification of project resources
Assembling of project team
Estimation of project schedule
REQUIREMENTS DEFINITION
Definition of requirements gathering strategy
Conducting of interviews and group sessions with users
Review of existing documents
Study of source operational systems
Derivation of business metrics and dimensions needed for analysis
ANALYSIS & DESIGN
Design of the logical data model
Definition of data extraction, transformation, and loading functions
Design of the information delivery framework
Establishment of storage requirements
Definitions of the overall architecture and supporting infrastructure
CONSTRUCTION
Selection and installation of infrastructure hardware and software
Selection and installation of the DBMS
Selection and installation of ETL and information delivery tools
Completion of the design of the physical data model
Completion of the metadata component
DEPLOYMENT
Completion of user acceptance tests
Performance of initial data loads
Making user desktops ready for the data warehouse
Training and support for the initial set of users
Provision for data warehouse security, backup, and recovery
MAINTENANCE
Ongoing monitoring of the data warehouse
Continuous performance tuning
Ongoing user training
Provision of ongoing support to users
Ongoing data warehouse management
SQL:
What is SQL and where does it come from?
Structured Query Language (SQL) is a language that provides an interface to relational database systems. SQL was developed by IBM in the 1970s for use in System R, and is a de facto standard, as well as an ISO and ANSI standard. SQL is often pronounced SEQUEL.
In common usage SQL also encompasses DML (Data Manipulation Language), for INSERTs, UPDATEs, DELETEs and DDL (Data Definition Language), used for creating and modifying tables and other database structures.
The development of SQL is governed by standards. A major revision to the SQL standard was completed in 1992, called SQL2. SQL3 supports object extensions and is (at least partially) implemented in Oracle 8 and 9.
What are the difference between DDL, DML and DCL commands?
DDL is Data Definition Language statements. Some examples:
- CREATE - to create objects in the database
- ALTER - alters the structure of the database
- DROP - delete objects from the database
- TRUNCATE - removes all records from a table; the space allocated for the records is also removed
- COMMENT - add comments to the data dictionary
- GRANT - gives users access privileges to the database
- REVOKE - withdraw access privileges given with the GRANT command
DML is Data Manipulation Language statements. Some examples:
- SELECT - retrieve data from a database
- INSERT - insert data into a table
- UPDATE - updates existing data within a table
- DELETE - deletes records from a table (all rows if no WHERE clause is given); the space for the records remains
- CALL - call a PL/SQL or Java subprogram
- EXPLAIN PLAN - explain access path to data
- LOCK TABLE - control concurrency
DCL is Data Control Language statements. Some examples:
- COMMIT - save work done
- SAVEPOINT - identify a point in a transaction to which you can later roll back
- ROLLBACK - restores the database to its state as of the last COMMIT
- SET TRANSACTION - Change transaction options like what rollback segment to use
How does one escape special characters when building SQL queries?
The LIKE keyword allows for string searches. The '_' wild card character is used to match exactly one character, '%' is used to match zero or more occurrences of any characters. These characters can be escaped in SQL. Example:
SELECT name FROM emp WHERE id LIKE '%\_%' ESCAPE '\';
Use two quotes for every one displayed. Example:
SELECT 'Frank''s Oracle site' FROM DUAL;
SELECT 'A ''quoted'' word.' FROM DUAL;
SELECT 'A ''''double quoted'''' word.' FROM DUAL;
How does one eliminate duplicate rows from a table?
Choose one of the following queries to identify or remove duplicate rows from a table leaving only unique records in the table:
Method 1:
SQL> DELETE FROM table_name A WHERE ROWID > (
2 SELECT min(rowid) FROM table_name B
3 WHERE A.key_values = B.key_values);
Method 2:
SQL> create table table_name2 as select distinct * from table_name1;
SQL> drop table table_name1;
SQL> rename table_name2 to table_name1;
SQL> -- Remember to recreate all indexes, constraints, triggers, etc on table...
How does one add a day/hour/minute/second to a date value?
The SYSDATE pseudo-column shows the current system date and time. Adding 1 to SYSDATE will advance the date by 1 day. Use fractions to add hours, minutes or seconds to the date. Look at these examples:
SQL> select sysdate, sysdate+1/24, sysdate +1/1440, sysdate + 1/86400 from dual;
SYSDATE SYSDATE+1/24 SYSDATE+1/1440 SYSDATE+1/86400
-------------------- -------------------- -------------------- --------------------
03-Jul-2002 08:32:12 03-Jul-2002 09:32:12 03-Jul-2002 08:33:12 03-Jul-2002 08:32:13
The following format is frequently used with Oracle Replication:
select sysdate NOW, sysdate+30/(24*60*60) NOW_PLUS_30_SECS from dual;
NOW NOW_PLUS_30_SECS
-------------------- --------------------
03-JUL-2002 16:47:23 03-JUL-2002 16:47:53
How does one count different data values in a column?
Use this simple query to count the occurrences of each value in a column:
select my_table_column, count(*)
from my_table
group by my_table_column;
A more sophisticated example...
select dept, sum( decode(sex,'M',1,0)) MALE,
sum( decode(sex,'F',1,0)) FEMALE,
count(decode(sex,'M',1,'F',1)) TOTAL
from my_emp_table
group by dept;
How does one count/sum RANGES of data values in a column?
A value x will be between values y and z if GREATEST(x, y) = LEAST(x, z). Look at this example:
select f2,
sum(decode(greatest(f1,60), least(f1,100), 1, 0)) "Range 60-100",
sum(decode(greatest(f1,30), least(f1, 59), 1, 0)) "Range 30-59",
sum(decode(greatest(f1, 0), least(f1, 29), 1, 0)) "Range 00-29"
from my_table
group by f2;
For equal size ranges it might be easier to calculate it with DECODE(TRUNC(value/range), 0, rate_0, 1, rate_1, ...). Eg.
select ename "Name", sal "Salary",
decode( trunc(sal/1000, 0), 0, 0.0,
1, 0.1,
2, 0.2,
3, 0.31) "Tax rate"
from my_table;
Can one retrieve only rows X to Y from a table?
SELECT * FROM (
SELECT ENAME,ROWNUM RN FROM EMP WHERE ROWNUM < 101
) WHERE RN between 91 and 100 ;
Note: the 101 is just one greater than the maximum row of the required rows (here x = 91, y = 100, so the inner value is y + 1).
Another solution is to use the MINUS operation. For example, to display rows 5 to 7, construct a query like this:
SELECT *
FROM tableX
WHERE rowid in (
SELECT rowid FROM tableX
WHERE rownum <= 7
MINUS
SELECT rowid FROM tableX
WHERE rownum < 5);
Please note, there is no explicit row order in a relational database. However, this query is quite fun and may even help in the odd situation.
How does one select EVERY Nth row from a table?
One can easily select all even, odd, or Nth rows from a table using SQL queries like this:
Method 1: Using a subquery
SELECT *
FROM emp
WHERE (ROWID,0) IN (SELECT ROWID, MOD(ROWNUM,4)
FROM emp);
Method 2: Use inline views (available from Oracle 7.2):
SELECT *
FROM ( SELECT rownum rn, empno, ename
FROM emp
) temp
WHERE MOD(temp.rn,4) = 0;
Please note, there is no explicit row order in a relational database. However, these queries are quite fun and may even help in the odd situation.
How does one select the TOP N rows from a table?
From Oracle8i one can have an inner query with an ORDER BY clause. Look at this example:
SELECT *
FROM (SELECT * FROM my_table ORDER BY col_name_1 DESC)
WHERE ROWNUM < 10;
Use this workaround with prior releases:
SELECT *
FROM my_table a
WHERE 10 >= (SELECT COUNT(DISTINCT maxcol)
FROM my_table b
WHERE b.maxcol >= a.maxcol)
ORDER BY maxcol DESC;
How does one code a tree-structured query?
Tree-structured queries are definitely non-relational (enough to kill Codd and make him roll in his grave). Also, this feature is not often found in other database offerings.
The SCOTT/TIGER database schema contains a table EMP with a self-referencing relation (EMPNO and MGR columns). This table is perfect for testing and demonstrating tree-structured queries, as the MGR column contains the employee number of the "current" employee's boss.
The LEVEL pseudo-column is an indication of how deep in the tree one is. Oracle can handle queries with a depth of up to 255 levels. Look at this example:
select LEVEL, EMPNO, ENAME, MGR
from EMP
connect by prior EMPNO = MGR
start with MGR is NULL;
One can produce an indented report by using the level number to substring or lpad() a series of spaces, and concatenate that to the string. Look at this example:
select lpad(' ', LEVEL * 2) || ENAME ........
One uses the "start with" clause to specify the start of the tree. More than one record can match the starting condition. One disadvantage of having a "connect by prior" clause is that you cannot perform a join to other tables. The "connect by prior" clause is rarely implemented in the other database offerings. Trying to do this programmatically is difficult as one has to do the top level query first, then, for each of the records open a cursor to look for child nodes.
One way of working around this is to use PL/SQL: open the driving cursor with the "connect by prior" statement, and then select the matching records from other tables on a row-by-row basis, inserting the results into a temporary table for later retrieval.
How does one code a matrix report in SQL?
Look at this example query with sample output:
SELECT *
FROM (SELECT job,
sum(decode(deptno,10,sal)) DEPT10,
sum(decode(deptno,20,sal)) DEPT20,
sum(decode(deptno,30,sal)) DEPT30,
sum(decode(deptno,40,sal)) DEPT40
FROM scott.emp
GROUP BY job)
ORDER BY 1;
JOB DEPT10 DEPT20 DEPT30 DEPT40
--------- ---------- ---------- ---------- ----------
ANALYST 6000
CLERK 1300 1900 950
MANAGER 2450 2975 2850
PRESIDENT 5000
How does one implement IF-THEN-ELSE in a select statement?
The Oracle decode function acts like a procedural statement inside an SQL statement to return different values or columns based on the values of other columns in the select statement.
Some examples:
select decode(sex, 'M', 'Male', 'F', 'Female', 'Unknown')
from employees;
select a, b, decode( abs(a-b), 0, 'a = b',
                               a-b, 'a > b',
                               'a < b') from tableX;
select decode( GREATEST(A,B), A, 'A is greater OR EQUAL than B', 'B is greater than A')...
select decode( GREATEST(A,B),
A, decode(A, B, 'A NOT GREATER THAN B', 'A GREATER THAN B'),
'A NOT GREATER THAN B')...
Note: The decode function is not ANSI SQL and is rarely implemented in other RDBMS offerings. It is one of the good things about Oracle, but use it sparingly if portability is required.
From Oracle 8i one can also use CASE statements in SQL. Look at this example:
SELECT ename, CASE WHEN sal>1000 THEN 'Over paid' ELSE 'Under paid' END
FROM emp;
How can one dump/examine the exact content of a database column?
SELECT DUMP(col1)
FROM tab1
WHERE cond1 = val1;
DUMP(COL1)
-------------------------------------
Typ=96 Len=4: 65,66,67,32
For this example the type is 96, indicating CHAR, and the last byte in the column is 32, which is the ASCII code for a space. This tells us that this column is blank-padded.
Can one drop a column from a table?
From Oracle8i one can DROP a column from a table, using the ALTER TABLE table_name DROP COLUMN column_name; command.
Other workarounds:
1. SQL> update t1 set column_to_drop = NULL;
SQL> rename t1 to t1_base;
SQL> create view t1 as select <columns_to_keep> from t1_base;
2. SQL> create table t2 as select <columns_to_keep> from t1;
SQL> drop table t1;
SQL> rename t2 to t1;
Can one rename a column in a table?
No, this is listed as Enhancement Request 163519. (Later Oracle releases added ALTER TABLE ... RENAME COLUMN.) Some workarounds:
1. -- Use a view with correct column names...
rename t1 to t1_base;
create view t1 as select * from t1_base;
2. -- Recreate the table with correct column names...
create table t2 as select * from t1;
drop table t1;
rename t2 to t1;
3. -- Add a column with a new name and drop an old column...
alter table t1 add ( newcolname datatype );
update t1 set newcolname=oldcolname;
alter table t1 drop column oldcolname;
How can I change my Oracle password?
Issue the following SQL command: ALTER USER <username> IDENTIFIED BY <new_password>;
How does one find the next value of a sequence?
Perform an "ALTER SEQUENCE ... NOCACHE" to unload the unused cached sequence numbers from the Oracle library cache. This way, no cached numbers will be lost. If you then select from the USER_SEQUENCES dictionary view, you will see the correct high water mark value that would be returned for the next NEXTVALL call. Afterwards, perform an "ALTER SEQUENCE ... CACHE" to restore caching.
You can use the above technique to prevent sequence number loss before a SHUTDOWN ABORT, or any other operation that would cause gaps in sequence values.
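A sketch of the technique, assuming a sequence named myseq:
ALTER SEQUENCE myseq NOCACHE;
-- LAST_NUMBER now reflects the true next value that will be returned
SELECT last_number FROM user_sequences WHERE sequence_name = 'MYSEQ';
ALTER SEQUENCE myseq CACHE 20; -- restore caching afterwards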
Workaround for snapshots on tables with LONG columns
You can use the SQL*Plus COPY command instead of snapshots if you need to copy LONG and LONG RAW variables from one location to another. Eg:
COPY TO SCOTT/TIGER@REMOTE -
CREATE IMAGE_TABLE USING -
SELECT IMAGE_NO, IMAGE -
FROM IMAGES;
Note: If you run Oracle8, convert your LONGs to LOBs, as LOBs can be replicated.