full-Join | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

full-Join

I have a slight problem with a full-join query which is based upon sub-queries.
I have two tables called Activity and Store_activity which are defined:
CREATE TABLE [dbo].[Activity] (
[Activity_ID] [int] IDENTITY (1, 1) NOT NULL ,
[Timestamp] [timestamp] NULL ,
[Create_Date] [datetime] NULL ,
) ON [PRIMARY]
GO CREATE TABLE [dbo].[Store_Activity] (
[Store_ID] [nvarchar] (15) COLLATE Latin1_General_CI_AS NOT NULL ,
[Activity_ID] [int] NOT NULL ,
[Timestamp] [timestamp] NULL ,
[Create_Date] [datetime] NULL , ) ON [PRIMARY]
GO ALTER TABLE [dbo].[Activity] WITH NOCHECK ADD
CONSTRAINT [PK_Activity] PRIMARY KEY CLUSTERED
(
[Activity_ID]
) WITH FILLFACTOR = 90 ON [PRIMARY]
GO ALTER TABLE [dbo].[Store_Activity] WITH NOCHECK ADD
CONSTRAINT [PK_Store_Activity] PRIMARY KEY CLUSTERED
(
[Store_ID],
[Activity_ID]
) WITH FILLFACTOR = 90 ON [PRIMARY]
GO ALTER TABLE [dbo].[Activity] ADD
CONSTRAINT [DF_Activity_Create_Date] DEFAULT (getdate()) FOR [Create_Date]
GO CREATE INDEX [idx_Activity_release] ON [dbo].[Activity]([Release]) WITH FILLFACTOR = 90 ON [PRIMARY]
GO ALTER TABLE [dbo].[Store_Activity] ADD
CONSTRAINT [DF_Store_Activity_Create_Date] DEFAULT (getdate()) FOR [Create_Date]
GO CREATE INDEX [idx_Store_Activity_activity_id] ON [dbo].[Store_Activity]([Activity_ID]) WITH FILLFACTOR = 90 ON [PRIMARY]
GO ALTER TABLE [dbo].[Store_Activity] ADD
CONSTRAINT [FK_Store_Activity_Activity] FOREIGN KEY
(
[Activity_ID]
) REFERENCES [dbo].[Activity] (
[Activity_ID]
) ON DELETE CASCADE ON UPDATE CASCADE NOT FOR REPLICATION ,
CONSTRAINT [FK_Store_Activity_Store] FOREIGN KEY
(
[Store_ID]
) REFERENCES [dbo].[Store] (
[Store_ID]
) ON DELETE CASCADE ON UPDATE CASCADE NOT FOR REPLICATION
GO Within the activity table I define all the activities based upon a release. So for a release called Release 5.0 I have 138 activities. I also have a table called Stores which I use to relate the activites to to form Store_activity. So, I have 542 Stores which if I do a full join, I should get 138 x 542 store_activities = 74796. When I try a evaluate the mismatches for all stores I get only 62634 with this query: select sa.store_id, sa.activity_id as store_activity_id, a.activity_id
from (select * from activity where release = ‘5.0’) as a
full join (select store_id, activity_id from store_activity where activity_id in (select activity_id from activity where release = ‘5.0’)) as sa
on a.activity_id = sa.activity_id
order by sa.store_id However, when I specify that I am only interested in one store (0002) SQL Server returns the mismatches aswell :
and store_id = ‘0002’ select sa.store_id, sa.activity_id as store_activity_id, a.activity_id
from (select * from activity where release = ‘5.0’) as a
full join (select store_id, activity_id from store_activity where activity_id in (select activity_id from activity where release = ‘5.0’) and store_id = ‘0002’) as sa on a.activity_id = sa.activity_id
order by sa.store_id
That’s just the nature of a FULL JOIN. You need to reflect the store_id criteria in your derived table "a", which should include the reverse of the IN clause of the "sa" derived table.
I dont understand what you mean about :
>> You need to reflect the store_id criteria in your derived table "a", which should include the reverse of the IN clause of the "sa" derived table. Can you please explain this further ?
from (select * from activity where release = ‘5.0’ AND activity_id in (select activity_id from store_activity where store_id = ‘0002’) as a
full join (select store_id, activity_id from store_activity where activity_id in (select activity_id from activity where release = ‘5.0’) and store_id = ‘0002’) as sa
on a.activity_id = sa.activity_id
order by sa.store_id You could also add the store_id to the ON clause, and add the criteria to an overall WHERE statement – not sure what that would do to the performance of the query.
how would I add the store_id to the ON clause ? The Store Activity was created to make the relationship between a Store and activities.
Sorry for the confusion – disregard the last suggestion in my previous posting. I notice that I omitted a closing bracket in the "a" derived table: select sa.store_id, sa.activity_id as store_activity_id, a.activity_id
from (select * from activity where release = ‘5.0’ AND activity_id in (select activity_id from store_activity where store_id = ‘0002’)) as a
full join (select store_id, activity_id from store_activity where activity_id in (select activity_id from activity where release = ‘5.0’) and store_id = ‘0002’) as sa
on a.activity_id = sa.activity_id
order by sa.store_id So does it work like this?

no, the query doesnt tell me that there are some activities not related to store 0002. besides I need to know about all stores and not just one store. i’ve rewritten the query with views so it looks like this: select
sa.store_id, sa.activity_id as store_activity_id,
a.activity_id from
view_release_5_activities as a cross join
view_release_5_store_activities as sa where a.activity_id = sa.activity_id order by sa.store_id the views being: CREATE VIEW view_release_5_store_activities
AS
select store_id, activity_id
from store_activity
where activity_id in
(select activity_id from view_release_5_activities)
GO AND CREATE VIEW view_release_5_activities
AS
select *
from activity
where release = ‘5.0’
GO
Well, your original problem was "However, when I specify that I am only interested in one store (0002) SQL Server returns the mismatches as well".
Sorry, what i mean by mismatches is that:
I want to find all activities which are not associated to a store Given each release consists of a group of stores by appending to the store_activity table.
Because this is a manual process, some people may have added stores to releases without associating all activities to a store. The DDL behind the tables would be: CREATE TABLE [dbo].[Store_Activity] (
[Store_ID] [nvarchar] (15) COLLATE Latin1_General_CI_AS NOT NULL ,
[Activity_ID] [int] NOT NULL ,
[Timestamp] [timestamp] NULL ,
[Create_Date] [datetime] NULL ,
[Not_Required] [bit] NOT NULL ,
[Revised_Planned_Date] [datetime] NULL ,
[Revised_AM_PM] [nvarchar] (10) COLLATE Latin1_General_CI_AS NULL ,
[Revised_Start_Time] [nvarchar] (20) COLLATE Latin1_General_CI_AS NULL ,
[Revised_Finish_Time] [nvarchar] (20) COLLATE Latin1_General_CI_AS NULL ,
[Actual_Date] [datetime] NULL ,
[Comments] [varchar] (255) COLLATE Latin1_General_CI_AS NULL ,
[Fix] [int] NULL ,
[Go_Status] [bit] NULL ,
[Activity_Status_ID] [int] NULL
) ON [PRIMARY]
GO CREATE TABLE [dbo].[Activity] (
[Activity_ID] [int] IDENTITY (1, 1) NOT NULL ,
[Timestamp] [timestamp] NULL ,
[Create_Date] [datetime] NULL ,
[Number] [int] NULL ,
[Name] [nvarchar] (170) COLLATE Latin1_General_CI_AS NOT NULL ,
[Description] [nvarchar] (800) COLLATE Latin1_General_CI_AS NULL ,
[Release] [nvarchar] (50) COLLATE Latin1_General_CI_AS NOT NULL ,
[Phase] [nvarchar] (20) COLLATE Latin1_General_CI_AS NULL ,
[Default_Planned_Week] [numeric](18, 2) NOT NULL ,
[Drop_Dead_Date] [nvarchar] (50) COLLATE Latin1_General_CI_AS NULL ,
[Critical_Path] [bit] NULL ,
[Team] [nvarchar] (20) COLLATE Latin1_General_CI_AS NULL ,
[Area] [nvarchar] (20) COLLATE Latin1_General_CI_AS NULL ,
[Applicable_Stores] [nvarchar] (50) COLLATE Latin1_General_CI_AS NULL ,
[Central_Responsiblity] [nvarchar] (50) COLLATE Latin1_General_CI_AS NULL ,
[Execution_Ownership] [nvarchar] (50) COLLATE Latin1_General_CI_AS NULL ,
[Day] [nvarchar] (20) COLLATE Latin1_General_CI_AS NULL ,
[Duration] [nvarchar] (50) COLLATE Latin1_General_CI_AS NULL ,
[Comments] [nvarchar] (800) COLLATE Latin1_General_CI_AS NULL ,
[Update_Ownership] [nvarchar] (50) COLLATE Latin1_General_CI_AS NULL ,
[Online_User_ID] [int] NULL ,
[order_id] [smallint] NULL ,
[status_report] [char] (1) COLLATE Latin1_General_CI_AS NULL ,
[exceptions_report] [char] (1) COLLATE Latin1_General_CI_AS NULL ,
[visuailsation_name] [varchar] (50) COLLATE Latin1_General_CI_AS NULL ,
[visuailsation_display] [bit] NULL ,
[visualisation_activity_id] [int] NULL
) ON [PRIMARY]
GO The tables would look like this: ————————
Activity table
———————— actvity_id release
———- ——-
1 1
2 3
3 3
4 3
5 2
6 3
————————
Store Activity table
———————— store_id activity_id
——– ———–
0002 2
0002 3
0002 1
0003 2
0003 3
0003 4
Now if I want to find out which Store are not associated with all activities within a given release I should end up with results like [email protected] store_id store_activity_id activity_id
——– —————– ———–
0002 2 2
0002 3 3
0002 NULL 4
0002 NULL 6
0003 2 2
0003 3 3
0003 4 4
0003 NULL 6
from that i can elliminate the NULLs to be left with store_id store_activity_id activity_id
——– —————– ———–
0002 NULL 4
0002 NULL 6
0003 NULL 6
Does that make more sense ?
Okay, I think the picture is getting clearer. This query should give you all stores that do not have all activity_id options for a given release. There are two tricks involved here … (1) Use an INNER JOIN between the store_activity entries and all activity_id options that are different. This means using <> instead of = in the ON clause, and it produces the list of all possible other combinations of store_id and activity_id. (2) Use a LEFT JOIN to match those different activity_id options to a second instance of store_activity on (a) store_id, (b) the different activity_id and (c) the release, (d) where there is no match. This means a third table gets involved with the self-join, plus you use the standard join for eliminating matching rows: SELECT a.col FROM a LEFT JOIN b ON a.col = b.col WHERE b.col IS NULL select distinct sa1.store_id
from store_activity as sa1
/* 1 */ inner join activity as a on a.activity_id <> sa1.activity_id
/* 2 */ left join store_activity as sa2
/* a */ on sa1.store_id = sa2.store_id
/* b */ and a.activity_id = sa2.activity_id
/* c */ and sa1.release = sa2.release
/* d */ where sa2.store_id IS NULL Make sense?
]]>