Question: are you planning to implement the card game bridge? #92

Closed
ZiggerZZ opened this issue Oct 10, 2019 · 30 comments
Labels
contribution welcome (It's a nice feature! But we do not have the time to do it ourselves. Contribution welcomed!) · question (Further information is requested)

Comments

@ZiggerZZ
Contributor

Thanks for implementing the uncontested bidding for bridge. Are you planning to implement the full game (both the bidding and the card play)?

@lanctot
Collaborator

lanctot commented Oct 12, 2019

Hi @ZiggerZZ, we are not currently planning to, so it would be a welcome addition!

@lanctot added the contribution welcome label Oct 12, 2019
@jamesdfrost
Contributor

I think a full version of Bridge would be a really interesting test for RL, as it involves working out a communication strategy between two players using only the bidding mechanism itself. Would a first-class RL agent come up with the same bidding techniques used by humans, or would it find a completely different way of communicating within the structured bidding system? Really interested to see this.

@ZiggerZZ
Contributor Author

I agree with you @jamesdfrost, and it would also be interesting to see the declarer play and defense by an RL agent.

@lanctot
Collaborator

lanctot commented Oct 15, 2019

I agree, we'd love to have it. It's a classic AI problem. In fact, one of our co-leads @locked-deepmind is an expert bridge player and very interested. There have also been a number of papers looking at learning approaches, and even a company in France (http://www.nukk.ai/) looking into AI for bridge.

We're just quite busy with other tasks at the moment. Why not contribute an implementation, @ZiggerZZ? This is the main reason we open-sourced this code: to get the community involved and participating in this exciting research area! I would assume you could re-use a lot from the uncontested bridge bidding game, which currently uses a double-dummy solver to simulate the play phase. I don't know how to play it myself, but from everything I've read about it, the play phase seems like a fairly standard trick-taking game, so it should not be too complicated.

@elkhrt
Member

elkhrt commented Oct 17, 2019

Bridge is pretty interesting! As well as the full game, there are many interesting subgames, such as the uncontested 2NT bidding game which we have in OpenSpiel. I would be delighted to have the full game implemented in OpenSpiel. Please let me know if you fancy working on this, and we can discuss approaches.

Natural bidding ('bid what you think you can make') is much more discoverable by an RL algorithm than an artificial system. The current state of the art in training RL agents on bridge bidding is https://realworld-sdm.github.io/paper/42.pdf, which demonstrates this pretty well:

Example (b) - the opening bidder passes the 1H response; this implies that responder must bid something other than 1H with strong hands; this was common in the early days of bridge, but the modern style (since 1940-ish) is 'approach forcing', where 1H is unlimited and the opener must not pass.

Example (e) - responder bids 4S, guaranteeing reaching a sensible game contract but failing to bid the good slam. This suggests that more complex investigative approaches have not been learned.

@ZiggerZZ
Contributor Author

Thank you for sharing the article; very interesting.
I'm indeed looking forward to trying some RL on bridge, so I could work on the implementation of the full game in OpenSpiel.

@lanctot added the question label Oct 23, 2019
@lanctot
Collaborator

lanctot commented Nov 12, 2019

Closing this one as it has been open for a while and I want to reserve the issue tracker for outstanding issues, but please re-open if there is any follow-up discussion on this!

@lanctot closed this as completed Nov 12, 2019
@ai-gamer
Contributor

Hi, I noticed recent papers use DDS to evaluate the bidding phase. Do you know of an API that I can call for the playing phase? I am thinking about building an agent for bidding, but I don't want to use DDS to evaluate its performance; I would like to test it with people. @locked-deepmind

@elkhrt
Member

elkhrt commented Nov 22, 2019

Hey - I'm happy to add the play phase, although I would strongly advise getting results with DDS first.

@lanctot
Collaborator

lanctot commented Nov 23, 2019

Btw for those of you interested in Bridge, just saw this: https://arxiv.org/abs/1911.07960

@elkhrt
Member

elkhrt commented Dec 12, 2019

I'm planning to do this soon.

@elkhrt reopened this Dec 12, 2019
@jwllee

jwllee commented Dec 13, 2019

It's very cool to see so much action on this issue!

I wanted to ask whether anyone has any results from running the "standard" algorithms, i.e. CFR/MCCFR, on the bidding games? It seems to me that it can take quite a long time (even with the MCCFR algorithms, because the current implementation traverses the whole game tree at initialization).

Even with the uncontested 2NT bidding game, there are 1 + 2 * Σ_{b=1..26} C(26, b) ≈ 100M states to traverse.
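
As a rough sanity check, evaluating that expression (a back-of-the-envelope sketch; the exact OpenSpiel state count may differ depending on how states are encoded):

    from math import comb  # Python 3.8+ for math.comb

    # Evaluate the count quoted above: 1 + sum over b = 1..26 of C(26, b) * 2.
    num_states = 1 + sum(comb(26, b) * 2 for b in range(1, 27))
    print(f"{num_states:,}")  # 134,217,727, i.e. on the order of 100M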

@ai-gamer
Contributor

I am a bit confused about running CFR-based algorithms on bidding. Based on my understanding, CFR-family algorithms only converge to an optimal strategy (a Nash equilibrium) in two-player zero-sum games.

@elkhrt
Member

elkhrt commented Dec 14, 2019 via email

@jwllee

jwllee commented Dec 15, 2019

Yeah, I guess I can do that.

@ai-gamer: I came across the Pluribus poker AI paper; they used MCCFR as the offline strategy and were able to achieve "super-human" results... I was just wondering what kind of results one would get with bridge. I'm pretty new at this, so it would be great if you have any pointers to alternative algorithms other than the PIMC ones.

@ai-gamer
Contributor

https://realworld-sdm.github.io/paper/42.pdf This is a paper training an RL agent for bidding; it should be the current SOTA. I am actually working on reproducing the results.
As for the multiplayer poker paper, I think it is an extension of the two-player agent and only shows empirical results.

@elkhrt
Member

elkhrt commented Dec 15, 2019

Yes, the results in that paper are very impressive - I'm looking forward to seeing your reproduction!
The OpenSpiel bridge environment will be on GitHub on Monday, which should be perfectly suited for this work.

The state representation in the paper is slightly lossy, e.g. it represents both 1NT-Dbl-2C and 1NT-Pass-Pass-Dbl-Pass-Pass-2C the same way. The OpenSpiel representation is slightly larger, but fixes this problem.

@ai-gamer
Contributor

Thanks for the information. @locked-deepmind I have a draft version of everything, which learns something. Maybe I can start using the open_spiel code to reproduce the results. I see that there is no A3C implemented at the moment, so maybe I can try to add one first.
A stupid question: where is the implementation of A2C? I see it in the list, but I couldn't find it in the code.

@lanctot
Collaborator

lanctot commented Dec 16, 2019

A stupid question: where is the implementation of A2C? I see it in the list, but I couldn't find it in the code.

You will find it here: https://github.com/deepmind/open_spiel/blob/master/open_spiel/python/algorithms/policy_gradient.py and there are some examples in policy_gradient_test.py.
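
For instance, something along these lines should pick up the A2C loss (just a sketch; it assumes the TF1-style API the library currently uses, so double-check the constructor arguments in policy_gradient.py and the test):

    import tensorflow.compat.v1 as tf
    from open_spiel.python import rl_environment
    from open_spiel.python.algorithms import policy_gradient

    # Any small turn-based game works for a quick smoke test.
    env = rl_environment.Environment("tic_tac_toe")
    info_state_size = env.observation_spec()["info_state"][0]
    num_actions = env.action_spec()["num_actions"]

    with tf.Session() as sess:
      agent = policy_gradient.PolicyGradient(
          sess,
          player_id=0,
          info_state_size=info_state_size,
          num_actions=num_actions,
          loss_str="a2c",             # selects the advantage actor-critic loss
          hidden_layers_sizes=(128,))
      sess.run(tf.global_variables_initializer())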

@lanctot
Collaborator

lanctot commented Dec 16, 2019

I am a bit confused about running CFR-based algorithms on bidding. Based on my understanding, CFR-family algorithms only converge to an optimal strategy (a Nash equilibrium) in two-player zero-sum games.

Yes, this is indeed true, but I just wanted to follow up with a few quick comments:

  • The CFR algorithm is nonetheless still well-defined in other games, and (almost) all of our implementations support the n-player case
  • CFR still converges to coarse-correlated equilibria outside two-player zero-sum
  • CFR has been known to produce some very competitive policies even in the multiplayer case. Pluribus is a good example, and previous computer poker tournaments were also won with CFR approaches; see some work on this here: https://arxiv.org/abs/1305.0034

(And I'm secretly hoping more work is done looking at CFR algorithms outside of two-player zero-sum games :))
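
As a quick illustration, here is a sketch using the tabular Python CFR solver on three-player Kuhn poker; NashConv remains a useful diagnostic in n-player games even though convergence to a Nash equilibrium is only guaranteed for two-player zero-sum:

    import pyspiel
    from open_spiel.python.algorithms import cfr, exploitability

    # Three-player Kuhn poker: small enough for tabular CFR to run quickly.
    game = pyspiel.load_game("kuhn_poker(players=3)")

    solver = cfr.CFRSolver(game)
    for _ in range(200):
      solver.evaluate_and_update_policy()

    # Total incentive to deviate from the average policy, summed over players.
    print(exploitability.nash_conv(game, solver.average_policy()))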

@elkhrt
Member

elkhrt commented Dec 16, 2019

Bridge is now available - let me know how you get on!

Incidentally, there is a wrapper in open_spiel/python/bots/bluechip_bridge_wrapper.py which enables any program supporting the BlueChip network protocol to be used as an OpenSpiel bot for the uncontested bridge bidding game. It would not be too difficult to extend this for the full four-player game.
I've verified that this works with WBridge5, making it easy to run head-to-head comparisons with a state-of-the-art rule-based bot.
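
For a quick smoke test, something like the following should work (a sketch; it assumes the game is registered under the short name "bridge", and pyspiel.registered_names() lists the exact identifiers):

    import random
    import pyspiel

    game = pyspiel.load_game("bridge")
    state = game.new_initial_state()

    # Play one full deal with uniformly random actions (deal, auction, play).
    while not state.is_terminal():
      if state.is_chance_node():
        outcomes, probs = zip(*state.chance_outcomes())
        state.apply_action(random.choices(outcomes, weights=probs)[0])
      else:
        state.apply_action(random.choice(state.legal_actions()))

    print(state)            # deal, auction and play record
    print(state.returns())  # per-player scores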

@ai-gamer
Contributor

I tried playing with the full bridge environment. By slightly changing breakthrough_dqn.py, I can train a bridge agent against a random opponent. I have a few questions.

  1. I am still confused about how the C++-implemented games are connected to Python. It often happens that when I try to check the code of some function using PyCharm, it jumps to pyspiel.py and the function body contains only pass.

  2. I am using TensorFlow 1.14.0, but I always get warning messages like
    The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
    when I run something. Is this normal, or is there something wrong with my local environment setup?

  3. I tried to train an agent for tiny_bridge_4p, but it does not work for me. Am I missing something, or does this game not support RL training? Is there some smaller four-player competitive bridge version that I can use to check everything faster?

@elkhrt
Member

elkhrt commented Dec 18, 2019 via email

@ai-gamer
Contributor

A question about the wrapper: does it mean that I can send all the information about the current game state and get WBridge5's corresponding actions (both bidding and playing)? For research purposes, DDS is already good enough to evaluate the bidding system, but I am thinking about combining a new bidding system with a decent playing system to make it a platform for human players. I searched for open-source code for the playing phase and found nothing, so the wrapper seems like a perfect solution.

@elkhrt
Member

elkhrt commented Dec 20, 2019 via email

@ZiggerZZ
Contributor Author

Hi!

Currently, in the play phase the observation tensor includes the current player's cards, dummy's cards, the previous and current tricks, and the number of tricks taken by each side.
However, dummy is told what to play by the declarer, so there are really only three agents: two defenders and the declarer, who plays both their own hand and dummy's. Therefore I propose that the observation tensor include not dummy's cards in all cases, but rather the cards visible to the current player, i.e. dummy's cards for the defenders and the declarer, and the declarer's cards for dummy.

Example:

Now it is

ObservationString(0) = "Decl:Pd T9874.63.QT74.T3 Table:T9874.63.QT74.T3 prev[] curr[RH:C9] Decl:0 Def:0"
ObservationString(1) = "Decl:LH QJ6.KJT87.KJ865. Table:T9874.63.QT74.T3 prev[] curr[Pd:C9] Decl:0 Def:0"
ObservationString(2) = "Decl:Us 52.Q92.3.KQJ8765 Table:T9874.63.QT74.T3 prev[] curr[LH:C9] Decl:0 Def:0"
ObservationString(3) = "Decl:RH AK3.A54.A92.A42 Table:T9874.63.QT74.T3 prev[] curr[Us:C9] Decl:0 Def:0"

I suggest changing it to

ObservationString(0) = "Decl:Pd T9874.63.QT74.T3 Visible:52.Q92.3.KQJ8765 prev[] curr[RH:C9] Decl:0 Def:0"
ObservationString(1) = "Decl:LH QJ6.KJT87.KJ865. Visible:T9874.63.QT74.T3 prev[] curr[Pd:C9] Decl:0 Def:0"
ObservationString(2) = "Decl:Us 52.Q92.3.KQJ8765 Visible:T9874.63.QT74.T3 prev[] curr[LH:C9] Decl:0 Def:0"
ObservationString(3) = "Decl:RH AK3.A54.A92.A42 Visible:T9874.63.QT74.T3 prev[] curr[Us:C9] Decl:0 Def:0"

What do you think? I could make a PR if you find that it is a good idea. @lanctot

@lanctot
Collaborator

lanctot commented Jan 10, 2020

I think @locked-deepmind is the person to ask. Ed, any thoughts?

(I actually don't even know the rules to bridge! :))

@elkhrt
Member

elkhrt commented Jan 10, 2020 via email

@ZiggerZZ
Contributor Author

ZiggerZZ commented Jan 10, 2020

Thanks for the quick reply!

I agree with you that it matches the dynamics of the game; however, changing the number of agents during the game (4 in the bidding and 3 in the playing phase) would make the implementation more sophisticated. The actual determination of the current player, current_player_ = (current_player_ + 1) % kNumPlayers;, would have to differ between the ApplyBiddingAction and ApplyPlayAction functions, and we would need to change some checks like SPIEL_CHECK_TRUE(holder_[card] == current_player_); (for example, deepmind/bridge.cc#L544 and deepmind/bridge.cc#L449).

What I propose would only change 3 lines of code and keep the logic the same (ZiggerZZ/bridge.cc#L311):

    // If player is dummy, then store declarer's cards, otherwise store dummy's cards.
    const int dummy = contract_.declarer ^ 2;
    int visible = dummy;
    if (player == dummy) visible = contract_.declarer;

I don't have much experience in designing libraries, so please tell me if I'm wrong and it's better to implement the actual dynamics of the game directly rather than looking for simplifications :)

@elkhrt
Member

elkhrt commented Jan 10, 2020 via email

OpenSpiel pushed a commit that referenced this issue Jan 14, 2020
…ed from dummy, the CurrentPlayer() should be the declarer.

Fixes: #92
PiperOrigin-RevId: 289458882
Change-Id: I4e52e0f6e4cc10b913a55880f2d46efa126b4b7c